On Sat, 12 May 2007 12:05:26 -0700
Allison Randal (via RT) <[EMAIL PROTECTED]> wrote:
> On x86 Linux (Ubuntu), this configuration fails 2 tests:
>
> t/library/string_utils.t 0 134 29 4 13.79% 28-29
> t/op/stringu.t 2 512 25 2 8.00% 1 19
>
> Both tests are failing with the error:
>
> parrot: src/encodings/utf8.c:271: utf8_encode_and_advance: Assertion
> `i->bytepos <= (s)->obj.u._b._buflen' failed.
Reproduced on Gentoo. Before patch, results are as above.
After patch:
t/library/string_utils....ok
t/op/stringu..............ok
The code in utf8_encode_and_advance is beautiful. It basically says:
add a UTF-8 character to the buffer. OK, now did we overrun the buffer?
CRASH!
It seems safer to check the buffer size *before* writing to it, so
here's a patch to do so. Is it the right fix? I thought so when I
was doing it, but now I'm not so sure; it does introduce a const
warning. Maybe we can resolve that with a cast; maybe it's the wrong
solution to the problem. Please provide guidance.
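
For illustration only, here is a minimal standalone sketch of the
check-before-write idea, outside of Parrot. Everything in it (toy_buf,
utf8_len, toy_utf8_encode, toy_encode_and_advance) is a hypothetical
stand-in, not Parrot's STRING, UNISKIP, utf8_encode or
Parrot_reallocate_string; it only assumes a plain realloc-backed buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a string buffer; NOT Parrot's STRING. */
typedef struct {
    unsigned char *data;    /* heap storage, may be NULL when empty */
    size_t         buflen;  /* bytes currently allocated */
    size_t         bytepos; /* offset of the next byte to write */
} toy_buf;

/* Bytes needed to encode code point c as UTF-8 (c <= 0x10FFFF assumed). */
size_t utf8_len(unsigned long c)
{
    if (c < 0x80)    return 1;
    if (c < 0x800)   return 2;
    if (c < 0x10000) return 3;
    return 4;
}

/* Write c at p as UTF-8 and return a pointer just past the last byte. */
unsigned char *toy_utf8_encode(unsigned char *p, unsigned long c)
{
    size_t n = utf8_len(c);
    size_t k;

    if (n == 1) {
        p[0] = (unsigned char)c;
        return p + 1;
    }
    /* Continuation bytes carry 6 bits each, filled from the end. */
    for (k = n - 1; k > 0; k--) {
        p[k] = (unsigned char)(0x80 | (c & 0x3F));
        c >>= 6;
    }
    /* Lead byte: n high bits set (0xC0, 0xE0 or 0xF0), then what is left. */
    p[0] = (unsigned char)((0xF0 << (4 - n)) | c);
    return p + n;
}

/* Grow the buffer first, then write. Returns 0 on success, -1 on OOM. */
int toy_encode_and_advance(toy_buf *b, unsigned long c)
{
    /* Existing bytes + the new character + a NUL terminator. */
    size_t need = b->bytepos + utf8_len(c) + 1;
    unsigned char *end;

    if (need > b->buflen) {
        unsigned char *grown = realloc(b->data, need);
        if (grown == NULL)
            return -1;
        b->data   = grown;
        b->buflen = need;
    }
    end = toy_utf8_encode(b->data + b->bytepos, c);
    b->bytepos = (size_t)(end - b->data);
    b->data[b->bytepos] = '\0';

    /* Same invariant as the failing assertion, now checked after a safe write. */
    assert(b->bytepos <= b->buflen);
    return 0;
}
```

Starting from an empty toy_buf { NULL, 0, 0 }, each call grows the
buffer to exactly what the next character needs before touching it, so
the closing assertion cannot fire because of the write itself.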
It might be worth pre-reserving 8 bytes or so, to avoid having to
realloc as often, if this will be called a lot. Currently the patch
reallocs only the minimum necessary to fit the existing string, the new
character and a null terminator.
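
A rough sketch of what reserving some slack could look like, again
outside of Parrot. slack_buf, slack_reserve and TOY_SLACK are all
hypothetical names, and the value 8 is just the "8 bytes or so" guess
from above, not a measured number.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_SLACK 8 /* hypothetical extra bytes reserved on each growth */

/* Hypothetical buffer that counts how often it had to grow. */
typedef struct {
    unsigned char *data;     /* heap storage, may be NULL when empty */
    size_t         buflen;   /* bytes currently allocated */
    size_t         reallocs; /* number of times realloc was needed */
} slack_buf;

/*
 * Make room for `extra` more bytes after `used` bytes, plus a NUL.
 * When growth is needed, over-allocate by TOY_SLACK so that the next
 * few short appends fit without another realloc.
 * Returns 0 on success, -1 if realloc fails.
 */
int slack_reserve(slack_buf *b, size_t used, size_t extra)
{
    size_t need = used + extra + 1;
    unsigned char *grown;

    if (need <= b->buflen)
        return 0; /* already enough room, nothing to do */

    grown = realloc(b->data, need + TOY_SLACK);
    if (grown == NULL)
        return -1;

    b->data   = grown;
    b->buflen = need + TOY_SLACK;
    b->reallocs++;

    assert(b->buflen >= need);
    return 0;
}
```

With this, appending sixteen one-byte characters to an empty buffer
grows it twice instead of sixteen times. Whether that is worth the
extra memory depends on how often the function is really called.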
Mark
=== src/encodings/utf8.c
==================================================================
--- src/encodings/utf8.c (revision 20520)
+++ src/encodings/utf8.c (local)
@@ -264,6 +264,9 @@
     const STRING *s = i->str;
     unsigned char *new_pos, *pos;
+    if(i->bytepos + UNISKIP(c) >= PObj_buflen(s)) {
+        Parrot_reallocate_string(interp, i->str, i->bytepos + UNISKIP(c) + 1);
+    }
     pos = (unsigned char *)s->strstart + i->bytepos;
     new_pos = utf8_encode(pos, c);
     i->bytepos += (new_pos - pos);