Patch #1 shouldn't be necessary after a patch I have recently submitted to Bram
is
incorporated as it was a bug that utf_head_off was called at all in the case
you
found; your solution of adding a NUL works around one of the symptoms rather
than
fixing the real bug.
Haven't looked at #2 so will leave that to Bram/others to comment.
Ben.
Dominique Pelle wrote:
> Bug#1:
>
> Valgrind memory checker detects a bug when using utf-8 locale
> and when loading a file which contains only 1 byte (0xfc).
>
> Bug is easy to reproduce:
>
> 1/ make sure locale is utf-8
>
> 2/ create a file with only 1 byte 0xfc:
>
> $ perl -e 'print chr(0xfc)' > foo.txt
>
> Note that this is clearly not a valid utf-8 file.
>
> 3/ open the file with vim and valgrind:
>
> $ valgrind vim -u NONE foo.txt 2> vg.log
>
> 4/ notice that valgrind complains immediately as file is being read:
>
> ==15575== Conditional jump or move depends on uninitialised value(s)
> ==15575== at 0x811F6FD: utf_head_off (mbyte.c:2494)
> ==15575== by 0x80C32B0: readfile (fileio.c:1880)
> ==15575== by 0x80527AE: open_buffer (buffer.c:130)
> ==15575== by 0x80E7F82: create_windows (main.c:2459)
> ==15575== by 0x80E5C84: main (main.c:799)
>
> 5/ notice as well that vim has set file encoding to utf-8 even
> though it is not a valid utf-8 file.
>
> I would expect fenc to be set automatically to latin1,
> and not utf-8.
>
> ':set fenc?' shows: fileencoding=utf-8
> ':set fileencodings?" shows: fileencodings=ucs-bom,utf-8,latin1
>
>
> Looking at the code, I see that function utf_head_off(base, p)
> reads 1 to n bytes ahead p at line 2494:
>
> 2474 int
> 2475 utf_head_off(base, p)
> 2476 char_u *base;
> 2477 char_u *p;
> 2478 {
> 2479 char_u *q;
> 2480 char_u *s;
> 2481 int c;
> 2482 #ifdef FEAT_ARABIC
> 2483 char_u *j;
> 2484 #endif
> 2485
> 2486 if (*p < 0x80) /* be quick for ASCII */
> 2487 return 0;
> 2488
> 2489 /* Skip backwards over trailing bytes: 10xx.xxxx
> 2490 * Skip backwards again if on a composing char. */
> 2491 for (q = p; ; --q)
> 2492 {
> 2493 /* Move s to the last byte of this char. */
> !!! 2494 for (s = q; (s[1] & 0xc0) == 0x80; ++s)
> 2495 ;
> 2496 /* Move q to the first byte of this char. */
> 2497 while (q > base && (*q & 0xc0) == 0x80)
> 2498 --q;
>
>
> Line 2494 is accessing s[1] (i.e. 'p[1]'). But only
> p[0] is initialized since readfile() only read one byte
> from file, so p[1] and beyond are uninitialized.
>
> Loop at line 2494 can in theory access many other bytes
> beyond the end of the buffer since exit condition
> of loop depends on uninitialized memory.
>
> I think adding a NUL after what is read from file fixes
> this bug. See attached patch but please review it.
>
> However, it does not fix the fact that fenc is set to
> utf-8 even though it is an invalid utf-8 file.
>
>
> Bug #2:
>
> Using the same input file, I can see another bug:
>
> 1/ yank the first line: Y
> This copies the invalid utf-8 into the unnamed "" register
>
> 2/ copy the unnamed "" register onto the Ex line :<CTRL-R>"
>
> 3/ observe the following error when copying "" register:
>
> ==6484== at 0x80BAEBD: cmdline_paste_str (ex_getln.c:3028)
> ==6484== by 0x8132B12: cmdline_paste_reg (ops.c:1528)
> ==6484== by 0x80BADD2: cmdline_paste (ex_getln.c:3009)
> ==6484== by 0x80B8029: getcmdline (ex_getln.c:1131)
> ==6484== by 0x80B96C7: getexline (ex_getln.c:2100)
> ==6484== by 0x80A31FA: do_cmdline (ex_docmd.c:995)
> ==6484== by 0x8129F9A: nv_colon (normal.c:5179)
> ==6484== by 0x8123556: normal_cmd (normal.c:1152)
> ==6484== by 0x80E631D: main_loop (main.c:1181)
> ==6484== by 0x80E5E6D: main (main.c:940)
> ==6484== Address 0x51519C2 is 0 bytes after a block of size 2 alloc'd
> ==6484== at 0x4022765: malloc (vg_replace_malloc.c:149)
> ==6484== by 0x8113B6C: lalloc (misc2.c:862)
> ==6484== by 0x8113A8E: alloc (misc2.c:761)
> ==6484== by 0x8113EFB: vim_strsave (misc2.c:1166)
> ==6484== by 0x81355CD: op_yank (ops.c:2901)
> ==6484== by 0x8124AC6: do_pending_operator (normal.c:1890)
> ==6484== by 0x81235D1: normal_cmd (normal.c:1178)
> ==6484== by 0x80E631D: main_loop (main.c:1181)
> ==6484== by 0x80E5E6D: main (main.c:940)
>
> 3018 void
> 3019 cmdline_paste_str(s, literally)
> 3020 char_u *s;
> 3021 int literally;
> 3022 {
> 3023 int c, cv;
> 3024
> 3025 if (literally)
> 3026 put_on_cmdline(s, -1, TRUE);
> 3027 else
> !!! 3028 while (*s != NUL)
> 3029 {
> 3030 cv = *s;
> 3031 if (cv == Ctrl_V && s[1])
> 3032 ++s;
> 3033 #ifdef FEAT_MBYTE
> 3034 if (has_mbyte)
> 3035 {
> !!! 3036 c = mb_ptr2char(s);
> !!! 3037 s += mb_char2len(c);
> 3038 }
> 3039 else
> 3040 #endif
>
> Call to mb_ptr2char(s) at line 3036 consumes only 1 byte
> because s is an invalid utf-8 sequence and returns the
> first byte which is 0xfc. Calls to mb_char2len(0xfc)
> at next line increments s by 2 bytes instead of 1 (!!)
> because it takes 2 bytes to store character u+00fc in
> utf-8 (which is inconsistent with the 1 byte consumed
> by previous line).
>
> I think calling 'c = mb_cptr2char_adv(&s);' to increment s
> fixes the problem as in attached patch but please review it.
>
> -- Dominique
>
> >
>
--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_dev" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---