Patch #1 shouldn't be necessary after a patch I have recently submitted to Bram is incorporated as it was a bug that utf_head_off was called at all in the case you found; your solution of adding a NUL works around one of the symptoms rather than fixing the real bug.
Haven't looked at #2 so will leave that to Bram/others to comment. Ben. Dominique Pelle wrote: > Bug#1: > > Valgrind memory checker detects a bug when using utf-8 locale > and when loading a file which contains only 1 byte (0xfc). > > Bug is easy to reproduce: > > 1/ make sure locale is utf-8 > > 2/ create a file with only 1 byte 0xfc: > > $ perl -e 'print chr(0xfc)' > foo.txt > > Note that this is clearly not a valid utf-8 file. > > 3/ open the file with vim and valgrind: > > $ valgrind vim -u NONE foo.txt 2> vg.log > > 4/ notice that valgrind complains immediately as file is being read: > > ==15575== Conditional jump or move depends on uninitialised value(s) > ==15575== at 0x811F6FD: utf_head_off (mbyte.c:2494) > ==15575== by 0x80C32B0: readfile (fileio.c:1880) > ==15575== by 0x80527AE: open_buffer (buffer.c:130) > ==15575== by 0x80E7F82: create_windows (main.c:2459) > ==15575== by 0x80E5C84: main (main.c:799) > > 5/ notice as well that vim has set file encoding to utf-8 even > though it is not a valid utf-8 file. > > I would expect fenc to be set automatically to latin1, > and not utf-8. > > ':set fenc?' shows: fileencoding=utf-8 > ':set fileencodings?" shows: fileencodings=ucs-bom,utf-8,latin1 > > > Looking at the code, I see that function utf_head_off(base, p) > reads 1 to n bytes ahead p at line 2494: > > 2474 int > 2475 utf_head_off(base, p) > 2476 char_u *base; > 2477 char_u *p; > 2478 { > 2479 char_u *q; > 2480 char_u *s; > 2481 int c; > 2482 #ifdef FEAT_ARABIC > 2483 char_u *j; > 2484 #endif > 2485 > 2486 if (*p < 0x80) /* be quick for ASCII */ > 2487 return 0; > 2488 > 2489 /* Skip backwards over trailing bytes: 10xx.xxxx > 2490 * Skip backwards again if on a composing char. */ > 2491 for (q = p; ; --q) > 2492 { > 2493 /* Move s to the last byte of this char. */ > !!! 2494 for (s = q; (s[1] & 0xc0) == 0x80; ++s) > 2495 ; > 2496 /* Move q to the first byte of this char. */ > 2497 while (q > base && (*q & 0xc0) == 0x80) > 2498 --q; > > > Line 2494 is accessing s[1] (i.e. 'p[1]'). But only > p[0] is initialized since readfile() only read one byte > from file, so p[1] and beyond are uninitialized. > > Loop at line 2494 can in theory access many other bytes > beyond the end of the buffer since exit condition > of loop depends on uninitialized memory. > > I think adding a NUL after what is read from file fixes > this bug. See attached patch but please review it. > > However, it does not fix the fact that fenc is set to > utf-8 even though it is an invalid utf-8 file. > > > Bug #2: > > Using the same input file, I can see another bug: > > 1/ yank the first line: Y > This copies the invalid utf-8 into the unnamed "" register > > 2/ copy the unnamed "" register onto the Ex line :<CTRL-R>" > > 3/ observe the following error when copying "" register: > > ==6484== at 0x80BAEBD: cmdline_paste_str (ex_getln.c:3028) > ==6484== by 0x8132B12: cmdline_paste_reg (ops.c:1528) > ==6484== by 0x80BADD2: cmdline_paste (ex_getln.c:3009) > ==6484== by 0x80B8029: getcmdline (ex_getln.c:1131) > ==6484== by 0x80B96C7: getexline (ex_getln.c:2100) > ==6484== by 0x80A31FA: do_cmdline (ex_docmd.c:995) > ==6484== by 0x8129F9A: nv_colon (normal.c:5179) > ==6484== by 0x8123556: normal_cmd (normal.c:1152) > ==6484== by 0x80E631D: main_loop (main.c:1181) > ==6484== by 0x80E5E6D: main (main.c:940) > ==6484== Address 0x51519C2 is 0 bytes after a block of size 2 alloc'd > ==6484== at 0x4022765: malloc (vg_replace_malloc.c:149) > ==6484== by 0x8113B6C: lalloc (misc2.c:862) > ==6484== by 0x8113A8E: alloc (misc2.c:761) > ==6484== by 0x8113EFB: vim_strsave (misc2.c:1166) > ==6484== by 0x81355CD: op_yank (ops.c:2901) > ==6484== by 0x8124AC6: do_pending_operator (normal.c:1890) > ==6484== by 0x81235D1: normal_cmd (normal.c:1178) > ==6484== by 0x80E631D: main_loop (main.c:1181) > ==6484== by 0x80E5E6D: main (main.c:940) > > 3018 void > 3019 cmdline_paste_str(s, literally) > 3020 char_u *s; > 3021 int literally; > 3022 { > 3023 int c, cv; > 3024 > 3025 if (literally) > 3026 put_on_cmdline(s, -1, TRUE); > 3027 else > !!! 3028 while (*s != NUL) > 3029 { > 3030 cv = *s; > 3031 if (cv == Ctrl_V && s[1]) > 3032 ++s; > 3033 #ifdef FEAT_MBYTE > 3034 if (has_mbyte) > 3035 { > !!! 3036 c = mb_ptr2char(s); > !!! 3037 s += mb_char2len(c); > 3038 } > 3039 else > 3040 #endif > > Call to mb_ptr2char(s) at line 3036 consumes only 1 byte > because s is an invalid utf-8 sequence and returns the > first byte which is 0xfc. Calls to mb_char2len(0xfc) > at next line increments s by 2 bytes instead of 1 (!!) > because it takes 2 bytes to store character u+00fc in > utf-8 (which is inconsistent with the 1 byte consumed > by previous line). > > I think calling 'c = mb_cptr2char_adv(&s);' to increment s > fixes the problem as in attached patch but please review it. > > -- Dominique > > > > --~--~---------~--~----~------------~-------~--~----~ You received this message from the "vim_dev" maillist. For more information, visit http://www.vim.org/maillist.php -~----------~----~----~----~------~----~------~--~---