Bug#1: Valgrind memory checker detects a bug when using utf-8 locale and when loading a file which contains only 1 byte (0xfc).
Bug is easy to reproduce: 1/ make sure locale is utf-8 2/ create a file with only 1 byte 0xfc: $ perl -e 'print chr(0xfc)' > foo.txt Note that this is clearly not a valid utf-8 file. 3/ open the file with vim and valgrind: $ valgrind vim -u NONE foo.txt 2> vg.log 4/ notice that valgrind complains immediately as file is being read: ==15575== Conditional jump or move depends on uninitialised value(s) ==15575== at 0x811F6FD: utf_head_off (mbyte.c:2494) ==15575== by 0x80C32B0: readfile (fileio.c:1880) ==15575== by 0x80527AE: open_buffer (buffer.c:130) ==15575== by 0x80E7F82: create_windows (main.c:2459) ==15575== by 0x80E5C84: main (main.c:799) 5/ notice as well that vim has set file encoding to utf-8 even though it is not a valid utf-8 file. I would expect fenc to be set automatically to latin1, and not utf-8. ':set fenc?' shows: fileencoding=utf-8 ':set fileencodings?" shows: fileencodings=ucs-bom,utf-8,latin1 Looking at the code, I see that function utf_head_off(base, p) reads 1 to n bytes ahead p at line 2494: 2474 int 2475 utf_head_off(base, p) 2476 char_u *base; 2477 char_u *p; 2478 { 2479 char_u *q; 2480 char_u *s; 2481 int c; 2482 #ifdef FEAT_ARABIC 2483 char_u *j; 2484 #endif 2485 2486 if (*p < 0x80) /* be quick for ASCII */ 2487 return 0; 2488 2489 /* Skip backwards over trailing bytes: 10xx.xxxx 2490 * Skip backwards again if on a composing char. */ 2491 for (q = p; ; --q) 2492 { 2493 /* Move s to the last byte of this char. */ !!! 2494 for (s = q; (s[1] & 0xc0) == 0x80; ++s) 2495 ; 2496 /* Move q to the first byte of this char. */ 2497 while (q > base && (*q & 0xc0) == 0x80) 2498 --q; Line 2494 is accessing s[1] (i.e. 'p[1]'). But only p[0] is initialized since readfile() only read one byte from file, so p[1] and beyond are uninitialized. Loop at line 2494 can in theory access many other bytes beyond the end of the buffer since exit condition of loop depends on uninitialized memory. I think adding a NUL after what is read from file fixes this bug. See attached patch but please review it. However, it does not fix the fact that fenc is set to utf-8 even though it is an invalid utf-8 file. Bug #2: Using the same input file, I can see another bug: 1/ yank the first line: Y This copies the invalid utf-8 into the unnamed "" register 2/ copy the unnamed "" register onto the Ex line :<CTRL-R>" 3/ observe the following error when copying "" register: ==6484== at 0x80BAEBD: cmdline_paste_str (ex_getln.c:3028) ==6484== by 0x8132B12: cmdline_paste_reg (ops.c:1528) ==6484== by 0x80BADD2: cmdline_paste (ex_getln.c:3009) ==6484== by 0x80B8029: getcmdline (ex_getln.c:1131) ==6484== by 0x80B96C7: getexline (ex_getln.c:2100) ==6484== by 0x80A31FA: do_cmdline (ex_docmd.c:995) ==6484== by 0x8129F9A: nv_colon (normal.c:5179) ==6484== by 0x8123556: normal_cmd (normal.c:1152) ==6484== by 0x80E631D: main_loop (main.c:1181) ==6484== by 0x80E5E6D: main (main.c:940) ==6484== Address 0x51519C2 is 0 bytes after a block of size 2 alloc'd ==6484== at 0x4022765: malloc (vg_replace_malloc.c:149) ==6484== by 0x8113B6C: lalloc (misc2.c:862) ==6484== by 0x8113A8E: alloc (misc2.c:761) ==6484== by 0x8113EFB: vim_strsave (misc2.c:1166) ==6484== by 0x81355CD: op_yank (ops.c:2901) ==6484== by 0x8124AC6: do_pending_operator (normal.c:1890) ==6484== by 0x81235D1: normal_cmd (normal.c:1178) ==6484== by 0x80E631D: main_loop (main.c:1181) ==6484== by 0x80E5E6D: main (main.c:940) 3018 void 3019 cmdline_paste_str(s, literally) 3020 char_u *s; 3021 int literally; 3022 { 3023 int c, cv; 3024 3025 if (literally) 3026 put_on_cmdline(s, -1, TRUE); 3027 else !!! 3028 while (*s != NUL) 3029 { 3030 cv = *s; 3031 if (cv == Ctrl_V && s[1]) 3032 ++s; 3033 #ifdef FEAT_MBYTE 3034 if (has_mbyte) 3035 { !!! 3036 c = mb_ptr2char(s); !!! 3037 s += mb_char2len(c); 3038 } 3039 else 3040 #endif Call to mb_ptr2char(s) at line 3036 consumes only 1 byte because s is an invalid utf-8 sequence and returns the first byte which is 0xfc. Calls to mb_char2len(0xfc) at next line increments s by 2 bytes instead of 1 (!!) because it takes 2 bytes to store character u+00fc in utf-8 (which is inconsistent with the 1 byte consumed by previous line). I think calling 'c = mb_cptr2char_adv(&s);' to increment s fixes the problem as in attached patch but please review it. -- Dominique --~--~---------~--~----~------------~-------~--~----~ You received this message from the "vim_dev" maillist. For more information, visit http://www.vim.org/maillist.php -~----------~----~----~----~------~----~------~--~---
Index: fileio.c =================================================================== RCS file: /cvsroot/vim/vim7/src/fileio.c,v retrieving revision 1.114 diff -c -r1.114 fileio.c *** fileio.c 11 Mar 2008 21:01:41 -0000 1.114 --- fileio.c 6 Apr 2008 18:14:14 -0000 *************** *** 1279,1284 **** --- 1279,1291 ---- * Read bytes from the file. */ size = vim_read(fd, ptr, size); + + #ifdef FEAT_MBYTE + /* null terminate what has been read or else call to + * utf_head_off() may access uninitialized bytes + * beyond size. */ + ptr[size] = NUL; + #endif } if (size <= 0) *************** *** 2424,2430 **** /* * Work around a weird problem: When a file has two links (only * possible on NTFS) and we write through one link, then stat() it ! * throught the other link, the timestamp information may be wrong. * It's correct again after reading the file, thus reset the timestamp * here. */ --- 2431,2437 ---- /* * Work around a weird problem: When a file has two links (only * possible on NTFS) and we write through one link, then stat() it ! * through the other link, the timestamp information may be wrong. * It's correct again after reading the file, thus reset the timestamp * here. */ *************** *** 5908,5914 **** #endif /* ! * Append the extention. * ext can start with '.' and cannot exceed 3 more characters. */ STRCPY(s, ext); --- 5915,5921 ---- #endif /* ! * Append the extension. * ext can start with '.' and cannot exceed 3 more characters. */ STRCPY(s, ext);
Index: ex_getln.c =================================================================== RCS file: /cvsroot/vim/vim7/src/ex_getln.c,v retrieving revision 1.82 diff -c -r1.82 ex_getln.c *** ex_getln.c 22 Jan 2008 11:44:39 -0000 1.82 --- ex_getln.c 6 Apr 2008 18:12:07 -0000 *************** *** 3032,3041 **** ++s; #ifdef FEAT_MBYTE if (has_mbyte) ! { ! c = mb_ptr2char(s); ! s += mb_char2len(c); ! } else #endif c = *s++; --- 3032,3038 ---- ++s; #ifdef FEAT_MBYTE if (has_mbyte) ! c = mb_cptr2char_adv(&s); else #endif c = *s++; *************** *** 3335,3341 **** /* * Do wildcard expansion on the string 'str'. * Chars that should not be expanded must be preceded with a backslash. ! * Return a pointer to alloced memory containing the new string. * Return NULL for failure. * * "orig" is the originally expanded string, copied to allocated memory. It --- 3332,3338 ---- /* * Do wildcard expansion on the string 'str'. * Chars that should not be expanded must be preceded with a backslash. ! * Return a pointer to allocated memory containing the new string. * Return NULL for failure. * * "orig" is the originally expanded string, copied to allocated memory. It *************** *** 6099,6105 **** exmode_active = save_exmode; ! /* Safety check: The old window or buffer was deleted: It's a a bug when * this happens! */ if (!win_valid(old_curwin) || !buf_valid(old_curbuf)) { --- 6096,6102 ---- exmode_active = save_exmode; ! /* Safety check: The old window or buffer was deleted: It's a bug when * this happens! */ if (!win_valid(old_curwin) || !buf_valid(old_curbuf)) {