Bug#1:
Valgrind memory checker detects a bug when using utf-8 locale
and when loading a file which contains only 1 byte (0xfc).
Bug is easy to reproduce:
1/ make sure locale is utf-8
2/ create a file with only 1 byte 0xfc:
$ perl -e 'print chr(0xfc)' > foo.txt
Note that this is clearly not a valid utf-8 file.
3/ open the file with vim and valgrind:
$ valgrind vim -u NONE foo.txt 2> vg.log
4/ notice that valgrind complains immediately as file is being read:
==15575== Conditional jump or move depends on uninitialised value(s)
==15575== at 0x811F6FD: utf_head_off (mbyte.c:2494)
==15575== by 0x80C32B0: readfile (fileio.c:1880)
==15575== by 0x80527AE: open_buffer (buffer.c:130)
==15575== by 0x80E7F82: create_windows (main.c:2459)
==15575== by 0x80E5C84: main (main.c:799)
5/ notice as well that vim has set file encoding to utf-8 even
though it is not a valid utf-8 file.
I would expect fenc to be set automatically to latin1,
and not utf-8.
':set fenc?' shows: fileencoding=utf-8
':set fileencodings?" shows: fileencodings=ucs-bom,utf-8,latin1
Looking at the code, I see that function utf_head_off(base, p)
reads 1 to n bytes ahead p at line 2494:
2474 int
2475 utf_head_off(base, p)
2476 char_u *base;
2477 char_u *p;
2478 {
2479 char_u *q;
2480 char_u *s;
2481 int c;
2482 #ifdef FEAT_ARABIC
2483 char_u *j;
2484 #endif
2485
2486 if (*p < 0x80) /* be quick for ASCII */
2487 return 0;
2488
2489 /* Skip backwards over trailing bytes: 10xx.xxxx
2490 * Skip backwards again if on a composing char. */
2491 for (q = p; ; --q)
2492 {
2493 /* Move s to the last byte of this char. */
!!! 2494 for (s = q; (s[1] & 0xc0) == 0x80; ++s)
2495 ;
2496 /* Move q to the first byte of this char. */
2497 while (q > base && (*q & 0xc0) == 0x80)
2498 --q;
Line 2494 is accessing s[1] (i.e. 'p[1]'). But only
p[0] is initialized since readfile() only read one byte
from file, so p[1] and beyond are uninitialized.
Loop at line 2494 can in theory access many other bytes
beyond the end of the buffer since exit condition
of loop depends on uninitialized memory.
I think adding a NUL after what is read from file fixes
this bug. See attached patch but please review it.
However, it does not fix the fact that fenc is set to
utf-8 even though it is an invalid utf-8 file.
Bug #2:
Using the same input file, I can see another bug:
1/ yank the first line: Y
This copies the invalid utf-8 into the unnamed "" register
2/ copy the unnamed "" register onto the Ex line :<CTRL-R>"
3/ observe the following error when copying "" register:
==6484== at 0x80BAEBD: cmdline_paste_str (ex_getln.c:3028)
==6484== by 0x8132B12: cmdline_paste_reg (ops.c:1528)
==6484== by 0x80BADD2: cmdline_paste (ex_getln.c:3009)
==6484== by 0x80B8029: getcmdline (ex_getln.c:1131)
==6484== by 0x80B96C7: getexline (ex_getln.c:2100)
==6484== by 0x80A31FA: do_cmdline (ex_docmd.c:995)
==6484== by 0x8129F9A: nv_colon (normal.c:5179)
==6484== by 0x8123556: normal_cmd (normal.c:1152)
==6484== by 0x80E631D: main_loop (main.c:1181)
==6484== by 0x80E5E6D: main (main.c:940)
==6484== Address 0x51519C2 is 0 bytes after a block of size 2 alloc'd
==6484== at 0x4022765: malloc (vg_replace_malloc.c:149)
==6484== by 0x8113B6C: lalloc (misc2.c:862)
==6484== by 0x8113A8E: alloc (misc2.c:761)
==6484== by 0x8113EFB: vim_strsave (misc2.c:1166)
==6484== by 0x81355CD: op_yank (ops.c:2901)
==6484== by 0x8124AC6: do_pending_operator (normal.c:1890)
==6484== by 0x81235D1: normal_cmd (normal.c:1178)
==6484== by 0x80E631D: main_loop (main.c:1181)
==6484== by 0x80E5E6D: main (main.c:940)
3018 void
3019 cmdline_paste_str(s, literally)
3020 char_u *s;
3021 int literally;
3022 {
3023 int c, cv;
3024
3025 if (literally)
3026 put_on_cmdline(s, -1, TRUE);
3027 else
!!! 3028 while (*s != NUL)
3029 {
3030 cv = *s;
3031 if (cv == Ctrl_V && s[1])
3032 ++s;
3033 #ifdef FEAT_MBYTE
3034 if (has_mbyte)
3035 {
!!! 3036 c = mb_ptr2char(s);
!!! 3037 s += mb_char2len(c);
3038 }
3039 else
3040 #endif
Call to mb_ptr2char(s) at line 3036 consumes only 1 byte
because s is an invalid utf-8 sequence and returns the
first byte which is 0xfc. Calls to mb_char2len(0xfc)
at next line increments s by 2 bytes instead of 1 (!!)
because it takes 2 bytes to store character u+00fc in
utf-8 (which is inconsistent with the 1 byte consumed
by previous line).
I think calling 'c = mb_cptr2char_adv(&s);' to increment s
fixes the problem as in attached patch but please review it.
-- Dominique
--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_dev" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---
Index: fileio.c
===================================================================
RCS file: /cvsroot/vim/vim7/src/fileio.c,v
retrieving revision 1.114
diff -c -r1.114 fileio.c
*** fileio.c 11 Mar 2008 21:01:41 -0000 1.114
--- fileio.c 6 Apr 2008 18:14:14 -0000
***************
*** 1279,1284 ****
--- 1279,1291 ----
* Read bytes from the file.
*/
size = vim_read(fd, ptr, size);
+
+ #ifdef FEAT_MBYTE
+ /* null terminate what has been read or else call to
+ * utf_head_off() may access uninitialized bytes
+ * beyond size. */
+ ptr[size] = NUL;
+ #endif
}
if (size <= 0)
***************
*** 2424,2430 ****
/*
* Work around a weird problem: When a file has two links (only
* possible on NTFS) and we write through one link, then stat() it
! * throught the other link, the timestamp information may be wrong.
* It's correct again after reading the file, thus reset the timestamp
* here.
*/
--- 2431,2437 ----
/*
* Work around a weird problem: When a file has two links (only
* possible on NTFS) and we write through one link, then stat() it
! * through the other link, the timestamp information may be wrong.
* It's correct again after reading the file, thus reset the timestamp
* here.
*/
***************
*** 5908,5914 ****
#endif
/*
! * Append the extention.
* ext can start with '.' and cannot exceed 3 more characters.
*/
STRCPY(s, ext);
--- 5915,5921 ----
#endif
/*
! * Append the extension.
* ext can start with '.' and cannot exceed 3 more characters.
*/
STRCPY(s, ext);
Index: ex_getln.c
===================================================================
RCS file: /cvsroot/vim/vim7/src/ex_getln.c,v
retrieving revision 1.82
diff -c -r1.82 ex_getln.c
*** ex_getln.c 22 Jan 2008 11:44:39 -0000 1.82
--- ex_getln.c 6 Apr 2008 18:12:07 -0000
***************
*** 3032,3041 ****
++s;
#ifdef FEAT_MBYTE
if (has_mbyte)
! {
! c = mb_ptr2char(s);
! s += mb_char2len(c);
! }
else
#endif
c = *s++;
--- 3032,3038 ----
++s;
#ifdef FEAT_MBYTE
if (has_mbyte)
! c = mb_cptr2char_adv(&s);
else
#endif
c = *s++;
***************
*** 3335,3341 ****
/*
* Do wildcard expansion on the string 'str'.
* Chars that should not be expanded must be preceded with a backslash.
! * Return a pointer to alloced memory containing the new string.
* Return NULL for failure.
*
* "orig" is the originally expanded string, copied to allocated memory. It
--- 3332,3338 ----
/*
* Do wildcard expansion on the string 'str'.
* Chars that should not be expanded must be preceded with a backslash.
! * Return a pointer to allocated memory containing the new string.
* Return NULL for failure.
*
* "orig" is the originally expanded string, copied to allocated memory. It
***************
*** 6099,6105 ****
exmode_active = save_exmode;
! /* Safety check: The old window or buffer was deleted: It's a a bug when
* this happens! */
if (!win_valid(old_curwin) || !buf_valid(old_curbuf))
{
--- 6096,6102 ----
exmode_active = save_exmode;
! /* Safety check: The old window or buffer was deleted: It's a bug when
* this happens! */
if (!win_valid(old_curwin) || !buf_valid(old_curbuf))
{