Here is another bug caused by an invalid UTF-8 sequence:
==21329== Conditional jump or move depends on uninitialised value(s)
==21329== at 0x811C4C7: utf_ptr2char (mbyte.c:1390)
==21329== by 0x811C83F: utf_composinglike (mbyte.c:1498)
==21329== by 0x811CD96: utfc_ptr2len_len (mbyte.c:1756)
==21329== by 0x80FEAF1: msg_outtrans_len_attr (message.c:1355)
==21329== by 0x80FE9D0: msg_outtrans_len (message.c:1292)
==21329== by 0x80B9D7F: draw_cmdline (ex_getln.c:2641)
==21329== by 0x80BAA99: redrawcmd (ex_getln.c:3126)
==21329== by 0x80BAF82: nextwild (ex_getln.c:3317)
==21329== by 0x80B6D3B: getcmdline (ex_getln.c:803)
==21329== by 0x80B912E: getexline (ex_getln.c:2101)
==21329== by 0x80A33FC: do_cmdline (ex_docmd.c:991)
==21329== by 0x81280DD: nv_colon (normal.c:5181)
==21329== by 0x8121772: normal_cmd (normal.c:1152)
==21329== by 0x80E4E8A: main_loop (main.c:1177)
==21329== by 0x80E49DA: main (main.c:936)
==21434== Conditional jump or move depends on uninitialised value(s)
==21434== at 0x811C4C7: utf_ptr2char (mbyte.c:1390)
==21434== by 0x811C28C: utf_ptr2cells (mbyte.c:1268)
==21434== by 0x805D084: ptr2cells (charset.c:746)
==21434== by 0x80FEC65: msg_outtrans_len_attr (message.c:1394)
==21434== by 0x80FE9D0: msg_outtrans_len (message.c:1292)
==21434== by 0x80B9D7F: draw_cmdline (ex_getln.c:2641)
==21434== by 0x80BAA99: redrawcmd (ex_getln.c:3126)
==21434== by 0x80BAF82: nextwild (ex_getln.c:3317)
==21434== by 0x80B6D3B: getcmdline (ex_getln.c:803)
==21434== by 0x80B912E: getexline (ex_getln.c:2101)
==21434== by 0x80A33FC: do_cmdline (ex_docmd.c:991)
==21434== by 0x81280DD: nv_colon (normal.c:5181)
==21434== by 0x8121772: normal_cmd (normal.c:1152)
==21434== by 0x80E4E8A: main_loop (main.c:1177)
==21434== by 0x80E49DA: main (main.c:936)
It's easy to reproduce:
1/ Create a file whose name is an invalid UTF-8 sequence:
$ mkdir testcase
$ touch testcase/$(perl -e 'print chr(0xfb)')
2/ Start Vim under Valgrind in 'nocompatible' mode:
$ valgrind vim -u NONE -N 2> vg.log
3/ On the command line, enter:
:e testcase/<TAB>
When you press <TAB> (file-name completion), Vim displays the file
name containing the invalid UTF-8 sequence, 'testcase/<fb>', and
vg.log shows that Valgrind reports the errors above.
The two error messages are actually two distinct bugs.
- First error: in mbyte.c:1756, calling
UTF_COMPOSINGLIKE(p + prevlen, p + len) is unsafe, since it
can read bytes beyond size. In theory, there is therefore a small
risk that Vim combines characters when it should not.
The attached patch "fix1-invalid-utf8-seq.patch" fixes this first
error: it calls utf_ptr2len_len(p + len, size - len) before
UTF_COMPOSINGLIKE(...) and stops when the next character would
extend beyond size (utf_ptr2len_len() returns a value greater than
size for an incomplete sequence), so that UTF_COMPOSINGLIKE(...)
cannot read beyond size.
- Second error: in message.c:1394, calling ptr2cells(str) is
unsafe, since it can read bytes of str beyond len (which are
uninitialized). At line 1394, we know that the character is
only 1 byte long (mb_l is 1), so it should call char2cells(*str).
mb_l is 1 here because the UTF-8 sequence is incomplete, so
ptr2cells(str) reads past the end of the string.
The attached patch "fix2-invalid-utf8-seq.patch" fixes it.
After fixing these two bugs, Valgrind no longer reports errors
with the above test case.
However, there is a third bug, which can be triggered with a slightly
different test case using a file named <ff> instead of <fb>. The
difference is that <ff> is an invalid UTF-8 lead byte, whereas <fb>
starts an incomplete UTF-8 sequence:
1/ Create a second file with an invalid UTF-8 name, next to the
<fb> file from the first test case:
$ mkdir testcase
$ touch testcase/$(perl -e 'print chr(0xff)')
2/ Start Vim under Valgrind in 'nocompatible' mode:
$ valgrind vim -u NONE -N 2> vg.log
3/ On the command line, enter:
:e testcase/<TAB><TAB>
On the first <TAB> (file-name completion), Vim displays the file
name with the invalid UTF-8 sequence, ':e testcase/<fb>' (with no
Valgrind errors once the above patches are applied). On the second
<TAB>, Vim displays the next file, ':e testcase/<ff>', and Valgrind
reports the following error:
==22527== Conditional jump or move depends on uninitialised value(s)
==22527== at 0x811C4CB: utf_ptr2char (mbyte.c:1390)
==22527== by 0x811C843: utf_composinglike (mbyte.c:1498)
==22527== by 0x811CDC8: utfc_ptr2len_len (mbyte.c:1769)
==22527== by 0x80FEAF1: msg_outtrans_len_attr (message.c:1355)
==22527== by 0x80FE9D0: msg_outtrans_len (message.c:1292)
==22527== by 0x80B9D7F: draw_cmdline (ex_getln.c:2641)
==22527== by 0x80BAA99: redrawcmd (ex_getln.c:3126)
==22527== by 0x80BAF82: nextwild (ex_getln.c:3317)
==22527== by 0x80B6CC6: getcmdline (ex_getln.c:790)
==22527== by 0x80B912E: getexline (ex_getln.c:2101)
==22527== by 0x80A33FC: do_cmdline (ex_docmd.c:991)
==22527== by 0x81280F5: nv_colon (normal.c:5181)
==22527== by 0x812178A: normal_cmd (normal.c:1152)
==22527== by 0x80E4E8A: main_loop (main.c:1177)
==22527== by 0x80E49DA: main (main.c:936)
This bug happens because utf_ptr2char(p) does not check whether
the first byte of the UTF-8 sequence is legal. When p[0] >= 0x80
and utf8len_tab[p[0]] is 1, the first byte is illegal, and
utf_ptr2char(p) should return p[0] right away, treating it as a
single byte, without checking p[1].
The attached patch "fix3-invalid-utf8-seq.patch" fixes this third bug.
With all three fixes applied, I no longer observe any errors.
I hope reports about invalid UTF-8 sequences are not too nitpicky.
I'm not aware of any other such errors at the moment, so hopefully
there will be no more patches like this.
I'm using Vim 7.2a.8 BETA (huge), with UTF-8 encoding, on Linux x86.
-- Dominique
--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_dev" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---
Index: mbyte.c
===================================================================
RCS file: /cvsroot/vim/vim7/src/mbyte.c,v
retrieving revision 1.65
diff -c -r1.65 mbyte.c
*** mbyte.c 24 Jun 2008 21:15:45 -0000 1.65
--- mbyte.c 29 Jun 2008 11:12:05 -0000
***************
*** 1753,1766 ****
#endif
while (len < size)
{
! if (p[len] < 0x80 || !UTF_COMPOSINGLIKE(p + prevlen, p + len))
break;
/* Skip over composing char */
#ifdef FEAT_ARABIC
prevlen = len;
#endif
! len += utf_ptr2len_len(p + len, size - len);
}
return len;
}
--- 1753,1779 ----
#endif
while (len < size)
{
! int len_next_char;
!
! if (p[len] < 0x80)
! break;
!
! /*
! * Next character length should not go beyond size to ensure that
! * UTF_COMPOSINGLIKE(...) does not read beyond size.
! */
! len_next_char = utf_ptr2len_len(p + len, size - len);
! if (len_next_char > size - len)
! break;
!
! if (!UTF_COMPOSINGLIKE(p + prevlen, p + len))
break;
/* Skip over composing char */
#ifdef FEAT_ARABIC
prevlen = len;
#endif
! len += len_next_char;
}
return len;
}
Index: message.c
===================================================================
RCS file: /cvsroot/vim/vim7/src/message.c,v
retrieving revision 1.62
diff -c -r1.62 message.c
*** message.c 28 Jun 2008 14:09:47 -0000 1.62
--- message.c 29 Jun 2008 11:12:47 -0000
***************
*** 1391,1397 ****
plain_start = str + 1;
msg_puts_attr(s, attr == 0 ? hl_attr(HLF_8) : attr);
}
! retval += ptr2cells(str);
++str;
}
}
--- 1391,1397 ----
plain_start = str + 1;
msg_puts_attr(s, attr == 0 ? hl_attr(HLF_8) : attr);
}
! retval += char2cells(*str);
++str;
}
}
Index: mbyte.c
===================================================================
RCS file: /cvsroot/vim/vim7/src/mbyte.c,v
retrieving revision 1.65
diff -c -r1.65 mbyte.c
*** mbyte.c 24 Jun 2008 21:15:45 -0000 1.65
--- mbyte.c 29 Jun 2008 11:18:02 -0000
***************
*** 1387,1393 ****
return p[0];
len = utf8len_tab[p[0]];
! if ((p[1] & 0xc0) == 0x80)
{
if (len == 2)
return ((p[0] & 0x1f) << 6) + (p[1] & 0x3f);
--- 1387,1396 ----
return p[0];
len = utf8len_tab[p[0]];
!
! /* Do not check p[1] if p[0] is an illegal first byte in utf-8 sequence
! * i.e. if len is 1 */
! if (len > 1 && (p[1] & 0xc0) == 0x80)
{
if (len == 2)
return ((p[0] & 0x1f) << 6) + (p[1] & 0x3f);