Dominique Pelle wrote:

> Here is another bug because of invalid utf-8 sequence:
> 
> ==21329== Conditional jump or move depends on uninitialised value(s)
> ==21329==    at 0x811C4C7: utf_ptr2char (mbyte.c:1390)
> ==21329==    by 0x811C83F: utf_composinglike (mbyte.c:1498)
> ==21329==    by 0x811CD96: utfc_ptr2len_len (mbyte.c:1756)
> ==21329==    by 0x80FEAF1: msg_outtrans_len_attr (message.c:1355)
> ==21329==    by 0x80FE9D0: msg_outtrans_len (message.c:1292)
> ==21329==    by 0x80B9D7F: draw_cmdline (ex_getln.c:2641)
> ==21329==    by 0x80BAA99: redrawcmd (ex_getln.c:3126)
> ==21329==    by 0x80BAF82: nextwild (ex_getln.c:3317)
> ==21329==    by 0x80B6D3B: getcmdline (ex_getln.c:803)
> ==21329==    by 0x80B912E: getexline (ex_getln.c:2101)
> ==21329==    by 0x80A33FC: do_cmdline (ex_docmd.c:991)
> ==21329==    by 0x81280DD: nv_colon (normal.c:5181)
> ==21329==    by 0x8121772: normal_cmd (normal.c:1152)
> ==21329==    by 0x80E4E8A: main_loop (main.c:1177)
> ==21329==    by 0x80E49DA: main (main.c:936)
> 
> ==21434== Conditional jump or move depends on uninitialised value(s)
> ==21434==    at 0x811C4C7: utf_ptr2char (mbyte.c:1390)
> ==21434==    by 0x811C28C: utf_ptr2cells (mbyte.c:1268)
> ==21434==    by 0x805D084: ptr2cells (charset.c:746)
> ==21434==    by 0x80FEC65: msg_outtrans_len_attr (message.c:1394)
> ==21434==    by 0x80FE9D0: msg_outtrans_len (message.c:1292)
> ==21434==    by 0x80B9D7F: draw_cmdline (ex_getln.c:2641)
> ==21434==    by 0x80BAA99: redrawcmd (ex_getln.c:3126)
> ==21434==    by 0x80BAF82: nextwild (ex_getln.c:3317)
> ==21434==    by 0x80B6D3B: getcmdline (ex_getln.c:803)
> ==21434==    by 0x80B912E: getexline (ex_getln.c:2101)
> ==21434==    by 0x80A33FC: do_cmdline (ex_docmd.c:991)
> ==21434==    by 0x81280DD: nv_colon (normal.c:5181)
> ==21434==    by 0x8121772: normal_cmd (normal.c:1152)
> ==21434==    by 0x80E4E8A: main_loop (main.c:1177)
> ==21434==    by 0x80E49DA: main (main.c:936)
> 
> It's easy to reproduce:
> 
> 1/ Create a file names with invalid utf-8 name:
> 
>    $ mkdir testcase
>    $ touch testcase/$(perl -e 'print chr(0xfb)')
> 
> 2/ Start Vim in 'no compatible' mode with Valgrind:
> 
>    $ valgrind vim -u NONE -c -N 2> vg.log
> 
> 4/ In ex mode, enter:
> 
>    :e testcase/<TAB>
> 
>    When pressing <TAB> key (file completion), vim should display
>    file name with invalid utf-8 sequence 'testcase/<fb>'
>    and observe in vg.log that valgrind reports above errors.
> 
> The 2 error messages are actually 2 distinct bugs.
> 
> - First error: in mbyte.c:1756, calling
>   UTF_COMPOSINGLIKE(p + prevlen, p + len) is unsafe, since it
>   can read bytes beyond size.  So in theory, there is a small risk
>   that vim combines characters when it should not.
> 
>   Attached patch "fix1-invalid-utf8-seq.patch" fixes this first
>   error, by calling utf_ptr2len_len(p + len, size - len) before
>   UTF_COMPOSINGLIKE(...) to ensure that UTF_COMPOSINGLIKE(...)
>   can't access beyond size.
> 
> - Second error: in message.c:1394, calling ptr2cells(str) is
>   unsafe since it can read bytes in str beyond len (which are
>   uninitialized).  At line 1394, we know that the character is
>   only 1 byte long (mb_l is 1) so it should call char2cells(*str).
>   mb_l is 1 here because of incomplete utf-8 sequence so calling
>   ptr2cells(str) overflows.
> 
>   Attached patch "fix2-invalid-utf8-seq.patch" fixes it.
> 
> After fixing these 2 bugs, valgrind no longer complains with
> the above testcase.
> 
> However, there is a 3rd bug which can be triggered with a slightly
> different test case, using file <ff> instead of <fb>.  The difference
> is that <ff> is an invalid utf-8 sequence, whereas <fb> was an
> incomplete utf-8 sequence:
> 
> 1/ Create a file names with invalid utf-8 names:
> 
>    $ mkdir testcase
>    $ touch testcase/$(perl -e 'print chr(0xff)')
> 
> 2/ Start Vim in 'no compatible' mode with Valgrind:
> 
>    $ valgrind vim -u NONE -c -N 2> vg.log
> 
> 3/ In ex mode, enter:
> 
>    :e testcase/<TAB><TAB>
> 
>    When pressing <TAB> key (file completion), vim should display
>    file name with invalid utf-8 sequence 'e: testcase/<fb>' (without
>    valgrind errors after above patches) then when processing <TAB>
>    again, vim displays next file 'e: testcase/<ff>' but observe
>    that valgrind reports following error:
> 
> ==22527== Conditional jump or move depends on uninitialised value(s)
> ==22527==    at 0x811C4CB: utf_ptr2char (mbyte.c:1390)
> ==22527==    by 0x811C843: utf_composinglike (mbyte.c:1498)
> ==22527==    by 0x811CDC8: utfc_ptr2len_len (mbyte.c:1769)
> ==22527==    by 0x80FEAF1: msg_outtrans_len_attr (message.c:1355)
> ==22527==    by 0x80FE9D0: msg_outtrans_len (message.c:1292)
> ==22527==    by 0x80B9D7F: draw_cmdline (ex_getln.c:2641)
> ==22527==    by 0x80BAA99: redrawcmd (ex_getln.c:3126)
> ==22527==    by 0x80BAF82: nextwild (ex_getln.c:3317)
> ==22527==    by 0x80B6CC6: getcmdline (ex_getln.c:790)
> ==22527==    by 0x80B912E: getexline (ex_getln.c:2101)
> ==22527==    by 0x80A33FC: do_cmdline (ex_docmd.c:991)
> ==22527==    by 0x81280F5: nv_colon (normal.c:5181)
> ==22527==    by 0x812178A: normal_cmd (normal.c:1152)
> ==22527==    by 0x80E4E8A: main_loop (main.c:1177)
> ==22527==    by 0x80E49DA: main (main.c:936)
> 
> This bug happens because utf_ptr2char(p) does not check
> whether first byte of utf-8 sequence is legal.  When p[0] >= 0x80
> and utf8len_tab[p[0]] is 1, then the first byte is illegal, and
> utf_ptr2char(p) should return 1 immediately, without checking p[1].
> 
> Attached patch fix3-invalid-utf8-seq.patch fixes this 3rd bug.
> 
> After those 3 fixes, I no longer observe errors.

All three look like good fixes to me.  I'll include them.

> I hope errors with invalid utf-8 sequences are not too nitpicking.
> I'm not aware of any other such errors at the moment.  So hopefully
> no more patches like this.

Depending on the system reading uninitialised memory may cause a crash.
And when mixing up latin1 with utf-8 it's easy to get illegal byte
sequences.  So it definitely makes sense to fix all these little
problems.  I'm very glad you find them.

-- 
hundred-and-one symptoms of being an internet addict:
117. You are more comfortable typing in html.

 /// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\        download, build and distribute -- http://www.A-A-P.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///

--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_dev" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---

Raspunde prin e-mail lui