On Sun, Jun 29, 2008 at 11:43 PM, Dominique Pelle wrote: > On Sun, Jun 29, 2008 at 4:10 PM, Bram Moolenaar wrote: > >> Dominique Pelle wrote: >> >>> Here is another bug because of invalid utf-8 sequence: >>> >>> ==21329== Conditional jump or move depends on uninitialised value(s) >>> ==21329== at 0x811C4C7: utf_ptr2char (mbyte.c:1390) >>> ==21329== by 0x811C83F: utf_composinglike (mbyte.c:1498) >>> ==21329== by 0x811CD96: utfc_ptr2len_len (mbyte.c:1756) >>> ==21329== by 0x80FEAF1: msg_outtrans_len_attr (message.c:1355) >>> ==21329== by 0x80FE9D0: msg_outtrans_len (message.c:1292) >>> ==21329== by 0x80B9D7F: draw_cmdline (ex_getln.c:2641) >>> ==21329== by 0x80BAA99: redrawcmd (ex_getln.c:3126) >>> ==21329== by 0x80BAF82: nextwild (ex_getln.c:3317) >>> ==21329== by 0x80B6D3B: getcmdline (ex_getln.c:803) >>> ==21329== by 0x80B912E: getexline (ex_getln.c:2101) >>> ==21329== by 0x80A33FC: do_cmdline (ex_docmd.c:991) >>> ==21329== by 0x81280DD: nv_colon (normal.c:5181) >>> ==21329== by 0x8121772: normal_cmd (normal.c:1152) >>> ==21329== by 0x80E4E8A: main_loop (main.c:1177) >>> ==21329== by 0x80E49DA: main (main.c:936) >>> >>> ==21434== Conditional jump or move depends on uninitialised value(s) >>> ==21434== at 0x811C4C7: utf_ptr2char (mbyte.c:1390) >>> ==21434== by 0x811C28C: utf_ptr2cells (mbyte.c:1268) >>> ==21434== by 0x805D084: ptr2cells (charset.c:746) >>> ==21434== by 0x80FEC65: msg_outtrans_len_attr (message.c:1394) >>> ==21434== by 0x80FE9D0: msg_outtrans_len (message.c:1292) >>> ==21434== by 0x80B9D7F: draw_cmdline (ex_getln.c:2641) >>> ==21434== by 0x80BAA99: redrawcmd (ex_getln.c:3126) >>> ==21434== by 0x80BAF82: nextwild (ex_getln.c:3317) >>> ==21434== by 0x80B6D3B: getcmdline (ex_getln.c:803) >>> ==21434== by 0x80B912E: getexline (ex_getln.c:2101) >>> ==21434== by 0x80A33FC: do_cmdline (ex_docmd.c:991) >>> ==21434== by 0x81280DD: nv_colon (normal.c:5181) >>> ==21434== by 0x8121772: normal_cmd (normal.c:1152) >>> ==21434== by 0x80E4E8A: main_loop (main.c:1177) >>> ==21434== by 0x80E49DA: main (main.c:936) >>> >>> It's easy to reproduce: >>> >>> 1/ Create a file names with invalid utf-8 name: >>> >>> $ mkdir testcase >>> $ touch testcase/$(perl -e 'print chr(0xfb)') >>> >>> 2/ Start Vim in 'no compatible' mode with Valgrind: >>> >>> $ valgrind vim -u NONE -c -N 2> vg.log >>> >>> 4/ In ex mode, enter: >>> >>> :e testcase/<TAB> >>> >>> When pressing <TAB> key (file completion), vim should display >>> file name with invalid utf-8 sequence 'testcase/<fb>' >>> and observe in vg.log that valgrind reports above errors. >>> >>> The 2 error messages are actually 2 distinct bugs. >>> >>> - First error: in mbyte.c:1756, calling >>> UTF_COMPOSINGLIKE(p + prevlen, p + len) is unsafe, since it >>> can read bytes beyond size. So in theory, there is a small risk >>> that vim combines characters when it should not. >>> >>> Attached patch "fix1-invalid-utf8-seq.patch" fixes this first >>> error, by calling utf_ptr2len_len(p + len, size - len) before >>> UTF_COMPOSINGLIKE(...) to ensure that UTF_COMPOSINGLIKE(...) >>> can't access beyond size. >>> >>> - Second error: in message.c:1394, calling ptr2cells(str) is >>> unsafe since it can read bytes in str beyond len (which are >>> uninitialized). At line 1394, we know that the character is >>> only 1 byte long (mb_l is 1) so it should call char2cells(*str). >>> mb_l is 1 here because of incomplete utf-8 sequence so calling >>> ptr2cells(str) overflows. >>> >>> Attached patch "fix2-invalid-utf8-seq.patch" fixes it. >>> >>> After fixing these 2 bugs, valgrind no longer complains with >>> the above testcase. >>> >>> However, there is a 3rd bug which can be triggered with a slightly >>> different test case, using file <ff> instead of <fb>. The difference >>> is that <ff> is an invalid utf-8 sequence, whereas <fb> was an >>> incomplete utf-8 sequence: >>> >>> 1/ Create a file names with invalid utf-8 names: >>> >>> $ mkdir testcase >>> $ touch testcase/$(perl -e 'print chr(0xff)') >>> >>> 2/ Start Vim in 'no compatible' mode with Valgrind: >>> >>> $ valgrind vim -u NONE -c -N 2> vg.log >>> >>> 3/ In ex mode, enter: >>> >>> :e testcase/<TAB><TAB> >>> >>> When pressing <TAB> key (file completion), vim should display >>> file name with invalid utf-8 sequence 'e: testcase/<fb>' (without >>> valgrind errors after above patches) then when processing <TAB> >>> again, vim displays next file 'e: testcase/<ff>' but observe >>> that valgrind reports following error: >>> >>> ==22527== Conditional jump or move depends on uninitialised value(s) >>> ==22527== at 0x811C4CB: utf_ptr2char (mbyte.c:1390) >>> ==22527== by 0x811C843: utf_composinglike (mbyte.c:1498) >>> ==22527== by 0x811CDC8: utfc_ptr2len_len (mbyte.c:1769) >>> ==22527== by 0x80FEAF1: msg_outtrans_len_attr (message.c:1355) >>> ==22527== by 0x80FE9D0: msg_outtrans_len (message.c:1292) >>> ==22527== by 0x80B9D7F: draw_cmdline (ex_getln.c:2641) >>> ==22527== by 0x80BAA99: redrawcmd (ex_getln.c:3126) >>> ==22527== by 0x80BAF82: nextwild (ex_getln.c:3317) >>> ==22527== by 0x80B6CC6: getcmdline (ex_getln.c:790) >>> ==22527== by 0x80B912E: getexline (ex_getln.c:2101) >>> ==22527== by 0x80A33FC: do_cmdline (ex_docmd.c:991) >>> ==22527== by 0x81280F5: nv_colon (normal.c:5181) >>> ==22527== by 0x812178A: normal_cmd (normal.c:1152) >>> ==22527== by 0x80E4E8A: main_loop (main.c:1177) >>> ==22527== by 0x80E49DA: main (main.c:936) >>> >>> This bug happens because utf_ptr2char(p) does not check >>> whether first byte of utf-8 sequence is legal. When p[0] >= 0x80 >>> and utf8len_tab[p[0]] is 1, then the first byte is illegal, and >>> utf_ptr2char(p) should return 1 immediately, without checking p[1]. >>> >>> Attached patch fix3-invalid-utf8-seq.patch fixes this 3rd bug. >>> >>> After those 3 fixes, I no longer observe errors. >> >> All three look like good fixes to me. I'll include them. >> >>> I hope errors with invalid utf-8 sequences are not too nitpicking. >>> I'm not aware of any other such errors at the moment. So hopefully >>> no more patches like this. >> >> Depending on the system reading uninitialised memory may cause a crash. >> And when mixing up latin1 with utf-8 it's easy to get illegal byte >> sequences. So it definitely makes sense to fix all these little >> problems. I'm very glad you find them. > > > > Hmmm, just re-looking at my patches again, I think patch #2 may > not be quite right. It was this patch: > > Index: message.c > =================================================================== > RCS file: /cvsroot/vim/vim7/src/message.c,v > retrieving revision 1.62 > diff -c -r1.62 message.c > *** message.c 28 Jun 2008 14:09:47 -0000 1.62 > --- message.c 29 Jun 2008 11:12:47 -0000 > *************** > *** 1391,1397 **** > plain_start = str + 1; > msg_puts_attr(s, attr == 0 ? hl_attr(HLF_8) : attr); > } > ! retval += ptr2cells(str); > ++str; > } > } > --- 1391,1397 ---- > plain_start = str + 1; > msg_puts_attr(s, attr == 0 ? hl_attr(HLF_8) : attr); > } > ! retval += char2cells(*str); > ++str; > } > } > > > It certainly avoids reading uninitialized memory, > but if *str character is an invalid utf-8 (say <ff> > for example) then shouldn't retval be incremented > by the number of cells 4 because it's an invalid > utf-8 sequence? With patch, retval gets incremented > by char2cells(0xff) which is 1, as if it was a legal > latin1 character (latin small letter y with diaeresis > has code 0xff). > > I don't observe anything wrong when using Vim though > but that does not mean it's right. I will try to fix this > tomorrow when I have time. > > -- Dominique
Ah, I do observe something wrong because of of the bug in patch #2 that I described in my previous comment. With: set nocompatible set wildmode=longest,list set wildmenu When file names contain invalid utf-8 sequences, I see file names incorrectly aligned on screen when doing file completion in ex mode, because of incorrect cell count. See screenshot below: http://dominique.pelle.free.fr/pic/bug-cells.png I will try to fix it tonight when I have time. -- Dominique --~--~---------~--~----~------------~-------~--~----~ You received this message from the "vim_dev" maillist. For more information, visit http://www.vim.org/maillist.php -~----------~----~----~----~------~----~------~--~---
