Re: (patches) fixed 2 bugs in multi-byte

Ben Schmidt Thu, 15 May 2008 10:03:39 -0700

Patch #1 shouldn't be necessary after a patch I have recently submitted to Bram 
is 
incorporated as it was a bug that utf_head_off was called at all in the case 
you 
found; your solution of adding a NUL works around one of the symptoms rather 
than 
fixing the real bug.


Haven't looked at #2 so will leave that to Bram/others to comment.

Ben.



Dominique Pelle wrote:
> Bug#1:
> 
> Valgrind memory checker detects a bug when using utf-8 locale
> and when loading a file which contains only 1 byte (0xfc).
> 
> Bug is easy to reproduce:
> 
> 1/ make sure locale is utf-8
> 
> 2/ create a file with only 1 byte 0xfc:
> 
>    $ perl -e 'print chr(0xfc)' > foo.txt
> 
>    Note that this is clearly not a valid utf-8 file.
> 
> 3/ open the file with vim and valgrind:
> 
>    $ valgrind vim -u NONE foo.txt 2> vg.log
> 
> 4/ notice that valgrind complains immediately as file is being read:
> 
> ==15575== Conditional jump or move depends on uninitialised value(s)
> ==15575==    at 0x811F6FD: utf_head_off (mbyte.c:2494)
> ==15575==    by 0x80C32B0: readfile (fileio.c:1880)
> ==15575==    by 0x80527AE: open_buffer (buffer.c:130)
> ==15575==    by 0x80E7F82: create_windows (main.c:2459)
> ==15575==    by 0x80E5C84: main (main.c:799)
> 
> 5/ notice as well that vim has set file encoding to utf-8 even
>    though it is not a valid utf-8 file.
> 
>    I would expect fenc to be set automatically to latin1,
>    and not utf-8.
> 
>    ':set fenc?'          shows: fileencoding=utf-8
>    ':set fileencodings?" shows: fileencodings=ucs-bom,utf-8,latin1
> 
> 
> Looking at the code, I see that function utf_head_off(base, p)
> reads 1 to n bytes ahead p at line 2494:
> 
>     2474     int
>     2475 utf_head_off(base, p)
>     2476     char_u      *base;
>     2477     char_u      *p;
>     2478 {
>     2479     char_u      *q;
>     2480     char_u      *s;
>     2481     int         c;
>     2482 #ifdef FEAT_ARABIC
>     2483     char_u      *j;
>     2484 #endif
>     2485
>     2486     if (*p < 0x80)              /* be quick for ASCII */
>     2487         return 0;
>     2488
>     2489     /* Skip backwards over trailing bytes: 10xx.xxxx
>     2490      * Skip backwards again if on a composing char. */
>     2491     for (q = p; ; --q)
>     2492     {
>     2493         /* Move s to the last byte of this char. */
> !!! 2494         for (s = q; (s[1] & 0xc0) == 0x80; ++s)
>     2495             ;
>     2496         /* Move q to the first byte of this char. */
>     2497         while (q > base && (*q & 0xc0) == 0x80)
>     2498             --q;
> 
> 
> Line 2494 is accessing s[1] (i.e. 'p[1]').  But only
> p[0] is initialized since readfile() only read one byte
> from file, so p[1] and beyond are uninitialized.
> 
> Loop at line 2494 can in theory access many other bytes
> beyond the end of the buffer since exit condition
> of loop depends on uninitialized memory.
> 
> I think adding a NUL after what is read from file fixes
> this bug.  See attached patch but please review it.
> 
> However, it does not fix the fact that fenc is set to
> utf-8 even though it is an invalid utf-8 file.
> 
> 
> Bug #2:
> 
> Using the same input file, I can see another bug:
> 
> 1/ yank the first line:  Y
>    This copies the invalid utf-8 into the unnamed "" register
> 
> 2/ copy the unnamed "" register onto the Ex line    :<CTRL-R>"
> 
> 3/ observe the following error when copying "" register:
> 
> ==6484==    at 0x80BAEBD: cmdline_paste_str (ex_getln.c:3028)
> ==6484==    by 0x8132B12: cmdline_paste_reg (ops.c:1528)
> ==6484==    by 0x80BADD2: cmdline_paste (ex_getln.c:3009)
> ==6484==    by 0x80B8029: getcmdline (ex_getln.c:1131)
> ==6484==    by 0x80B96C7: getexline (ex_getln.c:2100)
> ==6484==    by 0x80A31FA: do_cmdline (ex_docmd.c:995)
> ==6484==    by 0x8129F9A: nv_colon (normal.c:5179)
> ==6484==    by 0x8123556: normal_cmd (normal.c:1152)
> ==6484==    by 0x80E631D: main_loop (main.c:1181)
> ==6484==    by 0x80E5E6D: main (main.c:940)
> ==6484==  Address 0x51519C2 is 0 bytes after a block of size 2 alloc'd
> ==6484==    at 0x4022765: malloc (vg_replace_malloc.c:149)
> ==6484==    by 0x8113B6C: lalloc (misc2.c:862)
> ==6484==    by 0x8113A8E: alloc (misc2.c:761)
> ==6484==    by 0x8113EFB: vim_strsave (misc2.c:1166)
> ==6484==    by 0x81355CD: op_yank (ops.c:2901)
> ==6484==    by 0x8124AC6: do_pending_operator (normal.c:1890)
> ==6484==    by 0x81235D1: normal_cmd (normal.c:1178)
> ==6484==    by 0x80E631D: main_loop (main.c:1181)
> ==6484==    by 0x80E5E6D: main (main.c:940)
> 
>     3018     void
>     3019 cmdline_paste_str(s, literally)
>     3020     char_u      *s;
>     3021     int         literally;
>     3022 {
>     3023     int         c, cv;
>     3024
>     3025     if (literally)
>     3026         put_on_cmdline(s, -1, TRUE);
>     3027     else
> !!! 3028         while (*s != NUL)
>     3029         {
>     3030             cv = *s;
>     3031             if (cv == Ctrl_V && s[1])
>     3032                 ++s;
>     3033 #ifdef FEAT_MBYTE
>     3034             if (has_mbyte)
>     3035             {
> !!! 3036                 c = mb_ptr2char(s);
> !!! 3037                 s += mb_char2len(c);
>     3038             }
>     3039             else
>     3040 #endif
> 
> Call to mb_ptr2char(s) at line 3036 consumes only 1 byte
> because s is an invalid utf-8 sequence and returns the
> first byte which is 0xfc.   Calls to mb_char2len(0xfc)
> at next line increments s by 2 bytes instead of 1 (!!)
> because it takes 2 bytes to store character u+00fc in
> utf-8 (which is inconsistent with the 1 byte consumed
> by previous line).
> 
> I think calling 'c = mb_cptr2char_adv(&s);' to increment s
> fixes the problem as in attached patch but please review it.
> 
> -- Dominique
> 
> > 
> 

--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_dev" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---

Re: (patches) fixed 2 bugs in multi-byte

Raspunde prin e-mail lui