[patch] fixed 3 bugs with invalid utf-8 sequences

Dominique Pelle Sun, 29 Jun 2008 04:49:35 -0700

Here is another bug because of invalid utf-8 sequence:

==21329== Conditional jump or move depends on uninitialised value(s)
==21329==    at 0x811C4C7: utf_ptr2char (mbyte.c:1390)
==21329==    by 0x811C83F: utf_composinglike (mbyte.c:1498)
==21329==    by 0x811CD96: utfc_ptr2len_len (mbyte.c:1756)
==21329==    by 0x80FEAF1: msg_outtrans_len_attr (message.c:1355)
==21329==    by 0x80FE9D0: msg_outtrans_len (message.c:1292)
==21329==    by 0x80B9D7F: draw_cmdline (ex_getln.c:2641)
==21329==    by 0x80BAA99: redrawcmd (ex_getln.c:3126)
==21329==    by 0x80BAF82: nextwild (ex_getln.c:3317)
==21329==    by 0x80B6D3B: getcmdline (ex_getln.c:803)
==21329==    by 0x80B912E: getexline (ex_getln.c:2101)
==21329==    by 0x80A33FC: do_cmdline (ex_docmd.c:991)
==21329==    by 0x81280DD: nv_colon (normal.c:5181)
==21329==    by 0x8121772: normal_cmd (normal.c:1152)
==21329==    by 0x80E4E8A: main_loop (main.c:1177)
==21329==    by 0x80E49DA: main (main.c:936)


==21434== Conditional jump or move depends on uninitialised value(s)
==21434==    at 0x811C4C7: utf_ptr2char (mbyte.c:1390)
==21434==    by 0x811C28C: utf_ptr2cells (mbyte.c:1268)
==21434==    by 0x805D084: ptr2cells (charset.c:746)
==21434==    by 0x80FEC65: msg_outtrans_len_attr (message.c:1394)
==21434==    by 0x80FE9D0: msg_outtrans_len (message.c:1292)
==21434==    by 0x80B9D7F: draw_cmdline (ex_getln.c:2641)
==21434==    by 0x80BAA99: redrawcmd (ex_getln.c:3126)
==21434==    by 0x80BAF82: nextwild (ex_getln.c:3317)
==21434==    by 0x80B6D3B: getcmdline (ex_getln.c:803)
==21434==    by 0x80B912E: getexline (ex_getln.c:2101)
==21434==    by 0x80A33FC: do_cmdline (ex_docmd.c:991)
==21434==    by 0x81280DD: nv_colon (normal.c:5181)
==21434==    by 0x8121772: normal_cmd (normal.c:1152)
==21434==    by 0x80E4E8A: main_loop (main.c:1177)
==21434==    by 0x80E49DA: main (main.c:936)

It's easy to reproduce:

1/ Create a file names with invalid utf-8 name:

   $ mkdir testcase
   $ touch testcase/$(perl -e 'print chr(0xfb)')

2/ Start Vim in 'no compatible' mode with Valgrind:

   $ valgrind vim -u NONE -c -N 2> vg.log

4/ In ex mode, enter:

   :e testcase/<TAB>

   When pressing <TAB> key (file completion), vim should display
   file name with invalid utf-8 sequence 'testcase/<fb>'
   and observe in vg.log that valgrind reports above errors.

The 2 error messages are actually 2 distinct bugs.

- First error: in mbyte.c:1756, calling
  UTF_COMPOSINGLIKE(p + prevlen, p + len) is unsafe, since it
  can read bytes beyond size.  So in theory, there is a small risk
  that vim combines characters when it should not.

  Attached patch "fix1-invalid-utf8-seq.patch" fixes this first
  error, by calling utf_ptr2len_len(p + len, size - len) before
  UTF_COMPOSINGLIKE(...) to ensure that UTF_COMPOSINGLIKE(...)
  can't access beyond size.

- Second error: in message.c:1394, calling ptr2cells(str) is
  unsafe since it can read bytes in str beyond len (which are
  uninitialized).  At line 1394, we know that the character is
  only 1 byte long (mb_l is 1) so it should call char2cells(*str).
  mb_l is 1 here because of incomplete utf-8 sequence so calling
  ptr2cells(str) overflows.

  Attached patch "fix2-invalid-utf8-seq.patch" fixes it.

After fixing these 2 bugs, valgrind no longer complains with
the above testcase.

However, there is a 3rd bug which can be triggered with a slightly
different test case, using file <ff> instead of <fb>.  The difference
is that <ff> is an invalid utf-8 sequence, whereas <fb> was an
incomplete utf-8 sequence:

1/ Create a file names with invalid utf-8 names:

   $ mkdir testcase
   $ touch testcase/$(perl -e 'print chr(0xff)')

2/ Start Vim in 'no compatible' mode with Valgrind:

   $ valgrind vim -u NONE -c -N 2> vg.log

3/ In ex mode, enter:

   :e testcase/<TAB><TAB>

   When pressing <TAB> key (file completion), vim should display
   file name with invalid utf-8 sequence 'e: testcase/<fb>' (without
   valgrind errors after above patches) then when processing <TAB>
   again, vim displays next file 'e: testcase/<ff>' but observe
   that valgrind reports following error:

==22527== Conditional jump or move depends on uninitialised value(s)
==22527==    at 0x811C4CB: utf_ptr2char (mbyte.c:1390)
==22527==    by 0x811C843: utf_composinglike (mbyte.c:1498)
==22527==    by 0x811CDC8: utfc_ptr2len_len (mbyte.c:1769)
==22527==    by 0x80FEAF1: msg_outtrans_len_attr (message.c:1355)
==22527==    by 0x80FE9D0: msg_outtrans_len (message.c:1292)
==22527==    by 0x80B9D7F: draw_cmdline (ex_getln.c:2641)
==22527==    by 0x80BAA99: redrawcmd (ex_getln.c:3126)
==22527==    by 0x80BAF82: nextwild (ex_getln.c:3317)
==22527==    by 0x80B6CC6: getcmdline (ex_getln.c:790)
==22527==    by 0x80B912E: getexline (ex_getln.c:2101)
==22527==    by 0x80A33FC: do_cmdline (ex_docmd.c:991)
==22527==    by 0x81280F5: nv_colon (normal.c:5181)
==22527==    by 0x812178A: normal_cmd (normal.c:1152)
==22527==    by 0x80E4E8A: main_loop (main.c:1177)
==22527==    by 0x80E49DA: main (main.c:936)

This bug happens because utf_ptr2char(p) does not check
whether first byte of utf-8 sequence is legal.  When p[0] >= 0x80
and utf8len_tab[p[0]] is 1, then the first byte is illegal, and
utf_ptr2char(p) should return 1 immediately, without checking p[1].

Attached patch fix3-invalid-utf8-seq.patch fixes this 3rd bug.

After those 3 fixes, I no longer observe errors.

I hope errors with invalid utf-8 sequences are not too nitpicking.
I'm not aware of any other such errors at the moment.  So hopefully
no more patches like this.

I'm using Vim-7.2a.8 BETA (huge), utf-8 encoding, on Linux x86.

-- Dominique

--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_dev" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---

Index: mbyte.c
===================================================================
RCS file: /cvsroot/vim/vim7/src/mbyte.c,v
retrieving revision 1.65
diff -c -r1.65 mbyte.c
*** mbyte.c	24 Jun 2008 21:15:45 -0000	1.65
--- mbyte.c	29 Jun 2008 11:12:05 -0000
***************
*** 1753,1766 ****
  #endif
      while (len < size)
      {
! 	if (p[len] < 0x80 || !UTF_COMPOSINGLIKE(p + prevlen, p + len))
  	    break;
  
  	/* Skip over composing char */
  #ifdef FEAT_ARABIC
  	prevlen = len;
  #endif
! 	len += utf_ptr2len_len(p + len, size - len);
      }
      return len;
  }
--- 1753,1779 ----
  #endif
      while (len < size)
      {
! 	int	len_next_char;
! 
! 	if (p[len] < 0x80)
! 	    break;
! 
! 	/*
! 	 * Next character length should not go beyond size to ensure that
! 	 * UTF_COMPOSINGLIKE(...) does not read beyond size.
! 	 */
! 	len_next_char = utf_ptr2len_len(p + len, size - len);
! 	if (len_next_char > size - len)
! 	    break;
! 
! 	if (!UTF_COMPOSINGLIKE(p + prevlen, p + len))
  	    break;
  
  	/* Skip over composing char */
  #ifdef FEAT_ARABIC
  	prevlen = len;
  #endif
! 	len += len_next_char;
      }
      return len;
  }

Index: message.c
===================================================================
RCS file: /cvsroot/vim/vim7/src/message.c,v
retrieving revision 1.62
diff -c -r1.62 message.c
*** message.c	28 Jun 2008 14:09:47 -0000	1.62
--- message.c	29 Jun 2008 11:12:47 -0000
***************
*** 1391,1397 ****
  		plain_start = str + 1;
  		msg_puts_attr(s, attr == 0 ? hl_attr(HLF_8) : attr);
  	    }
! 	    retval += ptr2cells(str);
  	    ++str;
  	}
      }
--- 1391,1397 ----
  		plain_start = str + 1;
  		msg_puts_attr(s, attr == 0 ? hl_attr(HLF_8) : attr);
  	    }
! 	    retval += char2cells(*str);
  	    ++str;
  	}
      }

Index: mbyte.c
===================================================================
RCS file: /cvsroot/vim/vim7/src/mbyte.c,v
retrieving revision 1.65
diff -c -r1.65 mbyte.c
*** mbyte.c	24 Jun 2008 21:15:45 -0000	1.65
--- mbyte.c	29 Jun 2008 11:18:02 -0000
***************
*** 1387,1393 ****
  	return p[0];
  
      len = utf8len_tab[p[0]];
!     if ((p[1] & 0xc0) == 0x80)
      {
  	if (len == 2)
  	    return ((p[0] & 0x1f) << 6) + (p[1] & 0x3f);
--- 1387,1396 ----
  	return p[0];
  
      len = utf8len_tab[p[0]];
! 
!     /* Do not check p[1] if p[0] is an illegal first byte in utf-8 sequence
!      * i.e. if len is 1 */
!     if (len > 1 && (p[1] & 0xc0) == 0x80)
      {
  	if (len == 2)
  	    return ((p[0] & 0x1f) << 6) + (p[1] & 0x3f);

[patch] fixed 3 bugs with invalid utf-8 sequences

Raspunde prin e-mail lui