Hi

Here is another bug found by afl-fuzz. The following commands causes
vim-7.8.824 (and older) to access invalid memory, beyond end of string:

  $ mkdir /tmp/foo
  $ cd /tmp/foo
  $ touch '#'
  $ touch $(perl -e 'print chr(0xf0),chr(0x80),chr(0x80),chr(0xa3)')
  $ vim -u NONE -c 'e*'

Asan reports this error:

=================================================================
==6572==ERROR: AddressSanitizer: heap-buffer-overflow on address
0x602000017114 at pc 0x0000011db7e1 bp 0x7fff5b6aacb0 sp
0x7fff5b6aaca8
READ of size 1 at 0x602000017114 thread T0
    #0 0x11db7e0 in utf_ptr2char /home/pel/sb/vim/src/mbyte.c:1696
(discriminator 3)
    #1 0x113bbe7 in pathcmp /home/pel/sb/vim/src/misc2.c:5824 (discriminator 1)
    #2 0x10e9c68 in pstrcmp /home/pel/sb/vim/src/misc1.c:10159 (discriminator 9)
    #3 0x7f4bd183b4c8 in msort_with_tmp
/build/buildd/eglibc-2.19/stdlib/msort.c:83
    #4 0x7f4bd183b77b in msort_with_tmp
/build/buildd/eglibc-2.19/stdlib/msort.c:45
    #5 0x10e8820 in unix_expandpath /home/pel/sb/vim/src/misc1.c:10377
(discriminator 12)
    #6 0x150934e in mch_expandpath /home/pel/sb/vim/src/os_unix.c:5640
    #7 0x10e1dbf in gen_expand_wildcards /home/pel/sb/vim/src/misc1.c:10983
    #8 0x10dd295 in expand_wildcards /home/pel/sb/vim/src/misc1.c:9695
    #9 0x10dcb25 in expand_wildcards_eval /home/pel/sb/vim/src/misc1.c:9666
    #10 0xc31003 in ExpandFromContext /home/pel/sb/vim/src/ex_getln.c:4613
    #11 0xbfef41 in ExpandOne /home/pel/sb/vim/src/ex_getln.c:3627
    #12 0xae89e1 in expand_filename /home/pel/sb/vim/src/ex_docmd.c:5061
    #13 0xab8d5d in do_one_cmd /home/pel/sb/vim/src/ex_docmd.c:2879
    #14 0xa92b4d in do_cmdline /home/pel/sb/vim/src/ex_docmd.c:1133
    #15 0xa99f50 in do_cmdline_cmd /home/pel/sb/vim/src/ex_docmd.c:738
    #16 0x2076afd in exe_commands /home/pel/sb/vim/src/main.c:2926
    #17 0x20577a5 in main /home/pel/sb/vim/src/main.c:961
    #18 0x7f4bd1821ec4 in __libc_start_main
/build/buildd/eglibc-2.19/csu/libc-start.c:287
    #19 0x466c96 in _start ??:?

0x602000017114 is located 2 bytes to the right of 2-byte region
[0x602000017110,0x602000017112)
allocated by thread T0 here:
    #0 0x4edc62 in __interceptor_malloc ??:?
    #1 0x110e232 in lalloc /home/pel/sb/vim/src/misc2.c:921
    #2 0x110de9b in alloc /home/pel/sb/vim/src/misc2.c:820
    #3 0x10e9360 in addfile /home/pel/sb/vim/src/misc1.c:11134
    #4 0x10e8269 in unix_expandpath /home/pel/sb/vim/src/misc1.c:10363
    #5 0x150934e in mch_expandpath /home/pel/sb/vim/src/os_unix.c:5640
    #6 0x10e1dbf in gen_expand_wildcards /home/pel/sb/vim/src/misc1.c:10983
    #7 0x10dd295 in expand_wildcards /home/pel/sb/vim/src/misc1.c:9695
    #8 0x10dcb25 in expand_wildcards_eval /home/pel/sb/vim/src/misc1.c:9666
    #9 0xc31003 in ExpandFromContext /home/pel/sb/vim/src/ex_getln.c:4613
    #10 0xbfef41 in ExpandOne /home/pel/sb/vim/src/ex_getln.c:3627
    #11 0xae89e1 in expand_filename /home/pel/sb/vim/src/ex_docmd.c:5061
    #12 0xab8d5d in do_one_cmd /home/pel/sb/vim/src/ex_docmd.c:2879
    #13 0xa92b4d in do_cmdline /home/pel/sb/vim/src/ex_docmd.c:1133
    #14 0xa99f50 in do_cmdline_cmd /home/pel/sb/vim/src/ex_docmd.c:738
    #15 0x2076afd in exe_commands /home/pel/sb/vim/src/main.c:2926
    #16 0x20577a5 in main /home/pel/sb/vim/src/main.c:961
    #17 0x7f4bd1821ec4 in __libc_start_main
/build/buildd/eglibc-2.19/csu/libc-start.c:287

Shadow bytes around the buggy address:
  0x0c047fffadd0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fffade0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fffadf0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fffae00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fffae10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c047fffae20: fa fa[02]fa fa fa 05 fa fa fa fd fa fa fa fd fa
  0x0c047fffae30: fa fa 03 fa fa fa 04 fa fa fa 00 00 fa fa 00 00
  0x0c047fffae40: fa fa 00 00 fa fa 00 00 fa fa 00 00 fa fa 00 00
  0x0c047fffae50: fa fa 00 00 fa fa fd fd fa fa fd fd fa fa fd fd
  0x0c047fffae60: fa fa fd fd fa fa fd fd fa fa fd fd fa fa fd fd
  0x0c047fffae70: fa fa fd fd fa fa fd fd fa fa fd fd fa fa fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==6572==ABORTING

Code in misc2.c:

5812     int
5813 pathcmp(p, q, maxlen)
5814     const char *p, *q;
5815     int maxlen;
5816 {
5817     int         i;
5818     int         c1, c2;
5819     const char  *s = NULL;
5820
5821     for (i = 0; maxlen < 0 || i < maxlen; i += MB_PTR2LEN((char_u *)p + i))
5822     {
5823         c1 = PTR2CHAR((char_u *)p + i);
5824         c2 = PTR2CHAR((char_u *)q + i);

This code assumes that if c1 and c2 are identical, then they must also
have the same value for MB_PTR2LEN(....). It's an incorrect assumption.
For example, the following 2 UTF-8 sequences encode the same character
U+0023 (i.e #) and yet they have different MB_PTR2LEN(...):

   0x23                     (MB_PTR2LEN(...) is 1)
   0xf0 0x80 0x80 0xa3      (MB_PTR2LEN(...) is 4).

The first UTF-8 sequence uses:

       0x00000000 - 0x0000007F:
           0xxxxxxx

       0x00010000 - 0x001FFFFF:
           11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Strictly speaking, the 2nd UTF-8 sequence should be considered illegal
in this case according to "man utf-8":

=== BEGIN [ man utf-8 ] ===
Security
   The Unicode and UCS standards require that producers of UTF-8 shall
   use  the  shortest form possible, for example, producing a two-byte
   sequence with first byte 0xc0 is nonconforming.   Unicode  3.1  has
   added the requirement that conforming programs must not accept non-
   shortest forms in their input.  This is for  security  reasons:  if
   user  input  is checked for possible security violations, a program
   might check only for the ASCII version of "/../" or ";" or NUL  and
   overlook  that  there  are  many  non-ASCII ways to represent these
   things in a non-shortest UTF-8 encoding.
=== END [ man utf-8 ] ===

Yet Vim gladly accept such malformed UTF-8 sequence like 0xf0 0x80 0x80 0xa3
as valid and treats it the same as 0x23 (# char). So perhaps a better fix
would be to consider those UTF-8 sequences as invalid? (changes to mbytes.c).
I did not do that. It might slow down Vim with UTF-8, and being able to view
such invalid utf8 sequences in Vim might be considered useful in some cases?

Regards
Dominique

-- 
-- 
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php

--- 
You received this message because you are subscribed to the Google Groups 
"vim_dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Raspunde prin e-mail lui