Re: patch to fix access beyond allocated buffer in regex.c

Bram Moolenaar Sat, 24 Nov 2007 05:04:57 -0800


Dominique Pelle wrote:


> Valgrind memory checker detects out of bounds memory access
> when using random characters in regular expressions.
> 
> It happens only when vim is built with +multi_byte
> 
> ==6133== Invalid read of size 1
> ==6133==    at 0x8159FC2: peekchr (regexp.c:2606)
> ==6133==    by 0x8159B60: regatom (regexp.c:2331)
> ==6133==    by 0x8157E16: regpiece (regexp.c:1433)
> ==6133==    by 0x8157D78: regconcat (regexp.c:1394)
> ==6133==    by 0x8157B29: regbranch (regexp.c:1309)
> ==6133==    by 0x8157803: reg (regexp.c:1222)
> ==6133==    by 0x8157311: vim_regcomp (regexp.c:1019)
> ==6133==    by 0x8170B5A: search_regcomp (search.c:215)
> ==6133==    by 0x81713BC: searchit (search.c:531)
> ==6133==    by 0x81725A3: do_search (search.c:1247)
> ==6133==    by 0x80A72CC: get_address (ex_docmd.c:3903)
> ==6133==    by 0x80A4067: do_one_cmd (ex_docmd.c:1965)
> ==6133==    by 0x80A2966: do_cmdline (ex_docmd.c:1099)
> ==6133==    by 0x80A2018: do_cmdline_cmd (ex_docmd.c:705)
> ==6133==    by 0x80E72B8: exe_commands (main.c:2663)
> ==6133==    by 0x80E4CB2: main (main.c:875)
> ==6133==  Address 0x4F9FA16 is 0 bytes after a block of size 22 alloc'd
> ==6133==    at 0x4022765: malloc (vg_replace_malloc.c:149)
> ==6133==    by 0x8112860: lalloc (misc2.c:857)
> ==6133==    by 0x8112782: alloc (misc2.c:756)
> ==6133==    by 0x8112BE5: vim_strsave (misc2.c:1144)
> ==6133==    by 0x80A27A8: do_cmdline (ex_docmd.c:1029)
> ==6133==    by 0x80A2018: do_cmdline_cmd (ex_docmd.c:705)
> ==6133==    by 0x80E72B8: exe_commands (main.c:2663)
> ==6133==    by 0x80E4CB2: main (main.c:875)
> 
> (and many more errors of the same kind)
> 
> I can reproduce these errors often (not 100%) by doing a search
> with '/' using a regex containing a few random characters:
> 
>   $ echo test > test.txt
>   $ vim -c "/$(head -c 10 /dev/urandom | tee testcase)" test.txt 2> vg.log
> 
> For example, errors happen all the time with testcase file containing:
> 
>   $ od -An -x testcase
>   6d8d 7aa2 4e33 8214 f492
> 
> Bug happens when regex ends with an invalid (or incomplete)
> UTF-8 sequence.
> 
> Regex with invalid UTF-8 sequences are not supposed to happen
> generally so this is a low priority bug.  However, accessing
> buffer out of bound should not happen, even with garbage regex.
> 
> I can see why it happens:
> 
> regex.c:
> 
> 2331             for (len = 0; c != NUL && (len == 0
> 2332                         || (re_multi_type(peekchr()) == NOT_MULTI
> 2333                             && !one_exactly
> 2334                             && !is_Magic(c))); ++len)
> 2335             {
> 2336                 c = no_Magic(c);
> 2337 #ifdef FEAT_MBYTE
> 2338                 if (has_mbyte)
> 2339                 {
> 2340                     regmbc(c);
> 2341                     if (enc_utf8)
> 2342                     {
> 2343                         int     l;
> 2344
> 2345                         /* Need to get composing character too. */
> 2346                         for (;;)
> 2347                         {
> 2348                             l = utf_ptr2len(regparse);
> 2349                             if (!UTF_COMPOSINGLIKE(regparse, regparse + 
> l))
> 2350                                 break;
> 2351                             regmbc(utf_ptr2char(regparse));
> 2352                             skipchr();
> 2353                         }
> 2354                     }
> 2355                 }
> 2356                 else
> 2357 #endif
> 2358                     regc(c);
> 2359                 c = getchr();
> 2360             }
> 
> peekchr() at line 2332 accesses buffer regparse[] out of bounds.
> regparse pointer goes 1 byte beyond allocated size because getchr()
> at line 2359 consumed 2 bytes and regparse goes 1 bytes beyond
> end of allocated size.  2 bytes are consumed because getchr()
> at line 2359 calls skipchr() which increments regparse
> by 2 at line 2773:
> 
> regex.c:
> 
> 2761     static void
> 2762 skipchr()
> 2763 {
> 2764     /* peekchr() eats a backslash, do the same here */
> 2765     if (*regparse == '\\')
> 2766         prevchr_len = 1;
> 2767     else
> 2768         prevchr_len = 0;
> 2769     if (regparse[prevchr_len] != NUL)
> 2770     {
> 2771 #ifdef FEAT_MBYTE
> 2772         if (enc_utf8)
> 2773             prevchr_len += utf_char2len(mb_ptr2char(regparse +
> prevchr_len));
> 2774         else if (has_mbyte)
> 2775             prevchr_len += (*mb_ptr2len)(regparse + prevchr_len);
> 2776         else
> 2777 #endif
> 2778             ++prevchr_len;
> 2779     }
> 2780     regparse += prevchr_len;
> 2781     prev_at_start = at_start;
> 
> At line 2773, call to mb_ptr2char (function pointer which points
> to utf_char2len()) detects an invalid/incomplete UTF-8 sequence
> but in that case returns the first byte. When this first byte is >= 0x80,
> then call to utf_char2len() also at line 2773 return 2 (or more)
> thus consuming 2 bytes.  Adding 2 bytes to regparse at line
> 2790 can make regparse go 1 byte beyond end of regex string.
> 
> I attach a patch which fixes it.  Perhaps there is a better way of
> fixing it.

Thanks for locating this problem.  For the fix I think it's simplest to
use utf_ptr2len().  It checks for the NUL character in the same way as
mb_ptr2char():

*** ../vim-7.1.159/src/regexp.c Sat Aug 11 13:57:31 2007
--- src/regexp.c        Sat Nov 24 13:23:53 2007
***************
*** 2770,2776 ****
      {
  #ifdef FEAT_MBYTE
        if (enc_utf8)
!           prevchr_len += utf_char2len(mb_ptr2char(regparse + prevchr_len));
        else if (has_mbyte)
            prevchr_len += (*mb_ptr2len)(regparse + prevchr_len);
        else
--- 2770,2777 ----
      {
  #ifdef FEAT_MBYTE
        if (enc_utf8)
!           /* exclude composing chars that mb_ptr2len does include */
!           prevchr_len += utf_ptr2len(regparse + prevchr_len);
        else if (has_mbyte)
            prevchr_len += (*mb_ptr2len)(regparse + prevchr_len);
        else

Not sure if that fixes all situations, perhaps you can check that?

-- 
hundred-and-one symptoms of being an internet addict:
139. You down your lunch in five minutes, at your desk, so you can
     spend the rest of the hour surfing the Net.

 /// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\        download, build and distribute -- http://www.A-A-P.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///

--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_dev" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---

Re: patch to fix access beyond allocated buffer in regex.c

Raspunde prin e-mail lui