Valgrind memory checker detects out of bounds memory access
when using random characters in regular expressions.
It happens only when vim is built with +multi_byte
==6133== Invalid read of size 1
==6133== at 0x8159FC2: peekchr (regexp.c:2606)
==6133== by 0x8159B60: regatom (regexp.c:2331)
==6133== by 0x8157E16: regpiece (regexp.c:1433)
==6133== by 0x8157D78: regconcat (regexp.c:1394)
==6133== by 0x8157B29: regbranch (regexp.c:1309)
==6133== by 0x8157803: reg (regexp.c:1222)
==6133== by 0x8157311: vim_regcomp (regexp.c:1019)
==6133== by 0x8170B5A: search_regcomp (search.c:215)
==6133== by 0x81713BC: searchit (search.c:531)
==6133== by 0x81725A3: do_search (search.c:1247)
==6133== by 0x80A72CC: get_address (ex_docmd.c:3903)
==6133== by 0x80A4067: do_one_cmd (ex_docmd.c:1965)
==6133== by 0x80A2966: do_cmdline (ex_docmd.c:1099)
==6133== by 0x80A2018: do_cmdline_cmd (ex_docmd.c:705)
==6133== by 0x80E72B8: exe_commands (main.c:2663)
==6133== by 0x80E4CB2: main (main.c:875)
==6133== Address 0x4F9FA16 is 0 bytes after a block of size 22 alloc'd
==6133== at 0x4022765: malloc (vg_replace_malloc.c:149)
==6133== by 0x8112860: lalloc (misc2.c:857)
==6133== by 0x8112782: alloc (misc2.c:756)
==6133== by 0x8112BE5: vim_strsave (misc2.c:1144)
==6133== by 0x80A27A8: do_cmdline (ex_docmd.c:1029)
==6133== by 0x80A2018: do_cmdline_cmd (ex_docmd.c:705)
==6133== by 0x80E72B8: exe_commands (main.c:2663)
==6133== by 0x80E4CB2: main (main.c:875)
(and many more errors of the same kind)
I can reproduce these errors often (not 100%) by doing a search
with '/' using a regex containing a few random characters:
$ echo test > test.txt
$ vim -c "/$(head -c 10 /dev/urandom | tee testcase)" test.txt 2> vg.log
For example, errors happen all the time with testcase file containing:
$ od -An -x testcase
6d8d 7aa2 4e33 8214 f492
Bug happens when regex ends with an invalid (or incomplete)
UTF-8 sequence.
Regex with invalid UTF-8 sequences are not supposed to happen
generally so this is a low priority bug. However, accessing
buffer out of bound should not happen, even with garbage regex.
I can see why it happens:
regex.c:
2331 for (len = 0; c != NUL && (len == 0
2332 || (re_multi_type(peekchr()) == NOT_MULTI
2333 && !one_exactly
2334 && !is_Magic(c))); ++len)
2335 {
2336 c = no_Magic(c);
2337 #ifdef FEAT_MBYTE
2338 if (has_mbyte)
2339 {
2340 regmbc(c);
2341 if (enc_utf8)
2342 {
2343 int l;
2344
2345 /* Need to get composing character too. */
2346 for (;;)
2347 {
2348 l = utf_ptr2len(regparse);
2349 if (!UTF_COMPOSINGLIKE(regparse, regparse + l))
2350 break;
2351 regmbc(utf_ptr2char(regparse));
2352 skipchr();
2353 }
2354 }
2355 }
2356 else
2357 #endif
2358 regc(c);
2359 c = getchr();
2360 }
peekchr() at line 2332 accesses buffer regparse[] out of bounds.
regparse pointer goes 1 byte beyond allocated size because getchr()
at line 2359 consumed 2 bytes and regparse goes 1 bytes beyond
end of allocated size. 2 bytes are consumed because getchr()
at line 2359 calls skipchr() which increments regparse
by 2 at line 2773:
regex.c:
2761 static void
2762 skipchr()
2763 {
2764 /* peekchr() eats a backslash, do the same here */
2765 if (*regparse == '\\')
2766 prevchr_len = 1;
2767 else
2768 prevchr_len = 0;
2769 if (regparse[prevchr_len] != NUL)
2770 {
2771 #ifdef FEAT_MBYTE
2772 if (enc_utf8)
2773 prevchr_len += utf_char2len(mb_ptr2char(regparse +
prevchr_len));
2774 else if (has_mbyte)
2775 prevchr_len += (*mb_ptr2len)(regparse + prevchr_len);
2776 else
2777 #endif
2778 ++prevchr_len;
2779 }
2780 regparse += prevchr_len;
2781 prev_at_start = at_start;
At line 2773, call to mb_ptr2char (function pointer which points
to utf_char2len()) detects an invalid/incomplete UTF-8 sequence
but in that case returns the first byte. When this first byte is >= 0x80,
then call to utf_char2len() also at line 2773 return 2 (or more)
thus consuming 2 bytes. Adding 2 bytes to regparse at line
2790 can make regparse go 1 byte beyond end of regex string.
I attach a patch which fixes it. Perhaps there is a better way of
fixing it.
I'm using vim-7.1 with patches 1-156, on Linux, built without
optimizations (-O0), configured with "configure --with-features=huge".
-- Dominique
--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_dev" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---
Index: regexp.c
===================================================================
RCS file: /cvsroot/vim/vim7/src/regexp.c,v
retrieving revision 1.44
diff -c -r1.44 regexp.c
*** regexp.c 11 Aug 2007 11:58:14 -0000 1.44
--- regexp.c 22 Nov 2007 23:07:50 -0000
***************
*** 582,587 ****
--- 582,588 ----
*/
static char_u *regparse; /* Input-scan pointer. */
+ static char_u *endregparse; /* end of input-scan pointer. */
static int prevchr_len; /* byte length of previous char */
static int num_complex_braces; /* Complex \{...} count */
static int regnpar; /* () count. */
***************
*** 2590,2595 ****
--- 2591,2597 ----
char_u *str;
{
regparse = str;
+ endregparse = str + STRLEN(str);
prevchr_len = 0;
curchr = prevprevchr = prevchr = nextchr = -1;
at_start = TRUE;
***************
*** 2778,2783 ****
--- 2780,2790 ----
++prevchr_len;
}
regparse += prevchr_len;
+
+ /* don't let regparse go beyond endregparse (which could happen when
+ * regparse ends with invalid utf-8 sequence */
+ if (regparse > endregparse)
+ regparse = endregparse;
prev_at_start = at_start;
at_start = FALSE;
prevprevchr = prevchr;