Hi,

while studying martijn@'s pending regexec(3) patch, i found
a one-byte read buffer underrun in "case OBOL" of the function
backref() in lib/libc/regex/engine.c.

I think outright bugs (like crashes) ought to be fixed before
improving functionality.  So i'd like to get this in first, then
return to checking martijn@'s improvements, in particular since
this doesn't conflict with martijn@'s patch, even though it touches
closely related code.

All of the following are required to trigger it:

 * REG_BASIC passed to regcomp(3)
 * REG_NOTBOL passed to regexec(3)
 * The regular expression must contain at least one backreference.
 * The regular expression must start with at least one opening
   parenthesis starting a subexpression.
 * The first elementary atom in the expression must be '^'.
   It must be at the beginning of a subexpression
   that is quantified by '*'.

That may seem a bit contrived at first, but with REG_NEWLINE, there
might even be practical use cases for patterns starting with something
similar to "\(^foo\)*", continuing with something else, then using
a backreference - for example to capture lines containing the
"something else" bracketed with lines containing variations of foo.
Even if such stuff is contrived, i don't think we want regexec(3)
to ever segfault.  Besides, doing explicit index checks before
backing up a byte in a buffer helps code auditors, and the regex
guts are already next door to auditor's hell.
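
To illustrate the kind of explicit index check meant here, a minimal
sketch follows.  The helper preceded_by_newline() and its parameter
names are made up for this mail and do not exist in engine.c; the
only point is that the comparison against the start of the buffer
comes first, so the short-circuiting && never evaluates sp[-1] at
the very beginning of the buffer.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of engine.c: report whether the
 * position sp is directly preceded by a newline, without ever
 * reading a byte in front of the buffer that begins at start.
 */
static int
preceded_by_newline(const char *start, const char *sp)
{
	/* Precondition assumed by this sketch: sp lies inside the buffer. */
	assert(sp >= start);

	/* Check the index first; only then back up one byte. */
	return sp > start && sp[-1] == '\n';
}
```

With a buffer holding "foo\nbar", the helper returns 0 at the start
of the buffer, without touching the byte in front of it, and 1 at
the 'b' following the newline.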

I'm appending a test program demonstrating the segfault
and a patch fixing it.

OK?
  Ingo


 ----- 8< ----- schnipp ----- >8 ----- 8< ----- schnapp ----- >8 -----

#include <sys/types.h>
#include <err.h>
#include <regex.h>
#include <stdlib.h>

int
main(void)
{
        regex_t          re;
        char            *buf;

        if (regcomp(&re, "\\(^\\)*\\(x\\)\\2", REG_BASIC | REG_NEWLINE))
                errx(1, "regcomp");

        /*
         * Allocate a huge buffer such that we get
         * a guard page in front of it.
         */

        if ((buf = malloc(64 * 1024)) == NULL)
                err(1, NULL);
        buf[0] = 'x';
        buf[1] = 'x';
        buf[2] = '\0';

        /*
         * Trigger the segfault in regex/engine.c,
         * backref(), case OBOL.
         */

        regexec(&re, buf, 0, NULL, REG_NOTBOL);

        errx(1, "This is unexpected: regexec did not segfault.");
}

 ----- 8< ----- schnipp ----- >8 ----- 8< ----- schnapp ----- >8 -----

Index: engine.c
===================================================================
RCS file: /cvs/src/lib/libc/regex/engine.c,v
retrieving revision 1.19
diff -u -p -r1.19 engine.c
--- engine.c    28 Dec 2015 23:01:22 -0000      1.19
+++ engine.c    15 May 2016 15:18:22 -0000
@@ -506,9 +506,9 @@ backref(struct match *m, char *start, ch
                                return(NULL);
                        break;
                case OBOL:
-                       if ( (sp == m->beginp && !(m->eflags&REG_NOTBOL)) ||
-                                       (sp < m->endp && *(sp-1) == '\n' &&
-                                               (m->g->cflags&REG_NEWLINE)) )
+                       if ((sp == m->beginp && !(m->eflags&REG_NOTBOL)) ||
+                           (sp > m->offp && sp < m->endp &&
+                            *(sp-1) == '\n' && (m->g->cflags&REG_NEWLINE)))
                                { /* yes */ }
                        else
                                return(NULL);
