Re: libc/regex: safer pointer arithmetic

2021-01-03 Thread Miod Vallat


> regcomp.c uses the "start + count < end" idiom to check that there are
> "count" bytes available in an array of char "start" and "end" both point
> to.
>
> This is fine, unless "start + count" goes beyond the last element of the
> array. In this case, pedantic interpretation of the C standard makes
> the comparison of such a pointer against "end" undefined, and optimizers
> from hell will happily remove as much code as possible because of this.

I am only noticing now that llvm contains a copy of OpenBSD's libc regex
code (with an llvm_ prefix to the public interfaces) in its llvmSupport
library.

I am thus surprised that one of their sanitizers did not expose that
wrong construct already. I'll report this to the llvm project tomorrow.

In the meantime, under OpenBSD, it might be worth investigating shutting
that copy and having llvm_re* aliases of libc's re* functions, if only
to make the code smaller.



Re: libc/regex: safer pointer arithmetic

2020-12-30 Thread Theo Buehler
On Tue, Dec 29, 2020 at 06:03:36AM -0700, Todd C. Miller wrote:
> On Tue, 29 Dec 2020 10:33:26 +, Miod Vallat wrote:
> 
> > regcomp.c uses the "start + count < end" idiom to check that there are
> > "count" bytes available in an array of char "start" and "end" both point
> > to.
> >
> > This is fine, unless "start + count" goes beyond the last element of the
> > array. In this case, pedantic interpretation of the C standard makes
> > the comparison of such a pointer against "end" undefined, and optimizers
> > from hell will happily remove as much code as possible because of this.
> 
> OK millert@

Thank you. I have committed this batch of four diffs.



Re: libc/regex: safer pointer arithmetic

2020-12-29 Thread Todd C . Miller
On Tue, 29 Dec 2020 10:33:26 +, Miod Vallat wrote:

> regcomp.c uses the "start + count < end" idiom to check that there are
> "count" bytes available in an array of char "start" and "end" both point
> to.
>
> This is fine, unless "start + count" goes beyond the last element of the
> array. In this case, pedantic interpretation of the C standard makes
> the comparison of such a pointer against "end" undefined, and optimizers
> from hell will happily remove as much code as possible because of this.

OK millert@

 - todd



libc/regex: safer pointer arithmetic

2020-12-29 Thread Miod Vallat
regcomp.c uses the "start + count < end" idiom to check that there are
"count" bytes available in an array of char "start" and "end" both point
to.

This is fine, unless "start + count" goes beyond the last element of the
array. In this case, pedantic interpretation of the C standard makes
the comparison of such a pointer against "end" undefined, and optimizers
from hell will happily remove as much code as possible because of this.

An example of this occurs in regcomp.c's bothcases(), which defines
bracket[3], sets "next" to "bracket" and "end" to "bracket + 2". Then it
invokes p_bracket(), which starts with "if (p->next + 5 < p->end)"...

Because bothcases() and p_bracket() are static functions in regcomp.c,
there is a real risk of miscompilation if aggressive inlining happens.

The following diff rewrites the "start + count < end" constructs into
"end - start > count". Assuming "end" and "start" are always pointing in
the array (such as "bracket[3]" above), "end - start" is well-defined
and can be compared without trouble.

As a bonus, MORE2() implies MORE() therefore SEETWO() can be simplified
a bit.

Index: regcomp.c
===
RCS file: /OpenBSD/src/lib/libc/regex/regcomp.c,v
retrieving revision 1.35
diff -u -p -r1.35 regcomp.c
--- regcomp.c   13 Oct 2020 04:42:28 -  1.35
+++ regcomp.c   29 Dec 2020 10:24:18 -
@@ -113,10 +110,10 @@ static char nuls[10]; /* place to point
  */
 #definePEEK()  (*p->next)
 #definePEEK2() (*(p->next+1))
-#defineMORE()  (p->next < p->end)
-#defineMORE2() (p->next+1 < p->end)
+#defineMORE()  (p->end - p->next > 0)
+#defineMORE2() (p->end - p->next > 1)
 #defineSEE(c)  (MORE() && PEEK() == (c))
-#defineSEETWO(a, b)(MORE() && MORE2() && PEEK() == (a) && PEEK2() 
== (b))
+#defineSEETWO(a, b)(MORE2() && PEEK() == (a) && PEEK2() == (b))
 #defineEAT(c)  ((SEE(c)) ? (NEXT(), 1) : 0)
 #defineEATTWO(a, b)((SEETWO(a, b)) ? (NEXT2(), 1) : 0)
 #defineNEXT()  (p->next++)
@@ -623,15 +620,17 @@ p_bracket(struct parse *p)
int invert = 0;
 
/* Dept of Truly Sickening Special-Case Kludges */
-   if (p->next + 5 < p->end && strncmp(p->next, "[:<:]]", 6) == 0) {
-   EMIT(OBOW, 0);
-   NEXTn(6);
-   return;
-   }
-   if (p->next + 5 < p->end && strncmp(p->next, "[:>:]]", 6) == 0) {
-   EMIT(OEOW, 0);
-   NEXTn(6);
-   return;
+   if (p->end - p->next > 5) {
+   if (strncmp(p->next, "[:<:]]", 6) == 0) {
+   EMIT(OBOW, 0);
+   NEXTn(6);
+   return;
+   }
+   if (strncmp(p->next, "[:>:]]", 6) == 0) {
+   EMIT(OEOW, 0);
+   NEXTn(6);
+   return;
+   }
}
 
if ((cs = allocset(p)) == NULL) {