Re: expr: Fix integer overflows

2018-03-31 Thread Ingo Schwarze
Hi Tobias,

Tobias Stoeckmann wrote on Sat, Mar 31, 2018 at 01:28:19PM +0200:

> I actually ended up in expr(1) after seeing that the test(1) and ksh(1)
> debate could be continued here. While expr(1) is int64_t, expressions
> in ksh(1) are of type long, i.e. 32/64 bit depending on architecture.
> 
> This patch therefore is a preparation to get test(1) and expr(1) changes
> into ksh. As both utilities are standalone, it is much easier to discuss
> and make modifications in them, first.

All that sounds reasonable to me.

> Your example also triggers an overflow in FreeBSD's expr, so it is
> a good idea to inform them when we are happy with our fixed version.

Yes, please do that.

> Adjusted patch

Three more nits inline, and the updated patch appended at the end.

With that, OK schwarze@.
  Ingo


> Index: bin/expr/expr.c
> ===
> RCS file: /cvs/src/bin/expr/expr.c,v
> retrieving revision 1.26
> diff -u -p -u -p -r1.26 expr.c
> --- bin/expr/expr.c   19 Oct 2016 18:20:25 -  1.26
> +++ bin/expr/expr.c   31 Mar 2018 11:22:05 -
> @@ -19,8 +19,8 @@
>  struct val   *make_int(int64_t);
>  struct val   *make_str(char *);
>  void  free_value(struct val *);
> -int   is_integer(struct val *, int64_t *);
> -int   to_integer(struct val *);
> +int   is_integer(struct val *, int64_t *, const char **);
> +int   to_integer(struct val *, const char **);
>  void  to_string(struct val *);
>  int   is_zero_or_null(struct val *);
>  void  nexttoken(int);
> @@ -94,11 +94,9 @@ free_value(struct val *vp)
>  
>  /* determine if vp is an integer; if so, return it's value in *r */
>  int
> -is_integer(struct val *vp, int64_t *r)
> +is_integer(struct val *vp, int64_t *r, const char **errstr)
>  {
> - char   *s;
> - int neg;
> - int64_t i;
> + const char *errstrp;

Even if the current code doesn't rely on it, i think we should adopt
the good practice of strtonum(3) here of setting *errstr to NULL
on success, to make future bugs less likely.

Consider inserting at this point:

if (errstr == NULL)
errstr = 
*errstr = NULL;

>   if (vp->type == integer) {
>   *r = vp->u.i;
> @@ -107,43 +105,29 @@ is_integer(struct val *vp, int64_t *r)
>  
>   /*
>* POSIX.2 defines an "integer" as an optional unary minus
> -  * followed by digits.
> +  * followed by digits. Other representations are unspecified,
> +  * which means that strtonum(3) is a viable option here.
>*/
> - s = vp->u.s;
> - i = 0;
> + if (errstr == NULL)
> + errstr = 

These two lines can then be removed.

> + *r = strtonum(vp->u.s, INT64_MIN, INT64_MAX, errstr);
> + if (*errstr != NULL)
> + return 0;

This can be simplified to

return *errstr == NULL;

> - neg = (*s == '-');
> - if (neg)
> - s++;
> -
> - while (*s) {
> - if (!isdigit((unsigned char)*s))
> - return 0;
> -
> - i *= 10;
> - i += *s - '0';
> -
> - s++;
> - }
> -
> - if (neg)
> - i *= -1;
> -
> - *r = i;
>   return 1;
>  }
>  
>  
>  /* coerce to vp to an integer */
>  int
> -to_integer(struct val *vp)
> +to_integer(struct val *vp, const char **errstr)
>  {
>   int64_t r;
>  
>   if (vp->type == integer)
>   return 1;

For the reason explained above, insert here:

if (errstr != NULL)
*errstr = NULL;

> - if (is_integer(vp, )) {
> + if (is_integer(vp, , errstr)) {
>   free(vp->u.s);
>   vp->u.i = r;
>   vp->type = integer;
> @@ -176,7 +160,7 @@ is_zero_or_null(struct val *vp)
>   if (vp->type == integer)
>   return vp->u.i == 0;
>   else
> - return *vp->u.s == 0 || (to_integer(vp) && vp->u.i == 0);
> + return *vp->u.s == 0 || (to_integer(vp, NULL) && vp->u.i == 0);
>  }
>  
>  void
> @@ -303,20 +287,25 @@ eval5(void)
>  struct val *
>  eval4(void)
>  {
> - struct val *l, *r;
> - enum token  op;
> + const char  *errstr;
> + struct val  *l, *r;
> + enum token   op;
> + volatile int64_t res;
>  
>   l = eval5();
>   while ((op = token) == MUL || op == DIV || op == MOD) {
>   nexttoken(0);
>   r = eval5();
>  
> - if (!to_integer(l) || !to_integer(r)) {
> - errx(2, "non-numeric argument");
> + if (!to_integer(l, ) || !to_integer(r, )) {
> + errx(2, "%s", errstr);

I suggest to be more helpful here:

if (!to_integer(l, ))
errx(2, "number \"%s\" is %s", l->u.s, errstr);
if (!to_integer(r, ))
errx(2, "number \"%s\" is %s", r->u.s, errstr);

>   }
>  
>   

Re: expr: Fix integer overflows

2018-03-31 Thread Tobias Stoeckmann
Hi,

On Sat, Mar 31, 2018 at 02:57:45AM +0200, Ingo Schwarze wrote:
> Even though - as discussed previously for test(1) - behaviour is
> undefined by POSIX outside the range 0 to 2147483647, we are allowed
> to handle a larger range, and i agree that handling the range
> -9223372036854775808 to 9223372036854775807 (INT64_MIN to INT64_MAX)
> seems desirable.

I actually ended up in expr(1) after seeing that the test(1) and ksh(1)
debate could be continued here. While expr(1) is int64_t, expressions
in ksh(1) are of type long, i.e. 32/64 bit depending on architecture.

This patch therefore is a preparation to get test(1) and expr(1) changes
into ksh. As both utilities are standalone, it is much easier to discuss
and make modifications in them, first.

> > +   if (c < 0 || INT64_MAX / 10 < i || INT64_MAX - c < i * 10)
> > +   errx(3, "out of range");
> 
> I'm not saying this is an absolute show-stopper, but i consider
> it somewhat ugly.  What do you think about using strtonum(3)
> instead in is_integer?

Totally agree. Patch is adjusted. I tried to keep it as strict as it
was before, but you are correct. We don't violate POSIX with strtonum,
we just accept a few more inputs -- being in sync with ksh and test.

>$ /obin/expr.tobias -36854775808 - \( -9223372036854775807 - 1 \)
>   expr.tobias: overflow
> 
>   if ((l->u.i >= 0 && r->u.i < 0 && res <= 0) ||

I have adjusted the code. Awesome review skills and thanks a lot! Your
example also triggers an overflow in FreeBSD's expr, so it is a good
idea to inform them when we are happy with our fixed version.

> Do you want to check and re-send your patch?

Most certainly. Adjusted patch and extended regress tests in this mail.
Thanks a lot again!


Tobias

Index: bin/expr/expr.c
===
RCS file: /cvs/src/bin/expr/expr.c,v
retrieving revision 1.26
diff -u -p -u -p -r1.26 expr.c
--- bin/expr/expr.c 19 Oct 2016 18:20:25 -  1.26
+++ bin/expr/expr.c 31 Mar 2018 11:22:05 -
@@ -19,8 +19,8 @@
 struct val *make_int(int64_t);
 struct val *make_str(char *);
 voidfree_value(struct val *);
-int is_integer(struct val *, int64_t *);
-int to_integer(struct val *);
+int is_integer(struct val *, int64_t *, const char **);
+int to_integer(struct val *, const char **);
 voidto_string(struct val *);
 int is_zero_or_null(struct val *);
 voidnexttoken(int);
@@ -94,11 +94,9 @@ free_value(struct val *vp)
 
 /* determine if vp is an integer; if so, return it's value in *r */
 int
-is_integer(struct val *vp, int64_t *r)
+is_integer(struct val *vp, int64_t *r, const char **errstr)
 {
-   char   *s;
-   int neg;
-   int64_t i;
+   const char *errstrp;
 
if (vp->type == integer) {
*r = vp->u.i;
@@ -107,43 +105,29 @@ is_integer(struct val *vp, int64_t *r)
 
/*
 * POSIX.2 defines an "integer" as an optional unary minus
-* followed by digits.
+* followed by digits. Other representations are unspecified,
+* which means that strtonum(3) is a viable option here.
 */
-   s = vp->u.s;
-   i = 0;
+   if (errstr == NULL)
+   errstr = 
+   *r = strtonum(vp->u.s, INT64_MIN, INT64_MAX, errstr);
+   if (*errstr != NULL)
+   return 0;
 
-   neg = (*s == '-');
-   if (neg)
-   s++;
-
-   while (*s) {
-   if (!isdigit((unsigned char)*s))
-   return 0;
-
-   i *= 10;
-   i += *s - '0';
-
-   s++;
-   }
-
-   if (neg)
-   i *= -1;
-
-   *r = i;
return 1;
 }
 
 
 /* coerce to vp to an integer */
 int
-to_integer(struct val *vp)
+to_integer(struct val *vp, const char **errstr)
 {
int64_t r;
 
if (vp->type == integer)
return 1;
 
-   if (is_integer(vp, )) {
+   if (is_integer(vp, , errstr)) {
free(vp->u.s);
vp->u.i = r;
vp->type = integer;
@@ -176,7 +160,7 @@ is_zero_or_null(struct val *vp)
if (vp->type == integer)
return vp->u.i == 0;
else
-   return *vp->u.s == 0 || (to_integer(vp) && vp->u.i == 0);
+   return *vp->u.s == 0 || (to_integer(vp, NULL) && vp->u.i == 0);
 }
 
 void
@@ -303,20 +287,25 @@ eval5(void)
 struct val *
 eval4(void)
 {
-   struct val *l, *r;
-   enum token  op;
+   const char  *errstr;
+   struct val  *l, *r;
+   enum token   op;
+   volatile int64_t res;
 
l = eval5();
while ((op = token) == MUL || op == DIV || op == MOD) {
nexttoken(0);
r = eval5();
 
-   if (!to_integer(l) || !to_integer(r)) {
-   errx(2, "non-numeric 

Re: expr: Fix integer overflows

2018-03-30 Thread Ingo Schwarze
Hi Tobias,

Tobias Stoeckmann wrote on Fri, Mar 30, 2018 at 05:39:11PM +0200:

> Our expr implementation supports 64 bit integers, but does not check
> for overflows during parsing and arithmetical operations.

Even though - as discussed previously for test(1) - behaviour is
undefined by POSIX outside the range 0 to 2147483647, we are allowed
to handle a larger range, and i agree that handling the range
-9223372036854775808 to 9223372036854775807 (INT64_MIN to INT64_MAX)
seems desirable.

> This patch fixes the problems based on FreeBSD's implementation, which
> is a bit more advanced than NetBSD, which it is based upon.

But i have two concerns regarding your implementation, see inline below.

> Index: bin/expr/expr.c
> ===
> RCS file: /cvs/src/bin/expr/expr.c,v
> retrieving revision 1.26
> diff -u -p -u -p -r1.26 expr.c
> --- bin/expr/expr.c   19 Oct 2016 18:20:25 -  1.26
> +++ bin/expr/expr.c   30 Mar 2018 15:33:50 -
> @@ -99,6 +99,7 @@ is_integer(struct val *vp, int64_t *r)
>   char   *s;
>   int neg;
>   int64_t i;
> + charc;
>  
>   if (vp->type == integer) {
>   *r = vp->u.i;
> @@ -120,8 +121,12 @@ is_integer(struct val *vp, int64_t *r)
>   if (!isdigit((unsigned char)*s))
>   return 0;
>  
> + c = *s - '0';
> + if (c < 0 || INT64_MAX / 10 < i || INT64_MAX - c < i * 10)
> + errx(3, "out of range");

Here is the first concern.

First look at our old implementation:

   $ /obin/expr -9223372036854775807 + 0
  -9223372036854775807

   $ /obin/expr -9223372036854775807 - 1
  -9223372036854775808
   $ /obin/expr -9223372036854775808 + 0 
  -9223372036854775808

   $ /obin/expr -9223372036854775807 - 2
  9223372036854775807
   $ /obin/expr -9223372036854775808 - 1 
  9223372036854775807
   $ /obin/expr -9223372036854775809 + 0 
  9223372036854775807

The first three are expected; the overflow in the last three is not.

Now look at what your new implementation does:

   $ /obin/expr.tobias -9223372036854775807 + 0
  -9223372036854775807

   $ /obin/expr.tobias -9223372036854775807 - 1 
  -9223372036854775808
   $ /obin/expr.tobias -9223372036854775808 + 0 
  expr.tobias: out of range

   $ /obin/expr.tobias -9223372036854775807 - 2 
  expr.tobias: overflow
   $ /obin/expr.tobias -9223372036854775808 - 1 
  expr.tobias: out of range
   $ /obin/expr.tobias -9223372036854775809 + 0 
  expr.tobias: out of range

The first two are expected, and so is the overflow in number 4
and the out of range in number 6.  But the out of range in
numbers 3 and 5 is unexpected.  I would expect number 3 to
succeed and number 5 to fail with overflow rather than out of range.

The reason is that the hand-rolled integer parser fails for INT64_MIN
because -INT64_MIN is larger than INT64_MAX.

I'm not saying this is an absolute show-stopper, but i consider
it somewhat ugly.  What do you think about using strtonum(3)
instead in is_integer?  long long is guaranteed to be at least
64 bits long, and that would solve the problem with INT64_MIN.

> +
>   i *= 10;
> - i += *s - '0';
> + i += c;
>  
>   s++;
>   }
> @@ -303,8 +308,9 @@ eval5(void)
>  struct val *
>  eval4(void)
>  {
> - struct val *l, *r;
> - enum token  op;
> + struct val  *l, *r;
> + enum token   op;
> + volatile int64_t res;
>  
>   l = eval5();
>   while ((op = token) == MUL || op == DIV || op == MOD) {
> @@ -316,7 +322,10 @@ eval4(void)
>   }
>  
>   if (op == MUL) {
> - l->u.i *= r->u.i;
> + res = l->u.i * r->u.i;
> + if (r->u.i != 0 && l->u.i != res / r->u.i)
> + errx(3, "overflow");
> + l->u.i = res;
>   } else {
>   if (r->u.i == 0) {
>   errx(2, "division by zero");
> @@ -324,6 +333,8 @@ eval4(void)
>   if (op == DIV) {
>   if (l->u.i != INT64_MIN || r->u.i != -1)
>   l->u.i /= r->u.i;
> + else
> + errx(3, "overflow");
>   } else {
>   if (l->u.i != INT64_MIN || r->u.i != -1)
>   l->u.i %= r->u.i;
> @@ -342,8 +353,9 @@ eval4(void)
>  struct val *
>  eval3(void)
>  {
> - struct val *l, *r;
> - enum token  op;
> + struct val  *l, *r;
> + enum token   op;
> + volatile int64_t res;
>  
>   l = eval4();
>   while ((op = token) == ADD || op == SUB) {
> @@ -355,9 +367,18 @@ eval3(void)
>   }
>  
>   if (op == ADD) {
> - l->u.i += r->u.i;

expr: Fix integer overflows

2018-03-30 Thread Tobias Stoeckmann
Our expr implementation supports 64 bit integers, but does not check
for overflows during parsing and arithmetical operations.

This patch fixes the problems based on FreeBSD's implementation, which
is a bit more advanced than NetBSD, which it is based upon.

The added regression test case is taken from NetBSD and passes with
this patch.

Okay?


Tobias

Index: regress/bin/Makefile
===
RCS file: /cvs/src/regress/bin/Makefile,v
retrieving revision 1.13
diff -u -p -u -p -r1.13 Makefile
--- regress/bin/Makefile14 Jan 2018 22:04:47 -  1.13
+++ regress/bin/Makefile30 Mar 2018 15:33:50 -
@@ -1,6 +1,6 @@
 #  $OpenBSD: Makefile,v 1.13 2018/01/14 22:04:47 bluhm Exp $
 
-SUBDIR+= cat chmod csh ed ksh ln md5 pax ps test
+SUBDIR+= cat chmod csh ed expr ksh ln md5 pax ps test
 
 install:
 
Index: bin/expr/expr.c
===
RCS file: /cvs/src/bin/expr/expr.c,v
retrieving revision 1.26
diff -u -p -u -p -r1.26 expr.c
--- bin/expr/expr.c 19 Oct 2016 18:20:25 -  1.26
+++ bin/expr/expr.c 30 Mar 2018 15:33:50 -
@@ -99,6 +99,7 @@ is_integer(struct val *vp, int64_t *r)
char   *s;
int neg;
int64_t i;
+   charc;
 
if (vp->type == integer) {
*r = vp->u.i;
@@ -120,8 +121,12 @@ is_integer(struct val *vp, int64_t *r)
if (!isdigit((unsigned char)*s))
return 0;
 
+   c = *s - '0';
+   if (c < 0 || INT64_MAX / 10 < i || INT64_MAX - c < i * 10)
+   errx(3, "out of range");
+
i *= 10;
-   i += *s - '0';
+   i += c;
 
s++;
}
@@ -303,8 +308,9 @@ eval5(void)
 struct val *
 eval4(void)
 {
-   struct val *l, *r;
-   enum token  op;
+   struct val  *l, *r;
+   enum token   op;
+   volatile int64_t res;
 
l = eval5();
while ((op = token) == MUL || op == DIV || op == MOD) {
@@ -316,7 +322,10 @@ eval4(void)
}
 
if (op == MUL) {
-   l->u.i *= r->u.i;
+   res = l->u.i * r->u.i;
+   if (r->u.i != 0 && l->u.i != res / r->u.i)
+   errx(3, "overflow");
+   l->u.i = res;
} else {
if (r->u.i == 0) {
errx(2, "division by zero");
@@ -324,6 +333,8 @@ eval4(void)
if (op == DIV) {
if (l->u.i != INT64_MIN || r->u.i != -1)
l->u.i /= r->u.i;
+   else
+   errx(3, "overflow");
} else {
if (l->u.i != INT64_MIN || r->u.i != -1)
l->u.i %= r->u.i;
@@ -342,8 +353,9 @@ eval4(void)
 struct val *
 eval3(void)
 {
-   struct val *l, *r;
-   enum token  op;
+   struct val  *l, *r;
+   enum token   op;
+   volatile int64_t res;
 
l = eval4();
while ((op = token) == ADD || op == SUB) {
@@ -355,9 +367,18 @@ eval3(void)
}
 
if (op == ADD) {
-   l->u.i += r->u.i;
+   res = l->u.i + r->u.i;
+   if ((l->u.i > 0 && r->u.i > 0 && res <= 0) ||
+   (l->u.i < 0 && r->u.i < 0 && res >= 0))
+   errx(3, "overflow");
+   l->u.i = res;
} else {
-   l->u.i -= r->u.i;
+   res = l->u.i - r->u.i;
+   if ((r->u.i == INT64_MIN && l->u.i < 0) ||
+   (l->u.i > 0 && r->u.i < 0 && res <= 0) ||
+   (l->u.i < 0 && r->u.i > 0 && res >= 0))
+   errx(3, "overflow");
+   l->u.i = res;
}
 
free_value(r);
Index: regress/bin/expr/Makefile
===
RCS file: regress/bin/expr/Makefile
diff -N regress/bin/expr/Makefile
--- /dev/null   1 Jan 1970 00:00:00 -
+++ regress/bin/expr/Makefile   30 Mar 2018 15:33:50 -
@@ -0,0 +1,8 @@
+# $OpenBSD$
+
+REGRESS_TARGETS = expr
+
+expr:
+   sh ${.CURDIR}/expr.sh
+
+.include 
Index: regress/bin/expr/expr.sh
===
RCS file: regress/bin/expr/expr.sh
diff -N regress/bin/expr/expr.sh
--- /dev/null   1 Jan 1970 00:00:00 -
+++ regress/bin/expr/expr.sh30 Mar 2018 15:33:50 -
@@ -0,0 +1,98 @@
+#!/bin/ksh
+
+: ${EXPR=expr}
+
+function test_expr {
+   #echo "Testing: `eval echo $1`"
+   res=`eval $EXPR $1 2>&1`