Re: Standard conformance of strtol(3)

2017-07-06 Thread Gerhard Roth

On 06.07.2017 17:53, Todd C. Miller wrote:
> On Thu, 06 Jul 2017 07:37:19 -0600, "Todd C. Miller" wrote:
>
> > glibc strtol() behavior:
> > AIX
> > FreeBSD
> > GNU/Linux
> > Solaris
> > macOS
>
> SunOS 4.1.3 has the same behavior as Solaris.  That's as far back
> as I care to go.
>
>  - todd



FWIW: AT&T's SVR4 from 1988 had the same behaviour ;)


Gerhard



Re: Standard conformance of strtol(3)

2017-07-06 Thread Marc Espie
On Thu, Jul 06, 2017 at 06:06:09PM +0200, Joerg Sonnenberger wrote:
> On Thu, Jul 06, 2017 at 03:42:19PM +0200, Marc Espie wrote:
> > 7.20.1.4 (3) If the value of base is zero, the expected form of the subject
> > sequence is that of an integer constant *as described in 6.4.4.1*, optionally
> > preceded by a plus or minus sign but not including an integer suffix [...]
> > 
> > 6.4.4.1 is a grammar
> > 
> > 
> > integer-constant:
> > [...]
> > hexadecimal-constant  integer-suffix_opt
> 
> You have skipped with [...] the part that actually matches "0". So yes,
> the ISO C grammar does require strtol and friends to handle "0xx" and
> similar as just "0".

Yep, since the goal was just to show that 0x was not a valid number in ISO C.

Duh!



Re: Standard conformance of strtol(3)

2017-07-06 Thread Todd C. Miller
I've just committed a fix for this.

 - todd



Re: Standard conformance of strtol(3)

2017-07-06 Thread Joerg Sonnenberger
On Thu, Jul 06, 2017 at 03:42:19PM +0200, Marc Espie wrote:
> 7.20.1.4 (3) If the value of base is zero, the expected form of the subject
> sequence is that of an integer constant *as described in 6.4.4.1*, optionally
> preceded by a plus or minus sign but not including an integer suffix [...]
> 
> 6.4.4.1 is a grammar
> 
> 
> integer-constant:
>   [...]
>   hexadecimal-constant  integer-suffix_opt

You have skipped with [...] the part that actually matches "0". So yes,
the ISO C grammar does require strtol and friends to handle "0xx" and
similar as just "0".

Joerg



Re: Standard conformance of strtol(3)

2017-07-06 Thread Todd C. Miller
On Thu, 06 Jul 2017 07:37:19 -0600, "Todd C. Miller" wrote:

> Sorry, HP-UX actually has the same behavior as us.  Here's what I
> have so far:
> 
> 4.4BSD strtol() behavior:
> HP-UX
> NetBSD
> OpenBSD

strtol(3) was added to BSD in 4.3-Reno along with the rest of the
Torek libc.

> glibc strtol() behavior:
> AIX
> FreeBSD
> GNU/Linux
> Solaris
> macOS

SunOS 4.1.3 has the same behavior as Solaris.  That's as far back
as I care to go.

 - todd



Re: Standard conformance of strtol(3)

2017-07-06 Thread Bryan Steele
On Thu, Jul 06, 2017 at 07:37:19AM -0600, Todd C. Miller wrote:
> Sorry, HP-UX actually has the same behavior as us.  Here's what I
> have so far:
> 
> 4.4BSD strtol() behavior:
> HP-UX
> NetBSD
> OpenBSD

awolk@ and I found that DragonFly also shares this
behaviour:

http://gitweb.dragonflybsd.org/dragonfly.git/blob/HEAD:/lib/libc/stdlib/_strtol.h

> glibc strtol() behavior:
> AIX
> FreeBSD
> GNU/Linux
> Solaris
> macOS
> 
>  - todd



Re: Standard conformance of strtol(3)

2017-07-06 Thread Marc Espie
On Wed, Jul 05, 2017 at 07:12:28PM -0400, Ted Unangst wrote:
> Olivier Antoine wrote:
> > Hi all,
> > 
> > Recently a bug has been identified in Tor:
> > 
> > https://trac.torproject.org/projects/tor/ticket/22789
> > 
> > As comments were made, questions were raised about the use of strtol(3),
> > the different interpretations of the standard and their implementation.
> > 
> > To summarize, the question revolves around the processing of strings in
> > base=16 and with the optional prefix '0x'.
> > 
> > l = strtol ("0xquux", & rest, 16);
> > 
> > Produce
> > l=0 rest=0xquux on OpenBSD
> > l=0 rest=xquux on Linux
> > 
> > Do specialists of the standard or developers have an opinion on this point
> > of detail?
> > Is there a defined behavior?
> 
> My opinion is that well written code would avoid feeding ambiguous strings to
> strtol. Today it's 0xquux and tomorrow it's 0xaquux and now you have a
> problem.
> 
> But, let's read 
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/strtol.html
> 
> It's actually unclear IMO. But I don't see anything prohibiting interpreting
> the string as an optional prefix with an empty body.

Well:

The functionality described on this reference page is aligned with the ISO C 
standard. Any conflict between the requirements described here and the ISO C 
standard is unintentional. This volume of POSIX.1-2008 defers to the ISO C 
standard.


This is Sparta^WISO C we're talking about.  The wording of posix is actually
irrelevant.

ISO C 99 is waaays clearer.

7.20.1.4 (3) If the value of base is zero, the expected form of the subject
sequence is that of an integer constant *as described in 6.4.4.1*, optionally
preceded by a plus or minus sign but not including an integer suffix [...]

7.20.1.4 (4) The subject sequence is defined as the longest initial subsequence
of the input string [...] *that is of the expected form*.


6.4.4.1 is a grammar


integer-constant:
[...]
hexadecimal-constant  integer-suffix_opt

hexadecimal-constant:
hexadecimal-prefix hexadecimal-digit
hexadecimal-constant hexadecimal-digit

There is no wiggle room there.


That grammar is explicit that there must be at least one hexadecimal digit
after the prefix.
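
To make that reading concrete, here is a minimal sketch of a recognizer
for the base 16 subject sequence. It assumes the "prefix needs at least
one hex digit" rule from the grammar above carries over to base 16, as
this thread concludes. The name hex_subject_len() is made up for
illustration and is not part of any libc; s is expected to start after
any leading white space.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: length of the longest initial part of s that has
 * the expected form for base 16, i.e. an optional sign, an optional
 * 0x/0X prefix that counts only when a hex digit follows it, and at
 * least one hex digit.  Returns 0 when there is no valid subject
 * sequence at all.
 */
static size_t
hex_subject_len(const char *s)
{
	size_t i = 0, first_digit;

	assert(s != NULL);
	if (s[i] == '+' || s[i] == '-')
		i++;
	/* The prefix is part of the sequence only if a hex digit follows. */
	if (s[i] == '0' && (s[i + 1] == 'x' || s[i + 1] == 'X') &&
	    isxdigit((unsigned char)s[i + 2]))
		i += 2;
	first_digit = i;
	while (isxdigit((unsigned char)s[i]))
		i++;
	/* A sign or prefix without any digit is not a subject sequence. */
	return (i == first_digit) ? 0 : i;
}
```

With this reading, "0xquux" yields a length of 1: only the leading "0"
is consumed and the rest starts at "xquux".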



Re: Standard conformance of strtol(3)

2017-07-06 Thread Todd C. Miller
Sorry, HP-UX actually has the same behavior as us.  Here's what I
have so far:

4.4BSD strtol() behavior:
HP-UX
NetBSD
OpenBSD

glibc strtol() behavior:
AIX
FreeBSD
GNU/Linux
Solaris
macOS

 - todd



Re: Standard conformance of strtol(3)

2017-07-06 Thread Todd C. Miller
Solaris, AIX, and HP-UX all have the same behavior as glibc.
We are the outlier.

Since the standard is clear that the 0x/0X prefix is optional I
believe our behavior is wrong.

 - todd



Re: Standard conformance of strtol(3)

2017-07-05 Thread Theo de Raadt
> Olivier Antoine wrote:
> > Hi all,
> > 
> > Recently a bug has been identified in Tor:
> > 
> > https://trac.torproject.org/projects/tor/ticket/22789
> > 
> > As comments were made, questions were raised about the use of strtol(3),
> > the different interpretations of the standard and their implementation.
> > 
> > To summarize, the question revolves around the processing of strings in
> > base=16 and with the optional prefix '0x'.
> > 
> > l = strtol ("0xquux", & rest, 16);
> > 
> > Produce
> > l=0 rest=0xquux on OpenBSD
> > l=0 rest=xquux on Linux
> > 
> > Do specialists of the standard or developers have an opinion on this point
> > of detail?
> > Is there a defined behavior?
> 
> My opinion is that well written code would avoid feeding ambiguous strings to
> strtol. Today it's 0xquux and tomorrow it's 0xaquux and now you have a
> problem.
> 
> But, let's read 
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/strtol.html
> 
> It's actually unclear IMO. But I don't see anything prohibiting interpreting
> the string as an optional prefix with an empty body.
> 
> I'm inclined to say that strtol parsing should involve minimal lookahead and
> backtracking. So if it sees 0x, it thinks hex prefix, and then parses the
> rest. It doesn't try parsing the rest, fail, and then backtrack and start over
> with a new parse strategy.

What does the original code from AT&T do?

Does the BSD 4.0 code do the same?

How about the BSD 4.2 code?

How about the BSD 4.4 code?

How about all the vendors who simply used that code unmodified?

Who is the outlier?

Is it glibc?

Is it possible the spec wasn't written in a proscriptive fashion?

How much code breaks as a result?

I expect some language lawyering will happen over this in the next year.

I wonder if some people are going to say "the original way of doing
this is so wrong, glibc does it so much better, and the written text
lets us get away with it no matter how much code it affects".

Always fun.



Re: Standard conformance of strtol(3)

2017-07-05 Thread Ted Unangst
Olivier Antoine wrote:
> Hi all,
> 
> Recently a bug has been identified in Tor:
> 
> https://trac.torproject.org/projects/tor/ticket/22789
> 
> As comments were made, questions were raised about the use of strtol(3),
> the different interpretations of the standard and their implementation.
> 
> To summarize, the question revolves around the processing of strings in
> base=16 and with the optional prefix '0x'.
> 
> l = strtol ("0xquux", & rest, 16);
> 
> Produce
> l=0 rest=0xquux on OpenBSD
> l=0 rest=xquux on Linux
> 
> Do specialists of the standard or developers have an opinion on this point
> of detail?
> Is there a defined behavior?

My opinion is that well written code would avoid feeding ambiguous strings to
strtol. Today it's 0xquux and tomorrow it's 0xaquux and now you have a
problem.

But, let's read 
http://pubs.opengroup.org/onlinepubs/9699919799/functions/strtol.html

It's actually unclear IMO. But I don't see anything prohibiting interpreting
the string as an optional prefix with an empty body.

I'm inclined to say that strtol parsing should involve minimal lookahead and
backtracking. So if it sees 0x, it thinks hex prefix, and then parses the
rest. It doesn't try parsing the rest, fail, and then backtrack and start over
with a new parse strategy.
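
One way a caller might avoid depending on either interpretation is to
reject any input that strtol() does not consume completely. A minimal
sketch, assuming that trailing garbage should be treated as an error;
parse_hex_strict() is a hypothetical name, not an existing API:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper: accept s only if strtol() consumed at least one
 * character and stopped at the terminating NUL.  Returns 0 and stores
 * the value in *out on success, -1 otherwise.  Leading white space and
 * a sign are still accepted, because strtol() itself accepts them.
 */
static int
parse_hex_strict(const char *s, long *out)
{
	char *end;
	long v;

	assert(s != NULL && out != NULL);
	errno = 0;
	v = strtol(s, &end, 16);
	if (end == s || *end != '\0' || errno == ERANGE)
		return -1;
	*out = v;
	return 0;
}
```

Both "0xquux" and "0xaquux" are rejected this way, whichever of the two
behaviours the libc implements, since in every case parsing stops before
the end of the string.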





Re: Standard conformance of strtol(3)

2017-07-05 Thread Todd C. Miller
C99 states that the 0x or 0X prefix is optional so we should only
consume the prefix if the following character is a valid hex char.

This is equivalent to the fix in FreeBSD but I used isxdigit(3).

 - todd

Index: lib/libc/stdlib/strtoimax.c
===
RCS file: /cvs/src/lib/libc/stdlib/strtoimax.c,v
retrieving revision 1.3
diff -u -p -u -r1.3 strtoimax.c
--- lib/libc/stdlib/strtoimax.c 12 Sep 2015 16:23:14 -  1.3
+++ lib/libc/stdlib/strtoimax.c 5 Jul 2017 22:58:33 -
@@ -74,8 +74,8 @@ strtoimax(const char *nptr, char **endpt
 		if (c == '+')
 			c = *s++;
 	}
-	if ((base == 0 || base == 16) &&
-	    c == '0' && (*s == 'x' || *s == 'X')) {
+	if ((base == 0 || base == 16) && c == '0' &&
+	    (*s == 'x' || *s == 'X') && isxdigit((unsigned char)s[1])) {
 		c = s[1];
 		s += 2;
 		base = 16;
Index: lib/libc/stdlib/strtol.c
===
RCS file: /cvs/src/lib/libc/stdlib/strtol.c,v
retrieving revision 1.11
diff -u -p -u -r1.11 strtol.c
--- lib/libc/stdlib/strtol.c	13 Sep 2015 08:31:48 -  1.11
+++ lib/libc/stdlib/strtol.c	5 Jul 2017 22:59:13 -
@@ -75,8 +75,8 @@ strtol(const char *nptr, char **endptr, 
 		if (c == '+')
 			c = *s++;
 	}
-	if ((base == 0 || base == 16) &&
-	    c == '0' && (*s == 'x' || *s == 'X')) {
+	if ((base == 0 || base == 16) && c == '0' &&
+	    (*s == 'x' || *s == 'X') && isxdigit((unsigned char)s[1])) {
 		c = s[1];
 		s += 2;
 		base = 16;
Index: lib/libc/stdlib/strtoll.c
===
RCS file: /cvs/src/lib/libc/stdlib/strtoll.c,v
retrieving revision 1.9
diff -u -p -u -r1.9 strtoll.c
--- lib/libc/stdlib/strtoll.c   13 Sep 2015 08:31:48 -  1.9
+++ lib/libc/stdlib/strtoll.c   5 Jul 2017 22:59:13 -
@@ -77,8 +77,8 @@ strtoll(const char *nptr, char **endptr,
 		if (c == '+')
 			c = *s++;
 	}
-	if ((base == 0 || base == 16) &&
-	    c == '0' && (*s == 'x' || *s == 'X')) {
+	if ((base == 0 || base == 16) && c == '0' &&
+	    (*s == 'x' || *s == 'X') && isxdigit((unsigned char)s[1])) {
 		c = s[1];
 		s += 2;
 		base = 16;
Index: lib/libc/stdlib/strtoul.c
===
RCS file: /cvs/src/lib/libc/stdlib/strtoul.c,v
retrieving revision 1.10
diff -u -p -u -r1.10 strtoul.c
--- lib/libc/stdlib/strtoul.c   13 Sep 2015 08:31:48 -  1.10
+++ lib/libc/stdlib/strtoul.c   5 Jul 2017 22:59:13 -
@@ -69,8 +69,8 @@ strtoul(const char *nptr, char **endptr,
 		if (c == '+')
 			c = *s++;
 	}
-	if ((base == 0 || base == 16) &&
-	    c == '0' && (*s == 'x' || *s == 'X')) {
+	if ((base == 0 || base == 16) && c == '0' &&
+	    (*s == 'x' || *s == 'X') && isxdigit((unsigned char)s[1])) {
 		c = s[1];
 		s += 2;
 		base = 16;
Index: lib/libc/stdlib/strtoull.c
===
RCS file: /cvs/src/lib/libc/stdlib/strtoull.c,v
retrieving revision 1.8
diff -u -p -u -r1.8 strtoull.c
--- lib/libc/stdlib/strtoull.c  13 Sep 2015 08:31:48 -  1.8
+++ lib/libc/stdlib/strtoull.c  5 Jul 2017 22:59:13 -
@@ -71,8 +71,8 @@ strtoull(const char *nptr, char **endptr
 		if (c == '+')
 			c = *s++;
 	}
-	if ((base == 0 || base == 16) &&
-	    c == '0' && (*s == 'x' || *s == 'X')) {
+	if ((base == 0 || base == 16) && c == '0' &&
+	    (*s == 'x' || *s == 'X') && isxdigit((unsigned char)s[1])) {
 		c = s[1];
 		s += 2;
 		base = 16;
Index: lib/libc/stdlib/strtoumax.c
===
RCS file: /cvs/src/lib/libc/stdlib/strtoumax.c,v
retrieving revision 1.3
diff -u -p -u -r1.3 strtoumax.c
--- lib/libc/stdlib/strtoumax.c 12 Sep 2015 16:23:14 -  1.3
+++ lib/libc/stdlib/strtoumax.c 5 Jul 2017 22:59:13 -
@@ -68,8 +68,8 @@ strtoumax(const char *nptr, char **endpt
 		if (c == '+')
 			c = *s++;
 	}
-	if ((base == 0 || base == 16) &&
-	    c == '0' && (*s == 'x' || *s == 'X')) {
+	if ((base == 0 || base == 16) && c == '0' &&
+	    (*s == 'x' || *s == 'X') && isxdigit((unsigned char)s[1])) {
 		c = s[1];
 		s += 2;
 		base = 16;



Standard conformance of strtol(3)

2017-07-05 Thread Olivier Antoine
Hi all,

Recently a bug has been identified in Tor:

https://trac.torproject.org/projects/tor/ticket/22789

As comments were made, questions were raised about the use of strtol(3),
the different interpretations of the standard and their implementation.

To summarize, the question revolves around the processing of strings in
base=16 and with the optional prefix '0x'.

l = strtol ("0xquux", & rest, 16);

Produce
l=0 rest=0xquux on OpenBSD
l=0 rest=xquux on Linux

Do specialists of the standard or developers have an opinion on this point
of detail?
Is there a defined behavior?
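
For anyone who wants to check their own platform, here is a small
sketch of the test. The helper name probe() is made up for this mail
and does not come from the Tor ticket; it only wraps the call shown
above and prints the result in the same form.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: run strtol() in base 16 on s, print the value
 * and the unparsed remainder, and hand the remainder back via *rest.
 */
static long
probe(const char *s, const char **rest)
{
	char *end;
	long l;

	assert(s != NULL && rest != NULL);
	l = strtol(s, &end, 16);
	*rest = end;
	printf("l=%ld rest=%s\n", l, end);
	return l;
}
```

Calling probe("0xquux", &rest) should print one of the two lines listed
above, depending on which behaviour the libc implements.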

-- 
Olivier