Re: Standard conformance of strtol(3)
On 06.07.2017 17:53, Todd C. Miller wrote:
> On Thu, 06 Jul 2017 07:37:19 -0600, "Todd C. Miller" wrote:
> > glibc strtol() behavior:
> >     AIX
> >     FreeBSD
> >     GNU/Linux
> >     Solaris
> >     macOS
>
> SunOS 4.1.3 has the same behavior as Solaris. That's as far back
> as I care to go.
>
>  - todd

FWIW: AT&T's SVR4 from 1988 had the same behaviour ;)

Gerhard
Re: Standard conformance of strtol(3)
On Thu, Jul 06, 2017 at 06:06:09PM +0200, Joerg Sonnenberger wrote:
> On Thu, Jul 06, 2017 at 03:42:19PM +0200, Marc Espie wrote:
> > 7.20.1.4 (3) If the value of base is zero, the expected form of the subject
> > sequence is that of an integer constant *as described in 6.4.4.1*,
> > optionally preceded by a plus or minus sign but not including an integer
> > suffix [...]
> >
> > 6.4.4.1 is a grammar
> >
> > integer-constant:
> >     [...]
> >     hexadecimal-constant integer-suffix_opt
>
> You have skipped with [...] the part that actually matches "0". So yes,
> the ISO C grammar does require strtol and friends to handle "0xx" and
> similar as just "0".

Yep, since the goal was just to show that 0x was not a valid number in
ISO C. Duh!
Re: Standard conformance of strtol(3)
I've just committed a fix for this. - todd
Re: Standard conformance of strtol(3)
On Thu, Jul 06, 2017 at 03:42:19PM +0200, Marc Espie wrote:
> 7.20.1.4 (3) If the value of base is zero, the expected form of the subject
> sequence is that of an integer constant *as described in 6.4.4.1*, optionally
> preceded by a plus or minus sign but not including an integer suffix [...]
>
> 6.4.4.1 is a grammar
>
> integer-constant:
>     [...]
>     hexadecimal-constant integer-suffix_opt

You have skipped with [...] the part that actually matches "0". So yes,
the ISO C grammar does require strtol and friends to handle "0xx" and
similar as just "0".

Joerg
Re: Standard conformance of strtol(3)
On Thu, 06 Jul 2017 07:37:19 -0600, "Todd C. Miller" wrote:
> Sorry, HP-UX actually has the same behavior as us. Here's what I
> have so far:
>
> 4.4BSD strtol() behavior:
>     HP-UX
>     NetBSD
>     OpenBSD

strtol(3) was added to BSD in 4.3-Reno along with the rest of the
Torek libc.

> glibc strtol() behavior:
>     AIX
>     FreeBSD
>     GNU/Linux
>     Solaris
>     macOS

SunOS 4.1.3 has the same behavior as Solaris. That's as far back
as I care to go.

 - todd
Re: Standard conformance of strtol(3)
On Thu, Jul 06, 2017 at 07:37:19AM -0600, Todd C. Miller wrote:
> Sorry, HP-UX actually has the same behavior as us. Here's what I
> have so far:
>
> 4.4BSD strtol() behavior:
>     HP-UX
>     NetBSD
>     OpenBSD

I had discovered with awolk@ that DragonFly also shares this behaviour:
http://gitweb.dragonflybsd.org/dragonfly.git/blob/HEAD:/lib/libc/stdlib/_strtol.h

> glibc strtol() behavior:
>     AIX
>     FreeBSD
>     GNU/Linux
>     Solaris
>     macOS
>
>  - todd
Re: Standard conformance of strtol(3)
On Wed, Jul 05, 2017 at 07:12:28PM -0400, Ted Unangst wrote:
> Olivier Antoine wrote:
> > Hi all,
> >
> > Recently a bug has been identified in Tor:
> >
> > https://trac.torproject.org/projects/tor/ticket/22789
> >
> > As comments were made, questions were raised about the use of strtol(3),
> > the different interpretations of the standard and their implementation.
> >
> > To summarize, the question revolves around the processing of strings in
> > base=16 and with the optional prefix '0x'.
> >
> > l = strtol ("0xquux", & rest, 16);
> >
> > Produce
> > l=0 rest=0xquux on OpenBSD
> > l=0 rest=xquux on Linux
> >
> > Do specialists of the standard or developers have an opinion on this point
> > of detail?
> > Is there a defined behavior?
>
> My opinion is that well written code would avoid feeding ambiguous strings to
> strtol. Today it's 0xquux and tomorrow it's 0xaquux and now you have a
> problem.
>
> But, let's read
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/strtol.html
>
> It's actually unclear IMO. But I don't see anything prohibiting interpreting
> the string as an optional prefix with an empty body.

Well:

    The functionality described on this reference page is aligned with
    the ISO C standard. Any conflict between the requirements described
    here and the ISO C standard is unintentional. This volume of
    POSIX.1-2008 defers to the ISO C standard.

This is Sparta^WISO C we're talking about. The wording of POSIX is
actually irrelevant.

ISO C 99 is waaay clearer.

    7.20.1.4 (3) If the value of base is zero, the expected form of the
    subject sequence is that of an integer constant *as described in
    6.4.4.1*, optionally preceded by a plus or minus sign but not
    including an integer suffix [...]

    7.20.1.4 (4) The subject sequence is defined as the longest initial
    subsequence of the input string [...] *that is of the expected form*.

6.4.4.1 is a grammar

    integer-constant:
        [...]
        hexadecimal-constant integer-suffix_opt

    hexadecimal-constant:
        hexadecimal-prefix hexadecimal-digit
        hexadecimal-constant hexadecimal-digit

There is no wiggle room there. That grammar is explicit that there must
be at least one hexadecimal digit after the prefix.
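For readers who want the grammar argument in executable form, here is a
minimal sketch of the "longest initial subsequence of the expected form"
rule, applied to base 16. The helper name hex_subject_len() is hypothetical
and appears nowhere in the thread or in libc; it skips white space handling
and only illustrates that a 0x/0X prefix counts when at least one
hexadecimal digit follows it.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from any libc): length of the longest initial
 * part of `s` that forms a base-16 subject sequence, i.e. an optional
 * sign, an optional 0x/0X prefix that must be followed by a hex digit,
 * then one or more hex digits. Leading white space is not handled.
 * Returns 0 when nothing matches.
 */
static size_t
hex_subject_len(const char *s)
{
	const char *p = s;
	const char *digits;

	if (*p == '+' || *p == '-')
		p++;
	/* Accept the prefix only when a hex digit follows it. */
	if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X') &&
	    isxdigit((unsigned char)p[2]))
		p += 2;
	digits = p;
	while (isxdigit((unsigned char)*p))
		p++;
	/* A bare sign with no digits is not a subject sequence. */
	return (p == digits) ? 0 : (size_t)(p - s);
}
```

Under this reading, "0xquux" has a subject sequence of length 1 (just the
"0"), so the end pointer would be left at "xquux".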
Re: Standard conformance of strtol(3)
Sorry, HP-UX actually has the same behavior as us. Here's what I
have so far:

4.4BSD strtol() behavior:
    HP-UX
    NetBSD
    OpenBSD

glibc strtol() behavior:
    AIX
    FreeBSD
    GNU/Linux
    Solaris
    macOS

 - todd
Re: Standard conformance of strtol(3)
Solaris, AIX, and HP-UX all have the same behavior as glibc. We are
the outlier. Since the standard is clear that the 0x/0X prefix is
optional I believe our behavior is wrong.

 - todd
Re: Standard conformance of strtol(3)
> Olivier Antoine wrote:
> > Hi all,
> >
> > Recently a bug has been identified in Tor:
> >
> > https://trac.torproject.org/projects/tor/ticket/22789
> >
> > As comments were made, questions were raised about the use of strtol(3),
> > the different interpretations of the standard and their implementation.
> >
> > To summarize, the question revolves around the processing of strings in
> > base=16 and with the optional prefix '0x'.
> >
> > l = strtol ("0xquux", & rest, 16);
> >
> > Produce
> > l=0 rest=0xquux on OpenBSD
> > l=0 rest=xquux on Linux
> >
> > Do specialists of the standard or developers have an opinion on this point
> > of detail?
> > Is there a defined behavior?
>
> My opinion is that well written code would avoid feeding ambiguous strings to
> strtol. Today it's 0xquux and tomorrow it's 0xaquux and now you have a
> problem.
>
> But, let's read
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/strtol.html
>
> It's actually unclear IMO. But I don't see anything prohibiting interpreting
> the string as an optional prefix with an empty body.
>
> I'm inclined to say that strtol parsing should involve minimal lookahead and
> backtracking. So if it sees 0x, it thinks hex prefix, and then parses the
> rest. It doesn't try parsing the rest, fail, and then backtrack and start over
> with a new parse strategy.

What does the original code from AT&T do?

Does the BSD 4.0 code do the same? How about the BSD 4.2 code? How
about the BSD 4.4 code?

How about all the vendors who simply used that code unmodified?

Who is the outlier? Is it glibc?

Is it possible the spec wasn't written in a proscriptive fashion?

How much code breaks as a result?

I expect some language lawyering will happen over this in the next
year. I wonder if some people are going to say "the original way of
doing this is so wrong, glibc does it so much better, and the written
text lets us get away with it no matter how much code it affects".

Always fun.
Re: Standard conformance of strtol(3)
Olivier Antoine wrote:
> Hi all,
>
> Recently a bug has been identified in Tor:
>
> https://trac.torproject.org/projects/tor/ticket/22789
>
> As comments were made, questions were raised about the use of strtol(3),
> the different interpretations of the standard and their implementation.
>
> To summarize, the question revolves around the processing of strings in
> base=16 and with the optional prefix '0x'.
>
> l = strtol ("0xquux", & rest, 16);
>
> Produce
> l=0 rest=0xquux on OpenBSD
> l=0 rest=xquux on Linux
>
> Do specialists of the standard or developers have an opinion on this point
> of detail?
> Is there a defined behavior?

My opinion is that well written code would avoid feeding ambiguous strings to
strtol. Today it's 0xquux and tomorrow it's 0xaquux and now you have a
problem.

But, let's read
http://pubs.opengroup.org/onlinepubs/9699919799/functions/strtol.html

It's actually unclear IMO. But I don't see anything prohibiting interpreting
the string as an optional prefix with an empty body.

I'm inclined to say that strtol parsing should involve minimal lookahead and
backtracking. So if it sees 0x, it thinks hex prefix, and then parses the
rest. It doesn't try parsing the rest, fail, and then backtrack and start over
with a new parse strategy.
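One way a caller might avoid depending on either interpretation is to
reject any input that strtol() does not consume completely. The sketch
below is only an illustration of that idea: parse_hex_strict() is a
hypothetical name, not something from Tor or from any libc, and it assumes
the caller wants "no digits" and "trailing junk" both treated as errors.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper: parse `s` as a base-16 long and accept it only
 * if strtol() consumed at least one character and stopped at the end of
 * the string. Returns 0 on success and stores the value in *out;
 * returns -1 for any rejected input.
 */
static int
parse_hex_strict(const char *s, long *out)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(s, &end, 16);
	if (end == s || *end != '\0')
		return -1;	/* nothing parsed, or trailing junk left over */
	if (errno == ERANGE)
		return -1;	/* value does not fit in a long */
	*out = v;
	return 0;
}
```

With this check "0xquux" is rejected whether the library leaves the end
pointer at "0xquux" or at "xquux", and "0xaquux" is rejected as well.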
Re: Standard conformance of strtol(3)
C99 states that the 0x or 0X prefix is optional so we should only
consume the prefix if the following character is a valid hex char.

This is equivalent to the fix in FreeBSD but I used isxdigit(3).

 - todd

Index: lib/libc/stdlib/strtoimax.c
===================================================================
RCS file: /cvs/src/lib/libc/stdlib/strtoimax.c,v
retrieving revision 1.3
diff -u -p -u -r1.3 strtoimax.c
--- lib/libc/stdlib/strtoimax.c	12 Sep 2015 16:23:14 -0000	1.3
+++ lib/libc/stdlib/strtoimax.c	5 Jul 2017 22:58:33 -0000
@@ -74,8 +74,8 @@ strtoimax(const char *nptr, char **endpt
 		if (c == '+')
 			c = *s++;
 	}
-	if ((base == 0 || base == 16) &&
-	    c == '0' && (*s == 'x' || *s == 'X')) {
+	if ((base == 0 || base == 16) && c == '0' &&
+	    (*s == 'x' || *s == 'X') && isxdigit((unsigned char)s[1])) {
 		c = s[1];
 		s += 2;
 		base = 16;
Index: lib/libc/stdlib/strtol.c
===================================================================
RCS file: /cvs/src/lib/libc/stdlib/strtol.c,v
retrieving revision 1.11
diff -u -p -u -r1.11 strtol.c
--- lib/libc/stdlib/strtol.c	13 Sep 2015 08:31:48 -0000	1.11
+++ lib/libc/stdlib/strtol.c	5 Jul 2017 22:59:13 -0000
@@ -75,8 +75,8 @@ strtol(const char *nptr, char **endptr, 
 		if (c == '+')
 			c = *s++;
 	}
-	if ((base == 0 || base == 16) &&
-	    c == '0' && (*s == 'x' || *s == 'X')) {
+	if ((base == 0 || base == 16) && c == '0' &&
+	    (*s == 'x' || *s == 'X') && isxdigit((unsigned char)s[1])) {
 		c = s[1];
 		s += 2;
 		base = 16;
Index: lib/libc/stdlib/strtoll.c
===================================================================
RCS file: /cvs/src/lib/libc/stdlib/strtoll.c,v
retrieving revision 1.9
diff -u -p -u -r1.9 strtoll.c
--- lib/libc/stdlib/strtoll.c	13 Sep 2015 08:31:48 -0000	1.9
+++ lib/libc/stdlib/strtoll.c	5 Jul 2017 22:59:13 -0000
@@ -77,8 +77,8 @@ strtoll(const char *nptr, char **endptr,
 		if (c == '+')
 			c = *s++;
 	}
-	if ((base == 0 || base == 16) &&
-	    c == '0' && (*s == 'x' || *s == 'X')) {
+	if ((base == 0 || base == 16) && c == '0' &&
+	    (*s == 'x' || *s == 'X') && isxdigit((unsigned char)s[1])) {
 		c = s[1];
 		s += 2;
 		base = 16;
Index: lib/libc/stdlib/strtoul.c
===================================================================
RCS file: /cvs/src/lib/libc/stdlib/strtoul.c,v
retrieving revision 1.10
diff -u -p -u -r1.10 strtoul.c
--- lib/libc/stdlib/strtoul.c	13 Sep 2015 08:31:48 -0000	1.10
+++ lib/libc/stdlib/strtoul.c	5 Jul 2017 22:59:13 -0000
@@ -69,8 +69,8 @@ strtoul(const char *nptr, char **endptr,
 		if (c == '+')
 			c = *s++;
 	}
-	if ((base == 0 || base == 16) &&
-	    c == '0' && (*s == 'x' || *s == 'X')) {
+	if ((base == 0 || base == 16) && c == '0' &&
+	    (*s == 'x' || *s == 'X') && isxdigit((unsigned char)s[1])) {
 		c = s[1];
 		s += 2;
 		base = 16;
Index: lib/libc/stdlib/strtoull.c
===================================================================
RCS file: /cvs/src/lib/libc/stdlib/strtoull.c,v
retrieving revision 1.8
diff -u -p -u -r1.8 strtoull.c
--- lib/libc/stdlib/strtoull.c	13 Sep 2015 08:31:48 -0000	1.8
+++ lib/libc/stdlib/strtoull.c	5 Jul 2017 22:59:13 -0000
@@ -71,8 +71,8 @@ strtoull(const char *nptr, char **endptr
 		if (c == '+')
 			c = *s++;
 	}
-	if ((base == 0 || base == 16) &&
-	    c == '0' && (*s == 'x' || *s == 'X')) {
+	if ((base == 0 || base == 16) && c == '0' &&
+	    (*s == 'x' || *s == 'X') && isxdigit((unsigned char)s[1])) {
 		c = s[1];
 		s += 2;
 		base = 16;
Index: lib/libc/stdlib/strtoumax.c
===================================================================
RCS file: /cvs/src/lib/libc/stdlib/strtoumax.c,v
retrieving revision 1.3
diff -u -p -u -r1.3 strtoumax.c
--- lib/libc/stdlib/strtoumax.c	12 Sep 2015 16:23:14 -0000	1.3
+++ lib/libc/stdlib/strtoumax.c	5 Jul 2017 22:59:13 -0000
@@ -68,8 +68,8 @@ strtoumax(const char *nptr, char **endpt
 		if (c == '+')
 			c = *s++;
 	}
-	if ((base == 0 || base == 16) &&
-	    c == '0' && (*s == 'x' || *s == 'X')) {
+	if ((base == 0 || base == 16) && c == '0' &&
+	    (*s == 'x' || *s == 'X') && isxdigit((unsigned char)s[1])) {
 		c = s[1];
 		s += 2;
 		base = 16;
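Since the diff touches six functions with the same change, a quick way to
check a given libc might be to run all six on one input and compare where
each leaves the end pointer. The following is a rough sketch, not part of
the commit; count_matching() is a hypothetical helper name.

```c
#include <inttypes.h>
#include <stdlib.h>

/*
 * Hypothetical regression check: run each of the six functions touched
 * by the patch on `input` in base 16 and count how many leave the end
 * pointer exactly `expect` characters past the start of the input.
 */
static int
count_matching(const char *input, size_t expect)
{
	char *end;
	int n = 0;

	(void)strtol(input, &end, 16);
	n += ((size_t)(end - input) == expect);
	(void)strtoul(input, &end, 16);
	n += ((size_t)(end - input) == expect);
	(void)strtoll(input, &end, 16);
	n += ((size_t)(end - input) == expect);
	(void)strtoull(input, &end, 16);
	n += ((size_t)(end - input) == expect);
	(void)strtoimax(input, &end, 16);
	n += ((size_t)(end - input) == expect);
	(void)strtoumax(input, &end, 16);
	n += ((size_t)(end - input) == expect);
	return n;
}
```

On a library with the patched behaviour, count_matching("0xquux", 1)
should come out as 6, because every function stops after the leading "0".
With the older 4.4BSD behaviour the same call would be expected to give 0,
and count_matching("0xquux", 0) would give 6 instead.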
Standard conformance of strtol(3)
Hi all,

Recently a bug has been identified in Tor:

https://trac.torproject.org/projects/tor/ticket/22789

As comments were made, questions were raised about the use of strtol(3),
the different interpretations of the standard and their implementation.

To summarize, the question revolves around the processing of strings in
base=16 and with the optional prefix '0x'.

l = strtol ("0xquux", & rest, 16);

Produce
l=0 rest=0xquux on OpenBSD
l=0 rest=xquux on Linux

Do specialists of the standard or developers have an opinion on this point
of detail?
Is there a defined behavior?

--
Olivier
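To reproduce the observation on a given system, something along these
lines could be used. It is a minimal sketch: struct probe and
probe_strtol() are hypothetical names introduced here, and the only thing
they do is record the value strtol() returns and how many characters it
consumed.

```c
#include <stddef.h>
#include <stdlib.h>

/* Result of one strtol() call: the converted value and how far the
 * end pointer advanced from the start of the input. */
struct probe {
	long value;
	size_t consumed;
};

/* Hypothetical helper: call strtol() on `input` in the given base and
 * report the value together with the number of characters consumed. */
static struct probe
probe_strtol(const char *input, int base)
{
	char *rest;
	struct probe p;

	p.value = strtol(input, &rest, base);
	p.consumed = (size_t)(rest - input);
	return p;
}
```

For probe_strtol("0xquux", 16) the value is 0 in both cases reported
above; consumed is 0 where rest=0xquux and 1 where rest=xquux.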