svn commit: r259609 - head/sys/kern

2013-12-19 Thread Stefan Esser
Author: se
Date: Thu Dec 19 09:01:46 2013
New Revision: 259609
URL: http://svnweb.freebsd.org/changeset/base/259609

Log:
  Fix overflow for timeout values of more than 68 years, which is the maximum
  covered by sbintime (LONG_MAX seconds).
  
  Some programs use timeout values in excess of 1000 years. The conversion
  to sbintime caused wrap-around on overflow, which resulted in short or
  negative timeout values. This caused long delays on sockets opened by
  affected programs (e.g. OpenSSH).
  
  Kernels compiled without -fno-strict-overflow were not affected, apparently
  because the compiler tested the sign of the timeout value before performing
  the multiplication that lead to overflow.
  
  When the -fno-strict-overflow option was added to CFLAGS, this optimization
  was disabled and the test was performed on the result of the multiplication.
  Negative products were caught and resulted in EINVAL being returned, but
  wrap-around to positive values just shortened the timeout value to the
  residue of the result that could be represented by sbintime.
  
  The fix is to cap the timeout values at the maximum that can be represented
  by sbintime, which is 2^31 - 1 seconds or more than 68 years.
  
  After this change, the kernel can be compiled with -fno-strict-overflow
  with no ill effects.
  
  MFC after:3 days

Modified:
  head/sys/kern/kern_event.c

Modified: head/sys/kern/kern_event.c
==
--- head/sys/kern/kern_event.c  Thu Dec 19 07:33:07 2013(r259608)
+++ head/sys/kern/kern_event.c  Thu Dec 19 09:01:46 2013(r259609)
@@ -526,7 +526,8 @@ knote_fork(struct knlist *list, int pid)
 static __inline sbintime_t 
 timer2sbintime(intptr_t data)
 {
-
+   if (data  LLONG_MAX / SBT_1MS)
+   return LLONG_MAX;
return (SBT_1MS * data);
 }
 
___
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to svn-src-all-unsubscr...@freebsd.org


Re: svn commit: r259609 - head/sys/kern

2013-12-19 Thread Bruce Evans

On Thu, 19 Dec 2013, Stefan Esser wrote:


Log:
 Fix overflow for timeout values of more than 68 years, which is the maximum
 covered by sbintime (LONG_MAX seconds).


Not LONG_MAX seconds, but INT32_MAX seconds.  LONG_MAX seconds is about 2**32
times larger on 64-bit arches.

sbintimes and their complexity give many more possibilities for overflow.
I thought that the new overflow bugs and some old ones were already  fixed.


 Some programs use timeout values in excess of 1000 years. The conversion
 to sbintime caused wrap-around on overflow, which resulted in short or
 negative timeout values. This caused long delays on sockets opened by
 affected programs (e.g. OpenSSH).

 Kernels compiled without -fno-strict-overflow were not affected, apparently
 because the compiler tested the sign of the timeout value before performing
 the multiplication that lead to overflow.


I think it just gave a garbage value that was accidentally less harmful.


 When the -fno-strict-overflow option was added to CFLAGS, this optimization
 was disabled and the test was performed on the result of the multiplication.
 Negative products were caught and resulted in EINVAL being returned, but
 wrap-around to positive values just shortened the timeout value to the
 residue of the result that could be represented by sbintime.


This shows one reason why -fno-strict-overflow shouldn't be used.  It just
wastes time to give different undefined behaviour with undocumented details.


 The fix is to cap the timeout values at the maximum that can be represented
 by sbintime, which is 2^31 - 1 seconds or more than 68 years.


2^31 - 1 is a correct spelling of INT32_MAX, unlike LONG_MAX.


 After this change, the kernel can be compiled with -fno-strict-overflow
 with no ill effects.



Modified: head/sys/kern/kern_event.c
==
--- head/sys/kern/kern_event.c  Thu Dec 19 07:33:07 2013(r259608)
+++ head/sys/kern/kern_event.c  Thu Dec 19 09:01:46 2013(r259609)
@@ -526,7 +526,8 @@ knote_fork(struct knlist *list, int pid)
static __inline sbintime_t
timer2sbintime(intptr_t data)
{
-
+   if (data  LLONG_MAX / SBT_1MS)
+   return LLONG_MAX;
return (SBT_1MS * data);
}


This has the following style bugs:
- it removes the empty line after the (null) declarations
- it is missing parentheses around the return value
- it uses the long long abomination

This has the following type errors:
- using the long long abomination is not just a style bug.  sbintime_t has
  type int64_t, not long long.  The overflow check will break if long long
  becomes longer than int64_t
- returning LLONG_MAX is usually just a style bug.  When long long is longer
  than int64_t, the return value will overflow, but the implementation-defined
  behaviour for this is usually to convert it to the correct value.

INT64_MAX is a dangerous default maximum timeout.  You can't even add the
minimum sbintime_t of 2**-32 seconds to this without overflowing.

Old timeout code used the following method to reduce the problem of
overflowing timeouts:
  For alarm() and setitimer(), limit the timeout to 100 million seconds
  (3.17 years) and return EINVAL if it is too large.  This works with
  32-bit time_t until about 2035.  POSIX doesn't allow this.  Clamping
  the time to 100 million seconds isn't allowed either, since the
  residual time can be seen from applications.  Note that 64-bit time_t
  has little effect on this problem.  Uses wishing to overflow the kernel
  simply ask for a timeout in a timeval of INT64_MAX seconds plus 99
  seconds so that adding just 1 microseconds to it overflow.  With
  seconds in sbintime_t instead of in time_t, overflow occurs much
  sooner.

This is quite broken now:
- the limit of 1 used to be enforced in itimerfix(), but this was
  removed in connection with implementing POSIX realtime timers without
  even breaking the documentaion to match or touching PR(s) about the
  POSIX non-conformance of the limit.
- setitimer() mostly uses sbintimes now, so the old limit is irrelevant
  and enforcing it in itimerfix() would break the new sbintime code and
  presumably the not so new realtime timers code.  The sbintime code
  uses a modified limit that is also not allowed by POSIX.  This limit is
  of course undocumented.  It is INT32_MAX / 2, so that there is plenty
  of room for expansion.  This is what the above should use, modulo the
  conformance bug.  EINVAL is returned when this limit is exceeded.
- select() and poll() were more careful and seem to be correct now.
  The timeout in poll() is limited to INT_MAX milliseconds, so it fits
  in sbintime_t.  Except this assumes that int is 32 bits.  select()
  limits the timeout to INT32_MAX seconds so that it is representable
  as an sbintime_t and then does a further overflow check involving
  INT64_MAX.  When overflow would occur, it sets asbt to -1, which I
  think means an infinite 

Re: svn commit: r259609 - head/sys/kern

2013-12-19 Thread Stefan Esser
Am 19.12.2013 11:49, schrieb Bruce Evans:
 On Thu, 19 Dec 2013, Stefan Esser wrote:
 
 Log:
  Fix overflow for timeout values of more than 68 years, which is the
 maximum
  covered by sbintime (LONG_MAX seconds).
 
 Not LONG_MAX seconds, but INT32_MAX seconds.  LONG_MAX seconds is about
 2**32
 times larger on 64-bit arches.

Hi Bruce,

yes, you are of course correct ... The limit is 2^31-2^-32 seconds,
to be exact ;-)

This is represented by LONG_MAX in sbintime, which made me use that
wrong constant in the commit message.

 sbintimes and their complexity give many more possibilities for overflow.
 I thought that the new overflow bugs and some old ones were already  fixed.
 
  Some programs use timeout values in excess of 1000 years. The conversion
  to sbintime caused wrap-around on overflow, which resulted in short or
  negative timeout values. This caused long delays on sockets opened by
  affected programs (e.g. OpenSSH).

  Kernels compiled without -fno-strict-overflow were not affected,
 apparently
  because the compiler tested the sign of the timeout value before
 performing
  the multiplication that lead to overflow.
 
 I think it just gave a garbage value that was accidentally less harmful.

The result of the multiplication was performed modulo 2^64 due to
the limited range of the operation. The result was then interpreted
as a signed 64 bit 2s-complement number.

The factor 2^32/1000 is 0x418937, which multiplied with a round
decimal number over 1000*2^32 will result in random bit sequence
in the resulting sbintime.

Half the values will have been positive, and many will have
corresponded to substantial timeout values, but some will have
been very low, resulting in unexpectedly low timeouts.

  When the -fno-strict-overflow option was added to CFLAGS, this
 optimization
  was disabled and the test was performed on the result of the
 multiplication.
  Negative products were caught and resulted in EINVAL being returned, but
  wrap-around to positive values just shortened the timeout value to the
  residue of the result that could be represented by sbintime.
 
 This shows one reason why -fno-strict-overflow shouldn't be used.  It just
 wastes time to give different undefined behaviour with undocumented
 details.

Well, I thought so when I found the cause of the breakage (long
delays opening TCP connections) that were the result of the
compilation with -fno-strict-overflow.

But I reconsidered, because a real bug in the code has been identied,
this way. The bug existed, without being detected (because too short
but non-zero timeout values were caught at the application layer, which
just re-issued the call with the remaining time as timeout parameter).

If applications didn't have to worry about short timeouts for other
reasons, then this bug would have led to observable problems, even
with -fno-strict-overflow.

  The fix is to cap the timeout values at the maximum that can be
 represented
  by sbintime, which is 2^31 - 1 seconds or more than 68 years.
 
 2^31 - 1 is a correct spelling of INT32_MAX, unlike LONG_MAX.

Yes, sbintime is defined in units of 2^-32 seconds, which makes the
highest representable time exactly (2^31 - 2^-32) seconds. This
value is represented by LLONG_MAX units of 2^32 seconds ...

  After this change, the kernel can be compiled with -fno-strict-overflow
  with no ill effects.
 
 Modified: head/sys/kern/kern_event.c
 ==

 --- head/sys/kern/kern_event.cThu Dec 19 07:33:07 2013(r259608)
 +++ head/sys/kern/kern_event.cThu Dec 19 09:01:46 2013(r259609)
 @@ -526,7 +526,8 @@ knote_fork(struct knlist *list, int pid)
 static __inline sbintime_t
 timer2sbintime(intptr_t data)
 {
 -
 +if (data  LLONG_MAX / SBT_1MS)
 +return LLONG_MAX;
 return (SBT_1MS * data);
 }
 
 This has the following style bugs:
 - it removes the empty line after the (null) declarations
 - it is missing parentheses around the return value
 - it uses the long long abomination

Ughh, I'll fix the style bugs ...

It had taken me several hours over the last week to find this bug,
and I committed the fix that worked on my system without making it
compliant with FreeBSD style. Sorry for that ...

 This has the following type errors:
 - using the long long abomination is not just a style bug.  sbintime_t has
   type int64_t, not long long.  The overflow check will break if long long
   becomes longer than int64_t

The definition of sbintime is int64_t on all architectures. If any
architecture used a wider integer type for sbintime, the fix would
just limit the maximum timeout to 68 years (instead of many magnitudes
longer than the universe will last), which is still way beyond any
sensible timeout value. And functions will have to check for too short
timeouts to be robust, anyway.

The functions in kern_timeout.c heavily depend on sbintime having at
least 64 bits and perform calculations under the assumption, 

Re: svn commit: r259609 - head/sys/kern

2013-12-19 Thread Andreas Tobler
On 19.12.13 18:00, Stefan Esser wrote:

 I'd replace the two occurances of LLONG_MAX with INT64_MAX and add the
 missing empty line:
 
 static __inline sbintime_t
 timer2sbintime(intptr_t data)
 {
 
 if (data  INT64_MAX / SBT_1MS)
 return INT64_MAX;
 return (SBT_1MS * data);
 }
 
 If you can show evidence that a limit of INT64_MAX/2 is more appropriate
 (2^30 seconds or 34 years), the limit could be of course be reduced to
 that value.
 
 I could not find any code that would not tolerate INT64_MAX, though ...

Aehm, what about 32-bit systems where intptr_t == __int32_t?


cc1: warnings being treated as errors
/export/devel/fbsd/src/sys/kern/kern_event.c: In function 'timer2sbintime':
/export/devel/fbsd/src/sys/kern/kern_event.c:529: warning: comparison is
always false due to limited range of data type

Andreas

___
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to svn-src-all-unsubscr...@freebsd.org


Re: svn commit: r259609 - head/sys/kern

2013-12-19 Thread Stefan Esser
Am 19.12.2013 21:46, schrieb Andreas Tobler:
 On 19.12.13 18:00, Stefan Esser wrote:
 
 I'd replace the two occurances of LLONG_MAX with INT64_MAX and add the
 missing empty line:

 static __inline sbintime_t
 timer2sbintime(intptr_t data)
 {

 if (data  INT64_MAX / SBT_1MS)
 return INT64_MAX;
 return (SBT_1MS * data);
 }

 If you can show evidence that a limit of INT64_MAX/2 is more appropriate
 (2^30 seconds or 34 years), the limit could be of course be reduced to
 that value.

 I could not find any code that would not tolerate INT64_MAX, though ...
 
 Aehm, what about 32-bit systems where intptr_t == __int32_t?
 
 cc1: warnings being treated as errors
 /export/devel/fbsd/src/sys/kern/kern_event.c: In function 'timer2sbintime':
 /export/devel/fbsd/src/sys/kern/kern_event.c:529: warning: comparison is
 always false due to limited range of data type

You are right, this needs to be fixed, too :(

I see two possibilities:

1) Conditional compilation: There is no need to check an upper bound on
   ILP32 architectures. INT32_MAX seconds can be expressed as sbintime.
   Architectures where intptr_t has at least 48 bits will support the
   maximum time that can be expressed in sbintime and the comparison
   can be left as is for them.

2) Calculate the upper bound in such a way, that it is guaranteed to be
   lower than the highest value that can be expressed as intptr_t. This
   value is (INT32_MAX - 1) on 32 bit architectures, while it remains
   (INT64_MAX / SBT_1MS) on 64 bit architectures.

I'm afraid that the warning emitted for 32 bit architectures will cause
tinderbox build failures, but I haven't seen a failure message, yet.

Should I back-out the commit that added the range check?

As long as -fno-strict-overflow is not put back into CFLAGS, the test
is not strictly required (only for the extremely unlikely case that a
number  1000 * MAX32_INT results in an extremely short remainder after
multiplication with SBT_1MS modulo 32 ...).

Regards, STefan

NB: I should have known better and should have asked for a review of
this change before it wa committed. Sorry for the inconvenience
caused :(
___
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to svn-src-all-unsubscr...@freebsd.org


Re: svn commit: r259609 - head/sys/kern

2013-12-19 Thread Adrian Chadd
On 19 December 2013 13:15, Stefan Esser s...@freebsd.org wrote:

 NB: I should have known better and should have asked for a review of
 this change before it wa committed. Sorry for the inconvenience
 caused :(

Hey, don't be (too) sorry - you chased down and debugged a
substantially annoying bug that's been around since forever.

Great work. :)



-a
___
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to svn-src-all-unsubscr...@freebsd.org