Re: PSA: If you run -current, beware!
On Wed, Feb 4, 2015 at 6:15 PM, Peter Wemm wrote:
> --- kern/kern_clock.c 2014-12-01 15:42:21.707911656 -0800
> +++ kern/kern_clock.c 2014-12-01 15:42:21.707911656 -0800
> @@ -410,6 +415,11 @@
>  #ifdef SW_WATCHDOG
>  	EVENTHANDLER_REGISTER(watchdog_list, watchdog_config, NULL, 0);
>  #endif
> +	/*
> +	 * Arrange for ticks to go negative just 5 minutes after boot
> +	 * to help catch sign problems sooner.
> +	 */
> +	ticks = INT_MAX - (hz * 5 * 60);
>  }

Should we just commit this under #ifdef INVARIANTS?

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: PSA: If you run -current, beware!
On 2/5/15 11:00 AM, Peter Wemm wrote:
> On Thursday, February 05, 2015 10:48:54 AM John Baldwin wrote:
> > On Thursday, February 05, 2015 04:22:23 PM Luigi Rizzo wrote:
> > > On Thu, Feb 05, 2015 at 08:21:45AM -0500, John Baldwin wrote:
> > > > On Thursday, February 05, 2015 08:48:33 AM Luigi Rizzo wrote:
> > > ...
> > > > > > > It is fixed (in the proper meaning of the word, not like worked
> > > > > > > around, covered by paper) by the patch at the end of the mail.
> > > > > > >
> > > > > > > We already have a story trying to enable much less ambitious
> > > > > > > option -fno-strict-overflow, see r259045 and the revert in
> > > > > > > r259422. I do not see other way than try one more time. Too many
> > > > > > > places in kernel depend on the correctly wrapping 2-complement
> > > > > > > arithmetic, among others are callweel and scheduler.
> > > > >
> > > > > Rather than depending on a compiler option, wouldn't it be
> > > > > better/more robust to change ticks to unsigned, which has specified
> > > > > wrapping behavior?
> > > >
> > > > Yes, but non-trivial. It's also not limited to ticks. Since the
> > > > compiler knows when it would apply these optimizations, it would be
> > > > nice if it could warn instead (GCC apparently has a warning, but clang
> > > > does not). Having people do a manual audit of every signed integer
> > > > expression in the tree will take a long time.
> > >
> > > I think I misunderstood the problem as being limited to ticks,
> > > which is probably only one symptom of a fundamental change in behaviour
> > > of the compiler.
> > > Still, it might be worthwhile start looking at ints that ought to be
> > > implemented as u_int
> >
> > I actually agree, I just think we are stuck with -fwrapv in the interval,
> > but it's probably not a short interval. I think converting ticks to
> > unsigned would be a good first start.
>
> For the record, I agree. However, I suspect that attempts to do so will
> have a non trivial number of bugs introduced. We have a track record of
> recurring problems with tcp sequence number space arithmetic and tcp
> timing, partly because the wraparounds happens infrequently.
>
> In the mean time, I feel that telling the compiler that it's OK to let it
> behave the way we expect (vs actively sabotaging it) is a viable stopgap.

It seems like it would make sense to move these functions into files that can easily be compiled outside of the kernel, and then add unit tests. I've done this before, to prove that larger pcb hashes help performance on large workloads.

-Alfred
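As a rough illustration of the userland unit-test idea, here is a minimal sketch in C. Everything in it is hypothetical: deadline_passed() is a stand-in for whatever kernel logic would be moved out into its own file, and failures_across_wrap() is an invented harness, not existing FreeBSD code. It walks a tick counter across the INT_MAX boundary and counts wrong answers.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical signature for the tick logic under test. */
typedef int (*deadline_fn)(unsigned int now, unsigned int deadline);

/*
 * Stand-in for the kernel function being tested: has `deadline` been
 * reached at time `now`?  Unsigned subtraction wraps by definition,
 * so this does not depend on -fwrapv.
 */
static int
deadline_passed(unsigned int now, unsigned int deadline)
{
	return ((now - deadline) <= UINT_MAX / 2);
}

/*
 * Arm a timeout at every tick in a window centred on INT_MAX and
 * return how many times `fn` gives the wrong answer.
 */
static unsigned int
failures_across_wrap(deadline_fn fn, unsigned int window,
    unsigned int timeout)
{
	unsigned int start = (unsigned int)INT_MAX - window;
	unsigned int failures = 0;

	/* Assumption of this sketch: timeouts stay below half the range. */
	assert(timeout > 0 && timeout <= UINT_MAX / 2);
	for (unsigned int i = 0; i < 2 * window; i++) {
		unsigned int armed_at = start + i;
		unsigned int deadline = armed_at + timeout;

		if (fn(armed_at, deadline))	/* must not fire when armed */
			failures++;
		if (!fn(deadline, deadline))	/* must fire exactly on time */
			failures++;
		if (!fn(deadline + 1, deadline)) /* and stay fired afterwards */
			failures++;
	}
	return (failures);
}
```

A test would call failures_across_wrap(deadline_passed, 1000, 300) and expect zero failures; the same harness could then be pointed at other candidate implementations.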
Re: PSA: If you run -current, beware!
On Thursday, February 05, 2015 11:00:46 AM Peter Wemm wrote: > On Thursday, February 05, 2015 10:48:54 AM John Baldwin wrote: > > On Thursday, February 05, 2015 04:22:23 PM Luigi Rizzo wrote: > > > On Thu, Feb 05, 2015 at 08:21:45AM -0500, John Baldwin wrote: > > > > On Thursday, February 05, 2015 08:48:33 AM Luigi Rizzo wrote: > > > ... > > > > > > > > > > It is fixed (in the proper meaning of the word, not like worked > > > > > > > around, > > > > > > > covered by paper) by the patch at the end of the mail. > > > > > > > > > > > > > > We already have a story trying to enable much less ambitious > > > > > > > option > > > > > > > -fno-strict-overflow, see r259045 and the revert in r259422. I > > > > > > > do > > > > > > > not > > > > > > > see other way than try one more time. Too many places in kernel > > > > > > > depend on the correctly wrapping 2-complement arithmetic, among > > > > > > > others > > > > > > > are callweel and scheduler. > > > > > > > > > > Rather than depending on a compiler option, wouldn't it be > > > > > better/more > > > > > robust to change ticks to unsigned, which has specified wrapping > > > > > behavior? > > > > > > > > Yes, but non-trivial. It's also not limited to ticks. Since the > > > > compiler > > > > knows when it would apply these optimizations, it would be nice if it > > > > could > > > > warn instead (GCC apparently has a warning, but clang does not). > > > > Having > > > > people do a manual audit of every signed integer expression in the > > > > tree > > > > will take a long time. > > > > > > I think I misunderstood the problem as being limited to ticks, > > > which is probably only one symptom of a fundamental change in behaviour > > > of the compiler. > > > Still, it might be worthwhile start looking at ints that ought to be > > > implemented as u_int > > > > I actually agree, I just think we are stuck with -fwrapv in the interval, > > but it's probably not a short interval. 
> > I think converting ticks to unsigned would be a good first start.
>
> For the record, I agree. However, I suspect that attempts to do so will
> have a non trivial number of bugs introduced. We have a track record of
> recurring problems with tcp sequence number space arithmetic and tcp
> timing, partly because the wraparounds happens infrequently.

BTW, anybody working on this will want to run with kern.hz="100000" in loader.conf (or higher). Having the clock tick 100 times faster than the default HZ=1000 speeds the rollover up from every ~25 days to every ~6 hours. I don't know what the practical limit is, but at some point it will cause sufficient pain due to contention that it won't be useful.

--
Peter Wemm - pe...@wemm.org; pe...@freebsd.org; pe...@yahoo-inc.com; KI6FJV
UTF-8: for when a ' or ... just won’t do…
Re: PSA: If you run -current, beware!
On Thursday, February 05, 2015 10:48:54 AM John Baldwin wrote: > On Thursday, February 05, 2015 04:22:23 PM Luigi Rizzo wrote: > > On Thu, Feb 05, 2015 at 08:21:45AM -0500, John Baldwin wrote: > > > On Thursday, February 05, 2015 08:48:33 AM Luigi Rizzo wrote: > > ... > > > > > > > > It is fixed (in the proper meaning of the word, not like worked > > > > > > around, > > > > > > covered by paper) by the patch at the end of the mail. > > > > > > > > > > > > We already have a story trying to enable much less ambitious > > > > > > option > > > > > > -fno-strict-overflow, see r259045 and the revert in r259422. I do > > > > > > not > > > > > > see other way than try one more time. Too many places in kernel > > > > > > depend on the correctly wrapping 2-complement arithmetic, among > > > > > > others > > > > > > are callweel and scheduler. > > > > > > > > Rather than depending on a compiler option, wouldn't it be better/more > > > > robust to change ticks to unsigned, which has specified wrapping > > > > behavior? > > > > > > Yes, but non-trivial. It's also not limited to ticks. Since the > > > compiler > > > knows when it would apply these optimizations, it would be nice if it > > > could > > > warn instead (GCC apparently has a warning, but clang does not). Having > > > people do a manual audit of every signed integer expression in the tree > > > will take a long time. > > > > I think I misunderstood the problem as being limited to ticks, > > which is probably only one symptom of a fundamental change in behaviour > > of the compiler. > > Still, it might be worthwhile start looking at ints that ought to be > > implemented as u_int > > I actually agree, I just think we are stuck with -fwrapv in the interval, > but it's probably not a short interval. I think converting ticks to > unsigned would be a good first start. For the record, I agree. However, I suspect that attempts to do so will have a non trivial number of bugs introduced. 
We have a track record of recurring problems with TCP sequence number space arithmetic and TCP timing, partly because the wraparounds happen infrequently.

In the meantime, I feel that telling the compiler that it's OK to behave the way we expect (vs actively sabotaging us) is a viable stopgap.

--
Peter Wemm - pe...@wemm.org; pe...@freebsd.org; pe...@yahoo-inc.com; KI6FJV
UTF-8: for when a ' or ... just won’t do…
Re: PSA: If you run -current, beware!
On Thu, Feb 05, 2015 at 10:48:54AM -0500, John Baldwin wrote:
> On Thursday, February 05, 2015 04:22:23 PM Luigi Rizzo wrote:
> > On Thu, Feb 05, 2015 at 08:21:45AM -0500, John Baldwin wrote:
> > > On Thursday, February 05, 2015 08:48:33 AM Luigi Rizzo wrote:
> > ...
> > > > > > It is fixed (in the proper meaning of the word, not like worked
> > > > > > around, covered by paper) by the patch at the end of the mail.
> > > > > >
> > > > > > We already have a story trying to enable much less ambitious option
> > > > > > -fno-strict-overflow, see r259045 and the revert in r259422. I do
> > > > > > not see other way than try one more time. Too many places in kernel
> > > > > > depend on the correctly wrapping 2-complement arithmetic, among
> > > > > > others are callweel and scheduler.
> > > >
> > > > Rather than depending on a compiler option, wouldn't it be better/more
> > > > robust to change ticks to unsigned, which has specified wrapping
> > > > behavior?
> > >
> > > Yes, but non-trivial. It's also not limited to ticks. Since the compiler
> > > knows when it would apply these optimizations, it would be nice if it
> > > could warn instead (GCC apparently has a warning, but clang does not).
> > > Having people do a manual audit of every signed integer expression in
> > > the tree will take a long time.
> >
> > I think I misunderstood the problem as being limited to ticks,
> > which is probably only one symptom of a fundamental change in behaviour
> > of the compiler.
> > Still, it might be worthwhile start looking at ints that ought to be
> > implemented as u_int
>
> I actually agree, I just think we are stuck with -fwrapv in the interval, but
> it's probably not a short interval. I think converting ticks to unsigned
> would be a good first start.

In principle MIT's KINT tool should help here. Unfortunately, it's based on LLVM 3.1 and appears to be unmaintained.

-- Brooks
Re: PSA: If you run -current, beware!
On Thursday, February 05, 2015 04:22:23 PM Luigi Rizzo wrote: > On Thu, Feb 05, 2015 at 08:21:45AM -0500, John Baldwin wrote: > > On Thursday, February 05, 2015 08:48:33 AM Luigi Rizzo wrote: > ... > > > > > > It is fixed (in the proper meaning of the word, not like worked > > > > > around, > > > > > covered by paper) by the patch at the end of the mail. > > > > > > > > > > We already have a story trying to enable much less ambitious option > > > > > -fno-strict-overflow, see r259045 and the revert in r259422. I do > > > > > not > > > > > see other way than try one more time. Too many places in kernel > > > > > depend on the correctly wrapping 2-complement arithmetic, among > > > > > others > > > > > are callweel and scheduler. > > > > > > Rather than depending on a compiler option, wouldn't it be better/more > > > robust to change ticks to unsigned, which has specified wrapping > > > behavior? > > > > Yes, but non-trivial. It's also not limited to ticks. Since the compiler > > knows when it would apply these optimizations, it would be nice if it > > could > > warn instead (GCC apparently has a warning, but clang does not). Having > > people do a manual audit of every signed integer expression in the tree > > will take a long time. > > I think I misunderstood the problem as being limited to ticks, > which is probably only one symptom of a fundamental change in behaviour > of the compiler. > Still, it might be worthwhile start looking at ints that ought to be > implemented as u_int I actually agree, I just think we are stuck with -fwrapv in the interval, but it's probably not a short interval. I think converting ticks to unsigned would be a good first start. -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: PSA: If you run -current, beware!
On Thu, Feb 05, 2015 at 08:21:45AM -0500, John Baldwin wrote: > On Thursday, February 05, 2015 08:48:33 AM Luigi Rizzo wrote: ... > > > > It is fixed (in the proper meaning of the word, not like worked around, > > > > covered by paper) by the patch at the end of the mail. > > > > > > > > We already have a story trying to enable much less ambitious option > > > > -fno-strict-overflow, see r259045 and the revert in r259422. I do not > > > > see other way than try one more time. Too many places in kernel > > > > depend on the correctly wrapping 2-complement arithmetic, among others > > > > are callweel and scheduler. > > > > Rather than depending on a compiler option, wouldn't it be better/more > > robust to change ticks to unsigned, which has specified wrapping behavior? > > Yes, but non-trivial. It's also not limited to ticks. Since the compiler > knows when it would apply these optimizations, it would be nice if it could > warn instead (GCC apparently has a warning, but clang does not). Having > people do a manual audit of every signed integer expression in the tree will > take a long time. I think I misunderstood the problem as being limited to ticks, which is probably only one symptom of a fundamental change in behaviour of the compiler. Still, it might be worthwhile start looking at ints that ought to be implemented as u_int cheers luigi ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: PSA: If you run -current, beware!
On 5 February 2015 at 02:48, Luigi Rizzo wrote:
>
> Rather than depending on a compiler option, wouldn't it be better/more
> robust to change ticks to unsigned, which has specified wrapping behavior?

I believe there are cases other than ticks that rely on 2s complement signed wrap. We'd want to make sure we find such cases.

Newer GCC can help with that. The -Wstrict-overflow flag causes the compiler to warn when implementing an optimization based on undefined behaviour from signed overflow.

Correct C code should work with or without -fwrapv, so we can do both: enable -fwrapv, and make changes to stop relying on undefined behaviour.

For ticks specifically we have many examples over time of incorrect calculations so we'll benefit from some work here, independent of signed overflow.
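To make the "works with or without -fwrapv" point concrete, here is a minimal sketch of tick arithmetic done on unsigned values. The helper names ticks_elapsed() and timeout_expired() are made up for illustration and are not existing kernel interfaces; the sketch assumes the tick counter is read as an unsigned int.

```c
#include <assert.h>
#include <limits.h>

/*
 * Ticks elapsed between `then` and `now`.  Unsigned subtraction is
 * defined to wrap modulo 2^N, so the result is correct even when the
 * counter has crossed INT_MAX or UINT_MAX in between.
 */
static unsigned int
ticks_elapsed(unsigned int now, unsigned int then)
{
	return (now - then);
}

/* Has at least `timeout` ticks passed since `start`? */
static int
timeout_expired(unsigned int now, unsigned int start, unsigned int timeout)
{
	/* Sanity bound chosen for this sketch, not a kernel rule. */
	assert(timeout <= UINT_MAX / 2);
	return (ticks_elapsed(now, start) >= timeout);
}
```

For example, ticks_elapsed((unsigned int)INT_MAX + 5u, (unsigned int)INT_MAX - 5u) is 10, which is exactly the case where the equivalent signed subtraction would have overflowed.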
Re: PSA: If you run -current, beware!
On Thursday, February 05, 2015 08:48:33 AM Luigi Rizzo wrote: > On Thursday, February 5, 2015, Peter Wemm wrote: > > On Wednesday, February 04, 2015 04:29:41 PM Konstantin Belousov wrote: > > > On Tue, Feb 03, 2015 at 01:33:15PM -0800, Peter Wemm wrote: > > > > Sometime in the Dec 10th through Jan 7th timeframe a timing bug has > > > > been > > > > > > introduced to 11.x/head/-current.With HZ=1000 (the default for > > > > bare > > > > metal, not for a vm); the clocks stop just after 24 days of uptime. > > > > This > > > > > > means things like cron, sleep, timeouts etc stop working. TCP/IP > > > > won't > > > > time out or retransmit, etc etc. It can get ugly. > > > > > > > > The problem is NOT in 10.x/-stable. > > > > > > > > We hit this in the freebsd.org cluster, the builds that we used are: > > > > FreeBSD 11.0-CURRENT #0 r275684: Wed Dec 10 20:38:43 UTC 2014 - fine > > > > FreeBSD 11.0-CURRENT #0 r276779: Wed Jan 7 18:47:09 UTC 2015 - broken > > > > > > > > If you are running -current in a situation where it'll accumulate > > > > uptime, > > > > > > you may want to take precautions. A reboot prior to 24 days uptime > > > > (as > > > > horrible a workaround as that is) will avoid it. > > > > > > > > Yes, this is being worked on. > > > > > > So the issue is reproducable in 3 minutes after boot with the following > > > change in kern_clock.c: > > > volatile int ticks = INT_MAX - (/*hz*/1000 * 3 * 60); > > > > > > It is fixed (in the proper meaning of the word, not like worked around, > > > covered by paper) by the patch at the end of the mail. > > > > > > We already have a story trying to enable much less ambitious option > > > -fno-strict-overflow, see r259045 and the revert in r259422. I do not > > > see other way than try one more time. Too many places in kernel > > > depend on the correctly wrapping 2-complement arithmetic, among others > > > are callweel and scheduler. 
>
> Rather than depending on a compiler option, wouldn't it be better/more
> robust to change ticks to unsigned, which has specified wrapping behavior?

Yes, but non-trivial. It's also not limited to ticks. Since the compiler knows when it would apply these optimizations, it would be nice if it could warn instead (GCC apparently has a warning, but clang does not). Having people do a manual audit of every signed integer expression in the tree will take a long time.

--
John Baldwin
Re: PSA: If you run -current, beware!
On 5 Feb 2015, at 07:48, Luigi Rizzo wrote:
>
> Rather than depending on a compiler option, wouldn't it be better/more
> robust to change ticks to unsigned, which has specified wrapping behavior?

Especially if we want to extend support for external toolchains. gcc and clang support -fwrapv (though occasionally versions of both will not fully support it), but other compilers may well not have an equivalent. Translating the code into C is a far more robust solution than the band-aid of telling the compiler to accept a language that is a bit like C and hoping that this will keep working across compiler implementations and versions.

Adding -fwrapv also defeats a number of compiler optimisations, so we are going to generate worse code for places where people used signed types correctly to work around places where they were used incorrectly.

David
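As one small example of what "translating the code into C" can look like, here is a sketch of an overflow test that never performs the overflowing addition, so it means the same thing with or without -fwrapv. The helper names add_would_overflow() and checked_add() are invented for this illustration and do not come from the tree.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Would a + b overflow an int?  The comparisons are arranged so that
 * no intermediate result can itself overflow, unlike the common
 * "a + b < a" idiom, which relies on signed wraparound.
 */
static bool
add_would_overflow(int a, int b)
{
	if (b > 0)
		return (a > INT_MAX - b);
	if (b < 0)
		return (a < INT_MIN - b);
	return (false);
}

/* Add two ints, asserting that the sum is representable. */
static int
checked_add(int a, int b)
{
	assert(!add_would_overflow(a, b));
	return (a + b);
}
```

The cost is an extra branch per check, which is the usual trade-off against relying on a compiler flag.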
Re: PSA: If you run -current, beware!
On Thursday, February 5, 2015, Peter Wemm wrote: > On Wednesday, February 04, 2015 04:29:41 PM Konstantin Belousov wrote: > > On Tue, Feb 03, 2015 at 01:33:15PM -0800, Peter Wemm wrote: > > > Sometime in the Dec 10th through Jan 7th timeframe a timing bug has > been > > > introduced to 11.x/head/-current.With HZ=1000 (the default for bare > > > metal, not for a vm); the clocks stop just after 24 days of uptime. > This > > > means things like cron, sleep, timeouts etc stop working. TCP/IP won't > > > time out or retransmit, etc etc. It can get ugly. > > > > > > The problem is NOT in 10.x/-stable. > > > > > > We hit this in the freebsd.org cluster, the builds that we used are: > > > FreeBSD 11.0-CURRENT #0 r275684: Wed Dec 10 20:38:43 UTC 2014 - fine > > > FreeBSD 11.0-CURRENT #0 r276779: Wed Jan 7 18:47:09 UTC 2015 - broken > > > > > > If you are running -current in a situation where it'll accumulate > uptime, > > > you may want to take precautions. A reboot prior to 24 days uptime (as > > > horrible a workaround as that is) will avoid it. > > > > > > Yes, this is being worked on. > > > > So the issue is reproducable in 3 minutes after boot with the following > > change in kern_clock.c: > > volatile int ticks = INT_MAX - (/*hz*/1000 * 3 * 60); > > > > It is fixed (in the proper meaning of the word, not like worked around, > > covered by paper) by the patch at the end of the mail. > > > > We already have a story trying to enable much less ambitious option > > -fno-strict-overflow, see r259045 and the revert in r259422. I do not > > see other way than try one more time. Too many places in kernel > > depend on the correctly wrapping 2-complement arithmetic, among others > > are callweel and scheduler. > > Rather than depending on a compiler option, wouldn't it be better/more robust to change ticks to unsigned, which has specified wrapping behavior? Cheers Luigi Ugh. 
> > I believe I have a smoking gun that suggests that the clock-stop problem is > caused by the clang-3.5 import on Dec 31st. > > Backstory: > http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html > http://www.airs.com/blog/archives/120 > > I suspect that what has happened is that clang's optimizer got better at > seeing the direct or indirect effects of integer overflow and clang (and > gcc) > take advantage of that. > > I have used a slightly different change for about 10 years: > > --- kern/kern_clock.c 2014-12-01 15:42:21.707911656 -0800 > +++ kern/kern_clock.c 2014-12-01 15:42:21.707911656 -0800 > @@ -410,6 +415,11 @@ > #ifdef SW_WATCHDOG > EVENTHANDLER_REGISTER(watchdog_list, watchdog_config, NULL, 0); > #endif > + /* > +* Arrange for ticks to go negative just 5 minutes after boot > +* to help catch sign problems sooner. > +*/ > + ticks = INT_MAX - (hz * 5 * 60); > } > > /* > > This came about from when we had problems with integer overflow arithmetic > in > the tcp stack. > > In any case, I'm in the process of adding -fwrapv and the early wraparound > to > the freebsd.org cluster builds to give it some wider exercise. > > -- > Peter Wemm - pe...@wemm.org ; pe...@freebsd.org; > pe...@yahoo-inc.com ; KI6FJV > UTF-8: for when a ' or ... just won\342\200\231t do\342\200\246 -- -+--- Prof. Luigi RIZZO, ri...@iet.unipi.it . Dip. di Ing. dell'Informazione http://www.iet.unipi.it/~luigi/. Universita` di Pisa TEL +39-050-2211611 . via Diotisalvi 2 Mobile +39-338-6809875 . 56122 PISA (Italy) -+--- ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: PSA: If you run -current, beware!
On Wednesday, February 04, 2015 04:29:41 PM Konstantin Belousov wrote: > On Tue, Feb 03, 2015 at 01:33:15PM -0800, Peter Wemm wrote: > > Sometime in the Dec 10th through Jan 7th timeframe a timing bug has been > > introduced to 11.x/head/-current.With HZ=1000 (the default for bare > > metal, not for a vm); the clocks stop just after 24 days of uptime. This > > means things like cron, sleep, timeouts etc stop working. TCP/IP won't > > time out or retransmit, etc etc. It can get ugly. > > > > The problem is NOT in 10.x/-stable. > > > > We hit this in the freebsd.org cluster, the builds that we used are: > > FreeBSD 11.0-CURRENT #0 r275684: Wed Dec 10 20:38:43 UTC 2014 - fine > > FreeBSD 11.0-CURRENT #0 r276779: Wed Jan 7 18:47:09 UTC 2015 - broken > > > > If you are running -current in a situation where it'll accumulate uptime, > > you may want to take precautions. A reboot prior to 24 days uptime (as > > horrible a workaround as that is) will avoid it. > > > > Yes, this is being worked on. > > So the issue is reproducable in 3 minutes after boot with the following > change in kern_clock.c: > volatile int ticks = INT_MAX - (/*hz*/1000 * 3 * 60); > > It is fixed (in the proper meaning of the word, not like worked around, > covered by paper) by the patch at the end of the mail. > > We already have a story trying to enable much less ambitious option > -fno-strict-overflow, see r259045 and the revert in r259422. I do not > see other way than try one more time. Too many places in kernel > depend on the correctly wrapping 2-complement arithmetic, among others > are callweel and scheduler. Ugh. I believe I have a smoking gun that suggests that the clock-stop problem is caused by the clang-3.5 import on Dec 31st. 
Backstory:
http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
http://www.airs.com/blog/archives/120

I suspect that what has happened is that clang's optimizer got better at seeing the direct or indirect effects of integer overflow and clang (and gcc) take advantage of that.

I have used a slightly different change for about 10 years:

--- kern/kern_clock.c 2014-12-01 15:42:21.707911656 -0800
+++ kern/kern_clock.c 2014-12-01 15:42:21.707911656 -0800
@@ -410,6 +415,11 @@
 #ifdef SW_WATCHDOG
 	EVENTHANDLER_REGISTER(watchdog_list, watchdog_config, NULL, 0);
 #endif
+	/*
+	 * Arrange for ticks to go negative just 5 minutes after boot
+	 * to help catch sign problems sooner.
+	 */
+	ticks = INT_MAX - (hz * 5 * 60);
 }

 /*

This came about from when we had problems with integer overflow arithmetic in the tcp stack.

In any case, I'm in the process of adding -fwrapv and the early wraparound to the freebsd.org cluster builds to give it some wider exercise.

--
Peter Wemm - pe...@wemm.org; pe...@freebsd.org; pe...@yahoo-inc.com; KI6FJV
UTF-8: for when a ' or ... just won’t do…
Re: PSA: If you run -current, beware!
On 4 February 2015 at 09:29, Konstantin Belousov wrote:
>
> So the issue is reproducable in 3 minutes after boot with the following
> change in kern_clock.c:
> volatile int ticks = INT_MAX - (/*hz*/1000 * 3 * 60);
>
> It is fixed (in the proper meaning of the word, not like worked around,
> covered by paper) by the patch at the end of the mail.
>
> We already have a story trying to enable much less ambitious option
> -fno-strict-overflow, see r259045 and the revert in r259422.

Note that -fno-strict-overflow and -fwrapv are equivalent as far as Clang is concerned:

| // -fno-strict-overflow implies -fwrapv if it isn't disabled, but
| // -fstrict-overflow won't turn off an explicitly enabled -fwrapv.
| if (Arg *A = Args.getLastArg(options::OPT_fwrapv,
|                              options::OPT_fno_wrapv)) {
|   if (A->getOption().matches(options::OPT_fwrapv))
|     CmdArgs.push_back("-fwrapv");
| } else if (Arg *A = Args.getLastArg(options::OPT_fstrict_overflow,
|                                     options::OPT_fno_strict_overflow)) {
|   if (A->getOption().matches(options::OPT_fno_strict_overflow))
|     CmdArgs.push_back("-fwrapv");
| }

> I do not see other way than try one more time.

Agreed. As you noted elsewhere the original issue that triggered the revert was fixed by r259609, so we should be able to just re-apply r259045.
Re: PSA: If you run -current, beware!
On Tue, Feb 03, 2015 at 01:33:15PM -0800, Peter Wemm wrote:
> Sometime in the Dec 10th through Jan 7th timeframe a timing bug has been
> introduced to 11.x/head/-current. With HZ=1000 (the default for bare metal,
> not for a vm); the clocks stop just after 24 days of uptime. This means
> things like cron, sleep, timeouts etc stop working. TCP/IP won't time out or
> retransmit, etc etc. It can get ugly.
>
> The problem is NOT in 10.x/-stable.
>
> We hit this in the freebsd.org cluster, the builds that we used are:
> FreeBSD 11.0-CURRENT #0 r275684: Wed Dec 10 20:38:43 UTC 2014 - fine
> FreeBSD 11.0-CURRENT #0 r276779: Wed Jan 7 18:47:09 UTC 2015 - broken
>
> If you are running -current in a situation where it'll accumulate uptime, you
> may want to take precautions. A reboot prior to 24 days uptime (as horrible a
> workaround as that is) will avoid it.
>
> Yes, this is being worked on.

So the issue is reproducible within 3 minutes after boot with the following change in kern_clock.c:

volatile int ticks = INT_MAX - (/*hz*/1000 * 3 * 60);

It is fixed (in the proper meaning of the word, not worked around or papered over) by the patch at the end of the mail.

We already have a history of trying to enable the much less ambitious option -fno-strict-overflow, see r259045 and the revert in r259422. I do not see any other way than to try one more time. Too many places in the kernel depend on correctly wrapping two's-complement arithmetic, among them the callwheel and the scheduler.

diff --git a/sys/conf/kern.mk b/sys/conf/kern.mk
index c031b3a..eb7ce2f 100644
--- a/sys/conf/kern.mk
+++ b/sys/conf/kern.mk
@@ -158,6 +158,11 @@ INLINE_LIMIT?=	8000
 CFLAGS+=	-ffreestanding
 
 #
+# Make signed arithmetic wrap.
+#
+CFLAGS+=	-fwrapv
+
+#
 # GCC SSP support
 #
 .if ${MK_SSP} != "no" && \
Re: PSA: If you run -current, beware!
On Tue, 2015-02-03 at 13:33 -0800, Peter Wemm wrote:
> Sometime in the Dec 10th through Jan 7th timeframe a timing bug has been
> introduced to 11.x/head/-current. With HZ=1000 (the default for bare metal,
> not for a vm); the clocks stop just after 24 days of uptime. This means
> things like cron, sleep, timeouts etc stop working. TCP/IP won't time out or
> retransmit, etc etc. It can get ugly.
>
> The problem is NOT in 10.x/-stable.
>
> We hit this in the freebsd.org cluster, the builds that we used are:
> FreeBSD 11.0-CURRENT #0 r275684: Wed Dec 10 20:38:43 UTC 2014 - fine
> FreeBSD 11.0-CURRENT #0 r276779: Wed Jan 7 18:47:09 UTC 2015 - broken
>
> If you are running -current in a situation where it'll accumulate uptime, you
> may want to take precautions. A reboot prior to 24 days uptime (as horrible a
> workaround as that is) will avoid it.
>
> Yes, this is being worked on.

FWIW, 24.8 days is the point at which an int32_t variable counting ticks at 1 kHz rolls over from positive to negative numbers.

-- Ian
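For anyone who wants to check that figure, here is a minimal sketch of the arithmetic in C. The function name seconds_until_rollover() is made up for illustration; the sketch assumes the counter starts at zero, advances hz times per second, and that int is 32 bits wide.

```c
#include <assert.h>
#include <limits.h>

/*
 * Seconds of uptime before a signed int tick counter that starts at 0
 * and advances `hz` times per second reaches INT_MAX.
 */
static double
seconds_until_rollover(int hz)
{
	assert(hz > 0);
	return ((double)INT_MAX / hz);
}
```

With hz = 1000 this is 2147483.647 seconds; dividing by 86400 gives a little over 24.8 days. At 100 times that tick rate the same counter would roll over in roughly 6 hours.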
Re: PSA: If you run -current, beware!
On Tuesday, February 3, 2015, Peter Wemm wrote:
> Sometime in the Dec 10th through Jan 7th timeframe a timing bug has been
> introduced to 11.x/head/-current. With HZ=1000 (the default for bare metal,
> not for a vm); the clocks stop just after 24 days of uptime. This means
>

Signed 32-bit overflow, it seems, from the numbers? Wasn't that a Windows feature in the old days? :)

Cheers
Luigi

--
-+---
Prof. Luigi RIZZO, ri...@iet.unipi.it . Dip. di Ing. dell'Informazione
http://www.iet.unipi.it/~luigi/ . Universita` di Pisa
TEL +39-050-2211611 . via Diotisalvi 2
Mobile +39-338-6809875 . 56122 PISA (Italy)
-+---
PSA: If you run -current, beware!
Sometime in the Dec 10th through Jan 7th timeframe a timing bug was introduced to 11.x/head/-current. With HZ=1000 (the default for bare metal, not for a vm), the clocks stop just after 24 days of uptime. This means things like cron, sleep, timeouts etc stop working. TCP/IP won't time out or retransmit, etc etc. It can get ugly.

The problem is NOT in 10.x/-stable.

We hit this in the freebsd.org cluster; the builds that we used are:
FreeBSD 11.0-CURRENT #0 r275684: Wed Dec 10 20:38:43 UTC 2014 - fine
FreeBSD 11.0-CURRENT #0 r276779: Wed Jan 7 18:47:09 UTC 2015 - broken

If you are running -current in a situation where it'll accumulate uptime, you may want to take precautions. A reboot prior to 24 days uptime (as horrible a workaround as that is) will avoid it.

Yes, this is being worked on.

--
Peter Wemm - pe...@wemm.org; pe...@freebsd.org; pe...@yahoo-inc.com; KI6FJV
UTF-8: for when a ' or ... just won’t do…