Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
On Fri, Jul 05, 2013 at 11:50:05PM +0200, Thomas Gleixner wrote:
> On Fri, 5 Jul 2013, Peter Zijlstra wrote:
> > On Fri, Jul 05, 2013 at 05:24:09PM +0200, Thomas Gleixner wrote:
> > > See arch/x86/kernel/tsc.c
> > >
> > > We disable the watchdog for the TSC when tsc_clocksource_reliable is
> > > set.
> > >
> > > tsc_clocksource_reliable is set when:
> > >
> > >  - you add tsc=reliable to the kernel command line
> >
> > Ah, I didn't know about that one, useful.
> >
> > >  - boot_cpu_has(X86_FEATURE_TSC_RELIABLE)
> > >
> > >    X86_FEATURE_TSC_RELIABLE is a software flag, set by vmware and
> > >    moorsetown. So all other machines keep the watchdog enabled.
> >
> > Right.. I knew it was enabled on my machines even though they normally
> > have usable TSC.
>
> Yeah, but our well justified paranoia still prevents us from trusting
> these CPU flags. Maybe some day BIOS is going to be replaced by
> something useful. You know: Hope springs eternal

Oh quite agreed. It's just that several times I've wanted to disable
the thing. Now I know you can do so using the kernel cmdline. Previously
I had to wreck code -- not that much harder really :-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
On Sat, Jul 06, 2013 at 12:17:46AM +0200, Thomas Gleixner wrote:
> Good news! 10 years is way less than eternity and just before
> retirement :)

You know that after the 10 years they'll come up with an even uglier
platform-differentiation-fiddle-with-dong-while-smoking-crack-crap which
will even replace the OS, right?

Hmm, I'm wondering what would be faster: wait *at least* 10 more years
or get an old mainboard and start experimenting with coreboot...

--
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
On Fri, 5 Jul 2013, Borislav Petkov wrote:
> On Fri, Jul 05, 2013 at 11:50:05PM +0200, Thomas Gleixner wrote:
> > Yeah, but our well justified paranoia still prevents us from trusting
> > these CPU flags. Maybe some day BIOS is going to be replaced by
> > something useful. You know: Hope springs eternal
>
> Not in the next 10 yrs at least if one took a look at the
> overengineered, obese at birth and braindead crap by the name of UEFI.

Good news! 10 years is way less than eternity and just before
retirement :)
Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
On Fri, Jul 05, 2013 at 11:50:05PM +0200, Thomas Gleixner wrote:
> Yeah, but our well justified paranoia still prevents us from trusting
> these CPU flags. Maybe some day BIOS is going to be replaced by
> something useful. You know: Hope springs eternal

Not in the next 10 yrs at least if one took a look at the
overengineered, obese at birth and braindead crap by the name of UEFI.

--
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
On Fri, 5 Jul 2013, Peter Zijlstra wrote:
> On Fri, Jul 05, 2013 at 05:24:09PM +0200, Thomas Gleixner wrote:
> > See arch/x86/kernel/tsc.c
> >
> > We disable the watchdog for the TSC when tsc_clocksource_reliable is
> > set.
> >
> > tsc_clocksource_reliable is set when:
> >
> >  - you add tsc=reliable to the kernel command line
>
> Ah, I didn't know about that one, useful.
>
> >  - boot_cpu_has(X86_FEATURE_TSC_RELIABLE)
> >
> >    X86_FEATURE_TSC_RELIABLE is a software flag, set by vmware and
> >    moorsetown. So all other machines keep the watchdog enabled.
>
> Right.. I knew it was enabled on my machines even though they normally
> have usable TSC.

Yeah, but our well justified paranoia still prevents us from trusting
these CPU flags. Maybe some day BIOS is going to be replaced by
something useful. You know: Hope springs eternal
Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
On Fri, Jul 05, 2013 at 05:24:09PM +0200, Thomas Gleixner wrote:
> See arch/x86/kernel/tsc.c
>
> We disable the watchdog for the TSC when tsc_clocksource_reliable is
> set.
>
> tsc_clocksource_reliable is set when:
>
>  - you add tsc=reliable to the kernel command line

Ah, I didn't know about that one, useful.

>  - boot_cpu_has(X86_FEATURE_TSC_RELIABLE)
>
>    X86_FEATURE_TSC_RELIABLE is a software flag, set by vmware and
>    moorsetown. So all other machines keep the watchdog enabled.

Right.. I knew it was enabled on my machines even though they normally
have usable TSC.
Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
On Fri, 5 Jul 2013, Peter Zijlstra wrote:
> On Fri, Jul 05, 2013 at 04:23:33PM +0200, Frederic Weisbecker wrote:
> > Nope, I haven't touched that. I prefer not to fiddle with unstable
> > clocksource for now :)
> >
> > As for unstable TSCs, if sched_clock_tick() needs to be fed, we simply
> > don't stop the tick.
>
> Not entirely the same thing; I thought the clocksource watchdog was ran
> even when we have a 'stable' TSC, just to make sure it stays stable.
> There's known cases where the BIOS f*cks us over and wrecks TSC sync.

See arch/x86/kernel/tsc.c

We disable the watchdog for the TSC when tsc_clocksource_reliable is
set.

tsc_clocksource_reliable is set when:

 - you add tsc=reliable to the kernel command line

 - boot_cpu_has(X86_FEATURE_TSC_RELIABLE)

   X86_FEATURE_TSC_RELIABLE is a software flag, set by VMware and
   Moorestown. So all other machines keep the watchdog enabled.

 - On Geode LX (OLPC)

Thanks,

	tglx
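
[Editor's sketch, not part of the original message.] The first condition in the list above can be checked from userspace. The snippet below is a minimal sketch under stated assumptions: the helper name has_tsc_reliable is made up for illustration, and /proc/cmdline is the standard procfs file holding the boot command line. It only looks at the command line; it says nothing about X86_FEATURE_TSC_RELIABLE or the Geode LX case, which are decided inside the kernel.

```shell
#!/bin/sh
# Hypothetical helper (not a kernel or distro tool): print "yes" if the
# given kernel command line contains the exact parameter tsc=reliable,
# otherwise print "no". The command line is passed as an argument so the
# check can be exercised without rebooting.
has_tsc_reliable() {
    # Pad with spaces so the parameter is only matched as a whole word.
    case " $1 " in
        *" tsc=reliable "*) echo yes ;;
        *) echo no ;;
    esac
}

# Typical use on a running system:
#   has_tsc_reliable "$(cat /proc/cmdline)"
```

A "yes" here only means the parameter was passed at boot; whether trusting the TSC on a given machine is wise is the question the rest of the thread is about.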
Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
On Fri, Jul 05, 2013 at 04:23:33PM +0200, Frederic Weisbecker wrote:
> Nope, I haven't touched that. I prefer not to fiddle with unstable
> clocksource for now :)
>
> As for unstable TSCs, if sched_clock_tick() needs to be fed, we simply
> don't stop the tick.

Not entirely the same thing; I thought the clocksource watchdog was run
even when we have a 'stable' TSC, just to make sure it stays stable.
There are known cases where the BIOS f*cks us over and wrecks TSC sync.
Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
2013/7/4 Peter Zijlstra <pet...@infradead.org>:
> On Thu, Jul 04, 2013 at 01:34:13PM +0800, Alex Shi wrote:
>
>> If the tsc is marked as constant and nonstop, could we set it as system
>> clocksource when do tsc register? w/o checking it on clocksource_watchdog?
>
> I'd not do that; the BIOS can still screw you over, we need some validation.
>
> That said; we do need means to disable the clocksource watchdog -- although I
> suppose Frederic might already have provided this for this NOHZ efforts when I
> wasn't looking.

Nope, I haven't touched that. I prefer not to fiddle with unstable
clocksource for now :)

As for unstable TSCs, if sched_clock_tick() needs to be fed, we simply
don't stop the tick.
Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
On 07/05/2013 01:58 PM, Thomas Gleixner wrote:
>> >
>> > Ingo had merged your branch into sched/core. :)
>> >
>> > commit f9bed7021710a3e45c331f7d7781de914cc1b939
>> > Merge: 7e76057 67dd331
>> > Author: Ingo Molnar <mi...@kernel.org>
>> > Date: Wed May 29 11:21:59 2013 +0200
>> >
>> > Merge branch 'timers/urgent'
> Not really.
>
> tip$ git branch --contains f9bed70217
> * master
>
> tip$ git branch --contains 5d33b883ae
> * master
>   timers/core
>
> So you are testing tip/master not tip/sched/core.

You're right. I mixed them up. Sorry.

--
Thanks
    Alex
Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
On Fri, 5 Jul 2013, Alex Shi wrote:
> On 07/05/2013 04:27 AM, Thomas Gleixner wrote:
> > > > > We find some benchmarks drop a lot on tip/sched/core on many Intel boxes,
> > And that branch does NOT have that commit included. So how can you see
> > a regression on a branch caused by a commit NOT included into that
> > branch?
> >
> > The offending commit is in tip timers/core and not in tip
> > sched/core. What I'm wanted to say is, that we need a proper
> > description of problems and not some random association.
> >
> > It tricked you to assume, that I'm not able to figure it out myself :)
> >
> > See? These things are complex and subtle, so we need precise
> > descriptions and not some sloppy semi correct data.
> >
> > I'm well aware of the issue and with Peters help I got a reasonable
> > explanation for it. A proper fix is about to be sent out.
> >
>
> Ingo had merged your branch into sched/core. :)
>
> commit f9bed7021710a3e45c331f7d7781de914cc1b939
> Merge: 7e76057 67dd331
> Author: Ingo Molnar <mi...@kernel.org>
> Date: Wed May 29 11:21:59 2013 +0200
>
> Merge branch 'timers/urgent'

Not really.

tip$ git branch --contains f9bed70217
* master

tip$ git branch --contains 5d33b883ae
* master
  timers/core

So you are testing tip/master not tip/sched/core.

Thanks,

	tglx
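
[Editor's sketch, not part of the original message.] The `git branch --contains` check shown above can also be scripted, which helps when a bisect result needs to be tied to one specific branch. The sketch below assumes a git recent enough to have `git merge-base --is-ancestor` (1.8.0 or later); the helper name branch_has_commit is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print "yes" if commit $1 is reachable from branch
# $2, otherwise "no". An unknown commit or branch also yields "no".
# Must be run inside the repository being inspected.
branch_has_commit() {
    if git merge-base --is-ancestor "$1" "$2" 2>/dev/null; then
        echo yes
    else
        echo no
    fi
}

# Typical use, with the commit and branch names from this thread:
#   branch_has_commit 5d33b883ae sched/core
#   branch_has_commit 5d33b883ae timers/core
```

Unlike `git branch --contains`, which lists every local branch holding the commit, this answers the question for a single named branch, which is the one that matters when reporting where a regression was observed.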
Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
On 07/05/2013 04:27 AM, Thomas Gleixner wrote:
> > > > We find some benchmarks drop a lot on tip/sched/core on many Intel boxes,
> And that branch does NOT have that commit included. So how can you see
> a regression on a branch caused by a commit NOT included into that
> branch?
>
> The offending commit is in tip timers/core and not in tip
> sched/core. What I'm wanted to say is, that we need a proper
> description of problems and not some random association.
>
> It tricked you to assume, that I'm not able to figure it out myself :)
>
> See? These things are complex and subtle, so we need precise
> descriptions and not some sloppy semi correct data.
>
> I'm well aware of the issue and with Peters help I got a reasonable
> explanation for it. A proper fix is about to be sent out.
>

Ingo had merged your branch into sched/core. :)

commit f9bed7021710a3e45c331f7d7781de914cc1b939
Merge: 7e76057 67dd331
Author: Ingo Molnar <mi...@kernel.org>
Date:   Wed May 29 11:21:59 2013 +0200

    Merge branch 'timers/urgent'

...

commit 7d194f78bde64ec813c1ed8291181bdd61515e78
Merge: 0298bf7 1eaff67
Author: Ingo Molnar <mi...@kernel.org>
Date:   Tue May 28 09:53:41 2013 +0200

    Merge branch 'timers/core'

commit 0298bf70644d7334bec16ae47f3aa58f4f883b59
Merge: cc662fa 2938d27
Author: Ingo Molnar <mi...@kernel.org>
Date:   Tue May 28 09:49:51 2013 +0200

    Merge branch 'timers/urgent'

--
Thanks
    Alex
Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
On Thu, 4 Jul 2013, Davidlohr Bueso wrote:
> On Thu, 2013-07-04 at 13:00 +0200, Thomas Gleixner wrote:
> > On Thu, 4 Jul 2013, Alex Shi wrote:
> >
> > > We find some benchmarks drop a lot on tip/sched/core on many Intel boxes,
> > > like oltp, tbench, hackbench etc. and bisected the commit 5d33b883ae
> > > cause this regression. Due to this commit, the clocksource was changed
> > > to hpet from tsc even tsc will be set CLOCK_SOURCE_VALID_FOR_HRES later
> > > in clocksource_watchdog.
> >
> > 5d33b883ae is not in tip/sched/core. So what are you testing and
> > bisecting?
>
> I think he's referring to:
>
> commit 5d33b883aed81c6fbcd09c6f7c3619eee850a7e2

I know what he is referring to. He explicitly mentions this commit:

> > > like oltp, tbench, hackbench etc. and bisected the commit 5d33b883ae

What I was pointing out is that he was referring to tip sched/core at
the same time

> > > We find some benchmarks drop a lot on tip/sched/core on many Intel boxes,

And that branch does NOT have that commit included. So how can you see
a regression on a branch caused by a commit NOT included into that
branch?

The offending commit is in tip timers/core and not in tip
sched/core. What I wanted to say is that we need a proper
description of problems and not some random association.

It tricked you into assuming that I'm not able to figure it out myself :)

See? These things are complex and subtle, so we need precise
descriptions and not some sloppy semi correct data.

I'm well aware of the issue and with Peter's help I got a reasonable
explanation for it. A proper fix is about to be sent out.

Thanks,

	tglx
Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
On Thu, 2013-07-04 at 13:00 +0200, Thomas Gleixner wrote:
> On Thu, 4 Jul 2013, Alex Shi wrote:
>
> > We find some benchmarks drop a lot on tip/sched/core on many Intel boxes,
> > like oltp, tbench, hackbench etc. and bisected the commit 5d33b883ae
> > cause this regression. Due to this commit, the clocksource was changed
> > to hpet from tsc even tsc will be set CLOCK_SOURCE_VALID_FOR_HRES later
> > in clocksource_watchdog.
>
> 5d33b883ae is not in tip/sched/core. So what are you testing and
> bisecting?

I think he's referring to:

commit 5d33b883aed81c6fbcd09c6f7c3619eee850a7e2
Author: Thomas Gleixner <t...@linutronix.de>
Date:   Thu Apr 25 20:31:43 2013 +

    clocksource: Always verify highres capability

    If a clocksource has a (wrong) high rating, but can't be used as a
    timebase for oneshot tick mode, it is unconditionally selected even
    when the system is already in oneshot tick mode. This causes full
    system failure.

    Verify the clocksource selection against the oneshot mode.

    Signed-off-by: Thomas Gleixner <t...@linutronix.de>
    Acked-by: John Stultz <john.stu...@linaro.org>
    Cc: Magnus Damm <magnus.d...@gmail.com>
    Link: http://lkml.kernel.org/r/20130425143435.635040...@linutronix.de
    Signed-off-by: Thomas Gleixner <t...@linutronix.de>
Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
On Thu, 4 Jul 2013, Alex Shi wrote:

> We find some benchmarks drop a lot on tip/sched/core on many Intel boxes,
> like oltp, tbench, hackbench etc. and bisected the commit 5d33b883ae
> cause this regression. Due to this commit, the clocksource was changed
> to hpet from tsc even tsc will be set CLOCK_SOURCE_VALID_FOR_HRES later
> in clocksource_watchdog.

5d33b883ae is not in tip/sched/core. So what are you testing and
bisecting?

Thanks,

	tglx
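
[Editor's sketch, not part of the original message.] The symptom in the quoted report, hpet being selected instead of tsc, can be confirmed from userspace through the clocksource sysfs files. The following is a minimal sketch; the helper names and the CLOCKSOURCE_DIR variable are made up for illustration, while the sysfs directory is the standard location on Linux.

```shell
#!/bin/sh
# Standard sysfs directory for clocksource state; it can be overridden
# (an assumption of this sketch) so the helpers work against a copy.
CLOCKSOURCE_DIR=${CLOCKSOURCE_DIR:-/sys/devices/system/clocksource/clocksource0}

# Hypothetical helper: print the clocksource currently in use,
# for example "tsc" or "hpet".
current_clocksource() {
    cat "${1:-$CLOCKSOURCE_DIR}/current_clocksource"
}

# Hypothetical helper: succeed if clocksource $1 is listed as available.
clocksource_available() {
    # Pad with spaces so names are only matched as whole words.
    case " $(cat "${2:-$CLOCKSOURCE_DIR}/available_clocksource") " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use on a running system:
#   current_clocksource
#   clocksource_available tsc && echo "tsc is available"
```

Recording the output of both files alongside the exact branch and commit under test gives the kind of precise description asked for later in the thread.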
Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
On 07/04/2013 03:58 PM, Peter Zijlstra wrote:
> On Thu, Jul 04, 2013 at 01:34:13PM +0800, Alex Shi wrote:
>
>> If the tsc is marked as constant and nonstop, could we set it as system
>> clocksource when do tsc register? w/o checking it on clocksource_watchdog?
>
> I'd not do that; the BIOS can still screw you over, we need some validation.

I see. thanks!

>
> That said; we do need means to disable the clocksource watchdog -- although I
> suppose Frederic might already have provided this for this NOHZ efforts when I
> wasn't looking.
>

--
Thanks
    Alex
Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
On Thu, Jul 04, 2013 at 01:34:13PM +0800, Alex Shi wrote:

> If the tsc is marked as constant and nonstop, could we set it as system
> clocksource when do tsc register? w/o checking it on clocksource_watchdog?

I'd not do that; the BIOS can still screw you over, we need some validation.

That said; we do need a means to disable the clocksource watchdog -- although I
suppose Frederic might already have provided this for his NOHZ efforts when I
wasn't looking.
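
[Editor's sketch, not part of the original message.] The "constant and nonstop" marking discussed above is visible from userspace as the constant_tsc and nonstop_tsc entries in the flags line of /proc/cpuinfo. The sketch below only reads those flags; the helper name cpu_has_flag is made up for illustration, and it assumes a grep that supports -m and -w (GNU and BSD grep both do). As the reply points out, the flags alone are a hint and do not replace validation by the watchdog.

```shell
#!/bin/sh
# Hypothetical helper: succeed if CPU flag $1 appears as a whole word on
# the first "flags" line of the cpuinfo file $2 (default /proc/cpuinfo).
cpu_has_flag() {
    grep -m1 '^flags' "${2:-/proc/cpuinfo}" | grep -qw -- "$1"
}

# Typical use on a running x86 system:
#   cpu_has_flag constant_tsc && cpu_has_flag nonstop_tsc \
#       && echo "TSC is marked constant and nonstop"
```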
Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
On Thu, Jul 04, 2013 at 01:34:13PM +0800, Alex Shi wrote: If the tsc is marked as constant and nonstop, could we set it as system clocksource when do tsc register? w/o checking it on clocksource_watchdog? I'd not do that; the BIOS can still screw you over, we need some validation. That said; we do need means to disable the clocksource watchdog -- although I suppose Frederic might already have provided this for this NOHZ efforts when I wasn't looking. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
On 07/04/2013 03:58 PM, Peter Zijlstra wrote: On Thu, Jul 04, 2013 at 01:34:13PM +0800, Alex Shi wrote: If the tsc is marked as constant and nonstop, could we set it as system clocksource when do tsc register? w/o checking it on clocksource_watchdog? I'd not do that; the BIOS can still screw you over, we need some validation. I see. thanks! That said; we do need means to disable the clocksource watchdog -- although I suppose Frederic might already have provided this for this NOHZ efforts when I wasn't looking. -- Thanks Alex -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
On Thu, 4 Jul 2013, Alex Shi wrote: We find some benchmarks drop a lot on tip/sched/core on many Intel boxes, like oltp, tbench, hackbench etc. and bisected the commit 5d33b883ae cause this regression. Due to this commit, the clocksource was changed to hpet from tsc even tsc will be set CLOCK_SOURCE_VALID_FOR_HRES later in clocksource_watchdog. 5d33b883ae is not in tip/sched/core. So what are you testing and bisecting? Thanks, tglx -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
On Thu, 2013-07-04 at 13:00 +0200, Thomas Gleixner wrote: On Thu, 4 Jul 2013, Alex Shi wrote: We find some benchmarks drop a lot on tip/sched/core on many Intel boxes, like oltp, tbench, hackbench etc. and bisected the commit 5d33b883ae cause this regression. Due to this commit, the clocksource was changed to hpet from tsc even tsc will be set CLOCK_SOURCE_VALID_FOR_HRES later in clocksource_watchdog. 5d33b883ae is not in tip/sched/core. So what are you testing and bisecting? I think he's referring to: commit 5d33b883aed81c6fbcd09c6f7c3619eee850a7e2 Author: Thomas Gleixner t...@linutronix.de Date: Thu Apr 25 20:31:43 2013 + clocksource: Always verify highres capability If a clocksource has a (wrong) high rating, but can't be used as a timebase for oneshot tick mode, it is unconditionally selected even when the system is already in oneshot tick mode. This causes full system failure. Verify the clocksource selection against the oneshot mode. Signed-off-by: Thomas Gleixner t...@linutronix.de Acked-by: John Stultz john.stu...@linaro.org Cc: Magnus Damm magnus.d...@gmail.com Link: http://lkml.kernel.org/r/20130425143435.635040...@linutronix.de Signed-off-by: Thomas Gleixner t...@linutronix.de -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
On Thu, 4 Jul 2013, Davidlohr Bueso wrote: On Thu, 2013-07-04 at 13:00 +0200, Thomas Gleixner wrote: On Thu, 4 Jul 2013, Alex Shi wrote: We find some benchmarks drop a lot on tip/sched/core on many Intel boxes, like oltp, tbench, hackbench etc. and bisected the commit 5d33b883ae cause this regression. Due to this commit, the clocksource was changed to hpet from tsc even tsc will be set CLOCK_SOURCE_VALID_FOR_HRES later in clocksource_watchdog. 5d33b883ae is not in tip/sched/core. So what are you testing and bisecting? I think he's referring to: commit 5d33b883aed81c6fbcd09c6f7c3619eee850a7e2 I know what he is referring to. He explicitly mentions this commit: like oltp, tbench, hackbench etc. and bisected the commit 5d33b883ae What I was pointing out that he was referring to tip sched/core at the same time We find some benchmarks drop a lot on tip/sched/core on many Intel boxes, And that branch does NOT have that commit included. So how can you see a regression on a branch caused by a commit NOT included into that branch? The offending commit is in tip timers/core and not in tip sched/core. What I'm wanted to say is, that we need a proper description of problems and not some random association. It tricked you to assume, that I'm not able to figure it out myself :) See? These things are complex and subtle, so we need precise descriptions and not some sloppy semi correct data. I'm well aware of the issue and with Peters help I got a reasonable explanation for it. A proper fix is about to be sent out. Thanks, tglx -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
On 07/05/2013 04:27 AM, Thomas Gleixner wrote:
> > We find some benchmarks drop a lot on tip/sched/core on many Intel boxes,
>
> And that branch does NOT have that commit included. So how can you see
> a regression on a branch caused by a commit NOT included into that
> branch? The offending commit is in tip timers/core and not in tip
> sched/core.
>
> What I'm wanted to say is, that we need a proper description of
> problems and not some random association. It tricked you to assume,
> that I'm not able to figure it out myself :)
>
> See? These things are complex and subtle, so we need precise
> descriptions and not some sloppy semi correct data.
>
> I'm well aware of the issue and with Peters help I got a reasonable
> explanation for it. A proper fix is about to be sent out.

Ingo had merged your branch into sched/core. :)

commit f9bed7021710a3e45c331f7d7781de914cc1b939
Merge: 7e76057 67dd331
Author: Ingo Molnar mi...@kernel.org
Date: Wed May 29 11:21:59 2013 +0200

    Merge branch 'timers/urgent'

...

commit 7d194f78bde64ec813c1ed8291181bdd61515e78
Merge: 0298bf7 1eaff67
Author: Ingo Molnar mi...@kernel.org
Date: Tue May 28 09:53:41 2013 +0200

    Merge branch 'timers/core'

commit 0298bf70644d7334bec16ae47f3aa58f4f883b59
Merge: cc662fa 2938d27
Author: Ingo Molnar mi...@kernel.org
Date: Tue May 28 09:49:51 2013 +0200

    Merge branch 'timers/urgent'

-- 
Thanks
Alex
Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
On Fri, 5 Jul 2013, Alex Shi wrote:
> On 07/05/2013 04:27 AM, Thomas Gleixner wrote:
> > > We find some benchmarks drop a lot on tip/sched/core on many Intel boxes,
> >
> > And that branch does NOT have that commit included. So how can you
> > see a regression on a branch caused by a commit NOT included into
> > that branch? The offending commit is in tip timers/core and not in
> > tip sched/core.
> >
> > What I'm wanted to say is, that we need a proper description of
> > problems and not some random association. It tricked you to assume,
> > that I'm not able to figure it out myself :)
> >
> > See? These things are complex and subtle, so we need precise
> > descriptions and not some sloppy semi correct data.
> >
> > I'm well aware of the issue and with Peters help I got a reasonable
> > explanation for it. A proper fix is about to be sent out.
>
> Ingo had merged your branch into sched/core. :)
>
> commit f9bed7021710a3e45c331f7d7781de914cc1b939
> Merge: 7e76057 67dd331
> Author: Ingo Molnar mi...@kernel.org
> Date: Wed May 29 11:21:59 2013 +0200
>
>     Merge branch 'timers/urgent'

Not really.

tip$ git branch --contains f9bed70217
* master
tip$ git branch --contains 5d33b883ae
* master
  timers/core

So you are testing tip/master not tip/sched/core.

Thanks,

	tglx
[URGENT rfc patch 0/3] tsc clocksource bug fix
We find some benchmarks drop a lot on tip/sched/core on many Intel
boxes, like oltp, tbench, hackbench etc. and bisected the commit
5d33b883ae cause this regression. Due to this commit, the clocksource
was changed to hpet from tsc even tsc will be set
CLOCK_SOURCE_VALID_FOR_HRES later in clocksource_watchdog.

Tim Chen said the hpet reading cost much. That cause this regression.

This patchset fixed this bug by re-select clocksource after this flag
set on tsc.

BTW, If the tsc is marked as constant and nonstop, could we set it as
system clocksource when do tsc register? w/o checking it on
clocksource_watchdog?

regards!
Alex
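
For readers following the thread, the behaviour described above can be modelled roughly as below. This is only a sketch under stated assumptions, not the kernel code and not the patchset itself: struct cs_model, cs_select() and cs_watchdog_verified() are hypothetical names, and the bool field merely stands in for CLOCK_SOURCE_VALID_FOR_HRES. It shows why a selection that checks the flag in oneshot mode falls back to hpet while tsc is still unverified, and why re-running the selection once the flag is set brings tsc back.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a clocksource; not the kernel's struct. */
struct cs_model {
	const char *name;
	int rating;          /* higher rating is preferred */
	bool valid_for_hres; /* stands in for CLOCK_SOURCE_VALID_FOR_HRES */
};

/*
 * Pick the highest-rated source. In oneshot (highres) mode, skip every
 * source that has not been marked valid for highres yet. Returns NULL
 * if no source qualifies.
 */
static const struct cs_model *cs_select(const struct cs_model *list,
					size_t n, bool oneshot)
{
	const struct cs_model *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (oneshot && !list[i].valid_for_hres)
			continue;
		if (!best || list[i].rating > best->rating)
			best = &list[i];
	}
	return best;
}

/*
 * Model of the step the cover letter asks for: once the watchdog has
 * verified a source and set the flag, run the selection again instead
 * of keeping the earlier choice.
 */
static const struct cs_model *cs_watchdog_verified(struct cs_model *list,
						   size_t n, size_t idx,
						   bool oneshot)
{
	list[idx].valid_for_hres = true;
	return cs_select(list, n, oneshot);
}
```

With an hpet entry that is already valid and a higher-rated tsc entry that is not, cs_select() in oneshot mode returns hpet; after cs_watchdog_verified() is called for the tsc entry, the result is tsc. Without that second selection the system would stay on the first choice, which matches the regression reported in this thread.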