Re: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer
On Thu, 9 Aug 2007 14:16:16 -0700 "Aaron Durbin" <[EMAIL PROTECTED]> wrote: > > but, but. Didn't this get fixed by > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc2/2.6.23-rc2-mm1/broken-out/fix-hpet-init-race.patch > > ? > > > > That original patch looks good to me as well. I just checked git head to see > if > this was already fixed upstream, and it didn't appear to be. Sorry for the > duplicate patch. Yeah, sorry, I'm being a bit more batch-oriented in getting stuff into Linus nowadays. I seem to have 44 patches to go and another 20 which should go, but which belong to subsystem maintainers. We'll get there. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer
On 8/9/07, Andrew Morton <[EMAIL PROTECTED]> wrote: > On Wed, 8 Aug 2007 16:17:19 -0700 > Aaron Durbin <[EMAIL PROTECTED]> wrote: > > > In setup_APIC_timer with the HPET in use, a condition can arise while > > waiting for the next irq slice to expire on the HPET which will either > > cause an NMI watchdog to fire or a 3 minute busy loop if the NMI > > watchdog is disabled. > > > > The HPET comparator and the counter keep incrementing during its normal > > operation. When a comparison event fires the comparator will increment > > by the designated period. If the HPET trigger occurs right after > > the 'int trigger = hpet_readl(HPET_T0_CMP);' line, we will will spin > > for up to 3 minutes (with a clock of 25MHz) waiting for the HPET > > counter to wrap around. However, when the NMI watchdog is enabled the > > NMI watchdog will detect a lockup and reboot the machine. This > > scenario can be exasperated by the presence of an SMI which will > > increase the window of opportunity for the condition to occur. > > > > The fix is to wait for the compartor to change which signals the > > end of the tick slice. > > > > --- > > > > The last patch had a typo in the diff which really would cause the > > problem state above. Sorry. > > > > diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c > > index 900ff38..06797e2 100644 > > --- a/arch/x86_64/kernel/apic.c > > +++ b/arch/x86_64/kernel/apic.c > > @@ -791,10 +791,12 @@ static void setup_APIC_timer(unsigned in > > > > /* wait for irq slice */ > > if (hpet_address && hpet_use_timer) { > > + /* > > + * Wait for the comparator value to change which signals that > > + * the tick slice has expired. > > + */ > > int trigger = hpet_readl(HPET_T0_CMP); > > - while (hpet_readl(HPET_COUNTER) >= trigger) > > - /* do nothing */ ; > > - while (hpet_readl(HPET_COUNTER) < trigger) > > + while (trigger == hpet_readl(HPET_T0_CMP)) > > /* do nothing */ ; > > } else { > > int c1, c2; > > but, but. Didn't this get fixed by > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc2/2.6.23-rc2-mm1/broken-out/fix-hpet-init-race.patch > ? > That original patch looks good to me as well. I just checked git head to see if this was already fixed upstream, and it didn't appear to be. Sorry for the duplicate patch. -Aaron - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer
On Wed, 8 Aug 2007 16:17:19 -0700 Aaron Durbin <[EMAIL PROTECTED]> wrote: > In setup_APIC_timer with the HPET in use, a condition can arise while > waiting for the next irq slice to expire on the HPET which will either > cause an NMI watchdog to fire or a 3 minute busy loop if the NMI > watchdog is disabled. > > The HPET comparator and the counter keep incrementing during its normal > operation. When a comparison event fires the comparator will increment > by the designated period. If the HPET trigger occurs right after > the 'int trigger = hpet_readl(HPET_T0_CMP);' line, we will will spin > for up to 3 minutes (with a clock of 25MHz) waiting for the HPET > counter to wrap around. However, when the NMI watchdog is enabled the > NMI watchdog will detect a lockup and reboot the machine. This > scenario can be exasperated by the presence of an SMI which will > increase the window of opportunity for the condition to occur. > > The fix is to wait for the compartor to change which signals the > end of the tick slice. > > --- > > The last patch had a typo in the diff which really would cause the > problem state above. Sorry. > > diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c > index 900ff38..06797e2 100644 > --- a/arch/x86_64/kernel/apic.c > +++ b/arch/x86_64/kernel/apic.c > @@ -791,10 +791,12 @@ static void setup_APIC_timer(unsigned in > > /* wait for irq slice */ > if (hpet_address && hpet_use_timer) { > + /* > + * Wait for the comparator value to change which signals that > + * the tick slice has expired. > + */ > int trigger = hpet_readl(HPET_T0_CMP); > - while (hpet_readl(HPET_COUNTER) >= trigger) > - /* do nothing */ ; > - while (hpet_readl(HPET_COUNTER) < trigger) > + while (trigger == hpet_readl(HPET_T0_CMP)) > /* do nothing */ ; > } else { > int c1, c2; but, but. Didn't this get fixed by ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc2/2.6.23-rc2-mm1/broken-out/fix-hpet-init-race.patch ? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer
On Wed, 8 Aug 2007 16:17:19 -0700 Aaron Durbin [EMAIL PROTECTED] wrote: In setup_APIC_timer with the HPET in use, a condition can arise while waiting for the next irq slice to expire on the HPET which will either cause an NMI watchdog to fire or a 3 minute busy loop if the NMI watchdog is disabled. The HPET comparator and the counter keep incrementing during its normal operation. When a comparison event fires the comparator will increment by the designated period. If the HPET trigger occurs right after the 'int trigger = hpet_readl(HPET_T0_CMP);' line, we will will spin for up to 3 minutes (with a clock of 25MHz) waiting for the HPET counter to wrap around. However, when the NMI watchdog is enabled the NMI watchdog will detect a lockup and reboot the machine. This scenario can be exasperated by the presence of an SMI which will increase the window of opportunity for the condition to occur. The fix is to wait for the compartor to change which signals the end of the tick slice. --- The last patch had a typo in the diff which really would cause the problem state above. Sorry. diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c index 900ff38..06797e2 100644 --- a/arch/x86_64/kernel/apic.c +++ b/arch/x86_64/kernel/apic.c @@ -791,10 +791,12 @@ static void setup_APIC_timer(unsigned in /* wait for irq slice */ if (hpet_address hpet_use_timer) { + /* + * Wait for the comparator value to change which signals that + * the tick slice has expired. + */ int trigger = hpet_readl(HPET_T0_CMP); - while (hpet_readl(HPET_COUNTER) = trigger) - /* do nothing */ ; - while (hpet_readl(HPET_COUNTER) trigger) + while (trigger == hpet_readl(HPET_T0_CMP)) /* do nothing */ ; } else { int c1, c2; but, but. Didn't this get fixed by ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc2/2.6.23-rc2-mm1/broken-out/fix-hpet-init-race.patch ? - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer
On 8/9/07, Andrew Morton [EMAIL PROTECTED] wrote: On Wed, 8 Aug 2007 16:17:19 -0700 Aaron Durbin [EMAIL PROTECTED] wrote: In setup_APIC_timer with the HPET in use, a condition can arise while waiting for the next irq slice to expire on the HPET which will either cause an NMI watchdog to fire or a 3 minute busy loop if the NMI watchdog is disabled. The HPET comparator and the counter keep incrementing during its normal operation. When a comparison event fires the comparator will increment by the designated period. If the HPET trigger occurs right after the 'int trigger = hpet_readl(HPET_T0_CMP);' line, we will will spin for up to 3 minutes (with a clock of 25MHz) waiting for the HPET counter to wrap around. However, when the NMI watchdog is enabled the NMI watchdog will detect a lockup and reboot the machine. This scenario can be exasperated by the presence of an SMI which will increase the window of opportunity for the condition to occur. The fix is to wait for the compartor to change which signals the end of the tick slice. --- The last patch had a typo in the diff which really would cause the problem state above. Sorry. diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c index 900ff38..06797e2 100644 --- a/arch/x86_64/kernel/apic.c +++ b/arch/x86_64/kernel/apic.c @@ -791,10 +791,12 @@ static void setup_APIC_timer(unsigned in /* wait for irq slice */ if (hpet_address hpet_use_timer) { + /* + * Wait for the comparator value to change which signals that + * the tick slice has expired. + */ int trigger = hpet_readl(HPET_T0_CMP); - while (hpet_readl(HPET_COUNTER) = trigger) - /* do nothing */ ; - while (hpet_readl(HPET_COUNTER) trigger) + while (trigger == hpet_readl(HPET_T0_CMP)) /* do nothing */ ; } else { int c1, c2; but, but. Didn't this get fixed by ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc2/2.6.23-rc2-mm1/broken-out/fix-hpet-init-race.patch ? That original patch looks good to me as well. I just checked git head to see if this was already fixed upstream, and it didn't appear to be. Sorry for the duplicate patch. -Aaron - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer
On Thu, 9 Aug 2007 14:16:16 -0700 Aaron Durbin [EMAIL PROTECTED] wrote: but, but. Didn't this get fixed by ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc2/2.6.23-rc2-mm1/broken-out/fix-hpet-init-race.patch ? That original patch looks good to me as well. I just checked git head to see if this was already fixed upstream, and it didn't appear to be. Sorry for the duplicate patch. Yeah, sorry, I'm being a bit more batch-oriented in getting stuff into Linus nowadays. I seem to have 44 patches to go and another 20 which should go, but which belong to subsystem maintainers. We'll get there. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer
> > I was thinking along the same lines as well, but I really didn't know how > important all that code was for waiting for the next irq slice. I'm not time > expert, but I would imagine we would resynchronize correctly in the future > after > this code path? The idea is to just have the apic timers roughly at the same times + some fixed offset as the main timer. Otherwise e.g. add_timer could jitter up to a jiffie regarding the jiffies counter. But long term drift will destroy the synchronization anyways. I guess i'll send it in anyways for .23; it would be too risky to change anything radical in this area that late. Then later it can be revisited. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer
On 8/8/07, Andi Kleen <[EMAIL PROTECTED]> wrote: > On Thursday 09 August 2007 01:17:19 Aaron Durbin wrote: > > In setup_APIC_timer with the HPET in use, a condition can arise while > > waiting for the next irq slice to expire on the HPET which will either > > cause an NMI watchdog to fire or a 3 minute busy loop if the NMI > > watchdog is disabled. > > > > The HPET comparator and the counter keep incrementing during its normal > > operation. When a comparison event fires the comparator will increment > > by the designated period. If the HPET trigger occurs right after > > the 'int trigger = hpet_readl(HPET_T0_CMP);' line, we will will spin > > for up to 3 minutes (with a clock of 25MHz) waiting for the HPET > > counter to wrap around. However, when the NMI watchdog is enabled the > > NMI watchdog will detect a lockup and reboot the machine. This > > scenario can be exasperated by the presence of an SMI which will > > increase the window of opportunity for the condition to occur. > > Nasty. > > Iirc the clockevents code did away with the synchronization > completely because Thomas determined it was useless because long term > it'll drift anyways. So perhaps it could be just removed? > I was thinking along the same lines as well, but I really didn't know how important all that code was for waiting for the next irq slice. I'm not time expert, but I would imagine we would resynchronize correctly in the future after this code path? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer
On Thursday 09 August 2007 01:17:19 Aaron Durbin wrote: > In setup_APIC_timer with the HPET in use, a condition can arise while > waiting for the next irq slice to expire on the HPET which will either > cause an NMI watchdog to fire or a 3 minute busy loop if the NMI > watchdog is disabled. > > The HPET comparator and the counter keep incrementing during its normal > operation. When a comparison event fires the comparator will increment > by the designated period. If the HPET trigger occurs right after > the 'int trigger = hpet_readl(HPET_T0_CMP);' line, we will will spin > for up to 3 minutes (with a clock of 25MHz) waiting for the HPET > counter to wrap around. However, when the NMI watchdog is enabled the > NMI watchdog will detect a lockup and reboot the machine. This > scenario can be exasperated by the presence of an SMI which will > increase the window of opportunity for the condition to occur. Nasty. Iirc the clockevents code did away with the synchronization completely because Thomas determined it was useless because long term it'll drift anyways. So perhaps it could be just removed? -Andi > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer
On Thursday 09 August 2007 01:17:19 Aaron Durbin wrote: In setup_APIC_timer with the HPET in use, a condition can arise while waiting for the next irq slice to expire on the HPET which will either cause an NMI watchdog to fire or a 3 minute busy loop if the NMI watchdog is disabled. The HPET comparator and the counter keep incrementing during its normal operation. When a comparison event fires the comparator will increment by the designated period. If the HPET trigger occurs right after the 'int trigger = hpet_readl(HPET_T0_CMP);' line, we will will spin for up to 3 minutes (with a clock of 25MHz) waiting for the HPET counter to wrap around. However, when the NMI watchdog is enabled the NMI watchdog will detect a lockup and reboot the machine. This scenario can be exasperated by the presence of an SMI which will increase the window of opportunity for the condition to occur. Nasty. Iirc the clockevents code did away with the synchronization completely because Thomas determined it was useless because long term it'll drift anyways. So perhaps it could be just removed? -Andi - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer
On 8/8/07, Andi Kleen [EMAIL PROTECTED] wrote: On Thursday 09 August 2007 01:17:19 Aaron Durbin wrote: In setup_APIC_timer with the HPET in use, a condition can arise while waiting for the next irq slice to expire on the HPET which will either cause an NMI watchdog to fire or a 3 minute busy loop if the NMI watchdog is disabled. The HPET comparator and the counter keep incrementing during its normal operation. When a comparison event fires the comparator will increment by the designated period. If the HPET trigger occurs right after the 'int trigger = hpet_readl(HPET_T0_CMP);' line, we will will spin for up to 3 minutes (with a clock of 25MHz) waiting for the HPET counter to wrap around. However, when the NMI watchdog is enabled the NMI watchdog will detect a lockup and reboot the machine. This scenario can be exasperated by the presence of an SMI which will increase the window of opportunity for the condition to occur. Nasty. Iirc the clockevents code did away with the synchronization completely because Thomas determined it was useless because long term it'll drift anyways. So perhaps it could be just removed? I was thinking along the same lines as well, but I really didn't know how important all that code was for waiting for the next irq slice. I'm not time expert, but I would imagine we would resynchronize correctly in the future after this code path? - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer
I was thinking along the same lines as well, but I really didn't know how important all that code was for waiting for the next irq slice. I'm not time expert, but I would imagine we would resynchronize correctly in the future after this code path? The idea is to just have the apic timers roughly at the same times + some fixed offset as the main timer. Otherwise e.g. add_timer could jitter up to a jiffie regarding the jiffies counter. But long term drift will destroy the synchronization anyways. I guess i'll send it in anyways for .23; it would be too risky to change anything radical in this area that late. Then later it can be revisited. -Andi - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/