Re: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer

2007-08-09 Thread Andrew Morton
On Thu, 9 Aug 2007 14:16:16 -0700
"Aaron Durbin" <[EMAIL PROTECTED]> wrote:

> > but, but.  Didn't this get fixed by
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc2/2.6.23-rc2-mm1/broken-out/fix-hpet-init-race.patch
> > ?
> >
> 
> That original patch looks good to me as well. I just checked git head to see 
> if
> this was already fixed upstream, and it didn't appear to be. Sorry for the
> duplicate patch.

Yeah, sorry, I'm being a bit more batch-oriented in getting stuff into
Linus nowadays.  I seem to have 44 patches to go and another 20 which 
should go, but which belong to subsystem maintainers.  We'll get there.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer

2007-08-09 Thread Aaron Durbin
On 8/9/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> On Wed, 8 Aug 2007 16:17:19 -0700
> Aaron Durbin <[EMAIL PROTECTED]> wrote:
>
> > In setup_APIC_timer with the HPET in use, a condition can arise while
> > waiting for the next irq slice to expire on the HPET which will either
> > cause an NMI watchdog to fire or a 3 minute busy loop if the NMI
> > watchdog is disabled.
> >
> > The HPET comparator and the counter keep incrementing during its normal
> > operation. When a comparison event fires the comparator will increment
> > by the designated period. If the HPET trigger occurs right after
> > the 'int trigger = hpet_readl(HPET_T0_CMP);' line, we will will spin
> > for up to 3 minutes (with a clock of 25MHz) waiting for the HPET
> > counter to wrap around. However, when the NMI watchdog is enabled the
> > NMI watchdog will detect a lockup and reboot the machine. This
> > scenario can be exasperated by the presence of an SMI which will
> > increase the window of opportunity for the condition to occur.
> >
> > The fix is to wait for the compartor to change which signals the
> > end of the tick slice.
> >
> > ---
> >
> > The last patch had a typo in the diff which really would cause the
> > problem state above. Sorry.
> >
> > diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c
> > index 900ff38..06797e2 100644
> > --- a/arch/x86_64/kernel/apic.c
> > +++ b/arch/x86_64/kernel/apic.c
> > @@ -791,10 +791,12 @@ static void setup_APIC_timer(unsigned in
> >
> >   /* wait for irq slice */
> >   if (hpet_address && hpet_use_timer) {
> > + /*
> > +  * Wait for the comparator value to change which signals that
> > +  * the tick slice has expired.
> > +  */
> >   int trigger = hpet_readl(HPET_T0_CMP);
> > - while (hpet_readl(HPET_COUNTER) >= trigger)
> > - /* do nothing */ ;
> > - while (hpet_readl(HPET_COUNTER) <  trigger)
> > + while (trigger == hpet_readl(HPET_T0_CMP))
> >   /* do nothing */ ;
> >   } else {
> >   int c1, c2;
>
> but, but.  Didn't this get fixed by
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc2/2.6.23-rc2-mm1/broken-out/fix-hpet-init-race.patch
> ?
>

That original patch looks good to me as well. I just checked git head to see if
this was already fixed upstream, and it didn't appear to be. Sorry for the
duplicate patch.

-Aaron
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer

2007-08-09 Thread Andrew Morton
On Wed, 8 Aug 2007 16:17:19 -0700
Aaron Durbin <[EMAIL PROTECTED]> wrote:

> In setup_APIC_timer with the HPET in use, a condition can arise while
> waiting for the next irq slice to expire on the HPET which will either
> cause an NMI watchdog to fire or a 3 minute busy loop if the NMI
> watchdog is disabled.
> 
> The HPET comparator and the counter keep incrementing during its normal
> operation. When a comparison event fires the comparator will increment
> by the designated period. If the HPET trigger occurs right after
> the 'int trigger = hpet_readl(HPET_T0_CMP);' line, we will will spin
> for up to 3 minutes (with a clock of 25MHz) waiting for the HPET
> counter to wrap around. However, when the NMI watchdog is enabled the
> NMI watchdog will detect a lockup and reboot the machine. This
> scenario can be exasperated by the presence of an SMI which will
> increase the window of opportunity for the condition to occur.
> 
> The fix is to wait for the compartor to change which signals the
> end of the tick slice.
> 
> ---
> 
> The last patch had a typo in the diff which really would cause the
> problem state above. Sorry.
> 
> diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c
> index 900ff38..06797e2 100644
> --- a/arch/x86_64/kernel/apic.c
> +++ b/arch/x86_64/kernel/apic.c
> @@ -791,10 +791,12 @@ static void setup_APIC_timer(unsigned in
>  
>   /* wait for irq slice */
>   if (hpet_address && hpet_use_timer) {
> + /*
> +  * Wait for the comparator value to change which signals that
> +  * the tick slice has expired.
> +  */
>   int trigger = hpet_readl(HPET_T0_CMP);
> - while (hpet_readl(HPET_COUNTER) >= trigger)
> - /* do nothing */ ;
> - while (hpet_readl(HPET_COUNTER) <  trigger)
> + while (trigger == hpet_readl(HPET_T0_CMP))
>   /* do nothing */ ;
>   } else {
>   int c1, c2;

but, but.  Didn't this get fixed by
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc2/2.6.23-rc2-mm1/broken-out/fix-hpet-init-race.patch
?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer

2007-08-09 Thread Andrew Morton
On Wed, 8 Aug 2007 16:17:19 -0700
Aaron Durbin [EMAIL PROTECTED] wrote:

 In setup_APIC_timer with the HPET in use, a condition can arise while
 waiting for the next irq slice to expire on the HPET which will either
 cause an NMI watchdog to fire or a 3 minute busy loop if the NMI
 watchdog is disabled.
 
 The HPET comparator and the counter keep incrementing during its normal
 operation. When a comparison event fires the comparator will increment
 by the designated period. If the HPET trigger occurs right after
 the 'int trigger = hpet_readl(HPET_T0_CMP);' line, we will will spin
 for up to 3 minutes (with a clock of 25MHz) waiting for the HPET
 counter to wrap around. However, when the NMI watchdog is enabled the
 NMI watchdog will detect a lockup and reboot the machine. This
 scenario can be exasperated by the presence of an SMI which will
 increase the window of opportunity for the condition to occur.
 
 The fix is to wait for the compartor to change which signals the
 end of the tick slice.
 
 ---
 
 The last patch had a typo in the diff which really would cause the
 problem state above. Sorry.
 
 diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c
 index 900ff38..06797e2 100644
 --- a/arch/x86_64/kernel/apic.c
 +++ b/arch/x86_64/kernel/apic.c
 @@ -791,10 +791,12 @@ static void setup_APIC_timer(unsigned in
  
   /* wait for irq slice */
   if (hpet_address  hpet_use_timer) {
 + /*
 +  * Wait for the comparator value to change which signals that
 +  * the tick slice has expired.
 +  */
   int trigger = hpet_readl(HPET_T0_CMP);
 - while (hpet_readl(HPET_COUNTER) = trigger)
 - /* do nothing */ ;
 - while (hpet_readl(HPET_COUNTER)   trigger)
 + while (trigger == hpet_readl(HPET_T0_CMP))
   /* do nothing */ ;
   } else {
   int c1, c2;

but, but.  Didn't this get fixed by
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc2/2.6.23-rc2-mm1/broken-out/fix-hpet-init-race.patch
?
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer

2007-08-09 Thread Aaron Durbin
On 8/9/07, Andrew Morton [EMAIL PROTECTED] wrote:
 On Wed, 8 Aug 2007 16:17:19 -0700
 Aaron Durbin [EMAIL PROTECTED] wrote:

  In setup_APIC_timer with the HPET in use, a condition can arise while
  waiting for the next irq slice to expire on the HPET which will either
  cause an NMI watchdog to fire or a 3 minute busy loop if the NMI
  watchdog is disabled.
 
  The HPET comparator and the counter keep incrementing during its normal
  operation. When a comparison event fires the comparator will increment
  by the designated period. If the HPET trigger occurs right after
  the 'int trigger = hpet_readl(HPET_T0_CMP);' line, we will will spin
  for up to 3 minutes (with a clock of 25MHz) waiting for the HPET
  counter to wrap around. However, when the NMI watchdog is enabled the
  NMI watchdog will detect a lockup and reboot the machine. This
  scenario can be exasperated by the presence of an SMI which will
  increase the window of opportunity for the condition to occur.
 
  The fix is to wait for the compartor to change which signals the
  end of the tick slice.
 
  ---
 
  The last patch had a typo in the diff which really would cause the
  problem state above. Sorry.
 
  diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c
  index 900ff38..06797e2 100644
  --- a/arch/x86_64/kernel/apic.c
  +++ b/arch/x86_64/kernel/apic.c
  @@ -791,10 +791,12 @@ static void setup_APIC_timer(unsigned in
 
/* wait for irq slice */
if (hpet_address  hpet_use_timer) {
  + /*
  +  * Wait for the comparator value to change which signals that
  +  * the tick slice has expired.
  +  */
int trigger = hpet_readl(HPET_T0_CMP);
  - while (hpet_readl(HPET_COUNTER) = trigger)
  - /* do nothing */ ;
  - while (hpet_readl(HPET_COUNTER)   trigger)
  + while (trigger == hpet_readl(HPET_T0_CMP))
/* do nothing */ ;
} else {
int c1, c2;

 but, but.  Didn't this get fixed by
 ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc2/2.6.23-rc2-mm1/broken-out/fix-hpet-init-race.patch
 ?


That original patch looks good to me as well. I just checked git head to see if
this was already fixed upstream, and it didn't appear to be. Sorry for the
duplicate patch.

-Aaron
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer

2007-08-09 Thread Andrew Morton
On Thu, 9 Aug 2007 14:16:16 -0700
Aaron Durbin [EMAIL PROTECTED] wrote:

  but, but.  Didn't this get fixed by
  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc2/2.6.23-rc2-mm1/broken-out/fix-hpet-init-race.patch
  ?
 
 
 That original patch looks good to me as well. I just checked git head to see 
 if
 this was already fixed upstream, and it didn't appear to be. Sorry for the
 duplicate patch.

Yeah, sorry, I'm being a bit more batch-oriented in getting stuff into
Linus nowadays.  I seem to have 44 patches to go and another 20 which 
should go, but which belong to subsystem maintainers.  We'll get there.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer

2007-08-08 Thread Andi Kleen

> 
> I was thinking along the same lines as well, but I really didn't know how
> important all that code was for waiting for the next irq slice. I'm not time
> expert, but I would imagine we would resynchronize correctly in the future 
> after
> this code path?

The idea is to just have the apic timers roughly at the same times + some fixed
offset as the main timer.  Otherwise e.g. add_timer could jitter up to a jiffie
regarding the jiffies counter. But long term drift will destroy the 
synchronization 
anyways.

I guess i'll send it in anyways for .23; it would be too risky to change 
anything 
radical in this area that late. Then later it can be revisited.

-Andi 


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer

2007-08-08 Thread Aaron Durbin
On 8/8/07, Andi Kleen <[EMAIL PROTECTED]> wrote:
> On Thursday 09 August 2007 01:17:19 Aaron Durbin wrote:
> > In setup_APIC_timer with the HPET in use, a condition can arise while
> > waiting for the next irq slice to expire on the HPET which will either
> > cause an NMI watchdog to fire or a 3 minute busy loop if the NMI
> > watchdog is disabled.
> >
> > The HPET comparator and the counter keep incrementing during its normal
> > operation. When a comparison event fires the comparator will increment
> > by the designated period. If the HPET trigger occurs right after
> > the 'int trigger = hpet_readl(HPET_T0_CMP);' line, we will will spin
> > for up to 3 minutes (with a clock of 25MHz) waiting for the HPET
> > counter to wrap around. However, when the NMI watchdog is enabled the
> > NMI watchdog will detect a lockup and reboot the machine. This
> > scenario can be exasperated by the presence of an SMI which will
> > increase the window of opportunity for the condition to occur.
>
> Nasty.
>
> Iirc the clockevents code did away with the synchronization
> completely because Thomas determined it was useless because long term
> it'll drift anyways. So perhaps it could be just removed?
>

I was thinking along the same lines as well, but I really didn't know how
important all that code was for waiting for the next irq slice. I'm not time
expert, but I would imagine we would resynchronize correctly in the future after
this code path?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer

2007-08-08 Thread Andi Kleen
On Thursday 09 August 2007 01:17:19 Aaron Durbin wrote:
> In setup_APIC_timer with the HPET in use, a condition can arise while
> waiting for the next irq slice to expire on the HPET which will either
> cause an NMI watchdog to fire or a 3 minute busy loop if the NMI
> watchdog is disabled.
> 
> The HPET comparator and the counter keep incrementing during its normal
> operation. When a comparison event fires the comparator will increment
> by the designated period. If the HPET trigger occurs right after
> the 'int trigger = hpet_readl(HPET_T0_CMP);' line, we will will spin
> for up to 3 minutes (with a clock of 25MHz) waiting for the HPET
> counter to wrap around. However, when the NMI watchdog is enabled the
> NMI watchdog will detect a lockup and reboot the machine. This
> scenario can be exasperated by the presence of an SMI which will
> increase the window of opportunity for the condition to occur.

Nasty.

Iirc the clockevents code did away with the synchronization
completely because Thomas determined it was useless because long term
it'll drift anyways. So perhaps it could be just removed?

-Andi
> 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer

2007-08-08 Thread Andi Kleen
On Thursday 09 August 2007 01:17:19 Aaron Durbin wrote:
 In setup_APIC_timer with the HPET in use, a condition can arise while
 waiting for the next irq slice to expire on the HPET which will either
 cause an NMI watchdog to fire or a 3 minute busy loop if the NMI
 watchdog is disabled.
 
 The HPET comparator and the counter keep incrementing during its normal
 operation. When a comparison event fires the comparator will increment
 by the designated period. If the HPET trigger occurs right after
 the 'int trigger = hpet_readl(HPET_T0_CMP);' line, we will will spin
 for up to 3 minutes (with a clock of 25MHz) waiting for the HPET
 counter to wrap around. However, when the NMI watchdog is enabled the
 NMI watchdog will detect a lockup and reboot the machine. This
 scenario can be exasperated by the presence of an SMI which will
 increase the window of opportunity for the condition to occur.

Nasty.

Iirc the clockevents code did away with the synchronization
completely because Thomas determined it was useless because long term
it'll drift anyways. So perhaps it could be just removed?

-Andi
 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer

2007-08-08 Thread Aaron Durbin
On 8/8/07, Andi Kleen [EMAIL PROTECTED] wrote:
 On Thursday 09 August 2007 01:17:19 Aaron Durbin wrote:
  In setup_APIC_timer with the HPET in use, a condition can arise while
  waiting for the next irq slice to expire on the HPET which will either
  cause an NMI watchdog to fire or a 3 minute busy loop if the NMI
  watchdog is disabled.
 
  The HPET comparator and the counter keep incrementing during its normal
  operation. When a comparison event fires the comparator will increment
  by the designated period. If the HPET trigger occurs right after
  the 'int trigger = hpet_readl(HPET_T0_CMP);' line, we will will spin
  for up to 3 minutes (with a clock of 25MHz) waiting for the HPET
  counter to wrap around. However, when the NMI watchdog is enabled the
  NMI watchdog will detect a lockup and reboot the machine. This
  scenario can be exasperated by the presence of an SMI which will
  increase the window of opportunity for the condition to occur.

 Nasty.

 Iirc the clockevents code did away with the synchronization
 completely because Thomas determined it was useless because long term
 it'll drift anyways. So perhaps it could be just removed?


I was thinking along the same lines as well, but I really didn't know how
important all that code was for waiting for the next irq slice. I'm not time
expert, but I would imagine we would resynchronize correctly in the future after
this code path?
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86_64: Avoid NMI Watchdog and/or long wait in setup_APIC_timer

2007-08-08 Thread Andi Kleen

 
 I was thinking along the same lines as well, but I really didn't know how
 important all that code was for waiting for the next irq slice. I'm not time
 expert, but I would imagine we would resynchronize correctly in the future 
 after
 this code path?

The idea is to just have the apic timers roughly at the same times + some fixed
offset as the main timer.  Otherwise e.g. add_timer could jitter up to a jiffie
regarding the jiffies counter. But long term drift will destroy the 
synchronization 
anyways.

I guess i'll send it in anyways for .23; it would be too risky to change 
anything 
radical in this area that late. Then later it can be revisited.

-Andi 


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/