RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-08-04 Thread 河合英宏 / KAWAI,HIDEHIRO
Hi,

> From: Michal Hocko [mailto:mho...@kernel.org]
> On Fri 31-07-15 11:23:00, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > From: Michal Hocko [mailto:mho...@kernel.org]
> > > > There is a timeout of 1000ms in nmi_shootdown_cpus(), so I don't know
> > > > why CPU 130 waits so long.  I'll try to consider for a while.
> > >
> > > Yes, I do not understand the timing here either and the fact that the
> > > log is a complete mess in the important parts doesn't help a wee bit.
> >
> > I'm interested in where "kernel panic -not syncing: " is.
> > It may give us a clue.
> 
> This one is lost in the mangled text:
> [  167.843771] U<0>[  167.843771] hhuh. NMI received for unkn<0><0>[  
> 167.843765] Uh[  16NM843774I own rea reived for
> unknow<0 r  16n 2d 765] Uhhuh. CPU recei11. <0known reason 7. on770] Ker<[ - 
> not rn NMI:nic - not contt sing
> 
> <0 >[ : Not con.inu437azed and confused, b] Dtryingaed annue
> 
> fu 167.8ut trying>[   to 7.<0377 167.843775] U<0>[  167.843776] ]hhu.ived for 
> u3nknown rMason 3 re oived for [nk167.843781]

Thanks for the information.

I suspected that some lock contention while issuing messages (e.g.
locks used by the network/netconsole driver) delayed the panic procedure,
but that doesn't seem to be related, because the panic message was
issued early.

If I come up with something, I will post a follow-up.  I think there
may be potential issues.

> 1.
> <. N0>[  167.843781] Uh. NMI recen 3d on CPU 0.i< >[ nowon 3d on] Chhuh.MI
> eceived[ or7.843nknoUhhuh.wn rMason e3d ceCPivUd 120.
> <0nk>no 167.wn843ason 3na s p120.
> o<0er savi d6 e843ab88] Do yeu have a
> [ er saving mode e nabl1d?7<4][  167 84hu94]MIuh. NceIived for 
> unknown reas vdfor 1no3was0>[ 2d 67.84380on CI
> rUe 12e.
> ive7d8u3800wn rveaseo f2d on CPo3.r< u>k[o 1 rea6s.o2d8 oo you hn aPve <0st>a 
> e power 1s7.843816] Do yoauv ng moade
> enbslra?ng[ e 167.8438p41o]er shhuhavi.ngIroenived fbled?nknow
> < reaso0> 2d on [PU1626.41]0>   Uh67.h. NM387I] receihed for .nknown reason  
> 2Nn MC U ceived for .
> [son 2d on CPU 6.
> <  160>7.8467.84873] Uhhuh. 3MI received 908 o knstra
> [ n167.843908] Do ygo pave westrangesa pvnv mode enableng mode ed?
> n

Re: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-08-04 Thread Michal Hocko
On Fri 31-07-15 11:23:00, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > From: Michal Hocko [mailto:mho...@kernel.org]
[...]
> > I am saying that watchdog_overflow_callback might trigger on more CPUs
> > and panic from NMI context as well. So this is not reduced to the NMI
> > button sends NMI to more CPUs.
> 
> I understand.  So, I have to also modify watchdog_overflow_callback
> to call nmi_panic().

yes.

[...]
> > > There is a timeout of 1000ms in nmi_shootdown_cpus(), so I don't know
> > > why CPU 130 waits so long.  I'll try to consider for a while.
> > 
> > Yes, I do not understand the timing here either and the fact that the
> > log is a complete mess in the important parts doesn't help a wee bit.
> 
> I'm interested in where "kernel panic -not syncing: " is.
> It may give us a clue.

This one is lost in the mangled text:
[  167.843771] U<0>[  167.843771] hhuh. NMI received for unkn<0><0>[  
167.843765] Uh[  16NM843774I own rea reived for unknow<0 r  16n 2d 765] Uhhuh. 
CPU recei11. <0known reason 7. on770] Ker<[ - not rn NMI:nic - not contt sing

<0 >[ : Not con.inu437azed and confused, b] Dtryingaed annue

fu 167.8ut trying>[   to 7.<0377 167.843775] U<0>[  167.843776] ]hhu.ived for 
u3nknown rMason 3 re oived for [nk167.843781]  1.
<. N0>[  167.843781] Uh. NMI recen 3d on CPU 0.i< >[ nowon 3d on] Chhuh.MI
eceived[ or7.843nknoUhhuh.wn rMason e3d ceCPivUd 120.
<0nk>no 167.wn843ason 3na s p120.
o<0er savi d6 e843ab88] Do yeu have a
[ er saving mode e nabl1d?7<4][  167 84hu94]MIuh. NceIived for unknown 
reas vdfor 1no3was0>[ 2d 67.84380on CI rUe 12e.
ive7d8u3800wn rveaseo f2d on CPo3.r< u>k[o 1 rea6s.o2d8 oo you hn aPve <0st>a e 
power 1s7.843816] Do yoauv ng moade enbslra?ng[ e 167.8438p41o]er 
shhuhavi.ngIroenived fbled?nknow
< reaso0> 2d on [PU1626.41]0>   Uh67.h. NM387I] receihed for .nknown reason  
2Nn MC U ceived for .
[son 2d on CPU 6.
<  160>7.8467.84873] Uhhuh. 3MI received 908 o knstra
[ n167.843908] Do ygo pave westrangesa pvnv mode enableng mode ed?
nb0ed?
-- 
Michal Hocko
SUSE Labs


RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-31 Thread 河合英宏 / KAWAI,HIDEHIRO
> From: Michal Hocko [mailto:mho...@kernel.org]
> 
> On Thu 30-07-15 11:55:52, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > From: Michal Hocko [mailto:mho...@kernel.org]
> [...]
> > > Could you point me to the code which does that, please? Maybe we are
> > > missing that in our 3.0 kernel. I was quite surprised to see this
> > > behavior as well.
> >
> > Please see the snippet below.
> >
> > void setup_local_APIC(void)
> > {
> > ...
> > /*
> >  * only the BP should see the LINT1 NMI signal, obviously.
> >  */
> > if (!cpu)
> > value = APIC_DM_NMI;
> > else
> > value = APIC_DM_NMI | APIC_LVT_MASKED;
> > if (!lapic_is_integrated()) /* 82489DX */
> > value |= APIC_LVT_LEVEL_TRIGGER;
> > apic_write(APIC_LVT1, value);
> >
> >
> > LINT1 pins of cpus other than CPU 0 are masked here.
> > However, at least on some of Hitachi servers, NMI caused by NMI
> > button doesn't seem to be delivered through LINT1.  So, my `external NMI'
> > word may not be correct.
> 
> I am not familiar with details here but I can tell you that this
> particular code snippet is the same in our 3.0 based kernel so it seems
> that the HW is indeed doing something differently.

Yes, and it turned out my PATCH 3/3 doesn't work at all on some
hardware...

> > > You might still get a panic on hardlockup which will happen on all CPUs
> > > from the NMI context so we have to be able to handle panic in NMI on
> > > many CPUs.
> >
> > Do you say about the case of a kerne panic while other cpus locks up
> > in NMI context?  In that case, there is no way to do things needed by
> > kdump procedure including saving registeres...
> 
> I am saying that watchdog_overflow_callback might trigger on more CPUs
> and panic from NMI context as well. So this is not reduced to the NMI
> button sends NMI to more CPUs.

I understand.  So, I have to also modify watchdog_overflow_callback
to call nmi_panic().
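
For example, the change could look roughly like this (a sketch only; the
nmi_panic() macro is the one proposed in this thread, and the exact shape
of watchdog_overflow_callback() in kernel/watchdog.c differs between
kernel versions):

/* Sketch of kernel/watchdog.c with panic() replaced by nmi_panic(). */
static void watchdog_overflow_callback(struct perf_event *event,
                                       struct perf_sample_data *data,
                                       struct pt_regs *regs)
{
        /* ... existing hard-lockup detection ... */
        if (is_hardlockup()) {
                int this_cpu = smp_processor_id();

                if (hardlockup_panic)
                        /* was panic(); nmi_panic() lets only the first
                         * panicking CPU proceed and makes the others
                         * return from this NMI handler */
                        nmi_panic("Watchdog detected hard LOCKUP on cpu %d",
                                  this_cpu);
                else
                        WARN(1, "Watchdog detected hard LOCKUP on cpu %d",
                             this_cpu);
        }
        /* ... */
}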

> Why cannot the panic() context save all the registers if we are going to
> loop in NMI context? This would be imho preferable to returning from
> panic IMO.

I'm not saying we cannot save registers and do some cleanups in NMI
context.  I feel that it would introduce unneeded complexity.
Since watchdog_overflow_callback is defined in generic code,
we would need to implement the kdump preparation for other architectures
as well.  I haven't checked which architectures support both the NMI
watchdog and kdump, though.

Anyway, I came up with a simple solution for x86: in nmi_panic(), wait
until nmi_shootdown_cpus() runs, then invoke the callback registered
by nmi_shootdown_cpus().
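
As a rough sketch of that idea (the helpers crash_nmi_callback_pending()
and run_crash_nmi_callback() below are only placeholders for however the
callback registered by nmi_shootdown_cpus() would be exposed; they are
not existing kernel symbols):

/* Sketch only: a CPU that loses the panic_cpu race waits in NMI
 * context until the crashing CPU has run nmi_shootdown_cpus(), then
 * executes the registered callback itself so that its registers are
 * saved and VMX/SVM state is cleaned up. */
#define nmi_panic(fmt, ...) \
        do { \
                if (atomic_cmpxchg(&panic_cpu, -1, \
                                   raw_smp_processor_id()) == -1) { \
                        panic(fmt, ##__VA_ARGS__); \
                } else { \
                        while (!crash_nmi_callback_pending()) \
                                cpu_relax(); \
                        run_crash_nmi_callback(); \
                } \
        } while (0)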

> > > I can provide the full log but it is quite mangled. I guess the
> > > CPU130 was the only one allowed to proceed with the panic while others
> > > returned from the unknown NMI handling. It took a lot of time until
> > > CPU130 managed to boot the crash kernel with soft lockups and RCU stalls
> > > reports. CPU0 is most probably locked up waiting for CPU130 to
> > > acknowledge the IPI which will not happen apparently.
> >
> > There is a timeout of 1000ms in nmi_shootdown_cpus(), so I don't know
> > why CPU 130 waits so long.  I'll try to consider for a while.
> 
> Yes, I do not understand the timing here either and the fact that the
> log is a complete mess in the important parts doesn't help a wee bit.

I'm interested in where "kernel panic -not syncing: " is.
It may give us a clue.


Regards,
Kawai



RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-31 Thread 河合英宏 / KAWAI,HIDEHIRO
 From: Michal Hocko [mailto:mho...@kernel.org]
 
 On Thu 30-07-15 11:55:52, 河合英宏 / KAWAI,HIDEHIRO wrote:
   From: Michal Hocko [mailto:mho...@kernel.org]
 [...]
   Could you point me to the code which does that, please? Maybe we are
   missing that in our 3.0 kernel. I was quite surprised to see this
   behavior as well.
 
  Please see the snippet below.
 
  void setup_local_APIC(void)
  {
  ...
  /*
   * only the BP should see the LINT1 NMI signal, obviously.
   */
  if (!cpu)
  value = APIC_DM_NMI;
  else
  value = APIC_DM_NMI | APIC_LVT_MASKED;
  if (!lapic_is_integrated()) /* 82489DX */
  value |= APIC_LVT_LEVEL_TRIGGER;
  apic_write(APIC_LVT1, value);
 
 
  LINT1 pins of cpus other than CPU 0 are masked here.
  However, at least on some of Hitachi servers, NMI caused by NMI
  button doesn't seem to be delivered through LINT1.  So, my `external NMI'
  word may not be correct.
 
 I am not familiar with details here but I can tell you that this
 particular code snippet is the same in our 3.0 based kernel so it seems
 that the HW is indeed doing something differently.

Yes, and it turned out my PATCH 3/3 doesn't work at all on some
hardware...

   You might still get a panic on hardlockup which will happen on all CPUs
   from the NMI context so we have to be able to handle panic in NMI on
   many CPUs.
 
  Do you say about the case of a kerne panic while other cpus locks up
  in NMI context?  In that case, there is no way to do things needed by
  kdump procedure including saving registeres...
 
 I am saying that watchdog_overflow_callback might trigger on more CPUs
 and panic from NMI context as well. So this is not reduced to the NMI
 button sends NMI to more CPUs.

I understand.  So, I have to also modify watchdog_overflow_callback
to call nmi_panic().

 Why cannot the panic() context save all the registers if we are going to
 loop in NMI context? This would be imho preferable to returning from
 panic IMO.

I'm not saying we cannot save registers and do some cleanups in NMI
context.  I fell that it would introduce unneeded complexity.
Since watchdog_overflow_callback is defined as generic code,
we need to implement the preparation for kdump for other architectures.
I haven't checked which architectures support both nmi watchdog and
kdump, though.

Anyway, I came up with a simple solution for x86.  Waiting for the
timing of nmi_shootdown_cpus() in nmi_panic(), then invoke the
callback registered by nmi_shootdown_cpus().

   I can provide the full log but it is quite mangled. I guess the
   CPU130 was the only one allowed to proceed with the panic while others
   returned from the unknown NMI handling. It took a lot of time until
   CPU130 managed to boot the crash kernel with soft lockups and RCU stalls
   reports. CPU0 is most probably locked up waiting for CPU130 to
   acknowledge the IPI which will not happen apparently.
 
  There is a timeout of 1000ms in nmi_shootdown_cpus(), so I don't know
  why CPU 130 waits so long.  I'll try to consider for a while.
 
 Yes, I do not understand the timing here either and the fact that the
 log is a complete mess in the important parts doesn't help a wee bit.

I'm interested in where kernel panic -not syncing:  is.
It may give us a clue.


Regards,
Kawai



Re: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-30 Thread Michal Hocko
On Thu 30-07-15 11:55:52, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > From: Michal Hocko [mailto:mho...@kernel.org]
[...]
> > Could you point me to the code which does that, please? Maybe we are
> > missing that in our 3.0 kernel. I was quite surprised to see this
> > behavior as well.
> 
> Please see the snippet below.
> 
> void setup_local_APIC(void)
> {
> ...
> /*
>  * only the BP should see the LINT1 NMI signal, obviously.
>  */
> if (!cpu)
> value = APIC_DM_NMI;
> else
> value = APIC_DM_NMI | APIC_LVT_MASKED;
> if (!lapic_is_integrated()) /* 82489DX */
> value |= APIC_LVT_LEVEL_TRIGGER;
> apic_write(APIC_LVT1, value);
> 
> 
> LINT1 pins of cpus other than CPU 0 are masked here.
> However, at least on some of Hitachi servers, NMI caused by NMI
> button doesn't seem to be delivered through LINT1.  So, my `external NMI'
> word may not be correct.

I am not familiar with the details here, but I can tell you that this
particular code snippet is the same in our 3.0-based kernel, so it seems
that the HW is indeed doing something differently.

> > You might still get a panic on hardlockup which will happen on all CPUs
> > from the NMI context so we have to be able to handle panic in NMI on
> > many CPUs.
> 
> Do you say about the case of a kerne panic while other cpus locks up
> in NMI context?  In that case, there is no way to do things needed by
> kdump procedure including saving registeres...

I am saying that watchdog_overflow_callback might trigger on more CPUs
and panic from NMI context as well. So this is not limited to the case
where the NMI button sends an NMI to multiple CPUs.

Why can't the panic() context save all the registers if we are going to
loop in NMI context? That would IMO be preferable to returning from
panic.

[...]
> > I can provide the full log but it is quite mangled. I guess the
> > CPU130 was the only one allowed to proceed with the panic while others
> > returned from the unknown NMI handling. It took a lot of time until
> > CPU130 managed to boot the crash kernel with soft lockups and RCU stalls
> > reports. CPU0 is most probably locked up waiting for CPU130 to
> > acknowledge the IPI which will not happen apparently.
> 
> There is a timeout of 1000ms in nmi_shootdown_cpus(), so I don't know
> why CPU 130 waits so long.  I'll try to consider for a while.

Yes, I do not understand the timing here either and the fact that the
log is a complete mess in the important parts doesn't help a wee bit.
 
[...]

-- 
Michal Hocko
SUSE Labs


RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-30 Thread 河合英宏 / KAWAI,HIDEHIRO
> From: Michal Hocko [mailto:mho...@kernel.org]
> 
> On Thu 30-07-15 01:45:35, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > Hi,
> >
> > > From: Michal Hocko [mailto:mho...@kernel.org]
> > >
> > > On Wed 29-07-15 09:09:18, 河合英宏 / KAWAI,HIDEHIRO wrote:
> [...]
> > > > #define nmi_panic(fmt, ...)\
> > > >do {\
> > > >if (atomic_cmpxchg(&panic_cpu, -1, 
> > > > raw_smp_processor_id()) \
> > > >== -1)  \
> > > >panic(fmt, ##__VA_ARGS__);  \
> > > >} while (0)
> > >
> > > This would allow to return from NMI too eagerly.
> >
> > Yes, but what's the problem?
> 
> I believe that panic should be noreturn as much as possible and return
> only when we do not have any other options.

In my case, the external NMI is delivered only to CPU 0.  This means the
other CPUs continue to run until they receive the NMI IPI.  I think that
returning from the NMI handler soon doesn't differ much from this.

> Moreover I would ask an
> opposite question, what is the problem to loop in NMI on other CPUs than
> the one which is performing crash_kexec? We will not save registers, so
> what?

Actually, we have to at least save registers, clean up VMX/SVM state,
and announce that the CPU has stopped.  Just returning from the NMI
handler is simpler.
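
Concretely, I mean roughly the following per-CPU work.  This is only a
sketch, assuming the usual x86 kdump helpers (crash_save_cpu(),
cpu_emergency_vmxoff(), cpu_emergency_svm_disable(),
disable_local_APIC()); crash_stop_this_cpu() is a hypothetical name used
for illustration:

/* Sketch of the minimum work each non-crashing CPU should do before
 * it stops, similar in spirit to the x86 kdump NMI shootdown callback.
 * crash_stop_this_cpu() is a hypothetical helper, not existing code. */
static void crash_stop_this_cpu(struct pt_regs *regs, int cpu)
{
        crash_save_cpu(regs, cpu);      /* save register state for the vmcore */
        cpu_emergency_vmxoff();         /* leave VMX operation if active */
        cpu_emergency_svm_disable();    /* likewise for SVM */
        disable_local_APIC();
        /* announce that this CPU has stopped, then halt */
        for (;;)
                halt();
}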

> > The root cause of your case hasn't been clarified yet.
> > I can't fix for an unclear issue because I don't know what's the right
> > solution.
> >
> > > When I was testing my
> > > previous approach (on 3.0 based kernel) I had basically the same thing
> > > (one NMI to process panic) and others to return. This led to a strange
> > > behavior when the NMI button triggered NMI on all (hundreds) CPUs.
> >
> > It's strange.  Usually, NMI caused by NMI button is routed to only CPU 0
> > as an external NMI.  External NMI for CPUs other than CPU 0 are masked
> > at boot time.  Does it really happen?
> 
> Could you point me to the code which does that, please? Maybe we are
> missing that in our 3.0 kernel. I was quite surprised to see this
> behavior as well.

Please see the snippet below.

void setup_local_APIC(void)
{
        ...
        /*
         * only the BP should see the LINT1 NMI signal, obviously.
         */
        if (!cpu)
                value = APIC_DM_NMI;
        else
                value = APIC_DM_NMI | APIC_LVT_MASKED;
        if (!lapic_is_integrated())     /* 82489DX */
                value |= APIC_LVT_LEVEL_TRIGGER;
        apic_write(APIC_LVT1, value);


The LINT1 pins of CPUs other than CPU 0 are masked here.
However, at least on some Hitachi servers, the NMI caused by the NMI
button doesn't seem to be delivered through LINT1, so my term
`external NMI' may not be correct.

> > Does the problem still happen on the latest kernel?
> 
> I do not have machine accessible so I have to rely on the customer to
> test and the current vanilla might be an issue.

Sure.

> > What kind of NMI is deliverd to each CPU?
> 
> See the log below.
> 
> > Traditionally, we should have assumed that NMI for crash dumping is
> > delivered to only one cpu.  Otherwise, we should often fail to take
> > a proper crash dump.
> 
> You might still get a panic on hardlockup which will happen on all CPUs
> from the NMI context so we have to be able to handle panic in NMI on
> many CPUs.

Do you mean the case of a kernel panic while other CPUs are locked up
in NMI context?  In that case, there is no way to do the things needed
by the kdump procedure, including saving registers...

> > It seems that your case is another problem to be solved separately.
> 
> I do not think so, quite contrary. If you want to solve the reentrancy
> then other CPUs might be spinning in NMI if there is a guarantee that at
> least one CPU can progress to finish crash_kexec().
> 
> > > The
> > > crash kernel booted eventually but the log contained lockups when a
> > > CPU waited for an IPI to the CPU which was handling the NMI panic.
> >
> > Could you explain more precisely?
> 
> [  167.843761] Uhhuh. NMI received for unknown reason 3d on CPU 130.
> [  167.843763] Do you have a strange power saving mode enabled?
> [... Mangled output ]
> [  167.856415] Dazed and confused, but trying to continue
> [  167.856428] Dazed and confused, but trying to continue
> [  167.856442] Dazed and confused, but trying to continue
> [...]
> [  193.108440] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:0:4]
> [...]
> [  193.108586] Call Trace:
> [  193.108595]  [] smp_call_function_single+0x15b/0x170
> [  193.108600]  [] smp_call_function_any+0x4e/0x110
> [  193.108607]  [] get_cur_val+0xbc/0x130 [acpi_cpufreq]
> [  193.108630]  [] get_cur_freq_on_cpu+0x77/0xf0 
> [acpi_cpufreq]
> [  193.108638]  [] cpufreq_update_policy+0x97/0x140
> [  193.108646]  [] acpi_processor_notify+0x4b/0x145 
> [processor]
> [  193.108654]  [] acpi_ev_notify_dispatch+0x61/0x77

RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-30 Thread 河合英宏 / KAWAI,HIDEHIRO
Hi,

> -Original Message-
> From: Michal Hocko [mailto:mho...@kernel.org]
> 
> On Thu 30-07-15 07:33:15, 河合英宏 / KAWAI,HIDEHIRO wrote:
> [...]
> > Are you using SGI UV?  On that platform, NMIs may be delivered to
> > all cpus because LVT1 of all cpus are not masked as follows:
> 
> This is Compute Blade 520XB1 from Hitachi with 240 cpus.

Thanks for the information!

I asked a colleague in another department about the NMI button behavior
of our servers just before receiving your mail, and what you described
certainly happens: all CPUs say "Uhhuh. NMI received for unknown reason 3d
on CPU N" when the NMI button is pushed.  So this was also our problem...


Re: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-30 Thread Michal Hocko
On Thu 30-07-15 07:33:15, 河合英宏 / KAWAI,HIDEHIRO wrote:
[...]
> Are you using SGI UV?  On that platform, NMIs may be delivered to
> all cpus because LVT1 of all cpus are not masked as follows:

This is Compute Blade 520XB1 from Hitachi with 240 cpus.

-- 
Michal Hocko
SUSE Labs


Re: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-30 Thread Michal Hocko
On Thu 30-07-15 01:45:35, 河合英宏 / KAWAI,HIDEHIRO wrote:
> Hi,
> 
> > From: Michal Hocko [mailto:mho...@kernel.org]
> > 
> > On Wed 29-07-15 09:09:18, 河合英宏 / KAWAI,HIDEHIRO wrote:
[...]
> > > #define nmi_panic(fmt, ...)\
> > >do {\
> > >if (atomic_cmpxchg(&panic_cpu, -1, raw_smp_processor_id()) 
> > > \
> > >== -1)  \
> > >panic(fmt, ##__VA_ARGS__);  \
> > >} while (0)
> > 
> > This would allow to return from NMI too eagerly.
> 
> Yes, but what's the problem?

I believe that panic should be noreturn as much as possible and return
only when we do not have any other option. Moreover, I would ask the
opposite question: what is the problem with looping in NMI on CPUs other
than the one which is performing crash_kexec()? We will not save
registers, so what?

> The root cause of your case hasn't been clarified yet.
> I can't fix for an unclear issue because I don't know what's the right
> solution.
> 
> > When I was testing my
> > previous approach (on 3.0 based kernel) I had basically the same thing
> > (one NMI to process panic) and others to return. This led to a strange
> > behavior when the NMI button triggered NMI on all (hundreds) CPUs.
> 
> It's strange.  Usually, NMI caused by NMI button is routed to only CPU 0
> as an external NMI.  External NMI for CPUs other than CPU 0 are masked
> at boot time.  Does it really happen?

Could you point me to the code which does that, please? Maybe we are
missing that in our 3.0 kernel. I was quite surprised to see this
behavior as well.

> Does the problem still happen on the latest kernel?

I do not have the machine accessible, so I have to rely on the customer
to test, and the current vanilla kernel might be an issue.

> What kind of NMI is deliverd to each CPU?

See the log below.

> Traditionally, we should have assumed that NMI for crash dumping is
> delivered to only one cpu.  Otherwise, we should often fail to take
> a proper crash dump.

You might still get a panic on hardlockup, which will happen on all CPUs
from NMI context, so we have to be able to handle a panic in NMI on
many CPUs.

> It seems that your case is another problem to be solved separately.

I do not think so, quite the contrary. If you want to solve the
reentrancy, then the other CPUs might as well spin in NMI, as long as
there is a guarantee that at least one CPU can progress to finish
crash_kexec().

> > The
> > crash kernel booted eventually but the log contained lockups when a
> > CPU waited for an IPI to the CPU which was handling the NMI panic.
> 
> Could you explain more precisely?

[  167.843761] Uhhuh. NMI received for unknown reason 3d on CPU 130.
[  167.843763] Do you have a strange power saving mode enabled?
[... Mangled output ]
[  167.856415] Dazed and confused, but trying to continue
[  167.856428] Dazed and confused, but trying to continue
[  167.856442] Dazed and confused, but trying to continue
[...]
[  193.108440] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:0:4]
[...]
[  193.108586] Call Trace:
[  193.108595]  [] smp_call_function_single+0x15b/0x170
[  193.108600]  [] smp_call_function_any+0x4e/0x110
[  193.108607]  [] get_cur_val+0xbc/0x130 [acpi_cpufreq]
[  193.108630]  [] get_cur_freq_on_cpu+0x77/0xf0 
[acpi_cpufreq]
[  193.108638]  [] cpufreq_update_policy+0x97/0x140
[  193.108646]  [] acpi_processor_notify+0x4b/0x145 
[processor]
[  193.108654]  [] acpi_ev_notify_dispatch+0x61/0x77
[  193.108659]  [] acpi_os_execute_deferred+0x21/0x2c
[  193.108667]  [] process_one_work+0x16c/0x350
[  193.108673]  [] worker_thread+0x17a/0x410
[  193.108679]  [] kthread+0x96/0xa0
[  193.108688]  [] kernel_thread_helper+0x4/0x10
[...]
[  221.068390] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:0:4]
[...]
[  227.991235] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 130} 
(detected by 56, t=15002 jiffies)
[  227.991247] sending NMI to all CPUs:
[  227.991251] NMI backtrace for cpu 0
[  229.074091] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 130} 
(detected by 105, t=15013 jiffies)
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Linux version 3.0.101-0.47.55.9.8853.0.TEST-default 
(geeko@buildhost) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE 
Linux) ) #1 SMP Thu May 28 08:25:11 UTC 2015 (dc083ee)
[0.00] Command line: root=/dev/system/lvroot resume=/dev/system/lvswap 
intel_idle.max_cstate=0 processor.max_cstate=0 elevator=deadline nmi_watchdog=1 
console=tty0 console=ttyS1,115200 elevator=deadline sysrq=yes reset_devices 
irqpoll maxcpus=1 disable_cpu_apicid=0 noefi acpi_rsdp=0xba7a4014  
crashkernel=1024M-:512M memmap=exactmap memmap=576K@64K memmap=523684K@393216K 
elfcorehdr=916900K memmap=32768K#3018748K memmap=3736K#3051516K 
memmap=262144K$3145728K

I can provide the full log 

RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-30 Thread 河合英宏 / KAWAI,HIDEHIRO
Hi Michal,

> From: 河合英宏 / KAWAI,HIDEHIRO [mailto:hidehiro.kawai...@hitachi.com]
> > When I was testing my
> > previous approach (on 3.0 based kernel) I had basically the same thing
> > (one NMI to process panic) and others to return. This led to a strange
> > behavior when the NMI button triggered NMI on all (hundreds) CPUs.
> 
> It's strange.  Usually, NMI caused by NMI button is routed to only CPU 0
> as an external NMI.  External NMI for CPUs other than CPU 0 are masked
> at boot time.  Does it really happen?  Does the problem still happen on
> the latest kernel?  What kind of NMI is deliverd to each CPU?

Are you using SGI UV?  On that platform, NMIs may be delivered to
all CPUs, because LVT1 is left unmasked on every CPU, as follows:

void uv_nmi_init(void)
{
        unsigned int value;

        /*
         * Unmask NMI on all cpus
         */
        value = apic_read(APIC_LVT1) | APIC_DM_NMI;
        value &= ~APIC_LVT_MASKED;
        apic_write(APIC_LVT1, value);
}

> 
> Traditionally, we should have assumed that NMI for crash dumping is
> delivered to only one cpu.  Otherwise, we should often fail to take
> a proper crash dump.  It seems that your case is another problem to be
> solved separately.
> 
> > The
> > crash kernel booted eventually but the log contained lockups when a
> > CPU waited for an IPI to the CPU which was handling the NMI panic.
> 
> Could you explain more precisely?
> 
> > Anyway, I do not thing this is really necessary to solve the panic
> > reentrancy issue.
> > If the missing saved state is a real problem then it
> > should be handled separately - maybe it can be achieved without an IPI
> > and directly from the panic context if we are in NMI.
> 
> What I would like to do via this patchse is to solve race issues
> among NMI, panic() and crash_kexec().  So, I don't think we should fix
> that separately, although I would need to reword some descriptions
> and titles.
> 
> Anyway, I'm going to sent out my revised version once in order to
> tidy up.  I also would like to hear kexec/kdump guys' opinions.
> 
> Regards,
> Kawai


RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-29 Thread 河合英宏 / KAWAI,HIDEHIRO
Hi,

> From: Michal Hocko [mailto:mho...@kernel.org]
> 
> On Wed 29-07-15 09:09:18, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > From: Michal Hocko [mailto:mho...@kernel.org]
> > > On Wed 29-07-15 05:48:47, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > Hi,
> > > >
> > > > > From: linux-kernel-ow...@vger.kernel.org 
> > > > > [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Hidehiro 
> > > > > Kawai
> > > > > (2015/07/27 23:34), Michal Hocko wrote:
> > > > > > On Mon 27-07-15 10:58:50, Hidehiro Kawai wrote:
> > > > [...]
> > > > > > The check could be also relaxed a bit and nmi_panic would
> > > > > > return only if the ongoing panic is the current cpu when we really 
> > > > > > have
> > > > > > to return and allow the preempted panic to finish.
> > > > >
> > > > > It's reasonable.  I'll do that in the next version.
> > > >
> > > > I noticed atomic_read() is insufficient.  Please consider the following
> > > > scenario.
> > > >
> > > > CPU 1: call panic() in the normal context
> > > > CPU 0: call nmi_panic(), check the value of panic_cpu, then call panic()
> > > > CPU 1: set 1 to panic_cpu
> > > > CPU 0: fail to set 0 to panic_cpu, then do an infinite loop
> > > > CPU 1: call crash_kexec(), then call kdump_nmi_shootdown_cpus()
> > > >
> > > > At this point, since CPU 0 loops in NMI context, it never executes
> > > > the NMI handler registered by kdump_nmi_shootdown_cpus().  This means
> > > > that no register states are saved and no cleanups for VMX/SVM are
> > > > performed.
> > >
> > > Yes this is true but it is no different from the current state, isn't
> > > it? So if you want to handle that then it deserves a separate patch.
> > > It is certainly not harmful wrt. panic behavior.
> > >
> > > > So, we should still use atomic_cmpxchg() in nmi_panic() to
> > > > prevent other cpus from running panic routines.
> > >
> > > Not sure what you mean by that.
> >
> > I mean that we should use the same logic as my V2 patch like this:
> >
> > #define nmi_panic(fmt, ...)\
> >do {\
> >if (atomic_cmpxchg(&panic_cpu, -1, raw_smp_processor_id()) \
> >== -1)  \
> >panic(fmt, ##__VA_ARGS__);  \
> >} while (0)
> 
> This would allow to return from NMI too eagerly.

Yes, but what's the problem?
The root cause of your case hasn't been clarified yet.
I can't fix an unclear issue because I don't know what the right
solution is.

> When I was testing my
> previous approach (on 3.0 based kernel) I had basically the same thing
> (one NMI to process panic) and others to return. This led to a strange
> behavior when the NMI button triggered NMI on all (hundreds) CPUs.

It's strange.  Usually, the NMI caused by the NMI button is routed only
to CPU 0 as an external NMI.  External NMIs for CPUs other than CPU 0
are masked at boot time.  Does it really happen?  Does the problem still
happen on the latest kernel?  What kind of NMI is delivered to each CPU?

Traditionally, we have had to assume that the NMI for crash dumping is
delivered to only one CPU; otherwise, we would often fail to take
a proper crash dump.  It seems that your case is another problem to be
solved separately.

> The
> crash kernel booted eventually but the log contained lockups when a
> CPU waited for an IPI to the CPU which was handling the NMI panic.

Could you explain more precisely?

> Anyway, I do not thing this is really necessary to solve the panic
> reentrancy issue.
> If the missing saved state is a real problem then it
> should be handled separately - maybe it can be achieved without an IPI
> and directly from the panic context if we are in NMI.

What I would like to do via this patchset is to solve the race issues
among NMI, panic() and crash_kexec().  So, I don't think we should fix
that separately, although I would need to reword some descriptions
and titles.

Anyway, I'm going to send out my revised version in order to tidy
things up.  I would also like to hear the kexec/kdump folks' opinions.

Regards,
Kawai



Re: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-29 Thread Michal Hocko
On Wed 29-07-15 09:09:18, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > From: Michal Hocko [mailto:mho...@kernel.org]
> > On Wed 29-07-15 05:48:47, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > Hi,
> > >
> > > > From: linux-kernel-ow...@vger.kernel.org 
> > > > [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Hidehiro Kawai
> > > > (2015/07/27 23:34), Michal Hocko wrote:
> > > > > On Mon 27-07-15 10:58:50, Hidehiro Kawai wrote:
> > > [...]
> > > > > The check could be also relaxed a bit and nmi_panic would
> > > > > return only if the ongoing panic is the current cpu when we really 
> > > > > have
> > > > > to return and allow the preempted panic to finish.
> > > >
> > > > It's reasonable.  I'll do that in the next version.
> > >
> > > I noticed atomic_read() is insufficient.  Please consider the following
> > > scenario.
> > >
> > > CPU 1: call panic() in the normal context
> > > CPU 0: call nmi_panic(), check the value of panic_cpu, then call panic()
> > > CPU 1: set 1 to panic_cpu
> > > CPU 0: fail to set 0 to panic_cpu, then do an infinite loop
> > > CPU 1: call crash_kexec(), then call kdump_nmi_shootdown_cpus()
> > >
> > > At this point, since CPU 0 loops in NMI context, it never executes
> > > the NMI handler registered by kdump_nmi_shootdown_cpus().  This means
> > > that no register states are saved and no cleanups for VMX/SVM are
> > > performed.
> > 
> > Yes this is true but it is no different from the current state, isn't
> > it? So if you want to handle that then it deserves a separate patch.
> > It is certainly not harmful wrt. panic behavior.
> > 
> > > So, we should still use atomic_cmpxchg() in nmi_panic() to
> > > prevent other cpus from running panic routines.
> > 
> > Not sure what you mean by that.
> 
> I mean that we should use the same logic as my V2 patch like this:
> 
> #define nmi_panic(fmt, ...)\
>do {\
>if (atomic_cmpxchg(&panic_cpu, -1, raw_smp_processor_id()) \
>== -1)  \
>panic(fmt, ##__VA_ARGS__);  \
>} while (0)

This would allow returning from NMI too eagerly. When I was testing my
previous approach (on a 3.0-based kernel) I had basically the same thing
(one NMI to process the panic) while the others returned. This led to
strange behavior when the NMI button triggered an NMI on all (hundreds
of) CPUs. The crash kernel booted eventually, but the log contained
lockups when a CPU waited for an IPI to the CPU which was handling the
NMI panic.

Anyway, I do not think this is really necessary to solve the panic
reentrancy issue. If the missing saved state is a real problem then it
should be handled separately - maybe it can be achieved without an IPI
and directly from the panic context if we are in NMI.
-- 
Michal Hocko
SUSE Labs


RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-29 Thread 河合英宏 / KAWAI,HIDEHIRO
> From: Michal Hocko [mailto:mho...@kernel.org]
> On Wed 29-07-15 05:48:47, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > Hi,
> >
> > > From: linux-kernel-ow...@vger.kernel.org 
> > > [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Hidehiro Kawai
> > > (2015/07/27 23:34), Michal Hocko wrote:
> > > > On Mon 27-07-15 10:58:50, Hidehiro Kawai wrote:
> > [...]
> > > > The check could be also relaxed a bit and nmi_panic would
> > > > return only if the ongoing panic is the current cpu when we really have
> > > > to return and allow the preempted panic to finish.
> > >
> > > It's reasonable.  I'll do that in the next version.
> >
> > I noticed atomic_read() is insufficient.  Please consider the following
> > scenario.
> >
> > CPU 1: call panic() in the normal context
> > CPU 0: call nmi_panic(), check the value of panic_cpu, then call panic()
> > CPU 1: set 1 to panic_cpu
> > CPU 0: fail to set 0 to panic_cpu, then do an infinite loop
> > CPU 1: call crash_kexec(), then call kdump_nmi_shootdown_cpus()
> >
> > At this point, since CPU 0 loops in NMI context, it never executes
> > the NMI handler registered by kdump_nmi_shootdown_cpus().  This means
> > that no register states are saved and no cleanups for VMX/SVM are
> > performed.
> 
> Yes this is true but it is no different from the current state, isn't
> it? So if you want to handle that then it deserves a separate patch.
> It is certainly not harmful wrt. panic behavior.
> 
> > So, we should still use atomic_cmpxchg() in nmi_panic() to
> > prevent other cpus from running panic routines.
> 
> Not sure what you mean by that.

I mean that we should use the same logic as my V2 patch like this:

#define nmi_panic(fmt, ...) \
        do { \
                if (atomic_cmpxchg(&panic_cpu, -1, \
                                   raw_smp_processor_id()) == -1) \
                        panic(fmt, ##__VA_ARGS__); \
        } while (0)

By using atomic_cmpxchg() here, we can ensure that only this CPU
runs the panic routines.  It is important to prevent an NMI-context CPU
from calling panic_smp_self_stop().

void panic(const char *fmt, ...)
{
...
	/*
	 * `old_cpu == -1' means we are the first comer.
	 * `old_cpu == this_cpu' means we came here due to panic on NMI.
	 */
	this_cpu = raw_smp_processor_id();
	old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
	if (old_cpu != -1 && old_cpu != this_cpu)
		panic_smp_self_stop();

Please assume that CPU 0 calls nmi_panic() in NMI context
and CPU 1 calls panic() in normal context at the same time.

If CPU 1 sets panic_cpu before CPU 0 does, CPU 1 runs the panic routines
and CPU 0 returns from the NMI handler.  Eventually, CPU 0 is stopped
by nmi_shootdown_cpus().

If CPU 0 sets panic_cpu before CPU 1 does, CPU 0 runs the panic routines.
CPU 1 calls panic_smp_self_stop() and waits for the NMI from
nmi_shootdown_cpus().
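
As a minimal user-space sketch of this handshake (C11 atomics and
pthreads stand in for the kernel's atomic_t and the CPUs; the
do_panic/cpu_thread names are only illustrative, not from the patch):

/* Build with: cc -pthread panic_model.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int panic_cpu = -1;	/* -1: nobody is panicking yet */

static void do_panic(int cpu)		/* models panic() */
{
	int old = -1;

	/* on failure, `old' holds the id of the CPU that won the race */
	if (atomic_compare_exchange_strong(&panic_cpu, &old, cpu) || old == cpu)
		printf("CPU %d runs the panic routines\n", cpu);
	else
		printf("CPU %d: panic owned by CPU %d, panic_smp_self_stop()\n",
		       cpu, old);
}

static void *cpu_thread(void *arg)
{
	do_panic((int)(long)arg);
	return NULL;
}

int main(void)
{
	pthread_t t0, t1;

	/* CPU 0 panics from NMI, CPU 1 from normal context, at the same time */
	pthread_create(&t0, NULL, cpu_thread, (void *)0L);
	pthread_create(&t1, NULL, cpu_thread, (void *)1L);
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	return 0;
}

Whatever the interleaving, exactly one of the two "CPUs" wins the
cmpxchg and runs the panic routines; the `old == cpu' check additionally
lets the same CPU back in when panic() re-enters from NMI.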

Anyway, I tested my approach and it worked fine.

Regards,
Kawai



Re: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-29 Thread Michal Hocko
On Wed 29-07-15 05:48:47, 河合英宏 / KAWAI,HIDEHIRO wrote:
> Hi,
> 
> > From: linux-kernel-ow...@vger.kernel.org 
> > [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Hidehiro Kawai
> > (2015/07/27 23:34), Michal Hocko wrote:
> > > On Mon 27-07-15 10:58:50, Hidehiro Kawai wrote:
> [...]
> > > The check could be also relaxed a bit and nmi_panic would
> > > return only if the ongoing panic is the current cpu when we really have
> > > to return and allow the preempted panic to finish.
> > 
> > It's reasonable.  I'll do that in the next version.
> 
> I noticed atomic_read() is insufficient.  Please consider the following
> scenario.
> 
> CPU 1: call panic() in the normal context
> CPU 0: call nmi_panic(), check the value of panic_cpu, then call panic()
> CPU 1: set 1 to panic_cpu
> CPU 0: fail to set 0 to panic_cpu, then do an infinite loop
> CPU 1: call crash_kexec(), then call kdump_nmi_shootdown_cpus()
> 
> At this point, since CPU 0 loops in NMI context, it never executes
> the NMI handler registered by kdump_nmi_shootdown_cpus().  This means
> that no register states are saved and no cleanups for VMX/SVM are
> performed.

Yes this is true but it is no different from the current state, isn't
it? So if you want to handle that then it deserves a separate patch.
It is certainly not harmful wrt. panic behavior.

> So, we should still use atomic_cmpxchg() in nmi_panic() to
> prevent other cpus from running panic routines.

Not sure what you mean by that.

-- 
Michal Hocko
SUSE Labs


RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-29 Thread 河合英宏 / KAWAI,HIDEHIRO
Hi,

 From: Michal Hocko [mailto:mho...@kernel.org]
 
 On Wed 29-07-15 09:09:18, 河合英宏 / KAWAI,HIDEHIRO wrote:
   From: Michal Hocko [mailto:mho...@kernel.org]
   On Wed 29-07-15 05:48:47, 河合英宏 / KAWAI,HIDEHIRO wrote:
Hi,
   
 From: linux-kernel-ow...@vger.kernel.org 
 [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Hidehiro 
 Kawai
 (2015/07/27 23:34), Michal Hocko wrote:
  On Mon 27-07-15 10:58:50, Hidehiro Kawai wrote:
[...]
  The check could be also relaxed a bit and nmi_panic would
  return only if the ongoing panic is the current cpu when we really 
  have
  to return and allow the preempted panic to finish.

 It's reasonable.  I'll do that in the next version.
   
I noticed atomic_read() is insufficient.  Please consider the following
scenario.
   
CPU 1: call panic() in the normal context
CPU 0: call nmi_panic(), check the value of panic_cpu, then call panic()
CPU 1: set 1 to panic_cpu
CPU 0: fail to set 0 to panic_cpu, then do an infinite loop
CPU 1: call crash_kexec(), then call kdump_nmi_shootdown_cpus()
   
At this point, since CPU 0 loops in NMI context, it never executes
the NMI handler registered by kdump_nmi_shootdown_cpus().  This means
that no register states are saved and no cleanups for VMX/SVM are
performed.
  
   Yes this is true but it is no different from the current state, isn't
   it? So if you want to handle that then it deserves a separate patch.
   It is certainly not harmful wrt. panic behavior.
  
So, we should still use atomic_cmpxchg() in nmi_panic() to
prevent other cpus from running panic routines.
  
   Not sure what you mean by that.
 
  I mean that we should use the same logic as my V2 patch like this:
 
  #define nmi_panic(fmt, ...)\
 do {\
 if (atomic_cmpxchg(&panic_cpu, -1, raw_smp_processor_id()) \
 == -1)  \
 panic(fmt, ##__VA_ARGS__);  \
 } while (0)
 
 This would allow returning from the NMI too eagerly.

Yes, but what's the problem?
The root cause of your case hasn't been clarified yet.
I can't fix an unclear issue because I don't know what the right
solution is.

 When I was testing my
 previous approach (on 3.0 based kernel) I had basically the same thing
 (one NMI to process panic) and others to return. This led to a strange
 behavior when the NMI button triggered NMI on all (hundreds) CPUs.

It's strange.  Usually, an NMI caused by the NMI button is routed only to
CPU 0 as an external NMI.  External NMIs for CPUs other than CPU 0 are
masked at boot time.  Does this really happen?  Does the problem still
happen on the latest kernel?  What kind of NMI is delivered to each CPU?

Traditionally, we have assumed that the NMI for crash dumping is
delivered to only one CPU; otherwise, we would often fail to take
a proper crash dump.  It seems that your case is another problem to be
solved separately.
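
For reference, one way to check which CPUs actually receive NMIs on a
running system is the per-CPU "NMI:" row of /proc/interrupts; a small
diagnostic sketch (not part of this patch set, just for checking):

#include <ctype.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[4096];
	int lineno = 0;
	FILE *f = fopen("/proc/interrupts", "r");

	if (!f)
		return 1;
	while (fgets(line, sizeof(line), f)) {
		char *p = line;

		while (isspace((unsigned char)*p))
			p++;
		/* keep the header row (CPU names) and the per-CPU NMI counts */
		if (lineno++ == 0 || strncmp(p, "NMI:", 4) == 0)
			fputs(line, stdout);
	}
	fclose(f);
	return 0;
}

If only the CPU 0 column jumps when the NMI button is pressed (ignoring
the steady background from other NMI sources such as the perf watchdog),
the external NMI is indeed delivered to CPU 0 alone.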

 The
 crash kernel booted eventually but the log contained lockups when a
 CPU waited for an IPI to the CPU which was handling the NMI panic.

Could you explain more precisely?

 Anyway, I do not think this is really necessary to solve the panic
 reentrancy issue.
 If the missing saved state is a real problem then it
 should be handled separately - maybe it can be achieved without an IPI
 and directly from the panic context if we are in NMI.

What I would like to do via this patch set is to solve the race issues
among NMI, panic() and crash_kexec().  So, I don't think we should fix
that separately, although I would need to reword some descriptions
and titles.

Anyway, I'm going to send out my revised version once in order to
tidy things up.  I also would like to hear the kexec/kdump guys' opinions.

Regards,
Kawai



RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-28 Thread 河合英宏 / KAWAI,HIDEHIRO
Hi,

> From: linux-kernel-ow...@vger.kernel.org 
> [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Hidehiro Kawai
> (2015/07/27 23:34), Michal Hocko wrote:
> > On Mon 27-07-15 10:58:50, Hidehiro Kawai wrote:
[...]
> > The check could be also relaxed a bit and nmi_panic would
> > return only if the ongoing panic is the current cpu when we really have
> > to return and allow the preempted panic to finish.
> 
> It's reasonable.  I'll do that in the next version.

I noticed atomic_read() is insufficient.  Please consider the following
scenario.

CPU 1: call panic() in the normal context
CPU 0: call nmi_panic(), check the value of panic_cpu, then call panic()
CPU 1: set 1 to panic_cpu
CPU 0: fail to set 0 to panic_cpu, then do an infinite loop
CPU 1: call crash_kexec(), then call kdump_nmi_shootdown_cpus()

At this point, since CPU 0 loops in NMI context, it never executes
the NMI handler registered by kdump_nmi_shootdown_cpus().  This means
that no register states are saved and no cleanups for VMX/SVM are
performed.  So, we should still use atomic_cmpxchg() in nmi_panic() to
prevent other cpus from running panic routines.

> > +void nmi_panic(const char *fmt, ...)
> > +{
> > +   /*
> > +* We have to back off if the NMI has preempted an ongoing panic and
> > +* allow it to finish
> > +*/
> > +   if (atomic_read(&panic_cpu) == raw_smp_processor_id())
> > +   return;
> > +
> > +   panic();
> > +}
> > +EXPORT_SYMBOL(nmi_panic);
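
To make the window concrete, here is a minimal user-space sketch of the
scenario above, with C11 atomics and pthreads as stand-ins for atomic_t
and the two CPUs (the usleep() calls only force the interleaving from
the scenario; the helper names are made up for the illustration):

/* Build with: cc -pthread nmi_panic_race.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static atomic_int panic_cpu = -1;

static void panic_model(int cpu)	/* models panic() with the cmpxchg */
{
	int old = -1;

	if (!atomic_compare_exchange_strong(&panic_cpu, &old, cpu) && old != cpu) {
		/* panic_smp_self_stop(): the real code loops forever here, so a
		 * CPU that got here from NMI context can never handle the
		 * shootdown NMI sent later by crash_kexec(). */
		printf("CPU %d: stuck in panic_smp_self_stop()\n", cpu);
		return;
	}
	printf("CPU %d: runs panic(), crash_kexec(), nmi_shootdown_cpus()\n", cpu);
}

static void *nmi_cpu(void *arg)		/* CPU 0: nmi_panic() with atomic_read() */
{
	(void)arg;
	if (atomic_load(&panic_cpu) != 0) {	/* passes: panic_cpu is still -1 */
		usleep(20000);			/* CPU 1 sets panic_cpu meanwhile */
		panic_model(0);
	}
	return NULL;
}

static void *normal_cpu(void *arg)	/* CPU 1: plain panic() */
{
	(void)arg;
	usleep(10000);
	panic_model(1);
	return NULL;
}

int main(void)
{
	pthread_t t0, t1;

	pthread_create(&t0, NULL, nmi_cpu, NULL);
	pthread_create(&t1, NULL, normal_cpu, NULL);
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	return 0;
}

With the atomic_read()-style check, both "CPUs" get past nmi_panic(),
CPU 1 wins the cmpxchg inside panic(), and CPU 0 ends up in
panic_smp_self_stop() -- which in the real kernel means looping forever
in NMI context and never running the shootdown handler.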
