RE: Re: [V4 PATCH 2/2] mips/panic: Replace smp_send_stop() with kdump friendly version in panic path

2016-09-20 Thread 河合英宏 / KAWAIHIDEHIRO
Dave Young suggested to me to explain the problem in more detail,
so here is the revised commit description.  The patch is now in -mm,
so I copied Cc list from -mm version.  Also I added Corey Minyard's
Tested-by and Reviewed-by.

From: Hidehiro Kawai 
Subject: mips/panic: replace smp_send_stop() with kdump friendly version in 
panic path

This patch fixes the problems reported by Daniel Walker
(https://lkml.org/lkml/2015/6/24/44).

When kernel panics with crash_kexec_post_notifiers kernel parameter
enabled, other CPUs are stopped by smp_send_stop() instead of
machine_crash_shutdown() in __crash_kexec() path.

  panic()
if crash_kexec_post_notifiers == 1
  smp_send_stop()
  atomic_notifier_call_chain()
  kmsg_dump()
__crash_kexec()
  machine_crash_shutdown()
octeon_generic_shutdown() // shutdown watchdog for ONLINE CPUs

Different from smp_send_stop(), machine_crash_shutdown() stops other
CPUs with extra works for kdump.  So, if smp_send_stop() stops other
CPUs in advance, these extra works won't be done.  As the result,
kdump routines miss to save other CPUs' registers.  Additionally for
MIPS OCTEON, it misses to stop the watchdog timer.

To fix this problem, call a new kdump friendly function,
crash_smp_send_stop(), instead of the smp_send_stop() when
crash_kexec_post_notifiers is enabled.  crash_smp_send_stop() is a
weak function, and it just call smp_send_stop().  Architecture
codes should override it so that kdump can work appropriately.
This patch provides MIPS version.

Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option)
Link: 
http://lkml.kernel.org/r/20160810080950.11028.28000.st...@sysi4-13.yrl.intra.hitachi.co.jp
Signed-off-by: Hidehiro Kawai 
Reported-by: Daniel Walker 
Tested-by: Corey Minyard 
Reviewed-by: Corey Minyard 
Cc: Dave Young 
Cc: Baoquan He 
Cc: Vivek Goyal 
Cc: Eric Biederman 
Cc: Masami Hiramatsu 
Cc: Daniel Walker 
Cc: Xunlei Pang 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: David Vrabel 
Cc: Toshi Kani 
Cc: Ralf Baechle 
Cc: David Daney 
Cc: Aaro Koskinen 
Cc: "Steven J. Hill" 
Signed-off-by: Andrew Morton 

> From: Corey Minyard [mailto:cminy...@mvista.com]
> Sent: Friday, August 19, 2016 6:18 AM
> Sorry this took so long, but I have finally tested this, it seems to
> work fine:
> 
> Tested-by: Corey Minyard 
> Reviewed-by: Corey Minyard 
> 
> On 08/10/2016 03:09 AM, Hidehiro Kawai wrote:
> > Daniel Walker reported problems which happens when
> > crash_kexec_post_notifiers kernel option is enabled
> > (https://lkml.org/lkml/2015/6/24/44).
> >
> > In that case, smp_send_stop() is called before entering kdump routines
> > which assume other CPUs are still online.  As the result, kdump
> > routines fail to save other CPUs' registers.  Additionally for MIPS
> > OCTEON, it misses to stop the watchdog timer.
> >
> > To fix this problem, call a new kdump friendly function,
> > crash_smp_send_stop(), instead of the smp_send_stop() when
> > crash_kexec_post_notifiers is enabled.  crash_smp_send_stop() is a
> > weak function, and it just call smp_send_stop().  Architecture
> > codes should override it so that kdump can work appropriately.
> > This patch provides MIPS version.
> >
> > Reported-by: Daniel Walker 
> > Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" 
> > option)
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Ralf Baechle 
> > Cc: David Daney 
> > Cc: Aaro Koskinen 
> > Cc: "Steven J. Hill" 
> > Cc: Corey Minyard 
> >
> > ---
> > I'm not familiar with MIPS, and I don't have a test environment and
> > just did build tests only.  Please don't apply this patch until
> > someone does enough tests, otherwise simply drop this patch.
> > ---
> >   arch/mips/cavium-octeon/setup.c  |   14 ++
> >   arch/mips/include/asm/kexec.h|1 +
> >   arch/mips/kernel/crash.c |   18 +-
> >   arch/mips/kernel/machine_kexec.c |1 +
> >   4 files changed, 33 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/mips/cavium-octeon/setup.c 
> > b/arch/mips/cavium-octeon/setup.c
> > index cb16fcc..5537f95 100644
> > --- a/arch/mips/cavium-octeon/setup.c
> > +++ b/arch/mips/cavium-octeon/setup.c
> > @@ -267,6 +267,17 @@ static void 

RE: Re: [V4 PATCH 2/2] mips/panic: Replace smp_send_stop() with kdump friendly version in panic path

2016-09-20 Thread 河合英宏 / KAWAIHIDEHIRO
Dave Young suggested to me to explain the problem in more detail,
so here is the revised commit description.  The patch is now in -mm,
so I copied Cc list from -mm version.  Also I added Corey Minyard's
Tested-by and Reviewed-by.

From: Hidehiro Kawai 
Subject: mips/panic: replace smp_send_stop() with kdump friendly version in 
panic path

This patch fixes the problems reported by Daniel Walker
(https://lkml.org/lkml/2015/6/24/44).

When kernel panics with crash_kexec_post_notifiers kernel parameter
enabled, other CPUs are stopped by smp_send_stop() instead of
machine_crash_shutdown() in __crash_kexec() path.

  panic()
if crash_kexec_post_notifiers == 1
  smp_send_stop()
  atomic_notifier_call_chain()
  kmsg_dump()
__crash_kexec()
  machine_crash_shutdown()
octeon_generic_shutdown() // shutdown watchdog for ONLINE CPUs

Different from smp_send_stop(), machine_crash_shutdown() stops other
CPUs with extra works for kdump.  So, if smp_send_stop() stops other
CPUs in advance, these extra works won't be done.  As the result,
kdump routines miss to save other CPUs' registers.  Additionally for
MIPS OCTEON, it misses to stop the watchdog timer.

To fix this problem, call a new kdump friendly function,
crash_smp_send_stop(), instead of the smp_send_stop() when
crash_kexec_post_notifiers is enabled.  crash_smp_send_stop() is a
weak function, and it just call smp_send_stop().  Architecture
codes should override it so that kdump can work appropriately.
This patch provides MIPS version.

Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option)
Link: 
http://lkml.kernel.org/r/20160810080950.11028.28000.st...@sysi4-13.yrl.intra.hitachi.co.jp
Signed-off-by: Hidehiro Kawai 
Reported-by: Daniel Walker 
Tested-by: Corey Minyard 
Reviewed-by: Corey Minyard 
Cc: Dave Young 
Cc: Baoquan He 
Cc: Vivek Goyal 
Cc: Eric Biederman 
Cc: Masami Hiramatsu 
Cc: Daniel Walker 
Cc: Xunlei Pang 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: David Vrabel 
Cc: Toshi Kani 
Cc: Ralf Baechle 
Cc: David Daney 
Cc: Aaro Koskinen 
Cc: "Steven J. Hill" 
Signed-off-by: Andrew Morton 

> From: Corey Minyard [mailto:cminy...@mvista.com]
> Sent: Friday, August 19, 2016 6:18 AM
> Sorry this took so long, but I have finally tested this, it seems to
> work fine:
> 
> Tested-by: Corey Minyard 
> Reviewed-by: Corey Minyard 
> 
> On 08/10/2016 03:09 AM, Hidehiro Kawai wrote:
> > Daniel Walker reported problems which happens when
> > crash_kexec_post_notifiers kernel option is enabled
> > (https://lkml.org/lkml/2015/6/24/44).
> >
> > In that case, smp_send_stop() is called before entering kdump routines
> > which assume other CPUs are still online.  As the result, kdump
> > routines fail to save other CPUs' registers.  Additionally for MIPS
> > OCTEON, it misses to stop the watchdog timer.
> >
> > To fix this problem, call a new kdump friendly function,
> > crash_smp_send_stop(), instead of the smp_send_stop() when
> > crash_kexec_post_notifiers is enabled.  crash_smp_send_stop() is a
> > weak function, and it just call smp_send_stop().  Architecture
> > codes should override it so that kdump can work appropriately.
> > This patch provides MIPS version.
> >
> > Reported-by: Daniel Walker 
> > Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" 
> > option)
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Ralf Baechle 
> > Cc: David Daney 
> > Cc: Aaro Koskinen 
> > Cc: "Steven J. Hill" 
> > Cc: Corey Minyard 
> >
> > ---
> > I'm not familiar with MIPS, and I don't have a test environment and
> > just did build tests only.  Please don't apply this patch until
> > someone does enough tests, otherwise simply drop this patch.
> > ---
> >   arch/mips/cavium-octeon/setup.c  |   14 ++
> >   arch/mips/include/asm/kexec.h|1 +
> >   arch/mips/kernel/crash.c |   18 +-
> >   arch/mips/kernel/machine_kexec.c |1 +
> >   4 files changed, 33 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/mips/cavium-octeon/setup.c 
> > b/arch/mips/cavium-octeon/setup.c
> > index cb16fcc..5537f95 100644
> > --- a/arch/mips/cavium-octeon/setup.c
> > +++ b/arch/mips/cavium-octeon/setup.c
> > @@ -267,6 +267,17 @@ static void octeon_crash_shutdown(struct pt_regs *regs)
> > default_machine_crash_shutdown(regs);
> >   }
> >
> > +#ifdef CONFIG_SMP
> > +void octeon_crash_smp_send_stop(void)
> > +{
> > +   int cpu;
> > +
> > +   /* disable watchdogs */
> > +   for_each_online_cpu(cpu)
> > +   cvmx_write_csr(CVMX_CIU_WDOGX(cpu_logical_map(cpu)), 0);
> > +}
> > +#endif
> > +
> >   #endif /* CONFIG_KEXEC */
> >
> >   #ifdef CONFIG_CAVIUM_RESERVE32
> > @@ -911,6 +922,9 @@ void __init prom_init(void)
> > _machine_kexec_shutdown = octeon_shutdown;
> > _machine_crash_shutdown = octeon_crash_shutdown;
> > _machine_kexec_prepare = octeon_kexec_prepare;
> > +#ifdef CONFIG_SMP
> > +   _crash_smp_send_stop = 

RE: RE: Re: [V4 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version in panic path

2016-09-20 Thread 河合英宏 / KAWAIHIDEHIRO
Here is the revised commit description reflecting Dave's
comment.  Cc list was copied from -mm version.

From: Hidehiro Kawai 
Subject: x86/panic: replace smp_send_stop() with kdump friendly version in 
panic path

This patch fixes a problem reported by Daniel Walker
(https://lkml.org/lkml/2015/6/24/44).

When kernel panics with crash_kexec_post_notifiers kernel parameter
enabled, other CPUs are stopped by smp_send_stop() instead of
machine_crash_shutdown() in __crash_kexec() path.

  panic()
if crash_kexec_post_notifiers == 1
  smp_send_stop()
  atomic_notifier_call_chain()
  kmsg_dump()
__crash_kexec()
  machine_crash_shutdown()

Different from smp_send_stop(), machine_crash_shutdown() stops other
CPUs with extra works for kdump.  So, if smp_send_stop() stops other
CPUs in advance, these extra works won't be done.  For x86, kdump
routines miss to save other CPUs' registers and disable virtualization
extensions.

To fix this problem, call a new kdump friendly function,
crash_smp_send_stop(), instead of the smp_send_stop() when
crash_kexec_post_notifiers is enabled.  crash_smp_send_stop() is a
weak function, and it just call smp_send_stop().  Architecture
codes should override it so that kdump can work appropriately.
This patch only provides x86-specific version.

For Xen's PV kernel, just keep the current behavior.
As for Dom0, it doesn't use crash_kexec routines, and it relies on
panic notifier chain.  At the end of the chain, a hypercall is
issued which requests the hypervisor to execute kdump.  This means
regardless of crash_kexec_post_notifiers setting, smp_send_stop().
For PV HVM, it would work similarly to baremetal kernels with extra
cleanups for hypervisor.  It doesn't need additional care.

Changes in V4:
- Keep to use smp_send_stop if crash_kexec_post_notifiers is not set
- Rename panic_smp_send_stop to crash_smp_send_stop
- Don't change the behavior for Xen's PV kernel

Changes in V3:
- Revise comments, description, and symbol names

Changes in V2:
- Replace smp_send_stop() call with crash_kexec version which
  saves cpu states and cleans up VMX/SVM
- Drop a fix for Problem 1 at this moment

Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option)
Link: 
http://lkml.kernel.org/r/20160810080948.11028.15344.st...@sysi4-13.yrl.intra.hitachi.co.jp
Signed-off-by: Hidehiro Kawai 
Reported-by: Daniel Walker 
Cc: Dave Young 
Cc: Baoquan He 
Cc: Vivek Goyal 
Cc: Eric Biederman 
Cc: Masami Hiramatsu 
Cc: Daniel Walker 
Cc: Xunlei Pang 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: David Vrabel 
Cc: Toshi Kani 
Cc: Ralf Baechle 
Cc: David Daney 
Cc: Aaro Koskinen 
Cc: "Steven J. Hill" 
Cc: Corey Minyard 
Signed-off-by: Andrew Morton 

> -Original Message-
> From: Hidehiro Kawai
> Hi Xunlei,
> 
> > From: Xunlei Pang [mailto:xp...@redhat.com]
> > Sent: Tuesday, September 20, 2016 4:40 PM
> > On 2016/08/15/ at 19:22, Hidehiro Kawai wrote:
> > > Hi Dave,
> > >
> > > Thank you for the review.
> > >
> > >> From: Dave Young [mailto:dyo...@redhat.com]
> > >> Sent: Friday, August 12, 2016 12:17 PM
> > >>
> > >> Thanks for the update.
> > >> On 08/10/16 at 05:09pm, Hidehiro Kawai wrote:
> > >>> Daniel Walker reported problems which happens when
> > >>> crash_kexec_post_notifiers kernel option is enabled
> > >>> (https://lkml.org/lkml/2015/6/24/44).
> > >>>
> > >>> In that case, smp_send_stop() is called before entering kdump routines
> > >>> which assume other CPUs are still online.  As the result, for x86,
> > >>> kdump routines fail to save other CPUs' registers  and disable
> > >>> virtualization extensions.
> > >> Seems you simplified the changelog, but I think a little more details
> > >> will be helpful to understand the patch. You know sometimes lkml.org
> > >> does not work well.
> > > So, I'll try another archives when I post patch set next time.
> >
> > Hi Hidehiro Kawai,
> >
> > What's the status of this patch set, are you going to send an updated 
> > version?
> 
> Sorry, I misunderstood what Dave said, and I thought I don't
> need to revise the patch set.
> 
> Currently, this patch set is in -mm, then -next tree.
> What I need to fix is only commit descriptions, so I'm going to
> post revised descriptions as replies to this thread, and then
> request to Andrew to replace them if there is no objection.
> (or should I resend the whole patch set?)
> 
> Regards,
> Hidehiro Kawai
> 
> > >>> To fix this problem, call a new kdump friendly function,
> > >>> 

RE: RE: Re: [V4 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version in panic path

2016-09-20 Thread 河合英宏 / KAWAIHIDEHIRO
Here is the revised commit description reflecting Dave's
comment.  Cc list was copied from -mm version.

From: Hidehiro Kawai 
Subject: x86/panic: replace smp_send_stop() with kdump friendly version in 
panic path

This patch fixes a problem reported by Daniel Walker
(https://lkml.org/lkml/2015/6/24/44).

When kernel panics with crash_kexec_post_notifiers kernel parameter
enabled, other CPUs are stopped by smp_send_stop() instead of
machine_crash_shutdown() in __crash_kexec() path.

  panic()
if crash_kexec_post_notifiers == 1
  smp_send_stop()
  atomic_notifier_call_chain()
  kmsg_dump()
__crash_kexec()
  machine_crash_shutdown()

Different from smp_send_stop(), machine_crash_shutdown() stops other
CPUs with extra works for kdump.  So, if smp_send_stop() stops other
CPUs in advance, these extra works won't be done.  For x86, kdump
routines miss to save other CPUs' registers and disable virtualization
extensions.

To fix this problem, call a new kdump friendly function,
crash_smp_send_stop(), instead of the smp_send_stop() when
crash_kexec_post_notifiers is enabled.  crash_smp_send_stop() is a
weak function, and it just call smp_send_stop().  Architecture
codes should override it so that kdump can work appropriately.
This patch only provides x86-specific version.

For Xen's PV kernel, just keep the current behavior.
As for Dom0, it doesn't use crash_kexec routines, and it relies on
panic notifier chain.  At the end of the chain, a hypercall is
issued which requests the hypervisor to execute kdump.  This means
regardless of crash_kexec_post_notifiers setting, smp_send_stop().
For PV HVM, it would work similarly to baremetal kernels with extra
cleanups for hypervisor.  It doesn't need additional care.

Changes in V4:
- Keep to use smp_send_stop if crash_kexec_post_notifiers is not set
- Rename panic_smp_send_stop to crash_smp_send_stop
- Don't change the behavior for Xen's PV kernel

Changes in V3:
- Revise comments, description, and symbol names

Changes in V2:
- Replace smp_send_stop() call with crash_kexec version which
  saves cpu states and cleans up VMX/SVM
- Drop a fix for Problem 1 at this moment

Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option)
Link: 
http://lkml.kernel.org/r/20160810080948.11028.15344.st...@sysi4-13.yrl.intra.hitachi.co.jp
Signed-off-by: Hidehiro Kawai 
Reported-by: Daniel Walker 
Cc: Dave Young 
Cc: Baoquan He 
Cc: Vivek Goyal 
Cc: Eric Biederman 
Cc: Masami Hiramatsu 
Cc: Daniel Walker 
Cc: Xunlei Pang 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: David Vrabel 
Cc: Toshi Kani 
Cc: Ralf Baechle 
Cc: David Daney 
Cc: Aaro Koskinen 
Cc: "Steven J. Hill" 
Cc: Corey Minyard 
Signed-off-by: Andrew Morton 

> -Original Message-
> From: Hidehiro Kawai
> Hi Xunlei,
> 
> > From: Xunlei Pang [mailto:xp...@redhat.com]
> > Sent: Tuesday, September 20, 2016 4:40 PM
> > On 2016/08/15/ at 19:22, Hidehiro Kawai wrote:
> > > Hi Dave,
> > >
> > > Thank you for the review.
> > >
> > >> From: Dave Young [mailto:dyo...@redhat.com]
> > >> Sent: Friday, August 12, 2016 12:17 PM
> > >>
> > >> Thanks for the update.
> > >> On 08/10/16 at 05:09pm, Hidehiro Kawai wrote:
> > >>> Daniel Walker reported problems which happens when
> > >>> crash_kexec_post_notifiers kernel option is enabled
> > >>> (https://lkml.org/lkml/2015/6/24/44).
> > >>>
> > >>> In that case, smp_send_stop() is called before entering kdump routines
> > >>> which assume other CPUs are still online.  As the result, for x86,
> > >>> kdump routines fail to save other CPUs' registers  and disable
> > >>> virtualization extensions.
> > >> Seems you simplified the changelog, but I think a little more details
> > >> will be helpful to understand the patch. You know sometimes lkml.org
> > >> does not work well.
> > > So, I'll try another archives when I post patch set next time.
> >
> > Hi Hidehiro Kawai,
> >
> > What's the status of this patch set, are you going to send an updated 
> > version?
> 
> Sorry, I misunderstood what Dave said, and I thought I don't
> need to revise the patch set.
> 
> Currently, this patch set is in -mm, then -next tree.
> What I need to fix is only commit descriptions, so I'm going to
> post revised descriptions as replies to this thread, and then
> request to Andrew to replace them if there is no objection.
> (or should I resend the whole patch set?)
> 
> Regards,
> Hidehiro Kawai
> 
> > >>> To fix this problem, call a new kdump friendly function,
> > >>> crash_smp_send_stop(), instead of the smp_send_stop() when
> > >>> crash_kexec_post_notifiers is enabled.  crash_smp_send_stop() is a
> > >>> weak function, and it just call smp_send_stop().  Architecture
> > >>> codes should override it so that kdump can work appropriately.
> > >>> This patch only provides x86-specific version.
> > >>>
> > >>> For Xen's PV kernel, just keep the current behavior.
> > >> Could you explain a bit about above Xen PV kernel behavior?
> > 

RE: Re: [V4 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version in panic path

2016-09-20 Thread 河合英宏 / KAWAIHIDEHIRO
Hi Xunlei,

> From: Xunlei Pang [mailto:xp...@redhat.com]
> Sent: Tuesday, September 20, 2016 4:40 PM
> On 2016/08/15/ at 19:22, Hidehiro Kawai wrote:
> > Hi Dave,
> >
> > Thank you for the review.
> >
> >> From: Dave Young [mailto:dyo...@redhat.com]
> >> Sent: Friday, August 12, 2016 12:17 PM
> >>
> >> Thanks for the update.
> >> On 08/10/16 at 05:09pm, Hidehiro Kawai wrote:
> >>> Daniel Walker reported problems which happens when
> >>> crash_kexec_post_notifiers kernel option is enabled
> >>> (https://lkml.org/lkml/2015/6/24/44).
> >>>
> >>> In that case, smp_send_stop() is called before entering kdump routines
> >>> which assume other CPUs are still online.  As the result, for x86,
> >>> kdump routines fail to save other CPUs' registers  and disable
> >>> virtualization extensions.
> >> Seems you simplified the changelog, but I think a little more details
> >> will be helpful to understand the patch. You know sometimes lkml.org
> >> does not work well.
> > So, I'll try another archives when I post patch set next time.
> 
> Hi Hidehiro Kawai,
> 
> What's the status of this patch set, are you going to send an updated version?

Sorry, I misunderstood what Dave said, and I thought I don't
need to revise the patch set.

Currently, this patch set is in -mm, then -next tree.
What I need to fix is only commit descriptions, so I'm going to
post revised descriptions as replies to this thread, and then
request to Andrew to replace them if there is no objection.
(or should I resend the whole patch set?)

Regards,
Hidehiro Kawai

> >>> To fix this problem, call a new kdump friendly function,
> >>> crash_smp_send_stop(), instead of the smp_send_stop() when
> >>> crash_kexec_post_notifiers is enabled.  crash_smp_send_stop() is a
> >>> weak function, and it just call smp_send_stop().  Architecture
> >>> codes should override it so that kdump can work appropriately.
> >>> This patch only provides x86-specific version.
> >>>
> >>> For Xen's PV kernel, just keep the current behavior.
> >> Could you explain a bit about above Xen PV kernel behavior?
> >>
> >> BTW, this version looks better,  I think I'm fine with this version
> >> besides of the questions about changelog.
> > As for Dom0 kernel, it doesn't use crash_kexec routines, and
> > it relies on panic notifier chain.  At the end of the chain,
> > xen_panic_event is called, and it issues a hypercall which
> > requests Hypervisor to execute kdump.  This means whether
> > crash_kexec_panic_notifiers is set or not, panic notifiers
> > are called after smp_send_stop.  Even if we save registers
> > in Dom0 kernel, they seem to be ignored (Hypervisor is responsible
> > for that).  This is why I kept the current behavior for Xen.
> >
> > For PV DomU kernel, kdump is not supported.  For PV HVM
> > DomU, I'm not sure what will happen on panic because I
> > couldn't boot PV HVM DomU and test it.  But I think it will
> > work similarly to baremetal kernels with extra cleanups
> > for Hypervisor.
> >
> > Best regards,
> >
> > Hidehiro Kawai
> >
> >>> Changes in V4:
> >>> - Keep to use smp_send_stop if crash_kexec_post_notifiers is not set
> >>> - Rename panic_smp_send_stop to crash_smp_send_stop
> >>> - Don't change the behavior for Xen's PV kernel
> >>>
> >>> Changes in V3:
> >>> - Revise comments, description, and symbol names
> >>>
> >>> Changes in V2:
> >>> - Replace smp_send_stop() call with crash_kexec version which
> >>>   saves cpu states and cleans up VMX/SVM
> >>> - Drop a fix for Problem 1 at this moment
> >>>
> >>> Reported-by: Daniel Walker 
> >>> Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" 
> >>> option)
> >>> Signed-off-by: Hidehiro Kawai 
> >>> Cc: Dave Young 
> >>> Cc: Baoquan He 
> >>> Cc: Vivek Goyal 
> >>> Cc: Eric Biederman 
> >>> Cc: Masami Hiramatsu 
> >>> Cc: Daniel Walker 
> >>> Cc: Xunlei Pang 
> >>> Cc: Thomas Gleixner 
> >>> Cc: Ingo Molnar 
> >>> Cc: "H. Peter Anvin" 
> >>> Cc: Borislav Petkov 
> >>> Cc: David Vrabel 
> >>> Cc: Toshi Kani 
> >>> Cc: Andrew Morton 
> >>> ---
> >>>  arch/x86/include/asm/kexec.h |1 +
> >>>  arch/x86/include/asm/smp.h   |1 +
> >>>  arch/x86/kernel/crash.c  |   22 +---
> >>>  arch/x86/kernel/smp.c|5 
> >>>  kernel/panic.c   |   47 
> >>> --
> >>>  5 files changed, 66 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> >>> index d2434c1..282630e 100644
> >>> --- a/arch/x86/include/asm/kexec.h
> >>> +++ b/arch/x86/include/asm/kexec.h
> >>> @@ -210,6 +210,7 @@ struct kexec_entry64_regs {
> >>>
> >>>  typedef void 

RE: Re: [V4 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version in panic path

2016-09-20 Thread 河合英宏 / KAWAIHIDEHIRO
Hi Xunlei,

> From: Xunlei Pang [mailto:xp...@redhat.com]
> Sent: Tuesday, September 20, 2016 4:40 PM
> On 2016/08/15/ at 19:22, Hidehiro Kawai wrote:
> > Hi Dave,
> >
> > Thank you for the review.
> >
> >> From: Dave Young [mailto:dyo...@redhat.com]
> >> Sent: Friday, August 12, 2016 12:17 PM
> >>
> >> Thanks for the update.
> >> On 08/10/16 at 05:09pm, Hidehiro Kawai wrote:
> >>> Daniel Walker reported problems which happens when
> >>> crash_kexec_post_notifiers kernel option is enabled
> >>> (https://lkml.org/lkml/2015/6/24/44).
> >>>
> >>> In that case, smp_send_stop() is called before entering kdump routines
> >>> which assume other CPUs are still online.  As the result, for x86,
> >>> kdump routines fail to save other CPUs' registers  and disable
> >>> virtualization extensions.
> >> Seems you simplified the changelog, but I think a little more details
> >> will be helpful to understand the patch. You know sometimes lkml.org
> >> does not work well.
> > So, I'll try another archives when I post patch set next time.
> 
> Hi Hidehiro Kawai,
> 
> What's the status of this patch set, are you going to send an updated version?

Sorry, I misunderstood what Dave said, and I thought I don't
need to revise the patch set.

Currently, this patch set is in -mm, then -next tree.
What I need to fix is only commit descriptions, so I'm going to
post revised descriptions as replies to this thread, and then
request to Andrew to replace them if there is no objection.
(or should I resend the whole patch set?)

Regards,
Hidehiro Kawai

> >>> To fix this problem, call a new kdump friendly function,
> >>> crash_smp_send_stop(), instead of the smp_send_stop() when
> >>> crash_kexec_post_notifiers is enabled.  crash_smp_send_stop() is a
> >>> weak function, and it just call smp_send_stop().  Architecture
> >>> codes should override it so that kdump can work appropriately.
> >>> This patch only provides x86-specific version.
> >>>
> >>> For Xen's PV kernel, just keep the current behavior.
> >> Could you explain a bit about above Xen PV kernel behavior?
> >>
> >> BTW, this version looks better,  I think I'm fine with this version
> >> besides of the questions about changelog.
> > As for Dom0 kernel, it doesn't use crash_kexec routines, and
> > it relies on panic notifier chain.  At the end of the chain,
> > xen_panic_event is called, and it issues a hypercall which
> > requests Hypervisor to execute kdump.  This means whether
> > crash_kexec_panic_notifiers is set or not, panic notifiers
> > are called after smp_send_stop.  Even if we save registers
> > in Dom0 kernel, they seem to be ignored (Hypervisor is responsible
> > for that).  This is why I kept the current behavior for Xen.
> >
> > For PV DomU kernel, kdump is not supported.  For PV HVM
> > DomU, I'm not sure what will happen on panic because I
> > couldn't boot PV HVM DomU and test it.  But I think it will
> > work similarly to baremetal kernels with extra cleanups
> > for Hypervisor.
> >
> > Best regards,
> >
> > Hidehiro Kawai
> >
> >>> Changes in V4:
> >>> - Keep to use smp_send_stop if crash_kexec_post_notifiers is not set
> >>> - Rename panic_smp_send_stop to crash_smp_send_stop
> >>> - Don't change the behavior for Xen's PV kernel
> >>>
> >>> Changes in V3:
> >>> - Revise comments, description, and symbol names
> >>>
> >>> Changes in V2:
> >>> - Replace smp_send_stop() call with crash_kexec version which
> >>>   saves cpu states and cleans up VMX/SVM
> >>> - Drop a fix for Problem 1 at this moment
> >>>
> >>> Reported-by: Daniel Walker 
> >>> Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" 
> >>> option)
> >>> Signed-off-by: Hidehiro Kawai 
> >>> Cc: Dave Young 
> >>> Cc: Baoquan He 
> >>> Cc: Vivek Goyal 
> >>> Cc: Eric Biederman 
> >>> Cc: Masami Hiramatsu 
> >>> Cc: Daniel Walker 
> >>> Cc: Xunlei Pang 
> >>> Cc: Thomas Gleixner 
> >>> Cc: Ingo Molnar 
> >>> Cc: "H. Peter Anvin" 
> >>> Cc: Borislav Petkov 
> >>> Cc: David Vrabel 
> >>> Cc: Toshi Kani 
> >>> Cc: Andrew Morton 
> >>> ---
> >>>  arch/x86/include/asm/kexec.h |1 +
> >>>  arch/x86/include/asm/smp.h   |1 +
> >>>  arch/x86/kernel/crash.c  |   22 +---
> >>>  arch/x86/kernel/smp.c|5 
> >>>  kernel/panic.c   |   47 
> >>> --
> >>>  5 files changed, 66 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> >>> index d2434c1..282630e 100644
> >>> --- a/arch/x86/include/asm/kexec.h
> >>> +++ b/arch/x86/include/asm/kexec.h
> >>> @@ -210,6 +210,7 @@ struct kexec_entry64_regs {
> >>>
> >>>  typedef void crash_vmclear_fn(void);
> >>>  extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
> >>> +extern void kdump_nmi_shootdown_cpus(void);
> >>>
> >>>  #endif /* __ASSEMBLY__ */
> >>>
> >>> diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
> >>> index ebd0c16..f70989c 100644
> >>> --- 

RE: [V4 PATCH 2/2] mips/panic: Replace smp_send_stop() with kdump friendly version in panic path

2016-08-16 Thread 河合英宏 / KAWAIHIDEHIRO
> From: Corey Minyard [mailto:cminy...@mvista.com]
> Sent: Tuesday, August 16, 2016 3:02 AM
> On 08/15/2016 12:06 PM, Corey Minyard wrote:
> > On 08/15/2016 06:35 AM, 河合英宏 / KAWAI,HIDEHIRO wrote:
> >> Hi Corey,
> >>
> >>> From: Corey Minyard [mailto:cminy...@mvista.com]
> >>> Sent: Friday, August 12, 2016 10:56 PM
> >>> I'll try to test this, but I have one comment inline...
> >> Thank you very much!
> >>
> >>> On 08/11/2016 10:17 PM, Dave Young wrote:
> >>>> On 08/10/16 at 05:09pm, Hidehiro Kawai wrote:
> >> [snip]
> >>>>> diff --git a/arch/mips/kernel/crash.c b/arch/mips/kernel/crash.c
> >>>>> index 610f0f3..1723b17 100644
> >>>>> --- a/arch/mips/kernel/crash.c
> >>>>> +++ b/arch/mips/kernel/crash.c
> >>>>> @@ -47,9 +47,14 @@ static void crash_shutdown_secondary(void
> >>>>> *passed_regs)
> >>>>>
> >>>>>static void crash_kexec_prepare_cpus(void)
> >>>>>{
> >>>>> +static int cpus_stopped;
> >>>>>unsigned int msecs;
> >>>>> +unsigned int ncpus;
> >>>>>
> >>>>> -unsigned int ncpus = num_online_cpus() - 1;/* Excluding the
> >>>>> panic cpu */
> >>>>> +if (cpus_stopped)
> >>>>> +return;
> >>> Wouldn't you want an atomic operation and some special handling here to
> >>> ensure that only one CPU does this?  So if a CPU comes in here and
> >>> another CPU is already in the process stopping the CPUs it won't
> >>> result in a
> >>> deadlock.
> >> Because this function can be called only one panicking CPU,
> >> there is no problem.
> >>
> >> There are two paths which crash_kexec_prepare_cpus is called.
> >>
> >> Path 1 (panic path):
> >> panic()
> >>crash_smp_send_stop()
> >>  crash_kexec_prepare_cpus()
> >>
> >> Path 2 (oops path):
> >> crash_kexec()
> >>__crash_kexec()
> >>  machine_crash_shutdown()
> >>default_machine_crash_shutdown() // for MIPS
> >>  crash_kexec_prepare_cpus()
> >>
> >> Here, panic() and crash_kexec() run exclusively via
> >> panic_cpu atomic variable.  So we can use cpus_stopped as
> >> normal variable.
> >
> > Ok, if the code can only be entered once, what's the purpose of
> > cpus_stopped?
> > I guess that's what confused me.  You are right, the panic_cpu atomic
> > should
> > keep this on a single CPU.
> 
> Never mind, I see the path through panic() where that is required. My
> question
> below still remains, though.
> 
> > Also, panic() will call panic_smp_self_stop() if it finds another CPU
> > has already
> > called panic, which will just spin with interrupts off by default. I
> > didn't see a
> > definition for it in MIPS, wouldn't it need to be overridden to avoid
> > a deadlock?

No deadlock should happen. Panicking CPU calls crash_kexec_prepare_cpus(),
and it issues an IPI and wait for other CPUs handle it.  If some of them
are looping in panic_smp_self_stop() with interrupt disabled, they can't
handle the IPI.  But it's not a severe problem.  crash_kexec_prepare_cpus()
has a timeout mechanism, and it will go out from the wait loop when it
happens.

In that case, of course, their registers are not saved.  This could be
improved, but I'd like to entrust MIPS experts with the improvement.
This is another issue.

Best regards,

Hidehiro Kawai



RE: [V4 PATCH 2/2] mips/panic: Replace smp_send_stop() with kdump friendly version in panic path

2016-08-16 Thread 河合英宏 / KAWAIHIDEHIRO
> From: Corey Minyard [mailto:cminy...@mvista.com]
> Sent: Tuesday, August 16, 2016 3:02 AM
> On 08/15/2016 12:06 PM, Corey Minyard wrote:
> > On 08/15/2016 06:35 AM, 河合英宏 / KAWAI,HIDEHIRO wrote:
> >> Hi Corey,
> >>
> >>> From: Corey Minyard [mailto:cminy...@mvista.com]
> >>> Sent: Friday, August 12, 2016 10:56 PM
> >>> I'll try to test this, but I have one comment inline...
> >> Thank you very much!
> >>
> >>> On 08/11/2016 10:17 PM, Dave Young wrote:
> >>>> On 08/10/16 at 05:09pm, Hidehiro Kawai wrote:
> >> [snip]
> >>>>> diff --git a/arch/mips/kernel/crash.c b/arch/mips/kernel/crash.c
> >>>>> index 610f0f3..1723b17 100644
> >>>>> --- a/arch/mips/kernel/crash.c
> >>>>> +++ b/arch/mips/kernel/crash.c
> >>>>> @@ -47,9 +47,14 @@ static void crash_shutdown_secondary(void
> >>>>> *passed_regs)
> >>>>>
> >>>>>static void crash_kexec_prepare_cpus(void)
> >>>>>{
> >>>>> +static int cpus_stopped;
> >>>>>unsigned int msecs;
> >>>>> +unsigned int ncpus;
> >>>>>
> >>>>> -unsigned int ncpus = num_online_cpus() - 1;/* Excluding the
> >>>>> panic cpu */
> >>>>> +if (cpus_stopped)
> >>>>> +return;
> >>> Wouldn't you want an atomic operation and some special handling here to
> >>> ensure that only one CPU does this?  So if a CPU comes in here and
> >>> another CPU is already in the process stopping the CPUs it won't
> >>> result in a
> >>> deadlock.
> >> Because this function can be called only one panicking CPU,
> >> there is no problem.
> >>
> >> There are two paths which crash_kexec_prepare_cpus is called.
> >>
> >> Path 1 (panic path):
> >> panic()
> >>crash_smp_send_stop()
> >>  crash_kexec_prepare_cpus()
> >>
> >> Path 2 (oops path):
> >> crash_kexec()
> >>__crash_kexec()
> >>  machine_crash_shutdown()
> >>default_machine_crash_shutdown() // for MIPS
> >>  crash_kexec_prepare_cpus()
> >>
> >> Here, panic() and crash_kexec() run exclusively via
> >> panic_cpu atomic variable.  So we can use cpus_stopped as
> >> normal variable.
> >
> > Ok, if the code can only be entered once, what's the purpose of
> > cpus_stopped?
> > I guess that's what confused me.  You are right, the panic_cpu atomic
> > should
> > keep this on a single CPU.
> 
> Never mind, I see the path through panic() where that is required. My
> question
> below still remains, though.
> 
> > Also, panic() will call panic_smp_self_stop() if it finds another CPU
> > has already
> > called panic, which will just spin with interrupts off by default. I
> > didn't see a
> > definition for it in MIPS, wouldn't it need to be overridden to avoid
> > a deadlock?

No deadlock should happen. Panicking CPU calls crash_kexec_prepare_cpus(),
and it issues an IPI and wait for other CPUs handle it.  If some of them
are looping in panic_smp_self_stop() with interrupt disabled, they can't
handle the IPI.  But it's not a severe problem.  crash_kexec_prepare_cpus()
has a timeout mechanism, and it will go out from the wait loop when it
happens.

In that case, of course, their registers are not saved.  This could be
improved, but I'd like to entrust MIPS experts with the improvement.
This is another issue.

Best regards,

Hidehiro Kawai



RE: Re: [V4 PATCH 2/2] mips/panic: Replace smp_send_stop() with kdump friendly version in panic path

2016-08-15 Thread 河合英宏 / KAWAIHIDEHIRO
Hi Corey,

> From: Corey Minyard [mailto:cminy...@mvista.com]
> Sent: Friday, August 12, 2016 10:56 PM
> I'll try to test this, but I have one comment inline...

Thank you very much!

> On 08/11/2016 10:17 PM, Dave Young wrote:
> > On 08/10/16 at 05:09pm, Hidehiro Kawai wrote:
[snip]
> >> diff --git a/arch/mips/kernel/crash.c b/arch/mips/kernel/crash.c
> >> index 610f0f3..1723b17 100644
> >> --- a/arch/mips/kernel/crash.c
> >> +++ b/arch/mips/kernel/crash.c
> >> @@ -47,9 +47,14 @@ static void crash_shutdown_secondary(void *passed_regs)
> >>
> >>   static void crash_kexec_prepare_cpus(void)
> >>   {
> >> +  static int cpus_stopped;
> >>unsigned int msecs;
> >> +  unsigned int ncpus;
> >>
> >> -  unsigned int ncpus = num_online_cpus() - 1;/* Excluding the panic cpu */
> >> +  if (cpus_stopped)
> >> +  return;
> 
> Wouldn't you want an atomic operation and some special handling here to
> ensure that only one CPU does this?  So if a CPU comes in here and
> another CPU is already in the process stopping the CPUs it won't result in a
> deadlock.

Because this function can be called only one panicking CPU,
there is no problem.

There are two paths which crash_kexec_prepare_cpus is called.

Path 1 (panic path):
panic()
  crash_smp_send_stop()
crash_kexec_prepare_cpus()

Path 2 (oops path):
crash_kexec()
  __crash_kexec()
machine_crash_shutdown()
  default_machine_crash_shutdown() // for MIPS
crash_kexec_prepare_cpus()

Here, panic() and crash_kexec() run exclusively via
panic_cpu atomic variable.  So we can use cpus_stopped as
normal variable.

Best regards,

Hidehiro Kawai



RE: Re: [V4 PATCH 2/2] mips/panic: Replace smp_send_stop() with kdump friendly version in panic path

2016-08-15 Thread 河合英宏 / KAWAIHIDEHIRO
Hi Corey,

> From: Corey Minyard [mailto:cminy...@mvista.com]
> Sent: Friday, August 12, 2016 10:56 PM
> I'll try to test this, but I have one comment inline...

Thank you very much!

> On 08/11/2016 10:17 PM, Dave Young wrote:
> > On 08/10/16 at 05:09pm, Hidehiro Kawai wrote:
[snip]
> >> diff --git a/arch/mips/kernel/crash.c b/arch/mips/kernel/crash.c
> >> index 610f0f3..1723b17 100644
> >> --- a/arch/mips/kernel/crash.c
> >> +++ b/arch/mips/kernel/crash.c
> >> @@ -47,9 +47,14 @@ static void crash_shutdown_secondary(void *passed_regs)
> >>
> >>   static void crash_kexec_prepare_cpus(void)
> >>   {
> >> +  static int cpus_stopped;
> >>unsigned int msecs;
> >> +  unsigned int ncpus;
> >>
> >> -  unsigned int ncpus = num_online_cpus() - 1;/* Excluding the panic cpu */
> >> +  if (cpus_stopped)
> >> +  return;
> 
> Wouldn't you want an atomic operation and some special handling here to
> ensure that only one CPU does this?  So if a CPU comes in here and
> another CPU is already in the process stopping the CPUs it won't result in a
> deadlock.

Because this function can be called only one panicking CPU,
there is no problem.

There are two paths which crash_kexec_prepare_cpus is called.

Path 1 (panic path):
panic()
  crash_smp_send_stop()
crash_kexec_prepare_cpus()

Path 2 (oops path):
crash_kexec()
  __crash_kexec()
machine_crash_shutdown()
  default_machine_crash_shutdown() // for MIPS
crash_kexec_prepare_cpus()

Here, panic() and crash_kexec() run exclusively via
panic_cpu atomic variable.  So we can use cpus_stopped as
normal variable.

Best regards,

Hidehiro Kawai



RE: Re: [V4 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version in panic path

2016-08-15 Thread 河合英宏 / KAWAIHIDEHIRO
Hi Dave,

Thank you for the review.

> From: Dave Young [mailto:dyo...@redhat.com]
> Sent: Friday, August 12, 2016 12:17 PM
> 
> Thanks for the update.
> On 08/10/16 at 05:09pm, Hidehiro Kawai wrote:
> > Daniel Walker reported problems which happens when
> > crash_kexec_post_notifiers kernel option is enabled
> > (https://lkml.org/lkml/2015/6/24/44).
> >
> > In that case, smp_send_stop() is called before entering kdump routines
> > which assume other CPUs are still online.  As the result, for x86,
> > kdump routines fail to save other CPUs' registers  and disable
> > virtualization extensions.
> 
> Seems you simplified the changelog, but I think a little more details
> will be helpful to understand the patch. You know sometimes lkml.org
> does not work well.

So, I'll try another archives when I post patch set next time.

> > To fix this problem, call a new kdump friendly function,
> > crash_smp_send_stop(), instead of the smp_send_stop() when
> > crash_kexec_post_notifiers is enabled.  crash_smp_send_stop() is a
> > weak function, and it just call smp_send_stop().  Architecture
> > codes should override it so that kdump can work appropriately.
> > This patch only provides x86-specific version.
> >
> > For Xen's PV kernel, just keep the current behavior.
> 
> Could you explain a bit about above Xen PV kernel behavior?
> 
> BTW, this version looks better,  I think I'm fine with this version
> besides of the questions about changelog.

As for Dom0 kernel, it doesn't use crash_kexec routines, and
it relies on panic notifier chain.  At the end of the chain,
xen_panic_event is called, and it issues a hypercall which
requests Hypervisor to execute kdump.  This means whether
crash_kexec_panic_notifiers is set or not, panic notifiers
are called after smp_send_stop.  Even if we save registers
in Dom0 kernel, they seem to be ignored (Hypervisor is responsible
for that).  This is why I kept the current behavior for Xen.

For PV DomU kernel, kdump is not supported.  For PV HVM
DomU, I'm not sure what will happen on panic because I
couldn't boot PV HVM DomU and test it.  But I think it will
work similarly to baremetal kernels with extra cleanups
for Hypervisor.

Best regards,

Hidehiro Kawai

> > Changes in V4:
> > - Keep to use smp_send_stop if crash_kexec_post_notifiers is not set
> > - Rename panic_smp_send_stop to crash_smp_send_stop
> > - Don't change the behavior for Xen's PV kernel
> >
> > Changes in V3:
> > - Revise comments, description, and symbol names
> >
> > Changes in V2:
> > - Replace smp_send_stop() call with crash_kexec version which
> >   saves cpu states and cleans up VMX/SVM
> > - Drop a fix for Problem 1 at this moment
> >
> > Reported-by: Daniel Walker 
> > Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" 
> > option)
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Dave Young 
> > Cc: Baoquan He 
> > Cc: Vivek Goyal 
> > Cc: Eric Biederman 
> > Cc: Masami Hiramatsu 
> > Cc: Daniel Walker 
> > Cc: Xunlei Pang 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: "H. Peter Anvin" 
> > Cc: Borislav Petkov 
> > Cc: David Vrabel 
> > Cc: Toshi Kani 
> > Cc: Andrew Morton 
> > ---
> >  arch/x86/include/asm/kexec.h |1 +
> >  arch/x86/include/asm/smp.h   |1 +
> >  arch/x86/kernel/crash.c  |   22 +---
> >  arch/x86/kernel/smp.c|5 
> >  kernel/panic.c   |   47 
> > --
> >  5 files changed, 66 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> > index d2434c1..282630e 100644
> > --- a/arch/x86/include/asm/kexec.h
> > +++ b/arch/x86/include/asm/kexec.h
> > @@ -210,6 +210,7 @@ struct kexec_entry64_regs {
> >
> >  typedef void crash_vmclear_fn(void);
> >  extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
> > +extern void kdump_nmi_shootdown_cpus(void);
> >
> >  #endif /* __ASSEMBLY__ */
> >
> > diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
> > index ebd0c16..f70989c 100644
> > --- a/arch/x86/include/asm/smp.h
> > +++ b/arch/x86/include/asm/smp.h
> > @@ -50,6 +50,7 @@ struct smp_ops {
> > void (*smp_cpus_done)(unsigned max_cpus);
> >
> > void (*stop_other_cpus)(int wait);
> > +   void (*crash_stop_other_cpus)(void);
> > void (*smp_send_reschedule)(int cpu);
> >
> > int (*cpu_up)(unsigned cpu, struct task_struct *tidle);
> > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> > index 9616cf7..650830e 100644
> > --- a/arch/x86/kernel/crash.c
> > +++ b/arch/x86/kernel/crash.c
> > @@ -133,15 +133,31 @@ static void kdump_nmi_callback(int cpu, struct 
> > pt_regs 

RE: Re: [V4 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version in panic path

2016-08-15 Thread 河合英宏 / KAWAIHIDEHIRO
Hi Dave,

Thank you for the review.

> From: Dave Young [mailto:dyo...@redhat.com]
> Sent: Friday, August 12, 2016 12:17 PM
> 
> Thanks for the update.
> On 08/10/16 at 05:09pm, Hidehiro Kawai wrote:
> > Daniel Walker reported problems which happens when
> > crash_kexec_post_notifiers kernel option is enabled
> > (https://lkml.org/lkml/2015/6/24/44).
> >
> > In that case, smp_send_stop() is called before entering kdump routines
> > which assume other CPUs are still online.  As the result, for x86,
> > kdump routines fail to save other CPUs' registers  and disable
> > virtualization extensions.
> 
> Seems you simplified the changelog, but I think a little more details
> will be helpful to understand the patch. You know sometimes lkml.org
> does not work well.

So, I'll try another archives when I post patch set next time.

> > To fix this problem, call a new kdump friendly function,
> > crash_smp_send_stop(), instead of the smp_send_stop() when
> > crash_kexec_post_notifiers is enabled.  crash_smp_send_stop() is a
> > weak function, and it just call smp_send_stop().  Architecture
> > codes should override it so that kdump can work appropriately.
> > This patch only provides x86-specific version.
> >
> > For Xen's PV kernel, just keep the current behavior.
> 
> Could you explain a bit about above Xen PV kernel behavior?
> 
> BTW, this version looks better,  I think I'm fine with this version
> besides of the questions about changelog.

As for Dom0 kernel, it doesn't use crash_kexec routines, and
it relies on panic notifier chain.  At the end of the chain,
xen_panic_event is called, and it issues a hypercall which
requests Hypervisor to execute kdump.  This means whether
crash_kexec_panic_notifiers is set or not, panic notifiers
are called after smp_send_stop.  Even if we save registers
in Dom0 kernel, they seem to be ignored (Hypervisor is responsible
for that).  This is why I kept the current behavior for Xen.

For PV DomU kernel, kdump is not supported.  For PV HVM
DomU, I'm not sure what will happen on panic because I
couldn't boot PV HVM DomU and test it.  But I think it will
work similarly to baremetal kernels with extra cleanups
for Hypervisor.

Best regards,

Hidehiro Kawai

> > Changes in V4:
> > - Keep to use smp_send_stop if crash_kexec_post_notifiers is not set
> > - Rename panic_smp_send_stop to crash_smp_send_stop
> > - Don't change the behavior for Xen's PV kernel
> >
> > Changes in V3:
> > - Revise comments, description, and symbol names
> >
> > Changes in V2:
> > - Replace smp_send_stop() call with crash_kexec version which
> >   saves cpu states and cleans up VMX/SVM
> > - Drop a fix for Problem 1 at this moment
> >
> > Reported-by: Daniel Walker 
> > Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" 
> > option)
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Dave Young 
> > Cc: Baoquan He 
> > Cc: Vivek Goyal 
> > Cc: Eric Biederman 
> > Cc: Masami Hiramatsu 
> > Cc: Daniel Walker 
> > Cc: Xunlei Pang 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: "H. Peter Anvin" 
> > Cc: Borislav Petkov 
> > Cc: David Vrabel 
> > Cc: Toshi Kani 
> > Cc: Andrew Morton 
> > ---
> >  arch/x86/include/asm/kexec.h |1 +
> >  arch/x86/include/asm/smp.h   |1 +
> >  arch/x86/kernel/crash.c  |   22 +---
> >  arch/x86/kernel/smp.c|5 
> >  kernel/panic.c   |   47 
> > --
> >  5 files changed, 66 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> > index d2434c1..282630e 100644
> > --- a/arch/x86/include/asm/kexec.h
> > +++ b/arch/x86/include/asm/kexec.h
> > @@ -210,6 +210,7 @@ struct kexec_entry64_regs {
> >
> >  typedef void crash_vmclear_fn(void);
> >  extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
> > +extern void kdump_nmi_shootdown_cpus(void);
> >
> >  #endif /* __ASSEMBLY__ */
> >
> > diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
> > index ebd0c16..f70989c 100644
> > --- a/arch/x86/include/asm/smp.h
> > +++ b/arch/x86/include/asm/smp.h
> > @@ -50,6 +50,7 @@ struct smp_ops {
> > void (*smp_cpus_done)(unsigned max_cpus);
> >
> > void (*stop_other_cpus)(int wait);
> > +   void (*crash_stop_other_cpus)(void);
> > void (*smp_send_reschedule)(int cpu);
> >
> > int (*cpu_up)(unsigned cpu, struct task_struct *tidle);
> > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> > index 9616cf7..650830e 100644
> > --- a/arch/x86/kernel/crash.c
> > +++ b/arch/x86/kernel/crash.c
> > @@ -133,15 +133,31 @@ static void kdump_nmi_callback(int cpu, struct 
> > pt_regs *regs)
> > disable_local_APIC();
> >  }
> >
> > -static void kdump_nmi_shootdown_cpus(void)
> > +void kdump_nmi_shootdown_cpus(void)
> >  {
> > nmi_shootdown_cpus(kdump_nmi_callback);
> >
> > disable_local_APIC();
> >  }
> >
> > +/* Override the weak function in kernel/panic.c */
> > +void 

RE: Re: [V3 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version

2016-07-19 Thread 河合英宏 / KAWAIHIDEHIRO
> From: 'Dave Young' [mailto:dyo...@redhat.com]
> Sent: Tuesday, July 19, 2016 3:52 PM
> Hi,
> On 07/19/16 at 05:51am, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > Hi,
> >
> > > From: 'Dave Young' [mailto:dyo...@redhat.com]
> > > Sent: Monday, July 18, 2016 6:02 PM
> > > On 07/15/16 at 11:50am, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > Hi Dave,
> > > >
> > > > Thanks for your reply.
> > > >
> > > > > From: 'Dave Young' [mailto:dyo...@redhat.com]
> > > > > Sent: Wednesday, July 13, 2016 11:04 AM
> > > > >
> > > > > On 07/12/16 at 02:49am, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > > > Hi Dave,
> > > > > >
> > > > > > Thanks for the comments.
> > > > > >
> > > > > > > From: Dave Young [mailto:dyo...@redhat.com]
> > > > > > > Sent: Monday, July 11, 2016 5:35 PM
> > > > > > >
> > > > > > > On 07/05/16 at 08:33pm, Hidehiro Kawai wrote:
> > [snip]
> > > > > > > As for this patch I'm not sure it is safe to replace the
> > > > > > > smp_send_stop with the kdump friendly function. I'm also not sure 
> > > > > > > if
> > > > > > > the kdump friendly function is safe for kdump. Will glad to hear
> > > > > > > opinions from other arch experts.
> > > > > >
> > > > > > This stuff depends on architectures, so I speak only about
> > > > > > x86 (the logic doesn't change on other architectures at this time).
> > > > > >
> > > > > > kdump path with crash_kexec_post_notifiers disabled:
> > > > > >  panic()
> > > > > >__crash_kexec()
> > > > > >  crash_setup_regs()
> > > > > >  crash_save_vmcoreinfo()
> > > > > >  machine_crash_shutdown()
> > > > > >native_machine_crash_shutdown()
> > > > > >  panic_smp_send_stop() /* mostly same as original
> > > > > > * kdump_nmi_shootdown_cpus()
> > > > > > */
> > > > > >
> > > > > > kdump path with crash_kexec_post_notifiers enabled:
> > > > > >  panic()
> > > > > >panic_smp_send_stop()
> > > > > >__crash_kexec()
> > > > > >  crash_setup_regs()
> > > > > >  crash_save_vmcoreinfo()
> > > > > >  machine_crash_shutdown()
> > > > > >native_machine_crash_shutdown()
> > > > > >  panic_smp_send_stop() // do nothing
> > > > > >
> > > > > > The difference is that stopping other CPUs before crash_setup_regs()
> > > > > > and crash_save_vmcoreinfo() or not.  Since crash_setup_regs() and
> > > > > > crash_save_vmcoreinfo() just save information to some memory area,
> > > > > > they wouldn't be affected by panic_smp_send_stop().  This means
> > > > > > placing panic_smp_send_stop before __crash_kexec is safe.
> > > > > >
> > > > > > BTW, I noticed my patch breaks Xen kernel.  I'll fix it in the next
> > > > > > version.
> > > > >
> > > > > But it does breaks stuff which depends on cpu not being disabled like 
> > > > > problem 1 you mentioned in patch log.
> > > >
> > > > As I mentioned in the description of this patch, we should stop
> > > > other CPUs ASAP to preserve current state either
> > > > crash_kexec_post_notifiers is enabled or not.
> > > > Then, all remaining procedures should work well
> > > > after stopping other CPUs (but keep the CPU map online).
> > > >
> > > > Vivek also mentioned similar things:
> > > > https://lkml.org/lkml/2015/7/14/433
> > >
> > > The implementation in this patchset is different from suggestion in above 
> > > link?
> > >
> > > I think Vivek's suggestion is a good idea, to drop smp_send_stop and do 
> > > below:
> > >
> > > stop_cpus_save_register_state;
> > >
> > > if (!crash_kexec_post_notifiers)
> > >   crash_kexec()
> > > atomic_notifier_call_chain()
> > > kmsg_dump()
> > >
> > > I'm just commenting from code flow point of view, the detail 
> > > 

RE: Re: [V3 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version

2016-07-19 Thread 河合英宏 / KAWAIHIDEHIRO
> From: 'Dave Young' [mailto:dyo...@redhat.com]
> Sent: Tuesday, July 19, 2016 3:52 PM
> Hi,
> On 07/19/16 at 05:51am, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > Hi,
> >
> > > From: 'Dave Young' [mailto:dyo...@redhat.com]
> > > Sent: Monday, July 18, 2016 6:02 PM
> > > On 07/15/16 at 11:50am, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > Hi Dave,
> > > >
> > > > Thanks for your reply.
> > > >
> > > > > From: 'Dave Young' [mailto:dyo...@redhat.com]
> > > > > Sent: Wednesday, July 13, 2016 11:04 AM
> > > > >
> > > > > On 07/12/16 at 02:49am, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > > > Hi Dave,
> > > > > >
> > > > > > Thanks for the comments.
> > > > > >
> > > > > > > From: Dave Young [mailto:dyo...@redhat.com]
> > > > > > > Sent: Monday, July 11, 2016 5:35 PM
> > > > > > >
> > > > > > > On 07/05/16 at 08:33pm, Hidehiro Kawai wrote:
> > [snip]
> > > > > > > As for this patch I'm not sure it is safe to replace the
> > > > > > > smp_send_stop with the kdump friendly function. I'm also not sure 
> > > > > > > if
> > > > > > > the kdump friendly function is safe for kdump. Will glad to hear
> > > > > > > opinions from other arch experts.
> > > > > >
> > > > > > This stuff depends on architectures, so I speak only about
> > > > > > x86 (the logic doesn't change on other architectures at this time).
> > > > > >
> > > > > > kdump path with crash_kexec_post_notifiers disabled:
> > > > > >  panic()
> > > > > >__crash_kexec()
> > > > > >  crash_setup_regs()
> > > > > >  crash_save_vmcoreinfo()
> > > > > >  machine_crash_shutdown()
> > > > > >native_machine_crash_shutdown()
> > > > > >  panic_smp_send_stop() /* mostly same as original
> > > > > > * kdump_nmi_shootdown_cpus()
> > > > > > */
> > > > > >
> > > > > > kdump path with crash_kexec_post_notifiers enabled:
> > > > > >  panic()
> > > > > >panic_smp_send_stop()
> > > > > >__crash_kexec()
> > > > > >  crash_setup_regs()
> > > > > >  crash_save_vmcoreinfo()
> > > > > >  machine_crash_shutdown()
> > > > > >native_machine_crash_shutdown()
> > > > > >  panic_smp_send_stop() // do nothing
> > > > > >
> > > > > > The difference is that stopping other CPUs before crash_setup_regs()
> > > > > > and crash_save_vmcoreinfo() or not.  Since crash_setup_regs() and
> > > > > > crash_save_vmcoreinfo() just save information to some memory area,
> > > > > > they wouldn't be affected by panic_smp_send_stop().  This means
> > > > > > placing panic_smp_send_stop before __crash_kexec is safe.
> > > > > >
> > > > > > BTW, I noticed my patch breaks Xen kernel.  I'll fix it in the next
> > > > > > version.
> > > > >
> > > > > But it does breaks stuff which depends on cpu not being disabled like 
> > > > > problem 1 you mentioned in patch log.
> > > >
> > > > As I mentioned in the description of this patch, we should stop
> > > > other CPUs ASAP to preserve current state either
> > > > crash_kexec_post_notifiers is enabled or not.
> > > > Then, all remaining procedures should work well
> > > > after stopping other CPUs (but keep the CPU map online).
> > > >
> > > > Vivek also mentioned similar things:
> > > > https://lkml.org/lkml/2015/7/14/433
> > >
> > > The implementation in this patchset is different from suggestion in above 
> > > link?
> > >
> > > I think Vivek's suggestion is a good idea, to drop smp_send_stop and do 
> > > below:
> > >
> > > stop_cpus_save_register_state;
> > >
> > > if (!crash_kexec_post_notifiers)
> > >   crash_kexec()
> > > atomic_notifier_call_chain()
> > > kmsg_dump()
> > >
> > > I'm just commenting from code flow point of view, the detail 
> > > implementati

RE: Re: [V3 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version

2016-07-18 Thread 河合英宏 / KAWAIHIDEHIRO
Hi,

> From: 'Dave Young' [mailto:dyo...@redhat.com]
> Sent: Monday, July 18, 2016 6:02 PM
> On 07/15/16 at 11:50am, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > Hi Dave,
> >
> > Thanks for your reply.
> >
> > > From: 'Dave Young' [mailto:dyo...@redhat.com]
> > > Sent: Wednesday, July 13, 2016 11:04 AM
> > >
> > > On 07/12/16 at 02:49am, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > Hi Dave,
> > > >
> > > > Thanks for the comments.
> > > >
> > > > > From: Dave Young [mailto:dyo...@redhat.com]
> > > > > Sent: Monday, July 11, 2016 5:35 PM
> > > > >
> > > > > On 07/05/16 at 08:33pm, Hidehiro Kawai wrote:
[snip] 
> > > > > As for this patch I'm not sure it is safe to replace the
> > > > > smp_send_stop with the kdump friendly function. I'm also not sure if
> > > > > the kdump friendly function is safe for kdump. Will glad to hear
> > > > > opinions from other arch experts.
> > > >
> > > > This stuff depends on architectures, so I speak only about
> > > > x86 (the logic doesn't change on other architectures at this time).
> > > >
> > > > kdump path with crash_kexec_post_notifiers disabled:
> > > >  panic()
> > > >__crash_kexec()
> > > >  crash_setup_regs()
> > > >  crash_save_vmcoreinfo()
> > > >  machine_crash_shutdown()
> > > >native_machine_crash_shutdown()
> > > >  panic_smp_send_stop() /* mostly same as original
> > > > * kdump_nmi_shootdown_cpus()
> > > > */
> > > >
> > > > kdump path with crash_kexec_post_notifiers enabled:
> > > >  panic()
> > > >panic_smp_send_stop()
> > > >__crash_kexec()
> > > >  crash_setup_regs()
> > > >  crash_save_vmcoreinfo()
> > > >  machine_crash_shutdown()
> > > >native_machine_crash_shutdown()
> > > >  panic_smp_send_stop() // do nothing
> > > >
> > > > The difference is that stopping other CPUs before crash_setup_regs()
> > > > and crash_save_vmcoreinfo() or not.  Since crash_setup_regs() and
> > > > crash_save_vmcoreinfo() just save information to some memory area,
> > > > they wouldn't be affected by panic_smp_send_stop().  This means
> > > > placing panic_smp_send_stop before __crash_kexec is safe.
> > > >
> > > > BTW, I noticed my patch breaks Xen kernel.  I'll fix it in the next
> > > > version.
> > >
> > > But it does breaks stuff which depends on cpu not being disabled like 
> > > problem 1 you mentioned in patch log.
> >
> > As I mentioned in the description of this patch, we should stop
> > other CPUs ASAP to preserve current state either
> > crash_kexec_post_notifiers is enabled or not.
> > Then, all remaining procedures should work well
> > after stopping other CPUs (but keep the CPU map online).
> >
> > Vivek also mentioned similar things:
> > https://lkml.org/lkml/2015/7/14/433
> 
> The implementation in this patchset is different from suggestion in above 
> link?
> 
> I think Vivek's suggestion is a good idea, to drop smp_send_stop and do below:
> 
> stop_cpus_save_register_state;
> 
> if (!crash_kexec_post_notifiers)
>   crash_kexec()
> atomic_notifier_call_chain()
> kmsg_dump()
> 
> I'm just commenting from code flow point of view, the detail implementation
> definitely need more comments from Arch experts.
> 
> Any reason did not move the kdump friendly function to earlier point like
> before previous __crash_kexec() below?
> if (!crash_kexec_post_notifiers) {
> printk_nmi_flush_on_panic();
> __crash_kexec(NULL);
> }

The reason why the implementation differs from Vivek's is to keep
the current code flow if crash_kexec_post_notifiers is not specified.

If we apply Vivek's or your suggestion, it may always cause kdump
to fail on MIPS OCTEON due to Problem 1.  I don't want to make things
any worse.  I may post a patch for MIPS OCTEON, but I can't test it.
For other architectures, I'm not sure what problems there are.
So at first, I want to fix the case where crash_kexec_post_notifiers is
specified on x86.  Then, if all other architectures support
`stop other CPUs before crash_kexec', switch to your or Vivek's
suggesting code.

Is this acceptable?

Best regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



RE: Re: [V3 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version

2016-07-18 Thread 河合英宏 / KAWAIHIDEHIRO
Hi,

> From: 'Dave Young' [mailto:dyo...@redhat.com]
> Sent: Monday, July 18, 2016 6:02 PM
> On 07/15/16 at 11:50am, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > Hi Dave,
> >
> > Thanks for your reply.
> >
> > > From: 'Dave Young' [mailto:dyo...@redhat.com]
> > > Sent: Wednesday, July 13, 2016 11:04 AM
> > >
> > > On 07/12/16 at 02:49am, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > Hi Dave,
> > > >
> > > > Thanks for the comments.
> > > >
> > > > > From: Dave Young [mailto:dyo...@redhat.com]
> > > > > Sent: Monday, July 11, 2016 5:35 PM
> > > > >
> > > > > On 07/05/16 at 08:33pm, Hidehiro Kawai wrote:
[snip] 
> > > > > As for this patch I'm not sure it is safe to replace the
> > > > > smp_send_stop with the kdump friendly function. I'm also not sure if
> > > > > the kdump friendly function is safe for kdump. Will glad to hear
> > > > > opinions from other arch experts.
> > > >
> > > > This stuff depends on architectures, so I speak only about
> > > > x86 (the logic doesn't change on other architectures at this time).
> > > >
> > > > kdump path with crash_kexec_post_notifiers disabled:
> > > >  panic()
> > > >__crash_kexec()
> > > >  crash_setup_regs()
> > > >  crash_save_vmcoreinfo()
> > > >  machine_crash_shutdown()
> > > >native_machine_crash_shutdown()
> > > >  panic_smp_send_stop() /* mostly same as original
> > > > * kdump_nmi_shootdown_cpus()
> > > > */
> > > >
> > > > kdump path with crash_kexec_post_notifiers enabled:
> > > >  panic()
> > > >panic_smp_send_stop()
> > > >__crash_kexec()
> > > >  crash_setup_regs()
> > > >  crash_save_vmcoreinfo()
> > > >  machine_crash_shutdown()
> > > >native_machine_crash_shutdown()
> > > >  panic_smp_send_stop() // do nothing
> > > >
> > > > The difference is that stopping other CPUs before crash_setup_regs()
> > > > and crash_save_vmcoreinfo() or not.  Since crash_setup_regs() and
> > > > crash_save_vmcoreinfo() just save information to some memory area,
> > > > they wouldn't be affected by panic_smp_send_stop().  This means
> > > > placing panic_smp_send_stop before __crash_kexec is safe.
> > > >
> > > > BTW, I noticed my patch breaks Xen kernel.  I'll fix it in the next
> > > > version.
> > >
> > > But it does breaks stuff which depends on cpu not being disabled like 
> > > problem 1 you mentioned in patch log.
> >
> > As I mentioned in the description of this patch, we should stop
> > other CPUs ASAP to preserve current state either
> > crash_kexec_post_notifiers is enabled or not.
> > Then, all remaining procedures should work well
> > after stopping other CPUs (but keep the CPU map online).
> >
> > Vivek also mentioned similar things:
> > https://lkml.org/lkml/2015/7/14/433
> 
> The implementation in this patchset is different from suggestion in above 
> link?
> 
> I think Vivek's suggestion is a good idea, to drop smp_send_stop and do below:
> 
> stop_cpus_save_register_state;
> 
> if (!crash_kexec_post_notifiers)
>   crash_kexec()
> atomic_notifier_call_chain()
> kmsg_dump()
> 
> I'm just commenting from code flow point of view, the detail implementation
> definitely need more comments from Arch experts.
> 
> Any reason did not move the kdump friendly function to earlier point like
> before previous __crash_kexec() below?
> if (!crash_kexec_post_notifiers) {
> printk_nmi_flush_on_panic();
> __crash_kexec(NULL);
> }

The reason why the implementation differs from Vivek's is to keep
the current code flow if crash_kexec_post_notifiers is not specified.

If we apply Vivek's or your suggestion, it may always cause kdump
to fail on MIPS OCTEON due to Problem 1.  I don't want to make things
any worse.  I may post a patch for MIPS OCTEON, but I can't test it.
For other architectures, I'm not sure what problems there are.
So at first, I want to fix the case where crash_kexec_post_notifiers is
specified on x86.  Then, if all other architectures support
`stop other CPUs before crash_kexec', switch to your or Vivek's
suggesting code.

Is this acceptable?

Best regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



RE: Re: [V3 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version

2016-07-15 Thread 河合英宏 / KAWAIHIDEHIRO
Hi Dave,

Thanks for your reply.

> From: 'Dave Young' [mailto:dyo...@redhat.com]
> Sent: Wednesday, July 13, 2016 11:04 AM
> 
> On 07/12/16 at 02:49am, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > Hi Dave,
> >
> > Thanks for the comments.
> >
> > > From: Dave Young [mailto:dyo...@redhat.com]
> > > Sent: Monday, July 11, 2016 5:35 PM
> > >
> > > On 07/05/16 at 08:33pm, Hidehiro Kawai wrote:
> > > > This patch fixes one of the problems reported by Daniel Walker
> > > > (https://lkml.org/lkml/2015/6/24/44).
> > > >
> > > > If crash_kexec_post_notifiers boot option is specified, other CPUs
> > > > are stopped by smp_send_stop() instead of machine_crash_shutdown()
> > > > in crash_kexec() path.  This behavior change leads two problems.
> > > >
> > > >  Problem 1:
> > > >  octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs
> > > > are  still online and try to stop their watchdog timer.  If
> > > >  smp_send_stop() is called before octeon_generic_shutdown(),
> > > > stopping  watchdog timer will fail because other CPUs have been
> > > > offlined by  smp_send_stop().
> > > >
> > > >panic()
> > > >  if crash_kexec_post_notifiers == 1
> > > >smp_send_stop()
> > > >atomic_notifier_call_chain()
> > > >kmsg_dump()
> > > >  crash_kexec()
> > > >machine_crash_shutdown()
> > > >  octeon_generic_shutdown() // shutdown watchdog for ONLINE
> > > > CPUs
> > > >
> > > >  Problem 2:
> > > >  Most of architectures stop other CPUs in machine_crash_shutdown()
> > > > path, and they also do something needed for kdump.  For example,
> > > > they save registers, disable virtualization extensions, and so on.
> > > >  However, if smp_send_stop() stops other CPUs before
> > > > machine_crash_shutdown(), we miss those operations.
> > > >
> > > > How do we fix these problems?  In the first place, we should stop
> > > > other CPUs as soon as possible when panic() was called, otherwise
> > > > other CPUs may wipe out a clue to the cause of the failure.  So,
> > > > we replace smp_send_stop() with more suitable one for kdump.
> > >
> > > We have been avoiding extra things in panic path, but unfortunately
> > > crash_kexec_post_notifiers were added. I tend to agree the best
> > > place for this stuff is in 2nd kernel or purgatory instead of in 1st 
> > > kernel.
> >
> > Several months ago, I posted a patch set which writes regs to SEL,
> > generate an event to send SNMP message, and start/stop BMC's watchdog
> > timer in purgatory.  This feature requires BMC with KCS (Keyboard
> > Controller Style) I/F, but the most of enterprise grade server would have 
> > it.
> > (http://thread.gmane.org/gmane.linux.kernel.kexec/15382)
> >
> > Doing kmsg_dump things in purgatory wouldn't be suitable (should be
> > done in the 2nd kernel before enabling devices and IRQs?)
> 
> In theory it is doable maybe do something like oldmem_kmsg_dump while 
> /proc/vmcore initializing?

Hmm, I checked the case of using ACPI ERST as a persistent
storage. The feature is initialized by device_initcall():

device_initcall
  erst_init
pstore_register

And vmcore is initialized by fs_initcall() which is called
before device_initcall().  We may be able to change the sequence,
but anyway, these are done in later phase of the kernel initialization.
So, it may get less reliable although it would be doable.

> > > As for this patch I'm not sure it is safe to replace the
> > > smp_send_stop with the kdump friendly function. I'm also not sure if
> > > the kdump friendly function is safe for kdump. Will glad to hear
> > > opinions from other arch experts.
> >
> > This stuff depends on architectures, so I speak only about
> > x86 (the logic doesn't change on other architectures at this time).
> >
> > kdump path with crash_kexec_post_notifiers disabled:
> >  panic()
> >__crash_kexec()
> >  crash_setup_regs()
> >  crash_save_vmcoreinfo()
> >  machine_crash_shutdown()
> >native_machine_crash_shutdown()
> >  panic_smp_send_stop() /* mostly same as original
> > * kdump_nmi_shootdown_cpus()
> > */
> >
> > kdump path with crash_kexec_post_notifiers enabled:
> >  panic()
> >panic_smp_send_stop(

RE: Re: [V3 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version

2016-07-15 Thread 河合英宏 / KAWAIHIDEHIRO
Hi Dave,

Thanks for your reply.

> From: 'Dave Young' [mailto:dyo...@redhat.com]
> Sent: Wednesday, July 13, 2016 11:04 AM
> 
> On 07/12/16 at 02:49am, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > Hi Dave,
> >
> > Thanks for the comments.
> >
> > > From: Dave Young [mailto:dyo...@redhat.com]
> > > Sent: Monday, July 11, 2016 5:35 PM
> > >
> > > On 07/05/16 at 08:33pm, Hidehiro Kawai wrote:
> > > > This patch fixes one of the problems reported by Daniel Walker
> > > > (https://lkml.org/lkml/2015/6/24/44).
> > > >
> > > > If crash_kexec_post_notifiers boot option is specified, other CPUs
> > > > are stopped by smp_send_stop() instead of machine_crash_shutdown()
> > > > in crash_kexec() path.  This behavior change leads two problems.
> > > >
> > > >  Problem 1:
> > > >  octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs
> > > > are  still online and try to stop their watchdog timer.  If
> > > >  smp_send_stop() is called before octeon_generic_shutdown(),
> > > > stopping  watchdog timer will fail because other CPUs have been
> > > > offlined by  smp_send_stop().
> > > >
> > > >panic()
> > > >  if crash_kexec_post_notifiers == 1
> > > >smp_send_stop()
> > > >atomic_notifier_call_chain()
> > > >kmsg_dump()
> > > >  crash_kexec()
> > > >machine_crash_shutdown()
> > > >  octeon_generic_shutdown() // shutdown watchdog for ONLINE
> > > > CPUs
> > > >
> > > >  Problem 2:
> > > >  Most of architectures stop other CPUs in machine_crash_shutdown()
> > > > path, and they also do something needed for kdump.  For example,
> > > > they save registers, disable virtualization extensions, and so on.
> > > >  However, if smp_send_stop() stops other CPUs before
> > > > machine_crash_shutdown(), we miss those operations.
> > > >
> > > > How do we fix these problems?  In the first place, we should stop
> > > > other CPUs as soon as possible when panic() was called, otherwise
> > > > other CPUs may wipe out a clue to the cause of the failure.  So,
> > > > we replace smp_send_stop() with more suitable one for kdump.
> > >
> > > We have been avoiding extra things in panic path, but unfortunately
> > > crash_kexec_post_notifiers were added. I tend to agree the best
> > > place for this stuff is in 2nd kernel or purgatory instead of in 1st 
> > > kernel.
> >
> > Several months ago, I posted a patch set which writes regs to SEL,
> > generate an event to send SNMP message, and start/stop BMC's watchdog
> > timer in purgatory.  This feature requires BMC with KCS (Keyboard
> > Controller Style) I/F, but the most of enterprise grade server would have 
> > it.
> > (http://thread.gmane.org/gmane.linux.kernel.kexec/15382)
> >
> > Doing kmsg_dump things in purgatory wouldn't be suitable (should be
> > done in the 2nd kernel before enabling devices and IRQs?)
> 
> In theory it is doable maybe do something like oldmem_kmsg_dump while 
> /proc/vmcore initializing?

Hmm, I checked the case of using ACPI ERST as a persistent
storage. The feature is initialized by device_initcall():

device_initcall
  erst_init
pstore_register

And vmcore is initialized by fs_initcall() which is called
before device_initcall().  We may be able to change the sequence,
but anyway, these are done in later phase of the kernel initialization.
So, it may get less reliable although it would be doable.

> > > As for this patch I'm not sure it is safe to replace the
> > > smp_send_stop with the kdump friendly function. I'm also not sure if
> > > the kdump friendly function is safe for kdump. Will glad to hear
> > > opinions from other arch experts.
> >
> > This stuff depends on architectures, so I speak only about
> > x86 (the logic doesn't change on other architectures at this time).
> >
> > kdump path with crash_kexec_post_notifiers disabled:
> >  panic()
> >__crash_kexec()
> >  crash_setup_regs()
> >  crash_save_vmcoreinfo()
> >  machine_crash_shutdown()
> >native_machine_crash_shutdown()
> >  panic_smp_send_stop() /* mostly same as original
> > * kdump_nmi_shootdown_cpus()
> > */
> >
> > kdump path with crash_kexec_post_notifiers enabled:
> >  panic()
> >panic_smp_send_stop(

RE: Re: [V3 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version

2016-07-12 Thread 河合英宏 / KAWAIHIDEHIRO
> From: linux-kernel-ow...@vger.kernel.org 
> [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Xunlei Pang
> Sent: Tuesday, July 12, 2016 3:57 PM
> On 2016/07/12 at 11:56, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > Hi Xunlei,
> >
> > Thanks for the review.
> >
> >> From: Xunlei Pang [mailto:xp...@redhat.com]
> >> Sent: Tuesday, July 12, 2016 12:12 PM
> >> On 2016/07/05 at 19:33, Hidehiro Kawai wrote:
> >>> This patch fixes one of the problems reported by Daniel Walker
> >>> (https://lkml.org/lkml/2015/6/24/44).
> >>>
> >>> If crash_kexec_post_notifiers boot option is specified, other CPUs
> >>> are stopped by smp_send_stop() instead of machine_crash_shutdown()
> >>> in crash_kexec() path.  This behavior change leads two problems.
> >>>
> >>>  Problem 1:
> >>>  octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs are
> >>>  still online and try to stop their watchdog timer.  If
> >>>  smp_send_stop() is called before octeon_generic_shutdown(), stopping
> >>>  watchdog timer will fail because other CPUs have been offlined by
> >>>  smp_send_stop().
> >>>
> >>>panic()
> >>>  if crash_kexec_post_notifiers == 1
> >>>smp_send_stop()
> >>>atomic_notifier_call_chain()
> >>>kmsg_dump()
> >>>  crash_kexec()
> >>>machine_crash_shutdown()
> >>>  octeon_generic_shutdown() // shutdown watchdog for ONLINE CPUs
> >>>
> >>>  Problem 2:
> >>>  Most of architectures stop other CPUs in machine_crash_shutdown()
> >>>  path, and they also do something needed for kdump.  For example,
> >>>  they save registers, disable virtualization extensions, and so on.
> >>>  However, if smp_send_stop() stops other CPUs before
> >>>  machine_crash_shutdown(), we miss those operations.
> >>>
> >>> How do we fix these problems?  In the first place, we should stop
> >>> other CPUs as soon as possible when panic() was called, otherwise
> >>> other CPUs may wipe out a clue to the cause of the failure.  So, we
> >>> replace smp_send_stop() with more suitable one for kdump.
> >>>
> >>> This patch solves Problem 2 by replacing smp_send_stop() in panic()
> >>> with panic_smp_send_stop().  This is a weak function which calls
> >>> smp_send_stop(), and architecture dependent code may override this
> >>> with appropriate one.  This patch only provides x86-specific version.
> >>>
> >>> Changes in V3:
> >>> - Revise comments, description, and symbol names
> >>>
> >>> Changes in V2:
> >>> - Replace smp_send_stop() call with crash_kexec version which
> >>>   saves cpu states and cleans up VMX/SVM
> >>> - Drop a fix for Problem 1 at this moment
> >>>
> >>> Reported-by: Daniel Walker <dwal...@fifo99.com>
> >>> Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" 
> >>> option)
> >>> Signed-off-by: Hidehiro Kawai <hidehiro.kawai...@hitachi.com>
> >>> Cc: Dave Young <dyo...@redhat.com>
> >>> Cc: Baoquan He <b...@redhat.com>
> >>> Cc: Vivek Goyal <vgo...@redhat.com>
> >>> Cc: Eric Biederman <ebied...@xmission.com>
> >>> Cc: Masami Hiramatsu <mhira...@kernel.org>
> >>> Cc: Thomas Gleixner <t...@linutronix.de>
> >>> Cc: Ingo Molnar <mi...@redhat.com>
> >>> Cc: "H. Peter Anvin" <h...@zytor.com>
> >>> Cc: Borislav Petkov <b...@suse.de>
> >>> Cc: Toshi Kani <toshi.k...@hpe.com>
> >>> Cc: "Peter Zijlstra (Intel)" <pet...@infradead.org>
> >>> Cc: Takao Indoh <indou.ta...@jp.fujitsu.com>
> >>> Cc: "Lee, Chun-Yi" <joeyli.ker...@gmail.com>
> >>> Cc: Minfei Huang <mnfhu...@gmail.com>
> >>> Cc: Andrew Morton <a...@linux-foundation.org>
> >>> Cc: Michal Hocko <mho...@suse.com>
> >>> Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
> >>> Cc: Petr Mladek <pmla...@suse.com>
> >>> Cc: Tejun Heo <t...@kernel.org>
> >>> Cc: Josh Poimboeuf <jpoim...@redhat.com>
> >>> ---
> >>>  arch/x86/kernel/crash.c |   14 ++
> >>>  kernel/panic.c  |   43 
> >>> 

RE: Re: [V3 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version

2016-07-12 Thread 河合英宏 / KAWAIHIDEHIRO
> From: linux-kernel-ow...@vger.kernel.org 
> [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Xunlei Pang
> Sent: Tuesday, July 12, 2016 3:57 PM
> On 2016/07/12 at 11:56, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > Hi Xunlei,
> >
> > Thanks for the review.
> >
> >> From: Xunlei Pang [mailto:xp...@redhat.com]
> >> Sent: Tuesday, July 12, 2016 12:12 PM
> >> On 2016/07/05 at 19:33, Hidehiro Kawai wrote:
> >>> This patch fixes one of the problems reported by Daniel Walker
> >>> (https://lkml.org/lkml/2015/6/24/44).
> >>>
> >>> If crash_kexec_post_notifiers boot option is specified, other CPUs
> >>> are stopped by smp_send_stop() instead of machine_crash_shutdown()
> >>> in crash_kexec() path.  This behavior change leads two problems.
> >>>
> >>>  Problem 1:
> >>>  octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs are
> >>>  still online and try to stop their watchdog timer.  If
> >>>  smp_send_stop() is called before octeon_generic_shutdown(), stopping
> >>>  watchdog timer will fail because other CPUs have been offlined by
> >>>  smp_send_stop().
> >>>
> >>>panic()
> >>>  if crash_kexec_post_notifiers == 1
> >>>smp_send_stop()
> >>>atomic_notifier_call_chain()
> >>>kmsg_dump()
> >>>  crash_kexec()
> >>>machine_crash_shutdown()
> >>>  octeon_generic_shutdown() // shutdown watchdog for ONLINE CPUs
> >>>
> >>>  Problem 2:
> >>>  Most of architectures stop other CPUs in machine_crash_shutdown()
> >>>  path, and they also do something needed for kdump.  For example,
> >>>  they save registers, disable virtualization extensions, and so on.
> >>>  However, if smp_send_stop() stops other CPUs before
> >>>  machine_crash_shutdown(), we miss those operations.
> >>>
> >>> How do we fix these problems?  In the first place, we should stop
> >>> other CPUs as soon as possible when panic() was called, otherwise
> >>> other CPUs may wipe out a clue to the cause of the failure.  So, we
> >>> replace smp_send_stop() with more suitable one for kdump.
> >>>
> >>> This patch solves Problem 2 by replacing smp_send_stop() in panic()
> >>> with panic_smp_send_stop().  This is a weak function which calls
> >>> smp_send_stop(), and architecture dependent code may override this
> >>> with appropriate one.  This patch only provides x86-specific version.
> >>>
> >>> Changes in V3:
> >>> - Revise comments, description, and symbol names
> >>>
> >>> Changes in V2:
> >>> - Replace smp_send_stop() call with crash_kexec version which
> >>>   saves cpu states and cleans up VMX/SVM
> >>> - Drop a fix for Problem 1 at this moment
> >>>
> >>> Reported-by: Daniel Walker 
> >>> Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" 
> >>> option)
> >>> Signed-off-by: Hidehiro Kawai 
> >>> Cc: Dave Young 
> >>> Cc: Baoquan He 
> >>> Cc: Vivek Goyal 
> >>> Cc: Eric Biederman 
> >>> Cc: Masami Hiramatsu 
> >>> Cc: Thomas Gleixner 
> >>> Cc: Ingo Molnar 
> >>> Cc: "H. Peter Anvin" 
> >>> Cc: Borislav Petkov 
> >>> Cc: Toshi Kani 
> >>> Cc: "Peter Zijlstra (Intel)" 
> >>> Cc: Takao Indoh 
> >>> Cc: "Lee, Chun-Yi" 
> >>> Cc: Minfei Huang 
> >>> Cc: Andrew Morton 
> >>> Cc: Michal Hocko 
> >>> Cc: Vitaly Kuznetsov 
> >>> Cc: Petr Mladek 
> >>> Cc: Tejun Heo 
> >>> Cc: Josh Poimboeuf 
> >>> ---
> >>>  arch/x86/kernel/crash.c |   14 ++
> >>>  kernel/panic.c  |   43 
> >>> ---
> >>>  2 files changed, 42 insertions(+), 15 deletions(-)
> >>>
> >>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> >>> index 9ef978d..3305433 100644
> >>> --- a/arch/x86/kernel/crash.c
> >>> +++ b/arch/x86/kernel/crash.c
> >>> @@ -133,15 +133,21 @@ static void kdump_nmi_callback(int cpu, struct 
> >>> pt_regs *regs)
> >>>   disable_local_APIC();
> >>>  }
> >>>
> >>> -static void kdump_nmi_shootdown_cpus(void)
> >>> +/* Override the weak function in kernel/panic.c */
> >>> +void panic_smp_send_stop(void)
> >>>  {
> >>> - nmi_shootdown_cpus(kdump_nmi_callback);
> >>> + static int cpus_stopped;
> >> Should be atomic_t type?
> > panic_smp_send_stop() can be called by only one panicking CPU
> > (but can be called twice). It is sufficient to be normal variable.
> 
> There are other call sites of __crash_kexec() for oops cases, which can
> call panic_smp_send_stop() concurrently on a different cpu.

In oops cases, crash_kexec() is called first, then __crash_kexec() is
called.  crash_kexec() excludes concurrent execution of panic() and
crash_kexec() via panic_cpu, so panic_smp_send_stop() shouldn't be
called concurrently.

Best regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



RE: Re: [V3 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version

2016-07-11 Thread 河合英宏 / KAWAIHIDEHIRO
Hi Xunlei,

Thanks for the review.

> From: Xunlei Pang [mailto:xp...@redhat.com]
> Sent: Tuesday, July 12, 2016 12:12 PM
> On 2016/07/05 at 19:33, Hidehiro Kawai wrote:
> > This patch fixes one of the problems reported by Daniel Walker
> > (https://lkml.org/lkml/2015/6/24/44).
> >
> > If crash_kexec_post_notifiers boot option is specified, other CPUs
> > are stopped by smp_send_stop() instead of machine_crash_shutdown()
> > in crash_kexec() path.  This behavior change leads two problems.
> >
> >  Problem 1:
> >  octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs are
> >  still online and try to stop their watchdog timer.  If
> >  smp_send_stop() is called before octeon_generic_shutdown(), stopping
> >  watchdog timer will fail because other CPUs have been offlined by
> >  smp_send_stop().
> >
> >panic()
> >  if crash_kexec_post_notifiers == 1
> >smp_send_stop()
> >atomic_notifier_call_chain()
> >kmsg_dump()
> >  crash_kexec()
> >machine_crash_shutdown()
> >  octeon_generic_shutdown() // shutdown watchdog for ONLINE CPUs
> >
> >  Problem 2:
> >  Most of architectures stop other CPUs in machine_crash_shutdown()
> >  path, and they also do something needed for kdump.  For example,
> >  they save registers, disable virtualization extensions, and so on.
> >  However, if smp_send_stop() stops other CPUs before
> >  machine_crash_shutdown(), we miss those operations.
> >
> > How do we fix these problems?  In the first place, we should stop
> > other CPUs as soon as possible when panic() was called, otherwise
> > other CPUs may wipe out a clue to the cause of the failure.  So, we
> > replace smp_send_stop() with more suitable one for kdump.
> >
> > This patch solves Problem 2 by replacing smp_send_stop() in panic()
> > with panic_smp_send_stop().  This is a weak function which calls
> > smp_send_stop(), and architecture dependent code may override this
> > with appropriate one.  This patch only provides x86-specific version.
> >
> > Changes in V3:
> > - Revise comments, description, and symbol names
> >
> > Changes in V2:
> > - Replace smp_send_stop() call with crash_kexec version which
> >   saves cpu states and cleans up VMX/SVM
> > - Drop a fix for Problem 1 at this moment
> >
> > Reported-by: Daniel Walker 
> > Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" 
> > option)
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Dave Young 
> > Cc: Baoquan He 
> > Cc: Vivek Goyal 
> > Cc: Eric Biederman 
> > Cc: Masami Hiramatsu 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: "H. Peter Anvin" 
> > Cc: Borislav Petkov 
> > Cc: Toshi Kani 
> > Cc: "Peter Zijlstra (Intel)" 
> > Cc: Takao Indoh 
> > Cc: "Lee, Chun-Yi" 
> > Cc: Minfei Huang 
> > Cc: Andrew Morton 
> > Cc: Michal Hocko 
> > Cc: Vitaly Kuznetsov 
> > Cc: Petr Mladek 
> > Cc: Tejun Heo 
> > Cc: Josh Poimboeuf 
> > ---
> >  arch/x86/kernel/crash.c |   14 ++
> >  kernel/panic.c  |   43 ---
> >  2 files changed, 42 insertions(+), 15 deletions(-)
> >
> > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> > index 9ef978d..3305433 100644
> > --- a/arch/x86/kernel/crash.c
> > +++ b/arch/x86/kernel/crash.c
> > @@ -133,15 +133,21 @@ static void kdump_nmi_callback(int cpu, struct 
> > pt_regs *regs)
> > disable_local_APIC();
> >  }
> >
> > -static void kdump_nmi_shootdown_cpus(void)
> > +/* Override the weak function in kernel/panic.c */
> > +void panic_smp_send_stop(void)
> >  {
> > -   nmi_shootdown_cpus(kdump_nmi_callback);
> > +   static int cpus_stopped;
> 
> Should be atomic_t type?

panic_smp_send_stop() can be called by only one panicking CPU
(but can be called twice). It is sufficient to be normal variable.

> > +
> > +   if (cpus_stopped)
> > +   return;
> >
> > +   nmi_shootdown_cpus(kdump_nmi_callback);
> > disable_local_APIC();
> > +   cpus_stopped = 1;
> >  }
> >
> >  #else
> > -static void kdump_nmi_shootdown_cpus(void)
> > +void panic_smp_send_stop(void)
> >  {
> > /* There are no cpus to shootdown */
> >  }
> > @@ -160,7 +166,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
> > /* The kernel is broken so disable interrupts */
> > local_irq_disable();
> >
> > -   kdump_nmi_shootdown_cpus();
> > +   panic_smp_send_stop();
> >
> > /*
> >  * VMCLEAR VMCSs loaded on this cpu if needed.
> > diff --git a/kernel/panic.c b/kernel/panic.c
> > index 8aa7449..da8062d2 100644
> > --- a/kernel/panic.c
> > +++ 

RE: Re: [V3 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version

2016-07-11 Thread 河合英宏 / KAWAIHIDEHIRO
Hi Xunlei,

Thanks for the review.

> From: Xunlei Pang [mailto:xp...@redhat.com]
> Sent: Tuesday, July 12, 2016 12:12 PM
> On 2016/07/05 at 19:33, Hidehiro Kawai wrote:
> > This patch fixes one of the problems reported by Daniel Walker
> > (https://lkml.org/lkml/2015/6/24/44).
> >
> > If crash_kexec_post_notifiers boot option is specified, other CPUs
> > are stopped by smp_send_stop() instead of machine_crash_shutdown()
> > in crash_kexec() path.  This behavior change leads two problems.
> >
> >  Problem 1:
> >  octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs are
> >  still online and try to stop their watchdog timer.  If
> >  smp_send_stop() is called before octeon_generic_shutdown(), stopping
> >  watchdog timer will fail because other CPUs have been offlined by
> >  smp_send_stop().
> >
> >panic()
> >  if crash_kexec_post_notifiers == 1
> >smp_send_stop()
> >atomic_notifier_call_chain()
> >kmsg_dump()
> >  crash_kexec()
> >machine_crash_shutdown()
> >  octeon_generic_shutdown() // shutdown watchdog for ONLINE CPUs
> >
> >  Problem 2:
> >  Most of architectures stop other CPUs in machine_crash_shutdown()
> >  path, and they also do something needed for kdump.  For example,
> >  they save registers, disable virtualization extensions, and so on.
> >  However, if smp_send_stop() stops other CPUs before
> >  machine_crash_shutdown(), we miss those operations.
> >
> > How do we fix these problems?  In the first place, we should stop
> > other CPUs as soon as possible when panic() was called, otherwise
> > other CPUs may wipe out a clue to the cause of the failure.  So, we
> > replace smp_send_stop() with more suitable one for kdump.
> >
> > This patch solves Problem 2 by replacing smp_send_stop() in panic()
> > with panic_smp_send_stop().  This is a weak function which calls
> > smp_send_stop(), and architecture dependent code may override this
> > with appropriate one.  This patch only provides x86-specific version.
> >
> > Changes in V3:
> > - Revise comments, description, and symbol names
> >
> > Changes in V2:
> > - Replace smp_send_stop() call with crash_kexec version which
> >   saves cpu states and cleans up VMX/SVM
> > - Drop a fix for Problem 1 at this moment
> >
> > Reported-by: Daniel Walker 
> > Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" 
> > option)
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Dave Young 
> > Cc: Baoquan He 
> > Cc: Vivek Goyal 
> > Cc: Eric Biederman 
> > Cc: Masami Hiramatsu 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: "H. Peter Anvin" 
> > Cc: Borislav Petkov 
> > Cc: Toshi Kani 
> > Cc: "Peter Zijlstra (Intel)" 
> > Cc: Takao Indoh 
> > Cc: "Lee, Chun-Yi" 
> > Cc: Minfei Huang 
> > Cc: Andrew Morton 
> > Cc: Michal Hocko 
> > Cc: Vitaly Kuznetsov 
> > Cc: Petr Mladek 
> > Cc: Tejun Heo 
> > Cc: Josh Poimboeuf 
> > ---
> >  arch/x86/kernel/crash.c |   14 ++
> >  kernel/panic.c  |   43 ---
> >  2 files changed, 42 insertions(+), 15 deletions(-)
> >
> > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> > index 9ef978d..3305433 100644
> > --- a/arch/x86/kernel/crash.c
> > +++ b/arch/x86/kernel/crash.c
> > @@ -133,15 +133,21 @@ static void kdump_nmi_callback(int cpu, struct 
> > pt_regs *regs)
> > disable_local_APIC();
> >  }
> >
> > -static void kdump_nmi_shootdown_cpus(void)
> > +/* Override the weak function in kernel/panic.c */
> > +void panic_smp_send_stop(void)
> >  {
> > -   nmi_shootdown_cpus(kdump_nmi_callback);
> > +   static int cpus_stopped;
> 
> Should be atomic_t type?

panic_smp_send_stop() can be called by only one panicking CPU
(but can be called twice). It is sufficient to be normal variable.

> > +
> > +   if (cpus_stopped)
> > +   return;
> >
> > +   nmi_shootdown_cpus(kdump_nmi_callback);
> > disable_local_APIC();
> > +   cpus_stopped = 1;
> >  }
> >
> >  #else
> > -static void kdump_nmi_shootdown_cpus(void)
> > +void panic_smp_send_stop(void)
> >  {
> > /* There are no cpus to shootdown */
> >  }
> > @@ -160,7 +166,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
> > /* The kernel is broken so disable interrupts */
> > local_irq_disable();
> >
> > -   kdump_nmi_shootdown_cpus();
> > +   panic_smp_send_stop();
> >
> > /*
> >  * VMCLEAR VMCSs loaded on this cpu if needed.
> > diff --git a/kernel/panic.c b/kernel/panic.c
> > index 8aa7449..da8062d2 100644
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -71,6 +71,32 @@ void __weak nmi_panic_self_stop(struct pt_regs *regs)
> > panic_smp_self_stop();
> >  }
> >
> > +/*
> > + * Stop other CPUs in panic.  Architecture dependent code may override this
> > + * with more suitable version.  For example, if the architecture supports
> > + * crash dump, it should save registers of each stopped CPU and disable
> > + * per-CPU features such as virtualization extensions.
> > + */
> > 

RE: Re: [V3 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version

2016-07-11 Thread 河合英宏 / KAWAIHIDEHIRO
Hi Dave,

Thanks for the comments.

> From: Dave Young [mailto:dyo...@redhat.com]
> Sent: Monday, July 11, 2016 5:35 PM
> 
> On 07/05/16 at 08:33pm, Hidehiro Kawai wrote:
> > This patch fixes one of the problems reported by Daniel Walker
> > (https://lkml.org/lkml/2015/6/24/44).
> >
> > If crash_kexec_post_notifiers boot option is specified, other CPUs
> > are stopped by smp_send_stop() instead of machine_crash_shutdown()
> > in crash_kexec() path.  This behavior change leads two problems.
> >
> >  Problem 1:
> >  octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs are
> >  still online and try to stop their watchdog timer.  If
> >  smp_send_stop() is called before octeon_generic_shutdown(), stopping
> >  watchdog timer will fail because other CPUs have been offlined by
> >  smp_send_stop().
> >
> >panic()
> >  if crash_kexec_post_notifiers == 1
> >smp_send_stop()
> >atomic_notifier_call_chain()
> >kmsg_dump()
> >  crash_kexec()
> >machine_crash_shutdown()
> >  octeon_generic_shutdown() // shutdown watchdog for ONLINE CPUs
> >
> >  Problem 2:
> >  Most of architectures stop other CPUs in machine_crash_shutdown()
> >  path, and they also do something needed for kdump.  For example,
> >  they save registers, disable virtualization extensions, and so on.
> >  However, if smp_send_stop() stops other CPUs before
> >  machine_crash_shutdown(), we miss those operations.
> >
> > How do we fix these problems?  In the first place, we should stop
> > other CPUs as soon as possible when panic() was called, otherwise
> > other CPUs may wipe out a clue to the cause of the failure.  So, we
> > replace smp_send_stop() with more suitable one for kdump.
> 
> We have been avoiding extra things in panic path, but unfortunately
> crash_kexec_post_notifiers were added. I tend to agree the best place
> for this stuff is in 2nd kernel or purgatory instead of in 1st kernel.

Several months ago, I posted a patch set which writes regs to SEL, generate
an event to send SNMP message, and start/stop BMC's watchdog timer in
purgatory.  This feature requires BMC with KCS (Keyboard Controller Style)
I/F, but the most of enterprise grade server would have it.
(http://thread.gmane.org/gmane.linux.kernel.kexec/15382)

Doing kmsg_dump things in purgatory wouldn't be suitable (should be done
in the 2nd kernel before enabling devices and IRQs?)
 
> As for this patch I'm not sure it is safe to replace the smp_send_stop
> with the kdump friendly function. I'm also not sure if the kdump friendly
> function is safe for kdump. Will glad to hear opinions from other
> arch experts.

This stuff depends on architectures, so I speak only about
x86 (the logic doesn't change on other architectures at this time).

kdump path with crash_kexec_post_notifiers disabled:
 panic()
   __crash_kexec()
 crash_setup_regs()
 crash_save_vmcoreinfo()
 machine_crash_shutdown()
   native_machine_crash_shutdown()
 panic_smp_send_stop() /* mostly same as original 
* kdump_nmi_shootdown_cpus()
*/

kdump path with crash_kexec_post_notifiers enabled:
 panic()
   panic_smp_send_stop()
   __crash_kexec()
 crash_setup_regs()
 crash_save_vmcoreinfo()
 machine_crash_shutdown()
   native_machine_crash_shutdown()
 panic_smp_send_stop() // do nothing

The difference is that stopping other CPUs before crash_setup_regs()
and crash_save_vmcoreinfo() or not.  Since crash_setup_regs() and
crash_save_vmcoreinfo() just save information to some memory area, 
they wouldn't be affected by panic_smp_send_stop().  This means
placing panic_smp_send_stop before __crash_kexec is safe.

BTW, I noticed my patch breaks Xen kernel.  I'll fix it in the
next version.

> BTW, if one want to use crash_kexec_post_notifiers he should take the
> risk of unreliable kdump. How about only call smp_send_stop in case no
> crash_kexec_post_notifiers being used.

Unlike panic_smp_send_stop()/kdump_nmi_shootdown_cpus(), smp_send_stop()
for x86 tries to stop other CPUs with normal IPI before issuing NMI IPI.
This would be because NMI IPI has a risk of deadlock.  We checked if
the kdump path has a risk of deadlock in the case of NMI panic and fixed
it.  But I'm not sure about normal panic path.  I agree with that use
smp_send_stop if crash_kexec_post_notifiers or kdump is disabled.

> > This patch solves Problem 2 by replacing smp_send_stop() in panic()
> > with panic_smp_send_stop().  This is a weak function which calls
> > smp_send_stop(), and architecture dependent code may override this
> > with appropriate one.  This patch only provides x86-specific version.
> 
> It does not fix the Problem 1, it seem not possible to fix it?

Problem 1 depends on architectures, and at least it doesn't happen
on x86.  I can try to fix the Problem 1 for MIPS, but I can't test it.
Possible solution will be to use an smp_send_stop variant which stop
the CPU 

RE: Re: [V3 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version

2016-07-11 Thread 河合英宏 / KAWAIHIDEHIRO
Hi Dave,

Thanks for the comments.

> From: Dave Young [mailto:dyo...@redhat.com]
> Sent: Monday, July 11, 2016 5:35 PM
> 
> On 07/05/16 at 08:33pm, Hidehiro Kawai wrote:
> > This patch fixes one of the problems reported by Daniel Walker
> > (https://lkml.org/lkml/2015/6/24/44).
> >
> > If crash_kexec_post_notifiers boot option is specified, other CPUs
> > are stopped by smp_send_stop() instead of machine_crash_shutdown()
> > in crash_kexec() path.  This behavior change leads two problems.
> >
> >  Problem 1:
> >  octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs are
> >  still online and try to stop their watchdog timer.  If
> >  smp_send_stop() is called before octeon_generic_shutdown(), stopping
> >  watchdog timer will fail because other CPUs have been offlined by
> >  smp_send_stop().
> >
> >panic()
> >  if crash_kexec_post_notifiers == 1
> >smp_send_stop()
> >atomic_notifier_call_chain()
> >kmsg_dump()
> >  crash_kexec()
> >machine_crash_shutdown()
> >  octeon_generic_shutdown() // shutdown watchdog for ONLINE CPUs
> >
> >  Problem 2:
> >  Most of architectures stop other CPUs in machine_crash_shutdown()
> >  path, and they also do something needed for kdump.  For example,
> >  they save registers, disable virtualization extensions, and so on.
> >  However, if smp_send_stop() stops other CPUs before
> >  machine_crash_shutdown(), we miss those operations.
> >
> > How do we fix these problems?  In the first place, we should stop
> > other CPUs as soon as possible when panic() was called, otherwise
> > other CPUs may wipe out a clue to the cause of the failure.  So, we
> > replace smp_send_stop() with more suitable one for kdump.
> 
> We have been avoiding extra things in panic path, but unfortunately
> crash_kexec_post_notifiers were added. I tend to agree the best place
> for this stuff is in 2nd kernel or purgatory instead of in 1st kernel.

Several months ago, I posted a patch set which writes regs to SEL, generate
an event to send SNMP message, and start/stop BMC's watchdog timer in
purgatory.  This feature requires BMC with KCS (Keyboard Controller Style)
I/F, but the most of enterprise grade server would have it.
(http://thread.gmane.org/gmane.linux.kernel.kexec/15382)

Doing kmsg_dump things in purgatory wouldn't be suitable (should be done
in the 2nd kernel before enabling devices and IRQs?)
 
> As for this patch I'm not sure it is safe to replace the smp_send_stop
> with the kdump friendly function. I'm also not sure if the kdump friendly
> function is safe for kdump. Will glad to hear opinions from other
> arch experts.

This stuff depends on architectures, so I speak only about
x86 (the logic doesn't change on other architectures at this time).

kdump path with crash_kexec_post_notifiers disabled:
 panic()
   __crash_kexec()
 crash_setup_regs()
 crash_save_vmcoreinfo()
 machine_crash_shutdown()
   native_machine_crash_shutdown()
 panic_smp_send_stop() /* mostly same as original 
* kdump_nmi_shootdown_cpus()
*/

kdump path with crash_kexec_post_notifiers enabled:
 panic()
   panic_smp_send_stop()
   __crash_kexec()
 crash_setup_regs()
 crash_save_vmcoreinfo()
 machine_crash_shutdown()
   native_machine_crash_shutdown()
 panic_smp_send_stop() // do nothing

The difference is that stopping other CPUs before crash_setup_regs()
and crash_save_vmcoreinfo() or not.  Since crash_setup_regs() and
crash_save_vmcoreinfo() just save information to some memory area, 
they wouldn't be affected by panic_smp_send_stop().  This means
placing panic_smp_send_stop before __crash_kexec is safe.

BTW, I noticed my patch breaks Xen kernel.  I'll fix it in the
next version.

> BTW, if one want to use crash_kexec_post_notifiers he should take the
> risk of unreliable kdump. How about only call smp_send_stop in case no
> crash_kexec_post_notifiers being used.

Unlike panic_smp_send_stop()/kdump_nmi_shootdown_cpus(), smp_send_stop()
for x86 tries to stop other CPUs with normal IPI before issuing NMI IPI.
This would be because NMI IPI has a risk of deadlock.  We checked if
the kdump path has a risk of deadlock in the case of NMI panic and fixed
it.  But I'm not sure about normal panic path.  I agree with that use
smp_send_stop if crash_kexec_post_notifiers or kdump is disabled.

> > This patch solves Problem 2 by replacing smp_send_stop() in panic()
> > with panic_smp_send_stop().  This is a weak function which calls
> > smp_send_stop(), and architecture dependent code may override this
> > with appropriate one.  This patch only provides x86-specific version.
> 
> It does not fix the Problem 1, it seem not possible to fix it?

Problem 1 depends on architectures, and at least it doesn't happen
on x86.  I can try to fix the Problem 1 for MIPS, but I can't test it.
Possible solution will be to use an smp_send_stop variant which stop
the CPU 

RE: [Openipmi-developer] [v3 PATCH 3/3] hpwdt: Use nmi_panic() when kernel panics in NMI handler

2016-03-07 Thread 河合英宏 / KAWAIHIDEHIRO
> From: Phil Pokorny [mailto:ppoko...@penguincomputing.com]
> 
> While you are in drivers/watchdog/hpwdt.c replacing panic() with
> nmi_panic() can you also fix a few more of the broken strings?  The
> Linux style guide specfiically says NOT to break strings that are user
> visible because it breaks "grep".  So:

Ahh, yes.  I should have fixed this as well as the former case.
But this patch set has already been queued into -mm tree, so
I'll leave this patch if the fix is not important.

Regards,
Hidehiro Kawai

> 
> -   panic("An NMI occurred. Depending on your system the reason "
> +   nmi_panic(regs, "An NMI occurred. Depending on your system the reason 
> "
> "for the NMI is logged in any one of the following "
> "resources:\n"
> "1. Integrated Management Log (IML)\n"
> 
> Should probably be:
> 
> -   panic("An NMI occurred. Depending on your system the reason "
> -"for the NMI is logged in any one of the following "
> -"resources:\n"
> +   nmi_panic(regs, "An NMI occurred. Depending on your system the
> reason for the NMI is logged in any one of the following resources:\n"
> "1. Integrated Management Log (IML)\n"
> 
> So it's okay to break on "\n", but not in the middle of a line.
> 
> --
> Philip Pokorny, RHCE
> Chief Technology Officer
> PENGUIN COMPUTING, Inc
> www.penguincomputing.com
> 
> Changing the world through technical innovation


RE: [Openipmi-developer] [v3 PATCH 3/3] hpwdt: Use nmi_panic() when kernel panics in NMI handler

2016-03-07 Thread 河合英宏 / KAWAIHIDEHIRO
> From: Phil Pokorny [mailto:ppoko...@penguincomputing.com]
> 
> While you are in drivers/watchdog/hpwdt.c replacing panic() with
> nmi_panic() can you also fix a few more of the broken strings?  The
> Linux style guide specfiically says NOT to break strings that are user
> visible because it breaks "grep".  So:

Ahh, yes.  I should have fixed this as well as the former case.
But this patch set has already been queued into -mm tree, so
I'll leave this patch if the fix is not important.

Regards,
Hidehiro Kawai

> 
> -   panic("An NMI occurred. Depending on your system the reason "
> +   nmi_panic(regs, "An NMI occurred. Depending on your system the reason 
> "
> "for the NMI is logged in any one of the following "
> "resources:\n"
> "1. Integrated Management Log (IML)\n"
> 
> Should probably be:
> 
> -   panic("An NMI occurred. Depending on your system the reason "
> -"for the NMI is logged in any one of the following "
> -"resources:\n"
> +   nmi_panic(regs, "An NMI occurred. Depending on your system the
> reason for the NMI is logged in any one of the following resources:\n"
> "1. Integrated Management Log (IML)\n"
> 
> So it's okay to break on "\n", but not in the middle of a line.
> 
> --
> Philip Pokorny, RHCE
> Chief Technology Officer
> PENGUIN COMPUTING, Inc
> www.penguincomputing.com
> 
> Changing the world through technical innovation


RE: [v3 PATCH 1/3] panic: Change nmi_panic from macro to function

2016-03-07 Thread 河合英宏 / KAWAIHIDEHIRO
> From: Borislav Petkov [mailto:b...@alien8.de]
> On Thu, Mar 03, 2016 at 07:57:44PM +0900, Hidehiro Kawai wrote:
> > Change nmi_panic() macro to a normal function for the portability.
> 
> portability?

I wanted to say encapsulating things into a function makes modules
only have to know about the function. Modules don't need to know
all things in the macro.  But I thought again, and `portability'
was not appropriate.

However, this patch set has been queued into -mm now.  So I'll
leave this if not necessary.

Regards,
Hidehiro Kawai

> > Also, export it for modules.
> >
> > Changes since v2:
> > - Make nmi_panic receive a single string instead of printf style args
> >
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Andrew Morton 
> > Cc: Borislav Petkov 
> > Cc: Michal Nazarewicz 
> > Cc: Michal Hocko 
> > Cc: Rasmus Villemoes 
> > Cc: Nicolas Iooss 
> > Cc: Javi Merino 
> > Cc: Gobinda Charan Maji 
> > Cc: "Steven Rostedt (Red Hat)" 
> > Cc: Thomas Gleixner 
> > Cc: Vitaly Kuznetsov 
> > Cc: HATAYAMA Daisuke 
> > Cc: Tejun Heo 
> > ---
> >  include/linux/kernel.h |   21 +
> >  kernel/panic.c |   20 
> >  2 files changed, 21 insertions(+), 20 deletions(-)
> 
> Acked-by: Borislav Petkov 
> 
> --
> Regards/Gruss,
> Boris.
> 
> ECO tip #101: Trim your mails when you reply.


RE: [v3 PATCH 1/3] panic: Change nmi_panic from macro to function

2016-03-07 Thread 河合英宏 / KAWAIHIDEHIRO
> From: Borislav Petkov [mailto:b...@alien8.de]
> On Thu, Mar 03, 2016 at 07:57:44PM +0900, Hidehiro Kawai wrote:
> > Change nmi_panic() macro to a normal function for the portability.
> 
> portability?

I wanted to say encapsulating things into a function makes modules
only have to know about the function. Modules don't need to know
all things in the macro.  But I thought again, and `portability'
was not appropriate.

However, this patch set has been queued into -mm now.  So I'll
leave this if not necessary.

Regards,
Hidehiro Kawai

> > Also, export it for modules.
> >
> > Changes since v2:
> > - Make nmi_panic receive a single string instead of printf style args
> >
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Andrew Morton 
> > Cc: Borislav Petkov 
> > Cc: Michal Nazarewicz 
> > Cc: Michal Hocko 
> > Cc: Rasmus Villemoes 
> > Cc: Nicolas Iooss 
> > Cc: Javi Merino 
> > Cc: Gobinda Charan Maji 
> > Cc: "Steven Rostedt (Red Hat)" 
> > Cc: Thomas Gleixner 
> > Cc: Vitaly Kuznetsov 
> > Cc: HATAYAMA Daisuke 
> > Cc: Tejun Heo 
> > ---
> >  include/linux/kernel.h |   21 +
> >  kernel/panic.c |   20 
> >  2 files changed, 21 insertions(+), 20 deletions(-)
> 
> Acked-by: Borislav Petkov 
> 
> --
> Regards/Gruss,
> Boris.
> 
> ECO tip #101: Trim your mails when you reply.


RE: [v2 PATCH 1/3] panic: Change nmi_panic from macro to function

2016-03-02 Thread 河合英宏 / KAWAIHIDEHIRO
Hi,

> From: Borislav Petkov [mailto:b...@alien8.de]
> On Wed, Mar 02, 2016 at 02:18:24PM +0100, Michal Hocko wrote:
> > On Wed 02-03-16 19:36:26, Hidehiro Kawai wrote:
> > [...]
> > > +void nmi_panic(struct pt_regs *regs, const char *fmt, ...)
> >
> > Do we really need vargs? All the current users seem to be OK with a
> > simple string. This makes the code slightly more complicated without any
> > apparent reason.
> 
> I was just wondering the exactly same thing...
> 
> The contra-arg would be that in case someone wants to do nmi_panic()
> with more than a string, then it won't work.

It's not necessary to use vargs at this point, and passing a simple
string is OK for me.  Even if someone wants to use vargs, we can modify
nmi_panic() without any changes in caller side.  So, I'll remove it.

> The question is, does nmi_panic() even need to dump something more than
> regs and a string?

Hmm, printing regs, especially for RIP, would be useful for hang-up
cases, but I don't want to do much things in nmi_panic() as long as
it is already done by current callers.  nmi_panic() can be called
concurrently on multiple CPUs, so it also needs another serialization
to print something.

By the way, I have a patch set to safely leave important information
by kexec's purgatory...but no one is interested in?
http://thread.gmane.org/gmane.linux.kernel.kexec/15382

Regards,
--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group




RE: [v2 PATCH 1/3] panic: Change nmi_panic from macro to function

2016-03-02 Thread 河合英宏 / KAWAIHIDEHIRO
Hi,

> From: Borislav Petkov [mailto:b...@alien8.de]
> On Wed, Mar 02, 2016 at 02:18:24PM +0100, Michal Hocko wrote:
> > On Wed 02-03-16 19:36:26, Hidehiro Kawai wrote:
> > [...]
> > > +void nmi_panic(struct pt_regs *regs, const char *fmt, ...)
> >
> > Do we really need vargs? All the current users seem to be OK with a
> > simple string. This makes the code slightly more complicated without any
> > apparent reason.
> 
> I was just wondering the exactly same thing...
> 
> The contra-arg would be that in case someone wants to do nmi_panic()
> with more than a string, then it won't work.

It's not necessary to use vargs at this point, and passing a simple
string is OK for me.  Even if someone wants to use vargs, we can modify
nmi_panic() without any changes in caller side.  So, I'll remove it.

> The question is, does nmi_panic() even need to dump something more than
> regs and a string?

Hmm, printing regs, especially for RIP, would be useful for hang-up
cases, but I don't want to do much things in nmi_panic() as long as
it is already done by current callers.  nmi_panic() can be called
concurrently on multiple CPUs, so it also needs another serialization
to print something.

By the way, I have a patch set to safely leave important information
by kexec's purgatory...but no one is interested in?
http://thread.gmane.org/gmane.linux.kernel.kexec/15382

Regards,
--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group




RE: [PATCH 1/3] panic: Export panic_cpu and nmi_panic_self_stop

2016-02-29 Thread 河合英宏 / KAWAIHIDEHIRO
Hi Borislav,

> From: Borislav Petkov [mailto:b...@alien8.de]
> On Tue, Mar 01, 2016 at 10:50:37AM +0900, Hidehiro Kawai wrote:
> > Export panic_cpu and nmi_panic_self_stop symbols for modules which
> > use nmi_panic() macro.
> >
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Andrew Morton 
> > Cc: Borislav Petkov 
> > Cc: Thomas Gleixner 
> > Cc: Michal Hocko 
> > Cc: HATAYAMA Daisuke 
> > Cc: Vitaly Kuznetsov 
> > Cc: Tejun Heo 
> > ---
> >  kernel/panic.c |2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/kernel/panic.c b/kernel/panic.c
> > index d96469d..f4e8035 100644
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -69,8 +69,10 @@ void __weak nmi_panic_self_stop(struct pt_regs *regs)
> >  {
> > panic_smp_self_stop();
> >  }
> > +EXPORT_SYMBOL(nmi_panic_self_stop);
> >
> >  atomic_t panic_cpu = ATOMIC_INIT(PANIC_CPU_INVALID);
> > +EXPORT_SYMBOL(panic_cpu);
> 
> Can we make nmi_panic() at least a proper function and export that
> instead of exporting all those implementation details...?

The reason I implemented nmi_panic() as a macro is to pass
variable arguments directly to panic().  Fortunately, since all
invocations of nmi_panic() just pass a fixed string, I can change it
to a normal function like:

int nmi_panic(struct pt_regs *regs, const char *msg)
{
...
panic("%s", msg);

If people don't mind if that, I'll change it.

Regards,

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



RE: [PATCH 1/3] panic: Export panic_cpu and nmi_panic_self_stop

2016-02-29 Thread 河合英宏 / KAWAIHIDEHIRO
Hi Borislav,

> From: Borislav Petkov [mailto:b...@alien8.de]
> On Tue, Mar 01, 2016 at 10:50:37AM +0900, Hidehiro Kawai wrote:
> > Export panic_cpu and nmi_panic_self_stop symbols for modules which
> > use nmi_panic() macro.
> >
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Andrew Morton 
> > Cc: Borislav Petkov 
> > Cc: Thomas Gleixner 
> > Cc: Michal Hocko 
> > Cc: HATAYAMA Daisuke 
> > Cc: Vitaly Kuznetsov 
> > Cc: Tejun Heo 
> > ---
> >  kernel/panic.c |2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/kernel/panic.c b/kernel/panic.c
> > index d96469d..f4e8035 100644
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -69,8 +69,10 @@ void __weak nmi_panic_self_stop(struct pt_regs *regs)
> >  {
> > panic_smp_self_stop();
> >  }
> > +EXPORT_SYMBOL(nmi_panic_self_stop);
> >
> >  atomic_t panic_cpu = ATOMIC_INIT(PANIC_CPU_INVALID);
> > +EXPORT_SYMBOL(panic_cpu);
> 
> Can we make nmi_panic() at least a proper function and export that
> instead of exporting all those implementation details...?

The reason I implemented nmi_panic() as a macro is to pass
variable arguments directly to panic().  Fortunately, since all
invocations of nmi_panic() just pass a fixed string, I can change it
to a normal function like:

int nmi_panic(struct pt_regs *regs, const char *msg)
{
...
panic("%s", msg);

If people don't mind if that, I'll change it.

Regards,

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



RE: [PATCH 2/3] ipmi/watchdog: Use nmi_panic() when kernel panics in NMI handler

2016-02-29 Thread 河合英宏 / KAWAIHIDEHIRO
Hi Corey,

Thanks for the review.

> Sure, this is a good idea.
> 
> Acked-by: Corey Minyard 
> 
> Note that nmi_panic() came in commit 1717f2096b5 (panic, x86: Fix
> re-entrance problem due to panic on NMI) and then the regs field
> was added in the commit you reference.

Yes. So, I'll change the description to more proper one.

> Do you want me to add this to the IPMI queue or do you have another
> way to get this patch into the kernel?

I don't have another way, and I don't know how cross-subsystem
patch set should be handled.

I think it would be better this patch set is managed by one person
because both PATCH 2/3 and 3/3 depend on 1/3.

Thanks,

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group





RE: [PATCH 2/3] ipmi/watchdog: Use nmi_panic() when kernel panics in NMI handler

2016-02-29 Thread 河合英宏 / KAWAIHIDEHIRO
Hi Corey,

Thanks for the review.

> Sure, this is a good idea.
> 
> Acked-by: Corey Minyard 
> 
> Note that nmi_panic() came in commit 1717f2096b5 (panic, x86: Fix
> re-entrance problem due to panic on NMI) and then the regs field
> was added in the commit you reference.

Yes. So, I'll change the description to more proper one.

> Do you want me to add this to the IPMI queue or do you have another
> way to get this patch into the kernel?

I don't have another way, and I don't know how cross-subsystem
patch set should be handled.

I think it would be better this patch set is managed by one person
because both PATCH 2/3 and 3/3 depend on 1/3.

Thanks,

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group





RE: [PATCH 3/3] hpwdt: Use nmi_panic() when kernel panics in NMI handler

2016-02-29 Thread 河合英宏 / KAWAIHIDEHIRO
Hi Guenter,

Thanks for the review.

> On 02/29/2016 05:50 PM, Hidehiro Kawai wrote:
> > commit 58c5661f2144 ("panic, x86: Allow CPUs to save registers even
> > if looping in NMI context") introduced nmi_panic() which prevents
> > concurrent/recursive execution of panic().  It also saves registers
> > for the crash dump on x86.
> >
> > hpwdt driver can call panic() from NMI handler, so replace it with
> > nmi_panic().
> >
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Thomas Mingarelli 
> > Cc: Wim Van Sebroeck 
> > Cc: Guenter Roeck 
> > Cc: linux-watch...@vger.kernel.org
> > ---
> >   drivers/watchdog/hpwdt.c |   12 ++--
> >   1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
> > index 92443c3..fd0486f 100644
> > --- a/drivers/watchdog/hpwdt.c
> > +++ b/drivers/watchdog/hpwdt.c
> > @@ -496,11 +496,12 @@ static int hpwdt_pretimeout(unsigned int ulReason, 
> > struct pt_regs *regs)
> >
> > if (!is_icru && !is_uefi) {
> > if (cmn_regs.u1.ral == 0) {
> > -   panic("An NMI occurred, "
> > +   nmi_panic(regs, "An NMI occurred, "
> > "but unable to determine source.\n");
> 
> This message should really be in a single line.
> Sure, strictly speaking that should be done in a separate patch,
> but since you touch the line you might as well fix it.

OK, I'll combine them.

> > +   goto handled;
> 
> The goto is unnecessary. Just return NMI_HANDLED.

Sure, I'll do that.

> 
> > }
> > }
> > -   panic("An NMI occurred. Depending on your system the reason "
> > +   nmi_panic(regs, "An NMI occurred. Depending on your system the reason "
> > "for the NMI is logged in any one of the following "
> > "resources:\n"
> > "1. Integrated Management Log (IML)\n"
> > @@ -508,6 +509,13 @@ static int hpwdt_pretimeout(unsigned int ulReason, 
> > struct pt_regs *regs)
> > "3. OA Forward Progress Log\n"
> > "4. iLO Event Log");
> >
> > +handled:
> > +   /*
> > +* Returning from nmi_panic() means this CPU was processing panic()
> > +* before NMI.  Return NMI_HANDLED and go back to panic routines.
> > +*/
> 
> I don't think this comment adds much if any value.

So, I remove this comment.  NMI_HANDLED would be self-explanatory.

> > +   return NMI_HANDLED;
> > +
> >   out:
> 
> Any goto to this label does no longer serve a useful purpose (if it ever 
> served one),
> and should be replaced with "return NMI_DONE;"

I'll fix it.  It will become a bit simpler.

Thanks,

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group





RE: [PATCH 3/3] hpwdt: Use nmi_panic() when kernel panics in NMI handler

2016-02-29 Thread 河合英宏 / KAWAIHIDEHIRO
Hi Guenter,

Thanks for the review.

> On 02/29/2016 05:50 PM, Hidehiro Kawai wrote:
> > commit 58c5661f2144 ("panic, x86: Allow CPUs to save registers even
> > if looping in NMI context") introduced nmi_panic() which prevents
> > concurrent/recursive execution of panic().  It also saves registers
> > for the crash dump on x86.
> >
> > hpwdt driver can call panic() from NMI handler, so replace it with
> > nmi_panic().
> >
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Thomas Mingarelli 
> > Cc: Wim Van Sebroeck 
> > Cc: Guenter Roeck 
> > Cc: linux-watch...@vger.kernel.org
> > ---
> >   drivers/watchdog/hpwdt.c |   12 ++--
> >   1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
> > index 92443c3..fd0486f 100644
> > --- a/drivers/watchdog/hpwdt.c
> > +++ b/drivers/watchdog/hpwdt.c
> > @@ -496,11 +496,12 @@ static int hpwdt_pretimeout(unsigned int ulReason, 
> > struct pt_regs *regs)
> >
> > if (!is_icru && !is_uefi) {
> > if (cmn_regs.u1.ral == 0) {
> > -   panic("An NMI occurred, "
> > +   nmi_panic(regs, "An NMI occurred, "
> > "but unable to determine source.\n");
> 
> This message should really be in a single line.
> Sure, strictly speaking that should be done in a separate patch,
> but since you touch the line you might as well fix it.

OK, I'll combine them.

> > +   goto handled;
> 
> The goto is unnecessary. Just return NMI_HANDLED.

Sure, I'll do that.

> 
> > }
> > }
> > -   panic("An NMI occurred. Depending on your system the reason "
> > +   nmi_panic(regs, "An NMI occurred. Depending on your system the reason "
> > "for the NMI is logged in any one of the following "
> > "resources:\n"
> > "1. Integrated Management Log (IML)\n"
> > @@ -508,6 +509,13 @@ static int hpwdt_pretimeout(unsigned int ulReason, 
> > struct pt_regs *regs)
> > "3. OA Forward Progress Log\n"
> > "4. iLO Event Log");
> >
> > +handled:
> > +   /*
> > +* Returning from nmi_panic() means this CPU was processing panic()
> > +* before NMI.  Return NMI_HANDLED and go back to panic routines.
> > +*/
> 
> I don't think this comment adds much if any value.

So, I remove this comment.  NMI_HANDLED would be self-explanatory.

> > +   return NMI_HANDLED;
> > +
> >   out:
> 
> Any goto to this label does no longer serve a useful purpose (if it ever 
> served one),
> and should be replaced with "return NMI_DONE;"

I'll fix it.  It will become a bit simpler.

Thanks,

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group





RE: Re: [V6 PATCH 1/6] panic/x86: Fix re-entrance problem due to panic on NMI

2015-12-10 Thread 河合英宏 / KAWAIHIDEHIRO
> From: Borislav Petkov [mailto:b...@alien8.de]
[...]
> Looks better.
> 
> Did some comments cleanup and nmi_panic() macro reformatting so that it
> is more readable and ended up applying this:

Thanks a lot!

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group

> ---
> From: Hidehiro Kawai 
> Date: Thu, 10 Dec 2015 10:46:26 +0900
> Subject: [PATCH] panic, x86: Fix re-entrance problem due to panic on NMI
> 
> If panic on NMI happens just after panic() on the same CPU, panic() is
> recursively called. Kernel stalls, as a result, after failing to acquire
> panic_lock.
> 
> To avoid this problem, don't call panic() in NMI context if we've
> already entered panic().
> 
> For that, introduce nmi_panic() macro to reduce code duplication. In
> the case of panic on NMI, don't return from NMI handlers if another CPU
> already panicked.
> 
> Signed-off-by: Hidehiro Kawai 
> Acked-by: Michal Hocko 
> Cc: Aaron Tomlin 
> Cc: Andrew Morton 
> Cc: Andy Lutomirski 
> Cc: Baoquan He 
> Cc: Chris Metcalf 
> Cc: David Hildenbrand 
> Cc: Don Zickus 
> Cc: "Eric W. Biederman" 
> Cc: Frederic Weisbecker 
> Cc: Gobinda Charan Maji 
> Cc: HATAYAMA Daisuke 
> Cc: "H. Peter Anvin" 
> Cc: Ingo Molnar 
> Cc: Javi Merino 
> Cc: Jonathan Corbet 
> Cc: ke...@lists.infradead.org
> Cc: linux-...@vger.kernel.org
> Cc: lkml 
> Cc: Masami Hiramatsu 
> Cc: Michal Nazarewicz 
> Cc: Nicolas Iooss 
> Cc: Peter Zijlstra 
> Cc: Prarit Bhargava 
> Cc: Rasmus Villemoes 
> Cc: Rusty Russell 
> Cc: Seth Jennings 
> Cc: Steven Rostedt 
> Cc: Thomas Gleixner 
> Cc: Ulrich Obergfell 
> Cc: Vitaly Kuznetsov 
> Cc: Vivek Goyal 
> Link: http://lkml.kernel.org/r/20151210014626.25437.13302.stgit@softrs
> [ Cleanup comments, fixup formatting. ]
> Signed-off-by: Borislav Petkov 
> ---
>  arch/x86/kernel/nmi.c  | 16 
>  include/linux/kernel.h | 20 
>  kernel/panic.c | 16 +---
>  kernel/watchdog.c  |  2 +-
>  4 files changed, 46 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> index 697f90db0e37..292a24bd0553 100644
> --- a/arch/x86/kernel/nmi.c
> +++ b/arch/x86/kernel/nmi.c
> @@ -231,7 +231,7 @@ pci_serr_error(unsigned char reason, struct pt_regs *regs)
>  #endif
> 
>   if (panic_on_unrecovered_nmi)
> - panic("NMI: Not continuing");
> + nmi_panic("NMI: Not continuing");
> 
>   pr_emerg("Dazed and confused, but trying to continue\n");
> 
> @@ -255,8 +255,16 @@ io_check_error(unsigned char reason, struct pt_regs 
> *regs)
>reason, smp_processor_id());
>   show_regs(regs);
> 
> - if (panic_on_io_nmi)
> - panic("NMI IOCK error: Not continuing");
> + if (panic_on_io_nmi) {
> + nmi_panic("NMI IOCK error: Not continuing");
> +
> + /*
> +  * If end up here, it means we have received an NMI while
> +  * processing panic(). Simply return without delaying and
> +  * re-enabling NMI.
> +  */
> + return;
> + }
> 
>   /* Re-enable the IOCK line, wait for a few seconds */
>   reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_IOCHK;
> @@ -297,7 +305,7 @@ unknown_nmi_error(unsigned char reason, struct pt_regs 
> *regs)
> 
>   pr_emerg("Do you have a strange power saving mode enabled?\n");
>   if (unknown_nmi_panic || panic_on_unrecovered_nmi)
> - panic("NMI: Not continuing");
> + nmi_panic("NMI: Not continuing");
> 
>   pr_emerg("Dazed and confused, but trying to continue\n");
>  }
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 350dfb08aee3..750cc5c7c999 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -446,6 +446,26 @@ extern int sysctl_panic_on_stackoverflow;
>  extern bool crash_kexec_post_notifiers;
> 
>  /*
> + * panic_cpu is used for synchronizing panic() and crash_kexec() execution. 
> It
> + * holds a CPU number which is executing panic() currently. A value of
> + * PANIC_CPU_INVALID means no CPU has entered panic() or crash_kexec().
> + */
> +extern atomic_t panic_cpu;
> +#define PANIC_CPU_INVALID-1
> +
> +/*
> + * A variant of panic() called from NMI context. We return if we've already
> + * panicked on this CPU.
> + */
> +#define nmi_panic(fmt, ...)  \
> +do { \
> + int cpu = raw_smp_processor_id();   \
> + \
> + if (atomic_cmpxchg(_cpu, PANIC_CPU_INVALID, cpu) != cpu)  \
> + panic(fmt, ##__VA_ARGS__);  \
> +} while (0)
> +
> +/*
>   * Only to be used by arch init code. If the user over-wrote the default
>   * CONFIG_PANIC_TIMEOUT, honor it.
>   */
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 4b150bc0c6c1..3344524cf6ff 100644
> --- 

RE: Re: [V6 PATCH 1/6] panic/x86: Fix re-entrance problem due to panic on NMI

2015-12-10 Thread 河合英宏 / KAWAIHIDEHIRO
> From: Borislav Petkov [mailto:b...@alien8.de]
[...]
> Looks better.
> 
> Did some comments cleanup and nmi_panic() macro reformatting so that it
> is more readable and ended up applying this:

Thanks a lot!

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group

> ---
> From: Hidehiro Kawai 
> Date: Thu, 10 Dec 2015 10:46:26 +0900
> Subject: [PATCH] panic, x86: Fix re-entrance problem due to panic on NMI
> 
> If panic on NMI happens just after panic() on the same CPU, panic() is
> recursively called. Kernel stalls, as a result, after failing to acquire
> panic_lock.
> 
> To avoid this problem, don't call panic() in NMI context if we've
> already entered panic().
> 
> For that, introduce nmi_panic() macro to reduce code duplication. In
> the case of panic on NMI, don't return from NMI handlers if another CPU
> already panicked.
> 
> Signed-off-by: Hidehiro Kawai 
> Acked-by: Michal Hocko 
> Cc: Aaron Tomlin 
> Cc: Andrew Morton 
> Cc: Andy Lutomirski 
> Cc: Baoquan He 
> Cc: Chris Metcalf 
> Cc: David Hildenbrand 
> Cc: Don Zickus 
> Cc: "Eric W. Biederman" 
> Cc: Frederic Weisbecker 
> Cc: Gobinda Charan Maji 
> Cc: HATAYAMA Daisuke 
> Cc: "H. Peter Anvin" 
> Cc: Ingo Molnar 
> Cc: Javi Merino 
> Cc: Jonathan Corbet 
> Cc: ke...@lists.infradead.org
> Cc: linux-...@vger.kernel.org
> Cc: lkml 
> Cc: Masami Hiramatsu 
> Cc: Michal Nazarewicz 
> Cc: Nicolas Iooss 
> Cc: Peter Zijlstra 
> Cc: Prarit Bhargava 
> Cc: Rasmus Villemoes 
> Cc: Rusty Russell 
> Cc: Seth Jennings 
> Cc: Steven Rostedt 
> Cc: Thomas Gleixner 
> Cc: Ulrich Obergfell 
> Cc: Vitaly Kuznetsov 
> Cc: Vivek Goyal 
> Link: http://lkml.kernel.org/r/20151210014626.25437.13302.stgit@softrs
> [ Cleanup comments, fixup formatting. ]
> Signed-off-by: Borislav Petkov 
> ---
>  arch/x86/kernel/nmi.c  | 16 
>  include/linux/kernel.h | 20 
>  kernel/panic.c | 16 +---
>  kernel/watchdog.c  |  2 +-
>  4 files changed, 46 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> index 697f90db0e37..292a24bd0553 100644
> --- a/arch/x86/kernel/nmi.c
> +++ b/arch/x86/kernel/nmi.c
> @@ -231,7 +231,7 @@ pci_serr_error(unsigned char reason, struct pt_regs *regs)
>  #endif
> 
>   if (panic_on_unrecovered_nmi)
> - panic("NMI: Not continuing");
> + nmi_panic("NMI: Not continuing");
> 
>   pr_emerg("Dazed and confused, but trying to continue\n");
> 
> @@ -255,8 +255,16 @@ io_check_error(unsigned char reason, struct pt_regs 
> *regs)
>reason, smp_processor_id());
>   show_regs(regs);
> 
> - if (panic_on_io_nmi)
> - panic("NMI IOCK error: Not continuing");
> + if (panic_on_io_nmi) {
> + nmi_panic("NMI IOCK error: Not continuing");
> +
> + /*
> +  * If end up here, it means we have received an NMI while
> +  * processing panic(). Simply return without delaying and
> +  * re-enabling NMI.
> +  */
> + return;
> + }
> 
>   /* Re-enable the IOCK line, wait for a few seconds */
>   reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_IOCHK;
> @@ -297,7 +305,7 @@ unknown_nmi_error(unsigned char reason, struct pt_regs 
> *regs)
> 
>   pr_emerg("Do you have a strange power saving mode enabled?\n");
>   if (unknown_nmi_panic || panic_on_unrecovered_nmi)
> - panic("NMI: Not continuing");
> + nmi_panic("NMI: Not continuing");
> 
>   pr_emerg("Dazed and confused, but trying to continue\n");
>  }
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 350dfb08aee3..750cc5c7c999 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -446,6 +446,26 @@ extern int sysctl_panic_on_stackoverflow;
>  extern bool crash_kexec_post_notifiers;
> 
>  /*
> + * panic_cpu is used for synchronizing panic() and crash_kexec() execution. 
> It
> + * holds a CPU number which is executing panic() currently. A value of
> + * PANIC_CPU_INVALID means no CPU has entered panic() or crash_kexec().
> + */
> +extern atomic_t panic_cpu;
> +#define PANIC_CPU_INVALID-1
> +
> +/*
> + * A variant of panic() called from NMI context. We return if 

RE: Re: [V6 PATCH 5/6] x86/nmi: Fix to save registers for crash dump on external NMI broadcast

2015-12-09 Thread 河合英宏 / KAWAIHIDEHIRO
Hi Steven,

> From: Steven Rostedt [mailto:rost...@goodmis.org]
> On Tue, Nov 24, 2015 at 11:48:53AM +0100, Borislav Petkov wrote:
> > > +  */
> > > + while (!raw_spin_trylock(_reason_lock))
> > > + poll_crash_ipi_and_callback(regs);
> >
> > Waaait a minute: so if we're getting NMIs broadcasted on every core but
> > we're *not* crash dumping, we will run into here too. This can't be
> > right. :-\
> 
> This only does something if crash_ipi_done is set, which means you are killing
> the box. But perhaps a comment that states that here would be useful, or maybe
> just put in the check here. There's no need to make it depend on SMP, as
> raw_spin_trylock() will turn to just ({1}) for UP, and that code wont even be
> hit.

It seems that poll_crash_ipi_and_callback (now renamed to 
run_crash_ipi_callback)
is referred for UP if CONFIG_DEBUG_SPINLOCK=y, and it causes a build error
as below.  run_crash_ipi_callback refers crash_ipi_issued and 
crash_nmi_callback,
which are defined only if CONFIG_SMP=y.  So we need to defined it separately
for SMP and UP.

I'll resend this patch later.

> Hi Hidehiro,
> 
> [auto build test ERROR on v4.4-rc4]
> [also build test ERROR on next-20151209]
> [cannot apply to tip/x86/core]
> 
> url:
> https://github.com/0day-ci/linux/commits/Hidehiro-Kawai/Fix-race-issues-among-panic-NMI-and-crash_kexec/20151210-095
> 254
> config: x86_64-randconfig-s4-12101030 (attached as .config)
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64
> 
> All errors (new ones prefixed by >>):
> 
>arch/x86/built-in.o: In function `do_nmi':
> >> (.text+0x7339): undefined reference to `run_crash_ipi_callback'
> 
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: Re: [V6 PATCH 5/6] x86/nmi: Fix to save registers for crash dump on external NMI broadcast

2015-12-09 Thread 河合英宏 / KAWAIHIDEHIRO
Hi Steven,

> From: Steven Rostedt [mailto:rost...@goodmis.org]
> On Tue, Nov 24, 2015 at 11:48:53AM +0100, Borislav Petkov wrote:
> > > +  */
> > > + while (!raw_spin_trylock(_reason_lock))
> > > + poll_crash_ipi_and_callback(regs);
> >
> > Waaait a minute: so if we're getting NMIs broadcasted on every core but
> > we're *not* crash dumping, we will run into here too. This can't be
> > right. :-\
> 
> This only does something if crash_ipi_done is set, which means you are killing
> the box. But perhaps a comment that states that here would be useful, or maybe
> just put in the check here. There's no need to make it depend on SMP, as
> raw_spin_trylock() will turn to just ({1}) for UP, and that code wont even be
> hit.

It seems that poll_crash_ipi_and_callback (now renamed to 
run_crash_ipi_callback)
is referred for UP if CONFIG_DEBUG_SPINLOCK=y, and it causes a build error
as below.  run_crash_ipi_callback refers crash_ipi_issued and 
crash_nmi_callback,
which are defined only if CONFIG_SMP=y.  So we need to defined it separately
for SMP and UP.

I'll resend this patch later.

> Hi Hidehiro,
> 
> [auto build test ERROR on v4.4-rc4]
> [also build test ERROR on next-20151209]
> [cannot apply to tip/x86/core]
> 
> url:
> https://github.com/0day-ci/linux/commits/Hidehiro-Kawai/Fix-race-issues-among-panic-NMI-and-crash_kexec/20151210-095
> 254
> config: x86_64-randconfig-s4-12101030 (attached as .config)
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64
> 
> All errors (new ones prefixed by >>):
> 
>arch/x86/built-in.o: In function `do_nmi':
> >> (.text+0x7339): undefined reference to `run_crash_ipi_callback'
> 
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [V5 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-12-03 Thread 河合英宏 / KAWAIHIDEHIRO
> On Thu, Dec 03, 2015 at 02:01:38AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > On Wed, Dec 02, 2015 at 11:57:38AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > We can do so, but I think resetting panic_cpu always would be
> > > > simpler and safer.
> >
> > I'll state in detail.
> >
> > When we call crash_kexec() without entering panic() and return from
> > it, panic() should be called eventually.
> 
> Huh, the call chain is
> 
> panic->crash_kexec
> 
> Or do you mean, when crash_kexec() is not called by panic() but by some
> of its other callers?

I was arguing about the case of oops_end --> crash_kexec
--> return from crash_kexec because of !kexec_crash_image -->
panic.

In the case of panic --> __crash_kexec, __crash_kexec is called
only once, so we don't need to check the return value of __crash_kexec
as you suggested.  So I thought you stated about crash_kexec --> panic
case.

> > But the code paths are a bit complicated and there are many
> > implementations for each architecture. So one day, this assumption may
> > be broken; the CPU doesn't call panic(). Or the CPU may fail to call
> > panic() because we are already in insane state. It would be nervous,
> > but allowing another CPU to process panic routines by resetting
> > panic_cpu is safer approach.
> 
> My suggestion was to do this only on the panic path - not necessarily on
> the others.
> 
> > Since this code is executed only once due to panic_cpu,
> > I think introducing this logic is not much valuable.
> > Also, current implementation is already quite simple:
> >
> > panic()
> > {
> > ...
> > __crash_kexec(NULL) {
> > if (mutex_trylock(_mutex)) {
> > if (kexec_crash_image) {
> > /* don't return */
> > }
> 
> I don't mean the kexec_crash_image case - I mean the opposite one:
> !kexec_crash_image.

I also mentioned !kexec_crash_image case...

> And I think I know now what you're trying to tell
> me: the first CPU which hits panic, will finish panic eventually and so
> it will take down the machine.

No.  The first CPU calls panic, and then it calls __crash_kexec.
Because of !kexec_crash_image, it returns from __crash_kexec and
continues to the panic procedure.  At the same time, another CPU
tries to call panic(), but it doesn't run the panic procedure;
panic_cpu prevents the second CPU from running it.
This means __crash_kexec is called only once even if we don't
check the return value of __crash_kexec.
(Please note that crash_kexec can be called multiple times in the
case of oops_end() --> crash_kexec().)

I'm sorry I couldn't tell my thought well.

Regards,
--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group




RE: [V5 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-12-03 Thread 河合英宏 / KAWAIHIDEHIRO
> On Thu, Dec 03, 2015 at 02:01:38AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > On Wed, Dec 02, 2015 at 11:57:38AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > We can do so, but I think resetting panic_cpu always would be
> > > > simpler and safer.
> >
> > I'll state in detail.
> >
> > When we call crash_kexec() without entering panic() and return from
> > it, panic() should be called eventually.
> 
> Huh, the call chain is
> 
> panic->crash_kexec
> 
> Or do you mean, when crash_kexec() is not called by panic() but by some
> of its other callers?

I was arguing about the case of oops_end --> crash_kexec
--> return from crash_kexec because of !kexec_crash_image -->
panic.

In the case of panic --> __crash_kexec, __crash_kexec is called
only once, so we don't need to check the return value of __crash_kexec
as you suggested.  So I thought you stated about crash_kexec --> panic
case.

> > But the code paths are a bit complicated and there are many
> > implementations for each architecture. So one day, this assumption may
> > be broken; the CPU doesn't call panic(). Or the CPU may fail to call
> > panic() because we are already in insane state. It would be nervous,
> > but allowing another CPU to process panic routines by resetting
> > panic_cpu is safer approach.
> 
> My suggestion was to do this only on the panic path - not necessarily on
> the others.
> 
> > Since this code is executed only once due to panic_cpu,
> > I think introducing this logic is not much valuable.
> > Also, current implementation is already quite simple:
> >
> > panic()
> > {
> > ...
> > __crash_kexec(NULL) {
> > if (mutex_trylock(_mutex)) {
> > if (kexec_crash_image) {
> > /* don't return */
> > }
> 
> I don't mean the kexec_crash_image case - I mean the opposite one:
> !kexec_crash_image.

I also mentioned !kexec_crash_image case...

> And I think I know now what you're trying to tell
> me: the first CPU which hits panic, will finish panic eventually and so
> it will take down the machine.

No.  The first CPU calls panic, and then it calls __crash_kexec.
Because of !kexec_crash_image, it returns from __crash_kexec and
continues to the panic procedure.  At the same time, another CPU
tries to call panic(), but it doesn't run the panic procedure;
panic_cpu prevents the second CPU from running it.
This means __crash_kexec is called only once even if we don't
check the return value of __crash_kexec.
(Please note that crash_kexec can be called multiple times in the
case of oops_end() --> crash_kexec().)

I'm sorry I couldn't tell my thought well.

Regards,
--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group




RE: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-12-02 Thread 河合英宏 / KAWAIHIDEHIRO
> @@ -357,7 +358,15 @@ static void default_do_nmi(struct pt_regs *regs)
>   }
> 
>   /* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
> - raw_spin_lock(_reason_lock);
> +
> + /*
> +  * Another CPU may be processing panic routines with holding
> +  * nmi_reason_lock.  Check IPI issuance from the panicking CPU
> +  * and call the callback directly.
> +  */
> + while (!raw_spin_trylock(_reason_lock))
> + poll_crash_ipi_and_callback(regs);
> +
>   reason = x86_platform.get_nmi_reason();

I noticed this logic is unneeded until applying PATCH 4/4.
Currently, unknown NMI can be broadcast to all CPUs, but in that case,
panic()/nmi_panic() are called after releasing nmi_reason_lock.
So CPUs can't loop infinitely here.

PATCH 4/4 allows us to broadcast external NMIs to all CPUs, and it
causes infinite loop in raw_spin_lock(_reason_lock).  So the above
changes are needed.

I'll move these chagnes to a later patch in the next version.

Thanks,

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group





RE: [V5 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-12-02 Thread 河合英宏 / KAWAIHIDEHIRO
> On Wed, Dec 02, 2015 at 11:57:38AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > We can do so, but I think resetting panic_cpu always would be
> > simpler and safer.

I'll state in detail.

When we call crash_kexec() without entering panic() and return from
it, panic() should be called eventually.  But the code paths are
a bit complicated and there are many implementations for each
architecture.  So one day, this assumption may be broken; the CPU
doesn't call panic().  Or the CPU may fail to call panic() because
we are already in insane state.  It would be nervous, but allowing
another CPU to process panic routines by resetting panic_cpu
is safer approach.

> Well, I think executing code needlessly *especially* at panic time is
> not all that rosy either.
> 
> Besides something like this:
> 
>   static bool kexec_failed;
> 
>   ...
> 
> if (crash_kexec_post_notifiers && !kexec_failed)
>   kexec_failed = __crash_kexec(NULL);
> 
> is as simple as it gets.

Since this code is executed only once due to panic_cpu,
I think introducing this logic is not much valuable.
Also, current implementation is already quite simple:

panic()
{
...
__crash_kexec(NULL) {
if (mutex_trylock(_mutex)) {
if (kexec_crash_image) {
/* don't return */
}
}
mutex_unlock(_mutex)
}

How do you think?

Regards,
--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group




RE: [V5 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-12-02 Thread 河合英宏 / KAWAIHIDEHIRO
Hello Borislav,

Sorry, I haven't replied to this mail yet.

> On Fri, Nov 20, 2015 at 06:36:48PM +0900, Hidehiro Kawai wrote:
...
> > +void crash_kexec(struct pt_regs *regs)
> > +{
> > +   int old_cpu, this_cpu;
> > +
> > +   /*
> > +* Only one CPU is allowed to execute the crash_kexec() code as with
> > +* panic().  Otherwise parallel calls of panic() and crash_kexec()
> > +* may stop each other.  To exclude them, we use panic_cpu here too.
> > +*/
> > +   this_cpu = raw_smp_processor_id();
> > +   old_cpu = atomic_cmpxchg(_cpu, -1, this_cpu);
> > +   if (old_cpu == -1) {
> > +   /* This is the 1st CPU which comes here, so go ahead. */
> > +   __crash_kexec(regs);
> > +
> > +   /*
> > +* Reset panic_cpu to allow another panic()/crash_kexec()
> > +* call.
> 
> So can we make __crash_kexec() return error values?
> 
> * failed to grab kexec_mutex -> reset panic_cpu
> 
> * no kexec_crash_image -> no need to reset it, all future crash_kexec()
> calls won't work so no need to run into that path anymore. However, this could
> be problematic if we want the other CPUs to panic. Do we care?
> 
> * machine_kexec successful -> doesn't matter

We can do so, but I think resetting panic_cpu always would be
simpler and safer.

Although checking kexec_crash_image each time is pointless, it
doesn't cause any actual problem.

Regards,

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



RE: [V5 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-12-02 Thread 河合英宏 / KAWAIHIDEHIRO
Hello Borislav,

Sorry, I haven't replied to this mail yet.

> On Fri, Nov 20, 2015 at 06:36:48PM +0900, Hidehiro Kawai wrote:
...
> > +void crash_kexec(struct pt_regs *regs)
> > +{
> > +   int old_cpu, this_cpu;
> > +
> > +   /*
> > +* Only one CPU is allowed to execute the crash_kexec() code as with
> > +* panic().  Otherwise parallel calls of panic() and crash_kexec()
> > +* may stop each other.  To exclude them, we use panic_cpu here too.
> > +*/
> > +   this_cpu = raw_smp_processor_id();
> > +   old_cpu = atomic_cmpxchg(_cpu, -1, this_cpu);
> > +   if (old_cpu == -1) {
> > +   /* This is the 1st CPU which comes here, so go ahead. */
> > +   __crash_kexec(regs);
> > +
> > +   /*
> > +* Reset panic_cpu to allow another panic()/crash_kexec()
> > +* call.
> 
> So can we make __crash_kexec() return error values?
> 
> * failed to grab kexec_mutex -> reset panic_cpu
> 
> * no kexec_crash_image -> no need to reset it, all future crash_kexec()
> calls won't work so no need to run into that path anymore. However, this could
> be problematic if we want the other CPUs to panic. Do we care?
> 
> * machine_kexec successful -> doesn't matter

We can do so, but I think resetting panic_cpu always would be
simpler and safer.

Although checking kexec_crash_image each time is pointless, it
doesn't cause any actual problem.

Regards,

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



RE: [V5 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-12-02 Thread 河合英宏 / KAWAIHIDEHIRO
> On Wed, Dec 02, 2015 at 11:57:38AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > We can do so, but I think resetting panic_cpu always would be
> > simpler and safer.

I'll state in detail.

When we call crash_kexec() without entering panic() and return from
it, panic() should be called eventually.  But the code paths are
a bit complicated and there are many implementations for each
architecture.  So one day, this assumption may be broken; the CPU
doesn't call panic().  Or the CPU may fail to call panic() because
we are already in insane state.  It would be nervous, but allowing
another CPU to process panic routines by resetting panic_cpu
is safer approach.

> Well, I think executing code needlessly *especially* at panic time is
> not all that rosy either.
> 
> Besides something like this:
> 
>   static bool kexec_failed;
> 
>   ...
> 
> if (crash_kexec_post_notifiers && !kexec_failed)
>   kexec_failed = __crash_kexec(NULL);
> 
> is as simple as it gets.

Since this code is executed only once due to panic_cpu,
I think introducing this logic is not much valuable.
Also, current implementation is already quite simple:

panic()
{
...
__crash_kexec(NULL) {
if (mutex_trylock(_mutex)) {
if (kexec_crash_image) {
/* don't return */
}
}
mutex_unlock(_mutex)
}

How do you think?

Regards,
--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group




RE: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-12-02 Thread 河合英宏 / KAWAIHIDEHIRO
> @@ -357,7 +358,15 @@ static void default_do_nmi(struct pt_regs *regs)
>   }
> 
>   /* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
> - raw_spin_lock(_reason_lock);
> +
> + /*
> +  * Another CPU may be processing panic routines with holding
> +  * nmi_reason_lock.  Check IPI issuance from the panicking CPU
> +  * and call the callback directly.
> +  */
> + while (!raw_spin_trylock(_reason_lock))
> + poll_crash_ipi_and_callback(regs);
> +
>   reason = x86_platform.get_nmi_reason();

I noticed this logic is unneeded until applying PATCH 4/4.
Currently, unknown NMI can be broadcast to all CPUs, but in that case,
panic()/nmi_panic() are called after releasing nmi_reason_lock.
So CPUs can't loop infinitely here.

PATCH 4/4 allows us to broadcast external NMIs to all CPUs, and it
causes infinite loop in raw_spin_lock(_reason_lock).  So the above
changes are needed.

I'll move these chagnes to a later patch in the next version.

Thanks,

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group





RE: [V5 PATCH 4/4] x86/apic: Introduce apic_extnmi boot option

2015-11-25 Thread 河合英宏 / KAWAIHIDEHIRO
> On Fri, Nov 20, 2015 at 06:36:50PM +0900, Hidehiro Kawai wrote:
> > This patch introduces new boot option, apic_extnmi:
> >
> >  apic_extnmi={ bsp | all | none}
> >
> > The default value is "bsp" and this is the current behavior; only
> > BSP receives external NMI.  "all" allows external NMIs to be
> > broadcast to all CPUs.  This would raise the success rate of panic
> > on NMI when BSP hangs up in NMI context or the external NMI is
> > swallowed by other NMI handlers on BSP.  If you specified "none",
> > any CPUs don't receive external NMIs.  This is useful for dump
> > capture kernel so that it wouldn't be shot down while saving a
> > crash dump.
> >
> > V5:
> > - Rename the option from "noextnmi" to "apic_extnmi"
> > - Add apic_extnmi=all feature
> > - Fix the wrong documentation about "noextnmi" (apic_extnmi=none)
> >
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: "H. Peter Anvin" 
> > Cc: Jonathan Corbet 
> > ---
> >  Documentation/kernel-parameters.txt |9 +
> >  arch/x86/include/asm/apic.h |5 +
> >  arch/x86/kernel/apic/apic.c |   31 ++-
> >  3 files changed, 44 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/kernel-parameters.txt 
> > b/Documentation/kernel-parameters.txt
> > index f8aae63..ceed3bc 100644
> > --- a/Documentation/kernel-parameters.txt
> > +++ b/Documentation/kernel-parameters.txt
> > @@ -472,6 +472,15 @@ bytes respectively. Such letter suffixes can also be 
> > entirely omitted.
> > Change the amount of debugging information output
> > when initialising the APIC and IO-APIC components.
> >
> > +   apic_extnmi=[APIC,X86] External NMI delivery setting
> > +   Format: { bsp (default) | all | none }
> > +   bsp:  External NMI is delivered to only CPU 0
> 
>   only to

Thanks for tha correction.

> 
> > +   all:  External NMIs are broadcast to all CPUs as a
> > + backup of CPU 0
> > +   none: External NMI is masked for all CPUs. This is
> > + useful so that a dump capture kernel won't be
> > + shot down by NMI
> > +
> > autoconf=   [IPV6]
> > See Documentation/networking/ipv6.txt.
> >
> > diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
> > index 7f62ad4..c80f6b6 100644
> > --- a/arch/x86/include/asm/apic.h
> > +++ b/arch/x86/include/asm/apic.h
> > @@ -23,6 +23,11 @@
> >  #define APIC_VERBOSE 1
> >  #define APIC_DEBUG   2
> >
> > +/* Macros for apic_extnmi which controls external NMI masking */
> > +#define APIC_EXTNMI_BSP0 /* Default */
> > +#define APIC_EXTNMI_ALL1
> > +#define APIC_EXTNMI_NONE   2
> > +
> >  /*
> >   * Define the default level of output to be very little
> >   * This can be turned up by using apic=verbose for more
> > diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> > index 2f69e3b..a2a8074 100644
> > --- a/arch/x86/kernel/apic/apic.c
> > +++ b/arch/x86/kernel/apic/apic.c
> > @@ -82,6 +82,12 @@ physid_mask_t phys_cpu_present_map;
> >  static unsigned int disabled_cpu_apicid __read_mostly = BAD_APICID;
> >
> >  /*
> > + * This variable controls which CPUs receive external NMIs.  By default,
> > + * external NMIs are delivered to only BSP.
> 
> only to the BSP.

...and again.

> 
> > + */
> > +static int apic_extnmi = APIC_EXTNMI_BSP;
> > +
> > +/*
> >   * Map cpu index to physical APIC ID
> >   */
> >  DEFINE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_cpu_to_apicid, BAD_APICID);
> > @@ -1161,6 +1167,8 @@ void __init init_bsp_APIC(void)
> > value = APIC_DM_NMI;
> > if (!lapic_is_integrated()) /* 82489DX */
> > value |= APIC_LVT_LEVEL_TRIGGER;
> > +   if (apic_extnmi == APIC_EXTNMI_NONE)
> > +   value |= APIC_LVT_MASKED;
> > apic_write(APIC_LVT1, value);
> >  }
> >
> > @@ -1380,7 +1388,8 @@ void setup_local_APIC(void)
> > /*
> >  * only the BP should see the LINT1 NMI signal, obviously.
> >  */
> 
> That comment needs adjusting.

OK, I'll do that.

> 
> > -   if (!cpu)
> > +   if ((!cpu && apic_extnmi != APIC_EXTNMI_NONE) ||
> > +   apic_extnmi == APIC_EXTNMI_ALL)
> > value = APIC_DM_NMI;
> > else
> > value = APIC_DM_NMI | APIC_LVT_MASKED;
> > @@ -2548,3 +2557,23 @@ static int __init apic_set_disabled_cpu_apicid(char 
> > *arg)
> > return 0;
> >  }
> >  early_param("disable_cpu_apicid", apic_set_disabled_cpu_apicid);
> > +
> > +static int __init apic_set_extnmi(char *arg)
> > +{
> > +   if (!arg)
> > +   return -EINVAL;
> > +
> > +   if (strcmp("all", arg) == 0)
> 
>   if (!strncmp("all", arg, 3))
> 
> ditto for the rest

I'll fix them.

> > +   apic_extnmi = APIC_EXTNMI_ALL;
> > +   else if 

RE: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-11-25 Thread 河合英宏 / KAWAIHIDEHIRO
> On Wed, Nov 25, 2015 at 09:46:37AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
...
> > I prefer this, but we might want to add some more prefix or suffix.
> > For example, "conditionally_run_crash_nmi_callback".
> 
> That's unnecessary IMO. If you need to express that, you could write
> that in a comment above the function definition. Anyone who looks at
> the code then will know that it is conditional, like so many other
> kernel functions. :)

OK, I'll use a simple one with a comment.

Regards,

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group




RE: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-11-25 Thread 河合英宏 / KAWAIHIDEHIRO
> On Wed, Nov 25, 2015 at 05:51:59AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > `Infinite loop in NMI context' can happen:
> > > >
> > > >   a. when a cpu panics on NMI while another cpu is processing panic
> > > >   b. when a cpu received an external or unknown NMI while another
> > > >  cpu is processing panic on NMI
> > > >
> > > > In the case of a, it loops in panic_smp_self_stop().  In the case
> > > > of b, it loops in raw_spin_lock() of nmi_reason_lock.
> > >
> > > Please describe those two cases more verbosely - it takes slow people
> > > like me a while to figure out what exactly can happen.
> >
> >   a. when a cpu panics on NMI while another cpu is processing panic
> >  Ex.
> >  CPU 0 CPU 1
> >  = =
> >  panic()
> >panic_cpu <-- 0
> >check panic_cpu
> >crash_kexec()
> >receive an unknown NMI
> >unknown_nmi_error()
> >  nmi_panic()
> >panic()
> >  check panic_cpu
> >  panic_smp_self_stop()
> >infinite loop in NMI context
> >
> >   b. when a cpu received an external or unknown NMI while another
> >  cpu is processing panic on NMI
> >  Ex.
> >  CPU 0 CPU 1
> >  ====
> >  receive an unknown NMI
> >  unknown_nmi_error()
> >nmi_panic() receive an unknown NMI
> >  panic_cpu <-- 0   unknown_nmi_error()
> >  panic() nmi_panic()
> >check panic_cpu panic
> >crash_kexec() check panic_cpu
> >  panic_smp_self_stop()
> >infinite loop in NMI context
> 
> Ok, that's what I saw too, thanks for confirming.
> 
> But please write those examples with the *old* code in the commit
> message, i.e. without panic_cpu and nmi_panic() which you're adding.

Does *old* code mean the code without this patch *series* ?
panic_cpu and nmi_panic() are introduced by PATCH 1/4, not this patch.

> Basically, you want to structure your commit message this way:
> 
> This is the problem the current code has: ...
> 
> But we need to do this: ...
> 
> We fix it by doing that: ...

Good suggestion!  I'll revise a bit with following your comment.

> > > > + * directly.  This function is used when we have already been in NMI 
> > > > handler.
> > > > + */
> > > > +void poll_crash_ipi_and_callback(struct pt_regs *regs)
> > >
> > > Why "poll"? We won't return from crash_nmi_callback() if we're not the
> > > crashing CPU.
> >
> > This function polls that crash IPI has been issued by checking
> > crash_ipi_done, then invokes the callback.  This is different
> > from so-called "poll" functions.  Do you have some good name?
> 
> Maybe something as simple as "run_crash_callback"?

I prefer this, but we might want to add some more prefix or suffix.
For example, "conditionally_run_crash_nmi_callback".

> Or since we're calling it from other places, maybe add the "crash"
> prefix:
> 
>   while (!raw_spin_trylock(_reason_lock))
>   crash_run_callback(regs);
> 
> Looks even better to me in code context. :)

Thanks for your deep review!

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [V5 PATCH 4/4] x86/apic: Introduce apic_extnmi boot option

2015-11-25 Thread 河合英宏 / KAWAIHIDEHIRO
> On Fri, Nov 20, 2015 at 06:36:50PM +0900, Hidehiro Kawai wrote:
> > This patch introduces new boot option, apic_extnmi:
> >
> >  apic_extnmi={ bsp | all | none}
> >
> > The default value is "bsp" and this is the current behavior; only
> > BSP receives external NMI.  "all" allows external NMIs to be
> > broadcast to all CPUs.  This would raise the success rate of panic
> > on NMI when BSP hangs up in NMI context or the external NMI is
> > swallowed by other NMI handlers on BSP.  If you specified "none",
> > any CPUs don't receive external NMIs.  This is useful for dump
> > capture kernel so that it wouldn't be shot down while saving a
> > crash dump.
> >
> > V5:
> > - Rename the option from "noextnmi" to "apic_extnmi"
> > - Add apic_extnmi=all feature
> > - Fix the wrong documentation about "noextnmi" (apic_extnmi=none)
> >
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: "H. Peter Anvin" 
> > Cc: Jonathan Corbet 
> > ---
> >  Documentation/kernel-parameters.txt |9 +
> >  arch/x86/include/asm/apic.h |5 +
> >  arch/x86/kernel/apic/apic.c |   31 ++-
> >  3 files changed, 44 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/kernel-parameters.txt 
> > b/Documentation/kernel-parameters.txt
> > index f8aae63..ceed3bc 100644
> > --- a/Documentation/kernel-parameters.txt
> > +++ b/Documentation/kernel-parameters.txt
> > @@ -472,6 +472,15 @@ bytes respectively. Such letter suffixes can also be 
> > entirely omitted.
> > Change the amount of debugging information output
> > when initialising the APIC and IO-APIC components.
> >
> > +   apic_extnmi=[APIC,X86] External NMI delivery setting
> > +   Format: { bsp (default) | all | none }
> > +   bsp:  External NMI is delivered to only CPU 0
> 
>   only to

Thanks for tha correction.

> 
> > +   all:  External NMIs are broadcast to all CPUs as a
> > + backup of CPU 0
> > +   none: External NMI is masked for all CPUs. This is
> > + useful so that a dump capture kernel won't be
> > + shot down by NMI
> > +
> > autoconf=   [IPV6]
> > See Documentation/networking/ipv6.txt.
> >
> > diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
> > index 7f62ad4..c80f6b6 100644
> > --- a/arch/x86/include/asm/apic.h
> > +++ b/arch/x86/include/asm/apic.h
> > @@ -23,6 +23,11 @@
> >  #define APIC_VERBOSE 1
> >  #define APIC_DEBUG   2
> >
> > +/* Macros for apic_extnmi which controls external NMI masking */
> > +#define APIC_EXTNMI_BSP0 /* Default */
> > +#define APIC_EXTNMI_ALL1
> > +#define APIC_EXTNMI_NONE   2
> > +
> >  /*
> >   * Define the default level of output to be very little
> >   * This can be turned up by using apic=verbose for more
> > diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> > index 2f69e3b..a2a8074 100644
> > --- a/arch/x86/kernel/apic/apic.c
> > +++ b/arch/x86/kernel/apic/apic.c
> > @@ -82,6 +82,12 @@ physid_mask_t phys_cpu_present_map;
> >  static unsigned int disabled_cpu_apicid __read_mostly = BAD_APICID;
> >
> >  /*
> > + * This variable controls which CPUs receive external NMIs.  By default,
> > + * external NMIs are delivered to only BSP.
> 
> only to the BSP.

...and again.

> 
> > + */
> > +static int apic_extnmi = APIC_EXTNMI_BSP;
> > +
> > +/*
> >   * Map cpu index to physical APIC ID
> >   */
> >  DEFINE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_cpu_to_apicid, BAD_APICID);
> > @@ -1161,6 +1167,8 @@ void __init init_bsp_APIC(void)
> > value = APIC_DM_NMI;
> > if (!lapic_is_integrated()) /* 82489DX */
> > value |= APIC_LVT_LEVEL_TRIGGER;
> > +   if (apic_extnmi == APIC_EXTNMI_NONE)
> > +   value |= APIC_LVT_MASKED;
> > apic_write(APIC_LVT1, value);
> >  }
> >
> > @@ -1380,7 +1388,8 @@ void setup_local_APIC(void)
> > /*
> >  * only the BP should see the LINT1 NMI signal, obviously.
> >  */
> 
> That comment needs adjusting.

OK, I'll do that.

> 
> > -   if (!cpu)
> > +   if ((!cpu && apic_extnmi != APIC_EXTNMI_NONE) ||
> > +   apic_extnmi == APIC_EXTNMI_ALL)
> > value = APIC_DM_NMI;
> > else
> > value = APIC_DM_NMI | APIC_LVT_MASKED;
> > @@ -2548,3 +2557,23 @@ static int __init apic_set_disabled_cpu_apicid(char 
> > *arg)
> > return 0;
> >  }
> >  early_param("disable_cpu_apicid", apic_set_disabled_cpu_apicid);
> > +
> > +static int __init apic_set_extnmi(char *arg)
> > +{
> > +   if (!arg)
> > +   return -EINVAL;
> > +
> > +   if (strcmp("all", arg) == 0)
> 
>   if (!strncmp("all", arg, 3))
> 

RE: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-11-25 Thread 河合英宏 / KAWAIHIDEHIRO
> On Wed, Nov 25, 2015 at 09:46:37AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
...
> > I prefer this, but we might want to add some more prefix or suffix.
> > For example, "conditionally_run_crash_nmi_callback".
> 
> That's unnecessary IMO. If you need to express that, you could write
> that in a comment above the function definition. Anyone who looks at
> the code then will know that it is conditional, like so many other
> kernel functions. :)

OK, I'll use a simple one with a comment.

Regards,

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group




RE: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-11-25 Thread 河合英宏 / KAWAIHIDEHIRO
> On Wed, Nov 25, 2015 at 05:51:59AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > `Infinite loop in NMI context' can happen:
> > > >
> > > >   a. when a cpu panics on NMI while another cpu is processing panic
> > > >   b. when a cpu received an external or unknown NMI while another
> > > >  cpu is processing panic on NMI
> > > >
> > > > In the case of a, it loops in panic_smp_self_stop().  In the case
> > > > of b, it loops in raw_spin_lock() of nmi_reason_lock.
> > >
> > > Please describe those two cases more verbosely - it takes slow people
> > > like me a while to figure out what exactly can happen.
> >
> >   a. when a cpu panics on NMI while another cpu is processing panic
> >  Ex.
> >  CPU 0 CPU 1
> >  = =
> >  panic()
> >panic_cpu <-- 0
> >check panic_cpu
> >crash_kexec()
> >receive an unknown NMI
> >unknown_nmi_error()
> >  nmi_panic()
> >panic()
> >  check panic_cpu
> >  panic_smp_self_stop()
> >infinite loop in NMI context
> >
> >   b. when a cpu received an external or unknown NMI while another
> >  cpu is processing panic on NMI
> >  Ex.
> >  CPU 0 CPU 1
> >  ====
> >  receive an unknown NMI
> >  unknown_nmi_error()
> >nmi_panic() receive an unknown NMI
> >  panic_cpu <-- 0   unknown_nmi_error()
> >  panic() nmi_panic()
> >check panic_cpu panic
> >crash_kexec() check panic_cpu
> >  panic_smp_self_stop()
> >infinite loop in NMI context
> 
> Ok, that's what I saw too, thanks for confirming.
> 
> But please write those examples with the *old* code in the commit
> message, i.e. without panic_cpu and nmi_panic() which you're adding.

Does *old* code mean the code without this patch *series* ?
panic_cpu and nmi_panic() are introduced by PATCH 1/4, not this patch.

> Basically, you want to structure your commit message this way:
> 
> This is the problem the current code has: ...
> 
> But we need to do this: ...
> 
> We fix it by doing that: ...

Good suggestion!  I'll revise a bit with following your comment.

> > > > + * directly.  This function is used when we have already been in NMI 
> > > > handler.
> > > > + */
> > > > +void poll_crash_ipi_and_callback(struct pt_regs *regs)
> > >
> > > Why "poll"? We won't return from crash_nmi_callback() if we're not the
> > > crashing CPU.
> >
> > This function polls that crash IPI has been issued by checking
> > crash_ipi_done, then invokes the callback.  This is different
> > from so-called "poll" functions.  Do you have some good name?
> 
> Maybe something as simple as "run_crash_callback"?

I prefer this, but we might want to add some more prefix or suffix.
For example, "conditionally_run_crash_nmi_callback".

> Or since we're calling it from other places, maybe add the "crash"
> prefix:
> 
>   while (!raw_spin_trylock(_reason_lock))
>   crash_run_callback(regs);
> 
> Looks even better to me in code context. :)

Thanks for your deep review!

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [V5 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-11-24 Thread 河合英宏 / KAWAIHIDEHIRO
> On Fri, Nov 20, 2015 at 06:36:48PM +0900, Hidehiro Kawai wrote:
> > Currently, panic() and crash_kexec() can be called at the same time.
> > For example (x86 case):
> >
> > CPU 0:
> >   oops_end()
> > crash_kexec()
> >   mutex_trylock() // acquired
> > nmi_shootdown_cpus() // stop other cpus
> >
> > CPU 1:
> >   panic()
> > crash_kexec()
> >   mutex_trylock() // failed to acquire
> > smp_send_stop() // stop other cpus
> > infinite loop
> >
> > If CPU 1 calls smp_send_stop() before nmi_shootdown_cpus(), kdump
> > fails.
> 
> So the smp_send_stop() stops CPU 0 from calling nmi_shootdown_cpus(), right?

Yes, but the important thing is that CPU 1 stops CPU 0 which is
only CPU processing crash_ kexec routines.

> >
> > In another case:
> >
> > CPU 0:
> >   oops_end()
> > crash_kexec()
> >   mutex_trylock() // acquired
> > 
> > io_check_error()
> >   panic()
> > crash_kexec()
> >   mutex_trylock() // failed to acquire
> > infinite loop
> >
> > Clearly, this is an undesirable result.
> 
> I'm trying to see how this patch fixes this case.
> 
> >
> > To fix this problem, this patch changes crash_kexec() to exclude
> > others by using atomic_t panic_cpu.
> >
> > V5:
> > - Add missing dummy __crash_kexec() for !CONFIG_KEXEC_CORE case
> > - Replace atomic_xchg() with atomic_set() in crash_kexec() because
> >   it is used as a release operation and there is no need of memory
> >   barrier effect.  This change also removes an unused value warning
> >
> > V4:
> > - Use new __crash_kexec(), no exclusion check version of crash_kexec(),
> >   instead of checking if panic_cpu is the current cpu or not
> >
> > V2:
> > - Use atomic_cmpxchg() instead of spin_trylock() on panic_lock
> >   to exclude concurrent accesses
> > - Don't introduce no-lock version of crash_kexec()
> >
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Eric Biederman 
> > Cc: Vivek Goyal 
> > Cc: Andrew Morton 
> > Cc: Michal Hocko 
> > ---
> >  include/linux/kexec.h |2 ++
> >  kernel/kexec_core.c   |   26 +-
> >  kernel/panic.c|4 ++--
> >  3 files changed, 29 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> > index d140b1e..7b68d27 100644
> > --- a/include/linux/kexec.h
> > +++ b/include/linux/kexec.h
> > @@ -237,6 +237,7 @@ extern int kexec_purgatory_get_set_symbol(struct kimage 
> > *image,
> >   unsigned int size, bool get_value);
> >  extern void *kexec_purgatory_get_symbol_addr(struct kimage *image,
> >  const char *name);
> > +extern void __crash_kexec(struct pt_regs *);
> >  extern void crash_kexec(struct pt_regs *);
> >  int kexec_should_crash(struct task_struct *);
> >  void crash_save_cpu(struct pt_regs *regs, int cpu);
> > @@ -332,6 +333,7 @@ int __weak arch_kexec_apply_relocations(const Elf_Ehdr 
> > *ehdr, Elf_Shdr *sechdrs,
> >  #else /* !CONFIG_KEXEC_CORE */
> >  struct pt_regs;
> >  struct task_struct;
> > +static inline void __crash_kexec(struct pt_regs *regs) { }
> >  static inline void crash_kexec(struct pt_regs *regs) { }
> >  static inline int kexec_should_crash(struct task_struct *p) { return 0; }
> >  #define kexec_in_progress false
> > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> > index 11b64a6..9d097f5 100644
> > --- a/kernel/kexec_core.c
> > +++ b/kernel/kexec_core.c
> > @@ -853,7 +853,8 @@ struct kimage *kexec_image;
> >  struct kimage *kexec_crash_image;
> >  int kexec_load_disabled;
> >
> > -void crash_kexec(struct pt_regs *regs)
> > +/* No panic_cpu check version of crash_kexec */
> > +void __crash_kexec(struct pt_regs *regs)
> >  {
> > /* Take the kexec_mutex here to prevent sys_kexec_load
> >  * running on one cpu from replacing the crash kernel
> > @@ -876,6 +877,29 @@ void crash_kexec(struct pt_regs *regs)
> > }
> >  }
> >
> > +void crash_kexec(struct pt_regs *regs)
> > +{
> > +   int old_cpu, this_cpu;
> > +
> > +   /*
> > +* Only one CPU is allowed to execute the crash_kexec() code as with
> > +* panic().  Otherwise parallel calls of panic() and crash_kexec()
> > +* may stop each other.  To exclude them, we use panic_cpu here too.
> > +*/
> > +   this_cpu = raw_smp_processor_id();
> > +   old_cpu = atomic_cmpxchg(_cpu, -1, this_cpu);
> > +   if (old_cpu == -1) {
> > +   /* This is the 1st CPU which comes here, so go ahead. */
> > +   __crash_kexec(regs);
> > +
> > +   /*
> > +* Reset panic_cpu to allow another panic()/crash_kexec()
> > +* call.
> > +*/
> > +   atomic_set(_cpu, -1);
> > +   }
> > +}
> > +
> >  size_t crash_get_memory_size(void)
> >  {
> > size_t size = 0;
> > diff --git a/kernel/panic.c b/kernel/panic.c
> > index 4fce2be..5d0b807 100644
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -138,7 +138,7 @@ void panic(const char 

RE: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-11-24 Thread 河合英宏 / KAWAIHIDEHIRO
> On Tue, Nov 24, 2015 at 11:48:53AM +0100, Borislav Petkov wrote:
> >
> > > +  */
> > > + while (!raw_spin_trylock(_reason_lock))
> > > + poll_crash_ipi_and_callback(regs);
> >
> > Waaait a minute: so if we're getting NMIs broadcasted on every core but
> > we're *not* crash dumping, we will run into here too. This can't be
> > right. :-\
> 
> This only does something if crash_ipi_done is set, which means you are killing
> the box. But perhaps a comment that states that here would be useful, or maybe
> just put in the check here. 

OK, I'll add more comments around this.

> There's no need to make it depend on SMP, as
> raw_spin_trylock() will turn to just ({1}) for UP, and that code wont even be
> hit.

I'll integrate these SMP and UP versions with a comment about
that.

Regards,
--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group






RE: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-11-24 Thread 河合英宏 / KAWAIHIDEHIRO
> On Fri, Nov 20, 2015 at 06:36:46PM +0900, Hidehiro Kawai wrote:
> > nmi_shootdown_cpus(), a subroutine of crash_kexec(), sends NMI IPI
> > to non-panic cpus to stop them while saving their register
> 
>...to stop them and save their register...

Thanks for the correction.

> > information and doing some cleanups for crash dumping.  So if a
> > non-panic cpus is infinitely looping in NMI context, we fail to
> 
> That should be CPU. Please use "CPU" instead of "cpu" in all your text
> in your next submission.

OK, I'll fix that.

> > save its register information and lose the information from the
> > crash dump.
> >
> > `Infinite loop in NMI context' can happen:
> >
> >   a. when a cpu panics on NMI while another cpu is processing panic
> >   b. when a cpu received an external or unknown NMI while another
> >  cpu is processing panic on NMI
> >
> > In the case of a, it loops in panic_smp_self_stop().  In the case
> > of b, it loops in raw_spin_lock() of nmi_reason_lock.
> 
> Please describe those two cases more verbosely - it takes slow people
> like me a while to figure out what exactly can happen.

  a. when a cpu panics on NMI while another cpu is processing panic
 Ex.
 CPU 0 CPU 1
 = =
 panic()
   panic_cpu <-- 0
   check panic_cpu
   crash_kexec()
   receive an unknown NMI
   unknown_nmi_error()
 nmi_panic()
   panic()
 check panic_cpu
 panic_smp_self_stop()
   infinite loop in NMI context

  b. when a cpu received an external or unknown NMI while another
 cpu is processing panic on NMI
 Ex.
 CPU 0 CPU 1
 ====
 receive an unknown NMI
 unknown_nmi_error()
   nmi_panic() receive an unknown NMI
 panic_cpu <-- 0   unknown_nmi_error()
 panic() nmi_panic()
   check panic_cpu panic
   crash_kexec() check panic_cpu
 panic_smp_self_stop()
   infinite loop in NMI context
 
> > This can
> > happen on some servers which broadcasts NMIs to all CPUs when a dump
> > button is pushed.
> >
> > To save registers in these case too, this patch does following things:
> >
> > 1. Move the timing of `infinite loop in NMI context' (actually
> >done by panic_smp_self_stop()) outside of panic() to enable us to
> >refer pt_regs
> 
> I can't parse that sentence. And I really tried :-\
> panic_smp_self_stop() is still in panic().

panic_smp_self_stop() is still used when a CPU in normal context
should go into infinite loop.  Only when a CPU is in NMI context,
nmi_panic_self_stop() is called and the CPU loops infinitely
without entering panic().

I'll try to revise this sentense.

> > 2. call a callback of nmi_shootdown_cpus() directly to save
> >registers and do some cleanups after setting waiting_for_crash_ipi
> >which is used for counting down the number of cpus which handled
> >the callback
> >
> > V5:
> > - Use WRITE_ONCE() when setting crash_ipi_done to 1 so that the
> >   compiler doesn't change the instruction order
> > - Support the case of b in the above description
> > - Add poll_crash_ipi_and_callback()
> >
> > V4:
> > - Rewrite the patch description
> >
> > V3:
> > - Newly introduced
> >
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Andrew Morton 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: "H. Peter Anvin" 
> > Cc: Peter Zijlstra 
> > Cc: Eric Biederman 
> > Cc: Vivek Goyal 
> > Cc: Michal Hocko 
> > ---
> >  arch/x86/include/asm/reboot.h |1 +
> >  arch/x86/kernel/nmi.c |   17 +
> >  arch/x86/kernel/reboot.c  |   28 
> >  include/linux/kernel.h|   12 ++--
> >  kernel/panic.c|   10 ++
> >  kernel/watchdog.c |2 +-
> >  6 files changed, 63 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
> > index a82c4f1..964e82f 100644
> > --- a/arch/x86/include/asm/reboot.h
> > +++ b/arch/x86/include/asm/reboot.h
> > @@ -25,5 +25,6 @@ void __noreturn machine_real_restart(unsigned int type);
> >
> >  typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
> >  void nmi_shootdown_cpus(nmi_shootdown_cb callback);
> > +void poll_crash_ipi_and_callback(struct pt_regs *regs);
> >
> >  #endif /* _ASM_X86_REBOOT_H */
> > diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> > index 5131714..74a1434 100644
> > --- a/arch/x86/kernel/nmi.c
> > +++ b/arch/x86/kernel/nmi.c
> > @@ -29,6 +29,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  

RE: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-11-24 Thread 河合英宏 / KAWAIHIDEHIRO
> On Fri, Nov 20, 2015 at 06:36:46PM +0900, Hidehiro Kawai wrote:
> > nmi_shootdown_cpus(), a subroutine of crash_kexec(), sends NMI IPI
> > to non-panic cpus to stop them while saving their register
> 
>...to stop them and save their register...

Thanks for the correction.

> > information and doing some cleanups for crash dumping.  So if a
> > non-panic cpus is infinitely looping in NMI context, we fail to
> 
> That should be CPU. Please use "CPU" instead of "cpu" in all your text
> in your next submission.

OK, I'll fix that.

> > save its register information and lose the information from the
> > crash dump.
> >
> > `Infinite loop in NMI context' can happen:
> >
> >   a. when a cpu panics on NMI while another cpu is processing panic
> >   b. when a cpu received an external or unknown NMI while another
> >  cpu is processing panic on NMI
> >
> > In the case of a, it loops in panic_smp_self_stop().  In the case
> > of b, it loops in raw_spin_lock() of nmi_reason_lock.
> 
> Please describe those two cases more verbosely - it takes slow people
> like me a while to figure out what exactly can happen.

  a. when a cpu panics on NMI while another cpu is processing panic
 Ex.
 CPU 0 CPU 1
 = =
 panic()
   panic_cpu <-- 0
   check panic_cpu
   crash_kexec()
   receive an unknown NMI
   unknown_nmi_error()
 nmi_panic()
   panic()
 check panic_cpu
 panic_smp_self_stop()
   infinite loop in NMI context

  b. when a cpu received an external or unknown NMI while another
 cpu is processing panic on NMI
 Ex.
 CPU 0 CPU 1
 ====
 receive an unknown NMI
 unknown_nmi_error()
   nmi_panic() receive an unknown NMI
 panic_cpu <-- 0   unknown_nmi_error()
 panic() nmi_panic()
   check panic_cpu panic
   crash_kexec() check panic_cpu
 panic_smp_self_stop()
   infinite loop in NMI context
 
> > This can
> > happen on some servers which broadcasts NMIs to all CPUs when a dump
> > button is pushed.
> >
> > To save registers in these case too, this patch does following things:
> >
> > 1. Move the timing of `infinite loop in NMI context' (actually
> >done by panic_smp_self_stop()) outside of panic() to enable us to
> >refer pt_regs
> 
> I can't parse that sentence. And I really tried :-\
> panic_smp_self_stop() is still in panic().

panic_smp_self_stop() is still used when a CPU in normal context
should go into infinite loop.  Only when a CPU is in NMI context,
nmi_panic_self_stop() is called and the CPU loops infinitely
without entering panic().

I'll try to revise this sentense.

> > 2. call a callback of nmi_shootdown_cpus() directly to save
> >registers and do some cleanups after setting waiting_for_crash_ipi
> >which is used for counting down the number of cpus which handled
> >the callback
> >
> > V5:
> > - Use WRITE_ONCE() when setting crash_ipi_done to 1 so that the
> >   compiler doesn't change the instruction order
> > - Support the case of b in the above description
> > - Add poll_crash_ipi_and_callback()
> >
> > V4:
> > - Rewrite the patch description
> >
> > V3:
> > - Newly introduced
> >
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Andrew Morton 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: "H. Peter Anvin" 
> > Cc: Peter Zijlstra 
> > Cc: Eric Biederman 
> > Cc: Vivek Goyal 
> > Cc: Michal Hocko 
> > ---
> >  arch/x86/include/asm/reboot.h |1 +
> >  arch/x86/kernel/nmi.c |   17 +
> >  arch/x86/kernel/reboot.c  |   28 
> >  include/linux/kernel.h|   12 ++--
> >  kernel/panic.c|   10 ++
> >  kernel/watchdog.c |2 +-
> >  6 files changed, 63 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
> > index a82c4f1..964e82f 100644
> > --- a/arch/x86/include/asm/reboot.h
> > +++ b/arch/x86/include/asm/reboot.h
> > @@ -25,5 +25,6 @@ void __noreturn machine_real_restart(unsigned int type);
> >
> >  typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
> >  void nmi_shootdown_cpus(nmi_shootdown_cb callback);
> > +void poll_crash_ipi_and_callback(struct pt_regs *regs);
> >
> >  #endif /* _ASM_X86_REBOOT_H */
> > diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c

RE: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-11-24 Thread 河合英宏 / KAWAIHIDEHIRO
> On Tue, Nov 24, 2015 at 11:48:53AM +0100, Borislav Petkov wrote:
> >
> > > +  */
> > > + while (!raw_spin_trylock(_reason_lock))
> > > + poll_crash_ipi_and_callback(regs);
> >
> > Waaait a minute: so if we're getting NMIs broadcasted on every core but
> > we're *not* crash dumping, we will run into here too. This can't be
> > right. :-\
> 
> This only does something if crash_ipi_done is set, which means you are killing
> the box. But perhaps a comment that states that here would be useful, or maybe
> just put in the check here. 

OK, I'll add more comments around this.

> There's no need to make it depend on SMP, as
> raw_spin_trylock() will turn to just ({1}) for UP, and that code wont even be
> hit.

I'll integrate these SMP and UP versions with a comment about
that.

Regards,
--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group






RE: [V5 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-11-24 Thread 河合英宏 / KAWAIHIDEHIRO
> On Fri, Nov 20, 2015 at 06:36:48PM +0900, Hidehiro Kawai wrote:
> > Currently, panic() and crash_kexec() can be called at the same time.
> > For example (x86 case):
> >
> > CPU 0:
> >   oops_end()
> > crash_kexec()
> >   mutex_trylock() // acquired
> > nmi_shootdown_cpus() // stop other cpus
> >
> > CPU 1:
> >   panic()
> > crash_kexec()
> >   mutex_trylock() // failed to acquire
> > smp_send_stop() // stop other cpus
> > infinite loop
> >
> > If CPU 1 calls smp_send_stop() before nmi_shootdown_cpus(), kdump
> > fails.
> 
> So the smp_send_stop() stops CPU 0 from calling nmi_shootdown_cpus(), right?

Yes, but the important thing is that CPU 1 stops CPU 0 which is
only CPU processing crash_ kexec routines.

> >
> > In another case:
> >
> > CPU 0:
> >   oops_end()
> > crash_kexec()
> >   mutex_trylock() // acquired
> > 
> > io_check_error()
> >   panic()
> > crash_kexec()
> >   mutex_trylock() // failed to acquire
> > infinite loop
> >
> > Clearly, this is an undesirable result.
> 
> I'm trying to see how this patch fixes this case.
> 
> >
> > To fix this problem, this patch changes crash_kexec() to exclude
> > others by using atomic_t panic_cpu.
> >
> > V5:
> > - Add missing dummy __crash_kexec() for !CONFIG_KEXEC_CORE case
> > - Replace atomic_xchg() with atomic_set() in crash_kexec() because
> >   it is used as a release operation and there is no need of memory
> >   barrier effect.  This change also removes an unused value warning
> >
> > V4:
> > - Use new __crash_kexec(), no exclusion check version of crash_kexec(),
> >   instead of checking if panic_cpu is the current cpu or not
> >
> > V2:
> > - Use atomic_cmpxchg() instead of spin_trylock() on panic_lock
> >   to exclude concurrent accesses
> > - Don't introduce no-lock version of crash_kexec()
> >
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Eric Biederman 
> > Cc: Vivek Goyal 
> > Cc: Andrew Morton 
> > Cc: Michal Hocko 
> > ---
> >  include/linux/kexec.h |2 ++
> >  kernel/kexec_core.c   |   26 +-
> >  kernel/panic.c|4 ++--
> >  3 files changed, 29 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> > index d140b1e..7b68d27 100644
> > --- a/include/linux/kexec.h
> > +++ b/include/linux/kexec.h
> > @@ -237,6 +237,7 @@ extern int kexec_purgatory_get_set_symbol(struct kimage 
> > *image,
> >   unsigned int size, bool get_value);
> >  extern void *kexec_purgatory_get_symbol_addr(struct kimage *image,
> >  const char *name);
> > +extern void __crash_kexec(struct pt_regs *);
> >  extern void crash_kexec(struct pt_regs *);
> >  int kexec_should_crash(struct task_struct *);
> >  void crash_save_cpu(struct pt_regs *regs, int cpu);
> > @@ -332,6 +333,7 @@ int __weak arch_kexec_apply_relocations(const Elf_Ehdr 
> > *ehdr, Elf_Shdr *sechdrs,
> >  #else /* !CONFIG_KEXEC_CORE */
> >  struct pt_regs;
> >  struct task_struct;
> > +static inline void __crash_kexec(struct pt_regs *regs) { }
> >  static inline void crash_kexec(struct pt_regs *regs) { }
> >  static inline int kexec_should_crash(struct task_struct *p) { return 0; }
> >  #define kexec_in_progress false
> > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> > index 11b64a6..9d097f5 100644
> > --- a/kernel/kexec_core.c
> > +++ b/kernel/kexec_core.c
> > @@ -853,7 +853,8 @@ struct kimage *kexec_image;
> >  struct kimage *kexec_crash_image;
> >  int kexec_load_disabled;
> >
> > -void crash_kexec(struct pt_regs *regs)
> > +/* No panic_cpu check version of crash_kexec */
> > +void __crash_kexec(struct pt_regs *regs)
> >  {
> > /* Take the kexec_mutex here to prevent sys_kexec_load
> >  * running on one cpu from replacing the crash kernel
> > @@ -876,6 +877,29 @@ void crash_kexec(struct pt_regs *regs)
> > }
> >  }
> >
> > +void crash_kexec(struct pt_regs *regs)
> > +{
> > +   int old_cpu, this_cpu;
> > +
> > +   /*
> > +* Only one CPU is allowed to execute the crash_kexec() code as with
> > +* panic().  Otherwise parallel calls of panic() and crash_kexec()
> > +* may stop each other.  To exclude them, we use panic_cpu here too.
> > +*/
> > +   this_cpu = raw_smp_processor_id();
> > +   old_cpu = atomic_cmpxchg(_cpu, -1, this_cpu);
> > +   if (old_cpu == -1) {
> > +   /* This is the 1st CPU which comes here, so go ahead. */
> > +   __crash_kexec(regs);
> > +
> > +   /*
> > +* Reset panic_cpu to allow another panic()/crash_kexec()
> > +* call.
> > +*/
> > +   atomic_set(_cpu, -1);
> > +   }
> > +}
> > +
> >  size_t crash_get_memory_size(void)
> >  {
> > size_t size = 0;
> > diff --git a/kernel/panic.c b/kernel/panic.c
> > index 

RE: [V5 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI

2015-11-23 Thread 河合英宏 / KAWAIHIDEHIRO
Hi,

> On Fri, Nov 20, 2015 at 06:36:44PM +0900, Hidehiro Kawai wrote:
> > If panic on NMI happens just after panic() on the same CPU, panic()
> > is recursively called.  As the result, it stalls after failing to
> > acquire panic_lock.
> >
> > To avoid this problem, don't call panic() in NMI context if
> > we've already entered panic().
> >
> > V4:
> > - Improve comments in io_check_error() and panic()
> >
> > V3:
> > - Introduce nmi_panic() macro to reduce code duplication
> > - In the case of panic on NMI, don't return from NMI handlers
> >   if another cpu already panicked
> >
> > V2:
> > - Use atomic_cmpxchg() instead of current spin_trylock() to
> >   exclude concurrent accesses to the panic routines
> > - Don't introduce no-lock version of panic()
> >
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Andrew Morton 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: "H. Peter Anvin" 
> > Cc: Peter Zijlstra 
> > Cc: Michal Hocko 
> > ---
> >  arch/x86/kernel/nmi.c  |   16 
> >  include/linux/kernel.h |   13 +
> >  kernel/panic.c |   15 ---
> >  kernel/watchdog.c  |2 +-
> >  4 files changed, 38 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> > index 697f90d..5131714 100644
> > --- a/arch/x86/kernel/nmi.c
> > +++ b/arch/x86/kernel/nmi.c
> > @@ -231,7 +231,7 @@ pci_serr_error(unsigned char reason, struct pt_regs 
> > *regs)
> >  #endif
> >
> > if (panic_on_unrecovered_nmi)
> > -   panic("NMI: Not continuing");
> > +   nmi_panic("NMI: Not continuing");
> >
> > pr_emerg("Dazed and confused, but trying to continue\n");
> >
> > @@ -255,8 +255,16 @@ io_check_error(unsigned char reason, struct pt_regs 
> > *regs)
> >  reason, smp_processor_id());
> > show_regs(regs);
> >
> > -   if (panic_on_io_nmi)
> > -   panic("NMI IOCK error: Not continuing");
> > +   if (panic_on_io_nmi) {
> > +   nmi_panic("NMI IOCK error: Not continuing");
> 
> Btw, that panic_on_io_nmi seems undocumented in
> Documentation/sysctl/kernel.txt. Care to document it, please, as a
> separate patch?

Sure. I'll post a documentation patch for it in a separate patch.
Because panic_on_io_nmi has been available since relatively old days,
I didn't check it. 

> > +
> > +   /*
> > +* If we return from nmi_panic(), it means we have received
> > +* NMI while processing panic().  So, simply return without
> > +* a delay and re-enabling NMI.
> > +*/
> > +   return;
> > +   }
> >
> > /* Re-enable the IOCK line, wait for a few seconds */
> > reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_IOCHK;
> > @@ -297,7 +305,7 @@ unknown_nmi_error(unsigned char reason, struct pt_regs 
> > *regs)
> >
> > pr_emerg("Do you have a strange power saving mode enabled?\n");
> > if (unknown_nmi_panic || panic_on_unrecovered_nmi)
> > -   panic("NMI: Not continuing");
> > +   nmi_panic("NMI: Not continuing");
> >
> > pr_emerg("Dazed and confused, but trying to continue\n");
> >  }
> > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> > index 350dfb0..480a4fd 100644
> > --- a/include/linux/kernel.h
> > +++ b/include/linux/kernel.h
> > @@ -445,6 +445,19 @@ extern int sysctl_panic_on_stackoverflow;
> >
> >  extern bool crash_kexec_post_notifiers;
> >
> > +extern atomic_t panic_cpu;
> 
> This needs a comment explaining what this variable is and what it
> denotes.

OK, I'll do that in the next version.

> 
> > +
> > +/*
> > + * A variant of panic() called from NMI context.
> > + * If we've already panicked on this cpu, return from here.
> > + */
> > +#define nmi_panic(fmt, ...)
> > \
> > +   do {\
> > +   int this_cpu = raw_smp_processor_id();  \
> > +   if (atomic_cmpxchg(_cpu, -1, this_cpu) != this_cpu) \
> > +   panic(fmt, ##__VA_ARGS__);  \
> > +   } while (0)
> > +
> >  /*
> >   * Only to be used by arch init code. If the user over-wrote the default
> >   * CONFIG_PANIC_TIMEOUT, honor it.
> > diff --git a/kernel/panic.c b/kernel/panic.c
> > index 4579dbb..24ee2ea 100644
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -61,6 +61,8 @@ void __weak panic_smp_self_stop(void)
> > cpu_relax();
> >  }
> >
> > +atomic_t panic_cpu = ATOMIC_INIT(-1);
> > +
> >  /**
> >   * panic - halt the system
> >   * @fmt: The text string to print
> > @@ -71,17 +73,17 @@ void __weak panic_smp_self_stop(void)
> >   */
> >  void panic(const char *fmt, ...)
> >  {
> > -   static DEFINE_SPINLOCK(panic_lock);
> > static char buf[1024];
> > va_list args;
> > long i, i_next = 0;
> > int state = 0;
> > +   int old_cpu, this_cpu;
> >
> > /*
> >  * Disable local interrupts. This will prevent panic_smp_self_stop
> 

RE: [V5 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI

2015-11-23 Thread 河合英宏 / KAWAIHIDEHIRO
Hi,

> On Fri, Nov 20, 2015 at 06:36:44PM +0900, Hidehiro Kawai wrote:
> > If panic on NMI happens just after panic() on the same CPU, panic()
> > is recursively called.  As the result, it stalls after failing to
> > acquire panic_lock.
> >
> > To avoid this problem, don't call panic() in NMI context if
> > we've already entered panic().
> >
> > V4:
> > - Improve comments in io_check_error() and panic()
> >
> > V3:
> > - Introduce nmi_panic() macro to reduce code duplication
> > - In the case of panic on NMI, don't return from NMI handlers
> >   if another cpu already panicked
> >
> > V2:
> > - Use atomic_cmpxchg() instead of current spin_trylock() to
> >   exclude concurrent accesses to the panic routines
> > - Don't introduce no-lock version of panic()
> >
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Andrew Morton 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: "H. Peter Anvin" 
> > Cc: Peter Zijlstra 
> > Cc: Michal Hocko 
> > ---
> >  arch/x86/kernel/nmi.c  |   16 
> >  include/linux/kernel.h |   13 +
> >  kernel/panic.c |   15 ---
> >  kernel/watchdog.c  |2 +-
> >  4 files changed, 38 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> > index 697f90d..5131714 100644
> > --- a/arch/x86/kernel/nmi.c
> > +++ b/arch/x86/kernel/nmi.c
> > @@ -231,7 +231,7 @@ pci_serr_error(unsigned char reason, struct pt_regs 
> > *regs)
> >  #endif
> >
> > if (panic_on_unrecovered_nmi)
> > -   panic("NMI: Not continuing");
> > +   nmi_panic("NMI: Not continuing");
> >
> > pr_emerg("Dazed and confused, but trying to continue\n");
> >
> > @@ -255,8 +255,16 @@ io_check_error(unsigned char reason, struct pt_regs 
> > *regs)
> >  reason, smp_processor_id());
> > show_regs(regs);
> >
> > -   if (panic_on_io_nmi)
> > -   panic("NMI IOCK error: Not continuing");
> > +   if (panic_on_io_nmi) {
> > +   nmi_panic("NMI IOCK error: Not continuing");
> 
> Btw, that panic_on_io_nmi seems undocumented in
> Documentation/sysctl/kernel.txt. Care to document it, please, as a
> separate patch?

Sure. I'll post a documentation patch for it in a separate patch.
Because panic_on_io_nmi has been available since relatively old days,
I didn't check it. 

> > +
> > +   /*
> > +* If we return from nmi_panic(), it means we have received
> > +* NMI while processing panic().  So, simply return without
> > +* a delay and re-enabling NMI.
> > +*/
> > +   return;
> > +   }
> >
> > /* Re-enable the IOCK line, wait for a few seconds */
> > reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_IOCHK;
> > @@ -297,7 +305,7 @@ unknown_nmi_error(unsigned char reason, struct pt_regs 
> > *regs)
> >
> > pr_emerg("Do you have a strange power saving mode enabled?\n");
> > if (unknown_nmi_panic || panic_on_unrecovered_nmi)
> > -   panic("NMI: Not continuing");
> > +   nmi_panic("NMI: Not continuing");
> >
> > pr_emerg("Dazed and confused, but trying to continue\n");
> >  }
> > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> > index 350dfb0..480a4fd 100644
> > --- a/include/linux/kernel.h
> > +++ b/include/linux/kernel.h
> > @@ -445,6 +445,19 @@ extern int sysctl_panic_on_stackoverflow;
> >
> >  extern bool crash_kexec_post_notifiers;
> >
> > +extern atomic_t panic_cpu;
> 
> This needs a comment explaining what this variable is and what it
> denotes.

OK, I'll do that in the next version.

> 
> > +
> > +/*
> > + * A variant of panic() called from NMI context.
> > + * If we've already panicked on this cpu, return from here.
> > + */
> > +#define nmi_panic(fmt, ...)
> > \
> > +   do {\
> > +   int this_cpu = raw_smp_processor_id();  \
> > +   if (atomic_cmpxchg(_cpu, -1, this_cpu) != this_cpu) \
> > +   panic(fmt, ##__VA_ARGS__);  \
> > +   } while (0)
> > +
> >  /*
> >   * Only to be used by arch init code. If the user over-wrote the default
> >   * CONFIG_PANIC_TIMEOUT, honor it.
> > diff --git a/kernel/panic.c b/kernel/panic.c
> > index 4579dbb..24ee2ea 100644
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -61,6 +61,8 @@ void __weak panic_smp_self_stop(void)
> > cpu_relax();
> >  }
> >
> > +atomic_t panic_cpu = ATOMIC_INIT(-1);
> > +
> >  /**
> >   * panic - halt the system
> >   * @fmt: The text string to print
> > @@ -71,17 +73,17 @@ void __weak panic_smp_self_stop(void)
> >   */
> >  void panic(const char *fmt, ...)
> >  {
> > -   static DEFINE_SPINLOCK(panic_lock);
> > static char buf[1024];
> > va_list args;
> > long i, 

RE: Re: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-27 Thread 河合英宏 / KAWAIHIDEHIRO
Hi,

> I just have a look at this thread. I am wondering why we don't use
> existing is_kdump_kernel() directly to disable external NMI if it's
> in kdump kernel. Then no need to introduce another boot option "noextnmi"
> which is used only for kdump kernel.

As I stated in another mail, there is a case where we don't want to
mask external NMIs in the second kernel.  So, we need some
configurable way.  Please see the following quotation.

> We souldn't enable this feature silently.  Some users wouldn't like
> to enable this feature.  For example, a user enables a watchdog timer
> which raises an external NMI when the counter is not reset for a
> specific duration.  Then, the second kernel hangs up while saving
> crash dump, and NMI is delivered to the CPU.  The kernel gets panic
> due to the NMI, prints some information to the display and serial
> console, and then automatically reboot.  In this case, users don't
> want to block external NMIs.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group




RE: Re: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-27 Thread 河合英宏 / KAWAIHIDEHIRO
Hi,

> I just have a look at this thread. I am wondering why we don't use
> existing is_kdump_kernel() directly to disable external NMI if it's
> in kdump kernel. Then no need to introduce another boot option "noextnmi"
> which is used only for kdump kernel.

As I stated in another mail, there is a case where we don't want to
mask external NMIs in the second kernel.  So, we need some
configurable way.  Please see the following quotation.

> We souldn't enable this feature silently.  Some users wouldn't like
> to enable this feature.  For example, a user enables a watchdog timer
> which raises an external NMI when the counter is not reset for a
> specific duration.  Then, the second kernel hangs up while saving
> crash dump, and NMI is delivered to the CPU.  The kernel gets panic
> due to the NMI, prints some information to the display and serial
> console, and then automatically reboot.  In this case, users don't
> want to block external NMIs.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group




RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-15 Thread 河合英宏 / KAWAIHIDEHIRO
> > By the way, I have a pending patch which expands this option like
> > this:
> >
> > apic_extnmi={ bsp | all | none }
> >
> > If apic_extnmi=all is specified, external NMIs are broadcast to
> > all CPUs.  This raises the successful rate of kernel panic in the case
> > where an external NMI to CPU 0 is swallowed by other NMI handlers or
> > blocked due to hang-up in NMI context.  The patch works without any
> > problems, but I'm going to drop the feature if it will cause long
> > discussion.  I'd like to settle this patch set down once.  At least,
> > I'm going to change this option to apic_extnmi={bsp|none} style for
> > the future expansion.
> >
> > How do you think about this?
> 
> Do it right away with all three variants. They make a lot of sense to
> me.

OK. I'll do that.

Regards,

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group





RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-15 Thread 河合英宏 / KAWAIHIDEHIRO
> * Thomas Gleixner  wrote:
> 
> > Borislav,
> >
> > On Mon, 5 Oct 2015, Borislav Petkov wrote:
> > > On Mon, Oct 05, 2015 at 02:03:58AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > That's different from my point of view.  I'm not going to pass
> > > > some data from the first kernel to the second kernel. I'm just going to
> > > > provide a configurable option for the second kernel to users.
> > >
> > > Dude, WTF?! You're adding a kernel command line which is supposed to
> > > be used *only* by the kdump kernel. But nooo, it is there in the open
> > > and visible to people. And anyone can type it in during boot. AND THAT
> > > SHOULDN'T BE POSSIBLE IN THE FIRST PLACE!
> > >
> > > This information is strictly for the kdump kernel - it shouldn't be a
> > > generic command line option. How hard it is to understand that simple
> > > fact?!
> >
> > Calm down!
> >
> > Disabling that NMI in the first kernel is not going to make the world
> > explode. We have enough command line options a user can type in, which
> > are way worse than that one. If some "expert" types nonsense on the
> > first kernel command line, then it's none of our problems, really.
> >
> > If Kawai would have marked that option as a debug feature, this
> > discussion would have probably never happened.
> >
> > Aside of that, the best way to hand in options for the kdump kernel is
> > the command line. We have an existing interface for that.
> >
> > Let's move on. Nothing to see here.
> 
> So Boris kind of has a point: there are numerous problems with boot options as
> kexec parameter interface:
> 
>  - boot option strings are not a well defined programmatic interface:
> - failures are not obvious (they are often ignored)
> - inserting/setting parameters is awkward as well.
> 
>  - boot options are not an ABI, so when options have dual use with kexec it's 
> easy
>to break things inadvertently: without that failure being apparent other 
> than
>reintroducing obscure kexec failure modes again.
> 
>  - in the booted up kexec kernel it's not really obvious which options got 
> set by
>kexec and which got set by the user - as there's no separation of 
> namespaces.
> 
>  - likewise, if the user specifies an option in conflict with a kexec 
> requirement
>it's not really obvious what's happening: the user setting should probably
>dominate - I'm not sure that's the case with the current kexec code.

Thanks for the detailed explanation.  I can understand the disadvantages
of using boot option.  So these are essential problems with boot options
rather than new boot option added for kexec'ed kernel.
 
> Boot options are basically a user interface.
> 
> On the other hand the hack of (ab-)using boot parameters as kexec parameter
> passing is an existing, many years old practice and we cannot just stop it 
> without
> offering an alternative (and better!) interface first.
> 
> We could improve things by either adding a separate kexec-only parameter 
> passing
> facility (like programmatic boot parameters are) - or by creating some sort of
> boot parameter alias that clearly identifies kexec parameters.
> 
> So for example when introducing 'noextnmi' we'd also add a facility to add a
> 'kexec_noextnmi' alias - which clearly identifies this boot parameter as a 
> kexec
> related one.
> 
> Every 'kexec' inserted parameter would be prefixed by kexec_ - and then the
> separation of namespaces (and the prioritization of user vs. kexec 
> requirements)
> becomes well defined as well..
> 
> We should also probably print a warning if a kexec_* parameter is passed in 
> that
> has no matching handler in the kexec()-ed kernel.

It would be reasonable.  Or we might improve kexec command so that
it removes conflict boot options with warnings.

As I stated in another mail, I'm going to change "noextnmi" to
"apic_extnmi={bsp|all|none}", which can be used both the first and
second kernels.  So, I'll add this option without "kexec_" prefix
at this point.

> I do concur that this patch is probably OK given existing practices.

Thanks!

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-15 Thread 河合英宏 / KAWAIHIDEHIRO
> * Thomas Gleixner <t...@linutronix.de> wrote:
> 
> > Borislav,
> >
> > On Mon, 5 Oct 2015, Borislav Petkov wrote:
> > > On Mon, Oct 05, 2015 at 02:03:58AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > That's different from my point of view.  I'm not going to pass
> > > > some data from the first kernel to the second kernel. I'm just going to
> > > > provide a configurable option for the second kernel to users.
> > >
> > > Dude, WTF?! You're adding a kernel command line which is supposed to
> > > be used *only* by the kdump kernel. But nooo, it is there in the open
> > > and visible to people. And anyone can type it in during boot. AND THAT
> > > SHOULDN'T BE POSSIBLE IN THE FIRST PLACE!
> > >
> > > This information is strictly for the kdump kernel - it shouldn't be a
> > > generic command line option. How hard it is to understand that simple
> > > fact?!
> >
> > Calm down!
> >
> > Disabling that NMI in the first kernel is not going to make the world
> > explode. We have enough command line options a user can type in, which
> > are way worse than that one. If some "expert" types nonsense on the
> > first kernel command line, then it's none of our problems, really.
> >
> > If Kawai would have marked that option as a debug feature, this
> > discussion would have probably never happened.
> >
> > Aside of that, the best way to hand in options for the kdump kernel is
> > the command line. We have an existing interface for that.
> >
> > Let's move on. Nothing to see here.
> 
> So Boris kind of has a point: there are numerous problems with boot options as
> kexec parameter interface:
> 
>  - boot option strings are not a well defined programmatic interface:
> - failures are not obvious (they are often ignored)
> - inserting/setting parameters is awkward as well.
> 
>  - boot options are not an ABI, so when options have dual use with kexec it's 
> easy
>to break things inadvertently: without that failure being apparent other 
> than
>reintroducing obscure kexec failure modes again.
> 
>  - in the booted up kexec kernel it's not really obvious which options got 
> set by
>kexec and which got set by the user - as there's no separation of 
> namespaces.
> 
>  - likewise, if the user specifies an option in conflict with a kexec 
> requirement
>it's not really obvious what's happening: the user setting should probably
>dominate - I'm not sure that's the case with the current kexec code.

Thanks for the detailed explanation.  I can understand the disadvantages
of using boot option.  So these are essential problems with boot options
rather than new boot option added for kexec'ed kernel.
 
> Boot options are basically a user interface.
> 
> On the other hand the hack of (ab-)using boot parameters as kexec parameter
> passing is an existing, many years old practice and we cannot just stop it 
> without
> offering an alternative (and better!) interface first.
> 
> We could improve things by either adding a separate kexec-only parameter 
> passing
> facility (like programmatic boot parameters are) - or by creating some sort of
> boot parameter alias that clearly identifies kexec parameters.
> 
> So for example when introducing 'noextnmi' we'd also add a facility to add a
> 'kexec_noextnmi' alias - which clearly identifies this boot parameter as a 
> kexec
> related one.
> 
> Every 'kexec' inserted parameter would be prefixed by kexec_ - and then the
> separation of namespaces (and the prioritization of user vs. kexec 
> requirements)
> becomes well defined as well..
> 
> We should also probably print a warning if a kexec_* parameter is passed in 
> that
> has no matching handler in the kexec()-ed kernel.

It would be reasonable.  Or we might improve kexec command so that
it removes conflict boot options with warnings.

As I stated in another mail, I'm going to change "noextnmi" to
"apic_extnmi={bsp|all|none}", which can be used both the first and
second kernels.  So, I'll add this option without "kexec_" prefix
at this point.

> I do concur that this patch is probably OK given existing practices.

Thanks!

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-15 Thread 河合英宏 / KAWAIHIDEHIRO
> > By the way, I have a pending patch which expands this option like
> > this:
> >
> > apic_extnmi={ bsp | all | none }
> >
> > If apic_extnmi=all is specified, external NMIs are broadcast to
> > all CPUs.  This raises the successful rate of kernel panic in the case
> > where an external NMI to CPU 0 is swallowed by other NMI handlers or
> > blocked due to hang-up in NMI context.  The patch works without any
> > problems, but I'm going to drop the feature if it will cause long
> > discussion.  I'd like to settle this patch set down once.  At least,
> > I'm going to change this option to apic_extnmi={bsp|none} style for
> > the future expansion.
> >
> > How do you think about this?
> 
> Do it right away with all three variants. They make a lot of sense to
> me.

OK. I'll do that.

Regards,

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group





RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-13 Thread 河合英宏 / KAWAIHIDEHIRO
> On Fri, 25 Sep 2015, Hidehiro Kawai wrote:
> 
> > This patch introduces new boot option "noextnmi" which disables
> > external NMI.  This option is useful for the dump capture kernel
> > so that an HA application or administrator wouldn't mistakenly
> > shoot down the kernel by NMI.
> >
> > Currently, only x86 supports this option.
> 
> You might add that is can be used for debugging purposes as
> well. External NMIs can be their own source of trouble. :)

Thanks for your comments!  I'll do that.

By the way, I have a pending patch which expands this option like
this:

apic_extnmi={ bsp | all | none }

If apic_extnmi=all is specified, external NMIs are broadcast to
all CPUs.  This raises the successful rate of kernel panic in the case
where an external NMI to CPU 0 is swallowed by other NMI handlers or
blocked due to hang-up in NMI context.  The patch works without any
problems, but I'm going to drop the feature if it will cause long
discussion.  I'd like to settle this patch set down once.  At least,
I'm going to change this option to apic_extnmi={bsp|none} style for
the future expansion.

How do you think about this?

> > Signed-off-by: Hidehiro Kawai 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: "H. Peter Anvin" 
> > Cc: Jonathan Corbet 
> > ---
> >  Documentation/kernel-parameters.txt |4 
> >  arch/x86/kernel/apic/apic.c |   17 -
> >  2 files changed, 20 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/kernel-parameters.txt 
> > b/Documentation/kernel-parameters.txt
> > index 22a4b68..8bcaccd 100644
> > --- a/Documentation/kernel-parameters.txt
> > +++ b/Documentation/kernel-parameters.txt
> > @@ -2379,6 +2379,10 @@ bytes respectively. Such letter suffixes can also be 
> > entirely omitted.
> > noexec=on: enable non-executable mappings (default)
> > noexec=off: disable non-executable mappings
> >
> > +   noextnmi[X86]
> > +   Mask external NMI.  This option is useful for a
> > +   dump capture kernel to be shot down by NMI.
> 
> That should read: "...not to be shot down", right?

Yes, you are right.  I'll fix it.

> Other than that.
> 
> Acked-by: Thomas Gleixner 

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group





RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-13 Thread 河合英宏 / KAWAIHIDEHIRO
Hello, Boris

Sorry for the late reply.

> On Mon, Oct 05, 2015 at 09:21:02AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > So, the problem for you is that "noextnmi" option is visible and effective
> > in the first kernel, isn't it?
> 
> No, such an option shouldn't exist at all. You should be passing
> information *in* *a* *different* *manner* to the kdump kernel - not with
> a kernel command line option.

Sorry, I couldn't find out the reason why I shouldn't use cmdline option.
It doesn't need new user I/F to inform the 1st kernel about masking/unmasking
external NMI in the 2nd kernel, doesn't need new data passing infrastructure,
and is easy to configure that.  Also, "elfcorehdr" and "disable_cpu_apicid"
have already been introduced as cmdline options for dump capture kernel.
This means the cmdline option approach would be mostly acceptable.

> I get the feeling I'm starting to sound like a broken record on this
> mail thread... :-(
> 
> One other thing we could probably try to do is use boot_params which
> is, IIUC, passed to the second kernel. So we can add another bit to
> boot_params.hdr.loadflags or so and use that. Or something similar.

I think using boot_params would be worse than ELF header approach.
It needs to reserve boot_params bits for all boot loaders.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group




RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-13 Thread 河合英宏 / KAWAIHIDEHIRO
> On Fri, 25 Sep 2015, Hidehiro Kawai wrote:
> 
> > This patch introduces new boot option "noextnmi" which disables
> > external NMI.  This option is useful for the dump capture kernel
> > so that an HA application or administrator wouldn't mistakenly
> > shoot down the kernel by NMI.
> >
> > Currently, only x86 supports this option.
> 
> You might add that is can be used for debugging purposes as
> well. External NMIs can be their own source of trouble. :)

Thanks for your comments!  I'll do that.

By the way, I have a pending patch which expands this option like
this:

apic_extnmi={ bsp | all | none }

If apic_extnmi=all is specified, external NMIs are broadcast to
all CPUs.  This raises the successful rate of kernel panic in the case
where an external NMI to CPU 0 is swallowed by other NMI handlers or
blocked due to hang-up in NMI context.  The patch works without any
problems, but I'm going to drop the feature if it will cause long
discussion.  I'd like to settle this patch set down once.  At least,
I'm going to change this option to apic_extnmi={bsp|none} style for
the future expansion.

How do you think about this?

> > Signed-off-by: Hidehiro Kawai 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: "H. Peter Anvin" 
> > Cc: Jonathan Corbet 
> > ---
> >  Documentation/kernel-parameters.txt |4 
> >  arch/x86/kernel/apic/apic.c |   17 -
> >  2 files changed, 20 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/kernel-parameters.txt 
> > b/Documentation/kernel-parameters.txt
> > index 22a4b68..8bcaccd 100644
> > --- a/Documentation/kernel-parameters.txt
> > +++ b/Documentation/kernel-parameters.txt
> > @@ -2379,6 +2379,10 @@ bytes respectively. Such letter suffixes can also be 
> > entirely omitted.
> > noexec=on: enable non-executable mappings (default)
> > noexec=off: disable non-executable mappings
> >
> > +   noextnmi[X86]
> > +   Mask external NMI.  This option is useful for a
> > +   dump capture kernel to be shot down by NMI.
> 
> That should read: "...not to be shot down", right?

Yes, you are right.  I'll fix it.

> Other than that.
> 
> Acked-by: Thomas Gleixner 

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group





RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-13 Thread 河合英宏 / KAWAIHIDEHIRO
Hello, Boris

Sorry for the late reply.

> On Mon, Oct 05, 2015 at 09:21:02AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > So, the problem for you is that "noextnmi" option is visible and effective
> > in the first kernel, isn't it?
> 
> No, such an option shouldn't exist at all. You should be passing
> information *in* *a* *different* *manner* to the kdump kernel - not with
> a kernel command line option.

Sorry, I couldn't find out the reason why I shouldn't use cmdline option.
It doesn't need new user I/F to inform the 1st kernel about masking/unmasking
external NMI in the 2nd kernel, doesn't need new data passing infrastructure,
and is easy to configure that.  Also, "elfcorehdr" and "disable_cpu_apicid"
have already been introduced as cmdline options for dump capture kernel.
This means the cmdline option approach would be mostly acceptable.

> I get the feeling I'm starting to sound like a broken record on this
> mail thread... :-(
> 
> One other thing we could probably try to do is use boot_params which
> is, IIUC, passed to the second kernel. So we can add another bit to
> boot_params.hdr.loadflags or so and use that. Or something similar.

I think using boot_params would be worse than ELF header approach.
It needs to reserve boot_params bits for all boot loaders.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group




RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-05 Thread 河合英宏 / KAWAIHIDEHIRO
> On Mon, Oct 05, 2015 at 02:03:58AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > That's different from my point of view.  I'm not going to pass
> > some data from the first kernel to the second kernel. I'm just going to
> > provide a configurable option for the second kernel to users.
> 
> Dude, WTF?! You're adding a kernel command line which is supposed to
> be used *only* by the kdump kernel. But nooo, it is there in the open
> and visible to people. And anyone can type it in during boot. AND THAT
> SHOULDN'T BE POSSIBLE IN THE FIRST PLACE!
> 
> This information is strictly for the kdump kernel - it shouldn't be a
> generic command line option. How hard it is to understand that simple
> fact?!

So, the problem for you is that "noextnmi" option is visible and effective
in the first kernel, isn't it?  If so, we can ignore "noextnmi" option
if we are in the first kernel and remove it from the documentation.
"elfcorehdr" cmdline option prepared by kexec command is passed to only
the second kernel, and it is also used to check if the booted kernel is
a kdump kernel.  Thus, if "elfcorehdr" is NOT specified, then ignore
"noextnmi".

Documentation/kernel-parameters.txt:
> elfcorehdr=[size[KMG]@]offset[KMG] [IA64,PPC,SH,X86,S390]
> Specifies physical address of start of kernel core
> image elf header and optionally the size. Generally
> kexec loader will pass this option to capture kernel.
> See Documentation/kdump/kdump.txt for details.

> 
> 
> > I think we should use the ELF header only if the passed information
> > is saved to a crash dump.
> 
> So what?! ELF header will contain the additional bit of information that
> the second kernel wasn't reacting to NMIs. But that's fine, that *is*
> the desired behavior anyway.
> 
> All I'm saying is, this is a strict kdump kernel "command", so to speak,
> and it doesn't belong with the generic kernel command line parameters.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group





RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-05 Thread 河合英宏 / KAWAIHIDEHIRO
> On Mon, Oct 05, 2015 at 02:03:58AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > That's different from my point of view.  I'm not going to pass
> > some data from the first kernel to the second kernel. I'm just going to
> > provide a configurable option for the second kernel to users.
> 
> Dude, WTF?! You're adding a kernel command line which is supposed to
> be used *only* by the kdump kernel. But nooo, it is there in the open
> and visible to people. And anyone can type it in during boot. AND THAT
> SHOULDN'T BE POSSIBLE IN THE FIRST PLACE!
> 
> This information is strictly for the kdump kernel - it shouldn't be a
> generic command line option. How hard it is to understand that simple
> fact?!

So, the problem for you is that "noextnmi" option is visible and effective
in the first kernel, isn't it?  If so, we can ignore "noextnmi" option
if we are in the first kernel and remove it from the documentation.
"elfcorehdr" cmdline option prepared by kexec command is passed to only
the second kernel, and it is also used to check if the booted kernel is
a kdump kernel.  Thus, if "elfcorehdr" is NOT specified, then ignore
"noextnmi".

Documentation/kernel-parameters.txt:
> elfcorehdr=[size[KMG]@]offset[KMG] [IA64,PPC,SH,X86,S390]
> Specifies physical address of start of kernel core
> image elf header and optionally the size. Generally
> kexec loader will pass this option to capture kernel.
> See Documentation/kdump/kdump.txt for details.

> 
> 
> > I think we should use the ELF header only if the passed information
> > is saved to a crash dump.
> 
> So what?! ELF header will contain the additional bit of information that
> the second kernel wasn't reacting to NMIs. But that's fine, that *is*
> the desired behavior anyway.
> 
> All I'm saying is, this is a strict kdump kernel "command", so to speak,
> and it doesn't belong with the generic kernel command line parameters.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group





RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-04 Thread 河合英宏 / KAWAIHIDEHIRO
> On Fri, Oct 02, 2015 at 12:58:02AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > On Thu, Oct 01, 2015 at 10:24:19AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > But how do we check if the starting kernel is a dump capture kernel?
> > >
> > > How does that first kernel pass info to the capture kernel?
> >
> > As I described in the previous mail,
> 
> I meant: "How does the first kernel pass info to the capture kernel by
> *not* using the kernel command line"?
> 
> The kernel command line is not the channel to pass data to the kdump
> kernel.

That's different from my point of view.  I'm not going to pass
some data from the first kernel to the second kernel. I'm just going to
provide a configurable option for the second kernel to users.

We souldn't enable this feature silently.  Some users wouldn't like
to enable this feature.  For example, a user enables a watchdog timer
which raises an external NMI when the counter is not reset for a
specific duration.  Then, the second kernel hangs up while saving
crash dump, and NMI is delivered to the CPU.  The kernel gets panic
due to the NMI, prints some information to the display and serial
console, and then automatically reboot.  In this case, users don't
want to block external NMIs.

So, making this feature configurable by command line option is
reasonable.

> > Yes, your first kernel doesn't get external NMIs, but basically
> > you don't have to set "noextnmi" option to the first kernel.
> 
> So it doesn't belong there as a kernel command line parameter in the
> first place.
> 
> IOW, you need a different method to pass data to the second kernel. Be
> it an ELF header, be it a shared page, whatever.

I think we should use the ELF header only if the passed information
is saved to a crash dump.

Also, we wouldn't want to introduce new shared page for that purpose.
A memory segment provided by kexec syscall is not usable because
the second kernel doesn't know what there is in a segment without a
command line option.  Please note that "elfcorehdr" command line option
prepared by kexec command is used to inform the second kernel about
the address of the ELF header memory segment.

Regards,


Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-04 Thread 河合英宏 / KAWAIHIDEHIRO
> On Fri, Oct 02, 2015 at 12:58:02AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > On Thu, Oct 01, 2015 at 10:24:19AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > But how do we check if the starting kernel is a dump capture kernel?
> > >
> > > How does that first kernel pass info to the capture kernel?
> >
> > As I described in the previous mail,
> 
> I meant: "How does the first kernel pass info to the capture kernel by
> *not* using the kernel command line"?
> 
> The kernel command line is not the channel to pass data to the kdump
> kernel.

That's different from my point of view.  I'm not going to pass
some data from the first kernel to the second kernel. I'm just going to
provide a configurable option for the second kernel to users.

We souldn't enable this feature silently.  Some users wouldn't like
to enable this feature.  For example, a user enables a watchdog timer
which raises an external NMI when the counter is not reset for a
specific duration.  Then, the second kernel hangs up while saving
crash dump, and NMI is delivered to the CPU.  The kernel gets panic
due to the NMI, prints some information to the display and serial
console, and then automatically reboot.  In this case, users don't
want to block external NMIs.

So, making this feature configurable by command line option is
reasonable.

> > Yes, your first kernel doesn't get external NMIs, but basically
> > you don't have to set "noextnmi" option to the first kernel.
> 
> So it doesn't belong there as a kernel command line parameter in the
> first place.
> 
> IOW, you need a different method to pass data to the second kernel. Be
> it an ELF header, be it a shared page, whatever.

I think we should use the ELF header only if the passed information
is saved to a crash dump.

Also, we wouldn't want to introduce new shared page for that purpose.
A memory segment provided by kexec syscall is not usable because
the second kernel doesn't know what there is in a segment without a
command line option.  Please note that "elfcorehdr" command line option
prepared by kexec command is used to inform the second kernel about
the address of the ELF header memory segment.

Regards,


Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-01 Thread 河合英宏 / KAWAIHIDEHIRO
> On Thu, Oct 01, 2015 at 10:24:19AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > But how do we check if the starting kernel is a dump capture kernel?
> 
> How does that first kernel pass info to the capture kernel?

As I described in the previous mail, You just have to add "noextnmi"
to KDUMP_COMMANDLINE_APPEND in /etc/sysconfig/kdump.  Then, "noextnmi"
option is passed to the capture kernel by the action of kexec command.

Cmdline option gives users flexibility.  I'm not sure all users
want to disable external NMIs in the 2nd kernel.

> > I think using cmdline option is the simplest way.
> 
> More often than not, simplest != correct.
> 
> What happens if I pass this option to the first kernel? All of a sudden
> my *first* kernel doesn't get external NMIs.

Yes, your first kernel doesn't get external NMIs, but basically
you don't have to set "noextnmi" option to the first kernel.


Hidehiro Kawai
Hitachi, Ltd. Research & Development Group




RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-01 Thread 河合英宏 / KAWAIHIDEHIRO
> On Thu, Oct 01, 2015 at 07:01:50AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > I suppose that a sever which uses this feature will equip a BMC
> > and BMC mandatorily supports hard reset command for the server.
> > If the HA clustering software detects no response from the server
> > after relatively long timeout, it might want to insert hard reset
> > to the server by IPMI over LAN.
> 
> So why doesn't the capture kernel *automatically* ignore external NMIs,
> without a cmdline option?
> 
> Before it starts capturing, it says: "I'm starting capturing and am
> ignoring external NMIs from now on."

But how do we check if the starting kernel is a dump capture kernel?
I think using cmdline option is the simplest way.  You just have to
add "noextnmi" to KDUMP_COMMANDLINE_APPEND in /etc/sysconfig/kdump.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-01 Thread 河合英宏 / KAWAIHIDEHIRO
> On Thu, Oct 01, 2015 at 02:33:18AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > On Fri, Sep 25, 2015 at 08:28:11PM +0900, Hidehiro Kawai wrote:
> > > > This patch introduces new boot option "noextnmi" which disables
> > > > external NMI.  This option is useful for the dump capture kernel
> > > > so that an HA application or administrator wouldn't mistakenly
> > > > shoot down the kernel by NMI.
> > >
> > > So that they can get really stuck when the crash kernel crashes, right?
> > > ;-)
> >
> > No, it is different from my intention.
> >
> > `mistakenly' in the above means; they issue NMI due to a misconception
> > that the monitored host is stuck in the 1st kernel while it is actually
> > taking a crash dump in the 2nd kernel.  To avoid this kind of accident,
> > there is a tool such as fence_kdump which notifies "I'm taking a crash
> > dump, so don't send NMI" to the HA clustering software.  However, there
> > is a time window between kernel panic and the notification.
> >
> > "noextnmi" allows users to avoid this kind of accident all the time of
> > 2nd kernel.
> 
> Yes yes, I understand. But if the crash kernel also gets stuck they have
> no means of recovery, right? (other than power cycling the hardware)

Yes, but I think it's not a big problem.

I suppose that a sever which uses this feature will equip a BMC
and BMC mandatorily supports hard reset command for the server.
If the HA clustering software detects no response from the server
after relatively long timeout, it might want to insert hard reset
to the server by IPMI over LAN.

> Just playing devils advocate here, I don't actually object to the patch.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group





RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-01 Thread 河合英宏 / KAWAIHIDEHIRO
> On Thu, Oct 01, 2015 at 07:01:50AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > I suppose that a sever which uses this feature will equip a BMC
> > and BMC mandatorily supports hard reset command for the server.
> > If the HA clustering software detects no response from the server
> > after relatively long timeout, it might want to insert hard reset
> > to the server by IPMI over LAN.
> 
> So why doesn't the capture kernel *automatically* ignore external NMIs,
> without a cmdline option?
> 
> Before it starts capturing, it says: "I'm starting capturing and am
> ignoring external NMIs from now on."

But how do we check if the starting kernel is a dump capture kernel?
I think using cmdline option is the simplest way.  You just have to
add "noextnmi" to KDUMP_COMMANDLINE_APPEND in /etc/sysconfig/kdump.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-01 Thread 河合英宏 / KAWAIHIDEHIRO
> On Thu, Oct 01, 2015 at 02:33:18AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > On Fri, Sep 25, 2015 at 08:28:11PM +0900, Hidehiro Kawai wrote:
> > > > This patch introduces new boot option "noextnmi" which disables
> > > > external NMI.  This option is useful for the dump capture kernel
> > > > so that an HA application or administrator wouldn't mistakenly
> > > > shoot down the kernel by NMI.
> > >
> > > So that they can get really stuck when the crash kernel crashes, right?
> > > ;-)
> >
> > No, it is different from my intention.
> >
> > `mistakenly' in the above means; they issue NMI due to a misconception
> > that the monitored host is stuck in the 1st kernel while it is actually
> > taking a crash dump in the 2nd kernel.  To avoid this kind of accident,
> > there is a tool such as fence_kdump which notifies "I'm taking a crash
> > dump, so don't send NMI" to the HA clustering software.  However, there
> > is a time window between kernel panic and the notification.
> >
> > "noextnmi" allows users to avoid this kind of accident all the time of
> > 2nd kernel.
> 
> Yes yes, I understand. But if the crash kernel also gets stuck they have
> no means of recovery, right? (other than power cycling the hardware)

Yes, but I think it's not a big problem.

I suppose that a sever which uses this feature will equip a BMC
and BMC mandatorily supports hard reset command for the server.
If the HA clustering software detects no response from the server
after relatively long timeout, it might want to insert hard reset
to the server by IPMI over LAN.

> Just playing devils advocate here, I don't actually object to the patch.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group





RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-01 Thread 河合英宏 / KAWAIHIDEHIRO
> On Thu, Oct 01, 2015 at 10:24:19AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > But how do we check if the starting kernel is a dump capture kernel?
> 
> How does that first kernel pass info to the capture kernel?

As I described in the previous mail, You just have to add "noextnmi"
to KDUMP_COMMANDLINE_APPEND in /etc/sysconfig/kdump.  Then, "noextnmi"
option is passed to the capture kernel by the action of kexec command.

Cmdline option gives users flexibility.  I'm not sure all users
want to disable external NMIs in the 2nd kernel.

> > I think using cmdline option is the simplest way.
> 
> More often than not, simplest != correct.
> 
> What happens if I pass this option to the first kernel? All of a sudden
> my *first* kernel doesn't get external NMIs.

Yes, your first kernel doesn't get external NMIs, but basically
you don't have to set "noextnmi" option to the first kernel.


Hidehiro Kawai
Hitachi, Ltd. Research & Development Group




RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-09-30 Thread 河合英宏 / KAWAIHIDEHIRO
> On Fri, Sep 25, 2015 at 08:28:11PM +0900, Hidehiro Kawai wrote:
> > This patch introduces new boot option "noextnmi" which disables
> > external NMI.  This option is useful for the dump capture kernel
> > so that an HA application or administrator wouldn't mistakenly
> > shoot down the kernel by NMI.
> 
> So that they can get really stuck when the crash kernel crashes, right?
> ;-)

No, it is different from my intention.

`mistakenly' in the above means; they issue NMI due to a misconception
that the monitored host is stuck in the 1st kernel while it is actually
taking a crash dump in the 2nd kernel.  To avoid this kind of accident,
there is a tool such as fence_kdump which notifies "I'm taking a crash
dump, so don't send NMI" to the HA clustering software.  However, there
is a time window between kernel panic and the notification.

"noextnmi" allows users to avoid this kind of accident all the time of
2nd kernel.

Regards,


Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: Re: [V4 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-09-30 Thread 河合英宏 / KAWAIHIDEHIRO
> On Mon, Sep 28, 2015 at 07:08:19AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > >> kernel/kexec_core.c:899:3: note: in expansion of macro 'atomic_xchg'
> > >   atomic_xchg(_cpu, -1);
> > >   ^
> >
> > I changed to use atomic_xchg() instead of atomic_set() in V3
> > because atomic_set() doesn't mean memory barrier.  However,
> > I thought again and there is no need of barrier; there is no
> > problem if a competitor sees old value of panic_cpu or new one.
> > So, atomic_set() is sufficient and using it will remove this warning.
> >
> > I will resend the fixed version later.
> 
> So if you rely on the memory barrier; you should have also put a comment
> on explaining the ordering requirements.

I don't intend to use an explicit memory barrier.  There is no
memory ordering requirement here.  Also, atomic_set() which will be
used instead of atomic_xchg() is used as a RELEASE operation, so
I believe there is no problem.

Documentation/memory-barriers.txt:
> The following operations are potential problems as they do _not_ imply memory
> barriers, but might be used for implementing such things as RELEASE-class
> operations:
> 
> atomic_set();
> ...

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [V4 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-09-30 Thread 河合英宏 / KAWAIHIDEHIRO
> On Fri, Sep 25, 2015 at 08:28:07PM +0900, Hidehiro Kawai wrote:
> > --- a/arch/x86/kernel/reboot.c
> > +++ b/arch/x86/kernel/reboot.c
> > @@ -718,6 +718,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
> >  static nmi_shootdown_cb shootdown_callback;
> >
> >  static atomic_t waiting_for_crash_ipi;
> > +static int crash_ipi_done;
> >
> >  static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
> >  {
> > @@ -779,6 +780,7 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
> > wmb();
> >
> > smp_send_nmi_allbutself();
> > +   crash_ipi_done = 1; /* Kick cpus looping in nmi context */
> 
> I would suggest using WRITE_ONCE() for that, because without the
> volatile the compiler need not actually emit the store until after the
> whole waiting thing _IF_ it can inline the whole thing.
> 
> Currently udelay() will end up being a function call and will therefore
> force the store to be emitted, but I'd rather not rely on that.

OK, I use WRITE_ONCE().
Thanks!

> >
> > msecs = 1000; /* Wait at most a second for the other cpus to stop */
> > while ((atomic_read(_for_crash_ipi) > 0) && msecs) {


Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [V4 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI

2015-09-30 Thread 河合英宏 / KAWAIHIDEHIRO
> On Fri, Sep 25, 2015 at 12:13:55PM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > Peter saids -tip tree doesn't have panic_on_unrecovered_nmi in the
> > previoius discussion, but it still exists.  So, I didn't change
> > anything about panic_on_unrecovered_nmi.
> >
> 
> > > --- a/arch/x86/kernel/nmi.c
> > > +++ b/arch/x86/kernel/nmi.c
> > > @@ -231,7 +231,7 @@ void unregister_nmi_handler(unsigned int type, const 
> > > char *name)
> > >  #endif
> > >
> > >   if (panic_on_unrecovered_nmi)
> > > - panic("NMI: Not continuing");
> > > + nmi_panic("NMI: Not continuing");
> > >
> > >   pr_emerg("Dazed and confused, but trying to continue\n");
> > >
> 
> I was looking at unregister_nmi_handler() because that's the function
> the diff points to. That still doesn't have panic_on_unrecovered_nmi.
> 
> It looks like your diff tool is 'broken' and generates nonsense function
> data.

I had noticed the function name is wrong, but I didn't know how
do I fix that, sorry.  Now, I updated git to the latest version
and the issue disappeared.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group




RE: [V4 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI

2015-09-30 Thread 河合英宏 / KAWAIHIDEHIRO
> On Fri, Sep 25, 2015 at 12:13:55PM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > Peter saids -tip tree doesn't have panic_on_unrecovered_nmi in the
> > previoius discussion, but it still exists.  So, I didn't change
> > anything about panic_on_unrecovered_nmi.
> >
> 
> > > --- a/arch/x86/kernel/nmi.c
> > > +++ b/arch/x86/kernel/nmi.c
> > > @@ -231,7 +231,7 @@ void unregister_nmi_handler(unsigned int type, const 
> > > char *name)
> > >  #endif
> > >
> > >   if (panic_on_unrecovered_nmi)
> > > - panic("NMI: Not continuing");
> > > + nmi_panic("NMI: Not continuing");
> > >
> > >   pr_emerg("Dazed and confused, but trying to continue\n");
> > >
> 
> I was looking at unregister_nmi_handler() because that's the function
> the diff points to. That still doesn't have panic_on_unrecovered_nmi.
> 
> It looks like your diff tool is 'broken' and generates nonsense function
> data.

I had noticed the function name is wrong, but I didn't know how
do I fix that, sorry.  Now, I updated git to the latest version
and the issue disappeared.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group




RE: [V4 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-09-30 Thread 河合英宏 / KAWAIHIDEHIRO
> On Fri, Sep 25, 2015 at 08:28:07PM +0900, Hidehiro Kawai wrote:
> > --- a/arch/x86/kernel/reboot.c
> > +++ b/arch/x86/kernel/reboot.c
> > @@ -718,6 +718,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
> >  static nmi_shootdown_cb shootdown_callback;
> >
> >  static atomic_t waiting_for_crash_ipi;
> > +static int crash_ipi_done;
> >
> >  static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
> >  {
> > @@ -779,6 +780,7 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
> > wmb();
> >
> > smp_send_nmi_allbutself();
> > +   crash_ipi_done = 1; /* Kick cpus looping in nmi context */
> 
> I would suggest using WRITE_ONCE() for that, because without the
> volatile the compiler need not actually emit the store until after the
> whole waiting thing _IF_ it can inline the whole thing.
> 
> Currently udelay() will end up being a function call and will therefore
> force the store to be emitted, but I'd rather not rely on that.

OK, I use WRITE_ONCE().
Thanks!

> >
> > msecs = 1000; /* Wait at most a second for the other cpus to stop */
> > while ((atomic_read(_for_crash_ipi) > 0) && msecs) {


Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: Re: [V4 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-09-30 Thread 河合英宏 / KAWAIHIDEHIRO
> On Mon, Sep 28, 2015 at 07:08:19AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > >> kernel/kexec_core.c:899:3: note: in expansion of macro 'atomic_xchg'
> > >   atomic_xchg(_cpu, -1);
> > >   ^
> >
> > I changed to use atomic_xchg() instead of atomic_set() in V3
> > because atomic_set() doesn't mean memory barrier.  However,
> > I thought again and there is no need of barrier; there is no
> > problem if a competitor sees old value of panic_cpu or new one.
> > So, atomic_set() is sufficient and using it will remove this warning.
> >
> > I will resend the fixed version later.
> 
> So if you rely on the memory barrier; you should have also put a comment
> on explaining the ordering requirements.

I don't intend to use an explicit memory barrier.  There is no
memory ordering requirement here.  Also, atomic_set() which will be
used instead of atomic_xchg() is used as a RELEASE operation, so
I believe there is no problem.

Documentation/memory-barriers.txt:
> The following operations are potential problems as they do _not_ imply memory
> barriers, but might be used for implementing such things as RELEASE-class
> operations:
> 
> atomic_set();
> ...

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-09-30 Thread 河合英宏 / KAWAIHIDEHIRO
> On Fri, Sep 25, 2015 at 08:28:11PM +0900, Hidehiro Kawai wrote:
> > This patch introduces new boot option "noextnmi" which disables
> > external NMI.  This option is useful for the dump capture kernel
> > so that an HA application or administrator wouldn't mistakenly
> > shoot down the kernel by NMI.
> 
> So that they can get really stuck when the crash kernel crashes, right?
> ;-)

No, it is different from my intention.

`mistakenly' in the above means; they issue NMI due to a misconception
that the monitored host is stuck in the 1st kernel while it is actually
taking a crash dump in the 2nd kernel.  To avoid this kind of accident,
there is a tool such as fence_kdump which notifies "I'm taking a crash
dump, so don't send NMI" to the HA clustering software.  However, there
is a time window between kernel panic and the notification.

"noextnmi" allows users to avoid this kind of accident all the time of
2nd kernel.

Regards,


Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: Re: [V4 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-09-28 Thread 河合英宏 / KAWAIHIDEHIRO
> Hi Hidehiro,
> 
> [auto build test results on v4.3-rc2 -- if it's inappropriate base, please 
> ignore]
> 
> config: ia64-allyesconfig (attached as .config)
> reproduce:
>   wget 
> https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
>  -O ~/bin/make.cross
>   chmod +x ~/bin/make.cross
>   git checkout 0077681103150af584e5e592c0238fd010654c26
>   # save the attached .config to linux build tree
>   make.cross ARCH=ia64
[snip]
>arch/ia64/include/uapi/asm/cmpxchg.h:56:2: warning: value computed is not 
> used [-Wunused-value]
> ((__typeof__(*(ptr))) __xchg((unsigned long) (x), (ptr), sizeof(*(ptr
>  ^
>arch/ia64/include/asm/atomic.h:135:30: note: in expansion of macro 'xchg'
> #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
>  ^
> >> kernel/kexec_core.c:899:3: note: in expansion of macro 'atomic_xchg'
>   atomic_xchg(_cpu, -1);
>   ^

I changed to use atomic_xchg() instead of atomic_set() in V3
because atomic_set() doesn't mean memory barrier.  However,
I thought again and there is no need of barrier; there is no
problem if a competitor sees old value of panic_cpu or new one.
So, atomic_set() is sufficient and using it will remove this warning.

I will resend the fixed version later.

> vim +/atomic_xchg +899 kernel/kexec_core.c
> 
>883
>884/*
>885 * Only one CPU is allowed to execute the crash_kexec() 
> code as with
>886 * panic().  Otherwise parallel calls of panic() and 
> crash_kexec()
>887 * may stop each other.  To exclude them, we use 
> panic_cpu here too.
>888 */
>889this_cpu = raw_smp_processor_id();
>890old_cpu = atomic_cmpxchg(_cpu, -1, this_cpu);
>891if (old_cpu == -1) {
>892/* This is the 1st CPU which comes here, so go 
> ahead. */
>893__crash_kexec(regs);
>894
>895/*
>896 * Reset panic_cpu to allow another 
> panic()/crash_kexec()
>897 * call.
>898 */
>  > 899atomic_xchg(_cpu, -1);
>900}
>901}
>902
>903size_t crash_get_memory_size(void)
>904{
>905size_t size = 0;
>906
>907mutex_lock(_mutex);
> 
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation


Hidehiro Kawai
Hitachi, Ltd. Research & Development Group





RE: Re: [V4 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-09-28 Thread 河合英宏 / KAWAIHIDEHIRO
> Hi Hidehiro,
> 
> [auto build test results on v4.3-rc2 -- if it's inappropriate base, please 
> ignore]
> 
> config: ia64-allyesconfig (attached as .config)
> reproduce:
>   wget 
> https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
>  -O ~/bin/make.cross
>   chmod +x ~/bin/make.cross
>   git checkout 0077681103150af584e5e592c0238fd010654c26
>   # save the attached .config to linux build tree
>   make.cross ARCH=ia64
[snip]
>arch/ia64/include/uapi/asm/cmpxchg.h:56:2: warning: value computed is not 
> used [-Wunused-value]
> ((__typeof__(*(ptr))) __xchg((unsigned long) (x), (ptr), sizeof(*(ptr
>  ^
>arch/ia64/include/asm/atomic.h:135:30: note: in expansion of macro 'xchg'
> #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
>  ^
> >> kernel/kexec_core.c:899:3: note: in expansion of macro 'atomic_xchg'
>   atomic_xchg(_cpu, -1);
>   ^

I changed to use atomic_xchg() instead of atomic_set() in V3
because atomic_set() doesn't mean memory barrier.  However,
I thought again and there is no need of barrier; there is no
problem if a competitor sees old value of panic_cpu or new one.
So, atomic_set() is sufficient and using it will remove this warning.

I will resend the fixed version later.

> vim +/atomic_xchg +899 kernel/kexec_core.c
> 
>883
>884/*
>885 * Only one CPU is allowed to execute the crash_kexec() 
> code as with
>886 * panic().  Otherwise parallel calls of panic() and 
> crash_kexec()
>887 * may stop each other.  To exclude them, we use 
> panic_cpu here too.
>888 */
>889this_cpu = raw_smp_processor_id();
>890old_cpu = atomic_cmpxchg(_cpu, -1, this_cpu);
>891if (old_cpu == -1) {
>892/* This is the 1st CPU which comes here, so go 
> ahead. */
>893__crash_kexec(regs);
>894
>895/*
>896 * Reset panic_cpu to allow another 
> panic()/crash_kexec()
>897 * call.
>898 */
>  > 899atomic_xchg(_cpu, -1);
>900}
>901}
>902
>903size_t crash_get_memory_size(void)
>904{
>905size_t size = 0;
>906
>907mutex_lock(_mutex);
> 
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation


Hidehiro Kawai
Hitachi, Ltd. Research & Development Group





RE: Re: [V4 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-09-27 Thread 河合英宏 / KAWAIHIDEHIRO
> Hi Hidehiro,
> 
> [auto build test results on v4.3-rc2 -- if it's inappropriate base, please 
> ignore]
> 
> config: x86_64-allnoconfig (attached as .config)
> reproduce:
>   git checkout 0077681103150af584e5e592c0238fd010654c26
>   # save the attached .config to linux build tree
>   make ARCH=x86_64
> 
> All error/warnings (new ones prefixed by >>):
> 
>kernel/panic.c: In function 'panic':
> >> kernel/panic.c:140:3: error: implicit declaration of function 
> >> '__crash_kexec' [-Werror=implicit-function-declaration]
>   __crash_kexec(NULL);
>   ^

Sorry, I missed to take into account the case of !CONFIG_KEXEC_CORE.

 #else /* !CONFIG_KEXEC_CORE */
 struct pt_regs;
 struct task_struct;
+static inline void __crash_kexec(struct pt_regs *regs) { }
 static inline void crash_kexec(struct pt_regs *regs) { }
 static inline int kexec_should_crash(struct task_struct *p) { return 0; }

I'll resend the revised version later.

>cc1: some warnings being treated as errors
> 
> vim +/__crash_kexec +140 kernel/panic.c
> 
>134 * If we have crashed and we have a crash kernel loaded 
> let it handle
>135 * everything else.
>136 * If we want to run this after calling 
> panic_notifiers, pass
>137 * the "crash_kexec_post_notifiers" option to the 
> kernel.
>138 */
>139if (!crash_kexec_post_notifiers)
>  > 140__crash_kexec(NULL);
>141
>142/*
>143 * Note smp_send_stop is the usual smp shutdown 
> function, which
> 
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation


Hidehiro Kawai
Hitachi, Ltd. Research & Development Group




RE: Re: [V4 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-09-27 Thread 河合英宏 / KAWAIHIDEHIRO
> Hi Hidehiro,
> 
> [auto build test results on v4.3-rc2 -- if it's inappropriate base, please 
> ignore]
> 
> config: x86_64-allnoconfig (attached as .config)
> reproduce:
>   git checkout 0077681103150af584e5e592c0238fd010654c26
>   # save the attached .config to linux build tree
>   make ARCH=x86_64
> 
> All error/warnings (new ones prefixed by >>):
> 
>kernel/panic.c: In function 'panic':
> >> kernel/panic.c:140:3: error: implicit declaration of function 
> >> '__crash_kexec' [-Werror=implicit-function-declaration]
>   __crash_kexec(NULL);
>   ^

Sorry, I missed to take into account the case of !CONFIG_KEXEC_CORE.

 #else /* !CONFIG_KEXEC_CORE */
 struct pt_regs;
 struct task_struct;
+static inline void __crash_kexec(struct pt_regs *regs) { }
 static inline void crash_kexec(struct pt_regs *regs) { }
 static inline int kexec_should_crash(struct task_struct *p) { return 0; }

I'll resend the revised version later.

>cc1: some warnings being treated as errors
> 
> vim +/__crash_kexec +140 kernel/panic.c
> 
>134 * If we have crashed and we have a crash kernel loaded 
> let it handle
>135 * everything else.
>136 * If we want to run this after calling 
> panic_notifiers, pass
>137 * the "crash_kexec_post_notifiers" option to the 
> kernel.
>138 */
>139if (!crash_kexec_post_notifiers)
>  > 140__crash_kexec(NULL);
>141
>142/*
>143 * Note smp_send_stop is the usual smp shutdown 
> function, which
> 
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation


Hidehiro Kawai
Hitachi, Ltd. Research & Development Group




RE: [V4 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI

2015-09-25 Thread 河合英宏 / KAWAIHIDEHIRO
Peter saids -tip tree doesn't have panic_on_unrecovered_nmi in the
previoius discussion, but it still exists.  So, I didn't change
anything about panic_on_unrecovered_nmi.

Thanks,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group

> From: Hidehiro Kawai [mailto:hidehiro.kawai...@hitachi.com]
> 
> If panic on NMI happens just after panic() on the same CPU, panic()
> is recursively called.  As the result, it stalls after failing to
> acquire panic_lock.
> 
> To avoid this problem, don't call panic() in NMI context if
> we've already entered panic().
> 
> V4:
> - Improve comments in io_check_error() and panic()
> 
> V3:
> - Introduce nmi_panic() macro to reduce code duplication
> - In the case of panic on NMI, don't return from NMI handlers
>   if another cpu already panicked
> 
> V2:
> - Use atomic_cmpxchg() instead of current spin_trylock() to
>   exclude concurrent accesses to the panic routines
> - Don't introduce no-lock version of panic()
> 
> Signed-off-by: Hidehiro Kawai 
> Cc: Andrew Morton 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: Peter Zijlstra 
> Cc: Michal Hocko 
> ---
>  arch/x86/kernel/nmi.c  |   16 
>  include/linux/kernel.h |   13 +
>  kernel/panic.c |   15 ---
>  kernel/watchdog.c  |2 +-
>  4 files changed, 38 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> index 697f90d..5131714 100644
> --- a/arch/x86/kernel/nmi.c
> +++ b/arch/x86/kernel/nmi.c
> @@ -231,7 +231,7 @@ void unregister_nmi_handler(unsigned int type, const char 
> *name)
>  #endif
> 
>   if (panic_on_unrecovered_nmi)
> - panic("NMI: Not continuing");
> + nmi_panic("NMI: Not continuing");
> 
>   pr_emerg("Dazed and confused, but trying to continue\n");
> 
> @@ -255,8 +255,16 @@ void unregister_nmi_handler(unsigned int type, const 
> char *name)
>reason, smp_processor_id());
>   show_regs(regs);
> 
> - if (panic_on_io_nmi)
> - panic("NMI IOCK error: Not continuing");
> + if (panic_on_io_nmi) {
> + nmi_panic("NMI IOCK error: Not continuing");
> +
> + /*
> +  * If we return from nmi_panic(), it means we have received
> +  * NMI while processing panic().  So, simply return without
> +  * a delay and re-enabling NMI.
> +  */
> + return;
> + }
> 
>   /* Re-enable the IOCK line, wait for a few seconds */
>   reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_IOCHK;
> @@ -297,7 +305,7 @@ void unregister_nmi_handler(unsigned int type, const char 
> *name)
> 
>   pr_emerg("Do you have a strange power saving mode enabled?\n");
>   if (unknown_nmi_panic || panic_on_unrecovered_nmi)
> - panic("NMI: Not continuing");
> + nmi_panic("NMI: Not continuing");
> 
>   pr_emerg("Dazed and confused, but trying to continue\n");
>  }
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 5582410..57c33da 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -443,6 +443,19 @@ extern __scanf(2, 0)
> 
>  extern bool crash_kexec_post_notifiers;
> 
> +extern atomic_t panic_cpu;
> +
> +/*
> + * A variant of panic() called from NMI context.
> + * If we've already panicked on this cpu, return from here.
> + */
> +#define nmi_panic(fmt, ...)  \
> + do {\
> + int this_cpu = raw_smp_processor_id();  \
> + if (atomic_cmpxchg(_cpu, -1, this_cpu) != this_cpu) \
> + panic(fmt, ##__VA_ARGS__);  \
> + } while (0)
> +
>  /*
>   * Only to be used by arch init code. If the user over-wrote the default
>   * CONFIG_PANIC_TIMEOUT, honor it.
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 04e91ff..a105e67 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -60,6 +60,8 @@ void __weak panic_smp_self_stop(void)
>   cpu_relax();
>  }
> 
> +atomic_t panic_cpu = ATOMIC_INIT(-1);
> +
>  /**
>   *   panic - halt the system
>   *   @fmt: The text string to print
> @@ -70,17 +72,17 @@ void __weak panic_smp_self_stop(void)
>   */
>  void panic(const char *fmt, ...)
>  {
> - static DEFINE_SPINLOCK(panic_lock);
>   static char buf[1024];
>   va_list args;
>   long i, i_next = 0;
>   int state = 0;
> + int old_cpu, this_cpu;
> 
>   /*
>* Disable local interrupts. This will prevent panic_smp_self_stop
>* from deadlocking the first cpu that invokes the panic, since
>* there is nothing to prevent an interrupt handler (that runs
> -  * after the panic_lock is acquired) from invoking panic again.
> +  * after setting panic_cpu) from invoking panic again.
>*/
>   local_irq_disable();
> 
> @@ -93,8 +95,15 @@ void panic(const char 

RE: [V4 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI

2015-09-25 Thread 河合英宏 / KAWAIHIDEHIRO
Peter saids -tip tree doesn't have panic_on_unrecovered_nmi in the
previoius discussion, but it still exists.  So, I didn't change
anything about panic_on_unrecovered_nmi.

Thanks,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group

> From: Hidehiro Kawai [mailto:hidehiro.kawai...@hitachi.com]
> 
> If panic on NMI happens just after panic() on the same CPU, panic()
> is recursively called.  As the result, it stalls after failing to
> acquire panic_lock.
> 
> To avoid this problem, don't call panic() in NMI context if
> we've already entered panic().
> 
> V4:
> - Improve comments in io_check_error() and panic()
> 
> V3:
> - Introduce nmi_panic() macro to reduce code duplication
> - In the case of panic on NMI, don't return from NMI handlers
>   if another cpu already panicked
> 
> V2:
> - Use atomic_cmpxchg() instead of current spin_trylock() to
>   exclude concurrent accesses to the panic routines
> - Don't introduce no-lock version of panic()
> 
> Signed-off-by: Hidehiro Kawai 
> Cc: Andrew Morton 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: Peter Zijlstra 
> Cc: Michal Hocko 
> ---
>  arch/x86/kernel/nmi.c  |   16 
>  include/linux/kernel.h |   13 +
>  kernel/panic.c |   15 ---
>  kernel/watchdog.c  |2 +-
>  4 files changed, 38 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> index 697f90d..5131714 100644
> --- a/arch/x86/kernel/nmi.c
> +++ b/arch/x86/kernel/nmi.c
> @@ -231,7 +231,7 @@ void unregister_nmi_handler(unsigned int type, const char 
> *name)
>  #endif
> 
>   if (panic_on_unrecovered_nmi)
> - panic("NMI: Not continuing");
> + nmi_panic("NMI: Not continuing");
> 
>   pr_emerg("Dazed and confused, but trying to continue\n");
> 
> @@ -255,8 +255,16 @@ void unregister_nmi_handler(unsigned int type, const 
> char *name)
>reason, smp_processor_id());
>   show_regs(regs);
> 
> - if (panic_on_io_nmi)
> - panic("NMI IOCK error: Not continuing");
> + if (panic_on_io_nmi) {
> + nmi_panic("NMI IOCK error: Not continuing");
> +
> + /*
> +  * If we return from nmi_panic(), it means we have received
> +  * NMI while processing panic().  So, simply return without
> +  * a delay and re-enabling NMI.
> +  */
> + return;
> + }
> 
>   /* Re-enable the IOCK line, wait for a few seconds */
>   reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_IOCHK;
> @@ -297,7 +305,7 @@ void unregister_nmi_handler(unsigned int type, const char 
> *name)
> 
>   pr_emerg("Do you have a strange power saving mode enabled?\n");
>   if (unknown_nmi_panic || panic_on_unrecovered_nmi)
> - panic("NMI: Not continuing");
> + nmi_panic("NMI: Not continuing");
> 
>   pr_emerg("Dazed and confused, but trying to continue\n");
>  }
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 5582410..57c33da 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -443,6 +443,19 @@ extern __scanf(2, 0)
> 
>  extern bool crash_kexec_post_notifiers;
> 
> +extern atomic_t panic_cpu;
> +
> +/*
> + * A variant of panic() called from NMI context.
> + * If we've already panicked on this cpu, return from here.
> + */
> +#define nmi_panic(fmt, ...)  \
> + do {\
> + int this_cpu = raw_smp_processor_id();  \
> + if (atomic_cmpxchg(_cpu, -1, this_cpu) != this_cpu) \
> + panic(fmt, ##__VA_ARGS__);  \
> + } while (0)
> +
>  /*
>   * Only to be used by arch init code. If the user over-wrote the default
>   * CONFIG_PANIC_TIMEOUT, honor it.
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 04e91ff..a105e67 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -60,6 +60,8 @@ void __weak panic_smp_self_stop(void)
>   cpu_relax();
>  }
> 
> +atomic_t panic_cpu = ATOMIC_INIT(-1);
> +
>  /**
>   *   panic - halt the system
>   *   @fmt: The text string to print
> @@ -70,17 +72,17 @@ void __weak panic_smp_self_stop(void)
>   */
>  void panic(const char *fmt, ...)
>  {
> - static DEFINE_SPINLOCK(panic_lock);
>   static char buf[1024];
>   va_list args;
>   long i, i_next = 0;
>   int state = 0;
> + int old_cpu, this_cpu;
> 
>   /*
>* Disable local interrupts. This will prevent panic_smp_self_stop
>* from deadlocking the first cpu that invokes the panic, since
>* there is nothing to prevent an interrupt handler (that runs
> -  * after the panic_lock is acquired) from invoking panic 

RE: [V3 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-08-31 Thread 河合英宏 / KAWAIHIDEHIRO
> From: Peter Zijlstra [mailto:pet...@infradead.org]
> On Mon, Aug 31, 2015 at 08:53:11AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > I understand your question.  I don't intend to permit the recursive
> > > call of crash_kexec() as for 'old_cpu != this_cpu' check.  That is
> > > needed for the case of panic() --> crash_kexec().  Since panic_cpu has
> > > already been set to this_cpu in panic() (please see PATCH 1/4), no one
> > > can run crash_kexec() without 'old_cpu != this_cpu' check.
> > >
> > > If you don't like this check, I would also be able to handle this case
> > > like below:
> > >
> > > crash_kexec()
> > > {
> > >   old_cpu = atomic_cmpxchg(_cpu, -1, this_cpu);
> > >   if (old_cpu != -1)
> > >   return;
> > >
> > >   __crash_kexec();
> > > }
> > >
> > > panic()
> > > {
> > >   atomic_cmpxchg(_cpu, -1, this_cpu);
> > >   __crash_kexec();
> > > ...
> > >
> >
> > Is that OK?
> 
> I suppose so, but I think me getting confused means comments can be
> added/improved.

OK, I'll improve comments and description in the next version.

Thanks!

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [V3 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-08-31 Thread 河合英宏 / KAWAIHIDEHIRO
Hello Peter,

> From: linux-kernel-ow...@vger.kernel.org 
> [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of 河合英宏 / KAWAI,
> 
> Hi,
> 
> > From: Peter Zijlstra [mailto:pet...@infradead.org]
> >
> > On Sat, Aug 22, 2015 at 02:35:24AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > From: Peter Zijlstra [mailto:pet...@infradead.org]
> > > >
> > > > On Thu, Aug 06, 2015 at 02:45:43PM +0900, Hidehiro Kawai wrote:
> > > > >  void crash_kexec(struct pt_regs *regs)
> > > > >  {
> > > > > + int old_cpu, this_cpu;
> > > > > +
> > > > > + /*
> > > > > +  * `old_cpu == -1' means we are the first comer and 
> > > > > crash_kexec()
> > > > > +  * was called without entering panic().
> > > > > +  * `old_cpu == this_cpu' means crash_kexec() was called from 
> > > > > panic().
> > > > > +  */
> > > > > + this_cpu = raw_smp_processor_id();
> > > > > + old_cpu = atomic_cmpxchg(_cpu, -1, this_cpu);
> > > > > + if (old_cpu != -1 && old_cpu != this_cpu)
> > > > > + return;
> > > >
> > > > This allows recursive calling of crash_kexec(), the Changelog did not
> > > > mention that. Is this really required?
> > >
> > > What part are you arguing?  Recursive call of crash_kexec() doesn't
> > > happen.  In the first place, one of the purpose of this patch is
> > > to prevent a recursive call of crash_kexec() in the following case
> > > as I stated in the description:
> > >
> > > CPU 0:
> > >   oops_end()
> > > crash_kexec()
> > >   mutex_trylock() // acquired
> > > 
> > > io_check_error()
> > >   panic()
> > > crash_kexec()
> > >   mutex_trylock() // failed to acquire
> > > infinite loop
> > >
> >
> > Yes, but what to we want to do there? It seems to me that is wrong, we
> > do not want to let a recursive crash_kexec() proceed.
> >
> > Whereas the condition you created explicitly allows this recursion by
> > virtue of the 'old_cpu != this_cpu' check.
> 
> I understand your question.  I don't intend to permit the recursive
> call of crash_kexec() as for 'old_cpu != this_cpu' check.  That is
> needed for the case of panic() --> crash_kexec().  Since panic_cpu has
> already been set to this_cpu in panic() (please see PATCH 1/4), no one
> can run crash_kexec() without 'old_cpu != this_cpu' check.
> 
> If you don't like this check, I would also be able to handle this case
> like below:
> 
> crash_kexec()
> {
>   old_cpu = atomic_cmpxchg(_cpu, -1, this_cpu);
>   if (old_cpu != -1)
>   return;
> 
>   __crash_kexec();
> }
> 
> panic()
> {
>   atomic_cmpxchg(_cpu, -1, this_cpu);
>   __crash_kexec();
> ...
> 

Is that OK?

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group




RE: [V3 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-08-31 Thread 河合英宏 / KAWAIHIDEHIRO
> From: Peter Zijlstra [mailto:pet...@infradead.org]
> On Mon, Aug 31, 2015 at 08:53:11AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > I understand your question.  I don't intend to permit the recursive
> > > call of crash_kexec() as for 'old_cpu != this_cpu' check.  That is
> > > needed for the case of panic() --> crash_kexec().  Since panic_cpu has
> > > already been set to this_cpu in panic() (please see PATCH 1/4), no one
> > > can run crash_kexec() without 'old_cpu != this_cpu' check.
> > >
> > > If you don't like this check, I would also be able to handle this case
> > > like below:
> > >
> > > crash_kexec()
> > > {
> > >   old_cpu = atomic_cmpxchg(_cpu, -1, this_cpu);
> > >   if (old_cpu != -1)
> > >   return;
> > >
> > >   __crash_kexec();
> > > }
> > >
> > > panic()
> > > {
> > >   atomic_cmpxchg(_cpu, -1, this_cpu);
> > >   __crash_kexec();
> > > ...
> > >
> >
> > Is that OK?
> 
> I suppose so, but I think me getting confused means comments can be
> added/improved.

OK, I'll improve comments and description in the next version.

Thanks!

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [V3 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-08-31 Thread 河合英宏 / KAWAIHIDEHIRO
Hello Peter,

> From: linux-kernel-ow...@vger.kernel.org 
> [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of 河合英宏 / KAWAI,
> 
> Hi,
> 
> > From: Peter Zijlstra [mailto:pet...@infradead.org]
> >
> > On Sat, Aug 22, 2015 at 02:35:24AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > From: Peter Zijlstra [mailto:pet...@infradead.org]
> > > >
> > > > On Thu, Aug 06, 2015 at 02:45:43PM +0900, Hidehiro Kawai wrote:
> > > > >  void crash_kexec(struct pt_regs *regs)
> > > > >  {
> > > > > + int old_cpu, this_cpu;
> > > > > +
> > > > > + /*
> > > > > +  * `old_cpu == -1' means we are the first comer and 
> > > > > crash_kexec()
> > > > > +  * was called without entering panic().
> > > > > +  * `old_cpu == this_cpu' means crash_kexec() was called from 
> > > > > panic().
> > > > > +  */
> > > > > + this_cpu = raw_smp_processor_id();
> > > > > + old_cpu = atomic_cmpxchg(_cpu, -1, this_cpu);
> > > > > + if (old_cpu != -1 && old_cpu != this_cpu)
> > > > > + return;
> > > >
> > > > This allows recursive calling of crash_kexec(), the Changelog did not
> > > > mention that. Is this really required?
> > >
> > > What part are you arguing?  Recursive call of crash_kexec() doesn't
> > > happen.  In the first place, one of the purpose of this patch is
> > > to prevent a recursive call of crash_kexec() in the following case
> > > as I stated in the description:
> > >
> > > CPU 0:
> > >   oops_end()
> > > crash_kexec()
> > >   mutex_trylock() // acquired
> > > 
> > > io_check_error()
> > >   panic()
> > > crash_kexec()
> > >   mutex_trylock() // failed to acquire
> > > infinite loop
> > >
> >
> > Yes, but what to we want to do there? It seems to me that is wrong, we
> > do not want to let a recursive crash_kexec() proceed.
> >
> > Whereas the condition you created explicitly allows this recursion by
> > virtue of the 'old_cpu != this_cpu' check.
> 
> I understand your question.  I don't intend to permit the recursive
> call of crash_kexec() as for 'old_cpu != this_cpu' check.  That is
> needed for the case of panic() --> crash_kexec().  Since panic_cpu has
> already been set to this_cpu in panic() (please see PATCH 1/4), no one
> can run crash_kexec() without 'old_cpu != this_cpu' check.
> 
> If you don't like this check, I would also be able to handle this case
> like below:
> 
> crash_kexec()
> {
>   old_cpu = atomic_cmpxchg(_cpu, -1, this_cpu);
>   if (old_cpu != -1)
>   return;
> 
>   __crash_kexec();
> }
> 
> panic()
> {
>   atomic_cmpxchg(_cpu, -1, this_cpu);
>   __crash_kexec();
> ...
> 

Is that OK?

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group




  1   2   3   >