RE: Re: [V4 PATCH 2/2] mips/panic: Replace smp_send_stop() with kdump friendly version in panic path
Dave Young suggested that I explain the problem in more detail, so here
is the revised commit description. The patch is now in -mm, so I copied
the Cc list from the -mm version. I also added Corey Minyard's Tested-by
and Reviewed-by.

From: Hidehiro Kawai
Subject: mips/panic: replace smp_send_stop() with kdump friendly version in panic path

This patch fixes the problems reported by Daniel Walker
(https://lkml.org/lkml/2015/6/24/44).

When the kernel panics with the crash_kexec_post_notifiers kernel
parameter enabled, other CPUs are stopped by smp_send_stop() instead of
machine_crash_shutdown() in the __crash_kexec() path.

  panic()
    if crash_kexec_post_notifiers == 1
      smp_send_stop()
      atomic_notifier_call_chain()
      kmsg_dump()
    __crash_kexec()
      machine_crash_shutdown()
        octeon_generic_shutdown() // shutdown watchdog for ONLINE CPUs

Unlike smp_send_stop(), machine_crash_shutdown() stops the other CPUs
and also does extra work for kdump. So, if smp_send_stop() stops the
other CPUs in advance, this extra work is not done. As a result, the
kdump routines fail to save the other CPUs' registers. Additionally, on
MIPS OCTEON, the watchdog timer is not stopped.

To fix this problem, call a new kdump friendly function,
crash_smp_send_stop(), instead of smp_send_stop() when
crash_kexec_post_notifiers is enabled. crash_smp_send_stop() is a weak
function, and it just calls smp_send_stop(). Architecture code should
override it so that kdump can work appropriately. This patch provides
the MIPS version.

Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option)
Link: http://lkml.kernel.org/r/20160810080950.11028.28000.st...@sysi4-13.yrl.intra.hitachi.co.jp
Signed-off-by: Hidehiro Kawai
Reported-by: Daniel Walker
Tested-by: Corey Minyard
Reviewed-by: Corey Minyard
Cc: Dave Young
Cc: Baoquan He
Cc: Vivek Goyal
Cc: Eric Biederman
Cc: Masami Hiramatsu
Cc: Daniel Walker
Cc: Xunlei Pang
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Borislav Petkov
Cc: David Vrabel
Cc: Toshi Kani
Cc: Ralf Baechle
Cc: David Daney
Cc: Aaro Koskinen
Cc: "Steven J. Hill"
Signed-off-by: Andrew Morton

> From: Corey Minyard [mailto:cminy...@mvista.com]
> Sent: Friday, August 19, 2016 6:18 AM
> Sorry this took so long, but I have finally tested this, it seems to
> work fine:
>
> Tested-by: Corey Minyard
> Reviewed-by: Corey Minyard
>
> On 08/10/2016 03:09 AM, Hidehiro Kawai wrote:
> > Daniel Walker reported problems which happens when
> > crash_kexec_post_notifiers kernel option is enabled
> > (https://lkml.org/lkml/2015/6/24/44).
> >
> > In that case, smp_send_stop() is called before entering kdump routines
> > which assume other CPUs are still online. As the result, kdump
> > routines fail to save other CPUs' registers. Additionally for MIPS
> > OCTEON, it misses to stop the watchdog timer.
> >
> > To fix this problem, call a new kdump friendly function,
> > crash_smp_send_stop(), instead of the smp_send_stop() when
> > crash_kexec_post_notifiers is enabled. crash_smp_send_stop() is a
> > weak function, and it just call smp_send_stop(). Architecture
> > codes should override it so that kdump can work appropriately.
> > This patch provides MIPS version.
> >
> > Reported-by: Daniel Walker
> > Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers"
> > option)
> > Signed-off-by: Hidehiro Kawai
> > Cc: Ralf Baechle
> > Cc: David Daney
> > Cc: Aaro Koskinen
> > Cc: "Steven J. Hill"
> > Cc: Corey Minyard
> >
> > ---
> > I'm not familiar with MIPS, and I don't have a test environment and
> > just did build tests only. Please don't apply this patch until
> > someone does enough tests, otherwise simply drop this patch.
> > ---
> >  arch/mips/cavium-octeon/setup.c  | 14 ++
> >  arch/mips/include/asm/kexec.h    |  1 +
> >  arch/mips/kernel/crash.c         | 18 +-
> >  arch/mips/kernel/machine_kexec.c |  1 +
> >  4 files changed, 33 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/mips/cavium-octeon/setup.c
> > b/arch/mips/cavium-octeon/setup.c
> > index cb16fcc..5537f95 100644
> > --- a/arch/mips/cavium-octeon/setup.c
> > +++ b/arch/mips/cavium-octeon/setup.c
> > @@ -267,6 +267,17 @@ static void octeon_crash_shutdown(struct pt_regs *regs)
> >  	default_machine_crash_shutdown(regs);
> >  }
> >
> > +#ifdef CONFIG_SMP
> > +void octeon_crash_smp_send_stop(void)
> > +{
> > +	int cpu;
> > +
> > +	/* disable watchdogs */
> > +	for_each_online_cpu(cpu)
> > +		cvmx_write_csr(CVMX_CIU_WDOGX(cpu_logical_map(cpu)), 0);
> > +}
> > +#endif
> > +
> >  #endif /* CONFIG_KEXEC */
> >
> >  #ifdef CONFIG_CAVIUM_RESERVE32
> > @@ -911,6 +922,9 @@ void __init prom_init(void)
> >  	_machine_kexec_shutdown = octeon_shutdown;
> >  	_machine_crash_shutdown = octeon_crash_shutdown;
> >  	_machine_kexec_prepare = octeon_kexec_prepare;
> > +#ifdef CONFIG_SMP
> > +	_crash_smp_send_stop =
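
Illustrative note (not part of the patch): the ordering problem described
in the commit message can be modelled in user space. The sketch below is
only a simulation under stated assumptions. The names cpu_state,
model_smp_send_stop(), model_crash_stop(), panic_old() and panic_new() are
hypothetical and are not kernel APIs; "saving registers" is reduced to
setting a flag. It shows why a plain stop issued first leaves nothing for
the kdump-aware stop to do, and why calling the kdump-aware stop first
avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_MODEL_CPUS 4	/* hypothetical CPU count for the model */

/* Hypothetical per-CPU state; not a kernel structure. */
struct cpu_state {
	bool online;		/* CPU is still running */
	bool regs_saved;	/* registers were captured for the dump */
};

static struct cpu_state cpus[NR_MODEL_CPUS];

static void reset_cpus(void)
{
	for (size_t i = 0; i < NR_MODEL_CPUS; i++) {
		cpus[i].online = true;
		cpus[i].regs_saved = false;
	}
}

/* Models smp_send_stop(): stops other CPUs, no kdump bookkeeping. */
static void model_smp_send_stop(size_t self)
{
	for (size_t i = 0; i < NR_MODEL_CPUS; i++)
		if (i != self)
			cpus[i].online = false;
}

/* Models the kdump-aware stop: only CPUs still online get their
 * registers saved before they are stopped. */
static void model_crash_stop(size_t self)
{
	for (size_t i = 0; i < NR_MODEL_CPUS; i++) {
		if (i == self || !cpus[i].online)
			continue;
		cpus[i].regs_saved = true;
		cpus[i].online = false;
	}
}

static size_t count_saved(void)
{
	size_t n = 0;

	for (size_t i = 0; i < NR_MODEL_CPUS; i++)
		if (cpus[i].regs_saved)
			n++;
	return n;
}

/* Old ordering: plain stop first, kdump-aware stop afterwards. */
size_t panic_old(size_t self)
{
	assert(self < NR_MODEL_CPUS);
	reset_cpus();
	model_smp_send_stop(self);
	model_crash_stop(self);	/* finds no online CPU to save */
	return count_saved();
}

/* New ordering: the kdump-aware stop takes the place of the plain stop. */
size_t panic_new(size_t self)
{
	assert(self < NR_MODEL_CPUS);
	reset_cpus();
	model_crash_stop(self);	/* stands in for crash_smp_send_stop() */
	model_crash_stop(self);	/* later shutdown path: nothing left to do */
	return count_saved();
}
```

With four modelled CPUs and CPU 0 as the panicking CPU, panic_old(0)
returns 0 saved register sets and panic_new(0) returns 3.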
RE: Re: [V4 PATCH 2/2] mips/panic: Replace smp_send_stop() with kdump friendly version in panic path
Hi Corey,

> From: Corey Minyard [mailto:cminy...@mvista.com]
> Sent: Friday, August 12, 2016 10:56 PM
> I'll try to test this, but I have one comment inline...

Thank you very much!

> On 08/11/2016 10:17 PM, Dave Young wrote:
> > On 08/10/16 at 05:09pm, Hidehiro Kawai wrote:
[snip]
> >> diff --git a/arch/mips/kernel/crash.c b/arch/mips/kernel/crash.c
> >> index 610f0f3..1723b17 100644
> >> --- a/arch/mips/kernel/crash.c
> >> +++ b/arch/mips/kernel/crash.c
> >> @@ -47,9 +47,14 @@ static void crash_shutdown_secondary(void *passed_regs)
> >>
> >>  static void crash_kexec_prepare_cpus(void)
> >>  {
> >> +	static int cpus_stopped;
> >>  	unsigned int msecs;
> >> +	unsigned int ncpus;
> >>
> >> -	unsigned int ncpus = num_online_cpus() - 1;/* Excluding the panic cpu */
> >> +	if (cpus_stopped)
> >> +		return;
>
> Wouldn't you want an atomic operation and some special handling here to
> ensure that only one CPU does this? So if a CPU comes in here and
> another CPU is already in the process stopping the CPUs it won't result in a
> deadlock.

Because this function can be called by only one panicking CPU, there is
no problem. There are two paths through which crash_kexec_prepare_cpus()
is called.

Path 1 (panic path):
panic()
  crash_smp_send_stop()
    crash_kexec_prepare_cpus()

Path 2 (oops path):
crash_kexec()
  __crash_kexec()
    machine_crash_shutdown()
      default_machine_crash_shutdown() // for MIPS
        crash_kexec_prepare_cpus()

Here, panic() and crash_kexec() run exclusively via the panic_cpu atomic
variable, so we can use cpus_stopped as a normal variable.

Best regards,

Hidehiro Kawai
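
Illustrative note (not part of the patch): the exclusion argument above
can be sketched in user space with C11 atomics. This is a minimal model
under stated assumptions, not the kernel implementation. The helpers
claim_panic_cpu(), model_crash_kexec_prepare_cpus() and model_panic() and
the counter stop_rounds are hypothetical; a compare-and-swap on a
panic_cpu variable stands in for the gate that lets only one CPU proceed,
which is why the cpus_stopped flag behind that gate can be a plain int.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PANIC_CPU_INVALID (-1)

/* Owner of the crash path; claimed with a single compare-and-swap. */
static atomic_int panic_cpu = PANIC_CPU_INVALID;

/* Plain, non-atomic state: only the owning CPU ever gets this far. */
static int cpus_stopped;
static int stop_rounds;	/* how often the stop sequence really ran */

/* Hypothetical helper: true if this_cpu owns (or already owned) the
 * crash path, false if another CPU claimed it first. */
bool claim_panic_cpu(int this_cpu)
{
	int expected = PANIC_CPU_INVALID;

	if (atomic_compare_exchange_strong(&panic_cpu, &expected, this_cpu))
		return true;
	return expected == this_cpu;	/* re-entry by the owner */
}

/* Models the guarded function from the diff: a second call is a no-op. */
void model_crash_kexec_prepare_cpus(void)
{
	if (cpus_stopped)
		return;
	stop_rounds++;		/* the real code would stop other CPUs here */
	cpus_stopped = 1;
}

/* Models path 1 followed by the later shutdown call on the same CPU. */
bool model_panic(int this_cpu)
{
	if (!claim_panic_cpu(this_cpu))
		return false;	/* a losing CPU never reaches the guard */

	model_crash_kexec_prepare_cpus();	/* via crash_smp_send_stop() */
	model_crash_kexec_prepare_cpus();	/* via machine_crash_shutdown() */
	return true;
}
```

In this model a second CPU that calls model_panic() after the first one
is turned away at the gate, and the stop sequence runs exactly once even
though the guarded function is reached twice on the owning CPU. The model
is single-threaded in its use of cpus_stopped on purpose: the point is
that the atomic gate, not the flag, provides the exclusion.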