Re: [V4 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version in panic path

2016-10-13 Thread 'Dave Young'
Hi, 河合英宏

Thanks for the patch log update, it looks good to me.

Acked-by: Dave Young 

On 09/20/16 at 11:22am, 河合英宏 / KAWAI,HIDEHIRO wrote:
> Here is the revised commit description reflecting Dave's
> comment.  Cc list was copied from -mm version.
> 
> From: Hidehiro Kawai 
> Subject: x86/panic: replace smp_send_stop() with kdump friendly version in 
> panic path
> 
> This patch fixes a problem reported by Daniel Walker
> (https://lkml.org/lkml/2015/6/24/44).
> 
> When kernel panics with crash_kexec_post_notifiers kernel parameter
> enabled, other CPUs are stopped by smp_send_stop() instead of
> machine_crash_shutdown() in __crash_kexec() path.
> 
>   panic()
> if crash_kexec_post_notifiers == 1
>   smp_send_stop()
>   atomic_notifier_call_chain()
>   kmsg_dump()
> __crash_kexec()
>   machine_crash_shutdown()
> 
> Different from smp_send_stop(), machine_crash_shutdown() stops other
> CPUs with extra works for kdump.  So, if smp_send_stop() stops other
> CPUs in advance, these extra works won't be done.  For x86, kdump
> routines miss to save other CPUs' registers and disable virtualization
> extensions.
> 
> To fix this problem, call a new kdump friendly function,
> crash_smp_send_stop(), instead of the smp_send_stop() when
> crash_kexec_post_notifiers is enabled.  crash_smp_send_stop() is a
> weak function, and it just call smp_send_stop().  Architecture
> codes should override it so that kdump can work appropriately.
> This patch only provides x86-specific version.
> 
> For Xen's PV kernel, just keep the current behavior.
> As for Dom0, it doesn't use crash_kexec routines, and it relies on
> panic notifier chain.  At the end of the chain, a hypercall is
> issued which requests the hypervisor to execute kdump.  This means
> regardless of crash_kexec_post_notifiers setting, smp_send_stop().
> For PV HVM, it would work similarly to baremetal kernels with extra
> cleanups for hypervisor.  It doesn't need additional care.
> 
> Changes in V4:
> - Keep to use smp_send_stop if crash_kexec_post_notifiers is not set
> - Rename panic_smp_send_stop to crash_smp_send_stop
> - Don't change the behavior for Xen's PV kernel
> 
> Changes in V3:
> - Revise comments, description, and symbol names
> 
> Changes in V2:
> - Replace smp_send_stop() call with crash_kexec version which
>   saves cpu states and cleans up VMX/SVM
> - Drop a fix for Problem 1 at this moment
> 
> Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option)
> Link: 
> http://lkml.kernel.org/r/20160810080948.11028.15344.st...@sysi4-13.yrl.intra.hitachi.co.jp
> Signed-off-by: Hidehiro Kawai 
> Reported-by: Daniel Walker 
> Cc: Dave Young 
> Cc: Baoquan He 
> Cc: Vivek Goyal 
> Cc: Eric Biederman 
> Cc: Masami Hiramatsu 
> Cc: Daniel Walker 
> Cc: Xunlei Pang 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: Borislav Petkov 
> Cc: David Vrabel 
> Cc: Toshi Kani 
> Cc: Ralf Baechle 
> Cc: David Daney 
> Cc: Aaro Koskinen 
> Cc: "Steven J. Hill" 
> Cc: Corey Minyard 
> Signed-off-by: Andrew Morton 
> 

[snip]

Thanks
Dave

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: RE: Re: [V4 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version in panic path

2016-09-20 Thread 河合英宏 / KAWAI,HIDEHIRO
Here is the revised commit description reflecting Dave's
comment.  Cc list was copied from -mm version.

From: Hidehiro Kawai 
Subject: x86/panic: replace smp_send_stop() with kdump friendly version in 
panic path

This patch fixes a problem reported by Daniel Walker
(https://lkml.org/lkml/2015/6/24/44).

When kernel panics with crash_kexec_post_notifiers kernel parameter
enabled, other CPUs are stopped by smp_send_stop() instead of
machine_crash_shutdown() in __crash_kexec() path.

  panic()
if crash_kexec_post_notifiers == 1
  smp_send_stop()
  atomic_notifier_call_chain()
  kmsg_dump()
__crash_kexec()
  machine_crash_shutdown()

Different from smp_send_stop(), machine_crash_shutdown() stops other
CPUs with extra works for kdump.  So, if smp_send_stop() stops other
CPUs in advance, these extra works won't be done.  For x86, kdump
routines miss to save other CPUs' registers and disable virtualization
extensions.

To fix this problem, call a new kdump friendly function,
crash_smp_send_stop(), instead of the smp_send_stop() when
crash_kexec_post_notifiers is enabled.  crash_smp_send_stop() is a
weak function, and it just call smp_send_stop().  Architecture
codes should override it so that kdump can work appropriately.
This patch only provides x86-specific version.

For Xen's PV kernel, just keep the current behavior.
As for Dom0, it doesn't use crash_kexec routines, and it relies on
panic notifier chain.  At the end of the chain, a hypercall is
issued which requests the hypervisor to execute kdump.  This means
regardless of crash_kexec_post_notifiers setting, smp_send_stop().
For PV HVM, it would work similarly to baremetal kernels with extra
cleanups for hypervisor.  It doesn't need additional care.

Changes in V4:
- Keep to use smp_send_stop if crash_kexec_post_notifiers is not set
- Rename panic_smp_send_stop to crash_smp_send_stop
- Don't change the behavior for Xen's PV kernel

Changes in V3:
- Revise comments, description, and symbol names

Changes in V2:
- Replace smp_send_stop() call with crash_kexec version which
  saves cpu states and cleans up VMX/SVM
- Drop a fix for Problem 1 at this moment

Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option)
Link: 
http://lkml.kernel.org/r/20160810080948.11028.15344.st...@sysi4-13.yrl.intra.hitachi.co.jp
Signed-off-by: Hidehiro Kawai 
Reported-by: Daniel Walker 
Cc: Dave Young 
Cc: Baoquan He 
Cc: Vivek Goyal 
Cc: Eric Biederman 
Cc: Masami Hiramatsu 
Cc: Daniel Walker 
Cc: Xunlei Pang 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: David Vrabel 
Cc: Toshi Kani 
Cc: Ralf Baechle 
Cc: David Daney 
Cc: Aaro Koskinen 
Cc: "Steven J. Hill" 
Cc: Corey Minyard 
Signed-off-by: Andrew Morton 

> -Original Message-
> From: Hidehiro Kawai
> Hi Xunlei,
> 
> > From: Xunlei Pang [mailto:xp...@redhat.com]
> > Sent: Tuesday, September 20, 2016 4:40 PM
> > On 2016/08/15/ at 19:22, Hidehiro Kawai wrote:
> > > Hi Dave,
> > >
> > > Thank you for the review.
> > >
> > >> From: Dave Young [mailto:dyo...@redhat.com]
> > >> Sent: Friday, August 12, 2016 12:17 PM
> > >>
> > >> Thanks for the update.
> > >> On 08/10/16 at 05:09pm, Hidehiro Kawai wrote:
> > >>> Daniel Walker reported problems which happens when
> > >>> crash_kexec_post_notifiers kernel option is enabled
> > >>> (https://lkml.org/lkml/2015/6/24/44).
> > >>>
> > >>> In that case, smp_send_stop() is called before entering kdump routines
> > >>> which assume other CPUs are still online.  As the result, for x86,
> > >>> kdump routines fail to save other CPUs' registers  and disable
> > >>> virtualization extensions.
> > >> Seems you simplified the changelog, but I think a little more details
> > >> will be helpful to understand the patch. You know sometimes lkml.org
> > >> does not work well.
> > > So, I'll try another archives when I post patch set next time.
> >
> > Hi Hidehiro Kawai,
> >
> > What's the status of this patch set, are you going to send an updated 
> > version?
> 
> Sorry, I misunderstood what Dave said, and I thought I don't
> need to revise the patch set.
> 
> Currently, this patch set is in -mm, then -next tree.
> What I need to fix is only commit descriptions, so I'm going to
> post revised descriptions as replies to this thread, and then
> request to Andrew to replace them if there is no objection.
> (or should I resend the whole patch set?)
> 
> Regards,
> Hidehiro Kawai
> 
> > >>> To fix this problem, call a new kdump friendly function,
> > >>> 

RE: Re: [V4 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version in panic path

2016-09-20 Thread 河合英宏 / KAWAI,HIDEHIRO
Hi Xunlei,

> From: Xunlei Pang [mailto:xp...@redhat.com]
> Sent: Tuesday, September 20, 2016 4:40 PM
> On 2016/08/15/ at 19:22, Hidehiro Kawai wrote:
> > Hi Dave,
> >
> > Thank you for the review.
> >
> >> From: Dave Young [mailto:dyo...@redhat.com]
> >> Sent: Friday, August 12, 2016 12:17 PM
> >>
> >> Thanks for the update.
> >> On 08/10/16 at 05:09pm, Hidehiro Kawai wrote:
> >>> Daniel Walker reported problems which happens when
> >>> crash_kexec_post_notifiers kernel option is enabled
> >>> (https://lkml.org/lkml/2015/6/24/44).
> >>>
> >>> In that case, smp_send_stop() is called before entering kdump routines
> >>> which assume other CPUs are still online.  As the result, for x86,
> >>> kdump routines fail to save other CPUs' registers  and disable
> >>> virtualization extensions.
> >> Seems you simplified the changelog, but I think a little more details
> >> will be helpful to understand the patch. You know sometimes lkml.org
> >> does not work well.
> > So, I'll try another archives when I post patch set next time.
> 
> Hi Hidehiro Kawai,
> 
> What's the status of this patch set, are you going to send an updated version?

Sorry, I misunderstood what Dave said, and I thought I don't
need to revise the patch set.

Currently, this patch set is in -mm, then -next tree.
What I need to fix is only commit descriptions, so I'm going to
post revised descriptions as replies to this thread, and then
request to Andrew to replace them if there is no objection.
(or should I resend the whole patch set?)

Regards,
Hidehiro Kawai

> >>> To fix this problem, call a new kdump friendly function,
> >>> crash_smp_send_stop(), instead of the smp_send_stop() when
> >>> crash_kexec_post_notifiers is enabled.  crash_smp_send_stop() is a
> >>> weak function, and it just call smp_send_stop().  Architecture
> >>> codes should override it so that kdump can work appropriately.
> >>> This patch only provides x86-specific version.
> >>>
> >>> For Xen's PV kernel, just keep the current behavior.
> >> Could you explain a bit about above Xen PV kernel behavior?
> >>
> >> BTW, this version looks better,  I think I'm fine with this version
> >> besides of the questions about changelog.
> > As for Dom0 kernel, it doesn't use crash_kexec routines, and
> > it relies on panic notifier chain.  At the end of the chain,
> > xen_panic_event is called, and it issues a hypercall which
> > requests Hypervisor to execute kdump.  This means whether
> > crash_kexec_panic_notifiers is set or not, panic notifiers
> > are called after smp_send_stop.  Even if we save registers
> > in Dom0 kernel, they seem to be ignored (Hypervisor is responsible
> > for that).  This is why I kept the current behavior for Xen.
> >
> > For PV DomU kernel, kdump is not supported.  For PV HVM
> > DomU, I'm not sure what will happen on panic because I
> > couldn't boot PV HVM DomU and test it.  But I think it will
> > work similarly to baremetal kernels with extra cleanups
> > for Hypervisor.
> >
> > Best regards,
> >
> > Hidehiro Kawai
> >
> >>> Changes in V4:
> >>> - Keep to use smp_send_stop if crash_kexec_post_notifiers is not set
> >>> - Rename panic_smp_send_stop to crash_smp_send_stop
> >>> - Don't change the behavior for Xen's PV kernel
> >>>
> >>> Changes in V3:
> >>> - Revise comments, description, and symbol names
> >>>
> >>> Changes in V2:
> >>> - Replace smp_send_stop() call with crash_kexec version which
> >>>   saves cpu states and cleans up VMX/SVM
> >>> - Drop a fix for Problem 1 at this moment
> >>>
> >>> Reported-by: Daniel Walker 
> >>> Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" 
> >>> option)
> >>> Signed-off-by: Hidehiro Kawai 
> >>> Cc: Dave Young 
> >>> Cc: Baoquan He 
> >>> Cc: Vivek Goyal 
> >>> Cc: Eric Biederman 
> >>> Cc: Masami Hiramatsu 
> >>> Cc: Daniel Walker 
> >>> Cc: Xunlei Pang 
> >>> Cc: Thomas Gleixner 
> >>> Cc: Ingo Molnar 
> >>> Cc: "H. Peter Anvin" 
> >>> Cc: Borislav Petkov 
> >>> Cc: David Vrabel 
> >>> Cc: Toshi Kani 
> >>> Cc: Andrew Morton 
> >>> ---
> >>>  arch/x86/include/asm/kexec.h |1 +
> >>>  arch/x86/include/asm/smp.h   |1 +
> >>>  arch/x86/kernel/crash.c  |   22 +---
> >>>  arch/x86/kernel/smp.c|5 
> >>>  kernel/panic.c   |   47 
> >>> --
> >>>  5 files changed, 66 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> >>> index d2434c1..282630e 100644
> >>> --- a/arch/x86/include/asm/kexec.h
> >>> +++ b/arch/x86/include/asm/kexec.h
> >>> @@ -210,6 +210,7 @@ struct kexec_entry64_regs {
> >>>
> >>>  typedef void 

Re: [V4 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version in panic path

2016-09-20 Thread Xunlei Pang
On 2016/08/15/ at 19:22, Hidehiro Kawai wrote:
> Hi Dave,
>
> Thank you for the review.
>
>> From: Dave Young [mailto:dyo...@redhat.com]
>> Sent: Friday, August 12, 2016 12:17 PM
>>
>> Thanks for the update.
>> On 08/10/16 at 05:09pm, Hidehiro Kawai wrote:
>>> Daniel Walker reported problems which happens when
>>> crash_kexec_post_notifiers kernel option is enabled
>>> (https://lkml.org/lkml/2015/6/24/44).
>>>
>>> In that case, smp_send_stop() is called before entering kdump routines
>>> which assume other CPUs are still online.  As the result, for x86,
>>> kdump routines fail to save other CPUs' registers  and disable
>>> virtualization extensions.
>> Seems you simplified the changelog, but I think a little more details
>> will be helpful to understand the patch. You know sometimes lkml.org
>> does not work well.
> So, I'll try another archives when I post patch set next time.

Hi Hidehiro Kawai,

What's the status of this patch set, are you going to send an updated version?

Regards,
Xunlei

>>> To fix this problem, call a new kdump friendly function,
>>> crash_smp_send_stop(), instead of the smp_send_stop() when
>>> crash_kexec_post_notifiers is enabled.  crash_smp_send_stop() is a
>>> weak function, and it just call smp_send_stop().  Architecture
>>> codes should override it so that kdump can work appropriately.
>>> This patch only provides x86-specific version.
>>>
>>> For Xen's PV kernel, just keep the current behavior.
>> Could you explain a bit about above Xen PV kernel behavior?
>>
>> BTW, this version looks better,  I think I'm fine with this version
>> besides of the questions about changelog.
> As for Dom0 kernel, it doesn't use crash_kexec routines, and
> it relies on panic notifier chain.  At the end of the chain,
> xen_panic_event is called, and it issues a hypercall which
> requests Hypervisor to execute kdump.  This means whether
> crash_kexec_panic_notifiers is set or not, panic notifiers
> are called after smp_send_stop.  Even if we save registers
> in Dom0 kernel, they seem to be ignored (Hypervisor is responsible
> for that).  This is why I kept the current behavior for Xen.
>
> For PV DomU kernel, kdump is not supported.  For PV HVM
> DomU, I'm not sure what will happen on panic because I
> couldn't boot PV HVM DomU and test it.  But I think it will
> work similarly to baremetal kernels with extra cleanups
> for Hypervisor.
>
> Best regards,
>
> Hidehiro Kawai
>
>>> Changes in V4:
>>> - Keep to use smp_send_stop if crash_kexec_post_notifiers is not set
>>> - Rename panic_smp_send_stop to crash_smp_send_stop
>>> - Don't change the behavior for Xen's PV kernel
>>>
>>> Changes in V3:
>>> - Revise comments, description, and symbol names
>>>
>>> Changes in V2:
>>> - Replace smp_send_stop() call with crash_kexec version which
>>>   saves cpu states and cleans up VMX/SVM
>>> - Drop a fix for Problem 1 at this moment
>>>
>>> Reported-by: Daniel Walker 
>>> Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" 
>>> option)
>>> Signed-off-by: Hidehiro Kawai 
>>> Cc: Dave Young 
>>> Cc: Baoquan He 
>>> Cc: Vivek Goyal 
>>> Cc: Eric Biederman 
>>> Cc: Masami Hiramatsu 
>>> Cc: Daniel Walker 
>>> Cc: Xunlei Pang 
>>> Cc: Thomas Gleixner 
>>> Cc: Ingo Molnar 
>>> Cc: "H. Peter Anvin" 
>>> Cc: Borislav Petkov 
>>> Cc: David Vrabel 
>>> Cc: Toshi Kani 
>>> Cc: Andrew Morton 
>>> ---
>>>  arch/x86/include/asm/kexec.h |1 +
>>>  arch/x86/include/asm/smp.h   |1 +
>>>  arch/x86/kernel/crash.c  |   22 +---
>>>  arch/x86/kernel/smp.c|5 
>>>  kernel/panic.c   |   47 
>>> --
>>>  5 files changed, 66 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
>>> index d2434c1..282630e 100644
>>> --- a/arch/x86/include/asm/kexec.h
>>> +++ b/arch/x86/include/asm/kexec.h
>>> @@ -210,6 +210,7 @@ struct kexec_entry64_regs {
>>>
>>>  typedef void crash_vmclear_fn(void);
>>>  extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
>>> +extern void kdump_nmi_shootdown_cpus(void);
>>>
>>>  #endif /* __ASSEMBLY__ */
>>>
>>> diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
>>> index ebd0c16..f70989c 100644
>>> --- a/arch/x86/include/asm/smp.h
>>> +++ b/arch/x86/include/asm/smp.h
>>> @@ -50,6 +50,7 @@ struct smp_ops {
>>> void (*smp_cpus_done)(unsigned max_cpus);
>>>
>>> void (*stop_other_cpus)(int wait);
>>> +   void (*crash_stop_other_cpus)(void);
>>> void (*smp_send_reschedule)(int cpu);
>>>
>>> int (*cpu_up)(unsigned cpu, struct task_struct *tidle);
>>> diff --git a/arch/x86/kernel/crash.c 

RE: Re: [V4 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version in panic path

2016-08-15 Thread 河合英宏 / KAWAI,HIDEHIRO
Hi Dave,

Thank you for the review.

> From: Dave Young [mailto:dyo...@redhat.com]
> Sent: Friday, August 12, 2016 12:17 PM
> 
> Thanks for the update.
> On 08/10/16 at 05:09pm, Hidehiro Kawai wrote:
> > Daniel Walker reported problems which happens when
> > crash_kexec_post_notifiers kernel option is enabled
> > (https://lkml.org/lkml/2015/6/24/44).
> >
> > In that case, smp_send_stop() is called before entering kdump routines
> > which assume other CPUs are still online.  As the result, for x86,
> > kdump routines fail to save other CPUs' registers  and disable
> > virtualization extensions.
> 
> Seems you simplified the changelog, but I think a little more details
> will be helpful to understand the patch. You know sometimes lkml.org
> does not work well.

So, I'll try another archives when I post patch set next time.

> > To fix this problem, call a new kdump friendly function,
> > crash_smp_send_stop(), instead of the smp_send_stop() when
> > crash_kexec_post_notifiers is enabled.  crash_smp_send_stop() is a
> > weak function, and it just call smp_send_stop().  Architecture
> > codes should override it so that kdump can work appropriately.
> > This patch only provides x86-specific version.
> >
> > For Xen's PV kernel, just keep the current behavior.
> 
> Could you explain a bit about above Xen PV kernel behavior?
> 
> BTW, this version looks better,  I think I'm fine with this version
> besides of the questions about changelog.

As for Dom0 kernel, it doesn't use crash_kexec routines, and
it relies on panic notifier chain.  At the end of the chain,
xen_panic_event is called, and it issues a hypercall which
requests Hypervisor to execute kdump.  This means whether
crash_kexec_panic_notifiers is set or not, panic notifiers
are called after smp_send_stop.  Even if we save registers
in Dom0 kernel, they seem to be ignored (Hypervisor is responsible
for that).  This is why I kept the current behavior for Xen.

For PV DomU kernel, kdump is not supported.  For PV HVM
DomU, I'm not sure what will happen on panic because I
couldn't boot PV HVM DomU and test it.  But I think it will
work similarly to baremetal kernels with extra cleanups
for Hypervisor.

Best regards,

Hidehiro Kawai

> > Changes in V4:
> > - Keep to use smp_send_stop if crash_kexec_post_notifiers is not set
> > - Rename panic_smp_send_stop to crash_smp_send_stop
> > - Don't change the behavior for Xen's PV kernel
> >
> > Changes in V3:
> > - Revise comments, description, and symbol names
> >
> > Changes in V2:
> > - Replace smp_send_stop() call with crash_kexec version which
> >   saves cpu states and cleans up VMX/SVM
> > - Drop a fix for Problem 1 at this moment
> >
> > Reported-by: Daniel Walker 
> > Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" 
> > option)
> > Signed-off-by: Hidehiro Kawai 
> > Cc: Dave Young 
> > Cc: Baoquan He 
> > Cc: Vivek Goyal 
> > Cc: Eric Biederman 
> > Cc: Masami Hiramatsu 
> > Cc: Daniel Walker 
> > Cc: Xunlei Pang 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: "H. Peter Anvin" 
> > Cc: Borislav Petkov 
> > Cc: David Vrabel 
> > Cc: Toshi Kani 
> > Cc: Andrew Morton 
> > ---
> >  arch/x86/include/asm/kexec.h |1 +
> >  arch/x86/include/asm/smp.h   |1 +
> >  arch/x86/kernel/crash.c  |   22 +---
> >  arch/x86/kernel/smp.c|5 
> >  kernel/panic.c   |   47 
> > --
> >  5 files changed, 66 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> > index d2434c1..282630e 100644
> > --- a/arch/x86/include/asm/kexec.h
> > +++ b/arch/x86/include/asm/kexec.h
> > @@ -210,6 +210,7 @@ struct kexec_entry64_regs {
> >
> >  typedef void crash_vmclear_fn(void);
> >  extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
> > +extern void kdump_nmi_shootdown_cpus(void);
> >
> >  #endif /* __ASSEMBLY__ */
> >
> > diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
> > index ebd0c16..f70989c 100644
> > --- a/arch/x86/include/asm/smp.h
> > +++ b/arch/x86/include/asm/smp.h
> > @@ -50,6 +50,7 @@ struct smp_ops {
> > void (*smp_cpus_done)(unsigned max_cpus);
> >
> > void (*stop_other_cpus)(int wait);
> > +   void (*crash_stop_other_cpus)(void);
> > void (*smp_send_reschedule)(int cpu);
> >
> > int (*cpu_up)(unsigned cpu, struct task_struct *tidle);
> > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> > index 9616cf7..650830e 100644
> > --- a/arch/x86/kernel/crash.c
> > +++ b/arch/x86/kernel/crash.c
> > @@ -133,15 +133,31 @@ static void kdump_nmi_callback(int cpu, struct 
> > pt_regs 

[V4 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version in panic path

2016-08-10 Thread Hidehiro Kawai
Daniel Walker reported problems which happens when
crash_kexec_post_notifiers kernel option is enabled
(https://lkml.org/lkml/2015/6/24/44).

In that case, smp_send_stop() is called before entering kdump routines
which assume other CPUs are still online.  As the result, for x86,
kdump routines fail to save other CPUs' registers  and disable
virtualization extensions.

To fix this problem, call a new kdump friendly function,
crash_smp_send_stop(), instead of the smp_send_stop() when
crash_kexec_post_notifiers is enabled.  crash_smp_send_stop() is a
weak function, and it just call smp_send_stop().  Architecture
codes should override it so that kdump can work appropriately.
This patch only provides x86-specific version.

For Xen's PV kernel, just keep the current behavior.

Changes in V4:
- Keep to use smp_send_stop if crash_kexec_post_notifiers is not set
- Rename panic_smp_send_stop to crash_smp_send_stop
- Don't change the behavior for Xen's PV kernel

Changes in V3:
- Revise comments, description, and symbol names

Changes in V2:
- Replace smp_send_stop() call with crash_kexec version which
  saves cpu states and cleans up VMX/SVM
- Drop a fix for Problem 1 at this moment

Reported-by: Daniel Walker 
Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option)
Signed-off-by: Hidehiro Kawai 
Cc: Dave Young 
Cc: Baoquan He 
Cc: Vivek Goyal 
Cc: Eric Biederman 
Cc: Masami Hiramatsu 
Cc: Daniel Walker 
Cc: Xunlei Pang 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: David Vrabel 
Cc: Toshi Kani 
Cc: Andrew Morton 
---
 arch/x86/include/asm/kexec.h |1 +
 arch/x86/include/asm/smp.h   |1 +
 arch/x86/kernel/crash.c  |   22 +---
 arch/x86/kernel/smp.c|5 
 kernel/panic.c   |   47 --
 5 files changed, 66 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index d2434c1..282630e 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -210,6 +210,7 @@ struct kexec_entry64_regs {
 
 typedef void crash_vmclear_fn(void);
 extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
+extern void kdump_nmi_shootdown_cpus(void);
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index ebd0c16..f70989c 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -50,6 +50,7 @@ struct smp_ops {
void (*smp_cpus_done)(unsigned max_cpus);
 
void (*stop_other_cpus)(int wait);
+   void (*crash_stop_other_cpus)(void);
void (*smp_send_reschedule)(int cpu);
 
int (*cpu_up)(unsigned cpu, struct task_struct *tidle);
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 9616cf7..650830e 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -133,15 +133,31 @@ static void kdump_nmi_callback(int cpu, struct pt_regs 
*regs)
disable_local_APIC();
 }
 
-static void kdump_nmi_shootdown_cpus(void)
+void kdump_nmi_shootdown_cpus(void)
 {
nmi_shootdown_cpus(kdump_nmi_callback);
 
disable_local_APIC();
 }
 
+/* Override the weak function in kernel/panic.c */
+void crash_smp_send_stop(void)
+{
+   static int cpus_stopped;
+
+   if (cpus_stopped)
+   return;
+
+   if (smp_ops.crash_stop_other_cpus)
+   smp_ops.crash_stop_other_cpus();
+   else
+   smp_send_stop();
+
+   cpus_stopped = 1;
+}
+
 #else
-static void kdump_nmi_shootdown_cpus(void)
+void crash_smp_send_stop(void)
 {
/* There are no cpus to shootdown */
 }
@@ -160,7 +176,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
/* The kernel is broken so disable interrupts */
local_irq_disable();
 
-   kdump_nmi_shootdown_cpus();
+   crash_smp_send_stop();
 
/*
 * VMCLEAR VMCSs loaded on this cpu if needed.
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 658777c..68f8cc2 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -32,6 +32,8 @@
 #include 
 #include 
 #include 
+#include 
+
 /*
  * Some notes on x86 processor bugs affecting SMP operation:
  *
@@ -342,6 +344,9 @@ struct smp_ops smp_ops = {
.smp_cpus_done  = native_smp_cpus_done,
 
.stop_other_cpus= native_stop_other_cpus,
+#if defined(CONFIG_KEXEC_CORE)
+   .crash_stop_other_cpus  = kdump_nmi_shootdown_cpus,
+#endif
.smp_send_reschedule= native_smp_send_reschedule,
 
.cpu_up = native_cpu_up,
diff --git a/kernel/panic.c