Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

2015-03-02 Borislav Petkov
On Mon, Mar 02, 2015 at 11:50:49AM -0500, Prarit Bhargava wrote:
> Unless entering a deep C state kicks an MCE ... which we've seen with flaky
> hardware.

If that is the case, you'll see the MCE not only when entering kdump.

--
Regards/Gruss,
    Boris.

Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

2015-03-02 Prarit Bhargava
On 03/02/2015 11:32 AM, Borislav Petkov wrote:
> On Mon, Mar 02, 2015 at 11:33:33PM +0900, Naoya Horiguchi wrote:
>> Yes, CPU offlining is one option to keep other CPUs quiet. I'm not sure why
>> current kexec implementation doesn't offline the other CPUs but just doing
>> cpu_relax() loop, but m...

Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

2015-03-02 Borislav Petkov
On Mon, Mar 02, 2015 at 11:33:33PM +0900, Naoya Horiguchi wrote:
> Yes, CPU offlining is one option to keep other CPUs quiet. I'm not sure why
> current kexec implementation doesn't offline the other CPUs but just doing
> cpu_relax() loop, but my guess is that in some kernel panic situation (like...

Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

2015-03-02 Naoya Horiguchi
On Mon, Mar 02, 2015 at 01:17:01PM +0100, Borislav Petkov wrote:
On Mon, Mar 02, 2015 at 02:31:19AM +, Naoya Horiguchi wrote:
> And please note that the target of this patch is an MCE when the kernel is
> already running on kdump code (so crashing happened *not* because of the MCE).
> In that...

Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

2015-03-02 Borislav Petkov
On Mon, Mar 02, 2015 at 02:31:19AM +, Naoya Horiguchi wrote:
> And please note that the target of this patch is an MCE when the kernel is
> already running on kdump code (so crashing happened *not* because of the MCE).
> In that case, we can expect that kdump works fine if the MCE hits the "kdu...

Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

2015-03-01 Naoya Horiguchi
On Fri, Feb 27, 2015 at 06:27:16PM +, Luck, Tony wrote:
> > When CR4.MCE=0b and an MCE happens, it will shutdown the system, at
> > least on Intel, according to Tony
>
> I checked with the architects ... and I was right. If you clear CR4.MCE
> you'll still
> see the machine check - and you'll...

Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

2015-03-01 Naoya Horiguchi
On Fri, Feb 27, 2015 at 08:14:47AM -0500, Prarit Bhargava wrote:
> On 02/27/2015 07:46 AM, Naoya Horiguchi wrote:
> > Hi Prarit,
> >
> > On Fri, Feb 27, 2015 at 06:09:52AM -0500, Prarit Bhargava wrote:
> > ...
> >> > @@ -157,6 +160,11 @@ void native_machine_crash_shutdown(struct pt_regs
> >> > *r...

RE: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

2015-02-27 Luck, Tony
> When CR4.MCE=0b and an MCE happens, it will shutdown the system, at
> least on Intel, according to Tony

I checked with the architects ... and I was right. If you clear CR4.MCE
you'll still see the machine check - and you'll pull the big system
reset lever. If you think the other cpus can survi...

Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

2015-02-27 Prarit Bhargava
On 02/27/2015 07:46 AM, Naoya Horiguchi wrote:
> Hi Prarit,
>
> On Fri, Feb 27, 2015 at 06:09:52AM -0500, Prarit Bhargava wrote:
> ...
>> > @@ -157,6 +160,11 @@ void native_machine_crash_shutdown(struct pt_regs
>> > *regs)
>> > /* The kernel is broken so disable interrupts */
>> > loc...

Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

2015-02-27 Naoya Horiguchi
Hi Prarit,

On Fri, Feb 27, 2015 at 06:09:52AM -0500, Prarit Bhargava wrote:
...
> @@ -157,6 +160,11 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
>  /* The kernel is broken so disable interrupts */
>  local_irq_disable();
>
> + /*
> + * We can't expect MCE handling to work a...

Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

2015-02-27 Borislav Petkov
On Fri, Feb 27, 2015 at 06:09:52AM -0500, Prarit Bhargava wrote:
> What if the system is actually having problems with MCE errors --
> which are leading to system panics of some sort. Do you *really* want
> the system to continue on at that point?

No one said that disabling MCA and doing kdump is...

Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

2015-02-27 Prarit Bhargava
On 02/26/2015 11:58 PM, Naoya Horiguchi wrote:
> kexec disables (or "shoots down") all CPUs other than a crashing CPU before
> entering the 2nd kernel. But the MCE handler is still enabled after that, so
> if MCE happens and broadcasts around CPUs after the main thread starts the
> 2nd kernel (wh...

[PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

2015-02-26 Naoya Horiguchi
kexec disables (or "shoots down") all CPUs other than a crashing CPU before
entering the 2nd kernel. But the MCE handler is still enabled after that, so
if MCE happens and broadcasts around CPUs after the main thread starts the
2nd kernel (which might not start MCE yet, or might decide not to start...