Re: [PATCH v8] x86: mce: kexec: switch MCE handler for kexec/kdump

2015-04-28 Thread Baoquan He
On 04/10/15 at 12:49am, Naoya Horiguchi wrote:
> On Thu, Apr 09, 2015 at 09:05:51PM +0200, Borislav Petkov wrote:
> > On Thu, Apr 09, 2015 at 06:22:02PM +, Luck, Tony wrote:
> > > > Why? Those CPUs are offlined and num_online_cpus() in mce_start() should
> > > > account for that, no?
> > > >
>

Re: [PATCH v8] x86: mce: kexec: switch MCE handler for kexec/kdump

2015-04-09 Thread Borislav Petkov
On Thu, Apr 09, 2015 at 06:22:02PM +, Luck, Tony wrote:
> > Why? Those CPUs are offlined and num_online_cpus() in mce_start() should
> > account for that, no?
> >
> > And if those are offlined, they're very very unlikely to trigger an MCE
> > as they're idle and not executing code.
>
> Let's

RE: [PATCH v8] x86: mce: kexec: switch MCE handler for kexec/kdump

2015-04-09 Thread Luck, Tony
> Why? Those CPUs are offlined and num_online_cpus() in mce_start() should
> account for that, no?
>
> And if those are offlined, they're very very unlikely to trigger an MCE
> as they're idle and not executing code.
Let's step back a few feet and look at the big picture. There are three main

RE: [PATCH v8] x86: mce: kexec: switch MCE handler for kexec/kdump

2015-04-09 Thread Luck, Tony
> If only APEI EINJ could be taught to do delayed injection, regardless of
> OS kernel running. Tony, is something like that even possible at all?
Use:
# echo 1 > notrigger
that allows you to plant a land-mine in memory that will get tripped later. Pick the memory address in a clever way

Re: [PATCH v8] x86: mce: kexec: switch MCE handler for kexec/kdump

2015-04-09 Thread Borislav Petkov
On Thu, Apr 09, 2015 at 08:59:44AM +, Naoya Horiguchi wrote:
> I replied about testing. That might be tricky a little, but I hope it helps.
Yeah, whatever we do, we need this properly tested before upstreaming. That's a given.
> Even if we raise tolerant level in running kdump, that doesn't

Re: [PATCH v8] x86: mce: kexec: switch MCE handler for kexec/kdump

2015-04-09 Thread Ingo Molnar
* Naoya Horiguchi wrote:
> On Thu, Apr 09, 2015 at 10:00:30AM +0200, Ingo Molnar wrote:
> >
> > * Borislav Petkov wrote:
> >
> > > Btw, Ingo had some reservations about this. Ingo?
> >
> > Yeah, so my concerns are the following:
> >
> > > kexec disables (or "shoots down") all CPUs other

Re: [PATCH v8] x86: mce: kexec: switch MCE handler for kexec/kdump

2015-04-09 Thread Naoya Horiguchi
On Thu, Apr 09, 2015 at 10:21:25AM +0200, Borislav Petkov wrote:
> On Thu, Apr 09, 2015 at 10:00:30AM +0200, Ingo Molnar wrote:
> > So the thing is, when we boot up the second kernel there will be a
> > window where the old handler isn't valid (because the new kernel has
> > its own pagetables,

Re: [PATCH v8] x86: mce: kexec: switch MCE handler for kexec/kdump

2015-04-09 Thread Naoya Horiguchi
On Thu, Apr 09, 2015 at 10:00:30AM +0200, Ingo Molnar wrote:
>
> * Borislav Petkov wrote:
>
> > Btw, Ingo had some reservations about this. Ingo?
>
> Yeah, so my concerns are the following:
>
> > kexec disables (or "shoots down") all CPUs other than the crashing
> > CPU before entering the

Re: [PATCH v8] x86: mce: kexec: switch MCE handler for kexec/kdump

2015-04-09 Thread Borislav Petkov
On Thu, Apr 09, 2015 at 10:00:30AM +0200, Ingo Molnar wrote:
> So the thing is, when we boot up the second kernel there will be a
> window where the old handler isn't valid (because the new kernel has
> its own pagetables, etc.) and the new handler is not installed yet.
>
> If an MCE hits that

Re: [PATCH v8] x86: mce: kexec: switch MCE handler for kexec/kdump

2015-04-09 Thread Ingo Molnar
* Borislav Petkov wrote:
> Btw, Ingo had some reservations about this. Ingo?
Yeah, so my concerns are the following:
> kexec disables (or "shoots down") all CPUs other than the crashing
> CPU before entering the 2nd kernel. However, MCA is still enabled so
> if an MCE happens and broadcasts

Re: [PATCH v8] x86: mce: kexec: switch MCE handler for kexec/kdump

2015-04-09 Thread Borislav Petkov
On Thu, Apr 09, 2015 at 06:57:38AM +, Naoya Horiguchi wrote:
> Yes, I did see it at first, so I did two tweaks for the testing:
>
> 1) to fix qemu code. I think that current mce injection code of qemu is buggy,
> because when we try to inject MCE in broadcast mode, all injections other than
>

Re: [PATCH v8] x86: mce: kexec: switch MCE handler for kexec/kdump

2015-04-09 Thread Naoya Horiguchi
On Thu, Apr 09, 2015 at 08:13:46AM +0200, Borislav Petkov wrote:
> On Tue, Apr 07, 2015 at 08:02:18AM +, Naoya Horiguchi wrote:
> > kexec disables (or "shoots down") all CPUs other than a crashing CPU before
> > entering the 2nd kernel. But the MCE handler is still enabled after that,
> > so

Re: [PATCH v8] x86: mce: kexec: switch MCE handler for kexec/kdump

2015-04-09 Thread Borislav Petkov
On Tue, Apr 07, 2015 at 08:02:18AM +, Naoya Horiguchi wrote:
> kexec disables (or "shoots down") all CPUs other than a crashing CPU before
> entering the 2nd kernel. But the MCE handler is still enabled after that,
> so if MCE happens and broadcasts over the CPUs after the main thread starts
>

[PATCH v8] x86: mce: kexec: switch MCE handler for kexec/kdump

2015-04-07 Thread Naoya Horiguchi
kexec disables (or "shoots down") all CPUs other than a crashing CPU before entering the 2nd kernel. But the MCE handler is still enabled after that, so if MCE happens and broadcasts over the CPUs after the main thread starts the 2nd kernel (which might not initialize its MCE handler yet, or might
