Re: [RFC PATCH] x86: Do not panic if mce=2 is passed

2016-10-31 Thread Borislav Petkov
On Fri, Sep 16, 2016 at 01:23:25PM -0700, Yinghai Lu wrote:
> From: Yinghai Lu 
> 
> For UE recovery support, we currently need mce=2 on the command line
> and must also disable panic_on_oops via sysctl.
> 
> But other users may still need to keep panic_on_oops set to 1 at all times.
> 
> We can remove the panic_on_oops check from the mce-severity path.
> 
> This should be safe: on the default path, when mce=2 is not passed,
> tolerant is 0, so they will still get MCE_PANIC_SEVERITY returned.
> 
> Signed-off-by: Yinghai Lu 
> 
> 
> ---
>  arch/x86/kernel/cpu/mcheck/mce-severity.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: linux-2.6/arch/x86/kernel/cpu/mcheck/mce-severity.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/cpu/mcheck/mce-severity.c
> +++ linux-2.6/arch/x86/kernel/cpu/mcheck/mce-severity.c
> @@ -311,7 +311,7 @@ static int mce_severity_intel(struct mce
>  			*msg = s->msg;
>  		s->covered = 1;
>  		if (s->sev >= MCE_UC_SEVERITY && ctx == IN_KERNEL) {
> -			if (panic_on_oops || tolerant < 1)
> +			if (tolerant < 1)
>  				return MCE_PANIC_SEVERITY;
>  		}
>  		return s->sev;

Applied,
thanks.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.


Re: [RFC PATCH] x86: Do not panic if mce=2 is passed

2016-09-18 Thread Borislav Petkov
On Fri, Sep 16, 2016 at 08:28:44PM +, Luck, Tony wrote:
> > For UE recovery support, we currently need mce=2 on the command line
> > and must also disable panic_on_oops via sysctl.
> 
> Please explain. I've never given mce=2 on command line, and have
> had my kernel recover from thousands of (injected) UE memory errors.

So frankly, that panic_on_oops doesn't make a whole lotta sense to me.

It is promoting MCEs with severity MCE_UC_SEVERITY and higher to a
panic.

So let's look at those:

MCE_UC_SEVERITY,    - we don't do anything special in the kernel for
                      those so just as well.
MCE_AR_SEVERITY,    - those end up in the memory failure code if
                      they're memory errors
MCE_PANIC_SEVERITY, - causes panic

so if anything, panic_on_oops shouldn't control the panicking behavior
as tolerant does that already:

 * Tolerant levels:
 * 0: always panic on uncorrected errors, log corrected errors
 * 1: panic or SIGBUS on uncorrected errors, log corrected errors
 * 2: SIGBUS or log uncorrected errors (if possible), log corr. errors
 * 3: never panic or SIGBUS, log all errors (for testing only)

IOW, I think that patch makes sense but please doublecheck my logic
above first.

Thanks.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.


RE: [RFC PATCH] x86: Do not panic if mce=2 is passed

2016-09-16 Thread Luck, Tony
> For UE recovery support, we currently need mce=2 on the command line
> and must also disable panic_on_oops via sysctl.

Please explain. I've never given mce=2 on command line, and have
had my kernel recover from thousands of (injected) UE memory errors.

-Tony


[RFC PATCH] x86: Do not panic if mce=2 is passed

2016-09-16 Thread Yinghai Lu
From: Yinghai Lu 

For UE recovery support, we currently need mce=2 on the command line
and must also disable panic_on_oops via sysctl.

But other users may still need to keep panic_on_oops set to 1 at all times.

We can remove the panic_on_oops check from the mce-severity path.

This should be safe: on the default path, when mce=2 is not passed,
tolerant is 0, so they will still get MCE_PANIC_SEVERITY returned.

Signed-off-by: Yinghai Lu 


---
 arch/x86/kernel/cpu/mcheck/mce-severity.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/arch/x86/kernel/cpu/mcheck/mce-severity.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/mcheck/mce-severity.c
+++ linux-2.6/arch/x86/kernel/cpu/mcheck/mce-severity.c
@@ -311,7 +311,7 @@ static int mce_severity_intel(struct mce
 			*msg = s->msg;
 		s->covered = 1;
 		if (s->sev >= MCE_UC_SEVERITY && ctx == IN_KERNEL) {
-			if (panic_on_oops || tolerant < 1)
+			if (tolerant < 1)
 				return MCE_PANIC_SEVERITY;
 		}
 		return s->sev;