Re: [RFC PATCH] x86: Do not panic if mce=2 is passed
On Fri, Sep 16, 2016 at 01:23:25PM -0700, Yinghai Lu wrote:
> From: Yinghai Lu
>
> For UE recovery support, current we need mce=2 in command line
> and also disable panic_on_oops with sysctl.
>
> but other user may still need to have panic_on_oops to 1 always.
>
> We can remove checking of panic_on_oops for mce-severity path.
>
> We should be ok as on default path when mce=2 is not passed, tolerant
> is 0, so they will still get MCE_PANIC_SEVERITY returned.
>
> Signed-off-by: Yinghai Lu
>
> ---
>  arch/x86/kernel/cpu/mcheck/mce-severity.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux-2.6/arch/x86/kernel/cpu/mcheck/mce-severity.c
> ===
> --- linux-2.6.orig/arch/x86/kernel/cpu/mcheck/mce-severity.c
> +++ linux-2.6/arch/x86/kernel/cpu/mcheck/mce-severity.c
> @@ -311,7 +311,7 @@ static int mce_severity_intel(struct mce
>  			*msg = s->msg;
>  		s->covered = 1;
>  		if (s->sev >= MCE_UC_SEVERITY && ctx == IN_KERNEL) {
> -			if (panic_on_oops || tolerant < 1)
> +			if (tolerant < 1)
>  				return MCE_PANIC_SEVERITY;
>  		}
>  		return s->sev;

Applied, thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

Re: [RFC PATCH] x86: Do not panic if mce=2 is passed
On Fri, Sep 16, 2016 at 08:28:44PM +, Luck, Tony wrote:
> > For UE recovery support, current we need mce=2 in command line
> > and also disable panic_on_oops with sysctl.
>
> Please explain. I've never given mce=2 on command line, and have
> had my kernel recover from thousands of (injected) UE memory errors.

So frankly, that panic_on_oops doesn't make a whole lotta sense to me.
It is promoting MCEs with severity MCE_UC_SEVERITY and higher to a
panic. So let's look at those:

	MCE_UC_SEVERITY,	- we don't do anything special in the kernel for
				  those so just as well.
	MCE_AR_SEVERITY,	- those end up in the memory failure code if
				  they're memory errors
	MCE_PANIC_SEVERITY,	- causes panic

so if anything, panic_on_oops shouldn't control the panicking behavior
as tolerant does that already:

 * Tolerant levels:
 *   0: always panic on uncorrected errors, log corrected errors
 *   1: panic or SIGBUS on uncorrected errors, log corrected errors
 *   2: SIGBUS or log uncorrected errors (if possible), log corr. errors
 *   3: never panic or SIGBUS, log all errors (for testing only)

IOW, I think that patch makes sense but please doublecheck my logic
above first.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

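[Editorial sketch, not part of the original mails.] The logic under
discussion can be checked with a small userspace model. This is only a
sketch: the enum values and the names grade_before()/grade_after() are
hypothetical and chosen for illustration; only the two conditions are
taken from the patch hunk, and nothing else in mce_severity_intel() is
modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative severities, ordered as in the thread: UC < AR < PANIC. */
enum severity { SEV_NO, SEV_UC, SEV_AR, SEV_PANIC };
/* Illustrative contexts; the patch only tests for the in-kernel case. */
enum context { CTX_IN_KERNEL, CTX_IN_USER };

/* Condition before the patch: panic_on_oops alone forces a panic. */
enum severity grade_before(enum severity sev, enum context ctx,
                           int tolerant, bool panic_on_oops)
{
    assert(tolerant >= 0 && tolerant <= 3); /* the four documented levels */
    if (sev >= SEV_UC && ctx == CTX_IN_KERNEL) {
        if (panic_on_oops || tolerant < 1)
            return SEV_PANIC;
    }
    return sev;
}

/* Condition after the patch: only the tolerant level decides. */
enum severity grade_after(enum severity sev, enum context ctx, int tolerant)
{
    assert(tolerant >= 0 && tolerant <= 3);
    if (sev >= SEV_UC && ctx == CTX_IN_KERNEL) {
        if (tolerant < 1)
            return SEV_PANIC;
    }
    return sev;
}
```

In this model, grade_before(SEV_AR, CTX_IN_KERNEL, 2, true) is promoted
to SEV_PANIC even though tolerant is 2, while grade_after(SEV_AR,
CTX_IN_KERNEL, 2) stays at SEV_AR. With tolerant at 0 both versions
return SEV_PANIC, which matches the patch description's claim about the
tolerant == 0 case.
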
RE: [RFC PATCH] x86: Do not panic if mce=2 is passed
> For UE recovery support, current we need mce=2 in command line
> and also disable panic_on_oops with sysctl.

Please explain. I've never given mce=2 on command line, and have
had my kernel recover from thousands of (injected) UE memory errors.

-Tony