Re: Injecting SLB multihit crashes kernel 5.9.0-rc5
On 9/15/20 2:13 PM, Michal Suchánek wrote:
> Hello,
>
> Using the SLB multihit injection test module (which I did not write so I
> do not want to post it here) to verify updates on my 5.3 frankenkernel
> I found that the kernel crashes with Oops: kernel bad access.
>
> I tested on latest upstream kernel build that I have at hand and the
> result is the same (minus the message - nothing was logged and the kernel
> simply rebooted).

Yes, SLB multihit recovery is broken upstream. A fix is on the way.

> Since the whole effort to write a real mode MCE handler was supposed to
> prevent this maybe the SLB injection module should be added to the
> kernel selftests?

Yes. We are working on adding an SLB injection selftest; patches will be posted soon.

Thanks,
-Mahesh.

> Thanks
>
> Michal
Re: Injecting SLB multihit crashes kernel 5.9.0-rc5
Excerpts from Michael Ellerman's message of September 15, 2020 10:54 pm:
> Michal Suchánek writes:
>> Hello,
>>
>> Using the SLB multihit injection test module (which I did not write so I
>> do not want to post it here) to verify updates on my 5.3 frankenkernel
>> I found that the kernel crashes with Oops: kernel bad access.
>>
>> I tested on latest upstream kernel build that I have at hand and the
>> result is the same (minus the message - nothing was logged and the kernel
>> simply rebooted).
>
> That's disappointing.

It seems to work okay with qemu and mambo injection on upstream (powernv_defconfig). I wonder why that nmi_enter is crashing. Can you post the output of a successful test with the patch reverted?

qemu injection test output -

[ 195.279885][C0] Disabling lock debugging due to kernel taint
[ 195.280891][C0] MCE: CPU0: machine check (Warning) Host SLB Multihit DAR: deadbeef [Recovered]
[ 195.282117][C0] MCE: CPU0: NIP: [c003c2b4] isa300_idle_stop_mayloss+0x68/0x6c
[ 195.283631][C0] MCE: CPU0: Initiator CPU
[ 195.284432][C0] MCE: CPU0: Probable Software error (some chance of hardware cause)
[ 220.711577][ T90] MCE: CPU0: machine check (Warning) Host SLB Multihit DAR: deadbeef [Recovered]
[ 220.712805][ T90] MCE: CPU0: PID: 90 Comm: yes NIP: [7fff7fdac2e0]
[ 220.713553][ T90] MCE: CPU0: Initiator CPU
[ 220.714021][ T90] MCE: CPU0: Probable Software error (some chance of hardware cause)

Thanks,
Nick
Re: Injecting SLB multihit crashes kernel 5.9.0-rc5
Michal Suchánek writes:
> Hello,
>
> Using the SLB multihit injection test module (which I did not write so I
> do not want to post it here) to verify updates on my 5.3 frankenkernel
> I found that the kernel crashes with Oops: kernel bad access.
>
> I tested on latest upstream kernel build that I have at hand and the
> result is the same (minus the message - nothing was logged and the kernel
> simply rebooted).

That's disappointing.

> Since the whole effort to write a real mode MCE handler was supposed to
> prevent this maybe the SLB injection module should be added to the
> kernel selftests?

Yes, I'd like to see it upstream. I think it should be integrated into LKDTM, which contains other dangerous things like that and is designed for testing how the kernel handles/recovers from bad conditions.

cheers
Injecting SLB multihit crashes kernel 5.9.0-rc5
Hello,

Using the SLB multihit injection test module (which I did not write so I do not want to post it here) to verify updates on my 5.3 frankenkernel I found that the kernel crashes with Oops: kernel bad access.

I tested on latest upstream kernel build that I have at hand and the result is the same (minus the message - nothing was logged and the kernel simply rebooted).

Since the whole effort to write a real mode MCE handler was supposed to prevent this maybe the SLB injection module should be added to the kernel selftests?

Thanks

Michal