Re: Injecting SLB multihit crashes kernel 5.9.0-rc5

2020-09-16 Thread Mahesh Jagannath Salgaonkar
On 9/15/20 2:13 PM, Michal Suchánek wrote:
> Hello,
> 
> Using the SLB multihit injection test module (which I did not write, so I
> do not want to post it here) to verify updates on my 5.3 frankenkernel,
> I found that the kernel crashes with Oops: kernel bad access.
> 
> I tested on the latest upstream kernel build that I have at hand, and the
> result is the same (minus the message: nothing was logged and the kernel
> simply rebooted).


Yes, SLB multihit recovery is broken upstream. A fix is on the way.


> 
> Since the whole effort to write a real-mode MCE handler was supposed to
> prevent this, maybe the SLB injection module should be added to the
> kernel selftests?

Yes. We are working on adding an SLB injection selftest; the patches will
be posted soon.

Thanks,
-Mahesh.

> 
> Thanks
> 
> Michal
> 



Re: Injecting SLB multihit crashes kernel 5.9.0-rc5

2020-09-15 Thread Nicholas Piggin
Excerpts from Michael Ellerman's message of September 15, 2020 10:54 pm:
> Michal Suchánek  writes:
>> Hello,
>>
>> Using the SLB multihit injection test module (which I did not write, so I
>> do not want to post it here) to verify updates on my 5.3 frankenkernel,
>> I found that the kernel crashes with Oops: kernel bad access.
>>
>> I tested on the latest upstream kernel build that I have at hand, and the
>> result is the same (minus the message: nothing was logged and the kernel
>> simply rebooted).
> 
> That's disappointing.

It seems to work okay with qemu and mambo injection on upstream
(powernv_defconfig). I wonder why that nmi_enter is crashing.
Can you post the output of a successful test with the patch
reverted?


qemu injection test output:
[  195.279885][C0] Disabling lock debugging due to kernel taint
[  195.280891][C0] MCE: CPU0: machine check (Warning) Host SLB Multihit DAR: deadbeef [Recovered]
[  195.282117][C0] MCE: CPU0: NIP: [c003c2b4] isa300_idle_stop_mayloss+0x68/0x6c
[  195.283631][C0] MCE: CPU0: Initiator CPU
[  195.284432][C0] MCE: CPU0: Probable Software error (some chance of hardware cause)
[  220.711577][   T90] MCE: CPU0: machine check (Warning) Host SLB Multihit DAR: deadbeef [Recovered]
[  220.712805][   T90] MCE: CPU0: PID: 90 Comm: yes NIP: [7fff7fdac2e0]
[  220.713553][   T90] MCE: CPU0: Initiator CPU
[  220.714021][   T90] MCE: CPU0: Probable Software error (some chance of hardware cause)

Thanks,
Nick


Re: Injecting SLB multihit crashes kernel 5.9.0-rc5

2020-09-15 Thread Michael Ellerman
Michal Suchánek  writes:
> Hello,
>
> Using the SLB multihit injection test module (which I did not write, so I
> do not want to post it here) to verify updates on my 5.3 frankenkernel,
> I found that the kernel crashes with Oops: kernel bad access.
>
> I tested on the latest upstream kernel build that I have at hand, and the
> result is the same (minus the message: nothing was logged and the kernel
> simply rebooted).

That's disappointing.

> Since the whole effort to write a real-mode MCE handler was supposed to
> prevent this, maybe the SLB injection module should be added to the
> kernel selftests?

Yes, I'd like to see it upstream. I think it should be integrated into
LKDTM, which contains other dangerous things like that and is designed
for testing how the kernel handles and recovers from bad conditions.

cheers


Injecting SLB multihit crashes kernel 5.9.0-rc5

2020-09-15 Thread Michal Suchánek
Hello,

Using the SLB multihit injection test module (which I did not write, so I
do not want to post it here) to verify updates on my 5.3 frankenkernel,
I found that the kernel crashes with Oops: kernel bad access.

I tested on the latest upstream kernel build that I have at hand, and the
result is the same (minus the message: nothing was logged and the kernel
simply rebooted).

Since the whole effort to write a real-mode MCE handler was supposed to
prevent this, maybe the SLB injection module should be added to the
kernel selftests?

Thanks

Michal