On Tue, Feb 26, 2019 at 04:08:57PM +1000, Nicholas Piggin wrote:
> This series fixes several similar but unrelated bugs with NMIs
> clobbering live registers without noticing it, because MSR[RI] is set.
> Pretty rare bugs, but serious silent corruption consequences.
>
> For the most part these can be observed and tested quite easily
> with the mambo simulator, except that it does not seem to follow
> the architecture wrt leaving MSR[RI] unchanged for HV interrupts.
> Mambo clears MSR[RI], so you have to account for that manually.
>
> Since v1:
> - Fixed several build bugs.
>
> Since v2:
> - Improved changelog and comments.
> - Fixed the NIA test for virt mode interrupts.

Hit the crash below on a Power8 box, with this series applied on top of the linuxppc merge branch and built with `ppc64le_defconfig`:
UnknownStateTransition: Something happened system state="8" and we transitioned to UNKNOWN state. Review the following for more details
Message="OpTestSystem in run_IPLing and Exception="Kernel OOPS (machine in state '5'): Oops: Kernel access of bad area, sig: 11 [#1]
[0.00] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
[0.00] Modules linked in:
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc7-gf46b87021 #1
[0.00] NIP: c0c1306c LR: c0c12f64 CTR: c033d860
[0.00] REGS: c14878b0 TRAP: 0380 Not tainted (5.0.0-rc7-gf46b87021)
[0.00] MSR: 90001033 CR: 28002224 XER:
[0.00] CFAR: c0c12f7c IRQMASK: 1
[0.00] GPR00: c0c12f64 c1487b40 c1488400 f000
[0.00] GPR04: c1487b18 c1487b20 c1388400
[0.00] GPR08: f000 f008 0008
[0.00] GPR12: c15e1ed0 c167
[0.00] GPR16: c15e0d40 0001
[0.00] GPR20: 0800 c1413b90
[0.00] GPR24: c1413b98 0070 0008
[0.00] GPR28: 00701000
[0.00] NIP [c0c1306c] memmap_init_zone+0x258/0x308
[0.00] LR [c0c12f64] memmap_init_zone+0x150/0x308
[0.00] Call Trace:
[0.00] [c1487b40] [c0c12f64] memmap_init_zone+0x150/0x308 (unreliable)
[0.00] [c1487be0] [c0f87acc] free_area_init_node+0x480/0x518
[0.00] [c1487cf0] [c0f88630] free_area_init_nodes+0x838/0x940
[0.00] [c1487e10] [c0f6340c] paging_init+0x8c/0xa8
[0.00] [c1487e80] [c0f5bc00] setup_arch+0x3b4/0x3f0
[0.00] [c1487ef0] [c0f53b68] start_kernel+0x94/0x630
[0.00] [c1487f90] [c000b37c] start_here_common+0x1c/0x520
[0.00] Instruction dump:
[0.00] 71290002 41820014 ebea0008 7cc6fa14 78df8402 4870 3d22000c 7bea3664
[0.00] 39299d20 e909 7c685214 39230008 fa290018 fa290020 fa290030
[0.00] random: get_random_bytes called from print_oops_end_marker+0x40/0x80 with crng_init=0
[0.00] ---[ end trace ]---
[0.00]
[0.00] Kernel panic - not syncing: Attempted to kill the idle task!
[0.00] Rebooting in 10 seconds" caused the system to go to UNKNOWN_BAD and the system will be stopping."
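
For reference when reading the register dump, here is a small C sketch that decodes the MSR bits relevant to this series, in particular MSR[RI]. It assumes the full MSR value is 0x9000000000001033 (the capture above appears to have lost runs of zeros, e.g. in the addresses and the empty XER field). It covers only a subset of bits, using the positions from arch/powerpc/include/asm/reg.h. msr_is_recoverable() and decode_msr() are hypothetical illustrative helpers, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Subset of MSR bit positions (cf. MSR_*_LG in asm/reg.h). */
struct msr_bit {
	int shift;
	const char *name;
};

/* Listed from most to least significant bit. */
static const struct msr_bit msr_bits[] = {
	{ 63, "SF" }, { 60, "HV" }, { 25, "VEC" }, { 23, "VSX" },
	{ 15, "EE" }, { 14, "PR" }, { 13, "FP" },  { 12, "ME" },
	{ 5, "IR" },  { 4, "DR" },  { 1, "RI" },   { 0, "LE" },
};

/* MSR[RI] (bit 1) set means the interrupted context is marked recoverable. */
static bool msr_is_recoverable(uint64_t msr)
{
	return (msr >> 1) & 1;
}

/* Write a comma-separated list of the set bits (from the subset) into buf. */
static void decode_msr(uint64_t msr, char *buf, size_t len)
{
	size_t used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(msr_bits) / sizeof(msr_bits[0]); i++) {
		if (!((msr >> msr_bits[i].shift) & 1))
			continue;
		int n = snprintf(buf + used, len - used, "%s%s",
				 used ? "," : "", msr_bits[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break; /* output truncated; buf stays NUL-terminated */
		used += (size_t)n;
	}
}
```

With the assumed value, decode_msr() yields "SF,HV,ME,IR,DR,RI,LE", i.e. MSR[RI] was set in the interrupted context.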
Regards,
-Satheesh.
>
> Nicholas Piggin (4):
> powerpc/64s: Fix HV NMI vs HV interrupt recoverability test
> powerpc/64s: system reset interrupt preserve HSRRs
> powerpc/64s: Prepare to handle data interrupts vs d-side MCE reentrancy
> powerpc/64s: Fix data interrupts vs d-side MCE reentrancy
>
> arch/powerpc/include/asm/asm-prototypes.h |  8 ++
> arch/powerpc/include/asm/nmi.h            |  2 +
> arch/powerpc/kernel/exceptions-64s.S      | 92 +++
> arch/powerpc/kernel/mce.c                 |  3 +
> arch/powerpc/kernel/traps.c               | 91 +-
> 5 files changed, 179 insertions(+), 17 deletions(-)
>
> --
> 2.18.0
>