Re: [PATCH v3 0/4] Fixes for 3 separate NMI reentrancy bugs

2019-02-25 Thread Satheesh Rajendran
On Tue, Feb 26, 2019 at 04:08:57PM +1000, Nicholas Piggin wrote:
> This series fixes several similar but unrelated bugs with NMIs
> clobbering live registers without noticing it, because MSR[RI] is set.
> Pretty rare bugs, but serious silent corruption consequences.
> 
> For the most part these can be observed and tested quite easily
> with the mambo simulator, except that it does not seem to follow
> the architecture wrt leaving MSR[RI] unchanged for HV interrupts.
> Mambo clears MSR[RI], so you have to account for that manually.
> 
> Since v1:
> - Fixed several build bugs.
> 
> Since v2:
> - Improved changelog and comments.
> - Fixed the NIA test for virt mode interrupts.

I hit the crash below on a Power8 box, with this series applied on the
linuxppc merge branch and built with `ppc64le_defconfig`.

UnknownStateTransition: Something happened system state="8" and we transitioned to UNKNOWN state.  Review the following for more details
Message="OpTestSystem in run_IPLing and Exception="Kernel OOPS (machine in state '5'): Oops: Kernel access of bad area, sig: 11 [#1]
[0.00] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
[0.00] Modules linked in:
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc7-gf46b87021 #1
[0.00] NIP:  c0c1306c LR: c0c12f64 CTR: c033d860
[0.00] REGS: c14878b0 TRAP: 0380   Not tainted  (5.0.0-rc7-gf46b87021)
[0.00] MSR:  90001033   CR: 28002224  XER: 
[0.00] CFAR: c0c12f7c IRQMASK: 1 
[0.00] GPR00: c0c12f64 c1487b40 c1488400 f000 
[0.00] GPR04: c1487b18 c1487b20  c1388400 
[0.00] GPR08: f000 f008  0008 
[0.00] GPR12: c15e1ed0 c167  
[0.00] GPR16:   c15e0d40 0001 
[0.00] GPR20:   0800 c1413b90 
[0.00] GPR24: c1413b98 0070 0008 
[0.00] GPR28:   00701000 
[0.00] NIP [c0c1306c] memmap_init_zone+0x258/0x308
[0.00] LR [c0c12f64] memmap_init_zone+0x150/0x308
[0.00] Call Trace:
[0.00] [c1487b40] [c0c12f64] memmap_init_zone+0x150/0x308 (unreliable)
[0.00] [c1487be0] [c0f87acc] free_area_init_node+0x480/0x518
[0.00] [c1487cf0] [c0f88630] free_area_init_nodes+0x838/0x940
[0.00] [c1487e10] [c0f6340c] paging_init+0x8c/0xa8
[0.00] [c1487e80] [c0f5bc00] setup_arch+0x3b4/0x3f0
[0.00] [c1487ef0] [c0f53b68] start_kernel+0x94/0x630
[0.00] [c1487f90] [c000b37c] start_here_common+0x1c/0x520
[0.00] Instruction dump:
[0.00] 71290002 41820014 ebea0008 7cc6fa14 78df8402 4870 3d22000c 7bea3664 
[0.00] 39299d20 e909 7c685214 39230008  fa290018 fa290020 fa290030 
[0.00] random: get_random_bytes called from print_oops_end_marker+0x40/0x80 with crng_init=0
[0.00] ---[ end trace  ]---
[0.00] 
[0.00] Kernel panic - not syncing: Attempted to kill the idle task!
[0.00] Rebooting in 10 seconds" caused the system to go to UNKNOWN_BAD and the system will be stopping."

Regards,
-Satheesh.
> 
> Nicholas Piggin (4):
>   powerpc/64s: Fix HV NMI vs HV interrupt recoverability test
>   powerpc/64s: system reset interrupt preserve HSRRs
>   powerpc/64s: Prepare to handle data interrupts vs d-side MCE reentrancy
>   powerpc/64s: Fix data interrupts vs d-side MCE reentrancy
> 
>  arch/powerpc/include/asm/asm-prototypes.h |  8 ++
>  arch/powerpc/include/asm/nmi.h            |  2 +
>  arch/powerpc/kernel/exceptions-64s.S      | 92 +++
>  arch/powerpc/kernel/mce.c                 |  3 +
>  arch/powerpc/kernel/traps.c               | 91 +-
>  5 files changed, 179 insertions(+), 17 deletions(-)
> 
> -- 
> 2.18.0
> 



[PATCH v3 0/4] Fixes for 3 separate NMI reentrancy bugs

2019-02-25 Thread Nicholas Piggin
This series fixes several similar but unrelated bugs with NMIs
clobbering live registers without noticing it, because MSR[RI] is set.
Pretty rare bugs, but serious silent corruption consequences.

For the most part these can be observed and tested quite easily
with the mambo simulator, except that it does not seem to follow
the architecture wrt leaving MSR[RI] unchanged for HV interrupts.
Mambo clears MSR[RI], so you have to account for that manually.
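
As a rough illustration of the failure mode described above, here is a toy
Python model. It is not kernel code and not part of this series. The class,
the function names, the 0xDEAD marker and the idea that the NMI handler
scribbles on HSRR0 are all assumptions made for the sketch; only the register
names, the MSR[RI] bit and the rule that HV interrupts leave MSR[RI]
unchanged come from the text and the architecture.

```python
from dataclasses import dataclass

MSR_RI = 0x2      # MSR[RI] bit value (recoverable interrupt)
CLOBBER = 0xDEAD  # arbitrary marker for an overwritten register (assumption)
START_NIA = 0x1000


@dataclass
class ToyCPU:
    """Hypothetical, heavily simplified CPU state for illustration only."""
    nia: int = START_NIA
    msr: int = MSR_RI
    srr0: int = 0
    hsrr0: int = 0

    def take_interrupt(self, vector: int) -> None:
        # Non-HV interrupt: return address goes to SRR0, MSR[RI] is cleared.
        self.srr0 = self.nia
        self.msr &= ~MSR_RI
        self.nia = vector

    def take_hv_interrupt(self, vector: int) -> None:
        # HV interrupt: return address goes to HSRR0, MSR[RI] is unchanged.
        self.hsrr0 = self.nia
        self.nia = vector

    def take_nmi(self, preserve_hsrr: bool) -> bool:
        # Report what MSR[RI] said when the NMI arrived.
        looked_recoverable = bool(self.msr & MSR_RI)
        self.srr0 = CLOBBER           # NMI delivery overwrites SRR0
        saved_hsrr0 = self.hsrr0
        self.hsrr0 = CLOBBER          # assumed handler use of HSRR0
        if preserve_hsrr:
            self.hsrr0 = saved_hsrr0  # save/restore around the handler
        return looked_recoverable


def simulate(hv: bool, preserve_hsrr: bool = False):
    """Return (looked_recoverable, return_address_intact)."""
    cpu = ToyCPU()
    if hv:
        cpu.take_hv_interrupt(0xE00)
    else:
        cpu.take_interrupt(0x300)
    looked_recoverable = cpu.take_nmi(preserve_hsrr)
    return_addr = cpu.hsrr0 if hv else cpu.srr0
    return looked_recoverable, return_addr == START_NIA
```

In this model simulate(hv=False) loses the return address but MSR[RI] is
clear, so the loss is noticed. simulate(hv=True) loses it while MSR[RI] still
claims the state is recoverable, which is the silent case. With
preserve_hsrr=True the return address survives.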

Since v1:
- Fixed several build bugs.

Since v2:
- Improved changelog and comments.
- Fixed the NIA test for virt mode interrupts.

Nicholas Piggin (4):
  powerpc/64s: Fix HV NMI vs HV interrupt recoverability test
  powerpc/64s: system reset interrupt preserve HSRRs
  powerpc/64s: Prepare to handle data interrupts vs d-side MCE reentrancy
  powerpc/64s: Fix data interrupts vs d-side MCE reentrancy

 arch/powerpc/include/asm/asm-prototypes.h |  8 ++
 arch/powerpc/include/asm/nmi.h            |  2 +
 arch/powerpc/kernel/exceptions-64s.S      | 92 +++
 arch/powerpc/kernel/mce.c                 |  3 +
 arch/powerpc/kernel/traps.c               | 91 +-
 5 files changed, 179 insertions(+), 17 deletions(-)

-- 
2.18.0