Re: [PATCH] powerpc/64s: Fix local irq disable when PMIs are disabled

2023-02-05 Michael Ellerman
On Sat, 21 Jan 2023 19:53:52 +1000, Nicholas Piggin wrote:
> When PMI interrupts are soft-masked, local_irq_save() will clear the PMI
> mask bit, allowing PMIs in and causing a race condition. This causes a
> deadlock in native_hpte_insert via hash_preload, which depends on PMIs
> being disabled since commit 8b91cee5eadd ("powerpc/64s/hash: Make hash
> faults work in NMI context"). native_hpte_insert calls local_irq_save().
> It's possible the lpar hash code is also affected when tracing is
> enabled because __trace_hcall_entry() calls local_irq_save().
> 
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/64s: Fix local irq disable when PMIs are disabled
  https://git.kernel.org/powerpc/c/bc88ef663265676419555df2dc469a471c0add31

cheers


[PATCH] powerpc/64s: Fix local irq disable when PMIs are disabled

2023-01-21 Nicholas Piggin
When PMI interrupts are soft-masked, local_irq_save() will clear the PMI
mask bit, allowing PMIs in and causing a race condition. This causes a
deadlock in native_hpte_insert via hash_preload, which depends on PMIs
being disabled since commit 8b91cee5eadd ("powerpc/64s/hash: Make hash
faults work in NMI context"). native_hpte_insert calls local_irq_save().
It's possible the lpar hash code is also affected when tracing is
enabled because __trace_hcall_entry() calls local_irq_save().

Fix this by making arch_local_irq_save() _or_ the IRQS_DISABLED bit
into the mask. Add a warning in arch_local_irq_disable() to make sure
it isn't called with PMIs disabled.
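
The difference between setting and OR-ing the mask can be shown with a
small userspace model. This is only a sketch, not the kernel code: the
model_* helpers and the soft_mask variable are hypothetical stand-ins,
and the bit values are assumed from the IRQMASK values in the lockup
report below (1 observed, 3 expected).

```c
#include <assert.h>

/* Soft-mask bits; values assumed from the IRQMASK fields in the report. */
#define IRQS_ENABLED       0UL
#define IRQS_DISABLED      1UL  /* ordinary interrupts soft-masked */
#define IRQS_PMI_DISABLED  2UL  /* performance monitor interrupts soft-masked */
#define IRQS_ALL_DISABLED  (IRQS_DISABLED | IRQS_PMI_DISABLED)

/* Hypothetical stand-in for the per-CPU soft-mask state. */
static unsigned long soft_mask;

/* Old behaviour: overwrite the mask, which drops IRQS_PMI_DISABLED. */
static unsigned long model_irq_save_set(void)
{
    unsigned long flags = soft_mask;

    soft_mask = IRQS_DISABLED;
    return flags;
}

/* Fixed behaviour: OR the bit in, so existing PMI masking is preserved. */
static unsigned long model_irq_save_or(void)
{
    unsigned long flags = soft_mask;

    soft_mask |= IRQS_DISABLED;
    return flags;
}

/* Restore whatever mask was in effect before the save. */
static void model_irq_restore(unsigned long flags)
{
    soft_mask = flags;
}

/* Usage sketch: run both variants starting from "everything masked". */
static void model_self_check(void)
{
    unsigned long flags;

    soft_mask = IRQS_ALL_DISABLED;
    flags = model_irq_save_set();
    assert(soft_mask == IRQS_DISABLED);      /* PMI bit lost: PMIs can get in */
    model_irq_restore(flags);

    soft_mask = IRQS_ALL_DISABLED;
    flags = model_irq_save_or();
    assert(soft_mask == IRQS_ALL_DISABLED);  /* PMI bit kept */
    model_irq_restore(flags);
    assert(soft_mask == IRQS_ALL_DISABLED);
}
```

With the set variant, the mask inside the critical section is 1 even
though the caller entered with 3, which is the window in which a PMI can
be taken. The model does not cover the warning added to
arch_local_irq_disable().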

This was found with the stress_hpt option with a kbuild workload
running together with `perf record -g`.

Fixes: f442d004806e ("powerpc/64s: Add support to mask perf interrupts and replay them")
Fixes: 8b91cee5eadd ("powerpc/64s/hash: Make hash faults work in NMI context")
Signed-off-by: Nicholas Piggin 
---
The lockup looks like this. Note IRQMASK=1 in native_hpte_insert, where
we expect it to be 3.

 watchdog: CPU 16 Hard LOCKUP
 watchdog: CPU 16 TB:6084087529753, last heartbeat TB:6075895318740 (16000ms ago)
 CPU: 16 PID: 9319 Comm: check-local-exp
 NIP:  c008b040 LR: c037cd64 CTR: c0342160
 REGS: c03fffa3fd60 TRAP: 0100   Not tainted
 MSR:  90081033   CR: 88484808  XER: 20040078
 CFAR: c000dc3c IRQMASK: 3
 GPR00: c00e5b10 c00088e17090 c10c0100 c00088e170f0
 GPR04: 7fffc690 0008 c24f0100 fe00
 GPR08: c00012ac4cc0 bcff a8aa 4000
 GPR12: c0342160 c03f2880  
 GPR16:    
 GPR20:    
 GPR24: 0001 fe00 c0002c16d000 0008
 GPR28: 7fdf  7fffc690 c00088e171b0
 NIP [c008b040] __copy_tofrom_user_power7+0x20c/0x7ac
 LR [c037cd64] copy_from_user_nofault+0xa4/0x190
 Call Trace:
 [c00088e17090] [c03feb802030] 0xc03feb802030 (unreliable)
 [c00088e170c0] [c00e5b10] perf_callchain_user_64+0x170/0x4f0
 [c00088e17160] [c00e5980] perf_callchain_user+0x20/0x40
 [c00088e17180] [c035f054] get_perf_callchain+0x184/0x250
 [c00088e17210] [c0357874] perf_callchain+0x94/0xd0
 [c00088e17230] [c035819c] perf_prepare_sample+0x6ac/0x8f0
 [c00088e17290] [c0358428] perf_event_output_forward+0x48/0xc0
 [c00088e17310] [c034d6cc] __perf_event_overflow+0x12c/0x270
 [c00088e17360] [c00e8b80] record_and_restart+0x340/0x830
 [c00088e17580] [c00e9318] perf_event_interrupt+0x2a8/0x4a0
 [c00088e17620] [c0028b64] performance_monitor_exception_nmi+0x64/0xb0
 [c00088e17670] [c000baac] performance_monitor_common_virt+0x2ac/0x390
 --- interrupt: f00 at native_hpte_insert+0x174/0x210
 NIP:  c007be84 LR: c007bdd4 CTR: c007bd10
 REGS: c00088e176a0 TRAP: 0f00   Not tainted  (6.2.0-rc4-00077-gd368967cb103-dirty)
 MSR:  90009033   CR: 44484802  XER: 0078
 CFAR:  IRQMASK: 1
 GPR00: c007d2b8 c00088e17940 c10c0100 c000203fc2347b80
 GPR04: 00b3b9708ff0 0010 0400d5791196 1000
 GPR08: 000b3b970885 c000203fc2347b88 000b3b970884 c2457fd0
 GPR12: c007bd10 c03f2880 c2457e70 ffd1e43b9708
 GPR16: 00b3b9708ff0 c2457e18 0001 0196
 GPR20: c24576b8 0800 0002 0002
 GPR24: d579 0196 0003 000b3b970880
 GPR28:  0001  c000203fc2347b80
 NIP [c007be84] native_hpte_insert+0x174/0x210
 LR [c007bdd4] native_hpte_insert+0xc4/0x210
 --- interrupt: f00
 [c00088e17940] [c00088e179c0] 0xc00088e179c0 (unreliable)
 [c00088e179c0] [c007d2b8] __hash_page_64K+0x218/0x4f0
 [c00088e17a70] [c00761fc] __update_mmu_cache+0x30c/0x3b0
 [c00088e17b10] [c03d00a0] do_wp_page+0xa50/0x1640
 [c00088e17bf0] [c03d3ca4] __handle_mm_fault+0xb94/0x1b90
 [c00088e17d00] [c03d4dc0] handle_mm_fault+0x120/0x300
 [c00088e17d50] [c006cbc4] ___do_page_fault+0x2d4/0xac0
 [c00088e17df0] [c006d460] hash__do_page_fault+0x30/0xc0
 [c00088e17e20] [c0075d88] do_hash_fault+0x258/0x340

Thanks,
Nick
---
 arch/powerpc/include/asm/hw_irq.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
index 77fa88c2aed0..5156fe21284c 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++