When PMIs are soft-masked, local_irq_save() clears the PMI mask bit,
allowing PMIs to be taken where the caller expects them to stay masked.
This causes a deadlock in native_hpte_insert() via hash_preload(), which
has depended on PMIs being disabled since commit 8b91cee5eadd
("powerpc/64s/hash: Make hash faults work in NMI context"), because
native_hpte_insert() calls local_irq_save(). The lpar hash code may also
be affected when tracing is enabled, because __trace_hcall_entry() calls
local_irq_save().

Fix this by making arch_local_irq_save() OR the IRQS_DISABLED bit into
the mask rather than overwriting it. Add a warning in
arch_local_irq_disable() to make sure it isn't called with PMIs disabled.
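
The difference between overwriting the mask and OR-ing into it can be
sketched with a small userspace model. This is an illustration only, not
the kernel code: the soft_mask variable and the model_* helpers are
hypothetical, and the bit values (1 for IRQS_DISABLED, 2 for
IRQS_PMI_DISABLED) are assumed to match the IRQMASK values 1 and 3 seen
in the lockup report.

```c
/* Userspace model only; names and values are assumptions for illustration. */
#define IRQS_ENABLED       0
#define IRQS_DISABLED      1	/* ordinary interrupts soft-masked */
#define IRQS_PMI_DISABLED  2	/* PMIs soft-masked */
#define IRQS_ALL_DISABLED  (IRQS_DISABLED | IRQS_PMI_DISABLED)

/* Stand-in for the per-CPU soft mask. */
static unsigned long soft_mask = IRQS_ENABLED;

/* Pre-fix model: save overwrites the mask, dropping the PMI bit. */
static unsigned long model_irq_save_old(void)
{
	unsigned long flags = soft_mask;

	soft_mask = IRQS_DISABLED;
	return flags;
}

/* Post-fix model: save ORs IRQS_DISABLED in, preserving the PMI bit. */
static unsigned long model_irq_save_new(void)
{
	unsigned long flags = soft_mask;

	soft_mask |= IRQS_DISABLED;
	return flags;
}

/* Restore puts back whatever mask the matching save returned. */
static void model_irq_restore(unsigned long flags)
{
	soft_mask = flags;
}

/* Nonzero if PMIs are currently soft-masked in the model. */
static int model_pmi_masked(void)
{
	return (soft_mask & IRQS_PMI_DISABLED) != 0;
}
```

Starting from a mask of 3, the old model leaves the mask at 1 inside the
save/restore section, so a PMI could be taken there, while the new model
keeps it at 3.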

This was found with the stress_hpt option with a kbuild workload
running together with `perf record -g`.

Fixes: f442d004806e ("powerpc/64s: Add support to mask perf interrupts and replay them")
Fixes: 8b91cee5eadd ("powerpc/64s/hash: Make hash faults work in NMI context")
Signed-off-by: Nicholas Piggin
---
The lockup looks like this. Note IRQMASK=1 in native_hpte_insert, where
we expect it to be 3:

watchdog: CPU 16 Hard LOCKUP
watchdog: CPU 16 TB:6084087529753, last heartbeat TB:6075895318740 (16000ms ago)
CPU: 16 PID: 9319 Comm: check-local-exp
NIP: c008b040 LR: c037cd64 CTR: c0342160
REGS: c03fffa3fd60 TRAP: 0100 Not tainted
MSR: 90081033 CR: 88484808 XER: 20040078
CFAR: c000dc3c IRQMASK: 3
GPR00: c00e5b10 c00088e17090 c10c0100 c00088e170f0
GPR04: 7fffc690 0008 c24f0100 fe00
GPR08: c00012ac4cc0 bcff a8aa 4000
GPR12: c0342160 c03f2880
GPR16:
GPR20:
GPR24: 0001 fe00 c0002c16d000 0008
GPR28: 7fdf 7fffc690 c00088e171b0
NIP [c008b040] __copy_tofrom_user_power7+0x20c/0x7ac
LR [c037cd64] copy_from_user_nofault+0xa4/0x190
Call Trace:
[c00088e17090] [c03feb802030] 0xc03feb802030 (unreliable)
[c00088e170c0] [c00e5b10] perf_callchain_user_64+0x170/0x4f0
[c00088e17160] [c00e5980] perf_callchain_user+0x20/0x40
[c00088e17180] [c035f054] get_perf_callchain+0x184/0x250
[c00088e17210] [c0357874] perf_callchain+0x94/0xd0
[c00088e17230] [c035819c] perf_prepare_sample+0x6ac/0x8f0
[c00088e17290] [c0358428] perf_event_output_forward+0x48/0xc0
[c00088e17310] [c034d6cc] __perf_event_overflow+0x12c/0x270
[c00088e17360] [c00e8b80] record_and_restart+0x340/0x830
[c00088e17580] [c00e9318] perf_event_interrupt+0x2a8/0x4a0
[c00088e17620] [c0028b64] performance_monitor_exception_nmi+0x64/0xb0
[c00088e17670] [c000baac] performance_monitor_common_virt+0x2ac/0x390
--- interrupt: f00 at native_hpte_insert+0x174/0x210
NIP: c007be84 LR: c007bdd4 CTR: c007bd10
REGS: c00088e176a0 TRAP: 0f00 Not tainted (6.2.0-rc4-00077-gd368967cb103-dirty)
MSR: 90009033 CR: 44484802 XER: 0078
CFAR: IRQMASK: 1
GPR00: c007d2b8 c00088e17940 c10c0100 c000203fc2347b80
GPR04: 00b3b9708ff0 0010 0400d5791196 1000
GPR08: 000b3b970885 c000203fc2347b88 000b3b970884 c2457fd0
GPR12: c007bd10 c03f2880 c2457e70 ffd1e43b9708
GPR16: 00b3b9708ff0 c2457e18 0001 0196
GPR20: c24576b8 0800 0002 0002
GPR24: d579 0196 0003 000b3b970880
GPR28: 0001 c000203fc2347b80
NIP [c007be84] native_hpte_insert+0x174/0x210
LR [c007bdd4] native_hpte_insert+0xc4/0x210
--- interrupt: f00
[c00088e17940] [c00088e179c0] 0xc00088e179c0 (unreliable)
[c00088e179c0] [c007d2b8] __hash_page_64K+0x218/0x4f0
[c00088e17a70] [c00761fc] __update_mmu_cache+0x30c/0x3b0
[c00088e17b10] [c03d00a0] do_wp_page+0xa50/0x1640
[c00088e17bf0] [c03d3ca4] __handle_mm_fault+0xb94/0x1b90
[c00088e17d00] [c03d4dc0] handle_mm_fault+0x120/0x300
[c00088e17d50] [c006cbc4] ___do_page_fault+0x2d4/0xac0
[c00088e17df0] [c006d460] hash__do_page_fault+0x30/0xc0
[c00088e17e20] [c0075d88] do_hash_fault+0x258/0x340
Thanks,
Nick
---
arch/powerpc/include/asm/hw_irq.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
index 77fa88c2aed0..5156fe21284c 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++