Re: [PATCH v3 1/2] powerpc/64s: Fix pte update for kernel memory on radix
On Mon, 8 Feb 2021 14:29:56 +1100, Jordan Niethe wrote:
> When adding a pte a ptesync is needed to order the update of the pte
> with subsequent accesses otherwise a spurious fault may be raised.
>
> radix__set_pte_at() does not do this for performance gains. For
> non-kernel memory this is not an issue as any faults of this kind are
> corrected by the page fault handler. For kernel memory these faults are
> not handled. The current solution is that there is a ptesync in
> flush_cache_vmap() which should be called when mapping from the vmalloc
> region.
>
> [...]

Applied to powerpc/next.

[1/2] powerpc/64s: Fix pte update for kernel memory on radix
      https://git.kernel.org/powerpc/c/b8b2f37cf632434456182e9002d63cbc450c
[2/2] selftests/powerpc: Test for spurious kernel memory faults on radix
      https://git.kernel.org/powerpc/c/29e3ea8cbd2958cf237b84652ec236803f2c6202

cheers
[PATCH v3 1/2] powerpc/64s: Fix pte update for kernel memory on radix
When adding a pte, a ptesync is needed to order the update of the pte
with subsequent accesses; otherwise a spurious fault may be raised.

radix__set_pte_at() does not do this for performance gains. For
non-kernel memory this is not an issue as any faults of this kind are
corrected by the page fault handler. For kernel memory these faults are
not handled. The current solution is that there is a ptesync in
flush_cache_vmap() which should be called when mapping from the vmalloc
region.

However, map_kernel_page() does not call flush_cache_vmap(). This is
troublesome in particular for code patching with Strict RWX on radix. In
do_patch_instruction() the page frame that contains the instruction to
be patched is mapped and then immediately patched. With no ordering or
synchronization between setting up the pte and writing to the page,
faults are possible.

As the code patching is done using __put_user_asm_goto() the resulting
fault is obscured - but using a normal store instead it can be seen:

[ 418.498768][ T757] BUG: Unable to handle kernel data access on write at 0xc00808f24a3c
[ 418.498790][ T757] Faulting instruction address: 0xc008bd74
[ 418.498805][ T757] Oops: Kernel access of bad area, sig: 11 [#1]
[ 418.498828][ T757] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
[ 418.498843][ T757] Modules linked in: nop_module(PO+) [last unloaded: nop_module]
[ 418.498872][ T757] CPU: 4 PID: 757 Comm: sh Tainted: P O 5.10.0-rc5-01361-ge3c1b78c8440-dirty #43
[ 418.498936][ T757] NIP: c008bd74 LR: c008bd50 CTR: c0025810
[ 418.498979][ T757] REGS: c00016f634a0 TRAP: 0300 Tainted: P O (5.10.0-rc5-01361-ge3c1b78c8440-dirty)
[ 418.499033][ T757] MSR: 90009033 CR: 44002884 XER:
[ 418.499084][ T757] CFAR: c007c68c DAR: c00808f24a3c DSISR: 4200 IRQMASK: 1

This results in the kind of issue reported here:
https://lore.kernel.org/linuxppc-dev/15ac5b0e-a221-4b8c-9039-fa96b8ef7...@lca.pw/

Chris Riedl suggested a reliable way to reproduce the issue:

$ mount -t debugfs none /sys/kernel/debug
$ (while true; do echo function > /sys/kernel/debug/tracing/current_tracer ; echo nop > /sys/kernel/debug/tracing/current_tracer ; done)&

Turning ftrace on and off does a large amount of code patching, which
will usually crash in less than 5 minutes, giving a trace like:

[ 146.668988][ T809] ftrace-powerpc: (ptrval): replaced (4b473b11) != old (6000)
[ 146.668995][ T809] [ ftrace bug ]
[ 146.669031][ T809] ftrace failed to modify
[ 146.669039][ T809] [] napi_busy_loop+0xc/0x390
[ 146.669045][ T809] actual: 11:3b:47:4b
[ 146.669070][ T809] Setting ftrace call site to call ftrace function
[ 146.669075][ T809] ftrace record flags: 8001
[ 146.669081][ T809] (1)
[ 146.669081][ T809] expected tramp: c006c96c
[ 146.669096][ T809] [ cut here ]
[ 146.669104][ T809] WARNING: CPU: 4 PID: 809 at kernel/trace/ftrace.c:2065 ftrace_bug+0x28c/0x2e8
[ 146.669109][ T809] Modules linked in: nop_module(PO-) [last unloaded: nop_module]
[ 146.669130][ T809] CPU: 4 PID: 809 Comm: sh Tainted: P O 5.10.0-rc5-01360-gf878ccaf250a #1
[ 146.669136][ T809] NIP: c024f334 LR: c024f330 CTR: c01a5af0
[ 146.669142][ T809] REGS: c4c8b760 TRAP: 0700 Tainted: P O (5.10.0-rc5-01360-gf878ccaf250a)
[ 146.669147][ T809] MSR: 9282b033 CR: 28008848 XER: 2004
[ 146.669208][ T809] CFAR: c01a9c98 IRQMASK: 0
[ 146.669208][ T809] GPR00: c024f330 c4c8b9f0 c2770600 0022
[ 146.669208][ T809] GPR04: 7fff c4c8b6d0 0027 c007fe9bcdd8
[ 146.669208][ T809] GPR08: 0023 ffd8 0027 c2613118
[ 146.669208][ T809] GPR12: 8000 c007fffdca00
[ 146.669208][ T809] GPR16: 23ec37c5 0008
[ 146.669208][ T809] GPR20: c4c8bc90 c27a2d20 c4c8bcd0 c2612fe8
[ 146.669208][ T809] GPR24: 0038 0030 0028 0020
[ 146.669208][ T809] GPR28: c0ff1b68 c0bf8e5c c312f700 c0fbb9b0
[ 146.669384][ T809] NIP [c024f334] ftrace_bug+0x28c/0x2e8
[ 146.669391][ T809] LR [c024f330] ftrace_bug+0x288/0x2e8
[ 146.669396][ T809] Call Trace:
[ 146.669403][ T809] [c4c8b9f0] [c024f330] ftrace_bug+0x288/0x2e8 (unreliable)
[ 146.669418][ T809] [c4c8ba80] [c0248778] ftrace_modify_all_code+0x168/0x210
[ 146.669429][ T809] [c4c8bab0] [c006c528] arch_ftrace_update_code+0x18/0x30
[ 146.669440][ T809] [c4c8bad0] [c0248954] ftrace_run_update_code+0x44/0xc0
[ 1