Re: [PATCH v3 1/2] powerpc/64s: Fix pte update for kernel memory on radix

2021-04-10 Thread Michael Ellerman
On Mon, 8 Feb 2021 14:29:56 +1100, Jordan Niethe wrote:
> When adding a pte a ptesync is needed to order the update of the pte
> with subsequent accesses otherwise a spurious fault may be raised.
> 
> radix__set_pte_at() does not do this for performance gains. For
> non-kernel memory this is not an issue as any faults of this kind are
> corrected by the page fault handler.  For kernel memory these faults are
> not handled.  The current solution is that there is a ptesync in
> flush_cache_vmap() which should be called when mapping from the vmalloc
> region.
> 
> [...]

Applied to powerpc/next.

[1/2] powerpc/64s: Fix pte update for kernel memory on radix
  https://git.kernel.org/powerpc/c/b8b2f37cf632434456182e9002d63cbc450c
[2/2] selftests/powerpc: Test for spurious kernel memory faults on radix
  https://git.kernel.org/powerpc/c/29e3ea8cbd2958cf237b84652ec236803f2c6202

cheers


[PATCH v3 1/2] powerpc/64s: Fix pte update for kernel memory on radix

2021-02-07 Thread Jordan Niethe
When adding a pte, a ptesync is needed to order the update of the pte
with subsequent accesses; otherwise a spurious fault may be raised.

radix__set_pte_at() omits the ptesync for performance reasons. For
non-kernel memory this is not an issue as any faults of this kind are
corrected by the page fault handler.  For kernel memory these faults are
not handled.  The current solution is that there is a ptesync in
flush_cache_vmap() which should be called when mapping from the vmalloc
region.

However, map_kernel_page() does not call flush_cache_vmap(). This is
troublesome in particular for code patching with Strict RWX on radix. In
do_patch_instruction() the page frame that contains the instruction to
be patched is mapped and then immediately patched. With no ordering or
synchronization between setting up the pte and writing to the page, the
write can fault.
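
The ordering requirement described above can be sketched as a small
userspace model. This is only an illustration under stated assumptions,
not the kernel implementation: every model_* name is hypothetical, the
pte is a plain integer, and a generic full fence stands in for the
ptesync instruction so that the sketch builds on any architecture.

```c
#include <stdint.h>

/* Hypothetical stand-in for a page table entry; not the kernel's pte_t. */
typedef uint64_t model_pte_t;

/*
 * Hypothetical barrier helper. Kernel code on powerpc radix would issue
 * the ptesync instruction here; this model uses a generic full fence
 * (an assumption made purely so the sketch is portable).
 */
static inline void model_ptesync(void)
{
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
}

/* Store the pte only, with no ordering against later accesses. */
static inline void model_set_pte(model_pte_t *ptep, model_pte_t pte)
{
	*ptep = pte;
}

/* Store the pte, then order the update before any subsequent access. */
static inline void model_map_kernel_page(model_pte_t *ptep, model_pte_t pte)
{
	model_set_pte(ptep, pte);
	model_ptesync();
}

/*
 * Models the map-then-patch sequence: map the page, then immediately
 * store an instruction word through the new mapping and read it back.
 */
static inline uint32_t model_patch(model_pte_t *ptep, uint32_t *page,
				   uint32_t insn)
{
	model_map_kernel_page(ptep, (model_pte_t)(uintptr_t)page);

	uint32_t *mapped = (uint32_t *)(uintptr_t)*ptep;
	*mapped = insn;
	return *mapped;
}
```

The point of the model is the placement of the barrier: it sits between
the pte store and the first access through the mapping, which is the
step the paragraph above says is missing on the map_kernel_page() path.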

Because the code patching is done using __put_user_asm_goto(), the
resulting fault is obscured, but with a normal store instead it can be seen:

[  418.498768][  T757] BUG: Unable to handle kernel data access on write at 0xc00808f24a3c
[  418.498790][  T757] Faulting instruction address: 0xc008bd74
[  418.498805][  T757] Oops: Kernel access of bad area, sig: 11 [#1]
[  418.498828][  T757] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
[  418.498843][  T757] Modules linked in: nop_module(PO+) [last unloaded: nop_module]
[  418.498872][  T757] CPU: 4 PID: 757 Comm: sh Tainted: P   O  5.10.0-rc5-01361-ge3c1b78c8440-dirty #43
[  418.498936][  T757] NIP:  c008bd74 LR: c008bd50 CTR: c0025810
[  418.498979][  T757] REGS: c00016f634a0 TRAP: 0300   Tainted: P   O   (5.10.0-rc5-01361-ge3c1b78c8440-dirty)
[  418.499033][  T757] MSR:  90009033   CR: 44002884  XER:
[  418.499084][  T757] CFAR: c007c68c DAR: c00808f24a3c DSISR: 4200 IRQMASK: 1

This results in the kind of issue reported here:
https://lore.kernel.org/linuxppc-dev/15ac5b0e-a221-4b8c-9039-fa96b8ef7...@lca.pw/

Chris Riedl suggested a reliable way to reproduce the issue:
$ mount -t debugfs none /sys/kernel/debug
$ (while true; do echo function > /sys/kernel/debug/tracing/current_tracer ; echo nop > /sys/kernel/debug/tracing/current_tracer ; done)&

Turning ftrace on and off does a large amount of code patching, which
usually crashes in less than 5 minutes, giving a trace like:

[  146.668988][  T809] ftrace-powerpc: (ptrval): replaced (4b473b11) != old (6000)
[  146.668995][  T809] [ ftrace bug ]
[  146.669031][  T809] ftrace failed to modify
[  146.669039][  T809] [] napi_busy_loop+0xc/0x390
[  146.669045][  T809]  actual:   11:3b:47:4b
[  146.669070][  T809] Setting ftrace call site to call ftrace function
[  146.669075][  T809] ftrace record flags: 8001
[  146.669081][  T809]  (1)
[  146.669081][  T809]  expected tramp: c006c96c
[  146.669096][  T809] [ cut here ]
[  146.669104][  T809] WARNING: CPU: 4 PID: 809 at kernel/trace/ftrace.c:2065 ftrace_bug+0x28c/0x2e8
[  146.669109][  T809] Modules linked in: nop_module(PO-) [last unloaded: nop_module]
[  146.669130][  T809] CPU: 4 PID: 809 Comm: sh Tainted: P   O  5.10.0-rc5-01360-gf878ccaf250a #1
[  146.669136][  T809] NIP:  c024f334 LR: c024f330 CTR: c01a5af0
[  146.669142][  T809] REGS: c4c8b760 TRAP: 0700   Tainted: P   O   (5.10.0-rc5-01360-gf878ccaf250a)
[  146.669147][  T809] MSR:  9282b033   CR: 28008848  XER: 2004
[  146.669208][  T809] CFAR: c01a9c98 IRQMASK: 0
[  146.669208][  T809] GPR00: c024f330 c4c8b9f0 c2770600 0022
[  146.669208][  T809] GPR04: 7fff c4c8b6d0 0027 c007fe9bcdd8
[  146.669208][  T809] GPR08: 0023 ffd8 0027 c2613118
[  146.669208][  T809] GPR12: 8000 c007fffdca00
[  146.669208][  T809] GPR16: 23ec37c5   0008
[  146.669208][  T809] GPR20: c4c8bc90 c27a2d20 c4c8bcd0 c2612fe8
[  146.669208][  T809] GPR24: 0038 0030 0028 0020
[  146.669208][  T809] GPR28: c0ff1b68 c0bf8e5c c312f700 c0fbb9b0
[  146.669384][  T809] NIP [c024f334] ftrace_bug+0x28c/0x2e8
[  146.669391][  T809] LR [c024f330] ftrace_bug+0x288/0x2e8
[  146.669396][  T809] Call Trace:
[  146.669403][  T809] [c4c8b9f0] [c024f330] ftrace_bug+0x288/0x2e8 (unreliable)
[  146.669418][  T809] [c4c8ba80] [c0248778] ftrace_modify_all_code+0x168/0x210
[  146.669429][  T809] [c4c8bab0] [c006c528] arch_ftrace_update_code+0x18/0x30
[  146.669440][  T809] [c4c8bad0] [c0248954] ftrace_run_update_code+0x44/0xc0
[  1