[PATCH v4 0/5] Add generic data patching functions
Currently patch_instruction() bases the write length on the value being
written. If the value looks like a prefixed instruction it writes 8 bytes,
otherwise it writes 4 bytes. This makes it potentially buggy to use for
writing arbitrary data: if you want to write 4 bytes but it decides to
write 8 bytes, it may clobber the following memory, or be unaligned and
trigger an oops if the write crosses a page boundary.

To solve this, this series pulls the size parameter out to the 'top' of
the memory patching logic and propagates it through the various functions.

The two sizes supported are int and long; this allows for patching
instructions and pointers on both ppc32 and ppc64. On ppc32 these are the
same size, so care is taken to only use the size parameter in static
functions, so the compiler can optimise it out entirely. Unfortunately GCC
trips over its own feet here and won't optimise in a way that is optimal
for both strict RWX (mpc85xx_smp_defconfig) and no RWX (pmac32_defconfig).
More details in the v2 cover letter.

Changes from v3:
* Improved the boot test. Now that alignment is enforced, it checks the
  word (but not doubleword) aligned patch_ulong().

Changes from v2:
* Various changes noted on each patch
* Data patching now enforced to be aligned
* Restore page aligned flushing optimisation

Changes from v1:
* Addressed the v1 review actions
* Removed noinline (for now)

v3: https://patchwork.ozlabs.org/project/linuxppc-dev/cover/20240325055302.876434-1-bg...@linux.ibm.com/
v2: https://patchwork.ozlabs.org/project/linuxppc-dev/cover/20231016050147.115686-1-bg...@linux.ibm.com/
v1: https://patchwork.ozlabs.org/project/linuxppc-dev/cover/20230207015643.590684-1-bg...@linux.ibm.com/

Benjamin Gray (5):
  powerpc/code-patching: Add generic memory patching
  powerpc/code-patching: Add data patch alignment check
  powerpc/64: Convert patch_instruction() to patch_u32()
  powerpc/32: Convert patch_instruction() to patch_uint()
  powerpc/code-patching: Add boot selftest for data patching

 arch/powerpc/include/asm/code-patching.h | 37 ++++++++++++
 arch/powerpc/kernel/module_64.c          |  5 +-
 arch/powerpc/kernel/static_call.c        |  2 +-
 arch/powerpc/lib/code-patching.c         | 70 +++++++++++++++--------
 arch/powerpc/lib/test-code-patching.c    | 41 ++++++++++++++
 arch/powerpc/platforms/powermac/smp.c    |  2 +-
 6 files changed, 137 insertions(+), 20 deletions(-)

-- 
2.45.0
[PATCH v4 5/5] powerpc/code-patching: Add boot selftest for data patching
Extend the code patching selftests with some basic coverage of the new
data patching variants too.

Signed-off-by: Benjamin Gray

---

v4: * Change store to a check
    * Account for doubleword alignment
v3: * New in v3
---
 arch/powerpc/lib/test-code-patching.c | 41 +++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/arch/powerpc/lib/test-code-patching.c b/arch/powerpc/lib/test-code-patching.c
index f76030087f98..8cd3b32f805b 100644
--- a/arch/powerpc/lib/test-code-patching.c
+++ b/arch/powerpc/lib/test-code-patching.c
@@ -438,6 +438,46 @@ static void __init test_multi_instruction_patching(void)
 	vfree(buf);
 }
 
+static void __init test_data_patching(void)
+{
+	void *buf;
+	u32 *addr32;
+
+	buf = vzalloc(PAGE_SIZE);
+	check(buf);
+	if (!buf)
+		return;
+
+	addr32 = buf + 128;
+
+	addr32[1] = 0xA0A1A2A3;
+	addr32[2] = 0xB0B1B2B3;
+
+	check(!patch_uint(&addr32[1], 0xC0C1C2C3));
+
+	check(addr32[0] == 0);
+	check(addr32[1] == 0xC0C1C2C3);
+	check(addr32[2] == 0xB0B1B2B3);
+	check(addr32[3] == 0);
+
+	/* Unaligned patch_ulong() should fail */
+	if (IS_ENABLED(CONFIG_PPC64))
+		check(patch_ulong(&addr32[1], 0xD0D1D2D3) == -EINVAL);
+
+	check(!patch_ulong(&addr32[2], 0xD0D1D2D3));
+
+	check(addr32[0] == 0);
+	check(addr32[1] == 0xC0C1C2C3);
+	check(*(unsigned long *)(&addr32[2]) == 0xD0D1D2D3);
+
+	if (!IS_ENABLED(CONFIG_PPC64))
+		check(addr32[3] == 0);
+
+	check(addr32[4] == 0);
+
+	vfree(buf);
+}
+
 static int __init test_code_patching(void)
 {
 	pr_info("Running code patching self-tests ...\n");
@@ -448,6 +488,7 @@ static int __init test_code_patching(void)
 	test_translate_branch();
 	test_prefixed_patching();
 	test_multi_instruction_patching();
+	test_data_patching();
 
 	return 0;
 }
-- 
2.45.0
[PATCH v4 1/5] powerpc/code-patching: Add generic memory patching
patch_instruction() is designed for patching instructions in otherwise
readonly memory. Other consumers also sometimes need to patch readonly
memory, so have abused patch_instruction() for arbitrary data patches.

This is a problem on ppc64 as patch_instruction() decides on the patch
width using the 'instruction' opcode to see if it's a prefixed
instruction. Data that triggers this can lead to larger writes, possibly
crossing a page boundary and failing the write altogether.

Introduce patch_uint() and patch_ulong(), with aliases patch_u32() and
patch_u64() (on ppc64), designed for aligned data patches. The patch
size is now determined by the called function, and is passed as an
additional parameter to generic internals.

While the instruction flushing is not required for data patches, it
remains unconditional in this patch. A followup series is possible if
benchmarking shows fewer flushes give an improvement in some
data-patching workload.

ppc32 does not support prefixed instructions, so it is unaffected by the
original issue. Care is taken not to expose the size parameter in the
public (non-static) interface, so the compiler can const-propagate it
away.
Signed-off-by: Benjamin Gray

---

v3: * Rename from *_memory to *_mem
    * Change type of ppc32 patch_uint() address to void*
    * Explain introduction of val32 for big endian
    * Some formatting
v2: * Deduplicate patch_32() definition
    * Use u32 for val32
    * Remove noinline
---
 arch/powerpc/include/asm/code-patching.h | 31 +++++++++++++++
 arch/powerpc/lib/code-patching.c         | 64 ++++++++++++++++++--------
 2 files changed, 80 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index 0e29ccf903d0..21a36e2c4e26 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -76,6 +76,37 @@ int patch_instruction(u32 *addr, ppc_inst_t instr);
 int raw_patch_instruction(u32 *addr, ppc_inst_t instr);
 int patch_instructions(u32 *addr, u32 *code, size_t len, bool repeat_instr);
 
+/*
+ * The data patching functions patch_uint() and patch_ulong(), etc., must be
+ * called on aligned addresses.
+ *
+ * The instruction patching functions patch_instruction() and similar must be
+ * called on addresses satisfying instruction alignment requirements.
+ */
+
+#ifdef CONFIG_PPC64
+
+int patch_uint(void *addr, unsigned int val);
+int patch_ulong(void *addr, unsigned long val);
+
+#define patch_u64 patch_ulong
+
+#else
+
+static inline int patch_uint(void *addr, unsigned int val)
+{
+	return patch_instruction(addr, ppc_inst(val));
+}
+
+static inline int patch_ulong(void *addr, unsigned long val)
+{
+	return patch_instruction(addr, ppc_inst(val));
+}
+
+#endif
+
+#define patch_u32 patch_uint
+
 static inline unsigned long patch_site_addr(s32 *site)
 {
 	return (unsigned long)site + *site;
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index df64343b9214..18f762616db9 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -20,15 +20,14 @@
 #include
 #include
 
-static int __patch_instruction(u32 *exec_addr, ppc_inst_t instr, u32 *patch_addr)
+static int __patch_mem(void *exec_addr, unsigned long val, void *patch_addr, bool is_dword)
 {
-	if (!ppc_inst_prefixed(instr)) {
-		u32 val = ppc_inst_val(instr);
+	if (!IS_ENABLED(CONFIG_PPC64) || likely(!is_dword)) {
+		/* For big endian correctness: plain address would use the wrong half */
+		u32 val32 = val;
 
-		__put_kernel_nofault(patch_addr, &val, u32, failed);
+		__put_kernel_nofault(patch_addr, &val32, u32, failed);
 	} else {
-		u64 val = ppc_inst_as_ulong(instr);
-
 		__put_kernel_nofault(patch_addr, &val, u64, failed);
 	}
 
@@ -44,7 +43,10 @@
 
 int raw_patch_instruction(u32 *addr, ppc_inst_t instr)
 {
-	return __patch_instruction(addr, instr, addr);
+	if (ppc_inst_prefixed(instr))
+		return __patch_mem(addr, ppc_inst_as_ulong(instr), addr, true);
+	else
+		return __patch_mem(addr, ppc_inst_val(instr), addr, false);
 }
 
 struct patch_context {
@@ -276,7 +278,7 @@ static void unmap_patch_area(unsigned long addr)
 	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
 }
 
-static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
+static int __do_patch_mem_mm(void *addr, unsigned long val, bool is_dword)
 {
 	int err;
 	u32 *patch_addr;
@@ -305,7 +307,7 @@ static int __do_patch_mem_mm(void *addr, unsigned long val, bool is_dword)
 	orig_mm = start_using_temp_mm(patching_mm);
 
-	err = __patch_instruction(addr, instr, patch_addr);
+	err = __patch_mem(addr, val, patch_addr, is_dword);
 
 	/* context synchronisation performed
[PATCH v4 2/5] powerpc/code-patching: Add data patch alignment check
The new data patching still needs to be aligned within a cacheline too
for the flushes to work correctly. To simplify this requirement, we just
say data patches must be aligned.

Detect when data patching is not aligned, returning an invalid argument
error.

Signed-off-by: Benjamin Gray

---

v3: * New in v3
---
 arch/powerpc/include/asm/code-patching.h | 6 ++++++
 arch/powerpc/lib/code-patching.c         | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index 21a36e2c4e26..e7f14720f630 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -95,11 +95,17 @@ int patch_ulong(void *addr, unsigned long val);
 
 static inline int patch_uint(void *addr, unsigned int val)
 {
+	if (!IS_ALIGNED((unsigned long)addr, sizeof(unsigned int)))
+		return -EINVAL;
+
 	return patch_instruction(addr, ppc_inst(val));
 }
 
 static inline int patch_ulong(void *addr, unsigned long val)
 {
+	if (!IS_ALIGNED((unsigned long)addr, sizeof(unsigned long)))
+		return -EINVAL;
+
 	return patch_instruction(addr, ppc_inst(val));
 }
 
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 18f762616db9..384b275d1bc5 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -386,12 +386,18 @@ NOKPROBE_SYMBOL(patch_instruction);
 
 int patch_uint(void *addr, unsigned int val)
 {
+	if (!IS_ALIGNED((unsigned long)addr, sizeof(unsigned int)))
+		return -EINVAL;
+
 	return patch_mem(addr, val, false);
 }
 NOKPROBE_SYMBOL(patch_uint);
 
 int patch_ulong(void *addr, unsigned long val)
 {
+	if (!IS_ALIGNED((unsigned long)addr, sizeof(unsigned long)))
+		return -EINVAL;
+
 	return patch_mem(addr, val, true);
 }
 NOKPROBE_SYMBOL(patch_ulong);
-- 
2.45.0
[PATCH v4 4/5] powerpc/32: Convert patch_instruction() to patch_uint()
These changes are for patch_instruction() uses on data. Unlike ppc64,
these should not be incorrect as-is, but using the patch_uint() alias
better reflects what kind of data is being patched and allows for
benchmarking the effect of different patch_* implementations (e.g.,
skipping instruction flushing when patching data).

Signed-off-by: Benjamin Gray

---
 arch/powerpc/kernel/static_call.c     | 2 +-
 arch/powerpc/platforms/powermac/smp.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/static_call.c b/arch/powerpc/kernel/static_call.c
index 863a7aa24650..1502b7e439ca 100644
--- a/arch/powerpc/kernel/static_call.c
+++ b/arch/powerpc/kernel/static_call.c
@@ -17,7 +17,7 @@ void arch_static_call_transform(void *site, void *tramp, void *func, bool tail)
 	mutex_lock(&text_mutex);
 
 	if (func && !is_short) {
-		err = patch_instruction(tramp + PPC_SCT_DATA, ppc_inst(target));
+		err = patch_ulong(tramp + PPC_SCT_DATA, target);
 		if (err)
 			goto out;
 	}
diff --git a/arch/powerpc/platforms/powermac/smp.c b/arch/powerpc/platforms/powermac/smp.c
index 15644be31990..d21b681f52fb 100644
--- a/arch/powerpc/platforms/powermac/smp.c
+++ b/arch/powerpc/platforms/powermac/smp.c
@@ -827,7 +827,7 @@ static int smp_core99_kick_cpu(int nr)
 	mdelay(1);
 
 	/* Restore our exception vector */
-	patch_instruction(vector, ppc_inst(save_vector));
+	patch_uint(vector, save_vector);
 
 	local_irq_restore(flags);
 	if (ppc_md.progress) ppc_md.progress("smp_core99_kick_cpu done", 0x347);
-- 
2.45.0
[PATCH v4 3/5] powerpc/64: Convert patch_instruction() to patch_u32()
This use of patch_instruction() is working on 32 bit data, and can fail
if the data looks like a prefixed instruction and the extra write
crosses a page boundary. Use patch_u32() to fix the write size.

Fixes: 8734b41b3efe ("powerpc/module_64: Fix livepatching for RO modules")
Link: https://lore.kernel.org/all/20230203004649.1f59dbd4@yea/
Signed-off-by: Benjamin Gray

---

v2: * Added the fixes tag, it seems appropriate even if the subject does
      mention a more robust solution being required.

      patch_u64() should be more efficient, but judging from the bug
      report it doesn't seem like the data is doubleword aligned.
---
 arch/powerpc/kernel/module_64.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 7112adc597a8..e9bab599d0c2 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -651,12 +651,11 @@ static inline int create_stub(const Elf64_Shdr *sechdrs,
 	// func_desc_t is 8 bytes if ABIv2, else 16 bytes
 	desc = func_desc(addr);
 	for (i = 0; i < sizeof(func_desc_t) / sizeof(u32); i++) {
-		if (patch_instruction(((u32 *)&entry->funcdata) + i,
-				      ppc_inst(((u32 *)&desc)[i])))
+		if (patch_u32(((u32 *)&entry->funcdata) + i, ((u32 *)&desc)[i]))
 			return 0;
 	}
 
-	if (patch_instruction(&entry->magic, ppc_inst(STUB_MAGIC)))
+	if (patch_u32(&entry->magic, STUB_MAGIC))
 		return 0;
 
 	return 1;
-- 
2.45.0
Re: [PATCH v3 3/5] powerpc/64: Convert patch_instruction() to patch_u32()
On Tue, 2024-04-23 at 15:09 +0530, Naveen N Rao wrote: > On Mon, Mar 25, 2024 at 04:53:00PM +1100, Benjamin Gray wrote: > > This use of patch_instruction() is working on 32 bit data, and can > > fail > > if the data looks like a prefixed instruction and the extra write > > crosses a page boundary. Use patch_u32() to fix the write size. > > > > Fixes: 8734b41b3efe ("powerpc/module_64: Fix livepatching for RO > > modules") > > Link: https://lore.kernel.org/all/20230203004649.1f59dbd4@yea/ > > Signed-off-by: Benjamin Gray > > > > --- > > > > v2: * Added the fixes tag, it seems appropriate even if the subject > > does > > mention a more robust solution being required. > > > > patch_u64() should be more efficient, but judging from the bug > > report > > it doesn't seem like the data is doubleword aligned. > > Asking again, is that still the case? It looks like at least the > first > fix below can be converted to patch_u64(). > > - Naveen Sorry, I think I forgot this question last time. Reading the commit descriptions you linked, I don't see any mention of "entry->funcdata will always be doubleword aligned because XYZ". If the patch makes it doubleword aligned anyway, I wouldn't be confident asserting all callers will always do this without looking into it a lot more. Perhaps a separate series could optimise it with appropriate justification/assertions to catch bad alignment. But I think leaving it out of this series is fine because the original works in words, so it's not regressing anything. 
> > > --- > > arch/powerpc/kernel/module_64.c | 5 ++--- > > 1 file changed, 2 insertions(+), 3 deletions(-) > > > > diff --git a/arch/powerpc/kernel/module_64.c > > b/arch/powerpc/kernel/module_64.c > > index 7112adc597a8..e9bab599d0c2 100644 > > --- a/arch/powerpc/kernel/module_64.c > > +++ b/arch/powerpc/kernel/module_64.c > > @@ -651,12 +651,11 @@ static inline int create_stub(const > > Elf64_Shdr *sechdrs, > > // func_desc_t is 8 bytes if ABIv2, else 16 bytes > > desc = func_desc(addr); > > for (i = 0; i < sizeof(func_desc_t) / sizeof(u32); i++) { > > - if (patch_instruction(((u32 *)>funcdata) + > > i, > > - ppc_inst(((u32 > > *)())[i]))) > > + if (patch_u32(((u32 *)>funcdata) + i, ((u32 > > *))[i])) > > return 0; > > } > > > > - if (patch_instruction(>magic, > > ppc_inst(STUB_MAGIC))) > > + if (patch_u32(>magic, STUB_MAGIC)) > > return 0; > > > > return 1; > > -- > > 2.44.0 > >
[PATCH v1 9/9] Documentation: Document PowerPC kernel dynamic DEXCR interface
Documents how to use the PR_PPC_GET_DEXCR and PR_PPC_SET_DEXCR prctl()'s for changing a process's DEXCR or its process tree default value. Signed-off-by: Benjamin Gray --- Documentation/arch/powerpc/dexcr.rst | 141 ++- 1 file changed, 139 insertions(+), 2 deletions(-) diff --git a/Documentation/arch/powerpc/dexcr.rst b/Documentation/arch/powerpc/dexcr.rst index 615a631f51fa..ab0724212fcd 100644 --- a/Documentation/arch/powerpc/dexcr.rst +++ b/Documentation/arch/powerpc/dexcr.rst @@ -36,8 +36,145 @@ state for a process. Configuration = -The DEXCR is currently unconfigurable. All threads are run with the -NPHIE aspect enabled. +prctl +- + +A process can control its own userspace DEXCR value using the +``PR_PPC_GET_DEXCR`` and ``PR_PPC_SET_DEXCR`` pair of +:manpage:`prctl(2)` commands. These calls have the form:: + +prctl(PR_PPC_GET_DEXCR, unsigned long which, 0, 0, 0); +prctl(PR_PPC_SET_DEXCR, unsigned long which, unsigned long ctrl, 0, 0); + +The possible 'which' and 'ctrl' values are as follows. Note there is no relation +between the 'which' value and the DEXCR aspect's index. + +.. flat-table:: + :header-rows: 1 + :widths: 2 7 1 + + * - ``prctl()`` which + - Aspect name + - Aspect index + + * - ``PR_PPC_DEXCR_SBHE`` + - Speculative Branch Hint Enable (SBHE) + - 0 + + * - ``PR_PPC_DEXCR_IBRTPD`` + - Indirect Branch Recurrent Target Prediction Disable (IBRTPD) + - 3 + + * - ``PR_PPC_DEXCR_SRAPD`` + - Subroutine Return Address Prediction Disable (SRAPD) + - 4 + + * - ``PR_PPC_DEXCR_NPHIE`` + - Non-Privileged Hash Instruction Enable (NPHIE) + - 5 + +.. 
flat-table:: + :header-rows: 1 + :widths: 2 8 + + * - ``prctl()`` ctrl + - Meaning + + * - ``PR_PPC_DEXCR_CTRL_EDITABLE`` + - This aspect can be configured with PR_PPC_SET_DEXCR (get only) + + * - ``PR_PPC_DEXCR_CTRL_SET`` + - This aspect is set / set this aspect + + * - ``PR_PPC_DEXCR_CTRL_CLEAR`` + - This aspect is clear / clear this aspect + + * - ``PR_PPC_DEXCR_CTRL_SET_ONEXEC`` + - This aspect will be set after exec / set this aspect after exec + + * - ``PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC`` + - This aspect will be clear after exec / clear this aspect after exec + +Note that + +* which is a plain value, not a bitmask. Aspects must be worked with individually. + +* ctrl is a bitmask. ``PR_PPC_GET_DEXCR`` returns both the current and onexec + configuration. For example, ``PR_PPC_GET_DEXCR`` may return + ``PR_PPC_DEXCR_CTRL_EDITABLE | PR_PPC_DEXCR_CTRL_SET | + PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC``. This would indicate the aspect is currently + set, it will be cleared when you run exec, and you can change this with the + ``PR_PPC_SET_DEXCR`` prctl. + +* The set/clear terminology refers to setting/clearing the bit in the DEXCR. + For example:: + + prctl(PR_PPC_SET_DEXCR, PR_PPC_DEXCR_IBRTPD, PR_PPC_DEXCR_CTRL_SET, 0, 0); + + will set the IBRTPD aspect bit in the DEXCR, causing indirect branch prediction + to be disabled. + +* The status returned by ``PR_PPC_GET_DEXCR`` represents what value the process + would like applied. It does not include any alternative overrides, such as if + the hypervisor is enforcing the aspect be set. To see the true DEXCR state + software should read the appropriate SPRs directly. + +* The aspect state when starting a process is copied from the parent's state on + :manpage:`fork(2)`. The state is reset to a fixed value on + :manpage:`execve(2)`. The PR_PPC_SET_DEXCR prctl() can control both of these + values. + +* The ``*_ONEXEC`` controls do not change the current process's DEXCR. 
+ +Use ``PR_PPC_SET_DEXCR`` with one of ``PR_PPC_DEXCR_CTRL_SET`` or +``PR_PPC_DEXCR_CTRL_CLEAR`` to edit a given aspect. + +Common error codes for both getting and setting the DEXCR are as follows: + +.. flat-table:: + :header-rows: 1 + :widths: 2 8 + + * - Error + - Meaning + + * - ``EINVAL`` + - The DEXCR is not supported by the kernel. + + * - ``ENODEV`` + - The aspect is not recognised by the kernel or not supported by the + hardware. + +``PR_PPC_SET_DEXCR`` may also report the following error codes: + +.. flat-table:: + :header-rows: 1 + :widths: 2 8 + + * - Error + - Meaning + + * - ``EINVAL`` + - The ctrl value contains unrecognised flags. + + * - ``EINVAL`` + - The ctrl value contains mutually conflicting flags (e.g., + ``PR_PPC_DEXCR_CTRL_SET | PR_PPC_DEXCR_CTRL_CLEAR``) + + * - ``EPERM`` + - This aspect cannot be modified with prctl() (check for the + PR_PPC_DEXCR_CTRL_EDITABLE flag with PR_PPC_GET_DEXCR). + + * - ``EPERM`` + - The process does not have sufficient privilege to perform the operation. + For example, clearing NPHIE on exec is a privileged operation (a process + can still clear its own NPHIE aspect without privileges). + +This interface allows a process to control its own DEXCR
[PATCH v1 8/9] selftests/powerpc/dexcr: Add chdexcr utility
Adds a utility to exercise the prctl DEXCR inheritance in the shell.
Supports setting and clearing each aspect.

Signed-off-by: Benjamin Gray

---
 .../selftests/powerpc/dexcr/.gitignore         |   1 +
 .../testing/selftests/powerpc/dexcr/Makefile   |   2 +-
 .../testing/selftests/powerpc/dexcr/chdexcr.c  | 110 +++++++++++++++
 tools/testing/selftests/powerpc/dexcr/dexcr.h  |  47 +++++++
 .../testing/selftests/powerpc/dexcr/lsdexcr.c  | 130 ++++--------------
 5 files changed, 185 insertions(+), 105 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/dexcr/chdexcr.c

diff --git a/tools/testing/selftests/powerpc/dexcr/.gitignore b/tools/testing/selftests/powerpc/dexcr/.gitignore
index 5d526613cd26..11eefb4b9fa4 100644
--- a/tools/testing/selftests/powerpc/dexcr/.gitignore
+++ b/tools/testing/selftests/powerpc/dexcr/.gitignore
@@ -1,3 +1,4 @@
 dexcr_test
 hashchk_test
+chdexcr
 lsdexcr
diff --git a/tools/testing/selftests/powerpc/dexcr/Makefile b/tools/testing/selftests/powerpc/dexcr/Makefile
index 076943193c07..6a89e88ef7b2 100644
--- a/tools/testing/selftests/powerpc/dexcr/Makefile
+++ b/tools/testing/selftests/powerpc/dexcr/Makefile
@@ -1,5 +1,5 @@
 TEST_GEN_PROGS := dexcr_test hashchk_test
-TEST_GEN_FILES := lsdexcr
+TEST_GEN_FILES := lsdexcr chdexcr
 
 include ../../lib.mk
diff --git a/tools/testing/selftests/powerpc/dexcr/chdexcr.c b/tools/testing/selftests/powerpc/dexcr/chdexcr.c
new file mode 100644
index 000000000000..217187a83224
--- /dev/null
+++ b/tools/testing/selftests/powerpc/dexcr/chdexcr.c
@@ -0,0 +1,110 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "dexcr.h"
+#include "utils.h"
+
+static void die(const char *msg)
+{
+	printf("%s\n", msg);
+	exit(1);
+}
+
+static void help(void)
+{
+	printf("Invoke a provided program with a custom DEXCR on-exec reset value\n"
+	       "\n"
+	       "usage: chdexcr [CHDEXCR OPTIONS] -- PROGRAM [ARGS...]\n"
+	       "\n"
+	       "Each configurable DEXCR aspect is exposed as an option.\n"
+	       "\n"
+	       "The normal option sets the aspect in the DEXCR. The --no- variant\n"
+	       "clears that aspect. For example, --ibrtpd sets the IBRTPD aspect bit,\n"
+	       "so indirect branch prediction will be disabled in the provided program.\n"
+	       "Conversely, --no-ibrtpd clears the aspect bit, so indirect branch\n"
+	       "prediction may occur.\n"
+	       "\n"
+	       "CHDEXCR OPTIONS:\n");
+
+	for (int i = 0; i < ARRAY_SIZE(aspects); i++) {
+		const struct dexcr_aspect *aspect = &aspects[i];
+
+		if (aspect->prctl == -1)
+			continue;
+
+		printf("  --%-6s / --no-%-6s : %s\n", aspect->opt, aspect->opt, aspect->desc);
+	}
+}
+
+static const struct dexcr_aspect *opt_to_aspect(const char *opt)
+{
+	for (int i = 0; i < ARRAY_SIZE(aspects); i++)
+		if (aspects[i].prctl != -1 && !strcmp(aspects[i].opt, opt))
+			return &aspects[i];
+
+	return NULL;
+}
+
+static int apply_option(const char *option)
+{
+	const struct dexcr_aspect *aspect;
+	const char *opt = NULL;
+	const char *set_prefix = "--";
+	const char *clear_prefix = "--no-";
+	unsigned long ctrl = 0;
+	int err;
+
+	if (!strcmp(option, "-h") || !strcmp(option, "--help")) {
+		help();
+		exit(0);
+	}
+
+	/* Strip out --(no-) prefix and determine ctrl value */
+	if (!strncmp(option, clear_prefix, strlen(clear_prefix))) {
+		opt = &option[strlen(clear_prefix)];
+		ctrl |= PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC;
+	} else if (!strncmp(option, set_prefix, strlen(set_prefix))) {
+		opt = &option[strlen(set_prefix)];
+		ctrl |= PR_PPC_DEXCR_CTRL_SET_ONEXEC;
+	}
+
+	if (!opt || !*opt)
+		return 1;
+
+	aspect = opt_to_aspect(opt);
+	if (!aspect)
+		die("unknown aspect");
+
+	err = pr_set_dexcr(aspect->prctl, ctrl);
+	if (err)
+		die("failed to apply option");
+
+	return 0;
+}
+
+int main(int argc, char *const argv[], char *const envp[])
+{
+	int i;
+
+	if (!dexcr_exists())
+		die("DEXCR not detected on this hardware");
+
+	for (i = 1; i < argc; i++)
+		if (apply_option(argv[i]))
+			break;
+
+	if (i < argc && !strcmp(argv[i], "--"))
+		i++;
+
+	if (i >= argc)
+		die("missing command");
+
+	execve(argv[i], &argv[i], envp);
[PATCH v1 7/9] selftests/powerpc/dexcr: Add DEXCR config details to lsdexcr
Now that the DEXCR can be configured with prctl, add a section in lsdexcr that explains why each aspect is set the way it is. Signed-off-by: Benjamin Gray --- .../testing/selftests/powerpc/dexcr/lsdexcr.c | 113 +- 1 file changed, 111 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/powerpc/dexcr/lsdexcr.c b/tools/testing/selftests/powerpc/dexcr/lsdexcr.c index 94abbfcc389e..a63db47b6610 100644 --- a/tools/testing/selftests/powerpc/dexcr/lsdexcr.c +++ b/tools/testing/selftests/powerpc/dexcr/lsdexcr.c @@ -1,9 +1,9 @@ // SPDX-License-Identifier: GPL-2.0+ -#include #include #include #include +#include #include "dexcr.h" #include "utils.h" @@ -16,6 +16,8 @@ struct dexcr_aspect { const char *name; const char *desc; unsigned int index; + unsigned long prctl; + const char *sysctl; }; static const struct dexcr_aspect aspects[] = { @@ -23,26 +25,36 @@ static const struct dexcr_aspect aspects[] = { .name = "SBHE", .desc = "Speculative branch hint enable", .index = 0, + .prctl = PR_PPC_DEXCR_SBHE, + .sysctl = "speculative_branch_hint_enable", }, { .name = "IBRTPD", .desc = "Indirect branch recurrent target prediction disable", .index = 3, + .prctl = PR_PPC_DEXCR_IBRTPD, + .sysctl = "indirect_branch_recurrent_target_prediction_disable", }, { .name = "SRAPD", .desc = "Subroutine return address prediction disable", .index = 4, + .prctl = PR_PPC_DEXCR_SRAPD, + .sysctl = "subroutine_return_address_prediction_disable", }, { .name = "NPHIE", .desc = "Non-privileged hash instruction enable", .index = 5, + .prctl = PR_PPC_DEXCR_NPHIE, + .sysctl = "nonprivileged_hash_instruction_enable", }, { .name = "PHIE", .desc = "Privileged hash instruction enable", .index = 6, + .prctl = -1, + .sysctl = NULL, }, }; @@ -60,7 +72,7 @@ static void print_dexcr(char *name, unsigned int bits) const char *enabled_aspects[ARRAY_SIZE(aspects) + 1] = {NULL}; size_t j = 0; - printf("%s: %08x", name, bits); + printf("%s: 0x%08x", name, bits); if (bits == 0) { printf("\n"); @@ -103,6 +115,95 
@@ static void print_aspect(const struct dexcr_aspect *aspect) printf(" \t(%s)\n", aspect->desc); } +static void print_aspect_config(const struct dexcr_aspect *aspect) +{ + char sysctl_path[128] = "/proc/sys/kernel/dexcr/"; + const char *reason = "unknown"; + const char *reason_hyp = NULL; + const char *reason_sysctl = "no sysctl"; + const char *reason_prctl = "no prctl"; + bool actual = effective & DEXCR_PR_BIT(aspect->index); + bool expected = false; + + long sysctl_ctrl = 0; + int prctl_ctrl = 0; + int err; + + if (aspect->prctl >= 0) { + prctl_ctrl = pr_get_dexcr(aspect->prctl); + if (prctl_ctrl < 0) + reason_prctl = "(failed to read prctl)"; + else { + if (prctl_ctrl & PR_PPC_DEXCR_CTRL_SET) { + reason_prctl = "set by prctl"; + expected = true; + } else if (prctl_ctrl & PR_PPC_DEXCR_CTRL_CLEAR) { + reason_prctl = "cleared by prctl"; + expected = false; + } else + reason_prctl = "unknown prctl"; + + reason = reason_prctl; + } + } + + if (aspect->sysctl) { + strcat(sysctl_path, aspect->sysctl); + err = read_long(sysctl_path, _ctrl, 10); + if (err) + reason_sysctl = "(failed to read sysctl)"; + else { + switch (sysctl_ctrl) { + case 0: + reason_sysctl = "cleared by sysctl"; + reason = reason_sysctl; + expected = false; + break; + case 1: + reason_sysctl = "set by sysctl"; + reason = reason_sysc
[PATCH v1 6/9] selftests/powerpc/dexcr: Attempt to enable NPHIE in hashchk selftest
Now that a process can control its DEXCR to some extent, make the hashchk tests more reliable by explicitly setting the local and onexec NPHIE aspect. Signed-off-by: Benjamin Gray --- tools/testing/selftests/powerpc/dexcr/hashchk_test.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/powerpc/dexcr/hashchk_test.c b/tools/testing/selftests/powerpc/dexcr/hashchk_test.c index 7d5658c9ebe4..645224bdc142 100644 --- a/tools/testing/selftests/powerpc/dexcr/hashchk_test.c +++ b/tools/testing/selftests/powerpc/dexcr/hashchk_test.c @@ -21,8 +21,14 @@ static int require_nphie(void) { SKIP_IF_MSG(!dexcr_exists(), "DEXCR not supported"); + + pr_set_dexcr(PR_PPC_DEXCR_NPHIE, PR_PPC_DEXCR_CTRL_SET | PR_PPC_DEXCR_CTRL_SET_ONEXEC); + + if (get_dexcr(EFFECTIVE) & DEXCR_PR_NPHIE) + return 0; + SKIP_IF_MSG(!(get_dexcr(EFFECTIVE) & DEXCR_PR_NPHIE), - "DEXCR[NPHIE] not enabled"); + "Failed to enable DEXCR[NPHIE]"); return 0; } -- 2.44.0
[PATCH v1 5/9] selftests/powerpc/dexcr: Add DEXCR prctl interface test
Some basic tests of the prctl interface of the DEXCR. Signed-off-by: Benjamin Gray --- .../selftests/powerpc/dexcr/.gitignore| 1 + .../testing/selftests/powerpc/dexcr/Makefile | 4 +- tools/testing/selftests/powerpc/dexcr/dexcr.c | 40 tools/testing/selftests/powerpc/dexcr/dexcr.h | 10 + .../selftests/powerpc/dexcr/dexcr_test.c | 213 ++ 5 files changed, 267 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/powerpc/dexcr/dexcr_test.c diff --git a/tools/testing/selftests/powerpc/dexcr/.gitignore b/tools/testing/selftests/powerpc/dexcr/.gitignore index b82f45dd46b9..5d526613cd26 100644 --- a/tools/testing/selftests/powerpc/dexcr/.gitignore +++ b/tools/testing/selftests/powerpc/dexcr/.gitignore @@ -1,2 +1,3 @@ +dexcr_test hashchk_test lsdexcr diff --git a/tools/testing/selftests/powerpc/dexcr/Makefile b/tools/testing/selftests/powerpc/dexcr/Makefile index 829ad075b4a4..076943193c07 100644 --- a/tools/testing/selftests/powerpc/dexcr/Makefile +++ b/tools/testing/selftests/powerpc/dexcr/Makefile @@ -1,8 +1,10 @@ -TEST_GEN_PROGS := hashchk_test +TEST_GEN_PROGS := dexcr_test hashchk_test TEST_GEN_FILES := lsdexcr include ../../lib.mk +CFLAGS += $(KHDR_INCLUDES) + $(OUTPUT)/hashchk_test: CFLAGS += -fno-pie -no-pie $(call cc-option,-mno-rop-protect) $(TEST_GEN_PROGS): ../harness.c ../utils.c ./dexcr.c diff --git a/tools/testing/selftests/powerpc/dexcr/dexcr.c b/tools/testing/selftests/powerpc/dexcr/dexcr.c index 65ec5347de98..468fd0dc9912 100644 --- a/tools/testing/selftests/powerpc/dexcr/dexcr.c +++ b/tools/testing/selftests/powerpc/dexcr/dexcr.c @@ -3,6 +3,7 @@ #include #include #include +#include #include #include @@ -43,6 +44,45 @@ bool dexcr_exists(void) return exists; } +unsigned int pr_which_to_aspect(unsigned long which) +{ + switch (which) { + case PR_PPC_DEXCR_SBHE: + return DEXCR_PR_SBHE; + case PR_PPC_DEXCR_IBRTPD: + return DEXCR_PR_IBRTPD; + case PR_PPC_DEXCR_SRAPD: + return DEXCR_PR_SRAPD; + case PR_PPC_DEXCR_NPHIE: + return DEXCR_PR_NPHIE; + 
default: + FAIL_IF_EXIT_MSG(true, "unknown PR aspect"); + } +} + +int pr_get_dexcr(unsigned long which) +{ + return prctl(PR_PPC_GET_DEXCR, which, 0UL, 0UL, 0UL); +} + +int pr_set_dexcr(unsigned long which, unsigned long ctrl) +{ + return prctl(PR_PPC_SET_DEXCR, which, ctrl, 0UL, 0UL); +} + +bool pr_dexcr_aspect_supported(unsigned long which) +{ + if (pr_get_dexcr(which) == -1) + return errno == ENODEV; + + return true; +} + +bool pr_dexcr_aspect_editable(unsigned long which) +{ + return pr_get_dexcr(which) & PR_PPC_DEXCR_CTRL_EDITABLE; +} + /* * Just test if a bad hashchk triggers a signal, without checking * for support or if the NPHIE aspect is enabled. diff --git a/tools/testing/selftests/powerpc/dexcr/dexcr.h b/tools/testing/selftests/powerpc/dexcr/dexcr.h index f55cbbc8643b..a6aa7eac11da 100644 --- a/tools/testing/selftests/powerpc/dexcr/dexcr.h +++ b/tools/testing/selftests/powerpc/dexcr/dexcr.h @@ -28,6 +28,16 @@ bool dexcr_exists(void); +bool pr_dexcr_aspect_supported(unsigned long which); + +bool pr_dexcr_aspect_editable(unsigned long which); + +int pr_get_dexcr(unsigned long pr_aspect); + +int pr_set_dexcr(unsigned long pr_aspect, unsigned long ctrl); + +unsigned int pr_which_to_aspect(unsigned long which); + bool hashchk_triggers(void); enum dexcr_source { diff --git a/tools/testing/selftests/powerpc/dexcr/dexcr_test.c b/tools/testing/selftests/powerpc/dexcr/dexcr_test.c new file mode 100644 index ..4662c823a4f1 --- /dev/null +++ b/tools/testing/selftests/powerpc/dexcr/dexcr_test.c @@ -0,0 +1,213 @@ +#include +#include +#include +#include +#include +#include + +#include "dexcr.h" +#include "utils.h" + +/* + * Helper function for testing the behaviour of a newly exec-ed process + */ +static int dexcr_prctl_onexec_test_child(unsigned long which, const char *status) +{ + unsigned long dexcr = mfspr(SPRN_DEXCR_RO); + unsigned long aspect = pr_which_to_aspect(which); + int ctrl = pr_get_dexcr(which); + + if (!strcmp(status, "set")) { + 
FAIL_IF_EXIT_MSG(!(ctrl & PR_PPC_DEXCR_CTRL_SET), +"setting aspect across exec not applied"); + + FAIL_IF_EXIT_MSG(!(ctrl & PR_PPC_DEXCR_CTRL_SET_ONEXEC), +"setting aspect across exec not inherited"); + + FAIL_IF_EXIT_MSG(!(aspect & dexcr), "setting aspect across exec did not take effect"); + } else if (!strcmp(status, "clear")) { + FAIL_IF_EXIT_MSG(!(ctrl & PR_PPC_DEXCR_CTRL_CLEAR), +"clearing aspect across exec not applied"); + +
[PATCH v1 3/9] powerpc/dexcr: Reset DEXCR value across exec
Inheriting the DEXCR across exec can have security and usability concerns. If a program is compiled with hash instructions it generally expects to run with NPHIE enabled. But if the parent process disables NPHIE then, unless it is careful, it will be disabled for any children too, and the protection offered by hash checks is basically worthless. This patch introduces a per-process reset value that new execs in a particular process tree are initialized with. This enables fine-grained control over what DEXCR value child processes run with by default. For example, containers running legacy binaries that expect hash instructions to act as NOPs could configure the reset value of the container root to control the default reset value for all members of the container. Signed-off-by: Benjamin Gray --- This differs from the previous iterations by making the reset value totally independent of the process's current DEXCR value. The original iterations stored an 'inherit' bitmask, where the mask decides which of the current DEXCR aspects are kept across exec, and the others are reset to a fixed default. That approach has a flaw when trying to prevent unauthorized inherited hash check disabling. With a hierarchy A -> B -> C of process execs, suppose A is privileged and enables NPHIE as an inherited aspect. If B is unprivileged but clears its NPHIE aspect, the clear is unintentionally inherited down to C as well (which may be setuid). This new approach lets processes control the reset value directly without any reference to the values the process itself is using. Compared to the original approach, we don't run into an issue where an unprivileged middle child ever controls the reset value. 
--- arch/powerpc/include/asm/processor.h | 2 +- arch/powerpc/kernel/Makefile | 1 + arch/powerpc/kernel/dexcr.c | 21 + arch/powerpc/kernel/process.c| 7 +++ 4 files changed, 30 insertions(+), 1 deletion(-) create mode 100644 arch/powerpc/kernel/dexcr.c diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index 882e31296ea6..aad85a24134a 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -261,7 +261,7 @@ struct thread_struct { unsigned long sier3; unsigned long hashkeyr; unsigned long dexcr; - + unsigned long dexcr_onexec; /* Reset value to load on exec */ #endif }; diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index d3282fbea4f2..1d183b077948 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -87,6 +87,7 @@ obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o obj-$(CONFIG_PPC_DAWR) += dawr.o obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_ppc970.o cpu_setup_pa6t.o obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_power.o +obj-$(CONFIG_PPC_BOOK3S_64)+= dexcr.o obj-$(CONFIG_PPC_BOOK3S_64)+= mce.o mce_power.o obj-$(CONFIG_PPC_BOOK3E_64)+= exceptions-64e.o idle_64e.o obj-$(CONFIG_PPC_BARRIER_NOSPEC) += security.o diff --git a/arch/powerpc/kernel/dexcr.c b/arch/powerpc/kernel/dexcr.c new file mode 100644 index ..f65c359968cc --- /dev/null +++ b/arch/powerpc/kernel/dexcr.c @@ -0,0 +1,21 @@ +#include +#include +#include +#include +#include + +#include +#include +#include +#include + +static int __init init_task_dexcr(void) +{ + if (!early_cpu_has_feature(CPU_FTR_ARCH_31)) + return 0; + + current->thread.dexcr_onexec = mfspr(SPRN_DEXCR); + + return 0; +} +early_initcall(init_task_dexcr) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index d482c3fd81d7..8ab779a3bdde 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -1641,6 +1641,13 @@ void arch_setup_new_exec(void) current->thread.regs->amr = 
default_amr; current->thread.regs->iamr = default_iamr; #endif + +#ifdef CONFIG_PPC_BOOK3S_64 + if (cpu_has_feature(CPU_FTR_ARCH_31)) { + current->thread.dexcr = current->thread.dexcr_onexec; + mtspr(SPRN_DEXCR, current->thread.dexcr); + } +#endif /* CONFIG_PPC_BOOK3S_64 */ } #ifdef CONFIG_PPC64 -- 2.44.0
[PATCH v1 4/9] powerpc/dexcr: Add DEXCR prctl interface
Now that we track a DEXCR on a per-task basis, individual tasks are free to configure it as they like. The interface is a pair of getter/setter prctl's that work on a single aspect at a time (multiple aspects at once is more difficult if there are different rules applied for each aspect, now or in future). The getter shows the current state of the process config, and the setter allows setting/clearing the aspect. Signed-off-by: Benjamin Gray --- I'm intentionally trying to avoid saying 'enabling' or 'disabling' the aspect, as that could be confusing when the aspects themselves may 'enable' or 'disable' their feature when the aspect is set. --- arch/powerpc/include/asm/processor.h | 10 +++ arch/powerpc/kernel/dexcr.c | 99 include/uapi/linux/prctl.h | 16 + kernel/sys.c | 16 + 4 files changed, 141 insertions(+) diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index aad85a24134a..e44cac0da346 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -334,6 +334,16 @@ extern int set_endian(struct task_struct *tsk, unsigned int val); extern int get_unalign_ctl(struct task_struct *tsk, unsigned long adr); extern int set_unalign_ctl(struct task_struct *tsk, unsigned int val); +#ifdef CONFIG_PPC_BOOK3S_64 + +#define PPC_GET_DEXCR_ASPECT(tsk, asp) get_dexcr_prctl((tsk), (asp)) +#define PPC_SET_DEXCR_ASPECT(tsk, asp, val) set_dexcr_prctl((tsk), (asp), (val)) + +int get_dexcr_prctl(struct task_struct *tsk, unsigned long asp); +int set_dexcr_prctl(struct task_struct *tsk, unsigned long asp, unsigned long val); + +#endif + extern void load_fp_state(struct thread_fp_state *fp); extern void store_fp_state(struct thread_fp_state *fp); extern void load_vr_state(struct thread_vr_state *vr); diff --git a/arch/powerpc/kernel/dexcr.c b/arch/powerpc/kernel/dexcr.c index f65c359968cc..f50247b3daa1 100644 --- a/arch/powerpc/kernel/dexcr.c +++ b/arch/powerpc/kernel/dexcr.c @@ -19,3 +19,102 @@ static int __init 
init_task_dexcr(void) return 0; } early_initcall(init_task_dexcr) + +/* Allow thread local configuration of these by default */ +#define DEXCR_PRCTL_EDITABLE ( \ + DEXCR_PR_IBRTPD | \ + DEXCR_PR_SRAPD | \ + DEXCR_PR_NPHIE) + +static int prctl_to_aspect(unsigned long which, unsigned int *aspect) +{ + switch (which) { + case PR_PPC_DEXCR_SBHE: + *aspect = DEXCR_PR_SBHE; + break; + case PR_PPC_DEXCR_IBRTPD: + *aspect = DEXCR_PR_IBRTPD; + break; + case PR_PPC_DEXCR_SRAPD: + *aspect = DEXCR_PR_SRAPD; + break; + case PR_PPC_DEXCR_NPHIE: + *aspect = DEXCR_PR_NPHIE; + break; + default: + return -ENODEV; + } + + return 0; +} + +int get_dexcr_prctl(struct task_struct *task, unsigned long which) +{ + unsigned int aspect; + int ret; + + ret = prctl_to_aspect(which, &aspect); + if (ret) + return ret; + + if (aspect & DEXCR_PRCTL_EDITABLE) + ret |= PR_PPC_DEXCR_CTRL_EDITABLE; + + if (aspect & mfspr(SPRN_DEXCR)) + ret |= PR_PPC_DEXCR_CTRL_SET; + else + ret |= PR_PPC_DEXCR_CTRL_CLEAR; + + if (aspect & task->thread.dexcr_onexec) + ret |= PR_PPC_DEXCR_CTRL_SET_ONEXEC; + else + ret |= PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC; + + return ret; +} + +int set_dexcr_prctl(struct task_struct *task, unsigned long which, unsigned long ctrl) +{ + unsigned long dexcr; + unsigned int aspect; + int err = 0; + + err = prctl_to_aspect(which, &aspect); + if (err) + return err; + + if (!(aspect & DEXCR_PRCTL_EDITABLE)) + return -EPERM; + + if (ctrl & ~PR_PPC_DEXCR_CTRL_MASK) + return -EINVAL; + + if (ctrl & PR_PPC_DEXCR_CTRL_SET && ctrl & PR_PPC_DEXCR_CTRL_CLEAR) + return -EINVAL; + + if (ctrl & PR_PPC_DEXCR_CTRL_SET_ONEXEC && ctrl & PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC) + return -EINVAL; + + /* +* We do not want an unprivileged process being able to disable +* a setuid process's hash check instructions +*/ + if (aspect == DEXCR_PR_NPHIE && ctrl & PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC && !capable(CAP_SYS_ADMIN)) + return -EPERM; + + dexcr = mfspr(SPRN_DEXCR); + + if (ctrl & PR_PPC_DEXCR_CTRL_SET) + dexcr |= aspect; + else if (ctrl & 
PR_PPC_DEXCR_CTRL_CLEAR) + dexcr &= ~aspect; + + if (ctrl & PR_PPC_DEXCR_CTRL_SET_ONEXEC) + task->thread.dexcr_onexec |= aspect; + else if (ctrl & PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC) + task->
[PATCH v1 1/9] selftests/powerpc/dexcr: Add -no-pie to hashchk tests
The hashchk tests want to verify that the hash key is changed over exec. It does so by calculating hashes at the same address across an exec. This is made simpler by disabling PIE functionality, so we can re-execute ourselves and be using the same addresses in the child. While -fno-pie is already added, -no-pie is also required. Fixes: ca64da7574f8 ("selftests/powerpc/dexcr: Add hashst/hashchk test") Signed-off-by: Benjamin Gray --- This is not related to features introduced in this series, just fixes the test added in the static DEXCR series. --- tools/testing/selftests/powerpc/dexcr/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/powerpc/dexcr/Makefile b/tools/testing/selftests/powerpc/dexcr/Makefile index 76210f2bcec3..829ad075b4a4 100644 --- a/tools/testing/selftests/powerpc/dexcr/Makefile +++ b/tools/testing/selftests/powerpc/dexcr/Makefile @@ -3,7 +3,7 @@ TEST_GEN_FILES := lsdexcr include ../../lib.mk -$(OUTPUT)/hashchk_test: CFLAGS += -fno-pie $(call cc-option,-mno-rop-protect) +$(OUTPUT)/hashchk_test: CFLAGS += -fno-pie -no-pie $(call cc-option,-mno-rop-protect) $(TEST_GEN_PROGS): ../harness.c ../utils.c ./dexcr.c $(TEST_GEN_FILES): ../utils.c ./dexcr.c -- 2.44.0
[PATCH v1 2/9] powerpc/dexcr: Track the DEXCR per-process
Add capability to make the DEXCR act as a per-process SPR. We do not yet have an interface for changing the values per task. We also expect the kernel to use a single DEXCR value across all tasks while in privileged state, so there is no need to synchronize after changing it (the userspace aspects will synchronize upon returning to userspace). Signed-off-by: Benjamin Gray --- arch/powerpc/include/asm/processor.h | 1 + arch/powerpc/kernel/process.c| 10 ++ arch/powerpc/kernel/ptrace/ptrace-view.c | 7 +-- 3 files changed, 12 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index b2c51d337e60..882e31296ea6 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -260,6 +260,7 @@ struct thread_struct { unsigned long sier2; unsigned long sier3; unsigned long hashkeyr; + unsigned long dexcr; #endif }; diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 9452a54d356c..d482c3fd81d7 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -1185,6 +1185,9 @@ static inline void save_sprs(struct thread_struct *t) if (cpu_has_feature(CPU_FTR_DEXCR_NPHIE)) t->hashkeyr = mfspr(SPRN_HASHKEYR); + + if (cpu_has_feature(CPU_FTR_ARCH_31)) + t->dexcr = mfspr(SPRN_DEXCR); #endif } @@ -1267,6 +1270,10 @@ static inline void restore_sprs(struct thread_struct *old_thread, if (cpu_has_feature(CPU_FTR_DEXCR_NPHIE) && old_thread->hashkeyr != new_thread->hashkeyr) mtspr(SPRN_HASHKEYR, new_thread->hashkeyr); + + if (cpu_has_feature(CPU_FTR_ARCH_31) && + old_thread->dexcr != new_thread->dexcr) + mtspr(SPRN_DEXCR, new_thread->dexcr); #endif } @@ -1878,6 +1885,9 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args) #ifdef CONFIG_PPC_BOOK3S_64 if (cpu_has_feature(CPU_FTR_DEXCR_NPHIE)) p->thread.hashkeyr = current->thread.hashkeyr; + + if (cpu_has_feature(CPU_FTR_ARCH_31)) + p->thread.dexcr = mfspr(SPRN_DEXCR); 
#endif return 0; } diff --git a/arch/powerpc/kernel/ptrace/ptrace-view.c b/arch/powerpc/kernel/ptrace/ptrace-view.c index 584cf5c3df50..c1819e0a6684 100644 --- a/arch/powerpc/kernel/ptrace/ptrace-view.c +++ b/arch/powerpc/kernel/ptrace/ptrace-view.c @@ -469,12 +469,7 @@ static int dexcr_get(struct task_struct *target, const struct user_regset *regse if (!cpu_has_feature(CPU_FTR_ARCH_31)) return -ENODEV; - /* -* The DEXCR is currently static across all CPUs, so we don't -* store the target's value anywhere, but the static value -* will also be correct. -*/ - membuf_store(&to, (u64)lower_32_bits(DEXCR_INIT)); + membuf_store(&to, (u64)lower_32_bits(target->thread.dexcr)); /* * Technically the HDEXCR is per-cpu, but a hypervisor can't reasonably -- 2.44.0
[PATCH v1 0/9] Add dynamic DEXCR support
Adds support for a process to change its DEXCR value. The implementation is somewhat conservative; SBHE (speculative branch hint enable) is not exposed as an editable aspect because its effects can bleed over to other threads. As explained in the third patch, this series changes the reset/inherit behaviour on exec. Previously there was a bitmask that tracked which aspects to copy from the current state vs resetting to a fixed default. This allows unprivileged processes to disable ROP protection for setuid binaries though, and is generally a weird interface to work with. The actual intent (and new implementation) tracks the exec value as an independent value that doesn't use the parent's DEXCR at all. The parent can control this reset value separately to its own DEXCR value. The other interesting part is the prctl interface. I've made the _SET, _CLEAR, _SET_ONEXEC, and _CLEAR_ONEXEC controls each a separate flag. This makes it easier to re-use with the getter prctl, as opposed to making set/clear a boolean value with a separate flag for if it's 'on-exec'. With separate flags you can return both the current and on-exec state in the getter in the same way you'd prepare it for the setter. There are still more features that can be added. A global switch to disable ROP protection could be useful as an option to prevent performance degradation. Also a prctl to randomise the hash key could be useful for when userspace knows a fork is not going to run any parent hashes. These features could be added in a future series (or the next version of this one :) ) though. 
Benjamin Gray (9): selftests/powerpc/dexcr: Add -no-pie to hashchk tests powerpc/dexcr: Track the DEXCR per-process powerpc/dexcr: Reset DEXCR value across exec powerpc/dexcr: Add DEXCR prctl interface selftests/powerpc/dexcr: Add DEXCR prctl interface test selftests/powerpc/dexcr: Attempt to enable NPHIE in hashchk selftest selftests/powerpc/dexcr: Add DEXCR config details to lsdexcr selftests/powerpc/dexcr: Add chdexcr utility Documentation: Document PowerPC kernel dynamic DEXCR interface Documentation/arch/powerpc/dexcr.rst | 141 +++- arch/powerpc/include/asm/processor.h | 13 +- arch/powerpc/kernel/Makefile | 1 + arch/powerpc/kernel/dexcr.c | 120 ++ arch/powerpc/kernel/process.c | 17 ++ arch/powerpc/kernel/ptrace/ptrace-view.c | 7 +- include/uapi/linux/prctl.h| 16 ++ kernel/sys.c | 16 ++ .../selftests/powerpc/dexcr/.gitignore| 2 + .../testing/selftests/powerpc/dexcr/Makefile | 8 +- .../testing/selftests/powerpc/dexcr/chdexcr.c | 110 + tools/testing/selftests/powerpc/dexcr/dexcr.c | 40 tools/testing/selftests/powerpc/dexcr/dexcr.h | 57 + .../selftests/powerpc/dexcr/dexcr_test.c | 213 ++ .../selftests/powerpc/dexcr/hashchk_test.c| 8 +- .../testing/selftests/powerpc/dexcr/lsdexcr.c | 103 ++--- 16 files changed, 823 insertions(+), 49 deletions(-) create mode 100644 arch/powerpc/kernel/dexcr.c create mode 100644 tools/testing/selftests/powerpc/dexcr/chdexcr.c create mode 100644 tools/testing/selftests/powerpc/dexcr/dexcr_test.c -- 2.44.0
[PATCH v1] powerpc: Error on assembly warnings
We currently enable -Werror on the arch/powerpc subtree. However this only catches C warnings. Assembly warnings are logged, but the make invocation will still succeed. This can allow incorrect syntax such as ori r3, r4, r5 to be compiled without catching that the assembler is treating r5 as the immediate value 5. To prevent this in assembly files and inline assembly, add the -fatal-warnings option to assembler invocations. Signed-off-by: Benjamin Gray --- arch/powerpc/Kbuild | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/Kbuild b/arch/powerpc/Kbuild index 22cd0d55a892..da862e9558bc 100644 --- a/arch/powerpc/Kbuild +++ b/arch/powerpc/Kbuild @@ -1,5 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 -subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror +subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror -Wa,-fatal-warnings +subdir-asflags-$(CONFIG_PPC_WERROR) := -Wa,-fatal-warnings obj-y += kernel/ obj-y += mm/ -- 2.44.0
[PATCH v2 3/3] powerpc/code-patching: Restore 32-bit patching performance
The new open/close abstraction makes it more difficult for a compiler to optimise. This causes 10% worse performance on ppc32 as in [1]. Restoring the page alignment mask and inlining the helpers allows the compiler to better reason about the address alignment, allowing more optimised cache flushing selection. [1]: https://lore.kernel.org/all/77fdcdeb-4af5-4ad0-a4c6-57bf0762d...@csgroup.eu/ Suggested-by: Christophe Leroy Signed-off-by: Benjamin Gray --- v2: * New in v2 I think Suggested-by is an appropriate tag. The patch is Christophe's from the link, I just added the commit description, so it could well be better to change the author to Christophe completely. --- arch/powerpc/lib/code-patching.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index b3a644290369..d089da115987 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -282,13 +282,13 @@ struct patch_window { * Interrupts must be disabled for the entire duration of the patching. The PIDR * is potentially changed during this time. */ -static int open_patch_window(void *addr, struct patch_window *ctx) +static __always_inline int open_patch_window(void *addr, struct patch_window *ctx) { unsigned long pfn = get_patch_pfn(addr); lockdep_assert_irqs_disabled(); - ctx->text_poke_addr = (unsigned long)__this_cpu_read(cpu_patching_context.addr); + ctx->text_poke_addr = (unsigned long)__this_cpu_read(cpu_patching_context.addr) & PAGE_MASK; if (!mm_patch_enabled()) { ctx->ptep = __this_cpu_read(cpu_patching_context.pte); @@ -331,7 +331,7 @@ static int open_patch_window(void *addr, struct patch_window *ctx) return 0; } -static void close_patch_window(struct patch_window *ctx) +static __always_inline void close_patch_window(struct patch_window *ctx) { lockdep_assert_irqs_disabled(); -- 2.44.0
[PATCH v2 1/3] powerpc/code-patching: Introduce open_patch_window()/close_patch_window()
The code patching capabilities have grown, and the specific quirks for setting up and tearing down the patching window are getting duplicated. This commit introduces an abstraction for working with this patching window. It defines open_patch_window() to set up the writable alias page, and close_patch_window() to remove it and flush the TLB. The actual mechanism for providing this patching window is an implementation detail that consumers don't need to worry about. Consumers are still responsible for flushing/syncing any changes they make through this alias page though. Signed-off-by: Benjamin Gray --- v1: https://lore.kernel.org/all/20240315025937.407590-1-bg...@linux.ibm.com/ This design can be readily extended to remap the writable page to another physical page without incurring all of the entry and exit overhead. But that might have problems with spending too long in an interrupts disabled context, so I've left it out for now. --- arch/powerpc/lib/code-patching.c | 113 +++ 1 file changed, 113 insertions(+) diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index df64343b9214..7c193e19e297 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -276,6 +276,119 @@ static void unmap_patch_area(unsigned long addr) flush_tlb_kernel_range(addr, addr + PAGE_SIZE); } +/* + * Represents an active patching window. + */ +struct patch_window { + /* +* Page aligned patching window address. The page is a writable alias of +* the configured target page. +*/ + unsigned long text_poke_addr; + /* +* Pointer to the PTE for the patching window page. +*/ + pte_t *ptep; + /* +* (Temporary MM patching only) The original MM before creating the +* patching window. +*/ + struct mm_struct *orig_mm; + /* +* (Temporary MM patching only) The patching window MM. +*/ + struct mm_struct *patch_mm; + /* +* (Temporary MM patching only) Lock for the patching window PTE. 
+*/ + spinlock_t *ptl; +}; + +/* + * Calling this function opens a 'patching window' that maps a + * page starting at ctx->text_poke_addr (which is set by this function) + * as a writable alias to the page starting at addr. + * + * Upon success, callers must invoke the corresponding close_patch_window() + * function to return to the original state. Callers are also responsible + * for syncing any changes they make inside the window. + * + * Interrupts must be disabled for the entire duration of the patching. The PIDR + * is potentially changed during this time. + */ +static int open_patch_window(void *addr, struct patch_window *ctx) +{ + unsigned long pfn = get_patch_pfn(addr); + + lockdep_assert_irqs_disabled(); + + ctx->text_poke_addr = (unsigned long)__this_cpu_read(cpu_patching_context.addr); + + if (!mm_patch_enabled()) { + ctx->ptep = __this_cpu_read(cpu_patching_context.pte); + __set_pte_at(&init_mm, ctx->text_poke_addr, +ctx->ptep, pfn_pte(pfn, PAGE_KERNEL), 0); + + /* See ptesync comment in radix__set_pte_at() */ + if (radix_enabled()) + asm volatile("ptesync" ::: "memory"); + + return 0; + } + + ctx->orig_mm = current->active_mm; + ctx->patch_mm = __this_cpu_read(cpu_patching_context.mm); + + ctx->ptep = get_locked_pte(ctx->patch_mm, ctx->text_poke_addr, &ctx->ptl); + if (!ctx->ptep) + return -ENOMEM; + + __set_pte_at(ctx->patch_mm, ctx->text_poke_addr, +ctx->ptep, pfn_pte(pfn, PAGE_KERNEL), 0); + + /* order PTE update before use, also serves as the hwsync */ + asm volatile("ptesync" ::: "memory"); + + /* +* Changing mm requires context synchronising instructions on both sides of +* the context switch, as well as a hwsync between the last instruction for +* which the address of an associated storage access was translated using +* the current context. switch_mm_irqs_off() performs an isync after the +* context switch. 
+*/ + isync(); + switch_mm_irqs_off(ctx->orig_mm, ctx->patch_mm, current); + + WARN_ON(!mm_is_thread_local(ctx->patch_mm)); + + suspend_breakpoints(); + return 0; +} + +static void close_patch_window(struct patch_window *ctx) +{ + lockdep_assert_irqs_disabled(); + + if (!mm_patch_enabled()) { + pte_clear(&init_mm, ctx->text_poke_addr, ctx->ptep); + flush_tlb_kernel_range(ctx->text_poke_addr, ctx->text_poke_addr + PAGE_SIZE); + return; + } + + isync(); + switch_mm_irqs_off(ctx->patch_mm, ctx->
[PATCH v2 2/3] powerpc/code-patching: Convert to open_patch_window()/close_patch_window()
The existing patching alias page setup and teardown sections can be simplified to make use of the new open_patch_window() abstraction. This eliminates the _mm variants of the helpers, consumers no longer need to check mm_patch_enabled(), and consumers no longer need to worry about synchronization and flushing beyond the changes they make in the patching window. Signed-off-by: Benjamin Gray --- v2: * Removed an outdated comment about syncing --- arch/powerpc/lib/code-patching.c | 179 +++ 1 file changed, 15 insertions(+), 164 deletions(-) diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index 7c193e19e297..b3a644290369 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -66,40 +66,6 @@ static bool mm_patch_enabled(void) return IS_ENABLED(CONFIG_SMP) && radix_enabled(); } -/* - * The following applies for Radix MMU. Hash MMU has different requirements, - * and so is not supported. - * - * Changing mm requires context synchronising instructions on both sides of - * the context switch, as well as a hwsync between the last instruction for - * which the address of an associated storage access was translated using - * the current context. - * - * switch_mm_irqs_off() performs an isync after the context switch. It is - * the responsibility of the caller to perform the CSI and hwsync before - * starting/stopping the temp mm. 
- */ -static struct mm_struct *start_using_temp_mm(struct mm_struct *temp_mm) -{ - struct mm_struct *orig_mm = current->active_mm; - - lockdep_assert_irqs_disabled(); - switch_mm_irqs_off(orig_mm, temp_mm, current); - - WARN_ON(!mm_is_thread_local(temp_mm)); - - suspend_breakpoints(); - return orig_mm; -} - -static void stop_using_temp_mm(struct mm_struct *temp_mm, - struct mm_struct *orig_mm) -{ - lockdep_assert_irqs_disabled(); - switch_mm_irqs_off(temp_mm, orig_mm, current); - restore_breakpoints(); -} - static int text_area_cpu_up(unsigned int cpu) { struct vm_struct *area; @@ -389,73 +355,20 @@ static void close_patch_window(struct patch_window *ctx) pte_unmap_unlock(ctx->ptep, ctx->ptl); } -static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr) -{ - int err; - u32 *patch_addr; - unsigned long text_poke_addr; - pte_t *pte; - unsigned long pfn = get_patch_pfn(addr); - struct mm_struct *patching_mm; - struct mm_struct *orig_mm; - spinlock_t *ptl; - - patching_mm = __this_cpu_read(cpu_patching_context.mm); - text_poke_addr = __this_cpu_read(cpu_patching_context.addr); - patch_addr = (u32 *)(text_poke_addr + offset_in_page(addr)); - - pte = get_locked_pte(patching_mm, text_poke_addr, &ptl); - if (!pte) - return -ENOMEM; - - __set_pte_at(patching_mm, text_poke_addr, pte, pfn_pte(pfn, PAGE_KERNEL), 0); - - /* order PTE update before use, also serves as the hwsync */ - asm volatile("ptesync": : :"memory"); - - /* order context switch after arbitrary prior code */ - isync(); - - orig_mm = start_using_temp_mm(patching_mm); - - err = __patch_instruction(addr, instr, patch_addr); - - /* context synchronisation performed by __patch_instruction (isync or exception) */ - stop_using_temp_mm(patching_mm, orig_mm); - - pte_clear(patching_mm, text_poke_addr, pte); - /* -* ptesync to order PTE update before TLB invalidation done -* by radix__local_flush_tlb_page_psize (in _tlbiel_va) -*/ - local_flush_tlb_page_psize(patching_mm, text_poke_addr, mmu_virtual_psize); - - 
pte_unmap_unlock(pte, ptl); - - return err; -} - static int __do_patch_instruction(u32 *addr, ppc_inst_t instr) { - int err; + struct patch_window ctx; u32 *patch_addr; - unsigned long text_poke_addr; - pte_t *pte; - unsigned long pfn = get_patch_pfn(addr); - - text_poke_addr = (unsigned long)__this_cpu_read(cpu_patching_context.addr) & PAGE_MASK; - patch_addr = (u32 *)(text_poke_addr + offset_in_page(addr)); + int err; - pte = __this_cpu_read(cpu_patching_context.pte); - __set_pte_at(&init_mm, text_poke_addr, pte, pfn_pte(pfn, PAGE_KERNEL), 0); - /* See ptesync comment in radix__set_pte_at() */ - if (radix_enabled()) - asm volatile("ptesync": : :"memory"); + err = open_patch_window(addr, &ctx); + if (err) + return err; + patch_addr = (u32 *)(ctx.text_poke_addr + offset_in_page(addr)); err = __patch_instruction(addr, instr, patch_addr); - pte_clear(&init_mm, text_poke_addr, pte); - flush_tlb_kernel_range(text_poke_addr, text_poke_addr + PAGE_SIZE); + close_patch_window(&ctx); return err; } @@ -475,1
Re: [PATCH v1 2/2] powerpc64/dexcr: Enable PHIE on all CPUs
OK, so I compile for corenet64 but not corenet32 apparently. I'll fix the shift overflow in the next round.
[PATCH v3 5/5] powerpc/code-patching: Add boot selftest for data patching
Extend the code patching selftests with some basic coverage of the new data patching variants too. Signed-off-by: Benjamin Gray --- v3: * New in v3 --- arch/powerpc/lib/test-code-patching.c | 36 +++ 1 file changed, 36 insertions(+) diff --git a/arch/powerpc/lib/test-code-patching.c b/arch/powerpc/lib/test-code-patching.c index c44823292f73..e96c48fcd4db 100644 --- a/arch/powerpc/lib/test-code-patching.c +++ b/arch/powerpc/lib/test-code-patching.c @@ -347,6 +347,41 @@ static void __init test_prefixed_patching(void) check(!memcmp(iptr, expected, sizeof(expected))); } +static void __init test_data_patching(void) +{ + void *buf; + u32 *addr32; + + buf = vzalloc(PAGE_SIZE); + check(buf); + if (!buf) + return; + + addr32 = buf + 128; + + addr32[1] = 0xA0A1A2A3; + addr32[2] = 0xB0B1B2B3; + + patch_uint(&addr32[1], 0xC0C1C2C3); + + check(addr32[0] == 0); + check(addr32[1] == 0xC0C1C2C3); + check(addr32[2] == 0xB0B1B2B3); + check(addr32[3] == 0); + + patch_ulong(&addr32[1], 0xD0D1D2D3); + + check(addr32[0] == 0); + check(*(unsigned long *)(&addr32[1]) == 0xD0D1D2D3); + + if (!IS_ENABLED(CONFIG_PPC64)) + check(addr32[2] == 0xB0B1B2B3); + + check(addr32[3] == 0); + + vfree(buf); +} + static int __init test_code_patching(void) { pr_info("Running code patching self-tests ...\n"); @@ -356,6 +391,7 @@ static int __init test_code_patching(void) test_create_function_call(); test_translate_branch(); test_prefixed_patching(); + test_data_patching(); return 0; } -- 2.44.0
[PATCH v3 1/5] powerpc/code-patching: Add generic memory patching
patch_instruction() is designed for patching instructions in otherwise readonly memory. Other consumers also sometimes need to patch readonly memory, and so have abused patch_instruction() for arbitrary data patches. This is a problem on ppc64 as patch_instruction() decides on the patch width using the 'instruction' opcode to see if it's a prefixed instruction. Data that triggers this can lead to larger writes, possibly crossing a page boundary and failing the write altogether. Introduce patch_uint() and patch_ulong(), with aliases patch_u32() and patch_u64() (on ppc64), designed for aligned data patches. The patch size is now determined by the called function, and is passed as an additional parameter to generic internals. While the instruction flushing is not required for data patches, it remains unconditional in this patch. A followup series is possible if benchmarking shows fewer flushes gives an improvement in some data-patching workload. ppc32 does not support prefixed instructions, so is unaffected by the original issue. Care is taken not to expose the size parameter in the public (non-static) interface, so the compiler can const-propagate it away. 
Signed-off-by: Benjamin Gray --- v3: * Rename from *_memory to *_mem * Change type of ppc32 patch_uint() address to void* * Explain introduction of val32 for big endian * Some formatting v2: * Deduplicate patch_32() definition * Use u32 for val32 * Remove noinline --- arch/powerpc/include/asm/code-patching.h | 31 arch/powerpc/lib/code-patching.c | 64 ++-- 2 files changed, 80 insertions(+), 15 deletions(-) diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h index 0e29ccf903d0..21a36e2c4e26 100644 --- a/arch/powerpc/include/asm/code-patching.h +++ b/arch/powerpc/include/asm/code-patching.h @@ -76,6 +76,37 @@ int patch_instruction(u32 *addr, ppc_inst_t instr); int raw_patch_instruction(u32 *addr, ppc_inst_t instr); int patch_instructions(u32 *addr, u32 *code, size_t len, bool repeat_instr); +/* + * The data patching functions patch_uint() and patch_ulong(), etc., must be + * called on aligned addresses. + * + * The instruction patching functions patch_instruction() and similar must be + * called on addresses satisfying instruction alignment requirements. 
+ */ + +#ifdef CONFIG_PPC64 + +int patch_uint(void *addr, unsigned int val); +int patch_ulong(void *addr, unsigned long val); + +#define patch_u64 patch_ulong + +#else + +static inline int patch_uint(void *addr, unsigned int val) +{ + return patch_instruction(addr, ppc_inst(val)); +} + +static inline int patch_ulong(void *addr, unsigned long val) +{ + return patch_instruction(addr, ppc_inst(val)); +} + +#endif + +#define patch_u32 patch_uint + static inline unsigned long patch_site_addr(s32 *site) { return (unsigned long)site + *site; } diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index c6ab46156cda..a31e313c6321 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -20,15 +20,14 @@ #include #include -static int __patch_instruction(u32 *exec_addr, ppc_inst_t instr, u32 *patch_addr) +static int __patch_mem(void *exec_addr, unsigned long val, void *patch_addr, bool is_dword) { - if (!ppc_inst_prefixed(instr)) { - u32 val = ppc_inst_val(instr); + if (!IS_ENABLED(CONFIG_PPC64) || likely(!is_dword)) { + /* For big endian correctness: plain address would use the wrong half */ + u32 val32 = val; - __put_kernel_nofault(patch_addr, &val, u32, failed); + __put_kernel_nofault(patch_addr, &val32, u32, failed); } else { - u64 val = ppc_inst_as_ulong(instr); - + __put_kernel_nofault(patch_addr, &val, u64, failed); } @@ -44,7 +43,10 @@ static int __patch_instruction(u32 *exec_addr, ppc_inst_t instr, u32 *patch_addr int raw_patch_instruction(u32 *addr, ppc_inst_t instr) { - return __patch_instruction(addr, instr, addr); + if (ppc_inst_prefixed(instr)) + return __patch_mem(addr, ppc_inst_as_ulong(instr), addr, true); + else + return __patch_mem(addr, ppc_inst_val(instr), addr, false); } struct patch_context { @@ -276,7 +278,7 @@ static void unmap_patch_area(unsigned long addr) flush_tlb_kernel_range(addr, addr + PAGE_SIZE); } -static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr) +static int __do_patch_mem_mm(void 
*addr, unsigned long val, bool is_dword) { int err; u32 *patch_addr; @@ -305,7 +307,7 @@ static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr) orig_mm = start_using_temp_mm(patching_mm); - err = __patch_instruction(addr, instr, patch_addr); + err = __patch_mem(addr, val, patch_addr, is_dword); /* context synchronisation performed
[PATCH v3 4/5] powerpc/32: Convert patch_instruction() to patch_uint()
These changes are for patch_instruction() uses on data. Unlike ppc64,
these should not be incorrect as-is, but using the patch_uint() alias
better reflects what kind of data is being patched and allows for
benchmarking the effect of different patch_* implementations (e.g.,
skipping instruction flushing when patching data).

Signed-off-by: Benjamin Gray
---
 arch/powerpc/kernel/static_call.c     | 2 +-
 arch/powerpc/platforms/powermac/smp.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/static_call.c b/arch/powerpc/kernel/static_call.c
index 863a7aa24650..1502b7e439ca 100644
--- a/arch/powerpc/kernel/static_call.c
+++ b/arch/powerpc/kernel/static_call.c
@@ -17,7 +17,7 @@ void arch_static_call_transform(void *site, void *tramp, void *func, bool tail)
 	mutex_lock(&text_mutex);
 
 	if (func && !is_short) {
-		err = patch_instruction(tramp + PPC_SCT_DATA, ppc_inst(target));
+		err = patch_ulong(tramp + PPC_SCT_DATA, target);
 		if (err)
 			goto out;
 	}
diff --git a/arch/powerpc/platforms/powermac/smp.c b/arch/powerpc/platforms/powermac/smp.c
index 15644be31990..d21b681f52fb 100644
--- a/arch/powerpc/platforms/powermac/smp.c
+++ b/arch/powerpc/platforms/powermac/smp.c
@@ -827,7 +827,7 @@ static int smp_core99_kick_cpu(int nr)
 	mdelay(1);
 
 	/* Restore our exception vector */
-	patch_instruction(vector, ppc_inst(save_vector));
+	patch_uint(vector, save_vector);
 
 	local_irq_restore(flags);
 	if (ppc_md.progress) ppc_md.progress("smp_core99_kick_cpu done", 0x347);
-- 
2.44.0
[PATCH v3 0/5] Add generic data patching functions
Currently patch_instruction() bases the write length on the value being
written. If the value looks like a prefixed instruction it writes
8 bytes, otherwise it writes 4 bytes. This makes it potentially buggy
to use for writing arbitrary data: if you want to write 4 bytes but it
decides to write 8 bytes, it may clobber the following memory, or be
unaligned and trigger an oops when the write crosses a page boundary.

To solve this, this series pulls the size parameter out to the 'top' of
the memory patching logic and propagates it through the various
functions.

The two sizes supported are int and long; this allows for patching
instructions and pointers on both ppc32 and ppc64. On ppc32 these are
the same size, so care is taken to only use the size parameter on
static functions, so the compiler can optimise it out entirely.
Unfortunately GCC trips over its own feet here and won't optimise in a
way that is optimal for both strict RWX (mpc85xx_smp_defconfig) and no
RWX (pmac32_defconfig). More details in the v2 cover letter.

Changes from v2:
  * Various changes noted on each patch
  * Data patching now enforced to be aligned
  * Restore page aligned flushing optimisation

Changes from v1:
  * Addressed the v1 review actions
  * Removed noinline (for now)

v2: https://patchwork.ozlabs.org/project/linuxppc-dev/cover/20231016050147.115686-1-bg...@linux.ibm.com/
v1: https://patchwork.ozlabs.org/project/linuxppc-dev/cover/20230207015643.590684-1-bg...@linux.ibm.com/

Benjamin Gray (5):
  powerpc/code-patching: Add generic memory patching
  powerpc/code-patching: Add data patch alignment check
  powerpc/64: Convert patch_instruction() to patch_u32()
  powerpc/32: Convert patch_instruction() to patch_uint()
  powerpc/code-patching: Add boot selftest for data patching

 arch/powerpc/include/asm/code-patching.h | 37 ++++++++++
 arch/powerpc/kernel/module_64.c          |  5 +-
 arch/powerpc/kernel/static_call.c        |  2 +-
 arch/powerpc/lib/code-patching.c         | 70 +++++++++++++-----
 arch/powerpc/lib/test-code-patching.c    | 36 ++++++++++
 arch/powerpc/platforms/powermac/smp.c    |  2 +-
 6 files changed, 132 insertions(+), 20 deletions(-)
-- 
2.44.0
[PATCH v3 2/5] powerpc/code-patching: Add data patch alignment check
The new data patching still needs to be aligned within a cacheline too
for the flushes to work correctly. To simplify this requirement, we
just say data patches must be aligned.

Detect when data patching is not aligned, returning an invalid
argument error.

Signed-off-by: Benjamin Gray
---
v3: * New in v3
---
 arch/powerpc/include/asm/code-patching.h | 6 ++++++
 arch/powerpc/lib/code-patching.c         | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index 21a36e2c4e26..e7f14720f630 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -95,11 +95,17 @@ int patch_ulong(void *addr, unsigned long val);
 
 static inline int patch_uint(void *addr, unsigned int val)
 {
+	if (!IS_ALIGNED((unsigned long)addr, sizeof(unsigned int)))
+		return -EINVAL;
+
 	return patch_instruction(addr, ppc_inst(val));
 }
 
 static inline int patch_ulong(void *addr, unsigned long val)
 {
+	if (!IS_ALIGNED((unsigned long)addr, sizeof(unsigned long)))
+		return -EINVAL;
+
 	return patch_instruction(addr, ppc_inst(val));
 }
 
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index a31e313c6321..35c620dbdfd4 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -386,12 +386,18 @@ NOKPROBE_SYMBOL(patch_instruction);
 
 int patch_uint(void *addr, unsigned int val)
 {
+	if (!IS_ALIGNED((unsigned long)addr, sizeof(unsigned int)))
+		return -EINVAL;
+
 	return patch_mem(addr, val, false);
 }
 NOKPROBE_SYMBOL(patch_uint);
 
 int patch_ulong(void *addr, unsigned long val)
 {
+	if (!IS_ALIGNED((unsigned long)addr, sizeof(unsigned long)))
+		return -EINVAL;
+
 	return patch_mem(addr, val, true);
 }
 NOKPROBE_SYMBOL(patch_ulong);
-- 
2.44.0
[PATCH v3 3/5] powerpc/64: Convert patch_instruction() to patch_u32()
This use of patch_instruction() is working on 32 bit data, and can fail
if the data looks like a prefixed instruction and the extra write
crosses a page boundary. Use patch_u32() to fix the write size.

Fixes: 8734b41b3efe ("powerpc/module_64: Fix livepatching for RO modules")
Link: https://lore.kernel.org/all/20230203004649.1f59dbd4@yea/
Signed-off-by: Benjamin Gray
---
v2: * Added the fixes tag, it seems appropriate even if the subject
      does mention a more robust solution being required.

patch_u64() should be more efficient, but judging from the bug report
it doesn't seem like the data is doubleword aligned.
---
 arch/powerpc/kernel/module_64.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 7112adc597a8..e9bab599d0c2 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -651,12 +651,11 @@ static inline int create_stub(const Elf64_Shdr *sechdrs,
 	// func_desc_t is 8 bytes if ABIv2, else 16 bytes
 	desc = func_desc(addr);
 	for (i = 0; i < sizeof(func_desc_t) / sizeof(u32); i++) {
-		if (patch_instruction(((u32 *)&entry->funcdata) + i,
-				      ppc_inst(((u32 *)&desc)[i])))
+		if (patch_u32(((u32 *)&entry->funcdata) + i, ((u32 *)&desc)[i]))
 			return 0;
 	}
 
-	if (patch_instruction(&entry->magic, ppc_inst(STUB_MAGIC)))
+	if (patch_u32(&entry->magic, STUB_MAGIC))
 		return 0;
 
 	return 1;
-- 
2.44.0
[PATCH v2 1/2] powerpc/code-patching: Test patch_instructions() during boot
patch_instructions() introduces new behaviour with a couple of
variations. Test each case of

  * a repeated 32-bit instruction,
  * a repeated 64-bit instruction (ppc64), and
  * a copied sequence of instructions

for both on a single page and when it crosses a page boundary.

Signed-off-by: Benjamin Gray
---
v1: https://lore.kernel.org/all/20240315025736.404867-1-bg...@linux.ibm.com/
v2: * Shrink the code array to reduce frame size. It still crosses a
      page, and 32 vs 256 words is unlikely to make a difference in
      test coverage otherwise.
---
 arch/powerpc/lib/test-code-patching.c | 92 +++++++++++++++++++++++++++
 1 file changed, 92 insertions(+)

diff --git a/arch/powerpc/lib/test-code-patching.c b/arch/powerpc/lib/test-code-patching.c
index c44823292f73..f76030087f98 100644
--- a/arch/powerpc/lib/test-code-patching.c
+++ b/arch/powerpc/lib/test-code-patching.c
@@ -347,6 +347,97 @@ static void __init test_prefixed_patching(void)
 	check(!memcmp(iptr, expected, sizeof(expected)));
 }
 
+static void __init test_multi_instruction_patching(void)
+{
+	u32 code[32];
+	void *buf;
+	u32 *addr32;
+	u64 *addr64;
+	ppc_inst_t inst64 = ppc_inst_prefix(OP_PREFIX << 26 | 3UL << 24, PPC_RAW_TRAP());
+	u32 inst32 = PPC_RAW_NOP();
+
+	buf = vzalloc(PAGE_SIZE * 8);
+	check(buf);
+	if (!buf)
+		return;
+
+	/* Test single page 32-bit repeated instruction */
+	addr32 = buf + PAGE_SIZE;
+	check(!patch_instructions(addr32 + 1, &inst32, 12, true));
+
+	check(addr32[0] == 0);
+	check(addr32[1] == inst32);
+	check(addr32[2] == inst32);
+	check(addr32[3] == inst32);
+	check(addr32[4] == 0);
+
+	/* Test single page 64-bit repeated instruction */
+	if (IS_ENABLED(CONFIG_PPC64)) {
+		check(ppc_inst_prefixed(inst64));
+
+		addr64 = buf + PAGE_SIZE * 2;
+		ppc_inst_write(code, inst64);
+		check(!patch_instructions((u32 *)(addr64 + 1), code, 24, true));
+
+		check(addr64[0] == 0);
+		check(ppc_inst_equal(ppc_inst_read((u32 *)&addr64[1]), inst64));
+		check(ppc_inst_equal(ppc_inst_read((u32 *)&addr64[2]), inst64));
+		check(ppc_inst_equal(ppc_inst_read((u32 *)&addr64[3]), inst64));
+		check(addr64[4] == 0);
+	}
+
+	/* Test single page memcpy */
+	addr32 = buf + PAGE_SIZE * 3;
+
+	for (int i = 0; i < ARRAY_SIZE(code); i++)
+		code[i] = i + 1;
+
+	check(!patch_instructions(addr32 + 1, code, sizeof(code), false));
+
+	check(addr32[0] == 0);
+	check(!memcmp(&addr32[1], code, sizeof(code)));
+	check(addr32[ARRAY_SIZE(code) + 1] == 0);
+
+	/* Test multipage 32-bit repeated instruction */
+	addr32 = buf + PAGE_SIZE * 4 - 8;
+	check(!patch_instructions(addr32 + 1, &inst32, 12, true));
+
+	check(addr32[0] == 0);
+	check(addr32[1] == inst32);
+	check(addr32[2] == inst32);
+	check(addr32[3] == inst32);
+	check(addr32[4] == 0);
+
+	/* Test multipage 64-bit repeated instruction */
+	if (IS_ENABLED(CONFIG_PPC64)) {
+		check(ppc_inst_prefixed(inst64));
+
+		addr64 = buf + PAGE_SIZE * 5 - 8;
+		ppc_inst_write(code, inst64);
+		check(!patch_instructions((u32 *)(addr64 + 1), code, 24, true));
+
+		check(addr64[0] == 0);
+		check(ppc_inst_equal(ppc_inst_read((u32 *)&addr64[1]), inst64));
+		check(ppc_inst_equal(ppc_inst_read((u32 *)&addr64[2]), inst64));
+		check(ppc_inst_equal(ppc_inst_read((u32 *)&addr64[3]), inst64));
+		check(addr64[4] == 0);
+	}
+
+	/* Test multipage memcpy */
+	addr32 = buf + PAGE_SIZE * 6 - 12;
+
+	for (int i = 0; i < ARRAY_SIZE(code); i++)
+		code[i] = i + 1;
+
+	check(!patch_instructions(addr32 + 1, code, sizeof(code), false));
+
+	check(addr32[0] == 0);
+	check(!memcmp(&addr32[1], code, sizeof(code)));
+	check(addr32[ARRAY_SIZE(code) + 1] == 0);
+
+	vfree(buf);
+}
+
 static int __init test_code_patching(void)
 {
 	pr_info("Running code patching self-tests ...\n");
@@ -356,6 +447,7 @@ static int __init test_code_patching(void)
 	test_create_function_call();
 	test_translate_branch();
 	test_prefixed_patching();
+	test_multi_instruction_patching();
 
 	return 0;
 }
-- 
2.44.0
[PATCH v2 2/2] powerpc/code-patching: Use dedicated memory routines for patching
The patching page set up as a writable alias may be in quadrant 0
(userspace) if the temporary mm path is used. This causes sanitiser
failures if so. Sanitiser failures also occur on the non-mm path
because the plain memset family is instrumented, and KASAN treats the
patching window as poisoned.

Introduce locally defined patch_* variants of memset that perform an
uninstrumented lower level set, as well as detecting write errors like
the original single patch variant does.

copy_to_user() is not correct here, as the PTE makes it a proper kernel
page (the EAA is privileged access only, RW). It just happens to be in
quadrant 0 because that's the hardware's mechanism for using the
current PID vs PID 0 in translations. Importantly, it's incorrect to
allow user page accesses.

Now that the patching memsets are used, we also propagate a failure up
to the caller as the single patch variant does.

Signed-off-by: Benjamin Gray
---
v2: * Fix typo in EAA (from EEA)
    * Fix references to quadrant number (0, not 1)
    * Use copy_to_kernel_nofault() over custom memcpy
    * Drop custom memcpy optimisation patch
---
 arch/powerpc/lib/code-patching.c | 31 ++++++++++++++++++++++++-----
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index c6ab46156cda..df64343b9214 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -372,9 +372,32 @@ int patch_instruction(u32 *addr, ppc_inst_t instr)
 }
 NOKPROBE_SYMBOL(patch_instruction);
 
+static int patch_memset64(u64 *addr, u64 val, size_t count)
+{
+	for (u64 *end = addr + count; addr < end; addr++)
+		__put_kernel_nofault(addr, &val, u64, failed);
+
+	return 0;
+
+failed:
+	return -EPERM;
+}
+
+static int patch_memset32(u32 *addr, u32 val, size_t count)
+{
+	for (u32 *end = addr + count; addr < end; addr++)
+		__put_kernel_nofault(addr, &val, u32, failed);
+
+	return 0;
+
+failed:
+	return -EPERM;
+}
+
 static int __patch_instructions(u32 *patch_addr, u32 *code, size_t len, bool repeat_instr)
 {
 	unsigned long start = (unsigned long)patch_addr;
+	int err;
 
 	/* Repeat instruction */
 	if (repeat_instr) {
@@ -383,19 +406,19 @@ static int __patch_instructions(u32 *patch_addr, u32 *code, size_t len, bool rep
 		if (ppc_inst_prefixed(instr)) {
 			u64 val = ppc_inst_as_ulong(instr);
 
-			memset64((u64 *)patch_addr, val, len / 8);
+			err = patch_memset64((u64 *)patch_addr, val, len / 8);
 		} else {
 			u32 val = ppc_inst_val(instr);
 
-			memset32(patch_addr, val, len / 4);
+			err = patch_memset32(patch_addr, val, len / 4);
 		}
 	} else {
-		memcpy(patch_addr, code, len);
+		err = copy_to_kernel_nofault(patch_addr, code, len);
 	}
 
 	smp_wmb();	/* smp write barrier */
 	flush_icache_range(start, start + len);
 
-	return 0;
+	return err;
 }
 
 /*
-- 
2.44.0
[PATCH v1 1/2] powerpc64/dexcr: Compile kernel with privileged hash instructions
There are dedicated hashstp and hashchkp instructions that can be
inserted into a guest kernel to give it hypervisor managed ROP
protection (the hypervisor sets the secret hash key and handles
hashstp exceptions).

In testing, the kernel appears to handle the compiler generated hash
protection just fine, without any changes. This makes sense, as any
'weird' stack interactions will normally be done in hand written
assembly. We can expect that a compiler generated function prologue
will be matched with a compiler generated function epilogue with the
stack as expected by the compiler (in some sense, the hash value
stored on the stack is just like any other local variable).

GCC requires ELF ABI v2, and Clang only works with ELF ABI v2 anyway,
so add it as a dependency. GCC will only insert these instructions if
the target CPU is specified to be Power10 (possibly a bug; the
documentation says they are inserted for Power8 or higher).

Signed-off-by: Benjamin Gray
---
 arch/powerpc/Makefile                  |  3 +++
 arch/powerpc/platforms/Kconfig.cputype | 12 ++++++++++++
 2 files changed, 15 insertions(+)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 65261cbe5bfd..bfaa3c754ae2 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -168,6 +168,9 @@ endif
 CFLAGS-$(CONFIG_TARGET_CPU_BOOL) += -mcpu=$(CONFIG_TARGET_CPU)
 AFLAGS-$(CONFIG_TARGET_CPU_BOOL) += -mcpu=$(CONFIG_TARGET_CPU)
 
+CFLAGS-$(CONFIG_PPC_KERNEL_ROP_PROTECT) += $(call cc-option,-mrop-protect)
+CFLAGS-$(CONFIG_PPC_KERNEL_ROP_PROTECT) += $(call cc-option,-mprivileged)
+
 CFLAGS-y += $(CONFIG_TUNE_CPU)
 
 asinstr := $(call as-instr,lis 9$(comma)foo@high,-DHAVE_AS_ATHIGH=1)
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index b2d8c0da2ad9..a95b11782379 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -517,6 +517,18 @@ config PPC_KUAP_DEBUG
 	  Add extra debugging for Kernel Userspace Access Protection (KUAP)
 	  If you're unsure, say N.
 
+config PPC_KERNEL_ROP_PROTECT
+	bool "Kernel ROP Protection"
+	default y
+	depends on PPC64_ELF_ABI_V2
+	depends on !CC_IS_GCC || TARGET_CPU = "power10"
+	help
+	  This tells the compiler to insert hashstp/hashchkp instructions
+	  in the prologue and epilogue of every kernel function. The kernel
+	  also turns on the DEXCR[PHIE] aspect to cause an exception if the
+	  hashchkp does not agree with the hash calculated by the matching
+	  hashstp.
+
 config PPC_PKEY
 	def_bool y
 	depends on PPC_BOOK3S_64
-- 
2.44.0
[PATCH v1 2/2] powerpc64/dexcr: Enable PHIE on all CPUs
While we can now compile the kernel with ROP protection, it's possible
the hash instructions are acting as NOPs. Enable the PHIE aspect at an
appropriate stage in boot so as to maximise coverage without requiring
certain functions be compiled without ROP protection.

For the boot CPU, there are no arch defined functions that do not
return and get called before we start spawning tasks. Therefore we
insert the PHIE enablement as a feature section after we call
early_setup() where CPU feature detection takes place.

For secondary CPUs, we can enable PHIE in start_secondary().

Signed-off-by: Benjamin Gray
---
This patch is probably incompatible with the per-task DEXCR tracking
in the userspace DEXCR series, but I'll fix up whichever one lands
last.

I tested on a Power10 (TCG and KVM) and Power9. I also tried enabling
ftrace; no apparent issues, and the trace probes were definitely
triggering.

The default config enables ROP protection when the dependencies are
satisfied, but perhaps we might want to phase it in slower by
disabling it?

Finally, I've tied together inserting the hash instructions and
enabling the PHIE aspect. It might be preferable for distros to have
the option to boot without enabling PHIE for performance comparisons.
This would be with a command line option I guess?
---
 arch/powerpc/include/asm/reg.h |  2 ++
 arch/powerpc/kernel/head_64.S  | 10 ++++++++++
 arch/powerpc/kernel/smp.c      |  6 ++++++
 3 files changed, 18 insertions(+)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index d3d1aea009b4..185807119320 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -392,6 +392,8 @@
 #define DEXCR_PR_IBRTPD	0x1000UL	/* 3: Indirect Branch Recurrent Target Prediction Disable */
 #define DEXCR_PR_SRAPD	0x0800UL	/* 4: Subroutine Return Address Prediction Disable */
 #define DEXCR_PR_NPHIE	0x0400UL	/* 5: Non-Privileged Hash Instruction Enable */
+#define DEXCR_PR_PHIE	0x0200UL	/* 6: Privileged Hash Instruction Enable */
+#define DEXCR_PNH_PHIE	(DEXCR_PR_PHIE << 32)
 #define DEXCR_INIT	DEXCR_PR_NPHIE	/* Fixed DEXCR value to initialise all CPUs with */
 #define SPRN_IC		0x350	/* Virtual Instruction Count */
 #define SPRN_VTB	0x351	/* Virtual Time Base */
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index 4690c219bfa4..490cd09dc9a4 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -1011,6 +1011,16 @@ start_here_multiplatform:
 	mtctr	r12
 	bctrl		/* also sets r13 and SPRG_PACA */
 
+#ifdef CONFIG_PPC_KERNEL_ROP_PROTECT
+BEGIN_FTR_SECTION
+	mfspr	r3,SPRN_DEXCR
+	LOAD_REG_IMMEDIATE(r4, DEXCR_PNH_PHIE)
+	or	r3,r3,r4
+	mtspr	SPRN_DEXCR,r3
+	isync
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_31)
+#endif
+
 	LOAD_REG_ADDR(r3, start_here_common)
 	ld	r4,PACAKMSR(r13)
 	mtspr	SPRN_SRR0,r3
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index a60e4139214b..4a8e9f79aa0c 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1620,6 +1620,12 @@ void start_secondary(void *unused)
 {
 	unsigned int cpu = raw_smp_processor_id();
 
+	/* Enable hash instructions on this CPU in case not already enabled by the hypervisor */
+	if (IS_ENABLED(CONFIG_PPC_KERNEL_ROP_PROTECT) && cpu_has_feature(CPU_FTR_ARCH_31)) {
+		mtspr(SPRN_DEXCR, mfspr(SPRN_DEXCR) | DEXCR_PNH_PHIE);
+		isync();
+	}
+
 	/* PPC64 calls setup_kup() in early_setup_secondary() */
 	if (IS_ENABLED(CONFIG_PPC32))
 		setup_kup();
-- 
2.44.0
Re: [PATCH v1 1/3] powerpc/code-patching: Test patch_instructions() during boot
Date: Mon, 18 Mar 2024 08:55:02 +1100

On Mon, 2024-03-18 at 08:38 +1100, Benjamin Gray wrote:
> On Fri, 2024-03-15 at 07:14 +0000, Christophe Leroy wrote:
> > Le 15/03/2024 à 03:57, Benjamin Gray a écrit :
> > > patch_instructions() introduces new behaviour with a couple of
> > > variations. Test each case of
> > >
> > >   * a repeated 32-bit instruction,
> > >   * a repeated 64-bit instruction (ppc64), and
> > >   * a copied sequence of instructions
> > >
> > > for both on a single page and when it crosses a page boundary.
> > >
> > > Signed-off-by: Benjamin Gray
> > > ---
> > >   arch/powerpc/lib/test-code-patching.c | 92 +++++++++++++++++++++++++++
> > >   1 file changed, 92 insertions(+)
> > >
> > > diff --git a/arch/powerpc/lib/test-code-patching.c
> > > b/arch/powerpc/lib/test-code-patching.c
> > > index c44823292f73..35a3756272df 100644
> > > --- a/arch/powerpc/lib/test-code-patching.c
> > > +++ b/arch/powerpc/lib/test-code-patching.c
> > > @@ -347,6 +347,97 @@ static void __init test_prefixed_patching(void)
> > >   	check(!memcmp(iptr, expected, sizeof(expected)));
> > >   }
> > >
> > > +static void __init test_multi_instruction_patching(void)
> > > +{
> > > +	u32 code[256];
> >
> > Build failure:
> >
> >   CC      arch/powerpc/lib/test-code-patching.o
> > arch/powerpc/lib/test-code-patching.c: In function
> > 'test_multi_instruction_patching':
> > arch/powerpc/lib/test-code-patching.c:439:1: error: the frame size of
> > 1040 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
> >   439 | }
> >       | ^
> > cc1: all warnings being treated as errors
> > make[4]: *** [scripts/Makefile.build:243:
> > arch/powerpc/lib/test-code-patching.o] Error 1
> >
> > I have to avoid big arrays on the stack.
>
> All good, I can do that.
>
> I do run my patches through a couple of 32-bit configs, but I didn't
> see this error. Is this a standard config I should be testing with?

Specifically I build pmac32_defconfig and ppc44x_defconfig
Re: [PATCH v1 3/3] powerpc/code-patching: Optimise patch_memcpy() to 4 byte chunks
On Fri, 2024-03-15 at 06:39 +0000, Christophe Leroy wrote:
> Le 15/03/2024 à 03:57, Benjamin Gray a écrit :
> > As we are patching instructions, we can assume the length is a
> > multiple of 4 and the destination address is aligned.
> >
> > Atomicity of patching a prefixed instruction is not a concern, as
> > the original implementation doesn't provide it anyway.
>
> This patch looks unnecessary.
>
> copy_to_kernel_nofault() is what you want to use instead.

Yeah, I would drop this patch when using copy_to_kernel_nofault()

> > Signed-off-by: Benjamin Gray
> > ---
> >   arch/powerpc/lib/code-patching.c | 8 ++++----
> >   1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/powerpc/lib/code-patching.c
> > b/arch/powerpc/lib/code-patching.c
> > index c6633759b509..ed450a32918c 100644
> > --- a/arch/powerpc/lib/code-patching.c
> > +++ b/arch/powerpc/lib/code-patching.c
> > @@ -394,10 +394,10 @@ static int patch_memset32(u32 *addr, u32 val, size_t count)
> >   	return -EPERM;
> >   }
> >
> > -static int patch_memcpy(void *dst, void *src, size_t len)
> > +static int patch_memcpy32(u32 *dst, u32 *src, size_t count)
> >   {
> > -	for (void *end = src + len; src < end; dst++, src++)
> > -		__put_kernel_nofault(dst, src, u8, failed);
> > +	for (u32 *end = src + count; src < end; dst++, src++)
> > +		__put_kernel_nofault(dst, src, u32, failed);
> >
> >   	return 0;
> >
> > @@ -424,7 +424,7 @@ static int __patch_instructions(u32 *patch_addr, u32 *code, size_t len, bool rep
> >   			err = patch_memset32(patch_addr, val, len / 4);
> >   		}
> >   	} else {
> > -		err = patch_memcpy(patch_addr, code, len);
> > +		err = patch_memcpy32(patch_addr, code, len / 4);
> >   	}
> >
> >   	smp_wmb();	/* smp write barrier */
Re: [PATCH v1 2/3] powerpc/code-patching: Use dedicated memory routines for patching
On Fri, 2024-03-15 at 06:36 +0000, Christophe Leroy wrote:
> Le 15/03/2024 à 03:57, Benjamin Gray a écrit :
> > The patching page set up as a writable alias may be in quadrant 1
> > (userspace) if the temporary mm path is used. This causes sanitiser
> > failures if so. Sanitiser failures also occur on the non-mm path
> > because the plain memset family is instrumented, and KASAN treats
> > the patching window as poisoned.
> >
> > Introduce locally defined patch_* variants of memset that perform
> > an uninstrumented lower level set, as well as detecting write
> > errors like the original single patch variant does.
> >
> > copy_to_user() is not correct here, as the PTE makes it a proper
> > kernel page (the EEA is privileged access only, RW). It just
> > happens to be in quadrant 1 because that's the hardware's mechanism
> > for using the current PID vs PID 0 in translations. Importantly,
> > it's incorrect to allow user page accesses.
> >
> > Now that the patching memsets are used, we also propagate a failure
> > up to the caller as the single patch variant does.
> >
> > Signed-off-by: Benjamin Gray
> >
> > ---
> >
> > The patch_memcpy() can be optimised to 4 bytes at a time assuming
> > the same requirements as regular instruction patching are being
> > followed for the 'copy sequence of instructions' mode (i.e., they
> > actually are instructions following instruction alignment rules).
>
> Why not use copy_to_kernel_nofault() ?

I had not come across copy_to_kernel_nofault(). It looks like the
optimised memcpy() I wanted, so thanks.

> > ---
> >   arch/powerpc/lib/code-patching.c | 42 +++++++++++++++++++++++---
> >   1 file changed, 38 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/powerpc/lib/code-patching.c
> > b/arch/powerpc/lib/code-patching.c
> > index c6ab46156cda..c6633759b509 100644
> > --- a/arch/powerpc/lib/code-patching.c
> > +++ b/arch/powerpc/lib/code-patching.c
> > @@ -372,9 +372,43 @@ int patch_instruction(u32 *addr, ppc_inst_t instr)
> >   }
> >   NOKPROBE_SYMBOL(patch_instruction);
> >
> > +static int patch_memset64(u64 *addr, u64 val, size_t count)
> > +{
> > +	for (u64 *end = addr + count; addr < end; addr++)
> > +		__put_kernel_nofault(addr, &val, u64, failed);
> > +
> > +	return 0;
> > +
> > +failed:
> > +	return -EPERM;
>
> Is it correct ? Shouldn't it be -EFAULT ?

The single instruction patch returns EPERM, which was set this way to
align with ftrace's expectations. I think it's best to keep the
single/multi patching variants consistent with each other where
possible.

> > +}
> > +
> > +static int patch_memset32(u32 *addr, u32 val, size_t count)
> > +{
> > +	for (u32 *end = addr + count; addr < end; addr++)
> > +		__put_kernel_nofault(addr, &val, u32, failed);
> > +
> > +	return 0;
> > +
> > +failed:
> > +	return -EPERM;
> > +}
> > +
> > +static int patch_memcpy(void *dst, void *src, size_t len)
> > +{
> > +	for (void *end = src + len; src < end; dst++, src++)
> > +		__put_kernel_nofault(dst, src, u8, failed);
> > +
> > +	return 0;
> > +
> > +failed:
> > +	return -EPERM;
> > +}
> > +
> >   static int __patch_instructions(u32 *patch_addr, u32 *code,
> >   size_t len, bool repeat_instr)
> >   {
> >   	unsigned long start = (unsigned long)patch_addr;
> > +	int err;
> >
> >   	/* Repeat instruction */
> >   	if (repeat_instr) {
> > @@ -383,19 +417,19 @@ static int __patch_instructions(u32 *patch_addr, u32 *code, size_t len, bool rep
> >   		if (ppc_inst_prefixed(instr)) {
> >   			u64 val = ppc_inst_as_ulong(instr);
> >
> > -			memset64((u64 *)patch_addr, val, len / 8);
> > +			err = patch_memset64((u64 *)patch_addr, val, len / 8);
> >   		} else {
> >   			u32 val = ppc_inst_val(instr);
> >
> > -			memset32(patch_addr, val, len / 4);
> > +			err = patch_memset32(patch_addr, val, len / 4);
> >   		}
> >   	} else {
> > -		memcpy(patch_addr, code, len);
> > +		err = patch_memcpy(patch_addr, code, len);
>
> Use copy_to_kernel_nofault() instead of open coding a new less
> optimised version of it.
>
> >   	}
> >
> >   	smp_wmb();	/* smp write barrier */
> >   	flush_icache_range(start, start + len);
> > -	return 0;
> > +	return err;
> >   }
> >
> >   /*
Re: [PATCH v1 1/3] powerpc/code-patching: Test patch_instructions() during boot
On Fri, 2024-03-15 at 07:14 +, Christophe Leroy wrote: > > > Le 15/03/2024 à 03:57, Benjamin Gray a écrit : > > patch_instructions() introduces new behaviour with a couple of > > variations. Test each case of > > > > * a repeated 32-bit instruction, > > * a repeated 64-bit instruction (ppc64), and > > * a copied sequence of instructions > > > > for both on a single page and when it crosses a page boundary. > > > > Signed-off-by: Benjamin Gray > > --- > > arch/powerpc/lib/test-code-patching.c | 92 > > +++ > > 1 file changed, 92 insertions(+) > > > > diff --git a/arch/powerpc/lib/test-code-patching.c > > b/arch/powerpc/lib/test-code-patching.c > > index c44823292f73..35a3756272df 100644 > > --- a/arch/powerpc/lib/test-code-patching.c > > +++ b/arch/powerpc/lib/test-code-patching.c > > @@ -347,6 +347,97 @@ static void __init > > test_prefixed_patching(void) > > check(!memcmp(iptr, expected, sizeof(expected))); > > } > > > > +static void __init test_multi_instruction_patching(void) > > +{ > > + u32 code[256]; > > Build failure: > > CC arch/powerpc/lib/test-code-patching.o > arch/powerpc/lib/test-code-patching.c: In function > 'test_multi_instruction_patching': > arch/powerpc/lib/test-code-patching.c:439:1: error: the frame size of > 1040 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] > 439 | } > | ^ > cc1: all warnings being treated as errors > make[4]: *** [scripts/Makefile.build:243: > arch/powerpc/lib/test-code-patching.o] Error 1 > > > I have to avoid big arrays on the stack. All good, I can do that. I do run my patches through a couple of 32-bit configs, but I didn't see this error. Is this a standard config I should be testing with? 
> > > > + void *buf; > > + u32 *addr32; > > + u64 *addr64; > > + ppc_inst_t inst64 = ppc_inst_prefix(OP_PREFIX << 26 | 3UL > > << 24, PPC_RAW_TRAP()); > > + u32 inst32 = PPC_RAW_NOP(); > > + > > + buf = vzalloc(PAGE_SIZE * 8); > > + check(buf); > > + if (!buf) > > + return; > > + > > + /* Test single page 32-bit repeated instruction */ > > + addr32 = buf + PAGE_SIZE; > > + check(!patch_instructions(addr32 + 1, &inst32, 12, true)); > > + > > + check(addr32[0] == 0); > > + check(addr32[1] == inst32); > > + check(addr32[2] == inst32); > > + check(addr32[3] == inst32); > > + check(addr32[4] == 0); > > + > > + /* Test single page 64-bit repeated instruction */ > > + if (IS_ENABLED(CONFIG_PPC64)) { > > + check(ppc_inst_prefixed(inst64)); > > + > > + addr64 = buf + PAGE_SIZE * 2; > > + ppc_inst_write(code, inst64); > > + check(!patch_instructions((u32 *)(addr64 + 1), > > code, 24, true)); > > + > > + check(addr64[0] == 0); > > + check(ppc_inst_equal(ppc_inst_read((u32 > > *)&addr64[1]), inst64)); > > + check(ppc_inst_equal(ppc_inst_read((u32 > > *)&addr64[2]), inst64)); > > + check(ppc_inst_equal(ppc_inst_read((u32 > > *)&addr64[3]), inst64)); > > + check(addr64[4] == 0); > > + } > > + > > + /* Test single page memcpy */ > > + addr32 = buf + PAGE_SIZE * 3; > > + > > + for (int i = 0; i < ARRAY_SIZE(code); i++) > > + code[i] = i + 1; > > + > > + check(!patch_instructions(addr32 + 1, code, sizeof(code), > > false)); > > + > > + check(addr32[0] == 0); > > + check(!memcmp(&addr32[1], code, sizeof(code))); > > + check(addr32[ARRAY_SIZE(code) + 1] == 0); > > + > > + /* Test multipage 32-bit repeated instruction */ > > + addr32 = buf + PAGE_SIZE * 4 - 8; > > + check(!patch_instructions(addr32 + 1, &inst32, 12, true)); > > + > > + check(addr32[0] == 0); > > + check(addr32[1] == inst32); > > + check(addr32[2] == inst32); > > + check(addr32[3] == inst32); > > + check(addr32[4] == 0); > > + > > + /* Test multipage 64-bit repeated instruction */ > > + if (IS_ENABLED(CONFIG_PPC64)) { > > + check(ppc_inst_prefixed(inst64)); > 
> + > > + addr64 = buf + PAGE_SIZE * 5 - 8; > > + ppc_inst_write(code, inst64); > > + check(!patch_instructions((u32 *)(addr64 + 1), > > code, 24, true)); > > + > > + check(addr64[0] == 0); > > + check(ppc_inst_equal(ppc_inst_read((u32 > > *)&addr64[1]), inst64)); > > + ch
Re: [PATCH v1 2/2] powerpc/code-patching: Convert to open_patch_window()/close_patch_window()
On Sat, 2024-03-16 at 10:10 +, Christophe Leroy wrote: > > > Le 15/03/2024 à 09:38, Christophe Leroy a écrit : > > > > > > Le 15/03/2024 à 03:59, Benjamin Gray a écrit : > > > The existing patching alias page setup and teardown sections can > > > be > > > simplified to make use of the new open_patch_window() > > > abstraction. > > > > > > This eliminates the _mm variants of the helpers, consumers no > > > longer > > > need to check mm_patch_enabled(), and consumers no longer need to > > > worry > > > about synchronization and flushing beyond the changes they make > > > in the > > > patching window. > > > > With this patch, the time needed to activate or de-activate > > function > > tracer is approx 10% longer on powerpc 8xx. > > With the following changes, the performance is restored: > > diff --git a/arch/powerpc/lib/code-patching.c > b/arch/powerpc/lib/code-patching.c > index fd6f8576033a..bc92b85913d8 100644 > --- a/arch/powerpc/lib/code-patching.c > +++ b/arch/powerpc/lib/code-patching.c > @@ -282,13 +282,13 @@ struct patch_window { > * Interrupts must be disabled for the entire duration of the > patching. The PIDR > * is potentially changed during this time. 
> */ > -static int open_patch_window(void *addr, struct patch_window *ctx) > +static __always_inline int open_patch_window(void *addr, struct > patch_window *ctx) > { > unsigned long pfn = get_patch_pfn(addr); > > lockdep_assert_irqs_disabled(); > > - ctx->text_poke_addr = (unsigned > long)__this_cpu_read(cpu_patching_context.addr); > + ctx->text_poke_addr = (unsigned > long)__this_cpu_read(cpu_patching_context.addr) & PAGE_MASK; > > if (!mm_patch_enabled()) { > ctx->ptep = > __this_cpu_read(cpu_patching_context.pte); > @@ -331,7 +331,7 @@ static int open_patch_window(void *addr, struct > patch_window *ctx) > return 0; > } > > -static void close_patch_window(struct patch_window *ctx) > +static __always_inline void close_patch_window(struct patch_window > *ctx) > { > lockdep_assert_irqs_disabled(); > > Thanks for checking that. I did restore the page mask optimisation in a later patch while still developing, but the 64-bit assembly looked slightly worse for it. I didn't check the 32-bit; no way to benchmark it anyway.
Re: [PATCH v1 2/3] powerpc/code-patching: Use dedicated memory routines for patching
Also supersedes https://lore.kernel.org/all/20240213043638.168048-1-bg...@linux.ibm.com/
[PATCH v1 2/2] powerpc/code-patching: Convert to open_patch_window()/close_patch_window()
The existing patching alias page setup and teardown sections can be simplified to make use of the new open_patch_window() abstraction. This eliminates the _mm variants of the helpers, consumers no longer need to check mm_patch_enabled(), and consumers no longer need to worry about synchronization and flushing beyond the changes they make in the patching window. Signed-off-by: Benjamin Gray --- arch/powerpc/lib/code-patching.c | 180 +++ 1 file changed, 16 insertions(+), 164 deletions(-) diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index d1b812f84154..fd6f8576033a 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -66,40 +66,6 @@ static bool mm_patch_enabled(void) return IS_ENABLED(CONFIG_SMP) && radix_enabled(); } -/* - * The following applies for Radix MMU. Hash MMU has different requirements, - * and so is not supported. - * - * Changing mm requires context synchronising instructions on both sides of - * the context switch, as well as a hwsync between the last instruction for - * which the address of an associated storage access was translated using - * the current context. - * - * switch_mm_irqs_off() performs an isync after the context switch. It is - * the responsibility of the caller to perform the CSI and hwsync before - * starting/stopping the temp mm. 
- */ -static struct mm_struct *start_using_temp_mm(struct mm_struct *temp_mm) -{ - struct mm_struct *orig_mm = current->active_mm; - - lockdep_assert_irqs_disabled(); - switch_mm_irqs_off(orig_mm, temp_mm, current); - - WARN_ON(!mm_is_thread_local(temp_mm)); - - suspend_breakpoints(); - return orig_mm; -} - -static void stop_using_temp_mm(struct mm_struct *temp_mm, - struct mm_struct *orig_mm) -{ - lockdep_assert_irqs_disabled(); - switch_mm_irqs_off(temp_mm, orig_mm, current); - restore_breakpoints(); -} - static int text_area_cpu_up(unsigned int cpu) { struct vm_struct *area; @@ -389,73 +355,20 @@ static void close_patch_window(struct patch_window *ctx) pte_unmap_unlock(ctx->ptep, ctx->ptl); } -static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr) -{ - int err; - u32 *patch_addr; - unsigned long text_poke_addr; - pte_t *pte; - unsigned long pfn = get_patch_pfn(addr); - struct mm_struct *patching_mm; - struct mm_struct *orig_mm; - spinlock_t *ptl; - - patching_mm = __this_cpu_read(cpu_patching_context.mm); - text_poke_addr = __this_cpu_read(cpu_patching_context.addr); - patch_addr = (u32 *)(text_poke_addr + offset_in_page(addr)); - - pte = get_locked_pte(patching_mm, text_poke_addr, &ptl); - if (!pte) - return -ENOMEM; - - __set_pte_at(patching_mm, text_poke_addr, pte, pfn_pte(pfn, PAGE_KERNEL), 0); - - /* order PTE update before use, also serves as the hwsync */ - asm volatile("ptesync": : :"memory"); - - /* order context switch after arbitrary prior code */ - isync(); - - orig_mm = start_using_temp_mm(patching_mm); - - err = __patch_instruction(addr, instr, patch_addr); - - /* context synchronisation performed by __patch_instruction (isync or exception) */ - stop_using_temp_mm(patching_mm, orig_mm); - - pte_clear(patching_mm, text_poke_addr, pte); - /* -* ptesync to order PTE update before TLB invalidation done -* by radix__local_flush_tlb_page_psize (in _tlbiel_va) -*/ - local_flush_tlb_page_psize(patching_mm, text_poke_addr, mmu_virtual_psize); - - 
pte_unmap_unlock(pte, ptl); - - return err; -} - static int __do_patch_instruction(u32 *addr, ppc_inst_t instr) { - int err; + struct patch_window ctx; u32 *patch_addr; - unsigned long text_poke_addr; - pte_t *pte; - unsigned long pfn = get_patch_pfn(addr); - - text_poke_addr = (unsigned long)__this_cpu_read(cpu_patching_context.addr) & PAGE_MASK; - patch_addr = (u32 *)(text_poke_addr + offset_in_page(addr)); + int err; - pte = __this_cpu_read(cpu_patching_context.pte); - __set_pte_at(&init_mm, text_poke_addr, pte, pfn_pte(pfn, PAGE_KERNEL), 0); - /* See ptesync comment in radix__set_pte_at() */ - if (radix_enabled()) - asm volatile("ptesync": : :"memory"); + err = open_patch_window(addr, &ctx); + if (err) + return err; + patch_addr = (u32 *)(ctx.text_poke_addr + offset_in_page(addr)); err = __patch_instruction(addr, instr, patch_addr); - pte_clear(&init_mm, text_poke_addr, pte); - flush_tlb_kernel_range(text_poke_addr, text_poke_addr + PAGE_SIZE); + close_patch_window(&ctx); return err; } @@ -475,10 +388,7 @@ int patch_instruction(u32 *addr, ppc_inst_t in
[PATCH v1 1/2] powerpc/code-patching: Introduce open_patch_window()/close_patch_window()
The code patching capabilities have grown, and the specific quirks for setting up and tearing down the patching window are getting duplicated. This commit introduces an abstraction for working with this patching window. It defines open_patch_window() to set up the writable alias page, and close_patch_window() to remove it and flush the TLB. The actual mechanism for providing this patching window is an implementation detail that consumers don't need to worry about. Consumers are still responsible for flushing/syncing any changes they make through this alias page though. Signed-off-by: Benjamin Gray --- This design can be readily extended to remap the writable page to another physical page without incurring all of the entry and exit overhead. But that might have problems with spending too long in an interrupts disabled context, so I've left it out for now. --- arch/powerpc/lib/code-patching.c | 113 +++ 1 file changed, 113 insertions(+) diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index ed450a32918c..d1b812f84154 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -276,6 +276,119 @@ static void unmap_patch_area(unsigned long addr) flush_tlb_kernel_range(addr, addr + PAGE_SIZE); } +/* + * Represents an active patching window. + */ +struct patch_window { + /* +* Page aligned patching window address. The page is a writable alias of +* the configured target page. +*/ + unsigned long text_poke_addr; + /* +* Pointer to the PTE for the patching window page. +*/ + pte_t *ptep; + /* +* (Temporary MM patching only) The original MM before creating the +* patching window. +*/ + struct mm_struct *orig_mm; + /* +* (Temporary MM patching only) The patching window MM. +*/ + struct mm_struct *patch_mm; + /* +* (Temporary MM patching only) Lock for the patching window PTE. 
+*/ + spinlock_t *ptl; +}; + +/* + * Calling this function opens a 'patching window' that maps a + * page starting at ctx->text_poke_addr (which is set by this function) + * as a writable alias to the page starting at addr. + * + * Upon success, callers must invoke the corresponding close_patch_window() + * function to return to the original state. Callers are also responsible + * for syncing any changes they make inside the window. + * + * Interrupts must be disabled for the entire duration of the patching. The PIDR + * is potentially changed during this time. + */ +static int open_patch_window(void *addr, struct patch_window *ctx) +{ + unsigned long pfn = get_patch_pfn(addr); + + lockdep_assert_irqs_disabled(); + + ctx->text_poke_addr = (unsigned long)__this_cpu_read(cpu_patching_context.addr); + + if (!mm_patch_enabled()) { + ctx->ptep = __this_cpu_read(cpu_patching_context.pte); + __set_pte_at(&init_mm, ctx->text_poke_addr, +ctx->ptep, pfn_pte(pfn, PAGE_KERNEL), 0); + + /* See ptesync comment in radix__set_pte_at() */ + if (radix_enabled()) + asm volatile("ptesync" ::: "memory"); + + return 0; + } + + ctx->orig_mm = current->active_mm; + ctx->patch_mm = __this_cpu_read(cpu_patching_context.mm); + + ctx->ptep = get_locked_pte(ctx->patch_mm, ctx->text_poke_addr, &ctx->ptl); + if (!ctx->ptep) + return -ENOMEM; + + __set_pte_at(ctx->patch_mm, ctx->text_poke_addr, +ctx->ptep, pfn_pte(pfn, PAGE_KERNEL), 0); + + /* order PTE update before use, also serves as the hwsync */ + asm volatile("ptesync" ::: "memory"); + + /* +* Changing mm requires context synchronising instructions on both sides of +* the context switch, as well as a hwsync between the last instruction for +* which the address of an associated storage access was translated using +* the current context. switch_mm_irqs_off() performs an isync after the +* context switch. 
+*/ + isync(); + switch_mm_irqs_off(ctx->orig_mm, ctx->patch_mm, current); + + WARN_ON(!mm_is_thread_local(ctx->patch_mm)); + + suspend_breakpoints(); + return 0; +} + +static void close_patch_window(struct patch_window *ctx) +{ + lockdep_assert_irqs_disabled(); + + if (!mm_patch_enabled()) { + pte_clear(&init_mm, ctx->text_poke_addr, ctx->ptep); + flush_tlb_kernel_range(ctx->text_poke_addr, ctx->text_poke_addr + PAGE_SIZE); + return; + } + + isync(); + switch_mm_irqs_off(ctx->patch_mm, ctx->orig_mm, current); + restore_breakpoints(); + + pte_clear(ctx->pat
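The open/close pairing described in the patch above follows a setup/teardown pattern that can be sketched in standalone userspace C. This is only a model of the control flow consumers see; the alias mapping, PTE updates, and TLB flushes are elided, and all names here are illustrative, not the kernel API:

```c
#include <assert.h>

/* Model of the patching-window lifecycle: open fills a caller-provided
 * context, writes go through the address recorded in the context, and
 * every successful open is paired with a close. */
struct patch_window_model {
	unsigned long text_poke_addr;
	int open;
};

static int open_patch_window_model(void *addr, struct patch_window_model *ctx)
{
	/* Stands in for mapping a writable alias of the target page. */
	ctx->text_poke_addr = (unsigned long)addr;
	ctx->open = 1;
	return 0;
}

static void close_patch_window_model(struct patch_window_model *ctx)
{
	/* Stands in for pte_clear() and the TLB flush. */
	ctx->open = 0;
}

/* Shape of a consumer such as __do_patch_instruction(): open the
 * window, write via the alias address, close the window. */
static int patch_word_model(unsigned int *addr, unsigned int val)
{
	struct patch_window_model ctx;
	int err = open_patch_window_model(addr, &ctx);

	if (err)
		return err;
	*(unsigned int *)ctx.text_poke_addr = val; /* write via the "alias" */
	close_patch_window_model(&ctx);
	return 0;
}
```

The point of the abstraction is visible in the consumer: it no longer branches on mm_patch_enabled() or performs its own flushing.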
[PATCH v1 3/3] powerpc/code-patching: Optimise patch_memcpy() to 4 byte chunks
As we are patching instructions, we can assume the length is a multiple of 4 and the destination address is aligned. Atomicity of patching a prefixed instruction is not a concern, as the original implementation doesn't provide it anyway. Signed-off-by: Benjamin Gray --- arch/powerpc/lib/code-patching.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index c6633759b509..ed450a32918c 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -394,10 +394,10 @@ static int patch_memset32(u32 *addr, u32 val, size_t count) return -EPERM; } -static int patch_memcpy(void *dst, void *src, size_t len) +static int patch_memcpy32(u32 *dst, u32 *src, size_t count) { - for (void *end = src + len; src < end; dst++, src++) - __put_kernel_nofault(dst, src, u8, failed); + for (u32 *end = src + count; src < end; dst++, src++) + __put_kernel_nofault(dst, src, u32, failed); return 0; @@ -424,7 +424,7 @@ static int __patch_instructions(u32 *patch_addr, u32 *code, size_t len, bool rep err = patch_memset32(patch_addr, val, len / 4); } } else { - err = patch_memcpy(patch_addr, code, len); + err = patch_memcpy32(patch_addr, code, len / 4); } smp_wmb(); /* smp write barrier */ -- 2.44.0
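The word-at-a-time loop above can be exercised in plain userspace C. This is a sketch only: the kernel version writes through __put_kernel_nofault, which this model omits, and the name is illustrative:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the shape of patch_memcpy32(): copy `count` 32-bit words,
 * assuming 4-byte alignment and a byte length that is a multiple of 4,
 * as instruction patching guarantees. Fault handling is elided. */
static int patch_memcpy32_model(uint32_t *dst, const uint32_t *src, size_t count)
{
	for (const uint32_t *end = src + count; src < end; dst++, src++)
		*dst = *src;

	return 0;
}
```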
[PATCH v1 2/3] powerpc/code-patching: Use dedicated memory routines for patching
The patching page set up as a writable alias may be in quadrant 1 (userspace) if the temporary mm path is used, which causes sanitiser failures. Sanitiser failures also occur on the non-mm path because the plain memset family is instrumented, and KASAN treats the patching window as poisoned. Introduce locally defined patch_* variants of memset that perform an uninstrumented lower level set, as well as detecting write errors like the original single patch variant does. copy_to_user() is not correct here, as the PTE makes it a proper kernel page (the EAA is privileged access only, RW). It just happens to be in quadrant 1 because that's the hardware's mechanism for using the current PID vs PID 0 in translations. Importantly, it's incorrect to allow user page accesses. Now that the patching memsets are used, we also propagate a failure up to the caller as the single patch variant does. Signed-off-by: Benjamin Gray --- The patch_memcpy() can be optimised to 4 bytes at a time assuming the same requirements as regular instruction patching are being followed for the 'copy sequence of instructions' mode (i.e., they actually are instructions following instruction alignment rules). 
--- arch/powerpc/lib/code-patching.c | 42 +--- 1 file changed, 38 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index c6ab46156cda..c6633759b509 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -372,9 +372,43 @@ int patch_instruction(u32 *addr, ppc_inst_t instr) } NOKPROBE_SYMBOL(patch_instruction); +static int patch_memset64(u64 *addr, u64 val, size_t count) +{ + for (u64 *end = addr + count; addr < end; addr++) + __put_kernel_nofault(addr, &val, u64, failed); + + return 0; + +failed: + return -EPERM; +} + +static int patch_memset32(u32 *addr, u32 val, size_t count) +{ + for (u32 *end = addr + count; addr < end; addr++) + __put_kernel_nofault(addr, &val, u32, failed); + + return 0; + +failed: + return -EPERM; +} + +static int patch_memcpy(void *dst, void *src, size_t len) +{ + for (void *end = src + len; src < end; dst++, src++) + __put_kernel_nofault(dst, src, u8, failed); + + return 0; + +failed: + return -EPERM; +} + static int __patch_instructions(u32 *patch_addr, u32 *code, size_t len, bool repeat_instr) { unsigned long start = (unsigned long)patch_addr; + int err; /* Repeat instruction */ if (repeat_instr) { @@ -383,19 +417,19 @@ static int __patch_instructions(u32 *patch_addr, u32 *code, size_t len, bool rep if (ppc_inst_prefixed(instr)) { u64 val = ppc_inst_as_ulong(instr); - memset64((u64 *)patch_addr, val, len / 8); + err = patch_memset64((u64 *)patch_addr, val, len / 8); } else { u32 val = ppc_inst_val(instr); - memset32(patch_addr, val, len / 4); + err = patch_memset32(patch_addr, val, len / 4); } } else { - memcpy(patch_addr, code, len); + err = patch_memcpy(patch_addr, code, len); } smp_wmb(); /* smp write barrier */ flush_icache_range(start, start + len); - return 0; + return err; } /* -- 2.44.0
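A rough userspace model of the error-propagating helpers introduced above: in the kernel, __put_kernel_nofault jumps to the `failed` label if the store faults; here a callback stands in for that mechanism, and all names and the -1 return value (standing in for -EPERM) are illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of patch_memset32(): write `count` copies of `val`, bailing
 * out with an error as soon as a store "faults". */
static int patch_memset32_model(uint32_t *addr, uint32_t val, size_t count,
				bool (*store_ok)(uint32_t *))
{
	for (uint32_t *end = addr + count; addr < end; addr++) {
		if (!store_ok(addr))
			return -1; /* the kernel helper returns -EPERM */
		*addr = val;
	}

	return 0;
}

static bool always_ok(uint32_t *p)
{
	(void)p;
	return true;
}

static bool always_faults(uint32_t *p)
{
	(void)p;
	return false;
}
```

The design point is that the error now reaches __patch_instructions() and its caller, where the plain memset family could only succeed silently or crash.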
[PATCH v1 1/3] powerpc/code-patching: Test patch_instructions() during boot
patch_instructions() introduces new behaviour with a couple of variations. Test each case of * a repeated 32-bit instruction, * a repeated 64-bit instruction (ppc64), and * a copied sequence of instructions for both on a single page and when it crosses a page boundary. Signed-off-by: Benjamin Gray --- arch/powerpc/lib/test-code-patching.c | 92 +++ 1 file changed, 92 insertions(+) diff --git a/arch/powerpc/lib/test-code-patching.c b/arch/powerpc/lib/test-code-patching.c index c44823292f73..35a3756272df 100644 --- a/arch/powerpc/lib/test-code-patching.c +++ b/arch/powerpc/lib/test-code-patching.c @@ -347,6 +347,97 @@ static void __init test_prefixed_patching(void) check(!memcmp(iptr, expected, sizeof(expected))); } +static void __init test_multi_instruction_patching(void) +{ + u32 code[256]; + void *buf; + u32 *addr32; + u64 *addr64; + ppc_inst_t inst64 = ppc_inst_prefix(OP_PREFIX << 26 | 3UL << 24, PPC_RAW_TRAP()); + u32 inst32 = PPC_RAW_NOP(); + + buf = vzalloc(PAGE_SIZE * 8); + check(buf); + if (!buf) + return; + + /* Test single page 32-bit repeated instruction */ + addr32 = buf + PAGE_SIZE; + check(!patch_instructions(addr32 + 1, &inst32, 12, true)); + + check(addr32[0] == 0); + check(addr32[1] == inst32); + check(addr32[2] == inst32); + check(addr32[3] == inst32); + check(addr32[4] == 0); + + /* Test single page 64-bit repeated instruction */ + if (IS_ENABLED(CONFIG_PPC64)) { + check(ppc_inst_prefixed(inst64)); + + addr64 = buf + PAGE_SIZE * 2; + ppc_inst_write(code, inst64); + check(!patch_instructions((u32 *)(addr64 + 1), code, 24, true)); + + check(addr64[0] == 0); + check(ppc_inst_equal(ppc_inst_read((u32 *)&addr64[1]), inst64)); + check(ppc_inst_equal(ppc_inst_read((u32 *)&addr64[2]), inst64)); + check(ppc_inst_equal(ppc_inst_read((u32 *)&addr64[3]), inst64)); + check(addr64[4] == 0); + } + + /* Test single page memcpy */ + addr32 = buf + PAGE_SIZE * 3; + + for (int i = 0; i < ARRAY_SIZE(code); i++) + code[i] = i + 1; + + check(!patch_instructions(addr32 + 1, code, sizeof(code), 
false)); + + check(addr32[0] == 0); + check(!memcmp(&addr32[1], code, sizeof(code))); + check(addr32[ARRAY_SIZE(code) + 1] == 0); + + /* Test multipage 32-bit repeated instruction */ + addr32 = buf + PAGE_SIZE * 4 - 8; + check(!patch_instructions(addr32 + 1, &inst32, 12, true)); + + check(addr32[0] == 0); + check(addr32[1] == inst32); + check(addr32[2] == inst32); + check(addr32[3] == inst32); + check(addr32[4] == 0); + + /* Test multipage 64-bit repeated instruction */ + if (IS_ENABLED(CONFIG_PPC64)) { + check(ppc_inst_prefixed(inst64)); + + addr64 = buf + PAGE_SIZE * 5 - 8; + ppc_inst_write(code, inst64); + check(!patch_instructions((u32 *)(addr64 + 1), code, 24, true)); + + check(addr64[0] == 0); + check(ppc_inst_equal(ppc_inst_read((u32 *)&addr64[1]), inst64)); + check(ppc_inst_equal(ppc_inst_read((u32 *)&addr64[2]), inst64)); + check(ppc_inst_equal(ppc_inst_read((u32 *)&addr64[3]), inst64)); + check(addr64[4] == 0); + } + + /* Test multipage memcpy */ + addr32 = buf + PAGE_SIZE * 6 - 12; + + for (int i = 0; i < ARRAY_SIZE(code); i++) + code[i] = i + 1; + + check(!patch_instructions(addr32 + 1, code, sizeof(code), false)); + + check(addr32[0] == 0); + check(!memcmp(&addr32[1], code, sizeof(code))); + check(addr32[ARRAY_SIZE(code) + 1] == 0); + + vfree(buf); +} + static int __init test_code_patching(void) { pr_info("Running code patching self-tests ...\n"); @@ -356,6 +447,7 @@ static int __init test_code_patching(void) test_create_function_call(); test_translate_branch(); test_prefixed_patching(); + test_multi_instruction_patching(); return 0; } -- 2.44.0
Re: BUG: KASAN: vmalloc-out-of-bounds in memset32 (bpf_prog_pack_free)
On Wed, 2024-01-31 at 11:46 +, Christophe Leroy wrote: > Hi, > > Got the following BUG while loading module test_bpf.ko > > No time to investigate for now. > > root@vgoip:~# insmod test_bpf.ko > [ 263.409030] > == > [ 263.416415] BUG: KASAN: vmalloc-out-of-bounds in > memset32+0x5c/0xa0 > [ 263.422952] Write of size 4 at addr c9000e40 by task kworker/0:0/7 > [ 263.429322] > [ 263.430816] CPU: 0 PID: 7 Comm: kworker/0:0 Not tainted > 6.8.0-rc1-s3k-dev-02364-gc626219462a6-dirty #606 > [ 263.440580] Hardware name: MIAE 8xx 0x50 CMPC885 > [ 263.445658] Workqueue: events bpf_prog_free_deferred > [ 263.450973] Call Trace: > [ 263.453411] [c905bd00] [c0c114e8] dump_stack_lvl+0x34/0x50 > (unreliable) > [ 263.460744] [c905bd20] [c026b9d4] print_report+0x174/0x608 > [ 263.466853] [c905bd70] [c026c01c] kasan_report+0xc0/0x130 > [ 263.472788] [c905bdd0] [c0c43cb0] memset32+0x5c/0xa0 > [ 263.478198] [c905bdf0] [c0030690] patch_instructions+0x70/0x17c > [ 263.484656] [c905be30] [c00389b0] > bpf_arch_text_invalidate+0xa8/0x120 > [ 263.491638] [c905be90] [c018e63c] bpf_prog_pack_free+0xec/0x24c > [ 263.498096] [c905bec0] [c018ea34] > bpf_jit_binary_pack_free+0x3c/0x80 > [ 263.504991] [c905bee0] [c0038ae8] bpf_jit_free+0xc0/0x128 > [ 263.510925] [c905bf00] [c00790f8] process_one_work+0x310/0x6e8 > [ 263.517209] [c905bf50] [c0079b2c] worker_thread+0x65c/0x868 > [ 263.523232] [c905bfc0] [c0084b78] kthread+0x17c/0x1ac > [ 263.528817] [c905bff0] [c00181fc] start_kernel_thread+0x10/0x14 > [ 263.535279] > [ 263.536782] The buggy address belongs to the virtual mapping at > [ 263.536782] [c900, c9008000) created by: > [ 263.536782] text_area_cpu_up+0x28/0x1d4 > [ 263.551418] > [ 263.552902] The buggy address belongs to the physical page: > [ 263.560228] > [ 263.561713] Memory state around the buggy address: > [ 263.566585] c9000d00: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 > f8 f8 > [ 263.573307] c9000d80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 > f8 f8 > [ 263.580027] >c9000e00: f8 f8 
f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 > f8 f8 > [ 263.586677] ^ > [ 263.591370] c9000e80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 > f8 f8 > [ 263.598093] c9000f00: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 > f8 f8 > [ 263.604743] > == > > Christophe > Looks like a false positive. It's clearly in range of the poking allocation after all. KASAN apparently leaves VM_ALLOC areas as poisoned, expecting some kind of subsequent allocator to come and unpoison specific parts. Except we call map_patch_area() instead of whatever path generic code was expecting, so we never unpoison the range. The issue would be pre-existing from the beginning, and gone unnoticed because the original code path that does the patching (i.e., actually interacts with the poking page) isn't instrumented.
Re: [PATCH] powerpc/ftrace: Ignore ftrace locations in exit text sections
ddr, int link) > { > ppc_inst_t op; > diff --git a/arch/powerpc/kernel/vmlinux.lds.S > b/arch/powerpc/kernel/vmlinux.lds.S > index 1c5970df3233..9c376ae6857d 100644 > --- a/arch/powerpc/kernel/vmlinux.lds.S > +++ b/arch/powerpc/kernel/vmlinux.lds.S > @@ -281,7 +281,9 @@ SECTIONS > * to deal with references from __bug_table > */ > .exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) { > + _sexittext = .; > EXIT_TEXT > + _eexittext = .; > } > > . = ALIGN(PAGE_SIZE); > > base-commit: 4ef8376c466ae8b03e632dd8eca1e44315f7dd61 Reviewed-by: Benjamin Gray Fixes what I assume is the same bug I was seeing for a while [0.00][T0] ftrace-powerpc: 0xc30f4f38: No module provided for non-kernel address [0.00][T0] [ ftrace bug ] [0.00][T0] ftrace faulted on modifying [0.00][T0] [] _sub_D_65535_0+0xc/0x40 [0.00][T0] Initializing ftrace call sites [0.00][T0] ftrace record flags: 0 [0.00][T0] (0) [0.00][T0] expected tramp: c00a8548 [0.00][T0] [ cut here ] [0.00][T0] WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:2179 ftrace_bug+0x16c/0x3a0 [0.00][T0] Modules linked in: [0.00][T0] CPU: 0 PID: 0 Comm: swapper Not tainted 6.8.0-rc3-02361-g892886f9f5b7-dirty #11 [0.00][T0] Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf05 of:SLOF,HEAD hv:linux,kvm pSeries [0.00][T0] NIP: c048a96c LR: c048a968 CTR: [0.00][T0] REGS: c4fa7ae0 TRAP: 0700 Not tainted (6.8.0-rc3-02361-g892886f9f5b7-dirty) [0.00][T0] MSR: 82021033 CR: 28028440 XER: [0.00][T0] CFAR: c031db34 IRQMASK: 3 [0.00][T0] GPR00: c048a968 c4fa7d80 c2645600 0022 [0.00][T0] GPR04: 0008 0001 c0326dc4 [0.00][T0] GPR08: c2535600 0001 c440bee0 48028444 [0.00][T0] GPR12: c625 [0.00][T0] GPR16: 0003 0001 [0.00][T0] GPR20: c4fe8af8 c00010030428 c45b90c0 [0.00][T0] GPR24: c00010030430 c00010030420 [0.00][T0] GPR28: c1dd9a40 c1dd8bc0 c000102a5b00 c000102a5b08 [0.00][T0] NIP [c048a96c] ftrace_bug+0x16c/0x3a0 [0.00][T0] LR [c048a968] ftrace_bug+0x168/0x3a0 [0.00][T0] Call Trace: [0.00][T0] [c4fa7d80] [c048a968] 
ftrace_bug+0x168/0x3a0 (unreliable) [0.00][T0] [c4fa7e20] [c048aeec] ftrace_process_locs+0x34c/0x780 [0.00][T0] [c4fa7ec0] [c30655c0] ftrace_init+0xc8/0x27c [0.00][T0] [c4fa7f40] [c300d458] start_kernel+0x1bc/0x528 [0.00][T0] [c4fa7fe0] [c000eaa0] start_here_common+0x1c/0x20 [0.00][T0] Code: 7fe3fb78 483a31e5 6000 e93e0008 75290800 40820150 7fc3f378 4bfffc95 7c641b78 387d0f40 4be93189 6000 <0fe0> 382100a0 3d22fda6 3901 [0.00][T0] irq event stamp: 0 [0.00][T0] hardirqs last enabled at (0): [<>] 0x0 [0.00][T0] hardirqs last disabled at (0): [<>] 0x0 [0.00][T0] softirqs last enabled at (0): [<>] 0x0 [0.00][T0] softirqs last disabled at (0): [<>] 0x0 [0.00][T0] ---[ end trace ]---
[PATCH] powerpc/code-patching: Disable KASAN in __patch_instructions()
The memset/memcpy functions are by default instrumented by KASAN, which complains about user memory access when using a poking page in userspace. Using a userspace address is expected though, so don't instrument with KASAN for this function. Signed-off-by: Benjamin Gray --- I tried to replace the memsetN calls with __memsetN, but we appear to disable the non-instrumented variants of these when KASAN is enabled. Christophe, might you know more here? The cost of just suppressing reports for this section shouldn't be too relevant; KASAN detects the access, but exits before it starts preparing the report itself. So it's just like any other KASAN instrumented function for the most part. --- arch/powerpc/lib/code-patching.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index c6ab46156cda..24989594578a 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -3,6 +3,7 @@ * Copyright 2008 Michael Ellerman, IBM Corporation. */ +#include <linux/kasan.h> #include #include #include @@ -377,6 +378,7 @@ static int __patch_instructions(u32 *patch_addr, u32 *code, size_t len, bool rep unsigned long start = (unsigned long)patch_addr; /* Repeat instruction */ + kasan_disable_current(); if (repeat_instr) { ppc_inst_t instr = ppc_inst_read(code); @@ -392,6 +394,7 @@ static int __patch_instructions(u32 *patch_addr, u32 *code, size_t len, bool rep } else { memcpy(patch_addr, code, len); } + kasan_enable_current(); smp_wmb(); /* smp write barrier */ flush_icache_range(start, start + len); -- 2.43.0
Re: [PATCH] powerpc/kasan: Limit KASAN thread size increase to 32KB
Don't know why the previous mail went blank. On Mon, 2024-02-12 at 17:42 +1100, Michael Ellerman wrote: > KASAN is seen to increase stack usage, to the point that it was > reported > to lead to stack overflow on some 32-bit machines (see link). > > To avoid overflows the stack size was doubled for KASAN builds in > commit 3e8635fb2e07 ("powerpc/kasan: Force thread size increase with > KASAN"). > > However with a 32KB stack size to begin with, the doubling leads to a > 64KB stack, which causes build errors: > arch/powerpc/kernel/switch.S:249: Error: operand out of range > (0xfe50 is not between 0x8000 and > 0x7fff) > > Although the asm could be reworked, in practice a 32KB stack seems > sufficient even for KASAN builds - the additional usage seems to be > in > the 2-3KB range for a 64-bit KASAN build. > > So only increase the stack for KASAN if the stack size is < 32KB. > > Link: > https://lore.kernel.org/linuxppc-dev/bug-207129-206...@https.bugzilla.kernel.org%2F/ > Reported-by: Spoorthy > Reported-by: Benjamin Gray > Fixes: 18f14afe2816 ("powerpc/64s: Increase default stack size to > 32KB") > Signed-off-by: Michael Ellerman Reviewed-by: Benjamin Gray > --- > arch/powerpc/include/asm/thread_info.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/powerpc/include/asm/thread_info.h > b/arch/powerpc/include/asm/thread_info.h > index bf5dde1a4114..15c5691dd218 100644 > --- a/arch/powerpc/include/asm/thread_info.h > +++ b/arch/powerpc/include/asm/thread_info.h > @@ -14,7 +14,7 @@ > > #ifdef __KERNEL__ > > -#ifdef CONFIG_KASAN > +#if defined(CONFIG_KASAN) && CONFIG_THREAD_SHIFT < 15 > #define MIN_THREAD_SHIFT (CONFIG_THREAD_SHIFT + 1) > #else > #define MIN_THREAD_SHIFT CONFIG_THREAD_SHIFT
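The fixed condition in the patch above can be sanity-checked with a small model of the MIN_THREAD_SHIFT selection (illustrative only, not the kernel header):

```c
#include <assert.h>

/* KASAN bumps the stack-size shift by one, but only while the
 * configured shift is still below 15 (i.e. below a 32KB stack), so a
 * 32KB configuration never doubles to an unsupported 64KB stack. */
static int min_thread_shift_model(int config_thread_shift, int kasan)
{
	if (kasan && config_thread_shift < 15)
		return config_thread_shift + 1;

	return config_thread_shift;
}
```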
Re: [PATCH] powerpc/kasan: Limit KASAN thread size increase to 32KB
[PATCH] powerpc64/kasan: Pass virtual addresses to kasan_init_phys_region()
The kasan_init_phys_region() function maps shadow pages necessary for the ranges of the linear map backed by physical pages. Currently kasan_init_phys_region() is being passed physical addresses, but kasan_mem_to_shadow() expects virtual addresses. It works right now because the lower bits (12:64) of the kasan_mem_to_shadow() calculation are the same for the real and virtual addresses, so the actual PTE value is the same in the end. But virtual addresses are the intended input, so fix it. Signed-off-by: Benjamin Gray --- arch/powerpc/mm/kasan/init_book3e_64.c | 2 +- arch/powerpc/mm/kasan/init_book3s_64.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/mm/kasan/init_book3e_64.c b/arch/powerpc/mm/kasan/init_book3e_64.c index 11519e88dc6b..43c03b84ff32 100644 --- a/arch/powerpc/mm/kasan/init_book3e_64.c +++ b/arch/powerpc/mm/kasan/init_book3e_64.c @@ -112,7 +112,7 @@ void __init kasan_init(void) pte_t zero_pte = pfn_pte(virt_to_pfn(kasan_early_shadow_page), PAGE_KERNEL_RO); for_each_mem_range(i, &start, &end) - kasan_init_phys_region((void *)start, (void *)end); + kasan_init_phys_region(phys_to_virt(start), phys_to_virt(end)); if (IS_ENABLED(CONFIG_KASAN_VMALLOC)) kasan_remove_zero_shadow((void *)VMALLOC_START, VMALLOC_SIZE); diff --git a/arch/powerpc/mm/kasan/init_book3s_64.c b/arch/powerpc/mm/kasan/init_book3s_64.c index 9300d641cf9a..3fb5ce4f48f4 100644 --- a/arch/powerpc/mm/kasan/init_book3s_64.c +++ b/arch/powerpc/mm/kasan/init_book3s_64.c @@ -62,7 +62,7 @@ void __init kasan_init(void) } for_each_mem_range(i, &start, &end) - kasan_init_phys_region((void *)start, (void *)end); + kasan_init_phys_region(phys_to_virt(start), phys_to_virt(end)); for (i = 0; i < PTRS_PER_PTE; i++) __set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page, -- 2.43.0
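The "works right now" observation can be checked numerically: the shadow mapping is essentially offset + (addr >> 3), so a physical address and its linear-map virtual alias, which differ only in constant high bits, produce shadow addresses whose low bits agree, and hence the same PTE contents. The constants below are illustrative, not the real kernel layout:

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_OFFSET 0xc000000000000000UL		/* illustrative linear-map base */
#define MODEL_SHADOW_OFFSET 0xa80e000000000000UL	/* illustrative shadow offset */
#define MODEL_SCALE_SHIFT 3				/* one shadow byte per 8 bytes */

/* Same shape as kasan_mem_to_shadow(): scale the address down and
 * relocate it into the shadow region. */
static uint64_t model_mem_to_shadow(uint64_t addr)
{
	return (addr >> MODEL_SCALE_SHIFT) + MODEL_SHADOW_OFFSET;
}
```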
Re: [PATCH v2 1/3] powerpc/code-patching: Add generic memory patching
On Thu, 2023-11-30 at 15:55 +0530, Naveen N Rao wrote: > On Mon, Oct 16, 2023 at 04:01:45PM +1100, Benjamin Gray wrote: > > patch_instruction() is designed for patching instructions in > > otherwise > > readonly memory. Other consumers also sometimes need to patch > > readonly > > memory, so have abused patch_instruction() for arbitrary data > > patches. > > > > This is a problem on ppc64 as patch_instruction() decides on the > > patch > > width using the 'instruction' opcode to see if it's a prefixed > > instruction. Data that triggers this can lead to larger writes, > > possibly > > crossing a page boundary and failing the write altogether. > > > > Introduce patch_uint(), and patch_ulong(), with aliases > > patch_u32(), and > > patch_u64() (on ppc64) designed for aligned data patches. The patch > > size is now determined by the called function, and is passed as an > > additional parameter to generic internals. > > > > While the instruction flushing is not required for data patches, > > the > > use cases for data patching (mainly module loading and static > > calls) > > are less performance sensitive than for instruction patching > > (ftrace activation). > > That's debatable. While it is nice to be able to activate function > tracing quickly, it is not necessarily a hot path. On the flip side, > I > do have a use case for data patching for ftrace activation :) > True, but it's still correct to do so at least. Having a different path for data writes is definitely a possibility, but would be added for performance. This series is motivated by fixing a correctness issue with data write sizes. > > So the instruction flushing remains unconditional > > in this patch. > > > > ppc32 does not support prefixed instructions, so is unaffected by > > the > > original issue. Care is taken in not exposing the size parameter in > > the > > public (non-static) interface, so the compiler can const-propagate > > it > > away. 
> > > > Signed-off-by: Benjamin Gray > > > > --- > > > > v2: * Deduplicate patch_32() definition > > * Use u32 for val32 > > * Remove noinline > > --- > > arch/powerpc/include/asm/code-patching.h | 33 > > arch/powerpc/lib/code-patching.c | 66 ++-- > > > > 2 files changed, 83 insertions(+), 16 deletions(-) > > > > diff --git a/arch/powerpc/include/asm/code-patching.h > > b/arch/powerpc/include/asm/code-patching.h > > index 3f881548fb61..7c6056bb1706 100644 > > --- a/arch/powerpc/include/asm/code-patching.h > > +++ b/arch/powerpc/include/asm/code-patching.h > > @@ -75,6 +75,39 @@ int patch_branch(u32 *addr, unsigned long > > target, int flags); > > int patch_instruction(u32 *addr, ppc_inst_t instr); > > int raw_patch_instruction(u32 *addr, ppc_inst_t instr); > > > > +/* > > + * patch_uint() and patch_ulong() should only be called on > > addresses where the > > + * patch does not cross a cacheline, otherwise it may not be > > flushed properly > > + * and mixes of new and stale data may be observed. It cannot > > cross a page > > + * boundary, as only the target page is mapped as writable. > > Should we enforce alignment requirements, especially for > patch_ulong() > on 64-bit powerpc? I am not sure if there are use cases for unaligned > 64-bit writes. That should also ensure that the write doesn't cross a > cacheline. Yeah, the current description is more just the technical restrictions, not an endorsement of usage. If the caller isn't working with aligned data, it seems unlikely it would still be cacheline aligned. The caller should break it into 32bit patches if this is the case. By enforce, are you suggesting a WARN_ON in the code too? > > + * > > + * patch_instruction() and other instruction patchers > > automatically satisfy this > > + * requirement due to instruction alignment requirements. 
> > + */ > > + > > +#ifdef CONFIG_PPC64 > > + > > +int patch_uint(void *addr, unsigned int val); > > +int patch_ulong(void *addr, unsigned long val); > > + > > +#define patch_u64 patch_ulong > > + > > +#else > > + > > +static inline int patch_uint(u32 *addr, unsigned int val) > > Is there a reason to use u32 * here? > No, fixed to void* for next series > > +{ > > + return patch_instruction(addr, ppc_inst(val)); > > +} > > + > > +static inline int patch_ulong(void *addr, unsigned long val) > > +{ > > + return patch_instruction(addr, ppc_inst(
Re: [PATCH v2 0/3] Add generic data patching functions
On 17/10/23 5:39 pm, Christophe Leroy wrote: Le 16/10/2023 à 07:01, Benjamin Gray a écrit : Currently patch_instruction() bases the write length on the value being written. If the value looks like a prefixed instruction it writes 8 bytes, otherwise it writes 4 bytes. This makes it potentially buggy to use for writing arbitrary data, as if you want to write 4 bytes but it decides to write 8 bytes it may clobber the following memory or be unaligned and trigger an oops if it tries to cross a page boundary. To solve this, this series pulls out the size parameter to the 'top' of the text patching logic, and propagates it through the various functions. The two sizes supported are int and long; this allows for patching instructions and pointers on both ppc32 and ppc64. On ppc32 these are the same size, so care is taken to only use the size parameter on static functions, so the compiler can optimise it out entirely. Unfortunately GCC trips over its own feet here and won't optimise in a way that is optimal for strict RWX (mpc85xx_smp_defconfig) and no RWX (pmac32_defconfig). In the first case, patch_memory() is very large and can only be inlined if a single function calls it. While the source only calls it in patch_instruction(), an earlier optimisation pass inlined patch_instruction() into patch_branch(), so now there are 'two' references to patch_memory() and it cannot be inlined into patch_instruction(). Instead patch_instruction() becomes a single branch directly to patch_memory(). We can fix this by marking patch_instruction() as noinline, but this prevents patch_memory() from being directly inlined into patch_branch() when RWX is disabled and patch_memory() is very small. It may be possible to avoid this by merging together patch_instruction() and patch_memory() on ppc32, but the only way I can think to do this without duplicating the implementation involves using the preprocessor to change if is_dword is a parameter or a local variable, which is gross. 
What about: diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h index 7c6056bb1706..af89ef450c93 100644 --- a/arch/powerpc/include/asm/code-patching.h +++ b/arch/powerpc/include/asm/code-patching.h @@ -72,7 +72,7 @@ static inline int create_branch(ppc_inst_t *instr, const u32 *addr, int create_cond_branch(ppc_inst_t *instr, const u32 *addr, unsigned long target, int flags); int patch_branch(u32 *addr, unsigned long target, int flags); -int patch_instruction(u32 *addr, ppc_inst_t instr); +int patch_memory(void *addr, unsigned long val, bool is_dword); int raw_patch_instruction(u32 *addr, ppc_inst_t instr); /* @@ -87,24 +87,28 @@ int raw_patch_instruction(u32 *addr, ppc_inst_t instr); #ifdef CONFIG_PPC64 -int patch_uint(void *addr, unsigned int val); -int patch_ulong(void *addr, unsigned long val); +int patch_instruction(u32 *addr, ppc_inst_t instr); #define patch_u64 patch_ulong #else -static inline int patch_uint(u32 *addr, unsigned int val) +static inline int patch_instruction(u32 *addr, ppc_inst_t instr) { - return patch_instruction(addr, ppc_inst(val)); + return patch_memory(addr, ppc_inst_val(instr), false); } +#endif + static inline int patch_ulong(void *addr, unsigned long val) { - return patch_instruction(addr, ppc_inst(val)); + return patch_memory(addr, val, true); } -#endif +static inline int patch_uint(void *addr, unsigned int val) +{ + return patch_memory(addr, val, false); +} #define patch_u32 patch_uint diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index 60289332412f..77418b2a4aa4 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -355,7 +355,7 @@ static int __do_patch_memory(void *addr, unsigned long val, bool is_dword) return err; } -static int patch_memory(void *addr, unsigned long val, bool is_dword) +int patch_memory(void *addr, unsigned long val, bool is_dword) { int err; unsigned long flags; @@ -378,6 +378,7 @@ static int 
patch_memory(void *addr, unsigned long val, bool is_dword) return err; } +NOKPROBE_SYMBOL(patch_memory) #ifdef CONFIG_PPC64 @@ -390,26 +391,6 @@ int patch_instruction(u32 *addr, ppc_inst_t instr) } NOKPROBE_SYMBOL(patch_instruction) -int patch_uint(void *addr, unsigned int val) -{ - return patch_memory(addr, val, false); -} -NOKPROBE_SYMBOL(patch_uint) - -int patch_ulong(void *addr, unsigned long val) -{ - return patch_memory(addr, val, true); -} -NOKPROBE_SYMBOL(patch_ulong) - -#else - -int patch_instruction(u32 *addr, ppc_inst_t instr) -{ - return patch_memory(addr, ppc_inst_val(instr), false); -} -NOKPROBE_SYMBOL(patch_instruction) - #endif int patch_branch(u32 *addr, unsigned long target, int flags) Wouldn't every caller need to initialise the is_dword parameter in that case? It can't tell it's unused across
[PATCH v2 3/3] powerpc/32: Convert patch_instruction() to patch_uint()
These changes are for patch_instruction() uses on data. Unlike ppc64 these should not be incorrect as-is, but using the patch_uint() alias better reflects what kind of data is being patched and allows for benchmarking the effect of different patch_* implementations (e.g., skipping instruction flushing when patching data).

Signed-off-by: Benjamin Gray
---
 arch/powerpc/kernel/static_call.c     | 2 +-
 arch/powerpc/platforms/powermac/smp.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/static_call.c b/arch/powerpc/kernel/static_call.c
index 863a7aa24650..1502b7e439ca 100644
--- a/arch/powerpc/kernel/static_call.c
+++ b/arch/powerpc/kernel/static_call.c
@@ -17,7 +17,7 @@ void arch_static_call_transform(void *site, void *tramp, void *func, bool tail)
 	mutex_lock(&text_mutex);
 
 	if (func && !is_short) {
-		err = patch_instruction(tramp + PPC_SCT_DATA, ppc_inst(target));
+		err = patch_ulong(tramp + PPC_SCT_DATA, target);
 		if (err)
 			goto out;
 	}
diff --git a/arch/powerpc/platforms/powermac/smp.c b/arch/powerpc/platforms/powermac/smp.c
index c83d1e14077e..cfc1cd10135d 100644
--- a/arch/powerpc/platforms/powermac/smp.c
+++ b/arch/powerpc/platforms/powermac/smp.c
@@ -827,7 +827,7 @@ static int smp_core99_kick_cpu(int nr)
 	mdelay(1);
 
 	/* Restore our exception vector */
-	patch_instruction(vector, ppc_inst(save_vector));
+	patch_uint(vector, save_vector);
 
 	local_irq_restore(flags);
 	if (ppc_md.progress) ppc_md.progress("smp_core99_kick_cpu done", 0x347);
--
2.39.2
[PATCH v2 2/3] powerpc/64: Convert patch_instruction() to patch_u32()
This use of patch_instruction() is working on 32 bit data, and can fail if the data looks like a prefixed instruction and the extra write crosses a page boundary. Use patch_u32() to fix the write size.

Fixes: 8734b41b3efe ("powerpc/module_64: Fix livepatching for RO modules")
Link: https://lore.kernel.org/all/20230203004649.1f59dbd4@yea/
Signed-off-by: Benjamin Gray
---
v2: * Added the fixes tag, it seems appropriate even if the subject does
      mention a more robust solution being required.

    patch_u64() should be more efficient, but judging from the bug
    report it doesn't seem like the data is doubleword aligned.
---
 arch/powerpc/kernel/module_64.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 7112adc597a8..e9bab599d0c2 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -651,12 +651,11 @@ static inline int create_stub(const Elf64_Shdr *sechdrs,
 	// func_desc_t is 8 bytes if ABIv2, else 16 bytes
 	desc = func_desc(addr);
 	for (i = 0; i < sizeof(func_desc_t) / sizeof(u32); i++) {
-		if (patch_instruction(((u32 *)&entry->funcdata) + i,
-				      ppc_inst(((u32 *)&desc)[i])))
+		if (patch_u32(((u32 *)&entry->funcdata) + i, ((u32 *)&desc)[i]))
 			return 0;
 	}
 
-	if (patch_instruction(&entry->magic, ppc_inst(STUB_MAGIC)))
+	if (patch_u32(&entry->magic, STUB_MAGIC))
 		return 0;
 
 	return 1;
--
2.39.2
[PATCH v2 1/3] powerpc/code-patching: Add generic memory patching
patch_instruction() is designed for patching instructions in otherwise readonly memory. Other consumers also sometimes need to patch readonly memory, so have abused patch_instruction() for arbitrary data patches. This is a problem on ppc64 as patch_instruction() decides on the patch width using the 'instruction' opcode to see if it's a prefixed instruction. Data that triggers this can lead to larger writes, possibly crossing a page boundary and failing the write altogether. Introduce patch_uint(), and patch_ulong(), with aliases patch_u32(), and patch_u64() (on ppc64) designed for aligned data patches. The patch size is now determined by the called function, and is passed as an additional parameter to generic internals. While the instruction flushing is not required for data patches, the use cases for data patching (mainly module loading and static calls) are less performance sensitive than for instruction patching (ftrace activation). So the instruction flushing remains unconditional in this patch. ppc32 does not support prefixed instructions, so is unaffected by the original issue. Care is taken in not exposing the size parameter in the public (non-static) interface, so the compiler can const-propagate it away. 
Signed-off-by: Benjamin Gray
---
v2: * Deduplicate patch_32() definition
    * Use u32 for val32
    * Remove noinline
---
 arch/powerpc/include/asm/code-patching.h | 33
 arch/powerpc/lib/code-patching.c         | 66 ++--
 2 files changed, 83 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index 3f881548fb61..7c6056bb1706 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -75,6 +75,39 @@ int patch_branch(u32 *addr, unsigned long target, int flags);
 int patch_instruction(u32 *addr, ppc_inst_t instr);
 int raw_patch_instruction(u32 *addr, ppc_inst_t instr);
 
+/*
+ * patch_uint() and patch_ulong() should only be called on addresses where the
+ * patch does not cross a cacheline, otherwise it may not be flushed properly
+ * and mixes of new and stale data may be observed. It cannot cross a page
+ * boundary, as only the target page is mapped as writable.
+ *
+ * patch_instruction() and other instruction patchers automatically satisfy this
+ * requirement due to instruction alignment requirements.
+ */
+
+#ifdef CONFIG_PPC64
+
+int patch_uint(void *addr, unsigned int val);
+int patch_ulong(void *addr, unsigned long val);
+
+#define patch_u64 patch_ulong
+
+#else
+
+static inline int patch_uint(u32 *addr, unsigned int val)
+{
+	return patch_instruction(addr, ppc_inst(val));
+}
+
+static inline int patch_ulong(void *addr, unsigned long val)
+{
+	return patch_instruction(addr, ppc_inst(val));
+}
+
+#endif
+
+#define patch_u32 patch_uint
+
 static inline unsigned long patch_site_addr(s32 *site)
 {
 	return (unsigned long)site + *site;
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index b00112d7ad46..60289332412f 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -20,15 +20,14 @@
 #include
 #include
 
-static int __patch_instruction(u32 *exec_addr, ppc_inst_t instr, u32 *patch_addr)
+static int __patch_memory(void *exec_addr, unsigned long val, void *patch_addr,
+			  bool is_dword)
 {
-	if (!ppc_inst_prefixed(instr)) {
-		u32 val = ppc_inst_val(instr);
+	if (!IS_ENABLED(CONFIG_PPC64) || likely(!is_dword)) {
+		u32 val32 = val;
 
-		__put_kernel_nofault(patch_addr, &val, u32, failed);
+		__put_kernel_nofault(patch_addr, &val32, u32, failed);
 	} else {
-		u64 val = ppc_inst_as_ulong(instr);
-
 		__put_kernel_nofault(patch_addr, &val, u64, failed);
 	}
 
@@ -43,7 +42,10 @@ static int __patch_instruction(u32 *exec_addr, ppc_inst_t instr, u32 *patch_addr
 
 int raw_patch_instruction(u32 *addr, ppc_inst_t instr)
 {
-	return __patch_instruction(addr, instr, addr);
+	if (ppc_inst_prefixed(instr))
+		return __patch_memory(addr, ppc_inst_as_ulong(instr), addr, true);
+	else
+		return __patch_memory(addr, ppc_inst_val(instr), addr, false);
 }
 
 struct patch_context {
@@ -278,7 +280,7 @@ static void unmap_patch_area(unsigned long addr)
 	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
 }
 
-static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
+static int __do_patch_memory_mm(void *addr, unsigned long val, bool is_dword)
 {
 	int err;
 	u32 *patch_addr;
@@ -307,7 +309,7 @@ static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
 
 	orig_mm = start_using_temp_mm(patching_mm);
 
-	err = __patch_instruction(addr, instr, patch_addr);
+	err = __patch_memory(addr, val, patch_addr, is_dword);
 
 	/* hwsync performed by __patch_instruction
[PATCH v2 0/3] Add generic data patching functions
Currently patch_instruction() bases the write length on the value being written. If the value looks like a prefixed instruction it writes 8 bytes, otherwise it writes 4 bytes. This makes it potentially buggy to use for writing arbitrary data, as if you want to write 4 bytes but it decides to write 8 bytes it may clobber the following memory or be unaligned and trigger an oops if it tries to cross a page boundary. To solve this, this series pulls out the size parameter to the 'top' of the text patching logic, and propagates it through the various functions. The two sizes supported are int and long; this allows for patching instructions and pointers on both ppc32 and ppc64. On ppc32 these are the same size, so care is taken to only use the size parameter on static functions, so the compiler can optimise it out entirely. Unfortunately GCC trips over its own feet here and won't optimise in a way that is optimal for strict RWX (mpc85xx_smp_defconfig) and no RWX (pmac32_defconfig). In the first case, patch_memory() is very large and can only be inlined if a single function calls it. While the source only calls it in patch_instruction(), an earlier optimisation pass inlined patch_instruction() into patch_branch(), so now there are 'two' references to patch_memory() and it cannot be inlined into patch_instruction(). Instead patch_instruction() becomes a single branch directly to patch_memory(). We can fix this by marking patch_instruction() as noinline, but this prevents patch_memory() from being directly inlined into patch_branch() when RWX is disabled and patch_memory() is very small. It may be possible to avoid this by merging together patch_instruction() and patch_memory() on ppc32, but the only way I can think to do this without duplicating the implementation involves using the preprocessor to change if is_dword is a parameter or a local variable, which is gross. 
For now I've removed the noinline, because at least the compiler might get smarter in future and do the inlines correctly. If noinline remains then there is no chance of it working. Changes from v1: * Addressed the v1 review actions * Removed noinline (for now) v1: https://patchwork.ozlabs.org/project/linuxppc-dev/cover/20230207015643.590684-1-bg...@linux.ibm.com/ Benjamin Gray (3): powerpc/code-patching: Add generic memory patching powerpc/64: Convert patch_instruction() to patch_u32() powerpc/32: Convert patch_instruction() to patch_uint() arch/powerpc/include/asm/code-patching.h | 33 arch/powerpc/kernel/module_64.c | 5 +- arch/powerpc/kernel/static_call.c| 2 +- arch/powerpc/lib/code-patching.c | 66 ++-- arch/powerpc/platforms/powermac/smp.c| 2 +- 5 files changed, 87 insertions(+), 21 deletions(-) -- 2.39.2
[PATCH 06/12] powerpc: Annotate endianness of various variables and functions
Sparse reports several endianness warnings on variables and functions that are consistently treated as big endian. There are no multi-endianness shenanigans going on here so fix these low hanging fruit up in one patch. All changes are just type annotations; no endianness switching operations are introduced by this patch. Signed-off-by: Benjamin Gray --- arch/powerpc/include/asm/book3s/64/pgtable.h| 2 +- arch/powerpc/include/asm/imc-pmu.h | 16 arch/powerpc/kernel/prom_init.c | 2 +- arch/powerpc/kexec/core_64.c| 4 ++-- arch/powerpc/kexec/file_load_64.c | 6 +++--- arch/powerpc/mm/drmem.c | 2 +- arch/powerpc/perf/hv-24x7.c | 2 +- arch/powerpc/perf/imc-pmu.c | 9 + arch/powerpc/platforms/powermac/feature.c | 3 ++- arch/powerpc/platforms/pseries/hotplug-memory.c | 3 ++- 10 files changed, 26 insertions(+), 23 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h index 5c497c862d75..7b0d0f9d343a 100644 --- a/arch/powerpc/include/asm/book3s/64/pgtable.h +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h @@ -647,7 +647,7 @@ static inline pte_t pte_mkuser(pte_t pte) */ static inline int pte_devmap(pte_t pte) { - u64 mask = cpu_to_be64(_PAGE_DEVMAP | _PAGE_PTE); + __be64 mask = cpu_to_be64(_PAGE_DEVMAP | _PAGE_PTE); return (pte_raw(pte) & mask) == mask; } diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h index 699a88584ae1..a656635df386 100644 --- a/arch/powerpc/include/asm/imc-pmu.h +++ b/arch/powerpc/include/asm/imc-pmu.h @@ -74,14 +74,14 @@ struct imc_events { * The following is the data structure to hold trace imc data. 
*/ struct trace_imc_data { - u64 tb1; - u64 ip; - u64 val; - u64 cpmc1; - u64 cpmc2; - u64 cpmc3; - u64 cpmc4; - u64 tb2; + __be64 tb1; + __be64 ip; + __be64 val; + __be64 cpmc1; + __be64 cpmc2; + __be64 cpmc3; + __be64 cpmc4; + __be64 tb2; }; /* Event attribute array index */ diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c index d464ba412084..e67effdba85c 100644 --- a/arch/powerpc/kernel/prom_init.c +++ b/arch/powerpc/kernel/prom_init.c @@ -947,7 +947,7 @@ struct option_vector7 { } __packed; struct ibm_arch_vec { - struct { u32 mask, val; } pvrs[14]; + struct { __be32 mask, val; } pvrs[14]; u8 num_vectors; diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c index a79e28c91e2b..0bee7ca9a77c 100644 --- a/arch/powerpc/kexec/core_64.c +++ b/arch/powerpc/kexec/core_64.c @@ -379,8 +379,8 @@ void default_machine_kexec(struct kimage *image) #ifdef CONFIG_PPC_64S_HASH_MMU /* Values we need to export to the second kernel via the device tree. */ -static unsigned long htab_base; -static unsigned long htab_size; +static __be64 htab_base; +static __be64 htab_size; static struct property htab_base_prop = { .name = "linux,htab-base", diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c index 09187aca3d1f..961a6dd67365 100644 --- a/arch/powerpc/kexec/file_load_64.c +++ b/arch/powerpc/kexec/file_load_64.c @@ -32,7 +32,7 @@ #include struct umem_info { - u64 *buf; /* data buffer for usable-memory property */ + __be64 *buf;/* data buffer for usable-memory property */ u32 size; /* size allocated for the data buffer */ u32 max_entries;/* maximum no. of entries */ u32 idx;/* index of current entry */ @@ -443,10 +443,10 @@ static int locate_mem_hole_bottom_up_ppc64(struct kexec_buf *kbuf, * * Returns buffer on success, NULL on error. 
*/ -static u64 *check_realloc_usable_mem(struct umem_info *um_info, int cnt) +static __be64 *check_realloc_usable_mem(struct umem_info *um_info, int cnt) { u32 new_size; - u64 *tbuf; + __be64 *tbuf; if ((um_info->idx + cnt) <= um_info->max_entries) return um_info->buf; diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c index 2369d1bf2411..fde7790277f7 100644 --- a/arch/powerpc/mm/drmem.c +++ b/arch/powerpc/mm/drmem.c @@ -67,7 +67,7 @@ static int drmem_update_dt_v1(struct device_node *memory, struct property *new_prop; struct of_drconf_cell_v1 *dr_cell; struct drmem_lmb *lmb; - u32 *p; + __be32 *p; new_prop = clone_property(prop, prop->length); if (!new_prop) diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c index 3449be7c0d51..057ec2e3451d 100644 --- a/arch/powerpc/perf/hv-24x7.c +++ b/arch/powerpc/perf/hv-24x7.c @@ -1338,7 +1338,7 @@ static int get_count_from_result(struct perf_event *
[PATCH 07/12] powerpc/kvm: Force cast endianness of KVM shared regs
Sparse reports endianness mismatches in the KVM shared regs getter and setter helpers. This code has dynamic endianness behind a safe interface, so a force is warranted here to tell sparse this is OK. Signed-off-by: Benjamin Gray --- arch/powerpc/include/asm/kvm_ppc.h | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index b4da8514af43..b547bcf1a995 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -943,18 +943,18 @@ static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, ulong val) \ static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu) \ { \ if (kvmppc_shared_big_endian(vcpu)) \ - return be##size##_to_cpu(vcpu->arch.shared->reg);\ + return be##size##_to_cpu((__be##size __force)vcpu->arch.shared->reg); \ else\ - return le##size##_to_cpu(vcpu->arch.shared->reg);\ + return le##size##_to_cpu((__le##size __force)vcpu->arch.shared->reg); \ } \ #define SHARED_WRAPPER_SET(reg, size) \ static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val) \ { \ if (kvmppc_shared_big_endian(vcpu)) \ - vcpu->arch.shared->reg = cpu_to_be##size(val); \ + vcpu->arch.shared->reg = (u##size __force)cpu_to_be##size(val); \ else\ - vcpu->arch.shared->reg = cpu_to_le##size(val); \ + vcpu->arch.shared->reg = (u##size __force)cpu_to_le##size(val); \ } \ #define SHARED_WRAPPER(reg, size) \ -- 2.39.2
[PATCH 12/12] powerpc/fadump: Annotate endianness cast with __force
Sparse reports an endianness error with the else case of val = (cpu_endian ? be64_to_cpu(reg_entry->reg_val) : (u64)(reg_entry->reg_val)); This is a safe operation because the code is explicitly working with dynamic endianness, so add the __force annotation to tell Sparse to ignore it. Signed-off-by: Benjamin Gray --- arch/powerpc/platforms/powernv/opal-fadump.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/powernv/opal-fadump.h b/arch/powerpc/platforms/powernv/opal-fadump.h index 3f715efb0aa6..5eeb794b5eb1 100644 --- a/arch/powerpc/platforms/powernv/opal-fadump.h +++ b/arch/powerpc/platforms/powernv/opal-fadump.h @@ -135,7 +135,7 @@ static inline void opal_fadump_read_regs(char *bufp, unsigned int regs_cnt, for (i = 0; i < regs_cnt; i++, bufp += reg_entry_size) { reg_entry = (struct hdat_fadump_reg_entry *)bufp; val = (cpu_endian ? be64_to_cpu(reg_entry->reg_val) : - (u64)(reg_entry->reg_val)); + (u64 __force)(reg_entry->reg_val)); opal_fadump_set_regval_regnum(regs, be32_to_cpu(reg_entry->reg_type), be32_to_cpu(reg_entry->reg_num), -- 2.39.2
[PATCH 09/12] powerpc/uaccess: Cast away __user annotation after verification
Sparse reports dereference of a __user pointer. copy_mc_to_user() takes a __user pointer, verifies it, then calls the generic copy routine copy_mc_generic(). As we have verified the pointer, cast out the __user annotation when passing to copy_mc_generic(). Signed-off-by: Benjamin Gray --- arch/powerpc/include/asm/uaccess.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h index fb725ec77926..f1f9890f50d3 100644 --- a/arch/powerpc/include/asm/uaccess.h +++ b/arch/powerpc/include/asm/uaccess.h @@ -374,7 +374,7 @@ copy_mc_to_user(void __user *to, const void *from, unsigned long n) if (check_copy_size(from, n, true)) { if (access_ok(to, n)) { allow_write_to_user(to, n); - n = copy_mc_generic((void *)to, from, n); + n = copy_mc_generic((void __force *)to, from, n); prevent_write_to_user(to, n); } } -- 2.39.2
[PATCH 11/12] powerpc/eeh: Remove unnecessary cast
Sparse reports a warning when casting to an int. There is no need to cast in the first place, so drop them. Signed-off-by: Benjamin Gray --- arch/powerpc/kernel/eeh_driver.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 438568a472d0..48773d2d9be3 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -39,7 +39,7 @@ static int eeh_result_priority(enum pci_ers_result result) case PCI_ERS_RESULT_NEED_RESET: return 6; default: - WARN_ONCE(1, "Unknown pci_ers_result value: %d\n", (int)result); + WARN_ONCE(1, "Unknown pci_ers_result value: %d\n", result); return 0; } }; @@ -60,7 +60,7 @@ static const char *pci_ers_result_name(enum pci_ers_result result) case PCI_ERS_RESULT_NO_AER_DRIVER: return "no AER driver"; default: - WARN_ONCE(1, "Unknown result type: %d\n", (int)result); + WARN_ONCE(1, "Unknown result type: %d\n", result); return "unknown"; } }; -- 2.39.2
[PATCH 10/12] powerpc: Cast away __iomem in low level IO routines
Sparse reports dereferencing an __iomem pointer. These routines are clearly low level handlers for IO memory, so force cast away the __iomem annotation to tell sparse the dereferences are safe. Signed-off-by: Benjamin Gray --- arch/powerpc/kernel/io.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/kernel/io.c b/arch/powerpc/kernel/io.c index 2f29b7d432de..6af535905984 100644 --- a/arch/powerpc/kernel/io.c +++ b/arch/powerpc/kernel/io.c @@ -33,7 +33,7 @@ void _insb(const volatile u8 __iomem *port, void *buf, long count) return; asm volatile("sync"); do { - tmp = *port; + tmp = *(const volatile u8 __force *)port; eieio(); *tbuf++ = tmp; } while (--count != 0); @@ -49,7 +49,7 @@ void _outsb(volatile u8 __iomem *port, const void *buf, long count) return; asm volatile("sync"); do { - *port = *tbuf++; + *(volatile u8 __force *)port = *tbuf++; } while (--count != 0); asm volatile("sync"); } @@ -64,7 +64,7 @@ void _insw_ns(const volatile u16 __iomem *port, void *buf, long count) return; asm volatile("sync"); do { - tmp = *port; + tmp = *(const volatile u16 __force *)port; eieio(); *tbuf++ = tmp; } while (--count != 0); @@ -80,7 +80,7 @@ void _outsw_ns(volatile u16 __iomem *port, const void *buf, long count) return; asm volatile("sync"); do { - *port = *tbuf++; + *(volatile u16 __force *)port = *tbuf++; } while (--count != 0); asm volatile("sync"); } @@ -95,7 +95,7 @@ void _insl_ns(const volatile u32 __iomem *port, void *buf, long count) return; asm volatile("sync"); do { - tmp = *port; + tmp = *(const volatile u32 __force *)port; eieio(); *tbuf++ = tmp; } while (--count != 0); @@ -111,7 +111,7 @@ void _outsl_ns(volatile u32 __iomem *port, const void *buf, long count) return; asm volatile("sync"); do { - *port = *tbuf++; + *(volatile u32 __force *)port = *tbuf++; } while (--count != 0); asm volatile("sync"); } -- 2.39.2
[PATCH 08/12] powerpc/opal: Annotate out param endianness
Sparse reports an endian mismatch with args to opal_int_get_xirr(). Checking the skiboot source[1] shows the function takes a __be32* (as expected), so update the function declaration to reflect this. [1]: https://github.com/open-power/skiboot/blob/80e2b1dc73/hw/xive.c#L3479 Signed-off-by: Benjamin Gray --- arch/powerpc/include/asm/opal.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index a9b31cc258fc..b66b0c615f4f 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -227,7 +227,7 @@ int64_t opal_pci_set_power_state(uint64_t async_token, uint64_t id, uint64_t data); int64_t opal_pci_poll2(uint64_t id, uint64_t data); -int64_t opal_int_get_xirr(uint32_t *out_xirr, bool just_poll); +int64_t opal_int_get_xirr(__be32 *out_xirr, bool just_poll); int64_t opal_int_set_cppr(uint8_t cppr); int64_t opal_int_eoi(uint32_t xirr); int64_t opal_int_set_mfrr(uint32_t cpu, uint8_t mfrr); -- 2.39.2
[PATCH 05/12] powerpc: Remove extern from function implementations
Sparse reports several function implementations annotated with extern. This is clearly incorrect — extern has no effect on a definition — and was likely just copied from an actual extern declaration in another file. Fix the sparse warnings by removing extern.

Signed-off-by: Benjamin Gray
---
 arch/powerpc/kernel/iommu.c      | 8
 arch/powerpc/kernel/traps.c      | 4 ++--
 arch/powerpc/kvm/book3s_64_vio.c | 8
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 14251bc5219e..3e28579f7c62 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1074,10 +1074,10 @@ int iommu_tce_check_gpa(unsigned long page_shift, unsigned long gpa)
 }
 EXPORT_SYMBOL_GPL(iommu_tce_check_gpa);

-extern long iommu_tce_xchg_no_kill(struct mm_struct *mm,
-		struct iommu_table *tbl,
-		unsigned long entry, unsigned long *hpa,
-		enum dma_data_direction *direction)
+long iommu_tce_xchg_no_kill(struct mm_struct *mm,
+		struct iommu_table *tbl,
+		unsigned long entry, unsigned long *hpa,
+		enum dma_data_direction *direction)
 {
 	long ret;
 	unsigned long size = 0;
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 64ff37721fd0..12d128238cfe 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -157,7 +157,7 @@ static int die_owner = -1;
 static unsigned int die_nest_count;
 static int die_counter;

-extern void panic_flush_kmsg_start(void)
+void panic_flush_kmsg_start(void)
 {
 	/*
 	 * These are mostly taken from kernel/panic.c, but tries to do
@@ -170,7 +170,7 @@ extern void panic_flush_kmsg_start(void)
 	bust_spinlocks(1);
 }

-extern void panic_flush_kmsg_end(void)
+void panic_flush_kmsg_end(void)
 {
 	kmsg_dump(KMSG_DUMP_PANIC);
 	bust_spinlocks(0);
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 93b695b289e9..15200d766fc5 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -77,8 +77,8 @@ static void kvm_spapr_tce_liobn_put(struct kref *kref)
 	call_rcu(&stt->rcu, kvm_spapr_tce_iommu_table_free);
 }

-extern void kvm_spapr_tce_release_iommu_group(struct kvm *kvm,
-		struct iommu_group *grp)
+void kvm_spapr_tce_release_iommu_group(struct kvm *kvm,
+		struct iommu_group *grp)
 {
 	int i;
 	struct kvmppc_spapr_tce_table *stt;
@@ -105,8 +105,8 @@ extern void kvm_spapr_tce_release_iommu_group(struct kvm *kvm,
 	rcu_read_unlock();
 }

-extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
-		struct iommu_group *grp)
+long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
+		struct iommu_group *grp)
 {
 	struct kvmppc_spapr_tce_table *stt = NULL;
 	bool found = false;
--
2.39.2
[PATCH 04/12] powerpc: Use NULL instead of 0 for null pointers
Sparse reports several uses of 0 for pointer arguments and comparisons. Replace them with NULL to better convey the intent. Remove the comparison entirely where possible, to follow the kernel style of implicit boolean conversions.

Signed-off-by: Benjamin Gray
---
 arch/powerpc/kernel/setup_64.c        | 2 +-
 arch/powerpc/kvm/book3s_xive_native.c | 2 +-
 arch/powerpc/net/bpf_jit_comp.c       | 8
 arch/powerpc/perf/imc-pmu.c           | 2 +-
 arch/powerpc/platforms/4xx/soc.c      | 2 +-
 arch/powerpc/platforms/pseries/lpar.c | 8
 6 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 246201d0d879..2f19d5e94485 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -364,7 +364,7 @@ void __init early_setup(unsigned long dt_ptr)
 	 */
 	initialise_paca(&boot_paca, 0);
 	fixup_boot_paca(&boot_paca);
-	WARN_ON(local_paca != 0);
+	WARN_ON(local_paca);
 	setup_paca(&boot_paca); /* install the paca into registers */

 	/* printk is now safe to use --- */
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index 712ab91ced39..6e2ebbd8aaac 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -567,7 +567,7 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
 	u8 priority;
 	struct kvm_ppc_xive_eq kvm_eq;
 	int rc;
-	__be32 *qaddr = 0;
+	__be32 *qaddr = NULL;
 	struct page *page;
 	struct xive_q *q;
 	gfn_t gfn;
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 37043dfc1add..316cadf442b2 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -119,7 +119,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 		cgctx.stack_size = round_up(fp->aux->stack_depth, 16);

 		/* Scouting faux-generate pass 0 */
-		if (bpf_jit_build_body(fp, 0, &cgctx, addrs, 0, false)) {
+		if (bpf_jit_build_body(fp, NULL, &cgctx, addrs, 0, false)) {
 			/* We hit something illegal or unsupported. */
 			fp = org_fp;
 			goto out_addrs;
@@ -134,7 +134,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	 */
 	if (cgctx.seen & SEEN_TAILCALL || !is_offset_in_branch_range((long)cgctx.idx * 4)) {
 		cgctx.idx = 0;
-		if (bpf_jit_build_body(fp, 0, &cgctx, addrs, 0, false)) {
+		if (bpf_jit_build_body(fp, NULL, &cgctx, addrs, 0, false)) {
 			fp = org_fp;
 			goto out_addrs;
 		}
@@ -146,9 +146,9 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	 * update ctgtx.idx as it pretends to output instructions, then we can
 	 * calculate total size from idx.
 	 */
-	bpf_jit_build_prologue(0, &cgctx);
+	bpf_jit_build_prologue(NULL, &cgctx);
 	addrs[fp->len] = cgctx.idx * 4;
-	bpf_jit_build_epilogue(0, &cgctx);
+	bpf_jit_build_epilogue(NULL, &cgctx);

 	fixup_len = fp->aux->num_exentries * BPF_FIXUP_LEN * 4;
 	extable_len = fp->aux->num_exentries * sizeof(struct exception_table_entry);
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 9d229ef7f86e..eba2baabccb0 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -544,7 +544,7 @@ static int nest_imc_event_init(struct perf_event *event)
 			break;
 		}
 		pcni++;
-	} while (pcni->vbase != 0);
+	} while (pcni->vbase);

 	if (!flag)
 		return -ENODEV;
diff --git a/arch/powerpc/platforms/4xx/soc.c b/arch/powerpc/platforms/4xx/soc.c
index b2d940437a66..5412e6b21e10 100644
--- a/arch/powerpc/platforms/4xx/soc.c
+++ b/arch/powerpc/platforms/4xx/soc.c
@@ -112,7 +112,7 @@ static int __init ppc4xx_l2c_probe(void)
 	}

 	/* Install error handler */
-	if (request_irq(irq, l2c_error_handler, 0, "L2C", 0) < 0) {
+	if (request_irq(irq, l2c_error_handler, 0, "L2C", NULL) < 0) {
 		printk(KERN_ERR "Cannot install L2C error handler"
 		       ", cache is not enabled\n");
 		of_node_put(np);
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index f2cb62148f36..b4e86b7ea584 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -192,9 +192,9 @@ static void free_dtl_buffers(unsigned long *time_limit)
 			continue;
 		kmem_cache_free(dtl_cache, pp->dispatch_log);
 		pp->dtl_ridx = 0;
-		pp->dispatch_log = 0;
-		pp->dispatch_log_end = 0;
-		pp->dtl_curr = 0;
+		pp->dispatch_log = NULL;
+		pp->dispatch_log_end = NULL;
+		pp->dtl_curr = NULL;
[PATCH 00/12] Miscellaneous Sparse fixes
There are many Sparse warnings in the kernel, including in the powerpc directory. This series provides fixes for some low-hanging fruit found when trying to triage the warnings earlier this year. It addresses about 100 warnings (many have the same root cause).

I know there are concerns about making it harder to backport things. In general, as someone who was not around during the development of these features, I think it is useful to make the annotations as correct as possible. But it's no fuss if some/all of the patches are nacked for this reason. I just figured some of it might be useful instead of continuing to sit on it indefinitely.

Benjamin Gray (12):
  powerpc/xive: Fix endian conversion size
  powerpc/pseries: Restructure hvc_get_chars() endianness
  powerpc: Explicitly reverse bytes when checking for byte reversal
  powerpc: Use NULL instead of 0 for null pointers
  powerpc: Remove extern from function implementations
  powerpc: Annotate endianness of various variables and functions
  powerpc/kvm: Force cast endianness of KVM shared regs
  powerpc/opal: Annotate out param endianness
  powerpc/uaccess: Cast away __user annotation after verification
  powerpc: Cast away __iomem in low level IO routines
  powerpc/eeh: Remove unnecessary cast
  powerpc/fadump: Annotate endianness cast with __force

 arch/powerpc/include/asm/book3s/64/pgtable.h    | 2 +-
 arch/powerpc/include/asm/imc-pmu.h              | 16
 arch/powerpc/include/asm/kvm_ppc.h              | 8
 arch/powerpc/include/asm/opal.h                 | 2 +-
 arch/powerpc/include/asm/uaccess.h              | 2 +-
 arch/powerpc/kernel/eeh_driver.c                | 4 ++--
 arch/powerpc/kernel/io.c                        | 12 ++--
 arch/powerpc/kernel/iommu.c                     | 8
 arch/powerpc/kernel/prom_init.c                 | 2 +-
 arch/powerpc/kernel/setup_64.c                  | 2 +-
 arch/powerpc/kernel/traps.c                     | 4 ++--
 arch/powerpc/kexec/core_64.c                    | 4 ++--
 arch/powerpc/kexec/file_load_64.c               | 6 +++---
 arch/powerpc/kvm/book3s_64_vio.c                | 8
 arch/powerpc/kvm/book3s_xive_native.c           | 2 +-
 arch/powerpc/mm/drmem.c                         | 2 +-
 arch/powerpc/net/bpf_jit_comp.c                 | 8
 arch/powerpc/perf/hv-24x7.c                     | 2 +-
 arch/powerpc/perf/imc-pmu.c                     | 11 ++-
 arch/powerpc/platforms/4xx/soc.c                | 2 +-
 arch/powerpc/platforms/powermac/feature.c       | 3 ++-
 arch/powerpc/platforms/powernv/opal-fadump.h    | 2 +-
 arch/powerpc/platforms/pseries/hotplug-memory.c | 3 ++-
 arch/powerpc/platforms/pseries/hvconsole.c      | 6 +++---
 arch/powerpc/platforms/pseries/lpar.c           | 8
 arch/powerpc/sysdev/mpic.c                      | 2 +-
 arch/powerpc/sysdev/xive/native.c               | 2 +-
 27 files changed, 68 insertions(+), 65 deletions(-)

--
2.39.2
[PATCH 01/12] powerpc/xive: Fix endian conversion size
Sparse reports a size mismatch in the endian swap. The Opal implementation[1] passes the value as a __be64, and the receiving variable out_qsize is a u64, so the use of be32_to_cpu() appears to be an error. [1]: https://github.com/open-power/skiboot/blob/80e2b1dc73/hw/xive.c#L3854 Fixes: 88ec6b93c8e7 ("powerpc/xive: add OPAL extensions for the XIVE native exploitation support") Signed-off-by: Benjamin Gray --- arch/powerpc/sysdev/xive/native.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/sysdev/xive/native.c b/arch/powerpc/sysdev/xive/native.c index 9f0af4d795d8..f1c0fa6ece21 100644 --- a/arch/powerpc/sysdev/xive/native.c +++ b/arch/powerpc/sysdev/xive/native.c @@ -802,7 +802,7 @@ int xive_native_get_queue_info(u32 vp_id, u32 prio, if (out_qpage) *out_qpage = be64_to_cpu(qpage); if (out_qsize) - *out_qsize = be32_to_cpu(qsize); + *out_qsize = be64_to_cpu(qsize); if (out_qeoi_page) *out_qeoi_page = be64_to_cpu(qeoi_page); if (out_escalate_irq) -- 2.39.2
[PATCH 03/12] powerpc: Explicitly reverse bytes when checking for byte reversal
Sparse reports an invalid endian cast here. The code is written for big endian platforms, so le32_to_cpu() acts as a byte reversal. This file is checked by sparse on a little endian build though, so replace the reverse function with the dedicated swab32() function to better express the intent of the code. Signed-off-by: Benjamin Gray --- arch/powerpc/sysdev/mpic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/sysdev/mpic.c b/arch/powerpc/sysdev/mpic.c index ba287abcb008..dabbdd356664 100644 --- a/arch/powerpc/sysdev/mpic.c +++ b/arch/powerpc/sysdev/mpic.c @@ -355,7 +355,7 @@ static void __init mpic_test_broken_ipi(struct mpic *mpic) mpic_write(mpic->gregs, MPIC_INFO(GREG_IPI_VECTOR_PRI_0), MPIC_VECPRI_MASK); r = mpic_read(mpic->gregs, MPIC_INFO(GREG_IPI_VECTOR_PRI_0)); - if (r == le32_to_cpu(MPIC_VECPRI_MASK)) { + if (r == swab32(MPIC_VECPRI_MASK)) { printk(KERN_INFO "mpic: Detected reversed IPI registers\n"); mpic->flags |= MPIC_BROKEN_IPI; } -- 2.39.2
[PATCH 02/12] powerpc/pseries: Restructure hvc_get_chars() endianness
Sparse reports an endian mismatch in hvc_get_chars(). At first it seemed like the retbuf should be __be64[], but actually retbuf holds serialized registers returned by the hypervisor call, so it's correctly CPU endian typed. Instead, it is the be64_to_cpu() that's misleading. The intent is to do a byte swap on a little endian CPU. The swap is required because we wanted to store the register values to memory without 'swapping' bytes, so that the high order byte of the first register is the first byte in the result buffer. In diagram form, on a LE CPU with the letters representing the return string bytes: (register bytes) A B C D E F G H I J K L M N O P (retbuf mem bytes) H G F E D C B A P O N M L K J I (buf/lbuf mem bytes) A B C D E F G H I J K L M N O P So retbuf stores the registers in CPU endian, and buf always stores in big endian. So replace the byte swap function with cpu_to_be64() and cast lbuf as an array of __be64 to match the semantics closer. Signed-off-by: Benjamin Gray --- arch/powerpc/platforms/pseries/hvconsole.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/platforms/pseries/hvconsole.c b/arch/powerpc/platforms/pseries/hvconsole.c index 1ac52963e08b..647718a15e78 100644 --- a/arch/powerpc/platforms/pseries/hvconsole.c +++ b/arch/powerpc/platforms/pseries/hvconsole.c @@ -29,11 +29,11 @@ int hvc_get_chars(uint32_t vtermno, char *buf, int count) { long ret; unsigned long retbuf[PLPAR_HCALL_BUFSIZE]; - unsigned long *lbuf = (unsigned long *)buf; + __be64 *lbuf = (__be64 __force *)buf; ret = plpar_hcall(H_GET_TERM_CHAR, retbuf, vtermno); - lbuf[0] = be64_to_cpu(retbuf[1]); - lbuf[1] = be64_to_cpu(retbuf[2]); + lbuf[0] = cpu_to_be64(retbuf[1]); + lbuf[1] = cpu_to_be64(retbuf[2]); if (ret == H_SUCCESS) return retbuf[0]; -- 2.39.2
[RFC PATCH 5/6] powerpc/dexcr: Add sysctl entry for SBHE system override
The DEXCR Speculative Branch Hint Enable (SBHE) aspect controls whether the hints provided by the BO field of branch instructions are obeyed during speculative execution.

SBHE behaviour per ISA 3.1B:

0: The hints provided by the BO field of branch instructions may be ignored during speculative execution
1: The hints provided by the BO field of branch instructions are obeyed during speculative execution

Add a sysctl entry to allow changing this aspect globally in the system at runtime:

  /proc/sys/kernel/speculative_branch_hint_enable

Three values are supported:

-1: Disable DEXCR SBHE sysctl override
 0: Override and set DEXCR[SBHE] aspect to 0
 1: Override and set DEXCR[SBHE] aspect to 1

Internally, this introduces a mechanism to apply arbitrary system wide overrides on top of the prctl() config.

Signed-off-by: Benjamin Gray
---
 arch/powerpc/include/asm/processor.h | 8 +--
 arch/powerpc/kernel/dexcr.c          | 85
 2 files changed, 89 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index a9d83621dfad..e7b732efb968 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -461,14 +461,14 @@ int exit_vmx_usercopy(void);
 int enter_vmx_ops(void);
 void *exit_vmx_ops(void *dest);

-static inline unsigned long get_thread_dexcr(struct thread_struct const *thread)
-{
 #ifdef CONFIG_PPC_BOOK3S_64
-	return thread->dexcr_enabled;
+unsigned long get_thread_dexcr(struct thread_struct const *thread);
 #else
+static inline unsigned long get_thread_dexcr(struct thread_struct const *thread)
+{
 	return 0;
-#endif
 }
+#endif

 #endif /* __KERNEL__ */
 #endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/kernel/dexcr.c b/arch/powerpc/kernel/dexcr.c
index db663ce7b3ce..e790f76787db 100644
--- a/arch/powerpc/kernel/dexcr.c
+++ b/arch/powerpc/kernel/dexcr.c
@@ -1,7 +1,9 @@
 #include
+#include
 #include
 #include
 #include
+#include
 #include
 #include
@@ -16,6 +18,8 @@
 static unsigned long dexcr_supported __ro_after_init = 0;

+static int spec_branch_hint_enable = -1;
+
 static int __init dexcr_init(void)
 {
 	if (!early_cpu_has_feature(CPU_FTR_ARCH_31))
@@ -37,6 +41,35 @@ static int __init dexcr_init(void)
 }
 early_initcall(dexcr_init);

+unsigned long get_thread_dexcr(struct thread_struct const *thread)
+{
+	unsigned long dexcr = thread->dexcr_enabled;
+
+	/*
+	 * spec_branch_hint_enable may be written to concurrently via sysctl.
+	 * The sysctl handler is careful to use WRITE_ONCE, so we avoid
+	 * tearing/different values with READ_ONCE
+	 */
+	switch (READ_ONCE(spec_branch_hint_enable)) {
+	case 0:
+		dexcr &= ~DEXCR_PR_SBHE;
+		break;
+	case 1:
+		dexcr |= DEXCR_PR_SBHE;
+		break;
+	}
+
+	return dexcr;
+}
+
+static void update_dexcr_on_cpu(void *_info)
+{
+	/* ensure the spec_branch_hint_enable write propagated to this CPU */
+	smp_mb();
+
+	mtspr(SPRN_DEXCR, get_thread_dexcr(&current->thread));
+}
+
 static int prctl_to_aspect(unsigned long which, unsigned int *aspect)
 {
 	switch (which) {
@@ -126,3 +159,55 @@ int set_dexcr_prctl(struct task_struct *task, unsigned long which, unsigned long

 	return 0;
 }
+
+#ifdef CONFIG_SYSCTL
+
+static const int min_sysctl_val = -1;
+
+static int sysctl_dexcr_sbhe_handler(struct ctl_table *table, int write,
+				     void *buf, size_t *lenp, loff_t *ppos)
+{
+	int err = 0;
+	int prev;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (!cpu_has_feature(CPU_FTR_DEXCR_SBHE))
+		return -ENODEV;
+
+	prev = READ_ONCE(spec_branch_hint_enable);
+
+	err = proc_dointvec_minmax(table, write, buf, lenp, ppos);
+	if (err)
+		return err;
+
+	if (spec_branch_hint_enable != prev && write)
+		on_each_cpu(update_dexcr_on_cpu, NULL, 1);
+
+	return 0;
+}
+
+static struct ctl_table dexcr_sbhe_ctl_table[] = {
+	{
+		.procname = "speculative_branch_hint_enable",
+		.data = &spec_branch_hint_enable,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = sysctl_dexcr_sbhe_handler,
+		.extra1 = (void *)&min_sysctl_val,
+		.extra2 = SYSCTL_ONE,
+	}
+};
+
+static int __init register_dexcr_aspects_sysctl(void)
+{
+	if (!early_cpu_has_feature(CPU_FTR_DEXCR_SBHE))
+		return -ENODEV;
+
+	register_sysctl("kernel", dexcr_sbhe_ctl_table);
+	return 0;
+}
+device_initcall(register_dexcr_aspects_sysctl);
+
+#endif /* CONFIG_SYSCTL */
--
2.41.0
[RFC PATCH 6/6] powerpc/dexcr: Add enforced userspace ROP protection config
The DEXCR Non-Privileged Hash Instruction Enable (NPHIE) aspect controls whether the hashst and hashchk instructions are treated as no-ops by the CPU. NPHIE behaviour per ISA 3.1B: 0: hashst and hashchk instructions are executed as no-ops (even when allowed by PCR) 1: hashst and hashchk instructions are executed normally (if allowed by PCR) Currently this aspect may be set per-process by prctl() or enforced globally by the hypervisor. Add a kernel config option PPC_USER_ENFORCE_ROP_PROTECT to enforce DEXCR[NPHIE] globally regardless of prctl() or hypervisor. If set, don't report NPHIE as editable via prctl(), as the prctl() value can never take effect. Signed-off-by: Benjamin Gray --- arch/powerpc/Kconfig| 5 + arch/powerpc/kernel/dexcr.c | 7 ++- 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 3aaadfd2c8eb..4851cb463dc0 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -473,6 +473,11 @@ config PGTABLE_LEVELS default 2 if !PPC64 default 4 +config PPC_USER_ENFORCE_ROP_PROTECT + bool + depends on PPC_BOOK3S_64 + default y + source "arch/powerpc/sysdev/Kconfig" source "arch/powerpc/platforms/Kconfig" diff --git a/arch/powerpc/kernel/dexcr.c b/arch/powerpc/kernel/dexcr.c index e790f76787db..01d8fa28ca17 100644 --- a/arch/powerpc/kernel/dexcr.c +++ b/arch/powerpc/kernel/dexcr.c @@ -1,6 +1,8 @@ +#include #include #include #include +#include #include #include #include @@ -14,7 +16,7 @@ #define DEXCR_PRCTL_EDITABLE ( \ DEXCR_PR_IBRTPD | \ DEXCR_PR_SRAPD | \ - DEXCR_PR_NPHIE) + (!IS_ENABLED(CONFIG_PPC_USER_ENFORCE_ROP_PROTECT) ? DEXCR_PR_NPHIE : 0)) static unsigned long dexcr_supported __ro_after_init = 0; @@ -45,6 +47,9 @@ unsigned long get_thread_dexcr(struct thread_struct const *thread) { unsigned long dexcr = thread->dexcr_enabled; + if (IS_ENABLED(CONFIG_PPC_USER_ENFORCE_ROP_PROTECT)) + dexcr |= DEXCR_PR_NPHIE; + /* * spec_branch_hint_enable may be written to concurrently via sysctl. 
* The sysctl handler is careful to use WRITE_ONCE, so we avoid -- 2.41.0
[RFC PATCH 2/6] powerpc/dexcr: Add thread specific DEXCR configuration
Add the capability to track a DEXCR value per thread. Nothing actually changes these values yet, but they are correctly tracked, propagated, and used to set the hardware register.

Signed-off-by: Benjamin Gray
---
 arch/powerpc/include/asm/processor.h | 12
 arch/powerpc/kernel/process.c        | 24
 2 files changed, 36 insertions(+)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index b2c51d337e60..28a72023f9bd 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -260,6 +260,9 @@ struct thread_struct {
 	unsigned long sier2;
 	unsigned long sier3;
 	unsigned long hashkeyr;
+	unsigned int dexcr;         /* Temporary value saved during thread switch */
+	unsigned int dexcr_enabled; /* Bitmask of aspects enabled by this thread */
+	unsigned int dexcr_inherit; /* Bitmask of aspects to inherit across exec */
 #endif
 };

@@ -448,6 +451,15 @@ int exit_vmx_usercopy(void);
 int enter_vmx_ops(void);
 void *exit_vmx_ops(void *dest);

+static inline unsigned long get_thread_dexcr(struct thread_struct const *thread)
+{
+#ifdef CONFIG_PPC_BOOK3S_64
+	return thread->dexcr_enabled;
+#else
+	return 0;
+#endif
+}
+
 #endif /* __KERNEL__ */
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_PROCESSOR_H */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index b68898ac07e1..c05d58b7c6b3 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1185,6 +1185,9 @@ static inline void save_sprs(struct thread_struct *t)

 	if (cpu_has_feature(CPU_FTR_DEXCR_NPHIE))
 		t->hashkeyr = mfspr(SPRN_HASHKEYR);
+
+	if (cpu_has_feature(CPU_FTR_ARCH_31))
+		t->dexcr = mfspr(SPRN_DEXCR);
 #endif
 }

@@ -1267,6 +1270,10 @@ static inline void restore_sprs(struct thread_struct *old_thread,
 	if (cpu_has_feature(CPU_FTR_DEXCR_NPHIE) &&
 	    old_thread->hashkeyr != new_thread->hashkeyr)
 		mtspr(SPRN_HASHKEYR, new_thread->hashkeyr);
+
+	if (cpu_has_feature(CPU_FTR_ARCH_31) &&
+	    old_thread->dexcr != get_thread_dexcr(new_thread))
+		mtspr(SPRN_DEXCR, get_thread_dexcr(new_thread));
 #endif
 }

@@ -1634,6 +1641,11 @@ void arch_setup_new_exec(void)
 	current->thread.regs->amr = default_amr;
 	current->thread.regs->iamr = default_iamr;
 #endif
+
+#ifdef CONFIG_PPC_BOOK3S_64
+	current->thread.dexcr_enabled &= current->thread.dexcr_inherit;
+	current->thread.dexcr_enabled |= ~current->thread.dexcr_inherit & DEXCR_INIT;
+#endif
 }

@@ -1878,6 +1890,11 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 #ifdef CONFIG_PPC_BOOK3S_64
 	if (cpu_has_feature(CPU_FTR_DEXCR_NPHIE))
 		p->thread.hashkeyr = current->thread.hashkeyr;
+
+	if (cpu_has_feature(CPU_FTR_ARCH_31)) {
+		p->thread.dexcr_enabled = current->thread.dexcr_enabled;
+		p->thread.dexcr_inherit = current->thread.dexcr_inherit;
+	}
 #endif
 	return 0;
 }

@@ -2000,6 +2017,13 @@ void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp)
 		current->thread.hashkeyr = get_random_long();
 		mtspr(SPRN_HASHKEYR, current->thread.hashkeyr);
 	}
+
+	if (cpu_has_feature(CPU_FTR_ARCH_31)) {
+		current->thread.dexcr = 0;
+		current->thread.dexcr_enabled = DEXCR_INIT;
+		current->thread.dexcr_inherit = 0;
+		mtspr(SPRN_DEXCR, get_thread_dexcr(&current->thread));
+	}
 #endif /* CONFIG_PPC_BOOK3S_64 */
 }
 EXPORT_SYMBOL(start_thread);
--
2.41.0
[RFC PATCH 1/6] powerpc/dexcr: Make all aspects CPU features
The CPU_FEATURE_* mechanism is the only way right now to get configuration from the "ibm,pa-features" devicetree node. Add a CPU_FEATURE_* entry for each other DEXCR aspect that will be exposed to userspace. The NPHIE feature value is changed for consistency; the actual value is never accessed or exposed to userspace, so there is no breakage. Signed-off-by: Benjamin Gray --- arch/powerpc/include/asm/cputable.h | 6 +- arch/powerpc/kernel/prom.c | 3 +++ 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index 8765d5158324..bd087d0cb5fa 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -192,7 +192,10 @@ static inline void cpu_feature_keys_init(void) { } #define CPU_FTR_P9_RADIX_PREFETCH_BUG LONG_ASM_CONST(0x0002) #define CPU_FTR_ARCH_31 LONG_ASM_CONST(0x0004) #define CPU_FTR_DAWR1 LONG_ASM_CONST(0x0008) -#define CPU_FTR_DEXCR_NPHIELONG_ASM_CONST(0x0010) +#define CPU_FTR_DEXCR_SBHE LONG_ASM_CONST(0x0010) +#define CPU_FTR_DEXCR_IBRTPD LONG_ASM_CONST(0x0020) +#define CPU_FTR_DEXCR_SRAPDLONG_ASM_CONST(0x0040) +#define CPU_FTR_DEXCR_NPHIELONG_ASM_CONST(0x0080) #ifndef __ASSEMBLY__ @@ -453,6 +456,7 @@ static inline void cpu_feature_keys_init(void) { } CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \ CPU_FTR_ARCH_300 | CPU_FTR_ARCH_31 | \ CPU_FTR_DAWR | CPU_FTR_DAWR1 | \ + CPU_FTR_DEXCR_SBHE | CPU_FTR_DEXCR_IBRTPD | CPU_FTR_DEXCR_SRAPD | \ CPU_FTR_DEXCR_NPHIE) #define CPU_FTRS_CELL (CPU_FTR_LWSYNC | \ CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \ diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c index 0b5878c3125b..ea081a5d2023 100644 --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -182,6 +182,9 @@ static struct ibm_feature ibm_pa_features[] __initdata = { .cpu_user_ftrs2 = PPC_FEATURE2_HTM_COMP | PPC_FEATURE2_HTM_NOSC_COMP }, { .pabyte = 64, .pabit = 0, .cpu_features = CPU_FTR_DAWR1 }, + { .pabyte = 68, 
.pabit = 0, .cpu_features = CPU_FTR_DEXCR_SBHE }, + { .pabyte = 68, .pabit = 3, .cpu_features = CPU_FTR_DEXCR_IBRTPD }, + { .pabyte = 68, .pabit = 4, .cpu_features = CPU_FTR_DEXCR_SRAPD }, { .pabyte = 68, .pabit = 5, .cpu_features = CPU_FTR_DEXCR_NPHIE }, }; -- 2.41.0
[RFC PATCH 4/6] powerpc/dexcr: Add prctl implementation
Adds an initial prctl interface implementation. Unprivileged processes can query the current prctl setting, including whether an aspect is implemented by the hardware or is permitted to be modified by a setter prctl. Editable aspects can be changed by a CAP_SYS_ADMIN privileged process.

The prctl setting represents what the process itself has requested, and does not account for any overrides. Either the kernel or a hypervisor may enforce a different setting for an aspect.

Userspace can access a readonly view of the current DEXCR via SPR 812, and a readonly view of the aspects enforced by the hypervisor via SPR 455. A bitwise OR of these two SPRs will give the effective DEXCR aspect state of the process.

Signed-off-by: Benjamin Gray
---
 arch/powerpc/include/asm/processor.h | 10 +++
 arch/powerpc/kernel/Makefile         | 1 +
 arch/powerpc/kernel/dexcr.c          | 128 +++
 3 files changed, 139 insertions(+)
 create mode 100644 arch/powerpc/kernel/dexcr.c

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 28a72023f9bd..a9d83621dfad 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -336,6 +336,16 @@ extern int set_endian(struct task_struct *tsk, unsigned int val);
 extern int get_unalign_ctl(struct task_struct *tsk, unsigned long adr);
 extern int set_unalign_ctl(struct task_struct *tsk, unsigned int val);

+#ifdef CONFIG_PPC_BOOK3S_64
+
+#define PPC_GET_DEXCR_ASPECT(tsk, asp) get_dexcr_prctl((tsk), (asp))
+#define PPC_SET_DEXCR_ASPECT(tsk, asp, val) set_dexcr_prctl((tsk), (asp), (val))
+
+int get_dexcr_prctl(struct task_struct *tsk, unsigned long asp);
+int set_dexcr_prctl(struct task_struct *tsk, unsigned long asp, unsigned long val);
+
+#endif
+
 extern void load_fp_state(struct thread_fp_state *fp);
 extern void store_fp_state(struct thread_fp_state *fp);
 extern void load_vr_state(struct thread_vr_state *vr);
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 2919433be355..24f82b09246c 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -88,6 +88,7 @@ obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
 obj-$(CONFIG_PPC_DAWR) += dawr.o
 obj-$(CONFIG_PPC_BOOK3S_64) += cpu_setup_ppc970.o cpu_setup_pa6t.o
 obj-$(CONFIG_PPC_BOOK3S_64) += cpu_setup_power.o
+obj-$(CONFIG_PPC_BOOK3S_64) += dexcr.o
 obj-$(CONFIG_PPC_BOOK3S_64) += mce.o mce_power.o
 obj-$(CONFIG_PPC_BOOK3E_64) += exceptions-64e.o idle_64e.o
 obj-$(CONFIG_PPC_BARRIER_NOSPEC) += security.o
diff --git a/arch/powerpc/kernel/dexcr.c b/arch/powerpc/kernel/dexcr.c
new file mode 100644
index ..db663ce7b3ce
--- /dev/null
+++ b/arch/powerpc/kernel/dexcr.c
@@ -0,0 +1,128 @@
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+#include
+
+/* Allow thread local configuration of these by default */
+#define DEXCR_PRCTL_EDITABLE ( \
+	DEXCR_PR_IBRTPD | \
+	DEXCR_PR_SRAPD | \
+	DEXCR_PR_NPHIE)
+
+static unsigned long dexcr_supported __ro_after_init = 0;
+
+static int __init dexcr_init(void)
+{
+	if (!early_cpu_has_feature(CPU_FTR_ARCH_31))
+		return 0;
+
+	if (early_cpu_has_feature(CPU_FTR_DEXCR_SBHE))
+		dexcr_supported |= DEXCR_PR_SBHE;
+
+	if (early_cpu_has_feature(CPU_FTR_DEXCR_IBRTPD))
+		dexcr_supported |= DEXCR_PR_IBRTPD;
+
+	if (early_cpu_has_feature(CPU_FTR_DEXCR_SRAPD))
+		dexcr_supported |= DEXCR_PR_SRAPD;
+
+	if (early_cpu_has_feature(CPU_FTR_DEXCR_NPHIE))
+		dexcr_supported |= DEXCR_PR_NPHIE;
+
+	return 0;
+}
+early_initcall(dexcr_init);
+
+static int prctl_to_aspect(unsigned long which, unsigned int *aspect)
+{
+	switch (which) {
+	case PR_PPC_DEXCR_SBHE:
+		*aspect = DEXCR_PR_SBHE;
+		break;
+	case PR_PPC_DEXCR_IBRTPD:
+		*aspect = DEXCR_PR_IBRTPD;
+		break;
+	case PR_PPC_DEXCR_SRAPD:
+		*aspect = DEXCR_PR_SRAPD;
+		break;
+	case PR_PPC_DEXCR_NPHIE:
+		*aspect = DEXCR_PR_NPHIE;
+		break;
+	default:
+		return -ENODEV;
+	}
+
+	return 0;
+}
+
+int get_dexcr_prctl(struct task_struct *task, unsigned long which)
+{
+	unsigned int aspect;
+	int ret;
+
+	ret = prctl_to_aspect(which, &aspect);
+	if (ret)
+		return ret;
+
+	if (!(aspect & dexcr_supported))
+		return -ENODEV;
+
+	if (aspect & task->thread.dexcr_enabled)
+		ret |= PR_PPC_DEXCR_CTRL_ON;
+	else
+		ret |= PR_PPC_DEXCR_CTRL_OFF;
+
+	if (aspect & task->thread.dexcr_inherit)
+		ret |= PR_PPC_DEXCR_CTRL_INHERIT;
+
+	return ret;
+}
+
+int set_dexcr_prctl(struct task_struct *task, unsigned long which, unsigned long ctrl)
+{
+	u
[RFC PATCH 3/6] prctl: Define PowerPC DEXCR interface
Adds the definitions and generic handler for prctl control of the PowerPC Dynamic Execution Control Register (DEXCR). Signed-off-by: Benjamin Gray --- include/uapi/linux/prctl.h | 13 + kernel/sys.c | 16 2 files changed, 29 insertions(+) diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h index 3c36aeade991..85d66ad134b1 100644 --- a/include/uapi/linux/prctl.h +++ b/include/uapi/linux/prctl.h @@ -305,4 +305,17 @@ struct prctl_mm_map { # define PR_RISCV_V_VSTATE_CTRL_NEXT_MASK 0xc # define PR_RISCV_V_VSTATE_CTRL_MASK 0x1f +/* PowerPC Dynamic Execution Control Register (DEXCR) controls */ +#define PR_PPC_GET_DEXCR 71 +#define PR_PPC_SET_DEXCR 72 +/* DEXCR aspect to act on */ +# define PR_PPC_DEXCR_SBHE 0 /* Speculative branch hint enable */ +# define PR_PPC_DEXCR_IBRTPD 1 /* Indirect branch recurrent target prediction disable */ +# define PR_PPC_DEXCR_SRAPD2 /* Subroutine return address prediction disable */ +# define PR_PPC_DEXCR_NPHIE3 /* Non-privileged hash instruction enable */ +/* Action to apply / return */ +# define PR_PPC_DEXCR_CTRL_OFF (1UL << 0) +# define PR_PPC_DEXCR_CTRL_ON (1UL << 1) +# define PR_PPC_DEXCR_CTRL_INHERIT (1UL << 2) + #endif /* _LINUX_PRCTL_H */ diff --git a/kernel/sys.c b/kernel/sys.c index 2410e3999ebe..0c1b8e9c3d16 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -146,6 +146,12 @@ #ifndef RISCV_V_GET_CONTROL # define RISCV_V_GET_CONTROL() (-EINVAL) #endif +#ifndef PPC_GET_DEXCR_ASPECT +# define PPC_GET_DEXCR_ASPECT(a, b)(-EINVAL) +#endif +#ifndef PPC_SET_DEXCR_ASPECT +# define PPC_SET_DEXCR_ASPECT(a, b, c) (-EINVAL) +#endif /* * this is where the system-wide overflow UID and GID are defined, for @@ -2686,6 +2692,16 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, case PR_GET_MDWE: error = prctl_get_mdwe(arg2, arg3, arg4, arg5); break; + case PR_PPC_GET_DEXCR: + if (arg3 || arg4 || arg5) + return -EINVAL; + error = PPC_GET_DEXCR_ASPECT(me, arg2); + break; + case PR_PPC_SET_DEXCR: + if (arg4 
|| arg5) + return -EINVAL; + error = PPC_SET_DEXCR_ASPECT(me, arg2, arg3); + break; case PR_SET_VMA: error = prctl_set_vma(arg2, arg3, arg4, arg5); break; -- 2.41.0
[RFC PATCH 0/6] Add dynamic DEXCR support
(This RFC is mainly to get feedback on the user interface. Tests and documentation will be added to the non-rfc followups. This builds but is otherwise untested.) In the "Add static DEXCR support" series[1] the kernel was made to initialise the DEXCR to a static value on all CPUs when they online. This series allows the DEXCR value to be changed at runtime with a per-thread granularity. It provides a prctl interface to set and query this configuration. It also provides a system wide sysctl override for the SBHE aspect, which specifically has effects that can bleed over to other CPUs (temporarily after changing it) and certain tracing tools may require it be set globally across all threads. Some notes on the patches/changes from the original RFC: 1. We don't need all the aspects to use feature bits, but the aspect information is in the device tree and this is the simplest mechanism to access it. Adding some kind of callback support to the feature detector would also work. The dexcr_supported variable introduced in patch 4 is a half-hearted example of how the callbacks could just update that variable, and no more CPU features would be necessary. 2. The thread used to track 'default' as a separate state (way back in the original RFC before the split into static/dynamic). This RFC simplifies it away, as it is only useful if there is a system wide default that can be changed. The current system wide default is decided at compile time, so we just initialise the thread config to that. If the 'default' state were added in future though, would that be a userspace ABI concern? I guess it could just return a 'default' flag as well as the current 'on' and 'off' flags to indicate what the default is. 3. The prctl controls are reduced to what I expect to be most useful. Default state is removed as above, and so is 'force' (where the aspect would no longer be configurable). 'inherit' remains as a way to control the DEXCR of child process trees that may not be aware of it. 4. 
The prctl set interface is privileged. The concern is a non-privileged process disabling NPHIE (HASHCHK enabler) and then invoking a setuid binary which doesn't set NPHIE itself. It seems that kind of information about the exec target is not available in arch specific code. 5. A lot of the synchronization of the sysctl interface is removed. Apparently the kernfs system that manages these files enforces exclusive access to a given sysctl file. Additionally, the proc_dointvec_minmax() function was made to store the result with WRITE_ONCE(), so we can assume a regular atomic store of an aligned word. 6. The ROP protection enforcement is refactored a bit. The idea is to allow baking into the kernel at compile time that NPHIE cannot be changed by a thread. Seems to allow making the system more secure on paper, not sure how useful it is in practice. 7. The prctl interface tries to stay separate from the DEXCR structure. This makes it a little contorted (having to convert the prctl value to an aspect), but I think makes the interface more robust against future changes to the DEXCR. E.g., if all 32 aspect bits were exhausted and a second DEXCR added, the current interface could still handle that. 
[1]: https://patchwork.ozlabs.org/project/linuxppc-dev/cover/20230616034846.311705-1-bg...@linux.ibm.com/ Benjamin Gray (6): powerpc/dexcr: Make all aspects CPU features powerpc/dexcr: Add thread specific DEXCR configuration prctl: Define PowerPC DEXCR interface powerpc/dexcr: Add prctl implementation powerpc/dexcr: Add sysctl entry for SBHE system override powerpc/dexcr: Add enforced userspace ROP protection config arch/powerpc/Kconfig | 5 + arch/powerpc/include/asm/cputable.h | 6 +- arch/powerpc/include/asm/processor.h | 22 +++ arch/powerpc/kernel/Makefile | 1 + arch/powerpc/kernel/dexcr.c | 218 +++ arch/powerpc/kernel/process.c| 24 +++ arch/powerpc/kernel/prom.c | 3 + include/uapi/linux/prctl.h | 13 ++ kernel/sys.c | 16 ++ 9 files changed, 307 insertions(+), 1 deletion(-) create mode 100644 arch/powerpc/kernel/dexcr.c -- 2.41.0
[PATCH v2] powerpc/dexcr: Move HASHCHK trap handler
Syzkaller reported a sleep in atomic context bug relating to the HASHCHK handler logic BUG: sleeping function called from invalid context at arch/powerpc/kernel/traps.c:1518 in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 25040, name: syz-executor preempt_count: 0, expected: 0 RCU nest depth: 0, expected: 0 no locks held by syz-executor/25040. irq event stamp: 34 hardirqs last enabled at (33): [] prep_irq_for_enabled_exit arch/powerpc/kernel/interrupt.c:56 [inline] hardirqs last enabled at (33): [] interrupt_exit_user_prepare_main+0x148/0x600 arch/powerpc/kernel/interrupt.c:230 hardirqs last disabled at (34): [] interrupt_enter_prepare+0x144/0x4f0 arch/powerpc/include/asm/interrupt.h:176 softirqs last enabled at (0): [] copy_process+0x16e4/0x4750 kernel/fork.c:2436 softirqs last disabled at (0): [<>] 0x0 CPU: 15 PID: 25040 Comm: syz-executor Not tainted 6.5.0-rc5-1-g3ccdff6bb06d #3 Hardware name: IBM,9105-22A POWER10 (raw) 0x800200 0xf06 of:IBM,FW1040.00 (NL1040_021) hv:phyp pSeries Call Trace: [c000a8247ce0] [c032b0e4] __might_resched+0x3b4/0x400 kernel/sched/core.c:10189 [c000a8247d80] [c08c7dc8] __might_fault+0xa8/0x170 mm/memory.c:5853 [c000a8247dc0] [c004160c] do_program_check+0x32c/0xb20 arch/powerpc/kernel/traps.c:1518 [c000a8247e50] [c0009b2c] program_check_common_virt+0x3bc/0x3c0 To determine if a trap was caused by a HASHCHK instruction, we inspect the user instruction that triggered the trap. However this may sleep if the page needs to be faulted in (get_user_instr() reaches __get_user(), which calls might_fault() and triggers the bug message). Move the HASHCHK handler logic to after we allow IRQs, which is fine because we are only interested in HASHCHK if it's a user space trap. 
Fixes: 5bcba4e6c13f ("powerpc/dexcr: Handle hashchk exception") Signed-off-by: Benjamin Gray --- v1: https://lore.kernel.org/all/20230825014910.488822-1-bg...@linux.ibm.com/ v1 -> v2: Changed commit description to mention Syzkaller and bug output --- arch/powerpc/kernel/traps.c | 56 - 1 file changed, 36 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c index eeff136b83d9..64ff37721fd0 100644 --- a/arch/powerpc/kernel/traps.c +++ b/arch/powerpc/kernel/traps.c @@ -1512,23 +1512,11 @@ static void do_program_check(struct pt_regs *regs) return; } - if (cpu_has_feature(CPU_FTR_DEXCR_NPHIE) && user_mode(regs)) { - ppc_inst_t insn; - - if (get_user_instr(insn, (void __user *)regs->nip)) { - _exception(SIGSEGV, regs, SEGV_MAPERR, regs->nip); - return; - } - - if (ppc_inst_primary_opcode(insn) == 31 && - get_xop(ppc_inst_val(insn)) == OP_31_XOP_HASHCHK) { - _exception(SIGILL, regs, ILL_ILLOPN, regs->nip); - return; - } + /* User mode considers other cases after enabling IRQs */ + if (!user_mode(regs)) { + _exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip); + return; } - - _exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip); - return; } #ifdef CONFIG_PPC_TRANSACTIONAL_MEM if (reason & REASON_TM) { @@ -1561,16 +1549,44 @@ static void do_program_check(struct pt_regs *regs) /* * If we took the program check in the kernel skip down to sending a -* SIGILL. The subsequent cases all relate to emulating instructions -* which we should only do for userspace. We also do not want to enable -* interrupts for kernel faults because that might lead to further -* faults, and loose the context of the original exception. +* SIGILL. The subsequent cases all relate to user space, such as +* emulating instructions which we should only do for user space. We +* also do not want to enable interrupts for kernel faults because that +* might lead to further faults, and loose the context of the original +* exception. 
*/ if (!user_mode(regs)) goto sigill; interrupt_cond_local_irq_enable(regs); + /* +* (reason & REASON_TRAP) is mostly handled before enabling IRQs, +* except get_user_instr() can sleep so we cannot reliably inspect the +* current instruction in that context. Now that we know we are +* handling a user space trap and can sleep, we can check if the trap +* was a hashchk failure. +*/ + if (reason & REASON_T
[PATCH] powerpc/configs: Set more PPC debug configs
Add more config options that wouldn't be done by the generic debug config in kernel/configs/debug.config CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG Adds an initialized check on each (cpu|mmu)_has_feature() CONFIG_PPC_IRQ_SOFT_MASK_DEBUG Adds some extra checks around IRQ mask manipulation CONFIG_PPC_KUAP_DEBUG Adds some extra KUAP checks around interrupts/context switching CONFIG_PPC_RFI_SRR_DEBUG Adds some extra SRR checks around interrupts/syscalls Signed-off-by: Benjamin Gray --- arch/powerpc/configs/debug.config | 4 1 file changed, 4 insertions(+) diff --git a/arch/powerpc/configs/debug.config b/arch/powerpc/configs/debug.config index a14ae1f20d60..bcc1fcf25e10 100644 --- a/arch/powerpc/configs/debug.config +++ b/arch/powerpc/configs/debug.config @@ -1 +1,5 @@ +CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG=y +CONFIG_PPC_IRQ_SOFT_MASK_DEBUG=y +CONFIG_PPC_KUAP_DEBUG=y +CONFIG_PPC_RFI_SRR_DEBUG=y CONFIG_SCOM_DEBUGFS=y -- 2.41.0
Re: [PATCH 0/3] Fix preemption errors in watchpoints
On 29/8/23 4:34 pm, Benjamin Gray wrote: When enabling debug config options relating to preemption, several bugs appear in the kernel log. With this series applied, the breakpoint code no longer prints bugs when running the powerpc/ptrace selftests. Benjamin Gray (3): powerpc/watchpoints: Disable preemption in thread_change_pc() powerpc/watchpoint: Disable pagefaults when getting user instruction powerpc/watchpoints: Annotate atomic context in more places arch/powerpc/kernel/hw_breakpoint.c | 16 +++- arch/powerpc/kernel/hw_breakpoint_constraints.c | 7 ++- 2 files changed, 21 insertions(+), 2 deletions(-) -- 2.41.0 The particular config is below, used by appending to a ppc64le_guest_defconfig. Not all options are relevant. Tested on a Power8 and Power10 machine (1 and 2 watchpoints). CONFIG_LOCALVERSION="-watchpoint-sleep" CONFIG_GENERIC_IRQ_DEBUGFS=y CONFIG_PREEMPT=y CONFIG_PRINTK_INDEX=y CONFIG_CGROUP_DEBUG=y CONFIG_PPC_KUAP_DEBUG=y CONFIG_SCOM_DEBUGFS=y CONFIG_UDBG_RTAS_CONSOLE=y CONFIG_RELOCATABLE_TEST=y CONFIG_PM_DEBUG=y CONFIG_PM_ADVANCED_DEBUG=y CONFIG_STATIC_KEYS_SELFTEST=y CONFIG_MODULE_DEBUG=y CONFIG_MODULE_STATS=y CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS_TRACE=y CONFIG_CMA_DEBUGFS=y CONFIG_CMA_SYSFS=y CONFIG_PERCPU_STATS=y CONFIG_USERFAULTFD=y CONFIG_VALIDATE_FS_PARSER=y CONFIG_EXT4_DEBUG=y CONFIG_HARDENED_USERCOPY=y CONFIG_FORTIFY_SOURCE=y CONFIG_DYNAMIC_DEBUG=y CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y CONFIG_DEBUG_INFO_SPLIT=y CONFIG_GDB_SCRIPTS=y CONFIG_READABLE_ASM=y CONFIG_NET_DEV_REFCNT_TRACKER=y CONFIG_NET_NS_REFCNT_TRACKER=y CONFIG_DEBUG_NET=y CONFIG_DEBUG_PAGEALLOC=y CONFIG_SLUB_DEBUG_ON=y CONFIG_PTDUMP_DEBUGFS=y CONFIG_DEBUG_KMEMLEAK=y CONFIG_PER_VMA_LOCK_STATS=y CONFIG_DEBUG_OBJECTS=y CONFIG_DEBUG_OBJECTS_SELFTEST=y CONFIG_DEBUG_OBJECTS_FREE=y CONFIG_DEBUG_OBJECTS_TIMERS=y CONFIG_DEBUG_OBJECTS_WORK=y CONFIG_DEBUG_OBJECTS_RCU_HEAD=y CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER=y CONFIG_SHRINKER_DEBUG=y CONFIG_SCHED_STACK_END_CHECK=y CONFIG_DEBUG_VM=y 
CONFIG_DEBUG_VM_MAPLE_TREE=y CONFIG_DEBUG_VM_RB=y CONFIG_DEBUG_VM_PGFLAGS=y CONFIG_DEBUG_VM_PGTABLE=y CONFIG_DEBUG_VIRTUAL=y CONFIG_DEBUG_PER_CPU_MAPS=y CONFIG_DEBUG_SHIRQ=y CONFIG_WQ_WATCHDOG=y CONFIG_DEBUG_TIMEKEEPING=y CONFIG_DEBUG_PREEMPT=y CONFIG_PROVE_LOCKING=y CONFIG_LOCK_STAT=y CONFIG_DEBUG_ATOMIC_SLEEP=y CONFIG_DEBUG_LOCKING_API_SELFTESTS=y CONFIG_CSD_LOCK_WAIT_DEBUG=y CONFIG_DEBUG_IRQFLAGS=y CONFIG_WARN_ALL_UNSEEDED_RANDOM=y CONFIG_DEBUG_NOTIFIERS=y CONFIG_DEBUG_CREDENTIALS=y CONFIG_RCU_CPU_STALL_CPUTIME=y CONFIG_RCU_EQS_DEBUG=y CONFIG_DEBUG_WQ_FORCE_RR_CPU=y CONFIG_CPU_HOTPLUG_STATE_CONTROL=y CONFIG_LATENCYTOP=y CONFIG_PPC_IRQ_SOFT_MASK_DEBUG=y CONFIG_PPC_RFI_SRR_DEBUG=y
[PATCH 3/3] powerpc/watchpoints: Annotate atomic context in more places
It can be easy to miss that the notifier mechanism invokes the callbacks in an atomic context, so add some comments to that effect on the two handlers we register here. Signed-off-by: Benjamin Gray --- arch/powerpc/kernel/hw_breakpoint.c | 9 + 1 file changed, 9 insertions(+) diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index 2854376870cf..a1318ce18d0e 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -368,6 +368,11 @@ static void handle_p10dd1_spurious_exception(struct perf_event **bp, } } +/* + * Handle a DABR or DAWR exception. + * + * Called in atomic context. + */ int hw_breakpoint_handler(struct die_args *args) { bool err = false; @@ -495,6 +500,8 @@ NOKPROBE_SYMBOL(hw_breakpoint_handler); /* * Handle single-step exceptions following a DABR hit. + * + * Called in atomic context. */ static int single_step_dabr_instruction(struct die_args *args) { @@ -546,6 +553,8 @@ NOKPROBE_SYMBOL(single_step_dabr_instruction); /* * Handle debug exception notifications. + * + * Called in atomic context. */ int hw_breakpoint_exceptions_notify( struct notifier_block *unused, unsigned long val, void *data) -- 2.41.0
[PATCH 0/3] Fix preemption errors in watchpoints
When enabling debug config options relating to preemption, several bugs appear in the kernel log. With this series applied, the breakpoint code no longer prints bugs when running the powerpc/ptrace selftests. Benjamin Gray (3): powerpc/watchpoints: Disable preemption in thread_change_pc() powerpc/watchpoint: Disable pagefaults when getting user instruction powerpc/watchpoints: Annotate atomic context in more places arch/powerpc/kernel/hw_breakpoint.c | 16 +++- arch/powerpc/kernel/hw_breakpoint_constraints.c | 7 ++- 2 files changed, 21 insertions(+), 2 deletions(-) -- 2.41.0
[PATCH 2/3] powerpc/watchpoint: Disable pagefaults when getting user instruction
This is called in an atomic context, so is not allowed to sleep if a user page needs to be faulted in and has nowhere it can be deferred to. The pagefault_disable() function is documented as preventing user access methods from sleeping. In practice the page will be mapped in nearly always because we are reading the instruction that just triggered the watchpoint trap. Signed-off-by: Benjamin Gray --- arch/powerpc/kernel/hw_breakpoint_constraints.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/hw_breakpoint_constraints.c b/arch/powerpc/kernel/hw_breakpoint_constraints.c index a74623025f3a..9e51801c4915 100644 --- a/arch/powerpc/kernel/hw_breakpoint_constraints.c +++ b/arch/powerpc/kernel/hw_breakpoint_constraints.c @@ -131,8 +131,13 @@ void wp_get_instr_detail(struct pt_regs *regs, ppc_inst_t *instr, int *type, int *size, unsigned long *ea) { struct instruction_op op; + int err; - if (__get_user_instr(*instr, (void __user *)regs->nip)) + pagefault_disable(); + err = __get_user_instr(*instr, (void __user *)regs->nip); + pagefault_enable(); + + if (err) return; analyse_instr(&op, regs, *instr); -- 2.41.0
[PATCH 1/3] powerpc/watchpoints: Disable preemption in thread_change_pc()
thread_change_pc() uses CPU local data, so must be protected from swapping CPUs while it is reading the breakpoint struct. The error is more noticeable after 1e60f3564bad ("powerpc/watchpoints: Track perf single step directly on the breakpoint"), which added an unconditional __this_cpu_read() call in thread_change_pc(). However the existing __this_cpu_read() that runs if a breakpoint does need to be re-inserted has the same issue. Signed-off-by: Benjamin Gray --- There's probably a more idiomatic way to express this. We technically don't need to disable preemption for the entire function: we should only need to disable preemption within each loop iteration while handling the pointer we are working with. Each iteration itself is independent. --- arch/powerpc/kernel/hw_breakpoint.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index b8513dc3e53a..2854376870cf 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -230,13 +230,15 @@ void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs) struct arch_hw_breakpoint *info; int i; + preempt_disable(); + for (i = 0; i < nr_wp_slots(); i++) { struct perf_event *bp = __this_cpu_read(bp_per_reg[i]); if (unlikely(bp && counter_arch_bp(bp)->perf_single_step)) goto reset; } - return; + goto out; reset: regs_set_return_msr(regs, regs->msr & ~MSR_SE); @@ -245,6 +247,9 @@ void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs) __set_breakpoint(i, info); info->perf_single_step = false; } + +out: + preempt_enable(); } static bool is_larx_stcx_instr(int type) -- 2.41.0
[PATCH] powerpc/dexcr: Move HASHCHK trap handler
To determine if a trap was caused by a HASHCHK instruction, we inspect the user instruction that triggered the trap. However this may sleep if the page needs to be faulted in. Move the HASHCHK handler logic to after we allow IRQs, which is fine because we are only interested in HASHCHK if it's a user space trap. Fixes: 5bcba4e6c13f ("powerpc/dexcr: Handle hashchk exception") Signed-off-by: Benjamin Gray --- arch/powerpc/kernel/traps.c | 56 - 1 file changed, 36 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c index f5ce282dc4b8..32b5e841ea97 100644 --- a/arch/powerpc/kernel/traps.c +++ b/arch/powerpc/kernel/traps.c @@ -1512,23 +1512,11 @@ static void do_program_check(struct pt_regs *regs) return; } - if (cpu_has_feature(CPU_FTR_DEXCR_NPHIE) && user_mode(regs)) { - ppc_inst_t insn; - - if (get_user_instr(insn, (void __user *)regs->nip)) { - _exception(SIGSEGV, regs, SEGV_MAPERR, regs->nip); - return; - } - - if (ppc_inst_primary_opcode(insn) == 31 && - get_xop(ppc_inst_val(insn)) == OP_31_XOP_HASHCHK) { - _exception(SIGILL, regs, ILL_ILLOPN, regs->nip); - return; - } + /* User mode considers other cases after enabling IRQs */ + if (!user_mode(regs)) { + _exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip); + return; } - - _exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip); - return; } #ifdef CONFIG_PPC_TRANSACTIONAL_MEM if (reason & REASON_TM) { @@ -1561,16 +1549,44 @@ static void do_program_check(struct pt_regs *regs) /* * If we took the program check in the kernel skip down to sending a -* SIGILL. The subsequent cases all relate to emulating instructions -* which we should only do for userspace. We also do not want to enable -* interrupts for kernel faults because that might lead to further -* faults, and loose the context of the original exception. +* SIGILL. The subsequent cases all relate to user space, such as +* emulating instructions which we should only do for user space. 
We +* also do not want to enable interrupts for kernel faults because that +* might lead to further faults, and loose the context of the original +* exception. */ if (!user_mode(regs)) goto sigill; interrupt_cond_local_irq_enable(regs); + /* +* (reason & REASON_TRAP) is mostly handled before enabling IRQs, +* except get_user_instr() can sleep so we cannot reliably inspect the +* current instruction in that context. Now that we know we are +* handling a user space trap and can sleep, we can check if the trap +* was a hashchk failure. +*/ + if (reason & REASON_TRAP) { + if (cpu_has_feature(CPU_FTR_DEXCR_NPHIE)) { + ppc_inst_t insn; + + if (get_user_instr(insn, (void __user *)regs->nip)) { + _exception(SIGSEGV, regs, SEGV_MAPERR, regs->nip); + return; + } + + if (ppc_inst_primary_opcode(insn) == 31 && + get_xop(ppc_inst_val(insn)) == OP_31_XOP_HASHCHK) { + _exception(SIGILL, regs, ILL_ILLOPN, regs->nip); + return; + } + } + + _exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip); + return; + } + /* (reason & REASON_ILLEGAL) would be the obvious thing here, * but there seems to be a hardware bug on the 405GP (RevD) * that means ESR is sometimes set incorrectly - either to -- 2.41.0
Re: [PATCH 2/2] powerpc/xmon: use KSYM_NAME_LEN in array size
On 30/5/23 10:54 pm, Miguel Ojeda wrote: Side-note: in `get_function_bounds()`, I see `kallsyms_lookup()` being used, but the name seems discarded? Can `kallsyms_lookup_size_offset()` be used instead, thus avoiding the usage of the buffer there to begin with? I'm not familiar with the kallsyms infrastructure, but looking over the implementations of kallsyms_lookup() and kallsyms_lookup_size_offset() it looks like the existing kallsyms_lookup() handles an extra case over kallsyms_lookup_size_offset()? kallsyms_lookup_buildid() (the implementation of kallsyms_lookup()) has /* See if it's in a module or a BPF JITed image. */ ret = module_address_lookup(addr, symbolsize, offset, modname, modbuildid, namebuf); if (!ret) ret = bpf_address_lookup(addr, symbolsize, offset, modname, namebuf); if (!ret) ret = ftrace_mod_address_lookup(addr, symbolsize, offset, modname, namebuf); while kallsyms_lookup_size_offset() is missing the ftrace case return !!module_address_lookup(addr, symbolsize, offset, NULL, NULL, namebuf) || !!__bpf_address_lookup(addr, symbolsize, offset, namebuf); Might this be a concern for xmon?
[PATCH 7/7] perf/hw_breakpoint: Remove arch breakpoint hooks
PowerPC was the only user of these hooks, and has been refactored to no longer require them. There is no need to keep them around, so remove them to reduce complexity. Signed-off-by: Benjamin Gray --- include/linux/hw_breakpoint.h | 3 --- kernel/events/hw_breakpoint.c | 28 2 files changed, 31 deletions(-) diff --git a/include/linux/hw_breakpoint.h b/include/linux/hw_breakpoint.h index 7fbb45911273..db199d653dd1 100644 --- a/include/linux/hw_breakpoint.h +++ b/include/linux/hw_breakpoint.h @@ -90,9 +90,6 @@ extern int dbg_reserve_bp_slot(struct perf_event *bp); extern int dbg_release_bp_slot(struct perf_event *bp); extern int reserve_bp_slot(struct perf_event *bp); extern void release_bp_slot(struct perf_event *bp); -int arch_reserve_bp_slot(struct perf_event *bp); -void arch_release_bp_slot(struct perf_event *bp); -void arch_unregister_hw_breakpoint(struct perf_event *bp); extern void flush_ptrace_hw_breakpoint(struct task_struct *tsk); diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c index c3797701339c..6c2cb4e4f48d 100644 --- a/kernel/events/hw_breakpoint.c +++ b/kernel/events/hw_breakpoint.c @@ -523,26 +523,6 @@ toggle_bp_slot(struct perf_event *bp, bool enable, enum bp_type_idx type, int we return 0; } -__weak int arch_reserve_bp_slot(struct perf_event *bp) -{ - return 0; -} - -__weak void arch_release_bp_slot(struct perf_event *bp) -{ -} - -/* - * Function to perform processor-specific cleanup during unregistration - */ -__weak void arch_unregister_hw_breakpoint(struct perf_event *bp) -{ - /* -* A weak stub function here for those archs that don't define -* it inside arch/.../kernel/hw_breakpoint.c -*/ -} - /* * Constraints to check before allowing this new breakpoint counter. 
* @@ -594,7 +574,6 @@ static int __reserve_bp_slot(struct perf_event *bp, u64 bp_type) enum bp_type_idx type; int max_pinned_slots; int weight; - int ret; /* We couldn't initialize breakpoint constraints on boot */ if (!constraints_initialized) @@ -613,10 +592,6 @@ static int __reserve_bp_slot(struct perf_event *bp, u64 bp_type) if (max_pinned_slots > hw_breakpoint_slots_cached(type)) return -ENOSPC; - ret = arch_reserve_bp_slot(bp); - if (ret) - return ret; - return toggle_bp_slot(bp, true, type, weight); } @@ -634,8 +609,6 @@ static void __release_bp_slot(struct perf_event *bp, u64 bp_type) enum bp_type_idx type; int weight; - arch_release_bp_slot(bp); - type = find_slot_idx(bp_type); weight = hw_breakpoint_weight(bp); WARN_ON(toggle_bp_slot(bp, false, type, weight)); @@ -645,7 +618,6 @@ void release_bp_slot(struct perf_event *bp) { struct mutex *mtx = bp_constraints_lock(bp); - arch_unregister_hw_breakpoint(bp); __release_bp_slot(bp, bp->attr.bp_type); bp_constraints_unlock(mtx); } -- 2.41.0
[PATCH 6/7] selftests/powerpc/ptrace: Update ptrace-perf watchpoint selftest
Now that ptrace and perf are no longer exclusive, update the test to exercise interesting interactions. An assembly file is used for the children to allow precise instruction choice and addresses, while avoiding any compiler quirks. Signed-off-by: Benjamin Gray --- .../testing/selftests/powerpc/ptrace/Makefile |1 + .../powerpc/ptrace/ptrace-perf-asm.S | 33 + .../powerpc/ptrace/ptrace-perf-hwbreak.c | 1104 +++-- 3 files changed, 479 insertions(+), 659 deletions(-) create mode 100644 tools/testing/selftests/powerpc/ptrace/ptrace-perf-asm.S rewrite tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c (93%) diff --git a/tools/testing/selftests/powerpc/ptrace/Makefile b/tools/testing/selftests/powerpc/ptrace/Makefile index cbeeaeae8837..1b39b86849da 100644 --- a/tools/testing/selftests/powerpc/ptrace/Makefile +++ b/tools/testing/selftests/powerpc/ptrace/Makefile @@ -36,6 +36,7 @@ $(TM_TESTS): CFLAGS += -I../tm -mhtm CFLAGS += $(KHDR_INCLUDES) -fno-pie $(OUTPUT)/ptrace-gpr: ptrace-gpr.S +$(OUTPUT)/ptrace-perf-hwbreak: ptrace-perf-asm.S $(OUTPUT)/ptrace-pkey $(OUTPUT)/core-pkey: LDLIBS += -pthread $(TEST_GEN_PROGS): ../harness.c ../utils.c ../lib/reg.S diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-perf-asm.S b/tools/testing/selftests/powerpc/ptrace/ptrace-perf-asm.S new file mode 100644 index ..9aa2e58f3189 --- /dev/null +++ b/tools/testing/selftests/powerpc/ptrace/ptrace-perf-asm.S @@ -0,0 +1,33 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ + +#include + +.global same_watch_addr_load +.global same_watch_addr_trap + +FUNC_START(same_watch_addr_child) + nop +same_watch_addr_load: + ld 0,0(3) + nop +same_watch_addr_trap: + trap + blr +FUNC_END(same_watch_addr_child) + + +.global perf_then_ptrace_load1 +.global perf_then_ptrace_load2 +.global perf_then_ptrace_trap + +FUNC_START(perf_then_ptrace_child) + nop +perf_then_ptrace_load1: + ld 0,0(3) +perf_then_ptrace_load2: + ld 0,0(4) + nop +perf_then_ptrace_trap: + trap + blr 
+FUNC_END(perf_then_ptrace_child) diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c dissimilarity index 93% index d8a9e95fc03d..a0a0b9bb5854 100644 --- a/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c +++ b/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c @@ -1,659 +1,445 @@ -// SPDX-License-Identifier: GPL-2.0+ -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include "ptrace.h" - -char data[16]; - -/* Overlapping address range */ -volatile __u64 *ptrace_data1 = (__u64 *)[0]; -volatile __u64 *perf_data1 = (__u64 *)[4]; - -/* Non-overlapping address range */ -volatile __u64 *ptrace_data2 = (__u64 *)[0]; -volatile __u64 *perf_data2 = (__u64 *)[8]; - -static unsigned long pid_max_addr(void) -{ - FILE *fp; - char *line, *c; - char addr[100]; - size_t len = 0; - - fp = fopen("/proc/kallsyms", "r"); - if (!fp) { - printf("Failed to read /proc/kallsyms. Exiting..\n"); - exit(EXIT_FAILURE); - } - - while (getline(, , fp) != -1) { - if (!strstr(line, "pid_max") || strstr(line, "pid_max_max") || - strstr(line, "pid_max_min")) - continue; - - strncpy(addr, line, len < 100 ? len : 100); - c = strchr(addr, ' '); - *c = '\0'; - return strtoul(addr, , 16); - } - fclose(fp); - printf("Could not find pid_max. 
Exiting..\n"); - exit(EXIT_FAILURE); - return -1; -} - -static void perf_user_event_attr_set(struct perf_event_attr *attr, __u64 addr, __u64 len) -{ - memset(attr, 0, sizeof(struct perf_event_attr)); - attr->type = PERF_TYPE_BREAKPOINT; - attr->size = sizeof(struct perf_event_attr); - attr->bp_type= HW_BREAKPOINT_R; - attr->bp_addr= addr; - attr->bp_len = len; - attr->exclude_kernel = 1; - attr->exclude_hv = 1; -} - -static void perf_kernel_event_attr_set(struct perf_event_attr *attr) -{ - memset(attr, 0, sizeof(struct perf_event_attr)); - attr->type = PERF_TYPE_BREAKPOINT; - attr->size = sizeof(struct perf_event_attr); - attr->bp_type= HW_BREAKPOINT_R; - attr->bp_addr= pid_max_addr(); - attr->bp_len = sizeof(unsigned long); - attr->exclude_user = 1; - attr->exclude_hv = 1; -} - -static int perf_cpu_event_open(int cpu, __u64 addr, __u64 len) -{ - struct perf_event_attr attr; - - perf_user_event_attr_set(, addr, len); - retur
[PATCH 5/7] powerpc/watchpoints: Remove ptrace/perf exclusion tracking
ptrace and perf watchpoints were considered incompatible in commit 29da4f91c0c1 ("powerpc/watchpoint: Don't allow concurrent perf and ptrace events"), but the logic in that commit doesn't really apply. Ptrace doesn't automatically single step; the ptracer must request this explicitly. And the ptracer can do so regardless of whether a ptrace/perf watchpoint triggered or not: it could single step every instruction if it wanted to. Whatever stopped the ptracee before executing the instruction that would trigger the perf watchpoint is no longer relevant by this point. To get correct behaviour when perf and ptrace are watching the same data we must ignore the perf watchpoint. After all, ptrace has before-execute semantics, and perf is after-execute, so perf doesn't actually care about the watchpoint trigger at this point in time. Pausing before execution does not mean we will actually end up executing the instruction. Importantly though, we don't remove the perf watchpoint yet. This is key. The ptracer is free to do whatever it likes right now. E.g., it can continue the process, single step, or even set the child PC somewhere completely different. If it does try to execute the instruction though, without reinserting the watchpoint (in which case we go back to the start of this example), the perf watchpoint would immediately trigger. This time there is no ptrace watchpoint, so we can safely perform a single step and increment the perf counter. Upon receiving the single step exception, the existing code already handles propagating or consuming it based on whether another subsystem (e.g. ptrace) requested a single step. Again, this is needed with or without perf/ptrace exclusion, because ptrace could be single stepping this instruction regardless of whether a watchpoint is involved. Signed-off-by: Benjamin Gray --- Whether it's a _good_ idea to mix ptrace and perf is another thing entirely, mind... But they are not inherently incompatible. 
--- arch/powerpc/kernel/hw_breakpoint.c | 249 +--- 1 file changed, 1 insertion(+), 248 deletions(-) diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index bf8dda1a7e04..b8513dc3e53a 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -114,253 +114,6 @@ static bool is_ptrace_bp(struct perf_event *bp) return bp->overflow_handler == ptrace_triggered; } -struct breakpoint { - struct list_head list; - struct perf_event *bp; - bool ptrace_bp; -}; - -/* - * While kernel/events/hw_breakpoint.c does its own synchronization, we cannot - * rely on it safely synchronizing internals here; however, we can rely on it - * not requesting more breakpoints than available. - */ -static DEFINE_SPINLOCK(cpu_bps_lock); -static DEFINE_PER_CPU(struct breakpoint *, cpu_bps[HBP_NUM_MAX]); -static DEFINE_SPINLOCK(task_bps_lock); -static LIST_HEAD(task_bps); - -static struct breakpoint *alloc_breakpoint(struct perf_event *bp) -{ - struct breakpoint *tmp; - - tmp = kzalloc(sizeof(*tmp), GFP_KERNEL); - if (!tmp) - return ERR_PTR(-ENOMEM); - tmp->bp = bp; - tmp->ptrace_bp = is_ptrace_bp(bp); - return tmp; -} - -static bool bp_addr_range_overlap(struct perf_event *bp1, struct perf_event *bp2) -{ - __u64 bp1_saddr, bp1_eaddr, bp2_saddr, bp2_eaddr; - - bp1_saddr = ALIGN_DOWN(bp1->attr.bp_addr, HW_BREAKPOINT_SIZE); - bp1_eaddr = ALIGN(bp1->attr.bp_addr + bp1->attr.bp_len, HW_BREAKPOINT_SIZE); - bp2_saddr = ALIGN_DOWN(bp2->attr.bp_addr, HW_BREAKPOINT_SIZE); - bp2_eaddr = ALIGN(bp2->attr.bp_addr + bp2->attr.bp_len, HW_BREAKPOINT_SIZE); - - return (bp1_saddr < bp2_eaddr && bp1_eaddr > bp2_saddr); -} - -static bool alternate_infra_bp(struct breakpoint *b, struct perf_event *bp) -{ - return is_ptrace_bp(bp) ? 
!b->ptrace_bp : b->ptrace_bp; -} - -static bool can_co_exist(struct breakpoint *b, struct perf_event *bp) -{ - return !(alternate_infra_bp(b, bp) && bp_addr_range_overlap(b->bp, bp)); -} - -static int task_bps_add(struct perf_event *bp) -{ - struct breakpoint *tmp; - - tmp = alloc_breakpoint(bp); - if (IS_ERR(tmp)) - return PTR_ERR(tmp); - - spin_lock(&task_bps_lock); - list_add(&tmp->list, &task_bps); - spin_unlock(&task_bps_lock); - return 0; -} - -static void task_bps_remove(struct perf_event *bp) -{ - struct list_head *pos, *q; - - spin_lock(&task_bps_lock); - list_for_each_safe(pos, q, &task_bps) { - struct breakpoint *tmp = list_entry(pos, struct breakpoint, list); - - if (tmp->bp == bp) { - list_del(&tmp->list); - kfree(tmp); - break; - } - } - spin_unlock(&task_bps_lock); -} - -/* - * If any task has breakpoint from alterna
[PATCH 4/7] powerpc/watchpoints: Simplify watchpoint reinsertion
We only remove watchpoints when they have the perf_single_step flag set, so we can reinsert them during the first iteration. Signed-off-by: Benjamin Gray --- arch/powerpc/kernel/hw_breakpoint.c | 12 +--- 1 file changed, 1 insertion(+), 11 deletions(-) diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index 624375c18882..bf8dda1a7e04 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -772,16 +772,6 @@ static int single_step_dabr_instruction(struct die_args *args) perf_bp_event(bp, regs); info->perf_single_step = false; - } - - if (!found) - return NOTIFY_DONE; - - for (int i = 0; i < nr_wp_slots(); i++) { - struct perf_event *bp = __this_cpu_read(bp_per_reg[i]); - if (!bp) - continue; - __set_breakpoint(i, counter_arch_bp(bp)); } @@ -789,7 +779,7 @@ static int single_step_dabr_instruction(struct die_args *args) * If the process was being single-stepped by ptrace, let the * other single-step actions occur (e.g. generate SIGTRAP). */ - if (test_thread_flag(TIF_SINGLESTEP)) + if (!found || test_thread_flag(TIF_SINGLESTEP)) return NOTIFY_DONE; return NOTIFY_STOP; -- 2.41.0
[PATCH 1/7] powerpc/watchpoints: Explain thread_change_pc() more
The behaviour of the thread_change_pc() function is a bit cryptic without being more familiar with how the watchpoint logic handles perf's after-execute semantics. Expand the comment to explain why we can re-insert the breakpoint and unset the perf_single_step flag. Signed-off-by: Benjamin Gray --- arch/powerpc/kernel/hw_breakpoint.c | 4 1 file changed, 4 insertions(+) diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index e1b4e70c8fd0..bad2991f906b 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -499,6 +499,10 @@ int hw_breakpoint_arch_parse(struct perf_event *bp, * Restores the breakpoint on the debug registers. * Invoke this function if it is known that the execution context is * about to change to cause loss of MSR_SE settings. + * + * The perf watchpoint will simply re-trigger once the thread is started again, + * and the watchpoint handler will set up MSR_SE and perf_single_step as + * needed. */ void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs) { -- 2.41.0
[PATCH 3/7] powerpc/watchpoints: Track perf single step directly on the breakpoint
There is a bug in the current watchpoint tracking logic, where the teardown in arch_unregister_hw_breakpoint() uses bp->ctx->task, to which it does not hold a reference and which parallel threads may be in the process of destroying. This was partially addressed in commit fb822e6076d9 ("powerpc/hw_breakpoint: Fix oops when destroying hw_breakpoint event"), but the underlying issue of accessing a struct member in an unknown state still remained. Syzkaller managed to trigger a null pointer dereference via the race between the task destructor and the loop that checks and dereferences the pointer.

While this null pointer dereference could be fixed by using READ_ONCE to access the task up front, that would just change the error to manipulating possibly freed memory. Instead, the breakpoint logic needs to be reworked to remove any dependency on a context or task struct during breakpoint removal.

The reason we have this dependency currently is to clear thread.last_hit_ubp. This member is used to differentiate the perf DAWR single-step sequence from other causes of single-step, such as userspace just calling ptrace(PTRACE_SINGLESTEP, ...). We need to differentiate them because, when the single step interrupt is received, we need to know whether to re-insert the DAWR breakpoint (perf) or not (ptrace / other). arch_unregister_hw_breakpoint() needs to clear this information to prevent dangling pointers to possibly freed memory. These pointers are dereferenced in single_step_dabr_instruction() without a way to check their validity.

This patch moves the tracking of this information to the breakpoint itself. This means we no longer have to do anything special to clean up.
Signed-off-by: Benjamin Gray --- arch/powerpc/include/asm/hw_breakpoint.h | 1 + arch/powerpc/include/asm/processor.h | 5 -- arch/powerpc/kernel/hw_breakpoint.c | 69 3 files changed, 23 insertions(+), 52 deletions(-) diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index 84d39fd42f71..66db0147d5b4 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -18,6 +18,7 @@ struct arch_hw_breakpoint { u16 len; /* length of the target data symbol */ u16 hw_len; /* length programmed in hw */ u8 flags; + bool perf_single_step; /* temporarily uninstalled for a perf single step */ }; /* Note: Don't change the first 6 bits below as they are in the same order diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index 8a6754ffdc7e..9e67cb1c72e9 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -172,11 +172,6 @@ struct thread_struct { unsigned int align_ctl; /* alignment handling control */ #ifdef CONFIG_HAVE_HW_BREAKPOINT struct perf_event *ptrace_bps[HBP_NUM_MAX]; - /* -* Helps identify source of single-step exception and subsequent -* hw-breakpoint enablement -*/ - struct perf_event *last_hit_ubp[HBP_NUM_MAX]; #endif /* CONFIG_HAVE_HW_BREAKPOINT */ struct arch_hw_breakpoint hw_brk[HBP_NUM_MAX]; /* hardware breakpoint info */ unsigned long trap_nr; /* last trap # on this thread */ diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index e6749642604c..624375c18882 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -43,16 +43,6 @@ int hw_breakpoint_slots(int type) return 0; /* no instruction breakpoints available */ } -static bool single_step_pending(void) -{ - int i; - - for (i = 0; i < nr_wp_slots(); i++) { - if (current->thread.last_hit_ubp[i]) - return true; - } - return false; -} /* * Install a perf counter breakpoint.
@@ -84,7 +74,7 @@ int arch_install_hw_breakpoint(struct perf_event *bp) * Do not install DABR values if the instruction must be single-stepped. * If so, DABR will be populated in single_step_dabr_instruction(). */ - if (!single_step_pending()) + if (!info->perf_single_step) __set_breakpoint(i, info); return 0; @@ -371,28 +361,6 @@ void arch_release_bp_slot(struct perf_event *bp) } } -/* - * Perform cleanup of arch-specific counters during unregistration - * of the perf-event - */ -void arch_unregister_hw_breakpoint(struct perf_event *bp) -{ - /* -* If the breakpoint is unregistered between a hw_breakpoint_handler() -* and the single_step_dabr_instruction(), then cleanup the breakpoint -* restoration variables to prevent dangling pointers. -* FIXME, this should not be using bp->ctx at all! Sayeth peterz. -*/ - if (bp->ctx && bp->
[PATCH 2/7] powerpc/watchpoints: Don't track info persistently
info is cheap to retrieve, and is likely optimised by the compiler anyway. On the other hand, propagating it across the functions makes it possible to be inconsistent and adds needless complexity. Remove it, and invoke counter_arch_bp() when we need to work with it. As we don't persist it, we just use the local bp array to track whether we are ignoring a breakpoint. Signed-off-by: Benjamin Gray --- arch/powerpc/kernel/hw_breakpoint.c | 60 +++-- 1 file changed, 32 insertions(+), 28 deletions(-) diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index bad2991f906b..e6749642604c 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -538,23 +538,22 @@ static bool is_octword_vsx_instr(int type, int size) * We've failed in reliably handling the hw-breakpoint. Unregister * it and throw a warning message to let the user know about it. */ -static void handler_error(struct perf_event *bp, struct arch_hw_breakpoint *info) +static void handler_error(struct perf_event *bp) { WARN(1, "Unable to handle hardware breakpoint. Breakpoint at 0x%lx will be disabled.", -info->address); +counter_arch_bp(bp)->address); perf_event_disable_inatomic(bp); } -static void larx_stcx_err(struct perf_event *bp, struct arch_hw_breakpoint *info) +static void larx_stcx_err(struct perf_event *bp) { printk_ratelimited("Breakpoint hit on instruction that can't be emulated. 
Breakpoint at 0x%lx will be disabled.\n", - info->address); + counter_arch_bp(bp)->address); perf_event_disable_inatomic(bp); } static bool stepping_handler(struct pt_regs *regs, struct perf_event **bp, -struct arch_hw_breakpoint **info, int *hit, -ppc_inst_t instr) +int *hit, ppc_inst_t instr) { int i; int stepped; @@ -565,7 +564,7 @@ static bool stepping_handler(struct pt_regs *regs, struct perf_event **bp, if (!hit[i]) continue; current->thread.last_hit_ubp[i] = bp[i]; - info[i] = NULL; + bp[i] = NULL; } regs_set_return_msr(regs, regs->msr | MSR_SE); return false; @@ -576,15 +575,15 @@ static bool stepping_handler(struct pt_regs *regs, struct perf_event **bp, for (i = 0; i < nr_wp_slots(); i++) { if (!hit[i]) continue; - handler_error(bp[i], info[i]); - info[i] = NULL; + handler_error(bp[i]); + bp[i] = NULL; } return false; } return true; } -static void handle_p10dd1_spurious_exception(struct arch_hw_breakpoint **info, +static void handle_p10dd1_spurious_exception(struct perf_event **bp, int *hit, unsigned long ea) { int i; @@ -596,10 +595,14 @@ static void handle_p10dd1_spurious_exception(struct arch_hw_breakpoint **info, * spurious exception. 
*/ for (i = 0; i < nr_wp_slots(); i++) { - if (!info[i]) + struct arch_hw_breakpoint *info; + + if (!bp[i]) continue; - hw_end_addr = ALIGN(info[i]->address + info[i]->len, HW_BREAKPOINT_SIZE); + info = counter_arch_bp(bp[i]); + + hw_end_addr = ALIGN(info->address + info->len, HW_BREAKPOINT_SIZE); /* * Ending address of DAWR range is less than starting @@ -629,9 +632,9 @@ static void handle_p10dd1_spurious_exception(struct arch_hw_breakpoint **info, return; for (i = 0; i < nr_wp_slots(); i++) { - if (info[i]) { + if (bp[i]) { hit[i] = 1; - info[i]->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ; + counter_arch_bp(bp[i])->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ; } } } @@ -642,7 +645,6 @@ int hw_breakpoint_handler(struct die_args *args) int rc = NOTIFY_STOP; struct perf_event *bp[HBP_NUM_MAX] = { NULL }; struct pt_regs *regs = args->regs; - struct arch_hw_breakpoint *info[HBP_NUM_MAX] = { NULL }; int i; int hit[HBP_NUM_MAX] = {0}; int nr_hit = 0; @@ -667,18 +669,20 @@ int hw_breakpoint_handler(struct die_args *args) wp_get_instr_detail(regs, &instr, &type, &size, &ea); for (i = 0; i < nr_wp_slots(); i++) { + struct arch_hw_breakpoint *info; + bp[i] = __this_cpu_read(bp_per_reg[i]); if (!bp[i])
[PATCH 0/7] Rework perf and ptrace watchpoint tracking
Syzkaller triggered a null pointer dereference in the arch_unregister_hw_breakpoint() hook. This is due to the bp->ctx->task field changing to -1 while we iterate the breakpoints. This series refactors the breakpoint tracking logic to remove the dependency on bp->ctx entirely. It also simplifies handling of ptrace and perf breakpoints, making insertion less restrictive. If merged, it allows several arch hooks that PowerPC was the sole user of to be removed. Benjamin Gray (7): powerpc/watchpoints: Explain thread_change_pc() more powerpc/watchpoints: Don't track info persistently powerpc/watchpoints: Track perf single step directly on the breakpoint powerpc/watchpoints: Simplify watchpoint reinsertion powerpc/watchpoints: Remove ptrace/perf exclusion tracking selftests/powerpc/ptrace: Update ptrace-perf watchpoint selftest perf/hw_breakpoint: Remove arch breakpoint hooks arch/powerpc/include/asm/hw_breakpoint.h |1 + arch/powerpc/include/asm/processor.h |5 - arch/powerpc/kernel/hw_breakpoint.c | 388 +- include/linux/hw_breakpoint.h |3 - kernel/events/hw_breakpoint.c | 28 - .../testing/selftests/powerpc/ptrace/Makefile |1 + .../powerpc/ptrace/ptrace-perf-asm.S | 33 + .../powerpc/ptrace/ptrace-perf-hwbreak.c | 1104 +++-- 8 files changed, 537 insertions(+), 1026 deletions(-) create mode 100644 tools/testing/selftests/powerpc/ptrace/ptrace-perf-asm.S rewrite tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c (93%) -- 2.41.0
[PATCH v1 3/4] selftests/powerpc/ptrace: Fix typo in pid_max search error
pid_max_addr() searches for the 'pid_max' symbol in /proc/kallsyms, and prints an error if it cannot find it. The error message has a typo, calling it pix_max. Signed-off-by: Benjamin Gray --- tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c index 16c653600124..d8a9e95fc03d 100644 --- a/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c +++ b/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c @@ -46,7 +46,7 @@ static unsigned long pid_max_addr(void) return strtoul(addr, , 16); } fclose(fp); - printf("Could not find pix_max. Exiting..\n"); + printf("Could not find pid_max. Exiting..\n"); exit(EXIT_FAILURE); return -1; } -- 2.41.0
[PATCH v1 2/4] selftests/powerpc/ptrace: Explain why tests are skipped
Many tests require specific hardware features/configurations that a typical machine might not have. As a result, it's common to see a test is skipped. But it is tedious to find out why a test is skipped when all it gives is the file location of the skip macro. Convert SKIP_IF() to SKIP_IF_MSG(), with appropriate descriptions of why the test is being skipped. This gives a general idea of why a test is skipped, which can be looked into further if it doesn't make sense. Signed-off-by: Benjamin Gray --- tools/testing/selftests/powerpc/ptrace/child.h | 4 ++-- tools/testing/selftests/powerpc/ptrace/core-pkey.c | 2 +- tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c| 2 +- tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c | 2 +- tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c | 4 ++-- tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c | 2 +- tools/testing/selftests/powerpc/ptrace/ptrace-tar.c | 2 +- tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c | 4 ++-- tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c | 4 ++-- tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-tar.c | 4 ++-- tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-vsx.c | 4 ++-- tools/testing/selftests/powerpc/ptrace/ptrace-tm-spr.c | 4 ++-- tools/testing/selftests/powerpc/ptrace/ptrace-tm-tar.c | 4 ++-- tools/testing/selftests/powerpc/ptrace/ptrace-tm-vsx.c | 4 ++-- tools/testing/selftests/powerpc/ptrace/ptrace-vsx.c | 2 +- 15 files changed, 24 insertions(+), 24 deletions(-) diff --git a/tools/testing/selftests/powerpc/ptrace/child.h b/tools/testing/selftests/powerpc/ptrace/child.h index d7275b7b33dc..df62ff0735f7 100644 --- a/tools/testing/selftests/powerpc/ptrace/child.h +++ b/tools/testing/selftests/powerpc/ptrace/child.h @@ -48,12 +48,12 @@ struct child_sync { } \ } while (0) -#define PARENT_SKIP_IF_UNSUPPORTED(x, sync)\ +#define PARENT_SKIP_IF_UNSUPPORTED(x, sync, msg) \ do {\ if ((x) == -1 && (errno == ENODEV || errno == EINVAL)) { \ (sync)->parent_gave_up 
= true; \ prod_child(sync); \ - SKIP_IF(1); \ + SKIP_IF_MSG(1, msg);\ } \ } while (0) diff --git a/tools/testing/selftests/powerpc/ptrace/core-pkey.c b/tools/testing/selftests/powerpc/ptrace/core-pkey.c index f6f8596ce8e1..f6da4cb30cd6 100644 --- a/tools/testing/selftests/powerpc/ptrace/core-pkey.c +++ b/tools/testing/selftests/powerpc/ptrace/core-pkey.c @@ -266,7 +266,7 @@ static int parent(struct shared_info *info, pid_t pid) * to the child. */ ret = ptrace_read_regs(pid, NT_PPC_PKEY, regs, 3); - PARENT_SKIP_IF_UNSUPPORTED(ret, &info->child_sync); + PARENT_SKIP_IF_UNSUPPORTED(ret, &info->child_sync, "PKEYs not supported"); PARENT_FAIL_IF(ret, &info->child_sync); info->amr = regs[0]; diff --git a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c index f75739bbad28..e374c6b7ace6 100644 --- a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c +++ b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c @@ -884,7 +884,7 @@ static int perf_hwbreak(void) { srand ( time(NULL) ); - SKIP_IF(!perf_breakpoint_supported()); + SKIP_IF_MSG(!perf_breakpoint_supported(), "Perf breakpoints not supported"); return runtest(); } diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c index 1345e9b9af0f..a16239277a6f 100644 --- a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c +++ b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c @@ -603,7 +603,7 @@ static int ptrace_hwbreak(void) wait(NULL); get_dbginfo(child_pid, &dbginfo); - SKIP_IF(dbginfo.num_data_bps == 0); + SKIP_IF_MSG(dbginfo.num_data_bps == 0, "No data breakpoints present"); dawr = dawr_present(); run_tests(child_pid, &dbginfo, dawr); diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c index 3344e74a97b4..16c653600124 100644 ---
b/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c @@ -641,10 +641,10 @@ static int ptrace_perf_hwbreak(void) wait(NULL); /* <-- child (SIGUSR1) */ get_dbginfo(child_pid, &dbginfo); -
[PATCH v1 4/4] selftests/powerpc/ptrace: Declare test temporary variables as volatile
While the target is volatile, the temporary variables used to access the target cast away the volatile. This is undefined behaviour, and a compiler may optimise away/reorder these accesses, breaking the test. This was observed with GCC 13.1.1, but it can be difficult to reproduce because of the dependency on compiler behaviour. Signed-off-by: Benjamin Gray --- .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 24 +-- 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c index a16239277a6f..75d30d61ab0e 100644 --- a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c +++ b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c @@ -64,26 +64,26 @@ static bool dawr_present(struct ppc_debug_info *dbginfo) static void write_var(int len) { - __u8 *pcvar; - __u16 *psvar; - __u32 *pivar; - __u64 *plvar; + volatile __u8 *pcvar; + volatile __u16 *psvar; + volatile __u32 *pivar; + volatile __u64 *plvar; switch (len) { case 1: - pcvar = (__u8 *)&glvar; + pcvar = (volatile __u8 *)&glvar; *pcvar = 0xff; break; case 2: - psvar = (__u16 *)&glvar; + psvar = (volatile __u16 *)&glvar; *psvar = 0xffff; break; case 4: - pivar = (__u32 *)&glvar; + pivar = (volatile __u32 *)&glvar; *pivar = 0xffffffff; break; case 8: - plvar = (__u64 *)&glvar; + plvar = (volatile __u64 *)&glvar; *plvar = 0xffffffffffffffffLL; break; } @@ -98,16 +98,16 @@ static void read_var(int len) switch (len) { case 1: - cvar = (__u8)glvar; + cvar = (volatile __u8)glvar; break; case 2: - svar = (__u16)glvar; + svar = (volatile __u16)glvar; break; case 4: - ivar = (__u32)glvar; + ivar = (volatile __u32)glvar; break; case 8: - lvar = (__u64)glvar; + lvar = (volatile __u64)glvar; break; } } -- 2.41.0
[PATCH v1 1/4] Documentation/powerpc: Fix ptrace request names
The documented ptrace request names are currently wrong/incomplete. Fix this to improve correctness and searchability. Signed-off-by: Benjamin Gray --- Documentation/powerpc/ptrace.rst | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/Documentation/powerpc/ptrace.rst b/Documentation/powerpc/ptrace.rst index 77725d69eb4a..5629edf4d56e 100644 --- a/Documentation/powerpc/ptrace.rst +++ b/Documentation/powerpc/ptrace.rst @@ -15,7 +15,7 @@ that's extendable and that covers both BookE and server processors, so that GDB doesn't need to special-case each of them. We added the following 3 new ptrace requests. -1. PTRACE_PPC_GETHWDEBUGINFO +1. PPC_PTRACE_GETHWDBGINFO Query for GDB to discover the hardware debug features. The main info to @@ -48,7 +48,7 @@ features will have bits indicating whether there is support for:: #define PPC_DEBUG_FEATURE_DATA_BP_DAWR 0x10 #define PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 0x20 -2. PTRACE_SETHWDEBUG +2. PPC_PTRACE_SETHWDEBUG Sets a hardware breakpoint or watchpoint, according to the provided structure:: @@ -88,7 +88,7 @@ that the BookE supports. COMEFROM breakpoints available in server processors are not contemplated, but that is out of the scope of this work. ptrace will return an integer (handle) uniquely identifying the breakpoint or -watchpoint just created. This integer will be used in the PTRACE_DELHWDEBUG +watchpoint just created. This integer will be used in the PPC_PTRACE_DELHWDEBUG request to ask for its removal. Return -ENOSPC if the requested breakpoint can't be allocated on the registers. @@ -150,7 +150,7 @@ Some examples of using the structure to: p.addr2 = (uint64_t) end_range; p.condition_value = 0; -3. PTRACE_DELHWDEBUG +3. PPC_PTRACE_DELHWDEBUG Takes an integer which identifies an existing breakpoint or watchpoint (i.e., the value returned from PTRACE_SETHWDEBUG), and deletes the -- 2.41.0
[PATCH v1 0/4] Improve ptrace selftest usability
While trying to test changes to the breakpoint implementation in the kernel, I tried to run the ptrace tests and met many unexplained skips and failures. This series addresses the pain points of trying to run these tests and learn what they are doing. Benjamin Gray (4): Documentation/powerpc: Fix ptrace request names selftests/powerpc/ptrace: Explain why tests are skipped selftests/powerpc/ptrace: Fix typo in pid_max search error selftests/powerpc/ptrace: Declare test temporary variables as volatile Documentation/powerpc/ptrace.rst | 8 +++--- .../testing/selftests/powerpc/ptrace/child.h | 4 +-- .../selftests/powerpc/ptrace/core-pkey.c | 2 +- .../selftests/powerpc/ptrace/perf-hwbreak.c | 2 +- .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 26 +-- .../powerpc/ptrace/ptrace-perf-hwbreak.c | 6 ++--- .../selftests/powerpc/ptrace/ptrace-pkey.c| 2 +- .../selftests/powerpc/ptrace/ptrace-tar.c | 2 +- .../selftests/powerpc/ptrace/ptrace-tm-gpr.c | 4 +-- .../powerpc/ptrace/ptrace-tm-spd-gpr.c| 4 +-- .../powerpc/ptrace/ptrace-tm-spd-tar.c| 4 +-- .../powerpc/ptrace/ptrace-tm-spd-vsx.c| 4 +-- .../selftests/powerpc/ptrace/ptrace-tm-spr.c | 4 +-- .../selftests/powerpc/ptrace/ptrace-tm-tar.c | 4 +-- .../selftests/powerpc/ptrace/ptrace-tm-vsx.c | 4 +-- .../selftests/powerpc/ptrace/ptrace-vsx.c | 2 +- 16 files changed, 41 insertions(+), 41 deletions(-) -- 2.41.0
[PATCH] powerpc/kasan: Disable KCOV in KASAN code
As per the generic KASAN code in mm/kasan, disable KCOV with KCOV_INSTRUMENT := n in the makefile. This fixes a ppc64 boot hang when KCOV and KASAN are enabled: kasan_early_init() gets called before a PACA is initialised, but the KCOV hook expects a valid PACA. Suggested-by: Christophe Leroy Signed-off-by: Benjamin Gray --- The generic makefile disables other sanitisers/hooks as well, but this patch only upstreams the fix for the issue blocking syzkaller runs: with it applied, kernels stop hanging on boot with KCOV + KASAN. This works as an alternative to https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20230707013110.215693-1-bg...@linux.ibm.com/ --- arch/powerpc/mm/kasan/Makefile | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/mm/kasan/Makefile b/arch/powerpc/mm/kasan/Makefile index 699eeffd9f55..f9522fd70b2f 100644 --- a/arch/powerpc/mm/kasan/Makefile +++ b/arch/powerpc/mm/kasan/Makefile @@ -1,6 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 KASAN_SANITIZE := n +KCOV_INSTRUMENT := n obj-$(CONFIG_PPC32) += init_32.o obj-$(CONFIG_PPC_8xx) += 8xx.o -- 2.41.0
Re: [PATCH] powerpc/powermac: Use early_* IO variants in via_calibrate_decr
On Thu, 2023-07-06 at 11:08 +1000, Benjamin Gray wrote: > The issue is pre-existing, but is surfaced by commit 721255b9826b > ("genirq: Use a maple tree for interrupt descriptor management"). > It's not clear to me why this causes it to surface. From the thread chain in [1], it looks like the maple tree implementation just has different allocation behaviour, which follows a pre-existing code path to kmem_cache_alloc_bulk(), which unconditionally enables interrupts. (thanks Jordan Niethe for finding this thread) [1]: https://lore.kernel.org/all/87o7qdzfay.ffs@tglx/