[PATCH v4 0/5] Add generic data patching functions

2024-05-14 Thread Benjamin Gray
Currently patch_instruction() bases the write length on the value being
written. If the value looks like a prefixed instruction it writes 8 bytes,
otherwise it writes 4 bytes. This makes it potentially buggy to use for
writing arbitrary data: if you want to write 4 bytes but it decides to
write 8 bytes, it may clobber the following memory, or the wider write may
be unaligned and trigger an oops if it crosses a page boundary.

To solve this, this series pulls out the size parameter to the 'top' of
the memory patching logic, and propagates it through the various functions.

The two sizes supported are int and long; this allows for patching
instructions and pointers on both ppc32 and ppc64. On ppc32 these are the
same size, so care is taken to only use the size parameter on static
functions, so the compiler can optimise it out entirely. Unfortunately
GCC trips over its own feet here and won't generate code that is optimal
for both strict RWX (mpc85xx_smp_defconfig) and no RWX (pmac32_defconfig)
builds. More details are in the v2 cover letter.
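
For illustration only (the helper names are from patch 1, but the call site
and field names here are made up), a data patch that previously wrapped its
value in ppc_inst() would instead call the size-explicit helpers:

	/* before: write width depends on how 'val' happens to decode */
	patch_instruction((u32 *)&site->data, ppc_inst(val));

	/* after: the width is explicit in the function that is called */
	patch_u32(&site->data, val);       /* always a 4-byte write */
	patch_ulong(&site->target, addr);  /* 4 bytes on ppc32, 8 bytes on ppc64 */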

Changes from v3:
  * Improved the boot test. Now that alignment is enforced,
it checks the word (but not doubleword) aligned patch_ulong().

Changes from v2:
  * Various changes noted on each patch
  * Data patching now enforced to be aligned
  * Restore page aligned flushing optimisation

Changes from v1:
  * Addressed the v1 review actions
  * Removed noinline (for now)

v3: 
https://patchwork.ozlabs.org/project/linuxppc-dev/cover/20240325055302.876434-1-bg...@linux.ibm.com/
v2: 
https://patchwork.ozlabs.org/project/linuxppc-dev/cover/20231016050147.115686-1-bg...@linux.ibm.com/
v1: 
https://patchwork.ozlabs.org/project/linuxppc-dev/cover/20230207015643.590684-1-bg...@linux.ibm.com/

Benjamin Gray (5):
  powerpc/code-patching: Add generic memory patching
  powerpc/code-patching: Add data patch alignment check
  powerpc/64: Convert patch_instruction() to patch_u32()
  powerpc/32: Convert patch_instruction() to patch_uint()
  powerpc/code-patching: Add boot selftest for data patching

 arch/powerpc/include/asm/code-patching.h | 37 +
 arch/powerpc/kernel/module_64.c  |  5 +-
 arch/powerpc/kernel/static_call.c|  2 +-
 arch/powerpc/lib/code-patching.c | 70 +++-
 arch/powerpc/lib/test-code-patching.c| 41 ++
 arch/powerpc/platforms/powermac/smp.c|  2 +-
 6 files changed, 137 insertions(+), 20 deletions(-)

--
2.45.0



[PATCH v4 5/5] powerpc/code-patching: Add boot selftest for data patching

2024-05-14 Thread Benjamin Gray
Extend the code patching selftests with some basic coverage of the new
data patching variants too.

Signed-off-by: Benjamin Gray 

---

v4: * Change store to a check
* Account for doubleword alignment
v3: * New in v3
---
 arch/powerpc/lib/test-code-patching.c | 41 +++
 1 file changed, 41 insertions(+)

diff --git a/arch/powerpc/lib/test-code-patching.c 
b/arch/powerpc/lib/test-code-patching.c
index f76030087f98..8cd3b32f805b 100644
--- a/arch/powerpc/lib/test-code-patching.c
+++ b/arch/powerpc/lib/test-code-patching.c
@@ -438,6 +438,46 @@ static void __init test_multi_instruction_patching(void)
vfree(buf);
 }
 
+static void __init test_data_patching(void)
+{
+   void *buf;
+   u32 *addr32;
+
+   buf = vzalloc(PAGE_SIZE);
+   check(buf);
+   if (!buf)
+   return;
+
+   addr32 = buf + 128;
+
+   addr32[1] = 0xA0A1A2A3;
+   addr32[2] = 0xB0B1B2B3;
+
+   check(!patch_uint(&addr32[1], 0xC0C1C2C3));
+
+   check(addr32[0] == 0);
+   check(addr32[1] == 0xC0C1C2C3);
+   check(addr32[2] == 0xB0B1B2B3);
+   check(addr32[3] == 0);
+
+   /* Unaligned patch_ulong() should fail */
+   if (IS_ENABLED(CONFIG_PPC64))
check(patch_ulong(&addr32[1], 0xD0D1D2D3) == -EINVAL);
+
+   check(!patch_ulong(&addr32[2], 0xD0D1D2D3));
+
+   check(addr32[0] == 0);
+   check(addr32[1] == 0xC0C1C2C3);
+   check(*(unsigned long *)(&addr32[2]) == 0xD0D1D2D3);
+
+   if (!IS_ENABLED(CONFIG_PPC64))
+   check(addr32[3] == 0);
+
+   check(addr32[4] == 0);
+
+   vfree(buf);
+}
+
 static int __init test_code_patching(void)
 {
pr_info("Running code patching self-tests ...\n");
@@ -448,6 +488,7 @@ static int __init test_code_patching(void)
test_translate_branch();
test_prefixed_patching();
test_multi_instruction_patching();
+   test_data_patching();
 
return 0;
 }
-- 
2.45.0



[PATCH v4 1/5] powerpc/code-patching: Add generic memory patching

2024-05-14 Thread Benjamin Gray
patch_instruction() is designed for patching instructions in otherwise
readonly memory. Other consumers also sometimes need to patch readonly
memory, so have abused patch_instruction() for arbitrary data patches.

This is a problem on ppc64 as patch_instruction() decides on the patch
width using the 'instruction' opcode to see if it's a prefixed
instruction. Data that triggers this can lead to larger writes, possibly
crossing a page boundary and failing the write altogether.

Introduce patch_uint() and patch_ulong(), with aliases patch_u32() and
patch_u64() (on ppc64), designed for aligned data patches. The patch
size is now determined by the called function, and is passed as an
additional parameter to generic internals.

While the instruction flushing is not required for data patches, it
remains unconditional in this patch. A followup series is possible if
benchmarking shows fewer flushes gives an improvement in some
data-patching workload.

ppc32 does not support prefixed instructions, so is unaffected by the
original issue. Care is taken in not exposing the size parameter in the
public (non-static) interface, so the compiler can const-propagate it
away.

Signed-off-by: Benjamin Gray 

---

v3: * Rename from *_memory to *_mem
* Change type of ppc32 patch_uint() address to void*
* Explain introduction of val32 for big endian
* Some formatting

v2: * Deduplicate patch_32() definition
* Use u32 for val32
* Remove noinline
---
 arch/powerpc/include/asm/code-patching.h | 31 
 arch/powerpc/lib/code-patching.c | 64 ++--
 2 files changed, 80 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/code-patching.h 
b/arch/powerpc/include/asm/code-patching.h
index 0e29ccf903d0..21a36e2c4e26 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -76,6 +76,37 @@ int patch_instruction(u32 *addr, ppc_inst_t instr);
 int raw_patch_instruction(u32 *addr, ppc_inst_t instr);
 int patch_instructions(u32 *addr, u32 *code, size_t len, bool repeat_instr);
 
+/*
+ * The data patching functions patch_uint() and patch_ulong(), etc., must be
+ * called on aligned addresses.
+ *
+ * The instruction patching functions patch_instruction() and similar must be
+ * called on addresses satisfying instruction alignment requirements.
+ */
+
+#ifdef CONFIG_PPC64
+
+int patch_uint(void *addr, unsigned int val);
+int patch_ulong(void *addr, unsigned long val);
+
+#define patch_u64 patch_ulong
+
+#else
+
+static inline int patch_uint(void *addr, unsigned int val)
+{
+   return patch_instruction(addr, ppc_inst(val));
+}
+
+static inline int patch_ulong(void *addr, unsigned long val)
+{
+   return patch_instruction(addr, ppc_inst(val));
+}
+
+#endif
+
+#define patch_u32 patch_uint
+
 static inline unsigned long patch_site_addr(s32 *site)
 {
return (unsigned long)site + *site;
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index df64343b9214..18f762616db9 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -20,15 +20,14 @@
 #include 
 #include 
 
-static int __patch_instruction(u32 *exec_addr, ppc_inst_t instr, u32 
*patch_addr)
+static int __patch_mem(void *exec_addr, unsigned long val, void *patch_addr, 
bool is_dword)
 {
-   if (!ppc_inst_prefixed(instr)) {
-   u32 val = ppc_inst_val(instr);
+   if (!IS_ENABLED(CONFIG_PPC64) || likely(!is_dword)) {
+   /* For big endian correctness: plain address would use the 
wrong half */
+   u32 val32 = val;
 
-   __put_kernel_nofault(patch_addr, &val, u32, failed);
+   __put_kernel_nofault(patch_addr, &val32, u32, failed);
} else {
-   u64 val = ppc_inst_as_ulong(instr);
-
__put_kernel_nofault(patch_addr, &val, u64, failed);
}
 
@@ -44,7 +43,10 @@ static int __patch_instruction(u32 *exec_addr, ppc_inst_t 
instr, u32 *patch_addr
 
 int raw_patch_instruction(u32 *addr, ppc_inst_t instr)
 {
-   return __patch_instruction(addr, instr, addr);
+   if (ppc_inst_prefixed(instr))
+   return __patch_mem(addr, ppc_inst_as_ulong(instr), addr, true);
+   else
+   return __patch_mem(addr, ppc_inst_val(instr), addr, false);
 }
 
 struct patch_context {
@@ -276,7 +278,7 @@ static void unmap_patch_area(unsigned long addr)
flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
 }
 
-static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
+static int __do_patch_mem_mm(void *addr, unsigned long val, bool is_dword)
 {
int err;
u32 *patch_addr;
@@ -305,7 +307,7 @@ static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t 
instr)
 
orig_mm = start_using_temp_mm(patching_mm);
 
-   err = __patch_instruction(addr, instr, patch_addr);
+   err = __patch_mem(addr, val, patch_addr, is_dword);
 
/* context synchronisation performed

[PATCH v4 2/5] powerpc/code-patching: Add data patch alignment check

2024-05-14 Thread Benjamin Gray
The new data patching still needs to be contained within a single
cacheline for the flushes to work correctly. To simplify this
requirement, we just say data patches must be naturally aligned.

Detect when data patching is not aligned, returning an invalid
argument error.
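
As a rough sketch of the effect (buffer and values made up here, mirroring
the selftest added later in this series), a misaligned data patch now fails
up front instead of risking a flush over the wrong range:

	u32 *p = vzalloc(PAGE_SIZE);       /* page (and so doubleword) aligned */

	patch_uint(&p[1], 0x12345678);     /* ok: 4-byte aligned */
	patch_uint((void *)p + 1, 0x0);    /* -EINVAL: not 4-byte aligned */
	patch_ulong(&p[1], 0x0);           /* -EINVAL on ppc64: not 8-byte aligned */
	patch_ulong(&p[2], 0x0);           /* ok: 8-byte aligned */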

Signed-off-by: Benjamin Gray 

---

v3: * New in v3
---
 arch/powerpc/include/asm/code-patching.h | 6 ++
 arch/powerpc/lib/code-patching.c | 6 ++
 2 files changed, 12 insertions(+)

diff --git a/arch/powerpc/include/asm/code-patching.h 
b/arch/powerpc/include/asm/code-patching.h
index 21a36e2c4e26..e7f14720f630 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -95,11 +95,17 @@ int patch_ulong(void *addr, unsigned long val);
 
 static inline int patch_uint(void *addr, unsigned int val)
 {
+   if (!IS_ALIGNED((unsigned long)addr, sizeof(unsigned int)))
+   return -EINVAL;
+
return patch_instruction(addr, ppc_inst(val));
 }
 
 static inline int patch_ulong(void *addr, unsigned long val)
 {
+   if (!IS_ALIGNED((unsigned long)addr, sizeof(unsigned long)))
+   return -EINVAL;
+
return patch_instruction(addr, ppc_inst(val));
 }
 
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 18f762616db9..384b275d1bc5 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -386,12 +386,18 @@ NOKPROBE_SYMBOL(patch_instruction);
 
 int patch_uint(void *addr, unsigned int val)
 {
+   if (!IS_ALIGNED((unsigned long)addr, sizeof(unsigned int)))
+   return -EINVAL;
+
return patch_mem(addr, val, false);
 }
 NOKPROBE_SYMBOL(patch_uint);
 
 int patch_ulong(void *addr, unsigned long val)
 {
+   if (!IS_ALIGNED((unsigned long)addr, sizeof(unsigned long)))
+   return -EINVAL;
+
return patch_mem(addr, val, true);
 }
 NOKPROBE_SYMBOL(patch_ulong);
-- 
2.45.0



[PATCH v4 4/5] powerpc/32: Convert patch_instruction() to patch_uint()

2024-05-14 Thread Benjamin Gray
These changes are for patch_instruction() uses on data. Unlike ppc64
these should not be incorrect as-is, but using the patch_uint() alias
better reflects what kind of data is being patched and allows for
benchmarking the effect of different patch_* implementations (e.g.,
skipping instruction flushing when patching data).

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/kernel/static_call.c | 2 +-
 arch/powerpc/platforms/powermac/smp.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/static_call.c 
b/arch/powerpc/kernel/static_call.c
index 863a7aa24650..1502b7e439ca 100644
--- a/arch/powerpc/kernel/static_call.c
+++ b/arch/powerpc/kernel/static_call.c
@@ -17,7 +17,7 @@ void arch_static_call_transform(void *site, void *tramp, void 
*func, bool tail)
mutex_lock(&text_mutex);
 
if (func && !is_short) {
-   err = patch_instruction(tramp + PPC_SCT_DATA, ppc_inst(target));
+   err = patch_ulong(tramp + PPC_SCT_DATA, target);
if (err)
goto out;
}
diff --git a/arch/powerpc/platforms/powermac/smp.c 
b/arch/powerpc/platforms/powermac/smp.c
index 15644be31990..d21b681f52fb 100644
--- a/arch/powerpc/platforms/powermac/smp.c
+++ b/arch/powerpc/platforms/powermac/smp.c
@@ -827,7 +827,7 @@ static int smp_core99_kick_cpu(int nr)
mdelay(1);
 
/* Restore our exception vector */
-   patch_instruction(vector, ppc_inst(save_vector));
+   patch_uint(vector, save_vector);
 
local_irq_restore(flags);
if (ppc_md.progress) ppc_md.progress("smp_core99_kick_cpu done", 0x347);
-- 
2.45.0



[PATCH v4 3/5] powerpc/64: Convert patch_instruction() to patch_u32()

2024-05-14 Thread Benjamin Gray
This use of patch_instruction() is working on 32 bit data, and can fail
if the data looks like a prefixed instruction and the extra write
crosses a page boundary. Use patch_u32() to fix the write size.

Fixes: 8734b41b3efe ("powerpc/module_64: Fix livepatching for RO modules")
Link: https://lore.kernel.org/all/20230203004649.1f59dbd4@yea/
Signed-off-by: Benjamin Gray 

---

v2: * Added the fixes tag, it seems appropriate even if the subject does
  mention a more robust solution being required.

patch_u64() should be more efficient, but judging from the bug report
it doesn't seem like the data is doubleword aligned.
---
 arch/powerpc/kernel/module_64.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 7112adc597a8..e9bab599d0c2 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -651,12 +651,11 @@ static inline int create_stub(const Elf64_Shdr *sechdrs,
// func_desc_t is 8 bytes if ABIv2, else 16 bytes
desc = func_desc(addr);
for (i = 0; i < sizeof(func_desc_t) / sizeof(u32); i++) {
-   if (patch_instruction(((u32 *)&entry->funcdata) + i,
- ppc_inst(((u32 *)&desc)[i])))
+   if (patch_u32(((u32 *)&entry->funcdata) + i, ((u32 *)&desc)[i]))
return 0;
}
 
-   if (patch_instruction(&entry->magic, ppc_inst(STUB_MAGIC)))
+   if (patch_u32(&entry->magic, STUB_MAGIC))
return 0;
 
return 1;
-- 
2.45.0



Re: [PATCH v3 3/5] powerpc/64: Convert patch_instruction() to patch_u32()

2024-05-13 Thread Benjamin Gray
On Tue, 2024-04-23 at 15:09 +0530, Naveen N Rao wrote:
> On Mon, Mar 25, 2024 at 04:53:00PM +1100, Benjamin Gray wrote:
> > This use of patch_instruction() is working on 32 bit data, and can
> > fail
> > if the data looks like a prefixed instruction and the extra write
> > crosses a page boundary. Use patch_u32() to fix the write size.
> > 
> > Fixes: 8734b41b3efe ("powerpc/module_64: Fix livepatching for RO
> > modules")
> > Link: https://lore.kernel.org/all/20230203004649.1f59dbd4@yea/
> > Signed-off-by: Benjamin Gray 
> > 
> > ---
> > 
> > v2: * Added the fixes tag, it seems appropriate even if the subject
> > does
> >   mention a more robust solution being required.
> > 
> > patch_u64() should be more efficient, but judging from the bug
> > report
> > it doesn't seem like the data is doubleword aligned.
> 
> Asking again, is that still the case? It looks like at least the
> first 
> fix below can be converted to patch_u64().
> 
> - Naveen

Sorry, I think I forgot this question last time. Reading the commit
descriptions you linked, I don't see any mention of "entry->funcdata
will always be doubleword aligned because XYZ". If the patch makes it
doubleword aligned anyway, I wouldn't be confident asserting all
callers will always do this without looking into it a lot more.

Perhaps a separate series could optimise it with appropriate
justification/assertions to catch bad alignment. But I think leaving it
out of this series is fine because the original works in words, so it's
not regressing anything.

> 
> > ---
> >  arch/powerpc/kernel/module_64.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/module_64.c
> > b/arch/powerpc/kernel/module_64.c
> > index 7112adc597a8..e9bab599d0c2 100644
> > --- a/arch/powerpc/kernel/module_64.c
> > +++ b/arch/powerpc/kernel/module_64.c
> > @@ -651,12 +651,11 @@ static inline int create_stub(const
> > Elf64_Shdr *sechdrs,
> >     // func_desc_t is 8 bytes if ABIv2, else 16 bytes
> >     desc = func_desc(addr);
> >     for (i = 0; i < sizeof(func_desc_t) / sizeof(u32); i++) {
> > -   if (patch_instruction(((u32 *)&entry->funcdata) +
> > i,
> > -     ppc_inst(((u32
> > *)&desc)[i])))
> > +   if (patch_u32(((u32 *)&entry->funcdata) + i, ((u32
> > *)&desc)[i]))
> >     return 0;
> >     }
> >  
> > -   if (patch_instruction(&entry->magic,
> > ppc_inst(STUB_MAGIC)))
> > +   if (patch_u32(&entry->magic, STUB_MAGIC))
> >     return 0;
> >  
> >     return 1;
> > -- 
> > 2.44.0
> > 



[PATCH v1 9/9] Documentation: Document PowerPC kernel dynamic DEXCR interface

2024-04-17 Thread Benjamin Gray
Documents how to use the PR_PPC_GET_DEXCR and PR_PPC_SET_DEXCR prctl()s
for changing a process's DEXCR or its process tree's default value.
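
For example (illustrative snippet only; the full flag semantics are in the
tables added below), a single PR_PPC_GET_DEXCR call reports both the current
and the on-exec state of an aspect:

	int ctrl = prctl(PR_PPC_GET_DEXCR, PR_PPC_DEXCR_NPHIE, 0, 0, 0);

	if (ctrl >= 0) {
		if (ctrl & PR_PPC_DEXCR_CTRL_SET)
			printf("NPHIE is set for this process\n");
		if (ctrl & PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC)
			printf("NPHIE will be cleared after exec\n");
	}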

Signed-off-by: Benjamin Gray 
---
 Documentation/arch/powerpc/dexcr.rst | 141 ++-
 1 file changed, 139 insertions(+), 2 deletions(-)

diff --git a/Documentation/arch/powerpc/dexcr.rst 
b/Documentation/arch/powerpc/dexcr.rst
index 615a631f51fa..ab0724212fcd 100644
--- a/Documentation/arch/powerpc/dexcr.rst
+++ b/Documentation/arch/powerpc/dexcr.rst
@@ -36,8 +36,145 @@ state for a process.
 Configuration
 =
 
-The DEXCR is currently unconfigurable. All threads are run with the
-NPHIE aspect enabled.
+prctl
+-
+
+A process can control its own userspace DEXCR value using the
+``PR_PPC_GET_DEXCR`` and ``PR_PPC_SET_DEXCR`` pair of
+:manpage:`prctl(2)` commands. These calls have the form::
+
+prctl(PR_PPC_GET_DEXCR, unsigned long which, 0, 0, 0);
+prctl(PR_PPC_SET_DEXCR, unsigned long which, unsigned long ctrl, 0, 0);
+
+The possible 'which' and 'ctrl' values are as follows. Note there is no 
relation
+between the 'which' value and the DEXCR aspect's index.
+
+.. flat-table::
+   :header-rows: 1
+   :widths: 2 7 1
+
+   * - ``prctl()`` which
+ - Aspect name
+ - Aspect index
+
+   * - ``PR_PPC_DEXCR_SBHE``
+ - Speculative Branch Hint Enable (SBHE)
+ - 0
+
+   * - ``PR_PPC_DEXCR_IBRTPD``
+ - Indirect Branch Recurrent Target Prediction Disable (IBRTPD)
+ - 3
+
+   * - ``PR_PPC_DEXCR_SRAPD``
+ - Subroutine Return Address Prediction Disable (SRAPD)
+ - 4
+
+   * - ``PR_PPC_DEXCR_NPHIE``
+ - Non-Privileged Hash Instruction Enable (NPHIE)
+ - 5
+
+.. flat-table::
+   :header-rows: 1
+   :widths: 2 8
+
+   * - ``prctl()`` ctrl
+ - Meaning
+
+   * - ``PR_PPC_DEXCR_CTRL_EDITABLE``
+ - This aspect can be configured with PR_PPC_SET_DEXCR (get only)
+
+   * - ``PR_PPC_DEXCR_CTRL_SET``
+ - This aspect is set / set this aspect
+
+   * - ``PR_PPC_DEXCR_CTRL_CLEAR``
+ - This aspect is clear / clear this aspect
+
+   * - ``PR_PPC_DEXCR_CTRL_SET_ONEXEC``
+ - This aspect will be set after exec / set this aspect after exec
+
+   * - ``PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC``
+ - This aspect will be clear after exec / clear this aspect after exec
+
+Note that
+
+* which is a plain value, not a bitmask. Aspects must be worked with 
individually.
+
+* ctrl is a bitmask. ``PR_PPC_GET_DEXCR`` returns both the current and onexec
+  configuration. For example, ``PR_PPC_GET_DEXCR`` may return
+  ``PR_PPC_DEXCR_CTRL_EDITABLE | PR_PPC_DEXCR_CTRL_SET |
+  PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC``. This would indicate the aspect is currently
+  set, it will be cleared when you run exec, and you can change this with the
+  ``PR_PPC_SET_DEXCR`` prctl.
+
+* The set/clear terminology refers to setting/clearing the bit in the DEXCR.
+  For example::
+
+  prctl(PR_PPC_SET_DEXCR, PR_PPC_DEXCR_IBRTPD, PR_PPC_DEXCR_CTRL_SET, 0, 
0);
+
+  will set the IBRTPD aspect bit in the DEXCR, causing indirect branch 
prediction
+  to be disabled.
+
+* The status returned by ``PR_PPC_GET_DEXCR`` represents what value the process
+  would like applied. It does not include any alternative overrides, such as if
+  the hypervisor is enforcing the aspect be set. To see the true DEXCR state
+  software should read the appropriate SPRs directly.
+
+* The aspect state when starting a process is copied from the parent's state on
+  :manpage:`fork(2)`. The state is reset to a fixed value on
+  :manpage:`execve(2)`. The PR_PPC_SET_DEXCR prctl() can control both of these
+  values.
+
+* The ``*_ONEXEC`` controls do not change the current process's DEXCR.
+
+Use ``PR_PPC_SET_DEXCR`` with one of ``PR_PPC_DEXCR_CTRL_SET`` or
+``PR_PPC_DEXCR_CTRL_CLEAR`` to edit a given aspect.
+
+Common error codes for both getting and setting the DEXCR are as follows:
+
+.. flat-table::
+   :header-rows: 1
+   :widths: 2 8
+
+   * - Error
+ - Meaning
+
+   * - ``EINVAL``
+ - The DEXCR is not supported by the kernel.
+
+   * - ``ENODEV``
+ - The aspect is not recognised by the kernel or not supported by the
+   hardware.
+
+``PR_PPC_SET_DEXCR`` may also report the following error codes:
+
+.. flat-table::
+   :header-rows: 1
+   :widths: 2 8
+
+   * - Error
+ - Meaning
+
+   * - ``EINVAL``
+ - The ctrl value contains unrecognised flags.
+
+   * - ``EINVAL``
+ - The ctrl value contains mutually conflicting flags (e.g.,
+   ``PR_PPC_DEXCR_CTRL_SET | PR_PPC_DEXCR_CTRL_CLEAR``)
+
+   * - ``EPERM``
+ - This aspect cannot be modified with prctl() (check for the
+   PR_PPC_DEXCR_CTRL_EDITABLE flag with PR_PPC_GET_DEXCR).
+
+   * - ``EPERM``
+ - The process does not have sufficient privilege to perform the operation.
+   For example, clearing NPHIE on exec is a privileged operation (a process
+   can still clear its own NPHIE aspect without privileges).
+
+This interface allows a process to control its own DEXCR

[PATCH v1 8/9] selftests/powerpc/dexcr: Add chdexcr utility

2024-04-17 Thread Benjamin Gray
Adds a utility to exercise the prctl DEXCR inheritance in the shell.
Supports setting and clearing each aspect.
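
For example (./my_program is a placeholder; the option names come from the
help text below), to run a program with the IBRTPD aspect set, or explicitly
cleared:

	./chdexcr --ibrtpd -- ./my_program
	./chdexcr --no-ibrtpd -- ./my_program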

Signed-off-by: Benjamin Gray 
---
 .../selftests/powerpc/dexcr/.gitignore|   1 +
 .../testing/selftests/powerpc/dexcr/Makefile  |   2 +-
 .../testing/selftests/powerpc/dexcr/chdexcr.c | 110 +++
 tools/testing/selftests/powerpc/dexcr/dexcr.h |  47 +++
 .../testing/selftests/powerpc/dexcr/lsdexcr.c | 130 --
 5 files changed, 185 insertions(+), 105 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/dexcr/chdexcr.c

diff --git a/tools/testing/selftests/powerpc/dexcr/.gitignore 
b/tools/testing/selftests/powerpc/dexcr/.gitignore
index 5d526613cd26..11eefb4b9fa4 100644
--- a/tools/testing/selftests/powerpc/dexcr/.gitignore
+++ b/tools/testing/selftests/powerpc/dexcr/.gitignore
@@ -1,3 +1,4 @@
 dexcr_test
 hashchk_test
+chdexcr
 lsdexcr
diff --git a/tools/testing/selftests/powerpc/dexcr/Makefile 
b/tools/testing/selftests/powerpc/dexcr/Makefile
index 076943193c07..6a89e88ef7b2 100644
--- a/tools/testing/selftests/powerpc/dexcr/Makefile
+++ b/tools/testing/selftests/powerpc/dexcr/Makefile
@@ -1,5 +1,5 @@
 TEST_GEN_PROGS := dexcr_test hashchk_test
-TEST_GEN_FILES := lsdexcr
+TEST_GEN_FILES := lsdexcr chdexcr
 
 include ../../lib.mk
 
diff --git a/tools/testing/selftests/powerpc/dexcr/chdexcr.c 
b/tools/testing/selftests/powerpc/dexcr/chdexcr.c
new file mode 100644
index ..217187a83224
--- /dev/null
+++ b/tools/testing/selftests/powerpc/dexcr/chdexcr.c
@@ -0,0 +1,110 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "dexcr.h"
+#include "utils.h"
+
+static void die(const char *msg)
+{
+   printf("%s\n", msg);
+   exit(1);
+}
+
+static void help(void)
+{
+   printf("Invoke a provided program with a custom DEXCR on-exec reset 
value\n"
+  "\n"
+  "usage: chdexcr [CHDEXCR OPTIONS] -- PROGRAM [ARGS...]\n"
+  "\n"
+  "Each configurable DEXCR aspect is exposed as an option.\n"
+  "\n"
+  "The normal option sets the aspect in the DEXCR. The --no- 
variant\n"
+  "clears that aspect. For example, --ibrtpd sets the IBRTPD 
aspect bit,\n"
+  "so indirect branch predicition will be disabled in the provided 
program.\n"
+  "Conversely, --no-ibrtpd clears the aspect bit, so indirect 
branch\n"
+  "prediction may occur.\n"
+  "\n"
+  "CHDEXCR OPTIONS:\n");
+
+   for (int i = 0; i < ARRAY_SIZE(aspects); i++) {
+   const struct dexcr_aspect *aspect = &aspects[i];
+
+   if (aspect->prctl == -1)
+   continue;
+
+   printf("  --%-6s / --no-%-6s : %s\n", aspect->opt, aspect->opt, 
aspect->desc);
+   }
+}
+
+static const struct dexcr_aspect *opt_to_aspect(const char *opt)
+{
+   for (int i = 0; i < ARRAY_SIZE(aspects); i++)
+   if (aspects[i].prctl != -1 && !strcmp(aspects[i].opt, opt))
+   return &aspects[i];
+
+   return NULL;
+}
+
+static int apply_option(const char *option)
+{
+   const struct dexcr_aspect *aspect;
+   const char *opt = NULL;
+   const char *set_prefix = "--";
+   const char *clear_prefix = "--no-";
+   unsigned long ctrl = 0;
+   int err;
+
+   if (!strcmp(option, "-h") || !strcmp(option, "--help")) {
+   help();
+   exit(0);
+   }
+
+   /* Strip out --(no-) prefix and determine ctrl value */
+   if (!strncmp(option, clear_prefix, strlen(clear_prefix))) {
+   opt = &option[strlen(clear_prefix)];
+   ctrl |= PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC;
+   } else if (!strncmp(option, set_prefix, strlen(set_prefix))) {
+   opt = &option[strlen(set_prefix)];
+   ctrl |= PR_PPC_DEXCR_CTRL_SET_ONEXEC;
+   }
+
+   if (!opt || !*opt)
+   return 1;
+
+   aspect = opt_to_aspect(opt);
+   if (!aspect)
+   die("unknown aspect");
+
+   err = pr_set_dexcr(aspect->prctl, ctrl);
+   if (err)
+   die("failed to apply option");
+
+   return 0;
+}
+
+int main(int argc, char *const argv[], char *const envp[])
+{
+   int i;
+
+   if (!dexcr_exists())
+   die("DEXCR not detected on this hardware");
+
+   for (i = 1; i < argc; i++)
+   if (apply_option(argv[i]))
+   break;
+
+   if (i < argc && !strcmp(argv[i], "--"))
+   i++;
+
+   if (i >= argc)
+   die("missing command");
+
+   execve(argv[i], &argv[i], en

[PATCH v1 7/9] selftests/powerpc/dexcr: Add DEXCR config details to lsdexcr

2024-04-17 Thread Benjamin Gray
Now that the DEXCR can be configured with prctl, add a section in
lsdexcr that explains why each aspect is set the way it is.

Signed-off-by: Benjamin Gray 
---
 .../testing/selftests/powerpc/dexcr/lsdexcr.c | 113 +-
 1 file changed, 111 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/powerpc/dexcr/lsdexcr.c 
b/tools/testing/selftests/powerpc/dexcr/lsdexcr.c
index 94abbfcc389e..a63db47b6610 100644
--- a/tools/testing/selftests/powerpc/dexcr/lsdexcr.c
+++ b/tools/testing/selftests/powerpc/dexcr/lsdexcr.c
@@ -1,9 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0+
 
-#include 
 #include 
 #include 
 #include 
+#include 
 
 #include "dexcr.h"
 #include "utils.h"
@@ -16,6 +16,8 @@ struct dexcr_aspect {
const char *name;
const char *desc;
unsigned int index;
+   unsigned long prctl;
+   const char *sysctl;
 };
 
 static const struct dexcr_aspect aspects[] = {
@@ -23,26 +25,36 @@ static const struct dexcr_aspect aspects[] = {
.name = "SBHE",
.desc = "Speculative branch hint enable",
.index = 0,
+   .prctl = PR_PPC_DEXCR_SBHE,
+   .sysctl = "speculative_branch_hint_enable",
},
{
.name = "IBRTPD",
.desc = "Indirect branch recurrent target prediction disable",
.index = 3,
+   .prctl = PR_PPC_DEXCR_IBRTPD,
+   .sysctl = "indirect_branch_recurrent_target_prediction_disable",
},
{
.name = "SRAPD",
.desc = "Subroutine return address prediction disable",
.index = 4,
+   .prctl = PR_PPC_DEXCR_SRAPD,
+   .sysctl = "subroutine_return_address_prediction_disable",
},
{
.name = "NPHIE",
.desc = "Non-privileged hash instruction enable",
.index = 5,
+   .prctl = PR_PPC_DEXCR_NPHIE,
+   .sysctl = "nonprivileged_hash_instruction_enable",
},
{
.name = "PHIE",
.desc = "Privileged hash instruction enable",
.index = 6,
+   .prctl = -1,
+   .sysctl = NULL,
},
 };
 
@@ -60,7 +72,7 @@ static void print_dexcr(char *name, unsigned int bits)
const char *enabled_aspects[ARRAY_SIZE(aspects) + 1] = {NULL};
size_t j = 0;
 
-   printf("%s: %08x", name, bits);
+   printf("%s: 0x%08x", name, bits);
 
if (bits == 0) {
printf("\n");
@@ -103,6 +115,95 @@ static void print_aspect(const struct dexcr_aspect *aspect)
printf("  \t(%s)\n", aspect->desc);
 }
 
+static void print_aspect_config(const struct dexcr_aspect *aspect)
+{
+   char sysctl_path[128] = "/proc/sys/kernel/dexcr/";
+   const char *reason = "unknown";
+   const char *reason_hyp = NULL;
+   const char *reason_sysctl = "no sysctl";
+   const char *reason_prctl = "no prctl";
+   bool actual = effective & DEXCR_PR_BIT(aspect->index);
+   bool expected = false;
+
+   long sysctl_ctrl = 0;
+   int prctl_ctrl = 0;
+   int err;
+
+   if (aspect->prctl >= 0) {
+   prctl_ctrl = pr_get_dexcr(aspect->prctl);
+   if (prctl_ctrl < 0)
+   reason_prctl = "(failed to read prctl)";
+   else {
+   if (prctl_ctrl & PR_PPC_DEXCR_CTRL_SET) {
+   reason_prctl = "set by prctl";
+   expected = true;
+   } else if (prctl_ctrl & PR_PPC_DEXCR_CTRL_CLEAR) {
+   reason_prctl = "cleared by prctl";
+   expected = false;
+   } else
+   reason_prctl = "unknown prctl";
+
+   reason = reason_prctl;
+   }
+   }
+
+   if (aspect->sysctl) {
+   strcat(sysctl_path, aspect->sysctl);
+   err = read_long(sysctl_path, &sysctl_ctrl, 10);
+   if (err)
+   reason_sysctl = "(failed to read sysctl)";
+   else {
+   switch (sysctl_ctrl) {
+   case 0:
+   reason_sysctl = "cleared by sysctl";
+   reason = reason_sysctl;
+   expected = false;
+   break;
+   case 1:
+   reason_sysctl = "set by sysctl";
+   reason = reason_sysc

[PATCH v1 6/9] selftests/powerpc/dexcr: Attempt to enable NPHIE in hashchk selftest

2024-04-17 Thread Benjamin Gray
Now that a process can control its DEXCR to some extent, make the
hashchk tests more reliable by explicitly setting the local and onexec
NPHIE aspect.

Signed-off-by: Benjamin Gray 
---
 tools/testing/selftests/powerpc/dexcr/hashchk_test.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/powerpc/dexcr/hashchk_test.c 
b/tools/testing/selftests/powerpc/dexcr/hashchk_test.c
index 7d5658c9ebe4..645224bdc142 100644
--- a/tools/testing/selftests/powerpc/dexcr/hashchk_test.c
+++ b/tools/testing/selftests/powerpc/dexcr/hashchk_test.c
@@ -21,8 +21,14 @@
 static int require_nphie(void)
 {
SKIP_IF_MSG(!dexcr_exists(), "DEXCR not supported");
+
+   pr_set_dexcr(PR_PPC_DEXCR_NPHIE, PR_PPC_DEXCR_CTRL_SET | 
PR_PPC_DEXCR_CTRL_SET_ONEXEC);
+
+   if (get_dexcr(EFFECTIVE) & DEXCR_PR_NPHIE)
+   return 0;
+
SKIP_IF_MSG(!(get_dexcr(EFFECTIVE) & DEXCR_PR_NPHIE),
-   "DEXCR[NPHIE] not enabled");
+   "Failed to enable DEXCR[NPHIE]");
 
return 0;
 }
-- 
2.44.0



[PATCH v1 5/9] selftests/powerpc/dexcr: Add DEXCR prctl interface test

2024-04-17 Thread Benjamin Gray
Some basic tests of the prctl interface of the DEXCR.

Signed-off-by: Benjamin Gray 
---
 .../selftests/powerpc/dexcr/.gitignore|   1 +
 .../testing/selftests/powerpc/dexcr/Makefile  |   4 +-
 tools/testing/selftests/powerpc/dexcr/dexcr.c |  40 
 tools/testing/selftests/powerpc/dexcr/dexcr.h |  10 +
 .../selftests/powerpc/dexcr/dexcr_test.c  | 213 ++
 5 files changed, 267 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/dexcr/dexcr_test.c

diff --git a/tools/testing/selftests/powerpc/dexcr/.gitignore 
b/tools/testing/selftests/powerpc/dexcr/.gitignore
index b82f45dd46b9..5d526613cd26 100644
--- a/tools/testing/selftests/powerpc/dexcr/.gitignore
+++ b/tools/testing/selftests/powerpc/dexcr/.gitignore
@@ -1,2 +1,3 @@
+dexcr_test
 hashchk_test
 lsdexcr
diff --git a/tools/testing/selftests/powerpc/dexcr/Makefile 
b/tools/testing/selftests/powerpc/dexcr/Makefile
index 829ad075b4a4..076943193c07 100644
--- a/tools/testing/selftests/powerpc/dexcr/Makefile
+++ b/tools/testing/selftests/powerpc/dexcr/Makefile
@@ -1,8 +1,10 @@
-TEST_GEN_PROGS := hashchk_test
+TEST_GEN_PROGS := dexcr_test hashchk_test
 TEST_GEN_FILES := lsdexcr
 
 include ../../lib.mk
 
+CFLAGS += $(KHDR_INCLUDES)
+
 $(OUTPUT)/hashchk_test: CFLAGS += -fno-pie -no-pie $(call 
cc-option,-mno-rop-protect)
 
 $(TEST_GEN_PROGS): ../harness.c ../utils.c ./dexcr.c
diff --git a/tools/testing/selftests/powerpc/dexcr/dexcr.c 
b/tools/testing/selftests/powerpc/dexcr/dexcr.c
index 65ec5347de98..468fd0dc9912 100644
--- a/tools/testing/selftests/powerpc/dexcr/dexcr.c
+++ b/tools/testing/selftests/powerpc/dexcr/dexcr.c
@@ -3,6 +3,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -43,6 +44,45 @@ bool dexcr_exists(void)
return exists;
 }
 
+unsigned int pr_which_to_aspect(unsigned long which)
+{
+   switch (which) {
+   case PR_PPC_DEXCR_SBHE:
+   return DEXCR_PR_SBHE;
+   case PR_PPC_DEXCR_IBRTPD:
+   return DEXCR_PR_IBRTPD;
+   case PR_PPC_DEXCR_SRAPD:
+   return DEXCR_PR_SRAPD;
+   case PR_PPC_DEXCR_NPHIE:
+   return DEXCR_PR_NPHIE;
+   default:
+   FAIL_IF_EXIT_MSG(true, "unknown PR aspect");
+   }
+}
+
+int pr_get_dexcr(unsigned long which)
+{
+   return prctl(PR_PPC_GET_DEXCR, which, 0UL, 0UL, 0UL);
+}
+
+int pr_set_dexcr(unsigned long which, unsigned long ctrl)
+{
+   return prctl(PR_PPC_SET_DEXCR, which, ctrl, 0UL, 0UL);
+}
+
+bool pr_dexcr_aspect_supported(unsigned long which)
+{
+   if (pr_get_dexcr(which) == -1)
+   return errno == ENODEV;
+
+   return true;
+}
+
+bool pr_dexcr_aspect_editable(unsigned long which)
+{
+   return pr_get_dexcr(which) & PR_PPC_DEXCR_CTRL_EDITABLE;
+}
+
 /*
  * Just test if a bad hashchk triggers a signal, without checking
  * for support or if the NPHIE aspect is enabled.
diff --git a/tools/testing/selftests/powerpc/dexcr/dexcr.h 
b/tools/testing/selftests/powerpc/dexcr/dexcr.h
index f55cbbc8643b..a6aa7eac11da 100644
--- a/tools/testing/selftests/powerpc/dexcr/dexcr.h
+++ b/tools/testing/selftests/powerpc/dexcr/dexcr.h
@@ -28,6 +28,16 @@
 
 bool dexcr_exists(void);
 
+bool pr_dexcr_aspect_supported(unsigned long which);
+
+bool pr_dexcr_aspect_editable(unsigned long which);
+
+int pr_get_dexcr(unsigned long pr_aspect);
+
+int pr_set_dexcr(unsigned long pr_aspect, unsigned long ctrl);
+
+unsigned int pr_which_to_aspect(unsigned long which);
+
 bool hashchk_triggers(void);
 
 enum dexcr_source {
diff --git a/tools/testing/selftests/powerpc/dexcr/dexcr_test.c 
b/tools/testing/selftests/powerpc/dexcr/dexcr_test.c
new file mode 100644
index ..4662c823a4f1
--- /dev/null
+++ b/tools/testing/selftests/powerpc/dexcr/dexcr_test.c
@@ -0,0 +1,213 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "dexcr.h"
+#include "utils.h"
+
+/*
+ * Helper function for testing the behaviour of a newly exec-ed process
+ */
+static int dexcr_prctl_onexec_test_child(unsigned long which, const char 
*status)
+{
+   unsigned long dexcr = mfspr(SPRN_DEXCR_RO);
+   unsigned long aspect = pr_which_to_aspect(which);
+   int ctrl = pr_get_dexcr(which);
+
+   if (!strcmp(status, "set")) {
+   FAIL_IF_EXIT_MSG(!(ctrl & PR_PPC_DEXCR_CTRL_SET),
+"setting aspect across exec not applied");
+
+   FAIL_IF_EXIT_MSG(!(ctrl & PR_PPC_DEXCR_CTRL_SET_ONEXEC),
+"setting aspect across exec not inherited");
+
+   FAIL_IF_EXIT_MSG(!(aspect & dexcr), "setting aspect across exec 
did not take effect");
+   } else if (!strcmp(status, "clear")) {
+   FAIL_IF_EXIT_MSG(!(ctrl & PR_PPC_DEXCR_CTRL_CLEAR),
+"clearing aspect across exec not applied");
+
+  

[PATCH v1 3/9] powerpc/dexcr: Reset DEXCR value across exec

2024-04-17 Thread Benjamin Gray
Inheriting the DEXCR across exec can have security and usability
concerns. If a program is compiled with hash instructions it generally
expects to run with NPHIE enabled. But if the parent process disables
NPHIE and is not careful, it will be disabled for any children too, and
the protection offered by hash checks is basically worthless.

This patch introduces a per-process reset value that new execs in a
particular process tree are initialized with. This enables fine grained
control over what DEXCR value child processes run with by default. For
example, containers running legacy binaries that expect hash
instructions to act as NOPs could configure the reset value of the
container root to control the default reset value for all members of the
container.

Signed-off-by: Benjamin Gray 

---

This differs from the previous iterations by making the reset value
totally independent of the process's current DEXCR value. The original
iterations stored an 'inherit' bitmask, where the mask decides which of
the current DEXCR aspects are kept across exec, and the others are reset
to a fixed default.

That approach has a flaw when trying to prevent unauthorized inherited
hash check disabling. With a hierarchy A -> B -> C of process execs,
suppose A is privileged and enables NPHIE as an inherited aspect. If B
is unprivileged but clears its NPHIE aspect, the clear is
unintentionally inherited down to C as well (which may be setuid).

This new approach lets processes control the reset value directly
without any reference to the values the process itself is using.
Compared to the original approach, we don't run into an issue where an
unprivileged middle child ever controls the reset value.
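
As an illustration (not part of this patch; it uses the prctl interface
added later in this series), a privileged container manager could force
NPHIE on for everything it subsequently execs, independent of its own
DEXCR value:

	/* processes exec'd from here on start with NPHIE set */
	prctl(PR_PPC_SET_DEXCR, PR_PPC_DEXCR_NPHIE,
	      PR_PPC_DEXCR_CTRL_SET_ONEXEC, 0, 0);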
---
 arch/powerpc/include/asm/processor.h |  2 +-
 arch/powerpc/kernel/Makefile |  1 +
 arch/powerpc/kernel/dexcr.c  | 21 +
 arch/powerpc/kernel/process.c|  7 +++
 4 files changed, 30 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/kernel/dexcr.c

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 882e31296ea6..aad85a24134a 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -261,7 +261,7 @@ struct thread_struct {
unsigned long   sier3;
unsigned long   hashkeyr;
unsigned long   dexcr;
-
+   unsigned long   dexcr_onexec;   /* Reset value to load on exec */
 #endif
 };
 
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index d3282fbea4f2..1d183b077948 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -87,6 +87,7 @@ obj-$(CONFIG_HAVE_HW_BREAKPOINT)  += hw_breakpoint.o
 obj-$(CONFIG_PPC_DAWR) += dawr.o
 obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_ppc970.o cpu_setup_pa6t.o
 obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_power.o
+obj-$(CONFIG_PPC_BOOK3S_64)+= dexcr.o
 obj-$(CONFIG_PPC_BOOK3S_64)+= mce.o mce_power.o
 obj-$(CONFIG_PPC_BOOK3E_64)+= exceptions-64e.o idle_64e.o
 obj-$(CONFIG_PPC_BARRIER_NOSPEC) += security.o
diff --git a/arch/powerpc/kernel/dexcr.c b/arch/powerpc/kernel/dexcr.c
new file mode 100644
index ..f65c359968cc
--- /dev/null
+++ b/arch/powerpc/kernel/dexcr.c
@@ -0,0 +1,21 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+static int __init init_task_dexcr(void)
+{
+   if (!early_cpu_has_feature(CPU_FTR_ARCH_31))
+   return 0;
+
+   current->thread.dexcr_onexec = mfspr(SPRN_DEXCR);
+
+   return 0;
+}
+early_initcall(init_task_dexcr)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index d482c3fd81d7..8ab779a3bdde 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1641,6 +1641,13 @@ void arch_setup_new_exec(void)
current->thread.regs->amr  = default_amr;
current->thread.regs->iamr  = default_iamr;
 #endif
+
+#ifdef CONFIG_PPC_BOOK3S_64
+   if (cpu_has_feature(CPU_FTR_ARCH_31)) {
+   current->thread.dexcr = current->thread.dexcr_onexec;
+   mtspr(SPRN_DEXCR, current->thread.dexcr);
+   }
+#endif /* CONFIG_PPC_BOOK3S_64 */
 }
 
 #ifdef CONFIG_PPC64
-- 
2.44.0



[PATCH v1 4/9] powerpc/dexcr: Add DEXCR prctl interface

2024-04-17 Thread Benjamin Gray
Now that we track a DEXCR on a per-task basis, individual tasks are free
to configure it as they like.

The interface is a pair of getter/setter prctls that work on a single
aspect at a time (multiple aspects at once is more difficult if there
are different rules applied for each aspect, now or in future). The
getter shows the current state of the process config, and the setter
allows setting/clearing the aspect.
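
A minimal userspace sketch (illustrative only, not part of this kernel
patch) of checking editability and then setting an aspect for the current
process:

	int ctrl = prctl(PR_PPC_GET_DEXCR, PR_PPC_DEXCR_IBRTPD, 0, 0, 0);

	if (ctrl >= 0 && (ctrl & PR_PPC_DEXCR_CTRL_EDITABLE))
		prctl(PR_PPC_SET_DEXCR, PR_PPC_DEXCR_IBRTPD,
		      PR_PPC_DEXCR_CTRL_SET, 0, 0);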

Signed-off-by: Benjamin Gray 

---

I'm intentionally trying to avoid saying 'enabling' or 'disabling' the
aspect, as that could be confusing when the aspects themselves may
'enable' or 'disable' their feature when the aspect is set.
---
 arch/powerpc/include/asm/processor.h | 10 +++
 arch/powerpc/kernel/dexcr.c  | 99 
 include/uapi/linux/prctl.h   | 16 +
 kernel/sys.c | 16 +
 4 files changed, 141 insertions(+)

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index aad85a24134a..e44cac0da346 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -334,6 +334,16 @@ extern int set_endian(struct task_struct *tsk, unsigned 
int val);
 extern int get_unalign_ctl(struct task_struct *tsk, unsigned long adr);
 extern int set_unalign_ctl(struct task_struct *tsk, unsigned int val);
 
+#ifdef CONFIG_PPC_BOOK3S_64
+
+#define PPC_GET_DEXCR_ASPECT(tsk, asp) get_dexcr_prctl((tsk), (asp))
+#define PPC_SET_DEXCR_ASPECT(tsk, asp, val) set_dexcr_prctl((tsk), (asp), 
(val))
+
+int get_dexcr_prctl(struct task_struct *tsk, unsigned long asp);
+int set_dexcr_prctl(struct task_struct *tsk, unsigned long asp, unsigned long 
val);
+
+#endif
+
 extern void load_fp_state(struct thread_fp_state *fp);
 extern void store_fp_state(struct thread_fp_state *fp);
 extern void load_vr_state(struct thread_vr_state *vr);
diff --git a/arch/powerpc/kernel/dexcr.c b/arch/powerpc/kernel/dexcr.c
index f65c359968cc..f50247b3daa1 100644
--- a/arch/powerpc/kernel/dexcr.c
+++ b/arch/powerpc/kernel/dexcr.c
@@ -19,3 +19,102 @@ static int __init init_task_dexcr(void)
return 0;
 }
 early_initcall(init_task_dexcr)
+
+/* Allow thread local configuration of these by default */
+#define DEXCR_PRCTL_EDITABLE ( \
+   DEXCR_PR_IBRTPD | \
+   DEXCR_PR_SRAPD | \
+   DEXCR_PR_NPHIE)
+
+static int prctl_to_aspect(unsigned long which, unsigned int *aspect)
+{
+   switch (which) {
+   case PR_PPC_DEXCR_SBHE:
+   *aspect = DEXCR_PR_SBHE;
+   break;
+   case PR_PPC_DEXCR_IBRTPD:
+   *aspect = DEXCR_PR_IBRTPD;
+   break;
+   case PR_PPC_DEXCR_SRAPD:
+   *aspect = DEXCR_PR_SRAPD;
+   break;
+   case PR_PPC_DEXCR_NPHIE:
+   *aspect = DEXCR_PR_NPHIE;
+   break;
+   default:
+   return -ENODEV;
+   }
+
+   return 0;
+}
+
+int get_dexcr_prctl(struct task_struct *task, unsigned long which)
+{
+   unsigned int aspect;
+   int ret;
+
+   ret = prctl_to_aspect(which, &aspect);
+   if (ret)
+   return ret;
+
+   if (aspect & DEXCR_PRCTL_EDITABLE)
+   ret |= PR_PPC_DEXCR_CTRL_EDITABLE;
+
+   if (aspect & mfspr(SPRN_DEXCR))
+   ret |= PR_PPC_DEXCR_CTRL_SET;
+   else
+   ret |= PR_PPC_DEXCR_CTRL_CLEAR;
+
+   if (aspect & task->thread.dexcr_onexec)
+   ret |= PR_PPC_DEXCR_CTRL_SET_ONEXEC;
+   else
+   ret |= PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC;
+
+   return ret;
+}
+
+int set_dexcr_prctl(struct task_struct *task, unsigned long which, unsigned 
long ctrl)
+{
+   unsigned long dexcr;
+   unsigned int aspect;
+   int err = 0;
+
+   err = prctl_to_aspect(which, &aspect);
+   if (err)
+   return err;
+
+   if (!(aspect & DEXCR_PRCTL_EDITABLE))
+   return -EPERM;
+
+   if (ctrl & ~PR_PPC_DEXCR_CTRL_MASK)
+   return -EINVAL;
+
+   if (ctrl & PR_PPC_DEXCR_CTRL_SET && ctrl & PR_PPC_DEXCR_CTRL_CLEAR)
+   return -EINVAL;
+
+   if (ctrl & PR_PPC_DEXCR_CTRL_SET_ONEXEC && ctrl & 
PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC)
+   return -EINVAL;
+
+   /*
+* We do not want an unprivileged process being able to disable
+* a setuid process's hash check instructions
+*/
+   if (aspect == DEXCR_PR_NPHIE && ctrl & PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC 
&& !capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   dexcr = mfspr(SPRN_DEXCR);
+
+   if (ctrl & PR_PPC_DEXCR_CTRL_SET)
+   dexcr |= aspect;
+   else if (ctrl & PR_PPC_DEXCR_CTRL_CLEAR)
+   dexcr &= ~aspect;
+
+   if (ctrl & PR_PPC_DEXCR_CTRL_SET_ONEXEC)
+   task->thread.dexcr_onexec |= aspect;
+   else if (ctrl & PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC)
+   task->

[PATCH v1 1/9] selftests/powerpc/dexcr: Add -no-pie to hashchk tests

2024-04-17 Thread Benjamin Gray
The hashchk tests want to verify that the hash key is changed over exec.
It does so by calculating hashes at the same address across an exec.
This is made simpler by disabling PIE functionality, so we can
re-execute ourselves and be using the same addresses in the child.

While -fno-pie is already passed to the compiler, the -no-pie link-time
flag is also required to produce a non-PIE executable.

Fixes: ca64da7574f8 ("selftests/powerpc/dexcr: Add hashst/hashchk test")
Signed-off-by: Benjamin Gray 

---

This is not related to features introduced in this series, just fixes
the test added in the static DEXCR series.
---
 tools/testing/selftests/powerpc/dexcr/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/powerpc/dexcr/Makefile 
b/tools/testing/selftests/powerpc/dexcr/Makefile
index 76210f2bcec3..829ad075b4a4 100644
--- a/tools/testing/selftests/powerpc/dexcr/Makefile
+++ b/tools/testing/selftests/powerpc/dexcr/Makefile
@@ -3,7 +3,7 @@ TEST_GEN_FILES := lsdexcr
 
 include ../../lib.mk
 
-$(OUTPUT)/hashchk_test: CFLAGS += -fno-pie $(call cc-option,-mno-rop-protect)
+$(OUTPUT)/hashchk_test: CFLAGS += -fno-pie -no-pie $(call 
cc-option,-mno-rop-protect)
 
 $(TEST_GEN_PROGS): ../harness.c ../utils.c ./dexcr.c
 $(TEST_GEN_FILES): ../utils.c ./dexcr.c
-- 
2.44.0



[PATCH v1 2/9] powerpc/dexcr: Track the DEXCR per-process

2024-04-17 Thread Benjamin Gray
Add capability to make the DEXCR act as a per-process SPR.

We do not yet have an interface for changing the values per task. We
also expect the kernel to use a single DEXCR value across all tasks
while in privileged state, so there is no need to synchronize after
changing it (the userspace aspects will synchronize upon returning to
userspace).

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/include/asm/processor.h |  1 +
 arch/powerpc/kernel/process.c| 10 ++
 arch/powerpc/kernel/ptrace/ptrace-view.c |  7 +--
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index b2c51d337e60..882e31296ea6 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -260,6 +260,7 @@ struct thread_struct {
unsigned long   sier2;
unsigned long   sier3;
unsigned long   hashkeyr;
+   unsigned long   dexcr;
 
 #endif
 };
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 9452a54d356c..d482c3fd81d7 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1185,6 +1185,9 @@ static inline void save_sprs(struct thread_struct *t)
 
if (cpu_has_feature(CPU_FTR_DEXCR_NPHIE))
t->hashkeyr = mfspr(SPRN_HASHKEYR);
+
+   if (cpu_has_feature(CPU_FTR_ARCH_31))
+   t->dexcr = mfspr(SPRN_DEXCR);
 #endif
 }
 
@@ -1267,6 +1270,10 @@ static inline void restore_sprs(struct thread_struct 
*old_thread,
if (cpu_has_feature(CPU_FTR_DEXCR_NPHIE) &&
old_thread->hashkeyr != new_thread->hashkeyr)
mtspr(SPRN_HASHKEYR, new_thread->hashkeyr);
+
+   if (cpu_has_feature(CPU_FTR_ARCH_31) &&
+   old_thread->dexcr != new_thread->dexcr)
+   mtspr(SPRN_DEXCR, new_thread->dexcr);
 #endif
 
 }
@@ -1878,6 +1885,9 @@ int copy_thread(struct task_struct *p, const struct 
kernel_clone_args *args)
 #ifdef CONFIG_PPC_BOOK3S_64
if (cpu_has_feature(CPU_FTR_DEXCR_NPHIE))
p->thread.hashkeyr = current->thread.hashkeyr;
+
+   if (cpu_has_feature(CPU_FTR_ARCH_31))
+   p->thread.dexcr = mfspr(SPRN_DEXCR);
 #endif
return 0;
 }
diff --git a/arch/powerpc/kernel/ptrace/ptrace-view.c 
b/arch/powerpc/kernel/ptrace/ptrace-view.c
index 584cf5c3df50..c1819e0a6684 100644
--- a/arch/powerpc/kernel/ptrace/ptrace-view.c
+++ b/arch/powerpc/kernel/ptrace/ptrace-view.c
@@ -469,12 +469,7 @@ static int dexcr_get(struct task_struct *target, const 
struct user_regset *regse
if (!cpu_has_feature(CPU_FTR_ARCH_31))
return -ENODEV;
 
-   /*
-* The DEXCR is currently static across all CPUs, so we don't
-* store the target's value anywhere, but the static value
-* will also be correct.
-*/
-   membuf_store(&to, (u64)lower_32_bits(DEXCR_INIT));
+   membuf_store(&to, (u64)lower_32_bits(target->thread.dexcr));
 
/*
 * Technically the HDEXCR is per-cpu, but a hypervisor can't reasonably
-- 
2.44.0



[PATCH v1 0/9] Add dynamic DEXCR support

2024-04-17 Thread Benjamin Gray
Adds support for a process to change its DEXCR value. The implementation is
somewhat conservative; SBHE (speculative branch hint enable) is not exposed
as an editable aspect because its effects can bleed over to other threads.

As explained in the third patch, this series changes the reset/inherit
behaviour on exec. Previously there was a bitmask that tracked which aspects
to copy from the current state vs resetting to a fixed default. This
allows unprivileged processes to disable ROP protection for setuid binaries
though, and is generally a weird interface to work with. The actual intent
(and new implementation) tracks the exec value as an independent value that
doesn't use the parent's DEXCR at all. The parent can control this reset value
separately to its own DEXCR value.

The other interesting part is the prctl interface. I've made the _SET, _CLEAR,
_SET_ONEXEC, and _CLEAR_ONEXEC controls each a separate flag. This makes it
easier to re-use with the getter prctl, as opposed to making set/clear a boolean
value with a separate flag for if it's 'on-exec'. With separate flags you can
return both the current and on-exec state in the getter in the same way you'd
prepare it for the setter.
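
Concretely (illustrative values only), the same flag bits flow in both
directions:

	/* getter: may return e.g. PR_PPC_DEXCR_CTRL_EDITABLE |
	 *         PR_PPC_DEXCR_CTRL_SET | PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC,
	 * i.e. currently set, cleared for future execs, editable via prctl
	 */
	int ctrl = prctl(PR_PPC_GET_DEXCR, PR_PPC_DEXCR_NPHIE, 0, 0, 0);

	/* setter: set now, and also keep it set across exec */
	prctl(PR_PPC_SET_DEXCR, PR_PPC_DEXCR_NPHIE,
	      PR_PPC_DEXCR_CTRL_SET | PR_PPC_DEXCR_CTRL_SET_ONEXEC, 0, 0);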

There are still more features that can be added. A global switch to disable
ROP protection could be useful as an option to prevent performance degradation.
Also a prctl to randomise the hash key could be useful for when userspace knows
a fork is not going to run any parent hashes. These features could be added in
a future series (or the next version of this one :) ) though.

Benjamin Gray (9):
  selftests/powerpc/dexcr: Add -no-pie to hashchk tests
  powerpc/dexcr: Track the DEXCR per-process
  powerpc/dexcr: Reset DEXCR value across exec
  powerpc/dexcr: Add DEXCR prctl interface
  selftests/powerpc/dexcr: Add DEXCR prctl interface test
  selftests/powerpc/dexcr: Attempt to enable NPHIE in hashchk selftest
  selftests/powerpc/dexcr: Add DEXCR config details to lsdexcr
  selftests/powerpc/dexcr: Add chdexcr utility
  Documentation: Document PowerPC kernel dynamic DEXCR interface

 Documentation/arch/powerpc/dexcr.rst  | 141 +++-
 arch/powerpc/include/asm/processor.h  |  13 +-
 arch/powerpc/kernel/Makefile  |   1 +
 arch/powerpc/kernel/dexcr.c   | 120 ++
 arch/powerpc/kernel/process.c |  17 ++
 arch/powerpc/kernel/ptrace/ptrace-view.c  |   7 +-
 include/uapi/linux/prctl.h|  16 ++
 kernel/sys.c  |  16 ++
 .../selftests/powerpc/dexcr/.gitignore|   2 +
 .../testing/selftests/powerpc/dexcr/Makefile  |   8 +-
 .../testing/selftests/powerpc/dexcr/chdexcr.c | 110 +
 tools/testing/selftests/powerpc/dexcr/dexcr.c |  40 
 tools/testing/selftests/powerpc/dexcr/dexcr.h |  57 +
 .../selftests/powerpc/dexcr/dexcr_test.c  | 213 ++
 .../selftests/powerpc/dexcr/hashchk_test.c|   8 +-
 .../testing/selftests/powerpc/dexcr/lsdexcr.c | 103 ++---
 16 files changed, 823 insertions(+), 49 deletions(-)
 create mode 100644 arch/powerpc/kernel/dexcr.c
 create mode 100644 tools/testing/selftests/powerpc/dexcr/chdexcr.c
 create mode 100644 tools/testing/selftests/powerpc/dexcr/dexcr_test.c

--
2.44.0



[PATCH v1] powerpc: Error on assembly warnings

2024-03-25 Thread Benjamin Gray
We currently enable -Werror on the arch/powerpc subtree. However this
only catches C warnings. Assembly warnings are logged, but the make
invocation will still succeed. This can allow incorrect syntax such as

  ori r3, r4, r5

to be compiled without catching that the assembler is treating r5
as the immediate value 5.

To prevent this in assembly files and inline assembly, add the
-fatal-warnings option to assembler invocations.
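
For instance (illustrative), with -Wa,-fatal-warnings the mistaken form is
rejected at build time instead of silently assembling with an immediate:

	ori	r3, r4, r5	/* warning: r5 is taken as the literal value 5 */
	or	r3, r4, r5	/* register-register OR, the likely intent */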

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/Kbuild | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/Kbuild b/arch/powerpc/Kbuild
index 22cd0d55a892..da862e9558bc 100644
--- a/arch/powerpc/Kbuild
+++ b/arch/powerpc/Kbuild
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
-subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
+subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror -Wa,-fatal-warnings
+subdir-asflags-$(CONFIG_PPC_WERROR) := -Wa,-fatal-warnings
 
 obj-y += kernel/
 obj-y += mm/
-- 
2.44.0



[PATCH v2 3/3] powerpc/code-patching: Restore 32-bit patching performance

2024-03-25 Thread Benjamin Gray
The new open/close abstraction makes it more difficult for a
compiler to optimise. This causes 10% worse performance on
ppc32 as in [1]. Restoring the page alignment mask and inlining
the helpers allows the compiler to better reason about the address
alignment, allowing more optimised cache flushing selection.

[1]: 
https://lore.kernel.org/all/77fdcdeb-4af5-4ad0-a4c6-57bf0762d...@csgroup.eu/

Suggested-by: Christophe Leroy 
Signed-off-by: Benjamin Gray 

---

v2: * New in v2

I think Suggested-by is an appropriate tag. The patch is Christophe's
from the link; I just added the commit description, so it could well
be better to change the author to Christophe completely.
---
 arch/powerpc/lib/code-patching.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index b3a644290369..d089da115987 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -282,13 +282,13 @@ struct patch_window {
  * Interrupts must be disabled for the entire duration of the patching. The 
PIDR
  * is potentially changed during this time.
  */
-static int open_patch_window(void *addr, struct patch_window *ctx)
+static __always_inline int open_patch_window(void *addr, struct patch_window 
*ctx)
 {
unsigned long pfn = get_patch_pfn(addr);
 
lockdep_assert_irqs_disabled();
 
-   ctx->text_poke_addr = (unsigned 
long)__this_cpu_read(cpu_patching_context.addr);
+   ctx->text_poke_addr = (unsigned 
long)__this_cpu_read(cpu_patching_context.addr) & PAGE_MASK;
 
if (!mm_patch_enabled()) {
ctx->ptep = __this_cpu_read(cpu_patching_context.pte);
@@ -331,7 +331,7 @@ static int open_patch_window(void *addr, struct 
patch_window *ctx)
return 0;
 }
 
-static void close_patch_window(struct patch_window *ctx)
+static __always_inline void close_patch_window(struct patch_window *ctx)
 {
lockdep_assert_irqs_disabled();
 
-- 
2.44.0



[PATCH v2 1/3] powerpc/code-patching: Introduce open_patch_window()/close_patch_window()

2024-03-25 Thread Benjamin Gray
The code patching capabilities have grown, and the specific quirks for
setting up and tearing down the patching window are getting duplicated.

This commit introduces an abstraction for working with this patching
window. It defines open_patch_window() to set up the writable alias
page, and close_patch_window() to remove it and flush the TLB. The
actual mechanism for providing this patching window is an implementation
detail that consumers don't need to worry about. Consumers are still
responsible for flushing/syncing any changes they make through this
alias page though.
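
A minimal sketch of the intended calling pattern (consumer side, assuming
the signatures introduced below):

	struct patch_window ctx;
	int err;

	err = open_patch_window(addr, &ctx);
	if (err)
		return err;

	/* write via ctx.text_poke_addr + offset_in_page(addr) ... */
	/* ... then flush/sync those writes as required */

	close_patch_window(&ctx);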

Signed-off-by: Benjamin Gray 

---

v1: https://lore.kernel.org/all/20240315025937.407590-1-bg...@linux.ibm.com/

This design can be readily extended to remap the writable page to
another physical page without incurring all of the entry and exit
overhead. But that might have problems with spending too long in
an interrupts disabled context, so I've left it out for now.
---
 arch/powerpc/lib/code-patching.c | 113 +++
 1 file changed, 113 insertions(+)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index df64343b9214..7c193e19e297 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -276,6 +276,119 @@ static void unmap_patch_area(unsigned long addr)
flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
 }
 
+/*
+ * Represents an active patching window.
+ */
+struct patch_window {
+   /*
+* Page aligned patching window address. The page is a writable alias of
+* the configured target page.
+*/
+   unsigned long text_poke_addr;
+   /*
+* Pointer to the PTE for the patching window page.
+*/
+   pte_t *ptep;
+   /*
+* (Temporary MM patching only) The original MM before creating the
+* patching window.
+*/
+   struct mm_struct *orig_mm;
+   /*
+* (Temporary MM patching only) The patching window MM.
+*/
+   struct mm_struct *patch_mm;
+   /*
+* (Temporary MM patching only) Lock for the patching window PTE.
+*/
+   spinlock_t *ptl;
+};
+
+/*
+ * Calling this function opens a 'patching window' that maps a
+ * page starting at ctx->text_poke_addr (which is set by this function)
+ * as a writable alias to the page starting at addr.
+ *
+ * Upon success, callers must invoke the corresponding close_patch_window()
+ * function to return to the original state. Callers are also responsible
+ * for syncing any changes they make inside the window.
+ *
+ * Interrupts must be disabled for the entire duration of the patching. The PIDR
+ * is potentially changed during this time.
+ */
+static int open_patch_window(void *addr, struct patch_window *ctx)
+{
+   unsigned long pfn = get_patch_pfn(addr);
+
+   lockdep_assert_irqs_disabled();
+
+   ctx->text_poke_addr = (unsigned long)__this_cpu_read(cpu_patching_context.addr);
+
+   if (!mm_patch_enabled()) {
+   ctx->ptep = __this_cpu_read(cpu_patching_context.pte);
+   __set_pte_at(&init_mm, ctx->text_poke_addr,
+ctx->ptep, pfn_pte(pfn, PAGE_KERNEL), 0);
+
+   /* See ptesync comment in radix__set_pte_at() */
+   if (radix_enabled())
+   asm volatile("ptesync" ::: "memory");
+
+   return 0;
+   }
+
+   ctx->orig_mm = current->active_mm;
+   ctx->patch_mm = __this_cpu_read(cpu_patching_context.mm);
+
+   ctx->ptep = get_locked_pte(ctx->patch_mm, ctx->text_poke_addr, &ctx->ptl);
+   if (!ctx->ptep)
+   return -ENOMEM;
+
+   __set_pte_at(ctx->patch_mm, ctx->text_poke_addr,
+ctx->ptep, pfn_pte(pfn, PAGE_KERNEL), 0);
+
+   /* order PTE update before use, also serves as the hwsync */
+   asm volatile("ptesync" ::: "memory");
+
+   /*
+* Changing mm requires context synchronising instructions on both sides of
+* the context switch, as well as a hwsync between the last instruction for
+* which the address of an associated storage access was translated using
+* the current context. switch_mm_irqs_off() performs an isync after the
+* context switch.
+*/
+   isync();
+   switch_mm_irqs_off(ctx->orig_mm, ctx->patch_mm, current);
+
+   WARN_ON(!mm_is_thread_local(ctx->patch_mm));
+
+   suspend_breakpoints();
+   return 0;
+}
+
+static void close_patch_window(struct patch_window *ctx)
+{
+   lockdep_assert_irqs_disabled();
+
+   if (!mm_patch_enabled()) {
+   pte_clear(&init_mm, ctx->text_poke_addr, ctx->ptep);
+   flush_tlb_kernel_range(ctx->text_poke_addr, ctx->text_poke_addr + PAGE_SIZE);
+   return;
+   }
+
+   isync();
+   switch_mm_irqs_off(ctx->patch_mm, ctx->

[PATCH v2 2/3] powerpc/code-patching: Convert to open_patch_window()/close_patch_window()

2024-03-25 Thread Benjamin Gray
The existing patching alias page setup and teardown sections can be
simplified to make use of the new open_patch_window() abstraction.

This eliminates the _mm variants of the helpers, consumers no longer
need to check mm_patch_enabled(), and consumers no longer need to worry
about synchronization and flushing beyond the changes they make in the
patching window.

Signed-off-by: Benjamin Gray 

---

v2: * Removed an outdated comment about syncing
---
 arch/powerpc/lib/code-patching.c | 179 +++
 1 file changed, 15 insertions(+), 164 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 7c193e19e297..b3a644290369 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -66,40 +66,6 @@ static bool mm_patch_enabled(void)
return IS_ENABLED(CONFIG_SMP) && radix_enabled();
 }
 
-/*
- * The following applies for Radix MMU. Hash MMU has different requirements,
- * and so is not supported.
- *
- * Changing mm requires context synchronising instructions on both sides of
- * the context switch, as well as a hwsync between the last instruction for
- * which the address of an associated storage access was translated using
- * the current context.
- *
- * switch_mm_irqs_off() performs an isync after the context switch. It is
- * the responsibility of the caller to perform the CSI and hwsync before
- * starting/stopping the temp mm.
- */
-static struct mm_struct *start_using_temp_mm(struct mm_struct *temp_mm)
-{
-   struct mm_struct *orig_mm = current->active_mm;
-
-   lockdep_assert_irqs_disabled();
-   switch_mm_irqs_off(orig_mm, temp_mm, current);
-
-   WARN_ON(!mm_is_thread_local(temp_mm));
-
-   suspend_breakpoints();
-   return orig_mm;
-}
-
-static void stop_using_temp_mm(struct mm_struct *temp_mm,
-  struct mm_struct *orig_mm)
-{
-   lockdep_assert_irqs_disabled();
-   switch_mm_irqs_off(temp_mm, orig_mm, current);
-   restore_breakpoints();
-}
-
 static int text_area_cpu_up(unsigned int cpu)
 {
struct vm_struct *area;
@@ -389,73 +355,20 @@ static void close_patch_window(struct patch_window *ctx)
pte_unmap_unlock(ctx->ptep, ctx->ptl);
 }
 
-static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
-{
-   int err;
-   u32 *patch_addr;
-   unsigned long text_poke_addr;
-   pte_t *pte;
-   unsigned long pfn = get_patch_pfn(addr);
-   struct mm_struct *patching_mm;
-   struct mm_struct *orig_mm;
-   spinlock_t *ptl;
-
-   patching_mm = __this_cpu_read(cpu_patching_context.mm);
-   text_poke_addr = __this_cpu_read(cpu_patching_context.addr);
-   patch_addr = (u32 *)(text_poke_addr + offset_in_page(addr));
-
-   pte = get_locked_pte(patching_mm, text_poke_addr, &ptl);
-   if (!pte)
-   return -ENOMEM;
-
-   __set_pte_at(patching_mm, text_poke_addr, pte, pfn_pte(pfn, 
PAGE_KERNEL), 0);
-
-   /* order PTE update before use, also serves as the hwsync */
-   asm volatile("ptesync": : :"memory");
-
-   /* order context switch after arbitrary prior code */
-   isync();
-
-   orig_mm = start_using_temp_mm(patching_mm);
-
-   err = __patch_instruction(addr, instr, patch_addr);
-
-   /* context synchronisation performed by __patch_instruction (isync or 
exception) */
-   stop_using_temp_mm(patching_mm, orig_mm);
-
-   pte_clear(patching_mm, text_poke_addr, pte);
-   /*
-* ptesync to order PTE update before TLB invalidation done
-* by radix__local_flush_tlb_page_psize (in _tlbiel_va)
-*/
-   local_flush_tlb_page_psize(patching_mm, text_poke_addr, 
mmu_virtual_psize);
-
-   pte_unmap_unlock(pte, ptl);
-
-   return err;
-}
-
 static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
 {
-   int err;
+   struct patch_window ctx;
u32 *patch_addr;
-   unsigned long text_poke_addr;
-   pte_t *pte;
-   unsigned long pfn = get_patch_pfn(addr);
-
-   text_poke_addr = (unsigned 
long)__this_cpu_read(cpu_patching_context.addr) & PAGE_MASK;
-   patch_addr = (u32 *)(text_poke_addr + offset_in_page(addr));
+   int err;
 
-   pte = __this_cpu_read(cpu_patching_context.pte);
-   __set_pte_at(&init_mm, text_poke_addr, pte, pfn_pte(pfn, PAGE_KERNEL), 0);
-   /* See ptesync comment in radix__set_pte_at() */
-   if (radix_enabled())
-   asm volatile("ptesync": : :"memory");
+   err = open_patch_window(addr, &ctx);
+   if (err)
+   return err;
 
+   patch_addr = (u32 *)(ctx.text_poke_addr + offset_in_page(addr));
err = __patch_instruction(addr, instr, patch_addr);
 
-   pte_clear(&init_mm, text_poke_addr, pte);
-   flush_tlb_kernel_range(text_poke_addr, text_poke_addr + PAGE_SIZE);
+   close_patch_window(&ctx);
 
return err;
 }
@@ -475,1

Re: [PATCH v1 2/2] powerpc64/dexcr: Enable PHIE on all CPUs

2024-03-24 Thread Benjamin Gray
OK, so I compile for corenet64 but not corenet32 apparently. I'll fix
the shift overflow in the next round.


[PATCH v3 5/5] powerpc/code-patching: Add boot selftest for data patching

2024-03-24 Thread Benjamin Gray
Extend the code patching selftests with some basic coverage of the new
data patching variants too.

Signed-off-by: Benjamin Gray 

---

v3: * New in v3
---
 arch/powerpc/lib/test-code-patching.c | 36 +++
 1 file changed, 36 insertions(+)

diff --git a/arch/powerpc/lib/test-code-patching.c 
b/arch/powerpc/lib/test-code-patching.c
index c44823292f73..e96c48fcd4db 100644
--- a/arch/powerpc/lib/test-code-patching.c
+++ b/arch/powerpc/lib/test-code-patching.c
@@ -347,6 +347,41 @@ static void __init test_prefixed_patching(void)
check(!memcmp(iptr, expected, sizeof(expected)));
 }
 
+static void __init test_data_patching(void)
+{
+   void *buf;
+   u32 *addr32;
+
+   buf = vzalloc(PAGE_SIZE);
+   check(buf);
+   if (!buf)
+   return;
+
+   addr32 = buf + 128;
+
+   addr32[1] = 0xA0A1A2A3;
+   addr32[2] = 0xB0B1B2B3;
+
+   patch_uint(&addr32[1], 0xC0C1C2C3);
+
+   check(addr32[0] == 0);
+   check(addr32[1] == 0xC0C1C2C3);
+   check(addr32[2] == 0xB0B1B2B3);
+   check(addr32[3] == 0);
+
+   patch_ulong(&addr32[1], 0xD0D1D2D3);
+
+   check(addr32[0] == 0);
+   *(unsigned long *)(&addr32[1]) = 0xD0D1D2D3;
+
+   if (!IS_ENABLED(CONFIG_PPC64))
+   check(addr32[2] == 0xB0B1B2B3);
+
+   check(addr32[3] == 0);
+
+   vfree(buf);
+}
+
 static int __init test_code_patching(void)
 {
pr_info("Running code patching self-tests ...\n");
@@ -356,6 +391,7 @@ static int __init test_code_patching(void)
test_create_function_call();
test_translate_branch();
test_prefixed_patching();
+   test_data_patching();
 
return 0;
 }
-- 
2.44.0



[PATCH v3 1/5] powerpc/code-patching: Add generic memory patching

2024-03-24 Thread Benjamin Gray
patch_instruction() is designed for patching instructions in otherwise
readonly memory. Other consumers also sometimes need to patch readonly
memory, so have abused patch_instruction() for arbitrary data patches.

This is a problem on ppc64 as patch_instruction() decides on the patch
width using the 'instruction' opcode to see if it's a prefixed
instruction. Data that triggers this can lead to larger writes, possibly
crossing a page boundary and failing the write altogether.

Introduce patch_uint(), and patch_ulong(), with aliases patch_u32(), and
patch_u64() (on ppc64) designed for aligned data patches. The patch
size is now determined by the called function, and is passed as an
additional parameter to generic internals.

While the instruction flushing is not required for data patches, it
remains unconditional in this patch. A followup series is possible if
benchmarking shows fewer flushes gives an improvement in some
data-patching workload.

ppc32 does not support prefixed instructions, so is unaffected by the
original issue. Care is taken in not exposing the size parameter in the
public (non-static) interface, so the compiler can const-propagate it
away.
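
As a rough usage sketch (illustrative only; the structure and names below
are invented, not part of this patch), the intent is:

struct example_ops {
	unsigned int flags;	/* word-sized field */
	void *handler;		/* pointer-sized field */
};

static struct example_ops example_table __ro_after_init;

static int example_update(void *new_handler)
{
	int err;

	/* always a 4-byte write, never widened to 8 bytes */
	err = patch_u32(&example_table.flags, 0x1);
	if (err)
		return err;

	/* pointer-sized write: 8 bytes on ppc64, 4 bytes on ppc32 */
	return patch_ulong(&example_table.handler, (unsigned long)new_handler);
}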

Signed-off-by: Benjamin Gray 

---

v3: * Rename from *_memory to *_mem
* Change type of ppc32 patch_uint() address to void*
* Explain introduction of val32 for big endian
* Some formatting

v2: * Deduplicate patch_32() definition
* Use u32 for val32
* Remove noinline
---
 arch/powerpc/include/asm/code-patching.h | 31 
 arch/powerpc/lib/code-patching.c | 64 ++--
 2 files changed, 80 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/code-patching.h 
b/arch/powerpc/include/asm/code-patching.h
index 0e29ccf903d0..21a36e2c4e26 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -76,6 +76,37 @@ int patch_instruction(u32 *addr, ppc_inst_t instr);
 int raw_patch_instruction(u32 *addr, ppc_inst_t instr);
 int patch_instructions(u32 *addr, u32 *code, size_t len, bool repeat_instr);
 
+/*
+ * The data patching functions patch_uint() and patch_ulong(), etc., must be
+ * called on aligned addresses.
+ *
+ * The instruction patching functions patch_instruction() and similar must be
+ * called on addresses satisfying instruction alignment requirements.
+ */
+
+#ifdef CONFIG_PPC64
+
+int patch_uint(void *addr, unsigned int val);
+int patch_ulong(void *addr, unsigned long val);
+
+#define patch_u64 patch_ulong
+
+#else
+
+static inline int patch_uint(void *addr, unsigned int val)
+{
+   return patch_instruction(addr, ppc_inst(val));
+}
+
+static inline int patch_ulong(void *addr, unsigned long val)
+{
+   return patch_instruction(addr, ppc_inst(val));
+}
+
+#endif
+
+#define patch_u32 patch_uint
+
 static inline unsigned long patch_site_addr(s32 *site)
 {
return (unsigned long)site + *site;
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index c6ab46156cda..a31e313c6321 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -20,15 +20,14 @@
 #include 
 #include 
 
-static int __patch_instruction(u32 *exec_addr, ppc_inst_t instr, u32 *patch_addr)
+static int __patch_mem(void *exec_addr, unsigned long val, void *patch_addr, bool is_dword)
 {
-   if (!ppc_inst_prefixed(instr)) {
-   u32 val = ppc_inst_val(instr);
+   if (!IS_ENABLED(CONFIG_PPC64) || likely(!is_dword)) {
+   /* For big endian correctness: plain address would use the wrong half */
+   u32 val32 = val;
 
-   __put_kernel_nofault(patch_addr, &val, u32, failed);
+   __put_kernel_nofault(patch_addr, &val32, u32, failed);
} else {
-   u64 val = ppc_inst_as_ulong(instr);
-
__put_kernel_nofault(patch_addr, &val, u64, failed);
}
 
@@ -44,7 +43,10 @@ static int __patch_instruction(u32 *exec_addr, ppc_inst_t 
instr, u32 *patch_addr
 
 int raw_patch_instruction(u32 *addr, ppc_inst_t instr)
 {
-   return __patch_instruction(addr, instr, addr);
+   if (ppc_inst_prefixed(instr))
+   return __patch_mem(addr, ppc_inst_as_ulong(instr), addr, true);
+   else
+   return __patch_mem(addr, ppc_inst_val(instr), addr, false);
 }
 
 struct patch_context {
@@ -276,7 +278,7 @@ static void unmap_patch_area(unsigned long addr)
flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
 }
 
-static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
+static int __do_patch_mem_mm(void *addr, unsigned long val, bool is_dword)
 {
int err;
u32 *patch_addr;
@@ -305,7 +307,7 @@ static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t 
instr)
 
orig_mm = start_using_temp_mm(patching_mm);
 
-   err = __patch_instruction(addr, instr, patch_addr);
+   err = __patch_mem(addr, val, patch_addr, is_dword);
 
/* context synchronisation performed

[PATCH v3 4/5] powerpc/32: Convert patch_instruction() to patch_uint()

2024-03-24 Thread Benjamin Gray
These changes are for patch_instruction() uses on data. Unlike ppc64
these should not be incorrect as-is, but using the patch_uint() alias
better reflects what kind of data is being patched and allows for
benchmarking the effect of different patch_* implementations (e.g.,
skipping instruction flushing when patching data).

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/kernel/static_call.c | 2 +-
 arch/powerpc/platforms/powermac/smp.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/static_call.c 
b/arch/powerpc/kernel/static_call.c
index 863a7aa24650..1502b7e439ca 100644
--- a/arch/powerpc/kernel/static_call.c
+++ b/arch/powerpc/kernel/static_call.c
@@ -17,7 +17,7 @@ void arch_static_call_transform(void *site, void *tramp, void *func, bool tail)
mutex_lock(&text_mutex);
 
if (func && !is_short) {
-   err = patch_instruction(tramp + PPC_SCT_DATA, ppc_inst(target));
+   err = patch_ulong(tramp + PPC_SCT_DATA, target);
if (err)
goto out;
}
diff --git a/arch/powerpc/platforms/powermac/smp.c 
b/arch/powerpc/platforms/powermac/smp.c
index 15644be31990..d21b681f52fb 100644
--- a/arch/powerpc/platforms/powermac/smp.c
+++ b/arch/powerpc/platforms/powermac/smp.c
@@ -827,7 +827,7 @@ static int smp_core99_kick_cpu(int nr)
mdelay(1);
 
/* Restore our exception vector */
-   patch_instruction(vector, ppc_inst(save_vector));
+   patch_uint(vector, save_vector);
 
local_irq_restore(flags);
if (ppc_md.progress) ppc_md.progress("smp_core99_kick_cpu done", 0x347);
-- 
2.44.0



[PATCH v3 0/5] Add generic data patching functions

2024-03-24 Thread Benjamin Gray
Currently patch_instruction() bases the write length on the value being
written. If the value looks like a prefixed instruction it writes 8 bytes,
otherwise it writes 4 bytes. This makes it potentially buggy to use for
writing arbitrary data, as if you want to write 4 bytes but it decides to
write 8 bytes it may clobber the following memory or be unaligned and
trigger an oops if it tries to cross a page boundary.

To solve this, this series pulls out the size parameter to the 'top' of
the memory patching logic, and propagates it through the various functions.

The two sizes supported are int and long; this allows for patching
instructions and pointers on both ppc32 and ppc64. On ppc32 these are the
same size, so care is taken to only use the size parameter on static
functions, so the compiler can optimise it out entirely. Unfortunately
GCC trips over its own feet here and won't optimise in a way that is
optimal for strict RWX (mpc85xx_smp_defconfig) and no RWX
(pmac32_defconfig). More details in the v2 cover letter.

Changes from v2:
  * Various changes noted on each patch
  * Data patching now enforced to be aligned
  * Restore page aligned flushing optimisation

Changes from v1:
  * Addressed the v1 review actions
  * Removed noinline (for now)

v2: 
https://patchwork.ozlabs.org/project/linuxppc-dev/cover/20231016050147.115686-1-bg...@linux.ibm.com/
v1: 
https://patchwork.ozlabs.org/project/linuxppc-dev/cover/20230207015643.590684-1-bg...@linux.ibm.com/

Benjamin Gray (5):
  powerpc/code-patching: Add generic memory patching
  powerpc/code-patching: Add data patch alignment check
  powerpc/64: Convert patch_instruction() to patch_u32()
  powerpc/32: Convert patch_instruction() to patch_uint()
  powerpc/code-patching: Add boot selftest for data patching

 arch/powerpc/include/asm/code-patching.h | 37 +
 arch/powerpc/kernel/module_64.c  |  5 +-
 arch/powerpc/kernel/static_call.c|  2 +-
 arch/powerpc/lib/code-patching.c | 70 +++-
 arch/powerpc/lib/test-code-patching.c| 36 
 arch/powerpc/platforms/powermac/smp.c|  2 +-
 6 files changed, 132 insertions(+), 20 deletions(-)

--
2.44.0



[PATCH v3 2/5] powerpc/code-patching: Add data patch alignment check

2024-03-24 Thread Benjamin Gray
The new data patching still needs to be aligned within a
cacheline for the flushes to work correctly. To simplify this
requirement, we just require that data patches be naturally
aligned (i.e. aligned to the size of the value being written).

Detect when data patching is not aligned, returning an invalid
argument error.
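
For example (hypothetical buffer, illustration only), a misaligned data
patch is now rejected up front instead of risking a flush on the wrong
cacheline:

static int example_misaligned_patch(void *buf)
{
	u32 *misaligned = buf + 2;	/* not 4-byte aligned */

	/* returns -EINVAL without attempting the write */
	return patch_uint(misaligned, 0xC0C1C2C3);
}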

Signed-off-by: Benjamin Gray 

---

v3: * New in v3
---
 arch/powerpc/include/asm/code-patching.h | 6 ++
 arch/powerpc/lib/code-patching.c | 6 ++
 2 files changed, 12 insertions(+)

diff --git a/arch/powerpc/include/asm/code-patching.h 
b/arch/powerpc/include/asm/code-patching.h
index 21a36e2c4e26..e7f14720f630 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -95,11 +95,17 @@ int patch_ulong(void *addr, unsigned long val);
 
 static inline int patch_uint(void *addr, unsigned int val)
 {
+   if (!IS_ALIGNED((unsigned long)addr, sizeof(unsigned int)))
+   return -EINVAL;
+
return patch_instruction(addr, ppc_inst(val));
 }
 
 static inline int patch_ulong(void *addr, unsigned long val)
 {
+   if (!IS_ALIGNED((unsigned long)addr, sizeof(unsigned long)))
+   return -EINVAL;
+
return patch_instruction(addr, ppc_inst(val));
 }
 
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index a31e313c6321..35c620dbdfd4 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -386,12 +386,18 @@ NOKPROBE_SYMBOL(patch_instruction);
 
 int patch_uint(void *addr, unsigned int val)
 {
+   if (!IS_ALIGNED((unsigned long)addr, sizeof(unsigned int)))
+   return -EINVAL;
+
return patch_mem(addr, val, false);
 }
 NOKPROBE_SYMBOL(patch_uint);
 
 int patch_ulong(void *addr, unsigned long val)
 {
+   if (!IS_ALIGNED((unsigned long)addr, sizeof(unsigned long)))
+   return -EINVAL;
+
return patch_mem(addr, val, true);
 }
 NOKPROBE_SYMBOL(patch_ulong);
-- 
2.44.0



[PATCH v3 3/5] powerpc/64: Convert patch_instruction() to patch_u32()

2024-03-24 Thread Benjamin Gray
This use of patch_instruction() is working on 32 bit data, and can fail
if the data looks like a prefixed instruction and the extra write
crosses a page boundary. Use patch_u32() to fix the write size.

Fixes: 8734b41b3efe ("powerpc/module_64: Fix livepatching for RO modules")
Link: https://lore.kernel.org/all/20230203004649.1f59dbd4@yea/
Signed-off-by: Benjamin Gray 

---

v2: * Added the fixes tag, it seems appropriate even if the subject does
  mention a more robust solution being required.

patch_u64() should be more efficient, but judging from the bug report
it doesn't seem like the data is doubleword aligned.
---
 arch/powerpc/kernel/module_64.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 7112adc597a8..e9bab599d0c2 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -651,12 +651,11 @@ static inline int create_stub(const Elf64_Shdr *sechdrs,
// func_desc_t is 8 bytes if ABIv2, else 16 bytes
desc = func_desc(addr);
for (i = 0; i < sizeof(func_desc_t) / sizeof(u32); i++) {
-   if (patch_instruction(((u32 *)&entry->funcdata) + i,
- ppc_inst(((u32 *)(&desc))[i])))
+   if (patch_u32(((u32 *)&entry->funcdata) + i, ((u32 *)&desc)[i]))
return 0;
}
 
-   if (patch_instruction(&entry->magic, ppc_inst(STUB_MAGIC)))
+   if (patch_u32(&entry->magic, STUB_MAGIC))
return 0;
 
return 1;
-- 
2.44.0



[PATCH v2 1/2] powerpc/code-patching: Test patch_instructions() during boot

2024-03-24 Thread Benjamin Gray
patch_instructions() introduces new behaviour with a couple of
variations. Test each case of

  * a repeated 32-bit instruction,
  * a repeated 64-bit instruction (ppc64), and
  * a copied sequence of instructions

for both on a single page and when it crosses a page boundary.
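
Concretely, the two modes boil down to calls of the following shape (the
pointers here are placeholders; see the test code below for the real
usage):

static void example_modes(u32 *dst, u32 *code, size_t code_len)
{
	u32 nop = PPC_RAW_NOP();

	/* repeat a single 32-bit instruction across 12 bytes */
	patch_instructions(dst, &nop, 12, true);

	/* copy a sequence of instructions verbatim */
	patch_instructions(dst, code, code_len, false);
}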

Signed-off-by: Benjamin Gray 

---

v1: https://lore.kernel.org/all/20240315025736.404867-1-bg...@linux.ibm.com/

v2: * Shrink the code array to reduce frame size. It still
  crosses a page, and 32 vs 256 words is unlikely to
  make a difference in test coverage otherwise.
---
 arch/powerpc/lib/test-code-patching.c | 92 +++
 1 file changed, 92 insertions(+)

diff --git a/arch/powerpc/lib/test-code-patching.c 
b/arch/powerpc/lib/test-code-patching.c
index c44823292f73..f76030087f98 100644
--- a/arch/powerpc/lib/test-code-patching.c
+++ b/arch/powerpc/lib/test-code-patching.c
@@ -347,6 +347,97 @@ static void __init test_prefixed_patching(void)
check(!memcmp(iptr, expected, sizeof(expected)));
 }
 
+static void __init test_multi_instruction_patching(void)
+{
+   u32 code[32];
+   void *buf;
+   u32 *addr32;
+   u64 *addr64;
+   ppc_inst_t inst64 = ppc_inst_prefix(OP_PREFIX << 26 | 3UL << 24, PPC_RAW_TRAP());
+   u32 inst32 = PPC_RAW_NOP();
+
+   buf = vzalloc(PAGE_SIZE * 8);
+   check(buf);
+   if (!buf)
+   return;
+
+   /* Test single page 32-bit repeated instruction */
+   addr32 = buf + PAGE_SIZE;
+   check(!patch_instructions(addr32 + 1, &inst32, 12, true));
+
+   check(addr32[0] == 0);
+   check(addr32[1] == inst32);
+   check(addr32[2] == inst32);
+   check(addr32[3] == inst32);
+   check(addr32[4] == 0);
+
+   /* Test single page 64-bit repeated instruction */
+   if (IS_ENABLED(CONFIG_PPC64)) {
+   check(ppc_inst_prefixed(inst64));
+
+   addr64 = buf + PAGE_SIZE * 2;
+   ppc_inst_write(code, inst64);
+   check(!patch_instructions((u32 *)(addr64 + 1), code, 24, true));
+
+   check(addr64[0] == 0);
+   check(ppc_inst_equal(ppc_inst_read((u32 *)&addr64[1]), inst64));
+   check(ppc_inst_equal(ppc_inst_read((u32 *)&addr64[2]), inst64));
+   check(ppc_inst_equal(ppc_inst_read((u32 *)&addr64[3]), inst64));
+   check(addr64[4] == 0);
+   }
+
+   /* Test single page memcpy */
+   addr32 = buf + PAGE_SIZE * 3;
+
+   for (int i = 0; i < ARRAY_SIZE(code); i++)
+   code[i] = i + 1;
+
+   check(!patch_instructions(addr32 + 1, code, sizeof(code), false));
+
+   check(addr32[0] == 0);
+   check(!memcmp(&addr32[1], code, sizeof(code)));
+   check(addr32[ARRAY_SIZE(code) + 1] == 0);
+
+   /* Test multipage 32-bit repeated instruction */
+   addr32 = buf + PAGE_SIZE * 4 - 8;
+   check(!patch_instructions(addr32 + 1, &inst32, 12, true));
+
+   check(addr32[0] == 0);
+   check(addr32[1] == inst32);
+   check(addr32[2] == inst32);
+   check(addr32[3] == inst32);
+   check(addr32[4] == 0);
+
+   /* Test multipage 64-bit repeated instruction */
+   if (IS_ENABLED(CONFIG_PPC64)) {
+   check(ppc_inst_prefixed(inst64));
+
+   addr64 = buf + PAGE_SIZE * 5 - 8;
+   ppc_inst_write(code, inst64);
+   check(!patch_instructions((u32 *)(addr64 + 1), code, 24, true));
+
+   check(addr64[0] == 0);
+   check(ppc_inst_equal(ppc_inst_read((u32 *)&addr64[1]), inst64));
+   check(ppc_inst_equal(ppc_inst_read((u32 *)&addr64[2]), inst64));
+   check(ppc_inst_equal(ppc_inst_read((u32 *)&addr64[3]), inst64));
+   check(addr64[4] == 0);
+   }
+
+   /* Test multipage memcpy */
+   addr32 = buf + PAGE_SIZE * 6 - 12;
+
+   for (int i = 0; i < ARRAY_SIZE(code); i++)
+   code[i] = i + 1;
+
+   check(!patch_instructions(addr32 + 1, code, sizeof(code), false));
+
+   check(addr32[0] == 0);
+   check(!memcmp(&addr32[1], code, sizeof(code)));
+   check(addr32[ARRAY_SIZE(code) + 1] == 0);
+
+   vfree(buf);
+}
+
 static int __init test_code_patching(void)
 {
pr_info("Running code patching self-tests ...\n");
@@ -356,6 +447,7 @@ static int __init test_code_patching(void)
test_create_function_call();
test_translate_branch();
test_prefixed_patching();
+   test_multi_instruction_patching();
 
return 0;
 }
-- 
2.44.0



[PATCH v2 2/2] powerpc/code-patching: Use dedicated memory routines for patching

2024-03-24 Thread Benjamin Gray
The patching page set up as a writable alias may be in quadrant 0
(userspace) if the temporary mm path is used. This causes sanitiser
failures if so. Sanitiser failures also occur on the non-mm path
because the plain memset family is instrumented, and KASAN treats the
patching window as poisoned.

Introduce locally defined patch_* variants of memset that perform an
uninstrumented lower level set, as well as detecting write errors like
the original single patch variant does.

copy_to_user() is not correct here, as the PTE makes it a proper kernel
page (the EAA is privileged access only, RW). It just happens to be in
quadrant 0 because that's the hardware's mechanism for using the current
PID vs PID 0 in translations. Importantly, it's incorrect to allow user
page accesses.

Now that the patching memsets are used, we also propagate a failure up
to the caller as the single patch variant does.

Signed-off-by: Benjamin Gray 

---

v2: * Fix typo in EAA (from EEA)
* Fix references to quadrant number (0, not 1)
* Use copy_to_kernel_nofault() over custom memcpy
* Drop custom memcpy optimisation patch
---
 arch/powerpc/lib/code-patching.c | 31 +++
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index c6ab46156cda..df64343b9214 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -372,9 +372,32 @@ int patch_instruction(u32 *addr, ppc_inst_t instr)
 }
 NOKPROBE_SYMBOL(patch_instruction);
 
+static int patch_memset64(u64 *addr, u64 val, size_t count)
+{
+   for (u64 *end = addr + count; addr < end; addr++)
+   __put_kernel_nofault(addr, &val, u64, failed);
+
+   return 0;
+
+failed:
+   return -EPERM;
+}
+
+static int patch_memset32(u32 *addr, u32 val, size_t count)
+{
+   for (u32 *end = addr + count; addr < end; addr++)
+   __put_kernel_nofault(addr, &val, u32, failed);
+
+   return 0;
+
+failed:
+   return -EPERM;
+}
+
 static int __patch_instructions(u32 *patch_addr, u32 *code, size_t len, bool repeat_instr)
 {
unsigned long start = (unsigned long)patch_addr;
+   int err;
 
/* Repeat instruction */
if (repeat_instr) {
@@ -383,19 +406,19 @@ static int __patch_instructions(u32 *patch_addr, u32 
*code, size_t len, bool rep
if (ppc_inst_prefixed(instr)) {
u64 val = ppc_inst_as_ulong(instr);
 
-   memset64((u64 *)patch_addr, val, len / 8);
+   err = patch_memset64((u64 *)patch_addr, val, len / 8);
} else {
u32 val = ppc_inst_val(instr);
 
-   memset32(patch_addr, val, len / 4);
+   err = patch_memset32(patch_addr, val, len / 4);
}
} else {
-   memcpy(patch_addr, code, len);
+   err = copy_to_kernel_nofault(patch_addr, code, len);
}
 
smp_wmb();  /* smp write barrier */
flush_icache_range(start, start + len);
-   return 0;
+   return err;
 }
 
 /*
-- 
2.44.0



[PATCH v1 1/2] powerpc64/dexcr: Compile kernel with privileged hash instructions

2024-03-24 Thread Benjamin Gray
There are dedicated hashstp and hashchkp instructions that
can be inserted into a guest kernel to give it hypervisor
managed ROP protection (the hypervisor sets the secret hash
key and handles hashstp exceptions).

In testing, the kernel appears to handle the compiler generated
hash protection just fine, without any changes. This makes sense,
as any 'weird' stack interactions will normally be done in hand
written assembly. We can expect that a compiler generated function
prologue will be matched with a compiler generated function epilogue
with the stack as expected by the compiler (in some sense, the hash
value stored on the stack is just like any other local variable).

GCC requires ELF ABI v2, and Clang only works with ELF ABI v2
anyway, so add it as a dependency.

GCC will only insert these instructions if the target CPU is
specified to be Power10 (possibly a bug; the documentation says
they are inserted for Power8 or higher).

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/Makefile  |  3 +++
 arch/powerpc/platforms/Kconfig.cputype | 12 
 2 files changed, 15 insertions(+)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 65261cbe5bfd..bfaa3c754ae2 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -168,6 +168,9 @@ endif
 CFLAGS-$(CONFIG_TARGET_CPU_BOOL) += -mcpu=$(CONFIG_TARGET_CPU)
 AFLAGS-$(CONFIG_TARGET_CPU_BOOL) += -mcpu=$(CONFIG_TARGET_CPU)
 
+CFLAGS-$(CONFIG_PPC_KERNEL_ROP_PROTECT) += $(call cc-option,-mrop-protect)
+CFLAGS-$(CONFIG_PPC_KERNEL_ROP_PROTECT) += $(call cc-option,-mprivileged)
+
 CFLAGS-y += $(CONFIG_TUNE_CPU)
 
 asinstr := $(call as-instr,lis 9$(comma)foo@high,-DHAVE_AS_ATHIGH=1)
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index b2d8c0da2ad9..a95b11782379 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -517,6 +517,18 @@ config PPC_KUAP_DEBUG
  Add extra debugging for Kernel Userspace Access Protection (KUAP)
  If you're unsure, say N.
 
+config PPC_KERNEL_ROP_PROTECT
+   bool "Kernel ROP Protection"
+   default y
+   depends on PPC64_ELF_ABI_V2
+   depends on !CC_IS_GCC || TARGET_CPU = "power10"
+   help
+ This tells the compiler to insert hashstp/hashchkp instructions
+ in the prologue and epilogue of every kernel function. The kernel
+ also turns on the DEXCR[PHIE] aspect to cause an exception if the
+ hashchkp does not agree with the hash calculated by the matching
+ hashstp.
+
 config PPC_PKEY
def_bool y
depends on PPC_BOOK3S_64
-- 
2.44.0



[PATCH v1 2/2] powerpc64/dexcr: Enable PHIE on all CPUs

2024-03-24 Thread Benjamin Gray
While we can now compile the kernel with ROP
protection, it's possible the hash instructions
are acting as NOPs.

Enable the PHIE aspect at an appropriate stage in
boot so as to maximise coverage without requiring
certain functions be compiled without ROP
protection.

For the boot CPU, there are no arch defined functions
that do not return and get called before we start
spawning tasks. Therefore we insert the PHIE enablement
as a feature section after we call early_setup() where
CPU feature detection takes place.

For secondary CPUs, we can enable PHIE in
start_secondary().

Signed-off-by: Benjamin Gray 

---

This patch is probably incompatible with the
per-task DEXCR tracking in the userspace DEXCR
series, but I'll fix up whichever one lands last.

I tested on a Power10 (TCG and KVM) and Power9.
I also tried enabling ftrace; no apparent issues,
and the trace probes were definitely triggering.

The default config enables ROP protection when
the dependencies are satisfied but perhaps we
might want to phase it in slower by disabling it?

Finally, I've tied together inserting the hash
instructions and enabling the PHIE aspect. It
might be preferable for distros to have the option
to boot without enabling PHIE for performance
comparisons. This would be with a command line
option I guess?
---
 arch/powerpc/include/asm/reg.h |  2 ++
 arch/powerpc/kernel/head_64.S  | 10 ++
 arch/powerpc/kernel/smp.c  |  6 ++
 3 files changed, 18 insertions(+)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index d3d1aea009b4..185807119320 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -392,6 +392,8 @@
 #define   DEXCR_PR_IBRTPD 0x1000UL /* 3: Indirect Branch Recurrent Target Prediction Disable */
 #define   DEXCR_PR_SRAPD  0x0800UL /* 4: Subroutine Return Address Prediction Disable */
 #define   DEXCR_PR_NPHIE  0x0400UL /* 5: Non-Privileged Hash Instruction Enable */
+#define   DEXCR_PR_PHIE   0x0200UL /* 6: Privileged Hash Instruction Enable */
+#define   DEXCR_PNH_PHIE  (DEXCR_PR_PHIE << 32)
 #define   DEXCR_INIT   DEXCR_PR_NPHIE  /* Fixed DEXCR value to initialise all CPUs with */
 #define SPRN_IC    0x350   /* Virtual Instruction Count */
 #define SPRN_VTB   0x351   /* Virtual Time Base */
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index 4690c219bfa4..490cd09dc9a4 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -1011,6 +1011,16 @@ start_here_multiplatform:
mtctr   r12
bctrl   /* also sets r13 and SPRG_PACA */
 
+#ifdef CONFIG_PPC_KERNEL_ROP_PROTECT
+BEGIN_FTR_SECTION
+   mfspr   r3,SPRN_DEXCR
+   LOAD_REG_IMMEDIATE(r4,DEXCR_PNH_PHIE)
+   or  r3,r3,r4
+   mtspr   SPRN_DEXCR,r3
+   isync
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_31)
+#endif
+
LOAD_REG_ADDR(r3, start_here_common)
ld  r4,PACAKMSR(r13)
mtspr   SPRN_SRR0,r3
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index a60e4139214b..4a8e9f79aa0c 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1620,6 +1620,12 @@ void start_secondary(void *unused)
 {
unsigned int cpu = raw_smp_processor_id();
 
+   /* Enable hash instructions on this CPU in case not already enabled by the hypervisor */
+   if (IS_ENABLED(CONFIG_PPC_KERNEL_ROP_PROTECT) && cpu_has_feature(CPU_FTR_ARCH_31)) {
+   mtspr(SPRN_DEXCR, mfspr(SPRN_DEXCR) | DEXCR_PNH_PHIE);
+   isync();
+   }
+
/* PPC64 calls setup_kup() in early_setup_secondary() */
if (IS_ENABLED(CONFIG_PPC32))
setup_kup();
-- 
2.44.0







Re: [PATCH v1 1/3] powerpc/code-patching: Test patch_instructions() during boot

2024-03-17 Thread Benjamin Gray
On Mon, 2024-03-18 at 08:38 +1100, Benjamin Gray wrote:
> On Fri, 2024-03-15 at 07:14 +, Christophe Leroy wrote:
> > 
> > 
> > Le 15/03/2024 à 03:57, Benjamin Gray a écrit :
> > > patch_instructions() introduces new behaviour with a couple of
> > > variations. Test each case of
> > > 
> > >    * a repeated 32-bit instruction,
> > >    * a repeated 64-bit instruction (ppc64), and
> > >    * a copied sequence of instructions
> > > 
> > > for both on a single page and when it crosses a page boundary.
> > > 
> > > Signed-off-by: Benjamin Gray 
> > > ---
> > >   arch/powerpc/lib/test-code-patching.c | 92
> > > +++
> > >   1 file changed, 92 insertions(+)
> > > 
> > > diff --git a/arch/powerpc/lib/test-code-patching.c
> > > b/arch/powerpc/lib/test-code-patching.c
> > > index c44823292f73..35a3756272df 100644
> > > --- a/arch/powerpc/lib/test-code-patching.c
> > > +++ b/arch/powerpc/lib/test-code-patching.c
> > > @@ -347,6 +347,97 @@ static void __init
> > > test_prefixed_patching(void)
> > >   check(!memcmp(iptr, expected, sizeof(expected)));
> > >   }
> > >   
> > > +static void __init test_multi_instruction_patching(void)
> > > +{
> > > + u32 code[256];
> > 
> > Build failure:
> > 
> >    CC  arch/powerpc/lib/test-code-patching.o
> > arch/powerpc/lib/test-code-patching.c: In function 
> > 'test_multi_instruction_patching':
> > arch/powerpc/lib/test-code-patching.c:439:1: error: the frame size
> > of
> > 1040 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
> >    439 | }
> >    | ^
> > cc1: all warnings being treated as errors
> > make[4]: *** [scripts/Makefile.build:243: 
> > arch/powerpc/lib/test-code-patching.o] Error 1
> > 
> > 
> > I have to avoid big arrays on the stack.
> 
> All good, I can do that.
> 
> I do run my patches through a couple of 32-bit configs, but I didn't
> see this error. Is this a standard config I should be testing with?

Specifically pmac32_defconfig and ppc44x_defconfig


Re: [PATCH v1 3/3] powerpc/code-patching: Optimise patch_memcpy() to 4 byte chunks

2024-03-17 Thread Benjamin Gray
On Fri, 2024-03-15 at 06:39 +, Christophe Leroy wrote:
> 
> 
> Le 15/03/2024 à 03:57, Benjamin Gray a écrit :
> > As we are patching instructions, we can assume the length is a
> > multiple
> > of 4 and the destination address is aligned.
> > 
> > Atomicity of patching a prefixed instruction is not a concern, as
> > the
> > original implementation doesn't provide it anyway.
> 
> This patch looks unnecessary.
> 
> copy_to_kernel_nofault() is what you want to use instead.

Yeah, I would drop this patch when using copy_to_kernel_nofault()

> 
> > 
> > Signed-off-by: Benjamin Gray 
> > ---
> >   arch/powerpc/lib/code-patching.c | 8 
> >   1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/powerpc/lib/code-patching.c
> > b/arch/powerpc/lib/code-patching.c
> > index c6633759b509..ed450a32918c 100644
> > --- a/arch/powerpc/lib/code-patching.c
> > +++ b/arch/powerpc/lib/code-patching.c
> > @@ -394,10 +394,10 @@ static int patch_memset32(u32 *addr, u32 val,
> > size_t count)
> >     return -EPERM;
> >   }
> >   
> > -static int patch_memcpy(void *dst, void *src, size_t len)
> > +static int patch_memcpy32(u32 *dst, u32 *src, size_t count)
> >   {
> > -   for (void *end = src + len; src < end; dst++, src++)
> > -   __put_kernel_nofault(dst, src, u8, failed);
> > +   for (u32 *end = src + count; src < end; dst++, src++)
> > +   __put_kernel_nofault(dst, src, u32, failed);
> >   
> >     return 0;
> >   
> > @@ -424,7 +424,7 @@ static int __patch_instructions(u32
> > *patch_addr, u32 *code, size_t len, bool rep
> >     err = patch_memset32(patch_addr, val, len
> > / 4);
> >     }
> >     } else {
> > -   err = patch_memcpy(patch_addr, code, len);
> > +   err = patch_memcpy32(patch_addr, code, len / 4);
> >     }
> >   
> >     smp_wmb();  /* smp write barrier */



Re: [PATCH v1 2/3] powerpc/code-patching: Use dedicated memory routines for patching

2024-03-17 Thread Benjamin Gray
On Fri, 2024-03-15 at 06:36 +, Christophe Leroy wrote:
> 
> 
> Le 15/03/2024 à 03:57, Benjamin Gray a écrit :
> > The patching page set up as a writable alias may be in quadrant 1
> > (userspace) if the temporary mm path is used. This causes sanitiser
> > failures if so. Sanitiser failures also occur on the non-mm path
> > because the plain memset family is instrumented, and KASAN treats
> > the
> > patching window as poisoned.
> > 
> > Introduce locally defined patch_* variants of memset that perform
> > an
> > uninstrumented lower level set, as well as detecting write errors
> > like
> > the original single patch variant does.
> > 
> > copy_to_user() is not correct here, as the PTE makes it a proper
> > kernel
> > page (the EEA is privileged access only, RW). It just happens to be
> > in
> > quadrant 1 because that's the hardware's mechanism for using the
> > current
> > PID vs PID 0 in translations. Importantly, it's incorrect to allow
> > user
> > page accesses.
> > 
> > Now that the patching memsets are used, we also propagate a failure
> > up
> > to the caller as the single patch variant does.
> > 
> > Signed-off-by: Benjamin Gray 
> > 
> > ---
> > 
> > The patch_memcpy() can be optimised to 4 bytes at a time assuming
> > the
> > same requirements as regular instruction patching are being
> > followed
> > for the 'copy sequence of instructions' mode (i.e., they actually
> > are
> > instructions following instruction alignment rules).
> 
> Why not use copy_to_kernel_nofault() ?

I had not come across copy_to_kernel_nofault(). It looks like the
optimised memcpy() I wanted, so thanks.

> 
> 
> > ---
> >   arch/powerpc/lib/code-patching.c | 42
> > +---
> >   1 file changed, 38 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/powerpc/lib/code-patching.c
> > b/arch/powerpc/lib/code-patching.c
> > index c6ab46156cda..c6633759b509 100644
> > --- a/arch/powerpc/lib/code-patching.c
> > +++ b/arch/powerpc/lib/code-patching.c
> > @@ -372,9 +372,43 @@ int patch_instruction(u32 *addr, ppc_inst_t
> > instr)
> >   }
> >   NOKPROBE_SYMBOL(patch_instruction);
> >   
> > +static int patch_memset64(u64 *addr, u64 val, size_t count)
> > +{
> > +   for (u64 *end = addr + count; addr < end; addr++)
> > > +   __put_kernel_nofault(addr, &val, u64, failed);
> > +
> > +   return 0;
> > +
> > +failed:
> > +   return -EPERM;
> 
> Is it correct ? Shouldn't it be -EFAULT ?

The single instruction patch returns EPERM, which was set this way to
align with ftrace's expectations. I think it's best to keep the
single/multi patching variants consistent with each other where
possible.

> 
> > +}
> > +
> > +static int patch_memset32(u32 *addr, u32 val, size_t count)
> > +{
> > +   for (u32 *end = addr + count; addr < end; addr++)
> > > +   __put_kernel_nofault(addr, &val, u32, failed);
> > +
> > +   return 0;
> > +
> > +failed:
> > +   return -EPERM;
> > +}
> > +
> > +static int patch_memcpy(void *dst, void *src, size_t len)
> > +{
> > +   for (void *end = src + len; src < end; dst++, src++)
> > +   __put_kernel_nofault(dst, src, u8, failed);
> > +
> > +   return 0;
> > +
> > +failed:
> > +   return -EPERM;
> > +}
> > +
> >   static int __patch_instructions(u32 *patch_addr, u32 *code,
> > size_t len, bool repeat_instr)
> >   {
> >     unsigned long start = (unsigned long)patch_addr;
> > +   int err;
> >   
> >     /* Repeat instruction */
> >     if (repeat_instr) {
> > @@ -383,19 +417,19 @@ static int __patch_instructions(u32
> > *patch_addr, u32 *code, size_t len, bool rep
> >     if (ppc_inst_prefixed(instr)) {
> >     u64 val = ppc_inst_as_ulong(instr);
> >   
> > -   memset64((u64 *)patch_addr, val, len / 8);
> > +   err = patch_memset64((u64 *)patch_addr,
> > val, len / 8);
> >     } else {
> >     u32 val = ppc_inst_val(instr);
> >   
> > -   memset32(patch_addr, val, len / 4);
> > +   err = patch_memset32(patch_addr, val, len
> > / 4);
> >     }
> >     } else {
> > -   memcpy(patch_addr, code, len);
> > +   err = patch_memcpy(patch_addr, code, len);
> 
> Use copy_to_kernel_nofault() instead of open coding a new less
> optimised 
> version of it.
> 
> >     }
> >   
> >     smp_wmb();  /* smp write barrier */
> >     flush_icache_range(start, start + len);
> > -   return 0;
> > +   return err;
> >   }
> >   
> >   /*



Re: [PATCH v1 1/3] powerpc/code-patching: Test patch_instructions() during boot

2024-03-17 Thread Benjamin Gray
On Fri, 2024-03-15 at 07:14 +, Christophe Leroy wrote:
> 
> 
> Le 15/03/2024 à 03:57, Benjamin Gray a écrit :
> > patch_instructions() introduces new behaviour with a couple of
> > variations. Test each case of
> > 
> >    * a repeated 32-bit instruction,
> >    * a repeated 64-bit instruction (ppc64), and
> >    * a copied sequence of instructions
> > 
> > for both on a single page and when it crosses a page boundary.
> > 
> > Signed-off-by: Benjamin Gray 
> > ---
> >   arch/powerpc/lib/test-code-patching.c | 92
> > +++
> >   1 file changed, 92 insertions(+)
> > 
> > diff --git a/arch/powerpc/lib/test-code-patching.c
> > b/arch/powerpc/lib/test-code-patching.c
> > index c44823292f73..35a3756272df 100644
> > --- a/arch/powerpc/lib/test-code-patching.c
> > +++ b/arch/powerpc/lib/test-code-patching.c
> > @@ -347,6 +347,97 @@ static void __init
> > test_prefixed_patching(void)
> >     check(!memcmp(iptr, expected, sizeof(expected)));
> >   }
> >   
> > +static void __init test_multi_instruction_patching(void)
> > +{
> > +   u32 code[256];
> 
> Build failure:
> 
>    CC  arch/powerpc/lib/test-code-patching.o
> arch/powerpc/lib/test-code-patching.c: In function 
> 'test_multi_instruction_patching':
> arch/powerpc/lib/test-code-patching.c:439:1: error: the frame size of
> 1040 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
>    439 | }
>    | ^
> cc1: all warnings being treated as errors
> make[4]: *** [scripts/Makefile.build:243: 
> arch/powerpc/lib/test-code-patching.o] Error 1
> 
> 
> I have to avoid big arrays on the stack.

All good, I can do that.

I do run my patches through a couple of 32-bit configs, but I didn't
see this error. Is this a standard config I should be testing with?

> 
> 
> > +   void *buf;
> > +   u32 *addr32;
> > +   u64 *addr64;
> > +   ppc_inst_t inst64 = ppc_inst_prefix(OP_PREFIX << 26 | 3UL
> > << 24, PPC_RAW_TRAP());
> > +   u32 inst32 = PPC_RAW_NOP();
> > +
> > +   buf = vzalloc(PAGE_SIZE * 8);
> > +   check(buf);
> > +   if (!buf)
> > +   return;
> > +
> > +   /* Test single page 32-bit repeated instruction */
> > +   addr32 = buf + PAGE_SIZE;
> > +   check(!patch_instructions(addr32 + 1, &inst32, 12, true));
> > +
> > +   check(addr32[0] == 0);
> > +   check(addr32[1] == inst32);
> > +   check(addr32[2] == inst32);
> > +   check(addr32[3] == inst32);
> > +   check(addr32[4] == 0);
> > +
> > +   /* Test single page 64-bit repeated instruction */
> > +   if (IS_ENABLED(CONFIG_PPC64)) {
> > +   check(ppc_inst_prefixed(inst64));
> > +
> > +   addr64 = buf + PAGE_SIZE * 2;
> > +   ppc_inst_write(code, inst64);
> > +   check(!patch_instructions((u32 *)(addr64 + 1),
> > code, 24, true));
> > +
> > +   check(addr64[0] == 0);
> > +   check(ppc_inst_equal(ppc_inst_read((u32
> > *)[1]), inst64));
> > +   check(ppc_inst_equal(ppc_inst_read((u32
> > *)[2]), inst64));
> > +   check(ppc_inst_equal(ppc_inst_read((u32
> > *)[3]), inst64));
> > +   check(addr64[4] == 0);
> > +   }
> > +
> > +   /* Test single page memcpy */
> > +   addr32 = buf + PAGE_SIZE * 3;
> > +
> > +   for (int i = 0; i < ARRAY_SIZE(code); i++)
> > +   code[i] = i + 1;
> > +
> > +   check(!patch_instructions(addr32 + 1, code, sizeof(code),
> > false));
> > +
> > +   check(addr32[0] == 0);
> > +   check(!memcmp(&addr32[1], code, sizeof(code)));
> > +   check(addr32[ARRAY_SIZE(code) + 1] == 0);
> > +
> > +   /* Test multipage 32-bit repeated instruction */
> > +   addr32 = buf + PAGE_SIZE * 4 - 8;
> > +   check(!patch_instructions(addr32 + 1, &inst32, 12, true));
> > +
> > +   check(addr32[0] == 0);
> > +   check(addr32[1] == inst32);
> > +   check(addr32[2] == inst32);
> > +   check(addr32[3] == inst32);
> > +   check(addr32[4] == 0);
> > +
> > +   /* Test multipage 64-bit repeated instruction */
> > +   if (IS_ENABLED(CONFIG_PPC64)) {
> > +   check(ppc_inst_prefixed(inst64));
> > +
> > +   addr64 = buf + PAGE_SIZE * 5 - 8;
> > +   ppc_inst_write(code, inst64);
> > +   check(!patch_instructions((u32 *)(addr64 + 1),
> > code, 24, true));
> > +
> > +   check(addr64[0] == 0);
> > +   check(ppc_inst_equal(ppc_inst_read((u32
> > *)[1]), inst64));
> > +   ch

Re: [PATCH v1 2/2] powerpc/code-patching: Convert to open_patch_window()/close_patch_window()

2024-03-17 Thread Benjamin Gray
On Sat, 2024-03-16 at 10:10 +, Christophe Leroy wrote:
> 
> 
> Le 15/03/2024 à 09:38, Christophe Leroy a écrit :
> > 
> > 
> > Le 15/03/2024 à 03:59, Benjamin Gray a écrit :
> > > The existing patching alias page setup and teardown sections can
> > > be
> > > simplified to make use of the new open_patch_window()
> > > abstraction.
> > > 
> > > This eliminates the _mm variants of the helpers, consumers no
> > > longer
> > > need to check mm_patch_enabled(), and consumers no longer need to
> > > worry
> > > about synchronization and flushing beyond the changes they make
> > > in the
> > > patching window.
> > 
> > With this patch, the time needed to activate or de-activate
> > function 
> > tracer is approx 10% longer on powerpc 8xx.
> 
> With the following changes, the performance is restored:
> 
> diff --git a/arch/powerpc/lib/code-patching.c 
> b/arch/powerpc/lib/code-patching.c
> index fd6f8576033a..bc92b85913d8 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -282,13 +282,13 @@ struct patch_window {
>    * Interrupts must be disabled for the entire duration of the 
> patching. The PIDR
>    * is potentially changed during this time.
>    */
> -static int open_patch_window(void *addr, struct patch_window *ctx)
> +static __always_inline int open_patch_window(void *addr, struct 
> patch_window *ctx)
>   {
>   unsigned long pfn = get_patch_pfn(addr);
> 
>   lockdep_assert_irqs_disabled();
> 
> - ctx->text_poke_addr = (unsigned 
> long)__this_cpu_read(cpu_patching_context.addr);
> + ctx->text_poke_addr = (unsigned 
> long)__this_cpu_read(cpu_patching_context.addr) & PAGE_MASK;
> 
>   if (!mm_patch_enabled()) {
>   ctx->ptep =
> __this_cpu_read(cpu_patching_context.pte);
> @@ -331,7 +331,7 @@ static int open_patch_window(void *addr, struct 
> patch_window *ctx)
>   return 0;
>   }
> 
> -static void close_patch_window(struct patch_window *ctx)
> +static __always_inline void close_patch_window(struct patch_window
> *ctx)
>   {
>   lockdep_assert_irqs_disabled();
> 
> 

Thanks for checking that. I did restore the page mask optimisation in a
later patch while still developing, but the 64-bit assembly looked
slightly worse for it. I didn't check the 32-bit; no way to benchmark
it anyway.


Re: [PATCH v1 2/3] powerpc/code-patching: Use dedicated memory routines for patching

2024-03-14 Thread Benjamin Gray
Also supersedes
https://lore.kernel.org/all/20240213043638.168048-1-bg...@linux.ibm.com/


[PATCH v1 2/2] powerpc/code-patching: Convert to open_patch_window()/close_patch_window()

2024-03-14 Thread Benjamin Gray
The existing patching alias page setup and teardown sections can be
simplified to make use of the new open_patch_window() abstraction.

This eliminates the _mm variants of the helpers, consumers no longer
need to check mm_patch_enabled(), and consumers no longer need to worry
about synchronization and flushing beyond the changes they make in the
patching window.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/lib/code-patching.c | 180 +++
 1 file changed, 16 insertions(+), 164 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index d1b812f84154..fd6f8576033a 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -66,40 +66,6 @@ static bool mm_patch_enabled(void)
return IS_ENABLED(CONFIG_SMP) && radix_enabled();
 }
 
-/*
- * The following applies for Radix MMU. Hash MMU has different requirements,
- * and so is not supported.
- *
- * Changing mm requires context synchronising instructions on both sides of
- * the context switch, as well as a hwsync between the last instruction for
- * which the address of an associated storage access was translated using
- * the current context.
- *
- * switch_mm_irqs_off() performs an isync after the context switch. It is
- * the responsibility of the caller to perform the CSI and hwsync before
- * starting/stopping the temp mm.
- */
-static struct mm_struct *start_using_temp_mm(struct mm_struct *temp_mm)
-{
-   struct mm_struct *orig_mm = current->active_mm;
-
-   lockdep_assert_irqs_disabled();
-   switch_mm_irqs_off(orig_mm, temp_mm, current);
-
-   WARN_ON(!mm_is_thread_local(temp_mm));
-
-   suspend_breakpoints();
-   return orig_mm;
-}
-
-static void stop_using_temp_mm(struct mm_struct *temp_mm,
-  struct mm_struct *orig_mm)
-{
-   lockdep_assert_irqs_disabled();
-   switch_mm_irqs_off(temp_mm, orig_mm, current);
-   restore_breakpoints();
-}
-
 static int text_area_cpu_up(unsigned int cpu)
 {
struct vm_struct *area;
@@ -389,73 +355,20 @@ static void close_patch_window(struct patch_window *ctx)
pte_unmap_unlock(ctx->ptep, ctx->ptl);
 }
 
-static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
-{
-   int err;
-   u32 *patch_addr;
-   unsigned long text_poke_addr;
-   pte_t *pte;
-   unsigned long pfn = get_patch_pfn(addr);
-   struct mm_struct *patching_mm;
-   struct mm_struct *orig_mm;
-   spinlock_t *ptl;
-
-   patching_mm = __this_cpu_read(cpu_patching_context.mm);
-   text_poke_addr = __this_cpu_read(cpu_patching_context.addr);
-   patch_addr = (u32 *)(text_poke_addr + offset_in_page(addr));
-
-   pte = get_locked_pte(patching_mm, text_poke_addr, &ptl);
-   if (!pte)
-   return -ENOMEM;
-
-   __set_pte_at(patching_mm, text_poke_addr, pte, pfn_pte(pfn, 
PAGE_KERNEL), 0);
-
-   /* order PTE update before use, also serves as the hwsync */
-   asm volatile("ptesync": : :"memory");
-
-   /* order context switch after arbitrary prior code */
-   isync();
-
-   orig_mm = start_using_temp_mm(patching_mm);
-
-   err = __patch_instruction(addr, instr, patch_addr);
-
-   /* context synchronisation performed by __patch_instruction (isync or 
exception) */
-   stop_using_temp_mm(patching_mm, orig_mm);
-
-   pte_clear(patching_mm, text_poke_addr, pte);
-   /*
-* ptesync to order PTE update before TLB invalidation done
-* by radix__local_flush_tlb_page_psize (in _tlbiel_va)
-*/
-   local_flush_tlb_page_psize(patching_mm, text_poke_addr, 
mmu_virtual_psize);
-
-   pte_unmap_unlock(pte, ptl);
-
-   return err;
-}
-
 static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
 {
-   int err;
+   struct patch_window ctx;
u32 *patch_addr;
-   unsigned long text_poke_addr;
-   pte_t *pte;
-   unsigned long pfn = get_patch_pfn(addr);
-
-   text_poke_addr = (unsigned 
long)__this_cpu_read(cpu_patching_context.addr) & PAGE_MASK;
-   patch_addr = (u32 *)(text_poke_addr + offset_in_page(addr));
+   int err;
 
-   pte = __this_cpu_read(cpu_patching_context.pte);
-   __set_pte_at(&init_mm, text_poke_addr, pte, pfn_pte(pfn, PAGE_KERNEL), 0);
-   /* See ptesync comment in radix__set_pte_at() */
-   if (radix_enabled())
-   asm volatile("ptesync": : :"memory");
+   err = open_patch_window(addr, &ctx);
+   if (err)
+   return err;
 
+   patch_addr = (u32 *)(ctx.text_poke_addr + offset_in_page(addr));
err = __patch_instruction(addr, instr, patch_addr);
 
-   pte_clear(&init_mm, text_poke_addr, pte);
-   flush_tlb_kernel_range(text_poke_addr, text_poke_addr + PAGE_SIZE);
+   close_patch_window(&ctx);
 
return err;
 }
@@ -475,10 +388,7 @@ int patch_instruction(u32 *addr, ppc_inst_t in

[PATCH v1 1/2] powerpc/code-patching: Introduce open_patch_window()/close_patch_window()

2024-03-14 Thread Benjamin Gray
The code patching capabilities have grown, and the specific quirks for
setting up and tearing down the patching window are getting duplicated.

This commit introduces an abstraction for working with this patching
window. It defines open_patch_window() to set up the writable alias
page, and close_patch_window() to remove it and flush the TLB. The
actual mechanism for providing this patching window is an implementation
detail that consumers don't need to worry about. Consumers are still
responsible for flushing/syncing any changes they make through this
alias page though.

Signed-off-by: Benjamin Gray 

---

This design can be readily extended to remap the writable page to
another physical page without incurring all of the entry and exit
overhead. But that might have problems with spending too long in
an interrupts disabled context, so I've left it out for now.
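
For illustration, a consumer of this interface ends up looking roughly like
the sketch below. patch_u32_example() is a made-up name, not something this
patch adds; it just shows the expected open/write/sync/close pattern:

static int patch_u32_example(u32 *addr, u32 val)
{
        struct patch_window ctx;
        u32 *patch_addr;
        int err;

        err = open_patch_window(addr, &ctx);
        if (err)
                return err;

        /* write through the writable alias of the target page */
        patch_addr = (u32 *)(ctx.text_poke_addr + offset_in_page(addr));
        __put_kernel_nofault(patch_addr, &val, u32, failed);

        /* callers sync their own changes; a data write just needs ordering */
        mb();

        close_patch_window(&ctx);
        return 0;

failed:
        close_patch_window(&ctx);
        return -EPERM;
}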
---
 arch/powerpc/lib/code-patching.c | 113 +++
 1 file changed, 113 insertions(+)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index ed450a32918c..d1b812f84154 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -276,6 +276,119 @@ static void unmap_patch_area(unsigned long addr)
flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
 }
 
+/*
+ * Represents an active patching window.
+ */
+struct patch_window {
+   /*
+* Page aligned patching window address. The page is a writable alias of
+* the configured target page.
+*/
+   unsigned long text_poke_addr;
+   /*
+* Pointer to the PTE for the patching window page.
+*/
+   pte_t *ptep;
+   /*
+* (Temporary MM patching only) The original MM before creating the
+* patching window.
+*/
+   struct mm_struct *orig_mm;
+   /*
+* (Temporary MM patching only) The patching window MM.
+*/
+   struct mm_struct *patch_mm;
+   /*
+* (Temporary MM patching only) Lock for the patching window PTE.
+*/
+   spinlock_t *ptl;
+};
+
+/*
+ * Calling this function opens a 'patching window' that maps a
+ * page starting at ctx->text_poke_addr (which is set by this function)
+ * as a writable alias to the page starting at addr.
+ *
+ * Upon success, callers must invoke the corresponding close_patch_window()
+ * function to return to the original state. Callers are also responsible
+ * for syncing any changes they make inside the window.
+ *
+ * Interrupts must be disabled for the entire duration of the patching. The 
PIDR
+ * is potentially changed during this time.
+ */
+static int open_patch_window(void *addr, struct patch_window *ctx)
+{
+   unsigned long pfn = get_patch_pfn(addr);
+
+   lockdep_assert_irqs_disabled();
+
+   ctx->text_poke_addr = (unsigned 
long)__this_cpu_read(cpu_patching_context.addr);
+
+   if (!mm_patch_enabled()) {
+   ctx->ptep = __this_cpu_read(cpu_patching_context.pte);
+   __set_pte_at(&init_mm, ctx->text_poke_addr,
+ctx->ptep, pfn_pte(pfn, PAGE_KERNEL), 0);
+
+   /* See ptesync comment in radix__set_pte_at() */
+   if (radix_enabled())
+   asm volatile("ptesync" ::: "memory");
+
+   return 0;
+   }
+
+   ctx->orig_mm = current->active_mm;
+   ctx->patch_mm = __this_cpu_read(cpu_patching_context.mm);
+
+   ctx->ptep = get_locked_pte(ctx->patch_mm, ctx->text_poke_addr, &ctx->ptl);
+   if (!ctx->ptep)
+   return -ENOMEM;
+
+   __set_pte_at(ctx->patch_mm, ctx->text_poke_addr,
+ctx->ptep, pfn_pte(pfn, PAGE_KERNEL), 0);
+
+   /* order PTE update before use, also serves as the hwsync */
+   asm volatile("ptesync" ::: "memory");
+
+   /*
+* Changing mm requires context synchronising instructions on both 
sides of
+* the context switch, as well as a hwsync between the last instruction 
for
+* which the address of an associated storage access was translated 
using
+* the current context. switch_mm_irqs_off() performs an isync after the
+* context switch.
+*/
+   isync();
+   switch_mm_irqs_off(ctx->orig_mm, ctx->patch_mm, current);
+
+   WARN_ON(!mm_is_thread_local(ctx->patch_mm));
+
+   suspend_breakpoints();
+   return 0;
+}
+
+static void close_patch_window(struct patch_window *ctx)
+{
+   lockdep_assert_irqs_disabled();
+
+   if (!mm_patch_enabled()) {
+   pte_clear(&init_mm, ctx->text_poke_addr, ctx->ptep);
+   flush_tlb_kernel_range(ctx->text_poke_addr, ctx->text_poke_addr + PAGE_SIZE);
+   return;
+   }
+
+   isync();
+   switch_mm_irqs_off(ctx->patch_mm, ctx->orig_mm, current);
+   restore_breakpoints();
+
+   pte_clear(ctx->patch_mm, ctx->text_poke_addr, ctx->ptep);
+   /*
+* ptesync to order PTE update before TLB invalidation done
+* by radix__local_flush_tlb_page_psize (in _tlbiel_va)
+*/
+   local_flush_tlb_page_psize(ctx->patch_mm, ctx->text_poke_addr, mmu_virtual_psize);
+
+   pte_unmap_unlock(ctx->ptep, ctx->ptl);
+}

[PATCH v1 3/3] powerpc/code-patching: Optimise patch_memcpy() to 4 byte chunks

2024-03-14 Thread Benjamin Gray
As we are patching instructions, we can assume the length is a multiple
of 4 and the destination address is aligned.

Atomicity of patching a prefixed instruction is not a concern, as the
original implementation doesn't provide it anyway.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/lib/code-patching.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index c6633759b509..ed450a32918c 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -394,10 +394,10 @@ static int patch_memset32(u32 *addr, u32 val, size_t 
count)
return -EPERM;
 }
 
-static int patch_memcpy(void *dst, void *src, size_t len)
+static int patch_memcpy32(u32 *dst, u32 *src, size_t count)
 {
-   for (void *end = src + len; src < end; dst++, src++)
-   __put_kernel_nofault(dst, src, u8, failed);
+   for (u32 *end = src + count; src < end; dst++, src++)
+   __put_kernel_nofault(dst, src, u32, failed);
 
return 0;
 
@@ -424,7 +424,7 @@ static int __patch_instructions(u32 *patch_addr, u32 *code, 
size_t len, bool rep
err = patch_memset32(patch_addr, val, len / 4);
}
} else {
-   err = patch_memcpy(patch_addr, code, len);
+   err = patch_memcpy32(patch_addr, code, len / 4);
}
 
smp_wmb();  /* smp write barrier */
-- 
2.44.0



[PATCH v1 2/3] powerpc/code-patching: Use dedicated memory routines for patching

2024-03-14 Thread Benjamin Gray
The patching page set up as a writable alias may be in quadrant 1
(userspace) if the temporary mm path is used, which causes sanitiser
failures. Sanitiser failures also occur on the non-mm path because the
plain memset family is instrumented, and KASAN treats the patching
window as poisoned.

Introduce locally defined patch_* variants of memset that perform an
uninstrumented lower level set, and that detect write errors like the
original single patch variant does.

copy_to_user() is not correct here, as the PTE makes it a proper kernel
page (the EEA is privileged access only, RW). It just happens to be in
quadrant 1 because that's the hardware's mechanism for using the current
PID vs PID 0 in translations. Importantly, it's incorrect to allow user
page accesses.

Now that the patching memsets are used, we also propagate a failure up
to the caller as the single patch variant does.

Signed-off-by: Benjamin Gray 

---

The patch_memcpy() can be optimised to 4 bytes at a time assuming the
same requirements as regular instruction patching are being followed
for the 'copy sequence of instructions' mode (i.e., they actually are
instructions following instruction alignment rules).
---
 arch/powerpc/lib/code-patching.c | 42 +---
 1 file changed, 38 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index c6ab46156cda..c6633759b509 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -372,9 +372,43 @@ int patch_instruction(u32 *addr, ppc_inst_t instr)
 }
 NOKPROBE_SYMBOL(patch_instruction);
 
+static int patch_memset64(u64 *addr, u64 val, size_t count)
+{
+   for (u64 *end = addr + count; addr < end; addr++)
+   __put_kernel_nofault(addr, &val, u64, failed);
+
+   return 0;
+
+failed:
+   return -EPERM;
+}
+
+static int patch_memset32(u32 *addr, u32 val, size_t count)
+{
+   for (u32 *end = addr + count; addr < end; addr++)
+   __put_kernel_nofault(addr, &val, u32, failed);
+
+   return 0;
+
+failed:
+   return -EPERM;
+}
+
+static int patch_memcpy(void *dst, void *src, size_t len)
+{
+   for (void *end = src + len; src < end; dst++, src++)
+   __put_kernel_nofault(dst, src, u8, failed);
+
+   return 0;
+
+failed:
+   return -EPERM;
+}
+
 static int __patch_instructions(u32 *patch_addr, u32 *code, size_t len, bool 
repeat_instr)
 {
unsigned long start = (unsigned long)patch_addr;
+   int err;
 
/* Repeat instruction */
if (repeat_instr) {
@@ -383,19 +417,19 @@ static int __patch_instructions(u32 *patch_addr, u32 
*code, size_t len, bool rep
if (ppc_inst_prefixed(instr)) {
u64 val = ppc_inst_as_ulong(instr);
 
-   memset64((u64 *)patch_addr, val, len / 8);
+   err = patch_memset64((u64 *)patch_addr, val, len / 8);
} else {
u32 val = ppc_inst_val(instr);
 
-   memset32(patch_addr, val, len / 4);
+   err = patch_memset32(patch_addr, val, len / 4);
}
} else {
-   memcpy(patch_addr, code, len);
+   err = patch_memcpy(patch_addr, code, len);
}
 
smp_wmb();  /* smp write barrier */
flush_icache_range(start, start + len);
-   return 0;
+   return err;
 }
 
 /*
-- 
2.44.0



[PATCH v1 1/3] powerpc/code-patching: Test patch_instructions() during boot

2024-03-14 Thread Benjamin Gray
patch_instructions() introduces new behaviour with a couple of
variations. Test each case of

  * a repeated 32-bit instruction,
  * a repeated 64-bit instruction (ppc64), and
  * a copied sequence of instructions

for both on a single page and when it crosses a page boundary.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/lib/test-code-patching.c | 92 +++
 1 file changed, 92 insertions(+)

diff --git a/arch/powerpc/lib/test-code-patching.c 
b/arch/powerpc/lib/test-code-patching.c
index c44823292f73..35a3756272df 100644
--- a/arch/powerpc/lib/test-code-patching.c
+++ b/arch/powerpc/lib/test-code-patching.c
@@ -347,6 +347,97 @@ static void __init test_prefixed_patching(void)
check(!memcmp(iptr, expected, sizeof(expected)));
 }
 
+static void __init test_multi_instruction_patching(void)
+{
+   u32 code[256];
+   void *buf;
+   u32 *addr32;
+   u64 *addr64;
+   ppc_inst_t inst64 = ppc_inst_prefix(OP_PREFIX << 26 | 3UL << 24, 
PPC_RAW_TRAP());
+   u32 inst32 = PPC_RAW_NOP();
+
+   buf = vzalloc(PAGE_SIZE * 8);
+   check(buf);
+   if (!buf)
+   return;
+
+   /* Test single page 32-bit repeated instruction */
+   addr32 = buf + PAGE_SIZE;
+   check(!patch_instructions(addr32 + 1, &inst32, 12, true));
+
+   check(addr32[0] == 0);
+   check(addr32[1] == inst32);
+   check(addr32[2] == inst32);
+   check(addr32[3] == inst32);
+   check(addr32[4] == 0);
+
+   /* Test single page 64-bit repeated instruction */
+   if (IS_ENABLED(CONFIG_PPC64)) {
+   check(ppc_inst_prefixed(inst64));
+
+   addr64 = buf + PAGE_SIZE * 2;
+   ppc_inst_write(code, inst64);
+   check(!patch_instructions((u32 *)(addr64 + 1), code, 24, true));
+
+   check(addr64[0] == 0);
+   check(ppc_inst_equal(ppc_inst_read((u32 *)&addr64[1]), inst64));
+   check(ppc_inst_equal(ppc_inst_read((u32 *)&addr64[2]), inst64));
+   check(ppc_inst_equal(ppc_inst_read((u32 *)&addr64[3]), inst64));
+   check(addr64[4] == 0);
+   }
+
+   /* Test single page memcpy */
+   addr32 = buf + PAGE_SIZE * 3;
+
+   for (int i = 0; i < ARRAY_SIZE(code); i++)
+   code[i] = i + 1;
+
+   check(!patch_instructions(addr32 + 1, code, sizeof(code), false));
+
+   check(addr32[0] == 0);
+   check(!memcmp(&addr32[1], code, sizeof(code)));
+   check(addr32[ARRAY_SIZE(code) + 1] == 0);
+
+   /* Test multipage 32-bit repeated instruction */
+   addr32 = buf + PAGE_SIZE * 4 - 8;
+   check(!patch_instructions(addr32 + 1, &inst32, 12, true));
+
+   check(addr32[0] == 0);
+   check(addr32[1] == inst32);
+   check(addr32[2] == inst32);
+   check(addr32[3] == inst32);
+   check(addr32[4] == 0);
+
+   /* Test multipage 64-bit repeated instruction */
+   if (IS_ENABLED(CONFIG_PPC64)) {
+   check(ppc_inst_prefixed(inst64));
+
+   addr64 = buf + PAGE_SIZE * 5 - 8;
+   ppc_inst_write(code, inst64);
+   check(!patch_instructions((u32 *)(addr64 + 1), code, 24, true));
+
+   check(addr64[0] == 0);
+   check(ppc_inst_equal(ppc_inst_read((u32 *)&addr64[1]), inst64));
+   check(ppc_inst_equal(ppc_inst_read((u32 *)&addr64[2]), inst64));
+   check(ppc_inst_equal(ppc_inst_read((u32 *)&addr64[3]), inst64));
+   check(addr64[4] == 0);
+   }
+
+   /* Test multipage memcpy */
+   addr32 = buf + PAGE_SIZE * 6 - 12;
+
+   for (int i = 0; i < ARRAY_SIZE(code); i++)
+   code[i] = i + 1;
+
+   check(!patch_instructions(addr32 + 1, code, sizeof(code), false));
+
+   check(addr32[0] == 0);
+   check(!memcmp(&addr32[1], code, sizeof(code)));
+   check(addr32[ARRAY_SIZE(code) + 1] == 0);
+
+   vfree(buf);
+}
+
 static int __init test_code_patching(void)
 {
pr_info("Running code patching self-tests ...\n");
@@ -356,6 +447,7 @@ static int __init test_code_patching(void)
test_create_function_call();
test_translate_branch();
test_prefixed_patching();
+   test_multi_instruction_patching();
 
return 0;
 }
-- 
2.44.0



Re: BUG: KASAN: vmalloc-out-of-bounds in memset32 (bpf_prog_pack_free)

2024-02-22 Thread Benjamin Gray
On Wed, 2024-01-31 at 11:46 +, Christophe Leroy wrote:
> Hi,
> 
> Got the following BUG while loading module test_bpf.ko
> 
> No time to investigate for now.
> 
> root@vgoip:~# insmod test_bpf.ko
> [  263.409030] 
> ==
> [  263.416415] BUG: KASAN: vmalloc-out-of-bounds in
> memset32+0x5c/0xa0
> [  263.422952] Write of size 4 at addr c9000e40 by task kworker/0:0/7
> [  263.429322]
> [  263.430816] CPU: 0 PID: 7 Comm: kworker/0:0 Not tainted 
> 6.8.0-rc1-s3k-dev-02364-gc626219462a6-dirty #606
> [  263.440580] Hardware name: MIAE 8xx 0x50 CMPC885
> [  263.445658] Workqueue: events bpf_prog_free_deferred
> [  263.450973] Call Trace:
> [  263.453411] [c905bd00] [c0c114e8] dump_stack_lvl+0x34/0x50
> (unreliable)
> [  263.460744] [c905bd20] [c026b9d4] print_report+0x174/0x608
> [  263.466853] [c905bd70] [c026c01c] kasan_report+0xc0/0x130
> [  263.472788] [c905bdd0] [c0c43cb0] memset32+0x5c/0xa0
> [  263.478198] [c905bdf0] [c0030690] patch_instructions+0x70/0x17c
> [  263.484656] [c905be30] [c00389b0]
> bpf_arch_text_invalidate+0xa8/0x120
> [  263.491638] [c905be90] [c018e63c] bpf_prog_pack_free+0xec/0x24c
> [  263.498096] [c905bec0] [c018ea34]
> bpf_jit_binary_pack_free+0x3c/0x80
> [  263.504991] [c905bee0] [c0038ae8] bpf_jit_free+0xc0/0x128
> [  263.510925] [c905bf00] [c00790f8] process_one_work+0x310/0x6e8
> [  263.517209] [c905bf50] [c0079b2c] worker_thread+0x65c/0x868
> [  263.523232] [c905bfc0] [c0084b78] kthread+0x17c/0x1ac
> [  263.528817] [c905bff0] [c00181fc] start_kernel_thread+0x10/0x14
> [  263.535279]
> [  263.536782] The buggy address belongs to the virtual mapping at
> [  263.536782]  [c900, c9008000) created by:
> [  263.536782]  text_area_cpu_up+0x28/0x1d4
> [  263.551418]
> [  263.552902] The buggy address belongs to the physical page:
> [  263.560228]
> [  263.561713] Memory state around the buggy address:
> [  263.566585]  c9000d00: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
> f8 f8
> [  263.573307]  c9000d80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
> f8 f8
> [  263.580027] >c9000e00: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
> f8 f8
> [  263.586677]    ^
> [  263.591370]  c9000e80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
> f8 f8
> [  263.598093]  c9000f00: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
> f8 f8
> [  263.604743] 
> ==
> 
> Christophe
> 

Looks like a false positive. It's clearly in range of the poking
allocation after all. KASAN apparently leaves VM_ALLOC areas as
poisoned, expecting some kind of subsequent allocator to come and
unpoison specific parts. Except we call map_patch_area() instead of
whatever path generic code was expecting, so we never unpoison the
range. The issue would be pre-existing from the beginning, and has gone
unnoticed because the original code path that does the patching (i.e.,
the one that actually interacts with the poking page) isn't instrumented.


Re: [PATCH] powerpc/ftrace: Ignore ftrace locations in exit text sections

2024-02-12 Thread Benjamin Gray
ddr, int link)
>  {
>   ppc_inst_t op;
> diff --git a/arch/powerpc/kernel/vmlinux.lds.S
> b/arch/powerpc/kernel/vmlinux.lds.S
> index 1c5970df3233..9c376ae6857d 100644
> --- a/arch/powerpc/kernel/vmlinux.lds.S
> +++ b/arch/powerpc/kernel/vmlinux.lds.S
> @@ -281,7 +281,9 @@ SECTIONS
>    * to deal with references from __bug_table
>    */
>   .exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) {
> + _sexittext = .;
>   EXIT_TEXT
> + _eexittext = .;
>   }
>  
>   . = ALIGN(PAGE_SIZE);
> 
> base-commit: 4ef8376c466ae8b03e632dd8eca1e44315f7dd61

Reviewed-by: Benjamin Gray 

Fixes what I assume is the same bug I was seeing for a while

[0.00][T0] ftrace-powerpc: 0xc30f4f38: No module provided 
for non-kernel address
[0.00][T0] [ ftrace bug ]
[0.00][T0] ftrace faulted on modifying 
[0.00][T0] [] _sub_D_65535_0+0xc/0x40
[0.00][T0] Initializing ftrace call sites
[0.00][T0] ftrace record flags: 0
[0.00][T0]  (0)
[0.00][T0]  expected tramp: c00a8548
[0.00][T0] [ cut here ]
[0.00][T0] WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:2179 
ftrace_bug+0x16c/0x3a0
[0.00][T0] Modules linked in:
[0.00][T0] CPU: 0 PID: 0 Comm: swapper Not tainted 
6.8.0-rc3-02361-g892886f9f5b7-dirty #11
[0.00][T0] Hardware name: IBM pSeries (emulated by qemu) POWER9 
(raw) 0x4e1202 0xf05 of:SLOF,HEAD hv:linux,kvm pSeries
[0.00][T0] NIP:  c048a96c LR: c048a968 CTR: 

[0.00][T0] REGS: c4fa7ae0 TRAP: 0700   Not tainted  
(6.8.0-rc3-02361-g892886f9f5b7-dirty)
[0.00][T0] MSR:  82021033   CR: 
28028440  XER: 
[0.00][T0] CFAR: c031db34 IRQMASK: 3 
[0.00][T0] GPR00: c048a968 c4fa7d80 
c2645600 0022 
[0.00][T0] GPR04: 0008 0001 
c0326dc4  
[0.00][T0] GPR08: c2535600 0001 
c440bee0 48028444 
[0.00][T0] GPR12:  c625 
  
[0.00][T0] GPR16:  0003 
0001  
[0.00][T0] GPR20: c4fe8af8 c00010030428 
c45b90c0  
[0.00][T0] GPR24:  c00010030430 
c00010030420  
[0.00][T0] GPR28: c1dd9a40 c1dd8bc0 
c000102a5b00 c000102a5b08 
[0.00][T0] NIP [c048a96c] ftrace_bug+0x16c/0x3a0
[0.00][T0] LR [c048a968] ftrace_bug+0x168/0x3a0
[0.00][T0] Call Trace:
[0.00][T0] [c4fa7d80] [c048a968] 
ftrace_bug+0x168/0x3a0 (unreliable)
[0.00][T0] [c4fa7e20] [c048aeec] 
ftrace_process_locs+0x34c/0x780
[0.00][T0] [c4fa7ec0] [c30655c0] 
ftrace_init+0xc8/0x27c
[0.00][T0] [c4fa7f40] [c300d458] 
start_kernel+0x1bc/0x528
[0.00][T0] [c4fa7fe0] [c000eaa0] 
start_here_common+0x1c/0x20
[0.00][T0] Code: 7fe3fb78 483a31e5 6000 e93e0008 75290800 
40820150 7fc3f378 4bfffc95 7c641b78 387d0f40 4be93189 6000 <0fe0> 
382100a0 3d22fda6 3901 
[0.00][T0] irq event stamp: 0
[0.00][T0] hardirqs last  enabled at (0): [<>] 0x0
[0.00][T0] hardirqs last disabled at (0): [<>] 0x0
[0.00][T0] softirqs last  enabled at (0): [<>] 0x0
[0.00][T0] softirqs last disabled at (0): [<>] 0x0
[0.00][T0] ---[ end trace  ]---




[PATCH] powerpc/code-patching: Disable KASAN in __patch_instructions()

2024-02-12 Thread Benjamin Gray
The memset/memcpy functions are by default instrumented by KASAN, which
complains about user memory access when using a poking page in
userspace.

Using a userspace address is expected though, so don't instrument with
KASAN for this function.

Signed-off-by: Benjamin Gray 

---

I tried to replace the memsetN calls with __memsetN, but we appear to
disable the non-instrumented variants of these when KASAN is enabled.
Christophe, might you know more here?

The cost of just suppressing reports for this section shouldn't be too
relevant; KASAN detects the access, but exits before it starts preparing
the report itself. So it's just like any other KASAN instrumented
function for the most part.
---
 arch/powerpc/lib/code-patching.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index c6ab46156cda..24989594578a 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -3,6 +3,7 @@
  *  Copyright 2008 Michael Ellerman, IBM Corporation.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -377,6 +378,7 @@ static int __patch_instructions(u32 *patch_addr, u32 *code, 
size_t len, bool rep
unsigned long start = (unsigned long)patch_addr;
 
/* Repeat instruction */
+   kasan_disable_current();
if (repeat_instr) {
ppc_inst_t instr = ppc_inst_read(code);
 
@@ -392,6 +394,7 @@ static int __patch_instructions(u32 *patch_addr, u32 *code, 
size_t len, bool rep
} else {
memcpy(patch_addr, code, len);
}
+   kasan_enable_current();
 
smp_wmb();  /* smp write barrier */
flush_icache_range(start, start + len);
-- 
2.43.0



Re: [PATCH] powerpc/kasan: Limit KASAN thread size increase to 32KB

2024-02-12 Thread Benjamin Gray
Don't know why the previous mail went blank.

On Mon, 2024-02-12 at 17:42 +1100, Michael Ellerman wrote:
> KASAN is seen to increase stack usage, to the point that it was
> reported
> to lead to stack overflow on some 32-bit machines (see link).
> 
> To avoid overflows the stack size was doubled for KASAN builds in
> commit 3e8635fb2e07 ("powerpc/kasan: Force thread size increase with
> KASAN").
> 
> However with a 32KB stack size to begin with, the doubling leads to a
> 64KB stack, which causes build errors:
>   arch/powerpc/kernel/switch.S:249: Error: operand out of range
> (0xfe50 is not between 0x8000 and
> 0x7fff)
> 
> Although the asm could be reworked, in practice a 32KB stack seems
> sufficient even for KASAN builds - the additional usage seems to be
> in
> the 2-3KB range for a 64-bit KASAN build.
> 
> So only increase the stack for KASAN if the stack size is < 32KB.
> 
> Link:
> https://lore.kernel.org/linuxppc-dev/bug-207129-206...@https.bugzilla.kernel.org%2F/
> Reported-by: Spoorthy 
> Reported-by: Benjamin Gray 
> Fixes: 18f14afe2816 ("powerpc/64s: Increase default stack size to
> 32KB")
> Signed-off-by: Michael Ellerman 

Reviewed-by: Benjamin Gray 

> ---
>  arch/powerpc/include/asm/thread_info.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/thread_info.h
> b/arch/powerpc/include/asm/thread_info.h
> index bf5dde1a4114..15c5691dd218 100644
> --- a/arch/powerpc/include/asm/thread_info.h
> +++ b/arch/powerpc/include/asm/thread_info.h
> @@ -14,7 +14,7 @@
>  
>  #ifdef __KERNEL__
>  
> -#ifdef CONFIG_KASAN
> +#if defined(CONFIG_KASAN) && CONFIG_THREAD_SHIFT < 15
>  #define MIN_THREAD_SHIFT (CONFIG_THREAD_SHIFT + 1)
>  #else
>  #define MIN_THREAD_SHIFT CONFIG_THREAD_SHIFT




[PATCH] powerpc64/kasan: Pass virtual addresses to kasan_init_phys_region()

2024-02-11 Thread Benjamin Gray
The kasan_init_phys_region() function maps shadow pages necessary for
the ranges of the linear map backed by physical pages. Currently
kasan_init_phys_region() is being passed physical addresses, but
kasan_mem_to_shadow() expects virtual addresses.

It works right now because the lower bits (12:64) of the
kasan_mem_to_shadow() calculation are the same for the real and virtual
addresses, so the actual PTE value is the same in the end. But virtual
addresses are the intended input, so fix it.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/mm/kasan/init_book3e_64.c | 2 +-
 arch/powerpc/mm/kasan/init_book3s_64.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/kasan/init_book3e_64.c 
b/arch/powerpc/mm/kasan/init_book3e_64.c
index 11519e88dc6b..43c03b84ff32 100644
--- a/arch/powerpc/mm/kasan/init_book3e_64.c
+++ b/arch/powerpc/mm/kasan/init_book3e_64.c
@@ -112,7 +112,7 @@ void __init kasan_init(void)
pte_t zero_pte = pfn_pte(virt_to_pfn(kasan_early_shadow_page), 
PAGE_KERNEL_RO);
 
for_each_mem_range(i, &start, &end)
-   kasan_init_phys_region((void *)start, (void *)end);
+   kasan_init_phys_region(phys_to_virt(start), phys_to_virt(end));
 
if (IS_ENABLED(CONFIG_KASAN_VMALLOC))
kasan_remove_zero_shadow((void *)VMALLOC_START, VMALLOC_SIZE);
diff --git a/arch/powerpc/mm/kasan/init_book3s_64.c 
b/arch/powerpc/mm/kasan/init_book3s_64.c
index 9300d641cf9a..3fb5ce4f48f4 100644
--- a/arch/powerpc/mm/kasan/init_book3s_64.c
+++ b/arch/powerpc/mm/kasan/init_book3s_64.c
@@ -62,7 +62,7 @@ void __init kasan_init(void)
}
 
for_each_mem_range(i, &start, &end)
-   kasan_init_phys_region((void *)start, (void *)end);
+   kasan_init_phys_region(phys_to_virt(start), phys_to_virt(end));
 
for (i = 0; i < PTRS_PER_PTE; i++)
__set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
-- 
2.43.0



Re: [PATCH v2 1/3] powerpc/code-patching: Add generic memory patching

2024-02-04 Thread Benjamin Gray
On Thu, 2023-11-30 at 15:55 +0530, Naveen N Rao wrote:
> On Mon, Oct 16, 2023 at 04:01:45PM +1100, Benjamin Gray wrote:
> > patch_instruction() is designed for patching instructions in
> > otherwise
> > readonly memory. Other consumers also sometimes need to patch
> > readonly
> > memory, so have abused patch_instruction() for arbitrary data
> > patches.
> > 
> > This is a problem on ppc64 as patch_instruction() decides on the
> > patch
> > width using the 'instruction' opcode to see if it's a prefixed
> > instruction. Data that triggers this can lead to larger writes,
> > possibly
> > crossing a page boundary and failing the write altogether.
> > 
> > Introduce patch_uint(), and patch_ulong(), with aliases
> > patch_u32(), and
> > patch_u64() (on ppc64) designed for aligned data patches. The patch
> > size is now determined by the called function, and is passed as an
> > additional parameter to generic internals.
> > 
> > While the instruction flushing is not required for data patches,
> > the
> > use cases for data patching (mainly module loading and static
> > calls)
> > are less performance sensitive than for instruction patching
> > (ftrace activation).
> 
> That's debatable. While it is nice to be able to activate function 
> tracing quickly, it is not necessarily a hot path. On the flip side,
> I 
> do have a use case for data patching for ftrace activation :)
> 

True, but it's still correct to do so at least. Having a different path
for data writes is definitely a possibility, but would be added for
performance. This series is motivated by fixing a correctness issue
with data write sizes.
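
To make that concrete, a dedicated data path would essentially be the
existing write minus the icache flush, roughly as below (hypothetical
sketch only; __patch_data() is a made-up name, not something this series
adds):

static int __patch_data(void *patch_addr, unsigned long val, bool is_dword)
{
        if (!IS_ENABLED(CONFIG_PPC64) || likely(!is_dword)) {
                u32 val32 = val;

                __put_kernel_nofault(patch_addr, &val32, u32, failed);
        } else {
                __put_kernel_nofault(patch_addr, &val, u64, failed);
        }

        /* order the store, but skip flush_icache_range() for pure data */
        smp_wmb();

        return 0;

failed:
        return -EPERM;
}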

> > So the instruction flushing remains unconditional
> > in this patch.
> > 
> > ppc32 does not support prefixed instructions, so is unaffected by
> > the
> > original issue. Care is taken in not exposing the size parameter in
> > the
> > public (non-static) interface, so the compiler can const-propagate
> > it
> > away.
> > 
> > Signed-off-by: Benjamin Gray 
> > 
> > ---
> > 
> > v2: * Deduplicate patch_32() definition
> >     * Use u32 for val32
> >     * Remove noinline
> > ---
> >  arch/powerpc/include/asm/code-patching.h | 33 
> >  arch/powerpc/lib/code-patching.c | 66 ++--
> > 
> >  2 files changed, 83 insertions(+), 16 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/code-patching.h
> > b/arch/powerpc/include/asm/code-patching.h
> > index 3f881548fb61..7c6056bb1706 100644
> > --- a/arch/powerpc/include/asm/code-patching.h
> > +++ b/arch/powerpc/include/asm/code-patching.h
> > @@ -75,6 +75,39 @@ int patch_branch(u32 *addr, unsigned long
> > target, int flags);
> >  int patch_instruction(u32 *addr, ppc_inst_t instr);
> >  int raw_patch_instruction(u32 *addr, ppc_inst_t instr);
> >  
> > +/*
> > + * patch_uint() and patch_ulong() should only be called on
> > addresses where the
> > + * patch does not cross a cacheline, otherwise it may not be
> > flushed properly
> > + * and mixes of new and stale data may be observed. It cannot
> > cross a page
> > + * boundary, as only the target page is mapped as writable.
> 
> Should we enforce alignment requirements, especially for
> patch_ulong() 
> on 64-bit powerpc? I am not sure if there are use cases for unaligned
> 64-bit writes. That should also ensure that the write doesn't cross a
> cacheline.

Yeah, the current description is more just the technical restrictions,
not an endorsement of usage. If the caller isn't working with aligned
data, it seems unlikely it would still be cacheline aligned. The caller
should break it into 32bit patches if this is the case.

By enforce, are you suggesting a WARN_ON in the code too?
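
e.g. something roughly like this (hypothetical sketch; the helper name is
made up and this is not in the series as posted):

static int check_patch_alignment(const void *addr, size_t size)
{
        /* reject data patches that could straddle a cacheline or page */
        if (WARN_ON_ONCE(!IS_ALIGNED((unsigned long)addr, size)))
                return -EINVAL;

        return 0;
}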

> > + *
> > + * patch_instruction() and other instruction patchers
> > automatically satisfy this
> > + * requirement due to instruction alignment requirements.
> > + */
> > +
> > +#ifdef CONFIG_PPC64
> > +
> > +int patch_uint(void *addr, unsigned int val);
> > +int patch_ulong(void *addr, unsigned long val);
> > +
> > +#define patch_u64 patch_ulong
> > +
> > +#else
> > +
> > +static inline int patch_uint(u32 *addr, unsigned int val)
> 
> Is there a reason to use u32 * here?
> 

No, fixed to void* for next series

> > +{
> > +   return patch_instruction(addr, ppc_inst(val));
> > +}
> > +
> > +static inline int patch_ulong(void *addr, unsigned long val)
> > +{
> > +   return patch_instruction(addr, ppc_inst(

Re: [PATCH v2 0/3] Add generic data patching functions

2023-10-17 Thread Benjamin Gray

On 17/10/23 5:39 pm, Christophe Leroy wrote:

Le 16/10/2023 à 07:01, Benjamin Gray a écrit :

Currently patch_instruction() bases the write length on the value being
written. If the value looks like a prefixed instruction it writes 8 bytes,
otherwise it writes 4 bytes. This makes it potentially buggy to use for
writing arbitrary data, as if you want to write 4 bytes but it decides to
write 8 bytes it may clobber the following memory or be unaligned and
trigger an oops if it tries to cross a page boundary.

To solve this, this series pulls out the size parameter to the 'top' of
the text patching logic, and propagates it through the various functions.

The two sizes supported are int and long; this allows for patching
instructions and pointers on both ppc32 and ppc64. On ppc32 these are the
same size, so care is taken to only use the size parameter on static
functions, so the compiler can optimise it out entirely. Unfortunately
GCC trips over its own feet here and won't optimise in a way that is
optimal for strict RWX (mpc85xx_smp_defconfig) and no RWX
(pmac32_defconfig).

In the first case, patch_memory() is very large and can only be inlined
if a single function calls it. While the source only calls it in
patch_instruction(), an earlier optimisation pass inlined
patch_instruction() into patch_branch(), so now there are 'two' references
to patch_memory() and it cannot be inlined into patch_instruction().
Instead patch_instruction() becomes a single branch directly to
patch_memory().

We can fix this by marking patch_instruction() as noinline, but this
prevents patch_memory() from being directly inlined into patch_branch()
when RWX is disabled and patch_memory() is very small.

It may be possible to avoid this by merging together patch_instruction()
and patch_memory() on ppc32, but the only way I can think to do this
without duplicating the implementation involves using the preprocessor
to change if is_dword is a parameter or a local variable, which is gross.


What about:

diff --git a/arch/powerpc/include/asm/code-patching.h
b/arch/powerpc/include/asm/code-patching.h
index 7c6056bb1706..af89ef450c93 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -72,7 +72,7 @@ static inline int create_branch(ppc_inst_t *instr,
const u32 *addr,
   int create_cond_branch(ppc_inst_t *instr, const u32 *addr,
   unsigned long target, int flags);
   int patch_branch(u32 *addr, unsigned long target, int flags);
-int patch_instruction(u32 *addr, ppc_inst_t instr);
+int patch_memory(void *addr, unsigned long val, bool is_dword);
   int raw_patch_instruction(u32 *addr, ppc_inst_t instr);

   /*
@@ -87,24 +87,28 @@ int raw_patch_instruction(u32 *addr, ppc_inst_t instr);

   #ifdef CONFIG_PPC64

-int patch_uint(void *addr, unsigned int val);
-int patch_ulong(void *addr, unsigned long val);
+int patch_instruction(u32 *addr, ppc_inst_t instr);

   #define patch_u64 patch_ulong

   #else

-static inline int patch_uint(u32 *addr, unsigned int val)
+static inline int patch_instruction(u32 *addr, ppc_inst_t instr)
   {
-   return patch_instruction(addr, ppc_inst(val));
+   return patch_memory(addr, ppc_inst_val(instr), false);
   }

+#endif
+
   static inline int patch_ulong(void *addr, unsigned long val)
   {
-   return patch_instruction(addr, ppc_inst(val));
+   return patch_memory(addr, val, true);
   }

-#endif
+static inline int patch_uint(void *addr, unsigned int val)
+{
+   return patch_memory(addr, val, false);
+}

   #define patch_u32 patch_uint

diff --git a/arch/powerpc/lib/code-patching.c
b/arch/powerpc/lib/code-patching.c
index 60289332412f..77418b2a4aa4 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -355,7 +355,7 @@ static int __do_patch_memory(void *addr, unsigned
long val, bool is_dword)
return err;
   }

-static int patch_memory(void *addr, unsigned long val, bool is_dword)
+int patch_memory(void *addr, unsigned long val, bool is_dword)
   {
int err;
unsigned long flags;
@@ -378,6 +378,7 @@ static int patch_memory(void *addr, unsigned long
val, bool is_dword)

return err;
   }
+NOKPROBE_SYMBOL(patch_memory)

   #ifdef CONFIG_PPC64

@@ -390,26 +391,6 @@ int patch_instruction(u32 *addr, ppc_inst_t instr)
   }
   NOKPROBE_SYMBOL(patch_instruction)

-int patch_uint(void *addr, unsigned int val)
-{
-   return patch_memory(addr, val, false);
-}
-NOKPROBE_SYMBOL(patch_uint)
-
-int patch_ulong(void *addr, unsigned long val)
-{
-   return patch_memory(addr, val, true);
-}
-NOKPROBE_SYMBOL(patch_ulong)
-
-#else
-
-int patch_instruction(u32 *addr, ppc_inst_t instr)
-{
-   return patch_memory(addr, ppc_inst_val(instr), false);
-}
-NOKPROBE_SYMBOL(patch_instruction)
-
   #endif

   int patch_branch(u32 *addr, unsigned long target, int flags)



Wouldn't every caller need to initialise the is_dword parameter in that
case? It can't tell it's unused across compilation units.

[PATCH v2 3/3] powerpc/32: Convert patch_instruction() to patch_uint()

2023-10-15 Thread Benjamin Gray
These changes are for patch_instruction() uses on data. Unlike ppc64
these should not be incorrect as-is, but using the patch_uint() alias
better reflects what kind of data being patched and allows for
benchmarking the effect of different patch_* implementations (e.g.,
skipping instruction flushing when patching data).

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/kernel/static_call.c | 2 +-
 arch/powerpc/platforms/powermac/smp.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/static_call.c 
b/arch/powerpc/kernel/static_call.c
index 863a7aa24650..1502b7e439ca 100644
--- a/arch/powerpc/kernel/static_call.c
+++ b/arch/powerpc/kernel/static_call.c
@@ -17,7 +17,7 @@ void arch_static_call_transform(void *site, void *tramp, void 
*func, bool tail)
mutex_lock(&text_mutex);
 
if (func && !is_short) {
-   err = patch_instruction(tramp + PPC_SCT_DATA, ppc_inst(target));
+   err = patch_ulong(tramp + PPC_SCT_DATA, target);
if (err)
goto out;
}
diff --git a/arch/powerpc/platforms/powermac/smp.c 
b/arch/powerpc/platforms/powermac/smp.c
index c83d1e14077e..cfc1cd10135d 100644
--- a/arch/powerpc/platforms/powermac/smp.c
+++ b/arch/powerpc/platforms/powermac/smp.c
@@ -827,7 +827,7 @@ static int smp_core99_kick_cpu(int nr)
mdelay(1);
 
/* Restore our exception vector */
-   patch_instruction(vector, ppc_inst(save_vector));
+   patch_uint(vector, save_vector);
 
local_irq_restore(flags);
if (ppc_md.progress) ppc_md.progress("smp_core99_kick_cpu done", 0x347);
-- 
2.39.2



[PATCH v2 2/3] powerpc/64: Convert patch_instruction() to patch_u32()

2023-10-15 Thread Benjamin Gray
This use of patch_instruction() is working on 32 bit data, and can fail
if the data looks like a prefixed instruction and the extra write
crosses a page boundary. Use patch_u32() to fix the write size.

Fixes: 8734b41b3efe ("powerpc/module_64: Fix livepatching for RO modules")
Link: https://lore.kernel.org/all/20230203004649.1f59dbd4@yea/
Signed-off-by: Benjamin Gray 

---

v2: * Added the fixes tag, it seems appropriate even if the subject does
  mention a more robust solution being required.

patch_u64() should be more efficient, but judging from the bug report
it doesn't seem like the data is doubleword aligned.
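
For comparison, with a doubleword-aligned descriptor the loop changed below
could hypothetically patch 8 bytes at a time instead (sketch only, not part
of this patch):

        for (i = 0; i < sizeof(func_desc_t) / sizeof(u64); i++) {
                if (patch_u64(((u64 *)&entry->funcdata) + i, ((u64 *)&desc)[i]))
                        return 0;
        }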
---
 arch/powerpc/kernel/module_64.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 7112adc597a8..e9bab599d0c2 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -651,12 +651,11 @@ static inline int create_stub(const Elf64_Shdr *sechdrs,
// func_desc_t is 8 bytes if ABIv2, else 16 bytes
desc = func_desc(addr);
for (i = 0; i < sizeof(func_desc_t) / sizeof(u32); i++) {
-   if (patch_instruction(((u32 *)&entry->funcdata) + i,
- ppc_inst(((u32 *)&desc)[i])))
+   if (patch_u32(((u32 *)&entry->funcdata) + i, ((u32 *)&desc)[i]))
return 0;
}
 
-   if (patch_instruction(&entry->magic, ppc_inst(STUB_MAGIC)))
+   if (patch_u32(&entry->magic, STUB_MAGIC))
return 0;
 
return 1;
-- 
2.39.2



[PATCH v2 1/3] powerpc/code-patching: Add generic memory patching

2023-10-15 Thread Benjamin Gray
patch_instruction() is designed for patching instructions in otherwise
readonly memory. Other consumers also sometimes need to patch readonly
memory, so have abused patch_instruction() for arbitrary data patches.

This is a problem on ppc64 as patch_instruction() decides on the patch
width using the 'instruction' opcode to see if it's a prefixed
instruction. Data that triggers this can lead to larger writes, possibly
crossing a page boundary and failing the write altogether.

Introduce patch_uint(), and patch_ulong(), with aliases patch_u32(), and
patch_u64() (on ppc64) designed for aligned data patches. The patch
size is now determined by the called function, and is passed as an
additional parameter to generic internals.

While the instruction flushing is not required for data patches, the
use cases for data patching (mainly module loading and static calls)
are less performance sensitive than for instruction patching
(ftrace activation). So the instruction flushing remains unconditional
in this patch.

ppc32 does not support prefixed instructions, so is unaffected by the
original issue. Care is taken in not exposing the size parameter in the
public (non-static) interface, so the compiler can const-propagate it
away.

Signed-off-by: Benjamin Gray 

---

v2: * Deduplicate patch_32() definition
* Use u32 for val32
* Remove noinline
---
 arch/powerpc/include/asm/code-patching.h | 33 
 arch/powerpc/lib/code-patching.c | 66 ++--
 2 files changed, 83 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/code-patching.h 
b/arch/powerpc/include/asm/code-patching.h
index 3f881548fb61..7c6056bb1706 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -75,6 +75,39 @@ int patch_branch(u32 *addr, unsigned long target, int flags);
 int patch_instruction(u32 *addr, ppc_inst_t instr);
 int raw_patch_instruction(u32 *addr, ppc_inst_t instr);
 
+/*
+ * patch_uint() and patch_ulong() should only be called on addresses where the
+ * patch does not cross a cacheline, otherwise it may not be flushed properly
+ * and mixes of new and stale data may be observed. It cannot cross a page
+ * boundary, as only the target page is mapped as writable.
+ *
+ * patch_instruction() and other instruction patchers automatically satisfy 
this
+ * requirement due to instruction alignment requirements.
+ */
+
+#ifdef CONFIG_PPC64
+
+int patch_uint(void *addr, unsigned int val);
+int patch_ulong(void *addr, unsigned long val);
+
+#define patch_u64 patch_ulong
+
+#else
+
+static inline int patch_uint(u32 *addr, unsigned int val)
+{
+   return patch_instruction(addr, ppc_inst(val));
+}
+
+static inline int patch_ulong(void *addr, unsigned long val)
+{
+   return patch_instruction(addr, ppc_inst(val));
+}
+
+#endif
+
+#define patch_u32 patch_uint
+
 static inline unsigned long patch_site_addr(s32 *site)
 {
return (unsigned long)site + *site;
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index b00112d7ad46..60289332412f 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -20,15 +20,14 @@
 #include 
 #include 
 
-static int __patch_instruction(u32 *exec_addr, ppc_inst_t instr, u32 
*patch_addr)
+static int __patch_memory(void *exec_addr, unsigned long val, void *patch_addr,
+ bool is_dword)
 {
-   if (!ppc_inst_prefixed(instr)) {
-   u32 val = ppc_inst_val(instr);
+   if (!IS_ENABLED(CONFIG_PPC64) || likely(!is_dword)) {
+   u32 val32 = val;
 
-   __put_kernel_nofault(patch_addr, &val, u32, failed);
+   __put_kernel_nofault(patch_addr, &val32, u32, failed);
} else {
-   u64 val = ppc_inst_as_ulong(instr);
-
__put_kernel_nofault(patch_addr, &val, u64, failed);
}
 
@@ -43,7 +42,10 @@ static int __patch_instruction(u32 *exec_addr, ppc_inst_t 
instr, u32 *patch_addr
 
 int raw_patch_instruction(u32 *addr, ppc_inst_t instr)
 {
-   return __patch_instruction(addr, instr, addr);
+   if (ppc_inst_prefixed(instr))
+   return __patch_memory(addr, ppc_inst_as_ulong(instr), addr, 
true);
+   else
+   return __patch_memory(addr, ppc_inst_val(instr), addr, false);
 }
 
 struct patch_context {
@@ -278,7 +280,7 @@ static void unmap_patch_area(unsigned long addr)
flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
 }
 
-static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
+static int __do_patch_memory_mm(void *addr, unsigned long val, bool is_dword)
 {
int err;
u32 *patch_addr;
@@ -307,7 +309,7 @@ static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t 
instr)
 
orig_mm = start_using_temp_mm(patching_mm);
 
-   err = __patch_instruction(addr, instr, patch_addr);
+   err = __patch_memory(addr, val, patch_addr, is_dword);
 
/* hwsync performed by __patch_instruction

[PATCH v2 0/3] Add generic data patching functions

2023-10-15 Thread Benjamin Gray
Currently patch_instruction() bases the write length on the value being
written. If the value looks like a prefixed instruction it writes 8 bytes,
otherwise it writes 4 bytes. This makes it potentially buggy to use for
writing arbitrary data, as if you want to write 4 bytes but it decides to
write 8 bytes it may clobber the following memory or be unaligned and
trigger an oops if it tries to cross a page boundary.

To solve this, this series pulls out the size parameter to the 'top' of
the text patching logic, and propagates it through the various functions.

The two sizes supported are int and long; this allows for patching
instructions and pointers on both ppc32 and ppc64. On ppc32 these are the
same size, so care is taken to only use the size parameter on static
functions, so the compiler can optimise it out entirely. Unfortunately
GCC trips over its own feet here and won't optimise in a way that is
optimal for strict RWX (mpc85xx_smp_defconfig) and no RWX
(pmac32_defconfig).

In the first case, patch_memory() is very large and can only be inlined
if a single function calls it. While the source only calls it in
patch_instruction(), an earlier optimisation pass inlined
patch_instruction() into patch_branch(), so now there are 'two' references
to patch_memory() and it cannot be inlined into patch_instruction().
Instead patch_instruction() becomes a single branch directly to
patch_memory().

We can fix this by marking patch_instruction() as noinline, but this
prevents patch_memory() from being directly inlined into patch_branch()
when RWX is disabled and patch_memory() is very small.

It may be possible to avoid this by merging together patch_instruction()
and patch_memory() on ppc32, but the only way I can think to do this
without duplicating the implementation involves using the preprocessor
to change if is_dword is a parameter or a local variable, which is gross.
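
Concretely, that preprocessor approach would be something along these lines
(hypothetical sketch, never implemented; PATCH_MEMORY_PARAMS is a made-up
name, and the body is elided to the final __do_patch_memory() call for
brevity):

/* ppc32: is_dword becomes a constant local that folds away entirely;
 * ppc64: it stays a real parameter. */
#ifdef CONFIG_PPC64
#define PATCH_MEMORY_PARAMS     void *addr, unsigned long val, bool is_dword
#else
#define PATCH_MEMORY_PARAMS     void *addr, unsigned long val
#endif

static int patch_memory(PATCH_MEMORY_PARAMS)
{
#ifndef CONFIG_PPC64
        const bool is_dword = false;
#endif
        return __do_patch_memory(addr, val, is_dword);
}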

For now I've removed the noinline, because at least the compiler might
get smarter in future and do the inlines correctly. If noinline remains
then there is no chance of it working.

Changes from v1:
  * Addressed the v1 review actions
  * Removed noinline (for now)

v1: 
https://patchwork.ozlabs.org/project/linuxppc-dev/cover/20230207015643.590684-1-bg...@linux.ibm.com/

Benjamin Gray (3):
  powerpc/code-patching: Add generic memory patching
  powerpc/64: Convert patch_instruction() to patch_u32()
  powerpc/32: Convert patch_instruction() to patch_uint()

 arch/powerpc/include/asm/code-patching.h | 33 
 arch/powerpc/kernel/module_64.c  |  5 +-
 arch/powerpc/kernel/static_call.c|  2 +-
 arch/powerpc/lib/code-patching.c | 66 ++--
 arch/powerpc/platforms/powermac/smp.c|  2 +-
 5 files changed, 87 insertions(+), 21 deletions(-)

-- 
2.39.2


[PATCH 06/12] powerpc: Annotate endianness of various variables and functions

2023-10-11 Thread Benjamin Gray
Sparse reports several endianness warnings on variables and functions
that are consistently treated as big endian. There are no
multi-endianness shenanigans going on here so fix these low hanging
fruit up in one patch.

All changes are just type annotations; no endianness switching
operations are introduced by this patch.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h|  2 +-
 arch/powerpc/include/asm/imc-pmu.h  | 16 
 arch/powerpc/kernel/prom_init.c |  2 +-
 arch/powerpc/kexec/core_64.c|  4 ++--
 arch/powerpc/kexec/file_load_64.c   |  6 +++---
 arch/powerpc/mm/drmem.c |  2 +-
 arch/powerpc/perf/hv-24x7.c |  2 +-
 arch/powerpc/perf/imc-pmu.c |  9 +
 arch/powerpc/platforms/powermac/feature.c   |  3 ++-
 arch/powerpc/platforms/pseries/hotplug-memory.c |  3 ++-
 10 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 5c497c862d75..7b0d0f9d343a 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -647,7 +647,7 @@ static inline pte_t pte_mkuser(pte_t pte)
  */
 static inline int pte_devmap(pte_t pte)
 {
-   u64 mask = cpu_to_be64(_PAGE_DEVMAP | _PAGE_PTE);
+   __be64 mask = cpu_to_be64(_PAGE_DEVMAP | _PAGE_PTE);
 
return (pte_raw(pte) & mask) == mask;
 }
diff --git a/arch/powerpc/include/asm/imc-pmu.h 
b/arch/powerpc/include/asm/imc-pmu.h
index 699a88584ae1..a656635df386 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -74,14 +74,14 @@ struct imc_events {
  * The following is the data structure to hold trace imc data.
  */
 struct trace_imc_data {
-   u64 tb1;
-   u64 ip;
-   u64 val;
-   u64 cpmc1;
-   u64 cpmc2;
-   u64 cpmc3;
-   u64 cpmc4;
-   u64 tb2;
+   __be64 tb1;
+   __be64 ip;
+   __be64 val;
+   __be64 cpmc1;
+   __be64 cpmc2;
+   __be64 cpmc3;
+   __be64 cpmc4;
+   __be64 tb2;
 };
 
 /* Event attribute array index */
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index d464ba412084..e67effdba85c 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -947,7 +947,7 @@ struct option_vector7 {
 } __packed;
 
 struct ibm_arch_vec {
-   struct { u32 mask, val; } pvrs[14];
+   struct { __be32 mask, val; } pvrs[14];
 
u8 num_vectors;
 
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index a79e28c91e2b..0bee7ca9a77c 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -379,8 +379,8 @@ void default_machine_kexec(struct kimage *image)
 
 #ifdef CONFIG_PPC_64S_HASH_MMU
 /* Values we need to export to the second kernel via the device tree. */
-static unsigned long htab_base;
-static unsigned long htab_size;
+static __be64 htab_base;
+static __be64 htab_size;
 
 static struct property htab_base_prop = {
.name = "linux,htab-base",
diff --git a/arch/powerpc/kexec/file_load_64.c 
b/arch/powerpc/kexec/file_load_64.c
index 09187aca3d1f..961a6dd67365 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -32,7 +32,7 @@
 #include 
 
 struct umem_info {
-   u64 *buf;   /* data buffer for usable-memory property */
+   __be64 *buf;/* data buffer for usable-memory property */
u32 size;   /* size allocated for the data buffer */
u32 max_entries;/* maximum no. of entries */
u32 idx;/* index of current entry */
@@ -443,10 +443,10 @@ static int locate_mem_hole_bottom_up_ppc64(struct 
kexec_buf *kbuf,
  *
  * Returns buffer on success, NULL on error.
  */
-static u64 *check_realloc_usable_mem(struct umem_info *um_info, int cnt)
+static __be64 *check_realloc_usable_mem(struct umem_info *um_info, int cnt)
 {
u32 new_size;
-   u64 *tbuf;
+   __be64 *tbuf;
 
if ((um_info->idx + cnt) <= um_info->max_entries)
return um_info->buf;
diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
index 2369d1bf2411..fde7790277f7 100644
--- a/arch/powerpc/mm/drmem.c
+++ b/arch/powerpc/mm/drmem.c
@@ -67,7 +67,7 @@ static int drmem_update_dt_v1(struct device_node *memory,
struct property *new_prop;
struct of_drconf_cell_v1 *dr_cell;
struct drmem_lmb *lmb;
-   u32 *p;
+   __be32 *p;
 
new_prop = clone_property(prop, prop->length);
if (!new_prop)
diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 3449be7c0d51..057ec2e3451d 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -1338,7 +1338,7 @@ static int get_count_from_result(struct perf_event *

[PATCH 07/12] powerpc/kvm: Force cast endianness of KVM shared regs

2023-10-10 Thread Benjamin Gray
Sparse reports endianness mismatches in the KVM shared regs getter and
setter helpers.

This code has dynamic endianness behind a safe interface, so a force is
warranted here to tell sparse this is OK.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/include/asm/kvm_ppc.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index b4da8514af43..b547bcf1a995 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -943,18 +943,18 @@ static inline void kvmppc_set_##reg(struct kvm_vcpu 
*vcpu, ulong val) \
 static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu)  \
 {  \
if (kvmppc_shared_big_endian(vcpu)) \
-  return be##size##_to_cpu(vcpu->arch.shared->reg);\
+   return be##size##_to_cpu((__be##size 
__force)vcpu->arch.shared->reg);   \
else\
-  return le##size##_to_cpu(vcpu->arch.shared->reg);\
+   return le##size##_to_cpu((__le##size 
__force)vcpu->arch.shared->reg);   \
 }  \
 
 #define SHARED_WRAPPER_SET(reg, size)  \
 static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)
\
 {  \
if (kvmppc_shared_big_endian(vcpu)) \
-  vcpu->arch.shared->reg = cpu_to_be##size(val);   \
+   vcpu->arch.shared->reg = (u##size __force)cpu_to_be##size(val); 
\
else\
-  vcpu->arch.shared->reg = cpu_to_le##size(val);   \
+   vcpu->arch.shared->reg = (u##size __force)cpu_to_le##size(val); 
\
 }  \
 
 #define SHARED_WRAPPER(reg, size)  \
-- 
2.39.2



[PATCH 12/12] powerpc/fadump: Annotate endianness cast with __force

2023-10-10 Thread Benjamin Gray
Sparse reports an endianness error with the else case of

  val = (cpu_endian ? be64_to_cpu(reg_entry->reg_val) :
 (u64)(reg_entry->reg_val));

This is a safe operation because the code is explicitly working with
dynamic endianness, so add the __force annotation to tell Sparse to
ignore it.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/platforms/powernv/opal-fadump.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/opal-fadump.h 
b/arch/powerpc/platforms/powernv/opal-fadump.h
index 3f715efb0aa6..5eeb794b5eb1 100644
--- a/arch/powerpc/platforms/powernv/opal-fadump.h
+++ b/arch/powerpc/platforms/powernv/opal-fadump.h
@@ -135,7 +135,7 @@ static inline void opal_fadump_read_regs(char *bufp, 
unsigned int regs_cnt,
for (i = 0; i < regs_cnt; i++, bufp += reg_entry_size) {
reg_entry = (struct hdat_fadump_reg_entry *)bufp;
val = (cpu_endian ? be64_to_cpu(reg_entry->reg_val) :
-  (u64)(reg_entry->reg_val));
+  (u64 __force)(reg_entry->reg_val));
opal_fadump_set_regval_regnum(regs,
  be32_to_cpu(reg_entry->reg_type),
  be32_to_cpu(reg_entry->reg_num),
-- 
2.39.2



[PATCH 09/12] powerpc/uaccess: Cast away __user annotation after verification

2023-10-10 Thread Benjamin Gray
Sparse reports dereference of a __user pointer. copy_mc_to_user() takes
a __user pointer, verifies it, then calls the generic copy routine
copy_mc_generic().

As we have verified the pointer, cast out the __user annotation when
passing to copy_mc_generic().

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/include/asm/uaccess.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/uaccess.h 
b/arch/powerpc/include/asm/uaccess.h
index fb725ec77926..f1f9890f50d3 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -374,7 +374,7 @@ copy_mc_to_user(void __user *to, const void *from, unsigned 
long n)
if (check_copy_size(from, n, true)) {
if (access_ok(to, n)) {
allow_write_to_user(to, n);
-   n = copy_mc_generic((void *)to, from, n);
+   n = copy_mc_generic((void __force *)to, from, n);
prevent_write_to_user(to, n);
}
}
-- 
2.39.2



[PATCH 11/12] powerpc/eeh: Remove unnecessary cast

2023-10-10 Thread Benjamin Gray
Sparse reports a warning when casting to an int. There is no need to
cast in the first place, so drop them.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/kernel/eeh_driver.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 438568a472d0..48773d2d9be3 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -39,7 +39,7 @@ static int eeh_result_priority(enum pci_ers_result result)
case PCI_ERS_RESULT_NEED_RESET:
return 6;
default:
-   WARN_ONCE(1, "Unknown pci_ers_result value: %d\n", (int)result);
+   WARN_ONCE(1, "Unknown pci_ers_result value: %d\n", result);
return 0;
}
 };
@@ -60,7 +60,7 @@ static const char *pci_ers_result_name(enum pci_ers_result 
result)
case PCI_ERS_RESULT_NO_AER_DRIVER:
return "no AER driver";
default:
-   WARN_ONCE(1, "Unknown result type: %d\n", (int)result);
+   WARN_ONCE(1, "Unknown result type: %d\n", result);
return "unknown";
}
 };
-- 
2.39.2



[PATCH 10/12] powerpc: Cast away __iomem in low level IO routines

2023-10-10 Thread Benjamin Gray
Sparse reports dereferencing an __iomem pointer. These routines
are clearly low level handlers for IO memory, so force cast away
the __iomem annotation to tell sparse the dereferences are safe.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/kernel/io.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/io.c b/arch/powerpc/kernel/io.c
index 2f29b7d432de..6af535905984 100644
--- a/arch/powerpc/kernel/io.c
+++ b/arch/powerpc/kernel/io.c
@@ -33,7 +33,7 @@ void _insb(const volatile u8 __iomem *port, void *buf, long 
count)
return;
asm volatile("sync");
do {
-   tmp = *port;
+   tmp = *(const volatile u8 __force *)port;
eieio();
*tbuf++ = tmp;
} while (--count != 0);
@@ -49,7 +49,7 @@ void _outsb(volatile u8 __iomem *port, const void *buf, long 
count)
return;
asm volatile("sync");
do {
-   *port = *tbuf++;
+   *(volatile u8 __force *)port = *tbuf++;
} while (--count != 0);
asm volatile("sync");
 }
@@ -64,7 +64,7 @@ void _insw_ns(const volatile u16 __iomem *port, void *buf, 
long count)
return;
asm volatile("sync");
do {
-   tmp = *port;
+   tmp = *(const volatile u16 __force *)port;
eieio();
*tbuf++ = tmp;
} while (--count != 0);
@@ -80,7 +80,7 @@ void _outsw_ns(volatile u16 __iomem *port, const void *buf, 
long count)
return;
asm volatile("sync");
do {
-   *port = *tbuf++;
+   *(volatile u16 __force *)port = *tbuf++;
} while (--count != 0);
asm volatile("sync");
 }
@@ -95,7 +95,7 @@ void _insl_ns(const volatile u32 __iomem *port, void *buf, 
long count)
return;
asm volatile("sync");
do {
-   tmp = *port;
+   tmp = *(const volatile u32 __force *)port;
eieio();
*tbuf++ = tmp;
} while (--count != 0);
@@ -111,7 +111,7 @@ void _outsl_ns(volatile u32 __iomem *port, const void *buf, 
long count)
return;
asm volatile("sync");
do {
-   *port = *tbuf++;
+   *(volatile u32 __force *)port = *tbuf++;
} while (--count != 0);
asm volatile("sync");
 }
-- 
2.39.2



[PATCH 08/12] powerpc/opal: Annotate out param endianness

2023-10-10 Thread Benjamin Gray
Sparse reports an endian mismatch with args to opal_int_get_xirr().
Checking the skiboot source[1] shows the function takes a __be32*
(as expected), so update the function declaration to reflect this.

[1]: https://github.com/open-power/skiboot/blob/80e2b1dc73/hw/xive.c#L3479

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/include/asm/opal.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index a9b31cc258fc..b66b0c615f4f 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -227,7 +227,7 @@ int64_t opal_pci_set_power_state(uint64_t async_token, 
uint64_t id,
 uint64_t data);
 int64_t opal_pci_poll2(uint64_t id, uint64_t data);
 
-int64_t opal_int_get_xirr(uint32_t *out_xirr, bool just_poll);
+int64_t opal_int_get_xirr(__be32 *out_xirr, bool just_poll);
 int64_t opal_int_set_cppr(uint8_t cppr);
 int64_t opal_int_eoi(uint32_t xirr);
 int64_t opal_int_set_mfrr(uint32_t cpu, uint8_t mfrr);
-- 
2.39.2



[PATCH 05/12] powerpc: Remove extern from function implementations

2023-10-10 Thread Benjamin Gray
Sparse reports several function implementations annotated with extern.
This is clearly incorrect, likely just copied from an actual extern
declaration in another file.

Fix the sparse warnings by removing extern.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/kernel/iommu.c  | 8 
 arch/powerpc/kernel/traps.c  | 4 ++--
 arch/powerpc/kvm/book3s_64_vio.c | 8 
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 14251bc5219e..3e28579f7c62 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1074,10 +1074,10 @@ int iommu_tce_check_gpa(unsigned long page_shift, 
unsigned long gpa)
 }
 EXPORT_SYMBOL_GPL(iommu_tce_check_gpa);
 
-extern long iommu_tce_xchg_no_kill(struct mm_struct *mm,
-   struct iommu_table *tbl,
-   unsigned long entry, unsigned long *hpa,
-   enum dma_data_direction *direction)
+long iommu_tce_xchg_no_kill(struct mm_struct *mm,
+   struct iommu_table *tbl,
+   unsigned long entry, unsigned long *hpa,
+   enum dma_data_direction *direction)
 {
long ret;
unsigned long size = 0;
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 64ff37721fd0..12d128238cfe 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -157,7 +157,7 @@ static int die_owner = -1;
 static unsigned int die_nest_count;
 static int die_counter;
 
-extern void panic_flush_kmsg_start(void)
+void panic_flush_kmsg_start(void)
 {
/*
 * These are mostly taken from kernel/panic.c, but tries to do
@@ -170,7 +170,7 @@ extern void panic_flush_kmsg_start(void)
bust_spinlocks(1);
 }
 
-extern void panic_flush_kmsg_end(void)
+void panic_flush_kmsg_end(void)
 {
kmsg_dump(KMSG_DUMP_PANIC);
bust_spinlocks(0);
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 93b695b289e9..15200d766fc5 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -77,8 +77,8 @@ static void kvm_spapr_tce_liobn_put(struct kref *kref)
call_rcu(&stit->rcu, kvm_spapr_tce_iommu_table_free);
 }
 
-extern void kvm_spapr_tce_release_iommu_group(struct kvm *kvm,
-   struct iommu_group *grp)
+void kvm_spapr_tce_release_iommu_group(struct kvm *kvm,
+  struct iommu_group *grp)
 {
int i;
struct kvmppc_spapr_tce_table *stt;
@@ -105,8 +105,8 @@ extern void kvm_spapr_tce_release_iommu_group(struct kvm 
*kvm,
rcu_read_unlock();
 }
 
-extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
-   struct iommu_group *grp)
+long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
+ struct iommu_group *grp)
 {
struct kvmppc_spapr_tce_table *stt = NULL;
bool found = false;
-- 
2.39.2



[PATCH 04/12] powerpc: Use NULL instead of 0 for null pointers

2023-10-10 Thread Benjamin Gray
Sparse reports several uses of 0 for pointer arguments and comparisons.
Replace them with NULL to better convey the intent. Where the pointer is
only compared against 0, drop the comparison entirely and rely on the
kernel style of implicit boolean conversions.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/kernel/setup_64.c| 2 +-
 arch/powerpc/kvm/book3s_xive_native.c | 2 +-
 arch/powerpc/net/bpf_jit_comp.c   | 8 
 arch/powerpc/perf/imc-pmu.c   | 2 +-
 arch/powerpc/platforms/4xx/soc.c  | 2 +-
 arch/powerpc/platforms/pseries/lpar.c | 8 
 6 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 246201d0d879..2f19d5e94485 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -364,7 +364,7 @@ void __init early_setup(unsigned long dt_ptr)
 */
initialise_paca(&boot_paca, 0);
fixup_boot_paca(&boot_paca);
-   WARN_ON(local_paca != 0);
+   WARN_ON(local_paca);
setup_paca(&boot_paca); /* install the paca into registers */
 
/*  printk is now safe to use --- */
diff --git a/arch/powerpc/kvm/book3s_xive_native.c 
b/arch/powerpc/kvm/book3s_xive_native.c
index 712ab91ced39..6e2ebbd8aaac 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -567,7 +567,7 @@ static int kvmppc_xive_native_set_queue_config(struct 
kvmppc_xive *xive,
u8 priority;
struct kvm_ppc_xive_eq kvm_eq;
int rc;
-   __be32 *qaddr = 0;
+   __be32 *qaddr = NULL;
struct page *page;
struct xive_q *q;
gfn_t gfn;
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 37043dfc1add..316cadf442b2 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -119,7 +119,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
 
/* Scouting faux-generate pass 0 */
-   if (bpf_jit_build_body(fp, 0, &cgctx, addrs, 0, false)) {
+   if (bpf_jit_build_body(fp, NULL, &cgctx, addrs, 0, false)) {
/* We hit something illegal or unsupported. */
fp = org_fp;
goto out_addrs;
@@ -134,7 +134,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 */
if (cgctx.seen & SEEN_TAILCALL || 
!is_offset_in_branch_range((long)cgctx.idx * 4)) {
cgctx.idx = 0;
-   if (bpf_jit_build_body(fp, 0, &cgctx, addrs, 0, false)) {
+   if (bpf_jit_build_body(fp, NULL, &cgctx, addrs, 0, false)) {
fp = org_fp;
goto out_addrs;
}
@@ -146,9 +146,9 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 * update ctgtx.idx as it pretends to output instructions, then we can
 * calculate total size from idx.
 */
-   bpf_jit_build_prologue(0, &cgctx);
+   bpf_jit_build_prologue(NULL, &cgctx);
addrs[fp->len] = cgctx.idx * 4;
-   bpf_jit_build_epilogue(0, &cgctx);
+   bpf_jit_build_epilogue(NULL, &cgctx);
 
fixup_len = fp->aux->num_exentries * BPF_FIXUP_LEN * 4;
extable_len = fp->aux->num_exentries * sizeof(struct 
exception_table_entry);
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 9d229ef7f86e..eba2baabccb0 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -544,7 +544,7 @@ static int nest_imc_event_init(struct perf_event *event)
break;
}
pcni++;
-   } while (pcni->vbase != 0);
+   } while (pcni->vbase);
 
if (!flag)
return -ENODEV;
diff --git a/arch/powerpc/platforms/4xx/soc.c b/arch/powerpc/platforms/4xx/soc.c
index b2d940437a66..5412e6b21e10 100644
--- a/arch/powerpc/platforms/4xx/soc.c
+++ b/arch/powerpc/platforms/4xx/soc.c
@@ -112,7 +112,7 @@ static int __init ppc4xx_l2c_probe(void)
}
 
/* Install error handler */
-   if (request_irq(irq, l2c_error_handler, 0, "L2C", 0) < 0) {
+   if (request_irq(irq, l2c_error_handler, 0, "L2C", NULL) < 0) {
printk(KERN_ERR "Cannot install L2C error handler"
   ", cache is not enabled\n");
of_node_put(np);
diff --git a/arch/powerpc/platforms/pseries/lpar.c 
b/arch/powerpc/platforms/pseries/lpar.c
index f2cb62148f36..b4e86b7ea584 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -192,9 +192,9 @@ static void free_dtl_buffers(unsigned long *time_limit)
continue;
kmem_cache_free(dtl_cache, pp->dispatch_log);
pp->dtl_ridx = 0;
-   pp->dispatch_log = 0;
-   pp->dispatch_log_end = 0;
-   pp->dtl_curr = 0;
+   pp->dispat

[PATCH 00/12] Miscellaneous Sparse fixes

2023-10-10 Thread Benjamin Gray
There are many Sparse warnings in the kernel, including the powerpc
directory. This series provides fixes for some low-hanging fruit found
when trying to triage the warnings earlier this year. It addresses about
100 warnings (many have the same root cause).

I know there are concerns about making it harder to backport things. In
general, as someone who was not around during the development of these
features, I think it is useful to make the annotations as correct as
possible. But it's no fuss if some/all of the patches are nacked for this
reason. I just figured some of it might be useful instead of continuing to
sit on it indefinitely.

Benjamin Gray (12):
  powerpc/xive: Fix endian conversion size
  powerpc/pseries: Restructure hvc_get_chars() endianness
  powerpc: Explicitly reverse bytes when checking for byte reversal
  powerpc: Use NULL instead of 0 for null pointers
  powerpc: Remove extern from function implementations
  powerpc: Annotate endianness of various variables and functions
  powerpc/kvm: Force cast endianness of KVM shared regs
  powerpc/opal: Annotate out param endianness
  powerpc/uaccess: Cast away __user annotation after verification
  powerpc: Cast away __iomem in low level IO routines
  powerpc/eeh: Remove unnecessary cast
  powerpc/fadump: Annotate endianness cast with __force

 arch/powerpc/include/asm/book3s/64/pgtable.h|  2 +-
 arch/powerpc/include/asm/imc-pmu.h  | 16 
 arch/powerpc/include/asm/kvm_ppc.h  |  8 
 arch/powerpc/include/asm/opal.h |  2 +-
 arch/powerpc/include/asm/uaccess.h  |  2 +-
 arch/powerpc/kernel/eeh_driver.c|  4 ++--
 arch/powerpc/kernel/io.c| 12 ++--
 arch/powerpc/kernel/iommu.c |  8 
 arch/powerpc/kernel/prom_init.c |  2 +-
 arch/powerpc/kernel/setup_64.c  |  2 +-
 arch/powerpc/kernel/traps.c |  4 ++--
 arch/powerpc/kexec/core_64.c|  4 ++--
 arch/powerpc/kexec/file_load_64.c   |  6 +++---
 arch/powerpc/kvm/book3s_64_vio.c|  8 
 arch/powerpc/kvm/book3s_xive_native.c   |  2 +-
 arch/powerpc/mm/drmem.c |  2 +-
 arch/powerpc/net/bpf_jit_comp.c |  8 
 arch/powerpc/perf/hv-24x7.c |  2 +-
 arch/powerpc/perf/imc-pmu.c | 11 ++-
 arch/powerpc/platforms/4xx/soc.c|  2 +-
 arch/powerpc/platforms/powermac/feature.c   |  3 ++-
 arch/powerpc/platforms/powernv/opal-fadump.h|  2 +-
 arch/powerpc/platforms/pseries/hotplug-memory.c |  3 ++-
 arch/powerpc/platforms/pseries/hvconsole.c  |  6 +++---
 arch/powerpc/platforms/pseries/lpar.c   |  8 
 arch/powerpc/sysdev/mpic.c  |  2 +-
 arch/powerpc/sysdev/xive/native.c   |  2 +-
 27 files changed, 68 insertions(+), 65 deletions(-)

-- 
2.39.2


[PATCH 01/12] powerpc/xive: Fix endian conversion size

2023-10-10 Thread Benjamin Gray
Sparse reports a size mismatch in the endian swap. The Opal
implementation[1] passes the value as a __be64, and the receiving
variable out_qsize is a u64, so the use of be32_to_cpu() appears to be
an error.

[1]: https://github.com/open-power/skiboot/blob/80e2b1dc73/hw/xive.c#L3854

Fixes: 88ec6b93c8e7 ("powerpc/xive: add OPAL extensions for the XIVE native 
exploitation support")
Signed-off-by: Benjamin Gray 
---
 arch/powerpc/sysdev/xive/native.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/sysdev/xive/native.c 
b/arch/powerpc/sysdev/xive/native.c
index 9f0af4d795d8..f1c0fa6ece21 100644
--- a/arch/powerpc/sysdev/xive/native.c
+++ b/arch/powerpc/sysdev/xive/native.c
@@ -802,7 +802,7 @@ int xive_native_get_queue_info(u32 vp_id, u32 prio,
if (out_qpage)
*out_qpage = be64_to_cpu(qpage);
if (out_qsize)
-   *out_qsize = be32_to_cpu(qsize);
+   *out_qsize = be64_to_cpu(qsize);
if (out_qeoi_page)
*out_qeoi_page = be64_to_cpu(qeoi_page);
if (out_escalate_irq)
-- 
2.39.2



[PATCH 03/12] powerpc: Explicitly reverse bytes when checking for byte reversal

2023-10-10 Thread Benjamin Gray
Sparse reports an invalid endian cast here. The code is written for
big endian platforms, so le32_to_cpu() acts as a byte reversal.

This file is checked by sparse on a little endian build though, so
replace the reverse function with the dedicated swab32() function to
better express the intent of the code.
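
As a quick illustration (not part of the patch), with an arbitrary mask
value:

  /*
   * le32_to_cpu(0x00ff0000) == 0x0000ff00 on big endian    (byte reversal)
   * le32_to_cpu(0x00ff0000) == 0x00ff0000 on little endian (no-op)
   * swab32(0x00ff0000)      == 0x0000ff00 on either
   *
   * The code only runs on big endian platforms, so swab32() preserves its
   * behaviour there while stating the byte reversal explicitly and
   * carrying no endian annotation for sparse to warn about.
   */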

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/sysdev/mpic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/sysdev/mpic.c b/arch/powerpc/sysdev/mpic.c
index ba287abcb008..dabbdd356664 100644
--- a/arch/powerpc/sysdev/mpic.c
+++ b/arch/powerpc/sysdev/mpic.c
@@ -355,7 +355,7 @@ static void __init mpic_test_broken_ipi(struct mpic *mpic)
mpic_write(mpic->gregs, MPIC_INFO(GREG_IPI_VECTOR_PRI_0), 
MPIC_VECPRI_MASK);
r = mpic_read(mpic->gregs, MPIC_INFO(GREG_IPI_VECTOR_PRI_0));
 
-   if (r == le32_to_cpu(MPIC_VECPRI_MASK)) {
+   if (r == swab32(MPIC_VECPRI_MASK)) {
printk(KERN_INFO "mpic: Detected reversed IPI registers\n");
mpic->flags |= MPIC_BROKEN_IPI;
}
-- 
2.39.2



[PATCH 02/12] powerpc/pseries: Restructure hvc_get_chars() endianness

2023-10-10 Thread Benjamin Gray
Sparse reports an endian mismatch in hvc_get_chars().

At first it seemed like the retbuf should be __be64[], but actually
retbuf holds serialized registers returned by the hypervisor call, so
it's correctly CPU endian typed.

Instead, it is the be64_to_cpu() that's misleading. The intent is to do
a byte swap on a little endian CPU. The swap is required because we
wanted to store the register values to memory without 'swapping' bytes,
so that the high order byte of the first register is the first byte
in the result buffer.

In diagram form, on a LE CPU with the letters representing the return
string bytes:

(register bytes) A B C D E F G H   I J K L M N O P
  (retbuf mem bytes) H G F E D C B A   P O N M L K J I
(buf/lbuf mem bytes) A B C D E F G H   I J K L M N O P

So retbuf stores the registers in CPU endian, and buf always stores in
big endian.

So replace the byte swap function with cpu_to_be64() and cast lbuf as an
array of __be64 to match the semantics closer.
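
To make the diagram concrete, a minimal sketch of the first register on a
little endian CPU (illustration only, using the kernel's cpu_to_be64() and
__be64; not part of the patch):

  char buf[16];
  __be64 *lbuf = (__be64 *)buf;
  unsigned long reg = 0x4142434445464748UL; /* hypervisor returned "ABCDEFGH" */

  /* A raw store of reg on LE would write 0x48, 0x47, ... = "HGFEDCBA". */
  lbuf[0] = cpu_to_be64(reg);               /* writes 0x41, 0x42, ... = "ABCDEFGH" */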

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/platforms/pseries/hvconsole.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hvconsole.c 
b/arch/powerpc/platforms/pseries/hvconsole.c
index 1ac52963e08b..647718a15e78 100644
--- a/arch/powerpc/platforms/pseries/hvconsole.c
+++ b/arch/powerpc/platforms/pseries/hvconsole.c
@@ -29,11 +29,11 @@ int hvc_get_chars(uint32_t vtermno, char *buf, int count)
 {
long ret;
unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
-   unsigned long *lbuf = (unsigned long *)buf;
+   __be64 *lbuf = (__be64 __force *)buf;
 
ret = plpar_hcall(H_GET_TERM_CHAR, retbuf, vtermno);
-   lbuf[0] = be64_to_cpu(retbuf[1]);
-   lbuf[1] = be64_to_cpu(retbuf[2]);
+   lbuf[0] = cpu_to_be64(retbuf[1]);
+   lbuf[1] = cpu_to_be64(retbuf[2]);
 
if (ret == H_SUCCESS)
return retbuf[0];
-- 
2.39.2



[RFC PATCH 5/6] powerpc/dexcr: Add sysctl entry for SBHE system override

2023-10-09 Thread Benjamin Gray
The DEXCR Speculative Branch Hint Enable (SBHE) aspect controls whether
the hints provided by BO field of Branch instructions are obeyed during
speculative execution.

SBHE behaviour per ISA 3.1B:

0:  The hints provided by BO field of Branch instructions may be
ignored during speculative execution

1:  The hints provided by BO field of Branch instructions are obeyed
during speculative execution

Add a sysctl entry to allow changing this aspect globally in the system
at runtime:

/proc/sys/kernel/speculative_branch_hint_enable

Three values are supported:

-1: Disable DEXCR SBHE sysctl override
 0: Override and set DEXCR[SBHE] aspect to 0
 1: Override and set DEXCR[SBHE] aspect to 1

Internally, this introduces a mechanism to apply arbitrary system wide
overrides on top of the prctl() config.
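
For reference, a hypothetical userspace sketch of driving the override by
writing to the sysctl file added here (requires CAP_SYS_ADMIN and a CPU
with the SBHE aspect; not part of the patch):

  #include <stdio.h>

  /* val is -1 (no override), 0 (force aspect to 0) or 1 (force aspect to 1) */
  static int set_sbhe_override(int val)
  {
          FILE *f = fopen("/proc/sys/kernel/speculative_branch_hint_enable", "w");

          if (!f)
                  return -1;
          fprintf(f, "%d\n", val);
          return fclose(f);
  }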

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/include/asm/processor.h |  8 +--
 arch/powerpc/kernel/dexcr.c  | 85 
 2 files changed, 89 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index a9d83621dfad..e7b732efb968 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -461,14 +461,14 @@ int exit_vmx_usercopy(void);
 int enter_vmx_ops(void);
 void *exit_vmx_ops(void *dest);
 
-static inline unsigned long get_thread_dexcr(struct thread_struct const 
*thread)
-{
 #ifdef CONFIG_PPC_BOOK3S_64
-   return thread->dexcr_enabled;
+unsigned long get_thread_dexcr(struct thread_struct const *thread);
 #else
+static inline unsigned long get_thread_dexcr(struct thread_struct const 
*thread)
+{
return 0;
-#endif
 }
+#endif
 
 #endif /* __KERNEL__ */
 #endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/kernel/dexcr.c b/arch/powerpc/kernel/dexcr.c
index db663ce7b3ce..e790f76787db 100644
--- a/arch/powerpc/kernel/dexcr.c
+++ b/arch/powerpc/kernel/dexcr.c
@@ -1,7 +1,9 @@
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -16,6 +18,8 @@
 
 static unsigned long dexcr_supported __ro_after_init = 0;
 
+static int spec_branch_hint_enable = -1;
+
 static int __init dexcr_init(void)
 {
if (!early_cpu_has_feature(CPU_FTR_ARCH_31))
@@ -37,6 +41,35 @@ static int __init dexcr_init(void)
 }
 early_initcall(dexcr_init);
 
+unsigned long get_thread_dexcr(struct thread_struct const *thread)
+{
+   unsigned long dexcr = thread->dexcr_enabled;
+
+   /* 
+* spec_branch_hint_enable may be written to concurrently via sysctl.
+* The sysctl handler is careful to use WRITE_ONCE, so we avoid
+* tearing/different values with READ_ONCE 
+*/
+   switch (READ_ONCE(spec_branch_hint_enable)) {
+   case 0:
+   dexcr &= ~DEXCR_PR_SBHE;
+   break;
+   case 1:
+   dexcr |= DEXCR_PR_SBHE;
+   break;
+   }
+
+   return dexcr;
+}
+
+static void update_dexcr_on_cpu(void *_info)
+{
+   /* ensure the spec_branch_hint_enable write propagated to this CPU */
+   smp_mb();
+
+   mtspr(SPRN_DEXCR, get_thread_dexcr(&current->thread));
+}
+
 static int prctl_to_aspect(unsigned long which, unsigned int *aspect)
 {
switch (which) {
@@ -126,3 +159,55 @@ int set_dexcr_prctl(struct task_struct *task, unsigned 
long which, unsigned long
 
return 0;
 }
+
+#ifdef CONFIG_SYSCTL
+
+static const int min_sysctl_val = -1;
+
+static int sysctl_dexcr_sbhe_handler(struct ctl_table *table, int write,
+void *buf, size_t *lenp, loff_t *ppos)
+{
+   int err = 0;
+   int prev;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   if (!cpu_has_feature(CPU_FTR_DEXCR_SBHE))
+   return -ENODEV;
+
+   prev = READ_ONCE(spec_branch_hint_enable);
+   
+   err = proc_dointvec_minmax(table, write, buf, lenp, ppos);
+   if (err)
+   return err;
+
+   if (spec_branch_hint_enable != prev && write)
+   on_each_cpu(update_dexcr_on_cpu, NULL, 1);
+
+   return 0;
+}
+
+static struct ctl_table dexcr_sbhe_ctl_table[] = {
+   {
+   .procname   = "speculative_branch_hint_enable",
+   .data   = &spec_branch_hint_enable,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = sysctl_dexcr_sbhe_handler,
+   .extra1 = (void *)&min_sysctl_val,
+   .extra2 = SYSCTL_ONE,
+   }
+};
+
+static int __init register_dexcr_aspects_sysctl(void)
+{
+   if (!early_cpu_has_feature(CPU_FTR_DEXCR_SBHE))
+   return -ENODEV;
+
+   register_sysctl("kernel", dexcr_sbhe_ctl_table);
+   return 0;
+}
+device_initcall(register_dexcr_aspects_sysctl);
+
+#endif /* CONFIG_SYSCTL */
-- 
2.41.0



[RFC PATCH 6/6] powerpc/dexcr: Add enforced userspace ROP protection config

2023-10-08 Thread Benjamin Gray
The DEXCR Non-Privileged Hash Instruction Enable (NPHIE) aspect controls
whether the hashst and hashchk instructions are treated as no-ops by the
CPU.

NPHIE behaviour per ISA 3.1B:

0:  hashst and hashchk instructions are executed as no-ops
(even when allowed by PCR)

1:  hashst and hashchk instructions are executed normally
(if allowed by PCR)

Currently this aspect may be set per-process by prctl() or enforced
globally by the hypervisor.

Add a kernel config option PPC_USER_ENFORCE_ROP_PROTECT to enforce
DEXCR[NPHIE] globally regardless of prctl() or hypervisor. If set, don't
report NPHIE as editable via prctl(), as the prctl() value can never
take effect.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/Kconfig| 5 +
 arch/powerpc/kernel/dexcr.c | 7 ++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 3aaadfd2c8eb..4851cb463dc0 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -473,6 +473,11 @@ config PGTABLE_LEVELS
default 2 if !PPC64
default 4
 
+config PPC_USER_ENFORCE_ROP_PROTECT
+   bool
+   depends on PPC_BOOK3S_64
+   default y
+
 source "arch/powerpc/sysdev/Kconfig"
 source "arch/powerpc/platforms/Kconfig"
 
diff --git a/arch/powerpc/kernel/dexcr.c b/arch/powerpc/kernel/dexcr.c
index e790f76787db..01d8fa28ca17 100644
--- a/arch/powerpc/kernel/dexcr.c
+++ b/arch/powerpc/kernel/dexcr.c
@@ -1,6 +1,8 @@
+#include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -14,7 +16,7 @@
 #define DEXCR_PRCTL_EDITABLE ( \
DEXCR_PR_IBRTPD | \
DEXCR_PR_SRAPD | \
-   DEXCR_PR_NPHIE)
+   (!IS_ENABLED(CONFIG_PPC_USER_ENFORCE_ROP_PROTECT) ? DEXCR_PR_NPHIE : 0))
 
 static unsigned long dexcr_supported __ro_after_init = 0;
 
@@ -45,6 +47,9 @@ unsigned long get_thread_dexcr(struct thread_struct const 
*thread)
 {
unsigned long dexcr = thread->dexcr_enabled;
 
+   if (IS_ENABLED(CONFIG_PPC_USER_ENFORCE_ROP_PROTECT))
+   dexcr |= DEXCR_PR_NPHIE;
+
/* 
 * spec_branch_hint_enable may be written to concurrently via sysctl.
 * The sysctl handler is careful to use WRITE_ONCE, so we avoid
-- 
2.41.0



[RFC PATCH 2/6] powerpc/dexcr: Add thread specific DEXCR configuration

2023-10-08 Thread Benjamin Gray
Add capability to track a DEXCR value per thread.

Nothing actually changes these values yet, but they are correctly
tracked, propagated, and used to set the hardware register.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/include/asm/processor.h | 12 
 arch/powerpc/kernel/process.c| 24 
 2 files changed, 36 insertions(+)

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index b2c51d337e60..28a72023f9bd 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -260,6 +260,9 @@ struct thread_struct {
unsigned long   sier2;
unsigned long   sier3;
unsigned long   hashkeyr;
+   unsigned int dexcr;  /* Temporary value saved during thread 
switch */
+   unsigned int dexcr_enabled;  /* Bitmask of aspects enabled by this 
thread */
+   unsigned int dexcr_inherit;  /* Bitmask of aspects to inherit across 
exec */
 
 #endif
 };
@@ -448,6 +451,15 @@ int exit_vmx_usercopy(void);
 int enter_vmx_ops(void);
 void *exit_vmx_ops(void *dest);
 
+static inline unsigned long get_thread_dexcr(struct thread_struct const 
*thread)
+{
+#ifdef CONFIG_PPC_BOOK3S_64
+   return thread->dexcr_enabled;
+#else
+   return 0;
+#endif
+}
+
 #endif /* __KERNEL__ */
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_PROCESSOR_H */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index b68898ac07e1..c05d58b7c6b3 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1185,6 +1185,9 @@ static inline void save_sprs(struct thread_struct *t)
 
if (cpu_has_feature(CPU_FTR_DEXCR_NPHIE))
t->hashkeyr = mfspr(SPRN_HASHKEYR);
+
+   if (cpu_has_feature(CPU_FTR_ARCH_31))
+   t->dexcr = mfspr(SPRN_DEXCR);
 #endif
 }
 
@@ -1267,6 +1270,10 @@ static inline void restore_sprs(struct thread_struct 
*old_thread,
if (cpu_has_feature(CPU_FTR_DEXCR_NPHIE) &&
old_thread->hashkeyr != new_thread->hashkeyr)
mtspr(SPRN_HASHKEYR, new_thread->hashkeyr);
+
+   if (cpu_has_feature(CPU_FTR_ARCH_31) &&
+   old_thread->dexcr != get_thread_dexcr(new_thread))
+   mtspr(SPRN_DEXCR, get_thread_dexcr(new_thread));
 #endif
 
 }
@@ -1634,6 +1641,11 @@ void arch_setup_new_exec(void)
current->thread.regs->amr  = default_amr;
current->thread.regs->iamr  = default_iamr;
 #endif
+
+#ifdef CONFIG_PPC_BOOK3S_64
+   current->thread.dexcr_enabled &= current->thread.dexcr_inherit;
+   current->thread.dexcr_enabled |= ~current->thread.dexcr_inherit & 
DEXCR_INIT; 
+#endif
 }
 
 #ifdef CONFIG_PPC64
@@ -1878,6 +1890,11 @@ int copy_thread(struct task_struct *p, const struct 
kernel_clone_args *args)
 #ifdef CONFIG_PPC_BOOK3S_64
if (cpu_has_feature(CPU_FTR_DEXCR_NPHIE))
p->thread.hashkeyr = current->thread.hashkeyr;
+
+   if (cpu_has_feature(CPU_FTR_ARCH_31)) {
+   p->thread.dexcr_enabled = current->thread.dexcr_enabled;
+   p->thread.dexcr_inherit = current->thread.dexcr_inherit;
+   }
 #endif
return 0;
 }
@@ -2000,6 +2017,13 @@ void start_thread(struct pt_regs *regs, unsigned long 
start, unsigned long sp)
current->thread.hashkeyr = get_random_long();
mtspr(SPRN_HASHKEYR, current->thread.hashkeyr);
}
+
+   if (cpu_has_feature(CPU_FTR_ARCH_31)) {
+   current->thread.dexcr = 0;
+   current->thread.dexcr_enabled = DEXCR_INIT;
+   current->thread.dexcr_inherit = 0;
+   mtspr(SPRN_DEXCR, get_thread_dexcr(&current->thread));
+   }
 #endif /* CONFIG_PPC_BOOK3S_64 */
 }
 EXPORT_SYMBOL(start_thread);
-- 
2.41.0



[RFC PATCH 1/6] powerpc/dexcr: Make all aspects CPU features

2023-10-08 Thread Benjamin Gray
The CPU_FEATURE_* mechanism is the only way right now to get
configuration from the "ibm,pa-features" devicetree node. Add a
CPU_FEATURE_* entry for each other DEXCR aspect that will be
exposed to userspace.

The NPHIE feature value is changed for consistency; the actual value is
never accessed or exposed to userspace, so there is no breakage.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/include/asm/cputable.h | 6 +-
 arch/powerpc/kernel/prom.c  | 3 +++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index 8765d5158324..bd087d0cb5fa 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -192,7 +192,10 @@ static inline void cpu_feature_keys_init(void) { }
 #define CPU_FTR_P9_RADIX_PREFETCH_BUG  LONG_ASM_CONST(0x0002)
 #define CPU_FTR_ARCH_31
LONG_ASM_CONST(0x0004)
 #define CPU_FTR_DAWR1  LONG_ASM_CONST(0x0008)
-#define CPU_FTR_DEXCR_NPHIE LONG_ASM_CONST(0x0010)
+#define CPU_FTR_DEXCR_SBHE LONG_ASM_CONST(0x0010)
+#define CPU_FTR_DEXCR_IBRTPD   LONG_ASM_CONST(0x0020)
+#define CPU_FTR_DEXCR_SRAPD LONG_ASM_CONST(0x0040)
+#define CPU_FTR_DEXCR_NPHIE LONG_ASM_CONST(0x0080)
 
 #ifndef __ASSEMBLY__
 
@@ -453,6 +456,7 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
CPU_FTR_ARCH_300 | CPU_FTR_ARCH_31 | \
CPU_FTR_DAWR | CPU_FTR_DAWR1 | \
+   CPU_FTR_DEXCR_SBHE | CPU_FTR_DEXCR_IBRTPD | CPU_FTR_DEXCR_SRAPD | \
CPU_FTR_DEXCR_NPHIE)
 #define CPU_FTRS_CELL  (CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 0b5878c3125b..ea081a5d2023 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -182,6 +182,9 @@ static struct ibm_feature ibm_pa_features[] __initdata = {
  .cpu_user_ftrs2 = PPC_FEATURE2_HTM_COMP | PPC_FEATURE2_HTM_NOSC_COMP 
},
 
{ .pabyte = 64, .pabit = 0, .cpu_features = CPU_FTR_DAWR1 },
+   { .pabyte = 68, .pabit = 0, .cpu_features = CPU_FTR_DEXCR_SBHE },
+   { .pabyte = 68, .pabit = 3, .cpu_features = CPU_FTR_DEXCR_IBRTPD },
+   { .pabyte = 68, .pabit = 4, .cpu_features = CPU_FTR_DEXCR_SRAPD },
{ .pabyte = 68, .pabit = 5, .cpu_features = CPU_FTR_DEXCR_NPHIE },
 };
 
-- 
2.41.0



[RFC PATCH 4/6] powerpc/dexcr: Add prctl implementation

2023-10-08 Thread Benjamin Gray
Adds an initial prctl interface implementation. Unprivileged processes
can query the current prctl setting, including whether an aspect is
implemented by the hardware or is permitted to be modified by a setter
prctl. Editable aspects can be changed by a CAP_SYS_ADMIN privileged
process.

The prctl setting represents what the process itself has requested, and
does not account for any overrides. Either the kernel or a hypervisor
may enforce a different setting for an aspect.

Userspace can access a readonly view of the current DEXCR via SPR 812,
and a readonly view of the aspects enforced by the hypervisor via
SPR 455. A bitwise OR of these two SPRs will give the effective
DEXCR aspect state of the process.
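
For illustration, a userspace sketch of computing that effective state
(not part of the patch; the SPR numbers are the ones quoted above and the
mfspr instructions assume an ISA 3.1B / Power10 class CPU):

  static unsigned long effective_dexcr(void)
  {
          unsigned long udexcr, hdexcr;

          asm volatile("mfspr %0, 812" : "=r" (udexcr)); /* process DEXCR, read-only */
          asm volatile("mfspr %0, 455" : "=r" (hdexcr)); /* hypervisor-enforced aspects */

          return udexcr | hdexcr;
  }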

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/include/asm/processor.h |  10 +++
 arch/powerpc/kernel/Makefile |   1 +
 arch/powerpc/kernel/dexcr.c  | 128 +++
 3 files changed, 139 insertions(+)
 create mode 100644 arch/powerpc/kernel/dexcr.c

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 28a72023f9bd..a9d83621dfad 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -336,6 +336,16 @@ extern int set_endian(struct task_struct *tsk, unsigned 
int val);
 extern int get_unalign_ctl(struct task_struct *tsk, unsigned long adr);
 extern int set_unalign_ctl(struct task_struct *tsk, unsigned int val);
 
+#ifdef CONFIG_PPC_BOOK3S_64
+
+#define PPC_GET_DEXCR_ASPECT(tsk, asp) get_dexcr_prctl((tsk), (asp))
+#define PPC_SET_DEXCR_ASPECT(tsk, asp, val) set_dexcr_prctl((tsk), (asp), 
(val))
+
+int get_dexcr_prctl(struct task_struct *tsk, unsigned long asp);
+int set_dexcr_prctl(struct task_struct *tsk, unsigned long asp, unsigned long 
val);
+
+#endif
+
 extern void load_fp_state(struct thread_fp_state *fp);
 extern void store_fp_state(struct thread_fp_state *fp);
 extern void load_vr_state(struct thread_vr_state *vr);
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 2919433be355..24f82b09246c 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -88,6 +88,7 @@ obj-$(CONFIG_HAVE_HW_BREAKPOINT)  += hw_breakpoint.o
 obj-$(CONFIG_PPC_DAWR) += dawr.o
 obj-$(CONFIG_PPC_BOOK3S_64) += cpu_setup_ppc970.o cpu_setup_pa6t.o
 obj-$(CONFIG_PPC_BOOK3S_64) += cpu_setup_power.o
+obj-$(CONFIG_PPC_BOOK3S_64) += dexcr.o
 obj-$(CONFIG_PPC_BOOK3S_64) += mce.o mce_power.o
 obj-$(CONFIG_PPC_BOOK3E_64) += exceptions-64e.o idle_64e.o
 obj-$(CONFIG_PPC_BARRIER_NOSPEC) += security.o
diff --git a/arch/powerpc/kernel/dexcr.c b/arch/powerpc/kernel/dexcr.c
new file mode 100644
index ..db663ce7b3ce
--- /dev/null
+++ b/arch/powerpc/kernel/dexcr.c
@@ -0,0 +1,128 @@
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+/* Allow thread local configuration of these by default */
+#define DEXCR_PRCTL_EDITABLE ( \
+   DEXCR_PR_IBRTPD | \
+   DEXCR_PR_SRAPD | \
+   DEXCR_PR_NPHIE)
+
+static unsigned long dexcr_supported __ro_after_init = 0;
+
+static int __init dexcr_init(void)
+{
+   if (!early_cpu_has_feature(CPU_FTR_ARCH_31))
+   return 0;
+
+   if (early_cpu_has_feature(CPU_FTR_DEXCR_SBHE))
+   dexcr_supported |= DEXCR_PR_SBHE;
+
+   if (early_cpu_has_feature(CPU_FTR_DEXCR_IBRTPD))
+   dexcr_supported |= DEXCR_PR_IBRTPD;
+
+   if (early_cpu_has_feature(CPU_FTR_DEXCR_SRAPD))
+   dexcr_supported |= DEXCR_PR_SRAPD;
+
+   if (early_cpu_has_feature(CPU_FTR_DEXCR_NPHIE))
+   dexcr_supported |= DEXCR_PR_NPHIE;
+
+   return 0;
+}
+early_initcall(dexcr_init);
+
+static int prctl_to_aspect(unsigned long which, unsigned int *aspect)
+{
+   switch (which) {
+   case PR_PPC_DEXCR_SBHE:
+   *aspect = DEXCR_PR_SBHE;
+   break;
+   case PR_PPC_DEXCR_IBRTPD:
+   *aspect = DEXCR_PR_IBRTPD;
+   break;
+   case PR_PPC_DEXCR_SRAPD:
+   *aspect = DEXCR_PR_SRAPD;
+   break;
+   case PR_PPC_DEXCR_NPHIE:
+   *aspect = DEXCR_PR_NPHIE;
+   break;
+   default:
+   return -ENODEV;
+   }
+
+   return 0;
+}
+
+int get_dexcr_prctl(struct task_struct *task, unsigned long which)
+{
+   unsigned int aspect;
+   int ret;
+
+   ret = prctl_to_aspect(which, &aspect);
+   if (ret)
+   return ret;
+   
+   if (!(aspect & dexcr_supported))
+   return -ENODEV;
+
+   if (aspect & task->thread.dexcr_enabled)
+   ret |= PR_PPC_DEXCR_CTRL_ON;
+   else
+   ret |= PR_PPC_DEXCR_CTRL_OFF;
+
+   if (aspect & task->thread.dexcr_inherit)
+   ret |= PR_PPC_DEXCR_CTRL_INHERIT;
+
+   return ret;
+}
+
+int set_dexcr_prctl(struct task_struct *task, unsigned long which, unsigned 
long ctrl)
+{
+   u

[RFC PATCH 3/6] prctl: Define PowerPC DEXCR interface

2023-10-08 Thread Benjamin Gray
Adds the definitions and generic handler for prctl control of the
PowerPC Dynamic Execution Control Register (DEXCR).
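
A hypothetical usage sketch of the constants defined below (assumes the
patched uapi header is installed; the set operation is privileged per the
rest of this series; not part of the patch):

  #include <stdio.h>
  #include <sys/prctl.h>  /* pulls in the patched <linux/prctl.h> */

  int main(void)
  {
          int state = prctl(PR_PPC_GET_DEXCR, PR_PPC_DEXCR_NPHIE, 0, 0, 0);

          if (state < 0)
                  perror("PR_PPC_GET_DEXCR");
          else
                  printf("NPHIE is %s\n",
                         (state & PR_PPC_DEXCR_CTRL_ON) ? "on" : "off");

          /* Requires CAP_SYS_ADMIN in this RFC */
          if (prctl(PR_PPC_SET_DEXCR, PR_PPC_DEXCR_NPHIE, PR_PPC_DEXCR_CTRL_ON, 0, 0))
                  perror("PR_PPC_SET_DEXCR");

          return 0;
  }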

Signed-off-by: Benjamin Gray 
---
 include/uapi/linux/prctl.h | 13 +
 kernel/sys.c   | 16 
 2 files changed, 29 insertions(+)

diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 3c36aeade991..85d66ad134b1 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -305,4 +305,17 @@ struct prctl_mm_map {
 # define PR_RISCV_V_VSTATE_CTRL_NEXT_MASK  0xc
 # define PR_RISCV_V_VSTATE_CTRL_MASK   0x1f
 
+/* PowerPC Dynamic Execution Control Register (DEXCR) controls */
+#define PR_PPC_GET_DEXCR   71
+#define PR_PPC_SET_DEXCR   72
+/* DEXCR aspect to act on */
+# define PR_PPC_DEXCR_SBHE 0 /* Speculative branch hint enable */
+# define PR_PPC_DEXCR_IBRTPD   1 /* Indirect branch recurrent target 
prediction disable */
+# define PR_PPC_DEXCR_SRAPD    2 /* Subroutine return address 
prediction disable */
+# define PR_PPC_DEXCR_NPHIE    3 /* Non-privileged hash instruction 
enable */
+/* Action to apply / return */
+# define PR_PPC_DEXCR_CTRL_OFF (1UL << 0)
+# define PR_PPC_DEXCR_CTRL_ON  (1UL << 1)
+# define PR_PPC_DEXCR_CTRL_INHERIT (1UL << 2)
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 2410e3999ebe..0c1b8e9c3d16 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -146,6 +146,12 @@
 #ifndef RISCV_V_GET_CONTROL
 # define RISCV_V_GET_CONTROL() (-EINVAL)
 #endif
+#ifndef PPC_GET_DEXCR_ASPECT
+# define PPC_GET_DEXCR_ASPECT(a, b)    (-EINVAL)
+#endif
+#ifndef PPC_SET_DEXCR_ASPECT
+# define PPC_SET_DEXCR_ASPECT(a, b, c) (-EINVAL)
+#endif
 
 /*
  * this is where the system-wide overflow UID and GID are defined, for
@@ -2686,6 +2692,16 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, 
unsigned long, arg3,
case PR_GET_MDWE:
error = prctl_get_mdwe(arg2, arg3, arg4, arg5);
break;
+   case PR_PPC_GET_DEXCR:
+   if (arg3 || arg4 || arg5)
+   return -EINVAL;
+   error = PPC_GET_DEXCR_ASPECT(me, arg2);
+   break;
+   case PR_PPC_SET_DEXCR:
+   if (arg4 || arg5)
+   return -EINVAL;
+   error = PPC_SET_DEXCR_ASPECT(me, arg2, arg3);
+   break;
case PR_SET_VMA:
error = prctl_set_vma(arg2, arg3, arg4, arg5);
break;
-- 
2.41.0



[RFC PATCH 0/6] Add dynamic DEXCR support

2023-10-08 Thread Benjamin Gray
(This RFC is mainly to get feedback on the user interface. Tests and
documentation will be added to the non-rfc followups. This builds but
is otherwise untested.)

In the "Add static DEXCR support" series[1] the kernel was made to
initialise the DEXCR to a static value on all CPUs when they online.

This series allows the DEXCR value to be changed at runtime with a
per-thread granularity. It provides a prctl interface to set and query
this configuration. It also provides a system wide sysctl override for
the SBHE aspect, which is special in that its effects can bleed over to
other CPUs (temporarily, after a change) and certain tracing tools may
require it to be set globally across all threads.

Some notes on the patches/changes from the original RFC:

1. We don't need all the aspects to use feature bits, but the
   aspect information is in the device tree and this is the
   simplest mechanism to access it. Adding some kind of callback
   support to the feature detector would also work.

   The dexcr_supported variable introduced in patch 4 is a half-hearted
   example of how the callbacks could just update that variable, and
   no more CPU features would be necessary.

2. The thread used to track 'default' as a separate state (way back in
   the original RFC before the split into static/dynamic). This RFC
   simplifies it away, as it is only useful if there is a system wide
   default that can be changed. The current system wide default is
   decided at compile time, so we just initialise the thread config
   to that.

   If the 'default' state were added in future though, would that be
   a userspace ABI concern? I guess it could just return a 'default'
   flag as well as the current 'on' and 'off' flags to indicate what
   the default is.

3. The prctl controls are reduced to what I expect to be most useful.
   Default state is removed as above, and so is 'force' (where the aspect
   would no longer be configurable). 'inherit' remains as a way to control
   the DEXCR of child process trees that may not be aware of it.

4. The prctl set interface is privileged. The concern is a non-privileged
   process disabling NPHIE (HASHCHK enabler) and then invoking a setuid
   binary which doesn't set NPHIE itself. It seems that kind of 
   information about the exec target is not available in arch specific
   code.

5. A lot of the synchronization of the sysctl interface is removed.
   Apparently the kernfs system that manages these files enforces
   exclusive access to a given sysctl file. Additionally, the 
   proc_dointvec_minmax() function was made to store the result with 
   WRITE_ONCE(), so we can assume a regular atomic store of an aligned
   word.

6. The ROP protection enforcement is refactored a bit. The idea is to
   allow baking into the kernel at compile time that NPHIE cannot be
   changed by a thread. Seems to allow making the system more secure on
   paper, not sure how useful it is in practice.

7. The prctl interface tries to stay separate from the DEXCR structure.
   This makes it a little contorted (having to convert the prctl value to
   an aspect), but I think makes the interface more robust against future
   changes to the DEXCR. E.g., if all 32 aspect bits were exhausted and a
   second DEXCR added, the current interface could still handle that.


[1]: 
https://patchwork.ozlabs.org/project/linuxppc-dev/cover/20230616034846.311705-1-bg...@linux.ibm.com/


Benjamin Gray (6):
  powerpc/dexcr: Make all aspects CPU features
  powerpc/dexcr: Add thread specific DEXCR configuration
  prctl: Define PowerPC DEXCR interface
  powerpc/dexcr: Add prctl implementation
  powerpc/dexcr: Add sysctl entry for SBHE system override
  powerpc/dexcr: Add enforced userspace ROP protection config

 arch/powerpc/Kconfig |   5 +
 arch/powerpc/include/asm/cputable.h  |   6 +-
 arch/powerpc/include/asm/processor.h |  22 +++
 arch/powerpc/kernel/Makefile |   1 +
 arch/powerpc/kernel/dexcr.c  | 218 +++
 arch/powerpc/kernel/process.c|  24 +++
 arch/powerpc/kernel/prom.c   |   3 +
 include/uapi/linux/prctl.h   |  13 ++
 kernel/sys.c |  16 ++
 9 files changed, 307 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/kernel/dexcr.c

-- 
2.41.0


[PATCH v2] powerpc/dexcr: Move HASHCHK trap handler

2023-09-14 Thread Benjamin Gray
Syzkaller reported a sleep in atomic context bug relating to the HASHCHK
handler logic

  BUG: sleeping function called from invalid context at 
arch/powerpc/kernel/traps.c:1518
  in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 25040, name: 
syz-executor
  preempt_count: 0, expected: 0
  RCU nest depth: 0, expected: 0
  no locks held by syz-executor/25040.
  irq event stamp: 34
  hardirqs last  enabled at (33): [] 
prep_irq_for_enabled_exit arch/powerpc/kernel/interrupt.c:56 [inline]
  hardirqs last  enabled at (33): [] 
interrupt_exit_user_prepare_main+0x148/0x600 arch/powerpc/kernel/interrupt.c:230
  hardirqs last disabled at (34): [] 
interrupt_enter_prepare+0x144/0x4f0 arch/powerpc/include/asm/interrupt.h:176
  softirqs last  enabled at (0): [] 
copy_process+0x16e4/0x4750 kernel/fork.c:2436
  softirqs last disabled at (0): [<>] 0x0
  CPU: 15 PID: 25040 Comm: syz-executor Not tainted 
6.5.0-rc5-1-g3ccdff6bb06d #3
  Hardware name: IBM,9105-22A POWER10 (raw) 0x800200 0xf06 of:IBM,FW1040.00 
(NL1040_021) hv:phyp pSeries
  Call Trace:
  [c000a8247ce0] [c032b0e4] __might_resched+0x3b4/0x400 
kernel/sched/core.c:10189
  [c000a8247d80] [c08c7dc8] __might_fault+0xa8/0x170 
mm/memory.c:5853
  [c000a8247dc0] [c004160c] do_program_check+0x32c/0xb20 
arch/powerpc/kernel/traps.c:1518
  [c000a8247e50] [c0009b2c] program_check_common_virt+0x3bc/0x3c0

To determine if a trap was caused by a HASHCHK instruction, we inspect
the user instruction that triggered the trap. However this may sleep
if the page needs to be faulted in (get_user_instr() reaches
__get_user(), which calls might_fault() and triggers the bug message).

Move the HASHCHK handler logic to after we allow IRQs, which is fine
because we are only interested in HASHCHK if it's a user space trap.

Fixes: 5bcba4e6c13f ("powerpc/dexcr: Handle hashchk exception")
Signed-off-by: Benjamin Gray 
---
v1: https://lore.kernel.org/all/20230825014910.488822-1-bg...@linux.ibm.com/

v1 -> v2: Changed commit description to mention Syzkaller and bug output
---
 arch/powerpc/kernel/traps.c | 56 -
 1 file changed, 36 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index eeff136b83d9..64ff37721fd0 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1512,23 +1512,11 @@ static void do_program_check(struct pt_regs *regs)
return;
}
 
-   if (cpu_has_feature(CPU_FTR_DEXCR_NPHIE) && user_mode(regs)) {
-   ppc_inst_t insn;
-
-   if (get_user_instr(insn, (void __user *)regs->nip)) {
-   _exception(SIGSEGV, regs, SEGV_MAPERR, 
regs->nip);
-   return;
-   }
-
-   if (ppc_inst_primary_opcode(insn) == 31 &&
-   get_xop(ppc_inst_val(insn)) == OP_31_XOP_HASHCHK) {
-   _exception(SIGILL, regs, ILL_ILLOPN, regs->nip);
-   return;
-   }
+   /* User mode considers other cases after enabling IRQs */
+   if (!user_mode(regs)) {
+   _exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip);
+   return;
}
-
-   _exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip);
-   return;
}
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
if (reason & REASON_TM) {
@@ -1561,16 +1549,44 @@ static void do_program_check(struct pt_regs *regs)
 
/*
 * If we took the program check in the kernel skip down to sending a
-* SIGILL. The subsequent cases all relate to emulating instructions
-* which we should only do for userspace. We also do not want to enable
-* interrupts for kernel faults because that might lead to further
-* faults, and loose the context of the original exception.
+* SIGILL. The subsequent cases all relate to user space, such as
+* emulating instructions which we should only do for user space. We
+* also do not want to enable interrupts for kernel faults because that
+* might lead to further faults, and loose the context of the original
+* exception.
 */
if (!user_mode(regs))
goto sigill;
 
interrupt_cond_local_irq_enable(regs);
 
+   /*
+* (reason & REASON_TRAP) is mostly handled before enabling IRQs,
+* except get_user_instr() can sleep so we cannot reliably inspect the
+* current instruction in that context. Now that we know we are
+* handling a user space trap and can sleep, we can check if the trap
+* was a hashchk failure.
+*/
+   if (reason & REASON_T

[PATCH] powerpc/configs: Set more PPC debug configs

2023-08-29 Thread Benjamin Gray
Add more config options that are not already covered by the generic
debug config in kernel/configs/debug.config.

CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG
Adds an initialized check on each (cpu|mmu)_has_feature()

CONFIG_PPC_IRQ_SOFT_MASK_DEBUG
Adds some extra checks around IRQ mask manipulation

CONFIG_PPC_KUAP_DEBUG
Adds some extra KAUP checks around interrupts/context switching

CONFIG_PPC_RFI_SRR_DEBUG
Adds some extra SRR checks around interrupts/syscalls

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/configs/debug.config | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/configs/debug.config 
b/arch/powerpc/configs/debug.config
index a14ae1f20d60..bcc1fcf25e10 100644
--- a/arch/powerpc/configs/debug.config
+++ b/arch/powerpc/configs/debug.config
@@ -1 +1,5 @@
+CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG=y
+CONFIG_PPC_IRQ_SOFT_MASK_DEBUG=y
+CONFIG_PPC_KUAP_DEBUG=y
+CONFIG_PPC_RFI_SRR_DEBUG=y
 CONFIG_SCOM_DEBUGFS=y
-- 
2.41.0



Re: [PATCH 0/3] Fix preemption errors in watchpoints

2023-08-29 Thread Benjamin Gray

On 29/8/23 4:34 pm, Benjamin Gray wrote:

When enabling debug config options relating to preemption, several bugs
appear in the kernel log. With this series applied, the breakpoint code
no longer prints bugs when running the powerpc/ptrace selftests.

Benjamin Gray (3):
   powerpc/watchpoints: Disable preemption in thread_change_pc()
   powerpc/watchpoint: Disable pagefaults when getting user instruction
   powerpc/watchpoints: Annotate atomic context in more places

  arch/powerpc/kernel/hw_breakpoint.c | 16 +++-
  arch/powerpc/kernel/hw_breakpoint_constraints.c |  7 ++-
  2 files changed, 21 insertions(+), 2 deletions(-)

--
2.41.0


The particular config is below, used by appending to a 
ppc64le_guest_defconfig. Not all options are relevant. Tested on a 
Power8 and Power10 machine (1 and 2 watchpoints).


CONFIG_LOCALVERSION="-watchpoint-sleep"
CONFIG_GENERIC_IRQ_DEBUGFS=y
CONFIG_PREEMPT=y
CONFIG_PRINTK_INDEX=y
CONFIG_CGROUP_DEBUG=y
CONFIG_PPC_KUAP_DEBUG=y
CONFIG_SCOM_DEBUGFS=y
CONFIG_UDBG_RTAS_CONSOLE=y
CONFIG_RELOCATABLE_TEST=y
CONFIG_PM_DEBUG=y
CONFIG_PM_ADVANCED_DEBUG=y
CONFIG_STATIC_KEYS_SELFTEST=y
CONFIG_MODULE_DEBUG=y
CONFIG_MODULE_STATS=y
CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS_TRACE=y
CONFIG_CMA_DEBUGFS=y
CONFIG_CMA_SYSFS=y
CONFIG_PERCPU_STATS=y
CONFIG_USERFAULTFD=y
CONFIG_VALIDATE_FS_PARSER=y
CONFIG_EXT4_DEBUG=y
CONFIG_HARDENED_USERCOPY=y
CONFIG_FORTIFY_SOURCE=y
CONFIG_DYNAMIC_DEBUG=y
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_DEBUG_INFO_SPLIT=y
CONFIG_GDB_SCRIPTS=y
CONFIG_READABLE_ASM=y
CONFIG_NET_DEV_REFCNT_TRACKER=y
CONFIG_NET_NS_REFCNT_TRACKER=y
CONFIG_DEBUG_NET=y
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_SLUB_DEBUG_ON=y
CONFIG_PTDUMP_DEBUGFS=y
CONFIG_DEBUG_KMEMLEAK=y
CONFIG_PER_VMA_LOCK_STATS=y
CONFIG_DEBUG_OBJECTS=y
CONFIG_DEBUG_OBJECTS_SELFTEST=y
CONFIG_DEBUG_OBJECTS_FREE=y
CONFIG_DEBUG_OBJECTS_TIMERS=y
CONFIG_DEBUG_OBJECTS_WORK=y
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER=y
CONFIG_SHRINKER_DEBUG=y
CONFIG_SCHED_STACK_END_CHECK=y
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_VM_MAPLE_TREE=y
CONFIG_DEBUG_VM_RB=y
CONFIG_DEBUG_VM_PGFLAGS=y
CONFIG_DEBUG_VM_PGTABLE=y
CONFIG_DEBUG_VIRTUAL=y
CONFIG_DEBUG_PER_CPU_MAPS=y
CONFIG_DEBUG_SHIRQ=y
CONFIG_WQ_WATCHDOG=y
CONFIG_DEBUG_TIMEKEEPING=y
CONFIG_DEBUG_PREEMPT=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCK_STAT=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
CONFIG_DEBUG_LOCKING_API_SELFTESTS=y
CONFIG_CSD_LOCK_WAIT_DEBUG=y
CONFIG_DEBUG_IRQFLAGS=y
CONFIG_WARN_ALL_UNSEEDED_RANDOM=y
CONFIG_DEBUG_NOTIFIERS=y
CONFIG_DEBUG_CREDENTIALS=y
CONFIG_RCU_CPU_STALL_CPUTIME=y
CONFIG_RCU_EQS_DEBUG=y
CONFIG_DEBUG_WQ_FORCE_RR_CPU=y
CONFIG_CPU_HOTPLUG_STATE_CONTROL=y
CONFIG_LATENCYTOP=y
CONFIG_PPC_IRQ_SOFT_MASK_DEBUG=y
CONFIG_PPC_RFI_SRR_DEBUG=y



[PATCH 3/3] powerpc/watchpoints: Annotate atomic context in more places

2023-08-29 Thread Benjamin Gray
It can be easy to miss that the notifier mechanism invokes the callbacks
in an atomic context, so add some comments to that effect on the two
handlers we register here.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/kernel/hw_breakpoint.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index 2854376870cf..a1318ce18d0e 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -368,6 +368,11 @@ static void handle_p10dd1_spurious_exception(struct 
perf_event **bp,
}
 }
 
+/*
+ * Handle a DABR or DAWR exception.
+ *
+ * Called in atomic context.
+ */
 int hw_breakpoint_handler(struct die_args *args)
 {
bool err = false;
@@ -495,6 +500,8 @@ NOKPROBE_SYMBOL(hw_breakpoint_handler);
 
 /*
  * Handle single-step exceptions following a DABR hit.
+ *
+ * Called in atomic context.
  */
 static int single_step_dabr_instruction(struct die_args *args)
 {
@@ -546,6 +553,8 @@ NOKPROBE_SYMBOL(single_step_dabr_instruction);
 
 /*
  * Handle debug exception notifications.
+ *
+ * Called in atomic context.
  */
 int hw_breakpoint_exceptions_notify(
struct notifier_block *unused, unsigned long val, void *data)
-- 
2.41.0



[PATCH 0/3] Fix preemption errors in watchpoints

2023-08-29 Thread Benjamin Gray
When enabling debug config options relating to preemption, several bugs
appear in the kernel log. With this series applied, the breakpoint code
no longer prints bugs when running the powerpc/ptrace selftests.

Benjamin Gray (3):
  powerpc/watchpoints: Disable preemption in thread_change_pc()
  powerpc/watchpoint: Disable pagefaults when getting user instruction
  powerpc/watchpoints: Annotate atomic context in more places

 arch/powerpc/kernel/hw_breakpoint.c | 16 +++-
 arch/powerpc/kernel/hw_breakpoint_constraints.c |  7 ++-
 2 files changed, 21 insertions(+), 2 deletions(-)

--
2.41.0


[PATCH 2/3] powerpc/watchpoint: Disable pagefaults when getting user instruction

2023-08-29 Thread Benjamin Gray
This is called in an atomic context, so it is not allowed to sleep if a
user page needs to be faulted in, and there is nowhere the fault can be
deferred to. The pagefault_disable() function is documented as
preventing user access methods from sleeping.

In practice the page will nearly always already be mapped, because we
are reading the instruction that just triggered the watchpoint trap.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/kernel/hw_breakpoint_constraints.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/hw_breakpoint_constraints.c 
b/arch/powerpc/kernel/hw_breakpoint_constraints.c
index a74623025f3a..9e51801c4915 100644
--- a/arch/powerpc/kernel/hw_breakpoint_constraints.c
+++ b/arch/powerpc/kernel/hw_breakpoint_constraints.c
@@ -131,8 +131,13 @@ void wp_get_instr_detail(struct pt_regs *regs, ppc_inst_t 
*instr,
 int *type, int *size, unsigned long *ea)
 {
struct instruction_op op;
+   int err;
 
-   if (__get_user_instr(*instr, (void __user *)regs->nip))
+   pagefault_disable();
+   err = __get_user_instr(*instr, (void __user *)regs->nip);
+   pagefault_enable();
+
+   if (err)
return;
 
analyse_instr(&op, regs, *instr);
-- 
2.41.0



[PATCH 1/3] powerpc/watchpoints: Disable preemption in thread_change_pc()

2023-08-29 Thread Benjamin Gray
thread_change_pc() uses CPU local data, so must be protected from
swapping CPUs while it is reading the breakpoint struct.

The error is more noticeable after 1e60f3564bad ("powerpc/watchpoints:
Track perf single step directly on the breakpoint"), which added an
unconditional __this_cpu_read() call in thread_change_pc(). However the
existing __this_cpu_read() that runs if a breakpoint does need to be
re-inserted has the same issue.

Signed-off-by: Benjamin Gray 

---

There's probably a more idiomatic way to express this. We technically
don't need to disable preemption for the entire function: we should only
need to disable preemption within each loop iteration while handling the
pointer we are working with. Each iteration itself is independent.
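
Roughly, the per-iteration variant inside thread_change_pc() could look
like the following sketch (untested, illustration only):

  for (i = 0; i < nr_wp_slots(); i++) {
          struct perf_event *bp;

          preempt_disable();
          bp = __this_cpu_read(bp_per_reg[i]);
          if (unlikely(bp && counter_arch_bp(bp)->perf_single_step)) {
                  struct arch_hw_breakpoint *info = counter_arch_bp(bp);

                  regs_set_return_msr(regs, regs->msr & ~MSR_SE);
                  __set_breakpoint(i, info);
                  info->perf_single_step = false;
          }
          preempt_enable();
  }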
---
 arch/powerpc/kernel/hw_breakpoint.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index b8513dc3e53a..2854376870cf 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -230,13 +230,15 @@ void thread_change_pc(struct task_struct *tsk, struct 
pt_regs *regs)
struct arch_hw_breakpoint *info;
int i;
 
+   preempt_disable();
+
for (i = 0; i < nr_wp_slots(); i++) {
struct perf_event *bp = __this_cpu_read(bp_per_reg[i]);
 
if (unlikely(bp && counter_arch_bp(bp)->perf_single_step))
goto reset;
}
-   return;
+   goto out;
 
 reset:
regs_set_return_msr(regs, regs->msr & ~MSR_SE);
@@ -245,6 +247,9 @@ void thread_change_pc(struct task_struct *tsk, struct 
pt_regs *regs)
__set_breakpoint(i, info);
info->perf_single_step = false;
}
+
+out:
+   preempt_enable();
 }
 
 static bool is_larx_stcx_instr(int type)
-- 
2.41.0



[PATCH] powerpc/dexcr: Move HASHCHK trap handler

2023-08-24 Thread Benjamin Gray
To determine if a trap was caused by a HASHCHK instruction, we inspect
the user instruction that triggered the trap. However this may sleep
if the page needs to be faulted in.

Move the HASHCHK handler logic to after we allow IRQs, which is fine
because we are only interested in HASHCHK if it's a user space trap.

Fixes: 5bcba4e6c13f ("powerpc/dexcr: Handle hashchk exception")
Signed-off-by: Benjamin Gray 
---
 arch/powerpc/kernel/traps.c | 56 -
 1 file changed, 36 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index f5ce282dc4b8..32b5e841ea97 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1512,23 +1512,11 @@ static void do_program_check(struct pt_regs *regs)
return;
}
 
-   if (cpu_has_feature(CPU_FTR_DEXCR_NPHIE) && user_mode(regs)) {
-   ppc_inst_t insn;
-
-   if (get_user_instr(insn, (void __user *)regs->nip)) {
-   _exception(SIGSEGV, regs, SEGV_MAPERR, 
regs->nip);
-   return;
-   }
-
-   if (ppc_inst_primary_opcode(insn) == 31 &&
-   get_xop(ppc_inst_val(insn)) == OP_31_XOP_HASHCHK) {
-   _exception(SIGILL, regs, ILL_ILLOPN, regs->nip);
-   return;
-   }
+   /* User mode considers other cases after enabling IRQs */
+   if (!user_mode(regs)) {
+   _exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip);
+   return;
}
-
-   _exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip);
-   return;
}
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
if (reason & REASON_TM) {
@@ -1561,16 +1549,44 @@ static void do_program_check(struct pt_regs *regs)
 
/*
 * If we took the program check in the kernel skip down to sending a
-* SIGILL. The subsequent cases all relate to emulating instructions
-* which we should only do for userspace. We also do not want to enable
-* interrupts for kernel faults because that might lead to further
-* faults, and loose the context of the original exception.
+* SIGILL. The subsequent cases all relate to user space, such as
+* emulating instructions which we should only do for user space. We
+* also do not want to enable interrupts for kernel faults because that
+* might lead to further faults, and loose the context of the original
+* exception.
 */
if (!user_mode(regs))
goto sigill;
 
interrupt_cond_local_irq_enable(regs);
 
+   /*
+* (reason & REASON_TRAP) is mostly handled before enabling IRQs,
+* except get_user_instr() can sleep so we cannot reliably inspect the
+* current instruction in that context. Now that we know we are
+* handling a user space trap and can sleep, we can check if the trap
+* was a hashchk failure.
+*/
+   if (reason & REASON_TRAP) {
+   if (cpu_has_feature(CPU_FTR_DEXCR_NPHIE)) {
+   ppc_inst_t insn;
+
+   if (get_user_instr(insn, (void __user *)regs->nip)) {
+   _exception(SIGSEGV, regs, SEGV_MAPERR, 
regs->nip);
+   return;
+   }
+
+   if (ppc_inst_primary_opcode(insn) == 31 &&
+   get_xop(ppc_inst_val(insn)) == OP_31_XOP_HASHCHK) {
+   _exception(SIGILL, regs, ILL_ILLOPN, regs->nip);
+   return;
+   }
+   }
+
+   _exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip);
+   return;
+   }
+
/* (reason & REASON_ILLEGAL) would be the obvious thing here,
 * but there seems to be a hardware bug on the 405GP (RevD)
 * that means ESR is sometimes set incorrectly - either to
-- 
2.41.0



Re: [PATCH 2/2] powerpc/xmon: use KSYM_NAME_LEN in array size

2023-08-02 Thread Benjamin Gray

On 30/5/23 10:54 pm, Miguel Ojeda wrote:

Side-note: in `get_function_bounds()`, I see `kallsyms_lookup()` being
used, but the name seems discarded? Can
`kallsyms_lookup_size_offset()` be used instead, thus avoiding the
usage of the buffer there to begin with?


I'm not familiar with the kallsyms infrastructure, but looking over the
implementations of kallsyms_lookup() and kallsyms_lookup_size_offset()
it looks like the existing kallsyms_lookup() handles an extra case over
kallsyms_lookup_size_offset()?

kallsyms_lookup_buildid() (the implementation of kallsyms_lookup()) has

  /* See if it's in a module or a BPF JITed image. */
  ret = module_address_lookup(addr, symbolsize, offset,
  modname, modbuildid, namebuf);
  if (!ret)
  ret = bpf_address_lookup(addr, symbolsize,
   offset, modname, namebuf);

  if (!ret)
  ret = ftrace_mod_address_lookup(addr, symbolsize,
  offset, modname, namebuf);

while kallsyms_lookup_size_offset() is missing the ftrace case

  return !!module_address_lookup(addr, symbolsize, offset,
 NULL, NULL, namebuf) ||
 !!__bpf_address_lookup(addr, symbolsize, offset, namebuf);

Might this be a concern for xmon?


[PATCH 7/7] perf/hw_breakpoint: Remove arch breakpoint hooks

2023-07-31 Thread Benjamin Gray
PowerPC was the only user of these hooks, and has been refactored to no
longer require them. There is no need to keep them around, so remove
them to reduce complexity.

Signed-off-by: Benjamin Gray 
---
 include/linux/hw_breakpoint.h |  3 ---
 kernel/events/hw_breakpoint.c | 28 
 2 files changed, 31 deletions(-)

diff --git a/include/linux/hw_breakpoint.h b/include/linux/hw_breakpoint.h
index 7fbb45911273..db199d653dd1 100644
--- a/include/linux/hw_breakpoint.h
+++ b/include/linux/hw_breakpoint.h
@@ -90,9 +90,6 @@ extern int dbg_reserve_bp_slot(struct perf_event *bp);
 extern int dbg_release_bp_slot(struct perf_event *bp);
 extern int reserve_bp_slot(struct perf_event *bp);
 extern void release_bp_slot(struct perf_event *bp);
-int arch_reserve_bp_slot(struct perf_event *bp);
-void arch_release_bp_slot(struct perf_event *bp);
-void arch_unregister_hw_breakpoint(struct perf_event *bp);
 
 extern void flush_ptrace_hw_breakpoint(struct task_struct *tsk);
 
diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index c3797701339c..6c2cb4e4f48d 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -523,26 +523,6 @@ toggle_bp_slot(struct perf_event *bp, bool enable, enum 
bp_type_idx type, int we
return 0;
 }
 
-__weak int arch_reserve_bp_slot(struct perf_event *bp)
-{
-   return 0;
-}
-
-__weak void arch_release_bp_slot(struct perf_event *bp)
-{
-}
-
-/*
- * Function to perform processor-specific cleanup during unregistration
- */
-__weak void arch_unregister_hw_breakpoint(struct perf_event *bp)
-{
-   /*
-* A weak stub function here for those archs that don't define
-* it inside arch/.../kernel/hw_breakpoint.c
-*/
-}
-
 /*
  * Constraints to check before allowing this new breakpoint counter.
  *
@@ -594,7 +574,6 @@ static int __reserve_bp_slot(struct perf_event *bp, u64 
bp_type)
enum bp_type_idx type;
int max_pinned_slots;
int weight;
-   int ret;
 
/* We couldn't initialize breakpoint constraints on boot */
if (!constraints_initialized)
@@ -613,10 +592,6 @@ static int __reserve_bp_slot(struct perf_event *bp, u64 
bp_type)
if (max_pinned_slots > hw_breakpoint_slots_cached(type))
return -ENOSPC;
 
-   ret = arch_reserve_bp_slot(bp);
-   if (ret)
-   return ret;
-
return toggle_bp_slot(bp, true, type, weight);
 }
 
@@ -634,8 +609,6 @@ static void __release_bp_slot(struct perf_event *bp, u64 
bp_type)
enum bp_type_idx type;
int weight;
 
-   arch_release_bp_slot(bp);
-
type = find_slot_idx(bp_type);
weight = hw_breakpoint_weight(bp);
WARN_ON(toggle_bp_slot(bp, false, type, weight));
@@ -645,7 +618,6 @@ void release_bp_slot(struct perf_event *bp)
 {
struct mutex *mtx = bp_constraints_lock(bp);
 
-   arch_unregister_hw_breakpoint(bp);
__release_bp_slot(bp, bp->attr.bp_type);
bp_constraints_unlock(mtx);
 }
-- 
2.41.0



[PATCH 6/7] selftests/powerpc/ptrace: Update ptrace-perf watchpoint selftest

2023-07-31 Thread Benjamin Gray
Now that ptrace and perf are no longer exclusive, update the
test to exercise interesting interactions.

An assembly file is used for the children to allow precise instruction
choice and addresses, while avoiding any compiler quirks.

Signed-off-by: Benjamin Gray 
---
 .../testing/selftests/powerpc/ptrace/Makefile |1 +
 .../powerpc/ptrace/ptrace-perf-asm.S  |   33 +
 .../powerpc/ptrace/ptrace-perf-hwbreak.c  | 1104 +++--
 3 files changed, 479 insertions(+), 659 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/ptrace/ptrace-perf-asm.S
 rewrite tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c (93%)

diff --git a/tools/testing/selftests/powerpc/ptrace/Makefile 
b/tools/testing/selftests/powerpc/ptrace/Makefile
index cbeeaeae8837..1b39b86849da 100644
--- a/tools/testing/selftests/powerpc/ptrace/Makefile
+++ b/tools/testing/selftests/powerpc/ptrace/Makefile
@@ -36,6 +36,7 @@ $(TM_TESTS): CFLAGS += -I../tm -mhtm
 CFLAGS += $(KHDR_INCLUDES) -fno-pie
 
 $(OUTPUT)/ptrace-gpr: ptrace-gpr.S
+$(OUTPUT)/ptrace-perf-hwbreak: ptrace-perf-asm.S
 $(OUTPUT)/ptrace-pkey $(OUTPUT)/core-pkey: LDLIBS += -pthread
 
 $(TEST_GEN_PROGS): ../harness.c ../utils.c ../lib/reg.S
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-perf-asm.S 
b/tools/testing/selftests/powerpc/ptrace/ptrace-perf-asm.S
new file mode 100644
index ..9aa2e58f3189
--- /dev/null
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-perf-asm.S
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#include 
+
+.global same_watch_addr_load
+.global same_watch_addr_trap
+
+FUNC_START(same_watch_addr_child)
+   nop
+same_watch_addr_load:
+   ld 0,0(3)
+   nop
+same_watch_addr_trap:
+   trap
+   blr
+FUNC_END(same_watch_addr_child)
+
+
+.global perf_then_ptrace_load1
+.global perf_then_ptrace_load2
+.global perf_then_ptrace_trap
+
+FUNC_START(perf_then_ptrace_child)
+   nop
+perf_then_ptrace_load1:
+   ld 0,0(3)
+perf_then_ptrace_load2:
+   ld 0,0(4)
+   nop
+perf_then_ptrace_trap:
+   trap
+   blr
+FUNC_END(perf_then_ptrace_child)
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c 
b/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c
dissimilarity index 93%
index d8a9e95fc03d..a0a0b9bb5854 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c
@@ -1,659 +1,445 @@
-// SPDX-License-Identifier: GPL-2.0+
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include "ptrace.h"
-
-char data[16];
-
-/* Overlapping address range */
-volatile __u64 *ptrace_data1 = (__u64 *)&data[0];
-volatile __u64 *perf_data1 = (__u64 *)&data[4];
-
-/* Non-overlapping address range */
-volatile __u64 *ptrace_data2 = (__u64 *)&data[0];
-volatile __u64 *perf_data2 = (__u64 *)&data[8];
-
-static unsigned long pid_max_addr(void)
-{
-   FILE *fp;
-   char *line, *c;
-   char addr[100];
-   size_t len = 0;
-
-   fp = fopen("/proc/kallsyms", "r");
-   if (!fp) {
-   printf("Failed to read /proc/kallsyms. Exiting..\n");
-   exit(EXIT_FAILURE);
-   }
-
-   while (getline(&line, &len, fp) != -1) {
-   if (!strstr(line, "pid_max") || strstr(line, "pid_max_max") ||
-   strstr(line, "pid_max_min"))
-   continue;
-
-   strncpy(addr, line, len < 100 ? len : 100);
-   c = strchr(addr, ' ');
-   *c = '\0';
-   return strtoul(addr, &c, 16);
-   }
-   fclose(fp);
-   printf("Could not find pid_max. Exiting..\n");
-   exit(EXIT_FAILURE);
-   return -1;
-}
-
-static void perf_user_event_attr_set(struct perf_event_attr *attr, __u64 addr, 
__u64 len)
-{
-   memset(attr, 0, sizeof(struct perf_event_attr));
-   attr->type   = PERF_TYPE_BREAKPOINT;
-   attr->size   = sizeof(struct perf_event_attr);
-   attr->bp_type= HW_BREAKPOINT_R;
-   attr->bp_addr= addr;
-   attr->bp_len = len;
-   attr->exclude_kernel = 1;
-   attr->exclude_hv = 1;
-}
-
-static void perf_kernel_event_attr_set(struct perf_event_attr *attr)
-{
-   memset(attr, 0, sizeof(struct perf_event_attr));
-   attr->type   = PERF_TYPE_BREAKPOINT;
-   attr->size   = sizeof(struct perf_event_attr);
-   attr->bp_type= HW_BREAKPOINT_R;
-   attr->bp_addr= pid_max_addr();
-   attr->bp_len = sizeof(unsigned long);
-   attr->exclude_user   = 1;
-   attr->exclude_hv = 1;
-}
-
-static int perf_cpu_event_open(int cpu, __u64 addr, __u64 len)
-{
-   struct perf_event_attr attr;
-
-   perf_user_event_attr_set(&attr, addr, len);
-   retur

[PATCH 5/7] powerpc/watchpoints: Remove ptrace/perf exclusion tracking

2023-07-31 Thread Benjamin Gray
ptrace and perf watchpoints were considered incompatible in
commit 29da4f91c0c1 ("powerpc/watchpoint: Don't allow concurrent perf
and ptrace events"), but the logic in that commit doesn't really apply.

Ptrace doesn't automatically single step; the ptracer must request this
explicitly. And the ptracer can do so regardless of whether a
ptrace/perf watchpoint triggered or not: it could single step every
instruction if it wanted to. Whatever stopped the ptracee before
executing the instruction that would trigger the perf watchpoint is no
longer relevant by this point.

To get correct behaviour when perf and ptrace are watching the same
data we must ignore the perf watchpoint. After all, ptrace has
before-execute semantics, and perf is after-execute, so perf doesn't
actually care about the watchpoint trigger at this point in time.
Pausing before execution does not mean we will actually end up executing
the instruction.

Importantly though, we don't remove the perf watchpoint yet. This is
key.

The ptracer is free to do whatever it likes right now. E.g., it can
continue the process, single step, or even set the child PC somewhere
completely different.

If it does try to execute the instruction though, without reinserting
the watchpoint (in which case we go back to the start of this example),
the perf watchpoint would immediately trigger. This time there is no
ptrace watchpoint, so we can safely perform a single step and increment
the perf counter. Upon receiving the single step exception, the existing
code already handles propagating or consuming it based on whether
another subsystem (e.g. ptrace) requested a single step. Again, this is
needed with or without perf/ptrace exclusion, because ptrace could be
single stepping this instruction regardless of if a watchpoint is
involved.

Signed-off-by: Benjamin Gray 

---

Whether it's a _good_ idea to mix ptrace and perf is another matter
entirely, mind you. But they are not inherently incompatible.
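
For illustration, "mixing" the two from a tracer looks roughly like the
sketch below (not part of this patch; 'child' is the stopped tracee's
pid, 'addr' is a watched variable in it, and the usual perf/ptrace
headers are assumed). Before this series the perf_event_open() would be
refused while an overlapping ptrace watchpoint exists; with it, both
can coexist:

        struct perf_event_attr attr = {
                .type           = PERF_TYPE_BREAKPOINT,
                .size           = sizeof(attr),
                .bp_type        = HW_BREAKPOINT_RW,
                .bp_addr        = (__u64)addr,
                .bp_len         = HW_BREAKPOINT_LEN_8,
                .exclude_kernel = 1,
                .exclude_hv     = 1,
        };
        int fd;

        /* ptrace watchpoint: read|write|translation bits set in the DABR value */
        ptrace(PTRACE_SET_DEBUGREG, child, 0, (unsigned long)addr | 0x7);

        /* perf watchpoint on the same data, attached to the same task */
        fd = syscall(__NR_perf_event_open, &attr, child, -1, -1, 0);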
---
 arch/powerpc/kernel/hw_breakpoint.c | 249 +---
 1 file changed, 1 insertion(+), 248 deletions(-)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index bf8dda1a7e04..b8513dc3e53a 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -114,253 +114,6 @@ static bool is_ptrace_bp(struct perf_event *bp)
return bp->overflow_handler == ptrace_triggered;
 }
 
-struct breakpoint {
-   struct list_head list;
-   struct perf_event *bp;
-   bool ptrace_bp;
-};
-
-/*
- * While kernel/events/hw_breakpoint.c does its own synchronization, we cannot
- * rely on it safely synchronizing internals here; however, we can rely on it
- * not requesting more breakpoints than available.
- */
-static DEFINE_SPINLOCK(cpu_bps_lock);
-static DEFINE_PER_CPU(struct breakpoint *, cpu_bps[HBP_NUM_MAX]);
-static DEFINE_SPINLOCK(task_bps_lock);
-static LIST_HEAD(task_bps);
-
-static struct breakpoint *alloc_breakpoint(struct perf_event *bp)
-{
-   struct breakpoint *tmp;
-
-   tmp = kzalloc(sizeof(*tmp), GFP_KERNEL);
-   if (!tmp)
-   return ERR_PTR(-ENOMEM);
-   tmp->bp = bp;
-   tmp->ptrace_bp = is_ptrace_bp(bp);
-   return tmp;
-}
-
-static bool bp_addr_range_overlap(struct perf_event *bp1, struct perf_event 
*bp2)
-{
-   __u64 bp1_saddr, bp1_eaddr, bp2_saddr, bp2_eaddr;
-
-   bp1_saddr = ALIGN_DOWN(bp1->attr.bp_addr, HW_BREAKPOINT_SIZE);
-   bp1_eaddr = ALIGN(bp1->attr.bp_addr + bp1->attr.bp_len, 
HW_BREAKPOINT_SIZE);
-   bp2_saddr = ALIGN_DOWN(bp2->attr.bp_addr, HW_BREAKPOINT_SIZE);
-   bp2_eaddr = ALIGN(bp2->attr.bp_addr + bp2->attr.bp_len, 
HW_BREAKPOINT_SIZE);
-
-   return (bp1_saddr < bp2_eaddr && bp1_eaddr > bp2_saddr);
-}
-
-static bool alternate_infra_bp(struct breakpoint *b, struct perf_event *bp)
-{
-   return is_ptrace_bp(bp) ? !b->ptrace_bp : b->ptrace_bp;
-}
-
-static bool can_co_exist(struct breakpoint *b, struct perf_event *bp)
-{
-   return !(alternate_infra_bp(b, bp) && bp_addr_range_overlap(b->bp, bp));
-}
-
-static int task_bps_add(struct perf_event *bp)
-{
-   struct breakpoint *tmp;
-
-   tmp = alloc_breakpoint(bp);
-   if (IS_ERR(tmp))
-   return PTR_ERR(tmp);
-
-   spin_lock(&task_bps_lock);
-   list_add(&tmp->list, &task_bps);
-   spin_unlock(&task_bps_lock);
-   return 0;
-}
-
-static void task_bps_remove(struct perf_event *bp)
-{
-   struct list_head *pos, *q;
-
-   spin_lock(&task_bps_lock);
-   list_for_each_safe(pos, q, &task_bps) {
-   struct breakpoint *tmp = list_entry(pos, struct breakpoint, 
list);
-
-   if (tmp->bp == bp) {
-   list_del(&tmp->list);
-   kfree(tmp);
-   break;
-   }
-   }
-   spin_unlock(&task_bps_lock);
-}
-
-/*
- * If any task has breakpoint from alterna

[PATCH 4/7] powerpc/watchpoints: Simplify watchpoint reinsertion

2023-07-31 Thread Benjamin Gray
We only remove watchpoints when they have the perf_single_step flag set,
so we can reinsert them during the first iteration.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/kernel/hw_breakpoint.c | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index 624375c18882..bf8dda1a7e04 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -772,16 +772,6 @@ static int single_step_dabr_instruction(struct die_args 
*args)
perf_bp_event(bp, regs);
 
info->perf_single_step = false;
-   }
-
-   if (!found)
-   return NOTIFY_DONE;
-
-   for (int i = 0; i < nr_wp_slots(); i++) {
-   struct perf_event *bp = __this_cpu_read(bp_per_reg[i]);
-   if (!bp)
-   continue;
-
__set_breakpoint(i, counter_arch_bp(bp));
}
 
@@ -789,7 +779,7 @@ static int single_step_dabr_instruction(struct die_args 
*args)
 * If the process was being single-stepped by ptrace, let the
 * other single-step actions occur (e.g. generate SIGTRAP).
 */
-   if (test_thread_flag(TIF_SINGLESTEP))
+   if (!found || test_thread_flag(TIF_SINGLESTEP))
return NOTIFY_DONE;
 
return NOTIFY_STOP;
-- 
2.41.0



[PATCH 1/7] powerpc/watchpoints: Explain thread_change_pc() more

2023-07-31 Thread Benjamin Gray
The behaviour of the thread_change_pc() function is a bit cryptic
unless you are familiar with how the watchpoint logic handles perf's
after-execute semantics.

Expand the comment to explain why we can re-insert the breakpoint and
unset the perf_single_step flag.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/kernel/hw_breakpoint.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index e1b4e70c8fd0..bad2991f906b 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -499,6 +499,10 @@ int hw_breakpoint_arch_parse(struct perf_event *bp,
  * Restores the breakpoint on the debug registers.
  * Invoke this function if it is known that the execution context is
  * about to change to cause loss of MSR_SE settings.
+ *
+ * The perf watchpoint will simply re-trigger once the thread is started again,
+ * and the watchpoint handler will set up MSR_SE and perf_single_step as
+ * needed.
  */
 void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs)
 {
-- 
2.41.0



[PATCH 3/7] powerpc/watchpoints: Track perf single step directly on the breakpoint

2023-07-31 Thread Benjamin Gray
There is a bug in the current watchpoint tracking logic, where the
teardown in arch_unregister_hw_breakpoint() uses bp->ctx->task, which it
does not hold a reference to and which parallel threads may be in the process
of destroying. This was partially addressed in commit fb822e6076d9
("powerpc/hw_breakpoint: Fix oops when destroying hw_breakpoint event"),
but the underlying issue of accessing a struct member in an unknown
state still remained. Syzkaller managed to trigger a null pointer
dereference due to the race between the task destructor and checking the
pointer and dereferencing it in the loop.

While this null pointer dereference could be fixed by using READ_ONCE
to access the task up front, that just changes the error to manipulating
possibly freed memory.

Instead, the breakpoint logic needs to be reworked to remove any
dependency on a context or task struct during breakpoint removal.

The reason we have this currently is to clear thread.last_hit_ubp. This
member is used to differentiate the perf DAWR single-step sequence from
other causes of single-step, such as userspace just calling
ptrace(PTRACE_SINGLESTEP, ...). We need to differentiate them because,
when the single step interrupt is received, we need to know whether to
re-insert the DAWR breakpoint (perf) or not (ptrace / other).

arch_unregister_hw_breakpoint() needs to clear this information to
prevent dangling pointers to possibly freed memory. These pointers are
dereferenced in single_step_dabr_instruction() without a way to check
their validity.

This patch moves the tracking of this information to the breakpoint
itself. This means we no longer have to do anything special to clean up.
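
In short, the state moves from the thread to the breakpoint; the idea,
reduced to a sketch (the real hunks are below):

        /*
         * Before: per-thread pointer that arch_unregister_hw_breakpoint()
         * must clear, or it dangles once the event is freed.
         */
        current->thread.last_hit_ubp[i] = bp;

        /*
         * After: the flag lives on the breakpoint itself, so its lifetime
         * is tied to the perf_event and no extra teardown is needed.
         */
        counter_arch_bp(bp)->perf_single_step = true;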

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/include/asm/hw_breakpoint.h |  1 +
 arch/powerpc/include/asm/processor.h |  5 --
 arch/powerpc/kernel/hw_breakpoint.c  | 69 
 3 files changed, 23 insertions(+), 52 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_breakpoint.h 
b/arch/powerpc/include/asm/hw_breakpoint.h
index 84d39fd42f71..66db0147d5b4 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -18,6 +18,7 @@ struct arch_hw_breakpoint {
u16 len; /* length of the target data symbol */
u16 hw_len; /* length programmed in hw */
u8  flags;
+   bool    perf_single_step; /* temporarily uninstalled for a perf 
single step */
 };
 
 /* Note: Don't change the first 6 bits below as they are in the same order
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 8a6754ffdc7e..9e67cb1c72e9 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -172,11 +172,6 @@ struct thread_struct {
unsigned int    align_ctl;  /* alignment handling control */
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
struct perf_event *ptrace_bps[HBP_NUM_MAX];
-   /*
-* Helps identify source of single-step exception and subsequent
-* hw-breakpoint enablement
-*/
-   struct perf_event *last_hit_ubp[HBP_NUM_MAX];
 #endif /* CONFIG_HAVE_HW_BREAKPOINT */
struct arch_hw_breakpoint hw_brk[HBP_NUM_MAX]; /* hardware breakpoint 
info */
unsigned long   trap_nr;/* last trap # on this thread */
diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index e6749642604c..624375c18882 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -43,16 +43,6 @@ int hw_breakpoint_slots(int type)
return 0;   /* no instruction breakpoints available */
 }
 
-static bool single_step_pending(void)
-{
-   int i;
-
-   for (i = 0; i < nr_wp_slots(); i++) {
-   if (current->thread.last_hit_ubp[i])
-   return true;
-   }
-   return false;
-}
 
 /*
  * Install a perf counter breakpoint.
@@ -84,7 +74,7 @@ int arch_install_hw_breakpoint(struct perf_event *bp)
 * Do not install DABR values if the instruction must be single-stepped.
 * If so, DABR will be populated in single_step_dabr_instruction().
 */
-   if (!single_step_pending())
+   if (!info->perf_single_step)
__set_breakpoint(i, info);
 
return 0;
@@ -371,28 +361,6 @@ void arch_release_bp_slot(struct perf_event *bp)
}
 }
 
-/*
- * Perform cleanup of arch-specific counters during unregistration
- * of the perf-event
- */
-void arch_unregister_hw_breakpoint(struct perf_event *bp)
-{
-   /*
-* If the breakpoint is unregistered between a hw_breakpoint_handler()
-* and the single_step_dabr_instruction(), then cleanup the breakpoint
-* restoration variables to prevent dangling pointers.
-* FIXME, this should not be using bp->ctx at all! Sayeth peterz.
-*/
-   if (bp->ctx && bp->

[PATCH 2/7] powerpc/watchpoints: Don't track info persistently

2023-07-31 Thread Benjamin Gray
info is cheap to retrieve, and is likely optimised by the compiler
anyway. On the other hand, propagating it across functions makes it
possible for the copies to become inconsistent and adds needless
complexity.

Remove it, and invoke counter_arch_bp() when we need to work with it.

As we don't persist it, we just use the local bp array to track whether
we are ignoring a breakpoint.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/kernel/hw_breakpoint.c | 60 +++--
 1 file changed, 32 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index bad2991f906b..e6749642604c 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -538,23 +538,22 @@ static bool is_octword_vsx_instr(int type, int size)
  * We've failed in reliably handling the hw-breakpoint. Unregister
  * it and throw a warning message to let the user know about it.
  */
-static void handler_error(struct perf_event *bp, struct arch_hw_breakpoint 
*info)
+static void handler_error(struct perf_event *bp)
 {
WARN(1, "Unable to handle hardware breakpoint. Breakpoint at 0x%lx will 
be disabled.",
-info->address);
+counter_arch_bp(bp)->address);
perf_event_disable_inatomic(bp);
 }
 
-static void larx_stcx_err(struct perf_event *bp, struct arch_hw_breakpoint 
*info)
+static void larx_stcx_err(struct perf_event *bp)
 {
printk_ratelimited("Breakpoint hit on instruction that can't be 
emulated. Breakpoint at 0x%lx will be disabled.\n",
-  info->address);
+  counter_arch_bp(bp)->address);
perf_event_disable_inatomic(bp);
 }
 
 static bool stepping_handler(struct pt_regs *regs, struct perf_event **bp,
-struct arch_hw_breakpoint **info, int *hit,
-ppc_inst_t instr)
+int *hit, ppc_inst_t instr)
 {
int i;
int stepped;
@@ -565,7 +564,7 @@ static bool stepping_handler(struct pt_regs *regs, struct 
perf_event **bp,
if (!hit[i])
continue;
current->thread.last_hit_ubp[i] = bp[i];
-   info[i] = NULL;
+   bp[i] = NULL;
}
regs_set_return_msr(regs, regs->msr | MSR_SE);
return false;
@@ -576,15 +575,15 @@ static bool stepping_handler(struct pt_regs *regs, struct 
perf_event **bp,
for (i = 0; i < nr_wp_slots(); i++) {
if (!hit[i])
continue;
-   handler_error(bp[i], info[i]);
-   info[i] = NULL;
+   handler_error(bp[i]);
+   bp[i] = NULL;
}
return false;
}
return true;
 }
 
-static void handle_p10dd1_spurious_exception(struct arch_hw_breakpoint **info,
+static void handle_p10dd1_spurious_exception(struct perf_event **bp,
 int *hit, unsigned long ea)
 {
int i;
@@ -596,10 +595,14 @@ static void handle_p10dd1_spurious_exception(struct 
arch_hw_breakpoint **info,
 * spurious exception.
 */
for (i = 0; i < nr_wp_slots(); i++) {
-   if (!info[i])
+   struct arch_hw_breakpoint *info;
+
+   if (!bp[i])
continue;
 
-   hw_end_addr = ALIGN(info[i]->address + info[i]->len, 
HW_BREAKPOINT_SIZE);
+   info = counter_arch_bp(bp[i]);
+
+   hw_end_addr = ALIGN(info->address + info->len, 
HW_BREAKPOINT_SIZE);
 
/*
 * Ending address of DAWR range is less than starting
@@ -629,9 +632,9 @@ static void handle_p10dd1_spurious_exception(struct 
arch_hw_breakpoint **info,
return;
 
for (i = 0; i < nr_wp_slots(); i++) {
-   if (info[i]) {
+   if (bp[i]) {
hit[i] = 1;
-   info[i]->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ;
+   counter_arch_bp(bp[i])->type |= 
HW_BRK_TYPE_EXTRANEOUS_IRQ;
}
}
 }
@@ -642,7 +645,6 @@ int hw_breakpoint_handler(struct die_args *args)
int rc = NOTIFY_STOP;
struct perf_event *bp[HBP_NUM_MAX] = { NULL };
struct pt_regs *regs = args->regs;
-   struct arch_hw_breakpoint *info[HBP_NUM_MAX] = { NULL };
int i;
int hit[HBP_NUM_MAX] = {0};
int nr_hit = 0;
@@ -667,18 +669,20 @@ int hw_breakpoint_handler(struct die_args *args)
wp_get_instr_detail(regs, &instr, &type, &size, &ea);
 
for (i = 0; i < nr_wp_slots(); i++) {
+   struct arch_hw_breakpoint *info;
+
bp[i] = __this_cpu_read(bp_per_reg[i]);
if (!bp[i])
  

[PATCH 0/7] Rework perf and ptrace watchpoint tracking

2023-07-31 Thread Benjamin Gray
Syzkaller triggered a null pointer dereference in the
arch_unregister_hw_breakpoint() hook. This is due to the bp->ctx->task
field changing to -1 while we iterate over the breakpoints.

This series refactors the breakpoint tracking logic to remove the
dependency on bp->ctx entirely. It also simplifies handling of ptrace and
perf breakpoints, making insertion less restrictive. 

If merged, it allows several arch hooks that PowerPC was the sole user of
to be removed.

Benjamin Gray (7):
  powerpc/watchpoints: Explain thread_change_pc() more
  powerpc/watchpoints: Don't track info persistently
  powerpc/watchpoints: Track perf single step directly on the breakpoint
  powerpc/watchpoints: Simplify watchpoint reinsertion
  powerpc/watchpoints: Remove ptrace/perf exclusion tracking
  selftests/powerpc/ptrace: Update ptrace-perf watchpoint selftest
  perf/hw_breakpoint: Remove arch breakpoint hooks

 arch/powerpc/include/asm/hw_breakpoint.h  |1 +
 arch/powerpc/include/asm/processor.h  |5 -
 arch/powerpc/kernel/hw_breakpoint.c   |  388 +-
 include/linux/hw_breakpoint.h |3 -
 kernel/events/hw_breakpoint.c |   28 -
 .../testing/selftests/powerpc/ptrace/Makefile |1 +
 .../powerpc/ptrace/ptrace-perf-asm.S  |   33 +
 .../powerpc/ptrace/ptrace-perf-hwbreak.c  | 1104 +++--
 8 files changed, 537 insertions(+), 1026 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/ptrace/ptrace-perf-asm.S
 rewrite tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c (93%)

-- 
2.41.0


[PATCH v1 3/4] selftests/powerpc/ptrace: Fix typo in pid_max search error

2023-07-24 Thread Benjamin Gray
pid_max_addr() searches for the 'pid_max' symbol in /proc/kallsyms, and
prints an error if it cannot find it. The error message has a typo,
calling it pix_max.

Signed-off-by: Benjamin Gray 
---
 tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c 
b/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c
index 16c653600124..d8a9e95fc03d 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c
@@ -46,7 +46,7 @@ static unsigned long pid_max_addr(void)
return strtoul(addr, &c, 16);
}
fclose(fp);
-   printf("Could not find pix_max. Exiting..\n");
+   printf("Could not find pid_max. Exiting..\n");
exit(EXIT_FAILURE);
return -1;
 }
-- 
2.41.0



[PATCH v1 2/4] selftests/powerpc/ptrace: Explain why tests are skipped

2023-07-24 Thread Benjamin Gray
Many tests require specific hardware features/configurations that a
typical machine might not have. As a result, it's common to see a test
being skipped. But it is tedious to find out why a test was skipped when
all it reports is the file location of the skip macro.

Convert SKIP_IF() to SKIP_IF_MSG(), with appropriate descriptions of why
the test is being skipped. This gives a general idea of why a test is
skipped, which can be looked into further if it doesn't make sense.

Signed-off-by: Benjamin Gray 
---
 tools/testing/selftests/powerpc/ptrace/child.h   | 4 ++--
 tools/testing/selftests/powerpc/ptrace/core-pkey.c   | 2 +-
 tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c| 2 +-
 tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c  | 2 +-
 tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c | 4 ++--
 tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c | 2 +-
 tools/testing/selftests/powerpc/ptrace/ptrace-tar.c  | 2 +-
 tools/testing/selftests/powerpc/ptrace/ptrace-tm-gpr.c   | 4 ++--
 tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-gpr.c   | 4 ++--
 tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-tar.c   | 4 ++--
 tools/testing/selftests/powerpc/ptrace/ptrace-tm-spd-vsx.c   | 4 ++--
 tools/testing/selftests/powerpc/ptrace/ptrace-tm-spr.c   | 4 ++--
 tools/testing/selftests/powerpc/ptrace/ptrace-tm-tar.c   | 4 ++--
 tools/testing/selftests/powerpc/ptrace/ptrace-tm-vsx.c   | 4 ++--
 tools/testing/selftests/powerpc/ptrace/ptrace-vsx.c  | 2 +-
 15 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/tools/testing/selftests/powerpc/ptrace/child.h 
b/tools/testing/selftests/powerpc/ptrace/child.h
index d7275b7b33dc..df62ff0735f7 100644
--- a/tools/testing/selftests/powerpc/ptrace/child.h
+++ b/tools/testing/selftests/powerpc/ptrace/child.h
@@ -48,12 +48,12 @@ struct child_sync {
}   \
} while (0)
 
-#define PARENT_SKIP_IF_UNSUPPORTED(x, sync)\
+#define PARENT_SKIP_IF_UNSUPPORTED(x, sync, msg)   \
do {\
if ((x) == -1 && (errno == ENODEV || errno == EINVAL)) { \
(sync)->parent_gave_up = true;  \
prod_child(sync);   \
-   SKIP_IF(1); \
+   SKIP_IF_MSG(1, msg);\
}   \
} while (0)
 
diff --git a/tools/testing/selftests/powerpc/ptrace/core-pkey.c 
b/tools/testing/selftests/powerpc/ptrace/core-pkey.c
index f6f8596ce8e1..f6da4cb30cd6 100644
--- a/tools/testing/selftests/powerpc/ptrace/core-pkey.c
+++ b/tools/testing/selftests/powerpc/ptrace/core-pkey.c
@@ -266,7 +266,7 @@ static int parent(struct shared_info *info, pid_t pid)
 * to the child.
 */
ret = ptrace_read_regs(pid, NT_PPC_PKEY, regs, 3);
-   PARENT_SKIP_IF_UNSUPPORTED(ret, &info->child_sync);
+   PARENT_SKIP_IF_UNSUPPORTED(ret, &info->child_sync, "PKEYs not 
supported");
PARENT_FAIL_IF(ret, >child_sync);
 
info->amr = regs[0];
diff --git a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c 
b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
index f75739bbad28..e374c6b7ace6 100644
--- a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
+++ b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
@@ -884,7 +884,7 @@ static int perf_hwbreak(void)
 {
srand ( time(NULL) );
 
-   SKIP_IF(!perf_breakpoint_supported());
+   SKIP_IF_MSG(!perf_breakpoint_supported(), "Perf breakpoints not 
supported");
 
return runtest();
 }
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c 
b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
index 1345e9b9af0f..a16239277a6f 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
@@ -603,7 +603,7 @@ static int ptrace_hwbreak(void)
wait(NULL);
 
get_dbginfo(child_pid, );
-   SKIP_IF(dbginfo.num_data_bps == 0);
+   SKIP_IF_MSG(dbginfo.num_data_bps == 0, "No data breakpoints present");
 
dawr = dawr_present();
run_tests(child_pid, , dawr);
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c 
b/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c
index 3344e74a97b4..16c653600124 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c
@@ -641,10 +641,10 @@ static int ptrace_perf_hwbreak(void)
wait(NULL); /* <-- child (SIGUSR1) */
 
get_dbginfo(child_pid, );
-  

[PATCH v1 4/4] selftests/powerpc/ptrace: Declare test temporary variables as volatile

2023-07-24 Thread Benjamin Gray
While the target is volatile, the temporary variables used to access the
target cast away the volatile. This is undefined behaviour, and a
compiler may optimise away/reorder these accesses, breaking the test.

This was observed with GCC 13.1.1, but it can be difficult to reproduce
because of the dependency on compiler behaviour.
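
A minimal illustration of the problem (not taken from the test itself;
__u8 and __u64 come from <linux/types.h>, and glvar stands in for the
watched global):

        volatile __u64 glvar;

        void write_byte_broken(void)
        {
                __u8 *p = (__u8 *)&glvar;       /* volatile qualification cast away */
                *p = 0xff;                      /* UB: store may be elided or reordered */
        }

        void write_byte_fixed(void)
        {
                volatile __u8 *p = (volatile __u8 *)&glvar;
                *p = 0xff;                      /* access keeps volatile semantics */
        }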

Signed-off-by: Benjamin Gray 
---
 .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 24 +--
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c 
b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
index a16239277a6f..75d30d61ab0e 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
@@ -64,26 +64,26 @@ static bool dawr_present(struct ppc_debug_info *dbginfo)
 
 static void write_var(int len)
 {
-   __u8 *pcvar;
-   __u16 *psvar;
-   __u32 *pivar;
-   __u64 *plvar;
+   volatile __u8 *pcvar;
+   volatile __u16 *psvar;
+   volatile __u32 *pivar;
+   volatile __u64 *plvar;
 
switch (len) {
case 1:
-   pcvar = (__u8 *)&glvar;
+   pcvar = (volatile __u8 *)&glvar;
*pcvar = 0xff;
break;
case 2:
-   psvar = (__u16 *)&glvar;
+   psvar = (volatile __u16 *)&glvar;
*psvar = 0xffff;
break;
case 4:
-   pivar = (__u32 *)&glvar;
+   pivar = (volatile __u32 *)&glvar;
*pivar = 0xffffffff;
break;
case 8:
-   plvar = (__u64 *)&glvar;
+   plvar = (volatile __u64 *)&glvar;
*plvar = 0xffffffffffffffffLL;
break;
}
@@ -98,16 +98,16 @@ static void read_var(int len)
 
switch (len) {
case 1:
-   cvar = (__u8)glvar;
+   cvar = (volatile __u8)glvar;
break;
case 2:
-   svar = (__u16)glvar;
+   svar = (volatile __u16)glvar;
break;
case 4:
-   ivar = (__u32)glvar;
+   ivar = (volatile __u32)glvar;
break;
case 8:
-   lvar = (__u64)glvar;
+   lvar = (volatile __u64)glvar;
break;
}
 }
-- 
2.41.0



[PATCH v1 1/4] Documentation/powerpc: Fix ptrace request names

2023-07-24 Thread Benjamin Gray
The documented ptrace request names are currently wrong/incomplete.
Fix this to improve correctness and searchability.
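
For reference, the corrected request names as they appear at a call site
(an illustrative sketch using the uapi constants; 'pid' and 'watch_addr'
are stand-ins for the tracee and the data being watched):

        struct ppc_debug_info dbginfo;
        struct ppc_hw_breakpoint bp_info = {
                .version        = 1,    /* PPC_DEBUG_CURRENT_VERSION */
                .trigger_type   = PPC_BREAKPOINT_TRIGGER_WRITE,
                .addr_mode      = PPC_BREAKPOINT_MODE_EXACT,
                .condition_mode = PPC_BREAKPOINT_CONDITION_NONE,
                .addr           = (__u64)watch_addr,
        };
        int handle;

        ptrace(PPC_PTRACE_GETHWDBGINFO, pid, NULL, &dbginfo);
        handle = ptrace(PPC_PTRACE_SETHWDEBUG, pid, NULL, &bp_info);
        ptrace(PPC_PTRACE_DELHWDEBUG, pid, NULL, handle);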

Signed-off-by: Benjamin Gray 
---
 Documentation/powerpc/ptrace.rst | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/powerpc/ptrace.rst b/Documentation/powerpc/ptrace.rst
index 77725d69eb4a..5629edf4d56e 100644
--- a/Documentation/powerpc/ptrace.rst
+++ b/Documentation/powerpc/ptrace.rst
@@ -15,7 +15,7 @@ that's extendable and that covers both BookE and server 
processors, so
 that GDB doesn't need to special-case each of them. We added the
 following 3 new ptrace requests.
 
-1. PTRACE_PPC_GETHWDEBUGINFO
+1. PPC_PTRACE_GETHWDBGINFO
 
 
 Query for GDB to discover the hardware debug features. The main info to
@@ -48,7 +48,7 @@ features will have bits indicating whether there is support 
for::
   #define PPC_DEBUG_FEATURE_DATA_BP_DAWR   0x10
   #define PPC_DEBUG_FEATURE_DATA_BP_ARCH_310x20
 
-2. PTRACE_SETHWDEBUG
+2. PPC_PTRACE_SETHWDEBUG
 
 Sets a hardware breakpoint or watchpoint, according to the provided structure::
 
@@ -88,7 +88,7 @@ that the BookE supports. COMEFROM breakpoints available in 
server processors
 are not contemplated, but that is out of the scope of this work.
 
 ptrace will return an integer (handle) uniquely identifying the breakpoint or
-watchpoint just created. This integer will be used in the PTRACE_DELHWDEBUG
+watchpoint just created. This integer will be used in the PPC_PTRACE_DELHWDEBUG
 request to ask for its removal. Return -ENOSPC if the requested breakpoint
 can't be allocated on the registers.
 
@@ -150,7 +150,7 @@ Some examples of using the structure to:
 p.addr2   = (uint64_t) end_range;
 p.condition_value = 0;
 
-3. PTRACE_DELHWDEBUG
+3. PPC_PTRACE_DELHWDEBUG
 
 Takes an integer which identifies an existing breakpoint or watchpoint
 (i.e., the value returned from PTRACE_SETHWDEBUG), and deletes the
-- 
2.41.0



[PATCH v1 0/4] Improve ptrace selftest usability

2023-07-24 Thread Benjamin Gray
While testing changes to the kernel's breakpoint implementation, I ran the
ptrace tests and met many unexplained skips and failures.

This series addresses the pain points of trying to run these tests and learn
what they are doing.

Benjamin Gray (4):
  Documentation/powerpc: Fix ptrace request names
  selftests/powerpc/ptrace: Explain why tests are skipped
  selftests/powerpc/ptrace: Fix typo in pid_max search error
  selftests/powerpc/ptrace: Declare test temporary variables as volatile

 Documentation/powerpc/ptrace.rst  |  8 +++---
 .../testing/selftests/powerpc/ptrace/child.h  |  4 +--
 .../selftests/powerpc/ptrace/core-pkey.c  |  2 +-
 .../selftests/powerpc/ptrace/perf-hwbreak.c   |  2 +-
 .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 26 +--
 .../powerpc/ptrace/ptrace-perf-hwbreak.c  |  6 ++---
 .../selftests/powerpc/ptrace/ptrace-pkey.c|  2 +-
 .../selftests/powerpc/ptrace/ptrace-tar.c |  2 +-
 .../selftests/powerpc/ptrace/ptrace-tm-gpr.c  |  4 +--
 .../powerpc/ptrace/ptrace-tm-spd-gpr.c|  4 +--
 .../powerpc/ptrace/ptrace-tm-spd-tar.c|  4 +--
 .../powerpc/ptrace/ptrace-tm-spd-vsx.c|  4 +--
 .../selftests/powerpc/ptrace/ptrace-tm-spr.c  |  4 +--
 .../selftests/powerpc/ptrace/ptrace-tm-tar.c  |  4 +--
 .../selftests/powerpc/ptrace/ptrace-tm-vsx.c  |  4 +--
 .../selftests/powerpc/ptrace/ptrace-vsx.c |  2 +-
 16 files changed, 41 insertions(+), 41 deletions(-)

--
2.41.0


[PATCH] powerpc/kasan: Disable KCOV in KASAN code

2023-07-09 Thread Benjamin Gray
As per the generic KASAN code in mm/kasan, disable KCOV with
KCOV_INSTRUMENT := n in the makefile.

This fixes a ppc64 boot hang when KCOV and KASAN are enabled.
kasan_early_init() gets called before a PACA is initialised, but the
KCOV hook expects a valid PACA.

Suggested-by: Christophe Leroy 
Signed-off-by: Benjamin Gray 

---

The generic makefile disables other sanitisers / hooks as well, but this
patch only upstreams the fix for the issue blocking syzkaller runs. That
blocker is resolved: kernels stop hanging on boot with KCOV + KASAN.

This works as an alternative to
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20230707013110.215693-1-bg...@linux.ibm.com/
---
 arch/powerpc/mm/kasan/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/mm/kasan/Makefile b/arch/powerpc/mm/kasan/Makefile
index 699eeffd9f55..f9522fd70b2f 100644
--- a/arch/powerpc/mm/kasan/Makefile
+++ b/arch/powerpc/mm/kasan/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 
 KASAN_SANITIZE := n
+KCOV_INSTRUMENT := n
 
 obj-$(CONFIG_PPC32)+= init_32.o
 obj-$(CONFIG_PPC_8xx)  += 8xx.o
-- 
2.41.0



Re: [PATCH] powerpc/powermac: Use early_* IO variants in via_calibrate_decr

2023-07-06 Thread Benjamin Gray
On Thu, 2023-07-06 at 11:08 +1000, Benjamin Gray wrote:
> The issue is pre-existing, but is surfaced by commit 721255b9826b
> ("genirq: Use a maple tree for interrupt descriptor management").
> It's not clear to me why this causes it to surface.

From the thread chain in [1], it looks like the maple tree
implementation just has different allocation behaviour, which follows a
pre-existing code path to kmem_cache_alloc_bulk(), which
unconditionally enables interrupts.

(thanks Jordan Niethe for finding this thread)

[1]: https://lore.kernel.org/all/87o7qdzfay.ffs@tglx/

