Re: [PATCH RFC 2/2] powerpc/selftests: add test for papr-vpd

2023-08-24 Thread Russell Currey
On Tue, 2023-08-22 at 16:33 -0500, Nathan Lynch via B4 Relay wrote:
> From: Nathan Lynch 

Hi Nathan,

snowpatch has found a compiler error with this patch.


   Error: papr_vpd.c:346:33: error: passing argument 2 of 'test_harness'
   discards 'const' qualifier from pointer target type [-Werror=discarded-
   qualifiers]
  if (test_harness(t->function, t->description))
^
   In file included from papr_vpd.c:11:0:
   /linux/tools/testing/selftests/powerpc/include/utils.h:35:5: note:
   expected 'char *' but argument is of type 'const char * const'
int test_harness(int (test_function)(void), char *name);
^
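For readers unfamiliar with `-Werror=discarded-qualifiers`: the warning fires whenever a `const`-qualified pointer is passed to a parameter declared without `const`. The fix direction is to const-qualify the harness parameter. A minimal userspace sketch of the const-correct shape (hypothetical names, not the actual selftest code):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical test-case table in the style of papr_vpd.c */
struct testcase {
	int (*function)(void);
	const char *const description;
};

static int always_pass(void) { return 0; }

/* With `const char *name`, string literals and entries from a const
 * table can be passed without discarding qualifiers. */
int run_test(int (*test_function)(void), const char *name)
{
	assert(name && strlen(name) > 0); /* name is only read */
	return test_function();
}

const struct testcase cases[] = {
	{ always_pass, "papr_vpd_example" },
};

int run_all(void)
{
	int rc = 0;
	for (size_t i = 0; i < sizeof(cases) / sizeof(cases[0]); i++)
		rc |= run_test(cases[i].function, cases[i].description);
	return rc;
}
```

Had `run_test()` taken a plain `char *name`, the call from the const table would reproduce the error shown above.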

https://github.com/linuxppc/linux-snowpatch/actions/runs/5960052721/job/16166735476#step:6:1337
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20230822-papr-sys_rtas-vs-lockdown-v1-2-932623cf3...@linux.ibm.com/

- Russell


[PATCH] powerpc/pseries: Rework lppaca_shared_proc() to avoid DEBUG_PREEMPT

2023-08-07 Thread Russell Currey
lppaca_shared_proc() takes a pointer to the lppaca which is typically
accessed through get_lppaca().  With DEBUG_PREEMPT enabled, this leads
to a check that preemption is disabled, which fires when the lppaca is
accessed while preemptible, for example:

BUG: using smp_processor_id() in preemptible [] code: grep/10693
caller is lparcfg_data+0x408/0x19a0
CPU: 4 PID: 10693 Comm: grep Not tainted 6.5.0-rc3 #2
Call Trace:
dump_stack_lvl+0x154/0x200 (unreliable)
check_preemption_disabled+0x214/0x220
lparcfg_data+0x408/0x19a0
...

This isn't actually a problem, however, as it does not matter which
lppaca is accessed: the shared proc state will be the same.
vcpudispatch_stats_procfs_init() already works around this by disabling
preemption, but the lparcfg code does not, erroring any time
/proc/powerpc/lparcfg is accessed with DEBUG_PREEMPT enabled.

Instead of disabling preemption on the caller side, rework
lppaca_shared_proc() to not take a pointer and instead directly access
the lppaca, bypassing any potential preemption checks.
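The difference between the two approaches can be sketched in plain userspace C. This is a simulation with made-up names: the real code uses the PACA, `get_lppaca()` and kernel preempt primitives, and the DEBUG_PREEMPT check is a WARN rather than an assert.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

struct lppaca { unsigned char shared_proc; };

/* Every CPU holds the same shared_proc value. */
struct lppaca lppaca_of[NR_CPUS] = { {1}, {1}, {1}, {1} };

int preempt_count;
void preempt_disable(void) { preempt_count++; }
void preempt_enable(void)  { preempt_count--; }

/* Stand-in for get_lppaca(): under DEBUG_PREEMPT, reading "this CPU's"
 * pointer is only legal with preemption disabled. */
struct lppaca *get_lppaca(void)
{
	assert(preempt_count > 0); /* the kernel would WARN here */
	return &lppaca_of[0];
}

/* Old style: each caller must bracket the access, as
 * vcpudispatch_stats_procfs_init() did. */
bool shared_proc_old(void)
{
	bool ret;

	preempt_disable();
	ret = get_lppaca()->shared_proc;
	preempt_enable();
	return ret;
}

/* New style: since all CPUs agree, read a fixed slot directly and skip
 * the preemption dance entirely (the real macro reads
 * local_paca->lppaca_ptr). */
bool shared_proc_new(void)
{
	return lppaca_of[0].shared_proc;
}
```

Calling `get_lppaca()` outside a `preempt_disable()` section trips the assert, which is the userspace analogue of the BUG splat quoted above.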

Fixes: f13c13a00512 ("powerpc: Stop using non-architected shared_proc field in lppaca")
Signed-off-by: Russell Currey 
---
Fixes tag might be a bit overkill.
---
 arch/powerpc/include/asm/lppaca.h|  9 -
 arch/powerpc/include/asm/paca.h  |  5 +
 arch/powerpc/platforms/pseries/lpar.c| 10 +-
 arch/powerpc/platforms/pseries/lparcfg.c |  4 ++--
 arch/powerpc/platforms/pseries/setup.c   |  2 +-
 drivers/cpuidle/cpuidle-pseries.c|  8 +---
 6 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/lppaca.h 
b/arch/powerpc/include/asm/lppaca.h
index 34d44cb17c87..c12e1a6e3595 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -127,7 +127,14 @@ struct lppaca {
  */
 #define LPPACA_OLD_SHARED_PROC 2
 
-static inline bool lppaca_shared_proc(struct lppaca *l)
+/*
+ * All CPUs should have the same shared proc value, so directly access the PACA
+ * to avoid false positives from DEBUG_PREEMPT.
+ *
+ * local_paca can't be referenced directly from lppaca.h, hence the macro.
+ */
+#define lppaca_shared_proc() (__lppaca_shared_proc(local_paca->lppaca_ptr))
+static inline bool __lppaca_shared_proc(struct lppaca *l)
 {
if (!firmware_has_feature(FW_FEATURE_SPLPAR))
return false;
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index cb325938766a..f77337b92ccf 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -49,6 +49,11 @@ extern unsigned int debug_smp_processor_id(void); /* from 
linux/smp.h */
 
 #ifdef CONFIG_PPC_PSERIES
 #define get_lppaca()   (get_paca()->lppaca_ptr)
+/*
+ * All CPUs should have the same shared proc value, so directly access the PACA
+ * to avoid false positives from DEBUG_PREEMPT.
+ */
+#define lppaca_shared_proc() (__lppaca_shared_proc(local_paca->lppaca_ptr))
 #endif
 
 #define get_slb_shadow()   (get_paca()->slb_shadow_ptr)
diff --git a/arch/powerpc/platforms/pseries/lpar.c 
b/arch/powerpc/platforms/pseries/lpar.c
index 2eab323f6970..cb2f1211f7eb 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -639,16 +639,8 @@ static const struct proc_ops 
vcpudispatch_stats_freq_proc_ops = {
 
 static int __init vcpudispatch_stats_procfs_init(void)
 {
-   /*
-* Avoid smp_processor_id while preemptible. All CPUs should have
-* the same value for lppaca_shared_proc.
-*/
-   preempt_disable();
-   if (!lppaca_shared_proc(get_lppaca())) {
-   preempt_enable();
+   if (!lppaca_shared_proc())
return 0;
-   }
-   preempt_enable();
 
if (!proc_create("powerpc/vcpudispatch_stats", 0600, NULL,
			 &vcpudispatch_stats_proc_ops))
diff --git a/arch/powerpc/platforms/pseries/lparcfg.c 
b/arch/powerpc/platforms/pseries/lparcfg.c
index 8acc70509520..1c151d77e74b 100644
--- a/arch/powerpc/platforms/pseries/lparcfg.c
+++ b/arch/powerpc/platforms/pseries/lparcfg.c
@@ -206,7 +206,7 @@ static void parse_ppp_data(struct seq_file *m)
   ppp_data.active_system_procs);
 
/* pool related entries are appropriate for shared configs */
-   if (lppaca_shared_proc(get_lppaca())) {
+   if (lppaca_shared_proc()) {
unsigned long pool_idle_time, pool_procs;
 
seq_printf(m, "pool=%d\n", ppp_data.pool_num);
@@ -560,7 +560,7 @@ static int pseries_lparcfg_data(struct seq_file *m, void *v)
   partition_potential_processors);
 
seq_printf(m, "shared_processor_mode=%d\n",
-  lppaca_shared_proc(get_lppaca()));
+  lppaca_shared_proc());
 
 #ifdef CONFIG_PPC_64S_HASH_MMU
if (!radix_enabled())
diff --git a/arch/powerpc/platforms/pseries/set

[PATCH] powerpc/crypto: Add gitignore for generated P10 AES/GCM .S files

2023-07-12 Thread Russell Currey
aesp10-ppc.S and ghashp10-ppc.S are autogenerated and not tracked by
git, so they should be ignored.  This is doing the same as the P8 files
in drivers/crypto/vmx/.gitignore but for the P10 files in
arch/powerpc/crypto.

Signed-off-by: Russell Currey 
---
Fixes: 81d358b118dc ("powerpc/crypto: Fix aes-gcm-p10 link errors")

(but who cares)

 arch/powerpc/crypto/.gitignore | 3 +++
 1 file changed, 3 insertions(+)
 create mode 100644 arch/powerpc/crypto/.gitignore

diff --git a/arch/powerpc/crypto/.gitignore b/arch/powerpc/crypto/.gitignore
new file mode 100644
index ..e1094f08f713
--- /dev/null
+++ b/arch/powerpc/crypto/.gitignore
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+aesp10-ppc.S
+ghashp10-ppc.S
-- 
2.41.0



Re: [PATCH] powerpc/security: Fix Speculation_Store_Bypass reporting on Power10

2023-05-17 Thread Russell Currey
On Wed, 2023-05-17 at 17:49 +1000, Michael Ellerman wrote:
> Nageswara reported that /proc/self/status was showing "vulnerable"
> for
> the Speculation_Store_Bypass feature on Power10, eg:
> 
>   $ grep Speculation_Store_Bypass: /proc/self/status
>   Speculation_Store_Bypass:   vulnerable
> 
> But at the same time the sysfs files, and lscpu, were showing "Not
> affected".
> 
> This turns out to simply be a bug in the reporting of the
> Speculation_Store_Bypass, aka. PR_SPEC_STORE_BYPASS, case.
> 
> When SEC_FTR_STF_BARRIER was added, so that firmware could
> communicate
> the vulnerability was not present, the code in ssb_prctl_get() was
> not
> updated to check the new flag.
> 
> So add the check for SEC_FTR_STF_BARRIER being disabled. Rather than
> adding the new check to the existing if block and expanding the
> comment
> to cover both cases, rewrite the three cases to be separate so they
> can
> be commented separately for clarity.
> 
> Fixes: 84ed26fd00c5 ("powerpc/security: Add a security feature for
> STF barrier")
> Cc: sta...@vger.kernel.org # v5.14+
> Reported-by: Nageswara R Sastry 
> Signed-off-by: Michael Ellerman 

Reviewed-by: Russell Currey 


Re: [PATCH v2 10/12] selftests/powerpc: Add more utility macros

2023-05-07 Thread Russell Currey
On Thu, 2023-03-30 at 16:50 +1100, Benjamin Gray wrote:
> * Include unistd.h for _exit()
> * Include stdio.h for fprintf()
> * Adds _MSG assertion variants to provide more context behind why a
>   failure occurred.
> * Move ARRAY_SIZE macro to utils.h
> 
> The _MSG variants and ARRAY_SIZE will be used by the following
> DEXCR selftests.
> 
> Signed-off-by: Benjamin Gray 
> 

I'd prefer if your commit message led with adding FAIL_IF_MSG etc and
then mentioned the other changes to support it.  It's unintuitive to
read as-is.

Reviewed-by: Russell Currey 


Re: [PATCH v2 09/12] Documentation: Document PowerPC kernel DEXCR interface

2023-05-07 Thread Russell Currey
On Thu, 2023-03-30 at 16:50 +1100, Benjamin Gray wrote:
> Describe the DEXCR and document how to configure it.
> 
> Signed-off-by: Benjamin Gray 
> 
> ---
> v2: * Document coredump & ptrace support
> v1: * Remove the dynamic control docs, describe the static config
>   option
> 
> This documentation is a little bare for now, but will be expanded on
> when dynamic DEXCR control is added.

Reviewed-by: Russell Currey 


Re: [PATCH v2 08/12] powerpc/ptrace: Expose HASHKEYR register to ptrace

2023-05-07 Thread Russell Currey
On Thu, 2023-03-30 at 16:50 +1100, Benjamin Gray wrote:
> The HASHKEYR register contains a secret per-process key to enable
> unique
> hashes per process. In general it should not be exposed to userspace
> at all and a regular process has no need to know its key.
> 
> However, checkpoint restore in userspace (CRIU) functionality
> requires
> that a process be able to set the HASHKEYR of another process,
> otherwise
> existing hashes on the stack would be invalidated by a new random
> key.
> 
> Exposing HASHKEYR in this way also makes it appear in core dumps,
> which
> is a security concern. Multiple threads may share a key, for example
> just after a fork() call, where the kernel cannot know if the child
> is
> going to return back along the parent's stack. If such a thread is
> coerced into making a core dump, then the HASHKEYR value will be
> readable and able to be used against all other threads sharing that
> key,
> effectively undoing any protection offered by hashst/hashchk.
> 
> Therefore we expose HASHKEYR to ptrace when CONFIG_CHECKPOINT_RESTORE
> is
> enabled, providing a choice of increased security or migratable ROP
> protected processes. This is similar to how ARM exposes its PAC keys.
> 
> Signed-off-by: Benjamin Gray 

Seems sensible

Reviewed-by: Russell Currey 


Re: [PATCH v2 07/12] powerpc/ptrace: Expose DEXCR and HDEXCR registers to ptrace

2023-05-07 Thread Russell Currey
On Thu, 2023-03-30 at 16:50 +1100, Benjamin Gray wrote:
> The DEXCR register is of interest when ptracing processes. Currently
> it
> is static, but eventually will be dynamically controllable by a
> process.
> If a process can control its own, then it is useful for it to be
> ptrace-able to (e.g., for checkpoint-restore functionality).
> 
> It is also relevant to core dumps (the NPHIE aspect in particular),
> which use the ptrace mechanism (or is it the other way around?) to
> decide what to dump. The HDEXCR is useful here too, as the NPHIE
> aspect
> may be set in the HDEXCR without being set in the DEXCR. Although the
> HDEXCR is per-cpu and we don't track it in the task struct (it's
> useless
> in normal operation), it would be difficult to imagine why a
> hypervisor
> would set it to different values within a guest. A hypervisor cannot
> safely set NPHIE differently at least, as that would break programs.
> 
> Expose a read-only view of the userspace DEXCR and HDEXCR to ptrace.
> The HDEXCR is always readonly, and is useful for diagnosing the core
> dumps (as the HDEXCR may set NPHIE without the DEXCR setting it).
> 
> Signed-off-by: Benjamin Gray 
> 

I don't know much about ptrace but this looks sane.

Reviewed-by: Russell Currey 


Re: [PATCH v2 06/12] powerpc/dexcr: Support custom default DEXCR value

2023-05-07 Thread Russell Currey
On Thu, 2023-03-30 at 16:50 +1100, Benjamin Gray wrote:
> Make the DEXCR value configurable at config time. Intentionally don't
> limit possible values to support future aspects without needing
> kernel
> updates.
> 
> The default config value enables hashst/hashchk in problem state.
> This should be safe, as generally software needs to request these
> instructions be included in the first place.
> 
> Signed-off-by: Benjamin Gray 
> 

Reviewed-by: Russell Currey 

> ---
> New in v1
> 
> Preface with: I'm not sure on the best place to put the config.

Me neither.

> 
> I also don't think there's any need to zero out unknown/unsupported
> bits. Reserved implies they are ignored by the hardware (from my
> understanding of the ISA). Current P10s boot with all bits set;
> lsdexcr
> (later patch) reports
> 
>    uDEXCR: ff00 (SBHE, IBRTPD, SRAPD, NPHIE, PHIE, unknown)
> 
> when you try to read it back. Leaving them be also makes it easier to
> support newer aspects without a kernel update.
> 
> If arbitrary value support isn't important, it's probably a nicer
> interface to make each aspect an entry in a menu.
> 
> Future work may include dynamic DEXCR controls via prctl() and sysfs.
> The dynamic controls would be able to override this default DEXCR on
> a
> per-process basis. A stronger "PPC_ENFORCE_USER_ROP_PROTECTION"
> config
> may be required at such a time to prevent dynamically disabling the
> hash checks.
> ---
>  arch/powerpc/Kconfig  | 14 ++
>  arch/powerpc/kernel/cpu_setup_power.c |  3 ++-
>  2 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 47017975fc2b..809ae576e19f 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -1035,6 +1035,20 @@ config PPC_MEM_KEYS
>  
>   If unsure, say y.
>  
> +config PPC_DEXCR_DEFAULT

Calling it "default" could be slightly misleading since there's no
other way to modify the DEXCR right now.  It'll make more sense once
it's user configurable though.


> +   hex "Default DEXCR value"
> +   default 0x0400
> +   depends on PPC_BOOK3S_64
> +   help
> + Power10 introduces the Dynamic Execution Control Register
> (DEXCR)
> + to provide fine grained control over various speculation
> and
> + security capabilities. This is used as the default DEXCR
> value.
> +
> + It is a 64 bit value that splits into 32 bits for
> supervisor mode
> + and 32 bits for problem state. The default config value
> enables
> + the hashst/hashck instructions in userspace. See the ISA
> for

hashchk*

May also be useful to reference the ISA version here.

> + specifics of what each bit controls.
> +
>  config PPC_SECURE_BOOT
> prompt "Enable secure boot support"
> bool
> diff --git a/arch/powerpc/kernel/cpu_setup_power.c
> b/arch/powerpc/kernel/cpu_setup_power.c
> index c00721801a1b..814c825a0661 100644
> --- a/arch/powerpc/kernel/cpu_setup_power.c
> +++ b/arch/powerpc/kernel/cpu_setup_power.c
> @@ -10,6 +10,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  
> @@ -128,7 +129,7 @@ static void init_PMU_ISA31(void)
>  
>  static void init_DEXCR(void)
>  {
> -   mtspr(SPRN_DEXCR, 0);
> +   mtspr(SPRN_DEXCR, CONFIG_PPC_DEXCR_DEFAULT);
> mtspr(SPRN_HASHKEYR, 0);
>  }
>  



Re: [PATCH v2 05/12] powerpc/dexcr: Support userspace ROP protection

2023-05-07 Thread Russell Currey
On Thu, 2023-03-30 at 16:50 +1100, Benjamin Gray wrote:
> The ISA 3.1B hashst and hashchk instructions use a per-cpu SPR
> HASHKEYR
> to hold a key used in the hash calculation. This key should be
> different
> for each process to make it harder for a malicious process to
> recreate
> valid hash values for a victim process.
> 
> Add support for storing a per-thread hash key, and setting/clearing
> HASHKEYR appropriately.
> 
> Signed-off-by: Benjamin Gray 

Reviewed-by: Russell Currey 


Re: [PATCH v2 04/12] powerpc/dexcr: Handle hashchk exception

2023-05-07 Thread Russell Currey
On Thu, 2023-03-30 at 16:50 +1100, Benjamin Gray wrote:
> Recognise and pass the appropriate signal to the user program when a
> hashchk instruction triggers. This is independent of allowing
> configuration of DEXCR[NPHIE], as a hypervisor can enforce this
> aspect
> regardless of the kernel.
> 
> The signal mirrors how ARM reports their similar check failure. For
> example, their FPAC handler in arch/arm64/kernel/traps.c
> do_el0_fpac()
> does this. When we fail to read the instruction that caused the fault
> we send a segfault, similar to how emulate_math() does it.
> 
> Signed-off-by: Benjamin Gray 
> 
> ---
> 
> v1: * Refactor the hashchk check to return 0 on success, an error
>   code on failure. Determine what to do based on specific
> error
>   code.

0 on success makes sense, but it's not exactly obvious what "success"
is in this case.

> * Motivate signal and code
> ---
>  arch/powerpc/include/asm/ppc-opcode.h |  1 +
>  arch/powerpc/include/asm/processor.h  |  9 +++
>  arch/powerpc/kernel/Makefile  |  1 +
>  arch/powerpc/kernel/dexcr.c   | 36
> +++
>  arch/powerpc/kernel/traps.c   | 10 
>  5 files changed, 57 insertions(+)
>  create mode 100644 arch/powerpc/kernel/dexcr.c
> 
> diff --git a/arch/powerpc/include/asm/ppc-opcode.h
> b/arch/powerpc/include/asm/ppc-opcode.h
> index 21e33e46f4b8..89b316466ed1 100644
> --- a/arch/powerpc/include/asm/ppc-opcode.h
> +++ b/arch/powerpc/include/asm/ppc-opcode.h
> @@ -215,6 +215,7 @@
>  #define OP_31_XOP_STFSX    663
>  #define OP_31_XOP_STFSUX    695
>  #define OP_31_XOP_STFDX 727
> +#define OP_31_XOP_HASHCHK   754
>  #define OP_31_XOP_STFDUX    759
>  #define OP_31_XOP_LHBRX 790
>  #define OP_31_XOP_LFIWAX    855
> diff --git a/arch/powerpc/include/asm/processor.h
> b/arch/powerpc/include/asm/processor.h
> index e96c9b8c2a60..bad64d6a5d36 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -455,6 +455,15 @@ int exit_vmx_usercopy(void);
>  int enter_vmx_ops(void);
>  void *exit_vmx_ops(void *dest);
>  
> +#ifdef CONFIG_PPC_BOOK3S_64
> +int check_hashchk_trap(struct pt_regs const *regs);
> +#else
> +static inline int check_hashchk_trap(struct pt_regs const *regs)
> +{
> +   return -EINVAL;
> +}
> +#endif /* CONFIG_PPC_BOOK3S_64 */
> +
>  #endif /* __KERNEL__ */
>  #endif /* __ASSEMBLY__ */
>  #endif /* _ASM_POWERPC_PROCESSOR_H */
> diff --git a/arch/powerpc/kernel/Makefile
> b/arch/powerpc/kernel/Makefile
> index 9bf2be123093..07181e508754 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -88,6 +88,7 @@ obj-$(CONFIG_HAVE_HW_BREAKPOINT)  +=
> hw_breakpoint.o
>  obj-$(CONFIG_PPC_DAWR) += dawr.o
>  obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_ppc970.o
> cpu_setup_pa6t.o
>  obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_power.o
> +obj-$(CONFIG_PPC_BOOK3S_64)+= dexcr.o
>  obj-$(CONFIG_PPC_BOOK3S_64)+= mce.o mce_power.o
>  obj-$(CONFIG_PPC_BOOK3E_64)+= exceptions-64e.o idle_64e.o
>  obj-$(CONFIG_PPC_BARRIER_NOSPEC) += security.o
> diff --git a/arch/powerpc/kernel/dexcr.c
> b/arch/powerpc/kernel/dexcr.c
> new file mode 100644
> index ..f263e5439cc6
> --- /dev/null
> +++ b/arch/powerpc/kernel/dexcr.c
> @@ -0,0 +1,36 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * DEXCR infrastructure

May as well spell out DEXCR here

> + *
> + * Copyright 2023, Benjamin Gray, IBM Corporation.
> + */
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +int check_hashchk_trap(struct pt_regs const *regs)
> +{
> +   ppc_inst_t insn;
> +
> +   if (!cpu_has_feature(CPU_FTR_DEXCR_NPHIE))
> +   return -EINVAL;
> +
> +   if (!user_mode(regs))
> +   return -EINVAL;
> +
> +   if (get_user_instr(insn, (void __user *)regs->nip))
> +   return -EFAULT;
> +
> +   if (ppc_inst_primary_opcode(insn) != 31 ||
> +   get_xop(ppc_inst_val(insn)) != OP_31_XOP_HASHCHK)
> +   return -EINVAL;
> +
> +   return 0;
> +}

The return values here are quite confusing and only start to make sense
in the context of the calling function, which isn't great for something
living in a different file.  If we return 0 we raise SIGILL because of
hashchk, -EFAULT maps to SEGV_MAPERR, and -EINVAL falls through to
SIGTRAP.  I would like a comment for that since it's not very intuitive.
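For illustration, the mapping described above can be sketched as a plain function (hypothetical helper name; the real dispatch lives in do_program_check() and raises signals rather than returning them):

```c
#include <errno.h>
#include <signal.h>

/* Map check_hashchk_trap()-style return codes onto the signal the trap
 * handler would deliver:
 *   0       -> SIGILL  (a genuine hashchk failure)
 *   -EFAULT -> SIGSEGV (couldn't read the faulting instruction)
 *   -EINVAL -> SIGTRAP (not a hashchk; generic trap path)
 */
int signal_for_hashchk_result(int rc)
{
	switch (rc) {
	case 0:
		return SIGILL;
	case -EFAULT:
		return SIGSEGV;
	default:
		return SIGTRAP;
	}
}
```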

> diff --git a/arch/powerpc/kernel/traps.c
> b/arch/powerpc/kernel/traps.c
> index 9bdd79aa51cf..ade67e23b974 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -1516,6 +1516,16 @@ static void do_program_check(struct pt_regs
> *regs)
> return;
> }
> }
> +
> +   switch (check_hashchk_trap(regs)) {
> +   case 0:
> +

Re: [PATCH v2 01/12] powerpc/book3s: Add missing include

2023-05-07 Thread Russell Currey
On Thu, 2023-03-30 at 16:50 +1100, Benjamin Gray wrote:
> The functions here use struct thread_struct fields, so need to import
> the full definition from . The  header
> that defines current only forward declares struct thread_struct.
> 
> Failing to include this  header leads to a compilation
> error when a translation unit does not also include 
> indirectly.
> 
> Signed-off-by: Benjamin Gray 
> Reviewed-by: Nicholas Piggin 

Reviewed-by: Russell Currey 


Re: [PATCH v2 03/12] powerpc/dexcr: Add initial Dynamic Execution Control Register (DEXCR) support

2023-05-07 Thread Russell Currey
On Thu, 2023-03-30 at 16:50 +1100, Benjamin Gray wrote:
> ISA 3.1B introduces the Dynamic Execution Control Register (DEXCR).
> It
> is a per-cpu register that allows control over various CPU behaviours
> including branch hint usage, indirect branch speculation, and
> hashst/hashchk support.
> 
> Add some definitions and basic support for the DEXCR in the kernel.
> Right now it just
> 
>   * Zero initialises the DEXCR and HASHKEYR when a CPU onlines.
>   * Clears them in reset_sprs().
>   * Detects when the NPHIE aspect is supported (the others don't get
>     looked at in this series, so there's no need to waste a CPU_FTR
>     on them).
> 
> We initialise the HASHKEYR to ensure that all cores have the same
> key,
> so an HV enforced NPHIE + swapping cores doesn't randomly crash a
> process using hash instructions. The stores to HASHKEYR are
> unconditional because the ISA makes no mention of the SPR being
> missing
> if support for doing the hashes isn't present. So all that would
> happen
> is the HASHKEYR value gets ignored. This helps slightly if NPHIE
> detection fails; e.g., we currently only detect it on pseries.
> 
> Signed-off-by: Benjamin Gray 
> 

LGTM.

Reviewed-by: Russell Currey 


Re: [PATCH v2 02/12] powerpc/ptrace: Add missing include

2023-05-07 Thread Russell Currey
On Thu, 2023-03-30 at 16:50 +1100, Benjamin Gray wrote:
> ptrace-decl.h uses user_regset_get2_fn (among other things) from
> regset.h. While all current users of ptrace-decl.h include regset.h
> before it anyway, it adds an implicit ordering dependency and breaks
> source tooling that tries to inspect ptrace-decl.h by itself.
> 
> Signed-off-by: Benjamin Gray 

Reviewed-by: Russell Currey 



[PATCH] powerpc/iommu: Fix notifiers being shared by PCI and VIO buses

2023-03-21 Thread Russell Currey
fail_iommu_setup() registers the fail_iommu_bus_notifier struct to both
PCI and VIO buses.  struct notifier_block is a linked list node, so this
causes any notifiers later registered to either bus type to also be
registered to the other since they share the same node.

This causes issues in (at least) the vgaarb code, which registers a
notifier for PCI buses.  pci_notify() ends up being called on a vio
device, converted with to_pci_dev() even though it's not a PCI device,
and finally makes a bad access in vga_arbiter_add_pci_device() as
discovered with KASAN:

 BUG: KASAN: slab-out-of-bounds in vga_arbiter_add_pci_device+0x60/0xe00
 Read of size 4 at addr c00264c26fdc by task swapper/0/1

 Call Trace:
 [c00263607520] [c00010f7023c] dump_stack_lvl+0x1bc/0x2b8 (unreliable)
 [c00263607560] [cf142a64] print_report+0x3f4/0xc60
 [c00263607640] [cf142144] kasan_report+0x244/0x698
 [c00263607740] [cf1460e8] __asan_load4+0xe8/0x250
 [c00263607760] [cff4b850] vga_arbiter_add_pci_device+0x60/0xe00
 [c00263607850] [cff4c678] pci_notify+0x88/0x444
 [c002636078b0] [ce94dfc4] notifier_call_chain+0x104/0x320
 [c00263607950] [ce94f050] blocking_notifier_call_chain+0xa0/0x140
 [c00263607990] [c000100cb3b8] device_add+0xac8/0x1d30
 [c00263607aa0] [c000100ccd98] device_register+0x58/0x80
 [c00263607ad0] [ce84247c] vio_register_device_node+0x9ac/0xce0
 [c00263607ba0] [c000126c95d8] vio_bus_scan_register_devices+0xc4/0x13c
 [c00263607bd0] [c000126c96e4] 
__machine_initcall_pseries_vio_device_init+0x94/0xf0
 [c00263607c00] [ce69467c] do_one_initcall+0x12c/0xaa8
 [c00263607cf0] [c0001268b8a8] kernel_init_freeable+0xa48/0xba8
 [c00263607dd0] [ce695f24] kernel_init+0x64/0x400
 [c00263607e50] [ce68e0e4] ret_from_kernel_thread+0x5c/0x64

Fix this by creating separate notifier_block structs for each bus type.
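The corruption mechanism can be demonstrated with a minimal userspace model of a notifier chain. This is a simplification: real notifier blocks also carry a priority field and registration is ordered by it, but the essential point is the same, the `next` pointer lives inside the shared node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal model: registering a block links it into a chain's singly
 * linked list through the block's OWN next pointer. */
struct notifier_block {
	int (*call)(void);
	struct notifier_block *next;
};

struct chain { struct notifier_block *head; };

/* Append, roughly what equal-priority registration does. */
void chain_register(struct chain *c, struct notifier_block *nb)
{
	struct notifier_block **p = &c->head;

	while (*p)
		p = &(*p)->next;
	*p = nb;
	nb->next = NULL;
}

bool chain_contains(struct chain *c, struct notifier_block *nb)
{
	for (struct notifier_block *n = c->head; n; n = n->next)
		if (n == nb)
			return true;
	return false;
}
```

Registering one block on two chains, then adding a later notifier (like vgaarb's pci_notify) to only one of them, makes that notifier reachable from both chains, exactly the cross-bus leak described above.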

Fixes: d6b9a81b2a45 ("powerpc: IOMMU fault injection")
Reported-by: Nageswara R Sastry 
Signed-off-by: Russell Currey 
---
 arch/powerpc/kernel/iommu.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index ee95937bdaf1..6f1117fe3870 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -171,17 +171,26 @@ static int fail_iommu_bus_notify(struct notifier_block 
*nb,
return 0;
 }
 
-static struct notifier_block fail_iommu_bus_notifier = {
+/*
+ * PCI and VIO buses need separate notifier_block structs, since they're linked
+ * list nodes.  Sharing a notifier_block would mean that any notifiers later
+ * registered for PCI buses would also get called by VIO buses and vice versa.
+ */
+static struct notifier_block fail_iommu_pci_bus_notifier = {
+   .notifier_call = fail_iommu_bus_notify
+};
+
+static struct notifier_block fail_iommu_vio_bus_notifier = {
.notifier_call = fail_iommu_bus_notify
 };
 
 static int __init fail_iommu_setup(void)
 {
 #ifdef CONFIG_PCI
-   bus_register_notifier(&pci_bus_type, &fail_iommu_bus_notifier);
+   bus_register_notifier(&pci_bus_type, &fail_iommu_pci_bus_notifier);
 #endif
 #ifdef CONFIG_IBMVIO
-   bus_register_notifier(&vio_bus_type, &fail_iommu_bus_notifier);
+   bus_register_notifier(&vio_bus_type, &fail_iommu_vio_bus_notifier);
 #endif
 
return 0;
-- 
2.39.2



[PATCH] powerpc/mm: Fix false detection of read faults

2023-03-09 Thread Russell Currey
To support detection of read faults with Radix execute-only memory, the
vma_is_accessible() check in access_error() (which checks for PROT_NONE)
was replaced with a check to see if VM_READ was missing, and if so,
returns true to assert the fault was caused by a bad read.

This is incorrect, as it ignores that both VM_WRITE and VM_EXEC imply
read on powerpc, as defined in protection_map[].  This causes mappings
containing VM_WRITE or VM_EXEC without VM_READ to misreport the cause of
page faults, since the MMU is still allowing reads.

Correct this by restoring the original vma_is_accessible() check for
PROT_NONE mappings, and adding a separate check for Radix PROT_EXEC-only
mappings.
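The corrected check can be modelled in userspace. Flag values here are illustrative, not the kernel's actual VM_* encoding, and `radix_enabled` is passed as a parameter rather than being a global MMU query:

```c
#include <stdbool.h>

#define VM_READ  0x1UL
#define VM_WRITE 0x2UL
#define VM_EXEC  0x4UL
#define VM_ACCESS_FLAGS (VM_READ | VM_WRITE | VM_EXEC)

/* Should a read fault on a vma with these flags be reported as an
 * access error?  On powerpc, VM_WRITE and VM_EXEC both imply hardware
 * read permission (per protection_map[]), so only a PROT_NONE mapping
 * or a Radix execute-only mapping can legitimately fault on a read. */
bool read_access_error(unsigned long vm_flags, bool radix_enabled)
{
	if ((vm_flags & VM_ACCESS_FLAGS) == 0)	/* PROT_NONE */
		return true;

	if (radix_enabled &&
	    (vm_flags & VM_ACCESS_FLAGS) == VM_EXEC) /* execute-only */
		return true;

	return false;
}
```

The buggy version effectively tested `!(vm_flags & VM_READ)`, which wrongly flags write-only and exec-only-on-Hash mappings even though the MMU still permits reads there.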

Fixes: 395cac7752b9 ("powerpc/mm: Support execute-only memory on the Radix MMU")
Reported-by: Michal Suchánek 
Tested-by: Benjamin Gray 
Signed-off-by: Russell Currey 
---
 arch/powerpc/mm/fault.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index c7ae86b04b8a..d0710ecc1fc7 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -271,11 +271,16 @@ static bool access_error(bool is_write, bool is_exec, 
struct vm_area_struct *vma
}
 
/*
-* Check for a read fault.  This could be caused by a read on an
-* inaccessible page (i.e. PROT_NONE), or a Radix MMU execute-only page.
+* VM_READ, VM_WRITE and VM_EXEC all imply read permissions, as
+* defined in protection_map[].  Read faults can only be caused by
+* a PROT_NONE mapping, or with a PROT_EXEC-only mapping on Radix.
 */
-   if (unlikely(!(vma->vm_flags & VM_READ)))
+   if (unlikely(!vma_is_accessible(vma)))
return true;
+
+   if (unlikely(radix_enabled() && ((vma->vm_flags & VM_ACCESS_FLAGS) == 
VM_EXEC)))
+   return true;
+
/*
 * We should ideally do the vma pkey access check here. But in the
 * fault path, handle_mm_fault() also does the same check. To avoid
-- 
2.39.2



Re: [PATCH v4 1/2] powerpc/mm: Support execute-only memory on the Radix MMU

2023-03-08 Thread Russell Currey
On Wed, 2023-03-08 at 16:27 +0100, Michal Suchánek wrote:
> Hello,
> 
> On Wed, Aug 31, 2022 at 11:13:59PM +1000, Michael Ellerman wrote:
> > On Wed, 17 Aug 2022 15:06:39 +1000, Russell Currey wrote:
> > > Add support for execute-only memory (XOM) for the Radix MMU by
> > > using an
> > > execute-only mapping, as opposed to the RX mapping used by
> > > powerpc's
> > > other MMUs.
> > > 
> > > The Hash MMU already supports XOM through the execute-only pkey,
> > > which is a separate mechanism shared with x86.  A PROT_EXEC-only
> > > mapping
> > > will map to RX, and then the pkey will be applied on top of it.
> > > 
> > > [...]
> > 
> > Applied to powerpc/next.
> > 
> > [1/2] powerpc/mm: Support execute-only memory on the Radix MMU
> >  
> > https://git.kernel.org/powerpc/c/395cac7752b905318ae454a8b859d4c190485510
> 
> This breaks libaio tests (on POWER9 hash PowerVM):
> https://pagure.io/libaio/blob/master/f/harness/cases/5.t#_43
> 
> cases/5.p
> expect   512: (w), res =   512 [Success]
> expect   512: (r), res =   512 [Success]
> expect   512: (r), res =   512 [Success]
> expect   512: (w), res =   512 [Success]
> expect   512: (w), res =   512 [Success]
> expect   -14: (r), res =   -14 [Bad address]
> expect   512: (r), res =   512 [Success]
> expect   512: (w), res =   512 [Success]
> test cases/5.t completed PASSED.
> 
> cases/5.p
> expect   512: (w), res =   512 [Success]
> expect   512: (r), res =   512 [Success]
> expect   512: (r), res =   512 [Success]
> expect   512: (w), res =   512 [Success]
> expect   512: (w), res =   512 [Success]
> expect   -14: (r), res =   -14 [Bad address]
> expect   512: (r), res =   512 [Success]
> expect   -14: (w), res =   512 [Success] -- FAILED
> test cases/5.t completed FAILED.
> 
> Can you have a look if that test assumption is OK?

Hi Michal, thanks for the report.

This wasn't an intended behaviour change, so it is a bug.  I have no
idea why we hit the fault in write() but not in io_submit(), though. 
The same issue applies under Radix.

What's happening here is that we're taking a page fault and calling
into access_error() and returning true when we shouldn't.  Previously
we didn't check for read faults and only checked for PROT_NONE.  My
patch checks the vma flags to see if they lack VM_READ after we check
for exec and write, which ignores that VM_WRITE implies read.

This means we're mishandling faults for write-only mappings by assuming
that the lack of VM_READ means we're faulting from read, when that
should only be possible under a PROT_EXEC-only mapping.

I think the correct behaviour is

if (unlikely(!(vma->vm_flags & (VM_READ | VM_WRITE))))

in access_error().

Will do some more testing and send a patch soon.  I also need to verify
that write implying read is true for all powerpc platforms.

- Russell

> 
> Thanks
> 
> Michal



[PATCH] powerpc/pseries: Avoid hcall in plpks_is_available() on non-pseries

2023-02-21 Thread Russell Currey
plpks_is_available() can be called on any platform via kexec but calls
_plpks_get_config() which makes a hcall, which will only work on pseries.
Fix this by returning early in plpks_is_available() if hcalls aren't
possible.

Fixes: 119da30d037d ("powerpc/pseries: Expose PLPKS config values, support 
additional fields")
Reported-by: Murphy Zhou 
Signed-off-by: Russell Currey 
---
 arch/powerpc/platforms/pseries/plpks.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/plpks.c 
b/arch/powerpc/platforms/pseries/plpks.c
index cdf09e5bd741..6f7bf3fc3aea 100644
--- a/arch/powerpc/platforms/pseries/plpks.c
+++ b/arch/powerpc/platforms/pseries/plpks.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static u8 *ospassword;
 static u16 ospasswordlength;
@@ -377,6 +378,9 @@ bool plpks_is_available(void)
 {
int rc;
 
+   if (!firmware_has_feature(FW_FEATURE_LPAR))
+   return false;
+
rc = _plpks_get_config();
if (rc)
return false;
-- 
2.39.2



Re: [PATCH v4 16/24] powerpc/pseries: Implement signed update for PLPKS objects

2023-01-30 Thread Russell Currey
On Mon, 2023-01-30 at 15:43 +1100, Andrew Donnellan wrote:
> On Tue, 2023-01-24 at 14:16 +1000, Nicholas Piggin wrote:
> > > diff --git a/arch/powerpc/platforms/pseries/plpks.c
> > > b/arch/powerpc/platforms/pseries/plpks.c
> > > index 1189246b03dc..796ed5544ee5 100644
> > > --- a/arch/powerpc/platforms/pseries/plpks.c
> > > +++ b/arch/powerpc/platforms/pseries/plpks.c
> > > @@ -81,6 +81,12 @@ static int pseries_status_to_err(int rc)
> > > err = -ENOENT;
> > > break;
> > > case H_BUSY:
> > > +   case H_LONG_BUSY_ORDER_1_MSEC:
> > > +   case H_LONG_BUSY_ORDER_10_MSEC:
> > > +   case H_LONG_BUSY_ORDER_100_MSEC:
> > > +   case H_LONG_BUSY_ORDER_1_SEC:
> > > +   case H_LONG_BUSY_ORDER_10_SEC:
> > > +   case H_LONG_BUSY_ORDER_100_SEC:
> > > err = -EBUSY;
> > > break;
> > > case H_AUTHORITY:
> > 
> > This is a bit sad to maintain here. It's duplicating bits with
> > hvcs_convert, and a bunch of open coded places. Probably not the
> > series to do anything about. Would be nice if we could standardise
> > it though.
> 
> Agreed - though we're not going to touch it in this series.
> 
> > 
> > > @@ -184,14 +190,17 @@ static struct label *construct_label(char
> > > *component, u8 varos, u8 *name,
> > >  u16 namelen)
> > >  {
> > > struct label *label;
> > > -   size_t slen;
> > > +   size_t slen = 0;
> > >  
> > > if (!name || namelen > PLPKS_MAX_NAME_SIZE)
> > > return ERR_PTR(-EINVAL);
> > >  
> > > -   slen = strlen(component);
> > > -   if (component && slen > sizeof(label->attr.prefix))
> > > -   return ERR_PTR(-EINVAL);
> > > +   // Support NULL component for signed updates
> > > +   if (component) {
> > > +   slen = strlen(component);
> > > +   if (slen > sizeof(label->attr.prefix))
> > > +   return ERR_PTR(-EINVAL);
> > > +   }
> > 
> > Is this already a bug? Code checks for component != NULL but
> > previously
> > calls strlen which would oops on NULL component AFAIKS. Granted
> > nothing
> > is actually using any of this these days.
> 
> True, it should have been checking for NULL first, but as you say no-
> one is using it.
> 
> > 
> > It already seems like it's supposed to be allowed to rad NULL
> > component
> > with read_var though? Why the differences, why not always allow
> > NULL
> > component? (I assume there is some reason, I just don't know
> > anything
> > about secvar or secure boot).
> 
> I think the comment confuses more than it clarifies, I'll remove it.
> 
> As you say, read_var() should work fine with component == NULL,
> though
> write_var() checks it. The only rule I can find in the spec is that
> signed update calls *must* set the component to NULL. I'm seeking
> clarification on that.

Signed update calls *must* set the component to NULL.

We could just call construct_label() with NULL as the component
directly but I think it's better to explicitly check var->component and
return so the caller knows what they're trying to do is wrong.

> 
> > > +EXPORT_SYMBOL(plpks_signed_update_var);
> > 
> > Sorry I missed it before -- can this be a _GPL export?
> 
> Indeed it should be - actually, I should check if I can get rid of
> the
> export completely...
> 



Re: [PATCH v4 21/24] powerpc/pseries: Pass PLPKS password on kexec

2023-01-30 Thread Russell Currey
On Tue, 2023-01-24 at 14:36 +1000, Nicholas Piggin wrote:
> On Fri Jan 20, 2023 at 5:43 PM AEST, Andrew Donnellan wrote:
> > From: Russell Currey 
> > 
> > Before interacting with the PLPKS, we ask the hypervisor to
> > generate a
> > password for the current boot, which is then required for most
> > further
> > PLPKS operations.
> > 
> > If we kexec into a new kernel, the new kernel will try and fail to
> > generate a new password, as the password has already been set.
> > 
> > Pass the password through to the new kernel via the device tree, in
> > /chosen/plpks-pw. Check for the presence of this property before
> > trying
> 
> In /chosen/ibm,plpks-pw

Good catch, thanks

> 
> > to generate a new password - if it exists, use the existing
> > password and
> > remove it from the device tree.
> > 
> > Signed-off-by: Russell Currey 
> > Signed-off-by: Andrew Donnellan 
> > 
> > ---
> > 
> > v3: New patch
> > 
> > v4: Fix compile when CONFIG_PSERIES_PLPKS=n (snowpatch)
> > 
> >     Fix error handling on fdt_path_offset() call (ruscur)
> > ---
> >  arch/powerpc/kexec/file_load_64.c  | 18 ++
> >  arch/powerpc/platforms/pseries/plpks.c | 18 +-
> >  2 files changed, 35 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/powerpc/kexec/file_load_64.c
> > b/arch/powerpc/kexec/file_load_64.c
> > index af8854f9eae3..0c9130af60cc 100644
> > --- a/arch/powerpc/kexec/file_load_64.c
> > +++ b/arch/powerpc/kexec/file_load_64.c
> > @@ -27,6 +27,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  struct umem_info {
> > u64 *buf;   /* data buffer for usable-memory
> > property */
> > @@ -1156,6 +1157,9 @@ int setup_new_fdt_ppc64(const struct kimage
> > *image, void *fdt,
> >  {
> > struct crash_mem *umem = NULL, *rmem = NULL;
> > int i, nr_ranges, ret;
> > +#ifdef CONFIG_PSERIES_PLPKS
> > +   int chosen_offset;
> > +#endif
> 
> Could put this in plpks_is_available and avoid an ifdef.

Yep, moving this out, though not into plpks_is_available().

> 
> >  
> > /*
> >  * Restrict memory usage for kdump kernel by setting up
> > @@ -1230,6 +1234,20 @@ int setup_new_fdt_ppc64(const struct kimage
> > *image, void *fdt,
> > }
> > }
> >  
> > +#ifdef CONFIG_PSERIES_PLPKS
> > +   // If we have PLPKS active, we need to provide the password
> > +   if (plpks_is_available()) {
> > +   chosen_offset = fdt_path_offset(fdt, "/chosen");
> > +   if (chosen_offset < 0) {
> > +   pr_err("Can't find chosen node: %s\n",
> > +  fdt_strerror(chosen_offset));
> > +   goto out;
> > +   }
> > +   ret = fdt_setprop(fdt, chosen_offset, "ibm,plpks-
> > pw",
> > + plpks_get_password(),
> > plpks_get_passwordlen());
> > +   }
> > +#endif // CONFIG_PSERIES_PLPKS
> 
> I think if you define plpks_get_password and plpks_get_passwordlen as
> BUILD_BUG_ON when PLPKS is not configured and plpks_is_available as
> false, you could remove the ifdef entirely.

I'm moving this into a helper in plpks.c since now there's FDT handling
in early boot in there.  We can drop plpks_get_password() entirely.

> 
> > +
> >  out:
> > kfree(rmem);
> > kfree(umem);
> > diff --git a/arch/powerpc/platforms/pseries/plpks.c
> > b/arch/powerpc/platforms/pseries/plpks.c
> > index b3c7410a4f13..0350f10e1755 100644
> > --- a/arch/powerpc/platforms/pseries/plpks.c
> > +++ b/arch/powerpc/platforms/pseries/plpks.c
> > @@ -16,6 +16,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -126,7 +127,22 @@ static int plpks_gen_password(void)
> >  {
> > unsigned long retbuf[PLPAR_HCALL_BUFSIZE] = { 0 };
> > u8 *password, consumer = PLPKS_OS_OWNER;
> > -   int rc;
> > +   struct property *prop;
> > +   int rc, len;
> > +
> > +   // Before we generate the password, we may have been booted
> > by kexec and
> > +   // provided with a previous password.  Check for that
> > first.
> 
> So not really generating the password then. Should it be in a
> different
> function the caller mak

Re: [PATCH v4 24/24] integrity/powerpc: Support loading keys from pseries secvar

2023-01-24 Thread Russell Currey
On Tue, 2023-01-24 at 10:14 -0500, Mimi Zohar wrote:
> On Fri, 2023-01-20 at 18:43 +1100, Andrew Donnellan wrote:
> > From: Russell Currey 
> > 
> > The secvar object format is only in the device tree under powernv.
> > We now have an API call to retrieve it in a generic way, so we
> > should
> > use that instead of having to handle the DT here.
> > 
> > Add support for pseries secvar, with the "ibm,plpks-sb-v1" format.
> > The object format is expected to be the same, so there shouldn't be
> > any
> > functional differences between objects retrieved from powernv and
> > pseries.
> > 
> > Signed-off-by: Russell Currey 
> > Signed-off-by: Andrew Donnellan 
> > 
> > ---
> > 
> > v3: New patch
> > 
> > v4: Pass format buffer size (stefanb, npiggin)
> > ---
> >  .../integrity/platform_certs/load_powerpc.c | 17 ++---
> > 
> >  1 file changed, 10 insertions(+), 7 deletions(-)
> > 
> > diff --git a/security/integrity/platform_certs/load_powerpc.c
> > b/security/integrity/platform_certs/load_powerpc.c
> > index dee51606d5f4..d4ce91bf3fec 100644
> > --- a/security/integrity/platform_certs/load_powerpc.c
> > +++ b/security/integrity/platform_certs/load_powerpc.c
> > @@ -10,7 +10,6 @@
> >  #include 
> >  #include 
> >  #include 
> > -#include 
> >  #include 
> >  #include 
> >  #include "keyring_handler.h"
> > @@ -59,16 +58,22 @@ static int __init load_powerpc_certs(void)
> > void *db = NULL, *dbx = NULL;
> > u64 dbsize = 0, dbxsize = 0;
> > int rc = 0;
> > -   struct device_node *node;
> > +   ssize_t len;
> > +   char buf[32];
> >  
> > if (!secvar_ops)
> > return -ENODEV;
> >  
> > -   /* The following only applies for the edk2-compat backend.
> > */
> > -   node = of_find_compatible_node(NULL, NULL, "ibm,edk2-
> > compat-v1");
> > -   if (!node)
> > +   len = secvar_ops->format(buf, 32);
> 
> "powerpc/secvar: Handle format string in the consumer"  defines
> opal_secvar_format() for the object format "ibm,secvar-backend". 
> Here
> shouldn't it being returning the format for "ibm,edk2-compat-v1"?
> 

They end up with the same value.  The DT structure on powernv looks
like this:

/proc/device-tree/ibm,opal/secvar:
name "secvar"
compatible   "ibm,secvar-backend"
 "ibm,edk2-compat-v1"
format   "ibm,edk2-compat-v1"
max-var-key-len   0400
phandle  805a (32858)
max-var-size  2000

The existing code is checking for a node compatible with "ibm,edk2-
compat-v1", which would match the node above.  opal_secvar_format()
checks for a node compatible with "ibm,secvar-backend" (again, matching
above) and then returns the contents of the "format" string, which is
"ibm,edk2-compat-v1".

Ultimately it's two different ways of doing the same thing, but this
way load_powerpc_certs() doesn't have to interact with the device tree.

- Russell


> Mimi
> 
> > +   if (len <= 0)
> > return -ENODEV;
> >  
> > +   // Check for known secure boot implementations from OPAL or
> > PLPKS
> > +   if (strcmp("ibm,edk2-compat-v1", buf) && strcmp("ibm,plpks-
> > sb-v1", buf)) {
> > +   pr_err("Unsupported secvar implementation \"%s\",
> > not loading certs\n", buf);
> > +   return -ENODEV;
> > +   }
> > +
> > /*
> >  * Get db, and dbx. They might not exist, so it isn't an
> > error if we
> >  * can't get them.
> > @@ -103,8 +108,6 @@ static int __init load_powerpc_certs(void)
> > kfree(dbx);
> > }
> >  
> > -   of_node_put(node);
> > -
> > return rc;
> >  }
> >  late_initcall(load_powerpc_certs);
> 
> 



Re: [PATCH v3 04/24] powerpc/secvar: Handle format string in the consumer

2023-01-19 Thread Russell Currey
On Thu, 2023-01-19 at 11:17 +1000, Nicholas Piggin wrote:
> On Wed Jan 18, 2023 at 4:10 PM AEST, Andrew Donnellan wrote:
> > From: Russell Currey 
> > 
> > The code that handles the format string in secvar-sysfs.c is
> > entirely
> > OPAL specific, so create a new "format" op in secvar_operations to
> > make
> > the secvar code more generic.  No functional change.
> > 
> > Signed-off-by: Russell Currey 
> > Signed-off-by: Andrew Donnellan 
> > 
> > ---
> > 
> > v2: Use sysfs_emit() instead of sprintf() (gregkh)
> > 
> > v3: Enforce format string size limit (ruscur)
> > ---
> >  arch/powerpc/include/asm/secvar.h    |  3 +++
> >  arch/powerpc/kernel/secvar-sysfs.c   | 23 
> > --
> >  arch/powerpc/platforms/powernv/opal-secvar.c | 25
> > 
> >  3 files changed, 33 insertions(+), 18 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/secvar.h
> > b/arch/powerpc/include/asm/secvar.h
> > index 07ba36f868a7..8b6475589120 100644
> > --- a/arch/powerpc/include/asm/secvar.h
> > +++ b/arch/powerpc/include/asm/secvar.h
> > @@ -11,12 +11,15 @@
> >  #include 
> >  #include 
> >  
> > +#define SECVAR_MAX_FORMAT_LEN  30 // max length of string returned
> > by ->format()
> > +
> >  extern const struct secvar_operations *secvar_ops;
> >  
> >  struct secvar_operations {
> > int (*get)(const char *key, u64 key_len, u8 *data, u64
> > *data_size);
> > int (*get_next)(const char *key, u64 *key_len, u64
> > keybufsize);
> > int (*set)(const char *key, u64 key_len, u8 *data, u64
> > data_size);
> > +   ssize_t (*format)(char *buf);
> >  };
> >  
> >  #ifdef CONFIG_PPC_SECURE_BOOT
> > diff --git a/arch/powerpc/kernel/secvar-sysfs.c
> > b/arch/powerpc/kernel/secvar-sysfs.c
> > index 462cacc0ca60..d3858eedd72c 100644
> > --- a/arch/powerpc/kernel/secvar-sysfs.c
> > +++ b/arch/powerpc/kernel/secvar-sysfs.c
> > @@ -21,26 +21,13 @@ static struct kset *secvar_kset;
> >  static ssize_t format_show(struct kobject *kobj, struct
> > kobj_attribute *attr,
> >    char *buf)
> >  {
> > -   ssize_t rc = 0;
> > -   struct device_node *node;
> > -   const char *format;
> > -
> > -   node = of_find_compatible_node(NULL, NULL, "ibm,secvar-
> > backend");
> > -   if (!of_device_is_available(node)) {
> > -   rc = -ENODEV;
> > -   goto out;
> > -   }
> > +   char tmp[SECVAR_MAX_FORMAT_LEN];
> > +   ssize_t len = secvar_ops->format(tmp);
> >  
> > -   rc = of_property_read_string(node, "format", &format);
> > -   if (rc)
> > -   goto out;
> > +   if (len <= 0)
> > +   return -EIO;
> 
> AFAIKS this does have a functional change, it loses the return value.
> Why not return len if it is < 0, and -EIO if len == 0?

In v2 mpe suggested the following:

   I'm not sure you should pass that raw error back to sysfs. Some of
   the
   values could be confusing, eg. if you return -EINVAL it looks like a
   parameter to the read() syscall was invalid. Might be better to just
   return -EIO.
   
Following that advice, I don't think we should return something other
than -EIO, but we should at least pr_err() to document the error - this
isn't something that should ever fail.

> 
> Thanks,
> Nick



Re: [PATCH] powerpc/secvar: Use u64 in secvar_operations

2023-01-11 Thread Russell Currey
On Thu, 2023-01-12 at 13:38 +1100, Michael Ellerman wrote:
> There's no reason for secvar_operations to use uint64_t vs the more
> common kernel type u64.
> 
> The types are compatible, but they require different printk format
> strings which can lead to confusion.
> 
> Change all the secvar related routines to use u64.
> 
> Signed-off-by: Michael Ellerman 

Reviewed-by: Russell Currey 



Re: [PATCH v2 7/7] powerpc/pseries: Implement secvars for dynamic secure boot

2023-01-09 Thread Russell Currey
On Mon, 2023-01-09 at 16:20 +1100, Andrew Donnellan wrote:
> On Mon, 2023-01-09 at 14:34 +1100, Russell Currey wrote:
> > 
> > > > +static int plpks_secvar_init(void)
> > > > +{
> > > > +   if (!plpks_is_available())
> > > > +   return -ENODEV;
> > > > +
> > > > +   set_secvar_ops(&plpks_secvar_ops);
> > > > +   set_secvar_config_attrs(config_attrs);
> > > > +   return 0;
> > > > +}
> > > > +device_initcall(plpks_secvar_init);
> > > 
> > > That must be a machine_device_initcall(pseries, ...), otherwise
> > > we
> > > will
> > > blow up doing a hcall on powernv in plpks_is_available().
> > 
> > OK, can do.  I don't understand your case of how powernv could hit
> > this, but I think I to have to move plpks_is_available() into
> > include/,
> > so it's going to be even more possible anyway.
> 
> Kernels can be compiled with both pseries and powernv support, in
> which
> case plpks_secvar_init() will be called unconditionally even when
> booting on a powernv machine.
> 
> I can confirm that as it is, booting this on powernv qemu causes a
> panic.

Of course, I'm not sure why I thought an initcall in a platform that
wasn't active would magically not run on other platforms.

> 



Re: [PATCH v2 7/7] powerpc/pseries: Implement secvars for dynamic secure boot

2023-01-08 Thread Russell Currey
On Fri, 2023-01-06 at 21:49 +1100, Michael Ellerman wrote:
> Russell Currey  writes:
> > The pseries platform can support dynamic secure boot (i.e. secure
> > boot
> > using user-defined keys) using variables contained with the PowerVM
> > LPAR
> > Platform KeyStore (PLPKS).  Using the powerpc secvar API, expose
> > the
> > relevant variables for pseries dynamic secure boot through the
> > existing
> > secvar filesystem layout.
> > 
> > The relevant variables for dynamic secure boot are signed in the
> > keystore, and can only be modified using the H_PKS_SIGNED_UPDATE
> > hcall.
> > Object labels in the keystore are encoded using ucs2 format.  With
> > our
> > fixed variable names we don't have to care about encoding outside
> > of the
> > necessary byte padding.
> > 
> > When a user writes to a variable, the first 8 bytes of data must
> > contain
> > the signed update flags as defined by the hypervisor.
> > 
> > When a user reads a variable, the first 4 bytes of data contain the
> > policies defined for the object.
> > 
> > Limitations exist due to the underlying implementation of sysfs
> > binary
> > attributes, as is the case for the OPAL secvar implementation -
> > partial writes are unsupported and writes cannot be larger than
> > PAGE_SIZE.
> > 
> > Co-developed-by: Nayna Jain 
> > Signed-off-by: Nayna Jain 
> > Co-developed-by: Andrew Donnellan 
> > Signed-off-by: Andrew Donnellan 
> > Signed-off-by: Russell Currey 
> > ---
> > v2: Remove unnecessary config vars from sysfs and document the
> > others,
> >     thanks to review from Greg.  If we end up needing to expose
> > more, we
> >     can add them later and update the docs.
> > 
> >     Use sysfs_emit() instead of sprintf(), thanks to Greg.
> > 
> >     Change the size of the sysfs binary attributes to include the
> > 8-byte
> >     flags header, preventing truncation of large writes.
> > 
> >  Documentation/ABI/testing/sysfs-secvar    |  67 -
> >  arch/powerpc/platforms/pseries/Kconfig    |  13 +
> >  arch/powerpc/platforms/pseries/Makefile   |   4 +-
> >  arch/powerpc/platforms/pseries/plpks-secvar.c | 245
> > ++
> >  4 files changed, 326 insertions(+), 3 deletions(-)
> >  create mode 100644 arch/powerpc/platforms/pseries/plpks-secvar.c
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-secvar
> > b/Documentation/ABI/testing/sysfs-secvar
> > index feebb8c57294..466a8cb92b92 100644
> > --- a/Documentation/ABI/testing/sysfs-secvar
> > +++ b/Documentation/ABI/testing/sysfs-secvar
> > @@ -34,7 +34,7 @@ Description:  An integer representation of the
> > size of the content of the
> >  
> >  What:  /sys/firmware/secvar/vars//data
> >  Date:  August 2019
> > -Contact:   Nayna Jain h
> > +Contact:   Nayna Jain 
> >  Description:   A read-only file containing the value of the
> > variable. The size
> > of the file represents the maximum size of the
> > variable data.
> >  
> > @@ -44,3 +44,68 @@ Contact: Nayna Jain 
> >  Description:   A write-only file that is used to submit the new
> > value for the
> > variable. The size of the file represents the
> > maximum size of
> > the variable data that can be written.
> > +
> > +What:  /sys/firmware/secvar/config
> > +Date:  December 2022
> > +Contact:   Nayna Jain 
> > +Description:   This optional directory contains read-only config
> > attributes as
> > +   defined by the secure variable implementation.  All
> > data is in
> > +   ASCII format. The directory is only created if the
> > backing
> > +   implementation provides variables to populate it,
> > which at
> > +   present is only PLPKS on the pseries platform.
> 
> I think it's OK to mention that currently this only exists for PLPKS
> ...
> 
> > +What:  /sys/firmware/secvar/config/version
> > +Date:  December 2022
> > +Contact:   Nayna Jain 
> > +Description:   RO file, only present if the secvar implementation
> > is PLPKS.
> 
> ... but I don't think we want to specify that files are only present
> for PLPKS. 
> 
> Because if another backend wanted to create them in future, that
> would
> technically be an ABI change.

Some are going to be PLPKS-specific, but for generic stuff like this I
can change the description.

> 
> >

Re: [PATCH v2 7/7] powerpc/pseries: Implement secvars for dynamic secure boot

2023-01-05 Thread Russell Currey
On Thu, 2023-01-05 at 19:15 +1100, Andrew Donnellan wrote:
> On Fri, 2022-12-30 at 15:20 +1100, Russell Currey wrote:
> > The pseries platform can support dynamic secure boot (i.e. secure
> > boot
> > using user-defined keys) using variables contained with the PowerVM
> > LPAR
> > Platform KeyStore (PLPKS).  Using the powerpc secvar API, expose
> > the
> > relevant variables for pseries dynamic secure boot through the
> > existing
> > secvar filesystem layout.
> > 
> > The relevant variables for dynamic secure boot are signed in the
> > keystore, and can only be modified using the H_PKS_SIGNED_UPDATE
> > hcall.
> > Object labels in the keystore are encoded using ucs2 format.  With
> > our
> > fixed variable names we don't have to care about encoding outside
> > of
> > the
> > necessary byte padding.
> > 
> > When a user writes to a variable, the first 8 bytes of data must
> > contain
> > the signed update flags as defined by the hypervisor.
> > 
> > When a user reads a variable, the first 4 bytes of data contain the
> > policies defined for the object.
> > 
> > Limitations exist due to the underlying implementation of sysfs
> > binary
> > attributes, as is the case for the OPAL secvar implementation -
> > partial writes are unsupported and writes cannot be larger than
> > PAGE_SIZE.
> 
> The PAGE_SIZE limit, in practice, will be a major limitation with 4K
> pages (we expect several of the variables to regularly be larger than
> 4K) but won't be much of an issue for 64K (that's all the storage
> space
> the hypervisor will give you anyway).
> 
> In a previous internal version, we printed a message when PAGE_SIZE <
> plpks_get_maxobjectsize(), might be worth still doing that?

Yeah, we should do that in the secvar code.  The same limitation
applies on the powernv side as well.

> 
> > 
> > Co-developed-by: Nayna Jain 
> > Signed-off-by: Nayna Jain 
> > Co-developed-by: Andrew Donnellan 
> > Signed-off-by: Andrew Donnellan 
> > Signed-off-by: Russell Currey 
> 
> Some minor comments for v3 on a patch which already carries my
> signoff...
> 
> > ---
> > v2: Remove unnecessary config vars from sysfs and document the
> > others,
> >     thanks to review from Greg.  If we end up needing to expose
> > more,
> > we
> >     can add them later and update the docs.
> > 
> >     Use sysfs_emit() instead of sprintf(), thanks to Greg.
> > 
> >     Change the size of the sysfs binary attributes to include the
> > 8-
> > byte
> >     flags header, preventing truncation of large writes.
> > 
> >  Documentation/ABI/testing/sysfs-secvar    |  67 -
> >  arch/powerpc/platforms/pseries/Kconfig    |  13 +
> >  arch/powerpc/platforms/pseries/Makefile   |   4 +-
> >  arch/powerpc/platforms/pseries/plpks-secvar.c | 245
> > ++
> >  4 files changed, 326 insertions(+), 3 deletions(-)
> >  create mode 100644 arch/powerpc/platforms/pseries/plpks-secvar.c
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-secvar
> > b/Documentation/ABI/testing/sysfs-secvar
> > index feebb8c57294..466a8cb92b92 100644
> > --- a/Documentation/ABI/testing/sysfs-secvar
> > +++ b/Documentation/ABI/testing/sysfs-secvar
> > @@ -34,7 +34,7 @@ Description:  An integer representation of the
> > size
> > of the content of the
> >  
> >  What:  /sys/firmware/secvar/vars//data
> >  Date:  August 2019
> > -Contact:   Nayna Jain h
> > +Contact:   Nayna Jain 
> >  Description:   A read-only file containing the value of the
> > variable. The size
> > of the file represents the maximum size of the
> > variable data.
> >  
> > @@ -44,3 +44,68 @@ Contact: Nayna Jain 
> >  Description:   A write-only file that is used to submit the new
> > value for the
> > variable. The size of the file represents the
> > maximum
> > size of
> > the variable data that can be written.
> > +
> > +What:  /sys/firmware/secvar/config
> > +Date:  December 2022
> > +Contact:   Nayna Jain 
> > +Description:   This optional directory contains read-only config
> > attributes as
> > +   defined by the secure variable implementation.  All
> > data is in
> > +   ASCII format. The directory is only created if the
> > backing
> > +   implementation provides variables to populate it,
> > which at
>

Re: [PATCH v2 6/7] powerpc/secvar: Extend sysfs to include config vars

2023-01-05 Thread Russell Currey
On Fri, 2023-01-06 at 15:15 +1100, Michael Ellerman wrote:
> Russell Currey  writes:
> > The forthcoming pseries consumer of the secvar API wants to expose
> > a
> > number of config variables.  Allowing secvar implementations to
> > provide
> > their own sysfs attributes makes it easy for consumers to expose
> > what
> > they need to.
> > 
> > This is not being used by the OPAL secvar implementation at
> > present, and
> > the config directory will not be created if no attributes are set.
> 
> Would it be slightly cleaner if the attributes were just a member of
> secvar_operations?
> 
> That would avoid the need for some of the separate handling of the
> ops
> and the attributes.
> 
> I know "ops" implies it's only methods, but I think that's not a big
> problem. The power_pmu struct does something similar, where it
> combines
> ops and attributes.

Yeah that does seem easier, thanks for the suggestion.
> 
> cheers
> 
> > diff --git a/arch/powerpc/include/asm/secvar.h
> > b/arch/powerpc/include/asm/secvar.h
> > index 92d2c051918b..250e7066b6da 100644
> > --- a/arch/powerpc/include/asm/secvar.h
> > +++ b/arch/powerpc/include/asm/secvar.h
> > @@ -10,6 +10,7 @@
> >  
> >  #include 
> >  #include 
> > +#include 
> >  
> >  extern const struct secvar_operations *secvar_ops;
> >  
> > @@ -27,10 +28,12 @@ struct secvar_operations {
> >  #ifdef CONFIG_PPC_SECURE_BOOT
> >  
> >  extern void set_secvar_ops(const struct secvar_operations *ops);
> > +extern void set_secvar_config_attrs(const struct attribute
> > **attrs);
> >  
> >  #else
> >  
> >  static inline void set_secvar_ops(const struct secvar_operations
> > *ops) { }
> > +static inline void set_secvar_config_attrs(const struct attribute
> > **attrs) { }
> >  
> >  #endif
> >  
> > diff --git a/arch/powerpc/kernel/secvar-sysfs.c
> > b/arch/powerpc/kernel/secvar-sysfs.c
> > index aa1daec480e1..ad1e1d72d2ae 100644
> > --- a/arch/powerpc/kernel/secvar-sysfs.c
> > +++ b/arch/powerpc/kernel/secvar-sysfs.c
> > @@ -15,9 +15,17 @@
> >  
> >  #define NAME_MAX_SIZE 1024
> >  
> > +const struct attribute **secvar_config_attrs __ro_after_init =
> > NULL;
> > +
> >  static struct kobject *secvar_kobj;
> >  static struct kset *secvar_kset;
> >  
> > +void set_secvar_config_attrs(const struct attribute **attrs)
> > +{
> > +   WARN_ON_ONCE(secvar_config_attrs);
> > +   secvar_config_attrs = attrs;
> > +}
> > +
> >  static ssize_t format_show(struct kobject *kobj, struct
> > kobj_attribute *attr,
> >    char *buf)
> >  {
> > @@ -134,6 +142,16 @@ static int update_kobj_size(void)
> > return 0;
> >  }
> >  
> > +static int secvar_sysfs_config(struct kobject *kobj)
> > +{
> > +   struct attribute_group config_group = {
> > +   .name = "config",
> > +   .attrs = (struct attribute **)secvar_config_attrs,
> > +   };
> > +
> > +   return sysfs_create_group(kobj, &config_group);
> > +}
> > +
> >  static int secvar_sysfs_load(void)
> >  {
> > char *name;
> > @@ -196,26 +214,38 @@ static int secvar_sysfs_init(void)
> >  
> > rc = sysfs_create_file(secvar_kobj, _attr.attr);
> > if (rc) {
> > -   kobject_put(secvar_kobj);
> > -   return -ENOMEM;
> > +   pr_err("secvar: Failed to create format object\n");
> > +   rc = -ENOMEM;
> > +   goto err;
> > }
> >  
> > secvar_kset = kset_create_and_add("vars", NULL,
> > secvar_kobj);
> > if (!secvar_kset) {
> > pr_err("secvar: sysfs kobject registration
> > failed.\n");
> > -   kobject_put(secvar_kobj);
> > -   return -ENOMEM;
> > +   rc = -ENOMEM;
> > +   goto err;
> > }
> >  
> > rc = update_kobj_size();
> > if (rc) {
> > pr_err("Cannot read the size of the attribute\n");
> > -   return rc;
> > +   goto err;
> > +   }
> > +
> > +   if (secvar_config_attrs) {
> > +   rc = secvar_sysfs_config(secvar_kobj);
> > +   if (rc) {
> > +   pr_err("secvar: Failed to create config
> > directory\n");
> > +   goto err;
> > +   }
> > }
> >  
> > secvar_sysfs_load();
> >  
> > return 0;
> > +err:
> > +   kobject_put(secvar_kobj);
> > +   return rc;
> >  }
> >  
> >  late_initcall(secvar_sysfs_init);
> > -- 
> > 2.38.1



Re: [PATCH v2 6/7] powerpc/secvar: Extend sysfs to include config vars

2023-01-05 Thread Russell Currey
On Thu, 2023-01-05 at 18:28 +1100, Andrew Donnellan wrote:
> On Fri, 2022-12-30 at 15:20 +1100, Russell Currey wrote:
> > The forthcoming pseries consumer of the secvar API wants to expose
> > a
> > number of config variables.  Allowing secvar implementations to
> > provide
> > their own sysfs attributes makes it easy for consumers to expose
> > what
> > they need to.
> > 
> > This is not being used by the OPAL secvar implementation at
> > present,
> > and
> > the config directory will not be created if no attributes are set.
> > 
> > Signed-off-by: Russell Currey 
> 
> Minor comments below, but regardless:
> 
> Reviewed-by: Andrew Donnellan 
> 
> > ---
> > I played around with adding an API call to facilitate a more
> > generic
> > key/value interface for config variables and it seemed like
> > unnecessary
> > complexity.  I think this is cleaner.  If there was ever a secvar
> > interface other than sysfs we'd have to rework it, though.
> 
> I concur, this can be dealt with if/when the secvar interface is
> exposed by some other means than sysfs.
> 
> > 
> >  arch/powerpc/include/asm/secvar.h  |  3 +++
> >  arch/powerpc/kernel/secvar-sysfs.c | 40
> > ++--
> > --
> >  2 files changed, 38 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/secvar.h
> > b/arch/powerpc/include/asm/secvar.h
> > index 92d2c051918b..250e7066b6da 100644
> > --- a/arch/powerpc/include/asm/secvar.h
> > +++ b/arch/powerpc/include/asm/secvar.h
> > @@ -10,6 +10,7 @@
> >  
> >  #include 
> >  #include 
> > +#include 
> >  
> >  extern const struct secvar_operations *secvar_ops;
> >  
> > @@ -27,10 +28,12 @@ struct secvar_operations {
> >  #ifdef CONFIG_PPC_SECURE_BOOT
> >  
> >  extern void set_secvar_ops(const struct secvar_operations *ops);
> > +extern void set_secvar_config_attrs(const struct attribute
> > **attrs);
> >  
> >  #else
> >  
> >  static inline void set_secvar_ops(const struct secvar_operations
> > *ops) { }
> > +static inline void set_secvar_config_attrs(const struct attribute
> > **attrs) { }
> >  
> >  #endif
> >  
> > diff --git a/arch/powerpc/kernel/secvar-sysfs.c
> > b/arch/powerpc/kernel/secvar-sysfs.c
> > index aa1daec480e1..ad1e1d72d2ae 100644
> > --- a/arch/powerpc/kernel/secvar-sysfs.c
> > +++ b/arch/powerpc/kernel/secvar-sysfs.c
> > @@ -15,9 +15,17 @@
> >  
> >  #define NAME_MAX_SIZE 1024
> >  
> > +const struct attribute **secvar_config_attrs __ro_after_init =
> > NULL;
> > +
> >  static struct kobject *secvar_kobj;
> >  static struct kset *secvar_kset;
> >  
> > +void set_secvar_config_attrs(const struct attribute **attrs)
> > +{
> > +   WARN_ON_ONCE(secvar_config_attrs);
> > +   secvar_config_attrs = attrs;
> > +}
> > +
> >  static ssize_t format_show(struct kobject *kobj, struct
> > kobj_attribute *attr,
> >    char *buf)
> >  {
> > @@ -134,6 +142,16 @@ static int update_kobj_size(void)
> > return 0;
> >  }
> >  
> > +static int secvar_sysfs_config(struct kobject *kobj)
> > +{
> > +   struct attribute_group config_group = {
> > +   .name = "config",
> > +   .attrs = (struct attribute **)secvar_config_attrs,
> > +   };
> 
> I was slightly concerned that you're putting this on the stack, but
> it
> doesn't appear that sysfs_create_group() keeps any references to the
> group around after it creates all the files, so I think this is fine.
> 
> > +
> > +   return sysfs_create_group(kobj, &config_group);
> > +}
> > +
> >  static int secvar_sysfs_load(void)
> >  {
> > char *name;
> > @@ -196,26 +214,38 @@ static int secvar_sysfs_init(void)
> >  
> > rc = sysfs_create_file(secvar_kobj, _attr.attr);
> > if (rc) {
> > -   kobject_put(secvar_kobj);
> > -   return -ENOMEM;
> > +   pr_err("secvar: Failed to create format object\n");
> 
> This file defines pr_fmt, so the secvar: prefix here can go away,
> though I notice that is the case for all the existing prints in this
> function too.

Yeah we should fix that for all of them, good catch.

> 
> > +   rc = -ENOMEM;
> > +   goto err;
> > }
> >  
> > secvar_kset = kset_create_and_add("vars", NULL, secvar_kobj);

Re: [PATCH 4/4] powerpc/pseries: Implement signed update for PLPKS objects

2023-01-03 Thread Russell Currey
On Tue, 2022-12-20 at 18:16 +1100, Andrew Donnellan wrote:
> From: Nayna Jain 
> 
> The Platform Keystore provides a signed update interface which can be
> used
> to create, replace or append to certain variables in the PKS in a
> secure
> fashion, with the hypervisor requiring that the update be signed
> using the
> Platform Key.
> 
> Implement an interface to the H_PKS_SIGNED_UPDATE hcall in the plpks
> driver to allow signed updates to PKS objects.
> 
> (The plpks driver doesn't need to do any cryptography or otherwise
> handle
> the actual signed variable contents - that will be handled by
> userspace
> tooling.)
> 
> Signed-off-by: Nayna Jain 
> [ajd: split patch, rewrite commit message, add timeout handling]
> Co-developed-by: Andrew Donnellan 
> Signed-off-by: Andrew Donnellan 
> ---
>  arch/powerpc/include/asm/hvcall.h  |  3 +-
>  arch/powerpc/platforms/pseries/plpks.c | 81 +++-
> --
>  arch/powerpc/platforms/pseries/plpks.h |  5 ++
>  3 files changed, 79 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/hvcall.h
> b/arch/powerpc/include/asm/hvcall.h
> index 95fd7f9485d5..33b26c0cb69b 100644
> --- a/arch/powerpc/include/asm/hvcall.h
> +++ b/arch/powerpc/include/asm/hvcall.h
> @@ -336,7 +336,8 @@
>  #define H_SCM_FLUSH0x44C
>  #define H_GET_ENERGY_SCALE_INFO0x450
>  #define H_WATCHDOG 0x45C
> -#define MAX_HCALL_OPCODE   H_WATCHDOG
> +#define H_PKS_SIGNED_UPDATE0x454
> +#define MAX_HCALL_OPCODE   H_PKS_SIGNED_UPDATE
>  
>  /* Scope args for H_SCM_UNBIND_ALL */
>  #define H_UNBIND_SCOPE_ALL (0x1)
> diff --git a/arch/powerpc/platforms/pseries/plpks.c
> b/arch/powerpc/platforms/pseries/plpks.c
> index c5ae00a8a968..9e4401aabf4f 100644
> --- a/arch/powerpc/platforms/pseries/plpks.c
> +++ b/arch/powerpc/platforms/pseries/plpks.c
> @@ -30,9 +30,9 @@
>  #define MAX_NAME_SIZE  239
>  #define MAX_DATA_SIZE  4000
>  
> -#define PKS_FLUSH_MAX_TIMEOUT 5000 //msec
> -#define PKS_FLUSH_SLEEP  10 //msec
> -#define PKS_FLUSH_SLEEP_RANGE 400
> +#define PKS_MAX_TIMEOUT5000 // msec
> +#define PKS_FLUSH_SLEEP10 // msec
> +#define PKS_FLUSH_SLEEP_RANGE  400
>  
>  static u8 *ospassword;
>  static u16 ospasswordlength;
> @@ -95,6 +95,12 @@ static int pseries_status_to_err(int rc)
> err = -ENOENT;
> break;
> case H_BUSY:
> +   case H_LONG_BUSY_ORDER_1_MSEC:
> +   case H_LONG_BUSY_ORDER_10_MSEC:
> +   case H_LONG_BUSY_ORDER_100_MSEC:
> +   case H_LONG_BUSY_ORDER_1_SEC:
> +   case H_LONG_BUSY_ORDER_10_SEC:
> +   case H_LONG_BUSY_ORDER_100_SEC:
> err = -EBUSY;
> break;
> case H_AUTHORITY:
> @@ -198,14 +204,17 @@ static struct label *construct_label(char
> *component, u8 varos, u8 *name,
>  u16 namelen)
>  {
> struct label *label;
> -   size_t slen;
> +   size_t slen = 0;
>  
> if (!name || namelen > MAX_NAME_SIZE)
> return ERR_PTR(-EINVAL);
>  
> -   slen = strlen(component);
> -   if (component && slen > sizeof(label->attr.prefix))
> -   return ERR_PTR(-EINVAL);
> +   // Support NULL component for signed updates
> +   if (component) {
> +   slen = strlen(component);
> +   if (slen > sizeof(label->attr.prefix))
> +   return ERR_PTR(-EINVAL);
> +   }
>  
> // The label structure must not cross a page boundary, so we
> align to the next power of 2
> label = kzalloc(roundup_pow_of_two(sizeof(*label)),
> GFP_KERNEL);
> @@ -372,7 +381,7 @@ static int plpks_confirm_object_flushed(struct
> label *label,
> usleep_range(PKS_FLUSH_SLEEP,
>  PKS_FLUSH_SLEEP +
> PKS_FLUSH_SLEEP_RANGE);
> timeout = timeout + PKS_FLUSH_SLEEP;
> -   } while (timeout < PKS_FLUSH_MAX_TIMEOUT);
> +   } while (timeout < PKS_MAX_TIMEOUT);
>  
> if (timed_out)
> rc = -ETIMEDOUT;
> @@ -382,6 +391,60 @@ static int plpks_confirm_object_flushed(struct
> label *label,
> return rc;
>  }
>  
> +int plpks_signed_update_var(struct plpks_var var, u64 flags)
> +{
> +   unsigned long retbuf[PLPAR_HCALL9_BUFSIZE] = {0};
> +   int rc;
> +   struct label *label;
> +   struct plpks_auth *auth;
> +   u64 continuetoken = 0;
> +   u64 timeout = 0;
> +
> +   if (!var.data || var.datalen <= 0 || var.namelen >
> MAX_NAME_SIZE)
> +   return -EINVAL;
> +
> +   if (!(var.policy & SIGNEDUPDATE))
> +   return -EINVAL;
> +
> +   auth = construct_auth(PKS_OS_OWNER);
> +   if (IS_ERR(auth))
> +   return PTR_ERR(auth);
> +
> +   label = construct_label(var.component, var.os, var.name,
> var.namelen);
> +   if (IS_ERR(label)) {
> +   rc = PTR_ERR(label);
> +   goto out;
> +   }

Re: [PATCH 3/4] powerpc/pseries: Expose PLPKS config values, support additional fields

2023-01-03 Thread Russell Currey
On Tue, 2022-12-20 at 18:16 +1100, Andrew Donnellan wrote:
> From: Nayna Jain 
> 
> The plpks driver uses the H_PKS_GET_CONFIG hcall to retrieve
> configuration
> and status information about the PKS from the hypervisor.
> 
> Update _plpks_get_config() to handle some additional fields. Add
> getter
> functions to allow the PKS configuration information to be accessed
> from
> other files.
> 
> While we're here, move the config struct in _plpks_get_config() off
> the
> stack - it's getting large and we also need to make sure it doesn't
> cross
> a page boundary.
> 
> Signed-off-by: Nayna Jain 
> [ajd: split patch, extend to support additional v3 API fields, minor
> fixes]
> Co-developed-by: Andrew Donnellan 
> Signed-off-by: Andrew Donnellan 
> ---
>  arch/powerpc/platforms/pseries/plpks.c | 118 ++-
> --
>  arch/powerpc/platforms/pseries/plpks.h |  58 
>  2 files changed, 164 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/plpks.c
> b/arch/powerpc/platforms/pseries/plpks.c
> index 8ccc91143370..c5ae00a8a968 100644
> --- a/arch/powerpc/platforms/pseries/plpks.c
> +++ b/arch/powerpc/platforms/pseries/plpks.c
> @@ -38,8 +38,16 @@ static u8 *ospassword;
>  static u16 ospasswordlength;
>  
>  // Retrieved with H_PKS_GET_CONFIG
> +static u8 version;
> +static u16 objoverhead;
>  static u16 maxpwsize;
>  static u16 maxobjsize;
> +static s16 maxobjlabelsize;
> +static u32 totalsize;
> +static u32 usedspace;
> +static u32 supportedpolicies;
> +static u32 maxlargeobjectsize;
> +static u64 signedupdatealgorithms;
>  
>  struct plpks_auth {
> u8 version;
> @@ -220,32 +228,118 @@ static struct label *construct_label(char
> *component, u8 varos, u8 *name,
>  static int _plpks_get_config(void)
>  {
> unsigned long retbuf[PLPAR_HCALL_BUFSIZE] = { 0 };
> -   struct {
> +   struct config {
> u8 version;
> u8 flags;
> -   __be32 rsvd0;
> +   __be16 rsvd0;
> +   __be16 objoverhead;
> __be16 maxpwsize;
> __be16 maxobjlabelsize;
> __be16 maxobjsize;
> __be32 totalsize;
> __be32 usedspace;
> __be32 supportedpolicies;
> -   __be64 rsvd1;
> -   } __packed config;
> +   __be32 maxlargeobjectsize;
> +   __be64 signedupdatealgorithms;
> +   u8 rsvd1[476];
> +   } __packed *config;
> size_t size;
> -   int rc;
> +   int rc = 0;
> +
> +   size = sizeof(*config);
> +
> +   // Config struct must not cross a page boundary. So long as
> the struct
> +   // size is a power of 2, this should be fine as alignment is
> guaranteed
> +   config = kzalloc(size, GFP_KERNEL);
> +   if (!config) {
> +   rc = -ENOMEM;
> +   goto err;
> +   }
>  
> -   size = sizeof(config);
> +   rc = plpar_hcall(H_PKS_GET_CONFIG, retbuf,
> virt_to_phys(config), size);
>  
> -   rc = plpar_hcall(H_PKS_GET_CONFIG, retbuf,
> virt_to_phys(&config), size);
> +   if (rc != H_SUCCESS) {
> +   rc = pseries_status_to_err(rc);
> +   goto err;
> +   }
>  
> -   if (rc != H_SUCCESS)
> -   return pseries_status_to_err(rc);
> +   version = config->version;
> +   objoverhead = be16_to_cpu(config->objoverhead);
> +   maxpwsize = be16_to_cpu(config->maxpwsize);
> +   maxobjsize = be16_to_cpu(config->maxobjsize);
> +   maxobjlabelsize = be16_to_cpu(config->maxobjlabelsize) -
> + MAX_LABEL_ATTR_SIZE;
> +   maxobjlabelsize = maxobjlabelsize < 0 ? 0 : maxobjlabelsize;

Isn't a bit of precision lost here?  There has to be a better way to
handle this.  We get a be16 from the hypervisor, turn it into a u16,
and assign that to an s16 in order to handle underflow.  Can we just
check if the size we're given is large enough?  The hypervisor
documentation also says this value must be at least 255, if we sanity
check that we don't have to worry about underflow.

> +   totalsize = be32_to_cpu(config->totalsize);
> +   usedspace = be32_to_cpu(config->usedspace);
> +   supportedpolicies = be32_to_cpu(config->supportedpolicies);
> +   maxlargeobjectsize = be32_to_cpu(config->maxlargeobjectsize);
> +   signedupdatealgorithms = be64_to_cpu(config-
> >signedupdatealgorithms);
> +
> +err:
> +   kfree(config);
> +   return rc;
> +}
>  
> -   maxpwsize = be16_to_cpu(config.maxpwsize);
> -   maxobjsize = be16_to_cpu(config.maxobjsize);
> +u8 plpks_get_version(void)
> +{
> +   return version;
> +}
> +
> +u16 plpks_get_objoverhead(void)
> +{
> +   return objoverhead;
> +}
> +
> +u16 plpks_get_maxpwsize(void)
> +{
> +   return maxpwsize;
> +}
> +
> +u16 plpks_get_maxobjectsize(void)
> +{
> +   return maxobjsize;
> +}
> +
> +u16 plpks_get_maxobjectlabelsize(void)

and it's returned as a u16.

Re: [PATCH 2/4] powerpc/pseries: Fix alignment of PLPKS structures and buffers

2023-01-03 Thread Russell Currey
On Tue, 2022-12-20 at 18:16 +1100, Andrew Donnellan wrote:
> A number of structures and buffers passed to PKS hcalls have
> alignment
> requirements, which could on occasion cause problems:
> 
> - Authorisation structures must be 16-byte aligned and must not cross
> a
>   page boundary
> 
> - Label structures must not cross page boundaries
> 
> - Password output buffers must not cross page boundaries
> 
> Round up the allocations of these structures/buffers to the next
> power of
> 2 to make sure this happens.
> 
> Reported-by: Benjamin Gray 
> Fixes: 2454a7af0f2a ("powerpc/pseries: define driver for Platform
> KeyStore")
> Signed-off-by: Andrew Donnellan 
> 
Reviewed-by: Russell Currey 



Re: [PATCH 1/4] powerpc/pseries: Fix handling of PLPKS object flushing timeout

2023-01-03 Thread Russell Currey
On Tue, 2022-12-20 at 18:16 +1100, Andrew Donnellan wrote:
> plpks_confirm_object_flushed() uses the H_PKS_CONFIRM_OBJECT_FLUSHED
> hcall
> to check whether changes to an object in the Platform KeyStore have
> been
> flushed to non-volatile storage.
> 
> The hcall returns two output values, the return code and the flush
> status.
> plpks_confirm_object_flushed() polls the hcall until either the flush
> status has updated, the return code is an error, or a timeout has
> been
> exceeded.
> 
> While we're still polling, the hcall is returning H_SUCCESS (0) as
> the
> return code. In the timeout case, this means that upon exiting the
> polling
> loop, rc is 0, and therefore 0 is returned to the user.
> 
> Handle the timeout case separately and return ETIMEDOUT if triggered.
> 
> Fixes: 2454a7af0f2a ("powerpc/pseries: define driver for Platform
> KeyStore")
> Reported-by: Benjamin Gray 
> Signed-off-by: Andrew Donnellan 

Tested-by: Russell Currey 
Reviewed-by: Russell Currey 



[PATCH v2 7/7] powerpc/pseries: Implement secvars for dynamic secure boot

2022-12-29 Thread Russell Currey
The pseries platform can support dynamic secure boot (i.e. secure boot
using user-defined keys) using variables contained with the PowerVM LPAR
Platform KeyStore (PLPKS).  Using the powerpc secvar API, expose the
relevant variables for pseries dynamic secure boot through the existing
secvar filesystem layout.

The relevant variables for dynamic secure boot are signed in the
keystore, and can only be modified using the H_PKS_SIGNED_UPDATE hcall.
Object labels in the keystore are encoded using ucs2 format.  With our
fixed variable names we don't have to care about encoding outside of the
necessary byte padding.

When a user writes to a variable, the first 8 bytes of data must contain
the signed update flags as defined by the hypervisor.

When a user reads a variable, the first 4 bytes of data contain the
policies defined for the object.

Limitations exist due to the underlying implementation of sysfs binary
attributes, as is the case for the OPAL secvar implementation -
partial writes are unsupported and writes cannot be larger than PAGE_SIZE.

Co-developed-by: Nayna Jain 
Signed-off-by: Nayna Jain 
Co-developed-by: Andrew Donnellan 
Signed-off-by: Andrew Donnellan 
Signed-off-by: Russell Currey 
---
v2: Remove unnecessary config vars from sysfs and document the others,
thanks to review from Greg.  If we end up needing to expose more, we
can add them later and update the docs.

Use sysfs_emit() instead of sprintf(), thanks to Greg.

Change the size of the sysfs binary attributes to include the 8-byte
flags header, preventing truncation of large writes.

 Documentation/ABI/testing/sysfs-secvar|  67 -
 arch/powerpc/platforms/pseries/Kconfig|  13 +
 arch/powerpc/platforms/pseries/Makefile   |   4 +-
 arch/powerpc/platforms/pseries/plpks-secvar.c | 245 ++
 4 files changed, 326 insertions(+), 3 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/plpks-secvar.c

diff --git a/Documentation/ABI/testing/sysfs-secvar 
b/Documentation/ABI/testing/sysfs-secvar
index feebb8c57294..466a8cb92b92 100644
--- a/Documentation/ABI/testing/sysfs-secvar
+++ b/Documentation/ABI/testing/sysfs-secvar
@@ -34,7 +34,7 @@ Description:  An integer representation of the size of the 
content of the
 
 What:  /sys/firmware/secvar/vars//data
 Date:  August 2019
-Contact:   Nayna Jain h
+Contact:   Nayna Jain 
 Description:   A read-only file containing the value of the variable. The size
of the file represents the maximum size of the variable data.
 
@@ -44,3 +44,68 @@ Contact: Nayna Jain 
 Description:   A write-only file that is used to submit the new value for the
variable. The size of the file represents the maximum size of
the variable data that can be written.
+
+What:  /sys/firmware/secvar/config
+Date:  December 2022
+Contact:   Nayna Jain 
+Description:   This optional directory contains read-only config attributes as
+   defined by the secure variable implementation.  All data is in
+   ASCII format. The directory is only created if the backing
+   implementation provides variables to populate it, which at
+   present is only PLPKS on the pseries platform.
+
+What:  /sys/firmware/secvar/config/version
+Date:  December 2022
+Contact:   Nayna Jain 
+Description:   RO file, only present if the secvar implementation is PLPKS.
+
+   Contains the config version as reported by the hypervisor in
+   ASCII decimal format.
+
+What:  /sys/firmware/secvar/config/max_object_size
+Date:  December 2022
+Contact:   Nayna Jain 
+Description:   RO file, only present if the secvar implementation is PLPKS.
+
+   Contains the maximum allowed size of objects in the keystore
+   in bytes, represented in ASCII decimal format.
+
+   This is not necessarily the same as the maximum size that can
+   be written to an update file, as writes can contain more than
+   object data; use the size of the update file for that
+   purpose.
+
+What:  /sys/firmware/secvar/config/total_size
+Date:  December 2022
+Contact:   Nayna Jain 
+Description:   RO file, only present if the secvar implementation is PLPKS.
+
+   Contains the total size of the PLPKS in bytes, represented in
+   ASCII decimal format.
+
+What:  /sys/firmware/secvar/config/used_space
+Date:  December 2022
+Contact:   Nayna Jain 
+Description:   RO file, only present if the secvar implementation is PLPKS.
+
+   Contains the current space consumed of the PLPKS in bytes,
+   represented in ASCII decimal format.
+
+What:  /sys/firmware/secvar/config/supported_policies
+Date:  December 2022
+Contact:   Nayna Jain 
+Description:   RO file, only

[PATCH v2 6/7] powerpc/secvar: Extend sysfs to include config vars

2022-12-29 Thread Russell Currey
The forthcoming pseries consumer of the secvar API wants to expose a
number of config variables.  Allowing secvar implementations to provide
their own sysfs attributes makes it easy for consumers to expose what
they need to.

This is not being used by the OPAL secvar implementation at present, and
the config directory will not be created if no attributes are set.

Signed-off-by: Russell Currey 
---
I played around with adding an API call to facilitate a more generic
key/value interface for config variables and it seemed like unnecessary
complexity.  I think this is cleaner.  If there was ever a secvar
interface other than sysfs we'd have to rework it, though.

 arch/powerpc/include/asm/secvar.h  |  3 +++
 arch/powerpc/kernel/secvar-sysfs.c | 40 ++
 2 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/secvar.h 
b/arch/powerpc/include/asm/secvar.h
index 92d2c051918b..250e7066b6da 100644
--- a/arch/powerpc/include/asm/secvar.h
+++ b/arch/powerpc/include/asm/secvar.h
@@ -10,6 +10,7 @@
 
 #include 
 #include 
+#include 
 
 extern const struct secvar_operations *secvar_ops;
 
@@ -27,10 +28,12 @@ struct secvar_operations {
 #ifdef CONFIG_PPC_SECURE_BOOT
 
 extern void set_secvar_ops(const struct secvar_operations *ops);
+extern void set_secvar_config_attrs(const struct attribute **attrs);
 
 #else
 
 static inline void set_secvar_ops(const struct secvar_operations *ops) { }
+static inline void set_secvar_config_attrs(const struct attribute **attrs) { }
 
 #endif
 
diff --git a/arch/powerpc/kernel/secvar-sysfs.c 
b/arch/powerpc/kernel/secvar-sysfs.c
index aa1daec480e1..ad1e1d72d2ae 100644
--- a/arch/powerpc/kernel/secvar-sysfs.c
+++ b/arch/powerpc/kernel/secvar-sysfs.c
@@ -15,9 +15,17 @@
 
 #define NAME_MAX_SIZE 1024
 
+const struct attribute **secvar_config_attrs __ro_after_init = NULL;
+
 static struct kobject *secvar_kobj;
 static struct kset *secvar_kset;
 
+void set_secvar_config_attrs(const struct attribute **attrs)
+{
+   WARN_ON_ONCE(secvar_config_attrs);
+   secvar_config_attrs = attrs;
+}
+
 static ssize_t format_show(struct kobject *kobj, struct kobj_attribute *attr,
   char *buf)
 {
@@ -134,6 +142,16 @@ static int update_kobj_size(void)
return 0;
 }
 
+static int secvar_sysfs_config(struct kobject *kobj)
+{
+   struct attribute_group config_group = {
+   .name = "config",
+   .attrs = (struct attribute **)secvar_config_attrs,
+   };
+
+   return sysfs_create_group(kobj, &config_group);
+}
+
 static int secvar_sysfs_load(void)
 {
char *name;
@@ -196,26 +214,38 @@ static int secvar_sysfs_init(void)
 
rc = sysfs_create_file(secvar_kobj, _attr.attr);
if (rc) {
-   kobject_put(secvar_kobj);
-   return -ENOMEM;
+   pr_err("secvar: Failed to create format object\n");
+   rc = -ENOMEM;
+   goto err;
}
 
secvar_kset = kset_create_and_add("vars", NULL, secvar_kobj);
if (!secvar_kset) {
pr_err("secvar: sysfs kobject registration failed.\n");
-   kobject_put(secvar_kobj);
-   return -ENOMEM;
+   rc = -ENOMEM;
+   goto err;
}
 
rc = update_kobj_size();
if (rc) {
pr_err("Cannot read the size of the attribute\n");
-   return rc;
+   goto err;
+   }
+
+   if (secvar_config_attrs) {
+   rc = secvar_sysfs_config(secvar_kobj);
+   if (rc) {
+   pr_err("secvar: Failed to create config directory\n");
+   goto err;
+   }
}
 
secvar_sysfs_load();
 
return 0;
+err:
+   kobject_put(secvar_kobj);
+   return rc;
 }
 
 late_initcall(secvar_sysfs_init);
-- 
2.38.1



[PATCH v2 5/7] powerpc/secvar: Handle max object size in the consumer

2022-12-29 Thread Russell Currey
Currently the max object size is handled in the core secvar code with an
entirely OPAL-specific implementation, so create a new max_size() op and
move the existing implementation into the powernv platform.  Should be
no functional change.

Signed-off-by: Russell Currey 
---
 arch/powerpc/include/asm/secvar.h|  1 +
 arch/powerpc/kernel/secvar-sysfs.c   | 17 +++--
 arch/powerpc/platforms/powernv/opal-secvar.c | 19 +++
 3 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/secvar.h 
b/arch/powerpc/include/asm/secvar.h
index 3b7e5a3625bd..92d2c051918b 100644
--- a/arch/powerpc/include/asm/secvar.h
+++ b/arch/powerpc/include/asm/secvar.h
@@ -21,6 +21,7 @@ struct secvar_operations {
int (*set)(const char *key, uint64_t key_len, u8 *data,
   uint64_t data_size);
ssize_t (*format)(char *buf);
+   int (*max_size)(uint64_t *max_size);
 };
 
 #ifdef CONFIG_PPC_SECURE_BOOT
diff --git a/arch/powerpc/kernel/secvar-sysfs.c 
b/arch/powerpc/kernel/secvar-sysfs.c
index 190238f51335..aa1daec480e1 100644
--- a/arch/powerpc/kernel/secvar-sysfs.c
+++ b/arch/powerpc/kernel/secvar-sysfs.c
@@ -122,27 +122,16 @@ static struct kobj_type secvar_ktype = {
 static int update_kobj_size(void)
 {
 
-   struct device_node *node;
u64 varsize;
-   int rc = 0;
+   int rc = secvar_ops->max_size(&varsize);
 
-   node = of_find_compatible_node(NULL, NULL, "ibm,secvar-backend");
-   if (!of_device_is_available(node)) {
-   rc = -ENODEV;
-   goto out;
-   }
-
-   rc = of_property_read_u64(node, "max-var-size", &varsize);
if (rc)
-   goto out;
+   return rc;
 
data_attr.size = varsize;
update_attr.size = varsize;
 
-out:
-   of_node_put(node);
-
-   return rc;
+   return 0;
 }
 
 static int secvar_sysfs_load(void)
diff --git a/arch/powerpc/platforms/powernv/opal-secvar.c 
b/arch/powerpc/platforms/powernv/opal-secvar.c
index 5e9de06b2533..07260460e966 100644
--- a/arch/powerpc/platforms/powernv/opal-secvar.c
+++ b/arch/powerpc/platforms/powernv/opal-secvar.c
@@ -125,11 +125,30 @@ static ssize_t opal_secvar_format(char *buf)
return rc;
 }
 
+static int opal_secvar_max_size(uint64_t *max_size)
+{
+   int rc;
+   struct device_node *node;
+
+   node = of_find_compatible_node(NULL, NULL, "ibm,secvar-backend");
+   if (!of_device_is_available(node)) {
+   rc = -ENODEV;
+   goto out;
+   }
+
+   rc = of_property_read_u64(node, "max-var-size", max_size);
+
+out:
+   of_node_put(node);
+   return rc;
+}
+
 static const struct secvar_operations opal_secvar_ops = {
.get = opal_get_variable,
.get_next = opal_get_next_variable,
.set = opal_set_variable,
.format = opal_secvar_format,
+   .max_size = opal_secvar_max_size,
 };
 
 static int opal_secvar_probe(struct platform_device *pdev)
-- 
2.38.1



[PATCH v2 4/7] powerpc/secvar: Handle format string in the consumer

2022-12-29 Thread Russell Currey
The code that handles the format string in secvar-sysfs.c is entirely
OPAL specific, so create a new "format" op in secvar_operations to make
the secvar code more generic.  No functional change.

Signed-off-by: Russell Currey 
---
 arch/powerpc/include/asm/secvar.h|  1 +
 arch/powerpc/kernel/secvar-sysfs.c   | 21 +---
 arch/powerpc/platforms/powernv/opal-secvar.c | 25 
 3 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/secvar.h 
b/arch/powerpc/include/asm/secvar.h
index 4cc35b58b986..3b7e5a3625bd 100644
--- a/arch/powerpc/include/asm/secvar.h
+++ b/arch/powerpc/include/asm/secvar.h
@@ -20,6 +20,7 @@ struct secvar_operations {
uint64_t keybufsize);
int (*set)(const char *key, uint64_t key_len, u8 *data,
   uint64_t data_size);
+   ssize_t (*format)(char *buf);
 };
 
 #ifdef CONFIG_PPC_SECURE_BOOT
diff --git a/arch/powerpc/kernel/secvar-sysfs.c 
b/arch/powerpc/kernel/secvar-sysfs.c
index 02e9fee1552e..190238f51335 100644
--- a/arch/powerpc/kernel/secvar-sysfs.c
+++ b/arch/powerpc/kernel/secvar-sysfs.c
@@ -21,26 +21,7 @@ static struct kset *secvar_kset;
 static ssize_t format_show(struct kobject *kobj, struct kobj_attribute *attr,
   char *buf)
 {
-   ssize_t rc = 0;
-   struct device_node *node;
-   const char *format;
-
-   node = of_find_compatible_node(NULL, NULL, "ibm,secvar-backend");
-   if (!of_device_is_available(node)) {
-   rc = -ENODEV;
-   goto out;
-   }
-
-   rc = of_property_read_string(node, "format", &format);
-   if (rc)
-   goto out;
-
-   rc = sysfs_emit(buf, "%s\n", format);
-
-out:
-   of_node_put(node);
-
-   return rc;
+   return secvar_ops->format(buf);
 }
 
 
diff --git a/arch/powerpc/platforms/powernv/opal-secvar.c 
b/arch/powerpc/platforms/powernv/opal-secvar.c
index 14133e120bdd..5e9de06b2533 100644
--- a/arch/powerpc/platforms/powernv/opal-secvar.c
+++ b/arch/powerpc/platforms/powernv/opal-secvar.c
@@ -101,10 +101,35 @@ static int opal_set_variable(const char *key, uint64_t 
ksize, u8 *data,
return opal_status_to_err(rc);
 }
 
+static ssize_t opal_secvar_format(char *buf)
+{
+   ssize_t rc = 0;
+   struct device_node *node;
+   const char *format;
+
+   node = of_find_compatible_node(NULL, NULL, "ibm,secvar-backend");
+   if (!of_device_is_available(node)) {
+   rc = -ENODEV;
+   goto out;
+   }
+
+   rc = of_property_read_string(node, "format", &format);
+   if (rc)
+   goto out;
+
+   rc = sysfs_emit(buf, "%s\n", format);
+
+out:
+   of_node_put(node);
+
+   return rc;
+}
+
 static const struct secvar_operations opal_secvar_ops = {
.get = opal_get_variable,
.get_next = opal_get_next_variable,
.set = opal_set_variable,
+   .format = opal_secvar_format,
 };
 
 static int opal_secvar_probe(struct platform_device *pdev)
-- 
2.38.1



[PATCH v2 3/7] powerpc/secvar: Use sysfs_emit() instead of sprintf()

2022-12-29 Thread Russell Currey
The secvar format string and object size sysfs files are both ASCII
text, and should use sysfs_emit().  No functional change.

Suggested-by: Greg Kroah-Hartman 
Signed-off-by: Russell Currey 
---
v2: new

 arch/powerpc/kernel/secvar-sysfs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/secvar-sysfs.c 
b/arch/powerpc/kernel/secvar-sysfs.c
index 1ee4640a2641..02e9fee1552e 100644
--- a/arch/powerpc/kernel/secvar-sysfs.c
+++ b/arch/powerpc/kernel/secvar-sysfs.c
@@ -35,7 +35,7 @@ static ssize_t format_show(struct kobject *kobj, struct 
kobj_attribute *attr,
if (rc)
goto out;
 
-   rc = sprintf(buf, "%s\n", format);
+   rc = sysfs_emit(buf, "%s\n", format);
 
 out:
of_node_put(node);
@@ -57,7 +57,7 @@ static ssize_t size_show(struct kobject *kobj, struct 
kobj_attribute *attr,
return rc;
}
 
-   return sprintf(buf, "%llu\n", dsize);
+   return sysfs_emit(buf, "%llu\n", dsize);
 }
 
 static ssize_t data_read(struct file *filep, struct kobject *kobj,
-- 
2.38.1



[PATCH v2 2/7] powerpc/secvar: WARN_ON_ONCE() if multiple secvar ops are set

2022-12-29 Thread Russell Currey
The secvar code only supports one consumer at a time.

Multiple consumers aren't possible at this point in time, but we'd want
it to be obvious if it ever could happen.

Signed-off-by: Russell Currey 
---
 arch/powerpc/kernel/secvar-ops.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/secvar-ops.c b/arch/powerpc/kernel/secvar-ops.c
index 6a29777d6a2d..aa1b2adc2710 100644
--- a/arch/powerpc/kernel/secvar-ops.c
+++ b/arch/powerpc/kernel/secvar-ops.c
@@ -8,10 +8,12 @@
 
 #include 
 #include 
+#include 
 
-const struct secvar_operations *secvar_ops __ro_after_init;
+const struct secvar_operations *secvar_ops __ro_after_init = NULL;
 
 void set_secvar_ops(const struct secvar_operations *ops)
 {
+   WARN_ON_ONCE(secvar_ops);
secvar_ops = ops;
 }
-- 
2.38.1



[PATCH v2 0/7] pseries dynamic secure boot interface using secvar

2022-12-29 Thread Russell Currey
Changes in v2:

Remove unnecessary config vars from sysfs and document the others,
thanks to review from Greg.  If we end up needing to expose more, we
can add them later and update the docs.

Use sysfs_emit() instead of sprintf() for all sysfs strings

Change the size of the sysfs binary attributes to include the 8-byte
flags header, preventing truncation of large writes.

This series exposes an interface to userspace for reading and writing
secure variables contained within the PowerVM LPAR Platform KeyStore
(PLPKS) for the purpose of configuring dynamic secure boot.

This series builds on past work by Nayna Jain[0] in exposing PLPKS
variables to userspace.  Rather than being a generic interface for
interacting with the keystore, however, we use the existing powerpc
secvar infrastructure to only expose objects in the keystore used
for dynamic secure boot.  This has the benefit of leveraging an
existing interface and making the implementation relatively minimal.

This series needs to be applied on top of Andrew's recent bugfix
series[1].

There are a few relevant details to note about the implementation:

 * New additions to the secvar API, format() and max_size()
 * New optional sysfs directory "config/" for arbitrary ASCII variables
 * Some OPAL-specific code has been relocated from secvar-sysfs.c to
powernv platform code.  Would appreciate any powernv testing!
 * Variable names are fixed and only those used for secure boot are
exposed.  This is not a generic PLPKS interface, but also
doesn't preclude one being added in future.

With this series, both powernv and pseries platforms support dynamic
secure boot through the same interface.

[0]: 
https://lore.kernel.org/linuxppc-dev/20221106210744.603240-1-na...@linux.ibm.com/
[1]: 
https://lore.kernel.org/linuxppc-dev/20221220071626.1426786-1-...@linux.ibm.com/

v1: 
https://lore.kernel.org/linuxppc-dev/20221228072943.429266-1-rus...@russell.cc/

Russell Currey (7):
  powerpc/pseries: Log hcall return codes for PLPKS debug
  powerpc/secvar: WARN_ON_ONCE() if multiple secvar ops are set
  powerpc/secvar: Use sysfs_emit() instead of sprintf()
  powerpc/secvar: Handle format string in the consumer
  powerpc/secvar: Handle max object size in the consumer
  powerpc/secvar: Extend sysfs to include config vars
  powerpc/pseries: Implement secvars for dynamic secure boot

 Documentation/ABI/testing/sysfs-secvar|  67 -
 arch/powerpc/include/asm/secvar.h |   5 +
 arch/powerpc/kernel/secvar-ops.c  |   4 +-
 arch/powerpc/kernel/secvar-sysfs.c|  78 +++---
 arch/powerpc/platforms/powernv/opal-secvar.c  |  44 
 arch/powerpc/platforms/pseries/Kconfig|  13 +
 arch/powerpc/platforms/pseries/Makefile   |   4 +-
 arch/powerpc/platforms/pseries/plpks-secvar.c | 245 ++
 arch/powerpc/platforms/pseries/plpks.c|   2 +
 9 files changed, 419 insertions(+), 43 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/plpks-secvar.c

-- 
2.38.1



[PATCH v2 1/7] powerpc/pseries: Log hcall return codes for PLPKS debug

2022-12-29 Thread Russell Currey
The plpks code converts hypervisor return codes into their Linux
equivalents so that users can understand them.  Having access to the
original return codes is really useful for debugging, so add a
pr_debug() so we don't lose information from the conversion.

Signed-off-by: Russell Currey 
---
 arch/powerpc/platforms/pseries/plpks.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/plpks.c 
b/arch/powerpc/platforms/pseries/plpks.c
index 9e4401aabf4f..820218eb894f 100644
--- a/arch/powerpc/platforms/pseries/plpks.c
+++ b/arch/powerpc/platforms/pseries/plpks.c
@@ -131,6 +131,8 @@ static int pseries_status_to_err(int rc)
err = -EINVAL;
}
 
+   pr_debug("Converted hypervisor code %d to Linux %d\n", rc, err);
+
return err;
 }
 
-- 
2.38.1



[PATCH 3/6] powerpc/secvar: Handle format string in the consumer

2022-12-27 Thread Russell Currey
The code that handles the format string in secvar-sysfs.c is entirely
OPAL specific, so create a new "format" op in secvar_operations to make
the secvar code more generic.  No functional change.

Signed-off-by: Russell Currey 
---
 arch/powerpc/include/asm/secvar.h|  1 +
 arch/powerpc/kernel/secvar-sysfs.c   | 21 +---
 arch/powerpc/platforms/powernv/opal-secvar.c | 25 
 3 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/secvar.h b/arch/powerpc/include/asm/secvar.h
index 4cc35b58b986..3b7e5a3625bd 100644
--- a/arch/powerpc/include/asm/secvar.h
+++ b/arch/powerpc/include/asm/secvar.h
@@ -20,6 +20,7 @@ struct secvar_operations {
uint64_t keybufsize);
int (*set)(const char *key, uint64_t key_len, u8 *data,
   uint64_t data_size);
+   ssize_t (*format)(char *buf);
 };
 
 #ifdef CONFIG_PPC_SECURE_BOOT
diff --git a/arch/powerpc/kernel/secvar-sysfs.c b/arch/powerpc/kernel/secvar-sysfs.c
index 1ee4640a2641..daf28b11866f 100644
--- a/arch/powerpc/kernel/secvar-sysfs.c
+++ b/arch/powerpc/kernel/secvar-sysfs.c
@@ -21,26 +21,7 @@ static struct kset *secvar_kset;
 static ssize_t format_show(struct kobject *kobj, struct kobj_attribute *attr,
   char *buf)
 {
-   ssize_t rc = 0;
-   struct device_node *node;
-   const char *format;
-
-   node = of_find_compatible_node(NULL, NULL, "ibm,secvar-backend");
-   if (!of_device_is_available(node)) {
-   rc = -ENODEV;
-   goto out;
-   }
-
-   rc = of_property_read_string(node, "format", &format);
-   if (rc)
-   goto out;
-
-   rc = sprintf(buf, "%s\n", format);
-
-out:
-   of_node_put(node);
-
-   return rc;
+   return secvar_ops->format(buf);
 }
 
 
diff --git a/arch/powerpc/platforms/powernv/opal-secvar.c b/arch/powerpc/platforms/powernv/opal-secvar.c
index 14133e120bdd..cd5b5c06c091 100644
--- a/arch/powerpc/platforms/powernv/opal-secvar.c
+++ b/arch/powerpc/platforms/powernv/opal-secvar.c
@@ -101,10 +101,35 @@ static int opal_set_variable(const char *key, uint64_t ksize, u8 *data,
return opal_status_to_err(rc);
 }
 
+static ssize_t opal_secvar_format(char *buf)
+{
+   ssize_t rc = 0;
+   struct device_node *node;
+   const char *format;
+
+   node = of_find_compatible_node(NULL, NULL, "ibm,secvar-backend");
+   if (!of_device_is_available(node)) {
+   rc = -ENODEV;
+   goto out;
+   }
+
+   rc = of_property_read_string(node, "format", &format);
+   if (rc)
+   goto out;
+
+   rc = sprintf(buf, "%s\n", format);
+
+out:
+   of_node_put(node);
+
+   return rc;
+}
+
 static const struct secvar_operations opal_secvar_ops = {
.get = opal_get_variable,
.get_next = opal_get_next_variable,
.set = opal_set_variable,
+   .format = opal_secvar_format,
 };
 
 static int opal_secvar_probe(struct platform_device *pdev)
-- 
2.38.1



[PATCH 5/6] powerpc/secvar: Extend sysfs to include config vars

2022-12-27 Thread Russell Currey
The forthcoming pseries consumer of the secvar API wants to expose a
number of config variables.  Allowing secvar implementations to provide
their own sysfs attributes makes it easy for consumers to expose what
they need to.

This is not being used by the OPAL secvar implementation at present, and
the config directory will not be created if no attributes are set.

Signed-off-by: Russell Currey 
---
I played around with adding an API call to facilitate a more generic
key/value interface for config variables and it seemed like unnecessary
complexity.  I think this is cleaner.  If there was ever a secvar
interface other than sysfs we'd have to rework it, though.

 arch/powerpc/include/asm/secvar.h  |  3 +++
 arch/powerpc/kernel/secvar-sysfs.c | 40 ++
 2 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/secvar.h b/arch/powerpc/include/asm/secvar.h
index 92d2c051918b..250e7066b6da 100644
--- a/arch/powerpc/include/asm/secvar.h
+++ b/arch/powerpc/include/asm/secvar.h
@@ -10,6 +10,7 @@
 
 #include 
 #include 
+#include 
 
 extern const struct secvar_operations *secvar_ops;
 
@@ -27,10 +28,12 @@ struct secvar_operations {
 #ifdef CONFIG_PPC_SECURE_BOOT
 
 extern void set_secvar_ops(const struct secvar_operations *ops);
+extern void set_secvar_config_attrs(const struct attribute **attrs);
 
 #else
 
 static inline void set_secvar_ops(const struct secvar_operations *ops) { }
+static inline void set_secvar_config_attrs(const struct attribute **attrs) { }
 
 #endif
 
diff --git a/arch/powerpc/kernel/secvar-sysfs.c b/arch/powerpc/kernel/secvar-sysfs.c
index ea408763dc78..0c3790345403 100644
--- a/arch/powerpc/kernel/secvar-sysfs.c
+++ b/arch/powerpc/kernel/secvar-sysfs.c
@@ -15,9 +15,17 @@
 
 #define NAME_MAX_SIZE 1024
 
+const struct attribute **secvar_config_attrs __ro_after_init = NULL;
+
 static struct kobject *secvar_kobj;
 static struct kset *secvar_kset;
 
+void set_secvar_config_attrs(const struct attribute **attrs)
+{
+   WARN_ON_ONCE(secvar_config_attrs);
+   secvar_config_attrs = attrs;
+}
+
 static ssize_t format_show(struct kobject *kobj, struct kobj_attribute *attr,
   char *buf)
 {
@@ -134,6 +142,16 @@ static int update_kobj_size(void)
return 0;
 }
 
+static int secvar_sysfs_config(struct kobject *kobj)
+{
+   struct attribute_group config_group = {
+   .name = "config",
+   .attrs = (struct attribute **)secvar_config_attrs,
+   };
+
+   return sysfs_create_group(kobj, &config_group);
+}
+
 static int secvar_sysfs_load(void)
 {
char *name;
@@ -196,26 +214,38 @@ static int secvar_sysfs_init(void)
 
rc = sysfs_create_file(secvar_kobj, &format_attr.attr);
if (rc) {
-   kobject_put(secvar_kobj);
-   return -ENOMEM;
+   pr_err("secvar: Failed to create format object\n");
+   rc = -ENOMEM;
+   goto err;
}
 
secvar_kset = kset_create_and_add("vars", NULL, secvar_kobj);
if (!secvar_kset) {
pr_err("secvar: sysfs kobject registration failed.\n");
-   kobject_put(secvar_kobj);
-   return -ENOMEM;
+   rc = -ENOMEM;
+   goto err;
}
 
rc = update_kobj_size();
if (rc) {
pr_err("Cannot read the size of the attribute\n");
-   return rc;
+   goto err;
+   }
+
+   if (secvar_config_attrs) {
+   rc = secvar_sysfs_config(secvar_kobj);
+   if (rc) {
+   pr_err("secvar: Failed to create config directory\n");
+   goto err;
+   }
}
 
secvar_sysfs_load();
 
return 0;
+err:
+   kobject_put(secvar_kobj);
+   return rc;
 }
 
 late_initcall(secvar_sysfs_init);
-- 
2.38.1



[PATCH 1/6] powerpc/pseries: Log hcall return codes for PLPKS debug

2022-12-27 Thread Russell Currey
The plpks code converts hypervisor return codes into their Linux
equivalents so that users can understand them.  Having access to the
original return codes is really useful for debugging, so add a
pr_debug() so we don't lose information from the conversion.

Signed-off-by: Russell Currey 
---
 arch/powerpc/platforms/pseries/plpks.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/plpks.c b/arch/powerpc/platforms/pseries/plpks.c
index 9e4401aabf4f..820218eb894f 100644
--- a/arch/powerpc/platforms/pseries/plpks.c
+++ b/arch/powerpc/platforms/pseries/plpks.c
@@ -131,6 +131,8 @@ static int pseries_status_to_err(int rc)
err = -EINVAL;
}
 
+   pr_debug("Converted hypervisor code %d to Linux %d\n", rc, err);
+
return err;
 }
 
-- 
2.38.1



[PATCH 2/6] powerpc/secvar: WARN_ON_ONCE() if multiple secvar ops are set

2022-12-27 Thread Russell Currey
The secvar code only supports one consumer at a time.

Multiple consumers aren't possible at this point in time, but we'd want
it to be obvious if it ever could happen.

Signed-off-by: Russell Currey 
---
 arch/powerpc/kernel/secvar-ops.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/secvar-ops.c b/arch/powerpc/kernel/secvar-ops.c
index 6a29777d6a2d..aa1b2adc2710 100644
--- a/arch/powerpc/kernel/secvar-ops.c
+++ b/arch/powerpc/kernel/secvar-ops.c
@@ -8,10 +8,12 @@
 
 #include 
 #include 
+#include 
 
-const struct secvar_operations *secvar_ops __ro_after_init;
+const struct secvar_operations *secvar_ops __ro_after_init = NULL;
 
 void set_secvar_ops(const struct secvar_operations *ops)
 {
+   WARN_ON_ONCE(secvar_ops);
secvar_ops = ops;
 }
-- 
2.38.1



[PATCH 0/6] pseries dynamic secure boot interface using secvar

2022-12-27 Thread Russell Currey
This series exposes an interface to userspace for reading and writing
secure variables contained within the PowerVM LPAR Platform KeyStore
(PLPKS) for the purpose of configuring dynamic secure boot.

This series builds on past work by Nayna Jain[0] in exposing PLPKS
variables to userspace.  Rather than being a generic interface for
interacting with the keystore, however, we use the existing powerpc
secvar infrastructure to only expose objects in the keystore used
for dynamic secure boot.  This has the benefit of leveraging an
existing interface and making the implementation relatively minimal.

This series needs to be applied on top of Andrew's recent bugfix
series[1].

There are a few relevant details to note about the implementation:

 * New additions to the secvar API, format() and max_size()
 * New optional sysfs directory "config/" for arbitrary ASCII variables
 * Some OPAL-specific code has been relocated from secvar-sysfs.c to
powernv platform code.  Would appreciate any powernv testing!
 * Variable names are fixed and only those used for secure boot are
exposed.  This is not a generic PLPKS interface, but also
doesn't preclude one being added in future.

With this series, both powernv and pseries platforms support dynamic
secure boot through the same interface.

[0]: 
https://lore.kernel.org/linuxppc-dev/20221106210744.603240-1-na...@linux.ibm.com/
[1]: 
https://lore.kernel.org/linuxppc-dev/20221220071626.1426786-1-...@linux.ibm.com/

Russell Currey (6):
  powerpc/pseries: Log hcall return codes for PLPKS debug
  powerpc/secvar: WARN_ON_ONCE() if multiple secvar ops are set
  powerpc/secvar: Handle format string in the consumer
  powerpc/secvar: Handle max object size in the consumer
  powerpc/secvar: Extend sysfs to include config vars
  powerpc/pseries: Implement secvars for dynamic secure boot

 Documentation/ABI/testing/sysfs-secvar|   8 +
 arch/powerpc/include/asm/secvar.h |   5 +
 arch/powerpc/kernel/secvar-ops.c  |   4 +-
 arch/powerpc/kernel/secvar-sysfs.c|  76 +++---
 arch/powerpc/platforms/powernv/opal-secvar.c  |  44 +++
 arch/powerpc/platforms/pseries/Kconfig|  13 +
 arch/powerpc/platforms/pseries/Makefile   |   4 +-
 arch/powerpc/platforms/pseries/plpks-secvar.c | 250 ++
 arch/powerpc/platforms/pseries/plpks.c|   2 +
 9 files changed, 365 insertions(+), 41 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/plpks-secvar.c

-- 
2.38.1



[PATCH 6/6] powerpc/pseries: Implement secvars for dynamic secure boot

2022-12-27 Thread Russell Currey
The pseries platform can support dynamic secure boot (i.e. secure boot
using user-defined keys) using variables contained with the PowerVM LPAR
Platform KeyStore (PLPKS).  Using the powerpc secvar API, expose the
relevant variables for pseries dynamic secure boot through the existing
secvar filesystem layout.

The relevant variables for dynamic secure boot are signed in the
keystore, and can only be modified using the H_PKS_SIGNED_UPDATE hcall.
Object labels in the keystore are encoded using ucs2 format.  With our
fixed variable names we don't have to care about encoding outside of the
necessary byte padding.

When a user writes to a variable, the first 8 bytes of data must contain
the signed update flags as defined by the hypervisor.

When a user reads a variable, the first 4 bytes of data contain the
policies defined for the object.

Limitations exist due to the underlying implementation of sysfs binary
attributes, as is the case for the OPAL secvar implementation -
partial writes are unsupported and writes cannot be larger than PAGE_SIZE.

Co-developed-by: Nayna Jain 
Signed-off-by: Nayna Jain 
Co-developed-by: Andrew Donnellan 
Signed-off-by: Andrew Donnellan 
Signed-off-by: Russell Currey 
---
 Documentation/ABI/testing/sysfs-secvar|   8 +
 arch/powerpc/platforms/pseries/Kconfig|  13 +
 arch/powerpc/platforms/pseries/Makefile   |   4 +-
 arch/powerpc/platforms/pseries/plpks-secvar.c | 250 ++
 4 files changed, 273 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/plpks-secvar.c

diff --git a/Documentation/ABI/testing/sysfs-secvar b/Documentation/ABI/testing/sysfs-secvar
index feebb8c57294..e6fef664c9c8 100644
--- a/Documentation/ABI/testing/sysfs-secvar
+++ b/Documentation/ABI/testing/sysfs-secvar
@@ -44,3 +44,11 @@ Contact: Nayna Jain 
 Description:   A write-only file that is used to submit the new value for the
variable. The size of the file represents the maximum size of
the variable data that can be written.
+
+What:  /sys/firmware/secvar/config
+Date:  December 2022
+Contact:   Nayna Jain 
+Description:   This optional directory contains read-only config attributes as
+   defined by the secure variable implementation.  All data is in
+   ASCII format. The directory is only created if the backing
+   implementation provides variables to populate it.
diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index a3b4d99567cb..94e08c405d50 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -162,6 +162,19 @@ config PSERIES_PLPKS
 
  If unsure, select N.
 
+config PSERIES_PLPKS_SECVAR
+   depends on PSERIES_PLPKS
+   depends on PPC_SECURE_BOOT
+   bool "Support for the PLPKS secvar interface"
+   help
+ PowerVM can support dynamic secure boot with user-defined keys
+ through the PLPKS. Keystore objects used in dynamic secure boot
+ can be exposed to the kernel and userspace through the powerpc
+ secvar infrastructure. Select this to enable the PLPKS backend
+ for secvars for use in pseries dynamic secure boot.
+
+ If unsure, select N.
+
 config PAPR_SCM
depends on PPC_PSERIES && MEMORY_HOTPLUG && LIBNVDIMM
tristate "Support for the PAPR Storage Class Memory interface"
diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index 92310202bdd7..807756991f9d 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -27,8 +27,8 @@ obj-$(CONFIG_PAPR_SCM)+= papr_scm.o
 obj-$(CONFIG_PPC_SPLPAR)   += vphn.o
 obj-$(CONFIG_PPC_SVM)  += svm.o
 obj-$(CONFIG_FA_DUMP)  += rtas-fadump.o
-obj-$(CONFIG_PSERIES_PLPKS) += plpks.o
-
+obj-$(CONFIG_PSERIES_PLPKS)+= plpks.o
+obj-$(CONFIG_PSERIES_PLPKS_SECVAR) += plpks-secvar.o
 obj-$(CONFIG_SUSPEND)  += suspend.o
 obj-$(CONFIG_PPC_VAS)  += vas.o vas-sysfs.o
 
diff --git a/arch/powerpc/platforms/pseries/plpks-secvar.c b/arch/powerpc/platforms/pseries/plpks-secvar.c
new file mode 100644
index ..3f9ff16c03c8
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/plpks-secvar.c
@@ -0,0 +1,250 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Secure variable implementation using the PowerVM LPAR Platform KeyStore (PLPKS)
+ *
+ * Copyright 2022, IBM Corporation
+ * Authors: Russell Currey
+ *  Andrew Donnellan
+ *  Nayna Jain
+ */
+
+#define pr_fmt(fmt) "secvar: "fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "plpks.h"
+
+// Config attributes for sysfs
+#define PLPKS_CONFIG_ATTR(name, fmt, func) \
+   static ssize_t name##_show(struct kobject *

[PATCH 4/6] powerpc/secvar: Handle max object size in the consumer

2022-12-27 Thread Russell Currey
Currently the max object size is handled in the core secvar code with an
entirely OPAL-specific implementation, so create a new max_size() op and
move the existing implementation into the powernv platform.  Should be
no functional change.

Signed-off-by: Russell Currey 
---
 arch/powerpc/include/asm/secvar.h|  1 +
 arch/powerpc/kernel/secvar-sysfs.c   | 17 +++--
 arch/powerpc/platforms/powernv/opal-secvar.c | 19 +++
 3 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/secvar.h b/arch/powerpc/include/asm/secvar.h
index 3b7e5a3625bd..92d2c051918b 100644
--- a/arch/powerpc/include/asm/secvar.h
+++ b/arch/powerpc/include/asm/secvar.h
@@ -21,6 +21,7 @@ struct secvar_operations {
int (*set)(const char *key, uint64_t key_len, u8 *data,
   uint64_t data_size);
ssize_t (*format)(char *buf);
+   int (*max_size)(uint64_t *max_size);
 };
 
 #ifdef CONFIG_PPC_SECURE_BOOT
diff --git a/arch/powerpc/kernel/secvar-sysfs.c b/arch/powerpc/kernel/secvar-sysfs.c
index daf28b11866f..ea408763dc78 100644
--- a/arch/powerpc/kernel/secvar-sysfs.c
+++ b/arch/powerpc/kernel/secvar-sysfs.c
@@ -122,27 +122,16 @@ static struct kobj_type secvar_ktype = {
 static int update_kobj_size(void)
 {
 
-   struct device_node *node;
u64 varsize;
-   int rc = 0;
+   int rc = secvar_ops->max_size(&varsize);
 
-   node = of_find_compatible_node(NULL, NULL, "ibm,secvar-backend");
-   if (!of_device_is_available(node)) {
-   rc = -ENODEV;
-   goto out;
-   }
-
-   rc = of_property_read_u64(node, "max-var-size", &varsize);
if (rc)
-   goto out;
+   return rc;
 
data_attr.size = varsize;
update_attr.size = varsize;
 
-out:
-   of_node_put(node);
-
-   return rc;
+   return 0;
 }
 
 static int secvar_sysfs_load(void)
diff --git a/arch/powerpc/platforms/powernv/opal-secvar.c b/arch/powerpc/platforms/powernv/opal-secvar.c
index cd5b5c06c091..3ef6b9afd129 100644
--- a/arch/powerpc/platforms/powernv/opal-secvar.c
+++ b/arch/powerpc/platforms/powernv/opal-secvar.c
@@ -125,11 +125,30 @@ static ssize_t opal_secvar_format(char *buf)
return rc;
 }
 
+static int opal_secvar_max_size(uint64_t *max_size)
+{
+   int rc;
+   struct device_node *node;
+
+   node = of_find_compatible_node(NULL, NULL, "ibm,secvar-backend");
+   if (!of_device_is_available(node)) {
+   rc = -ENODEV;
+   goto out;
+   }
+
+   rc = of_property_read_u64(node, "max-var-size", max_size);
+
+out:
+   of_node_put(node);
+   return rc;
+}
+
 static const struct secvar_operations opal_secvar_ops = {
.get = opal_get_variable,
.get_next = opal_get_next_variable,
.set = opal_set_variable,
.format = opal_secvar_format,
+   .max_size = opal_secvar_max_size,
 };
 
 static int opal_secvar_probe(struct platform_device *pdev)
-- 
2.38.1



Re: [RFC PATCH 00/13] Add DEXCR support

2022-11-27 Thread Russell Currey
On Mon, 2022-11-28 at 13:44 +1100, Benjamin Gray wrote:
> This series is based on initial work by Chris Riedl that was not sent
> to the list.
> 
> Adds a kernel interface for userspace to interact with the DEXCR.
> The DEXCR is a SPR that allows control over various execution
> 'aspects', such as indirect branch prediction and enabling the
> hashst/hashchk instructions. Further details are in ISA 3.1B
> Book 3 chapter 12.
> 
> This RFC proposes an interface for users to interact with the DEXCR.
> It aims to support
> 
> * Querying supported aspects
> * Getting/setting aspects on a per-process level
> * Allowing global overrides across all processes
> 
> There are some parts that I'm not sure on the best way to approach
> (hence RFC):
> 
> * The feature names in arch/powerpc/kernel/dt_cpu_ftrs.c appear to be
> unimplemented
>   in skiboot, so are being defined by this series. Is being so
> verbose fine?

These are going to need to be added to skiboot before they can be
referenced in the kernel.  Inclusion in skiboot makes them ABI, the
kernel is just a consumer.

> * What aspects should be editable by a process? E.g., SBHE has
>   effects that potentially bleed into other processes. Should
>   it only be system wide configurable?

For context, ISA 3.1B p1358 says: 

   In some micro-architectures, the execution behavior controlled by
   aspect 0 is difficult to change with any degree of timing precision.
   The change may also bleed over into other threads on the same
   processor. Any environment that has a dependence on the more secure
   setting of aspect 0 should not change the value, and ideally should
   share a processor only with similar threads. For other environments,
   changes to the effective value of aspect 0 represent a relative risk
   tolerance for its aspect of execution behavior, with the
   understanding that there will be significant hysteresis in the
   execution behavior.

If a process sets SBHE for itself and all it takes is context switching
from a process with SBHE unset to cause exposure, then yeah I think it
should just be global.  I doubt branch hints have enough impact for
process granularity to be especially desirable anyway.

> * Should configuring certain aspects for the process be non-
> privileged? E.g.,
>   Is there harm in always allowing configuration of IBRTPD, SRAPD?
> The *FORCE_SET*
>   action prevents further process local changes regardless of
> privilege.

I'm not aware of a reason why it would be a problem to allow
unprivileged configuration as long as there's a way to prevent further
changes.  The concerning case is if a mitigation is set by a trusted
process context, and then untrusted code is executed that manages to
turn the mitigation off again.

> * The tests fail Patchwork CI because of the new prctl macros, and
> the CI
>   doesn't run headers_install and add -isystem
> /usr/include to
>   the make command.

The CI runs on x86 and cross compiles the kernel and selftests, and
boots are done in qemu tcg.  Maybe we can skip the build if the symbols
are undefined or do something like

#ifndef PR_PPC_DEXCR_...
return KSFT_SKIP;
#endif

in the test itself?

> * On handling an exception, I don't check if the NPHIE bit is enabled
> in the DEXCR.
>   To do so would require reading both the DEXCR and HDEXCR, for
> little gain (it
>   should only matter that the current instruction was a hashchk. If
> so, the only
>   reason it would cause an exception is the failed check. If the
> instruction is
>   rewritten between exception and check we'd be wrong anyway).

For context, the hashst and hashchk instructions are implemented using
previously reserved nops.  I'm not aware of any reason a nop could trap
(i.e. we could check for a trap that came from hashchk even if NPHIE is
not set), but afaik that'd be the only reason we would have to check.

> 
> The series is based on the earlier selftest utils series[1], so the
> tests won't build
> at all without applying that first. The kernel side should build fine
> on ppc/next
> 247f34f7b80357943234f93f247a1ae6b6c3a740 though.
> 
> [1]:
> https://patchwork.ozlabs.org/project/linuxppc-dev/cover/20221122231103.15829-1-bg...@linux.ibm.com/
> 
> Benjamin Gray (13):
>   powerpc/book3s: Add missing  include
>   powerpc: Add initial Dynamic Execution Control Register (DEXCR)
>     support
>   powerpc/dexcr: Handle hashchk exception
>   powerpc/dexcr: Support userspace ROP protection
>   prctl: Define PowerPC DEXCR interface
>   powerpc/dexcr: Add prctl implementation
>   powerpc/dexcr: Add sysctl entry for SBHE system override
>   powerpc/dexcr: Add enforced userspace ROP protection config
>   selftests/powerpc: Add more utility macros
>   selftests/powerpc: Add hashst/hashchk test
>   selftests/powerpc: Add DEXCR prctl, sysctl interface test
>   selftests/powerpc: Add DEXCR status utility lsdexcr
>   Documentation: Document PowerPC kernel DEXCR interface
> 
>  

Re: [PATCH 2/4] powerpc/64s: Clear gprs on interrupt routine entry on Book3S

2022-11-06 Thread Russell Currey
On Thu, 2022-11-03 at 09:45 +1100, Rohan McLure wrote:
> Zero user state in gprs (assign to zero) to reduce the influence of
> user
> registers on speculation within kernel syscall handlers. Clears occur
> at the very beginning of the sc and scv 0 interrupt handlers, with
> restores occurring following the execution of the syscall handler.
> 
> Zero GPRS r0, r2-r11, r14-r31, on entry into the kernel for all
> other interrupt sources. The remaining gprs are overwritten by
> entry macros to interrupt handlers, irrespective of whether or not a
> given handler consumes these register values.
> 
> Prior to this commit, r14-r31 are restored on a per-interrupt basis
> at
> exit, but now they are always restored on 64bit Book3S. Remove
> explicit
> REST_NVGPRS invocations on 64-bit Book3S. 32-bit systems do not clear
> user registers on interrupt, and continue to depend on the return
> value
> of interrupt_exit_user_prepare to determine whether or not to restore
> non-volatiles.
> 
> The mmap_bench benchmark in selftests should rapidly invoke
> pagefaults.
> See ~0.8% performance regression with this mitigation, but this
> indicates the worst-case performance due to heavier-weight interrupt
> handlers. This mitigation is able to be enabled/disabled through
> CONFIG_INTERRUPT_SANITIZE_REGISTERS.
> 
> Signed-off-by: Rohan McLure 
> ---

Hi Rohan, I haven't figured out where it's coming from, but I'm seeing
assembler errors from this patch:

   /linux/arch/powerpc/kernel/interrupt_64.S: Assembler messages:
   /linux/arch/powerpc/kernel/interrupt_64.S:212: Error: too many
   positional arguments
   /linux/arch/powerpc/kernel/interrupt_64.S:219: Error: too many
   positional arguments
   /linux/arch/powerpc/kernel/interrupt_64.S:302: Error: too many
   positional arguments
   
Log here:
https://github.com/ruscur/linux-ci/actions/runs/3381711748/jobs/5615903876#step:4:98
   

> Resubmitting patches as their own series after v6 partially merged:
> Link:
> https://lore.kernel.org/all/166488988686.779920.13794870102696416283.b4...@ellerman.id.au/t/
> 
> Standalone series: Now syscall register clearing is included
> under the same configuration option. This now matches the
> description given for CONFIG_INTERRUPT_SANITIZE_REGISTERS.
> ---
>  arch/powerpc/kernel/exceptions-64s.S | 47 +-
>  arch/powerpc/kernel/interrupt_64.S   | 34 +++
>  2 files changed, 72 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/exceptions-64s.S
> b/arch/powerpc/kernel/exceptions-64s.S
> index 651c36b056bd..0605018762d1 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -21,6 +21,19 @@
>  #include 
>  #include 
>  
> +/*
> + * macros for handling user register sanitisation
> + */
> +#ifdef CONFIG_INTERRUPT_SANITIZE_REGISTERS
> +#define SANITIZE_ZEROIZE_NVGPRS()  ZEROIZE_NVGPRS()
> +#define SANITIZE_RESTORE_NVGPRS()  REST_NVGPRS(r1)
> +#define HANDLER_RESTORE_NVGPRS()
> +#else
> +#define SANITIZE_ZEROIZE_NVGPRS()
> +#define SANITIZE_RESTORE_NVGPRS()
> +#define HANDLER_RESTORE_NVGPRS()   REST_NVGPRS(r1)
> +#endif /* CONFIG_INTERRUPT_SANITIZE_REGISTERS */
> +
>  /*
>   * Following are fixed section helper macros.
>   *
> @@ -111,6 +124,7 @@ name:
>  #define ISTACK .L_ISTACK_\name\()  /* Set regular kernel
> stack */
>  #define __ISTACK(name) .L_ISTACK_ ## name
>  #define IKUAP  .L_IKUAP_\name\()   /* Do KUAP lock */
> +#define IMSR_R12   .L_IMSR_R12_\name\()/* Assumes MSR saved
> to r12 */
>  
>  #define
> INT_DEFINE_BEGIN(n)\
>  .macro int_define_ ## n name
> @@ -176,6 +190,9 @@ do_define_int n
> .ifndef IKUAP
> IKUAP=1
> .endif
> +   .ifndef IMSR_R12
> +   IMSR_R12=0
> +   .endif
>  .endm
>  
>  /*
> @@ -502,6 +519,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real, text)
> std r10,0(r1)   /* make stack chain
> pointer */
> std r0,GPR0(r1) /* save r0 in
> stackframe*/
> std r10,GPR1(r1)/* save r1 in
> stackframe*/
> +   ZEROIZE_GPR(0)
>  
> /* Mark our [H]SRRs valid for return */
> li  r10,1
> @@ -544,8 +562,14 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
> std r9,GPR11(r1)
> std r10,GPR12(r1)
> std r11,GPR13(r1)
> +   .if !IMSR_R12
> +   ZEROIZE_GPRS(9, 12)
> +   .else
> +   ZEROIZE_GPRS(9, 11)
> +   .endif
>  
> SAVE_NVGPRS(r1)
> +   SANITIZE_ZEROIZE_NVGPRS()
>  
> .if IDAR
> .if IISIDE
> @@ -577,8 +601,8 @@ BEGIN_FTR_SECTION
>  END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
> ld  r10,IAREA+EX_CTR(r13)
> std r10,_CTR(r1)
> -   std r2,GPR2(r1) /* save r2 in
> stackframe*/
> -   SAVE_GPRS(3, 8, r1) /* save r3 - r8 in
> stackframe   */
> +   SAVE_GPRS(2, 8, 

Re: [RFC PATCH 14/19] powerpc: split validate_sp into two functions

2022-11-06 Thread Russell Currey
On Mon, 2022-10-31 at 15:54 +1000, Nicholas Piggin wrote:
> Most callers just want to validate an arbitrary kernel stack pointer,
> some need a particular size. Make the size case the exceptional one
> with an extra function.
> 
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/include/asm/processor.h | 15 ---
>  arch/powerpc/kernel/process.c    | 23 ++-
>  arch/powerpc/kernel/stacktrace.c |  2 +-
>  arch/powerpc/perf/callchain.c    |  6 +++---
>  4 files changed, 30 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/processor.h
> b/arch/powerpc/include/asm/processor.h
> index 631802999d59..e96c9b8c2a60 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -374,9 +374,18 @@ static inline unsigned long __pack_fe01(unsigned
> int fpmode)
>  
>  #endif
>  
> -/* Check that a certain kernel stack pointer is valid in task_struct
> p */
> -int validate_sp(unsigned long sp, struct task_struct *p,
> -   unsigned long nbytes);
> +/*
> + * Check that a certain kernel stack pointer is a valid (minimum
> sized)
> + * stack frame in task_struct p.
> + */
> +int validate_sp(unsigned long sp, struct task_struct *p);
> +
> +/*
> + * validate the stack frame of a particular minimum size, used for
> when we are
> + * looking at a certain object in the stack beyond the minimum.
> + */
> +int validate_sp_size(unsigned long sp, struct task_struct *p,
> +    unsigned long nbytes);
>  
>  /*
>   * Prefetch macros.
> diff --git a/arch/powerpc/kernel/process.c
> b/arch/powerpc/kernel/process.c
> index 6cb3982a11ef..b5defea32e75 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -2128,9 +2128,12 @@ static inline int
> valid_emergency_stack(unsigned long sp, struct task_struct *p,
> return 0;
>  }
>  
> -
> -int validate_sp(unsigned long sp, struct task_struct *p,
> -  unsigned long nbytes)
> +/*
> + * validate the stack frame of a particular minimum size, used for
> when we are
> + * looking at a certain object in the stack beyond the minimum.
> + */
> +int validate_sp_size(unsigned long sp, struct task_struct *p,
> +    unsigned long nbytes)
>  {
> unsigned long stack_page = (unsigned long)task_stack_page(p);
>  
> @@ -2146,7 +2149,10 @@ int validate_sp(unsigned long sp, struct
> task_struct *p,
> return valid_emergency_stack(sp, p, nbytes);
>  }
>  
> -EXPORT_SYMBOL(validate_sp);
> +int validate_sp(unsigned long sp, struct task_struct *p)
> +{
> +   return validate_sp(sp, p, STACK_FRAME_OVERHEAD);

Hi Nick, I assume this is supposed to be validate_sp_size()?  Did you get
this to compile?

> +}
>  
>  static unsigned long ___get_wchan(struct task_struct *p)
>  {
> @@ -2154,13 +2160,12 @@ static unsigned long ___get_wchan(struct
> task_struct *p)
> int count = 0;
>  
> sp = p->thread.ksp;
> -   if (!validate_sp(sp, p, STACK_FRAME_OVERHEAD))
> +   if (!validate_sp(sp, p))
> return 0;
>  
> do {
> sp = READ_ONCE_NOCHECK(*(unsigned long *)sp);
> -   if (!validate_sp(sp, p, STACK_FRAME_OVERHEAD) ||
> -   task_is_running(p))
> +   if (!validate_sp(sp, p) || task_is_running(p))
> return 0;
> if (count > 0) {
> ip = READ_ONCE_NOCHECK(((unsigned long
> *)sp)[STACK_FRAME_LR_SAVE]);
> @@ -2214,7 +2219,7 @@ void __no_sanitize_address show_stack(struct
> task_struct *tsk,
> lr = 0;
> printk("%sCall Trace:\n", loglvl);
> do {
> -   if (!validate_sp(sp, tsk, STACK_FRAME_OVERHEAD))
> +   if (!validate_sp(sp, tsk))
> break;
>  
> stack = (unsigned long *) sp;
> @@ -2241,7 +2246,7 @@ void __no_sanitize_address show_stack(struct
> task_struct *tsk,
>  * could hold a pt_regs, if that does not fit then it
> can't
>  * have regs.
>  */
> -   if (validate_sp(sp, tsk, STACK_SWITCH_FRAME_SIZE)
> +   if (validate_sp_size(sp, tsk,
> STACK_SWITCH_FRAME_SIZE)
>     && stack[STACK_INT_FRAME_MARKER_LONGS] ==
> STACK_FRAME_REGS_MARKER) {
> struct pt_regs *regs = (struct pt_regs *)
> (sp + STACK_INT_FRAME_REGS);
> diff --git a/arch/powerpc/kernel/stacktrace.c
> b/arch/powerpc/kernel/stacktrace.c
> index 453ac317a6cf..1dbbf30f265e 100644
> --- a/arch/powerpc/kernel/stacktrace.c
> +++ b/arch/powerpc/kernel/stacktrace.c
> @@ -43,7 +43,7 @@ void __no_sanitize_address
> arch_stack_walk(stack_trace_consume_fn consume_entry,
> unsigned long *stack = (unsigned long *) sp;
> unsigned long newsp, ip;
>  
> -   if (!validate_sp(sp, task, STACK_FRAME_OVERHEAD))
> +   if 

Re: [RFC PATCH 4/6] powerpc/powernv: Convert pointers to physical addresses in OPAL call args

2022-11-06 Thread Russell Currey
On Sat, 2022-11-05 at 04:27 +1100, Andrew Donnellan wrote:
> A number of OPAL calls take addresses as arguments (e.g. buffers with
> strings to print, etc). These addresses need to be physical
> addresses, as
> OPAL runs in real mode.
> 
> Since the hardware ignores the top two bits of the address in real
> mode,
> passing addresses in the kernel's linear map works fine even if we
> don't
> wrap them in __pa().
> 
> With VMAP_STACK, however, we're going to have to use
> vmalloc_to_phys() to
> convert addresses from the stack into an address that OPAL can use.
> 
> Introduce a new macro, stack_pa(), that uses vmalloc_to_phys() for
> addresses in the vmalloc area, and __pa() for linear map addresses.
> Add it
> to all the existing callsites where we pass pointers to OPAL.
> 
> Signed-off-by: Andrew Donnellan 
> ---
>  arch/powerpc/kvm/book3s_hv_builtin.c  |  2 +-
>  arch/powerpc/platforms/powernv/eeh-powernv.c  | 20 ++-
>  arch/powerpc/platforms/powernv/ocxl.c |  3 +-
>  arch/powerpc/platforms/powernv/opal-core.c    |  4 +--
>  arch/powerpc/platforms/powernv/opal-dump.c    |  6 ++--
>  arch/powerpc/platforms/powernv/opal-elog.c    | 10 +++---
>  arch/powerpc/platforms/powernv/opal-fadump.c  | 12 +++
>  arch/powerpc/platforms/powernv/opal-flash.c   |  5 +--
>  arch/powerpc/platforms/powernv/opal-hmi.c |  3 +-
>  arch/powerpc/platforms/powernv/opal-irqchip.c |  4 +--
>  arch/powerpc/platforms/powernv/opal-lpc.c |  8 ++---
>  arch/powerpc/platforms/powernv/opal-nvram.c   |  4 +--
>  arch/powerpc/platforms/powernv/opal-power.c   |  4 +--
>  .../powerpc/platforms/powernv/opal-powercap.c |  2 +-
>  arch/powerpc/platforms/powernv/opal-prd.c |  6 ++--
>  arch/powerpc/platforms/powernv/opal-psr.c |  2 +-
>  arch/powerpc/platforms/powernv/opal-rtc.c |  2 +-
>  arch/powerpc/platforms/powernv/opal-secvar.c  |  9 +++--
>  arch/powerpc/platforms/powernv/opal-sensor.c  |  4 +--
>  .../powerpc/platforms/powernv/opal-sysparam.c |  4 +--
>  arch/powerpc/platforms/powernv/opal-xscom.c   |  2 +-
>  arch/powerpc/platforms/powernv/opal.c | 16 -
>  arch/powerpc/platforms/powernv/pci-ioda.c | 14 
>  arch/powerpc/platforms/powernv/pci.c  | 25 +++---
>  arch/powerpc/platforms/powernv/setup.c    |  2 +-
>  arch/powerpc/platforms/powernv/smp.c  |  2 +-
>  arch/powerpc/sysdev/xics/icp-opal.c   |  2 +-
>  arch/powerpc/sysdev/xics/ics-opal.c   |  8 ++---
>  arch/powerpc/sysdev/xive/native.c | 33 -
> --
>  drivers/char/ipmi/ipmi_powernv.c  |  6 ++--
>  drivers/char/powernv-op-panel.c   |  2 +-
>  drivers/i2c/busses/i2c-opal.c |  2 +-
>  drivers/leds/leds-powernv.c   |  6 ++--
>  drivers/mtd/devices/powernv_flash.c   |  4 +--
>  drivers/rtc/rtc-opal.c    |  4 +--
>  35 files changed, 135 insertions(+), 107 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c
> b/arch/powerpc/kvm/book3s_hv_builtin.c
> index da85f046377a..dba041d659d2 100644
> --- a/arch/powerpc/kvm/book3s_hv_builtin.c
> +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
> @@ -414,7 +414,7 @@ static long kvmppc_read_one_intr(bool *again)
> xics_phys = local_paca->kvm_hstate.xics_phys;
> rc = 0;
> if (!xics_phys)
> -   rc = opal_int_get_xirr(&xirr, false);
> +   rc = opal_int_get_xirr(stack_pa(&xirr), false);
> else
> xirr = __raw_rm_readl(xics_phys + XICS_XIRR);
> if (rc < 0)
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c
> b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index a83cb679dd59..f069aa28f969 100644
> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> @@ -517,7 +517,7 @@ static void pnv_eeh_get_phb_diag(struct eeh_pe
> *pe)
> struct pnv_phb *phb = pe->phb->private_data;
> s64 rc;
>  
> -   rc = opal_pci_get_phb_diag_data2(phb->opal_id, pe->data,
> +   rc = opal_pci_get_phb_diag_data2(phb->opal_id, stack_pa(pe-
> >data),
>  phb->diag_data_size);
> if (rc != OPAL_SUCCESS)
> pr_warn("%s: Failure %lld getting PHB#%x diag-
> data\n",
> @@ -534,8 +534,8 @@ static int pnv_eeh_get_phb_state(struct eeh_pe
> *pe)
>  
> rc = opal_pci_eeh_freeze_status(phb->opal_id,
> pe->addr,
> -   &fstate,
> -   &pcierr,
> +   stack_pa(&fstate),
> +   stack_pa(&pcierr),
> NULL);
> if (rc != OPAL_SUCCESS) {
> pr_warn("%s: Failure %lld getting PHB#%x state\n",
> @@ -594,8 +594,8 @@ static int pnv_eeh_get_pe_state(struct eeh_pe
> *pe)
> } else {
> rc = 

Re: [PATCH v3 3/3] powerpc: mm: support page table check

2022-10-25 Thread Russell Currey
On Mon, 2022-10-24 at 11:35 +1100, Rohan McLure wrote:
> On creation and clearing of a page table mapping, instrument such
> calls
> by invoking page_table_check_pte_set and page_table_check_pte_clear
> respectively. These calls serve as a sanity check against illegal
> mappings.
> 
> Enable ARCH_SUPPORTS_PAGE_TABLE_CHECK for all ppc64, and 32-bit
> platforms implementing Book3S.
> 
> Change pud_pfn to be a runtime bug rather than a build bug as it is
> consumed by page_table_check_pud_{clear,set} which are not called.
> 
> See also:
> 
> riscv support in commit 3fee229a8eb9 ("riscv/mm: enable
> ARCH_SUPPORTS_PAGE_TABLE_CHECK")
> arm64 in commit 42b2547137f5 ("arm64/mm: enable
> ARCH_SUPPORTS_PAGE_TABLE_CHECK")
> x86_64 in commit d283d422c6c4 ("x86: mm: add x86_64 support for page
> table
> check")
> 
> Signed-off-by: Rohan McLure 

Reviewed-by: Russell Currey 


Re: [PATCH] powerpc: replace ternary operator with min()

2022-10-23 Thread Russell Currey
On Sun, 2022-10-23 at 20:44 +0800, KaiLong Wang wrote:
> Fix the following coccicheck warning:
> 
> arch/powerpc/xmon/xmon.c:2987: WARNING opportunity for min()
> arch/powerpc/xmon/xmon.c:2583: WARNING opportunity for min()
> 
> Signed-off-by: KaiLong Wang 

Hello,

This fails to compile on some platforms/compilers since n is a long and
16 is an int, expanding to:

r = __builtin_choose_expr(
((!!(sizeof((typeof(n) *)1 == (typeof(16) *)1))) &&
 ((sizeof(int) ==
   sizeof(*(8 ? ((void *)((long)(n)*0l)) : (int *)8))) &&
  (sizeof(int) ==
   sizeof(*(8 ? ((void *)((long)(16) * 0l)) :
(int *)8),
((n) < (16) ? (n) : (16)), ({
typeof(n) __UNIQUE_ID___x0 = (n);
typeof(16) __UNIQUE_ID___y1 = (16);
((__UNIQUE_ID___x0) < (__UNIQUE_ID___y1) ?
 (__UNIQUE_ID___x0) :
 (__UNIQUE_ID___y1));
}));

Here's the full build failure as found by snowpatch:
https://github.com/ruscur/linux-ci/actions/runs/3308880562/jobs/5461579048#step:4:89

You should use min_t(long, n, 16) instead.

- Russell

> ---
>  arch/powerpc/xmon/xmon.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
> index f51c882bf902..a7751cd2cc9d 100644
> --- a/arch/powerpc/xmon/xmon.c
> +++ b/arch/powerpc/xmon/xmon.c
> @@ -2580,7 +2580,7 @@ static void xmon_rawdump (unsigned long adrs,
> long ndump)
> unsigned char temp[16];
>  
> for (n = ndump; n > 0;) {
> -   r = n < 16? n: 16;
> +   r = min(n, 16);
> nr = mread(adrs, temp, r);
> adrs += nr;
> for (m = 0; m < r; ++m) {
> @@ -2984,7 +2984,7 @@ prdump(unsigned long adrs, long ndump)
> for (n = ndump; n > 0;) {
> printf(REG, adrs);
> putchar(' ');
> -   r = n < 16? n: 16;
> +   r = min(n, 16);
> nr = mread(adrs, temp, r);
> adrs += nr;
> for (m = 0; m < r; ++m) {



Re: [PATCH v8 4/6] powerpc/tlb: Add local flush for page given mm_struct and psize

2022-10-23 Thread Russell Currey
On Fri, 2022-10-21 at 16:22 +1100, Benjamin Gray wrote:
> Adds a local TLB flush operation that works given an mm_struct, VA to
> flush, and page size representation.
> 
> This removes the need to create a vm_area_struct, which the temporary
> patching mm work does not need.
> 
> Signed-off-by: Benjamin Gray 
> ---
>  arch/powerpc/include/asm/book3s/32/tlbflush.h  | 9 +
>  arch/powerpc/include/asm/book3s/64/tlbflush-hash.h | 5 +
>  arch/powerpc/include/asm/book3s/64/tlbflush.h  | 8 
>  arch/powerpc/include/asm/nohash/tlbflush.h | 1 +
>  4 files changed, 23 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/book3s/32/tlbflush.h
> b/arch/powerpc/include/asm/book3s/32/tlbflush.h
> index ba1743c52b56..e5a688cebf69 100644
> --- a/arch/powerpc/include/asm/book3s/32/tlbflush.h
> +++ b/arch/powerpc/include/asm/book3s/32/tlbflush.h
> @@ -2,6 +2,8 @@
>  #ifndef _ASM_POWERPC_BOOK3S_32_TLBFLUSH_H
>  #define _ASM_POWERPC_BOOK3S_32_TLBFLUSH_H
>  
> +#include 
> +
>  #define MMU_NO_CONTEXT  (0)
>  /*
>   * TLB flushing for "classic" hash-MMU 32-bit CPUs, 6xx, 7xx, 7xxx
> @@ -74,6 +76,13 @@ static inline void local_flush_tlb_page(struct
> vm_area_struct *vma,
>  {
> flush_tlb_page(vma, vmaddr);
>  }
> +
> +static inline void local_flush_tlb_page_psize(struct mm_struct *mm,
> unsigned long vmaddr, int psize)
> +{
> +   BUILD_BUG_ON(psize != MMU_PAGE_4K);
> +   flush_range(mm, vmaddr, vmaddr + PAGE_SIZE);
> +}
> +
>  static inline void local_flush_tlb_mm(struct mm_struct *mm)
>  {
> flush_tlb_mm(mm);
> diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
> b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
> index fab8332fe1ad..8fd9dc49b2a1 100644
> --- a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
> @@ -94,6 +94,11 @@ static inline void
> hash__local_flush_tlb_page(struct vm_area_struct *vma,
>  {
>  }
>  
> +static inline void hash__local_flush_tlb_page_psize(struct mm_struct
> *mm,
> +   unsigned long
> vmaddr, int psize)
> +{
> +}
> +
>  static inline void hash__flush_tlb_page(struct vm_area_struct *vma,
>     unsigned long vmaddr)
>  {
> diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h
> b/arch/powerpc/include/asm/book3s/64/tlbflush.h
> index 67655cd60545..2d839dd5c08c 100644
> --- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
> +++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
> @@ -92,6 +92,14 @@ static inline void local_flush_tlb_page(struct
> vm_area_struct *vma,
> return hash__local_flush_tlb_page(vma, vmaddr);
>  }
>  
> +static inline void local_flush_tlb_page_psize(struct mm_struct *mm,
> + unsigned long vmaddr,
> int psize)
> +{
> +   if (radix_enabled())
> +   return radix__local_flush_tlb_page_psize(mm, vmaddr,
> psize);
> +   return hash__local_flush_tlb_page_psize(mm, vmaddr, psize);
> +}
> +
>  static inline void local_flush_all_mm(struct mm_struct *mm)
>  {
> if (radix_enabled())
> diff --git a/arch/powerpc/include/asm/nohash/tlbflush.h
> b/arch/powerpc/include/asm/nohash/tlbflush.h
> index bdaf34ad41ea..59bce0ebdcf4 100644
> --- a/arch/powerpc/include/asm/nohash/tlbflush.h
> +++ b/arch/powerpc/include/asm/nohash/tlbflush.h
> @@ -58,6 +58,7 @@ static inline void flush_tlb_kernel_range(unsigned
> long start, unsigned long end
>  extern void flush_tlb_kernel_range(unsigned long start, unsigned
> long end);
>  extern void local_flush_tlb_mm(struct mm_struct *mm);
>  extern void local_flush_tlb_page(struct vm_area_struct *vma,
> unsigned long vmaddr);
> +extern void local_flush_tlb_page_psize(struct mm_struct *mm,
> unsigned long vmaddr, int psize);

This misses a definition for PPC_8xx which leads to a build failure as
found by snowpatch here:
https://github.com/ruscur/linux-ci/actions/runs/3295033018/jobs/5433162658#step:4:116

>  
>  extern void __local_flush_tlb_page(struct mm_struct *mm, unsigned
> long vmaddr,
>    int tsize, int ind);



[PATCH] powerpc/8xx: Fix warning in hw_breakpoint_handler()

2022-10-23 Thread Russell Currey
In hw_breakpoint_handler(), ea is set by wp_get_instr_detail() except
for 8xx, leading the variable to be passed uninitialised to
wp_check_constraints().  This is safe as wp_check_constraints() returns
early without using ea, so just set it to make the compiler happy.

Signed-off-by: Russell Currey 
---
 arch/powerpc/kernel/hw_breakpoint.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index 8db1a15d7acb..e1b4e70c8fd0 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -646,7 +646,7 @@ int hw_breakpoint_handler(struct die_args *args)
ppc_inst_t instr = ppc_inst(0);
int type = 0;
int size = 0;
-   unsigned long ea;
+   unsigned long ea = 0;
 
/* Disable breakpoints during exception handling */
hw_breakpoint_disable();
-- 
2.37.3



Re: [PATCH v8 5/6] powerpc/code-patching: Use temporary mm for Radix MMU

2022-10-23 Thread Russell Currey
On Fri, 2022-10-21 at 16:22 +1100, Benjamin Gray wrote:
> From: "Christopher M. Riedl" 
> 
> x86 supports the notion of a temporary mm which restricts access to
> temporary PTEs to a single CPU. A temporary mm is useful for
> situations
> where a CPU needs to perform sensitive operations (such as patching a
> STRICT_KERNEL_RWX kernel) requiring temporary mappings without
> exposing
> said mappings to other CPUs. Another benefit is that other CPU TLBs
> do
> not need to be flushed when the temporary mm is torn down.
> 
> Mappings in the temporary mm can be set in the userspace portion of
> the
> address-space.
> 
> Interrupts must be disabled while the temporary mm is in use. HW
> breakpoints, which may have been set by userspace as watchpoints on
> addresses now within the temporary mm, are saved and disabled when
> loading the temporary mm. The HW breakpoints are restored when
> unloading
> the temporary mm. All HW breakpoints are indiscriminately disabled
> while
> the temporary mm is in use - this may include breakpoints set by
> perf.
> 
> Use the `poking_init` init hook to prepare a temporary mm and
> patching
> address. Initialize the temporary mm by copying the init mm. Choose a
> randomized patching address inside the temporary mm userspace address
> space. The patching address is randomized between PAGE_SIZE and
> DEFAULT_MAP_WINDOW-PAGE_SIZE.
> 
> Bits of entropy with 64K page size on BOOK3S_64:
> 
> bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)
> 
> PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
> bits of entropy = log2(128TB / 64K)
> bits of entropy = 31
> 
> The upper limit is DEFAULT_MAP_WINDOW due to how the Book3s64 Hash
> MMU
> operates - by default the space above DEFAULT_MAP_WINDOW is not
> available. Currently the Hash MMU does not use a temporary mm so
> technically this upper limit isn't necessary; however, a larger
> randomization range does not further "harden" this overall approach
> and
> future work may introduce patching with a temporary mm on Hash as
> well.
> 
> Randomization occurs only once during initialization for each CPU as
> it
> comes online.
> 
> The patching page is mapped with PAGE_KERNEL to set EAA[0] for the
> PTE
> which ignores the AMR (so no need to unlock/lock KUAP) according to
> PowerISA v3.0b Figure 35 on Radix.
> 
> Based on x86 implementation:
> 
> commit 4fc19708b165
> ("x86/alternatives: Initialize temporary mm for patching")
> 
> and:
> 
> commit b3fd8e83ada0
> ("x86/alternatives: Use temporary mm for text poking")
> 
> ---

Is the section following the --- your addendum to Chris' patch?  That
cuts it off from git, including your signoff.  It'd be better to have
it together as one commit message and note the bits you contributed
below the --- after your signoff.

Commits where you're modifying someone else's previous work should
include their signoff above yours, as well.

> Synchronisation is done according to Book 3 Chapter 13

might want to mention the ISA version alongside this, since chapter
numbering can change

> "Synchronization
> Requirements for Context Alterations". Switching the mm is a change
> to
> the PID, which requires a context synchronising instruction before
> and
> after the change, and a hwsync between the last instruction that
> performs address translation for an associated storage access.
> 
> Instruction fetch is an associated storage access, but the
> instruction
> address mappings are not being changed, so it should not matter which
> context they use. We must still perform a hwsync to guard arbitrary
> prior code that may have access a userspace address.
> 
> TLB invalidation is local and VA specific. Local because only this
> core
> used the patching mm, and VA specific because we only care that the
> writable mapping is purged. Leaving the other mappings intact is more
> efficient, especially when performing many code patches in a row
> (e.g.,
> as ftrace would).
> 
> Signed-off-by: Benjamin Gray 
> ---
>  arch/powerpc/lib/code-patching.c | 226
> ++-
>  1 file changed, 221 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/lib/code-patching.c
> b/arch/powerpc/lib/code-patching.c
> index 9b9eba574d7e..eabdd74a26c0 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -4,12 +4,17 @@
>   */
>  
>  #include 
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  
> +#include 
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -42,11 +47,59 @@ int raw_patch_instruction(u32 *addr, ppc_inst_t
> instr)
>  }
>  
>  #ifdef CONFIG_STRICT_KERNEL_RWX
> +
>  static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
> +static DEFINE_PER_CPU(struct mm_struct *, cpu_patching_mm);
> +static DEFINE_PER_CPU(unsigned long, cpu_patching_addr);
> +static DEFINE_PER_CPU(pte_t *, cpu_patching_pte);
>  
>  static int map_patch_area(void *addr, unsigned long 

Re: [PATCH v8 4/6] powerpc/tlb: Add local flush for page given mm_struct and psize

2022-10-23 Thread Russell Currey
On Fri, 2022-10-21 at 16:22 +1100, Benjamin Gray wrote:
> Adds a local TLB flush operation that works given an mm_struct, VA to
> flush, and page size representation.
> 
> This removes the need to create a vm_area_struct, which the temporary
> patching mm work does not need.
> 
> Signed-off-by: Benjamin Gray 
> ---
>  arch/powerpc/include/asm/book3s/32/tlbflush.h  | 9 +
>  arch/powerpc/include/asm/book3s/64/tlbflush-hash.h | 5 +
>  arch/powerpc/include/asm/book3s/64/tlbflush.h  | 8 
>  arch/powerpc/include/asm/nohash/tlbflush.h | 1 +
>  4 files changed, 23 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/book3s/32/tlbflush.h
> b/arch/powerpc/include/asm/book3s/32/tlbflush.h
> index ba1743c52b56..e5a688cebf69 100644
> --- a/arch/powerpc/include/asm/book3s/32/tlbflush.h
> +++ b/arch/powerpc/include/asm/book3s/32/tlbflush.h
> @@ -2,6 +2,8 @@
>  #ifndef _ASM_POWERPC_BOOK3S_32_TLBFLUSH_H
>  #define _ASM_POWERPC_BOOK3S_32_TLBFLUSH_H
>  
> +#include 
> +
>  #define MMU_NO_CONTEXT  (0)
>  /*
>   * TLB flushing for "classic" hash-MMU 32-bit CPUs, 6xx, 7xx, 7xxx
> @@ -74,6 +76,13 @@ static inline void local_flush_tlb_page(struct
> vm_area_struct *vma,
>  {
> flush_tlb_page(vma, vmaddr);
>  }
> +
> +static inline void local_flush_tlb_page_psize(struct mm_struct *mm,
> unsigned long vmaddr, int psize)
> +{
> +   BUILD_BUG_ON(psize != MMU_PAGE_4K);

Is there any utility in adding this for 32bit if the following patches
are only for Radix?

> +   flush_range(mm, vmaddr, vmaddr + PAGE_SIZE);
> +}
> +
>  static inline void local_flush_tlb_mm(struct mm_struct *mm)
>  {
> flush_tlb_mm(mm);
> diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
> b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
> index fab8332fe1ad..8fd9dc49b2a1 100644
> --- a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
> @@ -94,6 +94,11 @@ static inline void
> hash__local_flush_tlb_page(struct vm_area_struct *vma,
>  {
>  }
>  
> +static inline void hash__local_flush_tlb_page_psize(struct mm_struct
> *mm,
> +   unsigned long
> vmaddr, int psize)
> +{
> +}
> +
>  static inline void hash__flush_tlb_page(struct vm_area_struct *vma,
>     unsigned long vmaddr)
>  {
> diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h
> b/arch/powerpc/include/asm/book3s/64/tlbflush.h
> index 67655cd60545..2d839dd5c08c 100644
> --- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
> +++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
> @@ -92,6 +92,14 @@ static inline void local_flush_tlb_page(struct
> vm_area_struct *vma,
> return hash__local_flush_tlb_page(vma, vmaddr);
>  }
>  
> +static inline void local_flush_tlb_page_psize(struct mm_struct *mm,
> + unsigned long vmaddr,
> int psize)
> +{
> +   if (radix_enabled())
> +   return radix__local_flush_tlb_page_psize(mm, vmaddr,
> psize);
> +   return hash__local_flush_tlb_page_psize(mm, vmaddr, psize);
> +}
> +
>  static inline void local_flush_all_mm(struct mm_struct *mm)
>  {
> if (radix_enabled())
> diff --git a/arch/powerpc/include/asm/nohash/tlbflush.h
> b/arch/powerpc/include/asm/nohash/tlbflush.h
> index bdaf34ad41ea..59bce0ebdcf4 100644
> --- a/arch/powerpc/include/asm/nohash/tlbflush.h
> +++ b/arch/powerpc/include/asm/nohash/tlbflush.h
> @@ -58,6 +58,7 @@ static inline void flush_tlb_kernel_range(unsigned
> long start, unsigned long end
>  extern void flush_tlb_kernel_range(unsigned long start, unsigned
> long end);
>  extern void local_flush_tlb_mm(struct mm_struct *mm);
>  extern void local_flush_tlb_page(struct vm_area_struct *vma,
> unsigned long vmaddr);
> +extern void local_flush_tlb_page_psize(struct mm_struct *mm,
> unsigned long vmaddr, int psize);
>  
>  extern void __local_flush_tlb_page(struct mm_struct *mm, unsigned
> long vmaddr,
>    int tsize, int ind);



Re: [PATCH v8 3/6] powerpc/code-patching: Verify instruction patch succeeded

2022-10-23 Thread Russell Currey
On Fri, 2022-10-21 at 16:22 +1100, Benjamin Gray wrote:
> Verifies that if the instruction patching did not return an error
> then
> the value stored at the given address to patch is now equal to the
> instruction we patched it to.
> 
> Signed-off-by: Benjamin Gray 
> ---
>  arch/powerpc/lib/code-patching.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/powerpc/lib/code-patching.c
> b/arch/powerpc/lib/code-patching.c
> index 34fc7ac34d91..9b9eba574d7e 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -186,6 +186,8 @@ static int do_patch_instruction(u32 *addr,
> ppc_inst_t instr)
> err = __do_patch_instruction(addr, instr);
> local_irq_restore(flags);
>  
> +   WARN_ON(!err && !ppc_inst_equal(instr, ppc_inst_read(addr)));
> +

As a side note, I had a look at test-code-patching.c and it doesn't
look like we have a test for ppc_inst_equal() with prefixed
instructions.  We should fix that.

> return err;
>  }
>  #else /* !CONFIG_STRICT_KERNEL_RWX */



Re: [PATCH v8 2/6] powerpc/code-patching: Use WARN_ON and fix check in poking_init

2022-10-23 Thread Russell Currey
On Fri, 2022-10-21 at 16:22 +1100, Benjamin Gray wrote:
> From: "Christopher M. Riedl" 
> 
> The latest kernel docs list BUG_ON() as 'deprecated' and that they
> should be replaced with WARN_ON() (or pr_warn()) when possible. The
> BUG_ON() in poking_init() warrants a WARN_ON() rather than a
> pr_warn()
> since the error condition is deemed "unreachable".
> 
> Also take this opportunity to fix the failure check in the WARN_ON():
> cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, ...) returns a positive
> integer
> on success and a negative integer on failure.
> 
> Signed-off-by: Benjamin Gray 

Reviewed-by: Russell Currey 


Re: [PATCH v8 1/6] powerpc: Allow clearing and restoring registers independent of saved breakpoint state

2022-10-23 Thread Russell Currey
On Fri, 2022-10-21 at 16:22 +1100, Benjamin Gray wrote:
> From: Jordan Niethe 

Hi Ben,

> For the coming temporary mm used for instruction patching, the
> breakpoint registers need to be cleared to prevent them from
> accidentally being triggered. As soon as the patching is done, the
> breakpoints will be restored. The breakpoint state is stored in the
> per
> cpu variable current_brk[]. Add a pause_breakpoints() function which
> will
> clear the breakpoint registers without touching the state in
> current_bkr[]. Add a pair function unpause_breakpoints() which will
 
typo here ^

> move
> the state in current_brk[] back to the registers.
> 
> Signed-off-by: Jordan Niethe 
> Signed-off-by: Benjamin Gray 
> ---
>  arch/powerpc/include/asm/debug.h |  2 ++
>  arch/powerpc/kernel/process.c    | 36 +-
> --
>  2 files changed, 35 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/debug.h
> b/arch/powerpc/include/asm/debug.h
> index 86a14736c76c..83f2dc3785e8 100644
> --- a/arch/powerpc/include/asm/debug.h
> +++ b/arch/powerpc/include/asm/debug.h
> @@ -46,6 +46,8 @@ static inline int debugger_fault_handler(struct
> pt_regs *regs) { return 0; }
>  #endif
>  
>  void __set_breakpoint(int nr, struct arch_hw_breakpoint *brk);
> +void pause_breakpoints(void);
> +void unpause_breakpoints(void);

Nitpick, would (clear/suspend)/restore be clearer than pause/unpause?

>  bool ppc_breakpoint_available(void);
>  #ifdef CONFIG_PPC_ADV_DEBUG_REGS
>  extern void do_send_trap(struct pt_regs *regs, unsigned long
> address,
> diff --git a/arch/powerpc/kernel/process.c
> b/arch/powerpc/kernel/process.c
> index 67da147fe34d..7aee1b30e73c 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -685,6 +685,7 @@ DEFINE_INTERRUPT_HANDLER(do_break)
>  
>  static DEFINE_PER_CPU(struct arch_hw_breakpoint,
> current_brk[HBP_NUM_MAX]);
>  
> +

some bonus whitespace here

>  #ifdef CONFIG_PPC_ADV_DEBUG_REGS
>  /*
>   * Set the debug registers back to their default "safe" values.
> @@ -862,10 +863,8 @@ static inline int set_breakpoint_8xx(struct
> arch_hw_breakpoint *brk)
> return 0;
>  }
>  
> -void __set_breakpoint(int nr, struct arch_hw_breakpoint *brk)
> +static void ____set_breakpoint(int nr, struct arch_hw_breakpoint
> *brk)

Is there a way to refactor this?  The quad underscore is pretty cursed.

>  {
> -   memcpy(this_cpu_ptr(&current_brk[nr]), brk, sizeof(*brk));
> -
> if (dawr_enabled())
> // Power8 or later
> set_dawr(nr, brk);
> @@ -879,6 +878,12 @@ void __set_breakpoint(int nr, struct
> arch_hw_breakpoint *brk)
> WARN_ON_ONCE(1);
>  }
>  
> +void __set_breakpoint(int nr, struct arch_hw_breakpoint *brk)
> +{
> +   memcpy(this_cpu_ptr(&current_brk[nr]), brk, sizeof(*brk));
> +   ____set_breakpoint(nr, brk);
> +}
> +
>  /* Check if we have DAWR or DABR hardware */
>  bool ppc_breakpoint_available(void)
>  {
> @@ -891,6 +896,31 @@ bool ppc_breakpoint_available(void)
>  }
>  EXPORT_SYMBOL_GPL(ppc_breakpoint_available);
>  
> +/* Disable the breakpoint in hardware without touching current_brk[]
> */
> +void pause_breakpoints(void)
> +{
> +   struct arch_hw_breakpoint brk = {0};
> +   int i;
> +
> +   if (!ppc_breakpoint_available())
> +   return;
> +
> +   for (i = 0; i < nr_wp_slots(); i++)
> +   ____set_breakpoint(i, &brk);
> +}
> +
> +/* Renable the breakpoint in hardware from current_brk[] */
> +void unpause_breakpoints(void)
> +{
> +   int i;
> +
> +   if (!ppc_breakpoint_available())
> +   return;
> +
> +   for (i = 0; i < nr_wp_slots(); i++)
> +   ____set_breakpoint(i, this_cpu_ptr(&current_brk[i]));
> +}
> +
>  #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>  
>  static inline bool tm_enabled(struct task_struct *tsk)



Re: [PATCH v3 4/4] powerpc/64s: Enable KFENCE on book3s64

2022-09-26 Thread Russell Currey
On Mon, 2022-09-26 at 07:57 +, Nicholas Miehlbradt wrote:
> KFENCE support was added for ppc32 in commit 90cbac0e995d
> ("powerpc: Enable KFENCE for PPC32").
> Enable KFENCE on ppc64 architecture with hash and radix MMUs.
> It uses the same mechanism as debug pagealloc to
> protect/unprotect pages. All KFENCE kunit tests pass on both
> MMUs.
> 
> KFENCE memory is initially allocated using memblock but is
> later marked as SLAB allocated. This necessitates the change
> to __pud_free to ensure that the KFENCE pages are freed
> appropriately.
> 
> Based on previous work by Christophe Leroy and Jordan Niethe.
> 
> Signed-off-by: Nicholas Miehlbradt 

LGTM.  For the whole series:

Reviewed-by: Russell Currey 


Re: [PATCH v5] livepatch: Clear relocation targets on a module removal

2022-09-07 Thread Russell Currey
On Thu, 2022-09-01 at 08:42 -0400, Joe Lawrence wrote:
> On Thu, Sep 01, 2022 at 01:39:02PM +1000, Michael Ellerman wrote:
> > Joe Lawrence  writes:
> > > On Thu, Sep 01, 2022 at 08:30:44AM +1000, Michael Ellerman wrote:
> > > > Joe Lawrence  writes:
> > ...
> > > 
> > > Hi Michael,
> > > 
> > > While we're on the topic of klp-relocations and Power, I saw a
> > > similar
> > > access problem when writing (late) relocations into
> > > .data..ro_after_init.  I'm not entirely convinced this should be
> > > allowed
> > > (ie, is it really read-only after .init or ???), but it seems
> > > that other
> > > arches currently allow it ...
> > 
> > I guess that's because we didn't properly fix apply_relocate_add()
> > in
> > https://github.com/linuxppc/issues/issues/375 ?
> > 
> > If other arches allow it then we don't want to be the odd one out
> > :)
> > 
> > So I guess we need to implement it.
> > 
> 
> FWIW, I think it this particular relocation is pretty rare, we
> haven't
> seen it in real patches nor do we have a kpatch test that generates
> it.
> I only hit a crash as I was trying to write a more exhaustive test
> for
> the klp-convert implementation.

I'll revive my proper fix.  I stopped working on it since my previous
version was hitting endian bugs with some relocations & it didn't seem
necessary at the time.  Shouldn't take too much to get it going again.

> 
> > > = TEST: klp-convert data relocations (late module patching)
> > > =
> > > % modprobe test_klp_convert_data
> > > livepatch: enabling patch 'test_klp_convert_data'
> > > livepatch: 'test_klp_convert_data': starting patching transition
> > > livepatch: 'test_klp_convert_data': patching complete
> > > % modprobe test_klp_convert_mod
> > > ...
> > > module_64: Applying ADD relocate section 54 to 20
> > > module_64: RELOC at 8482d02a: 38-type as
> > > .klp.sym.test_klp_convert_mod.static_ro_after_init,0
> > > (0xc008016d0084) + 0
> > > BUG: Unable to handle kernel data access on write at
> > > 0xc008021d
> > > Faulting instruction address: 0xc0055f14
> > > Oops: Kernel access of bad area, sig: 11 [#1]
> > > LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> > > Modules linked in: test_klp_convert_mod(+)
> > > test_klp_convert_data(K) bonding rfkill tls pseries_rng drm fuse
> > > drm_panel_orientation_quirks xfs libcrc32c sd_mod t10_pi sg
> > > ibmvscsi ibmveth scsi_transport_srp vmx_crypto dm_mirror
> > > dm_region_hash dm_log dm_mod [last unloaded:
> > > test_klp_convert_mod]
> > > CPU: 0 PID: 17089 Comm: modprobe Kdump: loaded Tainted:
> > > G  K   5.19.0+ #1
> > > NIP:  c0055f14 LR: c021ef28 CTR: c0055f14
> > > REGS: c000387af5a0 TRAP: 0300   Tainted: G  K   
> > > (5.19.0+)
> > > MSR:  82009033   CR: 88228444 
> > > XER: 
> > > CFAR: c0055e04 DAR: c008021d DSISR: 4200
> > > IRQMASK: 0
> > > GPR00: c021ef28 c000387af840 c2a68a00
> > > c88b3000
> > > GPR04: c00802230084 0026 0036
> > > c008021e0480
> > > GPR08: 7c426214 c0055f14 c0055e08
> > > 0d80
> > > GPR12: c021d9b0 c2d9 c88b3000
> > > c008021f0810
> > > GPR16: c008021c0638 c88b3d80 
> > > c1181e38
> > > GPR20: c29dc088 c008021e0480 c008021f0870
> > > aaab
> > > GPR24: c88b3c40 c008021d c008021f
> > > 
> > > GPR28: c008021d  c008021c0638
> > > 0810
> > > NIP [c0055f14] apply_relocate_add+0x474/0x9e0
> > > LR [c021ef28] klp_apply_section_relocs+0x208/0x2d0
> > > Call Trace:
> > > [c000387af840] [c000387af920] 0xc000387af920
> > > (unreliable)
> > > [c000387af940] [c021ef28]
> > > klp_apply_section_relocs+0x208/0x2d0
> > > [c000387afa30] [c021f080]
> > > klp_init_object_loaded+0x90/0x1e0
> > > [c000387afac0] [c02200ac]
> > > klp_module_coming+0x3dc/0x5c0
> > > [c000387afb70] [c0231414] load_module+0xf64/0x13a0
> > > [c000387afc90] [c0231b8c]
> > > __do_sys_finit_module+0xdc/0x180
> > > [c000387afdb0] [c002f004]
> > > system_call_exception+0x164/0x340
> > > [c000387afe10] [c000be68]
> > > system_call_vectored_common+0xe8/0x278
> > > --- interrupt: 3000 at 0x7fffb6af4710
> > > NIP:  7fffb6af4710 LR:  CTR: 
> > > REGS: c000387afe80 TRAP: 3000   Tainted: G  K   
> > > (5.19.0+)
> > > MSR:  8000f033   CR:
> > > 48224244  XER: 
> > > IRQMASK: 0
> > > GPR00: 0161 7fffe06f5550 7fffb6bf7200
> > > 0005
> > > GPR04: 000105f36ca0  0005
> > > 
> > > GPR08:   
> > > 
> > > GPR12: 

Re: [RFC PATCH 1/4] powerpc/code-patching: add patch_memory() for writing RO text

2022-09-05 Thread Russell Currey
On Thu, 2022-09-01 at 07:01 +, Christophe Leroy wrote:
> 
> 
> Le 01/09/2022 à 07:58, Benjamin Gray a écrit :
> > From: Russell Currey 
> > 
> > powerpc allocates a text poke area of one page that is used by
> > patch_instruction() to modify read-only text when STRICT_KERNEL_RWX
> > is enabled.
> > 
> > patch_instruction() is only designed for instructions,
> > so writing data using the text poke area can only happen 4 bytes
> > at a time - each with a page map/unmap, pte flush and syncs.
> > 
> > This patch introduces patch_memory(), implementing the same
> > interface as memcpy(), similar to x86's text_poke() and s390's
> > s390_kernel_write().  patch_memory() only needs to map the text
> > poke area once, unless the write would cross a page boundary.
> > 
> > Signed-off-by: Russell Currey 
> > Signed-off-by: Benjamin Gray 
> > ---
> >   arch/powerpc/include/asm/code-patching.h |  1 +
> >   arch/powerpc/lib/code-patching.c | 65
> > 
> >   2 files changed, 66 insertions(+)
> > 
> > diff --git a/arch/powerpc/include/asm/code-patching.h
> > b/arch/powerpc/include/asm/code-patching.h
> > index 1c6316ec4b74..3de90748bce7 100644
> > --- a/arch/powerpc/include/asm/code-patching.h
> > +++ b/arch/powerpc/include/asm/code-patching.h
> > @@ -76,6 +76,7 @@ int create_cond_branch(ppc_inst_t *instr, const
> > u32 *addr,
> >   int patch_branch(u32 *addr, unsigned long target, int flags);
> >   int patch_instruction(u32 *addr, ppc_inst_t instr);
> >   int raw_patch_instruction(u32 *addr, ppc_inst_t instr);
> > +void *patch_memory(void *dest, const void *src, size_t size);
> > 
> >   static inline unsigned long patch_site_addr(s32 *site)
> >   {
> > diff --git a/arch/powerpc/lib/code-patching.c
> > b/arch/powerpc/lib/code-patching.c
> > index 6edf0697a526..0cca39af44cb 100644
> > --- a/arch/powerpc/lib/code-patching.c
> > +++ b/arch/powerpc/lib/code-patching.c
> > @@ -14,6 +14,7 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> > 
> >   static int __patch_instruction(u32 *exec_addr, ppc_inst_t instr,
> > u32 *patch_addr)
> >   {
> > @@ -183,6 +184,65 @@ static int do_patch_instruction(u32 *addr,
> > ppc_inst_t instr)
> > 
> >  return err;
> >   }
> > +
> > +static int do_patch_memory(void *dest, const void *src, size_t
> > size)
> > +{
> > +   int err;
> > +   unsigned long text_poke_addr, patch_addr;
> > +
> > +   text_poke_addr = (unsigned
> > long)__this_cpu_read(text_poke_area)->addr;
> > +
> > +   err = map_patch_area(dest, text_poke_addr);
> 
> This is not in line with the optimisation done by 
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220815114840.1468656-1-...@ellerman.id.au/

This patch hasn't changed since last year, thanks for the pointer.

> 
> > +   if (err)
> > +   return err;
> > +
> > +   patch_addr = text_poke_addr + offset_in_page(dest);
> > +   copy_to_kernel_nofault((u8 *)patch_addr, src, size);
> 
> copy_to_kernel_nofault() has a performance cost.
> 
> > +
> > +   flush_icache_range(patch_addr, size);
> 
> Is that needed ? We are patching data, not text.

It's necessary if it gets used to patch text, which it might.  Maybe we
should add a variable and only flush if the caller thinks it's needed.

The comment below should be updated for that too.

> > +   unmap_patch_area(text_poke_addr);
> > +
> > +   return 0;
> > +}
> > +
> > +/**
> > + * patch_memory - write data using the text poke area
> > + *
> > + * @dest:  destination address
> > + * @src:   source address
> > + * @size:  size in bytes
> > + *
> > + * like memcpy(), but using the text poke area. No atomicity
> > guarantees.
> > + * Do not use for instructions, use patch_instruction() instead.
> > + * Handles crossing page boundaries, though you shouldn't need to.
> > + *
> > + * Return value:
> > + * @dest
> > + **/
> > +void *patch_memory(void *dest, const void *src, size_t size)
> > +{
> > +   int err;
> > +   unsigned long flags;
> > +   size_t written, write_size;
> > +
> > +   // If the poke area isn't set up, it's early boot and we
> > can just memcpy.
> > +   if (!this_cpu_read(text_poke_area))
> > +   return memcpy(dest, src, size);
> > +
> > +   for (written = 0; written < size; written 

[PATCH] powerpc/pasemi: Use strscpy instead of strlcpy

2022-08-27 Thread Russell Currey
find_i2c_driver() contained the last usage of strlcpy() in arch/powerpc.
The return value was used to check if strlen(src) >= n, for which
strscpy() returns -E2BIG.

Signed-off-by: Russell Currey 
---
 arch/powerpc/platforms/pasemi/misc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pasemi/misc.c 
b/arch/powerpc/platforms/pasemi/misc.c
index f859ada29074..9e9a7e46288a 100644
--- a/arch/powerpc/platforms/pasemi/misc.c
+++ b/arch/powerpc/platforms/pasemi/misc.c
@@ -36,8 +36,7 @@ static int __init find_i2c_driver(struct device_node *node,
for (i = 0; i < ARRAY_SIZE(i2c_devices); i++) {
if (!of_device_is_compatible(node, i2c_devices[i].of_device))
continue;
-   if (strlcpy(info->type, i2c_devices[i].i2c_type,
-   I2C_NAME_SIZE) >= I2C_NAME_SIZE)
+   if (strscpy(info->type, i2c_devices[i].i2c_type, I2C_NAME_SIZE) < 0)
return -ENOMEM;
return 0;
}
-- 
2.37.2



[PATCH v4 2/2] selftests/powerpc: Add a test for execute-only memory

2022-08-16 Thread Russell Currey
From: Nicholas Miehlbradt 

This selftest is designed to cover execute-only protections
on the Radix MMU but will also work with Hash.

The tests are based on those found in pkey_exec_test with modifications
to use the generic mprotect() instead of the pkey variants.

Signed-off-by: Nicholas Miehlbradt 
Signed-off-by: Russell Currey 
---
v4: new

 tools/testing/selftests/powerpc/mm/Makefile   |   3 +-
 .../testing/selftests/powerpc/mm/exec_prot.c  | 231 ++
 2 files changed, 233 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/mm/exec_prot.c

diff --git a/tools/testing/selftests/powerpc/mm/Makefile 
b/tools/testing/selftests/powerpc/mm/Makefile
index 27dc09d0bfee..19dd0b2ea397 100644
--- a/tools/testing/selftests/powerpc/mm/Makefile
+++ b/tools/testing/selftests/powerpc/mm/Makefile
@@ -3,7 +3,7 @@ noarg:
$(MAKE) -C ../
 
 TEST_GEN_PROGS := hugetlb_vs_thp_test subpage_prot prot_sao segv_errors 
wild_bctr \
- large_vm_fork_separation bad_accesses pkey_exec_prot \
+ large_vm_fork_separation bad_accesses exec_prot pkey_exec_prot \
  pkey_siginfo stack_expansion_signal stack_expansion_ldst \
  large_vm_gpr_corruption
 TEST_PROGS := stress_code_patching.sh
@@ -22,6 +22,7 @@ $(OUTPUT)/wild_bctr: CFLAGS += -m64
 $(OUTPUT)/large_vm_fork_separation: CFLAGS += -m64
 $(OUTPUT)/large_vm_gpr_corruption: CFLAGS += -m64
 $(OUTPUT)/bad_accesses: CFLAGS += -m64
+$(OUTPUT)/exec_prot: CFLAGS += -m64
 $(OUTPUT)/pkey_exec_prot: CFLAGS += -m64
 $(OUTPUT)/pkey_siginfo: CFLAGS += -m64
 
diff --git a/tools/testing/selftests/powerpc/mm/exec_prot.c 
b/tools/testing/selftests/powerpc/mm/exec_prot.c
new file mode 100644
index ..db75b2225de1
--- /dev/null
+++ b/tools/testing/selftests/powerpc/mm/exec_prot.c
@@ -0,0 +1,231 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright 2022, Nicholas Miehlbradt, IBM Corporation
+ * based on pkey_exec_prot.c
+ *
+ * Test if applying execute protection on pages works as expected.
+ */
+
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include "pkeys.h"
+
+
+#define PPC_INST_NOP   0x60000000
+#define PPC_INST_TRAP  0x7fe00008
+#define PPC_INST_BLR   0x4e800020
+
+static volatile sig_atomic_t fault_code;
+static volatile sig_atomic_t remaining_faults;
+static volatile unsigned int *fault_addr;
+static unsigned long pgsize, numinsns;
+static unsigned int *insns;
+static bool pkeys_supported;
+
+static bool is_fault_expected(int fault_code)
+{
+   if (fault_code == SEGV_ACCERR)
+   return true;
+
+   /* Assume any pkey error is fine since pkey_exec_prot test covers them */
+   if (fault_code == SEGV_PKUERR && pkeys_supported)
+   return true;
+
+   return false;
+}
+
+static void trap_handler(int signum, siginfo_t *sinfo, void *ctx)
+{
+   /* Check if this fault originated from the expected address */
+   if (sinfo->si_addr != (void *)fault_addr)
+   sigsafe_err("got a fault for an unexpected address\n");
+
+   _exit(1);
+}
+
+static void segv_handler(int signum, siginfo_t *sinfo, void *ctx)
+{
+   fault_code = sinfo->si_code;
+
+   /* Check if this fault originated from the expected address */
+   if (sinfo->si_addr != (void *)fault_addr) {
+   sigsafe_err("got a fault for an unexpected address\n");
+   _exit(1);
+   }
+
+   /* Check if too many faults have occurred for a single test case */
+   if (!remaining_faults) {
+   sigsafe_err("got too many faults for the same address\n");
+   _exit(1);
+   }
+
+
+   /* Restore permissions in order to continue */
+   if (is_fault_expected(fault_code)) {
+   if (mprotect(insns, pgsize, PROT_READ | PROT_WRITE | PROT_EXEC)) {
+   sigsafe_err("failed to set access permissions\n");
+   _exit(1);
+   }
+   } else {
+   sigsafe_err("got a fault with an unexpected code\n");
+   _exit(1);
+   }
+
+   remaining_faults--;
+}
+
+static int check_exec_fault(int rights)
+{
+   /*
+* Jump to the executable region.
+*
+* The first iteration also checks if the overwrite of the
+* first instruction word from a trap to a no-op succeeded.
+*/
+   fault_code = -1;
+   remaining_faults = 0;
+   if (!(rights & PROT_EXEC))
+   remaining_faults = 1;
+
+   FAIL_IF(mprotect(insns, pgsize, rights) != 0);
+   asm volatile("mtctr %0; bctrl" : : "r"(insns));
+
+   FAIL_IF(remaining_faults != 0);
+   if (!(rights & PROT_EXEC))
+   FAIL_IF(!is_fault_expected(fault_code));
+
+   return 0;
+}
+
+static int test(void)
+{
+   struct sigaction segv_act

[PATCH v4 1/2] powerpc/mm: Support execute-only memory on the Radix MMU

2022-08-16 Thread Russell Currey
Add support for execute-only memory (XOM) for the Radix MMU by using an
execute-only mapping, as opposed to the RX mapping used by powerpc's
other MMUs.

The Hash MMU already supports XOM through the execute-only pkey,
which is a separate mechanism shared with x86.  A PROT_EXEC-only mapping
will map to RX, and then the pkey will be applied on top of it.

mmap() and mprotect() consumers in userspace should observe the same
behaviour on Hash and Radix despite the differences in implementation.

Replacing the vma_is_accessible() check in access_error() with a read
check should be functionally equivalent for non-Radix MMUs, since it
follows write and execute checks.  For Radix, the change enables
detecting faults on execute-only mappings where vma_is_accessible() would
return true.

Signed-off-by: Russell Currey 
---
v4: Reword commit message, add changes suggested by Christophe and Aneesh

 arch/powerpc/include/asm/book3s/64/pgtable.h |  2 ++
 arch/powerpc/mm/book3s64/pgtable.c   | 11 +--
 arch/powerpc/mm/fault.c  |  6 +-
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 392ff48f77df..486902aff040 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -151,6 +151,8 @@
 #define PAGE_COPY_X__pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_EXEC)
 #define PAGE_READONLY  __pgprot(_PAGE_BASE | _PAGE_READ)
 #define PAGE_READONLY_X__pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_EXEC)
+/* Radix only, Hash uses PAGE_READONLY_X + execute-only pkey instead */
+#define PAGE_EXECONLY  __pgprot(_PAGE_BASE | _PAGE_EXEC)
 
 /* Permission masks used for kernel mappings */
 #define PAGE_KERNEL__pgprot(_PAGE_BASE | _PAGE_KERNEL_RW)
diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
b/arch/powerpc/mm/book3s64/pgtable.c
index 7b9966402b25..f6151a589298 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -553,8 +553,15 @@ EXPORT_SYMBOL_GPL(memremap_compat_align);
 
 pgprot_t vm_get_page_prot(unsigned long vm_flags)
 {
-   unsigned long prot = pgprot_val(protection_map[vm_flags &
-   (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]);
+   unsigned long prot;
+
+   /* Radix supports execute-only, but protection_map maps X -> RX */
+   if (radix_enabled() && ((vm_flags & VM_ACCESS_FLAGS) == VM_EXEC)) {
+   prot = pgprot_val(PAGE_EXECONLY);
+   } else {
+   prot = pgprot_val(protection_map[vm_flags &
+(VM_ACCESS_FLAGS | 
VM_SHARED)]);
+   }
 
if (vm_flags & VM_SAO)
prot |= _PAGE_SAO;
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 014005428687..1566804e4b3d 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -270,7 +270,11 @@ static bool access_error(bool is_write, bool is_exec, 
struct vm_area_struct *vma
return false;
}
 
-   if (unlikely(!vma_is_accessible(vma)))
+   /*
+* Check for a read fault.  This could be caused by a read on an
+* inaccessible page (i.e. PROT_NONE), or a Radix MMU execute-only page.
+*/
+   if (unlikely(!(vma->vm_flags & VM_READ)))
return true;
/*
 * We should ideally do the vma pkey access check here. But in the
-- 
2.37.2



[PATCH] selftests/powerpc: Add missing PMU selftests to .gitignores

2022-08-12 Thread Russell Currey
Some recently added selftests don't have their binaries in .gitignores,
so add them.

I also alphabetically sorted sampling_tests/.gitignore while I was in
there.

Signed-off-by: Russell Currey 
---
 .../powerpc/pmu/event_code_tests/.gitignore   | 20 +++
 .../powerpc/pmu/sampling_tests/.gitignore | 18 +
 2 files changed, 34 insertions(+), 4 deletions(-)
 create mode 100644 
tools/testing/selftests/powerpc/pmu/event_code_tests/.gitignore

diff --git a/tools/testing/selftests/powerpc/pmu/event_code_tests/.gitignore 
b/tools/testing/selftests/powerpc/pmu/event_code_tests/.gitignore
new file mode 100644
index ..5710683da525
--- /dev/null
+++ b/tools/testing/selftests/powerpc/pmu/event_code_tests/.gitignore
@@ -0,0 +1,20 @@
+blacklisted_events_test
+event_alternatives_tests_p10
+event_alternatives_tests_p9
+generic_events_valid_test
+group_constraint_cache_test
+group_constraint_l2l3_sel_test
+group_constraint_mmcra_sample_test
+group_constraint_pmc56_test
+group_constraint_pmc_count_test
+group_constraint_radix_scope_qual_test
+group_constraint_repeat_test
+group_constraint_thresh_cmp_test
+group_constraint_thresh_ctl_test
+group_constraint_thresh_sel_test
+group_constraint_unit_test
+group_pmc56_exclude_constraints_test
+hw_cache_event_type_test
+invalid_event_code_test
+reserved_bits_mmcra_sample_elig_mode_test
+reserved_bits_mmcra_thresh_ctl_test
diff --git a/tools/testing/selftests/powerpc/pmu/sampling_tests/.gitignore 
b/tools/testing/selftests/powerpc/pmu/sampling_tests/.gitignore
index 0fce5a694684..f93b4c7c3a8a 100644
--- a/tools/testing/selftests/powerpc/pmu/sampling_tests/.gitignore
+++ b/tools/testing/selftests/powerpc/pmu/sampling_tests/.gitignore
@@ -1,11 +1,21 @@
-mmcr0_exceptionbits_test
+bhrb_filter_map_test
+bhrb_no_crash_wo_pmu_test
+intr_regs_no_crash_wo_pmu_test
 mmcr0_cc56run_test
-mmcr0_pmccext_test
-mmcr0_pmcjce_test
+mmcr0_exceptionbits_test
 mmcr0_fc56_pmc1ce_test
 mmcr0_fc56_pmc56_test
+mmcr0_pmccext_test
+mmcr0_pmcjce_test
 mmcr1_comb_test
-mmcr2_l2l3_test
+mmcr1_sel_unit_cache_test
 mmcr2_fcs_fch_test
+mmcr2_l2l3_test
 mmcr3_src_test
+mmcra_bhrb_any_test
+mmcra_bhrb_cond_test
+mmcra_bhrb_disable_no_branch_test
+mmcra_bhrb_disable_test
+mmcra_bhrb_ind_call_test
+mmcra_thresh_cmp_test
 mmcra_thresh_marked_sample_test
-- 
2.37.1



[PATCH v2] powerpc/kexec: Fix build failure from uninitialised variable

2022-08-09 Thread Russell Currey
Building with clang 14 fails because ret is uninitialised and can be
returned if both prop and fdtprop are NULL.  Drop the ret variable and
return an error in that failure case.

Fixes: b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window")
Suggested-by: Christophe Leroy 
Signed-off-by: Russell Currey 
---
v2: adopt Christophe's suggestion, which is better

 arch/powerpc/kexec/file_load_64.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kexec/file_load_64.c 
b/arch/powerpc/kexec/file_load_64.c
index 683462e4556b..349a781cea0b 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -1043,17 +1043,17 @@ static int copy_property(void *fdt, int node_offset, 
const struct device_node *d
 const char *propname)
 {
const void *prop, *fdtprop;
-   int len = 0, fdtlen = 0, ret;
+   int len = 0, fdtlen = 0;
 
	prop = of_get_property(dn, propname, &len);
	fdtprop = fdt_getprop(fdt, node_offset, propname, &fdtlen);
 
if (fdtprop && !prop)
-   ret = fdt_delprop(fdt, node_offset, propname);
+   return fdt_delprop(fdt, node_offset, propname);
else if (prop)
-   ret = fdt_setprop(fdt, node_offset, propname, prop, len);
-
-   return ret;
+   return fdt_setprop(fdt, node_offset, propname, prop, len);
+   else
+   return -FDT_ERR_NOTFOUND;
 }
 
 static int update_pci_dma_nodes(void *fdt, const char *dmapropname)
-- 
2.37.1



Re: [PATCH v3] powerpc/mm: Support execute-only memory on the Radix MMU

2022-08-09 Thread Russell Currey
On Tue, 2022-08-09 at 05:51 +, Christophe Leroy wrote:
> Le 09/08/2022 à 04:44, Russell Currey a écrit :
> > The Hash MMU already supports XOM (i.e. mmap with PROT_EXEC only)
> > through the execute-only pkey.  A PROT_EXEC-only mapping will
> > actually
> > map to RX, and then the pkey will be applied on top of it.
> 
> I don't think XOM is a commonly understood accronym. Maybe the first 
> time you use it it'd be better to say something like:
> 
> The Hash MMU already supports execute-only memory (XOM)

Yes, that's better.

> 
> When you say that Hash MMU supports it through the execute-only pkey,
> does it mean that it is taken into account automatically at mmap
> time, 
> or does the userspace app has to do something special to use the key
> ? 
> If it is the second, it means that depending on whether you are radix
> or 
> not, you must do something different ? Is that expected ?

It happens at mmap time, see do_mmap() in mm/mmap.c (and similar for
mprotect).  That calls into execute_only_pkey() which can return
something on x86 & Hash, and if it does that pkey gets used.  The
userspace process doesn't have to do anything, it's transparent.  So
there's no difference in program behaviour switching between Hash/Radix
- at least in the basic cases I've tested.

> 
> > 
> > Radix doesn't have pkeys, but it does have execute permissions
> > built-in
> > to the MMU, so all we have to do to support XOM is expose it.
> > 
> > Signed-off-by: Russell Currey 
> > ---
> > v3: Incorporate Aneesh's suggestions, leave protection_map
> > untouched
> > Basic test:
> > https://github.com/ruscur/junkcode/blob/main/mmap_test.c
> > 
> >   arch/powerpc/include/asm/book3s/64/pgtable.h |  2 ++
> >   arch/powerpc/mm/book3s64/pgtable.c   | 11 +--
> >   arch/powerpc/mm/fault.c  |  6 +-
> >   3 files changed, 16 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h
> > b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > index 392ff48f77df..486902aff040 100644
> > --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> > +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > @@ -151,6 +151,8 @@
> >   #define PAGE_COPY_X   __pgprot(_PAGE_BASE | _PAGE_READ |
> > _PAGE_EXEC)
> >   #define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_READ)
> >   #define PAGE_READONLY_X   __pgprot(_PAGE_BASE | _PAGE_READ |
> > _PAGE_EXEC)
> > +/* Radix only, Hash uses PAGE_READONLY_X + execute-only pkey
> > instead */
> > +#define PAGE_EXECONLY  __pgprot(_PAGE_BASE | _PAGE_EXEC)
> >   
> >   /* Permission masks used for kernel mappings */
> >   #define PAGE_KERNEL   __pgprot(_PAGE_BASE | _PAGE_KERNEL_RW)
> > diff --git a/arch/powerpc/mm/book3s64/pgtable.c
> > b/arch/powerpc/mm/book3s64/pgtable.c
> > index 7b9966402b25..62f63d344596 100644
> > --- a/arch/powerpc/mm/book3s64/pgtable.c
> > +++ b/arch/powerpc/mm/book3s64/pgtable.c
> > @@ -553,8 +553,15 @@ EXPORT_SYMBOL_GPL(memremap_compat_align);
> >   
> >   pgprot_t vm_get_page_prot(unsigned long vm_flags)
> >   {
> > -   unsigned long prot = pgprot_val(protection_map[vm_flags &
> > -
> >    (VM_READ|VM_WRITE|VM_EXEC|VM_
> > SHARED)]);
> > +   unsigned long prot;
> > +
> > +   /* Radix supports execute-only, but protection_map maps X -
> > > RX */
> > +   if (radix_enabled() && ((vm_flags &
> > (VM_READ|VM_WRITE|VM_EXEC)) == VM_EXEC)) {
> 
> Maybe use VM_ACCESS_FLAGS ?

I was looking for something like that but only checked powerpc, thanks.

> 
> > +   prot = pgprot_val(PAGE_EXECONLY);
> > +   } else {
> > +   prot = pgprot_val(protection_map[vm_flags &
> > +
> > (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]);
> > +   }
> >   
> > if (vm_flags & VM_SAO)
> > prot |= _PAGE_SAO;
> > diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> > index 014005428687..59e4cbcf3109 100644
> > --- a/arch/powerpc/mm/fault.c
> > +++ b/arch/powerpc/mm/fault.c
> > @@ -270,7 +270,11 @@ static bool access_error(bool is_write, bool
> > is_exec, struct vm_area_struct *vma
> > return false;
> > }
> >   
> > -   if (unlikely(!vma_is_accessible(vma)))
> > +   /* On Radix, a read fault could be from PROT_NONE or
> > PROT_EXEC */
> > +   if (unlikely(radix_enabled() && !(vma->vm_flags &
> > 

[PATCH] powerpc/kexec: Fix build failure from uninitialised variable

2022-08-08 Thread Russell Currey
Building with clang 14 fails because ret is uninitialised and can be
returned if both prop and fdtprop are NULL.

Fixes: b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window")
Signed-off-by: Russell Currey 
---
Not sure what should be returned here, EINVAL seemed reasonable for a
passed property not existing.

Also, damn it Alexey, I mentioned this in my review:
http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220616075901.835871-1-...@ozlabs.ru/

Consider yourself lucky I'm no longer your dictator (if you don't already)

 arch/powerpc/kexec/file_load_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kexec/file_load_64.c 
b/arch/powerpc/kexec/file_load_64.c
index 683462e4556b..8fa2995e6fc7 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -1043,7 +1043,7 @@ static int copy_property(void *fdt, int node_offset, 
const struct device_node *d
 const char *propname)
 {
const void *prop, *fdtprop;
-   int len = 0, fdtlen = 0, ret;
+   int len = 0, fdtlen = 0, ret = -EINVAL;
 
	prop = of_get_property(dn, propname, &len);
	fdtprop = fdt_getprop(fdt, node_offset, propname, &fdtlen);
-- 
2.37.1



[PATCH v3] powerpc/mm: Support execute-only memory on the Radix MMU

2022-08-08 Thread Russell Currey
The Hash MMU already supports XOM (i.e. mmap with PROT_EXEC only)
through the execute-only pkey.  A PROT_EXEC-only mapping will actually
map to RX, and then the pkey will be applied on top of it.

Radix doesn't have pkeys, but it does have execute permissions built-in
to the MMU, so all we have to do to support XOM is expose it.

Signed-off-by: Russell Currey 
---
v3: Incorporate Aneesh's suggestions, leave protection_map untouched
Basic test: https://github.com/ruscur/junkcode/blob/main/mmap_test.c

 arch/powerpc/include/asm/book3s/64/pgtable.h |  2 ++
 arch/powerpc/mm/book3s64/pgtable.c   | 11 +--
 arch/powerpc/mm/fault.c  |  6 +-
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 392ff48f77df..486902aff040 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -151,6 +151,8 @@
 #define PAGE_COPY_X__pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_EXEC)
 #define PAGE_READONLY  __pgprot(_PAGE_BASE | _PAGE_READ)
 #define PAGE_READONLY_X__pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_EXEC)
+/* Radix only, Hash uses PAGE_READONLY_X + execute-only pkey instead */
+#define PAGE_EXECONLY  __pgprot(_PAGE_BASE | _PAGE_EXEC)
 
 /* Permission masks used for kernel mappings */
 #define PAGE_KERNEL__pgprot(_PAGE_BASE | _PAGE_KERNEL_RW)
diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
b/arch/powerpc/mm/book3s64/pgtable.c
index 7b9966402b25..62f63d344596 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -553,8 +553,15 @@ EXPORT_SYMBOL_GPL(memremap_compat_align);
 
 pgprot_t vm_get_page_prot(unsigned long vm_flags)
 {
-   unsigned long prot = pgprot_val(protection_map[vm_flags &
-   (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]);
+   unsigned long prot;
+
+   /* Radix supports execute-only, but protection_map maps X -> RX */
+   if (radix_enabled() && ((vm_flags & (VM_READ|VM_WRITE|VM_EXEC)) == 
VM_EXEC)) {
+   prot = pgprot_val(PAGE_EXECONLY);
+   } else {
+   prot = pgprot_val(protection_map[vm_flags &
+ (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]);
+   }
 
if (vm_flags & VM_SAO)
prot |= _PAGE_SAO;
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 014005428687..59e4cbcf3109 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -270,7 +270,11 @@ static bool access_error(bool is_write, bool is_exec, 
struct vm_area_struct *vma
return false;
}
 
-   if (unlikely(!vma_is_accessible(vma)))
+   /* On Radix, a read fault could be from PROT_NONE or PROT_EXEC */
+   if (unlikely(radix_enabled() && !(vma->vm_flags & VM_READ)))
+   return true;
+   /* Check for a PROT_NONE fault on other MMUs */
+   else if (unlikely(!vma_is_accessible(vma)))
return true;
/*
 * We should ideally do the vma pkey access check here. But in the
-- 
2.37.1



Re: [PATCH v2 2/2] powerpc/mm: Support execute-only memory on the Radix MMU

2022-08-08 Thread Russell Currey
On Mon, 2022-08-08 at 18:54 +0530, Aneesh Kumar K V wrote:
> On 8/8/22 6:31 PM, Russell Currey wrote:
> > The Hash MMU already supports XOM (i.e. mmap with PROT_EXEC only)
> > through the execute-only pkey.  A PROT_EXEC-only mapping will
> > actually
> > map to RX, and then the pkey will be applied on top of it.
> > 
> > Radix doesn't have pkeys, but it does have execute permissions
> > built-in
> > to the MMU, so all we have to do to support XOM is expose it.
> > 
> > That's not possible with protection_map being const, so make it RO
> > after
> > init instead.
> > 
> > Signed-off-by: Russell Currey 
> > ---
> > v2: Make protection_map __ro_after_init and set it in an initcall
> > (v1 didn't work, I tested before rebasing on Anshuman's patches)
> > 
> > basic test:
> > https://raw.githubusercontent.com/ruscur/junkcode/main/mmap_test.c
> > 
> >  arch/powerpc/include/asm/book3s/64/radix.h |  3 +++
> >  arch/powerpc/include/asm/pgtable.h |  1 -
> >  arch/powerpc/mm/fault.c    | 10 ++
> >  arch/powerpc/mm/pgtable.c  | 16 +++-
> >  4 files changed, 28 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/book3s/64/radix.h
> > b/arch/powerpc/include/asm/book3s/64/radix.h
> > index 686001eda936..bf316b773d73 100644
> > --- a/arch/powerpc/include/asm/book3s/64/radix.h
> > +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> > @@ -19,6 +19,9 @@
> >  #include 
> >  #endif
> >  
> > +/* Execute-only page protections, Hash can use RX + execute-only
> > pkey */
> > +#define PAGE_EXECONLY  __pgprot(_PAGE_BASE | _PAGE_EXEC)
> > +
> >  /* An empty PTE can still have a R or C writeback */
> >  #define RADIX_PTE_NONE_MASK(_PAGE_DIRTY |
> > _PAGE_ACCESSED)
> >  
> > diff --git a/arch/powerpc/include/asm/pgtable.h
> > b/arch/powerpc/include/asm/pgtable.h
> > index 33f4bf8d22b0..3cbb6de20f9d 100644
> > --- a/arch/powerpc/include/asm/pgtable.h
> > +++ b/arch/powerpc/include/asm/pgtable.h
> > @@ -60,7 +60,6 @@ extern void paging_init(void);
> >  void poking_init(void);
> >  
> >  extern unsigned long ioremap_bot;
> > -extern const pgprot_t protection_map[16];
> >  
> >  /*
> >   * kern_addr_valid is intended to indicate whether an address is a
> > valid
> > diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> > index 014005428687..887c0cc45ca6 100644
> > --- a/arch/powerpc/mm/fault.c
> > +++ b/arch/powerpc/mm/fault.c
> > @@ -270,6 +270,16 @@ static bool access_error(bool is_write, bool
> > is_exec, struct vm_area_struct *vma
> > return false;
> > }
> >  
> > +   if (unlikely(!(vma->vm_flags & VM_READ))) {
> > +   /*
> > +    * If we're on Radix, then this could be a read
> > attempt on
> > +    * execute-only memory.  On other MMUs, an "exec-
> > only" page
> > +    * will be given RX flags, so this might be
> > redundant.
> > +    */
> > +   if (radix_enabled())
> > +   return true;
> > +   }
> > +
> > if (unlikely(!vma_is_accessible(vma)))
> > return true;
> > /*
> > diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> > index 0b2bbde5fb65..6e1a6a999c3c 100644
> > --- a/arch/powerpc/mm/pgtable.c
> > +++ b/arch/powerpc/mm/pgtable.c
> > @@ -475,7 +475,7 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned
> > long ea,
> >  EXPORT_SYMBOL_GPL(__find_linux_pte);
> >  
> >  /* Note due to the way vm flags are laid out, the bits are XWR */
> > -const pgprot_t protection_map[16] = {
> > +static pgprot_t protection_map[16] __ro_after_init = {
> > [VM_NONE]   =
> > PAGE_NONE,
> > [VM_READ]   =
> > PAGE_READONLY,
> > [VM_WRITE]  =
> > PAGE_COPY,
> > @@ -494,6 +494,20 @@ const pgprot_t protection_map[16] = {
> > [VM_SHARED | VM_EXEC | VM_WRITE | VM_READ]  =
> > PAGE_SHARED_X
> >  };
> >  
> > +#ifdef CONFIG_PPC_RADIX_MMU
> > +static int __init radix_update_protection_map(void)
> > +{
> > +   if (early_radix_enabled()) {
> > +   /* Radix directly supports execute-only page
> > protections */
> > +   protection_map[VM_EXEC] = PAGE_EXECONLY;
> > +   protection_map[VM_EXEC | VM_SHARED] =
> > PAGE_EXECONLY;
> > +   }
> > +
> > +   return 0;
> > +}
> > +arch_initcall(radix_update_protection_map);
> 
> Instead of this can we do this in vm_get_page_prot() ?
> 
> /* EXEC only shared or non shared mapping ? */
> if (radix_enabled() && ((vm_flags & (VM_READ | VM_WRITE | VM_EXEC)) == VM_EXEC))
> prot = PAGE_EXECONLY;

That is a lot simpler, thanks.

- Russell

> 
> 
> > +#endif /* CONFIG_PPC_RADIX_MMU */
> > +
> >  #ifdef CONFIG_PPC_BOOK3S_64
> >  pgprot_t vm_get_page_prot(unsigned long vm_flags)
> >  {
> 



Re: [PATCH] powerpc/mm: Support execute-only memory on the Radix MMU

2022-08-08 Thread Russell Currey
On Mon, 2022-08-08 at 18:28 +0530, Aneesh Kumar K V wrote:
> On 8/8/22 5:28 PM, Russell Currey wrote:
> > The Hash MMU already supports XOM (i.e. mmap with PROT_EXEC only)
> > through the execute-only pkey.  A PROT_EXEC-only mapping will actually
> > map to RX, and then the pkey will be applied on top of it.
> > 
> > Radix doesn't have pkeys, but it does have execute permissions
> > built-in
> > to the MMU, so all we have to do to support XOM is expose it.
> > 
> > Signed-off-by: Russell Currey 
> > ---
> > quick test:
> > https://raw.githubusercontent.com/ruscur/junkcode/main/mmap_test.c
> > I can make it a selftest.
> > 
> >  arch/powerpc/include/asm/book3s/64/radix.h |  3 +++
> >  arch/powerpc/mm/book3s64/radix_pgtable.c   |  4 
> >  arch/powerpc/mm/fault.c    | 10 ++
> >  3 files changed, 17 insertions(+)
> > 
> > diff --git a/arch/powerpc/include/asm/book3s/64/radix.h
> > b/arch/powerpc/include/asm/book3s/64/radix.h
> > index 686001eda936..bf316b773d73 100644
> > --- a/arch/powerpc/include/asm/book3s/64/radix.h
> > +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> > @@ -19,6 +19,9 @@
> >  #include 
> >  #endif
> >  
> > +/* Execute-only page protections, Hash can use RX + execute-only
> > pkey */
> > +#define PAGE_EXECONLY  __pgprot(_PAGE_BASE | _PAGE_EXEC)
> > +
> >  /* An empty PTE can still have a R or C writeback */
> >  #define RADIX_PTE_NONE_MASK(_PAGE_DIRTY |
> > _PAGE_ACCESSED)
> >  
> > diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c
> > b/arch/powerpc/mm/book3s64/radix_pgtable.c
> > index 698274109c91..2edb56169805 100644
> > --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
> > +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
> > @@ -617,6 +617,10 @@ void __init radix__early_init_mmu(void)
> > __pmd_frag_nr = RADIX_PMD_FRAG_NR;
> > __pmd_frag_size_shift = RADIX_PMD_FRAG_SIZE_SHIFT;
> >  
> > +   /* Radix directly supports execute-only page protections */
> > +   protection_map[VM_EXEC] = PAGE_EXECONLY;
> > +   protection_map[VM_EXEC | VM_SHARED] = PAGE_EXECONLY;
> > +
> > radix_init_pgtable();
> >  
> > if (!firmware_has_feature(FW_FEATURE_LPAR)) {
> > diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> > index 014005428687..887c0cc45ca6 100644
> > --- a/arch/powerpc/mm/fault.c
> > +++ b/arch/powerpc/mm/fault.c
> > @@ -270,6 +270,16 @@ static bool access_error(bool is_write, bool
> > is_exec, struct vm_area_struct *vma
> > return false;
> > }
> >  
> > +   if (unlikely(!(vma->vm_flags & VM_READ))) {
> > +   /*
> > +    * If we're on Radix, then this could be a read
> > attempt on
> > +    * execute-only memory.  On other MMUs, an "exec-
> > only" page
> > +    * will be given RX flags, so this might be
> > redundant.
> > +    */
> > +   if (radix_enabled())
> > +   return true;
> > +   }
> > +
> 
> 
> should we do 
> 
> /* This cover both PROT_NONE (due to check above) and exec only
> mapping */
> if (radix_enabled() && !(vma->vm_flags & VM_READ))
>     return true;
> /* PROT_NONE check */
> if (!vma_is_accessible(vma))
>    return true;
> 
> return false;

That is better, thanks.

- Russell

> 
> 
> 
> > if (unlikely(!vma_is_accessible(vma)))
> > return true;
> > /*
> 
> -aneesh



Re: [PATCH v2 1/2] powerpc/mm: Move vm_get_page_prot() out of book3s64 code

2022-08-08 Thread Russell Currey
On Mon, 2022-08-08 at 14:32 +, Christophe Leroy wrote:
> 
> 
> Le 08/08/2022 à 15:01, Russell Currey a écrit :
> > protection_map is about to be __ro_after_init instead of const, so
> > move
> > the only non-local function that consumes it to the same file so it
> > can
> > at least be static.
> 
> What's the advantage of doing that ? Why does it need to be static  ?
> 
> Christophe

It doesn't need to be; I just didn't like having it exposed unnecessarily. 
Aneesh's suggestion lets it stay const so I can drop this patch anyway.

- Russell

> 
> > 
> > Signed-off-by: Russell Currey 
> > ---
> > v2: new
> > 
> >   arch/powerpc/mm/book3s64/pgtable.c | 16 
> >   arch/powerpc/mm/pgtable.c  | 21 +++--
> >   2 files changed, 19 insertions(+), 18 deletions(-)
> > 
> > diff --git a/arch/powerpc/mm/book3s64/pgtable.c
> > b/arch/powerpc/mm/book3s64/pgtable.c
> > index 7b9966402b25..e2a4ea5eb960 100644
> > --- a/arch/powerpc/mm/book3s64/pgtable.c
> > +++ b/arch/powerpc/mm/book3s64/pgtable.c
> > @@ -550,19 +550,3 @@ unsigned long memremap_compat_align(void)
> >   }
> >   EXPORT_SYMBOL_GPL(memremap_compat_align);
> >   #endif
> > -
> > -pgprot_t vm_get_page_prot(unsigned long vm_flags)
> > -{
> > -   unsigned long prot = pgprot_val(protection_map[vm_flags &
> > -
> >    (VM_READ|VM_WRITE|VM_EXEC|VM_
> > SHARED)]);
> > -
> > -   if (vm_flags & VM_SAO)
> > -   prot |= _PAGE_SAO;
> > -
> > -#ifdef CONFIG_PPC_MEM_KEYS
> > -   prot |= vmflag_to_pte_pkey_bits(vm_flags);
> > -#endif
> > -
> > -   return __pgprot(prot);
> > -}
> > -EXPORT_SYMBOL(vm_get_page_prot);
> > diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> > index cb2dcdb18f8e..0b2bbde5fb65 100644
> > --- a/arch/powerpc/mm/pgtable.c
> > +++ b/arch/powerpc/mm/pgtable.c
> > @@ -27,6 +27,7 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >   
> >   #ifdef CONFIG_PPC64
> >   #define PGD_ALIGN (sizeof(pgd_t) * MAX_PTRS_PER_PGD)
> > @@ -493,6 +494,22 @@ const pgprot_t protection_map[16] = {
> > [VM_SHARED | VM_EXEC | VM_WRITE | VM_READ]  =
> > PAGE_SHARED_X
> >   };
> >   
> > -#ifndef CONFIG_PPC_BOOK3S_64
> > -DECLARE_VM_GET_PAGE_PROT
> > +#ifdef CONFIG_PPC_BOOK3S_64
> > +pgprot_t vm_get_page_prot(unsigned long vm_flags)
> > +{
> > +   unsigned long prot = pgprot_val(protection_map[vm_flags &
> > +   (VM_READ|VM_WRITE|VM_EXEC|V
> > M_SHARED)]);
> > +
> > +   if (vm_flags & VM_SAO)
> > +   prot |= _PAGE_SAO;
> > +
> > +#ifdef CONFIG_PPC_MEM_KEYS
> > +   prot |= vmflag_to_pte_pkey_bits(vm_flags);
> >   #endif
> > +
> > +   return __pgprot(prot);
> > +}
> > +EXPORT_SYMBOL(vm_get_page_prot);
> > +#else
> > +DECLARE_VM_GET_PAGE_PROT
> > +#endif /* CONFIG_PPC_BOOK3S_64 */



[PATCH v2 2/2] powerpc/mm: Support execute-only memory on the Radix MMU

2022-08-08 Thread Russell Currey
The Hash MMU already supports XOM (i.e. mmap with PROT_EXEC only)
through the execute-only pkey.  A PROT_EXEC-only mapping will actually
map to RX, and then the pkey will be applied on top of it.

Radix doesn't have pkeys, but it does have execute permissions built-in
to the MMU, so all we have to do to support XOM is expose it.

That's not possible with protection_map being const, so make it RO after
init instead.

Signed-off-by: Russell Currey 
---
v2: Make protection_map __ro_after_init and set it in an initcall
(v1 didn't work, I tested before rebasing on Anshuman's patches)

basic test: https://raw.githubusercontent.com/ruscur/junkcode/main/mmap_test.c

 arch/powerpc/include/asm/book3s/64/radix.h |  3 +++
 arch/powerpc/include/asm/pgtable.h |  1 -
 arch/powerpc/mm/fault.c| 10 ++
 arch/powerpc/mm/pgtable.c  | 16 +++-
 4 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/radix.h 
b/arch/powerpc/include/asm/book3s/64/radix.h
index 686001eda936..bf316b773d73 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -19,6 +19,9 @@
 #include 
 #endif
 
+/* Execute-only page protections, Hash can use RX + execute-only pkey */
+#define PAGE_EXECONLY  __pgprot(_PAGE_BASE | _PAGE_EXEC)
+
 /* An empty PTE can still have a R or C writeback */
 #define RADIX_PTE_NONE_MASK(_PAGE_DIRTY | _PAGE_ACCESSED)
 
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 33f4bf8d22b0..3cbb6de20f9d 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -60,7 +60,6 @@ extern void paging_init(void);
 void poking_init(void);
 
 extern unsigned long ioremap_bot;
-extern const pgprot_t protection_map[16];
 
 /*
  * kern_addr_valid is intended to indicate whether an address is a valid
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 014005428687..887c0cc45ca6 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -270,6 +270,16 @@ static bool access_error(bool is_write, bool is_exec, 
struct vm_area_struct *vma
return false;
}
 
+   if (unlikely(!(vma->vm_flags & VM_READ))) {
+   /*
+* If we're on Radix, then this could be a read attempt on
+* execute-only memory.  On other MMUs, an "exec-only" page
+* will be given RX flags, so this might be redundant.
+*/
+   if (radix_enabled())
+   return true;
+   }
+
if (unlikely(!vma_is_accessible(vma)))
return true;
/*
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 0b2bbde5fb65..6e1a6a999c3c 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -475,7 +475,7 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
 EXPORT_SYMBOL_GPL(__find_linux_pte);
 
 /* Note due to the way vm flags are laid out, the bits are XWR */
-const pgprot_t protection_map[16] = {
+static pgprot_t protection_map[16] __ro_after_init = {
[VM_NONE]   = PAGE_NONE,
[VM_READ]   = PAGE_READONLY,
[VM_WRITE]  = PAGE_COPY,
@@ -494,6 +494,20 @@ const pgprot_t protection_map[16] = {
[VM_SHARED | VM_EXEC | VM_WRITE | VM_READ]  = PAGE_SHARED_X
 };
 
+#ifdef CONFIG_PPC_RADIX_MMU
+static int __init radix_update_protection_map(void)
+{
+   if (early_radix_enabled()) {
+   /* Radix directly supports execute-only page protections */
+   protection_map[VM_EXEC] = PAGE_EXECONLY;
+   protection_map[VM_EXEC | VM_SHARED] = PAGE_EXECONLY;
+   }
+
+   return 0;
+}
+arch_initcall(radix_update_protection_map);
+#endif /* CONFIG_PPC_RADIX_MMU */
+
 #ifdef CONFIG_PPC_BOOK3S_64
 pgprot_t vm_get_page_prot(unsigned long vm_flags)
 {
-- 
2.37.1



[PATCH v2 1/2] powerpc/mm: Move vm_get_page_prot() out of book3s64 code

2022-08-08 Thread Russell Currey
protection_map is about to be __ro_after_init instead of const, so move
the only non-local function that consumes it to the same file so it can
at least be static.

Signed-off-by: Russell Currey 
---
v2: new

 arch/powerpc/mm/book3s64/pgtable.c | 16 
 arch/powerpc/mm/pgtable.c  | 21 +++--
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
b/arch/powerpc/mm/book3s64/pgtable.c
index 7b9966402b25..e2a4ea5eb960 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -550,19 +550,3 @@ unsigned long memremap_compat_align(void)
 }
 EXPORT_SYMBOL_GPL(memremap_compat_align);
 #endif
-
-pgprot_t vm_get_page_prot(unsigned long vm_flags)
-{
-   unsigned long prot = pgprot_val(protection_map[vm_flags &
-   (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]);
-
-   if (vm_flags & VM_SAO)
-   prot |= _PAGE_SAO;
-
-#ifdef CONFIG_PPC_MEM_KEYS
-   prot |= vmflag_to_pte_pkey_bits(vm_flags);
-#endif
-
-   return __pgprot(prot);
-}
-EXPORT_SYMBOL(vm_get_page_prot);
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index cb2dcdb18f8e..0b2bbde5fb65 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_PPC64
 #define PGD_ALIGN (sizeof(pgd_t) * MAX_PTRS_PER_PGD)
@@ -493,6 +494,22 @@ const pgprot_t protection_map[16] = {
[VM_SHARED | VM_EXEC | VM_WRITE | VM_READ]  = PAGE_SHARED_X
 };
 
-#ifndef CONFIG_PPC_BOOK3S_64
-DECLARE_VM_GET_PAGE_PROT
+#ifdef CONFIG_PPC_BOOK3S_64
+pgprot_t vm_get_page_prot(unsigned long vm_flags)
+{
+   unsigned long prot = pgprot_val(protection_map[vm_flags &
+   (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]);
+
+   if (vm_flags & VM_SAO)
+   prot |= _PAGE_SAO;
+
+#ifdef CONFIG_PPC_MEM_KEYS
+   prot |= vmflag_to_pte_pkey_bits(vm_flags);
 #endif
+
+   return __pgprot(prot);
+}
+EXPORT_SYMBOL(vm_get_page_prot);
+#else
+DECLARE_VM_GET_PAGE_PROT
+#endif /* CONFIG_PPC_BOOK3S_64 */
-- 
2.37.1



[PATCH] powerpc/mm: Support execute-only memory on the Radix MMU

2022-08-08 Thread Russell Currey
The Hash MMU already supports XOM (i.e. mmap with PROT_EXEC only)
through the execute-only pkey.  A PROT_EXEC-only mapping will actually map to
RX, and then the pkey will be applied on top of it.

Radix doesn't have pkeys, but it does have execute permissions built-in
to the MMU, so all we have to do to support XOM is expose it.

Signed-off-by: Russell Currey 
---
quick test: https://raw.githubusercontent.com/ruscur/junkcode/main/mmap_test.c
I can make it a selftest.

 arch/powerpc/include/asm/book3s/64/radix.h |  3 +++
 arch/powerpc/mm/book3s64/radix_pgtable.c   |  4 
 arch/powerpc/mm/fault.c| 10 ++
 3 files changed, 17 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/radix.h 
b/arch/powerpc/include/asm/book3s/64/radix.h
index 686001eda936..bf316b773d73 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -19,6 +19,9 @@
 #include 
 #endif
 
+/* Execute-only page protections, Hash can use RX + execute-only pkey */
+#define PAGE_EXECONLY  __pgprot(_PAGE_BASE | _PAGE_EXEC)
+
 /* An empty PTE can still have a R or C writeback */
 #define RADIX_PTE_NONE_MASK(_PAGE_DIRTY | _PAGE_ACCESSED)
 
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 698274109c91..2edb56169805 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -617,6 +617,10 @@ void __init radix__early_init_mmu(void)
__pmd_frag_nr = RADIX_PMD_FRAG_NR;
__pmd_frag_size_shift = RADIX_PMD_FRAG_SIZE_SHIFT;
 
+   /* Radix directly supports execute-only page protections */
+   protection_map[VM_EXEC] = PAGE_EXECONLY;
+   protection_map[VM_EXEC | VM_SHARED] = PAGE_EXECONLY;
+
radix_init_pgtable();
 
if (!firmware_has_feature(FW_FEATURE_LPAR)) {
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 014005428687..887c0cc45ca6 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -270,6 +270,16 @@ static bool access_error(bool is_write, bool is_exec, 
struct vm_area_struct *vma
return false;
}
 
+   if (unlikely(!(vma->vm_flags & VM_READ))) {
+   /*
+* If we're on Radix, then this could be a read attempt on
+* execute-only memory.  On other MMUs, an "exec-only" page
+* will be given RX flags, so this might be redundant.
+*/
+   if (radix_enabled())
+   return true;
+   }
+
if (unlikely(!vma_is_accessible(vma)))
return true;
/*
-- 
2.37.1



[PATCH] MAINTAINERS: Remove myself as EEH maintainer

2022-08-06 Thread Russell Currey
I haven't touched EEH in a long time and I don't have much knowledge of
the subsystem at this point either, so it's misleading to have me as a
maintainer.

I remain grateful to Oliver for picking up my slack over the years.

Signed-off-by: Russell Currey 
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index a9f77648c107..dfe6081fa0b3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15639,7 +15639,6 @@ F:  drivers/pci/endpoint/
 F: tools/pci/
 
 PCI ENHANCED ERROR HANDLING (EEH) FOR POWERPC
-M: Russell Currey 
 M: Oliver O'Halloran 
 L: linuxppc-dev@lists.ozlabs.org
 S: Supported
-- 
2.37.1



[PATCH] selftests/powerpc: Don't run spectre_v2 test by default

2022-08-04 Thread Russell Currey
The spectre_v2 selftest has issues that I'm unsure of how to resolve.
It uses context to determine intended behaviour, but that context is
unreliable - as an example, when running as a KVM guest, qemu can
deliberately misreport mitigation status for compatibility purposes.

As a result, the selftest is unreliable as a pass/fail test without
the test runner knowing what they expect its behaviour to be.  I don't
think the selftest is useless so we should keep it around, but we
shouldn't have run_tests run it by default.

Suggested-by: Eirik Fuller 
Signed-off-by: Russell Currey 
---
 tools/testing/selftests/powerpc/security/Makefile | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/powerpc/security/Makefile 
b/tools/testing/selftests/powerpc/security/Makefile
index 7488315fd847..c954d79aeb80 100644
--- a/tools/testing/selftests/powerpc/security/Makefile
+++ b/tools/testing/selftests/powerpc/security/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0+
 
-TEST_GEN_PROGS := rfi_flush entry_flush uaccess_flush spectre_v2
+TEST_GEN_PROGS := rfi_flush entry_flush uaccess_flush
+TEST_GEN_PROGS_EXTENDED := spectre_v2
 TEST_PROGS := mitigation-patching.sh
 
 top_srcdir = ../../../../..
@@ -10,6 +11,7 @@ CFLAGS += -I../../../../../usr/include
 include ../../lib.mk
 
 $(TEST_GEN_PROGS): ../harness.c ../utils.c
+$(TEST_GEN_PROGS_EXTENDED): ../harness.c ../utils.c
 
 $(OUTPUT)/spectre_v2: CFLAGS += -m64
 $(OUTPUT)/spectre_v2: ../pmu/event.c branch_loops.S
-- 
2.37.1



Re: [PATCH kernel] pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window

2022-06-26 Thread Russell Currey
 *fdt,
> if (ret < 0)
> goto out;
>  
> +#define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
> +#define DMA64_PROPNAME "linux,dma64-ddr-window-info"

Instead of having these defined in two different places, could they be
moved out of iommu.c and into a header?  Though we hardcode ibm,dma-
window everywhere anyway.

> +   ret = update_pci_nodes(fdt, DIRECT64_PROPNAME);
> +   if (ret < 0)
> +   goto out;
> +
> +   ret = update_pci_nodes(fdt, DMA64_PROPNAME);
> +   if (ret < 0)
> +   goto out;
> +
> /* Update memory reserve map */
> ret = get_reserved_memory_ranges();
> if (ret)
> diff --git a/arch/powerpc/platforms/pseries/iommu.c
> b/arch/powerpc/platforms/pseries/iommu.c
> index fba64304e859..af3c871668df 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -700,6 +700,33 @@ struct iommu_table_ops
> iommu_table_lpar_multi_ops = {
> .get = tce_get_pSeriesLP
>  };
>  
> +/*
> + * Find nearest ibm,dma-window (default DMA window) or direct DMA
> window or
> + * dynamic 64bit DMA window, walking up the device tree.
> + */
> +static struct device_node *pci_dma_find(struct device_node *dn,
> +   const __be32 **dma_window)
> +{
> +   const __be32 *dw = NULL;
> +
> +   for ( ; dn && PCI_DN(dn); dn = dn->parent) {
> +   dw = of_get_property(dn, "ibm,dma-window", NULL);
> +   if (dw) {
> +   if (dma_window)
> +   *dma_window = dw;
> +   return dn;
> +   }
> +   dw = of_get_property(dn, DIRECT64_PROPNAME, NULL);
> +   if (dw)
> +   return dn;
> +   dw = of_get_property(dn, DMA64_PROPNAME, NULL);
> +   if (dw)
> +   return dn;
> +   }
> +
> +   return NULL;
> +}
> +
>  static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
>  {
> struct iommu_table *tbl;
> @@ -712,20 +739,10 @@ static void pci_dma_bus_setup_pSeriesLP(struct
> pci_bus *bus)
> pr_debug("pci_dma_bus_setup_pSeriesLP: setting up bus
> %pOF\n",
>  dn);
>  
> -   /*
> -    * Find nearest ibm,dma-window (default DMA window), walking
> up the
> -    * device tree
> -    */
> -   for (pdn = dn; pdn != NULL; pdn = pdn->parent) {
> -   dma_window = of_get_property(pdn, "ibm,dma-window",
> NULL);
> -   if (dma_window != NULL)
> -   break;
> -   }
> +   pdn = pci_dma_find(dn, &dma_window);
>  
> -   if (dma_window == NULL) {
> +   if (dma_window == NULL)
> pr_debug("  no ibm,dma-window property !\n");
> -   return;
> -   }
>  
> ppci = PCI_DN(pdn);
>  
> @@ -735,11 +752,13 @@ static void pci_dma_bus_setup_pSeriesLP(struct
> pci_bus *bus)
> if (!ppci->table_group) {
> ppci->table_group = iommu_pseries_alloc_group(ppci-
> >phb->node);
> tbl = ppci->table_group->tables[0];
> -   iommu_table_setparms_lpar(ppci->phb, pdn, tbl,
> -   ppci->table_group, dma_window);
> +   if (dma_window) {
> +   iommu_table_setparms_lpar(ppci->phb, pdn,
> tbl,
> + ppci->table_group,
> dma_window);
>  
> -   if (!iommu_init_table(tbl, ppci->phb->node, 0, 0))
> -   panic("Failed to initialize iommu table");
> +   if (!iommu_init_table(tbl, ppci->phb->node,
> 0, 0))
> +   panic("Failed to initialize iommu
> table");
> +   }
> iommu_register_group(ppci->table_group,
> pci_domain_nr(bus), 0);
>     pr_debug("  created table: %p\n", ppci->table_group);
> @@ -1429,16 +1448,22 @@ static bool enable_ddw(struct pci_dev *dev,
> struct device_node *pdn)
>  
> pci->table_group->tables[1] = newtbl;
>  
> -   /* Keep default DMA window struct if removed */
> -   if (default_win_removed) {
> -   tbl->it_size = 0;
> -   vfree(tbl->it_map);
> -   tbl->it_map = NULL;
> -   }
> -
> set_iommu_table_base(&dev->dev, newtbl);

Re: [PATCH] powerc: Update asm-prototypes.h comment

2022-06-22 Thread Russell Currey
On Fri, 2022-06-17 at 18:02 +1000, Michael Ellerman wrote:
> This header was recently cleaned up in commit 76222808fc25 ("powerpc:
> Move C prototypes out of asm-prototypes.h"), update the comment to
> reflect it's proper purpose.
> 
> Signed-off-by: Michael Ellerman 

Hi Michael, subject says "powerc" instead of "powerpc".

- clippy


Re: [PATCH 3/5] bpf ppc64: Add instructions for atomic_[cmp]xchg

2022-05-15 Thread Russell Currey
On Thu, 2022-05-12 at 13:15 +0530, Hari Bathini wrote:
> This adds two atomic opcodes BPF_XCHG and BPF_CMPXCHG on ppc64, both
> of which include the BPF_FETCH flag.  The kernel's atomic_cmpxchg
> operation fundamentally has 3 operands, but we only have two register
> fields. Therefore the operand we compare against (the kernel's API
> calls it 'old') is hard-coded to be BPF_REG_R0. Also, kernel's
> atomic_cmpxchg returns the previous value at dst_reg + off. JIT the
> same for BPF too with return value put in BPF_REG_0.
> 
>   BPF_REG_R0 = atomic_cmpxchg(dst_reg + off, BPF_REG_R0, src_reg);
> 
> Signed-off-by: Hari Bathini 
> ---
>  arch/powerpc/net/bpf_jit_comp64.c | 28 
>  1 file changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/net/bpf_jit_comp64.c
> b/arch/powerpc/net/bpf_jit_comp64.c
> index 504fa459f9f3..df9e20b22ccb 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -783,6 +783,9 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32
> *image, struct codegen_context *
>  */
> case BPF_STX | BPF_ATOMIC | BPF_W:
> case BPF_STX | BPF_ATOMIC | BPF_DW:
> +   u32 save_reg = tmp2_reg;
> +   u32 ret_reg = src_reg;

Hi Hari,

Some compilers[0][1] don't like these late declarations after case
labels:

   arch/powerpc/net/bpf_jit_comp64.c: In function ‘bpf_jit_build_body’:
   arch/powerpc/net/bpf_jit_comp64.c:781:4: error: a label can only be
   part of a statement and a declaration is not a statement
   u32 save_reg = tmp2_reg;
   ^~~
   arch/powerpc/net/bpf_jit_comp64.c:782:4: error: expected expression
   before ‘u32’
   u32 ret_reg = src_reg;
   ^~~
   arch/powerpc/net/bpf_jit_comp64.c:819:5: error: ‘ret_reg’ undeclared
   (first use in this function); did you mean ‘dst_reg’?
ret_reg = bpf_to_ppc(BPF_REG_0);
   
Adding a semicolon fixes the first issue, i.e.

   case BPF_STX | BPF_ATOMIC | BPF_DW: ;
   
but then it just complains about mixed declarations and code instead.

So you should declare save_reg and ret_reg at the beginning of the for
loop like the rest of the variables.

- Russell

[0]: gcc 5.5.0
https://github.com/ruscur/linux-ci/runs/6418546193?check_suite_focus=true#step:4:122
[1]: clang 12.0
https://github.com/ruscur/linux-ci/runs/6418545338?check_suite_focus=true#step:4:117

> +
> /* Get offset into TMP_REG_1 */
> EMIT(PPC_RAW_LI(tmp1_reg, off));
> tmp_idx = ctx->idx * 4;
> @@ -813,6 +816,24 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32
> *image, struct codegen_context *
> case BPF_XOR | BPF_FETCH:
> EMIT(PPC_RAW_XOR(tmp2_reg, tmp2_reg,
> src_reg));
> break;
> +   case BPF_CMPXCHG:
> +   /*
> +    * Return old value in BPF_REG_0 for
> BPF_CMPXCHG &
> +    * in src_reg for other cases.
> +    */
> +   ret_reg = bpf_to_ppc(BPF_REG_0);
> +
> +   /* Compare with old value in BPF_R0
> */
> +   if (size == BPF_DW)
> +   EMIT(PPC_RAW_CMPD(bpf_to_ppc(
> BPF_REG_0), tmp2_reg));
> +   else
> +   EMIT(PPC_RAW_CMPW(bpf_to_ppc(
> BPF_REG_0), tmp2_reg));
> +   /* Don't set if different from old
> value */
> +   PPC_BCC_SHORT(COND_NE, (ctx->idx + 3)
> * 4);
> +   fallthrough;
> +   case BPF_XCHG:
> +   save_reg = src_reg;
> +   break;
> default:
> pr_err_ratelimited(
> "eBPF filter atomic op code
> %02x (@%d) unsupported\n",
> @@ -822,15 +843,14 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32
> *image, struct codegen_context *
>  
> /* store new value */
> if (size == BPF_DW)
> -   EMIT(PPC_RAW_STDCX(tmp2_reg,
> tmp1_reg, dst_reg));
> +   EMIT(PPC_RAW_STDCX(save_reg,
> tmp1_reg, dst_reg));
> else
> -   EMIT(PPC_RAW_STWCX(tmp2_reg,
> tmp1_reg, dst_reg));
> +   EMIT(PPC_RAW_STWCX(save_reg,
> tmp1_reg, dst_reg));
> /* we're done if this succeeded */
> PPC_BCC_SHORT(COND_NE, tmp_idx);
>  
> -   /* For the BPF_FETCH variant, get old value
> into src_reg */
> if (imm & BPF_FETCH)
> -   

Re: [PATCH v2 1/2] powerpc/powernv: Get L1D flush requirements from device-tree

2022-04-05 Thread Russell Currey
On Tue, 2022-04-05 at 02:49 +, Joel Stanley wrote:

> I booted both patches in this series on a power10 powernv machine,
> applied on top of v5.18-rc1:
> 
> $ dmesg |grep -i flush
> [    0.00] rfi-flush: fallback displacement flush available
> [    0.00] rfi-flush: patched 12 locations (no flush)
> [    0.00] count-cache-flush: flush disabled.
> [    0.00] link-stack-flush: flush disabled.

In this case you'd be looking for stf-barrier, uaccess-flush and
entry-flush so this doesn't tell us anything.  This must have been from a
no_spectrev2 boot with count cache and link stack flushes disabled.

> 
> $ grep . /sys/devices/system/cpu/vulnerabilities/*
> /sys/devices/system/cpu/vulnerabilities/itlb_multihit:Not affected
> /sys/devices/system/cpu/vulnerabilities/l1tf:Not affected
> /sys/devices/system/cpu/vulnerabilities/mds:Not affected
> /sys/devices/system/cpu/vulnerabilities/meltdown:Not affected
> /sys/devices/system/cpu/vulnerabilities/spec_store_bypass:Not
> affected
> /sys/devices/system/cpu/vulnerabilities/spectre_v1:Mitigation: __user
> pointer sanitization, ori31 speculation barrier enabled
> /sys/devices/system/cpu/vulnerabilities/spectre_v2:Mitigation:
> Software count cache flush (hardware accelerated), Software link
> stack
> flush
> /sys/devices/system/cpu/vulnerabilities/srbds:Not affected
> /sys/devices/system/cpu/vulnerabilities/tsx_async_abort:Not affected
> 
> Does that match what we expect?

This is as expected for P10, though clearly from a different boot to
the above :)

- Russell

> 
> Cheers,
> 
> Joel


Re: [PATCH] powerpc/powernv: Get more flushing requirements from device-tree

2022-04-04 Thread Russell Currey
On Wed, 2022-03-23 at 16:26 -0300, Murilo Opsfelder Araújo wrote:
> Hi, Russell.
> 
> I think this patch could have been split in half with their
> corresponding Fixes: tag.
> 
> This may sound nitpicking but doing this would certainly help distros
> doing their backports.

Hi Murilo,

I didn't use the Fixes: tag originally since as far as I'm aware this
issue doesn't impact any systems "out in the wild" - so I didn't think
there would be interest in any backports.  I should have split and
tagged the commits anyway though, in case others wanted to make that
decision.

Will resend.




[PATCH v2 1/2] powerpc/powernv: Get L1D flush requirements from device-tree

2022-04-04 Thread Russell Currey
The device-tree properties no-need-l1d-flush-msr-pr-1-to-0 and
no-need-l1d-flush-kernel-on-user-access are the equivalents of
H_CPU_BEHAV_NO_L1D_FLUSH_ENTRY and H_CPU_BEHAV_NO_L1D_FLUSH_UACCESS
from the H_GET_CPU_CHARACTERISTICS hcall on pseries respectively.

In commit d02fa40d759f ("powerpc/powernv: Remove POWER9 PVR version
check for entry and uaccess flushes") the condition for disabling the
L1D flush on kernel entry and user access was changed from any non-P9
CPU to only checking P7 and P8.  Without the appropriate device-tree
checks for newer processors on powernv, these flushes are unnecessarily
enabled on those systems.  This patch corrects this.

Fixes: d02fa40d759f ("powerpc/powernv: Remove POWER9 PVR version check for 
entry and uaccess flushes")
Reported-by: Joel Stanley 
Signed-off-by: Russell Currey 
---
 arch/powerpc/platforms/powernv/setup.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/setup.c 
b/arch/powerpc/platforms/powernv/setup.c
index 105d889abd51..378f7e5f18d2 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -96,6 +96,12 @@ static void __init init_fw_feat_flags(struct device_node *np)
 
if (fw_feature_is("disabled", "needs-spec-barrier-for-bound-checks", 
np))
security_ftr_clear(SEC_FTR_BNDS_CHK_SPEC_BAR);
+
+   if (fw_feature_is("enabled", "no-need-l1d-flush-msr-pr-1-to-0", np))
+   security_ftr_clear(SEC_FTR_L1D_FLUSH_ENTRY);
+
+   if (fw_feature_is("enabled", "no-need-l1d-flush-kernel-on-user-access", 
np))
+   security_ftr_clear(SEC_FTR_L1D_FLUSH_UACCESS);
 }
 
 static void __init pnv_setup_security_mitigations(void)
-- 
2.35.1



[PATCH v2 2/2] powerpc/powernv: Get STF barrier requirements from device-tree

2022-04-04 Thread Russell Currey
The device-tree property no-need-store-drain-on-priv-state-switch is
equivalent to H_CPU_BEHAV_NO_STF_BARRIER from the
H_GET_CPU_CHARACTERISTICS hcall on pseries.

Since commit 84ed26fd00c5 ("powerpc/security: Add a security feature for
STF barrier") powernv systems with this device-tree property have been
enabling the STF barrier when they have no need for it.  This patch
fixes this by clearing the STF barrier feature on those systems.

Fixes: 84ed26fd00c5 ("powerpc/security: Add a security feature for STF barrier")
Reported-by: Joel Stanley 
Signed-off-by: Russell Currey 
---
 arch/powerpc/platforms/powernv/setup.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/setup.c 
b/arch/powerpc/platforms/powernv/setup.c
index 378f7e5f18d2..824c3ad7a0fa 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -102,6 +102,9 @@ static void __init init_fw_feat_flags(struct device_node 
*np)
 
if (fw_feature_is("enabled", "no-need-l1d-flush-kernel-on-user-access", 
np))
security_ftr_clear(SEC_FTR_L1D_FLUSH_UACCESS);
+
+   if (fw_feature_is("enabled", 
"no-need-store-drain-on-priv-state-switch", np))
+   security_ftr_clear(SEC_FTR_STF_BARRIER);
 }
 
 static void __init pnv_setup_security_mitigations(void)
-- 
2.35.1



[PATCH] powerpc/powernv: Get more flushing requirements from device-tree

2022-03-22 Thread Russell Currey
The device-tree properties no-need-l1d-flush-msr-pr-1-to-0,
no-need-l1d-flush-kernel-on-user-access and
no-need-store-drain-on-priv-state-switch are the equivalents of
H_CPU_BEHAV_NO_L1D_FLUSH_ENTRY, H_CPU_BEHAV_NO_L1D_FLUSH_UACCESS
and H_CPU_BEHAV_NO_STF_BARRIER from the H_GET_CPU_CHARACTERISTICS
hcall on pseries, respectively.

Since commit 84ed26fd00c5 ("powerpc/security: Add a security feature for
STF barrier") powernv systems with this device-tree property have been
enabling the STF barrier when they have no need for it.  This patch
fixes this by clearing the STF barrier feature on those systems.

In commit d02fa40d759f ("powerpc/powernv: Remove POWER9 PVR version
check for entry and uaccess flushes") the condition for disabling the
L1D flush on kernel entry and user access was changed from any non-P9
CPU to only checking P7 and P8.  Without the appropriate device-tree
checks for newer processors on powernv, these flushes are unnecessarily
enabled on those systems.  This patch fixes that too.

Reported-by: Joel Stanley 
Signed-off-by: Russell Currey 
---
 arch/powerpc/platforms/powernv/setup.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/setup.c 
b/arch/powerpc/platforms/powernv/setup.c
index 105d889abd51..824c3ad7a0fa 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -96,6 +96,15 @@ static void __init init_fw_feat_flags(struct device_node *np)
 
if (fw_feature_is("disabled", "needs-spec-barrier-for-bound-checks", 
np))
security_ftr_clear(SEC_FTR_BNDS_CHK_SPEC_BAR);
+
+   if (fw_feature_is("enabled", "no-need-l1d-flush-msr-pr-1-to-0", np))
+   security_ftr_clear(SEC_FTR_L1D_FLUSH_ENTRY);
+
+   if (fw_feature_is("enabled", "no-need-l1d-flush-kernel-on-user-access", 
np))
+   security_ftr_clear(SEC_FTR_L1D_FLUSH_UACCESS);
+
+   if (fw_feature_is("enabled", 
"no-need-store-drain-on-priv-state-switch", np))
+   security_ftr_clear(SEC_FTR_STF_BARRIER);
 }
 
 static void __init pnv_setup_security_mitigations(void)
-- 
2.35.1



Re: [PATCH] powerpc/module_64: fix array_size.cocci warning

2022-02-24 Thread Russell Currey
On Wed, 2022-02-23 at 15:54 +0800, Guo Zhengkui wrote:
> Fix following coccicheck warning:
> ./arch/powerpc/kernel/module_64.c:432:40-41: WARNING: Use ARRAY_SIZE.
> 
> ARRAY_SIZE(arr) is a macro provided by the kernel. It makes sure that
> arr
> is an array, so it's safer than sizeof(arr) / sizeof(arr[0]) and more
> standard.
> 
> Signed-off-by: Guo Zhengkui 

Reviewed-by: Russell Currey 


Re: [PATCH kernel 3/3] powerpc/llvm/lto: Workaround conditional branches in FTR_SECTION_ELSE

2022-02-10 Thread Russell Currey
On Fri, 2022-02-11 at 13:31 +1100, Alexey Kardashevskiy wrote:
> diff --git a/arch/powerpc/lib/memcpy_64.S
> b/arch/powerpc/lib/memcpy_64.S
> index 016c91e958d8..286c7e2d0883 100644
> --- a/arch/powerpc/lib/memcpy_64.S
> +++ b/arch/powerpc/lib/memcpy_64.S
> @@ -50,10 +50,11 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_VMX_COPY)
>     At the time of writing the only CPU that has this combination of
> bits
>     set is Power6. */
>  test_feature = (SELFTEST_CASE == 1)
> +   beq  .ldst_aligned

Hey Alexey, typo here (should be .Ldst_aligned) that breaks the build
for BE.

>  BEGIN_FTR_SECTION
> nop




Re: [PATCH]selftests/powerpc: skip tests for unavailable mitigations.

2021-12-14 Thread Russell Currey
On Mon, 2021-12-13 at 22:12 +0530, Sachin Sant wrote:
> Mitigation patching test iterates over a set of mitigations
> irrespective
> of whether a certain mitigation is supported/available in the kernel.
> This causes following messages on a kernel where some mitigations
> are unavailable:
> 
>   Spawned threads enabling/disabling mitigations ...
>   cat: entry_flush: No such file or directory
>   cat: uaccess_flush: No such file or directory
>   Waiting for timeout ...
>   OK
> 
> This patch adds a check for available mitigations in the kernel.
> 
> Reported-by: Nageswara R Sastry 
> Signed-off-by: Sachin Sant 

Reviewed-by: Russell Currey 
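For context, the availability check amounts to testing whether the control file for each mitigation exists before the test pokes at it. A minimal userspace sketch of that idea (the directory and file names here are hypothetical stand-ins, not the harness's actual paths):

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Return 1 if the mitigation's control file exists under `dir`, else 0. */
static int mitigation_available(const char *dir, const char *name)
{
	char path[512];

	snprintf(path, sizeof(path), "%s/%s", dir, name);
	return access(path, F_OK) == 0;
}
```

Skipping (rather than failing) when the file is absent keeps the test meaningful on older kernels that lack a given mitigation knob.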


Re: [PATCH v2 2/2] powerpc/module_64: Use patch_memory() to apply relocations to loaded modules

2021-12-12 Thread Russell Currey
On Sun, 2021-12-12 at 10:41 +, Christophe Leroy wrote:
> 
> 
> On 12/12/2021 at 02:03, Russell Currey wrote:
> > Livepatching a loaded module involves applying relocations through
> > apply_relocate_add(), which attempts to write to read-only memory
> > when
> > CONFIG_STRICT_MODULE_RWX=y.  Work around this by performing these
> > writes through the text poke area by using patch_memory().
> > 
> > Similar to x86 and s390 implementations, apply_relocate_add() now
> > chooses to use patch_memory() or memcpy() depending on if the
> > module
> > is loaded or not.  Without STRICT_KERNEL_RWX, patch_memory() is
> > just
> > memcpy(), so there should be no performance impact.
> > 
> > While many relocation types may not be applied in a livepatch
> > context, comprehensively handling them prevents any issues in
> > future,
> > with no performance penalty as the text poke area is only used when
> > necessary.
> > 
> > create_stub() and create_ftrace_stub() are modified to first write
> > to the stack so that the ppc64_stub_entry struct only takes one
> > write() to modify, saving several map/unmap/flush operations
> > when use of patch_memory() is necessary.
> > 
> > This patch also contains some trivial whitespace fixes.
> > 
> > Fixes: c35717c71e98 ("powerpc: Set ARCH_HAS_STRICT_MODULE_RWX")
> > Reported-by: Joe Lawrence 
> > Signed-off-by: Russell Currey 
> > ---
> > v2: No changes.
> > 
Some discussion here: https://github.com/linuxppc/issues/issues/375
> > for-stable version using patch_instruction():
> > https://lore.kernel.org/linuxppc-dev/20211123081520.18843-1-rus...@russell.cc/
> > 
> >   arch/powerpc/kernel/module_64.c | 157 +--
> > -
> >   1 file changed, 104 insertions(+), 53 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/module_64.c
> > b/arch/powerpc/kernel/module_64.c
> > index 6baa676e7cb6..2a146750fa6f 100644
> > --- a/arch/powerpc/kernel/module_64.c
> > +++ b/arch/powerpc/kernel/module_64.c
> > @@ -350,11 +350,11 @@ static u32 stub_insns[] = {
> >    */
> >   static inline int create_ftrace_stub(struct ppc64_stub_entry
> > *entry,
> > unsigned long addr,
> > -   struct module *me)
> > +   struct module *me,
> > +   void *(*write)(void *,
> > const void *, size_t))
> 
> I really dislike this write() parameter to the function.
> 
> I think it would be better to define a static sub-function that takes
> write()'s parameters plus the 'struct module *me' and have it call 
> either patch_memory() or memcpy() based on me->state.

I don't like it much either, I was just going off prior art from x86
and s390.  I like your idea better, and that function could just be
memcpy() if !CONFIG_STRICT_MODULE_RWX, removing the need to check the
state in that case.
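For what it's worth, the suggested helper could look something like the standalone model below. This is only a userspace sketch of the dispatch idea — the `module_write()` name and the stand-in types are hypothetical, and `fake_patch_memory()` stands in for the real text-poke path — but it shows how the state check can live in one place so callers never pass a write() pointer around:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel types involved. */
enum module_state { MODULE_STATE_UNFORMED, MODULE_STATE_LIVE };
struct module { enum module_state state; };

static int used_poke_area; /* records which path was taken, for illustration */

/* Stand-in for patch_memory(): the real one goes through the text poke area. */
static void *fake_patch_memory(void *dest, const void *src, size_t size)
{
	used_poke_area = 1;
	return memcpy(dest, src, size);
}

/*
 * The suggested sub-function: a loaded module has read-only text, so
 * writes go through the poke area; an unformed module is still plain
 * writable memory.  In the kernel this could also collapse to memcpy()
 * entirely when !CONFIG_STRICT_MODULE_RWX.
 */
static void *module_write(struct module *me, void *dest, const void *src, size_t size)
{
	if (me->state != MODULE_STATE_UNFORMED)
		return fake_patch_memory(dest, src, size);
	return memcpy(dest, src, size);
}
```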

> 
> >   {
> > long reladdr;
> > -
> > -   memcpy(entry->jump, stub_insns, sizeof(stub_insns));
> > +   struct ppc64_stub_entry tmp_entry;
> >   
> > /* Stub uses address relative to kernel toc (from the paca)
> > */
> > reladdr = addr - kernel_toc_addr();
> > @@ -364,12 +364,20 @@ static inline int create_ftrace_stub(struct
> > ppc64_stub_entry *entry,
> > return 0;
> > }
> >   
> > -   entry->jump[1] |= PPC_HA(reladdr);
> > -   entry->jump[2] |= PPC_LO(reladdr);
> > +   /*
> > +    * In case @entry is write-protected, make our changes on
> > the stack
> > +    * so we can update the whole struct in one write().
> > +    */
> > +   memcpy(&tmp_entry, entry, sizeof(struct ppc64_stub_entry));
> 
> That copy seems unnecessary, entry is a struct with three fields and
> you 
> are setting all three fields below.

Oops, you're right.

> >   
> > +   memcpy(&tmp_entry.jump, stub_insns, sizeof(stub_insns));
> > +   tmp_entry.jump[1] |= PPC_HA(reladdr);
> > +   tmp_entry.jump[2] |= PPC_LO(reladdr);
> > /* Eventhough we don't use funcdata in the stub, it's
> > needed elsewhere. */
> > -   entry->funcdata = func_desc(addr);
> > -   entry->magic = STUB_MAGIC;
> > +   tmp_entry.funcdata = func_desc(addr);
> > +   tmp_entry.magic = STUB_MAGIC;
> > +
> > +   write(entry, &tmp_entry, sizeof(struct ppc64_stub_entry));
> >   
> > return 1;
> >   }
> &

Re: [PATCH v2 1/2] powerpc/code-patching: add patch_memory() for writing RO text

2021-12-12 Thread Russell Currey
On Sun, 2021-12-12 at 09:08 +, Christophe Leroy wrote:
> Le 12/12/2021 à 02:03, Russell Currey a écrit :
> > +static int do_patch_memory(void *dest, const void *src, size_t
> > size, unsigned long poke_addr)
> > +{
> > +   unsigned long patch_addr = poke_addr +
> > offset_in_page(dest);
> > +
> > +   if (map_patch_area(dest, poke_addr)) {
> > +   pr_warn("failed to map %lx\n", poke_addr);
> 
> It isn't worth a warning here. If that happens before slab is
> available, 
> it will panic in early_alloc_pgtable().
> 
> If it happens after, you will already get a pile of messages dumping
> the 
> memory state etc ...
> 
> During the last few years, pr_ messages have been removed from most 
> places where ENOMEM is returned.

That's good to know, thanks.

> 
> > +   return -1;
> > +   }
> 
> I have a series reworking error handling at 
> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=274823&state=*
> 
> Especially this one handles map_patch_area() : 
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/85259d894069e47f915ea580b169e1adbeec7a61.1638446239.git.christophe.le...@csgroup.eu/
> 
> Would be good if you could rebase your series on top of it.


> 
I've rebased on top of your series (patchwork 274258 & 274823).

> > +
> > +   memcpy((u8 *)patch_addr, src, size);
> 
> Shouldn't we use copy_to_kernel_nofault(), so that we survive from a 
> fault just like patch_instruction() ?

Yes we should.

> > +
> > +   flush_icache_range(patch_addr, size);
> > +
> > +   if (unmap_patch_area(poke_addr)) {
> > +   pr_warn("failed to unmap %lx\n", poke_addr);
> > +   return -1;
> > +   }
> 
> I have changed unmap_page_area() to a void in 
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/299804b117fae35c786c827536c91f25352e279b.1638446239.git.christophe.le...@csgroup.eu/
> 
> > +
> > +   return 0;
> > +}
> > +
> > +/**
> > + * patch_memory - write data using the text poke area
> > + *
> > + * @dest:  destination address
> > + * @src:   source address
> > + * @size:  size in bytes
> > + *
> > + * like memcpy(), but using the text poke area. No atomicity
> > guarantees.
> > + * Do not use for instructions, use patch_instruction() instead.
> > + * Handles crossing page boundaries, though you shouldn't need to.
> > + *
> > + * Return value:
> > + * @dest
> > + **/
> > +void *patch_memory(void *dest, const void *src, size_t size)
> > +{
> > +   size_t bytes_written, write_size;
> > +   unsigned long text_poke_addr;
> > +   unsigned long flags;
> > +
> > +   // If the poke area isn't set up, it's early boot and we
> > can just memcpy.
> > +   if (!this_cpu_read(text_poke_area))
> > +   return memcpy(dest, src, size);
> > +
> > +   local_irq_save(flags);
> 
> Do we want to do such potentially big copies with interrupts disabled
> ?

Probably not.  This should never actually get used for big copies - the
problem it was written to solve never copies more than 40 bytes, and is
very unlikely to ever cross a page boundary.

I could disable and re-enable interrupts per-page (per call of
do_patch_memory()) so there's a preemption window on longer operations.
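The per-page split boils down to a small size computation. A standalone model of the loop's min() step (PAGE_SIZE hardcoded here purely for illustration):

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL

static unsigned long offset_in_page(unsigned long addr)
{
	return addr & (PAGE_SIZE - 1);
}

/*
 * Size of the next chunk when copying `size` bytes to `dest`, having
 * already written `written` bytes, without ever crossing a page
 * boundary -- the same min() used in patch_memory()'s loop.
 */
static size_t next_write_size(unsigned long dest, size_t written, size_t size)
{
	size_t left = size - written;
	size_t to_boundary = PAGE_SIZE - offset_in_page(dest + written);

	return left < to_boundary ? left : to_boundary;
}
```

So a write that fits inside one page is a single chunk, and only a boundary-crossing write pays for a second map/unmap — which is also a natural point to re-enable interrupts between iterations.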

> 
> > +   text_poke_addr = (unsigned
> > long)__this_cpu_read(text_poke_area)->addr;
> > +
> > +   for (bytes_written = 0;
> > +    bytes_written < size;
> > +    bytes_written += write_size) {
> 
> I recommend you to read 
> https://www.kernel.org/doc/html/latest/process/coding-style.html?highlight=coding%20style#naming
> 
> As explained there, local variable names should be short. Using long 
> names is non-productive.
> 
> You could just call it "written", it would allow you to keep the
> for() 
> on a single line, that would be a lot more readable.

I am aware of the coding style, my brain somehow didn't consider
"written" as a better option, which is quite silly.


> > +   // Write as much as possible without crossing a
> > page boundary.
> > +   write_size = min_t(size_t,
> > +  size - bytes_written,
> > +  PAGE_SIZE - offset_in_page(dest
> > + bytes_written));
> 
> Reduce the size of you variable names and keep it on a single line.

> > +
> > +   if (do_patch_memory(dest + bytes_written,
> > +   src

Re: [PATCH] powerpc: Add set_memory_{p/np}() and remove set_memory_attr()

2021-12-12 Thread Russell Currey
On Fri, 2021-12-10 at 08:09 +, Christophe Leroy wrote:
> set_memory_attr() was implemented by commit 4d1755b6a762
> ("powerpc/mm:
> implement set_memory_attr()") because the set_memory_xx() couldn't
> be used at that time to modify memory "on the fly" as explained it
> the commit.
> 
> But set_memory_attr() uses set_pte_at() which leads to warnings when
> CONFIG_DEBUG_VM is selected, because set_pte_at() is unexpected for
> updating existing page table entries.
> 
> The check could be bypassed by using __set_pte_at() instead,
> as it was the case before commit c988cfd38e48 ("powerpc/32:
> use set_memory_attr()") but since commit 9f7853d7609d ("powerpc/mm:
> Fix set_memory_*() against concurrent accesses") it is now possible
> to use set_memory_xx() functions to update page table entries
> "on the fly" because the update is now atomic.
> 
> For DEBUG_PAGEALLOC we need to clear and set back _PAGE_PRESENT.
> Add set_memory_np() and set_memory_p() for that.
> 
> Replace all uses of set_memory_attr() by the relevant set_memory_xx()
> and remove set_memory_attr().
> 
> Reported-by: Maxime Bizon 
> Fixes: c988cfd38e48 ("powerpc/32: use set_memory_attr()")
> Cc: sta...@vger.kernel.org
> Depends-on: 9f7853d7609d ("powerpc/mm: Fix set_memory_*() against
> concurrent accesses")
> Signed-off-by: Christophe Leroy 

Reviewed-by: Russell Currey 

One comment below:

> diff --git a/arch/powerpc/include/asm/set_memory.h
> b/arch/powerpc/include/asm/set_memory.h
> index b040094f7920..061f1766a8a4 100644
> --- a/arch/powerpc/include/asm/set_memory.h
> +++ b/arch/powerpc/include/asm/set_memory.h
> @@ -6,6 +6,8 @@
>  #define SET_MEMORY_RW  1
>  #define SET_MEMORY_NX  2
>  #define SET_MEMORY_X   3
> +#define SET_MEMORY_NP  4
> +#define SET_MEMORY_P   5

It might be nice to have a comment somewhere in set_memory.h explaining
that {p/np} = present/not present.  RO/RW/NX/X are commonly used, "p"
as shorthand for "present" is less obvious.  x86's set_memory.h has a
nice comment covering everything as an example.
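For instance, something along these lines — the wording is illustrative only, but the values mirror the ones in the patch:

```c
#include <assert.h>

/*
 * Change the attributes of a range of pages:
 *
 *   SET_MEMORY_RW - make the range writable
 *   SET_MEMORY_NX - make the range non-executable
 *   SET_MEMORY_X  - make the range executable
 *   SET_MEMORY_NP - mark the range "not present" (unmapped)
 *   SET_MEMORY_P  - mark the range "present" (mapped) again
 */
#define SET_MEMORY_RW	1
#define SET_MEMORY_NX	2
#define SET_MEMORY_X	3
#define SET_MEMORY_NP	4
#define SET_MEMORY_P	5
```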

>  int change_memory_attr(unsigned long addr, int numpages, long
> action);
>  
> @@ -29,6 +31,14 @@ static inline int set_memory_x(unsigned long addr,
> int numpages)
> return change_memory_attr(addr, numpages, SET_MEMORY_X);
>  }
>  
> -int set_memory_attr(unsigned long addr, int numpages, pgprot_t
> prot);
> +static inline int set_memory_np(unsigned long addr, int numpages)
> +{
> +   return change_memory_attr(addr, numpages, SET_MEMORY_NP);
> +}
> +
> +static inline int set_memory_p(unsigned long addr, int numpages)
> +{
> +   return change_memory_attr(addr, numpages, SET_MEMORY_P);
> +}
>  
>  #endif


[PATCH v2 2/2] powerpc/module_64: Use patch_memory() to apply relocations to loaded modules

2021-12-11 Thread Russell Currey
Livepatching a loaded module involves applying relocations through
apply_relocate_add(), which attempts to write to read-only memory when
CONFIG_STRICT_MODULE_RWX=y.  Work around this by performing these
writes through the text poke area by using patch_memory().

Similar to x86 and s390 implementations, apply_relocate_add() now
chooses to use patch_memory() or memcpy() depending on if the module
is loaded or not.  Without STRICT_KERNEL_RWX, patch_memory() is just
memcpy(), so there should be no performance impact.

While many relocation types may not be applied in a livepatch
context, comprehensively handling them prevents any issues in future,
with no performance penalty as the text poke area is only used when
necessary.

create_stub() and create_ftrace_stub() are modified to first write
to the stack so that the ppc64_stub_entry struct only takes one
write() to modify, saving several map/unmap/flush operations
when use of patch_memory() is necessary.

This patch also contains some trivial whitespace fixes.

Fixes: c35717c71e98 ("powerpc: Set ARCH_HAS_STRICT_MODULE_RWX")
Reported-by: Joe Lawrence 
Signed-off-by: Russell Currey 
---
v2: No changes.

Some discussion here: https://github.com/linuxppc/issues/issues/375
for-stable version using patch_instruction():
https://lore.kernel.org/linuxppc-dev/20211123081520.18843-1-rus...@russell.cc/

 arch/powerpc/kernel/module_64.c | 157 +---
 1 file changed, 104 insertions(+), 53 deletions(-)

diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 6baa676e7cb6..2a146750fa6f 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -350,11 +350,11 @@ static u32 stub_insns[] = {
  */
 static inline int create_ftrace_stub(struct ppc64_stub_entry *entry,
unsigned long addr,
-   struct module *me)
+   struct module *me,
+   void *(*write)(void *, const void *, 
size_t))
 {
long reladdr;
-
-   memcpy(entry->jump, stub_insns, sizeof(stub_insns));
+   struct ppc64_stub_entry tmp_entry;
 
/* Stub uses address relative to kernel toc (from the paca) */
reladdr = addr - kernel_toc_addr();
@@ -364,12 +364,20 @@ static inline int create_ftrace_stub(struct 
ppc64_stub_entry *entry,
return 0;
}
 
-   entry->jump[1] |= PPC_HA(reladdr);
-   entry->jump[2] |= PPC_LO(reladdr);
+   /*
+* In case @entry is write-protected, make our changes on the stack
+* so we can update the whole struct in one write().
+*/
+   memcpy(&tmp_entry, entry, sizeof(struct ppc64_stub_entry));
 
+   memcpy(&tmp_entry.jump, stub_insns, sizeof(stub_insns));
+   tmp_entry.jump[1] |= PPC_HA(reladdr);
+   tmp_entry.jump[2] |= PPC_LO(reladdr);
/* Eventhough we don't use funcdata in the stub, it's needed elsewhere. 
*/
-   entry->funcdata = func_desc(addr);
-   entry->magic = STUB_MAGIC;
+   tmp_entry.funcdata = func_desc(addr);
+   tmp_entry.magic = STUB_MAGIC;
+
+   write(entry, &tmp_entry, sizeof(struct ppc64_stub_entry));
 
return 1;
 }
@@ -392,7 +400,8 @@ static bool is_mprofile_ftrace_call(const char *name)
 #else
 static inline int create_ftrace_stub(struct ppc64_stub_entry *entry,
unsigned long addr,
-   struct module *me)
+   struct module *me,
+   void *(*write)(void *, const void *, 
size_t))
 {
return 0;
 }
@@ -419,14 +428,14 @@ static inline int create_stub(const Elf64_Shdr *sechdrs,
  struct ppc64_stub_entry *entry,
  unsigned long addr,
  struct module *me,
- const char *name)
+ const char *name,
+ void *(*write)(void *, const void *, size_t))
 {
long reladdr;
+   struct ppc64_stub_entry tmp_entry;
 
if (is_mprofile_ftrace_call(name))
-   return create_ftrace_stub(entry, addr, me);
-
-   memcpy(entry->jump, ppc64_stub_insns, sizeof(ppc64_stub_insns));
+   return create_ftrace_stub(entry, addr, me, write);
 
/* Stub uses address relative to r2. */
reladdr = (unsigned long)entry - my_r2(sechdrs, me);
@@ -437,10 +446,19 @@ static inline int create_stub(const Elf64_Shdr *sechdrs,
}
pr_debug("Stub %p get data from reladdr %li\n", entry, reladdr);
 
-   entry->jump[0] |= PPC_HA(reladdr);
-   entry->jump[1] |= PPC_LO(reladdr);
-   entry->funcdata = func_desc(addr);
-   entry->magic = STUB_MAGIC;
+   /*
+* In case @entry is write-protected, make our changes on the stack
+  

[PATCH v2 1/2] powerpc/code-patching: add patch_memory() for writing RO text

2021-12-11 Thread Russell Currey
powerpc allocates a text poke area of one page that is used by
patch_instruction() to modify read-only text when STRICT_KERNEL_RWX
is enabled.

patch_instruction() is only designed for instructions,
so writing data using the text poke area can only happen 4 bytes
at a time - each with a page map/unmap, pte flush and syncs.

This patch introduces patch_memory(), implementing the same
interface as memcpy(), similar to x86's text_poke() and s390's
s390_kernel_write().  patch_memory() only needs to map the text
poke area once, unless the write would cross a page boundary.

Signed-off-by: Russell Currey 
---
v2: Use min_t() instead of min(), fixing the 32-bit build as reported
by snowpatch.

Some discussion here: https://github.com/linuxppc/issues/issues/375

 arch/powerpc/include/asm/code-patching.h |  1 +
 arch/powerpc/lib/code-patching.c | 74 
 2 files changed, 75 insertions(+)

diff --git a/arch/powerpc/include/asm/code-patching.h 
b/arch/powerpc/include/asm/code-patching.h
index 4ba834599c4d..604211d8380c 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -31,6 +31,7 @@ int create_cond_branch(struct ppc_inst *instr, const u32 
*addr,
 int patch_branch(u32 *addr, unsigned long target, int flags);
 int patch_instruction(u32 *addr, struct ppc_inst instr);
 int raw_patch_instruction(u32 *addr, struct ppc_inst instr);
+void *patch_memory(void *dest, const void *src, size_t size);
 
 static inline unsigned long patch_site_addr(s32 *site)
 {
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index c5ed98823835..330602aa59f1 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int __patch_instruction(u32 *exec_addr, struct ppc_inst instr, u32 
*patch_addr)
 {
@@ -178,6 +179,74 @@ static int do_patch_instruction(u32 *addr, struct ppc_inst 
instr)
 
return err;
 }
+
+static int do_patch_memory(void *dest, const void *src, size_t size, unsigned 
long poke_addr)
+{
+   unsigned long patch_addr = poke_addr + offset_in_page(dest);
+
+   if (map_patch_area(dest, poke_addr)) {
+   pr_warn("failed to map %lx\n", poke_addr);
+   return -1;
+   }
+
+   memcpy((u8 *)patch_addr, src, size);
+
+   flush_icache_range(patch_addr, size);
+
+   if (unmap_patch_area(poke_addr)) {
+   pr_warn("failed to unmap %lx\n", poke_addr);
+   return -1;
+   }
+
+   return 0;
+}
+
+/**
+ * patch_memory - write data using the text poke area
+ *
+ * @dest:  destination address
+ * @src:   source address
+ * @size:  size in bytes
+ *
+ * like memcpy(), but using the text poke area. No atomicity guarantees.
+ * Do not use for instructions, use patch_instruction() instead.
+ * Handles crossing page boundaries, though you shouldn't need to.
+ *
+ * Return value:
+ * @dest
+ **/
+void *patch_memory(void *dest, const void *src, size_t size)
+{
+   size_t bytes_written, write_size;
+   unsigned long text_poke_addr;
+   unsigned long flags;
+
+   // If the poke area isn't set up, it's early boot and we can just 
memcpy.
+   if (!this_cpu_read(text_poke_area))
+   return memcpy(dest, src, size);
+
+   local_irq_save(flags);
+   text_poke_addr = (unsigned long)__this_cpu_read(text_poke_area)->addr;
+
+   for (bytes_written = 0;
+bytes_written < size;
+bytes_written += write_size) {
+   // Write as much as possible without crossing a page boundary.
+   write_size = min_t(size_t,
+  size - bytes_written,
+  PAGE_SIZE - offset_in_page(dest + 
bytes_written));
+
+   if (do_patch_memory(dest + bytes_written,
+   src + bytes_written,
+   write_size,
+   text_poke_addr))
+   break;
+   }
+
+   local_irq_restore(flags);
+
+   return dest;
+}
 #else /* !CONFIG_STRICT_KERNEL_RWX */
 
 static int do_patch_instruction(u32 *addr, struct ppc_inst instr)
@@ -185,6 +254,11 @@ static int do_patch_instruction(u32 *addr, struct ppc_inst 
instr)
return raw_patch_instruction(addr, instr);
 }
 
+void *patch_memory(void *dest, const void *src, size_t size)
+{
+   return memcpy(dest, src, size);
+}
+
 #endif /* CONFIG_STRICT_KERNEL_RWX */
 
 int patch_instruction(u32 *addr, struct ppc_inst instr)
-- 
2.34.1



[PATCH 2/2] powerpc/module_64: Use patch_memory() to apply relocations to loaded modules

2021-12-11 Thread Russell Currey
Livepatching a loaded module involves applying relocations through
apply_relocate_add(), which attempts to write to read-only memory when
CONFIG_STRICT_MODULE_RWX=y.  Work around this by performing these
writes through the text poke area by using patch_memory().

Similar to x86 and s390 implementations, apply_relocate_add() now
chooses to use patch_memory() or memcpy() depending on if the module
is loaded or not.  Without STRICT_KERNEL_RWX, patch_memory() is just
memcpy(), so there should be no performance impact.

While many relocation types may not be applied in a livepatch
context, comprehensively handling them prevents any issues in future,
with no performance penalty as the text poke area is only used when
necessary.

create_stub() and create_ftrace_stub() are modified to first write
to the stack so that the ppc64_stub_entry struct only takes one
write() to modify, saving several map/unmap/flush operations
when use of patch_memory() is necessary.

This patch also contains some trivial whitespace fixes.

Fixes: c35717c71e98 ("powerpc: Set ARCH_HAS_STRICT_MODULE_RWX")
Reported-by: Joe Lawrence 
Signed-off-by: Russell Currey 
---
Some discussion here: https://github.com/linuxppc/issues/issues/375
for-stable version using patch_instruction():
https://lore.kernel.org/linuxppc-dev/20211123081520.18843-1-rus...@russell.cc/

 arch/powerpc/kernel/module_64.c | 157 +---
 1 file changed, 104 insertions(+), 53 deletions(-)

diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 6baa676e7cb6..2a146750fa6f 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -350,11 +350,11 @@ static u32 stub_insns[] = {
  */
 static inline int create_ftrace_stub(struct ppc64_stub_entry *entry,
unsigned long addr,
-   struct module *me)
+   struct module *me,
+   void *(*write)(void *, const void *, 
size_t))
 {
long reladdr;
-
-   memcpy(entry->jump, stub_insns, sizeof(stub_insns));
+   struct ppc64_stub_entry tmp_entry;
 
/* Stub uses address relative to kernel toc (from the paca) */
reladdr = addr - kernel_toc_addr();
@@ -364,12 +364,20 @@ static inline int create_ftrace_stub(struct 
ppc64_stub_entry *entry,
return 0;
}
 
-   entry->jump[1] |= PPC_HA(reladdr);
-   entry->jump[2] |= PPC_LO(reladdr);
+   /*
+* In case @entry is write-protected, make our changes on the stack
+* so we can update the whole struct in one write().
+*/
+   memcpy(&tmp_entry, entry, sizeof(struct ppc64_stub_entry));
 
+   memcpy(&tmp_entry.jump, stub_insns, sizeof(stub_insns));
+   tmp_entry.jump[1] |= PPC_HA(reladdr);
+   tmp_entry.jump[2] |= PPC_LO(reladdr);
/* Eventhough we don't use funcdata in the stub, it's needed elsewhere. 
*/
-   entry->funcdata = func_desc(addr);
-   entry->magic = STUB_MAGIC;
+   tmp_entry.funcdata = func_desc(addr);
+   tmp_entry.magic = STUB_MAGIC;
+
+   write(entry, &tmp_entry, sizeof(struct ppc64_stub_entry));
 
return 1;
 }
@@ -392,7 +400,8 @@ static bool is_mprofile_ftrace_call(const char *name)
 #else
 static inline int create_ftrace_stub(struct ppc64_stub_entry *entry,
unsigned long addr,
-   struct module *me)
+   struct module *me,
+   void *(*write)(void *, const void *, 
size_t))
 {
return 0;
 }
@@ -419,14 +428,14 @@ static inline int create_stub(const Elf64_Shdr *sechdrs,
  struct ppc64_stub_entry *entry,
  unsigned long addr,
  struct module *me,
- const char *name)
+ const char *name,
+ void *(*write)(void *, const void *, size_t))
 {
long reladdr;
+   struct ppc64_stub_entry tmp_entry;
 
if (is_mprofile_ftrace_call(name))
-   return create_ftrace_stub(entry, addr, me);
-
-   memcpy(entry->jump, ppc64_stub_insns, sizeof(ppc64_stub_insns));
+   return create_ftrace_stub(entry, addr, me, write);
 
/* Stub uses address relative to r2. */
reladdr = (unsigned long)entry - my_r2(sechdrs, me);
@@ -437,10 +446,19 @@ static inline int create_stub(const Elf64_Shdr *sechdrs,
}
pr_debug("Stub %p get data from reladdr %li\n", entry, reladdr);
 
-   entry->jump[0] |= PPC_HA(reladdr);
-   entry->jump[1] |= PPC_LO(reladdr);
-   entry->funcdata = func_desc(addr);
-   entry->magic = STUB_MAGIC;
+   /*
+* In case @entry is write-protected, make our changes on the stack
+  

[PATCH 1/2] powerpc/code-patching: add patch_memory() for writing RO text

2021-12-11 Thread Russell Currey
powerpc allocates a text poke area of one page that is used by
patch_instruction() to modify read-only text when STRICT_KERNEL_RWX
is enabled.

patch_instruction() is only designed for instructions,
so writing data using the text poke area can only happen 4 bytes
at a time - each with a page map/unmap, pte flush and syncs.

This patch introduces patch_memory(), implementing the same
interface as memcpy(), similar to x86's text_poke() and s390's
s390_kernel_write().  patch_memory() only needs to map the text
poke area once, unless the write would cross a page boundary.

Signed-off-by: Russell Currey 
---
Sorry I took so long to post this.
Some discussion here: https://github.com/linuxppc/issues/issues/375

 arch/powerpc/include/asm/code-patching.h |  1 +
 arch/powerpc/lib/code-patching.c | 73 
 2 files changed, 74 insertions(+)

diff --git a/arch/powerpc/include/asm/code-patching.h 
b/arch/powerpc/include/asm/code-patching.h
index 4ba834599c4d..604211d8380c 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -31,6 +31,7 @@ int create_cond_branch(struct ppc_inst *instr, const u32 
*addr,
 int patch_branch(u32 *addr, unsigned long target, int flags);
 int patch_instruction(u32 *addr, struct ppc_inst instr);
 int raw_patch_instruction(u32 *addr, struct ppc_inst instr);
+void *patch_memory(void *dest, const void *src, size_t size);
 
 static inline unsigned long patch_site_addr(s32 *site)
 {
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index c5ed98823835..3a566d756ccc 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int __patch_instruction(u32 *exec_addr, struct ppc_inst instr, u32 
*patch_addr)
 {
@@ -178,6 +179,73 @@ static int do_patch_instruction(u32 *addr, struct ppc_inst 
instr)
 
return err;
 }
+
+static int do_patch_memory(void *dest, const void *src, size_t size, unsigned 
long poke_addr)
+{
+   unsigned long patch_addr = poke_addr + offset_in_page(dest);
+
+   if (map_patch_area(dest, poke_addr)) {
+   pr_warn("failed to map %lx\n", poke_addr);
+   return -1;
+   }
+
+   memcpy((u8 *)patch_addr, src, size);
+
+   flush_icache_range(patch_addr, size);
+
+   if (unmap_patch_area(poke_addr)) {
+   pr_warn("failed to unmap %lx\n", poke_addr);
+   return -1;
+   }
+
+   return 0;
+}
+
+/**
+ * patch_memory - write data using the text poke area
+ *
+ * @dest:  destination address
+ * @src:   source address
+ * @size:  size in bytes
+ *
+ * like memcpy(), but using the text poke area. No atomicity guarantees.
+ * Do not use for instructions, use patch_instruction() instead.
+ * Handles crossing page boundaries, though you shouldn't need to.
+ *
+ * Return value:
+ * @dest
+ **/
+void *patch_memory(void *dest, const void *src, size_t size)
+{
+   size_t bytes_written, write_size;
+   unsigned long text_poke_addr;
+   unsigned long flags;
+
+   // If the poke area isn't set up, it's early boot and we can just 
memcpy.
+   if (!this_cpu_read(text_poke_area))
+   return memcpy(dest, src, size);
+
+   local_irq_save(flags);
+   text_poke_addr = (unsigned long)__this_cpu_read(text_poke_area)->addr;
+
+   for (bytes_written = 0;
+bytes_written < size;
+bytes_written += write_size) {
+   // Write as much as possible without crossing a page boundary.
+   write_size = min(size - bytes_written,
+PAGE_SIZE - offset_in_page(dest + 
bytes_written));
+
+   if (do_patch_memory(dest + bytes_written,
+   src + bytes_written,
+   write_size,
+   text_poke_addr))
+   break;
+   }
+
+   local_irq_restore(flags);
+
+   return dest;
+}
 #else /* !CONFIG_STRICT_KERNEL_RWX */
 
 static int do_patch_instruction(u32 *addr, struct ppc_inst instr)
@@ -185,6 +253,11 @@ static int do_patch_instruction(u32 *addr, struct ppc_inst 
instr)
return raw_patch_instruction(addr, instr);
 }
 
+void *patch_memory(void *dest, const void *src, size_t size)
+{
+   return memcpy(dest, src, size);
+}
+
 #endif /* CONFIG_STRICT_KERNEL_RWX */
 
 int patch_instruction(u32 *addr, struct ppc_inst instr)
-- 
2.34.1



Re: [PATCH] powerpc/module_64: Fix livepatching for RO modules

2021-12-07 Thread Russell Currey
On Tue, 2021-12-07 at 09:44 -0500, Joe Lawrence wrote:
> On 11/23/21 3:15 AM, Russell Currey wrote:
> 
> [[ cc += livepatching list ]]
> 
> Hi Russell,
> 
> Thanks for writing a minimal fix for stable / backporting.  As I
> mentioned on the github issue [1], this avoided the crashes I
> reported
> here and over on kpatch github [2].  I wasn't sure if this is the
> final
> version for stable, but feel free to add my:
> 
> Tested-by: Joe Lawrence 

Thanks Joe, as per the discussions on GitHub I think we're fine to use
this patch for a fix for stable (unless there's new issues found or
additional community feedback etc).

> 
> [1] https://github.com/linuxppc/issues/issues/375
> [2] https://github.com/dynup/kpatch/issues/1228
> 



[PATCH] powerpc/module_64: Fix livepatching for RO modules

2021-11-23 Thread Russell Currey
Livepatching a loaded module involves applying relocations through
apply_relocate_add(), which attempts to write to read-only memory when
CONFIG_STRICT_MODULE_RWX=y.  Work around this by performing these
writes through the text poke area by using patch_instruction().

R_PPC_REL24 is the only relocation type generated by the kpatch-build
userspace tool or klp-convert kernel tree that I observed applying a
relocation to a post-init module.

A more comprehensive solution is planned, but using patch_instruction()
for R_PPC_REL24 on its own should serve as a sufficient fix.

This does have a performance impact, I observed ~15% overhead in
module_load() on POWER8 bare metal with checksum verification off.

Fixes: c35717c71e98 ("powerpc: Set ARCH_HAS_STRICT_MODULE_RWX")
Cc: sta...@vger.kernel.org # v5.14+
Reported-by: Joe Lawrence 
Signed-off-by: Russell Currey 
---
Intended to be a minimal fix that can go to stable.

 arch/powerpc/kernel/module_64.c | 30 ++
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 6baa676e7cb6..c25ef36c3ef4 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -422,11 +422,16 @@ static inline int create_stub(const Elf64_Shdr *sechdrs,
  const char *name)
 {
long reladdr;
+   func_desc_t desc;
+   int i;
 
if (is_mprofile_ftrace_call(name))
return create_ftrace_stub(entry, addr, me);
 
-   memcpy(entry->jump, ppc64_stub_insns, sizeof(ppc64_stub_insns));
+   for (i = 0; i < sizeof(ppc64_stub_insns) / sizeof(u32); i++) {
+   patch_instruction(&entry->jump[i],
+ ppc_inst(ppc64_stub_insns[i]));
+   }
 
/* Stub uses address relative to r2. */
reladdr = (unsigned long)entry - my_r2(sechdrs, me);
@@ -437,10 +442,19 @@ static inline int create_stub(const Elf64_Shdr *sechdrs,
}
pr_debug("Stub %p get data from reladdr %li\n", entry, reladdr);
 
-   entry->jump[0] |= PPC_HA(reladdr);
-   entry->jump[1] |= PPC_LO(reladdr);
-   entry->funcdata = func_desc(addr);
-   entry->magic = STUB_MAGIC;
+   patch_instruction(&entry->jump[0],
+ ppc_inst(entry->jump[0] | PPC_HA(reladdr)));
+   patch_instruction(&entry->jump[1],
+ ppc_inst(entry->jump[1] | PPC_LO(reladdr)));
+
+   // func_desc_t is 8 bytes if ABIv2, else 16 bytes
+   desc = func_desc(addr);
+   for (i = 0; i < sizeof(func_desc_t) / sizeof(u32); i++) {
+   patch_instruction(((u32 *)&entry->funcdata) + i,
+ ppc_inst(((u32 *)(&desc))[i]));
+   }
+
+   patch_instruction(&entry->magic, ppc_inst(STUB_MAGIC));
 
return 1;
 }
@@ -496,7 +510,7 @@ static int restore_r2(const char *name, u32 *instruction, struct module *me)
return 0;
}
/* ld r2,R2_STACK_OFFSET(r1) */
-   *instruction = PPC_INST_LD_TOC;
+   patch_instruction(instruction, ppc_inst(PPC_INST_LD_TOC));
return 1;
 }
 
@@ -636,9 +650,9 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
}
 
/* Only replace bits 2 through 26 */
-   *(uint32_t *)location
-   = (*(uint32_t *)location & ~0x03fffffc)
+   value = (*(uint32_t *)location & ~0x03fffffc)
| (value & 0x03fffffc);
+   patch_instruction((u32 *)location, ppc_inst(value));
break;
 
case R_PPC64_REL64:
-- 
2.34.0



Re: ppc64le STRICT_MODULE_RWX and livepatch apply_relocate_add() crashes

2021-11-01 Thread Russell Currey
On Sun, 2021-10-31 at 22:43 -0400, Joe Lawrence wrote:
> Starting with 5.14 kernels, I can reliably reproduce a crash [1] on
> ppc64le when loading livepatches containing late klp-relocations [2].
> These are relocations, specific to livepatching, that are resolved not
> when a livepatch module is loaded, but only when a livepatch-target
> module is loaded.

Hey Joe, thanks for the report.

> I haven't started looking at a fix yet, but in the case of the x86 code
> update, its apply_relocate_add() implementation was modified to use a
> common text_poke() function to allowed us to drop
> module_{en,dis}ble_ro() games by the livepatching code.

It should be a similar fix for Power: our patch_instruction() uses a
text poke area, but apply_relocate_add() doesn't use it and does its
own raw patching instead.

> I can take a closer look this week, but thought I'd send out a report
> in case this may be a known todo for STRICT_MODULE_RWX on Power.

I'm looking into this now, will update when there's progress.  I
personally wasn't aware but Jordan flagged this as an issue back in
August [0].  Are the selftests in the klp-convert tree sufficient for
testing?  I'm not especially familiar with livepatching & haven't used
the userspace tools.

- Russell

[0] https://github.com/linuxppc/issues/issues/375

> 
> -- Joe


