Re: [PATCH 10/13] powerpc/rtas: improve function information lookups

2022-11-28 Thread Andrew Donnellan
On Mon, 2022-11-28 at 18:08 -0600, Nathan Lynch wrote:
> Andrew Donnellan  writes:
> > On Fri, 2022-11-18 at 09:07 -0600, Nathan Lynch wrote:
> > > Convert rtas_token() to use a lockless binary search on the
> > > function
> > > table. Fall back to the old behavior for lookups against names
> > > that
> > > are not known to be RTAS functions, but issue a warning.
> > > rtas_token()
> > > is for function names; it is not a general facility for accessing
> > > arbitrary properties of the /rtas node. All known misuses of
> > > rtas_token() have been converted to more appropriate of_ APIs in
> > > preceding changes.
> > 
> > For in-kernel users, why not go all the way: make rtas_token()
> > static
> > and use it purely for the userspace API,
> 
> Not sure what userspace API refers to here, can you be more specific?
> I
> don't think rtas_token() is exposed to user space.

Right, what I actually meant to refer to here is the fact that sys_rtas
takes a token, but when I wrote this I forgot that userspace doesn't
pass a string, rather librtas implements rtas_token itself using the
/proc interface to the device tree.

> 
> > and switch kernel users over
> > to using rtas_function_index directly?
> 
> Well, I have work in progress to transition all rtas_token() users to
> a
> different API, but using opaque symbolic handles rather than exposing
> the indexes. Something like:
> 
> /*
>  * Opaque handle for client code to refer to RTAS functions. All
> valid
>  * function handles are build-time constants prefixed with RTAS_FN_.
>  */
> typedef struct {
> enum rtas_function_index index;
> } rtas_fn_handle_t;
> 
> #define rtas_fn_handle(x_) (const rtas_fn_handle_t) { .index =
> rtas_fnidx(x_), }
> 
> #define RTAS_FN_CHECK_EXCEPTION  
> rtas_fn_handle(CHECK_EXCEPTION)
> #define RTAS_FN_DISPLAY_CHARACTER
> rtas_fn_handle(DISPLAY_CHARACTER)
> #define RTAS_FN_EVENT_SCAN   
> rtas_fn_handle(EVENT_SCAN)
> 
> ...
> 
> /**
>  * rtas_function_token() - RTAS function token lookup.
>  * @handle: Function handle, e.g. RTAS_FN_EVENT_SCAN.
>  *
>  * Context: Any context.
>  * Return: the token value for the function if implemented by this
> platform,
>  * otherwise RTAS_UNKNOWN_SERVICE.
>  */
> s32 rtas_function_token(const rtas_fn_handle_t handle)
> {
> const size_t index = handle.index;
> const bool out_of_bounds = index >=
> ARRAY_SIZE(rtas_function_table);
> 
> if (WARN_ONCE(out_of_bounds, "invalid function index %zu",
> index))
> return RTAS_UNKNOWN_SERVICE;
> 
> return rtas_function_table[index].token;
> }
> 
> But that's a tree-wide change that would touch various drivers, and I
> feel like the current series is what I can reasonably hope to get
> applied right now.

It's not clear to me what the benefit of adding this additional layer
of indirection would be.


-- 
Andrew DonnellanOzLabs, ADL Canberra
a...@linux.ibm.com   IBM Australia Limited


Re: [PATCH 13/13] powerpc/rtas: place tracepoints in do_enter_rtas()

2022-11-28 Thread Nicholas Piggin
On Tue Nov 29, 2022 at 9:44 AM AEST, Nathan Lynch wrote:
> "Nicholas Piggin"  writes:
>
> > On Sat Nov 19, 2022 at 1:07 AM AEST, Nathan Lynch wrote:
> >> Call the just-added rtas tracepoints in do_enter_rtas(), taking care
> >> to avoid function name lookups in the CPU offline path.
> >>
> >> Signed-off-by: Nathan Lynch 
> >> ---
> >>  arch/powerpc/kernel/rtas.c | 23 +++
> >>  1 file changed, 23 insertions(+)
> >>
> >> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> >> index 198366d641d0..3487b42cfbf7 100644
> >> --- a/arch/powerpc/kernel/rtas.c
> >> +++ b/arch/powerpc/kernel/rtas.c
> >> @@ -38,6 +38,7 @@
> >>  #include 
> >>  #include 
> >>  #include 
> >> +#include 
> >>  #include 
> >>  
> >>  enum rtas_function_flags {
> >> @@ -525,6 +526,7 @@ void enter_rtas(unsigned long);
> >>  static void do_enter_rtas(struct rtas_args *args)
> >>  {
> >>unsigned long msr;
> >> +  const char *name = NULL;
> >>  
> >>/*
> >> * Make sure MSR[RI] is currently enabled as it will be forced later
> >> @@ -537,9 +539,30 @@ static void do_enter_rtas(struct rtas_args *args)
> >>  
> >>hard_irq_disable(); /* Ensure MSR[EE] is disabled on PPC64 */
> >>  
> >> +  if ((trace_rtas_input_enabled() || trace_rtas_output_enabled())) {
> >> +  /*
> >> +   * rtas_token_to_function() uses xarray which uses RCU,
> >> +   * but this code can run in the CPU offline path
> >> +   * (e.g. stop-self), after it's become invalid to call
> >> +   * RCU APIs.
> >> +   */
> >
> > We can call this in real-mode via pseries_machine_check_realmode
> > -> fwnmi_release_errinfo, so tracing should be disabled for that
> > case too... Does this_cpu_set_ftrace_enabled(0) in the early
> > machine check handler cover that sufficiently?
>
> I suspect so, but I'd like to verify. Do you know how I could exercise
> this path in qemu or LPAR?

I have some machine check injection patches for qemu but they never got
merged... I can point you to them if you need.

Should try get those merged again.

Thanks,
Nick


Re: [PATCH 03/13] powerpc/rtas: avoid device tree lookups in rtas_os_term()

2022-11-28 Thread Nicholas Piggin
On Tue Nov 29, 2022 at 4:26 AM AEST, Nathan Lynch wrote:
> "Nicholas Piggin"  writes:
> > On Sat Nov 19, 2022 at 1:07 AM AEST, Nathan Lynch wrote:
> >> rtas_os_term() is called during panic. Its behavior depends on a
> >> couple of conditions in the /rtas node of the device tree, the
> >> traversal of which entails locking and local IRQ state changes. If the
> >> kernel panics while devtree_lock is held, rtas_os_term() as currently
> >> written could hang.
> >
> > Nice.
> >
> >>
> >> Instead of discovering the relevant characteristics at panic time,
> >> cache them in file-static variables at boot. Note the lookup for
> >> "ibm,extended-os-term" is converted to of_property_read_bool() since
> >> it is a boolean property, not a RTAS function token.
> >
> > Small nit, but you could do that at the query site unless you
> > were going to start using ibm,os-term without the extended
> > capability.
>
> I'm unsure that this is what you're suggesting, but we don't want to use
> of_property_read_bool() in this context either, because it has the same
> undesirable qualities as rtas_token().

I mean rtas_initialize() could do

if (of_property_read_bool(rtas.dev, "ibm,extended-os-term"))
ibm_os_term_token = rtas_token("ibm,os-term");

Thanks,
Nick


Re: [PATCH] driver core: fix up some missing class.devnode() conversions.

2022-11-28 Thread Christophe Leroy


Le 28/11/2022 à 18:35, Greg Kroah-Hartman a écrit :
> In commit ff62b8e6588f ("driver core: make struct class.devnode() take a
> const *") the ->devnode callback changed the pointer to be const, but a
> few instances of PowerPC drivers were not caught for some reason.
> 
> Fix this up by changing the pointers to be const.

Build fails:

/linux/arch/powerpc/platforms/book3s/vas-api.c: In function 
'vas_register_coproc_api':
/linux/arch/powerpc/platforms/book3s/vas-api.c:590:31: error: assignment 
from incompatible pointer type [-Werror=incompatible-pointer-types]
   coproc_device.class->devnode = coproc_devnode;
^
cc1: all warnings being treated as errors

> 
> Reported-by: Stephen Rothwell 
> Cc: Michael Ellerman 
> Cc: Nicholas Piggin 
> Cc: Christophe Leroy 
> Cc: Frederic Barrat 
> Cc: Andrew Donnellan 
> Cc: Arnd Bergmann 
> Cc: linuxppc-dev@lists.ozlabs.org
> Fixes: ff62b8e6588f ("driver core: make struct class.devnode() take a const 
> *")
> Signed-off-by: Greg Kroah-Hartman 
> ---
>   arch/powerpc/platforms/book3s/vas-api.c | 2 +-
>   drivers/misc/cxl/file.c | 2 +-
>   drivers/misc/ocxl/file.c| 2 +-
>   3 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/book3s/vas-api.c 
> b/arch/powerpc/platforms/book3s/vas-api.c
> index 40f5ae5e1238..eb5bed333750 100644
> --- a/arch/powerpc/platforms/book3s/vas-api.c
> +++ b/arch/powerpc/platforms/book3s/vas-api.c
> @@ -53,7 +53,7 @@ struct coproc_instance {
>   struct vas_window *txwin;
>   };
>   
> -static char *coproc_devnode(struct device *dev, umode_t *mode)
> +static char *coproc_devnode(const struct device *dev, umode_t *mode)
>   {
>   return kasprintf(GFP_KERNEL, "crypto/%s", dev_name(dev));
>   }
> diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c
> index 3dbdce96fae0..5878329b011a 100644
> --- a/drivers/misc/cxl/file.c
> +++ b/drivers/misc/cxl/file.c
> @@ -546,7 +546,7 @@ static const struct file_operations afu_master_fops = {
>   };
>   
>   
> -static char *cxl_devnode(struct device *dev, umode_t *mode)
> +static char *cxl_devnode(const struct device *dev, umode_t *mode)
>   {
>   if (cpu_has_feature(CPU_FTR_HVMODE) &&
>   CXL_DEVT_IS_CARD(dev->devt)) {
> diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
> index d46dba2df5a1..d96be36405a0 100644
> --- a/drivers/misc/ocxl/file.c
> +++ b/drivers/misc/ocxl/file.c
> @@ -581,7 +581,7 @@ void ocxl_file_unregister_afu(struct ocxl_afu *afu)
>   device_unregister(>dev);
>   }
>   
> -static char *ocxl_devnode(struct device *dev, umode_t *mode)
> +static char *ocxl_devnode(const struct device *dev, umode_t *mode)
>   {
>   return kasprintf(GFP_KERNEL, "ocxl/%s", dev_name(dev));
>   }


[PATCH v4 7/7] powerpc/64: Sanitise user registers on interrupt in pseries, POWERNV

2022-11-28 Thread Rohan McLure
Cause pseries and POWERNV platforms to default to zeroising all potentially
user-defined registers when entering the kernel by means of any interrupt
source, reducing user-influence of the kernel and the likelihood or
producing speculation gadgets.

Signed-off-by: Rohan McLure 
---
Resubmitting patches as their own series after v6 partially merged:
Link: 
https://lore.kernel.org/all/166488988686.779920.13794870102696416283.b4...@ellerman.id.au/t/
v4: Default on POWERNV as well.
---
 arch/powerpc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 280c797e0f30..2ab114a02f62 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -534,7 +534,7 @@ config HOTPLUG_CPU
 config INTERRUPT_SANITIZE_REGISTERS
bool "Clear gprs on interrupt arrival"
depends on PPC64 && ARCH_HAS_SYSCALL_WRAPPER
-   default PPC_BOOK3E_64
+   default PPC_BOOK3E_64 || PPC_PSERIES || PPC_POWERNV
help
  Reduce the influence of user register state on interrupt handlers and
  syscalls through clearing user state from registers before handling
-- 
2.37.2



[PATCH v4 6/7] powerpc/64e: Clear gprs on interrupt routine entry on Book3E

2022-11-28 Thread Rohan McLure
Zero GPRS r14-r31 on entry into the kernel for interrupt sources to
limit influence of user-space values in potential speculation gadgets.
Prior to this commit, all other GPRS are reassigned during the common
prologue to interrupt handlers and so need not be zeroised explicitly.

This may be done safely, without loss of register state prior to the
interrupt, as the common prologue saves the initial values of
non-volatiles, which are unconditionally restored in interrupt_64.S.
Mitigation defaults to enabled by INTERRUPT_SANITIZE_REGISTERS.

Reviewed-by: Nicholas Piggin 
Signed-off-by: Rohan McLure 
---
Resubmitting patches as their own series after v6 partially merged:
Link: 
https://lore.kernel.org/all/166488988686.779920.13794870102696416283.b4...@ellerman.id.au/t/
---
 arch/powerpc/kernel/exceptions-64e.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S
index 930e36099015..6b842d5ea4e1 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -358,7 +358,6 @@ ret_from_mc_except:
std r14,PACA_EXMC+EX_R14(r13);  \
std r15,PACA_EXMC+EX_R15(r13)
 
-
 /* Core exception code for all exceptions except TLB misses. */
 #define EXCEPTION_COMMON_LVL(n, scratch, excf) \
 exc_##n##_common:  \
@@ -394,7 +393,8 @@ exc_##n##_common:   
\
std r12,STACK_FRAME_OVERHEAD-16(r1); /* mark the frame */   \
std r3,_TRAP(r1);   /* set trap number  */  \
std r0,RESULT(r1);  /* clear regs->result */\
-   SAVE_NVGPRS(r1);
+   SAVE_NVGPRS(r1);\
+   SANITIZE_ZEROIZE_NVGPRS();  /* minimise speculation influence */
 
 #define EXCEPTION_COMMON(n) \
EXCEPTION_COMMON_LVL(n, SPRN_SPRG_GEN_SCRATCH, PACA_EXGEN)
-- 
2.37.2



[PATCH v4 5/7] powerpc/64s: Zeroise gprs on interrupt routine entry on Book3S

2022-11-28 Thread Rohan McLure
Zeroise user state in gprs (assign to zero) to reduce the influence of user
registers on speculation within kernel syscall handlers. Clears occur
at the very beginning of the sc and scv 0 interrupt handlers, with
restores occurring following the execution of the syscall handler.

Zeroise GPRS r0, r2-r11, r14-r31, on entry into the kernel for all
other interrupt sources. The remaining gprs are overwritten by
entry macros to interrupt handlers, irrespective of whether or not a
given handler consumes these register values. If an interrupt does not
select the IMSR_R12 IOption, zeroise r12.

Prior to this commit, r14-r31 are restored on a per-interrupt basis at
exit, but now they are always restored on 64bit Book3S. Remove explicit
REST_NVGPRS invocations on 64-bit Book3S. 32-bit systems do not clear
user registers on interrupt, and continue to depend on the return value
of interrupt_exit_user_prepare to determine whether or not to restore
non-volatiles.

The mmap_bench benchmark in selftests should rapidly invoke pagefaults.
See ~0.8% performance regression with this mitigation, but this
indicates the worst-case performance due to heavier-weight interrupt
handlers. This mitigation is able to be enabled/disabled through
CONFIG_INTERRUPT_SANITIZE_REGISTERS.

Reviewed-by: Nicholas Piggin 
Signed-off-by: Rohan McLure 
---
v2: REST_NVGPRS should be conditional on mitigation in scv handler. Fix
improper multi-line preprocessor macro in interrupt_64.S
v4: Split off IMSR_R12 definition into its own patch. Move macro
definitions for register sanitisation into asm/ppc_asm.h
---
 arch/powerpc/kernel/exceptions-64s.S | 27 ++-
 arch/powerpc/kernel/interrupt_64.S   | 16 ++--
 2 files changed, 32 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 58d72db1d484..8ee3db3b6595 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -506,6 +506,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real, text)
std r10,0(r1)   /* make stack chain pointer */
std r0,GPR0(r1) /* save r0 in stackframe*/
std r10,GPR1(r1)/* save r1 in stackframe*/
+   ZEROIZE_GPR(0)
 
/* Mark our [H]SRRs valid for return */
li  r10,1
@@ -548,8 +549,14 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
std r9,GPR11(r1)
std r10,GPR12(r1)
std r11,GPR13(r1)
+   .if !IMSR_R12
+   ZEROIZE_GPRS(9, 12)
+   .else
+   ZEROIZE_GPRS(9, 11)
+   .endif
 
SAVE_NVGPRS(r1)
+   SANITIZE_ZEROIZE_NVGPRS()
 
.if IDAR
.if IISIDE
@@ -581,8 +588,8 @@ BEGIN_FTR_SECTION
 END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
ld  r10,IAREA+EX_CTR(r13)
std r10,_CTR(r1)
-   std r2,GPR2(r1) /* save r2 in stackframe*/
-   SAVE_GPRS(3, 8, r1) /* save r3 - r8 in stackframe   */
+   SAVE_GPRS(2, 8, r1) /* save r2 - r8 in stackframe   */
+   ZEROIZE_GPRS(2, 8)
mflrr9  /* Get LR, later save to stack  */
LOAD_PACA_TOC() /* get kernel TOC into r2   */
std r9,_LINK(r1)
@@ -700,6 +707,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
mtlrr9
ld  r9,_CCR(r1)
mtcrr9
+   SANITIZE_RESTORE_NVGPRS()
REST_GPRS(2, 13, r1)
REST_GPR(0, r1)
/* restore original r1. */
@@ -1445,7 +1453,7 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 * do_break() may have changed the NV GPRS while handling a breakpoint.
 * If so, we need to restore them with their updated values.
 */
-   REST_NVGPRS(r1)
+   HANDLER_RESTORE_NVGPRS()
b   interrupt_return_srr
 
 
@@ -1671,7 +1679,7 @@ EXC_COMMON_BEGIN(alignment_common)
GEN_COMMON alignment
addir3,r1,STACK_FRAME_OVERHEAD
bl  alignment_exception
-   REST_NVGPRS(r1) /* instruction emulation may change GPRs */
+   HANDLER_RESTORE_NVGPRS() /* instruction emulation may change GPRs */
b   interrupt_return_srr
 
 
@@ -1737,7 +1745,7 @@ EXC_COMMON_BEGIN(program_check_common)
 .Ldo_program_check:
addir3,r1,STACK_FRAME_OVERHEAD
bl  program_check_exception
-   REST_NVGPRS(r1) /* instruction emulation may change GPRs */
+   HANDLER_RESTORE_NVGPRS() /* instruction emulation may change GPRs */
b   interrupt_return_srr
 
 
@@ -2169,7 +2177,7 @@ EXC_COMMON_BEGIN(emulation_assist_common)
GEN_COMMON emulation_assist
addir3,r1,STACK_FRAME_OVERHEAD
bl  emulation_assist_interrupt
-   REST_NVGPRS(r1) /* instruction emulation may change GPRs */
+   HANDLER_RESTORE_NVGPRS() /* instruction emulation may change GPRs */
b   interrupt_return_hsrr
 
 
@@ -2489,7 +2497,7 @@ 

[PATCH v4 4/7] powerpc/64s: IOption for MSR stored in r12

2022-11-28 Thread Rohan McLure
Interrupt handlers in asm/exceptions-64s.S contain a great deal of common
code produced by the GEN_COMMON macros. Currently, at the exit point of
the macro, r12 will contain the contents of the MSR. A future patch will
cause these macros to zeroise architected registers to avoid potential
speculation influence of user data.

Provide an IOption that signals that r12 must be retained, as the
interrupt handler assumes it to hold the contents of the MSR.

Signed-off-by: Rohan McLure 
---
v4: Split 64s register sanitisation commit to establish this IOption
---
 arch/powerpc/kernel/exceptions-64s.S | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 5381a43e50fe..58d72db1d484 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -111,6 +111,7 @@ name:
 #define ISTACK .L_ISTACK_\name\()  /* Set regular kernel stack */
 #define __ISTACK(name) .L_ISTACK_ ## name
 #define IKUAP  .L_IKUAP_\name\()   /* Do KUAP lock */
+#define IMSR_R12   .L_IMSR_R12_\name\()/* Assumes MSR saved to r12 */
 
 #define INT_DEFINE_BEGIN(n)\
 .macro int_define_ ## n name
@@ -176,6 +177,9 @@ do_define_int n
.ifndef IKUAP
IKUAP=1
.endif
+   .ifndef IMSR_R12
+   IMSR_R12=0
+   .endif
 .endm
 
 /*
@@ -1751,6 +1755,7 @@ INT_DEFINE_BEGIN(fp_unavailable)
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
IKVM_REAL=1
 #endif
+   IMSR_R12=1
 INT_DEFINE_END(fp_unavailable)
 
 EXC_REAL_BEGIN(fp_unavailable, 0x800, 0x100)
@@ -2372,6 +2377,7 @@ INT_DEFINE_BEGIN(altivec_unavailable)
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
IKVM_REAL=1
 #endif
+   IMSR_R12=1
 INT_DEFINE_END(altivec_unavailable)
 
 EXC_REAL_BEGIN(altivec_unavailable, 0xf20, 0x20)
@@ -2421,6 +2427,7 @@ INT_DEFINE_BEGIN(vsx_unavailable)
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
IKVM_REAL=1
 #endif
+   IMSR_R12=1
 INT_DEFINE_END(vsx_unavailable)
 
 EXC_REAL_BEGIN(vsx_unavailable, 0xf40, 0x20)
-- 
2.37.2



[PATCH v4 3/7] powerpc/64: Sanitise common exit code for interrupts

2022-11-28 Thread Rohan McLure
Interrupt code is shared between Book3E/S 64-bit systems for interrupt
handlers. Ensure that exit code correctly restores non-volatile gprs on
each system when CONFIG_INTERRUPT_SANITIZE_REGISTERS is enabled.

Also introduce macros for clearing/restoring registers on interrupt
entry for when this configuration option is either disabled or enabled.

Signed-off-by: Rohan McLure 
---
v4: New patch
---
 arch/powerpc/kernel/interrupt_64.S | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/kernel/interrupt_64.S 
b/arch/powerpc/kernel/interrupt_64.S
index 978a173eb339..1ef4fdef74fb 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -408,9 +408,11 @@ interrupt_return_\srr\()_user: /* make backtraces match 
the _kernel variant */
 _ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\()_user)
addir3,r1,STACK_FRAME_OVERHEAD
bl  interrupt_exit_user_prepare
+#ifndef CONFIG_INTERRUPT_SANITIZE_REGISTERS
cmpdi   r3,0
bne-.Lrestore_nvgprs_\srr
 .Lrestore_nvgprs_\srr\()_cont:
+#endif
std r1,PACA_EXIT_SAVE_R1(r13) /* save r1 for restart */
 #ifdef CONFIG_PPC_BOOK3S
 .Linterrupt_return_\srr\()_user_rst_start:
@@ -424,6 +426,7 @@ _ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\()_user)
stb r11,PACAIRQHAPPENED(r13) # clear out possible HARD_DIS
 
 .Lfast_user_interrupt_return_\srr\():
+   SANITIZE_RESTORE_NVGPRS()
 #ifdef CONFIG_PPC_BOOK3S
.ifc \srr,srr
lbz r4,PACASRR_VALID(r13)
@@ -493,9 +496,11 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
b   .   /* prevent speculative execution */
 .Linterrupt_return_\srr\()_user_rst_end:
 
+#ifndef CONFIG_INTERRUPT_SANITIZE_REGISTERS
 .Lrestore_nvgprs_\srr\():
REST_NVGPRS(r1)
b   .Lrestore_nvgprs_\srr\()_cont
+#endif
 
 #ifdef CONFIG_PPC_BOOK3S
 interrupt_return_\srr\()_user_restart:
@@ -576,6 +581,7 @@ _ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\()_kernel)
stb r11,PACAIRQHAPPENED(r13) // clear the possible HARD_DIS
 
 .Lfast_kernel_interrupt_return_\srr\():
+   SANITIZE_RESTORE_NVGPRS()
cmpdi   cr1,r3,0
 #ifdef CONFIG_PPC_BOOK3S
.ifc \srr,srr
-- 
2.37.2



[PATCH v4 1/7] powerpc/64: Add INTERRUPT_SANITIZE_REGISTERS Kconfig

2022-11-28 Thread Rohan McLure
Add Kconfig option for enabling clearing of registers on arrival in an
interrupt handler. This reduces the speculation influence of registers
on kernel internals. The option will be consumed by 64-bit systems that
feature speculation and wish to implement this mitigation.

This patch only introduces the Kconfig option, no actual mitigations.

The primary overhead of this mitigation lies in an increased number of
registers that must be saved and restored by interrupt handlers on
Book3S systems. Enable by default on Book3E systems, which prior to
this patch eagerly save and restore register state, meaning that the
mitigation when implemented will have minimal overhead.

Acked-by: Nicholas Piggin 
Signed-off-by: Rohan McLure 
---
Resubmitting patches as their own series after v6 partially merged:
Link: 
https://lore.kernel.org/all/166488988686.779920.13794870102696416283.b4...@ellerman.id.au/t/
---
 arch/powerpc/Kconfig | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 4fd4924f6d50..280c797e0f30 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -531,6 +531,15 @@ config HOTPLUG_CPU
 
  Say N if you are unsure.
 
+config INTERRUPT_SANITIZE_REGISTERS
+   bool "Clear gprs on interrupt arrival"
+   depends on PPC64 && ARCH_HAS_SYSCALL_WRAPPER
+   default PPC_BOOK3E_64
+   help
+ Reduce the influence of user register state on interrupt handlers and
+ syscalls through clearing user state from registers before handling
+ the exception.
+
 config PPC_QUEUED_SPINLOCKS
bool "Queued spinlocks" if EXPERT
depends on SMP
-- 
2.37.2



[PATCH v4 2/7] powerpc/64: Add interrupt register sanitisation macros

2022-11-28 Thread Rohan McLure
Include in asm/ppc_asm.h macros to be used in multiple successive
patches to implement zeroising architected registers in interrupt
handlers. Registers will be sanitised in this fashion in future patches
to reduce the speculation influence of user-controlled register values.
These mitigations will be configurable through the
CONFIG_INTERRUPT_SANITIZE_REGISTERS Kconfig option.

Included are macros for conditionally zeroising registers and restoring
as required with the mitigation enabled. With the mitigation disabled,
non-volatiles must be restored on demand at separate locations to
those required by the mitigation.

Signed-off-by: Rohan McLure 
---
v4: New patch
---
 arch/powerpc/include/asm/ppc_asm.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc_asm.h 
b/arch/powerpc/include/asm/ppc_asm.h
index 753a2757bcd4..272b2795c36a 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -74,6 +74,23 @@
 #define SAVE_GPR(n, base)  SAVE_GPRS(n, n, base)
 #define REST_GPR(n, base)  REST_GPRS(n, n, base)
 
+/* macros for handling user register sanitisation */
+#ifdef CONFIG_INTERRUPT_SANITIZE_REGISTERS
+#define SANITIZE_ZEROIZE_SYSCALL_GPRS()ZEROIZE_GPR(0); 
\
+   ZEROIZE_GPRS(5, 12);\
+   ZEROIZE_NVGPRS()
+#define SANITIZE_ZEROIZE_INTERRUPT_NVGPRS()ZEROIZE_NVGPRS()
+#define SANITIZE_ZEROIZE_NVGPRS()  ZEROIZE_NVGPRS()
+#define SANITIZE_RESTORE_NVGPRS()  REST_NVGPRS(r1)
+#define HANDLER_RESTORE_NVGPRS()
+#else
+#define SANITIZE_ZEROIZE_INTERRUPT_NVGPRS()
+#define SANITIZE_ZEROIZE_SYSCALL_GPRS()
+#define SANITIZE_ZEROIZE_NVGPRS()
+#define SANITIZE_RESTORE_NVGPRS()
+#define HANDLER_RESTORE_NVGPRS()   REST_NVGPRS(r1)
+#endif /* CONFIG_INTERRUPT_SANITIZE_REGISTERS */
+
 #define SAVE_FPR(n, base)  stfdn,8*TS_FPRWIDTH*(n)(base)
 #define SAVE_2FPRS(n, base)SAVE_FPR(n, base); SAVE_FPR(n+1, base)
 #define SAVE_4FPRS(n, base)SAVE_2FPRS(n, base); SAVE_2FPRS(n+2, base)
-- 
2.37.2



Re: [PATCH 1/3] firmware_loader: remove #include

2022-11-28 Thread Russ Weight



On 11/25/22 21:09, Thomas Weißschuh wrote:
> utsrelease.h is potentially generated on each build.
> By removing this unused include we can get rid of some spurious
> recompilations.
Reviewed-by: Russ Weight 
> Signed-off-by: Thomas Weißschuh 
> ---
>  drivers/base/firmware_loader/firmware.h | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/base/firmware_loader/firmware.h 
> b/drivers/base/firmware_loader/firmware.h
> index fe77e91c38a2..bf549d6500d7 100644
> --- a/drivers/base/firmware_loader/firmware.h
> +++ b/drivers/base/firmware_loader/firmware.h
> @@ -9,8 +9,6 @@
>  #include 
>  #include 
>  
> -#include 
> -
>  /**
>   * enum fw_opt - options to control firmware loading behaviour
>   *
>
> base-commit: 0b1dcc2cf55ae6523c6fbd0d741b3ac28c9f4536



Re: [PATCH v6 0/4] Option to build big-endian with ELFv2 ABI

2022-11-28 Thread Joel Stanley
On Mon, 28 Nov 2022 at 04:16, Nicholas Piggin  wrote:
>
> This is hopefully the final attempt. Luis was happy for the module
> patch to go via the powerpc tree, so I've put the the ELFv2 for big
> endian build patches into the series. Hopefully we can deprecate
> the ELFv1 ABI
>
> Since v5, I cleaned up patch 2 as per Christophe's review. And patch
> 4 I removed the EXPERT depends so it's easier to test. It's marked as
> experimental, but we should soon make it default and try to deprecate
> the v1 ABI so we can eventually remove it.

Reviewed-by: Joel Stanley 

I did some builds and boot tested them in qemu fine.

>
> Thanks,
> Nick
>
> Nicholas Piggin (4):
>   module: add module_elf_check_arch for module-specific checks
>   powerpc/64: Add module check for ELF ABI version
>   powerpc/64: Add big-endian ELFv2 flavour to crypto VMX asm generation
>   powerpc/64: Option to build big-endian with ELFv2 ABI
>
>  arch/powerpc/Kconfig   | 21 +
>  arch/powerpc/kernel/module_64.c| 10 ++
>  arch/powerpc/platforms/Kconfig.cputype |  4 ++--
>  drivers/crypto/vmx/Makefile| 12 +++-
>  drivers/crypto/vmx/ppc-xlate.pl| 10 ++
>  include/linux/moduleloader.h   |  3 +++
>  kernel/module/main.c   | 10 ++
>  7 files changed, 63 insertions(+), 7 deletions(-)
>
> --
> 2.37.2
>


Re: [PATCH v6] livepatch: Clear relocation targets on a module removal

2022-11-28 Thread Song Liu
On Fri, Nov 18, 2022 at 8:24 AM Petr Mladek  wrote:
>
[...]

> >
> > +#ifdef CONFIG_LIVEPATCH
> > +void clear_relocate_add(Elf64_Shdr *sechdrs,
> > +const char *strtab,
> > +unsigned int symindex,
> > +unsigned int relsec,
> > +struct module *me)
> > +{
> > + unsigned int i;
> > + Elf64_Rela *rela = (void *)sechdrs[relsec].sh_addr;
> > + Elf64_Sym *sym;
> > + unsigned long *location;
> > + const char *symname;
> > + u32 *instruction;
> > +
> > + pr_debug("Clearing ADD relocate section %u to %u\n", relsec,
> > +  sechdrs[relsec].sh_info);
> > +
> > + for (i = 0; i < sechdrs[relsec].sh_size / sizeof(*rela); i++) {
> > + location = (void *)sechdrs[sechdrs[relsec].sh_info].sh_addr
> > + + rela[i].r_offset;
> > + sym = (Elf64_Sym *)sechdrs[symindex].sh_addr
> > + + ELF64_R_SYM(rela[i].r_info);
> > + symname = me->core_kallsyms.strtab
> > + + sym->st_name;
> > +
> > + if (ELF64_R_TYPE(rela[i].r_info) != R_PPC_REL24)
> > + continue;
> > + /*
> > +  * reverse the operations in apply_relocate_add() for case
> > +  * R_PPC_REL24.
> > +  */
> > + if (sym->st_shndx != SHN_UNDEF &&
> > + sym->st_shndx != SHN_LIVEPATCH)
> > + continue;
> > +
> > + instruction = (u32 *)location;
> > + if (is_mprofile_ftrace_call(symname))
> > + continue;
> > +
> > + if (!instr_is_relative_link_branch(ppc_inst(*instruction)))
> > + continue;
> > +
> > + instruction += 1;
> > + patch_instruction(instruction, ppc_inst(PPC_RAW_NOP()));
> > + }
> > +
> > +}
>
> This looks like a lot of duplicated code. Isn't it?

TBH, I think the duplicated code is not really bad.

apply_relocate_add() is a much more complicated function, I would
rather not mess it up to make this function a little simpler.

[...]

>
> This duplicates a lot of code. Please, rename apply_relocate_add() the
> same way as __apply_clear_relocate_add() and add the "apply" parameter.
> Then add the wrappers for this:
>
> int write_relocate_add(Elf64_Shdr *sechdrs,
>const char *strtab,
>unsigned int symindex,
>unsigned int relsec,
>struct module *me,
>bool apply)
> {
> int ret;
> bool early = me->state == MODULE_STATE_UNFORMED;
> void *(*write)(void *, const void *, size_t) = memcpy;
>
> if (!early) {
> write = text_poke;
> mutex_lock(_mutex);
> }

How about we move the "early" logic into __write_relocate_add()?

>
> ret = __write_relocate_add(sechdrs, strtab, symindex, relsec, me,
>  write, apply);
>
> if (!early) {
> text_poke_sync();
> mutex_unlock(_mutex);
> }
>
> return ret;
> }
>
> int apply_relocate_add(Elf64_Shdr *sechdrs,
>const char *strtab,
>unsigned int symindex,
>unsigned int relsec,
>struct module *me)
> {
> return write_relocate_add(sechdrs, strtab, symindex, relsec, me, 
> true);

Then we just call __write_relocate_add() from here...

> }
>
> #ifdef CONFIG_LIVEPATCH
> void apply_relocate_add(Elf64_Shdr *sechdrs,
> const char *strtab,
> unsigned int symindex,
> unsigned int relsec,
> struct module *me)
> {
> write_relocate_add(sechdrs, strtab, symindex, relsec, me, false);

and here.


> }
> #endif
>
>
> > +#endif
> > +
> >  #endif
> >
> >  int module_finalize(const Elf_Ehdr *hdr,
> > --- a/kernel/livepatch/core.c
> > +++ b/kernel/livepatch/core.c
> > @@ -316,6 +316,45 @@ int klp_apply_section_relocs(struct module *pmod, 
> > Elf_Shdr *sechdrs,
> >   return apply_relocate_add(sechdrs, strtab, symndx, secndx, pmod);
> >  }
> >
> > +static void klp_clear_object_relocations(struct module *pmod,
> > + struct klp_object *obj)
> > +{
> > + int i, cnt;
> > + const char *objname, *secname;
> > + char sec_objname[MODULE_NAME_LEN];
> > + Elf_Shdr *sec;
> > +
> > + objname = klp_is_module(obj) ? obj->name : "vmlinux";
> > +
> > + /* For each klp relocation section */
> > + for (i = 1; i < pmod->klp_info->hdr.e_shnum; i++) {
> > + sec = pmod->klp_info->sechdrs + i;
> > + secname = pmod->klp_info->secstrings + sec->sh_name;
> > + if (!(sec->sh_flags & SHF_RELA_LIVEPATCH))
> > + continue;
> > +
> > + /*
> > +  

[PATCH] soc: fsl: qe: Fix refcount leak in par_io_of_config

2022-11-28 Thread Zheng Yongjun
of_parse_phandle() returns a node pointer with refcount
incremented, we should use of_node_put() on error path.
Add missing of_node_put() to avoid refcount leak.

Fixes: 986585385131 ("[POWERPC] Add QUICC Engine (QE) infrastructure")
Signed-off-by: Zheng Yongjun 
---
 drivers/soc/fsl/qe/qe_io.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/soc/fsl/qe/qe_io.c b/drivers/soc/fsl/qe/qe_io.c
index a5e2d0e5ab51..e26836315167 100644
--- a/drivers/soc/fsl/qe/qe_io.c
+++ b/drivers/soc/fsl/qe/qe_io.c
@@ -159,11 +159,13 @@ int par_io_of_config(struct device_node *np)
pio_map = of_get_property(pio, "pio-map", _map_len);
if (pio_map == NULL) {
printk(KERN_ERR "pio-map is not set!\n");
+   of_node_put(pio);
return -1;
}
pio_map_len /= sizeof(unsigned int);
if ((pio_map_len % 6) != 0) {
printk(KERN_ERR "pio-map format wrong!\n");
+   of_node_put(pio);
return -1;
}

--
2.17.1



Re: [PATCH] powerpc: dts: turris1x.dts: Add channel labels for temperature sensor

2022-11-28 Thread Michael Ellerman
Pali Rohár  writes:
> Michael, could you take this patch?

Yep.

With these dts patches it always helps if you tell me that it passes the
DT schema checks, so that I don't get yelled at by the DT people :)

cheers

> On Sunday 09 October 2022 14:05:06 Pali Rohár wrote:
>> On Friday 30 September 2022 14:46:18 Pali Rohár wrote:
>> > + CC hwmon ML
>> > 
>> > On Friday 30 September 2022 14:39:01 Pali Rohár wrote:
>> > > Channel 0 of SA56004ED chip refers to internal SA56004ED chip sensor 
>> > > (chip
>> > > itself is located on the board) and channel 1 of SA56004ED chip refers to
>> > > external sensor which is connected to temperature diode of the P2020 CPU.
>> > > 
>> > > Fixes: 54c15ec3b738 ("powerpc: dts: Add DTS file for CZ.NIC Turris 1.x 
>> > > routers")
>> > > Signed-off-by: Pali Rohár 
>> > > ---
>> > > With this change userspace 'sensors' applications prints labels:
>> > > 
>> > > $ sensors
>> > > sa56004-i2c-0-4c
>> > > Adapter: MPC adapter (i2c@3000)
>> > > board:+34.2°C  (low  =  +0.0°C, high = +70.0°C)
>> > >(crit = +85.0°C, hyst = +75.0°C)
>> > > cpu:  +58.9°C  (low  =  +0.0°C, high = +70.0°C)
>> > >(crit = +85.0°C, hyst = +75.0°C)
>> > > 
>> > > And without this change it prints just generic tempX names:
>> > > 
>> > > $ sensors
>> > > sa56004-i2c-0-4c
>> > > Adapter: MPC adapter (i2c@3000)
>> > > temp1:+43.0°C  (low  =  +0.0°C, high = +70.0°C)
>> > >(crit = +85.0°C, hyst = +75.0°C)
>> > > temp2:+63.4°C  (low  =  +0.0°C, high = +70.0°C)
>> > >(crit = +85.0°C, hyst = +75.0°C)
>> > > ---
>> > >  arch/powerpc/boot/dts/turris1x.dts | 14 ++
>> > >  1 file changed, 14 insertions(+)
>> > > 
>> > > diff --git a/arch/powerpc/boot/dts/turris1x.dts 
>> > > b/arch/powerpc/boot/dts/turris1x.dts
>> > > index 4033c554b06a..5b5278c32e43 100644
>> > > --- a/arch/powerpc/boot/dts/turris1x.dts
>> > > +++ b/arch/powerpc/boot/dts/turris1x.dts
>> > > @@ -69,6 +69,20 @@
>> > >  interrupt-parent = <>;
>> > >  interrupts = <12 IRQ_TYPE_LEVEL_LOW>, 
>> > > /* GPIO12 - ALERT pin */
>> > >   <13 IRQ_TYPE_LEVEL_LOW>; 
>> > > /* GPIO13 - CRIT pin */
>> > > +#address-cells = <1>;
>> > > +#size-cells = <0>;
>> > > +
>> > > +/* Local temperature sensor (SA56004ED 
>> > > internal) */
>> > > +channel@0 {
>> > > +reg = <0>;
>> > > +label = "board";
>> > > +};
>> > > +
>> > > +/* Remote temperature sensor (D+/D- 
>> > > connected to P2020 CPU Temperature Diode) */
>> > > +channel@1 {
>> > > +reg = <1>;
>> > > +label = "cpu";
>> > > +};
>> > 
>> > I'm not sure if you want UPPERCASE, lowercase, PascalCase, kebab-case
>> > or snake_case format of labels. Or if you want also "temp" or
>> > "temperature" keyword in the label. So please adjust label to the
>> > preferred one, if proposed format is not the correct.
>> 
>> Ok, if nobody complains then please take this patch as is.
>> 
>> > >  };
>> > >  
>> > >  /* DDR3 SPD/EEPROM */
>> > > -- 
>> > > 2.20.1
>> > > 


Re: [PATCH 13/13] powerpc/rtas: place tracepoints in do_enter_rtas()

2022-11-28 Thread Michael Ellerman
Nathan Lynch  writes:
> "Nicholas Piggin"  writes:
>> On Sat Nov 19, 2022 at 1:07 AM AEST, Nathan Lynch wrote:
>>> Call the just-added rtas tracepoints in do_enter_rtas(), taking care
>>> to avoid function name lookups in the CPU offline path.
>>>
>>> Signed-off-by: Nathan Lynch 
>>> ---
>>>  arch/powerpc/kernel/rtas.c | 23 +++
>>>  1 file changed, 23 insertions(+)
>>>
>>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>>> index 198366d641d0..3487b42cfbf7 100644
>>> --- a/arch/powerpc/kernel/rtas.c
>>> +++ b/arch/powerpc/kernel/rtas.c
>>> @@ -38,6 +38,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>  #include 
>>>  
>>>  enum rtas_function_flags {
>>> @@ -525,6 +526,7 @@ void enter_rtas(unsigned long);
>>>  static void do_enter_rtas(struct rtas_args *args)
>>>  {
>>> unsigned long msr;
>>> +   const char *name = NULL;
>>>  
>>> /*
>>>  * Make sure MSR[RI] is currently enabled as it will be forced later
>>> @@ -537,9 +539,30 @@ static void do_enter_rtas(struct rtas_args *args)
>>>  
>>> hard_irq_disable(); /* Ensure MSR[EE] is disabled on PPC64 */
>>>  
>>> +   if ((trace_rtas_input_enabled() || trace_rtas_output_enabled())) {
>>> +   /*
>>> +* rtas_token_to_function() uses xarray which uses RCU,
>>> +* but this code can run in the CPU offline path
>>> +* (e.g. stop-self), after it's become invalid to call
>>> +* RCU APIs.
>>> +*/
>>
>> We can call this in real-mode via pseries_machine_check_realmode
>> -> fwnmi_release_errinfo, so tracing should be disabled for that
>> case too... Does this_cpu_set_ftrace_enabled(0) in the early
>> machine check handler cover that sufficiently?
>
> I suspect so, but I'd like to verify. Do you know how I could exercise
> this path in qemu or LPAR?

On a P9 or P10 LPAR you should be able to use 
tools/testing/selftests/powerpc/mce/inject-ra-err

cheers


Re: [PATCH 10/13] powerpc/rtas: improve function information lookups

2022-11-28 Thread Nathan Lynch
Andrew Donnellan  writes:
> On Fri, 2022-11-18 at 09:07 -0600, Nathan Lynch wrote:
>> Convert rtas_token() to use a lockless binary search on the function
>> table. Fall back to the old behavior for lookups against names that
>> are not known to be RTAS functions, but issue a warning. rtas_token()
>> is for function names; it is not a general facility for accessing
>> arbitrary properties of the /rtas node. All known misuses of
>> rtas_token() have been converted to more appropriate of_ APIs in
>> preceding changes.
>
> For in-kernel users, why not go all the way: make rtas_token() static
> and use it purely for the userspace API,

Not sure what userspace API refers to here, can you be more specific? I
don't think rtas_token() is exposed to user space.

> and switch kernel users over
> to using rtas_function_index directly?

Well, I have work in progress to transition all rtas_token() users to a
different API, but using opaque symbolic handles rather than exposing
the indexes. Something like:

/*
 * Opaque handle for client code to refer to RTAS functions. All valid
 * function handles are build-time constants prefixed with RTAS_FN_.
 */
typedef struct {
enum rtas_function_index index;
} rtas_fn_handle_t;

#define rtas_fn_handle(x_) (const rtas_fn_handle_t) { .index = rtas_fnidx(x_), }

#define RTAS_FN_CHECK_EXCEPTION   
rtas_fn_handle(CHECK_EXCEPTION)
#define RTAS_FN_DISPLAY_CHARACTER 
rtas_fn_handle(DISPLAY_CHARACTER)
#define RTAS_FN_EVENT_SCANrtas_fn_handle(EVENT_SCAN)

...

/**
 * rtas_function_token() - RTAS function token lookup.
 * @handle: Function handle, e.g. RTAS_FN_EVENT_SCAN.
 *
 * Context: Any context.
 * Return: the token value for the function if implemented by this platform,
 * otherwise RTAS_UNKNOWN_SERVICE.
 */
s32 rtas_function_token(const rtas_fn_handle_t handle)
{
const size_t index = handle.index;
const bool out_of_bounds = index >= ARRAY_SIZE(rtas_function_table);

if (WARN_ONCE(out_of_bounds, "invalid function index %zu", index))
return RTAS_UNKNOWN_SERVICE;

return rtas_function_table[index].token;
}

But that's a tree-wide change that would touch various drivers, and I
feel like the current series is what I can reasonably hope to get
applied right now.


Re: [PATCH 13/13] powerpc/rtas: place tracepoints in do_enter_rtas()

2022-11-28 Thread Nathan Lynch
"Nicholas Piggin"  writes:

> On Sat Nov 19, 2022 at 1:07 AM AEST, Nathan Lynch wrote:
>> Call the just-added rtas tracepoints in do_enter_rtas(), taking care
>> to avoid function name lookups in the CPU offline path.
>>
>> Signed-off-by: Nathan Lynch 
>> ---
>>  arch/powerpc/kernel/rtas.c | 23 +++
>>  1 file changed, 23 insertions(+)
>>
>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>> index 198366d641d0..3487b42cfbf7 100644
>> --- a/arch/powerpc/kernel/rtas.c
>> +++ b/arch/powerpc/kernel/rtas.c
>> @@ -38,6 +38,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include 
>>  
>>  enum rtas_function_flags {
>> @@ -525,6 +526,7 @@ void enter_rtas(unsigned long);
>>  static void do_enter_rtas(struct rtas_args *args)
>>  {
>>  unsigned long msr;
>> +const char *name = NULL;
>>  
>>  /*
>>   * Make sure MSR[RI] is currently enabled as it will be forced later
>> @@ -537,9 +539,30 @@ static void do_enter_rtas(struct rtas_args *args)
>>  
>>  hard_irq_disable(); /* Ensure MSR[EE] is disabled on PPC64 */
>>  
>> +if ((trace_rtas_input_enabled() || trace_rtas_output_enabled())) {
>> +/*
>> + * rtas_token_to_function() uses xarray which uses RCU,
>> + * but this code can run in the CPU offline path
>> + * (e.g. stop-self), after it's become invalid to call
>> + * RCU APIs.
>> + */
>
> We can call this in real-mode via pseries_machine_check_realmode
> -> fwnmi_release_errinfo, so tracing should be disabled for that
> case too... Does this_cpu_set_ftrace_enabled(0) in the early
> machine check handler cover that sufficiently?

I suspect so, but I'd like to verify. Do you know how I could exercise
this path in qemu or LPAR?


Re: [PATCH mm-unstable v1 16/20] mm/frame-vector: remove FOLL_FORCE usage

2022-11-28 Thread Andrew Morton
On Mon, 28 Nov 2022 09:18:47 +0100 David Hildenbrand  wrote:

> > Less chances of things going wrong that way.
> > 
> > Just mention in the v2 cover letter that the first patch was added to
> > make it easy to backport that fix without being hampered by merge
> > conflicts if it was added after your frame_vector.c patch.
> 
> Yes, that's the way I would naturally do, it, however, Andrew prefers 
> delta updates for minor changes.
> 
> @Andrew, whatever you prefer!

I'm inclined to let things sit as they are.  Cross-tree conflicts
happen, and Linus handles them.  I'll flag this (very simple) conflict
in the pull request, if MM merges second.  If v4l merges second then
hopefully they will do the same.  But this one is so simple that Linus
hardly needs our help.

But Linus won't be editing changelogs so that the changelog makes more
sense after both trees are joined.  I'm inclined to let the changelog
sit as it is as well.


Re: [PATCH 10/13] powerpc/rtas: improve function information lookups

2022-11-28 Thread Nathan Lynch
Nick Child  writes:
> On 11/18/22 09:07, Nathan Lynch wrote:
>> +static int __init rtas_token_to_function_xarray_init(void)
>> +{
>> +int err = 0;
>> +
>> +for (size_t i = 0; i < ARRAY_SIZE(rtas_function_table); ++i) {
>> +const struct rtas_function *func = _function_table[i];
>> +const s32 token = func->token;
>> +
>> +if (token == RTAS_UNKNOWN_SERVICE)
>> +continue;
>> +
>> +err = xa_err(xa_store(_token_to_function_xarray,
>> +  token, (void *)func, GFP_KERNEL));
>> +if (err)
>> +break;
>> +}
>> +
>> +return err;
>> +}
>> +arch_initcall(rtas_token_to_function_xarray_init);
>> +
>> +static const struct rtas_function *rtas_token_to_function(s32 token)
>> +{
>> +const struct rtas_function *func;
>> +
>> +if (WARN_ONCE(token < 0, "invalid token %d", token))
>> +return NULL;
>> +
>> +func = xa_load(_token_to_function_xarray, (unsigned long)token);
>> +
> Why typecast token here and not in xa_store?

No good reason. I'll add it to the xa_store() call site.

>> +static void __init rtas_function_table_init(void)
>> +{
>> +struct property *prop;
>> +
>> +for (size_t i = 0; i < ARRAY_SIZE(rtas_function_table); ++i) {
>> +struct rtas_function *curr = _function_table[i];
>> +struct rtas_function *prior;
>> +int cmp;
>> +
>> +curr->token = RTAS_UNKNOWN_SERVICE;
>> +
>> +if (i == 0)
>> +continue;
>> +/*
>> + * Ensure table is sorted correctly for binary search
>> + * on function names.
>> + */
>> +prior = _function_table[i - 1];
>> +
>> +cmp = strcmp(prior->name, curr->name);
>> +if (cmp < 0)
>> +continue;
>> +
>> +if (cmp == 0) {
>> +pr_err("'%s' has duplicate function table entries\n",
>> +   curr->name);
>> +} else {
>> +pr_err("function table unsorted: '%s' wrongly precedes 
>> '%s'\n",
>> +   prior->name, curr->name);
>> +}
>> +}
> Just a thought, would it be simpler to use sort()? you already have the
> cmp_func implemented for bsearch().

It's an option, but I think a tradeoff is that we would have to
sacrifice some const-ness in the data structures (i.e. remove the const
qualifier from struct rtas_function's fields). And the table has to be
in *some* order, so it may as well be sorted by name from the start.

That said, I don't love resorting to a boot-time check for this. We
could sidestep the issue by generating the C code for the table and
indexes at build time, but it's hard to justify the effort when the set
of RTAS functions changes very slowly over time.

> As for the series as a whole:
> I am no RTAS expert but was able to build, boot and mess around with new
> tracepoints without errors:
>
> Tested-by: Nick Child 

Thanks for testing and reviewing!


Re: [PATCH 10/13] powerpc/rtas: improve function information lookups

2022-11-28 Thread Nathan Lynch
Andrew Donnellan  writes:
> On Wed, 2022-11-23 at 13:32 -0600, Nick Child wrote:
>> On 11/22/22 20:51, Andrew Donnellan wrote:
>> > On Fri, 2022-11-18 at 09:07 -0600, Nathan Lynch wrote:
>> > > +enum rtas_function_flags {
>> > > +   RTAS_FN_FLAG_BANNED_FOR_SYSCALL_ON_LE = (1 << 0),
>> > > +};
>> > 
>> > This seems to be new, what's the justification?
>> > 
>> 
>> Seems to be a run-time replacement of:
>> #ifdef CONFIG_CPU_BIG_ENDIAN
>> { "ibm,suspend-me", -1, -1, -1, -1, -1 },
>> { "ibm,update-nodes", -1, 0, -1, -1, -1, 4096 },
>> { "ibm,update-properties", -1, 0, -1, -1, -1, 4096 },
>> #endif
>> 
>> It looks to be handled logically:
>> + if (IS_ENABLED(CONFIG_CPU_LITTLE_ENDIAN) &&
>> +   (func->flags & RTAS_FN_FLAG_BANNED_FOR_SYSCALL_ON_LE))
>> +   goto err;
>> 
>> Perhaps, also allow the addition of any future special cases
>> for rtas functions easier to maintain?
>
> Makes sense, though I'm slightly confused about the original rationale
> for the ifdef and why it's not being fixed in userspace.

Nick C's explanation is correct. I will make the commit message more
explicit about the conversion, and document the flag in the code.

The original rationale:

commit de0f7349a0dd072e54b5fc04c305907b22d28a5f
Author: Nathan Lynch 
Date:   Mon Dec 7 15:51:33 2020 -0600

powerpc/rtas: prevent suspend-related sys_rtas use on LE

While drmgr has had work in some areas to make its RTAS syscall
interactions endian-neutral, its code for performing partition
migration via the syscall has never worked on LE. While it is able to
complete ibm,suspend-me successfully, it crashes when attempting the
subsequent ibm,update-nodes call.

drmgr is the only known (or plausible) user of ibm,suspend-me,
ibm,update-nodes, and ibm,update-properties, so allow them only in
big-endian configurations.

To summarize: we know these functions have never had working users via
sys_rtas on ppc64le, and we want to keep it that way.

> Slightly clunky name though, something like
> RTAS_FN_FLAG_SYSCALL_BE_ONLY might be less clunky?

RTAS_FN_FLAG_BANNED_FOR_SYSCALL_ON_LE is verbose, but I think it
communicates better that we are consciously imposing a policy in a
specific context.


Re: [PATCH 05/13] powerpc/pseries/eeh: use correct API for error log size

2022-11-28 Thread Nathan Lynch
Andrew Donnellan  writes:
> On Fri, 2022-11-18 at 09:07 -0600, Nathan Lynch wrote:
>> rtas-error-log-max is not the name of an RTAS function, so
>> rtas_token() is not the appropriate API for retrieving its value. We
>> already have rtas_get_error_log_max() which returns a sensible value
>> if the property is absent for any reason, so use that instead.
>> 
>> Signed-off-by: Nathan Lynch 
>> Fixes: 8d633291b4fc ("powerpc/eeh: pseries platform EEH error log
>> retrieval")
>> ---
>>  arch/powerpc/platforms/pseries/eeh_pseries.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c
>> b/arch/powerpc/platforms/pseries/eeh_pseries.c
>> index 8e40ccac0f44..e5e4f4aa5afd 100644
>> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
>> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
>> @@ -848,7 +848,7 @@ static int __init eeh_pseries_init(void)
>> }
>>  
>> /* Initialize error log size */
>> -   eeh_error_buf_size = rtas_token("rtas-error-log-max");
>> +   eeh_error_buf_size = rtas_get_error_log_max();
>> if (eeh_error_buf_size == RTAS_UNKNOWN_SERVICE) {
>
> This is now impossible, and the whole block makes little sense after
> the next patch

Indeed, will fix in v2, thanks.


Re: [PATCH 03/13] powerpc/rtas: avoid device tree lookups in rtas_os_term()

2022-11-28 Thread Nathan Lynch
"Nicholas Piggin"  writes:
> On Sat Nov 19, 2022 at 1:07 AM AEST, Nathan Lynch wrote:
>> rtas_os_term() is called during panic. Its behavior depends on a
>> couple of conditions in the /rtas node of the device tree, the
>> traversal of which entails locking and local IRQ state changes. If the
>> kernel panics while devtree_lock is held, rtas_os_term() as currently
>> written could hang.
>
> Nice.
>
>>
>> Instead of discovering the relevant characteristics at panic time,
>> cache them in file-static variables at boot. Note the lookup for
>> "ibm,extended-os-term" is converted to of_property_read_bool() since
>> it is a boolean property, not a RTAS function token.
>
> Small nit, but you could do that at the query site unless you
> were going to start using ibm,os-term without the extended
> capability.

I'm unsure that this is what you're suggesting, but we don't want to use
of_property_read_bool() in this context either, because it has the same
undesirable qualities as rtas_token().

> Reviewed-by: Nicholas Piggin 

Thanks!


Re: [PATCH 03/13] powerpc/rtas: avoid device tree lookups in rtas_os_term()

2022-11-28 Thread Nathan Lynch
Andrew Donnellan  writes:
>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>> index c12dd5ed5e00..81e4996012b7 100644
>> --- a/arch/powerpc/kernel/rtas.c
>> +++ b/arch/powerpc/kernel/rtas.c
>> @@ -947,6 +947,8 @@ void __noreturn rtas_halt(void)
>>  
>>  /* Must be in the RMO region, so we place it here */
>>  static char rtas_os_term_buf[2048];
>> +static s32 ibm_os_term_token = RTAS_UNKNOWN_SERVICE;
>
> s32 and int are obviously identical, but rtas_token is declared using
> int, as are the other variables where we cache various tokens.

Right... I think it's better practice to use an explicitly sized type
where the data is directly derived from the device tree and ultimately
passed to the firmware call interface. Gradually enacting this while
tolerating some cosmetic inconsistency in the code seems OK to me, but
I'm open to other opinions.


[PATCH] driver core: fix up some missing class.devnode() conversions.

2022-11-28 Thread Greg Kroah-Hartman
In commit ff62b8e6588f ("driver core: make struct class.devnode() take a
const *") the ->devnode callback changed the pointer to be const, but a
few instances of PowerPC drivers were not caught for some reason.

Fix this up by changing the pointers to be const.

Reported-by: Stephen Rothwell 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: Frederic Barrat 
Cc: Andrew Donnellan 
Cc: Arnd Bergmann 
Cc: linuxppc-dev@lists.ozlabs.org
Fixes: ff62b8e6588f ("driver core: make struct class.devnode() take a const *")
Signed-off-by: Greg Kroah-Hartman 
---
 arch/powerpc/platforms/book3s/vas-api.c | 2 +-
 drivers/misc/cxl/file.c | 2 +-
 drivers/misc/ocxl/file.c| 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/book3s/vas-api.c 
b/arch/powerpc/platforms/book3s/vas-api.c
index 40f5ae5e1238..eb5bed333750 100644
--- a/arch/powerpc/platforms/book3s/vas-api.c
+++ b/arch/powerpc/platforms/book3s/vas-api.c
@@ -53,7 +53,7 @@ struct coproc_instance {
struct vas_window *txwin;
 };
 
-static char *coproc_devnode(struct device *dev, umode_t *mode)
+static char *coproc_devnode(const struct device *dev, umode_t *mode)
 {
return kasprintf(GFP_KERNEL, "crypto/%s", dev_name(dev));
 }
diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c
index 3dbdce96fae0..5878329b011a 100644
--- a/drivers/misc/cxl/file.c
+++ b/drivers/misc/cxl/file.c
@@ -546,7 +546,7 @@ static const struct file_operations afu_master_fops = {
 };
 
 
-static char *cxl_devnode(struct device *dev, umode_t *mode)
+static char *cxl_devnode(const struct device *dev, umode_t *mode)
 {
if (cpu_has_feature(CPU_FTR_HVMODE) &&
CXL_DEVT_IS_CARD(dev->devt)) {
diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
index d46dba2df5a1..d96be36405a0 100644
--- a/drivers/misc/ocxl/file.c
+++ b/drivers/misc/ocxl/file.c
@@ -581,7 +581,7 @@ void ocxl_file_unregister_afu(struct ocxl_afu *afu)
device_unregister(>dev);
 }
 
-static char *ocxl_devnode(struct device *dev, umode_t *mode)
+static char *ocxl_devnode(const struct device *dev, umode_t *mode)
 {
return kasprintf(GFP_KERNEL, "ocxl/%s", dev_name(dev));
 }
-- 
2.38.1



[linux-next:master] BUILD REGRESSION 15f2f20ccbf2d04cb14e3e7635aa0447208c71e7

2022-11-28 Thread kernel test robot
tree/branch: 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: 15f2f20ccbf2d04cb14e3e7635aa0447208c71e7  Add linux-next specific 
files for 20221128

Error/Warning reports:

https://lore.kernel.org/oe-kbuild-all/202211041320.coq8eelj-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202211080348.bosishom-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202211090634.ryfkk0ws-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202211242120.mzzvguln-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202211282102.qur7hhrw-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202211282244.c1faavio-...@intel.com

Error/Warning: (recently discovered and may have been fixed)

arch/arm/mach-s3c/devs.c:32:10: fatal error: linux/platform_data/dma-s3c24xx.h: 
No such file or directory
arch/powerpc/kernel/kvm_emul.o: warning: objtool: kvm_template_end(): can't 
find starting instruction
arch/powerpc/kernel/optprobes_head.o: warning: objtool: 
optprobe_template_end(): can't find starting instruction
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link_dp.c:5075:24: warning: 
implicit conversion from 'enum ' to 'enum dc_status' 
[-Wenum-conversion]
drivers/gpu/drm/amd/amdgpu/../display/dc/irq/dcn201/irq_service_dcn201.c:40:20: 
warning: no previous prototype for 'to_dal_irq_source_dcn201' 
[-Wmissing-prototypes]
drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c:451:1: warning: no previous 
prototype for 'gf100_fifo_nonstall_block' [-Wmissing-prototypes]
drivers/gpu/drm/nouveau/nvkm/engine/fifo/runl.c:34:1: warning: no previous 
prototype for 'nvkm_engn_cgrp_get' [-Wmissing-prototypes]
drivers/gpu/drm/nouveau/nvkm/engine/gr/tu102.c:210:1: warning: no previous 
prototype for 'tu102_gr_load' [-Wmissing-prototypes]
drivers/gpu/drm/nouveau/nvkm/nvfw/acr.c:49:1: warning: no previous prototype 
for 'wpr_generic_header_dump' [-Wmissing-prototypes]
drivers/gpu/drm/nouveau/nvkm/subdev/acr/lsfw.c:221:21: warning: variable 'loc' 
set but not used [-Wunused-but-set-variable]
vmlinux.o: warning: objtool: __btrfs_map_block+0x21ad: unreachable instruction
vmlinux.o: warning: objtool: btrfs_calc_avail_data_space+0x4f: unreachable 
instruction

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- alpha-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-core-dc_link_dp.c:warning:implicit-conversion-from-enum-anonymous-to-enum-dc_status
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-irq-dcn201-irq_service_dcn201.c:warning:no-previous-prototype-for-to_dal_irq_source_dcn201
|   |-- 
drivers-gpu-drm-nouveau-nvkm-engine-fifo-gf100.c:warning:no-previous-prototype-for-gf100_fifo_nonstall_block
|   |-- 
drivers-gpu-drm-nouveau-nvkm-engine-fifo-runl.c:warning:no-previous-prototype-for-nvkm_engn_cgrp_get
|   |-- 
drivers-gpu-drm-nouveau-nvkm-engine-gr-tu102.c:warning:no-previous-prototype-for-tu102_gr_load
|   |-- 
drivers-gpu-drm-nouveau-nvkm-nvfw-acr.c:warning:no-previous-prototype-for-wpr_generic_header_dump
|   `-- 
drivers-gpu-drm-nouveau-nvkm-subdev-acr-lsfw.c:warning:variable-loc-set-but-not-used
|-- alpha-randconfig-r031-20221128
|   |-- 
drivers-gpu-drm-nouveau-nvkm-engine-fifo-gf100.c:warning:no-previous-prototype-for-gf100_fifo_nonstall_block
|   |-- 
drivers-gpu-drm-nouveau-nvkm-engine-fifo-runl.c:warning:no-previous-prototype-for-nvkm_engn_cgrp_get
|   |-- 
drivers-gpu-drm-nouveau-nvkm-engine-gr-tu102.c:warning:no-previous-prototype-for-tu102_gr_load
|   |-- 
drivers-gpu-drm-nouveau-nvkm-nvfw-acr.c:warning:no-previous-prototype-for-wpr_generic_header_dump
|   `-- 
drivers-gpu-drm-nouveau-nvkm-subdev-acr-lsfw.c:warning:variable-loc-set-but-not-used
|-- arc-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-core-dc_link_dp.c:warning:implicit-conversion-from-enum-anonymous-to-enum-dc_status
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-irq-dcn201-irq_service_dcn201.c:warning:no-previous-prototype-for-to_dal_irq_source_dcn201
|   |-- 
drivers-gpu-drm-nouveau-nvkm-engine-fifo-gf100.c:warning:no-previous-prototype-for-gf100_fifo_nonstall_block
|   |-- 
drivers-gpu-drm-nouveau-nvkm-engine-fifo-runl.c:warning:no-previous-prototype-for-nvkm_engn_cgrp_get
|   |-- 
drivers-gpu-drm-nouveau-nvkm-engine-gr-tu102.c:warning:no-previous-prototype-for-tu102_gr_load
|   |-- 
drivers-gpu-drm-nouveau-nvkm-nvfw-acr.c:warning:no-previous-prototype-for-wpr_generic_header_dump
|   `-- 
drivers-gpu-drm-nouveau-nvkm-subdev-acr-lsfw.c:warning:variable-loc-set-but-not-used
|-- arc-randconfig-r024-20221127
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-core-dc_link_dp.c:warning:implicit-conversion-from-enum-anonymous-to-enum-dc_status
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-irq-dcn201-irq_service_dcn201.c:warning:no-previous-prototype-for-to_dal_irq_source_dcn201
|   |-- 
drivers-gpu-drm-nouveau-nvkm-engine-fifo-gf100.c:warning:no-previous-prototype-for-gf100_fifo_nonstall_block
|   |-- 
drivers-gpu-drm-nouveau-nvkm-engine-fifo-runl.c:warning:no-previous-prototype

Re: [PATCH linux-next][RFC]torture: avoid offline tick_do_timer_cpu

2022-11-28 Thread Paul E. McKenney
On Mon, Nov 28, 2022 at 09:12:28AM +0100, Thomas Gleixner wrote:
> On Sun, Nov 27 2022 at 09:53, Paul E. McKenney wrote:
> > On Sun, Nov 27, 2022 at 01:40:28PM +0100, Thomas Gleixner wrote:
> >> There are quite some reasons why a CPU-hotplug or a hot-unplug operation
> >> can fail, which is not a fatal problem, really.
> >> 
> >> So if a CPU hotplug operation fails, then why can't the torture test
> >> just move on and validate that the system still behaves correctly?
> >> 
> >> That gives us more coverage than just testing the good case and giving
> >> up when something unexpected happens.
> >
> > Agreed, with access to a function like the tick_nohz_full_timekeeper()
> > suggested earlier in this email thread, then yes, it would make sense to
> > try to offline the CPU anyway, then forgive the failure in cases where
> > the CPU matches that indicated by tick_nohz_full_timekeeper().
> 
> Why special casing this? There are other valid reasons why offlining can
> fail. So we special case timekeeper today and then next week we special
> case something else just because. That does not make sense. If it fails
> there is a reason and you can log it. The important part is that the
> system is functional and stable after the fail and the rollback.

Perhaps there are other valid reasons, but they have not been showing up
in my torture-test runs for well over a decade.  Not saying that they
don't happen, of course.  But if they involved (say) cgroups, then my
test setup would not exercise them.

So are you looking to introduce spurious CPU-hotplug failures?  If so,
these will also affect things like suspend/resume.  Plus it will make
it much more difficult to detect real but intermittent CPU-hotplug bugs,
which is the motivation for special-casing the tick_nohz_full_timekeeper()
failures.

So we should discuss introduciton of any spurious failures that might
be under consideration.

Independently of that, the torture_onoff() functions can of course keep
some sort of histogram of the failure return codes.  Or are there other
failure indications that should be captured?

> >> I even argue that the torture test should inject random failures into
> >> the hotplug state machine to achieve extended code coverage.
> >
> > I could imagine torture_onoff() telling various CPU-hotplug notifiers
> > to refuse the transition using some TBD interface.
> 
> There is already an interface which is exposed to sysfs which allows you
> to enforce a "fail" at a defined hotplug state.

If you would like me to be testing this as part of my normal testing
regimen, I will need an in-kernel interface.  Such an interface is of
course not needed for modprobe-style testing, in which case the script
doing the modprobe and rmmod can of course manipulate the sysfs files.
But I don't do that sort of testing very often.  And when I do, it is
almost always with kernels configured for Meta's fleet, which almost
never do CPU-offline operations.

Thanx, Paul

> > That would better test the CPU-hotplug common code's ability to deal
> > with failures.
> 
> Correct.
> 
> > Or did you have something else/additional in mind?
> 
> No.
> 
> Thanks,
> 
> tglx


Re: [PATCH mm-unstable v1 16/20] mm/frame-vector: remove FOLL_FORCE usage

2022-11-28 Thread Tomasz Figa
On Mon, Nov 28, 2022 at 5:19 PM David Hildenbrand  wrote:
>
> On 28.11.22 09:17, Hans Verkuil wrote:
> > Hi David,
> >
> > On 27/11/2022 11:35, David Hildenbrand wrote:
> >> On 16.11.22 11:26, David Hildenbrand wrote:
> >>> FOLL_FORCE is really only for ptrace access. According to commit
> >>> 707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are always
> >>> writable"), get_vaddr_frames() currently pins all pages writable as a
> >>> workaround for issues with read-only buffers.
> >>>
> >>> FOLL_FORCE, however, seems to be a legacy leftover as it predates
> >>> commit 707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are
> >>> always writable"). Let's just remove it.
> >>>
> >>> Once the read-only buffer issue has been resolved, FOLL_WRITE could
> >>> again be set depending on the DMA direction.
> >>>
> >>> Cc: Hans Verkuil 
> >>> Cc: Marek Szyprowski 
> >>> Cc: Tomasz Figa 
> >>> Cc: Marek Szyprowski 
> >>> Cc: Mauro Carvalho Chehab 
> >>> Signed-off-by: David Hildenbrand 
> >>> ---
> >>>drivers/media/common/videobuf2/frame_vector.c | 2 +-
> >>>1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/media/common/videobuf2/frame_vector.c 
> >>> b/drivers/media/common/videobuf2/frame_vector.c
> >>> index 542dde9d2609..062e98148c53 100644
> >>> --- a/drivers/media/common/videobuf2/frame_vector.c
> >>> +++ b/drivers/media/common/videobuf2/frame_vector.c
> >>> @@ -50,7 +50,7 @@ int get_vaddr_frames(unsigned long start, unsigned int 
> >>> nr_frames,
> >>>start = untagged_addr(start);
> >>>  ret = pin_user_pages_fast(start, nr_frames,
> >>> -  FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
> >>> +  FOLL_WRITE | FOLL_LONGTERM,
> >>>  (struct page **)(vec->ptrs));
> >>>if (ret > 0) {
> >>>vec->got_ref = true;
> >>
> >>
> >> Hi Andrew,
> >>
> >> see the discussion at [1] regarding a conflict and how to proceed with
> >> upstreaming. The conflict would be easy to resolve, however, also
> >> the patch description doesn't make sense anymore with [1].
> >
> > Might it be easier and less confusing if you post a v2 of this series
> > with my patch first? That way it is clear that 1) my patch has to come
> > first, and 2) that it is part of a single series and should be merged
> > by the mm subsystem.
> >
> > Less chances of things going wrong that way.
> >
> > Just mention in the v2 cover letter that the first patch was added to
> > make it easy to backport that fix without being hampered by merge
> > conflicts if it was added after your frame_vector.c patch.
>
> Yes, that's the way I would naturally do, it, however, Andrew prefers
> delta updates for minor changes.
>
> @Andrew, whatever you prefer!
>
> Thanks!
>

However you folks proceed with taking this patch, feel free to add my
Acked-by. Thanks!

Best regards,
Tomasz

> --
> Thanks,
>
> David / dhildenb
>


Re: [PATCH mm-unstable v1 16/20] mm/frame-vector: remove FOLL_FORCE usage

2022-11-28 Thread Hans Verkuil
On 28/11/2022 09:18, David Hildenbrand wrote:
> On 28.11.22 09:17, Hans Verkuil wrote:
>> Hi David,
>>
>> On 27/11/2022 11:35, David Hildenbrand wrote:
>>> On 16.11.22 11:26, David Hildenbrand wrote:
 FOLL_FORCE is really only for ptrace access. According to commit
 707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are always
 writable"), get_vaddr_frames() currently pins all pages writable as a
 workaround for issues with read-only buffers.

 FOLL_FORCE, however, seems to be a legacy leftover as it predates
 commit 707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are
 always writable"). Let's just remove it.

 Once the read-only buffer issue has been resolved, FOLL_WRITE could
 again be set depending on the DMA direction.

 Cc: Hans Verkuil 
 Cc: Marek Szyprowski 
 Cc: Tomasz Figa 
 Cc: Marek Szyprowski 
 Cc: Mauro Carvalho Chehab 
 Signed-off-by: David Hildenbrand 
 ---
    drivers/media/common/videobuf2/frame_vector.c | 2 +-
    1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/drivers/media/common/videobuf2/frame_vector.c 
 b/drivers/media/common/videobuf2/frame_vector.c
 index 542dde9d2609..062e98148c53 100644
 --- a/drivers/media/common/videobuf2/frame_vector.c
 +++ b/drivers/media/common/videobuf2/frame_vector.c
 @@ -50,7 +50,7 @@ int get_vaddr_frames(unsigned long start, unsigned int 
 nr_frames,
    start = untagged_addr(start);
      ret = pin_user_pages_fast(start, nr_frames,
 -  FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
 +  FOLL_WRITE | FOLL_LONGTERM,
  (struct page **)(vec->ptrs));
    if (ret > 0) {
    vec->got_ref = true;
>>>
>>>
>>> Hi Andrew,
>>>
>>> see the discussion at [1] regarding a conflict and how to proceed with
>>> upstreaming. The conflict would be easy to resolve, however, also
>>> the patch description doesn't make sense anymore with [1].
>>
>> Might it be easier and less confusing if you post a v2 of this series
>> with my patch first? That way it is clear that 1) my patch has to come
>> first, and 2) that it is part of a single series and should be merged
>> by the mm subsystem.
>>
>> Less chances of things going wrong that way.
>>
>> Just mention in the v2 cover letter that the first patch was added to
>> make it easy to backport that fix without being hampered by merge
>> conflicts if it was added after your frame_vector.c patch.
> 
> Yes, that's the way I would naturally do, it, however, Andrew prefers delta 
> updates for minor changes.
> 
> @Andrew, whatever you prefer!

Andrew, I've resent my patch, this time with you CCed as well.

Regards,

Hans

> 
> Thanks!
> 



Re: [PATCH mm-unstable v1 16/20] mm/frame-vector: remove FOLL_FORCE usage

2022-11-28 Thread David Hildenbrand

On 28.11.22 09:17, Hans Verkuil wrote:

Hi David,

On 27/11/2022 11:35, David Hildenbrand wrote:

On 16.11.22 11:26, David Hildenbrand wrote:

FOLL_FORCE is really only for ptrace access. According to commit
707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are always
writable"), get_vaddr_frames() currently pins all pages writable as a
workaround for issues with read-only buffers.

FOLL_FORCE, however, seems to be a legacy leftover as it predates
commit 707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are
always writable"). Let's just remove it.

Once the read-only buffer issue has been resolved, FOLL_WRITE could
again be set depending on the DMA direction.

Cc: Hans Verkuil 
Cc: Marek Szyprowski 
Cc: Tomasz Figa 
Cc: Marek Szyprowski 
Cc: Mauro Carvalho Chehab 
Signed-off-by: David Hildenbrand 
---
   drivers/media/common/videobuf2/frame_vector.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/media/common/videobuf2/frame_vector.c 
b/drivers/media/common/videobuf2/frame_vector.c
index 542dde9d2609..062e98148c53 100644
--- a/drivers/media/common/videobuf2/frame_vector.c
+++ b/drivers/media/common/videobuf2/frame_vector.c
@@ -50,7 +50,7 @@ int get_vaddr_frames(unsigned long start, unsigned int 
nr_frames,
   start = untagged_addr(start);
     ret = pin_user_pages_fast(start, nr_frames,
-  FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
+  FOLL_WRITE | FOLL_LONGTERM,
     (struct page **)(vec->ptrs));
   if (ret > 0) {
   vec->got_ref = true;



Hi Andrew,

see the discussion at [1] regarding a conflict and how to proceed with
upstreaming. The conflict would be easy to resolve, however, also
the patch description doesn't make sense anymore with [1].


Might it be easier and less confusing if you post a v2 of this series
with my patch first? That way it is clear that 1) my patch has to come
first, and 2) that it is part of a single series and should be merged
by the mm subsystem.

Less chances of things going wrong that way.

Just mention in the v2 cover letter that the first patch was added to
make it easy to backport that fix without being hampered by merge
conflicts if it was added after your frame_vector.c patch.


Yes, that's the way I would naturally do, it, however, Andrew prefers 
delta updates for minor changes.


@Andrew, whatever you prefer!

Thanks!

--
Thanks,

David / dhildenb



Re: [PATCH mm-unstable v1 16/20] mm/frame-vector: remove FOLL_FORCE usage

2022-11-28 Thread Hans Verkuil
Hi David,

On 27/11/2022 11:35, David Hildenbrand wrote:
> On 16.11.22 11:26, David Hildenbrand wrote:
>> FOLL_FORCE is really only for ptrace access. According to commit
>> 707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are always
>> writable"), get_vaddr_frames() currently pins all pages writable as a
>> workaround for issues with read-only buffers.
>>
>> FOLL_FORCE, however, seems to be a legacy leftover as it predates
>> commit 707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are
>> always writable"). Let's just remove it.
>>
>> Once the read-only buffer issue has been resolved, FOLL_WRITE could
>> again be set depending on the DMA direction.
>>
>> Cc: Hans Verkuil 
>> Cc: Marek Szyprowski 
>> Cc: Tomasz Figa 
>> Cc: Marek Szyprowski 
>> Cc: Mauro Carvalho Chehab 
>> Signed-off-by: David Hildenbrand 
>> ---
>>   drivers/media/common/videobuf2/frame_vector.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/media/common/videobuf2/frame_vector.c 
>> b/drivers/media/common/videobuf2/frame_vector.c
>> index 542dde9d2609..062e98148c53 100644
>> --- a/drivers/media/common/videobuf2/frame_vector.c
>> +++ b/drivers/media/common/videobuf2/frame_vector.c
>> @@ -50,7 +50,7 @@ int get_vaddr_frames(unsigned long start, unsigned int 
>> nr_frames,
>>   start = untagged_addr(start);
>>     ret = pin_user_pages_fast(start, nr_frames,
>> -  FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
>> +  FOLL_WRITE | FOLL_LONGTERM,
>>     (struct page **)(vec->ptrs));
>>   if (ret > 0) {
>>   vec->got_ref = true;
> 
> 
> Hi Andrew,
> 
> see the discussion at [1] regarding a conflict and how to proceed with
> upstreaming. The conflict would be easy to resolve, however, also
> the patch description doesn't make sense anymore with [1].

Might it be easier and less confusing if you post a v2 of this series
with my patch first? That way it is clear that 1) my patch has to come
first, and 2) that it is part of a single series and should be merged
by the mm subsystem.

Less chances of things going wrong that way.

Just mention in the v2 cover letter that the first patch was added to
make it easy to backport that fix without being hampered by merge
conflicts if it was added after your frame_vector.c patch.

Regards,

Hans

> 
> 
> On top of mm-unstable, reverting this patch and applying [1] gives me
> an updated patch:
> 
> 
> From 1e66c25f1467c1f1e5f275312f2c6df29308d4df Mon Sep 17 00:00:00 2001
> From: David Hildenbrand 
> Date: Wed, 16 Nov 2022 11:26:55 +0100
> Subject: [PATCH] mm/frame-vector: remove FOLL_FORCE usage
> 
> GUP now supports reliable R/O long-term pinning in COW mappings, such
> that we break COW early. MAP_SHARED VMAs only use the shared zeropage so
> far in one corner case (DAXFS file with holes), which can be ignored
> because GUP does not support long-term pinning in fsdax (see
> check_vma_flags()).
> 
> Consequently, FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM is no longer required
> for reliable R/O long-term pinning: FOLL_LONGTERM is sufficient. So stop
> using FOLL_FORCE, which is really only for ptrace access.
> 
> Reviewed-by: Daniel Vetter 
> Acked-by: Hans Verkuil 
> Cc: Hans Verkuil 
> Cc: Marek Szyprowski 
> Cc: Tomasz Figa 
> Cc: Marek Szyprowski 
> Cc: Mauro Carvalho Chehab 
> Signed-off-by: David Hildenbrand 
> ---
>  drivers/media/common/videobuf2/frame_vector.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/media/common/videobuf2/frame_vector.c 
> b/drivers/media/common/videobuf2/frame_vector.c
> index aad72640f055..8606fdacf5b8 100644
> --- a/drivers/media/common/videobuf2/frame_vector.c
> +++ b/drivers/media/common/videobuf2/frame_vector.c
> @@ -41,7 +41,7 @@ int get_vaddr_frames(unsigned long start, unsigned int 
> nr_frames, bool write,
>  int ret_pin_user_pages_fast = 0;
>  int ret = 0;
>  int err;
> -    unsigned int gup_flags = FOLL_FORCE | FOLL_LONGTERM;
> +    unsigned int gup_flags = FOLL_LONGTERM;
>  
>  if (nr_frames == 0)
>  return 0;



Re: [PATCH linux-next][RFC]torture: avoid offline tick_do_timer_cpu

2022-11-28 Thread Thomas Gleixner
On Sun, Nov 27 2022 at 09:53, Paul E. McKenney wrote:
> On Sun, Nov 27, 2022 at 01:40:28PM +0100, Thomas Gleixner wrote:
>> There are quite some reasons why a CPU-hotplug or a hot-unplug operation
>> can fail, which is not a fatal problem, really.
>> 
>> So if a CPU hotplug operation fails, then why can't the torture test
>> just move on and validate that the system still behaves correctly?
>> 
>> That gives us more coverage than just testing the good case and giving
>> up when something unexpected happens.
>
> Agreed, with access to a function like the tick_nohz_full_timekeeper()
> suggested earlier in this email thread, then yes, it would make sense to
> try to offline the CPU anyway, then forgive the failure in cases where
> the CPU matches that indicated by tick_nohz_full_timekeeper().

Why special casing this? There are other valid reasons why offlining can
fail. So we special case timekeeper today and then next week we special
case something else just because. That does not make sense. If it fails
there is a reason and you can log it. The important part is that the
system is functional and stable after the fail and the rollback.

>> I even argue that the torture test should inject random failures into
>> the hotplug state machine to achieve extended code coverage.
>
> I could imagine torture_onoff() telling various CPU-hotplug notifiers
> to refuse the transition using some TBD interface.

There is already an interface which is exposed to sysfs which allows you
to enforce a "fail" at a defined hotplug state.

> That would better test the CPU-hotplug common code's ability to deal
> with failures.

Correct.

> Or did you have something else/additional in mind?

No.

Thanks,

tglx