Re: [PATCH 10/13] powerpc/rtas: improve function information lookups
On Mon, 2022-11-28 at 18:08 -0600, Nathan Lynch wrote: > Andrew Donnellan writes: > > On Fri, 2022-11-18 at 09:07 -0600, Nathan Lynch wrote: > > > Convert rtas_token() to use a lockless binary search on the > > > function > > > table. Fall back to the old behavior for lookups against names > > > that > > > are not known to be RTAS functions, but issue a warning. > > > rtas_token() > > > is for function names; it is not a general facility for accessing > > > arbitrary properties of the /rtas node. All known misuses of > > > rtas_token() have been converted to more appropriate of_ APIs in > > > preceding changes. > > > > For in-kernel users, why not go all the way: make rtas_token() > > static > > and use it purely for the userspace API, > > Not sure what userspace API refers to here, can you be more specific? > I > don't think rtas_token() is exposed to user space. Right, what I actually meant to refer to here is the fact that sys_rtas takes a token, but when I wrote this I forgot that userspace doesn't pass a string, rather librtas implements rtas_token itself using the /proc interface to the device tree. > > > and switch kernel users over > > to using rtas_function_index directly? > > Well, I have work in progress to transition all rtas_token() users to > a > different API, but using opaque symbolic handles rather than exposing > the indexes. Something like: > > /* > * Opaque handle for client code to refer to RTAS functions. All > valid > * function handles are build-time constants prefixed with RTAS_FN_. > */ > typedef struct { > enum rtas_function_index index; > } rtas_fn_handle_t; > > #define rtas_fn_handle(x_) (const rtas_fn_handle_t) { .index = > rtas_fnidx(x_), } > > #define RTAS_FN_CHECK_EXCEPTION > rtas_fn_handle(CHECK_EXCEPTION) > #define RTAS_FN_DISPLAY_CHARACTER > rtas_fn_handle(DISPLAY_CHARACTER) > #define RTAS_FN_EVENT_SCAN > rtas_fn_handle(EVENT_SCAN) > > ... > > /** > * rtas_function_token() - RTAS function token lookup. > * @handle: Function handle, e.g. RTAS_FN_EVENT_SCAN. > * > * Context: Any context. > * Return: the token value for the function if implemented by this > platform, > * otherwise RTAS_UNKNOWN_SERVICE. > */ > s32 rtas_function_token(const rtas_fn_handle_t handle) > { > const size_t index = handle.index; > const bool out_of_bounds = index >= > ARRAY_SIZE(rtas_function_table); > > if (WARN_ONCE(out_of_bounds, "invalid function index %zu", > index)) > return RTAS_UNKNOWN_SERVICE; > > return rtas_function_table[index].token; > } > > But that's a tree-wide change that would touch various drivers, and I > feel like the current series is what I can reasonably hope to get > applied right now. It's not clear to me what the benefit of adding this additional layer of indirection would be. -- Andrew DonnellanOzLabs, ADL Canberra a...@linux.ibm.com IBM Australia Limited
Re: [PATCH 13/13] powerpc/rtas: place tracepoints in do_enter_rtas()
On Tue Nov 29, 2022 at 9:44 AM AEST, Nathan Lynch wrote: > "Nicholas Piggin" writes: > > > On Sat Nov 19, 2022 at 1:07 AM AEST, Nathan Lynch wrote: > >> Call the just-added rtas tracepoints in do_enter_rtas(), taking care > >> to avoid function name lookups in the CPU offline path. > >> > >> Signed-off-by: Nathan Lynch > >> --- > >> arch/powerpc/kernel/rtas.c | 23 +++ > >> 1 file changed, 23 insertions(+) > >> > >> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c > >> index 198366d641d0..3487b42cfbf7 100644 > >> --- a/arch/powerpc/kernel/rtas.c > >> +++ b/arch/powerpc/kernel/rtas.c > >> @@ -38,6 +38,7 @@ > >> #include > >> #include > >> #include > >> +#include > >> #include > >> > >> enum rtas_function_flags { > >> @@ -525,6 +526,7 @@ void enter_rtas(unsigned long); > >> static void do_enter_rtas(struct rtas_args *args) > >> { > >>unsigned long msr; > >> + const char *name = NULL; > >> > >>/* > >> * Make sure MSR[RI] is currently enabled as it will be forced later > >> @@ -537,9 +539,30 @@ static void do_enter_rtas(struct rtas_args *args) > >> > >>hard_irq_disable(); /* Ensure MSR[EE] is disabled on PPC64 */ > >> > >> + if ((trace_rtas_input_enabled() || trace_rtas_output_enabled())) { > >> + /* > >> + * rtas_token_to_function() uses xarray which uses RCU, > >> + * but this code can run in the CPU offline path > >> + * (e.g. stop-self), after it's become invalid to call > >> + * RCU APIs. > >> + */ > > > > We can call this in real-mode via pseries_machine_check_realmode > > -> fwnmi_release_errinfo, so tracing should be disabled for that > > case too... Does this_cpu_set_ftrace_enabled(0) in the early > > machine check handler cover that sufficiently? > > I suspect so, but I'd like to verify. Do you know how I could exercise > this path in qemu or LPAR? I have some machine check injection patches for qemu but they never got merged... I can point you to them if you need. Should try get those merged again. Thanks, Nick
Re: [PATCH 03/13] powerpc/rtas: avoid device tree lookups in rtas_os_term()
On Tue Nov 29, 2022 at 4:26 AM AEST, Nathan Lynch wrote: > "Nicholas Piggin" writes: > > On Sat Nov 19, 2022 at 1:07 AM AEST, Nathan Lynch wrote: > >> rtas_os_term() is called during panic. Its behavior depends on a > >> couple of conditions in the /rtas node of the device tree, the > >> traversal of which entails locking and local IRQ state changes. If the > >> kernel panics while devtree_lock is held, rtas_os_term() as currently > >> written could hang. > > > > Nice. > > > >> > >> Instead of discovering the relevant characteristics at panic time, > >> cache them in file-static variables at boot. Note the lookup for > >> "ibm,extended-os-term" is converted to of_property_read_bool() since > >> it is a boolean property, not a RTAS function token. > > > > Small nit, but you could do that at the query site unless you > > were going to start using ibm,os-term without the extended > > capability. > > I'm unsure that this is what you're suggesting, but we don't want to use > of_property_read_bool() in this context either, because it has the same > undesirable qualities as rtas_token(). I mean rtas_initialize() could do if (of_property_read_bool(rtas.dev, "ibm,extended-os-term")) ibm_os_term_token = rtas_token("ibm,os-term"); Thanks, Nick
Re: [PATCH] driver core: fix up some missing class.devnode() conversions.
Le 28/11/2022 à 18:35, Greg Kroah-Hartman a écrit : > In commit ff62b8e6588f ("driver core: make struct class.devnode() take a > const *") the ->devnode callback changed the pointer to be const, but a > few instances of PowerPC drivers were not caught for some reason. > > Fix this up by changing the pointers to be const. Build fails: /linux/arch/powerpc/platforms/book3s/vas-api.c: In function 'vas_register_coproc_api': /linux/arch/powerpc/platforms/book3s/vas-api.c:590:31: error: assignment from incompatible pointer type [-Werror=incompatible-pointer-types] coproc_device.class->devnode = coproc_devnode; ^ cc1: all warnings being treated as errors > > Reported-by: Stephen Rothwell > Cc: Michael Ellerman > Cc: Nicholas Piggin > Cc: Christophe Leroy > Cc: Frederic Barrat > Cc: Andrew Donnellan > Cc: Arnd Bergmann > Cc: linuxppc-dev@lists.ozlabs.org > Fixes: ff62b8e6588f ("driver core: make struct class.devnode() take a const > *") > Signed-off-by: Greg Kroah-Hartman > --- > arch/powerpc/platforms/book3s/vas-api.c | 2 +- > drivers/misc/cxl/file.c | 2 +- > drivers/misc/ocxl/file.c| 2 +- > 3 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/platforms/book3s/vas-api.c > b/arch/powerpc/platforms/book3s/vas-api.c > index 40f5ae5e1238..eb5bed333750 100644 > --- a/arch/powerpc/platforms/book3s/vas-api.c > +++ b/arch/powerpc/platforms/book3s/vas-api.c > @@ -53,7 +53,7 @@ struct coproc_instance { > struct vas_window *txwin; > }; > > -static char *coproc_devnode(struct device *dev, umode_t *mode) > +static char *coproc_devnode(const struct device *dev, umode_t *mode) > { > return kasprintf(GFP_KERNEL, "crypto/%s", dev_name(dev)); > } > diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c > index 3dbdce96fae0..5878329b011a 100644 > --- a/drivers/misc/cxl/file.c > +++ b/drivers/misc/cxl/file.c > @@ -546,7 +546,7 @@ static const struct file_operations afu_master_fops = { > }; > > > -static char *cxl_devnode(struct device *dev, umode_t *mode) > +static char *cxl_devnode(const struct device *dev, umode_t *mode) > { > if (cpu_has_feature(CPU_FTR_HVMODE) && > CXL_DEVT_IS_CARD(dev->devt)) { > diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c > index d46dba2df5a1..d96be36405a0 100644 > --- a/drivers/misc/ocxl/file.c > +++ b/drivers/misc/ocxl/file.c > @@ -581,7 +581,7 @@ void ocxl_file_unregister_afu(struct ocxl_afu *afu) > device_unregister(>dev); > } > > -static char *ocxl_devnode(struct device *dev, umode_t *mode) > +static char *ocxl_devnode(const struct device *dev, umode_t *mode) > { > return kasprintf(GFP_KERNEL, "ocxl/%s", dev_name(dev)); > }
[PATCH v4 7/7] powerpc/64: Sanitise user registers on interrupt in pseries, POWERNV
Cause pseries and POWERNV platforms to default to zeroising all potentially user-defined registers when entering the kernel by means of any interrupt source, reducing user-influence of the kernel and the likelihood or producing speculation gadgets. Signed-off-by: Rohan McLure --- Resubmitting patches as their own series after v6 partially merged: Link: https://lore.kernel.org/all/166488988686.779920.13794870102696416283.b4...@ellerman.id.au/t/ v4: Default on POWERNV as well. --- arch/powerpc/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 280c797e0f30..2ab114a02f62 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -534,7 +534,7 @@ config HOTPLUG_CPU config INTERRUPT_SANITIZE_REGISTERS bool "Clear gprs on interrupt arrival" depends on PPC64 && ARCH_HAS_SYSCALL_WRAPPER - default PPC_BOOK3E_64 + default PPC_BOOK3E_64 || PPC_PSERIES || PPC_POWERNV help Reduce the influence of user register state on interrupt handlers and syscalls through clearing user state from registers before handling -- 2.37.2
[PATCH v4 6/7] powerpc/64e: Clear gprs on interrupt routine entry on Book3E
Zero GPRS r14-r31 on entry into the kernel for interrupt sources to limit influence of user-space values in potential speculation gadgets. Prior to this commit, all other GPRS are reassigned during the common prologue to interrupt handlers and so need not be zeroised explicitly. This may be done safely, without loss of register state prior to the interrupt, as the common prologue saves the initial values of non-volatiles, which are unconditionally restored in interrupt_64.S. Mitigation defaults to enabled by INTERRUPT_SANITIZE_REGISTERS. Reviewed-by: Nicholas Piggin Signed-off-by: Rohan McLure --- Resubmitting patches as their own series after v6 partially merged: Link: https://lore.kernel.org/all/166488988686.779920.13794870102696416283.b4...@ellerman.id.au/t/ --- arch/powerpc/kernel/exceptions-64e.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S index 930e36099015..6b842d5ea4e1 100644 --- a/arch/powerpc/kernel/exceptions-64e.S +++ b/arch/powerpc/kernel/exceptions-64e.S @@ -358,7 +358,6 @@ ret_from_mc_except: std r14,PACA_EXMC+EX_R14(r13); \ std r15,PACA_EXMC+EX_R15(r13) - /* Core exception code for all exceptions except TLB misses. */ #define EXCEPTION_COMMON_LVL(n, scratch, excf) \ exc_##n##_common: \ @@ -394,7 +393,8 @@ exc_##n##_common: \ std r12,STACK_FRAME_OVERHEAD-16(r1); /* mark the frame */ \ std r3,_TRAP(r1); /* set trap number */ \ std r0,RESULT(r1); /* clear regs->result */\ - SAVE_NVGPRS(r1); + SAVE_NVGPRS(r1);\ + SANITIZE_ZEROIZE_NVGPRS(); /* minimise speculation influence */ #define EXCEPTION_COMMON(n) \ EXCEPTION_COMMON_LVL(n, SPRN_SPRG_GEN_SCRATCH, PACA_EXGEN) -- 2.37.2
[PATCH v4 5/7] powerpc/64s: Zeroise gprs on interrupt routine entry on Book3S
Zeroise user state in gprs (assign to zero) to reduce the influence of user registers on speculation within kernel syscall handlers. Clears occur at the very beginning of the sc and scv 0 interrupt handlers, with restores occurring following the execution of the syscall handler. Zeroise GPRS r0, r2-r11, r14-r31, on entry into the kernel for all other interrupt sources. The remaining gprs are overwritten by entry macros to interrupt handlers, irrespective of whether or not a given handler consumes these register values. If an interrupt does not select the IMSR_R12 IOption, zeroise r12. Prior to this commit, r14-r31 are restored on a per-interrupt basis at exit, but now they are always restored on 64bit Book3S. Remove explicit REST_NVGPRS invocations on 64-bit Book3S. 32-bit systems do not clear user registers on interrupt, and continue to depend on the return value of interrupt_exit_user_prepare to determine whether or not to restore non-volatiles. The mmap_bench benchmark in selftests should rapidly invoke pagefaults. See ~0.8% performance regression with this mitigation, but this indicates the worst-case performance due to heavier-weight interrupt handlers. This mitigation is able to be enabled/disabled through CONFIG_INTERRUPT_SANITIZE_REGISTERS. Reviewed-by: Nicholas Piggin Signed-off-by: Rohan McLure --- v2: REST_NVGPRS should be conditional on mitigation in scv handler. Fix improper multi-line preprocessor macro in interrupt_64.S v4: Split off IMSR_R12 definition into its own patch. Move macro definitions for register sanitisation into asm/ppc_asm.h --- arch/powerpc/kernel/exceptions-64s.S | 27 ++- arch/powerpc/kernel/interrupt_64.S | 16 ++-- 2 files changed, 32 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 58d72db1d484..8ee3db3b6595 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -506,6 +506,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real, text) std r10,0(r1) /* make stack chain pointer */ std r0,GPR0(r1) /* save r0 in stackframe*/ std r10,GPR1(r1)/* save r1 in stackframe*/ + ZEROIZE_GPR(0) /* Mark our [H]SRRs valid for return */ li r10,1 @@ -548,8 +549,14 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) std r9,GPR11(r1) std r10,GPR12(r1) std r11,GPR13(r1) + .if !IMSR_R12 + ZEROIZE_GPRS(9, 12) + .else + ZEROIZE_GPRS(9, 11) + .endif SAVE_NVGPRS(r1) + SANITIZE_ZEROIZE_NVGPRS() .if IDAR .if IISIDE @@ -581,8 +588,8 @@ BEGIN_FTR_SECTION END_FTR_SECTION_IFSET(CPU_FTR_CFAR) ld r10,IAREA+EX_CTR(r13) std r10,_CTR(r1) - std r2,GPR2(r1) /* save r2 in stackframe*/ - SAVE_GPRS(3, 8, r1) /* save r3 - r8 in stackframe */ + SAVE_GPRS(2, 8, r1) /* save r2 - r8 in stackframe */ + ZEROIZE_GPRS(2, 8) mflrr9 /* Get LR, later save to stack */ LOAD_PACA_TOC() /* get kernel TOC into r2 */ std r9,_LINK(r1) @@ -700,6 +707,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR) mtlrr9 ld r9,_CCR(r1) mtcrr9 + SANITIZE_RESTORE_NVGPRS() REST_GPRS(2, 13, r1) REST_GPR(0, r1) /* restore original r1. */ @@ -1445,7 +1453,7 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) * do_break() may have changed the NV GPRS while handling a breakpoint. * If so, we need to restore them with their updated values. */ - REST_NVGPRS(r1) + HANDLER_RESTORE_NVGPRS() b interrupt_return_srr @@ -1671,7 +1679,7 @@ EXC_COMMON_BEGIN(alignment_common) GEN_COMMON alignment addir3,r1,STACK_FRAME_OVERHEAD bl alignment_exception - REST_NVGPRS(r1) /* instruction emulation may change GPRs */ + HANDLER_RESTORE_NVGPRS() /* instruction emulation may change GPRs */ b interrupt_return_srr @@ -1737,7 +1745,7 @@ EXC_COMMON_BEGIN(program_check_common) .Ldo_program_check: addir3,r1,STACK_FRAME_OVERHEAD bl program_check_exception - REST_NVGPRS(r1) /* instruction emulation may change GPRs */ + HANDLER_RESTORE_NVGPRS() /* instruction emulation may change GPRs */ b interrupt_return_srr @@ -2169,7 +2177,7 @@ EXC_COMMON_BEGIN(emulation_assist_common) GEN_COMMON emulation_assist addir3,r1,STACK_FRAME_OVERHEAD bl emulation_assist_interrupt - REST_NVGPRS(r1) /* instruction emulation may change GPRs */ + HANDLER_RESTORE_NVGPRS() /* instruction emulation may change GPRs */ b interrupt_return_hsrr @@ -2489,7 +2497,7 @@
[PATCH v4 4/7] powerpc/64s: IOption for MSR stored in r12
Interrupt handlers in asm/exceptions-64s.S contain a great deal of common code produced by the GEN_COMMON macros. Currently, at the exit point of the macro, r12 will contain the contents of the MSR. A future patch will cause these macros to zeroise architected registers to avoid potential speculation influence of user data. Provide an IOption that signals that r12 must be retained, as the interrupt handler assumes it to hold the contents of the MSR. Signed-off-by: Rohan McLure --- v4: Split 64s register sanitisation commit to establish this IOption --- arch/powerpc/kernel/exceptions-64s.S | 7 +++ 1 file changed, 7 insertions(+) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 5381a43e50fe..58d72db1d484 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -111,6 +111,7 @@ name: #define ISTACK .L_ISTACK_\name\() /* Set regular kernel stack */ #define __ISTACK(name) .L_ISTACK_ ## name #define IKUAP .L_IKUAP_\name\() /* Do KUAP lock */ +#define IMSR_R12 .L_IMSR_R12_\name\()/* Assumes MSR saved to r12 */ #define INT_DEFINE_BEGIN(n)\ .macro int_define_ ## n name @@ -176,6 +177,9 @@ do_define_int n .ifndef IKUAP IKUAP=1 .endif + .ifndef IMSR_R12 + IMSR_R12=0 + .endif .endm /* @@ -1751,6 +1755,7 @@ INT_DEFINE_BEGIN(fp_unavailable) #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE IKVM_REAL=1 #endif + IMSR_R12=1 INT_DEFINE_END(fp_unavailable) EXC_REAL_BEGIN(fp_unavailable, 0x800, 0x100) @@ -2372,6 +2377,7 @@ INT_DEFINE_BEGIN(altivec_unavailable) #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE IKVM_REAL=1 #endif + IMSR_R12=1 INT_DEFINE_END(altivec_unavailable) EXC_REAL_BEGIN(altivec_unavailable, 0xf20, 0x20) @@ -2421,6 +2427,7 @@ INT_DEFINE_BEGIN(vsx_unavailable) #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE IKVM_REAL=1 #endif + IMSR_R12=1 INT_DEFINE_END(vsx_unavailable) EXC_REAL_BEGIN(vsx_unavailable, 0xf40, 0x20) -- 2.37.2
[PATCH v4 3/7] powerpc/64: Sanitise common exit code for interrupts
Interrupt code is shared between Book3E/S 64-bit systems for interrupt handlers. Ensure that exit code correctly restores non-volatile gprs on each system when CONFIG_INTERRUPT_SANITIZE_REGISTERS is enabled. Also introduce macros for clearing/restoring registers on interrupt entry for when this configuration option is either disabled or enabled. Signed-off-by: Rohan McLure --- v4: New patch --- arch/powerpc/kernel/interrupt_64.S | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S index 978a173eb339..1ef4fdef74fb 100644 --- a/arch/powerpc/kernel/interrupt_64.S +++ b/arch/powerpc/kernel/interrupt_64.S @@ -408,9 +408,11 @@ interrupt_return_\srr\()_user: /* make backtraces match the _kernel variant */ _ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\()_user) addir3,r1,STACK_FRAME_OVERHEAD bl interrupt_exit_user_prepare +#ifndef CONFIG_INTERRUPT_SANITIZE_REGISTERS cmpdi r3,0 bne-.Lrestore_nvgprs_\srr .Lrestore_nvgprs_\srr\()_cont: +#endif std r1,PACA_EXIT_SAVE_R1(r13) /* save r1 for restart */ #ifdef CONFIG_PPC_BOOK3S .Linterrupt_return_\srr\()_user_rst_start: @@ -424,6 +426,7 @@ _ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\()_user) stb r11,PACAIRQHAPPENED(r13) # clear out possible HARD_DIS .Lfast_user_interrupt_return_\srr\(): + SANITIZE_RESTORE_NVGPRS() #ifdef CONFIG_PPC_BOOK3S .ifc \srr,srr lbz r4,PACASRR_VALID(r13) @@ -493,9 +496,11 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS) b . /* prevent speculative execution */ .Linterrupt_return_\srr\()_user_rst_end: +#ifndef CONFIG_INTERRUPT_SANITIZE_REGISTERS .Lrestore_nvgprs_\srr\(): REST_NVGPRS(r1) b .Lrestore_nvgprs_\srr\()_cont +#endif #ifdef CONFIG_PPC_BOOK3S interrupt_return_\srr\()_user_restart: @@ -576,6 +581,7 @@ _ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\()_kernel) stb r11,PACAIRQHAPPENED(r13) // clear the possible HARD_DIS .Lfast_kernel_interrupt_return_\srr\(): + SANITIZE_RESTORE_NVGPRS() cmpdi cr1,r3,0 #ifdef CONFIG_PPC_BOOK3S .ifc \srr,srr -- 2.37.2
[PATCH v4 1/7] powerpc/64: Add INTERRUPT_SANITIZE_REGISTERS Kconfig
Add Kconfig option for enabling clearing of registers on arrival in an interrupt handler. This reduces the speculation influence of registers on kernel internals. The option will be consumed by 64-bit systems that feature speculation and wish to implement this mitigation. This patch only introduces the Kconfig option, no actual mitigations. The primary overhead of this mitigation lies in an increased number of registers that must be saved and restored by interrupt handlers on Book3S systems. Enable by default on Book3E systems, which prior to this patch eagerly save and restore register state, meaning that the mitigation when implemented will have minimal overhead. Acked-by: Nicholas Piggin Signed-off-by: Rohan McLure --- Resubmitting patches as their own series after v6 partially merged: Link: https://lore.kernel.org/all/166488988686.779920.13794870102696416283.b4...@ellerman.id.au/t/ --- arch/powerpc/Kconfig | 9 + 1 file changed, 9 insertions(+) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 4fd4924f6d50..280c797e0f30 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -531,6 +531,15 @@ config HOTPLUG_CPU Say N if you are unsure. +config INTERRUPT_SANITIZE_REGISTERS + bool "Clear gprs on interrupt arrival" + depends on PPC64 && ARCH_HAS_SYSCALL_WRAPPER + default PPC_BOOK3E_64 + help + Reduce the influence of user register state on interrupt handlers and + syscalls through clearing user state from registers before handling + the exception. + config PPC_QUEUED_SPINLOCKS bool "Queued spinlocks" if EXPERT depends on SMP -- 2.37.2
[PATCH v4 2/7] powerpc/64: Add interrupt register sanitisation macros
Include in asm/ppc_asm.h macros to be used in multiple successive patches to implement zeroising architected registers in interrupt handlers. Registers will be sanitised in this fashion in future patches to reduce the speculation influence of user-controlled register values. These mitigations will be configurable through the CONFIG_INTERRUPT_SANITIZE_REGISTERS Kconfig option. Included are macros for conditionally zeroising registers and restoring as required with the mitigation enabled. With the mitigation disabled, non-volatiles must be restored on demand at separate locations to those required by the mitigation. Signed-off-by: Rohan McLure --- v4: New patch --- arch/powerpc/include/asm/ppc_asm.h | 17 + 1 file changed, 17 insertions(+) diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h index 753a2757bcd4..272b2795c36a 100644 --- a/arch/powerpc/include/asm/ppc_asm.h +++ b/arch/powerpc/include/asm/ppc_asm.h @@ -74,6 +74,23 @@ #define SAVE_GPR(n, base) SAVE_GPRS(n, n, base) #define REST_GPR(n, base) REST_GPRS(n, n, base) +/* macros for handling user register sanitisation */ +#ifdef CONFIG_INTERRUPT_SANITIZE_REGISTERS +#define SANITIZE_ZEROIZE_SYSCALL_GPRS()ZEROIZE_GPR(0); \ + ZEROIZE_GPRS(5, 12);\ + ZEROIZE_NVGPRS() +#define SANITIZE_ZEROIZE_INTERRUPT_NVGPRS()ZEROIZE_NVGPRS() +#define SANITIZE_ZEROIZE_NVGPRS() ZEROIZE_NVGPRS() +#define SANITIZE_RESTORE_NVGPRS() REST_NVGPRS(r1) +#define HANDLER_RESTORE_NVGPRS() +#else +#define SANITIZE_ZEROIZE_INTERRUPT_NVGPRS() +#define SANITIZE_ZEROIZE_SYSCALL_GPRS() +#define SANITIZE_ZEROIZE_NVGPRS() +#define SANITIZE_RESTORE_NVGPRS() +#define HANDLER_RESTORE_NVGPRS() REST_NVGPRS(r1) +#endif /* CONFIG_INTERRUPT_SANITIZE_REGISTERS */ + #define SAVE_FPR(n, base) stfdn,8*TS_FPRWIDTH*(n)(base) #define SAVE_2FPRS(n, base)SAVE_FPR(n, base); SAVE_FPR(n+1, base) #define SAVE_4FPRS(n, base)SAVE_2FPRS(n, base); SAVE_2FPRS(n+2, base) -- 2.37.2
Re: [PATCH 1/3] firmware_loader: remove #include
On 11/25/22 21:09, Thomas Weißschuh wrote: > utsrelease.h is potentially generated on each build. > By removing this unused include we can get rid of some spurious > recompilations. Reviewed-by: Russ Weight > Signed-off-by: Thomas Weißschuh > --- > drivers/base/firmware_loader/firmware.h | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/drivers/base/firmware_loader/firmware.h > b/drivers/base/firmware_loader/firmware.h > index fe77e91c38a2..bf549d6500d7 100644 > --- a/drivers/base/firmware_loader/firmware.h > +++ b/drivers/base/firmware_loader/firmware.h > @@ -9,8 +9,6 @@ > #include > #include > > -#include > - > /** > * enum fw_opt - options to control firmware loading behaviour > * > > base-commit: 0b1dcc2cf55ae6523c6fbd0d741b3ac28c9f4536
Re: [PATCH v6 0/4] Option to build big-endian with ELFv2 ABI
On Mon, 28 Nov 2022 at 04:16, Nicholas Piggin wrote: > > This is hopefully the final attempt. Luis was happy for the module > patch to go via the powerpc tree, so I've put the the ELFv2 for big > endian build patches into the series. Hopefully we can deprecate > the ELFv1 ABI > > Since v5, I cleaned up patch 2 as per Christophe's review. And patch > 4 I removed the EXPERT depends so it's easier to test. It's marked as > experimental, but we should soon make it default and try to deprecate > the v1 ABI so we can eventually remove it. Reviewed-by: Joel Stanley I did some builds and boot tested them in qemu fine. > > Thanks, > Nick > > Nicholas Piggin (4): > module: add module_elf_check_arch for module-specific checks > powerpc/64: Add module check for ELF ABI version > powerpc/64: Add big-endian ELFv2 flavour to crypto VMX asm generation > powerpc/64: Option to build big-endian with ELFv2 ABI > > arch/powerpc/Kconfig | 21 + > arch/powerpc/kernel/module_64.c| 10 ++ > arch/powerpc/platforms/Kconfig.cputype | 4 ++-- > drivers/crypto/vmx/Makefile| 12 +++- > drivers/crypto/vmx/ppc-xlate.pl| 10 ++ > include/linux/moduleloader.h | 3 +++ > kernel/module/main.c | 10 ++ > 7 files changed, 63 insertions(+), 7 deletions(-) > > -- > 2.37.2 >
Re: [PATCH v6] livepatch: Clear relocation targets on a module removal
On Fri, Nov 18, 2022 at 8:24 AM Petr Mladek wrote: > [...] > > > > +#ifdef CONFIG_LIVEPATCH > > +void clear_relocate_add(Elf64_Shdr *sechdrs, > > +const char *strtab, > > +unsigned int symindex, > > +unsigned int relsec, > > +struct module *me) > > +{ > > + unsigned int i; > > + Elf64_Rela *rela = (void *)sechdrs[relsec].sh_addr; > > + Elf64_Sym *sym; > > + unsigned long *location; > > + const char *symname; > > + u32 *instruction; > > + > > + pr_debug("Clearing ADD relocate section %u to %u\n", relsec, > > + sechdrs[relsec].sh_info); > > + > > + for (i = 0; i < sechdrs[relsec].sh_size / sizeof(*rela); i++) { > > + location = (void *)sechdrs[sechdrs[relsec].sh_info].sh_addr > > + + rela[i].r_offset; > > + sym = (Elf64_Sym *)sechdrs[symindex].sh_addr > > + + ELF64_R_SYM(rela[i].r_info); > > + symname = me->core_kallsyms.strtab > > + + sym->st_name; > > + > > + if (ELF64_R_TYPE(rela[i].r_info) != R_PPC_REL24) > > + continue; > > + /* > > + * reverse the operations in apply_relocate_add() for case > > + * R_PPC_REL24. > > + */ > > + if (sym->st_shndx != SHN_UNDEF && > > + sym->st_shndx != SHN_LIVEPATCH) > > + continue; > > + > > + instruction = (u32 *)location; > > + if (is_mprofile_ftrace_call(symname)) > > + continue; > > + > > + if (!instr_is_relative_link_branch(ppc_inst(*instruction))) > > + continue; > > + > > + instruction += 1; > > + patch_instruction(instruction, ppc_inst(PPC_RAW_NOP())); > > + } > > + > > +} > > This looks like a lot of duplicated code. Isn't it? TBH, I think the duplicated code is not really bad. apply_relocate_add() is a much more complicated function, I would rather not mess it up to make this function a little simpler. [...] > > This duplicates a lot of code. Please, rename apply_relocate_add() the > same way as __apply_clear_relocate_add() and add the "apply" parameter. > Then add the wrappers for this: > > int write_relocate_add(Elf64_Shdr *sechdrs, >const char *strtab, >unsigned int symindex, >unsigned int relsec, >struct module *me, >bool apply) > { > int ret; > bool early = me->state == MODULE_STATE_UNFORMED; > void *(*write)(void *, const void *, size_t) = memcpy; > > if (!early) { > write = text_poke; > mutex_lock(_mutex); > } How about we move the "early" logic into __write_relocate_add()? > > ret = __write_relocate_add(sechdrs, strtab, symindex, relsec, me, > write, apply); > > if (!early) { > text_poke_sync(); > mutex_unlock(_mutex); > } > > return ret; > } > > int apply_relocate_add(Elf64_Shdr *sechdrs, >const char *strtab, >unsigned int symindex, >unsigned int relsec, >struct module *me) > { > return write_relocate_add(sechdrs, strtab, symindex, relsec, me, > true); Then we just call __write_relocate_add() from here... > } > > #ifdef CONFIG_LIVEPATCH > void apply_relocate_add(Elf64_Shdr *sechdrs, > const char *strtab, > unsigned int symindex, > unsigned int relsec, > struct module *me) > { > write_relocate_add(sechdrs, strtab, symindex, relsec, me, false); and here. > } > #endif > > > > +#endif > > + > > #endif > > > > int module_finalize(const Elf_Ehdr *hdr, > > --- a/kernel/livepatch/core.c > > +++ b/kernel/livepatch/core.c > > @@ -316,6 +316,45 @@ int klp_apply_section_relocs(struct module *pmod, > > Elf_Shdr *sechdrs, > > return apply_relocate_add(sechdrs, strtab, symndx, secndx, pmod); > > } > > > > +static void klp_clear_object_relocations(struct module *pmod, > > + struct klp_object *obj) > > +{ > > + int i, cnt; > > + const char *objname, *secname; > > + char sec_objname[MODULE_NAME_LEN]; > > + Elf_Shdr *sec; > > + > > + objname = klp_is_module(obj) ? obj->name : "vmlinux"; > > + > > + /* For each klp relocation section */ > > + for (i = 1; i < pmod->klp_info->hdr.e_shnum; i++) { > > + sec = pmod->klp_info->sechdrs + i; > > + secname = pmod->klp_info->secstrings + sec->sh_name; > > + if (!(sec->sh_flags & SHF_RELA_LIVEPATCH)) > > + continue; > > + > > + /* > > +
[PATCH] soc: fsl: qe: Fix refcount leak in par_io_of_config
of_parse_phandle() returns a node pointer with refcount incremented, we should use of_node_put() on error path. Add missing of_node_put() to avoid refcount leak. Fixes: 986585385131 ("[POWERPC] Add QUICC Engine (QE) infrastructure") Signed-off-by: Zheng Yongjun --- drivers/soc/fsl/qe/qe_io.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/soc/fsl/qe/qe_io.c b/drivers/soc/fsl/qe/qe_io.c index a5e2d0e5ab51..e26836315167 100644 --- a/drivers/soc/fsl/qe/qe_io.c +++ b/drivers/soc/fsl/qe/qe_io.c @@ -159,11 +159,13 @@ int par_io_of_config(struct device_node *np) pio_map = of_get_property(pio, "pio-map", _map_len); if (pio_map == NULL) { printk(KERN_ERR "pio-map is not set!\n"); + of_node_put(pio); return -1; } pio_map_len /= sizeof(unsigned int); if ((pio_map_len % 6) != 0) { printk(KERN_ERR "pio-map format wrong!\n"); + of_node_put(pio); return -1; } -- 2.17.1
Re: [PATCH] powerpc: dts: turris1x.dts: Add channel labels for temperature sensor
Pali Rohár writes: > Michael, could you take this patch? Yep. With these dts patches it always helps if you tell me that it passes the DT schema checks, so that I don't get yelled at by the DT people :) cheers > On Sunday 09 October 2022 14:05:06 Pali Rohár wrote: >> On Friday 30 September 2022 14:46:18 Pali Rohár wrote: >> > + CC hwmon ML >> > >> > On Friday 30 September 2022 14:39:01 Pali Rohár wrote: >> > > Channel 0 of SA56004ED chip refers to internal SA56004ED chip sensor >> > > (chip >> > > itself is located on the board) and channel 1 of SA56004ED chip refers to >> > > external sensor which is connected to temperature diode of the P2020 CPU. >> > > >> > > Fixes: 54c15ec3b738 ("powerpc: dts: Add DTS file for CZ.NIC Turris 1.x >> > > routers") >> > > Signed-off-by: Pali Rohár >> > > --- >> > > With this change userspace 'sensors' applications prints labels: >> > > >> > > $ sensors >> > > sa56004-i2c-0-4c >> > > Adapter: MPC adapter (i2c@3000) >> > > board:+34.2°C (low = +0.0°C, high = +70.0°C) >> > >(crit = +85.0°C, hyst = +75.0°C) >> > > cpu: +58.9°C (low = +0.0°C, high = +70.0°C) >> > >(crit = +85.0°C, hyst = +75.0°C) >> > > >> > > And without this change it prints just generic tempX names: >> > > >> > > $ sensors >> > > sa56004-i2c-0-4c >> > > Adapter: MPC adapter (i2c@3000) >> > > temp1:+43.0°C (low = +0.0°C, high = +70.0°C) >> > >(crit = +85.0°C, hyst = +75.0°C) >> > > temp2:+63.4°C (low = +0.0°C, high = +70.0°C) >> > >(crit = +85.0°C, hyst = +75.0°C) >> > > --- >> > > arch/powerpc/boot/dts/turris1x.dts | 14 ++ >> > > 1 file changed, 14 insertions(+) >> > > >> > > diff --git a/arch/powerpc/boot/dts/turris1x.dts >> > > b/arch/powerpc/boot/dts/turris1x.dts >> > > index 4033c554b06a..5b5278c32e43 100644 >> > > --- a/arch/powerpc/boot/dts/turris1x.dts >> > > +++ b/arch/powerpc/boot/dts/turris1x.dts >> > > @@ -69,6 +69,20 @@ >> > > interrupt-parent = <>; >> > > interrupts = <12 IRQ_TYPE_LEVEL_LOW>, >> > > /* GPIO12 - ALERT pin */ >> > > <13 IRQ_TYPE_LEVEL_LOW>; >> > > /* GPIO13 - CRIT pin */ >> > > +#address-cells = <1>; >> > > +#size-cells = <0>; >> > > + >> > > +/* Local temperature sensor (SA56004ED >> > > internal) */ >> > > +channel@0 { >> > > +reg = <0>; >> > > +label = "board"; >> > > +}; >> > > + >> > > +/* Remote temperature sensor (D+/D- >> > > connected to P2020 CPU Temperature Diode) */ >> > > +channel@1 { >> > > +reg = <1>; >> > > +label = "cpu"; >> > > +}; >> > >> > I'm not sure if you want UPPERCASE, lowercase, PascalCase, kebab-case >> > or snake_case format of labels. Or if you want also "temp" or >> > "temperature" keyword in the label. So please adjust label to the >> > preferred one, if proposed format is not the correct. >> >> Ok, if nobody complains then please take this patch as is. >> >> > > }; >> > > >> > > /* DDR3 SPD/EEPROM */ >> > > -- >> > > 2.20.1 >> > >
Re: [PATCH 13/13] powerpc/rtas: place tracepoints in do_enter_rtas()
Nathan Lynch writes: > "Nicholas Piggin" writes: >> On Sat Nov 19, 2022 at 1:07 AM AEST, Nathan Lynch wrote: >>> Call the just-added rtas tracepoints in do_enter_rtas(), taking care >>> to avoid function name lookups in the CPU offline path. >>> >>> Signed-off-by: Nathan Lynch >>> --- >>> arch/powerpc/kernel/rtas.c | 23 +++ >>> 1 file changed, 23 insertions(+) >>> >>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c >>> index 198366d641d0..3487b42cfbf7 100644 >>> --- a/arch/powerpc/kernel/rtas.c >>> +++ b/arch/powerpc/kernel/rtas.c >>> @@ -38,6 +38,7 @@ >>> #include >>> #include >>> #include >>> +#include >>> #include >>> >>> enum rtas_function_flags { >>> @@ -525,6 +526,7 @@ void enter_rtas(unsigned long); >>> static void do_enter_rtas(struct rtas_args *args) >>> { >>> unsigned long msr; >>> + const char *name = NULL; >>> >>> /* >>> * Make sure MSR[RI] is currently enabled as it will be forced later >>> @@ -537,9 +539,30 @@ static void do_enter_rtas(struct rtas_args *args) >>> >>> hard_irq_disable(); /* Ensure MSR[EE] is disabled on PPC64 */ >>> >>> + if ((trace_rtas_input_enabled() || trace_rtas_output_enabled())) { >>> + /* >>> +* rtas_token_to_function() uses xarray which uses RCU, >>> +* but this code can run in the CPU offline path >>> +* (e.g. stop-self), after it's become invalid to call >>> +* RCU APIs. >>> +*/ >> >> We can call this in real-mode via pseries_machine_check_realmode >> -> fwnmi_release_errinfo, so tracing should be disabled for that >> case too... Does this_cpu_set_ftrace_enabled(0) in the early >> machine check handler cover that sufficiently? > > I suspect so, but I'd like to verify. Do you know how I could exercise > this path in qemu or LPAR? On a P9 or P10 LPAR you should be able to use tools/testing/selftests/powerpc/mce/inject-ra-err cheers
Re: [PATCH 10/13] powerpc/rtas: improve function information lookups
Andrew Donnellan writes: > On Fri, 2022-11-18 at 09:07 -0600, Nathan Lynch wrote: >> Convert rtas_token() to use a lockless binary search on the function >> table. Fall back to the old behavior for lookups against names that >> are not known to be RTAS functions, but issue a warning. rtas_token() >> is for function names; it is not a general facility for accessing >> arbitrary properties of the /rtas node. All known misuses of >> rtas_token() have been converted to more appropriate of_ APIs in >> preceding changes. > > For in-kernel users, why not go all the way: make rtas_token() static > and use it purely for the userspace API, Not sure what userspace API refers to here, can you be more specific? I don't think rtas_token() is exposed to user space. > and switch kernel users over > to using rtas_function_index directly? Well, I have work in progress to transition all rtas_token() users to a different API, but using opaque symbolic handles rather than exposing the indexes. Something like: /* * Opaque handle for client code to refer to RTAS functions. All valid * function handles are build-time constants prefixed with RTAS_FN_. */ typedef struct { enum rtas_function_index index; } rtas_fn_handle_t; #define rtas_fn_handle(x_) (const rtas_fn_handle_t) { .index = rtas_fnidx(x_), } #define RTAS_FN_CHECK_EXCEPTION rtas_fn_handle(CHECK_EXCEPTION) #define RTAS_FN_DISPLAY_CHARACTER rtas_fn_handle(DISPLAY_CHARACTER) #define RTAS_FN_EVENT_SCANrtas_fn_handle(EVENT_SCAN) ... /** * rtas_function_token() - RTAS function token lookup. * @handle: Function handle, e.g. RTAS_FN_EVENT_SCAN. * * Context: Any context. * Return: the token value for the function if implemented by this platform, * otherwise RTAS_UNKNOWN_SERVICE. */ s32 rtas_function_token(const rtas_fn_handle_t handle) { const size_t index = handle.index; const bool out_of_bounds = index >= ARRAY_SIZE(rtas_function_table); if (WARN_ONCE(out_of_bounds, "invalid function index %zu", index)) return RTAS_UNKNOWN_SERVICE; return rtas_function_table[index].token; } But that's a tree-wide change that would touch various drivers, and I feel like the current series is what I can reasonably hope to get applied right now.
Re: [PATCH 13/13] powerpc/rtas: place tracepoints in do_enter_rtas()
"Nicholas Piggin" writes: > On Sat Nov 19, 2022 at 1:07 AM AEST, Nathan Lynch wrote: >> Call the just-added rtas tracepoints in do_enter_rtas(), taking care >> to avoid function name lookups in the CPU offline path. >> >> Signed-off-by: Nathan Lynch >> --- >> arch/powerpc/kernel/rtas.c | 23 +++ >> 1 file changed, 23 insertions(+) >> >> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c >> index 198366d641d0..3487b42cfbf7 100644 >> --- a/arch/powerpc/kernel/rtas.c >> +++ b/arch/powerpc/kernel/rtas.c >> @@ -38,6 +38,7 @@ >> #include >> #include >> #include >> +#include >> #include >> >> enum rtas_function_flags { >> @@ -525,6 +526,7 @@ void enter_rtas(unsigned long); >> static void do_enter_rtas(struct rtas_args *args) >> { >> unsigned long msr; >> +const char *name = NULL; >> >> /* >> * Make sure MSR[RI] is currently enabled as it will be forced later >> @@ -537,9 +539,30 @@ static void do_enter_rtas(struct rtas_args *args) >> >> hard_irq_disable(); /* Ensure MSR[EE] is disabled on PPC64 */ >> >> +if ((trace_rtas_input_enabled() || trace_rtas_output_enabled())) { >> +/* >> + * rtas_token_to_function() uses xarray which uses RCU, >> + * but this code can run in the CPU offline path >> + * (e.g. stop-self), after it's become invalid to call >> + * RCU APIs. >> + */ > > We can call this in real-mode via pseries_machine_check_realmode > -> fwnmi_release_errinfo, so tracing should be disabled for that > case too... Does this_cpu_set_ftrace_enabled(0) in the early > machine check handler cover that sufficiently? I suspect so, but I'd like to verify. Do you know how I could exercise this path in qemu or LPAR?
Re: [PATCH mm-unstable v1 16/20] mm/frame-vector: remove FOLL_FORCE usage
On Mon, 28 Nov 2022 09:18:47 +0100 David Hildenbrand wrote: > > Less chances of things going wrong that way. > > > > Just mention in the v2 cover letter that the first patch was added to > > make it easy to backport that fix without being hampered by merge > > conflicts if it was added after your frame_vector.c patch. > > Yes, that's the way I would naturally do, it, however, Andrew prefers > delta updates for minor changes. > > @Andrew, whatever you prefer! I'm inclined to let things sit as they are. Cross-tree conflicts happen, and Linus handles them. I'll flag this (very simple) conflict in the pull request, if MM merges second. If v4l merges second then hopefully they will do the same. But this one is so simple that Linus hardly needs our help. But Linus won't be editing changelogs so that the changelog makes more sense after both trees are joined. I'm inclined to let the changelog sit as it is as well.
Re: [PATCH 10/13] powerpc/rtas: improve function information lookups
Nick Child writes: > On 11/18/22 09:07, Nathan Lynch wrote: >> +static int __init rtas_token_to_function_xarray_init(void) >> +{ >> +int err = 0; >> + >> +for (size_t i = 0; i < ARRAY_SIZE(rtas_function_table); ++i) { >> +const struct rtas_function *func = _function_table[i]; >> +const s32 token = func->token; >> + >> +if (token == RTAS_UNKNOWN_SERVICE) >> +continue; >> + >> +err = xa_err(xa_store(_token_to_function_xarray, >> + token, (void *)func, GFP_KERNEL)); >> +if (err) >> +break; >> +} >> + >> +return err; >> +} >> +arch_initcall(rtas_token_to_function_xarray_init); >> + >> +static const struct rtas_function *rtas_token_to_function(s32 token) >> +{ >> +const struct rtas_function *func; >> + >> +if (WARN_ONCE(token < 0, "invalid token %d", token)) >> +return NULL; >> + >> +func = xa_load(_token_to_function_xarray, (unsigned long)token); >> + > Why typecast token here and not in xa_store? No good reason. I'll add it to the xa_store() call site. >> +static void __init rtas_function_table_init(void) >> +{ >> +struct property *prop; >> + >> +for (size_t i = 0; i < ARRAY_SIZE(rtas_function_table); ++i) { >> +struct rtas_function *curr = _function_table[i]; >> +struct rtas_function *prior; >> +int cmp; >> + >> +curr->token = RTAS_UNKNOWN_SERVICE; >> + >> +if (i == 0) >> +continue; >> +/* >> + * Ensure table is sorted correctly for binary search >> + * on function names. >> + */ >> +prior = _function_table[i - 1]; >> + >> +cmp = strcmp(prior->name, curr->name); >> +if (cmp < 0) >> +continue; >> + >> +if (cmp == 0) { >> +pr_err("'%s' has duplicate function table entries\n", >> + curr->name); >> +} else { >> +pr_err("function table unsorted: '%s' wrongly precedes >> '%s'\n", >> + prior->name, curr->name); >> +} >> +} > Just a thought, would it be simpler to use sort()? you already have the > cmp_func implemented for bsearch(). It's an option, but I think a tradeoff is that we would have to sacrifice some const-ness in the data structures (i.e. remove the const qualifier from struct rtas_function's fields). And the table has to be in *some* order, so it may as well be sorted by name from the start. That said, I don't love resorting to a boot-time check for this. We could sidestep the issue by generating the C code for the table and indexes at build time, but it's hard to justify the effort when the set of RTAS functions changes very slowly over time. > As for the series as a whole: > I am no RTAS expert but was able to build, boot and mess around with new > tracepoints without errors: > > Tested-by: Nick Child Thanks for testing and reviewing!
Re: [PATCH 10/13] powerpc/rtas: improve function information lookups
Andrew Donnellan writes: > On Wed, 2022-11-23 at 13:32 -0600, Nick Child wrote: >> On 11/22/22 20:51, Andrew Donnellan wrote: >> > On Fri, 2022-11-18 at 09:07 -0600, Nathan Lynch wrote: >> > > +enum rtas_function_flags { >> > > + RTAS_FN_FLAG_BANNED_FOR_SYSCALL_ON_LE = (1 << 0), >> > > +}; >> > >> > This seems to be new, what's the justification? >> > >> >> Seems to be a run-time replacement of: >> #ifdef CONFIG_CPU_BIG_ENDIAN >> { "ibm,suspend-me", -1, -1, -1, -1, -1 }, >> { "ibm,update-nodes", -1, 0, -1, -1, -1, 4096 }, >> { "ibm,update-properties", -1, 0, -1, -1, -1, 4096 }, >> #endif >> >> It looks to be handled logically: >> + if (IS_ENABLED(CONFIG_CPU_LITTLE_ENDIAN) && >> + (func->flags & RTAS_FN_FLAG_BANNED_FOR_SYSCALL_ON_LE)) >> + goto err; >> >> Perhaps, also allow the addition of any future special cases >> for rtas functions easier to maintain? > > Makes sense, though I'm slightly confused about the original rationale > for the ifdef and why it's not being fixed in userspace. Nick C's explanation is correct. I will make the commit message more explicit about the conversion, and document the flag in the code. The original rationale: commit de0f7349a0dd072e54b5fc04c305907b22d28a5f Author: Nathan Lynch Date: Mon Dec 7 15:51:33 2020 -0600 powerpc/rtas: prevent suspend-related sys_rtas use on LE While drmgr has had work in some areas to make its RTAS syscall interactions endian-neutral, its code for performing partition migration via the syscall has never worked on LE. While it is able to complete ibm,suspend-me successfully, it crashes when attempting the subsequent ibm,update-nodes call. drmgr is the only known (or plausible) user of ibm,suspend-me, ibm,update-nodes, and ibm,update-properties, so allow them only in big-endian configurations. To summarize: we know these functions have never had working users via sys_rtas on ppc64le, and we want to keep it that way. > Slightly clunky name though, something like > RTAS_FN_FLAG_SYSCALL_BE_ONLY might be less clunky? RTAS_FN_FLAG_BANNED_FOR_SYSCALL_ON_LE is verbose, but I think it communicates better that we are consciously imposing a policy in a specific context.
Re: [PATCH 05/13] powerpc/pseries/eeh: use correct API for error log size
Andrew Donnellan writes: > On Fri, 2022-11-18 at 09:07 -0600, Nathan Lynch wrote: >> rtas-error-log-max is not the name of an RTAS function, so >> rtas_token() is not the appropriate API for retrieving its value. We >> already have rtas_get_error_log_max() which returns a sensible value >> if the property is absent for any reason, so use that instead. >> >> Signed-off-by: Nathan Lynch >> Fixes: 8d633291b4fc ("powerpc/eeh: pseries platform EEH error log >> retrieval") >> --- >> arch/powerpc/platforms/pseries/eeh_pseries.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c >> b/arch/powerpc/platforms/pseries/eeh_pseries.c >> index 8e40ccac0f44..e5e4f4aa5afd 100644 >> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c >> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c >> @@ -848,7 +848,7 @@ static int __init eeh_pseries_init(void) >> } >> >> /* Initialize error log size */ >> - eeh_error_buf_size = rtas_token("rtas-error-log-max"); >> + eeh_error_buf_size = rtas_get_error_log_max(); >> if (eeh_error_buf_size == RTAS_UNKNOWN_SERVICE) { > > This is now impossible, and the whole block makes little sense after > the next patch Indeed, will fix in v2, thanks.
Re: [PATCH 03/13] powerpc/rtas: avoid device tree lookups in rtas_os_term()
"Nicholas Piggin" writes: > On Sat Nov 19, 2022 at 1:07 AM AEST, Nathan Lynch wrote: >> rtas_os_term() is called during panic. Its behavior depends on a >> couple of conditions in the /rtas node of the device tree, the >> traversal of which entails locking and local IRQ state changes. If the >> kernel panics while devtree_lock is held, rtas_os_term() as currently >> written could hang. > > Nice. > >> >> Instead of discovering the relevant characteristics at panic time, >> cache them in file-static variables at boot. Note the lookup for >> "ibm,extended-os-term" is converted to of_property_read_bool() since >> it is a boolean property, not a RTAS function token. > > Small nit, but you could do that at the query site unless you > were going to start using ibm,os-term without the extended > capability. I'm unsure that this is what you're suggesting, but we don't want to use of_property_read_bool() in this context either, because it has the same undesirable qualities as rtas_token(). > Reviewed-by: Nicholas Piggin Thanks!
Re: [PATCH 03/13] powerpc/rtas: avoid device tree lookups in rtas_os_term()
Andrew Donnellan writes: >> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c >> index c12dd5ed5e00..81e4996012b7 100644 >> --- a/arch/powerpc/kernel/rtas.c >> +++ b/arch/powerpc/kernel/rtas.c >> @@ -947,6 +947,8 @@ void __noreturn rtas_halt(void) >> >> /* Must be in the RMO region, so we place it here */ >> static char rtas_os_term_buf[2048]; >> +static s32 ibm_os_term_token = RTAS_UNKNOWN_SERVICE; > > s32 and int are obviously identical, but rtas_token is declared using > int, as are the other variables where we cache various tokens. Right... I think it's better practice to use an explicitly sized type where the data is directly derived from the device tree and ultimately passed to the firmware call interface. Gradually enacting this while tolerating some cosmetic inconsistency in the code seems OK to me, but I'm open to other opinions.
[PATCH] driver core: fix up some missing class.devnode() conversions.
In commit ff62b8e6588f ("driver core: make struct class.devnode() take a const *") the ->devnode callback changed the pointer to be const, but a few instances of PowerPC drivers were not caught for some reason. Fix this up by changing the pointers to be const. Reported-by: Stephen Rothwell Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Christophe Leroy Cc: Frederic Barrat Cc: Andrew Donnellan Cc: Arnd Bergmann Cc: linuxppc-dev@lists.ozlabs.org Fixes: ff62b8e6588f ("driver core: make struct class.devnode() take a const *") Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/platforms/book3s/vas-api.c | 2 +- drivers/misc/cxl/file.c | 2 +- drivers/misc/ocxl/file.c| 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/platforms/book3s/vas-api.c b/arch/powerpc/platforms/book3s/vas-api.c index 40f5ae5e1238..eb5bed333750 100644 --- a/arch/powerpc/platforms/book3s/vas-api.c +++ b/arch/powerpc/platforms/book3s/vas-api.c @@ -53,7 +53,7 @@ struct coproc_instance { struct vas_window *txwin; }; -static char *coproc_devnode(struct device *dev, umode_t *mode) +static char *coproc_devnode(const struct device *dev, umode_t *mode) { return kasprintf(GFP_KERNEL, "crypto/%s", dev_name(dev)); } diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c index 3dbdce96fae0..5878329b011a 100644 --- a/drivers/misc/cxl/file.c +++ b/drivers/misc/cxl/file.c @@ -546,7 +546,7 @@ static const struct file_operations afu_master_fops = { }; -static char *cxl_devnode(struct device *dev, umode_t *mode) +static char *cxl_devnode(const struct device *dev, umode_t *mode) { if (cpu_has_feature(CPU_FTR_HVMODE) && CXL_DEVT_IS_CARD(dev->devt)) { diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c index d46dba2df5a1..d96be36405a0 100644 --- a/drivers/misc/ocxl/file.c +++ b/drivers/misc/ocxl/file.c @@ -581,7 +581,7 @@ void ocxl_file_unregister_afu(struct ocxl_afu *afu) device_unregister(>dev); } -static char *ocxl_devnode(struct device *dev, umode_t *mode) +static char *ocxl_devnode(const struct device *dev, umode_t *mode) { return kasprintf(GFP_KERNEL, "ocxl/%s", dev_name(dev)); } -- 2.38.1
[linux-next:master] BUILD REGRESSION 15f2f20ccbf2d04cb14e3e7635aa0447208c71e7
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master branch HEAD: 15f2f20ccbf2d04cb14e3e7635aa0447208c71e7 Add linux-next specific files for 20221128 Error/Warning reports: https://lore.kernel.org/oe-kbuild-all/202211041320.coq8eelj-...@intel.com https://lore.kernel.org/oe-kbuild-all/202211080348.bosishom-...@intel.com https://lore.kernel.org/oe-kbuild-all/202211090634.ryfkk0ws-...@intel.com https://lore.kernel.org/oe-kbuild-all/202211242120.mzzvguln-...@intel.com https://lore.kernel.org/oe-kbuild-all/202211282102.qur7hhrw-...@intel.com https://lore.kernel.org/oe-kbuild-all/202211282244.c1faavio-...@intel.com Error/Warning: (recently discovered and may have been fixed) arch/arm/mach-s3c/devs.c:32:10: fatal error: linux/platform_data/dma-s3c24xx.h: No such file or directory arch/powerpc/kernel/kvm_emul.o: warning: objtool: kvm_template_end(): can't find starting instruction arch/powerpc/kernel/optprobes_head.o: warning: objtool: optprobe_template_end(): can't find starting instruction drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link_dp.c:5075:24: warning: implicit conversion from 'enum ' to 'enum dc_status' [-Wenum-conversion] drivers/gpu/drm/amd/amdgpu/../display/dc/irq/dcn201/irq_service_dcn201.c:40:20: warning: no previous prototype for 'to_dal_irq_source_dcn201' [-Wmissing-prototypes] drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c:451:1: warning: no previous prototype for 'gf100_fifo_nonstall_block' [-Wmissing-prototypes] drivers/gpu/drm/nouveau/nvkm/engine/fifo/runl.c:34:1: warning: no previous prototype for 'nvkm_engn_cgrp_get' [-Wmissing-prototypes] drivers/gpu/drm/nouveau/nvkm/engine/gr/tu102.c:210:1: warning: no previous prototype for 'tu102_gr_load' [-Wmissing-prototypes] drivers/gpu/drm/nouveau/nvkm/nvfw/acr.c:49:1: warning: no previous prototype for 'wpr_generic_header_dump' [-Wmissing-prototypes] drivers/gpu/drm/nouveau/nvkm/subdev/acr/lsfw.c:221:21: warning: variable 'loc' set but not used [-Wunused-but-set-variable] vmlinux.o: warning: objtool: __btrfs_map_block+0x21ad: unreachable instruction vmlinux.o: warning: objtool: btrfs_calc_avail_data_space+0x4f: unreachable instruction Error/Warning ids grouped by kconfigs: gcc_recent_errors |-- alpha-allyesconfig | |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-core-dc_link_dp.c:warning:implicit-conversion-from-enum-anonymous-to-enum-dc_status | |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-irq-dcn201-irq_service_dcn201.c:warning:no-previous-prototype-for-to_dal_irq_source_dcn201 | |-- drivers-gpu-drm-nouveau-nvkm-engine-fifo-gf100.c:warning:no-previous-prototype-for-gf100_fifo_nonstall_block | |-- drivers-gpu-drm-nouveau-nvkm-engine-fifo-runl.c:warning:no-previous-prototype-for-nvkm_engn_cgrp_get | |-- drivers-gpu-drm-nouveau-nvkm-engine-gr-tu102.c:warning:no-previous-prototype-for-tu102_gr_load | |-- drivers-gpu-drm-nouveau-nvkm-nvfw-acr.c:warning:no-previous-prototype-for-wpr_generic_header_dump | `-- drivers-gpu-drm-nouveau-nvkm-subdev-acr-lsfw.c:warning:variable-loc-set-but-not-used |-- alpha-randconfig-r031-20221128 | |-- drivers-gpu-drm-nouveau-nvkm-engine-fifo-gf100.c:warning:no-previous-prototype-for-gf100_fifo_nonstall_block | |-- drivers-gpu-drm-nouveau-nvkm-engine-fifo-runl.c:warning:no-previous-prototype-for-nvkm_engn_cgrp_get | |-- drivers-gpu-drm-nouveau-nvkm-engine-gr-tu102.c:warning:no-previous-prototype-for-tu102_gr_load | |-- drivers-gpu-drm-nouveau-nvkm-nvfw-acr.c:warning:no-previous-prototype-for-wpr_generic_header_dump | `-- drivers-gpu-drm-nouveau-nvkm-subdev-acr-lsfw.c:warning:variable-loc-set-but-not-used |-- arc-allyesconfig | |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-core-dc_link_dp.c:warning:implicit-conversion-from-enum-anonymous-to-enum-dc_status | |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-irq-dcn201-irq_service_dcn201.c:warning:no-previous-prototype-for-to_dal_irq_source_dcn201 | |-- drivers-gpu-drm-nouveau-nvkm-engine-fifo-gf100.c:warning:no-previous-prototype-for-gf100_fifo_nonstall_block | |-- drivers-gpu-drm-nouveau-nvkm-engine-fifo-runl.c:warning:no-previous-prototype-for-nvkm_engn_cgrp_get | |-- drivers-gpu-drm-nouveau-nvkm-engine-gr-tu102.c:warning:no-previous-prototype-for-tu102_gr_load | |-- drivers-gpu-drm-nouveau-nvkm-nvfw-acr.c:warning:no-previous-prototype-for-wpr_generic_header_dump | `-- drivers-gpu-drm-nouveau-nvkm-subdev-acr-lsfw.c:warning:variable-loc-set-but-not-used |-- arc-randconfig-r024-20221127 | |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-core-dc_link_dp.c:warning:implicit-conversion-from-enum-anonymous-to-enum-dc_status | |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-irq-dcn201-irq_service_dcn201.c:warning:no-previous-prototype-for-to_dal_irq_source_dcn201 | |-- drivers-gpu-drm-nouveau-nvkm-engine-fifo-gf100.c:warning:no-previous-prototype-for-gf100_fifo_nonstall_block | |-- drivers-gpu-drm-nouveau-nvkm-engine-fifo-runl.c:warning:no-previous-prototype
Re: [PATCH linux-next][RFC]torture: avoid offline tick_do_timer_cpu
On Mon, Nov 28, 2022 at 09:12:28AM +0100, Thomas Gleixner wrote: > On Sun, Nov 27 2022 at 09:53, Paul E. McKenney wrote: > > On Sun, Nov 27, 2022 at 01:40:28PM +0100, Thomas Gleixner wrote: > >> There are quite some reasons why a CPU-hotplug or a hot-unplug operation > >> can fail, which is not a fatal problem, really. > >> > >> So if a CPU hotplug operation fails, then why can't the torture test > >> just move on and validate that the system still behaves correctly? > >> > >> That gives us more coverage than just testing the good case and giving > >> up when something unexpected happens. > > > > Agreed, with access to a function like the tick_nohz_full_timekeeper() > > suggested earlier in this email thread, then yes, it would make sense to > > try to offline the CPU anyway, then forgive the failure in cases where > > the CPU matches that indicated by tick_nohz_full_timekeeper(). > > Why special casing this? There are other valid reasons why offlining can > fail. So we special case timekeeper today and then next week we special > case something else just because. That does not make sense. If it fails > there is a reason and you can log it. The important part is that the > system is functional and stable after the fail and the rollback. Perhaps there are other valid reasons, but they have not been showing up in my torture-test runs for well over a decade. Not saying that they don't happen, of course. But if they involved (say) cgroups, then my test setup would not exercise them. So are you looking to introduce spurious CPU-hotplug failures? If so, these will also affect things like suspend/resume. Plus it will make it much more difficult to detect real but intermittent CPU-hotplug bugs, which is the motivation for special-casing the tick_nohz_full_timekeeper() failures. So we should discuss introduciton of any spurious failures that might be under consideration. Independently of that, the torture_onoff() functions can of course keep some sort of histogram of the failure return codes. Or are there other failure indications that should be captured? > >> I even argue that the torture test should inject random failures into > >> the hotplug state machine to achieve extended code coverage. > > > > I could imagine torture_onoff() telling various CPU-hotplug notifiers > > to refuse the transition using some TBD interface. > > There is already an interface which is exposed to sysfs which allows you > to enforce a "fail" at a defined hotplug state. If you would like me to be testing this as part of my normal testing regimen, I will need an in-kernel interface. Such an interface is of course not needed for modprobe-style testing, in which case the script doing the modprobe and rmmod can of course manipulate the sysfs files. But I don't do that sort of testing very often. And when I do, it is almost always with kernels configured for Meta's fleet, which almost never do CPU-offline operations. Thanx, Paul > > That would better test the CPU-hotplug common code's ability to deal > > with failures. > > Correct. > > > Or did you have something else/additional in mind? > > No. > > Thanks, > > tglx
Re: [PATCH mm-unstable v1 16/20] mm/frame-vector: remove FOLL_FORCE usage
On Mon, Nov 28, 2022 at 5:19 PM David Hildenbrand wrote: > > On 28.11.22 09:17, Hans Verkuil wrote: > > Hi David, > > > > On 27/11/2022 11:35, David Hildenbrand wrote: > >> On 16.11.22 11:26, David Hildenbrand wrote: > >>> FOLL_FORCE is really only for ptrace access. According to commit > >>> 707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are always > >>> writable"), get_vaddr_frames() currently pins all pages writable as a > >>> workaround for issues with read-only buffers. > >>> > >>> FOLL_FORCE, however, seems to be a legacy leftover as it predates > >>> commit 707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are > >>> always writable"). Let's just remove it. > >>> > >>> Once the read-only buffer issue has been resolved, FOLL_WRITE could > >>> again be set depending on the DMA direction. > >>> > >>> Cc: Hans Verkuil > >>> Cc: Marek Szyprowski > >>> Cc: Tomasz Figa > >>> Cc: Marek Szyprowski > >>> Cc: Mauro Carvalho Chehab > >>> Signed-off-by: David Hildenbrand > >>> --- > >>>drivers/media/common/videobuf2/frame_vector.c | 2 +- > >>>1 file changed, 1 insertion(+), 1 deletion(-) > >>> > >>> diff --git a/drivers/media/common/videobuf2/frame_vector.c > >>> b/drivers/media/common/videobuf2/frame_vector.c > >>> index 542dde9d2609..062e98148c53 100644 > >>> --- a/drivers/media/common/videobuf2/frame_vector.c > >>> +++ b/drivers/media/common/videobuf2/frame_vector.c > >>> @@ -50,7 +50,7 @@ int get_vaddr_frames(unsigned long start, unsigned int > >>> nr_frames, > >>>start = untagged_addr(start); > >>> ret = pin_user_pages_fast(start, nr_frames, > >>> - FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM, > >>> + FOLL_WRITE | FOLL_LONGTERM, > >>> (struct page **)(vec->ptrs)); > >>>if (ret > 0) { > >>>vec->got_ref = true; > >> > >> > >> Hi Andrew, > >> > >> see the discussion at [1] regarding a conflict and how to proceed with > >> upstreaming. The conflict would be easy to resolve, however, also > >> the patch description doesn't make sense anymore with [1]. > > > > Might it be easier and less confusing if you post a v2 of this series > > with my patch first? That way it is clear that 1) my patch has to come > > first, and 2) that it is part of a single series and should be merged > > by the mm subsystem. > > > > Less chances of things going wrong that way. > > > > Just mention in the v2 cover letter that the first patch was added to > > make it easy to backport that fix without being hampered by merge > > conflicts if it was added after your frame_vector.c patch. > > Yes, that's the way I would naturally do, it, however, Andrew prefers > delta updates for minor changes. > > @Andrew, whatever you prefer! > > Thanks! > However you folks proceed with taking this patch, feel free to add my Acked-by. Thanks! Best regards, Tomasz > -- > Thanks, > > David / dhildenb >
Re: [PATCH mm-unstable v1 16/20] mm/frame-vector: remove FOLL_FORCE usage
On 28/11/2022 09:18, David Hildenbrand wrote: > On 28.11.22 09:17, Hans Verkuil wrote: >> Hi David, >> >> On 27/11/2022 11:35, David Hildenbrand wrote: >>> On 16.11.22 11:26, David Hildenbrand wrote: FOLL_FORCE is really only for ptrace access. According to commit 707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are always writable"), get_vaddr_frames() currently pins all pages writable as a workaround for issues with read-only buffers. FOLL_FORCE, however, seems to be a legacy leftover as it predates commit 707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are always writable"). Let's just remove it. Once the read-only buffer issue has been resolved, FOLL_WRITE could again be set depending on the DMA direction. Cc: Hans Verkuil Cc: Marek Szyprowski Cc: Tomasz Figa Cc: Marek Szyprowski Cc: Mauro Carvalho Chehab Signed-off-by: David Hildenbrand --- drivers/media/common/videobuf2/frame_vector.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/media/common/videobuf2/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c index 542dde9d2609..062e98148c53 100644 --- a/drivers/media/common/videobuf2/frame_vector.c +++ b/drivers/media/common/videobuf2/frame_vector.c @@ -50,7 +50,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames, start = untagged_addr(start); ret = pin_user_pages_fast(start, nr_frames, - FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM, + FOLL_WRITE | FOLL_LONGTERM, (struct page **)(vec->ptrs)); if (ret > 0) { vec->got_ref = true; >>> >>> >>> Hi Andrew, >>> >>> see the discussion at [1] regarding a conflict and how to proceed with >>> upstreaming. The conflict would be easy to resolve, however, also >>> the patch description doesn't make sense anymore with [1]. >> >> Might it be easier and less confusing if you post a v2 of this series >> with my patch first? That way it is clear that 1) my patch has to come >> first, and 2) that it is part of a single series and should be merged >> by the mm subsystem. >> >> Less chances of things going wrong that way. >> >> Just mention in the v2 cover letter that the first patch was added to >> make it easy to backport that fix without being hampered by merge >> conflicts if it was added after your frame_vector.c patch. > > Yes, that's the way I would naturally do, it, however, Andrew prefers delta > updates for minor changes. > > @Andrew, whatever you prefer! Andrew, I've resent my patch, this time with you CCed as well. Regards, Hans > > Thanks! >
Re: [PATCH mm-unstable v1 16/20] mm/frame-vector: remove FOLL_FORCE usage
On 28.11.22 09:17, Hans Verkuil wrote: Hi David, On 27/11/2022 11:35, David Hildenbrand wrote: On 16.11.22 11:26, David Hildenbrand wrote: FOLL_FORCE is really only for ptrace access. According to commit 707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are always writable"), get_vaddr_frames() currently pins all pages writable as a workaround for issues with read-only buffers. FOLL_FORCE, however, seems to be a legacy leftover as it predates commit 707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are always writable"). Let's just remove it. Once the read-only buffer issue has been resolved, FOLL_WRITE could again be set depending on the DMA direction. Cc: Hans Verkuil Cc: Marek Szyprowski Cc: Tomasz Figa Cc: Marek Szyprowski Cc: Mauro Carvalho Chehab Signed-off-by: David Hildenbrand --- drivers/media/common/videobuf2/frame_vector.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/media/common/videobuf2/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c index 542dde9d2609..062e98148c53 100644 --- a/drivers/media/common/videobuf2/frame_vector.c +++ b/drivers/media/common/videobuf2/frame_vector.c @@ -50,7 +50,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames, start = untagged_addr(start); ret = pin_user_pages_fast(start, nr_frames, - FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM, + FOLL_WRITE | FOLL_LONGTERM, (struct page **)(vec->ptrs)); if (ret > 0) { vec->got_ref = true; Hi Andrew, see the discussion at [1] regarding a conflict and how to proceed with upstreaming. The conflict would be easy to resolve, however, also the patch description doesn't make sense anymore with [1]. Might it be easier and less confusing if you post a v2 of this series with my patch first? That way it is clear that 1) my patch has to come first, and 2) that it is part of a single series and should be merged by the mm subsystem. Less chances of things going wrong that way. Just mention in the v2 cover letter that the first patch was added to make it easy to backport that fix without being hampered by merge conflicts if it was added after your frame_vector.c patch. Yes, that's the way I would naturally do, it, however, Andrew prefers delta updates for minor changes. @Andrew, whatever you prefer! Thanks! -- Thanks, David / dhildenb
Re: [PATCH mm-unstable v1 16/20] mm/frame-vector: remove FOLL_FORCE usage
Hi David, On 27/11/2022 11:35, David Hildenbrand wrote: > On 16.11.22 11:26, David Hildenbrand wrote: >> FOLL_FORCE is really only for ptrace access. According to commit >> 707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are always >> writable"), get_vaddr_frames() currently pins all pages writable as a >> workaround for issues with read-only buffers. >> >> FOLL_FORCE, however, seems to be a legacy leftover as it predates >> commit 707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are >> always writable"). Let's just remove it. >> >> Once the read-only buffer issue has been resolved, FOLL_WRITE could >> again be set depending on the DMA direction. >> >> Cc: Hans Verkuil >> Cc: Marek Szyprowski >> Cc: Tomasz Figa >> Cc: Marek Szyprowski >> Cc: Mauro Carvalho Chehab >> Signed-off-by: David Hildenbrand >> --- >> drivers/media/common/videobuf2/frame_vector.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/media/common/videobuf2/frame_vector.c >> b/drivers/media/common/videobuf2/frame_vector.c >> index 542dde9d2609..062e98148c53 100644 >> --- a/drivers/media/common/videobuf2/frame_vector.c >> +++ b/drivers/media/common/videobuf2/frame_vector.c >> @@ -50,7 +50,7 @@ int get_vaddr_frames(unsigned long start, unsigned int >> nr_frames, >> start = untagged_addr(start); >> ret = pin_user_pages_fast(start, nr_frames, >> - FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM, >> + FOLL_WRITE | FOLL_LONGTERM, >> (struct page **)(vec->ptrs)); >> if (ret > 0) { >> vec->got_ref = true; > > > Hi Andrew, > > see the discussion at [1] regarding a conflict and how to proceed with > upstreaming. The conflict would be easy to resolve, however, also > the patch description doesn't make sense anymore with [1]. Might it be easier and less confusing if you post a v2 of this series with my patch first? That way it is clear that 1) my patch has to come first, and 2) that it is part of a single series and should be merged by the mm subsystem. Less chances of things going wrong that way. Just mention in the v2 cover letter that the first patch was added to make it easy to backport that fix without being hampered by merge conflicts if it was added after your frame_vector.c patch. Regards, Hans > > > On top of mm-unstable, reverting this patch and applying [1] gives me > an updated patch: > > > From 1e66c25f1467c1f1e5f275312f2c6df29308d4df Mon Sep 17 00:00:00 2001 > From: David Hildenbrand > Date: Wed, 16 Nov 2022 11:26:55 +0100 > Subject: [PATCH] mm/frame-vector: remove FOLL_FORCE usage > > GUP now supports reliable R/O long-term pinning in COW mappings, such > that we break COW early. MAP_SHARED VMAs only use the shared zeropage so > far in one corner case (DAXFS file with holes), which can be ignored > because GUP does not support long-term pinning in fsdax (see > check_vma_flags()). > > Consequently, FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM is no longer required > for reliable R/O long-term pinning: FOLL_LONGTERM is sufficient. So stop > using FOLL_FORCE, which is really only for ptrace access. > > Reviewed-by: Daniel Vetter > Acked-by: Hans Verkuil > Cc: Hans Verkuil > Cc: Marek Szyprowski > Cc: Tomasz Figa > Cc: Marek Szyprowski > Cc: Mauro Carvalho Chehab > Signed-off-by: David Hildenbrand > --- > drivers/media/common/videobuf2/frame_vector.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/media/common/videobuf2/frame_vector.c > b/drivers/media/common/videobuf2/frame_vector.c > index aad72640f055..8606fdacf5b8 100644 > --- a/drivers/media/common/videobuf2/frame_vector.c > +++ b/drivers/media/common/videobuf2/frame_vector.c > @@ -41,7 +41,7 @@ int get_vaddr_frames(unsigned long start, unsigned int > nr_frames, bool write, > int ret_pin_user_pages_fast = 0; > int ret = 0; > int err; > - unsigned int gup_flags = FOLL_FORCE | FOLL_LONGTERM; > + unsigned int gup_flags = FOLL_LONGTERM; > > if (nr_frames == 0) > return 0;
Re: [PATCH linux-next][RFC]torture: avoid offline tick_do_timer_cpu
On Sun, Nov 27 2022 at 09:53, Paul E. McKenney wrote: > On Sun, Nov 27, 2022 at 01:40:28PM +0100, Thomas Gleixner wrote: >> There are quite some reasons why a CPU-hotplug or a hot-unplug operation >> can fail, which is not a fatal problem, really. >> >> So if a CPU hotplug operation fails, then why can't the torture test >> just move on and validate that the system still behaves correctly? >> >> That gives us more coverage than just testing the good case and giving >> up when something unexpected happens. > > Agreed, with access to a function like the tick_nohz_full_timekeeper() > suggested earlier in this email thread, then yes, it would make sense to > try to offline the CPU anyway, then forgive the failure in cases where > the CPU matches that indicated by tick_nohz_full_timekeeper(). Why special casing this? There are other valid reasons why offlining can fail. So we special case timekeeper today and then next week we special case something else just because. That does not make sense. If it fails there is a reason and you can log it. The important part is that the system is functional and stable after the fail and the rollback. >> I even argue that the torture test should inject random failures into >> the hotplug state machine to achieve extended code coverage. > > I could imagine torture_onoff() telling various CPU-hotplug notifiers > to refuse the transition using some TBD interface. There is already an interface which is exposed to sysfs which allows you to enforce a "fail" at a defined hotplug state. > That would better test the CPU-hotplug common code's ability to deal > with failures. Correct. > Or did you have something else/additional in mind? No. Thanks, tglx