[linux-next:master] BUILD REGRESSION bc63de6e6ba0b16652c5fb4b9c9916b9e7ca1f23
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: bc63de6e6ba0b16652c5fb4b9c9916b9e7ca1f23  Add linux-next specific files for 20231208

Error/Warning reports:

https://lore.kernel.org/oe-kbuild-all/202312081716.luuhsns4-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202312081904.nkusjjo0-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202312081955.g3stkpfj-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202312090117.dlmjtqst-...@intel.com

Error/Warning: (recently discovered and may have been fixed)

WARNING: modpost: vmlinux: section mismatch in reference: at91_poweroff_probe+0x8c (section: .text) -> at91_wakeup_status (section: .init.text)
WARNING: modpost: vmlinux: section mismatch in reference: at91_shdwc_probe+0xd8 (section: .text) -> at91_wakeup_status (section: .init.text)
arch/mips/mm/cache.c:209:(.text+0x690): undefined reference to `r3k_cache_init'
arch/powerpc/platforms/44x/warp.c:109:15: error: variable 'warp_gpio_leds' has initializer but incomplete type
arch/powerpc/platforms/44x/warp.c:109:31: error: storage size of 'warp_gpio_leds' isn't known
arch/powerpc/platforms/44x/warp.c:110:10: error: 'struct platform_device' has no member named 'name'
arch/powerpc/platforms/44x/warp.c:110:19: warning: excess elements in struct initializer
arch/powerpc/platforms/44x/warp.c:111:10: error: 'struct platform_device' has no member named 'id'
arch/powerpc/platforms/44x/warp.c:112:10: error: 'struct platform_device' has no member named 'dev'
arch/powerpc/platforms/44x/warp.c:112:19: error: extra brace group at end of initializer
arch/powerpc/platforms/44x/warp.c:197:25: error: implicit declaration of function 'platform_device_register'; did you mean 'of_device_register'? [-Werror=implicit-function-declaration]
drivers/usb/host/uhci-grlib.c:152:31: error: implicit declaration of function 'platform_get_drvdata'; did you mean 'pci_get_drvdata'? [-Werror=implicit-function-declaration]
drivers/usb/host/uhci-grlib.c:152:31: warning: initialization of 'struct usb_hcd *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
drivers/usb/host/uhci-grlib.c:184:15: error: variable 'uhci_grlib_driver' has initializer but incomplete type
drivers/usb/host/uhci-grlib.c:184:31: error: storage size of 'uhci_grlib_driver' isn't known
drivers/usb/host/uhci-grlib.c:185:10: error: 'struct platform_driver' has no member named 'probe'
drivers/usb/host/uhci-grlib.c:185:27: warning: excess elements in struct initializer
drivers/usb/host/uhci-grlib.c:186:10: error: 'struct platform_driver' has no member named 'remove_new'
drivers/usb/host/uhci-grlib.c:187:10: error: 'struct platform_driver' has no member named 'shutdown'
drivers/usb/host/uhci-grlib.c:188:10: error: 'struct platform_driver' has no member named 'driver'
drivers/usb/host/uhci-grlib.c:188:19: error: extra brace group at end of initializer
drivers/usb/host/uhci-grlib.c:92:36: error: invalid use of undefined type 'struct platform_device'
drivers/usb/host/uhci-hcd.c:885:18: error: implicit declaration of function 'platform_driver_register' [-Werror=implicit-function-declaration]
drivers/usb/host/uhci-hcd.c:902:9: error: implicit declaration of function 'platform_driver_unregister'; did you mean 'driver_unregister'? [-Werror=implicit-function-declaration]
uffd-common.c:636:28: warning: unused variable 'uffdio_move' [-Wunused-variable]

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- alpha-randconfig-r083-20230821
|   `-- WARNING:modpost:missing-MODULE_DESCRIPTION()-in-drivers-char-agp-alpha-agp.o
|-- arc-randconfig-r132-20231208
|   `-- lib-zstd-compress-zstd_fast.c:sparse:sparse:Using-plain-integer-as-NULL-pointer
|-- arm-allyesconfig
|   `-- qcom_stats.c:(.text):undefined-reference-to-__aeabi_uldivmod
|-- arm-randconfig-r004-20221210
|   `-- WARNING:modpost:drivers-pcmcia-omap_cf:section-mismatch-in-reference:omap_cf_driver-(section:.data)-omap_cf_remove-(section:.exit.text)
|-- arm-randconfig-r112-20231208
|   `-- lib-zstd-compress-zstd_fast.c:sparse:sparse:Using-plain-integer-as-NULL-pointer
|-- i386-buildonly-randconfig-002-20230825
|   `-- include-linux-compiler_types.h:error:call-to-__compiletime_assert_NNN-declared-with-attribute-error:BUILD_BUG_ON-failed:sizeof(-vcpup)-SMP_CACHE_BYTES
|-- i386-randconfig-r133-20231208
|   `-- lib-zstd-compress-zstd_fast.c:sparse:sparse:Using-plain-integer-as-NULL-pointer
|-- loongarch-randconfig-r123-20231208
|   `-- lib-zstd-compress-zstd_fast.c:sparse:sparse:Using-plain-integer-as-NULL-pointer
|-- mips-allyesconfig
|   `-- qcom_stats.c:(.text.qcom_ddr_stats_show):undefined-reference-to-__udivdi3
|-- mips-fuloong2e_defconfig
|   `-- WARNING:modpost:missing-MODULE_DESCRIPTION()-in-drivers-base-regmap-regmap-mmio.o
|-- mips-randconfig-r021-20230212
|   `-- arch-mips-mm-cache.c:(.text):undefined-reference-to-r3k_cache_init
|-- parisc-allmodconfig
|   `-- WARNING:modpost:missing-MODULE_DESCR
Re: [PATCH v2] i2c: cpm: Remove linux,i2c-index conversion from be32
Hi Christophe,

On Wed, Dec 06, 2023 at 11:24:03PM +0100, Christophe Leroy wrote:
> sparse reports an error on some data that gets converted from be32.
>
> That's because that data is typed u32 instead of __be32.
>
> The type is correct, the be32_to_cpu() conversion is not.
>
> Remove the conversion.
>
> Reported-by: kernel test robot
> Closes: https://lore.kernel.org/oe-kbuild-all/202312042210.ql4da8av-...@intel.com/
> Signed-off-by: Christophe Leroy
> ---
> v2: Use u32 directly, remove be32_to_cpu().
> ---
>  drivers/i2c/busses/i2c-cpm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/i2c/busses/i2c-cpm.c b/drivers/i2c/busses/i2c-cpm.c
> index 9a664abf734d..771d60bc8d71 100644
> --- a/drivers/i2c/busses/i2c-cpm.c
> +++ b/drivers/i2c/busses/i2c-cpm.c
> @@ -658,7 +658,7 @@ static int cpm_i2c_probe(struct platform_device *ofdev)
> 	/* register new adapter to i2c module... */
>
> 	data = of_get_property(ofdev->dev.of_node, "linux,i2c-index", &len);
> -	cpm->adap.nr = (data && len == 4) ? be32_to_cpup(data) : -1;
> +	cpm->adap.nr = (data && len == 4) ? *data : -1;

thanks!

Reviewed-by: Andi Shyti

Andi
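As background on the endianness bug discussed above, here is a minimal user-space sketch (plain C, not kernel code; `bswap32` is a helper written for this example, standing in for the byte swap that `be32_to_cpup()` performs on a little-endian host) of why "converting" a value that is already in CPU byte order corrupts it:

```c
#include <stdint.h>

/* Stand-in (hypothetical helper, not the kernel API) for the byte swap
 * that be32_to_cpup() performs on a little-endian host. On a big-endian
 * host that conversion is a no-op, which is why the bug goes unnoticed
 * there. */
static uint32_t bswap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00) |
	       ((v << 8) & 0x00ff0000) | (v << 24);
}

/* A __be32 value needs exactly one swap on an LE CPU; a value that is
 * already a native-order u32 (like "linux,i2c-index" here) must not be
 * swapped at all -- swapping it turns index 5 into 0x05000000. */
```

This is the class of mistake sparse's `__be32`/`u32` `__bitwise` annotations catch at build time, and why dropping the conversion (rather than changing the type) is the right fix here.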
Re: [PATCH 0/3] PCI/AER: Clean up logging
[+cc Jonathan]

On Wed, Dec 06, 2023 at 04:42:28PM -0600, Bjorn Helgaas wrote:
> From: Bjorn Helgaas
>
> Clean up some minor AER logging issues:
>
>   - Log as "Correctable errors", not "Corrected errors"
>
>   - Decode the Requester ID when we couldn't find detail error info
>
> Bjorn Helgaas (3):
>   PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
>   PCI/AER: Decode Requester ID when no error info found
>   PCI/AER: Use explicit register sizes for struct members
>
>  drivers/pci/pcie/aer.c | 19 ---
>  include/linux/aer.h    |  8 
>  2 files changed, 16 insertions(+), 11 deletions(-)

Applied to pci/aer for v6.8.

Thanks, Jonathan, for your time in taking a look.
[RFC PATCH 1/9] powerpc/ftrace: Fix indentation in ftrace.h
Replace seven spaces with a tab character to fix an indentation issue
reported by the kernel test robot.

Reported-by: kernel test robot
Closes: https://lore.kernel.org/oe-kbuild-all/202311221731.aluwtdim-...@intel.com/
Signed-off-by: Naveen N Rao
---
 arch/powerpc/include/asm/ftrace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index 9e5a39b6a311..1ebd2ca97f12 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -25,7 +25,7 @@ static inline unsigned long ftrace_call_adjust(unsigned long addr)
 	if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY))
 		addr += MCOUNT_INSN_SIZE;
 
-       return addr;
+	return addr;
 }
 
 unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
-- 
2.43.0
[RFC PATCH 9/9] samples/ftrace: Add support for ftrace direct samples on powerpc
Add powerpc 32-bit and 64-bit samples for ftrace direct. This serves to
show the sample instruction sequence to be used by ftrace direct calls
to adhere to the ftrace ABI.

On 64-bit powerpc, TOC setup requires some additional work.

Signed-off-by: Naveen N Rao
---
 arch/powerpc/Kconfig                        |   2 +
 samples/ftrace/ftrace-direct-modify.c       |  94 +-
 samples/ftrace/ftrace-direct-multi-modify.c | 110 +++-
 samples/ftrace/ftrace-direct-multi.c        |  64 +++-
 samples/ftrace/ftrace-direct-too.c          |  72 -
 samples/ftrace/ftrace-direct.c              |  61 ++-
 6 files changed, 398 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 4fe04fdca33a..28de3a5f3e98 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -274,6 +274,8 @@ config PPC
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE
 	select HAVE_RSEQ
+	select HAVE_SAMPLE_FTRACE_DIRECT	if HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	select HAVE_SAMPLE_FTRACE_DIRECT_MULTI	if HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
 	select HAVE_SETUP_PER_CPU_AREA		if PPC64
 	select HAVE_SOFTIRQ_ON_OWN_STACK
 	select HAVE_STACKPROTECTOR		if PPC32 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)

diff --git a/samples/ftrace/ftrace-direct-modify.c b/samples/ftrace/ftrace-direct-modify.c
index e2a6a69352df..bd985035b937 100644
--- a/samples/ftrace/ftrace-direct-modify.c
+++ b/samples/ftrace/ftrace-direct-modify.c
@@ -2,7 +2,7 @@
 #include <linux/module.h>
 #include <linux/kthread.h>
 #include <linux/ftrace.h>
-#ifndef CONFIG_ARM64
+#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
 #include <asm/asm-offsets.h>
 #endif

@@ -164,6 +164,98 @@ asm (

 #endif /* CONFIG_LOONGARCH */

+#ifdef CONFIG_PPC32
+
+asm (
+"	.pushsection	.text, \"ax\", @progbits\n"
+"	.type		my_tramp1, @function\n"
+"	.globl		my_tramp1\n"
+"   my_tramp1:"
+"	stw		0, 4(1)\n"
+"	stwu		1, -16(1)\n"
+"	mflr		0\n"
+"	stw		0, 4(1)\n"
+"	stwu		1, -16(1)\n"
+"	bl		my_direct_func1\n"
+"	lwz		0, 20(1)\n"
+"	mtlr		0\n"
+"	addi		1, 1, 32\n"
+"	lwz		0, 4(1)\n"
+"	blr\n"
+"	.size		my_tramp1, .-my_tramp1\n"
+
+"	.type		my_tramp2, @function\n"
+"	.globl		my_tramp2\n"
+"   my_tramp2:"
+"	stw		0, 4(1)\n"
+"	stwu		1, -16(1)\n"
+"	mflr		0\n"
+"	stw		0, 4(1)\n"
+"	stwu		1, -16(1)\n"
+"	bl		my_direct_func2\n"
+"	lwz		0, 20(1)\n"
+"	mtlr		0\n"
+"	addi		1, 1, 32\n"
+"	lwz		0, 4(1)\n"
+"	blr\n"
+"	.size		my_tramp2, .-my_tramp2\n"
+"	.popsection\n"
+);
+
+#endif /* CONFIG_PPC32 */
+
+#ifdef CONFIG_PPC64
+
+asm (
+"	.pushsection	.text, \"ax\", @progbits\n"
+"	.type		my_tramp1, @function\n"
+"	.globl		my_tramp1\n"
+"   my_tramp1:"
+"	std		0, 16(1)\n"
+"	stdu		1, -32(1)\n"
+"	mflr		0\n"
+"	std		0, 16(1)\n"
+"	stdu		1, -32(1)\n"
+"	std		2, 24(1)\n"
+"	bcl		20, 31, 1f\n"
+"1:	mflr		12\n"
+"	ld		2, (2f - 1b)(12)\n"
+"	bl		my_direct_func1\n"
+"	ld		2, 24(1)\n"
+"	ld		0, 48(1)\n"
+"	mtlr		0\n"
+"	addi		1, 1, 64\n"
+"	ld		0, 16(1)\n"
+"	blr\n"
+"2:	.quad		.TOC.@tocbase\n"
+"	.size		my_tramp1, .-my_tramp1\n"
+
+"	.type		my_tramp2, @function\n"
+"	.globl		my_tramp2\n"
+"   my_tramp2:"
+"	std		0, 16(1)\n"
+"	stdu		1, -32(1)\n"
+"	mflr		0\n"
+"	std		0, 16(1)\n"
+"	stdu		1, -32(1)\n"
+"	std		2, 24(1)\n"
+"	bcl		20, 31, 1f\n"
+"1:	mflr		12\n"
+"	ld		2, (2f - 1b)(12)\n"
+"	bl		my_direct_func2\n"
+"	ld		2, 24(1)\n"
+"	ld		0, 48(1)\n"
+"	mtlr		0\n"
+"	addi		1, 1, 64\n"
+"	ld		0, 16(1)\n"
+"	blr\n"
+"2:	.quad		.TOC.@tocbase\n"
+"	.size		my_tramp2, .-my_tramp2\n"
+"	.popsection\n"
+);
+
+#endif /* CONFIG_PPC64 */
+
 static struct ftrace_ops direct;

 static unsigned long my_tramp = (unsigned long)my_tramp1;

diff --git a/samples/ftrace/ftrace-direct-multi-modify.c b/samples/ftrace/ftrace-direct-multi-modify.c
index 2e349834d63c..478e879a23af 100644
--- a/samples/ftrace/ftrace-direct-multi-modify.c
+++ b/samples/ftrace/ftrace-direct-multi-modify.c
@@
[RFC PATCH 8/9] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS
Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS similar to the arm64
implementation.

ftrace direct calls allow custom trampolines to be called into directly
from function ftrace call sites, bypassing the ftrace trampoline
completely. This functionality is currently utilized by BPF trampolines
to hook into kernel function entries.

Since we have limited relative branch range, we support ftrace direct
calls through support for DYNAMIC_FTRACE_WITH_CALL_OPS. In this
approach, the ftrace trampoline is not entirely bypassed. Rather, it is
re-purposed into a stub that reads the direct_call field from the
associated ftrace_ops structure and branches into that, if it is not
NULL. For this, it is sufficient if we can ensure that the ftrace
trampoline is reachable from all traceable functions.

When multiple ftrace_ops are associated with a call site, we utilize a
callback to set pt_regs->orig_gpr3 that can then be tested on the
return path from the ftrace trampoline to branch into the direct caller.

Signed-off-by: Naveen N Rao
---
 arch/powerpc/Kconfig                     |  1 +
 arch/powerpc/include/asm/ftrace.h        | 15 
 arch/powerpc/kernel/asm-offsets.c        |  3 +
 arch/powerpc/kernel/trace/ftrace_entry.S | 99 ++--
 4 files changed, 93 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index c8ecc9dcc914..4fe04fdca33a 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -235,6 +235,7 @@ config PPC
 	select HAVE_DYNAMIC_FTRACE
 	select HAVE_DYNAMIC_FTRACE_WITH_ARGS	if ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
 	select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS if ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+	select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS if HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS	if ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
 	select HAVE_EBPF_JIT
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS

diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index d9b99781bea3..986c4fffb9ec 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -93,6 +93,21 @@ struct ftrace_ops;
 #define ftrace_graph_func ftrace_graph_func
 void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
 		       struct ftrace_ops *op, struct ftrace_regs *fregs);
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+/*
+ * When an ftrace registered caller is tracing a function that is also set by a
+ * register_ftrace_direct() call, it needs to be differentiated in the
+ * ftrace_caller trampoline so that the direct call can be invoked after the
+ * other ftrace ops. To do this, place the direct caller in the orig_gpr3 field
+ * of pt_regs. This tells ftrace_caller that there's a direct caller.
+ */
+static inline void arch_ftrace_set_direct_caller(struct ftrace_regs *fregs, unsigned long addr)
+{
+	struct pt_regs *regs = &fregs->regs;
+
+	regs->orig_gpr3 = addr;
+}
+#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
 #endif

 #endif /* __ASSEMBLY__ */

diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 8b8a39b57a9f..85da10726d98 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -678,6 +678,9 @@ int main(void)

 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
 	OFFSET(FTRACE_OPS_FUNC, ftrace_ops, func);
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	OFFSET(FTRACE_OPS_DIRECT_CALL, ftrace_ops, direct_call);
+#endif
 #endif

 	return 0;

diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
index 4d1220c2e32f..ab60395fc34b 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -33,14 +33,57 @@
  * and then arrange for the ftrace function to be called.
  */
 .macro	ftrace_regs_entry allregs
-	/* Save the original return address in A's stack frame */
-	PPC_STL	r0, LRSAVE(r1)
 	/* Create a minimal stack frame for representing B */
 	PPC_STLU	r1, -STACK_FRAME_MIN_SIZE(r1)

 	/* Create our stack frame + pt_regs */
 	PPC_STLU	r1,-SWITCH_FRAME_SIZE(r1)
+	.if \allregs == 1
+	SAVE_GPRS(11, 12, r1)
+	.endif
+
+	/* Get the _mcount() call site out of LR */
+	mflr	r11
+
+#ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+	/*
+	 * This points after the bl at 'mtlr r0', but this sequence could be
+	 * outside the function. Move this to point just after the ftrace
+	 * location inside the function for proper unwind.
+	 */
+	addi	r11, r11, FTRACE_MCOUNT_TRAMP_OFFSET - MCOUNT_INSN_SIZE
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+	/* Load the ftrace_op */
+	PPC_LL	r12, -SZL-MCOUNT_INSN_SIZE(r11)
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	/*
[RFC PATCH 7/9] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS
Implement support for DYNAMIC_FTRACE_WITH_CALL_OPS similar to the arm64
implementation.

This works by patching-in a pointer to an associated ftrace_ops
structure before each traceable function. If multiple ftrace_ops are
associated with a call site, then a special ftrace_list_ops is used to
enable iterating over all the registered ftrace_ops. If no ftrace_ops
are associated with a call site, then a special ftrace_nop_ops structure
is used to render the ftrace call as a no-op. The ftrace trampoline can
then read the associated ftrace_ops for a call site by loading from an
offset from the LR, and branch directly to the associated function.

The primary advantage with this approach is that we don't have to
iterate over all the registered ftrace_ops for call sites that have a
single ftrace_ops registered. This is the equivalent of implementing
support for dynamic ftrace trampolines, which set up a special ftrace
trampoline for each registered ftrace_ops and have individual call sites
branch into those directly.

A secondary advantage is that this gives us a way to add support for
direct ftrace callers without having to resort to using stubs. The
address of the direct call trampoline can be loaded from the ftrace_ops
structure.

To support this, we utilize the space between the existing function
profile sequence and the function entry. During ftrace activation, we
update this location with the associated ftrace_ops pointer. Then, on
ftrace entry, we load from this location and call into
ftrace_ops->func().

For 64-bit powerpc, we also select FUNCTION_ALIGNMENT_8B so that the
ftrace_ops pointer is doubleword-aligned and can be updated atomically.
Signed-off-by: Naveen N Rao
---
 arch/powerpc/Kconfig                     |  2 +
 arch/powerpc/kernel/asm-offsets.c        |  4 ++
 arch/powerpc/kernel/trace/ftrace.c       | 58 
 arch/powerpc/kernel/trace/ftrace_entry.S | 39 +++-
 4 files changed, 91 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 318e5c1b7454..c8ecc9dcc914 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -190,6 +190,7 @@ config PPC
 	select EDAC_SUPPORT
 	select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY if ARCH_USING_PATCHABLE_FUNCTION_ENTRY
 	select FUNCTION_ALIGNMENT_4B
+	select FUNCTION_ALIGNMENT_8B		if PPC64 && DYNAMIC_FTRACE_WITH_CALL_OPS
 	select GENERIC_ATOMIC64			if PPC32
 	select GENERIC_CLOCKEVENTS_BROADCAST	if SMP
 	select GENERIC_CMOS_UPDATE
@@ -233,6 +234,7 @@ config PPC
 	select HAVE_DEBUG_STACKOVERFLOW
 	select HAVE_DYNAMIC_FTRACE
 	select HAVE_DYNAMIC_FTRACE_WITH_ARGS	if ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
+	select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS if ARCH_USING_PATCHABLE_FUNCTION_ENTRY
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS	if ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
 	select HAVE_EBPF_JIT
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS

diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 9f14d95b8b32..8b8a39b57a9f 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -676,5 +676,9 @@ int main(void)
 	DEFINE(BPT_SIZE, BPT_SIZE);
 #endif

+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+	OFFSET(FTRACE_OPS_FUNC, ftrace_ops, func);
+#endif
+
 	return 0;
 }

diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index d3b4949142a8..af84eabf7912 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -124,6 +124,41 @@ static int ftrace_get_call_inst(struct dyn_ftrace *rec, unsigned long addr, ppc_
 	return 0;
 }

+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+static const struct ftrace_ops *powerpc_rec_get_ops(struct dyn_ftrace *rec)
+{
+	const struct ftrace_ops *ops = NULL;
+
+	if (rec->flags & FTRACE_FL_CALL_OPS_EN) {
+		ops = ftrace_find_unique_ops(rec);
+		WARN_ON_ONCE(!ops);
+	}
+
+	if (!ops)
+		ops = &ftrace_list_ops;
+
+	return ops;
+}
+
+static int ftrace_rec_set_ops(const struct dyn_ftrace *rec, const struct ftrace_ops *ops)
+{
+	return patch_ulong((void *)(rec->ip - sizeof(unsigned long)), (unsigned long)ops);
+}
+
+static int ftrace_rec_set_nop_ops(struct dyn_ftrace *rec)
+{
+	return ftrace_rec_set_ops(rec, &ftrace_nop_ops);
+}
+
+static int ftrace_rec_update_ops(struct dyn_ftrace *rec)
+{
+	return ftrace_rec_set_ops(rec, powerpc_rec_get_ops(rec));
+}
+#else
+static int ftrace_rec_set_nop_ops(struct dyn_ftrace *rec) { return 0; }
+static int ftrace_rec_update_ops(struct dyn_ftrace *rec) { return 0; }
+#endif
+
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
 int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr, unsigned long addr)
 {
@@ -159,6 +194,10 @@ int
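To make the CALL_OPS scheme in this patch concrete, here is a rough user-space analog (illustrative only; `demo_ops`, `demo_site` and `trampoline` are made-up names, and in the real kernel the ops pointer is patched into the instruction stream just before the function rather than stored in a struct field): each call site carries a slot pointing at its ops, so the shared trampoline does one load plus an indirect call instead of iterating over every registered ops.

```c
#include <stddef.h>

/* Hypothetical analog of struct ftrace_ops: just the handler pointer. */
struct demo_ops {
	int (*func)(int ip);
};

/* Analog of the per-call-site slot that ftrace_rec_set_ops() patches.
 * In the kernel this lives at rec->ip - sizeof(unsigned long); here it
 * is a plain struct field. */
struct demo_site {
	const struct demo_ops *ops_slot;
};

/* Pretend handler: "traces" by transforming its argument. */
static int count_call(int ip)
{
	return ip + 1;
}

static const struct demo_ops my_ops = { .func = count_call };
static struct demo_site site = { .ops_slot = &my_ops };

/* The shared trampoline: one load from the site's slot, one indirect
 * call -- no walk over a global list of registered ops. */
static int trampoline(struct demo_site *s, int ip)
{
	return s->ops_slot->func(ip);
}
```

Swapping what `ops_slot` points at (the analog of `ftrace_rec_update_ops()` vs. `ftrace_rec_set_nop_ops()`) changes a site's behavior without touching the trampoline itself.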
[RFC PATCH 6/9] powerpc/ftrace: Update and move function profile instructions out-of-line
The function profile sequence on powerpc includes two instructions at
the beginning of each function:

	mflr	r0
	bl	ftrace_caller

The call to ftrace_caller() gets nop'ed out during kernel boot and is
patched in when ftrace is enabled. There are two issues with this:

1. The 'mflr r0' instruction at the beginning of each function remains
   even though ftrace is not being used.
2. When ftrace is activated, we return from ftrace_caller() with a bctr
   instruction to preserve r0 and LR, resulting in the link stack
   becoming unbalanced.

To address (1), we have tried to nop out the 'mflr r0' instruction when
nop'ing out the call to ftrace_caller() and restoring it when enabling
ftrace. But, that required additional synchronization slowing down
ftrace activation. It also left an additional nop instruction at the
beginning of each function and that wasn't desirable on 32-bit powerpc.

Instead of that, move the function profile sequence out-of-line leaving
a single nop at function entry. On ftrace activation, the nop is changed
to an unconditional branch to the out-of-line sequence that in turn
calls ftrace_caller(). This removes the need for complex synchronization
during ftrace activation and simplifies the code. More importantly, this
improves performance of the kernel when ftrace is not in use.

To address (2), change the ftrace trampoline to return with a 'blr'
instruction with the original return address in r0 intact. Then, an
additional 'mtlr r0' instruction in the function profile sequence can
move the correct return address back to LR.

With the above two changes, the function profile sequence now looks like
the following:

 [func:		# GEP -- 64-bit powerpc, optional
	addis	r2,r12,imm1
	addi	r2,r2,imm2]
 tramp:
	mflr	r0
	bl	ftrace_caller
	mtlr	r0
	b	func
	nop
	[nop]	# 64-bit powerpc only
 func:		# LEP
	nop

On 32-bit powerpc, the ftrace mcount trampoline is now completely
outside the function. This is also the case on 64-bit powerpc for
functions that do not need a GEP. However, for functions that need a
GEP, the additional instructions are inserted between the GEP and the
LEP. Since we can only have a fixed number of instructions between GEP
and LEP, we choose to emit 6 instructions. Four of those instructions
are used for the function profile sequence and two instruction slots are
reserved for implementing support for DYNAMIC_FTRACE_WITH_CALL_OPS. On
32-bit powerpc, we emit one additional nop for this purpose resulting in
a total of 5 nops before function entry.

To enable ftrace, the nop at function entry is changed to an
unconditional branch to 'tramp'. The call to ftrace_caller() may be
updated to ftrace_regs_caller() depending on the registered ftrace ops.
On 64-bit powerpc, we additionally change the instruction at 'tramp' to
'mflr r0' from an unconditional branch back to func+4. This is so that
functions entered through the GEP can skip the function profile sequence
unless ftrace is enabled.

With the context_switch microbenchmark on a P9 machine, there is a
performance improvement of ~6% with this patch applied, going from 650k
context switches to 690k context switches without ftrace enabled. With
ftrace enabled, the performance was similar at 86k context switches.

The downside of this approach is the increase in vmlinux size,
especially on 32-bit powerpc. We now emit 3 additional instructions for
each function (excluding the one or two instructions for supporting
DYNAMIC_FTRACE_WITH_CALL_OPS). On 64-bit powerpc with the current
implementation of -fpatchable-function-entry though, this is not
avoidable since we are forced to emit 6 instructions between the GEP and
the LEP even if we are to only support DYNAMIC_FTRACE_WITH_CALL_OPS.
Signed-off-by: Naveen N Rao
---
 arch/powerpc/Makefile                    |   6 +-
 arch/powerpc/include/asm/code-patching.h |  15 ++-
 arch/powerpc/include/asm/ftrace.h        |  18 ++-
 arch/powerpc/kernel/kprobes.c            |  51 +++-
 arch/powerpc/kernel/trace/ftrace.c       | 149 ++-
 arch/powerpc/kernel/trace/ftrace_entry.S |  54 ++--
 6 files changed, 246 insertions(+), 47 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index f19dbaa1d541..91ef34be8eb9 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -145,7 +145,11 @@ CFLAGS-$(CONFIG_PPC32)	+= $(call cc-option,-mno-readonly-in-sdata)
 ifdef CONFIG_FUNCTION_TRACER
 ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
 KBUILD_CPPFLAGS	+= -DCC_USING_PATCHABLE_FUNCTION_ENTRY
-CC_FLAGS_FTRACE := -fpatchable-function-entry=2
+ifdef CONFIG_PPC32
+CC_FLAGS_FTRACE := -fpatchable-function-entry=6,5
+else
+CC_FLAGS_FTRACE := -fpatchable-function-entry=7,6
+endif
 else
 CC_FLAGS_FTRACE := -pg
 ifdef CONFIG_MPROFILE_KERNEL

diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
[RFC PATCH 5/9] powerpc/kprobes: Use ftrace to determine if a probe is at function entry
Rather than hard-coding the offset into a function to be used to
determine if a kprobe is at function entry, use ftrace_location() to
determine the ftrace location within the function and categorize all
instructions till that offset to be function entry.

For functions that cannot be traced, we fall back to using a fixed
offset of 8 (two instructions) to categorize a probe as being at
function entry for 64-bit elfv2.

Signed-off-by: Naveen N Rao
---
 arch/powerpc/kernel/kprobes.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index b20ee72e873a..42665dfab59e 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -105,24 +105,22 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
 	return addr;
 }

-static bool arch_kprobe_on_func_entry(unsigned long offset)
+static bool arch_kprobe_on_func_entry(unsigned long addr, unsigned long offset)
 {
-#ifdef CONFIG_PPC64_ELF_ABI_V2
-#ifdef CONFIG_KPROBES_ON_FTRACE
-	return offset <= 16;
-#else
-	return offset <= 8;
-#endif
-#else
+	unsigned long ip = ftrace_location(addr);
+
+	if (ip)
+		return offset <= (ip - addr);
+	if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2))
+		return offset <= 8;
 	return !offset;
-#endif
 }

 /* XXX try and fold the magic of kprobe_lookup_name() in this */
 kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
					 bool *on_func_entry)
 {
-	*on_func_entry = arch_kprobe_on_func_entry(offset);
+	*on_func_entry = arch_kprobe_on_func_entry(addr, offset);
 	return (kprobe_opcode_t *)(addr + offset);
 }
-- 
2.43.0
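The entry-offset rule in this patch can be sketched as a tiny user-space analog (hypothetical names; `ftrace_off` stands in for `ftrace_location(addr) - addr`, with 0 meaning the function has no ftrace location, and `elfv2` for `IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2)`):

```c
#include <stdbool.h>

/* Sketch of arch_kprobe_on_func_entry()'s logic after this patch:
 * offsets up to the ftrace location count as function entry; otherwise
 * fall back to a fixed two-instruction (8-byte) window on 64-bit ELFv2,
 * or require offset 0 elsewhere. */
static bool on_func_entry(unsigned long ftrace_off, unsigned long offset,
			  bool elfv2)
{
	if (ftrace_off)
		return offset <= ftrace_off;
	if (elfv2)
		return offset <= 8;
	return offset == 0;
}
```

For example, with an out-of-line profile sequence placing the ftrace location 16 bytes in, a probe at offset 12 is still treated as function entry, while offset 20 is not.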
[RFC PATCH 4/9] powerpc/Kconfig: Select FUNCTION_ALIGNMENT_4B
From: Sathvika Vasireddy

Commit d49a0626216b95 ("arch: Introduce CONFIG_FUNCTION_ALIGNMENT")
introduced a generic function-alignment infrastructure. Move to using
FUNCTION_ALIGNMENT_4B on powerpc, to use the same alignment as that of
the existing _GLOBAL macro.

Signed-off-by: Sathvika Vasireddy
---
 arch/powerpc/Kconfig               | 1 +
 arch/powerpc/include/asm/linkage.h | 3 ---
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 6f105ee4f3cf..318e5c1b7454 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -189,6 +189,7 @@ config PPC
 	select EDAC_ATOMIC_SCRUB
 	select EDAC_SUPPORT
 	select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY if ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+	select FUNCTION_ALIGNMENT_4B
 	select GENERIC_ATOMIC64			if PPC32
 	select GENERIC_CLOCKEVENTS_BROADCAST	if SMP
 	select GENERIC_CMOS_UPDATE

diff --git a/arch/powerpc/include/asm/linkage.h b/arch/powerpc/include/asm/linkage.h
index b88d1d2cf304..b71b9582e754 100644
--- a/arch/powerpc/include/asm/linkage.h
+++ b/arch/powerpc/include/asm/linkage.h
@@ -4,9 +4,6 @@

 #include <asm/types.h>

-#define __ALIGN		.align 2
-#define __ALIGN_STR	".align 2"
-
 #ifdef CONFIG_PPC64_ELF_ABI_V1
 #define cond_syscall(x) \
 	asm ("\t.weak " #x "\n\t.set " #x ", sys_ni_syscall\n" \
-- 
2.43.0
[RFC PATCH 3/9] powerpc/ftrace: Remove nops after the call to ftrace_stub
ftrace_stub is within the same CU, so there is no need for a subsequent
nop instruction.

Signed-off-by: Naveen N Rao
---
 arch/powerpc/kernel/trace/ftrace_entry.S | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
index 17d1ed3d0b40..244a1c7bb1e8 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -162,7 +162,6 @@ _GLOBAL(ftrace_regs_caller)
 .globl ftrace_regs_call
 ftrace_regs_call:
 	bl	ftrace_stub
-	nop
 	ftrace_regs_exit 1

 _GLOBAL(ftrace_caller)
@@ -171,7 +170,6 @@ _GLOBAL(ftrace_caller)
 .globl ftrace_call
 ftrace_call:
 	bl	ftrace_stub
-	nop
 	ftrace_regs_exit 0

 _GLOBAL(ftrace_stub)
-- 
2.43.0
[RFC PATCH 2/9] powerpc/ftrace: Unify 32-bit and 64-bit ftrace entry code
On 32-bit powerpc, gcc generates a three instruction sequence for
function profiling:

	mflr	r0
	stw	r0, 4(r1)
	bl	_mcount

On kernel boot, the call to _mcount() is nop-ed out, to be patched back
in when ftrace is actually enabled. The 'stw' instruction therefore is
not necessary unless ftrace is enabled. Nop it out during ftrace init.

When ftrace is enabled, we want the 'stw' so that stack unwinding works
properly. Perform the same within the ftrace handler, similar to 64-bit
powerpc.

For 64-bit powerpc, early versions of gcc used to emit a three
instruction sequence for function profiling (with -mprofile-kernel) with
a 'std' instruction to mimic the 'stw' above. Address that scenario also
by nop-ing out the 'std' instruction during ftrace init.

Signed-off-by: Naveen N Rao
---
 arch/powerpc/kernel/trace/ftrace.c       | 6 --
 arch/powerpc/kernel/trace/ftrace_entry.S | 4 ++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 82010629cf88..2956196c98ff 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -229,13 +229,15 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
 		/* Expected sequence: 'mflr r0', 'stw r0,4(r1)', 'bl _mcount' */
 		ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
 		if (!ret)
-			ret = ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_STW(_R0, _R1, 4)));
+			ret = ftrace_modify_code(ip - 4, ppc_inst(PPC_RAW_STW(_R0, _R1, 4)),
+						 ppc_inst(PPC_RAW_NOP()));
 	} else if (IS_ENABLED(CONFIG_MPROFILE_KERNEL)) {
 		/* Expected sequence: 'mflr r0', ['std r0,16(r1)'], 'bl _mcount' */
 		ret = ftrace_read_inst(ip - 4, &old);
 		if (!ret && !ppc_inst_equal(old, ppc_inst(PPC_RAW_MFLR(_R0)))) {
 			ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
-			ret |= ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_STD(_R0, _R1, 16)));
+			ret |= ftrace_modify_code(ip - 4, ppc_inst(PPC_RAW_STD(_R0, _R1, 16)),
+						  ppc_inst(PPC_RAW_NOP()));
 		}
 	} else {
 		return -EINVAL;

diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S b/arch/powerpc/kernel/trace/ftrace_entry.S
index 40677416d7b2..17d1ed3d0b40 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -33,6 +33,8 @@
  * and then arrange for the ftrace function to be called.
  */
 .macro	ftrace_regs_entry allregs
+	/* Save the original return address in A's stack frame */
+	PPC_STL	r0, LRSAVE(r1)
 	/* Create a minimal stack frame for representing B */
 	PPC_STLU	r1, -STACK_FRAME_MIN_SIZE(r1)

@@ -44,8 +46,6 @@
 	SAVE_GPRS(3, 10, r1)

 #ifdef CONFIG_PPC64
-	/* Save the original return address in A's stack frame */
-	std	r0, LRSAVE+SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE(r1)
 	/* Ok to continue? */
 	lbz	r3, PACA_FTRACE_ENABLED(r13)
 	cmpdi	r3, 0
-- 
2.43.0
[RFC PATCH 0/9] powerpc: ftrace updates
Early RFC.

This series attempts to address a couple of issues with the existing
support for ftrace on powerpc, with a view towards improving performance
when ftrace is not enabled. See patch 6 for more details.

Patches 7 and 8 implement support for ftrace direct calls, by adding
support for DYNAMIC_FTRACE_WITH_CALL_OPS.

The first 5 patches are minor cleanups and updates, and can go in
separately.

This series depends on Benjamin Gray's series adding support for
patch_ulong().

I have lightly tested this patch set and it looks to be working well. As
described in patch 6, the context_switch microbenchmark shows an
improvement of ~6% with this series with ftrace disabled. Performance
when ftrace is enabled reduces due to how DYNAMIC_FTRACE_WITH_CALL_OPS
works, and due to support for direct calls. Some of that can hopefully
be improved, if this approach is otherwise ok.

- Naveen


Naveen N Rao (8):
  powerpc/ftrace: Fix indentation in ftrace.h
  powerpc/ftrace: Unify 32-bit and 64-bit ftrace entry code
  powerpc/ftrace: Remove nops after the call to ftrace_stub
  powerpc/kprobes: Use ftrace to determine if a probe is at function entry
  powerpc/ftrace: Update and move function profile instructions out-of-line
  powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS
  powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS
  samples/ftrace: Add support for ftrace direct samples on powerpc

Sathvika Vasireddy (1):
  powerpc/Kconfig: Select FUNCTION_ALIGNMENT_4B

 arch/powerpc/Kconfig                        |   6 +
 arch/powerpc/Makefile                       |   6 +-
 arch/powerpc/include/asm/code-patching.h    |  15 +-
 arch/powerpc/include/asm/ftrace.h           |  35 ++-
 arch/powerpc/include/asm/linkage.h          |   3 -
 arch/powerpc/kernel/asm-offsets.c           |   7 +
 arch/powerpc/kernel/kprobes.c               |  69 +-
 arch/powerpc/kernel/trace/ftrace.c          | 231
 arch/powerpc/kernel/trace/ftrace_entry.S    | 182 +++
 samples/ftrace/ftrace-direct-modify.c       |  94 +++-
 samples/ftrace/ftrace-direct-multi-modify.c | 110 +-
 samples/ftrace/ftrace-direct-multi.c        |  64 +-
 samples/ftrace/ftrace-direct-too.c          |  72 +-
 samples/ftrace/ftrace-direct.c              |  61 +-
 14 files changed, 845 insertions(+), 110 deletions(-)

base-commit: 9a15ae60f2c9707433b01e55815cd9142be102b2
prerequisite-patch-id: 38d3e705bf2e27cfa5e3ba369a6ded84ba6615c2
prerequisite-patch-id: 609d292e054b2396b603890522a940fa0bdfb6d8
prerequisite-patch-id: 6f7213fb77b1260defbf43be0e47bff9c80054cc
prerequisite-patch-id: f2328625ae2193c3c8e336b154b62030940cece8
--
2.43.0
Re: [PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
On Wed, 6 Dec 2023 16:42:29 -0600 Bjorn Helgaas wrote:

> From: Bjorn Helgaas
>
> The PCIe spec classifies errors as either "Correctable" or "Uncorrectable".
> Previously we printed these as "Corrected" or "Uncorrected". To avoid
> confusion, use the same terms as the spec.
>
> One confusing situation is when one agent detects an error, but another
> agent is responsible for recovery, e.g., by re-attempting the operation.
> The first agent may log a "correctable" error but it has not yet been
> corrected. The recovery agent must report an uncorrectable error if it is
> unable to recover. If we print the first agent's error as "Corrected", it
> gives the false impression that it has already been resolved.
>
> Sample message change:
>
>   - pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
>   + pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
>
> Signed-off-by: Bjorn Helgaas
Re: [PATCH RFC/RFT 3/4] riscv: Stop emitting preventive sfence.vma for new userspace mappings
On Thu, Dec 7, 2023 at 5:37 PM Christophe Leroy wrote:
>
> The subject says "riscv:" but it changes core part and several arch.
> Maybe this commit should be split in two commits, one for API changes
> that changes flush_tlb_fix_spurious_fault() to
> flush_tlb_fix_spurious_write_fault() and adds
> flush_tlb_fix_spurious_read_fault() including the change in memory.c,
> then a second patch with the changes to riscv.

You're right, I'll do that, thanks.

>
> Le 07/12/2023 à 16:03, Alexandre Ghiti a écrit :
> > The preventive sfence.vma were emitted because new mappings must be made
> > visible to the page table walker, either the uarch caches invalid
> > entries or not.
> >
> > Actually, there is no need to preventively sfence.vma on new mappings for
> > userspace, this should be handled only in the page fault path.
> >
> > This allows to drastically reduce the number of sfence.vma emitted:
> >
> > * Ubuntu boot to login:
> >   Before: ~630k sfence.vma
> >   After:  ~200k sfence.vma
> >
> > * ltp - mmapstress01
> >   Before: ~45k
> >   After:  ~6.3k
> >
> > * lmbench - lat_pagefault
> >   Before: ~665k
> >   After:  832 (!)
> >
> > * lmbench - lat_mmap
> >   Before: ~546k
> >   After:  718 (!)
> >
> > The only issue with the removal of sfence.vma in update_mmu_cache() is
> > that on uarchs that cache invalid entries, those won't be invalidated
> > until the process takes a fault: so that's an additional fault in those
> > cases.
> >
> > Signed-off-by: Alexandre Ghiti
> > ---
> >  arch/arm64/include/asm/pgtable.h              |  2 +-
> >  arch/mips/include/asm/pgtable.h               |  6 +--
> >  arch/powerpc/include/asm/book3s/64/tlbflush.h |  8 ++--
> >  arch/riscv/include/asm/pgtable.h              | 43 +++
> >  include/linux/pgtable.h                       |  8 +++-
> >  mm/memory.c                                   | 12 +-
> >  6 files changed, 48 insertions(+), 31 deletions(-)
>
> Did you forget mm/pgtable-generic.c ?

Indeed, I "missed" the occurrence of flush_tlb_fix_spurious_fault()
there, thanks.
>
> > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> > index 7f7d9b1df4e5..728f25f529a5 100644
> > --- a/arch/arm64/include/asm/pgtable.h
> > +++ b/arch/arm64/include/asm/pgtable.h
> > @@ -57,7 +57,7 @@ static inline bool arch_thp_swp_supported(void)
> >   * fault on one CPU which has been handled concurrently by another CPU
> >   * does not need to perform additional invalidation.
> >   */
> > -#define flush_tlb_fix_spurious_fault(vma, address, ptep) do { } while (0)
> > +#define flush_tlb_fix_spurious_write_fault(vma, address, ptep) do { } while (0)
>
> Why do you need to do that change ? Nothing is explained about that in
> the commit message.

I renamed this macro because in the page fault path,
flush_tlb_fix_spurious_fault() is called only when the fault is a write
fault (see https://elixir.bootlin.com/linux/latest/source/mm/memory.c#L5016).
I'll check if that fits the occurrence in mm/pgtable-generic.c too.

Thanks again for the review,

Alex

>
> >
> >  /*
> >   * ZERO_PAGE is a global shared page that is always zero: used
> > diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
> > index 430b208c0130..84439fe6ed29 100644
> > --- a/arch/mips/include/asm/pgtable.h
> > +++ b/arch/mips/include/asm/pgtable.h
> > @@ -478,9 +478,9 @@ static inline pgprot_t pgprot_writecombine(pgprot_t _prot)
> >       return __pgprot(prot);
> >  }
> >
> > -static inline void flush_tlb_fix_spurious_fault(struct vm_area_struct *vma,
> > -                                               unsigned long address,
> > -                                               pte_t *ptep)
> > +static inline void flush_tlb_fix_spurious_write_fault(struct vm_area_struct *vma,
> > +                                                     unsigned long address,
> > +                                                     pte_t *ptep)
> >  {
> >  }
> >
> > diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h b/arch/powerpc/include/asm/book3s/64/tlbflush.h
> > index 1950c1b825b4..7166d56f90db 100644
> > --- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
> > +++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
> > @@ -128,10 +128,10 @@ static inline void flush_tlb_page(struct vm_area_struct *vma,
> >  #define flush_tlb_page(vma, addr) local_flush_tlb_page(vma, addr)
> >  #endif /* CONFIG_SMP */
> >
> > -#define flush_tlb_fix_spurious_fault flush_tlb_fix_spurious_fault
> > -static inline void flush_tlb_fix_spurious_fault(struct vm_area_struct *vma,
> > -                                               unsigned long address,
> > -                                               pte_t *ptep)
> > +#define flush_tlb_fix_spurious_write_fault flush_tlb_fix_spurious_write_fault
> > +static inline void flush_tlb_fix_spurious_write_fault(struct vm_area_struct *vma,
> > +                                                     unsigned long
Re: [PATCH 3/3] PCI/AER: Use explicit register sizes for struct members
On Wed, 6 Dec 2023 16:42:31 -0600 Bjorn Helgaas wrote:

> From: Bjorn Helgaas
>
> aer_irq() reads the AER Root Error Status and Error Source Identification
> (PCI_ERR_ROOT_STATUS and PCI_ERR_ROOT_ERR_SRC) registers directly into
> struct aer_err_source. Both registers are 32 bits, so declare the members
> explicitly as "u32" instead of "unsigned int".
>
> Similarly, aer_get_device_error_info() reads the AER Header Log
> (PCI_ERR_HEADER_LOG) registers, which are also 32 bits, into struct
> aer_header_log_regs. Declare those members as "u32" as well.
>
> No functional changes intended.
>
> Signed-off-by: Bjorn Helgaas

Another sensible cleanup. FWIW on such simple patches

Reviewed-by: Jonathan Cameron
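The point of the cleanup above can be shown in a few lines of userspace C: a value read with a 32-bit config access should land in a fixed-width 32-bit member, which `uint32_t` guarantees and `unsigned int` does not. The struct below only mirrors the idea; its name and layout are invented for this sketch and are not the kernel's `struct aer_err_source` definition.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Two 32-bit AER registers stored in explicitly 32-bit members, so the
 * struct layout matches the register widths regardless of the ABI's
 * notion of 'unsigned int'.
 */
struct aer_err_source_example {
	uint32_t status;	/* models PCI_ERR_ROOT_STATUS: 32-bit register */
	uint32_t id;		/* models PCI_ERR_ROOT_ERR_SRC: 32-bit register */
};

/* uint32_t is exactly 32 bits by definition */
_Static_assert(sizeof(uint32_t) == 4, "uint32_t must be 4 bytes");
```

With fixed-width members, two consecutive 32-bit register reads fill the struct back-to-back with no padding surprises.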
Re: [PATCH 2/3] PCI/AER: Decode Requester ID when no error info found
On Wed, 6 Dec 2023 16:42:30 -0600 Bjorn Helgaas wrote:

> From: Bjorn Helgaas
>
> When a device with AER detects an error, it logs error information in its
> own AER Error Status registers. It may send an Error Message to the Root
> Port (RCEC in the case of an RCiEP), which logs the fact that an Error
> Message was received (Root Error Status) and the Requester ID of the
> message source (Error Source Identification).
>
> aer_print_port_info() prints the Requester ID from the Root Port Error
> Source in the usual Linux "bb:dd.f" format, but when find_source_device()
> finds no error details in the hierarchy below the Root Port, it printed the
> raw Requester ID without decoding it.
>
> Decode the Requester ID in the usual Linux format so it matches other
> messages.
>
> Sample message changes:
>
>   - pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
>   - pcieport 0000:00:1c.5: AER: can't find device of ID00e5
>   + pcieport 0000:00:1c.5: AER: Correctable error message received from 0000:00:1c.5
>   + pcieport 0000:00:1c.5: AER: found no error details for 0000:00:1c.5
>
> Signed-off-by: Bjorn Helgaas

LGTM

Reviewed-by: Jonathan Cameron
Re: [PATCH RFC/RFT 2/4] riscv: Add a runtime detection of invalid TLB entries caching
On Thu, Dec 7, 2023 at 4:55 PM Christophe Leroy wrote:
>
>
>
> Le 07/12/2023 à 16:03, Alexandre Ghiti a écrit :
> > This mechanism allows to completely bypass the sfence.vma introduced by
> > the previous commit for uarchs that do not cache invalid TLB entries.
> >
> > Signed-off-by: Alexandre Ghiti
> > ---
> >  arch/riscv/mm/init.c | 124 +++
> >  1 file changed, 124 insertions(+)
> >
> > diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> > index 379403de6c6f..2e854613740c 100644
> > --- a/arch/riscv/mm/init.c
> > +++ b/arch/riscv/mm/init.c
> > @@ -56,6 +56,8 @@ bool pgtable_l5_enabled = IS_ENABLED(CONFIG_64BIT) && !IS_ENABLED(CONFIG_XIP_KER
> >  EXPORT_SYMBOL(pgtable_l4_enabled);
> >  EXPORT_SYMBOL(pgtable_l5_enabled);
> >
> > +bool tlb_caching_invalid_entries;
> > +
> >  phys_addr_t phys_ram_base __ro_after_init;
> >  EXPORT_SYMBOL(phys_ram_base);
> >
> > @@ -750,6 +752,18 @@ static void __init disable_pgtable_l4(void)
> >       satp_mode = SATP_MODE_39;
> >  }
> >
> > +static void __init enable_pgtable_l5(void)
> > +{
> > +     pgtable_l5_enabled = true;
> > +     satp_mode = SATP_MODE_57;
> > +}
> > +
> > +static void __init enable_pgtable_l4(void)
> > +{
> > +     pgtable_l4_enabled = true;
> > +     satp_mode = SATP_MODE_48;
> > +}
> > +
> >  static int __init print_no4lvl(char *p)
> >  {
> >       pr_info("Disabled 4-level and 5-level paging");
> > @@ -826,6 +840,112 @@ static __init void set_satp_mode(uintptr_t dtb_pa)
> >       memset(early_pud, 0, PAGE_SIZE);
> >       memset(early_pmd, 0, PAGE_SIZE);
> >  }
> > +
> > +/* Determine at runtime if the uarch caches invalid TLB entries */
> > +static __init void set_tlb_caching_invalid_entries(void)
> > +{
> > +#define NR_RETRIES_CACHING_INVALID_ENTRIES  50

Looks odd to have macros nested in the middle of a function.

> > +     uintptr_t set_tlb_caching_invalid_entries_pmd = ((unsigned long)set_tlb_caching_invalid_entries) & PMD_MASK;
> > +     // TODO the test_addr as defined below could go into another pud...
> > +     uintptr_t test_addr = set_tlb_caching_invalid_entries_pmd + 2 * PMD_SIZE;
> > +     pmd_t valid_pmd;
> > +     u64 satp;
> > +     int i = 0;
> > +
> > +     /* To ease the page table creation */
> > +     disable_pgtable_l5();
> > +     disable_pgtable_l4();
> > +
> > +     /* Establish a mapping for set_tlb_caching_invalid_entries() in sv39 */
> > +     create_pgd_mapping(early_pg_dir,
> > +                        set_tlb_caching_invalid_entries_pmd,
> > +                        (uintptr_t)early_pmd,
> > +                        PGDIR_SIZE, PAGE_TABLE);
> > +
> > +     /* Handle the case where set_tlb_caching_invalid_entries straddles 2 PMDs */
> > +     create_pmd_mapping(early_pmd,
> > +                        set_tlb_caching_invalid_entries_pmd,
> > +                        set_tlb_caching_invalid_entries_pmd,
> > +                        PMD_SIZE, PAGE_KERNEL_EXEC);
> > +     create_pmd_mapping(early_pmd,
> > +                        set_tlb_caching_invalid_entries_pmd + PMD_SIZE,
> > +                        set_tlb_caching_invalid_entries_pmd + PMD_SIZE,
> > +                        PMD_SIZE, PAGE_KERNEL_EXEC);
> > +
> > +     /* Establish an invalid mapping */
> > +     create_pmd_mapping(early_pmd, test_addr, 0, PMD_SIZE, __pgprot(0));
> > +
> > +     /* Precompute the valid pmd here because the mapping for pfn_pmd() won't exist */
> > +     valid_pmd = pfn_pmd(PFN_DOWN(set_tlb_caching_invalid_entries_pmd), PAGE_KERNEL);
> > +
> > +     local_flush_tlb_all();
> > +     satp = PFN_DOWN((uintptr_t)&early_pg_dir) | SATP_MODE_39;
> > +     csr_write(CSR_SATP, satp);
> > +
> > +     /*
> > +      * Set stvec to after the trapping access, access this invalid mapping
> > +      * and legitimately trap
> > +      */
> > +     // TODO: Should I save the previous stvec?
> > +#define ASM_STR(x)  __ASM_STR(x)

Looks odd to have macros nested in the middle of a function.

>
> > +     asm volatile(
> > +             "la a0, 1f                      \n"
> > +             "csrw " ASM_STR(CSR_TVEC) ", a0 \n"
> > +             "ld a0, 0(%0)                   \n"
> > +             ".align 2                       \n"
> > +             "1:                             \n"
> > +             :
> > +             : "r" (test_addr)
> > +             : "a0"
> > +     );
> > +
> > +     /* Now establish a valid mapping to check if the invalid one is cached */
> > +     early_pmd[pmd_index(test_addr)] = valid_pmd;
> > +
> > +     /*
> > +      * Access the valid mapping multiple times: indeed, we can't use
> > +      * sfence.vma as a barrier to make sure the cpu did not reorder accesses
> > +      * so we may trap even if the uarch does not cache invalid entries. By
> > +      * trying a few times, we make sure that those uarchs will see the right
> > +      * mapping at some point.
> > +      */
> >
Re: [PATCH RFC/RFT 1/4] riscv: Stop emitting preventive sfence.vma for new vmalloc mappings
Hi Christophe,

On Thu, Dec 7, 2023 at 4:52 PM Christophe Leroy wrote:
>
>
>
> Le 07/12/2023 à 16:03, Alexandre Ghiti a écrit :
> > In 6.5, we removed the vmalloc fault path because that can't work (see
> > [1] [2]). Then in order to make sure that new page table entries were
> > seen by the page table walker, we had to preventively emit a sfence.vma
> > on all harts [3] but this solution is very costly since it relies on IPI.
> >
> > And even there, we could end up in a loop of vmalloc faults if a vmalloc
> > allocation is done in the IPI path (for example if it is traced, see
> > [4]), which could result in a kernel stack overflow.
> >
> > Those preventive sfence.vma needed to be emitted because:
> >
> > - if the uarch caches invalid entries, the new mapping may not be
> >   observed by the page table walker and an invalidation may be needed.
> > - if the uarch does not cache invalid entries, a reordered access
> >   could "miss" the new mapping and traps: in that case, we would actually
> >   only need to retry the access, no sfence.vma is required.
> >
> > So this patch removes those preventive sfence.vma and actually handles
> > the possible (and unlikely) exceptions. And since the kernel stacks
> > mappings lie in the vmalloc area, this handling must be done very early
> > when the trap is taken, at the very beginning of handle_exception: this
> > also rules out the vmalloc allocations in the fault path.
> >
> > Note that for now, we emit a sfence.vma even for uarchs that do not
> > cache invalid entries as we have no means to know that: that will be
> > fixed in the next patch.
> >
> > Link: https://lore.kernel.org/linux-riscv/20230531093817.665799-1-bj...@kernel.org/ [1]
> > Link: https://lore.kernel.org/linux-riscv/20230801090927.2018653-1-dy...@andestech.com [2]
> > Link: https://lore.kernel.org/linux-riscv/20230725132246.817726-1-alexgh...@rivosinc.com/ [3]
> > Link: https://lore.kernel.org/lkml/20200508144043.13893-1-j...@8bytes.org/ [4]
> > Signed-off-by: Alexandre Ghiti
> > ---
> >  arch/riscv/include/asm/cacheflush.h  | 19 +-
> >  arch/riscv/include/asm/thread_info.h |  5 ++
> >  arch/riscv/kernel/asm-offsets.c      |  5 ++
> >  arch/riscv/kernel/entry.S            | 94
> >  arch/riscv/mm/init.c                 |  2 +
> >  5 files changed, 124 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
> > index 3cb53c4df27c..a916cbc69d47 100644
> > --- a/arch/riscv/include/asm/cacheflush.h
> > +++ b/arch/riscv/include/asm/cacheflush.h
> > @@ -37,7 +37,24 @@ static inline void flush_dcache_page(struct page *page)
> >       flush_icache_mm(vma->vm_mm, 0)
> >
> >  #ifdef CONFIG_64BIT
> > -#define flush_cache_vmap(start, end)         flush_tlb_kernel_range(start, end)
> > +extern u64 new_vmalloc[];
>
> Can you have the table size here ? Would help GCC static analysis for
> boundary checking.

Yes, I'll do

>
> > +extern char _end[];
> > +#define flush_cache_vmap flush_cache_vmap
> > +static inline void flush_cache_vmap(unsigned long start, unsigned long end)
> > +{
> > +     if ((start < VMALLOC_END && end > VMALLOC_START) ||
> > +         (start < MODULES_END && end > MODULES_VADDR)) {
>
> Can you use is_vmalloc_or_module_addr() instead ?

Yes, I'll do

>
> > +             int i;
> > +
> > +             /*
> > +              * We don't care if concurrently a cpu resets this value since
> > +              * the only place this can happen is in handle_exception() where
> > +              * an sfence.vma is emitted.
> > +              */
> > +             for (i = 0; i < NR_CPUS / sizeof(u64) + 1; ++i)
>
> Use ARRAY_SIZE() ?

And that too :)

Thanks for the review,

Alex

>
> > +                     new_vmalloc[i] = -1ULL;
> > +     }
> > +}
> >  #endif
> >
> >  #ifndef CONFIG_SMP
> > diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
> > index 1833beb00489..8fe12fa6c329 100644
> > --- a/arch/riscv/include/asm/thread_info.h
> > +++ b/arch/riscv/include/asm/thread_info.h
> > @@ -60,6 +60,11 @@ struct thread_info {
> >       long            user_sp;        /* User stack pointer */
> >       int             cpu;
> >       unsigned long   syscall_work;   /* SYSCALL_WORK_ flags */
> > +     /*
> > +      * Used in handle_exception() to save a0, a1 and a2 before knowing if we
> > +      * can access the kernel stack.
> > +      */
> > +     unsigned long a0, a1, a2;
> >  };
> >
> >  /*
> > diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> > index d6a75aac1d27..340c1c84560d 100644
> > --- a/arch/riscv/kernel/asm-offsets.c
> > +++ b/arch/riscv/kernel/asm-offsets.c
> > @@ -34,10 +34,15 @@ void asm_offsets(void)
> >       OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
> >       OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
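Christophe's ARRAY_SIZE() suggestion above is easy to demonstrate in isolation. The sketch below is illustrative userspace C (the array name follows the patch, but NR_CPUS and the helper are invented here): deriving the loop bound from the array declaration itself means the bound can never drift out of sync with the declaration, unlike the hand-written `NR_CPUS / sizeof(u64) + 1` expression.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* The classic kernel idiom: element count of a statically sized array */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* NR_CPUS value is arbitrary for this sketch */
#define NR_CPUS 64

static uint64_t new_vmalloc[NR_CPUS / sizeof(uint64_t) + 1];

/*
 * Same loop as in the quoted flush_cache_vmap(), but bounded by
 * ARRAY_SIZE() so it always walks exactly the declared array.
 * Returns the number of words written.
 */
size_t mark_all_cpus(void)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(new_vmalloc); ++i)
		new_vmalloc[i] = -1ULL;
	return i;
}
```

If the array declaration later changes (say, to a bitmap sized in bits rather than bytes), the loop needs no edit at all.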
Re: [PATCH 01/12] KVM: PPC: Book3S HV nestedv2: Invalidate RPT before deleting a guest
Hi Aneesh,

Thanks for looking into this patch. My responses inline below:

"Aneesh Kumar K.V (IBM)" writes:

> Vaibhav Jain writes:
>
>> From: Jordan Niethe
>>
>> An L0 must invalidate the L2's RPT during H_GUEST_DELETE if this has not
>> already been done. This is a slow operation that means H_GUEST_DELETE
>> must return H_BUSY multiple times before completing. Invalidate the
>> tables before deleting the guest so there is less work for the L0 to do.
>>
>> Signed-off-by: Jordan Niethe
>> ---
>>  arch/powerpc/include/asm/kvm_book3s.h | 1 +
>>  arch/powerpc/kvm/book3s_hv.c          | 6 --
>>  arch/powerpc/kvm/book3s_hv_nested.c   | 2 +-
>>  3 files changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
>> index 4f527d09c92b..a37736ed3728 100644
>> --- a/arch/powerpc/include/asm/kvm_book3s.h
>> +++ b/arch/powerpc/include/asm/kvm_book3s.h
>> @@ -302,6 +302,7 @@ void kvmhv_nested_exit(void);
>>  void kvmhv_vm_nested_init(struct kvm *kvm);
>>  long kvmhv_set_partition_table(struct kvm_vcpu *vcpu);
>>  long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu *vcpu);
>> +void kvmhv_flush_lpid(u64 lpid);
>>  void kvmhv_set_ptbl_entry(u64 lpid, u64 dw0, u64 dw1);
>>  void kvmhv_release_all_nested(struct kvm *kvm);
>>  long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu);
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 1ed6ec140701..5543e8490cd9 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -5691,10 +5691,12 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
>>               kvmhv_set_ptbl_entry(kvm->arch.lpid, 0, 0);
>>       }
>>
>> -     if (kvmhv_is_nestedv2())
>> +     if (kvmhv_is_nestedv2()) {
>> +             kvmhv_flush_lpid(kvm->arch.lpid);
>>               plpar_guest_delete(0, kvm->arch.lpid);
>>
>
> I am not sure I follow the optimization here. I would expect the
> hypervisor to kill all the translation caches as part of guest_delete.
> What is the benefit of doing a lpid flush outside the guest delete?
>

Thats right. However without this optimization the H_GUEST_DELETE hcall
in plpar_guest_delete() returns H_BUSY multiple times, resulting in
multiple hcalls to the hypervisor until it finishes. Flushing the guest
translation cache upfront reduces the number of hcalls the L1 guest has
to make to delete an L2 guest via H_GUEST_DELETE.

>> -     else
>> +     } else {
>>               kvmppc_free_lpid(kvm->arch.lpid);
>> +     }
>>
>>       kvmppc_free_pimap(kvm);
>>  }
>> diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
>> index 3b658b8696bc..5c375ec1a3c6 100644
>> --- a/arch/powerpc/kvm/book3s_hv_nested.c
>> +++ b/arch/powerpc/kvm/book3s_hv_nested.c
>> @@ -503,7 +503,7 @@ void kvmhv_nested_exit(void)
>>       }
>>  }
>>
>> -static void kvmhv_flush_lpid(u64 lpid)
>> +void kvmhv_flush_lpid(u64 lpid)
>>  {
>>       long rc;
>>
>> --
>> 2.42.0

--
Cheers
~ Vaibhav
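The H_BUSY behaviour discussed above boils down to a retry loop in the L1: keep re-issuing the hcall until the L0 stops reporting busy, so any work flushed upfront directly shrinks the number of round trips. The following is an illustrative userspace model only — the return-code values and function names are invented for the sketch, not the PAPR or KVM definitions.

```c
#include <assert.h>

#define H_SUCCESS 0
#define H_BUSY    1	/* illustrative values, not the PAPR encodings */

/* Fake hcall: reports H_BUSY 'busy_left' more times before succeeding.
 * In the real flow, 'busy_left' stands for the amount of RPT
 * invalidation work the L0 still has to do. */
static int busy_left;

static int fake_h_guest_delete(void)
{
	return (busy_left-- > 0) ? H_BUSY : H_SUCCESS;
}

/*
 * The retry pattern the L1 must use around H_GUEST_DELETE: re-issue the
 * hcall while the L0 reports H_BUSY. Returns the number of hcalls made,
 * which is what the upfront kvmhv_flush_lpid() call reduces.
 */
int guest_delete_with_retry(void)
{
	int calls = 0;
	int rc;

	do {
		rc = fake_h_guest_delete();
		calls++;
	} while (rc == H_BUSY);
	return calls;
}
```

Running it with and without pending "work" shows the saving: flushing first drops the call count to a single hcall.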
[PATCH v6 3/3] Documentation/powerpc: update fadump implementation details
The patch titled ("powerpc: make fadump resilient with memory
add/remove events") has made significant changes to the implementation
of fadump, particularly on elfcorehdr creation and the fadump crash info
header structure. Therefore, update the fadump implementation
documentation to reflect those changes.

The following updates are done to the firmware assisted dump
documentation:

1. The elfcorehdr is no longer stored after the fadump HDR in the
   reserved dump area. Instead, the second kernel dynamically allocates
   memory for the elfcorehdr within the address range from 0 to the boot
   memory size. Therefore, update figures 1 and 2 of Memory Reservation
   during the first and second kernels to reflect this change.

2. A version field has been added to the fadump header to manage future
   changes to the fadump crash info header structure without changing
   the fadump header magic number. Therefore, remove the corresponding
   TODO from the document.

Signed-off-by: Sourabh Jain
Cc: Aditya Gupta
Cc: Aneesh Kumar K.V
Cc: Hari Bathini
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Naveen N Rao
---
 .../arch/powerpc/firmware-assisted-dump.rst | 91 +--
 1 file changed, 42 insertions(+), 49 deletions(-)

diff --git a/Documentation/arch/powerpc/firmware-assisted-dump.rst b/Documentation/arch/powerpc/firmware-assisted-dump.rst
index e363fc48529a..7e37aadd1f77 100644
--- a/Documentation/arch/powerpc/firmware-assisted-dump.rst
+++ b/Documentation/arch/powerpc/firmware-assisted-dump.rst
@@ -134,12 +134,12 @@
 that are run. If there is dump data, then the memory is held.

 If there is no waiting dump data, then only the memory required to
-hold CPU state, HPTE region, boot memory dump, FADump header and
-elfcore header, is usually reserved at an offset greater than boot
-memory size (see Fig. 1). This area is *not* released: this region
-will be kept permanently reserved, so that it can act as a receptacle
-for a copy of the boot memory content in addition to CPU state and
-HPTE region, in the case a crash does occur.
+hold CPU state, HPTE region, boot memory dump, and FADump header is
+usually reserved at an offset greater than boot memory size (see Fig. 1).
+This area is *not* released: this region will be kept permanently
+reserved, so that it can act as a receptacle for a copy of the boot
+memory content in addition to CPU state and HPTE region, in the case
+a crash does occur.

 Since this reserved memory area is used only after the system crash,
 there is no point in blocking this significant chunk of memory from
@@ -153,22 +153,22 @@
 that were present in CMA region::

   o Memory Reservation during first kernel

-  Low memory                                             Top of memory
-  0    boot memory size   |<--- Reserved dump area --->|             |
-  |    |                  | Permanent Reservation      |             |
-  V    V                  |                            |             V
-  +---+-/ /---+---+...+---+-+-+...+--+
-  |   |       |///|   | DUMP | HDR | ELF |   |  |
-  +---+-/ /---+---+...+---+-+-+...+--+
-      |        ^   ^       ^     ^           ^
-      |        |   |       |     |           |
-      \  CPU  HPTE /       |     |
-                           |     |
-      Boot memory content gets transferred   |
-      to reserved area by firmware at the    |
-      time of crash.                         |
-                            FADump Header    |
-                             (meta area)
+  Low memory                                             Top of memory
+  0    boot memory size   |<-- Reserved dump area ->|                |
+  |    |                  | Permanent Reservation   |                |
+  V    V                  |                         |                V
+  +---+-/ /---+---+...+---+---+...+-+
+  |   |       |///|   |DUMP | HDR |   | |
+  +---+-/ /---+---+...+---+---+...+-+
+      |        ^   ^      ^    ^        ^
+      |        |   |      |    |        |
+      \  CPU  HPTE /      |    |
+                          |    |
+      Boot memory content gets transferred
+      to reserved area by firmware at the
+      time of crash.
+                           FADump Header
+                            (meta area)

 Metadata: This area holds a metadata structure whose
@@ -186,13 +186,20 @@
 that were present in
[PATCH v6 2/3] powerpc/fadump: add hotplug_ready sysfs interface
The elfcorehdr describes the CPUs and memory of the crashed kernel to
the kernel that captures the dump, known as the second or fadump kernel.
The elfcorehdr needs to be updated if the system's memory changes due to
memory hotplug or online/offline events.

Currently, memory hotplug events are monitored in userspace by udev
rules, and fadump is re-registered, which recreates the elfcorehdr with
the latest available memory in the system.

However, the previous patch ("powerpc: make fadump resilient with memory
add/remove events") moved the creation of elfcorehdr to the second or
fadump kernel. This eliminates the need to regenerate the elfcorehdr
during memory hotplug or online/offline events.

Create a sysfs entry at /sys/kernel/fadump/hotplug_ready to let
userspace know that fadump re-registration is not required for memory
add/remove events.

Signed-off-by: Sourabh Jain
Cc: Aditya Gupta
Cc: Aneesh Kumar K.V
Cc: Hari Bathini
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Naveen N Rao
---
 Documentation/ABI/testing/sysfs-kernel-fadump | 11 +++
 arch/powerpc/kernel/fadump.c                  | 14 ++
 2 files changed, 25 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump b/Documentation/ABI/testing/sysfs-kernel-fadump
index 8f7a64a81783..971934b2891e 100644
--- a/Documentation/ABI/testing/sysfs-kernel-fadump
+++ b/Documentation/ABI/testing/sysfs-kernel-fadump
@@ -38,3 +38,14 @@
 Contact:	linuxppc-dev@lists.ozlabs.org
 Description:	read only
 		Provide information about the amount of memory reserved by
 		FADump to save the crash dump in bytes.
+
+What:		/sys/kernel/fadump/hotplug_ready
+Date:		Sep 2023
+Contact:	linuxppc-dev@lists.ozlabs.org
+Description:	read only
+		Kdump udev rule re-registers fadump on memory add/remove events,
+		primarily to update the elfcorehdr. This sysfs indicates to the
+		kdump udev rule that fadump re-registration is not required on
+		memory add/remove events because elfcorehdr is now prepared in
+		the second/fadump kernel.
+User:		kexec-tools

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index eb9132538268..a55dd9bf754c 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1455,6 +1455,18 @@ static ssize_t enabled_show(struct kobject *kobj,
 	return sprintf(buf, "%d\n", fw_dump.fadump_enabled);
 }

+/*
+ * /sys/kernel/fadump/hotplug_ready sysfs node returns 1, which indicates
+ * to userspace that fadump re-registration is not required on memory
+ * hotplug events.
+ */
+static ssize_t hotplug_ready_show(struct kobject *kobj,
+				  struct kobj_attribute *attr,
+				  char *buf)
+{
+	return sprintf(buf, "%d\n", 1);
+}
+
 static ssize_t mem_reserved_show(struct kobject *kobj,
 				 struct kobj_attribute *attr,
 				 char *buf)
@@ -1527,11 +1539,13 @@
 static struct kobj_attribute release_attr = __ATTR_WO(release_mem);
 static struct kobj_attribute enable_attr = __ATTR_RO(enabled);
 static struct kobj_attribute register_attr = __ATTR_RW(registered);
 static struct kobj_attribute mem_reserved_attr = __ATTR_RO(mem_reserved);
+static struct kobj_attribute hotplug_ready_attr = __ATTR_RO(hotplug_ready);

 static struct attribute *fadump_attrs[] = {
 	&enable_attr.attr,
 	&register_attr.attr,
 	&mem_reserved_attr.attr,
+	&hotplug_ready_attr.attr,
 	NULL,
 };
--
2.41.0
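On the consuming side, a kdump helper only has to read one integer from the new node. The sketch below is illustrative userspace C, not part of the patch: the function name is invented, and a missing file (an older kernel without this patch) is deliberately treated as 0, i.e. "re-registration still required".

```c
#include <stdio.h>

/*
 * Read /sys/kernel/fadump/hotplug_ready. Returns the integer the
 * kernel exposes (1 when elfcorehdr is prepared in the fadump kernel),
 * or 0 if the file is absent or unreadable, so callers fall back to
 * the old re-register-on-hotplug behaviour.
 */
int fadump_hotplug_ready(const char *path)
{
	FILE *f = fopen(path, "r");
	int val = 0;

	if (f) {
		if (fscanf(f, "%d", &val) != 1)
			val = 0;
		fclose(f);
	}
	return val;
}
```

A udev rule's helper script would call this with the real sysfs path and skip fadump re-registration when it returns 1.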
[PATCH v6 1/3] powerpc: make fadump resilient with memory add/remove events
Due to changes in memory resources caused by either memory hotplug or online/offline events, the elfcorehdr, which describes the CPUs and memory of the crashed kernel to the kernel that collects the dump (known as second/fadump kernel), becomes outdated. Consequently, attempting dump collection with an outdated elfcorehdr can lead to failed or inaccurate dump collection. Memory hotplug or online/offline events is referred as memory add/remove events in reset of the commit message. The current solution to address the aforementioned issue is as follows: Monitor memory add/remove events in userspace using udev rules, and re-register fadump whenever there are changes in memory resources. This leads to the creation of a new elfcorehdr with updated system memory information. There are several notable issues associated with re-registering fadump for every memory add/remove events. 1. Bulk memory add/remove events with udev-based fadump re-registration can lead to race conditions and, more importantly, it creates a wide window during which fadump is inactive until all memory add/remove events are settled. 2. Re-registering fadump for every memory add/remove event is inefficient. 3. The memory for elfcorehdr is allocated based on the memblock regions available during early boot and remains fixed thereafter. However, if elfcorehdr is later recreated with additional memblock regions, its size will increase, potentially leading to memory corruption. Address the aforementioned challenges by shifting the creation of elfcorehdr from the first kernel (also referred as the crashed kernel), where it was created and frequently recreated for every memory add/remove event, to the fadump kernel. As a result, the elfcorehdr only needs to be created once, thus eliminating the necessity to re-register fadump during memory add/remove events. At present, the first kernel prepares the fadump header and stores it in the fadump reserved area. 
The fadump header contains start address of the elfcorehd, crashing CPU details, etc. In the event of first kernel crash, the second/fadump boots and access the fadump header prepared by first kernel and do the following in a platform-specific function [rtas|opal]_fadump_process: At present, the first kernel is responsible for preparing the fadump header and storing it in the fadump reserved area. The fadump header includes the start address of the elfcorehd, crashing CPU details, and other relevant information. In the event of a crash in the first kernel, the second/fadump boots and accesses the fadump header prepared by the first kernel. It then performs the following steps in a platform-specific function [rtas|opal]_fadump_process: 1. Sanity check for fadump header 2. Update CPU notes in elfcorehdr 3. Set the global variable elfcorehdr_addr to the address of the fadump header's elfcorehdr. For vmcore module to process it later on. Along with the above, update the setup_fadump()/fadump.c to create elfcorehdr in second/fadump kernel. Section below outlines the information required to create the elfcorehdr and the changes made to make it available to the fadump kernel if it's not already. To create elfcorehdr, the following crashed kernel information is required: CPU notes, vmcoreinfo, and memory ranges. At present, the CPU notes are already prepared in the fadump kernel, so no changes are needed in that regard. The fadump kernel has access to all crashed kernel memory regions, including boot memory regions that are relocated by firmware to fadump reserved areas, so no changes for that either. However, it is necessary to add new members to the fadump header, i.e., the 'fadump_crash_info_header' structure, in order to pass the crashed kernel's vmcoreinfo address and its size to fadump kernel. In addition to the vmcoreinfo address and size, there are a few other attributes also added to the fadump_crash_info_header structure. 1. 
1. version: It stores the fadump header version, which is currently set to 1. This provides the flexibility to update the fadump crash info header in the future without changing the magic number. For each change in the fadump header, the version will be increased. This will help an updated kernel determine how to handle kernel dumps from older kernels. The magic number remains relevant for detecting fadump header corruption.

2. elfcorehdr_size: Since the elfcorehdr is now prepared in the fadump/second kernel and is not part of the reserved area, this attribute is needed to track the memory allocated for the elfcorehdr so it can be deallocated properly.

3. pt_regs_sz/cpu_mask_sz: Store the sizes of the pt_regs and cpu_mask structures in the first kernel. These attributes are used to avoid processing the dump if the sizes of pt_regs and cpu_mask are not the same across the crashed and fadump kernels.

Note: if either the first/crashed kernel or the second/fadump kernel does not have the changes introduced here, the kernel fails to collect the dump.
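The new header fields and the sanity checks described above can be sketched as a userspace mock-up. This is illustrative only: the field names follow the commit message, but the authoritative layout lives in the kernel sources, and FADUMP_MAGIC_V2 below is a placeholder value, not the real magic number.

```c
#include <stdint.h>
#include <assert.h>

#define FADUMP_MAGIC_V2 0xFADC0DE5ULL /* placeholder, not the kernel value */

/* Illustrative mock-up of the extended fadump crash info header. */
struct fadump_crash_info_header {
	uint64_t magic_number;     /* unchanged: detects corruption/endianness */
	uint32_t version;          /* new: starts at 1, bumped on layout changes */
	uint64_t elfcorehdr_addr;  /* now filled in by the fadump kernel */
	uint64_t elfcorehdr_size;  /* new: tracks allocation for later freeing */
	uint64_t vmcoreinfo_raddr; /* new: crashed kernel's vmcoreinfo address */
	uint64_t vmcoreinfo_size;  /* new: crashed kernel's vmcoreinfo size */
	uint64_t pt_regs_sz;       /* new: sizeof(struct pt_regs) in 1st kernel */
	uint64_t cpu_mask_sz;      /* new: sizeof(struct cpumask) in 1st kernel */
};

/* Mirrors the sanity checks the fadump kernel performs before processing
 * the dump: valid magic, known version, and matching pt_regs/cpu_mask
 * sizes between the crashed and fadump kernels. Returns 1 if the dump
 * may be processed, 0 otherwise. */
static int fadump_header_valid(const struct fadump_crash_info_header *fdh,
			       uint64_t my_pt_regs_sz, uint64_t my_cpu_mask_sz)
{
	if (fdh->magic_number != FADUMP_MAGIC_V2)
		return 0;
	if (fdh->version < 1)
		return 0;
	if (fdh->pt_regs_sz != my_pt_regs_sz ||
	    fdh->cpu_mask_sz != my_cpu_mask_sz)
		return 0;
	return 1;
}
```

The size-field comparison is what lets the fadump kernel refuse a dump whose pt_regs/cpu_mask layout differs from its own, instead of producing a corrupt vmcore.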
[PATCH v6 0/3] powerpc: make fadump resilient with memory add/remove events
Problem:
==
Due to changes in memory resources caused by either memory hotplug or online/offline events, the elfcorehdr, which describes the CPUs and memory of the crashed kernel to the kernel that collects the dump (known as the second/fadump kernel), becomes outdated. Consequently, attempting dump collection with an outdated elfcorehdr can lead to failed or inaccurate dump collection. Memory hotplug or online/offline events are referred to as memory add/remove events in the rest of the patch series.

Existing solution:
==
Monitor memory add/remove events in userspace using udev rules, and re-register fadump whenever there are changes in memory resources. This leads to the creation of a new elfcorehdr with updated system memory information.

Challenges with existing solution:
==
1. Performing bulk memory add/remove with udev-based fadump re-registration can lead to race conditions and, more importantly, it creates a wide window during which fadump is inactive until all memory add/remove events are settled.

2. Re-registering fadump for every memory add/remove event is inefficient.

3. Memory for the elfcorehdr is allocated based on the memblock regions available during the first kernel's early boot and remains fixed thereafter. However, if the elfcorehdr is later recreated with additional memblock regions, its size will increase, potentially leading to memory corruption.

Proposed solution:
==
Address the aforementioned challenges by shifting the creation of the elfcorehdr from the first kernel (also referred to as the crashed kernel), where it was created and frequently recreated for every memory add/remove event, to the fadump kernel. As a result, the elfcorehdr only needs to be created once, thus eliminating the necessity to re-register fadump during memory add/remove events. To know more about elfcorehdr creation in the fadump kernel, refer to the first patch in this series.
The second patch adds a new sysfs interface that tells userspace that fadump re-registration isn't needed for memory add/remove events. Note that userspace changes do not need to be in sync with kernel changes; they can roll out independently.

Since there are significant changes in the fadump implementation, the third patch updates the fadump documentation to reflect the changes made in this patch series.

Kernel tree rebased on 6.7.0-rc4 with the patch series applied:
=
https://github.com/sourabhjains/linux/tree/fadump-mem-hotplug-v6

Userspace changes:
==
To realize this feature, one must update the kdump udev rules to prevent fadump re-registration during memory add/remove events. On RHEL, apply the following changes to the file /usr/lib/udev/rules.d/98-kexec.rules:

-RUN+="/bin/sh -c '/usr/bin/systemctl is-active kdump.service || exit 0; /usr/bin/systemd-run --quiet --no-block /usr/lib/udev/kdump-udev-throttler'"
+# don't re-register fadump if the value of the node
+# /sys/kernel/fadump/hotplug_ready is 1.
+
+RUN+="/bin/sh -c '/usr/bin/systemctl is-active kdump.service || exit 0; ! test -f /sys/kernel/fadump_enabled || cat /sys/kernel/fadump_enabled | grep 0 || ! test -f /sys/kernel/fadump/hotplug_ready || cat /sys/kernel/fadump/hotplug_ready | grep 0 || exit 0; /usr/bin/systemd-run --quiet --no-block /usr/lib/udev/kdump-udev-throttler'"

Changelog:
==

v6: 8 Dec 2023
  - Add size fields for `pt_regs` and `cpumask` in the fadump header structure
  - Don't process the dump if the sizes of `pt_regs` and `cpu_mask` are not the same in the crashed and fadump kernels
  - Include an additional check for endianness mismatch when the magic number doesn't match, to print the relevant error message
  - Don't process the dump if the fadump header contains an old magic number
  - Rebased it to 6.7.0-rc4

v5: 29 Oct 2023
https://lore.kernel.org/all/20231029124548.12198-1-sourabhj...@linux.ibm.com/
  - Fix a comment on the first patch

v4: 21 Oct 2023
https://lore.kernel.org/all/20231021181733.204311-1-sourabhj...@linux.ibm.com/
  - Fix a build warning about type casting

v3: 9 Oct 2023
https://lore.kernel.org/all/20231009041953.36139-1-sourabhj...@linux.ibm.com/
  - Assign physical address of elfcorehdr to fdh->elfcorehdr_addr
  - Rename a variable, boot_mem_dest_addr -> boot_mem_dest_offset

v2: 25 Sep 2023
https://lore.kernel.org/all/20230925051214.678957-1-sourabhj...@linux.ibm.com/
  - Fixed a few indentation issues reported by the checkpatch script
  - Rebased it to 6.6.0-rc3

v1: 17 Sep 2023
https://lore.kernel.org/all/20230917080225.561627-1-sourabhj...@linux.ibm.com/

Cc: Aditya Gupta
Cc: Aneesh Kumar K.V
Cc: Hari Bathini
Cc: Mahesh Salgaonkar
Cc: Michael Ellerman
Cc: Naveen N Rao

Sourabh Jain (3):
  powerpc: make fadump resilient with memory add/remove events
  powerpc/fadump: add hotplug_ready sysfs interface
  Documentation/powerpc: update fadump
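The gating logic encoded in the udev rule change above can be sketched as a small shell helper. This is an illustrative rewrite, not code from the series: the function name and the argument convention (passing the sysfs paths as parameters) are invented so the logic can be exercised outside a powerpc host.

```shell
#!/bin/sh
# Illustrative helper mirroring the udev rule's gating logic:
# kdump/fadump re-registration can be skipped only when fadump is
# enabled AND /sys/kernel/fadump/hotplug_ready reads 1.
should_skip_reregister() {
	enabled_file="$1"
	ready_file="$2"
	# Both nodes must exist and both must read 1.
	[ -f "$enabled_file" ] && grep -q 1 "$enabled_file" &&
	[ -f "$ready_file" ] && grep -q 1 "$ready_file"
}

# Example: evaluate the real sysfs nodes when present.
if should_skip_reregister /sys/kernel/fadump_enabled \
			  /sys/kernel/fadump/hotplug_ready; then
	echo "skip fadump re-registration"
else
	echo "re-register fadump"
fi
```

On kernels without the hotplug_ready node (or with fadump disabled), the helper falls through to re-registration, matching the rule's behavior of staying backward compatible.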
Re: [PATCH v2] i2c: cpm: Remove linux,i2c-index conversion from be32
Acked-by: Jochen Friedrich

Am 06.12.2023 um 23:24 schrieb Christophe Leroy:
sparse reports an error on some data that gets converted from be32. That's because that data is typed u32 instead of __be32. The type is correct; the be32_to_cpu() conversion is not. Remove the conversion.

Reported-by: kernel test robot
Closes: https://lore.kernel.org/oe-kbuild-all/202312042210.ql4da8av-...@intel.com/
Signed-off-by: Christophe Leroy
---
v2: Use u32 directly, remove be32_to_cpu().
---
 drivers/i2c/busses/i2c-cpm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/i2c/busses/i2c-cpm.c b/drivers/i2c/busses/i2c-cpm.c
index 9a664abf734d..771d60bc8d71 100644
--- a/drivers/i2c/busses/i2c-cpm.c
+++ b/drivers/i2c/busses/i2c-cpm.c
@@ -658,7 +658,7 @@ static int cpm_i2c_probe(struct platform_device *ofdev)
 	/* register new adapter to i2c module... */
 	data = of_get_property(ofdev->dev.of_node, "linux,i2c-index", &len);
-	cpm->adap.nr = (data && len == 4) ? be32_to_cpup(data) : -1;
+	cpm->adap.nr = (data && len == 4) ? *data : -1;
 	result = i2c_add_numbered_adapter(&cpm->adap);
 	if (result < 0)
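To illustrate why the conversion (not the type) was the bug: a value that is already in CPU byte order must not be passed through a be32-to-cpu style swap, which corrupts it on little-endian hosts and is only a no-op on big-endian powerpc (why the bug was latent there and caught only by sparse's type checking). The userspace sketch below is illustrative; fake_be32_to_cpu and adapter_nr are invented stand-ins, not kernel symbols.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Invented stand-in for the kernel's be32_to_cpu(): a byte swap on
 * little-endian hosts, a no-op on big-endian ones. */
static uint32_t fake_be32_to_cpu(uint32_t v)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	return __builtin_bswap32(v);
#else
	return v;
#endif
}

/* Mirrors the fixed logic: the property value is already a CPU-order
 * u32, so it is used directly; -1 selects dynamic numbering when the
 * property is absent or malformed. */
static int adapter_nr(const uint32_t *data, int len)
{
	return (data && len == 4) ? (int)*data : -1;
}
```

Note that applying the swap twice is always an identity, which is one quick way to convince yourself the swap itself (and not the stored value) was the problem.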
[PATCH] powerpc/powernv: Add a null pointer check to scom_debug_init_one
kasprintf() returns a pointer to dynamically allocated memory which can be NULL upon failure. Add a null pointer check, and release 'ent' to avoid memory leaks.

Fixes: bfd2f0d49aef ("powerpc/powernv: Get rid of old scom_controller abstraction")
Cc: Kunwu Chan
Signed-off-by: Kunwu Chan
---
 arch/powerpc/platforms/powernv/opal-xscom.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/opal-xscom.c b/arch/powerpc/platforms/powernv/opal-xscom.c
index 262cd6fac907..748c2b97fa53 100644
--- a/arch/powerpc/platforms/powernv/opal-xscom.c
+++ b/arch/powerpc/platforms/powernv/opal-xscom.c
@@ -165,6 +165,11 @@ static int scom_debug_init_one(struct dentry *root, struct device_node *dn,
 	ent->chip = chip;
 	snprintf(ent->name, 16, "%08x", chip);
 	ent->path.data = (void *)kasprintf(GFP_KERNEL, "%pOF", dn);
+	if (!ent->path.data) {
+		kfree(ent);
+		return -ENOMEM;
+	}
+
 	ent->path.size = strlen((char *)ent->path.data);

 	dir = debugfs_create_dir(ent->name, root);
--
2.39.2
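The error-handling pattern the patch adds, freeing the earlier allocation when a later one fails, can be mocked in userspace as follows. The struct and function names here are invented for illustration; plain malloc/memcpy stands in for kasprintf().

```c
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/* Invented userspace analog of the patched code path. */
struct scom_ent_mock {
	char *path;
};

static struct scom_ent_mock *make_ent(const char *node_path)
{
	struct scom_ent_mock *ent = malloc(sizeof(*ent));
	size_t len;

	if (!ent)
		return NULL;

	/* kasprintf() analog: a second allocation that can also fail.
	 * On failure, 'ent' must be freed before returning, or it
	 * leaks -- this is the leak the patch plugs. */
	len = strlen(node_path) + 1;
	ent->path = malloc(len);
	if (!ent->path) {
		free(ent);
		return NULL;
	}
	memcpy(ent->path, node_path, len);
	return ent;
}

static void free_ent(struct scom_ent_mock *ent)
{
	if (ent) {
		free(ent->path);
		free(ent);
	}
}
```

The same rule applies at every step of a multi-allocation constructor: each error path must unwind exactly the allocations made so far.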
Re: [PATCH 09/12] KVM: PPC: Book3S HV nestedv2: Do not call H_COPY_TOFROM_GUEST
Vaibhav Jain writes:
> From: Jordan Niethe
>
> H_COPY_TOFROM_GUEST is part of the nestedv1 API and so should not be
> called by a nestedv2 host. Do not attempt to call it.
>

Maybe we should use firmware_has_feature(FW_FEATURE_H_COPY_TOFROM_GUEST)? A nestedv2 host can end up using the above hcall if it is supported by the hypervisor, right? In its absence, we will have to translate the guest ea using xlate and then use kvm_guest_read to read the location using the guest real address, right? That xlate will also involve multiple kvm_guest_read calls.

> Signed-off-by: Jordan Niethe
> ---
>  arch/powerpc/kvm/book3s_64_mmu_radix.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> index 916af6c153a5..4a1abb9f7c05 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> @@ -40,6 +40,9 @@ unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid,
>  	unsigned long quadrant, ret = n;
>  	bool is_load = !!to;
>
> +	if (kvmhv_is_nestedv2())
> +		return H_UNSUPPORTED;
> +
>  	/* Can't access quadrants 1 or 2 in non-HV mode, call the HV to do it */
>  	if (kvmhv_on_pseries())
>  		return plpar_hcall_norets(H_COPY_TOFROM_GUEST, lpid, pid, eaddr,
> --
> 2.42.0