[linux-next:master] BUILD REGRESSION bc63de6e6ba0b16652c5fb4b9c9916b9e7ca1f23

2023-12-08 Thread kernel test robot
tree/branch: 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: bc63de6e6ba0b16652c5fb4b9c9916b9e7ca1f23  Add linux-next specific 
files for 20231208

Error/Warning reports:

https://lore.kernel.org/oe-kbuild-all/202312081716.luuhsns4-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202312081904.nkusjjo0-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202312081955.g3stkpfj-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202312090117.dlmjtqst-...@intel.com

Error/Warning: (recently discovered and may have been fixed)

WARNING: modpost: vmlinux: section mismatch in reference: 
at91_poweroff_probe+0x8c (section: .text) -> at91_wakeup_status (section: 
.init.text)
WARNING: modpost: vmlinux: section mismatch in reference: at91_shdwc_probe+0xd8 
(section: .text) -> at91_wakeup_status (section: .init.text)
arch/mips/mm/cache.c:209:(.text+0x690): undefined reference to `r3k_cache_init'
arch/powerpc/platforms/44x/warp.c:109:15: error: variable 'warp_gpio_leds' has 
initializer but incomplete type
arch/powerpc/platforms/44x/warp.c:109:31: error: storage size of 
'warp_gpio_leds' isn't known
arch/powerpc/platforms/44x/warp.c:110:10: error: 'struct platform_device' has 
no member named 'name'
arch/powerpc/platforms/44x/warp.c:110:19: warning: excess elements in struct 
initializer
arch/powerpc/platforms/44x/warp.c:111:10: error: 'struct platform_device' has 
no member named 'id'
arch/powerpc/platforms/44x/warp.c:112:10: error: 'struct platform_device' has 
no member named 'dev'
arch/powerpc/platforms/44x/warp.c:112:19: error: extra brace group at end of 
initializer
arch/powerpc/platforms/44x/warp.c:197:25: error: implicit declaration of 
function 'platform_device_register'; did you mean 'of_device_register'? 
[-Werror=implicit-function-declaration]
drivers/usb/host/uhci-grlib.c:152:31: error: implicit declaration of function 
'platform_get_drvdata'; did you mean 'pci_get_drvdata'? 
[-Werror=implicit-function-declaration]
drivers/usb/host/uhci-grlib.c:152:31: warning: initialization of 'struct 
usb_hcd *' from 'int' makes pointer from integer without a cast 
[-Wint-conversion]
drivers/usb/host/uhci-grlib.c:184:15: error: variable 'uhci_grlib_driver' has 
initializer but incomplete type
drivers/usb/host/uhci-grlib.c:184:31: error: storage size of 
'uhci_grlib_driver' isn't known
drivers/usb/host/uhci-grlib.c:185:10: error: 'struct platform_driver' has no 
member named 'probe'
drivers/usb/host/uhci-grlib.c:185:27: warning: excess elements in struct 
initializer
drivers/usb/host/uhci-grlib.c:186:10: error: 'struct platform_driver' has no 
member named 'remove_new'
drivers/usb/host/uhci-grlib.c:187:10: error: 'struct platform_driver' has no 
member named 'shutdown'
drivers/usb/host/uhci-grlib.c:188:10: error: 'struct platform_driver' has no 
member named 'driver'
drivers/usb/host/uhci-grlib.c:188:19: error: extra brace group at end of 
initializer
drivers/usb/host/uhci-grlib.c:92:36: error: invalid use of undefined type 
'struct platform_device'
drivers/usb/host/uhci-hcd.c:885:18: error: implicit declaration of function 
'platform_driver_register' [-Werror=implicit-function-declaration]
drivers/usb/host/uhci-hcd.c:902:9: error: implicit declaration of function 
'platform_driver_unregister'; did you mean 'driver_unregister'? 
[-Werror=implicit-function-declaration]
uffd-common.c:636:28: warning: unused variable 'uffdio_move' [-Wunused-variable]
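
Most of the warp.c and uhci-grlib.c errors above share one pattern: 'struct
platform_device' / 'struct platform_driver' are incomplete types and the
platform_*() helpers are undeclared in the failing translation units. A
hedged guess (an assumption, not confirmed by this report) is that these
files used to inherit the definitions indirectly through another header, in
which case the conventional fix is a direct include:

	/* assumption: previously pulled in indirectly via another header */
	#include <linux/platform_device.h>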

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- alpha-randconfig-r083-20230821
|   `-- 
WARNING:modpost:missing-MODULE_DESCRIPTION()-in-drivers-char-agp-alpha-agp.o
|-- arc-randconfig-r132-20231208
|   `-- 
lib-zstd-compress-zstd_fast.c:sparse:sparse:Using-plain-integer-as-NULL-pointer
|-- arm-allyesconfig
|   `-- qcom_stats.c:(.text):undefined-reference-to-__aeabi_uldivmod
|-- arm-randconfig-r004-20221210
|   `-- 
WARNING:modpost:drivers-pcmcia-omap_cf:section-mismatch-in-reference:omap_cf_driver-(section:.data)-omap_cf_remove-(section:.exit.text)
|-- arm-randconfig-r112-20231208
|   `-- 
lib-zstd-compress-zstd_fast.c:sparse:sparse:Using-plain-integer-as-NULL-pointer
|-- i386-buildonly-randconfig-002-20230825
|   `-- 
include-linux-compiler_types.h:error:call-to-__compiletime_assert_NNN-declared-with-attribute-error:BUILD_BUG_ON-failed:sizeof(-vcpup)-SMP_CACHE_BYTES
|-- i386-randconfig-r133-20231208
|   `-- 
lib-zstd-compress-zstd_fast.c:sparse:sparse:Using-plain-integer-as-NULL-pointer
|-- loongarch-randconfig-r123-20231208
|   `-- 
lib-zstd-compress-zstd_fast.c:sparse:sparse:Using-plain-integer-as-NULL-pointer
|-- mips-allyesconfig
|   `-- 
qcom_stats.c:(.text.qcom_ddr_stats_show):undefined-reference-to-__udivdi3
|-- mips-fuloong2e_defconfig
|   `-- 
WARNING:modpost:missing-MODULE_DESCRIPTION()-in-drivers-base-regmap-regmap-mmio.o
|-- mips-randconfig-r021-20230212
|   `-- arch-mips-mm-cache.c:(.text):undefined-reference-to-r3k_cache_init
|-- parisc-allmodconfig
|   `-- 
WARNING:modpost:missing-MODULE_DESCR

Re: [PATCH v2] i2c: cpm: Remove linux,i2c-index conversion from be32

2023-12-08 Thread Andi Shyti
Hi Christophe,

On Wed, Dec 06, 2023 at 11:24:03PM +0100, Christophe Leroy wrote:
> sparse reports an error on some data that gets converted from be32.
> 
> That's because that data is typed u32 instead of __be32.
> 
> The type is correct, the be32_to_cpu() conversion is not.
> 
> Remove the conversion.
> 
> Reported-by: kernel test robot 
> Closes: 
> https://lore.kernel.org/oe-kbuild-all/202312042210.ql4da8av-...@intel.com/
> Signed-off-by: Christophe Leroy 
> ---
> v2: Use u32 directly, remove be32_to_cpu().
> ---
>  drivers/i2c/busses/i2c-cpm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/i2c/busses/i2c-cpm.c b/drivers/i2c/busses/i2c-cpm.c
> index 9a664abf734d..771d60bc8d71 100644
> --- a/drivers/i2c/busses/i2c-cpm.c
> +++ b/drivers/i2c/busses/i2c-cpm.c
> @@ -658,7 +658,7 @@ static int cpm_i2c_probe(struct platform_device *ofdev)
>   /* register new adapter to i2c module... */
>  
>   data = of_get_property(ofdev->dev.of_node, "linux,i2c-index", &len);
> - cpm->adap.nr = (data && len == 4) ? be32_to_cpup(data) : -1;
> + cpm->adap.nr = (data && len == 4) ? *data : -1;

thanks!

Reviewed-by: Andi Shyti 

Andi
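
For background, a minimal sketch (not from the patch) of what sparse was
flagging: __be32 is a sparse "restricted" type, so handing a plain u32
pointer to a helper that expects __be32 warns under 'make C=1':

	const u32 *data;	/* property value, already CPU-endian */
	u32 v;

	v = be32_to_cpup(data);	/* sparse: incorrect type in argument 1
				 * (expected restricted __be32 const *) */
	v = *data;		/* correct: no conversion needed */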


Re: [PATCH 0/3] PCI/AER: Clean up logging

2023-12-08 Thread Bjorn Helgaas
[+cc Jonathan]

On Wed, Dec 06, 2023 at 04:42:28PM -0600, Bjorn Helgaas wrote:
> From: Bjorn Helgaas 
> 
> Clean up some minor AER logging issues:
> 
>   - Log as "Correctable errors", not "Corrected errors"
> 
>   - Decode the Requester ID when we couldn't find detail error info
> 
> Bjorn Helgaas (3):
>   PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
>   PCI/AER: Decode Requester ID when no error info found
>   PCI/AER: Use explicit register sizes for struct members
> 
>  drivers/pci/pcie/aer.c | 19 ---
>  include/linux/aer.h|  8 
>  2 files changed, 16 insertions(+), 11 deletions(-)

Applied to pci/aer for v6.8.  Thanks, Jonathan, for your time in
taking a look.


[RFC PATCH 1/9] powerpc/ftrace: Fix indentation in ftrace.h

2023-12-08 Thread Naveen N Rao
Replace seven spaces with a tab character to fix an indentation issue
reported by the kernel test robot.

Reported-by: kernel test robot 
Closes: 
https://lore.kernel.org/oe-kbuild-all/202311221731.aluwtdim-...@intel.com/
Signed-off-by: Naveen N Rao 
---
 arch/powerpc/include/asm/ftrace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index 9e5a39b6a311..1ebd2ca97f12 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -25,7 +25,7 @@ static inline unsigned long ftrace_call_adjust(unsigned long 
addr)
if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY))
addr += MCOUNT_INSN_SIZE;
 
-   return addr;
+   return addr;
 }
 
 unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
-- 
2.43.0



[RFC PATCH 9/9] samples/ftrace: Add support for ftrace direct samples on powerpc

2023-12-08 Thread Naveen N Rao
Add powerpc 32-bit and 64-bit samples for ftrace direct. These demonstrate
the instruction sequences that ftrace direct calls must use to adhere to
the ftrace ABI.

On 64-bit powerpc, TOC setup requires some additional work.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/Kconfig|   2 +
 samples/ftrace/ftrace-direct-modify.c   |  94 -
 samples/ftrace/ftrace-direct-multi-modify.c | 110 +++-
 samples/ftrace/ftrace-direct-multi.c|  64 +++-
 samples/ftrace/ftrace-direct-too.c  |  72 -
 samples/ftrace/ftrace-direct.c  |  61 ++-
 6 files changed, 398 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 4fe04fdca33a..28de3a5f3e98 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -274,6 +274,8 @@ config PPC
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RELIABLE_STACKTRACE
select HAVE_RSEQ
+   select HAVE_SAMPLE_FTRACE_DIRECT        if HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+   select HAVE_SAMPLE_FTRACE_DIRECT_MULTI  if HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
select HAVE_SETUP_PER_CPU_AREA  if PPC64
select HAVE_SOFTIRQ_ON_OWN_STACK
select HAVE_STACKPROTECTOR  if PPC32 && 
$(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
diff --git a/samples/ftrace/ftrace-direct-modify.c 
b/samples/ftrace/ftrace-direct-modify.c
index e2a6a69352df..bd985035b937 100644
--- a/samples/ftrace/ftrace-direct-modify.c
+++ b/samples/ftrace/ftrace-direct-modify.c
@@ -2,7 +2,7 @@
 #include 
 #include 
 #include 
-#ifndef CONFIG_ARM64
+#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
 #include 
 #endif
 
@@ -164,6 +164,98 @@ asm (
 
 #endif /* CONFIG_LOONGARCH */
 
+#ifdef CONFIG_PPC32
+
+asm (
+"  .pushsection.text, \"ax\", @progbits\n"
+"  .type   my_tramp1, @function\n"
+"  .globl  my_tramp1\n"
+"   my_tramp1:"
+"  stw 0, 4(1)\n"
+"  stwu1, -16(1)\n"
+"  mflr0\n"
+"  stw 0, 4(1)\n"
+"  stwu1, -16(1)\n"
+"  bl  my_direct_func1\n"
+"  lwz 0, 20(1)\n"
+"  mtlr0\n"
+"  addi1, 1, 32\n"
+"  lwz 0, 4(1)\n"
+"  blr\n"
+"  .size   my_tramp1, .-my_tramp1\n"
+
+"  .type   my_tramp2, @function\n"
+"  .globl  my_tramp2\n"
+"   my_tramp2:"
+"  stw 0, 4(1)\n"
+"  stwu1, -16(1)\n"
+"  mflr0\n"
+"  stw 0, 4(1)\n"
+"  stwu1, -16(1)\n"
+"  bl  my_direct_func2\n"
+"  lwz 0, 20(1)\n"
+"  mtlr0\n"
+"  addi1, 1, 32\n"
+"  lwz 0, 4(1)\n"
+"  blr\n"
+"  .size   my_tramp2, .-my_tramp2\n"
+"  .popsection\n"
+);
+
+#endif /* CONFIG_PPC32 */
+
+#ifdef CONFIG_PPC64
+
+asm (
+"  .pushsection.text, \"ax\", @progbits\n"
+"  .type   my_tramp1, @function\n"
+"  .globl  my_tramp1\n"
+"   my_tramp1:"
+"  std 0, 16(1)\n"
+"  stdu1, -32(1)\n"
+"  mflr0\n"
+"  std 0, 16(1)\n"
+"  stdu1, -32(1)\n"
+"  std 2, 24(1)\n"
+"  bcl 20, 31, 1f\n"
+"   1: mflr12\n"
+"  ld  2, (2f - 1b)(12)\n"
+"  bl  my_direct_func1\n"
+"  ld  2, 24(1)\n"
+"  ld  0, 48(1)\n"
+"  mtlr0\n"
+"  addi1, 1, 64\n"
+"  ld  0, 16(1)\n"
+"  blr\n"
+"   2: .quad   .TOC.@tocbase\n"
+"  .size   my_tramp1, .-my_tramp1\n"
+
+"  .type   my_tramp2, @function\n"
+"  .globl  my_tramp2\n"
+"   my_tramp2:"
+"  std 0, 16(1)\n"
+"  stdu1, -32(1)\n"
+"  mflr0\n"
+"  std 0, 16(1)\n"
+"  stdu1, -32(1)\n"
+"  std 2, 24(1)\n"
+"  bcl 20, 31, 1f\n"
+"   1: mflr12\n"
+"  ld  2, (2f - 1b)(12)\n"
+"  bl  my_direct_func2\n"
+"  ld  2, 24(1)\n"
+"  ld  0, 48(1)\n"
+"  mtlr0\n"
+"  addi1, 1, 64\n"
+"  ld  0, 16(1)\n"
+"  blr\n"
+"   2: .quad   .TOC.@tocbase\n"
+"  .size   my_tramp2, .-my_tramp2\n"
+"  .popsection\n"
+);
+
+#endif /* CONFIG_PPC64 */
+
 static struct ftrace_ops direct;
 
 static unsigned long my_tramp = (unsigned long)my_tramp1;
diff --git a/samples/ftrace/ftrace-direct-multi-modify.c 
b/samples/ftrace/ftrace-direct-multi-modify.c
index 2e349834d63c..478e879a23af 100644
--- a/samples/ftrace/ftrace-direct-multi-modify.c
+++ b/samples/ftrace/ftrace-direct-multi-modify.c
@@ 

[RFC PATCH 8/9] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS

2023-12-08 Thread Naveen N Rao
Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS similar to the arm64
implementation.

ftrace direct calls allow custom trampolines to be called into directly
from function ftrace call sites, bypassing the ftrace trampoline
completely. This functionality is currently utilized by BPF trampolines
to hook into kernel function entries.

Since we have a limited relative branch range, we support ftrace direct
calls through DYNAMIC_FTRACE_WITH_CALL_OPS. In this approach, the ftrace
trampoline is not entirely bypassed. Rather, it is re-purposed into a stub
that reads the direct_call field from the associated ftrace_ops structure
and branches into it, if it is not NULL. For this, it is sufficient to
ensure that the ftrace trampoline is reachable from all traceable
functions.

When multiple ftrace_ops are associated with a call site, we use a
callback to set pt_regs->orig_gpr3, which can then be tested on the
return path from the ftrace trampoline to branch into the direct caller.
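
As a rough illustration of the stub described above (pseudo-C; the real
logic lives in ftrace_entry.S, and branch_to() is a made-up placeholder):

	if (ops->direct_call)			/* single ftrace_ops case    */
		branch_to(ops->direct_call);	/* bypass the ftrace handlers */

	/* Multiple-ops case: the ops iterator ends up calling
	 * arch_ftrace_set_direct_caller(), which stashes the direct caller
	 * in pt_regs->orig_gpr3; the trampoline's exit path tests that
	 * field and branches there instead of returning to the traced
	 * function.
	 */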

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/include/asm/ftrace.h| 15 
 arch/powerpc/kernel/asm-offsets.c|  3 +
 arch/powerpc/kernel/trace/ftrace_entry.S | 99 ++--
 4 files changed, 93 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index c8ecc9dcc914..4fe04fdca33a 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -235,6 +235,7 @@ config PPC
select HAVE_DYNAMIC_FTRACE
	select HAVE_DYNAMIC_FTRACE_WITH_ARGS	if ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
	select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS if ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+	select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS if HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
	select HAVE_DYNAMIC_FTRACE_WITH_REGS	if ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
select HAVE_EBPF_JIT
select HAVE_EFFICIENT_UNALIGNED_ACCESS
diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index d9b99781bea3..986c4fffb9ec 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -93,6 +93,21 @@ struct ftrace_ops;
 #define ftrace_graph_func ftrace_graph_func
 void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
   struct ftrace_ops *op, struct ftrace_regs *fregs);
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+/*
+ * When an ftrace registered caller is tracing a function that is also set by a
+ * register_ftrace_direct() call, it needs to be differentiated in the
+ * ftrace_caller trampoline so that the direct call can be invoked after the
+ * other ftrace ops. To do this, place the direct caller in the orig_gpr3 field
+ * of pt_regs. This tells ftrace_caller that there's a direct caller.
+ */
+static inline void arch_ftrace_set_direct_caller(struct ftrace_regs *fregs, unsigned long addr)
+{
+	struct pt_regs *regs = &fregs->regs;
+	regs->orig_gpr3 = addr;
+}
+#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
 #endif
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 8b8a39b57a9f..85da10726d98 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -678,6 +678,9 @@ int main(void)
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
OFFSET(FTRACE_OPS_FUNC, ftrace_ops, func);
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+   OFFSET(FTRACE_OPS_DIRECT_CALL, ftrace_ops, direct_call);
+#endif
 #endif
 
return 0;
diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S 
b/arch/powerpc/kernel/trace/ftrace_entry.S
index 4d1220c2e32f..ab60395fc34b 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -33,14 +33,57 @@
  * and then arrange for the ftrace function to be called.
  */
 .macro ftrace_regs_entry allregs
-   /* Save the original return address in A's stack frame */
-   PPC_STL r0, LRSAVE(r1)
/* Create a minimal stack frame for representing B */
	PPC_STLU	r1, -STACK_FRAME_MIN_SIZE(r1)
 
/* Create our stack frame + pt_regs */
	PPC_STLU	r1, -SWITCH_FRAME_SIZE(r1)
 
+   .if \allregs == 1
+   SAVE_GPRS(11, 12, r1)
+   .endif
+
+   /* Get the _mcount() call site out of LR */
+	mflr	r11
+
+#ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+   /*
+* This points after the bl at 'mtlr r0', but this sequence could be
+* outside the function. Move this to point just after the ftrace
+* location inside the function for proper unwind.
+*/
+	addi	r11, r11, FTRACE_MCOUNT_TRAMP_OFFSET - MCOUNT_INSN_SIZE
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+   /* Load the ftrace_op */
+   PPC_LL  r12, -SZL-MCOUNT_INSN_SIZE(r11)
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+   /* 

[RFC PATCH 7/9] powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS

2023-12-08 Thread Naveen N Rao
Implement support for DYNAMIC_FTRACE_WITH_CALL_OPS similar to the
arm64 implementation.

This works by patching-in a pointer to an associated ftrace_ops
structure before each traceable function. If multiple ftrace_ops are
associated with a call site, then a special ftrace_list_ops is used to
enable iterating over all the registered ftrace_ops. If no ftrace_ops
are associated with a call site, then a special ftrace_nop_ops structure
is used to render the ftrace call as a no-op. The ftrace trampoline can
then read the ftrace_ops associated with a call site by loading from an
offset from the LR, and branch directly to the associated function.

The primary advantage with this approach is that we don't have to
iterate over all the registered ftrace_ops for call sites that have a
single ftrace_ops registered. This is the equivalent of implementing
support for dynamic ftrace trampolines, which set up a special ftrace
trampoline for each registered ftrace_ops and have individual call sites
branch into those directly.

A secondary advantage is that this gives us a way to add support for
direct ftrace callers without having to resort to using stubs. The
address of the direct call trampoline can be loaded from the ftrace_ops
structure.

To support this, we utilize the space between the existing function
profile sequence and the function entry. During ftrace activation, we
update this location with the associated ftrace_ops pointer. Then, on
ftrace entry, we load from this location and call into
ftrace_ops->func().

For 64-bit powerpc, we also select FUNCTION_ALIGNMENT_8B so that the
ftrace_ops pointer is double word aligned and can be updated atomically.
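
The resulting call-site layout can be pictured as below (illustrative
only; exact offsets and nop counts are defined by this series):

	...function profile sequence...
	.quad	<ftrace_ops *>	  <- patched with patch_ulong() on activation
  func:
	nop			  <- rec->ip; the trampoline loads the ops
				     pointer from a fixed negative offset
				     relative to the LR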

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/Kconfig |  2 +
 arch/powerpc/kernel/asm-offsets.c|  4 ++
 arch/powerpc/kernel/trace/ftrace.c   | 58 
 arch/powerpc/kernel/trace/ftrace_entry.S | 39 +++-
 4 files changed, 91 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 318e5c1b7454..c8ecc9dcc914 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -190,6 +190,7 @@ config PPC
select EDAC_SUPPORT
	select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY if ARCH_USING_PATCHABLE_FUNCTION_ENTRY
	select FUNCTION_ALIGNMENT_4B
+	select FUNCTION_ALIGNMENT_8B	if PPC64 && DYNAMIC_FTRACE_WITH_CALL_OPS
	select GENERIC_ATOMIC64	if PPC32
	select GENERIC_CLOCKEVENTS_BROADCAST	if SMP
select GENERIC_CMOS_UPDATE
@@ -233,6 +234,7 @@ config PPC
select HAVE_DEBUG_STACKOVERFLOW
select HAVE_DYNAMIC_FTRACE
	select HAVE_DYNAMIC_FTRACE_WITH_ARGS	if ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
+	select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS if ARCH_USING_PATCHABLE_FUNCTION_ENTRY
	select HAVE_DYNAMIC_FTRACE_WITH_REGS	if ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
select HAVE_EBPF_JIT
select HAVE_EFFICIENT_UNALIGNED_ACCESS
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 9f14d95b8b32..8b8a39b57a9f 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -676,5 +676,9 @@ int main(void)
DEFINE(BPT_SIZE, BPT_SIZE);
 #endif
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+   OFFSET(FTRACE_OPS_FUNC, ftrace_ops, func);
+#endif
+
return 0;
 }
diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index d3b4949142a8..af84eabf7912 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -124,6 +124,41 @@ static int ftrace_get_call_inst(struct dyn_ftrace *rec, 
unsigned long addr, ppc_
return 0;
 }
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+static const struct ftrace_ops *powerpc_rec_get_ops(struct dyn_ftrace *rec)
+{
+   const struct ftrace_ops *ops = NULL;
+
+   if (rec->flags & FTRACE_FL_CALL_OPS_EN) {
+   ops = ftrace_find_unique_ops(rec);
+   WARN_ON_ONCE(!ops);
+   }
+
+   if (!ops)
+		ops = &ftrace_list_ops;
+
+   return ops;
+}
+
+static int ftrace_rec_set_ops(const struct dyn_ftrace *rec, const struct ftrace_ops *ops)
+{
+	return patch_ulong((void *)(rec->ip - sizeof(unsigned long)), (unsigned long)ops);
+}
+
+static int ftrace_rec_set_nop_ops(struct dyn_ftrace *rec)
+{
+	return ftrace_rec_set_ops(rec, &ftrace_nop_ops);
+}
+
+static int ftrace_rec_update_ops(struct dyn_ftrace *rec)
+{
+   return ftrace_rec_set_ops(rec, powerpc_rec_get_ops(rec));
+}
+#else
+static int ftrace_rec_set_nop_ops(struct dyn_ftrace *rec) { return 0; }
+static int ftrace_rec_update_ops(struct dyn_ftrace *rec) { return 0; }
+#endif
+
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
 int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr, 
unsigned long addr)
 {
@@ -159,6 +194,10 @@ int 

[RFC PATCH 6/9] powerpc/ftrace: Update and move function profile instructions out-of-line

2023-12-08 Thread Naveen N Rao
The function profile sequence on powerpc includes two instructions at the
beginning of each function:

	mflr	r0
	bl	ftrace_caller

The call to ftrace_caller() gets nop'ed out during kernel boot and is
patched in when ftrace is enabled.

There are two issues with this:
1. The 'mflr r0' instruction at the beginning of each function remains
   even though ftrace is not being used.
2. When ftrace is activated, we return from ftrace_caller() with a bctr
   instruction to preserve r0 and LR, resulting in the link stack
   becoming unbalanced.

To address (1), we have tried to nop out the 'mflr r0' instruction when
nop'ing out the call to ftrace_caller() and to restore it when enabling
ftrace. But that required additional synchronization, slowing down ftrace
activation. It also left an additional nop instruction at the beginning of
each function, which wasn't desirable on 32-bit powerpc.

Instead of that, move the function profile sequence out-of-line leaving
a single nop at function entry. On ftrace activation, the nop is changed
to an unconditional branch to the out-of-line sequence that in turn
calls ftrace_caller(). This removes the need for complex synchronization
during ftrace activation and simplifies the code. More importantly, this
improves performance of the kernel when ftrace is not in use.

To address (2), change the ftrace trampoline to return with a 'blr'
instruction with the original return address in r0 intact. Then, an
additional 'mtlr r0' instruction in the function profile sequence can
move the correct return address back to LR.

With the above two changes, the function profile sequence now looks like
the following:

 [func:				# GEP -- 64-bit powerpc, optional
	addis	r2,r12,imm1
	addi	r2,r2,imm2]
  tramp:
	mflr	r0
	bl	ftrace_caller
	mtlr	r0
	b	func
	nop
	[nop]			# 64-bit powerpc only
  func:				# LEP
	nop

On 32-bit powerpc, the ftrace mcount trampoline is now completely
outside the function. This is also the case on 64-bit powerpc for
functions that do not need a GEP. However, for functions that need a
GEP, the additional instructions are inserted between the GEP and the
LEP. Since we can only have a fixed number of instructions between GEP
and LEP, we choose to emit 6 instructions. Four of those instructions
are used for the function profile sequence and two instruction slots are
reserved for implementing support for DYNAMIC_FTRACE_WITH_CALL_OPS. On
32-bit powerpc, we emit one additional nop for this purpose resulting in
a total of 5 nops before function entry.

To enable ftrace, the nop at function entry is changed to an
unconditional branch to 'tramp'. The call to ftrace_caller() may be
updated to ftrace_regs_caller() depending on the registered ftrace ops.
On 64-bit powerpc, we additionally change the instruction at 'tramp' to
'mflr r0' from an unconditional branch back to func+4. This is so that
functions entered through the GEP can skip the function profile sequence
unless ftrace is enabled.
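
With ftrace enabled, the sequence above becomes (illustrative; when
disabled, 'tramp' starts with 'b func+4' on 64-bit powerpc and 'func'
starts with a nop):

  tramp:
	mflr	r0
	bl	ftrace_caller	# or ftrace_regs_caller
	mtlr	r0
	b	func
  func:				# LEP
	b	tramp		# patched from 'nop'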

With the context_switch microbenchmark on a P9 machine, there is a
performance improvement of ~6% with this patch applied, going from 650k
context switches to 690k context switches without ftrace enabled. With
ftrace enabled, the performance was similar at 86k context switches.

The downside of this approach is the increase in vmlinux size,
especially on 32-bit powerpc. We now emit 3 additional instructions for
each function (excluding the one or two instructions for supporting
DYNAMIC_FTRACE_WITH_CALL_OPS). On 64-bit powerpc with the current
implementation of -fpatchable-function-entry though, this is not
avoidable since we are forced to emit 6 instructions between the GEP and
the LEP even if we are to only support DYNAMIC_FTRACE_WITH_CALL_OPS.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/Makefile|   6 +-
 arch/powerpc/include/asm/code-patching.h |  15 ++-
 arch/powerpc/include/asm/ftrace.h|  18 ++-
 arch/powerpc/kernel/kprobes.c|  51 +++-
 arch/powerpc/kernel/trace/ftrace.c   | 149 ++-
 arch/powerpc/kernel/trace/ftrace_entry.S |  54 ++--
 6 files changed, 246 insertions(+), 47 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index f19dbaa1d541..91ef34be8eb9 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -145,7 +145,11 @@ CFLAGS-$(CONFIG_PPC32) += $(call 
cc-option,-mno-readonly-in-sdata)
 ifdef CONFIG_FUNCTION_TRACER
 ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
 KBUILD_CPPFLAGS+= -DCC_USING_PATCHABLE_FUNCTION_ENTRY
-CC_FLAGS_FTRACE := -fpatchable-function-entry=2
+ifdef CONFIG_PPC32
+CC_FLAGS_FTRACE := -fpatchable-function-entry=6,5
+else
+CC_FLAGS_FTRACE := -fpatchable-function-entry=7,6
+endif
 else
 CC_FLAGS_FTRACE := -pg
 ifdef CONFIG_MPROFILE_KERNEL
diff --git a/arch/powerpc/include/asm/code-patching.h 
b/arch/powerpc/include/asm/code-patching.h

[RFC PATCH 5/9] powerpc/kprobes: Use ftrace to determine if a probe is at function entry

2023-12-08 Thread Naveen N Rao
Rather than hard-coding the offset into a function that is used to
determine if a kprobe is at function entry, use ftrace_location() to
determine the ftrace location within the function and categorize all
instructions up to that offset as function entry.

For functions that cannot be traced, we fall back to using a fixed
offset of 8 (two instructions) to categorize a probe as being at
function entry for 64-bit elfv2.
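
As a worked example (numbers are hypothetical): if a function starts at
'addr' and ftrace_location(addr) returns addr + 16, then any probe offset
up to 16 is treated as being at function entry:

	unsigned long ip = ftrace_location(addr);	/* e.g. addr + 16 */

	if (ip)
		on_entry = offset <= (ip - addr);	/* offsets 0..16 qualify */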

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/kprobes.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index b20ee72e873a..42665dfab59e 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -105,24 +105,22 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, 
unsigned int offset)
return addr;
 }
 
-static bool arch_kprobe_on_func_entry(unsigned long offset)
+static bool arch_kprobe_on_func_entry(unsigned long addr, unsigned long offset)
 {
-#ifdef CONFIG_PPC64_ELF_ABI_V2
-#ifdef CONFIG_KPROBES_ON_FTRACE
-   return offset <= 16;
-#else
-   return offset <= 8;
-#endif
-#else
+   unsigned long ip = ftrace_location(addr);
+
+   if (ip)
+   return offset <= (ip - addr);
+   if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2))
+   return offset <= 8;
return !offset;
-#endif
 }
 
 /* XXX try and fold the magic of kprobe_lookup_name() in this */
 kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long 
offset,
 bool *on_func_entry)
 {
-   *on_func_entry = arch_kprobe_on_func_entry(offset);
+   *on_func_entry = arch_kprobe_on_func_entry(addr, offset);
return (kprobe_opcode_t *)(addr + offset);
 }
 
-- 
2.43.0



[RFC PATCH 4/9] powerpc/Kconfig: Select FUNCTION_ALIGNMENT_4B

2023-12-08 Thread Naveen N Rao
From: Sathvika Vasireddy 

Commit d49a0626216b95 ("arch: Introduce CONFIG_FUNCTION_ALIGNMENT")
introduced a generic function-alignment infrastructure. Move to using
FUNCTION_ALIGNMENT_4B on powerpc, to use the same alignment as that of
the existing _GLOBAL macro.

Signed-off-by: Sathvika Vasireddy 
---
 arch/powerpc/Kconfig   | 1 +
 arch/powerpc/include/asm/linkage.h | 3 ---
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 6f105ee4f3cf..318e5c1b7454 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -189,6 +189,7 @@ config PPC
select EDAC_ATOMIC_SCRUB
select EDAC_SUPPORT
select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY if 
ARCH_USING_PATCHABLE_FUNCTION_ENTRY
+   select FUNCTION_ALIGNMENT_4B
select GENERIC_ATOMIC64 if PPC32
	select GENERIC_CLOCKEVENTS_BROADCAST	if SMP
select GENERIC_CMOS_UPDATE
diff --git a/arch/powerpc/include/asm/linkage.h 
b/arch/powerpc/include/asm/linkage.h
index b88d1d2cf304..b71b9582e754 100644
--- a/arch/powerpc/include/asm/linkage.h
+++ b/arch/powerpc/include/asm/linkage.h
@@ -4,9 +4,6 @@
 
 #include 
 
-#define __ALIGN		.align 2
-#define __ALIGN_STR	".align 2"
-
 #ifdef CONFIG_PPC64_ELF_ABI_V1
 #define cond_syscall(x) \
asm ("\t.weak " #x "\n\t.set " #x ", sys_ni_syscall\n"  \
-- 
2.43.0



[RFC PATCH 3/9] powerpc/ftrace: Remove nops after the call to ftrace_stub

2023-12-08 Thread Naveen N Rao
ftrace_stub is within the same compilation unit, so there is no need for a
subsequent nop instruction (the slot a linker would otherwise use to
restore the TOC after a cross-module call).

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/trace/ftrace_entry.S | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S 
b/arch/powerpc/kernel/trace/ftrace_entry.S
index 17d1ed3d0b40..244a1c7bb1e8 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -162,7 +162,6 @@ _GLOBAL(ftrace_regs_caller)
 .globl ftrace_regs_call
 ftrace_regs_call:
bl  ftrace_stub
-   nop
ftrace_regs_exit 1
 
 _GLOBAL(ftrace_caller)
@@ -171,7 +170,6 @@ _GLOBAL(ftrace_caller)
 .globl ftrace_call
 ftrace_call:
bl  ftrace_stub
-   nop
ftrace_regs_exit 0
 
 _GLOBAL(ftrace_stub)
-- 
2.43.0



[RFC PATCH 2/9] powerpc/ftrace: Unify 32-bit and 64-bit ftrace entry code

2023-12-08 Thread Naveen N Rao
On 32-bit powerpc, gcc generates a three instruction sequence for
function profiling:
	mflr	r0
	stw	r0, 4(r1)
	bl	_mcount

On kernel boot, the call to _mcount() is nop-ed out, to be patched back
in when ftrace is actually enabled. The 'stw' instruction therefore is
not necessary unless ftrace is enabled. Nop it out during ftrace init.

When ftrace is enabled, we want the 'stw' so that stack unwinding works
properly. Perform the same within the ftrace handler, similar to 64-bit
powerpc.

For 64-bit powerpc, early versions of gcc used to emit a three
instruction sequence for function profiling (with -mprofile-kernel) with
a 'std' instruction to mimic the 'stw' above. Address that scenario also
by nop-ing out the 'std' instruction during ftrace init.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/kernel/trace/ftrace.c   | 6 --
 arch/powerpc/kernel/trace/ftrace_entry.S | 4 ++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index 82010629cf88..2956196c98ff 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -229,13 +229,15 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace 
*rec)
/* Expected sequence: 'mflr r0', 'stw r0,4(r1)', 'bl _mcount' */
ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
if (!ret)
-		ret = ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_STW(_R0, _R1, 4)));
+		ret = ftrace_modify_code(ip - 4, ppc_inst(PPC_RAW_STW(_R0, _R1, 4)),
+					 ppc_inst(PPC_RAW_NOP()));
} else if (IS_ENABLED(CONFIG_MPROFILE_KERNEL)) {
		/* Expected sequence: 'mflr r0', ['std r0,16(r1)'], 'bl _mcount' */
		ret = ftrace_read_inst(ip - 4, &old);
		if (!ret && !ppc_inst_equal(old, ppc_inst(PPC_RAW_MFLR(_R0)))) {
			ret = ftrace_validate_inst(ip - 8, ppc_inst(PPC_RAW_MFLR(_R0)));
-			ret |= ftrace_validate_inst(ip - 4, ppc_inst(PPC_RAW_STD(_R0, _R1, 16)));
+			ret |= ftrace_modify_code(ip - 4, ppc_inst(PPC_RAW_STD(_R0, _R1, 16)),
+						  ppc_inst(PPC_RAW_NOP()));
}
} else {
return -EINVAL;
diff --git a/arch/powerpc/kernel/trace/ftrace_entry.S 
b/arch/powerpc/kernel/trace/ftrace_entry.S
index 40677416d7b2..17d1ed3d0b40 100644
--- a/arch/powerpc/kernel/trace/ftrace_entry.S
+++ b/arch/powerpc/kernel/trace/ftrace_entry.S
@@ -33,6 +33,8 @@
  * and then arrange for the ftrace function to be called.
  */
 .macro ftrace_regs_entry allregs
+   /* Save the original return address in A's stack frame */
+   PPC_STL r0, LRSAVE(r1)
/* Create a minimal stack frame for representing B */
	PPC_STLU	r1, -STACK_FRAME_MIN_SIZE(r1)
 
@@ -44,8 +46,6 @@
SAVE_GPRS(3, 10, r1)
 
 #ifdef CONFIG_PPC64
-   /* Save the original return address in A's stack frame */
-   std r0, LRSAVE+SWITCH_FRAME_SIZE+STACK_FRAME_MIN_SIZE(r1)
/* Ok to continue? */
lbz r3, PACA_FTRACE_ENABLED(r13)
cmpdi   r3, 0
-- 
2.43.0



[RFC PATCH 0/9] powerpc: ftrace updates

2023-12-08 Thread Naveen N Rao
Early RFC.

This series attempts to address a couple of issues with the existing
support for ftrace on powerpc, with a view towards improving performance
when ftrace is not enabled. See patch 6 for more details.

Patches 7 and 8 implement support for ftrace direct calls, through 
adding support for DYNAMIC_FTRACE_WITH_CALL_OPS.

The first 5 patches are minor cleanups and updates, and can go in 
separately.

This series depends on Benjamin Gray's series adding support for 
patch_ulong().

I have lightly tested this patch set and it looks to be working well. As 
described in patch 6, context_switch microbenchmark shows an improvement 
of ~6% with this series with ftrace disabled. Performance when ftrace is
enabled reduces due to how DYNAMIC_FTRACE_WITH_CALL_OPS works, and due 
to support for direct calls. Some of that can hopefully be improved, if 
this approach is otherwise ok.

- Naveen



Naveen N Rao (8):
  powerpc/ftrace: Fix indentation in ftrace.h
  powerpc/ftrace: Unify 32-bit and 64-bit ftrace entry code
  powerpc/ftrace: Remove nops after the call to ftrace_stub
  powerpc/kprobes: Use ftrace to determine if a probe is at function
entry
  powerpc/ftrace: Update and move function profile instructions
out-of-line
  powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS
  powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS
  samples/ftrace: Add support for ftrace direct samples on powerpc

Sathvika Vasireddy (1):
  powerpc/Kconfig: Select FUNCTION_ALIGNMENT_4B

 arch/powerpc/Kconfig|   6 +
 arch/powerpc/Makefile   |   6 +-
 arch/powerpc/include/asm/code-patching.h|  15 +-
 arch/powerpc/include/asm/ftrace.h   |  35 ++-
 arch/powerpc/include/asm/linkage.h  |   3 -
 arch/powerpc/kernel/asm-offsets.c   |   7 +
 arch/powerpc/kernel/kprobes.c   |  69 +-
 arch/powerpc/kernel/trace/ftrace.c  | 231 
 arch/powerpc/kernel/trace/ftrace_entry.S| 182 +++
 samples/ftrace/ftrace-direct-modify.c   |  94 +++-
 samples/ftrace/ftrace-direct-multi-modify.c | 110 +-
 samples/ftrace/ftrace-direct-multi.c|  64 +-
 samples/ftrace/ftrace-direct-too.c  |  72 +-
 samples/ftrace/ftrace-direct.c  |  61 +-
 14 files changed, 845 insertions(+), 110 deletions(-)


base-commit: 9a15ae60f2c9707433b01e55815cd9142be102b2
prerequisite-patch-id: 38d3e705bf2e27cfa5e3ba369a6ded84ba6615c2
prerequisite-patch-id: 609d292e054b2396b603890522a940fa0bdfb6d8
prerequisite-patch-id: 6f7213fb77b1260defbf43be0e47bff9c80054cc
prerequisite-patch-id: f2328625ae2193c3c8e336b154b62030940cece8
-- 
2.43.0



Re: [PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors

2023-12-08 Thread Jonathan Cameron
On Wed,  6 Dec 2023 16:42:29 -0600
Bjorn Helgaas  wrote:

> From: Bjorn Helgaas 
> 
> The PCIe spec classifies errors as either "Correctable" or "Uncorrectable".
> Previously we printed these as "Corrected" or "Uncorrected".  To avoid
> confusion, use the same terms as the spec.
> 
> One confusing situation is when one agent detects an error, but another
> agent is responsible for recovery, e.g., by re-attempting the operation.
> The first agent may log a "correctable" error but it has not yet been
> corrected.  The recovery agent must report an uncorrectable error if it is
> unable to recover.  If we print the first agent's error as "Corrected", it
> gives the false impression that it has already been resolved.
> 
> Sample message change:
> 
>   - pcieport :00:1c.5: AER: Corrected error received: :00:1c.5
>   + pcieport :00:1c.5: AER: Correctable error received: :00:1c.5
> 
> Signed-off-by: Bjorn Helgaas 


Re: [PATCH RFC/RFT 3/4] riscv: Stop emitting preventive sfence.vma for new userspace mappings

2023-12-08 Thread Alexandre Ghiti
On Thu, Dec 7, 2023 at 5:37 PM Christophe Leroy
 wrote:
>
> The subject says "riscv:" but it changes core part and several arch.
> Maybe this commit should be split in two commits, one for API changes
> that changes flush_tlb_fix_spurious_fault() to
> flush_tlb_fix_spurious_write_fault() and adds
> flush_tlb_fix_spurious_read_fault() including the change in memory.c,
> then a second patch with the changes to riscv.

You're right, I'll do that, thanks.

>
> On 07/12/2023 at 16:03, Alexandre Ghiti wrote:
> > The preventive sfence.vma were emitted because new mappings must be made
> > visible to the page table walker, whether the uarch caches invalid
> > entries or not.
> >
> > Actually, there is no need to preventively sfence.vma on new mappings for
> > userspace; this should be handled only in the page fault path.
> >
> > This drastically reduces the number of sfence.vma emitted:
> >
> > * Ubuntu boot to login:
> > Before: ~630k sfence.vma
> > After:  ~200k sfence.vma
> >
> > * ltp - mmapstress01
> > Before: ~45k
> > After:  ~6.3k
> >
> > * lmbench - lat_pagefault
> > Before: ~665k
> > After:   832 (!)
> >
> > * lmbench - lat_mmap
> > Before: ~546k
> > After:   718 (!)
> >
> > The only issue with the removal of sfence.vma in update_mmu_cache() is
> > that on uarchs that cache invalid entries, those won't be invalidated
> > until the process takes a fault: so that's an additional fault in those
> > cases.
> >
> > Signed-off-by: Alexandre Ghiti 
> > ---
> >   arch/arm64/include/asm/pgtable.h  |  2 +-
> >   arch/mips/include/asm/pgtable.h   |  6 +--
> >   arch/powerpc/include/asm/book3s/64/tlbflush.h |  8 ++--
> >   arch/riscv/include/asm/pgtable.h  | 43 +++
> >   include/linux/pgtable.h   |  8 +++-
> >   mm/memory.c   | 12 +-
> >   6 files changed, 48 insertions(+), 31 deletions(-)
>
> Did you forget mm/pgtable-generic.c ?

Indeed, I "missed" the occurrence of flush_tlb_fix_spurious_fault()
there, thanks.

>
> >
> > diff --git a/arch/arm64/include/asm/pgtable.h 
> > b/arch/arm64/include/asm/pgtable.h
> > index 7f7d9b1df4e5..728f25f529a5 100644
> > --- a/arch/arm64/include/asm/pgtable.h
> > +++ b/arch/arm64/include/asm/pgtable.h
> > @@ -57,7 +57,7 @@ static inline bool arch_thp_swp_supported(void)
> >* fault on one CPU which has been handled concurrently by another CPU
> >* does not need to perform additional invalidation.
> >*/
> > -#define flush_tlb_fix_spurious_fault(vma, address, ptep) do { } while (0)
> > +#define flush_tlb_fix_spurious_write_fault(vma, address, ptep) do { } 
> > while (0)
>
> Why do you need to make that change? Nothing is explained about it in
> the commit message.

I renamed this macro because in the page fault path,
flush_tlb_fix_spurious_fault() is called only when the fault is a
write fault (see
https://elixir.bootlin.com/linux/latest/source/mm/memory.c#L5016).
I'll check if that fits the occurrence in mm/pgtable-generic.c too.

Thanks again for the review,

Alex
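
For reference, a sketch of how the proposed split could look in
include/linux/pgtable.h (names taken from this discussion; the default
bodies are assumptions, mirroring the existing
flush_tlb_fix_spurious_fault() fallback):

	#ifndef flush_tlb_fix_spurious_write_fault
	#define flush_tlb_fix_spurious_write_fault(vma, address, ptep) \
		flush_tlb_page(vma, address)
	#endif

	#ifndef flush_tlb_fix_spurious_read_fault
	#define flush_tlb_fix_spurious_read_fault(vma, address, ptep) \
		do { } while (0)
	#endif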

>
> >
> >   /*
> >* ZERO_PAGE is a global shared page that is always zero: used
> > diff --git a/arch/mips/include/asm/pgtable.h 
> > b/arch/mips/include/asm/pgtable.h
> > index 430b208c0130..84439fe6ed29 100644
> > --- a/arch/mips/include/asm/pgtable.h
> > +++ b/arch/mips/include/asm/pgtable.h
> > @@ -478,9 +478,9 @@ static inline pgprot_t pgprot_writecombine(pgprot_t 
> > _prot)
> >   return __pgprot(prot);
> >   }
> >
> > -static inline void flush_tlb_fix_spurious_fault(struct vm_area_struct *vma,
> > - unsigned long address,
> > - pte_t *ptep)
> > +static inline void flush_tlb_fix_spurious_write_fault(struct 
> > vm_area_struct *vma,
> > +   unsigned long address,
> > +   pte_t *ptep)
> >   {
> >   }
> >
> > diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h 
> > b/arch/powerpc/include/asm/book3s/64/tlbflush.h
> > index 1950c1b825b4..7166d56f90db 100644
> > --- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
> > +++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
> > @@ -128,10 +128,10 @@ static inline void flush_tlb_page(struct 
> > vm_area_struct *vma,
> >   #define flush_tlb_page(vma, addr)   local_flush_tlb_page(vma, addr)
> >   #endif /* CONFIG_SMP */
> >
> > -#define flush_tlb_fix_spurious_fault flush_tlb_fix_spurious_fault
> > -static inline void flush_tlb_fix_spurious_fault(struct vm_area_struct *vma,
> > - unsigned long address,
> > - pte_t *ptep)
> > +#define flush_tlb_fix_spurious_write_fault 
> > flush_tlb_fix_spurious_write_fault
> > +static inline void flush_tlb_fix_spurious_write_fault(struct 
> > vm_area_struct *vma,
> > +   unsigned long 

Re: [PATCH 3/3] PCI/AER: Use explicit register sizes for struct members

2023-12-08 Thread Jonathan Cameron
On Wed,  6 Dec 2023 16:42:31 -0600
Bjorn Helgaas  wrote:

> From: Bjorn Helgaas 
> 
> aer_irq() reads the AER Root Error Status and Error Source Identification
> (PCI_ERR_ROOT_STATUS and PCI_ERR_ROOT_ERR_SRC) registers directly into
> struct aer_err_source.  Both registers are 32 bits, so declare the members
> explicitly as "u32" instead of "unsigned int".
> 
> Similarly, aer_get_device_error_info() reads the AER Header Log
> (PCI_ERR_HEADER_LOG) registers, which are also 32 bits, into struct
> aer_header_log_regs.  Declare those members as "u32" as well.
> 
> No functional changes intended.
> 
> Signed-off-by: Bjorn Helgaas 

Another sensible cleanup. FWIW on such simple patches
Reviewed-by: Jonathan Cameron 
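
For illustration, the shape of the change (member names assumed from the
commit message rather than copied from the patch):

	struct aer_err_source {
	-	unsigned int status;	/* PCI_ERR_ROOT_STATUS, 32-bit register  */
	-	unsigned int id;	/* PCI_ERR_ROOT_ERR_SRC, 32-bit register */
	+	u32 status;
	+	u32 id;
	};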


Re: [PATCH 2/3] PCI/AER: Decode Requester ID when no error info found

2023-12-08 Thread Jonathan Cameron
On Wed,  6 Dec 2023 16:42:30 -0600
Bjorn Helgaas  wrote:

> From: Bjorn Helgaas 
> 
> When a device with AER detects an error, it logs error information in its
> own AER Error Status registers.  It may send an Error Message to the Root
> Port (RCEC in the case of an RCiEP), which logs the fact that an Error
> Message was received (Root Error Status) and the Requester ID of the
> message source (Error Source Identification).
> 
> aer_print_port_info() prints the Requester ID from the Root Port Error
> Source in the usual Linux "bb:dd.f" format, but when find_source_device()
> finds no error details in the hierarchy below the Root Port, it printed the
> raw Requester ID without decoding it.
> 
> Decode the Requester ID in the usual Linux format so it matches other
> messages.
> 
> Sample message changes:
> 
>   - pcieport :00:1c.5: AER: Correctable error received: :00:1c.5
>   - pcieport :00:1c.5: AER: can't find device of ID00e5
>   + pcieport :00:1c.5: AER: Correctable error message received from 
> :00:1c.5
>   + pcieport :00:1c.5: AER: found no error details for :00:1c.5
> 
> Signed-off-by: Bjorn Helgaas 
LGTM
Reviewed-by: Jonathan Cameron 



Re: [PATCH RFC/RFT 2/4] riscv: Add a runtime detection of invalid TLB entries caching

2023-12-08 Thread Alexandre Ghiti
On Thu, Dec 7, 2023 at 4:55 PM Christophe Leroy
 wrote:
>
>
>
> On 07/12/2023 at 16:03, Alexandre Ghiti wrote:
> > This mechanism makes it possible to completely bypass the sfence.vma
> > introduced by the previous commit for uarchs that do not cache invalid
> > TLB entries.
> >
> > Signed-off-by: Alexandre Ghiti 
> > ---
> >   arch/riscv/mm/init.c | 124 +++
> >   1 file changed, 124 insertions(+)
> >
> > diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> > index 379403de6c6f..2e854613740c 100644
> > --- a/arch/riscv/mm/init.c
> > +++ b/arch/riscv/mm/init.c
> > @@ -56,6 +56,8 @@ bool pgtable_l5_enabled = IS_ENABLED(CONFIG_64BIT) && 
> > !IS_ENABLED(CONFIG_XIP_KER
> >   EXPORT_SYMBOL(pgtable_l4_enabled);
> >   EXPORT_SYMBOL(pgtable_l5_enabled);
> >
> > +bool tlb_caching_invalid_entries;
> > +
> >   phys_addr_t phys_ram_base __ro_after_init;
> >   EXPORT_SYMBOL(phys_ram_base);
> >
> > @@ -750,6 +752,18 @@ static void __init disable_pgtable_l4(void)
> >   satp_mode = SATP_MODE_39;
> >   }
> >
> > +static void __init enable_pgtable_l5(void)
> > +{
> > + pgtable_l5_enabled = true;
> > + satp_mode = SATP_MODE_57;
> > +}
> > +
> > +static void __init enable_pgtable_l4(void)
> > +{
> > + pgtable_l4_enabled = true;
> > + satp_mode = SATP_MODE_48;
> > +}
> > +
> >   static int __init print_no4lvl(char *p)
> >   {
> >   pr_info("Disabled 4-level and 5-level paging");
> > @@ -826,6 +840,112 @@ static __init void set_satp_mode(uintptr_t dtb_pa)
> >   memset(early_pud, 0, PAGE_SIZE);
> >   memset(early_pmd, 0, PAGE_SIZE);
> >   }
> > +
> > +/* Determine at runtime if the uarch caches invalid TLB entries */
> > +static __init void set_tlb_caching_invalid_entries(void)
> > +{
> > +#define NR_RETRIES_CACHING_INVALID_ENTRIES   50
>
> Looks odd to have macros nested in the middle of a function.
>
> > + uintptr_t set_tlb_caching_invalid_entries_pmd = ((unsigned 
> > long)set_tlb_caching_invalid_entries) & PMD_MASK;
> > + // TODO the test_addr as defined below could go into another pud...
> > + uintptr_t test_addr = set_tlb_caching_invalid_entries_pmd + 2 * 
> > PMD_SIZE;
> > + pmd_t valid_pmd;
> > + u64 satp;
> > + int i = 0;
> > +
> > + /* To ease the page table creation */
> > + disable_pgtable_l5();
> > + disable_pgtable_l4();
> > +
> > + /* Establish a mapping for set_tlb_caching_invalid_entries() in sv39 
> > */
> > + create_pgd_mapping(early_pg_dir,
> > +set_tlb_caching_invalid_entries_pmd,
> > +(uintptr_t)early_pmd,
> > +PGDIR_SIZE, PAGE_TABLE);
> > +
> > + /* Handle the case where set_tlb_caching_invalid_entries straddles 2 
> > PMDs */
> > + create_pmd_mapping(early_pmd,
> > +set_tlb_caching_invalid_entries_pmd,
> > +set_tlb_caching_invalid_entries_pmd,
> > +PMD_SIZE, PAGE_KERNEL_EXEC);
> > + create_pmd_mapping(early_pmd,
> > +set_tlb_caching_invalid_entries_pmd + PMD_SIZE,
> > +set_tlb_caching_invalid_entries_pmd + PMD_SIZE,
> > +PMD_SIZE, PAGE_KERNEL_EXEC);
> > +
> > + /* Establish an invalid mapping */
> > + create_pmd_mapping(early_pmd, test_addr, 0, PMD_SIZE, __pgprot(0));
> > +
> > + /* Precompute the valid pmd here because the mapping for pfn_pmd() 
> > won't exist */
> > + valid_pmd = pfn_pmd(PFN_DOWN(set_tlb_caching_invalid_entries_pmd), 
> > PAGE_KERNEL);
> > +
> > + local_flush_tlb_all();
> > +	satp = PFN_DOWN((uintptr_t)early_pg_dir) | SATP_MODE_39;
> > + csr_write(CSR_SATP, satp);
> > +
> > + /*
> > +  * Set stvec to after the trapping access, access this invalid mapping
> > +  * and legitimately trap
> > +  */
> > + // TODO: Should I save the previous stvec?
> > +#define ASM_STR(x)   __ASM_STR(x)
>
> Looks odd to have macros nested in the middle of a function.
>
>
> > + asm volatile(
> > + "la a0, 1f  \n"
> > + "csrw " ASM_STR(CSR_TVEC) ", a0 \n"
> > + "ld a0, 0(%0)   \n"
> > + ".align 2   \n"
> > + "1: \n"
> > + :
> > + : "r" (test_addr)
> > + : "a0"
> > + );
> > +
> > + /* Now establish a valid mapping to check if the invalid one is 
> > cached */
> > + early_pmd[pmd_index(test_addr)] = valid_pmd;
> > +
> > + /*
> > +  * Access the valid mapping multiple times: indeed, we can't use
> > +  * sfence.vma as a barrier to make sure the cpu did not reorder 
> > accesses
> > +  * so we may trap even if the uarch does not cache invalid entries. By
> > +  * trying a few times, we make sure that those uarchs will see the 
> > right
> > +  * mapping at some point.
> > +  */
> > 

Re: [PATCH RFC/RFT 1/4] riscv: Stop emitting preventive sfence.vma for new vmalloc mappings

2023-12-08 Thread Alexandre Ghiti
Hi Christophe,

On Thu, Dec 7, 2023 at 4:52 PM Christophe Leroy
 wrote:
>
>
>
> On 07/12/2023 at 16:03, Alexandre Ghiti wrote:
> > In 6.5, we removed the vmalloc fault path because that can't work (see
> > [1] [2]). Then in order to make sure that new page table entries were
> > seen by the page table walker, we had to preventively emit a sfence.vma
> > on all harts [3] but this solution is very costly since it relies on IPI.
> >
> > And even there, we could end up in a loop of vmalloc faults if a vmalloc
> > allocation is done in the IPI path (for example if it is traced, see
> > [4]), which could result in a kernel stack overflow.
> >
> > Those preventive sfence.vma needed to be emitted because:
> >
> > - if the uarch caches invalid entries, the new mapping may not be
> >observed by the page table walker and an invalidation may be needed.
> > - if the uarch does not cache invalid entries, a reordered access
> >could "miss" the new mapping and traps: in that case, we would actually
> >only need to retry the access, no sfence.vma is required.
> >
> > So this patch removes those preventive sfence.vma and actually handles
> > the possible (and unlikely) exceptions. And since the kernel stack
> > mappings lie in the vmalloc area, this handling must be done very early
> > when the trap is taken, at the very beginning of handle_exception: this
> > also rules out the vmalloc allocations in the fault path.
> >
> > Note that for now, we emit a sfence.vma even for uarchs that do not
> > cache invalid entries as we have no means to know that: that will be
> > fixed in the next patch.
> >
> > Link: 
> > https://lore.kernel.org/linux-riscv/20230531093817.665799-1-bj...@kernel.org/
> >  [1]
> > Link: 
> > https://lore.kernel.org/linux-riscv/20230801090927.2018653-1-dy...@andestech.com
> >  [2]
> > Link: 
> > https://lore.kernel.org/linux-riscv/20230725132246.817726-1-alexgh...@rivosinc.com/
> >  [3]
> > Link: https://lore.kernel.org/lkml/20200508144043.13893-1-j...@8bytes.org/ 
> > [4]
> > Signed-off-by: Alexandre Ghiti 
> > ---
> >   arch/riscv/include/asm/cacheflush.h  | 19 +-
> >   arch/riscv/include/asm/thread_info.h |  5 ++
> >   arch/riscv/kernel/asm-offsets.c  |  5 ++
> >   arch/riscv/kernel/entry.S| 94 
> >   arch/riscv/mm/init.c |  2 +
> >   5 files changed, 124 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/riscv/include/asm/cacheflush.h 
> > b/arch/riscv/include/asm/cacheflush.h
> > index 3cb53c4df27c..a916cbc69d47 100644
> > --- a/arch/riscv/include/asm/cacheflush.h
> > +++ b/arch/riscv/include/asm/cacheflush.h
> > @@ -37,7 +37,24 @@ static inline void flush_dcache_page(struct page *page)
> >   flush_icache_mm(vma->vm_mm, 0)
> >
> >   #ifdef CONFIG_64BIT
> > -#define flush_cache_vmap(start, end) flush_tlb_kernel_range(start, end)
> > +extern u64 new_vmalloc[];
>
> Can you include the table size here? It would help GCC's static analysis
> with boundary checking.

Yes, I'll do

>
> > +extern char _end[];
> > +#define flush_cache_vmap flush_cache_vmap
> > +static inline void flush_cache_vmap(unsigned long start, unsigned long end)
> > +{
> > + if ((start < VMALLOC_END && end > VMALLOC_START) ||
> > + (start < MODULES_END && end > MODULES_VADDR)) {
>
> Can you use is_vmalloc_or_module_addr() instead ?

Yes, I'll do

>
>
> > + int i;
> > +
> > + /*
> > +  * We don't care if concurrently a cpu resets this value since
> > +  * the only place this can happen is in handle_exception() 
> > where
> > +  * an sfence.vma is emitted.
> > +  */
> > + for (i = 0; i < NR_CPUS / sizeof(u64) + 1; ++i)
>
> Use ARRAY_SIZE() ?

And that too :)

Thanks for the review,

Alex
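
Putting both suggestions together, the sketch below shows what the cleanup
could look like (assuming new_vmalloc gains an explicit bound so that
ARRAY_SIZE() works):

	extern u64 new_vmalloc[NR_CPUS / sizeof(u64) + 1];

	for (i = 0; i < ARRAY_SIZE(new_vmalloc); ++i)
		new_vmalloc[i] = -1ULL;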

>
> > + new_vmalloc[i] = -1ULL;
> > + }
> > +}
> >   #endif
> >
> >   #ifndef CONFIG_SMP
> > diff --git a/arch/riscv/include/asm/thread_info.h 
> > b/arch/riscv/include/asm/thread_info.h
> > index 1833beb00489..8fe12fa6c329 100644
> > --- a/arch/riscv/include/asm/thread_info.h
> > +++ b/arch/riscv/include/asm/thread_info.h
> > @@ -60,6 +60,11 @@ struct thread_info {
> >   longuser_sp;/* User stack pointer */
> >   int cpu;
> >   unsigned long   syscall_work;   /* SYSCALL_WORK_ flags */
> > + /*
> > +  * Used in handle_exception() to save a0, a1 and a2 before knowing if 
> > we
> > +  * can access the kernel stack.
> > +  */
> > + unsigned long   a0, a1, a2;
> >   };
> >
> >   /*
> > diff --git a/arch/riscv/kernel/asm-offsets.c 
> > b/arch/riscv/kernel/asm-offsets.c
> > index d6a75aac1d27..340c1c84560d 100644
> > --- a/arch/riscv/kernel/asm-offsets.c
> > +++ b/arch/riscv/kernel/asm-offsets.c
> > @@ -34,10 +34,15 @@ void asm_offsets(void)
> >   OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
> >   OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
> >  

Re: [PATCH 01/12] KVM: PPC: Book3S HV nestedv2: Invalidate RPT before deleting a guest

2023-12-08 Thread Vaibhav Jain


Hi Aneesh,

Thanks for looking into this patch. My responses inline below:

"Aneesh Kumar K.V (IBM)"  writes:

> Vaibhav Jain  writes:
>
>> From: Jordan Niethe 
>>
>> An L0 must invalidate the L2's RPT during H_GUEST_DELETE if this has not
>> already been done. This is a slow operation that means H_GUEST_DELETE
>> must return H_BUSY multiple times before completing. Invalidate the
>> tables before deleting the guest so there is less work for the L0 to do.
>>
>> Signed-off-by: Jordan Niethe 
>> ---
>>  arch/powerpc/include/asm/kvm_book3s.h | 1 +
>>  arch/powerpc/kvm/book3s_hv.c  | 6 --
>>  arch/powerpc/kvm/book3s_hv_nested.c   | 2 +-
>>  3 files changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
>> b/arch/powerpc/include/asm/kvm_book3s.h
>> index 4f527d09c92b..a37736ed3728 100644
>> --- a/arch/powerpc/include/asm/kvm_book3s.h
>> +++ b/arch/powerpc/include/asm/kvm_book3s.h
>> @@ -302,6 +302,7 @@ void kvmhv_nested_exit(void);
>>  void kvmhv_vm_nested_init(struct kvm *kvm);
>>  long kvmhv_set_partition_table(struct kvm_vcpu *vcpu);
>>  long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu *vcpu);
>> +void kvmhv_flush_lpid(u64 lpid);
>>  void kvmhv_set_ptbl_entry(u64 lpid, u64 dw0, u64 dw1);
>>  void kvmhv_release_all_nested(struct kvm *kvm);
>>  long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu);
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 1ed6ec140701..5543e8490cd9 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -5691,10 +5691,12 @@ static void kvmppc_core_destroy_vm_hv(struct kvm 
>> *kvm)
>>  kvmhv_set_ptbl_entry(kvm->arch.lpid, 0, 0);
>>  }
>>  
>> -if (kvmhv_is_nestedv2())
>> +if (kvmhv_is_nestedv2()) {
>> +kvmhv_flush_lpid(kvm->arch.lpid);
>>  plpar_guest_delete(0, kvm->arch.lpid);
>>
>
> I am not sure I follow the optimization here. I would expect the
> hypervisor to kill all the translation caches as part of guest_delete.
> What is the benefit of doing a lpid flush outside the guest delete?
>
That's right. However, without this optimization the H_GUEST_DELETE hcall
in plpar_guest_delete() returns H_BUSY multiple times, resulting in
multiple hcalls to the hypervisor until it finishes. Flushing the guest
translation cache upfront reduces the number of hcalls the L1 guest has to
make to delete an L2 guest via H_GUEST_DELETE.
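
In pseudo-code, the situation being optimized looks roughly like this (an
assumed sketch of the retry, not the actual plpar_guest_delete()
internals):

	/* Without the upfront kvmhv_flush_lpid(), this loops many times
	 * while the L0 invalidates the L2's RPT. */
	do {
		rc = plpar_guest_delete(0, lpid);
	} while (rc == H_BUSY);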

>> -else
>> +} else {
>>  kvmppc_free_lpid(kvm->arch.lpid);
>> +}
>>  
>>  kvmppc_free_pimap(kvm);
>>  }
>> diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
>> b/arch/powerpc/kvm/book3s_hv_nested.c
>> index 3b658b8696bc..5c375ec1a3c6 100644
>> --- a/arch/powerpc/kvm/book3s_hv_nested.c
>> +++ b/arch/powerpc/kvm/book3s_hv_nested.c
>> @@ -503,7 +503,7 @@ void kvmhv_nested_exit(void)
>>  }
>>  }
>>  
>> -static void kvmhv_flush_lpid(u64 lpid)
>> +void kvmhv_flush_lpid(u64 lpid)
>>  {
>>  long rc;
>>  
>> -- 
>> 2.42.0

-- 
Cheers
~ Vaibhav


[PATCH v6 3/3] Documentation/powerpc: update fadump implementation details

2023-12-08 Thread Sourabh Jain
The patch titled ("powerpc: make fadump resilient with memory add/remove
events") has made significant changes to the implementation of fadump,
particularly on elfcorehdr creation and fadump crash info header
structure. Therefore, updating the fadump implementation documentation
to reflect those changes.

The following updates are made to the firmware-assisted dump documentation:

1. The elfcorehdr is no longer stored after fadump HDR in the reserved
   dump area. Instead, the second kernel dynamically allocates memory
   for the elfcorehdr within the address range from 0 to the boot memory
   size. Therefore, update figures 1 and 2 of Memory Reservation during
   the first and second kernels to reflect this change.

2. A version field has been added to the fadump header to manage future
   changes to the fadump crash info header structure without changing
   the fadump header magic number. Therefore, remove the corresponding
   TODO from the document.

Signed-off-by: Sourabh Jain 
Cc: Aditya Gupta 
Cc: Aneesh Kumar K.V 
Cc: Hari Bathini 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Naveen N Rao 
---
 .../arch/powerpc/firmware-assisted-dump.rst   | 91 +--
 1 file changed, 42 insertions(+), 49 deletions(-)

diff --git a/Documentation/arch/powerpc/firmware-assisted-dump.rst 
b/Documentation/arch/powerpc/firmware-assisted-dump.rst
index e363fc48529a..7e37aadd1f77 100644
--- a/Documentation/arch/powerpc/firmware-assisted-dump.rst
+++ b/Documentation/arch/powerpc/firmware-assisted-dump.rst
@@ -134,12 +134,12 @@ that are run. If there is dump data, then the
 memory is held.
 
 If there is no waiting dump data, then only the memory required to
-hold CPU state, HPTE region, boot memory dump, FADump header and
-elfcore header, is usually reserved at an offset greater than boot
-memory size (see Fig. 1). This area is *not* released: this region
-will be kept permanently reserved, so that it can act as a receptacle
-for a copy of the boot memory content in addition to CPU state and
-HPTE region, in the case a crash does occur.
+hold CPU state, HPTE region, boot memory dump, and FADump header is
+usually reserved at an offset greater than boot memory size (see Fig. 1).
+This area is *not* released: this region will be kept permanently
+reserved, so that it can act as a receptacle for a copy of the boot
+memory content in addition to CPU state and HPTE region, in the case
+a crash does occur.
 
 Since this reserved memory area is used only after the system crash,
 there is no point in blocking this significant chunk of memory from
@@ -153,22 +153,22 @@ that were present in CMA region::
 
   o Memory Reservation during first kernel
 
-  Low memory Top of memory
-  0boot memory size   |<--- Reserved dump area --->|   |
-  |   |   |Permanent Reservation   |   |
-  V   V   ||   V
-  +---+-/ /---+---++---+-+-++--+
-  |   |   |///||  DUMP | HDR | ELF ||  |
-  +---+-/ /---+---++---+-+-++--+
-|   ^^ ^  ^   ^
-|   || |  |   |
-\  CPU  HPTE   /  |   |
- --   |   |
-  Boot memory content gets transferred|   |
-  to reserved area by firmware at the |   |
-  time of crash.  |   |
-  FADump Header   |
-   (meta area)|
+  Low memory  Top of memory
+  0boot memory size   |<-- Reserved dump area ->| |
+  |   |   |  Permanent Reservation  | |
+  V   V   | | V
+  +---+-/ /---+---++---+---++-+
+  |   |   |///||DUMP   |  HDR  || |
+  +---+-/ /---+---++---+---++-+
+|   ^^   ^ ^  ^
+|   ||   | |  |
+\  CPU  HPTE / |  |
+   |  |
+  Boot memory content gets transferred |  |
+  to reserved area by firmware at the  |  |
+  time of crash.   |  |
+   FADump Header  |
+(meta area)   |
   |
   |
   Metadata: This area holds a metadata structure whose
@@ -186,13 +186,20 @@ that were present in 

[PATCH v6 2/3] powerpc/fadump: add hotplug_ready sysfs interface

2023-12-08 Thread Sourabh Jain
The elfcorehdr describes the CPUs and memory of the crashed kernel to
the kernel that captures the dump, known as the second or fadump kernel.
The elfcorehdr needs to be updated if the system's memory changes due to
memory hotplug or online/offline events.

Currently, memory hotplug events are monitored in userspace by udev
rules, and fadump is re-registered, which recreates the elfcorehdr with
the latest available memory in the system.

However, the previous patch ("powerpc: make fadump resilient with memory
add/remove events") moved the creation of elfcorehdr to the second or
fadump kernel. This eliminates the need to regenerate the elfcorehdr
during memory hotplug or online/offline events.

Create a sysfs entry at /sys/kernel/fadump/hotplug_ready to let
userspace know that fadump re-registration is not required for memory
add/remove events.

Signed-off-by: Sourabh Jain 
Cc: Aditya Gupta 
Cc: Aneesh Kumar K.V 
Cc: Hari Bathini 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Naveen N Rao 
---
 Documentation/ABI/testing/sysfs-kernel-fadump | 11 +++
 arch/powerpc/kernel/fadump.c  | 14 ++
 2 files changed, 25 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump 
b/Documentation/ABI/testing/sysfs-kernel-fadump
index 8f7a64a81783..971934b2891e 100644
--- a/Documentation/ABI/testing/sysfs-kernel-fadump
+++ b/Documentation/ABI/testing/sysfs-kernel-fadump
@@ -38,3 +38,14 @@ Contact: linuxppc-dev@lists.ozlabs.org
 Description:   read only
Provide information about the amount of memory reserved by
FADump to save the crash dump in bytes.
+
+What:  /sys/kernel/fadump/hotplug_ready
+Date:  Sep 2023
+Contact:   linuxppc-dev@lists.ozlabs.org
+Description:   read only
+   The kdump udev rule re-registers fadump on memory add/remove events,
+   primarily to update the elfcorehdr. This sysfs node indicates to the
+   kdump udev rule that fadump re-registration is not required on
+   memory add/remove events because the elfcorehdr is now prepared in
+   the second/fadump kernel.
+User:  kexec-tools
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index eb9132538268..a55dd9bf754c 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1455,6 +1455,18 @@ static ssize_t enabled_show(struct kobject *kobj,
return sprintf(buf, "%d\n", fw_dump.fadump_enabled);
 }
 
+/*
+ * /sys/kernel/fadump/hotplug_ready sysfs node returns 1, which indicates
+ * to userspace that fadump re-registration is not required on memory
+ * hotplug events.
+ */
+static ssize_t hotplug_ready_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+   return sprintf(buf, "%d\n", 1);
+}
+
 static ssize_t mem_reserved_show(struct kobject *kobj,
 struct kobj_attribute *attr,
 char *buf)
@@ -1527,11 +1539,13 @@ static struct kobj_attribute release_attr = 
__ATTR_WO(release_mem);
 static struct kobj_attribute enable_attr = __ATTR_RO(enabled);
 static struct kobj_attribute register_attr = __ATTR_RW(registered);
 static struct kobj_attribute mem_reserved_attr = __ATTR_RO(mem_reserved);
+static struct kobj_attribute hotplug_ready_attr = __ATTR_RO(hotplug_ready);
 
 static struct attribute *fadump_attrs[] = {
	&enable_attr.attr,
	&register_attr.attr,
	&mem_reserved_attr.attr,
+	&hotplug_ready_attr.attr,
NULL,
 };
 
-- 
2.41.0
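
For reference, a minimal userspace consumer of the new node could look
like this (illustrative only; the in-tree consumer is the kexec-tools
udev rule):

	#include <stdio.h>

	int main(void)
	{
		FILE *f = fopen("/sys/kernel/fadump/hotplug_ready", "r");
		int ready = 0;

		/* missing node or unreadable value: assume the old behaviour,
		 * i.e. fadump still needs re-registration on hotplug */
		if (f) {
			if (fscanf(f, "%d", &ready) != 1)
				ready = 0;
			fclose(f);
		}
		printf("fadump re-registration %s\n",
		       ready ? "not required" : "required");
		return 0;
	}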



[PATCH v6 1/3] powerpc: make fadump resilient with memory add/remove events

2023-12-08 Thread Sourabh Jain
Due to changes in memory resources caused by either memory hotplug or
online/offline events, the elfcorehdr, which describes the CPUs and
memory of the crashed kernel to the kernel that collects the dump (known
as second/fadump kernel), becomes outdated. Consequently, attempting
dump collection with an outdated elfcorehdr can lead to failed or
inaccurate dump collection.

Memory hotplug and online/offline events are referred to as memory
add/remove events in the rest of the commit message.

The current solution to address the aforementioned issue is as follows:
Monitor memory add/remove events in userspace using udev rules, and
re-register fadump whenever there are changes in memory resources. This
leads to the creation of a new elfcorehdr with updated system memory
information.

There are several notable issues associated with re-registering fadump
for every memory add/remove event.

1. Bulk memory add/remove events with udev-based fadump re-registration
   can lead to race conditions and, more importantly, it creates a wide
   window during which fadump is inactive until all memory add/remove
   events are settled.
2. Re-registering fadump for every memory add/remove event is
   inefficient.
3. The memory for elfcorehdr is allocated based on the memblock regions
   available during early boot and remains fixed thereafter. However, if
   elfcorehdr is later recreated with additional memblock regions, its
   size will increase, potentially leading to memory corruption.

Address the aforementioned challenges by shifting the creation of
elfcorehdr from the first kernel (also referred as the crashed kernel),
where it was created and frequently recreated for every memory
add/remove event, to the fadump kernel. As a result, the elfcorehdr only
needs to be created once, thus eliminating the necessity to re-register
fadump during memory add/remove events.

At present, the first kernel is responsible for preparing the fadump
header and storing it in the fadump reserved area. The fadump header
includes the start address of the elfcorehdr, crashing CPU details, and
other relevant information. In the event of a crash in the first kernel,
the second/fadump boots and accesses the fadump header prepared by the
first kernel. It then performs the following steps in a
platform-specific function [rtas|opal]_fadump_process:

1. Sanity check for the fadump header
2. Update CPU notes in the elfcorehdr
3. Set the global variable elfcorehdr_addr to the address of the
   elfcorehdr referenced by the fadump header, so the vmcore module can
   process it later on (a rough sketch of this flow follows the list).
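
A compressed sketch of that platform-specific processing; the helper
name and field accesses here are assumptions for illustration (the real
implementations are [rtas|opal]_fadump_process()):

	static int fadump_process_sketch(struct fadump_crash_info_header *fdh)
	{
		/* 1. sanity check the header left by the crashed kernel */
		if (fdh->magic_number != FADUMP_CRASH_INFO_MAGIC)
			return -EINVAL;

		/* 2. update the CPU notes in the elfcorehdr (hypothetical
		 * helper name) */
		fadump_update_cpu_notes(fdh);

		/* 3. hand the elfcorehdr over to the vmcore module */
		elfcorehdr_addr = fdh->elfcorehdr_addr;
		return 0;
	}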

Along with the above, update setup_fadump() in fadump.c to create the
elfcorehdr in the second/fadump kernel.

The section below outlines the information required to create the
elfcorehdr and the changes made to make it available to the fadump
kernel where it isn't already.

To create elfcorehdr, the following crashed kernel information is
required: CPU notes, vmcoreinfo, and memory ranges.

At present, the CPU notes are already prepared in the fadump kernel, so
no changes are needed in that regard. The fadump kernel has access to
all crashed kernel memory regions, including boot memory regions that
are relocated by firmware to fadump reserved areas, so no changes for
that either. However, it is necessary to add new members to the fadump
header, i.e., the 'fadump_crash_info_header' structure, in order to pass
the crashed kernel's vmcoreinfo address and its size to the fadump kernel.

In addition to the vmcoreinfo address and size, a few other attributes
are also added to the fadump_crash_info_header structure (see the
sketch after this list).

1. version:
   It stores the fadump header version, which is currently set to 1.
   This provides flexibility to update the fadump crash info header in
   the future without changing the magic number. For each change in the
   fadump header, the version will be increased. This will help the
   updated kernel determine how to handle kernel dumps from older
   kernels. The magic number remains relevant for checking fadump header
   corruption.

2. elfcorehdr_size:
   Since the elfcorehdr is now prepared in the fadump/second kernel and
   is not part of the reserved area, this attribute is needed to track
   the memory allocated for the elfcorehdr so the deallocation can be
   done properly.

3. pt_regs_sz/cpu_mask_sz:
   Store the sizes of the pt_regs and cpu_mask structures in the first
   kernel. These attributes are used to avoid processing the dump if
   the sizes of pt_regs and cpu_mask are not the same across the
   crashed and fadump kernels.
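
A sketch of how the header might look after these additions; the field
names, types, and ordering here are assumptions for illustration, not
the series' authoritative definition:

	struct fadump_crash_info_header {
		u64		magic_number;	  /* unchanged: corruption check */
		u32		version;	  /* new: starts at 1 */
		u32		crashing_cpu;
		u64		vmcoreinfo_raddr; /* new: crashed kernel's vmcoreinfo */
		u64		vmcoreinfo_size;  /* new: ... and its size */
		u64		elfcorehdr_addr;
		u64		elfcorehdr_size;  /* new: for exact deallocation */
		u64		pt_regs_sz;	  /* new: cross-kernel sanity check */
		u64		cpu_mask_sz;	  /* new: cross-kernel sanity check */
		struct pt_regs	regs;
		struct cpumask	cpu_mask;
	};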

Note: if either the first/crashed kernel or the second/fadump kernel
does not have the changes introduced here, the kernel will fail to
collect the dump.

[PATCH v6 0/3] powerpc: make fadump resilient with memory add/remove events

2023-12-08 Thread Sourabh Jain
Problem:

Due to changes in memory resources caused by either memory hotplug or
online/offline events, the elfcorehdr, which describes the CPUs and
memory of the crashed kernel to the kernel that collects the dump (known
as second/fadump kernel), becomes outdated. Consequently, attempting
dump collection with an outdated elfcorehdr can lead to failed or
inaccurate dump collection.

Memory hotplug and online/offline events are referred to as memory
add/remove events in the rest of the patch series.

Existing solution:
==
Monitor memory add/remove events in userspace using udev rules, and
re-register fadump whenever there are changes in memory resources. This
leads to the creation of a new elfcorehdr with updated system memory
information.

Challenges with existing solution:
==
1. Performing bulk memory add/remove with udev-based fadump
   re-registration can lead to race conditions and, more importantly,
   it creates a wide window during which fadump is inactive until all
   memory add/remove events are settled.
2. Re-registering fadump for every memory add/remove event is
   inefficient.
3. Memory for the elfcorehdr is allocated based on the memblock regions
   available during the first kernel's early boot and remains fixed
   thereafter. However, if the elfcorehdr is later recreated with
   additional memblock regions, its size will increase, potentially
   leading to memory corruption.

Proposed solution:
==
Address the aforementioned challenges by shifting the creation of
elfcorehdr from the first kernel (also referred as the crashed kernel),
where it was created and frequently recreated for every memory
add/remove event, to the fadump kernel. As a result, the elfcorehdr only
needs to be created once, thus eliminating the necessity to re-register
fadump during memory add/remove events.

To know more about elfcorehdr creation in the fadump kernel, refer to
the first patch in this series.

The second patch includes a new sysfs interface that tells userspace
that fadump re-registration isn't needed for memory add/remove events.
Note that userspace changes do not need to be in sync with kernel
changes; they can roll out independently.

Since there are significant changes in the fadump implementation, the
third patch updates the fadump documentation to reflect the changes made
in this patch series.

Kernel tree rebased on 6.7.0-rc4 with patch series applied:
=
https://github.com/sourabhjains/linux/tree/fadump-mem-hotplug-v6

Userspace changes:
==
To realize this feature, one must update the kdump udev rules to prevent
fadump re-registration during memory add/remove events.

On RHEL, apply the following changes to the file
/usr/lib/udev/rules.d/98-kexec.rules:

-run+="/bin/sh -c '/usr/bin/systemctl is-active kdump.service || exit 0; 
/usr/bin/systemd-run --quiet --no-block /usr/lib/udev/kdump-udev-throttler'"
+# don't re-register fadump if the value of the node
+# /sys/kernel/fadump/hotplug_ready is 1.
+
+run+="/bin/sh -c '/usr/bin/systemctl is-active kdump.service || exit 0; ! test 
-f /sys/kernel/fadump_enabled || cat /sys/kernel/fadump_enabled | grep 0  || ! 
test -f /sys/kernel/fadump/hotplug_ready || cat 
/sys/kernel/fadump/hotplug_ready | grep 0 || exit 0; /usr/bin/systemd-run 
--quiet --no-block /usr/lib/udev/kdump-udev-throttler'"

Changelog:
==
v6: 8 Dec 2023
  - Add size fields for `pt_regs` and `cpumask` in the fadump header
structure
  - Don't process the dump if the sizes of `pt_regs` and `cpu_mask` are
not the same in the crashed and fadump kernels
  - Include an additional check for endianness mismatch when the magic
number doesn't match, to print the relevant error message
  - Don't process the dump if the fadump header contains an old magic number
  - Rebased it to 6.7.0-rc4

v5: 29 Oct 2023 
  https://lore.kernel.org/all/20231029124548.12198-1-sourabhj...@linux.ibm.com/
  - Fix a comment on the first patch

v4: 21 Oct 2023
  https://lore.kernel.org/all/20231021181733.204311-1-sourabhj...@linux.ibm.com/
  - Fix a build warning about type casting

v3: 9 Oct 2023
  https://lore.kernel.org/all/20231009041953.36139-1-sourabhj...@linux.ibm.com/
  - Assign physical address of elfcorehdr to fdh->elfcorehdr_addr
  - Rename a variable, boot_mem_dest_addr -> boot_mem_dest_offset

v2: 25 Sep 2023
  https://lore.kernel.org/all/20230925051214.678957-1-sourabhj...@linux.ibm.com/
  - Fixed a few indentation issues reported by the checkpatch script.
  - Rebased it to 6.6.0-rc3

v1: 17 Sep 2023
  https://lore.kernel.org/all/20230917080225.561627-1-sourabhj...@linux.ibm.com/

Cc: Aditya Gupta 
Cc: Aneesh Kumar K.V 
Cc: Hari Bathini 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Naveen N Rao 

Sourabh Jain (3):
  powerpc: make fadump resilient with memory add/remove events
  powerpc/fadump: add hotplug_ready sysfs interface
  Documentation/powerpc: update fadump 

Re: [PATCH v2] i2c: cpm: Remove linux,i2c-index conversion from be32

2023-12-08 Thread Jochen Friedrich

Acked-by: Jochen Friedrich 

On 06.12.2023 at 23:24, Christophe Leroy wrote:

sparse reports an error on some data that gets converted from be32.

That's because that data is typed u32 instead of __be32.

The type is correct, the be32_to_cpu() conversion is not.

Remove the conversion.

Reported-by: kernel test robot 
Closes: 
https://lore.kernel.org/oe-kbuild-all/202312042210.ql4da8av-...@intel.com/
Signed-off-by: Christophe Leroy 
---
v2: Use u32 directly, remove be32_to_cpu().
---
  drivers/i2c/busses/i2c-cpm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/i2c/busses/i2c-cpm.c b/drivers/i2c/busses/i2c-cpm.c
index 9a664abf734d..771d60bc8d71 100644
--- a/drivers/i2c/busses/i2c-cpm.c
+++ b/drivers/i2c/busses/i2c-cpm.c
@@ -658,7 +658,7 @@ static int cpm_i2c_probe(struct platform_device *ofdev)
/* register new adapter to i2c module... */
  
  	data = of_get_property(ofdev->dev.of_node, "linux,i2c-index", &len);

-   cpm->adap.nr = (data && len == 4) ? be32_to_cpup(data) : -1;
+   cpm->adap.nr = (data && len == 4) ? *data : -1;
	result = i2c_add_numbered_adapter(&cpm->adap);
  
  	if (result < 0)
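
For context, a minimal illustration of the type mismatch sparse reports
(not from the driver; both helpers are hypothetical):

	#include <linux/types.h>
	#include <asm/byteorder.h>

	static u32 index_ok(const u32 *data)
	{
		return *data;	/* cpu-endian u32: plain dereference */
	}

	static u32 index_bad(const u32 *data)
	{
		/* be32_to_cpup() takes a const __be32 *, so sparse warns:
		 * incorrect type in argument 1 (different base types) */
		return be32_to_cpup(data);
	}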


[PATCH] powerpc/powernv: Add a null pointer check to scom_debug_init_one

2023-12-08 Thread Kunwu Chan
kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure.
Add a null pointer check, and release 'ent' to avoid memory leaks.

Fixes: bfd2f0d49aef ("powerpc/powernv: Get rid of old scom_controller 
abstraction")
Cc: Kunwu Chan 
Signed-off-by: Kunwu Chan 
---
 arch/powerpc/platforms/powernv/opal-xscom.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/opal-xscom.c 
b/arch/powerpc/platforms/powernv/opal-xscom.c
index 262cd6fac907..748c2b97fa53 100644
--- a/arch/powerpc/platforms/powernv/opal-xscom.c
+++ b/arch/powerpc/platforms/powernv/opal-xscom.c
@@ -165,6 +165,11 @@ static int scom_debug_init_one(struct dentry *root, struct 
device_node *dn,
ent->chip = chip;
snprintf(ent->name, 16, "%08x", chip);
ent->path.data = (void *)kasprintf(GFP_KERNEL, "%pOF", dn);
+   if (!ent->path.data) {
+   kfree(ent);
+   return -ENOMEM;
+   }
+
ent->path.size = strlen((char *)ent->path.data);
 
dir = debugfs_create_dir(ent->name, root);
-- 
2.39.2



Re: [PATCH 09/12] KVM: PPC: Book3S HV nestedv2: Do not call H_COPY_TOFROM_GUEST

2023-12-08 Thread IBM
Vaibhav Jain  writes:

> From: Jordan Niethe 
>
> H_COPY_TOFROM_GUEST is part of the nestedv1 API and so should not be
> called by a nestedv2 host. Do not attempt to call it.
>

Maybe we should use
firmware_has_feature(FW_FEATURE_H_COPY_TOFROM_GUEST)?

The nestedv2 host can end up using the above hcall if it is supported by
the hypervisor, right? In its absence we will have to translate the
guest ea using xlate and then use kvm_guest_read to read the location
using the guest real address, right? That xlate will also involve
multiple kvm_guest_read calls.
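
A rough sketch of that fallback path (flow only; kvmppc_mmu_radix_xlate()
and kvm_read_guest() exist in the tree, but the surrounding glue and
variables here are assumed for illustration):

	struct kvmppc_pte gpte;
	int ret;

	/* translate the guest effective address to a guest real address;
	 * the walk itself may need several guest-memory reads */
	ret = kvmppc_mmu_radix_xlate(vcpu, eaddr, &gpte, false, false);
	if (!ret)
		/* then read the data through the guest real address */
		ret = kvm_read_guest(vcpu->kvm, gpte.raddr, to, n);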


> Signed-off-by: Jordan Niethe 
> ---
>  arch/powerpc/kvm/book3s_64_mmu_radix.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
> b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> index 916af6c153a5..4a1abb9f7c05 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> @@ -40,6 +40,9 @@ unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int 
> pid,
>   unsigned long quadrant, ret = n;
>   bool is_load = !!to;
>  
> + if (kvmhv_is_nestedv2())
> + return H_UNSUPPORTED;
> +
>   /* Can't access quadrants 1 or 2 in non-HV mode, call the HV to do it */
>   if (kvmhv_on_pseries())
>   return plpar_hcall_norets(H_COPY_TOFROM_GUEST, lpid, pid, eaddr,
> -- 
> 2.42.0