Re: [PATCH 2/5] powerpc64/bpf: jit support for unconditional byte swap

2024-05-22 Thread Hari Bathini




On 17/05/24 1:26 pm, Artem Savkov wrote:

Add jit support for unconditional byte swap. Tested using BSWAP tests
from test_bpf module.

Signed-off-by: Artem Savkov 
---
  arch/powerpc/net/bpf_jit_comp64.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 3071205782b15..97191cf091bbf 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -699,11 +699,12 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct code
 */
case BPF_ALU | BPF_END | BPF_FROM_LE:
case BPF_ALU | BPF_END | BPF_FROM_BE:



+   case BPF_ALU64 | BPF_END | BPF_FROM_LE:


A comment here indicating this case does unconditional swap
could improve readability.

Other than this minor nit, the patchset looks good to me.
Also, tested the changes with test_bpf module and selftests.
For the series..

Reviewed-by: Hari Bathini 


  #ifdef __BIG_ENDIAN__
if (BPF_SRC(code) == BPF_FROM_BE)
goto emit_clear;
  #else /* !__BIG_ENDIAN__ */
-   if (BPF_SRC(code) == BPF_FROM_LE)
+   if (BPF_CLASS(code) == BPF_ALU && BPF_SRC(code) == BPF_FROM_LE)
goto emit_clear;
  #endif
switch (imm) {
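
To illustrate the nit above, the suggested comment could look something
like this (a sketch only; exact wording is up to the author):

	/*
	 * BPF_FROM_LE with the BPF_ALU64 class is the unconditional byte
	 * swap (BSWAP), while the BPF_ALU variants only swap when the
	 * requested endianness differs from the host endianness.
	 */
	case BPF_ALU64 | BPF_END | BPF_FROM_LE: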


[PATCH] powerpc/fadump: update documentation about bootargs_append

2024-05-10 Thread Hari Bathini
Update ABI documentation about the introduction of the new sysfs
entry bootargs_append. This sysfs entry will be used to set up the
additional parameters to be passed to the dump capture kernel.

Signed-off-by: Hari Bathini 
---

* This patch is a follow-up to the below patch series, to update the corresponding
  ABI documentation:

https://lore.kernel.org/all/20240509115755.519982-1-hbath...@linux.ibm.com/

 Documentation/ABI/testing/sysfs-kernel-fadump | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump b/Documentation/ABI/testing/sysfs-kernel-fadump
index c586054657d6..2f9daa7ca55b 100644
--- a/Documentation/ABI/testing/sysfs-kernel-fadump
+++ b/Documentation/ABI/testing/sysfs-kernel-fadump
@@ -49,3 +49,10 @@ Description: read only
memory add/remove events because elfcorehdr is now prepared in
the second/fadump kernel.
 User:  kexec-tools
+
+What:  /sys/kernel/fadump/bootargs_append
+Date:  May 2024
+Contact:   linuxppc-dev@lists.ozlabs.org
+Description:   read/write
+   This is a special sysfs file available to setup additional
+   parameters to be passed to capture kernel.
-- 
2.45.0
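
As a usage illustration for the interface documented above (not part of
the patch; the parameter string below is only an example), userspace
could set the extra parameters for the capture kernel roughly like this:

	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>

	int main(void)
	{
		/* Example parameters to shrink the capture kernel's footprint. */
		const char *args = "nr_cpus=1 numa=off cgroup_disable=memory";
		int fd = open("/sys/kernel/fadump/bootargs_append", O_WRONLY);

		if (fd < 0) {
			perror("open bootargs_append");
			return 1;
		}
		if (write(fd, args, strlen(args)) < 0)
			perror("write bootargs_append");
		close(fd);
		return 0;
	}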



[PATCH] powerpc/85xx: fix compile error without CONFIG_CRASH_DUMP

2024-05-10 Thread Hari Bathini
Since commit 5c4233cc0920 ("powerpc/kdump: Split KEXEC_CORE and
CRASH_DUMP dependency"), crashing_cpu is not available without
CONFIG_CRASH_DUMP. Fix the compile error on 64-bit 85xx owing to this
change.

Cc: sta...@vger.kernel.org
Fixes: 5c4233cc0920 ("powerpc/kdump: Split KEXEC_CORE and CRASH_DUMP dependency")
Reported-by: Christian Zigotzky 
Closes: https://lore.kernel.org/all/fa247ae4-5825-4dbe-a737-d93b7ab4d...@xenosoft.de/
Suggested-by: Michael Ellerman 
Signed-off-by: Hari Bathini 
---
 arch/powerpc/platforms/85xx/smp.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
index 40aa58206888..e52b848b64b7 100644
--- a/arch/powerpc/platforms/85xx/smp.c
+++ b/arch/powerpc/platforms/85xx/smp.c
@@ -398,6 +398,7 @@ static void mpc85xx_smp_kexec_cpu_down(int crash_shutdown, int secondary)
hard_irq_disable();
mpic_teardown_this_cpu(secondary);
 
+#ifdef CONFIG_CRASH_DUMP
if (cpu == crashing_cpu && cpu_thread_in_core(cpu) != 0) {
/*
 * We enter the crash kernel on whatever cpu crashed,
@@ -406,9 +407,11 @@ static void mpc85xx_smp_kexec_cpu_down(int crash_shutdown, int secondary)
 */
disable_threadbit = 1;
disable_cpu = cpu_first_thread_sibling(cpu);
-   } else if (sibling != crashing_cpu &&
-  cpu_thread_in_core(cpu) == 0 &&
-  cpu_thread_in_core(sibling) != 0) {
+   } else if (sibling == crashing_cpu) {
+   return;
+   }
+#endif
+   if (cpu_thread_in_core(cpu) == 0 && cpu_thread_in_core(sibling) != 0) {
disable_threadbit = 2;
disable_cpu = sibling;
}
-- 
2.45.0



[PATCH v2 3/3] powerpc/fadump: pass additional parameters when fadump is active

2024-05-09 Thread Hari Bathini
Append the additional parameters passed/set in the dedicated parameter
area (RTAS_FADUMP_PARAM_AREA) to bootargs in the fadump capture kernel.

Signed-off-by: Hari Bathini 
---
 arch/powerpc/include/asm/fadump.h |  2 ++
 arch/powerpc/kernel/fadump.c  | 35 +++
 arch/powerpc/kernel/prom.c|  3 +++
 3 files changed, 40 insertions(+)

diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
index 526a6a647312..ef40c9b6972a 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -19,12 +19,14 @@ extern int is_fadump_active(void);
 extern int should_fadump_crash(void);
 extern void crash_fadump(struct pt_regs *, const char *);
 extern void fadump_cleanup(void);
+extern void fadump_append_bootargs(void);
 
 #else  /* CONFIG_FA_DUMP */
 static inline int is_fadump_active(void) { return 0; }
 static inline int should_fadump_crash(void) { return 0; }
 static inline void crash_fadump(struct pt_regs *regs, const char *str) { }
 static inline void fadump_cleanup(void) { }
+static inline void fadump_append_bootargs(void) { }
 #endif /* !CONFIG_FA_DUMP */
 
 #if defined(CONFIG_FA_DUMP) || defined(CONFIG_PRESERVE_FA_DUMP)
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 6d35b09d6f3a..2276bacc4170 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -131,6 +131,41 @@ static int __init fadump_cma_init(void)
 static int __init fadump_cma_init(void) { return 1; }
 #endif /* CONFIG_CMA */
 
+/*
+ * Additional parameters meant for capture kernel are placed in a dedicated area.
+ * If this is capture kernel boot, append these parameters to bootargs.
+ */
+void __init fadump_append_bootargs(void)
+{
+   char *append_args;
+   size_t len;
+
+   if (!fw_dump.dump_active || !fw_dump.param_area_supported || !fw_dump.param_area)
+   return;
+
+   if (fw_dump.param_area >= fw_dump.boot_mem_top) {
+   if (memblock_reserve(fw_dump.param_area, COMMAND_LINE_SIZE)) {
+   pr_warn("WARNING: Can't use additional parameters area!\n");
+   fw_dump.param_area = 0;
+   return;
+   }
+   }
+
+   append_args = (char *)fw_dump.param_area;
+   len = strlen(boot_command_line);
+
+   /*
+* Too late to fail even if cmdline size exceeds. Truncate additional parameters
+* to cmdline size and proceed anyway.
+*/
+   if (len + strlen(append_args) >= COMMAND_LINE_SIZE - 1)
+   pr_warn("WARNING: Appending parameters exceeds cmdline size. Truncating!\n");
+
+   pr_debug("Cmdline: %s\n", boot_command_line);
+   snprintf(boot_command_line + len, COMMAND_LINE_SIZE - len, " %s", append_args);
+   pr_info("Updated cmdline: %s\n", boot_command_line);
+}
+
 /* Scan the Firmware Assisted dump configuration details. */
 int __init early_init_dt_scan_fw_dump(unsigned long node, const char *uname,
  int depth, void *data)
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index eb140ea6b6ff..60819751e55e 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -813,6 +813,9 @@ void __init early_init_devtree(void *params)
 */
of_scan_flat_dt(early_init_dt_scan_chosen_ppc, boot_command_line);
 
+   /* Append additional parameters passed for fadump capture kernel */
+   fadump_append_bootargs();
+
/* Scan memory nodes and rebuild MEMBLOCKs */
early_init_dt_scan_root();
early_init_dt_scan_memory_ppc();
-- 
2.45.0



[PATCH v2 2/3] powerpc/fadump: setup additional parameters for dump capture kernel

2024-05-09 Thread Hari Bathini
For fadump case, passing additional parameters to dump capture kernel
helps in minimizing the memory footprint for it and also provides the
flexibility to disable components/modules, like hugepages, that are
hindering the boot process of the special dump capture environment.

Set up a dedicated parameter area to be passed to the capture kernel.
This area type is defined as RTAS_FADUMP_PARAM_AREA. Sysfs attribute
'/sys/kernel/fadump/bootargs_append' is exported to the userspace to
specify the additional parameters to be passed to the capture kernel.

Signed-off-by: Hari Bathini 
---
 arch/powerpc/include/asm/fadump-internal.h   |  3 +
 arch/powerpc/kernel/fadump.c | 87 
 arch/powerpc/platforms/powernv/opal-fadump.c |  6 +-
 arch/powerpc/platforms/pseries/rtas-fadump.c | 35 +++-
 arch/powerpc/platforms/pseries/rtas-fadump.h | 11 ++-
 5 files changed, 133 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h
index 35787fa1ac60..e83869a4eb6a 100644
--- a/arch/powerpc/include/asm/fadump-internal.h
+++ b/arch/powerpc/include/asm/fadump-internal.h
@@ -124,6 +124,8 @@ struct fw_dump {
unsigned long   cpu_notes_buf_vaddr;
unsigned long   cpu_notes_buf_size;
 
+   unsigned long   param_area;
+
/*
 * Maximum size supported by firmware to copy from source to
 * destination address per entry.
@@ -138,6 +140,7 @@ struct fw_dump {
unsigned long   dump_active:1;
unsigned long   dump_registered:1;
unsigned long   nocma:1;
+   unsigned long   param_area_supported:1;
 
struct fadump_ops   *ops;
 };
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1431,6 +1431,43 @@ static ssize_t registered_show(struct kobject *kobj,
return sprintf(buf, "%d\n", fw_dump.dump_registered);
 }
 
+static ssize_t bootargs_append_show(struct kobject *kobj,
+  struct kobj_attribute *attr,
+  char *buf)
+{
+   return sprintf(buf, "%s\n", (char *)__va(fw_dump.param_area));
+}
+
+static ssize_t bootargs_append_store(struct kobject *kobj,
+  struct kobj_attribute *attr,
+  const char *buf, size_t count)
+{
+   char *params;
+
+   if (!fw_dump.fadump_enabled || fw_dump.dump_active)
+   return -EPERM;
+
+   if (count >= COMMAND_LINE_SIZE)
+   return -EINVAL;
+
+   /*
+* Fail here instead of handling this scenario with
+* some silly workaround in capture kernel.
+*/
+   if (saved_command_line_len + count >= COMMAND_LINE_SIZE) {
+   pr_err("Appending parameters exceeds cmdline size!\n");
+   return -ENOSPC;
+   }
+
+   params = __va(fw_dump.param_area);
+   strscpy_pad(params, buf, COMMAND_LINE_SIZE);
+   /* Remove newline character at the end. */
+   if (params[count-1] == '\n')
+   params[count-1] = '\0';
+
+   return count;
+}
+
 static ssize_t registered_store(struct kobject *kobj,
struct kobj_attribute *attr,
const char *buf, size_t count)
@@ -1490,6 +1527,7 @@ static struct kobj_attribute enable_attr = __ATTR_RO(enabled);
 static struct kobj_attribute register_attr = __ATTR_RW(registered);
 static struct kobj_attribute mem_reserved_attr = __ATTR_RO(mem_reserved);
 static struct kobj_attribute hotplug_ready_attr = __ATTR_RO(hotplug_ready);
+static struct kobj_attribute bootargs_append_attr = __ATTR_RW(bootargs_append);
 
 static struct attribute *fadump_attrs[] = {
	&enable_attr.attr,
@@ -1663,6 +1701,54 @@ static void __init fadump_process(void)
fadump_invalidate_release_mem();
 }
 
+/*
+ * Reserve memory to store additional parameters to be passed
+ * for fadump/capture kernel.
+ */
+static void fadump_setup_param_area(void)
+{
+   phys_addr_t range_start, range_end;
+
+   if (!fw_dump.param_area_supported || fw_dump.dump_active)
+   return;
+
+   /* This memory can't be used by PFW or bootloader as it is shared across kernels */
+   if (radix_enabled()) {
+   /*
+* Anywhere in the upper half should be good enough as all memory
+* is accessible in real mode.
+*/
+   range_start = memblock_end_of_DRAM() / 2;
+   range_end = memblock_end_of_DRAM();
+   } else {
+   /*
+* Passing additional parameters is supported for hash MMU only
+* if the first memory block size is 768MB or higher.
+*/
+   if (ppc64_rma_size < 0x3

[PATCH v2 1/3] powerpc/pseries/fadump: add support for multiple boot memory regions

2024-05-09 Thread Hari Bathini
Currently, fadump on pseries assumes a single boot memory region even
though f/w supports more than one boot memory region. Add support for
more boot memory regions to make the implementation flexible for any
enhancements that introduce other region types. For this, rtas memory
structure for fadump is updated to have multiple boot memory regions
instead of just one. Additionally, methods responsible for creating
the fadump memory structure during both the first and second kernel
boot have been modified to take these multiple boot memory regions
into account. Also, a new callback has been added to the fadump_ops
structure to get the maximum boot memory regions supported by the
platform.

Signed-off-by: Sourabh Jain 
Signed-off-by: Hari Bathini 
---
 arch/powerpc/include/asm/fadump-internal.h   |   2 +-
 arch/powerpc/kernel/fadump.c |  27 +-
 arch/powerpc/platforms/powernv/opal-fadump.c |   7 +
 arch/powerpc/platforms/pseries/rtas-fadump.c | 255 +--
 arch/powerpc/platforms/pseries/rtas-fadump.h |  26 +-
 5 files changed, 197 insertions(+), 120 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h
index 5d706a7acc8a..35787fa1ac60 100644
--- a/arch/powerpc/include/asm/fadump-internal.h
+++ b/arch/powerpc/include/asm/fadump-internal.h
@@ -156,6 +156,7 @@ struct fadump_ops {
  struct seq_file *m);
void(*fadump_trigger)(struct fadump_crash_info_header *fdh,
  const char *msg);
+   int (*fadump_max_boot_mem_rgns)(void);
 };
 
 /* Helper functions */
@@ -163,7 +164,6 @@ s32 __init fadump_setup_cpu_notes_buf(u32 num_cpus);
 void fadump_free_cpu_notes_buf(void);
 u32 *__init fadump_regs_to_elf_notes(u32 *buf, struct pt_regs *regs);
 void __init fadump_update_elfcore_header(char *bufp);
-bool is_fadump_boot_mem_contiguous(void);
 bool is_fadump_reserved_mem_contiguous(void);
 
 #else /* !CONFIG_PRESERVE_FA_DUMP */
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 0b849563393e..fe6be00451b9 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -220,28 +220,6 @@ static bool is_fadump_mem_area_contiguous(u64 d_start, u64 d_end)
return ret;
 }
 
-/*
- * Returns true, if there are no holes in boot memory area,
- * false otherwise.
- */
-bool is_fadump_boot_mem_contiguous(void)
-{
-   unsigned long d_start, d_end;
-   bool ret = false;
-   int i;
-
-   for (i = 0; i < fw_dump.boot_mem_regs_cnt; i++) {
-   d_start = fw_dump.boot_mem_addr[i];
-   d_end   = d_start + fw_dump.boot_mem_sz[i];
-
-   ret = is_fadump_mem_area_contiguous(d_start, d_end);
-   if (!ret)
-   break;
-   }
-
-   return ret;
-}
-
 /*
  * Returns true, if there are no holes in reserved memory area,
  * false otherwise.
@@ -381,10 +359,11 @@ static unsigned long __init get_fadump_area_size(void)
 static int __init add_boot_mem_region(unsigned long rstart,
  unsigned long rsize)
 {
+   int max_boot_mem_rgns = fw_dump.ops->fadump_max_boot_mem_rgns();
int i = fw_dump.boot_mem_regs_cnt++;
 
-   if (fw_dump.boot_mem_regs_cnt > FADUMP_MAX_MEM_REGS) {
-   fw_dump.boot_mem_regs_cnt = FADUMP_MAX_MEM_REGS;
+   if (fw_dump.boot_mem_regs_cnt > max_boot_mem_rgns) {
+   fw_dump.boot_mem_regs_cnt = max_boot_mem_rgns;
return 0;
}
 
diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c
index 767a6b19e42a..5a88d7efb48a 100644
--- a/arch/powerpc/platforms/powernv/opal-fadump.c
+++ b/arch/powerpc/platforms/powernv/opal-fadump.c
@@ -599,6 +599,12 @@ static void opal_fadump_trigger(struct fadump_crash_info_header *fdh,
pr_emerg("No backend support for MPIPL!\n");
 }
 
+/* FADUMP_MAX_MEM_REGS or lower */
+static int opal_fadump_max_boot_mem_rgns(void)
+{
+   return FADUMP_MAX_MEM_REGS;
+}
+
 static struct fadump_ops opal_fadump_ops = {
.fadump_init_mem_struct = opal_fadump_init_mem_struct,
.fadump_get_metadata_size   = opal_fadump_get_metadata_size,
@@ -611,6 +617,7 @@ static struct fadump_ops opal_fadump_ops = {
.fadump_process = opal_fadump_process,
.fadump_region_show = opal_fadump_region_show,
.fadump_trigger = opal_fadump_trigger,
+   .fadump_max_boot_mem_rgns   = opal_fadump_max_boot_mem_rgns,
 };
 
 void __init opal_fadump_dt_scan(struct fw_dump *fadump_conf, u64 node)
diff --git a/arch/powerpc/platforms/pseries/rtas-fadump.c b/arch/powerpc/platforms/pseries/rtas-fadump.c
index 214f37788b2d..4db78b2bb2a8 100644
--- a/arch/powerpc/platforms/pseries/rtas-fadump.c
+++ b/arch/powerpc/platforms/pseries/rtas-fadump.c
@@ -29,9 +29,6 

[PATCH v2 0/3] powerpc/fadump: pass additional args to dump capture kernel

2024-05-09 Thread Hari Bathini
While fadump is a more reliable alternative to kdump dump capturing
method, it doesn't support passing additional parameters. Having
such support is desirable for two major reasons:

  1. It helps minimize the memory consumption of the fadump dump capture
     kernel by disabling features that consume a considerable amount of
     memory but have little significance for the dump capture environment
     (e.g. numa, cma, cgroup, etc.)
  2. It helps disable features/components in the dump capture kernel
     that are unstable and/or are being debugged.

This patch series is a follow-up to [1]. It adds support for passing
additional parameters to the fadump capture kernel to make it more
desirable. For this, a dedicated area is shared between the production
kernel and the capture kernel to pass these additional parameters. This
support is enabled only on pseries as of now. The dedicated area is
referred to as RTAS_FADUMP_PARAM_AREA.

In radix MMU mode, this dedicated area can be anywhere, but in hash MMU
mode it must be within the first memory block to be accessible during
early boot. The feature is enabled for both radix and hash MMU modes,
but for hash MMU only when the RMA size is 768MB or more, to avoid
complicated memory real estate negotiations with FW components.

The first patch adds support for multiple boot memory regions to make
addition of any new region types simpler. The second patch sets up the
parameter (dedicated) area to be passed to the capture kernel.
/sys/kernel/fadump/bootargs_append is exported to the userspace to
specify the additional parameters to be passed to the capture kernel.
The last patch appends the parameters to bootargs during capture
kernel boot.

Changes in v2:
* RFC tag removed.
* Moved variable declaration out of switch case.
* Zero'ed the parameter area while setting up.
* Reserving the parameter area only when needed.

[1] https://lore.kernel.org/linuxppc-dev/20231205201835.388030-1-hbath...@linux.ibm.com/

Hari Bathini (3):
  powerpc/pseries/fadump: add support for multiple boot memory regions
  powerpc/fadump: setup additional parameters for dump capture kernel
  powerpc/fadump: pass additional parameters when fadump is active

 arch/powerpc/include/asm/fadump-internal.h   |   5 +-
 arch/powerpc/include/asm/fadump.h|   2 +
 arch/powerpc/kernel/fadump.c | 149 --
 arch/powerpc/kernel/prom.c   |   3 +
 arch/powerpc/platforms/powernv/opal-fadump.c |  13 +-
 arch/powerpc/platforms/pseries/rtas-fadump.c | 290 +--
 arch/powerpc/platforms/pseries/rtas-fadump.h |  29 +-
 7 files changed, 366 insertions(+), 125 deletions(-)

-- 
2.45.0



[PATCH v4 1/2] powerpc64/bpf: fix tail calls for PCREL addressing

2024-05-02 Thread Hari Bathini
With PCREL addressing, there is no kernel TOC. So, it is not set up in
the prologue when PCREL addressing is used. But the number of instructions
to skip on a tail call was not adjusted accordingly. That resulted in
not-so-obvious failures while using tailcalls. The 'tailcalls' selftest
crashed the system with the below call trace:

  bpf_test_run+0xe8/0x3cc (unreliable)
  bpf_prog_test_run_skb+0x348/0x778
  __sys_bpf+0xb04/0x2b00
  sys_bpf+0x28/0x38
  system_call_exception+0x168/0x340
  system_call_vectored_common+0x15c/0x2ec

Also, as bpf programs are always module addresses and a bpf helper in
general is a core kernel text address, using PC relative addressing
often fails with "out of range of pcrel address" error. Switch to
using kernel base for relative addressing to handle this better.
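
As a rough, standalone illustration of the constraint above (not the JIT
code itself; SZ_8G is defined locally for the sketch): the prefixed paddi
used for this takes a signed 34-bit displacement, so the target must lie
within +/- 8GB of the chosen base, which is why kernel base is a better
anchor than the program counter for these calls:

	#include <stdbool.h>
	#include <stdint.h>

	#define SZ_8G	(8ULL << 30)

	/* True if func_addr is reachable from base with a signed 34-bit
	 * displacement, mirroring the range check in the diff below. */
	static bool fits_34bit_reladdr(uint64_t func_addr, uint64_t base)
	{
		int64_t reladdr = (int64_t)(func_addr - base);

		return reladdr < (int64_t)SZ_8G && reladdr >= -(int64_t)SZ_8G;
	}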

Fixes: 7e3a68be42e1 ("powerpc/64: vmlinux support building with PCREL addresing")
Cc: sta...@vger.kernel.org
Signed-off-by: Hari Bathini 
---

* Changes in v4:
  - Fix out of range errors by switching to kernelbase instead of PC
for relative addressing.

* Changes in v3:
  - New patch to fix tailcall issues with PCREL addressing.


 arch/powerpc/net/bpf_jit_comp64.c | 30 --
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 79f23974a320..4de08e35e284 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -202,7 +202,8 @@ void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
EMIT(PPC_RAW_BLR());
 }
 
-static int bpf_jit_emit_func_call_hlp(u32 *image, struct codegen_context *ctx, u64 func)
+static int
+bpf_jit_emit_func_call_hlp(u32 *image, u32 *fimage, struct codegen_context *ctx, u64 func)
 {
unsigned long func_addr = func ? ppc_function_entry((void *)func) : 0;
long reladdr;
@@ -211,19 +212,20 @@ static int bpf_jit_emit_func_call_hlp(u32 *image, struct codegen_context *ctx, u
return -EINVAL;
 
if (IS_ENABLED(CONFIG_PPC_KERNEL_PCREL)) {
-   reladdr = func_addr - CTX_NIA(ctx);
+   reladdr = func_addr - local_paca->kernelbase;
 
if (reladdr >= (long)SZ_8G || reladdr < -(long)SZ_8G) {
-   pr_err("eBPF: address of %ps out of range of pcrel address.\n",
-   (void *)func);
+   pr_err("eBPF: address of %ps out of range of 34-bit relative address.\n",
+  (void *)func);
return -ERANGE;
}
-   /* pla r12,addr */
-   EMIT(PPC_PREFIX_MLS | __PPC_PRFX_R(1) | IMM_H18(reladdr));
-   EMIT(PPC_INST_PADDI | ___PPC_RT(_R12) | IMM_L(reladdr));
-   EMIT(PPC_RAW_MTCTR(_R12));
-   EMIT(PPC_RAW_BCTR());
-
+   EMIT(PPC_RAW_LD(_R12, _R13, offsetof(struct paca_struct, kernelbase)));
+   /* Align for subsequent prefix instruction */
+   if (!IS_ALIGNED((unsigned long)fimage + CTX_NIA(ctx), 8))
+   EMIT(PPC_RAW_NOP());
+   /* paddi r12,r12,addr */
+   EMIT(PPC_PREFIX_MLS | __PPC_PRFX_R(0) | IMM_H18(reladdr));
+   EMIT(PPC_INST_PADDI | ___PPC_RT(_R12) | ___PPC_RA(_R12) | IMM_L(reladdr));
} else {
reladdr = func_addr - kernel_toc_addr();
if (reladdr > 0x7FFFFFFF || reladdr < -(0x80000000L)) {
@@ -233,9 +235,9 @@ static int bpf_jit_emit_func_call_hlp(u32 *image, struct codegen_context *ctx, u
 
EMIT(PPC_RAW_ADDIS(_R12, _R2, PPC_HA(reladdr)));
EMIT(PPC_RAW_ADDI(_R12, _R12, PPC_LO(reladdr)));
-   EMIT(PPC_RAW_MTCTR(_R12));
-   EMIT(PPC_RAW_BCTRL());
}
+   EMIT(PPC_RAW_MTCTR(_R12));
+   EMIT(PPC_RAW_BCTRL());
 
return 0;
 }
@@ -285,7 +287,7 @@ static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 o
int b2p_index = bpf_to_ppc(BPF_REG_3);
int bpf_tailcall_prologue_size = 8;
 
-   if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2))
+   if (!IS_ENABLED(CONFIG_PPC_KERNEL_PCREL) && IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2))
bpf_tailcall_prologue_size += 4; /* skip past the toc load */
 
/*
@@ -993,7 +995,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct code
return ret;
 
if (func_addr_fixed)
-   ret = bpf_jit_emit_func_call_hlp(image, ctx, func_addr);
+   ret = bpf_jit_emit_func_call_hlp(image, fimage, ctx, func_addr);
else
ret = bpf_jit_emit_func_call_rel(image, fimage, ctx, func_addr);
 
-- 
2.44.0



[PATCH v4 2/2] powerpc/bpf: enable kfunc call

2024-05-02 Thread Hari Bathini
Currently, bpf jit code on powerpc assumes all the bpf functions and
helpers to be part of core kernel text. This is false for kfunc case,
as function addresses may not be part of core kernel text area. So,
add support for addresses that are not within core kernel text area
too, to enable kfunc support. Emit instructions based on whether the
function address is within core kernel text address or not, to retain
optimized instruction sequence where possible.

In case of PCREL, as a bpf function that is not within core kernel
text area is likely to go out of range with relative addressing on
kernel base, use PC relative addressing. If that goes out of range,
load the full address with PPC_LI64().

With addresses that are not within core kernel text area supported,
override bpf_jit_supports_kfunc_call() to enable kfunc support. Also,
override bpf_jit_supports_far_kfunc_call() to enable 64-bit pointers,
as an address offset can be more than 32-bit long on PPC64.

Signed-off-by: Hari Bathini 
---

* Changes in v4:
  - Use either kernelbase or PC for relative addressing. Also, fallback
to PPC_LI64(), if both are out of range.
  - Update r2 with kernel TOC for elfv1 too as elfv1 also uses the
optimization sequence, that expects r2 to be kernel TOC, when
function address is within core kernel text.

* Changes in v3:
  - Retained optimized instruction sequence when function address is
a core kernel address as suggested by Naveen.
  - Used unoptimized instruction sequence for PCREL addressing to
avoid out of range errors for core kernel function addresses.
  - Folded patch that adds support for kfunc calls with patch that
enables/advertises this support as suggested by Naveen.


 arch/powerpc/net/bpf_jit_comp.c   | 10 +
 arch/powerpc/net/bpf_jit_comp64.c | 61 ++-
 2 files changed, 61 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 0f9a21783329..984655419da5 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -359,3 +359,13 @@ void bpf_jit_free(struct bpf_prog *fp)
 
bpf_prog_unlock_free(fp);
 }
+
+bool bpf_jit_supports_kfunc_call(void)
+{
+   return true;
+}
+
+bool bpf_jit_supports_far_kfunc_call(void)
+{
+   return IS_ENABLED(CONFIG_PPC64);
+}
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 4de08e35e284..8afc14a4a125 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -208,17 +208,13 @@ bpf_jit_emit_func_call_hlp(u32 *image, u32 *fimage, struct codegen_context *ctx,
unsigned long func_addr = func ? ppc_function_entry((void *)func) : 0;
long reladdr;
 
-   if (WARN_ON_ONCE(!core_kernel_text(func_addr)))
+   if (WARN_ON_ONCE(!kernel_text_address(func_addr)))
return -EINVAL;
 
-   if (IS_ENABLED(CONFIG_PPC_KERNEL_PCREL)) {
-   reladdr = func_addr - local_paca->kernelbase;
+#ifdef CONFIG_PPC_KERNEL_PCREL
+   reladdr = func_addr - local_paca->kernelbase;
 
-   if (reladdr >= (long)SZ_8G || reladdr < -(long)SZ_8G) {
-   pr_err("eBPF: address of %ps out of range of 34-bit relative address.\n",
-  (void *)func);
-   return -ERANGE;
-   }
+   if (reladdr < (long)SZ_8G && reladdr >= -(long)SZ_8G) {
EMIT(PPC_RAW_LD(_R12, _R13, offsetof(struct paca_struct, kernelbase)));
/* Align for subsequent prefix instruction */
if (!IS_ALIGNED((unsigned long)fimage + CTX_NIA(ctx), 8))
@@ -227,6 +223,26 @@ bpf_jit_emit_func_call_hlp(u32 *image, u32 *fimage, struct codegen_context *ctx,
EMIT(PPC_PREFIX_MLS | __PPC_PRFX_R(0) | IMM_H18(reladdr));
EMIT(PPC_INST_PADDI | ___PPC_RT(_R12) | ___PPC_RA(_R12) | IMM_L(reladdr));
} else {
+   unsigned long pc = (unsigned long)fimage + CTX_NIA(ctx);
+   bool alignment_needed = !IS_ALIGNED(pc, 8);
+
+   reladdr = func_addr - (alignment_needed ? pc + 4 :  pc);
+
+   if (reladdr < (long)SZ_8G && reladdr >= -(long)SZ_8G) {
+   if (alignment_needed)
+   EMIT(PPC_RAW_NOP());
+   /* pla r12,addr */
+   EMIT(PPC_PREFIX_MLS | __PPC_PRFX_R(1) | IMM_H18(reladdr));
+   EMIT(PPC_INST_PADDI | ___PPC_RT(_R12) | IMM_L(reladdr));
+   } else {
+   /* We can clobber r12 */
+   PPC_LI64(_R12, func);
+   }
+   }
+   EMIT(PPC_RAW_MTCTR(_R12));
+   EMIT(PPC_RAW_BCTRL());
+#else
+   if (core_kernel_text(func_addr)) {
reladdr = func_addr - kernel_toc_addr();
if (reladdr > 0x7FFFFFFF || reladdr < -(0x80000000L)) {

[PATCH 1/2] radix/kfence: map __kfence_pool at page granularity

2024-04-24 Thread Hari Bathini
When KFENCE is enabled, total system memory is mapped at page level
granularity. But in radix MMU mode, ~3GB additional memory is needed
to map 100GB of system memory at page level granularity when compared
to using 2MB direct mapping. This is not desired considering KFENCE is
designed to be enabled in production kernels [1]. Also, mapping memory
allocated for the KFENCE pool at page granularity seems sufficient
to enable KFENCE support. So, allocate __kfence_pool during bootup and
map it at page granularity instead of mapping all system memory at
page granularity.

Without patch:
# cat /proc/meminfo
MemTotal:   101201920 kB

With patch:
# cat /proc/meminfo
MemTotal:   104483904 kB

All kfence_test.c testcases passed with this patch.
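
(For reference, the MemTotal delta above is 104483904 kB - 101201920 kB =
3281984 kB, i.e. roughly 3.1 GB, which lines up with the ~3GB page-table
overhead quoted for mapping 100GB of memory at page granularity.)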

[1] https://lore.kernel.org/all/20201103175841.3495947-2-el...@google.com/

Signed-off-by: Hari Bathini 
---
 arch/powerpc/include/asm/kfence.h|  5 
 arch/powerpc/mm/book3s64/radix_pgtable.c | 34 ++--
 arch/powerpc/mm/init_64.c| 14 ++
 3 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/kfence.h b/arch/powerpc/include/asm/kfence.h
index 424ceef82ae6..18ec2b06ba1e 100644
--- a/arch/powerpc/include/asm/kfence.h
+++ b/arch/powerpc/include/asm/kfence.h
@@ -8,6 +8,7 @@
 #ifndef __ASM_POWERPC_KFENCE_H
 #define __ASM_POWERPC_KFENCE_H
 
+#include 
 #include 
 #include 
 
@@ -15,6 +16,10 @@
 #define ARCH_FUNC_PREFIX "."
 #endif
 
+#ifdef CONFIG_KFENCE
+extern bool kfence_early_init;
+#endif
+
 static inline bool arch_kfence_init_pool(void)
 {
return true;
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 15e88f1439ec..fccbf92f279b 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -291,9 +292,8 @@ static unsigned long next_boundary(unsigned long addr, unsigned long end)
return end;
 }
 
-static int __meminit create_physical_mapping(unsigned long start,
-unsigned long end,
-int nid, pgprot_t _prot)
+static int __meminit create_physical_mapping(unsigned long start, unsigned long end, int nid,
+pgprot_t _prot, unsigned long mapping_sz_limit)
 {
unsigned long vaddr, addr, mapping_size = 0;
bool prev_exec, exec = false;
@@ -301,7 +301,10 @@ static int __meminit create_physical_mapping(unsigned long start,
int psize;
unsigned long max_mapping_size = memory_block_size;
 
-   if (debug_pagealloc_enabled_or_kfence())
+   if (mapping_sz_limit < max_mapping_size)
+   max_mapping_size = mapping_sz_limit;
+
+   if (debug_pagealloc_enabled())
max_mapping_size = PAGE_SIZE;
 
start = ALIGN(start, PAGE_SIZE);
@@ -358,6 +361,7 @@ static int __meminit create_physical_mapping(unsigned long start,
 
 static void __init radix_init_pgtable(void)
 {
+   phys_addr_t kfence_pool __maybe_unused;
unsigned long rts_field;
phys_addr_t start, end;
u64 i;
@@ -365,6 +369,13 @@ static void __init radix_init_pgtable(void)
/* We don't support slb for radix */
slb_set_size(0);
 
+#ifdef CONFIG_KFENCE
+   if (kfence_early_init) {
+   kfence_pool = memblock_phys_alloc(KFENCE_POOL_SIZE, PAGE_SIZE);
+   memblock_mark_nomap(kfence_pool, KFENCE_POOL_SIZE);
+   }
+#endif
+
/*
 * Create the linear mapping
 */
@@ -380,10 +391,18 @@ static void __init radix_init_pgtable(void)
continue;
}
 
-   WARN_ON(create_physical_mapping(start, end,
-   -1, PAGE_KERNEL));
+   WARN_ON(create_physical_mapping(start, end, -1, PAGE_KERNEL, ~0UL));
}
 
+#ifdef CONFIG_KFENCE
+   if (kfence_early_init) {
+   create_physical_mapping(kfence_pool, kfence_pool + KFENCE_POOL_SIZE, -1,
+   PAGE_KERNEL, PAGE_SIZE);
+   memblock_clear_nomap(kfence_pool, KFENCE_POOL_SIZE);
+   __kfence_pool = __va(kfence_pool);
+   }
+#endif
+
if (!cpu_has_feature(CPU_FTR_HVMODE) &&
cpu_has_feature(CPU_FTR_P9_RADIX_PREFETCH_BUG)) {
/*
@@ -874,8 +893,7 @@ int __meminit radix__create_section_mapping(unsigned long start,
return -1;
}
 
-   return create_physical_mapping(__pa(start), __pa(end),
-  nid, prot);
+   return create_physical_mapping(__pa(start), __pa(end), nid, prot, ~0UL);
 }
 
int __meminit radix__remove_section_mapping(unsigned long start, unsigned long end)
diff --git a/arch/powerpc/mm/init_64.c b/arch

[PATCH 2/2] radix/kfence: support late __kfence_pool allocation

2024-04-24 Thread Hari Bathini
With commit b33f778bba5ef ("kfence: alloc kfence_pool after system
startup"), KFENCE pool can be allocated after system startup via the
page allocator. This can lead to problems as all memory is not mapped
at page granularity anymore with CONFIG_KFENCE. Address this by direct
mapping all memory at PMD level and split the mapping for PMD pages
that overlap with __kfence_pool to page level granularity if and when
__kfence_pool is allocated after system startup.

Signed-off-by: Hari Bathini 
---
 arch/powerpc/include/asm/book3s/64/radix.h |  2 +
 arch/powerpc/include/asm/kfence.h  | 14 +-
 arch/powerpc/mm/book3s64/radix_pgtable.c   | 50 +-
 3 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
index 8f55ff74bb68..0423ddbcf73c 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -340,6 +340,8 @@ extern void radix__vmemmap_remove_mapping(unsigned long start,
 extern int radix__map_kernel_page(unsigned long ea, unsigned long pa,
 pgprot_t flags, unsigned int psz);
 
+extern bool radix_kfence_init_pool(void);
+
 static inline unsigned long radix__get_tree_size(void)
 {
unsigned long rts_field;
diff --git a/arch/powerpc/include/asm/kfence.h b/arch/powerpc/include/asm/kfence.h
index 18ec2b06ba1e..c5d2fb2f9ecb 100644
--- a/arch/powerpc/include/asm/kfence.h
+++ b/arch/powerpc/include/asm/kfence.h
@@ -18,12 +18,24 @@
 
 #ifdef CONFIG_KFENCE
 extern bool kfence_early_init;
-#endif
+
+static inline bool kfence_alloc_pool_late(void)
+{
+   return !kfence_early_init;
+}
 
 static inline bool arch_kfence_init_pool(void)
 {
+#ifdef CONFIG_PPC_BOOK3S_64
+   if (radix_enabled())
+   return radix_kfence_init_pool();
+#endif
+
return true;
 }
+#else
+static inline bool kfence_alloc_pool_late(void) { return false; }
+#endif
 
 #ifdef CONFIG_PPC64
 static inline bool kfence_protect_page(unsigned long addr, bool protect)
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index fccbf92f279b..f4374e3e31e1 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -253,6 +253,53 @@ void radix__mark_initmem_nx(void)
 }
 #endif /* CONFIG_STRICT_KERNEL_RWX */
 
+#ifdef CONFIG_KFENCE
+static inline int radix_split_pmd_page(pmd_t *pmd, unsigned long addr)
+{
+   pte_t *pte = pte_alloc_one_kernel(&init_mm);
+   unsigned long pfn = PFN_DOWN(__pa(addr));
+   int i;
+
+   if (!pte)
+   return -ENOMEM;
+
+   for (i = 0; i < PTRS_PER_PTE; i++) {
+   __set_pte_at(&init_mm, addr, pte + i, pfn_pte(pfn + i, PAGE_KERNEL), 0);
+   asm volatile("ptesync": : :"memory");
+   }
+   pmd_populate_kernel(&init_mm, pmd, pte);
+
+   flush_tlb_kernel_range(addr, addr + PMD_SIZE);
+   return 0;
+}
+
+bool radix_kfence_init_pool(void)
+{
+   unsigned int page_psize, pmd_psize;
+   unsigned long addr;
+   pmd_t *pmd;
+
+   if (!kfence_alloc_pool_late())
+   return true;
+
+   page_psize = shift_to_mmu_psize(PAGE_SHIFT);
+   pmd_psize = shift_to_mmu_psize(PMD_SHIFT);
+   for (addr = (unsigned long)__kfence_pool; is_kfence_address((void *)addr);
+addr += PAGE_SIZE) {
+   pmd = pmd_off_k(addr);
+
+   if (pmd_leaf(*pmd)) {
+   if (radix_split_pmd_page(pmd, addr & PMD_MASK))
+   return false;
+   update_page_count(pmd_psize, -1);
+   update_page_count(page_psize, PTRS_PER_PTE);
+   }
+   }
+
+   return true;
+}
+#endif
+
 static inline void __meminit
print_mapping(unsigned long start, unsigned long end, unsigned long size, bool exec)
 {
@@ -391,7 +438,8 @@ static void __init radix_init_pgtable(void)
continue;
}
 
-   WARN_ON(create_physical_mapping(start, end, -1, PAGE_KERNEL, ~0UL));
+   WARN_ON(create_physical_mapping(start, end, -1, PAGE_KERNEL,
+   kfence_alloc_pool_late() ? PMD_SIZE : ~0UL));
}
 
 #ifdef CONFIG_KFENCE
-- 
2.44.0



[PATCH v3 1/2] powerpc64/bpf: fix tail calls for PCREL addressing

2024-04-02 Thread Hari Bathini
With PCREL addressing, there is no kernel TOC. So, it is not set up in
the prologue when PCREL addressing is used. But the number of instructions
to skip on a tail call was not adjusted accordingly. That resulted in
not-so-obvious failures while using tailcalls. The 'tailcalls' selftest
crashed the system with the below call trace:

  bpf_test_run+0xe8/0x3cc (unreliable)
  bpf_prog_test_run_skb+0x348/0x778
  __sys_bpf+0xb04/0x2b00
  sys_bpf+0x28/0x38
  system_call_exception+0x168/0x340
  system_call_vectored_common+0x15c/0x2ec

Fixes: 7e3a68be42e1 ("powerpc/64: vmlinux support building with PCREL addresing")
Cc: sta...@vger.kernel.org
Signed-off-by: Hari Bathini 
---

* Changes in v3:
  - New patch to fix tailcall issues with PCREL addressing.


 arch/powerpc/net/bpf_jit_comp64.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 79f23974a320..7f62ac4b4e65 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -285,8 +285,10 @@ static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 o
int b2p_index = bpf_to_ppc(BPF_REG_3);
int bpf_tailcall_prologue_size = 8;
 
+#ifndef CONFIG_PPC_KERNEL_PCREL
if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2))
bpf_tailcall_prologue_size += 4; /* skip past the toc load */
+#endif
 
/*
 * if (index >= array->map.max_entries)
-- 
2.44.0



[PATCH v3 2/2] powerpc/bpf: enable kfunc call

2024-04-02 Thread Hari Bathini
Currently, bpf jit code on powerpc assumes all the bpf functions and
helpers to be kernel text. This is false for kfunc case, as function
addresses can be module addresses as well. So, ensure module addresses
are supported to enable kfunc support.

Emit instructions based on whether the function address is kernel text
address or module address to retain optimized instruction sequence for
kernel text address case.

Also, as bpf programs are always module addresses and a bpf helper can
be within kernel address as well, using relative addressing often fails
with "out of range of pcrel address" error. Use unoptimized instruction
sequence for both kernel and module addresses to work around this, when
PCREL addressing is used.

With module addresses supported, override bpf_jit_supports_kfunc_call()
to enable kfunc support. Since module address offsets can be more than
32-bit long on PPC64, override bpf_jit_supports_far_kfunc_call() to
enable 64-bit pointers.

Signed-off-by: Hari Bathini 
---

* Changes in v3:
  - Retained optimized instruction sequence when function address is
a core kernel address as suggested by Naveen.
  - Used unoptimized instruction sequence for PCREL addressing to
avoid out of range errors for core kernel function addresses.
  - Folded patch that adds support for kfunc calls with patch that
enables/advertises this support as suggested by Naveen.


 arch/powerpc/net/bpf_jit_comp.c   | 10 +++
 arch/powerpc/net/bpf_jit_comp64.c | 48 ---
 2 files changed, 42 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 0f9a21783329..dc7ffafd7441 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -359,3 +359,13 @@ void bpf_jit_free(struct bpf_prog *fp)
 
bpf_prog_unlock_free(fp);
 }
+
+bool bpf_jit_supports_kfunc_call(void)
+{
+   return true;
+}
+
+bool bpf_jit_supports_far_kfunc_call(void)
+{
+   return IS_ENABLED(CONFIG_PPC64) ? true : false;
+}
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 7f62ac4b4e65..ec3adf715c55 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -207,24 +207,14 @@ static int bpf_jit_emit_func_call_hlp(u32 *image, struct codegen_context *ctx, u
unsigned long func_addr = func ? ppc_function_entry((void *)func) : 0;
long reladdr;
 
-   if (WARN_ON_ONCE(!core_kernel_text(func_addr)))
+   /*
+* With the introduction of kfunc feature, BPF helpers can be part of kernel as
+* well as module text address.
+*/
+   if (WARN_ON_ONCE(!kernel_text_address(func_addr)))
return -EINVAL;
 
-   if (IS_ENABLED(CONFIG_PPC_KERNEL_PCREL)) {
-   reladdr = func_addr - CTX_NIA(ctx);
-
-   if (reladdr >= (long)SZ_8G || reladdr < -(long)SZ_8G) {
-   pr_err("eBPF: address of %ps out of range of pcrel address.\n",
-   (void *)func);
-   return -ERANGE;
-   }
-   /* pla r12,addr */
-   EMIT(PPC_PREFIX_MLS | __PPC_PRFX_R(1) | IMM_H18(reladdr));
-   EMIT(PPC_INST_PADDI | ___PPC_RT(_R12) | IMM_L(reladdr));
-   EMIT(PPC_RAW_MTCTR(_R12));
-   EMIT(PPC_RAW_BCTR());
-
-   } else {
+   if (core_kernel_text(func_addr) && !IS_ENABLED(CONFIG_PPC_KERNEL_PCREL)) {
reladdr = func_addr - kernel_toc_addr();
if (reladdr > 0x7FFFFFFF || reladdr < -(0x80000000L)) {
pr_err("eBPF: address of %ps out of range of kernel_toc.\n", (void *)func);
@@ -235,6 +225,32 @@ static int bpf_jit_emit_func_call_hlp(u32 *image, struct codegen_context *ctx, u
EMIT(PPC_RAW_ADDI(_R12, _R12, PPC_LO(reladdr)));
EMIT(PPC_RAW_MTCTR(_R12));
EMIT(PPC_RAW_BCTRL());
+   } else {
+   if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V1)) {
+   /* func points to the function descriptor */
+   PPC_LI64(bpf_to_ppc(TMP_REG_2), func);
+   /* Load actual entry point from function descriptor */
+   EMIT(PPC_RAW_LD(bpf_to_ppc(TMP_REG_1), bpf_to_ppc(TMP_REG_2), 0));
+   /* ... and move it to CTR */
+   EMIT(PPC_RAW_MTCTR(bpf_to_ppc(TMP_REG_1)));
+   /*
+* Load TOC from function descriptor at offset 8.
+* We can clobber r2 since we get called through a
+* function pointer (so caller will save/restore r2)
+* and since we don't use a TOC ourself.
+*/
+   EMIT(PPC_RAW_LD(2, bpf_to_ppc(TMP_REG_2), 8));
+   EMIT(PPC_RAW_BCTRL());
+   

Re: [PATCH v8 1/3] powerpc: make fadump resilient with memory add/remove events

2024-03-11 Thread Hari Bathini




On 17/02/24 12:50 pm, Sourabh Jain wrote:

Due to changes in memory resources caused by either memory hotplug or
online/offline events, the elfcorehdr, which describes the CPUs and
memory of the crashed kernel to the kernel that collects the dump (known
as second/fadump kernel), becomes outdated. Consequently, attempting
dump collection with an outdated elfcorehdr can lead to failed or
inaccurate dump collection.

Memory hotplug or online/offline events are referred to as memory add/remove
events in the rest of the commit message.

The current solution to address the aforementioned issue is as follows:
Monitor memory add/remove events in userspace using udev rules, and
re-register fadump whenever there are changes in memory resources. This
leads to the creation of a new elfcorehdr with updated system memory
information.

There are several notable issues associated with re-registering fadump
for every memory add/remove events.

1. Bulk memory add/remove events with udev-based fadump re-registration
can lead to race conditions and, more importantly, it creates a wide
window during which fadump is inactive until all memory add/remove
events are settled.
2. Re-registering fadump for every memory add/remove event is
inefficient.
3. The memory for elfcorehdr is allocated based on the memblock regions
available during early boot and remains fixed thereafter. However, if
elfcorehdr is later recreated with additional memblock regions, its
size will increase, potentially leading to memory corruption.

Address the aforementioned challenges by shifting the creation of
elfcorehdr from the first kernel (also referred as the crashed kernel),
where it was created and frequently recreated for every memory
add/remove event, to the fadump kernel. As a result, the elfcorehdr only
needs to be created once, thus eliminating the necessity to re-register
fadump during memory add/remove events.

At present, the first kernel is responsible for preparing the fadump
header and storing it in the fadump reserved area. The fadump header
includes the start address of the elfcorehdr, crashing CPU details, and
other relevant information. In the event of a crash in the first kernel,
the second/fadump kernel boots and accesses the fadump header prepared by the
first kernel. It then performs the following steps in a
platform-specific function [rtas|opal]_fadump_process:

1. Sanity check for fadump header
2. Update CPU notes in elfcorehdr

Along with the above, update the setup_fadump()/fadump.c to create
elfcorehdr and set its address to the global variable elfcorehdr_addr
for the vmcore module to process it in the second/fadump kernel.

Section below outlines the information required to create the elfcorehdr
and the changes made to make it available to the fadump kernel if it's
not already.

To create elfcorehdr, the following crashed kernel information is
required: CPU notes, vmcoreinfo, and memory ranges.

At present, the CPU notes are already prepared in the fadump kernel, so
no changes are needed in that regard. The fadump kernel has access to
all crashed kernel memory regions, including boot memory regions that
are relocated by firmware to fadump reserved areas, so no changes for
that either. However, it is necessary to add new members to the fadump
header, i.e., the 'fadump_crash_info_header' structure, in order to pass
the crashed kernel's vmcoreinfo address and its size to fadump kernel.

In addition to the vmcoreinfo address and size, there are a few other
attributes also added to the fadump_crash_info_header structure.

1. version:
It stores the fadump header version, which is currently set to 1.
This provides flexibility to update the fadump crash info header in
the future without changing the magic number. For each change in the
fadump header, the version will be increased. This will help the
updated kernel determine how to handle kernel dumps from older
kernels. The magic number remains relevant for checking fadump header
corruption.

2. pt_regs_sz/cpu_mask_sz:
Store size of pt_regs and cpu_mask structure of first kernel. These
attributes are used to prevent dump processing if the sizes of
pt_regs or cpu_mask structure differ between the first and fadump
kernels.

Note: if either the first/crashed kernel or the second/fadump kernel does not
have the changes introduced here, then the kernel fails to collect the dump and
prints a relevant error message on the console.
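
For readability, here is a minimal sketch of the extended header described
above; the field names, types and ordering are assumptions for illustration
only, the diff below is authoritative:

	struct fadump_crash_info_header {
		u64		magic_number;
		u32		version;	  /* set to 1 by this patch */
		u32		crashing_cpu;
		u64		vmcoreinfo_raddr; /* crashed kernel's vmcoreinfo */
		u64		vmcoreinfo_size;
		u32		pt_regs_sz;	  /* reject dump processing if these */
		u32		cpu_mask_sz;	  /* differ between the two kernels  */
		struct pt_regs	regs;
		struct cpumask	cpu_mask;
	};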

Signed-off-by: Sourabh Jain 
Cc: Aditya Gupta 
Cc: Aneesh Kumar K.V 
Cc: Hari Bathini 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Naveen N Rao 
---
  arch/powerpc/include/asm/fadump-internal.h   |  31 +-
  arch/powerpc/kernel/fadump.c | 339 +++
  arch/powerpc/platforms/powernv/opal-fadump.c |  22 +-
  arch/powerpc/platforms/pseries/rtas-fadump.c |  30 +-
  4 files changed, 232 insertions(+), 190 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump-internal.h 
b/arch/powerpc

Re: [PATCH v17 6/6] powerpc/crash: add crash memory hotplug support

2024-03-02 Thread Hari Bathini




On 26/02/24 2:11 pm, Sourabh Jain wrote:

Extend the arch crash hotplug handler, as introduced by the patch title
("powerpc: add crash CPU hotplug support"), to also support memory
add/remove events.

Elfcorehdr describes the memory of the crashed kernel to the dump capture
kernel; hence, it needs to be updated if memory resources change due to
memory add/remove events. Therefore, arch_crash_handle_hotplug_event()
is updated to recreate the elfcorehdr and replace it with the previous
one on memory add/remove events.

The memblock list is used to prepare the elfcorehdr. In the case of
memory hot remove, the memblock list is updated after the arch crash
hotplug handler is triggered, as depicted in Figure 1. Thus, the
hot-removed memory is explicitly removed from the crash memory ranges
to ensure that the memory ranges added to elfcorehdr do not include the
hot-removed memory.

         Memory remove
              |
              v
         Offline pages
              |
              v
  Initiate memory notify call <-----> crash hotplug handler
  chain for MEM_OFFLINE event
              |
              v
       Update memblock list

                Figure 1

There are two system calls, `kexec_file_load` and `kexec_load`, used to
load the kdump image. A few changes have been made to ensure that the
kernel can safely update the elfcorehdr component of the kdump image for
both system calls.

For the kexec_file_load syscall, kdump image is prepared in the kernel.
To support an increasing number of memory regions, the elfcorehdr is
built with extra buffer space to ensure that it can accommodate
additional memory ranges in future.

For the kexec_load syscall, the elfcorehdr is updated only if the
KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag is passed to the kernel by the
kexec tool. Passing this flag to the kernel indicates that the
elfcorehdr is built to accommodate additional memory ranges and the
elfcorehdr segment is not considered for SHA calculation, making it safe
to update.

The changes related to this feature are kept under the CRASH_HOTPLUG
config, and it is enabled by default.



Overall, the patchset looks good. I tried out the changes too.

Acked-by: Hari Bathini 


Signed-off-by: Sourabh Jain 
Cc: Akhil Raj 
Cc: Andrew Morton 
Cc: Aneesh Kumar K.V 
Cc: Baoquan He 
Cc: Borislav Petkov (AMD) 
Cc: Boris Ostrovsky 
Cc: Christophe Leroy 
Cc: Dave Hansen 
Cc: Dave Young 
Cc: David Hildenbrand 
Cc: Greg Kroah-Hartman 
Cc: Hari Bathini 
Cc: Laurent Dufour 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Mimi Zohar 
Cc: Naveen N Rao 
Cc: Oscar Salvador 
Cc: Thomas Gleixner 
Cc: Valentin Schneider 
Cc: Vivek Goyal 
Cc: ke...@lists.infradead.org
Cc: x...@kernel.org
---
  arch/powerpc/include/asm/kexec.h|  3 +
  arch/powerpc/include/asm/kexec_ranges.h |  1 +
  arch/powerpc/kexec/crash.c  | 95 -
  arch/powerpc/kexec/file_load_64.c   | 20 +-
  arch/powerpc/kexec/ranges.c | 85 ++
  5 files changed, 202 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index e75970351bcd..95a98b390d62 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -141,6 +141,9 @@ void arch_crash_handle_hotplug_event(struct kimage *image, void *arg);
  
  int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags);

  #define arch_crash_hotplug_support arch_crash_hotplug_support
+
+unsigned int arch_crash_get_elfcorehdr_size(void);
+#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
  #endif /* CONFIG_CRASH_HOTPLUG */
  
  extern int crashing_cpu;

diff --git a/arch/powerpc/include/asm/kexec_ranges.h b/arch/powerpc/include/asm/kexec_ranges.h
index 8489e844b447..14055896cbcb 100644
--- a/arch/powerpc/include/asm/kexec_ranges.h
+++ b/arch/powerpc/include/asm/kexec_ranges.h
@@ -7,6 +7,7 @@
  void sort_memory_ranges(struct crash_mem *mrngs, bool merge);
  struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges);
  int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
+int remove_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
  int get_exclude_memory_ranges(struct crash_mem **mem_ranges);
  int get_reserved_memory_ranges(struct crash_mem **mem_ranges);
  int get_crash_memory_ranges(struct crash_mem **mem_ranges);
diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c
index 8938a19af12f..21b193e938a3 100644
--- a/arch/powerpc/kexec/crash.c
+++ b/arch/powerpc/kexec/crash.c
@@ -17,6 +17,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include 

  #include 
@@ -25,6 +26,7 @@
  #include 
  #include 
  #include 
+#include 
  
  /*

   * The primary CPU waits a while for all secondary CPUs to enter. This is to
@@ -398,6 +400,94 @@ void default_machine_crash_shutdown(struct pt_regs *regs)
  #undef pr_fmt
  #define pr_fmt(fmt) "crash hp: " fmt
  
+/*

+ * Advertise preferred elfcorehdr size to use

Re: [PATCH v17 5/6] powerpc/crash: add crash CPU hotplug support

2024-03-02 Thread Hari Bathini




On 26/02/24 2:11 pm, Sourabh Jain wrote:

Due to CPU/Memory hotplug or online/offline events, the elfcorehdr
(which describes the CPUs and memory of the crashed kernel) and FDT
(Flattened Device Tree) of kdump image becomes outdated. Consequently,
attempting dump collection with an outdated elfcorehdr or FDT can lead
to failed or inaccurate dump collection.

Going forward, CPU hotplug or online/offline events are referred to as
CPU/Memory add/remove events.

The current solution to address the above issue involves monitoring the
CPU/Memory add/remove events in userspace using udev rules and whenever
there are changes in CPU and memory resources, the entire kdump image
is loaded again. The kdump image includes kernel, initrd, elfcorehdr,
FDT, purgatory. Given that only elfcorehdr and FDT get outdated due to
CPU/Memory add/remove events, reloading the entire kdump image is
inefficient. More importantly, kdump remains inactive for a substantial
amount of time until the kdump reload completes.

To address the aforementioned issue, commit 247262756121 ("crash: add
generic infrastructure for crash hotplug support") added a generic
infrastructure that allows architectures to selectively update the kdump
image component during CPU or memory add/remove events within the kernel
itself.

In the event of a CPU or memory add/remove events, the generic crash
hotplug event handler, `crash_handle_hotplug_event()`, is triggered. It
then acquires the necessary locks to update the kdump image and invokes
the architecture-specific crash hotplug handler,
`arch_crash_handle_hotplug_event()`, to update the required kdump image
components.

This patch adds crash hotplug handler for PowerPC and enable support to
update the kdump image on CPU add/remove events. Support for memory
add/remove events is added in a subsequent patch with the title
"powerpc: add crash memory hotplug support"

As mentioned earlier, only the elfcorehdr and FDT kdump image components
need to be updated in the event of CPU or memory add/remove events.
However, on PowerPC architecture crash hotplug handler only updates the
FDT to enable crash hotplug support for CPU add/remove events. Here's
why.

The elfcorehdr on PowerPC is built with possible CPUs, and thus, it does
not need an update on CPU add/remove events. On the other hand, the FDT
needs to be updated on CPU add events to include the newly added CPU. If
the FDT is not updated and the kernel crashes on a newly added CPU, the
kdump kernel will fail to boot due to the unavailability of the crashing
CPU in the FDT. During the early boot, it is expected that the boot CPU
must be a part of the FDT; otherwise, the kernel will raise a BUG and
fail to boot. For more information, refer to commit 36ae37e3436b0
("powerpc: Make boot_cpuid common between 32 and 64-bit"). Since it is
okay to have an offline CPU in the kdump FDT, no action is taken in case
of CPU removal.

There are two system calls, `kexec_file_load` and `kexec_load`, used to
load the kdump image. A few changes have been made to ensure the kernel can
safely update the FDT of kdump image loaded using both system calls.

For kexec_file_load syscall the kdump image is prepared in kernel. So to
support an increasing number of CPUs, the FDT is constructed with extra
buffer space to ensure it can accommodate a possible number of CPU
nodes. Additionally, a call to fdt_pack (which trims the unused space
once the FDT is prepared) is avoided if this feature is enabled.

For the kexec_load syscall, the FDT is updated only if the
KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag is passed to the kernel by
userspace (kexec tools). When userspace passes this flag to the kernel,
it indicates that the FDT is built to accommodate possible CPUs, and the
FDT segment is excluded from SHA calculation, making it safe to update.

The changes related to this feature are kept under the CRASH_HOTPLUG
config, and it is enabled by default.



Looks good.

Acked-by: Hari Bathini 


Signed-off-by: Sourabh Jain 
Cc: Akhil Raj 
Cc: Andrew Morton 
Cc: Aneesh Kumar K.V 
Cc: Baoquan He 
Cc: Borislav Petkov (AMD) 
Cc: Boris Ostrovsky 
Cc: Christophe Leroy 
Cc: Dave Hansen 
Cc: Dave Young 
Cc: David Hildenbrand 
Cc: Greg Kroah-Hartman 
Cc: Hari Bathini 
Cc: Laurent Dufour 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Mimi Zohar 
Cc: Naveen N Rao 
Cc: Oscar Salvador 
Cc: Thomas Gleixner 
Cc: Valentin Schneider 
Cc: Vivek Goyal 
Cc: ke...@lists.infradead.org
Cc: x...@kernel.org
---
  arch/powerpc/Kconfig  |   4 ++
  arch/powerpc/include/asm/kexec.h  |   8 +++
  arch/powerpc/kexec/crash.c| 103 ++
  arch/powerpc/kexec/elf_64.c   |   3 +-
  arch/powerpc/kexec/file_load_64.c |  17 +
  5 files changed, 134 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index e377deefa2dc..16d2b20574c4 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -686,6 +686,10 @@ 

Re: [PATCH v17 4/6] PowerPC/kexec: make the update_cpus_node() function public

2024-03-02 Thread Hari Bathini




On 26/02/24 2:11 pm, Sourabh Jain wrote:

Move update_cpus_node() from kexec/{file_load_64.c => core_64.c}
to allow other kexec components to use it.

Later in the series, this function is used for in-kernel updates
to the kdump image during CPU/memory hotplug or online/offline events for
both kexec_load and kexec_file_load syscalls.

No functional changes are intended.



Looks good to me.

Acked-by: Hari Bathini 


Signed-off-by: Sourabh Jain 
Cc: Akhil Raj 
Cc: Andrew Morton 
Cc: Aneesh Kumar K.V 
Cc: Baoquan He 
Cc: Borislav Petkov (AMD) 
Cc: Boris Ostrovsky 
Cc: Christophe Leroy 
Cc: Dave Hansen 
Cc: Dave Young 
Cc: David Hildenbrand 
Cc: Greg Kroah-Hartman 
Cc: Hari Bathini 
Cc: Laurent Dufour 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Mimi Zohar 
Cc: Naveen N Rao 
Cc: Oscar Salvador 
Cc: Thomas Gleixner 
Cc: Valentin Schneider 
Cc: Vivek Goyal 
Cc: ke...@lists.infradead.org
Cc: x...@kernel.org
---
  arch/powerpc/include/asm/kexec.h  |  4 ++
  arch/powerpc/kexec/core_64.c  | 91 +++
  arch/powerpc/kexec/file_load_64.c | 87 -
  3 files changed, 95 insertions(+), 87 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index fdb90e24dc74..d9ff4d0e392d 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -185,6 +185,10 @@ static inline void crash_send_ipi(void 
(*crash_ipi_callback)(struct pt_regs *))
  
  #endif /* CONFIG_CRASH_DUMP */
  
+#if defined(CONFIG_KEXEC_FILE) || defined(CONFIG_CRASH_DUMP)

+int update_cpus_node(void *fdt);
+#endif
+
  #ifdef CONFIG_PPC_BOOK3S_64
  #include 
  #endif
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 762e4d09aacf..85050be08a23 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -17,6 +17,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include 

  #include 
@@ -30,6 +31,7 @@
  #include 
  #include 
  #include 
+#include 
  
  int machine_kexec_prepare(struct kimage *image)

  {
@@ -419,3 +421,92 @@ static int __init export_htab_values(void)
  }
  late_initcall(export_htab_values);
  #endif /* CONFIG_PPC_64S_HASH_MMU */
+
+#if defined(CONFIG_KEXEC_FILE) || defined(CONFIG_CRASH_DUMP)
+/**
+ * add_node_props - Reads node properties from device node structure and add
+ *  them to fdt.
+ * @fdt:Flattened device tree of the kernel
+ * @node_offset:offset of the node to add a property at
+ * @dn: device node pointer
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+static int add_node_props(void *fdt, int node_offset, const struct device_node 
*dn)
+{
+   int ret = 0;
+   struct property *pp;
+
+   if (!dn)
+   return -EINVAL;
+
+   for_each_property_of_node(dn, pp) {
+   ret = fdt_setprop(fdt, node_offset, pp->name, pp->value, 
pp->length);
+   if (ret < 0) {
+   pr_err("Unable to add %s property: %s\n", pp->name, 
fdt_strerror(ret));
+   return ret;
+   }
+   }
+   return ret;
+}
+
+/**
+ * update_cpus_node - Update cpus node of flattened device tree using of_root
+ *device node.
+ * @fdt:  Flattened device tree of the kernel.
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+int update_cpus_node(void *fdt)
+{
+   struct device_node *cpus_node, *dn;
+   int cpus_offset, cpus_subnode_offset, ret = 0;
+
+   cpus_offset = fdt_path_offset(fdt, "/cpus");
+   if (cpus_offset < 0 && cpus_offset != -FDT_ERR_NOTFOUND) {
+   pr_err("Malformed device tree: error reading /cpus node: %s\n",
+  fdt_strerror(cpus_offset));
+   return cpus_offset;
+   }
+
+   if (cpus_offset > 0) {
+   ret = fdt_del_node(fdt, cpus_offset);
+   if (ret < 0) {
+   pr_err("Error deleting /cpus node: %s\n", 
fdt_strerror(ret));
+   return -EINVAL;
+   }
+   }
+
+   /* Add cpus node to fdt */
+   cpus_offset = fdt_add_subnode(fdt, fdt_path_offset(fdt, "/"), "cpus");
+   if (cpus_offset < 0) {
+   pr_err("Error creating /cpus node: %s\n", 
fdt_strerror(cpus_offset));
+   return -EINVAL;
+   }
+
+   /* Add cpus node properties */
+   cpus_node = of_find_node_by_path("/cpus");
+   ret = add_node_props(fdt, cpus_offset, cpus_node);
+   of_node_put(cpus_node);
+   if (ret < 0)
+   return ret;
+
+   /* Loop through all subnodes of cpus and add them to fdt */
+   for_each_node_by_type(dn, "cpu") {
+   cpus_subnode_offset = fdt_add_subnode(fdt, cpus_offset, 
dn->full_name);
+   if (cpus_subnode_off

Re: [PATCH v17 3/6] powerpc/kexec: move *_memory_ranges functions to ranges.c

2024-03-02 Thread Hari Bathini




On 26/02/24 2:11 pm, Sourabh Jain wrote:

Move the following functions from kexec/{file_load_64.c => ranges.c} and
make them public so that components other than KEXEC_FILE can also use
these functions.
1. get_exclude_memory_ranges
2. get_reserved_memory_ranges
3. get_crash_memory_ranges
4. get_usable_memory_ranges

Later in the series, the get_crash_memory_ranges() function is used for
in-kernel updates to the kdump image during CPU/memory hotplug or
online/offline events for both the kexec_load and kexec_file_load syscalls.

Since the above functions are moved to ranges.c, some of the helper
functions in ranges.c are no longer required to be public. Mark them as
static and remove them from the kexec_ranges.h header file.

Finally, remove the CONFIG_KEXEC_FILE build dependency for ranges.c
because it is also required for other configs, such as CONFIG_CRASH_DUMP.

No functional changes are intended.



Acked-by: Hari Bathini 


Signed-off-by: Sourabh Jain 
Cc: Akhil Raj 
Cc: Andrew Morton 
Cc: Aneesh Kumar K.V 
Cc: Baoquan He 
Cc: Borislav Petkov (AMD) 
Cc: Boris Ostrovsky 
Cc: Christophe Leroy 
Cc: Dave Hansen 
Cc: Dave Young 
Cc: David Hildenbrand 
Cc: Greg Kroah-Hartman 
Cc: Hari Bathini 
Cc: Laurent Dufour 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Mimi Zohar 
Cc: Naveen N Rao 
Cc: Oscar Salvador 
Cc: Thomas Gleixner 
Cc: Valentin Schneider 
Cc: Vivek Goyal 
Cc: ke...@lists.infradead.org
Cc: x...@kernel.org
---
  arch/powerpc/include/asm/kexec_ranges.h |  19 +-
  arch/powerpc/kexec/Makefile |   4 +-
  arch/powerpc/kexec/file_load_64.c   | 190 
  arch/powerpc/kexec/ranges.c | 227 +++-
  4 files changed, 224 insertions(+), 216 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec_ranges.h 
b/arch/powerpc/include/asm/kexec_ranges.h
index f83866a19e87..8489e844b447 100644
--- a/arch/powerpc/include/asm/kexec_ranges.h
+++ b/arch/powerpc/include/asm/kexec_ranges.h
@@ -7,19 +7,8 @@
  void sort_memory_ranges(struct crash_mem *mrngs, bool merge);
  struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges);
  int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
-int add_tce_mem_ranges(struct crash_mem **mem_ranges);
-int add_initrd_mem_range(struct crash_mem **mem_ranges);
-#ifdef CONFIG_PPC_64S_HASH_MMU
-int add_htab_mem_range(struct crash_mem **mem_ranges);
-#else
-static inline int add_htab_mem_range(struct crash_mem **mem_ranges)
-{
-   return 0;
-}
-#endif
-int add_kernel_mem_range(struct crash_mem **mem_ranges);
-int add_rtas_mem_range(struct crash_mem **mem_ranges);
-int add_opal_mem_range(struct crash_mem **mem_ranges);
-int add_reserved_mem_ranges(struct crash_mem **mem_ranges);
-
+int get_exclude_memory_ranges(struct crash_mem **mem_ranges);
+int get_reserved_memory_ranges(struct crash_mem **mem_ranges);
+int get_crash_memory_ranges(struct crash_mem **mem_ranges);
+int get_usable_memory_ranges(struct crash_mem **mem_ranges);
  #endif /* _ASM_POWERPC_KEXEC_RANGES_H */
diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile
index 8e469c4da3f8..470eb0453e17 100644
--- a/arch/powerpc/kexec/Makefile
+++ b/arch/powerpc/kexec/Makefile
@@ -3,11 +3,11 @@
  # Makefile for the linux kernel.
  #
  
-obj-y+= core.o core_$(BITS).o

+obj-y  += core.o core_$(BITS).o ranges.o
  
  obj-$(CONFIG_PPC32)		+= relocate_32.o
  
-obj-$(CONFIG_KEXEC_FILE)	+= file_load.o ranges.o file_load_$(BITS).o elf_$(BITS).o

+obj-$(CONFIG_KEXEC_FILE)   += file_load.o file_load_$(BITS).o elf_$(BITS).o
  obj-$(CONFIG_VMCORE_INFO) += vmcore_info.o
  obj-$(CONFIG_CRASH_DUMP)  += crash.o
  
diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c

index 1bc65de6174f..6a01f62b8fcf 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -47,83 +47,6 @@ const struct kexec_file_ops * const kexec_file_loaders[] = {
NULL
  };
  
-/**

- * get_exclude_memory_ranges - Get exclude memory ranges. This list includes
- * regions like opal/rtas, tce-table, initrd,
- * kernel, htab which should be avoided while
- * setting up kexec load segments.
- * @mem_ranges:Range list to add the memory ranges to.
- *
- * Returns 0 on success, negative errno on error.
- */
-static int get_exclude_memory_ranges(struct crash_mem **mem_ranges)
-{
-   int ret;
-
-   ret = add_tce_mem_ranges(mem_ranges);
-   if (ret)
-   goto out;
-
-   ret = add_initrd_mem_range(mem_ranges);
-   if (ret)
-   goto out;
-
-   ret = add_htab_mem_range(mem_ranges);
-   if (ret)
-   goto out;
-
-   ret = add_kernel_mem_range(mem_ranges);
-   if (ret)
-   goto out;
-
-   ret = add_rtas_mem_range(mem_ranges);
-   if (ret)
-   goto out;
-
-   ret = add_opal_mem_range(mem_ran

Re: [PATCH v17 1/6] crash: forward memory_notify arg to arch crash hotplug handler

2024-03-02 Thread Hari Bathini




On 26/02/24 2:11 pm, Sourabh Jain wrote:

On memory hotplug or online/offline events, the crash memory hotplug
notifier `crash_memhp_notifier()` receives a `memory_notify` object but
doesn't forward that object to the generic and architecture-specific
crash hotplug handlers.

The `memory_notify` object contains the starting PFN (Page Frame Number)
and the number of pages in the hot-removed memory. This information is
necessary for architectures like PowerPC to update/recreate the kdump
image, specifically `elfcorehdr`.

So update the function signatures of `crash_handle_hotplug_event()` and
`arch_crash_handle_hotplug_event()` to accept the `memory_notify` object
as an argument from the crash memory hotplug notifier.

Since no such object is available in the case of CPU hotplug event, the
crash CPU hotplug notifier `crash_cpuhp_online()` passes NULL to the
crash hotplug handler.
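
As an illustration only (not code from this patch), an architecture
handler can tell the two cases apart by checking the forwarded argument;
start_pfn and nr_pages are the struct memory_notify fields mentioned
above:

#include <linux/kexec.h>
#include <linux/memory.h>
#include <linux/printk.h>

/* Illustration: a NULL arg means the event was a CPU hotplug. */
void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
{
	struct memory_notify *mn = arg;

	if (!mn) {
		/* CPU add/remove: no PFN range to recompute. */
		return;
	}

	pr_debug("memory hotplug: start_pfn=0x%lx nr_pages=%lu\n",
		 mn->start_pfn, mn->nr_pages);
	/* ...recreate elfcorehdr (or the FDT) for the new memory layout... */
}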



Acked-by: Hari Bathini 


Signed-off-by: Sourabh Jain 
Acked-by: Baoquan He 
Cc: Akhil Raj 
Cc: Andrew Morton 
Cc: Aneesh Kumar K.V 
Cc: Borislav Petkov (AMD) 
Cc: Boris Ostrovsky 
Cc: Christophe Leroy 
Cc: Dave Hansen 
Cc: Dave Young 
Cc: David Hildenbrand 
Cc: Greg Kroah-Hartman 
Cc: Hari Bathini 
Cc: Laurent Dufour 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Mimi Zohar 
Cc: Naveen N Rao 
Cc: Oscar Salvador 
Cc: Thomas Gleixner 
Cc: Valentin Schneider 
Cc: Vivek Goyal 
Cc: ke...@lists.infradead.org
Cc: x...@kernel.org
---
  arch/x86/include/asm/kexec.h |  2 +-
  arch/x86/kernel/crash.c  |  4 +++-
  include/linux/crash_core.h   |  2 +-
  kernel/crash_core.c  | 14 +++---
  4 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 91ca9a9ee3a2..cb1320ebbc23 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -207,7 +207,7 @@ int arch_kimage_file_post_load_cleanup(struct kimage 
*image);
  extern void kdump_nmi_shootdown_cpus(void);
  
  #ifdef CONFIG_CRASH_HOTPLUG

-void arch_crash_handle_hotplug_event(struct kimage *image);
+void arch_crash_handle_hotplug_event(struct kimage *image, void *arg);
  #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
  
  #ifdef CONFIG_HOTPLUG_CPU

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index e74d0c4286c1..2a682fe86352 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -432,10 +432,12 @@ unsigned int arch_crash_get_elfcorehdr_size(void)
  /**
   * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
   * @image: a pointer to kexec_crash_image
+ * @arg: struct memory_notify handler for memory hotplug case and
+ *   NULL for CPU hotplug case.
   *
   * Prepare the new elfcorehdr and replace the existing elfcorehdr.
   */
-void arch_crash_handle_hotplug_event(struct kimage *image)
+void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
  {
void *elfbuf = NULL, *old_elfcorehdr;
unsigned long nr_mem_ranges;
diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index d33352c2e386..647e928efee8 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -37,7 +37,7 @@ static inline void arch_kexec_unprotect_crashkres(void) { }
  
  
  #ifndef arch_crash_handle_hotplug_event

-static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
+static inline void arch_crash_handle_hotplug_event(struct kimage *image, void 
*arg) { }
  #endif
  
  int crash_check_update_elfcorehdr(void);

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 78b5dc7cee3a..70fa8111a9d6 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -534,7 +534,7 @@ int crash_check_update_elfcorehdr(void)
   * list of segments it checks (since the elfcorehdr changes and thus
   * would require an update to purgatory itself to update the digest).
   */
-static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int 
cpu)
+static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int 
cpu, void *arg)
  {
struct kimage *image;
  
@@ -596,7 +596,7 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)

image->hp_action = hp_action;
  
  	/* Now invoke arch-specific update handler */

-   arch_crash_handle_hotplug_event(image);
+   arch_crash_handle_hotplug_event(image, arg);
  
  	/* No longer handling a hotplug event */

image->hp_action = KEXEC_CRASH_HP_NONE;
@@ -612,17 +612,17 @@ static void crash_handle_hotplug_event(unsigned int 
hp_action, unsigned int cpu)
crash_hotplug_unlock();
  }
  
-static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v)

+static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, 
void *arg)
  {
switch (val) {
case MEM_ONLINE:
crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_MEMORY,
-   KEXEC_CRASH_HP_INVAL

Re: [PATCH v17 2/6] crash: add a new kexec flag for hotplug support

2024-03-02 Thread Hari Bathini




On 26/02/24 2:11 pm, Sourabh Jain wrote:

Commit a72bbec70da2 ("crash: hotplug support for kexec_load()")
introduced a new kexec flag, `KEXEC_UPDATE_ELFCOREHDR`. The kexec tool uses
this flag to indicate to the kernel that it is safe to modify the
elfcorehdr of the kdump image loaded using the kexec_load system call.

However, it is possible that architectures may need to update kexec
segments other than elfcorehdr, for example, the FDT (Flattened Device
Tree) on PowerPC. Introducing a new kexec flag for every new kexec segment
may not be a good solution. Hence, a generic kexec flag bit,
`KEXEC_CRASH_HOTPLUG_SUPPORT`, is introduced to share the CPU/Memory
hotplug support intent between the kexec tool and the kernel for the
kexec_load system call.

Now, if the kexec tool sends the KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag
to the kernel, it indicates that all the required kexec segments are
skipped from the SHA calculation and it is safe to update the kdump
image loaded using the kexec_load syscall.

While loading the kdump image using the kexec_load syscall, the
@update_elfcorehdr member of struct kimage is set if the kexec tool
sends the KEXEC_UPDATE_ELFCOREHDR kexec flag. This member is later used
to determine whether it is safe to update elfcorehdr on hotplug events.
However, with the introduction of the KEXEC_CRASH_HOTPLUG_SUPPORT kexec
flag, the kexec tool could mark all the required kexec segments on an
architecture as safe to update. So rename @update_elfcorehdr to
@hotplug_support. If @hotplug_support is set, the kernel can safely
update all the required kexec segments of the kdump image during
CPU/Memory hotplug events.

Introduce an architecture-specific function to process kexec flags for
determining hotplug support. Set the @hotplug_support member of struct
kimage for both kexec_load and kexec_file_load system calls. This
simplifies kernel checks to identify hotplug support for the currently
loaded kdump image by just examining the value of @hotplug_support.
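
To make the userspace side concrete, here is a minimal, hypothetical
sketch of how kexec-tools might pass the new flag via kexec_load; the
segment construction is elided and the helper name is made up:

#include <linux/kexec.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: load a kdump image and advertise hotplug support. */
static long load_kdump_image(unsigned long entry, unsigned long nr_segments,
			     struct kexec_segment *segments)
{
	unsigned long flags = KEXEC_ON_CRASH | KEXEC_CRASH_HOTPLUG_SUPPORT;

	return syscall(SYS_kexec_load, entry, nr_segments, segments, flags);
}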



Couple of minor nits. See comments below.
Otherwise, looks good to me.

Acked-by: Hari Bathini 


Signed-off-by: Sourabh Jain 
Cc: Akhil Raj 
Cc: Andrew Morton 
Cc: Aneesh Kumar K.V 
Cc: Baoquan He 
Cc: Borislav Petkov (AMD) 
Cc: Boris Ostrovsky 
Cc: Christophe Leroy 
Cc: Dave Hansen 
Cc: Dave Young 
Cc: David Hildenbrand 
Cc: Eric DeVolder 
Cc: Greg Kroah-Hartman 
Cc: Hari Bathini 
Cc: Laurent Dufour 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Mimi Zohar 
Cc: Naveen N Rao 
Cc: Oscar Salvador 
Cc: Thomas Gleixner 
Cc: Valentin Schneider 
Cc: Vivek Goyal 
Cc: ke...@lists.infradead.org
Cc: x...@kernel.org
---
  arch/x86/include/asm/kexec.h | 11 ++-
  arch/x86/kernel/crash.c  | 28 +---
  drivers/base/cpu.c   |  2 +-
  drivers/base/memory.c|  2 +-
  include/linux/crash_core.h   | 13 ++---
  include/linux/kexec.h| 11 +++
  include/uapi/linux/kexec.h   |  1 +
  kernel/crash_core.c  | 11 ---
  kernel/kexec.c   |  4 ++--
  kernel/kexec_file.c  |  5 +
  10 files changed, 46 insertions(+), 42 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index cb1320ebbc23..ae5482a2f0ca 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -210,15 +210,8 @@ extern void kdump_nmi_shootdown_cpus(void);
  void arch_crash_handle_hotplug_event(struct kimage *image, void *arg);
  #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
  
-#ifdef CONFIG_HOTPLUG_CPU

-int arch_crash_hotplug_cpu_support(void);
-#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
-#endif
-
-#ifdef CONFIG_MEMORY_HOTPLUG
-int arch_crash_hotplug_memory_support(void);
-#define crash_hotplug_memory_support arch_crash_hotplug_memory_support
-#endif
+int arch_crash_hotplug_support(struct kimage *image, unsigned long 
kexec_flags);
+#define arch_crash_hotplug_support arch_crash_hotplug_support
  
  unsigned int arch_crash_get_elfcorehdr_size(void);

  #define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 2a682fe86352..f06501445cd9 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -402,20 +402,26 @@ int crash_load_segments(struct kimage *image)
  #undef pr_fmt
  #define pr_fmt(fmt) "crash hp: " fmt





-/* These functions provide the value for the sysfs crash_hotplug nodes */
-#ifdef CONFIG_HOTPLUG_CPU
-int arch_crash_hotplug_cpu_support(void)
+int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags)
  {
-   return crash_check_update_elfcorehdr();
-}
-#endif
  
-#ifdef CONFIG_MEMORY_HOTPLUG

-int arch_crash_hotplug_memory_support(void)
-{
-   return crash_check_update_elfcorehdr();
-}
+#ifdef CONFIG_KEXEC_FILE
+   if (image->file_mode)
+   return 1;
  #endif
+   /*
+* Initially, crash hotplug support f

[PATCH linux-next v2 2/3] powerpc/kexec: split CONFIG_KEXEC_FILE and CONFIG_CRASH_DUMP

2024-02-26 Thread Hari Bathini
CONFIG_KEXEC_FILE does not have to select CONFIG_CRASH_DUMP. Move
some code under CONFIG_CRASH_DUMP to support the CONFIG_KEXEC_FILE and
!CONFIG_CRASH_DUMP case.

Signed-off-by: Hari Bathini 
---

* No changes in v2.

 arch/powerpc/kexec/elf_64.c   |   4 +-
 arch/powerpc/kexec/file_load_64.c | 269 --
 2 files changed, 142 insertions(+), 131 deletions(-)

diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
index 904016cf89ea..6d8951e8e966 100644
--- a/arch/powerpc/kexec/elf_64.c
+++ b/arch/powerpc/kexec/elf_64.c
@@ -47,7 +47,7 @@ static void *elf64_load(struct kimage *image, char 
*kernel_buf,
if (ret)
return ERR_PTR(ret);
 
-   if (image->type == KEXEC_TYPE_CRASH) {
+   if (IS_ENABLED(CONFIG_CRASH_DUMP) && image->type == KEXEC_TYPE_CRASH) {
/* min & max buffer values for kdump case */
kbuf.buf_min = pbuf.buf_min = crashk_res.start;
kbuf.buf_max = pbuf.buf_max =
@@ -70,7 +70,7 @@ static void *elf64_load(struct kimage *image, char 
*kernel_buf,
kexec_dprintk("Loaded purgatory at 0x%lx\n", pbuf.mem);
 
/* Load additional segments needed for panic kernel */
-   if (image->type == KEXEC_TYPE_CRASH) {
+   if (IS_ENABLED(CONFIG_CRASH_DUMP) && image->type == KEXEC_TYPE_CRASH) {
ret = load_crashdump_segments_ppc64(image, );
if (ret) {
pr_err("Failed to load kdump kernel segments\n");
diff --git a/arch/powerpc/kexec/file_load_64.c 
b/arch/powerpc/kexec/file_load_64.c
index 5b4c5cb23354..1bc65de6174f 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -96,119 +96,6 @@ static int get_exclude_memory_ranges(struct crash_mem 
**mem_ranges)
return ret;
 }
 
-/**
- * get_usable_memory_ranges - Get usable memory ranges. This list includes
- *regions like crashkernel, opal/rtas & tce-table,
- *that kdump kernel could use.
- * @mem_ranges:   Range list to add the memory ranges to.
- *
- * Returns 0 on success, negative errno on error.
- */
-static int get_usable_memory_ranges(struct crash_mem **mem_ranges)
-{
-   int ret;
-
-   /*
-* Early boot failure observed on guests when low memory (first memory
-* block?) is not added to usable memory. So, add [0, crashk_res.end]
-* instead of [crashk_res.start, crashk_res.end] to workaround it.
-* Also, crashed kernel's memory must be added to reserve map to
-* avoid kdump kernel from using it.
-*/
-   ret = add_mem_range(mem_ranges, 0, crashk_res.end + 1);
-   if (ret)
-   goto out;
-
-   ret = add_rtas_mem_range(mem_ranges);
-   if (ret)
-   goto out;
-
-   ret = add_opal_mem_range(mem_ranges);
-   if (ret)
-   goto out;
-
-   ret = add_tce_mem_ranges(mem_ranges);
-out:
-   if (ret)
-   pr_err("Failed to setup usable memory ranges\n");
-   return ret;
-}
-
-/**
- * get_crash_memory_ranges - Get crash memory ranges. This list includes
- *   first/crashing kernel's memory regions that
- *   would be exported via an elfcore.
- * @mem_ranges:  Range list to add the memory ranges to.
- *
- * Returns 0 on success, negative errno on error.
- */
-static int get_crash_memory_ranges(struct crash_mem **mem_ranges)
-{
-   phys_addr_t base, end;
-   struct crash_mem *tmem;
-   u64 i;
-   int ret;
-
-   for_each_mem_range(i, , ) {
-   u64 size = end - base;
-
-   /* Skip backup memory region, which needs a separate entry */
-   if (base == BACKUP_SRC_START) {
-   if (size > BACKUP_SRC_SIZE) {
-   base = BACKUP_SRC_END + 1;
-   size -= BACKUP_SRC_SIZE;
-   } else
-   continue;
-   }
-
-   ret = add_mem_range(mem_ranges, base, size);
-   if (ret)
-   goto out;
-
-   /* Try merging adjacent ranges before reallocation attempt */
-   if ((*mem_ranges)->nr_ranges == (*mem_ranges)->max_nr_ranges)
-   sort_memory_ranges(*mem_ranges, true);
-   }
-
-   /* Reallocate memory ranges if there is no space to split ranges */
-   tmem = *mem_ranges;
-   if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) {
-   tmem = realloc_mem_ranges(mem_ranges);
-   if (!tmem)
-   goto out;
-   }
-
-   /* Exclude crashkernel region */
-   ret = crash_exclude_mem_range(tmem, crashk_res.start, crashk_res.end);
-   if (ret)
-   goto out;
-
-   /*
-* F

[PATCH linux-next v2 3/3] powerpc/kdump: Split KEXEC_CORE and CRASH_DUMP dependency

2024-02-26 Thread Hari Bathini
Remove CONFIG_CRASH_DUMP dependency on CONFIG_KEXEC. CONFIG_KEXEC_CORE
was used at places where CONFIG_CRASH_DUMP or CONFIG_CRASH_RESERVE was
appropriate. Replace with appropriate #ifdefs to support CONFIG_KEXEC
and !CONFIG_CRASH_DUMP configuration option. Also, make CONFIG_FA_DUMP
dependent on CONFIG_CRASH_DUMP to avoid unmet dependencies for FA_DUMP
with !CONFIG_KEXEC_CORE configuration option.

Signed-off-by: Hari Bathini 
---

Changes in v2:
* Fixed a compile error for POWERNV build reported by Sourabh.

 arch/powerpc/Kconfig |  9 +--
 arch/powerpc/include/asm/kexec.h | 98 ++--
 arch/powerpc/kernel/prom.c   |  2 +-
 arch/powerpc/kernel/setup-common.c   |  2 +-
 arch/powerpc/kernel/smp.c|  4 +-
 arch/powerpc/kexec/Makefile  |  3 +-
 arch/powerpc/kexec/core.c|  4 ++
 arch/powerpc/platforms/powernv/smp.c |  2 +-
 8 files changed, 61 insertions(+), 63 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 5cf8ad8d7e8e..e377deefa2dc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -607,11 +607,6 @@ config PPC64_SUPPORTS_MEMORY_FAILURE
 config ARCH_SUPPORTS_KEXEC
def_bool PPC_BOOK3S || PPC_E500 || (44x && !SMP)
 
-config ARCH_SELECTS_KEXEC
-   def_bool y
-   depends on KEXEC
-   select CRASH_DUMP
-
 config ARCH_SUPPORTS_KEXEC_FILE
def_bool PPC64
 
@@ -622,7 +617,6 @@ config ARCH_SELECTS_KEXEC_FILE
def_bool y
depends on KEXEC_FILE
select KEXEC_ELF
-   select CRASH_DUMP
select HAVE_IMA_KEXEC if IMA
 
 config PPC64_BIG_ENDIAN_ELF_ABI_V2
@@ -694,8 +688,7 @@ config ARCH_SELECTS_CRASH_DUMP
 
 config FA_DUMP
bool "Firmware-assisted dump"
-   depends on PPC64 && (PPC_RTAS || PPC_POWERNV)
-   select CRASH_DUMP
+   depends on CRASH_DUMP && PPC64 && (PPC_RTAS || PPC_POWERNV)
help
  A robust mechanism to get reliable kernel crash dump with
  assistance from firmware. This approach does not use kexec,
diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index e1b43aa12175..fdb90e24dc74 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -55,59 +55,18 @@
 typedef void (*crash_shutdown_t)(void);
 
 #ifdef CONFIG_KEXEC_CORE
-
-/*
- * This function is responsible for capturing register states if coming
- * via panic or invoking dump using sysrq-trigger.
- */
-static inline void crash_setup_regs(struct pt_regs *newregs,
-   struct pt_regs *oldregs)
-{
-   if (oldregs)
-   memcpy(newregs, oldregs, sizeof(*newregs));
-   else
-   ppc_save_regs(newregs);
-}
+struct kimage;
+struct pt_regs;
 
 extern void kexec_smp_wait(void);  /* get and clear naca physid, wait for
  master to copy new code to 0 */
-extern int crashing_cpu;
-extern void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *));
-extern void crash_ipi_callback(struct pt_regs *);
-extern int crash_wake_offline;
-
-struct kimage;
-struct pt_regs;
 extern void default_machine_kexec(struct kimage *image);
-extern void default_machine_crash_shutdown(struct pt_regs *regs);
-extern int crash_shutdown_register(crash_shutdown_t handler);
-extern int crash_shutdown_unregister(crash_shutdown_t handler);
-
-extern void crash_kexec_prepare(void);
-extern void crash_kexec_secondary(struct pt_regs *regs);
-int __init overlaps_crashkernel(unsigned long start, unsigned long size);
-extern void reserve_crashkernel(void);
 extern void machine_kexec_mask_interrupts(void);
 
-static inline bool kdump_in_progress(void)
-{
-   return crashing_cpu >= 0;
-}
-
 void relocate_new_kernel(unsigned long indirection_page, unsigned long 
reboot_code_buffer,
 unsigned long start_address) __noreturn;
-
 void kexec_copy_flush(struct kimage *image);
 
-#if defined(CONFIG_CRASH_DUMP)
-bool is_kdump_kernel(void);
-#define is_kdump_kernelis_kdump_kernel
-#if defined(CONFIG_PPC_RTAS)
-void crash_free_reserved_phys_range(unsigned long begin, unsigned long end);
-#define crash_free_reserved_phys_range crash_free_reserved_phys_range
-#endif /* CONFIG_PPC_RTAS */
-#endif /* CONFIG_CRASH_DUMP */
-
 #ifdef CONFIG_KEXEC_FILE
 extern const struct kexec_file_ops kexec_elf64_ops;
 
@@ -152,15 +111,56 @@ int setup_new_fdt_ppc64(const struct kimage *image, void 
*fdt,
 
 #endif /* CONFIG_KEXEC_FILE */
 
-#else /* !CONFIG_KEXEC_CORE */
-static inline void crash_kexec_secondary(struct pt_regs *regs) { }
+#endif /* CONFIG_KEXEC_CORE */
+
+#ifdef CONFIG_CRASH_RESERVE
+int __init overlaps_crashkernel(unsigned long start, unsigned long size);
+extern void reserve_crashkernel(void);
+#else
+static inline void reserve_crashkernel(void) {}
+static inline int overlaps_crashkernel(unsigned long start, unsigned long 
siz

[PATCH linux-next v2 0/3] powerpc/kexec: split CONFIG_CRASH_DUMP out from CONFIG_KEXEC_CORE

2024-02-26 Thread Hari Bathini
This patch series is a follow-up to [1] based on discussions at [2]
about additional work needed to get it working on powerpc.

The first patch in the series makes struct crash_mem available with or
without CONFIG_CRASH_DUMP enabled. The next patch moves kdump specific
code for kexec_file_load syscall under CONFIG_CRASH_DUMP and the last
patch splits other kdump specific code under CONFIG_CRASH_DUMP and
removes dependency with CONFIG_CRASH_DUMP for CONFIG_KEXEC_CORE.

[1] https://lore.kernel.org/all/20240124051254.67105-1-...@redhat.com/
[2] 
https://lore.kernel.org/all/9101bb07-70f1-476c-bec9-ec67e9899...@linux.ibm.com/

Changes in v2:
* Fixed a compile error for POWERNV build reported by Sourabh.

Hari Bathini (3):
  kexec/kdump: make struct crash_mem available without CONFIG_CRASH_DUMP
  powerpc/kexec: split CONFIG_KEXEC_FILE and CONFIG_CRASH_DUMP
  powerpc/kdump: Split KEXEC_CORE and CRASH_DUMP dependency

 arch/powerpc/Kconfig |   9 +-
 arch/powerpc/include/asm/kexec.h |  98 +-
 arch/powerpc/kernel/prom.c   |   2 +-
 arch/powerpc/kernel/setup-common.c   |   2 +-
 arch/powerpc/kernel/smp.c|   4 +-
 arch/powerpc/kexec/Makefile  |   3 +-
 arch/powerpc/kexec/core.c|   4 +
 arch/powerpc/kexec/elf_64.c  |   4 +-
 arch/powerpc/kexec/file_load_64.c| 269 ++-
 arch/powerpc/platforms/powernv/smp.c |   2 +-
 include/linux/crash_core.h   |  12 +-
 11 files changed, 209 insertions(+), 200 deletions(-)

-- 
2.43.2



[PATCH linux-next v2 1/3] kexec/kdump: make struct crash_mem available without CONFIG_CRASH_DUMP

2024-02-26 Thread Hari Bathini
struct crash_mem defined under include/linux/crash_core.h represents
a list of memory ranges. While it is used to represent memory ranges
for the kdump kernel, it can also be used for other kinds of memory
ranges. In fact, the KEXEC_FILE_LOAD syscall on powerpc uses this
structure to represent reserved memory ranges and exclude memory ranges
needed to find the right memory regions to load the kexec kernel. So,
make the definition of the crash_mem structure available for the
!CONFIG_CRASH_DUMP case too.

Signed-off-by: Hari Bathini 
---

* No changes in v2.

 include/linux/crash_core.h | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index 23270b16e1db..d33352c2e386 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -8,6 +8,12 @@
 
 struct kimage;
 
+struct crash_mem {
+   unsigned int max_nr_ranges;
+   unsigned int nr_ranges;
+   struct range ranges[] __counted_by(max_nr_ranges);
+};
+
 #ifdef CONFIG_CRASH_DUMP
 
 int crash_shrink_memory(unsigned long new_size);
@@ -51,12 +57,6 @@ static inline unsigned int crash_get_elfcorehdr_size(void) { 
return 0; }
 /* Alignment required for elf header segment */
 #define ELF_CORE_HEADER_ALIGN   4096
 
-struct crash_mem {
-   unsigned int max_nr_ranges;
-   unsigned int nr_ranges;
-   struct range ranges[] __counted_by(max_nr_ranges);
-};
-
 extern int crash_exclude_mem_range(struct crash_mem *mem,
   unsigned long long mstart,
   unsigned long long mend);
-- 
2.43.2



Re: [PATCH linux-next 3/3] powerpc/kdump: Split KEXEC_CORE and CRASH_DUMP dependency

2024-02-25 Thread Hari Bathini




On 23/02/24 1:05 pm, Sourabh Jain wrote:

Hello Hari,


Hi Sourabh,



Build failure detected.


Thanks for trying out the patches.



On 13/02/24 17:01, Hari Bathini wrote:

Remove CONFIG_CRASH_DUMP dependency on CONFIG_KEXEC. CONFIG_KEXEC_CORE
was used at places where CONFIG_CRASH_DUMP or CONFIG_CRASH_RESERVE was
appropriate. Replace with appropriate #ifdefs to support CONFIG_KEXEC
and !CONFIG_CRASH_DUMP configuration option. Also, make CONFIG_FA_DUMP
dependent on CONFIG_CRASH_DUMP to avoid unmet dependencies for FA_DUMP
with !CONFIG_KEXEC_CORE configuration option.

Signed-off-by: Hari Bathini 
---
  arch/powerpc/Kconfig   |  9 +--
  arch/powerpc/include/asm/kexec.h   | 98 +++---
  arch/powerpc/kernel/prom.c |  2 +-
  arch/powerpc/kernel/setup-common.c |  2 +-
  arch/powerpc/kernel/smp.c  |  4 +-
  arch/powerpc/kexec/Makefile    |  3 +-
  arch/powerpc/kexec/core.c  |  4 ++
  7 files changed, 60 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 5cf8ad8d7e8e..e377deefa2dc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -607,11 +607,6 @@ config PPC64_SUPPORTS_MEMORY_FAILURE
  config ARCH_SUPPORTS_KEXEC
  def_bool PPC_BOOK3S || PPC_E500 || (44x && !SMP)
-config ARCH_SELECTS_KEXEC
-    def_bool y
-    depends on KEXEC
-    select CRASH_DUMP
-
  config ARCH_SUPPORTS_KEXEC_FILE
  def_bool PPC64
@@ -622,7 +617,6 @@ config ARCH_SELECTS_KEXEC_FILE
  def_bool y
  depends on KEXEC_FILE
  select KEXEC_ELF
-    select CRASH_DUMP
  select HAVE_IMA_KEXEC if IMA
  config PPC64_BIG_ENDIAN_ELF_ABI_V2
@@ -694,8 +688,7 @@ config ARCH_SELECTS_CRASH_DUMP
  config FA_DUMP
  bool "Firmware-assisted dump"
-    depends on PPC64 && (PPC_RTAS || PPC_POWERNV)
-    select CRASH_DUMP
+    depends on CRASH_DUMP && PPC64 && (PPC_RTAS || PPC_POWERNV)
  help
    A robust mechanism to get reliable kernel crash dump with
    assistance from firmware. This approach does not use kexec,
diff --git a/arch/powerpc/include/asm/kexec.h 
b/arch/powerpc/include/asm/kexec.h

index e1b43aa12175..fdb90e24dc74 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -55,59 +55,18 @@
  typedef void (*crash_shutdown_t)(void);
  #ifdef CONFIG_KEXEC_CORE
-
-/*
- * This function is responsible for capturing register states if coming
- * via panic or invoking dump using sysrq-trigger.
- */
-static inline void crash_setup_regs(struct pt_regs *newregs,
-    struct pt_regs *oldregs)
-{
-    if (oldregs)
-    memcpy(newregs, oldregs, sizeof(*newregs));
-    else
-    ppc_save_regs(newregs);
-}
+struct kimage;
+struct pt_regs;
  extern void kexec_smp_wait(void);    /* get and clear naca physid, 
wait for

    master to copy new code to 0 */
-extern int crashing_cpu;
-extern void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs 
*));

-extern void crash_ipi_callback(struct pt_regs *);
-extern int crash_wake_offline;
-
-struct kimage;
-struct pt_regs;
  extern void default_machine_kexec(struct kimage *image);
-extern void default_machine_crash_shutdown(struct pt_regs *regs);
-extern int crash_shutdown_register(crash_shutdown_t handler);
-extern int crash_shutdown_unregister(crash_shutdown_t handler);
-
-extern void crash_kexec_prepare(void);
-extern void crash_kexec_secondary(struct pt_regs *regs);
-int __init overlaps_crashkernel(unsigned long start, unsigned long 
size);

-extern void reserve_crashkernel(void);
  extern void machine_kexec_mask_interrupts(void);
-static inline bool kdump_in_progress(void)
-{
-    return crashing_cpu >= 0;
-}
-
  void relocate_new_kernel(unsigned long indirection_page, unsigned 
long reboot_code_buffer,

   unsigned long start_address) __noreturn;
-
  void kexec_copy_flush(struct kimage *image);
-#if defined(CONFIG_CRASH_DUMP)
-bool is_kdump_kernel(void);
-#define is_kdump_kernel    is_kdump_kernel
-#if defined(CONFIG_PPC_RTAS)
-void crash_free_reserved_phys_range(unsigned long begin, unsigned 
long end);

-#define crash_free_reserved_phys_range crash_free_reserved_phys_range
-#endif /* CONFIG_PPC_RTAS */
-#endif /* CONFIG_CRASH_DUMP */
-
  #ifdef CONFIG_KEXEC_FILE
  extern const struct kexec_file_ops kexec_elf64_ops;
@@ -152,15 +111,56 @@ int setup_new_fdt_ppc64(const struct kimage 
*image, void *fdt,

  #endif /* CONFIG_KEXEC_FILE */
-#else /* !CONFIG_KEXEC_CORE */
-static inline void crash_kexec_secondary(struct pt_regs *regs) { }
+#endif /* CONFIG_KEXEC_CORE */
+
+#ifdef CONFIG_CRASH_RESERVE
+int __init overlaps_crashkernel(unsigned long start, unsigned long 
size);

+extern void reserve_crashkernel(void);
+#else
+static inline void reserve_crashkernel(void) {}
+static inline int overlaps_crashkernel(unsigned long start, unsigned 
long size) { return 0; }

+#endif
-static inline int overlaps_crashkernel(unsigned l

Re: [PATCH v2 00/14] Split crash out from kexec and clean up related config items

2024-02-22 Thread Hari Bathini




On 23/02/24 2:59 am, Andrew Morton wrote:

On Thu, 22 Feb 2024 10:47:29 +0530 Hari Bathini  wrote:




On 22/02/24 2:27 am, Andrew Morton wrote:

On Wed, 21 Feb 2024 11:15:00 +0530 Hari Bathini  wrote:


On 04/02/24 8:56 am, Baoquan He wrote:

Hope Hari and Pingfan can help have a look, see if
it's doable. Now, I make it either have both kexec and crash enabled, or
disable both of them altogether.


Sure. I will take a closer look...

Thanks a lot. Please feel free to post patches to make that, or I can do
it with your support or suggestion.


Tested your changes and on top of these changes, came up with the below
changes to get it working for powerpc:

   
https://lore.kernel.org/all/20240213113150.1148276-1-hbath...@linux.ibm.com/


So can we take it that you're OK with Baoquan's series as-is?


Hi Andrew,

If you mean

v3 (https://lore.kernel.org/all/20240124051254.67105-1-...@redhat.com/)
+
follow-up from Baoquan
(https://lore.kernel.org/all/Zb8D1ASrgX0qVm9z@MiWiFi-R3L-srv/)

Yes.



Can I add your Acked-by: and/or Tested-by: to the patches in this series?


Sure, Andrew.

Acked-by: Hari Bathini 

for..

Patches 1-5 & 8 in:

  https://lore.kernel.org/all/20240124051254.67105-1-...@redhat.com/

and this follow-up patch:

  https://lore.kernel.org/all/Zb8D1ASrgX0qVm9z@MiWiFi-R3L-srv/

Thanks
Hari


Re: [powerpc] Dump capture failure with recent linux-next

2024-02-21 Thread Hari Bathini

Hi Sachin,

On 22/02/24 10:55 am, Sachin Sant wrote:

Kdump fails to save vmcore with recent linux-next builds on IBM Power server
with following messages

  Starting Kdump Vmcore Save Service...
[ 17.349599] kdump[367]: Kdump is using the default log level(3).
[ 17.407407] kdump[391]: saving to 
/sysroot//var/crash//127.0.0.1-2024-02-21-15:03:55/
[ 17.441270] EXT4-fs (sda2): re-mounted 630dfb4e-74bd-45c4-a8de-232992bc8724 
r/w. Quota mode: none.
[ 17.04] kdump[395]: saving vmcore-dmesg.txt to 
/sysroot//var/crash//127.0.0.1-2024-02-21-15:03:55/
[ 17.464859] kdump[401]: saving vmcore-dmesg.txt complete
[ 17.466636] kdump[403]: saving vmcore
[ 17.551589] kdump.sh[404]:
Checking for memory holes : [ 0.0 %] /
Checking for memory holes : [100.0 %] | readpage_elf: Attempt to read 
non-existent page at 0xc.
[ 17.551718] kdump.sh[404]: readmem: type_addr: 0, addr:c00c, 
size:16384
[ 17.551793] kdump.sh[404]: __exclude_unnecessary_pages: Can't read the buffer 
of struct page.
[ 17.551864] kdump.sh[404]: create_2nd_bitmap: Can't exclude unnecessary pages.
[ 17.562632] kdump.sh[404]: The kernel version is not supported.
[ 17.562708] kdump.sh[404]: The makedumpfile operation may be incomplete.
[ 17.562773] kdump.sh[404]: makedumpfile Failed.
[ 17.564335] kdump[406]: saving vmcore failed, _exitcode:1
[ 17.566013] kdump[408]: saving the /run/initramfs/kexec-dmesg.log to 
/sysroot//var/crash//127.0.0.1-2024-02-21-15:03:55/
[ 17.583658] kdump[414]: saving vmcore failed

Git bisect points to following patch

commit 378eb24a0658dd922b29524e0ce35c6c43f56cba
 mm/vmalloc: remove vmap_area_list

Reverting this patch allows kdump to save vmcore file correctly.

Does this change require any corresponding changes to makedumpfile?


Right. The change intends for tools to use VMALLOC_START, exported via
vmcoreinfo, instead of vmap_area_list. I don't see the corresponding
makedumpfile change submitted upstream yet, though.
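
For reference, the kernel-side export this relies on should be just a
vmcoreinfo entry along these lines (the exact location is an assumption,
based on the commit mentioned above):

	/* Export VMALLOC_START so tools can stop walking vmap_area_list. */
	VMCOREINFO_NUMBER(VMALLOC_START);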

Aditya, can you help with this..

- Hari


Re: [PATCH v2 00/14] Split crash out from kexec and clean up related config items

2024-02-21 Thread Hari Bathini




On 22/02/24 2:27 am, Andrew Morton wrote:

On Wed, 21 Feb 2024 11:15:00 +0530 Hari Bathini  wrote:


On 04/02/24 8:56 am, Baoquan He wrote:

Hope Hari and Pingfan can help have a look, see if
it's doable. Now, I make it either have both kexec and crash enabled, or
disable both of them altogether.


Sure. I will take a closer look...

Thanks a lot. Please feel free to post patches to make that, or I can do
it with your support or suggestion.


Tested your changes and on top of these changes, came up with the below
changes to get it working for powerpc:

  
https://lore.kernel.org/all/20240213113150.1148276-1-hbath...@linux.ibm.com/


So can we take it that you're OK with Baoquan's series as-is?


Hi Andrew,

If you mean

v3 (https://lore.kernel.org/all/20240124051254.67105-1-...@redhat.com/)
+
follow-up from Baoquan 
(https://lore.kernel.org/all/Zb8D1ASrgX0qVm9z@MiWiFi-R3L-srv/)


Yes.

My changes are based on top of the above patches..

Thanks
Hari


Re: [PATCH v2 00/14] Split crash out from kexec and clean up related config items

2024-02-20 Thread Hari Bathini

Hi Baoquan,

On 04/02/24 8:56 am, Baoquan He wrote:

Hope Hari and Pingfan can help have a look, see if
it's doable. Now, I make it either have both kexec and crash enabled, or
disable both of them altogether.


Sure. I will take a closer look...

Thanks a lot. Please feel free to post patches to make that, or I can do
it with your support or suggestion.


Tested your changes and on top of these changes, came up with the below
changes to get it working for powerpc:


https://lore.kernel.org/all/20240213113150.1148276-1-hbath...@linux.ibm.com/

Please take a look.

Thanks
Hari


Re: [PATCH v2 1/2] powerpc/bpf: ensure module addresses are supported

2024-02-15 Thread Hari Bathini




On 13/02/24 1:23 pm, Christophe Leroy wrote:



Le 01/02/2024 à 18:12, Hari Bathini a écrit :

Currently, bpf jit code on powerpc assumes all the bpf functions and
helpers to be kernel text. This is false for kfunc case, as function
addresses are mostly module addresses in that case. Ensure module
addresses are supported to enable kfunc support.

Assume kernel text address for programs with no kfunc call to optimize
instruction sequence in that case. Add a check to error out if this
assumption ever changes in the future.

Signed-off-by: Hari Bathini 
---

Changes in v2:
* Using bpf_prog_has_kfunc_call() to decide whether to use optimized
instruction sequence or not as suggested by Naveen.


   arch/powerpc/net/bpf_jit.h|   5 +-
   arch/powerpc/net/bpf_jit_comp.c   |   4 +-
   arch/powerpc/net/bpf_jit_comp32.c |   8 ++-
   arch/powerpc/net/bpf_jit_comp64.c | 109 --
   4 files changed, 97 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index cdea5dccaefe..fc56ee0ee9c5 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -160,10 +160,11 @@ static inline void bpf_clear_seen_register(struct 
codegen_context *ctx, int i)
   }
   
   void bpf_jit_init_reg_mapping(struct codegen_context *ctx);

-int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context 
*ctx, u64 func);
+int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context 
*ctx, u64 func,
+  bool has_kfunc_call);
   int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct 
codegen_context *ctx,
   u32 *addrs, int pass, bool extra_pass);
-void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
+void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx, bool 
has_kfunc_call);
   void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
   void bpf_jit_realloc_regs(struct codegen_context *ctx);
   int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int 
tmp_reg, long exit_addr);
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 0f9a21783329..7b4103b4c929 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -163,7 +163,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 * update ctgtx.idx as it pretends to output instructions, then we can
 * calculate total size from idx.
 */
-   bpf_jit_build_prologue(NULL, );
+   bpf_jit_build_prologue(NULL, , bpf_prog_has_kfunc_call(fp));
addrs[fp->len] = cgctx.idx * 4;
bpf_jit_build_epilogue(NULL, );
   
@@ -192,7 +192,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)

/* Now build the prologue, body code & epilogue for real. */
cgctx.idx = 0;
cgctx.alt_exit_addr = 0;
-   bpf_jit_build_prologue(code_base, );
+   bpf_jit_build_prologue(code_base, , 
bpf_prog_has_kfunc_call(fp));
if (bpf_jit_build_body(fp, code_base, fcode_base, , 
addrs, pass,
   extra_pass)) {
bpf_arch_text_copy(>size, >size, 
sizeof(hdr->size));
diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
b/arch/powerpc/net/bpf_jit_comp32.c
index 2f39c50ca729..447747e51a58 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -123,7 +123,7 @@ void bpf_jit_realloc_regs(struct codegen_context *ctx)
}
   }
   
-void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)

+void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx, bool 
has_kfunc_call)
   {
int i;
   
@@ -201,7 +201,8 @@ void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)

   }
   
   /* Relative offset needs to be calculated based on final image location */

-int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context 
*ctx, u64 func)
+int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context 
*ctx, u64 func,
+  bool has_kfunc_call)
   {
s32 rel = (s32)func - (s32)(fimage + ctx->idx);
   
@@ -1054,7 +1055,8 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct code

EMIT(PPC_RAW_STW(bpf_to_ppc(BPF_REG_5), _R1, 
12));
}
   
-			ret = bpf_jit_emit_func_call_rel(image, fimage, ctx, func_addr);

+   ret = bpf_jit_emit_func_call_rel(image, fimage, ctx, 
func_addr,
+
bpf_prog_has_kfunc_call(fp));
if (ret)
return ret;
   
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c

index 79f23974a320..385a5df1670c 100644
--- a/arch/powerpc/n

Re: [PATCH v2 2/2] powerpc/bpf: enable kfunc call

2024-02-15 Thread Hari Bathini




On 13/02/24 1:24 pm, Christophe Leroy wrote:



Le 01/02/2024 à 18:12, Hari Bathini a écrit :

With module addresses supported, override bpf_jit_supports_kfunc_call()
to enable kfunc support. Module address offsets can be more than 32-bit
long, so override bpf_jit_supports_far_kfunc_call() to enable 64-bit
pointers.


What's the impact on PPC32 ? There are no 64-bit pointers on PPC32.


Yeah. Not required to return true for PPC32 case and probably not a
good thing to claim support for far kfunc calls for PPC32. Changing to:

+bool bpf_jit_supports_far_kfunc_call(void)
+{
+   return IS_ENABLED(CONFIG_PPC64);
+}



Signed-off-by: Hari Bathini 
---

* No changes since v1.


   arch/powerpc/net/bpf_jit_comp.c | 10 ++
   1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 7b4103b4c929..f896a4213696 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -359,3 +359,13 @@ void bpf_jit_free(struct bpf_prog *fp)
   
   	bpf_prog_unlock_free(fp);

   }
+
+bool bpf_jit_supports_kfunc_call(void)
+{
+   return true;
+}
+
+bool bpf_jit_supports_far_kfunc_call(void)
+{
+   return true;
+}


[PATCH linux-next 3/3] powerpc/kdump: Split KEXEC_CORE and CRASH_DUMP dependency

2024-02-13 Thread Hari Bathini
Remove CONFIG_CRASH_DUMP dependency on CONFIG_KEXEC. CONFIG_KEXEC_CORE
was used at places where CONFIG_CRASH_DUMP or CONFIG_CRASH_RESERVE was
appropriate. Replace with appropriate #ifdefs to support CONFIG_KEXEC
and !CONFIG_CRASH_DUMP configuration option. Also, make CONFIG_FA_DUMP
dependent on CONFIG_CRASH_DUMP to avoid unmet dependencies for FA_DUMP
with !CONFIG_KEXEC_CORE configuration option.

Signed-off-by: Hari Bathini 
---
 arch/powerpc/Kconfig   |  9 +--
 arch/powerpc/include/asm/kexec.h   | 98 +++---
 arch/powerpc/kernel/prom.c |  2 +-
 arch/powerpc/kernel/setup-common.c |  2 +-
 arch/powerpc/kernel/smp.c  |  4 +-
 arch/powerpc/kexec/Makefile|  3 +-
 arch/powerpc/kexec/core.c  |  4 ++
 7 files changed, 60 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 5cf8ad8d7e8e..e377deefa2dc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -607,11 +607,6 @@ config PPC64_SUPPORTS_MEMORY_FAILURE
 config ARCH_SUPPORTS_KEXEC
def_bool PPC_BOOK3S || PPC_E500 || (44x && !SMP)
 
-config ARCH_SELECTS_KEXEC
-   def_bool y
-   depends on KEXEC
-   select CRASH_DUMP
-
 config ARCH_SUPPORTS_KEXEC_FILE
def_bool PPC64
 
@@ -622,7 +617,6 @@ config ARCH_SELECTS_KEXEC_FILE
def_bool y
depends on KEXEC_FILE
select KEXEC_ELF
-   select CRASH_DUMP
select HAVE_IMA_KEXEC if IMA
 
 config PPC64_BIG_ENDIAN_ELF_ABI_V2
@@ -694,8 +688,7 @@ config ARCH_SELECTS_CRASH_DUMP
 
 config FA_DUMP
bool "Firmware-assisted dump"
-   depends on PPC64 && (PPC_RTAS || PPC_POWERNV)
-   select CRASH_DUMP
+   depends on CRASH_DUMP && PPC64 && (PPC_RTAS || PPC_POWERNV)
help
  A robust mechanism to get reliable kernel crash dump with
  assistance from firmware. This approach does not use kexec,
diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index e1b43aa12175..fdb90e24dc74 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -55,59 +55,18 @@
 typedef void (*crash_shutdown_t)(void);
 
 #ifdef CONFIG_KEXEC_CORE
-
-/*
- * This function is responsible for capturing register states if coming
- * via panic or invoking dump using sysrq-trigger.
- */
-static inline void crash_setup_regs(struct pt_regs *newregs,
-   struct pt_regs *oldregs)
-{
-   if (oldregs)
-   memcpy(newregs, oldregs, sizeof(*newregs));
-   else
-   ppc_save_regs(newregs);
-}
+struct kimage;
+struct pt_regs;
 
 extern void kexec_smp_wait(void);  /* get and clear naca physid, wait for
  master to copy new code to 0 */
-extern int crashing_cpu;
-extern void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *));
-extern void crash_ipi_callback(struct pt_regs *);
-extern int crash_wake_offline;
-
-struct kimage;
-struct pt_regs;
 extern void default_machine_kexec(struct kimage *image);
-extern void default_machine_crash_shutdown(struct pt_regs *regs);
-extern int crash_shutdown_register(crash_shutdown_t handler);
-extern int crash_shutdown_unregister(crash_shutdown_t handler);
-
-extern void crash_kexec_prepare(void);
-extern void crash_kexec_secondary(struct pt_regs *regs);
-int __init overlaps_crashkernel(unsigned long start, unsigned long size);
-extern void reserve_crashkernel(void);
 extern void machine_kexec_mask_interrupts(void);
 
-static inline bool kdump_in_progress(void)
-{
-   return crashing_cpu >= 0;
-}
-
 void relocate_new_kernel(unsigned long indirection_page, unsigned long 
reboot_code_buffer,
 unsigned long start_address) __noreturn;
-
 void kexec_copy_flush(struct kimage *image);
 
-#if defined(CONFIG_CRASH_DUMP)
-bool is_kdump_kernel(void);
-#define is_kdump_kernelis_kdump_kernel
-#if defined(CONFIG_PPC_RTAS)
-void crash_free_reserved_phys_range(unsigned long begin, unsigned long end);
-#define crash_free_reserved_phys_range crash_free_reserved_phys_range
-#endif /* CONFIG_PPC_RTAS */
-#endif /* CONFIG_CRASH_DUMP */
-
 #ifdef CONFIG_KEXEC_FILE
 extern const struct kexec_file_ops kexec_elf64_ops;
 
@@ -152,15 +111,56 @@ int setup_new_fdt_ppc64(const struct kimage *image, void 
*fdt,
 
 #endif /* CONFIG_KEXEC_FILE */
 
-#else /* !CONFIG_KEXEC_CORE */
-static inline void crash_kexec_secondary(struct pt_regs *regs) { }
+#endif /* CONFIG_KEXEC_CORE */
+
+#ifdef CONFIG_CRASH_RESERVE
+int __init overlaps_crashkernel(unsigned long start, unsigned long size);
+extern void reserve_crashkernel(void);
+#else
+static inline void reserve_crashkernel(void) {}
+static inline int overlaps_crashkernel(unsigned long start, unsigned long 
size) { return 0; }
+#endif
 
-static inline int overlaps_crashkernel(unsigned long start, unsigned long size)
+#if defined(CONFIG_CRASH_DUMP)
+/*
+

[PATCH linux-next 2/3] powerpc/kexec: split CONFIG_KEXEC_FILE and CONFIG_CRASH_DUMP

2024-02-13 Thread Hari Bathini
CONFIG_KEXEC_FILE does not have to select CONFIG_CRASH_DUMP. Move
some code under CONFIG_CRASH_DUMP to support the CONFIG_KEXEC_FILE and
!CONFIG_CRASH_DUMP case.

Signed-off-by: Hari Bathini 
---
 arch/powerpc/kexec/elf_64.c   |   4 +-
 arch/powerpc/kexec/file_load_64.c | 269 --
 2 files changed, 142 insertions(+), 131 deletions(-)

diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
index 904016cf89ea..6d8951e8e966 100644
--- a/arch/powerpc/kexec/elf_64.c
+++ b/arch/powerpc/kexec/elf_64.c
@@ -47,7 +47,7 @@ static void *elf64_load(struct kimage *image, char 
*kernel_buf,
if (ret)
return ERR_PTR(ret);
 
-   if (image->type == KEXEC_TYPE_CRASH) {
+   if (IS_ENABLED(CONFIG_CRASH_DUMP) && image->type == KEXEC_TYPE_CRASH) {
/* min & max buffer values for kdump case */
kbuf.buf_min = pbuf.buf_min = crashk_res.start;
kbuf.buf_max = pbuf.buf_max =
@@ -70,7 +70,7 @@ static void *elf64_load(struct kimage *image, char 
*kernel_buf,
kexec_dprintk("Loaded purgatory at 0x%lx\n", pbuf.mem);
 
/* Load additional segments needed for panic kernel */
-   if (image->type == KEXEC_TYPE_CRASH) {
+   if (IS_ENABLED(CONFIG_CRASH_DUMP) && image->type == KEXEC_TYPE_CRASH) {
ret = load_crashdump_segments_ppc64(image, );
if (ret) {
pr_err("Failed to load kdump kernel segments\n");
diff --git a/arch/powerpc/kexec/file_load_64.c 
b/arch/powerpc/kexec/file_load_64.c
index 5b4c5cb23354..1bc65de6174f 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -96,119 +96,6 @@ static int get_exclude_memory_ranges(struct crash_mem 
**mem_ranges)
return ret;
 }
 
-/**
- * get_usable_memory_ranges - Get usable memory ranges. This list includes
- *regions like crashkernel, opal/rtas & tce-table,
- *that kdump kernel could use.
- * @mem_ranges:   Range list to add the memory ranges to.
- *
- * Returns 0 on success, negative errno on error.
- */
-static int get_usable_memory_ranges(struct crash_mem **mem_ranges)
-{
-   int ret;
-
-   /*
-* Early boot failure observed on guests when low memory (first memory
-* block?) is not added to usable memory. So, add [0, crashk_res.end]
-* instead of [crashk_res.start, crashk_res.end] to workaround it.
-* Also, crashed kernel's memory must be added to reserve map to
-* avoid kdump kernel from using it.
-*/
-   ret = add_mem_range(mem_ranges, 0, crashk_res.end + 1);
-   if (ret)
-   goto out;
-
-   ret = add_rtas_mem_range(mem_ranges);
-   if (ret)
-   goto out;
-
-   ret = add_opal_mem_range(mem_ranges);
-   if (ret)
-   goto out;
-
-   ret = add_tce_mem_ranges(mem_ranges);
-out:
-   if (ret)
-   pr_err("Failed to setup usable memory ranges\n");
-   return ret;
-}
-
-/**
- * get_crash_memory_ranges - Get crash memory ranges. This list includes
- *   first/crashing kernel's memory regions that
- *   would be exported via an elfcore.
- * @mem_ranges:  Range list to add the memory ranges to.
- *
- * Returns 0 on success, negative errno on error.
- */
-static int get_crash_memory_ranges(struct crash_mem **mem_ranges)
-{
-   phys_addr_t base, end;
-   struct crash_mem *tmem;
-   u64 i;
-   int ret;
-
-   for_each_mem_range(i, , ) {
-   u64 size = end - base;
-
-   /* Skip backup memory region, which needs a separate entry */
-   if (base == BACKUP_SRC_START) {
-   if (size > BACKUP_SRC_SIZE) {
-   base = BACKUP_SRC_END + 1;
-   size -= BACKUP_SRC_SIZE;
-   } else
-   continue;
-   }
-
-   ret = add_mem_range(mem_ranges, base, size);
-   if (ret)
-   goto out;
-
-   /* Try merging adjacent ranges before reallocation attempt */
-   if ((*mem_ranges)->nr_ranges == (*mem_ranges)->max_nr_ranges)
-   sort_memory_ranges(*mem_ranges, true);
-   }
-
-   /* Reallocate memory ranges if there is no space to split ranges */
-   tmem = *mem_ranges;
-   if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) {
-   tmem = realloc_mem_ranges(mem_ranges);
-   if (!tmem)
-   goto out;
-   }
-
-   /* Exclude crashkernel region */
-   ret = crash_exclude_mem_range(tmem, crashk_res.start, crashk_res.end);
-   if (ret)
-   goto out;
-
-   /*
-* FIXME: For now, stay i

[PATCH linux-next 1/3] kexec/kdump: make struct crash_mem available without CONFIG_CRASH_DUMP

2024-02-13 Thread Hari Bathini
struct crash_mem defined under include/linux/crash_core.h represents
a list of memory ranges. While it is used to represent memory ranges
for the kdump kernel, it can also be used for other kinds of memory
ranges. In fact, the KEXEC_FILE_LOAD syscall on powerpc uses this
structure to represent reserved memory ranges and exclude memory ranges
needed to find the right memory regions to load the kexec kernel. So,
make the definition of the crash_mem structure available for the
!CONFIG_CRASH_DUMP case too.

Signed-off-by: Hari Bathini 
---
 include/linux/crash_core.h | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index 23270b16e1db..d33352c2e386 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -8,6 +8,12 @@
 
 struct kimage;
 
+struct crash_mem {
+   unsigned int max_nr_ranges;
+   unsigned int nr_ranges;
+   struct range ranges[] __counted_by(max_nr_ranges);
+};
+
 #ifdef CONFIG_CRASH_DUMP
 
 int crash_shrink_memory(unsigned long new_size);
@@ -51,12 +57,6 @@ static inline unsigned int crash_get_elfcorehdr_size(void) { 
return 0; }
 /* Alignment required for elf header segment */
 #define ELF_CORE_HEADER_ALIGN   4096
 
-struct crash_mem {
-   unsigned int max_nr_ranges;
-   unsigned int nr_ranges;
-   struct range ranges[] __counted_by(max_nr_ranges);
-};
-
 extern int crash_exclude_mem_range(struct crash_mem *mem,
   unsigned long long mstart,
   unsigned long long mend);
-- 
2.43.0



[PATCH linux-next 0/3] powerpc/kexec: split CONFIG_CRASH_DUMP out from CONFIG_KEXEC_CORE

2024-02-13 Thread Hari Bathini
This patch series is a follow-up to [1] based on discussions at [2]
about additional work needed to get it working on powerpc.

The first patch in the series makes struct crash_mem available with or
without CONFIG_CRASH_DUMP enabled. The next patch moves kdump-specific
code for the kexec_file_load syscall under CONFIG_CRASH_DUMP, and the
last patch splits the remaining kdump-specific code under
CONFIG_CRASH_DUMP and removes CONFIG_KEXEC_CORE's dependency on
CONFIG_CRASH_DUMP.

[1] https://lore.kernel.org/all/20240124051254.67105-1-...@redhat.com/
[2] 
https://lore.kernel.org/all/9101bb07-70f1-476c-bec9-ec67e9899...@linux.ibm.com/


Hari Bathini (3):
  kexec/kdump: make struct crash_mem available without CONFIG_CRASH_DUMP
  powerpc/kexec: split CONFIG_KEXEC_FILE and CONFIG_CRASH_DUMP
  powerpc/kdump: Split KEXEC_CORE and CRASH_DUMP dependency

 arch/powerpc/Kconfig   |   9 +-
 arch/powerpc/include/asm/kexec.h   |  98 +--
 arch/powerpc/kernel/prom.c |   2 +-
 arch/powerpc/kernel/setup-common.c |   2 +-
 arch/powerpc/kernel/smp.c  |   4 +-
 arch/powerpc/kexec/Makefile|   3 +-
 arch/powerpc/kexec/core.c  |   4 +
 arch/powerpc/kexec/elf_64.c|   4 +-
 arch/powerpc/kexec/file_load_64.c  | 269 +++--
 include/linux/crash_core.h |  12 +-
 10 files changed, 208 insertions(+), 199 deletions(-)

-- 
2.43.0



Re: [PATCH v2 00/14] Split crash out from kexec and clean up related config items

2024-02-01 Thread Hari Bathini

Hi Baoquan,

On 19/01/24 8:22 pm, Baoquan He wrote:

Motivation:
=
Previously, LKP reported a build error. When investigating, it could not
be resolved reasonably with the present messy kdump config items.

  https://lore.kernel.org/oe-kbuild-all/202312182200.ka7mzifq-...@intel.com/

The kdump (crash dumping) related config items can cause confusion:

Firstly,
---
CRASH_CORE enables codes including
  - crashkernel reservation;
  - elfcorehdr updating;
  - vmcoreinfo exporting;
  - crash hotplug handling;

Now fadump of powerpc, kcore dynamic debugging and kdump all select
CRASH_CORE, while:
  - fadump needs crashkernel parsing, vmcoreinfo exporting, and accessing
global variable 'elfcorehdr_addr';
  - kcore only needs vmcoreinfo exporting;
  - kdump needs all of the current kernel/crash_core.c.

So enabling only PROC_KCORE or FA_DUMP will enable CRASH_CORE; this
misleads people into thinking crash dumping is enabled, when actually it is not.

Secondly,
---
It's not reasonable to allow KEXEC_CORE to select CRASH_CORE.

KEXEC_CORE enables code which allocates control pages, copies
kexec/kdump segments, and prepares for switching. This code is
shared by both kexec reboot and kdump. We could want kexec reboot
but not kdump; in that case, CRASH_CORE should not be selected.

  
  CONFIG_CRASH_CORE=y
  CONFIG_KEXEC_CORE=y
  CONFIG_KEXEC=y
  CONFIG_KEXEC_FILE=y
 -

Thirdly,
---
It's not reasonable to allow CRASH_DUMP to select KEXEC_CORE.

That could leave KEXEC_CORE and CRASH_DUMP enabled independently of
KEXEC or KEXEC_FILE. However, without KEXEC or KEXEC_FILE, the
KEXEC_CORE code built in doesn't make any sense because no kernel
loading or switching will happen to utilize it.
  -
  CONFIG_CRASH_CORE=y
  CONFIG_KEXEC_CORE=y
  CONFIG_CRASH_DUMP=y
  -

What is worse, on arch sh and arm, KEXEC relies on MMU while CRASH_DUMP
can still be enabled when !MMU, so a compile error is seen, as the lkp
test robot reported in the above link.

  --arch/sh/Kconfig--
  config ARCH_SUPPORTS_KEXEC
  def_bool MMU

  config ARCH_SUPPORTS_CRASH_DUMP
  def_bool BROKEN_ON_SMP
  ---

Changes:
===
1, split out crash_reserve.c from crash_core.c;
2, split out vmcore_info.c from crash_core.c;
3, move crash related codes in kexec_core.c into crash_core.c;
4, remove dependency of FA_DUMP on CRASH_DUMP;
5, clean up kdump related config items;
6, wrap up crash code in crash-related ifdefs on all 9 arches
which support crash dumping;

Achievement:
===
With above changes, I can rearrange the config item logic as below (the right
item depends on or is selected by the left item):

 PROC_KCORE ---> VMCORE_INFO

|--> VMCORE_INFO
 FA_DUMP|
|--> CRASH_RESERVE


FA_DUMP also needs PROC_VMCORE (CRASH_DUMP by dependency, I guess).
So, the FA_DUMP related changes here will need a relook..



 >VMCORE_INFO
/
|>CRASH_RESERVE
 KEXEC  --|/|
  |--> KEXEC_CORE--> CRASH_DUMP-->/-|>PROC_VMCORE
 KEXEC_FILE --|   \ |
\>CRASH_HOTPLUG


 KEXEC  --|
  |--> KEXEC_CORE (for kexec reboot only)
 KEXEC_FILE --|

Test

On all 8 architectures, including x86_64, arm64, s390x, sh, arm, mips,
riscv, loongarch, I did the below three cases of config item setting and
building, and all passed. Let me take the configs on x86_64 as an example here:

(1) Both CONFIG_KEXEC and KEXEC_FILE are unset, then all kexec/kdump
items are unset automatically:
# Kexec and crash features
# CONFIG_KEXEC is not set
# CONFIG_KEXEC_FILE is not set
# end of Kexec and crash features

(2) set CONFIG_KEXEC_FILE and 'make olddefconfig':
---
# Kexec and crash features
CONFIG_CRASH_RESERVE=y
CONFIG_VMCORE_INFO=y
CONFIG_KEXEC_CORE=y
CONFIG_KEXEC_FILE=y
CONFIG_CRASH_DUMP=y
CONFIG_CRASH_HOTPLUG=y
CONFIG_CRASH_MAX_MEMORY_RANGES=8192
# end of Kexec and crash features
---

(3) unset CONFIG_CRASH_DUMP in case 2 and execute 'make olddefconfig':

# Kexec and crash features
CONFIG_KEXEC_CORE=y
CONFIG_KEXEC_FILE=y
# end of Kexec and crash features


Note:
For ppc, it needs investigation to make clear how to split out crash
code in arch folder.


On powerpc, both kdump and fadump need PROC_VMCORE & CRASH_DUMP.
Hope that clears things. So, patch 3/14 breaks things for FA_DUMP..


Hope Hari and Pingfan can help have a look, see if
it's doable. Now, I make it either have both kexec and crash enabled, or
disable both of them altogether.



Sure. I will take a closer 

[PATCH v2 2/2] powerpc/bpf: enable kfunc call

2024-02-01 Thread Hari Bathini
With module addresses supported, override bpf_jit_supports_kfunc_call()
to enable kfunc support. Module address offsets can be more than 32 bits
long, so also override bpf_jit_supports_far_kfunc_call() to enable 64-bit
pointers.
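
For context, a rough sketch of what this enables on the consumer side
(illustrative only, not part of this patch): a kfunc defined in module
text that a BPF program could now call on powerpc64. The registration
interface shown here assumes the BTF_SET8 macros and may differ slightly
across kernel versions.

#include <linux/bpf.h>
#include <linux/btf.h>
#include <linux/btf_ids.h>
#include <linux/module.h>

/* A trivial kfunc living in module text, not core kernel text. */
__bpf_kfunc u64 bpf_demo_add(u64 a, u64 b)
{
        return a + b;
}

BTF_SET8_START(demo_kfunc_ids)
BTF_ID_FLAGS(func, bpf_demo_add)
BTF_SET8_END(demo_kfunc_ids)

static const struct btf_kfunc_id_set demo_kfunc_set = {
        .owner = THIS_MODULE,
        .set   = &demo_kfunc_ids,
};

/* From module init: register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &demo_kfunc_set); */

Since such targets live at module addresses, and can be farther than
32 bits away, both overrides below are needed together.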

Signed-off-by: Hari Bathini 
---

* No changes since v1.


 arch/powerpc/net/bpf_jit_comp.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 7b4103b4c929..f896a4213696 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -359,3 +359,13 @@ void bpf_jit_free(struct bpf_prog *fp)
 
bpf_prog_unlock_free(fp);
 }
+
+bool bpf_jit_supports_kfunc_call(void)
+{
+   return true;
+}
+
+bool bpf_jit_supports_far_kfunc_call(void)
+{
+   return true;
+}
-- 
2.43.0



[PATCH v2 1/2] powerpc/bpf: ensure module addresses are supported

2024-02-01 Thread Hari Bathini
Currently, the BPF JIT code on powerpc assumes all BPF functions and
helpers are kernel text. This is false for the kfunc case, as function
addresses are mostly module addresses there. Ensure module addresses
are supported to enable kfunc support.

Assume kernel text addresses for programs with no kfunc call, so the
optimized instruction sequence can still be used in that case. Add a
check to error out if this assumption ever changes in the future.
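
As a rough illustration of the distinction the JIT has to make, here is
a hedged sketch (the helper name is made up; the actual patch instead
threads a has_kfunc_call flag through bpf_jit_build_prologue() and
bpf_jit_emit_func_call_rel(), as the diff below shows):

#include <linux/kernel.h>
#include <linux/module.h>

/* true: the short, TOC-less call sequence is safe; false: a full 64-bit call is needed. */
static bool jit_target_is_core_text(u64 func_addr)
{
        if (core_kernel_text((unsigned long)func_addr))
                return true;

        /* kfunc targets typically land in module text. */
        WARN_ON_ONCE(!is_module_text_address((unsigned long)func_addr));
        return false;
}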

Signed-off-by: Hari Bathini 
---

Changes in v2:
* Using bpf_prog_has_kfunc_call() to decide whether to use optimized
  instruction sequence or not as suggested by Naveen.


 arch/powerpc/net/bpf_jit.h|   5 +-
 arch/powerpc/net/bpf_jit_comp.c   |   4 +-
 arch/powerpc/net/bpf_jit_comp32.c |   8 ++-
 arch/powerpc/net/bpf_jit_comp64.c | 109 --
 4 files changed, 97 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index cdea5dccaefe..fc56ee0ee9c5 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -160,10 +160,11 @@ static inline void bpf_clear_seen_register(struct 
codegen_context *ctx, int i)
 }
 
 void bpf_jit_init_reg_mapping(struct codegen_context *ctx);
-int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context 
*ctx, u64 func);
+int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context 
*ctx, u64 func,
+  bool has_kfunc_call);
 int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct 
codegen_context *ctx,
   u32 *addrs, int pass, bool extra_pass);
-void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
+void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx, bool 
has_kfunc_call);
 void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_realloc_regs(struct codegen_context *ctx);
 int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int 
tmp_reg, long exit_addr);
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 0f9a21783329..7b4103b4c929 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -163,7 +163,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 * update ctgtx.idx as it pretends to output instructions, then we can
 * calculate total size from idx.
 */
-   bpf_jit_build_prologue(NULL, );
+   bpf_jit_build_prologue(NULL, , bpf_prog_has_kfunc_call(fp));
addrs[fp->len] = cgctx.idx * 4;
bpf_jit_build_epilogue(NULL, );
 
@@ -192,7 +192,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
/* Now build the prologue, body code & epilogue for real. */
cgctx.idx = 0;
cgctx.alt_exit_addr = 0;
-   bpf_jit_build_prologue(code_base, );
+   bpf_jit_build_prologue(code_base, , 
bpf_prog_has_kfunc_call(fp));
if (bpf_jit_build_body(fp, code_base, fcode_base, , 
addrs, pass,
   extra_pass)) {
bpf_arch_text_copy(>size, >size, 
sizeof(hdr->size));
diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
b/arch/powerpc/net/bpf_jit_comp32.c
index 2f39c50ca729..447747e51a58 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -123,7 +123,7 @@ void bpf_jit_realloc_regs(struct codegen_context *ctx)
}
 }
 
-void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
+void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx, bool 
has_kfunc_call)
 {
int i;
 
@@ -201,7 +201,8 @@ void bpf_jit_build_epilogue(u32 *image, struct 
codegen_context *ctx)
 }
 
 /* Relative offset needs to be calculated based on final image location */
-int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context 
*ctx, u64 func)
+int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context 
*ctx, u64 func,
+  bool has_kfunc_call)
 {
s32 rel = (s32)func - (s32)(fimage + ctx->idx);
 
@@ -1054,7 +1055,8 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
u32 *fimage, struct code
EMIT(PPC_RAW_STW(bpf_to_ppc(BPF_REG_5), _R1, 
12));
}
 
-   ret = bpf_jit_emit_func_call_rel(image, fimage, ctx, 
func_addr);
+   ret = bpf_jit_emit_func_call_rel(image, fimage, ctx, 
func_addr,
+
bpf_prog_has_kfunc_call(fp));
if (ret)
return ret;
 
diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index 79f23974a320..385a5df1670c 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -122,12 +122,17 @@ void bpf_jit_realloc_regs(struct codegen_context 

Re: [PATCH v15 5/5] powerpc: add crash memory hotplug support

2024-01-23 Thread Hari Bathini




On 11/01/24 4:21 pm, Sourabh Jain wrote:

Extend the arch crash hotplug handler, as introduced by the patch title
("powerpc: add crash CPU hotplug support"), to also support memory
add/remove events.

Elfcorehdr describes the memory of the crash kernel to capture the
kernel; hence, it needs to be updated if memory resources change due to
memory add/remove events. Therefore, arch_crash_handle_hotplug_event()
is updated to recreate the elfcorehdr and replace it with the previous
one on memory add/remove events.

The memblock list is used to prepare the elfcorehdr. In the case of
memory hot removal, the memblock list is updated after the arch crash
hotplug handler is triggered, as depicted in Figure 1. Thus, the
hot-removed memory is explicitly removed from the crash memory ranges
to ensure that the memory ranges added to elfcorehdr do not include the
hot-removed memory.

 Memory remove
   |
   v
 Offline pages
   |
   v
  Initiate memory notify call <> crash hotplug handler
  chain for MEM_OFFLINE event
   |
   v
  Update memblock list

Figure 1

There are two system calls, `kexec_file_load` and `kexec_load`, used to
load the kdump image. A few changes have been made to ensure that the
kernel can safely update the elfcorehdr component of the kdump image for
both system calls.

For the kexec_file_load syscall, kdump image is prepared in the kernel.
To support an increasing number of memory regions, the elfcorehdr is
built with extra buffer space to ensure that it can accommodate
additional memory ranges in future.

For the kexec_load syscall, the elfcorehdr is updated only if the
KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag is passed to the kernel by the
kexec tool. Passing this flag to the kernel indicates that the
elfcorehdr is built to accommodate additional memory ranges and the
elfcorehdr segment is not considered for SHA calculation, making it safe
to update.

The changes related to this feature are kept under the CRASH_HOTPLUG
config, and it is enabled by default.

Signed-off-by: Sourabh Jain 
Cc: Akhil Raj 
Cc: Andrew Morton 
Cc: Aneesh Kumar K.V 
Cc: Baoquan He 
Cc: Borislav Petkov (AMD) 
Cc: Boris Ostrovsky 
Cc: Christophe Leroy 
Cc: Dave Hansen 
Cc: Dave Young 
Cc: David Hildenbrand 
Cc: Greg Kroah-Hartman 
Cc: Hari Bathini 
Cc: Laurent Dufour 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Mimi Zohar 
Cc: Naveen N Rao 
Cc: Oscar Salvador 
Cc: Thomas Gleixner 
Cc: Valentin Schneider 
Cc: Vivek Goyal 
Cc: ke...@lists.infradead.org
Cc: x...@kernel.org
---
  arch/powerpc/include/asm/kexec.h|   5 +-
  arch/powerpc/include/asm/kexec_ranges.h |   1 +
  arch/powerpc/kexec/core_64.c| 107 +++-
  arch/powerpc/kexec/file_load_64.c   |  34 +++-
  arch/powerpc/kexec/ranges.c |  85 +++
  5 files changed, 225 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 943e58eb9bff..25ff5b7f1a28 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -116,8 +116,11 @@ int get_crash_memory_ranges(struct crash_mem **mem_ranges);
  #ifdef CONFIG_CRASH_HOTPLUG
  void arch_crash_handle_hotplug_event(struct kimage *image, void *arg);
  #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
-#endif /*CONFIG_CRASH_HOTPLUG */
  
+unsigned int arch_crash_get_elfcorehdr_size(void);

+#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
+
+#endif /*CONFIG_CRASH_HOTPLUG */
  #endif /* CONFIG_PPC64 */
  
  #ifdef CONFIG_KEXEC_FILE

diff --git a/arch/powerpc/include/asm/kexec_ranges.h 
b/arch/powerpc/include/asm/kexec_ranges.h
index f83866a19e87..802abf580cf0 100644
--- a/arch/powerpc/include/asm/kexec_ranges.h
+++ b/arch/powerpc/include/asm/kexec_ranges.h
@@ -7,6 +7,7 @@
  void sort_memory_ranges(struct crash_mem *mrngs, bool merge);
  struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges);
  int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
+int remove_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
  int add_tce_mem_ranges(struct crash_mem **mem_ranges);
  int add_initrd_mem_range(struct crash_mem **mem_ranges);
  #ifdef CONFIG_PPC_64S_HASH_MMU
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 43fcd78c2102..4673f150f973 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -19,8 +19,11 @@
  #include 
  #include 
  #include 
+#include 
  
  #include 

+#include 
+#include 
  #include 
  #include 
  #include 
@@ -546,6 +549,101 @@ int update_cpus_node(void *fdt)
  #undef pr_fmt
  #define pr_fmt(fmt) "crash hp: " fmt
  
+/*

+ * Advertise preferred elfcorehdr size to userspace via
+ * /sys/kernel/crash_elfcorehdr_size sysfs interface.
+ */
+unsigned int arch_crash_get_elfcorehdr_size(void)
+{
+   unsigned int sz;
+   

Re: [PATCH v15 4/5] powerpc: add crash CPU hotplug support

2024-01-23 Thread Hari Bathini




On 11/01/24 4:21 pm, Sourabh Jain wrote:

Due to CPU/Memory hotplug or online/offline events, the elfcorehdr
(which describes the CPUs and memory of the crashed kernel) and FDT
(Flattened Device Tree) of kdump image becomes outdated. Consequently,
attempting dump collection with an outdated elfcorehdr or FDT can lead
to failed or inaccurate dump collection.

Going forward, CPU hotplug or online/offline events are referred as
CPU/Memory add/remove events.

The current solution to address the above issue involves monitoring the
CPU/Memory add/remove events in userspace using udev rules and whenever
there are changes in CPU and memory resources, the entire kdump image
is loaded again. The kdump image includes kernel, initrd, elfcorehdr,
FDT, purgatory. Given that only elfcorehdr and FDT get outdated due to
CPU/Memory add/remove events, reloading the entire kdump image is
inefficient. More importantly, kdump remains inactive for a substantial
amount of time until the kdump reload completes.

To address the aforementioned issue, commit 247262756121 ("crash: add
generic infrastructure for crash hotplug support") added a generic
infrastructure that allows architectures to selectively update the kdump
image component during CPU or memory add/remove events within the kernel
itself.

In the event of a CPU or memory add/remove events, the generic crash
hotplug event handler, `crash_handle_hotplug_event()`, is triggered. It
then acquires the necessary locks to update the kdump image and invokes
the architecture-specific crash hotplug handler,
`arch_crash_handle_hotplug_event()`, to update the required kdump image
components.

This patch adds crash hotplug handler for PowerPC and enable support to
update the kdump image on CPU add/remove events. Support for memory
add/remove events is added in a subsequent patch with the title
"powerpc: add crash memory hotplug support"

As mentioned earlier, only the elfcorehdr and FDT kdump image components
need to be updated in the event of CPU or memory add/remove events.
However, on PowerPC architecture crash hotplug handler only updates the
FDT to enable crash hotplug support for CPU add/remove events. Here's
why.

The elfcorehdr on PowerPC is built with possible CPUs, and thus, it does
not need an update on CPU add/remove events. On the other hand, the FDT
needs to be updated on CPU add events to include the newly added CPU. If
the FDT is not updated and the kernel crashes on a newly added CPU, the
kdump kernel will fail to boot due to the unavailability of the crashing
CPU in the FDT. During the early boot, it is expected that the boot CPU
must be a part of the FDT; otherwise, the kernel will raise a BUG and
fail to boot. For more information, refer to commit 36ae37e3436b0
("powerpc: Make boot_cpuid common between 32 and 64-bit"). Since it is
okay to have an offline CPU in the kdump FDT, no action is taken in case
of CPU removal.

There are two system calls, `kexec_file_load` and `kexec_load`, used to
load the kdump image. Few changes have been made to ensure kernel can
safely update the FDT of kdump image loaded using both system calls.

For kexec_file_load syscall the kdump image is prepared in kernel. So to
support an increasing number of CPUs, the FDT is constructed with extra
buffer space to ensure it can accommodate a possible number of CPU
nodes. Additionally, a call to fdt_pack (which trims the unused space
once the FDT is prepared) is avoided if this feature is enabled.

For the kexec_load syscall, the FDT is updated only if the
KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag is passed to the kernel by
userspace (kexec tools). When userspace passes this flag to the kernel,
it indicates that the FDT is built to accommodate possible CPUs, and the
FDT segment is excluded from SHA calculation, making it safe to update.

The changes related to this feature are kept under the CRASH_HOTPLUG
config, and it is enabled by default.

Signed-off-by: Sourabh Jain 
Cc: Akhil Raj 
Cc: Andrew Morton 
Cc: Aneesh Kumar K.V 
Cc: Baoquan He 
Cc: Borislav Petkov (AMD) 
Cc: Boris Ostrovsky 
Cc: Christophe Leroy 
Cc: Dave Hansen 
Cc: Dave Young 
Cc: David Hildenbrand 
Cc: Greg Kroah-Hartman 
Cc: Hari Bathini 
Cc: Laurent Dufour 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Mimi Zohar 
Cc: Naveen N Rao 
Cc: Oscar Salvador 
Cc: Thomas Gleixner 
Cc: Valentin Schneider 
Cc: Vivek Goyal 
Cc: ke...@lists.infradead.org
Cc: x...@kernel.org
---
  arch/powerpc/Kconfig  |  4 ++
  arch/powerpc/include/asm/kexec.h  |  6 +++
  arch/powerpc/kexec/core_64.c  | 69 +++
  arch/powerpc/kexec/elf_64.c   | 12 +-
  arch/powerpc/kexec/file_load_64.c | 15 +++
  5 files changed, 105 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 414b978b8010..91d7bb0b81ee 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -682,6 +682,10 @@ config RELOCATABLE_TEST
  config ARCH_SUPPORTS_CRA

Re: [PATCH v7 1/3] powerpc: make fadump resilient with memory add/remove events

2024-01-23 Thread Hari Bathini
e across the crashed and fadump kernel.

Note: if either first/crashed kernel or second/fadump kernel do not have
the changes introduced here then kernel fail to collect the dump and
prints relevant error message on the console.

Signed-off-by: Sourabh Jain 
Cc: Aditya Gupta 
Cc: Aneesh Kumar K.V 
Cc: Hari Bathini 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Naveen N Rao 
---
  arch/powerpc/include/asm/fadump-internal.h   |  31 +-
  arch/powerpc/kernel/fadump.c | 355 +++
  arch/powerpc/platforms/powernv/opal-fadump.c |  18 +-
  arch/powerpc/platforms/pseries/rtas-fadump.c |  23 +-
  4 files changed, 242 insertions(+), 185 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump-internal.h 
b/arch/powerpc/include/asm/fadump-internal.h
index 27f9e11eda28..a632e9708610 100644
--- a/arch/powerpc/include/asm/fadump-internal.h
+++ b/arch/powerpc/include/asm/fadump-internal.h
@@ -42,13 +42,40 @@ static inline u64 fadump_str_to_u64(const char *str)
  
  #define FADUMP_CPU_UNKNOWN		(~((u32)0))
  
-#define FADUMP_CRASH_INFO_MAGIC		fadump_str_to_u64("FADMPINF")

+/*
+ * The introduction of new fields in the fadump crash info header has
+ * led to a change in the magic key from `FADMPINF` to `FADMPSIG` for
+ * identifying a kernel crash from an old kernel.
+ *
+ * To prevent the need for further changes to the magic number in the
+ * event of future modifications to the fadump crash info header, a
+ * version field has been introduced to track the fadump crash info
+ * header version.
+ *
+ * Consider a few points before adding new members to the fadump crash info
+ * header structure:
+ *
+ *  - Append new members; avoid adding them in between.
+ *  - Non-primitive members should have a size member as well.
+ *  - For every change in the fadump header, increment the
+ *fadump header version. This helps the updated kernel decide how to
+ *handle kernel dumps from older kernels.
+ */
+#define FADUMP_CRASH_INFO_MAGIC_OLDfadump_str_to_u64("FADMPINF")
+#define FADUMP_CRASH_INFO_MAGICfadump_str_to_u64("FADMPSIG")
+#define FADUMP_HEADER_VERSION  1
  
  /* fadump crash info structure */

  struct fadump_crash_info_header {
u64 magic_number;
-   u64 elfcorehdr_addr;
+   u32 version;
u32 crashing_cpu;



+   u64 elfcorehdr_addr;
+   u64 elfcorehdr_size;


The fadump_crash_info_header structure is meant to share info across
reboots. Now that the elfcorehdr is prepared in the second kernel, and
dump capture of an older kernel is not supported, get rid of
elfcorehdr_addr & elfcorehdr_size from the fadump_crash_info_header
structure and put them in the fw_dump structure instead.


+   u64 vmcoreinfo_raddr;
+   u64 vmcoreinfo_size;
+   u32 pt_regs_sz;
+   u32 cpu_mask_sz;
struct pt_regs  regs;
struct cpumask  cpu_mask;
  };
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index d14eda1e8589..eb9132538268 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -53,8 +53,6 @@ static struct kobject *fadump_kobj;
  static atomic_t cpus_in_fadump;
  static DEFINE_MUTEX(fadump_mutex);
  
-static struct fadump_mrange_info crash_mrange_info = { "crash", NULL, 0, 0, 0, false };

-
  #define RESERVED_RNGS_SZ  16384 /* 16K - 128 entries */
  #define RESERVED_RNGS_CNT (RESERVED_RNGS_SZ / \
 sizeof(struct fadump_memory_range))
@@ -373,12 +371,6 @@ static unsigned long __init get_fadump_area_size(void)
size = PAGE_ALIGN(size);
size += fw_dump.boot_memory_size;
size += sizeof(struct fadump_crash_info_header);
-   size += sizeof(struct elfhdr); /* ELF core header.*/
-   size += sizeof(struct elf_phdr); /* place holder for cpu notes */
-   /* Program headers for crash memory regions. */
-   size += sizeof(struct elf_phdr) * (memblock_num_regions(memory) + 2);
-
-   size = PAGE_ALIGN(size);
  
  	/* This is to hold kernel metadata on platforms that support it */

size += (fw_dump.ops->fadump_get_metadata_size ?
@@ -931,36 +923,6 @@ static inline int fadump_add_mem_range(struct 
fadump_mrange_info *mrange_info,
return 0;
  }
  
-static int fadump_exclude_reserved_area(u64 start, u64 end)

-{
-   u64 ra_start, ra_end;
-   int ret = 0;
-
-   ra_start = fw_dump.reserve_dump_area_start;
-   ra_end = ra_start + fw_dump.reserve_dump_area_size;
-
-   if ((ra_start < end) && (ra_end > start)) {
-   if ((start < ra_start) && (end > ra_end)) {
-   ret = fadump_add_mem_range(_mrange_info,
-  start, ra_start);
-   if (ret)
-   return ret;
-
-   ret = fa

Re: [PATCHv9 2/2] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt

2024-01-08 Thread Hari Bathini




On 09/01/24 9:57 am, Hari Bathini wrote:

Hi Michael,



Sorry, Michael.
I am just about getting back to work and I spoke too soon.
You already seem to have posted a set with the approach you had in mind:

  https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=388350

Thanks
Hari


I am fine with either approach. I was trying to address your concerns
in my way. Looking for your inputs here on how to go about this now..

On 29/11/23 7:00 am, Pingfan Liu wrote:

Hi Hari,


On Mon, Nov 27, 2023 at 12:30 PM Hari Bathini  
wrote:


Hi Pingfan, Michael,

On 17/10/23 4:03 pm, Hari Bathini wrote:



On 17/10/23 7:58 am, Pingfan Liu wrote:

*** Idea ***
For kexec -p, the boot cpu can be not the cpu0, this causes the 
problem

of allocating memory for paca_ptrs[]. However, in theory, there is no
requirement to assign cpu's logical id as its present sequence in the
device tree. But there is something like cpu_first_thread_sibling(),
which makes assumption on the mapping inside a core. Hence partially
loosening the mapping, i.e. unbind the mapping of core while keep the
mapping inside a core.

*** Implement ***
At this early stage, there are plenty of memory to utilize. Hence, 
this

patch allocates interim memory to link the cpu info on a list, then
reorder cpus by changing the list head. As a result, there is a rotate
shift between the sequence number in dt and the cpu logical number.

*** Result ***
After this patch, a boot-cpu's logical id will always be mapped 
into the

range [0,threads_per_core).

Besides this, at this phase, all threads in the boot core are 
forced to

be onlined. This restriction will be lifted in a later patch with
extra effort.

Signed-off-by: Pingfan Liu 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: Mahesh Salgaonkar 
Cc: Wen Xiong 
Cc: Baoquan He 
Cc: Ming Lei 
Cc: Sourabh Jain 
Cc: Hari Bathini 
Cc: ke...@lists.infradead.org
To: linuxppc-dev@lists.ozlabs.org


Thanks for working on this, Pingfan.
Looks good to me.

Acked-by: Hari Bathini 



On second thoughts, probably better off with no impact for
bootcpu < nr_cpu_ids case and changing only two cores logical
numbering otherwise. Something like the below (Please share
your thoughts):



I am afraid that it may not be as ideal as it looks, considering the
following factors:
-1. For the case of 'bootcpu < nr_cpu_ids', crash can happen evenly
across any cpu in the system, which seriously undermines the
protection intended here (Under the most optimistic scenario, there is
a 50% chance of success)

-2. For the re-ordering of logical numbering, IMHO, if there is
concern that re-ordering will break something, the partial re-ordering
can not avoid that.  We ought to spot probable hazards so as to ease
worries.


Thanks,

Pingfan


diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index ec82f5bda908..78a8312aa8c4 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -76,7 +76,9 @@ u64 ppc64_rma_size;
   unsigned int boot_cpu_node_count __ro_after_init;
   #endif
   static phys_addr_t first_memblock_size;
+#ifdef CONFIG_SMP
   static int __initdata boot_cpu_count;
+#endif

   static int __init early_parse_mem(char *p)
   {
@@ -357,6 +359,25 @@ static int __init early_init_dt_scan_cpus(unsigned
long node,
 fdt_boot_cpuid_phys(initial_boot_params)) {
 found = boot_cpu_count;
 found_thread = i;
+   /*
+    * Map boot-cpu logical id into the range
+    * of [0, thread_per_core) if it can't be
+    * accommodated within nr_cpu_ids.
+    */
+   if (i != boot_cpu_count && boot_cpu_count >= 
nr_cpu_ids) {

+   boot_cpuid = i;
+   DBG("Logical CPU number for boot CPU 
changed from %d to %d\n",

+   boot_cpu_count, i);
+   } else {
+   boot_cpuid = boot_cpu_count;
+   }
+
+   /* Ensure boot thread is accounted for in 
nr_cpu_ids */

+   if (boot_cpuid >= nr_cpu_ids) {
+   set_nr_cpu_ids(boot_cpuid + 1);
+   DBG("Adjusted nr_cpu_ids to %u, to 
include boot CPU.\n",

+   nr_cpu_ids);
+   }
 }
   #ifdef CONFIG_SMP
 /* logical cpu id is always 0 on UP kernels */
@@ -368,9 +389,8 @@ static int __init early_init_dt_scan_cpus(unsigned
long node,
 if (found < 0)
 return 0;

-   DBG("boot cpu: logical %d physical %d\n", found,
+   DBG("boot cpu: logical %d physical %d\n", boot_cpuid,
 be32_to_cpu(intserv[found_thread]));
-   boot_cpuid = found;

 

Re: [PATCHv9 2/2] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt

2024-01-08 Thread Hari Bathini

Hi Michael,

I am fine with either approach. I was trying to address your concerns
in my way. Looking for your inputs here on how to go about this now..

On 29/11/23 7:00 am, Pingfan Liu wrote:

Hi Hari,


On Mon, Nov 27, 2023 at 12:30 PM Hari Bathini  wrote:


Hi Pingfan, Michael,

On 17/10/23 4:03 pm, Hari Bathini wrote:



On 17/10/23 7:58 am, Pingfan Liu wrote:

*** Idea ***
For kexec -p, the boot cpu can be not the cpu0, this causes the problem
of allocating memory for paca_ptrs[]. However, in theory, there is no
requirement to assign cpu's logical id as its present sequence in the
device tree. But there is something like cpu_first_thread_sibling(),
which makes assumption on the mapping inside a core. Hence partially
loosening the mapping, i.e. unbind the mapping of core while keep the
mapping inside a core.

*** Implement ***
At this early stage, there are plenty of memory to utilize. Hence, this
patch allocates interim memory to link the cpu info on a list, then
reorder cpus by changing the list head. As a result, there is a rotate
shift between the sequence number in dt and the cpu logical number.

*** Result ***
After this patch, a boot-cpu's logical id will always be mapped into the
range [0,threads_per_core).

Besides this, at this phase, all threads in the boot core are forced to
be onlined. This restriction will be lifted in a later patch with
extra effort.

Signed-off-by: Pingfan Liu 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: Mahesh Salgaonkar 
Cc: Wen Xiong 
Cc: Baoquan He 
Cc: Ming Lei 
Cc: Sourabh Jain 
Cc: Hari Bathini 
Cc: ke...@lists.infradead.org
To: linuxppc-dev@lists.ozlabs.org


Thanks for working on this, Pingfan.
Looks good to me.

Acked-by: Hari Bathini 



On second thoughts, probably better off with no impact for
bootcpu < nr_cpu_ids case and changing only two cores logical
numbering otherwise. Something like the below (Please share
your thoughts):



I am afraid that it may not be as ideal as it looks, considering the
following factors:
-1. For the case of 'bootcpu < nr_cpu_ids', crash can happen evenly
across any cpu in the system, which seriously undermines the
protection intended here (Under the most optimistic scenario, there is
a 50% chance of success)

-2. For the re-ordering of logical numbering, IMHO, if there is
concern that re-ordering will break something, the partial re-ordering
can not avoid that.  We ought to spot probable hazards so as to ease
worries.


Thanks,

Pingfan


diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index ec82f5bda908..78a8312aa8c4 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -76,7 +76,9 @@ u64 ppc64_rma_size;
   unsigned int boot_cpu_node_count __ro_after_init;
   #endif
   static phys_addr_t first_memblock_size;
+#ifdef CONFIG_SMP
   static int __initdata boot_cpu_count;
+#endif

   static int __init early_parse_mem(char *p)
   {
@@ -357,6 +359,25 @@ static int __init early_init_dt_scan_cpus(unsigned
long node,
 fdt_boot_cpuid_phys(initial_boot_params)) {
 found = boot_cpu_count;
 found_thread = i;
+   /*
+* Map boot-cpu logical id into the range
+* of [0, thread_per_core) if it can't be
+* accommodated within nr_cpu_ids.
+*/
+   if (i != boot_cpu_count && boot_cpu_count >= 
nr_cpu_ids) {
+   boot_cpuid = i;
+   DBG("Logical CPU number for boot CPU changed from %d 
to %d\n",
+   boot_cpu_count, i);
+   } else {
+   boot_cpuid = boot_cpu_count;
+   }
+
+   /* Ensure boot thread is accounted for in nr_cpu_ids */
+   if (boot_cpuid >= nr_cpu_ids) {
+   set_nr_cpu_ids(boot_cpuid + 1);
+   DBG("Adjusted nr_cpu_ids to %u, to include boot 
CPU.\n",
+   nr_cpu_ids);
+   }
 }
   #ifdef CONFIG_SMP
 /* logical cpu id is always 0 on UP kernels */
@@ -368,9 +389,8 @@ static int __init early_init_dt_scan_cpus(unsigned
long node,
 if (found < 0)
 return 0;

-   DBG("boot cpu: logical %d physical %d\n", found,
+   DBG("boot cpu: logical %d physical %d\n", boot_cpuid,
 be32_to_cpu(intserv[found_thread]));
-   boot_cpuid = found;

 boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);

diff --git a/arch/powerpc/kernel/setup-common.c
b/arch/powerpc/kernel/setup-common.c
index b7b733474b60..f7179525c774 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -409,6 +409,12 @@ static void __init cp

[PATCH 2/2] powerpc/bpf: enable kfunc call

2023-12-20 Thread Hari Bathini
With module addresses supported, override bpf_jit_supports_kfunc_call()
to enable kfunc support. Module address offsets can be more than 32 bits
long, so also override bpf_jit_supports_far_kfunc_call() to enable 64-bit
pointers.

Signed-off-by: Hari Bathini 
---
 arch/powerpc/net/bpf_jit_comp.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 0f9a21783329..a6151a5ef9a5 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -359,3 +359,13 @@ void bpf_jit_free(struct bpf_prog *fp)
 
bpf_prog_unlock_free(fp);
 }
+
+bool bpf_jit_supports_kfunc_call(void)
+{
+   return true;
+}
+
+bool bpf_jit_supports_far_kfunc_call(void)
+{
+   return true;
+}
-- 
2.43.0



[PATCH 1/2] powerpc/bpf: ensure module addresses are supported

2023-12-20 Thread Hari Bathini
Currently, the BPF JIT code on powerpc assumes all BPF functions and
helpers are kernel text. This is false for the kfunc case, as function
addresses are mostly module addresses there. Ensure module addresses
are supported to enable kfunc support.

This effectively reverts commit feb6307289d8 ("powerpc64/bpf: Optimize
instruction sequence used for function calls") and commit 43d636f8b4fd
("powerpc64/bpf elfv1: Do not load TOC before calling functions") that
assumed only kernel text for bpf functions/helpers.

Also, commit b10cb163c4b3 ("powerpc64/bpf elfv2: Setup kernel TOC in
r2 on entry") that paved the way for the commits mentioned above is
reverted.
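
For reference, the long-form call sequence that a module address needs
looks roughly like the below. This is a sketch only, written against the
JIT's own EMIT()/PPC_RAW_* helpers; PPC_LI64() and the use of r12 as the
call target register are assumptions based on the existing powerpc64 JIT,
not new code from this patch.

/* Assumes the arch/powerpc/net JIT context, i.e. with "bpf_jit.h" included. */
static int emit_module_call(u32 *image, struct codegen_context *ctx, u64 func_addr)
{
        /* Materialize the full 64-bit target address in r12 ... */
        PPC_LI64(_R12, func_addr);
        /* ... then branch via CTR so the callee can set up its own TOC. */
        EMIT(PPC_RAW_MTCTR(_R12));
        EMIT(PPC_RAW_BCTRL());
        return 0;
}

The shorter sequence introduced by feb6307289d8 is only valid when the
target is guaranteed to be core kernel text, which no longer holds once
kfuncs are allowed.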

Signed-off-by: Hari Bathini 
---
 arch/powerpc/net/bpf_jit.h|  2 +-
 arch/powerpc/net/bpf_jit_comp32.c |  8 +--
 arch/powerpc/net/bpf_jit_comp64.c | 90 +--
 3 files changed, 52 insertions(+), 48 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index cdea5dccaefe..48503caa5b58 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -160,7 +160,7 @@ static inline void bpf_clear_seen_register(struct 
codegen_context *ctx, int i)
 }
 
 void bpf_jit_init_reg_mapping(struct codegen_context *ctx);
-int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context 
*ctx, u64 func);
+void bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct 
codegen_context *ctx, u64 func);
 int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct 
codegen_context *ctx,
   u32 *addrs, int pass, bool extra_pass);
 void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
b/arch/powerpc/net/bpf_jit_comp32.c
index 2f39c50ca729..1236a75c04ea 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -201,7 +201,7 @@ void bpf_jit_build_epilogue(u32 *image, struct 
codegen_context *ctx)
 }
 
 /* Relative offset needs to be calculated based on final image location */
-int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context 
*ctx, u64 func)
+void bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct 
codegen_context *ctx, u64 func)
 {
s32 rel = (s32)func - (s32)(fimage + ctx->idx);
 
@@ -214,8 +214,6 @@ int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, 
struct codegen_context *
EMIT(PPC_RAW_MTCTR(_R0));
EMIT(PPC_RAW_BCTRL());
}
-
-   return 0;
 }
 
 static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 
out)
@@ -1054,9 +1052,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
u32 *fimage, struct code
EMIT(PPC_RAW_STW(bpf_to_ppc(BPF_REG_5), _R1, 
12));
}
 
-   ret = bpf_jit_emit_func_call_rel(image, fimage, ctx, 
func_addr);
-   if (ret)
-   return ret;
+   bpf_jit_emit_func_call_rel(image, fimage, ctx, 
func_addr);
 
EMIT(PPC_RAW_MR(bpf_to_ppc(BPF_REG_0) - 1, _R3));
EMIT(PPC_RAW_MR(bpf_to_ppc(BPF_REG_0), _R4));
diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index 79f23974a320..e7199c202a00 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -126,11 +126,6 @@ void bpf_jit_build_prologue(u32 *image, struct 
codegen_context *ctx)
 {
int i;
 
-#ifndef CONFIG_PPC_KERNEL_PCREL
-   if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2))
-   EMIT(PPC_RAW_LD(_R2, _R13, offsetof(struct paca_struct, 
kernel_toc)));
-#endif
-
/*
 * Initialize tail_call_cnt if we do tail calls.
 * Otherwise, put in NOPs so that it can be skipped when we are
@@ -145,6 +140,8 @@ void bpf_jit_build_prologue(u32 *image, struct 
codegen_context *ctx)
EMIT(PPC_RAW_NOP());
}
 
+#define BPF_TAILCALL_PROLOGUE_SIZE 8
+
if (bpf_has_stack_frame(ctx)) {
/*
 * We need a stack frame, but we don't necessarily need to
@@ -204,14 +201,9 @@ void bpf_jit_build_epilogue(u32 *image, struct 
codegen_context *ctx)
 
 static int bpf_jit_emit_func_call_hlp(u32 *image, struct codegen_context *ctx, 
u64 func)
 {
-   unsigned long func_addr = func ? ppc_function_entry((void *)func) : 0;
-   long reladdr;
-
-   if (WARN_ON_ONCE(!core_kernel_text(func_addr)))
-   return -EINVAL;
-
if (IS_ENABLED(CONFIG_PPC_KERNEL_PCREL)) {
-   reladdr = func_addr - CTX_NIA(ctx);
+   unsigned long func_addr = func ? ppc_function_entry((void 
*)func) : 0;
+   long reladdr = func_addr - CTX_NIA(ctx);
 
if (reladdr >= (long)SZ_8G || reladdr < -(long)SZ_8G) {
pr_err("eBPF: address of %ps out of ran

Re: [PATCH v14 5/6] powerpc: add crash CPU hotplug support

2023-12-19 Thread Hari Bathini

Hi Sourabh

On 11/12/23 2:00 pm, Sourabh Jain wrote:

Due to CPU/Memory hotplug or online/offline events the elfcorehdr
(which describes the CPUs and memory of the crashed kernel) and FDT
(Flattened Device Tree) of kdump image becomes outdated. Consequently,
attempting dump collection with an outdated elfcorehdr or FDT can lead
to failed or inaccurate dump collection.

Going forward CPU hotplug or online/offlice events are referred as


s/offlice/offline/


CPU/Memory add/remvoe events.


s/remvoe/remove/


The current solution to address the above issue involves monitoring the
CPU/memory add/remove events in userspace using udev rules and whenever
there are changes in CPU and memory resources, the entire kdump image
is loaded again. The kdump image includes kernel, initrd, elfcorehdr,
FDT, purgatory. Given that only elfcorehdr and FDT get outdated due to
CPU/Memory add/remove events, reloading the entire kdump image is
inefficient. More importantly, kdump remains inactive for a substantial
amount of time until the kdump reload completes.

To address the aforementioned issue, commit 247262756121 ("crash: add
generic infrastructure for crash hotplug support") added a generic
infrastructure that allows architectures to selectively update the kdump
image component during CPU or memory add/remove events within the kernel
itself.

In the event of a CPU or memory add/remove event, the generic crash
hotplug event handler, `crash_handle_hotplug_event()`, is triggered. It
then acquires the necessary locks to update the kdump image and invokes
the architecture-specific crash hotplug handler,
`arch_crash_handle_hotplug_event()`, to update the required kdump image
components.

This patch adds crash hotplug handler for PowerPC and enable support to
update the kdump image on CPU add/remove events. Support for memory
add/remove events is added in a subsequent patch with the title
"powerpc: add crash memory hotplug support."

As mentioned earlier, only the elfcorehdr and FDT kdump image components
need to be updated in the event of CPU or memory add/remove events.
However, the PowerPC architecture crash hotplug handler only updates the
FDT to enable crash hotplug support for CPU add/remove events. Here's
why.

The Elfcorehdr on PowerPC is built with possible CPUs, and thus, it does
not need an update on CPU add/remove events. On the other hand, the FDT
needs to be updated on CPU add events to include the newly added CPU. If
the FDT is not updated and the kernel crashes on a newly added CPU, the
kdump kernel will fail to boot due to the unavailability of the crashing
CPU in the FDT. During the early boot, it is expected that the boot CPU
must be a part of the FDT; otherwise, the kernel will raise a BUG and
fail to boot. For more information, refer to commit 36ae37e3436b0
("powerpc: Make boot_cpuid common between 32 and 64-bit"). Since it is
okay to have an offline CPU in the kdump FDT, no action is taken in case
of CPU removal.

There are two system calls, `kexec_file_load` and `kexec_load`, used to
load the kdump image. Few changes have been made to ensure kernel can
safely update the kdump FDT for both system calls.

For kexec_file_load syscall the kdump image is prepared in kernel. So to
support an increasing number of CPUs, the FDT is constructed with extra
buffer space to ensure it can accommodate a possible number of CPU
nodes. Additionally, a call to fdt_pack (which trims the unused space
once the FDT is prepared) is avoided for kdump image loading if this
feature is enabled.

For the kexec_load syscall, the FDT is updated only if both the
KEXEC_UPDATE_FDT and KEXEC_UPDATE_ELFCOREHDR kexec flags are passed to
the kernel by the kexec tool. Passing these flags to the kernel
indicates that the FDT is built to accommodate possible CPUs, and the
FDT segment is not considered for SHA calculation, making it safe to
update the FDT.

Commit 88a6f8994421 ("crash: memory and CPU hotplug sysfs attributes")
added a sysfs interface to indicate userspace (kdump udev rule) that
kernel will update the kdump image on CPU hotplug events, so kdump
reload can be avoided. Implement arch specific function
`arch_crash_hotplug_cpu_support()` to correctly advertise kernel
capability to update kdump image.

This feature is advertised to userspace when the following conditions
are met:

1. Kdump image is loaded using kexec_file_load system call.
2. Kdump image is loaded using kexec_load system and both
KEXEC_UPATE_ELFCOREHDR and KEXEC_UPDATE_FDT kexec flags are
passed to kernel.

The changes related to this feature are kept under the CRASH_HOTPLUG
config, and it is enabled by default.

Signed-off-by: Sourabh Jain 
Cc: Akhil Raj 
Cc: Andrew Morton 
Cc: Aneesh Kumar K.V 
Cc: Baoquan He 
Cc: Borislav Petkov (AMD) 
Cc: Boris Ostrovsky 
Cc: Christophe Leroy 
Cc: Dave Hansen 
Cc: Dave Young 
Cc: David Hildenbrand 
Cc: Eric DeVolder 
Cc: Greg Kroah-Hartman 
Cc: Hari Bathini 
Cc: Laurent Dufour 
Cc:

Re: [RFC PATCH 2/3] powerpc/fadump: pass additional parameters to dump capture kernel

2023-12-17 Thread Hari Bathini

Hi Sourabh,

On 15/12/23 2:12 pm, Sourabh Jain wrote:

Hello Hari,

On 06/12/23 01:48, Hari Bathini wrote:

For fadump case, passing additional parameters to dump capture kernel
helps in minimizing the memory footprint for it and also provides the
flexibility to disable components/modules, like hugepages, that are
hindering the boot process of the special dump capture environment.

Set up a dedicated parameter area to be passed to the capture kernel.
This area type is defined as RTAS_FADUMP_PARAM_AREA. Sysfs attribute
'/sys/kernel/fadump/bootargs_append' is exported to the userspace to
specify the additional parameters to be passed to the capture kernel

Signed-off-by: Hari Bathini 
---
  arch/powerpc/include/asm/fadump-internal.h   |  3 +
  arch/powerpc/kernel/fadump.c | 80 
  arch/powerpc/platforms/powernv/opal-fadump.c |  6 +-
  arch/powerpc/platforms/pseries/rtas-fadump.c | 35 -
  arch/powerpc/platforms/pseries/rtas-fadump.h | 11 ++-
  5 files changed, 126 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump-internal.h 
b/arch/powerpc/include/asm/fadump-internal.h

index b3956c400519..81629226b15f 100644
--- a/arch/powerpc/include/asm/fadump-internal.h
+++ b/arch/powerpc/include/asm/fadump-internal.h
@@ -97,6 +97,8 @@ struct fw_dump {
  unsigned long    cpu_notes_buf_vaddr;
  unsigned long    cpu_notes_buf_size;
+    unsigned long    param_area;
+
  /*
   * Maximum size supported by firmware to copy from source to
   * destination address per entry.
@@ -111,6 +113,7 @@ struct fw_dump {
  unsigned long    dump_active:1;
  unsigned long    dump_registered:1;
  unsigned long    nocma:1;
+    unsigned long    param_area_supported:1;
  struct fadump_ops    *ops;
  };
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 757681658dda..98f089747ac9 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1470,6 +1470,7 @@ static ssize_t mem_reserved_show(struct kobject 
*kobj,

  return sprintf(buf, "%ld\n", fw_dump.reserve_dump_area_size);
  }
+
  static ssize_t registered_show(struct kobject *kobj,
 struct kobj_attribute *attr,
 char *buf)
@@ -1477,6 +1478,43 @@ static ssize_t registered_show(struct kobject 
*kobj,

  return sprintf(buf, "%d\n", fw_dump.dump_registered);
  }
+static ssize_t bootargs_append_show(struct kobject *kobj,
+   struct kobj_attribute *attr,
+   char *buf)
+{
+    return sprintf(buf, "%s\n", (char *)__va(fw_dump.param_area));
+}
+
+static ssize_t bootargs_append_store(struct kobject *kobj,
+   struct kobj_attribute *attr,
+   const char *buf, size_t count)
+{
+    char *params;
+
+    if (!fw_dump.fadump_enabled || fw_dump.dump_active)
+    return -EPERM;
+
+    if (count >= COMMAND_LINE_SIZE)
+    return -EINVAL;
+
+    /*
+ * Fail here instead of handling this scenario with
+ * some silly workaround in capture kernel.
+ */
+    if (saved_command_line_len + count >= COMMAND_LINE_SIZE) {
+    pr_err("Appending parameters exceeds cmdline size!\n");
+    return -ENOSPC;
+    }
+
+    params = __va(fw_dump.param_area);
+    strscpy_pad(params, buf, COMMAND_LINE_SIZE);
+    /* Remove newline character at the end. */
+    if (params[count-1] == '\n')
+    params[count-1] = '\0';
+
+    return count;
+}
+
  static ssize_t registered_store(struct kobject *kobj,
  struct kobj_attribute *attr,
  const char *buf, size_t count)
@@ -1535,6 +1573,7 @@ static struct kobj_attribute release_attr = 
__ATTR_WO(release_mem);

  static struct kobj_attribute enable_attr = __ATTR_RO(enabled);
  static struct kobj_attribute register_attr = __ATTR_RW(registered);
  static struct kobj_attribute mem_reserved_attr = 
__ATTR_RO(mem_reserved);
+static struct kobj_attribute bootargs_append_attr = 
__ATTR_RW(bootargs_append);

  static struct attribute *fadump_attrs[] = {
  _attr.attr,
@@ -1611,6 +1650,46 @@ static void __init fadump_init_files(void)
  return;
  }
+/*
+ * Reserve memory to store additional parameters to be passed
+ * for fadump/capture kernel.
+ */
+static void fadump_setup_param_area(void)
+{
+    phys_addr_t range_start, range_end;
+
+    if (!fw_dump.param_area_supported || fw_dump.dump_active)
+    return;
+
+    /* This memory can't be used by PFW or bootloader as it is shared 
across kernels */

+    if (radix_enabled()) {
+    /*
+ * Anywhere in the upper half should be good enough as all 
memory

+ * is accessible in real mode.
+ */
+    range_start = memblock_end_of_DRAM() / 2;
+    range_end = memblock_end_of_DRAM();
+    } else {
+    /*
+ * Passing additional parameters is supported for hash MMU only
+ * if the first memory block size is 768MB or hi

[RFC PATCH 3/3] powerpc/fadump: pass additional parameters when fadump is active

2023-12-05 Thread Hari Bathini
Append the additional parameters passed/set in the dedicated parameter
area (RTAS_FADUMP_PARAM_AREA) to bootargs in fadump capture kernel.

Signed-off-by: Hari Bathini 
---
 arch/powerpc/include/asm/fadump.h |  2 ++
 arch/powerpc/kernel/fadump.c  | 34 +++
 arch/powerpc/kernel/prom.c|  3 +++
 3 files changed, 39 insertions(+)

diff --git a/arch/powerpc/include/asm/fadump.h 
b/arch/powerpc/include/asm/fadump.h
index 526a6a647312..ef40c9b6972a 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -19,12 +19,14 @@ extern int is_fadump_active(void);
 extern int should_fadump_crash(void);
 extern void crash_fadump(struct pt_regs *, const char *);
 extern void fadump_cleanup(void);
+extern void fadump_append_bootargs(void);
 
 #else  /* CONFIG_FA_DUMP */
 static inline int is_fadump_active(void) { return 0; }
 static inline int should_fadump_crash(void) { return 0; }
 static inline void crash_fadump(struct pt_regs *regs, const char *str) { }
 static inline void fadump_cleanup(void) { }
+static inline void fadump_append_bootargs(void) { }
 #endif /* !CONFIG_FA_DUMP */
 
 #if defined(CONFIG_FA_DUMP) || defined(CONFIG_PRESERVE_FA_DUMP)
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 98f089747ac9..9b1601f99f72 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -133,6 +133,40 @@ static int __init fadump_cma_init(void)
 static int __init fadump_cma_init(void) { return 1; }
 #endif /* CONFIG_CMA */
 
+/*
+ * Additional parameters meant for capture kernel are placed in a dedicated 
area.
+ * If this is capture kernel boot, append these parameters to bootargs.
+ */
+void __init fadump_append_bootargs(void)
+{
+   char *append_args;
+   size_t len;
+
+   if (!fw_dump.dump_active || !fw_dump.param_area_supported || 
!fw_dump.param_area)
+   return;
+
+   /* TODO: What to do if this overlaps with reserve area (radix case?) */
+   if (memblock_reserve(fw_dump.param_area, COMMAND_LINE_SIZE)) {
+   pr_warn("WARNING: Can't use additional parameters area!\n");
+   fw_dump.param_area = 0;
+   return;
+   }
+
+   append_args = (char *)fw_dump.param_area;
+   len = strlen(boot_command_line);
+
+   /*
+* Too late to fail even if cmdline size exceeds. Truncate additional 
parameters
+* to cmdline size and proceed anyway.
+*/
+   if (len + strlen(append_args) >= COMMAND_LINE_SIZE - 1)
+   pr_warn("WARNING: Appending parameters exceeds cmdline size. 
Truncating!\n");
+
+   pr_debug("Cmdline: %s\n", boot_command_line);
+   snprintf(boot_command_line + len, COMMAND_LINE_SIZE - len, " %s", 
append_args);
+   pr_info("Updated cmdline: %s\n", boot_command_line);
+}
+
 /* Scan the Firmware Assisted dump configuration details. */
 int __init early_init_dt_scan_fw_dump(unsigned long node, const char *uname,
  int depth, void *data)
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 0b5878c3125b..00a03d476cb9 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -791,6 +791,9 @@ void __init early_init_devtree(void *params)
 */
of_scan_flat_dt(early_init_dt_scan_chosen_ppc, boot_command_line);
 
+   /* Append additional parameters passed for fadump capture kernel */
+   fadump_append_bootargs();
+
/* Scan memory nodes and rebuild MEMBLOCKs */
early_init_dt_scan_root();
early_init_dt_scan_memory_ppc();
-- 
2.43.0



[RFC PATCH 2/3] powerpc/fadump: pass additional parameters to dump capture kernel

2023-12-05 Thread Hari Bathini
For the fadump case, passing additional parameters to the dump capture
kernel helps minimize its memory footprint and also provides the
flexibility to disable components/modules, like hugepages, that hinder
the boot process of the special dump capture environment.

Set up a dedicated parameter area to be passed to the capture kernel.
This area type is defined as RTAS_FADUMP_PARAM_AREA. The sysfs attribute
'/sys/kernel/fadump/bootargs_append' is exported to userspace to specify
the additional parameters to be passed to the capture kernel.
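
As an illustrative usage example (not from this patch): once this area is
set up, writing a string such as "nr_cpus=1 numa=off" to
/sys/kernel/fadump/bootargs_append from the production kernel would get
those options appended to the capture kernel's command line, e.g. to
shrink its memory footprint.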

Signed-off-by: Hari Bathini 
---
 arch/powerpc/include/asm/fadump-internal.h   |  3 +
 arch/powerpc/kernel/fadump.c | 80 
 arch/powerpc/platforms/powernv/opal-fadump.c |  6 +-
 arch/powerpc/platforms/pseries/rtas-fadump.c | 35 -
 arch/powerpc/platforms/pseries/rtas-fadump.h | 11 ++-
 5 files changed, 126 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump-internal.h 
b/arch/powerpc/include/asm/fadump-internal.h
index b3956c400519..81629226b15f 100644
--- a/arch/powerpc/include/asm/fadump-internal.h
+++ b/arch/powerpc/include/asm/fadump-internal.h
@@ -97,6 +97,8 @@ struct fw_dump {
unsigned long   cpu_notes_buf_vaddr;
unsigned long   cpu_notes_buf_size;
 
+   unsigned long   param_area;
+
/*
 * Maximum size supported by firmware to copy from source to
 * destination address per entry.
@@ -111,6 +113,7 @@ struct fw_dump {
unsigned long   dump_active:1;
unsigned long   dump_registered:1;
unsigned long   nocma:1;
+   unsigned long   param_area_supported:1;
 
struct fadump_ops   *ops;
 };
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 757681658dda..98f089747ac9 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1470,6 +1470,7 @@ static ssize_t mem_reserved_show(struct kobject *kobj,
return sprintf(buf, "%ld\n", fw_dump.reserve_dump_area_size);
 }
 
+
 static ssize_t registered_show(struct kobject *kobj,
   struct kobj_attribute *attr,
   char *buf)
@@ -1477,6 +1478,43 @@ static ssize_t registered_show(struct kobject *kobj,
return sprintf(buf, "%d\n", fw_dump.dump_registered);
 }
 
+static ssize_t bootargs_append_show(struct kobject *kobj,
+  struct kobj_attribute *attr,
+  char *buf)
+{
+   return sprintf(buf, "%s\n", (char *)__va(fw_dump.param_area));
+}
+
+static ssize_t bootargs_append_store(struct kobject *kobj,
+  struct kobj_attribute *attr,
+  const char *buf, size_t count)
+{
+   char *params;
+
+   if (!fw_dump.fadump_enabled || fw_dump.dump_active)
+   return -EPERM;
+
+   if (count >= COMMAND_LINE_SIZE)
+   return -EINVAL;
+
+   /*
+* Fail here instead of handling this scenario with
+* some silly workaround in capture kernel.
+*/
+   if (saved_command_line_len + count >= COMMAND_LINE_SIZE) {
+   pr_err("Appending parameters exceeds cmdline size!\n");
+   return -ENOSPC;
+   }
+
+   params = __va(fw_dump.param_area);
+   strscpy_pad(params, buf, COMMAND_LINE_SIZE);
+   /* Remove newline character at the end. */
+   if (params[count-1] == '\n')
+   params[count-1] = '\0';
+
+   return count;
+}
+
 static ssize_t registered_store(struct kobject *kobj,
struct kobj_attribute *attr,
const char *buf, size_t count)
@@ -1535,6 +1573,7 @@ static struct kobj_attribute release_attr = 
__ATTR_WO(release_mem);
 static struct kobj_attribute enable_attr = __ATTR_RO(enabled);
 static struct kobj_attribute register_attr = __ATTR_RW(registered);
 static struct kobj_attribute mem_reserved_attr = __ATTR_RO(mem_reserved);
+static struct kobj_attribute bootargs_append_attr = __ATTR_RW(bootargs_append);
 
 static struct attribute *fadump_attrs[] = {
&enable_attr.attr,
@@ -1611,6 +1650,46 @@ static void __init fadump_init_files(void)
return;
 }
 
+/*
+ * Reserve memory to store additional parameters to be passed
+ * for fadump/capture kernel.
+ */
+static void fadump_setup_param_area(void)
+{
+   phys_addr_t range_start, range_end;
+
+   if (!fw_dump.param_area_supported || fw_dump.dump_active)
+   return;
+
+   /* This memory can't be used by PFW or bootloader as it is shared 
across kernels */
+   if (radix_enabled()) {
+   /*
+* Anywhere in the upper half should be good enough as all 
memory
+* is accessible in real mode.
+*/
+   range_start = memblock_end_of_DRAM() / 2;
+   rang

[RFC PATCH 1/3] powerpc/pseries/fadump: add support for multiple boot memory regions

2023-12-05 Thread Hari Bathini
From: Sourabh Jain 

Currently, fadump on pseries assumes a single boot memory region even
though f/w supports more than one boot memory region. Add support for
more boot memory regions to make the implementation flexible for any
enhancements that introduce other region types. For this, rtas memory
structure for fadump is updated to have multiple boot memory regions
instead of just one. Additionally, methods responsible for creating
the fadump memory structure during both the first and second kernel
boot have been modified to take these multiple boot memory regions
into account. Also, a new callback has been added to the fadump_ops
structure to get the maximum boot memory regions supported by the
platform.
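
To sketch the shape of the new hook (the opal side is in the diff below;
the pseries name and constant here are placeholders), each platform
backend simply reports how many boot memory regions it can describe and
the generic code clamps against that value:

/*
 * Illustrative only: RTAS_FADUMP_MAX_BOOT_MEM_REGS is a placeholder for
 * however many region entries the rtas fadump memory structure can hold
 * on pseries.
 */
static int rtas_fadump_max_boot_mem_rgns(void)
{
	return RTAS_FADUMP_MAX_BOOT_MEM_REGS;
}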

Signed-off-by: Sourabh Jain 
Signed-off-by: Hari Bathini 
---
 arch/powerpc/include/asm/fadump-internal.h   |   2 +-
 arch/powerpc/kernel/fadump.c |  27 +-
 arch/powerpc/platforms/powernv/opal-fadump.c |   8 +
 arch/powerpc/platforms/pseries/rtas-fadump.c | 258 ---
 arch/powerpc/platforms/pseries/rtas-fadump.h |  26 +-
 5 files changed, 199 insertions(+), 122 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump-internal.h 
b/arch/powerpc/include/asm/fadump-internal.h
index 27f9e11eda28..b3956c400519 100644
--- a/arch/powerpc/include/asm/fadump-internal.h
+++ b/arch/powerpc/include/asm/fadump-internal.h
@@ -129,6 +129,7 @@ struct fadump_ops {
  struct seq_file *m);
void(*fadump_trigger)(struct fadump_crash_info_header *fdh,
  const char *msg);
+   int (*fadump_max_boot_mem_rgns)(void);
 };
 
 /* Helper functions */
@@ -136,7 +137,6 @@ s32 __init fadump_setup_cpu_notes_buf(u32 num_cpus);
 void fadump_free_cpu_notes_buf(void);
 u32 *__init fadump_regs_to_elf_notes(u32 *buf, struct pt_regs *regs);
 void __init fadump_update_elfcore_header(char *bufp);
-bool is_fadump_boot_mem_contiguous(void);
 bool is_fadump_reserved_mem_contiguous(void);
 
 #else /* !CONFIG_PRESERVE_FA_DUMP */
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index d14eda1e8589..757681658dda 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -222,28 +222,6 @@ static bool is_fadump_mem_area_contiguous(u64 d_start, u64 
d_end)
return ret;
 }
 
-/*
- * Returns true, if there are no holes in boot memory area,
- * false otherwise.
- */
-bool is_fadump_boot_mem_contiguous(void)
-{
-   unsigned long d_start, d_end;
-   bool ret = false;
-   int i;
-
-   for (i = 0; i < fw_dump.boot_mem_regs_cnt; i++) {
-   d_start = fw_dump.boot_mem_addr[i];
-   d_end   = d_start + fw_dump.boot_mem_sz[i];
-
-   ret = is_fadump_mem_area_contiguous(d_start, d_end);
-   if (!ret)
-   break;
-   }
-
-   return ret;
-}
-
 /*
  * Returns true, if there are no holes in reserved memory area,
  * false otherwise.
@@ -389,10 +367,11 @@ static unsigned long __init get_fadump_area_size(void)
 static int __init add_boot_mem_region(unsigned long rstart,
  unsigned long rsize)
 {
+   int max_boot_mem_rgns = fw_dump.ops->fadump_max_boot_mem_rgns();
int i = fw_dump.boot_mem_regs_cnt++;
 
-   if (fw_dump.boot_mem_regs_cnt > FADUMP_MAX_MEM_REGS) {
-   fw_dump.boot_mem_regs_cnt = FADUMP_MAX_MEM_REGS;
+   if (fw_dump.boot_mem_regs_cnt > max_boot_mem_rgns) {
+   fw_dump.boot_mem_regs_cnt = max_boot_mem_rgns;
return 0;
}
 
diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c 
b/arch/powerpc/platforms/powernv/opal-fadump.c
index 964f464b1b0e..fa26c21a08d9 100644
--- a/arch/powerpc/platforms/powernv/opal-fadump.c
+++ b/arch/powerpc/platforms/powernv/opal-fadump.c
@@ -615,6 +615,13 @@ static void opal_fadump_trigger(struct 
fadump_crash_info_header *fdh,
pr_emerg("No backend support for MPIPL!\n");
 }
 
+/* FADUMP_MAX_MEM_REGS or lower */
+static int opal_fadump_max_boot_mem_rgns(void)
+{
+   return FADUMP_MAX_MEM_REGS;
+
+}
+
 static struct fadump_ops opal_fadump_ops = {
.fadump_init_mem_struct = opal_fadump_init_mem_struct,
.fadump_get_metadata_size   = opal_fadump_get_metadata_size,
@@ -627,6 +634,7 @@ static struct fadump_ops opal_fadump_ops = {
.fadump_process = opal_fadump_process,
.fadump_region_show = opal_fadump_region_show,
.fadump_trigger = opal_fadump_trigger,
+   .fadump_max_boot_mem_rgns   = opal_fadump_max_boot_mem_rgns,
 };
 
 void __init opal_fadump_dt_scan(struct fw_dump *fadump_conf, u64 node)
diff --git a/arch/powerpc/platforms/pseries/rtas-fadump.c 
b/arch/powerpc/platforms/pseries/rtas-fadump.c
index b5853e9fcc3c..1b05b4cefdfd 100644
--- a/arch/powerpc/platforms/pseries/rtas-fadump.c
+++ b/arch/powerpc/platforms/pseries/rtas-fadump.c
@@ -29,9

[RFC PATCH 0/3] powerpc/fadump: pass additional args to dump capture kernel

2023-12-05 Thread Hari Bathini
While fadump is a more reliable alternative to kdump dump capturing
method, it doesn't support passing additional parameters. Having
such support is desirable for two major reasons:

  1. It helps minimize the memory consumption of fadump dump capture
     kernel by disabling features that consume considerable amount of
     memory but have little significance for dump capture environment
     (eg. numa, cma, cgroup, etc.)
  2. It helps disable such features/components in dump capture kernel
     that are unstable and/or are being debugged.

This patch series adds support to pass additional parameters to fadump
capture kernel to make it more desirable. For this, a dedicated area
is shared between the production kernel and the capture kernel to carry these
additional parameters. This support is enabled only on pseries as of
now. The dedicated area is referred to as RTAS_FADUMP_PARAM_AREA.

In radix MMU mode, this dedicated area can be anywhere but in case of
hash MMU, it can only be in the first memory block to be accessible
during early boot. Enabling this feature support in both radix and
hash MMU modes but in hash MMU only when RMA size is 768MB or more
to avoid complex memory real estate with FW components.

The first patch adds support for multiple boot memory regions to make
addition of any new region types simpler. The second patch sets up the
parameter (dedicated) area to be passed to the capture kernel.
/sys/kernel/fadump/bootargs_append is exported to the userspace to
specify the additional parameters to be passed to the capture kernel.
The last patch appends the parameters to bootargs during capture
kernel boot.
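
To make the placement constraint concrete, here is a simplified sketch
of the reservation logic (the radix branch follows patch 2/3; the hash
branch and the memblock_phys_alloc_range() call are illustrative
assumptions):

static void __init fadump_setup_param_area(void)
{
	phys_addr_t range_start, range_end;

	if (!fw_dump.param_area_supported || fw_dump.dump_active)
		return;

	if (radix_enabled()) {
		/*
		 * All memory is accessible in real mode, so anywhere in
		 * the upper half is good enough.
		 */
		range_start = memblock_end_of_DRAM() / 2;
		range_end = memblock_end_of_DRAM();
	} else {
		/*
		 * Hash MMU: stay within the RMA so the area is reachable
		 * during early boot; skip for RMAs below 768MB.
		 */
		if (ppc64_rma_size < 0x30000000)
			return;
		range_start = 0;
		range_end = ppc64_rma_size;
	}

	fw_dump.param_area = memblock_phys_alloc_range(COMMAND_LINE_SIZE,
						       COMMAND_LINE_SIZE,
						       range_start, range_end);
	if (!fw_dump.param_area)
		pr_warn("Failed to reserve the fadump parameter area\n");
}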

Hari Bathini (2):
  powerpc/fadump: pass additional parameters to dump capture kernel
  powerpc/fadump: pass additional parameters when fadump is active

Sourabh Jain (1):
  powerpc/pseries/fadump: add support for multiple boot memory regions

 arch/powerpc/include/asm/fadump-internal.h   |   5 +-
 arch/powerpc/include/asm/fadump.h|   2 +
 arch/powerpc/kernel/fadump.c | 141 +++--
 arch/powerpc/kernel/prom.c   |   3 +
 arch/powerpc/platforms/powernv/opal-fadump.c |  14 +-
 arch/powerpc/platforms/pseries/rtas-fadump.c | 293 +--
 arch/powerpc/platforms/pseries/rtas-fadump.h |  29 +-
 7 files changed, 360 insertions(+), 127 deletions(-)

-- 
2.43.0



Re: [PATCHv9 2/2] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt

2023-11-26 Thread Hari Bathini

Hi Pingfan, Michael,

On 17/10/23 4:03 pm, Hari Bathini wrote:



On 17/10/23 7:58 am, Pingfan Liu wrote:

*** Idea ***
For kexec -p, the boot cpu can be not the cpu0, this causes the problem
of allocating memory for paca_ptrs[]. However, in theory, there is no
requirement to assign cpu's logical id as its present sequence in the
device tree. But there is something like cpu_first_thread_sibling(),
which makes assumption on the mapping inside a core. Hence partially
loosening the mapping, i.e. unbind the mapping of core while keep the
mapping inside a core.

*** Implement ***
At this early stage, there are plenty of memory to utilize. Hence, this
patch allocates interim memory to link the cpu info on a list, then
reorder cpus by changing the list head. As a result, there is a rotate
shift between the sequence number in dt and the cpu logical number.

*** Result ***
After this patch, a boot-cpu's logical id will always be mapped into the
range [0,threads_per_core).

Besides this, at this phase, all threads in the boot core are forced to
be onlined. This restriction will be lifted in a later patch with
extra effort.

Signed-off-by: Pingfan Liu 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: Mahesh Salgaonkar 
Cc: Wen Xiong 
Cc: Baoquan He 
Cc: Ming Lei 
Cc: Sourabh Jain 
Cc: Hari Bathini 
Cc: ke...@lists.infradead.org
To: linuxppc-dev@lists.ozlabs.org


Thanks for working on this, Pingfan.
Looks good to me.

Acked-by: Hari Bathini 



On second thoughts, probably better off with no impact for
bootcpu < nr_cpu_ids case and changing only two cores logical
numbering otherwise. Something like the below (Please share
your thoughts):

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index ec82f5bda908..78a8312aa8c4 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -76,7 +76,9 @@ u64 ppc64_rma_size;
 unsigned int boot_cpu_node_count __ro_after_init;
 #endif
 static phys_addr_t first_memblock_size;
+#ifdef CONFIG_SMP
 static int __initdata boot_cpu_count;
+#endif

 static int __init early_parse_mem(char *p)
 {
@@ -357,6 +359,25 @@ static int __init early_init_dt_scan_cpus(unsigned 
long node,

fdt_boot_cpuid_phys(initial_boot_params)) {
found = boot_cpu_count;
found_thread = i;
+   /*
+* Map boot-cpu logical id into the range
+* of [0, thread_per_core) if it can't be
+* accommodated within nr_cpu_ids.
+*/
+   if (i != boot_cpu_count && boot_cpu_count >= 
nr_cpu_ids) {
+   boot_cpuid = i;
+   DBG("Logical CPU number for boot CPU changed from %d 
to %d\n",
+   boot_cpu_count, i);
+   } else {
+   boot_cpuid = boot_cpu_count;
+   }
+
+   /* Ensure boot thread is accounted for in nr_cpu_ids */
+   if (boot_cpuid >= nr_cpu_ids) {
+   set_nr_cpu_ids(boot_cpuid + 1);
+   DBG("Adjusted nr_cpu_ids to %u, to include boot 
CPU.\n",
+   nr_cpu_ids);
+   }
}
 #ifdef CONFIG_SMP
/* logical cpu id is always 0 on UP kernels */
@@ -368,9 +389,8 @@ static int __init early_init_dt_scan_cpus(unsigned 
long node,

if (found < 0)
return 0;

-   DBG("boot cpu: logical %d physical %d\n", found,
+   DBG("boot cpu: logical %d physical %d\n", boot_cpuid,
be32_to_cpu(intserv[found_thread]));
-   boot_cpuid = found;

boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);

diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c

index b7b733474b60..f7179525c774 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -409,6 +409,12 @@ static void __init cpu_init_thread_core_maps(int tpc)

 u32 *cpu_to_phys_id = NULL;

+struct interrupt_server_node {
+   boolavail;
+   int len;
+   __be32 intserv[];
+};
+
 /**
  * setup_cpu_maps - initialize the following cpu maps:
  *  cpu_possible_mask
@@ -429,9 +435,13 @@ u32 *cpu_to_phys_id = NULL;
  */
 void __init smp_setup_cpu_maps(void)
 {
+   struct interrupt_server_node *core0_node = NULL, *bt_node = NULL;
+   int orig_boot_cpu = -1, orig_boot_thread = -1;
+   bool found_boot_cpu = false;
struct device_node *dn;
-   int cpu = 0;
int nthreads = 1;
+   int cpu = 0;
+   int j, len;

DBG("smp_setup_cpu_maps()\n");

@@ -442,9 +452,9 @@ void __init smp_setup_cpu_maps(void)
  __func__, nr_cpu_ids * sizeof(u32), __alignof__(u

Re: [PATCH v5 1/3] powerpc: make fadump resilient with memory add/remove events

2023-11-16 Thread Hari Bathini




On 17/11/23 11:01 am, Aneesh Kumar K V wrote:

On 11/17/23 10:03 AM, Sourabh Jain wrote:

Hi Aneesh,

Thanks for reviewing the patch.

On 15/11/23 10:14, Aneesh Kumar K.V wrote:

Sourabh Jain  writes:




diff --git a/arch/powerpc/include/asm/fadump-internal.h 
b/arch/powerpc/include/asm/fadump-internal.h
index 27f9e11eda28..7be3d8894520 100644
--- a/arch/powerpc/include/asm/fadump-internal.h
+++ b/arch/powerpc/include/asm/fadump-internal.h
@@ -42,7 +42,25 @@ static inline u64 fadump_str_to_u64(const char *str)
     #define FADUMP_CPU_UNKNOWN    (~((u32)0))
   -#define FADUMP_CRASH_INFO_MAGIC    fadump_str_to_u64("FADMPINF")
+/*
+ * The introduction of new fields in the fadump crash info header has
+ * led to a change in the magic key, from `FADMPINF` to `FADMPSIG`.
+ * This alteration ensures backward compatibility, enabling the kernel
+ * with the updated fadump crash info to handle kernel dumps from older
+ * kernels.
+ *
+ * To prevent the need for further changes to the magic number in the
+ * event of future modifications to the fadump header, a version field
+ * has been introduced to track the fadump crash info header version.
+ *
+ * Historically, there was no connection between the magic number and
+ * the fadump crash info header version. However, moving forward, the
+ * `FADMPINF` magic number in header will be treated as version 0, while
+ * the `FADMPSIG` magic number in header will include a version field to
+ * determine its version.
+ */
+#define FADUMP_CRASH_INFO_MAGIC    fadump_str_to_u64("FADMPSIG")
+#define FADUMP_VERSION    1


Can we keep the old magic details as

#define FADUMP_CRASH_INFO_MAGIC_OLD    fadump_str_to_u64("FADMPINF")
#define FADUMP_CRASH_INFO_MAGIC    fadump_str_to_u64("FADMPSIG")


Sure.


Also considering the struct need not be backward compatible, can we just
do

struct fadump_crash_info_header {
 u64    magic_number;
 u32    crashing_cpu;
 u64    elfcorehdr_addr;
 u64    elfcorehdr_size;
 u64    vmcoreinfo_raddr;
 u64    vmcoreinfo_size;
 struct pt_regs    regs;
 struct cpumask    cpu_mask;
};
static inline bool fadump_compatible(struct fadump_crash_info_header
*fdh)
{
 return (fdh->magic_number == FADUMP_CRASH_INFO_MAGIC)
}

and fail fadump if we find it not compatible?


Agree that it is unsafe to collect a dump with an incompatible fadump crash 
info header.

Given that I am updating the fadump crash info header, we can make a few 
arrangements
like adding a size field for the dynamic size attribute like pt_regs and 
cpumask to ensure
better compatibility in the future.

Additionally, let's introduce a version field to the fadump crash info header 
to avoid changing
the magic number in the future.



I am not sure whether we need to add all the complexity to enable supporting 
different fadump kernel
version. Is that even a possible use case with fadump? Can't we always assume 
that with fadump the
crash kernel and fadump kernel will be same version? if yes we can simply fail 
with a magic number
mismatch because that indicates an user config error?


If we decide not to support different kernel versions for production
kernel and capture kernel, We can make that implicit by adding kernel
version info of production kernel in the header and bailing out if
there is kernel version mismatch as magic could still match for two
different kernel versions.

I would personally prefer something like the below though:

struct fadump_crash_info_header {
u64 magic_number;
u32 version
u32 crashing_cpu;
u64 elfcorehdr_addr;
u64 elfcorehdr_size;
u64 vmcoreinfo_raddr;
u64 vmcoreinfo_size;
u8  kernel_version[];
u32 pt_regs_sz;
struct pt_regs  regs;
u32 cpu_mask_sz;
struct cpumask  cpu_mask;
};

if (magic_number != new_magic)
	goto err;	/* Error out */

if (kernel_version != capture_kernel_version) {
	if (pt_regs_sz == sizeof(struct pt_regs) && cpu_mask_sz == sizeof(struct cpumask))
		/*
		 * Warn about the kernel version mismatch and how data can be
		 * different across kernel versions and proceed anyway!
		 */
	else
		goto err;	/* Error out */
}

This ensures we warn and proceed in cases where it is less likely to
have issues capturing the kernel dump. This helps in a dev environment where
we are trying to debug an early boot crash - in which case the capture
kernel can't be the same kernel, as it would likely hit the same problem
while booting.

Thanks
Hari


Re: [PATCH v7 1/5] powerpc/code-patching: introduce patch_instructions()

2023-10-30 Thread Hari Bathini

Hi Aneesh,

On 30/10/23 6:32 pm, Aneesh Kumar K.V wrote:

Hari Bathini  writes:


patch_instruction() entails setting up pte, patching the instruction,
clearing the pte and flushing the tlb. If multiple instructions need
to be patched, every instruction would have to go through the above
drill unnecessarily. Instead, introduce patch_instructions() function
that sets up the pte, clears the pte and flushes the tlb only once
per page range of instructions to be patched. Duplicate most of the
patch_instruction() code instead of merging with it, to avoid the
performance degradation observed on ppc32, for patch_instruction(),
with the code path merged. Also, setup poking_init() always as BPF
expects poking_init() to be setup even when STRICT_KERNEL_RWX is off.

Signed-off-by: Hari Bathini 
Acked-by: Song Liu 



A lot of this is duplicate of patch_instruction(). Can we consolidate
thing between them?


True. The code was consolidated till v5 but had to duplicate most of it
to avoid performance degradation reported on ppc32:


https://lore.kernel.org/all/6cceb564-8b52-4d98-9118-92a914f48...@csgroup.eu/




---

Changes in v7:
* Fixed crash observed with !STRICT_RWX.


  arch/powerpc/include/asm/code-patching.h |   1 +
  arch/powerpc/lib/code-patching.c | 141 ++-
  2 files changed, 139 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/code-patching.h 
b/arch/powerpc/include/asm/code-patching.h
index 3f881548fb61..0e29ccf903d0 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -74,6 +74,7 @@ int create_cond_branch(ppc_inst_t *instr, const u32 *addr,
  int patch_branch(u32 *addr, unsigned long target, int flags);
  int patch_instruction(u32 *addr, ppc_inst_t instr);
  int raw_patch_instruction(u32 *addr, ppc_inst_t instr);
+int patch_instructions(u32 *addr, u32 *code, size_t len, bool repeat_instr);
  
  static inline unsigned long patch_site_addr(s32 *site)

  {
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index b00112d7ad46..e1c1fd9246d8 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -204,9 +204,6 @@ void __init poking_init(void)
  {
int ret;
  
-	if (!IS_ENABLED(CONFIG_STRICT_KERNEL_RWX))

-   return;
-
if (mm_patch_enabled())
ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
"powerpc/text_poke_mm:online",
@@ -378,6 +375,144 @@ int patch_instruction(u32 *addr, ppc_inst_t instr)
  }
  NOKPROBE_SYMBOL(patch_instruction);
  
+static int __patch_instructions(u32 *patch_addr, u32 *code, size_t len, bool repeat_instr)

+{
+   unsigned long start = (unsigned long)patch_addr;
+
+   /* Repeat instruction */
+   if (repeat_instr) {
+   ppc_inst_t instr = ppc_inst_read(code);
+
+   if (ppc_inst_prefixed(instr)) {
+   u64 val = ppc_inst_as_ulong(instr);
+
+   memset64((u64 *)patch_addr, val, len / 8);
+   } else {
+   u32 val = ppc_inst_val(instr);
+
+   memset32(patch_addr, val, len / 4);
+   }
+   } else {
+   memcpy(patch_addr, code, len);
+   }
+
+   smp_wmb();  /* smp write barrier */
+   flush_icache_range(start, start + len);
+   return 0;
+}
+
+/*
+ * A page is mapped and instructions that fit the page are patched.
+ * Assumes 'len' to be (PAGE_SIZE - offset_in_page(addr)) or below.
+ */
+static int __do_patch_instructions_mm(u32 *addr, u32 *code, size_t len, bool 
repeat_instr)
+{
+   struct mm_struct *patching_mm, *orig_mm;
+   unsigned long pfn = get_patch_pfn(addr);
+   unsigned long text_poke_addr;
+   spinlock_t *ptl;
+   u32 *patch_addr;
+   pte_t *pte;
+   int err;
+
+   patching_mm = __this_cpu_read(cpu_patching_context.mm);
+   text_poke_addr = __this_cpu_read(cpu_patching_context.addr);
+   patch_addr = (u32 *)(text_poke_addr + offset_in_page(addr));
+
+   pte = get_locked_pte(patching_mm, text_poke_addr, &ptl);
+   if (!pte)
+   return -ENOMEM;
+
+   __set_pte_at(patching_mm, text_poke_addr, pte, pfn_pte(pfn, 
PAGE_KERNEL), 0);
+
+   /* order PTE update before use, also serves as the hwsync */
+   asm volatile("ptesync" ::: "memory");
+
+   /* order context switch after arbitrary prior code */
+   isync();
+
+   orig_mm = start_using_temp_mm(patching_mm);
+
+   err = __patch_instructions(patch_addr, code, len, repeat_instr);
+
+   /* context synchronisation performed by __patch_instructions */
+   stop_using_temp_mm(patching_mm, orig_mm);
+
+   pte_clear(patching_mm, text_poke_addr, pte);
+   /*
+* ptesync to order PTE update before TLB invalidation done
+* by radix__local_flush_tlb_page_psize (in _tlbiel_va)
+*/
+   

Re: [PATCH v6 5/5] powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free]

2023-10-20 Thread Hari Bathini




On 19/10/23 11:41 am, Michael Ellerman wrote:

Hari Bathini  writes:

Use bpf_jit_binary_pack_alloc in powerpc jit. The jit engine first
writes the program to the rw buffer. When the jit is done, the program
is copied to the final location with bpf_jit_binary_pack_finalize.
With multiple jit_subprogs, bpf_jit_free is called on some subprograms
that haven't got bpf_jit_binary_pack_finalize() yet. Implement custom
bpf_jit_free() like in commit 1d5f82d9dd47 ("bpf, x86: fix freeing of
not-finalized bpf_prog_pack") to call bpf_jit_binary_pack_finalize(),
if necessary. As bpf_flush_icache() is not needed anymore, remove it.

Signed-off-by: Hari Bathini 
Acked-by: Song Liu 
---
  arch/powerpc/net/bpf_jit.h|  18 ++---
  arch/powerpc/net/bpf_jit_comp.c   | 106 ++
  arch/powerpc/net/bpf_jit_comp32.c |  13 ++--
  arch/powerpc/net/bpf_jit_comp64.c |  10 +--
  4 files changed, 96 insertions(+), 51 deletions(-)


This causes a crash at boot on my Power7 box:


Thanks, Michael.
Posted v7.

- Hari


[PATCH v7 1/5] powerpc/code-patching: introduce patch_instructions()

2023-10-20 Thread Hari Bathini
patch_instruction() entails setting up pte, patching the instruction,
clearing the pte and flushing the tlb. If multiple instructions need
to be patched, every instruction would have to go through the above
drill unnecessarily. Instead, introduce patch_instructions() function
that sets up the pte, clears the pte and flushes the tlb only once
per page range of instructions to be patched. Duplicate most of the
patch_instruction() code instead of merging with it, to avoid the
performance degradation observed on ppc32, for patch_instruction(),
with the code path merged. Also, always set up poking_init() as BPF
expects poking_init() to be set up even when STRICT_KERNEL_RWX is off.
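
As a caller's-eye illustration (this is essentially what the later BPF
patches do; callers serialize with text_mutex), patching a whole range
boils down to a single call instead of a per-instruction loop:

/*
 * Fill an unused region with trap instructions: one instruction,
 * repeated across the range.
 */
static int fill_with_traps(u32 *dst, size_t len)
{
	u32 insn = BREAKPOINT_INSTRUCTION;

	return patch_instructions(dst, &insn, len, true);
}

/*
 * Copy a prepared buffer (e.g. a JITed program) into its read-only
 * destination in one go.
 */
static int copy_code(u32 *dst, u32 *src, size_t len)
{
	return patch_instructions(dst, src, len, false);
}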

Signed-off-by: Hari Bathini 
Acked-by: Song Liu 
---

Changes in v7:
* Fixed crash observed with !STRICT_RWX.


 arch/powerpc/include/asm/code-patching.h |   1 +
 arch/powerpc/lib/code-patching.c | 141 ++-
 2 files changed, 139 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/code-patching.h 
b/arch/powerpc/include/asm/code-patching.h
index 3f881548fb61..0e29ccf903d0 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -74,6 +74,7 @@ int create_cond_branch(ppc_inst_t *instr, const u32 *addr,
 int patch_branch(u32 *addr, unsigned long target, int flags);
 int patch_instruction(u32 *addr, ppc_inst_t instr);
 int raw_patch_instruction(u32 *addr, ppc_inst_t instr);
+int patch_instructions(u32 *addr, u32 *code, size_t len, bool repeat_instr);
 
 static inline unsigned long patch_site_addr(s32 *site)
 {
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index b00112d7ad46..e1c1fd9246d8 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -204,9 +204,6 @@ void __init poking_init(void)
 {
int ret;
 
-   if (!IS_ENABLED(CONFIG_STRICT_KERNEL_RWX))
-   return;
-
if (mm_patch_enabled())
ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
"powerpc/text_poke_mm:online",
@@ -378,6 +375,144 @@ int patch_instruction(u32 *addr, ppc_inst_t instr)
 }
 NOKPROBE_SYMBOL(patch_instruction);
 
+static int __patch_instructions(u32 *patch_addr, u32 *code, size_t len, bool 
repeat_instr)
+{
+   unsigned long start = (unsigned long)patch_addr;
+
+   /* Repeat instruction */
+   if (repeat_instr) {
+   ppc_inst_t instr = ppc_inst_read(code);
+
+   if (ppc_inst_prefixed(instr)) {
+   u64 val = ppc_inst_as_ulong(instr);
+
+   memset64((u64 *)patch_addr, val, len / 8);
+   } else {
+   u32 val = ppc_inst_val(instr);
+
+   memset32(patch_addr, val, len / 4);
+   }
+   } else {
+   memcpy(patch_addr, code, len);
+   }
+
+   smp_wmb();  /* smp write barrier */
+   flush_icache_range(start, start + len);
+   return 0;
+}
+
+/*
+ * A page is mapped and instructions that fit the page are patched.
+ * Assumes 'len' to be (PAGE_SIZE - offset_in_page(addr)) or below.
+ */
+static int __do_patch_instructions_mm(u32 *addr, u32 *code, size_t len, bool 
repeat_instr)
+{
+   struct mm_struct *patching_mm, *orig_mm;
+   unsigned long pfn = get_patch_pfn(addr);
+   unsigned long text_poke_addr;
+   spinlock_t *ptl;
+   u32 *patch_addr;
+   pte_t *pte;
+   int err;
+
+   patching_mm = __this_cpu_read(cpu_patching_context.mm);
+   text_poke_addr = __this_cpu_read(cpu_patching_context.addr);
+   patch_addr = (u32 *)(text_poke_addr + offset_in_page(addr));
+
+   pte = get_locked_pte(patching_mm, text_poke_addr, &ptl);
+   if (!pte)
+   return -ENOMEM;
+
+   __set_pte_at(patching_mm, text_poke_addr, pte, pfn_pte(pfn, 
PAGE_KERNEL), 0);
+
+   /* order PTE update before use, also serves as the hwsync */
+   asm volatile("ptesync" ::: "memory");
+
+   /* order context switch after arbitrary prior code */
+   isync();
+
+   orig_mm = start_using_temp_mm(patching_mm);
+
+   err = __patch_instructions(patch_addr, code, len, repeat_instr);
+
+   /* context synchronisation performed by __patch_instructions */
+   stop_using_temp_mm(patching_mm, orig_mm);
+
+   pte_clear(patching_mm, text_poke_addr, pte);
+   /*
+* ptesync to order PTE update before TLB invalidation done
+* by radix__local_flush_tlb_page_psize (in _tlbiel_va)
+*/
+   local_flush_tlb_page_psize(patching_mm, text_poke_addr, 
mmu_virtual_psize);
+
+   pte_unmap_unlock(pte, ptl);
+
+   return err;
+}
+
+/*
+ * A page is mapped and instructions that fit the page are patched.
+ * Assumes 'len' to be (PAGE_SIZE - offset_in_page(addr)) or below.
+ */
+static int __do_patch_instructions(u32 *addr, u32 *code, size_t len, bool 
repeat_instr)
+{
+   unsigned

[PATCH v7 5/5] powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free]

2023-10-20 Thread Hari Bathini
Use bpf_jit_binary_pack_alloc in powerpc jit. The jit engine first
writes the program to the rw buffer. When the jit is done, the program
is copied to the final location with bpf_jit_binary_pack_finalize.
With multiple jit_subprogs, bpf_jit_free is called on some subprograms
that haven't got bpf_jit_binary_pack_finalize() yet. Implement custom
bpf_jit_free() like in commit 1d5f82d9dd47 ("bpf, x86: fix freeing of
not-finalized bpf_prog_pack") to call bpf_jit_binary_pack_finalize(),
if necessary. As bpf_flush_icache() is not needed anymore, remove it.
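
The subtle part is that, with jit_subprogs, bpf_jit_free() can run on a
program that was never finalized. A rough sketch of the custom free path
(mirroring the x86 fix in commit 1d5f82d9dd47; simplified, details may
differ from the actual patch):

void bpf_jit_free(struct bpf_prog *fp)
{
	if (fp->jited) {
		struct powerpc_jit_data *jit_data = fp->aux->jit_data;
		struct bpf_binary_header *hdr;

		/*
		 * If the final JIT pass failed (from jit_subprogs), the
		 * program may not be finalized yet. Finalize it here
		 * before freeing.
		 */
		if (jit_data) {
			bpf_jit_binary_pack_finalize(fp, jit_data->fhdr, jit_data->hdr);
			kvfree(jit_data->addrs);
			kfree(jit_data);
		}
		hdr = bpf_jit_binary_pack_hdr(fp);
		bpf_jit_binary_pack_free(hdr, NULL);
		WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(fp));
	}

	bpf_prog_unlock_free(fp);
}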

Signed-off-by: Hari Bathini 
Acked-by: Song Liu 
---

* No changes in v7.


 arch/powerpc/net/bpf_jit.h|  18 ++---
 arch/powerpc/net/bpf_jit_comp.c   | 106 ++
 arch/powerpc/net/bpf_jit_comp32.c |  13 ++--
 arch/powerpc/net/bpf_jit_comp64.c |  10 +--
 4 files changed, 96 insertions(+), 51 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 72b7bb34fade..cdea5dccaefe 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -36,9 +36,6 @@
EMIT(PPC_RAW_BRANCH(offset)); \
} while (0)
 
-/* bl (unconditional 'branch' with link) */
-#define PPC_BL(dest)   EMIT(PPC_RAW_BL((dest) - (unsigned long)(image + 
ctx->idx)))
-
 /* "cond" here covers BO:BI fields. */
 #define PPC_BCC_SHORT(cond, dest)\
do {  \
@@ -147,12 +144,6 @@ struct codegen_context {
 #define BPF_FIXUP_LEN  2 /* Two instructions => 8 bytes */
 #endif
 
-static inline void bpf_flush_icache(void *start, void *end)
-{
-   smp_wmb();  /* smp write barrier */
-   flush_icache_range((unsigned long)start, (unsigned long)end);
-}
-
 static inline bool bpf_is_seen_register(struct codegen_context *ctx, int i)
 {
return ctx->seen & (1 << (31 - i));
@@ -169,16 +160,17 @@ static inline void bpf_clear_seen_register(struct 
codegen_context *ctx, int i)
 }
 
 void bpf_jit_init_reg_mapping(struct codegen_context *ctx);
-int bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 
func);
-int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context 
*ctx,
+int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context 
*ctx, u64 func);
+int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct 
codegen_context *ctx,
   u32 *addrs, int pass, bool extra_pass);
 void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_realloc_regs(struct codegen_context *ctx);
 int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int 
tmp_reg, long exit_addr);
 
-int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, int pass, struct 
codegen_context *ctx,
- int insn_idx, int jmp_off, int dst_reg);
+int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, u32 *fimage, int 
pass,
+ struct codegen_context *ctx, int insn_idx,
+ int jmp_off, int dst_reg);
 
 #endif
 
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index e7ca270a39d5..a79d7c478074 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -44,9 +44,12 @@ int bpf_jit_emit_exit_insn(u32 *image, struct 
codegen_context *ctx, int tmp_reg,
 }
 
 struct powerpc_jit_data {
-   struct bpf_binary_header *header;
+   /* address of rw header */
+   struct bpf_binary_header *hdr;
+   /* address of ro final header */
+   struct bpf_binary_header *fhdr;
u32 *addrs;
-   u8 *image;
+   u8 *fimage;
u32 proglen;
struct codegen_context ctx;
 };
@@ -67,11 +70,14 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
struct codegen_context cgctx;
int pass;
int flen;
-   struct bpf_binary_header *bpf_hdr;
+   struct bpf_binary_header *fhdr = NULL;
+   struct bpf_binary_header *hdr = NULL;
struct bpf_prog *org_fp = fp;
struct bpf_prog *tmp_fp;
bool bpf_blinded = false;
bool extra_pass = false;
+   u8 *fimage = NULL;
+   u32 *fcode_base;
u32 extable_len;
u32 fixup_len;
 
@@ -101,9 +107,16 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
addrs = jit_data->addrs;
if (addrs) {
cgctx = jit_data->ctx;
-   image = jit_data->image;
-   bpf_hdr = jit_data->header;
+   /*
+* JIT compiled to a writable location (image/code_base) first.
+* It is then moved to the readonly final location 
(fimage/fcode_base)
+* using instruction patching.
+*/
+   fimage 

[PATCH v7 3/5] powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack

2023-10-20 Thread Hari Bathini
Implement bpf_arch_text_invalidate and use it to fill unused part of
the bpf_prog_pack with trap instructions when a BPF program is freed.

Signed-off-by: Hari Bathini 
Acked-by: Song Liu 
---

* No changes in v7.


 arch/powerpc/net/bpf_jit_comp.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index c740eac8d584..ecd7cffbbe28 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -292,3 +292,18 @@ void *bpf_arch_text_copy(void *dst, void *src, size_t len)
 
return err ? ERR_PTR(err) : dst;
 }
+
+int bpf_arch_text_invalidate(void *dst, size_t len)
+{
+   u32 insn = BREAKPOINT_INSTRUCTION;
+   int ret;
+
+   if (WARN_ON_ONCE(core_kernel_text((unsigned long)dst)))
+   return -EINVAL;
+
+   mutex_lock(&text_mutex);
+   ret = patch_instructions(dst, &insn, len, true);
+   mutex_unlock(&text_mutex);
+
+   return ret;
+}
-- 
2.41.0



[PATCH v7 4/5] powerpc/bpf: rename powerpc64_jit_data to powerpc_jit_data

2023-10-20 Thread Hari Bathini
powerpc64_jit_data is a misnomer as it is meant for both ppc32 and
ppc64. Rename it to powerpc_jit_data.

Signed-off-by: Hari Bathini 
Acked-by: Song Liu 
---

* No changes in v7.


 arch/powerpc/net/bpf_jit_comp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index ecd7cffbbe28..e7ca270a39d5 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -43,7 +43,7 @@ int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context 
*ctx, int tmp_reg,
return 0;
 }
 
-struct powerpc64_jit_data {
+struct powerpc_jit_data {
struct bpf_binary_header *header;
u32 *addrs;
u8 *image;
@@ -63,7 +63,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
u8 *image = NULL;
u32 *code_base;
u32 *addrs;
-   struct powerpc64_jit_data *jit_data;
+   struct powerpc_jit_data *jit_data;
struct codegen_context cgctx;
int pass;
int flen;
-- 
2.41.0



[PATCH v7 2/5] powerpc/bpf: implement bpf_arch_text_copy

2023-10-20 Thread Hari Bathini
bpf_arch_text_copy is used to dump JITed binary to RX page, allowing
multiple BPF programs to share the same page. Use the newly introduced
patch_instructions() to implement it.

Signed-off-by: Hari Bathini 
Acked-by: Song Liu 
---

* No changes in v7.


 arch/powerpc/net/bpf_jit_comp.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 37043dfc1add..c740eac8d584 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -13,9 +13,13 @@
 #include 
 #include 
 #include 
-#include 
+#include 
+#include 
 #include 
 
+#include 
+#include 
+
 #include "bpf_jit.h"
 
 static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
@@ -274,3 +278,17 @@ int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, 
int pass, struct code
ctx->exentry_idx++;
return 0;
 }
+
+void *bpf_arch_text_copy(void *dst, void *src, size_t len)
+{
+   int err;
+
+   if (WARN_ON_ONCE(core_kernel_text((unsigned long)dst)))
+   return ERR_PTR(-EINVAL);
+
+   mutex_lock(&text_mutex);
+   err = patch_instructions(dst, src, len, false);
+   mutex_unlock(&text_mutex);
+
+   return err ? ERR_PTR(err) : dst;
+}
-- 
2.41.0



[PATCH v7 0/5] powerpc/bpf: use BPF prog pack allocator

2023-10-20 Thread Hari Bathini
Most BPF programs are small, but they consume a page each. For systems
with busy traffic and many BPF programs, this may also add significant
pressure on instruction TLB. High iTLB pressure usually slows down the
whole system causing visible performance degradation for production
workloads.

bpf_prog_pack, a customized allocator that packs multiple bpf programs
into preallocated memory chunks, was proposed [1] to address it. This
series extends this support on powerpc.

Both bpf_arch_text_copy() & bpf_arch_text_invalidate() functions,
needed for this support, depend on instruction patching in the text area.
Currently, patch_instruction() supports patching only one instruction
at a time. The first patch introduces patch_instructions() function
to enable patching more than one instruction at a time. This helps in
avoiding performance degradation while JITing bpf programs.

Patches 2 & 3 implement the above mentioned arch specific functions
using patch_instructions(). Patch 4 fixes a misnomer in bpf JITing
code. The last patch enables the use of BPF prog pack allocator on
powerpc and also, ensures cleanup is handled gracefully.

[1] https://lore.kernel.org/bpf/20220204185742.271030-1-s...@kernel.org/
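
In short, both hooks reduce to a bulk patch of the read-only image; a
minimal view (patches 2 & 3 below add the WARN_ON sanity checks on top
of this):

void *bpf_arch_text_copy(void *dst, void *src, size_t len)
{
	int err;

	mutex_lock(&text_mutex);
	err = patch_instructions(dst, src, len, false);
	mutex_unlock(&text_mutex);

	return err ? ERR_PTR(err) : dst;
}

int bpf_arch_text_invalidate(void *dst, size_t len)
{
	u32 insn = BREAKPOINT_INSTRUCTION;
	int ret;

	mutex_lock(&text_mutex);
	ret = patch_instructions(dst, &insn, len, true);
	mutex_unlock(&text_mutex);

	return ret;
}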

Changes in v7:
* Fixed crash observed with !STRICT_RWX.

Changes in v6:
* No changes in patches 2-5/5 except addition of Acked-by tags from Song.
* Skipped merging code path of patch_instruction() & patch_instructions()
  to avoid performance overhead observed on ppc32 with that.

Changes in v5:
* Moved introduction of patch_instructions() as 1st patch in series.
* Improved patch_instructions() to use memset & memcpy.
* Fixed the misnomer in JITing code as a separate patch.
* Removed unused bpf_flush_icache() function.

Changes in v4:
* Updated bpf_patch_instructions() definition in patch 1/5 so that
  it doesn't have to be updated again in patch 2/5.
* Addressed Christophe's comment on bpf_arch_text_invalidate() return
  value in patch 2/5.

Changes in v3:
* Fixed segfault issue observed on ppc32 due to inaccurate offset
  calculation for branching.
* Tried to minimize the performance impact for patch_instruction()
  with the introduction of patch_instructions().
* Corrected uses of u32* vs ppc_instr_t.
* Moved the change that introduces patch_instructions() to after
  enabling bpf_prog_pack support.
* Added few comments to improve code readability.

Changes in v2:
* Introduced patch_instructions() to help with patching bpf programs.


Hari Bathini (5):
  powerpc/code-patching: introduce patch_instructions()
  powerpc/bpf: implement bpf_arch_text_copy
  powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack
  powerpc/bpf: rename powerpc64_jit_data to powerpc_jit_data
  powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free]

 arch/powerpc/include/asm/code-patching.h |   1 +
 arch/powerpc/lib/code-patching.c | 141 +-
 arch/powerpc/net/bpf_jit.h   |  18 +--
 arch/powerpc/net/bpf_jit_comp.c  | 145 ++-
 arch/powerpc/net/bpf_jit_comp32.c|  13 +-
 arch/powerpc/net/bpf_jit_comp64.c|  10 +-
 6 files changed, 271 insertions(+), 57 deletions(-)

-- 
2.41.0



Re: [PATCHv9 2/2] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt

2023-10-20 Thread Hari Bathini




On 18/10/23 1:51 pm, Pingfan Liu wrote:

On Tue, Oct 17, 2023 at 6:39 PM Hari Bathini  wrote:




On 17/10/23 7:58 am, Pingfan Liu wrote:

*** Idea ***
For kexec -p, the boot cpu can be not the cpu0, this causes the problem
of allocating memory for paca_ptrs[]. However, in theory, there is no
requirement to assign cpu's logical id as its present sequence in the
device tree. But there is something like cpu_first_thread_sibling(),
which makes assumption on the mapping inside a core. Hence partially
loosening the mapping, i.e. unbind the mapping of core while keep the
mapping inside a core.

*** Implement ***
At this early stage, there are plenty of memory to utilize. Hence, this
patch allocates interim memory to link the cpu info on a list, then
reorder cpus by changing the list head. As a result, there is a rotate
shift between the sequence number in dt and the cpu logical number.

*** Result ***
After this patch, a boot-cpu's logical id will always be mapped into the
range [0,threads_per_core).

Besides this, at this phase, all threads in the boot core are forced to
be onlined. This restriction will be lifted in a later patch with
extra effort.

Signed-off-by: Pingfan Liu 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: Mahesh Salgaonkar 
Cc: Wen Xiong 
Cc: Baoquan He 
Cc: Ming Lei 
Cc: Sourabh Jain 
Cc: Hari Bathini 
Cc: ke...@lists.infradead.org
To: linuxppc-dev@lists.ozlabs.org


Thanks for working on this, Pingfan.
Looks good to me.

Acked-by: Hari Bathini 



Thank you for kindly reviewing. I hope that after all these years, we
have accomplished the objective.



I hope so too.
Thanks!


Re: [PATCHv9 2/2] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt

2023-10-17 Thread Hari Bathini




On 17/10/23 7:58 am, Pingfan Liu wrote:

*** Idea ***
For kexec -p, the boot cpu can be not the cpu0, this causes the problem
of allocating memory for paca_ptrs[]. However, in theory, there is no
requirement to assign cpu's logical id as its present sequence in the
device tree. But there is something like cpu_first_thread_sibling(),
which makes assumption on the mapping inside a core. Hence partially
loosening the mapping, i.e. unbind the mapping of core while keep the
mapping inside a core.

*** Implement ***
At this early stage, there are plenty of memory to utilize. Hence, this
patch allocates interim memory to link the cpu info on a list, then
reorder cpus by changing the list head. As a result, there is a rotate
shift between the sequence number in dt and the cpu logical number.

*** Result ***
After this patch, a boot-cpu's logical id will always be mapped into the
range [0,threads_per_core).

Besides this, at this phase, all threads in the boot core are forced to
be onlined. This restriction will be lifted in a later patch with
extra effort.

Signed-off-by: Pingfan Liu 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: Mahesh Salgaonkar 
Cc: Wen Xiong 
Cc: Baoquan He 
Cc: Ming Lei 
Cc: Sourabh Jain 
Cc: Hari Bathini 
Cc: ke...@lists.infradead.org
To: linuxppc-dev@lists.ozlabs.org


Thanks for working on this, Pingfan.
Looks good to me.

Acked-by: Hari Bathini 


---
  arch/powerpc/kernel/prom.c | 25 +
  arch/powerpc/kernel/setup-common.c | 84 +++---
  2 files changed, 82 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index ec82f5bda908..7ed9034912ca 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -76,7 +76,9 @@ u64 ppc64_rma_size;
  unsigned int boot_cpu_node_count __ro_after_init;
  #endif
  static phys_addr_t first_memblock_size;
+#ifdef CONFIG_SMP
  static int __initdata boot_cpu_count;
+#endif
  
  static int __init early_parse_mem(char *p)

  {
@@ -331,8 +333,7 @@ static int __init early_init_dt_scan_cpus(unsigned long 
node,
const __be32 *intserv;
int i, nthreads;
int len;
-   int found = -1;
-   int found_thread = 0;
+   bool found = false;
  
  	/* We are scanning "cpu" nodes only */

if (type == NULL || strcmp(type, "cpu") != 0)
@@ -355,8 +356,15 @@ static int __init early_init_dt_scan_cpus(unsigned long 
node,
for (i = 0; i < nthreads; i++) {
if (be32_to_cpu(intserv[i]) ==
fdt_boot_cpuid_phys(initial_boot_params)) {
-   found = boot_cpu_count;
-   found_thread = i;
+   /*
+* always map the boot-cpu logical id into the
+* range of [0, thread_per_core)
+*/
+   boot_cpuid = i;
+   found = true;
+   /* This forces all threads in a core to be online */
+   if (nr_cpu_ids % nthreads != 0)
+   set_nr_cpu_ids(ALIGN(nr_cpu_ids, nthreads));
}
  #ifdef CONFIG_SMP
/* logical cpu id is always 0 on UP kernels */
@@ -365,14 +373,13 @@ static int __init early_init_dt_scan_cpus(unsigned long 
node,
}
  
  	/* Not the boot CPU */

-   if (found < 0)
+   if (!found)
return 0;
  
-	DBG("boot cpu: logical %d physical %d\n", found,

-   be32_to_cpu(intserv[found_thread]));
-   boot_cpuid = found;
+   DBG("boot cpu: logical %d physical %d\n", boot_cpuid,
+   be32_to_cpu(intserv[boot_cpuid]));
  
-	boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);

+   boot_cpu_hwid = be32_to_cpu(intserv[boot_cpuid]);
  
  	/*

 * PAPR defines "logical" PVR values for cpus that
diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index 707f0490639d..9802c7e5ee2f 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -36,6 +36,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -425,6 +426,13 @@ static void __init cpu_init_thread_core_maps(int tpc)
  
  u32 *cpu_to_phys_id = NULL;
  
+struct interrupt_server_node {

+   struct list_head node;
+   boolavail;
+   int len;
+   __be32 intserv[];
+};
+
  /**
   * setup_cpu_maps - initialize the following cpu maps:
   *  cpu_possible_mask
@@ -446,11 +454,16 @@ u32 *cpu_to_phys_id = NULL;
  void __init smp_setup_cpu_maps(void)
  {
struct device_node *dn;
-   int cpu = 0;
-   int nthreads = 1;
+   int shift = 0, cpu = 0;
+   int j, nthreads = 1;
+   int len;
+   struct interrupt_server_node *intserv_node, *n;
+   struct list_head *bt_node, head;
+   bool avail, f

Re: [PATCHv9 1/2] powerpc/setup : Enable boot_cpu_hwid for PPC32

2023-10-17 Thread Hari Bathini




On 17/10/23 7:58 am, Pingfan Liu wrote:

In order to identify the boot cpu, its intserv[] should be recorded and
checked in smp_setup_cpu_maps().

smp_setup_cpu_maps() is shared between PPC64 and PPC32. Since PPC64 has
already used boot_cpu_hwid to carry that information, enabling this
variable on PPC32 so later it can also be used to carry that information
for PPC32 in the coming patch.

Signed-off-by: Pingfan Liu 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: Mahesh Salgaonkar 
Cc: Wen Xiong 
Cc: Baoquan He 
Cc: Ming Lei 
Cc: Sourabh Jain 
Cc: Hari Bathini 
Cc: ke...@lists.infradead.org
To: linuxppc-dev@lists.ozlabs.org


LGTM.

Acked-by: Hari Bathini 


---
  arch/powerpc/include/asm/smp.h | 2 +-
  arch/powerpc/kernel/prom.c | 3 +--
  arch/powerpc/kernel/setup-common.c | 2 --
  3 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 576d0e15..5db9178cc800 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -26,7 +26,7 @@
  #include 
  
  extern int boot_cpuid;

-extern int boot_cpu_hwid; /* PPC64 only */
+extern int boot_cpu_hwid;
  extern int spinning_secondaries;
  extern u32 *cpu_to_phys_id;
  extern bool coregroup_enabled;
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 0b5878c3125b..ec82f5bda908 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -372,8 +372,7 @@ static int __init early_init_dt_scan_cpus(unsigned long 
node,
be32_to_cpu(intserv[found_thread]));
boot_cpuid = found;
  
-	if (IS_ENABLED(CONFIG_PPC64))

-   boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);
+   boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);
  
  	/*

 * PAPR defines "logical" PVR values for cpus that
diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index 2f1026fba00d..707f0490639d 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -87,9 +87,7 @@ EXPORT_SYMBOL(machine_id);
  int boot_cpuid = -1;
  EXPORT_SYMBOL_GPL(boot_cpuid);
  
-#ifdef CONFIG_PPC64

  int boot_cpu_hwid = -1;
-#endif
  
  /*

   * These are used in binfmt_elf.c to put aux entries on the stack


Re: [PATCH v6 0/5] powerpc/bpf: use BPF prog pack allocator

2023-10-17 Thread Hari Bathini




On 16/10/23 5:37 pm, Daniel Borkmann wrote:

On 10/12/23 10:03 PM, Hari Bathini wrote:

Most BPF programs are small, but they consume a page each. For systems
with busy traffic and many BPF programs, this may also add significant
pressure on instruction TLB. High iTLB pressure usually slows down the
whole system causing visible performance degradation for production
workloads.

bpf_prog_pack, a customized allocator that packs multiple bpf programs
into preallocated memory chunks, was proposed [1] to address it. This
series extends this support on powerpc.

Both bpf_arch_text_copy() & bpf_arch_text_invalidate() functions,
needed for this support depend on instruction patching in text area.
Currently, patch_instruction() supports patching only one instruction
at a time. The first patch introduces patch_instructions() function
to enable patching more than one instruction at a time. This helps in
avoiding performance degradation while JITing bpf programs.

Patches 2 & 3 implement the above mentioned arch specific functions
using patch_instructions(). Patch 4 fixes a misnomer in bpf JITing
code. The last patch enables the use of BPF prog pack allocator on
powerpc and also, ensures cleanup is handled gracefully.

[1] https://lore.kernel.org/bpf/20220204185742.271030-1-s...@kernel.org/

Changes in v6:
* No changes in patches 2-5/5 except addition of Acked-by tags from Song.
* Skipped merging code path of patch_instruction() & patch_instructions()
   to avoid performance overhead observed on ppc32 with that.


I presume this will be routed via Michael?


Yes, Daniel. This can go via linuxppc tree.

Thanks
Hari


Re: [PATCH v5 1/5] powerpc/code-patching: introduce patch_instructions()

2023-10-12 Thread Hari Bathini

Thanks for the review, Christophe.

On 10/10/23 11:16 pm, Christophe Leroy wrote:



Le 28/09/2023 à 21:48, Hari Bathini a écrit :

patch_instruction() entails setting up pte, patching the instruction,
clearing the pte and flushing the tlb. If multiple instructions need
to be patched, every instruction would have to go through the above
drill unnecessarily. Instead, introduce function patch_instructions()
that sets up the pte, clears the pte and flushes the tlb only once per
page range of instructions to be patched. This adds a slight overhead
to patch_instruction() call while improving the patching time for
scenarios where more than one instruction needs to be patched.


Not a "slight" but a "significant" overhead on PPC32.

Thinking about it once more I don't think it is a good idea to try and
merge that into the existing code_patching logic which is really single
instruction performance oriented.

Anyway, comments below.



Signed-off-by: Hari Bathini 
---
   arch/powerpc/include/asm/code-patching.h |  1 +
   arch/powerpc/lib/code-patching.c | 93 +---
   2 files changed, 85 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/code-patching.h 
b/arch/powerpc/include/asm/code-patching.h
index 3f881548fb61..43a4aedfa703 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -74,6 +74,7 @@ int create_cond_branch(ppc_inst_t *instr, const u32 *addr,
   int patch_branch(u32 *addr, unsigned long target, int flags);
   int patch_instruction(u32 *addr, ppc_inst_t instr);
   int raw_patch_instruction(u32 *addr, ppc_inst_t instr);
+int patch_instructions(void *addr, void *code, size_t len, bool repeat_instr);


I don't like void *, you can do to much nasty things with that.
I think you want u32 *

   
   static inline unsigned long patch_site_addr(s32 *site)

   {
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index b00112d7ad46..4ff002bc41f6 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -278,7 +278,36 @@ static void unmap_patch_area(unsigned long addr)
flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
   }
   
-static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)

+static int __patch_instructions(u32 *patch_addr, void *code, size_t len, bool 
repeat_instr)
+{
+   unsigned long start = (unsigned long)patch_addr;
+
+   /* Repeat instruction */
+   if (repeat_instr) {
+   ppc_inst_t instr = ppc_inst_read(code);
+
+   if (ppc_inst_prefixed(instr)) {
+   u64 val = ppc_inst_as_ulong(instr);
+
+   memset64((uint64_t *)patch_addr, val, len / 8);


Use u64 instead of uint64_t.


+   } else {
+   u32 val = ppc_inst_val(instr);
+
+   memset32(patch_addr, val, len / 4);
+   }
+   } else
+   memcpy(patch_addr, code, len);


Missing braces, see
https://docs.kernel.org/process/coding-style.html#placing-braces-and-spaces


+
+   smp_wmb();  /* smp write barrier */
+   flush_icache_range(start, start + len);
+   return 0;
+}
+
+/*
+ * A page is mapped and instructions that fit the page are patched.
+ * Assumes 'len' to be (PAGE_SIZE - offset_in_page(addr)) or below.
+ */
+static int __do_patch_instructions_mm(u32 *addr, void *code, size_t len, bool 
repeat_instr)
   {
int err;
u32 *patch_addr;
@@ -307,11 +336,15 @@ static int __do_patch_instruction_mm(u32 *addr, 
ppc_inst_t instr)
   
   	orig_mm = start_using_temp_mm(patching_mm);
   
-	err = __patch_instruction(addr, instr, patch_addr);

+   /* Single instruction case. */
+   if (len == 0) {
+   err = __patch_instruction(addr, *(ppc_inst_t *)code, 
patch_addr);


Take care, you can't convert u32 * to ppc_inst_t that way, you have to
use ppc_inst_read() otherwise you'll get odd result with prefixed
instructions depending on endianness.

   
-	/* hwsync performed by __patch_instruction (sync) if successful */

-   if (err)
-   mb();  /* sync */
+   /* hwsync performed by __patch_instruction (sync) if successful 
*/
+   if (err)
+   mb();  /* sync */


Get this away, see my patch at
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/e88b154eaf2efd9ff177d472d3411dcdec8ff4f5.1696675567.git.christophe.le...@csgroup.eu/


+   } else
+   err = __patch_instructions(patch_addr, code, len, repeat_instr);
   
   	/* context synchronisation performed by __patch_instruction (isync or exception) */

stop_using_temp_mm(patching_mm, orig_mm);
@@ -328,7 +361,11 @@ static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t 
instr)
return err;
   }
   
-static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)

+/*
+ * A page is mapped and instructions that fit the page are patch

[PATCH v6 5/5] powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free]

2023-10-12 Thread Hari Bathini
Use bpf_jit_binary_pack_alloc in powerpc jit. The jit engine first
writes the program to the rw buffer. When the jit is done, the program
is copied to the final location with bpf_jit_binary_pack_finalize.
With multiple jit_subprogs, bpf_jit_free is called on some subprograms
that haven't got bpf_jit_binary_pack_finalize() yet. Implement custom
bpf_jit_free() like in commit 1d5f82d9dd47 ("bpf, x86: fix freeing of
not-finalized bpf_prog_pack") to call bpf_jit_binary_pack_finalize(),
if necessary. As bpf_flush_icache() is not needed anymore, remove it.

Signed-off-by: Hari Bathini 
Acked-by: Song Liu 
---
 arch/powerpc/net/bpf_jit.h|  18 ++---
 arch/powerpc/net/bpf_jit_comp.c   | 106 ++
 arch/powerpc/net/bpf_jit_comp32.c |  13 ++--
 arch/powerpc/net/bpf_jit_comp64.c |  10 +--
 4 files changed, 96 insertions(+), 51 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 72b7bb34fade..cdea5dccaefe 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -36,9 +36,6 @@
EMIT(PPC_RAW_BRANCH(offset)); \
} while (0)
 
-/* bl (unconditional 'branch' with link) */
-#define PPC_BL(dest)   EMIT(PPC_RAW_BL((dest) - (unsigned long)(image + 
ctx->idx)))
-
 /* "cond" here covers BO:BI fields. */
 #define PPC_BCC_SHORT(cond, dest)\
do {  \
@@ -147,12 +144,6 @@ struct codegen_context {
 #define BPF_FIXUP_LEN  2 /* Two instructions => 8 bytes */
 #endif
 
-static inline void bpf_flush_icache(void *start, void *end)
-{
-   smp_wmb();  /* smp write barrier */
-   flush_icache_range((unsigned long)start, (unsigned long)end);
-}
-
 static inline bool bpf_is_seen_register(struct codegen_context *ctx, int i)
 {
return ctx->seen & (1 << (31 - i));
@@ -169,16 +160,17 @@ static inline void bpf_clear_seen_register(struct 
codegen_context *ctx, int i)
 }
 
 void bpf_jit_init_reg_mapping(struct codegen_context *ctx);
-int bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 
func);
-int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context 
*ctx,
+int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context 
*ctx, u64 func);
+int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct 
codegen_context *ctx,
   u32 *addrs, int pass, bool extra_pass);
 void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_realloc_regs(struct codegen_context *ctx);
 int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int 
tmp_reg, long exit_addr);
 
-int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, int pass, struct 
codegen_context *ctx,
- int insn_idx, int jmp_off, int dst_reg);
+int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, u32 *fimage, int 
pass,
+ struct codegen_context *ctx, int insn_idx,
+ int jmp_off, int dst_reg);
 
 #endif
 
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index e7ca270a39d5..a79d7c478074 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -44,9 +44,12 @@ int bpf_jit_emit_exit_insn(u32 *image, struct 
codegen_context *ctx, int tmp_reg,
 }
 
 struct powerpc_jit_data {
-   struct bpf_binary_header *header;
+   /* address of rw header */
+   struct bpf_binary_header *hdr;
+   /* address of ro final header */
+   struct bpf_binary_header *fhdr;
u32 *addrs;
-   u8 *image;
+   u8 *fimage;
u32 proglen;
struct codegen_context ctx;
 };
@@ -67,11 +70,14 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
struct codegen_context cgctx;
int pass;
int flen;
-   struct bpf_binary_header *bpf_hdr;
+   struct bpf_binary_header *fhdr = NULL;
+   struct bpf_binary_header *hdr = NULL;
struct bpf_prog *org_fp = fp;
struct bpf_prog *tmp_fp;
bool bpf_blinded = false;
bool extra_pass = false;
+   u8 *fimage = NULL;
+   u32 *fcode_base;
u32 extable_len;
u32 fixup_len;
 
@@ -101,9 +107,16 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
addrs = jit_data->addrs;
if (addrs) {
cgctx = jit_data->ctx;
-   image = jit_data->image;
-   bpf_hdr = jit_data->header;
+   /*
+* JIT compiled to a writable location (image/code_base) first.
+* It is then moved to the readonly final location 
(fimage/fcode_base)
+* using instruction patching.
+*/
+   fimage = jit_

[PATCH v6 4/5] powerpc/bpf: rename powerpc64_jit_data to powerpc_jit_data

2023-10-12 Thread Hari Bathini
powerpc64_jit_data is a misnomer as it is meant for both ppc32 and
ppc64. Rename it to powerpc_jit_data.

Signed-off-by: Hari Bathini 
Acked-by: Song Liu 
---
 arch/powerpc/net/bpf_jit_comp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index ecd7cffbbe28..e7ca270a39d5 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -43,7 +43,7 @@ int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context 
*ctx, int tmp_reg,
return 0;
 }
 
-struct powerpc64_jit_data {
+struct powerpc_jit_data {
struct bpf_binary_header *header;
u32 *addrs;
u8 *image;
@@ -63,7 +63,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
u8 *image = NULL;
u32 *code_base;
u32 *addrs;
-   struct powerpc64_jit_data *jit_data;
+   struct powerpc_jit_data *jit_data;
struct codegen_context cgctx;
int pass;
int flen;
-- 
2.41.0



[PATCH v6 3/5] powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack

2023-10-12 Thread Hari Bathini
Implement bpf_arch_text_invalidate and use it to fill unused part of
the bpf_prog_pack with trap instructions when a BPF program is freed.

Signed-off-by: Hari Bathini 
Acked-by: Song Liu 
---
 arch/powerpc/net/bpf_jit_comp.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index c740eac8d584..ecd7cffbbe28 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -292,3 +292,18 @@ void *bpf_arch_text_copy(void *dst, void *src, size_t len)
 
return err ? ERR_PTR(err) : dst;
 }
+
+int bpf_arch_text_invalidate(void *dst, size_t len)
+{
+   u32 insn = BREAKPOINT_INSTRUCTION;
+   int ret;
+
+   if (WARN_ON_ONCE(core_kernel_text((unsigned long)dst)))
+   return -EINVAL;
+
+   mutex_lock(&text_mutex);
+   ret = patch_instructions(dst, &insn, len, true);
+   mutex_unlock(&text_mutex);
+
+   return ret;
+}
-- 
2.41.0



[PATCH v6 2/5] powerpc/bpf: implement bpf_arch_text_copy

2023-10-12 Thread Hari Bathini
bpf_arch_text_copy is used to dump JITed binary to RX page, allowing
multiple BPF programs to share the same page. Use the newly introduced
patch_instructions() to implement it.

Signed-off-by: Hari Bathini 
Acked-by: Song Liu 
---
 arch/powerpc/net/bpf_jit_comp.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 37043dfc1add..c740eac8d584 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -13,9 +13,13 @@
 #include 
 #include 
 #include 
-#include 
+#include 
+#include 
 #include 
 
+#include 
+#include 
+
 #include "bpf_jit.h"
 
 static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
@@ -274,3 +278,17 @@ int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, 
int pass, struct code
ctx->exentry_idx++;
return 0;
 }
+
+void *bpf_arch_text_copy(void *dst, void *src, size_t len)
+{
+   int err;
+
+   if (WARN_ON_ONCE(core_kernel_text((unsigned long)dst)))
+   return ERR_PTR(-EINVAL);
+
+   mutex_lock(&text_mutex);
+   err = patch_instructions(dst, src, len, false);
+   mutex_unlock(&text_mutex);
+
+   return err ? ERR_PTR(err) : dst;
+}
-- 
2.41.0



[PATCH v6 1/5] powerpc/code-patching: introduce patch_instructions()

2023-10-12 Thread Hari Bathini
patch_instruction() entails setting up pte, patching the instruction,
clearing the pte and flushing the tlb. If multiple instructions need
to be patched, every instruction would have to go through the above
drill unnecessarily. Instead, introduce patch_instructions() function
that sets up the pte, clears the pte and flushes the tlb only once per
page range of instructions to be patched. This duplicates most of the
code patching logic, instead of merging with it, to avoid performance
degradation observed for single instruction patching on ppc32 with
the code path merged.

Signed-off-by: Hari Bathini 
Acked-by: Song Liu 
---

Changes in v6:
* Skipped merging code path of patch_instruction() & patch_instructions()
  to avoid performance overhead observed on ppc32 with that.


 arch/powerpc/include/asm/code-patching.h |   1 +
 arch/powerpc/lib/code-patching.c | 138 +++
 2 files changed, 139 insertions(+)

diff --git a/arch/powerpc/include/asm/code-patching.h 
b/arch/powerpc/include/asm/code-patching.h
index 3f881548fb61..0e29ccf903d0 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -74,6 +74,7 @@ int create_cond_branch(ppc_inst_t *instr, const u32 *addr,
 int patch_branch(u32 *addr, unsigned long target, int flags);
 int patch_instruction(u32 *addr, ppc_inst_t instr);
 int raw_patch_instruction(u32 *addr, ppc_inst_t instr);
+int patch_instructions(u32 *addr, u32 *code, size_t len, bool repeat_instr);
 
 static inline unsigned long patch_site_addr(s32 *site)
 {
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index b00112d7ad46..a115496f934b 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -378,6 +378,144 @@ int patch_instruction(u32 *addr, ppc_inst_t instr)
 }
 NOKPROBE_SYMBOL(patch_instruction);
 
+static int __patch_instructions(u32 *patch_addr, u32 *code, size_t len, bool 
repeat_instr)
+{
+   unsigned long start = (unsigned long)patch_addr;
+
+   /* Repeat instruction */
+   if (repeat_instr) {
+   ppc_inst_t instr = ppc_inst_read(code);
+
+   if (ppc_inst_prefixed(instr)) {
+   u64 val = ppc_inst_as_ulong(instr);
+
+   memset64((u64 *)patch_addr, val, len / 8);
+   } else {
+   u32 val = ppc_inst_val(instr);
+
+   memset32(patch_addr, val, len / 4);
+   }
+   } else {
+   memcpy(patch_addr, code, len);
+   }
+
+   smp_wmb();  /* smp write barrier */
+   flush_icache_range(start, start + len);
+   return 0;
+}
+
+/*
+ * A page is mapped and instructions that fit the page are patched.
+ * Assumes 'len' to be (PAGE_SIZE - offset_in_page(addr)) or below.
+ */
+static int __do_patch_instructions_mm(u32 *addr, u32 *code, size_t len, bool 
repeat_instr)
+{
+   struct mm_struct *patching_mm, *orig_mm;
+   unsigned long pfn = get_patch_pfn(addr);
+   unsigned long text_poke_addr;
+   spinlock_t *ptl;
+   u32 *patch_addr;
+   pte_t *pte;
+   int err;
+
+   patching_mm = __this_cpu_read(cpu_patching_context.mm);
+   text_poke_addr = __this_cpu_read(cpu_patching_context.addr);
+   patch_addr = (u32 *)(text_poke_addr + offset_in_page(addr));
+
+   pte = get_locked_pte(patching_mm, text_poke_addr, &ptl);
+   if (!pte)
+   return -ENOMEM;
+
+   __set_pte_at(patching_mm, text_poke_addr, pte, pfn_pte(pfn, 
PAGE_KERNEL), 0);
+
+   /* order PTE update before use, also serves as the hwsync */
+   asm volatile("ptesync" ::: "memory");
+
+   /* order context switch after arbitrary prior code */
+   isync();
+
+   orig_mm = start_using_temp_mm(patching_mm);
+
+   err = __patch_instructions(patch_addr, code, len, repeat_instr);
+
+   /* context synchronisation performed by __patch_instructions */
+   stop_using_temp_mm(patching_mm, orig_mm);
+
+   pte_clear(patching_mm, text_poke_addr, pte);
+   /*
+* ptesync to order PTE update before TLB invalidation done
+* by radix__local_flush_tlb_page_psize (in _tlbiel_va)
+*/
+   local_flush_tlb_page_psize(patching_mm, text_poke_addr, 
mmu_virtual_psize);
+
+   pte_unmap_unlock(pte, ptl);
+
+   return err;
+}
+
+/*
+ * A page is mapped and instructions that fit the page are patched.
+ * Assumes 'len' to be (PAGE_SIZE - offset_in_page(addr)) or below.
+ */
+static int __do_patch_instructions(u32 *addr, u32 *code, size_t len, bool 
repeat_instr)
+{
+   unsigned long pfn = get_patch_pfn(addr);
+   unsigned long text_poke_addr;
+   u32 *patch_addr;
+   pte_t *pte;
+   int err;
+
+   text_poke_addr = (unsigned 
long)__this_cpu_read(cpu_patching_context.addr) & PAGE_MASK;
+   patch_addr = (u32 *)(text_poke_addr + offset_in_page(addr));
+
+   pte = __this_cpu_r

[PATCH v6 0/5] powerpc/bpf: use BPF prog pack allocator

2023-10-12 Thread Hari Bathini
Most BPF programs are small, but they consume a page each. For systems
with busy traffic and many BPF programs, this may also add significant
pressure on instruction TLB. High iTLB pressure usually slows down the
whole system causing visible performance degradation for production
workloads.

bpf_prog_pack, a customized allocator that packs multiple bpf programs
into preallocated memory chunks, was proposed [1] to address it. This
series extends this support on powerpc.

Both bpf_arch_text_copy() & bpf_arch_text_invalidate() functions,
needed for this support depend on instruction patching in text area.
Currently, patch_instruction() supports patching only one instruction
at a time. The first patch introduces patch_instructions() function
to enable patching more than one instruction at a time. This helps in
avoiding performance degradation while JITing bpf programs.

Patches 2 & 3 implement the above mentioned arch specific functions
using patch_instructions(). Patch 4 fixes a misnomer in bpf JITing
code. The last patch enables the use of BPF prog pack allocator on
powerpc and also, ensures cleanup is handled gracefully.

[1] https://lore.kernel.org/bpf/20220204185742.271030-1-s...@kernel.org/
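
For illustration only (not code from the series), the difference when,
say, filling a region with trap instructions; dst, len and i are
hypothetical:

	u32 insn = PPC_RAW_TRAP();

	/* one pte setup and tlb flush per instruction patched */
	for (i = 0; i < len / 4; i++)
		patch_instruction(dst + i, ppc_inst(insn));

	/* vs. one pte setup and tlb flush per page range of instructions */
	patch_instructions(dst, &insn, len, true);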

Changes in v6:
* No changes in patches 2-5/5 except addition of Acked-by tags from Song.
* Skipped merging code path of patch_instruction() & patch_instructions()
  to avoid performance overhead observed on ppc32 with that.

Changes in v5:
* Moved introduction of patch_instructions() as 1st patch in series.
* Improved patch_instructions() to use memset & memcpy.
* Fixed the misnomer in JITing code as a separate patch.
* Removed unused bpf_flush_icache() function.

Changes in v4:
* Updated bpf_patch_instructions() definition in patch 1/5 so that
  it doesn't have to be updated again in patch 2/5.
* Addressed Christophe's comment on bpf_arch_text_invalidate() return
  value in patch 2/5.

Changes in v3:
* Fixed segfault issue observed on ppc32 due to inaccurate offset
  calculation for branching.
* Tried to minimize the performance impact for patch_instruction()
  with the introduction of patch_instructions().
* Corrected uses of u32* vs ppc_instr_t.
* Moved the change that introduces patch_instructions() to after
  enabling bpf_prog_pack support.
* Added few comments to improve code readability.

Changes in v2:
* Introduced patch_instructions() to help with patching bpf programs.


Hari Bathini (5):
  powerpc/code-patching: introduce patch_instructions()
  powerpc/bpf: implement bpf_arch_text_copy
  powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack
  powerpc/bpf: rename powerpc64_jit_data to powerpc_jit_data
  powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free]

 arch/powerpc/include/asm/code-patching.h |   1 +
 arch/powerpc/lib/code-patching.c | 138 +
 arch/powerpc/net/bpf_jit.h   |  18 +--
 arch/powerpc/net/bpf_jit_comp.c  | 145 ++-
 arch/powerpc/net/bpf_jit_comp32.c|  13 +-
 arch/powerpc/net/bpf_jit_comp64.c|  10 +-
 6 files changed, 271 insertions(+), 54 deletions(-)

-- 
2.41.0



Re: [PATCHv8 3/5] powerpc/setup: Handle the case when boot_cpuid greater than nr_cpus

2023-10-11 Thread Hari Bathini




On 11/10/23 8:35 am, Pingfan Liu wrote:

On Tue, Oct 10, 2023 at 01:56:13PM +0530, Hari Bathini wrote:



On 09/10/23 5:00 pm, Pingfan Liu wrote:

If the boot_cpuid is smaller than nr_cpus, it requires extra effort to
ensure the boot_cpu is in cpu_present_mask. This can be achieved by
reserving the last quota for the boot cpu.

Note: the restriction on nr_cpus will be lifted with more effort in the
successive patches

Signed-off-by: Pingfan Liu 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: Mahesh Salgaonkar 
Cc: Wen Xiong 
Cc: Baoquan He 
Cc: Ming Lei 
Cc: ke...@lists.infradead.org
To: linuxppc-dev@lists.ozlabs.org
---
   arch/powerpc/kernel/setup-common.c | 25 ++---
   1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index 81291e13dec0..f9ef0a2666b0 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -454,8 +454,8 @@ struct interrupt_server_node {
   void __init smp_setup_cpu_maps(void)
   {
struct device_node *dn;
-   int shift = 0, cpu = 0;
-   int j, nthreads = 1;
+   int terminate, shift = 0, cpu = 0;
+   int j, bt_thread = 0, nthreads = 1;
int len;
struct interrupt_server_node *intserv_node, *n;
struct list_head *bt_node, head;
@@ -518,6 +518,7 @@ void __init smp_setup_cpu_maps(void)
for (j = 0 ; j < nthreads; j++) {
if (be32_to_cpu(intserv[j]) == boot_cpu_hwid) {
bt_node = &intserv_node->node;
+   bt_thread = j;
found_boot_cpu = true;
/*
 * Record the round-shift between dt
@@ -537,11 +538,21 @@ void __init smp_setup_cpu_maps(void)
/* Select the primary thread, the boot cpu's slibing, as the logic 0 */
list_add_tail(&head, bt_node);
pr_info("the round shift between dt seq and the cpu logic number: 
%d\n", shift);
+   terminate = nr_cpu_ids;
list_for_each_entry(intserv_node, , node) {
+   j = 0;



+   /* Choose a start point to cover the boot cpu */
+   if (nr_cpu_ids - 1 < bt_thread) {
+   /*
+* The processor core puts assumption on the thread id,
+* not to breach the assumption.
+*/
+   terminate = nr_cpu_ids - 1;


nthreads is anyway assumed to be the same for all cores. So, enforcing
nr_cpu_ids to be a minimum of nthreads (and a multiple of nthreads) should
make the code much simpler, without the need for the above check and the
other complexities addressed in the subsequent patches...



Indeed, this series can be split into two parts, [1-2/5] and [3-5/5].
In [1-2/5], if smaller, nr_cpu_ids is enforced to be equal to
nthreads. I will make it align upward to nthreads in the next version.
So [1-2/5] can be totally independent of the rest of the patches in
this series.


Yup. Would prefer it that way.


 From an engineer's perspective, [3-5/5] are added to maintain the
nr_cpus semantics. (Ultimately, nr_cpus=1 can be achieved, but it
requires effort in other subsystems.)


I understand it would be nice to maintain the semantics, but it is not
worth the complexity it brings, IMHO. So, my suggestion would be to
drop [3-5/5].

Thanks
Hari


Re: [PATCHv8 2/5] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt

2023-10-10 Thread Hari Bathini




On 09/10/23 5:00 pm, Pingfan Liu wrote:

*** Idea ***
For kexec -p, the boot cpu can be not the cpu0, this causes the problem
of allocating memory for paca_ptrs[]. However, in theory, there is no
requirement to assign cpu's logical id as its present sequence in the
device tree. But there is something like cpu_first_thread_sibling(),
which makes assumption on the mapping inside a core. Hence partially
loosening the mapping, i.e. unbind the mapping of core while keep the
mapping inside a core.

*** Implement ***
At this early stage, there are plenty of memory to utilize. Hence, this
patch allocates interim memory to link the cpu info on a list, then
reorder cpus by changing the list head. As a result, there is a rotate
shift between the sequence number in dt and the cpu logical number.

*** Result ***
After this patch, a boot-cpu's logical id will always be mapped into the
range [0,threads_per_core).

Besides this, at this phase, all threads in the boot core are forced to
be onlined. This restriction will be lifted in a later patch with
extra effort.

Signed-off-by: Pingfan Liu 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: Mahesh Salgaonkar 
Cc: Wen Xiong 
Cc: Baoquan He 
Cc: Ming Lei 
Cc: ke...@lists.infradead.org
To: linuxppc-dev@lists.ozlabs.org
---
  arch/powerpc/kernel/prom.c | 25 +
  arch/powerpc/kernel/setup-common.c | 87 +++---
  2 files changed, 85 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index ec82f5bda908..87272a2d8c10 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -76,7 +76,9 @@ u64 ppc64_rma_size;
  unsigned int boot_cpu_node_count __ro_after_init;
  #endif
  static phys_addr_t first_memblock_size;
+#ifdef CONFIG_SMP
  static int __initdata boot_cpu_count;
+#endif
  
  static int __init early_parse_mem(char *p)

  {
@@ -331,8 +333,7 @@ static int __init early_init_dt_scan_cpus(unsigned long 
node,
const __be32 *intserv;
int i, nthreads;
int len;
-   int found = -1;
-   int found_thread = 0;
+   bool found = false;
  
  	/* We are scanning "cpu" nodes only */

if (type == NULL || strcmp(type, "cpu") != 0)
@@ -355,8 +356,15 @@ static int __init early_init_dt_scan_cpus(unsigned long 
node,
for (i = 0; i < nthreads; i++) {
if (be32_to_cpu(intserv[i]) ==
fdt_boot_cpuid_phys(initial_boot_params)) {
-   found = boot_cpu_count;
-   found_thread = i;
+   /*
+* always map the boot-cpu logical id into the
+* range of [0, thread_per_core)
+*/
+   boot_cpuid = i;
+   found = true;
+   /* This works around the hole in paca_ptrs[]. */
+   if (nr_cpu_ids < nthreads)
+   set_nr_cpu_ids(nthreads);
}
  #ifdef CONFIG_SMP
/* logical cpu id is always 0 on UP kernels */
@@ -365,14 +373,13 @@ static int __init early_init_dt_scan_cpus(unsigned long 
node,
}
  
  	/* Not the boot CPU */

-   if (found < 0)
+   if (!found)
return 0;
  
-	DBG("boot cpu: logical %d physical %d\n", found,

-   be32_to_cpu(intserv[found_thread]));
-   boot_cpuid = found;
+   DBG("boot cpu: logical %d physical %d\n", boot_cpuid,
+   be32_to_cpu(intserv[boot_cpuid]));
  
-	boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);

+   boot_cpu_hwid = be32_to_cpu(intserv[boot_cpuid]);
  
  	/*

 * PAPR defines "logical" PVR values for cpus that
diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index 1b19a9815672..81291e13dec0 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -36,6 +36,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -425,6 +426,13 @@ static void __init cpu_init_thread_core_maps(int tpc)
  
  u32 *cpu_to_phys_id = NULL;
  
+struct interrupt_server_node {

+   struct list_head node;
+   bool avail;
+   int len;
+   __be32 *intserv;
+};
+
  /**
   * setup_cpu_maps - initialize the following cpu maps:
   *  cpu_possible_mask
@@ -446,11 +454,16 @@ u32 *cpu_to_phys_id = NULL;
  void __init smp_setup_cpu_maps(void)
  {
struct device_node *dn;
-   int cpu = 0;
-   int nthreads = 1;
+   int shift = 0, cpu = 0;
+   int j, nthreads = 1;
+   int len;
+   struct interrupt_server_node *intserv_node, *n;
+   struct list_head *bt_node, head;
+   bool avail, found_boot_cpu = false;
  
  	DBG("smp_setup_cpu_maps()\n");
  
+	INIT_LIST_HEAD(&head);

cpu_to_phys_id = memblock_alloc(nr_cpu_ids * sizeof(u32),
__alignof__(u32));
if 

Re: [PATCHv8 3/5] powerpc/setup: Handle the case when boot_cpuid greater than nr_cpus

2023-10-10 Thread Hari Bathini




On 09/10/23 5:00 pm, Pingfan Liu wrote:

If the boot_cpuid is smaller than nr_cpus, it requires extra effort to
ensure the boot_cpu is in cpu_present_mask. This can be achieved by
reserving the last quota for the boot cpu.

Note: the restriction on nr_cpus will be lifted with more effort in the
successive patches

Signed-off-by: Pingfan Liu 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: Mahesh Salgaonkar 
Cc: Wen Xiong 
Cc: Baoquan He 
Cc: Ming Lei 
Cc: ke...@lists.infradead.org
To: linuxppc-dev@lists.ozlabs.org
---
  arch/powerpc/kernel/setup-common.c | 25 ++---
  1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index 81291e13dec0..f9ef0a2666b0 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -454,8 +454,8 @@ struct interrupt_server_node {
  void __init smp_setup_cpu_maps(void)
  {
struct device_node *dn;
-   int shift = 0, cpu = 0;
-   int j, nthreads = 1;
+   int terminate, shift = 0, cpu = 0;
+   int j, bt_thread = 0, nthreads = 1;
int len;
struct interrupt_server_node *intserv_node, *n;
struct list_head *bt_node, head;
@@ -518,6 +518,7 @@ void __init smp_setup_cpu_maps(void)
for (j = 0 ; j < nthreads; j++) {
if (be32_to_cpu(intserv[j]) == boot_cpu_hwid) {
bt_node = &intserv_node->node;
+   bt_thread = j;
found_boot_cpu = true;
/*
 * Record the round-shift between dt
@@ -537,11 +538,21 @@ void __init smp_setup_cpu_maps(void)
/* Select the primary thread, the boot cpu's slibing, as the logic 0 */
list_add_tail(&head, bt_node);
pr_info("the round shift between dt seq and the cpu logic number: 
%d\n", shift);
+   terminate = nr_cpu_ids;
list_for_each_entry(intserv_node, , node) {
  
+		j = 0;



+   /* Choose a start point to cover the boot cpu */
+   if (nr_cpu_ids - 1 < bt_thread) {
+   /*
+* The processor core puts assumption on the thread id,
+* not to breach the assumption.
+*/
+   terminate = nr_cpu_ids - 1;


nthreads is anyway assumed to be the same for all cores. So, enforcing
nr_cpu_ids to be a minimum of nthreads (and a multiple of nthreads) should
make the code much simpler, without the need for the above check and the
other complexities addressed in the subsequent patches...
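
For illustration, a rough sketch of that suggestion (hypothetical, not
from the posted series; set_nr_cpu_ids() and nthreads as used in the
quoted patch):

	/* Round nr_cpu_ids up to a whole number of threads per core so
	 * the boot thread always falls within the covered range. */
	if (nr_cpu_ids % nthreads)
		set_nr_cpu_ids(roundup(nr_cpu_ids, nthreads));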

Thanks
Hari


Re: [PATCH v5 1/5] powerpc/code-patching: introduce patch_instructions()

2023-10-06 Thread Hari Bathini

Thanks for the review, Song.

On 29/09/23 2:38 am, Song Liu wrote:

On Thu, Sep 28, 2023 at 12:48 PM Hari Bathini  wrote:


patch_instruction() entails setting up pte, patching the instruction,
clearing the pte and flushing the tlb. If multiple instructions need
to be patched, every instruction would have to go through the above
drill unnecessarily. Instead, introduce function patch_instructions()
that sets up the pte, clears the pte and flushes the tlb only once per
page range of instructions to be patched. This adds a slight overhead
to patch_instruction() call while improving the patching time for
scenarios where more than one instruction needs to be patched.

Signed-off-by: Hari Bathini 


Acked-by: Song Liu 

With a nit below.

[...]

+/*
+ * A page is mapped and instructions that fit the page are patched.
+ * Assumes 'len' to be (PAGE_SIZE - offset_in_page(addr)) or below.
+ */
+static int __do_patch_instructions_mm(u32 *addr, void *code, size_t len, bool 
repeat_instr)
  {
 int err;
 u32 *patch_addr;
@@ -307,11 +336,15 @@ static int __do_patch_instruction_mm(u32 *addr, 
ppc_inst_t instr)

 orig_mm = start_using_temp_mm(patching_mm);

-   err = __patch_instruction(addr, instr, patch_addr);
+   /* Single instruction case. */
+   if (len == 0) {
+   err = __patch_instruction(addr, *(ppc_inst_t *)code, 
patch_addr);


len == 0 for single instruction is a little weird to me. How about we just use
len == 4 or 8 depending on the instruction to patch?


Yeah. It looks a bit weird, but it avoids the need to call
ppc_inst_read() & ppc_inst_len(). A comment explaining this weird
check would have been better though.
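
For reference, a sketch of the alternative being discussed, i.e. what
the caller would need to do to pass a real length down, which the
len == 0 special case avoids:

	ppc_inst_t instr = ppc_inst_read(code);
	size_t len = ppc_inst_len(instr);	/* 4, or 8 for a prefixed insn */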

Thanks
Hari


Re: [PATCH v5 1/5] powerpc/code-patching: introduce patch_instructions()

2023-10-06 Thread Hari Bathini

Hi Christophe,


On 29/09/23 2:09 pm, Christophe Leroy wrote:



Le 28/09/2023 à 21:48, Hari Bathini a écrit :

patch_instruction() entails setting up pte, patching the instruction,
clearing the pte and flushing the tlb. If multiple instructions need
to be patched, every instruction would have to go through the above
drill unnecessarily. Instead, introduce function patch_instructions()
that sets up the pte, clears the pte and flushes the tlb only once per
page range of instructions to be patched. This adds a slight overhead
to patch_instruction() call while improving the patching time for
scenarios where more than one instruction needs to be patched.


On my powerpc8xx, this patch leads to an increase of about 8% of the
time needed to activate ftrace function tracer.


Interesting! My observation on ppc64le was somewhat different.
With a single CPU, the average ticks were almost the same with and
without the patch (~1580). Activating the function tracer, I saw a
performance degradation of less than 0.6% with this patch compared
to without it.

Ticks to activate function tracer in 15 attempts without
this patch (avg: 108734089):
106619626
111712292
111030404
111021344
111313530
106253773
107156175
106887038
107215379
108646636
108040287
108311770
107842343
106894310
112066423

Ticks to activate function tracer in 15 attempts with
this patch (avg: 109328578):
109378357
108794095
108595381
107622142
110689418
107287276
107132093
112540481
111311830
112608265
102883923
112054554
111762570
109874309
107393979

I used the below patch for the experiment:

diff --git a/arch/powerpc/lib/code-patching.c 
b/arch/powerpc/lib/code-patching.c

index b00112d7ad4..0979d12d00c 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -19,6 +19,10 @@
 #include 
 #include 
 #include 
+#include 
+
+unsigned long patching_time;
+unsigned long num_times;

 static int __patch_instruction(u32 *exec_addr, ppc_inst_t instr, u32 
*patch_addr)

 {
@@ -353,7 +357,7 @@ static int __do_patch_instruction(u32 *addr, 
ppc_inst_t instr)

return err;
 }

-int patch_instruction(u32 *addr, ppc_inst_t instr)
+int ___patch_instruction(u32 *addr, ppc_inst_t instr)
 {
int err;
unsigned long flags;
@@ -376,6 +380,19 @@ int patch_instruction(u32 *addr, ppc_inst_t instr)

return err;
 }
+
+int patch_instruction(u32 *addr, ppc_inst_t instr)
+{
+   u64 start;
+   int err;
+
+   start = get_tb();
+   err = ___patch_instruction(addr, instr);
+   patching_time += (get_tb() - start);
+   num_times++;
+
+   return err;
+}
 NOKPROBE_SYMBOL(patch_instruction);

 int patch_branch(u32 *addr, unsigned long target, int flags)
diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
index 1d4bc493b2f..f52694cfeab 100644
--- a/kernel/ksysfs.c
+++ b/kernel/ksysfs.c
@@ -35,6 +35,18 @@ static struct kobj_attribute _name##_attr = 
__ATTR_RO(_name)

 #define KERNEL_ATTR_RW(_name) \
 static struct kobj_attribute _name##_attr = __ATTR_RW(_name)

+unsigned long patch_avgtime;
+extern unsigned long patching_time;
+extern unsigned long num_times;
+
+static ssize_t patching_avgtime_show(struct kobject *kobj,
+struct kobj_attribute *attr, char *buf)
+{
+   patch_avgtime = patching_time / num_times;
+   return sysfs_emit(buf, "%lu\n", patch_avgtime);
+}
+KERNEL_ATTR_RO(patching_avgtime);
+
 /* current uevent sequence number */
 static ssize_t uevent_seqnum_show(struct kobject *kobj,
  struct kobj_attribute *attr, char *buf)
@@ -250,6 +262,7 @@ struct kobject *kernel_kobj;
 EXPORT_SYMBOL_GPL(kernel_kobj);

 static struct attribute * kernel_attrs[] = {
+   &patching_avgtime_attr.attr,
    &fscaps_attr.attr,
    &uevent_seqnum_attr.attr,
    &cpu_byteorder_attr.attr,
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index abaaf516fca..5eb950bcab9 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -50,6 +50,7 @@
 #include 

 #include  /* COMMAND_LINE_SIZE */
+#include 

 #include "trace.h"
 #include "trace_output.h"
@@ -6517,6 +6518,7 @@ int tracing_set_tracer(struct trace_array *tr, 
const char *buf)

bool had_max_tr;
 #endif
int ret = 0;
+   u64 start;

	mutex_lock(&trace_types_lock);

@@ -6536,6 +6538,10 @@ int tracing_set_tracer(struct trace_array *tr, 
const char *buf)

ret = -EINVAL;
goto out;
}
+
+   pr_warn("Current tracer: %s, Changing to tracer: %s\n",
+   tr->current_trace->name, t->name);
+   start = get_tb();
if (t == tr->current_trace)
goto out;

@@ -6614,6 +6620,7 @@ int tracing_set_tracer(struct trace_array *tr, 
const char *buf)

tr->current_trace->enabled++;
trace_branch_enable(tr);
  out:
+   pr_warn("Time taken to enable tracer is %llu\n", (get_tb() - start));
	mutex_unlock(&trace_types_lock);

return ret;

Thanks
Hari


Re: [PATCH v4 4/5] powerpc/code-patching: introduce patch_instructions()

2023-09-28 Thread Hari Bathini




On 26/09/23 12:21 pm, Christophe Leroy wrote:



Le 26/09/2023 à 00:50, Song Liu a écrit :

On Fri, Sep 8, 2023 at 6:28 AM Hari Bathini  wrote:


patch_instruction() entails setting up pte, patching the instruction,
clearing the pte and flushing the tlb. If multiple instructions need
to be patched, every instruction would have to go through the above
drill unnecessarily. Instead, introduce function patch_instructions()
that sets up the pte, clears the pte and flushes the tlb only once per
page range of instructions to be patched. This adds a slight overhead
to patch_instruction() call while improving the patching time for
scenarios where more than one instruction needs to be patched.

Signed-off-by: Hari Bathini 


I didn't see this one when I reviewed 1/5. Please ignore that comment.


If I remember correctly, patch 1 introduces a huge performance
degradation, which is then improved by this patch.

As I said before, I'd expect patch 4 to go first and then have
bpf_arch_text_copy() implemented with patch_instructions() directly.



Thanks for the reviews, Christophe & Song.
Posted v5 based on your inputs.

- Hari


[PATCH v5 5/5] powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free]

2023-09-28 Thread Hari Bathini
Use bpf_jit_binary_pack_alloc in powerpc jit. The jit engine first
writes the program to the rw buffer. When the jit is done, the program
is copied to the final location with bpf_jit_binary_pack_finalize.
With multiple jit_subprogs, bpf_jit_free is called on some subprograms
that haven't got bpf_jit_binary_pack_finalize() yet. Implement custom
bpf_jit_free() like in commit 1d5f82d9dd47 ("bpf, x86: fix freeing of
not-finalized bpf_prog_pack") to call bpf_jit_binary_pack_finalize(),
if necessary. As bpf_flush_icache() is not needed anymore, remove it.

Signed-off-by: Hari Bathini 
---
 arch/powerpc/net/bpf_jit.h|  18 ++---
 arch/powerpc/net/bpf_jit_comp.c   | 106 ++
 arch/powerpc/net/bpf_jit_comp32.c |  13 ++--
 arch/powerpc/net/bpf_jit_comp64.c |  10 +--
 4 files changed, 96 insertions(+), 51 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 72b7bb34fade..cdea5dccaefe 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -36,9 +36,6 @@
EMIT(PPC_RAW_BRANCH(offset)); \
} while (0)
 
-/* bl (unconditional 'branch' with link) */
-#define PPC_BL(dest)   EMIT(PPC_RAW_BL((dest) - (unsigned long)(image + 
ctx->idx)))
-
 /* "cond" here covers BO:BI fields. */
 #define PPC_BCC_SHORT(cond, dest)\
do {  \
@@ -147,12 +144,6 @@ struct codegen_context {
 #define BPF_FIXUP_LEN  2 /* Two instructions => 8 bytes */
 #endif
 
-static inline void bpf_flush_icache(void *start, void *end)
-{
-   smp_wmb();  /* smp write barrier */
-   flush_icache_range((unsigned long)start, (unsigned long)end);
-}
-
 static inline bool bpf_is_seen_register(struct codegen_context *ctx, int i)
 {
return ctx->seen & (1 << (31 - i));
@@ -169,16 +160,17 @@ static inline void bpf_clear_seen_register(struct 
codegen_context *ctx, int i)
 }
 
 void bpf_jit_init_reg_mapping(struct codegen_context *ctx);
-int bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 
func);
-int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context 
*ctx,
+int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context 
*ctx, u64 func);
+int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct 
codegen_context *ctx,
   u32 *addrs, int pass, bool extra_pass);
 void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_realloc_regs(struct codegen_context *ctx);
 int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int 
tmp_reg, long exit_addr);
 
-int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, int pass, struct 
codegen_context *ctx,
- int insn_idx, int jmp_off, int dst_reg);
+int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, u32 *fimage, int 
pass,
+ struct codegen_context *ctx, int insn_idx,
+ int jmp_off, int dst_reg);
 
 #endif
 
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index e7ca270a39d5..a79d7c478074 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -44,9 +44,12 @@ int bpf_jit_emit_exit_insn(u32 *image, struct 
codegen_context *ctx, int tmp_reg,
 }
 
 struct powerpc_jit_data {
-   struct bpf_binary_header *header;
+   /* address of rw header */
+   struct bpf_binary_header *hdr;
+   /* address of ro final header */
+   struct bpf_binary_header *fhdr;
u32 *addrs;
-   u8 *image;
+   u8 *fimage;
u32 proglen;
struct codegen_context ctx;
 };
@@ -67,11 +70,14 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
struct codegen_context cgctx;
int pass;
int flen;
-   struct bpf_binary_header *bpf_hdr;
+   struct bpf_binary_header *fhdr = NULL;
+   struct bpf_binary_header *hdr = NULL;
struct bpf_prog *org_fp = fp;
struct bpf_prog *tmp_fp;
bool bpf_blinded = false;
bool extra_pass = false;
+   u8 *fimage = NULL;
+   u32 *fcode_base;
u32 extable_len;
u32 fixup_len;
 
@@ -101,9 +107,16 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
addrs = jit_data->addrs;
if (addrs) {
cgctx = jit_data->ctx;
-   image = jit_data->image;
-   bpf_hdr = jit_data->header;
+   /*
+* JIT compiled to a writable location (image/code_base) first.
+* It is then moved to the readonly final location 
(fimage/fcode_base)
+* using instruction patching.
+*/
+   fimage = jit_data->fimage;
+   fhd

[PATCH v5 4/5] powerpc/bpf: rename powerpc64_jit_data to powerpc_jit_data

2023-09-28 Thread Hari Bathini
powerpc64_jit_data is a misnomer as it is meant for both ppc32 and
ppc64. Rename it to powerpc_jit_data.

Signed-off-by: Hari Bathini 
---
 arch/powerpc/net/bpf_jit_comp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index ecd7cffbbe28..e7ca270a39d5 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -43,7 +43,7 @@ int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context 
*ctx, int tmp_reg,
return 0;
 }
 
-struct powerpc64_jit_data {
+struct powerpc_jit_data {
struct bpf_binary_header *header;
u32 *addrs;
u8 *image;
@@ -63,7 +63,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
u8 *image = NULL;
u32 *code_base;
u32 *addrs;
-   struct powerpc64_jit_data *jit_data;
+   struct powerpc_jit_data *jit_data;
struct codegen_context cgctx;
int pass;
int flen;
-- 
2.41.0



[PATCH v5 3/5] powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack

2023-09-28 Thread Hari Bathini
Implement bpf_arch_text_invalidate and use it to fill unused part of
the bpf_prog_pack with trap instructions when a BPF program is freed.

Signed-off-by: Hari Bathini 
---
 arch/powerpc/net/bpf_jit_comp.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index c740eac8d584..ecd7cffbbe28 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -292,3 +292,18 @@ void *bpf_arch_text_copy(void *dst, void *src, size_t len)
 
return err ? ERR_PTR(err) : dst;
 }
+
+int bpf_arch_text_invalidate(void *dst, size_t len)
+{
+   u32 insn = BREAKPOINT_INSTRUCTION;
+   int ret;
+
+   if (WARN_ON_ONCE(core_kernel_text((unsigned long)dst)))
+   return -EINVAL;
+
+   mutex_lock(&text_mutex);
+   ret = patch_instructions(dst, &insn, len, true);
+   mutex_unlock(&text_mutex);
+
+   return ret;
+}
-- 
2.41.0



[PATCH v5 2/5] powerpc/bpf: implement bpf_arch_text_copy

2023-09-28 Thread Hari Bathini
bpf_arch_text_copy is used to dump JITed binary to RX page, allowing
multiple BPF programs to share the same page. Use the newly introduced
patch_instructions() to implement it.

Signed-off-by: Hari Bathini 
---
 arch/powerpc/net/bpf_jit_comp.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 37043dfc1add..c740eac8d584 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -13,9 +13,13 @@
 #include 
 #include 
 #include 
-#include 
+#include 
+#include 
 #include 
 
+#include 
+#include 
+
 #include "bpf_jit.h"
 
 static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
@@ -274,3 +278,17 @@ int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, 
int pass, struct code
ctx->exentry_idx++;
return 0;
 }
+
+void *bpf_arch_text_copy(void *dst, void *src, size_t len)
+{
+   int err;
+
+   if (WARN_ON_ONCE(core_kernel_text((unsigned long)dst)))
+   return ERR_PTR(-EINVAL);
+
+   mutex_lock(&text_mutex);
+   err = patch_instructions(dst, src, len, false);
+   mutex_unlock(&text_mutex);
+
+   return err ? ERR_PTR(err) : dst;
+}
-- 
2.41.0



[PATCH v5 1/5] powerpc/code-patching: introduce patch_instructions()

2023-09-28 Thread Hari Bathini
patch_instruction() entails setting up pte, patching the instruction,
clearing the pte and flushing the tlb. If multiple instructions need
to be patched, every instruction would have to go through the above
drill unnecessarily. Instead, introduce function patch_instructions()
that sets up the pte, clears the pte and flushes the tlb only once per
page range of instructions to be patched. This adds a slight overhead
to patch_instruction() call while improving the patching time for
scenarios where more than one instruction needs to be patched.

Signed-off-by: Hari Bathini 
---
 arch/powerpc/include/asm/code-patching.h |  1 +
 arch/powerpc/lib/code-patching.c | 93 +---
 2 files changed, 85 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/code-patching.h 
b/arch/powerpc/include/asm/code-patching.h
index 3f881548fb61..43a4aedfa703 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -74,6 +74,7 @@ int create_cond_branch(ppc_inst_t *instr, const u32 *addr,
 int patch_branch(u32 *addr, unsigned long target, int flags);
 int patch_instruction(u32 *addr, ppc_inst_t instr);
 int raw_patch_instruction(u32 *addr, ppc_inst_t instr);
+int patch_instructions(void *addr, void *code, size_t len, bool repeat_instr);
 
 static inline unsigned long patch_site_addr(s32 *site)
 {
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index b00112d7ad46..4ff002bc41f6 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -278,7 +278,36 @@ static void unmap_patch_area(unsigned long addr)
flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
 }
 
-static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
+static int __patch_instructions(u32 *patch_addr, void *code, size_t len, bool 
repeat_instr)
+{
+   unsigned long start = (unsigned long)patch_addr;
+
+   /* Repeat instruction */
+   if (repeat_instr) {
+   ppc_inst_t instr = ppc_inst_read(code);
+
+   if (ppc_inst_prefixed(instr)) {
+   u64 val = ppc_inst_as_ulong(instr);
+
+   memset64((uint64_t *)patch_addr, val, len / 8);
+   } else {
+   u32 val = ppc_inst_val(instr);
+
+   memset32(patch_addr, val, len / 4);
+   }
+   } else
+   memcpy(patch_addr, code, len);
+
+   smp_wmb();  /* smp write barrier */
+   flush_icache_range(start, start + len);
+   return 0;
+}
+
+/*
+ * A page is mapped and instructions that fit the page are patched.
+ * Assumes 'len' to be (PAGE_SIZE - offset_in_page(addr)) or below.
+ */
+static int __do_patch_instructions_mm(u32 *addr, void *code, size_t len, bool 
repeat_instr)
 {
int err;
u32 *patch_addr;
@@ -307,11 +336,15 @@ static int __do_patch_instruction_mm(u32 *addr, 
ppc_inst_t instr)
 
orig_mm = start_using_temp_mm(patching_mm);
 
-   err = __patch_instruction(addr, instr, patch_addr);
+   /* Single instruction case. */
+   if (len == 0) {
+   err = __patch_instruction(addr, *(ppc_inst_t *)code, 
patch_addr);
 
-   /* hwsync performed by __patch_instruction (sync) if successful */
-   if (err)
-   mb();  /* sync */
+   /* hwsync performed by __patch_instruction (sync) if successful 
*/
+   if (err)
+   mb();  /* sync */
+   } else
+   err = __patch_instructions(patch_addr, code, len, repeat_instr);
 
/* context synchronisation performed by __patch_instruction (isync or 
exception) */
stop_using_temp_mm(patching_mm, orig_mm);
@@ -328,7 +361,11 @@ static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t 
instr)
return err;
 }
 
-static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
+/*
+ * A page is mapped and instructions that fit the page are patched.
+ * Assumes 'len' to be (PAGE_SIZE - offset_in_page(addr)) or below.
+ */
+static int __do_patch_instructions(u32 *addr, void *code, size_t len, bool 
repeat_instr)
 {
int err;
u32 *patch_addr;
@@ -345,7 +382,11 @@ static int __do_patch_instruction(u32 *addr, ppc_inst_t 
instr)
if (radix_enabled())
asm volatile("ptesync": : :"memory");
 
-   err = __patch_instruction(addr, instr, patch_addr);
+   /* Single instruction case. */
+   if (len == 0)
+   err = __patch_instruction(addr, *(ppc_inst_t *)code, 
patch_addr);
+   else
+   err = __patch_instructions(patch_addr, code, len, repeat_instr);
 
	pte_clear(&init_mm, text_poke_addr, pte);
flush_tlb_kernel_range(text_poke_addr, text_poke_addr + PAGE_SIZE);
@@ -369,15 +410,49 @@ int patch_instruction(u32 *addr, ppc_inst_t instr)
 
local_irq_save(flags);
if (mm_patch_enabled())
-   err = __do_patch_instru

[PATCH v5 0/5] powerpc/bpf: use BPF prog pack allocator

2023-09-28 Thread Hari Bathini
Most BPF programs are small, but they consume a page each. For systems
with busy traffic and many BPF programs, this may also add significant
pressure on instruction TLB. High iTLB pressure usually slows down the
whole system causing visible performance degradation for production
workloads.

bpf_prog_pack, a customized allocator that packs multiple bpf programs
into preallocated memory chunks, was proposed [1] to address it. This
series extends this support on powerpc.

Both bpf_arch_text_copy() & bpf_arch_text_invalidate() functions,
needed for this support depend on instruction patching in text area.
Currently, patch_instruction() supports patching only one instruction
at a time. The first patch introduces patch_instructions() function
to enable patching more than one instruction at a time. This helps in
avoiding performance degradation while JITing bpf programs.

Patches 2 & 3 implement the above mentioned arch specific functions
using patch_instructions(). Patch 4 fixes a misnomer in bpf JITing
code. The last patch enables the use of BPF prog pack allocator on
powerpc and also, ensures cleanup is handled gracefully.

[1] https://lore.kernel.org/bpf/20220204185742.271030-1-s...@kernel.org/

Changes in v5:
* Moved introduction of patch_instructions() as 1st patch in series.
* Improved patch_instructions() to use memset & memcpy.
* Fixed the misnomer in JITing code as a separate patch.
* Removed unused bpf_flush_icache() function.

Changes in v4:
* Updated bpf_patch_instructions() definition in patch 1/5 so that
  it doesn't have to be updated again in patch 2/5.
* Addressed Christophe's comment on bpf_arch_text_invalidate() return
  value in patch 2/5.

Changes in v3:
* Fixed segfault issue observed on ppc32 due to inaccurate offset
  calculation for branching.
* Tried to minimize the performance impact for patch_instruction()
  with the introduction of patch_instructions().
* Corrected uses of u32* vs ppc_instr_t.
* Moved the change that introduces patch_instructions() to after
  enabling bpf_prog_pack support.
* Added few comments to improve code readability.


Hari Bathini (5):
  powerpc/code-patching: introduce patch_instructions()
  powerpc/bpf: implement bpf_arch_text_copy
  powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack
  powerpc/bpf: rename powerpc64_jit_data to powerpc_jit_data
  powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free]

 arch/powerpc/include/asm/code-patching.h |   1 +
 arch/powerpc/lib/code-patching.c |  93 +--
 arch/powerpc/net/bpf_jit.h   |  18 +--
 arch/powerpc/net/bpf_jit_comp.c  | 145 ++-
 arch/powerpc/net/bpf_jit_comp32.c|  13 +-
 arch/powerpc/net/bpf_jit_comp64.c|  10 +-
 6 files changed, 217 insertions(+), 63 deletions(-)

-- 
2.41.0



Re: [PATCH v4 0/5] powerpc/bpf: use BPF prog pack allocator

2023-09-25 Thread Hari Bathini

Ping!

Comments, please..

On 08/09/23 6:57 pm, Hari Bathini wrote:

Most BPF programs are small, but they consume a page each. For systems
with busy traffic and many BPF programs, this may also add significant
pressure on instruction TLB. High iTLB pressure usually slows down the
whole system causing visible performance degradation for production
workloads.

bpf_prog_pack, a customized allocator that packs multiple bpf programs
into preallocated memory chunks, was proposed [1] to address it. This
series extends this support on powerpc.

Patches 1 & 2 add the arch specific functions needed to support this
feature. Patch 3 enables the support for powerpc and ensures cleanup
is handled gracefully. Patch 4 introduces patch_instructions() that
optimizes some calls while patching more than one instruction. Patch 5
leverages this new function to improve time taken for JIT'ing BPF
programs.

Note that the first 3 patches are sufficient to enable the support
for bpf_prog_pack on powerpc. Patches 4 & 5 are to improve the JIT
compilation time of BPF programs on powerpc.

Changes in v4:
* Updated bpf_patch_instructions() definition in patch 1/5 so that
   it doesn't have to be updated again in patch 2/5.
* Addressed Christophe's comment on bpf_arch_text_invalidate() return
   value in patch 2/5.

Changes in v3:
* Fixed segfault issue observed on ppc32 due to inaccurate offset
   calculation for branching.
* Tried to minimize the performance impact for patch_instruction()
   with the introduction of patch_instructions().
* Corrected uses of u32* vs ppc_instr_t.
* Moved the change that introduces patch_instructions() to after
   enabling bpf_prog_pack support.
* Added few comments to improve code readability.

[1] https://lore.kernel.org/bpf/20220204185742.271030-1-s...@kernel.org/
[2] https://lore.kernel.org/all/20230309180028.180200-1-hbath...@linux.ibm.com/


Hari Bathini (5):
   powerpc/bpf: implement bpf_arch_text_copy
   powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack
   powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free]
   powerpc/code-patching: introduce patch_instructions()
   powerpc/bpf: use patch_instructions()

  arch/powerpc/include/asm/code-patching.h |   1 +
  arch/powerpc/lib/code-patching.c |  94 ---
  arch/powerpc/net/bpf_jit.h   |  12 +-
  arch/powerpc/net/bpf_jit_comp.c  | 144 ++-
  arch/powerpc/net/bpf_jit_comp32.c|  13 +-
  arch/powerpc/net/bpf_jit_comp64.c|  10 +-
  6 files changed, 211 insertions(+), 63 deletions(-)



Re: [PATCH] powerpc: add `cur_cpu_spec` symbol to vmcoreinfo

2023-09-15 Thread Hari Bathini




On 14/09/23 6:52 pm, Michael Ellerman wrote:

Aditya Gupta  writes:

Presently, while reading a vmcore, makedumpfile uses
`cur_cpu_spec.mmu_features` to decide whether the crashed system had
RADIX MMU or not.

Currently, makedumpfile fails to get the `cur_cpu_spec` symbol (unless
a vmlinux is passed with the `-x` flag to makedumpfile), and hence
assigns offsets and shifts (such as pgd_offset_l4) incorrectly,
considering the MMU to be hash MMU.

Add `cur_cpu_spec` symbol and offset of `mmu_features` in the
`cpu_spec` struct, to VMCOREINFO, so that the symbol address and offset
is accessible to makedumpfile, without needing the vmlinux file


This looks fine.

Seems like cpu_features would be needed or at least pretty useful too?


Yeah. It would be nice to have access to that too, to accurately
identify the kind of system the vmcore was generated on.

Thanks
Hari
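
For illustration, the kind of vmcoreinfo entries being discussed would
look roughly like this (a sketch of the idea, not the posted patch):

	VMCOREINFO_SYMBOL(cur_cpu_spec);
	VMCOREINFO_OFFSET(cpu_spec, mmu_features);
	VMCOREINFO_OFFSET(cpu_spec, cpu_features);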


Re: [PATCH v4 8/8] bpf ppc32: Access only if addr is kernel address

2023-09-14 Thread Hari Bathini




On 14/09/23 11:48 am, Christophe Leroy wrote:

Hi,



Hi Christophe,


Le 29/09/2021 à 13:18, Hari Bathini a écrit :

With KUAP enabled, any kernel code which wants to access userspace
needs to be surrounded by disable-enable KUAP. But that is not
happening for BPF_PROBE_MEM load instruction. Though PPC32 does not
support read protection, considering the fact that PTR_TO_BTF_ID
(which uses BPF_PROBE_MEM mode) could either be a valid kernel pointer
or NULL but should never be a pointer to userspace address, execute
BPF_PROBE_MEM load only if addr is kernel address, otherwise set
dst_reg=0 and move on.


While looking at the series "bpf: verifier: stop emitting zext for LDX"
from Puranjay I got a question on this old commit, see below.



This will catch NULL, valid or invalid userspace pointers. Only bad
kernel pointer will be handled by BPF exception table.

[Alexei suggested for x86]
Suggested-by: Alexei Starovoitov 
Signed-off-by: Hari Bathini 
---

Changes in v4:
* Adjusted the emit code to avoid using temporary reg.


   arch/powerpc/net/bpf_jit_comp32.c | 34 +++
   1 file changed, 34 insertions(+)

diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
b/arch/powerpc/net/bpf_jit_comp32.c
index 6ee13a09c70d..2ac81563c78d 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -818,6 +818,40 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
case BPF_LDX | BPF_PROBE_MEM | BPF_W:
case BPF_LDX | BPF_MEM | BPF_DW: /* dst = *(u64 *)(ul) (src + 
off) */
case BPF_LDX | BPF_PROBE_MEM | BPF_DW:
+   /*
+* As PTR_TO_BTF_ID that uses BPF_PROBE_MEM mode could 
either be a valid
+* kernel pointer or NULL but not a userspace address, 
execute BPF_PROBE_MEM
+* load only if addr is kernel address (see 
is_kernel_addr()), otherwise
+* set dst_reg=0 and move on.
+*/
+   if (BPF_MODE(code) == BPF_PROBE_MEM) {
+   PPC_LI32(_R0, TASK_SIZE - off);
+   EMIT(PPC_RAW_CMPLW(src_reg, _R0));
+   PPC_BCC(COND_GT, (ctx->idx + 5) * 4);
+   EMIT(PPC_RAW_LI(dst_reg, 0));
+   /*
+* For BPF_DW case, "li reg_h,0" would be 
needed when
+* !fp->aux->verifier_zext. Emit NOP otherwise.
+*
+* Note that "li reg_h,0" is emitted for 
BPF_B/H/W case,
+* if necessary. So, jump there insted of 
emitting an
+* additional "li reg_h,0" instruction.
+*/
+   if (size == BPF_DW && !fp->aux->verifier_zext)
+   EMIT(PPC_RAW_LI(dst_reg_h, 0));
+   else
+   EMIT(PPC_RAW_NOP());


Why do you need a NOP in the else case? Can't we just emit no
instruction in that case?


Yeah, but the same offset is used for all cases in the conditional
branch above. To drop the NOP, the conditional branch offset could be
calculated based on the above if condition, I guess.

Thanks,
Hari


Re: [PATCH v2 2/3] vmcore: allow fadump to export vmcore even if is_kdump_kernel() is false

2023-09-12 Thread Hari Bathini




On 11/09/23 4:01 pm, Baoquan He wrote:

On 09/11/23 at 05:13pm, Michael Ellerman wrote:

Hari Bathini  writes:

Currently, is_kdump_kernel() returns true when elfcorehdr_addr is set.
While elfcorehdr_addr is set for kexec based kernel dump mechanism,
alternate dump capturing methods like fadump [1] also set it to export
the vmcore. Since, is_kdump_kernel() is used to restrict resources in
crash dump capture kernel and such restrictions are not desirable for
fadump, allow is_kdump_kernel() to be defined differently for fadump
case. With that change, include is_fadump_active() check in functions
is_vmcore_usable() & vmcore_unusable() to be able to export vmcore for
fadump case too.

...

diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index 0f3a656293b0..de8a9fabfb6f 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -50,6 +50,7 @@ void vmcore_cleanup(void);
  #define vmcore_elf64_check_arch(x) (elf_check_arch(x) || 
vmcore_elf_check_arch_cross(x))
  #endif
  
+#ifndef is_kdump_kernel

  /*
   * is_kdump_kernel() checks whether this kernel is booting after a panic of
   * previous kernel or not. This is determined by checking if previous kernel
@@ -64,6 +65,19 @@ static inline bool is_kdump_kernel(void)
  {
return elfcorehdr_addr != ELFCORE_ADDR_MAX;
  }
+#endif
+
+#ifndef is_fadump_active
+/*
+ * If f/w assisted dump capturing mechanism (fadump), instead of kexec based
+ * dump capturing mechanism (kdump) is exporting the vmcore, then this function
+ * will be defined in arch specific code to return true, when appropriate.
+ */
+static inline bool is_fadump_active(void)
+{
+   return false;
+}
+#endif
  
  /* is_vmcore_usable() checks if the kernel is booting after a panic and

   * the vmcore region is usable.
@@ -75,7 +89,8 @@ static inline bool is_kdump_kernel(void)
  
  static inline int is_vmcore_usable(void)

  {
-   return is_kdump_kernel() && elfcorehdr_addr != ELFCORE_ADDR_ERR ? 1 : 0;
+   return (is_kdump_kernel() || is_fadump_active())
+   && elfcorehdr_addr != ELFCORE_ADDR_ERR ? 1 : 0;
  }
  
  /* vmcore_unusable() marks the vmcore as unusable,

@@ -84,7 +99,7 @@ static inline int is_vmcore_usable(void)
  
  static inline void vmcore_unusable(void)

  {
-   if (is_kdump_kernel())
+   if (is_kdump_kernel() || is_fadump_active())
elfcorehdr_addr = ELFCORE_ADDR_ERR;
  }


I think it would be cleaner to decouple is_vmcore_usable() and
vmcore_unusable() from is_kdump_kernel().

ie, make them operate solely based on the value of elfcorehdr_addr:

static inline int is_vmcore_usable(void)
{
	return elfcorehdr_addr != ELFCORE_ADDR_ERR &&
	       elfcorehdr_addr != ELFCORE_ADDR_MAX;
}


Agreed. I had a blind spot in my earlier thinking. The above change
is better.


Thanks for the reviews.
Posted v3.

- Hari


[PATCH v3 2/2] powerpc/fadump: make is_kdump_kernel() return false when fadump is active

2023-09-12 Thread Hari Bathini
Currently, is_kdump_kernel() returns true in crash dump capture kernel
for both kdump and fadump crash dump capturing methods, as both these
methods set elfcorehdr_addr. Some restrictions enforced for crash dump
capture kernel, based on is_kdump_kernel(), are specifically meant for
kdump case and not desirable for fadump - eg. IO queues restriction in
device drivers. So, define is_kdump_kernel() to return false when f/w
assisted dump is active.

Signed-off-by: Hari Bathini 
---
 arch/powerpc/include/asm/kexec.h |  8 ++--
 arch/powerpc/kernel/crash_dump.c | 12 
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index a1ddba01e7d1..e1b43aa12175 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -99,10 +99,14 @@ void relocate_new_kernel(unsigned long indirection_page, 
unsigned long reboot_co
 
 void kexec_copy_flush(struct kimage *image);
 
-#if defined(CONFIG_CRASH_DUMP) && defined(CONFIG_PPC_RTAS)
+#if defined(CONFIG_CRASH_DUMP)
+bool is_kdump_kernel(void);
+#define is_kdump_kernel is_kdump_kernel
+#if defined(CONFIG_PPC_RTAS)
 void crash_free_reserved_phys_range(unsigned long begin, unsigned long end);
 #define crash_free_reserved_phys_range crash_free_reserved_phys_range
-#endif
+#endif /* CONFIG_PPC_RTAS */
+#endif /* CONFIG_CRASH_DUMP */
 
 #ifdef CONFIG_KEXEC_FILE
 extern const struct kexec_file_ops kexec_elf64_ops;
diff --git a/arch/powerpc/kernel/crash_dump.c b/arch/powerpc/kernel/crash_dump.c
index 9a3b85bfc83f..2086fa6cdc25 100644
--- a/arch/powerpc/kernel/crash_dump.c
+++ b/arch/powerpc/kernel/crash_dump.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef DEBUG
 #include 
@@ -92,6 +93,17 @@ ssize_t copy_oldmem_page(struct iov_iter *iter, unsigned 
long pfn,
return csize;
 }
 
+/*
+ * Return true only when kexec based kernel dump capturing method is used.
+ * This ensures all restrictions applied for kdump case are not automatically
+ * applied for fadump case.
+ */
+bool is_kdump_kernel(void)
+{
+   return !is_fadump_active() && elfcorehdr_addr != ELFCORE_ADDR_MAX;
+}
+EXPORT_SYMBOL_GPL(is_kdump_kernel);
+
 #ifdef CONFIG_PPC_RTAS
 /*
  * The crashkernel region will almost always overlap the RTAS region, so
-- 
2.41.0



[PATCH v3 1/2] vmcore: remove dependency with is_kdump_kernel() for exporting vmcore

2023-09-12 Thread Hari Bathini
Currently, is_kdump_kernel() returns true when elfcorehdr_addr is set.
While elfcorehdr_addr is set for kexec based kernel dump mechanism,
alternate dump capturing methods like fadump [1] also set it to export
the vmcore. Since is_kdump_kernel() is used to restrict resources in
crash dump capture kernel and such restrictions may not be desirable
for fadump, allow is_kdump_kernel() to be defined differently for such
scenarios. With this, is_kdump_kernel() could be false while vmcore is
usable. So, remove unnecessary dependency with is_kdump_kernel(), for
exporting vmcore.

[1] https://docs.kernel.org/powerpc/firmware-assisted-dump.html

Suggested-by: Michael Ellerman 
Signed-off-by: Hari Bathini 
---

Changes in v3:
* Decoupled is_vmcore_usable() & vmcore_unusable() from is_kdump_kernel()
  as suggested here: 

https://lore.kernel.org/linuxppc-dev/ZP7si3UMVpPfYV+w@MiWiFi-R3L-srv/T/#m13ae5a7e4ba6f4d8397f0f66581832292eee3a85


 include/linux/crash_dump.h | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index 0f3a656293b0..acc55626afdc 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -50,6 +50,7 @@ void vmcore_cleanup(void);
 #define vmcore_elf64_check_arch(x) (elf_check_arch(x) || 
vmcore_elf_check_arch_cross(x))
 #endif
 
+#ifndef is_kdump_kernel
 /*
  * is_kdump_kernel() checks whether this kernel is booting after a panic of
  * previous kernel or not. This is determined by checking if previous kernel
@@ -64,6 +65,7 @@ static inline bool is_kdump_kernel(void)
 {
return elfcorehdr_addr != ELFCORE_ADDR_MAX;
 }
+#endif
 
 /* is_vmcore_usable() checks if the kernel is booting after a panic and
  * the vmcore region is usable.
@@ -75,7 +77,8 @@ static inline bool is_kdump_kernel(void)
 
 static inline int is_vmcore_usable(void)
 {
-   return is_kdump_kernel() && elfcorehdr_addr != ELFCORE_ADDR_ERR ? 1 : 0;
+   return elfcorehdr_addr != ELFCORE_ADDR_ERR &&
+   elfcorehdr_addr != ELFCORE_ADDR_MAX ? 1 : 0;
 }
 
 /* vmcore_unusable() marks the vmcore as unusable,
@@ -84,8 +87,7 @@ static inline int is_vmcore_usable(void)
 
 static inline void vmcore_unusable(void)
 {
-   if (is_kdump_kernel())
-   elfcorehdr_addr = ELFCORE_ADDR_ERR;
+   elfcorehdr_addr = ELFCORE_ADDR_ERR;
 }
 
 /**
-- 
2.41.0



[PATCH v4 5/5] powerpc/bpf: use patch_instructions()

2023-09-08 Thread Hari Bathini
Use the newly introduced patch_instructions() that handles patching
multiple instructions with one call. This improves the speed of execution
for JIT'ing bpf programs.

Without this patch (on a POWER9 lpar):

  # time modprobe test_bpf
  real2m59.681s
  user0m0.000s
  sys 1m44.160s
  #

With this patch (on a POWER9 lpar):

  # time modprobe test_bpf
  real0m5.013s
  user0m0.000s
  sys 0m4.216s
  #

Signed-off-by: Hari Bathini 
---
 arch/powerpc/net/bpf_jit_comp.c | 34 ++---
 1 file changed, 6 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index c89ff34504a1..1e5000d18321 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -26,28 +26,6 @@ static void bpf_jit_fill_ill_insns(void *area, unsigned int 
size)
memset32(area, BREAKPOINT_INSTRUCTION, size / 4);
 }
 
-/*
- * Patch 'len' bytes of instructions from opcode to addr, one instruction
- * at a time. Returns addr on success. ERR_PTR(-EINVAL), otherwise.
- */
-static void *bpf_patch_instructions(void *addr, void *opcode, size_t len, bool 
fill_insn)
-{
-   while (len > 0) {
-   ppc_inst_t insn = ppc_inst_read(opcode);
-   int ilen = ppc_inst_len(insn);
-
-   if (patch_instruction(addr, insn))
-   return ERR_PTR(-EINVAL);
-
-   len -= ilen;
-   addr = addr + ilen;
-   if (!fill_insn)
-   opcode = opcode + ilen;
-   }
-
-   return addr;
-}
-
 int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int 
tmp_reg, long exit_addr)
 {
if (!exit_addr || is_offset_in_branch_range(exit_addr - (ctx->idx * 
4))) {
@@ -330,31 +308,31 @@ int bpf_add_extable_entry(struct bpf_prog *fp, u32 
*image, u32 *fimage, int pass
 
 void *bpf_arch_text_copy(void *dst, void *src, size_t len)
 {
-   void *ret;
+   int err;
 
if (WARN_ON_ONCE(core_kernel_text((unsigned long)dst)))
return ERR_PTR(-EINVAL);
 
	mutex_lock(&text_mutex);
-   ret = bpf_patch_instructions(dst, src, len, false);
+   err = patch_instructions(dst, src, len, false);
	mutex_unlock(&text_mutex);
 
-   return ret;
+   return err ? ERR_PTR(err) : dst;
 }
 
 int bpf_arch_text_invalidate(void *dst, size_t len)
 {
u32 insn = BREAKPOINT_INSTRUCTION;
-   void *ret;
+   int ret;
 
if (WARN_ON_ONCE(core_kernel_text((unsigned long)dst)))
return -EINVAL;
 
	mutex_lock(&text_mutex);
-   ret = bpf_patch_instructions(dst, &insn, len, true);
+   ret = patch_instructions(dst, &insn, len, true);
	mutex_unlock(&text_mutex);
 
-   return IS_ERR(ret) ? PTR_ERR(ret) : 0;
+   return ret;
 }
 
 void bpf_jit_free(struct bpf_prog *fp)
-- 
2.41.0



[PATCH v4 4/5] powerpc/code-patching: introduce patch_instructions()

2023-09-08 Thread Hari Bathini
patch_instruction() entails setting up pte, patching the instruction,
clearing the pte and flushing the tlb. If multiple instructions need
to be patched, every instruction would have to go through the above
drill unnecessarily. Instead, introduce function patch_instructions()
that sets up the pte, clears the pte and flushes the tlb only once per
page range of instructions to be patched. This adds a slight overhead
to patch_instruction() call while improving the patching time for
scenarios where more than one instruction needs to be patched.
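
For illustration only — a hypothetical caller-side sketch (the names 'dst',
'insns' and 'N' are made up here, not part of this patch) of what the new
helper enables for a buffer of word instructions:

	u32 insns[N];	/* hypothetical buffer of instructions to patch */
	int i, err;

	/* before: pte setup, patch, pte clear and tlb flush per instruction */
	for (i = 0; i < N; i++)
		err = patch_instruction(dst + i, ppc_inst(insns[i]));

	/* after: that setup/teardown happens only once per page range */
	err = patch_instructions(dst, insns, N * sizeof(u32), false);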

Signed-off-by: Hari Bathini 
---
 arch/powerpc/include/asm/code-patching.h |  1 +
 arch/powerpc/lib/code-patching.c | 94 
 2 files changed, 80 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/code-patching.h 
b/arch/powerpc/include/asm/code-patching.h
index 3f881548fb61..4f5f0c091586 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -74,6 +74,7 @@ int create_cond_branch(ppc_inst_t *instr, const u32 *addr,
 int patch_branch(u32 *addr, unsigned long target, int flags);
 int patch_instruction(u32 *addr, ppc_inst_t instr);
 int raw_patch_instruction(u32 *addr, ppc_inst_t instr);
+int patch_instructions(void *addr, void *code, size_t len, bool fill_insn);
 
 static inline unsigned long patch_site_addr(s32 *site)
 {
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index b00112d7ad46..60d446e16b42 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -278,20 +278,25 @@ static void unmap_patch_area(unsigned long addr)
flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
 }
 
-static int __do_patch_instruction_mm(u32 *addr, ppc_inst_t instr)
+/*
+ * A page is mapped and instructions that fit the page are patched.
+ * Assumes 'len' to be (PAGE_SIZE - offset_in_page(addr)) or below.
+ */
+static int __do_patch_instructions_mm(u32 *addr, void *code, size_t len, bool 
fill_insn)
 {
-   int err;
-   u32 *patch_addr;
unsigned long text_poke_addr;
pte_t *pte;
unsigned long pfn = get_patch_pfn(addr);
struct mm_struct *patching_mm;
struct mm_struct *orig_mm;
+   ppc_inst_t instr;
+   void *patch_addr;
spinlock_t *ptl;
+   int ilen, err;
 
patching_mm = __this_cpu_read(cpu_patching_context.mm);
text_poke_addr = __this_cpu_read(cpu_patching_context.addr);
-   patch_addr = (u32 *)(text_poke_addr + offset_in_page(addr));
+   patch_addr = (void *)(text_poke_addr + offset_in_page(addr));
 
	pte = get_locked_pte(patching_mm, text_poke_addr, &ptl);
if (!pte)
@@ -307,11 +312,22 @@ static int __do_patch_instruction_mm(u32 *addr, 
ppc_inst_t instr)
 
orig_mm = start_using_temp_mm(patching_mm);
 
-   err = __patch_instruction(addr, instr, patch_addr);
+   while (len > 0) {
+   instr = ppc_inst_read(code);
+   ilen = ppc_inst_len(instr);
+   err = __patch_instruction(addr, instr, patch_addr);
+   /* hwsync performed by __patch_instruction (sync) if successful 
*/
+   if (err) {
+   mb();  /* sync */
+   break;
+   }
 
-   /* hwsync performed by __patch_instruction (sync) if successful */
-   if (err)
-   mb();  /* sync */
+   len -= ilen;
+   patch_addr = patch_addr + ilen;
+   addr = (void *)addr + ilen;
+   if (!fill_insn)
+   code = code + ilen;
+   }
 
/* context synchronisation performed by __patch_instruction (isync or 
exception) */
stop_using_temp_mm(patching_mm, orig_mm);
@@ -328,16 +344,21 @@ static int __do_patch_instruction_mm(u32 *addr, 
ppc_inst_t instr)
return err;
 }
 
-static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
+/*
+ * A page is mapped and instructions that fit the page are patched.
+ * Assumes 'len' to be (PAGE_SIZE - offset_in_page(addr)) or below.
+ */
+static int __do_patch_instructions(u32 *addr, void *code, size_t len, bool 
fill_insn)
 {
-   int err;
-   u32 *patch_addr;
unsigned long text_poke_addr;
pte_t *pte;
unsigned long pfn = get_patch_pfn(addr);
+   void *patch_addr;
+   ppc_inst_t instr;
+   int ilen, err;
 
text_poke_addr = (unsigned 
long)__this_cpu_read(cpu_patching_context.addr) & PAGE_MASK;
-   patch_addr = (u32 *)(text_poke_addr + offset_in_page(addr));
+   patch_addr = (void *)(text_poke_addr + offset_in_page(addr));
 
pte = __this_cpu_read(cpu_patching_context.pte);
	__set_pte_at(&init_mm, text_poke_addr, pte, pfn_pte(pfn, PAGE_KERNEL), 
0);
@@ -345,7 +366,19 @@ static int __do_patch_instruction(u32 *addr, ppc_inst_t 
instr)
if (radix_enabled())
asm volatile("ptesync": : :"memory");
 
-   err = __pa

[PATCH v4 3/5] powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free]

2023-09-08 Thread Hari Bathini
Use bpf_jit_binary_pack_alloc in powerpc jit. The jit engine first
writes the program to the rw buffer. When the jit is done, the program
is copied to the final location with bpf_jit_binary_pack_finalize.
With multiple jit_subprogs, bpf_jit_free is called on some subprograms
that haven't gone through bpf_jit_binary_pack_finalize() yet. Implement a custom
bpf_jit_free() like in commit 1d5f82d9dd47 ("bpf, x86: fix freeing of
not-finalized bpf_prog_pack") to call bpf_jit_binary_pack_finalize(),
if necessary. While here, correct the misnomer powerpc64_jit_data to
powerpc_jit_data as it is meant for both ppc32 and ppc64.
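
For readers new to the pack allocator, the flow described above is roughly
the following (a simplified sketch, not the exact code — see
bpf_int_jit_compile() in the diff below for the real call sites and
argument lists):

	/* allocate an RO slot (fhdr/fimage) plus an RW scratch buffer (hdr/image) */
	fhdr = bpf_jit_binary_pack_alloc(proglen, &fimage, alignment,
					 &hdr, &image, bpf_jit_fill_ill_insns);
	/* ... JIT the program into the writable 'image' buffer ... */
	/* move it to the final RO location, using bpf_arch_text_copy() underneath */
	bpf_jit_binary_pack_finalize(fp, fhdr, hdr);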

Signed-off-by: Hari Bathini 
---
 arch/powerpc/net/bpf_jit.h|  12 ++--
 arch/powerpc/net/bpf_jit_comp.c   | 110 ++
 arch/powerpc/net/bpf_jit_comp32.c |  13 ++--
 arch/powerpc/net/bpf_jit_comp64.c |  10 +--
 4 files changed, 98 insertions(+), 47 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 72b7bb34fade..02bd89aa8ea5 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -36,9 +36,6 @@
EMIT(PPC_RAW_BRANCH(offset)); \
} while (0)
 
-/* bl (unconditional 'branch' with link) */
-#define PPC_BL(dest)   EMIT(PPC_RAW_BL((dest) - (unsigned long)(image + 
ctx->idx)))
-
 /* "cond" here covers BO:BI fields. */
 #define PPC_BCC_SHORT(cond, dest)\
do {  \
@@ -169,16 +166,17 @@ static inline void bpf_clear_seen_register(struct 
codegen_context *ctx, int i)
 }
 
 void bpf_jit_init_reg_mapping(struct codegen_context *ctx);
-int bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 
func);
-int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context 
*ctx,
+int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context 
*ctx, u64 func);
+int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct 
codegen_context *ctx,
   u32 *addrs, int pass, bool extra_pass);
 void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_realloc_regs(struct codegen_context *ctx);
 int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int 
tmp_reg, long exit_addr);
 
-int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, int pass, struct 
codegen_context *ctx,
- int insn_idx, int jmp_off, int dst_reg);
+int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, u32 *fimage, int 
pass,
+ struct codegen_context *ctx, int insn_idx,
+ int jmp_off, int dst_reg);
 
 #endif
 
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index c0036eb171f0..c89ff34504a1 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -64,10 +64,13 @@ int bpf_jit_emit_exit_insn(u32 *image, struct 
codegen_context *ctx, int tmp_reg,
return 0;
 }
 
-struct powerpc64_jit_data {
-   struct bpf_binary_header *header;
+struct powerpc_jit_data {
+   /* address of rw header */
+   struct bpf_binary_header *hdr;
+   /* address of ro final header */
+   struct bpf_binary_header *fhdr;
u32 *addrs;
-   u8 *image;
+   u8 *fimage;
u32 proglen;
struct codegen_context ctx;
 };
@@ -84,15 +87,18 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
u8 *image = NULL;
u32 *code_base;
u32 *addrs;
-   struct powerpc64_jit_data *jit_data;
+   struct powerpc_jit_data *jit_data;
struct codegen_context cgctx;
int pass;
int flen;
-   struct bpf_binary_header *bpf_hdr;
+   struct bpf_binary_header *fhdr = NULL;
+   struct bpf_binary_header *hdr = NULL;
struct bpf_prog *org_fp = fp;
struct bpf_prog *tmp_fp;
bool bpf_blinded = false;
bool extra_pass = false;
+   u8 *fimage = NULL;
+   u32 *fcode_base;
u32 extable_len;
u32 fixup_len;
 
@@ -122,9 +128,16 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
addrs = jit_data->addrs;
if (addrs) {
cgctx = jit_data->ctx;
-   image = jit_data->image;
-   bpf_hdr = jit_data->header;
+   /*
+* JIT compiled to a writable location (image/code_base) first.
+* It is then moved to the readonly final location 
(fimage/fcode_base)
+* using instruction patching.
+*/
+   fimage = jit_data->fimage;
+   fhdr = jit_data->fhdr;
proglen = jit_data->proglen;
+   hdr = jit_data->hdr;
+   image = (void *)hdr + ((void *)fimage - (

[PATCH v4 2/5] powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack

2023-09-08 Thread Hari Bathini
Implement bpf_arch_text_invalidate and use it to fill unused part of
the bpf_prog_pack with trap instructions when a BPF program is freed.
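
For reference, the generic weak fallback in kernel/bpf/core.c that this arch
hook overrides just reports lack of support, roughly:

	int __weak bpf_arch_text_invalidate(void *dst, size_t len)
	{
		return -ENOTSUPP;
	}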

Signed-off-by: Hari Bathini 
---
 arch/powerpc/net/bpf_jit_comp.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 4f896222c579..c0036eb171f0 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -313,3 +313,18 @@ void *bpf_arch_text_copy(void *dst, void *src, size_t len)
 
return ret;
 }
+
+int bpf_arch_text_invalidate(void *dst, size_t len)
+{
+   u32 insn = BREAKPOINT_INSTRUCTION;
+   void *ret;
+
+   if (WARN_ON_ONCE(core_kernel_text((unsigned long)dst)))
+   return -EINVAL;
+
+   mutex_lock(&text_mutex);
+   ret = bpf_patch_instructions(dst, &insn, len, true);
+   mutex_unlock(&text_mutex);
+
+   return IS_ERR(ret) ? PTR_ERR(ret) : 0;
+}
-- 
2.41.0



[PATCH v4 1/5] powerpc/bpf: implement bpf_arch_text_copy

2023-09-08 Thread Hari Bathini
bpf_arch_text_copy is used to dump JITed binary to RX page, allowing
multiple BPF programs to share the same page. Use patch_instruction()
to implement it.
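
For reference, the generic weak fallback in kernel/bpf/core.c that this arch
hook overrides simply returns an error, roughly:

	void * __weak bpf_arch_text_copy(void *dst, void *src, size_t len)
	{
		return ERR_PTR(-ENOTSUPP);
	}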

Signed-off-by: Hari Bathini 
---
 arch/powerpc/net/bpf_jit_comp.c | 41 -
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 37043dfc1add..4f896222c579 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -13,9 +13,12 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 
+#include 
+#include 
+
 #include "bpf_jit.h"
 
 static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
@@ -23,6 +26,28 @@ static void bpf_jit_fill_ill_insns(void *area, unsigned int 
size)
memset32(area, BREAKPOINT_INSTRUCTION, size / 4);
 }
 
+/*
+ * Patch 'len' bytes of instructions from opcode to addr, one instruction
+ * at a time. Returns addr on success. ERR_PTR(-EINVAL), otherwise.
+ */
+static void *bpf_patch_instructions(void *addr, void *opcode, size_t len, bool 
fill_insn)
+{
+   while (len > 0) {
+   ppc_inst_t insn = ppc_inst_read(opcode);
+   int ilen = ppc_inst_len(insn);
+
+   if (patch_instruction(addr, insn))
+   return ERR_PTR(-EINVAL);
+
+   len -= ilen;
+   addr = addr + ilen;
+   if (!fill_insn)
+   opcode = opcode + ilen;
+   }
+
+   return addr;
+}
+
 int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int 
tmp_reg, long exit_addr)
 {
if (!exit_addr || is_offset_in_branch_range(exit_addr - (ctx->idx * 
4))) {
@@ -274,3 +299,17 @@ int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, 
int pass, struct code
ctx->exentry_idx++;
return 0;
 }
+
+void *bpf_arch_text_copy(void *dst, void *src, size_t len)
+{
+   void *ret;
+
+   if (WARN_ON_ONCE(core_kernel_text((unsigned long)dst)))
+   return ERR_PTR(-EINVAL);
+
+   mutex_lock(&text_mutex);
+   ret = bpf_patch_instructions(dst, src, len, false);
+   mutex_unlock(&text_mutex);
+
+   return ret;
+}
-- 
2.41.0



[PATCH v4 0/5] powerpc/bpf: use BPF prog pack allocator

2023-09-08 Thread Hari Bathini
Most BPF programs are small, but they consume a page each. For systems
with busy traffic and many BPF programs, this may also add significant
pressure on instruction TLB. High iTLB pressure usually slows down the
whole system causing visible performance degradation for production
workloads.

bpf_prog_pack, a customized allocator that packs multiple bpf programs
into preallocated memory chunks, was proposed [1] to address it. This
series extends this support to powerpc.

Patches 1 & 2 add the arch specific functions needed to support this
feature. Patch 3 enables the support for powerpc and ensures cleanup
is handled gracefully. Patch 4 introduces patch_instructions() that
optimizes some calls while patching more than one instruction. Patch 5
leverages this new function to improve time taken for JIT'ing BPF
programs.

Note that the first 3 patches are sufficient to enable the support
for bpf_prog_pack on powerpc. Patches 4 & 5 are to improve the JIT
compilation time of BPF programs on powerpc.

Changes in v4:
* Updated bpf_patch_instructions() definition in patch 1/5 so that
  it doesn't have to be updated again in patch 2/5.
* Addressed Christophe's comment on bpf_arch_text_invalidate() return
  value in patch 2/5.

Changes in v3:
* Fixed segfault issue observed on ppc32 due to inaccurate offset
  calculation for branching.
* Tried to minimize the performance impact for patch_instruction()
  with the introduction of patch_instructions().
* Corrected uses of u32* vs ppc_instr_t.
* Moved the change that introduces patch_instructions() to after
  enabling bpf_prog_pack support.
* Added few comments to improve code readability.

[1] https://lore.kernel.org/bpf/20220204185742.271030-1-s...@kernel.org/
[2] https://lore.kernel.org/all/20230309180028.180200-1-hbath...@linux.ibm.com/


Hari Bathini (5):
  powerpc/bpf: implement bpf_arch_text_copy
  powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack
  powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free]
  powerpc/code-patching: introduce patch_instructions()
  powerpc/bpf: use patch_instructions()

 arch/powerpc/include/asm/code-patching.h |   1 +
 arch/powerpc/lib/code-patching.c |  94 ---
 arch/powerpc/net/bpf_jit.h   |  12 +-
 arch/powerpc/net/bpf_jit_comp.c  | 144 ++-
 arch/powerpc/net/bpf_jit_comp32.c|  13 +-
 arch/powerpc/net/bpf_jit_comp64.c|  10 +-
 6 files changed, 211 insertions(+), 63 deletions(-)

-- 
2.41.0



Re: [PATCH v2 1/3] powerpc/fadump: make is_fadump_active() visible for exporting vmcore

2023-09-07 Thread Hari Bathini

Thanks, Baoquan.

On 07/09/23 11:12 am, Baoquan He wrote:

On 09/06/23 at 12:06am, Hari Bathini wrote:

Include asm/fadump.h in asm/kexec.h to make it visible while exporting
vmcore. Also, update is_fadump_active() to return boolean instead of
integer for better readability. The change will be used in the next
patch to ensure vmcore is exported when fadump is active.

Signed-off-by: Hari Bathini 


Thanks, Hari. The whole series looks good to me.

Acked-by: Baoquan He 

Since it's a power specific change, should be picked into powerpc tree?


Michael, would you mind taking the series via powerpc tree..

Thanks
Hari


---

Changes in v2:
* New patch based on Baoquan's suggestion to use is_fadump_active()
   instead of introducing new function is_crashdump_kernel().


  arch/powerpc/include/asm/fadump.h | 4 ++--
  arch/powerpc/include/asm/kexec.h  | 8 ++--
  arch/powerpc/kernel/fadump.c  | 4 ++--
  3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h 
b/arch/powerpc/include/asm/fadump.h
index 526a6a647312..27b74a7e2162 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -15,13 +15,13 @@ extern int crashing_cpu;
  
  extern int is_fadump_memory_area(u64 addr, ulong size);

  extern int setup_fadump(void);
-extern int is_fadump_active(void);
+extern bool is_fadump_active(void);
  extern int should_fadump_crash(void);
  extern void crash_fadump(struct pt_regs *, const char *);
  extern void fadump_cleanup(void);
  
  #else	/* CONFIG_FA_DUMP */

-static inline int is_fadump_active(void) { return 0; }
+static inline bool is_fadump_active(void) { return false; }
  static inline int should_fadump_crash(void) { return 0; }
  static inline void crash_fadump(struct pt_regs *regs, const char *str) { }
  static inline void fadump_cleanup(void) { }
diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index a1ddba01e7d1..b760ef459234 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -51,6 +51,7 @@
  
  #ifndef __ASSEMBLY__

  #include 
+#include 
  
  typedef void (*crash_shutdown_t)(void);
  
@@ -99,10 +100,13 @@ void relocate_new_kernel(unsigned long indirection_page, unsigned long reboot_co
  
  void kexec_copy_flush(struct kimage *image);
  
-#if defined(CONFIG_CRASH_DUMP) && defined(CONFIG_PPC_RTAS)

+#if defined(CONFIG_CRASH_DUMP)
+#define is_fadump_active   is_fadump_active
+#if defined(CONFIG_PPC_RTAS)
  void crash_free_reserved_phys_range(unsigned long begin, unsigned long end);
  #define crash_free_reserved_phys_range crash_free_reserved_phys_range
-#endif
+#endif /* CONFIG_PPC_RTAS */
+#endif /* CONFIG_CRASH_DUMP */
  
  #ifdef CONFIG_KEXEC_FILE

  extern const struct kexec_file_ops kexec_elf64_ops;
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 3ff2da7b120b..5682a65e8326 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -187,9 +187,9 @@ int should_fadump_crash(void)
return 1;
  }
  
-int is_fadump_active(void)

+bool is_fadump_active(void)
  {
-   return fw_dump.dump_active;
+   return !!fw_dump.dump_active;
  }
  
  /*

--
2.41.0





[PATCH v2 2/3] vmcore: allow fadump to export vmcore even if is_kdump_kernel() is false

2023-09-05 Thread Hari Bathini
Currently, is_kdump_kernel() returns true when elfcorehdr_addr is set.
While elfcorehdr_addr is set for kexec based kernel dump mechanism,
alternate dump capturing methods like fadump [1] also set it to export
the vmcore. Since is_kdump_kernel() is used to restrict resources in
crash dump capture kernel and such restrictions are not desirable for
fadump, allow is_kdump_kernel() to be defined differently for fadump
case. With that change, include is_fadump_active() check in functions
is_vmcore_usable() & vmcore_unusable() to be able to export vmcore for
fadump case too.

[1] https://docs.kernel.org/powerpc/firmware-assisted-dump.html

Signed-off-by: Hari Bathini 
---

Changes in v2:
* is_fadump_active() check added to is_vmcore_usable() as suggested
  by Baoquan.


 include/linux/crash_dump.h | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index 0f3a656293b0..de8a9fabfb6f 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -50,6 +50,7 @@ void vmcore_cleanup(void);
 #define vmcore_elf64_check_arch(x) (elf_check_arch(x) || 
vmcore_elf_check_arch_cross(x))
 #endif
 
+#ifndef is_kdump_kernel
 /*
  * is_kdump_kernel() checks whether this kernel is booting after a panic of
  * previous kernel or not. This is determined by checking if previous kernel
@@ -64,6 +65,19 @@ static inline bool is_kdump_kernel(void)
 {
return elfcorehdr_addr != ELFCORE_ADDR_MAX;
 }
+#endif
+
+#ifndef is_fadump_active
+/*
+ * If f/w assisted dump capturing mechanism (fadump), instead of kexec based
+ * dump capturing mechanism (kdump) is exporting the vmcore, then this function
+ * will be defined in arch specific code to return true, when appropriate.
+ */
+static inline bool is_fadump_active(void)
+{
+   return false;
+}
+#endif
 
 /* is_vmcore_usable() checks if the kernel is booting after a panic and
  * the vmcore region is usable.
@@ -75,7 +89,8 @@ static inline bool is_kdump_kernel(void)
 
 static inline int is_vmcore_usable(void)
 {
-   return is_kdump_kernel() && elfcorehdr_addr != ELFCORE_ADDR_ERR ? 1 : 0;
+   return (is_kdump_kernel() || is_fadump_active())
+   && elfcorehdr_addr != ELFCORE_ADDR_ERR ? 1 : 0;
 }
 
 /* vmcore_unusable() marks the vmcore as unusable,
@@ -84,7 +99,7 @@ static inline int is_vmcore_usable(void)
 
 static inline void vmcore_unusable(void)
 {
-   if (is_kdump_kernel())
+   if (is_kdump_kernel() || is_fadump_active())
elfcorehdr_addr = ELFCORE_ADDR_ERR;
 }
 
-- 
2.41.0



Re: [PATCH 1/2] vmcore: allow alternate dump capturing methods to export vmcore without is_kdump_kernel()

2023-09-05 Thread Hari Bathini




On 05/09/23 8:00 am, Baoquan He wrote:

On 09/04/23 at 08:04pm, Hari Bathini wrote:

Hi Baoquan,

Thanks for the review...

On 03/09/23 9:06 am, Baoquan He wrote:

Hi Hari,

On 09/02/23 at 12:34am, Hari Bathini wrote:

Currently, is_kdump_kernel() returns true when elfcorehdr_addr is set.
While elfcorehdr_addr is set for kexec based kernel dump mechanism,
alternate dump capturing methods like fadump [1] also set it to export
the vmcore. is_kdump_kernel() is used to restrict resources in crash
dump capture kernel but such restrictions may not be desirable for
fadump. Allow is_kdump_kernel() to be defined differently for such
scenarios. With this, is_kdump_kernel() could be false while vmcore
is usable. So, introduce is_crashdump_kernel() to return true when
elfcorehdr_addr is set and use it for vmcore related checks.


I got what is done in these two patches, but didn't get why they need be
done. vmcore_unusable()/is_vmcore_usable() are only unitilized in ia64.
Why do you care if it's is_crashdump_kernel() or is_kdump_kernel()?
If you want to override the generic is_kdump_kernel() with powerpc's own
is_kdump_kernel(), your below change is enough to allow you to do that.
I can't see why is_crashdump_kernel() is needed. Could you explain that
specifically?


You mean to just remove is_kdump_kernel() check in is_vmcore_usable() &
vmcore_unusable() functions? I introduced a generic is_crashdump_kernel()
function instead, which returns true for any dump capturing method,
irrespective of whether is_kdump_kernel() returns true or false.
For fadump case, is_kdump_kernel() will return false after patch 2/2.


OK, I could understand what you want to achieve. You want to make
is_kdump_kernel() only return true for kdump, while is_vmcore_usable()
returns true for both kdump and fadump.

IIUC, can we change as below? It could make code clearer and more
straightforward. I don't think adding another is_crashdump_kernel()
is a good idea; that would be a torture for non-powerpc people reading the
code when they need to differentiate between kdump and crashdump.



Sure, Baoquan.
Posted v2 based on your suggestion.

Thanks
Hari


[PATCH v2 1/3] powerpc/fadump: make is_fadump_active() visible for exporting vmcore

2023-09-05 Thread Hari Bathini
Include asm/fadump.h in asm/kexec.h to make it visible while exporting
vmcore. Also, update is_fadump_active() to return boolean instead of
integer for better readability. The change will be used in the next
patch to ensure vmcore is exported when fadump is active.

Signed-off-by: Hari Bathini 
---

Changes in v2:
* New patch based on Baoquan's suggestion to use is_fadump_active()
  instead of introducing new function is_crashdump_kernel().


 arch/powerpc/include/asm/fadump.h | 4 ++--
 arch/powerpc/include/asm/kexec.h  | 8 ++--
 arch/powerpc/kernel/fadump.c  | 4 ++--
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h 
b/arch/powerpc/include/asm/fadump.h
index 526a6a647312..27b74a7e2162 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -15,13 +15,13 @@ extern int crashing_cpu;
 
 extern int is_fadump_memory_area(u64 addr, ulong size);
 extern int setup_fadump(void);
-extern int is_fadump_active(void);
+extern bool is_fadump_active(void);
 extern int should_fadump_crash(void);
 extern void crash_fadump(struct pt_regs *, const char *);
 extern void fadump_cleanup(void);
 
 #else  /* CONFIG_FA_DUMP */
-static inline int is_fadump_active(void) { return 0; }
+static inline bool is_fadump_active(void) { return false; }
 static inline int should_fadump_crash(void) { return 0; }
 static inline void crash_fadump(struct pt_regs *regs, const char *str) { }
 static inline void fadump_cleanup(void) { }
diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index a1ddba01e7d1..b760ef459234 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -51,6 +51,7 @@
 
 #ifndef __ASSEMBLY__
 #include 
+#include 
 
 typedef void (*crash_shutdown_t)(void);
 
@@ -99,10 +100,13 @@ void relocate_new_kernel(unsigned long indirection_page, 
unsigned long reboot_co
 
 void kexec_copy_flush(struct kimage *image);
 
-#if defined(CONFIG_CRASH_DUMP) && defined(CONFIG_PPC_RTAS)
+#if defined(CONFIG_CRASH_DUMP)
+#define is_fadump_active   is_fadump_active
+#if defined(CONFIG_PPC_RTAS)
 void crash_free_reserved_phys_range(unsigned long begin, unsigned long end);
 #define crash_free_reserved_phys_range crash_free_reserved_phys_range
-#endif
+#endif /* CONFIG_PPC_RTAS */
+#endif /* CONFIG_CRASH_DUMP */
 
 #ifdef CONFIG_KEXEC_FILE
 extern const struct kexec_file_ops kexec_elf64_ops;
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 3ff2da7b120b..5682a65e8326 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -187,9 +187,9 @@ int should_fadump_crash(void)
return 1;
 }
 
-int is_fadump_active(void)
+bool is_fadump_active(void)
 {
-   return fw_dump.dump_active;
+   return !!fw_dump.dump_active;
 }
 
 /*
-- 
2.41.0


