[PATCH] powerpc/xmon: Fix warning comparing pointer to 0

2022-03-23 Thread Haowen Bai
Avoid pointer type value compared with 0 to make code clear.

Signed-off-by: Haowen Bai 
---
 arch/powerpc/xmon/spu-dis.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/xmon/spu-dis.c b/arch/powerpc/xmon/spu-dis.c
index 4b0a4e6..6078aa5 100644
--- a/arch/powerpc/xmon/spu-dis.c
+++ b/arch/powerpc/xmon/spu-dis.c
@@ -34,7 +34,7 @@ init_spu_disassemble (void)
   int o = spu_opcodes[i].opcode;
   if (o >= SPU_DISASM_TBL_SIZE)
continue; /* abort (); */
-  if (spu_disassemble_table[o] == 0)
+  if (!spu_disassemble_table[o])
spu_disassemble_table[o] = _opcodes[i];
 }
 }
@@ -48,30 +48,30 @@ get_index_for_opcode (unsigned int insn)
 
   /* Init the table.  This assumes that element 0/opcode 0 (currently
* NOP) is always used */
-  if (spu_disassemble_table[0] == 0)
+  if (!spu_disassemble_table[0])
 init_spu_disassemble ();
 
-  if ((index = spu_disassemble_table[opcode & 0x780]) != 0
+  if ((index = spu_disassemble_table[opcode & 0x780]) != NULL
   && index->insn_type == RRR)
 return index;
 
-  if ((index = spu_disassemble_table[opcode & 0x7f0]) != 0
+  if ((index = spu_disassemble_table[opcode & 0x7f0]) != NULL
   && (index->insn_type == RI18 || index->insn_type == LBT))
 return index;
 
-  if ((index = spu_disassemble_table[opcode & 0x7f8]) != 0
+  if ((index = spu_disassemble_table[opcode & 0x7f8]) != NULL
   && index->insn_type == RI10)
 return index;
 
-  if ((index = spu_disassemble_table[opcode & 0x7fc]) != 0
+  if ((index = spu_disassemble_table[opcode & 0x7fc]) != NULL
   && (index->insn_type == RI16))
 return index;
 
-  if ((index = spu_disassemble_table[opcode & 0x7fe]) != 0
+  if ((index = spu_disassemble_table[opcode & 0x7fe]) != NULL
   && (index->insn_type == RI8))
 return index;
 
-  if ((index = spu_disassemble_table[opcode & 0x7ff]) != 0)
+  if ((index = spu_disassemble_table[opcode & 0x7ff]) != NULL)
 return index;
 
   return NULL;
@@ -89,7 +89,7 @@ print_insn_spu (unsigned long insn, unsigned long memaddr)
 
   index = get_index_for_opcode (insn);
 
-  if (index == 0)
+  if (!index)
 {
   printf(".long 0x%lx", insn);
 }
-- 
2.7.4



Re: [RFC v3 PATCH 5/5] powerpc/crash hp: add crash hotplug support for kexec_load

2022-03-23 Thread Eric DeVolder

Some minor nits below.
eric

On 3/21/22 03:04, Sourabh Jain wrote:

The kernel changes needed to add support for crash hotplug support for
kexec_load system calls are similar to kexec_file_load (which has already
been implemented in earlier patches). Since kexec segment array is
prepared by kexec tool in the userspace, the kernel does aware of which

s/kernel does aware/ kernel is not aware/ ?


index FDT segment is present.

The only change was done to enabled crash hotplug support for kexec_load is

s/was done to enabled/to enable/ ?

updated the crash hotplug handler to identify the FDT segment from kexec

s/updated/ to update/ ?

segment array.

Signed-off-by: Sourabh Jain 
---
  arch/powerpc/kexec/core_64.c | 21 -
  1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index a470fe6904e3..2c248dfb169b 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -480,7 +480,9 @@ int update_cpus_node(void *fdt)
  void arch_crash_hotplug_handler(struct kimage *image, unsigned int hp_action,
unsigned long a, unsigned long b)
  {
-   void *fdt;
+   void *fdt, *ptr;
+   unsigned int n;
+   unsigned long mem, memsz;
  
  	/* No action needed for CPU hot-unplug */

if (hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
@@ -492,6 +494,23 @@ void arch_crash_hotplug_handler(struct kimage *image, 
unsigned int hp_action,
return;
}
  
+	/* Sine kexec segments for kexec_load system call is prepred by

s/Sine/Since/
s/is prepred/are prepared/

+* kexec tool in userspace we need loop through all the segments
+* to find out segment index corresponds FDT segment. In case of
+* kexec_file_load it is discovered during the load itself.
+*/
+   if (!image->arch.fdt_index_valid) {
+   for (n = 0; n < image->nr_segments; n++) {
+   mem = image->segment[n].mem;
+   memsz = image->segment[n].memsz;
+   ptr = __va(mem);
+   if (ptr && fdt_magic(ptr) == FDT_MAGIC) {
+   image->arch.fdt_index = n;
+   image->arch.fdt_index_valid = true;

How about adding a break; statement to early exit?

+   }
+   }
+   }
+
/* Must have valid FDT index */
if (!image->arch.fdt_index_valid) {
pr_err("crash hp: unable to locate FDT segment");



Re: [RFC v3 PATCH 4/5] powerpc/crash hp: add crash hotplug support for kexec_file_load

2022-03-23 Thread Eric DeVolder

Notes below.
eric

On 3/21/22 03:04, Sourabh Jain wrote:

Two major changes are done to enable the crash CPU hotplug handler.
Firstly, updated the kexec_load path to prepare kimage for hotplug
changes and secondly, implemented the crash hotplug handler itself.

On the kexec load path, memsz allocation of crash FDT segment is
updated to ensure that it has sufficient buffer space to accommodate
future hot add CPUs. Initialized the kimage members to track the FDT
segment.

The crash hotplug handler updates the cpus node of crash FDT. While
we update crash FDT the kexec_crash_image is marked invalid and restored
after FDT update to avoid race.

Since memory crash hotplug support is not there yet the crash hotplug
handler simply warn the user and return.

Signed-off-by: Sourabh Jain 
---
  arch/powerpc/kexec/core_64.c | 46 
  arch/powerpc/kexec/elf_64.c  | 40 +++
  2 files changed, 86 insertions(+)

diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 249d2632526d..a470fe6904e3 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -466,6 +466,52 @@ int update_cpus_node(void *fdt)
return ret;
  }
  
+#ifdef CONFIG_CRASH_HOTPLUG

+/**
+ * arch_crash_hotplug_handler() - Handle hotplug FDT changes
+ * @image: the active struct kimage
+ * @hp_action: the hot un/plug action being handled
+ * @a: first parameter dependent upon hp_action
+ * @b: first parameter dependent upon hp_action
+ *
+ * To accurately reflect CPU hot un/plug changes, the FDT
+ * must be updated with the new list of CPUs and memories.
+ */
+void arch_crash_hotplug_handler(struct kimage *image, unsigned int hp_action,
+   unsigned long a, unsigned long b)
+{
+   void *fdt;
+
+   /* No action needed for CPU hot-unplug */
+   if (hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
+   return;

Just curious why no action is needed on cpu remove?


+
+   /* crash update on memory hotplug is not support yet */
+   if (hp_action == KEXEC_CRASH_HP_REMOVE_MEMORY || hp_action == 
KEXEC_CRASH_HP_ADD_MEMORY) {
+   pr_err("crash hp: crash update is not supported with memory 
hotplug\n");
+   return;
+   }
+
+   /* Must have valid FDT index */
+   if (!image->arch.fdt_index_valid) {
+   pr_err("crash hp: unable to locate FDT segment");
+   return;
+   }
+
+   fdt = __va((void *)image->segment[image->arch.fdt_index].mem);
+
+   /* Temporarily invalidate the crash image while it is replaced */
+   xchg(_crash_image, NULL);
+
+   /* update FDT to refelect changes to CPU resrouces */
+   if (update_cpus_node(fdt))
+   pr_err("crash hp: failed to update crash FDT");
+
+   /* The crash image is now valid once again */
+   xchg(_crash_image, image);
+}
+#endif /* CONFIG_CRASH_HOTPLUG */
+
  #ifdef CONFIG_PPC_64S_HASH_MMU
  /* Values we need to export to the second kernel via the device tree. */
  static unsigned long htab_base;
diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
index eeb258002d1e..2ffe6d69e186 100644
--- a/arch/powerpc/kexec/elf_64.c
+++ b/arch/powerpc/kexec/elf_64.c
@@ -24,6 +24,33 @@
  #include 
  #include 
  
+

+#ifdef CONFIG_CRASH_HOTPLUG
+#define MAX_CORE 256

Is there a better config option to tie this value too?

+#define PER_CORE_NODE_SIZE 1500
+
+/**
+ * get_crash_fdt_mem_sz() - calcuate mem size for crash kernel FDT
+ * @fdt: pointer to crash kernel FDT
+ *
+ * Calculate the buffer space needed to add more CPU nodes in FDT after
+ * capture kenrel load due to hot add events.
+ *
+ * Some assumption are made to calculate the additional buffer size needed
+ * to accommodate future hot add CPUs to the crash FDT. The maximum core count
+ * in the system would not go beyond MAX_CORE and memory needed to store per 
core
+ * date in FDT is PER_CORE_NODE_SIZE.
+ *
+ * Certainly MAX_CORE count can be replaced with possible core count and
+ * PER_CORE_NODE_SIZE to some standard value instead of randomly observed
+ * core size value on Power9 LPAR.
+ */
+static unsigned int get_crash_fdt_mem_sz(void *fdt)
+{
+   return fdt_totalsize(fdt) + (PER_CORE_NODE_SIZE * MAX_CORE);
+}
+#endif
+
  static void *elf64_load(struct kimage *image, char *kernel_buf,
unsigned long kernel_len, char *initrd,
unsigned long initrd_len, char *cmdline,
@@ -123,6 +150,19 @@ static void *elf64_load(struct kimage *image, char 
*kernel_buf,
kbuf.buf_align = PAGE_SIZE;
kbuf.top_down = true;
kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
+
+#ifdef CONFIG_CRASH_HOTPLUG
+   if (image->type == KEXEC_TYPE_CRASH) {
+   kbuf.memsz = get_crash_fdt_mem_sz(fdt);
+   fdt_set_totalsize(fdt, kbuf.memsz);
+   image->arch.fdt_index = image->nr_segments;
+   image->arch.fdt_index_valid = 

Re: [RFC v3 PATCH 0/5] In kernel handling of CPU hotplug events for crash kernel

2022-03-23 Thread Eric DeVolder




On 3/21/22 03:04, Sourabh Jain wrote:

This patch series implements the crash hotplug handler on PowerPC introduced
by https://lkml.org/lkml/2022/3/3/674 patch series.


The Problem:

Post hotplug/DLPAR events the capture kernel holds stale information about the
system. Dump collection with stale capture kernel might end up in dump capture
failure or an inaccurate dump collection.


Existing solution:
==
The existing solution to keep the capture kernel up-to-date is observe the
hotplug event via udev rule and trigger a full capture kernel reload post
hotplug event.

Shortcomings:

- Leaves a window where kernel crash might not lead to successful dump
   collection.
- Reloading all kexec components for each hotplug is inefficient. Since only
   one or two kexec components need to be updated due to hotplug event reloading
   all kexec component is redundant.
- udev rules are prone to races if hotplug events are frequent.

More about issues with an existing solution is posted here:
  - https://lkml.org/lkml/2020/12/14/532
  - https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-February/240254.html

Proposed Solution:
==
Instead of reloading all kexec segments on hotplug event, this patch series
focuses on updating only the relevant kexec segment. Once the kexec
segments are loaded in the kernel reserved area then an arch-specific hotplug 
handler
will update the relevant kexec segment based on hotplug event type.

As mentioned above this patch series implemented a PowerPC crash hotplug
handler for the CPU. The crash hotplug handler memory is in our TODO list.


A couple of minor changes are required to realize the benefit of the patch
series:

- disalble the udev rule:

   comment out the below line in kdump udev rule file:

fwiw, this will need to be conditionalized on arch, ie to skip for ppc64. I'm 
doing the same for x86_64.

   RHEL: /usr/lib/udev/rules.d/98-kexec.rules
   # SUBSYSTEM=="cpu", ACTION=="online", GOTO="kdump_reload_cpu"

- kexec tool needs to be updated with patch for kexec_load system call
   to work (not needed if -s option is used during kexec panic load):

---
diff --git a/kexec/arch/ppc64/kexec-elf-ppc64.c 
b/kexec/arch/ppc64/kexec-elf-ppc64.c
index 695b8b0..1dc6490 100644
--- a/kexec/arch/ppc64/kexec-elf-ppc64.c
+++ b/kexec/arch/ppc64/kexec-elf-ppc64.c
@@ -45,6 +45,29 @@ uint64_t initrd_base, initrd_size;
  unsigned char reuse_initrd = 0;
  const char *ramdisk;
  
+#define MAX_CORE 256

+#define PER_CORE_NODE_SIZE 1500
+
+/**
+ * get_crash_fdt_mem_sz() - calcuate mem size for crash kernel FDT
+ * @fdt: pointer to crash kernel FDT
+ *
+ * Calculate the buffer space needed to add more CPU nodes in FDT after
+ * capture kenrel load due to hot add events.
+ *
+ * Some assumption are made to calculate the additional buffer size needed
+ * to accommodate future hot add CPUs to the crash FDT. The maximum core count
+ * in the system would not go beyond MAX_CORE and memory needed to store per 
core
+ * date in FDT is PER_CORE_NODE_SIZE.
+ *
+ * Certainly MAX_CORE count can be replaced with possible core count and
+ * PER_CORE_NODE_SIZE to some standard value instead of randomly observed
+ * core size value on Power9 LPAR.
+ */
+static unsigned int get_crash_fdt_mem_sz(void *fdt) {
+   return fdt_totalsize(fdt) + (PER_CORE_NODE_SIZE * MAX_CORE);
+}
+
  int elf_ppc64_probe(const char *buf, off_t len)
  {
struct mem_ehdr ehdr;
@@ -179,6 +202,7 @@ int elf_ppc64_load(int argc, char **argv, const char *buf, 
off_t len,
uint64_t max_addr, hole_addr;
char *seg_buf = NULL;
off_t seg_size = 0;
+   unsigned int mem_sz = 0;
struct mem_phdr *phdr;
size_t size;
  #ifdef NEED_RESERVE_DTB
@@ -329,7 +353,13 @@ int elf_ppc64_load(int argc, char **argv, const char *buf, 
off_t len,
if (result < 0)
return result;
  
-	my_dt_offset = add_buffer(info, seg_buf, seg_size, seg_size,

+   if (info->kexec_flags & KEXEC_ON_CRASH) {
+   mem_sz = get_crash_fdt_mem_sz((void *)seg_buf);
+   fdt_set_totalsize(seg_buf, mem_sz);
+   info->fdt_index = info->nr_segments;
+   }
+
+   my_dt_offset = add_buffer(info, seg_buf, seg_size, mem_sz,
0, 0, max_addr, -1);
  
  #ifdef NEED_RESERVE_DTB

diff --git a/kexec/kexec.c b/kexec/kexec.c
index f63b36b..846b1a8 100644
--- a/kexec/kexec.c
+++ b/kexec/kexec.c
@@ -672,6 +672,9 @@ static void update_purgatory(struct kexec_info *info)
if (info->segment[i].mem == (void *)info->rhdr.rel_addr) {
continue;
}
+   if (info->fdt_index == i)
+   continue;
+
sha256_update(, info->segment[i].buf,
  info->segment[i].bufsz);
nullsz = info->segment[i].memsz - info->segment[i].bufsz;
diff --git 

Re: [RFC v3 PATCH 3/5] powrepc/crash hp: update kimage struct

2022-03-23 Thread Eric DeVolder




On 3/21/22 03:04, Sourabh Jain wrote:

Two new members fdt_index and fdt_index_valid are added in kimage_arch
struct to track the FDT kexec segment. These new members of kimage_arch
struct will help the crash hotplug handler to easily access the FDT
segment from the kexec segment array. Otherwise, we have to loop through
all kexec segments to find the FDT segments.

Signed-off-by: Sourabh Jain 



---
  arch/powerpc/include/asm/kexec.h | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index e1288826e22e..19c2cab6a880 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -104,6 +104,8 @@ extern const struct kexec_file_ops kexec_elf64_ops;
  struct kimage_arch {
struct crash_mem *exclude_ranges;
  
+	int fdt_index;

+   bool fdt_index_valid;
unsigned long backup_start;
void *backup_buf;
void *fdt;



Question, for the kexec_file_load scenario, is there a need to have the fdt_index segment excluded 
by kexec_calculate_store_digests() ?


Re: [RFC v3 PATCH 2/5] powerpc/crash hp: introduce a new config option CRASH_HOTPLUG

2022-03-23 Thread Eric DeVolder




On 3/21/22 03:04, Sourabh Jain wrote:

The option CRASH_HOTPLUG enables, in kernel update to kexec segments on
hotplug events.

All the updates needed on the capture kernel load path in the kernel for
both kexec_load and kexec_file_load system will be kept under this config.

Signed-off-by: Sourabh Jain 


Reviewed-by: Eric DeVolder 


---
  arch/powerpc/Kconfig | 11 +++
  1 file changed, 11 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index b779603978e1..b816339ef8c7 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -623,6 +623,17 @@ config FA_DUMP
  If unsure, say "y". Only special kernels like petitboot may
  need to say "N" here.
  
+config CRASH_HOTPLUG

+   bool "kernel updates of crash kexec segments"
+   depends on CRASH_DUMP && (HOTPLUG_CPU) && KEXEC_FILE
+   help
+ An efficient way to keep the capture kernel up-to-date with CPU
+ hotplug events. On hotplug event (CPU/memory) the kexec segments
+ of capture kernel becomes stale and need to be updated with latest
+ CPU and memory regions. In this method the kernel performs minimal
+ update to only relevant kexec segments on CPU hotplug event, instead
+ of triggering full capture reload from userspace using udev rule.
+
  config PRESERVE_FA_DUMP
bool "Preserve Firmware-assisted dump"
depends on PPC64 && PPC_POWERNV && !FA_DUMP



Re: [PATCH] powerpc/powernv: Get more flushing requirements from device-tree

2022-03-23 Thread Murilo Opsfelder Araújo

Hi, Russell.

I think this patch could have been split in half with their corresponding 
Fixes: tag.

This may sound nitpicking but doing this would certainly help distros doing 
their backports.

More comments below.

On 3/22/22 04:47, Russell Currey wrote:

The device-tree properties no-need-l1d-flush-msr-pr-1-to-0,
no-need-l1d-flush-kernel-on-user-access and
no-need-store-drain-on-priv-state-switch are the equivalents of
H_CPU_BEHAV_NO_L1D_FLUSH_ENTRY, H_CPU_BEHAV_NO_L1D_FLUSH_UACCESS
and H_CPU_BEHAV_NO_STF_BARRIER from the H_GET_CPU_CHARACTERISTICS
hcall on pseries, respectively.

Since commit 84ed26fd00c5 ("powerpc/security: Add a security feature for
STF barrier") powernv systems with this device-tree property have been
enabling the STF barrier when they have no need for it.  This patch
fixes this by clearing the STF barrier feature on those systems.

In commit d02fa40d759f ("powerpc/powernv: Remove POWER9 PVR version
check for entry and uaccess flushes") the condition for disabling the
L1D flush on kernel entry and user access was changed from any non-P9
CPU to only checking P7 and P8.  Without the appropriate device-tree
checks for newer processors on powernv, these flushes are unnecessarily
enabled on those systems.  This patch fixes that too.

Reported-by: Joel Stanley 
Signed-off-by: Russell Currey 
---
  arch/powerpc/platforms/powernv/setup.c | 9 +
  1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/setup.c 
b/arch/powerpc/platforms/powernv/setup.c
index 105d889abd51..824c3ad7a0fa 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -96,6 +96,15 @@ static void __init init_fw_feat_flags(struct device_node *np)
  
  	if (fw_feature_is("disabled", "needs-spec-barrier-for-bound-checks", np))

security_ftr_clear(SEC_FTR_BNDS_CHK_SPEC_BAR);
+
+   if (fw_feature_is("enabled", "no-need-l1d-flush-msr-pr-1-to-0", np))
+   security_ftr_clear(SEC_FTR_L1D_FLUSH_ENTRY);
+
+   if (fw_feature_is("enabled", "no-need-l1d-flush-kernel-on-user-access", 
np))
+   security_ftr_clear(SEC_FTR_L1D_FLUSH_UACCESS);
+


This first diff in one patch with:

Fixes: d02fa40d759f (powerpc/powernv: Remove POWER9 PVR version check for entry 
and uaccess flushes)


+   if (fw_feature_is("enabled", 
"no-need-store-drain-on-priv-state-switch", np))
+   security_ftr_clear(SEC_FTR_STF_BARRIER);


And this second diff in another one with:

Fixes: 84ed26fd00c5 (powerpc/security: Add a security feature for STF barrier)

And commit messages could be updated for both commits accordingly.


  }
  
  static void __init pnv_setup_security_mitigations(void)


Cheers!

--
Murilo


Re: False positive kmemleak report for dtb properties names on powerpc

2022-03-23 Thread Mike Rapoport
Hi Catalin,

On Wed, Mar 23, 2022 at 05:22:38PM +, Catalin Marinas wrote:
> Hi Ariel,
> 
> On Fri, Feb 18, 2022 at 09:45:51PM +0200, Ariel Marcovitch wrote:
> > I was running a powerpc 32bit kernel (built using
> > qemu_ppc_mpc8544ds_defconfig
> > buildroot config, with enabling DEBUGFS+KMEMLEAK+HIGHMEM in the kernel
> > config)
> > on qemu and invoked the kmemleak scan (twice. for some reason the first time
> > wasn't enough).
> [...]
> > I got 97 leak reports, all similar to the following:
> [...]
> > memblock_alloc lets kmemleak know about the allocated memory using
> > kmemleak_alloc_phys (in mm/memblock.c:memblock_alloc_range_nid()).
> > 
> > The problem is with the following code (mm/kmemleak.c):
> > 
> > ```c
> > 
> > void __ref kmemleak_alloc_phys(phys_addr_t phys, size_t size, int min_count,
> >    gfp_t gfp)
> > {
> >     if (!IS_ENABLED(CONFIG_HIGHMEM) || PHYS_PFN(phys) < max_low_pfn)
> >     kmemleak_alloc(__va(phys), size, min_count, gfp);
> > }
> > 
> > ```
> > 
> > When CONFIG_HIGHMEM is enabled, the pfn of the allocated memory is checked
> > against max_low_pfn, to make sure it is not in the HIGHMEM zone.
> > 
> > However, when called through unflatten_device_tree(), max_low_pfn is not yet
> > initialized in powerpc.
> > 
> > max_low_pfn is initialized (when NUMA is disabled) in
> > arch/powerpc/mm/mem.c:mem_topology_setup() which is called only after
> > unflatten_device_tree() is called in the same function (setup_arch()).
> > 
> > Because max_low_pfn is global it is 0 before initialization, so as far as
> > kmemleak_alloc_phys() is concerned, every memory is HIGHMEM (: and the
> > allocated memory is not tracked by kmemleak, causing references to objects
> > allocated later with kmalloc() to be ignored and these objects are marked as
> > leaked.
> 
> Thanks for digging into this. It looks like the logic doesn't work (not
> sure whether it ever worked).
> 
> > I actually tried to find out whether this happen on other arches as well,
> > and it seems like arm64 also have this problem when dtb is used instead of
> > acpi, although I haven't had the chance to confirm this.
> 
> arm64 doesn't enable CONFIG_HIGHMEM, so it's not affected.
> 
> > I don't suppose I can just shuffle the calls in setup_arch() around, so I
> > wanted to hear your opinions first
> 
> I think it's better if we change the logic than shuffling the calls.
> IIUC MEMBLOCK_ALLOC_ACCESSIBLE means that __va() works on the phys
> address return by memblock, so something like below (untested):

MEMBLOCK_ALLOC_ACCESSIBLE means "anywhere", see commit e63075a3c937
("memblock: Introduce default allocation limit and use it to replace
explicit ones"), so it won't help to detect high memory.

If I remember correctly, ppc initializes memblock *very* early, so setting
max_low_pfn along with lowmem_end_addr in
arch/powerpc/mm/init_32::MMU_init() makes sense to me.

Maybe ppc folks have other ideas...
I've added Christophe who works on ppc32 these days.
 
> -8<--
> diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> index 7580baa76af1..f3599e857c13 100644
> --- a/mm/kmemleak.c
> +++ b/mm/kmemleak.c
> @@ -1127,8 +1127,7 @@ EXPORT_SYMBOL(kmemleak_no_scan);
>  void __ref kmemleak_alloc_phys(phys_addr_t phys, size_t size, int min_count,
>  gfp_t gfp)
>  {
> - if (!IS_ENABLED(CONFIG_HIGHMEM) || PHYS_PFN(phys) < max_low_pfn)
> - kmemleak_alloc(__va(phys), size, min_count, gfp);
> + kmemleak_alloc(__va(phys), size, min_count, gfp);
>  }
>  EXPORT_SYMBOL(kmemleak_alloc_phys);
>  
> diff --git a/mm/memblock.c b/mm/memblock.c
> index b12a364f2766..2515bd4331e8 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1397,7 +1397,7 @@ phys_addr_t __init memblock_alloc_range_nid(phys_addr_t 
> size,
>* Skip kmemleak for those places like kasan_init() and
>* early_pgtable_alloc() due to high volume.
>*/
> - if (end != MEMBLOCK_ALLOC_NOLEAKTRACE)
> + if (end == MEMBLOCK_ALLOC_ACCESSIBLE)

This change would enable kmemleak for KASAN on arm and arm64 that AFAIR
caused OOM in kmemleak and it also will limit tracking only to allocations
that do not specify 'end' explicitly ;-) 

>   /*
>* The min_count is set to 0 so that memblock allocated
>* blocks are never reported as leaks. This is because many
> -8<--
> 
> But I'm not sure whether we'd now miss some genuine allocations where
> the memblock limit is different from MEMBLOCK_ALLOC_ACCESSIBLE but still
> within the lowmem limit. If the above works, we can probably get rid of
> some other kmemleak callbacks in the kernel.
> 
> Adding Mike for any input on memblock.
> 
> -- 
> Catalin

-- 
Sincerely yours,
Mike.


Re: [RFC PATCH 1/3] objtool: Move common code to utils.c

2022-03-23 Thread Miroslav Benes
> +#define sym_for_each_insn(file, sym, insn)  \
> + for (insn = find_insn(file, sym->sec, sym->offset); \
> +  insn && >list != >insn_list && \
> + insn->sec == sym->sec &&\
> + insn->offset < sym->offset + sym->len;  \
> +  insn = list_next_entry(insn, list))
> +
> +#endif /* UTILS_H */

Since you include  in check.c, you can remove the 
definition of sym_for_each_insn() macro from check.c as well.

I wonder if it would make sense to move all these helper functions to 
utils.c and utils.h. Might be connected to what Josh wrote about his work 
on objtool interface.

Regards
Miroslav


[PATCH v1 1/1] powerpc/83xx/mpc8349emitx: Get rid of of_node assignment

2022-03-23 Thread Andy Shevchenko
Let GPIO library to assign of_node from the parent device.
This allows to move GPIO library and drivers to use fwnode
APIs instead of being stuck with OF-only interfaces.

Signed-off-by: Andy Shevchenko 
---
 arch/powerpc/platforms/83xx/mcu_mpc8349emitx.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/83xx/mcu_mpc8349emitx.c 
b/arch/powerpc/platforms/83xx/mcu_mpc8349emitx.c
index a38372f9ac12..26b502773b3f 100644
--- a/arch/powerpc/platforms/83xx/mcu_mpc8349emitx.c
+++ b/arch/powerpc/platforms/83xx/mcu_mpc8349emitx.c
@@ -8,15 +8,15 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
-#include 
-#include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -116,21 +116,17 @@ static int mcu_gpio_dir_out(struct gpio_chip *gc, 
unsigned int gpio, int val)
 
 static int mcu_gpiochip_add(struct mcu *mcu)
 {
-   struct device_node *np;
+   struct device *dev = >client->dev;
struct gpio_chip *gc = >gc;
 
-   np = of_find_compatible_node(NULL, NULL, "fsl,mcu-mpc8349emitx");
-   if (!np)
-   return -ENODEV;
-
gc->owner = THIS_MODULE;
-   gc->label = kasprintf(GFP_KERNEL, "%pOF", np);
+   gc->label = kasprintf(GFP_KERNEL, "%pfw", dev_fwnode(dev));
gc->can_sleep = 1;
gc->ngpio = MCU_NUM_GPIO;
gc->base = -1;
gc->set = mcu_gpio_set;
gc->direction_output = mcu_gpio_dir_out;
-   gc->of_node = np;
+   gc->parent = dev;
 
return gpiochip_add_data(gc, mcu);
 }
-- 
2.35.1



Re: False positive kmemleak report for dtb properties names on powerpc

2022-03-23 Thread Catalin Marinas
Hi Ariel,

On Fri, Feb 18, 2022 at 09:45:51PM +0200, Ariel Marcovitch wrote:
> I was running a powerpc 32bit kernel (built using
> qemu_ppc_mpc8544ds_defconfig
> buildroot config, with enabling DEBUGFS+KMEMLEAK+HIGHMEM in the kernel
> config)
> on qemu and invoked the kmemleak scan (twice. for some reason the first time
> wasn't enough).
[...]
> I got 97 leak reports, all similar to the following:
[...]
> memblock_alloc lets kmemleak know about the allocated memory using
> kmemleak_alloc_phys (in mm/memblock.c:memblock_alloc_range_nid()).
> 
> The problem is with the following code (mm/kmemleak.c):
> 
> ```c
> 
> void __ref kmemleak_alloc_phys(phys_addr_t phys, size_t size, int min_count,
>    gfp_t gfp)
> {
>     if (!IS_ENABLED(CONFIG_HIGHMEM) || PHYS_PFN(phys) < max_low_pfn)
>     kmemleak_alloc(__va(phys), size, min_count, gfp);
> }
> 
> ```
> 
> When CONFIG_HIGHMEM is enabled, the pfn of the allocated memory is checked
> against max_low_pfn, to make sure it is not in the HIGHMEM zone.
> 
> However, when called through unflatten_device_tree(), max_low_pfn is not yet
> initialized in powerpc.
> 
> max_low_pfn is initialized (when NUMA is disabled) in
> arch/powerpc/mm/mem.c:mem_topology_setup() which is called only after
> unflatten_device_tree() is called in the same function (setup_arch()).
> 
> Because max_low_pfn is global it is 0 before initialization, so as far as
> kmemleak_alloc_phys() is concerned, every memory is HIGHMEM (: and the
> allocated memory is not tracked by kmemleak, causing references to objects
> allocated later with kmalloc() to be ignored and these objects are marked as
> leaked.

Thanks for digging into this. It looks like the logic doesn't work (not
sure whether it ever worked).

> I actually tried to find out whether this happen on other arches as well,
> and it seems like arm64 also have this problem when dtb is used instead of
> acpi, although I haven't had the chance to confirm this.

arm64 doesn't enable CONFIG_HIGHMEM, so it's not affected.

> I don't suppose I can just shuffle the calls in setup_arch() around, so I
> wanted to hear your opinions first

I think it's better if we change the logic than shuffling the calls.
IIUC MEMBLOCK_ALLOC_ACCESSIBLE means that __va() works on the phys
address return by memblock, so something like below (untested):

-8<--
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 7580baa76af1..f3599e857c13 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -1127,8 +1127,7 @@ EXPORT_SYMBOL(kmemleak_no_scan);
 void __ref kmemleak_alloc_phys(phys_addr_t phys, size_t size, int min_count,
   gfp_t gfp)
 {
-   if (!IS_ENABLED(CONFIG_HIGHMEM) || PHYS_PFN(phys) < max_low_pfn)
-   kmemleak_alloc(__va(phys), size, min_count, gfp);
+   kmemleak_alloc(__va(phys), size, min_count, gfp);
 }
 EXPORT_SYMBOL(kmemleak_alloc_phys);
 
diff --git a/mm/memblock.c b/mm/memblock.c
index b12a364f2766..2515bd4331e8 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1397,7 +1397,7 @@ phys_addr_t __init memblock_alloc_range_nid(phys_addr_t 
size,
 * Skip kmemleak for those places like kasan_init() and
 * early_pgtable_alloc() due to high volume.
 */
-   if (end != MEMBLOCK_ALLOC_NOLEAKTRACE)
+   if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
/*
 * The min_count is set to 0 so that memblock allocated
 * blocks are never reported as leaks. This is because many
-8<--

But I'm not sure whether we'd now miss some genuine allocations where
the memblock limit is different from MEMBLOCK_ALLOC_ACCESSIBLE but still
within the lowmem limit. If the above works, we can probably get rid of
some other kmemleak callbacks in the kernel.

Adding Mike for any input on memblock.

-- 
Catalin


[PATCH] powerpc/ftrace: Remove ftrace init tramp once kernel init is complete

2022-03-23 Thread Naveen N. Rao
Stop using the ftrace trampoline for init section once kernel init is
complete.

Fixes: 67361cf8071286 ("powerpc/ftrace: Handle large kernel configs")
Cc: sta...@vger.kernel.org # v4.20+
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/include/asm/ftrace.h  |  2 ++
 arch/powerpc/kernel/trace/ftrace.c | 15 ---
 arch/powerpc/mm/mem.c  |  2 ++
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/ftrace.h 
b/arch/powerpc/include/asm/ftrace.h
index d83758acd1c7c3..d329f4ad18944d 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -110,11 +110,13 @@ static inline u8 this_cpu_get_ftrace_enabled(void)
return get_paca()->ftrace_enabled;
 }
 
+void ftrace_free_init_tramp(void);
 #else /* CONFIG_PPC64 */
 static inline void this_cpu_disable_ftrace(void) { }
 static inline void this_cpu_enable_ftrace(void) { }
 static inline void this_cpu_set_ftrace_enabled(u8 ftrace_enabled) { }
 static inline u8 this_cpu_get_ftrace_enabled(void) { return 1; }
+static inline void ftrace_free_init_tramp(void) { }
 #endif /* CONFIG_PPC64 */
 #endif /* !__ASSEMBLY__ */
 
diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index 4ee04aacf9f13c..a778f2ae1f3f50 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -306,9 +306,7 @@ static int setup_mcount_compiler_tramp(unsigned long tramp)
 
/* Is this a known long jump tramp? */
for (i = 0; i < NUM_FTRACE_TRAMPS; i++)
-   if (!ftrace_tramps[i])
-   break;
-   else if (ftrace_tramps[i] == tramp)
+   if (ftrace_tramps[i] == tramp)
return 0;
 
/* Is this a known plt tramp? */
@@ -863,6 +861,17 @@ void arch_ftrace_update_code(int command)
 
 extern unsigned int ftrace_tramp_text[], ftrace_tramp_init[];
 
+void ftrace_free_init_tramp(void)
+{
+   int i;
+
+   for (i = 0; i < NUM_FTRACE_TRAMPS && ftrace_tramps[i]; i++)
+   if (ftrace_tramps[i] == (unsigned long)ftrace_tramp_init) {
+   ftrace_tramps[i] = 0;
+   return;
+   }
+}
+
 int __init ftrace_dyn_arch_init(void)
 {
int i;
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 8e301cd8925b2b..78aafd644b777c 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -312,6 +313,7 @@ void free_initmem(void)
ppc_md.progress = ppc_printk_progress;
mark_initmem_nx();
free_initmem_default(POISON_FREE_INITMEM);
+   ftrace_free_init_tramp();
 }
 
 /*

base-commit: e8833c5edc5903f8c8c4fa3dd4f34d6b813c87c8
-- 
2.35.1



[v2 PATCH 2/2] powerpc/papr_scm: Fix build failure when

2022-03-23 Thread Kajol Jain
The following build failure occurs when CONFIG_PERF_EVENTS is not set
as generic pmu functions are not visible in that scenario.

arch/powerpc/platforms/pseries/papr_scm.c:372:35: error: ‘struct perf_event’ 
has no member named ‘attr’
 p->nvdimm_events_map[event->attr.config],
   ^~
In file included from ./include/linux/list.h:5,
 from ./include/linux/kobject.h:19,
 from ./include/linux/of.h:17,
 from arch/powerpc/platforms/pseries/papr_scm.c:5:
arch/powerpc/platforms/pseries/papr_scm.c: In function 
‘papr_scm_pmu_event_init’:
arch/powerpc/platforms/pseries/papr_scm.c:389:49: error: ‘struct perf_event’ 
has no member named ‘pmu’
  struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
 ^~
./include/linux/container_of.h:18:26: note: in definition of macro 
‘container_of’
  void *__mptr = (void *)(ptr); \
  ^~~
arch/powerpc/platforms/pseries/papr_scm.c:389:30: note: in expansion of macro 
‘to_nvdimm_pmu’
  struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
  ^
In file included from ./include/linux/bits.h:22,
 from ./include/linux/bitops.h:6,
 from ./include/linux/of.h:15,
 from arch/powerpc/platforms/pseries/papr_scm.c:5:

Fix the build issue by adding check for CONFIG_PERF_EVENTS config option
and also add stub function for papr_scm_pmu_register to handle
the CONFIG_PERF_EVENTS=n case. Also move the position of macro
"to_nvdimm_pmu" inorder to merge it in CONFIG_PERF_EVENTS=y block.

Fixes: 4c08d4bbc089 ("powerpc/papr_scm: Add perf interface support") (Commit id
based on libnvdimm-for-next tree)
Signed-off-by: Kajol Jain 
---
 arch/powerpc/platforms/pseries/papr_scm.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

---
Changelog:
v1 -> v2
- Rebase and tested the fix patch changes on top of libnvdimm-for-next branch

- Add stub function for papr_scm_pmu_register function to handle
  the CONFIG_PERF_EVENTS=n case.

- Move the position of macro "to_nvdimm_pmu" inorder to merge it in 
CONFIG_PERF_EVENTS=y
  block.

- The initial part of arch/powerpc/platforms/pseries/papr_scm.c could be moved 
to a
  headerfile, will work on this in the follow up patchset.
---
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index 4dd513d7c029..9dba9e71fde9 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -69,8 +69,6 @@
 #define PAPR_SCM_PERF_STATS_EYECATCHER __stringify(SCMSTATS)
 #define PAPR_SCM_PERF_STATS_VERSION 0x1
 
-#define to_nvdimm_pmu(_pmu)container_of(_pmu, struct nvdimm_pmu, pmu)
-
 /* Struct holding a single performance metric */
 struct papr_scm_perf_stat {
u8 stat_id[8];
@@ -346,6 +344,9 @@ static ssize_t drc_pmem_query_stats(struct papr_scm_priv *p,
return 0;
 }
 
+#ifdef CONFIG_PERF_EVENTS
+#define to_nvdimm_pmu(_pmu)container_of(_pmu, struct nvdimm_pmu, pmu)
+
 static int papr_scm_pmu_get_value(struct perf_event *event, struct device 
*dev, u64 *count)
 {
struct papr_scm_perf_stat *stat;
@@ -558,6 +559,10 @@ static void papr_scm_pmu_register(struct papr_scm_priv *p)
dev_info(>pdev->dev, "nvdimm pmu didn't register rc=%d\n", rc);
 }
 
+#else
+static void papr_scm_pmu_register(struct papr_scm_priv *p) { }
+#endif
+
 /*
  * Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
  * health information.
-- 
2.31.1



[v2 PATCH 1/2] drivers/nvdimm: Fix build failure when CONFIG_PERF_EVENTS is not set

2022-03-23 Thread Kajol Jain
The following build failure occurs when CONFIG_PERF_EVENTS is not set
as generic pmu functions are not visible in that scenario.

|-- s390-randconfig-r044-20220313
|   |-- nd_perf.c:(.text):undefined-reference-to-perf_pmu_migrate_context
|   |-- nd_perf.c:(.text):undefined-reference-to-perf_pmu_register
|   `-- nd_perf.c:(.text):undefined-reference-to-perf_pmu_unregister

Similar build failure in nds32 architecture:
nd_perf.c:(.text+0x21e): undefined reference to `perf_pmu_migrate_context'
nd_perf.c:(.text+0x434): undefined reference to `perf_pmu_register'
nd_perf.c:(.text+0x57c): undefined reference to `perf_pmu_unregister'

Fix this issue by adding check for CONFIG_PERF_EVENTS config option
and disabling the nvdimm perf interface incase this config is not set.

Also remove function declaration of perf_pmu_migrate_context,
perf_pmu_register, perf_pmu_unregister functions from nd.h as these are
common pmu functions which are part of perf_event.h and since we
are disabling nvdimm perf interface incase CONFIG_PERF_EVENTS option
is not set, we not need to declare them in nd.h

Also move the platform_device header file addition part from nd.h to
nd_perf.c and add stub functions for register_nvdimm_pmu and
unregister_nvdimm_pmu functions to handle CONFIG_PERF_EVENTS=n
case.

Fixes: 0fab1ba6ad6b ("drivers/nvdimm: Add perf interface to expose
nvdimm performance stats") (Commit id based on libnvdimm-for-next tree)
Signed-off-by: Kajol Jain 
Link: https://lore.kernel.org/all/62317124.ybqfu33+s%2fwdvwgj%25...@intel.com/
Reported-by: kernel test robot 
---
 drivers/nvdimm/Makefile  |  2 +-
 drivers/nvdimm/nd_perf.c |  1 +
 include/linux/nd.h   | 16 
 3 files changed, 14 insertions(+), 5 deletions(-)

---
Changelog:

v1 -> v2
- Rebase and tested the fix patch changes on top of libnvdimm-for-next branch

- Add stub functions for register_nvdimm_pmu and unregister_nvdimm_pmu
  functions to handle CONFIG_PERF_EVENTS=n case as suggested by Dan Williams

- Move the platform_device header file addition part from nd.h to
  nd_perf.c and just forward declare struct platform_device in nd.h
---

diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 3fb806748716..ba0296dca9db 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -15,7 +15,7 @@ nd_e820-y := e820.o
 libnvdimm-y := core.o
 libnvdimm-y += bus.o
 libnvdimm-y += dimm_devs.o
-libnvdimm-y += nd_perf.o
+libnvdimm-$(CONFIG_PERF_EVENTS) += nd_perf.o
 libnvdimm-y += dimm.o
 libnvdimm-y += region_devs.o
 libnvdimm-y += region.o
diff --git a/drivers/nvdimm/nd_perf.c b/drivers/nvdimm/nd_perf.c
index 314415894acf..433bbb68ae64 100644
--- a/drivers/nvdimm/nd_perf.c
+++ b/drivers/nvdimm/nd_perf.c
@@ -10,6 +10,7 @@
 #define pr_fmt(fmt) "nvdimm_pmu: " fmt
 
 #include 
+#include 
 
 #define EVENT(_name, _code) enum{_name = _code}
 
diff --git a/include/linux/nd.h b/include/linux/nd.h
index 7b2ccbdc1cbc..f3e91c891cbc 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -9,7 +9,6 @@
 #include 
 #include 
 #include 
-#include 
 
 enum nvdimm_event {
NVDIMM_REVALIDATE_POISON,
@@ -57,15 +56,24 @@ struct nvdimm_pmu {
struct cpumask arch_cpumask;
 };
 
+struct platform_device;
+
+#ifdef CONFIG_PERF_EVENTS
 extern ssize_t nvdimm_events_sysfs_show(struct device *dev,
struct device_attribute *attr,
char *page);
 
 int register_nvdimm_pmu(struct nvdimm_pmu *nvdimm, struct platform_device 
*pdev);
 void unregister_nvdimm_pmu(struct nvdimm_pmu *nd_pmu);
-void perf_pmu_migrate_context(struct pmu *pmu, int src_cpu, int dst_cpu);
-int perf_pmu_register(struct pmu *pmu, const char *name, int type);
-void perf_pmu_unregister(struct pmu *pmu);
+
+#else
+static inline int register_nvdimm_pmu(struct nvdimm_pmu *nvdimm, struct 
platform_device *pdev)
+{
+   return -ENXIO;
+}
+
+static inline void unregister_nvdimm_pmu(struct nvdimm_pmu *nd_pmu) { }
+#endif
 
 struct nd_device_driver {
struct device_driver drv;
-- 
2.31.1



Re: [PATCH 1/2] drivers/nvdimm: Fix build failure when CONFIG_PERF_EVENTS is not set

2022-03-23 Thread kajoljain



On 3/22/22 07:40, Dan Williams wrote:
> On Fri, Mar 18, 2022 at 4:42 AM Kajol Jain  wrote:
>>
>> The following build failure occures when CONFIG_PERF_EVENTS is not set
>> as generic pmu functions are not visible in that scenario.
>>
>> |-- s390-randconfig-r044-20220313
>> |   |-- nd_perf.c:(.text):undefined-reference-to-perf_pmu_migrate_context
>> |   |-- nd_perf.c:(.text):undefined-reference-to-perf_pmu_register
>> |   `-- nd_perf.c:(.text):undefined-reference-to-perf_pmu_unregister
>>
>> Similar build failure in nds32 architecture:
>> nd_perf.c:(.text+0x21e): undefined reference to `perf_pmu_migrate_context'
>> nd_perf.c:(.text+0x434): undefined reference to `perf_pmu_register'
>> nd_perf.c:(.text+0x57c): undefined reference to `perf_pmu_unregister'
>>
>> Fix this issue by adding check for CONFIG_PERF_EVENTS config option
>> and disabling the nvdimm perf interface incase this config is not set.
>>
>> Also removed function declaration of perf_pmu_migrate_context,
>> perf_pmu_register, perf_pmu_unregister functions from nd.h as these are
>> common pmu functions which are part of perf_event.h and since we
>> are disabling nvdimm perf interface incase CONFIG_PERF_EVENTS option
>> is not set, we not need to declare them in nd.h
>>
>> Fixes: 0fab1ba6ad6b ("drivers/nvdimm: Add perf interface to expose
>> nvdimm performance stats") (Commit id based on linux-next tree)
>> Signed-off-by: Kajol Jain 
>> Link: 
>> https://lore.kernel.org/all/62317124.ybqfu33+s%2fwdvwgj%25...@intel.com/
>> Reported-by: kernel test robot 
>> ---
>>  drivers/nvdimm/Makefile | 2 +-
>>  include/linux/nd.h  | 7 ---
>>  2 files changed, 5 insertions(+), 4 deletions(-)
>>
>> ---
>> - This fix patch changes are added and tested on top of linux-next tree
>>   on the 'next-20220315' branch.
>> ---
>> diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
>> index 3fb806748716..ba0296dca9db 100644
>> --- a/drivers/nvdimm/Makefile
>> +++ b/drivers/nvdimm/Makefile
>> @@ -15,7 +15,7 @@ nd_e820-y := e820.o
>>  libnvdimm-y := core.o
>>  libnvdimm-y += bus.o
>>  libnvdimm-y += dimm_devs.o
>> -libnvdimm-y += nd_perf.o
>> +libnvdimm-$(CONFIG_PERF_EVENTS) += nd_perf.o
>>  libnvdimm-y += dimm.o
>>  libnvdimm-y += region_devs.o
>>  libnvdimm-y += region.o
>> diff --git a/include/linux/nd.h b/include/linux/nd.h
>> index 7b2ccbdc1cbc..a4265eaf5ae8 100644
>> --- a/include/linux/nd.h
>> +++ b/include/linux/nd.h
>> @@ -8,8 +8,10 @@
>>  #include 
>>  #include 
>>  #include 
>> +#ifdef CONFIG_PERF_EVENTS
>>  #include 
> 
> Why must this not be included? Doesn't it already handle the
> CONFIG_PERF_EVENTS=n case internally?
> 
>>  #include 
> 
> I notice now that this platform-device header should have never been
> added in the first place, just forward declare:
> 
> struct platform_device;

Hi Dan,
Sure I will do the required changes.

Thanks,
Kajol Jain
> 
>> +#endif
>>
>>  enum nvdimm_event {
>> NVDIMM_REVALIDATE_POISON,
>> @@ -25,6 +27,7 @@ enum nvdimm_claim_class {
>> NVDIMM_CCLASS_UNKNOWN,
>>  };
>>
>> +#ifdef CONFIG_PERF_EVENTS
>>  #define NVDIMM_EVENT_VAR(_id)  event_attr_##_id
>>  #define NVDIMM_EVENT_PTR(_id)  (_attr_##_id.attr.attr)
> 
> Why must these be inside the ifdef guard?
> 
>>
>> @@ -63,9 +66,7 @@ extern ssize_t nvdimm_events_sysfs_show(struct device *dev,
>>
>>  int register_nvdimm_pmu(struct nvdimm_pmu *nvdimm, struct platform_device 
>> *pdev);
>>  void unregister_nvdimm_pmu(struct nvdimm_pmu *nd_pmu);
> 
> Shouldn't there also be stub functions in the CONFIG_PERF_EVENTS=n case?
> 
> static inline int register_nvdimm_pmu(struct nvdimm_pmu *nvdimm,
> struct platform_device *pdev)
> {
> return -ENXIO;
> }
> 
> static inline void unregister_nvdimm_pmu(struct nvdimm_pmu *nd_pmu)
> {
> }
> 
>> -void perf_pmu_migrate_context(struct pmu *pmu, int src_cpu, int dst_cpu);
>> -int perf_pmu_register(struct pmu *pmu, const char *name, int type);
>> -void perf_pmu_unregister(struct pmu *pmu);
> 
> Yeah, I should have caught these earlier.
> 
>> +#endif
>>
>>  struct nd_device_driver {
>> struct device_driver drv;
>> --
>> 2.31.1
>>


Re: [PATCH 2/2] powerpc/papr_scm: Fix build failure when CONFIG_PERF_EVENTS is not set

2022-03-23 Thread kajoljain



On 3/23/22 21:02, Dan Williams wrote:
> On Wed, Mar 23, 2022 at 3:05 AM Michael Ellerman  wrote:
>>
>> Dan Williams  writes:
>>> On Tue, Mar 22, 2022 at 7:30 AM kajoljain  wrote:
 On 3/22/22 03:09, Dan Williams wrote:
> On Fri, Mar 18, 2022 at 4:42 AM Kajol Jain  wrote:
>>
>> The following build failure occures when CONFIG_PERF_EVENTS is not set
>> as generic pmu functions are not visible in that scenario.
>>
>> arch/powerpc/platforms/pseries/papr_scm.c:372:35: error: ‘struct 
>> perf_event’ has no member named ‘attr’
>>  p->nvdimm_events_map[event->attr.config],
>>^~
>> In file included from ./include/linux/list.h:5,
>>  from ./include/linux/kobject.h:19,
>>  from ./include/linux/of.h:17,
>>  from arch/powerpc/platforms/pseries/papr_scm.c:5:
>> arch/powerpc/platforms/pseries/papr_scm.c: In function 
>> ‘papr_scm_pmu_event_init’:
>> arch/powerpc/platforms/pseries/papr_scm.c:389:49: error: ‘struct 
>> perf_event’ has no member named ‘pmu’
>>   struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
>>  ^~
>> ./include/linux/container_of.h:18:26: note: in definition of macro 
>> ‘container_of’
>>   void *__mptr = (void *)(ptr); \
>>   ^~~
>> arch/powerpc/platforms/pseries/papr_scm.c:389:30: note: in expansion of 
>> macro ‘to_nvdimm_pmu’
>>   struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
>>   ^
>> In file included from ./include/linux/bits.h:22,
>>  from ./include/linux/bitops.h:6,
>>  from ./include/linux/of.h:15,
>>  from arch/powerpc/platforms/pseries/papr_scm.c:5:
>>
>> Fix the build issue by adding check for CONFIG_PERF_EVENTS config option
>> and disabling the papr_scm perf interface support incase this config
>> is not set
>>
>> Fixes: 4c08d4bbc089 ("powerpc/papr_scm: Add perf interface support") 
>> (Commit id
>> based on linux-next tree)
>> Signed-off-by: Kajol Jain 
>> ---
>>  arch/powerpc/platforms/pseries/papr_scm.c | 15 +++
>
> This is a bit messier than I would have liked mainly because it dumps
> a bunch of ifdefery into a C file contrary to coding style, "Wherever
> possible, don't use preprocessor conditionals (#if, #ifdef) in .c
> files". I would expect this all to move to an organization like:

 Hi Dan,
   Thanks for reviewing the patches. Inorder to avoid the multiple
 ifdefs checks, we can also add stub function for papr_scm_pmu_register.
 With that change we will just have one ifdef check for
 CONFIG_PERF_EVENTS config in both papr_scm.c and nd.h file. Hence we can
 avoid adding new files specific for papr_scm perf interface.

 Below is the code snippet for that change, let me know if looks fine to
 you. I tested it
 with set/unset PAPR_SCM config value and set/unset PERF_EVENTS config
 value combinations.

 diff --git a/arch/powerpc/platforms/pseries/papr_scm.c
 b/arch/powerpc/platforms/pseries/papr_scm.c
 index 4dd513d7c029..38fabb44d3c3 100644
 --- a/arch/powerpc/platforms/pseries/papr_scm.c
 +++ b/arch/powerpc/platforms/pseries/papr_scm.c
 @@ -69,8 +69,6 @@
  #define PAPR_SCM_PERF_STATS_EYECATCHER __stringify(SCMSTATS)
  #define PAPR_SCM_PERF_STATS_VERSION 0x1

 -#define to_nvdimm_pmu(_pmu)container_of(_pmu, struct nvdimm_pmu, pmu)
 -
  /* Struct holding a single performance metric */
  struct papr_scm_perf_stat {
 u8 stat_id[8];
 @@ -346,6 +344,9 @@ static ssize_t drc_pmem_query_stats(struct
 papr_scm_priv *p,
 return 0;
  }

 +#ifdef CONFIG_PERF_EVENTS
 +#define to_nvdimm_pmu(_pmu)container_of(_pmu, struct nvdimm_pmu, pmu)
 +
  static int papr_scm_pmu_get_value(struct perf_event *event, struct
 device *dev, u64 *count)
  {
 struct papr_scm_perf_stat *stat;
 @@ -558,6 +559,10 @@ static void papr_scm_pmu_register(struct
 papr_scm_priv *p)
 dev_info(>pdev->dev, "nvdimm pmu didn't register rc=%d\n", rc);
  }

 +#else
 +static inline void papr_scm_pmu_register(struct papr_scm_priv *p) { }
>>>
>>> Since this isn't in a header file, it does not need to be marked
>>> "inline" the compiler will figure it out.
>>>
 +#endif
>>>
>>> It might be time to create:
>>>
>>> arch/powerpc/platforms/pseries/papr_scm.h
>>>
>>> ...there is quite a bit of header material accrued in papr_scm.c and
>>> once the ifdefs start landing in it then it becomes a nominal coding
>>> style issue. That said, this is certainly more palatable than the
>>> previous version. So if you don't want to create papr_scm.h yet for
>>> 

Re: [PATCH 2/2] powerpc/papr_scm: Fix build failure when CONFIG_PERF_EVENTS is not set

2022-03-23 Thread Dan Williams
On Wed, Mar 23, 2022 at 3:05 AM Michael Ellerman  wrote:
>
> Dan Williams  writes:
> > On Tue, Mar 22, 2022 at 7:30 AM kajoljain  wrote:
> >> On 3/22/22 03:09, Dan Williams wrote:
> >> > On Fri, Mar 18, 2022 at 4:42 AM Kajol Jain  wrote:
> >> >>
> >> >> The following build failure occures when CONFIG_PERF_EVENTS is not set
> >> >> as generic pmu functions are not visible in that scenario.
> >> >>
> >> >> arch/powerpc/platforms/pseries/papr_scm.c:372:35: error: ‘struct 
> >> >> perf_event’ has no member named ‘attr’
> >> >>  p->nvdimm_events_map[event->attr.config],
> >> >>^~
> >> >> In file included from ./include/linux/list.h:5,
> >> >>  from ./include/linux/kobject.h:19,
> >> >>  from ./include/linux/of.h:17,
> >> >>  from arch/powerpc/platforms/pseries/papr_scm.c:5:
> >> >> arch/powerpc/platforms/pseries/papr_scm.c: In function 
> >> >> ‘papr_scm_pmu_event_init’:
> >> >> arch/powerpc/platforms/pseries/papr_scm.c:389:49: error: ‘struct 
> >> >> perf_event’ has no member named ‘pmu’
> >> >>   struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
> >> >>  ^~
> >> >> ./include/linux/container_of.h:18:26: note: in definition of macro 
> >> >> ‘container_of’
> >> >>   void *__mptr = (void *)(ptr); \
> >> >>   ^~~
> >> >> arch/powerpc/platforms/pseries/papr_scm.c:389:30: note: in expansion of 
> >> >> macro ‘to_nvdimm_pmu’
> >> >>   struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
> >> >>   ^
> >> >> In file included from ./include/linux/bits.h:22,
> >> >>  from ./include/linux/bitops.h:6,
> >> >>  from ./include/linux/of.h:15,
> >> >>  from arch/powerpc/platforms/pseries/papr_scm.c:5:
> >> >>
> >> >> Fix the build issue by adding check for CONFIG_PERF_EVENTS config option
> >> >> and disabling the papr_scm perf interface support incase this config
> >> >> is not set
> >> >>
> >> >> Fixes: 4c08d4bbc089 ("powerpc/papr_scm: Add perf interface support") 
> >> >> (Commit id
> >> >> based on linux-next tree)
> >> >> Signed-off-by: Kajol Jain 
> >> >> ---
> >> >>  arch/powerpc/platforms/pseries/papr_scm.c | 15 +++
> >> >
> >> > This is a bit messier than I would have liked mainly because it dumps
> >> > a bunch of ifdefery into a C file contrary to coding style, "Wherever
> >> > possible, don't use preprocessor conditionals (#if, #ifdef) in .c
> >> > files". I would expect this all to move to an organization like:
> >>
> >> Hi Dan,
> >>   Thanks for reviewing the patches. Inorder to avoid the multiple
> >> ifdefs checks, we can also add stub function for papr_scm_pmu_register.
> >> With that change we will just have one ifdef check for
> >> CONFIG_PERF_EVENTS config in both papr_scm.c and nd.h file. Hence we can
> >> avoid adding new files specific for papr_scm perf interface.
> >>
> >> Below is the code snippet for that change, let me know if looks fine to
> >> you. I tested it
> >> with set/unset PAPR_SCM config value and set/unset PERF_EVENTS config
> >> value combinations.
> >>
> >> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c
> >> b/arch/powerpc/platforms/pseries/papr_scm.c
> >> index 4dd513d7c029..38fabb44d3c3 100644
> >> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> >> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> >> @@ -69,8 +69,6 @@
> >>  #define PAPR_SCM_PERF_STATS_EYECATCHER __stringify(SCMSTATS)
> >>  #define PAPR_SCM_PERF_STATS_VERSION 0x1
> >>
> >> -#define to_nvdimm_pmu(_pmu)container_of(_pmu, struct nvdimm_pmu, pmu)
> >> -
> >>  /* Struct holding a single performance metric */
> >>  struct papr_scm_perf_stat {
> >> u8 stat_id[8];
> >> @@ -346,6 +344,9 @@ static ssize_t drc_pmem_query_stats(struct
> >> papr_scm_priv *p,
> >> return 0;
> >>  }
> >>
> >> +#ifdef CONFIG_PERF_EVENTS
> >> +#define to_nvdimm_pmu(_pmu)container_of(_pmu, struct nvdimm_pmu, pmu)
> >> +
> >>  static int papr_scm_pmu_get_value(struct perf_event *event, struct
> >> device *dev, u64 *count)
> >>  {
> >> struct papr_scm_perf_stat *stat;
> >> @@ -558,6 +559,10 @@ static void papr_scm_pmu_register(struct
> >> papr_scm_priv *p)
> >> dev_info(>pdev->dev, "nvdimm pmu didn't register rc=%d\n", rc);
> >>  }
> >>
> >> +#else
> >> +static inline void papr_scm_pmu_register(struct papr_scm_priv *p) { }
> >
> > Since this isn't in a header file, it does not need to be marked
> > "inline" the compiler will figure it out.
> >
> >> +#endif
> >
> > It might be time to create:
> >
> > arch/powerpc/platforms/pseries/papr_scm.h
> >
> > ...there is quite a bit of header material accrued in papr_scm.c and
> > once the ifdefs start landing in it then it becomes a nominal coding
> > style issue. That said, this is certainly more palatable than the
> > previous version. So if you don't want to create papr_scm.h yet for
> > 

Re: [PATCH V9 11/20] riscv: compat: syscall: Add compat_sys_call_table implementation

2022-03-23 Thread Guo Ren
Hi Palmer & Arnd,

Fixup fadvise64_64 arguments problem.

On Tue, Mar 22, 2022 at 10:41 PM  wrote:
>
> From: Guo Ren 
>
> Implement compat sys_call_table and some system call functions:
> truncate64, ftruncate64, fallocate, pread64, pwrite64,
> sync_file_range, readahead, fadvise64_64 which need argument
> translation.
>
> Signed-off-by: Guo Ren 
> Signed-off-by: Guo Ren 
> Reviewed-by: Arnd Bergmann 
> Tested-by: Heiko Stuebner 
> Cc: Palmer Dabbelt 
> ---
>  arch/riscv/include/asm/syscall.h |  1 +
>  arch/riscv/include/asm/unistd.h  | 11 +++
>  arch/riscv/include/uapi/asm/unistd.h |  2 +-
>  arch/riscv/kernel/Makefile   |  1 +
>  arch/riscv/kernel/compat_syscall_table.c | 19 
>  arch/riscv/kernel/sys_riscv.c|  6 ++--
>  fs/open.c| 24 +++
>  fs/read_write.c  | 16 ++
>  fs/sync.c|  9 ++
>  include/asm-generic/compat.h |  7 +
>  include/linux/compat.h   | 37 
>  mm/fadvise.c | 11 +++
>  mm/readahead.c   |  7 +
>  13 files changed, 148 insertions(+), 3 deletions(-)
>  create mode 100644 arch/riscv/kernel/compat_syscall_table.c
>
> diff --git a/arch/riscv/include/asm/syscall.h 
> b/arch/riscv/include/asm/syscall.h
> index 7ac6a0e275f2..384a63b86420 100644
> --- a/arch/riscv/include/asm/syscall.h
> +++ b/arch/riscv/include/asm/syscall.h
> @@ -16,6 +16,7 @@
>
>  /* The array of function pointers for syscalls. */
>  extern void * const sys_call_table[];
> +extern void * const compat_sys_call_table[];
>
>  /*
>   * Only the low 32 bits of orig_r0 are meaningful, so we return int.
> diff --git a/arch/riscv/include/asm/unistd.h b/arch/riscv/include/asm/unistd.h
> index 6c316093a1e5..5ddac412b578 100644
> --- a/arch/riscv/include/asm/unistd.h
> +++ b/arch/riscv/include/asm/unistd.h
> @@ -11,6 +11,17 @@
>  #define __ARCH_WANT_SYS_CLONE
>  #define __ARCH_WANT_MEMFD_SECRET
>
> +#ifdef CONFIG_COMPAT
> +#define __ARCH_WANT_COMPAT_TRUNCATE64
> +#define __ARCH_WANT_COMPAT_FTRUNCATE64
> +#define __ARCH_WANT_COMPAT_FALLOCATE
> +#define __ARCH_WANT_COMPAT_PREAD64
> +#define __ARCH_WANT_COMPAT_PWRITE64
> +#define __ARCH_WANT_COMPAT_SYNC_FILE_RANGE
> +#define __ARCH_WANT_COMPAT_READAHEAD
> +#define __ARCH_WANT_COMPAT_FADVISE64_64
> +#endif
> +
>  #include 
>
>  #define NR_syscalls (__NR_syscalls)
> diff --git a/arch/riscv/include/uapi/asm/unistd.h 
> b/arch/riscv/include/uapi/asm/unistd.h
> index 8062996c2dfd..c9e50eed14aa 100644
> --- a/arch/riscv/include/uapi/asm/unistd.h
> +++ b/arch/riscv/include/uapi/asm/unistd.h
> @@ -15,7 +15,7 @@
>   * along with this program.  If not, see .
>   */
>
> -#ifdef __LP64__
> +#if defined(__LP64__) && !defined(__SYSCALL_COMPAT)
>  #define __ARCH_WANT_NEW_STAT
>  #define __ARCH_WANT_SET_GET_RLIMIT
>  #endif /* __LP64__ */
> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> index ffc87e76b1dd..3b3e425aadd2 100644
> --- a/arch/riscv/kernel/Makefile
> +++ b/arch/riscv/kernel/Makefile
> @@ -68,3 +68,4 @@ obj-$(CONFIG_CRASH_DUMP)  += crash_dump.o
>  obj-$(CONFIG_JUMP_LABEL)   += jump_label.o
>
>  obj-$(CONFIG_EFI)  += efi.o
> +obj-$(CONFIG_COMPAT)   += compat_syscall_table.o
> diff --git a/arch/riscv/kernel/compat_syscall_table.c 
> b/arch/riscv/kernel/compat_syscall_table.c
> new file mode 100644
> index ..651f2b009c28
> --- /dev/null
> +++ b/arch/riscv/kernel/compat_syscall_table.c
> @@ -0,0 +1,19 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +#define __SYSCALL_COMPAT
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#undef __SYSCALL
> +#define __SYSCALL(nr, call)  [nr] = (call),
> +
> +asmlinkage long compat_sys_rt_sigreturn(void);
> +
> +void * const compat_sys_call_table[__NR_syscalls] = {
> +   [0 ... __NR_syscalls - 1] = sys_ni_syscall,
> +#include 
> +};
> diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
> index 12f8a7fce78b..9c0194f176fc 100644
> --- a/arch/riscv/kernel/sys_riscv.c
> +++ b/arch/riscv/kernel/sys_riscv.c
> @@ -33,7 +33,9 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, 
> len,
>  {
> return riscv_sys_mmap(addr, len, prot, flags, fd, offset, 0);
>  }
> -#else
> +#endif
> +
> +#if defined(CONFIG_32BIT) || defined(CONFIG_COMPAT)
>  SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, len,
> unsigned long, prot, unsigned long, flags,
> unsigned long, fd, off_t, offset)
> @@ -44,7 +46,7 @@ SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, 
> len,
>  */
> return riscv_sys_mmap(addr, len, prot, flags, fd, offset, 12);
>  }
> -#endif /* !CONFIG_64BIT */
> +#endif
>
>  /*
>   * Allows the instruction cache to be flushed from userspace.  Despite RISC-V
> diff --git a/fs/open.c 

[PATCH 2/2] powerpc/64: remove system call instruction emulation

2022-03-23 Thread Naveen N. Rao
From: Nicholas Piggin 

emulate_step instruction emulation including sc instruction emulation
initially appeared in xmon. It then emulation code was then moved into
sstep.c where kprobes could use it too, and later hw_breakpoint and
uprobes started to use it.

Until uprobes, the only instruction emulation users were for kernel
mode instructions.

- xmon only steps / breaks on kernel addresses.
- kprobes is kernel only.
- hw_breakpoint only emulates kernel instructions, single steps user.

At one point there was support for the kernel to execute sc
instructions, although that is long removed and it's not clear whether
there was any in-tree code. So system call emulation is not required by
the above users.

uprobes uses emulate_step and it appears possible to emulate sc
instruction in userspace. Userspace system call emulation is broken and
it's not clear it ever worked well.

The big complication is that userspace takes an interrupt to the kernel
to emulate the instruction. The user->kernel interrupt sets up registers
and interrupt stack frame expecting to return to userspace, then system
call instruction emulation re-directs that stack frame to the kernel,
early in the system call interrupt handler. This means the the interrupt
return code takes the kernel->kernel restore path, which does not restore
everything as the system call interrupt handler would expect coming from
userspace. regs->iamr appears to get lost for example, because the
kernel->kernel return does not restore the user iamr. Accounting such as
irqflags tracing and CPU accounting does not get flipped back to user
mode as the system call handler expects, so those appear to enter the
kernel twice without returning to userspace.

These things may be individually fixable with various complication, but
it is a big complexity for unclear real benefit.

This patch removes system call emulation and disables stepping system
calls (because they don't work with trace interrupts, as commented).

Signed-off-by: Nicholas Piggin 
[also get rid of '#ifdef CONFIG_PPC64']
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/interrupt_64.S | 10 ---
 arch/powerpc/lib/sstep.c   | 46 +++---
 2 files changed, 10 insertions(+), 46 deletions(-)

diff --git a/arch/powerpc/kernel/interrupt_64.S 
b/arch/powerpc/kernel/interrupt_64.S
index 7bab2d7de372e0..6471034c790973 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -219,16 +219,6 @@ system_call_vectored common 0x3000
  */
 system_call_vectored sigill 0x7ff0
 
-
-/*
- * Entered via kernel return set up by kernel/sstep.c, must match entry regs
- */
-   .globl system_call_vectored_emulate
-system_call_vectored_emulate:
-_ASM_NOKPROBE_SYMBOL(system_call_vectored_emulate)
-   li  r10,IRQS_ALL_DISABLED
-   stb r10,PACAIRQSOFTMASK(r13)
-   b   system_call_vectored_common
 #endif /* CONFIG_PPC_BOOK3S */
 
.balign IFETCH_ALIGN_BYTES
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 3fda8d0a05b43f..01c8fd39f34981 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -15,9 +15,6 @@
 #include 
 #include 
 
-extern char system_call_common[];
-extern char system_call_vectored_emulate[];
-
 #ifdef CONFIG_PPC64
 /* Bits in SRR1 that are copied from MSR */
 #define MSR_MASK   0x87c0UL
@@ -1376,7 +1373,6 @@ int analyse_instr(struct instruction_op *op, const struct 
pt_regs *regs,
if (branch_taken(word, regs, op))
op->type |= BRTAKEN;
return 1;
-#ifdef CONFIG_PPC64
case 17:/* sc */
if ((word & 0xfe2) == 2)
op->type = SYSCALL;
@@ -1388,7 +1384,6 @@ int analyse_instr(struct instruction_op *op, const struct 
pt_regs *regs,
} else
op->type = UNKNOWN;
return 0;
-#endif
case 18:/* b */
op->type = BRANCH | BRTAKEN;
imm = word & 0x03fc;
@@ -3643,43 +3638,22 @@ int emulate_step(struct pt_regs *regs, ppc_inst_t instr)
regs_set_return_msr(regs, (regs->msr & ~op.val) | (val & 
op.val));
goto instr_done;
 
-#ifdef CONFIG_PPC64
case SYSCALL:   /* sc */
/*
-* N.B. this uses knowledge about how the syscall
-* entry code works.  If that is changed, this will
-* need to be changed also.
+* Per ISA v3.1, section 7.5.15 'Trace Interrupt', we can't
+* single step a system call instruction:
+*
+*   Successful completion for an instruction means that the
+*   instruction caused no other interrupt. Thus a Trace
+*   interrupt never occurs for a System Call or System Call
+*   Vectored instruction, or for a Trap instruction that
+*   traps.
   

[PATCH 0/2] powerpc: Remove system call emulation

2022-03-23 Thread Naveen N. Rao
This is an update to the series posted by Nick here:
http://lkml.kernel.org/r/20220124055741.3686496-1-npig...@gmail.com

The first patch disables probes and breakpoints on instructions that
can't be single stepped, including sc and scv. The second patch removes
system call emulation for powerpc64.

- Naveen



Naveen N. Rao (1):
  powerpc: Reject probes on instructions that can't be single stepped

Nicholas Piggin (1):
  powerpc/64: remove system call instruction emulation

 arch/powerpc/include/asm/probes.h  | 55 ++
 arch/powerpc/kernel/interrupt_64.S | 10 --
 arch/powerpc/kernel/kprobes.c  |  4 +--
 arch/powerpc/kernel/uprobes.c  |  5 +++
 arch/powerpc/lib/sstep.c   | 46 ++---
 arch/powerpc/xmon/xmon.c   | 11 +++---
 6 files changed, 77 insertions(+), 54 deletions(-)


base-commit: e8833c5edc5903f8c8c4fa3dd4f34d6b813c87c8
-- 
2.35.1



[PATCH 1/2] powerpc: Reject probes on instructions that can't be single stepped

2022-03-23 Thread Naveen N. Rao
Per the ISA, a Trace interrupt is not generated for:
- [h|u]rfi[d]
- rfscv
- sc, scv, and Trap instructions that trap
- Power-Saving Mode instructions
- other instructions that cause interrupts (other than Trace interrupts)
- the first instructions of any interrupt handler (applies to Branch and Single 
Step tracing;
CIABR matches may still occur)
- instructions that are emulated by software

Add a helper to check for instructions belonging to the first four
categories above and to reject kprobes, uprobes and xmon breakpoints on
such instructions. We reject probing on instructions belonging to these
categories across all ISA versions and across both BookS and BookE.

For trap instructions, we can't know in advance if they can cause a
trap, and there is no good reason to allow probing on those. Also,
uprobes already refuses to probe trap instructions and kprobes does not
allow probes on trap instructions used for kernel warnings and bugs. As
such, stop allowing any type of probes/breakpoints on trap instruction
across uprobes, kprobes and xmon.

For some of the fp/altivec instructions that can generate an interrupt
and which we emulate in the kernel (altivec assist, for example), we
check and turn off single stepping in emulate_single_step().

Instructions generating a DSI are restarted and single stepping normally
completes once the instruction is completed.

In uprobes, if a single stepped instruction results in a non-fatal
signal to be delivered to the task, such signals are "delayed" until
after the instruction completes. For fatal signals, single stepping is
cancelled and the instruction restarted in-place so that core dump
captures proper addresses.

In kprobes, we do not allow probes on instructions having an extable
entry and we also do not allow probing interrupt vectors.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/include/asm/probes.h | 55 +++
 arch/powerpc/kernel/kprobes.c |  4 +--
 arch/powerpc/kernel/uprobes.c |  5 +++
 arch/powerpc/xmon/xmon.c  | 11 +++
 4 files changed, 67 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/probes.h 
b/arch/powerpc/include/asm/probes.h
index c5d984700d241a..21ab5b6256b590 100644
--- a/arch/powerpc/include/asm/probes.h
+++ b/arch/powerpc/include/asm/probes.h
@@ -31,6 +31,61 @@ typedef u32 ppc_opcode_t;
 #define MSR_SINGLESTEP (MSR_SE)
 #endif
 
+static inline bool can_single_step(u32 inst)
+{
+   switch (inst >> 26) {
+   case 2: /* tdi */
+   return false;
+   case 3: /* twi */
+   return false;
+   case 17:/* sc and scv */
+   return false;
+   case 19:
+   switch ((inst >> 1) & 0x3ff) {
+   case 18:/* rfid */
+   return false;
+   case 38:/* rfmci */
+   return false;
+   case 39:/* rfdi */
+   return false;
+   case 50:/* rfi */
+   return false;
+   case 51:/* rfci */
+   return false;
+   case 82:/* rfscv */
+   return false;
+   case 274:   /* hrfid */
+   return false;
+   case 306:   /* urfid */
+   return false;
+   case 370:   /* stop */
+   return false;
+   case 402:   /* doze */
+   return false;
+   case 434:   /* nap */
+   return false;
+   case 466:   /* sleep */
+   return false;
+   case 498:   /* rvwinkle */
+   return false;
+   }
+   break;
+   case 31:
+   switch ((inst >> 1) & 0x3ff) {
+   case 4: /* tw */
+   return false;
+   case 68:/* td */
+   return false;
+   case 146:   /* mtmsr */
+   return false;
+   case 178:   /* mtmsrd */
+   return false;
+   }
+   break;
+   }
+   return true;
+}
+
 /* Enable single stepping for the current task */
 static inline void enable_single_step(struct pt_regs *regs)
 {
diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 9a492fdec1dfbe..0936a6c8c256b9 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -129,8 +129,8 @@ int arch_prepare_kprobe(struct kprobe *p)
if ((unsigned long)p->addr & 0x03) {
printk("Attempt to register kprobe at an unaligned address\n");
ret = -EINVAL;
-   } else if (IS_MTMSRD(insn) || IS_RFID(insn)) {
-   printk("Cannot register a kprobe on mtmsr[d]/rfi[d]\n");
+   } else if 

Re: [PATCH 2/2] powerpc/papr_scm: Fix build failure when CONFIG_PERF_EVENTS is not set

2022-03-23 Thread Michael Ellerman
Dan Williams  writes:
> On Tue, Mar 22, 2022 at 7:30 AM kajoljain  wrote:
>> On 3/22/22 03:09, Dan Williams wrote:
>> > On Fri, Mar 18, 2022 at 4:42 AM Kajol Jain  wrote:
>> >>
>> >> The following build failure occures when CONFIG_PERF_EVENTS is not set
>> >> as generic pmu functions are not visible in that scenario.
>> >>
>> >> arch/powerpc/platforms/pseries/papr_scm.c:372:35: error: ‘struct 
>> >> perf_event’ has no member named ‘attr’
>> >>  p->nvdimm_events_map[event->attr.config],
>> >>^~
>> >> In file included from ./include/linux/list.h:5,
>> >>  from ./include/linux/kobject.h:19,
>> >>  from ./include/linux/of.h:17,
>> >>  from arch/powerpc/platforms/pseries/papr_scm.c:5:
>> >> arch/powerpc/platforms/pseries/papr_scm.c: In function 
>> >> ‘papr_scm_pmu_event_init’:
>> >> arch/powerpc/platforms/pseries/papr_scm.c:389:49: error: ‘struct 
>> >> perf_event’ has no member named ‘pmu’
>> >>   struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
>> >>  ^~
>> >> ./include/linux/container_of.h:18:26: note: in definition of macro 
>> >> ‘container_of’
>> >>   void *__mptr = (void *)(ptr); \
>> >>   ^~~
>> >> arch/powerpc/platforms/pseries/papr_scm.c:389:30: note: in expansion of 
>> >> macro ‘to_nvdimm_pmu’
>> >>   struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
>> >>   ^
>> >> In file included from ./include/linux/bits.h:22,
>> >>  from ./include/linux/bitops.h:6,
>> >>  from ./include/linux/of.h:15,
>> >>  from arch/powerpc/platforms/pseries/papr_scm.c:5:
>> >>
>> >> Fix the build issue by adding check for CONFIG_PERF_EVENTS config option
>> >> and disabling the papr_scm perf interface support incase this config
>> >> is not set
>> >>
>> >> Fixes: 4c08d4bbc089 ("powerpc/papr_scm: Add perf interface support") 
>> >> (Commit id
>> >> based on linux-next tree)
>> >> Signed-off-by: Kajol Jain 
>> >> ---
>> >>  arch/powerpc/platforms/pseries/papr_scm.c | 15 +++
>> >
>> > This is a bit messier than I would have liked mainly because it dumps
>> > a bunch of ifdefery into a C file contrary to coding style, "Wherever
>> > possible, don't use preprocessor conditionals (#if, #ifdef) in .c
>> > files". I would expect this all to move to an organization like:
>>
>> Hi Dan,
>>   Thanks for reviewing the patches. Inorder to avoid the multiple
>> ifdefs checks, we can also add stub function for papr_scm_pmu_register.
>> With that change we will just have one ifdef check for
>> CONFIG_PERF_EVENTS config in both papr_scm.c and nd.h file. Hence we can
>> avoid adding new files specific for papr_scm perf interface.
>>
>> Below is the code snippet for that change, let me know if looks fine to
>> you. I tested it
>> with set/unset PAPR_SCM config value and set/unset PERF_EVENTS config
>> value combinations.
>>
>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c
>> b/arch/powerpc/platforms/pseries/papr_scm.c
>> index 4dd513d7c029..38fabb44d3c3 100644
>> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>> @@ -69,8 +69,6 @@
>>  #define PAPR_SCM_PERF_STATS_EYECATCHER __stringify(SCMSTATS)
>>  #define PAPR_SCM_PERF_STATS_VERSION 0x1
>>
>> -#define to_nvdimm_pmu(_pmu)container_of(_pmu, struct nvdimm_pmu, pmu)
>> -
>>  /* Struct holding a single performance metric */
>>  struct papr_scm_perf_stat {
>> u8 stat_id[8];
>> @@ -346,6 +344,9 @@ static ssize_t drc_pmem_query_stats(struct
>> papr_scm_priv *p,
>> return 0;
>>  }
>>
>> +#ifdef CONFIG_PERF_EVENTS
>> +#define to_nvdimm_pmu(_pmu)container_of(_pmu, struct nvdimm_pmu, pmu)
>> +
>>  static int papr_scm_pmu_get_value(struct perf_event *event, struct
>> device *dev, u64 *count)
>>  {
>> struct papr_scm_perf_stat *stat;
>> @@ -558,6 +559,10 @@ static void papr_scm_pmu_register(struct
>> papr_scm_priv *p)
>> dev_info(>pdev->dev, "nvdimm pmu didn't register rc=%d\n", rc);
>>  }
>>
>> +#else
>> +static inline void papr_scm_pmu_register(struct papr_scm_priv *p) { }
>
> Since this isn't in a header file, it does not need to be marked
> "inline" the compiler will figure it out.
>
>> +#endif
>
> It might be time to create:
>
> arch/powerpc/platforms/pseries/papr_scm.h
>
> ...there is quite a bit of header material accrued in papr_scm.c and
> once the ifdefs start landing in it then it becomes a nominal coding
> style issue. That said, this is certainly more palatable than the
> previous version. So if you don't want to create papr_scm.h yet for
> this, at least make a note in the changelog that the first portion of
> arch/powerpc/platforms/pseries/papr_scm.c is effectively papr_scm.h
> content and may move there in the future, or something like that.

IMHO the only thing that belongs in a header is content that's needed by

Re: [PATCH kernel] powerpc/boot: Stop using RELACOUNT

2022-03-23 Thread Gabriel Paubert
On Wed, Mar 23, 2022 at 11:18:40AM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 3/22/22 13:12, Michael Ellerman wrote:
> > Alexey Kardashevskiy  writes:
> > > So far the RELACOUNT tag from the ELF header was containing the exact
> > > number of R_PPC_RELATIVE/R_PPC64_RELATIVE relocations. However the LLVM's
> > > recent change [1] make it equal-or-less than the actual number which
> > > makes it useless.
> > > 
> > > This replaces RELACOUNT in zImage loader with a pair of RELASZ and 
> > > RELAENT.
> > > The vmlinux relocation code is fixed in [2].
> > 
> > That's committed so you can say:
> >in commit d79976918852 ("powerpc/64: Add UADDR64 relocation support")
> > 
> > > To make it more future proof, this walks through the entire .rela.dyn
> > > section instead of assuming that the section is sorter by a relocation
> > > type. Unlike [1], this does not add unaligned UADDR/UADDR64 relocations
> >  ^
> >  that should be 2?
> 
> Yes.
> 
> > 
> > > as in hardly possible to see those in arch-specific zImage.
> > 
> > I don't quite parse that. Is it true we can never see them in zImage?
> > Maybe it's true that we don't see them in practice.
> 
> I can force UADDR64 in zImage as I did for d79976918852 but zImage is lot
> smaller and more arch-specific than vmlinux and so far only PRINT_INDEX
> triggered UADDR64 in vmlinux and chances of the same thing happening in
> zImage are small.
> 
> 
> > 
> > > [1] https://github.com/llvm/llvm-project/commit/da0e5b885b25cf4
> > > [2] 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=next=d799769188529a
> > > Signed-off-by: Alexey Kardashevskiy 
> > > ---
> > >   arch/powerpc/boot/crt0.S | 43 +---
> > >   1 file changed, 27 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/arch/powerpc/boot/crt0.S b/arch/powerpc/boot/crt0.S
> > > index feadee18e271..6ea3417da3b7 100644
> > > --- a/arch/powerpc/boot/crt0.S
> > > +++ b/arch/powerpc/boot/crt0.S
> > > @@ -8,7 +8,8 @@
> > >   #include "ppc_asm.h"
> > >   RELA = 7
> > > -RELACOUNT = 0x6ff9
> > > +RELASZ = 8
> > > +RELAENT = 9
> > >   .data
> > >   /* A procedure descriptor used when booting this as a COFF file.
> > > @@ -75,34 +76,38 @@ p_base:   mflrr10 /* r10 now 
> > > points to runtime addr of p_base */
> > >   bne 11f
> > >   lwz r9,4(r12)   /* get RELA pointer in r9 */
> > >   b   12f
> > > -11:  addis   r8,r8,(-RELACOUNT)@ha
> > > - cmpwi   r8,RELACOUNT@l
> > > +11:  cmpwi   r8,RELASZ
> > > + bne 111f
> > > + lwz r0,4(r12)   /* get RELASZ value in r0 */
> > > + b   12f
> > > +111: cmpwi   r8,RELAENT
> > 
> > Can you use named local labels for new labels you introduce?
> > 
> > This could be .Lcheck_for_relaent: perhaps.
> 
> Then I'll need to rename them all/most and add more noise to the patch which
> reduces chances of it being reviewed. But sure, I can rename labels.
> 
> 
> 
> > >   bne 12f
> > > - lwz r0,4(r12)   /* get RELACOUNT value in r0 */
> > > + lwz r14,4(r12)  /* get RELAENT value in r14 */
> > >   12: addir12,r12,8
> > >   b   9b
> > >   /* The relocation section contains a list of relocations.
> > >* We now do the R_PPC_RELATIVE ones, which point to words
> > > -  * which need to be initialized with addend + offset.
> > > -  * The R_PPC_RELATIVE ones come first and there are RELACOUNT
> > > -  * of them. */
> > > +  * which need to be initialized with addend + offset */
> > >   10: /* skip relocation if we don't have both */
> > >   cmpwi   r0,0
> > >   beq 3f
> > >   cmpwi   r9,0
> > >   beq 3f
> > > + cmpwi   r14,0
> > > + beq 3f
> > >   add r9,r9,r11   /* Relocate RELA pointer */
> > > + divdr0,r0,r14   /* RELASZ / RELAENT */
> > 
> > This is in the 32-bit portion isn't it. AFAIK 32-bit CPUs don't
> > implement divd. I'm not sure why the toolchain allowed it. I would
> > expect it to trap if run on real 32-bit hardware.
> 
> 
> Uff, my bad, "divw", right?

Or maybe even "divwu", since I suspect both values are unsigned.

Probably does not make a difference in practice.

Gabriel

> 
> I am guessing it works as zImage for 64bit BigEndian is still ELF32 which
> runs in 64bit CPU and I did not test on real PPC32 as I'm not quite sure how
> and I hoped your farm will do this for me :)
> 
> 
> 
> > >   mtctr   r0
> > >   2:  lbz r0,4+3(r9)  /* ELF32_R_INFO(reloc->r_info) */
> > >   cmpwi   r0,22   /* R_PPC_RELATIVE */
> > > - bne 3f
> > > + bne 22f
> > >   lwz r12,0(r9)   /* reloc->r_offset */
> > >   lwz r0,8(r9)/* reloc->r_addend */
> > >   add r0,r0,r11
> > >   stwxr0,r11,r12
> > > - addir9,r9,12
> > > +22:  add r9,r9,r14
> > >

Re: [PATCH 2/2] powerpc/papr_scm: Fix build failure when CONFIG_PERF_EVENTS is not set

2022-03-23 Thread Vaibhav Jain
Dan Williams  writes:

> On Tue, Mar 22, 2022 at 7:30 AM kajoljain  wrote:
>>
>>
>>
>> On 3/22/22 03:09, Dan Williams wrote:
>> > On Fri, Mar 18, 2022 at 4:42 AM Kajol Jain  wrote:
>> >>
>> >> The following build failure occures when CONFIG_PERF_EVENTS is not set
>> >> as generic pmu functions are not visible in that scenario.
>> >>
>> >> arch/powerpc/platforms/pseries/papr_scm.c:372:35: error: ‘struct 
>> >> perf_event’ has no member named ‘attr’
>> >>  p->nvdimm_events_map[event->attr.config],
>> >>^~
>> >> In file included from ./include/linux/list.h:5,
>> >>  from ./include/linux/kobject.h:19,
>> >>  from ./include/linux/of.h:17,
>> >>  from arch/powerpc/platforms/pseries/papr_scm.c:5:
>> >> arch/powerpc/platforms/pseries/papr_scm.c: In function 
>> >> ‘papr_scm_pmu_event_init’:
>> >> arch/powerpc/platforms/pseries/papr_scm.c:389:49: error: ‘struct 
>> >> perf_event’ has no member named ‘pmu’
>> >>   struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
>> >>  ^~
>> >> ./include/linux/container_of.h:18:26: note: in definition of macro 
>> >> ‘container_of’
>> >>   void *__mptr = (void *)(ptr); \
>> >>   ^~~
>> >> arch/powerpc/platforms/pseries/papr_scm.c:389:30: note: in expansion of 
>> >> macro ‘to_nvdimm_pmu’
>> >>   struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
>> >>   ^
>> >> In file included from ./include/linux/bits.h:22,
>> >>  from ./include/linux/bitops.h:6,
>> >>  from ./include/linux/of.h:15,
>> >>  from arch/powerpc/platforms/pseries/papr_scm.c:5:
>> >>
>> >> Fix the build issue by adding check for CONFIG_PERF_EVENTS config option
>> >> and disabling the papr_scm perf interface support incase this config
>> >> is not set
>> >>
>> >> Fixes: 4c08d4bbc089 ("powerpc/papr_scm: Add perf interface support") 
>> >> (Commit id
>> >> based on linux-next tree)
>> >> Signed-off-by: Kajol Jain 
>> >> ---
>> >>  arch/powerpc/platforms/pseries/papr_scm.c | 15 +++
>> >
>> > This is a bit messier than I would have liked mainly because it dumps
>> > a bunch of ifdefery into a C file contrary to coding style, "Wherever
>> > possible, don't use preprocessor conditionals (#if, #ifdef) in .c
>> > files". I would expect this all to move to an organization like:
>>
>> Hi Dan,
>>   Thanks for reviewing the patches. Inorder to avoid the multiple
>> ifdefs checks, we can also add stub function for papr_scm_pmu_register.
>> With that change we will just have one ifdef check for
>> CONFIG_PERF_EVENTS config in both papr_scm.c and nd.h file. Hence we can
>> avoid adding new files specific for papr_scm perf interface.
>>
>> Below is the code snippet for that change, let me know if looks fine to
>> you. I tested it
>> with set/unset PAPR_SCM config value and set/unset PERF_EVENTS config
>> value combinations.
>>
>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c
>> b/arch/powerpc/platforms/pseries/papr_scm.c
>> index 4dd513d7c029..38fabb44d3c3 100644
>> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>> @@ -69,8 +69,6 @@
>>  #define PAPR_SCM_PERF_STATS_EYECATCHER __stringify(SCMSTATS)
>>  #define PAPR_SCM_PERF_STATS_VERSION 0x1
>>
>> -#define to_nvdimm_pmu(_pmu)container_of(_pmu, struct nvdimm_pmu, pmu)
>> -
>>  /* Struct holding a single performance metric */
>>  struct papr_scm_perf_stat {
>> u8 stat_id[8];
>> @@ -346,6 +344,9 @@ static ssize_t drc_pmem_query_stats(struct
>> papr_scm_priv *p,
>> return 0;
>>  }
>>
>> +#ifdef CONFIG_PERF_EVENTS
>> +#define to_nvdimm_pmu(_pmu)container_of(_pmu, struct nvdimm_pmu, pmu)
>> +
>>  static int papr_scm_pmu_get_value(struct perf_event *event, struct
>> device *dev, u64 *count)
>>  {
>> struct papr_scm_perf_stat *stat;
>> @@ -558,6 +559,10 @@ static void papr_scm_pmu_register(struct
>> papr_scm_priv *p)
>> dev_info(>pdev->dev, "nvdimm pmu didn't register rc=%d\n", rc);
>>  }
>>
>> +#else
>> +static inline void papr_scm_pmu_register(struct papr_scm_priv *p) { }
>
> Since this isn't in a header file, it does not need to be marked
> "inline" the compiler will figure it out.
>
>> +#endif
>
> It might be time to create:
>
> arch/powerpc/platforms/pseries/papr_scm.h
>
> ...there is quite a bit of header material accrued in papr_scm.c and
> once the ifdefs start landing in it then it becomes a nominal coding
> style issue. That said, this is certainly more palatable than the
> previous version. So if you don't want to create papr_scm.h yet for
> this, at least make a note in the changelog that the first portion of
> arch/powerpc/platforms/pseries/papr_scm.c is effectively papr_scm.h
> content and may move there in the future, or something like that.

Great suggestion Dan and incidently we are already working on 

Re: [PATCH 2/2] powerpc/papr_scm: Fix build failure when CONFIG_PERF_EVENTS is not set

2022-03-23 Thread Aneesh Kumar K V

On 3/22/22 10:02 PM, Dan Williams wrote:

On Tue, Mar 22, 2022 at 7:30 AM kajoljain  wrote:




On 3/22/22 03:09, Dan Williams wrote:

On Fri, Mar 18, 2022 at 4:42 AM Kajol Jain  wrote:


The following build failure occures when CONFIG_PERF_EVENTS is not set
as generic pmu functions are not visible in that scenario.

arch/powerpc/platforms/pseries/papr_scm.c:372:35: error: ‘struct perf_event’ 
has no member named ‘attr’
  p->nvdimm_events_map[event->attr.config],
^~
In file included from ./include/linux/list.h:5,
  from ./include/linux/kobject.h:19,
  from ./include/linux/of.h:17,
  from arch/powerpc/platforms/pseries/papr_scm.c:5:
arch/powerpc/platforms/pseries/papr_scm.c: In function 
‘papr_scm_pmu_event_init’:
arch/powerpc/platforms/pseries/papr_scm.c:389:49: error: ‘struct perf_event’ 
has no member named ‘pmu’
   struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
  ^~
./include/linux/container_of.h:18:26: note: in definition of macro 
‘container_of’
   void *__mptr = (void *)(ptr); \
   ^~~
arch/powerpc/platforms/pseries/papr_scm.c:389:30: note: in expansion of macro 
‘to_nvdimm_pmu’
   struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
   ^
In file included from ./include/linux/bits.h:22,
  from ./include/linux/bitops.h:6,
  from ./include/linux/of.h:15,
  from arch/powerpc/platforms/pseries/papr_scm.c:5:

Fix the build issue by adding check for CONFIG_PERF_EVENTS config option
and disabling the papr_scm perf interface support incase this config
is not set

Fixes: 4c08d4bbc089 ("powerpc/papr_scm: Add perf interface support") (Commit id
based on linux-next tree)
Signed-off-by: Kajol Jain 
---
  arch/powerpc/platforms/pseries/papr_scm.c | 15 +++


This is a bit messier than I would have liked mainly because it dumps
a bunch of ifdefery into a C file contrary to coding style, "Wherever
possible, don't use preprocessor conditionals (#if, #ifdef) in .c
files". I would expect this all to move to an organization like:


Hi Dan,
   Thanks for reviewing the patches. Inorder to avoid the multiple
ifdefs checks, we can also add stub function for papr_scm_pmu_register.
With that change we will just have one ifdef check for
CONFIG_PERF_EVENTS config in both papr_scm.c and nd.h file. Hence we can
avoid adding new files specific for papr_scm perf interface.

Below is the code snippet for that change, let me know if looks fine to
you. I tested it
with set/unset PAPR_SCM config value and set/unset PERF_EVENTS config
value combinations.

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c
b/arch/powerpc/platforms/pseries/papr_scm.c
index 4dd513d7c029..38fabb44d3c3 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -69,8 +69,6 @@
  #define PAPR_SCM_PERF_STATS_EYECATCHER __stringify(SCMSTATS)
  #define PAPR_SCM_PERF_STATS_VERSION 0x1

-#define to_nvdimm_pmu(_pmu)container_of(_pmu, struct nvdimm_pmu, pmu)
-
  /* Struct holding a single performance metric */
  struct papr_scm_perf_stat {
 u8 stat_id[8];
@@ -346,6 +344,9 @@ static ssize_t drc_pmem_query_stats(struct
papr_scm_priv *p,
 return 0;
  }

+#ifdef CONFIG_PERF_EVENTS
+#define to_nvdimm_pmu(_pmu)container_of(_pmu, struct nvdimm_pmu, pmu)
+
  static int papr_scm_pmu_get_value(struct perf_event *event, struct
device *dev, u64 *count)
  {
 struct papr_scm_perf_stat *stat;
@@ -558,6 +559,10 @@ static void papr_scm_pmu_register(struct
papr_scm_priv *p)
 dev_info(>pdev->dev, "nvdimm pmu didn't register rc=%d\n", rc);
  }

+#else
+static inline void papr_scm_pmu_register(struct papr_scm_priv *p) { }


Since this isn't in a header file, it does not need to be marked
"inline" the compiler will figure it out.


+#endif


It might be time to create:

arch/powerpc/platforms/pseries/papr_scm.h

...there is quite a bit of header material accrued in papr_scm.c and
once the ifdefs start landing in it then it becomes a nominal coding
style issue. That said, this is certainly more palatable than the
previous version. So if you don't want to create papr_scm.h yet for
this, at least make a note in the changelog that the first portion of
arch/powerpc/platforms/pseries/papr_scm.c is effectively papr_scm.h
content and may move there in the future, or something like that.


There was an attempt to do that earlier. We should get that reposted 
with further changes move more details to header.


https://lore.kernel.org/nvdimm/163454440296.431294.2368481747380790011.st...@lep8c.aus.stglabs.ibm.com 





Re: [PATCH kernel] powerpc/boot: Stop using RELACOUNT

2022-03-23 Thread Michael Ellerman
Alexey Kardashevskiy  writes:
> On 3/22/22 13:12, Michael Ellerman wrote:
>> Alexey Kardashevskiy  writes:
>>> So far the RELACOUNT tag from the ELF header was containing the exact
>>> number of R_PPC_RELATIVE/R_PPC64_RELATIVE relocations. However the LLVM's
>>> recent change [1] make it equal-or-less than the actual number which
>>> makes it useless.
>>>
>>> This replaces RELACOUNT in zImage loader with a pair of RELASZ and RELAENT.
>>> The vmlinux relocation code is fixed in [2].
>> 
>> That's committed so you can say:
>>in commit d79976918852 ("powerpc/64: Add UADDR64 relocation support")
>> 
>>> To make it more future proof, this walks through the entire .rela.dyn
>>> section instead of assuming that the section is sorter by a relocation
>>> type. Unlike [1], this does not add unaligned UADDR/UADDR64 relocations
>>  ^
>>  that should be 2?
>
> Yes.
>
>> 
>>> as in hardly possible to see those in arch-specific zImage.
>> 
>> I don't quite parse that. Is it true we can never see them in zImage?
>> Maybe it's true that we don't see them in practice.
>
> I can force UADDR64 in zImage as I did for d79976918852 but zImage is 
> lot smaller and more arch-specific than vmlinux and so far only 
> PRINT_INDEX triggered UADDR64 in vmlinux and chances of the same thing 
> happening in zImage are small.

OK. Just update the change log to say something like that. ie. they're
not impossible, but not seen in practice.

>>> @@ -75,34 +76,38 @@ p_base: mflrr10 /* r10 now points to 
>>> runtime addr of p_base */
>>> bne 11f
>>> lwz r9,4(r12)   /* get RELA pointer in r9 */
>>> b   12f
>>> -11:addis   r8,r8,(-RELACOUNT)@ha
>>> -   cmpwi   r8,RELACOUNT@l
>>> +11:cmpwi   r8,RELASZ
>>> +   bne 111f
>>> +   lwz r0,4(r12)   /* get RELASZ value in r0 */
>>> +   b   12f
>>> +111:   cmpwi   r8,RELAENT
>> 
>> Can you use named local labels for new labels you introduce?
>> 
>> This could be .Lcheck_for_relaent: perhaps.
>
> Then I'll need to rename them all/most and add more noise to the patch 
> which reduces chances of it being reviewed. But sure, I can rename labels.

I said for new labels you introduce :) We can do a follow-up to rename
existing labels if we want to.

>>> bne 12f
>>> -   lwz r0,4(r12)   /* get RELACOUNT value in r0 */
>>> +   lwz r14,4(r12)  /* get RELAENT value in r14 */
>>>   12:   addir12,r12,8
>>> b   9b
>>>   
>>> /* The relocation section contains a list of relocations.
>>>  * We now do the R_PPC_RELATIVE ones, which point to words
>>> -* which need to be initialized with addend + offset.
>>> -* The R_PPC_RELATIVE ones come first and there are RELACOUNT
>>> -* of them. */
>>> +* which need to be initialized with addend + offset */
>>>   10:   /* skip relocation if we don't have both */
>>> cmpwi   r0,0
>>> beq 3f
>>> cmpwi   r9,0
>>> beq 3f
>>> +   cmpwi   r14,0
>>> +   beq 3f
>>>   
>>> add r9,r9,r11   /* Relocate RELA pointer */
>>> +   divdr0,r0,r14   /* RELASZ / RELAENT */
>> 
>> This is in the 32-bit portion isn't it. AFAIK 32-bit CPUs don't
>> implement divd. I'm not sure why the toolchain allowed it. I would
>> expect it to trap if run on real 32-bit hardware.
>
>
> Uff, my bad, "divw", right?

AFAIK yes.

> I am guessing it works as zImage for 64bit BigEndian is still ELF32 
> which runs in 64bit CPU and I did not test on real PPC32 as I'm not 
> quite sure how and I hoped your farm will do this for me :)

Yeah I was hoping they would catch it too. I build pmac32 which should
build a 32-bit zImage, but I build it with a 64-bit compiler using -m32,
so maybe that's why it's accepted. Or maybe we're passing the wrong
options to the assembler.

I don't have any tests of actually booting a 32-bit zImage, my automated
tests all use the vmlinux.

cheers