Re: [PATCH] powerpc/boot: Build wrapper for an appropriate CPU

2022-03-30 Thread Joel Stanley
On Thu, 31 Mar 2022 at 02:05, Murilo Opsfelder Araújo
 wrote:
>
> Hi, Joel.
>
> On 3/30/22 08:24, Joel Stanley wrote:
> > Currently the boot wrapper lacks a -mcpu option, so it will be built for
> > the toolchain's default cpu. This is a problem if the toolchain defaults
> > to a cpu with newer instructions.
> >
> > We could wire in TARGET_CPU but instead use the oldest supported option
> > so the wrapper runs anywhere.
> >
> > The GCC documentation states that -mcpu=powerpc64le will give us a
> > generic 64 bit powerpc machine:
> >
> >   -mcpu=powerpc, -mcpu=powerpc64, and -mcpu=powerpc64le specify pure
> >   32-bit PowerPC (either endian), 64-bit big endian PowerPC and 64-bit
> >   little endian PowerPC architecture machine types, with an appropriate,
> >   generic processor model assumed for scheduling purposes.
> >
> > So do that for each of the three machines.
> >
> > This bug was found when building the kernel with a toolchain that
> > defaulted to power10, resulting in a pcrel enabled wrapper which fails
> > to link:
> >
> >   arch/powerpc/boot/wrapper.a(crt0.o): in function `p_base':
> >   (.text+0x150): call to `platform_init' lacks nop, can't restore toc; (toc save/adjust stub)
> >   (.text+0x154): call to `start' lacks nop, can't restore toc; (toc save/adjust stub)
> >   powerpc64le-buildroot-linux-gnu-ld: final link failed: bad value
> >
> > Even with that bug worked around the resulting kernel would crash on a
> > power9 box:
> >
> >   $ qemu-system-ppc64 -nographic -nodefaults -M powernv9 -kernel arch/powerpc/boot/zImage.epapr -serial mon:stdio
> >   [7.069331356,5] INIT: Starting kernel at 0x20010020, fdt at 0x3068c628 25694 bytes
> >   [7.130374661,3] ***
> >   [7.131072886,3] Fatal Exception 0xe40 at 200101e4MSR 
> > 9001
> >   [7.131290613,3] CFAR : 2001027c MSR  : 9001
> >   [7.131433759,3] SRR0 : 20010050 SRR1 : 9001
> >   [7.13155,3] HSRR0: 200101e4 HSRR1: 9001
> >   [7.131733687,3] DSISR:  DAR  : 
> >   [7.131905162,3] LR   : 20010280 CTR  : 
> >   [7.132068356,3] CR   : 44002004 XER  : 
> >
> > Link: https://github.com/linuxppc/issues/issues/400
> > Signed-off-by: Joel Stanley 
> > ---
> > Tested:
> >
> >   - ppc64le_defconfig
> >   - pseries and powernv qemu, for power8, power9, power10 cpus
> >   - buildroot compiler that defaults to -mcpu=power10 (gcc 10.3.0, ld 2.36.1)
> >   -  RHEL9 cross compilers (gcc 11.2.1-1, ld 2.35.2-17.el9)
> >
> > All decompressed and made it into the kernel ok.
> >
> > ppc64_defconfig did not work, as we've got a regression when the wrapper
> > is built for big endian. It hasn't worked for zImage.pseries for a long
> > time (at least v4.14), and broke some time between v5.4 and v5.17 for
> > zImage.epapr.
> >
> >   arch/powerpc/boot/Makefile | 8 ++--
> >   1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
> > index 9993c6256ad2..1f5cc401bfc0 100644
> > --- a/arch/powerpc/boot/Makefile
> > +++ b/arch/powerpc/boot/Makefile
> > @@ -38,9 +38,13 @@ BOOTCFLAGS:= -Wall -Wundef -Wstrict-prototypes 
> > -Wno-trigraphs \
> >$(LINUXINCLUDE)
> >
> >   ifdef CONFIG_PPC64_BOOT_WRAPPER
> > -BOOTCFLAGS   += -m64
> > +ifdef CONFIG_CPU_LITTLE_ENDIAN
> > +BOOTCFLAGS   += -m64 -mcpu=powerpc64le
> >   else
> > -BOOTCFLAGS   += -m32
> > +BOOTCFLAGS   += -m64 -mcpu=powerpc64
> > +endif
> > +else
> > +BOOTCFLAGS   += -m32 -mcpu=powerpc
> >   endif
> >
> >   BOOTCFLAGS  += -isystem $(shell $(BOOTCC) -print-file-name=include)
>
> I think it was a fortunate coincidence that the default cpu type of your gcc is
> compatible with your system.  If the distro gcc moves its default to a newer cpu
> type than your system, this bug would happen again.

Perhaps I needed to be clearer in my commit message: that's the exact
bug I'm looking to avoid. I have a buildroot toolchain that was built
for -mcpu=power10.

I think you're suggesting the -mcpu=powerpc64 option will change its
behavior depending on the default. From my reading of the man page, I
don't think that's true.

I did a little test using my buildroot compiler which has
with-cpu=power10. I used the presence of PCREL relocations as evidence
that it was built for power10.

$ powerpc64le-buildroot-linux-gnu-gcc -mcpu=power10 -c test.c
$ readelf -r test.o |grep -c PCREL
24
$ powerpc64le-buildroot-linux-gnu-gcc -c test.c
$ readelf -r test.o |grep -c PCREL
24
$ powerpc64le-buildroot-linux-gnu-gcc -mcpu=powerpc64le -c test.c
$ readelf -r test.o |grep -c PCREL
0
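
The test.c used above is not shown in the thread. As a rough stand-in (an
assumption, not the file actually used), something like the following should be
enough to make the difference visible, since it touches global data, which a
compiler targeting power10 is expected to address pc-relatively rather than
through the TOC:

```c
/* test.c: hypothetical stand-in, not the file used in the thread. */

int counter;                          /* global data, needs a relocation to reach */
static int table[4] = { 1, 2, 3, 4 }; /* local data in .data */

/* Read-modify-write of a global variable. */
int bump(int by)
{
	counter += by;
	return counter;
}

/* Walk a static array so the compiler has to materialise its address. */
int sum_table(void)
{
	int i, sum = 0;

	for (i = 0; i < 4; i++)
		sum += table[i];
	return sum;
}
```

Building it with the three command lines above and counting PCREL relocations
with readelf should show the same pattern (non-zero, non-zero, zero), although
the exact counts will differ from the 24 reported here.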

>
> The command "gcc -v |& grep with-cpu" will show you the default cpu type for 32
> and 64-bit that gcc was configured with.

Just a heads up: this gives me no output for the 64 bit compilers on my laptop:

$ 

Re: [PATCH] powerpc/boot: Build wrapper for an appropriate CPU

2022-03-30 Thread Murilo Opsfelder Araújo

Hi, Joel.

On 3/30/22 08:24, Joel Stanley wrote:

Currently the boot wrapper lacks a -mcpu option, so it will be built for
the toolchain's default cpu. This is a problem if the toolchain defaults
to a cpu with newer instructions.

We could wire in TARGET_CPU but instead use the oldest supported option
so the wrapper runs anywhere.

The GCC documentation states that -mcpu=powerpc64le will give us a
generic 64 bit powerpc machine:

  -mcpu=powerpc, -mcpu=powerpc64, and -mcpu=powerpc64le specify pure
  32-bit PowerPC (either endian), 64-bit big endian PowerPC and 64-bit
  little endian PowerPC architecture machine types, with an appropriate,
  generic processor model assumed for scheduling purposes.

So do that for each of the three machines.

This bug was found when building the kernel with a toolchain that
defaulted to power10, resulting in a pcrel enabled wrapper which fails
to link:

  arch/powerpc/boot/wrapper.a(crt0.o): in function `p_base':
  (.text+0x150): call to `platform_init' lacks nop, can't restore toc; (toc save/adjust stub)
  (.text+0x154): call to `start' lacks nop, can't restore toc; (toc save/adjust stub)
  powerpc64le-buildroot-linux-gnu-ld: final link failed: bad value

Even with that bug worked around the resulting kernel would crash on a
power9 box:

  $ qemu-system-ppc64 -nographic -nodefaults -M powernv9 -kernel arch/powerpc/boot/zImage.epapr -serial mon:stdio
  [7.069331356,5] INIT: Starting kernel at 0x20010020, fdt at 0x3068c628 25694 bytes
  [7.130374661,3] ***
  [7.131072886,3] Fatal Exception 0xe40 at 200101e4MSR 
9001
  [7.131290613,3] CFAR : 2001027c MSR  : 9001
  [7.131433759,3] SRR0 : 20010050 SRR1 : 9001
  [7.13155,3] HSRR0: 200101e4 HSRR1: 9001
  [7.131733687,3] DSISR:  DAR  : 
  [7.131905162,3] LR   : 20010280 CTR  : 
  [7.132068356,3] CR   : 44002004 XER  : 

Link: https://github.com/linuxppc/issues/issues/400
Signed-off-by: Joel Stanley 
---
Tested:

  - ppc64le_defconfig
  - pseries and powernv qemu, for power8, power9, power10 cpus
  - buildroot compiler that defaults to -mcpu=power10 (gcc 10.3.0, ld 2.36.1)
  -  RHEL9 cross compilers (gcc 11.2.1-1, ld 2.35.2-17.el9)

All decompressed and made it into the kernel ok.

ppc64_defconfig did not work, as we've got a regression when the wrapper
is built for big endian. It hasn't worked for zImage.pseries for a long
time (at least v4.14), and broke some time between v5.4 and v5.17 for
zImage.epapr.

  arch/powerpc/boot/Makefile | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 9993c6256ad2..1f5cc401bfc0 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -38,9 +38,13 @@ BOOTCFLAGS:= -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
 $(LINUXINCLUDE)
  
  ifdef CONFIG_PPC64_BOOT_WRAPPER

-BOOTCFLAGS += -m64
+ifdef CONFIG_CPU_LITTLE_ENDIAN
+BOOTCFLAGS += -m64 -mcpu=powerpc64le
  else
-BOOTCFLAGS += -m32
+BOOTCFLAGS += -m64 -mcpu=powerpc64
+endif
+else
+BOOTCFLAGS += -m32 -mcpu=powerpc
  endif
  
  BOOTCFLAGS	+= -isystem $(shell $(BOOTCC) -print-file-name=include)


I think it was a fortunate coincidence that the default cpu type of your gcc is
compatible with your system.  If the distro gcc moves its default to a newer cpu
type than your system, this bug would happen again.

The command "gcc -v |& grep with-cpu" will show you the default cpu type for 32
and 64-bit that gcc was configured with.

Considering the CONFIG_TARGET_CPU for BOOTCFLAGS would bring some level of
consistency between CFLAGS and BOOTCFLAGS regarding -mcpu value.

We could mimic the behaviour from arch/powerpc/Makefile:

ifdef CONFIG_PPC_BOOK3S_64
ifdef CONFIG_CPU_LITTLE_ENDIAN
CFLAGS-$(CONFIG_GENERIC_CPU) += -mcpu=power8
CFLAGS-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=power9,-mtune=power8)
else
CFLAGS-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=power7,$(call cc-option,-mtune=power5))
CFLAGS-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mcpu=power5,-mcpu=power4)
endif
else ifdef CONFIG_PPC_BOOK3E_64
CFLAGS-$(CONFIG_GENERIC_CPU) += -mcpu=powerpc64
endif
...
CFLAGS-$(CONFIG_TARGET_CPU_BOOL) += $(call cc-option,-mcpu=$(CONFIG_TARGET_CPU))

Cheers!

--
Murilo


[Bug 215781] Highmem support broken on kernels greater 5.15.x on ppc32?

2022-03-30 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=215781

--- Comment #3 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 300667
  --> https://bugzilla.kernel.org/attachment.cgi?id=300667&action=edit
kernel .config (5.15.32, PowerMac G4 DP)

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 215781] Highmem support broken on kernels greater 5.15.x on ppc32?

2022-03-30 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=215781

--- Comment #2 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 300666
  --> https://bugzilla.kernel.org/attachment.cgi?id=300666&action=edit
kernel .config (5.16.18, PowerMac G4 DP)


[Bug 215781] Highmem support broken on kernels greater 5.15.x on ppc32?

2022-03-30 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=215781

--- Comment #1 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 300665
  --> https://bugzilla.kernel.org/attachment.cgi?id=300665&action=edit
dmesg (5.15.32, PowerMac G4 DP)


[Bug 215781] New: Highmem support broken on kernels greater 5.15.x on ppc32?

2022-03-30 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=215781

Bug ID: 215781
   Summary: Highmem support broken on kernels greater 5.15.x on
ppc32?
   Product: Platform Specific/Hardware
   Version: 2.5
Kernel Version: 5.16.18
  Hardware: PPC-32
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: PPC-32
  Assignee: platform_ppc...@kernel-bugs.osdl.org
  Reporter: erhar...@mailbox.org
Regression: Yes

Created attachment 300664
  --> https://bugzilla.kernel.org/attachment.cgi?id=300664&action=edit
dmesg (5.16.18, PowerMac G4 DP)

Noticed my G4 DP ran a bit sluggish... Turned out it uses only 614664K of
2097152K RAM. Happens on both kernel 5.16.18 and 5.17.1. Kernels 5.15 and
before work as expected.

It seems to be a problem with highmem as 5.16.18 and 5.17.1 show 0K highmem.
CONFIG_HIGHMEM=y is of course set.

Kernel 5.16.18 says:
[...]
Top of RAM: 0x8000, Total RAM: 0x8000
Memory hole size: 0MB
Zone ranges:
  DMA  [mem 0x-0x27ff]
  Normal   empty
  HighMem  [mem 0x2800-0x7fff]
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x-0x7fff]
Initmem setup node 0 [mem 0x-0x7fff]
percpu: Embedded 12 pages/cpu s19404 r8192 d21556 u49152
pcpu-alloc: s19404 r8192 d21556 u49152 alloc=12*4096
pcpu-alloc: [0] 0 [0] 1 
Built 1 zonelists, mobility grouping on.  Total pages: 522848
Kernel command line: ro root=/dev/sda5 zswap.max_pool_percent=16
zswap.zpool=z3fold slub_debug=FZP page_poison=1
netconsole=@192.168.2.5/eth0,@192.168.2.2/70:85:C2:30:EC:01 
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes, linear)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
mem auto-init: stack:__user(zero), heap alloc:off, heap free:off
Kernel virtual memory layout:
  * 0xffbbf000..0xf000  : fixmap
  * 0xff40..0xff80  : highmem PTEs
  * 0xff115000..0xff40  : early ioremap
  * 0xe900..0xff115000  : vmalloc & ioremap
  * 0xb000..0xc000  : modules
Memory: 614664K/2097152K available (8828K kernel code, 488K rwdata, 1664K
rodata, 1316K init, 381K bss, 1482488K reserved, 0K cma-reserved, 0K highmem)
[...]

On kernel 5.15.23 I got highmem as expected:
[...]
Top of RAM: 0x8000, Total RAM: 0x8000
Memory hole size: 0MB
Zone ranges:
  DMA  [mem 0x-0x27ff]
  Normal   empty
  HighMem  [mem 0x2800-0x7fff]
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x-0x7fff]
Initmem setup node 0 [mem 0x-0x7fff]
percpu: Embedded 12 pages/cpu s19404 r8192 d21556 u49152
pcpu-alloc: s19404 r8192 d21556 u49152 alloc=12*4096
pcpu-alloc: [0] 0 [0] 1 
Built 1 zonelists, mobility grouping on.  Total pages: 522848
Kernel command line: ro root=/dev/sda5 zswap.max_pool_percent=16
zswap.zpool=z3fold slub_debug=FZP page_poison=1
netconsole=@192.168.2.5/eth0,@192.168.2.2/70:85:C2:30:EC:01 
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes, linear)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
mem auto-init: stack:__user(zero), heap alloc:off, heap free:off
Kernel virtual memory layout:
  * 0xffbbf000..0xf000  : fixmap
  * 0xff40..0xff80  : highmem PTEs
  * 0xff115000..0xff40  : early ioremap
  * 0xe900..0xff115000  : vmalloc & ioremap
  * 0xb000..0xc000  : modules
Memory: 2056460K/2097152K available (8688K kernel code, 488K rwdata, 1644K
rodata, 1316K init, 377K bss, 40692K reserved, 0K cma-reserved, 1441792K
highmem)
[...]

For testing I reused the kernel .config from 5.15.32 for 5.16.18 via make
oldconfig, answering =n to all questions.
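
Comparing the two "Memory:" lines above numerically gives a hint about where
the memory went. The sketch below only restates figures copied from the logs
(the struct and function names are made up for illustration): the amount
missing from "available" on 5.16.18 equals the growth in "reserved", and it is
within 4K of the 1441792K highmem that 5.15 reports, so the whole highmem range
appears to be accounted as reserved.

```c
/* Figures copied from the "Memory:" lines in the report, all in KiB. */
struct mem_report {
	unsigned long available;
	unsigned long total;
	unsigned long reserved;
	unsigned long highmem;
};

static const struct mem_report v5_15 = { 2056460, 2097152, 40692, 1441792 };
static const struct mem_report v5_16 = { 614664, 2097152, 1482488, 0 };

/* How much more memory 5.16.18 counts as reserved compared to 5.15. */
unsigned long extra_reserved_kib(void)
{
	return v5_16.reserved - v5_15.reserved;
}

/* How much usable memory disappeared between the two kernels. */
unsigned long lost_available_kib(void)
{
	return v5_15.available - v5_16.available;
}

/* Difference between the lost memory and the highmem size seen on 5.15. */
unsigned long lost_vs_highmem_kib(void)
{
	return lost_available_kib() - v5_15.highmem;
}
```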


Re: [PATCH v2] ftrace: Make ftrace_graph_is_dead() a static branch

2022-03-30 Thread Steven Rostedt
On Wed, 30 Mar 2022 06:55:26 +
Christophe Leroy  wrote:

> > Small nit. Please order the includes in "upside-down x-mas tree" fashion:
> > 
> > #include 
> > #include 
> > #include 
> > #include 
> >   
> 
> That's the first time I get such a request. Usually people request 
> #includes to be in alphabetical order so when I see a file that has 
> headers in alphabetical order I try to not break it, but here that was 
> not the case so I put it at the end of the list.

This is something that Ingo Molnar started back in 2009 or so. And I do
find it easier on the eyes ;-)  I may be the only one today trying to keep
it (albeit poorly).

It's not a hard requirement, but I find it makes the code look more like
art, which it is :-D

> 
> I'll send v3

Thanks,

-- Steve


Re: [PATCH v3 1/2] PCI/AER: Disable AER service when link is in L2/L3 ready, L2 and L3 state

2022-03-30 Thread Sathyanarayanan Kuppuswamy




On 3/29/22 1:31 AM, Kai-Heng Feng wrote:

On some Intel AlderLake platforms, Thunderbolt entering D3cold can cause
some errors reported by AER:
[   30.100211] pcieport :00:1d.0: AER: Uncorrected (Non-Fatal) error received: :00:1d.0
[   30.100251] pcieport :00:1d.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[   30.100256] pcieport :00:1d.0:   device [8086:7ab0] error status/mask=0010/4000
[   30.100262] pcieport :00:1d.0:[20] UnsupReq   (First)
[   30.100267] pcieport :00:1d.0: AER:   TLP Header: 3400 0852
[   30.100372] thunderbolt :0a:00.0: AER: can't recover (no error_detected callback)
[   30.100401] xhci_hcd :3e:00.0: AER: can't recover (no error_detected callback)
[   30.100427] pcieport :00:1d.0: AER: device recovery failed


Please include details about which platform you have seen this on, and
whether it is a generic power issue.



So disable AER service to avoid the noises from turning power rails
on/off when the device is in low power states (D3hot and D3cold), as
PCIe spec "5.2 Link State Power Management" states that TLP and DLLP


Also include PCIe specification version number.


transmission is disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold
with aux power) and L3 (D3cold).

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215453
Reviewed-by: Mika Westerberg 
Signed-off-by: Kai-Heng Feng 
---
v3:
  - Remove reference to ACS.
  - Wording change.

v2:
  - Wording change.

  drivers/pci/pcie/aer.c | 31 +--
  1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 9fa1f97e5b270..e4e9d4a3098d7 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1367,6 +1367,22 @@ static int aer_probe(struct pcie_device *dev)
return 0;
  }
  
+static int aer_suspend(struct pcie_device *dev)

+{
+   struct aer_rpc *rpc = get_service_data(dev);
+
+   aer_disable_rootport(rpc);
+   return 0;
+}
+
+static int aer_resume(struct pcie_device *dev)
+{
+   struct aer_rpc *rpc = get_service_data(dev);
+
+   aer_enable_rootport(rpc);
+   return 0;
+}
+
  /**
   * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP
   * @dev: pointer to Root Port, RCEC, or RCiEP
@@ -1433,12 +1449,15 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
  }
  
  static struct pcie_port_service_driver aerdriver = {

-   .name   = "aer",
-   .port_type  = PCIE_ANY_PORT,
-   .service= PCIE_PORT_SERVICE_AER,
-
-   .probe  = aer_probe,
-   .remove = aer_remove,
+   .name   = "aer",
+   .port_type  = PCIE_ANY_PORT,
+   .service= PCIE_PORT_SERVICE_AER,
+   .probe  = aer_probe,
+   .suspend= aer_suspend,
+   .resume = aer_resume,
+   .runtime_suspend= aer_suspend,
+   .runtime_resume = aer_resume,
+   .remove = aer_remove,
  };
  
  /**


--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


Re: [RFC PATCH 3/3] objtool/mcount: Add powerpc specific functions

2022-03-30 Thread Naveen N. Rao

Christophe Leroy wrote:



Le 29/03/2022 à 14:01, Michael Ellerman a écrit :

Josh Poimboeuf  writes:

On Sun, Mar 27, 2022 at 09:09:20AM +, Christophe Leroy wrote:

What are current works in progress on objtool ? Should I wait Josh's
changes before starting looking at all this ? Should I wait for anything
else ?


I'm not making any major changes to the code, just shuffling things
around to make the interface more modular.  I hope to have something
soon (this week).  Peter recently added a big feature (Intel IBT) which
is already in -next.

Contributions are welcome, with the understanding that you'll help
maintain it ;-)

Some years ago Kamalesh Babulal had a prototype of objtool for ppc64le
which did the full stack validation.  I'm not sure what ever became of
that.


From memory he was starting to clean the patches up in late 2019, but I
guess that probably got derailed by COVID. AFAIK he never posted
anything. Maybe someone at IBM has a copy internally (Naveen?).


Kamalesh had a WIP series to enable stack validation on powerpc. From 
what I recall, he was waiting on and/or working with the arm64 folks 
around some of the common changes needed in objtool.





FWIW, there have been some objtool patches for arm64 stack validation,
but the arm64 maintainers have been hesitant to get on board with
objtool, as it brings a certain maintenance burden.  Especially for the
full stack validation and ORC unwinder.  But if you only want inline
static calls and/or mcount then it'd probably be much easier to
maintain.


I would like to have the stack validation, but I am also worried about
the maintenance burden.

I guess we start with mcount, which looks pretty minimal judging by this
series, and see how we go from there.



I'm not sure mcount is really needed as we have recordmcount, but at 
least it is an easy one to start with and as we have recordmcount we can 
easily compare the results and check it works as expected.


On the contrary, I think support for mcount in objtool is something we 
want to get going soon (hopefully, in time for v5.19) given the issues 
we are seeing with recordmcount:

- https://github.com/linuxppc/issues/issues/388
- https://lore.kernel.org/all/20220211014313.1790140-1-...@ozlabs.ru/


- Naveen



Re: [GIT PULL] LIBNVDIMM update for v5.18

2022-03-30 Thread pr-tracker-bot
The pull request you sent on Tue, 29 Mar 2022 13:54:41 -0700:

> git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm tags/libnvdimm-for-5.18

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/ee96dd9614f1c139e719dd2f296acbed7f1ab4b8

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.18-1 tag

2022-03-30 Thread Segher Boessenkool
On Thu, Mar 31, 2022 at 12:21:03AM +1100, Michael Ellerman wrote:
> Michal Suchánek  writes:
> > On Mon, Mar 28, 2022 at 08:07:13PM +1100, Michael Ellerman wrote:
> >> Linus Torvalds  writes:
> >> > On Fri, Mar 25, 2022 at 3:25 AM Michael Ellerman  
> >> > wrote:
> >> 
> >> > That said:
> >> >
> >> >> There's a series of commits cleaning up function descriptor handling,
> >> >
> >> > For some reason I also thought that powerpc had actually moved away
> >> > from function descriptors, so I'm clearly not keeping up with the
> >> > times.
> >> 
> >> No you're right, we have moved away from them, but not entirely.
> >> 
> >> Function descriptors are still used for 64-bit big endian, but they're
> >> not used for 64-bit little endian, or 32-bit.
> >
> > There was a patch to use ABIv2 for ppc64 big endian. I suppose that
> > would rid us of the function descriptors for good.
> 
> It would be nice.
> 
> The hesitation in the past was that the GNU toolchain developers don't
> officially support BE+ELFv2, though it is in use so it does work.

We do not officially support ELFv2 BE because there are no significant
users, so we cannot have the same confidence it works correctly.

It isn't tested often with GCC for example, mainly because it isn't
convenient to do without pre-packaged user space for it (and on the
other hand, there isn't much demand for it).

> > Maybe it's worth resurrecting?
> 
> Yeah maybe we should think about it again. If it builds with clang/lld
> that would be a real plus.

With GCC it should work fine still.  But no doubt you will find some
edge cases...  which you won't find until you try :-)


Segher


Re: [PATCH v2 6/8] s390/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

2022-03-30 Thread Gerald Schaefer
On Tue, 29 Mar 2022 18:43:27 +0200
David Hildenbrand  wrote:

> Let's use bit 52, which is unused.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  arch/s390/include/asm/pgtable.h | 23 +--
>  1 file changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
> index 3982575bb586..a397b072a580 100644
> --- a/arch/s390/include/asm/pgtable.h
> +++ b/arch/s390/include/asm/pgtable.h
> @@ -181,6 +181,8 @@ static inline int is_module_addr(void *addr)
>  #define _PAGE_SOFT_DIRTY 0x000
>  #endif
>  
> +#define _PAGE_SWP_EXCLUSIVE _PAGE_LARGE  /* SW pte exclusive swap bit */
> +
>  /* Set of bits not changed in pte_modify */
>  #define _PAGE_CHG_MASK   (PAGE_MASK | _PAGE_SPECIAL | _PAGE_DIRTY | \
>                            _PAGE_YOUNG | _PAGE_SOFT_DIRTY)
> @@ -826,6 +828,22 @@ static inline int pmd_protnone(pmd_t pmd)
>  }
>  #endif
>  
> +#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
> +static inline int pte_swp_exclusive(pte_t pte)
> +{
> + return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
> +}
> +
> +static inline pte_t pte_swp_mkexclusive(pte_t pte)
> +{
> + return set_pte_bit(pte, __pgprot(_PAGE_SWP_EXCLUSIVE));
> +}
> +
> +static inline pte_t pte_swp_clear_exclusive(pte_t pte)
> +{
> + return clear_pte_bit(pte, __pgprot(_PAGE_SWP_EXCLUSIVE));
> +}
> +
>  static inline int pte_soft_dirty(pte_t pte)
>  {
>   return pte_val(pte) & _PAGE_SOFT_DIRTY;
> @@ -1715,14 +1733,15 @@ static inline int has_transparent_hugepage(void)
>   * Bits 54 and 63 are used to indicate the page type. Bit 53 marks the pte
>   * as invalid.
>   * A swap pte is indicated by bit pattern (pte & 0x201) == 0x200
> - * |                       offset                       |X11XX|type |S0|
> + * |                       offset                       |E11XX|type |S0|
>   * |0000000000111111111122222222223333333333444444444455|55555|55566|66|
>   * |0123456789012345678901234567890123456789012345678901|23456|78901|23|
>   *
>   * Bits 0-51 store the offset.
> + * Bit 52 (E) is used to remember PG_anon_exclusive.
>   * Bits 57-61 store the type.
>   * Bit 62 (S) is used for softdirty tracking.
> - * Bits 52, 55 and 56 (X) are unused.
> + * Bits 55 and 56 (X) are unused.
>   */
>  
>  #define __SWP_OFFSET_MASK ((1UL << 52) - 1)

Thanks David!

Reviewed-by: Gerald Schaefer 
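
For readers following the layout comment in the quoted patch, here is a small
standalone sketch of how such a swap pte could be packed and unpacked. It
assumes IBM bit numbering (bit 0 is the most significant bit of the 64-bit
entry) and uses made-up SWP_* and sketch_* names rather than the kernel's own
macros and helpers, so treat it as an illustration of the bit positions only:

```c
#include <stdint.h>

/* Bit n in IBM numbering, where bit 0 is the MSB of a 64-bit value. */
#define SWP_BIT(n)		(1ULL << (63 - (n)))

#define SWP_EXCLUSIVE		SWP_BIT(52)	/* E: remembers PG_anon_exclusive */
#define SWP_INVALID		SWP_BIT(53)	/* marks the pte as invalid */
#define SWP_TYPE_BIT54		SWP_BIT(54)	/* page type bit, set for swap ptes */
#define SWP_SOFT_DIRTY		SWP_BIT(62)	/* S: softdirty tracking */
#define SWP_TYPE_BIT63		SWP_BIT(63)	/* page type bit, clear for swap ptes */

#define SWP_OFFSET_MASK		((1ULL << 52) - 1)	/* bits 0-51 */
#define SWP_OFFSET_SHIFT	12
#define SWP_TYPE_MASK		((1ULL << 5) - 1)	/* bits 57-61 */
#define SWP_TYPE_SHIFT		2

/* Pack type and offset, setting bits 53 and 54 as in the "11" of the layout. */
uint64_t sketch_mk_swap_pte(uint64_t type, uint64_t offset)
{
	return SWP_INVALID | SWP_TYPE_BIT54 |
	       ((offset & SWP_OFFSET_MASK) << SWP_OFFSET_SHIFT) |
	       ((type & SWP_TYPE_MASK) << SWP_TYPE_SHIFT);
}

/* The swap pattern from the comment: (pte & 0x201) == 0x200. */
int sketch_is_swap_pte(uint64_t pte)
{
	return (pte & (SWP_TYPE_BIT54 | SWP_TYPE_BIT63)) == SWP_TYPE_BIT54;
}

uint64_t sketch_swp_type(uint64_t pte)
{
	return (pte >> SWP_TYPE_SHIFT) & SWP_TYPE_MASK;
}

uint64_t sketch_swp_offset(uint64_t pte)
{
	return (pte >> SWP_OFFSET_SHIFT) & SWP_OFFSET_MASK;
}

int sketch_swp_exclusive(uint64_t pte)
{
	return (pte & SWP_EXCLUSIVE) != 0;
}

uint64_t sketch_swp_mkexclusive(uint64_t pte)
{
	return pte | SWP_EXCLUSIVE;
}

uint64_t sketch_swp_clear_exclusive(uint64_t pte)
{
	return pte & ~SWP_EXCLUSIVE;
}
```

With this numbering bit 52 works out to 0x800, and setting or clearing it does
not disturb the type, the offset, or the swap pattern check.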


Re: [PATCH v2 5/8] s390/pgtable: cleanup description of swp pte layout

2022-03-30 Thread Gerald Schaefer
On Tue, 29 Mar 2022 18:43:26 +0200
David Hildenbrand  wrote:

> Bit 52 and bit 55 don't have to be zero: they only trigger a
> translation-specification exception if the PTE is marked as valid, which
> is not the case for swap ptes.
> 
> Document which bits are used for what, and which ones are unused.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  arch/s390/include/asm/pgtable.h | 17 -
>  1 file changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
> index 9df679152620..3982575bb586 100644
> --- a/arch/s390/include/asm/pgtable.h
> +++ b/arch/s390/include/asm/pgtable.h
> @@ -1712,18 +1712,17 @@ static inline int has_transparent_hugepage(void)
>  /*
>   * 64 bit swap entry format:
>   * A page-table entry has some bits we have to treat in a special way.
> - * Bits 52 and bit 55 have to be zero, otherwise a specification
> - * exception will occur instead of a page translation exception. The
> - * specification exception has the bad habit not to store necessary
> - * information in the lowcore.
> - * Bits 54 and 63 are used to indicate the page type.
> + * Bits 54 and 63 are used to indicate the page type. Bit 53 marks the pte
> + * as invalid.
>   * A swap pte is indicated by bit pattern (pte & 0x201) == 0x200
> - * This leaves the bits 0-51 and bits 56-62 to store type and offset.
> - * We use the 5 bits from 57-61 for the type and the 52 bits from 0-51
> - * for the offset.
> - * |                       offset                       |01100|type |00|
> + * |                       offset                       |X11XX|type |S0|
>   * |0000000000111111111122222222223333333333444444444455|55555|55566|66|
>   * |0123456789012345678901234567890123456789012345678901|23456|78901|23|
> + *
> + * Bits 0-51 store the offset.
> + * Bits 57-61 store the type.
> + * Bit 62 (S) is used for softdirty tracking.
> + * Bits 52, 55 and 56 (X) are unused.
>   */
>  
>  #define __SWP_OFFSET_MASK ((1UL << 52) - 1)

Thanks David!

Reviewed-by: Gerald Schaefer 


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.18-1 tag

2022-03-30 Thread Arnd Bergmann
On Wed, Mar 30, 2022 at 3:21 PM Michael Ellerman  wrote:
> Michal Suchánek  writes:
> > On Mon, Mar 28, 2022 at 08:07:13PM +1100, Michael Ellerman wrote:
> >> No you're right, we have moved away from them, but not entirely.
> >>
> >> Function descriptors are still used for 64-bit big endian, but they're
> >> not used for 64-bit little endian, or 32-bit.
> >
> > There was a patch to use ABIv2 for ppc64 big endian. I suppose that
> > would rid us of the function descriptors for good.
>
> It would be nice.
>
> The hesitation in the past was that the GNU toolchain developers don't
> officially support BE+ELFv2, though it is in use so it does work.

It clearly made sense to wait while BE+ELFv1 was commonly used and
well tested, but as that is getting less common each year, getting ELFv1
out of the picture would appear to make the setup less obscure, not more.

   Arnd


[PATCH v2 3/3] powerpc/64: remove system call instruction emulation

2022-03-30 Thread Naveen N. Rao
From: Nicholas Piggin 

emulate_step() instruction emulation including sc instruction emulation
initially appeared in xmon. It was then moved into sstep.c where kprobes
could use it too, and later hw_breakpoint and uprobes started to use it.

Until uprobes, the only instruction emulation users were for kernel
mode instructions.

- xmon only steps / breaks on kernel addresses.
- kprobes is kernel only.
- hw_breakpoint only emulates kernel instructions, single steps user.

At one point, there was support for the kernel to execute sc
instructions, although that is long removed and it's not clear whether
there were any in-tree users. So system call emulation is not required
by the above users.

uprobes uses emulate_step and it appears possible to emulate sc
instruction in userspace. Userspace system call emulation is broken and
it's not clear it ever worked well.

The big complication is that userspace takes an interrupt to the kernel
to emulate the instruction. The user->kernel interrupt sets up registers
and interrupt stack frame expecting to return to userspace, then system
call instruction emulation re-directs that stack frame to the kernel,
early in the system call interrupt handler. This means the interrupt
return code takes the kernel->kernel restore path, which does not
restore everything as the system call interrupt handler would expect
coming from userspace. regs->iamr appears to get lost for example,
because the kernel->kernel return does not restore the user iamr.
Accounting such as irqflags tracing and CPU accounting does not get
flipped back to user mode as the system call handler expects, so those
appear to enter the kernel twice without returning to userspace.

These things may be individually fixable with various complications, but
that is a lot of complexity for unclear real benefit.

Furthermore, it is not possible to single step a system call instruction
since it causes an interrupt. As such, a separate patch disables probing
on system call instructions.

This patch removes system call emulation and disables stepping system
calls.

Signed-off-by: Nicholas Piggin 
[minor commit log edit, and also get rid of '#ifdef CONFIG_PPC64']
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/interrupt_64.S | 10 ---
 arch/powerpc/lib/sstep.c   | 46 +++---
 2 files changed, 10 insertions(+), 46 deletions(-)

diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
index 7bab2d7de372e0..6471034c790973 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -219,16 +219,6 @@ system_call_vectored common 0x3000
  */
 system_call_vectored sigill 0x7ff0
 
-
-/*
- * Entered via kernel return set up by kernel/sstep.c, must match entry regs
- */
-   .globl system_call_vectored_emulate
-system_call_vectored_emulate:
-_ASM_NOKPROBE_SYMBOL(system_call_vectored_emulate)
-   li  r10,IRQS_ALL_DISABLED
-   stb r10,PACAIRQSOFTMASK(r13)
-   b   system_call_vectored_common
 #endif /* CONFIG_PPC_BOOK3S */
 
.balign IFETCH_ALIGN_BYTES
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 3fda8d0a05b43f..01c8fd39f34981 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -15,9 +15,6 @@
 #include 
 #include 
 
-extern char system_call_common[];
-extern char system_call_vectored_emulate[];
-
 #ifdef CONFIG_PPC64
 /* Bits in SRR1 that are copied from MSR */
 #define MSR_MASK   0x87c0UL
@@ -1376,7 +1373,6 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
if (branch_taken(word, regs, op))
op->type |= BRTAKEN;
return 1;
-#ifdef CONFIG_PPC64
case 17:/* sc */
if ((word & 0xfe2) == 2)
op->type = SYSCALL;
@@ -1388,7 +1384,6 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
} else
op->type = UNKNOWN;
return 0;
-#endif
case 18:/* b */
op->type = BRANCH | BRTAKEN;
imm = word & 0x03fc;
@@ -3643,43 +3638,22 @@ int emulate_step(struct pt_regs *regs, ppc_inst_t instr)
regs_set_return_msr(regs, (regs->msr & ~op.val) | (val & op.val));
goto instr_done;
 
-#ifdef CONFIG_PPC64
case SYSCALL:   /* sc */
/*
-* N.B. this uses knowledge about how the syscall
-* entry code works.  If that is changed, this will
-* need to be changed also.
+* Per ISA v3.1, section 7.5.15 'Trace Interrupt', we can't
+* single step a system call instruction:
+*
+*   Successful completion for an instruction means that the
+*   instruction caused no other interrupt. Thus a Trace
+*   interrupt never occurs for a System Call 

[PATCH v2 2/3] powerpc: Reject probes on instructions that can't be single stepped

2022-03-30 Thread Naveen N. Rao
Per the ISA, a Trace interrupt is not generated for:
- [h|u]rfi[d]
- rfscv
- sc, scv, and Trap instructions that trap
- Power-Saving Mode instructions
- other instructions that cause interrupts (other than Trace interrupts)
- the first instructions of any interrupt handler (applies to Branch and
  Single Step tracing; CIABR matches may still occur)
- instructions that are emulated by software

Add a helper to check for instructions belonging to the first four
categories above and to reject kprobes, uprobes and xmon breakpoints on
such instructions. We reject probing on instructions belonging to these
categories across all ISA versions and across both BookS and BookE.
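
As a rough illustration of the helper described above, here is a minimal
user-space sketch. It assumes the primary opcode is the top 6 bits of the
instruction word and the extended opcode is bits 1..10, as in the v1 hunk
quoted later in this archive. The name can_single_step_sketch(), the local
get_op()/get_xop() stand-ins, the reduced opcode list and the example
instruction encodings are assumptions made for the example; this is not the
code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the kernel's get_op()/get_xop() helpers. */
static inline unsigned int get_op(uint32_t inst)  { return inst >> 26; }
static inline unsigned int get_xop(uint32_t inst) { return (inst >> 1) & 0x3ff; }

/* Subset of the opcode values that appear in this thread. */
#define OP_TRAP_64         2
#define OP_TRAP            3
#define OP_SC              17
#define OP_19              19
#define OP_31              31
#define OP_19_XOP_RFID     18
#define OP_19_XOP_RFI      50
#define OP_31_XOP_TRAP     4
#define OP_31_XOP_TRAP_64  68
#define OP_31_XOP_MTMSR    146
#define OP_31_XOP_MTMSRD   178

/* Returns false for instructions that never raise a Trace interrupt. */
static bool can_single_step_sketch(uint32_t inst)
{
	switch (get_op(inst)) {
	case OP_TRAP_64:	/* tdi */
	case OP_TRAP:		/* twi */
	case OP_SC:		/* sc, scv */
		return false;
	case OP_19:
		switch (get_xop(inst)) {
		case OP_19_XOP_RFID:
		case OP_19_XOP_RFI:
			return false;
		}
		break;
	case OP_31:
		switch (get_xop(inst)) {
		case OP_31_XOP_TRAP:	/* tw */
		case OP_31_XOP_TRAP_64:	/* td */
		case OP_31_XOP_MTMSR:
		case OP_31_XOP_MTMSRD:
			return false;
		}
		break;
	}
	return true;
}

/* Example encodings (assumed for illustration). */
static void demo_can_single_step(void)
{
	assert(!can_single_step_sketch(0x44000002u));	/* sc */
	assert(!can_single_step_sketch(0x7fe00008u));	/* trap (tw 31,0,0) */
	assert(!can_single_step_sketch(0x4c000024u));	/* rfid */
	assert(!can_single_step_sketch(0x7c000164u));	/* mtmsrd r0 */
	assert(can_single_step_sketch(0x60000000u));	/* nop (ori 0,0,0) */
}
```

A caller registering a probe would reject the request when the helper
returns false, rather than discovering the problem at single-step time.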

For trap instructions, we can't know in advance if they can cause a
trap, and there is no good reason to allow probing on those. Also,
uprobes already refuses to probe trap instructions and kprobes does not
allow probes on trap instructions used for kernel warnings and bugs. As
such, stop allowing any type of probe or breakpoint on trap instructions
across uprobes, kprobes and xmon.

For some of the fp/altivec instructions that can generate an interrupt
and which we emulate in the kernel (altivec assist, for example), we
check and turn off single stepping in emulate_single_step().

Instructions generating a DSI are restarted and single stepping normally
completes once the instruction is completed.

In uprobes, if a single stepped instruction results in a non-fatal
signal being delivered to the task, such signals are "delayed" until
after the instruction completes. For fatal signals, single stepping is
cancelled and the instruction restarted in-place so that the core dump
captures proper addresses.

In kprobes, we do not allow probes on instructions having an extable
entry and we also do not allow probing interrupt vectors.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/include/asm/ppc-opcode.h | 18 ++
 arch/powerpc/include/asm/probes.h | 36 +++
 arch/powerpc/kernel/kprobes.c |  4 +--
 arch/powerpc/kernel/uprobes.c |  5 
 arch/powerpc/xmon/xmon.c  | 11 
 5 files changed, 66 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index a5d89cd3e8d12d..683e9bc618a74d 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -130,6 +130,8 @@
 #define OP_PREFIX      1
 #define OP_TRAP_64     2
 #define OP_TRAP        3
+#define OP_SC          17
+#define OP_19          19
 #define OP_31          31
 #define OP_LWZ         32
 #define OP_LWZU        33
@@ -159,6 +161,20 @@
 #define OP_LD  58
 #define OP_STD 62
 
+#define OP_19_XOP_RFID         18
+#define OP_19_XOP_RFMCI        38
+#define OP_19_XOP_RFDI         39
+#define OP_19_XOP_RFI          50
+#define OP_19_XOP_RFCI         51
+#define OP_19_XOP_RFSCV        82
+#define OP_19_XOP_HRFID        274
+#define OP_19_XOP_URFID        306
+#define OP_19_XOP_STOP         370
+#define OP_19_XOP_DOZE         402
+#define OP_19_XOP_NAP          434
+#define OP_19_XOP_SLEEP        466
+#define OP_19_XOP_RVWINKLE     498
+
 #define OP_31_XOP_TRAP  4
 #define OP_31_XOP_LDX   21
 #define OP_31_XOP_LWZX  23
@@ -179,6 +195,8 @@
 #define OP_31_XOP_LHZUX 311
 #define OP_31_XOP_MSGSNDP   142
 #define OP_31_XOP_MSGCLRP   174
+#define OP_31_XOP_MTMSR     146
+#define OP_31_XOP_MTMSRD    178
 #define OP_31_XOP_TLBIE 306
 #define OP_31_XOP_MFSPR 339
 #define OP_31_XOP_LWAX  341
diff --git a/arch/powerpc/include/asm/probes.h b/arch/powerpc/include/asm/probes.h
index c5d984700d241a..6f66e358aa3780 100644
--- a/arch/powerpc/include/asm/probes.h
+++ b/arch/powerpc/include/asm/probes.h
@@ -8,6 +8,7 @@
  * Copyright IBM Corporation, 2012
  */
 #include 
+#include 
 
 typedef u32 ppc_opcode_t;
 #define BREAKPOINT_INSTRUCTION 0x7fe8  /* trap */
@@ -31,6 +32,41 @@ typedef u32 ppc_opcode_t;
 #define MSR_SINGLESTEP (MSR_SE)
 #endif
 
+static inline bool can_single_step(u32 inst)
+{
+   switch (get_op(inst)) {
+   case OP_TRAP_64:    return false;
+   case OP_TRAP:       return false;
+   case OP_SC:         return false;
+   case OP_19:
+   switch (get_xop(inst)) {
+   case OP_19_XOP_RFID:    return false;
+   case OP_19_XOP_RFMCI:   return false;
+   case OP_19_XOP_RFDI:    return false;
+   case OP_19_XOP_RFI:     return false;
+   case OP_19_XOP_RFCI:    return false;
+   case OP_19_XOP_RFSCV:   return false;
+   case OP_19_XOP_HRFID:   return false;
+   case OP_19_XOP_URFID:   return false;
+   case OP_19_XOP_STOP:    return false;
+   case OP_19_XOP_DOZE:    return false;
+   case 

[PATCH v2 1/3] powerpc: Sort and de-dup primary opcodes in ppc-opcode.h

2022-03-30 Thread Naveen N. Rao
Some of the primary opcodes are duplicated. Remove those, and sort the
rest of the primary opcodes to make them easier to read.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/include/asm/ppc-opcode.h | 69 ---
 1 file changed, 31 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 82f1f0041c6f79..a5d89cd3e8d12d 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -127,8 +127,37 @@
 
 
 /* opcode and xopcode for instructions */
-#define OP_TRAP 3
-#define OP_TRAP_64 2
+#define OP_PREFIX      1
+#define OP_TRAP_64     2
+#define OP_TRAP        3
+#define OP_31          31
+#define OP_LWZ         32
+#define OP_LWZU        33
+#define OP_LBZ         34
+#define OP_LBZU        35
+#define OP_STW         36
+#define OP_STWU        37
+#define OP_STB         38
+#define OP_STBU        39
+#define OP_LHZ         40
+#define OP_LHZU        41
+#define OP_LHA         42
+#define OP_LHAU        43
+#define OP_STH         44
+#define OP_STHU        45
+#define OP_LMW         46
+#define OP_STMW        47
+#define OP_LFS         48
+#define OP_LFSU        49
+#define OP_LFD         50
+#define OP_LFDU        51
+#define OP_STFS        52
+#define OP_STFSU       53
+#define OP_STFD        54
+#define OP_STFDU       55
+#define OP_LQ          56
+#define OP_LD          58
+#define OP_STD         62
 
 #define OP_31_XOP_TRAP  4
 #define OP_31_XOP_LDX   21
@@ -208,42 +237,6 @@
 /* VMX Vector Store Instructions */
 #define OP_31_XOP_STVX  231
 
-/* Prefixed Instructions */
-#define OP_PREFIX  1
-
-#define OP_31   31
-#define OP_LWZ  32
-#define OP_STFS 52
-#define OP_STFSU 53
-#define OP_STFD 54
-#define OP_STFDU 55
-#define OP_LD   58
-#define OP_LWZU 33
-#define OP_LBZ  34
-#define OP_LBZU 35
-#define OP_STW  36
-#define OP_STWU 37
-#define OP_STD  62
-#define OP_STB  38
-#define OP_STBU 39
-#define OP_LHZ  40
-#define OP_LHZU 41
-#define OP_LHA  42
-#define OP_LHAU 43
-#define OP_STH  44
-#define OP_STHU 45
-#define OP_LMW  46
-#define OP_STMW 47
-#define OP_LFS  48
-#define OP_LFSU 49
-#define OP_LFD  50
-#define OP_LFDU 51
-#define OP_STFS 52
-#define OP_STFSU 53
-#define OP_STFD  54
-#define OP_STFDU 55
-#define OP_LQ56
-
 /* sorted alphabetically */
 #define PPC_INST_BCCTR_FLUSH   0x4c400420
 #define PPC_INST_COPY  0x7c20060c
-- 
2.35.1



[PATCH v2 0/3] powerpc: Remove system call emulation

2022-03-30 Thread Naveen N. Rao
Since v1, the main changes are the use of helpers to decode primary/extended
opcodes and the addition of macros for some of the opcodes used.

- Naveen



Naveen N. Rao (2):
  powerpc: Sort and de-dup primary opcodes in ppc-opcode.h
  powerpc: Reject probes on instructions that can't be single stepped

Nicholas Piggin (1):
  powerpc/64: remove system call instruction emulation

 arch/powerpc/include/asm/ppc-opcode.h | 87 +++
 arch/powerpc/include/asm/probes.h | 36 +++
 arch/powerpc/kernel/interrupt_64.S| 10 ---
 arch/powerpc/kernel/kprobes.c |  4 +-
 arch/powerpc/kernel/uprobes.c |  5 ++
 arch/powerpc/lib/sstep.c  | 46 +++---
 arch/powerpc/xmon/xmon.c  | 11 ++--
 7 files changed, 107 insertions(+), 92 deletions(-)


base-commit: e8833c5edc5903f8c8c4fa3dd4f34d6b813c87c8
-- 
2.35.1



Re: [PATCH] powerpc/numa: Handle partially initialized numa nodes

2022-03-30 Thread Oscar Salvador
On Wed, Mar 30, 2022 at 07:21:23PM +0530, Srikar Dronamraju wrote:
> With commit 09f49dca570a ("mm: handle uninitialized numa nodes
> gracefully") NODE_DATA for even a memoryless/cpuless node is partially
> initialized at boot time.
> 
> Before onlining the node, the current powerpc code checks whether NODE_DATA
> is NULL. However, since NODE_DATA is now partially initialized, this check
> always evaluates to false.
> 
> This causes hotplugging a CPU to a memoryless/cpuless node to fail.
> 
> Before adding CPUs
> $ numactl -H
> available: 1 nodes (4)
> node 4 cpus: 0 1 2 3 4 5 6 7
> node 4 size: 97372 MB
> node 4 free: 95545 MB
> node distances:
> node   4
> 4:  10
> 
> $ lparstat
> System Configuration
> type=Dedicated mode=Capped smt=8 lcpu=1 mem=99709440 kB cpus=0 ent=1.00
> 
> %user  %sys %wait  %idle physc %entc lbusy   app    vcsw phint
> ----- ----- ----- ------ ----- ----- ----- ----- ------- -----
>  2.66  2.67  0.16  94.51  0.00  0.00  5.33  0.00   67749     0
> 
> After hotplugging 32 cores
> $ numactl -H
> node 4 cpus: 0 1 2 3 4 5 6 7 120 121 122 123 124 125 126 127 128 129 130
> 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148
> 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166
> 167 168 169 170 171 172 173 174 175
> node 4 size: 97372 MB
> node 4 free: 93636 MB
> node distances:
> node   4
> 4:  10
> 
> $ lparstat
> System Configuration
> type=Dedicated mode=Capped smt=8 lcpu=33 mem=99709440 kB cpus=0 ent=33.00
> 
> %user  %sys %wait  %idle physc %entc lbusy   app    vcsw phint
> ----- ----- ----- ------ ----- ----- ----- ----- ------- -----
>  0.04  0.02  0.00  99.94  0.00  0.00  0.06  0.00 1128751     3
> 
> As we can see numactl is listing only 8 cores while lparstat is showing
> 33 cores.
> 
> Also dmesg is showing messages like:
> [ 2261.318350 ] BUG: arch topology borken
> [ 2261.318357 ]  the DIE domain not a subset of the NODE domain
> 
> Fixes: 09f49dca570a ("mm: handle uninitialized numa nodes gracefully")
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux...@kvack.org
> Cc: Michal Hocko 
> Cc: Michael Ellerman 
> Reported-by: Geetika Moolchandani 
> Signed-off-by: Srikar Dronamraju 

Acked-by: Oscar Salvador 

Thanks Srikar!

> ---
>  arch/powerpc/mm/numa.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index b9b7fefbb64b..13022d734951 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -1436,7 +1436,7 @@ int find_and_online_cpu_nid(int cpu)
>   if (new_nid < 0 || !node_possible(new_nid))
>   new_nid = first_online_node;
>  
> - if (NODE_DATA(new_nid) == NULL) {
> + if (!node_online(new_nid)) {
>  #ifdef CONFIG_MEMORY_HOTPLUG
>   /*
>* Need to ensure that NODE_DATA is initialized for a node from
> -- 
> 2.27.0
> 
> 

-- 
Oscar Salvador
SUSE Labs


[PATCH] powerpc/numa: Handle partially initialized numa nodes

2022-03-30 Thread Srikar Dronamraju
With commit 09f49dca570a ("mm: handle uninitialized numa nodes
gracefully") NODE_DATA for even a memoryless/cpuless node is partially
initialized at boot time.

Before onlining the node, the current powerpc code checks whether NODE_DATA
is NULL. However, since NODE_DATA is now partially initialized, this check
always evaluates to false.

This causes hotplugging a CPU to a memoryless/cpuless node to fail.
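
To make the failure mode concrete, here is a small user-space model of the
two checks. It is only a sketch: MAX_NODES, the struct and the helper names
are hypothetical stand-ins for NODE_DATA() and node_online(), and the boot
node number is taken from the example output below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 8	/* hypothetical limit for the model */

struct pgdat_model { int node_id; };

static struct pgdat_model node_storage[MAX_NODES];
static struct pgdat_model *node_data[MAX_NODES];	/* models NODE_DATA(nid) */
static bool node_online_map[MAX_NODES];			/* models node_online(nid) */

/*
 * Models boot after commit 09f49dca570a: every possible node gets a
 * (partially initialized) NODE_DATA, but only the boot node is online.
 */
static void boot_init_model(int boot_node)
{
	for (int nid = 0; nid < MAX_NODES; nid++) {
		node_storage[nid].node_id = nid;
		node_data[nid] = &node_storage[nid];
		node_online_map[nid] = (nid == boot_node);
	}
}

/* Old check: never true once NODE_DATA is always allocated. */
static bool needs_online_old(int nid) { return node_data[nid] == NULL; }

/* New check: asks whether the node is online, which is what matters. */
static bool needs_online_new(int nid) { return !node_online_map[nid]; }

static void demo_numa_check(void)
{
	boot_init_model(4);
	assert(!needs_online_old(5));	/* offline node is missed */
	assert(needs_online_new(5));	/* offline node is detected */
	assert(!needs_online_new(4));	/* boot node is already online */
}
```

With the old check the onlining path is skipped for node 5, which matches
the symptom described here.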

Before adding CPUs
$ numactl -H
available: 1 nodes (4)
node 4 cpus: 0 1 2 3 4 5 6 7
node 4 size: 97372 MB
node 4 free: 95545 MB
node distances:
node   4
4:  10

$ lparstat
System Configuration
type=Dedicated mode=Capped smt=8 lcpu=1 mem=99709440 kB cpus=0 ent=1.00

%user  %sys %wait  %idle physc %entc lbusy   app    vcsw phint
----- ----- ----- ------ ----- ----- ----- ----- ------- -----
 2.66  2.67  0.16  94.51  0.00  0.00  5.33  0.00   67749     0

After hotplugging 32 cores
$ numactl -H
node 4 cpus: 0 1 2 3 4 5 6 7 120 121 122 123 124 125 126 127 128 129 130
131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148
149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166
167 168 169 170 171 172 173 174 175
node 4 size: 97372 MB
node 4 free: 93636 MB
node distances:
node   4
4:  10

$ lparstat
System Configuration
type=Dedicated mode=Capped smt=8 lcpu=33 mem=99709440 kB cpus=0 ent=33.00

%user  %sys %wait  %idle physc %entc lbusy   app    vcsw phint
----- ----- ----- ------ ----- ----- ----- ----- ------- -----
 0.04  0.02  0.00  99.94  0.00  0.00  0.06  0.00 1128751     3

As we can see numactl is listing only 8 cores while lparstat is showing
33 cores.

Also dmesg is showing messages like:
[ 2261.318350 ] BUG: arch topology borken
[ 2261.318357 ]  the DIE domain not a subset of the NODE domain

Fixes: 09f49dca570a ("mm: handle uninitialized numa nodes gracefully")
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org
Cc: Michal Hocko 
Cc: Michael Ellerman 
Reported-by: Geetika Moolchandani 
Signed-off-by: Srikar Dronamraju 
---
 arch/powerpc/mm/numa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index b9b7fefbb64b..13022d734951 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1436,7 +1436,7 @@ int find_and_online_cpu_nid(int cpu)
if (new_nid < 0 || !node_possible(new_nid))
new_nid = first_online_node;
 
-   if (NODE_DATA(new_nid) == NULL) {
+   if (!node_online(new_nid)) {
 #ifdef CONFIG_MEMORY_HOTPLUG
/*
 * Need to ensure that NODE_DATA is initialized for a node from
-- 
2.27.0



Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.18-1 tag

2022-03-30 Thread Michael Ellerman
Michal Suchánek  writes:
> On Mon, Mar 28, 2022 at 08:07:13PM +1100, Michael Ellerman wrote:
>> Linus Torvalds  writes:
>> > On Fri, Mar 25, 2022 at 3:25 AM Michael Ellerman  
>> > wrote:
>> 
>> > That said:
>> >
>> >> There's a series of commits cleaning up function descriptor handling,
>> >
>> > For some reason I also thought that powerpc had actually moved away
>> > from function descriptors, so I'm clearly not keeping up with the
>> > times.
>> 
>> No you're right, we have moved away from them, but not entirely.
>> 
>> Function descriptors are still used for 64-bit big endian, but they're
>> not used for 64-bit little endian, or 32-bit.
>
> There was a patch to use ABIv2 for ppc64 big endian. I suppose that
> would rid us of the function descriptors for good.

It would be nice.

The hesitation in the past was that the GNU toolchain developers don't
officially support BE+ELFv2, though it is in use so it does work.

> Maybe it's worth resurrecting?

Yeah maybe we should think about it again. If it builds with clang/lld
that would be a real plus.

cheers


Re: [PATCH 1/2] powerpc: Reject probes on instructions that can't be single stepped

2022-03-30 Thread Naveen N. Rao

Christophe Leroy wrote:



On 28/03/2022 at 19:20, Naveen N. Rao wrote:

Michael Ellerman wrote:

Murilo Opsfelder Araújo  writes:

On 3/23/22 08:51, Naveen N. Rao wrote:

+static inline bool can_single_step(u32 inst)
+{
+    switch (inst >> 26) {


Can't ppc_inst_primary_opcode() be used instead?


I didn't want to add a dependency on inst.h. But I guess I can very well 
move this out of the header into some .c file. I will see if I can make 
that work.


Maybe use get_op() from asm/disassemble.h ?




+    case 31:
+    switch ((inst >> 1) & 0x3ff) {


For that one you have get_xop() in asm/disassemble.h


Nice! I will use those.




+    case 4:    /* tw */


OP_31_XOP_TRAP


+    return false;
+    case 68:    /* td */


OP_31_XOP_TRAP_64


+    return false;
+    case 146:    /* mtmsr */
+    return false;
+    case 178:    /* mtmsrd */
+    return false;
+    }
+    break;
+    }
+    return true;
+}
+


Can't OP_* definitions from ppc-opcode.h be used for all of these 
switch-case statements?


Yes please. And add any that are missing.


We only have OP_31 from the above list now. I'll add the rest.


Isn't there also OP_TRAP and OP_TRAP_64 ?


Ah, the list clearly isn't sorted, and there are some duplicates 
there :)



Thanks,
Naveen



Re: [PATCH] powerpc/rtas: Keep MSR RI set when calling RTAS

2022-03-30 Thread Laurent Dufour
On 29/03/2022, 13:14:10, Michael Ellerman wrote:
> Laurent Dufour  writes:
>> On 29/03/2022, 10:31:33, Nicholas Piggin wrote:
>>> Excerpts from Laurent Dufour's message of March 17, 2022 9:06 pm:
 RTAS runs in real mode (MSR[DR] and MSR[IR] unset) and in 32-bit
 mode (MSR[SF] unset).

 The change in MSR is done in enter_rtas() in a relatively complex way,
 since the MSR value could be hardcoded.

 Furthermore, a panic has been reported when hitting the watchdog interrupt
 while running in RTAS. This leads to the following stack trace:

 [69244.027433][   C24] watchdog: CPU 24 Hard LOCKUP
 [69244.027442][   C24] watchdog: CPU 24 TB:997512652051031, last heartbeat 
 TB:997504470175378 (15980ms ago)
 [69244.027451][   C24] Modules linked in: chacha_generic(E) libchacha(E) 
 xxhash_generic(E) wp512(E) sha3_generic(E) rmd160(E) poly1305_generic(E) 
 libpoly1305(E) michael_mic(E) md4(E) crc32_generic(E) cmac(E) ccm(E) 
 algif_rng(E) twofish_generic(E) twofish_common(E) serpent_generic(E) 
 fcrypt(E) des_generic(E) libdes(E) cast6_generic(E) cast5_generic(E) 
 cast_common(E) camellia_generic(E) blowfish_generic(E) blowfish_common(E) 
 algif_skcipher(E) algif_hash(E) gcm(E) algif_aead(E) af_alg(E) tun(E) 
 rpcsec_gss_krb5(E) auth_rpcgss(E)
 nfsv4(E) dns_resolver(E) rpadlpar_io(EX) rpaphp(EX) xsk_diag(E) 
 tcp_diag(E) udp_diag(E) raw_diag(E) inet_diag(E) unix_diag(E) 
 af_packet_diag(E) netlink_diag(E) nfsv3(E) nfs_acl(E) nfs(E) lockd(E) 
 grace(E) sunrpc(E) fscache(E) netfs(E) af_packet(E) rfkill(E) bonding(E) 
 tls(E) ibmveth(EX) crct10dif_vpmsum(E) rtc_generic(E) drm(E) 
 drm_panel_orientation_quirks(E) fuse(E) configfs(E) backlight(E) 
 ip_tables(E) x_tables(E) dm_service_time(E) sd_mod(E) t10_pi(E)
 [69244.027555][   C24]  ibmvfc(EX) scsi_transport_fc(E) vmx_crypto(E) 
 gf128mul(E) btrfs(E) blake2b_generic(E) libcrc32c(E) crc32c_vpmsum(E) 
 xor(E) raid6_pq(E) dm_mirror(E) dm_region_hash(E) dm_log(E) sg(E) 
 dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) 
 scsi_mod(E)
 [69244.027587][   C24] Supported: No, Unreleased kernel
 [69244.027600][   C24] CPU: 24 PID: 87504 Comm: drmgr Kdump: loaded 
 Tainted: GE  X5.14.21-150400.71.1.bz196362_2-default #1 
 SLE15-SP4 (unreleased) 0d821077ef4faa8dfaf370efb5fdca1fa35f4e2c
 [69244.027609][   C24] NIP:  1fb41050 LR: 1fb4104c CTR: 
 
 [69244.027612][   C24] REGS: cfc33d60 TRAP: 0100   Tainted: G  
   E  X (5.14.21-150400.71.1.bz196362_2-default)
 [69244.027615][   C24] MSR:  82981000   CR: 
 4882  XER: 20040020
 [69244.027625][   C24] CFAR: 011c IRQMASK: 1
 [69244.027625][   C24] GPR00: 0003  
 0001 50dc
 [69244.027625][   C24] GPR04: 1ffb6100 0020 
 0001 1fb09010
 [69244.027625][   C24] GPR08: 2000  
  
 [69244.027625][   C24] GPR12: 8004072a40a8 cff8b680 
 0007 0034
 [69244.027625][   C24] GPR16: 1fbf6e94 1fbf6d84 
 1fbd1db0 1fb3f008
 [69244.027625][   C24] GPR20: 1fb41018  
 017f f68f
 [69244.027625][   C24] GPR24: 1fb18fe8 1fb3e000 
 1fb1adc0 1fb1cf40
 [69244.027625][   C24] GPR28: 1fb26000 1fb460f0 
 1fb17f18 1fb17000
 [69244.027663][   C24] NIP [1fb41050] 0x1fb41050
 [69244.027696][   C24] LR [1fb4104c] 0x1fb4104c
 [69244.027699][   C24] Call Trace:
 [69244.027701][   C24] Instruction dump:
 [69244.027723][   C24]      
   
 [69244.027728][   C24]      
   
 [69244.027762][T87504] Oops: Unrecoverable System Reset, sig: 6 [#1]
 [69244.028044][T87504] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA 
 pSeries
 [69244.028089][T87504] Modules linked in: chacha_generic(E) libchacha(E) 
 xxhash_generic(E) wp512(E) sha3_generic(E) rmd160(E) poly1305_generic(E) 
 libpoly1305(E) michael_mic(E) md4(E) crc32_generic(E) cmac(E) ccm(E) 
 algif_rng(E) twofish_generic(E) twofish_common(E) serpent_generic(E) 
 fcrypt(E) des_generic(E) libdes(E) cast6_generic(E) cast5_generic(E) 
 cast_common(E) camellia_generic(E) blowfish_generic(E) blowfish_common(E) 
 algif_skcipher(E) algif_hash(E) gcm(E) algif_aead(E) af_alg(E) tun(E) 
 rpcsec_gss_krb5(E) auth_rpcgss(E)
 nfsv4(E) dns_resolver(E) rpadlpar_io(EX) rpaphp(EX) xsk_diag(E) 
 tcp_diag(E) udp_diag(E) raw_diag(E) inet_diag(E) unix_diag(E) 

[PATCH AUTOSEL 5.10 26/37] uaccess: fix type mismatch warnings from access_ok()

2022-03-30 Thread Sasha Levin
From: Arnd Bergmann 

[ Upstream commit 23fc539e81295b14b50c6ccc5baeb4f3d59d822d ]

On some architectures, access_ok() does not do any argument type
checking, so replacing the definition with a generic one causes
a few warnings for harmless issues that were never caught before.

Fix the ones that I found either through my own test builds or
that were reported by the 0-day bot.
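
For readers unfamiliar with the warning class, the following user-space
sketch models a typed access_ok(). It assumes the generic check is an inline
function taking a __user pointer and comparing the range against an address
limit. MODEL_TASK_SIZE_MAX and access_ok_model() are hypothetical names, and
__user is defined as an empty macro here because the sparse annotation has
no meaning outside the kernel build.

```c
#include <assert.h>
#include <stdbool.h>

#ifndef __user
#define __user	/* sparse address-space annotation; empty in this model */
#endif

#define MODEL_TASK_SIZE_MAX 0x80000000UL	/* hypothetical user address limit */

/*
 * Because this is a function with a pointer parameter rather than an
 * untyped macro, callers holding an integer address need an explicit cast.
 */
static inline bool access_ok_model(const void __user *ptr, unsigned long size)
{
	unsigned long limit = MODEL_TASK_SIZE_MAX;
	unsigned long addr = (unsigned long)ptr;

	/* Written so that addr + size cannot overflow. */
	return size <= limit && addr <= limit - size;
}

static void demo_access_ok(void)
{
	unsigned long start = 0x1000, end = 0x2000;

	/* Same shape as the do_cache_op() change: cast the integer address. */
	assert(access_ok_model((void __user *)start, end - start));
	assert(!access_ok_model((void __user *)MODEL_TASK_SIZE_MAX, 1));
	assert(!access_ok_model((void __user *)start, ~0UL));
}
```

The hunks below follow the same two patterns: add the missing __user
annotation to a declaration, or cast an integer address at the call site.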

Reported-by: kernel test robot 
Reviewed-by: Christoph Hellwig 
Acked-by: Dinh Nguyen 
Signed-off-by: Arnd Bergmann 
Signed-off-by: Sasha Levin 
---
 arch/arc/kernel/process.c  |  2 +-
 arch/arm/kernel/swp_emulate.c  |  2 +-
 arch/arm/kernel/traps.c|  2 +-
 arch/csky/kernel/perf_callchain.c  |  2 +-
 arch/csky/kernel/signal.c  |  2 +-
 arch/nios2/kernel/signal.c | 20 +++-
 arch/powerpc/lib/sstep.c   |  4 ++--
 arch/riscv/kernel/perf_callchain.c |  4 ++--
 arch/sparc/kernel/signal_32.c  |  2 +-
 lib/test_lockup.c  |  4 ++--
 10 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c
index 37f724ad5e39..a85e9c625ab5 100644
--- a/arch/arc/kernel/process.c
+++ b/arch/arc/kernel/process.c
@@ -43,7 +43,7 @@ SYSCALL_DEFINE0(arc_gettls)
return task_thread_info(current)->thr_ptr;
 }
 
-SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new)
+SYSCALL_DEFINE3(arc_usr_cmpxchg, int __user *, uaddr, int, expected, int, new)
 {
struct pt_regs *regs = current_pt_regs();
u32 uval;
diff --git a/arch/arm/kernel/swp_emulate.c b/arch/arm/kernel/swp_emulate.c
index 6166ba38bf99..b74bfcf94fb1 100644
--- a/arch/arm/kernel/swp_emulate.c
+++ b/arch/arm/kernel/swp_emulate.c
@@ -195,7 +195,7 @@ static int swp_handler(struct pt_regs *regs, unsigned int instr)
 destreg, EXTRACT_REG_NUM(instr, RT2_OFFSET), data);
 
/* Check access in reasonable access range for both SWP and SWPB */
-   if (!access_ok((address & ~3), 4)) {
+   if (!access_ok((void __user *)(address & ~3), 4)) {
pr_debug("SWP{B} emulation: access to %p not allowed!\n",
 (void *)address);
res = -EFAULT;
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 2d9e72ad1b0f..a531afad87fd 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -589,7 +589,7 @@ do_cache_op(unsigned long start, unsigned long end, int flags)
if (end < start || flags)
return -EINVAL;
 
-   if (!access_ok(start, end - start))
+   if (!access_ok((void __user *)start, end - start))
return -EFAULT;
 
return __do_cache_op(start, end);
diff --git a/arch/csky/kernel/perf_callchain.c b/arch/csky/kernel/perf_callchain.c
index 35318a635a5f..75e1f9df5f60 100644
--- a/arch/csky/kernel/perf_callchain.c
+++ b/arch/csky/kernel/perf_callchain.c
@@ -49,7 +49,7 @@ static unsigned long user_backtrace(struct perf_callchain_entry_ctx *entry,
 {
struct stackframe buftail;
unsigned long lr = 0;
-   unsigned long *user_frame_tail = (unsigned long *)fp;
+   unsigned long __user *user_frame_tail = (unsigned long __user *)fp;
 
/* Check accessibility of one struct frame_tail beyond */
if (!access_ok(user_frame_tail, sizeof(buftail)))
diff --git a/arch/csky/kernel/signal.c b/arch/csky/kernel/signal.c
index 0ca49b5e3dd3..243228b0aa07 100644
--- a/arch/csky/kernel/signal.c
+++ b/arch/csky/kernel/signal.c
@@ -136,7 +136,7 @@ static inline void __user *get_sigframe(struct ksignal *ksig,
 static int
 setup_rt_frame(struct ksignal *ksig, sigset_t *set, struct pt_regs *regs)
 {
-   struct rt_sigframe *frame;
+   struct rt_sigframe __user *frame;
int err = 0;
struct csky_vdso *vdso = current->mm->context.vdso;
 
diff --git a/arch/nios2/kernel/signal.c b/arch/nios2/kernel/signal.c
index cf2dca2ac7c3..e45491d1d3e4 100644
--- a/arch/nios2/kernel/signal.c
+++ b/arch/nios2/kernel/signal.c
@@ -36,10 +36,10 @@ struct rt_sigframe {
 
 static inline int rt_restore_ucontext(struct pt_regs *regs,
struct switch_stack *sw,
-   struct ucontext *uc, int *pr2)
+   struct ucontext __user *uc, int *pr2)
 {
int temp;
-   unsigned long *gregs = uc->uc_mcontext.gregs;
+   unsigned long __user *gregs = uc->uc_mcontext.gregs;
int err;
 
/* Always make any pending restarted system calls return -EINTR */
@@ -102,10 +102,11 @@ asmlinkage int do_rt_sigreturn(struct switch_stack *sw)
 {
struct pt_regs *regs = (struct pt_regs *)(sw + 1);
/* Verify, can we follow the stack back */
-   struct rt_sigframe *frame = (struct rt_sigframe *) regs->sp;
+   struct rt_sigframe __user *frame;
sigset_t set;
int rval;
 
+   frame = (struct rt_sigframe __user *) regs->sp;
if 

[PATCH AUTOSEL 5.15 36/50] uaccess: fix type mismatch warnings from access_ok()

2022-03-30 Thread Sasha Levin
From: Arnd Bergmann 

[ Upstream commit 23fc539e81295b14b50c6ccc5baeb4f3d59d822d ]

On some architectures, access_ok() does not do any argument type
checking, so replacing the definition with a generic one causes
a few warnings for harmless issues that were never caught before.

Fix the ones that I found either through my own test builds or
that were reported by the 0-day bot.

Reported-by: kernel test robot 
Reviewed-by: Christoph Hellwig 
Acked-by: Dinh Nguyen 
Signed-off-by: Arnd Bergmann 
Signed-off-by: Sasha Levin 
---
 arch/arc/kernel/process.c  |  2 +-
 arch/arm/kernel/swp_emulate.c  |  2 +-
 arch/arm/kernel/traps.c|  2 +-
 arch/csky/kernel/perf_callchain.c  |  2 +-
 arch/csky/kernel/signal.c  |  2 +-
 arch/nios2/kernel/signal.c | 20 +++-
 arch/powerpc/lib/sstep.c   |  4 ++--
 arch/riscv/kernel/perf_callchain.c |  4 ++--
 arch/sparc/kernel/signal_32.c  |  2 +-
 lib/test_lockup.c  |  4 ++--
 10 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c
index 8e90052f6f05..5f7f5aab361f 100644
--- a/arch/arc/kernel/process.c
+++ b/arch/arc/kernel/process.c
@@ -43,7 +43,7 @@ SYSCALL_DEFINE0(arc_gettls)
return task_thread_info(current)->thr_ptr;
 }
 
-SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new)
+SYSCALL_DEFINE3(arc_usr_cmpxchg, int __user *, uaddr, int, expected, int, new)
 {
struct pt_regs *regs = current_pt_regs();
u32 uval;
diff --git a/arch/arm/kernel/swp_emulate.c b/arch/arm/kernel/swp_emulate.c
index 6166ba38bf99..b74bfcf94fb1 100644
--- a/arch/arm/kernel/swp_emulate.c
+++ b/arch/arm/kernel/swp_emulate.c
@@ -195,7 +195,7 @@ static int swp_handler(struct pt_regs *regs, unsigned int instr)
 destreg, EXTRACT_REG_NUM(instr, RT2_OFFSET), data);
 
/* Check access in reasonable access range for both SWP and SWPB */
-   if (!access_ok((address & ~3), 4)) {
+   if (!access_ok((void __user *)(address & ~3), 4)) {
pr_debug("SWP{B} emulation: access to %p not allowed!\n",
 (void *)address);
res = -EFAULT;
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 655c4fe0b4d0..54abd8720dde 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -575,7 +575,7 @@ do_cache_op(unsigned long start, unsigned long end, int flags)
if (end < start || flags)
return -EINVAL;
 
-   if (!access_ok(start, end - start))
+   if (!access_ok((void __user *)start, end - start))
return -EFAULT;
 
return __do_cache_op(start, end);
diff --git a/arch/csky/kernel/perf_callchain.c b/arch/csky/kernel/perf_callchain.c
index 35318a635a5f..75e1f9df5f60 100644
--- a/arch/csky/kernel/perf_callchain.c
+++ b/arch/csky/kernel/perf_callchain.c
@@ -49,7 +49,7 @@ static unsigned long user_backtrace(struct perf_callchain_entry_ctx *entry,
 {
struct stackframe buftail;
unsigned long lr = 0;
-   unsigned long *user_frame_tail = (unsigned long *)fp;
+   unsigned long __user *user_frame_tail = (unsigned long __user *)fp;
 
/* Check accessibility of one struct frame_tail beyond */
if (!access_ok(user_frame_tail, sizeof(buftail)))
diff --git a/arch/csky/kernel/signal.c b/arch/csky/kernel/signal.c
index c7b763d2f526..8867ddf3e6c7 100644
--- a/arch/csky/kernel/signal.c
+++ b/arch/csky/kernel/signal.c
@@ -136,7 +136,7 @@ static inline void __user *get_sigframe(struct ksignal *ksig,
 static int
 setup_rt_frame(struct ksignal *ksig, sigset_t *set, struct pt_regs *regs)
 {
-   struct rt_sigframe *frame;
+   struct rt_sigframe __user *frame;
int err = 0;
 
frame = get_sigframe(ksig, regs, sizeof(*frame));
diff --git a/arch/nios2/kernel/signal.c b/arch/nios2/kernel/signal.c
index 2009ae2d3c3b..386e46443b60 100644
--- a/arch/nios2/kernel/signal.c
+++ b/arch/nios2/kernel/signal.c
@@ -36,10 +36,10 @@ struct rt_sigframe {
 
 static inline int rt_restore_ucontext(struct pt_regs *regs,
struct switch_stack *sw,
-   struct ucontext *uc, int *pr2)
+   struct ucontext __user *uc, int *pr2)
 {
int temp;
-   unsigned long *gregs = uc->uc_mcontext.gregs;
+   unsigned long __user *gregs = uc->uc_mcontext.gregs;
int err;
 
/* Always make any pending restarted system calls return -EINTR */
@@ -102,10 +102,11 @@ asmlinkage int do_rt_sigreturn(struct switch_stack *sw)
 {
struct pt_regs *regs = (struct pt_regs *)(sw + 1);
/* Verify, can we follow the stack back */
-   struct rt_sigframe *frame = (struct rt_sigframe *) regs->sp;
+   struct rt_sigframe __user *frame;
sigset_t set;
int rval;
 
+   frame = (struct rt_sigframe __user *) regs->sp;
if (!access_ok(frame, 

[PATCH AUTOSEL 5.16 40/59] uaccess: fix type mismatch warnings from access_ok()

2022-03-30 Thread Sasha Levin
From: Arnd Bergmann 

[ Upstream commit 23fc539e81295b14b50c6ccc5baeb4f3d59d822d ]

On some architectures, access_ok() does not do any argument type
checking, so replacing the definition with a generic one causes
a few warnings for harmless issues that were never caught before.

Fix the ones that I found either through my own test builds or
that were reported by the 0-day bot.

Reported-by: kernel test robot 
Reviewed-by: Christoph Hellwig 
Acked-by: Dinh Nguyen 
Signed-off-by: Arnd Bergmann 
Signed-off-by: Sasha Levin 
---
 arch/arc/kernel/process.c  |  2 +-
 arch/arm/kernel/swp_emulate.c  |  2 +-
 arch/arm/kernel/traps.c|  2 +-
 arch/csky/kernel/perf_callchain.c  |  2 +-
 arch/csky/kernel/signal.c  |  2 +-
 arch/nios2/kernel/signal.c | 20 +++-
 arch/powerpc/lib/sstep.c   |  4 ++--
 arch/riscv/kernel/perf_callchain.c |  4 ++--
 arch/sparc/kernel/signal_32.c  |  2 +-
 lib/test_lockup.c  |  4 ++--
 10 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c
index 8e90052f6f05..5f7f5aab361f 100644
--- a/arch/arc/kernel/process.c
+++ b/arch/arc/kernel/process.c
@@ -43,7 +43,7 @@ SYSCALL_DEFINE0(arc_gettls)
return task_thread_info(current)->thr_ptr;
 }
 
-SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new)
+SYSCALL_DEFINE3(arc_usr_cmpxchg, int __user *, uaddr, int, expected, int, new)
 {
struct pt_regs *regs = current_pt_regs();
u32 uval;
diff --git a/arch/arm/kernel/swp_emulate.c b/arch/arm/kernel/swp_emulate.c
index 6166ba38bf99..b74bfcf94fb1 100644
--- a/arch/arm/kernel/swp_emulate.c
+++ b/arch/arm/kernel/swp_emulate.c
@@ -195,7 +195,7 @@ static int swp_handler(struct pt_regs *regs, unsigned int instr)
 destreg, EXTRACT_REG_NUM(instr, RT2_OFFSET), data);
 
/* Check access in reasonable access range for both SWP and SWPB */
-   if (!access_ok((address & ~3), 4)) {
+   if (!access_ok((void __user *)(address & ~3), 4)) {
pr_debug("SWP{B} emulation: access to %p not allowed!\n",
 (void *)address);
res = -EFAULT;
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 90c887aa67a4..f74460d3bef5 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -575,7 +575,7 @@ do_cache_op(unsigned long start, unsigned long end, int flags)
if (end < start || flags)
return -EINVAL;
 
-   if (!access_ok(start, end - start))
+   if (!access_ok((void __user *)start, end - start))
return -EFAULT;
 
return __do_cache_op(start, end);
diff --git a/arch/csky/kernel/perf_callchain.c b/arch/csky/kernel/perf_callchain.c
index 35318a635a5f..75e1f9df5f60 100644
--- a/arch/csky/kernel/perf_callchain.c
+++ b/arch/csky/kernel/perf_callchain.c
@@ -49,7 +49,7 @@ static unsigned long user_backtrace(struct perf_callchain_entry_ctx *entry,
 {
struct stackframe buftail;
unsigned long lr = 0;
-   unsigned long *user_frame_tail = (unsigned long *)fp;
+   unsigned long __user *user_frame_tail = (unsigned long __user *)fp;
 
/* Check accessibility of one struct frame_tail beyond */
if (!access_ok(user_frame_tail, sizeof(buftail)))
diff --git a/arch/csky/kernel/signal.c b/arch/csky/kernel/signal.c
index c7b763d2f526..8867ddf3e6c7 100644
--- a/arch/csky/kernel/signal.c
+++ b/arch/csky/kernel/signal.c
@@ -136,7 +136,7 @@ static inline void __user *get_sigframe(struct ksignal *ksig,
 static int
 setup_rt_frame(struct ksignal *ksig, sigset_t *set, struct pt_regs *regs)
 {
-   struct rt_sigframe *frame;
+   struct rt_sigframe __user *frame;
int err = 0;
 
frame = get_sigframe(ksig, regs, sizeof(*frame));
diff --git a/arch/nios2/kernel/signal.c b/arch/nios2/kernel/signal.c
index 2009ae2d3c3b..386e46443b60 100644
--- a/arch/nios2/kernel/signal.c
+++ b/arch/nios2/kernel/signal.c
@@ -36,10 +36,10 @@ struct rt_sigframe {
 
 static inline int rt_restore_ucontext(struct pt_regs *regs,
struct switch_stack *sw,
-   struct ucontext *uc, int *pr2)
+   struct ucontext __user *uc, int *pr2)
 {
int temp;
-   unsigned long *gregs = uc->uc_mcontext.gregs;
+   unsigned long __user *gregs = uc->uc_mcontext.gregs;
int err;
 
/* Always make any pending restarted system calls return -EINTR */
@@ -102,10 +102,11 @@ asmlinkage int do_rt_sigreturn(struct switch_stack *sw)
 {
struct pt_regs *regs = (struct pt_regs *)(sw + 1);
/* Verify, can we follow the stack back */
-   struct rt_sigframe *frame = (struct rt_sigframe *) regs->sp;
+   struct rt_sigframe __user *frame;
sigset_t set;
int rval;
 
+   frame = (struct rt_sigframe __user *) regs->sp;
if (!access_ok(frame, 

[PATCH AUTOSEL 5.17 41/66] uaccess: fix type mismatch warnings from access_ok()

2022-03-30 Thread Sasha Levin
From: Arnd Bergmann 

[ Upstream commit 23fc539e81295b14b50c6ccc5baeb4f3d59d822d ]

On some architectures, access_ok() does not do any argument type
checking, so replacing the definition with a generic one causes
a few warnings for harmless issues that were never caught before.

Fix the ones that I found either through my own test builds or
that were reported by the 0-day bot.

Reported-by: kernel test robot 
Reviewed-by: Christoph Hellwig 
Acked-by: Dinh Nguyen 
Signed-off-by: Arnd Bergmann 
Signed-off-by: Sasha Levin 
---
 arch/arc/kernel/process.c          |  2 +-
 arch/arm/kernel/swp_emulate.c      |  2 +-
 arch/arm/kernel/traps.c            |  2 +-
 arch/csky/kernel/perf_callchain.c  |  2 +-
 arch/csky/kernel/signal.c          |  2 +-
 arch/nios2/kernel/signal.c         | 20 +++-
 arch/powerpc/lib/sstep.c           |  4 ++--
 arch/riscv/kernel/perf_callchain.c |  4 ++--
 arch/sparc/kernel/signal_32.c      |  2 +-
 lib/test_lockup.c                  |  4 ++--
 10 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c
index 8e90052f6f05..5f7f5aab361f 100644
--- a/arch/arc/kernel/process.c
+++ b/arch/arc/kernel/process.c
@@ -43,7 +43,7 @@ SYSCALL_DEFINE0(arc_gettls)
return task_thread_info(current)->thr_ptr;
 }
 
-SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new)
+SYSCALL_DEFINE3(arc_usr_cmpxchg, int __user *, uaddr, int, expected, int, new)
 {
struct pt_regs *regs = current_pt_regs();
u32 uval;
diff --git a/arch/arm/kernel/swp_emulate.c b/arch/arm/kernel/swp_emulate.c
index 6166ba38bf99..b74bfcf94fb1 100644
--- a/arch/arm/kernel/swp_emulate.c
+++ b/arch/arm/kernel/swp_emulate.c
@@ -195,7 +195,7 @@ static int swp_handler(struct pt_regs *regs, unsigned int instr)
 destreg, EXTRACT_REG_NUM(instr, RT2_OFFSET), data);
 
/* Check access in reasonable access range for both SWP and SWPB */
-   if (!access_ok((address & ~3), 4)) {
+   if (!access_ok((void __user *)(address & ~3), 4)) {
pr_debug("SWP{B} emulation: access to %p not allowed!\n",
 (void *)address);
res = -EFAULT;
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index cae4a748811f..5d58aee24087 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -577,7 +577,7 @@ do_cache_op(unsigned long start, unsigned long end, int flags)
if (end < start || flags)
return -EINVAL;
 
-   if (!access_ok(start, end - start))
+   if (!access_ok((void __user *)start, end - start))
return -EFAULT;
 
return __do_cache_op(start, end);
diff --git a/arch/csky/kernel/perf_callchain.c b/arch/csky/kernel/perf_callchain.c
index 92057de08f4f..1612f4354087 100644
--- a/arch/csky/kernel/perf_callchain.c
+++ b/arch/csky/kernel/perf_callchain.c
@@ -49,7 +49,7 @@ static unsigned long user_backtrace(struct perf_callchain_entry_ctx *entry,
 {
struct stackframe buftail;
unsigned long lr = 0;
-   unsigned long *user_frame_tail = (unsigned long *)fp;
+   unsigned long __user *user_frame_tail = (unsigned long __user *)fp;
 
/* Check accessibility of one struct frame_tail beyond */
if (!access_ok(user_frame_tail, sizeof(buftail)))
diff --git a/arch/csky/kernel/signal.c b/arch/csky/kernel/signal.c
index c7b763d2f526..8867ddf3e6c7 100644
--- a/arch/csky/kernel/signal.c
+++ b/arch/csky/kernel/signal.c
@@ -136,7 +136,7 @@ static inline void __user *get_sigframe(struct ksignal *ksig,
 static int
 setup_rt_frame(struct ksignal *ksig, sigset_t *set, struct pt_regs *regs)
 {
-   struct rt_sigframe *frame;
+   struct rt_sigframe __user *frame;
int err = 0;
 
frame = get_sigframe(ksig, regs, sizeof(*frame));
diff --git a/arch/nios2/kernel/signal.c b/arch/nios2/kernel/signal.c
index 2009ae2d3c3b..386e46443b60 100644
--- a/arch/nios2/kernel/signal.c
+++ b/arch/nios2/kernel/signal.c
@@ -36,10 +36,10 @@ struct rt_sigframe {
 
 static inline int rt_restore_ucontext(struct pt_regs *regs,
struct switch_stack *sw,
-   struct ucontext *uc, int *pr2)
+   struct ucontext __user *uc, int *pr2)
 {
int temp;
-   unsigned long *gregs = uc->uc_mcontext.gregs;
+   unsigned long __user *gregs = uc->uc_mcontext.gregs;
int err;
 
/* Always make any pending restarted system calls return -EINTR */
@@ -102,10 +102,11 @@ asmlinkage int do_rt_sigreturn(struct switch_stack *sw)
 {
struct pt_regs *regs = (struct pt_regs *)(sw + 1);
/* Verify, can we follow the stack back */
-   struct rt_sigframe *frame = (struct rt_sigframe *) regs->sp;
+   struct rt_sigframe __user *frame;
sigset_t set;
int rval;
 
+   frame = (struct rt_sigframe __user *) regs->sp;
if (!access_ok(frame, 
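
For readers following the access_ok() hunks above, here is a minimal
userspace sketch of why the casts and __user annotations are needed once
the check takes a pointer argument. Everything prefixed sketch_ or SKETCH_
is hypothetical, the address limit is an arbitrary illustrative value, and
the range check is only written in the style of a generic access_ok(); it
is not copied from the upstream commit.

```c
#include <stdbool.h>

/*
 * In a normal build __user expands to nothing; static checkers such as
 * sparse give it a meaning so that user and kernel pointers cannot be
 * mixed silently. This stand-in only keeps the annotation compiling.
 */
#ifndef __user
#define __user
#endif

/* Hypothetical upper bound of the user address range, for illustration. */
#define SKETCH_TASK_SIZE_MAX 0x80000000UL

/*
 * Range check taking a pointer. Because the prototype wants a pointer,
 * a caller that passes an unsigned long now gets a type warning.
 */
static inline bool sketch_access_ok(const void __user *ptr, unsigned long size)
{
	unsigned long limit = SKETCH_TASK_SIZE_MAX;
	unsigned long addr = (unsigned long)ptr;

	/* Written so that addr + size cannot overflow. */
	return size <= limit && addr <= limit - size;
}

/* Caller holding integer addresses, similar to the arm do_cache_op() hunk. */
static inline bool sketch_range_ok(unsigned long start, unsigned long end)
{
	if (end < start)
		return false;
	/* The explicit cast is what the patch adds at such call sites. */
	return sketch_access_ok((void __user *)start, end - start);
}
```

Calling sketch_range_ok(0x1000, 0x2000) succeeds under these assumptions,
while a range that crosses SKETCH_TASK_SIZE_MAX is rejected.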

Re: [PATCH] powerpc/boot: Build wrapper for an appropriate CPU

2022-03-30 Thread Joel Stanley
On Wed, 30 Mar 2022 at 11:33, Christophe Leroy
 wrote:
>
>
>
> On 30/03/2022 at 13:24, Joel Stanley wrote:
> > Currently the boot wrapper lacks a -mcpu option, so it will be built for
> > the toolchain's default cpu. This is a problem if the toolchain defaults
> > to a cpu with newer instructions.
> >
> > We could wire in TARGET_CPU but instead use the oldest supported option
> > so the wrapper runs anywhere.
> >
> > The GCC documentation states that -mcpu=powerpc64le will give us a
> > generic 64 bit powerpc machine:
> >
> >   -mcpu=powerpc, -mcpu=powerpc64, and -mcpu=powerpc64le specify pure
> >   32-bit PowerPC (either endian), 64-bit big endian PowerPC and 64-bit
> >   little endian PowerPC architecture machine types, with an appropriate,
> >   generic processor model assumed for scheduling purposes.
> >
> > So do that for each of the three machines.
> >
> > This bug was found when building the kernel with a toolchain that
> > defaulted to power10, resulting in a pcrel enabled wrapper which fails
> > to link:
> >
> >   arch/powerpc/boot/wrapper.a(crt0.o): in function `p_base':
> >   (.text+0x150): call to `platform_init' lacks nop, can't restore toc; (toc 
> > save/adjust stub)
> >   (.text+0x154): call to `start' lacks nop, can't restore toc; (toc 
> > save/adjust stub)
> >   powerpc64le-buildroot-linux-gnu-ld: final link failed: bad value
> >
> > Even with that bug worked around the resulting kernel would crash on a
> > power9 box:
> >
> >   $ qemu-system-ppc64 -nographic -nodefaults -M powernv9 -kernel 
> > arch/powerpc/boot/zImage.epapr -serial mon:stdio
> >   [7.069331356,5] INIT: Starting kernel at 0x20010020, fdt at 
> > 0x3068c628 25694 bytes
> >   [7.130374661,3] ***
> >   [7.131072886,3] Fatal Exception 0xe40 at 200101e4MSR 
> > 9001
> >   [7.131290613,3] CFAR : 2001027c MSR  : 9001
> >   [7.131433759,3] SRR0 : 20010050 SRR1 : 9001
> >   [7.13155,3] HSRR0: 200101e4 HSRR1: 9001
> >   [7.131733687,3] DSISR:  DAR  : 
> >   [7.131905162,3] LR   : 20010280 CTR  : 
> >   [7.132068356,3] CR   : 44002004 XER  : 
> >
> > Link: https://github.com/linuxppc/issues/issues/400
> > Signed-off-by: Joel Stanley 
> > ---
> > Tested:
> >
> >   - ppc64le_defconfig
> >   - pseries and powernv qemu, for power8, power9, power10 cpus
> >   - buildroot compiler that defaults to -mcpu=power10 (gcc 10.3.0, ld 
> > 2.36.1)
> >   -  RHEL9 cross compilers (gcc 11.2.1-1, ld 2.35.2-17.el9)
> >
> > All decompressed and made it into the kernel ok.
> >
> > ppc64_defconfig did not work, as we've got a regression when the wrapper
> > is built for big endian. It hasn't worked for zImage.pseries for a long
> > time (at least v4.14), and broke some time between v5.4 and v5.17 for
> > zImage.epapr.
> >
> >   arch/powerpc/boot/Makefile | 8 ++--
> >   1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
> > index 9993c6256ad2..1f5cc401bfc0 100644
> > --- a/arch/powerpc/boot/Makefile
> > +++ b/arch/powerpc/boot/Makefile
> > @@ -38,9 +38,13 @@ BOOTCFLAGS:= -Wall -Wundef -Wstrict-prototypes 
> > -Wno-trigraphs \
> >$(LINUXINCLUDE)
> >
> >   ifdef CONFIG_PPC64_BOOT_WRAPPER
> > -BOOTCFLAGS   += -m64
> > +ifdef CONFIG_CPU_LITTLE_ENDIAN
> > +BOOTCFLAGS   += -m64 -mcpu=powerpc64le
> >   else
> > -BOOTCFLAGS   += -m32
> > +BOOTCFLAGS   += -m64 -mcpu=powerpc64
> > +endif
> > +else
> > +BOOTCFLAGS   += -m32 -mcpu=powerpc
>
> How does that interact with the following lines? Isn't it an issue to
> have two -mcpu options?
>
> arch/powerpc/boot/Makefile:$(obj)/4xx.o: BOOTCFLAGS += -mcpu=405
> arch/powerpc/boot/Makefile:$(obj)/ebony.o: BOOTCFLAGS += -mcpu=440
> arch/powerpc/boot/Makefile:$(obj)/cuboot-hotfoot.o: BOOTCFLAGS += -mcpu=405
> arch/powerpc/boot/Makefile:$(obj)/cuboot-taishan.o: BOOTCFLAGS += -mcpu=440
> arch/powerpc/boot/Makefile:$(obj)/cuboot-katmai.o: BOOTCFLAGS += -mcpu=440
> arch/powerpc/boot/Makefile:$(obj)/cuboot-acadia.o: BOOTCFLAGS += -mcpu=405
> arch/powerpc/boot/Makefile:$(obj)/treeboot-iss4xx.o: BOOTCFLAGS += -mcpu=405
> arch/powerpc/boot/Makefile:$(obj)/treeboot-currituck.o: BOOTCFLAGS +=
> -mcpu=405
> arch/powerpc/boot/Makefile:$(obj)/treeboot-akebono.o: BOOTCFLAGS +=
> -mcpu=405

Good point, I didn't test the other wrappers.

The last one wins as far as -mcpu options go, from a quick test. But it
might lead to less confusion if I dropped the -mcpu=powerpc change.
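
As a rough way to see which CPU level a given toolchain and flag
combination ends up targeting, the compiler's predefined macros can be
inspected. The sketch below assumes GCC-style _ARCH_PWR8, _ARCH_PWR9,
_ARCH_PWR10 and __PCREL__ macros; the function names are hypothetical and
this is not part of the patch. On a non-PowerPC compiler it simply
reports that.

```c
/*
 * Report the newest POWER level the compiler says it is targeting,
 * based on predefined macros (assumed GCC naming).
 */
const char *wrapper_target_level(void)
{
#if defined(_ARCH_PWR10)
	return "power10";
#elif defined(_ARCH_PWR9)
	return "power9";
#elif defined(_ARCH_PWR8)
	return "power8";
#elif defined(__powerpc__)
	return "generic powerpc";
#else
	return "not powerpc";
#endif
}

/* Returns 1 if the compiler reports PC-relative code generation. */
int wrapper_uses_pcrel(void)
{
#ifdef __PCREL__
	return 1;
#else
	return 0;
#endif
}
```

Building this once with the toolchain defaults and once with an explicit
-mcpu option, then printing both results, would show whether the wrapper
flags took effect.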


Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.18-1 tag

2022-03-30 Thread Michal Suchánek
On Mon, Mar 28, 2022 at 08:07:13PM +1100, Michael Ellerman wrote:
> Linus Torvalds  writes:
> > On Fri, Mar 25, 2022 at 3:25 AM Michael Ellerman  
> > wrote:

> 
> > That said:
> >
> >> There's a series of commits cleaning up function descriptor handling,
> >
> > For some reason I also thought that powerpc had actually moved away
> > from function descriptors, so I'm clearly not keeping up with the
> > times.
> 
> No you're right, we have moved away from them, but not entirely.
> 
> Function descriptors are still used for 64-bit big endian, but they're
> not used for 64-bit little endian, or 32-bit.

There was a patch to use ABIv2 for ppc64 big endian. I suppose that
would rid us of the function descriptors for good.

Somehow the discussion of that change trailed off without any results.

Maybe it's worth resurrecting?

Thanks

Michal


Re: [PATCH] powerpc/boot: Build wrapper for an appropriate CPU

2022-03-30 Thread Christophe Leroy


On 30/03/2022 at 13:24, Joel Stanley wrote:
> Currently the boot wrapper lacks a -mcpu option, so it will be built for
> the toolchain's default cpu. This is a problem if the toolchain defaults
> to a cpu with newer instructions.
> 
> We could wire in TARGET_CPU but instead use the oldest supported option
> so the wrapper runs anywhere.
> 
> The GCC documentation states that -mcpu=powerpc64le will give us a
> generic 64 bit powerpc machine:
> 
>   -mcpu=powerpc, -mcpu=powerpc64, and -mcpu=powerpc64le specify pure
>   32-bit PowerPC (either endian), 64-bit big endian PowerPC and 64-bit
>   little endian PowerPC architecture machine types, with an appropriate,
>   generic processor model assumed for scheduling purposes.
> 
> So do that for each of the three machines.
> 
> This bug was found when building the kernel with a toolchain that
> defaulted to power10, resulting in a pcrel enabled wrapper which fails
> to link:
> 
>   arch/powerpc/boot/wrapper.a(crt0.o): in function `p_base':
>   (.text+0x150): call to `platform_init' lacks nop, can't restore toc; (toc 
> save/adjust stub)
>   (.text+0x154): call to `start' lacks nop, can't restore toc; (toc 
> save/adjust stub)
>   powerpc64le-buildroot-linux-gnu-ld: final link failed: bad value
> 
> Even with that bug worked around the resulting kernel would crash on a
> power9 box:
> 
>   $ qemu-system-ppc64 -nographic -nodefaults -M powernv9 -kernel 
> arch/powerpc/boot/zImage.epapr -serial mon:stdio
>   [7.069331356,5] INIT: Starting kernel at 0x20010020, fdt at 0x3068c628 
> 25694 bytes
>   [7.130374661,3] ***
>   [7.131072886,3] Fatal Exception 0xe40 at 200101e4MSR 
> 9001
>   [7.131290613,3] CFAR : 2001027c MSR  : 9001
>   [7.131433759,3] SRR0 : 20010050 SRR1 : 9001
>   [7.13155,3] HSRR0: 200101e4 HSRR1: 9001
>   [7.131733687,3] DSISR:  DAR  : 
>   [7.131905162,3] LR   : 20010280 CTR  : 
>   [7.132068356,3] CR   : 44002004 XER  : 
> 
> Link: https://github.com/linuxppc/issues/issues/400
> Signed-off-by: Joel Stanley 
> ---
> Tested:
> 
>   - ppc64le_defconfig
>   - pseries and powernv qemu, for power8, power9, power10 cpus
>   - buildroot compiler that defaults to -mcpu=power10 (gcc 10.3.0, ld 2.36.1)
>   -  RHEL9 cross compilers (gcc 11.2.1-1, ld 2.35.2-17.el9)
> 
> All decompressed and made it into the kernel ok.
> 
> ppc64_defconfig did not work, as we've got a regression when the wrapper
> is built for big endian. It hasn't worked for zImage.pseries for a long
> time (at least v4.14), and broke some time between v5.4 and v5.17 for
> zImage.epapr.
> 
>   arch/powerpc/boot/Makefile | 8 ++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
> index 9993c6256ad2..1f5cc401bfc0 100644
> --- a/arch/powerpc/boot/Makefile
> +++ b/arch/powerpc/boot/Makefile
> @@ -38,9 +38,13 @@ BOOTCFLAGS:= -Wall -Wundef -Wstrict-prototypes 
> -Wno-trigraphs \
>$(LINUXINCLUDE)
>   
>   ifdef CONFIG_PPC64_BOOT_WRAPPER
> -BOOTCFLAGS   += -m64
> +ifdef CONFIG_CPU_LITTLE_ENDIAN
> +BOOTCFLAGS   += -m64 -mcpu=powerpc64le
>   else
> -BOOTCFLAGS   += -m32
> +BOOTCFLAGS   += -m64 -mcpu=powerpc64
> +endif
> +else
> +BOOTCFLAGS   += -m32 -mcpu=powerpc

How does that interact with the following lines? Isn't it an issue to
have two -mcpu options?

arch/powerpc/boot/Makefile:$(obj)/4xx.o: BOOTCFLAGS += -mcpu=405
arch/powerpc/boot/Makefile:$(obj)/ebony.o: BOOTCFLAGS += -mcpu=440
arch/powerpc/boot/Makefile:$(obj)/cuboot-hotfoot.o: BOOTCFLAGS += -mcpu=405
arch/powerpc/boot/Makefile:$(obj)/cuboot-taishan.o: BOOTCFLAGS += -mcpu=440
arch/powerpc/boot/Makefile:$(obj)/cuboot-katmai.o: BOOTCFLAGS += -mcpu=440
arch/powerpc/boot/Makefile:$(obj)/cuboot-acadia.o: BOOTCFLAGS += -mcpu=405
arch/powerpc/boot/Makefile:$(obj)/treeboot-iss4xx.o: BOOTCFLAGS += -mcpu=405
arch/powerpc/boot/Makefile:$(obj)/treeboot-currituck.o: BOOTCFLAGS += -mcpu=405
arch/powerpc/boot/Makefile:$(obj)/treeboot-akebono.o: BOOTCFLAGS += -mcpu=405


>   endif
>   
>   BOOTCFLAGS  += -isystem $(shell $(BOOTCC) -print-file-name=include)

[PATCH] powerpc/boot: Build wrapper for an appropriate CPU

2022-03-30 Thread Joel Stanley
Currently the boot wrapper lacks a -mcpu option, so it will be built for
the toolchain's default cpu. This is a problem if the toolchain defaults
to a cpu with newer instructions.

We could wire in TARGET_CPU but instead use the oldest supported option
so the wrapper runs anywhere.

The GCC documentation states that -mcpu=powerpc64le will give us a
generic 64 bit powerpc machine:

 -mcpu=powerpc, -mcpu=powerpc64, and -mcpu=powerpc64le specify pure
 32-bit PowerPC (either endian), 64-bit big endian PowerPC and 64-bit
 little endian PowerPC architecture machine types, with an appropriate,
 generic processor model assumed for scheduling purposes.

So do that for each of the three machines.

This bug was found when building the kernel with a toolchain that
defaulted to power10, resulting in a pcrel enabled wrapper which fails
to link:

 arch/powerpc/boot/wrapper.a(crt0.o): in function `p_base':
 (.text+0x150): call to `platform_init' lacks nop, can't restore toc; (toc save/adjust stub)
 (.text+0x154): call to `start' lacks nop, can't restore toc; (toc save/adjust stub)
 powerpc64le-buildroot-linux-gnu-ld: final link failed: bad value

Even with that bug worked around the resulting kernel would crash on a
power9 box:

 $ qemu-system-ppc64 -nographic -nodefaults -M powernv9 -kernel arch/powerpc/boot/zImage.epapr -serial mon:stdio
 [7.069331356,5] INIT: Starting kernel at 0x20010020, fdt at 0x3068c628 25694 bytes
 [7.130374661,3] ***
 [7.131072886,3] Fatal Exception 0xe40 at 200101e4 MSR 9001
 [7.131290613,3] CFAR : 2001027c MSR  : 9001
 [7.131433759,3] SRR0 : 20010050 SRR1 : 9001
 [7.13155,3] HSRR0: 200101e4 HSRR1: 9001
 [7.131733687,3] DSISR:  DAR  : 
 [7.131905162,3] LR   : 20010280 CTR  : 
 [7.132068356,3] CR   : 44002004 XER  : 

Link: https://github.com/linuxppc/issues/issues/400
Signed-off-by: Joel Stanley 
---
Tested:

 - ppc64le_defconfig
 - pseries and powernv qemu, for power8, power9, power10 cpus
 - buildroot compiler that defaults to -mcpu=power10 (gcc 10.3.0, ld 2.36.1)
 -  RHEL9 cross compilers (gcc 11.2.1-1, ld 2.35.2-17.el9)

All decompressed and made it into the kernel ok.

ppc64_defconfig did not work, as we've got a regression when the wrapper
is built for big endian. It hasn't worked for zImage.pseries for a long
time (at least v4.14), and broke some time between v5.4 and v5.17 for
zImage.epapr.

 arch/powerpc/boot/Makefile | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 9993c6256ad2..1f5cc401bfc0 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -38,9 +38,13 @@ BOOTCFLAGS:= -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
 $(LINUXINCLUDE)
 
 ifdef CONFIG_PPC64_BOOT_WRAPPER
-BOOTCFLAGS += -m64
+ifdef CONFIG_CPU_LITTLE_ENDIAN
+BOOTCFLAGS += -m64 -mcpu=powerpc64le
 else
-BOOTCFLAGS += -m32
+BOOTCFLAGS += -m64 -mcpu=powerpc64
+endif
+else
+BOOTCFLAGS += -m32 -mcpu=powerpc
 endif
 
 BOOTCFLAGS += -isystem $(shell $(BOOTCC) -print-file-name=include)
-- 
2.35.1



Re: [PATCH] recordmcount: Support empty section from recent binutils

2022-03-30 Thread Christophe Leroy


On 29/03/2022 at 00:31, Joel Stanley wrote:
> On Mon, 29 Nov 2021 at 22:43, Christophe Leroy
>  wrote:
>>
>>
>>
>> On 29/11/2021 at 18:43, Steven Rostedt wrote:
>>> On Fri, 26 Nov 2021 08:43:23 +
>>> LEROY Christophe  wrote:
>>>
>>>> On 24/11/2021 at 15:43, Christophe Leroy wrote:
>>>>> Looks like recent binutils (2.36 and over ?) may empty some section,
>>>>> leading to failure like:
>>>>>
>>>>>  Cannot find symbol for section 11: .text.unlikely.
>>>>>  kernel/kexec_file.o: failed
>>>>>  make[1]: *** [scripts/Makefile.build:287: kernel/kexec_file.o] Error 1
>>>>>
>>>>> In order to avoid that, ensure that the section has a content before
>>>>> returning its name in has_rel_mcount().
>>>>
>>>> This patch doesn't work, on PPC32 I get the following message with this
>>>> patch applied:
>>>>
>>>> [0.00] ftrace: No functions to be traced?
>>>>
>>>> Without the patch I get:
>>>>
>>>> [0.00] ftrace: allocating 22381 entries in 66 pages
>>>> [0.00] ftrace: allocated 66 pages with 2 groups
>>>
>>> Because of this report, I have not applied this patch (even though I was
>>> about to push it to Linus).
>>>
>>> I'm pulling it from my queue until this gets resolved.
>>>
>>
>> I have no idea on how to fix that for the moment.
>>
>> With GCC 10 (binutils 2.36) an objdump -x on kernel/kexec_file.o gives:
>>
>>  ld  .text.unlikely  .text.unlikely
>>   wF .text.unlikely 0038
>> .arch_kexec_apply_relocations_add
>> 0038  wF .text.unlikely 0038
>> .arch_kexec_apply_relocations
>>
>>
>> With GCC 11 (binutils 2.37) the same gives:
>>
>>   wF .text.unlikely 0038
>> .arch_kexec_apply_relocations_add
>> 0038  wF .text.unlikely 0038
>> .arch_kexec_apply_relocations
>>
>>
>> The problem is that recordmcount drops weak symbols, and it doesn't find
>> any non-weak symbol in .text.unlikely
>>
>> Explanation given at
>> https://elixir.bootlin.com/linux/v5.16-rc2/source/scripts/recordmcount.h#L506
>>
>> I have no idea on what to do.
> 
> Did you end up finding a solution for this issue?
> 

Not really, my solution was to switch to the kernel compiler at 
https://mirrors.edge.kernel.org/pub/tools/crosstool/ which embeds 
binutils 2.36

But it looks like using objtool instead of recordmcount doesn't exhibit 
the problem.

https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220318105140.43914-4...@linux.ibm.com/

Christophe

[powerpc:next] BUILD SUCCESS af41d2866f7d75bbb38d487f6ec7770425d70e45

2022-03-30 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
branch HEAD: af41d2866f7d75bbb38d487f6ec7770425d70e45  powerpc/64: Fix build failure with allyesconfig in book3s_64_entry.S

elapsed time: 967m

configs tested: 137
configs skipped: 136

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm         defconfig
arm64       allyesconfig
arm64       defconfig
arm         allyesconfig
arm         allmodconfig
powerpc     randconfig-c003-20220330
i386        randconfig-c001
sh          rts7751r2d1_defconfig
arm         realview_defconfig
arm         assabet_defconfig
alpha       defconfig
xtensa      allyesconfig
sh          se7619_defconfig
sh          lboxre2_defconfig
powerpc     pasemi_defconfig
arm         imx_v6_v7_defconfig
parisc64    alldefconfig
mips        tb0226_defconfig
arm         clps711x_defconfig
arm         xcep_defconfig
arm         badge4_defconfig
arm         pxa_defconfig
ia64        allmodconfig
arc         defconfig
i386        defconfig
arm         cerfcube_defconfig
openrisc    or1ksim_defconfig
m68k        m5249evb_defconfig
powerpc     ppc40x_defconfig
arm         qcom_defconfig
mips        db1xxx_defconfig
sh          sdk7786_defconfig
powerpc     wii_defconfig
powerpc     mpc85xx_cds_defconfig
arm         randconfig-c002-20220327
arm         randconfig-c002-20220329
arm         randconfig-c002-20220330
ia64        defconfig
ia64        allyesconfig
m68k        allmodconfig
m68k        defconfig
m68k        allyesconfig
nios2       allyesconfig
csky        defconfig
alpha       allyesconfig
h8300       allyesconfig
sh          allmodconfig
parisc      defconfig
s390        allyesconfig
s390        allmodconfig
parisc64    defconfig
parisc      allyesconfig
s390        defconfig
i386        allyesconfig
sparc       allyesconfig
sparc       defconfig
i386        debian-10.3-kselftests
i386        debian-10.3
mips        allyesconfig
mips        allmodconfig
powerpc     allyesconfig
powerpc     allmodconfig
powerpc     allnoconfig
x86_64      randconfig-a001-20220328
x86_64      randconfig-a003-20220328
x86_64      randconfig-a004-20220328
x86_64      randconfig-a002-20220328
x86_64      randconfig-a005-20220328
x86_64      randconfig-a006-20220328
i386        randconfig-a001-20220328
i386        randconfig-a003-20220328
i386        randconfig-a006-20220328
i386        randconfig-a005-20220328
i386        randconfig-a004-20220328
i386        randconfig-a002-20220328
x86_64      randconfig-a006
x86_64      randconfig-a004
x86_64      randconfig-a002
riscv       nommu_k210_defconfig
riscv       allyesconfig
riscv       nommu_virt_defconfig
riscv       allnoconfig
riscv       defconfig
riscv       rv32_defconfig
riscv       allmodconfig
x86_64      rhel-8.3-kselftests
um          x86_64_defconfig
um          i386_defconfig
x86_64      allyesconfig
x86_64      defconfig
x86_64      rhel-8.3
x86_64      rhel-8.3-func
x86_64      rhel-8.3-kunit
x86_64      kexec

clang tested configs:
mips        randconfig-c004-20220329
x86_64      randconfig-c007
mips        randconfig-c004-20220327
arm         randconfig-c002-20220327
arm         randconfig-c002-20220329
riscv

[PATCH v3] ftrace: Make ftrace_graph_is_dead() a static branch

2022-03-30 Thread Christophe Leroy
ftrace_graph_is_dead() is used on hot paths, it just reads a variable
in memory and is not worth suffering function call constraints.

For instance, at entry of prepare_ftrace_return(), inlining it avoids
saving prepare_ftrace_return() parameters to stack and restoring them
after calling ftrace_graph_is_dead().

While at it using a static branch is even more performant and is
rather well adapted considering that the returned value will almost
never change.

Inline ftrace_graph_is_dead() and replace 'kill_ftrace_graph' bool
by a static branch.

The performance improvement is noticeable.

Signed-off-by: Christophe Leroy 
---
v3: Keep includes in upside-down x-mas tree

v2: Use a static branch instead of a global bool var.
---
 include/linux/ftrace.h | 16 +++-
 kernel/trace/fgraph.c  | 17 +++--
 2 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index ed8cf433a46a..4816b7e11047 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -9,6 +9,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1018,7 +1019,20 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
 extern int register_ftrace_graph(struct fgraph_ops *ops);
 extern void unregister_ftrace_graph(struct fgraph_ops *ops);
 
-extern bool ftrace_graph_is_dead(void);
+/**
+ * ftrace_graph_is_dead - returns true if ftrace_graph_stop() was called
+ *
+ * ftrace_graph_stop() is called when a severe error is detected in
+ * the function graph tracing. This function is called by the critical
+ * paths of function graph to keep those paths from doing any more harm.
+ */
+DECLARE_STATIC_KEY_FALSE(kill_ftrace_graph);
+
+static inline bool ftrace_graph_is_dead(void)
+{
+   return static_branch_unlikely(&kill_ftrace_graph);
+}
+
 extern void ftrace_graph_stop(void);
 
 /* The current handlers in use */
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 19028e072cdb..8f4fb328133a 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -7,6 +7,7 @@
  *
  * Highly modified by Steven Rostedt (VMware).
  */
+#include 
 #include 
 #include 
 #include 
@@ -23,24 +24,12 @@
 #define ASSIGN_OPS_HASH(opsname, val)
 #endif
 
-static bool kill_ftrace_graph;
+DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph);
 int ftrace_graph_active;
 
 /* Both enabled by default (can be cleared by function_graph tracer flags */
 static bool fgraph_sleep_time = true;
 
-/**
- * ftrace_graph_is_dead - returns true if ftrace_graph_stop() was called
- *
- * ftrace_graph_stop() is called when a severe error is detected in
- * the function graph tracing. This function is called by the critical
- * paths of function graph to keep those paths from doing any more harm.
- */
-bool ftrace_graph_is_dead(void)
-{
-   return kill_ftrace_graph;
-}
-
 /**
  * ftrace_graph_stop - set to permanently disable function graph tracing
  *
@@ -51,7 +40,7 @@ bool ftrace_graph_is_dead(void)
  */
 void ftrace_graph_stop(void)
 {
-   kill_ftrace_graph = true;
+   static_branch_enable(&kill_ftrace_graph);
 }
 
 /* Add a function return address to the trace stack on thread info.*/
-- 
2.35.1
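
To illustrate the shape of the change in the patch above, here is a
userspace sketch, not the kernel mechanism. Real static keys patch the
instruction stream at runtime, which plain C cannot reproduce; this
stand-in only shows the check moved into an inline helper with a branch
hint via __builtin_expect (a GCC/Clang builtin). All sketch_ names are
hypothetical.

```c
#include <stdbool.h>

/* Userspace stand-in for a branch hint; assumes GCC or Clang. */
#define sketch_unlikely(x) __builtin_expect(!!(x), 0)

/* Set once when a severe error is detected; expected to stay false. */
static bool sketch_graph_dead;

/* Inline check: no call overhead, no argument save/restore in callers. */
static inline bool sketch_graph_is_dead(void)
{
	return sketch_unlikely(sketch_graph_dead);
}

/* Counterpart of ftrace_graph_stop(): disable permanently. */
static inline void sketch_graph_stop(void)
{
	sketch_graph_dead = true;
}

/* Hot path that bails out early once tracing has been stopped. */
static int sketch_prepare_return(int depth)
{
	if (sketch_graph_is_dead())
		return -1;
	return depth + 1;
}
```

Before sketch_graph_stop() is called the hot path proceeds normally;
afterwards every call takes the early exit.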



Re: [PATCH v2 7/8] powerpc/pgtable: remove _PAGE_BIT_SWAP_TYPE for book3s

2022-03-30 Thread David Hildenbrand
On 30.03.22 08:07, Christophe Leroy wrote:
> 
> 
> On 29/03/2022 at 18:43, David Hildenbrand wrote:
>> The swap type is simply stored in bits 0x1f of the swap pte. Let's
>> simplify by just getting rid of _PAGE_BIT_SWAP_TYPE. It's not like that
>> we can simply change it: _PAGE_SWP_SOFT_DIRTY would suddenly fall into
>> _RPAGE_RSV1, which isn't possible and would make the
>> BUILD_BUG_ON(_PAGE_HPTEFLAGS & _PAGE_SWP_SOFT_DIRTY) angry.
>>
>> While at it, make it clearer which bit we're actually using for
>> _PAGE_SWP_SOFT_DIRTY by just using the proper define and introduce and
>> use SWP_TYPE_MASK.
>>
>> Signed-off-by: David Hildenbrand 
>> ---
>>   arch/powerpc/include/asm/book3s/64/pgtable.h | 12 +---
> 
> Why only BOOK3S ? Why not BOOK3E as well ?

Hi Christophe,

I'm focusing on the most relevant enterprise architectures for now. I
don't have the capacity to convert each and every architecture at this
point (especially, I don't want to waste my time in case this doesn't get
merged, and book3e didn't look straightforward to me).

Once this series hits upstream, I can look into other architectures --
and I'll be happy if other people jump in that have more familiarity
with the architecture-specific swp pte layouts.

Thanks

-- 
Thanks,

David / dhildenb



Re: [PATCH v2] ftrace: Make ftrace_graph_is_dead() a static branch

2022-03-30 Thread Christophe Leroy


On 30/03/2022 at 04:07, Steven Rostedt wrote:
> On Fri, 25 Mar 2022 09:03:08 +0100
> Christophe Leroy  wrote:
> 
>> --- a/kernel/trace/fgraph.c
>> +++ b/kernel/trace/fgraph.c
>> @@ -10,6 +10,7 @@
>>   #include 
>>   #include 
>>   #include 
>> +#include 
>>   
> 
> Small nit. Please order the includes in "upside-down x-mas tree" fashion:
> 
> #include 
> #include 
> #include 
> #include 
> 

That's the first time I get such a request. Usually people request 
#includes to be in alphabetical order so when I see a file that has 
headers in alphabetical order I try to not break it, but here that was 
not the case so I put it at the end of the list.

I'll send v3

Thanks
Christophe

Re: [PATCH v2 7/8] powerpc/pgtable: remove _PAGE_BIT_SWAP_TYPE for book3s

2022-03-30 Thread Christophe Leroy


On 29/03/2022 at 18:43, David Hildenbrand wrote:
> The swap type is simply stored in bits 0x1f of the swap pte. Let's
> simplify by just getting rid of _PAGE_BIT_SWAP_TYPE. It's not like that
> we can simply change it: _PAGE_SWP_SOFT_DIRTY would suddenly fall into
> _RPAGE_RSV1, which isn't possible and would make the
> BUILD_BUG_ON(_PAGE_HPTEFLAGS & _PAGE_SWP_SOFT_DIRTY) angry.
> 
> While at it, make it clearer which bit we're actually using for
> _PAGE_SWP_SOFT_DIRTY by just using the proper define and introduce and
> use SWP_TYPE_MASK.
> 
> Signed-off-by: David Hildenbrand 
> ---
>   arch/powerpc/include/asm/book3s/64/pgtable.h | 12 +---

Why only BOOK3S ? Why not BOOK3E as well ?

Christophe

>   1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
> b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 875730d5af40..8e98375d5c4a 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -13,7 +13,6 @@
>   /*
>* Common bits between hash and Radix page table
>*/
> -#define _PAGE_BIT_SWAP_TYPE  0
>   
>   #define _PAGE_EXEC  0x1 /* execute permission */
>   #define _PAGE_WRITE 0x2 /* write access allowed */
> @@ -751,17 +750,16 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t 
> newprot)
>* Don't have overlapping bits with _PAGE_HPTEFLAGS \
>* We filter HPTEFLAGS on set_pte.  \
>*/ \
> - BUILD_BUG_ON(_PAGE_HPTEFLAGS & (0x1f << _PAGE_BIT_SWAP_TYPE)); \
> + BUILD_BUG_ON(_PAGE_HPTEFLAGS & SWP_TYPE_MASK); \
>   BUILD_BUG_ON(_PAGE_HPTEFLAGS & _PAGE_SWP_SOFT_DIRTY);   \
>   } while (0)
>   
>   #define SWP_TYPE_BITS 5
> -#define __swp_type(x)(((x).val >> _PAGE_BIT_SWAP_TYPE) \
> - & ((1UL << SWP_TYPE_BITS) - 1))
> +#define SWP_TYPE_MASK((1UL << SWP_TYPE_BITS) - 1)
> +#define __swp_type(x)((x).val & SWP_TYPE_MASK)
>   #define __swp_offset(x) (((x).val & PTE_RPN_MASK) >> PAGE_SHIFT)
>   #define __swp_entry(type, offset)   ((swp_entry_t) { \
> - ((type) << _PAGE_BIT_SWAP_TYPE) \
> - | (((offset) << PAGE_SHIFT) & PTE_RPN_MASK)})
> + (type) | (((offset) << PAGE_SHIFT) & PTE_RPN_MASK)})
>   /*
>* swp_entry_t must be independent of pte bits. We build a swp_entry_t from
>* swap type and offset we get from swap and convert that to pte to find a
> @@ -774,7 +772,7 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
>   #define __swp_entry_to_pmd(x)   (pte_pmd(__swp_entry_to_pte(x)))
>   
>   #ifdef CONFIG_MEM_SOFT_DIRTY
> -#define _PAGE_SWP_SOFT_DIRTY   (1UL << (SWP_TYPE_BITS + _PAGE_BIT_SWAP_TYPE))
> +#define _PAGE_SWP_SOFT_DIRTY _PAGE_NON_IDEMPOTENT
>   #else
>   #define _PAGE_SWP_SOFT_DIRTY 0UL
>   #endif /* CONFIG_MEM_SOFT_DIRTY */
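
For readers following the quoted patch, here is a minimal user-space sketch of the encoding it describes: the swap type sits in the low 5 bits of the entry and the offset is shifted into the RPN field. The names prefixed with sketch_/SKETCH_ and the values chosen for the page shift and RPN mask are assumptions made for illustration only; the real PAGE_SHIFT and PTE_RPN_MASK come from the kernel headers and depend on the configuration. The sketch also uses 1ULL where the patch uses 1UL so that it behaves the same on 32-bit hosts.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins, NOT the kernel's values. */
#define SKETCH_PAGE_SHIFT  16
#define SKETCH_RPN_MASK    0x01ffffffffff0000ULL

/* Same shape as the macros introduced by the quoted patch. */
#define SWP_TYPE_BITS      5
#define SWP_TYPE_MASK      ((1ULL << SWP_TYPE_BITS) - 1)

/* Hypothetical stand-in for swp_entry_t. */
typedef struct { uint64_t val; } sketch_swp_entry_t;

/* Pack a swap type (low 5 bits) and an offset (RPN field) into one value. */
static sketch_swp_entry_t sketch_swp_entry(uint64_t type, uint64_t offset)
{
	sketch_swp_entry_t e;

	assert(type <= SWP_TYPE_MASK);	/* type must fit in 5 bits */
	e.val = type | ((offset << SKETCH_PAGE_SHIFT) & SKETCH_RPN_MASK);
	return e;
}

/* Recover the swap type: no shift is needed because it starts at bit 0. */
static uint64_t sketch_swp_type(sketch_swp_entry_t e)
{
	return e.val & SWP_TYPE_MASK;
}

/* Recover the offset from the RPN field. */
static uint64_t sketch_swp_offset(sketch_swp_entry_t e)
{
	return (e.val & SKETCH_RPN_MASK) >> SKETCH_PAGE_SHIFT;
}
```

Because the type field starts at bit 0, dropping the shift by _PAGE_BIT_SWAP_TYPE does not change the resulting values, which is the simplification the patch relies on.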

Re: [RFC][PATCH 2/2] powerpc/papr_scm: Implement support for reporting generic nvdimm stats

2022-03-30 Thread Christophe Leroy

Hi,

On 08/11/2020 at 22:15, Vaibhav Jain wrote:

Add support for reporting papr-scm supported generic nvdimm stats by
implementing support for handling ND_CMD_GET_STAT in
'papr_scm_ndctl()'.

The mapping between libnvdimm generic nvdimm-stats and papr-scm
specific performance-stats is embedded inside 'dimm_stat_map[]'. This
array is queried by the newly introduced 'papr_scm_get_stat()', which
verifies that the requested nvdimm-stat is supported and, if so, issues an
hcall via 'drc_pmem_query_stats()' to request the performance-stat and
return it to libnvdimm.
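
As a rough illustration of the lookup described above, the following user-space sketch maps a generic stat id to an 8-character papr-scm stat name through a table with designated initializers, rejecting ids that are out of range or unmapped. The enum values and every sketch_/SKETCH_ name are assumptions for illustration; the real ND_DIMM_STAT_* ids are defined elsewhere in the series and are not shown in this mail. The sketch returns -1 for an unsupported id, whereas the patch returns -ENOSPC.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the ND_DIMM_STAT_* ids. */
enum sketch_stat_id {
	SKETCH_STAT_INVALID = 0,
	SKETCH_STAT_MEDIA_READS,
	SKETCH_STAT_MEDIA_WRITES,
	SKETCH_STAT_READ_REQUESTS,
	SKETCH_STAT_WRITE_REQUESTS,
	SKETCH_STAT_MAX,
};

#define SKETCH_ARRAY_SIZE(a)  (sizeof(a) / sizeof((a)[0]))
#define SKETCH_STAT_ID_LEN    8	/* stat names are 8 bytes, space padded */

/* Same layout as the table in the patch: NULL marks an unsupported id. */
static const char * const sketch_stat_map[] = {
	[SKETCH_STAT_INVALID]        = NULL,
	[SKETCH_STAT_MEDIA_READS]    = "MedRCnt ",
	[SKETCH_STAT_MEDIA_WRITES]   = "MedWCnt ",
	[SKETCH_STAT_READ_REQUESTS]  = "HostLCnt",
	[SKETCH_STAT_WRITE_REQUESTS] = "HostSCnt",
	[SKETCH_STAT_MAX]            = NULL,
};

/* Return the mapped name, or NULL if the id is out of range or unmapped. */
static const char *sketch_lookup_stat(long long stat_id)
{
	if (stat_id < 0 || (size_t)stat_id >= SKETCH_ARRAY_SIZE(sketch_stat_map))
		return NULL;
	return sketch_stat_map[stat_id];
}

/* Copy the 8-byte name into a request buffer; -1 if the id is unsupported. */
static int sketch_fill_stat_id(char out[SKETCH_STAT_ID_LEN], long long stat_id)
{
	const char *name = sketch_lookup_stat(stat_id);

	if (!name)
		return -1;
	memcpy(out, name, SKETCH_STAT_ID_LEN);
	return 0;
}
```

The range check has to come before the table access, since the id arrives from the caller and would otherwise index past the end of the array.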

Signed-off-by: Vaibhav Jain 


I see this series is still flagged as 'new' in patchwork.

I saw some patches providing stats to nvdimm are in a pull request for 5.18.

I imagine this is the same subject, so I'm going to change the status of 
this series. Let me know if I'm wrong.


Thanks
Christophe


---
  arch/powerpc/platforms/pseries/papr_scm.c | 66 ++-
  1 file changed, 65 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 835163f54244..51eeab3376fd 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -25,7 +25,8 @@
((1ul << ND_CMD_GET_CONFIG_SIZE) | \
 (1ul << ND_CMD_GET_CONFIG_DATA) | \
 (1ul << ND_CMD_SET_CONFIG_DATA) | \
-(1ul << ND_CMD_CALL))
+(1ul << ND_CMD_CALL) |  \
+(1ul << ND_CMD_GET_STAT))
  
  /* DIMM health bitmap bitmap indicators */

  /* SCM device is unable to persist memory contents */
@@ -120,6 +121,16 @@ struct papr_scm_priv {
  static LIST_HEAD(papr_nd_regions);
  static DEFINE_MUTEX(papr_ndr_lock);
  
+/* Map generic nvdimm stats to papr-scm stats */

+static const char * const dimm_stat_map[] = {
+   [ND_DIMM_STAT_INVALID] = NULL,
+   [ND_DIMM_STAT_MEDIA_READS] = "MedRCnt ",
+   [ND_DIMM_STAT_MEDIA_WRITES] = "MedWCnt ",
+   [ND_DIMM_STAT_READ_REQUESTS] = "HostLCnt",
+   [ND_DIMM_STAT_WRITE_REQUESTS] = "HostSCnt",
+   [ND_DIMM_STAT_MAX] = NULL,
+};
+
  static int drc_pmem_bind(struct papr_scm_priv *p)
  {
unsigned long ret[PLPAR_HCALL_BUFSIZE];
@@ -728,6 +739,54 @@ static int papr_scm_service_pdsm(struct papr_scm_priv *p,
return pdsm_pkg->cmd_status;
  }
  
+/*

+ * For a given pdsm request call an appropriate service function.
+ * Returns errors if any while handling the pdsm command package.
+ */
+static int papr_scm_get_stat(struct papr_scm_priv *p,
+struct nd_cmd_get_dimm_stat *dimm_stat)
+
+{
+   int rc;
+   ssize_t size;
+   struct papr_scm_perf_stat *stat;
+   struct papr_scm_perf_stats *stats;
+
+   /* Check if the requested stat-id is supported */
+   if (dimm_stat->stat_id >= ARRAY_SIZE(dimm_stat_map) ||
+   !dimm_stat_map[dimm_stat->stat_id]) {
+   dev_dbg(&p->pdev->dev, "Invalid stat-id %lld\n", dimm_stat->stat_id);
+   return -ENOSPC;
+   }
+
+   /* Allocate request buffer enough to hold single performance stat */
+   size = sizeof(struct papr_scm_perf_stats) +
+   sizeof(struct papr_scm_perf_stat);
+
+   stats = kzalloc(size, GFP_KERNEL);
+   if (!stats)
+   return -ENOMEM;
+
+   stat = &stats->scm_statistic[0];
+   memcpy(&stat->stat_id, dimm_stat_map[dimm_stat->stat_id],
+  sizeof(stat->stat_id));
+   stat->stat_val = 0;
+
+   /* Fetch the statistic from PHYP and copy it to provided payload */
+   rc = drc_pmem_query_stats(p, stats, 1);
+   if (rc < 0) {
+   dev_dbg(&p->pdev->dev, "Err(%d) fetching stat '%.8s'\n",
+   rc, stat->stat_id);
+   kfree(stats);
+   return rc;
+   }
+
+   dimm_stat->int_val = be64_to_cpu(stat->stat_val);
+
+   kfree(stats);
+   return 0;
+}
+
  static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
  struct nvdimm *nvdimm, unsigned int cmd, void *buf,
  unsigned int buf_len, int *cmd_rc)
@@ -772,6 +831,11 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
*cmd_rc = papr_scm_service_pdsm(p, call_pkg);
break;
  
+	case ND_CMD_GET_STAT:

+   *cmd_rc = papr_scm_get_stat(p,
+   (struct nd_cmd_get_dimm_stat *)buf);
+   break;
+
default:
dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd);
return -EINVAL;