Re: [PATCH] powerpc/boot: Build wrapper for an appropriate CPU
On Thu, 31 Mar 2022 at 02:05, Murilo Opsfelder Araújo wrote:
>
> Hi, Joel.
>
> On 3/30/22 08:24, Joel Stanley wrote:
> > Currently the boot wrapper lacks a -mcpu option, so it will be built for
> > the toolchain's default cpu. This is a problem if the toolchain defaults
> > to a cpu with newer instructions.
> >
> > We could wire in TARGET_CPU but instead use the oldest supported option
> > so the wrapper runs anywhere.
> >
> > The GCC documentation states that -mcpu=powerpc64le will give us a
> > generic 64 bit powerpc machine:
> >
> >   -mcpu=powerpc, -mcpu=powerpc64, and -mcpu=powerpc64le specify pure
> >   32-bit PowerPC (either endian), 64-bit big endian PowerPC and 64-bit
> >   little endian PowerPC architecture machine types, with an appropriate,
> >   generic processor model assumed for scheduling purposes.
> >
> > So do that for each of the three machines.
> >
> > This bug was found when building the kernel with a toolchain that
> > defaulted to power10, resulting in a pcrel enabled wrapper which fails
> > to link:
> >
> >   arch/powerpc/boot/wrapper.a(crt0.o): in function `p_base':
> >   (.text+0x150): call to `platform_init' lacks nop, can't restore toc; (toc save/adjust stub)
> >   (.text+0x154): call to `start' lacks nop, can't restore toc; (toc save/adjust stub)
> >   powerpc64le-buildroot-linux-gnu-ld: final link failed: bad value
> >
> > Even with that bug worked around the resulting kernel would crash on a
> > power9 box:
> >
> >   $ qemu-system-ppc64 -nographic -nodefaults -M powernv9 -kernel arch/powerpc/boot/zImage.epapr -serial mon:stdio
> >   [7.069331356,5] INIT: Starting kernel at 0x20010020, fdt at 0x3068c628 25694 bytes
> >   [7.130374661,3] ***
> >   [7.131072886,3] Fatal Exception 0xe40 at 200101e4 MSR 9001
> >   [7.131290613,3] CFAR : 2001027c MSR : 9001
> >   [7.131433759,3] SRR0 : 20010050 SRR1 : 9001
> >   [7.13155,3] HSRR0: 200101e4 HSRR1: 9001
> >   [7.131733687,3] DSISR: DAR :
> >   [7.131905162,3] LR : 20010280 CTR :
> >   [7.132068356,3] CR : 44002004 XER :
> >
> > Link: https://github.com/linuxppc/issues/issues/400
> > Signed-off-by: Joel Stanley
> > ---
> > Tested:
> >
> >  - ppc64le_defconfig
> >  - pseries and powernv qemu, for power8, power9, power10 cpus
> >  - buildroot compiler that defaults to -mcpu=power10 (gcc 10.3.0, ld 2.36.1)
> >  - RHEL9 cross compilers (gcc 11.2.1-1, ld 2.35.2-17.el9)
> >
> > All decompressed and made it into the kernel ok.
> >
> > ppc64_defconfig did not work, as we've got a regression when the wrapper
> > is built for big endian. It hasn't worked for zImage.pseries for a long
> > time (at least v4.14), and broke some time between v5.4 and v5.17 for
> > zImage.epapr.
> >
> >  arch/powerpc/boot/Makefile | 8 ++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
> > index 9993c6256ad2..1f5cc401bfc0 100644
> > --- a/arch/powerpc/boot/Makefile
> > +++ b/arch/powerpc/boot/Makefile
> > @@ -38,9 +38,13 @@ BOOTCFLAGS:= -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
> >   $(LINUXINCLUDE)
> >
> >  ifdef CONFIG_PPC64_BOOT_WRAPPER
> > -BOOTCFLAGS += -m64
> > +ifdef CONFIG_CPU_LITTLE_ENDIAN
> > +BOOTCFLAGS += -m64 -mcpu=powerpc64le
> >  else
> > -BOOTCFLAGS += -m32
> > +BOOTCFLAGS += -m64 -mcpu=powerpc64
> > +endif
> > +else
> > +BOOTCFLAGS += -m32 -mcpu=powerpc
> >  endif
> >
> >  BOOTCFLAGS += -isystem $(shell $(BOOTCC) -print-file-name=include)
>
> I think it was a fortunate coincidence that the default cpu type of your gcc
> is compatible with your system. If the distro gcc moves its default to a
> newer cpu type than your system, this bug would happen again.

Perhaps I needed to be clearer in my commit message: that's the exact bug
I'm looking to avoid. I have a buildroot toolchain that was built for
-mcpu=power10.

I think you're suggesting the -mcpu=powerpc64 option will change its
behavior depending on the default. From my reading of the man page, I
don't think that's true.

I did a little test using my buildroot compiler which has
with-cpu=power10. I used the presence of PCREL relocations as evidence
that it was built for power10.

$ powerpc64le-buildroot-linux-gnu-gcc -mcpu=power10 -c test.c
$ readelf -r test.o |grep -c PCREL
24

$ powerpc64le-buildroot-linux-gnu-gcc -c test.c
$ readelf -r test.o |grep -c PCREL
24

$ powerpc64le-buildroot-linux-gnu-gcc -mcpu=powerpc64le -c test.c
$ readelf -r test.o |grep -c PCREL
0

> The command "gcc -v |& grep with-cpu" will show you the default cpu type
> for 32 and 64-bit that gcc was configured.

Just a heads up: this gives me no output for the 64 bit compilers on my
laptop:

$
Re: [PATCH] powerpc/boot: Build wrapper for an appropriate CPU
Hi, Joel.

On 3/30/22 08:24, Joel Stanley wrote:
> Currently the boot wrapper lacks a -mcpu option, so it will be built for
> the toolchain's default cpu. This is a problem if the toolchain defaults
> to a cpu with newer instructions.
>
> We could wire in TARGET_CPU but instead use the oldest supported option
> so the wrapper runs anywhere.
>
> The GCC documentation states that -mcpu=powerpc64le will give us a
> generic 64 bit powerpc machine:
>
>   -mcpu=powerpc, -mcpu=powerpc64, and -mcpu=powerpc64le specify pure
>   32-bit PowerPC (either endian), 64-bit big endian PowerPC and 64-bit
>   little endian PowerPC architecture machine types, with an appropriate,
>   generic processor model assumed for scheduling purposes.
>
> So do that for each of the three machines.
>
> This bug was found when building the kernel with a toolchain that
> defaulted to power10, resulting in a pcrel enabled wrapper which fails
> to link:
>
>   arch/powerpc/boot/wrapper.a(crt0.o): in function `p_base':
>   (.text+0x150): call to `platform_init' lacks nop, can't restore toc; (toc save/adjust stub)
>   (.text+0x154): call to `start' lacks nop, can't restore toc; (toc save/adjust stub)
>   powerpc64le-buildroot-linux-gnu-ld: final link failed: bad value
>
> Even with that bug worked around the resulting kernel would crash on a
> power9 box:
>
>   $ qemu-system-ppc64 -nographic -nodefaults -M powernv9 -kernel arch/powerpc/boot/zImage.epapr -serial mon:stdio
>   [7.069331356,5] INIT: Starting kernel at 0x20010020, fdt at 0x3068c628 25694 bytes
>   [7.130374661,3] ***
>   [7.131072886,3] Fatal Exception 0xe40 at 200101e4 MSR 9001
>   [7.131290613,3] CFAR : 2001027c MSR : 9001
>   [7.131433759,3] SRR0 : 20010050 SRR1 : 9001
>   [7.13155,3] HSRR0: 200101e4 HSRR1: 9001
>   [7.131733687,3] DSISR: DAR :
>   [7.131905162,3] LR : 20010280 CTR :
>   [7.132068356,3] CR : 44002004 XER :
>
> Link: https://github.com/linuxppc/issues/issues/400
> Signed-off-by: Joel Stanley
> ---
> Tested:
>
>  - ppc64le_defconfig
>  - pseries and powernv qemu, for power8, power9, power10 cpus
>  - buildroot compiler that defaults to -mcpu=power10 (gcc 10.3.0, ld 2.36.1)
>  - RHEL9 cross compilers (gcc 11.2.1-1, ld 2.35.2-17.el9)
>
> All decompressed and made it into the kernel ok.
>
> ppc64_defconfig did not work, as we've got a regression when the wrapper
> is built for big endian. It hasn't worked for zImage.pseries for a long
> time (at least v4.14), and broke some time between v5.4 and v5.17 for
> zImage.epapr.
>
>  arch/powerpc/boot/Makefile | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
> index 9993c6256ad2..1f5cc401bfc0 100644
> --- a/arch/powerpc/boot/Makefile
> +++ b/arch/powerpc/boot/Makefile
> @@ -38,9 +38,13 @@ BOOTCFLAGS:= -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
>   $(LINUXINCLUDE)
>
>  ifdef CONFIG_PPC64_BOOT_WRAPPER
> -BOOTCFLAGS += -m64
> +ifdef CONFIG_CPU_LITTLE_ENDIAN
> +BOOTCFLAGS += -m64 -mcpu=powerpc64le
>  else
> -BOOTCFLAGS += -m32
> +BOOTCFLAGS += -m64 -mcpu=powerpc64
> +endif
> +else
> +BOOTCFLAGS += -m32 -mcpu=powerpc
>  endif
>
>  BOOTCFLAGS += -isystem $(shell $(BOOTCC) -print-file-name=include)

I think it was a fortunate coincidence that the default cpu type of your gcc
is compatible with your system. If the distro gcc moves its default to a
newer cpu type than your system, this bug would happen again.

The command "gcc -v |& grep with-cpu" will show you the default cpu type for
32 and 64-bit that gcc was configured.

Considering the CONFIG_TARGET_CPU for BOOTCFLAGS would bring some level of
consistency between CFLAGS and BOOTCFLAGS regarding -mcpu value.
We could mimic the behaviour from arch/powerpc/Makefile:

ifdef CONFIG_PPC_BOOK3S_64
ifdef CONFIG_CPU_LITTLE_ENDIAN
CFLAGS-$(CONFIG_GENERIC_CPU) += -mcpu=power8
CFLAGS-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=power9,-mtune=power8)
else
CFLAGS-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=power7,$(call cc-option,-mtune=power5))
CFLAGS-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mcpu=power5,-mcpu=power4)
endif
else ifdef CONFIG_PPC_BOOK3E_64
CFLAGS-$(CONFIG_GENERIC_CPU) += -mcpu=powerpc64
endif
...
CFLAGS-$(CONFIG_TARGET_CPU_BOOL) += $(call cc-option,-mcpu=$(CONFIG_TARGET_CPU))

Cheers!

-- 
Murilo
[Bug 215781] Highmem support broken on kernels greater 5.15.x on ppc32?
https://bugzilla.kernel.org/show_bug.cgi?id=215781

--- Comment #3 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 300667
  --> https://bugzilla.kernel.org/attachment.cgi?id=300667&action=edit
kernel .config (5.15.32, PowerMac G4 DP)

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.
[Bug 215781] Highmem support broken on kernels greater 5.15.x on ppc32?
https://bugzilla.kernel.org/show_bug.cgi?id=215781

--- Comment #2 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 300666
  --> https://bugzilla.kernel.org/attachment.cgi?id=300666&action=edit
kernel .config (5.16.18, PowerMac G4 DP)
[Bug 215781] Highmem support broken on kernels greater 5.15.x on ppc32?
https://bugzilla.kernel.org/show_bug.cgi?id=215781

--- Comment #1 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 300665
  --> https://bugzilla.kernel.org/attachment.cgi?id=300665&action=edit
dmesg (5.15.32, PowerMac G4 DP)
[Bug 215781] New: Highmem support broken on kernels greater 5.15.x on ppc32?
https://bugzilla.kernel.org/show_bug.cgi?id=215781

Bug ID: 215781
Summary: Highmem support broken on kernels greater 5.15.x on ppc32?
Product: Platform Specific/Hardware
Version: 2.5
Kernel Version: 5.16.18
Hardware: PPC-32
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: PPC-32
Assignee: platform_ppc...@kernel-bugs.osdl.org
Reporter: erhar...@mailbox.org
Regression: Yes

Created attachment 300664
  --> https://bugzilla.kernel.org/attachment.cgi?id=300664&action=edit
dmesg (5.16.18, PowerMac G4 DP)

Noticed my G4 DP ran a bit sluggish... Turned out it uses only 614664K of
2097152K RAM. Happens on both kernel 5.16.18 and 5.17.1. Kernels 5.15 and
before work as expected.

It seems to be a problem with highmem, as 5.16.18 and 5.17.1 show 0K
highmem. CONFIG_HIGHMEM=y is of course set.

Kernel 5.16.18 says:
[...]
Top of RAM: 0x8000, Total RAM: 0x8000
Memory hole size: 0MB
Zone ranges:
  DMA      [mem 0x-0x27ff]
  Normal   empty
  HighMem  [mem 0x2800-0x7fff]
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x-0x7fff]
Initmem setup node 0 [mem 0x-0x7fff]
percpu: Embedded 12 pages/cpu s19404 r8192 d21556 u49152
pcpu-alloc: s19404 r8192 d21556 u49152 alloc=12*4096
pcpu-alloc: [0] 0 [0] 1
Built 1 zonelists, mobility grouping on. Total pages: 522848
Kernel command line: ro root=/dev/sda5 zswap.max_pool_percent=16 zswap.zpool=z3fold slub_debug=FZP page_poison=1 netconsole=@192.168.2.5/eth0,@192.168.2.2/70:85:C2:30:EC:01
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes, linear)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
mem auto-init: stack:__user(zero), heap alloc:off, heap free:off
Kernel virtual memory layout:
  * 0xffbbf000..0xf000 : fixmap
  * 0xff40..0xff80 : highmem PTEs
  * 0xff115000..0xff40 : early ioremap
  * 0xe900..0xff115000 : vmalloc & ioremap
  * 0xb000..0xc000 : modules
Memory: 614664K/2097152K available (8828K kernel code, 488K rwdata, 1664K rodata, 1316K init, 381K bss, 1482488K reserved, 0K cma-reserved, 0K highmem)
[...]

On kernel 5.15.23 I got highmem as expected:
[...]
Top of RAM: 0x8000, Total RAM: 0x8000
Memory hole size: 0MB
Zone ranges:
  DMA      [mem 0x-0x27ff]
  Normal   empty
  HighMem  [mem 0x2800-0x7fff]
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x-0x7fff]
Initmem setup node 0 [mem 0x-0x7fff]
percpu: Embedded 12 pages/cpu s19404 r8192 d21556 u49152
pcpu-alloc: s19404 r8192 d21556 u49152 alloc=12*4096
pcpu-alloc: [0] 0 [0] 1
Built 1 zonelists, mobility grouping on. Total pages: 522848
Kernel command line: ro root=/dev/sda5 zswap.max_pool_percent=16 zswap.zpool=z3fold slub_debug=FZP page_poison=1 netconsole=@192.168.2.5/eth0,@192.168.2.2/70:85:C2:30:EC:01
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes, linear)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
mem auto-init: stack:__user(zero), heap alloc:off, heap free:off
Kernel virtual memory layout:
  * 0xffbbf000..0xf000 : fixmap
  * 0xff40..0xff80 : highmem PTEs
  * 0xff115000..0xff40 : early ioremap
  * 0xe900..0xff115000 : vmalloc & ioremap
  * 0xb000..0xc000 : modules
Memory: 2056460K/2097152K available (8688K kernel code, 488K rwdata, 1644K rodata, 1316K init, 377K bss, 40692K reserved, 0K cma-reserved, 1441792K highmem)
[...]

For testing I used the kernel .config from 5.15.32 for 5.16.18 via make
oldconfig, selecting =n for all questions.
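The two "Memory:" lines in the report can be cross-checked with a little arithmetic. The following is a minimal sketch, not kernel code: only the figures come from the dmesg output, while the struct and helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Figures in KiB, copied from the two "Memory:" lines in the report. */
struct mem_report {
    unsigned long available_k;
    unsigned long total_k;
    unsigned long reserved_k;
    unsigned long highmem_k;
};

static const struct mem_report report_5_16 = {
    .available_k = 614664, .total_k = 2097152,
    .reserved_k = 1482488, .highmem_k = 0,
};

static const struct mem_report report_5_15 = {
    .available_k = 2056460, .total_k = 2097152,
    .reserved_k = 40692, .highmem_k = 1441792,
};

/* True when the available and reserved figures add up to the total. */
static bool mem_report_adds_up(const struct mem_report *r)
{
    return r->available_k + r->reserved_k == r->total_k;
}

/* How much more memory (KiB) 'bad' reports as reserved compared to 'good'. */
static unsigned long extra_reserved_k(const struct mem_report *bad,
                                      const struct mem_report *good)
{
    assert(bad->reserved_k >= good->reserved_k);
    return bad->reserved_k - good->reserved_k;
}
```

With these figures, 5.16.18 reports 1441796K more reserved memory than 5.15, which is within 4K of the 1441792K of highmem that 5.15 reports. That suggests the whole highmem range ends up counted as reserved on the newer kernels instead of being handed to the page allocator.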
Re: [PATCH v2] ftrace: Make ftrace_graph_is_dead() a static branch
On Wed, 30 Mar 2022 06:55:26 +
Christophe Leroy wrote:

> > Small nit. Please order the includes in "upside-down x-mas tree" fashion:
> >
> > #include
> > #include
> > #include
> > #include
> >
>
> That's the first time I get such a request. Usually people request
> #includes to be in alphabetical order so when I see a file that has
> headers in alphabetical order I try to not break it, but here that was
> not the case so I put it at the end of the list.

This is something that Ingo Molnar started back in 2009 or so. And I do
find it easier on the eyes ;-) I may be the only one today trying to keep
it (albeit poorly). It's not a hard requirement, but I find it makes the
code look more like art, which it is :-D

>
> I'll send v3

Thanks,

-- Steve
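As a small illustration of the "upside-down x-mas tree" ordering, here is a sketch in plain C. It assumes the convention means "longest include line first, shortest last"; the choice of standard headers and the helper name are mine, not taken from the patch under review.

```c
/* Includes ordered longest line first, so the block narrows downwards. */
#include <stdbool.h>
#include <assert.h>
#include <string.h>
#include <stddef.h>

/*
 * Returns true when no line is longer than the line before it,
 * i.e. the lines form an upside-down tree shape.
 */
static bool is_upside_down_xmas_tree(const char *const lines[], size_t n)
{
    assert(lines != NULL || n == 0);

    for (size_t i = 1; i < n; i++) {
        if (strlen(lines[i]) > strlen(lines[i - 1]))
            return false;
    }
    return true;
}
```

Lines of equal length are accepted in either order here, since the ordering is a readability preference rather than a hard rule.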
Re: [PATCH v3 1/2] PCI/AER: Disable AER service when link is in L2/L3 ready, L2 and L3 state
On 3/29/22 1:31 AM, Kai-Heng Feng wrote:
> On some Intel AlderLake platforms, Thunderbolt entering D3cold can cause
> some errors reported by AER:
> [ 30.100211] pcieport :00:1d.0: AER: Uncorrected (Non-Fatal) error received: :00:1d.0
> [ 30.100251] pcieport :00:1d.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> [ 30.100256] pcieport :00:1d.0: device [8086:7ab0] error status/mask=0010/4000
> [ 30.100262] pcieport :00:1d.0:[20] UnsupReq (First)
> [ 30.100267] pcieport :00:1d.0: AER: TLP Header: 3400 0852
> [ 30.100372] thunderbolt :0a:00.0: AER: can't recover (no error_detected callback)
> [ 30.100401] xhci_hcd :3e:00.0: AER: can't recover (no error_detected callback)
> [ 30.100427] pcieport :00:1d.0: AER: device recovery failed

Can you include details about which platform you have seen this on, and
whether this is a generic power issue?

> So disable AER service to avoid the noises from turning power rails
> on/off when the device is in low power states (D3hot and D3cold), as
> PCIe spec "5.2 Link State Power Management" states that TLP and DLLP

Also include the PCIe specification version number.

> transmission is disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold
> with aux power) and L3 (D3cold).
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215453
> Reviewed-by: Mika Westerberg
> Signed-off-by: Kai-Heng Feng
> ---
> v3:
>  - Remove reference to ACS.
>  - Wording change.
>
> v2:
>  - Wording change.
>
>  drivers/pci/pcie/aer.c | 31 +--
>  1 file changed, 25 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 9fa1f97e5b270..e4e9d4a3098d7 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1367,6 +1367,22 @@ static int aer_probe(struct pcie_device *dev)
>      return 0;
>  }
>
> +static int aer_suspend(struct pcie_device *dev)
> +{
> +    struct aer_rpc *rpc = get_service_data(dev);
> +
> +    aer_disable_rootport(rpc);
> +    return 0;
> +}
> +
> +static int aer_resume(struct pcie_device *dev)
> +{
> +    struct aer_rpc *rpc = get_service_data(dev);
> +
> +    aer_enable_rootport(rpc);
> +    return 0;
> +}
> +
>  /**
>   * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP
>   * @dev: pointer to Root Port, RCEC, or RCiEP
> @@ -1433,12 +1449,15 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
>  }
>
>  static struct pcie_port_service_driver aerdriver = {
> -    .name = "aer",
> -    .port_type = PCIE_ANY_PORT,
> -    .service = PCIE_PORT_SERVICE_AER,
> -
> -    .probe = aer_probe,
> -    .remove = aer_remove,
> +    .name = "aer",
> +    .port_type = PCIE_ANY_PORT,
> +    .service = PCIE_PORT_SERVICE_AER,
> +    .probe = aer_probe,
> +    .suspend = aer_suspend,
> +    .resume = aer_resume,
> +    .runtime_suspend = aer_suspend,
> +    .runtime_resume = aer_resume,
> +    .remove = aer_remove,
>  };
>
>  /**

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
Re: [RFC PATCH 3/3] objtool/mcount: Add powerpc specific functions
Christophe Leroy wrote:

Le 29/03/2022 à 14:01, Michael Ellerman a écrit :

Josh Poimboeuf writes:

On Sun, Mar 27, 2022 at 09:09:20AM +, Christophe Leroy wrote:

What are the current works in progress on objtool? Should I wait for Josh's
changes before starting to look at all this? Should I wait for anything
else?

I'm not making any major changes to the code, just shuffling things around
to make the interface more modular. I hope to have something soon (this
week). Peter recently added a big feature (Intel IBT) which is already in
-next.

Contributions are welcome, with the understanding that you'll help maintain
it ;-)

Some years ago Kamalesh Babulal had a prototype of objtool for ppc64le
which did the full stack validation. I'm not sure what ever became of that.
From memory he was starting to clean the patches up in late 2019, but I
guess that probably got derailed by COVID. AFAIK he never posted anything.
Maybe someone at IBM has a copy internally (Naveen?).

Kamalesh had a WIP series to enable stack validation on powerpc. From what
I recall, he was waiting on and/or working with the arm64 folks around some
of the common changes needed in objtool.

FWIW, there have been some objtool patches for arm64 stack validation, but
the arm64 maintainers have been hesitant to get on board with objtool, as
it brings a certain maintenance burden. Especially for the full stack
validation and ORC unwinder. But if you only want inline static calls
and/or mcount then it'd probably be much easier to maintain.

I would like to have the stack validation, but I am also worried about the
maintenance burden. I guess we start with mcount, which looks pretty
minimal judging by this series, and see how we go from there.

I'm not sure mcount is really needed as we have recordmcount, but at least
it is an easy one to start with, and as we have recordmcount we can easily
compare the results and check it works as expected.

On the contrary, I think support for mcount in objtool is something we want
to get going soon (hopefully, in time for v5.19) given the issues we are
seeing with recordmcount:
- https://github.com/linuxppc/issues/issues/388
- https://lore.kernel.org/all/20220211014313.1790140-1-...@ozlabs.ru/

- Naveen
Re: [GIT PULL] LIBNVDIMM update for v5.18
The pull request you sent on Tue, 29 Mar 2022 13:54:41 -0700: > git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm > tags/libnvdimm-for-5.18 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/ee96dd9614f1c139e719dd2f296acbed7f1ab4b8 Thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/prtracker.html
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.18-1 tag
On Thu, Mar 31, 2022 at 12:21:03AM +1100, Michael Ellerman wrote:
> Michal Suchánek writes:
> > On Mon, Mar 28, 2022 at 08:07:13PM +1100, Michael Ellerman wrote:
> >> Linus Torvalds writes:
> >> > On Fri, Mar 25, 2022 at 3:25 AM Michael Ellerman wrote:
> >>
> >> > That said:
> >> >
> >> >> There's a series of commits cleaning up function descriptor handling,
> >> >
> >> > For some reason I also thought that powerpc had actually moved away
> >> > from function descriptors, so I'm clearly not keeping up with the
> >> > times.
> >>
> >> No you're right, we have moved away from them, but not entirely.
> >>
> >> Function descriptors are still used for 64-bit big endian, but they're
> >> not used for 64-bit little endian, or 32-bit.
> >
> > There was a patch to use ABIv2 for ppc64 big endian. I suppose that
> > would rid us of the function descriptors for good.
>
> It would be nice.
>
> The hesitation in the past was that the GNU toolchain developers don't
> officially support BE+ELFv2, though it is in use so it does work.

We do not officially support ELFv2 BE because there are no significant
users, so we cannot have the same confidence it works correctly. It
isn't tested often with GCC for example, mainly because it isn't
convenient to do without pre-packaged user space for it (and on the
other hand, there isn't much demand for it).

> > Maybe it's worth resurrecting?
>
> Yeah maybe we should think about it again. If it builds with clang/lld
> that would be a real plus.

With GCC it should work fine still. But no doubt you will find some
edge cases... which you won't find until you try :-)

Segher
Re: [PATCH v2 6/8] s390/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
On Tue, 29 Mar 2022 18:43:27 +0200
David Hildenbrand wrote:

> Let's use bit 52, which is unused.
>
> Signed-off-by: David Hildenbrand
> ---
>  arch/s390/include/asm/pgtable.h | 23 +--
>  1 file changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
> index 3982575bb586..a397b072a580 100644
> --- a/arch/s390/include/asm/pgtable.h
> +++ b/arch/s390/include/asm/pgtable.h
> @@ -181,6 +181,8 @@ static inline int is_module_addr(void *addr)
>  #define _PAGE_SOFT_DIRTY 0x000
>  #endif
>
> +#define _PAGE_SWP_EXCLUSIVE _PAGE_LARGE /* SW pte exclusive swap bit */
> +
>  /* Set of bits not changed in pte_modify */
>  #define _PAGE_CHG_MASK (PAGE_MASK | _PAGE_SPECIAL | _PAGE_DIRTY | \
>                          _PAGE_YOUNG | _PAGE_SOFT_DIRTY)
> @@ -826,6 +828,22 @@ static inline int pmd_protnone(pmd_t pmd)
>  }
>  #endif
>
> +#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
> +static inline int pte_swp_exclusive(pte_t pte)
> +{
> +    return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
> +}
> +
> +static inline pte_t pte_swp_mkexclusive(pte_t pte)
> +{
> +    return set_pte_bit(pte, __pgprot(_PAGE_SWP_EXCLUSIVE));
> +}
> +
> +static inline pte_t pte_swp_clear_exclusive(pte_t pte)
> +{
> +    return clear_pte_bit(pte, __pgprot(_PAGE_SWP_EXCLUSIVE));
> +}
> +
>  static inline int pte_soft_dirty(pte_t pte)
>  {
>      return pte_val(pte) & _PAGE_SOFT_DIRTY;
> @@ -1715,14 +1733,15 @@ static inline int has_transparent_hugepage(void)
>   * Bits 54 and 63 are used to indicate the page type. Bit 53 marks the pte
>   * as invalid.
>   * A swap pte is indicated by bit pattern (pte & 0x201) == 0x200
> - * |                       offset                       |X11XX|type |S0|
> + * |                       offset                       |E11XX|type |S0|
>   * |0000000000111111111122222222223333333333444444444455|55555|55566|66|
>   * |0123456789012345678901234567890123456789012345678901|23456|78901|23|
>   *
>   * Bits 0-51 store the offset.
> + * Bit 52 (E) is used to remember PG_anon_exclusive.
>   * Bits 57-61 store the type.
>   * Bit 62 (S) is used for softdirty tracking.
> - * Bits 52, 55 and 56 (X) are unused.
> + * Bits 55 and 56 (X) are unused.
>   */
>
>  #define __SWP_OFFSET_MASK ((1UL << 52) - 1)

Thanks David!

Reviewed-by: Gerald Schaefer
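To make the bit numbering concrete, here is a small user-space sketch of the three helpers from the patch. It is not the kernel implementation: it works on a plain `uint64_t`, the `IBM_BIT()` macro and the `_SKETCH` names are invented for illustration, and it assumes `_PAGE_LARGE` is 0x800, which is what bit 52 works out to when bit 0 is the most significant bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* s390 numbers bits from the top: bit 0 is the MSB, bit 63 the LSB. */
#define IBM_BIT(n)            (UINT64_C(1) << (63 - (n)))

/* Bit 52 (E); assumed to equal _PAGE_LARGE, i.e. 0x800. */
#define SWP_EXCLUSIVE_SKETCH  IBM_BIT(52)

/* Swap ptes match (pte & 0x201) == 0x200, as stated in the comment block. */
static inline bool is_swap_pte_sketch(uint64_t pte)
{
    return (pte & UINT64_C(0x201)) == UINT64_C(0x200);
}

static inline bool swp_exclusive(uint64_t pte)
{
    return (pte & SWP_EXCLUSIVE_SKETCH) != 0;
}

static inline uint64_t swp_mkexclusive(uint64_t pte)
{
    assert(is_swap_pte_sketch(pte)); /* only meaningful for swap ptes */
    return pte | SWP_EXCLUSIVE_SKETCH;
}

static inline uint64_t swp_clear_exclusive(uint64_t pte)
{
    assert(is_swap_pte_sketch(pte));
    return pte & ~SWP_EXCLUSIVE_SKETCH;
}
```

Setting or clearing bit 52 leaves bits 54 and 63 alone, so the entry keeps matching the swap pte pattern either way.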
Re: [PATCH v2 5/8] s390/pgtable: cleanup description of swp pte layout
On Tue, 29 Mar 2022 18:43:26 +0200
David Hildenbrand wrote:

> Bit 52 and bit 55 don't have to be zero: they only trigger a
> translation-specification exception if the PTE is marked as valid, which
> is not the case for swap ptes.
>
> Document which bits are used for what, and which ones are unused.
>
> Signed-off-by: David Hildenbrand
> ---
>  arch/s390/include/asm/pgtable.h | 17 -
>  1 file changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
> index 9df679152620..3982575bb586 100644
> --- a/arch/s390/include/asm/pgtable.h
> +++ b/arch/s390/include/asm/pgtable.h
> @@ -1712,18 +1712,17 @@ static inline int has_transparent_hugepage(void)
>  /*
>   * 64 bit swap entry format:
>   * A page-table entry has some bits we have to treat in a special way.
> - * Bits 52 and bit 55 have to be zero, otherwise a specification
> - * exception will occur instead of a page translation exception. The
> - * specification exception has the bad habit not to store necessary
> - * information in the lowcore.
> - * Bits 54 and 63 are used to indicate the page type.
> + * Bits 54 and 63 are used to indicate the page type. Bit 53 marks the pte
> + * as invalid.
>   * A swap pte is indicated by bit pattern (pte & 0x201) == 0x200
> - * This leaves the bits 0-51 and bits 56-62 to store type and offset.
> - * We use the 5 bits from 57-61 for the type and the 52 bits from 0-51
> - * for the offset.
> - * |                       offset                       |01100|type |00|
> + * |                       offset                       |X11XX|type |S0|
>   * |0000000000111111111122222222223333333333444444444455|55555|55566|66|
>   * |0123456789012345678901234567890123456789012345678901|23456|78901|23|
> + *
> + * Bits 0-51 store the offset.
> + * Bits 57-61 store the type.
> + * Bit 62 (S) is used for softdirty tracking.
> + * Bits 52, 55 and 56 (X) are unused.
>   */
>
>  #define __SWP_OFFSET_MASK ((1UL << 52) - 1)

Thanks David!

Reviewed-by: Gerald Schaefer
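The layout described in that comment can be tried out in user space. The sketch below is an illustration only, not the kernel code: the `SWP_*`/`PTE_*` names are made up, and the shift values (12 for the offset, 2 for the type) and the 0x400/0x200 constants are derived here from the documented bit positions, with bit 0 being the most significant bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SWP_OFFSET_MASK   ((UINT64_C(1) << 52) - 1)  /* bits 0-51 */
#define SWP_OFFSET_SHIFT  12                         /* 63 - 51 */
#define SWP_TYPE_MASK     ((UINT64_C(1) << 5) - 1)   /* bits 57-61 */
#define SWP_TYPE_SHIFT    2                          /* 63 - 61 */
#define PTE_BIT53         UINT64_C(0x400)            /* marks the pte invalid */
#define PTE_BIT54         UINT64_C(0x200)            /* page type, with bit 63 */

/* Build a swap pte: bits 53 and 54 set, bit 63 clear, plus type and offset. */
static inline uint64_t mk_swap_pte(uint64_t type, uint64_t offset)
{
    assert(type <= SWP_TYPE_MASK);
    assert(offset <= SWP_OFFSET_MASK);
    return PTE_BIT53 | PTE_BIT54 |
           (offset << SWP_OFFSET_SHIFT) |
           (type << SWP_TYPE_SHIFT);
}

/* "A swap pte is indicated by bit pattern (pte & 0x201) == 0x200". */
static inline bool is_swap_pte(uint64_t pte)
{
    return (pte & UINT64_C(0x201)) == UINT64_C(0x200);
}

static inline uint64_t swp_type(uint64_t pte)
{
    return (pte >> SWP_TYPE_SHIFT) & SWP_TYPE_MASK;
}

static inline uint64_t swp_offset(uint64_t pte)
{
    return (pte >> SWP_OFFSET_SHIFT) & SWP_OFFSET_MASK;
}
```

For example, type 3 with offset 0x12345 encodes to 0x1234560c: the offset lands above bit 11, 0x600 covers bits 53 and 54, and the type occupies the 0x7c field.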
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.18-1 tag
On Wed, Mar 30, 2022 at 3:21 PM Michael Ellerman wrote:
> Michal Suchánek writes:
> > On Mon, Mar 28, 2022 at 08:07:13PM +1100, Michael Ellerman wrote:
> >> No you're right, we have moved away from them, but not entirely.
> >>
> >> Function descriptors are still used for 64-bit big endian, but they're
> >> not used for 64-bit little endian, or 32-bit.
> >
> > There was a patch to use ABIv2 for ppc64 big endian. I suppose that
> > would rid us of the function descriptors for good.
>
> It would be nice.
>
> The hesitation in the past was that the GNU toolchain developers don't
> officially support BE+ELFv2, though it is in use so it does work.

It clearly made sense to wait while BE+ELFv1 was commonly used and
well tested, but as that is getting less common each year, getting
ELFv1 out of the picture would appear to make the setup less
obscure, not more.

Arnd
[PATCH v2 3/3] powerpc/64: remove system call instruction emulation
From: Nicholas Piggin

emulate_step() instruction emulation including sc instruction emulation
initially appeared in xmon. It was then moved into sstep.c where kprobes
could use it too, and later hw_breakpoint and uprobes started to use it.

Until uprobes, the only instruction emulation users were for kernel mode
instructions.

- xmon only steps / breaks on kernel addresses.
- kprobes is kernel only.
- hw_breakpoint only emulates kernel instructions, single steps user.

At one point, there was support for the kernel to execute sc
instructions, although that is long removed and it's not clear whether
there were any in-tree users. So system call emulation is not required
by the above users.

uprobes uses emulate_step and it appears possible to emulate sc
instruction in userspace. Userspace system call emulation is broken and
it's not clear it ever worked well.

The big complication is that userspace takes an interrupt to the kernel
to emulate the instruction. The user->kernel interrupt sets up registers
and interrupt stack frame expecting to return to userspace, then system
call instruction emulation re-directs that stack frame to the kernel,
early in the system call interrupt handler. This means the interrupt
return code takes the kernel->kernel restore path, which does not
restore everything as the system call interrupt handler would expect
coming from userspace. regs->iamr appears to get lost for example,
because the kernel->kernel return does not restore the user iamr.
Accounting such as irqflags tracing and CPU accounting does not get
flipped back to user mode as the system call handler expects, so those
appear to enter the kernel twice without returning to userspace.

These things may be individually fixable with various complications, but
it is a big complexity for unclear real benefit.

Furthermore, it is not possible to single step a system call instruction
since it causes an interrupt. As such, a separate patch disables probing
on system call instructions.
This patch removes system call emulation and disables stepping system
calls.

Signed-off-by: Nicholas Piggin
[minor commit log edit, and also get rid of '#ifdef CONFIG_PPC64']
Signed-off-by: Naveen N. Rao
---
 arch/powerpc/kernel/interrupt_64.S | 10 ---
 arch/powerpc/lib/sstep.c | 46 +++---
 2 files changed, 10 insertions(+), 46 deletions(-)

diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
index 7bab2d7de372e0..6471034c790973 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -219,16 +219,6 @@ system_call_vectored common 0x3000
  */
 system_call_vectored sigill 0x7ff0

-
-/*
- * Entered via kernel return set up by kernel/sstep.c, must match entry regs
- */
-    .globl system_call_vectored_emulate
-system_call_vectored_emulate:
-_ASM_NOKPROBE_SYMBOL(system_call_vectored_emulate)
-    li r10,IRQS_ALL_DISABLED
-    stb r10,PACAIRQSOFTMASK(r13)
-    b system_call_vectored_common
 #endif /* CONFIG_PPC_BOOK3S */

     .balign IFETCH_ALIGN_BYTES
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 3fda8d0a05b43f..01c8fd39f34981 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -15,9 +15,6 @@
 #include
 #include

-extern char system_call_common[];
-extern char system_call_vectored_emulate[];
-
 #ifdef CONFIG_PPC64
 /* Bits in SRR1 that are copied from MSR */
 #define MSR_MASK 0x87c0UL
@@ -1376,7 +1373,6 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
         if (branch_taken(word, regs, op))
             op->type |= BRTAKEN;
         return 1;
-#ifdef CONFIG_PPC64
     case 17: /* sc */
         if ((word & 0xfe2) == 2)
             op->type = SYSCALL;
@@ -1388,7 +1384,6 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
         } else
             op->type = UNKNOWN;
         return 0;
-#endif
     case 18: /* b */
         op->type = BRANCH | BRTAKEN;
         imm = word & 0x03fc;
@@ -3643,43 +3638,22 @@ int emulate_step(struct pt_regs *regs, ppc_inst_t instr)
             regs_set_return_msr(regs, (regs->msr & ~op.val) | (val & op.val));
         goto instr_done;

-#ifdef CONFIG_PPC64
     case SYSCALL: /* sc */
         /*
-         * N.B. this uses knowledge about how the syscall
-         * entry code works. If that is changed, this will
-         * need to be changed also.
+         * Per ISA v3.1, section 7.5.15 'Trace Interrupt', we can't
+         * single step a system call instruction:
+         *
+         * Successful completion for an instruction means that the
+         * instruction caused no other interrupt. Thus a Trace
+         * interrupt never occurs for a System Call
[PATCH v2 2/3] powerpc: Reject probes on instructions that can't be single stepped
Per the ISA, a Trace interrupt is not generated for:
- [h|u]rfi[d]
- rfscv
- sc, scv, and Trap instructions that trap
- Power-Saving Mode instructions
- other instructions that cause interrupts (other than Trace interrupts)
- the first instructions of any interrupt handler (applies to Branch and
  Single Step tracing; CIABR matches may still occur)
- instructions that are emulated by software

Add a helper to check for instructions belonging to the first four
categories above and to reject kprobes, uprobes and xmon breakpoints on
such instructions. We reject probing on instructions belonging to these
categories across all ISA versions and across both BookS and BookE.

For trap instructions, we can't know in advance if they can cause a
trap, and there is no good reason to allow probing on those. Also,
uprobes already refuses to probe trap instructions and kprobes does not
allow probes on trap instructions used for kernel warnings and bugs. As
such, stop allowing any type of probes/breakpoints on trap instructions
across uprobes, kprobes and xmon.

For some of the fp/altivec instructions that can generate an interrupt
and which we emulate in the kernel (altivec assist, for example), we
check and turn off single stepping in emulate_single_step().

Instructions generating a DSI are restarted and single stepping normally
completes once the instruction is completed.

In uprobes, if a single stepped instruction results in a non-fatal
signal being delivered to the task, such signals are "delayed" until
after the instruction completes. For fatal signals, single stepping is
cancelled and the instruction restarted in-place so that the core dump
captures proper addresses.

In kprobes, we do not allow probes on instructions having an extable
entry and we also do not allow probing interrupt vectors.

Signed-off-by: Naveen N. Rao
---
 arch/powerpc/include/asm/ppc-opcode.h | 18 ++
 arch/powerpc/include/asm/probes.h     | 36 +++
 arch/powerpc/kernel/kprobes.c         |  4 +--
 arch/powerpc/kernel/uprobes.c         |  5
 arch/powerpc/xmon/xmon.c              | 11
 5 files changed, 66 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index a5d89cd3e8d12d..683e9bc618a74d 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -130,6 +130,8 @@
 #define OP_PREFIX       1
 #define OP_TRAP_64      2
 #define OP_TRAP         3
+#define OP_SC           17
+#define OP_19           19
 #define OP_31           31
 #define OP_LWZ          32
 #define OP_LWZU         33
@@ -159,6 +161,20 @@
 #define OP_LD           58
 #define OP_STD          62
 
+#define OP_19_XOP_RFID          18
+#define OP_19_XOP_RFMCI         38
+#define OP_19_XOP_RFDI          39
+#define OP_19_XOP_RFI           50
+#define OP_19_XOP_RFCI          51
+#define OP_19_XOP_RFSCV         82
+#define OP_19_XOP_HRFID         274
+#define OP_19_XOP_URFID         306
+#define OP_19_XOP_STOP          370
+#define OP_19_XOP_DOZE          402
+#define OP_19_XOP_NAP           434
+#define OP_19_XOP_SLEEP         466
+#define OP_19_XOP_RVWINKLE      498
+
 #define OP_31_XOP_TRAP      4
 #define OP_31_XOP_LDX       21
 #define OP_31_XOP_LWZX      23
@@ -179,6 +195,8 @@
 #define OP_31_XOP_LHZUX     311
 #define OP_31_XOP_MSGSNDP   142
 #define OP_31_XOP_MSGCLRP   174
+#define OP_31_XOP_MTMSR     146
+#define OP_31_XOP_MTMSRD    178
 #define OP_31_XOP_TLBIE     306
 #define OP_31_XOP_MFSPR     339
 #define OP_31_XOP_LWAX      341
diff --git a/arch/powerpc/include/asm/probes.h b/arch/powerpc/include/asm/probes.h
index c5d984700d241a..6f66e358aa3780 100644
--- a/arch/powerpc/include/asm/probes.h
+++ b/arch/powerpc/include/asm/probes.h
@@ -8,6 +8,7 @@
  * Copyright IBM Corporation, 2012
  */
 #include
+#include
 
 typedef u32 ppc_opcode_t;
 #define BREAKPOINT_INSTRUCTION  0x7fe00008      /* trap */
@@ -31,6 +32,41 @@ typedef u32 ppc_opcode_t;
 #define MSR_SINGLESTEP  (MSR_SE)
 #endif
 
+static inline bool can_single_step(u32 inst)
+{
+        switch (get_op(inst)) {
+        case OP_TRAP_64:        return false;
+        case OP_TRAP:           return false;
+        case OP_SC:             return false;
+        case OP_19:
+                switch (get_xop(inst)) {
+                case OP_19_XOP_RFID:            return false;
+                case OP_19_XOP_RFMCI:           return false;
+                case OP_19_XOP_RFDI:            return false;
+                case OP_19_XOP_RFI:             return false;
+                case OP_19_XOP_RFCI:            return false;
+                case OP_19_XOP_RFSCV:           return false;
+                case OP_19_XOP_HRFID:           return false;
+                case OP_19_XOP_URFID:           return false;
+                case OP_19_XOP_STOP:            return false;
+                case OP_19_XOP_DOZE:            return false;
+                case
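For readers following the decode logic, here is a minimal user-space sketch of the same idea. It assumes the primary opcode is the top six bits of the instruction word and the extended opcode is the ten bits just above the lowest bit, matching the `inst >> 26` and `(inst >> 1) & 0x3ff` expressions in the v1 patch. The names `sketch_get_op`, `sketch_get_xop` and `sketch_can_single_step` are hypothetical, and only a subset of the rejected instructions is covered, so this is an illustration and not the kernel helper itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Opcode values copied from the ppc-opcode.h definitions in this series. */
#define OP_TRAP_64          2
#define OP_TRAP             3
#define OP_SC               17
#define OP_19               19
#define OP_31               31
#define OP_19_XOP_RFID      18
#define OP_19_XOP_HRFID     274
#define OP_31_XOP_TRAP      4
#define OP_31_XOP_MTMSR     146
#define OP_31_XOP_MTMSRD    178

/* Hypothetical stand-in for get_op(): primary opcode, top 6 bits. */
static inline unsigned int sketch_get_op(uint32_t inst)
{
    return inst >> 26;
}

/* Hypothetical stand-in for get_xop(): 10-bit extended opcode. */
static inline unsigned int sketch_get_xop(uint32_t inst)
{
    return (inst >> 1) & 0x3ff;
}

/* Returns false for instructions this sketch treats as not steppable. */
static bool sketch_can_single_step(uint32_t inst)
{
    switch (sketch_get_op(inst)) {
    case OP_TRAP_64:
    case OP_TRAP:
    case OP_SC:
        return false;
    case OP_19:
        switch (sketch_get_xop(inst)) {
        case OP_19_XOP_RFID:
        case OP_19_XOP_HRFID:
            return false;
        }
        break;
    case OP_31:
        switch (sketch_get_xop(inst)) {
        case OP_31_XOP_TRAP:
        case OP_31_XOP_MTMSR:
        case OP_31_XOP_MTMSRD:
            return false;
        }
        break;
    }
    return true;
}
```

With these definitions, an `sc` encoding such as 0x44000002 is rejected, while a `nop` (0x60000000) is accepted. A caller would typically run such a check before installing a probe and return an error when it fails.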
[PATCH v2 1/3] powerpc: Sort and de-dup primary opcodes in ppc-opcode.h
Some of the primary opcodes are duplicated. Remove those, and sort the
rest of the primary opcodes to make it easy to read.

Signed-off-by: Naveen N. Rao
---
 arch/powerpc/include/asm/ppc-opcode.h | 69 ---
 1 file changed, 31 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 82f1f0041c6f79..a5d89cd3e8d12d 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -127,8 +127,37 @@
 /* opcode and xopcode for instructions */
-#define OP_TRAP 3
-#define OP_TRAP_64 2
+#define OP_PREFIX       1
+#define OP_TRAP_64      2
+#define OP_TRAP         3
+#define OP_31           31
+#define OP_LWZ          32
+#define OP_LWZU         33
+#define OP_LBZ          34
+#define OP_LBZU         35
+#define OP_STW          36
+#define OP_STWU         37
+#define OP_STB          38
+#define OP_STBU         39
+#define OP_LHZ          40
+#define OP_LHZU         41
+#define OP_LHA          42
+#define OP_LHAU         43
+#define OP_STH          44
+#define OP_STHU         45
+#define OP_LMW          46
+#define OP_STMW         47
+#define OP_LFS          48
+#define OP_LFSU         49
+#define OP_LFD          50
+#define OP_LFDU         51
+#define OP_STFS         52
+#define OP_STFSU        53
+#define OP_STFD         54
+#define OP_STFDU        55
+#define OP_LQ           56
+#define OP_LD           58
+#define OP_STD          62
 
 #define OP_31_XOP_TRAP      4
 #define OP_31_XOP_LDX       21
@@ -208,42 +237,6 @@
 /* VMX Vector Store Instructions */
 #define OP_31_XOP_STVX      231
 
-/* Prefixed Instructions */
-#define OP_PREFIX 1
-
-#define OP_31 31
-#define OP_LWZ 32
-#define OP_STFS 52
-#define OP_STFSU 53
-#define OP_STFD 54
-#define OP_STFDU 55
-#define OP_LD 58
-#define OP_LWZU 33
-#define OP_LBZ 34
-#define OP_LBZU 35
-#define OP_STW 36
-#define OP_STWU 37
-#define OP_STD 62
-#define OP_STB 38
-#define OP_STBU 39
-#define OP_LHZ 40
-#define OP_LHZU 41
-#define OP_LHA 42
-#define OP_LHAU 43
-#define OP_STH 44
-#define OP_STHU 45
-#define OP_LMW 46
-#define OP_STMW 47
-#define OP_LFS 48
-#define OP_LFSU 49
-#define OP_LFD 50
-#define OP_LFDU 51
-#define OP_STFS 52
-#define OP_STFSU 53
-#define OP_STFD 54
-#define OP_STFDU 55
-#define OP_LQ 56
-
 /* sorted alphabetically */
 #define PPC_INST_BCCTR_FLUSH    0x4c400420
 #define PPC_INST_COPY           0x7c20060c
--
2.35.1
[PATCH v2 0/3] powerpc: Remove system call emulation
Since v1, the main changes are the use of helpers to decode the
primary/extended opcode and the addition of macros for some of the
opcodes used.

- Naveen

Naveen N. Rao (2):
  powerpc: Sort and de-dup primary opcodes in ppc-opcode.h
  powerpc: Reject probes on instructions that can't be single stepped

Nicholas Piggin (1):
  powerpc/64: remove system call instruction emulation

 arch/powerpc/include/asm/ppc-opcode.h | 87 +++
 arch/powerpc/include/asm/probes.h     | 36 +++
 arch/powerpc/kernel/interrupt_64.S    | 10 ---
 arch/powerpc/kernel/kprobes.c         |  4 +-
 arch/powerpc/kernel/uprobes.c         |  5 ++
 arch/powerpc/lib/sstep.c              | 46 +++---
 arch/powerpc/xmon/xmon.c              | 11 ++--
 7 files changed, 107 insertions(+), 92 deletions(-)

base-commit: e8833c5edc5903f8c8c4fa3dd4f34d6b813c87c8
--
2.35.1
Re: [PATCH] powerpc/numa: Handle partially initialized numa nodes
On Wed, Mar 30, 2022 at 07:21:23PM +0530, Srikar Dronamraju wrote:
> With commit 09f49dca570a ("mm: handle uninitialized numa nodes
> gracefully") NODE_DATA for even a memoryless/cpuless node is partially
> initialized at boot time.
>
> Before onlining the node, current Powerpc code checks for NODE_DATA to
> be NULL. However since NODE_DATA is partially initialized, this check
> will end up always being false.
>
> This causes hotplugging a CPU to a memoryless/cpuless node to fail.
>
> Before adding CPUs
> $ numactl -H
> available: 1 nodes (4)
> node 4 cpus: 0 1 2 3 4 5 6 7
> node 4 size: 97372 MB
> node 4 free: 95545 MB
> node distances:
> node   4
>   4:  10
>
> $ lparstat
> System Configuration
> type=Dedicated mode=Capped smt=8 lcpu=1 mem=99709440 kB cpus=0 ent=1.00
>
> %user  %sys  %wait  %idle  physc  %entc  lbusy   app     vcsw  phint
> -----  ----  -----  -----  -----  -----  -----  ----  -------  -----
>  2.66  2.67   0.16  94.51   0.00   0.00   5.33  0.00    67749      0
>
> After hotplugging 32 cores
> $ numactl -H
> node 4 cpus: 0 1 2 3 4 5 6 7 120 121 122 123 124 125 126 127 128 129 130
> 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148
> 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166
> 167 168 169 170 171 172 173 174 175
> node 4 size: 97372 MB
> node 4 free: 93636 MB
> node distances:
> node   4
>   4:  10
>
> $ lparstat
> System Configuration
> type=Dedicated mode=Capped smt=8 lcpu=33 mem=99709440 kB cpus=0 ent=33.00
>
> %user  %sys  %wait  %idle  physc  %entc  lbusy   app     vcsw  phint
> -----  ----  -----  -----  -----  -----  -----  ----  -------  -----
>  0.04  0.02   0.00  99.94   0.00   0.00   0.06  0.00  1128751      3
>
> As we can see numactl is listing only 8 cores while lparstat is showing
> 33 cores.
>
> Also dmesg is showing messages like:
> [ 2261.318350 ] BUG: arch topology borken
> [ 2261.318357 ] the DIE domain not a subset of the NODE domain
>
> Fixes: 09f49dca570a ("mm: handle uninitialized numa nodes gracefully")
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux...@kvack.org
> Cc: Michal Hocko
> Cc: Michael Ellerman
> Reported-by: Geetika Moolchandani
> Signed-off-by: Srikar Dronamraju

Acked-by: Oscar Salvador

Thanks Srikar!

> ---
>  arch/powerpc/mm/numa.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index b9b7fefbb64b..13022d734951 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -1436,7 +1436,7 @@ int find_and_online_cpu_nid(int cpu)
>          if (new_nid < 0 || !node_possible(new_nid))
>                  new_nid = first_online_node;
>
> -        if (NODE_DATA(new_nid) == NULL) {
> +        if (!node_online(new_nid)) {
>  #ifdef CONFIG_MEMORY_HOTPLUG
>                  /*
>                   * Need to ensure that NODE_DATA is initialized for a node from
> --
> 2.27.0
>
>

--
Oscar Salvador
SUSE Labs
[PATCH] powerpc/numa: Handle partially initialized numa nodes
With commit 09f49dca570a ("mm: handle uninitialized numa nodes
gracefully") NODE_DATA for even a memoryless/cpuless node is partially
initialized at boot time.

Before onlining the node, current Powerpc code checks for NODE_DATA to
be NULL. However since NODE_DATA is partially initialized, this check
will end up always being false.

This causes hotplugging a CPU to a memoryless/cpuless node to fail.

Before adding CPUs
$ numactl -H
available: 1 nodes (4)
node 4 cpus: 0 1 2 3 4 5 6 7
node 4 size: 97372 MB
node 4 free: 95545 MB
node distances:
node   4
  4:  10

$ lparstat
System Configuration
type=Dedicated mode=Capped smt=8 lcpu=1 mem=99709440 kB cpus=0 ent=1.00

%user  %sys  %wait  %idle  physc  %entc  lbusy   app     vcsw  phint
-----  ----  -----  -----  -----  -----  -----  ----  -------  -----
 2.66  2.67   0.16  94.51   0.00   0.00   5.33  0.00    67749      0

After hotplugging 32 cores
$ numactl -H
node 4 cpus: 0 1 2 3 4 5 6 7 120 121 122 123 124 125 126 127 128 129 130
131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148
149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166
167 168 169 170 171 172 173 174 175
node 4 size: 97372 MB
node 4 free: 93636 MB
node distances:
node   4
  4:  10

$ lparstat
System Configuration
type=Dedicated mode=Capped smt=8 lcpu=33 mem=99709440 kB cpus=0 ent=33.00

%user  %sys  %wait  %idle  physc  %entc  lbusy   app     vcsw  phint
-----  ----  -----  -----  -----  -----  -----  ----  -------  -----
 0.04  0.02   0.00  99.94   0.00   0.00   0.06  0.00  1128751      3

As we can see numactl is listing only 8 cores while lparstat is showing
33 cores.

Also dmesg is showing messages like:
[ 2261.318350 ] BUG: arch topology borken
[ 2261.318357 ] the DIE domain not a subset of the NODE domain

Fixes: 09f49dca570a ("mm: handle uninitialized numa nodes gracefully")
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org
Cc: Michal Hocko
Cc: Michael Ellerman
Reported-by: Geetika Moolchandani
Signed-off-by: Srikar Dronamraju
---
 arch/powerpc/mm/numa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index b9b7fefbb64b..13022d734951 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1436,7 +1436,7 @@ int find_and_online_cpu_nid(int cpu)
         if (new_nid < 0 || !node_possible(new_nid))
                 new_nid = first_online_node;
 
-        if (NODE_DATA(new_nid) == NULL) {
+        if (!node_online(new_nid)) {
 #ifdef CONFIG_MEMORY_HOTPLUG
                 /*
                  * Need to ensure that NODE_DATA is initialized for a node from
--
2.27.0
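To illustrate why the old test stopped working, here is a small user-space model. Every name in it (`model_node_data`, `model_node_online`, `model_boot`, `needs_onlining_old`, `needs_onlining_new`) is a hypothetical stand-in, not the kernel's NODE_DATA() or node_online(). The only assumption taken from the changelog is that node data now exists after boot even for nodes that are not online.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_NODES 8

/* Hypothetical, heavily reduced stand-in for per-node data. */
struct model_pgdat {
    int node_id;
};

static struct model_pgdat model_pgdat_storage[MODEL_MAX_NODES];
static struct model_pgdat *model_node_data[MODEL_MAX_NODES];
static bool model_node_online[MODEL_MAX_NODES];

/*
 * Model of boot after the mm change: every possible node gets
 * (partially initialized) data, but only one node is online.
 */
static void model_boot(int first_online_node)
{
    for (int nid = 0; nid < MODEL_MAX_NODES; nid++) {
        model_pgdat_storage[nid].node_id = nid;
        model_node_data[nid] = &model_pgdat_storage[nid];
        model_node_online[nid] = (nid == first_online_node);
    }
}

/* Old style check: never true once node data is always allocated. */
static bool needs_onlining_old(int nid)
{
    return model_node_data[nid] == NULL;
}

/* New style check: asks about the online state directly. */
static bool needs_onlining_new(int nid)
{
    return !model_node_online[nid];
}
```

After `model_boot(4)`, the old check reports that node 5 needs no onlining, so the hotplug path would skip it, while the new check correctly reports that it does.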
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.18-1 tag
Michal Suchánek writes:
> On Mon, Mar 28, 2022 at 08:07:13PM +1100, Michael Ellerman wrote:
>> Linus Torvalds writes:
>> > On Fri, Mar 25, 2022 at 3:25 AM Michael Ellerman wrote:
>> >
>> > That said:
>> >
>> >> There's a series of commits cleaning up function descriptor handling,
>> >
>> > For some reason I also thought that powerpc had actually moved away
>> > from function descriptors, so I'm clearly not keeping up with the
>> > times.
>>
>> No you're right, we have moved away from them, but not entirely.
>>
>> Function descriptors are still used for 64-bit big endian, but they're
>> not used for 64-bit little endian, or 32-bit.
>
> There was a patch to use ABIv2 for ppc64 big endian. I suppose that
> would rid us of the function descriptors for good.

It would be nice.

The hesitation in the past was that the GNU toolchain developers don't
officially support BE+ELFv2, though it is in use so it does work.

> Maybe it's worth resurrecting?

Yeah maybe we should think about it again. If it builds with clang/lld
that would be a real plus.

cheers
Re: [PATCH 1/2] powerpc: Reject probes on instructions that can't be single stepped
Christophe Leroy wrote:
On 28/03/2022 at 19:20, Naveen N. Rao wrote:
Michael Ellerman wrote:
Murilo Opsfelder Araújo writes:
On 3/23/22 08:51, Naveen N. Rao wrote:

+static inline bool can_single_step(u32 inst)
+{
+        switch (inst >> 26) {

Can't ppc_inst_primary_opcode() be used instead?

I didn't want to add a dependency on inst.h. But I guess I can very well
move this out of the header into some .c file. I will see if I can make
that work.

Maybe use get_op() from asm/disassemble.h ?

+        case 31:
+                switch ((inst >> 1) & 0x3ff) {

For that one you have get_xop() in asm/disassemble.h

Nice! I will use those.

+                case 4:         /* tw */

OP_31_XOP_TRAP

+                        return false;
+                case 68:        /* td */

OP_31_XOP_TRAP_64

+                        return false;
+                case 146:       /* mtmsr */
+                        return false;
+                case 178:       /* mtmsrd */
+                        return false;
+                }
+                break;
+        }
+        return true;
+}
+

Can't OP_* definitions from ppc-opcode.h be used for all of these
switch-case statements?

Yes please. And add any that are missing.

We only have OP_31 from the above list now. I'll add the rest.

Isn't there also OP_TRAP and OP_TRAP_64 ?

Ah, the list clearly isn't sorted, and there are some duplicates there :)

Thanks,
Naveen
Re: [PATCH] powerpc/rtas: Keep MSR RI set when calling RTAS
On 29/03/2022, 13:14:10, Michael Ellerman wrote: > Laurent Dufour writes: >> On 29/03/2022, 10:31:33, Nicholas Piggin wrote: >>> Excerpts from Laurent Dufour's message of March 17, 2022 9:06 pm: RTAS runs in real mode (MSR[DR] and MSR[IR] unset) and in 32bits mode (MSR[SF] unset). The change in MSR is done in enter_rtas() in a relatively complex way, since the MSR value could be hardcoded. Furthermore, a panic has been reported when hitting the watchdog interrupt while running in RTAS, this leads to the following stack trace: [69244.027433][ C24] watchdog: CPU 24 Hard LOCKUP [69244.027442][ C24] watchdog: CPU 24 TB:997512652051031, last heartbeat TB:997504470175378 (15980ms ago) [69244.027451][ C24] Modules linked in: chacha_generic(E) libchacha(E) xxhash_generic(E) wp512(E) sha3_generic(E) rmd160(E) poly1305_generic(E) libpoly1305(E) michael_mic(E) md4(E) crc32_generic(E) cmac(E) ccm(E) algif_rng(E) twofish_generic(E) twofish_common(E) serpent_generic(E) fcrypt(E) des_generic(E) libdes(E) cast6_generic(E) cast5_generic(E) cast_common(E) camellia_generic(E) blowfish_generic(E) blowfish_common(E) algif_skcipher(E) algif_hash(E) gcm(E) algif_aead(E) af_alg(E) tun(E) rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) dns_resolver(E) rpadlpar_io(EX) rpaphp(EX) xsk_diag(E) tcp_diag(E) udp_diag(E) raw_diag(E) inet_diag(E) unix_diag(E) af_packet_diag(E) netlink_diag(E) nfsv3(E) nfs_acl(E) nfs(E) lockd(E) grace(E) sunrpc(E) fscache(E) netfs(E) af_packet(E) rfkill(E) bonding(E) tls(E) ibmveth(EX) crct10dif_vpmsum(E) rtc_generic(E) drm(E) drm_panel_orientation_quirks(E) fuse(E) configfs(E) backlight(E) ip_tables(E) x_tables(E) dm_service_time(E) sd_mod(E) t10_pi(E) [69244.027555][ C24] ibmvfc(EX) scsi_transport_fc(E) vmx_crypto(E) gf128mul(E) btrfs(E) blake2b_generic(E) libcrc32c(E) crc32c_vpmsum(E) xor(E) raid6_pq(E) dm_mirror(E) dm_region_hash(E) dm_log(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) [69244.027587][ C24] Supported: No, 
Unreleased kernel [69244.027600][ C24] CPU: 24 PID: 87504 Comm: drmgr Kdump: loaded Tainted: GE X5.14.21-150400.71.1.bz196362_2-default #1 SLE15-SP4 (unreleased) 0d821077ef4faa8dfaf370efb5fdca1fa35f4e2c [69244.027609][ C24] NIP: 1fb41050 LR: 1fb4104c CTR: [69244.027612][ C24] REGS: cfc33d60 TRAP: 0100 Tainted: G E X (5.14.21-150400.71.1.bz196362_2-default) [69244.027615][ C24] MSR: 82981000 CR: 4882 XER: 20040020 [69244.027625][ C24] CFAR: 011c IRQMASK: 1 [69244.027625][ C24] GPR00: 0003 0001 50dc [69244.027625][ C24] GPR04: 1ffb6100 0020 0001 1fb09010 [69244.027625][ C24] GPR08: 2000 [69244.027625][ C24] GPR12: 8004072a40a8 cff8b680 0007 0034 [69244.027625][ C24] GPR16: 1fbf6e94 1fbf6d84 1fbd1db0 1fb3f008 [69244.027625][ C24] GPR20: 1fb41018 017f f68f [69244.027625][ C24] GPR24: 1fb18fe8 1fb3e000 1fb1adc0 1fb1cf40 [69244.027625][ C24] GPR28: 1fb26000 1fb460f0 1fb17f18 1fb17000 [69244.027663][ C24] NIP [1fb41050] 0x1fb41050 [69244.027696][ C24] LR [1fb4104c] 0x1fb4104c [69244.027699][ C24] Call Trace: [69244.027701][ C24] Instruction dump: [69244.027723][ C24] [69244.027728][ C24] [69244.027762][T87504] Oops: Unrecoverable System Reset, sig: 6 [#1] [69244.028044][T87504] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries [69244.028089][T87504] Modules linked in: chacha_generic(E) libchacha(E) xxhash_generic(E) wp512(E) sha3_generic(E) rmd160(E) poly1305_generic(E) libpoly1305(E) michael_mic(E) md4(E) crc32_generic(E) cmac(E) ccm(E) algif_rng(E) twofish_generic(E) twofish_common(E) serpent_generic(E) fcrypt(E) des_generic(E) libdes(E) cast6_generic(E) cast5_generic(E) cast_common(E) camellia_generic(E) blowfish_generic(E) blowfish_common(E) algif_skcipher(E) algif_hash(E) gcm(E) algif_aead(E) af_alg(E) tun(E) rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) dns_resolver(E) rpadlpar_io(EX) rpaphp(EX) xsk_diag(E) tcp_diag(E) udp_diag(E) raw_diag(E) inet_diag(E) unix_diag(E)
[PATCH AUTOSEL 5.10 26/37] uaccess: fix type mismatch warnings from access_ok()
From: Arnd Bergmann [ Upstream commit 23fc539e81295b14b50c6ccc5baeb4f3d59d822d ] On some architectures, access_ok() does not do any argument type checking, so replacing the definition with a generic one causes a few warnings for harmless issues that were never caught before. Fix the ones that I found either through my own test builds or that were reported by the 0-day bot. Reported-by: kernel test robot Reviewed-by: Christoph Hellwig Acked-by: Dinh Nguyen Signed-off-by: Arnd Bergmann Signed-off-by: Sasha Levin --- arch/arc/kernel/process.c | 2 +- arch/arm/kernel/swp_emulate.c | 2 +- arch/arm/kernel/traps.c| 2 +- arch/csky/kernel/perf_callchain.c | 2 +- arch/csky/kernel/signal.c | 2 +- arch/nios2/kernel/signal.c | 20 +++- arch/powerpc/lib/sstep.c | 4 ++-- arch/riscv/kernel/perf_callchain.c | 4 ++-- arch/sparc/kernel/signal_32.c | 2 +- lib/test_lockup.c | 4 ++-- 10 files changed, 23 insertions(+), 21 deletions(-) diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c index 37f724ad5e39..a85e9c625ab5 100644 --- a/arch/arc/kernel/process.c +++ b/arch/arc/kernel/process.c @@ -43,7 +43,7 @@ SYSCALL_DEFINE0(arc_gettls) return task_thread_info(current)->thr_ptr; } -SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new) +SYSCALL_DEFINE3(arc_usr_cmpxchg, int __user *, uaddr, int, expected, int, new) { struct pt_regs *regs = current_pt_regs(); u32 uval; diff --git a/arch/arm/kernel/swp_emulate.c b/arch/arm/kernel/swp_emulate.c index 6166ba38bf99..b74bfcf94fb1 100644 --- a/arch/arm/kernel/swp_emulate.c +++ b/arch/arm/kernel/swp_emulate.c @@ -195,7 +195,7 @@ static int swp_handler(struct pt_regs *regs, unsigned int instr) destreg, EXTRACT_REG_NUM(instr, RT2_OFFSET), data); /* Check access in reasonable access range for both SWP and SWPB */ - if (!access_ok((address & ~3), 4)) { + if (!access_ok((void __user *)(address & ~3), 4)) { pr_debug("SWP{B} emulation: access to %p not allowed!\n", (void *)address); res = -EFAULT; diff --git 
a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c index 2d9e72ad1b0f..a531afad87fd 100644 --- a/arch/arm/kernel/traps.c +++ b/arch/arm/kernel/traps.c @@ -589,7 +589,7 @@ do_cache_op(unsigned long start, unsigned long end, int flags) if (end < start || flags) return -EINVAL; - if (!access_ok(start, end - start)) + if (!access_ok((void __user *)start, end - start)) return -EFAULT; return __do_cache_op(start, end); diff --git a/arch/csky/kernel/perf_callchain.c b/arch/csky/kernel/perf_callchain.c index 35318a635a5f..75e1f9df5f60 100644 --- a/arch/csky/kernel/perf_callchain.c +++ b/arch/csky/kernel/perf_callchain.c @@ -49,7 +49,7 @@ static unsigned long user_backtrace(struct perf_callchain_entry_ctx *entry, { struct stackframe buftail; unsigned long lr = 0; - unsigned long *user_frame_tail = (unsigned long *)fp; + unsigned long __user *user_frame_tail = (unsigned long __user *)fp; /* Check accessibility of one struct frame_tail beyond */ if (!access_ok(user_frame_tail, sizeof(buftail))) diff --git a/arch/csky/kernel/signal.c b/arch/csky/kernel/signal.c index 0ca49b5e3dd3..243228b0aa07 100644 --- a/arch/csky/kernel/signal.c +++ b/arch/csky/kernel/signal.c @@ -136,7 +136,7 @@ static inline void __user *get_sigframe(struct ksignal *ksig, static int setup_rt_frame(struct ksignal *ksig, sigset_t *set, struct pt_regs *regs) { - struct rt_sigframe *frame; + struct rt_sigframe __user *frame; int err = 0; struct csky_vdso *vdso = current->mm->context.vdso; diff --git a/arch/nios2/kernel/signal.c b/arch/nios2/kernel/signal.c index cf2dca2ac7c3..e45491d1d3e4 100644 --- a/arch/nios2/kernel/signal.c +++ b/arch/nios2/kernel/signal.c @@ -36,10 +36,10 @@ struct rt_sigframe { static inline int rt_restore_ucontext(struct pt_regs *regs, struct switch_stack *sw, - struct ucontext *uc, int *pr2) + struct ucontext __user *uc, int *pr2) { int temp; - unsigned long *gregs = uc->uc_mcontext.gregs; + unsigned long __user *gregs = uc->uc_mcontext.gregs; int err; /* Always make any pending 
restarted system calls return -EINTR */ @@ -102,10 +102,11 @@ asmlinkage int do_rt_sigreturn(struct switch_stack *sw) { struct pt_regs *regs = (struct pt_regs *)(sw + 1); /* Verify, can we follow the stack back */ - struct rt_sigframe *frame = (struct rt_sigframe *) regs->sp; + struct rt_sigframe __user *frame; sigset_t set; int rval; + frame = (struct rt_sigframe __user *) regs->sp; if
[PATCH AUTOSEL 5.15 36/50] uaccess: fix type mismatch warnings from access_ok()
From: Arnd Bergmann [ Upstream commit 23fc539e81295b14b50c6ccc5baeb4f3d59d822d ] On some architectures, access_ok() does not do any argument type checking, so replacing the definition with a generic one causes a few warnings for harmless issues that were never caught before. Fix the ones that I found either through my own test builds or that were reported by the 0-day bot. Reported-by: kernel test robot Reviewed-by: Christoph Hellwig Acked-by: Dinh Nguyen Signed-off-by: Arnd Bergmann Signed-off-by: Sasha Levin --- arch/arc/kernel/process.c | 2 +- arch/arm/kernel/swp_emulate.c | 2 +- arch/arm/kernel/traps.c| 2 +- arch/csky/kernel/perf_callchain.c | 2 +- arch/csky/kernel/signal.c | 2 +- arch/nios2/kernel/signal.c | 20 +++- arch/powerpc/lib/sstep.c | 4 ++-- arch/riscv/kernel/perf_callchain.c | 4 ++-- arch/sparc/kernel/signal_32.c | 2 +- lib/test_lockup.c | 4 ++-- 10 files changed, 23 insertions(+), 21 deletions(-) diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c index 8e90052f6f05..5f7f5aab361f 100644 --- a/arch/arc/kernel/process.c +++ b/arch/arc/kernel/process.c @@ -43,7 +43,7 @@ SYSCALL_DEFINE0(arc_gettls) return task_thread_info(current)->thr_ptr; } -SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new) +SYSCALL_DEFINE3(arc_usr_cmpxchg, int __user *, uaddr, int, expected, int, new) { struct pt_regs *regs = current_pt_regs(); u32 uval; diff --git a/arch/arm/kernel/swp_emulate.c b/arch/arm/kernel/swp_emulate.c index 6166ba38bf99..b74bfcf94fb1 100644 --- a/arch/arm/kernel/swp_emulate.c +++ b/arch/arm/kernel/swp_emulate.c @@ -195,7 +195,7 @@ static int swp_handler(struct pt_regs *regs, unsigned int instr) destreg, EXTRACT_REG_NUM(instr, RT2_OFFSET), data); /* Check access in reasonable access range for both SWP and SWPB */ - if (!access_ok((address & ~3), 4)) { + if (!access_ok((void __user *)(address & ~3), 4)) { pr_debug("SWP{B} emulation: access to %p not allowed!\n", (void *)address); res = -EFAULT; diff --git 
a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c index 655c4fe0b4d0..54abd8720dde 100644 --- a/arch/arm/kernel/traps.c +++ b/arch/arm/kernel/traps.c @@ -575,7 +575,7 @@ do_cache_op(unsigned long start, unsigned long end, int flags) if (end < start || flags) return -EINVAL; - if (!access_ok(start, end - start)) + if (!access_ok((void __user *)start, end - start)) return -EFAULT; return __do_cache_op(start, end); diff --git a/arch/csky/kernel/perf_callchain.c b/arch/csky/kernel/perf_callchain.c index 35318a635a5f..75e1f9df5f60 100644 --- a/arch/csky/kernel/perf_callchain.c +++ b/arch/csky/kernel/perf_callchain.c @@ -49,7 +49,7 @@ static unsigned long user_backtrace(struct perf_callchain_entry_ctx *entry, { struct stackframe buftail; unsigned long lr = 0; - unsigned long *user_frame_tail = (unsigned long *)fp; + unsigned long __user *user_frame_tail = (unsigned long __user *)fp; /* Check accessibility of one struct frame_tail beyond */ if (!access_ok(user_frame_tail, sizeof(buftail))) diff --git a/arch/csky/kernel/signal.c b/arch/csky/kernel/signal.c index c7b763d2f526..8867ddf3e6c7 100644 --- a/arch/csky/kernel/signal.c +++ b/arch/csky/kernel/signal.c @@ -136,7 +136,7 @@ static inline void __user *get_sigframe(struct ksignal *ksig, static int setup_rt_frame(struct ksignal *ksig, sigset_t *set, struct pt_regs *regs) { - struct rt_sigframe *frame; + struct rt_sigframe __user *frame; int err = 0; frame = get_sigframe(ksig, regs, sizeof(*frame)); diff --git a/arch/nios2/kernel/signal.c b/arch/nios2/kernel/signal.c index 2009ae2d3c3b..386e46443b60 100644 --- a/arch/nios2/kernel/signal.c +++ b/arch/nios2/kernel/signal.c @@ -36,10 +36,10 @@ struct rt_sigframe { static inline int rt_restore_ucontext(struct pt_regs *regs, struct switch_stack *sw, - struct ucontext *uc, int *pr2) + struct ucontext __user *uc, int *pr2) { int temp; - unsigned long *gregs = uc->uc_mcontext.gregs; + unsigned long __user *gregs = uc->uc_mcontext.gregs; int err; /* Always make any pending 
restarted system calls return -EINTR */ @@ -102,10 +102,11 @@ asmlinkage int do_rt_sigreturn(struct switch_stack *sw) { struct pt_regs *regs = (struct pt_regs *)(sw + 1); /* Verify, can we follow the stack back */ - struct rt_sigframe *frame = (struct rt_sigframe *) regs->sp; + struct rt_sigframe __user *frame; sigset_t set; int rval; + frame = (struct rt_sigframe __user *) regs->sp; if (!access_ok(frame,
[PATCH AUTOSEL 5.16 40/59] uaccess: fix type mismatch warnings from access_ok()
From: Arnd Bergmann [ Upstream commit 23fc539e81295b14b50c6ccc5baeb4f3d59d822d ] On some architectures, access_ok() does not do any argument type checking, so replacing the definition with a generic one causes a few warnings for harmless issues that were never caught before. Fix the ones that I found either through my own test builds or that were reported by the 0-day bot. Reported-by: kernel test robot Reviewed-by: Christoph Hellwig Acked-by: Dinh Nguyen Signed-off-by: Arnd Bergmann Signed-off-by: Sasha Levin --- arch/arc/kernel/process.c | 2 +- arch/arm/kernel/swp_emulate.c | 2 +- arch/arm/kernel/traps.c| 2 +- arch/csky/kernel/perf_callchain.c | 2 +- arch/csky/kernel/signal.c | 2 +- arch/nios2/kernel/signal.c | 20 +++- arch/powerpc/lib/sstep.c | 4 ++-- arch/riscv/kernel/perf_callchain.c | 4 ++-- arch/sparc/kernel/signal_32.c | 2 +- lib/test_lockup.c | 4 ++-- 10 files changed, 23 insertions(+), 21 deletions(-) diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c index 8e90052f6f05..5f7f5aab361f 100644 --- a/arch/arc/kernel/process.c +++ b/arch/arc/kernel/process.c @@ -43,7 +43,7 @@ SYSCALL_DEFINE0(arc_gettls) return task_thread_info(current)->thr_ptr; } -SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new) +SYSCALL_DEFINE3(arc_usr_cmpxchg, int __user *, uaddr, int, expected, int, new) { struct pt_regs *regs = current_pt_regs(); u32 uval; diff --git a/arch/arm/kernel/swp_emulate.c b/arch/arm/kernel/swp_emulate.c index 6166ba38bf99..b74bfcf94fb1 100644 --- a/arch/arm/kernel/swp_emulate.c +++ b/arch/arm/kernel/swp_emulate.c @@ -195,7 +195,7 @@ static int swp_handler(struct pt_regs *regs, unsigned int instr) destreg, EXTRACT_REG_NUM(instr, RT2_OFFSET), data); /* Check access in reasonable access range for both SWP and SWPB */ - if (!access_ok((address & ~3), 4)) { + if (!access_ok((void __user *)(address & ~3), 4)) { pr_debug("SWP{B} emulation: access to %p not allowed!\n", (void *)address); res = -EFAULT; diff --git 
a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c index 90c887aa67a4..f74460d3bef5 100644 --- a/arch/arm/kernel/traps.c +++ b/arch/arm/kernel/traps.c @@ -575,7 +575,7 @@ do_cache_op(unsigned long start, unsigned long end, int flags) if (end < start || flags) return -EINVAL; - if (!access_ok(start, end - start)) + if (!access_ok((void __user *)start, end - start)) return -EFAULT; return __do_cache_op(start, end); diff --git a/arch/csky/kernel/perf_callchain.c b/arch/csky/kernel/perf_callchain.c index 35318a635a5f..75e1f9df5f60 100644 --- a/arch/csky/kernel/perf_callchain.c +++ b/arch/csky/kernel/perf_callchain.c @@ -49,7 +49,7 @@ static unsigned long user_backtrace(struct perf_callchain_entry_ctx *entry, { struct stackframe buftail; unsigned long lr = 0; - unsigned long *user_frame_tail = (unsigned long *)fp; + unsigned long __user *user_frame_tail = (unsigned long __user *)fp; /* Check accessibility of one struct frame_tail beyond */ if (!access_ok(user_frame_tail, sizeof(buftail))) diff --git a/arch/csky/kernel/signal.c b/arch/csky/kernel/signal.c index c7b763d2f526..8867ddf3e6c7 100644 --- a/arch/csky/kernel/signal.c +++ b/arch/csky/kernel/signal.c @@ -136,7 +136,7 @@ static inline void __user *get_sigframe(struct ksignal *ksig, static int setup_rt_frame(struct ksignal *ksig, sigset_t *set, struct pt_regs *regs) { - struct rt_sigframe *frame; + struct rt_sigframe __user *frame; int err = 0; frame = get_sigframe(ksig, regs, sizeof(*frame)); diff --git a/arch/nios2/kernel/signal.c b/arch/nios2/kernel/signal.c index 2009ae2d3c3b..386e46443b60 100644 --- a/arch/nios2/kernel/signal.c +++ b/arch/nios2/kernel/signal.c @@ -36,10 +36,10 @@ struct rt_sigframe { static inline int rt_restore_ucontext(struct pt_regs *regs, struct switch_stack *sw, - struct ucontext *uc, int *pr2) + struct ucontext __user *uc, int *pr2) { int temp; - unsigned long *gregs = uc->uc_mcontext.gregs; + unsigned long __user *gregs = uc->uc_mcontext.gregs; int err; /* Always make any pending 
restarted system calls return -EINTR */ @@ -102,10 +102,11 @@ asmlinkage int do_rt_sigreturn(struct switch_stack *sw) { struct pt_regs *regs = (struct pt_regs *)(sw + 1); /* Verify, can we follow the stack back */ - struct rt_sigframe *frame = (struct rt_sigframe *) regs->sp; + struct rt_sigframe __user *frame; sigset_t set; int rval; + frame = (struct rt_sigframe __user *) regs->sp; if (!access_ok(frame,
[PATCH AUTOSEL 5.17 41/66] uaccess: fix type mismatch warnings from access_ok()
From: Arnd Bergmann [ Upstream commit 23fc539e81295b14b50c6ccc5baeb4f3d59d822d ] On some architectures, access_ok() does not do any argument type checking, so replacing the definition with a generic one causes a few warnings for harmless issues that were never caught before. Fix the ones that I found either through my own test builds or that were reported by the 0-day bot. Reported-by: kernel test robot Reviewed-by: Christoph Hellwig Acked-by: Dinh Nguyen Signed-off-by: Arnd Bergmann Signed-off-by: Sasha Levin --- arch/arc/kernel/process.c | 2 +- arch/arm/kernel/swp_emulate.c | 2 +- arch/arm/kernel/traps.c| 2 +- arch/csky/kernel/perf_callchain.c | 2 +- arch/csky/kernel/signal.c | 2 +- arch/nios2/kernel/signal.c | 20 +++- arch/powerpc/lib/sstep.c | 4 ++-- arch/riscv/kernel/perf_callchain.c | 4 ++-- arch/sparc/kernel/signal_32.c | 2 +- lib/test_lockup.c | 4 ++-- 10 files changed, 23 insertions(+), 21 deletions(-) diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c index 8e90052f6f05..5f7f5aab361f 100644 --- a/arch/arc/kernel/process.c +++ b/arch/arc/kernel/process.c @@ -43,7 +43,7 @@ SYSCALL_DEFINE0(arc_gettls) return task_thread_info(current)->thr_ptr; } -SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new) +SYSCALL_DEFINE3(arc_usr_cmpxchg, int __user *, uaddr, int, expected, int, new) { struct pt_regs *regs = current_pt_regs(); u32 uval; diff --git a/arch/arm/kernel/swp_emulate.c b/arch/arm/kernel/swp_emulate.c index 6166ba38bf99..b74bfcf94fb1 100644 --- a/arch/arm/kernel/swp_emulate.c +++ b/arch/arm/kernel/swp_emulate.c @@ -195,7 +195,7 @@ static int swp_handler(struct pt_regs *regs, unsigned int instr) destreg, EXTRACT_REG_NUM(instr, RT2_OFFSET), data); /* Check access in reasonable access range for both SWP and SWPB */ - if (!access_ok((address & ~3), 4)) { + if (!access_ok((void __user *)(address & ~3), 4)) { pr_debug("SWP{B} emulation: access to %p not allowed!\n", (void *)address); res = -EFAULT; diff --git 
a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c index cae4a748811f..5d58aee24087 100644 --- a/arch/arm/kernel/traps.c +++ b/arch/arm/kernel/traps.c @@ -577,7 +577,7 @@ do_cache_op(unsigned long start, unsigned long end, int flags) if (end < start || flags) return -EINVAL; - if (!access_ok(start, end - start)) + if (!access_ok((void __user *)start, end - start)) return -EFAULT; return __do_cache_op(start, end); diff --git a/arch/csky/kernel/perf_callchain.c b/arch/csky/kernel/perf_callchain.c index 92057de08f4f..1612f4354087 100644 --- a/arch/csky/kernel/perf_callchain.c +++ b/arch/csky/kernel/perf_callchain.c @@ -49,7 +49,7 @@ static unsigned long user_backtrace(struct perf_callchain_entry_ctx *entry, { struct stackframe buftail; unsigned long lr = 0; - unsigned long *user_frame_tail = (unsigned long *)fp; + unsigned long __user *user_frame_tail = (unsigned long __user *)fp; /* Check accessibility of one struct frame_tail beyond */ if (!access_ok(user_frame_tail, sizeof(buftail))) diff --git a/arch/csky/kernel/signal.c b/arch/csky/kernel/signal.c index c7b763d2f526..8867ddf3e6c7 100644 --- a/arch/csky/kernel/signal.c +++ b/arch/csky/kernel/signal.c @@ -136,7 +136,7 @@ static inline void __user *get_sigframe(struct ksignal *ksig, static int setup_rt_frame(struct ksignal *ksig, sigset_t *set, struct pt_regs *regs) { - struct rt_sigframe *frame; + struct rt_sigframe __user *frame; int err = 0; frame = get_sigframe(ksig, regs, sizeof(*frame)); diff --git a/arch/nios2/kernel/signal.c b/arch/nios2/kernel/signal.c index 2009ae2d3c3b..386e46443b60 100644 --- a/arch/nios2/kernel/signal.c +++ b/arch/nios2/kernel/signal.c @@ -36,10 +36,10 @@ struct rt_sigframe { static inline int rt_restore_ucontext(struct pt_regs *regs, struct switch_stack *sw, - struct ucontext *uc, int *pr2) + struct ucontext __user *uc, int *pr2) { int temp; - unsigned long *gregs = uc->uc_mcontext.gregs; + unsigned long __user *gregs = uc->uc_mcontext.gregs; int err; /* Always make any pending 
restarted system calls return -EINTR */ @@ -102,10 +102,11 @@ asmlinkage int do_rt_sigreturn(struct switch_stack *sw) { struct pt_regs *regs = (struct pt_regs *)(sw + 1); /* Verify, can we follow the stack back */ - struct rt_sigframe *frame = (struct rt_sigframe *) regs->sp; + struct rt_sigframe __user *frame; sigset_t set; int rval; + frame = (struct rt_sigframe __user *) regs->sp; if (!access_ok(frame,
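The class of warning this patch addresses can be reproduced outside the kernel. The following is a minimal userspace sketch, not the kernel's implementation: `sketch_access_ok()`, `sketch_check_word()` and `SKETCH_TASK_SIZE` are hypothetical names, and `__user` is defined to nothing here (in the kernel it is a sparse annotation). Once the first parameter is a pointer type, callers that hold an address in an `unsigned long` need the explicit cast the hunks above add.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In the kernel __user is a sparse address-space annotation; here it expands to nothing. */
#define __user

/* Hypothetical upper bound of the user address range, chosen only for this sketch. */
#define SKETCH_TASK_SIZE 0x80000000UL

/*
 * Pointer-typed stand-in for access_ok(). Passing a bare unsigned long
 * here would make the compiler complain about an integer-to-pointer
 * conversion, which is the kind of warning the patch silences.
 */
static inline int sketch_access_ok(const void __user *ptr, unsigned long size)
{
	uintptr_t addr = (uintptr_t)ptr;

	return size <= SKETCH_TASK_SIZE && addr <= SKETCH_TASK_SIZE - size;
}

/* Mirrors the swp_emulate.c hunk: align the address down, then cast explicitly. */
static inline int sketch_check_word(unsigned long address)
{
	return sketch_access_ok((void __user *)(address & ~3UL), 4);
}
```

The cast does not change the generated code; it only tells the compiler (and sparse) that the integer is intended to be a user pointer.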
Re: [PATCH] powerpc/boot: Build wrapper for an appropriate CPU
On Wed, 30 Mar 2022 at 11:33, Christophe Leroy wrote:
>
>
>
> On 30/03/2022 at 13:24, Joel Stanley wrote:
> > Currently the boot wrapper lacks a -mcpu option, so it will be built for
> > the toolchain's default cpu. This is a problem if the toolchain defaults
> > to a cpu with newer instructions.
> >
> > We could wire in TARGET_CPU but instead use the oldest supported option
> > so the wrapper runs anywhere.
> >
> > The GCC documentation states that -mcpu=powerpc64le will give us a
> > generic 64 bit powerpc machine:
> >
> >   -mcpu=powerpc, -mcpu=powerpc64, and -mcpu=powerpc64le specify pure
> >   32-bit PowerPC (either endian), 64-bit big endian PowerPC and 64-bit
> >   little endian PowerPC architecture machine types, with an appropriate,
> >   generic processor model assumed for scheduling purposes.
> >
> > So do that for each of the three machines.
> >
> > This bug was found when building the kernel with a toolchain that
> > defaulted to power10, resulting in a pcrel enabled wrapper which fails
> > to link:
> >
> >   arch/powerpc/boot/wrapper.a(crt0.o): in function `p_base':
> >   (.text+0x150): call to `platform_init' lacks nop, can't restore toc; (toc save/adjust stub)
> >   (.text+0x154): call to `start' lacks nop, can't restore toc; (toc save/adjust stub)
> >   powerpc64le-buildroot-linux-gnu-ld: final link failed: bad value
> >
> > Even with that bug worked around the resulting kernel would crash on a
> > power9 box:
> >
> >   $ qemu-system-ppc64 -nographic -nodefaults -M powernv9 -kernel arch/powerpc/boot/zImage.epapr -serial mon:stdio
> >   [7.069331356,5] INIT: Starting kernel at 0x20010020, fdt at 0x3068c628 25694 bytes
> >   [7.130374661,3] ***
> >   [7.131072886,3] Fatal Exception 0xe40 at 200101e4 MSR 9001
> >   [7.131290613,3] CFAR : 2001027c MSR : 9001
> >   [7.131433759,3] SRR0 : 20010050 SRR1 : 9001
> >   [7.13155,3] HSRR0: 200101e4 HSRR1: 9001
> >   [7.131733687,3] DSISR: DAR :
> >   [7.131905162,3] LR : 20010280 CTR :
> >   [7.132068356,3] CR : 44002004 XER :
> >
> > Link: https://github.com/linuxppc/issues/issues/400
> > Signed-off-by: Joel Stanley
> > ---
> > Tested:
> >
> >  - ppc64le_defconfig
> >  - pseries and powernv qemu, for power8, power9, power10 cpus
> >  - buildroot compiler that defaults to -mcpu=power10 (gcc 10.3.0, ld 2.36.1)
> >  - RHEL9 cross compilers (gcc 11.2.1-1, ld 2.35.2-17.el9)
> >
> > All decompressed and made it into the kernel ok.
> >
> > ppc64_defconfig did not work, as we've got a regression when the wrapper
> > is built for big endian. It hasn't worked for zImage.pseries for a long
> > time (at least v4.14), and broke some time between v5.4 and v5.17 for
> > zImage.epapr.
> >
> >  arch/powerpc/boot/Makefile | 8 ++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
> > index 9993c6256ad2..1f5cc401bfc0 100644
> > --- a/arch/powerpc/boot/Makefile
> > +++ b/arch/powerpc/boot/Makefile
> > @@ -38,9 +38,13 @@ BOOTCFLAGS:= -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
> >    $(LINUXINCLUDE)
> >
> >  ifdef CONFIG_PPC64_BOOT_WRAPPER
> > -BOOTCFLAGS += -m64
> > +ifdef CONFIG_CPU_LITTLE_ENDIAN
> > +BOOTCFLAGS += -m64 -mcpu=powerpc64le
> >  else
> > -BOOTCFLAGS += -m32
> > +BOOTCFLAGS += -m64 -mcpu=powerpc64
> > +endif
> > +else
> > +BOOTCFLAGS += -m32 -mcpu=powerpc
>
> How does that interact with the following lines? Isn't it an issue to
> have two -mcpu?
>
> arch/powerpc/boot/Makefile:$(obj)/4xx.o: BOOTCFLAGS += -mcpu=405
> arch/powerpc/boot/Makefile:$(obj)/ebony.o: BOOTCFLAGS += -mcpu=440
> arch/powerpc/boot/Makefile:$(obj)/cuboot-hotfoot.o: BOOTCFLAGS += -mcpu=405
> arch/powerpc/boot/Makefile:$(obj)/cuboot-taishan.o: BOOTCFLAGS += -mcpu=440
> arch/powerpc/boot/Makefile:$(obj)/cuboot-katmai.o: BOOTCFLAGS += -mcpu=440
> arch/powerpc/boot/Makefile:$(obj)/cuboot-acadia.o: BOOTCFLAGS += -mcpu=405
> arch/powerpc/boot/Makefile:$(obj)/treeboot-iss4xx.o: BOOTCFLAGS += -mcpu=405
> arch/powerpc/boot/Makefile:$(obj)/treeboot-currituck.o: BOOTCFLAGS += -mcpu=405
> arch/powerpc/boot/Makefile:$(obj)/treeboot-akebono.o: BOOTCFLAGS += -mcpu=405

Good point, I didn't test the other wrappers.

Last one wins as far as the -mcpu options go, from a quick test. But it
might lead to less confusion if I dropped the -mcpu=powerpc change.
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.18-1 tag
On Mon, Mar 28, 2022 at 08:07:13PM +1100, Michael Ellerman wrote:
> Linus Torvalds writes:
> > On Fri, Mar 25, 2022 at 3:25 AM Michael Ellerman wrote:
> >
> > That said:
> >
> >> There's a series of commits cleaning up function descriptor handling,
> >
> > For some reason I also thought that powerpc had actually moved away
> > from function descriptors, so I'm clearly not keeping up with the
> > times.
>
> No you're right, we have moved away from them, but not entirely.
>
> Function descriptors are still used for 64-bit big endian, but they're
> not used for 64-bit little endian, or 32-bit.

There was a patch to use ABIv2 for ppc64 big endian. I suppose that
would rid us of the function descriptors for good.

Somehow the discussion of that change trailed off without any results.
Maybe it's worth resurrecting?

Thanks

Michal
Re: [PATCH] powerpc/boot: Build wrapper for an appropriate CPU
On 30/03/2022 at 13:24, Joel Stanley wrote:
> Currently the boot wrapper lacks a -mcpu option, so it will be built for
> the toolchain's default cpu. This is a problem if the toolchain defaults
> to a cpu with newer instructions.
>
> We could wire in TARGET_CPU but instead use the oldest supported option
> so the wrapper runs anywhere.
>
> The GCC documentation states that -mcpu=powerpc64le will give us a
> generic 64 bit powerpc machine:
>
>   -mcpu=powerpc, -mcpu=powerpc64, and -mcpu=powerpc64le specify pure
>   32-bit PowerPC (either endian), 64-bit big endian PowerPC and 64-bit
>   little endian PowerPC architecture machine types, with an appropriate,
>   generic processor model assumed for scheduling purposes.
>
> So do that for each of the three machines.
>
> This bug was found when building the kernel with a toolchain that
> defaulted to power10, resulting in a pcrel enabled wrapper which fails
> to link:
>
>   arch/powerpc/boot/wrapper.a(crt0.o): in function `p_base':
>   (.text+0x150): call to `platform_init' lacks nop, can't restore toc; (toc save/adjust stub)
>   (.text+0x154): call to `start' lacks nop, can't restore toc; (toc save/adjust stub)
>   powerpc64le-buildroot-linux-gnu-ld: final link failed: bad value
>
> Even with that bug worked around the resulting kernel would crash on a
> power9 box:
>
>   $ qemu-system-ppc64 -nographic -nodefaults -M powernv9 -kernel arch/powerpc/boot/zImage.epapr -serial mon:stdio
>   [7.069331356,5] INIT: Starting kernel at 0x20010020, fdt at 0x3068c628 25694 bytes
>   [7.130374661,3] ***
>   [7.131072886,3] Fatal Exception 0xe40 at 200101e4 MSR 9001
>   [7.131290613,3] CFAR : 2001027c MSR : 9001
>   [7.131433759,3] SRR0 : 20010050 SRR1 : 9001
>   [7.13155,3] HSRR0: 200101e4 HSRR1: 9001
>   [7.131733687,3] DSISR: DAR :
>   [7.131905162,3] LR : 20010280 CTR :
>   [7.132068356,3] CR : 44002004 XER :
>
> Link: https://github.com/linuxppc/issues/issues/400
> Signed-off-by: Joel Stanley
> ---
> Tested:
>
>  - ppc64le_defconfig
>  - pseries and powernv qemu, for power8, power9, power10 cpus
>  - buildroot compiler that defaults to -mcpu=power10 (gcc 10.3.0, ld 2.36.1)
>  - RHEL9 cross compilers (gcc 11.2.1-1, ld 2.35.2-17.el9)
>
> All decompressed and made it into the kernel ok.
>
> ppc64_defconfig did not work, as we've got a regression when the wrapper
> is built for big endian. It hasn't worked for zImage.pseries for a long
> time (at least v4.14), and broke some time between v5.4 and v5.17 for
> zImage.epapr.
>
>  arch/powerpc/boot/Makefile | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
> index 9993c6256ad2..1f5cc401bfc0 100644
> --- a/arch/powerpc/boot/Makefile
> +++ b/arch/powerpc/boot/Makefile
> @@ -38,9 +38,13 @@ BOOTCFLAGS:= -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
>    $(LINUXINCLUDE)
>
>  ifdef CONFIG_PPC64_BOOT_WRAPPER
> -BOOTCFLAGS += -m64
> +ifdef CONFIG_CPU_LITTLE_ENDIAN
> +BOOTCFLAGS += -m64 -mcpu=powerpc64le
>  else
> -BOOTCFLAGS += -m32
> +BOOTCFLAGS += -m64 -mcpu=powerpc64
> +endif
> +else
> +BOOTCFLAGS += -m32 -mcpu=powerpc

How does that interact with the following lines? Isn't it an issue to
have two -mcpu?

arch/powerpc/boot/Makefile:$(obj)/4xx.o: BOOTCFLAGS += -mcpu=405
arch/powerpc/boot/Makefile:$(obj)/ebony.o: BOOTCFLAGS += -mcpu=440
arch/powerpc/boot/Makefile:$(obj)/cuboot-hotfoot.o: BOOTCFLAGS += -mcpu=405
arch/powerpc/boot/Makefile:$(obj)/cuboot-taishan.o: BOOTCFLAGS += -mcpu=440
arch/powerpc/boot/Makefile:$(obj)/cuboot-katmai.o: BOOTCFLAGS += -mcpu=440
arch/powerpc/boot/Makefile:$(obj)/cuboot-acadia.o: BOOTCFLAGS += -mcpu=405
arch/powerpc/boot/Makefile:$(obj)/treeboot-iss4xx.o: BOOTCFLAGS += -mcpu=405
arch/powerpc/boot/Makefile:$(obj)/treeboot-currituck.o: BOOTCFLAGS += -mcpu=405
arch/powerpc/boot/Makefile:$(obj)/treeboot-akebono.o: BOOTCFLAGS += -mcpu=405

>  endif
>
>  BOOTCFLAGS += -isystem $(shell $(BOOTCC) -print-file-name=include)
[PATCH] powerpc/boot: Build wrapper for an appropriate CPU
Currently the boot wrapper lacks a -mcpu option, so it will be built for
the toolchain's default cpu. This is a problem if the toolchain defaults
to a cpu with newer instructions.

We could wire in TARGET_CPU but instead use the oldest supported option
so the wrapper runs anywhere.

The GCC documentation states that -mcpu=powerpc64le will give us a
generic 64 bit powerpc machine:

  -mcpu=powerpc, -mcpu=powerpc64, and -mcpu=powerpc64le specify pure
  32-bit PowerPC (either endian), 64-bit big endian PowerPC and 64-bit
  little endian PowerPC architecture machine types, with an appropriate,
  generic processor model assumed for scheduling purposes.

So do that for each of the three machines.

This bug was found when building the kernel with a toolchain that
defaulted to power10, resulting in a pcrel enabled wrapper which fails
to link:

  arch/powerpc/boot/wrapper.a(crt0.o): in function `p_base':
  (.text+0x150): call to `platform_init' lacks nop, can't restore toc; (toc save/adjust stub)
  (.text+0x154): call to `start' lacks nop, can't restore toc; (toc save/adjust stub)
  powerpc64le-buildroot-linux-gnu-ld: final link failed: bad value

Even with that bug worked around the resulting kernel would crash on a
power9 box:

  $ qemu-system-ppc64 -nographic -nodefaults -M powernv9 -kernel arch/powerpc/boot/zImage.epapr -serial mon:stdio
  [7.069331356,5] INIT: Starting kernel at 0x20010020, fdt at 0x3068c628 25694 bytes
  [7.130374661,3] ***
  [7.131072886,3] Fatal Exception 0xe40 at 200101e4 MSR 9001
  [7.131290613,3] CFAR : 2001027c MSR : 9001
  [7.131433759,3] SRR0 : 20010050 SRR1 : 9001
  [7.13155,3] HSRR0: 200101e4 HSRR1: 9001
  [7.131733687,3] DSISR: DAR :
  [7.131905162,3] LR : 20010280 CTR :
  [7.132068356,3] CR : 44002004 XER :

Link: https://github.com/linuxppc/issues/issues/400
Signed-off-by: Joel Stanley
---
Tested:

 - ppc64le_defconfig
 - pseries and powernv qemu, for power8, power9, power10 cpus
 - buildroot compiler that defaults to -mcpu=power10 (gcc 10.3.0, ld 2.36.1)
 - RHEL9 cross compilers (gcc 11.2.1-1, ld 2.35.2-17.el9)

All decompressed and made it into the kernel ok.

ppc64_defconfig did not work, as we've got a regression when the wrapper
is built for big endian. It hasn't worked for zImage.pseries for a long
time (at least v4.14), and broke some time between v5.4 and v5.17 for
zImage.epapr.

 arch/powerpc/boot/Makefile | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 9993c6256ad2..1f5cc401bfc0 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -38,9 +38,13 @@ BOOTCFLAGS:= -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
   $(LINUXINCLUDE)

 ifdef CONFIG_PPC64_BOOT_WRAPPER
-BOOTCFLAGS += -m64
+ifdef CONFIG_CPU_LITTLE_ENDIAN
+BOOTCFLAGS += -m64 -mcpu=powerpc64le
 else
-BOOTCFLAGS += -m32
+BOOTCFLAGS += -m64 -mcpu=powerpc64
+endif
+else
+BOOTCFLAGS += -m32 -mcpu=powerpc
 endif

 BOOTCFLAGS += -isystem $(shell $(BOOTCC) -print-file-name=include)
-- 
2.35.1
Re: [PATCH] recordmcount: Support empty section from recent binutils
On 29/03/2022 at 00:31, Joel Stanley wrote:
> On Mon, 29 Nov 2021 at 22:43, Christophe Leroy
> wrote:
>>
>>
>>
>> On 29/11/2021 at 18:43, Steven Rostedt wrote:
>>> On Fri, 26 Nov 2021 08:43:23 +
>>> LEROY Christophe wrote:
>>>
>>>> On 24/11/2021 at 15:43, Christophe Leroy wrote:
>>>>> Looks like recent binutils (2.36 and over ?) may empty some section,
>>>>> leading to failure like:
>>>>>
>>>>> Cannot find symbol for section 11: .text.unlikely.
>>>>> kernel/kexec_file.o: failed
>>>>> make[1]: *** [scripts/Makefile.build:287: kernel/kexec_file.o] Error 1
>>>>>
>>>>> In order to avoid that, ensure that the section has a content before
>>>>> returning its name in has_rel_mcount().
>>>>
>>>> This patch doesn't work, on PPC32 I get the following message with this
>>>> patch applied:
>>>>
>>>> [0.00] ftrace: No functions to be traced?
>>>>
>>>> Without the patch I get:
>>>>
>>>> [0.00] ftrace: allocating 22381 entries in 66 pages
>>>> [0.00] ftrace: allocated 66 pages with 2 groups
>>>
>>> Because of this report, I have not applied this patch (even though I was
>>> about to push it to Linus).
>>>
>>> I'm pulling it from my queue until this gets resolved.
>>>
>>
>> I have no idea on how to fix that for the moment.
>>
>> With GCC 10 (binutils 2.36) an objdump -x on kernel/kexec_file.o gives:
>>
>> ld .text.unlikely .text.unlikely
>> wF .text.unlikely 0038 .arch_kexec_apply_relocations_add
>> 0038 wF .text.unlikely 0038 .arch_kexec_apply_relocations
>>
>>
>> With GCC 11 (binutils 2.37) the same gives:
>>
>> wF .text.unlikely 0038 .arch_kexec_apply_relocations_add
>> 0038 wF .text.unlikely 0038 .arch_kexec_apply_relocations
>>
>>
>> The problem is that recordmcount drops weak symbols, and it doesn't find
>> any non-weak symbol in .text.unlikely
>>
>> Explanation given at
>> https://elixir.bootlin.com/linux/v5.16-rc2/source/scripts/recordmcount.h#L506
>>
>> I have no idea on what to do.
>
> Did you end up finding a solution for this issue?
>

Not really, my solution was to switch to the kernel compiler at
https://mirrors.edge.kernel.org/pub/tools/crosstool/ which embeds
binutils 2.36.

But it looks like using objtool instead of recordmcount doesn't exhibit
the problem.

https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220318105140.43914-4...@linux.ibm.com/

Christophe
[powerpc:next] BUILD SUCCESS af41d2866f7d75bbb38d487f6ec7770425d70e45
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next branch HEAD: af41d2866f7d75bbb38d487f6ec7770425d70e45 powerpc/64: Fix build failure with allyesconfig in book3s_64_entry.S elapsed time: 967m configs tested: 137 configs skipped: 136 The following configs have been built successfully. More configs may be tested in the coming days. gcc tested configs: arm defconfig arm64allyesconfig arm64 defconfig arm allyesconfig arm allmodconfig powerpc randconfig-c003-20220330 i386 randconfig-c001 sh rts7751r2d1_defconfig armrealview_defconfig arm assabet_defconfig alpha defconfig xtensa allyesconfig sh se7619_defconfig sh lboxre2_defconfig powerpc pasemi_defconfig arm imx_v6_v7_defconfig parisc64 alldefconfig mips tb0226_defconfig armclps711x_defconfig armxcep_defconfig arm badge4_defconfig arm pxa_defconfig ia64 allmodconfig arc defconfig i386defconfig armcerfcube_defconfig openriscor1ksim_defconfig m68k m5249evb_defconfig powerpc ppc40x_defconfig armqcom_defconfig mips db1xxx_defconfig sh sdk7786_defconfig powerpc wii_defconfig powerpc mpc85xx_cds_defconfig arm randconfig-c002-20220327 arm randconfig-c002-20220329 arm randconfig-c002-20220330 ia64defconfig ia64 allyesconfig m68k allmodconfig m68kdefconfig m68k allyesconfig nios2allyesconfig cskydefconfig alphaallyesconfig h8300allyesconfig sh allmodconfig parisc defconfig s390 allyesconfig s390 allmodconfig parisc64defconfig parisc allyesconfig s390defconfig i386 allyesconfig sparcallyesconfig sparc defconfig i386 debian-10.3-kselftests i386 debian-10.3 mips allyesconfig mips allmodconfig powerpc allyesconfig powerpc allmodconfig powerpc allnoconfig x86_64 randconfig-a001-20220328 x86_64 randconfig-a003-20220328 x86_64 randconfig-a004-20220328 x86_64 randconfig-a002-20220328 x86_64 randconfig-a005-20220328 x86_64 randconfig-a006-20220328 i386 randconfig-a001-20220328 i386 randconfig-a003-20220328 i386 randconfig-a006-20220328 i386 randconfig-a005-20220328 i386 randconfig-a004-20220328 i386 
randconfig-a002-20220328 x86_64randconfig-a006 x86_64randconfig-a004 x86_64randconfig-a002 riscvnommu_k210_defconfig riscvallyesconfig riscvnommu_virt_defconfig riscv allnoconfig riscv defconfig riscv rv32_defconfig riscvallmodconfig x86_64rhel-8.3-kselftests um x86_64_defconfig um i386_defconfig x86_64 allyesconfig x86_64 defconfig x86_64 rhel-8.3 x86_64 rhel-8.3-func x86_64 rhel-8.3-kunit x86_64 kexec clang tested configs: mips randconfig-c004-20220329 x86_64randconfig-c007 mips randconfig-c004-20220327 arm randconfig-c002-20220327 arm randconfig-c002-20220329 riscv
[PATCH v3] ftrace: Make ftrace_graph_is_dead() a static branch
ftrace_graph_is_dead() is used on hot paths. It just reads a variable
in memory and is not worth suffering function call constraints.

For instance, at entry of prepare_ftrace_return(), inlining it avoids
saving prepare_ftrace_return() parameters to stack and restoring them
after calling ftrace_graph_is_dead().

While at it, using a static branch is even more performant and is
rather well adapted considering that the returned value will almost
never change.

Inline ftrace_graph_is_dead() and replace 'kill_ftrace_graph' bool
by a static branch.

The performance improvement is noticeable.

Signed-off-by: Christophe Leroy
---
v3: Keep includes in upside-down x-mas tree
v2: Use a static branch instead of a global bool var.
---
 include/linux/ftrace.h | 16 +++-
 kernel/trace/fgraph.c  | 17 +++--
 2 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index ed8cf433a46a..4816b7e11047 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -9,6 +9,7 @@
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1018,7 +1019,20 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
 extern int register_ftrace_graph(struct fgraph_ops *ops);
 extern void unregister_ftrace_graph(struct fgraph_ops *ops);

-extern bool ftrace_graph_is_dead(void);
+/**
+ * ftrace_graph_is_dead - returns true if ftrace_graph_stop() was called
+ *
+ * ftrace_graph_stop() is called when a severe error is detected in
+ * the function graph tracing. This function is called by the critical
+ * paths of function graph to keep those paths from doing any more harm.
+ */
+DECLARE_STATIC_KEY_FALSE(kill_ftrace_graph);
+
+static inline bool ftrace_graph_is_dead(void)
+{
+	return static_branch_unlikely(&kill_ftrace_graph);
+}
+
 extern void ftrace_graph_stop(void);

 /* The current handlers in use */
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 19028e072cdb..8f4fb328133a 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -7,6 +7,7 @@
  *
  * Highly modified by Steven Rostedt (VMware).
  */
+#include
 #include
 #include
 #include
@@ -23,24 +24,12 @@
 #define ASSIGN_OPS_HASH(opsname, val)
 #endif

-static bool kill_ftrace_graph;
+DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph);
 int ftrace_graph_active;

 /* Both enabled by default (can be cleared by function_graph tracer flags */
 static bool fgraph_sleep_time = true;

-/**
- * ftrace_graph_is_dead - returns true if ftrace_graph_stop() was called
- *
- * ftrace_graph_stop() is called when a severe error is detected in
- * the function graph tracing. This function is called by the critical
- * paths of function graph to keep those paths from doing any more harm.
- */
-bool ftrace_graph_is_dead(void)
-{
-	return kill_ftrace_graph;
-}
-
 /**
  * ftrace_graph_stop - set to permanently disable function graph tracing
  *
@@ -51,7 +40,7 @@ bool ftrace_graph_is_dead(void)
  */
 void ftrace_graph_stop(void)
 {
-	kill_ftrace_graph = true;
+	static_branch_enable(&kill_ftrace_graph);
 }

 /* Add a function return address to the trace stack on thread info.*/
-- 
2.35.1
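To illustrate the shape of the change, here is a rough userspace analogy, not the kernel mechanism. A real static key patches the instruction stream so the disabled case costs no load at all; this sketch can only approximate that with a plain flag and `__builtin_expect` (a GCC/Clang builtin). All `sketch_*` names are hypothetical. What it does show is the inlining part of the argument: the hot-path caller tests the flag in place instead of making a function call.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the static key. The kernel rewrites code at runtime instead of reading a flag. */
static bool sketch_kill_ftrace_graph;

/* Branch hint only; it does not remove the memory load the way a static branch does. */
#define sketch_unlikely(x) __builtin_expect(!!(x), 0)

/* Inline check, so callers need no call/return and no register spills around it. */
static inline bool sketch_graph_is_dead(void)
{
	return sketch_unlikely(sketch_kill_ftrace_graph);
}

/* One-way switch, analogous in spirit to ftrace_graph_stop(). */
static void sketch_graph_stop(void)
{
	sketch_kill_ftrace_graph = true;
}

/*
 * Hypothetical hot-path caller: installs the hook address unless
 * tracing has been killed. Returns 0 on success, -1 when dead.
 */
static int sketch_prepare_return(unsigned long *parent, unsigned long hook)
{
	if (sketch_graph_is_dead())
		return -1;
	*parent = hook;
	return 0;
}
```

Once `sketch_graph_stop()` has run, every later call to `sketch_prepare_return()` bails out early and leaves `*parent` untouched.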
Re: [PATCH v2 7/8] powerpc/pgtable: remove _PAGE_BIT_SWAP_TYPE for book3s
On 30.03.22 08:07, Christophe Leroy wrote:
>
>
> On 29/03/2022 at 18:43, David Hildenbrand wrote:
>> The swap type is simply stored in bits 0x1f of the swap pte. Let's
>> simplify by just getting rid of _PAGE_BIT_SWAP_TYPE. It's not like
>> we can simply change it: _PAGE_SWP_SOFT_DIRTY would suddenly fall into
>> _RPAGE_RSV1, which isn't possible and would make the
>> BUILD_BUG_ON(_PAGE_HPTEFLAGS & _PAGE_SWP_SOFT_DIRTY) angry.
>>
>> While at it, make it clearer which bit we're actually using for
>> _PAGE_SWP_SOFT_DIRTY by just using the proper define and introduce and
>> use SWP_TYPE_MASK.
>>
>> Signed-off-by: David Hildenbrand
>> ---
>>  arch/powerpc/include/asm/book3s/64/pgtable.h | 12 +---
>
> Why only BOOK3S ? Why not BOOK3E as well ?

Hi Christophe,

I'm focusing on the most relevant enterprise architectures for now. I
don't have the capacity to convert each and every architecture at this
point (especially, I don't want to waste my time in case this doesn't
get merged, and book3e didn't look straightforward to me).

Once this series hits upstream, I can look into other architectures, and
I'll be happy if other people who have more familiarity with the
architecture-specific swp pte layouts jump in.

Thanks

-- 
Thanks,

David / dhildenb
Re: [PATCH v2] ftrace: Make ftrace_graph_is_dead() a static branch
On 30/03/2022 at 04:07, Steven Rostedt wrote:
> On Fri, 25 Mar 2022 09:03:08 +0100
> Christophe Leroy wrote:
>
>> --- a/kernel/trace/fgraph.c
>> +++ b/kernel/trace/fgraph.c
>> @@ -10,6 +10,7 @@
>>  #include
>>  #include
>>  #include
>> +#include
>>
>
> Small nit. Please order the includes in "upside-down x-mas tree" fashion:
>
> #include
> #include
> #include
> #include
>

That's the first time I get such a request. Usually people request
#includes to be in alphabetical order, so when I see a file that has
headers in alphabetical order I try not to break it. Here that was not
the case, so I put it at the end of the list.

I'll send v3.

Thanks
Christophe
Re: [PATCH v2 7/8] powerpc/pgtable: remove _PAGE_BIT_SWAP_TYPE for book3s
On 29/03/2022 at 18:43, David Hildenbrand wrote:
> The swap type is simply stored in bits 0x1f of the swap pte. Let's
> simplify by just getting rid of _PAGE_BIT_SWAP_TYPE. It's not like
> we can simply change it: _PAGE_SWP_SOFT_DIRTY would suddenly fall into
> _RPAGE_RSV1, which isn't possible and would make the
> BUILD_BUG_ON(_PAGE_HPTEFLAGS & _PAGE_SWP_SOFT_DIRTY) angry.
>
> While at it, make it clearer which bit we're actually using for
> _PAGE_SWP_SOFT_DIRTY by just using the proper define and introduce and
> use SWP_TYPE_MASK.
>
> Signed-off-by: David Hildenbrand
> ---
>  arch/powerpc/include/asm/book3s/64/pgtable.h | 12 +---

Why only BOOK3S? Why not BOOK3E as well?

Christophe

>  1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 875730d5af40..8e98375d5c4a 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -13,7 +13,6 @@
>  /*
>   * Common bits between hash and Radix page table
>   */
> -#define _PAGE_BIT_SWAP_TYPE 0
>
>  #define _PAGE_EXEC 0x1 /* execute permission */
>  #define _PAGE_WRITE 0x2 /* write access allowed */
> @@ -751,17 +750,16 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
>   * Don't have overlapping bits with _PAGE_HPTEFLAGS \
>   * We filter HPTEFLAGS on set_pte. \
>   */ \
> - BUILD_BUG_ON(_PAGE_HPTEFLAGS & (0x1f << _PAGE_BIT_SWAP_TYPE)); \
> + BUILD_BUG_ON(_PAGE_HPTEFLAGS & SWP_TYPE_MASK); \
>   BUILD_BUG_ON(_PAGE_HPTEFLAGS & _PAGE_SWP_SOFT_DIRTY); \
>   } while (0)
>
>  #define SWP_TYPE_BITS 5
> -#define __swp_type(x) (((x).val >> _PAGE_BIT_SWAP_TYPE) \
> - & ((1UL << SWP_TYPE_BITS) - 1))
> +#define SWP_TYPE_MASK ((1UL << SWP_TYPE_BITS) - 1)
> +#define __swp_type(x) ((x).val & SWP_TYPE_MASK)
>  #define __swp_offset(x) (((x).val & PTE_RPN_MASK) >> PAGE_SHIFT)
>  #define __swp_entry(type, offset) ((swp_entry_t) { \
> - ((type) << _PAGE_BIT_SWAP_TYPE) \
> - | (((offset) << PAGE_SHIFT) & PTE_RPN_MASK)})
> + (type) | (((offset) << PAGE_SHIFT) & PTE_RPN_MASK)})
>  /*
>   * swp_entry_t must be independent of pte bits. We build a swp_entry_t from
>   * swap type and offset we get from swap and convert that to pte to find a
> @@ -774,7 +772,7 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
>  #define __swp_entry_to_pmd(x) (pte_pmd(__swp_entry_to_pte(x)))
>
>  #ifdef CONFIG_MEM_SOFT_DIRTY
> -#define _PAGE_SWP_SOFT_DIRTY (1UL << (SWP_TYPE_BITS + _PAGE_BIT_SWAP_TYPE))
> +#define _PAGE_SWP_SOFT_DIRTY _PAGE_NON_IDEMPOTENT
>  #else
>  #define _PAGE_SWP_SOFT_DIRTY 0UL
>  #endif /* CONFIG_MEM_SOFT_DIRTY */
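The encoding after the patch can be checked with a small standalone sketch. The values of `SKETCH_PAGE_SHIFT` (64K pages) and `SKETCH_PTE_RPN_MASK` (bits 16 to 55) are assumptions picked for illustration, not taken from the patch, and the `sketch_*` helpers are hypothetical stand-ins for the `__swp_*` macros. Only `SWP_TYPE_BITS` and the shape of `SWP_TYPE_MASK` come from the diff above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; the real ones depend on kernel config. */
#define SKETCH_PAGE_SHIFT	16
#define SKETCH_PTE_RPN_MASK	(((UINT64_C(1) << 56) - 1) & \
				 ~((UINT64_C(1) << SKETCH_PAGE_SHIFT) - 1))

/* From the patch: the swap type lives in the low five bits, with no shift. */
#define SWP_TYPE_BITS	5
#define SWP_TYPE_MASK	((UINT64_C(1) << SWP_TYPE_BITS) - 1)

typedef struct { uint64_t val; } sketch_swp_entry_t;

/* Pack type and offset. The caller must keep type within SWP_TYPE_MASK. */
static inline sketch_swp_entry_t sketch_swp_entry(uint64_t type, uint64_t offset)
{
	sketch_swp_entry_t e = {
		type | ((offset << SKETCH_PAGE_SHIFT) & SKETCH_PTE_RPN_MASK)
	};
	return e;
}

/* Extract the swap type: a plain mask, since the type starts at bit 0. */
static inline uint64_t sketch_swp_type(sketch_swp_entry_t e)
{
	return e.val & SWP_TYPE_MASK;
}

/* Extract the swap offset from the RPN field. */
static inline uint64_t sketch_swp_offset(sketch_swp_entry_t e)
{
	return (e.val & SKETCH_PTE_RPN_MASK) >> SKETCH_PAGE_SHIFT;
}
```

Because the type field starts at bit 0, dropping the `<< _PAGE_BIT_SWAP_TYPE` shift leaves the encoding unchanged, which is why the patch is a pure simplification.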
Re: [RFC][PATCH 2/2] powerpc/papr_scm: Implement support for reporting generic nvdimm stats
Hi,

On 08/11/2020 at 22:15, Vaibhav Jain wrote:
> Add support for reporting papr-scm supported generic nvdimm stats by
> implementing support for handling ND_CMD_GET_STAT in 'papr_scm_ndctl()'.
>
> The mapping between libnvdimm generic nvdimm-stats and papr-scm
> specific performance-stats is embedded inside 'dimm_stat_map[]'. This
> array is queried by newly introduced 'papr_scm_get_stat()' that
> verifies if the requested nvdimm-stat is supported and if yes does an
> hcall via 'drc_pmem_query_stats()' to request the performance-stat and
> return it back to libnvdimm.
>
> Signed-off-by: Vaibhav Jain

I see this series is still flagged as 'new' in patchwork.

I saw some patches providing stats to nvdimm are in a pull request for
5.18. I imagine this is the same subject, so I'm going to change the
status of this series. Let me know if I'm wrong.

Thanks
Christophe

> ---
>  arch/powerpc/platforms/pseries/papr_scm.c | 66 ++-
>  1 file changed, 65 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> index 835163f54244..51eeab3376fd 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -25,7 +25,8 @@
>  	((1ul << ND_CMD_GET_CONFIG_SIZE) | \
>  	 (1ul << ND_CMD_GET_CONFIG_DATA) | \
>  	 (1ul << ND_CMD_SET_CONFIG_DATA) | \
> -	 (1ul << ND_CMD_CALL))
> +	 (1ul << ND_CMD_CALL) | \
> +	 (1ul << ND_CMD_GET_STAT))
>
>  /* DIMM health bitmap bitmap indicators */
>  /* SCM device is unable to persist memory contents */
> @@ -120,6 +121,16 @@ struct papr_scm_priv {
>  static LIST_HEAD(papr_nd_regions);
>  static DEFINE_MUTEX(papr_ndr_lock);
>
> +/* Map generic nvdimm stats to papr-scm stats */
> +static const char * const dimm_stat_map[] = {
> +	[ND_DIMM_STAT_INVALID] = NULL,
> +	[ND_DIMM_STAT_MEDIA_READS] = "MedRCnt ",
> +	[ND_DIMM_STAT_MEDIA_WRITES] = "MedWCnt ",
> +	[ND_DIMM_STAT_READ_REQUESTS] = "HostLCnt",
> +	[ND_DIMM_STAT_WRITE_REQUESTS] = "HostSCnt",
> +	[ND_DIMM_STAT_MAX] = NULL,
> +};
> +
>  static int drc_pmem_bind(struct papr_scm_priv *p)
>  {
>  	unsigned long ret[PLPAR_HCALL_BUFSIZE];
> @@ -728,6 +739,54 @@ static int papr_scm_service_pdsm(struct papr_scm_priv *p,
>  	return pdsm_pkg->cmd_status;
>  }
>
> +/*
> + * For a given pdsm request call an appropriate service function.
> + * Returns errors if any while handling the pdsm command package.
> + */
> +static int papr_scm_get_stat(struct papr_scm_priv *p,
> +			     struct nd_cmd_get_dimm_stat *dimm_stat)
> +
> +{
> +	int rc;
> +	ssize_t size;
> +	struct papr_scm_perf_stat *stat;
> +	struct papr_scm_perf_stats *stats;
> +
> +	/* Check if the requested stat-id is supported */
> +	if (dimm_stat->stat_id >= ARRAY_SIZE(dimm_stat_map) ||
> +	    !dimm_stat_map[dimm_stat->stat_id]) {
> +		dev_dbg(&p->pdev->dev, "Invalid stat-id %lld\n", dimm_stat->stat_id);
> +		return -ENOSPC;
> +	}
> +
> +	/* Allocate request buffer enough to hold single performance stat */
> +	size = sizeof(struct papr_scm_perf_stats) +
> +		sizeof(struct papr_scm_perf_stat);
> +
> +	stats = kzalloc(size, GFP_KERNEL);
> +	if (!stats)
> +		return -ENOMEM;
> +
> +	stat = &stats->scm_statistic[0];
> +	memcpy(&stat->stat_id, dimm_stat_map[dimm_stat->stat_id],
> +	       sizeof(stat->stat_id));
> +	stat->stat_val = 0;
> +
> +	/* Fetch the statistic from PHYP and copy it to provided payload */
> +	rc = drc_pmem_query_stats(p, stats, 1);
> +	if (rc < 0) {
> +		dev_dbg(&p->pdev->dev, "Err(%d) fetching stat '%.8s'\n",
> +			rc, stat->stat_id);
> +		kfree(stats);
> +		return rc;
> +	}
> +
> +	dimm_stat->int_val = be64_to_cpu(stat->stat_val);
> +
> +	kfree(stats);
> +	return 0;
> +}
> +
>  static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
>  			  struct nvdimm *nvdimm, unsigned int cmd, void *buf,
>  			  unsigned int buf_len, int *cmd_rc)
> @@ -772,6 +831,11 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
>  		*cmd_rc = papr_scm_service_pdsm(p, call_pkg);
>  		break;
>
> +	case ND_CMD_GET_STAT:
> +		*cmd_rc = papr_scm_get_stat(p,
> +					    (struct nd_cmd_get_dimm_stat *)buf);
> +		break;
> +
>  	default:
>  		dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd);
>  		return -EINVAL;
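The lookup at the top of papr_scm_get_stat() relies on a sparse designated-initializer table plus a bounds check. Below is a minimal userspace sketch of that pattern alone. The `SKETCH_STAT_*` enum and the `sketch_*` helpers are hypothetical stand-ins for the RFC's ND_DIMM_STAT_* ids, whose actual values are not shown in this thread. The 8-character stat names are copied from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ids standing in for the RFC's ND_DIMM_STAT_* values. */
enum sketch_dimm_stat {
	SKETCH_STAT_INVALID = 0,
	SKETCH_STAT_MEDIA_READS,
	SKETCH_STAT_MEDIA_WRITES,
	SKETCH_STAT_READ_REQUESTS,
	SKETCH_STAT_WRITE_REQUESTS,
	SKETCH_STAT_MAX,
};

/* Generic id -> papr-scm stat name. Names are 8 bytes, space padded. */
static const char * const sketch_stat_map[] = {
	[SKETCH_STAT_INVALID]		= NULL,
	[SKETCH_STAT_MEDIA_READS]	= "MedRCnt ",
	[SKETCH_STAT_MEDIA_WRITES]	= "MedWCnt ",
	[SKETCH_STAT_READ_REQUESTS]	= "HostLCnt",
	[SKETCH_STAT_WRITE_REQUESTS]	= "HostSCnt",
	[SKETCH_STAT_MAX]		= NULL,
};

#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Return the papr-scm name for a generic stat id, or NULL when the id
 * is out of range or has no mapping. The range check must come first
 * so an untrusted id never indexes past the table.
 */
static const char *sketch_stat_name(unsigned long long stat_id)
{
	if (stat_id >= SKETCH_ARRAY_SIZE(sketch_stat_map))
		return NULL;
	return sketch_stat_map[stat_id];
}
```

Holes in a designated-initializer table are zero-filled, so any id added to the enum without a table entry maps to NULL and is rejected by the same check.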