Re: [RFC PATCH 1/5] powerpc/64s/hash: convert SLB miss handlers to C
On Tue, 21 Aug 2018 16:46:02 +1000 Michael Ellerman wrote: > Nicholas Piggin writes: > > > This patch moves SLB miss handlers completely to C, using the standard > > exception handler macros to set up the stack and branch to C. > > > > This can be done because the segment containing the kernel stack is > > always bolted, so accessing it with relocation on will not cause an > > SLB exception. > > > > Arbitrary kernel memory may not be accessed when handling kernel space > > SLB misses, so care should be taken there. > > We'll need to mark everything that's used in slb.c as notrace, otherwise > > Probably we just need to mark the whole file as not traceable. Yeah good point there. I'll do that. The whole file including things we allow today? How do we do that, like this? CFLAGS_REMOVE_slb.o = -mno-sched-epilog $(CC_FLAGS_FTRACE) Thanks, Nick
Re: [v5] powerpc/topology: Get topology for shared processors at boot
* Michael Ellerman [2018-08-21 20:35:23]: > On Fri, 2018-08-17 at 14:54:39 UTC, Srikar Dronamraju wrote: > > On a shared lpar, Phyp will not update the cpu associativity at boot > > time. Just after boot, the system does recognize itself as a shared lpar and > > triggers a request for correct cpu associativity. But by then the scheduler > > would have already created/destroyed its sched domains. > > > > This causes > > - Broken load balance across Nodes causing islands of cores. > > - Performance degradation, especially if the system is lightly loaded > > - dmesg to wrongly report all cpus to be in Node 0. > > - Messages in dmesg saying broken topology. > > - With commit 051f3ca02e46 ("sched/topology: Introduce NUMA identity > > node sched domain"), can cause rcu stalls at boot up. > > > > > > Previous attempt to solve this problem > > https://patchwork.ozlabs.org/patch/530090/ > > > > Reported-by: Manjunatha H R > > Signed-off-by: Srikar Dronamraju > > Applied to powerpc next, thanks. > > https://git.kernel.org/powerpc/c/2ea62630681027c455117aa471ea3a > Once it gets to Linus's tree, can we request this to be included in stable trees? -- Thanks and Regards Srikar Dronamraju
Re: [PATCH] crypto: vmx - Fix sleep-in-atomic bugs
CC: Paulo Flabiano Smorigo , linuxppc-dev@lists.ozlabs.org (Sorry, sent this before reading new e-mails in the thread...) ut 21. 8. 2018 o 17:18 Ondrej Mosnacek napísal(a): > > This patch fixes sleep-in-atomic bugs in AES-CBC and AES-XTS VMX > implementations. The problem is that the blkcipher_* functions should > not be called in atomic context. > > The bugs can be reproduced via the AF_ALG interface by trying to > encrypt/decrypt sufficiently large buffers (at least 64 KiB) using the > VMX implementations of 'cbc(aes)' or 'xts(aes)'. Such operations then > trigger BUG in crypto_yield(): > > [ 891.863680] BUG: sleeping function called from invalid context at > include/crypto/algapi.h:424 > [ 891.864622] in_atomic(): 1, irqs_disabled(): 0, pid: 12347, name: kcapi-enc > [ 891.864739] 1 lock held by kcapi-enc/12347: > [ 891.864811] #0: f5d42c46 (sk_lock-AF_ALG){+.+.}, at: > skcipher_recvmsg+0x50/0x530 > [ 891.865076] CPU: 5 PID: 12347 Comm: kcapi-enc Not tainted > 4.19.0-0.rc0.git3.1.fc30.ppc64le #1 > [ 891.865251] Call Trace: > [ 891.865340] [c003387578c0] [c0d67ea4] dump_stack+0xe8/0x164 > (unreliable) > [ 891.865511] [c00338757910] [c0172a58] > ___might_sleep+0x2f8/0x310 > [ 891.865679] [c00338757990] [c06bff74] > blkcipher_walk_done+0x374/0x4a0 > [ 891.865825] [c003387579e0] [d7e73e70] > p8_aes_cbc_encrypt+0x1c8/0x260 [vmx_crypto] > [ 891.865993] [c00338757ad0] [c06c0ee0] > skcipher_encrypt_blkcipher+0x60/0x80 > [ 891.866128] [c00338757b10] [c06ec504] > skcipher_recvmsg+0x424/0x530 > [ 891.866283] [c00338757bd0] [c0b00654] sock_recvmsg+0x74/0xa0 > [ 891.866403] [c00338757c10] [c0b00f64] ___sys_recvmsg+0xf4/0x2f0 > [ 891.866515] [c00338757d90] [c0b02bb8] __sys_recvmsg+0x68/0xe0 > [ 891.866631] [c00338757e30] [c000bbe4] system_call+0x5c/0x70 > > Fixes: 8c755ace357c ("crypto: vmx - Adding CBC routines for VMX module") > Fixes: c07f5d3da643 ("crypto: vmx - Adding support for XTS") > Cc: sta...@vger.kernel.org > Signed-off-by: Ondrej Mosnacek > --- > This patch should 
fix the issue, but I didn't test it. (I'll see if I > can find some time tomorrow to try and recompile the kernel on a PPC > machine... in the meantime please review :) > > drivers/crypto/vmx/aes_cbc.c | 30 ++ > drivers/crypto/vmx/aes_xts.c | 19 --- > 2 files changed, 26 insertions(+), 23 deletions(-) > > diff --git a/drivers/crypto/vmx/aes_cbc.c b/drivers/crypto/vmx/aes_cbc.c > index 5285ece4f33a..b71895871be3 100644 > --- a/drivers/crypto/vmx/aes_cbc.c > +++ b/drivers/crypto/vmx/aes_cbc.c > @@ -107,24 +107,23 @@ static int p8_aes_cbc_encrypt(struct blkcipher_desc > *desc, > ret = crypto_skcipher_encrypt(req); > skcipher_request_zero(req); > } else { > - preempt_disable(); > - pagefault_disable(); > - enable_kernel_vsx(); > - > blkcipher_walk_init(, dst, src, nbytes); > ret = blkcipher_walk_virt(desc, ); > while ((nbytes = walk.nbytes)) { > + preempt_disable(); > + pagefault_disable(); > + enable_kernel_vsx(); > aes_p8_cbc_encrypt(walk.src.virt.addr, >walk.dst.virt.addr, >nbytes & AES_BLOCK_MASK, >>enc_key, walk.iv, 1); > + disable_kernel_vsx(); > + pagefault_enable(); > + preempt_enable(); > + > nbytes &= AES_BLOCK_SIZE - 1; > ret = blkcipher_walk_done(desc, , nbytes); > } > - > - disable_kernel_vsx(); > - pagefault_enable(); > - preempt_enable(); > } > > return ret; > @@ -147,24 +146,23 @@ static int p8_aes_cbc_decrypt(struct blkcipher_desc > *desc, > ret = crypto_skcipher_decrypt(req); > skcipher_request_zero(req); > } else { > - preempt_disable(); > - pagefault_disable(); > - enable_kernel_vsx(); > - > blkcipher_walk_init(, dst, src, nbytes); > ret = blkcipher_walk_virt(desc, ); > while ((nbytes = walk.nbytes)) { > + preempt_disable(); > + pagefault_disable(); > + enable_kernel_vsx(); > aes_p8_cbc_encrypt(walk.src.virt.addr, >walk.dst.virt.addr, >nbytes & AES_BLOCK_MASK, >>dec_key, walk.iv, 0); > + disable_kernel_vsx(); > +
[PATCH 7/8] powerpc: enable building all dtbs
Enable the 'dtbs' target for powerpc. This allows building all the dts files in arch/powerpc/boot/dts/ when COMPILE_TEST and OF_ALL_DTBS are enabled. Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Rob Herring --- arch/powerpc/boot/dts/Makefile | 5 + arch/powerpc/boot/dts/fsl/Makefile | 4 2 files changed, 9 insertions(+) create mode 100644 arch/powerpc/boot/dts/fsl/Makefile diff --git a/arch/powerpc/boot/dts/Makefile b/arch/powerpc/boot/dts/Makefile index f66554cd5c45..fb335d05aae8 100644 --- a/arch/powerpc/boot/dts/Makefile +++ b/arch/powerpc/boot/dts/Makefile @@ -1 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 + +subdir-y += fsl + +dtstree:= $(srctree)/$(src) +dtb-$(CONFIG_OF_ALL_DTBS) := $(patsubst $(dtstree)/%.dts,%.dtb, $(wildcard $(dtstree)/*.dts)) diff --git a/arch/powerpc/boot/dts/fsl/Makefile b/arch/powerpc/boot/dts/fsl/Makefile new file mode 100644 index ..3bae982641e9 --- /dev/null +++ b/arch/powerpc/boot/dts/fsl/Makefile @@ -0,0 +1,4 @@ +# SPDX-License-Identifier: GPL-2.0 + +dtstree:= $(srctree)/$(src) +dtb-$(CONFIG_OF_ALL_DTBS) := $(patsubst $(dtstree)/%.dts,%.dtb, $(wildcard $(dtstree)/*.dts)) -- 2.17.1
[PATCH 6/8] kbuild: consolidate Devicetree dtb build rules
There is nothing arch specific about building dtb files other than their location under /arch/*/boot/dts/. Keeping each arch aligned is a pain. The dependencies and supported targets are all slightly different. Also, a cross-compiler for each arch is needed, but really the host compiler preprocessor is perfectly fine for building dtbs. Move the build rules to a common location and remove the arch specific ones. This is done in a single step to avoid warnings about overriding rules. The build dependencies had been a mixture of 'scripts' and/or 'prepare'. These pull in several dependencies some of which need a target compiler (specifically devicetable-offsets.h) and aren't needed to build dtbs. All that is really needed is dtc, so adjust the dependencies to only be dtc. This change enables support 'dtbs_install' on some arches which were missing the target. Cc: Masahiro Yamada Cc: Michal Marek Cc: Vineet Gupta Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon Cc: Yoshinori Sato Cc: Michal Simek Cc: Ralf Baechle Cc: Paul Burton Cc: James Hogan Cc: Ley Foon Tan Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Chris Zankel Cc: Max Filippov Cc: linux-kbu...@vger.kernel.org Cc: linux-snps-...@lists.infradead.org Cc: linux-arm-ker...@lists.infradead.org Cc: uclinux-h8-de...@lists.sourceforge.jp Cc: linux-m...@linux-mips.org Cc: nios2-...@lists.rocketboards.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-xte...@linux-xtensa.org Signed-off-by: Rob Herring --- Makefile | 30 ++ arch/arc/Makefile| 6 -- arch/arm/Makefile| 20 +--- arch/arm64/Makefile | 17 + arch/c6x/Makefile| 2 -- arch/h8300/Makefile | 11 +-- arch/microblaze/Makefile | 4 +--- arch/mips/Makefile | 15 +-- arch/nds32/Makefile | 2 +- arch/nios2/Makefile | 7 --- arch/nios2/boot/Makefile | 4 arch/powerpc/Makefile| 3 --- arch/xtensa/Makefile | 12 +--- scripts/Makefile | 1 - scripts/Makefile.lib | 2 +- 15 files changed, 38 insertions(+), 98 deletions(-) diff --git a/Makefile b/Makefile index 
c13f8b85ba60..6d89e673f192 100644 --- a/Makefile +++ b/Makefile @@ -1212,6 +1212,30 @@ kselftest-merge: $(srctree)/tools/testing/selftests/*/config +$(Q)$(MAKE) -f $(srctree)/Makefile olddefconfig +# --- +# Devicetree files + +dtstree := $(wildcard arch/$(SRCARCH)/boot/dts) + +ifdef CONFIG_OF_EARLY_FLATTREE + +%.dtb %.dtb.S %.dtb.o: | dtc + $(Q)$(MAKE) $(build)=$(dtstree) $(dtstree)/$@ + +PHONY += dtbs +dtbs: | dtc + $(Q)$(MAKE) $(build)=$(dtstree) + +dtbs_install: dtbs + $(Q)$(MAKE) $(dtbinst)=$(dtstree) + +all: dtbs + +dtc: + $(Q)$(MAKE) $(build)=scripts/dtc + +endif + # --- # Modules @@ -1425,6 +1449,12 @@ help: @echo ' kselftest-merge - Merge all the config dependencies of kselftest to existing' @echo '.config.' @echo '' + @$(if $(dtstree), \ + echo 'Devicetree:'; \ + echo '* dtbs- Build device tree blobs for enabled boards'; \ + echo ' dtbs_install- Install dtbs to $(INSTALL_DTBS_PATH)'; \ + echo '') + @echo 'Userspace tools targets:' @echo ' use "make tools/help"' @echo ' or "cd tools; make help"' diff --git a/arch/arc/Makefile b/arch/arc/Makefile index 6c1b20dd76ad..cbfb7a16b570 100644 --- a/arch/arc/Makefile +++ b/arch/arc/Makefile @@ -132,11 +132,5 @@ boot_targets += uImage uImage.bin uImage.gz $(boot_targets): vmlinux $(Q)$(MAKE) $(build)=$(boot) $(boot)/$@ -%.dtb %.dtb.S %.dtb.o: scripts - $(Q)$(MAKE) $(build)=$(boot)/dts $(boot)/dts/$@ - -dtbs: scripts - $(Q)$(MAKE) $(build)=$(boot)/dts - archclean: $(Q)$(MAKE) $(clean)=$(boot) diff --git a/arch/arm/Makefile b/arch/arm/Makefile index e7d703d8fac3..7f02ef8dfdb2 100644 --- a/arch/arm/Makefile +++ b/arch/arm/Makefile @@ -308,12 +308,7 @@ else KBUILD_IMAGE := $(boot)/zImage endif -# Build the DT binary blobs if we have OF configured -ifeq ($(CONFIG_USE_OF),y) -KBUILD_DTBS := dtbs -endif - -all: $(notdir $(KBUILD_IMAGE)) $(KBUILD_DTBS) +all: $(notdir $(KBUILD_IMAGE)) archheaders: @@ -340,17 +335,6 @@ $(BOOT_TARGETS): vmlinux $(INSTALL_TARGETS): $(Q)$(MAKE) $(build)=$(boot) MACHINE=$(MACHINE) $@ -%.dtb: | 
scripts - $(Q)$(MAKE) $(build)=$(boot)/dts MACHINE=$(MACHINE) $(boot)/dts/$@ - -PHONY += dtbs dtbs_install - -dtbs: prepare scripts - $(Q)$(MAKE) $(build)=$(boot)/dts - -dtbs_install: - $(Q)$(MAKE) $(dtbinst)=$(boot)/dts - PHONY += vdso_install vdso_install: ifeq ($(CONFIG_VDSO),y) @@ -372,8 +356,6 @@ define archhelp echo ' uImage
[PATCH 0/8] Devicetree build consolidation
This series addresses a couple of issues I have with building dts files. First, the ability to build all the dts files in the tree. This has been supported on most arches for some time with powerpc being the main exception. The reason powerpc wasn't supported was it needed a change in the location built dtb files are put. Secondly, it's a pain to acquire all the cross-compilers needed to build dtbs for each arch. There's no reason to build with the cross compiler and the host compiler is perfectly fine as we only need the pre-processor. I started addressing just those 2 problems, but kept finding small differences such as target dependencies and dtbs_install support across architectures. Instead of trying to align all these, I've consolidated the build targets moving them out of the arch makefiles. I'd like to take the series via the DT tree. Rob Rob Herring (8): powerpc: build .dtb files in dts directory nios2: build .dtb files in dts directory nios2: use common rules to build built-in dtb nios2: fix building all dtbs c6x: use common built-in dtb support kbuild: consolidate Devicetree dtb build rules powerpc: enable building all dtbs c6x: enable building all dtbs Makefile | 30 ++ arch/arc/Makefile | 6 arch/arm/Makefile | 20 +--- arch/arm64/Makefile| 17 +-- arch/c6x/Makefile | 2 -- arch/c6x/boot/dts/Makefile | 17 +-- arch/c6x/boot/dts/linked_dtb.S | 2 -- arch/c6x/include/asm/sections.h| 1 - arch/c6x/kernel/setup.c| 4 +-- arch/c6x/kernel/vmlinux.lds.S | 10 -- arch/h8300/Makefile| 11 +-- arch/microblaze/Makefile | 4 +-- arch/mips/Makefile | 15 + arch/nds32/Makefile| 2 +- arch/nios2/Makefile| 11 +-- arch/nios2/boot/Makefile | 22 -- arch/nios2/boot/dts/Makefile | 6 arch/nios2/boot/linked_dtb.S | 19 arch/powerpc/Makefile | 3 -- arch/powerpc/boot/Makefile | 49 ++ arch/powerpc/boot/dts/Makefile | 6 arch/powerpc/boot/dts/fsl/Makefile | 4 +++ arch/xtensa/Makefile | 12 +--- scripts/Makefile | 1 - scripts/Makefile.lib | 2 +- 25 files changed, 87 insertions(+), 189 
deletions(-) delete mode 100644 arch/c6x/boot/dts/linked_dtb.S create mode 100644 arch/nios2/boot/dts/Makefile delete mode 100644 arch/nios2/boot/linked_dtb.S create mode 100644 arch/powerpc/boot/dts/Makefile create mode 100644 arch/powerpc/boot/dts/fsl/Makefile -- 2.17.1
[PATCH 1/8] powerpc: build .dtb files in dts directory
Align powerpc with other architectures which build the dtb files in the same directory as the dts files. This is also in line with most other build targets which are located in the same directory as the source. This move will help enable the 'dtbs' target which builds all the dtbs regardless of kernel config. This transition could break some scripts if they expect dtb files in the old location. Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Rob Herring --- arch/powerpc/Makefile | 2 +- arch/powerpc/boot/Makefile | 49 -- arch/powerpc/boot/dts/Makefile | 1 + 3 files changed, 25 insertions(+), 27 deletions(-) create mode 100644 arch/powerpc/boot/dts/Makefile diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index 8397c7bd5880..9bacffa3b72e 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -294,7 +294,7 @@ bootwrapper_install: $(Q)$(MAKE) $(build)=$(boot) $(patsubst %,$(boot)/%,$@) %.dtb: scripts - $(Q)$(MAKE) $(build)=$(boot) $(patsubst %,$(boot)/%,$@) + $(Q)$(MAKE) $(build)=$(boot)/dts $(patsubst %,$(boot)/dts/%,$@) # Used to create 'merged defconfigs' # To use it $(call) it with the first argument as the base defconfig diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile index 0fb96c26136f..b201d93e1725 100644 --- a/arch/powerpc/boot/Makefile +++ b/arch/powerpc/boot/Makefile @@ -381,11 +381,11 @@ $(addprefix $(obj)/, $(sort $(filter zImage.%, $(image-y: vmlinux $(wrapperb $(call if_changed,wrap,$(subst $(obj)/zImage.,,$@)) # dtbImage% - a dtbImage is a zImage with an embedded device tree blob -$(obj)/dtbImage.initrd.%: vmlinux $(wrapperbits) $(obj)/%.dtb FORCE - $(call if_changed,wrap,$*,,$(obj)/$*.dtb,$(obj)/ramdisk.image.gz) +$(obj)/dtbImage.initrd.%: vmlinux $(wrapperbits) $(obj)/dts/%.dtb FORCE + $(call if_changed,wrap,$*,,$(obj)/dts/$*.dtb,$(obj)/ramdisk.image.gz) -$(obj)/dtbImage.%: vmlinux $(wrapperbits) $(obj)/%.dtb FORCE - $(call 
if_changed,wrap,$*,,$(obj)/$*.dtb) +$(obj)/dtbImage.%: vmlinux $(wrapperbits) $(obj)/dts/%.dtb FORCE + $(call if_changed,wrap,$*,,$(obj)/dts/$*.dtb) # This cannot be in the root of $(src) as the zImage rule always adds a $(obj) # prefix @@ -395,36 +395,33 @@ $(obj)/vmlinux.strip: vmlinux $(obj)/uImage: vmlinux $(wrapperbits) FORCE $(call if_changed,wrap,uboot) -$(obj)/uImage.initrd.%: vmlinux $(obj)/%.dtb $(wrapperbits) FORCE - $(call if_changed,wrap,uboot-$*,,$(obj)/$*.dtb,$(obj)/ramdisk.image.gz) +$(obj)/uImage.initrd.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE + $(call if_changed,wrap,uboot-$*,,$(obj)/dts/$*.dtb,$(obj)/ramdisk.image.gz) -$(obj)/uImage.%: vmlinux $(obj)/%.dtb $(wrapperbits) FORCE - $(call if_changed,wrap,uboot-$*,,$(obj)/$*.dtb) +$(obj)/uImage.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE + $(call if_changed,wrap,uboot-$*,,$(obj)/dts/$*.dtb) -$(obj)/cuImage.initrd.%: vmlinux $(obj)/%.dtb $(wrapperbits) FORCE - $(call if_changed,wrap,cuboot-$*,,$(obj)/$*.dtb,$(obj)/ramdisk.image.gz) +$(obj)/cuImage.initrd.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE + $(call if_changed,wrap,cuboot-$*,,$(obj)/dts/$*.dtb,$(obj)/ramdisk.image.gz) -$(obj)/cuImage.%: vmlinux $(obj)/%.dtb $(wrapperbits) FORCE - $(call if_changed,wrap,cuboot-$*,,$(obj)/$*.dtb) +$(obj)/cuImage.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE + $(call if_changed,wrap,cuboot-$*,,$(obj)/dts/$*.dtb) -$(obj)/simpleImage.initrd.%: vmlinux $(obj)/%.dtb $(wrapperbits) FORCE - $(call if_changed,wrap,simpleboot-$*,,$(obj)/$*.dtb,$(obj)/ramdisk.image.gz) +$(obj)/simpleImage.initrd.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE + $(call if_changed,wrap,simpleboot-$*,,$(obj)/dts/$*.dtb,$(obj)/ramdisk.image.gz) -$(obj)/simpleImage.%: vmlinux $(obj)/%.dtb $(wrapperbits) FORCE - $(call if_changed,wrap,simpleboot-$*,,$(obj)/$*.dtb) +$(obj)/simpleImage.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE + $(call if_changed,wrap,simpleboot-$*,,$(obj)/dts/$*.dtb) -$(obj)/treeImage.initrd.%: 
vmlinux $(obj)/%.dtb $(wrapperbits) FORCE - $(call if_changed,wrap,treeboot-$*,,$(obj)/$*.dtb,$(obj)/ramdisk.image.gz) +$(obj)/treeImage.initrd.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE + $(call if_changed,wrap,treeboot-$*,,$(obj)/dts/$*.dtb,$(obj)/ramdisk.image.gz) -$(obj)/treeImage.%: vmlinux $(obj)/%.dtb $(wrapperbits) FORCE - $(call if_changed,wrap,treeboot-$*,,$(obj)/$*.dtb) +$(obj)/treeImage.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE + $(call if_changed,wrap,treeboot-$*,,$(obj)/dts/$*.dtb) -# Rule to build device tree blobs -$(obj)/%.dtb: $(src)/dts/%.dts FORCE - $(call if_changed_dep,dtc) - -$(obj)/%.dtb: $(src)/dts/fsl/%.dts FORCE - $(call if_changed_dep,dtc) +# Needed for the above targets to work with dts/fsl/ files
Re: [PATCH 5/5] arm64: dts: add LX2160ARDB board support
On Mon, Aug 20, 2018 at 1:52 PM Vabhav Sharma wrote: > > LX2160A reference design board (RDB) is a high-performance > computing, evaluation, and development platform with LX2160A > SoC. > > Signed-off-by: Priyanka Jain > Signed-off-by: Sriram Dash > Signed-off-by: Vabhav Sharma > --- > arch/arm64/boot/dts/freescale/Makefile| 1 + > arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts | 95 > +++ > 2 files changed, 96 insertions(+) > create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts > > diff --git a/arch/arm64/boot/dts/freescale/Makefile > b/arch/arm64/boot/dts/freescale/Makefile > index 86e18ad..445b72b 100644 > --- a/arch/arm64/boot/dts/freescale/Makefile > +++ b/arch/arm64/boot/dts/freescale/Makefile > @@ -13,3 +13,4 @@ dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2080a-rdb.dtb > dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2080a-simu.dtb > dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2088a-qds.dtb > dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2088a-rdb.dtb > +dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2160a-rdb.dtb > diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts > b/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts > new file mode 100644 > index 000..70fad20 > --- /dev/null > +++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts > @@ -0,0 +1,95 @@ > +// SPDX-License-Identifier: (GPL-2.0 OR MIT) > +// > +// Device Tree file for LX2160ARDB > +// > +// Copyright 2018 NXP > + > +/dts-v1/; > + > +#include "fsl-lx2160a.dtsi" > + > +/ { > + model = "NXP Layerscape LX2160ARDB"; > + compatible = "fsl,lx2160a-rdb", "fsl,lx2160a"; > + > + aliases { > + crypto = Drop this. Aliases should be numbered, and this is not a standard alias name either. 
> + serial0 = > + serial1 = > + serial2 = > + serial3 = > + }; > + chosen { > + stdout-path = "serial0:115200n8"; > + }; > +}; > + > + { > + status = "okay"; > +}; > + > + { > + status = "okay"; > +}; > + > + { > + status = "okay"; > + pca9547@77 { i2c-mux@77 > + compatible = "nxp,pca9547"; > + reg = <0x77>; > + #address-cells = <1>; > + #size-cells = <0>; > + > + i2c@2 { > + #address-cells = <1>; > + #size-cells = <0>; > + reg = <0x2>; > + > + ina220@40 { > + compatible = "ti,ina220"; > + reg = <0x40>; > + shunt-resistor = <1000>; > + }; > + }; > + > + i2c@3 { > + #address-cells = <1>; > + #size-cells = <0>; > + reg = <0x3>; > + > + sa56004@4c { temperature-sensor@4c > + compatible = "nxp,sa56004"; > + reg = <0x4c>; > + }; > + > + sa56004@4d { > + compatible = "nxp,sa56004"; > + reg = <0x4d>; > + }; > + }; > + }; > +}; > + > + { > + status = "okay"; > + > + rtc@51 { > + compatible = "nxp,pcf2129"; > + reg = <0x51>; > + // IRQ10_B > + interrupts = <0 150 0x4>; > + }; > + > +}; > + > + { > + status = "okay"; > +}; > + > + { > + status = "okay"; > +}; > + > + { > + status = "okay"; > +}; > -- > 2.7.4 >
[PATCH] selftests/powerpc: Do not fail on TM_CAUSE_RESCHED
There are cases where the test is not expecting to have the transaction aborted, but the test process might have been rescheduled, either at the OS level or by KVM (if it is running in a KVM guest). A process reschedule causes a treclaim/recheckpoint, which dooms the transaction, making it fail as soon as the process is rescheduled back onto the CPU. This can make the test fail, but it is not a real failure. In that case, TEXASR[FC] indicates either TM_CAUSE_RESCHED or, for KVM interruptions, TM_CAUSE_KVM_RESCHED. In this scenario, ignore these two failure causes so that the test as a whole does not return failure. CC: Gustavo Romero Signed-off-by: Breno Leitao --- tools/testing/selftests/powerpc/tm/tm-unavailable.c | 10 +++--- tools/testing/selftests/powerpc/tm/tm.h | 9 + 2 files changed, 16 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/powerpc/tm/tm-unavailable.c b/tools/testing/selftests/powerpc/tm/tm-unavailable.c index 156c8e750259..0fd793f574c3 100644 --- a/tools/testing/selftests/powerpc/tm/tm-unavailable.c +++ b/tools/testing/selftests/powerpc/tm/tm-unavailable.c @@ -236,7 +236,8 @@ void *tm_una_ping(void *input) } /* Check if we were not expecting a failure and it occurred. */ - if (!expecting_failure() && is_failure(cr_)) { + if (!expecting_failure() && is_failure(cr_) && + !failure_is_reschedule()) { printf("\n\tUnexpected transaction failure 0x%02lx\n\t", failure_code()); return (void *) -1; @@ -244,9 +245,12 @@ void *tm_una_ping(void *input) /* * Check if TM failed due to the cause we were expecting. 0xda is a -* TM_CAUSE_FAC_UNAV cause, otherwise it's an unexpected cause. +* TM_CAUSE_FAC_UNAV cause, otherwise it's an unexpected cause, unless +* it was caused by a reschedule.
*/ - if (is_failure(cr_) && !failure_is_unavailable()) { + + if (is_failure(cr_) && !failure_is_unavailable() && + !failure_is_reschedule()) { printf("\n\tUnexpected failure cause 0x%02lx\n\t", failure_code()); return (void *) -1; diff --git a/tools/testing/selftests/powerpc/tm/tm.h b/tools/testing/selftests/powerpc/tm/tm.h index df4204247d45..5518b1d4ef8b 100644 --- a/tools/testing/selftests/powerpc/tm/tm.h +++ b/tools/testing/selftests/powerpc/tm/tm.h @@ -52,6 +52,15 @@ static inline bool failure_is_unavailable(void) return (failure_code() & TM_CAUSE_FAC_UNAV) == TM_CAUSE_FAC_UNAV; } +static inline bool failure_is_reschedule(void) +{ + if ((failure_code() & TM_CAUSE_RESCHED) == TM_CAUSE_RESCHED || + (failure_code() & TM_CAUSE_KVM_RESCHED) == TM_CAUSE_KVM_RESCHED) + return true; + + return false; +} + static inline bool failure_is_nesting(void) { return (__builtin_get_texasru() & 0x40); -- 2.16.3
[PATCH] powerpc/xive: Initialize symbol before usage
Function xive_native_get_ipi() might use chip_id without it being initialized, which makes the 'smatch' tool report the following error: error: uninitialized symbol 'chip_id' This patch simply initializes chip_id to 0. CC: Benjamin Herrenschmidt Signed-off-by: Breno Leitao --- arch/powerpc/sysdev/xive/native.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/sysdev/xive/native.c b/arch/powerpc/sysdev/xive/native.c index 311185b9960a..fc56673a3c0f 100644 --- a/arch/powerpc/sysdev/xive/native.c +++ b/arch/powerpc/sysdev/xive/native.c @@ -239,7 +239,7 @@ static bool xive_native_match(struct device_node *node) static int xive_native_get_ipi(unsigned int cpu, struct xive_cpu *xc) { struct device_node *np; - unsigned int chip_id; + unsigned int chip_id = 0; s64 irq; /* Find the chip ID */ -- 2.16.3
[PATCH] powerpc/iommu: avoid derefence before pointer check
The tbl pointer is dereferenced by IOMMU_PAGE_SIZE() prior to the check that it is not NULL. This patch simply moves the dereference to after the check, where 'tbl' is guaranteed not to be NULL. CC: Alistair Popple Signed-off-by: Breno Leitao --- arch/powerpc/kernel/iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c index af7a20dc6e09..80b6caaa9b92 100644 --- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c @@ -785,9 +785,9 @@ dma_addr_t iommu_map_page(struct device *dev, struct iommu_table *tbl, vaddr = page_address(page) + offset; uaddr = (unsigned long)vaddr; - npages = iommu_num_pages(uaddr, size, IOMMU_PAGE_SIZE(tbl)); if (tbl) { + npages = iommu_num_pages(uaddr, size, IOMMU_PAGE_SIZE(tbl)); align = 0; if (tbl->it_page_shift < PAGE_SHIFT && size >= PAGE_SIZE && ((unsigned long)vaddr & ~PAGE_MASK) == 0) -- 2.16.3
[PATCH] powerpc/kernel: Fix potential spectre v1 in syscall
The rtas syscall reads a value from a user-provided structure and uses it to index an array, making it a possible spectre v1 attack vector. This is the code that exposes the problem: args.rets = &args.args[nargs]; Here nargs is a user-provided value, and the code below is an example where 'nargs' would be set to XX: struct rtas_args ra; ra.nargs = htobe32(XX); syscall(__NR_rtas, &ra); Signed-off-by: Breno Leitao --- arch/powerpc/kernel/rtas.c | 6 ++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 8afd146bc9c7..5ef3c863003d 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -27,6 +27,7 @@ #include #include #include +#include <linux/nospec.h> #include #include @@ -1056,7 +1057,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) struct rtas_args args; unsigned long flags; char *buff_copy, *errbuf = NULL; - int nargs, nret, token; + int index, nargs, nret, token; if (!capable(CAP_SYS_ADMIN)) return -EPERM; @@ -1084,7 +1085,8 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) if (token == RTAS_UNKNOWN_SERVICE) return -EINVAL; - args.rets = &args.args[nargs]; + index = array_index_nospec(nargs, ARRAY_SIZE(args.args)); + args.rets = &args.args[index]; memset(args.rets, 0, nret * sizeof(rtas_arg_t)); /* Need to handle ibm,suspend_me call specially */ -- 2.16.3
Re: Odd SIGSEGV issue introduced by commit 6b31d5955cb29 ("mm, oom: fix potential data corruption when oom_reaper races with writer")
On Tue, Aug 21, 2018 at 04:40:15PM +1000, Michael Ellerman wrote: > Christophe LEROY writes: > ... > > > > And I bisected its disappearance with commit 99cd1302327a2 ("powerpc: > > Deliver SEGV signal on pkey violation") > > Whoa that's weird. > > > Looking at those two commits, especially the one which makes it > > disappear, I'm quite sceptical. Any idea on what could be the cause and/or > > how to investigate further ? > > Are you sure it's not some corruption that just happens to be masked by > that commit? I can't see anything in that commit that could explain that > change in behaviour. > > The only real change is if you're hitting DSISR_KEYFAULT isn't it? Even with commit 99cd1302327a2, a SEGV signal should still get generated, which should kill the process. Unless the process handles SEGV signals with SEGV_PKUERR differently. The other surprising thing is: why is DSISR_KEYFAULT getting generated in the first place? Are keys somehow getting programmed into the HPTE? Feels like some random corruption. Is this behavior seen with power8 or power9? RP
Re: [PATCH] crypto: vmx - Fix sleep-in-atomic bugs
On Tue, Aug 21, 2018 at 05:24:45PM +0200, Ondrej Mosnáček wrote: > CC: Paulo Flabiano Smorigo , > linuxppc-dev@lists.ozlabs.org > > (Sorry, sent this before reading new e-mails in the thread...) > > On Tue, 21 Aug 2018 at 17:18, Ondrej Mosnacek wrote: > > > > This patch fixes sleep-in-atomic bugs in AES-CBC and AES-XTS VMX > > implementations. The problem is that the blkcipher_* functions should > > not be called in atomic context. > > > > The bugs can be reproduced via the AF_ALG interface by trying to > > encrypt/decrypt sufficiently large buffers (at least 64 KiB) using the > > VMX implementations of 'cbc(aes)' or 'xts(aes)'. Such operations then > > trigger BUG in crypto_yield(): > > > > [ 891.863680] BUG: sleeping function called from invalid context at > > include/crypto/algapi.h:424 > > [ 891.864622] in_atomic(): 1, irqs_disabled(): 0, pid: 12347, name: > > kcapi-enc > > [ 891.864739] 1 lock held by kcapi-enc/12347: > > [ 891.864811] #0: f5d42c46 (sk_lock-AF_ALG){+.+.}, at: > > skcipher_recvmsg+0x50/0x530 > > [ 891.865076] CPU: 5 PID: 12347 Comm: kcapi-enc Not tainted > > 4.19.0-0.rc0.git3.1.fc30.ppc64le #1 > > [ 891.865251] Call Trace: > > [ 891.865340] [c003387578c0] [c0d67ea4] dump_stack+0xe8/0x164 > > (unreliable) > > [ 891.865511] [c00338757910] [c0172a58] > > ___might_sleep+0x2f8/0x310 > > [ 891.865679] [c00338757990] [c06bff74] > > blkcipher_walk_done+0x374/0x4a0 > > [ 891.865825] [c003387579e0] [d7e73e70] > > p8_aes_cbc_encrypt+0x1c8/0x260 [vmx_crypto] > > [ 891.865993] [c00338757ad0] [c06c0ee0] > > skcipher_encrypt_blkcipher+0x60/0x80 > > [ 891.866128] [c00338757b10] [c06ec504] > > skcipher_recvmsg+0x424/0x530 > > [ 891.866283] [c00338757bd0] [c0b00654] sock_recvmsg+0x74/0xa0 > > [ 891.866403] [c00338757c10] [c0b00f64] > > ___sys_recvmsg+0xf4/0x2f0 > > [ 891.866515] [c00338757d90] [c0b02bb8] __sys_recvmsg+0x68/0xe0 > > [ 891.866631] [c00338757e30] [c000bbe4] system_call+0x5c/0x70 > >
> > Fixes: 8c755ace357c ("crypto: vmx - Adding CBC routines for VMX module") > > Fixes: c07f5d3da643 ("crypto: vmx - Adding support for XTS") > > Cc: sta...@vger.kernel.org > > Signed-off-by: Ondrej Mosnacek > > --- > > This patch should fix the issue, but I didn't test it. (I'll see if I > > can find some time tomorrow to try and recompile the kernel on a PPC > > machine... in the meantime please review :) > > > > drivers/crypto/vmx/aes_cbc.c | 30 ++ > > drivers/crypto/vmx/aes_xts.c | 19 --- > > 2 files changed, 26 insertions(+), 23 deletions(-) > >
> > diff --git a/drivers/crypto/vmx/aes_cbc.c b/drivers/crypto/vmx/aes_cbc.c > > index 5285ece4f33a..b71895871be3 100644 > > --- a/drivers/crypto/vmx/aes_cbc.c > > +++ b/drivers/crypto/vmx/aes_cbc.c > > @@ -107,24 +107,23 @@ static int p8_aes_cbc_encrypt(struct blkcipher_desc > > *desc, > > ret = crypto_skcipher_encrypt(req); > > skcipher_request_zero(req); > > } else { > > - preempt_disable(); > > - pagefault_disable(); > > - enable_kernel_vsx(); > > - > > blkcipher_walk_init(&walk, dst, src, nbytes); > > ret = blkcipher_walk_virt(desc, &walk); > > while ((nbytes = walk.nbytes)) { > > + preempt_disable(); > > + pagefault_disable(); > > + enable_kernel_vsx(); > > aes_p8_cbc_encrypt(walk.src.virt.addr, > >walk.dst.virt.addr, > >nbytes & AES_BLOCK_MASK, > >&ctx->enc_key, walk.iv, 1); > > + disable_kernel_vsx(); > > + pagefault_enable(); > > + preempt_enable(); > > + > > nbytes &= AES_BLOCK_SIZE - 1; > > ret = blkcipher_walk_done(desc, &walk, nbytes); > > } > > - > > - disable_kernel_vsx(); > > - pagefault_enable(); > > - preempt_enable(); > > } > > > > return ret; > > @@ -147,24 +146,23 @@ static int p8_aes_cbc_decrypt(struct blkcipher_desc > > *desc, > > ret = crypto_skcipher_decrypt(req); > > skcipher_request_zero(req); > > } else { > > - preempt_disable(); > > - pagefault_disable(); > > - enable_kernel_vsx(); > > - > > blkcipher_walk_init(&walk, dst, src, nbytes); > > ret = blkcipher_walk_virt(desc, &walk); > > while ((nbytes = walk.nbytes)) { > > + preempt_disable(); > > + pagefault_disable(); > > + enable_kernel_vsx(); > >
Re: BUG: libkcapi tests trigger sleep-in-atomic bug in VMX code (ppc64)
CC: Paulo Flabiano Smorigo
Yes, I do believe that CTR is doing it right. Preemption only needs to be disabled during the aes_p8_cbc_encrypt() call, to avoid trashing the VSX registers during the AES operation. -- Regards, Marcelo
On Tue, Aug 21, 2018 at 05:03:50PM +0200, Christophe LEROY wrote: > > > On 21/08/2018 at 16:38, Ondrej Mosnáček wrote: > > On Tue, 21 Aug 2018 at 16:18, Stephan Mueller wrote: > > > On Tuesday, 21 August 2018, 14:48:11 CEST, Ondrej Mosnáček wrote: > > > > > > Hi Ondrej, Marcelo, > > > > > > (+Marcelo) > > > > > > > Looking at crypto/algif_skcipher.c, I can see that skcipher_recvmsg() > > > > holds the socket lock the whole time and yet passes > > > > CRYPTO_TFM_REQ_MAY_SLEEP to the cipher implementation. Isn't that > > > > wrong? > > > > > > I think you are referring to lock_sock(sk)? > > > > > > If so, this should not be the culprit: the socket lock is in essence a > > > mutex-like operation with its own wait queue that is allowed to sleep. In > > > lock_sock_nested that is called by lock_sock it even has the call of > > > might_sleep which indicates that the caller may be put to sleep. > > > > > > Looking into the code (without too much debugging) I see in the function > > > p8_aes_cbc_encrypt that is part of the stack trace the call to > > > preempt_disable() which starts an atomic context. The preempt_enable() is > > > invoked after the walk operation. > > > > > > The preempt_disable increases the preempt_count. That counter is used by > > > in_atomic() to check whether we are in atomic context. > > > > > > The issue is that blkcipher_walk_done may call crypto_yield() which then > > > invokes cond_resched if the implementation is allowed to sleep. > > > > Indeed, you're right, the issue is actually in the vmx_crypto code. I > > remember having looked at the 'ctr(aes)' implementation in there a few > > days ago (I think I was trying to debug this very issue, but for some > > reason I only looked at ctr(aes)...) and I didn't find any bug, so > > that's why I jumped to suspecting the algif_skcipher code... I should > > have double-checked :) > > > > It turns out the 'cbc(aes)' (and actually also 'xts(aes)') > > implementation is coded a bit differently and they both *do* contain > > the sleep-in-atomic bug. I will try to fix them according to the > > correct CTR implementation and send a patch. > > CC: linuxppc-dev@lists.ozlabs.org > > > > > Thanks, > > Ondrej > > > > > @Marcelo: shouldn't the sleep flag be cleared when entering the > > > preempt_disable section? > > > > > > Ciao > > > Stephan
Re: BUG: libkcapi tests trigger sleep-in-atomic bug in VMX code (ppc64)
On 21/08/2018 at 16:38, Ondrej Mosnáček wrote: On Tue, 21 Aug 2018 at 16:18, Stephan Mueller wrote: On Tuesday, 21 August 2018, 14:48:11 CEST, Ondrej Mosnáček wrote: Hi Ondrej, Marcelo, (+Marcelo) Looking at crypto/algif_skcipher.c, I can see that skcipher_recvmsg() holds the socket lock the whole time and yet passes CRYPTO_TFM_REQ_MAY_SLEEP to the cipher implementation. Isn't that wrong? I think you are referring to lock_sock(sk)? If so, this should not be the culprit: the socket lock is in essence a mutex-like operation with its own wait queue that is allowed to sleep. In lock_sock_nested that is called by lock_sock it even has the call of might_sleep which indicates that the caller may be put to sleep. Looking into the code (without too much debugging) I see in the function p8_aes_cbc_encrypt that is part of the stack trace the call to preempt_disable() which starts an atomic context. The preempt_enable() is invoked after the walk operation. The preempt_disable increases the preempt_count. That counter is used by in_atomic() to check whether we are in atomic context. The issue is that blkcipher_walk_done may call crypto_yield() which then invokes cond_resched if the implementation is allowed to sleep. Indeed, you're right, the issue is actually in the vmx_crypto code. I remember having looked at the 'ctr(aes)' implementation in there a few days ago (I think I was trying to debug this very issue, but for some reason I only looked at ctr(aes)...) and I didn't find any bug, so that's why I jumped to suspecting the algif_skcipher code... I should have double-checked :) It turns out the 'cbc(aes)' (and actually also 'xts(aes)') implementation is coded a bit differently and they both *do* contain the sleep-in-atomic bug. I will try to fix them according to the correct CTR implementation and send a patch. CC: linuxppc-dev@lists.ozlabs.org Thanks, Ondrej @Marcelo: shouldn't the sleep flag be cleared when entering the preempt_disable section? Ciao Stephan
[PATCH 0/2] i2c: remove deprecated attach_adapter callback
So, I wanted to do this in the next cycle, but Linus seems to want it this cycle already [1], so here it is: Remove the attach_adapter callback from the 2.4 times by converting the last user to a custom probing mechanism based on deferred probing. We used this already in commit ac397c80de89 ("ALSA: ppc: keywest: drop using attach adapter") successfully on HW, so we agreed to use it on the windtunnel driver as well. With the last user gone, we can then remove the callback \o/ I think this allows for more cleanup in the core, but let's do this later and focus on the removal for now. Tested on a Renesas R-Car Salvator-XS board (M3N) by using and rebinding various I2C busses. Build bot and checkpatch are happy, too. I'd like to send a pull request to Linus this merge window, so looking forward to super fast comments, acks, etc... Thanks, Wolfram [1] http://patchwork.ozlabs.org/patch/959322/#1976742 Wolfram Sang (2): macintosh: therm_windtunnel: drop using attach_adapter i2c: remove deprecated attach_adapter callback drivers/i2c/i2c-core-base.c | 11 +-- drivers/macintosh/therm_windtunnel.c | 25 +++-- include/linux/i2c.h | 6 -- 3 files changed, 24 insertions(+), 18 deletions(-) -- 2.11.0
[PATCH 2/2] i2c: remove deprecated attach_adapter callback
There aren't any users left. Remove this callback from the 2.4 times. Phew, finally, that took years to reach... Signed-off-by: Wolfram Sang --- drivers/i2c/i2c-core-base.c | 11 +-- include/linux/i2c.h | 6 -- 2 files changed, 1 insertion(+), 16 deletions(-) diff --git a/drivers/i2c/i2c-core-base.c b/drivers/i2c/i2c-core-base.c index 5a937109a289..f15737763608 100644 --- a/drivers/i2c/i2c-core-base.c +++ b/drivers/i2c/i2c-core-base.c @@ -62,7 +62,7 @@ /* * core_lock protects i2c_adapter_idr, and guarantees that device detection, - * deletion of detected devices, and attach_adapter calls are serialized + * deletion of detected devices are serialized */ static DEFINE_MUTEX(core_lock); static DEFINE_IDR(i2c_adapter_idr); @@ -1124,15 +1124,6 @@ static int i2c_do_add_adapter(struct i2c_driver *driver, /* Detect supported devices on that bus, and instantiate them */ i2c_detect(adap, driver); - /* Let legacy drivers scan this bus for matching devices */ - if (driver->attach_adapter) { - dev_warn(&adap->dev, "%s: attach_adapter method is deprecated\n", -driver->driver.name); - dev_warn(&adap->dev, -"Please use another way to instantiate your i2c_client\n"); - /* We ignore the return code; if it fails, too bad */ - driver->attach_adapter(adap); - } return 0; } diff --git a/include/linux/i2c.h b/include/linux/i2c.h index 36f357ecdf67..b79387fd57da 100644 --- a/include/linux/i2c.h +++ b/include/linux/i2c.h @@ -231,7 +231,6 @@ enum i2c_alert_protocol { /** * struct i2c_driver - represent an I2C device driver * @class: What kind of i2c device we instantiate (for detect) - * @attach_adapter: Callback for bus addition (deprecated) * @probe: Callback for device binding - soon to be deprecated * @probe_new: New callback for device binding * @remove: Callback for device unbinding @@ -268,11 +267,6 @@ enum i2c_alert_protocol { struct i2c_driver { unsigned int class; - /* Notifies the driver that a new bus has appeared. You should avoid -* using this, it will be removed in a near future. -*/ - int (*attach_adapter)(struct i2c_adapter *) __deprecated; - /* Standard driver model interfaces */ int (*probe)(struct i2c_client *, const struct i2c_device_id *); int (*remove)(struct i2c_client *); -- 2.11.0
[PATCH 1/2] macintosh: therm_windtunnel: drop using attach_adapter
As we now have deferred probing, we can use a custom mechanism and finally get rid of the legacy interface from the i2c core. Signed-off-by: Wolfram Sang --- drivers/macintosh/therm_windtunnel.c | 25 +++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/drivers/macintosh/therm_windtunnel.c b/drivers/macintosh/therm_windtunnel.c index 68dcbcb4fc5b..8c744578122a 100644 --- a/drivers/macintosh/therm_windtunnel.c +++ b/drivers/macintosh/therm_windtunnel.c @@ -432,7 +432,6 @@ static struct i2c_driver g4fan_driver = { .driver = { .name = "therm_windtunnel", }, - .attach_adapter = do_attach, .probe = do_probe, .remove = do_remove, .id_table = therm_windtunnel_id, @@ -445,7 +444,29 @@ static struct i2c_driver g4fan_driver = { static int therm_of_probe(struct platform_device *dev) { - return i2c_add_driver( &g4fan_driver ); + struct i2c_adapter *adap; + int ret, i = 0; + + adap = i2c_get_adapter(0); + if (!adap) + return -EPROBE_DEFER; + + ret = i2c_add_driver(&g4fan_driver); + if (ret) { + i2c_put_adapter(adap); + return ret; + } + + /* We assume Macs have consecutive I2C bus numbers starting at 0 */ + while (adap) { + do_attach(adap); + if (x.running) + return 0; + i2c_put_adapter(adap); + adap = i2c_get_adapter(++i); + } + + return -ENODEV; } static int -- 2.11.0
Re: [PATCH] powerpc/nohash: fix pte_access_permitted()
Le 21/08/2018 à 16:25, Aneesh Kumar K.V a écrit : Christophe Leroy writes: Commit 5769beaf180a8 ("powerpc/mm: Add proper pte access check helper for other platforms") replaced generic pte_access_permitted() by an arch specific one. The generic one is defined as (pte_present(pte) && (!(write) || pte_write(pte))) The arch specific one is open coded checking that _PAGE_USER and _PAGE_WRITE (_PAGE_RW) flags are set, but lacking to check that _PAGE_RO and _PAGE_PRIVILEGED are unset, leading to a useless test on targets like the 8xx which defines _PAGE_RW and _PAGE_USER as 0. Commit 5fa5b16be5b31 ("powerpc/mm/hugetlb: Use pte_access_permitted for hugetlb access check") replaced some tests performed with pte helpers by a call to pte_access_permitted(), leading to the same issue. This patch rewrites powerpc/nohash pte_access_permitted() using pte helpers. Thanks for fixing this. I should have used the helper instead of opencoding it on nohash platforms. This is another reason why I was also suggesting we should avoid consolidating pte accessors across platforms and user accessors instead of opencoding. I fully agree. 
I have opened a topic for that at https://github.com/linuxppc/linux/issues/177 Christophe https://lore.kernel.org/lkml/87lgcusc6z@linux.vnet.ibm.com/T/#u Reviewed-by: Aneesh Kumar K.V Fixes: 5769beaf180a8 ("powerpc/mm: Add proper pte access check helper for other platforms") Fixes: 5fa5b16be5b31 ("powerpc/mm/hugetlb: Use pte_access_permitted for hugetlb access check") Cc: sta...@vger.kernel.org # v4.15+ Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/nohash/pgtable.h | 9 +++-- 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h index 2160be2e4339..b321c82b3624 100644 --- a/arch/powerpc/include/asm/nohash/pgtable.h +++ b/arch/powerpc/include/asm/nohash/pgtable.h @@ -51,17 +51,14 @@ static inline int pte_present(pte_t pte) #define pte_access_permitted pte_access_permitted static inline bool pte_access_permitted(pte_t pte, bool write) { - unsigned long pteval = pte_val(pte); /* * A read-only access is controlled by _PAGE_USER bit. * We have _PAGE_READ set for WRITE and EXECUTE */ - unsigned long need_pte_bits = _PAGE_PRESENT | _PAGE_USER; - - if (write) - need_pte_bits |= _PAGE_WRITE; + if (!pte_present(pte) || !pte_user(pte) || !pte_read(pte)) + return false; - if ((pteval & need_pte_bits) != need_pte_bits) + if (write && !pte_write(pte)) return false; return true; -- 2.13.3
Re: [PATCH] powerpc/nohash: fix pte_access_permitted()
Christophe Leroy writes: > Commit 5769beaf180a8 ("powerpc/mm: Add proper pte access check helper > for other platforms") replaced generic pte_access_permitted() by an > arch specific one. > > The generic one is defined as > (pte_present(pte) && (!(write) || pte_write(pte))) > > The arch specific one is open coded checking that _PAGE_USER and > _PAGE_WRITE (_PAGE_RW) flags are set, but lacking to check that > _PAGE_RO and _PAGE_PRIVILEGED are unset, leading to a useless test > on targets like the 8xx which defines _PAGE_RW and _PAGE_USER as 0. > > Commit 5fa5b16be5b31 ("powerpc/mm/hugetlb: Use pte_access_permitted > for hugetlb access check") replaced some tests performed with > pte helpers by a call to pte_access_permitted(), leading to the same > issue. > > This patch rewrites powerpc/nohash pte_access_permitted() > using pte helpers. > Thanks for fixing this. I should have used the helper instead of opencoding it on nohash platforms. This is another reason why I was also suggesting we should avoid consolidating pte accessors across platforms and user accessors instead of opencoding. 
https://lore.kernel.org/lkml/87lgcusc6z@linux.vnet.ibm.com/T/#u Reviewed-by: Aneesh Kumar K.V > Fixes: 5769beaf180a8 ("powerpc/mm: Add proper pte access check helper for > other platforms") > Fixes: 5fa5b16be5b31 ("powerpc/mm/hugetlb: Use pte_access_permitted for > hugetlb access check") > Cc: sta...@vger.kernel.org # v4.15+ > Signed-off-by: Christophe Leroy > --- > arch/powerpc/include/asm/nohash/pgtable.h | 9 +++-- > 1 file changed, 3 insertions(+), 6 deletions(-) > > diff --git a/arch/powerpc/include/asm/nohash/pgtable.h > b/arch/powerpc/include/asm/nohash/pgtable.h > index 2160be2e4339..b321c82b3624 100644 > --- a/arch/powerpc/include/asm/nohash/pgtable.h > +++ b/arch/powerpc/include/asm/nohash/pgtable.h > @@ -51,17 +51,14 @@ static inline int pte_present(pte_t pte) > #define pte_access_permitted pte_access_permitted > static inline bool pte_access_permitted(pte_t pte, bool write) > { > - unsigned long pteval = pte_val(pte); > /* >* A read-only access is controlled by _PAGE_USER bit. >* We have _PAGE_READ set for WRITE and EXECUTE >*/ > - unsigned long need_pte_bits = _PAGE_PRESENT | _PAGE_USER; > - > - if (write) > - need_pte_bits |= _PAGE_WRITE; > + if (!pte_present(pte) || !pte_user(pte) || !pte_read(pte)) > + return false; > > - if ((pteval & need_pte_bits) != need_pte_bits) > + if (write && !pte_write(pte)) > return false; > > return true; > -- > 2.13.3
Re: [PATCH 1/2] sched/topology: Set correct numa topology type
On Tue, Aug 21, 2018 at 04:02:58AM -0700, Srikar Dronamraju wrote: > * Srikar Dronamraju [2018-08-10 22:30:18]: > > > With commit 051f3ca02e46 ("sched/topology: Introduce NUMA identity node > > sched domain") the scheduler introduces a new numa level. However this > > leads to the numa topology on 2 node systems no longer being marked as > > NUMA_DIRECT. After this commit, it gets reported as NUMA_BACKPLANE. This > > is because sched_domains_numa_level is now 2 on 2 node systems. > > > > Fix this by allowing systems that have up to 2 numa levels to be set as > > NUMA_DIRECT. > > > > While here, remove code that assumes level can be 0. > > > > Fixes: 051f3ca02e46 "Introduce NUMA identity node sched domain" > > Signed-off-by: Srikar Dronamraju > > --- > > Hey Peter, > > Did you look at these two patches? Nope, been on holidays and the inbox is an even bigger mess than normal. I'll get to it, eventually :/
[RFC PATCH 2/2] powerpc/mm: Increase the max addressable memory to 2PB
Currently we limit the max addressable memory to 128TB. This patch increases the limit to 2PB. We can have devices like nvdimm which add memory above the 512TB limit. We still don't support regular system ram above 512TB. One of the challenges with that is the percpu allocator, which allocates per node memory and uses the max distance between them as the percpu offsets. This means with a large gap in the address space (system ram above 1PB) we will run out of vmalloc space to map the percpu allocation. In order to support addressable memory above 512TB, the kernel should be able to linear map this range. To do that with hash translation we now add 4 contexts to the kernel linear map region. Our per context addressable range is 512TB. We still keep the VMALLOC and VMEMMAP regions at the old size. SLB miss handlers are updated to validate these limits. We also limit this update to SPARSEMEM_VMEMMAP and SPARSEMEM_EXTREME. Signed-off-by: Aneesh Kumar K.V --- Changes dependent on http://patchwork.ozlabs.org/patch/960211/ "powerpc/64s/hash: convert SLB miss handlers to C" arch/powerpc/include/asm/book3s/64/mmu-hash.h | 49 --- arch/powerpc/include/asm/mmu.h | 15 ++ arch/powerpc/include/asm/sparsemem.h | 11 - arch/powerpc/mm/slb.c | 20 ++-- 4 files changed, 73 insertions(+), 22 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h index b3520b549cba..a184b71cead8 100644 --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h @@ -558,6 +558,26 @@ extern void slb_set_size(u16 size); #define ESID_BITS_MASK ((1 << ESID_BITS) - 1) #define ESID_BITS_1T_MASK ((1 << ESID_BITS_1T) - 1) +/* + * Now certain config support MAX_PHYSMEM more than 512TB. Hence we will need + * to use more than one context for linear mapping the kernel. + * For vmalloc and memmap, we use just one context with 512TB. With 64 byte + * struct page size, we need only 32 TB in memmap for 2PB (51 bits (MAX_PHYSMEM_BITS)). 
+ */ +#if (MAX_PHYSMEM_BITS > MAX_EA_BITS_PER_CONTEXT) +#define MAX_KERNEL_CTX_CNT (1UL << (MAX_PHYSMEM_BITS - MAX_EA_BITS_PER_CONTEXT)) +#else +#define MAX_KERNEL_CTX_CNT 1 +#endif + +#define MAX_VMALLOC_CTX_CNT1 +#define MAX_MEMMAP_CTX_CNT 1 + +/* + * Would be nice to use KERNEL_REGION_ID here + */ +#define KERNEL_REGION_CONTEXT_OFFSET (0xc - 1) + /* * 256MB segment * The proto-VSID space has 2^(CONTEX_BITS + ESID_BITS) - 1 segments @@ -568,12 +588,13 @@ extern void slb_set_size(u16 size); * We also need to avoid the last segment of the last context, because that * would give a protovsid of 0x1f. That will result in a VSID 0 * because of the modulo operation in vsid scramble. + * + * We add one extra context to MIN_USER_CONTEXT so that we can map kernel + * context easily. The +1 is to map the unused 0xe region mapping. */ #define MAX_USER_CONTEXT ((ASM_CONST(1) << CONTEXT_BITS) - 2) -#define MIN_USER_CONTEXT (5) - -/* Would be nice to use KERNEL_REGION_ID here */ -#define KERNEL_REGION_CONTEXT_OFFSET (0xc - 1) +#define MIN_USER_CONTEXT (MAX_KERNEL_CTX_CNT + MAX_VMALLOC_CTX_CNT + \ +MAX_MEMMAP_CTX_CNT + 2) /* * For platforms that support on 65bit VA we limit the context bits @@ -733,6 +754,23 @@ static inline unsigned long get_vsid(unsigned long context, unsigned long ea, return vsid_scramble(protovsid, VSID_MULTIPLIER_1T, vsid_bits); } +static inline unsigned long get_kernel_context(unsigned long ea) +{ + unsigned long region_id = REGION_ID(ea); + unsigned long ctx = region_id - KERNEL_REGION_CONTEXT_OFFSET; + /* +* For linear mapping we do support multiple context +*/ + if (region_id == KERNEL_REGION_ID) { + /* +* We already verified ea to be not beyond the addr limit. 
+*/ + ctx += ((ea & ~REGION_MASK) >> MAX_EA_BITS_PER_CONTEXT); + } + + return ctx; +} + /* * This is only valid for addresses >= PAGE_OFFSET */ @@ -755,8 +793,7 @@ static inline unsigned long get_kernel_vsid(unsigned long ea, int ssize) * So we can compute the context from the region (top nibble) by * subtracting 11, or 0xc - 1. */ - context = (ea >> 60) - KERNEL_REGION_CONTEXT_OFFSET; - + context = get_kernel_context(ea); return get_vsid(context, ea, ssize); } diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h index 13ea441ac531..eb20eb3b8fb0 100644 --- a/arch/powerpc/include/asm/mmu.h +++ b/arch/powerpc/include/asm/mmu.h @@ -309,6 +309,21 @@ static inline u16 get_mm_addr_key(struct mm_struct *mm, unsigned long address) */ #define MMU_PAGE_COUNT 16 +/* + * If we store section details in page->flags we can't increase the MAX_PHYSMEM_BITS + * if we increase SECTIONS_WIDTH we
[RFC PATCH 1/2] powerpc/mm/hash: Rename get_ea_context to get_user_context
We will be adding get_kernel_context later. Update the function name to indicate that this handles context allocation for user space addresses. Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/book3s/64/mmu.h | 4 ++-- arch/powerpc/include/asm/mmu_context.h | 2 +- arch/powerpc/mm/slb.c | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h index 9c8c669a6b6a..6328857f259f 100644 --- a/arch/powerpc/include/asm/book3s/64/mmu.h +++ b/arch/powerpc/include/asm/book3s/64/mmu.h @@ -208,7 +208,7 @@ extern void radix_init_pseries(void); static inline void radix_init_pseries(void) { }; #endif -static inline int get_ea_context(mm_context_t *ctx, unsigned long ea) +static inline int get_user_context(mm_context_t *ctx, unsigned long ea) { int index = ea >> MAX_EA_BITS_PER_CONTEXT; @@ -223,7 +223,7 @@ static inline int get_ea_context(mm_context_t *ctx, unsigned long ea) static inline unsigned long get_user_vsid(mm_context_t *ctx, unsigned long ea, int ssize) { - unsigned long context = get_ea_context(ctx, ea); + unsigned long context = get_user_context(ctx, ea); return get_vsid(context, ea, ssize); } diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h index b2f89b621b15..dbbab7ba449b 100644 --- a/arch/powerpc/include/asm/mmu_context.h +++ b/arch/powerpc/include/asm/mmu_context.h @@ -81,7 +81,7 @@ static inline bool need_extra_context(struct mm_struct *mm, unsigned long ea) { int context_id; - context_id = get_ea_context(&mm->context, ea); + context_id = get_user_context(&mm->context, ea); if (!context_id) return true; return false; } diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c index b0f4c038d658..f63a608922ca 100644 --- a/arch/powerpc/mm/slb.c +++ b/arch/powerpc/mm/slb.c @@ -464,7 +464,7 @@ static long slb_allocate_user(struct mm_struct *mm, unsigned long ea) if (ea >= mm->context.slb_addr_limit) return -EFAULT; - context = get_ea_context(&mm->context, ea); + context = get_user_context(&mm->context, ea); if (!context) return -EFAULT; -- 2.17.1
Re: [RFC 00/15] PCI: turn some __weak functions into callbacks
On Tue, Aug 21, 2018 at 12:30:50PM +0100, David Woodhouse wrote: > On Mon, 2018-08-20 at 23:14 -0700, Christoph Hellwig wrote: > > On Fri, Aug 17, 2018 at 12:26:30PM +0200, Arnd Bergmann wrote: > > > Hi Bjorn and others, > > > > > > Triggered by Christoph's patches, I had another go at converting > > > all of the remaining pci host bridge implementations to be based > > > on pci_alloc_host_bridge and a separate registration function. > > > > I really like the idea behind this series. > > Hm... are you turning direct calls into retpolined indirect calls? He does. But not anywhere near the fast path.
[PATCH] powerpc/nohash: fix pte_access_permitted()
Commit 5769beaf180a8 ("powerpc/mm: Add proper pte access check helper for other platforms") replaced generic pte_access_permitted() by an arch specific one. The generic one is defined as (pte_present(pte) && (!(write) || pte_write(pte))) The arch specific one is open coded checking that _PAGE_USER and _PAGE_WRITE (_PAGE_RW) flags are set, but lacking to check that _PAGE_RO and _PAGE_PRIVILEGED are unset, leading to a useless test on targets like the 8xx which defines _PAGE_RW and _PAGE_USER as 0. Commit 5fa5b16be5b31 ("powerpc/mm/hugetlb: Use pte_access_permitted for hugetlb access check") replaced some tests performed with pte helpers by a call to pte_access_permitted(), leading to the same issue. This patch rewrites powerpc/nohash pte_access_permitted() using pte helpers. Fixes: 5769beaf180a8 ("powerpc/mm: Add proper pte access check helper for other platforms") Fixes: 5fa5b16be5b31 ("powerpc/mm/hugetlb: Use pte_access_permitted for hugetlb access check") Cc: sta...@vger.kernel.org # v4.15+ Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/nohash/pgtable.h | 9 +++-- 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h index 2160be2e4339..b321c82b3624 100644 --- a/arch/powerpc/include/asm/nohash/pgtable.h +++ b/arch/powerpc/include/asm/nohash/pgtable.h @@ -51,17 +51,14 @@ static inline int pte_present(pte_t pte) #define pte_access_permitted pte_access_permitted static inline bool pte_access_permitted(pte_t pte, bool write) { - unsigned long pteval = pte_val(pte); /* * A read-only access is controlled by _PAGE_USER bit. 
* We have _PAGE_READ set for WRITE and EXECUTE */ - unsigned long need_pte_bits = _PAGE_PRESENT | _PAGE_USER; - - if (write) - need_pte_bits |= _PAGE_WRITE; + if (!pte_present(pte) || !pte_user(pte) || !pte_read(pte)) + return false; - if ((pteval & need_pte_bits) != need_pte_bits) + if (write && !pte_write(pte)) return false; return true; -- 2.13.3
Re: [RFC 00/15] PCI: turn some __weak functions into callbacks
On Mon, 2018-08-20 at 23:14 -0700, Christoph Hellwig wrote: > On Fri, Aug 17, 2018 at 12:26:30PM +0200, Arnd Bergmann wrote: > > Hi Bjorn and others, > > > > Triggered by Christoph's patches, I had another go at converting > > all of the remaining pci host bridge implementations to be based > > on pci_alloc_host_bridge and a separate registration function. > > I really like the idea behind this series. Hm... are you turning direct calls into retpolined indirect calls?
[PATCH v2] powerpc/64s/hash: convert SLB miss handlers to C
This patch moves SLB miss handlers completely to C, using the standard exception handler macros to set up the stack and branch to C. This can be done because the segment containing the kernel stack is always bolted, so accessing it with relocation on will not cause an SLB exception. Arbitrary kernel memory may not be accessed when handling kernel space SLB misses, so care should be taken there. However user SLB misses can access any kernel memory, which can be used to move some fields out of the paca (in later patches). User SLB misses could quite easily reconcile IRQs and set up a first class kernel environment and exit via ret_from_except, however that doesn't seem to be necessary at the moment, so we only do that if a bad fault is encountered. [ Credit to Aneesh for bug fixes, error checks, and improvements to bad address handling, etc. ] Signed-off-by: Nicholas Piggin Since RFC: - Added MSR[RI] handling - Fixed up a register loss bug exposed by irq tracing (Aneesh) - Reject misses outside the defined kernel regions (Aneesh) - Added several more sanity checks and error handling (Aneesh), we may look at consolidating these tests and tightening up the code but for a first pass we decided it's better to check carefully. 
Since v1: - Fixed SLB cache corruption (Aneesh) - Fixed untidy SLBE allocation "leak" in get_vsid error case - Now survives some stress testing on real hardware --- arch/powerpc/include/asm/asm-prototypes.h | 2 + arch/powerpc/kernel/exceptions-64s.S | 202 +++-- arch/powerpc/mm/Makefile | 2 +- arch/powerpc/mm/slb.c | 271 + arch/powerpc/mm/slb_low.S | 335 -- 5 files changed, 196 insertions(+), 616 deletions(-) delete mode 100644 arch/powerpc/mm/slb_low.S diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h index 1f4691ce4126..78ed3c3f879a 100644 --- a/arch/powerpc/include/asm/asm-prototypes.h +++ b/arch/powerpc/include/asm/asm-prototypes.h @@ -78,6 +78,8 @@ void kernel_bad_stack(struct pt_regs *regs); void system_reset_exception(struct pt_regs *regs); void machine_check_exception(struct pt_regs *regs); void emulation_assist_interrupt(struct pt_regs *regs); +long do_slb_fault(struct pt_regs *regs, unsigned long ea); +void do_bad_slb_fault(struct pt_regs *regs, unsigned long ea, long err); /* signals, syscalls and interrupts */ long sys_swapcontext(struct ucontext __user *old_ctx, diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index ea04dfb8c092..c3832243819b 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -566,28 +566,36 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) EXC_REAL_BEGIN(data_access_slb, 0x380, 0x80) - SET_SCRATCH0(r13) - EXCEPTION_PROLOG_0(PACA_EXSLB) - EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST_PR, 0x380) - mr r12,r3 /* save r3 */ - mfspr r3,SPRN_DAR - mfspr r11,SPRN_SRR1 - crset 4*cr6+eq - BRANCH_TO_COMMON(r10, slb_miss_common) +EXCEPTION_PROLOG(PACA_EXSLB, data_access_slb_common, EXC_STD, KVMTEST_PR, 0x380); EXC_REAL_END(data_access_slb, 0x380, 0x80) EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80) - SET_SCRATCH0(r13) - EXCEPTION_PROLOG_0(PACA_EXSLB) - EXCEPTION_PROLOG_1(PACA_EXSLB, NOTEST, 0x380) - mr r12,r3 /* save r3 
*/ - mfspr r3,SPRN_DAR - mfspr r11,SPRN_SRR1 - crset 4*cr6+eq - BRANCH_TO_COMMON(r10, slb_miss_common) +EXCEPTION_RELON_PROLOG(PACA_EXSLB, data_access_slb_common, EXC_STD, NOTEST, 0x380); EXC_VIRT_END(data_access_slb, 0x4380, 0x80) + TRAMP_KVM_SKIP(PACA_EXSLB, 0x380) +EXC_COMMON_BEGIN(data_access_slb_common) + mfspr r10,SPRN_DAR + std r10,PACA_EXSLB+EX_DAR(r13) + EXCEPTION_PROLOG_COMMON(0x380, PACA_EXSLB) + ld r4,PACA_EXSLB+EX_DAR(r13) + std r4,_DAR(r1) + addir3,r1,STACK_FRAME_OVERHEAD + bl do_slb_fault + cmpdi r3,0 + bne-1f + b fast_exception_return +1: /* Error case */ + std r3,RESULT(r1) + bl save_nvgprs + RECONCILE_IRQ_STATE(r10, r11) + ld r4,_DAR(r1) + ld r5,RESULT(r1) + addir3,r1,STACK_FRAME_OVERHEAD + bl do_bad_slb_fault + b ret_from_except + EXC_REAL(instruction_access, 0x400, 0x80) EXC_VIRT(instruction_access, 0x4400, 0x80, 0x400) @@ -610,160 +618,34 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) EXC_REAL_BEGIN(instruction_access_slb, 0x480, 0x80) - SET_SCRATCH0(r13) - EXCEPTION_PROLOG_0(PACA_EXSLB) - EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST_PR, 0x480) - mr r12,r3 /* save r3 */ - mfspr r3,SPRN_SRR0/* SRR0 is faulting address */ - mfspr r11,SPRN_SRR1 - crclr 4*cr6+eq - BRANCH_TO_COMMON(r10,
Re: [PATCH] powerpc/64s/hash: convert SLB miss handlers to C
On Tue, 21 Aug 2018 16:12:44 +1000 Benjamin Herrenschmidt wrote: > On Tue, 2018-08-21 at 15:13 +1000, Nicholas Piggin wrote: > > This patch moves SLB miss handlers completely to C, using the standard > > exception handler macros to set up the stack and branch to C. > > > > This can be done because the segment containing the kernel stack is > > always bolted, so accessing it with relocation on will not cause an > > SLB exception. > > > > Arbitrary kernel memory may not be accessed when handling kernel space > > SLB misses, so care should be taken there. However user SLB misses can > > access any kernel memory, which can be used to move some fields out of > > the paca (in later patches). > > > > User SLB misses could quite easily reconcile IRQs and set up a first > > class kernel environment and exit via ret_from_except, however that > > doesn't seem to be necessary at the moment, so we only do that if a > > bad fault is encountered. > > > > [ Credit to Aneesh for bug fixes, error checks, and improvements to bad > > address handling, etc ] > > > > Signed-off-by: Nicholas Piggin > > > > Since RFC: > > - Send patch 1 by itself to focus on the big change. > > - Added MSR[RI] handling > > - Fixed up a register loss bug exposed by irq tracing (Aneesh) > > - Reject misses outside the defined kernel regions (Aneesh) > > - Added several more sanity checks and error handling (Aneesh), we may > > look at consolidating these tests and tightening up the code but for > > a first pass we decided it's better to check carefully. > > --- > > arch/powerpc/include/asm/asm-prototypes.h | 2 + > > arch/powerpc/kernel/exceptions-64s.S | 202 +++-- > > arch/powerpc/mm/Makefile | 2 +- > > arch/powerpc/mm/slb.c | 257 + > > arch/powerpc/mm/slb_low.S | 335 -- > > 5 files changed, 185 insertions(+), 613 deletions(-) > ^^^^^^ > > Nice ! :-) Okay with the subsequent improvements, the context switching benchmark is about 30% faster. 
Radix is still faster, but hash now has 90% the throughput rather than 70% with upstream. Although that's no longer testing SLB miss performance because they are all eliminated. I don't know of any other SLB miss intensive tests. I think these numbers indicate it may be a reasonable approach wrt performance. Though we should still look at HANA or something on a big memory system. Thanks, Nick
Re: [PATCH 1/2] sched/topology: Set correct numa topology type
* Srikar Dronamraju [2018-08-10 22:30:18]:

> With commit 051f3ca02e46 ("sched/topology: Introduce NUMA identity node
> sched domain") the scheduler introduces a new numa level. However this
> leads to the numa topology on 2 node systems no longer being marked as
> NUMA_DIRECT. After this commit, it gets reported as NUMA_BACKPLANE. This
> is because sched_domains_numa_level is now 2 on 2 node systems.
>
> Fix this by allowing systems that have up to 2 numa levels to be set as
> NUMA_DIRECT.
>
> While here, remove code that assumes level can be 0.
>
> Fixes: 051f3ca02e46 "Introduce NUMA identity node sched domain"
> Signed-off-by: Srikar Dronamraju
> ---

Hey Peter,

Did you look at these two patches?

--
Thanks and Regards
Srikar Dronamraju
[PATCH RFCv2 6/6] memory-hotplug.txt: Add some details about locking internals
Let's document the magic a bit, especially why device_hotplug_lock is
required when adding/removing memory and how it all plays together with
requests to online/offline memory from user space.

Cc: Jonathan Corbet
Cc: Michal Hocko
Cc: Andrew Morton
Signed-off-by: David Hildenbrand
---
 Documentation/memory-hotplug.txt | 39 +++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 7f49ebf3ddb2..03aaad7d7373 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -3,7 +3,7 @@ Memory Hotplug
 ==
 
 :Created:	Jul 28 2007
-:Updated: Add description of notifier of memory hotplug:	Oct 11 2007
+:Updated: Add some details about locking internals:	Aug 20 2018
 
 This document is about memory hotplug including how-to-use and current status.
 Because Memory Hotplug is still under development, contents of this text will
@@ -495,6 +495,43 @@ further processing of the notification queue.
 
 NOTIFY_STOP stops further processing of the notification queue.
 
+
+Locking Internals
+=
+
+When adding/removing memory that uses memory block devices (i.e. ordinary RAM),
+the device_hotplug_lock should be held to:
+
+- synchronize against online/offline requests (e.g. via sysfs). This way, memory
+  block devices can only be accessed (.online/.state attributes) by user
+  space once memory has been fully added. And when removing memory, we
+  know nobody is in critical sections.
+- synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC)
+
+Especially, there is a possible lock inversion that is avoided using
+device_hotplug_lock when adding memory and user space tries to online that
+memory faster than expected:
+
+- device_online() will first take the device_lock(), followed by
+  mem_hotplug_lock
+- add_memory_resource() will first take the mem_hotplug_lock, followed by
+  the device_lock() (while creating the devices, during bus_add_device()).
+
+As the device is visible to user space before taking the device_lock(), this
+can result in a lock inversion.
+
+onlining/offlining of memory should be done via device_online()/
+device_offline() - to make sure it is properly synchronized to actions
+via sysfs. Holding device_hotplug_lock is advised (to e.g. protect online_type)
+
+When adding/removing/onlining/offlining memory or adding/removing
+heterogeneous/device memory, we should always hold the mem_hotplug_lock to
+serialise memory hotplug (e.g. access to global/zone variables).
+
+In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) allows
+for a quite efficient get_online_mems/put_online_mems implementation.
+
+
 Future Work
 ===
-- 
2.17.1
[PATCH RFCv2 5/6] powerpc/powernv: hold device_hotplug_lock in memtrace_offline_pages()
Let's perform all checking + offlining + removing under
device_hotplug_lock, so nobody can mess with these devices via sysfs
concurrently.

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Rashmica Gupta
Cc: Balbir Singh
Cc: Michael Neuling
Signed-off-by: David Hildenbrand
---
 arch/powerpc/platforms/powernv/memtrace.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index ef7181d4fe68..473e59842ec5 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -74,9 +74,13 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
 {
 	u64 end_pfn = start_pfn + nr_pages - 1;
 
+	lock_device_hotplug();
+
 	if (walk_memory_range(start_pfn, end_pfn, NULL,
-	    check_memblock_online))
+	    check_memblock_online)) {
+		unlock_device_hotplug();
 		return false;
+	}
 
 	walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
 			  change_memblock_state);
@@ -84,14 +88,16 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
 	if (offline_pages(start_pfn, nr_pages)) {
 		walk_memory_range(start_pfn, end_pfn, (void *)MEM_ONLINE,
 				  change_memblock_state);
+		unlock_device_hotplug();
 		return false;
 	}
 
 	walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
 			  change_memblock_state);
 
-	remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
+	__remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
+	unlock_device_hotplug();
 
 	return true;
 }
-- 
2.17.1
[PATCH RFCv2 4/6] powerpc/powernv: hold device_hotplug_lock when calling device_online()
device_online() should be called with device_hotplug_lock() held.

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Rashmica Gupta
Cc: Balbir Singh
Cc: Michael Neuling
Signed-off-by: David Hildenbrand
---
 arch/powerpc/platforms/powernv/memtrace.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 8f1cd4f3bfd5..ef7181d4fe68 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -229,9 +229,11 @@ static int memtrace_online(void)
 	 * we need to online the memory ourselves.
 	 */
 	if (!memhp_auto_online) {
+		lock_device_hotplug();
 		walk_memory_range(PFN_DOWN(ent->start),
 				  PFN_UP(ent->start + ent->size - 1),
 				  NULL, online_mem_block);
+		unlock_device_hotplug();
 	}
 
 	/*
-- 
2.17.1
[PATCH RFCv2 3/6] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock
There seem to be some problems as a result of 30467e0b3be ("mm, hotplug:
fix concurrent memory hot-add deadlock"), which tried to fix a possible
lock inversion reported and discussed in [1] due to the two locks
	a) device_lock()
	b) mem_hotplug_lock

While add_memory() first takes b), followed by a) during
bus_probe_device(), onlining of memory from user space first took a),
followed by b), exposing a possible deadlock. In [1], it was decided to
not make use of device_hotplug_lock, but rather to enforce a locking
order.

The problems I spotted related to this:

1. Memory block device attributes: While .state first calls
   mem_hotplug_begin() and then calls device_online() - which takes
   device_lock() - .online no longer calls mem_hotplug_begin(), so it
   effectively calls online_pages() without mem_hotplug_lock.

2. device_online() should be called under device_hotplug_lock, however
   onlining memory during add_memory() does not take care of that.

In addition, I think there is also something wrong about the locking in

3. arch/powerpc/platforms/powernv/memtrace.c, which calls
   offline_pages() without locks. This was introduced after 30467e0b3be.
   And skimming over the code, I assume it could need some more care in
   regards to locking (e.g. device_online() called without
   device_hotplug_lock - but I'll not touch that for now).

Now that we hold the device_hotplug_lock when
- adding memory (e.g. via add_memory()/add_memory_resource())
- removing memory (e.g. via remove_memory())
- device_online()/device_offline()

we can move mem_hotplug_lock usage back into
online_pages()/offline_pages().

Why is mem_hotplug_lock still needed? Essentially to make
get_online_mems()/put_online_mems() be very fast (relying on
device_hotplug_lock would be very slow), and to serialize against
addition of memory that does not create memory block devices (hmm).

[1] http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2015-February/065324.html

This patch is partly based on a patch by Vitaly Kuznetsov.
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: "Rafael J. Wysocki"
Cc: Len Brown
Cc: Greg Kroah-Hartman
Cc: "K. Y. Srinivasan"
Cc: Haiyang Zhang
Cc: Stephen Hemminger
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: Boris Ostrovsky
Cc: Juergen Gross
Cc: Rashmica Gupta
Cc: Michael Neuling
Cc: Balbir Singh
Cc: Kate Stewart
Cc: Thomas Gleixner
Cc: Philippe Ombredanne
Cc: Andrew Morton
Cc: Michal Hocko
Cc: Pavel Tatashin
Cc: Vlastimil Babka
Cc: Dan Williams
Cc: Oscar Salvador
Cc: YASUAKI ISHIMATSU
Cc: Mathieu Malaterre
Signed-off-by: David Hildenbrand
---
 drivers/base/memory.c | 13 +
 mm/memory_hotplug.c   | 28 
 2 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 5b0375be7f65..04be13539eb8 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -228,7 +228,6 @@ static bool pages_correctly_probed(unsigned long start_pfn)
 /*
  * MEMORY_HOTPLUG depends on SPARSEMEM in mm/Kconfig, so it is
  * OK to have direct references to sparsemem variables in here.
- * Must already be protected by mem_hotplug_begin().
  */
 static int memory_block_action(unsigned long phys_index, unsigned long action,
 			int online_type)
@@ -294,7 +293,6 @@ static int memory_subsys_online(struct device *dev)
 	if (mem->online_type < 0)
 		mem->online_type = MMOP_ONLINE_KEEP;
 
-	/* Already under protection of mem_hotplug_begin() */
 	ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE);
 
 	/* clear online_type */
@@ -341,19 +339,11 @@ store_mem_state(struct device *dev,
 		goto err;
 	}
 
-	/*
-	 * Memory hotplug needs to hold mem_hotplug_begin() for probe to find
-	 * the correct memory block to online before doing device_online(dev),
-	 * which will take dev->mutex. Take the lock early to prevent an
-	 * inversion, memory_subsys_online() callbacks will be implemented by
-	 * assuming it's already protected.
-	 */
-	mem_hotplug_begin();
-
 	switch (online_type) {
 	case MMOP_ONLINE_KERNEL:
 	case MMOP_ONLINE_MOVABLE:
 	case MMOP_ONLINE_KEEP:
+		/* mem->online_type is protected by device_hotplug_lock */
 		mem->online_type = online_type;
 		ret = device_online(&mem->dev);
 		break;
@@ -364,7 +354,6 @@ store_mem_state(struct device *dev,
 		ret = -EINVAL; /* should never happen */
 	}
 
-	mem_hotplug_done();
 err:
 	unlock_device_hotplug();
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e2b5c751e3ea..a2c6c87d83f3 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -881,7 +881,6 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
 	return zone;
 }
 
-/* Must be protected by mem_hotplug_begin() or
[PATCH RFCv2 2/6] mm/memory_hotplug: make add_memory() take the device_hotplug_lock
add_memory() currently does not take the device_hotplug_lock, however it
is already called under the lock from
	arch/powerpc/platforms/pseries/hotplug-memory.c
	drivers/acpi/acpi_memhotplug.c
to synchronize against CPU hot-remove and similar.

In general, we should hold the device_hotplug_lock when adding memory to
synchronize against online/offline requests (e.g. from user space) -
which already resulted in lock inversions due to device_lock() and
mem_hotplug_lock - see 30467e0b3be ("mm, hotplug: fix concurrent memory
hot-add deadlock"). add_memory()/add_memory_resource() will create
memory block devices, so this really feels like the right thing to do.

Holding the device_hotplug_lock makes sure that a memory block device
can really only be accessed (e.g. via .online/.state) from user space,
once the memory has been fully added to the system.

The lock is not held yet in
	drivers/xen/balloon.c
	arch/powerpc/platforms/powernv/memtrace.c
	drivers/s390/char/sclp_cmd.c
	drivers/hv/hv_balloon.c
So, let's either use the locked variants or take the lock.

Don't export add_memory_resource(), as it once was exported to be used
by XEN, which is never built as a module. If somebody requires it, we
also have to export a locked variant (as device_hotplug_lock is never
exported).

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: "Rafael J. Wysocki"
Cc: Len Brown
Cc: Greg Kroah-Hartman
Cc: Boris Ostrovsky
Cc: Juergen Gross
Cc: Nathan Fontenot
Cc: John Allen
Cc: Andrew Morton
Cc: Michal Hocko
Cc: Dan Williams
Cc: Joonsoo Kim
Cc: Vlastimil Babka
Cc: Oscar Salvador
Cc: Mathieu Malaterre
Cc: Pavel Tatashin
Cc: YASUAKI ISHIMATSU
Signed-off-by: David Hildenbrand
---
 .../platforms/pseries/hotplug-memory.c |  2 +-
 drivers/acpi/acpi_memhotplug.c         |  2 +-
 drivers/base/memory.c                  |  9 ++--
 drivers/xen/balloon.c                  |  3 +++
 include/linux/memory_hotplug.h         |  1 +
 mm/memory_hotplug.c                    | 22 ---
 6 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index b3f54466e25f..2e6f41dc103a 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -702,7 +702,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 	nid = memory_add_physaddr_to_nid(lmb->base_addr);
 
 	/* Add the memory */
-	rc = add_memory(nid, lmb->base_addr, block_sz);
+	rc = __add_memory(nid, lmb->base_addr, block_sz);
 	if (rc) {
 		dlpar_remove_device_tree_lmb(lmb);
 		return rc;
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 811148415993..8fe0960ea572 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -228,7 +228,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 		if (node < 0)
 			node = memory_add_physaddr_to_nid(info->start_addr);
 
-		result = add_memory(node, info->start_addr, info->length);
+		result = __add_memory(node, info->start_addr, info->length);
 
 		/*
 		 * If the memory block has been used by the kernel, add_memory()
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index c8a1cb0b6136..5b0375be7f65 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -521,15 +521,20 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,
 	if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
 		return -EINVAL;
 
+	ret = lock_device_hotplug_sysfs();
+	if (ret)
+		goto out;
+
 	nid = memory_add_physaddr_to_nid(phys_addr);
-	ret = add_memory(nid, phys_addr,
-			 MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+	ret = __add_memory(nid, phys_addr,
+			   MIN_MEMORY_BLOCK_SIZE * sections_per_block);
 
 	if (ret)
 		goto out;
 
 	ret = count;
 out:
+	unlock_device_hotplug();
 	return ret;
 }
 
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index e12bb256036f..6bab019a82b1 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -395,7 +395,10 @@ static enum bp_state reserve_additional_memory(void)
 	 * callers drop the mutex before trying again.
 	 */
 	mutex_unlock(&balloon_mutex);
+	/* add_memory_resource() requires the device_hotplug lock */
+	lock_device_hotplug();
 	rc = add_memory_resource(nid, resource, memhp_auto_online);
+	unlock_device_hotplug();
 	mutex_lock(&balloon_mutex);
 
 	if (rc) {
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 1f096852f479..ffd9cd10fcf3 100644
---
[PATCH RFCv2 1/6] mm/memory_hotplug: make remove_memory() take the device_hotplug_lock
remove_memory() is exported right now but requires the
device_hotplug_lock, which is not exported. So let's provide a variant
that takes the lock and only export that one.

The lock is already held in
	arch/powerpc/platforms/pseries/hotplug-memory.c
	drivers/acpi/acpi_memhotplug.c
So, let's use the locked variant.

The lock is not held (but taken in)
	arch/powerpc/platforms/powernv/memtrace.c
So let's keep using the (now) locked variant.

Apart from that, there are no other users in the tree.

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: "Rafael J. Wysocki"
Cc: Len Brown
Cc: Rashmica Gupta
Cc: Michael Neuling
Cc: Balbir Singh
Cc: Nathan Fontenot
Cc: John Allen
Cc: Andrew Morton
Cc: Michal Hocko
Cc: Dan Williams
Cc: Joonsoo Kim
Cc: Vlastimil Babka
Cc: Pavel Tatashin
Cc: Greg Kroah-Hartman
Cc: Oscar Salvador
Cc: YASUAKI ISHIMATSU
Cc: Mathieu Malaterre
Signed-off-by: David Hildenbrand
---
 arch/powerpc/platforms/powernv/memtrace.c       | 2 --
 arch/powerpc/platforms/pseries/hotplug-memory.c | 6 +++---
 drivers/acpi/acpi_memhotplug.c                  | 2 +-
 include/linux/memory_hotplug.h                  | 3 ++-
 mm/memory_hotplug.c                             | 9 -
 5 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 51dc398ae3f7..8f1cd4f3bfd5 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -90,9 +90,7 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
 	walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
 			  change_memblock_state);
 
-	lock_device_hotplug();
 	remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
-	unlock_device_hotplug();
 
 	return true;
 }
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index c1578f54c626..b3f54466e25f 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -334,7 +334,7 @@ static int pseries_remove_memblock(unsigned long base, unsigned int memblock_size)
 	nid = memory_add_physaddr_to_nid(base);
 
 	for (i = 0; i < sections_per_block; i++) {
-		remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE);
+		__remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE);
 		base += MIN_MEMORY_BLOCK_SIZE;
 	}
 
@@ -423,7 +423,7 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
 	block_sz = pseries_memory_block_size();
 	nid = memory_add_physaddr_to_nid(lmb->base_addr);
 
-	remove_memory(nid, lmb->base_addr, block_sz);
+	__remove_memory(nid, lmb->base_addr, block_sz);
 
 	/* Update memory regions for memory remove */
 	memblock_remove(lmb->base_addr, block_sz);
@@ -710,7 +710,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 
 	rc = dlpar_online_lmb(lmb);
 	if (rc) {
-		remove_memory(nid, lmb->base_addr, block_sz);
+		__remove_memory(nid, lmb->base_addr, block_sz);
 		dlpar_remove_device_tree_lmb(lmb);
 	} else {
 		lmb->flags |= DRCONF_MEM_ASSIGNED;
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 6b0d3ef7309c..811148415993 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -282,7 +282,7 @@ static void acpi_memory_remove_memory(struct acpi_memory_device *mem_device)
 			nid = memory_add_physaddr_to_nid(info->start_addr);
 
 		acpi_unbind_memory_blocks(info);
-		remove_memory(nid, info->start_addr, info->length);
+		__remove_memory(nid, info->start_addr, info->length);
 		list_del(&info->list);
 		kfree(info);
 	}
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 34a28227068d..1f096852f479 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -301,6 +301,7 @@ extern bool is_mem_section_removable(unsigned long pfn, unsigned long nr_pages);
 extern void try_offline_node(int nid);
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
 extern void remove_memory(int nid, u64 start, u64 size);
+extern void __remove_memory(int nid, u64 start, u64 size);
 #else
 static inline bool is_mem_section_removable(unsigned long pfn,
@@ -317,6 +318,7 @@ static inline int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 }
 
 static inline void remove_memory(int nid, u64 start, u64 size) {}
+static inline void __remove_memory(int nid, u64 start, u64 size) {}
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
 extern void __ref free_area_init_core_hotplug(int nid);
@@ -330,7 +332,6 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned
[PATCH RFCv2 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock
This is the same approach as in the first RFC, but this time without
exporting device_hotplug_lock (requested by Greg) and with some more
details and documentation regarding locking. Tested only on x86 so far.

--

Reading through the code and studying how mem_hotplug_lock is to be
used, I noticed that there are two places where we can end up calling
device_online()/device_offline() -> online_pages()/offline_pages()
without the mem_hotplug_lock. And there are other places where we call
device_online()/device_offline() without the device_hotplug_lock.

While e.g.
	echo "online" > /sys/devices/system/memory/memory9/state
is fine, e.g.
	echo 1 > /sys/devices/system/memory/memory9/online
will not take the mem_hotplug_lock. However, it will take the
device_lock() and device_hotplug_lock.

E.g. via memory_probe_store(), we can end up calling
add_memory()->online_pages() without the device_hotplug_lock. So we can
have concurrent callers in online_pages(). We then e.g. touch
zone->present_pages in online_pages() basically unprotected.

Looks like there is a longer history to that (see Patch #2 for details),
and fixing it to work the way it was intended is not really possible. We
would e.g. have to take the mem_hotplug_lock in device/base/core.c,
which sounds wrong.

Summary: We had a lock inversion on mem_hotplug_lock and device_lock().
More details can be found in patch 3 and patch 6.

I propose the general rules (documentation added in patch 6):

1. add_memory/add_memory_resource() must only be called with
   device_hotplug_lock.
2. remove_memory() must only be called with device_hotplug_lock. This is
   already documented and holds for all callers.
3. device_online()/device_offline() must only be called with
   device_hotplug_lock. This is already documented and true for now in
   core code. Other callers (related to memory hotplug) have to be fixed
   up.
4. mem_hotplug_lock is taken inside of add_memory/remove_memory/
   online_pages/offline_pages.

To me, this looks way cleaner than what we have right now (and easier to
verify). And looking at the documentation of remove_memory, using
lock_device_hotplug also for add_memory() feels natural.

RFC -> RFCv2:
- Don't export device_hotplug_lock, provide proper remove_memory/
  add_memory wrappers.
- Split up the patches a bit.
- Try to improve powernv memtrace locking.
- Add some documentation for locking that matches my knowledge.

David Hildenbrand (6):
  mm/memory_hotplug: make remove_memory() take the device_hotplug_lock
  mm/memory_hotplug: make add_memory() take the device_hotplug_lock
  mm/memory_hotplug: fix online/offline_pages called w.o.
    mem_hotplug_lock
  powerpc/powernv: hold device_hotplug_lock when calling device_online()
  powerpc/powernv: hold device_hotplug_lock in memtrace_offline_pages()
  memory-hotplug.txt: Add some details about locking internals

 Documentation/memory-hotplug.txt           | 39 +++-
 arch/powerpc/platforms/powernv/memtrace.c  | 14 +++--
 .../platforms/pseries/hotplug-memory.c     |  8 +--
 drivers/acpi/acpi_memhotplug.c             |  4 +-
 drivers/base/memory.c                      | 22 +++
 drivers/xen/balloon.c                      |  3 +
 include/linux/memory_hotplug.h             |  4 +-
 mm/memory_hotplug.c                        | 59 +++
 8 files changed, 115 insertions(+), 38 deletions(-)

-- 
2.17.1
Re: powerpc/fadump: cleanup crash memory ranges support
On Fri, 2018-08-17 at 20:40:56 UTC, Hari Bathini wrote: > Commit 1bd6a1c4b80a ("powerpc/fadump: handle crash memory ranges array > index overflow") changed crash memory ranges to a dynamic array that > is reallocated on-demand with krealloc(). The relevant header for this > call was not included. The kernel compiles though. But be cautious and > add the header anyway. > > Also, memory allocation logic in fadump_add_crash_memory() takes care > of memory allocation for crash memory ranges in all scenarios. Drop > unnecessary memory allocation in fadump_setup_crash_memory_ranges(). > > Fixes: 1bd6a1c4b80a ("powerpc/fadump: handle crash memory ranges array index > overflow") > Cc: Mahesh Salgaonkar > Signed-off-by: Hari Bathini Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/a58183138cb72059a0c278f8370a47 cheers
Re: [v5] powerpc/topology: Get topology for shared processors at boot
On Fri, 2018-08-17 at 14:54:39 UTC, Srikar Dronamraju wrote:
> On a shared lpar, Phyp will not update the cpu associativity at boot
> time. Just after the boot system does recognize itself as a shared lpar and
> trigger a request for correct cpu associativity. But by then the scheduler
> would have already created/destroyed its sched domains.
>
> This causes
> - Broken load balance across Nodes causing islands of cores.
> - Performance degradation esp if the system is lightly loaded
> - dmesg to wrongly report all cpus to be in Node 0.
> - Messages in dmesg saying borken topology.
> - With commit 051f3ca02e46 ("sched/topology: Introduce NUMA identity
>   node sched domain"), can cause rcu stalls at boot up.
>
> From a scheduler maintainer's perspective, moving cpus from one node to
> another or creating more numa levels after boot is not appropriate
> without some notification to the user space.
> https://lore.kernel.org/lkml/20150406214558.ga38...@linux.vnet.ibm.com/T/#u
>
> The sched_domains_numa_masks table which is used to generate cpumasks is
> only created at boot time just before creating sched domains and never
> updated. Hence, its better to get the topology correct before the sched
> domains are created.
>
> For example on 64 core Power 8 shared lpar, dmesg reports
>
> [2.088360] Brought up 512 CPUs
> [2.088368] Node 0 CPUs: 0-511
> [2.088371] Node 1 CPUs:
> [2.088373] Node 2 CPUs:
> [2.088375] Node 3 CPUs:
> [2.088376] Node 4 CPUs:
> [2.088378] Node 5 CPUs:
> [2.088380] Node 6 CPUs:
> [2.088382] Node 7 CPUs:
> [2.088386] Node 8 CPUs:
> [2.088388] Node 9 CPUs:
> [2.088390] Node 10 CPUs:
> [2.088392] Node 11 CPUs:
> ...
> [3.916091] BUG: arch topology borken
> [3.916103]      the DIE domain not a subset of the NUMA domain
> [3.916105] BUG: arch topology borken
> [3.916106]      the DIE domain not a subset of the NUMA domain
> ...
>
> numactl/lscpu output will still be correct with cores spreading across
> all nodes.
>
> Socket(s):             64
> NUMA node(s):          12
> Model:                 2.0 (pvr 004d 0200)
> Model name:            POWER8 (architected), altivec supported
> Hypervisor vendor:     pHyp
> Virtualization type:   para
> L1d cache:             64K
> L1i cache:             32K
> NUMA node0 CPU(s): 0-7,32-39,64-71,96-103,176-183,272-279,368-375,464-471
> NUMA node1 CPU(s): 8-15,40-47,72-79,104-111,184-191,280-287,376-383,472-479
> NUMA node2 CPU(s): 16-23,48-55,80-87,112-119,192-199,288-295,384-391,480-487
> NUMA node3 CPU(s): 24-31,56-63,88-95,120-127,200-207,296-303,392-399,488-495
> NUMA node4 CPU(s): 208-215,304-311,400-407,496-503
> NUMA node5 CPU(s): 168-175,264-271,360-367,456-463
> NUMA node6 CPU(s): 128-135,224-231,320-327,416-423
> NUMA node7 CPU(s): 136-143,232-239,328-335,424-431
> NUMA node8 CPU(s): 216-223,312-319,408-415,504-511
> NUMA node9 CPU(s): 144-151,240-247,336-343,432-439
> NUMA node10 CPU(s): 152-159,248-255,344-351,440-447
> NUMA node11 CPU(s): 160-167,256-263,352-359,448-455
>
> Currently on this lpar, the scheduler detects 2 levels of Numa and
> created numa sched domains for all cpus, but it finds a single DIE
> domain consisting of all cpus. Hence it deletes all numa sched domains.
>
> To address this, detect the shared processor and update topology soon after
> cpus are setup so that correct topology is updated just before scheduler
> creates sched domain.
>
> With the fix, dmesg reports
>
> [0.491336] numa: Node 0 CPUs: 0-7 32-39 64-71 96-103 176-183 272-279 368-375 464-471
> [0.491351] numa: Node 1 CPUs: 8-15 40-47 72-79 104-111 184-191 280-287 376-383 472-479
> [0.491359] numa: Node 2 CPUs: 16-23 48-55 80-87 112-119 192-199 288-295 384-391 480-487
> [0.491366] numa: Node 3 CPUs: 24-31 56-63 88-95 120-127 200-207 296-303 392-399 488-495
> [0.491374] numa: Node 4 CPUs: 208-215 304-311 400-407 496-503
> [0.491379] numa: Node 5 CPUs: 168-175 264-271 360-367 456-463
> [0.491384] numa: Node 6 CPUs: 128-135 224-231 320-327 416-423
> [0.491389] numa: Node 7 CPUs: 136-143 232-239 328-335 424-431
> [0.491394] numa: Node 8 CPUs: 216-223 312-319 408-415 504-511
> [0.491399] numa: Node 9 CPUs: 144-151 240-247 336-343 432-439
> [0.491404] numa: Node 10 CPUs: 152-159 248-255 344-351 440-447
> [0.491409] numa: Node 11 CPUs: 160-167 256-263 352-359 448-455
>
> and lscpu would also report
>
> Socket(s):             64
> NUMA node(s):          12
> Model:                 2.0 (pvr 004d 0200)
> Model name:            POWER8 (architected), altivec supported
> Hypervisor vendor:     pHyp
> Virtualization type:   para
> L1d cache:             64K
> L1i cache:             32K
> NUMA node0 CPU(s): 0-7,32-39,64-71,96-103,176-183,272-279,368-375,464-471
> NUMA node1 CPU(s): 8-15,40-47,72-79,104-111,184-191,280-287,376-383,472-479
> NUMA node2 CPU(s):
Re: [v2] powerpc/powernv/pci: Work around races in PCI bridge enabling
On Fri, 2018-08-17 at 07:30:39 UTC, Benjamin Herrenschmidt wrote:
> The generic code is racy when multiple children of a PCI bridge try to
> enable it simultaneously.
>
> This leads to drivers trying to access a device through a not-yet-enabled
> bridge, and this triggers EEH errors under various circumstances when
> using parallel driver probing.
>
> There is work going on to fix that properly in the PCI core but it will
> take some time.
>
> x86 gets away with it because (outside of hotplug), the BIOS enables all
> the bridges at boot time.
>
> This patch does the same thing on powernv by enabling all bridges that
> have child devices at boot time, thus avoiding subsequent races. It's
> suitable for backporting to stable and distros, while the proper PCI fix
> will probably be significantly more invasive.
>
> Signed-off-by: Benjamin Herrenschmidt
> CC: sta...@vger.kernel.org

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/db2173198b9513f7add8009f225afa

cheers
Re: powerpc/traps: Avoid rate limit messages from show unhandled signals
On Fri, 2018-08-17 at 06:55:00 UTC, Michael Ellerman wrote: > In the recent commit to add an explicit ratelimit state when showing > unhandled signals, commit 35a52a10c3ac ("powerpc/traps: Use an > explicit ratelimit state for show_signal_msg()"), I put the check of > show_unhandled_signals and the ratelimit state before the call to > unhandled_signal() so as to avoid unnecessarily calling the latter > when show_unhandled_signals is false. > > However that causes us to check the ratelimit state on every call, so > if we take a lot of *handled* signals that has the effect of making > the ratelimit code print warnings that callbacks have been suppressed > when they haven't. > > So rearrange the code so that we check show_unhandled_signals first, > then call unhandled_signal() and finally check the ratelimit state. > > Signed-off-by: Michael Ellerman > Reviewed-by: Murilo Opsfelder Araujo Applied to powerpc next. https://git.kernel.org/powerpc/c/997dd26cb3c8b7c9b8765751cc1491 cheers
Re: powerpc/64s: idle_power4 fix PACA_IRQ_HARD_DIS accounting
On Tue, 2018-08-07 at 13:21:56 UTC, Nicholas Piggin wrote: > When idle_power4 hard disables interrupts then finds a soft pending > interrupt, it returns with interrupts hard disabled but without > PACA_IRQ_HARD_DIS set. Commit 9b81c0211c ("powerpc/64s: make > PACA_IRQ_HARD_DIS track MSR[EE] closely") added a warning for that > condition. > > Fix this by adding the PACA_IRQ_HARD_DIS for that case. > > Signed-off-by: Nicholas Piggin Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/993ff6d9df74305bc7b5bbc7a0810c cheers
Re: [15/15] powerpc/powernv: provide a console flush operation for opal hvc driver
On Mon, 2018-04-30 at 14:55:58 UTC, Nicholas Piggin wrote: > Provide the flush hv_op for the opal hvc driver. This will flush the > firmware console buffers without spinning with interrupts disabled. > > Cc: Benjamin Herrenschmidt > Cc: linuxppc-dev@lists.ozlabs.org > Signed-off-by: Nicholas Piggin Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/95b861a76c1ded3e89d33a3d9f4552 cheers
Re: [PATCH] poewrpc/mce: Fix SLB rebolting during MCE recovery path.
On Fri, 17 Aug 2018 14:51:47 +0530 Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar
>
> With the powerpc next commit e7e81847478 (poewrpc/mce: Fix SLB rebolting
> during MCE recovery path.), the SLB error recovery is broken. The
> commit missed a crucial change of OR-ing index value to RB[52-63] which
> selects the SLB entry while rebolting. This patch fixes that.
>
> Signed-off-by: Mahesh Salgaonkar
> Reviewed-by: Nicholas Piggin
> ---
>  arch/powerpc/mm/slb.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
> index 0b095fa54049..6dd9913425bc 100644
> --- a/arch/powerpc/mm/slb.c
> +++ b/arch/powerpc/mm/slb.c
> @@ -101,9 +101,12 @@ void __slb_restore_bolted_realmode(void)
>
>  	/* No isync needed because realmode. */
>  	for (index = 0; index < SLB_NUM_BOLTED; index++) {
> +		unsigned long rb = be64_to_cpu(p->save_area[index].esid);
> +
> +		rb = (rb & ~0xFFFul) | index;
>  		asm volatile("slbmte %0,%1" :
>  			     : "r" (be64_to_cpu(p->save_area[index].vsid)),
> -			       "r" (be64_to_cpu(p->save_area[index].esid)));
> +			       "r" (rb));
>  	}
>  }

I'm just looking at this again. The bolted save areas do have the
index field set. So for the OS, your patch should be equivalent to
this, right?

 static inline void slb_shadow_clear(enum slb_index index)
 {
-	WRITE_ONCE(get_slb_shadow()->save_area[index].esid, 0);
+	WRITE_ONCE(get_slb_shadow()->save_area[index].esid, index);
 }

Which seems like a better fix. PAPR says:

  Note: SLB is filled sequentially starting at index 0 from the
  shadow buffer ignoring the contents of RB field bits 52-63

So that shouldn't be an issue.

Thanks,
Nick
Re: [PATCH 4/5] arm64: dts: add QorIQ LX2160A SoC support
On Mon, Aug 20, 2018 at 12:17:15PM +0530, Vabhav Sharma wrote:
> LX2160A SoC is based on Layerscape Chassis Generation 3.2 Architecture.
>
> LX2160A features an advanced 16 64-bit ARM v8 CortexA72 processor cores
> in 8 cluster, CCN508, GICv3, two 64-bit DDR4 memory controller, 8 I2C
> controllers, 3 dspi, 2 esdhc, 2 USB 3.0, mmu 500, 3 SATA, 4 PL011 SBSA
> UARTs etc.
>
> Signed-off-by: Ramneek Mehresh
> Signed-off-by: Zhang Ying-22455
> Signed-off-by: Nipun Gupta
> Signed-off-by: Priyanka Jain
> Signed-off-by: Yogesh Gaur
> Signed-off-by: Sriram Dash
> Signed-off-by: Vabhav Sharma
> ---
>  arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi | 572 +
>  1 file changed, 572 insertions(+)
>  create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
>
> diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> new file mode 100644
> index 000..e35e494
> --- /dev/null
> +++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> @@ -0,0 +1,572 @@
> +// SPDX-License-Identifier: (GPL-2.0 OR MIT)
> +//
> +// Device Tree Include file for Layerscape-LX2160A family SoC.
> +//
> +// Copyright 2018 NXP
> +
> +#include 
> +
> +/memreserve/ 0x8000 0x0001;
> +
> +/ {
> +	compatible = "fsl,lx2160a";
> +	interrupt-parent = <>;
> +	#address-cells = <2>;
> +	#size-cells = <2>;
> +
> +	cpus {
> +		#address-cells = <1>;
> +		#size-cells = <0>;
> +
> +		// 8 clusters having 2 Cortex-A72 cores each
> +		cpu@0 {
> +			device_type = "cpu";
> +			compatible = "arm,cortex-a72";
> +			reg = <0x0>;
> +			clocks = < 1 0>;
> +			next-level-cache = <_l2>;

If you expect to get cache properties in sysfs entries, you need to
populate them here and for each L2 cache.

[...]

> +	rstcr: syscon@1e6 {
> +		compatible = "syscon";
> +		reg = <0x0 0x1e6 0x0 0x4>;
> +	};
> +
> +	reboot {
> +		compatible = "syscon-reboot";
> +		regmap = <>;
> +		offset = <0x0>;
> +		mask = <0x2>;

Is this disabled in bootloader? With PSCI, it's preferred to use
SYSTEM_RESET/OFF. EL3 f/w may need to do some housekeeping on poweroff.

> +	};
> +
> +	timer {
> +		compatible = "arm,armv8-timer";
> +		interrupts = <1 13 4>, // Physical Secure PPI, active-low

The comment says active low but the value 4 indicates it's HIGH from
"include/dt-bindings/interrupt-controller/irq.h"

> +			     <1 14 4>, // Physical Non-Secure PPI, active-low
> +			     <1 11 4>, // Virtual PPI, active-low
> +			     <1 10 4>; // Hypervisor PPI, active-low
> +	};
> +
> +	pmu {
> +		compatible = "arm,armv8-pmuv3";

More specific compatible preferably "arm,cortex-a72-pmu" ?

--
Regards,
Sudeep
Re: [RFC 00/15] PCI: turn some __weak functions into callbacks
On Tue, Aug 21, 2018 at 8:14 AM Christoph Hellwig wrote:
> On Fri, Aug 17, 2018 at 12:26:30PM +0200, Arnd Bergmann wrote:
> > Hi Bjorn and others,
> >
> > Triggered by Christoph's patches, I had another go at converting
> > all of the remaining pci host bridge implementations to be based
> > on pci_alloc_host_bridge and a separate registration function.
>
> I really like the idea behind this series.
>
> > I'm adding a bit of duplication into the less maintained code
> > here, but it makes everything more consistent, and gives an
> > easy place to hook up callback functions etc.
>
> I wonder if there is a way to avoid some of that by adding a few
> more helpers, but even without the helpers that approach looks
> ok to me.

Ok, thanks for taking a first look. One core part that gets duplicated
a lot (also in existing drivers) is the chunk that could be handled by
this:

int pci_host_bridge_init(struct pci_host_bridge *bridge,
			 struct device *parent, int bus,
			 struct pci_ops *ops, void *sysdata,
			 struct list_head *resource_list)
{
	if (resource_list)
		list_splice_init(resource_list, &bridge->windows);
	bridge->dev.parent = parent;
	bridge->sysdata = sysdata;
	bridge->busnr = bus;
	bridge->ops = ops;

	return 0;
}

That would probably help, but we should think carefully about the set
of fields that we want to pass here, specifically because the idea of
splitting the probing into two parts was to avoid having to come up
with a new interface every time that list changes due to some rework.

For instance, the numa node is something that might get passed here,
and if we decide to split out the operations into a separate
pci_host_bridge_ops structure, the pointer to that would also be
something we'd want to pass this way.

> Do you have a git tree somewhere to play around with the changes?

I now uploaded it (with fixes incorporated) to

https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git pci-probe-rework

arnd
Re: [PATCH] powerpc/64s/hash: convert SLB miss handlers to C
On Tue, 21 Aug 2018 16:12:44 +1000 Benjamin Herrenschmidt wrote:
> On Tue, 2018-08-21 at 15:13 +1000, Nicholas Piggin wrote:
> > This patch moves SLB miss handlers completely to C, using the standard
> > exception handler macros to set up the stack and branch to C.
> >
> > This can be done because the segment containing the kernel stack is
> > always bolted, so accessing it with relocation on will not cause an
> > SLB exception.
> >
> > Arbitrary kernel memory may not be accessed when handling kernel space
> > SLB misses, so care should be taken there. However user SLB misses can
> > access any kernel memory, which can be used to move some fields out of
> > the paca (in later patches).
> >
> > User SLB misses could quite easily reconcile IRQs and set up a first
> > class kernel environment and exit via ret_from_except, however that
> > doesn't seem to be necessary at the moment, so we only do that if a
> > bad fault is encountered.
> >
> > [ Credit to Aneesh for bug fixes, error checks, and improvements to bad
> >   address handling, etc ]
> >
> > Signed-off-by: Nicholas Piggin
> >
> > Since RFC:
> > - Send patch 1 by itself to focus on the big change.
> > - Added MSR[RI] handling
> > - Fixed up a register loss bug exposed by irq tracing (Aneesh)
> > - Reject misses outside the defined kernel regions (Aneesh)
> > - Added several more sanity checks and error handling (Aneesh), we may
> >   look at consolidating these tests and tightening up the code but for
> >   a first pass we decided it's better to check carefully.
> > ---
> >  arch/powerpc/include/asm/asm-prototypes.h |   2 +
> >  arch/powerpc/kernel/exceptions-64s.S      | 202 +++--
> >  arch/powerpc/mm/Makefile                  |   2 +-
> >  arch/powerpc/mm/slb.c                     | 257 +
> >  arch/powerpc/mm/slb_low.S                 | 335 --
> >  5 files changed, 185 insertions(+), 613 deletions(-)
>                                 ^^^^^^
>
> Nice ! :-)

So I did some measurements with context switching (that takes quite a
few SLB faults); we lose about 3-5% performance on that benchmark.
Top SLB involved profile entries before:

  7.44%  [k] switch_slb
  3.43%  [k] slb_compare_rr_to_size
  1.64%  [k] slb_miss_common
  1.24%  [k] slb_miss_kernel_load_io
  0.58%  [k] exc_virt_0x4480_instruction_access_slb

After:

  7.15%  [k] switch_slb
  3.90%  [k] slb_insert_entry
  3.65%  [k] fast_exception_return
  1.00%  [k] slb_allocate_user
  0.59%  [k] exc_virt_0x4480_instruction_access_slb

With later patches we can reduce SLB misses to zero on this workload
(and generally an order of magnitude lower on small workloads). But
each miss will be more expensive, and very large memory workloads are
going to have mandatory misses.

Will be good to try verifying that we can do smarter SLB allocation
and reclaim to make up for that on workloads like HANA. I think we
probably could because round robin isn't great.

Thanks,
Nick
Re: [RFC PATCH 1/5] powerpc/64s/hash: convert SLB miss handlers to C
Nicholas Piggin writes:
> This patch moves SLB miss handlers completely to C, using the standard
> exception handler macros to set up the stack and branch to C.
>
> This can be done because the segment containing the kernel stack is
> always bolted, so accessing it with relocation on will not cause an
> SLB exception.
>
> Arbitrary kernel memory may not be accessed when handling kernel space
> SLB misses, so care should be taken there.

We'll need to mark everything that's used in slb.c as notrace, otherwise

Probably we just need to mark the whole file as not traceable.

cheers
Re: Odd SIGSEGV issue introduced by commit 6b31d5955cb29 ("mm, oom: fix potential data corruption when oom_reaper races with writer")
Christophe LEROY writes:
...
> And I bisected its disappearance with commit 99cd1302327a2 ("powerpc:
> Deliver SEGV signal on pkey violation")

Whoa that's weird.

> Looking at those two commits, especially the one which makes it
> dissapear, I'm quite sceptic. Any idea on what could be the cause
> and/or how to investigate further ?

Are you sure it's not some corruption that just happens to be masked
by that commit? I can't see anything in that commit that could explain
that change in behaviour.

The only real change is if you're hitting DSISR_KEYFAULT isn't it?

What happens if you take 087003e9ef7c and apply the various hunks from
99cd1302327a2 gradually (or those that you can anyway)?

cheers
Re: [RFC 00/15] PCI: turn some __weak functions into callbacks
On Fri, Aug 17, 2018 at 12:26:30PM +0200, Arnd Bergmann wrote:
> Hi Bjorn and others,
>
> Triggered by Christoph's patches, I had another go at converting
> all of the remaining pci host bridge implementations to be based
> on pci_alloc_host_bridge and a separate registration function.

I really like the idea behind this series.

> I'm adding a bit of duplication into the less maintained code
> here, but it makes everything more consistent, and gives an
> easy place to hook up callback functions etc.

I wonder if there is a way to avoid some of that by adding a few
more helpers, but even without the helpers that approach looks
ok to me.

Do you have a git tree somewhere to play around with the changes?
Re: [PATCH v2 1/4] powerpc/tm: Remove msr_tm_active()
Breno Leitao writes:
> On 08/16/2018 09:49 PM, Michael Ellerman wrote:
>> Michael Neuling writes:
>>> On Mon, 2018-06-18 at 19:59 -0300, Breno Leitao wrote:
>>>> Currently msr_tm_active() is a wrapper around MSR_TM_ACTIVE() if
>>>> CONFIG_PPC_TRANSACTIONAL_MEM is set, or it is just a function that
>>>> returns false if CONFIG_PPC_TRANSACTIONAL_MEM is not set. This
>>>> function is not necessary, since MSR_TM_ACTIVE() just do the same,
>>>> checking for the TS bits and does not require any TM facility.
>>>>
>>>> This patchset remove every instance of msr_tm_active() and replaced
>>>> it by MSR_TM_ACTIVE().
>>>>
>>>> Signed-off-by: Breno Leitao
>>>
>>> Patch looks good... one minor nit below...
>>>
>>>> -	if (!msr_tm_active(regs->msr) &&
>>>> -	    !current->thread.load_fp && !loadvec(current->thread))
>>>> +	if (!current->thread.load_fp && !loadvec(current->thread)) {
>>>> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>>>> +		if (!MSR_TM_ACTIVE(regs->msr))
>>>> +			return;
>>>
>>> Can you make a MSR_TM_ACTIVE() that returns false when
>>> !CONFIG_PPC_TRANSACTIONAL_MEM. Then you don't need this inline
>>> #ifdef.
>>
>> Is that safe?
>>
>> I see ~50 callers of MSR_TM_ACTIVE(), are they all inside #ifdef TM ?
>
> I checked all of them, and the only two that are not called inside a
> #ifdef are at kvm/book3s_hv_tm.c. They are:
>
> kvm/book3s_hv_tm.c:	if (!MSR_TM_ACTIVE(msr)) {
> kvm/book3s_hv_tm.c:	if (MSR_TM_ACTIVE(msr) || !(vcpu->arch.texasr & TEXASR_FS)) {

That whole file is only built if TM=y:

  kvm-hv-$(CONFIG_PPC_TRANSACTIONAL_MEM) += \
	book3s_hv_tm.o

> All the others are being called inside the #ifdef
>
> Other than that, I do not see why it would be a problem in the way I
> implemented it, since it will return false for the two cases above,
> which seems correct. Take a look on how the definition became:
>
> #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> #define MSR_TM_ACTIVE(x) (((x) & MSR_TS_MASK) != 0) /* Transaction active? */
> #else
> #define MSR_TM_ACTIVE(x) 0
> #endif

Imagine we had some code somewhere that checked for TM being active in
a non-TM aware kernel, that would break with this change, because now
the MSR check does nothing when TM=n.

eg. we might check at boot time that we're not transactional, eg. in
case we came from a kdump kernel that was in a transaction.

So if all the call-sites are already inside an #ifdef I'd be inclined
to not add the #ifdef around the MSR_TM_ACTIVE macro. That way the
macro can always be used to check the MSR value, whether TM is
compiled in or out.

cheers
Re: [PATCH 1/2] powerpc/process: fix nested output in show_user_instructions()
Christophe Leroy writes:
> When two processes crash at the same time, we sometimes encounter
> nesting in the middle of a line:

I think "interleaved" is the right word, rather than "nesting".

They're actually (potentially) completely unrelated segfaults, that
just happen to occur at the same time. And in fact any output that
happens simultaneously will mess things up, it doesn't have to be
another segfault.

> [4.365317] init[1]: segfault (11) at 0 nip 0 lr 0 code 1
> [4.370452] init[1]: code:
> [4.372042] init[74]: segfault (11) at 10a74 nip 1000c198 lr 100078c8 code 1 in sh[1000+14000]
> [4.386829]
> [4.391542] init[1]: code:
>
> [4.400863] init[74]: code: 90010024 bf61000c 91490a7c 3fa01002 3be0 7d3e4b78 3bbd0c20 3b60
> [4.409867] init[74]: code: 3b9d0040 7c7fe02e 2f83 419e0028 <8923> 2f89 41be001c 4b7f6e79
>
> This patch fixes it by preparing complete lines in a buffer and
> printing it at once.
>
> Fixes: 88b0fe1757359 ("powerpc: Add show_user_instructions()")
> Cc: Murilo Opsfelder Araujo
> Signed-off-by: Christophe Leroy
> ---
>  arch/powerpc/kernel/process.c | 17 +
>  1 file changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 913c5725cdb2..c722ce4ca1c0 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1303,32 +1303,33 @@ void show_user_instructions(struct pt_regs *regs)
>  {
>  	unsigned long pc;
>  	int i;
> +	char buf[96]; /* enough for 8 times 9 + 2 chars */
> +	int l = 0;

I'm sure your math is right, but still an on-stack buffer with
sprintf() is a bit scary.

Can you try using seq_buf instead? It is safe against overflow.

eg, something like:

	struct seq_buf s;
	char buf[96];

	seq_buf_init(&s, buf, sizeof(buf));
	...
	seq_buf_printf(&s, ...);

cheers
Re: [PATCH] powerpc/Makefiles: Fix clang/llvm build
On Tue, 21 Aug 2018 15:38:39 +1000 Anton Blanchard wrote:
> Hi Michael,
>
>> This breaks GCC 4.6.3 at least, which we still support:
>>
>> Assembler messages:
>> Error: invalid switch -mpower8
>> Error: unrecognized option -mpower8
>> ../scripts/mod/empty.c:1:0: fatal error: error closing -: Broken pipe
>
> Yuck. We have POWER8 instructions in our assembly code, and a toolchain
> that doesn't understand the -mpower8 flag, but has support for
> power8 instructions.
>
> This is what I see on clang with -mpower7:
>
> /tmp/sstep-2afa55.s:7584: Error: unrecognized opcode: `lbarx'
> /tmp/sstep-2afa55.s:7626: Error: unrecognized opcode: `stbcx.'
> /tmp/sstep-2afa55.s:8077: Error: unrecognized opcode: `lharx'
> /tmp/sstep-2afa55.s:8140: Error: unrecognized opcode: `stbcx.'
>
> Nick: any suggestions?

cpu-as-$(CONFIG_PPC_BOOK3S_64) += $(call as-option,-Wa$(comma)-mpower8,-Wa$(comma)-mpower4)

?

Thanks,
Nick
Re: [PATCH] powerpc/64s/hash: convert SLB miss handlers to C
On Tue, 2018-08-21 at 15:13 +1000, Nicholas Piggin wrote:
> This patch moves SLB miss handlers completely to C, using the standard
> exception handler macros to set up the stack and branch to C.
>
> This can be done because the segment containing the kernel stack is
> always bolted, so accessing it with relocation on will not cause an
> SLB exception.
>
> Arbitrary kernel memory may not be accessed when handling kernel space
> SLB misses, so care should be taken there. However user SLB misses can
> access any kernel memory, which can be used to move some fields out of
> the paca (in later patches).
>
> User SLB misses could quite easily reconcile IRQs and set up a first
> class kernel environment and exit via ret_from_except, however that
> doesn't seem to be necessary at the moment, so we only do that if a
> bad fault is encountered.
>
> [ Credit to Aneesh for bug fixes, error checks, and improvements to bad
>   address handling, etc ]
>
> Signed-off-by: Nicholas Piggin
>
> Since RFC:
> - Send patch 1 by itself to focus on the big change.
> - Added MSR[RI] handling
> - Fixed up a register loss bug exposed by irq tracing (Aneesh)
> - Reject misses outside the defined kernel regions (Aneesh)
> - Added several more sanity checks and error handling (Aneesh), we may
>   look at consolidating these tests and tightening up the code but for
>   a first pass we decided it's better to check carefully.
> ---
>  arch/powerpc/include/asm/asm-prototypes.h |   2 +
>  arch/powerpc/kernel/exceptions-64s.S      | 202 +++--
>  arch/powerpc/mm/Makefile                  |   2 +-
>  arch/powerpc/mm/slb.c                     | 257 +
>  arch/powerpc/mm/slb_low.S                 | 335 --
>  5 files changed, 185 insertions(+), 613 deletions(-)
                                ^^^^^^

Nice ! :-)

Cheers,
Ben.