Re: [RFC PATCH 1/5] powerpc/64s/hash: convert SLB miss handlers to C

2018-08-21 Thread Nicholas Piggin
On Tue, 21 Aug 2018 16:46:02 +1000
Michael Ellerman  wrote:

> Nicholas Piggin  writes:
> 
> > This patch moves SLB miss handlers completely to C, using the standard
> > exception handler macros to set up the stack and branch to C.
> >
> > This can be done because the segment containing the kernel stack is
> > always bolted, so accessing it with relocation on will not cause an
> > SLB exception.
> >
> > Arbitrary kernel memory may not be accessed when handling kernel space
> > SLB misses, so care should be taken there.  
> 
> We'll need to mark everything that's used in slb.c as notrace, otherwise
> 
> Probably we just need to mark the whole file as not traceable.

Yeah good point there. I'll do that. The whole file including things we
allow today? How do we do that, like this?

CFLAGS_REMOVE_slb.o = -mno-sched-epilog $(CC_FLAGS_FTRACE)

Thanks,
Nick


Re: [v5] powerpc/topology: Get topology for shared processors at boot

2018-08-21 Thread Srikar Dronamraju
* Michael Ellerman  [2018-08-21 20:35:23]:

> On Fri, 2018-08-17 at 14:54:39 UTC, Srikar Dronamraju wrote:
> > On a shared lpar, Phyp will not update the cpu associativity at boot
> > time. Just after boot, the system recognizes itself as a shared lpar and
> > triggers a request for the correct cpu associativity. But by then the
> > scheduler would have already created/destroyed its sched domains.
> > 
> > This causes:
> > - Broken load balance across nodes, causing islands of cores.
> > - Performance degradation, especially if the system is lightly loaded.
> > - dmesg to wrongly report all cpus to be in Node 0.
> > - Messages in dmesg saying "borken" topology.
> > - With commit 051f3ca02e46 ("sched/topology: Introduce NUMA identity
> >   node sched domain"), can cause rcu stalls at boot up.
> > 
> > 
> > Previous attempt to solve this problem
> > https://patchwork.ozlabs.org/patch/530090/
> > 
> > Reported-by: Manjunatha H R 
> > Signed-off-by: Srikar Dronamraju 
> 
> Applied to powerpc next, thanks.
> 
> https://git.kernel.org/powerpc/c/2ea62630681027c455117aa471ea3a
> 

Once it gets to Linus's tree, can we request this to be included in
stable trees?

-- 
Thanks and Regards
Srikar Dronamraju



Re: [PATCH] crypto: vmx - Fix sleep-in-atomic bugs

2018-08-21 Thread Ondrej Mosnáček
CC: Paulo Flabiano Smorigo ,
linuxppc-dev@lists.ozlabs.org

(Sorry, sent this before reading new e-mails in the thread...)

On Tue, 21 Aug 2018 at 17:18, Ondrej Mosnacek  wrote:
>
> This patch fixes sleep-in-atomic bugs in AES-CBC and AES-XTS VMX
> implementations. The problem is that the blkcipher_* functions should
> not be called in atomic context.
>
> The bugs can be reproduced via the AF_ALG interface by trying to
> encrypt/decrypt sufficiently large buffers (at least 64 KiB) using the
> VMX implementations of 'cbc(aes)' or 'xts(aes)'. Such operations then
> trigger BUG in crypto_yield():
>
> [  891.863680] BUG: sleeping function called from invalid context at 
> include/crypto/algapi.h:424
> [  891.864622] in_atomic(): 1, irqs_disabled(): 0, pid: 12347, name: kcapi-enc
> [  891.864739] 1 lock held by kcapi-enc/12347:
> [  891.864811]  #0: f5d42c46 (sk_lock-AF_ALG){+.+.}, at: 
> skcipher_recvmsg+0x50/0x530
> [  891.865076] CPU: 5 PID: 12347 Comm: kcapi-enc Not tainted 
> 4.19.0-0.rc0.git3.1.fc30.ppc64le #1
> [  891.865251] Call Trace:
> [  891.865340] [c003387578c0] [c0d67ea4] dump_stack+0xe8/0x164 
> (unreliable)
> [  891.865511] [c00338757910] [c0172a58] 
> ___might_sleep+0x2f8/0x310
> [  891.865679] [c00338757990] [c06bff74] 
> blkcipher_walk_done+0x374/0x4a0
> [  891.865825] [c003387579e0] [d7e73e70] 
> p8_aes_cbc_encrypt+0x1c8/0x260 [vmx_crypto]
> [  891.865993] [c00338757ad0] [c06c0ee0] 
> skcipher_encrypt_blkcipher+0x60/0x80
> [  891.866128] [c00338757b10] [c06ec504] 
> skcipher_recvmsg+0x424/0x530
> [  891.866283] [c00338757bd0] [c0b00654] sock_recvmsg+0x74/0xa0
> [  891.866403] [c00338757c10] [c0b00f64] ___sys_recvmsg+0xf4/0x2f0
> [  891.866515] [c00338757d90] [c0b02bb8] __sys_recvmsg+0x68/0xe0
> [  891.866631] [c00338757e30] [c000bbe4] system_call+0x5c/0x70
>
> Fixes: 8c755ace357c ("crypto: vmx - Adding CBC routines for VMX module")
> Fixes: c07f5d3da643 ("crypto: vmx - Adding support for XTS")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Ondrej Mosnacek 
> ---
> This patch should fix the issue, but I didn't test it. (I'll see if I
> can find some time tomorrow to try and recompile the kernel on a PPC
> machine... in the meantime please review :)
>
>  drivers/crypto/vmx/aes_cbc.c | 30 ++
>  drivers/crypto/vmx/aes_xts.c | 19 ---
>  2 files changed, 26 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/crypto/vmx/aes_cbc.c b/drivers/crypto/vmx/aes_cbc.c
> index 5285ece4f33a..b71895871be3 100644
> --- a/drivers/crypto/vmx/aes_cbc.c
> +++ b/drivers/crypto/vmx/aes_cbc.c
> @@ -107,24 +107,23 @@ static int p8_aes_cbc_encrypt(struct blkcipher_desc 
> *desc,
> ret = crypto_skcipher_encrypt(req);
> skcipher_request_zero(req);
> } else {
> -   preempt_disable();
> -   pagefault_disable();
> -   enable_kernel_vsx();
> -
> blkcipher_walk_init(&walk, dst, src, nbytes);
> ret = blkcipher_walk_virt(desc, &walk);
> while ((nbytes = walk.nbytes)) {
> +   preempt_disable();
> +   pagefault_disable();
> +   enable_kernel_vsx();
> aes_p8_cbc_encrypt(walk.src.virt.addr,
>walk.dst.virt.addr,
>nbytes & AES_BLOCK_MASK,
>&ctx->enc_key, walk.iv, 1);
> +   disable_kernel_vsx();
> +   pagefault_enable();
> +   preempt_enable();
> +
> nbytes &= AES_BLOCK_SIZE - 1;
> ret = blkcipher_walk_done(desc, &walk, nbytes);
> }
> -
> -   disable_kernel_vsx();
> -   pagefault_enable();
> -   preempt_enable();
> }
>
> return ret;
> @@ -147,24 +146,23 @@ static int p8_aes_cbc_decrypt(struct blkcipher_desc 
> *desc,
> ret = crypto_skcipher_decrypt(req);
> skcipher_request_zero(req);
> } else {
> -   preempt_disable();
> -   pagefault_disable();
> -   enable_kernel_vsx();
> -
> blkcipher_walk_init(&walk, dst, src, nbytes);
> ret = blkcipher_walk_virt(desc, &walk);
> while ((nbytes = walk.nbytes)) {
> +   preempt_disable();
> +   pagefault_disable();
> +   enable_kernel_vsx();
> aes_p8_cbc_encrypt(walk.src.virt.addr,
>walk.dst.virt.addr,
>nbytes & AES_BLOCK_MASK,
>&ctx->dec_key, walk.iv, 0);
> +   disable_kernel_vsx();
> +  

[PATCH 7/8] powerpc: enable building all dtbs

2018-08-21 Thread Rob Herring
Enable the 'dtbs' target for powerpc. This allows building all the dts
files in arch/powerpc/boot/dts/ when COMPILE_TEST and OF_ALL_DTBS are
enabled.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring 
---
 arch/powerpc/boot/dts/Makefile | 5 +
 arch/powerpc/boot/dts/fsl/Makefile | 4 
 2 files changed, 9 insertions(+)
 create mode 100644 arch/powerpc/boot/dts/fsl/Makefile

diff --git a/arch/powerpc/boot/dts/Makefile b/arch/powerpc/boot/dts/Makefile
index f66554cd5c45..fb335d05aae8 100644
--- a/arch/powerpc/boot/dts/Makefile
+++ b/arch/powerpc/boot/dts/Makefile
@@ -1 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
+
+subdir-y += fsl
+
+dtstree:= $(srctree)/$(src)
+dtb-$(CONFIG_OF_ALL_DTBS) := $(patsubst $(dtstree)/%.dts,%.dtb, $(wildcard 
$(dtstree)/*.dts))
diff --git a/arch/powerpc/boot/dts/fsl/Makefile 
b/arch/powerpc/boot/dts/fsl/Makefile
new file mode 100644
index ..3bae982641e9
--- /dev/null
+++ b/arch/powerpc/boot/dts/fsl/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+
+dtstree:= $(srctree)/$(src)
+dtb-$(CONFIG_OF_ALL_DTBS) := $(patsubst $(dtstree)/%.dts,%.dtb, $(wildcard 
$(dtstree)/*.dts))
-- 
2.17.1



[PATCH 6/8] kbuild: consolidate Devicetree dtb build rules

2018-08-21 Thread Rob Herring
There is nothing arch specific about building dtb files other than their
location under /arch/*/boot/dts/. Keeping each arch aligned is a pain.
The dependencies and supported targets are all slightly different.
Also, a cross-compiler for each arch is needed, but really the host
compiler preprocessor is perfectly fine for building dtbs. Move the
build rules to a common location and remove the arch specific ones. This
is done in a single step to avoid warnings about overriding rules.

The build dependencies had been a mixture of 'scripts' and/or 'prepare'.
These pull in several dependencies, some of which need a target compiler
(specifically devicetable-offsets.h) and aren't needed to build dtbs.
All that is really needed is dtc, so adjust the dependencies to only be
dtc.

This change enables support for 'dtbs_install' on some arches which were
missing the target.

Cc: Masahiro Yamada 
Cc: Michal Marek 
Cc: Vineet Gupta 
Cc: Russell King 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Yoshinori Sato 
Cc: Michal Simek 
Cc: Ralf Baechle 
Cc: Paul Burton 
Cc: James Hogan 
Cc: Ley Foon Tan 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Chris Zankel 
Cc: Max Filippov 
Cc: linux-kbu...@vger.kernel.org
Cc: linux-snps-...@lists.infradead.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: uclinux-h8-de...@lists.sourceforge.jp
Cc: linux-m...@linux-mips.org
Cc: nios2-...@lists.rocketboards.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-xte...@linux-xtensa.org
Signed-off-by: Rob Herring 
---
 Makefile | 30 ++
 arch/arc/Makefile|  6 --
 arch/arm/Makefile| 20 +---
 arch/arm64/Makefile  | 17 +
 arch/c6x/Makefile|  2 --
 arch/h8300/Makefile  | 11 +--
 arch/microblaze/Makefile |  4 +---
 arch/mips/Makefile   | 15 +--
 arch/nds32/Makefile  |  2 +-
 arch/nios2/Makefile  |  7 ---
 arch/nios2/boot/Makefile |  4 
 arch/powerpc/Makefile|  3 ---
 arch/xtensa/Makefile | 12 +---
 scripts/Makefile |  1 -
 scripts/Makefile.lib |  2 +-
 15 files changed, 38 insertions(+), 98 deletions(-)

diff --git a/Makefile b/Makefile
index c13f8b85ba60..6d89e673f192 100644
--- a/Makefile
+++ b/Makefile
@@ -1212,6 +1212,30 @@ kselftest-merge:
$(srctree)/tools/testing/selftests/*/config
+$(Q)$(MAKE) -f $(srctree)/Makefile olddefconfig
 
+# ---
+# Devicetree files
+
+dtstree := $(wildcard arch/$(SRCARCH)/boot/dts)
+
+ifdef CONFIG_OF_EARLY_FLATTREE
+
+%.dtb %.dtb.S %.dtb.o: | dtc
+   $(Q)$(MAKE) $(build)=$(dtstree) $(dtstree)/$@
+
+PHONY += dtbs
+dtbs: | dtc
+   $(Q)$(MAKE) $(build)=$(dtstree)
+
+dtbs_install: dtbs
+   $(Q)$(MAKE) $(dtbinst)=$(dtstree)
+
+all: dtbs
+
+dtc:
+   $(Q)$(MAKE) $(build)=scripts/dtc
+
+endif
+
 # ---
 # Modules
 
@@ -1425,6 +1449,12 @@ help:
@echo  '  kselftest-merge - Merge all the config dependencies of 
kselftest to existing'
@echo  '.config.'
@echo  ''
+   @$(if $(dtstree), \
+   echo 'Devicetree:'; \
+   echo '* dtbs- Build device tree blobs for enabled 
boards'; \
+   echo '  dtbs_install- Install dtbs to 
$(INSTALL_DTBS_PATH)'; \
+   echo '')
+
@echo 'Userspace tools targets:'
@echo '  use "make tools/help"'
@echo '  or  "cd tools; make help"'
diff --git a/arch/arc/Makefile b/arch/arc/Makefile
index 6c1b20dd76ad..cbfb7a16b570 100644
--- a/arch/arc/Makefile
+++ b/arch/arc/Makefile
@@ -132,11 +132,5 @@ boot_targets += uImage uImage.bin uImage.gz
 $(boot_targets): vmlinux
$(Q)$(MAKE) $(build)=$(boot) $(boot)/$@
 
-%.dtb %.dtb.S %.dtb.o: scripts
-   $(Q)$(MAKE) $(build)=$(boot)/dts $(boot)/dts/$@
-
-dtbs: scripts
-   $(Q)$(MAKE) $(build)=$(boot)/dts
-
 archclean:
$(Q)$(MAKE) $(clean)=$(boot)
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index e7d703d8fac3..7f02ef8dfdb2 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -308,12 +308,7 @@ else
 KBUILD_IMAGE := $(boot)/zImage
 endif
 
-# Build the DT binary blobs if we have OF configured
-ifeq ($(CONFIG_USE_OF),y)
-KBUILD_DTBS := dtbs
-endif
-
-all:   $(notdir $(KBUILD_IMAGE)) $(KBUILD_DTBS)
+all:   $(notdir $(KBUILD_IMAGE))
 
 
 archheaders:
@@ -340,17 +335,6 @@ $(BOOT_TARGETS): vmlinux
 $(INSTALL_TARGETS):
$(Q)$(MAKE) $(build)=$(boot) MACHINE=$(MACHINE) $@
 
-%.dtb: | scripts
-   $(Q)$(MAKE) $(build)=$(boot)/dts MACHINE=$(MACHINE) $(boot)/dts/$@
-
-PHONY += dtbs dtbs_install
-
-dtbs: prepare scripts
-   $(Q)$(MAKE) $(build)=$(boot)/dts
-
-dtbs_install:
-   $(Q)$(MAKE) $(dtbinst)=$(boot)/dts
-
 PHONY += vdso_install
 vdso_install:
 ifeq ($(CONFIG_VDSO),y)
@@ -372,8 +356,6 @@ define archhelp
   echo  '  uImage 

[PATCH 0/8] Devicetree build consolidation

2018-08-21 Thread Rob Herring
This series addresses a couple of issues I have with building dts files.

First, the ability to build all the dts files in the tree. This has been
supported on most arches for some time, with powerpc being the main
exception. The reason powerpc wasn't supported was that it needed a change
in the location where built dtb files are put.

Secondly, it's a pain to acquire all the cross-compilers needed to build
dtbs for each arch. There's no reason to build with the cross compiler;
the host compiler is perfectly fine, as we only need the pre-processor.

I started addressing just those 2 problems, but kept finding small
differences such as target dependencies and dtbs_install support across
architectures. Instead of trying to align all these, I've consolidated the
build targets moving them out of the arch makefiles.

I'd like to take the series via the DT tree.

Rob

Rob Herring (8):
  powerpc: build .dtb files in dts directory
  nios2: build .dtb files in dts directory
  nios2: use common rules to build built-in dtb
  nios2: fix building all dtbs
  c6x: use common built-in dtb support
  kbuild: consolidate Devicetree dtb build rules
  powerpc: enable building all dtbs
  c6x: enable building all dtbs

 Makefile   | 30 ++
 arch/arc/Makefile  |  6 
 arch/arm/Makefile  | 20 +---
 arch/arm64/Makefile| 17 +--
 arch/c6x/Makefile  |  2 --
 arch/c6x/boot/dts/Makefile | 17 +--
 arch/c6x/boot/dts/linked_dtb.S |  2 --
 arch/c6x/include/asm/sections.h|  1 -
 arch/c6x/kernel/setup.c|  4 +--
 arch/c6x/kernel/vmlinux.lds.S  | 10 --
 arch/h8300/Makefile| 11 +--
 arch/microblaze/Makefile   |  4 +--
 arch/mips/Makefile | 15 +
 arch/nds32/Makefile|  2 +-
 arch/nios2/Makefile| 11 +--
 arch/nios2/boot/Makefile   | 22 --
 arch/nios2/boot/dts/Makefile   |  6 
 arch/nios2/boot/linked_dtb.S   | 19 
 arch/powerpc/Makefile  |  3 --
 arch/powerpc/boot/Makefile | 49 ++
 arch/powerpc/boot/dts/Makefile |  6 
 arch/powerpc/boot/dts/fsl/Makefile |  4 +++
 arch/xtensa/Makefile   | 12 +---
 scripts/Makefile   |  1 -
 scripts/Makefile.lib   |  2 +-
 25 files changed, 87 insertions(+), 189 deletions(-)
 delete mode 100644 arch/c6x/boot/dts/linked_dtb.S
 create mode 100644 arch/nios2/boot/dts/Makefile
 delete mode 100644 arch/nios2/boot/linked_dtb.S
 create mode 100644 arch/powerpc/boot/dts/Makefile
 create mode 100644 arch/powerpc/boot/dts/fsl/Makefile

--
2.17.1


[PATCH 1/8] powerpc: build .dtb files in dts directory

2018-08-21 Thread Rob Herring
Align powerpc with other architectures which build the dtb files in the
same directory as the dts files. This is also in line with most other
build targets which are located in the same directory as the source.
This move will help enable the 'dtbs' target which builds all the dtbs
regardless of kernel config.

This transition could break some scripts if they expect dtb files in the
old location.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring 
---
 arch/powerpc/Makefile  |  2 +-
 arch/powerpc/boot/Makefile | 49 --
 arch/powerpc/boot/dts/Makefile |  1 +
 3 files changed, 25 insertions(+), 27 deletions(-)
 create mode 100644 arch/powerpc/boot/dts/Makefile

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 8397c7bd5880..9bacffa3b72e 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -294,7 +294,7 @@ bootwrapper_install:
$(Q)$(MAKE) $(build)=$(boot) $(patsubst %,$(boot)/%,$@)
 
 %.dtb: scripts
-   $(Q)$(MAKE) $(build)=$(boot) $(patsubst %,$(boot)/%,$@)
+   $(Q)$(MAKE) $(build)=$(boot)/dts $(patsubst %,$(boot)/dts/%,$@)
 
 # Used to create 'merged defconfigs'
 # To use it $(call) it with the first argument as the base defconfig
diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 0fb96c26136f..b201d93e1725 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -381,11 +381,11 @@ $(addprefix $(obj)/, $(sort $(filter zImage.%, 
$(image-y: vmlinux $(wrapperb
$(call if_changed,wrap,$(subst $(obj)/zImage.,,$@))
 
 # dtbImage% - a dtbImage is a zImage with an embedded device tree blob
-$(obj)/dtbImage.initrd.%: vmlinux $(wrapperbits) $(obj)/%.dtb FORCE
-   $(call if_changed,wrap,$*,,$(obj)/$*.dtb,$(obj)/ramdisk.image.gz)
+$(obj)/dtbImage.initrd.%: vmlinux $(wrapperbits) $(obj)/dts/%.dtb FORCE
+   $(call if_changed,wrap,$*,,$(obj)/dts/$*.dtb,$(obj)/ramdisk.image.gz)
 
-$(obj)/dtbImage.%: vmlinux $(wrapperbits) $(obj)/%.dtb FORCE
-   $(call if_changed,wrap,$*,,$(obj)/$*.dtb)
+$(obj)/dtbImage.%: vmlinux $(wrapperbits) $(obj)/dts/%.dtb FORCE
+   $(call if_changed,wrap,$*,,$(obj)/dts/$*.dtb)
 
 # This cannot be in the root of $(src) as the zImage rule always adds a $(obj)
 # prefix
@@ -395,36 +395,33 @@ $(obj)/vmlinux.strip: vmlinux
 $(obj)/uImage: vmlinux $(wrapperbits) FORCE
$(call if_changed,wrap,uboot)
 
-$(obj)/uImage.initrd.%: vmlinux $(obj)/%.dtb $(wrapperbits) FORCE
-   $(call if_changed,wrap,uboot-$*,,$(obj)/$*.dtb,$(obj)/ramdisk.image.gz)
+$(obj)/uImage.initrd.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE
+   $(call 
if_changed,wrap,uboot-$*,,$(obj)/dts/$*.dtb,$(obj)/ramdisk.image.gz)
 
-$(obj)/uImage.%: vmlinux $(obj)/%.dtb $(wrapperbits) FORCE
-   $(call if_changed,wrap,uboot-$*,,$(obj)/$*.dtb)
+$(obj)/uImage.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE
+   $(call if_changed,wrap,uboot-$*,,$(obj)/dts/$*.dtb)
 
-$(obj)/cuImage.initrd.%: vmlinux $(obj)/%.dtb $(wrapperbits) FORCE
-   $(call if_changed,wrap,cuboot-$*,,$(obj)/$*.dtb,$(obj)/ramdisk.image.gz)
+$(obj)/cuImage.initrd.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE
+   $(call 
if_changed,wrap,cuboot-$*,,$(obj)/dts/$*.dtb,$(obj)/ramdisk.image.gz)
 
-$(obj)/cuImage.%: vmlinux $(obj)/%.dtb $(wrapperbits) FORCE
-   $(call if_changed,wrap,cuboot-$*,,$(obj)/$*.dtb)
+$(obj)/cuImage.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE
+   $(call if_changed,wrap,cuboot-$*,,$(obj)/dts/$*.dtb)
 
-$(obj)/simpleImage.initrd.%: vmlinux $(obj)/%.dtb $(wrapperbits) FORCE
-   $(call 
if_changed,wrap,simpleboot-$*,,$(obj)/$*.dtb,$(obj)/ramdisk.image.gz)
+$(obj)/simpleImage.initrd.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE
+   $(call 
if_changed,wrap,simpleboot-$*,,$(obj)/dts/$*.dtb,$(obj)/ramdisk.image.gz)
 
-$(obj)/simpleImage.%: vmlinux $(obj)/%.dtb $(wrapperbits) FORCE
-   $(call if_changed,wrap,simpleboot-$*,,$(obj)/$*.dtb)
+$(obj)/simpleImage.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE
+   $(call if_changed,wrap,simpleboot-$*,,$(obj)/dts/$*.dtb)
 
-$(obj)/treeImage.initrd.%: vmlinux $(obj)/%.dtb $(wrapperbits) FORCE
-   $(call 
if_changed,wrap,treeboot-$*,,$(obj)/$*.dtb,$(obj)/ramdisk.image.gz)
+$(obj)/treeImage.initrd.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE
+   $(call 
if_changed,wrap,treeboot-$*,,$(obj)/dts/$*.dtb,$(obj)/ramdisk.image.gz)
 
-$(obj)/treeImage.%: vmlinux $(obj)/%.dtb $(wrapperbits) FORCE
-   $(call if_changed,wrap,treeboot-$*,,$(obj)/$*.dtb)
+$(obj)/treeImage.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE
+   $(call if_changed,wrap,treeboot-$*,,$(obj)/dts/$*.dtb)
 
-# Rule to build device tree blobs
-$(obj)/%.dtb: $(src)/dts/%.dts FORCE
-   $(call if_changed_dep,dtc)
-
-$(obj)/%.dtb: $(src)/dts/fsl/%.dts FORCE
-   $(call if_changed_dep,dtc)
+# Needed for the above targets to work with dts/fsl/ files

Re: [PATCH 5/5] arm64: dts: add LX2160ARDB board support

2018-08-21 Thread Rob Herring
On Mon, Aug 20, 2018 at 1:52 PM Vabhav Sharma  wrote:
>
> LX2160A reference design board (RDB) is a high-performance
> computing, evaluation, and development platform with LX2160A
> SoC.
>
> Signed-off-by: Priyanka Jain 
> Signed-off-by: Sriram Dash 
> Signed-off-by: Vabhav Sharma 
> ---
>  arch/arm64/boot/dts/freescale/Makefile|  1 +
>  arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts | 95 
> +++
>  2 files changed, 96 insertions(+)
>  create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts
>
> diff --git a/arch/arm64/boot/dts/freescale/Makefile 
> b/arch/arm64/boot/dts/freescale/Makefile
> index 86e18ad..445b72b 100644
> --- a/arch/arm64/boot/dts/freescale/Makefile
> +++ b/arch/arm64/boot/dts/freescale/Makefile
> @@ -13,3 +13,4 @@ dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2080a-rdb.dtb
>  dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2080a-simu.dtb
>  dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2088a-qds.dtb
>  dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2088a-rdb.dtb
> +dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2160a-rdb.dtb
> diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts 
> b/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts
> new file mode 100644
> index 000..70fad20
> --- /dev/null
> +++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts
> @@ -0,0 +1,95 @@
> +// SPDX-License-Identifier: (GPL-2.0 OR MIT)
> +//
> +// Device Tree file for LX2160ARDB
> +//
> +// Copyright 2018 NXP
> +
> +/dts-v1/;
> +
> +#include "fsl-lx2160a.dtsi"
> +
> +/ {
> +   model = "NXP Layerscape LX2160ARDB";
> +   compatible = "fsl,lx2160a-rdb", "fsl,lx2160a";
> +
> +   aliases {
> +   crypto = 

Drop this. Aliases should be numbered, and this is not a standard
alias name either.

> +   serial0 = 
> +   serial1 = 
> +   serial2 = 
> +   serial3 = 
> +   };
> +   chosen {
> +   stdout-path = "serial0:115200n8";
> +   };
> +};
> +
> + {
> +   status = "okay";
> +};
> +
> + {
> +   status = "okay";
> +};
> +
> + {
> +   status = "okay";
> +   pca9547@77 {

i2c-mux@77

> +   compatible = "nxp,pca9547";
> +   reg = <0x77>;
> +   #address-cells = <1>;
> +   #size-cells = <0>;
> +
> +   i2c@2 {
> +   #address-cells = <1>;
> +   #size-cells = <0>;
> +   reg = <0x2>;
> +
> +   ina220@40 {
> +   compatible = "ti,ina220";
> +   reg = <0x40>;
> +   shunt-resistor = <1000>;
> +   };
> +   };
> +
> +   i2c@3 {
> +   #address-cells = <1>;
> +   #size-cells = <0>;
> +   reg = <0x3>;
> +
> +   sa56004@4c {

temperature-sensor@4c

> +   compatible = "nxp,sa56004";
> +   reg = <0x4c>;
> +   };
> +
> +   sa56004@4d {
> +   compatible = "nxp,sa56004";
> +   reg = <0x4d>;
> +   };
> +   };
> +   };
> +};
> +
> + {
> +   status = "okay";
> +
> +   rtc@51 {
> +   compatible = "nxp,pcf2129";
> +   reg = <0x51>;
> +   // IRQ10_B
> +   interrupts = <0 150 0x4>;
> +   };
> +
> +};
> +
> + {
> +   status = "okay";
> +};
> +
> + {
> +   status = "okay";
> +};
> +
> + {
> +   status = "okay";
> +};
> --
> 2.7.4
>


[PATCH] selftests/powerpc: Do not fail on TM_CAUSE_RESCHED

2018-08-21 Thread Breno Leitao
There are cases where the test does not expect the transaction to be
aborted, but the test process might have been rescheduled, either at the OS
level or by KVM (if it is running in a KVM guest). The reschedule causes a
treclaim/recheckpoint, which dooms the transaction, failing it as soon as
the process is rescheduled back onto the CPU. This can make the test fail,
but it is not a failure in essence.

If that is the case, TEXASR[FC] is set to either TM_CAUSE_RESCHED, or
TM_CAUSE_KVM_RESCHED for KVM interruptions.

In this scenario, ignore these two failure causes and avoid having the
whole test return failure.

CC: Gustavo Romero 
Signed-off-by: Breno Leitao 
---
 tools/testing/selftests/powerpc/tm/tm-unavailable.c | 10 +++---
 tools/testing/selftests/powerpc/tm/tm.h |  9 +
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/powerpc/tm/tm-unavailable.c 
b/tools/testing/selftests/powerpc/tm/tm-unavailable.c
index 156c8e750259..0fd793f574c3 100644
--- a/tools/testing/selftests/powerpc/tm/tm-unavailable.c
+++ b/tools/testing/selftests/powerpc/tm/tm-unavailable.c
@@ -236,7 +236,8 @@ void *tm_una_ping(void *input)
}
 
/* Check if we were not expecting a failure and it occurred. */
-   if (!expecting_failure() && is_failure(cr_)) {
+   if (!expecting_failure() && is_failure(cr_) &&
+   !failure_is_reschedule()) {
printf("\n\tUnexpected transaction failure 0x%02lx\n\t",
failure_code());
return (void *) -1;
@@ -244,9 +245,12 @@ void *tm_una_ping(void *input)
 
/*
 * Check if TM failed due to the cause we were expecting. 0xda is a
-* TM_CAUSE_FAC_UNAV cause, otherwise it's an unexpected cause.
+* TM_CAUSE_FAC_UNAV cause, otherwise it's an unexpected cause, unless
+* it was caused by a reschedule.
 */
-   if (is_failure(cr_) && !failure_is_unavailable()) {
+
+   if (is_failure(cr_) && !failure_is_unavailable()
+   && !failure_is_reschedule()) {
printf("\n\tUnexpected failure cause 0x%02lx\n\t",
failure_code());
return (void *) -1;
diff --git a/tools/testing/selftests/powerpc/tm/tm.h 
b/tools/testing/selftests/powerpc/tm/tm.h
index df4204247d45..5518b1d4ef8b 100644
--- a/tools/testing/selftests/powerpc/tm/tm.h
+++ b/tools/testing/selftests/powerpc/tm/tm.h
@@ -52,6 +52,15 @@ static inline bool failure_is_unavailable(void)
return (failure_code() & TM_CAUSE_FAC_UNAV) == TM_CAUSE_FAC_UNAV;
 }
 
+static inline bool failure_is_reschedule(void)
+{
+   if ((failure_code() & TM_CAUSE_RESCHED) == TM_CAUSE_RESCHED ||
+   (failure_code() & TM_CAUSE_KVM_RESCHED) == TM_CAUSE_KVM_RESCHED)
+   return true;
+
+   return false;
+}
+
 static inline bool failure_is_nesting(void)
 {
return (__builtin_get_texasru() & 0x40);
-- 
2.16.3



[PATCH] powerpc/xive: Initialize symbol before usage

2018-08-21 Thread Breno Leitao
Function xive_native_get_ipi() might use chip_id without it being
initialized. This produces the following error from the 'smatch' tool:

error: uninitialized symbol 'chip_id'

This patch simply sets chip_id's initial value to 0.

CC: Benjamin Herrenschmidt 
Signed-off-by: Breno Leitao 
---
 arch/powerpc/sysdev/xive/native.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/sysdev/xive/native.c 
b/arch/powerpc/sysdev/xive/native.c
index 311185b9960a..fc56673a3c0f 100644
--- a/arch/powerpc/sysdev/xive/native.c
+++ b/arch/powerpc/sysdev/xive/native.c
@@ -239,7 +239,7 @@ static bool xive_native_match(struct device_node *node)
 static int xive_native_get_ipi(unsigned int cpu, struct xive_cpu *xc)
 {
struct device_node *np;
-   unsigned int chip_id;
+   unsigned int chip_id = 0;
s64 irq;
 
/* Find the chip ID */
-- 
2.16.3



[PATCH] powerpc/iommu: avoid dereference before pointer check

2018-08-21 Thread Breno Leitao
The tbl pointer is being dereferenced by IOMMU_PAGE_SIZE() prior to the
check that it is not NULL.

Move the dereference to after the check, where 'tbl' is guaranteed not to
be NULL.

CC: Alistair Popple 
Signed-off-by: Breno Leitao 
---
 arch/powerpc/kernel/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index af7a20dc6e09..80b6caaa9b92 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -785,9 +785,9 @@ dma_addr_t iommu_map_page(struct device *dev, struct 
iommu_table *tbl,
 
vaddr = page_address(page) + offset;
uaddr = (unsigned long)vaddr;
-   npages = iommu_num_pages(uaddr, size, IOMMU_PAGE_SIZE(tbl));
 
if (tbl) {
+   npages = iommu_num_pages(uaddr, size, IOMMU_PAGE_SIZE(tbl));
align = 0;
if (tbl->it_page_shift < PAGE_SHIFT && size >= PAGE_SIZE &&
((unsigned long)vaddr & ~PAGE_MASK) == 0)
-- 
2.16.3



[PATCH] powerpc/kernel: Fix potential spectre v1 in syscall

2018-08-21 Thread Breno Leitao
The rtas syscall reads a value from a user-provided structure and uses it
to index an array, making it a possible vector for a spectre v1 attack.
This is the code that exposes the problem:

args.rets = &args.args[nargs];

nargs is a user-provided value, and the code below is an example where
'nargs' would be set to XX.

struct rtas_args ra;
ra.nargs = htobe32(XX);
syscall(__NR_rtas, &ra);

Signed-off-by: Breno Leitao 
---
 arch/powerpc/kernel/rtas.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 8afd146bc9c7..5ef3c863003d 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1056,7 +1057,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
struct rtas_args args;
unsigned long flags;
char *buff_copy, *errbuf = NULL;
-   int nargs, nret, token;
+   int index, nargs, nret, token;
 
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -1084,7 +1085,8 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
if (token == RTAS_UNKNOWN_SERVICE)
return -EINVAL;
 
-   args.rets = &args.args[nargs];
+   index = array_index_nospec(nargs, ARRAY_SIZE(args.args));
+   args.rets = &args.args[index];
memset(args.rets, 0, nret * sizeof(rtas_arg_t));
 
/* Need to handle ibm,suspend_me call specially */
-- 
2.16.3



Re: Odd SIGSEGV issue introduced by commit 6b31d5955cb29 ("mm, oom: fix potential data corruption when oom_reaper races with writer")

2018-08-21 Thread Ram Pai
On Tue, Aug 21, 2018 at 04:40:15PM +1000, Michael Ellerman wrote:
> Christophe LEROY  writes:
> ...
> >
> > And I bisected its disappearance with commit 99cd1302327a2 ("powerpc: 
> > Deliver SEGV signal on pkey violation")
> 
> Whoa that's weird.
> 
> > Looking at those two commits, especially the one which makes it
> > disappear, I'm quite sceptical. Any idea on what could be the cause
> > and/or how to investigate further?
> 
> Are you sure it's not some corruption that just happens to be masked by
> that commit? I can't see anything in that commit that could explain that
> change in behaviour.
> 
> The only real change is if you're hitting DSISR_KEYFAULT isn't it?

Even with commit 99cd1302327a2, a SEGV signal should get generated, which
should kill the process, unless the process handles SEGV signals with
SEGV_PKUERR differently.

The other surprising thing is, why is DSISR_KEYFAULT getting generated
in the first place?  Are keys somehow getting programmed into the HPTE?

Feels like some random corruption.

Is this behavior seen with power8 or power9?

RP



Re: [PATCH] crypto: vmx - Fix sleep-in-atomic bugs

2018-08-21 Thread Marcelo Henrique Cerri
On Tue, Aug 21, 2018 at 05:24:45PM +0200, Ondrej Mosnáček wrote:
> CC: Paulo Flabiano Smorigo ,
> linuxppc-dev@lists.ozlabs.org
> 
> (Sorry, sent this before reading new e-mails in the thread...)
> 
> On Tue, 21 Aug 2018 at 17:18, Ondrej Mosnacek  wrote:
> >
> > This patch fixes sleep-in-atomic bugs in AES-CBC and AES-XTS VMX
> > implementations. The problem is that the blkcipher_* functions should
> > not be called in atomic context.
> >
> > The bugs can be reproduced via the AF_ALG interface by trying to
> > encrypt/decrypt sufficiently large buffers (at least 64 KiB) using the
> > VMX implementations of 'cbc(aes)' or 'xts(aes)'. Such operations then
> > trigger BUG in crypto_yield():
> >
> > [  891.863680] BUG: sleeping function called from invalid context at 
> > include/crypto/algapi.h:424
> > [  891.864622] in_atomic(): 1, irqs_disabled(): 0, pid: 12347, name: 
> > kcapi-enc
> > [  891.864739] 1 lock held by kcapi-enc/12347:
> > [  891.864811]  #0: f5d42c46 (sk_lock-AF_ALG){+.+.}, at: 
> > skcipher_recvmsg+0x50/0x530
> > [  891.865076] CPU: 5 PID: 12347 Comm: kcapi-enc Not tainted 
> > 4.19.0-0.rc0.git3.1.fc30.ppc64le #1
> > [  891.865251] Call Trace:
> > [  891.865340] [c003387578c0] [c0d67ea4] dump_stack+0xe8/0x164 
> > (unreliable)
> > [  891.865511] [c00338757910] [c0172a58] 
> > ___might_sleep+0x2f8/0x310
> > [  891.865679] [c00338757990] [c06bff74] 
> > blkcipher_walk_done+0x374/0x4a0
> > [  891.865825] [c003387579e0] [d7e73e70] 
> > p8_aes_cbc_encrypt+0x1c8/0x260 [vmx_crypto]
> > [  891.865993] [c00338757ad0] [c06c0ee0] 
> > skcipher_encrypt_blkcipher+0x60/0x80
> > [  891.866128] [c00338757b10] [c06ec504] 
> > skcipher_recvmsg+0x424/0x530
> > [  891.866283] [c00338757bd0] [c0b00654] sock_recvmsg+0x74/0xa0
> > [  891.866403] [c00338757c10] [c0b00f64] 
> > ___sys_recvmsg+0xf4/0x2f0
> > [  891.866515] [c00338757d90] [c0b02bb8] __sys_recvmsg+0x68/0xe0
> > [  891.866631] [c00338757e30] [c000bbe4] system_call+0x5c/0x70
> >
> > Fixes: 8c755ace357c ("crypto: vmx - Adding CBC routines for VMX module")
> > Fixes: c07f5d3da643 ("crypto: vmx - Adding support for XTS")
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Ondrej Mosnacek 
> > ---
> > This patch should fix the issue, but I didn't test it. (I'll see if I
> > can find some time tomorrow to try and recompile the kernel on a PPC
> > machine... in the meantime please review :)
> >
> >  drivers/crypto/vmx/aes_cbc.c | 30 ++
> >  drivers/crypto/vmx/aes_xts.c | 19 ---
> >  2 files changed, 26 insertions(+), 23 deletions(-)
> >
> > diff --git a/drivers/crypto/vmx/aes_cbc.c b/drivers/crypto/vmx/aes_cbc.c
> > index 5285ece4f33a..b71895871be3 100644
> > --- a/drivers/crypto/vmx/aes_cbc.c
> > +++ b/drivers/crypto/vmx/aes_cbc.c
> > @@ -107,24 +107,23 @@ static int p8_aes_cbc_encrypt(struct blkcipher_desc 
> > *desc,
> > ret = crypto_skcipher_encrypt(req);
> > skcipher_request_zero(req);
> > } else {
> > -   preempt_disable();
> > -   pagefault_disable();
> > -   enable_kernel_vsx();
> > -
> > > blkcipher_walk_init(&walk, dst, src, nbytes);
> > > ret = blkcipher_walk_virt(desc, &walk);
> > while ((nbytes = walk.nbytes)) {
> > +   preempt_disable();
> > +   pagefault_disable();
> > +   enable_kernel_vsx();
> > aes_p8_cbc_encrypt(walk.src.virt.addr,
> >walk.dst.virt.addr,
> >nbytes & AES_BLOCK_MASK,
> >&ctx->enc_key, walk.iv, 1);
> > +   disable_kernel_vsx();
> > +   pagefault_enable();
> > +   preempt_enable();
> > +
> > nbytes &= AES_BLOCK_SIZE - 1;
> > ret = blkcipher_walk_done(desc, &walk, nbytes);
> > }
> > -
> > -   disable_kernel_vsx();
> > -   pagefault_enable();
> > -   preempt_enable();
> > }
> >
> > return ret;
> > @@ -147,24 +146,23 @@ static int p8_aes_cbc_decrypt(struct blkcipher_desc 
> > *desc,
> > ret = crypto_skcipher_decrypt(req);
> > skcipher_request_zero(req);
> > } else {
> > -   preempt_disable();
> > -   pagefault_disable();
> > -   enable_kernel_vsx();
> > -
> > blkcipher_walk_init(&walk, dst, src, nbytes);
> > ret = blkcipher_walk_virt(desc, &walk);
> > while ((nbytes = walk.nbytes)) {
> > +   preempt_disable();
> > +   pagefault_disable();
> > +   enable_kernel_vsx();
> > 

Re: BUG: libkcapi tests trigger sleep-in-atomic bug in VMX code (ppc64)

2018-08-21 Thread Marcelo Henrique Cerri
CC: Paulo Flabiano Smorigo 

Yes, I do believe that CTR is doing it right. Preemption only needs to be
disabled during the aes_p8_cbc_encrypt() call, to avoid trashing the
VSX registers during the AES operation.

-- 
Regards,
Marcelo

On Tue, Aug 21, 2018 at 05:03:50PM +0200, Christophe LEROY wrote:
> 
> 
> On 21/08/2018 at 16:38, Ondrej Mosnáček wrote:
> > On Tue, 21 Aug 2018 at 16:18, Stephan Mueller wrote:
> > > On Tuesday, 21 August 2018, 14:48:11 CEST, Ondrej Mosnáček wrote:
> > > 
> > > Hi Ondrej, Marcelo,
> > > 
> > > (+Marcelo)
> > > 
> > > > Looking at crypto/algif_skcipher.c, I can see that skcipher_recvmsg()
> > > > holds the socket lock the whole time and yet passes
> > > > CRYPTO_TFM_REQ_MAY_SLEEP to the cipher implementation. Isn't that
> > > > wrong?
> > > 
> > > I think you are referring to lock_sock(sk)?
> > > 
> > > If so, this should not be the culprit: the socket lock is in essence a
> > > mutex-like operation with its own wait queue that is allowed to sleep.
> > > lock_sock_nested, which is called by lock_sock, even calls might_sleep,
> > > which indicates that the caller may be put to sleep.
> > > 
> > > Looking into the code (without too much debugging), I see that the function
> > > p8_aes_cbc_encrypt, which is part of the stack trace, calls
> > > preempt_disable(), which starts an atomic context. The preempt_enable() is
> > > invoked after the walk operation.
> > > 
> > > The preempt_disable increases the preempt_count. That counter is used by
> > > in_atomic() to check whether we are in atomic context.
> > > 
> > > The issue is that blkcipher_walk_done may call crypto_yield() which then
> > > invokes cond_resched if the implementation is allowed to sleep.
> > 
> > Indeed, you're right, the issue is actually in the vmx_crypto code. I
> > remember having looked at the 'ctr(aes)' implementation in there a few
> > days ago (I think I was trying to debug this very issue, but for some
> > reason I only looked at ctr(aes)...) and I didn't find any bug, so
> > that's why I jumped to suspecting the algif_skcipher code... I should
> > have double-checked :)
> > 
> > It turns out the 'cbc(aes)' (and actually also 'xts(aes)')
> > implementation is coded a bit differently and they both *do* contain
> > the sleep-in-atomic bug. I will try to fix them according to the
> > correct CTR implementation and send a patch.
> 
> CC: linuxppc-dev@lists.ozlabs.org 
> 
> > 
> > Thanks,
> > Ondrej
> > 
> > > @Marcelo: shouldn't the sleep flag be cleared when entering the
> > > preempt_disable section?
> > > 
> > > Ciao
> > > Stephan
> > > 
> > > 




Re: BUG: libkcapi tests trigger sleep-in-atomic bug in VMX code (ppc64)

2018-08-21 Thread Christophe LEROY




On 21/08/2018 at 16:38, Ondrej Mosnáček wrote:

On Tue, 21 Aug 2018 at 16:18, Stephan Mueller wrote:

On Tuesday, 21 August 2018, 14:48:11 CEST, Ondrej Mosnáček wrote:

Hi Ondrej, Marcelo,

(+Marcelo)


Looking at crypto/algif_skcipher.c, I can see that skcipher_recvmsg()
holds the socket lock the whole time and yet passes
CRYPTO_TFM_REQ_MAY_SLEEP to the cipher implementation. Isn't that
wrong?


I think you are referring to lock_sock(sk)?

If so, this should not be the culprit: the socket lock is in essence a
mutex-like operation with its own wait queue that is allowed to sleep.
lock_sock_nested, which is called by lock_sock, even calls might_sleep,
which indicates that the caller may be put to sleep.

Looking into the code (without too much debugging), I see that the function
p8_aes_cbc_encrypt, which is part of the stack trace, calls
preempt_disable(), which starts an atomic context. The preempt_enable() is
invoked after the walk operation.

The preempt_disable increases the preempt_count. That counter is used by
in_atomic() to check whether we are in atomic context.

The issue is that blkcipher_walk_done may call crypto_yield() which then
invokes cond_resched if the implementation is allowed to sleep.


Indeed, you're right, the issue is actually in the vmx_crypto code. I
remember having looked at the 'ctr(aes)' implementation in there a few
days ago (I think I was trying to debug this very issue, but for some
reason I only looked at ctr(aes)...) and I didn't find any bug, so
that's why I jumped to suspecting the algif_skcipher code... I should
have double-checked :)

It turns out the 'cbc(aes)' (and actually also 'xts(aes)')
implementation is coded a bit differently and they both *do* contain
the sleep-in-atomic bug. I will try to fix them according to the
correct CTR implementation and send a patch.


CC: linuxppc-dev@lists.ozlabs.org 



Thanks,
Ondrej


@Marcelo: shouldn't the sleep flag be cleared when entering the
preempt_disable section?

Ciao
Stephan




[PATCH 0/2] i2c: remove deprecated attach_adapter callback

2018-08-21 Thread Wolfram Sang
So, I wanted to do this in the next cycle, but Linus seems to want it this
cycle already [1], so here it is:

Remove the attach_adapter callback from the 2.4 times by converting
the last user to a custom probing mechanism based on deferred probing. We used
this already in commit ac397c80de89 ("ALSA: ppc: keywest: drop using attach
adapter") successfully on HW, so we agreed to use it on the windtunnel driver
as well.

With the last user gone, we can then remove the callback \o/ I think this
allows for more cleanup in the core, but let's do this later and focus on the
removal for now.

Tested on a Renesas R-Car Salvator-XS board (M3N) by using and rebinding
various I2C busses. Build bot and checkpatch are happy, too.

I'd like to send a pull request to Linus this merge window, so looking forward
to super fast comments, acks, etc...

Thanks,

   Wolfram

[1] http://patchwork.ozlabs.org/patch/959322/#1976742

Wolfram Sang (2):
  macintosh: therm_windtunnel: drop using attach_adapter
  i2c: remove deprecated attach_adapter callback

 drivers/i2c/i2c-core-base.c  | 11 +--
 drivers/macintosh/therm_windtunnel.c | 25 +++--
 include/linux/i2c.h  |  6 --
 3 files changed, 24 insertions(+), 18 deletions(-)

-- 
2.11.0



[PATCH 2/2] i2c: remove deprecated attach_adapter callback

2018-08-21 Thread Wolfram Sang
There aren't any users left. Remove this callback from the 2.4 times.
Phew, finally, that took years to reach...

Signed-off-by: Wolfram Sang 
---
 drivers/i2c/i2c-core-base.c | 11 +--
 include/linux/i2c.h |  6 --
 2 files changed, 1 insertion(+), 16 deletions(-)

diff --git a/drivers/i2c/i2c-core-base.c b/drivers/i2c/i2c-core-base.c
index 5a937109a289..f15737763608 100644
--- a/drivers/i2c/i2c-core-base.c
+++ b/drivers/i2c/i2c-core-base.c
@@ -62,7 +62,7 @@
 
 /*
  * core_lock protects i2c_adapter_idr, and guarantees that device detection,
- * deletion of detected devices, and attach_adapter calls are serialized
+ * deletion of detected devices are serialized
  */
 static DEFINE_MUTEX(core_lock);
 static DEFINE_IDR(i2c_adapter_idr);
@@ -1124,15 +1124,6 @@ static int i2c_do_add_adapter(struct i2c_driver *driver,
/* Detect supported devices on that bus, and instantiate them */
i2c_detect(adap, driver);
 
-   /* Let legacy drivers scan this bus for matching devices */
-   if (driver->attach_adapter) {
-   dev_warn(&adap->dev, "%s: attach_adapter method is deprecated\n",
-driver->driver.name);
-   dev_warn(&adap->dev,
-"Please use another way to instantiate your i2c_client\n");
-   /* We ignore the return code; if it fails, too bad */
-   driver->attach_adapter(adap);
-   }
return 0;
 }
 
diff --git a/include/linux/i2c.h b/include/linux/i2c.h
index 36f357ecdf67..b79387fd57da 100644
--- a/include/linux/i2c.h
+++ b/include/linux/i2c.h
@@ -231,7 +231,6 @@ enum i2c_alert_protocol {
 /**
  * struct i2c_driver - represent an I2C device driver
  * @class: What kind of i2c device we instantiate (for detect)
- * @attach_adapter: Callback for bus addition (deprecated)
  * @probe: Callback for device binding - soon to be deprecated
  * @probe_new: New callback for device binding
  * @remove: Callback for device unbinding
@@ -268,11 +267,6 @@ enum i2c_alert_protocol {
 struct i2c_driver {
unsigned int class;
 
-   /* Notifies the driver that a new bus has appeared. You should avoid
-* using this, it will be removed in a near future.
-*/
-   int (*attach_adapter)(struct i2c_adapter *) __deprecated;
-
/* Standard driver model interfaces */
int (*probe)(struct i2c_client *, const struct i2c_device_id *);
int (*remove)(struct i2c_client *);
-- 
2.11.0



[PATCH 1/2] macintosh: therm_windtunnel: drop using attach_adapter

2018-08-21 Thread Wolfram Sang
As we now have deferred probing, we can use a custom mechanism and
finally get rid of the legacy interface from the i2c core.

Signed-off-by: Wolfram Sang 
---
 drivers/macintosh/therm_windtunnel.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/macintosh/therm_windtunnel.c 
b/drivers/macintosh/therm_windtunnel.c
index 68dcbcb4fc5b..8c744578122a 100644
--- a/drivers/macintosh/therm_windtunnel.c
+++ b/drivers/macintosh/therm_windtunnel.c
@@ -432,7 +432,6 @@ static struct i2c_driver g4fan_driver = {
.driver = {
.name   = "therm_windtunnel",
},
-   .attach_adapter = do_attach,
.probe  = do_probe,
.remove = do_remove,
.id_table   = therm_windtunnel_id,
@@ -445,7 +444,29 @@ static struct i2c_driver g4fan_driver = {
 
 static int therm_of_probe(struct platform_device *dev)
 {
-   return i2c_add_driver( &g4fan_driver );
+   struct i2c_adapter *adap;
+   int ret, i = 0;
+
+   adap = i2c_get_adapter(0);
+   if (!adap)
+   return -EPROBE_DEFER;
+
+   ret = i2c_add_driver(&g4fan_driver);
+   if (ret) {
+   i2c_put_adapter(adap);
+   return ret;
+   }
+
+   /* We assume Macs have consecutive I2C bus numbers starting at 0 */
+   while (adap) {
+   do_attach(adap);
+   if (x.running)
+   return 0;
+   i2c_put_adapter(adap);
+   adap = i2c_get_adapter(++i);
+   }
+
+   return -ENODEV;
 }
 
 static int
-- 
2.11.0



Re: [PATCH] powerpc/nohash: fix pte_access_permitted()

2018-08-21 Thread Christophe LEROY




On 21/08/2018 at 16:25, Aneesh Kumar K.V wrote:

Christophe Leroy  writes:


Commit 5769beaf180a8 ("powerpc/mm: Add proper pte access check helper
for other platforms") replaced generic pte_access_permitted() by an
arch specific one.

The generic one is defined as
(pte_present(pte) && (!(write) || pte_write(pte)))

The arch specific one is open coded, checking that the _PAGE_USER and
_PAGE_WRITE (_PAGE_RW) flags are set, but failing to check that
_PAGE_RO and _PAGE_PRIVILEGED are unset, leading to a useless test
on targets like the 8xx, which defines _PAGE_RW and _PAGE_USER as 0.

Commit 5fa5b16be5b31 ("powerpc/mm/hugetlb: Use pte_access_permitted
for hugetlb access check") replaced some tests performed with
pte helpers by a call to pte_access_permitted(), leading to the same
issue.

This patch rewrites powerpc/nohash pte_access_permitted()
using pte helpers.



Thanks for fixing this. I should have used the helper instead of
opencoding it on nohash platforms. This is another reason why I was also
suggesting we should avoid consolidating pte accessors across platforms
and use accessors instead of opencoding.


I fully agree. I have opened a topic for that at 
https://github.com/linuxppc/linux/issues/177


Christophe



https://lore.kernel.org/lkml/87lgcusc6z@linux.vnet.ibm.com/T/#u

Reviewed-by: Aneesh Kumar K.V 


Fixes: 5769beaf180a8 ("powerpc/mm: Add proper pte access check helper for other 
platforms")
Fixes: 5fa5b16be5b31 ("powerpc/mm/hugetlb: Use pte_access_permitted for hugetlb 
access check")
Cc: sta...@vger.kernel.org # v4.15+
Signed-off-by: Christophe Leroy 
---
  arch/powerpc/include/asm/nohash/pgtable.h | 9 +++--
  1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/pgtable.h 
b/arch/powerpc/include/asm/nohash/pgtable.h
index 2160be2e4339..b321c82b3624 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -51,17 +51,14 @@ static inline int pte_present(pte_t pte)
  #define pte_access_permitted pte_access_permitted
  static inline bool pte_access_permitted(pte_t pte, bool write)
  {
-   unsigned long pteval = pte_val(pte);
/*
 * A read-only access is controlled by _PAGE_USER bit.
 * We have _PAGE_READ set for WRITE and EXECUTE
 */
-   unsigned long need_pte_bits = _PAGE_PRESENT | _PAGE_USER;
-
-   if (write)
-   need_pte_bits |= _PAGE_WRITE;
+   if (!pte_present(pte) || !pte_user(pte) || !pte_read(pte))
+   return false;
  
-	if ((pteval & need_pte_bits) != need_pte_bits)

+   if (write && !pte_write(pte))
return false;
  
  	return true;

--
2.13.3


Re: [PATCH] powerpc/nohash: fix pte_access_permitted()

2018-08-21 Thread Aneesh Kumar K.V
Christophe Leroy  writes:

> Commit 5769beaf180a8 ("powerpc/mm: Add proper pte access check helper
> for other platforms") replaced generic pte_access_permitted() by an
> arch specific one.
>
> The generic one is defined as
> (pte_present(pte) && (!(write) || pte_write(pte)))
>
> The arch specific one is open coded, checking that the _PAGE_USER and
> _PAGE_WRITE (_PAGE_RW) flags are set, but failing to check that
> _PAGE_RO and _PAGE_PRIVILEGED are unset, leading to a useless test
> on targets like the 8xx, which defines _PAGE_RW and _PAGE_USER as 0.
>
> Commit 5fa5b16be5b31 ("powerpc/mm/hugetlb: Use pte_access_permitted
> for hugetlb access check") replaced some tests performed with
> pte helpers by a call to pte_access_permitted(), leading to the same
> issue.
>
> This patch rewrites powerpc/nohash pte_access_permitted()
> using pte helpers.
>

Thanks for fixing this. I should have used the helper instead of
opencoding it on nohash platforms. This is another reason why I was also
suggesting we should avoid consolidating pte accessors across platforms
and use accessors instead of opencoding.

https://lore.kernel.org/lkml/87lgcusc6z@linux.vnet.ibm.com/T/#u

Reviewed-by: Aneesh Kumar K.V 

> Fixes: 5769beaf180a8 ("powerpc/mm: Add proper pte access check helper for 
> other platforms")
> Fixes: 5fa5b16be5b31 ("powerpc/mm/hugetlb: Use pte_access_permitted for 
> hugetlb access check")
> Cc: sta...@vger.kernel.org # v4.15+
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/include/asm/nohash/pgtable.h | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/nohash/pgtable.h 
> b/arch/powerpc/include/asm/nohash/pgtable.h
> index 2160be2e4339..b321c82b3624 100644
> --- a/arch/powerpc/include/asm/nohash/pgtable.h
> +++ b/arch/powerpc/include/asm/nohash/pgtable.h
> @@ -51,17 +51,14 @@ static inline int pte_present(pte_t pte)
>  #define pte_access_permitted pte_access_permitted
>  static inline bool pte_access_permitted(pte_t pte, bool write)
>  {
> - unsigned long pteval = pte_val(pte);
>   /*
>* A read-only access is controlled by _PAGE_USER bit.
>* We have _PAGE_READ set for WRITE and EXECUTE
>*/
> - unsigned long need_pte_bits = _PAGE_PRESENT | _PAGE_USER;
> -
> - if (write)
> - need_pte_bits |= _PAGE_WRITE;
> + if (!pte_present(pte) || !pte_user(pte) || !pte_read(pte))
> + return false;
>  
> - if ((pteval & need_pte_bits) != need_pte_bits)
> + if (write && !pte_write(pte))
>   return false;
>  
>   return true;
> -- 
> 2.13.3



Re: [PATCH 1/2] sched/topology: Set correct numa topology type

2018-08-21 Thread Peter Zijlstra
On Tue, Aug 21, 2018 at 04:02:58AM -0700, Srikar Dronamraju wrote:
> * Srikar Dronamraju  [2018-08-10 22:30:18]:
> 
> > With commit 051f3ca02e46 ("sched/topology: Introduce NUMA identity node
> > sched domain") scheduler introduces an new numa level. However this
> > leads to numa topology on 2 node systems no more marked as NUMA_DIRECT.
> > After this commit, it gets reported as NUMA_BACKPLANE. This is because
> > sched_domains_numa_level is now 2 on 2 node systems.
> > 
> > Fix this by allowing systems that have up to 2 numa levels to be set as
> > NUMA_DIRECT.
> > 
> > While here, remove code that assumes the level can be 0.
> > 
> > Fixes: 051f3ca02e46 "Introduce NUMA identity node sched domain"
> > Signed-off-by: Srikar Dronamraju 
> > ---
> 
> Hey Peter,
> 
> Did you look at these two patches?

Nope, been on holidays and the inbox is an even bigger mess than normal.
I'll get to it, eventually :/


[RFC PATCH 2/2] powerpc/mm: Increase the max addressable memory to 2PB

2018-08-21 Thread Aneesh Kumar K.V
Currently we limit the max addressable memory to 128TB. This patch increases
the limit to 2PB. We can have devices like nvdimm which add memory above the
512TB limit.

We still don't support regular system ram above 512TB. One of the challenges
with that is the percpu allocator, which allocates per-node memory and uses
the max distance between nodes as the percpu offset. This means that with a
large gap in the address space (system ram above 1PB) we will run out of
vmalloc space to map the percpu allocation.

In order to support addressable memory above 512TB, the kernel should be able
to linearly map this range. To do that with hash translation we now add 4
contexts to the kernel linear map region. Our per-context addressable range
is 512TB. We still keep the VMALLOC and VMEMMAP regions at their old size.
SLB miss handlers are updated to validate these limits.

We also limit this update to SPARSEMEM_VMEMMAP and SPARSEMEM_EXTREME.

Signed-off-by: Aneesh Kumar K.V 
---

Changes dependent on http://patchwork.ozlabs.org/patch/960211/
"powerpc/64s/hash: convert SLB miss handlers to C"


 arch/powerpc/include/asm/book3s/64/mmu-hash.h | 49 ---
 arch/powerpc/include/asm/mmu.h| 15 ++
 arch/powerpc/include/asm/sparsemem.h  | 11 -
 arch/powerpc/mm/slb.c | 20 ++--
 4 files changed, 73 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index b3520b549cba..a184b71cead8 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -558,6 +558,26 @@ extern void slb_set_size(u16 size);
 #define ESID_BITS_MASK ((1 << ESID_BITS) - 1)
 #define ESID_BITS_1T_MASK  ((1 << ESID_BITS_1T) - 1)
 
+/*
+ * Now certain configs support MAX_PHYSMEM more than 512TB. Hence we will need
+ * to use more than one context for linear mapping the kernel.
+ * For vmalloc and memmap, we use just one context with 512TB. With 64 byte
+ * struct page size, we need only 32 TB in memmap for 2PB (51 bits (MAX_PHYSMEM_BITS)).
+ */
+#if (MAX_PHYSMEM_BITS > MAX_EA_BITS_PER_CONTEXT)
+#define MAX_KERNEL_CTX_CNT (1UL << (MAX_PHYSMEM_BITS - MAX_EA_BITS_PER_CONTEXT))
+#else
+#define MAX_KERNEL_CTX_CNT 1
+#endif
+
+#define MAX_VMALLOC_CTX_CNT1
+#define MAX_MEMMAP_CTX_CNT 1
+
+/*
+ * Would be nice to use KERNEL_REGION_ID here
+ */
+#define KERNEL_REGION_CONTEXT_OFFSET   (0xc - 1)
+
 /*
  * 256MB segment
  * The proto-VSID space has 2^(CONTEX_BITS + ESID_BITS) - 1 segments
@@ -568,12 +588,13 @@ extern void slb_set_size(u16 size);
  * We also need to avoid the last segment of the last context, because that
  * would give a protovsid of 0x1f. That will result in a VSID 0
  * because of the modulo operation in vsid scramble.
+ *
+ * We add one extra context to MIN_USER_CONTEXT so that we can map kernel
+ * context easily. The +1 is to map the unused 0xe region mapping.
  */
 #define MAX_USER_CONTEXT   ((ASM_CONST(1) << CONTEXT_BITS) - 2)
-#define MIN_USER_CONTEXT   (5)
-
-/* Would be nice to use KERNEL_REGION_ID here */
-#define KERNEL_REGION_CONTEXT_OFFSET   (0xc - 1)
+#define MIN_USER_CONTEXT   (MAX_KERNEL_CTX_CNT + MAX_VMALLOC_CTX_CNT + \
+MAX_MEMMAP_CTX_CNT + 2)
 
 /*
  * For platforms that support on 65bit VA we limit the context bits
@@ -733,6 +754,23 @@ static inline unsigned long get_vsid(unsigned long 
context, unsigned long ea,
return vsid_scramble(protovsid, VSID_MULTIPLIER_1T, vsid_bits);
 }
 
+static inline unsigned long get_kernel_context(unsigned long ea)
+{
+   unsigned long region_id = REGION_ID(ea);
+   unsigned long ctx = region_id - KERNEL_REGION_CONTEXT_OFFSET;
+   /*
+* For linear mapping we do support multiple context
+*/
+   if (region_id == KERNEL_REGION_ID) {
+   /*
+* We already verified ea to be not beyond the addr limit.
+*/
+   ctx +=  ((ea & ~REGION_MASK) >> MAX_EA_BITS_PER_CONTEXT);
+   }
+
+   return ctx;
+}
+
 /*
  * This is only valid for addresses >= PAGE_OFFSET
  */
@@ -755,8 +793,7 @@ static inline unsigned long get_kernel_vsid(unsigned long 
ea, int ssize)
 * So we can compute the context from the region (top nibble) by
 * subtracting 11, or 0xc - 1.
 */
-   context = (ea >> 60) - KERNEL_REGION_CONTEXT_OFFSET;
-
+   context = get_kernel_context(ea);
return get_vsid(context, ea, ssize);
 }
 
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index 13ea441ac531..eb20eb3b8fb0 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -309,6 +309,21 @@ static inline u16 get_mm_addr_key(struct mm_struct *mm, 
unsigned long address)
  */
 #define MMU_PAGE_COUNT 16
 
+/*
+ * If we store section details in page->flags we can't increase the 
MAX_PHYSMEM_BITS
+ * if we increase SECTIONS_WIDTH we 

[RFC PATCH 1/2] powerpc/mm/hash: Rename get_ea_context to get_user_context

2018-08-21 Thread Aneesh Kumar K.V
We will be adding get_kernel_context later. Update the function name to
indicate that this handles context allocation for user space addresses.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/mmu.h | 4 ++--
 arch/powerpc/include/asm/mmu_context.h   | 2 +-
 arch/powerpc/mm/slb.c| 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
index 9c8c669a6b6a..6328857f259f 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -208,7 +208,7 @@ extern void radix_init_pseries(void);
 static inline void radix_init_pseries(void) { };
 #endif
 
-static inline int get_ea_context(mm_context_t *ctx, unsigned long ea)
+static inline int get_user_context(mm_context_t *ctx, unsigned long ea)
 {
int index = ea >> MAX_EA_BITS_PER_CONTEXT;
 
@@ -223,7 +223,7 @@ static inline int get_ea_context(mm_context_t *ctx, 
unsigned long ea)
 static inline unsigned long get_user_vsid(mm_context_t *ctx,
  unsigned long ea, int ssize)
 {
-   unsigned long context = get_ea_context(ctx, ea);
+   unsigned long context = get_user_context(ctx, ea);
 
return get_vsid(context, ea, ssize);
 }
diff --git a/arch/powerpc/include/asm/mmu_context.h 
b/arch/powerpc/include/asm/mmu_context.h
index b2f89b621b15..dbbab7ba449b 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -81,7 +81,7 @@ static inline bool need_extra_context(struct mm_struct *mm, 
unsigned long ea)
 {
int context_id;
 
-   context_id = get_ea_context(&mm->context, ea);
+   context_id = get_user_context(&mm->context, ea);
if (!context_id)
return true;
return false;
diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index b0f4c038d658..f63a608922ca 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -464,7 +464,7 @@ static long slb_allocate_user(struct mm_struct *mm, 
unsigned long ea)
if (ea >= mm->context.slb_addr_limit)
return -EFAULT;
 
-   context = get_ea_context(&mm->context, ea);
+   context = get_user_context(&mm->context, ea);
if (!context)
return -EFAULT;
 
-- 
2.17.1



Re: [RFC 00/15] PCI: turn some __weak functions into callbacks

2018-08-21 Thread Christoph Hellwig
On Tue, Aug 21, 2018 at 12:30:50PM +0100, David Woodhouse wrote:
> On Mon, 2018-08-20 at 23:14 -0700, Christoph Hellwig wrote:
> > On Fri, Aug 17, 2018 at 12:26:30PM +0200, Arnd Bergmann wrote:
> > > Hi Bjorn and others,
> > > 
> > > Triggered by Christoph's patches, I had another go at converting
> > > all of the remaining pci host bridge implementations to be based
> > > on pci_alloc_host_bridge and a separate registration function.
> > 
> > I really like the idea behind this series.
> 
> Hm... are you turning direct calls into retpolined indirect calls?

He does.  But not anywhere near the fast path.


[PATCH] powerpc/nohash: fix pte_access_permitted()

2018-08-21 Thread Christophe Leroy
Commit 5769beaf180a8 ("powerpc/mm: Add proper pte access check helper
for other platforms") replaced generic pte_access_permitted() by an
arch specific one.

The generic one is defined as
(pte_present(pte) && (!(write) || pte_write(pte)))

The arch specific one is open coded, checking that the _PAGE_USER and
_PAGE_WRITE (_PAGE_RW) flags are set, but failing to check that
_PAGE_RO and _PAGE_PRIVILEGED are unset, leading to a useless test
on targets like the 8xx, which defines _PAGE_RW and _PAGE_USER as 0.

Commit 5fa5b16be5b31 ("powerpc/mm/hugetlb: Use pte_access_permitted
for hugetlb access check") replaced some tests performed with
pte helpers by a call to pte_access_permitted(), leading to the same
issue.

This patch rewrites powerpc/nohash pte_access_permitted()
using pte helpers.

Fixes: 5769beaf180a8 ("powerpc/mm: Add proper pte access check helper for other 
platforms")
Fixes: 5fa5b16be5b31 ("powerpc/mm/hugetlb: Use pte_access_permitted for hugetlb 
access check")
Cc: sta...@vger.kernel.org # v4.15+
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/nohash/pgtable.h | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/pgtable.h 
b/arch/powerpc/include/asm/nohash/pgtable.h
index 2160be2e4339..b321c82b3624 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -51,17 +51,14 @@ static inline int pte_present(pte_t pte)
 #define pte_access_permitted pte_access_permitted
 static inline bool pte_access_permitted(pte_t pte, bool write)
 {
-   unsigned long pteval = pte_val(pte);
/*
 * A read-only access is controlled by _PAGE_USER bit.
 * We have _PAGE_READ set for WRITE and EXECUTE
 */
-   unsigned long need_pte_bits = _PAGE_PRESENT | _PAGE_USER;
-
-   if (write)
-   need_pte_bits |= _PAGE_WRITE;
+   if (!pte_present(pte) || !pte_user(pte) || !pte_read(pte))
+   return false;
 
-   if ((pteval & need_pte_bits) != need_pte_bits)
+   if (write && !pte_write(pte))
return false;
 
return true;
-- 
2.13.3



Re: [RFC 00/15] PCI: turn some __weak functions into callbacks

2018-08-21 Thread David Woodhouse
On Mon, 2018-08-20 at 23:14 -0700, Christoph Hellwig wrote:
> On Fri, Aug 17, 2018 at 12:26:30PM +0200, Arnd Bergmann wrote:
> > Hi Bjorn and others,
> > 
> > Triggered by Christoph's patches, I had another go at converting
> > all of the remaining pci host bridge implementations to be based
> > on pci_alloc_host_bridge and a separate registration function.
> 
> I really like the idea behind this series.

Hm... are you turning direct calls into retpolined indirect calls?





[PATCH v2] powerpc/64s/hash: convert SLB miss handlers to C

2018-08-21 Thread Nicholas Piggin
This patch moves SLB miss handlers completely to C, using the standard
exception handler macros to set up the stack and branch to C.

This can be done because the segment containing the kernel stack is
always bolted, so accessing it with relocation on will not cause an
SLB exception.

Arbitrary kernel memory may not be accessed when handling kernel space
SLB misses, so care should be taken there. However user SLB misses can
access any kernel memory, which can be used to move some fields out of
the paca (in later patches).

User SLB misses could quite easily reconcile IRQs, set up a first-class
kernel environment, and exit via ret_from_except; however, that doesn't
seem to be necessary at the moment, so we only do that if a bad fault is
encountered.

[ Credit to Aneesh for bug fixes, error checks, and improvements to bad
  address handling, etc. ]

Signed-off-by: Nicholas Piggin 

Since RFC:
- Added MSR[RI] handling
- Fixed up a register loss bug exposed by irq tracing (Aneesh)
- Reject misses outside the defined kernel regions (Aneesh)
- Added several more sanity checks and error handling (Aneesh); we may
  look at consolidating these tests and tightening up the code, but for
  a first pass we decided it's better to check carefully.

Since v1:
- Fixed SLB cache corruption (Aneesh)
- Fixed untidy SLBE allocation "leak" in get_vsid error case
- Now survives some stress testing on real hardware
---
 arch/powerpc/include/asm/asm-prototypes.h |   2 +
 arch/powerpc/kernel/exceptions-64s.S  | 202 +++--
 arch/powerpc/mm/Makefile  |   2 +-
 arch/powerpc/mm/slb.c | 271 +
 arch/powerpc/mm/slb_low.S | 335 --
 5 files changed, 196 insertions(+), 616 deletions(-)
 delete mode 100644 arch/powerpc/mm/slb_low.S

diff --git a/arch/powerpc/include/asm/asm-prototypes.h 
b/arch/powerpc/include/asm/asm-prototypes.h
index 1f4691ce4126..78ed3c3f879a 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -78,6 +78,8 @@ void kernel_bad_stack(struct pt_regs *regs);
 void system_reset_exception(struct pt_regs *regs);
 void machine_check_exception(struct pt_regs *regs);
 void emulation_assist_interrupt(struct pt_regs *regs);
+long do_slb_fault(struct pt_regs *regs, unsigned long ea);
+void do_bad_slb_fault(struct pt_regs *regs, unsigned long ea, long err);
 
 /* signals, syscalls and interrupts */
 long sys_swapcontext(struct ucontext __user *old_ctx,
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index ea04dfb8c092..c3832243819b 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -566,28 +566,36 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 
 
 EXC_REAL_BEGIN(data_access_slb, 0x380, 0x80)
-   SET_SCRATCH0(r13)
-   EXCEPTION_PROLOG_0(PACA_EXSLB)
-   EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST_PR, 0x380)
-   mr  r12,r3  /* save r3 */
-   mfspr   r3,SPRN_DAR
-   mfspr   r11,SPRN_SRR1
-   crset   4*cr6+eq
-   BRANCH_TO_COMMON(r10, slb_miss_common)
+EXCEPTION_PROLOG(PACA_EXSLB, data_access_slb_common, EXC_STD, KVMTEST_PR, 0x380);
 EXC_REAL_END(data_access_slb, 0x380, 0x80)
 
 EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80)
-   SET_SCRATCH0(r13)
-   EXCEPTION_PROLOG_0(PACA_EXSLB)
-   EXCEPTION_PROLOG_1(PACA_EXSLB, NOTEST, 0x380)
-   mr  r12,r3  /* save r3 */
-   mfspr   r3,SPRN_DAR
-   mfspr   r11,SPRN_SRR1
-   crset   4*cr6+eq
-   BRANCH_TO_COMMON(r10, slb_miss_common)
+EXCEPTION_RELON_PROLOG(PACA_EXSLB, data_access_slb_common, EXC_STD, NOTEST, 0x380);
 EXC_VIRT_END(data_access_slb, 0x4380, 0x80)
+
 TRAMP_KVM_SKIP(PACA_EXSLB, 0x380)
 
+EXC_COMMON_BEGIN(data_access_slb_common)
+   mfspr   r10,SPRN_DAR
+   std r10,PACA_EXSLB+EX_DAR(r13)
+   EXCEPTION_PROLOG_COMMON(0x380, PACA_EXSLB)
+   ld  r4,PACA_EXSLB+EX_DAR(r13)
+   std r4,_DAR(r1)
+   addi    r3,r1,STACK_FRAME_OVERHEAD
+   bl  do_slb_fault
+   cmpdi   r3,0
+   bne-    1f
+   b   fast_exception_return
+1: /* Error case */
+   std r3,RESULT(r1)
+   bl  save_nvgprs
+   RECONCILE_IRQ_STATE(r10, r11)
+   ld  r4,_DAR(r1)
+   ld  r5,RESULT(r1)
+   addi    r3,r1,STACK_FRAME_OVERHEAD
+   bl  do_bad_slb_fault
+   b   ret_from_except
+
 
 EXC_REAL(instruction_access, 0x400, 0x80)
 EXC_VIRT(instruction_access, 0x4400, 0x80, 0x400)
@@ -610,160 +618,34 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 
 
 EXC_REAL_BEGIN(instruction_access_slb, 0x480, 0x80)
-   SET_SCRATCH0(r13)
-   EXCEPTION_PROLOG_0(PACA_EXSLB)
-   EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST_PR, 0x480)
-   mr  r12,r3  /* save r3 */
-   mfspr   r3,SPRN_SRR0/* SRR0 is faulting address */
-   mfspr   r11,SPRN_SRR1
-   crclr   4*cr6+eq
-   BRANCH_TO_COMMON(r10, 

Re: [PATCH] powerpc/64s/hash: convert SLB miss handlers to C

2018-08-21 Thread Nicholas Piggin
On Tue, 21 Aug 2018 16:12:44 +1000
Benjamin Herrenschmidt  wrote:

> On Tue, 2018-08-21 at 15:13 +1000, Nicholas Piggin wrote:
> > This patch moves SLB miss handlers completely to C, using the standard
> > exception handler macros to set up the stack and branch to C.
> > 
> > This can be done because the segment containing the kernel stack is
> > always bolted, so accessing it with relocation on will not cause an
> > SLB exception.
> > 
> > Arbitrary kernel memory may not be accessed when handling kernel space
> > SLB misses, so care should be taken there. However user SLB misses can
> > access any kernel memory, which can be used to move some fields out of
> > the paca (in later patches).
> > 
> > User SLB misses could quite easily reconcile IRQs and set up a first
> > class kernel environment and exit via ret_from_except, however that
> > doesn't seem to be necessary at the moment, so we only do that if a
> > bad fault is encountered.
> > 
> > [ Credit to Aneesh for bug fixes, error checks, and improvements to bad
> >   address handling, etc ]
> > 
> > Signed-off-by: Nicholas Piggin 
> > 
> > Since RFC:
> > - Send patch 1 by itself to focus on the big change.
> > - Added MSR[RI] handling
> > - Fixed up a register loss bug exposed by irq tracing (Aneesh)
> > - Reject misses outside the defined kernel regions (Aneesh)
> > - Added several more sanity checks and error handling (Aneesh), we may
> >   look at consolidating these tests and tightening up the code but for
> >   a first pass we decided it's better to check carefully.
> > ---
> >  arch/powerpc/include/asm/asm-prototypes.h |   2 +
> >  arch/powerpc/kernel/exceptions-64s.S  | 202 +++--
> >  arch/powerpc/mm/Makefile  |   2 +-
> >  arch/powerpc/mm/slb.c | 257 +
> >  arch/powerpc/mm/slb_low.S | 335 --
> >  5 files changed, 185 insertions(+), 613 deletions(-)  
> ^^^^^^
> 
> Nice ! :-)

Okay with the subsequent improvements, the context switching benchmark
is about 30% faster. Radix is still faster, but hash now has 90% of the
throughput rather than 70% with upstream. Although that's no longer
testing SLB miss performance because they are all eliminated.

I don't know of any other SLB miss intensive tests. I think these
numbers indicate it may be a reasonable approach wrt performance.
Though we should still look at HANA or something on a big memory
system.

Thanks,
Nick


Re: [PATCH 1/2] sched/topology: Set correct numa topology type

2018-08-21 Thread Srikar Dronamraju
* Srikar Dronamraju  [2018-08-10 22:30:18]:

> With commit 051f3ca02e46 ("sched/topology: Introduce NUMA identity node
> sched domain") scheduler introduces an new numa level. However this
> leads to numa topology on 2 node systems no more marked as NUMA_DIRECT.
> After this commit, it gets reported as NUMA_BACKPLANE. This is because
> sched_domains_numa_level is now 2 on 2 node systems.
> 
> Fix this by allowing systems that have up to 2 numa levels to be set as
> NUMA_DIRECT.
> 
> While here, remove code that assumes level can be 0.
> 
> Fixes: 051f3ca02e46 "Introduce NUMA identity node sched domain"
> Signed-off-by: Srikar Dronamraju 
> ---

Hey Peter,

Did you look at these two patches?

-- 
Thanks and Regards
Srikar Dronamraju



[PATCH RFCv2 6/6] memory-hotplug.txt: Add some details about locking internals

2018-08-21 Thread David Hildenbrand
Let's document the magic a bit, especially why device_hotplug_lock is
required when adding/removing memory and how it all play together with
requests to online/offline memory from user space.

Cc: Jonathan Corbet 
Cc: Michal Hocko 
Cc: Andrew Morton 
Signed-off-by: David Hildenbrand 
---
 Documentation/memory-hotplug.txt | 39 +++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 7f49ebf3ddb2..03aaad7d7373 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -3,7 +3,7 @@ Memory Hotplug
 ==============
 
 :Created:  Jul 28 2007
-:Updated: Add description of notifier of memory hotplug:   Oct 11 2007
+:Updated: Add some details about locking internals:Aug 20 2018
 
 This document is about memory hotplug including how-to-use and current status.
 Because Memory Hotplug is still under development, contents of this text will
@@ -495,6 +495,43 @@ further processing of the notification queue.
 
 NOTIFY_STOP stops further processing of the notification queue.
 
+
+Locking Internals
+=================
+
+When adding/removing memory that uses memory block devices (i.e. ordinary RAM),
+the device_hotplug_lock should be held to:
+
+- synchronize against online/offline requests (e.g. via sysfs). This way, memory
+  block devices can only be accessed (.online/.state attributes) by user
+  space once memory has been fully added. And when removing memory, we
+  know nobody is in critical sections.
+- synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC)
+
+Especially, there is a possible lock inversion that is avoided using
+device_hotplug_lock when adding memory and user space tries to online that
+memory faster than expected:
+
+- device_online() will first take the device_lock(), followed by
+  mem_hotplug_lock
+- add_memory_resource() will first take the mem_hotplug_lock, followed by
+  the device_lock() (while creating the devices, during bus_add_device()).
+
+As the device is visible to user space before taking the device_lock(), this
+can result in a lock inversion.
+
+onlining/offlining of memory should be done via device_online()/
+device_offline() - to make sure it is properly synchronized to actions
+via sysfs. Holding device_hotplug_lock is advised (to e.g. protect online_type)
+
+When adding/removing/onlining/offlining memory or adding/removing
+heterogeneous/device memory, we should always hold the mem_hotplug_lock to
+serialise memory hotplug (e.g. access to global/zone variables).
+
+In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) allows
+for a quite efficient get_online_mems/put_online_mems implementation.
+
+
 Future Work
 ===========
 
-- 
2.17.1



[PATCH RFCv2 5/6] powerpc/powernv: hold device_hotplug_lock in memtrace_offline_pages()

2018-08-21 Thread David Hildenbrand
Let's perform all checking + offlining + removing under
device_hotplug_lock, so nobody can mess with these devices via
sysfs concurrently.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Rashmica Gupta 
Cc: Balbir Singh 
Cc: Michael Neuling 
Signed-off-by: David Hildenbrand 
---
 arch/powerpc/platforms/powernv/memtrace.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index ef7181d4fe68..473e59842ec5 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -74,9 +74,13 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
 {
u64 end_pfn = start_pfn + nr_pages - 1;
 
+   lock_device_hotplug();
+
if (walk_memory_range(start_pfn, end_pfn, NULL,
-   check_memblock_online))
+   check_memblock_online)) {
+   unlock_device_hotplug();
return false;
+   }
 
walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
  change_memblock_state);
@@ -84,14 +88,16 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
if (offline_pages(start_pfn, nr_pages)) {
walk_memory_range(start_pfn, end_pfn, (void *)MEM_ONLINE,
  change_memblock_state);
+   unlock_device_hotplug();
return false;
}
 
walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
  change_memblock_state);
 
-   remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
+   __remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
 
+   unlock_device_hotplug();
return true;
 }
 
-- 
2.17.1



[PATCH RFCv2 4/6] powerpc/powernv: hold device_hotplug_lock when calling device_online()

2018-08-21 Thread David Hildenbrand
device_online() should be called with device_hotplug_lock() held.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Rashmica Gupta 
Cc: Balbir Singh 
Cc: Michael Neuling 
Signed-off-by: David Hildenbrand 
---
 arch/powerpc/platforms/powernv/memtrace.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 8f1cd4f3bfd5..ef7181d4fe68 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -229,9 +229,11 @@ static int memtrace_online(void)
 * we need to online the memory ourselves.
 */
if (!memhp_auto_online) {
+   lock_device_hotplug();
walk_memory_range(PFN_DOWN(ent->start),
  PFN_UP(ent->start + ent->size - 1),
  NULL, online_mem_block);
+   unlock_device_hotplug();
}
 
/*
-- 
2.17.1



[PATCH RFCv2 3/6] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock

2018-08-21 Thread David Hildenbrand
There seem to be some problems as a result of 30467e0b3be ("mm, hotplug:
fix concurrent memory hot-add deadlock"), which tried to fix a possible
lock inversion reported and discussed in [1] due to the two locks
a) device_lock()
b) mem_hotplug_lock

While add_memory() first takes b), followed by a) during
bus_probe_device(), onlining of memory from user space first took a),
followed by b), exposing a possible deadlock.

In [1], it was decided not to make use of device_hotplug_lock, but
rather to enforce a locking order.

The problems I spotted related to this:

1. Memory block device attributes: While .state first calls
   mem_hotplug_begin() and then calls device_online() - which takes
   device_lock() - .online no longer calls mem_hotplug_begin(), so
   it effectively calls online_pages() without mem_hotplug_lock.

2. device_online() should be called under device_hotplug_lock, however
   onlining memory during add_memory() does not take care of that.

In addition, I think there is also something wrong with the locking in

3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
   without locks. This was introduced after 30467e0b3be. And skimming over
   the code, I assume it could need some more care in regards to locking
   (e.g. device_online() called without device_hotplug_lock - but I'll
   not touch that for now).

Now that we hold the device_hotplug_lock when
- adding memory (e.g. via add_memory()/add_memory_resource())
- removing memory (e.g. via remove_memory())
- device_online()/device_offline()

We can move mem_hotplug_lock usage back into
online_pages()/offline_pages().

Why is mem_hotplug_lock still needed? Essentially to make
get_online_mems()/put_online_mems() be very fast (relying on
device_hotplug_lock would be very slow), and to serialize against
addition of memory that does not create memory block devices (hmm).

[1] http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2015-February/065324.html

This patch is partly based on a patch by Vitaly Kuznetsov.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: "Rafael J. Wysocki" 
Cc: Len Brown 
Cc: Greg Kroah-Hartman 
Cc: "K. Y. Srinivasan" 
Cc: Haiyang Zhang 
Cc: Stephen Hemminger 
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Rashmica Gupta 
Cc: Michael Neuling 
Cc: Balbir Singh 
Cc: Kate Stewart 
Cc: Thomas Gleixner 
Cc: Philippe Ombredanne 
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Pavel Tatashin 
Cc: Vlastimil Babka 
Cc: Dan Williams 
Cc: Oscar Salvador 
Cc: YASUAKI ISHIMATSU 
Cc: Mathieu Malaterre 
Signed-off-by: David Hildenbrand 
---
 drivers/base/memory.c | 13 +
 mm/memory_hotplug.c   | 28 
 2 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 5b0375be7f65..04be13539eb8 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -228,7 +228,6 @@ static bool pages_correctly_probed(unsigned long start_pfn)
 /*
  * MEMORY_HOTPLUG depends on SPARSEMEM in mm/Kconfig, so it is
  * OK to have direct references to sparsemem variables in here.
- * Must already be protected by mem_hotplug_begin().
  */
 static int
memory_block_action(unsigned long phys_index, unsigned long action, int online_type)
@@ -294,7 +293,6 @@ static int memory_subsys_online(struct device *dev)
if (mem->online_type < 0)
mem->online_type = MMOP_ONLINE_KEEP;
 
-   /* Already under protection of mem_hotplug_begin() */
ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE);
 
/* clear online_type */
@@ -341,19 +339,11 @@ store_mem_state(struct device *dev,
goto err;
}
 
-   /*
-* Memory hotplug needs to hold mem_hotplug_begin() for probe to find
-* the correct memory block to online before doing device_online(dev),
-* which will take dev->mutex.  Take the lock early to prevent an
-* inversion, memory_subsys_online() callbacks will be implemented by
-* assuming it's already protected.
-*/
-   mem_hotplug_begin();
-
switch (online_type) {
case MMOP_ONLINE_KERNEL:
case MMOP_ONLINE_MOVABLE:
case MMOP_ONLINE_KEEP:
+   /* mem->online_type is protected by device_hotplug_lock */
mem->online_type = online_type;
	ret = device_online(&mem->dev);
break;
@@ -364,7 +354,6 @@ store_mem_state(struct device *dev,
ret = -EINVAL; /* should never happen */
}
 
-   mem_hotplug_done();
 err:
unlock_device_hotplug();
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e2b5c751e3ea..a2c6c87d83f3 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -881,7 +881,6 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
return zone;
 }
 
-/* Must be protected by mem_hotplug_begin() or 

[PATCH RFCv2 2/6] mm/memory_hotplug: make add_memory() take the device_hotplug_lock

2018-08-21 Thread David Hildenbrand
add_memory() currently does not take the device_hotplug_lock, however
it is already called under the lock from
arch/powerpc/platforms/pseries/hotplug-memory.c
drivers/acpi/acpi_memhotplug.c
to synchronize against CPU hot-remove and similar.

In general, we should hold the device_hotplug_lock when adding memory
to synchronize against online/offline request (e.g. from user space) -
which already resulted in lock inversions due to device_lock() and
mem_hotplug_lock - see 30467e0b3be ("mm, hotplug: fix concurrent memory
hot-add deadlock"). add_memory()/add_memory_resource() will create memory
block devices, so this really feels like the right thing to do.

Holding the device_hotplug_lock makes sure that a memory block device
can really only be accessed (e.g. via .online/.state) from user space,
once the memory has been fully added to the system.

The lock is not held yet in
drivers/xen/balloon.c
arch/powerpc/platforms/powernv/memtrace.c
drivers/s390/char/sclp_cmd.c
drivers/hv/hv_balloon.c
So, let's either use the locked variants or take the lock.

Don't export add_memory_resource(), as it once was exported to be used
by XEN, which is never built as a module. If somebody requires it, we
also have to export a locked variant (as device_hotplug_lock is never
exported).

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: "Rafael J. Wysocki" 
Cc: Len Brown 
Cc: Greg Kroah-Hartman 
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Nathan Fontenot 
Cc: John Allen 
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Dan Williams 
Cc: Joonsoo Kim 
Cc: Vlastimil Babka 
Cc: Oscar Salvador 
Cc: Mathieu Malaterre 
Cc: Pavel Tatashin 
Cc: YASUAKI ISHIMATSU 
Signed-off-by: David Hildenbrand 
---
 .../platforms/pseries/hotplug-memory.c|  2 +-
 drivers/acpi/acpi_memhotplug.c|  2 +-
 drivers/base/memory.c |  9 ++--
 drivers/xen/balloon.c |  3 +++
 include/linux/memory_hotplug.h|  1 +
 mm/memory_hotplug.c   | 22 ---
 6 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index b3f54466e25f..2e6f41dc103a 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -702,7 +702,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
nid = memory_add_physaddr_to_nid(lmb->base_addr);
 
/* Add the memory */
-   rc = add_memory(nid, lmb->base_addr, block_sz);
+   rc = __add_memory(nid, lmb->base_addr, block_sz);
if (rc) {
dlpar_remove_device_tree_lmb(lmb);
return rc;
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 811148415993..8fe0960ea572 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -228,7 +228,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
if (node < 0)
node = memory_add_physaddr_to_nid(info->start_addr);
 
-   result = add_memory(node, info->start_addr, info->length);
+   result = __add_memory(node, info->start_addr, info->length);
 
/*
 * If the memory block has been used by the kernel, add_memory()
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index c8a1cb0b6136..5b0375be7f65 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -521,15 +521,20 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,
if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
return -EINVAL;
 
+   ret = lock_device_hotplug_sysfs();
+   if (ret)
+   goto out;
+
nid = memory_add_physaddr_to_nid(phys_addr);
-   ret = add_memory(nid, phys_addr,
-MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+   ret = __add_memory(nid, phys_addr,
+  MIN_MEMORY_BLOCK_SIZE * sections_per_block);
 
if (ret)
goto out;
 
ret = count;
 out:
+   unlock_device_hotplug();
return ret;
 }
 
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index e12bb256036f..6bab019a82b1 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -395,7 +395,10 @@ static enum bp_state reserve_additional_memory(void)
 * callers drop the mutex before trying again.
 */
	mutex_unlock(&balloon_mutex);
+   /* add_memory_resource() requires the device_hotplug lock */
+   lock_device_hotplug();
rc = add_memory_resource(nid, resource, memhp_auto_online);
+   unlock_device_hotplug();
	mutex_lock(&balloon_mutex);
 
if (rc) {
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 1f096852f479..ffd9cd10fcf3 100644
--- 

[PATCH RFCv2 1/6] mm/memory_hotplug: make remove_memory() take the device_hotplug_lock

2018-08-21 Thread David Hildenbrand
remove_memory() is exported right now but requires the
device_hotplug_lock, which is not exported. So let's provide a variant
that takes the lock and only export that one.

The lock is already held in
arch/powerpc/platforms/pseries/hotplug-memory.c
drivers/acpi/acpi_memhotplug.c
So, let's use the locked variant.

The lock is not held (but taken in)
arch/powerpc/platforms/powernv/memtrace.c
So let's keep using the (now) locked variant.

Apart from that, there are no other users in the tree.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: "Rafael J. Wysocki" 
Cc: Len Brown 
Cc: Rashmica Gupta 
Cc: Michael Neuling 
Cc: Balbir Singh 
Cc: Nathan Fontenot 
Cc: John Allen 
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Dan Williams 
Cc: Joonsoo Kim 
Cc: Vlastimil Babka 
Cc: Pavel Tatashin 
Cc: Greg Kroah-Hartman 
Cc: Oscar Salvador 
Cc: YASUAKI ISHIMATSU 
Cc: Mathieu Malaterre 
Signed-off-by: David Hildenbrand 
---
 arch/powerpc/platforms/powernv/memtrace.c   | 2 --
 arch/powerpc/platforms/pseries/hotplug-memory.c | 6 +++---
 drivers/acpi/acpi_memhotplug.c  | 2 +-
 include/linux/memory_hotplug.h  | 3 ++-
 mm/memory_hotplug.c | 9 -
 5 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 51dc398ae3f7..8f1cd4f3bfd5 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -90,9 +90,7 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
  change_memblock_state);
 
-   lock_device_hotplug();
remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
-   unlock_device_hotplug();
 
return true;
 }
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index c1578f54c626..b3f54466e25f 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -334,7 +334,7 @@ static int pseries_remove_memblock(unsigned long base, unsigned int memblock_siz
nid = memory_add_physaddr_to_nid(base);
 
for (i = 0; i < sections_per_block; i++) {
-   remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE);
+   __remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE);
base += MIN_MEMORY_BLOCK_SIZE;
}
 
@@ -423,7 +423,7 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
block_sz = pseries_memory_block_size();
nid = memory_add_physaddr_to_nid(lmb->base_addr);
 
-   remove_memory(nid, lmb->base_addr, block_sz);
+   __remove_memory(nid, lmb->base_addr, block_sz);
 
/* Update memory regions for memory remove */
memblock_remove(lmb->base_addr, block_sz);
@@ -710,7 +710,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 
rc = dlpar_online_lmb(lmb);
if (rc) {
-   remove_memory(nid, lmb->base_addr, block_sz);
+   __remove_memory(nid, lmb->base_addr, block_sz);
dlpar_remove_device_tree_lmb(lmb);
} else {
lmb->flags |= DRCONF_MEM_ASSIGNED;
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 6b0d3ef7309c..811148415993 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -282,7 +282,7 @@ static void acpi_memory_remove_memory(struct acpi_memory_device *mem_device)
nid = memory_add_physaddr_to_nid(info->start_addr);
 
acpi_unbind_memory_blocks(info);
-   remove_memory(nid, info->start_addr, info->length);
+   __remove_memory(nid, info->start_addr, info->length);
list_del(>list);
kfree(info);
}
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 34a28227068d..1f096852f479 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -301,6 +301,7 @@ extern bool is_mem_section_removable(unsigned long pfn, unsigned long nr_pages);
 extern void try_offline_node(int nid);
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
 extern void remove_memory(int nid, u64 start, u64 size);
+extern void __remove_memory(int nid, u64 start, u64 size);
 
 #else
 static inline bool is_mem_section_removable(unsigned long pfn,
@@ -317,6 +318,7 @@ static inline int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 }
 
 static inline void remove_memory(int nid, u64 start, u64 size) {}
+static inline void __remove_memory(int nid, u64 start, u64 size) {}
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
 extern void __ref free_area_init_core_hotplug(int nid);
@@ -330,7 +332,6 @@ extern void move_pfn_range_to_zone(struct zone *zone, 
unsigned 

[PATCH RFCv2 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock

2018-08-21 Thread David Hildenbrand
This is the same approach as in the first RFC, but this time without
exporting device_hotplug_lock (requested by Greg) and with some more
details and documentation regarding locking. Tested only on x86 so far.

--

Reading through the code and studying how mem_hotplug_lock is to be used,
I noticed that there are two places where we can end up calling
device_online()/device_offline() - online_pages()/offline_pages() without
the mem_hotplug_lock. And there are other places where we call
device_online()/device_offline() without the device_hotplug_lock.

While e.g.
echo "online" > /sys/devices/system/memory/memory9/state
is fine, e.g.
echo 1 > /sys/devices/system/memory/memory9/online
will not take the mem_hotplug_lock, only the device_lock() and
device_hotplug_lock.

E.g. via memory_probe_store(), we can end up calling
add_memory()->online_pages() without the device_hotplug_lock. So we can
have concurrent callers in online_pages(), where we then touch
e.g. zone->present_pages basically unprotected.

Looks like there is a longer history to that (see Patch #2 for details),
and fixing it to work the way it was intended is not really possible. We
would e.g. have to take the mem_hotplug_lock in drivers/base/core.c, which
sounds wrong.

Summary: We had a lock inversion on mem_hotplug_lock and device_lock().
More details can be found in patch 3 and patch 6.

I propose the general rules (documentation added in patch 6):

1. add_memory/add_memory_resource() must only be called with
   device_hotplug_lock.
2. remove_memory() must only be called with device_hotplug_lock. This is
   already documented and holds for all callers.
3. device_online()/device_offline() must only be called with
   device_hotplug_lock. This is already documented and true for now in core
   code. Other callers (related to memory hotplug) have to be fixed up.
4. mem_hotplug_lock is taken inside of add_memory/remove_memory/
   online_pages/offline_pages.

To me, this looks way cleaner than what we have right now (and easier to
verify). And looking at the documentation of remove_memory, using
lock_device_hotplug also for add_memory() feels natural.


RFC -> RFCv2:
- Don't export device_hotplug_lock, provide proper remove_memory/add_memory
  wrappers.
- Split up the patches a bit.
- Try to improve powernv memtrace locking
- Add some documentation for locking that matches my knowledge

David Hildenbrand (6):
  mm/memory_hotplug: make remove_memory() take the device_hotplug_lock
  mm/memory_hotplug: make add_memory() take the device_hotplug_lock
  mm/memory_hotplug: fix online/offline_pages called w.o.
mem_hotplug_lock
  powerpc/powernv: hold device_hotplug_lock when calling device_online()
  powerpc/powernv: hold device_hotplug_lock in memtrace_offline_pages()
  memory-hotplug.txt: Add some details about locking internals

 Documentation/memory-hotplug.txt  | 39 +++-
 arch/powerpc/platforms/powernv/memtrace.c | 14 +++--
 .../platforms/pseries/hotplug-memory.c|  8 +--
 drivers/acpi/acpi_memhotplug.c|  4 +-
 drivers/base/memory.c | 22 +++
 drivers/xen/balloon.c |  3 +
 include/linux/memory_hotplug.h|  4 +-
 mm/memory_hotplug.c   | 59 +++
 8 files changed, 115 insertions(+), 38 deletions(-)

-- 
2.17.1



Re: powerpc/fadump: cleanup crash memory ranges support

2018-08-21 Thread Michael Ellerman
On Fri, 2018-08-17 at 20:40:56 UTC, Hari Bathini wrote:
> Commit 1bd6a1c4b80a ("powerpc/fadump: handle crash memory ranges array
> index overflow") changed crash memory ranges to a dynamic array that
> is reallocated on-demand with krealloc(). The relevant header for this
> call was not included. The kernel compiles though. But be cautious and
> add the header anyway.
> 
> Also, memory allocation logic in fadump_add_crash_memory() takes care
> of memory allocation for crash memory ranges in all scenarios. Drop
> unnecessary memory allocation in fadump_setup_crash_memory_ranges().
> 
> Fixes: 1bd6a1c4b80a ("powerpc/fadump: handle crash memory ranges array index 
> overflow")
> Cc: Mahesh Salgaonkar 
> Signed-off-by: Hari Bathini 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/a58183138cb72059a0c278f8370a47

cheers


Re: [v5] powerpc/topology: Get topology for shared processors at boot

2018-08-21 Thread Michael Ellerman
On Fri, 2018-08-17 at 14:54:39 UTC, Srikar Dronamraju wrote:
> On a shared lpar, Phyp will not update the cpu associativity at boot
> time. Just after the boot system does recognize itself as a shared lpar and
> trigger a request for correct cpu associativity. But by then the scheduler
> would have already created/destroyed its sched domains.
> 
> This causes
> - Broken load balance across Nodes causing islands of cores.
> - Performance degradation esp if the system is lightly loaded
> - dmesg to wrongly report all cpus to be in Node 0.
> - Messages in dmesg saying borken topology.
> - With commit 051f3ca02e46 ("sched/topology: Introduce NUMA identity
>   node sched domain"), can cause rcu stalls at boot up.
> 
> From a scheduler maintainer's perspective, moving cpus from one node to
> another or creating more numa levels after boot is not appropriate
> without some notification to the user space.
> https://lore.kernel.org/lkml/20150406214558.ga38...@linux.vnet.ibm.com/T/#u
> 
> The sched_domains_numa_masks table which is used to generate cpumasks is
> only created at boot time just before creating sched domains and never
> updated.  Hence, its better to get the topology correct before the sched
> domains are created.
> 
> For example on 64 core Power 8 shared lpar, dmesg reports
> 
> [2.088360] Brought up 512 CPUs
> [2.088368] Node 0 CPUs: 0-511
> [2.088371] Node 1 CPUs:
> [2.088373] Node 2 CPUs:
> [2.088375] Node 3 CPUs:
> [2.088376] Node 4 CPUs:
> [2.088378] Node 5 CPUs:
> [2.088380] Node 6 CPUs:
> [2.088382] Node 7 CPUs:
> [2.088386] Node 8 CPUs:
> [2.088388] Node 9 CPUs:
> [2.088390] Node 10 CPUs:
> [2.088392] Node 11 CPUs:
> ...
> [3.916091] BUG: arch topology borken
> [3.916103]  the DIE domain not a subset of the NUMA domain
> [3.916105] BUG: arch topology borken
> [3.916106]  the DIE domain not a subset of the NUMA domain
> ...
> 
> numactl/lscpu output will still be correct with cores spreading across
> all nodes.
> 
> Socket(s): 64
> NUMA node(s):  12
> Model: 2.0 (pvr 004d 0200)
> Model name:POWER8 (architected), altivec supported
> Hypervisor vendor: pHyp
> Virtualization type:   para
> L1d cache: 64K
> L1i cache: 32K
> NUMA node0 CPU(s): 0-7,32-39,64-71,96-103,176-183,272-279,368-375,464-471
> NUMA node1 CPU(s): 8-15,40-47,72-79,104-111,184-191,280-287,376-383,472-479
> NUMA node2 CPU(s): 16-23,48-55,80-87,112-119,192-199,288-295,384-391,480-487
> NUMA node3 CPU(s): 24-31,56-63,88-95,120-127,200-207,296-303,392-399,488-495
> NUMA node4 CPU(s): 208-215,304-311,400-407,496-503
> NUMA node5 CPU(s): 168-175,264-271,360-367,456-463
> NUMA node6 CPU(s): 128-135,224-231,320-327,416-423
> NUMA node7 CPU(s): 136-143,232-239,328-335,424-431
> NUMA node8 CPU(s): 216-223,312-319,408-415,504-511
> NUMA node9 CPU(s): 144-151,240-247,336-343,432-439
> NUMA node10 CPU(s):152-159,248-255,344-351,440-447
> NUMA node11 CPU(s):160-167,256-263,352-359,448-455
> 
> Currently on this lpar, the scheduler detects 2 levels of NUMA and
> creates NUMA sched domains for all cpus, but it finds a single DIE
> domain consisting of all cpus. Hence it deletes all NUMA sched domains.
> 
> To address this, detect the shared processor and update the topology soon
> after cpus are set up, so that the correct topology is in place before the
> scheduler creates its sched domains.
> 
> With the fix, dmesg reports
> 
> [0.491336] numa: Node 0 CPUs: 0-7 32-39 64-71 96-103 176-183 272-279 
> 368-375 464-471
> [0.491351] numa: Node 1 CPUs: 8-15 40-47 72-79 104-111 184-191 280-287 
> 376-383 472-479
> [0.491359] numa: Node 2 CPUs: 16-23 48-55 80-87 112-119 192-199 288-295 
> 384-391 480-487
> [0.491366] numa: Node 3 CPUs: 24-31 56-63 88-95 120-127 200-207 296-303 
> 392-399 488-495
> [0.491374] numa: Node 4 CPUs: 208-215 304-311 400-407 496-503
> [0.491379] numa: Node 5 CPUs: 168-175 264-271 360-367 456-463
> [0.491384] numa: Node 6 CPUs: 128-135 224-231 320-327 416-423
> [0.491389] numa: Node 7 CPUs: 136-143 232-239 328-335 424-431
> [0.491394] numa: Node 8 CPUs: 216-223 312-319 408-415 504-511
> [0.491399] numa: Node 9 CPUs: 144-151 240-247 336-343 432-439
> [0.491404] numa: Node 10 CPUs: 152-159 248-255 344-351 440-447
> [0.491409] numa: Node 11 CPUs: 160-167 256-263 352-359 448-455
> 
> and lscpu would also report
> 
> Socket(s): 64
> NUMA node(s):  12
> Model: 2.0 (pvr 004d 0200)
> Model name:POWER8 (architected), altivec supported
> Hypervisor vendor: pHyp
> Virtualization type:   para
> L1d cache: 64K
> L1i cache: 32K
> NUMA node0 CPU(s): 0-7,32-39,64-71,96-103,176-183,272-279,368-375,464-471
> NUMA node1 CPU(s): 8-15,40-47,72-79,104-111,184-191,280-287,376-383,472-479
> NUMA node2 CPU(s): 

Re: [v2] powerpc/powernv/pci: Work around races in PCI bridge enabling

2018-08-21 Thread Michael Ellerman
On Fri, 2018-08-17 at 07:30:39 UTC, Benjamin Herrenschmidt wrote:
> The generic code is racy when multiple children of a PCI bridge try to
> enable it simultaneously.
> 
> This leads to drivers trying to access a device through a not-yet-enabled
> bridge, and to EEH errors under various circumstances when using parallel
> driver probing.
> 
> There is work going on to fix that properly in the PCI core but it will
> take some time.
> 
> x86 gets away with it because (outside of hotplug), the BIOS enables all
> the bridges at boot time.
> 
> This patch does the same thing on powernv by enabling all bridges that
> have child devices at boot time, thus avoiding subsequent races. It's suitable
> for backporting to stable and distros, while the proper PCI fix will probably
> be significantly more invasive.
> 
> Signed-off-by: Benjamin Herrenschmidt 
> CC: sta...@vger.kernel.org

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/db2173198b9513f7add8009f225afa

cheers


Re: powerpc/traps: Avoid rate limit messages from show unhandled signals

2018-08-21 Thread Michael Ellerman
On Fri, 2018-08-17 at 06:55:00 UTC, Michael Ellerman wrote:
> In the recent commit to add an explicit ratelimit state when showing
> unhandled signals, commit 35a52a10c3ac ("powerpc/traps: Use an
> explicit ratelimit state for show_signal_msg()"), I put the check of
> show_unhandled_signals and the ratelimit state before the call to
> unhandled_signal() so as to avoid unnecessarily calling the latter
> when show_unhandled_signals is false.
> 
> However that causes us to check the ratelimit state on every call, so
> if we take a lot of *handled* signals that has the effect of making
> the ratelimit code print warnings that callbacks have been suppressed
> when they haven't.
> 
> So rearrange the code so that we check show_unhandled_signals first,
> then call unhandled_signal() and finally check the ratelimit state.
> 
> Signed-off-by: Michael Ellerman 
> Reviewed-by: Murilo Opsfelder Araujo 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/997dd26cb3c8b7c9b8765751cc1491

cheers


Re: powerpc/64s: idle_power4 fix PACA_IRQ_HARD_DIS accounting

2018-08-21 Thread Michael Ellerman
On Tue, 2018-08-07 at 13:21:56 UTC, Nicholas Piggin wrote:
> When idle_power4 hard disables interrupts then finds a soft pending
> interrupt, it returns with interrupts hard disabled but without
> PACA_IRQ_HARD_DIS set. Commit 9b81c0211c ("powerpc/64s: make
> PACA_IRQ_HARD_DIS track MSR[EE] closely") added a warning for that
> condition.
> 
> Fix this by adding the PACA_IRQ_HARD_DIS for that case.
> 
> Signed-off-by: Nicholas Piggin 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/993ff6d9df74305bc7b5bbc7a0810c

cheers


Re: [15/15] powerpc/powernv: provide a console flush operation for opal hvc driver

2018-08-21 Thread Michael Ellerman
On Mon, 2018-04-30 at 14:55:58 UTC, Nicholas Piggin wrote:
> Provide the flush hv_op for the opal hvc driver. This will flush the
> firmware console buffers without spinning with interrupts disabled.
> 
> Cc: Benjamin Herrenschmidt 
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Nicholas Piggin 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/95b861a76c1ded3e89d33a3d9f4552

cheers


Re: [PATCH] poewrpc/mce: Fix SLB rebolting during MCE recovery path.

2018-08-21 Thread Nicholas Piggin
On Fri, 17 Aug 2018 14:51:47 +0530
Mahesh J Salgaonkar  wrote:

> From: Mahesh Salgaonkar 
> 
> With the powerpc next commit e7e81847478 (poewrpc/mce: Fix SLB rebolting
> during MCE recovery path.), the SLB error recovery is broken. The
> commit missed a crucial change of OR-ing index value to RB[52-63] which
> selects the SLB entry while rebolting. This patch fixes that.
> 
> Signed-off-by: Mahesh Salgaonkar 
> Reviewed-by: Nicholas Piggin 
> ---
>  arch/powerpc/mm/slb.c |5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
> index 0b095fa54049..6dd9913425bc 100644
> --- a/arch/powerpc/mm/slb.c
> +++ b/arch/powerpc/mm/slb.c
> @@ -101,9 +101,12 @@ void __slb_restore_bolted_realmode(void)
>  
>/* No isync needed because realmode. */
>   for (index = 0; index < SLB_NUM_BOLTED; index++) {
> + unsigned long rb = be64_to_cpu(p->save_area[index].esid);
> +
> + rb = (rb & ~0xFFFul) | index;
>   asm volatile("slbmte  %0,%1" :
>: "r" (be64_to_cpu(p->save_area[index].vsid)),
> -"r" (be64_to_cpu(p->save_area[index].esid)));
> +"r" (rb));
>   }
>  }
>  
> 

I'm just looking at this again. The bolted save areas do have the
index field set. So for the OS, your patch should be equivalent to
this, right?

 static inline void slb_shadow_clear(enum slb_index index)
 {
-   WRITE_ONCE(get_slb_shadow()->save_area[index].esid, 0);
+   WRITE_ONCE(get_slb_shadow()->save_area[index].esid, index);
 }

Which seems like a better fix.

PAPR says:

  Note: SLB is filled sequentially starting at index 0
  from the shadow buffer ignoring the contents of
  RB field bits 52-63

So that shouldn't be an issue.

Thanks,
Nick


Re: [PATCH 4/5] arm64: dts: add QorIQ LX2160A SoC support

2018-08-21 Thread Sudeep Holla
On Mon, Aug 20, 2018 at 12:17:15PM +0530, Vabhav Sharma wrote:
> LX2160A SoC is based on Layerscape Chassis Generation 3.2 Architecture.
> 
> LX2160A features 16 64-bit Arm v8 Cortex-A72 processor cores in 8 clusters,
> CCN508, GICv3, two 64-bit DDR4 memory controllers, 8 I2C controllers,
> 3 DSPI, 2 eSDHC, 2 USB 3.0, MMU-500, 3 SATA, and 4 PL011 SBSA UARTs.
> 
> Signed-off-by: Ramneek Mehresh 
> Signed-off-by: Zhang Ying-22455 
> Signed-off-by: Nipun Gupta 
> Signed-off-by: Priyanka Jain 
> Signed-off-by: Yogesh Gaur 
> Signed-off-by: Sriram Dash 
> Signed-off-by: Vabhav Sharma 
> ---
>  arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi | 572 
> +
>  1 file changed, 572 insertions(+)
>  create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> 
> diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi 
> b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> new file mode 100644
> index 000..e35e494
> --- /dev/null
> +++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> @@ -0,0 +1,572 @@
> +// SPDX-License-Identifier: (GPL-2.0 OR MIT)
> +//
> +// Device Tree Include file for Layerscape-LX2160A family SoC.
> +//
> +// Copyright 2018 NXP
> +
> +#include 
> +
> +/memreserve/ 0x8000 0x0001;
> +
> +/ {
> + compatible = "fsl,lx2160a";
> + interrupt-parent = <>;
> + #address-cells = <2>;
> + #size-cells = <2>;
> +
> + cpus {
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + // 8 clusters having 2 Cortex-A72 cores each
> + cpu@0 {
> + device_type = "cpu";
> + compatible = "arm,cortex-a72";
> + reg = <0x0>;
> + clocks = < 1 0>;
> + next-level-cache = <_l2>;

If you expect to get cache properties in sysfs entries, you need to populate
them here and for each L2 cache.
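A cache node carrying the properties that sysfs cacheinfo consumes would look something like the sketch below; the sizes and set counts are placeholders, not LX2160A's real cache geometry:

```dts
// Sketch only -- placeholder numbers, not LX2160A values.
cluster0_l2: l2-cache0 {
	compatible = "cache";
	cache-level = <2>;
	cache-unified;
	cache-size = <0x100000>;	/* 1 MiB, illustrative */
	cache-line-size = <64>;
	cache-sets = <1024>;
};
```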

[...]

> +
> + rstcr: syscon@1e6 {
> + compatible = "syscon";
> + reg = <0x0 0x1e6 0x0 0x4>;
> + };
> +
> + reboot {
> + compatible = "syscon-reboot";
> + regmap = <>;
> + offset = <0x0>;
> + mask = <0x2>;

Is this disabled in bootloader ? With PSCI, it's preferred to use
SYSTEM_RESET/OFF. EL3 f/w may need to do some housekeeping on poweroff.

> + };
> +
> + timer {
> + compatible = "arm,armv8-timer";
> + interrupts = <1 13 4>, // Physical Secure PPI, active-low

The comment says active low but the value 4 indicates it's HIGH from
"include/dt-bindings/interrupt-controller/irq.h"

> +  <1 14 4>, // Physical Non-Secure PPI, active-low
> +  <1 11 4>, // Virtual PPI, active-low
> +  <1 10 4>; // Hypervisor PPI, active-low
> + };
> +
> + pmu {
> + compatible = "arm,armv8-pmuv3";

More specific compatible preferably "arm,cortex-a72-pmu" ?

--
Regards,
Sudeep


Re: [RFC 00/15] PCI: turn some __weak functions into callbacks

2018-08-21 Thread Arnd Bergmann
On Tue, Aug 21, 2018 at 8:14 AM Christoph Hellwig  wrote:
>
> On Fri, Aug 17, 2018 at 12:26:30PM +0200, Arnd Bergmann wrote:
> > Hi Bjorn and others,
> >
> > Triggered by Christoph's patches, I had another go at converting
> > all of the remaining pci host bridge implementations to be based
> > on pci_alloc_host_bridge and a separate registration function.
>
> I really like the idea behind this series.
>
> > I'm adding a bit of duplication into the less maintained code
> > here, but it makes everything more consistent, and gives an
> > easy place to hook up callback functions etc.
>
> I wonder if there is a way to avoid some of that by adding a few
> more helpers, but even without the helpers that approach looks
> ok to me.

Ok, thanks for taking a first look.

One core part that gets duplicated a lot (also in existing drivers)
is the chunk that could be handled by this:

int pci_host_bridge_init(struct pci_host_bridge *bridge,
			 struct device *parent, int bus,
			 struct pci_ops *ops, void *sysdata,
			 struct list_head *resources)
{
	if (resources)
		list_splice_init(resources, &bridge->windows);
	bridge->dev.parent = parent;
	bridge->sysdata = sysdata;
	bridge->busnr = bus;
	bridge->ops = ops;

	return 0;
}

That would probably help, but we should think carefully about
the set of fields that we want pass here, specifically because the
idea of splitting the probing into two parts was to avoid having
to come up with a new interface every time that list changes
due to some rework.

For instance, the numa node is something that might get passed
here, and if we decide to split out the operations into a separate
pci_host_bridge_ops structure, the pointer to that would also
be something we'd want to pass this way.

> Do you have a git tree somewhere to play around with the changes?

I now uploaded it (with fixes incorporated) to

https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git
pci-probe-rework

   arnd


Re: [PATCH] powerpc/64s/hash: convert SLB miss handlers to C

2018-08-21 Thread Nicholas Piggin
On Tue, 21 Aug 2018 16:12:44 +1000
Benjamin Herrenschmidt  wrote:

> On Tue, 2018-08-21 at 15:13 +1000, Nicholas Piggin wrote:
> > This patch moves SLB miss handlers completely to C, using the standard
> > exception handler macros to set up the stack and branch to C.
> > 
> > This can be done because the segment containing the kernel stack is
> > always bolted, so accessing it with relocation on will not cause an
> > SLB exception.
> > 
> > Arbitrary kernel memory may not be accessed when handling kernel space
> > SLB misses, so care should be taken there. However user SLB misses can
> > access any kernel memory, which can be used to move some fields out of
> > the paca (in later patches).
> > 
> > User SLB misses could quite easily reconcile IRQs and set up a first
> > class kernel environment and exit via ret_from_except, however that
> > doesn't seem to be necessary at the moment, so we only do that if a
> > bad fault is encountered.
> > 
> > [ Credit to Aneesh for bug fixes, error checks, and improvements to bad
> >   address handling, etc ]
> > 
> > Signed-off-by: Nicholas Piggin 
> > 
> > Since RFC:
> > - Send patch 1 by itself to focus on the big change.
> > - Added MSR[RI] handling
> > - Fixed up a register loss bug exposed by irq tracing (Aneesh)
> > - Reject misses outside the defined kernel regions (Aneesh)
> > - Added several more sanity checks and error handling (Aneesh), we may
> >   look at consolidating these tests and tightening up the code but for
> >   a first pass we decided it's better to check carefully.
> > ---
> >  arch/powerpc/include/asm/asm-prototypes.h |   2 +
> >  arch/powerpc/kernel/exceptions-64s.S  | 202 +++--
> >  arch/powerpc/mm/Makefile  |   2 +-
> >  arch/powerpc/mm/slb.c | 257 +
> >  arch/powerpc/mm/slb_low.S | 335 --
> >  5 files changed, 185 insertions(+), 613 deletions(-)  
> ^^^^^^
> 
> Nice ! :-)

So I did some measurements with context switching (that takes quite a
few SLB faults), we lose about 3-5% performance on that benchmark.

Top SLB involved profile entries before:

   7.44%  [k] switch_slb
   3.43%  [k] slb_compare_rr_to_size
   1.64%  [k] slb_miss_common
   1.24%  [k] slb_miss_kernel_load_io
   0.58%  [k] exc_virt_0x4480_instruction_access_slb

After:

   7.15%  [k] switch_slb
   3.90%  [k] slb_insert_entry
   3.65%  [k] fast_exception_return
   1.00%  [k] slb_allocate_user
   0.59%  [k] exc_virt_0x4480_instruction_access_slb

With later patches we can reduce SLB misses to zero on this workload
(and generally an order of magnitude lower on small workloads). But
each miss will be more expensive and very large memory workloads are
going to have mandatory misses. Will be good to try verifying that
we can do smarter SLB allocation and reclaim to make up for that on
workloads like HANA. I think we probably could because round robin
isn't great.

Thanks,
Nick


Re: [RFC PATCH 1/5] powerpc/64s/hash: convert SLB miss handlers to C

2018-08-21 Thread Michael Ellerman
Nicholas Piggin  writes:

> This patch moves SLB miss handlers completely to C, using the standard
> exception handler macros to set up the stack and branch to C.
>
> This can be done because the segment containing the kernel stack is
> always bolted, so accessing it with relocation on will not cause an
> SLB exception.
>
> Arbitrary kernel memory may not be accessed when handling kernel space
> SLB misses, so care should be taken there.

We'll need to mark everything that's used in slb.c as notrace, otherwise

Probably we just need to mark the whole file as not traceable.

cheers


Re: Odd SIGSEGV issue introduced by commit 6b31d5955cb29 ("mm, oom: fix potential data corruption when oom_reaper races with writer")

2018-08-21 Thread Michael Ellerman
Christophe LEROY  writes:
...
>
> And I bisected its disappearance with commit 99cd1302327a2 ("powerpc: 
> Deliver SEGV signal on pkey violation")

Whoa that's weird.

> Looking at those two commits, especially the one which makes it 
> disappear, I'm quite sceptical. Any idea on what could be the cause and/or 
> how to investigate further ?

Are you sure it's not some corruption that just happens to be masked by
that commit? I can't see anything in that commit that could explain that
change in behaviour.

The only real change is if you're hitting DSISR_KEYFAULT isn't it?

What happens if you take 087003e9ef7c and apply the various hunks from
99cd1302327a2 gradually (or those that you can anyway)?

cheers


Re: [RFC 00/15] PCI: turn some __weak functions into callbacks

2018-08-21 Thread Christoph Hellwig
On Fri, Aug 17, 2018 at 12:26:30PM +0200, Arnd Bergmann wrote:
> Hi Bjorn and others,
> 
> Triggered by Christoph's patches, I had another go at converting
> all of the remaining pci host bridge implementations to be based
> on pci_alloc_host_bridge and a separate registration function.

I really like the idea behind this series.

> I'm adding a bit of duplication into the less maintained code
> here, but it makes everything more consistent, and gives an
> easy place to hook up callback functions etc.

I wonder if there is a way to avoid some of that by adding a few
more helpers, but even without the helpers that approach looks
ok to me.

Do you have a git tree somewhere to play around with the changes?


Re: [PATCH v2 1/4] powerpc/tm: Remove msr_tm_active()

2018-08-21 Thread Michael Ellerman
Breno Leitao  writes:
> On 08/16/2018 09:49 PM, Michael Ellerman wrote:
>> Michael Neuling  writes:
>> 
>>> On Mon, 2018-06-18 at 19:59 -0300, Breno Leitao wrote:
 Currently msr_tm_active() is a wrapper around MSR_TM_ACTIVE() if
 CONFIG_PPC_TRANSACTIONAL_MEM is set, or it is just a function that
 returns false if CONFIG_PPC_TRANSACTIONAL_MEM is not set.

 This function is not necessary, since MSR_TM_ACTIVE() just do the same,
 checking for the TS bits and does not require any TM facility.

 This patchset remove every instance of msr_tm_active() and replaced it
 by MSR_TM_ACTIVE().

 Signed-off-by: Breno Leitao 

>>>
>>> Patch looks good... one minor nit below...
>>>
  
 -  if (!msr_tm_active(regs->msr) &&
 -  !current->thread.load_fp && !loadvec(current->thread))
 +  if (!current->thread.load_fp && !loadvec(current->thread)) {
 +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 +  if (!MSR_TM_ACTIVE(regs->msr))
 +  return;
>>>
>>> Can you make a MSR_TM_ACTIVE() that returns false when
>>> !CONFIG_PPC_TRANSACTIONAL_MEM. Then you don't need this inline #ifdef.
>> 
>> Is that safe?
>> 
>> I see ~50 callers of MSR_TM_ACTIVE(), are they all inside #ifdef TM ?
>
> I checked all of them, and the only two that are not called inside a #ifdef
> are at kvm/book3s_hv_tm.c. They are:
>
>   kvm/book3s_hv_tm.c: if (!MSR_TM_ACTIVE(msr)) {
>   kvm/book3s_hv_tm.c: if (MSR_TM_ACTIVE(msr) || !(vcpu->arch.texasr & 
> TEXASR_FS)) {

That whole file is only built if TM=y:

kvm-hv-$(CONFIG_PPC_TRANSACTIONAL_MEM) += \
book3s_hv_tm.o

> All the others are being called inside the #ifdef
>
> Other than that, I do not see why it would be a problem in the way I
> implemented it, since it will return false for the two cases above, which
> seems correct. Take a look on how the definition became:
>
>   #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>   #define MSR_TM_ACTIVE(x) (((x) & MSR_TS_MASK) != 0) /* Transaction active? 
> */
>   #else
>   #define MSR_TM_ACTIVE(x) 0
>   #endif


Imagine we had some code somewhere that checked for TM being active in a
non-TM aware kernel, that would break with this change, because now the
MSR check does nothing when TM=n.

eg. we might check at boot time that we're not transactional, eg. in
case we came from a kdump kernel that was in a transaction.

So if all the call-sites are already inside an #ifdef I'd be inclined to
not add the #ifdef around the MSR_TM_ACTIVE macro.

That way the macro can always be used to check the MSR value, whether TM
is compiled in or out.

cheers


Re: [PATCH 1/2] powerpc/process: fix nested output in show_user_instructions()

2018-08-21 Thread Michael Ellerman
Christophe Leroy  writes:

> When two processes crash at the same time, we sometimes encounter
> nesting in the middle of a line:

I think "interleaved" is the right word, rather than "nesting".

They're actually (potentially) completely unrelated segfaults, that just
happen to occur at the same time.

And in fact any output that happens simultaneously will mess things up,
it doesn't have to be another segfault.

> [4.365317] init[1]: segfault (11) at 0 nip 0 lr 0 code 1
> [4.370452] init[1]: code:    
> [4.372042] init[74]: segfault (11) at 10a74 nip 1000c198 lr 100078c8 code 
> 1 in sh[1000+14000]
> [4.386829]    
> [4.391542] init[1]: code:      
>   
> [4.400863] init[74]: code: 90010024 bf61000c 91490a7c 3fa01002 3be0 
> 7d3e4b78 3bbd0c20 3b60
> [4.409867] init[74]: code: 3b9d0040 7c7fe02e 2f83 419e0028 <8923> 
> 2f89 41be001c 4b7f6e79
>
> This patch fixes it by preparing complete lines in a buffer and
> printing it at once.
>
> Fixes: 88b0fe1757359 ("powerpc: Add show_user_instructions()")
> Cc: Murilo Opsfelder Araujo 
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/kernel/process.c | 17 +
>  1 file changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 913c5725cdb2..c722ce4ca1c0 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1303,32 +1303,33 @@ void show_user_instructions(struct pt_regs *regs)
>  {
>   unsigned long pc;
>   int i;
> + char buf[96]; /* enough for 8 times 9 + 2 chars */
> + int l = 0;

I'm sure your math is right, but still an on-stack buffer with sprintf()
is a bit scary.

Can you try using seq_buf instead? It is safe against overflow.

eg, something like:

struct seq_buf s;
char buf[96];

seq_buf_init(&s, buf, sizeof(buf));
...
seq_buf_printf(&s, ...);

cheers


Re: [PATCH] powerpc/Makefiles: Fix clang/llvm build

2018-08-21 Thread Nicholas Piggin
On Tue, 21 Aug 2018 15:38:39 +1000
Anton Blanchard  wrote:

> Hi Michael,
> 
> > This breaks GCC 4.6.3 at least, which we still support:
> > 
> >   Assembler messages:
> >   Error: invalid switch -mpower8
> >   Error: unrecognized option -mpower8
> >   ../scripts/mod/empty.c:1:0: fatal error: error closing -: Broken
> > pipe  
> 
> Yuck. We have POWER8 instructions in our assembly code, and a toolchain
> that doesn't understand the -mpower8 flag, but has support for
> power8 instructions.
> 
> This is what I see on clang with -mpower7:
> 
> /tmp/sstep-2afa55.s:7584: Error: unrecognized opcode: `lbarx'
> /tmp/sstep-2afa55.s:7626: Error: unrecognized opcode: `stbcx.'
> /tmp/sstep-2afa55.s:8077: Error: unrecognized opcode: `lharx'
> /tmp/sstep-2afa55.s:8140: Error: unrecognized opcode: `stbcx.'
> 
> Nick: any suggestions?

 cpu-as-$(CONFIG_PPC_BOOK3S_64) += $(call as-option,-Wa$(comma)-mpower8,-Wa$(comma)-mpower4)

?
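The `$(call as-option,...)` idiom probes whether the assembler accepts the first flag and falls back to the second if not. The equivalent shell logic looks roughly like this; it mirrors the idiom but is not kbuild's exact implementation:

```shell
#!/bin/sh
# Sketch of kbuild's as-option helper: assemble an empty input with the
# candidate flag; print the flag on success, the fallback on failure.
as_option() {
	if echo "" | ${CC:-cc} -x assembler -c -o /dev/null "$1" - 2>/dev/null; then
		echo "$1"
	else
		echo "$2"
	fi
}

# On a toolchain whose assembler rejects -Wa,-mpower8 this prints the fallback:
as_option -Wa,-mpower8 -Wa,-mpower4
```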

Thanks,
Nick


Re: [PATCH] powerpc/64s/hash: convert SLB miss handlers to C

2018-08-21 Thread Benjamin Herrenschmidt
On Tue, 2018-08-21 at 15:13 +1000, Nicholas Piggin wrote:
> This patch moves SLB miss handlers completely to C, using the standard
> exception handler macros to set up the stack and branch to C.
> 
> This can be done because the segment containing the kernel stack is
> always bolted, so accessing it with relocation on will not cause an
> SLB exception.
> 
> Arbitrary kernel memory may not be accessed when handling kernel space
> SLB misses, so care should be taken there. However user SLB misses can
> access any kernel memory, which can be used to move some fields out of
> the paca (in later patches).
> 
> User SLB misses could quite easily reconcile IRQs and set up a first
> class kernel environment and exit via ret_from_except, however that
> doesn't seem to be necessary at the moment, so we only do that if a
> bad fault is encountered.
> 
> [ Credit to Aneesh for bug fixes, error checks, and improvements to bad
>   address handling, etc ]
> 
> Signed-off-by: Nicholas Piggin 
> 
> Since RFC:
> - Send patch 1 by itself to focus on the big change.
> - Added MSR[RI] handling
> - Fixed up a register loss bug exposed by irq tracing (Aneesh)
> - Reject misses outside the defined kernel regions (Aneesh)
> - Added several more sanity checks and error handling (Aneesh), we may
>   look at consolidating these tests and tightening up the code but for
>   a first pass we decided it's better to check carefully.
> ---
>  arch/powerpc/include/asm/asm-prototypes.h |   2 +
>  arch/powerpc/kernel/exceptions-64s.S  | 202 +++--
>  arch/powerpc/mm/Makefile  |   2 +-
>  arch/powerpc/mm/slb.c | 257 +
>  arch/powerpc/mm/slb_low.S | 335 --
>  5 files changed, 185 insertions(+), 613 deletions(-)
^^^^^^

Nice ! :-)

Cheers,
Ben.