[PATCH] powerpc/configs: Disable latencytop
latencytop adds almost 4kB to each and every task struct and as such it doesn't deserve to be in our defconfigs. Signed-off-by: Anton Blanchard --- arch/powerpc/configs/g5_defconfig | 1 - arch/powerpc/configs/gamecube_defconfig | 1 - arch/powerpc/configs/maple_defconfig| 1 - arch/powerpc/configs/pmac32_defconfig | 1 - arch/powerpc/configs/powernv_defconfig | 1 - arch/powerpc/configs/ppc64_defconfig| 1 - arch/powerpc/configs/ppc64e_defconfig | 1 - arch/powerpc/configs/ppc6xx_defconfig | 1 - arch/powerpc/configs/pseries_defconfig | 1 - arch/powerpc/configs/wii_defconfig | 1 - 10 files changed, 10 deletions(-) diff --git a/arch/powerpc/configs/g5_defconfig b/arch/powerpc/configs/g5_defconfig index ceb3c770786f..8e9389d6c8ef 100644 --- a/arch/powerpc/configs/g5_defconfig +++ b/arch/powerpc/configs/g5_defconfig @@ -244,7 +244,6 @@ CONFIG_CRC_T10DIF=y CONFIG_MAGIC_SYSRQ=y CONFIG_DEBUG_KERNEL=y CONFIG_DEBUG_MUTEXES=y -CONFIG_LATENCYTOP=y CONFIG_BOOTX_TEXT=y CONFIG_CRYPTO_TEST=m CONFIG_CRYPTO_PCBC=m diff --git a/arch/powerpc/configs/gamecube_defconfig b/arch/powerpc/configs/gamecube_defconfig index 805b0f87653c..bfffdc4f1b73 100644 --- a/arch/powerpc/configs/gamecube_defconfig +++ b/arch/powerpc/configs/gamecube_defconfig @@ -91,7 +91,6 @@ CONFIG_CRC_CCITT=y CONFIG_PRINTK_TIME=y CONFIG_DEBUG_SPINLOCK=y CONFIG_DEBUG_MUTEXES=y -CONFIG_LATENCYTOP=y CONFIG_SCHED_TRACER=y CONFIG_DMA_API_DEBUG=y CONFIG_PPC_EARLY_DEBUG=y diff --git a/arch/powerpc/configs/maple_defconfig b/arch/powerpc/configs/maple_defconfig index c5f2005005d3..1c436fafb397 100644 --- a/arch/powerpc/configs/maple_defconfig +++ b/arch/powerpc/configs/maple_defconfig @@ -104,7 +104,6 @@ CONFIG_MAGIC_SYSRQ=y CONFIG_DEBUG_KERNEL=y CONFIG_DEBUG_STACK_USAGE=y CONFIG_DEBUG_STACKOVERFLOW=y -CONFIG_LATENCYTOP=y CONFIG_XMON=y CONFIG_XMON_DEFAULT=y CONFIG_BOOTX_TEXT=y diff --git a/arch/powerpc/configs/pmac32_defconfig b/arch/powerpc/configs/pmac32_defconfig index 50b610b48914..8d632ceaea48 100644 --- a/arch/powerpc/configs/pmac32_defconfig +++ b/arch/powerpc/configs/pmac32_defconfig @@ -293,7 +293,6 @@ CONFIG_CRC_T10DIF=y CONFIG_MAGIC_SYSRQ=y CONFIG_DEBUG_KERNEL=y CONFIG_DETECT_HUNG_TASK=y -CONFIG_LATENCYTOP=y CONFIG_XMON=y CONFIG_XMON_DEFAULT=y CONFIG_BOOTX_TEXT=y diff --git a/arch/powerpc/configs/powernv_defconfig b/arch/powerpc/configs/powernv_defconfig index ef2ef98d3f28..1cf8ce18b4ca 100644 --- a/arch/powerpc/configs/powernv_defconfig +++ b/arch/powerpc/configs/powernv_defconfig @@ -317,7 +317,6 @@ CONFIG_DEBUG_STACK_USAGE=y CONFIG_DEBUG_STACKOVERFLOW=y CONFIG_SOFTLOCKUP_DETECTOR=y CONFIG_HARDLOCKUP_DETECTOR=y -CONFIG_LATENCYTOP=y CONFIG_FUNCTION_TRACER=y CONFIG_SCHED_TRACER=y CONFIG_FTRACE_SYSCALLS=y diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig index 91fdb619b484..96d695ffe074 100644 --- a/arch/powerpc/configs/ppc64_defconfig +++ b/arch/powerpc/configs/ppc64_defconfig @@ -367,7 +367,6 @@ CONFIG_DEBUG_STACKOVERFLOW=y CONFIG_SOFTLOCKUP_DETECTOR=y CONFIG_HARDLOCKUP_DETECTOR=y CONFIG_DEBUG_MUTEXES=y -CONFIG_LATENCYTOP=y CONFIG_FUNCTION_TRACER=y CONFIG_SCHED_TRACER=y CONFIG_BLK_DEV_IO_TRACE=y diff --git a/arch/powerpc/configs/ppc64e_defconfig b/arch/powerpc/configs/ppc64e_defconfig index 41d85cb3c9a2..7e52d4658867 100644 --- a/arch/powerpc/configs/ppc64e_defconfig +++ b/arch/powerpc/configs/ppc64e_defconfig @@ -223,7 +223,6 @@ CONFIG_DEBUG_STACK_USAGE=y CONFIG_DEBUG_STACKOVERFLOW=y CONFIG_DETECT_HUNG_TASK=y CONFIG_DEBUG_MUTEXES=y -CONFIG_LATENCYTOP=y CONFIG_IRQSOFF_TRACER=y CONFIG_SCHED_TRACER=y CONFIG_BLK_DEV_IO_TRACE=y diff --git a/arch/powerpc/configs/ppc6xx_defconfig b/arch/powerpc/configs/ppc6xx_defconfig index 7c6baf6df139..7bcdb4d1411c 100644 --- a/arch/powerpc/configs/ppc6xx_defconfig +++ b/arch/powerpc/configs/ppc6xx_defconfig @@ -1148,7 +1148,6 @@ CONFIG_FAIL_MAKE_REQUEST=y CONFIG_FAIL_IO_TIMEOUT=y CONFIG_FAULT_INJECTION_DEBUG_FS=y CONFIG_FAULT_INJECTION_STACKTRACE_FILTER=y -CONFIG_LATENCYTOP=y CONFIG_SCHED_TRACER=y CONFIG_STACK_TRACER=y CONFIG_BLK_DEV_IO_TRACE=y diff --git a/arch/powerpc/configs/pseries_defconfig b/arch/powerpc/configs/pseries_defconfig index 62e12f61a3b2..c8f5f281e367 100644 --- a/arch/powerpc/configs/pseries_defconfig +++ b/arch/powerpc/configs/pseries_defconfig @@ -290,7 +290,6 @@ CONFIG_DEBUG_STACK_USAGE=y CONFIG_DEBUG_STACKOVERFLOW=y CONFIG_SOFTLOCKUP_DETECTOR=y CONFIG_HARDLOCKUP_DETECTOR=y -CONFIG_LATENCYTOP=y CONFIG_FUNCTION_TRACER=y CONFIG_SCHED_TRACER=y CONFIG_BLK_DEV_IO_TRACE=y diff --git a/arch/powerpc/configs/wii_defconfig b/arch/powerpc/configs/wii_defconfig index f5c366b02828..d60c04a4708a 100644 --- a/arch/powerpc/configs/wii_defconfig +++ b/arch/powerpc/configs/wii_defconfig @@ -123,7 +123,6 @@ CONFIG_PRINTK_TIME=y CONFIG_MAGIC_SYSRQ=y CONFIG_DEBUG_SPINLOCK=y CONFIG_DEBUG_MUTEXES=y -CONFIG_LATENCYTOP=y CONFIG_SCHED_TRACER=y CONFIG_BLK_DEV_IO_TRACE=y
Re: [PATCH kernel] prom_init: Fetch flatten device tree from the system firmware
On 04/06/2019 09:42, Segher Boessenkool wrote: > Hi! > > On Mon, Jun 03, 2019 at 12:56:26PM +1000, Alexey Kardashevskiy wrote: >> On 03/06/2019 09:23, Segher Boessenkool wrote: So I go for the simple one and agree with Alexey's idea. >>> >>> When dealing with a whole device tree you have to know about the various >>> dynamically generated nodes and props, and handle each appropriately. >> >> The code I am changing fetches the device tree and build an fdt. What is >> that special knowledge in this context you are talking about? > > Things like /options are dynamically generated. Generated by the guest kernel, after walking the tree and before making the fdt blob? I cannot see it, what do I miss? Otherwise it is all in slof and the same result is fetched one way or another. > I thought you would just be able to reuse some FDT parsing code to > implement my suggested new interface. Maybe that was too optimistic. -- Alexey
[PATCH v2] perf ioctl: Add check for the sample_period value
perf_event_open() limits the sample_period to 63 bits. See commit 0819b2e30ccb ("perf: Limit perf_event_attr::sample_period to 63 bits"). Make ioctl() consistent with it. Also on powerpc, negative sample_period could cause a recursive PMIs leading to a hang (reported when running perf-fuzzer). Signed-off-by: Ravi Bangoria --- kernel/events/core.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/kernel/events/core.c b/kernel/events/core.c index abbd4b3b96c2..e44c90378940 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -5005,6 +5005,9 @@ static int perf_event_period(struct perf_event *event, u64 __user *arg) if (perf_event_check_period(event, value)) return -EINVAL; + if (!event->attr.freq && (value & (1ULL << 63))) + return -EINVAL; + event_function_call(event, __perf_event_period, ); return 0; -- 2.20.1
Re: [RFC PATCH v2] powerpc/xmon: restrict when kernel is locked down
On 4/6/19 1:05 pm, Christopher M Riedl wrote:>>> + if (!xmon_is_ro) { + xmon_is_ro = kernel_is_locked_down("Using xmon write-access", + LOCKDOWN_INTEGRITY); + if (xmon_is_ro) { + printf("xmon: Read-only due to kernel lockdown\n"); + clear_all_bpt(); Remind me again why we need to clear breakpoints in integrity mode? Andrew I interpreted "integrity" mode as meaning that any changes made by xmon should be reversed. This also covers the case when a user creates some breakpoint(s) in xmon, exits xmon, and then elevates the lockdown state. Upon hitting the first breakpoint and (re-)entering xmon, xmon will clear all breakpoints. Xmon can only take action in response to dynamic lockdown level changes when xmon is invoked in some manner - if there is a better way I am all ears :) Integrity mode merely means we are aiming to prevent modifications to kernel memory. IMHO leaving existing breakpoints in place is fine as long as when we hit the breakpoint xmon is in read-only mode. (dja/mpe might have opinions on this) -- Andrew Donnellan OzLabs, ADL Canberra a...@linux.ibm.com IBM Australia Limited
crash after NX error
On my two socket POWER9 system (powernv) with 842 zwap set up, I recently got a crash with the Ubuntu kernel (I haven't tried with upstream, and this is the first time the system has died like this, so I'm not sure how repeatable it is). [2.891463] zswap: loaded using pool 842-nx/zbud ... [15626.124646] nx_compress_powernv: ERROR: CSB still not valid after 500 us, giving up : 00 00 00 00 [16868.932913] Unable to handle kernel paging request for data at address 0x6655f67da816cdb8 [16868.933726] Faulting instruction address: 0xc0391600 cpu 0x68: Vector: 380 (Data Access Out of Range) at [c01c9d98b9a0] pc: c0391600: kmem_cache_alloc+0x2e0/0x340 lr: c03915ec: kmem_cache_alloc+0x2cc/0x340 sp: c01c9d98bc20 msr: 9280b033 dar: 6655f67da816cdb8 current = 0xc01ad43cb400 paca= 0xcfac7800 softe: 0irq_happened: 0x01 pid = 8319, comm = make Linux version 4.15.0-50-generic (buildd@bos02-ppc64el-006) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #54-Ubuntu SMP Mon May 6 18:55:18 UTC 2019 (Ubuntu 4.15.0-50.54-generic 4.15.18) 68:mon> t [c01c9d98bc20] c03914d4 kmem_cache_alloc+0x1b4/0x340 (unreliable) [c01c9d98bc80] c03b1e14 __khugepaged_enter+0x54/0x220 [c01c9d98bcc0] c010f0ec copy_process.isra.5.part.6+0xebc/0x1a10 [c01c9d98bda0] c010fe4c _do_fork+0xec/0x510 [c01c9d98be30] c000b584 ppc_clone+0x8/0xc --- Exception: c00 (System Call) at 7afe9daf87f4 SP (7fffca606880) is in userspace So, it looks like there could be a problem in the error path, plausibly fixed by this patch: commit 656ecc16e8fc2ab44b3d70e3fcc197a7020d0ca5 Author: Haren Myneni Date: Wed Jun 13 00:32:40 2018 -0700 crypto/nx: Initialize 842 high and normal RxFIFO control registers NX increments readOffset by FIFO size in receive FIFO control register when CRB is read. But the index in RxFIFO has to match with the corresponding entry in FIFO maintained by VAS in kernel. Otherwise NX may be processing incorrect CRBs and can cause CRB timeout. VAS FIFO offset is 0 when the receive window is opened during initialization. When the module is reloaded or in kexec boot, readOffset in FIFO control register may not match with VAS entry. This patch adds nx_coproc_init OPAL call to reset readOffset and queued entries in FIFO control register for both high and normal FIFOs. Signed-off-by: Haren Myneni [mpe: Fixup uninitialized variable warning] Signed-off-by: Michael Ellerman $ git describe --contains 656ecc16e8fc2ab44b3d70e3fcc197a7020d0ca5 v4.19-rc1~24^2~50 Which was never backported to any stable release, so probably needs to be for v4.14 through v4.18. Notably, Ubuntu is on v4.15 and it doesn't seem to have picked up the patch. I'm opening an Ubuntu bug for this. Haren, is this something you can drive through the stable process (assuming my above crash looks like this failure)? -- Stewart Smith OPAL Architect, IBM.
[PATCH v5 1/2] powerpc: silence a -Wcast-function-type warning in dawr_write_file_bool
From: Mathieu Malaterre In commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") the following piece of code was added: smp_call_function((smp_call_func_t)set_dawr, _brk, 0); Since GCC 8 this triggers the following warning about incompatible function types: arch/powerpc/kernel/hw_breakpoint.c:408:21: error: cast between incompatible function types from 'int (*)(struct arch_hw_breakpoint *)' to 'void (*)(void *)' [-Werror=cast-function-type] Since the warning is there for a reason, and should not be hidden behind a cast, provide an intermediate callback function to avoid the warning. Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") Suggested-by: Christoph Hellwig Signed-off-by: Mathieu Malaterre Signed-off-by: Michael Neuling --- arch/powerpc/kernel/hw_breakpoint.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index da307dd93e..ca3a2358b7 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -384,6 +384,11 @@ void hw_breakpoint_pmu_read(struct perf_event *bp) bool dawr_force_enable; EXPORT_SYMBOL_GPL(dawr_force_enable); +static void set_dawr_cb(void *info) +{ + set_dawr(info); +} + static ssize_t dawr_write_file_bool(struct file *file, const char __user *user_buf, size_t count, loff_t *ppos) @@ -403,7 +408,7 @@ static ssize_t dawr_write_file_bool(struct file *file, /* If we are clearing, make sure all CPUs have the DAWR cleared */ if (!dawr_force_enable) - smp_call_function((smp_call_func_t)set_dawr, _brk, 0); + smp_call_function(set_dawr_cb, _brk, 0); return rc; } -- 2.21.0
[PATCH v5 2/2] powerpc: Fix compile issue with force DAWR
If you compile with KVM but without CONFIG_HAVE_HW_BREAKPOINT you fail at linking with: arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x708): undefined reference to `dawr_force_enable' This was caused by commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option"). This moves a bunch of code around to fix this. It moves a lot of the DAWR code in a new file and creates a new CONFIG_PPC_DAWR to enable compiling it. Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") Signed-off-by: Michael Neuling -- v5: - Changes based on comments by hch - Change // to /* comments - Change return code from -1 to -ENODEV - Remove unneeded default n in new Kconfig option - Remove setting to default value - Remove unnecessary braces v4: - Fix merge conflict with patch from Mathieu Malaterre: powerpc: silence a -Wcast-function-type warning in dawr_write_file_bool - Fixed checkpatch issues noticed by Christophe Leroy. v3: Fixes based on Christophe Leroy's comments: - Fix Kconfig options to better reflect reality - Reorder alphabetically - Inline vs #define - Fixed default return for dawr_enabled() when CONFIG_PPC_DAWR=N V2: Fixes based on Christophe Leroy's comments: - Fix commit message formatting - Move more DAWR code into dawr.c --- arch/powerpc/Kconfig | 4 + arch/powerpc/include/asm/hw_breakpoint.h | 21 +++-- arch/powerpc/kernel/Makefile | 1 + arch/powerpc/kernel/dawr.c | 98 arch/powerpc/kernel/hw_breakpoint.c | 61 --- arch/powerpc/kernel/process.c| 28 --- arch/powerpc/kvm/Kconfig | 1 + 7 files changed, 118 insertions(+), 96 deletions(-) create mode 100644 arch/powerpc/kernel/dawr.c diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 8c1c636308..9d61b36df4 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -234,6 +234,7 @@ config PPC select OLD_SIGSUSPEND select PCI_DOMAINS if PCI select PCI_SYSCALL if PCI + select PPC_DAWR if PPC64 select RTC_LIB select SPARSE_IRQ select SYSCTL_EXCEPTION_TRACE @@ -370,6 +371,9 @@ config PPC_ADV_DEBUG_DAC_RANGE depends on PPC_ADV_DEBUG_REGS && 44x default y +config PPC_DAWR + bool + config ZONE_DMA bool default y if PPC_BOOK3E_64 diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index 0fe8c1e46b..41abdae6d0 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -90,18 +90,25 @@ static inline void hw_breakpoint_disable(void) extern void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs); int hw_breakpoint_handler(struct die_args *args); -extern int set_dawr(struct arch_hw_breakpoint *brk); +#else /* CONFIG_HAVE_HW_BREAKPOINT */ +static inline void hw_breakpoint_disable(void) { } +static inline void thread_change_pc(struct task_struct *tsk, + struct pt_regs *regs) { } + +#endif /* CONFIG_HAVE_HW_BREAKPOINT */ + + +#ifdef CONFIG_PPC_DAWR extern bool dawr_force_enable; static inline bool dawr_enabled(void) { return dawr_force_enable; } - -#else /* CONFIG_HAVE_HW_BREAKPOINT */ -static inline void hw_breakpoint_disable(void) { } -static inline void thread_change_pc(struct task_struct *tsk, - struct pt_regs *regs) { } +int set_dawr(struct arch_hw_breakpoint *brk); +#else static inline bool dawr_enabled(void) { return false; } -#endif /* CONFIG_HAVE_HW_BREAKPOINT */ +static inline int set_dawr(struct arch_hw_breakpoint *brk) { return -1; } +#endif + #endif /* __KERNEL__ */ #endif /* _PPC_BOOK3S_64_HW_BREAKPOINT_H */ diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 0ea6c4aa3a..56dfa7a2a6 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -56,6 +56,7 @@ obj-$(CONFIG_PPC64) += setup_64.o sys_ppc32.o \ obj-$(CONFIG_VDSO32) += vdso32/ obj-$(CONFIG_PPC_WATCHDOG) += watchdog.o obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o +obj-$(CONFIG_PPC_DAWR) += dawr.o obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_ppc970.o cpu_setup_pa6t.o obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_power.o obj-$(CONFIG_PPC_BOOK3S_64)+= mce.o mce_power.o diff --git a/arch/powerpc/kernel/dawr.c b/arch/powerpc/kernel/dawr.c new file mode 100644 index 00..ae3bd6abac --- /dev/null +++ b/arch/powerpc/kernel/dawr.c @@ -0,0 +1,98 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * DAWR infrastructure + * + * Copyright 2019, Michael Neuling, IBM Corporation. + */ + +#include +#include +#include +#include +#include +#include +#include + +bool dawr_force_enable; +EXPORT_SYMBOL_GPL(dawr_force_enable); + +int set_dawr(struct
Re: [RFC PATCH v2] powerpc/xmon: restrict when kernel is locked down
> On June 3, 2019 at 1:36 AM Andrew Donnellan wrote: > > > On 24/5/19 10:38 pm, Christopher M. Riedl wrote: > > Xmon should be either fully or partially disabled depending on the > > kernel lockdown state. > > > > Put xmon into read-only mode for lockdown=integrity and completely > > disable xmon when lockdown=confidentiality. Xmon checks the lockdown > > state and takes appropriate action: > > > > (1) during xmon_setup to prevent early xmon'ing > > > > (2) when triggered via sysrq > > > > (3) when toggled via debugfs > > > > (4) when triggered via a previously enabled breakpoint > > > > The following lockdown state transitions are handled: > > > > (1) lockdown=none -> lockdown=integrity > > clear all breakpoints, set xmon read-only mode > > > > (2) lockdown=none -> lockdown=confidentiality > > clear all breakpoints, prevent re-entry into xmon > > > > (3) lockdown=integrity -> lockdown=confidentiality > > prevent re-entry into xmon > > > > Suggested-by: Andrew Donnellan > > Signed-off-by: Christopher M. Riedl > > --- > > > > Applies on top of this series: > > https://patchwork.kernel.org/cover/10884631/ > > > > I've done some limited testing of the scenarios mentioned in the commit > > message on a single CPU QEMU config. > > > > v1->v2: > > Fix subject line > > Submit to linuxppc-dev and kernel-hardening > > > > arch/powerpc/xmon/xmon.c | 56 +++- > > 1 file changed, 55 insertions(+), 1 deletion(-) > > > > diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c > > index 3e7be19aa208..8c4a5a0c28f0 100644 > > --- a/arch/powerpc/xmon/xmon.c > > +++ b/arch/powerpc/xmon/xmon.c > > @@ -191,6 +191,9 @@ static void dump_tlb_44x(void); > > static void dump_tlb_book3e(void); > > #endif > > > > +static void clear_all_bpt(void); > > +static void xmon_init(int); > > + > > #ifdef CONFIG_PPC64 > > #define REG "%.16lx" > > #else > > @@ -291,6 +294,39 @@ Commands:\n\ > > zh halt\n" > > ; > > > > +#ifdef CONFIG_LOCK_DOWN_KERNEL > > +static bool xmon_check_lockdown(void) > > +{ > > + static bool lockdown = false; > > + > > + if (!lockdown) { > > + lockdown = kernel_is_locked_down("Using xmon", > > +LOCKDOWN_CONFIDENTIALITY); > > + if (lockdown) { > > + printf("xmon: Disabled by strict kernel lockdown\n"); > > + xmon_on = 0; > > + xmon_init(0); > > + } > > + } > > + > > + if (!xmon_is_ro) { > > + xmon_is_ro = kernel_is_locked_down("Using xmon write-access", > > + LOCKDOWN_INTEGRITY); > > + if (xmon_is_ro) { > > + printf("xmon: Read-only due to kernel lockdown\n"); > > + clear_all_bpt(); > > Remind me again why we need to clear breakpoints in integrity mode? > > > Andrew > I interpreted "integrity" mode as meaning that any changes made by xmon should be reversed. This also covers the case when a user creates some breakpoint(s) in xmon, exits xmon, and then elevates the lockdown state. Upon hitting the first breakpoint and (re-)entering xmon, xmon will clear all breakpoints. Xmon can only take action in response to dynamic lockdown level changes when xmon is invoked in some manner - if there is a better way I am all ears :) > > > + } > > + } > > + > > + return lockdown; > > +} > > +#else > > +inline static bool xmon_check_lockdown(void) > > +{ > > + return false; > > +} > > +#endif /* CONFIG_LOCK_DOWN_KERNEL */ > > + > > static struct pt_regs *xmon_regs; > > > > static inline void sync(void) > > @@ -708,6 +744,9 @@ static int xmon_bpt(struct pt_regs *regs) > > struct bpt *bp; > > unsigned long offset; > > > > + if (xmon_check_lockdown()) > > + return 0; > > + > > if ((regs->msr & (MSR_IR|MSR_PR|MSR_64BIT)) != (MSR_IR|MSR_64BIT)) > > return 0; > > > > @@ -739,6 +778,9 @@ static int xmon_sstep(struct pt_regs *regs) > > > > static int xmon_break_match(struct pt_regs *regs) > > { > > + if (xmon_check_lockdown()) > > + return 0; > > + > > if ((regs->msr & (MSR_IR|MSR_PR|MSR_64BIT)) != (MSR_IR|MSR_64BIT)) > > return 0; > > if (dabr.enabled == 0) > > @@ -749,6 +791,9 @@ static int xmon_break_match(struct pt_regs *regs) > > > > static int xmon_iabr_match(struct pt_regs *regs) > > { > > + if (xmon_check_lockdown()) > > + return 0; > > + > > if ((regs->msr & (MSR_IR|MSR_PR|MSR_64BIT)) != (MSR_IR|MSR_64BIT)) > > return 0; > > if (iabr == NULL) > > @@ -3742,6 +3787,9 @@ static void xmon_init(int enable) > > #ifdef CONFIG_MAGIC_SYSRQ > > static void sysrq_handle_xmon(int key) > > { > > + if (xmon_check_lockdown()) > > + return; > > + > > /* ensure xmon is enabled */ > >
Re: [PATCH v4 2/2] powerpc: Fix compile issue with force DAWR
I agree with all the below and will address in v5. Mikey On Tue, 2019-05-28 at 23:28 -0700, Christoph Hellwig wrote: > > +config PPC_DAWR > > + bool > > + default n > > "default n" is the default default. No need to write this line. > > > +++ b/arch/powerpc/kernel/dawr.c > > @@ -0,0 +1,100 @@ > > +// SPDX-License-Identifier: GPL-2.0+ > > +// > > +// DAWR infrastructure > > +// > > +// Copyright 2019, Michael Neuling, IBM Corporation. > > Normal top of file header should be /* */, //-style comments are only > for the actual SPDX heder line. > > > + /* Send error to user if they hypervisor won't allow us to write DAWR */ > > + if ((!dawr_force_enable) && > > + (firmware_has_feature(FW_FEATURE_LPAR)) && > > + (set_dawr(_brk) != H_SUCCESS)) > > None of the three inner brace sets here are required, and the code > becomes much easier to read without them. > > > + return -1; > > What about returning a proper error code? > > > +static int __init dawr_force_setup(void) > > +{ > > + dawr_force_enable = false; > > This variable already is initialized to alse by default, so this line > is not required. > > > + if (PVR_VER(mfspr(SPRN_PVR)) == PVR_POWER9) { > > + /* Turn DAWR off by default, but allow admin to turn it on */ > > + dawr_force_enable = false; > > .. and neither is this one. >
Re: [PATCH kernel] prom_init: Fetch flatten device tree from the system firmware
On Mon, Jun 03, 2019 at 06:49:32PM -0500, Segher Boessenkool wrote: > On Tue, Jun 04, 2019 at 07:18:42AM +1000, Benjamin Herrenschmidt wrote: > > Yes we could make property fetching faster but mostly by creating a new > > bulk interface which is quite a bit of work, a new API, and will in > > practice not be used for anything other than creating the FDT. In that > > case, nothing will beat in performance having OF create the FDT itself > > based on its current tree. > > And that will change the whole boot protocol, the interaction between OF > and its client, which is a much bigger change, not conceptually trivial > at all. Copying all properties at once is, which is why I suggested it. Much as I wasn't terribly convinced by the original idea, I agree with Ben and Alexey here. Your approach has slightly more complexity for slightly less benefit. If it's worth doing yours it's better to do theirs. > It would take away the opposition to your patch. Only from you, AFAICT... -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature
Re: [PATCH v2 1/5] powerpc/powernv: Move SCOM access code into powernv platform
On 9/5/19 3:11 pm, Andrew Donnellan wrote: The powernv platform is the only one that directly accesses SCOMs. Move the support code to platforms/powernv, and get rid of the PPC_SCOM Kconfig option, as SCOM support is always selected when compiling for powernv. This also means that the Kconfig item for CONFIG_SCOM_DEBUGFS will actually show up in menuconfig, as previously it was the only labelled option in sysdev/Kconfig and wasn't actually in a menu. As I've just realised, this isn't actually correct - the option does indeed show up... in the root menu, where I've just been trained to ignore it, and where you won't get a menu location if you try to search for it using / in menuconfig. I think moving it to the platform menu is obviously a better location. mpe would you be able to fix up the commit message in merge? Andrew Signed-off-by: Andrew Donnellan --- v1->v2: - move scom.h as well (mpe) - add all the other patches in this series --- arch/powerpc/platforms/powernv/Kconfig | 5 - arch/powerpc/platforms/powernv/Makefile | 2 +- arch/powerpc/platforms/powernv/opal-xscom.c | 3 ++- arch/powerpc/{sysdev => platforms/powernv}/scom.c | 3 ++- .../{include/asm => platforms/powernv}/scom.h | 13 +++-- arch/powerpc/sysdev/Kconfig | 7 --- arch/powerpc/sysdev/Makefile| 2 -- 7 files changed, 12 insertions(+), 23 deletions(-) rename arch/powerpc/{sysdev => platforms/powernv}/scom.c (99%) rename arch/powerpc/{include/asm => platforms/powernv}/scom.h (95%) diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig index 850eee860cf2..938803eab0ad 100644 --- a/arch/powerpc/platforms/powernv/Kconfig +++ b/arch/powerpc/platforms/powernv/Kconfig @@ -12,7 +12,6 @@ config PPC_POWERNV select EPAPR_BOOT select PPC_INDIRECT_PIO select PPC_UDBG_16550 - select PPC_SCOM select ARCH_RANDOM select CPU_FREQ select PPC_DOORBELL @@ -47,3 +46,7 @@ config PPC_VAS VAS adapters are found in POWER9 based systems. If unsure, say N. + +config SCOM_DEBUGFS + bool "Expose SCOM controllers via debugfs" + depends on DEBUG_FS diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile index da2e99efbd04..4b1644150135 100644 --- a/arch/powerpc/platforms/powernv/Makefile +++ b/arch/powerpc/platforms/powernv/Makefile @@ -4,12 +4,12 @@ obj-y += idle.o opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o obj-y += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o obj-y += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o obj-y += opal-kmsg.o opal-powercap.o opal-psr.o opal-sensor-groups.o +obj-y += opal-xscom.o scom.o obj-$(CONFIG_SMP) += smp.o subcore.o subcore-asm.o obj-$(CONFIG_PCI) += pci.o pci-ioda.o npu-dma.o pci-ioda-tce.o obj-$(CONFIG_CXL_BASE)+= pci-cxl.o obj-$(CONFIG_EEH) += eeh-powernv.o -obj-$(CONFIG_PPC_SCOM) += opal-xscom.o obj-$(CONFIG_MEMORY_FAILURE) += opal-memory-errors.o obj-$(CONFIG_OPAL_PRD)+= opal-prd.o obj-$(CONFIG_PERF_EVENTS) += opal-imc.o diff --git a/arch/powerpc/platforms/powernv/opal-xscom.c b/arch/powerpc/platforms/powernv/opal-xscom.c index 22d5e1110dbb..66337d92cb63 100644 --- a/arch/powerpc/platforms/powernv/opal-xscom.c +++ b/arch/powerpc/platforms/powernv/opal-xscom.c @@ -18,7 +18,8 @@ #include #include #include -#include + +#include "scom.h" /* * We could probably fit that inside the scom_map_t diff --git a/arch/powerpc/sysdev/scom.c b/arch/powerpc/platforms/powernv/scom.c similarity index 99% rename from arch/powerpc/sysdev/scom.c rename to arch/powerpc/platforms/powernv/scom.c index a707b24a7ddb..50c019d2ef45 100644 --- a/arch/powerpc/sysdev/scom.c +++ b/arch/powerpc/platforms/powernv/scom.c @@ -23,9 +23,10 @@ #include #include #include -#include #include +#include "scom.h" + const struct scom_controller *scom_controller; EXPORT_SYMBOL_GPL(scom_controller); diff --git a/arch/powerpc/include/asm/scom.h b/arch/powerpc/platforms/powernv/scom.h similarity index 95% rename from arch/powerpc/include/asm/scom.h rename to arch/powerpc/platforms/powernv/scom.h index f5cde45b1161..b14fe0edf95b 100644 --- a/arch/powerpc/include/asm/scom.h +++ b/arch/powerpc/platforms/powernv/scom.h @@ -18,12 +18,8 @@ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ -#ifndef _ASM_POWERPC_SCOM_H -#define _ASM_POWERPC_SCOM_H - -#ifdef __KERNEL__ -#ifndef __ASSEMBLY__ -#ifdef CONFIG_PPC_SCOM +#ifndef _SCOM_H +#define _SCOM_H /* * The SCOM bus is a sideband bus used for accessing various internal @@ -161,7 +157,4 @@ static inline int scom_write(scom_map_t map, u64 reg, u64 value) } -#endif /* CONFIG_PPC_SCOM */
Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted
On Mon, Jun 03, 2019 at 10:13:59PM -0300, Thiago Jung Bauermann wrote: > > > Michael S. Tsirkin writes: > > > On Wed, Apr 17, 2019 at 06:42:00PM -0300, Thiago Jung Bauermann wrote: > >> I rephrased it in terms of address translation. What do you think of > >> this version? The flag name is slightly different too: > >> > >> > >> VIRTIO_F_ACCESS_PLATFORM_NO_TRANSLATION This feature has the same > >> meaning as VIRTIO_F_ACCESS_PLATFORM both when set and when not set, > >> with the exception that address translation is guaranteed to be > >> unnecessary when accessing memory addresses supplied to the device > >> by the driver. Which is to say, the device will always use physical > >> addresses matching addresses used by the driver (typically meaning > >> physical addresses used by the CPU) and not translated further. This > >> flag should be set by the guest if offered, but to allow for > >> backward-compatibility device implementations allow for it to be > >> left unset by the guest. It is an error to set both this flag and > >> VIRTIO_F_ACCESS_PLATFORM. > > > > > > OK so VIRTIO_F_ACCESS_PLATFORM is designed to allow unpriveledged > > drivers. This is why devices fail when it's not negotiated. > > Just to clarify, what do you mean by unprivileged drivers? Is it drivers > implemented in guest userspace such as with VFIO? Or unprivileged in > some other sense such as needing to use bounce buffers for some reason? I had drivers in guest userspace in mind. > > This confuses me. > > If driver is unpriveledged then what happens with this flag? > > It can supply any address it wants. Will that corrupt kernel > > memory? > > Not needing address translation doesn't necessarily mean that there's no > IOMMU. On powerpc we don't use VIRTIO_F_ACCESS_PLATFORM but there's > always an IOMMU present. And we also support VFIO drivers. The VFIO API > for pseries (sPAPR section in Documentation/vfio.txt) has extra ioctls > to program the IOMMU. > > For our use case, we don't need address translation because we set up an > identity mapping in the IOMMU so that the device can use guest physical > addresses. And can it access any guest physical address? > If the guest kernel is concerned that an unprivileged driver could > jeopardize its integrity it should not negotiate this feature flag. Unfortunately flag negotiation is done through config space and so can be overwritten by the driver. > Perhaps there should be a note about this in the flag definition? This > concern is platform-dependant though. I don't believe it's an issue in > pseries. Again ACCESS_PLATFORM has a pretty open definition. It does actually say it's all up to the platform. Specifically how will VIRTIO_F_ACCESS_PLATFORM_NO_TRANSLATION be implemented portably? virtio has no portable way to know whether DMA API bypasses translation. > -- > Thiago Jung Bauermann > IBM Linux Technology Center
Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted
Michael S. Tsirkin writes: > On Wed, Apr 17, 2019 at 06:42:00PM -0300, Thiago Jung Bauermann wrote: >> I rephrased it in terms of address translation. What do you think of >> this version? The flag name is slightly different too: >> >> >> VIRTIO_F_ACCESS_PLATFORM_NO_TRANSLATION This feature has the same >> meaning as VIRTIO_F_ACCESS_PLATFORM both when set and when not set, >> with the exception that address translation is guaranteed to be >> unnecessary when accessing memory addresses supplied to the device >> by the driver. Which is to say, the device will always use physical >> addresses matching addresses used by the driver (typically meaning >> physical addresses used by the CPU) and not translated further. This >> flag should be set by the guest if offered, but to allow for >> backward-compatibility device implementations allow for it to be >> left unset by the guest. It is an error to set both this flag and >> VIRTIO_F_ACCESS_PLATFORM. > > > OK so VIRTIO_F_ACCESS_PLATFORM is designed to allow unpriveledged > drivers. This is why devices fail when it's not negotiated. Just to clarify, what do you mean by unprivileged drivers? Is it drivers implemented in guest userspace such as with VFIO? Or unprivileged in some other sense such as needing to use bounce buffers for some reason? > This confuses me. > If driver is unpriveledged then what happens with this flag? > It can supply any address it wants. Will that corrupt kernel > memory? Not needing address translation doesn't necessarily mean that there's no IOMMU. On powerpc we don't use VIRTIO_F_ACCESS_PLATFORM but there's always an IOMMU present. And we also support VFIO drivers. The VFIO API for pseries (sPAPR section in Documentation/vfio.txt) has extra ioctls to program the IOMMU. For our use case, we don't need address translation because we set up an identity mapping in the IOMMU so that the device can use guest physical addresses. If the guest kernel is concerned that an unprivileged driver could jeopardize its integrity it should not negotiate this feature flag. Perhaps there should be a note about this in the flag definition? This concern is platform-dependant though. I don't believe it's an issue in pseries. -- Thiago Jung Bauermann IBM Linux Technology Center
Re: [PATCH kernel] prom_init: Fetch flatten device tree from the system firmware
On Mon, 2019-06-03 at 18:42 -0500, Segher Boessenkool wrote: > Hi! > > On Mon, Jun 03, 2019 at 12:56:26PM +1000, Alexey Kardashevskiy wrote: > > On 03/06/2019 09:23, Segher Boessenkool wrote: > > > > So I go for the simple one and agree with Alexey's idea. > > > > > > When dealing with a whole device tree you have to know about the > > > various > > > dynamically generated nodes and props, and handle each > > > appropriately. > > > > The code I am changing fetches the device tree and build an fdt. > > What is > > that special knowledge in this context you are talking about? > > Things like /options are dynamically generated. They are generated before we do the call to extract the fdt, what's the problem there ? > I thought you would just be able to reuse some FDT parsing code to > implement my suggested new interface. Maybe that was too optimistic. No, the idea is to have SLOF re-flatten it's live tree and hand us a blob. At least that's my understanding. Ben.
Re: [PATCH v2] PCI: rpaphp: Avoid a sometimes-uninitialized warning
On 06/03/2019 03:11 PM, Nathan Chancellor wrote: > When building with -Wsometimes-uninitialized, clang warns: > > drivers/pci/hotplug/rpaphp_core.c:243:14: warning: variable 'fndit' is > used uninitialized whenever 'for' loop exits because its condition is > false [-Wsometimes-uninitialized] > for (j = 0; j < entries; j++) { > ^~~ > drivers/pci/hotplug/rpaphp_core.c:256:6: note: uninitialized use occurs > here > if (fndit) > ^ > drivers/pci/hotplug/rpaphp_core.c:243:14: note: remove the condition if > it is always true > for (j = 0; j < entries; j++) { > ^~~ > drivers/pci/hotplug/rpaphp_core.c:233:14: note: initialize the variable > 'fndit' to silence this warning > int j, fndit; > ^ > = 0 > > fndit is only used to gate a sprintf call, which can be moved into the > loop to simplify the code and eliminate the local variable, which will > fix this warning. > > Link: https://github.com/ClangBuiltLinux/linux/issues/504 > Fixes: 2fcf3ae508c2 ("hotplug/drc-info: Add code to search ibm,drc-info > property") > Suggested-by: Nick Desaulniers > Signed-off-by: Nathan Chancellor Acked-by: Tyrel Datwyler
Re: [PATCH v3] scsi: ibmvscsi: Don't use rc uninitialized in ibmvscsi_do_work
On 06/03/2019 04:44 PM, Nathan Chancellor wrote: > clang warns: > > drivers/scsi/ibmvscsi/ibmvscsi.c:2126:7: warning: variable 'rc' is used > uninitialized whenever switch case is taken [-Wsometimes-uninitialized] > case IBMVSCSI_HOST_ACTION_NONE: > ^ > drivers/scsi/ibmvscsi/ibmvscsi.c:2151:6: note: uninitialized use occurs > here > if (rc) { > ^~ > > Initialize rc in the IBMVSCSI_HOST_ACTION_UNBLOCK case statement then > shuffle IBMVSCSI_HOST_ACTION_NONE down to the default case statement and > make it return early so that rc is never used uninitialized in this > function. > > Fixes: 035a3c4046b5 ("scsi: ibmvscsi: redo driver work thread to use enum > action states") > Link: https://github.com/ClangBuiltLinux/linux/issues/502 > Suggested-by: Michael Ellerman > Suggested-by: Tyrel Datwyler > Signed-off-by: Nathan Chancellor Acked-by: Tyrel Datwyler
Re: [WIP RFC PATCH 0/6] Generic Firmware Variable Filesystem
Hi Greg, >> >> As PowerNV moves towards secure boot, we need a place to put secure >> >> variables. One option that has been canvassed is to make our secure >> >> variables look like EFI variables. This is an early sketch of another >> >> approach where we create a generic firmware variable file system, >> >> fwvarfs, and an OPAL Secure Variable backend for it. >> > >> > Is there a need of new filesystem ? I am wondering why can't these be >> > exposed via sysfs / securityfs ? >> > Probably, something like... /sys/firmware/secureboot or >> > /sys/kernel/security/secureboot/ ? >> >> I suppose we could put secure variables in sysfs, but I'm not sure >> that's what sysfs was intended for. I understand sysfs as "a >> filesystem-based view of kernel objects" (from >> Documentation/filesystems/configfs/configfs.txt), and I don't think a >> secure variable is really a kernel object in the same way most other >> things in sysfs are... but I'm open to being convinced. > > What makes them more "secure" than anything else that is in sysfs today? > I didn't see anything in this patchset that provided "additional > security", did I miss it? You're right, there's no additional security. What I should have said was that I didn't think that _firmware_ variables were kernel objects in the same way that other things in sysfs are. Having read the rest of your reply it seems I'm mistaken on this. > I would just recommend putting this in sysfs. Make a new subsystem > (i.e. class) and away you go. > >> My hope with fwvarfs is to provide a generic place for firmware >> variables so that we don't need to expand the list of firmware-specific >> filesystems beyond efivarfs. I am also aiming to make things simple to >> use so that people familiar with firmware don't also have to become >> familiar with filesystem code in order to expose firmware variables to >> userspace. >> fwvarfs can also be used for variables that are not security relevant as >> well. For example, with the EFI backend (patch 3), both secure and >> insecure variables can be read. > > I don't remember why efi variables were not put in sysfs, I think there > was some reasoning behind it originally. Perhaps look in the linux-efi > archives. I'll have a look: I suspect the appeal of efivarfs is that it allows for things like non-case-sensitive matching on the GUID part of the filename while retaining case-sensitivity on the part of the filename representing the variable name. As suggested, I'll try a sysfs class. I think that will allow me to kill off most of the abstraction layer too. Thanks for the input. Regards, Daniel > > thanks, > > greg k-h
Re: [PATCH kernel] prom_init: Fetch flatten device tree from the system firmware
On Tue, Jun 04, 2019 at 07:18:42AM +1000, Benjamin Herrenschmidt wrote: > Yes we could make property fetching faster but mostly by creating a new > bulk interface which is quite a bit of work, a new API, and will in > practice not be used for anything other than creating the FDT. In that > case, nothing will beat in performance having OF create the FDT itself > based on its current tree. And that will change the whole boot protocol, the interaction between OF and its client, which is a much bigger change, not conceptually trivial at all. Copying all properties at once is, which is why I suggested it. It would take away the opposition to your patch. Segher
[PATCH v3] scsi: ibmvscsi: Don't use rc uninitialized in ibmvscsi_do_work
clang warns: drivers/scsi/ibmvscsi/ibmvscsi.c:2126:7: warning: variable 'rc' is used uninitialized whenever switch case is taken [-Wsometimes-uninitialized] case IBMVSCSI_HOST_ACTION_NONE: ^ drivers/scsi/ibmvscsi/ibmvscsi.c:2151:6: note: uninitialized use occurs here if (rc) { ^~ Initialize rc in the IBMVSCSI_HOST_ACTION_UNBLOCK case statement then shuffle IBMVSCSI_HOST_ACTION_NONE down to the default case statement and make it return early so that rc is never used uninitialized in this function. Fixes: 035a3c4046b5 ("scsi: ibmvscsi: redo driver work thread to use enum action states") Link: https://github.com/ClangBuiltLinux/linux/issues/502 Suggested-by: Michael Ellerman Suggested-by: Tyrel Datwyler Signed-off-by: Nathan Chancellor --- v1 -> v2: * Initialize rc in the case statements, rather than at the top of the function, as suggested by Michael. v2 -> v3: * default and IBMVSCSI_HOST_ACTION_NONE now return early from the function, as requested by Tyrel. drivers/scsi/ibmvscsi/ibmvscsi.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c index 65053daef5f7..7f66a7783209 100644 --- a/drivers/scsi/ibmvscsi/ibmvscsi.c +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c @@ -2109,8 +2109,8 @@ static void ibmvscsi_do_work(struct ibmvscsi_host_data *hostdata) spin_lock_irqsave(hostdata->host->host_lock, flags); switch (hostdata->action) { - case IBMVSCSI_HOST_ACTION_NONE: case IBMVSCSI_HOST_ACTION_UNBLOCK: + rc = 0; break; case IBMVSCSI_HOST_ACTION_RESET: spin_unlock_irqrestore(hostdata->host->host_lock, flags); @@ -2128,8 +2128,10 @@ static void ibmvscsi_do_work(struct ibmvscsi_host_data *hostdata) if (!rc) rc = ibmvscsi_send_crq(hostdata, 0xC001LL, 0); break; + case IBMVSCSI_HOST_ACTION_NONE: default: - break; + spin_unlock_irqrestore(hostdata->host->host_lock, flags); + return; } hostdata->action = IBMVSCSI_HOST_ACTION_NONE; -- 2.22.0.rc3
Re: [PATCH kernel] prom_init: Fetch flatten device tree from the system firmware
Hi! On Mon, Jun 03, 2019 at 12:56:26PM +1000, Alexey Kardashevskiy wrote: > On 03/06/2019 09:23, Segher Boessenkool wrote: > >> So I go for the simple one and agree with Alexey's idea. > > > > When dealing with a whole device tree you have to know about the various > > dynamically generated nodes and props, and handle each appropriately. > > The code I am changing fetches the device tree and build an fdt. What is > that special knowledge in this context you are talking about? Things like /options are dynamically generated. I thought you would just be able to reuse some FDT parsing code to implement my suggested new interface. Maybe that was too optimistic. Segher
Re: [PATCH v2] scsi: ibmvscsi: Don't use rc uninitialized in ibmvscsi_do_work
On 06/03/2019 04:34 PM, Tyrel Datwyler wrote: > On 06/03/2019 04:25 PM, Tyrel Datwyler wrote: >> On 06/03/2019 03:19 PM, Nathan Chancellor wrote: >>> clang warns: >>> >>> drivers/scsi/ibmvscsi/ibmvscsi.c:2126:7: warning: variable 'rc' is used >>> uninitialized whenever switch case is taken [-Wsometimes-uninitialized] >>> case IBMVSCSI_HOST_ACTION_NONE: >>> ^ >>> drivers/scsi/ibmvscsi/ibmvscsi.c:2151:6: note: uninitialized use occurs >>> here >>> if (rc) { >>> ^~ >>> >>> Initialize rc to zero in the case statements that clang mentions so that >>> the atomic_set and dev_err statement don't trigger for them. >>> >>> Fixes: 035a3c4046b5 ("scsi: ibmvscsi: redo driver work thread to use enum >>> action states") >>> Link: https://github.com/ClangBuiltLinux/linux/issues/502 >>> Suggested-by: Michael Ellerman >>> Signed-off-by: Nathan Chancellor >> >> Acked-by: Tyrel Datwyler >> > > On second thought NACK. See my response to Michael earlier in the thread. > > I think this is the better solution: > > diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c > b/drivers/scsi/ibmvscsi/ibmvscsi.c > index 727c31dc11a0..c3cf05dd8733 100644 > --- a/drivers/scsi/ibmvscsi/ibmvscsi.c > +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c > @@ -2123,8 +2123,8 @@ static void ibmvscsi_do_work(struct ibmvscsi_host_data > *hostdata) > > spin_lock_irqsave(hostdata->host->host_lock, flags); > switch (hostdata->action) { > - case IBMVSCSI_HOST_ACTION_NONE: > case IBMVSCSI_HOST_ACTION_UNBLOCK: > + rc = 0; > break; > case IBMVSCSI_HOST_ACTION_RESET: > spin_unlock_irqrestore(hostdata->host->host_lock, flags); > @@ -2142,8 +2142,9 @@ static void ibmvscsi_do_work(struct ibmvscsi_host_data > *hostdata) > if (!rc) > rc = ibmvscsi_send_crq(hostdata, > 0xC001LL, 0); > break; > + case IBMVSCSI_HOST_ACTION_NONE: > default: > - break; Need a spin_unlock_irqrestore() here before the return. -Tyrel > + return; > } > > hostdata->action = IBMVSCSI_HOST_ACTION_NONE; >
Re: [PATCH v2] scsi: ibmvscsi: Don't use rc uninitialized in ibmvscsi_do_work
On 06/03/2019 04:25 PM, Tyrel Datwyler wrote: > On 06/03/2019 03:19 PM, Nathan Chancellor wrote: >> clang warns: >> >> drivers/scsi/ibmvscsi/ibmvscsi.c:2126:7: warning: variable 'rc' is used >> uninitialized whenever switch case is taken [-Wsometimes-uninitialized] >> case IBMVSCSI_HOST_ACTION_NONE: >> ^ >> drivers/scsi/ibmvscsi/ibmvscsi.c:2151:6: note: uninitialized use occurs >> here >> if (rc) { >> ^~ >> >> Initialize rc to zero in the case statements that clang mentions so that >> the atomic_set and dev_err statement don't trigger for them. >> >> Fixes: 035a3c4046b5 ("scsi: ibmvscsi: redo driver work thread to use enum >> action states") >> Link: https://github.com/ClangBuiltLinux/linux/issues/502 >> Suggested-by: Michael Ellerman >> Signed-off-by: Nathan Chancellor > > Acked-by: Tyrel Datwyler > On second thought NACK. See my response to Michael earlier in the thread. I think this is the better solution: diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c index 727c31dc11a0..c3cf05dd8733 100644 --- a/drivers/scsi/ibmvscsi/ibmvscsi.c +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c @@ -2123,8 +2123,8 @@ static void ibmvscsi_do_work(struct ibmvscsi_host_data *hostdata) spin_lock_irqsave(hostdata->host->host_lock, flags); switch (hostdata->action) { - case IBMVSCSI_HOST_ACTION_NONE: case IBMVSCSI_HOST_ACTION_UNBLOCK: + rc = 0; break; case IBMVSCSI_HOST_ACTION_RESET: spin_unlock_irqrestore(hostdata->host->host_lock, flags); @@ -2142,8 +2142,9 @@ static void ibmvscsi_do_work(struct ibmvscsi_host_data *hostdata) if (!rc) rc = ibmvscsi_send_crq(hostdata, 0xC001LL, 0); break; + case IBMVSCSI_HOST_ACTION_NONE: default: - break; + return; } hostdata->action = IBMVSCSI_HOST_ACTION_NONE;
Re: [PATCH] scsi: ibmvscsi: Don't use rc uninitialized in ibmvscsi_do_work
On 06/02/2019 03:15 AM, Michael Ellerman wrote: > Hi Nathan, > > Nathan Chancellor writes: >> clang warns: >> >> drivers/scsi/ibmvscsi/ibmvscsi.c:2126:7: warning: variable 'rc' is used >> uninitialized whenever switch case is taken [-Wsometimes-uninitialized] >> case IBMVSCSI_HOST_ACTION_NONE: >> ^ >> drivers/scsi/ibmvscsi/ibmvscsi.c:2151:6: note: uninitialized use occurs >> here >> if (rc) { >> ^~ >> >> Initialize rc to zero so that the atomic_set and dev_err statement don't >> trigger for the cases that just break. >> >> Fixes: 035a3c4046b5 ("scsi: ibmvscsi: redo driver work thread to use enum >> action states") >> Link: https://github.com/ClangBuiltLinux/linux/issues/502 >> Signed-off-by: Nathan Chancellor >> --- >> drivers/scsi/ibmvscsi/ibmvscsi.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c >> b/drivers/scsi/ibmvscsi/ibmvscsi.c >> index 727c31dc11a0..6714d8043e62 100644 >> --- a/drivers/scsi/ibmvscsi/ibmvscsi.c >> +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c >> @@ -2118,7 +2118,7 @@ static unsigned long ibmvscsi_get_desired_dma(struct >> vio_dev *vdev) >> static void ibmvscsi_do_work(struct ibmvscsi_host_data *hostdata) >> { >> unsigned long flags; >> -int rc; >> +int rc = 0; >> char *action = "reset"; >> >> spin_lock_irqsave(hostdata->host->host_lock, flags); > > It's always preferable IMHO to keep any initialisation as localised as > possible, so that the compiler can continue to warn about uninitialised > usages elsewhere. In this case that would mean doing the rc = 0 in the > switch, something like: > > diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c > b/drivers/scsi/ibmvscsi/ibmvscsi.c > index 727c31dc11a0..7ee5755cf636 100644 > --- a/drivers/scsi/ibmvscsi/ibmvscsi.c > +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c > @@ -2123,9 +2123,6 @@ static void ibmvscsi_do_work(struct ibmvscsi_host_data > *hostdata) > > spin_lock_irqsave(hostdata->host->host_lock, flags); > switch (hostdata->action) { > - case IBMVSCSI_HOST_ACTION_NONE: > - case IBMVSCSI_HOST_ACTION_UNBLOCK: > - break; > case IBMVSCSI_HOST_ACTION_RESET: > spin_unlock_irqrestore(hostdata->host->host_lock, flags); > rc = ibmvscsi_reset_crq_queue(>queue, hostdata); > @@ -2142,7 +2139,10 @@ static void ibmvscsi_do_work(struct ibmvscsi_host_data > *hostdata) > if (!rc) > rc = ibmvscsi_send_crq(hostdata, > 0xC001LL, 0); > break; > + case IBMVSCSI_HOST_ACTION_NONE: > + case IBMVSCSI_HOST_ACTION_UNBLOCK: > default: > + rc = 0; > break; > } > > > But then that makes me wonder if that's actually correct? > > If we get an action that we don't recognise should we just throw it away > like that? (by doing hostdata->action = IBMVSCSI_HOST_ACTION_NONE). Tyrel? On initial pass I was ok with this, but after thinking on it I think it is more subtle. The right approach is to set rc = 0 for HOST_ACTION_UNBLOCK as we want to fall through. For HOST_ACTION_NONE and default we need to return directly from the function. -Tyrel > > cheers >
Re: [PATCH v2] scsi: ibmvscsi: Don't use rc uninitialized in ibmvscsi_do_work
On 06/03/2019 03:19 PM, Nathan Chancellor wrote: > clang warns: > > drivers/scsi/ibmvscsi/ibmvscsi.c:2126:7: warning: variable 'rc' is used > uninitialized whenever switch case is taken [-Wsometimes-uninitialized] > case IBMVSCSI_HOST_ACTION_NONE: > ^ > drivers/scsi/ibmvscsi/ibmvscsi.c:2151:6: note: uninitialized use occurs > here > if (rc) { > ^~ > > Initialize rc to zero in the case statements that clang mentions so that > the atomic_set and dev_err statement don't trigger for them. > > Fixes: 035a3c4046b5 ("scsi: ibmvscsi: redo driver work thread to use enum > action states") > Link: https://github.com/ClangBuiltLinux/linux/issues/502 > Suggested-by: Michael Ellerman > Signed-off-by: Nathan Chancellor Acked-by: Tyrel Datwyler
[PATCH v2] scsi: ibmvscsi: Don't use rc uninitialized in ibmvscsi_do_work
clang warns: drivers/scsi/ibmvscsi/ibmvscsi.c:2126:7: warning: variable 'rc' is used uninitialized whenever switch case is taken [-Wsometimes-uninitialized] case IBMVSCSI_HOST_ACTION_NONE: ^ drivers/scsi/ibmvscsi/ibmvscsi.c:2151:6: note: uninitialized use occurs here if (rc) { ^~ Initialize rc to zero in the case statements that clang mentions so that the atomic_set and dev_err statement don't trigger for them. Fixes: 035a3c4046b5 ("scsi: ibmvscsi: redo driver work thread to use enum action states") Link: https://github.com/ClangBuiltLinux/linux/issues/502 Suggested-by: Michael Ellerman Signed-off-by: Nathan Chancellor --- v1 -> v2: * Initialize rc in the case statements, rather than at the top of the function, as suggested by Michael. drivers/scsi/ibmvscsi/ibmvscsi.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c index 65053daef5f7..3b5647d622d9 100644 --- a/drivers/scsi/ibmvscsi/ibmvscsi.c +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c @@ -2109,9 +2109,6 @@ static void ibmvscsi_do_work(struct ibmvscsi_host_data *hostdata) spin_lock_irqsave(hostdata->host->host_lock, flags); switch (hostdata->action) { - case IBMVSCSI_HOST_ACTION_NONE: - case IBMVSCSI_HOST_ACTION_UNBLOCK: - break; case IBMVSCSI_HOST_ACTION_RESET: spin_unlock_irqrestore(hostdata->host->host_lock, flags); rc = ibmvscsi_reset_crq_queue(>queue, hostdata); @@ -2128,7 +2125,10 @@ static void ibmvscsi_do_work(struct ibmvscsi_host_data *hostdata) if (!rc) rc = ibmvscsi_send_crq(hostdata, 0xC001LL, 0); break; + case IBMVSCSI_HOST_ACTION_NONE: + case IBMVSCSI_HOST_ACTION_UNBLOCK: default: + rc = 0; break; } -- 2.22.0.rc3
Re: [PATCH v3 06/11] mm/memory_hotplug: Allow arch_remove_pages() without CONFIG_MEMORY_HOTREMOVE
Allow arch_remove_pages() or arch_remove_memory()? And want to confirm the kernel build on affected arch succeed? On Mon, May 27, 2019 at 01:11:47PM +0200, David Hildenbrand wrote: >We want to improve error handling while adding memory by allowing >to use arch_remove_memory() and __remove_pages() even if >CONFIG_MEMORY_HOTREMOVE is not set to e.g., implement something like: > > arch_add_memory() > rc = do_something(); > if (rc) { > arch_remove_memory(); > } > >We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require >quite some dependencies for memory offlining. > >Cc: Tony Luck >Cc: Fenghua Yu >Cc: Benjamin Herrenschmidt >Cc: Paul Mackerras >Cc: Michael Ellerman >Cc: Martin Schwidefsky >Cc: Heiko Carstens >Cc: Yoshinori Sato >Cc: Rich Felker >Cc: Dave Hansen >Cc: Andy Lutomirski >Cc: Peter Zijlstra >Cc: Thomas Gleixner >Cc: Ingo Molnar >Cc: Borislav Petkov >Cc: "H. Peter Anvin" >Cc: Greg Kroah-Hartman >Cc: "Rafael J. Wysocki" >Cc: Andrew Morton >Cc: Michal Hocko >Cc: Mike Rapoport >Cc: David Hildenbrand >Cc: Oscar Salvador >Cc: "Kirill A. Shutemov" >Cc: Alex Deucher >Cc: "David S. Miller" >Cc: Mark Brown >Cc: Chris Wilson >Cc: Christophe Leroy >Cc: Nicholas Piggin >Cc: Vasily Gorbik >Cc: Rob Herring >Cc: Masahiro Yamada >Cc: "mike.tra...@hpe.com" >Cc: Andrew Banman >Cc: Pavel Tatashin >Cc: Wei Yang >Cc: Arun KS >Cc: Qian Cai >Cc: Mathieu Malaterre >Cc: Baoquan He >Cc: Logan Gunthorpe >Cc: Anshuman Khandual >Signed-off-by: David Hildenbrand >--- > arch/arm64/mm/mmu.c| 2 -- > arch/ia64/mm/init.c| 2 -- > arch/powerpc/mm/mem.c | 2 -- > arch/s390/mm/init.c| 2 -- > arch/sh/mm/init.c | 2 -- > arch/x86/mm/init_32.c | 2 -- > arch/x86/mm/init_64.c | 2 -- > drivers/base/memory.c | 2 -- > include/linux/memory.h | 2 -- > include/linux/memory_hotplug.h | 2 -- > mm/memory_hotplug.c| 2 -- > mm/sparse.c| 6 -- > 12 files changed, 28 deletions(-) > >diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c >index e569a543c384..9ccd7539f2d4 100644 >--- a/arch/arm64/mm/mmu.c >+++ b/arch/arm64/mm/mmu.c >@@ -1084,7 +1084,6 @@ int arch_add_memory(int nid, u64 start, u64 size, > return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT, > restrictions); > } >-#ifdef CONFIG_MEMORY_HOTREMOVE > void arch_remove_memory(int nid, u64 start, u64 size, > struct vmem_altmap *altmap) > { >@@ -1103,4 +1102,3 @@ void arch_remove_memory(int nid, u64 start, u64 size, > __remove_pages(zone, start_pfn, nr_pages, altmap); > } > #endif >-#endif >diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c >index d28e29103bdb..aae75fd7b810 100644 >--- a/arch/ia64/mm/init.c >+++ b/arch/ia64/mm/init.c >@@ -681,7 +681,6 @@ int arch_add_memory(int nid, u64 start, u64 size, > return ret; > } > >-#ifdef CONFIG_MEMORY_HOTREMOVE > void arch_remove_memory(int nid, u64 start, u64 size, > struct vmem_altmap *altmap) > { >@@ -693,4 +692,3 @@ void arch_remove_memory(int nid, u64 start, u64 size, > __remove_pages(zone, start_pfn, nr_pages, altmap); > } > #endif >-#endif >diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c >index e885fe2aafcc..e4bc2dc3f593 100644 >--- a/arch/powerpc/mm/mem.c >+++ b/arch/powerpc/mm/mem.c >@@ -130,7 +130,6 @@ int __ref arch_add_memory(int nid, u64 start, u64 size, > return __add_pages(nid, start_pfn, nr_pages, restrictions); > } > >-#ifdef CONFIG_MEMORY_HOTREMOVE > void __ref arch_remove_memory(int nid, u64 start, u64 size, >struct vmem_altmap *altmap) > { >@@ -164,7 +163,6 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size, > pr_warn("Hash collision while resizing HPT\n"); > } > #endif >-#endif /* CONFIG_MEMORY_HOTPLUG */ > > #ifndef CONFIG_NEED_MULTIPLE_NODES > void __init mem_topology_setup(void) >diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c >index 14955e0a9fcf..ffb81fe95c77 100644 >--- a/arch/s390/mm/init.c >+++ b/arch/s390/mm/init.c >@@ -239,7 +239,6 @@ int arch_add_memory(int nid, u64 start, u64 size, > return rc; > } > >-#ifdef CONFIG_MEMORY_HOTREMOVE > void arch_remove_memory(int nid, u64 start, u64 size, > struct vmem_altmap *altmap) > { >@@ -251,5 +250,4 @@ void arch_remove_memory(int nid, u64 start, u64 size, > __remove_pages(zone, start_pfn, nr_pages, altmap); > vmem_remove_mapping(start, size); > } >-#endif > #endif /* CONFIG_MEMORY_HOTPLUG */ >diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c >index 13c6a6bb5fd9..dfdbaa50946e 100644 >--- a/arch/sh/mm/init.c >+++ b/arch/sh/mm/init.c >@@ -429,7 +429,6 @@ int memory_add_physaddr_to_nid(u64 addr) > EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > #endif > >-#ifdef CONFIG_MEMORY_HOTREMOVE > void arch_remove_memory(int nid, u64 start, u64
[PATCH v2] PCI: rpaphp: Avoid a sometimes-uninitialized warning
When building with -Wsometimes-uninitialized, clang warns: drivers/pci/hotplug/rpaphp_core.c:243:14: warning: variable 'fndit' is used uninitialized whenever 'for' loop exits because its condition is false [-Wsometimes-uninitialized] for (j = 0; j < entries; j++) { ^~~ drivers/pci/hotplug/rpaphp_core.c:256:6: note: uninitialized use occurs here if (fndit) ^ drivers/pci/hotplug/rpaphp_core.c:243:14: note: remove the condition if it is always true for (j = 0; j < entries; j++) { ^~~ drivers/pci/hotplug/rpaphp_core.c:233:14: note: initialize the variable 'fndit' to silence this warning int j, fndit; ^ = 0 fndit is only used to gate a sprintf call, which can be moved into the loop to simplify the code and eliminate the local variable, which will fix this warning. Link: https://github.com/ClangBuiltLinux/linux/issues/504 Fixes: 2fcf3ae508c2 ("hotplug/drc-info: Add code to search ibm,drc-info property") Suggested-by: Nick Desaulniers Signed-off-by: Nathan Chancellor --- v1 -> v2: * Eliminate fndit altogether by shuffling the sprintf call into the for loop and changing the if conditional, as suggested by Nick. drivers/pci/hotplug/rpaphp_core.c | 18 +++--- 1 file changed, 7 insertions(+), 11 deletions(-) diff --git a/drivers/pci/hotplug/rpaphp_core.c b/drivers/pci/hotplug/rpaphp_core.c index bcd5d357ca23..c3899ee1db99 100644 --- a/drivers/pci/hotplug/rpaphp_core.c +++ b/drivers/pci/hotplug/rpaphp_core.c @@ -230,7 +230,7 @@ static int rpaphp_check_drc_props_v2(struct device_node *dn, char *drc_name, struct of_drc_info drc; const __be32 *value; char cell_drc_name[MAX_DRC_NAME_LEN]; - int j, fndit; + int j; info = of_find_property(dn->parent, "ibm,drc-info", NULL); if (info == NULL) @@ -245,17 +245,13 @@ static int rpaphp_check_drc_props_v2(struct device_node *dn, char *drc_name, /* Should now know end of current entry */ - if (my_index > drc.last_drc_index) - continue; - - fndit = 1; - break; + /* Found it */ + if (my_index <= drc.last_drc_index) { + sprintf(cell_drc_name, "%s%d", drc.drc_name_prefix, + my_index); + break; + } } - /* Found it */ - - if (fndit) - sprintf(cell_drc_name, "%s%d", drc.drc_name_prefix, - my_index); if (((drc_name == NULL) || (drc_name && !strcmp(drc_name, cell_drc_name))) && -- 2.22.0.rc3
Re: [PATCH] PCI: rpaphp: Avoid a sometimes-uninitialized warning
Hi Nick, On Mon, Jun 03, 2019 at 02:07:50PM -0700, Nick Desaulniers wrote: > On Mon, Jun 3, 2019 at 10:44 AM Nathan Chancellor > wrote: > > Looking at the loop in a vacuum as clang would, fndit could be > > uninitialized if entries was ever zero or the if statement was > > always true within the loop. Regardless of whether or not this > > warning is a problem in practice, "found" variables should always > > be initialized to false so that there is no possibility of > > undefined behavior. > > Thanks for the patch Nathan. fndit isn't really being used for > anything other than a print statement outside of the loop. How about: Thank you for the review, this seems reasonable. I will send a v2 shortly. > > ``` > diff --git a/drivers/pci/hotplug/rpaphp_core.c > b/drivers/pci/hotplug/rpaphp_core.c > index bcd5d357ca23..c3899ee1db99 100644 > --- a/drivers/pci/hotplug/rpaphp_core.c > +++ b/drivers/pci/hotplug/rpaphp_core.c > @@ -230,7 +230,7 @@ static int rpaphp_check_drc_props_v2(struct > device_node *dn, char *drc_name, > struct of_drc_info drc; > const __be32 *value; > char cell_drc_name[MAX_DRC_NAME_LEN]; > - int j, fndit; > + int j; > > info = of_find_property(dn->parent, "ibm,drc-info", NULL); > if (info == NULL) > @@ -245,17 +245,13 @@ static int rpaphp_check_drc_props_v2(struct > device_node *dn, char *drc_name, > > /* Should now know end of current entry */ > > - if (my_index > drc.last_drc_index) > - continue; > - > - fndit = 1; > - break; > + /* Found it */ > + if (my_index <= drc.last_drc_index) { > + sprintf(cell_drc_name, "%s%d", drc.drc_name_prefix, > + my_index); > + break; > + } > } > - /* Found it */ > - > - if (fndit) > - sprintf(cell_drc_name, "%s%d", drc.drc_name_prefix, > - my_index); > > if (((drc_name == NULL) || >(drc_name && !strcmp(drc_name, cell_drc_name))) && > ``` > (not sure my tabs were pasted properly in the above...) Doesn't look like it but no worries. Thanks, Nathan
Re: [PATCH v3 05/11] drivers/base/memory: Pass a block_id to init_memory_block()
On Mon, May 27, 2019 at 01:11:46PM +0200, David Hildenbrand wrote: >We'll rework hotplug_memory_register() shortly, so it no longer consumes >pass a section. > >Cc: Greg Kroah-Hartman >Cc: "Rafael J. Wysocki" >Signed-off-by: David Hildenbrand >--- > drivers/base/memory.c | 15 +++ > 1 file changed, 7 insertions(+), 8 deletions(-) > >diff --git a/drivers/base/memory.c b/drivers/base/memory.c >index f180427e48f4..f914fa6fe350 100644 >--- a/drivers/base/memory.c >+++ b/drivers/base/memory.c >@@ -651,21 +651,18 @@ int register_memory(struct memory_block *memory) > return ret; > } > >-static int init_memory_block(struct memory_block **memory, >- struct mem_section *section, unsigned long state) >+static int init_memory_block(struct memory_block **memory, int block_id, >+ unsigned long state) > { > struct memory_block *mem; > unsigned long start_pfn; >- int scn_nr; > int ret = 0; > > mem = kzalloc(sizeof(*mem), GFP_KERNEL); > if (!mem) > return -ENOMEM; > >- scn_nr = __section_nr(section); >- mem->start_section_nr = >- base_memory_block_id(scn_nr) * sections_per_block; >+ mem->start_section_nr = block_id * sections_per_block; > mem->end_section_nr = mem->start_section_nr + sections_per_block - 1; > mem->state = state; > start_pfn = section_nr_to_pfn(mem->start_section_nr); >@@ -694,7 +691,8 @@ static int add_memory_block(int base_section_nr) > > if (section_count == 0) > return 0; >- ret = init_memory_block(, __nr_to_section(section_nr), MEM_ONLINE); >+ ret = init_memory_block(, base_memory_block_id(base_section_nr), >+ MEM_ONLINE); If my understanding is correct, section_nr could be removed too. > if (ret) > return ret; > mem->section_count = section_count; >@@ -707,6 +705,7 @@ static int add_memory_block(int base_section_nr) > */ > int hotplug_memory_register(int nid, struct mem_section *section) > { >+ int block_id = base_memory_block_id(__section_nr(section)); > int ret = 0; > struct memory_block *mem; > >@@ -717,7 +716,7 @@ int hotplug_memory_register(int nid, struct mem_section >*section) > mem->section_count++; > put_device(>dev); > } else { >- ret = init_memory_block(, section, MEM_OFFLINE); >+ ret = init_memory_block(, block_id, MEM_OFFLINE); > if (ret) > goto out; > mem->section_count++; >-- >2.20.1 -- Wei Yang Help you, Help me
Re: [PATCH v3 00/11] mm/memory_hotplug: Factor out memory block devicehandling
On 03.06.19 23:21, Wei Yang wrote: > IMHO, there is some typo. Yes, thanks. > > s/devicehandling/device handling/ > > On Mon, May 27, 2019 at 01:11:41PM +0200, David Hildenbrand wrote: >> We only want memory block devices for memory to be onlined/offlined >> (add/remove from the buddy). This is required so user space can >> online/offline memory and kdump gets notified about newly onlined memory. >> >> Let's factor out creation/removal of memory block devices. This helps >> to further cleanup arch_add_memory/arch_remove_memory() and to make >> implementation of new features easier - especially sub-section >> memory hot add from Dan. >> >> Anshuman Khandual is currently working on arch_remove_memory(). I added >> a temporary solution via "arm64/mm: Add temporary arch_remove_memory() >> implementation", that is sufficient as a firsts tep in the context of > > s/firsts tep/first step/ > >> this series. (we don't cleanup page tables in case anything goes >> wrong already) >> >> Did a quick sanity test with DIMM plug/unplug, making sure all devices >> and sysfs links properly get added/removed. Compile tested on s390x and >> x86-64. >> >> Based on next/master. >> >> Next refactoring on my list will be making sure that remove_memory() >> will never deal with zones / access "struct pages". Any kind of zone >> handling will have to be done when offlining system memory / before >> removing device memory. I am thinking about remove_pfn_range_from_zone()", >> du undo everything "move_pfn_range_to_zone()" did. > > what is "du undo"? I may not get it. to undo ;) -- Thanks, David / dhildenb
Re: [PATCH v3 00/11] mm/memory_hotplug: Factor out memory block devicehandling
IMHO, there is some typo. s/devicehandling/device handling/ On Mon, May 27, 2019 at 01:11:41PM +0200, David Hildenbrand wrote: >We only want memory block devices for memory to be onlined/offlined >(add/remove from the buddy). This is required so user space can >online/offline memory and kdump gets notified about newly onlined memory. > >Let's factor out creation/removal of memory block devices. This helps >to further cleanup arch_add_memory/arch_remove_memory() and to make >implementation of new features easier - especially sub-section >memory hot add from Dan. > >Anshuman Khandual is currently working on arch_remove_memory(). I added >a temporary solution via "arm64/mm: Add temporary arch_remove_memory() >implementation", that is sufficient as a firsts tep in the context of s/firsts tep/first step/ >this series. (we don't cleanup page tables in case anything goes >wrong already) > >Did a quick sanity test with DIMM plug/unplug, making sure all devices >and sysfs links properly get added/removed. Compile tested on s390x and >x86-64. > >Based on next/master. > >Next refactoring on my list will be making sure that remove_memory() >will never deal with zones / access "struct pages". Any kind of zone >handling will have to be done when offlining system memory / before >removing device memory. I am thinking about remove_pfn_range_from_zone()", >du undo everything "move_pfn_range_to_zone()" did. what is "du undo"? I may not get it. > >v2 -> v3: >- Add "s390x/mm: Fail when an altmap is used for arch_add_memory()" >- Add "arm64/mm: Add temporary arch_remove_memory() implementation" >- Add "drivers/base/memory: Pass a block_id to init_memory_block()" >- Various changes to "mm/memory_hotplug: Create memory block devices > after arch_add_memory()" and "mm/memory_hotplug: Create memory block > devices after arch_add_memory()" due to switching from sections to > block_id's. > >v1 -> v2: >- s390x/mm: Implement arch_remove_memory() >-- remove mapping after "__remove_pages" > >David Hildenbrand (11): > mm/memory_hotplug: Simplify and fix check_hotplug_memory_range() > s390x/mm: Fail when an altmap is used for arch_add_memory() > s390x/mm: Implement arch_remove_memory() > arm64/mm: Add temporary arch_remove_memory() implementation > drivers/base/memory: Pass a block_id to init_memory_block() > mm/memory_hotplug: Allow arch_remove_pages() without >CONFIG_MEMORY_HOTREMOVE > mm/memory_hotplug: Create memory block devices after arch_add_memory() > mm/memory_hotplug: Drop MHP_MEMBLOCK_API > mm/memory_hotplug: Remove memory block devices before >arch_remove_memory() > mm/memory_hotplug: Make unregister_memory_block_under_nodes() never >fail > mm/memory_hotplug: Remove "zone" parameter from >sparse_remove_one_section > > arch/arm64/mm/mmu.c| 17 + > arch/ia64/mm/init.c| 2 - > arch/powerpc/mm/mem.c | 2 - > arch/s390/mm/init.c| 18 +++-- > arch/sh/mm/init.c | 2 - > arch/x86/mm/init_32.c | 2 - > arch/x86/mm/init_64.c | 2 - > drivers/base/memory.c | 134 +++-- > drivers/base/node.c| 27 +++ > include/linux/memory.h | 6 +- > include/linux/memory_hotplug.h | 12 +-- > include/linux/node.h | 7 +- > mm/memory_hotplug.c| 44 +-- > mm/sparse.c| 10 +-- > 14 files changed, 140 insertions(+), 145 deletions(-) > >-- >2.20.1 -- Wei Yang Help you, Help me
Re: [PATCH v3 0/6] soc/fsl/qe: cleanups and new DT binding
On 13/05/2019 13.14, Rasmus Villemoes wrote: > This small series consists of some small cleanups and simplifications > of the QUICC engine driver, and introduces a new DT binding that makes > it much easier to support other variants of the QUICC engine IP block > that appears in the wild: There's no reason to expect in general that > the number of valid SNUMs uniquely determines the set of such, so it's > better to simply let the device tree specify the values (and, > implicitly via the array length, also the count). > > Which tree should this go through? Ping? These patches should be ready to go in, but I don't know who is supposed to pick them up. Thanks, Rasmus
[PATCH] powerpc: Enable kernel XZ compression option on PPC_85xx
Enable kernel XZ compression option on PPC_85xx. Tested with simpleImage on TP-Link TL-WDR4900 (Freescale P1014 processor). Suggested-by: Christian Lamparter Signed-off-by: Pawel Dembicki --- arch/powerpc/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 8c1c636308c8..daf4cb968922 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -196,7 +196,7 @@ config PPC select HAVE_IOREMAP_PROT select HAVE_IRQ_EXIT_ON_IRQ_STACK select HAVE_KERNEL_GZIP - select HAVE_KERNEL_XZ if PPC_BOOK3S || 44x + select HAVE_KERNEL_XZ if PPC_BOOK3S || 44x || PPC_85xx select HAVE_KPROBES select HAVE_KPROBES_ON_FTRACE select HAVE_KRETPROBES -- 2.20.1
Re: [PATCH 22/22] docs: fix broken documentation links
On Wed, May 29, 2019 at 08:23:53PM -0300, Mauro Carvalho Chehab wrote: > Mostly due to x86 and acpi conversion, several documentation > links are still pointing to the old file. Fix them. Acked-by: Mark Brown signature.asc Description: PGP signature
Re: [PATCH v3 04/11] arm64/mm: Add temporary arch_remove_memory() implementation
On Mon, May 27, 2019 at 01:11:45PM +0200, David Hildenbrand wrote: >A proper arch_remove_memory() implementation is on its way, which also >cleanly removes page tables in arch_add_memory() in case something goes >wrong. Would this be better to understand? removes page tables created in arch_add_memory > >As we want to use arch_remove_memory() in case something goes wrong >during memory hotplug after arch_add_memory() finished, let's add >a temporary hack that is sufficient enough until we get a proper >implementation that cleans up page table entries. > >We will remove CONFIG_MEMORY_HOTREMOVE around this code in follow up >patches. > >Cc: Catalin Marinas >Cc: Will Deacon >Cc: Mark Rutland >Cc: Andrew Morton >Cc: Ard Biesheuvel >Cc: Chintan Pandya >Cc: Mike Rapoport >Cc: Jun Yao >Cc: Yu Zhao >Cc: Robin Murphy >Cc: Anshuman Khandual >Signed-off-by: David Hildenbrand >--- > arch/arm64/mm/mmu.c | 19 +++ > 1 file changed, 19 insertions(+) > >diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c >index a1bfc4413982..e569a543c384 100644 >--- a/arch/arm64/mm/mmu.c >+++ b/arch/arm64/mm/mmu.c >@@ -1084,4 +1084,23 @@ int arch_add_memory(int nid, u64 start, u64 size, > return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT, > restrictions); > } >+#ifdef CONFIG_MEMORY_HOTREMOVE >+void arch_remove_memory(int nid, u64 start, u64 size, >+ struct vmem_altmap *altmap) >+{ >+ unsigned long start_pfn = start >> PAGE_SHIFT; >+ unsigned long nr_pages = size >> PAGE_SHIFT; >+ struct zone *zone; >+ >+ /* >+ * FIXME: Cleanup page tables (also in arch_add_memory() in case >+ * adding fails). Until then, this function should only be used >+ * during memory hotplug (adding memory), not for memory >+ * unplug. ARCH_ENABLE_MEMORY_HOTREMOVE must not be >+ * unlocked yet. >+ */ >+ zone = page_zone(pfn_to_page(start_pfn)); Compared with arch_remove_memory in x86. If altmap is not NULL, zone will be retrieved from page related to altmap. Not sure why this is not the same? >+ __remove_pages(zone, start_pfn, nr_pages, altmap); >+} >+#endif > #endif >-- >2.20.1 -- Wei Yang Help you, Help me
Re: [PATCH kernel] prom_init: Fetch flatten device tree from the system firmware
On Mon, 2019-06-03 at 12:56 +1000, Alexey Kardashevskiy wrote: > > > That is all you need if you do not want to use OF at all. > > ? We also need OF drivers to boot grub and the system, and a default > console for early booting, but yes, we do not want to keep using slof > that much. > > > If you *do* want to keep having an Open Firmware, what we want or need > > is a faster way to walk huge device trees. > > Why? We do not need to _walk_ the tree at all (we _have_ to now and > while walking we do nothing but pack everything together into the fdt > blob) as slof can easily do this already. I agree with Alexey. In a sense this can be thought as an extension of "quiesce", which effectively kills OF. Yes we could make property fetching faster but mostly by creating a new bulk interface which is quite a bit of work, a new API, and will in practice not be used for anything other than creating the FDT. In that case, nothing will beat in performance having OF create the FDT itself based on its current tree. > > > There is no use for the "fetch all properties" cases other than > > > building an FDT that any of us can think of, and it would create a more > > > complicated interface than just "fetch an FDT". > > > > It is a simple way to speed up fetching the device tree enormously, > > without needing big changes to either OF or the clients using it -- not > > in the code, but importantly also not conceptually: everything works just > > as before, just a lot faster. > > I can safely presume though that this will never ever be used in > practice. And it will be still slower, a tiny bit. And require new code > in both slof and linux. Correct. > I can rather see us getting rid of SLOF in the future which in turn will > require the fdt blob pointer in r5 (or whatever it is, forgot) when the > guest starts; so "fetch-all-props" won't be used there either. > > > > So I go for the simple one and agree with Alexey's idea. > > > > When dealing with a whole device tree you have to know about the various > > dynamically generated nodes and props, and handle each appropriately. > > The code I am changing fetches the device tree and build an fdt. What is > that special knowledge in this context you are talking about? Ben.
RE: [PATCH v3 0/6] soc/fsl/qe: cleanups and new DT binding
> -Original Message- > From: Rasmus Villemoes > Sent: Monday, June 3, 2019 2:54 PM > To: devicet...@vger.kernel.org; Qiang Zhao ; Leo Li > > Cc: linuxppc-dev@lists.ozlabs.org; linux-arm-ker...@lists.infradead.org; > linux-ker...@vger.kernel.org; Rob Herring ; Scott > Wood ; Christophe Leroy ; > Mark Rutland ; jo...@infinera.com > > Subject: Re: [PATCH v3 0/6] soc/fsl/qe: cleanups and new DT binding > > On 13/05/2019 13.14, Rasmus Villemoes wrote: > > This small series consists of some small cleanups and simplifications > > of the QUICC engine driver, and introduces a new DT binding that makes > > it much easier to support other variants of the QUICC engine IP block > > that appears in the wild: There's no reason to expect in general that > > the number of valid SNUMs uniquely determines the set of such, so it's > > better to simply let the device tree specify the values (and, > > implicitly via the array length, also the count). > > > > Which tree should this go through? > > Ping? These patches should be ready to go in, but I don't know who is > supposed to pick them up. I can pick them up through the soc/fsl tree. Regards, Leo
Re: [PATCH v3] mm: add account_locked_vm utility function
On Wed, 29 May 2019 16:50:19 -0400 Daniel Jordan wrote: > locked_vm accounting is done roughly the same way in five places, so > unify them in a helper. > > Include the helper's caller in the debug print to distinguish between > callsites. > > Error codes stay the same, so user-visible behavior does too. The one > exception is that the -EPERM case in tce_account_locked_vm is removed > because Alexey has never seen it triggered. > > Signed-off-by: Daniel Jordan > Tested-by: Alexey Kardashevskiy > Cc: Alan Tull > Cc: Alex Williamson > Cc: Andrew Morton > Cc: Benjamin Herrenschmidt > Cc: Christoph Lameter > Cc: Christophe Leroy > Cc: Davidlohr Bueso > Cc: Ira Weiny > Cc: Jason Gunthorpe > Cc: Mark Rutland > Cc: Michael Ellerman > Cc: Moritz Fischer > Cc: Paul Mackerras > Cc: Steve Sistare > Cc: Wu Hao > Cc: linux...@kvack.org > Cc: k...@vger.kernel.org > Cc: kvm-...@vger.kernel.org > Cc: linuxppc-dev@lists.ozlabs.org > Cc: linux-f...@vger.kernel.org > Cc: linux-ker...@vger.kernel.org > --- > v3: > - uninline account_locked_vm (Andrew) > - fix doc comment (Ira) > - retain down_write_killable in vfio type1 (Alex) > - leave Alexey's T-b since the code is the same aside from uninlining > - sanity tested with vfio type1, sanity-built on ppc > > arch/powerpc/kvm/book3s_64_vio.c | 44 ++-- > arch/powerpc/mm/book3s64/iommu_api.c | 41 ++- > drivers/fpga/dfl-afu-dma-region.c| 53 ++-- > drivers/vfio/vfio_iommu_spapr_tce.c | 54 ++-- > drivers/vfio/vfio_iommu_type1.c | 17 +-- > include/linux/mm.h | 4 ++ > mm/util.c| 75 > 7 files changed, 98 insertions(+), 190 deletions(-) I tend to prefer adding a negative rather than converting to absolute and passing a bool for inc/dec, but it all seems equivalent, so for vfio parts Acked-by: Alex Williamson > > diff --git a/arch/powerpc/kvm/book3s_64_vio.c > b/arch/powerpc/kvm/book3s_64_vio.c > index 66270e07449a..768b645c7edf 100644 > --- a/arch/powerpc/kvm/book3s_64_vio.c > +++ b/arch/powerpc/kvm/book3s_64_vio.c > @@ -30,6 +30,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -56,43 +57,6 @@ static unsigned long kvmppc_stt_pages(unsigned long > tce_pages) > return tce_pages + ALIGN(stt_bytes, PAGE_SIZE) / PAGE_SIZE; > } > > -static long kvmppc_account_memlimit(unsigned long stt_pages, bool inc) > -{ > - long ret = 0; > - > - if (!current || !current->mm) > - return ret; /* process exited */ > - > - down_write(>mm->mmap_sem); > - > - if (inc) { > - unsigned long locked, lock_limit; > - > - locked = current->mm->locked_vm + stt_pages; > - lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; > - if (locked > lock_limit && !capable(CAP_IPC_LOCK)) > - ret = -ENOMEM; > - else > - current->mm->locked_vm += stt_pages; > - } else { > - if (WARN_ON_ONCE(stt_pages > current->mm->locked_vm)) > - stt_pages = current->mm->locked_vm; > - > - current->mm->locked_vm -= stt_pages; > - } > - > - pr_debug("[%d] RLIMIT_MEMLOCK KVM %c%ld %ld/%ld%s\n", current->pid, > - inc ? '+' : '-', > - stt_pages << PAGE_SHIFT, > - current->mm->locked_vm << PAGE_SHIFT, > - rlimit(RLIMIT_MEMLOCK), > - ret ? " - exceeded" : ""); > - > - up_write(>mm->mmap_sem); > - > - return ret; > -} > - > static void kvm_spapr_tce_iommu_table_free(struct rcu_head *head) > { > struct kvmppc_spapr_tce_iommu_table *stit = container_of(head, > @@ -302,7 +266,7 @@ static int kvm_spapr_tce_release(struct inode *inode, > struct file *filp) > > kvm_put_kvm(stt->kvm); > > - kvmppc_account_memlimit( > + account_locked_vm(current->mm, > kvmppc_stt_pages(kvmppc_tce_pages(stt->size)), false); > call_rcu(>rcu, release_spapr_tce_table); > > @@ -327,7 +291,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm, > return -EINVAL; > > npages = kvmppc_tce_pages(size); > - ret = kvmppc_account_memlimit(kvmppc_stt_pages(npages), true); > + ret = account_locked_vm(current->mm, kvmppc_stt_pages(npages), true); > if (ret) > return ret; > > @@ -373,7 +337,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm, > > kfree(stt); > fail_acct: > - kvmppc_account_memlimit(kvmppc_stt_pages(npages), false); > + account_locked_vm(current->mm, kvmppc_stt_pages(npages), false); > return ret; > } > > diff --git a/arch/powerpc/mm/book3s64/iommu_api.c > b/arch/powerpc/mm/book3s64/iommu_api.c > index 5c521f3924a5..18d22eec0ebd 100644 > --- a/arch/powerpc/mm/book3s64/iommu_api.c > +++
[PATCH] PCI: rpaphp: Avoid a sometimes-uninitialized warning
When building with -Wsometimes-uninitialized, clang warns: drivers/pci/hotplug/rpaphp_core.c:243:14: warning: variable 'fndit' is used uninitialized whenever 'for' loop exits because its condition is false [-Wsometimes-uninitialized] for (j = 0; j < entries; j++) { ^~~ drivers/pci/hotplug/rpaphp_core.c:256:6: note: uninitialized use occurs here if (fndit) ^ drivers/pci/hotplug/rpaphp_core.c:243:14: note: remove the condition if it is always true for (j = 0; j < entries; j++) { ^~~ drivers/pci/hotplug/rpaphp_core.c:233:14: note: initialize the variable 'fndit' to silence this warning int j, fndit; ^ = 0 Looking at the loop in a vacuum as clang would, fndit could be uninitialized if entries was ever zero or the if statement was always true within the loop. Regardless of whether or not this warning is a problem in practice, "found" variables should always be initialized to false so that there is no possibility of undefined behavior. Link: https://github.com/ClangBuiltLinux/linux/issues/504 Fixes: 2fcf3ae508c2 ("hotplug/drc-info: Add code to search ibm,drc-info property") Signed-off-by: Nathan Chancellor --- drivers/pci/hotplug/rpaphp_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/pci/hotplug/rpaphp_core.c b/drivers/pci/hotplug/rpaphp_core.c index bcd5d357ca23..07b56bf2886f 100644 --- a/drivers/pci/hotplug/rpaphp_core.c +++ b/drivers/pci/hotplug/rpaphp_core.c @@ -230,7 +230,7 @@ static int rpaphp_check_drc_props_v2(struct device_node *dn, char *drc_name, struct of_drc_info drc; const __be32 *value; char cell_drc_name[MAX_DRC_NAME_LEN]; - int j, fndit; + int j, fndit = 0; info = of_find_property(dn->parent, "ibm,drc-info", NULL); if (info == NULL) -- 2.22.0.rc2
Re: [PATCH 03/16] mm: simplify gup_fast_permitted
On Mon, Jun 3, 2019 at 9:08 AM Linus Torvalds wrote: > > The new code has no test at all for "nr_pages == 0", afaik. Note that it really is important to check for that, because right now we do if (gup_fast_permitted(start, nr_pages)) { local_irq_save(flags); gup_pgd_range(start, end, write ? FOLL_WRITE : 0, pages, ); local_irq_restore(flags); } and that gup_pgd_range() function *depends* on the range being non-zero, and does pgdp = pgd_offset(current->mm, addr); do { pgd_t pgd = READ_ONCE(*pgdp); ... } while (pgdp++, addr = next, addr != end); Note how a zero range would turn into an infinite range here. And the only check for 0 was that if (nr_pages <= 0) return 0; in get_user_pages_fast() that you removed. (Admittedly, it would be much better to have that check in __get_user_pages_fast() itself, because we do have callers that call the double-underscore version) Now, I sincerely hope that we don't have anybody that passes in a zero nr_pages (or a negative one), but we do actually have a comment saying it's ok. Note that the check for "if (end < start)" not only does not check for 0, it also doesn't really check for negative. It checks for _overflow_. Admittedly most negative values would be expected to overflow, but it's still a very different issue. Maybe you added the check for negative somewhere else (in another patch), but I don't see it. Linus
Re: [PATCH v3 0/6] Prerequisites for NXP LS104xA SMMU enablement
Am 31.05.19 um 19:32 schrieb Laurentiu Tudor: >> -Original Message- >> From: Andreas Färber >> Sent: Friday, May 31, 2019 8:04 PM >> >> Hello Laurentiu, >> >> Am 31.05.19 um 18:46 schrieb Laurentiu Tudor: -Original Message- From: Andreas Färber Sent: Friday, May 31, 2019 7:15 PM Hi Laurentiu, Am 30.05.19 um 16:19 schrieb laurentiu.tu...@nxp.com: > This patch series contains several fixes in preparation for SMMU > support on NXP LS1043A and LS1046A chips. Once these get picked up, > I'll submit the actual SMMU enablement patches consisting in the > required device tree changes. Have you thought through what will happen if this patch ordering is not preserved? In particular, a user installing a future U-Boot update with the DTB bits but booting a stable kernel without this patch series - wouldn't that regress dpaa then for our customers? >>> >>> These are fixes for issues that popped out after enabling SMMU. >>> I do not expect them to break anything. >> >> That was not my question! You're missing my point: All your patches are >> lacking a Fixes header in their commit message, for backporting them, to >> avoid _your DT patches_ breaking the driver on stable branches! > > It does appear that I'm missing your point. For sure, the DT updates solely > will > break the kernel without these fixes but I'm not sure I understand how this > could happen. In short, customers rarely run master branch. Kindly have your colleagues explain stable branches to you in details. With Fixes header I was referring to the syntax explained here: https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes https://www.kernel.org/doc/html/latest/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes > My plan was to share the kernel dts patches sometime after this series > makes it through. That's fine. What I'm warning you is that seemingly your DT patches, once in one of your LSDK U-Boot releases, will cause a regression for distros like our SLES 15 SP1 unless these prereq kernel patches get applied on the respective stable branches. Which will not happen automatically unless you as patch author take the appropriate action before they get merged. Thanks, Andreas -- SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg)
Re: [PATCH 03/16] mm: simplify gup_fast_permitted
On Mon, Jun 3, 2019 at 12:41 AM Christoph Hellwig wrote: > > I only removed a duplicate of it. I don't see any remaining cases. > The full (old) code in get_user_pages_fast() looks like this: > > if (nr_pages <= 0) > return 0; > > if (unlikely(!access_ok((void __user *)start, len))) > return -EFAULT; > > if (gup_fast_permitted(start, nr_pages)) { Yes, and that code was correct. The new code has no test at all for "nr_pages == 0", afaik. Linus
Re: [PATCH 01/16] uaccess: add untagged_addr definition for other arches
On 6/1/19 1:49 AM, Christoph Hellwig wrote: > From: Andrey Konovalov > > To allow arm64 syscalls to accept tagged pointers from userspace, we must > untag them when they are passed to the kernel. Since untagging is done in > generic parts of the kernel, the untagged_addr macro needs to be defined > for all architectures. > > Define it as a noop for architectures other than arm64. Could you reword above sentence? We are already starting off with untagged_addr() not being no-op for arm64 and sparc64. It will expand further potentially. So something more along the lines of "Define it as noop for architectures that do not support memory tagging". The first paragraph in the log can also be rewritten to be not specific to arm64. -- Khalid > > Acked-by: Catalin Marinas > Signed-off-by: Andrey Konovalov > Signed-off-by: Christoph Hellwig > --- > include/linux/mm.h | 4 > 1 file changed, 4 insertions(+) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 0e8834ac32b7..949d43e9c0b6 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -99,6 +99,10 @@ extern int mmap_rnd_compat_bits __read_mostly; > #include > #include > > +#ifndef untagged_addr > +#define untagged_addr(addr) (addr) > +#endif > + > #ifndef __pa_symbol > #define __pa_symbol(x) __pa(RELOC_HIDE((unsigned long)(x), 0)) > #endif >
[PATCH] powerpc/32s: fix booting with CONFIG_PPC_EARLY_DEBUG_BOOTX
When booting through OF, setup_disp_bat() does nothing because disp_BAT are not set. By change, it used to work because BOOTX buffer is mapped 1:1 at address 0x8100 by the bootloader, and btext_setup_display() sets virt addr same as phys addr. But since commit 215b823707ce ("powerpc/32s: set up an early static hash table for KASAN."), a temporary page table overrides the bootloader mapping. This 0x8100 is also problematic with the newly implemented Kernel Userspace Access Protection (KUAP) because it is within user address space. This patch fixes those issues by properly setting disp_BAT through a call to btext_prepare_BAT(), allowing setup_disp_bat() to properly setup BAT3 for early bootx screen buffer access. Reported-by: Mathieu Malaterre Fixes: 215b823707ce ("powerpc/32s: set up an early static hash table for KASAN.") Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/btext.h | 4 arch/powerpc/kernel/prom_init.c| 1 + arch/powerpc/kernel/prom_init_check.sh | 2 +- 3 files changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/btext.h b/arch/powerpc/include/asm/btext.h index 3ffad030393c..461b0f193864 100644 --- a/arch/powerpc/include/asm/btext.h +++ b/arch/powerpc/include/asm/btext.h @@ -13,7 +13,11 @@ extern void btext_update_display(unsigned long phys, int width, int height, int depth, int pitch); extern void btext_setup_display(int width, int height, int depth, int pitch, unsigned long address); +#ifdef CONFIG_PPC32 extern void btext_prepare_BAT(void); +#else +static inline void btext_prepare_BAT(void) { } +#endif extern void btext_map(void); extern void btext_unmap(void); diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c index 3555cad7bdde..ed446b7ea164 100644 --- a/arch/powerpc/kernel/prom_init.c +++ b/arch/powerpc/kernel/prom_init.c @@ -2336,6 +2336,7 @@ static void __init prom_check_displays(void) prom_printf("W=%d H=%d LB=%d addr=0x%x\n", width, height, pitch, addr); btext_setup_display(width, height, 8, pitch, addr); + btext_prepare_BAT(); } #endif /* CONFIG_PPC_EARLY_DEBUG_BOOTX */ } diff --git a/arch/powerpc/kernel/prom_init_check.sh b/arch/powerpc/kernel/prom_init_check.sh index 518d416971c1..160bef0d553d 100644 --- a/arch/powerpc/kernel/prom_init_check.sh +++ b/arch/powerpc/kernel/prom_init_check.sh @@ -24,7 +24,7 @@ fi WHITELIST="add_reloc_offset __bss_start __bss_stop copy_and_flush _end enter_prom $MEM_FUNCS reloc_offset __secondary_hold __secondary_hold_acknowledge __secondary_hold_spinloop __start -logo_linux_clut224 +logo_linux_clut224 btext_prepare_BAT reloc_got2 kernstart_addr memstart_addr linux_banner _stext __prom_init_toc_start __prom_init_toc_end btext_setup_display TOC." -- 2.13.3
Re: [PATCH] dlpar: Fix a missing-check bug in dlpar_parse_cc_property()
On Sun, 2019-05-26 at 02:42:40 UTC, Gen Zhang wrote: > In dlpar_parse_cc_property(), 'prop->name' is allocated by kstrdup(). > kstrdup() may return NULL, so it should be checked and handle error. > And prop should be freed if 'prop->name' is NULL. > > Signed-off-by: Gen Zhang > Acked-by: Nathan Lynch Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/efa9ace68e487ddd29c2b4d6dd232421 cheers
Re: [PATCH] powerpc/powernv: Update firmware archaeology around OPAL_HANDLE_HMI
On Fri, 2019-05-24 at 05:09:56 UTC, Stewart Smith wrote: > The first machines to ship with OPAL firmware all got firmware updates > that have the new call, but just in case someone is foolish enough to > believe the first 4 months of firmware is the best, we keep this code > around. > > Comment is updated to not refer to late 2014 as recent or the future. > > Signed-off-by: Stewart Smith Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/1549c42deff5f3326ae295ae5816 cheers
Re: [PATCH] powerpc/powernv: Show checkstop reason for NPU2 HMIs
On Thu, 2019-05-23 at 12:28:04 UTC, Frederic Barrat wrote: > If the kernel is notified of an HMI caused by the NPU2, it's currently > not being recognized and it logs the default message: > > Unknown Malfunction Alert of type 3 > > The NPU on Power 9 has 3 Fault Isolation Registers, so that's a lot of > possible causes, but we should at least log that it's an NPU problem > and report which FIR and which bit were raised if opal gave us the > information. > > Signed-off-by: Frederic Barrat > Reviewed-by: Andrew Donnellan Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/89d87bcba2874d824affb7842bb3960c cheers
Re: [PATCH] powerpc: Remove variable ‘path’ since not used
On Thu, 2019-05-23 at 10:25:20 UTC, Mathieu Malaterre wrote: > In commit eab00a208eb6 ("powerpc: Move `path` variable inside > DEBUG_PROM") DEBUG_PROM sentinels were added to silence a warning > (treated as error with W=1): > > arch/powerpc/kernel/prom_init.c:1388:8: error: variable ‘path’ set but not > used [-Werror=unused-but-set-variable] > > Rework the original patch and simplify the code, by removing the > variable ‘path’ completely. Fix line over 90 characters. > > Suggested-by: Michael Ellerman > Signed-off-by: Mathieu Malaterre Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/c806a6fde1c29e7419afcf94d761827a cheers
Re: [PATCH] powerpc/pseries: Fix xive=off command line
On Wed, 2019-05-15 at 10:05:01 UTC, Greg Kurz wrote: > On POWER9, if the hypervisor supports XIVE exploitation mode, the guest OS > will unconditionally requests for the XIVE interrupt mode even if XIVE was > deactivated with the kernel command line xive=off. Later on, when the spapr > XIVE init code handles xive=off, it disables XIVE and tries to fall back on > the legacy mode XICS. > > This discrepency causes a kernel panic because the hypervisor is configured > to provide the XIVE interrupt mode to the guest : > > [0.008837] kernel BUG at arch/powerpc/sysdev/xics/xics-common.c:135! > [0.008877] Oops: Exception in kernel mode, sig: 5 [#1] > [0.008908] LE SMP NR_CPUS=1024 NUMA pSeries > [0.008939] Modules linked in: > [0.008964] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW > 5.0.13-200.fc29.ppc64le #1 > [0.009018] NIP: c1029ab8 LR: c1029aac CTR: > c18e > [0.009065] REGS: c007f96d7900 TRAP: 0700 Tainted: GW > (5.0.13-200.fc29.ppc64le) > [0.009119] MSR: 82029033 CR: > 28000222 XER: 2004 > [0.009168] CFAR: c01b1e28 IRQMASK: 0 > [0.009168] GPR00: c1029aac c007f96d7b90 c15e8600 > > [0.009168] GPR04: 0001 0061 > 646f6d61696e0d0a > [0.009168] GPR08: 0007fd8f 0001 c14c44c0 > c007f96d76cf > [0.009168] GPR12: c18e 0001 > > [0.009168] GPR16: 0001 c007f96d7c08 > c16903d0 > [0.009168] GPR20: c007fffe04e8 ffea c1620164 > c161fe58 > [0.009168] GPR24: c0ea6c88 c11151a8 006000c0 > c007f96d7c34 > [0.009168] GPR28: c14b286c c1115180 > c161dc70 > [0.009558] NIP [c1029ab8] xics_smp_probe+0x38/0x98 > [0.009590] LR [c1029aac] xics_smp_probe+0x2c/0x98 > [0.009622] Call Trace: > [0.009639] [c007f96d7b90] [c1029aac] xics_smp_probe+0x2c/0x98 > (unreliable) > [0.009687] [c007f96d7bb0] [c1033404] > pSeries_smp_probe+0x40/0xa0 > [0.009734] [c007f96d7bd0] [c10212a4] > smp_prepare_cpus+0x62c/0x6ec > [0.009782] [c007f96d7cf0] [c10141b8] > kernel_init_freeable+0x148/0x448 > [0.009829] [c007f96d7db0] [c0010ba4] kernel_init+0x2c/0x148 > [0.009870] [c007f96d7e20] [c000bdd4] > ret_from_kernel_thread+0x5c/0x68 > [0.009916] Instruction dump: > [0.009940] 7c0802a6 6000 7c0802a6 3882 f8010010 f821ffe1 3c62001c > e863b9a0 > [0.009988] 4b1882d1 6000 7c690034 5529d97e <0b09> 3d22001c > e929b998 3ce2ff8f > > Look for xive=off during prom_init and don't ask for XIVE in this case. One > exception though: if the host only supports XIVE, we still want to boot so > we ignore xive=off. > > Similarly, have the spapr XIVE init code to looking at the interrupt mode > negociated during CAS, and ignore xive=off if the hypervisor only supports > XIVE. > > Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt > controller") > Cc: sta...@vger.kernel.org # v4.20 > Reported-by: Pavithra R. Prakash > Signed-off-by: Greg Kurz > Reviewed-by: Cédric Le Goater Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/a3bf9fbdad600b1e4335dd90979f8d60 cheers
Re: [PATCH 1/2] powerpc/lib: fix redundant inclusion of quad.o
On Mon, 2019-05-13 at 10:00:14 UTC, Christophe Leroy wrote: > quad.o is only for PPC64, and already included in obj64-y, > so it doesn't have to be in obj-y > > Fixes: 31bfdb036f12 ("powerpc: Use instruction emulation infrastructure to > handle alignment faults") > Signed-off-by: Christophe Leroy Series applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/f8e0d0fddf87f26aed576983f752329d cheers
Re: [PATCH -next] misc: ocxl: Make ocxl_remove static
On Sat, 2019-05-04 at 10:27:20 UTC, YueHaibing wrote: > Fix sparse warning: > > drivers/misc/ocxl/pci.c:44:6: warning: > symbol 'ocxl_remove' was not declared. Should it be static? > > Reported-by: Hulk Robot > Signed-off-by: YueHaibing > Acked-by: Andrew Donnellan Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/00b0cdbbc87fb4b531a0d75f4004ed3d cheers
Re: [PATCH -next] powerpc/book3s64: Make some symbols static
On Sat, 2019-05-04 at 10:24:27 UTC, YueHaibing wrote: > Fix sparse warnings: > > arch/powerpc/mm/book3s64/radix_pgtable.c:326:13: warning: symbol > 'radix_init_pgtable' was not declared. Should it be static? > arch/powerpc/mm/book3s64/hash_native.c:48:1: warning: symbol > 'native_tlbie_lock' was not declared. Should it be static? > arch/powerpc/mm/book3s64/hash_utils.c:988:24: warning: symbol > 'init_hash_mm_context' was not declared. Should it be static? > > Signed-off-by: YueHaibing Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/d667edc01bedcd23988ef69f851e7cc2 cheers
Re: [PATCH] powerpc/powernv/npu: Fix reference leak
On Fri, 2019-04-19 at 15:34:13 UTC, Greg Kurz wrote: > Since 902bdc57451c, get_pci_dev() calls pci_get_domain_bus_and_slot(). This > has the effect of incrementing the reference count of the PCI device, as > explained in drivers/pci/search.c: > > * Given a PCI domain, bus, and slot/function number, the desired PCI > * device is located in the list of PCI devices. If the device is > * found, its reference count is increased and this function returns a > * pointer to its data structure. The caller must decrement the > * reference count by calling pci_dev_put(). If no device is found, > * %NULL is returned. > > Nothing was done to call pci_dev_put() and the reference count of GPU and > NPU PCI devices rockets up. > > A natural way to fix this would be to teach the callers about the change, > so that they call pci_dev_put() when done with the pointer. This turns > out to be quite intrusive, as it affects many paths in npu-dma.c, > pci-ioda.c and vfio_pci_nvlink2.c. Also, the issue appeared in 4.16 and > some affected code got moved around since then: it would be problematic > to backport the fix to stable releases. > > All that code never cared for reference counting anyway. Call pci_dev_put() > from get_pci_dev() to revert to the previous behavior. > > Fixes: 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev from > pci_dn") > Cc: sta...@vger.kernel.org # v4.16 > Signed-off-by: Greg Kurz > Reviewed-by: Alexey Kardashevskiy Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/02c5f5394918b9b47ff4357b1b183357 cheers
Re: [PATCH v8 3/7] s390/pci: add support for IOMMU default DMA mode build options
On Thu, 30 May 2019, Zhen Lei wrote: > The default DMA mode is LAZY on s390, this patch make it can be set to > STRICT at build time. It can be overridden by boot option. > > There is no functional change. > > Signed-off-by: Zhen Lei Acked-by: Sebastian Ott
Re: [PATCH 22/22] docs: fix broken documentation links
Le 30/05/2019 à 01:23, Mauro Carvalho Chehab a écrit : Mostly due to x86 and acpi conversion, several documentation links are still pointing to the old file. Fix them. Signed-off-by: Mauro Carvalho Chehab --- Documentation/acpi/dsd/leds.txt | 2 +- Documentation/admin-guide/kernel-parameters.rst | 6 +++--- Documentation/admin-guide/kernel-parameters.txt | 16 Documentation/admin-guide/ras.rst| 2 +- .../devicetree/bindings/net/fsl-enetc.txt| 7 +++ .../bindings/pci/amlogic,meson-pcie.txt | 2 +- .../bindings/regulator/qcom,rpmh-regulator.txt | 2 +- Documentation/devicetree/booting-without-of.txt | 2 +- Documentation/driver-api/gpio/board.rst | 2 +- Documentation/driver-api/gpio/consumer.rst | 2 +- .../firmware-guide/acpi/enumeration.rst | 2 +- .../firmware-guide/acpi/method-tracing.rst | 2 +- Documentation/i2c/instantiating-devices | 2 +- Documentation/sysctl/kernel.txt | 4 ++-- .../translations/it_IT/process/howto.rst | 2 +- .../it_IT/process/stable-kernel-rules.rst| 4 ++-- .../translations/zh_CN/process/4.Coding.rst | 2 +- Documentation/x86/x86_64/5level-paging.rst | 2 +- Documentation/x86/x86_64/boot-options.rst| 4 ++-- .../x86/x86_64/fake-numa-for-cpusets.rst | 2 +- MAINTAINERS | 6 +++--- arch/arm/Kconfig | 2 +- arch/arm64/kernel/kexec_image.c | 2 +- arch/powerpc/Kconfig | 2 +- arch/x86/Kconfig | 16 arch/x86/Kconfig.debug | 2 +- arch/x86/boot/header.S | 2 +- arch/x86/entry/entry_64.S| 2 +- arch/x86/include/asm/bootparam_utils.h | 2 +- arch/x86/include/asm/page_64_types.h | 2 +- arch/x86/include/asm/pgtable_64_types.h | 2 +- arch/x86/kernel/cpu/microcode/amd.c | 2 +- arch/x86/kernel/kexec-bzimage64.c| 2 +- arch/x86/kernel/pci-dma.c| 2 +- arch/x86/mm/tlb.c| 2 +- arch/x86/platform/pvh/enlighten.c| 2 +- drivers/acpi/Kconfig | 10 +- drivers/net/ethernet/faraday/ftgmac100.c | 2 +- .../fieldbus/Documentation/fieldbus_dev.txt | 4 ++-- drivers/vhost/vhost.c| 2 +- include/acpi/acpi_drivers.h | 2 +- include/linux/fs_context.h | 2 +- include/linux/lsm_hooks.h| 2 +- mm/Kconfig | 2 +- security/Kconfig | 2 +- tools/include/linux/err.h| 2 +- tools/objtool/Documentation/stack-validation.txt | 4 ++-- tools/testing/selftests/x86/protection_keys.c| 2 +- 48 files changed, 77 insertions(+), 78 deletions(-) [...] diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 8c1c636308c8..e868d2bd48b8 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -898,7 +898,7 @@ config PPC_MEM_KEYS page-based protections, but without requiring modification of the page tables when an application changes protection domains. - For details, see Documentation/vm/protection-keys.rst + For details, see Documentation/x86/protection-keys.rst It looks strange to reference an x86 file, for powerpc arch. Christophe
Re: [PATCH v3 14/16] powerpc/32: implement fast entry for syscalls on BOOKE
On 05/28/2019 05:03 PM, Christophe Leroy wrote: Michael Ellerman a écrit : Christophe Leroy writes: Le 23/05/2019 à 09:00, Christophe Leroy a écrit : [...] arch/powerpc/kernel/head_fsl_booke.o: In function `SystemCall': arch/powerpc/kernel/head_fsl_booke.S:416: undefined reference to `kvmppc_handler_BOOKE_INTERRUPT_SYSCALL_SPRN_SRR1' Makefile:1052: recipe for target 'vmlinux' failed +.macro SYSCALL_ENTRY trapno intno + mfspr r10, SPRN_SPRG_THREAD +#ifdef CONFIG_KVM_BOOKE_HV +BEGIN_FTR_SECTION + mtspr SPRN_SPRG_WSCRATCH0, r10 + stw r11, THREAD_NORMSAVE(0)(r10) + stw r13, THREAD_NORMSAVE(2)(r10) + mfcr r13 /* save CR in r13 for now */ + mfspr r11, SPRN_SRR1 + mtocrf 0x80, r11 /* check MSR[GS] without clobbering reg */ + bf 3, 1975f + b kvmppc_handler_BOOKE_INTERRUPT_\intno\()_SPRN_SRR1 It seems to me that the "_SPRN_SRR1" on the end of this line isn't meant to be there... However, it still fails to link with that removed. It looks like I missed the macro expansion. The called function should be kvmppc_handler_8_0x01B Seems like kisskb doesn't build any config like this. I thought we did, ie: http://kisskb.ellerman.id.au/kisskb/buildresult/13817941/ That's a ppc64 config it seems. The problem was on booke32. Christophe But clearly something is missing to trigger the bug. I was able to trigger the bug with mpc85xx_defconfig + CONFIG_VIRTUALIZATION + CONFIG_PPC_E500MC The bug pops up when CONFIG_KVM_BOOKE_HV is set. Christophe cheers
Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
On 6/3/19 3:35 AM, Alexey Kardashevskiy wrote: On 03/06/2019 15:02, Alexey Kardashevskiy wrote: On 03/06/2019 12:23, Shawn Anastasio wrote: On 5/30/19 10:56 PM, Alexey Kardashevskiy wrote: On 31/05/2019 08:49, Shawn Anastasio wrote: On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote: On 28/05/2019 17:39, Shawn Anastasio wrote: On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote: On 28/05/2019 15:36, Oliver wrote: On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio wrote: Introduce a new pcibios function pcibios_ignore_alignment_request which allows the PCI core to defer to platform-specific code to determine whether or not to ignore alignment requests for PCI resources. The existing behavior is to simply ignore alignment requests when PCI_PROBE_ONLY is set. This is behavior is maintained by the default implementation of pcibios_ignore_alignment_request. Signed-off-by: Shawn Anastasio --- drivers/pci/pci.c | 9 +++-- include/linux/pci.h | 1 + 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 8abc843b1615..8207a09085d1 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void) return 0; } +int __weak pcibios_ignore_alignment_request(void) +{ + return pci_has_flag(PCI_PROBE_ONLY); +} + #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; static DEFINE_SPINLOCK(resource_alignment_lock); @@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev, p = resource_alignment_param; if (!*p && !align) goto out; - if (pci_has_flag(PCI_PROBE_ONLY)) { + if (pcibios_ignore_alignment_request()) { align = 0; - pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n"); + pr_info_once("PCI: Ignoring requested alignments\n"); goto out; } I think the logic here is questionable to begin with. If the user has explicitly requested re-aligning a resource via the command line then we should probably do it even if PCI_PROBE_ONLY is set. When it breaks they get to keep the pieces. That said, the real issue here is that PCI_PROBE_ONLY probably shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM) hotplugged devices are configured by firmware before it's passed to the guest and we need to keep the FW assignments otherwise things break. QEMU however doesn't do any BAR assignments and relies on that being handled by the guest. At boot time this is done by SLOF, but Linux only keeps SLOF around until it's extracted the device-tree. Once that's done SLOF gets blown away and the kernel needs to do it's own BAR assignments. I'm guessing there's a hack in there to make it work today, but it's a little surprising that it works at all... The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the guest which receives an event from qemu (RAS_EPOW from /proc/interrupts), fetches device tree chunks (and as I understand it - they come with BARs from phyp but without from qemu) and writes "1" to "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually: Interesting. Does this mean that the PHYP hotplug path doesn't call pci_assign_resource? I'd expect dlpar_add_slot() to be called under phyp and eventually pci_device_add() which (I think) may or may not trigger later reassignment. If so it means the patch may not break that platform after all, though it still may not be the correct way of doing things. We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems that (unless resource_alignment= is used) the pseries guest should just walk through all allocated resources and leave them unchanged. If we add a pcibios_default_alignment() implementation like was suggested earlier, then it will behave as if the user has specified resource_alignment= by default and SLOF's assignments won't be honored (I think). I removed pci_add_flags(PCI_PROBE_ONLY) from pSeries_setup_arch and tried booting with and without pci=resource_alignment= and I can see no difference - BARs are still aligned to 64K as programmed in SLOF; if I hack SLOF to align to 4K or 32K - BARs get packed and the guest leaves them unchanged. I guess it boils down to one question - is it important that we observe SLOF's initial BAR assignments? It isn't if it's SLOF but it is if it's phyp. It used to not allow/support BAR reassignment and even if it does not, I'd rather avoid touching them. A quick update. I tried removing pci_add_flags(PCI_PROBE_ONLY) which worked, but if I add an implementation of pcibios_default_alignment which simply returns PAGE_SIZE, my VM fails to boot and many errors from the virtio disk driver are printed to the console. After some
Re: [PATCH v3] powerpc/pseries: Fix cpu_hotplug_lock acquisition in resize_hpt()
Hi, On Wed, May 15, 2019 at 01:15:52PM +0530, Gautham R. Shenoy wrote: > From: "Gautham R. Shenoy" > > The calls to arch_add_memory()/arch_remove_memory() are always made > with the read-side cpu_hotplug_lock acquired via > memory_hotplug_begin(). On pSeries, > arch_add_memory()/arch_remove_memory() eventually call resize_hpt() > which in turn calls stop_machine() which acquires the read-side > cpu_hotplug_lock again, thereby resulting in the recursive acquisition > of this lock. A clarification regarding why we hadn't observed this problem earlier. In the absence of CONFIG_PROVE_LOCKING, we hadn't observed a system lockup during a memory hotplug operation because cpus_read_lock() is a per-cpu rwsem read, which, in the fast-path (in the absence of the writer, which in our case is a CPU-hotplug operation) simply increments the read_count on the semaphore. Thus a recursive read in the fast-path doesn't cause any problems. However, we can hit this problem in practice if there is a concurrent CPU-Hotplug operation in progress which is waiting to acquire the write-side of the lock. This will cause the second recursive read to block until the writer finishes. While the writer is blocked since the first read holds the lock. Thus both the reader as well as the writers fail to make any progress thereby blocking both CPU-Hotplug as well as Memory Hotplug operations. Memory-Hotplug CPU-Hotplug CPU 0 CPU 1 -- -- 1. down_read(cpu_hotplug_lock.rw_sem) [memory_hotplug_begin] 2. down_write(cpu_hotplug_lock.rw_sem) [cpu_up/cpu_down] 3. down_read(cpu_hotplug_lock.rw_sem) [stop_machine()] > > Lockdep complains as follows in these code-paths. > > swapper/0/1 is trying to acquire lock: > (ptrval) (cpu_hotplug_lock.rw_sem){}, at: stop_machine+0x2c/0x60 > > but task is already holding lock: > (ptrval) (cpu_hotplug_lock.rw_sem){}, at: > mem_hotplug_begin+0x20/0x50 > > other info that might help us debug this: > Possible unsafe locking scenario: > > CPU0 > >lock(cpu_hotplug_lock.rw_sem); >lock(cpu_hotplug_lock.rw_sem); > > *** DEADLOCK *** > > May be due to missing lock nesting notation > > 3 locks held by swapper/0/1: > #0: (ptrval) (>mutex){}, at: __driver_attach+0x12c/0x1b0 > #1: (ptrval) (cpu_hotplug_lock.rw_sem){}, at: > mem_hotplug_begin+0x20/0x50 > #2: (ptrval) (mem_hotplug_lock.rw_sem){}, at: > percpu_down_write+0x54/0x1a0 > > stack backtrace: > CPU: 0 PID: 1 Comm: swapper/0 Not tainted > 5.0.0-rc5-58373-gbc99402235f3-dirty #166 > Call Trace: > [c000feb03150] [c0e32bd4] dump_stack+0xe8/0x164 (unreliable) > [c000feb031a0] [c020d6c0] __lock_acquire+0x1110/0x1c70 > [c000feb03320] [c020f080] lock_acquire+0x240/0x290 > [c000feb033e0] [c017f554] cpus_read_lock+0x64/0xf0 > [c000feb03420] [c029ebac] stop_machine+0x2c/0x60 > [c000feb03460] [c00d7f7c] pseries_lpar_resize_hpt+0x19c/0x2c0 > [c000feb03500] [c00788d0] resize_hpt_for_hotplug+0x70/0xd0 > [c000feb03570] [c0e5d278] arch_add_memory+0x58/0xfc > [c000feb03610] [c03553a8] devm_memremap_pages+0x5e8/0x8f0 > [c000feb036c0] [c09c2394] pmem_attach_disk+0x764/0x830 > [c000feb037d0] [c09a7c38] nvdimm_bus_probe+0x118/0x240 > [c000feb03860] [c0968500] really_probe+0x230/0x4b0 > [c000feb038f0] [c0968aec] driver_probe_device+0x16c/0x1e0 > [c000feb03970] [c0968ca8] __driver_attach+0x148/0x1b0 > [c000feb039f0] [c09650b0] bus_for_each_dev+0x90/0x130 > [c000feb03a50] [c0967dd4] driver_attach+0x34/0x50 > [c000feb03a70] [c0967068] bus_add_driver+0x1a8/0x360 > [c000feb03b00] [c096a498] driver_register+0x108/0x170 > [c000feb03b70] [c09a7400] __nd_driver_register+0xd0/0xf0 > [c000feb03bd0] [c128aa90] nd_pmem_driver_init+0x34/0x48 > [c000feb03bf0] [c0010a10] do_one_initcall+0x1e0/0x45c > [c000feb03cd0] [c122462c] kernel_init_freeable+0x540/0x64c > [c000feb03db0] [c001110c] kernel_init+0x2c/0x160 > [c000feb03e20] [c000bed4] ret_from_kernel_thread+0x5c/0x68 > > Fix this issue by > 1) Requiring all the calls to pseries_lpar_resize_hpt() be made > with cpu_hotplug_lock held. > > 2) In pseries_lpar_resize_hpt() invoke stop_machine_cpuslocked() > as a consequence of 1) > > 3) To satisfy 1), in hpt_order_set(), call mmu_hash_ops.resize_hpt() > with cpu_hotplug_lock held. > > Reported-by: Aneesh Kumar K.V > Signed-off-by: Gautham R. Shenoy > --- > v2 -> v3 : Updated the comment for pseries_lpar_resize_hpt() >Updated the commit-log with the full
Re: [PATCH] scsi: ibmvscsi: Don't use rc uninitialized in ibmvscsi_do_work
Le 31/05/2019 à 20:53, Nathan Chancellor a écrit : clang warns: drivers/scsi/ibmvscsi/ibmvscsi.c:2126:7: warning: variable 'rc' is used uninitialized whenever switch case is taken [-Wsometimes-uninitialized] case IBMVSCSI_HOST_ACTION_NONE: ^ drivers/scsi/ibmvscsi/ibmvscsi.c:2151:6: note: uninitialized use occurs here if (rc) { ^~ Initialize rc to zero so that the atomic_set and dev_err statement don't trigger for the cases that just break. Fixes: 035a3c4046b5 ("scsi: ibmvscsi: redo driver work thread to use enum action states") Link: https://github.com/ClangBuiltLinux/linux/issues/502 Signed-off-by: Nathan Chancellor --- drivers/scsi/ibmvscsi/ibmvscsi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c index 727c31dc11a0..6714d8043e62 100644 --- a/drivers/scsi/ibmvscsi/ibmvscsi.c +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c @@ -2118,7 +2118,7 @@ static unsigned long ibmvscsi_get_desired_dma(struct vio_dev *vdev) static void ibmvscsi_do_work(struct ibmvscsi_host_data *hostdata) { unsigned long flags; - int rc; + int rc = 0; I don't think the above is the best solution, as it hides the warning instead of really fixing it. Your problem is that some legs of the switch are missing setting the value of rc, it would therefore be better to fix the legs instead of setting a default value which may not be correct for every case, allthough it may be at the time being. Christophe char *action = "reset"; spin_lock_irqsave(hostdata->host->host_lock, flags);
Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request
On 03/06/2019 15:02, Alexey Kardashevskiy wrote: > > > On 03/06/2019 12:23, Shawn Anastasio wrote: >> >> >> On 5/30/19 10:56 PM, Alexey Kardashevskiy wrote: >>> >>> >>> On 31/05/2019 08:49, Shawn Anastasio wrote: On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote: > > > On 28/05/2019 17:39, Shawn Anastasio wrote: >> >> >> On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote: >>> >>> >>> On 28/05/2019 15:36, Oliver wrote: On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio wrote: > > Introduce a new pcibios function pcibios_ignore_alignment_request > which allows the PCI core to defer to platform-specific code to > determine whether or not to ignore alignment requests for PCI > resources. > > The existing behavior is to simply ignore alignment requests when > PCI_PROBE_ONLY is set. This is behavior is maintained by the > default implementation of pcibios_ignore_alignment_request. > > Signed-off-by: Shawn Anastasio > --- > drivers/pci/pci.c | 9 +++-- > include/linux/pci.h | 1 + > 2 files changed, 8 insertions(+), 2 deletions(-) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index 8abc843b1615..8207a09085d1 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -5882,6 +5882,11 @@ resource_size_t __weak > pcibios_default_alignment(void) > return 0; > } > > +int __weak pcibios_ignore_alignment_request(void) > +{ > + return pci_has_flag(PCI_PROBE_ONLY); > +} > + > #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE > static char > resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0}; > static DEFINE_SPINLOCK(resource_alignment_lock); > @@ -5906,9 +5911,9 @@ static resource_size_t > pci_specified_resource_alignment(struct pci_dev *dev, > p = resource_alignment_param; > if (!*p && !align) > goto out; > - if (pci_has_flag(PCI_PROBE_ONLY)) { > + if (pcibios_ignore_alignment_request()) { > align = 0; > - pr_info_once("PCI: Ignoring requested alignments > (PCI_PROBE_ONLY)\n"); > + pr_info_once("PCI: Ignoring requested > alignments\n"); > goto out; > } I think the logic here is questionable to begin with. If the user has explicitly requested re-aligning a resource via the command line then we should probably do it even if PCI_PROBE_ONLY is set. When it breaks they get to keep the pieces. That said, the real issue here is that PCI_PROBE_ONLY probably shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM) hotplugged devices are configured by firmware before it's passed to the guest and we need to keep the FW assignments otherwise things break. QEMU however doesn't do any BAR assignments and relies on that being handled by the guest. At boot time this is done by SLOF, but Linux only keeps SLOF around until it's extracted the device-tree. Once that's done SLOF gets blown away and the kernel needs to do it's own BAR assignments. I'm guessing there's a hack in there to make it work today, but it's a little surprising that it works at all... >>> >>> >>> The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the >>> guest which receives an event from qemu (RAS_EPOW from >>> /proc/interrupts), fetches device tree chunks (and as I understand >>> it - >>> they come with BARs from phyp but without from qemu) and writes >>> "1" to >>> "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually: >> >> Interesting. Does this mean that the PHYP hotplug path doesn't >> call pci_assign_resource? > > > I'd expect dlpar_add_slot() to be called under phyp and eventually > pci_device_add() which (I think) may or may not trigger later > reassignment. > > >> If so it means the patch may not >> break that platform after all, though it still may not be >> the correct way of doing things. > > > We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems > that (unless resource_alignment= is used) the pseries guest should just > walk through all allocated resources and leave them unchanged. If we add a pcibios_default_alignment() implementation like was suggested earlier, then it will behave as if the user has specified resource_alignment= by
[PATCH v3] powerpc: fix kexec failure on book3s/32
In the old days, _PAGE_EXEC didn't exist on 6xx aka book3s/32. Therefore, allthough __mapin_ram_chunk() was already mapping kernel text with PAGE_KERNEL_TEXT and the rest with PAGE_KERNEL, the entire memory was executable. Part of the memory (first 512kbytes) was mapped with BATs instead of page table, but it was also entirely mapped as executable. In commit 385e89d5b20f ("powerpc/mm: add exec protection on powerpc 603"), we started adding exec protection to some 6xx, namely the 603, for pages mapped via pagetables. Then, in commit 63b2bc619565 ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX"), the exec protection was extended to BAT mapped memory, so that really only the kernel text could be executed. The problem here is that kexec is based on copying some code into upper part of memory then executing it from there in order to install a fresh new kernel at its definitive location. However, the code is position independant and first part of it is just there to deactivate the MMU and jump to the second part. So it is possible to run this first part inplace instead of running the copy. Once the MMU is off, there is no protection anymore and the second part of the code will just run as before. Reported-by: Aaro Koskinen Fixes: 63b2bc619565 ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX") Cc: sta...@vger.kernel.org Signed-off-by: Christophe Leroy --- Aaro, can you test this patch ? Thanks. arch/powerpc/include/asm/kexec.h | 3 +++ arch/powerpc/kernel/machine_kexec_32.c | 4 +++- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index 4a585cba1787..c68476818753 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -94,6 +94,9 @@ static inline bool kdump_in_progress(void) return crashing_cpu >= 0; } +void relocate_new_kernel(unsigned long indirection_page, unsigned long reboot_code_buffer, +unsigned long start_address) __noreturn; + #ifdef CONFIG_KEXEC_FILE extern const struct kexec_file_ops kexec_elf64_ops; diff --git a/arch/powerpc/kernel/machine_kexec_32.c b/arch/powerpc/kernel/machine_kexec_32.c index affe5dcce7f4..2b160d68db49 100644 --- a/arch/powerpc/kernel/machine_kexec_32.c +++ b/arch/powerpc/kernel/machine_kexec_32.c @@ -30,7 +30,6 @@ typedef void (*relocate_new_kernel_t)( */ void default_machine_kexec(struct kimage *image) { - extern const unsigned char relocate_new_kernel[]; extern const unsigned int relocate_new_kernel_size; unsigned long page_list; unsigned long reboot_code_buffer, reboot_code_buffer_phys; @@ -58,6 +57,9 @@ void default_machine_kexec(struct kimage *image) reboot_code_buffer + KEXEC_CONTROL_PAGE_SIZE); printk(KERN_INFO "Bye!\n"); + if (!IS_ENABLED(CONFIG_FSL_BOOKE) && !IS_ENABLED(CONFIG_44x)) + relocate_new_kernel(page_list, reboot_code_buffer_phys, image->start); + /* now call it */ rnk = (relocate_new_kernel_t) reboot_code_buffer; (*rnk)(page_list, reboot_code_buffer_phys, image->start); -- 2.13.3
Re: [PATCH BACKPORTv2 4.19, 5.0, 5.1] crypto: vmx - ghash: do nosimd fallback manually
On Mon, Jun 03, 2019 at 12:08:48PM +1000, Daniel Axtens wrote: > commit 357d065a44cdd77ed5ff35155a989f2a763e96ef upstream. > [backported: the VMX driver did not use crypto_simd_usable() until > after 5.1] > > VMX ghash was using a fallback that did not support interleaving simd > and nosimd operations, leading to failures in the extended test suite. > > If I understood correctly, Eric's suggestion was to use the same > data format that the generic code uses, allowing us to call into it > with the same contexts. I wasn't able to get that to work - I think > there's a very different key structure and data layout being used. > > So instead steal the arm64 approach and perform the fallback > operations directly if required. > > Fixes: cc333cd68dfa ("crypto: vmx - Adding GHASH routines for VMX module") > Cc: sta...@vger.kernel.org # v4.1+ > Reported-by: Eric Biggers > Signed-off-by: Daniel Axtens > Acked-by: Ard Biesheuvel > Tested-by: Michael Ellerman > Signed-off-by: Herbert Xu > Signed-off-by: Daniel Axtens > --- > > v2: do stable backport form correctly. Thanks for all of these, now queued up. greg k-h
Re: [PATCH 10/16] sparc64: use the generic get_user_pages_fast code
On Sun, Jun 02, 2019 at 03:39:48PM +0800, Hillf Danton wrote: > > Hi Christoph > > On Sat, 1 Jun 2019 09:49:53 +0200 Christoph Hellwig wrote: > > > > diff --git a/arch/sparc/include/asm/pgtable_64.h > > b/arch/sparc/include/asm/pgtable_64.h > > index a93eca29e85a..2301ab5250e4 100644 > > --- a/arch/sparc/include/asm/pgtable_64.h > > +++ b/arch/sparc/include/asm/pgtable_64.h > > @@ -1098,6 +1098,24 @@ static inline unsigned long untagged_addr(unsigned > > long start) > > } > > #define untagged_addr untagged_addr > > > > +static inline bool pte_access_permitted(pte_t pte, bool write) > > +{ > > + u64 prot; > > + > > + if (tlb_type == hypervisor) { > > + prot = _PAGE_PRESENT_4V | _PAGE_P_4V; > > + if (prot) > > Feel free to correct me if I misread or miss anything. > It looks like a typo: s/prot/write/, as checking _PAGE_PRESENT_4V and > _PAGE_P_4V makes prot always have _PAGE_WRITE_4V set, regardless of write. True, the if prot should be if write.
Re: [PATCH 08/16] sparc64: add the missing pgd_page definition
On Sat, Jun 01, 2019 at 09:28:54AM -0700, Linus Torvalds wrote: > Both sparc64 and sh had this pattern, but now that I look at it more > closely, I think your version is wrong, or at least nonoptimal. I bet it is. Then again these symbols are just required for the code to compile, as neither sparc64 nor sh actually use the particular variant of huge pages we need it for. Then again even actually dead code should better be not too buggy if it isn't just a stub. > So I thgink this would be better done with > > #define pgd_page(pgd)pfn_to_page(pgd_pfn(pgd)) > > where that "pgd_pfn()" would need to be a new (but likely very > trivial) function. That's what we do for pte_pfn(). > > IOW, it would likely end up something like > > #define pgd_to_pfn(pgd) (pgd_val(x) >> PFN_PGD_SHIFT) True. I guess it would be best if we could get most if not all architectures to use common versions of these macros so that we have the issue settled once.
Re: [RFC PATCH] powerpc/64s: __find_linux_pte synchronization vs pmdp_invalidate
Le 03/06/2019 à 08:44, Nicholas Piggin a écrit : The pmd_none check does not catch hugepage collapse, nor does the pmd_present check in pmd_trans_huge, because hugepage collapse sets !_PAGE_PRESENT && _PAGE_INVALID (which results in !pmd_none and pmd_present). Aneesh noticed we might need this check as well. --- arch/powerpc/mm/pgtable.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c index db4a6253df92..7a702d21400a 100644 --- a/arch/powerpc/mm/pgtable.c +++ b/arch/powerpc/mm/pgtable.c @@ -372,13 +372,20 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea, pdshift = PMD_SHIFT; pmdp = pmd_offset(, ea); pmd = READ_ONCE(*pmdp); - /* -* A hugepage collapse is captured by pmd_none, because -* it mark the pmd none and do a hpte invalidate. -*/ + if (pmd_none(pmd)) return NULL; +#ifdef CONFIG_PPC_BOOK3S_64 I can't see anything that would build fail on other subarches. Wouldn't be better to use if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && (pmd_val(pmd) & (_PAGE_PRESENT|_PAGE_INVALID) == _PAGE_INVALID)) + if (pmd_val(pmd) & (_PAGE_PRESENT|_PAGE_INVALID) == _PAGE_INVALID) { Maybe using pmd_raw() instead as in all the book3s64 helpers ? Christophe + /* +* A hugepage collapse is captured by this condition, see +* pmdp_invalidate. +*/ + return NULL; + } +#endif + if (pmd_trans_huge(pmd) || pmd_devmap(pmd)) { if (is_thp) *is_thp = true;
Re: [PATCH 03/16] mm: simplify gup_fast_permitted
On Sat, Jun 01, 2019 at 09:14:17AM -0700, Linus Torvalds wrote: > On Sat, Jun 1, 2019 at 12:50 AM Christoph Hellwig wrote: > > > > Pass in the already calculated end value instead of recomputing it, and > > leave the end > start check in the callers instead of duplicating them > > in the arch code. > > Good cleanup, except it's wrong. > > > - if (nr_pages <= 0) > > + if (end < start) > > return 0; > > You moved the overflow test to generic code - good. > > You removed the sign and zero test on nr_pages - bad. I only removed a duplicate of it. The full (old) code in get_user_pages_fast() looks like this: if (nr_pages <= 0) return 0; if (unlikely(!access_ok((void __user *)start, len))) return -EFAULT; if (gup_fast_permitted(start, nr_pages)) {
Re: [PATCH] powerpc/64s: Fix THP PMD collapse serialisation
Aneesh Kumar K.V's on June 3, 2019 4:43 pm: > On 6/3/19 11:35 AM, Nicholas Piggin wrote: >> Commit 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion >> in pte helpers") changed the actual bitwise tests in pte_access_permitted >> by using pte_write() and pte_present() helpers rather than raw bitwise >> testing _PAGE_WRITE and _PAGE_PRESENT bits. >> >> The pte_present change now returns true for ptes which are !_PAGE_PRESENT >> and _PAGE_INVALID, which is the combination used by pmdp_invalidate to >> synchronize access from lock-free lookups. pte_access_permitted is used by >> pmd_access_permitted, so allowing GUP lock free access to proceed with >> such PTEs breaks this synchronisation. >> >> This bug has been observed on HPT host, with random crashes and corruption >> in guests, usually together with bad PMD messages in the host. >> >> Fix this by adding an explicit check in pmd_access_permitted, and >> documenting the condition explicitly. >> >> The pte_write() change should be okay, and would prevent GUP from falling >> back to the slow path when encountering savedwrite ptes, which matches >> what x86 (that does not implement savedwrite) does. >> >> Fixes: 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in >> pte helpers") >> Cc: Aneesh Kumar K.V >> Cc: Christophe Leroy >> Signed-off-by: Nicholas Piggin >> --- >> arch/powerpc/include/asm/book3s/64/pgtable.h | 19 ++- >> arch/powerpc/mm/book3s64/pgtable.c | 3 +++ >> 2 files changed, 21 insertions(+), 1 deletion(-) >> >> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h >> b/arch/powerpc/include/asm/book3s/64/pgtable.h >> index 7dede2e34b70..aaa72aa1b765 100644 >> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h >> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h >> @@ -1092,7 +1092,24 @@ static inline int pmd_protnone(pmd_t pmd) >> #define pmd_access_permitted pmd_access_permitted >> static inline bool pmd_access_permitted(pmd_t pmd, bool write) >> { >> -return pte_access_permitted(pmd_pte(pmd), write); >> +pte_t pte = pmd_pte(pmd); >> +unsigned long pteval = pte_val(pte); >> + >> +/* >> + * pmdp_invalidate sets this combination (that is not caught by >> + * !pte_present() check in pte_access_permitted), to prevent >> + * lock-free lookups, as part of the serialize_against_pte_lookup() >> + * synchronisation. >> + * >> + * This check inadvertently catches the case where the PTE's hardware >> + * PRESENT bit is cleared while TLB is flushed, to work around >> + * hardware TLB issues. This is suboptimal, but should not be hit >> + * frequently and should be harmless. >> + */ >> +if ((pteval & _PAGE_INVALID) && !(pteval & _PAGE_PRESENT)) >> +return false; >> + >> +return pte_access_permitted(pte, write); >> } >> > > > you need to do similar for other lockless page table walk like > find_linux_pte Yeah good point as discussed offline. I was going to make that a separate patch, it would have a different Fixes:. I have not been able to trigger any bugs caused by it, whereas the bug caused by this patch hits reliably in about 10 minutes or less. Maybe the race window is just a lot smaller or the function is less frequently used? Thanks, Nick
Re: [PATCH 09/22] docs: mark orphan documents as such
Le 30/05/2019 à 01:23, Mauro Carvalho Chehab a écrit : Sphinx doesn't like orphan documents: Documentation/accelerators/ocxl.rst: WARNING: document isn't included in any toctree Documentation/arm/stm32/overview.rst: WARNING: document isn't included in any toctree Documentation/arm/stm32/stm32f429-overview.rst: WARNING: document isn't included in any toctree Documentation/arm/stm32/stm32f746-overview.rst: WARNING: document isn't included in any toctree Documentation/arm/stm32/stm32f769-overview.rst: WARNING: document isn't included in any toctree Documentation/arm/stm32/stm32h743-overview.rst: WARNING: document isn't included in any toctree Documentation/arm/stm32/stm32mp157-overview.rst: WARNING: document isn't included in any toctree Documentation/gpu/msm-crash-dump.rst: WARNING: document isn't included in any toctree Documentation/interconnect/interconnect.rst: WARNING: document isn't included in any toctree Documentation/laptops/lg-laptop.rst: WARNING: document isn't included in any toctree Documentation/powerpc/isa-versions.rst: WARNING: document isn't included in any toctree Documentation/virtual/kvm/amd-memory-encryption.rst: WARNING: document isn't included in any toctree Documentation/virtual/kvm/vcpu-requests.rst: WARNING: document isn't included in any toctree So, while they aren't on any toctree, add :orphan: to them, in order to silent this warning. Are those files really not meant to be included in a toctree ? Shouldn't we include them in the relevant toctree instead of just shutting up Sphinx warnings ? Christophe Signed-off-by: Mauro Carvalho Chehab --- Documentation/accelerators/ocxl.rst | 2 ++ Documentation/arm/stm32/overview.rst| 2 ++ Documentation/arm/stm32/stm32f429-overview.rst | 2 ++ Documentation/arm/stm32/stm32f746-overview.rst | 2 ++ Documentation/arm/stm32/stm32f769-overview.rst | 2 ++ Documentation/arm/stm32/stm32h743-overview.rst | 2 ++ Documentation/arm/stm32/stm32mp157-overview.rst | 2 ++ Documentation/gpu/msm-crash-dump.rst| 2 ++ Documentation/interconnect/interconnect.rst | 2 ++ Documentation/laptops/lg-laptop.rst | 2 ++ Documentation/powerpc/isa-versions.rst | 2 ++ Documentation/virtual/kvm/amd-memory-encryption.rst | 2 ++ Documentation/virtual/kvm/vcpu-requests.rst | 2 ++ 13 files changed, 26 insertions(+) diff --git a/Documentation/accelerators/ocxl.rst b/Documentation/accelerators/ocxl.rst index 14cefc020e2d..b1cea19a90f5 100644 --- a/Documentation/accelerators/ocxl.rst +++ b/Documentation/accelerators/ocxl.rst @@ -1,3 +1,5 @@ +:orphan: + OpenCAPI (Open Coherent Accelerator Processor Interface) diff --git a/Documentation/arm/stm32/overview.rst b/Documentation/arm/stm32/overview.rst index 85cfc8410798..f7e734153860 100644 --- a/Documentation/arm/stm32/overview.rst +++ b/Documentation/arm/stm32/overview.rst @@ -1,3 +1,5 @@ +:orphan: + STM32 ARM Linux Overview diff --git a/Documentation/arm/stm32/stm32f429-overview.rst b/Documentation/arm/stm32/stm32f429-overview.rst index 18feda97f483..65bbb1c3b423 100644 --- a/Documentation/arm/stm32/stm32f429-overview.rst +++ b/Documentation/arm/stm32/stm32f429-overview.rst @@ -1,3 +1,5 @@ +:orphan: + STM32F429 Overview == diff --git a/Documentation/arm/stm32/stm32f746-overview.rst b/Documentation/arm/stm32/stm32f746-overview.rst index b5f4b6ce7656..42d593085015 100644 --- a/Documentation/arm/stm32/stm32f746-overview.rst +++ b/Documentation/arm/stm32/stm32f746-overview.rst @@ -1,3 +1,5 @@ +:orphan: + STM32F746 Overview == diff --git a/Documentation/arm/stm32/stm32f769-overview.rst b/Documentation/arm/stm32/stm32f769-overview.rst index 228656ced2fe..f6adac862b17 100644 --- a/Documentation/arm/stm32/stm32f769-overview.rst +++ b/Documentation/arm/stm32/stm32f769-overview.rst @@ -1,3 +1,5 @@ +:orphan: + STM32F769 Overview == diff --git a/Documentation/arm/stm32/stm32h743-overview.rst b/Documentation/arm/stm32/stm32h743-overview.rst index 3458dc00095d..c525835e7473 100644 --- a/Documentation/arm/stm32/stm32h743-overview.rst +++ b/Documentation/arm/stm32/stm32h743-overview.rst @@ -1,3 +1,5 @@ +:orphan: + STM32H743 Overview == diff --git a/Documentation/arm/stm32/stm32mp157-overview.rst b/Documentation/arm/stm32/stm32mp157-overview.rst index 62e176d47ca7..2c52cd020601 100644 --- a/Documentation/arm/stm32/stm32mp157-overview.rst +++ b/Documentation/arm/stm32/stm32mp157-overview.rst @@ -1,3 +1,5 @@ +:orphan: + STM32MP157 Overview === diff --git a/Documentation/gpu/msm-crash-dump.rst
Re: [WIP RFC PATCH 0/6] Generic Firmware Variable Filesystem
On Mon, Jun 03, 2019 at 04:04:32PM +1000, Daniel Axtens wrote: > Hi Nayna, > > >> As PowerNV moves towards secure boot, we need a place to put secure > >> variables. One option that has been canvassed is to make our secure > >> variables look like EFI variables. This is an early sketch of another > >> approach where we create a generic firmware variable file system, > >> fwvarfs, and an OPAL Secure Variable backend for it. > > > > Is there a need of new filesystem ? I am wondering why can't these be > > exposed via sysfs / securityfs ? > > Probably, something like... /sys/firmware/secureboot or > > /sys/kernel/security/secureboot/ ? > > I suppose we could put secure variables in sysfs, but I'm not sure > that's what sysfs was intended for. I understand sysfs as "a > filesystem-based view of kernel objects" (from > Documentation/filesystems/configfs/configfs.txt), and I don't think a > secure variable is really a kernel object in the same way most other > things in sysfs are... but I'm open to being convinced. What makes them more "secure" than anything else that is in sysfs today? I didn't see anything in this patchset that provided "additional security", did I miss it? > securityfs seems to be reserved for LSMs, I don't think we can put > things there. Yeah, I wouldn't mess with that. I would just recommend putting this in sysfs. Make a new subsystem (i.e. class) and away you go. > My hope with fwvarfs is to provide a generic place for firmware > variables so that we don't need to expand the list of firmware-specific > filesystems beyond efivarfs. I am also aiming to make things simple to > use so that people familiar with firmware don't also have to become > familiar with filesystem code in order to expose firmware variables to > userspace. Why would anyone need to be writing new code to firmware variables that makes it any different from any other kernel change? > > Also, it sounds like this is needed only for secure firmware variables > > and does not include > > other firmware variables which are not security relevant ? Is that > > correct understanding ? > > The primary use case at the moment - OPAL secure variables - is security > focused because the current OPAL secure variable design stores and > manipulates secure variables separately from the rest of nvram. This > isn't an inherent feature of fwvarfs. Again, why not just put it in sysfs please? > fwvarfs can also be used for variables that are not security relevant as > well. For example, with the EFI backend (patch 3), both secure and > insecure variables can be read. I don't remember why efi variables were not put in sysfs, I think there was some reasoning behind it originally. Perhaps look in the linux-efi archives. thanks, greg k-h
RE: [next-20190530] Boot failure on PowerPC
ACK for ' but it fixes the get_swap_device warning messages during the boot.' Double check for kernel panic issue, without this patch, it seems not repro in my local manual environment, so please ignore my previous mail for 'the patch fixes the kernel panic'. Sorry for the confusion. -Original Message- From: Lili Deng (Wicresoft North America Ltd) Sent: Monday, June 3, 2019 11:24 AM To: Sachin Sant ; Dexuan-Linux Cui Cc: Michael Ellerman ; linuxppc-dev@lists.ozlabs.org; linux-n...@vger.kernel.org; Dexuan Cui Subject: RE: [next-20190530] Boot failure on PowerPC Hi Sachin, I verified below patch against Ubuntu 18.04, didn't hit the kernel panic any more, could you please let know how did you verify? Thanks, Lili -Original Message- From: Sachin Sant Sent: Monday, June 3, 2019 11:12 AM To: Dexuan-Linux Cui Cc: Michael Ellerman ; linuxppc-dev@lists.ozlabs.org; linux-n...@vger.kernel.org; Dexuan Cui ; Lili Deng (Wicresoft North America Ltd) Subject: Re: [next-20190530] Boot failure on PowerPC > On 31-May-2019, at 11:43 PM, Dexuan-Linux Cui wrote: > > On Fri, May 31, 2019 at 6:52 AM Michael Ellerman wrote: >>> >>> Machine boots till login prompt and then panics few seconds later. >>> >>> Last known next build was May 24th. Will attempt few builds till May >>> 30 to narrow down this problem. >> >> My CI was fine with next-20190529 (9a15d2e3fd03e3). >> >> cheers > > Hi Sachin, > It looks this patch may fix the issue: > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Flkml.org%2Flkml%2F2019%2F5%2F30%2F1630data=02%7C01%7Cv-lide%40microsoft.com%7C66e1ef6017aa461703f808d6e7d148cd%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C1%7C636951283393233385sdata=IJFhtvL2Bd87HCoMZ7oWL%2Bar6NY%2FfPbmdCZMT%2BJz5t4%3Dreserved=0 > , but I'm not sure. It does not help fix the kernel panic issue, but it fixes the get_swap_device warning messages during the boot. Thanks -Sachin
Re: [RFC PATCH] powerpc/book3e: KASAN Full support for 64bit
Hi, Ok, can you share your .config ? Christophe Le 31/05/2019 à 03:29, Daniel Axtens a écrit : Hi Christophe, I tried this on the t4240rdb and it fails to boot if KASAN is enabled. It does boot with the patch applied but KASAN disabled, so that narrows it down a little bit. I need to focus on 3s first so I'll just drop 3e from my patch set for now. Regards, Daniel The KASAN shadow area is mapped into vmemmap space: 0x8000 0400 to 0x8000 0600 . For this vmemmap has to be disabled. Cc: Daniel Axtens Signed-off-by: Christophe Leroy --- arch/powerpc/Kconfig | 1 + arch/powerpc/Kconfig.debug| 3 +- arch/powerpc/include/asm/kasan.h | 11 +++ arch/powerpc/kernel/Makefile | 2 + arch/powerpc/kernel/head_64.S | 3 + arch/powerpc/kernel/setup_64.c| 20 +++--- arch/powerpc/mm/kasan/Makefile| 1 + arch/powerpc/mm/kasan/kasan_init_64.c | 129 ++ 8 files changed, 159 insertions(+), 11 deletions(-) create mode 100644 arch/powerpc/mm/kasan/kasan_init_64.c diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 1a2fb50126b2..e0b7c45e4dc7 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -174,6 +174,7 @@ config PPC select HAVE_ARCH_AUDITSYSCALL select HAVE_ARCH_JUMP_LABEL select HAVE_ARCH_KASAN if PPC32 + select HAVE_ARCH_KASAN if PPC_BOOK3E_64 && !SPARSEMEM_VMEMMAP select HAVE_ARCH_KGDB select HAVE_ARCH_MMAP_RND_BITS select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug index 61febbbdd02b..b4140dd6b4e4 100644 --- a/arch/powerpc/Kconfig.debug +++ b/arch/powerpc/Kconfig.debug @@ -370,4 +370,5 @@ config PPC_FAST_ENDIAN_SWITCH config KASAN_SHADOW_OFFSET hex depends on KASAN - default 0xe000 + default 0xe000 if PPC32 + default 0x68000400 if PPC64 diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h index 296e51c2f066..756b3d58f921 100644 --- a/arch/powerpc/include/asm/kasan.h +++ b/arch/powerpc/include/asm/kasan.h @@ -23,10 +23,21 @@ #define KASAN_SHADOW_OFFSET ASM_CONST(CONFIG_KASAN_SHADOW_OFFSET) +#ifdef CONFIG_PPC32 #define KASAN_SHADOW_END 0UL #define KASAN_SHADOW_SIZE (KASAN_SHADOW_END - KASAN_SHADOW_START) +#else + +#include + +#define KASAN_SHADOW_SIZE (KERN_VIRT_SIZE >> KASAN_SHADOW_SCALE_SHIFT) + +#define KASAN_SHADOW_END (KASAN_SHADOW_START + KASAN_SHADOW_SIZE) + +#endif /* CONFIG_PPC32 */ + #ifdef CONFIG_KASAN void kasan_early_init(void); void kasan_mmu_init(void); diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 0ea6c4aa3a20..7f232c06f11d 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -35,6 +35,8 @@ KASAN_SANITIZE_early_32.o := n KASAN_SANITIZE_cputable.o := n KASAN_SANITIZE_prom_init.o := n KASAN_SANITIZE_btext.o := n +KASAN_SANITIZE_paca.o := n +KASAN_SANITIZE_setup_64.o := n ifdef CONFIG_KASAN CFLAGS_early_32.o += -DDISABLE_BRANCH_PROFILING diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S index 3fad8d499767..80fbd8024fb2 100644 --- a/arch/powerpc/kernel/head_64.S +++ b/arch/powerpc/kernel/head_64.S @@ -966,6 +966,9 @@ start_here_multiplatform: * and SLB setup before we turn on relocation. */ +#ifdef CONFIG_KASAN + bl kasan_early_init +#endif /* Restore parameters passed from prom_init/kexec */ mr r3,r31 bl early_setup /* also sets r13 and SPRG_PACA */ diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index ba404dd9ce1d..d2bf860dd966 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -311,6 +311,16 @@ void __init early_setup(unsigned long dt_ptr) DBG(" -> early_setup(), dt_ptr: 0x%lx\n", dt_ptr); /* +* Configure exception handlers. This include setting up trampolines +* if needed, setting exception endian mode, etc... +*/ + configure_exceptions(); + + /* Apply all the dynamic patching */ + apply_feature_fixups(); + setup_feature_keys(); + + /* * Do early initialization using the flattened device * tree, such as retrieving the physical memory map or * calculating/retrieving the hash table size. @@ -325,16 +335,6 @@ void __init early_setup(unsigned long dt_ptr) setup_paca(paca_ptrs[boot_cpuid]); fixup_boot_paca(); - /* -* Configure exception handlers. This include setting up trampolines -* if needed, setting exception endian mode, etc... -*/ - configure_exceptions(); - - /* Apply all the dynamic patching */ - apply_feature_fixups(); - setup_feature_keys(); - /*
[PATCH v2] powerpc: pseries/hvconsole: fix stack overread via udbg
While developing kasan for 64-bit book3s, I hit the following stack over-read. It occurs because the hypercall to put characters onto the terminal takes 2 longs (128 bits/16 bytes) of characters at a time, and so hvc_put_chars would unconditionally copy 16 bytes from the argument buffer, regardless of supplied length. However, udbg_hvc_putc can call hvc_put_chars with a single-byte buffer, leading to the error. [0.001931] == [150/819] [0.001933] BUG: KASAN: stack-out-of-bounds in hvc_put_chars+0xdc/0x110 [0.001934] Read of size 8 at addr c23e7a90 by task swapper/0 [0.001934] [0.001935] CPU: 0 PID: 0 Comm: swapper Not tainted 5.2.0-rc2-next-20190528-02824-g048a6ab4835b #113 [0.001935] Call Trace: [0.001936] [c23e7790] [c1b4a450] dump_stack+0x104/0x154 (unreliable) [0.001937] [c23e77f0] [c06d3524] print_address_description+0xa0/0x30c [0.001938] [c23e7880] [c06d318c] __kasan_report+0x20c/0x224 [0.001939] [c23e7950] [c06d19d8] kasan_report+0x18/0x30 [0.001940] [c23e7970] [c06d4854] __asan_report_load8_noabort+0x24/0x40 [0.001941] [c23e7990] [c01511ac] hvc_put_chars+0xdc/0x110 [0.001942] [c23e7a10] [c0f81cfc] hvterm_raw_put_chars+0x9c/0x110 [0.001943] [c23e7a50] [c0f82634] udbg_hvc_putc+0x154/0x200 [0.001944] [c23e7b10] [c0049c90] udbg_write+0xf0/0x240 [0.001945] [c23e7b70] [c02e5d88] console_unlock+0x868/0xd30 [0.001946] [c23e7ca0] [c02e6e00] register_console+0x970/0xe90 [0.001947] [c23e7d80] [c1ff1928] register_early_udbg_console+0xf8/0x114 [0.001948] [c23e7df0] [c1ff1174] setup_arch+0x108/0x790 [0.001948] [c23e7e90] [c1fe41c8] start_kernel+0x104/0x784 [0.001949] [c23e7f90] [c000b368] start_here_common+0x1c/0x534 [0.001950] [0.001950] [0.001951] Memory state around the buggy address: [0.001952] c23e7980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [0.001952] c23e7a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 [0.001953] >c23e7a80: f1 f1 01 f2 f2 f2 00 00 00 00 00 00 00 00 00 00 [0.001953] ^ [0.001954] c23e7b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [0.001954] c23e7b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [0.001955] == Document that a 16-byte buffer is requred, and provide it in udbg. CC: Dmitry Vyukov Signed-off-by: Daniel Axtens --- v2: avoid memcpy, push responsibility to caller. Solution suggested by mpe. --- arch/powerpc/platforms/pseries/hvconsole.c | 2 +- drivers/tty/hvc/hvc_vio.c | 16 +++- 2 files changed, 16 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/pseries/hvconsole.c b/arch/powerpc/platforms/pseries/hvconsole.c index 74da18de853a..73ec15cd2708 100644 --- a/arch/powerpc/platforms/pseries/hvconsole.c +++ b/arch/powerpc/platforms/pseries/hvconsole.c @@ -62,7 +62,7 @@ EXPORT_SYMBOL(hvc_get_chars); * @vtermno: The vtermno or unit_address of the adapter from which the data * originated. * @buf: The character buffer that contains the character data to send to - * firmware. + * firmware. Must be at least 16 bytes, even if count is less than 16. * @count: Send this number of characters. */ int hvc_put_chars(uint32_t vtermno, const char *buf, int count) diff --git a/drivers/tty/hvc/hvc_vio.c b/drivers/tty/hvc/hvc_vio.c index 6de6d4a1a221..7af54d6ed5b8 100644 --- a/drivers/tty/hvc/hvc_vio.c +++ b/drivers/tty/hvc/hvc_vio.c @@ -107,6 +107,14 @@ static int hvterm_raw_get_chars(uint32_t vtermno, char *buf, int count) return got; } +/** + * hvterm_raw_put_chars: send characters to firmware for given vterm adapter + * @vtermno: The virtual terminal number. + * @buf: The characters to send. Because of the underlying hypercall in + * hvc_put_chars(), this buffer must be at least 16 bytes long, even if + * you are sending fewer chars. + * @count: number of chars to send. + */ static int hvterm_raw_put_chars(uint32_t vtermno, const char *buf, int count) { struct hvterm_priv *pv = hvterm_privs[vtermno]; @@ -219,6 +227,7 @@ static const struct hv_ops hvterm_hvsi_ops = { static void udbg_hvc_putc(char c) { int count = -1; + unsigned char bounce_buffer[16]; if (!hvterm_privs[0]) return; @@ -229,7 +238,12 @@ static void udbg_hvc_putc(char c) do { switch(hvterm_privs[0]->proto) { case HV_PROTOCOL_RAW: - count = hvterm_raw_put_chars(0, , 1); +
[RFC PATCH] powerpc/64s: __find_linux_pte synchronization vs pmdp_invalidate
The pmd_none check does not catch hugepage collapse, nor does the pmd_present check in pmd_trans_huge, because hugepage collapse sets !_PAGE_PRESENT && _PAGE_INVALID (which results in !pmd_none and pmd_present). Aneesh noticed we might need this check as well. --- arch/powerpc/mm/pgtable.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c index db4a6253df92..7a702d21400a 100644 --- a/arch/powerpc/mm/pgtable.c +++ b/arch/powerpc/mm/pgtable.c @@ -372,13 +372,20 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea, pdshift = PMD_SHIFT; pmdp = pmd_offset(, ea); pmd = READ_ONCE(*pmdp); - /* -* A hugepage collapse is captured by pmd_none, because -* it mark the pmd none and do a hpte invalidate. -*/ + if (pmd_none(pmd)) return NULL; +#ifdef CONFIG_PPC_BOOK3S_64 + if (pmd_val(pmd) & (_PAGE_PRESENT|_PAGE_INVALID) == _PAGE_INVALID) { + /* +* A hugepage collapse is captured by this condition, see +* pmdp_invalidate. +*/ + return NULL; + } +#endif + if (pmd_trans_huge(pmd) || pmd_devmap(pmd)) { if (is_thp) *is_thp = true; -- 2.20.1
Re: [PATCH] powerpc/64s: Fix THP PMD collapse serialisation
On 6/3/19 11:35 AM, Nicholas Piggin wrote: Commit 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in pte helpers") changed the actual bitwise tests in pte_access_permitted by using pte_write() and pte_present() helpers rather than raw bitwise testing _PAGE_WRITE and _PAGE_PRESENT bits. The pte_present change now returns true for ptes which are !_PAGE_PRESENT and _PAGE_INVALID, which is the combination used by pmdp_invalidate to synchronize access from lock-free lookups. pte_access_permitted is used by pmd_access_permitted, so allowing GUP lock free access to proceed with such PTEs breaks this synchronisation. This bug has been observed on HPT host, with random crashes and corruption in guests, usually together with bad PMD messages in the host. Fix this by adding an explicit check in pmd_access_permitted, and documenting the condition explicitly. The pte_write() change should be okay, and would prevent GUP from falling back to the slow path when encountering savedwrite ptes, which matches what x86 (that does not implement savedwrite) does. Fixes: 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in pte helpers") Cc: Aneesh Kumar K.V Cc: Christophe Leroy Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/book3s/64/pgtable.h | 19 ++- arch/powerpc/mm/book3s64/pgtable.c | 3 +++ 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h index 7dede2e34b70..aaa72aa1b765 100644 --- a/arch/powerpc/include/asm/book3s/64/pgtable.h +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h @@ -1092,7 +1092,24 @@ static inline int pmd_protnone(pmd_t pmd) #define pmd_access_permitted pmd_access_permitted static inline bool pmd_access_permitted(pmd_t pmd, bool write) { - return pte_access_permitted(pmd_pte(pmd), write); + pte_t pte = pmd_pte(pmd); + unsigned long pteval = pte_val(pte); + + /* +* pmdp_invalidate sets this combination (that is not caught by +* !pte_present() check in pte_access_permitted), to prevent +* lock-free lookups, as part of the serialize_against_pte_lookup() +* synchronisation. +* +* This check inadvertently catches the case where the PTE's hardware +* PRESENT bit is cleared while TLB is flushed, to work around +* hardware TLB issues. This is suboptimal, but should not be hit +* frequently and should be harmless. +*/ + if ((pteval & _PAGE_INVALID) && !(pteval & _PAGE_PRESENT)) + return false; + + return pte_access_permitted(pte, write); } you need to do similar for other lockless page table walk like find_linux_pte #ifdef CONFIG_TRANSPARENT_HUGEPAGE diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c index 16bda049187a..ff98b663c83e 100644 --- a/arch/powerpc/mm/book3s64/pgtable.c +++ b/arch/powerpc/mm/book3s64/pgtable.c @@ -116,6 +116,9 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address, /* * This ensures that generic code that rely on IRQ disabling * to prevent a parallel THP split work as expected. +* +* Marking the entry with _PAGE_INVALID && ~_PAGE_PRESENT requires +* a special case check in pmd_access_permitted. */ serialize_against_pte_lookup(vma->vm_mm); return __pmd(old_pmd); -anesh
[PATCH] powerpc/64s: Fix THP PMD collapse serialisation
Commit 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in pte helpers") changed the actual bitwise tests in pte_access_permitted by using pte_write() and pte_present() helpers rather than raw bitwise testing _PAGE_WRITE and _PAGE_PRESENT bits. The pte_present change now returns true for ptes which are !_PAGE_PRESENT and _PAGE_INVALID, which is the combination used by pmdp_invalidate to synchronize access from lock-free lookups. pte_access_permitted is used by pmd_access_permitted, so allowing GUP lock free access to proceed with such PTEs breaks this synchronisation. This bug has been observed on HPT host, with random crashes and corruption in guests, usually together with bad PMD messages in the host. Fix this by adding an explicit check in pmd_access_permitted, and documenting the condition explicitly. The pte_write() change should be okay, and would prevent GUP from falling back to the slow path when encountering savedwrite ptes, which matches what x86 (that does not implement savedwrite) does. Fixes: 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in pte helpers") Cc: Aneesh Kumar K.V Cc: Christophe Leroy Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/book3s/64/pgtable.h | 19 ++- arch/powerpc/mm/book3s64/pgtable.c | 3 +++ 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h index 7dede2e34b70..aaa72aa1b765 100644 --- a/arch/powerpc/include/asm/book3s/64/pgtable.h +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h @@ -1092,7 +1092,24 @@ static inline int pmd_protnone(pmd_t pmd) #define pmd_access_permitted pmd_access_permitted static inline bool pmd_access_permitted(pmd_t pmd, bool write) { - return pte_access_permitted(pmd_pte(pmd), write); + pte_t pte = pmd_pte(pmd); + unsigned long pteval = pte_val(pte); + + /* +* pmdp_invalidate sets this combination (that is not caught by +* !pte_present() check in pte_access_permitted), to prevent +* lock-free lookups, as part of the serialize_against_pte_lookup() +* synchronisation. +* +* This check inadvertently catches the case where the PTE's hardware +* PRESENT bit is cleared while TLB is flushed, to work around +* hardware TLB issues. This is suboptimal, but should not be hit +* frequently and should be harmless. +*/ + if ((pteval & _PAGE_INVALID) && !(pteval & _PAGE_PRESENT)) + return false; + + return pte_access_permitted(pte, write); } #ifdef CONFIG_TRANSPARENT_HUGEPAGE diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c index 16bda049187a..ff98b663c83e 100644 --- a/arch/powerpc/mm/book3s64/pgtable.c +++ b/arch/powerpc/mm/book3s64/pgtable.c @@ -116,6 +116,9 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address, /* * This ensures that generic code that rely on IRQ disabling * to prevent a parallel THP split work as expected. +* +* Marking the entry with _PAGE_INVALID && ~_PAGE_PRESENT requires +* a special case check in pmd_access_permitted. */ serialize_against_pte_lookup(vma->vm_mm); return __pmd(old_pmd); -- 2.20.1
Re: [WIP RFC PATCH 0/6] Generic Firmware Variable Filesystem
Hi Nayna, >> As PowerNV moves towards secure boot, we need a place to put secure >> variables. One option that has been canvassed is to make our secure >> variables look like EFI variables. This is an early sketch of another >> approach where we create a generic firmware variable file system, >> fwvarfs, and an OPAL Secure Variable backend for it. > > Is there a need of new filesystem ? I am wondering why can't these be > exposed via sysfs / securityfs ? > Probably, something like... /sys/firmware/secureboot or > /sys/kernel/security/secureboot/ ? I suppose we could put secure variables in sysfs, but I'm not sure that's what sysfs was intended for. I understand sysfs as "a filesystem-based view of kernel objects" (from Documentation/filesystems/configfs/configfs.txt), and I don't think a secure variable is really a kernel object in the same way most other things in sysfs are... but I'm open to being convinced. securityfs seems to be reserved for LSMs, I don't think we can put things there. My hope with fwvarfs is to provide a generic place for firmware variables so that we don't need to expand the list of firmware-specific filesystems beyond efivarfs. I am also aiming to make things simple to use so that people familiar with firmware don't also have to become familiar with filesystem code in order to expose firmware variables to userspace. > Also, it sounds like this is needed only for secure firmware variables > and does not include > other firmware variables which are not security relevant ? Is that > correct understanding ? The primary use case at the moment - OPAL secure variables - is security focused because the current OPAL secure variable design stores and manipulates secure variables separately from the rest of nvram. This isn't an inherent feature of fwvarfs. fwvarfs can also be used for variables that are not security relevant as well. For example, with the EFI backend (patch 3), both secure and insecure variables can be read. Regards, Daniel