[PATCH] powerpc/configs: Disable latencytop

2019-06-03 Thread Anton Blanchard
latencytop adds almost 4kB to each and every task struct and as such
it doesn't deserve to be in our defconfigs.

Signed-off-by: Anton Blanchard 
---
 arch/powerpc/configs/g5_defconfig   | 1 -
 arch/powerpc/configs/gamecube_defconfig | 1 -
 arch/powerpc/configs/maple_defconfig| 1 -
 arch/powerpc/configs/pmac32_defconfig   | 1 -
 arch/powerpc/configs/powernv_defconfig  | 1 -
 arch/powerpc/configs/ppc64_defconfig| 1 -
 arch/powerpc/configs/ppc64e_defconfig   | 1 -
 arch/powerpc/configs/ppc6xx_defconfig   | 1 -
 arch/powerpc/configs/pseries_defconfig  | 1 -
 arch/powerpc/configs/wii_defconfig  | 1 -
 10 files changed, 10 deletions(-)

diff --git a/arch/powerpc/configs/g5_defconfig 
b/arch/powerpc/configs/g5_defconfig
index ceb3c770786f..8e9389d6c8ef 100644
--- a/arch/powerpc/configs/g5_defconfig
+++ b/arch/powerpc/configs/g5_defconfig
@@ -244,7 +244,6 @@ CONFIG_CRC_T10DIF=y
 CONFIG_MAGIC_SYSRQ=y
 CONFIG_DEBUG_KERNEL=y
 CONFIG_DEBUG_MUTEXES=y
-CONFIG_LATENCYTOP=y
 CONFIG_BOOTX_TEXT=y
 CONFIG_CRYPTO_TEST=m
 CONFIG_CRYPTO_PCBC=m
diff --git a/arch/powerpc/configs/gamecube_defconfig 
b/arch/powerpc/configs/gamecube_defconfig
index 805b0f87653c..bfffdc4f1b73 100644
--- a/arch/powerpc/configs/gamecube_defconfig
+++ b/arch/powerpc/configs/gamecube_defconfig
@@ -91,7 +91,6 @@ CONFIG_CRC_CCITT=y
 CONFIG_PRINTK_TIME=y
 CONFIG_DEBUG_SPINLOCK=y
 CONFIG_DEBUG_MUTEXES=y
-CONFIG_LATENCYTOP=y
 CONFIG_SCHED_TRACER=y
 CONFIG_DMA_API_DEBUG=y
 CONFIG_PPC_EARLY_DEBUG=y
diff --git a/arch/powerpc/configs/maple_defconfig 
b/arch/powerpc/configs/maple_defconfig
index c5f2005005d3..1c436fafb397 100644
--- a/arch/powerpc/configs/maple_defconfig
+++ b/arch/powerpc/configs/maple_defconfig
@@ -104,7 +104,6 @@ CONFIG_MAGIC_SYSRQ=y
 CONFIG_DEBUG_KERNEL=y
 CONFIG_DEBUG_STACK_USAGE=y
 CONFIG_DEBUG_STACKOVERFLOW=y
-CONFIG_LATENCYTOP=y
 CONFIG_XMON=y
 CONFIG_XMON_DEFAULT=y
 CONFIG_BOOTX_TEXT=y
diff --git a/arch/powerpc/configs/pmac32_defconfig 
b/arch/powerpc/configs/pmac32_defconfig
index 50b610b48914..8d632ceaea48 100644
--- a/arch/powerpc/configs/pmac32_defconfig
+++ b/arch/powerpc/configs/pmac32_defconfig
@@ -293,7 +293,6 @@ CONFIG_CRC_T10DIF=y
 CONFIG_MAGIC_SYSRQ=y
 CONFIG_DEBUG_KERNEL=y
 CONFIG_DETECT_HUNG_TASK=y
-CONFIG_LATENCYTOP=y
 CONFIG_XMON=y
 CONFIG_XMON_DEFAULT=y
 CONFIG_BOOTX_TEXT=y
diff --git a/arch/powerpc/configs/powernv_defconfig 
b/arch/powerpc/configs/powernv_defconfig
index ef2ef98d3f28..1cf8ce18b4ca 100644
--- a/arch/powerpc/configs/powernv_defconfig
+++ b/arch/powerpc/configs/powernv_defconfig
@@ -317,7 +317,6 @@ CONFIG_DEBUG_STACK_USAGE=y
 CONFIG_DEBUG_STACKOVERFLOW=y
 CONFIG_SOFTLOCKUP_DETECTOR=y
 CONFIG_HARDLOCKUP_DETECTOR=y
-CONFIG_LATENCYTOP=y
 CONFIG_FUNCTION_TRACER=y
 CONFIG_SCHED_TRACER=y
 CONFIG_FTRACE_SYSCALLS=y
diff --git a/arch/powerpc/configs/ppc64_defconfig 
b/arch/powerpc/configs/ppc64_defconfig
index 91fdb619b484..96d695ffe074 100644
--- a/arch/powerpc/configs/ppc64_defconfig
+++ b/arch/powerpc/configs/ppc64_defconfig
@@ -367,7 +367,6 @@ CONFIG_DEBUG_STACKOVERFLOW=y
 CONFIG_SOFTLOCKUP_DETECTOR=y
 CONFIG_HARDLOCKUP_DETECTOR=y
 CONFIG_DEBUG_MUTEXES=y
-CONFIG_LATENCYTOP=y
 CONFIG_FUNCTION_TRACER=y
 CONFIG_SCHED_TRACER=y
 CONFIG_BLK_DEV_IO_TRACE=y
diff --git a/arch/powerpc/configs/ppc64e_defconfig 
b/arch/powerpc/configs/ppc64e_defconfig
index 41d85cb3c9a2..7e52d4658867 100644
--- a/arch/powerpc/configs/ppc64e_defconfig
+++ b/arch/powerpc/configs/ppc64e_defconfig
@@ -223,7 +223,6 @@ CONFIG_DEBUG_STACK_USAGE=y
 CONFIG_DEBUG_STACKOVERFLOW=y
 CONFIG_DETECT_HUNG_TASK=y
 CONFIG_DEBUG_MUTEXES=y
-CONFIG_LATENCYTOP=y
 CONFIG_IRQSOFF_TRACER=y
 CONFIG_SCHED_TRACER=y
 CONFIG_BLK_DEV_IO_TRACE=y
diff --git a/arch/powerpc/configs/ppc6xx_defconfig 
b/arch/powerpc/configs/ppc6xx_defconfig
index 7c6baf6df139..7bcdb4d1411c 100644
--- a/arch/powerpc/configs/ppc6xx_defconfig
+++ b/arch/powerpc/configs/ppc6xx_defconfig
@@ -1148,7 +1148,6 @@ CONFIG_FAIL_MAKE_REQUEST=y
 CONFIG_FAIL_IO_TIMEOUT=y
 CONFIG_FAULT_INJECTION_DEBUG_FS=y
 CONFIG_FAULT_INJECTION_STACKTRACE_FILTER=y
-CONFIG_LATENCYTOP=y
 CONFIG_SCHED_TRACER=y
 CONFIG_STACK_TRACER=y
 CONFIG_BLK_DEV_IO_TRACE=y
diff --git a/arch/powerpc/configs/pseries_defconfig 
b/arch/powerpc/configs/pseries_defconfig
index 62e12f61a3b2..c8f5f281e367 100644
--- a/arch/powerpc/configs/pseries_defconfig
+++ b/arch/powerpc/configs/pseries_defconfig
@@ -290,7 +290,6 @@ CONFIG_DEBUG_STACK_USAGE=y
 CONFIG_DEBUG_STACKOVERFLOW=y
 CONFIG_SOFTLOCKUP_DETECTOR=y
 CONFIG_HARDLOCKUP_DETECTOR=y
-CONFIG_LATENCYTOP=y
 CONFIG_FUNCTION_TRACER=y
 CONFIG_SCHED_TRACER=y
 CONFIG_BLK_DEV_IO_TRACE=y
diff --git a/arch/powerpc/configs/wii_defconfig 
b/arch/powerpc/configs/wii_defconfig
index f5c366b02828..d60c04a4708a 100644
--- a/arch/powerpc/configs/wii_defconfig
+++ b/arch/powerpc/configs/wii_defconfig
@@ -123,7 +123,6 @@ CONFIG_PRINTK_TIME=y
 CONFIG_MAGIC_SYSRQ=y
 CONFIG_DEBUG_SPINLOCK=y
 CONFIG_DEBUG_MUTEXES=y
-CONFIG_LATENCYTOP=y
 CONFIG_SCHED_TRACER=y
 CONFIG_BLK_DEV_IO_TRACE=y
 

Re: [PATCH kernel] prom_init: Fetch flatten device tree from the system firmware

2019-06-03 Thread Alexey Kardashevskiy



On 04/06/2019 09:42, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Jun 03, 2019 at 12:56:26PM +1000, Alexey Kardashevskiy wrote:
>> On 03/06/2019 09:23, Segher Boessenkool wrote:
 So I go for the simple one and agree with Alexey's idea.
>>>
>>> When dealing with a whole device tree you have to know about the various
>>> dynamically generated nodes and props, and handle each appropriately.
>>
>> The code I am changing fetches the device tree and build an fdt. What is
>> that special knowledge in this context you are talking about?
> 
> Things like /options are dynamically generated.

Generated by the guest kernel, after walking the tree and before making
the fdt blob? I cannot see it, what do I miss? Otherwise it is all in
slof and the same result is fetched one way or another.


> I thought you would just be able to reuse some FDT parsing code to
> implement my suggested new interface.  Maybe that was too optimistic.




-- 
Alexey


[PATCH v2] perf ioctl: Add check for the sample_period value

2019-06-03 Thread Ravi Bangoria
perf_event_open() limits the sample_period to 63 bits. See
commit 0819b2e30ccb ("perf: Limit perf_event_attr::sample_period
to 63 bits"). Make ioctl() consistent with it.

Also on powerpc, negative sample_period could cause a recursive
PMIs leading to a hang (reported when running perf-fuzzer).

Signed-off-by: Ravi Bangoria 
---
 kernel/events/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index abbd4b3b96c2..e44c90378940 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5005,6 +5005,9 @@ static int perf_event_period(struct perf_event *event, 
u64 __user *arg)
if (perf_event_check_period(event, value))
return -EINVAL;
 
+   if (!event->attr.freq && (value & (1ULL << 63)))
+   return -EINVAL;
+
event_function_call(event, __perf_event_period, );
 
return 0;
-- 
2.20.1



Re: [RFC PATCH v2] powerpc/xmon: restrict when kernel is locked down

2019-06-03 Thread Andrew Donnellan

On 4/6/19 1:05 pm, Christopher M Riedl wrote:>>> + if (!xmon_is_ro) {

+   xmon_is_ro = kernel_is_locked_down("Using xmon write-access",
+  LOCKDOWN_INTEGRITY);
+   if (xmon_is_ro) {
+   printf("xmon: Read-only due to kernel lockdown\n");
+   clear_all_bpt();


Remind me again why we need to clear breakpoints in integrity mode?


Andrew



I interpreted "integrity" mode as meaning that any changes made by xmon should
be reversed. This also covers the case when a user creates some breakpoint(s)
in xmon, exits xmon, and then elevates the lockdown state. Upon hitting the
first breakpoint and (re-)entering xmon, xmon will clear all breakpoints.

Xmon can only take action in response to dynamic lockdown level changes when
xmon is invoked in some manner - if there is a better way I am all ears :)



Integrity mode merely means we are aiming to prevent modifications to 
kernel memory. IMHO leaving existing breakpoints in place is fine as 
long as when we hit the breakpoint xmon is in read-only mode.


(dja/mpe might have opinions on this)

--
Andrew Donnellan  OzLabs, ADL Canberra
a...@linux.ibm.com IBM Australia Limited



crash after NX error

2019-06-03 Thread Stewart Smith
On my two socket POWER9 system (powernv) with 842 zwap set up, I
recently got a crash with the Ubuntu kernel (I haven't tried with
upstream, and this is the first time the system has died like this, so
I'm not sure how repeatable it is).

[2.891463] zswap: loaded using pool 842-nx/zbud
...
[15626.124646] nx_compress_powernv: ERROR: CSB still not valid after 500 
us, giving up : 00 00 00 00 
[16868.932913] Unable to handle kernel paging request for data at address 
0x6655f67da816cdb8
[16868.933726] Faulting instruction address: 0xc0391600


cpu 0x68: Vector: 380 (Data Access Out of Range) at [c01c9d98b9a0]
pc: c0391600: kmem_cache_alloc+0x2e0/0x340
lr: c03915ec: kmem_cache_alloc+0x2cc/0x340
sp: c01c9d98bc20
   msr: 9280b033
   dar: 6655f67da816cdb8
  current = 0xc01ad43cb400
  paca= 0xcfac7800   softe: 0irq_happened: 0x01
pid   = 8319, comm = make
Linux version 4.15.0-50-generic (buildd@bos02-ppc64el-006) (gcc version 7.3.0 
(Ubuntu 7.3.0-16ubuntu3)) #54-Ubuntu SMP Mon May 6 18:55:18 UTC 2019 (Ubuntu 
4.15.0-50.54-generic 4.15.18)

68:mon> t
[c01c9d98bc20] c03914d4 kmem_cache_alloc+0x1b4/0x340 (unreliable)
[c01c9d98bc80] c03b1e14 __khugepaged_enter+0x54/0x220
[c01c9d98bcc0] c010f0ec copy_process.isra.5.part.6+0xebc/0x1a10
[c01c9d98bda0] c010fe4c _do_fork+0xec/0x510
[c01c9d98be30] c000b584 ppc_clone+0x8/0xc
--- Exception: c00 (System Call) at 7afe9daf87f4
SP (7fffca606880) is in userspace

So, it looks like there could be a problem in the error path, plausibly
fixed by this patch:

commit 656ecc16e8fc2ab44b3d70e3fcc197a7020d0ca5
Author: Haren Myneni 
Date:   Wed Jun 13 00:32:40 2018 -0700

crypto/nx: Initialize 842 high and normal RxFIFO control registers

NX increments readOffset by FIFO size in receive FIFO control register
when CRB is read. But the index in RxFIFO has to match with the
corresponding entry in FIFO maintained by VAS in kernel. Otherwise NX
may be processing incorrect CRBs and can cause CRB timeout.

VAS FIFO offset is 0 when the receive window is opened during
initialization. When the module is reloaded or in kexec boot, readOffset
in FIFO control register may not match with VAS entry. This patch adds
nx_coproc_init OPAL call to reset readOffset and queued entries in FIFO
control register for both high and normal FIFOs.

Signed-off-by: Haren Myneni 
[mpe: Fixup uninitialized variable warning]
Signed-off-by: Michael Ellerman 

$ git describe --contains 656ecc16e8fc2ab44b3d70e3fcc197a7020d0ca5
v4.19-rc1~24^2~50


Which was never backported to any stable release, so probably needs to
be for v4.14 through v4.18. Notably, Ubuntu is on v4.15 and it doesn't
seem to have picked up the patch. I'm opening an Ubuntu bug for this.

Haren, is this something you can drive through the stable process
(assuming my above crash looks like this failure)?

-- 
Stewart Smith
OPAL Architect, IBM.



[PATCH v5 1/2] powerpc: silence a -Wcast-function-type warning in dawr_write_file_bool

2019-06-03 Thread Michael Neuling
From: Mathieu Malaterre 

In commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9
option") the following piece of code was added:

   smp_call_function((smp_call_func_t)set_dawr, _brk, 0);

Since GCC 8 this triggers the following warning about incompatible
function types:

  arch/powerpc/kernel/hw_breakpoint.c:408:21: error: cast between incompatible 
function types from 'int (*)(struct arch_hw_breakpoint *)' to 'void (*)(void 
*)' [-Werror=cast-function-type]

Since the warning is there for a reason, and should not be hidden behind
a cast, provide an intermediate callback function to avoid the warning.

Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option")
Suggested-by: Christoph Hellwig 
Signed-off-by: Mathieu Malaterre 
Signed-off-by: Michael Neuling 
---
 arch/powerpc/kernel/hw_breakpoint.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index da307dd93e..ca3a2358b7 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -384,6 +384,11 @@ void hw_breakpoint_pmu_read(struct perf_event *bp)
 bool dawr_force_enable;
 EXPORT_SYMBOL_GPL(dawr_force_enable);
 
+static void set_dawr_cb(void *info)
+{
+   set_dawr(info);
+}
+
 static ssize_t dawr_write_file_bool(struct file *file,
const char __user *user_buf,
size_t count, loff_t *ppos)
@@ -403,7 +408,7 @@ static ssize_t dawr_write_file_bool(struct file *file,
 
/* If we are clearing, make sure all CPUs have the DAWR cleared */
if (!dawr_force_enable)
-   smp_call_function((smp_call_func_t)set_dawr, _brk, 0);
+   smp_call_function(set_dawr_cb, _brk, 0);
 
return rc;
 }
-- 
2.21.0



[PATCH v5 2/2] powerpc: Fix compile issue with force DAWR

2019-06-03 Thread Michael Neuling
If you compile with KVM but without CONFIG_HAVE_HW_BREAKPOINT you fail
at linking with:
  arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x708): undefined reference to 
`dawr_force_enable'

This was caused by commit c1fe190c0672 ("powerpc: Add force enable of
DAWR on P9 option").

This moves a bunch of code around to fix this. It moves a lot of the
DAWR code in a new file and creates a new CONFIG_PPC_DAWR to enable
compiling it.

Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option")
Signed-off-by: Michael Neuling 
--
v5:
  - Changes based on comments by hch
- Change // to /* comments
- Change return code from -1 to -ENODEV
- Remove unneeded default n in new Kconfig option
- Remove setting to default value
- Remove unnecessary braces

v4:
  - Fix merge conflict with patch from Mathieu Malaterre:
 powerpc: silence a -Wcast-function-type warning in dawr_write_file_bool
  - Fixed checkpatch issues noticed by Christophe Leroy.

v3:
  Fixes based on Christophe Leroy's comments:
  - Fix Kconfig options to better reflect reality
  - Reorder alphabetically
  - Inline vs #define
  - Fixed default return for dawr_enabled() when CONFIG_PPC_DAWR=N

V2:
  Fixes based on Christophe Leroy's comments:
  - Fix commit message formatting
  - Move more DAWR code into dawr.c
---
 arch/powerpc/Kconfig |  4 +
 arch/powerpc/include/asm/hw_breakpoint.h | 21 +++--
 arch/powerpc/kernel/Makefile |  1 +
 arch/powerpc/kernel/dawr.c   | 98 
 arch/powerpc/kernel/hw_breakpoint.c  | 61 ---
 arch/powerpc/kernel/process.c| 28 ---
 arch/powerpc/kvm/Kconfig |  1 +
 7 files changed, 118 insertions(+), 96 deletions(-)
 create mode 100644 arch/powerpc/kernel/dawr.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8c1c636308..9d61b36df4 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -234,6 +234,7 @@ config PPC
select OLD_SIGSUSPEND
select PCI_DOMAINS  if PCI
select PCI_SYSCALL  if PCI
+   select PPC_DAWR if PPC64
select RTC_LIB
select SPARSE_IRQ
select SYSCTL_EXCEPTION_TRACE
@@ -370,6 +371,9 @@ config PPC_ADV_DEBUG_DAC_RANGE
depends on PPC_ADV_DEBUG_REGS && 44x
default y
 
+config PPC_DAWR
+   bool
+
 config ZONE_DMA
bool
default y if PPC_BOOK3E_64
diff --git a/arch/powerpc/include/asm/hw_breakpoint.h 
b/arch/powerpc/include/asm/hw_breakpoint.h
index 0fe8c1e46b..41abdae6d0 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -90,18 +90,25 @@ static inline void hw_breakpoint_disable(void)
 extern void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs);
 int hw_breakpoint_handler(struct die_args *args);
 
-extern int set_dawr(struct arch_hw_breakpoint *brk);
+#else  /* CONFIG_HAVE_HW_BREAKPOINT */
+static inline void hw_breakpoint_disable(void) { }
+static inline void thread_change_pc(struct task_struct *tsk,
+   struct pt_regs *regs) { }
+
+#endif /* CONFIG_HAVE_HW_BREAKPOINT */
+
+
+#ifdef CONFIG_PPC_DAWR
 extern bool dawr_force_enable;
 static inline bool dawr_enabled(void)
 {
return dawr_force_enable;
 }
-
-#else  /* CONFIG_HAVE_HW_BREAKPOINT */
-static inline void hw_breakpoint_disable(void) { }
-static inline void thread_change_pc(struct task_struct *tsk,
-   struct pt_regs *regs) { }
+int set_dawr(struct arch_hw_breakpoint *brk);
+#else
 static inline bool dawr_enabled(void) { return false; }
-#endif /* CONFIG_HAVE_HW_BREAKPOINT */
+static inline int set_dawr(struct arch_hw_breakpoint *brk) { return -1; }
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _PPC_BOOK3S_64_HW_BREAKPOINT_H */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 0ea6c4aa3a..56dfa7a2a6 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -56,6 +56,7 @@ obj-$(CONFIG_PPC64)   += setup_64.o sys_ppc32.o \
 obj-$(CONFIG_VDSO32)   += vdso32/
 obj-$(CONFIG_PPC_WATCHDOG) += watchdog.o
 obj-$(CONFIG_HAVE_HW_BREAKPOINT)   += hw_breakpoint.o
+obj-$(CONFIG_PPC_DAWR) += dawr.o
 obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_ppc970.o cpu_setup_pa6t.o
 obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_power.o
 obj-$(CONFIG_PPC_BOOK3S_64)+= mce.o mce_power.o
diff --git a/arch/powerpc/kernel/dawr.c b/arch/powerpc/kernel/dawr.c
new file mode 100644
index 00..ae3bd6abac
--- /dev/null
+++ b/arch/powerpc/kernel/dawr.c
@@ -0,0 +1,98 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * DAWR infrastructure
+ *
+ * Copyright 2019, Michael Neuling, IBM Corporation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+bool dawr_force_enable;
+EXPORT_SYMBOL_GPL(dawr_force_enable);
+
+int set_dawr(struct 

Re: [RFC PATCH v2] powerpc/xmon: restrict when kernel is locked down

2019-06-03 Thread Christopher M Riedl


> On June 3, 2019 at 1:36 AM Andrew Donnellan  wrote:
> 
> 
> On 24/5/19 10:38 pm, Christopher M. Riedl wrote:
> > Xmon should be either fully or partially disabled depending on the
> > kernel lockdown state.
> > 
> > Put xmon into read-only mode for lockdown=integrity and completely
> > disable xmon when lockdown=confidentiality. Xmon checks the lockdown
> > state and takes appropriate action:
> > 
> >   (1) during xmon_setup to prevent early xmon'ing
> > 
> >   (2) when triggered via sysrq
> > 
> >   (3) when toggled via debugfs
> > 
> >   (4) when triggered via a previously enabled breakpoint
> > 
> > The following lockdown state transitions are handled:
> > 
> >   (1) lockdown=none -> lockdown=integrity
> >   clear all breakpoints, set xmon read-only mode
> > 
> >   (2) lockdown=none -> lockdown=confidentiality
> >   clear all breakpoints, prevent re-entry into xmon
> > 
> >   (3) lockdown=integrity -> lockdown=confidentiality
> >   prevent re-entry into xmon
> > 
> > Suggested-by: Andrew Donnellan 
> > Signed-off-by: Christopher M. Riedl 
> > ---
> > 
> > Applies on top of this series:
> > https://patchwork.kernel.org/cover/10884631/
> > 
> > I've done some limited testing of the scenarios mentioned in the commit
> > message on a single CPU QEMU config.
> > 
> > v1->v2:
> > Fix subject line
> > Submit to linuxppc-dev and kernel-hardening
> > 
> >   arch/powerpc/xmon/xmon.c | 56 +++-
> >   1 file changed, 55 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
> > index 3e7be19aa208..8c4a5a0c28f0 100644
> > --- a/arch/powerpc/xmon/xmon.c
> > +++ b/arch/powerpc/xmon/xmon.c
> > @@ -191,6 +191,9 @@ static void dump_tlb_44x(void);
> >   static void dump_tlb_book3e(void);
> >   #endif
> >   
> > +static void clear_all_bpt(void);
> > +static void xmon_init(int);
> > +
> >   #ifdef CONFIG_PPC64
> >   #define REG   "%.16lx"
> >   #else
> > @@ -291,6 +294,39 @@ Commands:\n\
> > zh  halt\n"
> >   ;
> >   
> > +#ifdef CONFIG_LOCK_DOWN_KERNEL
> > +static bool xmon_check_lockdown(void)
> > +{
> > +   static bool lockdown = false;
> > +
> > +   if (!lockdown) {
> > +   lockdown = kernel_is_locked_down("Using xmon",
> > +LOCKDOWN_CONFIDENTIALITY);
> > +   if (lockdown) {
> > +   printf("xmon: Disabled by strict kernel lockdown\n");
> > +   xmon_on = 0;
> > +   xmon_init(0);
> > +   }
> > +   }
> > +
> > +   if (!xmon_is_ro) {
> > +   xmon_is_ro = kernel_is_locked_down("Using xmon write-access",
> > +  LOCKDOWN_INTEGRITY);
> > +   if (xmon_is_ro) {
> > +   printf("xmon: Read-only due to kernel lockdown\n");
> > +   clear_all_bpt();
> 
> Remind me again why we need to clear breakpoints in integrity mode?
> 
> 
> Andrew
> 

I interpreted "integrity" mode as meaning that any changes made by xmon should
be reversed. This also covers the case when a user creates some breakpoint(s)
in xmon, exits xmon, and then elevates the lockdown state. Upon hitting the
first breakpoint and (re-)entering xmon, xmon will clear all breakpoints.

Xmon can only take action in response to dynamic lockdown level changes when
xmon is invoked in some manner - if there is a better way I am all ears :)

> 
> > +   }
> > +   }
> > +
> > +   return lockdown;
> > +}
> > +#else
> > +inline static bool xmon_check_lockdown(void)
> > +{
> > +   return false;
> > +}
> > +#endif /* CONFIG_LOCK_DOWN_KERNEL */
> > +
> >   static struct pt_regs *xmon_regs;
> >   
> >   static inline void sync(void)
> > @@ -708,6 +744,9 @@ static int xmon_bpt(struct pt_regs *regs)
> > struct bpt *bp;
> > unsigned long offset;
> >   
> > +   if (xmon_check_lockdown())
> > +   return 0;
> > +
> > if ((regs->msr & (MSR_IR|MSR_PR|MSR_64BIT)) != (MSR_IR|MSR_64BIT))
> > return 0;
> >   
> > @@ -739,6 +778,9 @@ static int xmon_sstep(struct pt_regs *regs)
> >   
> >   static int xmon_break_match(struct pt_regs *regs)
> >   {
> > +   if (xmon_check_lockdown())
> > +   return 0;
> > +
> > if ((regs->msr & (MSR_IR|MSR_PR|MSR_64BIT)) != (MSR_IR|MSR_64BIT))
> > return 0;
> > if (dabr.enabled == 0)
> > @@ -749,6 +791,9 @@ static int xmon_break_match(struct pt_regs *regs)
> >   
> >   static int xmon_iabr_match(struct pt_regs *regs)
> >   {
> > +   if (xmon_check_lockdown())
> > +   return 0;
> > +
> > if ((regs->msr & (MSR_IR|MSR_PR|MSR_64BIT)) != (MSR_IR|MSR_64BIT))
> > return 0;
> > if (iabr == NULL)
> > @@ -3742,6 +3787,9 @@ static void xmon_init(int enable)
> >   #ifdef CONFIG_MAGIC_SYSRQ
> >   static void sysrq_handle_xmon(int key)
> >   {
> > +   if (xmon_check_lockdown())
> > +   return;
> > +
> > /* ensure xmon is enabled */
> > 

Re: [PATCH v4 2/2] powerpc: Fix compile issue with force DAWR

2019-06-03 Thread Michael Neuling
I agree with all the below and will address in v5.

Mikey

On Tue, 2019-05-28 at 23:28 -0700, Christoph Hellwig wrote:
> > +config PPC_DAWR
> > +   bool
> > +   default n
> 
> "default n" is the default default.  No need to write this line.
> 
> > +++ b/arch/powerpc/kernel/dawr.c
> > @@ -0,0 +1,100 @@
> > +// SPDX-License-Identifier: GPL-2.0+
> > +//
> > +// DAWR infrastructure
> > +//
> > +// Copyright 2019, Michael Neuling, IBM Corporation.
> 
> Normal top of file header should be /* */, //-style comments are only
> for the actual SPDX heder line.
> 
> > +   /* Send error to user if they hypervisor won't allow us to write DAWR */
> > +   if ((!dawr_force_enable) &&
> > +   (firmware_has_feature(FW_FEATURE_LPAR)) &&
> > +   (set_dawr(_brk) != H_SUCCESS))
> 
> None of the three inner brace sets here are required, and the code
> becomes much easier to read without them.
> 
> > +   return -1;
> 
> What about returning a proper error code?
> 
> > +static int __init dawr_force_setup(void)
> > +{
> > +   dawr_force_enable = false;
> 
> This variable already is initialized to alse by default, so this line
> is not required.
> 
> > +   if (PVR_VER(mfspr(SPRN_PVR)) == PVR_POWER9) {
> > +   /* Turn DAWR off by default, but allow admin to turn it on */
> > +   dawr_force_enable = false;
> 
> .. and neither is this one.
> 



Re: [PATCH kernel] prom_init: Fetch flatten device tree from the system firmware

2019-06-03 Thread David Gibson
On Mon, Jun 03, 2019 at 06:49:32PM -0500, Segher Boessenkool wrote:
> On Tue, Jun 04, 2019 at 07:18:42AM +1000, Benjamin Herrenschmidt wrote:
> > Yes we could make property fetching faster but mostly by creating a new
> > bulk interface which is quite a bit of work, a new API, and will in
> > practice not be used for anything other than creating the FDT. In that
> > case, nothing will beat in performance having OF create the FDT itself
> > based on its current tree.
> 
> And that will change the whole boot protocol, the interaction between OF
> and its client, which is a much bigger change, not conceptually trivial
> at all.  Copying all properties at once is, which is why I suggested it.

Much as I wasn't terribly convinced by the original idea, I agree with
Ben and Alexey here.  Your approach has slightly more complexity for
slightly less benefit.  If it's worth doing yours it's better to do
theirs.

> It would take away the opposition to your patch.

Only from you, AFAICT...

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH v2 1/5] powerpc/powernv: Move SCOM access code into powernv platform

2019-06-03 Thread Andrew Donnellan

On 9/5/19 3:11 pm, Andrew Donnellan wrote:

The powernv platform is the only one that directly accesses SCOMs. Move the
support code to platforms/powernv, and get rid of the PPC_SCOM Kconfig
option, as SCOM support is always selected when compiling for powernv.

This also means that the Kconfig item for CONFIG_SCOM_DEBUGFS will actually
show up in menuconfig, as previously it was the only labelled option in
sysdev/Kconfig and wasn't actually in a menu.


As I've just realised, this isn't actually correct - the option does 
indeed show up... in the root menu, where I've just been trained to 
ignore it, and where you won't get a menu location if you try to search 
for it using / in menuconfig.


I think moving it to the platform menu is obviously a better location. 
mpe would you be able to fix up the commit message in merge?



Andrew




Signed-off-by: Andrew Donnellan 
---
v1->v2:
- move scom.h as well (mpe)
- add all the other patches in this series
---
  arch/powerpc/platforms/powernv/Kconfig  |  5 -
  arch/powerpc/platforms/powernv/Makefile |  2 +-
  arch/powerpc/platforms/powernv/opal-xscom.c |  3 ++-
  arch/powerpc/{sysdev => platforms/powernv}/scom.c   |  3 ++-
  .../{include/asm => platforms/powernv}/scom.h   | 13 +++--
  arch/powerpc/sysdev/Kconfig |  7 ---
  arch/powerpc/sysdev/Makefile|  2 --
  7 files changed, 12 insertions(+), 23 deletions(-)
  rename arch/powerpc/{sysdev => platforms/powernv}/scom.c (99%)
  rename arch/powerpc/{include/asm => platforms/powernv}/scom.h (95%)

diff --git a/arch/powerpc/platforms/powernv/Kconfig 
b/arch/powerpc/platforms/powernv/Kconfig
index 850eee860cf2..938803eab0ad 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -12,7 +12,6 @@ config PPC_POWERNV
select EPAPR_BOOT
select PPC_INDIRECT_PIO
select PPC_UDBG_16550
-   select PPC_SCOM
select ARCH_RANDOM
select CPU_FREQ
select PPC_DOORBELL
@@ -47,3 +46,7 @@ config PPC_VAS
  VAS adapters are found in POWER9 based systems.
  
  	  If unsure, say N.

+
+config SCOM_DEBUGFS
+   bool "Expose SCOM controllers via debugfs"
+   depends on DEBUG_FS
diff --git a/arch/powerpc/platforms/powernv/Makefile 
b/arch/powerpc/platforms/powernv/Makefile
index da2e99efbd04..4b1644150135 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -4,12 +4,12 @@ obj-y += idle.o opal-rtc.o opal-nvram.o 
opal-lpc.o opal-flash.o
  obj-y += rng.o opal-elog.o opal-dump.o opal-sysparam.o 
opal-sensor.o
  obj-y += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
  obj-y += opal-kmsg.o opal-powercap.o opal-psr.o 
opal-sensor-groups.o
+obj-y  += opal-xscom.o scom.o
  
  obj-$(CONFIG_SMP)	+= smp.o subcore.o subcore-asm.o

  obj-$(CONFIG_PCI) += pci.o pci-ioda.o npu-dma.o pci-ioda-tce.o
  obj-$(CONFIG_CXL_BASE)+= pci-cxl.o
  obj-$(CONFIG_EEH) += eeh-powernv.o
-obj-$(CONFIG_PPC_SCOM) += opal-xscom.o
  obj-$(CONFIG_MEMORY_FAILURE)  += opal-memory-errors.o
  obj-$(CONFIG_OPAL_PRD)+= opal-prd.o
  obj-$(CONFIG_PERF_EVENTS) += opal-imc.o
diff --git a/arch/powerpc/platforms/powernv/opal-xscom.c 
b/arch/powerpc/platforms/powernv/opal-xscom.c
index 22d5e1110dbb..66337d92cb63 100644
--- a/arch/powerpc/platforms/powernv/opal-xscom.c
+++ b/arch/powerpc/platforms/powernv/opal-xscom.c
@@ -18,7 +18,8 @@
  #include 
  #include 
  #include 
-#include 
+
+#include "scom.h"
  
  /*

   * We could probably fit that inside the scom_map_t
diff --git a/arch/powerpc/sysdev/scom.c b/arch/powerpc/platforms/powernv/scom.c
similarity index 99%
rename from arch/powerpc/sysdev/scom.c
rename to arch/powerpc/platforms/powernv/scom.c
index a707b24a7ddb..50c019d2ef45 100644
--- a/arch/powerpc/sysdev/scom.c
+++ b/arch/powerpc/platforms/powernv/scom.c
@@ -23,9 +23,10 @@
  #include 
  #include 
  #include 
-#include 
  #include 
  
+#include "scom.h"

+
  const struct scom_controller *scom_controller;
  EXPORT_SYMBOL_GPL(scom_controller);
  
diff --git a/arch/powerpc/include/asm/scom.h b/arch/powerpc/platforms/powernv/scom.h

similarity index 95%
rename from arch/powerpc/include/asm/scom.h
rename to arch/powerpc/platforms/powernv/scom.h
index f5cde45b1161..b14fe0edf95b 100644
--- a/arch/powerpc/include/asm/scom.h
+++ b/arch/powerpc/platforms/powernv/scom.h
@@ -18,12 +18,8 @@
   *   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
   */
  
-#ifndef _ASM_POWERPC_SCOM_H

-#define _ASM_POWERPC_SCOM_H
-
-#ifdef __KERNEL__
-#ifndef __ASSEMBLY__
-#ifdef CONFIG_PPC_SCOM
+#ifndef _SCOM_H
+#define _SCOM_H
  
  /*

   * The SCOM bus is a sideband bus used for accessing various internal
@@ -161,7 +157,4 @@ static inline int scom_write(scom_map_t map, u64 reg, u64 
value)
  }
  
  
-#endif /* CONFIG_PPC_SCOM */

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-06-03 Thread Michael S. Tsirkin
On Mon, Jun 03, 2019 at 10:13:59PM -0300, Thiago Jung Bauermann wrote:
> 
> 
> Michael S. Tsirkin  writes:
> 
> > On Wed, Apr 17, 2019 at 06:42:00PM -0300, Thiago Jung Bauermann wrote:
> >> I rephrased it in terms of address translation. What do you think of
> >> this version? The flag name is slightly different too:
> >>
> >>
> >> VIRTIO_F_ACCESS_PLATFORM_NO_TRANSLATION This feature has the same
> >> meaning as VIRTIO_F_ACCESS_PLATFORM both when set and when not set,
> >> with the exception that address translation is guaranteed to be
> >> unnecessary when accessing memory addresses supplied to the device
> >> by the driver. Which is to say, the device will always use physical
> >> addresses matching addresses used by the driver (typically meaning
> >> physical addresses used by the CPU) and not translated further. This
> >> flag should be set by the guest if offered, but to allow for
> >> backward-compatibility device implementations allow for it to be
> >> left unset by the guest. It is an error to set both this flag and
> >> VIRTIO_F_ACCESS_PLATFORM.
> >
> >
> > OK so VIRTIO_F_ACCESS_PLATFORM is designed to allow unpriveledged
> > drivers. This is why devices fail when it's not negotiated.
> 
> Just to clarify, what do you mean by unprivileged drivers? Is it drivers
> implemented in guest userspace such as with VFIO? Or unprivileged in
> some other sense such as needing to use bounce buffers for some reason?

I had drivers in guest userspace in mind.

> > This confuses me.
> > If driver is unpriveledged then what happens with this flag?
> > It can supply any address it wants. Will that corrupt kernel
> > memory?
> 
> Not needing address translation doesn't necessarily mean that there's no
> IOMMU. On powerpc we don't use VIRTIO_F_ACCESS_PLATFORM but there's
> always an IOMMU present. And we also support VFIO drivers. The VFIO API
> for pseries (sPAPR section in Documentation/vfio.txt) has extra ioctls
> to program the IOMMU.
> 
> For our use case, we don't need address translation because we set up an
> identity mapping in the IOMMU so that the device can use guest physical
> addresses.

And can it access any guest physical address?

> If the guest kernel is concerned that an unprivileged driver could
> jeopardize its integrity it should not negotiate this feature flag.

Unfortunately flag negotiation is done through config space
and so can be overwritten by the driver.

> Perhaps there should be a note about this in the flag definition? This
> concern is platform-dependant though. I don't believe it's an issue in
> pseries.

Again ACCESS_PLATFORM has a pretty open definition. It does actually
say it's all up to the platform.

Specifically how will VIRTIO_F_ACCESS_PLATFORM_NO_TRANSLATION be
implemented portably? virtio has no portable way to know
whether DMA API bypasses translation.



> --
> Thiago Jung Bauermann
> IBM Linux Technology Center


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-06-03 Thread Thiago Jung Bauermann



Michael S. Tsirkin  writes:

> On Wed, Apr 17, 2019 at 06:42:00PM -0300, Thiago Jung Bauermann wrote:
>> I rephrased it in terms of address translation. What do you think of
>> this version? The flag name is slightly different too:
>>
>>
>> VIRTIO_F_ACCESS_PLATFORM_NO_TRANSLATION This feature has the same
>> meaning as VIRTIO_F_ACCESS_PLATFORM both when set and when not set,
>> with the exception that address translation is guaranteed to be
>> unnecessary when accessing memory addresses supplied to the device
>> by the driver. Which is to say, the device will always use physical
>> addresses matching addresses used by the driver (typically meaning
>> physical addresses used by the CPU) and not translated further. This
>> flag should be set by the guest if offered, but to allow for
>> backward-compatibility device implementations allow for it to be
>> left unset by the guest. It is an error to set both this flag and
>> VIRTIO_F_ACCESS_PLATFORM.
>
>
> OK so VIRTIO_F_ACCESS_PLATFORM is designed to allow unpriveledged
> drivers. This is why devices fail when it's not negotiated.

Just to clarify, what do you mean by unprivileged drivers? Is it drivers
implemented in guest userspace such as with VFIO? Or unprivileged in
some other sense such as needing to use bounce buffers for some reason?

> This confuses me.
> If driver is unpriveledged then what happens with this flag?
> It can supply any address it wants. Will that corrupt kernel
> memory?

Not needing address translation doesn't necessarily mean that there's no
IOMMU. On powerpc we don't use VIRTIO_F_ACCESS_PLATFORM but there's
always an IOMMU present. And we also support VFIO drivers. The VFIO API
for pseries (sPAPR section in Documentation/vfio.txt) has extra ioctls
to program the IOMMU.

For our use case, we don't need address translation because we set up an
identity mapping in the IOMMU so that the device can use guest physical
addresses.

If the guest kernel is concerned that an unprivileged driver could
jeopardize its integrity it should not negotiate this feature flag.
Perhaps there should be a note about this in the flag definition? This
concern is platform-dependant though. I don't believe it's an issue in
pseries.

--
Thiago Jung Bauermann
IBM Linux Technology Center



Re: [PATCH kernel] prom_init: Fetch flatten device tree from the system firmware

2019-06-03 Thread Benjamin Herrenschmidt
On Mon, 2019-06-03 at 18:42 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Jun 03, 2019 at 12:56:26PM +1000, Alexey Kardashevskiy wrote:
> > On 03/06/2019 09:23, Segher Boessenkool wrote:
> > > > So I go for the simple one and agree with Alexey's idea.
> > > 
> > > When dealing with a whole device tree you have to know about the
> > > various
> > > dynamically generated nodes and props, and handle each
> > > appropriately.
> > 
> > The code I am changing fetches the device tree and build an fdt.
> > What is
> > that special knowledge in this context you are talking about?
> 
> Things like /options are dynamically generated.

They are generated before we do the call to extract the fdt, what's the
problem there ?

> I thought you would just be able to reuse some FDT parsing code to
> implement my suggested new interface.  Maybe that was too optimistic.

No, the idea is to have SLOF re-flatten it's live tree and hand us a
blob. At least that's my understanding.

Ben.




Re: [PATCH v2] PCI: rpaphp: Avoid a sometimes-uninitialized warning

2019-06-03 Thread Tyrel Datwyler
On 06/03/2019 03:11 PM, Nathan Chancellor wrote:
> When building with -Wsometimes-uninitialized, clang warns:
> 
> drivers/pci/hotplug/rpaphp_core.c:243:14: warning: variable 'fndit' is
> used uninitialized whenever 'for' loop exits because its condition is
> false [-Wsometimes-uninitialized]
> for (j = 0; j < entries; j++) {
> ^~~
> drivers/pci/hotplug/rpaphp_core.c:256:6: note: uninitialized use occurs
> here
> if (fndit)
> ^
> drivers/pci/hotplug/rpaphp_core.c:243:14: note: remove the condition if
> it is always true
> for (j = 0; j < entries; j++) {
> ^~~
> drivers/pci/hotplug/rpaphp_core.c:233:14: note: initialize the variable
> 'fndit' to silence this warning
> int j, fndit;
> ^
>  = 0
> 
> fndit is only used to gate a sprintf call, which can be moved into the
> loop to simplify the code and eliminate the local variable, which will
> fix this warning.
> 
> Link: https://github.com/ClangBuiltLinux/linux/issues/504
> Fixes: 2fcf3ae508c2 ("hotplug/drc-info: Add code to search ibm,drc-info 
> property")
> Suggested-by: Nick Desaulniers 
> Signed-off-by: Nathan Chancellor 

Acked-by: Tyrel Datwyler 


Re: [PATCH v3] scsi: ibmvscsi: Don't use rc uninitialized in ibmvscsi_do_work

2019-06-03 Thread Tyrel Datwyler
On 06/03/2019 04:44 PM, Nathan Chancellor wrote:
> clang warns:
> 
> drivers/scsi/ibmvscsi/ibmvscsi.c:2126:7: warning: variable 'rc' is used
> uninitialized whenever switch case is taken [-Wsometimes-uninitialized]
> case IBMVSCSI_HOST_ACTION_NONE:
>  ^
> drivers/scsi/ibmvscsi/ibmvscsi.c:2151:6: note: uninitialized use occurs
> here
> if (rc) {
> ^~
> 
> Initialize rc in the IBMVSCSI_HOST_ACTION_UNBLOCK case statement then
> shuffle IBMVSCSI_HOST_ACTION_NONE down to the default case statement and
> make it return early so that rc is never used uninitialized in this
> function.
> 
> Fixes: 035a3c4046b5 ("scsi: ibmvscsi: redo driver work thread to use enum 
> action states")
> Link: https://github.com/ClangBuiltLinux/linux/issues/502
> Suggested-by: Michael Ellerman 
> Suggested-by: Tyrel Datwyler 
> Signed-off-by: Nathan Chancellor 

Acked-by: Tyrel Datwyler 



Re: [WIP RFC PATCH 0/6] Generic Firmware Variable Filesystem

2019-06-03 Thread Daniel Axtens
Hi Greg,

>> >> As PowerNV moves towards secure boot, we need a place to put secure
>> >> variables. One option that has been canvassed is to make our secure
>> >> variables look like EFI variables. This is an early sketch of another
>> >> approach where we create a generic firmware variable file system,
>> >> fwvarfs, and an OPAL Secure Variable backend for it.
>> >
>> > Is there a need of new filesystem ? I am wondering why can't these be 
>> > exposed via sysfs / securityfs ?
>> > Probably, something like... /sys/firmware/secureboot or 
>> > /sys/kernel/security/secureboot/  ?
>> 
>> I suppose we could put secure variables in sysfs, but I'm not sure
>> that's what sysfs was intended for. I understand sysfs as "a
>> filesystem-based view of kernel objects" (from
>> Documentation/filesystems/configfs/configfs.txt), and I don't think a
>> secure variable is really a kernel object in the same way most other
>> things in sysfs are... but I'm open to being convinced.
>
> What makes them more "secure" than anything else that is in sysfs today?
> I didn't see anything in this patchset that provided "additional
> security", did I miss it?

You're right, there's no additional security. What I should have said
was that I didn't think that _firmware_ variables were kernel objects in
the same way that other things in sysfs are. Having read the rest of
your reply it seems I'm mistaken on this.

> I would just recommend putting this in sysfs.  Make a new subsystem
> (i.e. class) and away you go.
>
>> My hope with fwvarfs is to provide a generic place for firmware
>> variables so that we don't need to expand the list of firmware-specific
>> filesystems beyond efivarfs. I am also aiming to make things simple to
>> use so that people familiar with firmware don't also have to become
>> familiar with filesystem code in order to expose firmware variables to
>> userspace.

>> fwvarfs can also be used for variables that are not security relevant as
>> well. For example, with the EFI backend (patch 3), both secure and
>> insecure variables can be read.
>
> I don't remember why efi variables were not put in sysfs, I think there
> was some reasoning behind it originally.  Perhaps look in the linux-efi
> archives.

I'll have a look: I suspect the appeal of efivarfs is that it allows for
things like non-case-sensitive matching on the GUID part of the filename
while retaining case-sensitivity on the part of the filename
representing the variable name.

As suggested, I'll try a sysfs class. I think that will allow me to
kill off most of the abstraction layer too. Thanks for the input.

Regards,
Daniel

>
> thanks,
>
> greg k-h


Re: [PATCH kernel] prom_init: Fetch flatten device tree from the system firmware

2019-06-03 Thread Segher Boessenkool
On Tue, Jun 04, 2019 at 07:18:42AM +1000, Benjamin Herrenschmidt wrote:
> Yes we could make property fetching faster but mostly by creating a new
> bulk interface which is quite a bit of work, a new API, and will in
> practice not be used for anything other than creating the FDT. In that
> case, nothing will beat in performance having OF create the FDT itself
> based on its current tree.

And that will change the whole boot protocol, the interaction between OF
and its client, which is a much bigger change, not conceptually trivial
at all.  Copying all properties at once is, which is why I suggested it.

It would take away the opposition to your patch.


Segher


[PATCH v3] scsi: ibmvscsi: Don't use rc uninitialized in ibmvscsi_do_work

2019-06-03 Thread Nathan Chancellor
clang warns:

drivers/scsi/ibmvscsi/ibmvscsi.c:2126:7: warning: variable 'rc' is used
uninitialized whenever switch case is taken [-Wsometimes-uninitialized]
case IBMVSCSI_HOST_ACTION_NONE:
 ^
drivers/scsi/ibmvscsi/ibmvscsi.c:2151:6: note: uninitialized use occurs
here
if (rc) {
^~

Initialize rc in the IBMVSCSI_HOST_ACTION_UNBLOCK case statement then
shuffle IBMVSCSI_HOST_ACTION_NONE down to the default case statement and
make it return early so that rc is never used uninitialized in this
function.

Fixes: 035a3c4046b5 ("scsi: ibmvscsi: redo driver work thread to use enum 
action states")
Link: https://github.com/ClangBuiltLinux/linux/issues/502
Suggested-by: Michael Ellerman 
Suggested-by: Tyrel Datwyler 
Signed-off-by: Nathan Chancellor 
---

v1 -> v2:

* Initialize rc in the case statements, rather than at the top of the
  function, as suggested by Michael.
  
v2 -> v3:

* default and IBMVSCSI_HOST_ACTION_NONE now return early from the
  function, as requested by Tyrel.

 drivers/scsi/ibmvscsi/ibmvscsi.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index 65053daef5f7..7f66a7783209 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -2109,8 +2109,8 @@ static void ibmvscsi_do_work(struct ibmvscsi_host_data 
*hostdata)
 
spin_lock_irqsave(hostdata->host->host_lock, flags);
switch (hostdata->action) {
-   case IBMVSCSI_HOST_ACTION_NONE:
case IBMVSCSI_HOST_ACTION_UNBLOCK:
+   rc = 0;
break;
case IBMVSCSI_HOST_ACTION_RESET:
spin_unlock_irqrestore(hostdata->host->host_lock, flags);
@@ -2128,8 +2128,10 @@ static void ibmvscsi_do_work(struct ibmvscsi_host_data 
*hostdata)
if (!rc)
rc = ibmvscsi_send_crq(hostdata, 0xC001LL, 
0);
break;
+   case IBMVSCSI_HOST_ACTION_NONE:
default:
-   break;
+   spin_unlock_irqrestore(hostdata->host->host_lock, flags);
+   return;
}
 
hostdata->action = IBMVSCSI_HOST_ACTION_NONE;
-- 
2.22.0.rc3



Re: [PATCH kernel] prom_init: Fetch flatten device tree from the system firmware

2019-06-03 Thread Segher Boessenkool
Hi!

On Mon, Jun 03, 2019 at 12:56:26PM +1000, Alexey Kardashevskiy wrote:
> On 03/06/2019 09:23, Segher Boessenkool wrote:
> >> So I go for the simple one and agree with Alexey's idea.
> > 
> > When dealing with a whole device tree you have to know about the various
> > dynamically generated nodes and props, and handle each appropriately.
> 
> The code I am changing fetches the device tree and build an fdt. What is
> that special knowledge in this context you are talking about?

Things like /options are dynamically generated.

I thought you would just be able to reuse some FDT parsing code to
implement my suggested new interface.  Maybe that was too optimistic.


Segher


Re: [PATCH v2] scsi: ibmvscsi: Don't use rc uninitialized in ibmvscsi_do_work

2019-06-03 Thread Tyrel Datwyler
On 06/03/2019 04:34 PM, Tyrel Datwyler wrote:
> On 06/03/2019 04:25 PM, Tyrel Datwyler wrote:
>> On 06/03/2019 03:19 PM, Nathan Chancellor wrote:
>>> clang warns:
>>>
>>> drivers/scsi/ibmvscsi/ibmvscsi.c:2126:7: warning: variable 'rc' is used
>>> uninitialized whenever switch case is taken [-Wsometimes-uninitialized]
>>> case IBMVSCSI_HOST_ACTION_NONE:
>>>  ^
>>> drivers/scsi/ibmvscsi/ibmvscsi.c:2151:6: note: uninitialized use occurs
>>> here
>>> if (rc) {
>>> ^~
>>>
>>> Initialize rc to zero in the case statements that clang mentions so that
>>> the atomic_set and dev_err statement don't trigger for them.
>>>
>>> Fixes: 035a3c4046b5 ("scsi: ibmvscsi: redo driver work thread to use enum 
>>> action states")
>>> Link: https://github.com/ClangBuiltLinux/linux/issues/502
>>> Suggested-by: Michael Ellerman 
>>> Signed-off-by: Nathan Chancellor 
>>
>> Acked-by: Tyrel Datwyler 
>>
> 
> On second thought NACK. See my response to Michael earlier in the thread.
> 
> I think this is the better solution:
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c 
> b/drivers/scsi/ibmvscsi/ibmvscsi.c
> index 727c31dc11a0..c3cf05dd8733 100644
> --- a/drivers/scsi/ibmvscsi/ibmvscsi.c
> +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
> @@ -2123,8 +2123,8 @@ static void ibmvscsi_do_work(struct ibmvscsi_host_data
> *hostdata)
> 
> spin_lock_irqsave(hostdata->host->host_lock, flags);
> switch (hostdata->action) {
> -   case IBMVSCSI_HOST_ACTION_NONE:
> case IBMVSCSI_HOST_ACTION_UNBLOCK:
> +   rc = 0;
> break;
> case IBMVSCSI_HOST_ACTION_RESET:
> spin_unlock_irqrestore(hostdata->host->host_lock, flags);
> @@ -2142,8 +2142,9 @@ static void ibmvscsi_do_work(struct ibmvscsi_host_data
> *hostdata)
> if (!rc)
> rc = ibmvscsi_send_crq(hostdata, 
> 0xC001LL, 0);
> break;
> +   case IBMVSCSI_HOST_ACTION_NONE:
> default:
> -   break;

Need a spin_unlock_irqrestore() here before the return.

-Tyrel

> +   return;
> }
> 
> hostdata->action = IBMVSCSI_HOST_ACTION_NONE;
> 



Re: [PATCH v2] scsi: ibmvscsi: Don't use rc uninitialized in ibmvscsi_do_work

2019-06-03 Thread Tyrel Datwyler
On 06/03/2019 04:25 PM, Tyrel Datwyler wrote:
> On 06/03/2019 03:19 PM, Nathan Chancellor wrote:
>> clang warns:
>>
>> drivers/scsi/ibmvscsi/ibmvscsi.c:2126:7: warning: variable 'rc' is used
>> uninitialized whenever switch case is taken [-Wsometimes-uninitialized]
>> case IBMVSCSI_HOST_ACTION_NONE:
>>  ^
>> drivers/scsi/ibmvscsi/ibmvscsi.c:2151:6: note: uninitialized use occurs
>> here
>> if (rc) {
>> ^~
>>
>> Initialize rc to zero in the case statements that clang mentions so that
>> the atomic_set and dev_err statement don't trigger for them.
>>
>> Fixes: 035a3c4046b5 ("scsi: ibmvscsi: redo driver work thread to use enum 
>> action states")
>> Link: https://github.com/ClangBuiltLinux/linux/issues/502
>> Suggested-by: Michael Ellerman 
>> Signed-off-by: Nathan Chancellor 
> 
> Acked-by: Tyrel Datwyler 
> 

On second thought NACK. See my response to Michael earlier in the thread.

I think this is the better solution:

diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index 727c31dc11a0..c3cf05dd8733 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -2123,8 +2123,8 @@ static void ibmvscsi_do_work(struct ibmvscsi_host_data
*hostdata)

spin_lock_irqsave(hostdata->host->host_lock, flags);
switch (hostdata->action) {
-   case IBMVSCSI_HOST_ACTION_NONE:
case IBMVSCSI_HOST_ACTION_UNBLOCK:
+   rc = 0;
break;
case IBMVSCSI_HOST_ACTION_RESET:
spin_unlock_irqrestore(hostdata->host->host_lock, flags);
@@ -2142,8 +2142,9 @@ static void ibmvscsi_do_work(struct ibmvscsi_host_data
*hostdata)
if (!rc)
rc = ibmvscsi_send_crq(hostdata, 0xC001LL, 
0);
break;
+   case IBMVSCSI_HOST_ACTION_NONE:
default:
-   break;
+   return;
}

hostdata->action = IBMVSCSI_HOST_ACTION_NONE;



Re: [PATCH] scsi: ibmvscsi: Don't use rc uninitialized in ibmvscsi_do_work

2019-06-03 Thread Tyrel Datwyler
On 06/02/2019 03:15 AM, Michael Ellerman wrote:
> Hi Nathan,
> 
> Nathan Chancellor  writes:
>> clang warns:
>>
>> drivers/scsi/ibmvscsi/ibmvscsi.c:2126:7: warning: variable 'rc' is used
>> uninitialized whenever switch case is taken [-Wsometimes-uninitialized]
>> case IBMVSCSI_HOST_ACTION_NONE:
>>  ^
>> drivers/scsi/ibmvscsi/ibmvscsi.c:2151:6: note: uninitialized use occurs
>> here
>> if (rc) {
>> ^~
>>
>> Initialize rc to zero so that the atomic_set and dev_err statement don't
>> trigger for the cases that just break.
>>
>> Fixes: 035a3c4046b5 ("scsi: ibmvscsi: redo driver work thread to use enum 
>> action states")
>> Link: https://github.com/ClangBuiltLinux/linux/issues/502
>> Signed-off-by: Nathan Chancellor 
>> ---
>>  drivers/scsi/ibmvscsi/ibmvscsi.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c 
>> b/drivers/scsi/ibmvscsi/ibmvscsi.c
>> index 727c31dc11a0..6714d8043e62 100644
>> --- a/drivers/scsi/ibmvscsi/ibmvscsi.c
>> +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
>> @@ -2118,7 +2118,7 @@ static unsigned long ibmvscsi_get_desired_dma(struct 
>> vio_dev *vdev)
>>  static void ibmvscsi_do_work(struct ibmvscsi_host_data *hostdata)
>>  {
>>  unsigned long flags;
>> -int rc;
>> +int rc = 0;
>>  char *action = "reset";
>>  
>>  spin_lock_irqsave(hostdata->host->host_lock, flags);
> 
> It's always preferable IMHO to keep any initialisation as localised as
> possible, so that the compiler can continue to warn about uninitialised
> usages elsewhere. In this case that would mean doing the rc = 0 in the
> switch, something like:
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c 
> b/drivers/scsi/ibmvscsi/ibmvscsi.c
> index 727c31dc11a0..7ee5755cf636 100644
> --- a/drivers/scsi/ibmvscsi/ibmvscsi.c
> +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
> @@ -2123,9 +2123,6 @@ static void ibmvscsi_do_work(struct ibmvscsi_host_data 
> *hostdata)
>  
> spin_lock_irqsave(hostdata->host->host_lock, flags);
> switch (hostdata->action) {
> -   case IBMVSCSI_HOST_ACTION_NONE:
> -   case IBMVSCSI_HOST_ACTION_UNBLOCK:
> -   break;
> case IBMVSCSI_HOST_ACTION_RESET:
> spin_unlock_irqrestore(hostdata->host->host_lock, flags);
> rc = ibmvscsi_reset_crq_queue(>queue, hostdata);
> @@ -2142,7 +2139,10 @@ static void ibmvscsi_do_work(struct ibmvscsi_host_data 
> *hostdata)
> if (!rc)
> rc = ibmvscsi_send_crq(hostdata, 
> 0xC001LL, 0);
> break;
> +   case IBMVSCSI_HOST_ACTION_NONE:
> +   case IBMVSCSI_HOST_ACTION_UNBLOCK:
> default:
> +   rc = 0;
> break;
> }
> 
> 
> But then that makes me wonder if that's actually correct?
> 
> If we get an action that we don't recognise should we just throw it away
> like that? (by doing hostdata->action = IBMVSCSI_HOST_ACTION_NONE). Tyrel?

On initial pass I was ok with this, but after thinking on it I think it is more
subtle.

The right approach is to set rc = 0 for HOST_ACTION_UNBLOCK as we want to fall
through. For HOST_ACTION_NONE and default we need to return directly from the
function.

-Tyrel

> 
> cheers
> 



Re: [PATCH v2] scsi: ibmvscsi: Don't use rc uninitialized in ibmvscsi_do_work

2019-06-03 Thread Tyrel Datwyler
On 06/03/2019 03:19 PM, Nathan Chancellor wrote:
> clang warns:
> 
> drivers/scsi/ibmvscsi/ibmvscsi.c:2126:7: warning: variable 'rc' is used
> uninitialized whenever switch case is taken [-Wsometimes-uninitialized]
> case IBMVSCSI_HOST_ACTION_NONE:
>  ^
> drivers/scsi/ibmvscsi/ibmvscsi.c:2151:6: note: uninitialized use occurs
> here
> if (rc) {
> ^~
> 
> Initialize rc to zero in the case statements that clang mentions so that
> the atomic_set and dev_err statement don't trigger for them.
> 
> Fixes: 035a3c4046b5 ("scsi: ibmvscsi: redo driver work thread to use enum 
> action states")
> Link: https://github.com/ClangBuiltLinux/linux/issues/502
> Suggested-by: Michael Ellerman 
> Signed-off-by: Nathan Chancellor 

Acked-by: Tyrel Datwyler 



[PATCH v2] scsi: ibmvscsi: Don't use rc uninitialized in ibmvscsi_do_work

2019-06-03 Thread Nathan Chancellor
clang warns:

drivers/scsi/ibmvscsi/ibmvscsi.c:2126:7: warning: variable 'rc' is used
uninitialized whenever switch case is taken [-Wsometimes-uninitialized]
case IBMVSCSI_HOST_ACTION_NONE:
 ^
drivers/scsi/ibmvscsi/ibmvscsi.c:2151:6: note: uninitialized use occurs
here
if (rc) {
^~

Initialize rc to zero in the case statements that clang mentions so that
the atomic_set and dev_err statement don't trigger for them.

Fixes: 035a3c4046b5 ("scsi: ibmvscsi: redo driver work thread to use enum 
action states")
Link: https://github.com/ClangBuiltLinux/linux/issues/502
Suggested-by: Michael Ellerman 
Signed-off-by: Nathan Chancellor 
---

v1 -> v2:

* Initialize rc in the case statements, rather than at the top of the
  function, as suggested by Michael.

 drivers/scsi/ibmvscsi/ibmvscsi.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index 65053daef5f7..3b5647d622d9 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -2109,9 +2109,6 @@ static void ibmvscsi_do_work(struct ibmvscsi_host_data 
*hostdata)
 
spin_lock_irqsave(hostdata->host->host_lock, flags);
switch (hostdata->action) {
-   case IBMVSCSI_HOST_ACTION_NONE:
-   case IBMVSCSI_HOST_ACTION_UNBLOCK:
-   break;
case IBMVSCSI_HOST_ACTION_RESET:
spin_unlock_irqrestore(hostdata->host->host_lock, flags);
rc = ibmvscsi_reset_crq_queue(>queue, hostdata);
@@ -2128,7 +2125,10 @@ static void ibmvscsi_do_work(struct ibmvscsi_host_data 
*hostdata)
if (!rc)
rc = ibmvscsi_send_crq(hostdata, 0xC001LL, 
0);
break;
+   case IBMVSCSI_HOST_ACTION_NONE:
+   case IBMVSCSI_HOST_ACTION_UNBLOCK:
default:
+   rc = 0;
break;
}
 
-- 
2.22.0.rc3



Re: [PATCH v3 06/11] mm/memory_hotplug: Allow arch_remove_pages() without CONFIG_MEMORY_HOTREMOVE

2019-06-03 Thread Wei Yang
Allow arch_remove_pages() or arch_remove_memory()?

And want to confirm the kernel build on affected arch succeed?

On Mon, May 27, 2019 at 01:11:47PM +0200, David Hildenbrand wrote:
>We want to improve error handling while adding memory by allowing
>to use arch_remove_memory() and __remove_pages() even if
>CONFIG_MEMORY_HOTREMOVE is not set to e.g., implement something like:
>
>   arch_add_memory()
>   rc = do_something();
>   if (rc) {
>   arch_remove_memory();
>   }
>
>We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require
>quite some dependencies for memory offlining.
>
>Cc: Tony Luck 
>Cc: Fenghua Yu 
>Cc: Benjamin Herrenschmidt 
>Cc: Paul Mackerras 
>Cc: Michael Ellerman 
>Cc: Martin Schwidefsky 
>Cc: Heiko Carstens 
>Cc: Yoshinori Sato 
>Cc: Rich Felker 
>Cc: Dave Hansen 
>Cc: Andy Lutomirski 
>Cc: Peter Zijlstra 
>Cc: Thomas Gleixner 
>Cc: Ingo Molnar 
>Cc: Borislav Petkov 
>Cc: "H. Peter Anvin" 
>Cc: Greg Kroah-Hartman 
>Cc: "Rafael J. Wysocki" 
>Cc: Andrew Morton 
>Cc: Michal Hocko 
>Cc: Mike Rapoport 
>Cc: David Hildenbrand 
>Cc: Oscar Salvador 
>Cc: "Kirill A. Shutemov" 
>Cc: Alex Deucher 
>Cc: "David S. Miller" 
>Cc: Mark Brown 
>Cc: Chris Wilson 
>Cc: Christophe Leroy 
>Cc: Nicholas Piggin 
>Cc: Vasily Gorbik 
>Cc: Rob Herring 
>Cc: Masahiro Yamada 
>Cc: "mike.tra...@hpe.com" 
>Cc: Andrew Banman 
>Cc: Pavel Tatashin 
>Cc: Wei Yang 
>Cc: Arun KS 
>Cc: Qian Cai 
>Cc: Mathieu Malaterre 
>Cc: Baoquan He 
>Cc: Logan Gunthorpe 
>Cc: Anshuman Khandual 
>Signed-off-by: David Hildenbrand 
>---
> arch/arm64/mm/mmu.c| 2 --
> arch/ia64/mm/init.c| 2 --
> arch/powerpc/mm/mem.c  | 2 --
> arch/s390/mm/init.c| 2 --
> arch/sh/mm/init.c  | 2 --
> arch/x86/mm/init_32.c  | 2 --
> arch/x86/mm/init_64.c  | 2 --
> drivers/base/memory.c  | 2 --
> include/linux/memory.h | 2 --
> include/linux/memory_hotplug.h | 2 --
> mm/memory_hotplug.c| 2 --
> mm/sparse.c| 6 --
> 12 files changed, 28 deletions(-)
>
>diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>index e569a543c384..9ccd7539f2d4 100644
>--- a/arch/arm64/mm/mmu.c
>+++ b/arch/arm64/mm/mmu.c
>@@ -1084,7 +1084,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
>   return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
>  restrictions);
> }
>-#ifdef CONFIG_MEMORY_HOTREMOVE
> void arch_remove_memory(int nid, u64 start, u64 size,
>   struct vmem_altmap *altmap)
> {
>@@ -1103,4 +1102,3 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>   __remove_pages(zone, start_pfn, nr_pages, altmap);
> }
> #endif
>-#endif
>diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
>index d28e29103bdb..aae75fd7b810 100644
>--- a/arch/ia64/mm/init.c
>+++ b/arch/ia64/mm/init.c
>@@ -681,7 +681,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
>   return ret;
> }
> 
>-#ifdef CONFIG_MEMORY_HOTREMOVE
> void arch_remove_memory(int nid, u64 start, u64 size,
>   struct vmem_altmap *altmap)
> {
>@@ -693,4 +692,3 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>   __remove_pages(zone, start_pfn, nr_pages, altmap);
> }
> #endif
>-#endif
>diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
>index e885fe2aafcc..e4bc2dc3f593 100644
>--- a/arch/powerpc/mm/mem.c
>+++ b/arch/powerpc/mm/mem.c
>@@ -130,7 +130,6 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
>   return __add_pages(nid, start_pfn, nr_pages, restrictions);
> }
> 
>-#ifdef CONFIG_MEMORY_HOTREMOVE
> void __ref arch_remove_memory(int nid, u64 start, u64 size,
>struct vmem_altmap *altmap)
> {
>@@ -164,7 +163,6 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
>   pr_warn("Hash collision while resizing HPT\n");
> }
> #endif
>-#endif /* CONFIG_MEMORY_HOTPLUG */
> 
> #ifndef CONFIG_NEED_MULTIPLE_NODES
> void __init mem_topology_setup(void)
>diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
>index 14955e0a9fcf..ffb81fe95c77 100644
>--- a/arch/s390/mm/init.c
>+++ b/arch/s390/mm/init.c
>@@ -239,7 +239,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
>   return rc;
> }
> 
>-#ifdef CONFIG_MEMORY_HOTREMOVE
> void arch_remove_memory(int nid, u64 start, u64 size,
>   struct vmem_altmap *altmap)
> {
>@@ -251,5 +250,4 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>   __remove_pages(zone, start_pfn, nr_pages, altmap);
>   vmem_remove_mapping(start, size);
> }
>-#endif
> #endif /* CONFIG_MEMORY_HOTPLUG */
>diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
>index 13c6a6bb5fd9..dfdbaa50946e 100644
>--- a/arch/sh/mm/init.c
>+++ b/arch/sh/mm/init.c
>@@ -429,7 +429,6 @@ int memory_add_physaddr_to_nid(u64 addr)
> EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
> #endif
> 
>-#ifdef CONFIG_MEMORY_HOTREMOVE
> void arch_remove_memory(int nid, u64 start, u64 

[PATCH v2] PCI: rpaphp: Avoid a sometimes-uninitialized warning

2019-06-03 Thread Nathan Chancellor
When building with -Wsometimes-uninitialized, clang warns:

drivers/pci/hotplug/rpaphp_core.c:243:14: warning: variable 'fndit' is
used uninitialized whenever 'for' loop exits because its condition is
false [-Wsometimes-uninitialized]
for (j = 0; j < entries; j++) {
^~~
drivers/pci/hotplug/rpaphp_core.c:256:6: note: uninitialized use occurs
here
if (fndit)
^
drivers/pci/hotplug/rpaphp_core.c:243:14: note: remove the condition if
it is always true
for (j = 0; j < entries; j++) {
^~~
drivers/pci/hotplug/rpaphp_core.c:233:14: note: initialize the variable
'fndit' to silence this warning
int j, fndit;
^
 = 0

fndit is only used to gate a sprintf call, which can be moved into the
loop to simplify the code and eliminate the local variable, which will
fix this warning.

Link: https://github.com/ClangBuiltLinux/linux/issues/504
Fixes: 2fcf3ae508c2 ("hotplug/drc-info: Add code to search ibm,drc-info 
property")
Suggested-by: Nick Desaulniers 
Signed-off-by: Nathan Chancellor 
---

v1 -> v2:

* Eliminate fndit altogether by shuffling the sprintf call into the for
  loop and changing the if conditional, as suggested by Nick.

 drivers/pci/hotplug/rpaphp_core.c | 18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/drivers/pci/hotplug/rpaphp_core.c 
b/drivers/pci/hotplug/rpaphp_core.c
index bcd5d357ca23..c3899ee1db99 100644
--- a/drivers/pci/hotplug/rpaphp_core.c
+++ b/drivers/pci/hotplug/rpaphp_core.c
@@ -230,7 +230,7 @@ static int rpaphp_check_drc_props_v2(struct device_node 
*dn, char *drc_name,
struct of_drc_info drc;
const __be32 *value;
char cell_drc_name[MAX_DRC_NAME_LEN];
-   int j, fndit;
+   int j;
 
info = of_find_property(dn->parent, "ibm,drc-info", NULL);
if (info == NULL)
@@ -245,17 +245,13 @@ static int rpaphp_check_drc_props_v2(struct device_node 
*dn, char *drc_name,
 
/* Should now know end of current entry */
 
-   if (my_index > drc.last_drc_index)
-   continue;
-
-   fndit = 1;
-   break;
+   /* Found it */
+   if (my_index <= drc.last_drc_index) {
+   sprintf(cell_drc_name, "%s%d", drc.drc_name_prefix,
+   my_index);
+   break;
+   }
}
-   /* Found it */
-
-   if (fndit)
-   sprintf(cell_drc_name, "%s%d", drc.drc_name_prefix, 
-   my_index);
 
if (((drc_name == NULL) ||
 (drc_name && !strcmp(drc_name, cell_drc_name))) &&
-- 
2.22.0.rc3



Re: [PATCH] PCI: rpaphp: Avoid a sometimes-uninitialized warning

2019-06-03 Thread Nathan Chancellor
Hi Nick,

On Mon, Jun 03, 2019 at 02:07:50PM -0700, Nick Desaulniers wrote:
> On Mon, Jun 3, 2019 at 10:44 AM Nathan Chancellor
>  wrote:
> > Looking at the loop in a vacuum as clang would, fndit could be
> > uninitialized if entries was ever zero or the if statement was
> > always true within the loop. Regardless of whether or not this
> > warning is a problem in practice, "found" variables should always
> > be initialized to false so that there is no possibility of
> > undefined behavior.
> 
> Thanks for the patch Nathan.  fndit isn't really being used for
> anything other than a print statement outside of the loop.  How about:

Thank you for the review, this seems reasonable. I will send a v2
shortly.

> 
> ```
> diff --git a/drivers/pci/hotplug/rpaphp_core.c
> b/drivers/pci/hotplug/rpaphp_core.c
> index bcd5d357ca23..c3899ee1db99 100644
> --- a/drivers/pci/hotplug/rpaphp_core.c
> +++ b/drivers/pci/hotplug/rpaphp_core.c
> @@ -230,7 +230,7 @@ static int rpaphp_check_drc_props_v2(struct
> device_node *dn, char *drc_name,
>   struct of_drc_info drc;
>   const __be32 *value;
>   char cell_drc_name[MAX_DRC_NAME_LEN];
> - int j, fndit;
> + int j;
> 
>   info = of_find_property(dn->parent, "ibm,drc-info", NULL);
>   if (info == NULL)
> @@ -245,17 +245,13 @@ static int rpaphp_check_drc_props_v2(struct
> device_node *dn, char *drc_name,
> 
>   /* Should now know end of current entry */
> 
> - if (my_index > drc.last_drc_index)
> - continue;
> -
> - fndit = 1;
> - break;
> + /* Found it */
> + if (my_index <= drc.last_drc_index) {
> + sprintf(cell_drc_name, "%s%d", drc.drc_name_prefix,
> + my_index);
> + break;
> + }
>   }
> - /* Found it */
> -
> - if (fndit)
> - sprintf(cell_drc_name, "%s%d", drc.drc_name_prefix,
> - my_index);
> 
>   if (((drc_name == NULL) ||
>(drc_name && !strcmp(drc_name, cell_drc_name))) &&
> ```
> (not sure my tabs were pasted properly in the above...)

Doesn't look like it but no worries.

Thanks,
Nathan


Re: [PATCH v3 05/11] drivers/base/memory: Pass a block_id to init_memory_block()

2019-06-03 Thread Wei Yang
On Mon, May 27, 2019 at 01:11:46PM +0200, David Hildenbrand wrote:
>We'll rework hotplug_memory_register() shortly, so it no longer consumes
>pass a section.
>
>Cc: Greg Kroah-Hartman 
>Cc: "Rafael J. Wysocki" 
>Signed-off-by: David Hildenbrand 
>---
> drivers/base/memory.c | 15 +++
> 1 file changed, 7 insertions(+), 8 deletions(-)
>
>diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>index f180427e48f4..f914fa6fe350 100644
>--- a/drivers/base/memory.c
>+++ b/drivers/base/memory.c
>@@ -651,21 +651,18 @@ int register_memory(struct memory_block *memory)
>   return ret;
> }
> 
>-static int init_memory_block(struct memory_block **memory,
>-   struct mem_section *section, unsigned long state)
>+static int init_memory_block(struct memory_block **memory, int block_id,
>+   unsigned long state)
> {
>   struct memory_block *mem;
>   unsigned long start_pfn;
>-  int scn_nr;
>   int ret = 0;
> 
>   mem = kzalloc(sizeof(*mem), GFP_KERNEL);
>   if (!mem)
>   return -ENOMEM;
> 
>-  scn_nr = __section_nr(section);
>-  mem->start_section_nr =
>-  base_memory_block_id(scn_nr) * sections_per_block;
>+  mem->start_section_nr = block_id * sections_per_block;
>   mem->end_section_nr = mem->start_section_nr + sections_per_block - 1;
>   mem->state = state;
>   start_pfn = section_nr_to_pfn(mem->start_section_nr);
>@@ -694,7 +691,8 @@ static int add_memory_block(int base_section_nr)
> 
>   if (section_count == 0)
>   return 0;
>-  ret = init_memory_block(, __nr_to_section(section_nr), MEM_ONLINE);
>+  ret = init_memory_block(, base_memory_block_id(base_section_nr),
>+  MEM_ONLINE);

If my understanding is correct, section_nr could be removed too.

>   if (ret)
>   return ret;
>   mem->section_count = section_count;
>@@ -707,6 +705,7 @@ static int add_memory_block(int base_section_nr)
>  */
> int hotplug_memory_register(int nid, struct mem_section *section)
> {
>+  int block_id = base_memory_block_id(__section_nr(section));
>   int ret = 0;
>   struct memory_block *mem;
> 
>@@ -717,7 +716,7 @@ int hotplug_memory_register(int nid, struct mem_section 
>*section)
>   mem->section_count++;
>   put_device(>dev);
>   } else {
>-  ret = init_memory_block(, section, MEM_OFFLINE);
>+  ret = init_memory_block(, block_id, MEM_OFFLINE);
>   if (ret)
>   goto out;
>   mem->section_count++;
>-- 
>2.20.1

-- 
Wei Yang
Help you, Help me


Re: [PATCH v3 00/11] mm/memory_hotplug: Factor out memory block devicehandling

2019-06-03 Thread David Hildenbrand
On 03.06.19 23:21, Wei Yang wrote:
> IMHO, there is some typo.

Yes, thanks.

> 
> s/devicehandling/device handling/
> 
> On Mon, May 27, 2019 at 01:11:41PM +0200, David Hildenbrand wrote:
>> We only want memory block devices for memory to be onlined/offlined
>> (add/remove from the buddy). This is required so user space can
>> online/offline memory and kdump gets notified about newly onlined memory.
>>
>> Let's factor out creation/removal of memory block devices. This helps
>> to further cleanup arch_add_memory/arch_remove_memory() and to make
>> implementation of new features easier - especially sub-section
>> memory hot add from Dan.
>>
>> Anshuman Khandual is currently working on arch_remove_memory(). I added
>> a temporary solution via "arm64/mm: Add temporary arch_remove_memory()
>> implementation", that is sufficient as a firsts tep in the context of
> 
> s/firsts tep/first step/
> 
>> this series. (we don't cleanup page tables in case anything goes
>> wrong already)
>>
>> Did a quick sanity test with DIMM plug/unplug, making sure all devices
>> and sysfs links properly get added/removed. Compile tested on s390x and
>> x86-64.
>>
>> Based on next/master.
>>
>> Next refactoring on my list will be making sure that remove_memory()
>> will never deal with zones / access "struct pages". Any kind of zone
>> handling will have to be done when offlining system memory / before
>> removing device memory. I am thinking about remove_pfn_range_from_zone()",
>> du undo everything "move_pfn_range_to_zone()" did.
> 
> what is "du undo"? I may not get it.

to undo ;)

-- 

Thanks,

David / dhildenb


Re: [PATCH v3 00/11] mm/memory_hotplug: Factor out memory block devicehandling

2019-06-03 Thread Wei Yang
IMHO, there is some typo.

s/devicehandling/device handling/

On Mon, May 27, 2019 at 01:11:41PM +0200, David Hildenbrand wrote:
>We only want memory block devices for memory to be onlined/offlined
>(add/remove from the buddy). This is required so user space can
>online/offline memory and kdump gets notified about newly onlined memory.
>
>Let's factor out creation/removal of memory block devices. This helps
>to further cleanup arch_add_memory/arch_remove_memory() and to make
>implementation of new features easier - especially sub-section
>memory hot add from Dan.
>
>Anshuman Khandual is currently working on arch_remove_memory(). I added
>a temporary solution via "arm64/mm: Add temporary arch_remove_memory()
>implementation", that is sufficient as a firsts tep in the context of

s/firsts tep/first step/

>this series. (we don't cleanup page tables in case anything goes
>wrong already)
>
>Did a quick sanity test with DIMM plug/unplug, making sure all devices
>and sysfs links properly get added/removed. Compile tested on s390x and
>x86-64.
>
>Based on next/master.
>
>Next refactoring on my list will be making sure that remove_memory()
>will never deal with zones / access "struct pages". Any kind of zone
>handling will have to be done when offlining system memory / before
>removing device memory. I am thinking about remove_pfn_range_from_zone()",
>du undo everything "move_pfn_range_to_zone()" did.

what is "du undo"? I may not get it.

>
>v2 -> v3:
>- Add "s390x/mm: Fail when an altmap is used for arch_add_memory()"
>- Add "arm64/mm: Add temporary arch_remove_memory() implementation"
>- Add "drivers/base/memory: Pass a block_id to init_memory_block()"
>- Various changes to "mm/memory_hotplug: Create memory block devices
>  after arch_add_memory()" and "mm/memory_hotplug: Create memory block
>  devices after arch_add_memory()" due to switching from sections to
>  block_id's.
>
>v1 -> v2:
>- s390x/mm: Implement arch_remove_memory()
>-- remove mapping after "__remove_pages"
>
>David Hildenbrand (11):
>  mm/memory_hotplug: Simplify and fix check_hotplug_memory_range()
>  s390x/mm: Fail when an altmap is used for arch_add_memory()
>  s390x/mm: Implement arch_remove_memory()
>  arm64/mm: Add temporary arch_remove_memory() implementation
>  drivers/base/memory: Pass a block_id to init_memory_block()
>  mm/memory_hotplug: Allow arch_remove_pages() without
>CONFIG_MEMORY_HOTREMOVE
>  mm/memory_hotplug: Create memory block devices after arch_add_memory()
>  mm/memory_hotplug: Drop MHP_MEMBLOCK_API
>  mm/memory_hotplug: Remove memory block devices before
>arch_remove_memory()
>  mm/memory_hotplug: Make unregister_memory_block_under_nodes() never
>fail
>  mm/memory_hotplug: Remove "zone" parameter from
>sparse_remove_one_section
>
> arch/arm64/mm/mmu.c|  17 +
> arch/ia64/mm/init.c|   2 -
> arch/powerpc/mm/mem.c  |   2 -
> arch/s390/mm/init.c|  18 +++--
> arch/sh/mm/init.c  |   2 -
> arch/x86/mm/init_32.c  |   2 -
> arch/x86/mm/init_64.c  |   2 -
> drivers/base/memory.c  | 134 +++--
> drivers/base/node.c|  27 +++
> include/linux/memory.h |   6 +-
> include/linux/memory_hotplug.h |  12 +--
> include/linux/node.h   |   7 +-
> mm/memory_hotplug.c|  44 +--
> mm/sparse.c|  10 +--
> 14 files changed, 140 insertions(+), 145 deletions(-)
>
>-- 
>2.20.1

-- 
Wei Yang
Help you, Help me


Re: [PATCH v3 0/6] soc/fsl/qe: cleanups and new DT binding

2019-06-03 Thread Rasmus Villemoes
On 13/05/2019 13.14, Rasmus Villemoes wrote:
> This small series consists of some small cleanups and simplifications
> of the QUICC engine driver, and introduces a new DT binding that makes
> it much easier to support other variants of the QUICC engine IP block
> that appears in the wild: There's no reason to expect in general that
> the number of valid SNUMs uniquely determines the set of such, so it's
> better to simply let the device tree specify the values (and,
> implicitly via the array length, also the count).
> 
> Which tree should this go through?

Ping? These patches should be ready to go in, but I don't know who is
supposed to pick them up.

Thanks,
Rasmus


[PATCH] powerpc: Enable kernel XZ compression option on PPC_85xx

2019-06-03 Thread Pawel Dembicki
Enable kernel XZ compression option on PPC_85xx. Tested with
simpleImage on TP-Link TL-WDR4900 (Freescale P1014 processor).

Suggested-by: Christian Lamparter 
Signed-off-by: Pawel Dembicki 
---
 arch/powerpc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8c1c636308c8..daf4cb968922 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -196,7 +196,7 @@ config PPC
select HAVE_IOREMAP_PROT
select HAVE_IRQ_EXIT_ON_IRQ_STACK
select HAVE_KERNEL_GZIP
-   select HAVE_KERNEL_XZ   if PPC_BOOK3S || 44x
+   select HAVE_KERNEL_XZ   if PPC_BOOK3S || 44x || PPC_85xx
select HAVE_KPROBES
select HAVE_KPROBES_ON_FTRACE
select HAVE_KRETPROBES
-- 
2.20.1



Re: [PATCH 22/22] docs: fix broken documentation links

2019-06-03 Thread Mark Brown
On Wed, May 29, 2019 at 08:23:53PM -0300, Mauro Carvalho Chehab wrote:
> Mostly due to x86 and acpi conversion, several documentation
> links are still pointing to the old file. Fix them.

Acked-by: Mark Brown 


signature.asc
Description: PGP signature


Re: [PATCH v3 04/11] arm64/mm: Add temporary arch_remove_memory() implementation

2019-06-03 Thread Wei Yang
On Mon, May 27, 2019 at 01:11:45PM +0200, David Hildenbrand wrote:
>A proper arch_remove_memory() implementation is on its way, which also
>cleanly removes page tables in arch_add_memory() in case something goes
>wrong.

Would this be better to understand?

removes page tables created in arch_add_memory

>
>As we want to use arch_remove_memory() in case something goes wrong
>during memory hotplug after arch_add_memory() finished, let's add
>a temporary hack that is sufficient enough until we get a proper
>implementation that cleans up page table entries.
>
>We will remove CONFIG_MEMORY_HOTREMOVE around this code in follow up
>patches.
>
>Cc: Catalin Marinas 
>Cc: Will Deacon 
>Cc: Mark Rutland 
>Cc: Andrew Morton 
>Cc: Ard Biesheuvel 
>Cc: Chintan Pandya 
>Cc: Mike Rapoport 
>Cc: Jun Yao 
>Cc: Yu Zhao 
>Cc: Robin Murphy 
>Cc: Anshuman Khandual 
>Signed-off-by: David Hildenbrand 
>---
> arch/arm64/mm/mmu.c | 19 +++
> 1 file changed, 19 insertions(+)
>
>diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>index a1bfc4413982..e569a543c384 100644
>--- a/arch/arm64/mm/mmu.c
>+++ b/arch/arm64/mm/mmu.c
>@@ -1084,4 +1084,23 @@ int arch_add_memory(int nid, u64 start, u64 size,
>   return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
>  restrictions);
> }
>+#ifdef CONFIG_MEMORY_HOTREMOVE
>+void arch_remove_memory(int nid, u64 start, u64 size,
>+  struct vmem_altmap *altmap)
>+{
>+  unsigned long start_pfn = start >> PAGE_SHIFT;
>+  unsigned long nr_pages = size >> PAGE_SHIFT;
>+  struct zone *zone;
>+
>+  /*
>+   * FIXME: Cleanup page tables (also in arch_add_memory() in case
>+   * adding fails). Until then, this function should only be used
>+   * during memory hotplug (adding memory), not for memory
>+   * unplug. ARCH_ENABLE_MEMORY_HOTREMOVE must not be
>+   * unlocked yet.
>+   */
>+  zone = page_zone(pfn_to_page(start_pfn));

Compared with arch_remove_memory in x86. If altmap is not NULL, zone will be
retrieved from page related to altmap. Not sure why this is not the same?

>+  __remove_pages(zone, start_pfn, nr_pages, altmap);
>+}
>+#endif
> #endif
>-- 
>2.20.1

-- 
Wei Yang
Help you, Help me


Re: [PATCH kernel] prom_init: Fetch flatten device tree from the system firmware

2019-06-03 Thread Benjamin Herrenschmidt
On Mon, 2019-06-03 at 12:56 +1000, Alexey Kardashevskiy wrote:
> 
> > That is all you need if you do not want to use OF at all.
> 
> ? We also need OF drivers to boot grub and the system, and a default
> console for early booting, but yes, we do not want to keep using slof
> that much.
> 
> > If you *do* want to keep having an Open Firmware, what we want or need
> > is a faster way to walk huge device trees.
> 
> Why? We do not need to _walk_ the tree at all (we _have_ to now and
> while walking we do nothing but pack everything together into the fdt
> blob) as slof can easily do this already.

I agree with Alexey. In a sense this can be thought as an extension of
"quiesce", which effectively kills OF.

Yes we could make property fetching faster but mostly by creating a new
bulk interface which is quite a bit of work, a new API, and will in
practice not be used for anything other than creating the FDT. In that
case, nothing will beat in performance having OF create the FDT itself
based on its current tree.

> > > There is no use for the "fetch all properties" cases other than
> > > building an FDT that any of us can think of, and it would create a more
> > > complicated interface than just "fetch an FDT".
> > 
> > It is a simple way to speed up fetching the device tree enormously,
> > without needing big changes to either OF or the clients using it -- not
> > in the code, but importantly also not conceptually: everything works just
> > as before, just a lot faster.
> 
> I can safely presume though that this will never ever be used in
> practice. And it will be still slower, a tiny bit. And require new code
> in both slof and linux.

Correct.

> I can rather see us getting rid of SLOF in the future which in turn will
> require the fdt blob pointer in r5 (or whatever it is, forgot) when the
> guest starts; so "fetch-all-props" won't be used there either.
> 
> > > So I go for the simple one and agree with Alexey's idea.
> > 
> > When dealing with a whole device tree you have to know about the various
> > dynamically generated nodes and props, and handle each appropriately.
> 
> The code I am changing fetches the device tree and build an fdt. What is
> that special knowledge in this context you are talking about?

Ben.




RE: [PATCH v3 0/6] soc/fsl/qe: cleanups and new DT binding

2019-06-03 Thread Leo Li


> -Original Message-
> From: Rasmus Villemoes 
> Sent: Monday, June 3, 2019 2:54 PM
> To: devicet...@vger.kernel.org; Qiang Zhao ; Leo Li
> 
> Cc: linuxppc-dev@lists.ozlabs.org; linux-arm-ker...@lists.infradead.org;
> linux-ker...@vger.kernel.org; Rob Herring ; Scott
> Wood ; Christophe Leroy ;
> Mark Rutland ; jo...@infinera.com
> 
> Subject: Re: [PATCH v3 0/6] soc/fsl/qe: cleanups and new DT binding
> 
> On 13/05/2019 13.14, Rasmus Villemoes wrote:
> > This small series consists of some small cleanups and simplifications
> > of the QUICC engine driver, and introduces a new DT binding that makes
> > it much easier to support other variants of the QUICC engine IP block
> > that appears in the wild: There's no reason to expect in general that
> > the number of valid SNUMs uniquely determines the set of such, so it's
> > better to simply let the device tree specify the values (and,
> > implicitly via the array length, also the count).
> >
> > Which tree should this go through?
> 
> Ping? These patches should be ready to go in, but I don't know who is
> supposed to pick them up.

I can pick them up through the soc/fsl tree.

Regards,
Leo


Re: [PATCH v3] mm: add account_locked_vm utility function

2019-06-03 Thread Alex Williamson
On Wed, 29 May 2019 16:50:19 -0400
Daniel Jordan  wrote:

> locked_vm accounting is done roughly the same way in five places, so
> unify them in a helper.
> 
> Include the helper's caller in the debug print to distinguish between
> callsites.
> 
> Error codes stay the same, so user-visible behavior does too.  The one
> exception is that the -EPERM case in tce_account_locked_vm is removed
> because Alexey has never seen it triggered.
> 
> Signed-off-by: Daniel Jordan 
> Tested-by: Alexey Kardashevskiy 
> Cc: Alan Tull 
> Cc: Alex Williamson 
> Cc: Andrew Morton 
> Cc: Benjamin Herrenschmidt 
> Cc: Christoph Lameter 
> Cc: Christophe Leroy 
> Cc: Davidlohr Bueso 
> Cc: Ira Weiny 
> Cc: Jason Gunthorpe 
> Cc: Mark Rutland 
> Cc: Michael Ellerman 
> Cc: Moritz Fischer 
> Cc: Paul Mackerras 
> Cc: Steve Sistare 
> Cc: Wu Hao 
> Cc: linux...@kvack.org
> Cc: k...@vger.kernel.org
> Cc: kvm-...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-f...@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> ---
> v3:
>  - uninline account_locked_vm (Andrew)
>  - fix doc comment (Ira)
>  - retain down_write_killable in vfio type1 (Alex)
>  - leave Alexey's T-b since the code is the same aside from uninlining
>  - sanity tested with vfio type1, sanity-built on ppc
> 
>  arch/powerpc/kvm/book3s_64_vio.c | 44 ++--
>  arch/powerpc/mm/book3s64/iommu_api.c | 41 ++-
>  drivers/fpga/dfl-afu-dma-region.c| 53 ++--
>  drivers/vfio/vfio_iommu_spapr_tce.c  | 54 ++--
>  drivers/vfio/vfio_iommu_type1.c  | 17 +--
>  include/linux/mm.h   |  4 ++
>  mm/util.c| 75 
>  7 files changed, 98 insertions(+), 190 deletions(-)

I tend to prefer adding a negative rather than converting to absolute
and passing a bool for inc/dec, but it all seems equivalent, so for
vfio parts

Acked-by: Alex Williamson 

> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c 
> b/arch/powerpc/kvm/book3s_64_vio.c
> index 66270e07449a..768b645c7edf 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -30,6 +30,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -56,43 +57,6 @@ static unsigned long kvmppc_stt_pages(unsigned long 
> tce_pages)
>   return tce_pages + ALIGN(stt_bytes, PAGE_SIZE) / PAGE_SIZE;
>  }
>  
> -static long kvmppc_account_memlimit(unsigned long stt_pages, bool inc)
> -{
> - long ret = 0;
> -
> - if (!current || !current->mm)
> - return ret; /* process exited */
> -
> - down_write(>mm->mmap_sem);
> -
> - if (inc) {
> - unsigned long locked, lock_limit;
> -
> - locked = current->mm->locked_vm + stt_pages;
> - lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> - if (locked > lock_limit && !capable(CAP_IPC_LOCK))
> - ret = -ENOMEM;
> - else
> - current->mm->locked_vm += stt_pages;
> - } else {
> - if (WARN_ON_ONCE(stt_pages > current->mm->locked_vm))
> - stt_pages = current->mm->locked_vm;
> -
> - current->mm->locked_vm -= stt_pages;
> - }
> -
> - pr_debug("[%d] RLIMIT_MEMLOCK KVM %c%ld %ld/%ld%s\n", current->pid,
> - inc ? '+' : '-',
> - stt_pages << PAGE_SHIFT,
> - current->mm->locked_vm << PAGE_SHIFT,
> - rlimit(RLIMIT_MEMLOCK),
> - ret ? " - exceeded" : "");
> -
> - up_write(>mm->mmap_sem);
> -
> - return ret;
> -}
> -
>  static void kvm_spapr_tce_iommu_table_free(struct rcu_head *head)
>  {
>   struct kvmppc_spapr_tce_iommu_table *stit = container_of(head,
> @@ -302,7 +266,7 @@ static int kvm_spapr_tce_release(struct inode *inode, 
> struct file *filp)
>  
>   kvm_put_kvm(stt->kvm);
>  
> - kvmppc_account_memlimit(
> + account_locked_vm(current->mm,
>   kvmppc_stt_pages(kvmppc_tce_pages(stt->size)), false);
>   call_rcu(>rcu, release_spapr_tce_table);
>  
> @@ -327,7 +291,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>   return -EINVAL;
>  
>   npages = kvmppc_tce_pages(size);
> - ret = kvmppc_account_memlimit(kvmppc_stt_pages(npages), true);
> + ret = account_locked_vm(current->mm, kvmppc_stt_pages(npages), true);
>   if (ret)
>   return ret;
>  
> @@ -373,7 +337,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>  
>   kfree(stt);
>   fail_acct:
> - kvmppc_account_memlimit(kvmppc_stt_pages(npages), false);
> + account_locked_vm(current->mm, kvmppc_stt_pages(npages), false);
>   return ret;
>  }
>  
> diff --git a/arch/powerpc/mm/book3s64/iommu_api.c 
> b/arch/powerpc/mm/book3s64/iommu_api.c
> index 5c521f3924a5..18d22eec0ebd 100644
> --- a/arch/powerpc/mm/book3s64/iommu_api.c
> +++ 

[PATCH] PCI: rpaphp: Avoid a sometimes-uninitialized warning

2019-06-03 Thread Nathan Chancellor
When building with -Wsometimes-uninitialized, clang warns:

drivers/pci/hotplug/rpaphp_core.c:243:14: warning: variable 'fndit' is
used uninitialized whenever 'for' loop exits because its condition is
false [-Wsometimes-uninitialized]
for (j = 0; j < entries; j++) {
^~~
drivers/pci/hotplug/rpaphp_core.c:256:6: note: uninitialized use occurs
here
if (fndit)
^
drivers/pci/hotplug/rpaphp_core.c:243:14: note: remove the condition if
it is always true
for (j = 0; j < entries; j++) {
^~~
drivers/pci/hotplug/rpaphp_core.c:233:14: note: initialize the variable
'fndit' to silence this warning
int j, fndit;
^
 = 0

Looking at the loop in a vacuum as clang would, fndit could be
uninitialized if entries was ever zero or the if statement was
always true within the loop. Regardless of whether or not this
warning is a problem in practice, "found" variables should always
be initialized to false so that there is no possibility of
undefined behavior.

Link: https://github.com/ClangBuiltLinux/linux/issues/504
Fixes: 2fcf3ae508c2 ("hotplug/drc-info: Add code to search ibm,drc-info 
property")
Signed-off-by: Nathan Chancellor 
---
 drivers/pci/hotplug/rpaphp_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/hotplug/rpaphp_core.c 
b/drivers/pci/hotplug/rpaphp_core.c
index bcd5d357ca23..07b56bf2886f 100644
--- a/drivers/pci/hotplug/rpaphp_core.c
+++ b/drivers/pci/hotplug/rpaphp_core.c
@@ -230,7 +230,7 @@ static int rpaphp_check_drc_props_v2(struct device_node 
*dn, char *drc_name,
struct of_drc_info drc;
const __be32 *value;
char cell_drc_name[MAX_DRC_NAME_LEN];
-   int j, fndit;
+   int j, fndit = 0;
 
info = of_find_property(dn->parent, "ibm,drc-info", NULL);
if (info == NULL)
-- 
2.22.0.rc2



Re: [PATCH 03/16] mm: simplify gup_fast_permitted

2019-06-03 Thread Linus Torvalds
On Mon, Jun 3, 2019 at 9:08 AM Linus Torvalds
 wrote:
>
> The new code has no test at all for "nr_pages == 0", afaik.

Note that it really is important to check for that, because right now we do

if (gup_fast_permitted(start, nr_pages)) {
local_irq_save(flags);
gup_pgd_range(start, end, write ? FOLL_WRITE : 0, pages, );
local_irq_restore(flags);
}

and that gup_pgd_range() function *depends* on the range being
non-zero, and does

pgdp = pgd_offset(current->mm, addr);
do {
pgd_t pgd = READ_ONCE(*pgdp);
...
} while (pgdp++, addr = next, addr != end);

Note how a zero range would turn into an infinite range here.

And the only check for 0 was that

if (nr_pages <= 0)
return 0;

in get_user_pages_fast() that you removed.

(Admittedly, it would be much better to have that check in
__get_user_pages_fast() itself, because we do have callers that call
the double-underscore version)

Now, I sincerely hope that we don't have anybody that passes in a zero
nr_pages (or a negative one), but we do actually have a comment saying
it's ok.

Note that the check for "if (end < start)" not only does not check for
0, it also doesn't really check for negative. It checks for
_overflow_. Admittedly most negative values would be expected to
overflow, but it's still a very different issue.

Maybe you added the check for negative somewhere else (in another
patch), but I don't see it.

Linus


Re: [PATCH v3 0/6] Prerequisites for NXP LS104xA SMMU enablement

2019-06-03 Thread Andreas Färber
Am 31.05.19 um 19:32 schrieb Laurentiu Tudor:
>> -Original Message-
>> From: Andreas Färber 
>> Sent: Friday, May 31, 2019 8:04 PM
>>
>> Hello Laurentiu,
>>
>> Am 31.05.19 um 18:46 schrieb Laurentiu Tudor:
 -Original Message-
 From: Andreas Färber 
 Sent: Friday, May 31, 2019 7:15 PM

 Hi Laurentiu,

 Am 30.05.19 um 16:19 schrieb laurentiu.tu...@nxp.com:
> This patch series contains several fixes in preparation for SMMU
> support on NXP LS1043A and LS1046A chips. Once these get picked up,
> I'll submit the actual SMMU enablement patches consisting in the
> required device tree changes.

 Have you thought through what will happen if this patch ordering is not
 preserved? In particular, a user installing a future U-Boot update with
 the DTB bits but booting a stable kernel without this patch series -
 wouldn't that regress dpaa then for our customers?

>>>
>>> These are fixes for issues that popped out after enabling SMMU.
>>> I do not expect them to break anything.
>>
>> That was not my question! You're missing my point: All your patches are
>> lacking a Fixes header in their commit message, for backporting them, to
>> avoid _your DT patches_ breaking the driver on stable branches!
> 
> It does appear that I'm missing your point. For sure, the DT updates solely 
> will
> break the kernel without these fixes but I'm not sure I understand how this
> could happen.

In short, customers rarely run master branch. Kindly have your
colleagues explain stable branches to you in details.

With Fixes header I was referring to the syntax explained here:
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes

> My plan was to share the kernel dts patches sometime after this series
> makes it through.

That's fine. What I'm warning you is that seemingly your DT patches,
once in one of your LSDK U-Boot releases, will cause a regression for
distros like our SLES 15 SP1 unless these prereq kernel patches get
applied on the respective stable branches. Which will not happen
automatically unless you as patch author take the appropriate action
before they get merged.

Thanks,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)


Re: [PATCH 03/16] mm: simplify gup_fast_permitted

2019-06-03 Thread Linus Torvalds
On Mon, Jun 3, 2019 at 12:41 AM Christoph Hellwig  wrote:
>
> I only removed a duplicate of it.

I don't see any remaining cases.

> The full (old) code in get_user_pages_fast() looks like this:
>
> if (nr_pages <= 0)
> return 0;
>
> if (unlikely(!access_ok((void __user *)start, len)))
> return -EFAULT;
>
> if (gup_fast_permitted(start, nr_pages)) {

Yes, and that code was correct.

The new code has no test at all for "nr_pages == 0", afaik.

 Linus


Re: [PATCH 01/16] uaccess: add untagged_addr definition for other arches

2019-06-03 Thread Khalid Aziz
On 6/1/19 1:49 AM, Christoph Hellwig wrote:
> From: Andrey Konovalov 
> 
> To allow arm64 syscalls to accept tagged pointers from userspace, we must
> untag them when they are passed to the kernel. Since untagging is done in
> generic parts of the kernel, the untagged_addr macro needs to be defined
> for all architectures.
> 
> Define it as a noop for architectures other than arm64.

Could you reword above sentence? We are already starting off with
untagged_addr() not being no-op for arm64 and sparc64. It will expand
further potentially. So something more along the lines of "Define it as
noop for architectures that do not support memory tagging". The first
paragraph in the log can also be rewritten to be not specific to arm64.

--
Khalid

> 
> Acked-by: Catalin Marinas 
> Signed-off-by: Andrey Konovalov 
> Signed-off-by: Christoph Hellwig 
> ---
>  include/linux/mm.h | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0e8834ac32b7..949d43e9c0b6 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -99,6 +99,10 @@ extern int mmap_rnd_compat_bits __read_mostly;
>  #include 
>  #include 
>  
> +#ifndef untagged_addr
> +#define untagged_addr(addr) (addr)
> +#endif
> +
>  #ifndef __pa_symbol
>  #define __pa_symbol(x)  __pa(RELOC_HIDE((unsigned long)(x), 0))
>  #endif
> 



[PATCH] powerpc/32s: fix booting with CONFIG_PPC_EARLY_DEBUG_BOOTX

2019-06-03 Thread Christophe Leroy
When booting through OF, setup_disp_bat() does nothing because
disp_BAT are not set. By change, it used to work because BOOTX
buffer is mapped 1:1 at address 0x8100 by the bootloader, and
btext_setup_display() sets virt addr same as phys addr.

But since commit 215b823707ce ("powerpc/32s: set up an early static
hash table for KASAN."), a temporary page table overrides the
bootloader mapping.

This 0x8100 is also problematic with the newly implemented
Kernel Userspace Access Protection (KUAP) because it is within user
address space.

This patch fixes those issues by properly setting disp_BAT through
a call to btext_prepare_BAT(), allowing setup_disp_bat() to
properly setup BAT3 for early bootx screen buffer access.

Reported-by: Mathieu Malaterre 
Fixes: 215b823707ce ("powerpc/32s: set up an early static hash table for 
KASAN.")
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/btext.h   | 4 
 arch/powerpc/kernel/prom_init.c| 1 +
 arch/powerpc/kernel/prom_init_check.sh | 2 +-
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/btext.h b/arch/powerpc/include/asm/btext.h
index 3ffad030393c..461b0f193864 100644
--- a/arch/powerpc/include/asm/btext.h
+++ b/arch/powerpc/include/asm/btext.h
@@ -13,7 +13,11 @@ extern void btext_update_display(unsigned long phys, int 
width, int height,
 int depth, int pitch);
 extern void btext_setup_display(int width, int height, int depth, int pitch,
unsigned long address);
+#ifdef CONFIG_PPC32
 extern void btext_prepare_BAT(void);
+#else
+static inline void btext_prepare_BAT(void) { }
+#endif
 extern void btext_map(void);
 extern void btext_unmap(void);
 
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 3555cad7bdde..ed446b7ea164 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -2336,6 +2336,7 @@ static void __init prom_check_displays(void)
prom_printf("W=%d H=%d LB=%d addr=0x%x\n",
width, height, pitch, addr);
btext_setup_display(width, height, 8, pitch, addr);
+   btext_prepare_BAT();
}
 #endif /* CONFIG_PPC_EARLY_DEBUG_BOOTX */
}
diff --git a/arch/powerpc/kernel/prom_init_check.sh 
b/arch/powerpc/kernel/prom_init_check.sh
index 518d416971c1..160bef0d553d 100644
--- a/arch/powerpc/kernel/prom_init_check.sh
+++ b/arch/powerpc/kernel/prom_init_check.sh
@@ -24,7 +24,7 @@ fi
 WHITELIST="add_reloc_offset __bss_start __bss_stop copy_and_flush
 _end enter_prom $MEM_FUNCS reloc_offset __secondary_hold
 __secondary_hold_acknowledge __secondary_hold_spinloop __start
-logo_linux_clut224
+logo_linux_clut224 btext_prepare_BAT
 reloc_got2 kernstart_addr memstart_addr linux_banner _stext
 __prom_init_toc_start __prom_init_toc_end btext_setup_display TOC."
 
-- 
2.13.3



Re: [PATCH] dlpar: Fix a missing-check bug in dlpar_parse_cc_property()

2019-06-03 Thread Michael Ellerman
On Sun, 2019-05-26 at 02:42:40 UTC, Gen Zhang wrote:
> In dlpar_parse_cc_property(), 'prop->name' is allocated by kstrdup().
> kstrdup() may return NULL, so it should be checked and handle error.
> And prop should be freed if 'prop->name' is NULL.
> 
> Signed-off-by: Gen Zhang 
> Acked-by: Nathan Lynch 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/efa9ace68e487ddd29c2b4d6dd232421

cheers


Re: [PATCH] powerpc/powernv: Update firmware archaeology around OPAL_HANDLE_HMI

2019-06-03 Thread Michael Ellerman
On Fri, 2019-05-24 at 05:09:56 UTC, Stewart Smith wrote:
> The first machines to ship with OPAL firmware all got firmware updates
> that have the new call, but just in case someone is foolish enough to
> believe the first 4 months of firmware is the best, we keep this code
> around.
> 
> Comment is updated to not refer to late 2014 as recent or the future.
> 
> Signed-off-by: Stewart Smith 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/1549c42deff5f3326ae295ae5816

cheers


Re: [PATCH] powerpc/powernv: Show checkstop reason for NPU2 HMIs

2019-06-03 Thread Michael Ellerman
On Thu, 2019-05-23 at 12:28:04 UTC, Frederic Barrat wrote:
> If the kernel is notified of an HMI caused by the NPU2, it's currently
> not being recognized and it logs the default message:
> 
> Unknown Malfunction Alert of type 3
> 
> The NPU on Power 9 has 3 Fault Isolation Registers, so that's a lot of
> possible causes, but we should at least log that it's an NPU problem
> and report which FIR and which bit were raised if opal gave us the
> information.
> 
> Signed-off-by: Frederic Barrat 
> Reviewed-by: Andrew Donnellan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/89d87bcba2874d824affb7842bb3960c

cheers


Re: [PATCH] powerpc: Remove variable ‘path’ since not used

2019-06-03 Thread Michael Ellerman
On Thu, 2019-05-23 at 10:25:20 UTC, Mathieu Malaterre wrote:
> In commit eab00a208eb6 ("powerpc: Move `path` variable inside
> DEBUG_PROM") DEBUG_PROM sentinels were added to silence a warning
> (treated as error with W=1):
> 
>   arch/powerpc/kernel/prom_init.c:1388:8: error: variable ‘path’ set but not 
> used [-Werror=unused-but-set-variable]
> 
> Rework the original patch and simplify the code, by removing the
> variable ‘path’ completely. Fix line over 90 characters.
> 
> Suggested-by: Michael Ellerman 
> Signed-off-by: Mathieu Malaterre 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/c806a6fde1c29e7419afcf94d761827a

cheers


Re: [PATCH] powerpc/pseries: Fix xive=off command line

2019-06-03 Thread Michael Ellerman
On Wed, 2019-05-15 at 10:05:01 UTC, Greg Kurz wrote:
> On POWER9, if the hypervisor supports XIVE exploitation mode, the guest OS
> will unconditionally requests for the XIVE interrupt mode even if XIVE was
> deactivated with the kernel command line xive=off. Later on, when the spapr
> XIVE init code handles xive=off, it disables XIVE and tries to fall back on
> the legacy mode XICS.
> 
> This discrepency causes a kernel panic because the hypervisor is configured
> to provide the XIVE interrupt mode to the guest :
> 
> [0.008837] kernel BUG at arch/powerpc/sysdev/xics/xics-common.c:135!
> [0.008877] Oops: Exception in kernel mode, sig: 5 [#1]
> [0.008908] LE SMP NR_CPUS=1024 NUMA pSeries
> [0.008939] Modules linked in:
> [0.008964] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW 
> 5.0.13-200.fc29.ppc64le #1
> [0.009018] NIP:  c1029ab8 LR: c1029aac CTR: 
> c18e
> [0.009065] REGS: c007f96d7900 TRAP: 0700   Tainted: GW
>   (5.0.13-200.fc29.ppc64le)
> [0.009119] MSR:  82029033   CR: 
> 28000222  XER: 2004
> [0.009168] CFAR: c01b1e28 IRQMASK: 0
> [0.009168] GPR00: c1029aac c007f96d7b90 c15e8600 
> 
> [0.009168] GPR04: 0001  0061 
> 646f6d61696e0d0a
> [0.009168] GPR08: 0007fd8f 0001 c14c44c0 
> c007f96d76cf
> [0.009168] GPR12:  c18e 0001 
> 
> [0.009168] GPR16:  0001 c007f96d7c08 
> c16903d0
> [0.009168] GPR20: c007fffe04e8 ffea c1620164 
> c161fe58
> [0.009168] GPR24: c0ea6c88 c11151a8 006000c0 
> c007f96d7c34
> [0.009168] GPR28:  c14b286c c1115180 
> c161dc70
> [0.009558] NIP [c1029ab8] xics_smp_probe+0x38/0x98
> [0.009590] LR [c1029aac] xics_smp_probe+0x2c/0x98
> [0.009622] Call Trace:
> [0.009639] [c007f96d7b90] [c1029aac] xics_smp_probe+0x2c/0x98 
> (unreliable)
> [0.009687] [c007f96d7bb0] [c1033404] 
> pSeries_smp_probe+0x40/0xa0
> [0.009734] [c007f96d7bd0] [c10212a4] 
> smp_prepare_cpus+0x62c/0x6ec
> [0.009782] [c007f96d7cf0] [c10141b8] 
> kernel_init_freeable+0x148/0x448
> [0.009829] [c007f96d7db0] [c0010ba4] kernel_init+0x2c/0x148
> [0.009870] [c007f96d7e20] [c000bdd4] 
> ret_from_kernel_thread+0x5c/0x68
> [0.009916] Instruction dump:
> [0.009940] 7c0802a6 6000 7c0802a6 3882 f8010010 f821ffe1 3c62001c 
> e863b9a0
> [0.009988] 4b1882d1 6000 7c690034 5529d97e <0b09> 3d22001c 
> e929b998 3ce2ff8f
> 
> Look for xive=off during prom_init and don't ask for XIVE in this case. One
> exception though: if the host only supports XIVE, we still want to boot so
> we ignore xive=off.
> 
> Similarly, have the spapr XIVE init code to looking at the interrupt mode
> negociated during CAS, and ignore xive=off if the hypervisor only supports
> XIVE.
> 
> Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt 
> controller")
> Cc: sta...@vger.kernel.org # v4.20
> Reported-by: Pavithra R. Prakash 
> Signed-off-by: Greg Kurz 
> Reviewed-by: Cédric Le Goater 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/a3bf9fbdad600b1e4335dd90979f8d60

cheers


Re: [PATCH 1/2] powerpc/lib: fix redundant inclusion of quad.o

2019-06-03 Thread Michael Ellerman
On Mon, 2019-05-13 at 10:00:14 UTC, Christophe Leroy wrote:
> quad.o is only for PPC64, and already included in obj64-y,
> so it doesn't have to be in obj-y
> 
> Fixes: 31bfdb036f12 ("powerpc: Use instruction emulation infrastructure to 
> handle alignment faults")
> Signed-off-by: Christophe Leroy 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/f8e0d0fddf87f26aed576983f752329d

cheers


Re: [PATCH -next] misc: ocxl: Make ocxl_remove static

2019-06-03 Thread Michael Ellerman
On Sat, 2019-05-04 at 10:27:20 UTC, YueHaibing wrote:
> Fix sparse warning:
> 
> drivers/misc/ocxl/pci.c:44:6: warning:
>  symbol 'ocxl_remove' was not declared. Should it be static?
> 
> Reported-by: Hulk Robot 
> Signed-off-by: YueHaibing 
> Acked-by: Andrew Donnellan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/00b0cdbbc87fb4b531a0d75f4004ed3d

cheers


Re: [PATCH -next] powerpc/book3s64: Make some symbols static

2019-06-03 Thread Michael Ellerman
On Sat, 2019-05-04 at 10:24:27 UTC, YueHaibing wrote:
> Fix sparse warnings:
> 
> arch/powerpc/mm/book3s64/radix_pgtable.c:326:13: warning: symbol 
> 'radix_init_pgtable' was not declared. Should it be static?
> arch/powerpc/mm/book3s64/hash_native.c:48:1: warning: symbol 
> 'native_tlbie_lock' was not declared. Should it be static?
> arch/powerpc/mm/book3s64/hash_utils.c:988:24: warning: symbol 
> 'init_hash_mm_context' was not declared. Should it be static?
> 
> Signed-off-by: YueHaibing 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/d667edc01bedcd23988ef69f851e7cc2

cheers


Re: [PATCH] powerpc/powernv/npu: Fix reference leak

2019-06-03 Thread Michael Ellerman
On Fri, 2019-04-19 at 15:34:13 UTC, Greg Kurz wrote:
> Since 902bdc57451c, get_pci_dev() calls pci_get_domain_bus_and_slot(). This
> has the effect of incrementing the reference count of the PCI device, as
> explained in drivers/pci/search.c:
> 
>  * Given a PCI domain, bus, and slot/function number, the desired PCI
>  * device is located in the list of PCI devices. If the device is
>  * found, its reference count is increased and this function returns a
>  * pointer to its data structure.  The caller must decrement the
>  * reference count by calling pci_dev_put().  If no device is found,
>  * %NULL is returned.
> 
> Nothing was done to call pci_dev_put() and the reference count of GPU and
> NPU PCI devices rockets up.
> 
> A natural way to fix this would be to teach the callers about the change,
> so that they call pci_dev_put() when done with the pointer. This turns
> out to be quite intrusive, as it affects many paths in npu-dma.c,
> pci-ioda.c and vfio_pci_nvlink2.c. Also, the issue appeared in 4.16 and
> some affected code got moved around since then: it would be problematic
> to backport the fix to stable releases.
> 
> All that code never cared for reference counting anyway. Call pci_dev_put()
> from get_pci_dev() to revert to the previous behavior.
> 
> Fixes: 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev from 
> pci_dn")
> Cc: sta...@vger.kernel.org # v4.16
> Signed-off-by: Greg Kurz 
> Reviewed-by: Alexey Kardashevskiy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/02c5f5394918b9b47ff4357b1b183357

cheers


Re: [PATCH v8 3/7] s390/pci: add support for IOMMU default DMA mode build options

2019-06-03 Thread Sebastian Ott


On Thu, 30 May 2019, Zhen Lei wrote:
> The default DMA mode is LAZY on s390, this patch make it can be set to
> STRICT at build time. It can be overridden by boot option.
> 
> There is no functional change.
> 
> Signed-off-by: Zhen Lei 

Acked-by: Sebastian Ott 



Re: [PATCH 22/22] docs: fix broken documentation links

2019-06-03 Thread Christophe Leroy




Le 30/05/2019 à 01:23, Mauro Carvalho Chehab a écrit :

Mostly due to x86 and acpi conversion, several documentation
links are still pointing to the old file. Fix them.

Signed-off-by: Mauro Carvalho Chehab 
---
  Documentation/acpi/dsd/leds.txt  |  2 +-
  Documentation/admin-guide/kernel-parameters.rst  |  6 +++---
  Documentation/admin-guide/kernel-parameters.txt  | 16 
  Documentation/admin-guide/ras.rst|  2 +-
  .../devicetree/bindings/net/fsl-enetc.txt|  7 +++
  .../bindings/pci/amlogic,meson-pcie.txt  |  2 +-
  .../bindings/regulator/qcom,rpmh-regulator.txt   |  2 +-
  Documentation/devicetree/booting-without-of.txt  |  2 +-
  Documentation/driver-api/gpio/board.rst  |  2 +-
  Documentation/driver-api/gpio/consumer.rst   |  2 +-
  .../firmware-guide/acpi/enumeration.rst  |  2 +-
  .../firmware-guide/acpi/method-tracing.rst   |  2 +-
  Documentation/i2c/instantiating-devices  |  2 +-
  Documentation/sysctl/kernel.txt  |  4 ++--
  .../translations/it_IT/process/howto.rst |  2 +-
  .../it_IT/process/stable-kernel-rules.rst|  4 ++--
  .../translations/zh_CN/process/4.Coding.rst  |  2 +-
  Documentation/x86/x86_64/5level-paging.rst   |  2 +-
  Documentation/x86/x86_64/boot-options.rst|  4 ++--
  .../x86/x86_64/fake-numa-for-cpusets.rst |  2 +-
  MAINTAINERS  |  6 +++---
  arch/arm/Kconfig |  2 +-
  arch/arm64/kernel/kexec_image.c  |  2 +-
  arch/powerpc/Kconfig |  2 +-
  arch/x86/Kconfig | 16 
  arch/x86/Kconfig.debug   |  2 +-
  arch/x86/boot/header.S   |  2 +-
  arch/x86/entry/entry_64.S|  2 +-
  arch/x86/include/asm/bootparam_utils.h   |  2 +-
  arch/x86/include/asm/page_64_types.h |  2 +-
  arch/x86/include/asm/pgtable_64_types.h  |  2 +-
  arch/x86/kernel/cpu/microcode/amd.c  |  2 +-
  arch/x86/kernel/kexec-bzimage64.c|  2 +-
  arch/x86/kernel/pci-dma.c|  2 +-
  arch/x86/mm/tlb.c|  2 +-
  arch/x86/platform/pvh/enlighten.c|  2 +-
  drivers/acpi/Kconfig | 10 +-
  drivers/net/ethernet/faraday/ftgmac100.c |  2 +-
  .../fieldbus/Documentation/fieldbus_dev.txt  |  4 ++--
  drivers/vhost/vhost.c|  2 +-
  include/acpi/acpi_drivers.h  |  2 +-
  include/linux/fs_context.h   |  2 +-
  include/linux/lsm_hooks.h|  2 +-
  mm/Kconfig   |  2 +-
  security/Kconfig |  2 +-
  tools/include/linux/err.h|  2 +-
  tools/objtool/Documentation/stack-validation.txt |  4 ++--
  tools/testing/selftests/x86/protection_keys.c|  2 +-
  48 files changed, 77 insertions(+), 78 deletions(-)


[...]


diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8c1c636308c8..e868d2bd48b8 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -898,7 +898,7 @@ config PPC_MEM_KEYS
  page-based protections, but without requiring modification of the
  page tables when an application changes protection domains.
  
-	  For details, see Documentation/vm/protection-keys.rst

+ For details, see Documentation/x86/protection-keys.rst


It looks strange to reference an x86 file, for powerpc arch.

Christophe



Re: [PATCH v3 14/16] powerpc/32: implement fast entry for syscalls on BOOKE

2019-06-03 Thread Christophe Leroy




On 05/28/2019 05:03 PM, Christophe Leroy wrote:

Michael Ellerman  a écrit :


Christophe Leroy  writes:

Le 23/05/2019 à 09:00, Christophe Leroy a écrit :

[...]


arch/powerpc/kernel/head_fsl_booke.o: In function `SystemCall':
arch/powerpc/kernel/head_fsl_booke.S:416: undefined reference to
`kvmppc_handler_BOOKE_INTERRUPT_SYSCALL_SPRN_SRR1'
Makefile:1052: recipe for target 'vmlinux' failed


+.macro SYSCALL_ENTRY trapno intno
+    mfspr    r10, SPRN_SPRG_THREAD
+#ifdef CONFIG_KVM_BOOKE_HV
+BEGIN_FTR_SECTION
+    mtspr    SPRN_SPRG_WSCRATCH0, r10
+    stw    r11, THREAD_NORMSAVE(0)(r10)
+    stw    r13, THREAD_NORMSAVE(2)(r10)
+    mfcr    r13    /* save CR in r13 for now   */
+    mfspr    r11, SPRN_SRR1
+    mtocrf    0x80, r11    /* check MSR[GS] without clobbering 
reg */

+    bf    3, 1975f
+    b    kvmppc_handler_BOOKE_INTERRUPT_\intno\()_SPRN_SRR1


It seems to me that the "_SPRN_SRR1" on the end of this line
isn't meant to be there...  However, it still fails to link with that
removed.


It looks like I missed the macro expansion.

The called function should be kvmppc_handler_8_0x01B

Seems like kisskb doesn't build any config like this.


I thought we did, ie:

http://kisskb.ellerman.id.au/kisskb/buildresult/13817941/


That's a ppc64 config it seems. The problem was on booke32.

Christophe



But clearly something is missing to trigger the bug.


I was able to trigger the bug with mpc85xx_defconfig + 
CONFIG_VIRTUALIZATION + CONFIG_PPC_E500MC


The bug pops up when CONFIG_KVM_BOOKE_HV is set.

Christophe



cheers





Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request

2019-06-03 Thread Shawn Anastasio




On 6/3/19 3:35 AM, Alexey Kardashevskiy wrote:



On 03/06/2019 15:02, Alexey Kardashevskiy wrote:



On 03/06/2019 12:23, Shawn Anastasio wrote:



On 5/30/19 10:56 PM, Alexey Kardashevskiy wrote:



On 31/05/2019 08:49, Shawn Anastasio wrote:

On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote:



On 28/05/2019 17:39, Shawn Anastasio wrote:



On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote:



On 28/05/2019 15:36, Oliver wrote:

On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio 
wrote:


Introduce a new pcibios function pcibios_ignore_alignment_request
which allows the PCI core to defer to platform-specific code to
determine whether or not to ignore alignment requests for PCI
resources.

The existing behavior is to simply ignore alignment requests when
PCI_PROBE_ONLY is set. This is behavior is maintained by the
default implementation of pcibios_ignore_alignment_request.

Signed-off-by: Shawn Anastasio 
---
     drivers/pci/pci.c   | 9 +++--
     include/linux/pci.h | 1 +
     2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8abc843b1615..8207a09085d1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5882,6 +5882,11 @@ resource_size_t __weak
pcibios_default_alignment(void)
    return 0;
     }

+int __weak pcibios_ignore_alignment_request(void)
+{
+   return pci_has_flag(PCI_PROBE_ONLY);
+}
+
     #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
     static char
resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
     static DEFINE_SPINLOCK(resource_alignment_lock);
@@ -5906,9 +5911,9 @@ static resource_size_t
pci_specified_resource_alignment(struct pci_dev *dev,
    p = resource_alignment_param;
    if (!*p && !align)
    goto out;
-   if (pci_has_flag(PCI_PROBE_ONLY)) {
+   if (pcibios_ignore_alignment_request()) {
    align = 0;
-   pr_info_once("PCI: Ignoring requested alignments
(PCI_PROBE_ONLY)\n");
+   pr_info_once("PCI: Ignoring requested
alignments\n");
    goto out;
    }


I think the logic here is questionable to begin with. If the user
has
explicitly requested re-aligning a resource via the command line
then
we should probably do it even if PCI_PROBE_ONLY is set. When it
breaks
they get to keep the pieces.

That said, the real issue here is that PCI_PROBE_ONLY probably
shouldn't be set under qemu/kvm. Under the other hypervisor
(PowerVM)
hotplugged devices are configured by firmware before it's passed to
the guest and we need to keep the FW assignments otherwise things
break. QEMU however doesn't do any BAR assignments and relies on
that
being handled by the guest. At boot time this is done by SLOF, but
Linux only keeps SLOF around until it's extracted the device-tree.
Once that's done SLOF gets blown away and the kernel needs to do
it's
own BAR assignments. I'm guessing there's a hack in there to make it
work today, but it's a little surprising that it works at all...



The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the
guest which receives an event from qemu (RAS_EPOW from
/proc/interrupts), fetches device tree chunks (and as I understand
it -
they come with BARs from phyp but without from qemu) and writes
"1" to
"/sys/bus/pci/rescan" which calls pci_assign_resource() eventually:


Interesting. Does this mean that the PHYP hotplug path doesn't
call pci_assign_resource?



I'd expect dlpar_add_slot() to be called under phyp and eventually
pci_device_add() which (I think) may or may not trigger later
reassignment.



If so it means the patch may not
break that platform after all, though it still may not be
the correct way of doing things.



We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems
that (unless resource_alignment= is used) the pseries guest should just
walk through all allocated resources and leave them unchanged.


If we add a pcibios_default_alignment() implementation like was
suggested earlier, then it will behave as if the user has
specified resource_alignment= by default and SLOF's assignments
won't be honored (I think).



I removed pci_add_flags(PCI_PROBE_ONLY) from pSeries_setup_arch and
tried booting with and without pci=resource_alignment= and I can see no
difference - BARs are still aligned to 64K as programmed in SLOF; if I
hack SLOF to align to 4K or 32K - BARs get packed and the guest leaves
them unchanged.



I guess it boils down to one question - is it important that we
observe SLOF's initial BAR assignments?


It isn't if it's SLOF but it is if it's phyp. It used to not
allow/support BAR reassignment and even if it does not, I'd rather avoid
touching them.


A quick update. I tried removing pci_add_flags(PCI_PROBE_ONLY) which
worked, but if I add an implementation of pcibios_default_alignment
which simply returns PAGE_SIZE, my VM fails to boot and many errors
from the virtio disk driver are printed to the console.

After some 

Re: [PATCH v3] powerpc/pseries: Fix cpu_hotplug_lock acquisition in resize_hpt()

2019-06-03 Thread Gautham R Shenoy
Hi,

On Wed, May 15, 2019 at 01:15:52PM +0530, Gautham R. Shenoy wrote:
> From: "Gautham R. Shenoy" 
> 
> The calls to arch_add_memory()/arch_remove_memory() are always made
> with the read-side cpu_hotplug_lock acquired via
> memory_hotplug_begin().  On pSeries,
> arch_add_memory()/arch_remove_memory() eventually call resize_hpt()
> which in turn calls stop_machine() which acquires the read-side
> cpu_hotplug_lock again, thereby resulting in the recursive acquisition
> of this lock.

A clarification regarding why we hadn't observed this problem earlier.

In the absence of CONFIG_PROVE_LOCKING, we hadn't observed a system
lockup during a memory hotplug operation because cpus_read_lock() is a
per-cpu rwsem read, which, in the fast-path (in the absence of the
writer, which in our case is a CPU-hotplug operation) simply
increments the read_count on the semaphore. Thus a recursive read in
the fast-path doesn't cause any problems.

However, we can hit this problem in practice if there is a concurrent
CPU-Hotplug operation in progress which is waiting to acquire the
write-side of the lock. This will cause the second recursive read to
block until the writer finishes. While the writer is blocked since the
first read holds the lock. Thus both the reader as well as the writers
fail to make any progress thereby blocking both CPU-Hotplug as well as
Memory Hotplug operations.

Memory-Hotplug  CPU-Hotplug
CPU 0   CPU 1
--  --

1. down_read(cpu_hotplug_lock.rw_sem)  
   [memory_hotplug_begin]
2. down_write(cpu_hotplug_lock.rw_sem)
[cpu_up/cpu_down]
3. down_read(cpu_hotplug_lock.rw_sem)
   [stop_machine()]


> 
> Lockdep complains as follows in these code-paths.
> 
>  swapper/0/1 is trying to acquire lock:
>  (ptrval) (cpu_hotplug_lock.rw_sem){}, at: stop_machine+0x2c/0x60
> 
> but task is already holding lock:
> (ptrval) (cpu_hotplug_lock.rw_sem){}, at: 
> mem_hotplug_begin+0x20/0x50
> 
>  other info that might help us debug this:
>   Possible unsafe locking scenario:
> 
> CPU0
> 
>lock(cpu_hotplug_lock.rw_sem);
>lock(cpu_hotplug_lock.rw_sem);
> 
>   *** DEADLOCK ***
> 
>   May be due to missing lock nesting notation
> 
>  3 locks held by swapper/0/1:
>   #0: (ptrval) (>mutex){}, at: __driver_attach+0x12c/0x1b0
>   #1: (ptrval) (cpu_hotplug_lock.rw_sem){}, at: 
> mem_hotplug_begin+0x20/0x50
>   #2: (ptrval) (mem_hotplug_lock.rw_sem){}, at: 
> percpu_down_write+0x54/0x1a0
> 
> stack backtrace:
>  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
> 5.0.0-rc5-58373-gbc99402235f3-dirty #166
>  Call Trace:
>  [c000feb03150] [c0e32bd4] dump_stack+0xe8/0x164 (unreliable)
>  [c000feb031a0] [c020d6c0] __lock_acquire+0x1110/0x1c70
>  [c000feb03320] [c020f080] lock_acquire+0x240/0x290
>  [c000feb033e0] [c017f554] cpus_read_lock+0x64/0xf0
>  [c000feb03420] [c029ebac] stop_machine+0x2c/0x60
>  [c000feb03460] [c00d7f7c] pseries_lpar_resize_hpt+0x19c/0x2c0
>  [c000feb03500] [c00788d0] resize_hpt_for_hotplug+0x70/0xd0
>  [c000feb03570] [c0e5d278] arch_add_memory+0x58/0xfc
>  [c000feb03610] [c03553a8] devm_memremap_pages+0x5e8/0x8f0
>  [c000feb036c0] [c09c2394] pmem_attach_disk+0x764/0x830
>  [c000feb037d0] [c09a7c38] nvdimm_bus_probe+0x118/0x240
>  [c000feb03860] [c0968500] really_probe+0x230/0x4b0
>  [c000feb038f0] [c0968aec] driver_probe_device+0x16c/0x1e0
>  [c000feb03970] [c0968ca8] __driver_attach+0x148/0x1b0
>  [c000feb039f0] [c09650b0] bus_for_each_dev+0x90/0x130
>  [c000feb03a50] [c0967dd4] driver_attach+0x34/0x50
>  [c000feb03a70] [c0967068] bus_add_driver+0x1a8/0x360
>  [c000feb03b00] [c096a498] driver_register+0x108/0x170
>  [c000feb03b70] [c09a7400] __nd_driver_register+0xd0/0xf0
>  [c000feb03bd0] [c128aa90] nd_pmem_driver_init+0x34/0x48
>  [c000feb03bf0] [c0010a10] do_one_initcall+0x1e0/0x45c
>  [c000feb03cd0] [c122462c] kernel_init_freeable+0x540/0x64c
>  [c000feb03db0] [c001110c] kernel_init+0x2c/0x160
>  [c000feb03e20] [c000bed4] ret_from_kernel_thread+0x5c/0x68
> 
> Fix this issue by
>   1) Requiring all the calls to pseries_lpar_resize_hpt() be made
>  with cpu_hotplug_lock held.
> 
>   2) In pseries_lpar_resize_hpt() invoke stop_machine_cpuslocked()
>  as a consequence of 1)
> 
>   3) To satisfy 1), in hpt_order_set(), call mmu_hash_ops.resize_hpt()
>  with cpu_hotplug_lock held.
> 
> Reported-by: Aneesh Kumar K.V 
> Signed-off-by: Gautham R. Shenoy 
> ---
> v2 -> v3 : Updated the comment for pseries_lpar_resize_hpt()
>Updated the commit-log with the full 

Re: [PATCH] scsi: ibmvscsi: Don't use rc uninitialized in ibmvscsi_do_work

2019-06-03 Thread Christophe Leroy




Le 31/05/2019 à 20:53, Nathan Chancellor a écrit :

clang warns:

drivers/scsi/ibmvscsi/ibmvscsi.c:2126:7: warning: variable 'rc' is used
uninitialized whenever switch case is taken [-Wsometimes-uninitialized]
 case IBMVSCSI_HOST_ACTION_NONE:
  ^
drivers/scsi/ibmvscsi/ibmvscsi.c:2151:6: note: uninitialized use occurs
here
 if (rc) {
 ^~

Initialize rc to zero so that the atomic_set and dev_err statement don't
trigger for the cases that just break.

Fixes: 035a3c4046b5 ("scsi: ibmvscsi: redo driver work thread to use enum action 
states")
Link: https://github.com/ClangBuiltLinux/linux/issues/502
Signed-off-by: Nathan Chancellor 
---
  drivers/scsi/ibmvscsi/ibmvscsi.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index 727c31dc11a0..6714d8043e62 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -2118,7 +2118,7 @@ static unsigned long ibmvscsi_get_desired_dma(struct 
vio_dev *vdev)
  static void ibmvscsi_do_work(struct ibmvscsi_host_data *hostdata)
  {
unsigned long flags;
-   int rc;
+   int rc = 0;


I don't think the above is the best solution, as it hides the warning 
instead of really fixing it.


Your problem is that some legs of the switch are missing setting the 
value of rc, it would therefore be better to fix the legs instead of 
setting a default value which may not be correct for every case, 
allthough it may be at the time being.


Christophe



char *action = "reset";
  
  	spin_lock_irqsave(hostdata->host->host_lock, flags);




Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request

2019-06-03 Thread Alexey Kardashevskiy



On 03/06/2019 15:02, Alexey Kardashevskiy wrote:
> 
> 
> On 03/06/2019 12:23, Shawn Anastasio wrote:
>>
>>
>> On 5/30/19 10:56 PM, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 31/05/2019 08:49, Shawn Anastasio wrote:
 On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote:
>
>
> On 28/05/2019 17:39, Shawn Anastasio wrote:
>>
>>
>> On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 28/05/2019 15:36, Oliver wrote:
 On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio 
 wrote:
>
> Introduce a new pcibios function pcibios_ignore_alignment_request
> which allows the PCI core to defer to platform-specific code to
> determine whether or not to ignore alignment requests for PCI
> resources.
>
> The existing behavior is to simply ignore alignment requests when
> PCI_PROBE_ONLY is set. This is behavior is maintained by the
> default implementation of pcibios_ignore_alignment_request.
>
> Signed-off-by: Shawn Anastasio 
> ---
>     drivers/pci/pci.c   | 9 +++--
>     include/linux/pci.h | 1 +
>     2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 8abc843b1615..8207a09085d1 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5882,6 +5882,11 @@ resource_size_t __weak
> pcibios_default_alignment(void)
>    return 0;
>     }
>
> +int __weak pcibios_ignore_alignment_request(void)
> +{
> +   return pci_has_flag(PCI_PROBE_ONLY);
> +}
> +
>     #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
>     static char
> resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
>     static DEFINE_SPINLOCK(resource_alignment_lock);
> @@ -5906,9 +5911,9 @@ static resource_size_t
> pci_specified_resource_alignment(struct pci_dev *dev,
>    p = resource_alignment_param;
>    if (!*p && !align)
>    goto out;
> -   if (pci_has_flag(PCI_PROBE_ONLY)) {
> +   if (pcibios_ignore_alignment_request()) {
>    align = 0;
> -   pr_info_once("PCI: Ignoring requested alignments
> (PCI_PROBE_ONLY)\n");
> +   pr_info_once("PCI: Ignoring requested
> alignments\n");
>    goto out;
>    }

 I think the logic here is questionable to begin with. If the user
 has
 explicitly requested re-aligning a resource via the command line
 then
 we should probably do it even if PCI_PROBE_ONLY is set. When it
 breaks
 they get to keep the pieces.

 That said, the real issue here is that PCI_PROBE_ONLY probably
 shouldn't be set under qemu/kvm. Under the other hypervisor
 (PowerVM)
 hotplugged devices are configured by firmware before it's passed to
 the guest and we need to keep the FW assignments otherwise things
 break. QEMU however doesn't do any BAR assignments and relies on
 that
 being handled by the guest. At boot time this is done by SLOF, but
 Linux only keeps SLOF around until it's extracted the device-tree.
 Once that's done SLOF gets blown away and the kernel needs to do
 it's
 own BAR assignments. I'm guessing there's a hack in there to make it
 work today, but it's a little surprising that it works at all...
>>>
>>>
>>> The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the
>>> guest which receives an event from qemu (RAS_EPOW from
>>> /proc/interrupts), fetches device tree chunks (and as I understand
>>> it -
>>> they come with BARs from phyp but without from qemu) and writes
>>> "1" to
>>> "/sys/bus/pci/rescan" which calls pci_assign_resource() eventually:
>>
>> Interesting. Does this mean that the PHYP hotplug path doesn't
>> call pci_assign_resource?
>
>
> I'd expect dlpar_add_slot() to be called under phyp and eventually
> pci_device_add() which (I think) may or may not trigger later
> reassignment.
>
>
>> If so it means the patch may not
>> break that platform after all, though it still may not be
>> the correct way of doing things.
>
>
> We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems
> that (unless resource_alignment= is used) the pseries guest should just
> walk through all allocated resources and leave them unchanged.

 If we add a pcibios_default_alignment() implementation like was
 suggested earlier, then it will behave as if the user has
 specified resource_alignment= by 

[PATCH v3] powerpc: fix kexec failure on book3s/32

2019-06-03 Thread Christophe Leroy
In the old days, _PAGE_EXEC didn't exist on 6xx aka book3s/32.
Therefore, allthough __mapin_ram_chunk() was already mapping kernel
text with PAGE_KERNEL_TEXT and the rest with PAGE_KERNEL, the entire
memory was executable. Part of the memory (first 512kbytes) was
mapped with BATs instead of page table, but it was also entirely
mapped as executable.

In commit 385e89d5b20f ("powerpc/mm: add exec protection on
powerpc 603"), we started adding exec protection to some 6xx, namely
the 603, for pages mapped via pagetables.

Then, in commit 63b2bc619565 ("powerpc/mm/32s: Use BATs for
STRICT_KERNEL_RWX"), the exec protection was extended to BAT mapped
memory, so that really only the kernel text could be executed.

The problem here is that kexec is based on copying some code into
upper part of memory then executing it from there in order to install
a fresh new kernel at its definitive location.

However, the code is position independant and first part of it is
just there to deactivate the MMU and jump to the second part. So it
is possible to run this first part inplace instead of running the
copy. Once the MMU is off, there is no protection anymore and the
second part of the code will just run as before.

Reported-by: Aaro Koskinen 
Fixes: 63b2bc619565 ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX")
Cc: sta...@vger.kernel.org
Signed-off-by: Christophe Leroy 
---
 Aaro, can you test this patch ? Thanks.

 arch/powerpc/include/asm/kexec.h   | 3 +++
 arch/powerpc/kernel/machine_kexec_32.c | 4 +++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 4a585cba1787..c68476818753 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -94,6 +94,9 @@ static inline bool kdump_in_progress(void)
return crashing_cpu >= 0;
 }
 
+void relocate_new_kernel(unsigned long indirection_page, unsigned long 
reboot_code_buffer,
+unsigned long start_address) __noreturn;
+
 #ifdef CONFIG_KEXEC_FILE
 extern const struct kexec_file_ops kexec_elf64_ops;
 
diff --git a/arch/powerpc/kernel/machine_kexec_32.c 
b/arch/powerpc/kernel/machine_kexec_32.c
index affe5dcce7f4..2b160d68db49 100644
--- a/arch/powerpc/kernel/machine_kexec_32.c
+++ b/arch/powerpc/kernel/machine_kexec_32.c
@@ -30,7 +30,6 @@ typedef void (*relocate_new_kernel_t)(
  */
 void default_machine_kexec(struct kimage *image)
 {
-   extern const unsigned char relocate_new_kernel[];
extern const unsigned int relocate_new_kernel_size;
unsigned long page_list;
unsigned long reboot_code_buffer, reboot_code_buffer_phys;
@@ -58,6 +57,9 @@ void default_machine_kexec(struct kimage *image)
reboot_code_buffer + KEXEC_CONTROL_PAGE_SIZE);
printk(KERN_INFO "Bye!\n");
 
+   if (!IS_ENABLED(CONFIG_FSL_BOOKE) && !IS_ENABLED(CONFIG_44x))
+   relocate_new_kernel(page_list, reboot_code_buffer_phys, 
image->start);
+
/* now call it */
rnk = (relocate_new_kernel_t) reboot_code_buffer;
(*rnk)(page_list, reboot_code_buffer_phys, image->start);
-- 
2.13.3



Re: [PATCH BACKPORTv2 4.19, 5.0, 5.1] crypto: vmx - ghash: do nosimd fallback manually

2019-06-03 Thread Greg KH
On Mon, Jun 03, 2019 at 12:08:48PM +1000, Daniel Axtens wrote:
> commit 357d065a44cdd77ed5ff35155a989f2a763e96ef upstream.
> [backported: the VMX driver did not use crypto_simd_usable() until
>  after 5.1]
> 
> VMX ghash was using a fallback that did not support interleaving simd
> and nosimd operations, leading to failures in the extended test suite.
> 
> If I understood correctly, Eric's suggestion was to use the same
> data format that the generic code uses, allowing us to call into it
> with the same contexts. I wasn't able to get that to work - I think
> there's a very different key structure and data layout being used.
> 
> So instead steal the arm64 approach and perform the fallback
> operations directly if required.
> 
> Fixes: cc333cd68dfa ("crypto: vmx - Adding GHASH routines for VMX module")
> Cc: sta...@vger.kernel.org # v4.1+
> Reported-by: Eric Biggers 
> Signed-off-by: Daniel Axtens 
> Acked-by: Ard Biesheuvel 
> Tested-by: Michael Ellerman 
> Signed-off-by: Herbert Xu 
> Signed-off-by: Daniel Axtens 
> ---
> 
> v2: do stable backport form correctly.

Thanks for all of these, now queued up.

greg k-h


Re: [PATCH 10/16] sparc64: use the generic get_user_pages_fast code

2019-06-03 Thread Christoph Hellwig
On Sun, Jun 02, 2019 at 03:39:48PM +0800, Hillf Danton wrote:
> 
> Hi Christoph 
> 
> On Sat,  1 Jun 2019 09:49:53 +0200 Christoph Hellwig wrote:
> > 
> > diff --git a/arch/sparc/include/asm/pgtable_64.h 
> > b/arch/sparc/include/asm/pgtable_64.h
> > index a93eca29e85a..2301ab5250e4 100644
> > --- a/arch/sparc/include/asm/pgtable_64.h
> > +++ b/arch/sparc/include/asm/pgtable_64.h
> > @@ -1098,6 +1098,24 @@ static inline unsigned long untagged_addr(unsigned 
> > long start)
> >  }
> >  #define untagged_addr untagged_addr
> >  
> > +static inline bool pte_access_permitted(pte_t pte, bool write)
> > +{
> > +   u64 prot;
> > +
> > +   if (tlb_type == hypervisor) {
> > +   prot = _PAGE_PRESENT_4V | _PAGE_P_4V;
> > +   if (prot)
> 
> Feel free to correct me if I misread or miss anything.
> It looks like a typo: s/prot/write/, as checking _PAGE_PRESENT_4V and
> _PAGE_P_4V makes prot always have _PAGE_WRITE_4V set, regardless of write.

True, the if prot should be if write.


Re: [PATCH 08/16] sparc64: add the missing pgd_page definition

2019-06-03 Thread Christoph Hellwig
On Sat, Jun 01, 2019 at 09:28:54AM -0700, Linus Torvalds wrote:
> Both sparc64 and sh had this pattern, but now that I look at it more
> closely, I think your version is wrong, or at least nonoptimal.

I bet it is.  Then again these symbols are just required for the code
to compile, as neither sparc64 nor sh actually use the particular
variant of huge pages we need it for.  Then again even actually dead
code should better be not too buggy if it isn't just a stub.

> So I thgink this would be better done with
> 
>  #define pgd_page(pgd)pfn_to_page(pgd_pfn(pgd))
> 
> where that "pgd_pfn()" would need to be a new (but likely very
> trivial) function. That's what we do for pte_pfn().
> 
> IOW, it would likely end up something like
> 
>   #define pgd_to_pfn(pgd) (pgd_val(x) >> PFN_PGD_SHIFT)

True.  I guess it would be best if we could get most if not all
architectures to use common versions of these macros so that we have
the issue settled once.


Re: [RFC PATCH] powerpc/64s: __find_linux_pte synchronization vs pmdp_invalidate

2019-06-03 Thread Christophe Leroy




Le 03/06/2019 à 08:44, Nicholas Piggin a écrit :

The pmd_none check does not catch hugepage collapse, nor does the
pmd_present check in pmd_trans_huge, because hugepage collapse sets
!_PAGE_PRESENT && _PAGE_INVALID (which results in !pmd_none and
pmd_present).

Aneesh noticed we might need this check as well.

---
  arch/powerpc/mm/pgtable.c | 15 +++
  1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index db4a6253df92..7a702d21400a 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -372,13 +372,20 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
pdshift = PMD_SHIFT;
pmdp = pmd_offset(, ea);
pmd  = READ_ONCE(*pmdp);
-   /*
-* A hugepage collapse is captured by pmd_none, because
-* it mark the pmd none and do a hpte invalidate.
-*/
+
if (pmd_none(pmd))
return NULL;
  
+#ifdef CONFIG_PPC_BOOK3S_64


I can't see anything that would build fail on other subarches. Wouldn't 
be better to use


if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && (pmd_val(pmd) & 
(_PAGE_PRESENT|_PAGE_INVALID) == _PAGE_INVALID))




+   if (pmd_val(pmd) & (_PAGE_PRESENT|_PAGE_INVALID) == _PAGE_INVALID) {


Maybe using pmd_raw() instead as in all the book3s64 helpers ?

Christophe



+   /*
+* A hugepage collapse is captured by this condition, see
+* pmdp_invalidate.
+*/
+   return NULL;
+   }
+#endif
+
if (pmd_trans_huge(pmd) || pmd_devmap(pmd)) {
if (is_thp)
*is_thp = true;



Re: [PATCH 03/16] mm: simplify gup_fast_permitted

2019-06-03 Thread Christoph Hellwig
On Sat, Jun 01, 2019 at 09:14:17AM -0700, Linus Torvalds wrote:
> On Sat, Jun 1, 2019 at 12:50 AM Christoph Hellwig  wrote:
> >
> > Pass in the already calculated end value instead of recomputing it, and
> > leave the end > start check in the callers instead of duplicating them
> > in the arch code.
> 
> Good cleanup, except it's wrong.
> 
> > -   if (nr_pages <= 0)
> > +   if (end < start)
> > return 0;
> 
> You moved the overflow test to generic code - good.
> 
> You removed the sign and zero test on nr_pages - bad.

I only removed a duplicate of it.  The full (old) code in
get_user_pages_fast() looks like this:

if (nr_pages <= 0)
return 0;

if (unlikely(!access_ok((void __user *)start, len)))
return -EFAULT;

if (gup_fast_permitted(start, nr_pages)) {


Re: [PATCH] powerpc/64s: Fix THP PMD collapse serialisation

2019-06-03 Thread Nicholas Piggin
Aneesh Kumar K.V's on June 3, 2019 4:43 pm:
> On 6/3/19 11:35 AM, Nicholas Piggin wrote:
>> Commit 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion
>> in pte helpers") changed the actual bitwise tests in pte_access_permitted
>> by using pte_write() and pte_present() helpers rather than raw bitwise
>> testing _PAGE_WRITE and _PAGE_PRESENT bits.
>> 
>> The pte_present change now returns true for ptes which are !_PAGE_PRESENT
>> and _PAGE_INVALID, which is the combination used by pmdp_invalidate to
>> synchronize access from lock-free lookups. pte_access_permitted is used by
>> pmd_access_permitted, so allowing GUP lock free access to proceed with
>> such PTEs breaks this synchronisation.
>> 
>> This bug has been observed on HPT host, with random crashes and corruption
>> in guests, usually together with bad PMD messages in the host.
>> 
>> Fix this by adding an explicit check in pmd_access_permitted, and
>> documenting the condition explicitly.
>> 
>> The pte_write() change should be okay, and would prevent GUP from falling
>> back to the slow path when encountering savedwrite ptes, which matches
>> what x86 (that does not implement savedwrite) does.
>> 
>> Fixes: 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in 
>> pte helpers")
>> Cc: Aneesh Kumar K.V 
>> Cc: Christophe Leroy 
>> Signed-off-by: Nicholas Piggin 
>> ---
>>   arch/powerpc/include/asm/book3s/64/pgtable.h | 19 ++-
>>   arch/powerpc/mm/book3s64/pgtable.c   |  3 +++
>>   2 files changed, 21 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
>> b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> index 7dede2e34b70..aaa72aa1b765 100644
>> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
>> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> @@ -1092,7 +1092,24 @@ static inline int pmd_protnone(pmd_t pmd)
>>   #define pmd_access_permitted pmd_access_permitted
>>   static inline bool pmd_access_permitted(pmd_t pmd, bool write)
>>   {
>> -return pte_access_permitted(pmd_pte(pmd), write);
>> +pte_t pte = pmd_pte(pmd);
>> +unsigned long pteval = pte_val(pte);
>> +
>> +/*
>> + * pmdp_invalidate sets this combination (that is not caught by
>> + * !pte_present() check in pte_access_permitted), to prevent
>> + * lock-free lookups, as part of the serialize_against_pte_lookup()
>> + * synchronisation.
>> + *
>> + * This check inadvertently catches the case where the PTE's hardware
>> + * PRESENT bit is cleared while TLB is flushed, to work around
>> + * hardware TLB issues. This is suboptimal, but should not be hit
>> + * frequently and should be harmless.
>> + */
>> +if ((pteval & _PAGE_INVALID) && !(pteval & _PAGE_PRESENT))
>> +return false;
>> +
>> +return pte_access_permitted(pte, write);
>>   }
>>   
> 
> 
> you need to do similar for other lockless page table walk like 
> find_linux_pte

Yeah good point as discussed offline. I was going to make that a
separate patch, it would have a different Fixes:. I have not been
able to trigger any bugs caused by it, whereas the bug caused by
this patch hits reliably in about 10 minutes or less.

Maybe the race window is just a lot smaller or the function is
less frequently used?

Thanks,
Nick




Re: [PATCH 09/22] docs: mark orphan documents as such

2019-06-03 Thread Christophe Leroy




Le 30/05/2019 à 01:23, Mauro Carvalho Chehab a écrit :

Sphinx doesn't like orphan documents:

 Documentation/accelerators/ocxl.rst: WARNING: document isn't included in 
any toctree
 Documentation/arm/stm32/overview.rst: WARNING: document isn't included in 
any toctree
 Documentation/arm/stm32/stm32f429-overview.rst: WARNING: document isn't 
included in any toctree
 Documentation/arm/stm32/stm32f746-overview.rst: WARNING: document isn't 
included in any toctree
 Documentation/arm/stm32/stm32f769-overview.rst: WARNING: document isn't 
included in any toctree
 Documentation/arm/stm32/stm32h743-overview.rst: WARNING: document isn't 
included in any toctree
 Documentation/arm/stm32/stm32mp157-overview.rst: WARNING: document isn't 
included in any toctree
 Documentation/gpu/msm-crash-dump.rst: WARNING: document isn't included in 
any toctree
 Documentation/interconnect/interconnect.rst: WARNING: document isn't 
included in any toctree
 Documentation/laptops/lg-laptop.rst: WARNING: document isn't included in 
any toctree
 Documentation/powerpc/isa-versions.rst: WARNING: document isn't included 
in any toctree
 Documentation/virtual/kvm/amd-memory-encryption.rst: WARNING: document 
isn't included in any toctree
 Documentation/virtual/kvm/vcpu-requests.rst: WARNING: document isn't 
included in any toctree

So, while they aren't on any toctree, add :orphan: to them, in order
to silent this warning.


Are those files really not meant to be included in a toctree ?

Shouldn't we include them in the relevant toctree instead of just 
shutting up Sphinx warnings ?


Christophe



Signed-off-by: Mauro Carvalho Chehab 
---
  Documentation/accelerators/ocxl.rst | 2 ++
  Documentation/arm/stm32/overview.rst| 2 ++
  Documentation/arm/stm32/stm32f429-overview.rst  | 2 ++
  Documentation/arm/stm32/stm32f746-overview.rst  | 2 ++
  Documentation/arm/stm32/stm32f769-overview.rst  | 2 ++
  Documentation/arm/stm32/stm32h743-overview.rst  | 2 ++
  Documentation/arm/stm32/stm32mp157-overview.rst | 2 ++
  Documentation/gpu/msm-crash-dump.rst| 2 ++
  Documentation/interconnect/interconnect.rst | 2 ++
  Documentation/laptops/lg-laptop.rst | 2 ++
  Documentation/powerpc/isa-versions.rst  | 2 ++
  Documentation/virtual/kvm/amd-memory-encryption.rst | 2 ++
  Documentation/virtual/kvm/vcpu-requests.rst | 2 ++
  13 files changed, 26 insertions(+)

diff --git a/Documentation/accelerators/ocxl.rst 
b/Documentation/accelerators/ocxl.rst
index 14cefc020e2d..b1cea19a90f5 100644
--- a/Documentation/accelerators/ocxl.rst
+++ b/Documentation/accelerators/ocxl.rst
@@ -1,3 +1,5 @@
+:orphan:
+
  
  OpenCAPI (Open Coherent Accelerator Processor Interface)
  
diff --git a/Documentation/arm/stm32/overview.rst 
b/Documentation/arm/stm32/overview.rst
index 85cfc8410798..f7e734153860 100644
--- a/Documentation/arm/stm32/overview.rst
+++ b/Documentation/arm/stm32/overview.rst
@@ -1,3 +1,5 @@
+:orphan:
+
  
  STM32 ARM Linux Overview
  
diff --git a/Documentation/arm/stm32/stm32f429-overview.rst 
b/Documentation/arm/stm32/stm32f429-overview.rst
index 18feda97f483..65bbb1c3b423 100644
--- a/Documentation/arm/stm32/stm32f429-overview.rst
+++ b/Documentation/arm/stm32/stm32f429-overview.rst
@@ -1,3 +1,5 @@
+:orphan:
+
  STM32F429 Overview
  ==
  
diff --git a/Documentation/arm/stm32/stm32f746-overview.rst b/Documentation/arm/stm32/stm32f746-overview.rst

index b5f4b6ce7656..42d593085015 100644
--- a/Documentation/arm/stm32/stm32f746-overview.rst
+++ b/Documentation/arm/stm32/stm32f746-overview.rst
@@ -1,3 +1,5 @@
+:orphan:
+
  STM32F746 Overview
  ==
  
diff --git a/Documentation/arm/stm32/stm32f769-overview.rst b/Documentation/arm/stm32/stm32f769-overview.rst

index 228656ced2fe..f6adac862b17 100644
--- a/Documentation/arm/stm32/stm32f769-overview.rst
+++ b/Documentation/arm/stm32/stm32f769-overview.rst
@@ -1,3 +1,5 @@
+:orphan:
+
  STM32F769 Overview
  ==
  
diff --git a/Documentation/arm/stm32/stm32h743-overview.rst b/Documentation/arm/stm32/stm32h743-overview.rst

index 3458dc00095d..c525835e7473 100644
--- a/Documentation/arm/stm32/stm32h743-overview.rst
+++ b/Documentation/arm/stm32/stm32h743-overview.rst
@@ -1,3 +1,5 @@
+:orphan:
+
  STM32H743 Overview
  ==
  
diff --git a/Documentation/arm/stm32/stm32mp157-overview.rst b/Documentation/arm/stm32/stm32mp157-overview.rst

index 62e176d47ca7..2c52cd020601 100644
--- a/Documentation/arm/stm32/stm32mp157-overview.rst
+++ b/Documentation/arm/stm32/stm32mp157-overview.rst
@@ -1,3 +1,5 @@
+:orphan:
+
  STM32MP157 Overview
  ===
  
diff --git a/Documentation/gpu/msm-crash-dump.rst 

Re: [WIP RFC PATCH 0/6] Generic Firmware Variable Filesystem

2019-06-03 Thread Greg KH
On Mon, Jun 03, 2019 at 04:04:32PM +1000, Daniel Axtens wrote:
> Hi Nayna,
> 
> >> As PowerNV moves towards secure boot, we need a place to put secure
> >> variables. One option that has been canvassed is to make our secure
> >> variables look like EFI variables. This is an early sketch of another
> >> approach where we create a generic firmware variable file system,
> >> fwvarfs, and an OPAL Secure Variable backend for it.
> >
> > Is there a need of new filesystem ? I am wondering why can't these be 
> > exposed via sysfs / securityfs ?
> > Probably, something like... /sys/firmware/secureboot or 
> > /sys/kernel/security/secureboot/  ?
> 
> I suppose we could put secure variables in sysfs, but I'm not sure
> that's what sysfs was intended for. I understand sysfs as "a
> filesystem-based view of kernel objects" (from
> Documentation/filesystems/configfs/configfs.txt), and I don't think a
> secure variable is really a kernel object in the same way most other
> things in sysfs are... but I'm open to being convinced.

What makes them more "secure" than anything else that is in sysfs today?
I didn't see anything in this patchset that provided "additional
security", did I miss it?

> securityfs seems to be reserved for LSMs, I don't think we can put
> things there.

Yeah, I wouldn't mess with that.

I would just recommend putting this in sysfs.  Make a new subsystem
(i.e. class) and away you go.

> My hope with fwvarfs is to provide a generic place for firmware
> variables so that we don't need to expand the list of firmware-specific
> filesystems beyond efivarfs. I am also aiming to make things simple to
> use so that people familiar with firmware don't also have to become
> familiar with filesystem code in order to expose firmware variables to
> userspace.

Why would anyone need to be writing new code to firmware variables that
makes it any different from any other kernel change?

> > Also, it sounds like this is needed only for secure firmware variables 
> > and does not include
> > other firmware variables which are not security relevant ? Is that 
> > correct understanding ?
> 
> The primary use case at the moment - OPAL secure variables - is security
> focused because the current OPAL secure variable design stores and
> manipulates secure variables separately from the rest of nvram. This
> isn't an inherent feature of fwvarfs.

Again, why not just put it in sysfs please?

> fwvarfs can also be used for variables that are not security relevant as
> well. For example, with the EFI backend (patch 3), both secure and
> insecure variables can be read.

I don't remember why efi variables were not put in sysfs, I think there
was some reasoning behind it originally.  Perhaps look in the linux-efi
archives.

thanks,

greg k-h


RE: [next-20190530] Boot failure on PowerPC

2019-06-03 Thread Lili Deng (Wicresoft North America Ltd)
ACK for ' but it fixes the get_swap_device warning messages during the boot.'
Double check for kernel panic issue, without this patch, it seems not repro in 
my local manual environment, so please ignore my previous mail for 'the patch 
fixes the kernel panic'.
Sorry for the confusion. 

-Original Message-
From: Lili Deng (Wicresoft North America Ltd) 
Sent: Monday, June 3, 2019 11:24 AM
To: Sachin Sant ; Dexuan-Linux Cui 

Cc: Michael Ellerman ; linuxppc-dev@lists.ozlabs.org; 
linux-n...@vger.kernel.org; Dexuan Cui 
Subject: RE: [next-20190530] Boot failure on PowerPC

Hi Sachin,

I verified below patch against Ubuntu 18.04, didn't hit the kernel panic any 
more, could you please let know how did you verify?

Thanks,
Lili
-Original Message-
From: Sachin Sant  
Sent: Monday, June 3, 2019 11:12 AM
To: Dexuan-Linux Cui 
Cc: Michael Ellerman ; linuxppc-dev@lists.ozlabs.org; 
linux-n...@vger.kernel.org; Dexuan Cui ; Lili Deng 
(Wicresoft North America Ltd) 
Subject: Re: [next-20190530] Boot failure on PowerPC



> On 31-May-2019, at 11:43 PM, Dexuan-Linux Cui  wrote:
> 
> On Fri, May 31, 2019 at 6:52 AM Michael Ellerman  wrote:
>>> 
>>> Machine boots till login prompt and then panics few seconds later.
>>> 
>>> Last known next build was May 24th. Will attempt few builds till May 
>>> 30 to narrow down this problem.
>> 
>> My CI was fine with next-20190529 (9a15d2e3fd03e3).
>> 
>> cheers
> 
> Hi Sachin,
> It looks this patch may fix the issue:
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Flkml.org%2Flkml%2F2019%2F5%2F30%2F1630data=02%7C01%7Cv-lide%40microsoft.com%7C66e1ef6017aa461703f808d6e7d148cd%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C1%7C636951283393233385sdata=IJFhtvL2Bd87HCoMZ7oWL%2Bar6NY%2FfPbmdCZMT%2BJz5t4%3Dreserved=0
>  , but I'm not sure.

It does not help fix the kernel panic issue, but it fixes the get_swap_device 
warning messages during the boot.

Thanks
-Sachin



Re: [RFC PATCH] powerpc/book3e: KASAN Full support for 64bit

2019-06-03 Thread Christophe Leroy

Hi,

Ok, can you share your .config ?

Christophe

Le 31/05/2019 à 03:29, Daniel Axtens a écrit :

Hi Christophe,

I tried this on the t4240rdb and it fails to boot if KASAN is
enabled. It does boot with the patch applied but KASAN disabled, so that
narrows it down a little bit.

I need to focus on 3s first so I'll just drop 3e from my patch set for
now.

Regards,
Daniel


The KASAN shadow area is mapped into vmemmap space:
0x8000 0400   to 0x8000 0600  .
For this vmemmap has to be disabled.

Cc: Daniel Axtens 
Signed-off-by: Christophe Leroy 
---
  arch/powerpc/Kconfig  |   1 +
  arch/powerpc/Kconfig.debug|   3 +-
  arch/powerpc/include/asm/kasan.h  |  11 +++
  arch/powerpc/kernel/Makefile  |   2 +
  arch/powerpc/kernel/head_64.S |   3 +
  arch/powerpc/kernel/setup_64.c|  20 +++---
  arch/powerpc/mm/kasan/Makefile|   1 +
  arch/powerpc/mm/kasan/kasan_init_64.c | 129 ++
  8 files changed, 159 insertions(+), 11 deletions(-)
  create mode 100644 arch/powerpc/mm/kasan/kasan_init_64.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1a2fb50126b2..e0b7c45e4dc7 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -174,6 +174,7 @@ config PPC
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_KASAN  if PPC32
+   select HAVE_ARCH_KASAN  if PPC_BOOK3E_64 && 
!SPARSEMEM_VMEMMAP
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if COMPAT
diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 61febbbdd02b..b4140dd6b4e4 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -370,4 +370,5 @@ config PPC_FAST_ENDIAN_SWITCH
  config KASAN_SHADOW_OFFSET
hex
depends on KASAN
-   default 0xe000
+   default 0xe000 if PPC32
+   default 0x68000400 if PPC64
diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index 296e51c2f066..756b3d58f921 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -23,10 +23,21 @@
  
  #define KASAN_SHADOW_OFFSET	ASM_CONST(CONFIG_KASAN_SHADOW_OFFSET)
  
+#ifdef CONFIG_PPC32

  #define KASAN_SHADOW_END  0UL
  
  #define KASAN_SHADOW_SIZE	(KASAN_SHADOW_END - KASAN_SHADOW_START)
  
+#else

+
+#include 
+
+#define KASAN_SHADOW_SIZE  (KERN_VIRT_SIZE >> KASAN_SHADOW_SCALE_SHIFT)
+
+#define KASAN_SHADOW_END   (KASAN_SHADOW_START + KASAN_SHADOW_SIZE)
+
+#endif /* CONFIG_PPC32 */
+
  #ifdef CONFIG_KASAN
  void kasan_early_init(void);
  void kasan_mmu_init(void);
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 0ea6c4aa3a20..7f232c06f11d 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -35,6 +35,8 @@ KASAN_SANITIZE_early_32.o := n
  KASAN_SANITIZE_cputable.o := n
  KASAN_SANITIZE_prom_init.o := n
  KASAN_SANITIZE_btext.o := n
+KASAN_SANITIZE_paca.o := n
+KASAN_SANITIZE_setup_64.o := n
  
  ifdef CONFIG_KASAN

  CFLAGS_early_32.o += -DDISABLE_BRANCH_PROFILING
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index 3fad8d499767..80fbd8024fb2 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -966,6 +966,9 @@ start_here_multiplatform:
 * and SLB setup before we turn on relocation.
 */
  
+#ifdef CONFIG_KASAN

+   bl  kasan_early_init
+#endif
/* Restore parameters passed from prom_init/kexec */
mr  r3,r31
bl  early_setup /* also sets r13 and SPRG_PACA */
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index ba404dd9ce1d..d2bf860dd966 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -311,6 +311,16 @@ void __init early_setup(unsigned long dt_ptr)
DBG(" -> early_setup(), dt_ptr: 0x%lx\n", dt_ptr);
  
  	/*

+* Configure exception handlers. This include setting up trampolines
+* if needed, setting exception endian mode, etc...
+*/
+   configure_exceptions();
+
+   /* Apply all the dynamic patching */
+   apply_feature_fixups();
+   setup_feature_keys();
+
+   /*
 * Do early initialization using the flattened device
 * tree, such as retrieving the physical memory map or
 * calculating/retrieving the hash table size.
@@ -325,16 +335,6 @@ void __init early_setup(unsigned long dt_ptr)
setup_paca(paca_ptrs[boot_cpuid]);
fixup_boot_paca();
  
-	/*

-* Configure exception handlers. This include setting up trampolines
-* if needed, setting exception endian mode, etc...
-*/
-   configure_exceptions();
-
-   /* Apply all the dynamic patching */
-   apply_feature_fixups();
-   setup_feature_keys();
-
/* 

[PATCH v2] powerpc: pseries/hvconsole: fix stack overread via udbg

2019-06-03 Thread Daniel Axtens
While developing kasan for 64-bit book3s, I hit the following stack
over-read.

It occurs because the hypercall to put characters onto the terminal
takes 2 longs (128 bits/16 bytes) of characters at a time, and so
hvc_put_chars would unconditionally copy 16 bytes from the argument
buffer, regardless of supplied length. However, udbg_hvc_putc can
call hvc_put_chars with a single-byte buffer, leading to the error.

[0.001931] 
==  
[150/819]
[0.001933] BUG: KASAN: stack-out-of-bounds in hvc_put_chars+0xdc/0x110
[0.001934] Read of size 8 at addr c23e7a90 by task swapper/0
[0.001934]
[0.001935] CPU: 0 PID: 0 Comm: swapper Not tainted 
5.2.0-rc2-next-20190528-02824-g048a6ab4835b #113
[0.001935] Call Trace:
[0.001936] [c23e7790] [c1b4a450] dump_stack+0x104/0x154 
(unreliable)
[0.001937] [c23e77f0] [c06d3524] 
print_address_description+0xa0/0x30c
[0.001938] [c23e7880] [c06d318c] __kasan_report+0x20c/0x224
[0.001939] [c23e7950] [c06d19d8] kasan_report+0x18/0x30
[0.001940] [c23e7970] [c06d4854] 
__asan_report_load8_noabort+0x24/0x40
[0.001941] [c23e7990] [c01511ac] hvc_put_chars+0xdc/0x110
[0.001942] [c23e7a10] [c0f81cfc] 
hvterm_raw_put_chars+0x9c/0x110
[0.001943] [c23e7a50] [c0f82634] udbg_hvc_putc+0x154/0x200
[0.001944] [c23e7b10] [c0049c90] udbg_write+0xf0/0x240
[0.001945] [c23e7b70] [c02e5d88] console_unlock+0x868/0xd30
[0.001946] [c23e7ca0] [c02e6e00] 
register_console+0x970/0xe90
[0.001947] [c23e7d80] [c1ff1928] 
register_early_udbg_console+0xf8/0x114
[0.001948] [c23e7df0] [c1ff1174] setup_arch+0x108/0x790
[0.001948] [c23e7e90] [c1fe41c8] start_kernel+0x104/0x784
[0.001949] [c23e7f90] [c000b368] 
start_here_common+0x1c/0x534
[0.001950]
[0.001950]
[0.001951] Memory state around the buggy address:
[0.001952]  c23e7980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00
[0.001952]  c23e7a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 
f1
[0.001953] >c23e7a80: f1 f1 01 f2 f2 f2 00 00 00 00 00 00 00 00 00 
00
[0.001953]  ^
[0.001954]  c23e7b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00
[0.001954]  c23e7b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00
[0.001955] 
==

Document that a 16-byte buffer is requred, and provide it in udbg.

CC: Dmitry Vyukov 
Signed-off-by: Daniel Axtens 

---

v2: avoid memcpy, push responsibility to caller.
Solution suggested by mpe.
---
 arch/powerpc/platforms/pseries/hvconsole.c |  2 +-
 drivers/tty/hvc/hvc_vio.c  | 16 +++-
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hvconsole.c 
b/arch/powerpc/platforms/pseries/hvconsole.c
index 74da18de853a..73ec15cd2708 100644
--- a/arch/powerpc/platforms/pseries/hvconsole.c
+++ b/arch/powerpc/platforms/pseries/hvconsole.c
@@ -62,7 +62,7 @@ EXPORT_SYMBOL(hvc_get_chars);
  * @vtermno: The vtermno or unit_address of the adapter from which the data
  * originated.
  * @buf: The character buffer that contains the character data to send to
- * firmware.
+ * firmware. Must be at least 16 bytes, even if count is less than 16.
  * @count: Send this number of characters.
  */
 int hvc_put_chars(uint32_t vtermno, const char *buf, int count)
diff --git a/drivers/tty/hvc/hvc_vio.c b/drivers/tty/hvc/hvc_vio.c
index 6de6d4a1a221..7af54d6ed5b8 100644
--- a/drivers/tty/hvc/hvc_vio.c
+++ b/drivers/tty/hvc/hvc_vio.c
@@ -107,6 +107,14 @@ static int hvterm_raw_get_chars(uint32_t vtermno, char 
*buf, int count)
return got;
 }
 
+/**
+ * hvterm_raw_put_chars: send characters to firmware for given vterm adapter
+ * @vtermno: The virtual terminal number.
+ * @buf: The characters to send. Because of the underlying hypercall in
+ *   hvc_put_chars(), this buffer must be at least 16 bytes long, even if
+ *   you are sending fewer chars.
+ * @count: number of chars to send.
+ */
 static int hvterm_raw_put_chars(uint32_t vtermno, const char *buf, int count)
 {
struct hvterm_priv *pv = hvterm_privs[vtermno];
@@ -219,6 +227,7 @@ static const struct hv_ops hvterm_hvsi_ops = {
 static void udbg_hvc_putc(char c)
 {
int count = -1;
+   unsigned char bounce_buffer[16];
 
if (!hvterm_privs[0])
return;
@@ -229,7 +238,12 @@ static void udbg_hvc_putc(char c)
do {
switch(hvterm_privs[0]->proto) {
case HV_PROTOCOL_RAW:
-   count = hvterm_raw_put_chars(0, , 1);
+   

[RFC PATCH] powerpc/64s: __find_linux_pte synchronization vs pmdp_invalidate

2019-06-03 Thread Nicholas Piggin
The pmd_none check does not catch hugepage collapse, nor does the
pmd_present check in pmd_trans_huge, because hugepage collapse sets
!_PAGE_PRESENT && _PAGE_INVALID (which results in !pmd_none and
pmd_present).

Aneesh noticed we might need this check as well.

---
 arch/powerpc/mm/pgtable.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index db4a6253df92..7a702d21400a 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -372,13 +372,20 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
pdshift = PMD_SHIFT;
pmdp = pmd_offset(, ea);
pmd  = READ_ONCE(*pmdp);
-   /*
-* A hugepage collapse is captured by pmd_none, because
-* it mark the pmd none and do a hpte invalidate.
-*/
+
if (pmd_none(pmd))
return NULL;
 
+#ifdef CONFIG_PPC_BOOK3S_64
+   if (pmd_val(pmd) & (_PAGE_PRESENT|_PAGE_INVALID) == _PAGE_INVALID) {
+   /*
+* A hugepage collapse is captured by this condition, see
+* pmdp_invalidate.
+*/
+   return NULL;
+   }
+#endif
+
if (pmd_trans_huge(pmd) || pmd_devmap(pmd)) {
if (is_thp)
*is_thp = true;
-- 
2.20.1



Re: [PATCH] powerpc/64s: Fix THP PMD collapse serialisation

2019-06-03 Thread Aneesh Kumar K.V

On 6/3/19 11:35 AM, Nicholas Piggin wrote:

Commit 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion
in pte helpers") changed the actual bitwise tests in pte_access_permitted
by using pte_write() and pte_present() helpers rather than raw bitwise
testing _PAGE_WRITE and _PAGE_PRESENT bits.

The pte_present change now returns true for ptes which are !_PAGE_PRESENT
and _PAGE_INVALID, which is the combination used by pmdp_invalidate to
synchronize access from lock-free lookups. pte_access_permitted is used by
pmd_access_permitted, so allowing GUP lock free access to proceed with
such PTEs breaks this synchronisation.

This bug has been observed on HPT host, with random crashes and corruption
in guests, usually together with bad PMD messages in the host.

Fix this by adding an explicit check in pmd_access_permitted, and
documenting the condition explicitly.

The pte_write() change should be okay, and would prevent GUP from falling
back to the slow path when encountering savedwrite ptes, which matches
what x86 (that does not implement savedwrite) does.

Fixes: 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in pte 
helpers")
Cc: Aneesh Kumar K.V 
Cc: Christophe Leroy 
Signed-off-by: Nicholas Piggin 
---
  arch/powerpc/include/asm/book3s/64/pgtable.h | 19 ++-
  arch/powerpc/mm/book3s64/pgtable.c   |  3 +++
  2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 7dede2e34b70..aaa72aa1b765 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1092,7 +1092,24 @@ static inline int pmd_protnone(pmd_t pmd)
  #define pmd_access_permitted pmd_access_permitted
  static inline bool pmd_access_permitted(pmd_t pmd, bool write)
  {
-   return pte_access_permitted(pmd_pte(pmd), write);
+   pte_t pte = pmd_pte(pmd);
+   unsigned long pteval = pte_val(pte);
+
+   /*
+* pmdp_invalidate sets this combination (that is not caught by
+* !pte_present() check in pte_access_permitted), to prevent
+* lock-free lookups, as part of the serialize_against_pte_lookup()
+* synchronisation.
+*
+* This check inadvertently catches the case where the PTE's hardware
+* PRESENT bit is cleared while TLB is flushed, to work around
+* hardware TLB issues. This is suboptimal, but should not be hit
+* frequently and should be harmless.
+*/
+   if ((pteval & _PAGE_INVALID) && !(pteval & _PAGE_PRESENT))
+   return false;
+
+   return pte_access_permitted(pte, write);
  }
  



you need to do similar for other lockless page table walk like 
find_linux_pte



  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
b/arch/powerpc/mm/book3s64/pgtable.c
index 16bda049187a..ff98b663c83e 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -116,6 +116,9 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned 
long address,
/*
 * This ensures that generic code that rely on IRQ disabling
 * to prevent a parallel THP split work as expected.
+*
+* Marking the entry with _PAGE_INVALID && ~_PAGE_PRESENT requires
+* a special case check in pmd_access_permitted.
 */
serialize_against_pte_lookup(vma->vm_mm);
return __pmd(old_pmd);




-anesh



[PATCH] powerpc/64s: Fix THP PMD collapse serialisation

2019-06-03 Thread Nicholas Piggin
Commit 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion
in pte helpers") changed the actual bitwise tests in pte_access_permitted
by using pte_write() and pte_present() helpers rather than raw bitwise
testing _PAGE_WRITE and _PAGE_PRESENT bits.

The pte_present change now returns true for ptes which are !_PAGE_PRESENT
and _PAGE_INVALID, which is the combination used by pmdp_invalidate to
synchronize access from lock-free lookups. pte_access_permitted is used by
pmd_access_permitted, so allowing GUP lock free access to proceed with
such PTEs breaks this synchronisation.

This bug has been observed on HPT host, with random crashes and corruption
in guests, usually together with bad PMD messages in the host.

Fix this by adding an explicit check in pmd_access_permitted, and
documenting the condition explicitly.

The pte_write() change should be okay, and would prevent GUP from falling
back to the slow path when encountering savedwrite ptes, which matches
what x86 (that does not implement savedwrite) does.

Fixes: 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in pte 
helpers")
Cc: Aneesh Kumar K.V 
Cc: Christophe Leroy 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 19 ++-
 arch/powerpc/mm/book3s64/pgtable.c   |  3 +++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 7dede2e34b70..aaa72aa1b765 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1092,7 +1092,24 @@ static inline int pmd_protnone(pmd_t pmd)
 #define pmd_access_permitted pmd_access_permitted
 static inline bool pmd_access_permitted(pmd_t pmd, bool write)
 {
-   return pte_access_permitted(pmd_pte(pmd), write);
+   pte_t pte = pmd_pte(pmd);
+   unsigned long pteval = pte_val(pte);
+
+   /*
+* pmdp_invalidate sets this combination (that is not caught by
+* !pte_present() check in pte_access_permitted), to prevent
+* lock-free lookups, as part of the serialize_against_pte_lookup()
+* synchronisation.
+*
+* This check inadvertently catches the case where the PTE's hardware
+* PRESENT bit is cleared while TLB is flushed, to work around
+* hardware TLB issues. This is suboptimal, but should not be hit
+* frequently and should be harmless.
+*/
+   if ((pteval & _PAGE_INVALID) && !(pteval & _PAGE_PRESENT))
+   return false;
+
+   return pte_access_permitted(pte, write);
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
b/arch/powerpc/mm/book3s64/pgtable.c
index 16bda049187a..ff98b663c83e 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -116,6 +116,9 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned 
long address,
/*
 * This ensures that generic code that rely on IRQ disabling
 * to prevent a parallel THP split work as expected.
+*
+* Marking the entry with _PAGE_INVALID && ~_PAGE_PRESENT requires
+* a special case check in pmd_access_permitted.
 */
serialize_against_pte_lookup(vma->vm_mm);
return __pmd(old_pmd);
-- 
2.20.1



Re: [WIP RFC PATCH 0/6] Generic Firmware Variable Filesystem

2019-06-03 Thread Daniel Axtens
Hi Nayna,

>> As PowerNV moves towards secure boot, we need a place to put secure
>> variables. One option that has been canvassed is to make our secure
>> variables look like EFI variables. This is an early sketch of another
>> approach where we create a generic firmware variable file system,
>> fwvarfs, and an OPAL Secure Variable backend for it.
>
> Is there a need of new filesystem ? I am wondering why can't these be 
> exposed via sysfs / securityfs ?
> Probably, something like... /sys/firmware/secureboot or 
> /sys/kernel/security/secureboot/  ?

I suppose we could put secure variables in sysfs, but I'm not sure
that's what sysfs was intended for. I understand sysfs as "a
filesystem-based view of kernel objects" (from
Documentation/filesystems/configfs/configfs.txt), and I don't think a
secure variable is really a kernel object in the same way most other
things in sysfs are... but I'm open to being convinced.

securityfs seems to be reserved for LSMs, I don't think we can put
things there.

My hope with fwvarfs is to provide a generic place for firmware
variables so that we don't need to expand the list of firmware-specific
filesystems beyond efivarfs. I am also aiming to make things simple to
use so that people familiar with firmware don't also have to become
familiar with filesystem code in order to expose firmware variables to
userspace.

> Also, it sounds like this is needed only for secure firmware variables 
> and does not include
> other firmware variables which are not security relevant ? Is that 
> correct understanding ?

The primary use case at the moment - OPAL secure variables - is security
focused because the current OPAL secure variable design stores and
manipulates secure variables separately from the rest of nvram. This
isn't an inherent feature of fwvarfs.

fwvarfs can also be used for variables that are not security relevant as
well. For example, with the EFI backend (patch 3), both secure and
insecure variables can be read.

Regards,
Daniel