Re: [PATCH v7] cpufreq/pasemi: fix use-after-free in pas_cpufreq_cpu_init()

2019-07-18 Thread wen.yang99
>>> Hello Wen,
>>>
>>> Thanks for your patch!
>>>
>>> Did you test your patch with a P.A. Semi board?
>>>
>> Hello Christian, thank you.
>> We don't have a P.A. Semi board yet, so we didn't test it.
>> If you have such a board, could you please kindly help to test it?
>>
>> --
>> Thanks and regards,
>> Wen
> 
> Hello Wen,
> 
> I successfully tested your pasemi cpufreq modifications with my P.A.
> Semi board [1] today.
> 
> First I patched the latest Git kernel with Viresh Kumar's patch [2].
> After that I was able to patch the latest Git kernel with your v7 patch [3].
> 
> Then the kernel compiled without any errors.
> 
> Afterwards I successfully tested the new Git kernel with some cpufreq
> governors on openSUSE Tumbleweed 20190521 PowerPC64 [4] and on ubuntu
> MATE 16.04.6 LTS PowerPC32.
> 
> Thanks a lot for your work!
> 
> Tested-by: Christian Zigotzky 
> 
> Cheers,
> Christian

Thank you very much!
--
Cheers,
Wen

> [1] https://en.wikipedia.org/wiki/AmigaOne_X1000
> [2]
> https://lore.kernel.org/lkml/ee8cf5fb4b4a01fdf9199037ff6d835b935cfd13.1562902877.git.viresh.ku...@linaro.org/#Z30drivers:cpufreq:pasemi-cpufreq.c
> [3] https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-July/193735.html
> [4] Screenshots:
> https://i.pinimg.com/originals/37/66/93/37669306cbc909a9d79270a849d18aa6.png
> and
> https://i.pinimg.com/originals/fe/f8/bf/fef8bfc90d95b5ae9cf31e175e8ba2da.png

Re: [PATCH v2 1/3] tools/perf: Move kvm-stat header file from conditional inclusion to common include section

2019-07-18 Thread Ravi Bangoria


LGTM. For the series,

Reviewed-By: Ravi Bangoria 



[PATCH] powerpc/tm: Fix oops on sigreturn on systems without TM

2019-07-18 Thread Michael Neuling
On systems like P9 powernv where we have no TM (or P8 booted with
ppc_tm=off), userspace can construct a signal context which still has
the MSR TS bits set. The kernel tries to restore this context, which
results in the following crash:

[   74.980557] Unexpected TM Bad Thing exception at c00022fc (msr 0x800102a03031) tm_scratch=80020280f033
[   74.980741] Oops: Unrecoverable exception, sig: 6 [#1]
[   74.980820] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[   74.980917] Modules linked in:
[   74.980980] CPU: 0 PID: 1636 Comm: sigfuz Not tainted 5.2.0-11043-g0a8ad0ffa4 #69
[   74.981096] NIP:  c00022fc LR: 7fffb2d67e48 CTR: 
[   74.981212] REGS: c0003fffbd70 TRAP: 0700   Not tainted  (5.2.0-11045-g7142b497d8)
[   74.981325] MSR:  800102a03031   CR: 42004242  XER: 
[   74.981463] CFAR: c00022e0 IRQMASK: 0
[   74.981463] GPR00: 0072 7fffb2b6e560 7fffb2d87f00 0669
[   74.981463] GPR04: 7fffb2b6e728   7fffb2b6f2a8
[   74.981463] GPR08:    
[   74.981463] GPR12:  7fffb2b76900  
[   74.981463] GPR16: 7fffb237 7fffb2d84390 7fffea3a15ac 01000a250420
[   74.981463] GPR20: 7fffb2b6f260 10001770  
[   74.981463] GPR24: 7fffb2d843a0 7fffea3a14a0 0001 0080
[   74.981463] GPR28: 7fffea3a14d8 003d0f00  7fffb2b6e728
[   74.982420] NIP [c00022fc] rfi_flush_fallback+0x7c/0x80
[   74.982517] LR [7fffb2d67e48] 0x7fffb2d67e48
[   74.982593] Call Trace:
[   74.982632] Instruction dump:
[   74.982691] e96a0220 e96a02a8 e96a0330 e96a03b8 394a0400 4200ffdc 7d2903a6 e92d0c00
[   74.982809] e94d0c08 e96d0c10 e82d0c18 7db242a6 <4c24> 7db243a6 7db142a6 f82d0c18

The problem is that the signal code assumes TM is enabled whenever
CONFIG_PPC_TRANSACTIONAL_MEM is on. This may not be the case, as on
P9 powernv, or on P8 booted with `ppc_tm=off`.

This means any local user can crash the system.

Fix the problem by returning a bad stack frame to the user if they try
to set the MSR TS bits with sigreturn() on systems where TM is not
supported.
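
For illustration, a minimal userspace sketch of the guard being added
(the MSR_TS_* values are copied from arch/powerpc/include/asm/reg.h;
this is a hedged approximation for readers, not kernel code):

#include <stdint.h>
#include <stdio.h>

#define MSR_TS_S	(1ULL << 33)	/* transaction suspended */
#define MSR_TS_T	(1ULL << 34)	/* transaction transactional */
#define MSR_TS_MASK	(MSR_TS_S | MSR_TS_T)
#define MSR_TM_ACTIVE(x) (((x) & MSR_TS_MASK) != 0)

int main(void)
{
	uint64_t frame_msr = MSR_TS_S;	/* MSR from a crafted sigframe */
	int cpu_has_tm = 0;		/* P9 powernv, or P8 with ppc_tm=off */

	if (MSR_TM_ACTIVE(frame_msr) && !cpu_has_tm)
		puts("sigreturn: reject the frame instead of recheckpointing");
	return 0;
}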

Found with sigfuz kernel selftest on P9.

This fixes CVE-2019-13648.

Fixes: 2b0a576d15 ("powerpc: Add new transactional memory state to the signal context")
Cc: sta...@vger.kernel.org # v3.9
Reported-by: Praveen Pandey 
Signed-off-by: Michael Neuling 
---
 arch/powerpc/kernel/signal_32.c | 3 +++
 arch/powerpc/kernel/signal_64.c | 5 +++++
 2 files changed, 8 insertions(+)

diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index f50b708d6d..98600b276f 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -1198,6 +1198,9 @@ SYSCALL_DEFINE0(rt_sigreturn)
goto bad;
 
if (MSR_TM_ACTIVE(msr_hi<<32)) {
+   /* Trying to start TM on non TM system */
+   if (!cpu_has_feature(CPU_FTR_TM))
+   goto bad;
/* We only recheckpoint on return if we're
 * transaction.
 */
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 2f80e270c7..117515564e 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -771,6 +771,11 @@ SYSCALL_DEFINE0(rt_sigreturn)
if (MSR_TM_ACTIVE(msr)) {
/* We recheckpoint on return. */
struct ucontext __user *uc_transact;
+
+   /* Trying to start TM on non TM system */
+   if (!cpu_has_feature(CPU_FTR_TM))
+   goto badframe;
+
		if (__get_user(uc_transact, &uc->uc_link))
			goto badframe;
		if (restore_tm_sigcontexts(current, &uc->uc_mcontext,
-- 
2.21.0



[PATCH AUTOSEL 4.4 30/35] powerpc/eeh: Handle hugepages in ioremap space

2019-07-18 Thread Sasha Levin
From: Oliver O'Halloran 

[ Upstream commit 33439620680be5225c1b8806579a291e0d761ca0 ]

In commit d909f9109c30 ("powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP")
support for using hugepages in the vmalloc and ioremap areas was
enabled for radix. Unfortunately this broke EEH MMIO error checking.

Detection works by inserting a hook which checks the results of the
ioreadXX() set of functions.  When a read returns a 0xFFs response we
need to check for an error, which we do by mapping the (virtual) MMIO
address back to a physical address, then mapping the physical address
to a PCI device via an interval tree.

When translating virt -> phys we currently assume the ioremap space is
only populated by PAGE_SIZE mappings. If a hugepage mapping is found we
emit a WARN_ON(), but otherwise handle the check as though a normal
page was found. In pathological cases, such as copying a buffer
containing a lot of 0xFFs from BAR memory, this can result in the system
not booting because it's too busy printing WARN_ON()s.

There's no real reason to assume huge pages can't be present, and we're
perfectly capable of handling them, so do that.
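
To make the new arithmetic concrete, here is a hedged standalone sketch
mirroring the hunk below; all values are invented, and a hugepage_shift
of 21 stands for a 2MB radix mapping:

#include <stdio.h>

int main(void)
{
	unsigned long pfn = 0x8000;	/* stand-in for pte_pfn(*ptep) */
	unsigned long token = 0xc008000001234568UL; /* invented MMIO address */
	unsigned int hugepage_shift = 21;	/* 2MB mapping on radix */
	unsigned long pa = pfn;

	if (hugepage_shift) {
		/* keep the offset within the whole huge page... */
		pa <<= hugepage_shift;
		pa |= token & ((1ul << hugepage_shift) - 1);
	} else {
		/* ...instead of only a PAGE_SIZE offset as before */
		pa <<= 16;	/* PAGE_SHIFT on a 64K-page kernel */
		pa |= token & ((1ul << 16) - 1);
	}
	printf("pa = 0x%lx\n", pa);	/* prints pa = 0x1000034568 */
	return 0;
}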

Fixes: d909f9109c30 ("powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP")
Reported-by: Sachin Sant 
Signed-off-by: Oliver O'Halloran 
Tested-by: Sachin Sant 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20190710150517.27114-1-ooh...@gmail.com
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kernel/eeh.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 6696c1986844..16193d7b0635 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -363,10 +363,19 @@ static inline unsigned long eeh_token_to_phys(unsigned long token)
 					   NULL, &hugepage_shift);
if (!ptep)
return token;
-   WARN_ON(hugepage_shift);
-   pa = pte_pfn(*ptep) << PAGE_SHIFT;
 
-   return pa | (token & (PAGE_SIZE-1));
+   pa = pte_pfn(*ptep);
+
+   /* On radix we can do hugepage mappings for io, so handle that */
+   if (hugepage_shift) {
+   pa <<= hugepage_shift;
+   pa |= token & ((1ul << hugepage_shift) - 1);
+   } else {
+   pa <<= PAGE_SHIFT;
+   pa |= token & (PAGE_SIZE - 1);
+   }
+
+   return pa;
 }
 
 /*
-- 
2.20.1



[PATCH AUTOSEL 4.4 23/35] powerpc/4xx/uic: clear pending interrupt after irq type/pol change

2019-07-18 Thread Sasha Levin
From: Christian Lamparter 

[ Upstream commit 3ab3a0689e74e6aa5b41360bc18861040ddef5b1 ]

When testing out gpio-keys with a button, a spurious
interrupt (and therefore a key press or release event)
gets triggered as soon as the driver enables the irq
line for the first time.

This patch clears any potentially bogus interrupt that was generated
by the switching of the associated irq's type and polarity.
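
For reference, a hedged sketch of the mask arithmetic behind the
one-line fix: uic_set_irq_type() computes mask = ~(1 << (31 - src)), so
~mask is the single status bit for this source, and UIC_SR is
write-one-to-clear, meaning the write acks only the possibly-spurious
interrupt latched while the type/polarity registers were rewritten
(source number invented for illustration):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	unsigned int src = 5;			/* invented UIC source */
	uint32_t mask = ~(1u << (31 - src));	/* as in uic_set_irq_type() */

	/* after reprogramming UIC_PR/UIC_TR, the fix writes ~mask to UIC_SR */
	printf("UIC_SR write: 0x%08x\n", (uint32_t)~mask); /* 0x04000000 */
	return 0;
}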

Signed-off-by: Christian Lamparter 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/sysdev/uic.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/sysdev/uic.c b/arch/powerpc/sysdev/uic.c
index 6893d8f236df..225346dda151 100644
--- a/arch/powerpc/sysdev/uic.c
+++ b/arch/powerpc/sysdev/uic.c
@@ -158,6 +158,7 @@ static int uic_set_irq_type(struct irq_data *d, unsigned int flow_type)
 
mtdcr(uic->dcrbase + UIC_PR, pr);
mtdcr(uic->dcrbase + UIC_TR, tr);
+   mtdcr(uic->dcrbase + UIC_SR, ~mask);
 
	raw_spin_unlock_irqrestore(&uic->lock, flags);
 
-- 
2.20.1



[PATCH AUTOSEL 4.4 15/35] powerpc/pci/of: Fix OF flags parsing for 64bit BARs

2019-07-18 Thread Sasha Levin
From: Alexey Kardashevskiy 

[ Upstream commit df5be5be8735ef2ae80d5ae1f2453cd81a035c4b ]

When the firmware does PCI BAR resource allocation, it passes the assigned
addresses and flags (prefetch/64bit/...) via the "reg" property of
a PCI device's device tree node, so the kernel does not need to do
resource allocation.

The flags are stored in resource::flags - the lower byte stores
PCI_BASE_ADDRESS_SPACE/etc bits and the other bytes are IORESOURCE_IO/etc.
Some flags from PCI_BASE_ADDRESS_xxx and IORESOURCE_xxx are duplicated,
such as PCI_BASE_ADDRESS_MEM_PREFETCH/PCI_BASE_ADDRESS_MEM_TYPE_64/etc.
When parsing the "reg" property, we copy the prefetch flag but we skip
on PCI_BASE_ADDRESS_MEM_TYPE_64 which leaves the flags out of sync.

The missing IORESOURCE_MEM_64 flag comes into play under 2 conditions:
1. we remove PCI_PROBE_ONLY for pseries (by hacking pSeries_setup_arch()
or by passing "/chosen/linux,pci-probe-only");
2. we request resource alignment (by passing pci=resource_alignment=
via the kernel cmd line to request PAGE_SIZE alignment or defining
ppc_md.pcibios_default_alignment which returns anything but 0). Note that
the alignment requests are ignored if PCI_PROBE_ONLY is enabled.

With 1) and 2), the generic PCI code in the kernel unconditionally
decides to:
- reassign the BARs in pci_specified_resource_alignment() (works fine)
- write new BARs to the device - this fails for 64bit BARs as the generic
code looks at IORESOURCE_MEM_64 (not set) and writes only the lower 32 bits
of the BAR, leaving the upper 32 bits unmodified, which breaks BAR mapping
in the hypervisor.

This fixes the issue by copying the flag. This is useful if we want to
enforce certain BAR alignment per platform, as handling subpage-sized BARs
has proven to cause problems with hotplug (SLOF already aligns BARs to 64k).
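
For readers unfamiliar with the OF PCI binding, here is a hedged
standalone decoder of the phys.hi cell ("addr0") that pci_parse_of_flags()
parses; the constants mirror the kernel's PCI_BASE_ADDRESS_* and
IORESOURCE_* values and the addr0 value is invented:

#include <stdio.h>
#include <stdint.h>

/* phys.hi layout per the OF PCI bus binding: npt000ss bbbbbbbb dddddfff rrrrrrrr */
#define IORESOURCE_MEM			0x00000200
#define IORESOURCE_PREFETCH		0x00002000
#define IORESOURCE_MEM_64		0x00100000
#define PCI_BASE_ADDRESS_MEM_TYPE_64	0x04

int main(void)
{
	uint32_t addr0 = 0x43000010;	/* prefetchable 64-bit memory BAR 0x10 */
	unsigned int flags = 0;

	if (addr0 & 0x02000000) {	/* high 'ss' bit: memory space */
		flags |= IORESOURCE_MEM;
		flags |= (addr0 >> 22) & PCI_BASE_ADDRESS_MEM_TYPE_64; /* low 'ss' bit */
		if (flags & PCI_BASE_ADDRESS_MEM_TYPE_64)
			flags |= IORESOURCE_MEM_64;	/* the fix: keep flags in sync */
		if (addr0 & 0x40000000)		/* 'p' bit */
			flags |= IORESOURCE_PREFETCH;
	}
	printf("flags = 0x%x\n", flags);	/* prints flags = 0x102204 */
	return 0;
}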

Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: Sam Bobroff 
Reviewed-by: Oliver O'Halloran 
Reviewed-by: Shawn Anastasio 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kernel/pci_of_scan.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
index 2e710c15893f..a38d7293460d 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -45,6 +45,8 @@ static unsigned int pci_parse_of_flags(u32 addr0, int bridge)
	if (addr0 & 0x02000000) {
flags = IORESOURCE_MEM | PCI_BASE_ADDRESS_SPACE_MEMORY;
flags |= (addr0 >> 22) & PCI_BASE_ADDRESS_MEM_TYPE_64;
+   if (flags & PCI_BASE_ADDRESS_MEM_TYPE_64)
+   flags |= IORESOURCE_MEM_64;
flags |= (addr0 >> 28) & PCI_BASE_ADDRESS_MEM_TYPE_1M;
	if (addr0 & 0x40000000)
		flags |= IORESOURCE_PREFETCH
			 | PCI_BASE_ADDRESS_MEM_PREFETCH;
-- 
2.20.1



[PATCH AUTOSEL 4.4 13/35] powerpc/pseries/mobility: rebuild cacheinfo hierarchy post-migration

2019-07-18 Thread Sasha Levin
From: Nathan Lynch 

[ Upstream commit e610a466d16a086e321f0bd421e2fc75cff28605 ]

It's common for the platform to replace the cache device nodes after a
migration. Since the cacheinfo code is never informed about this, it
never drops its references to the source system's cache nodes, causing
it to wind up in an inconsistent state resulting in warnings and oopses
as soon as CPU online/offline occurs after the migration, e.g.

  cache for /cpus/l3-cache@3113(Unified) refers to cache for /cpus/l2-cache@200d(Unified)
  WARNING: CPU: 15 PID: 86 at arch/powerpc/kernel/cacheinfo.c:176 release_cache+0x1bc/0x1d0
  [...]
  NIP release_cache+0x1bc/0x1d0
  LR  release_cache+0x1b8/0x1d0
  Call Trace:
release_cache+0x1b8/0x1d0 (unreliable)
cacheinfo_cpu_offline+0x1c4/0x2c0
unregister_cpu_online+0x1b8/0x260
cpuhp_invoke_callback+0x114/0xf40
cpuhp_thread_fun+0x270/0x310
smpboot_thread_fn+0x2c8/0x390
kthread+0x1b8/0x1c0
ret_from_kernel_thread+0x5c/0x68

Using device tree notifiers won't work since we want to rebuild the
hierarchy only after all the removals and additions have occurred and
the device tree is in a consistent state. Call cacheinfo_teardown()
before processing device tree updates, and rebuild the hierarchy
afterward.

Fixes: 410bccf97881 ("powerpc/pseries: Partition migration in the kernel")
Signed-off-by: Nathan Lynch 
Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/platforms/pseries/mobility.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index fcd1a32267c4..e85767c74e81 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -22,6 +22,7 @@
 #include <asm/machdep.h>
 #include <asm/rtas.h>
 #include "pseries.h"
+#include "../../kernel/cacheinfo.h"
 
 static struct kobject *mobility_kobj;
 
@@ -316,11 +317,20 @@ void post_mobility_fixup(void)
 */
cpus_read_lock();
 
+   /*
+* It's common for the destination firmware to replace cache
+* nodes.  Release all of the cacheinfo hierarchy's references
+* before updating the device tree.
+*/
+   cacheinfo_teardown();
+
rc = pseries_devicetree_update(MIGRATION_SCOPE);
if (rc)
printk(KERN_ERR "Post-mobility device tree update "
"failed: %d\n", rc);
 
+   cacheinfo_rebuild();
+
cpus_read_unlock();
 
/* Possibly switch to a new RFI flush type */
-- 
2.20.1



[PATCH AUTOSEL 4.4 12/35] powerpc/pseries/mobility: prevent cpu hotplug during DT update

2019-07-18 Thread Sasha Levin
From: Nathan Lynch 

[ Upstream commit e59a175faa8df9d674247946f2a5a9c29c835725 ]

CPU online/offline code paths are sensitive to parts of the device
tree (various cpu node properties, cache nodes) that can be changed as
a result of a migration.

Prevent CPU hotplug while the device tree is potentially inconsistent.

Fixes: 410bccf97881 ("powerpc/pseries: Partition migration in the kernel")
Signed-off-by: Nathan Lynch 
Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/platforms/pseries/mobility.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index c773396d0969..fcd1a32267c4 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -9,6 +9,7 @@
  * 2 as published by the Free Software Foundation.
  */
 
+#include <linux/cpu.h>
 #include <linux/kernel.h>
 #include <linux/kobject.h>
 #include <linux/smp.h>
@@ -309,11 +310,19 @@ void post_mobility_fixup(void)
if (rc)
printk(KERN_ERR "Post-mobility activate-fw failed: %d\n", rc);
 
+   /*
+* We don't want CPUs to go online/offline while the device
+* tree is being updated.
+*/
+   cpus_read_lock();
+
rc = pseries_devicetree_update(MIGRATION_SCOPE);
if (rc)
printk(KERN_ERR "Post-mobility device tree update "
"failed: %d\n", rc);
 
+   cpus_read_unlock();
+
/* Possibly switch to a new RFI flush type */
pseries_setup_rfi_flush();
 
-- 
2.20.1



[PATCH AUTOSEL 4.9 40/45] powerpc/eeh: Handle hugepages in ioremap space

2019-07-18 Thread Sasha Levin
From: Oliver O'Halloran 

[ Upstream commit 33439620680be5225c1b8806579a291e0d761ca0 ]

In commit d909f9109c30 ("powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP")
support for using hugepages in the vmalloc and ioremap areas was
enabled for radix. Unfortunately this broke EEH MMIO error checking.

Detection works by inserting a hook which checks the results of the
ioreadXX() set of functions.  When a read returns a 0xFFs response we
need to check for an error, which we do by mapping the (virtual) MMIO
address back to a physical address, then mapping the physical address
to a PCI device via an interval tree.

When translating virt -> phys we currently assume the ioremap space is
only populated by PAGE_SIZE mappings. If a hugepage mapping is found we
emit a WARN_ON(), but otherwise handle the check as though a normal
page was found. In pathological cases, such as copying a buffer
containing a lot of 0xFFs from BAR memory, this can result in the system
not booting because it's too busy printing WARN_ON()s.

There's no real reason to assume huge pages can't be present, and we're
perfectly capable of handling them, so do that.

Fixes: d909f9109c30 ("powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP")
Reported-by: Sachin Sant 
Signed-off-by: Oliver O'Halloran 
Tested-by: Sachin Sant 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20190710150517.27114-1-ooh...@gmail.com
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kernel/eeh.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 8336b9016ca9..a7f229e59892 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -362,10 +362,19 @@ static inline unsigned long eeh_token_to_phys(unsigned long token)
 					   NULL, &hugepage_shift);
if (!ptep)
return token;
-   WARN_ON(hugepage_shift);
-   pa = pte_pfn(*ptep) << PAGE_SHIFT;
 
-   return pa | (token & (PAGE_SIZE-1));
+   pa = pte_pfn(*ptep);
+
+   /* On radix we can do hugepage mappings for io, so handle that */
+   if (hugepage_shift) {
+   pa <<= hugepage_shift;
+   pa |= token & ((1ul << hugepage_shift) - 1);
+   } else {
+   pa <<= PAGE_SHIFT;
+   pa |= token & (PAGE_SIZE - 1);
+   }
+
+   return pa;
 }
 
 /*
-- 
2.20.1



[PATCH AUTOSEL 4.9 37/45] powerpc/boot: add {get, put}_unaligned_be32 to xz_config.h

2019-07-18 Thread Sasha Levin
From: Masahiro Yamada 

[ Upstream commit 9e005b761e7ad153dcf40a6cba1d681fe0830ac6 ]

The next commit will make the way of passing CONFIG options more robust.
Unfortunately, it would uncover another hidden issue; without this
commit, skiroot_defconfig would be broken like this:

|   WRAP    arch/powerpc/boot/zImage.pseries
| arch/powerpc/boot/wrapper.a(decompress.o): In function `bcj_powerpc.isra.10':
| decompress.c:(.text+0x720): undefined reference to `get_unaligned_be32'
| decompress.c:(.text+0x7a8): undefined reference to `put_unaligned_be32'
| make[1]: *** [arch/powerpc/boot/Makefile:383: arch/powerpc/boot/zImage.pseries] Error 1
| make: *** [arch/powerpc/Makefile:295: zImage] Error 2

skiroot_defconfig is the only defconfig that enables CONFIG_KERNEL_XZ
for ppc, which has never been correctly built before.

I figured out the root cause in lib/decompress_unxz.c:

| #ifdef CONFIG_PPC
| #  define XZ_DEC_POWERPC
| #endif

CONFIG_PPC is undefined here in the ppc bootwrapper because autoconf.h
is not included except by arch/powerpc/boot/serial.c.

XZ_DEC_POWERPC is not defined, therefore, bcj_powerpc() is not compiled
for the bootwrapper.

With the next commit passing CONFIG_PPC correctly, we would realize that
{get,put}_unaligned_be32 was missing.

Unlike the other decompressors, the ppc bootwrapper duplicates all the
necessary helpers in arch/powerpc/boot/.

The other architectures define __KERNEL__ and pull in helpers for
building the decompressors.

If the ppc bootwrapper had defined __KERNEL__, lib/xz/xz_private.h would
have included <asm/unaligned.h>:

| #ifdef __KERNEL__
| #   include <linux/xz.h>
| #   include <linux/kernel.h>
| #   include <asm/unaligned.h>

However, doing so would cause tons of definition conflicts since the
bootwrapper has duplicated everything.

I just added copies of {get,put}_unaligned_be32, following the
bootwrapper coding convention.
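
As a hedged illustration of what these helpers do, here is a portable
userspace demo; note the bootwrapper's own versions use direct 32-bit
loads/stores (fine on PowerPC), while this sketch uses byte accesses so
it is alignment- and endian-independent everywhere:

#include <stdint.h>
#include <stdio.h>

static uint32_t demo_get_unaligned_be32(const void *p)
{
	const uint8_t *b = p;
	return ((uint32_t)b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3];
}

static void demo_put_unaligned_be32(uint32_t val, void *p)
{
	uint8_t *b = p;
	b[0] = val >> 24;
	b[1] = val >> 16;
	b[2] = val >> 8;
	b[3] = val;
}

int main(void)
{
	uint8_t buf[7] = {0};

	demo_put_unaligned_be32(0x11223344, buf + 1);	/* odd offset: unaligned */
	printf("0x%08x\n", demo_get_unaligned_be32(buf + 1)); /* 0x11223344 */
	return 0;
}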

Signed-off-by: Masahiro Yamada 
Signed-off-by: Michael Ellerman 
Link: 
https://lore.kernel.org/r/20190705100144.28785-1-yamada.masah...@socionext.com
Signed-off-by: Sasha Levin 
---
 arch/powerpc/boot/xz_config.h | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/powerpc/boot/xz_config.h b/arch/powerpc/boot/xz_config.h
index 5c6afdbca642..21b52c15aafc 100644
--- a/arch/powerpc/boot/xz_config.h
+++ b/arch/powerpc/boot/xz_config.h
@@ -19,10 +19,30 @@ static inline uint32_t swab32p(void *p)
 
 #ifdef __LITTLE_ENDIAN__
 #define get_le32(p) (*((uint32_t *) (p)))
+#define cpu_to_be32(x) swab32(x)
+static inline u32 be32_to_cpup(const u32 *p)
+{
+   return swab32p((u32 *)p);
+}
 #else
 #define get_le32(p) swab32p(p)
+#define cpu_to_be32(x) (x)
+static inline u32 be32_to_cpup(const u32 *p)
+{
+   return *p;
+}
 #endif
 
+static inline uint32_t get_unaligned_be32(const void *p)
+{
+   return be32_to_cpup(p);
+}
+
+static inline void put_unaligned_be32(u32 val, void *p)
+{
+   *((u32 *)p) = cpu_to_be32(val);
+}
+
 #define memeq(a, b, size) (memcmp(a, b, size) == 0)
 #define memzero(buf, size) memset(buf, 0, size)
 
-- 
2.20.1



[PATCH AUTOSEL 4.9 29/45] powerpc/4xx/uic: clear pending interrupt after irq type/pol change

2019-07-18 Thread Sasha Levin
From: Christian Lamparter 

[ Upstream commit 3ab3a0689e74e6aa5b41360bc18861040ddef5b1 ]

When testing out gpio-keys with a button, a spurious
interrupt (and therefore a key press or release event)
gets triggered as soon as the driver enables the irq
line for the first time.

This patch clears any potentially bogus interrupt that was generated
by the switching of the associated irq's type and polarity.

Signed-off-by: Christian Lamparter 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/sysdev/uic.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/sysdev/uic.c b/arch/powerpc/sysdev/uic.c
index a00949f3e378..a8ebc96dfed2 100644
--- a/arch/powerpc/sysdev/uic.c
+++ b/arch/powerpc/sysdev/uic.c
@@ -158,6 +158,7 @@ static int uic_set_irq_type(struct irq_data *d, unsigned int flow_type)
 
mtdcr(uic->dcrbase + UIC_PR, pr);
mtdcr(uic->dcrbase + UIC_TR, tr);
+   mtdcr(uic->dcrbase + UIC_SR, ~mask);
 
	raw_spin_unlock_irqrestore(&uic->lock, flags);
 
-- 
2.20.1



[PATCH AUTOSEL 4.9 18/45] powerpc/pci/of: Fix OF flags parsing for 64bit BARs

2019-07-18 Thread Sasha Levin
From: Alexey Kardashevskiy 

[ Upstream commit df5be5be8735ef2ae80d5ae1f2453cd81a035c4b ]

When the firmware does PCI BAR resource allocation, it passes the assigned
addresses and flags (prefetch/64bit/...) via the "reg" property of
a PCI device's device tree node, so the kernel does not need to do
resource allocation.

The flags are stored in resource::flags - the lower byte stores
PCI_BASE_ADDRESS_SPACE/etc bits and the other bytes are IORESOURCE_IO/etc.
Some flags from PCI_BASE_ADDRESS_xxx and IORESOURCE_xxx are duplicated,
such as PCI_BASE_ADDRESS_MEM_PREFETCH/PCI_BASE_ADDRESS_MEM_TYPE_64/etc.
When parsing the "reg" property, we copy the prefetch flag but we skip
on PCI_BASE_ADDRESS_MEM_TYPE_64 which leaves the flags out of sync.

The missing IORESOURCE_MEM_64 flag comes into play under 2 conditions:
1. we remove PCI_PROBE_ONLY for pseries (by hacking pSeries_setup_arch()
or by passing "/chosen/linux,pci-probe-only");
2. we request resource alignment (by passing pci=resource_alignment=
via the kernel cmd line to request PAGE_SIZE alignment or defining
ppc_md.pcibios_default_alignment which returns anything but 0). Note that
the alignment requests are ignored if PCI_PROBE_ONLY is enabled.

With 1) and 2), the generic PCI code in the kernel unconditionally
decides to:
- reassign the BARs in pci_specified_resource_alignment() (works fine)
- write new BARs to the device - this fails for 64bit BARs as the generic
code looks at IORESOURCE_MEM_64 (not set) and writes only the lower 32 bits
of the BAR, leaving the upper 32 bits unmodified, which breaks BAR mapping
in the hypervisor.

This fixes the issue by copying the flag. This is useful if we want to
enforce certain BAR alignment per platform, as handling subpage-sized BARs
has proven to cause problems with hotplug (SLOF already aligns BARs to 64k).

Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: Sam Bobroff 
Reviewed-by: Oliver O'Halloran 
Reviewed-by: Shawn Anastasio 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kernel/pci_of_scan.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
index ea3d98115b88..e0648a09d9c8 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -45,6 +45,8 @@ static unsigned int pci_parse_of_flags(u32 addr0, int bridge)
	if (addr0 & 0x02000000) {
flags = IORESOURCE_MEM | PCI_BASE_ADDRESS_SPACE_MEMORY;
flags |= (addr0 >> 22) & PCI_BASE_ADDRESS_MEM_TYPE_64;
+   if (flags & PCI_BASE_ADDRESS_MEM_TYPE_64)
+   flags |= IORESOURCE_MEM_64;
flags |= (addr0 >> 28) & PCI_BASE_ADDRESS_MEM_TYPE_1M;
	if (addr0 & 0x40000000)
		flags |= IORESOURCE_PREFETCH
			 | PCI_BASE_ADDRESS_MEM_PREFETCH;
-- 
2.20.1



[PATCH AUTOSEL 4.14 51/60] powerpc/eeh: Handle hugepages in ioremap space

2019-07-18 Thread Sasha Levin
From: Oliver O'Halloran 

[ Upstream commit 33439620680be5225c1b8806579a291e0d761ca0 ]

In commit d909f9109c30 ("powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP")
support for using hugepages in the vmalloc and ioremap areas was
enabled for radix. Unfortunately this broke EEH MMIO error checking.

Detection works by inserting a hook which checks the results of the
ioreadXX() set of functions.  When a read returns a 0xFFs response we
need to check for an error, which we do by mapping the (virtual) MMIO
address back to a physical address, then mapping the physical address
to a PCI device via an interval tree.

When translating virt -> phys we currently assume the ioremap space is
only populated by PAGE_SIZE mappings. If a hugepage mapping is found we
emit a WARN_ON(), but otherwise handle the check as though a normal
page was found. In pathological cases, such as copying a buffer
containing a lot of 0xFFs from BAR memory, this can result in the system
not booting because it's too busy printing WARN_ON()s.

There's no real reason to assume huge pages can't be present, and we're
perfectly capable of handling them, so do that.

Fixes: d909f9109c30 ("powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP")
Reported-by: Sachin Sant 
Signed-off-by: Oliver O'Halloran 
Tested-by: Sachin Sant 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20190710150517.27114-1-ooh...@gmail.com
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kernel/eeh.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 45322b37669a..d2ba7936d0d3 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -361,10 +361,19 @@ static inline unsigned long eeh_token_to_phys(unsigned long token)
 	ptep = find_init_mm_pte(token, &hugepage_shift);
if (!ptep)
return token;
-   WARN_ON(hugepage_shift);
-   pa = pte_pfn(*ptep) << PAGE_SHIFT;
 
-   return pa | (token & (PAGE_SIZE-1));
+   pa = pte_pfn(*ptep);
+
+   /* On radix we can do hugepage mappings for io, so handle that */
+   if (hugepage_shift) {
+   pa <<= hugepage_shift;
+   pa |= token & ((1ul << hugepage_shift) - 1);
+   } else {
+   pa <<= PAGE_SHIFT;
+   pa |= token & (PAGE_SIZE - 1);
+   }
+
+   return pa;
 }
 
 /*
-- 
2.20.1



[PATCH AUTOSEL 4.14 48/60] powerpc/boot: add {get, put}_unaligned_be32 to xz_config.h

2019-07-18 Thread Sasha Levin
From: Masahiro Yamada 

[ Upstream commit 9e005b761e7ad153dcf40a6cba1d681fe0830ac6 ]

The next commit will make the way of passing CONFIG options more robust.
Unfortunately, it would uncover another hidden issue; without this
commit, skiroot_defconfig would be broken like this:

|   WRAP    arch/powerpc/boot/zImage.pseries
| arch/powerpc/boot/wrapper.a(decompress.o): In function `bcj_powerpc.isra.10':
| decompress.c:(.text+0x720): undefined reference to `get_unaligned_be32'
| decompress.c:(.text+0x7a8): undefined reference to `put_unaligned_be32'
| make[1]: *** [arch/powerpc/boot/Makefile:383: arch/powerpc/boot/zImage.pseries] Error 1
| make: *** [arch/powerpc/Makefile:295: zImage] Error 2

skiroot_defconfig is the only defconfig that enables CONFIG_KERNEL_XZ
for ppc, which has never been correctly built before.

I figured out the root cause in lib/decompress_unxz.c:

| #ifdef CONFIG_PPC
| #  define XZ_DEC_POWERPC
| #endif

CONFIG_PPC is undefined here in the ppc bootwrapper because autoconf.h
is not included except by arch/powerpc/boot/serial.c.

XZ_DEC_POWERPC is not defined, therefore, bcj_powerpc() is not compiled
for the bootwrapper.

With the next commit passing CONFIG_PPC correctly, we would realize that
{get,put}_unaligned_be32 was missing.

Unlike the other decompressors, the ppc bootwrapper duplicates all the
necessary helpers in arch/powerpc/boot/.

The other architectures define __KERNEL__ and pull in helpers for
building the decompressors.

If the ppc bootwrapper had defined __KERNEL__, lib/xz/xz_private.h would
have included <asm/unaligned.h>:

| #ifdef __KERNEL__
| #   include <linux/xz.h>
| #   include <linux/kernel.h>
| #   include <asm/unaligned.h>

However, doing so would cause tons of definition conflicts since the
bootwrapper has duplicated everything.

I just added copies of {get,put}_unaligned_be32, following the
bootwrapper coding convention.

Signed-off-by: Masahiro Yamada 
Signed-off-by: Michael Ellerman 
Link: 
https://lore.kernel.org/r/20190705100144.28785-1-yamada.masah...@socionext.com
Signed-off-by: Sasha Levin 
---
 arch/powerpc/boot/xz_config.h | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/powerpc/boot/xz_config.h b/arch/powerpc/boot/xz_config.h
index e22e5b3770dd..ebfadd39e192 100644
--- a/arch/powerpc/boot/xz_config.h
+++ b/arch/powerpc/boot/xz_config.h
@@ -20,10 +20,30 @@ static inline uint32_t swab32p(void *p)
 
 #ifdef __LITTLE_ENDIAN__
 #define get_le32(p) (*((uint32_t *) (p)))
+#define cpu_to_be32(x) swab32(x)
+static inline u32 be32_to_cpup(const u32 *p)
+{
+   return swab32p((u32 *)p);
+}
 #else
 #define get_le32(p) swab32p(p)
+#define cpu_to_be32(x) (x)
+static inline u32 be32_to_cpup(const u32 *p)
+{
+   return *p;
+}
 #endif
 
+static inline uint32_t get_unaligned_be32(const void *p)
+{
+   return be32_to_cpup(p);
+}
+
+static inline void put_unaligned_be32(u32 val, void *p)
+{
+   *((u32 *)p) = cpu_to_be32(val);
+}
+
 #define memeq(a, b, size) (memcmp(a, b, size) == 0)
 #define memzero(buf, size) memset(buf, 0, size)
 
-- 
2.20.1



[PATCH AUTOSEL 4.14 37/60] powerpc/4xx/uic: clear pending interrupt after irq type/pol change

2019-07-18 Thread Sasha Levin
From: Christian Lamparter 

[ Upstream commit 3ab3a0689e74e6aa5b41360bc18861040ddef5b1 ]

When testing out gpio-keys with a button, a spurious
interrupt (and therefore a key press or release event)
gets triggered as soon as the driver enables the irq
line for the first time.

This patch clears any potentially bogus interrupt that was generated
by the switching of the associated irq's type and polarity.

Signed-off-by: Christian Lamparter 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/platforms/4xx/uic.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/4xx/uic.c b/arch/powerpc/platforms/4xx/uic.c
index 8b4dd0da0839..9e27cfe27026 100644
--- a/arch/powerpc/platforms/4xx/uic.c
+++ b/arch/powerpc/platforms/4xx/uic.c
@@ -158,6 +158,7 @@ static int uic_set_irq_type(struct irq_data *d, unsigned int flow_type)
 
mtdcr(uic->dcrbase + UIC_PR, pr);
mtdcr(uic->dcrbase + UIC_TR, tr);
+   mtdcr(uic->dcrbase + UIC_SR, ~mask);
 
	raw_spin_unlock_irqrestore(&uic->lock, flags);
 
-- 
2.20.1



[PATCH AUTOSEL 4.14 31/60] powerpc/xmon: Fix disabling tracing while in xmon

2019-07-18 Thread Sasha Levin
From: "Naveen N. Rao" 

[ Upstream commit aaf06665f7ea3ee9f9754e16c1a507a89f1de5b1 ]

Commit ed49f7fd6438d ("powerpc/xmon: Disable tracing when entering
xmon") added code to disable recording trace entries while in xmon. The
commit introduced a variable 'tracing_enabled' to record if tracing was
enabled on xmon entry, and used this to conditionally enable tracing
during exit from xmon.

However, we are not checking the value of the 'fromipi' variable in
xmon_core() when setting 'tracing_enabled'. Due to this, when secondary
cpus enter xmon, they see tracing as already disabled, and tracing
won't be re-enabled on exit. Fix this.

Fixes: ed49f7fd6438d ("powerpc/xmon: Disable tracing when entering xmon")
Signed-off-by: Naveen N. Rao 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/xmon/xmon.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index f752f771f29d..6b9038a3e79f 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -465,8 +465,10 @@ static int xmon_core(struct pt_regs *regs, int fromipi)
local_irq_save(flags);
hard_irq_disable();
 
-   tracing_enabled = tracing_is_on();
-   tracing_off();
+   if (!fromipi) {
+   tracing_enabled = tracing_is_on();
+   tracing_off();
+   }
 
	bp = in_breakpoint_table(regs->nip, &offset);
if (bp != NULL) {
-- 
2.20.1



[PATCH AUTOSEL 4.14 23/60] powerpc/pci/of: Fix OF flags parsing for 64bit BARs

2019-07-18 Thread Sasha Levin
From: Alexey Kardashevskiy 

[ Upstream commit df5be5be8735ef2ae80d5ae1f2453cd81a035c4b ]

When the firmware does PCI BAR resource allocation, it passes the assigned
addresses and flags (prefetch/64bit/...) via the "reg" property of
a PCI device's device tree node, so the kernel does not need to do
resource allocation.

The flags are stored in resource::flags - the lower byte stores
PCI_BASE_ADDRESS_SPACE/etc bits and the other bytes are IORESOURCE_IO/etc.
Some flags from PCI_BASE_ADDRESS_xxx and IORESOURCE_xxx are duplicated,
such as PCI_BASE_ADDRESS_MEM_PREFETCH/PCI_BASE_ADDRESS_MEM_TYPE_64/etc.
When parsing the "reg" property, we copy the prefetch flag but we skip
on PCI_BASE_ADDRESS_MEM_TYPE_64 which leaves the flags out of sync.

The missing IORESOURCE_MEM_64 flag comes into play under 2 conditions:
1. we remove PCI_PROBE_ONLY for pseries (by hacking pSeries_setup_arch()
or by passing "/chosen/linux,pci-probe-only");
2. we request resource alignment (by passing pci=resource_alignment=
via the kernel cmd line to request PAGE_SIZE alignment or defining
ppc_md.pcibios_default_alignment which returns anything but 0). Note that
the alignment requests are ignored if PCI_PROBE_ONLY is enabled.

With 1) and 2), the generic PCI code in the kernel unconditionally
decides to:
- reassign the BARs in pci_specified_resource_alignment() (works fine)
- write new BARs to the device - this fails for 64bit BARs as the generic
code looks at IORESOURCE_MEM_64 (not set) and writes only the lower 32 bits
of the BAR, leaving the upper 32 bits unmodified, which breaks BAR mapping
in the hypervisor.

This fixes the issue by copying the flag. This is useful if we want to
enforce certain BAR alignment per platform, as handling subpage-sized BARs
has proven to cause problems with hotplug (SLOF already aligns BARs to 64k).

Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: Sam Bobroff 
Reviewed-by: Oliver O'Halloran 
Reviewed-by: Shawn Anastasio 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kernel/pci_of_scan.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
index 0d790f8432d2..6ca1b3a1e196 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -45,6 +45,8 @@ static unsigned int pci_parse_of_flags(u32 addr0, int bridge)
	if (addr0 & 0x02000000) {
flags = IORESOURCE_MEM | PCI_BASE_ADDRESS_SPACE_MEMORY;
flags |= (addr0 >> 22) & PCI_BASE_ADDRESS_MEM_TYPE_64;
+   if (flags & PCI_BASE_ADDRESS_MEM_TYPE_64)
+   flags |= IORESOURCE_MEM_64;
flags |= (addr0 >> 28) & PCI_BASE_ADDRESS_MEM_TYPE_1M;
	if (addr0 & 0x40000000)
		flags |= IORESOURCE_PREFETCH
			 | PCI_BASE_ADDRESS_MEM_PREFETCH;
-- 
2.20.1



[PATCH AUTOSEL 4.14 17/60] powerpc/pseries/mobility: prevent cpu hotplug during DT update

2019-07-18 Thread Sasha Levin
From: Nathan Lynch 

[ Upstream commit e59a175faa8df9d674247946f2a5a9c29c835725 ]

CPU online/offline code paths are sensitive to parts of the device
tree (various cpu node properties, cache nodes) that can be changed as
a result of a migration.

Prevent CPU hotplug while the device tree is potentially inconsistent.

Fixes: 410bccf97881 ("powerpc/pseries: Partition migration in the kernel")
Signed-off-by: Nathan Lynch 
Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/platforms/pseries/mobility.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index fbea7db043fa..4addc552eb33 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -9,6 +9,7 @@
  * 2 as published by the Free Software Foundation.
  */
 
+#include <linux/cpu.h>
 #include <linux/kernel.h>
 #include <linux/kobject.h>
 #include <linux/smp.h>
@@ -343,11 +344,19 @@ void post_mobility_fixup(void)
if (rc)
printk(KERN_ERR "Post-mobility activate-fw failed: %d\n", rc);
 
+   /*
+* We don't want CPUs to go online/offline while the device
+* tree is being updated.
+*/
+   cpus_read_lock();
+
rc = pseries_devicetree_update(MIGRATION_SCOPE);
if (rc)
printk(KERN_ERR "Post-mobility device tree update "
"failed: %d\n", rc);
 
+   cpus_read_unlock();
+
/* Possibly switch to a new RFI flush type */
pseries_setup_rfi_flush();
 
-- 
2.20.1



[PATCH AUTOSEL 4.19 084/101] powerpc/eeh: Handle hugepages in ioremap space

2019-07-18 Thread Sasha Levin
From: Oliver O'Halloran 

[ Upstream commit 33439620680be5225c1b8806579a291e0d761ca0 ]

In commit d909f9109c30 ("powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP")
support for using hugepages in the vmalloc and ioremap areas was
enabled for radix. Unfortunately this broke EEH MMIO error checking.

Detection works by inserting a hook which checks the results of the
ioreadXX() set of functions.  When a read returns a 0xFFs response we
need to check for an error, which we do by mapping the (virtual) MMIO
address back to a physical address, then mapping the physical address
to a PCI device via an interval tree.

When translating virt -> phys we currently assume the ioremap space is
only populated by PAGE_SIZE mappings. If a hugepage mapping is found we
emit a WARN_ON(), but otherwise handle the check as though a normal
page was found. In pathological cases, such as copying a buffer
containing a lot of 0xFFs from BAR memory, this can result in the system
not booting because it's too busy printing WARN_ON()s.

There's no real reason to assume huge pages can't be present, and we're
perfectly capable of handling them, so do that.

Fixes: d909f9109c30 ("powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP")
Reported-by: Sachin Sant 
Signed-off-by: Oliver O'Halloran 
Tested-by: Sachin Sant 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20190710150517.27114-1-ooh...@gmail.com
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kernel/eeh.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index c72767a5327a..fe3c6f3bd3b6 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -360,10 +360,19 @@ static inline unsigned long eeh_token_to_phys(unsigned long token)
 	ptep = find_init_mm_pte(token, &hugepage_shift);
if (!ptep)
return token;
-   WARN_ON(hugepage_shift);
-   pa = pte_pfn(*ptep) << PAGE_SHIFT;
 
-   return pa | (token & (PAGE_SIZE-1));
+   pa = pte_pfn(*ptep);
+
+   /* On radix we can do hugepage mappings for io, so handle that */
+   if (hugepage_shift) {
+   pa <<= hugepage_shift;
+   pa |= token & ((1ul << hugepage_shift) - 1);
+   } else {
+   pa <<= PAGE_SHIFT;
+   pa |= token & (PAGE_SIZE - 1);
+   }
+
+   return pa;
 }
 
 /*
-- 
2.20.1



[PATCH AUTOSEL 4.19 079/101] powerpc/boot: add {get, put}_unaligned_be32 to xz_config.h

2019-07-18 Thread Sasha Levin
From: Masahiro Yamada 

[ Upstream commit 9e005b761e7ad153dcf40a6cba1d681fe0830ac6 ]

The next commit will make the way of passing CONFIG options more robust.
Unfortunately, it would uncover another hidden issue; without this
commit, skiroot_defconfig would be broken like this:

|   WRAP    arch/powerpc/boot/zImage.pseries
| arch/powerpc/boot/wrapper.a(decompress.o): In function `bcj_powerpc.isra.10':
| decompress.c:(.text+0x720): undefined reference to `get_unaligned_be32'
| decompress.c:(.text+0x7a8): undefined reference to `put_unaligned_be32'
| make[1]: *** [arch/powerpc/boot/Makefile:383: arch/powerpc/boot/zImage.pseries] Error 1
| make: *** [arch/powerpc/Makefile:295: zImage] Error 2

skiroot_defconfig is the only defconfig that enables CONFIG_KERNEL_XZ
for ppc, which has never been correctly built before.

I figured out the root cause in lib/decompress_unxz.c:

| #ifdef CONFIG_PPC
| #  define XZ_DEC_POWERPC
| #endif

CONFIG_PPC is undefined here in the ppc bootwrapper because autoconf.h
is not included except by arch/powerpc/boot/serial.c.

XZ_DEC_POWERPC is not defined, therefore, bcj_powerpc() is not compiled
for the bootwrapper.

With the next commit passing CONFIG_PPC correctly, we would realize that
{get,put}_unaligned_be32 was missing.

Unlike the other decompressors, the ppc bootwrapper duplicates all the
necessary helpers in arch/powerpc/boot/.

The other architectures define __KERNEL__ and pull in helpers for
building the decompressors.

If the ppc bootwrapper had defined __KERNEL__, lib/xz/xz_private.h would
have included <asm/unaligned.h>:

| #ifdef __KERNEL__
| #   include <linux/xz.h>
| #   include <linux/kernel.h>
| #   include <asm/unaligned.h>

However, doing so would cause tons of definition conflicts since the
bootwrapper has duplicated everything.

I just added copies of {get,put}_unaligned_be32, following the
bootwrapper coding convention.

Signed-off-by: Masahiro Yamada 
Signed-off-by: Michael Ellerman 
Link: 
https://lore.kernel.org/r/20190705100144.28785-1-yamada.masah...@socionext.com
Signed-off-by: Sasha Levin 
---
 arch/powerpc/boot/xz_config.h | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/powerpc/boot/xz_config.h b/arch/powerpc/boot/xz_config.h
index e22e5b3770dd..ebfadd39e192 100644
--- a/arch/powerpc/boot/xz_config.h
+++ b/arch/powerpc/boot/xz_config.h
@@ -20,10 +20,30 @@ static inline uint32_t swab32p(void *p)
 
 #ifdef __LITTLE_ENDIAN__
 #define get_le32(p) (*((uint32_t *) (p)))
+#define cpu_to_be32(x) swab32(x)
+static inline u32 be32_to_cpup(const u32 *p)
+{
+   return swab32p((u32 *)p);
+}
 #else
 #define get_le32(p) swab32p(p)
+#define cpu_to_be32(x) (x)
+static inline u32 be32_to_cpup(const u32 *p)
+{
+   return *p;
+}
 #endif
 
+static inline uint32_t get_unaligned_be32(const void *p)
+{
+   return be32_to_cpup(p);
+}
+
+static inline void put_unaligned_be32(u32 val, void *p)
+{
+   *((u32 *)p) = cpu_to_be32(val);
+}
+
 #define memeq(a, b, size) (memcmp(a, b, size) == 0)
 #define memzero(buf, size) memset(buf, 0, size)
 
-- 
2.20.1



[PATCH AUTOSEL 4.19 062/101] powerpc/mm: Handle page table allocation failures

2019-07-18 Thread Sasha Levin
From: "Aneesh Kumar K.V" 

[ Upstream commit 2230ebf6e6dd0b7751e2921b40f6cfe34f09bb16 ]

This fixes a kernel crash that arises from not handling page table
allocation failures while allocating the hugetlb page table.

Fixes: e2b3d202d1db ("powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format")
Signed-off-by: Aneesh Kumar K.V 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/mm/hugetlbpage.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 7296a42eb62e..cef0b7ee1024 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -150,6 +150,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
} else {
pdshift = PUD_SHIFT;
pu = pud_alloc(mm, pg, addr);
+   if (!pu)
+   return NULL;
if (pshift == PUD_SHIFT)
return (pte_t *)pu;
else if (pshift > PMD_SHIFT) {
@@ -158,6 +160,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
} else {
pdshift = PMD_SHIFT;
pm = pmd_alloc(mm, pu, addr);
+   if (!pm)
+   return NULL;
if (pshift == PMD_SHIFT)
/* 16MB hugepage */
return (pte_t *)pm;
@@ -174,12 +178,16 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
} else {
pdshift = PUD_SHIFT;
pu = pud_alloc(mm, pg, addr);
+   if (!pu)
+   return NULL;
if (pshift >= PUD_SHIFT) {
ptl = pud_lockptr(mm, pu);
hpdp = (hugepd_t *)pu;
} else {
pdshift = PMD_SHIFT;
pm = pmd_alloc(mm, pu, addr);
+   if (!pm)
+   return NULL;
ptl = pmd_lockptr(mm, pm);
hpdp = (hugepd_t *)pm;
}
-- 
2.20.1



[PATCH AUTOSEL 4.19 057/101] powerpc/4xx/uic: clear pending interrupt after irq type/pol change

2019-07-18 Thread Sasha Levin
From: Christian Lamparter 

[ Upstream commit 3ab3a0689e74e6aa5b41360bc18861040ddef5b1 ]

When testing out gpio-keys with a button, a spurious
interrupt (and therefore a key press or release event)
gets triggered as soon as the driver enables the irq
line for the first time.

This patch clears any potentially bogus interrupt that was generated
by the switching of the associated irq's type and polarity.

Signed-off-by: Christian Lamparter 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/platforms/4xx/uic.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/4xx/uic.c b/arch/powerpc/platforms/4xx/uic.c
index 8b4dd0da0839..9e27cfe27026 100644
--- a/arch/powerpc/platforms/4xx/uic.c
+++ b/arch/powerpc/platforms/4xx/uic.c
@@ -158,6 +158,7 @@ static int uic_set_irq_type(struct irq_data *d, unsigned int flow_type)
 
mtdcr(uic->dcrbase + UIC_PR, pr);
mtdcr(uic->dcrbase + UIC_TR, tr);
+   mtdcr(uic->dcrbase + UIC_SR, ~mask);
 
	raw_spin_unlock_irqrestore(&uic->lock, flags);
 
-- 
2.20.1



[PATCH AUTOSEL 4.19 049/101] powerpc/xmon: Fix disabling tracing while in xmon

2019-07-18 Thread Sasha Levin
From: "Naveen N. Rao" 

[ Upstream commit aaf06665f7ea3ee9f9754e16c1a507a89f1de5b1 ]

Commit ed49f7fd6438d ("powerpc/xmon: Disable tracing when entering
xmon") added code to disable recording trace entries while in xmon. The
commit introduced a variable 'tracing_enabled' to record if tracing was
enabled on xmon entry, and used this to conditionally enable tracing
during exit from xmon.

However, we are not checking the value of the 'fromipi' variable in
xmon_core() when setting 'tracing_enabled'. Due to this, when secondary
cpus enter xmon, they see tracing as already disabled, and tracing
won't be re-enabled on exit. Fix this.

Fixes: ed49f7fd6438d ("powerpc/xmon: Disable tracing when entering xmon")
Signed-off-by: Naveen N. Rao 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/xmon/xmon.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index dd6badc31f45..74cfc1be04d6 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -466,8 +466,10 @@ static int xmon_core(struct pt_regs *regs, int fromipi)
local_irq_save(flags);
hard_irq_disable();
 
-   tracing_enabled = tracing_is_on();
-   tracing_off();
+   if (!fromipi) {
+   tracing_enabled = tracing_is_on();
+   tracing_off();
+   }
 
	bp = in_breakpoint_table(regs->nip, &offset);
if (bp != NULL) {
-- 
2.20.1



[PATCH AUTOSEL 4.19 048/101] powerpc/cacheflush: fix variable set but not used

2019-07-18 Thread Sasha Levin
From: Qian Cai 

[ Upstream commit 04db3ede40ae4fc23a5c4237254c4a53bbe4c1f2 ]

powerpc's flush_cache_vmap() is defined as a macro and never uses
both of its arguments, so it generates a compilation warning:

lib/ioremap.c: In function 'ioremap_page_range':
lib/ioremap.c:203:16: warning: variable 'start' set but not used
[-Wunused-but-set-variable]

Fix it by making it an inline function.
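
A hedged minimal reproduction of the warning (plain userspace C, built
with gcc -Wall; names are invented, mirroring the lib/ioremap.c case):

#define flush_cache_vmap_macro(start, end) do { } while (0)

static inline void flush_cache_vmap_inline(unsigned long start,
					   unsigned long end) { }

void with_macro(unsigned long addr, unsigned long end)
{
	unsigned long start;

	start = addr;				/* set... */
	flush_cache_vmap_macro(start, end);	/* ...never read: gcc warns */
}

void with_inline(unsigned long addr, unsigned long end)
{
	unsigned long start;

	start = addr;
	flush_cache_vmap_inline(start, end);	/* passing is a use: no warning */
}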

Signed-off-by: Qian Cai 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/include/asm/cacheflush.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
index d5a8d7bf0759..b189f7aee222 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -32,9 +32,12 @@
  * not expect this type of fault. flush_cache_vmap is not exactly the right
  * place to put this, but it seems to work well enough.
  */
-#define flush_cache_vmap(start, end)   do { asm volatile("ptesync" ::: "memory"); } while (0)
+static inline void flush_cache_vmap(unsigned long start, unsigned long end)
+{
+   asm volatile("ptesync" ::: "memory");
+}
 #else
-#define flush_cache_vmap(start, end)   do { } while (0)
+static inline void flush_cache_vmap(unsigned long start, unsigned long end) { }
 #endif
 
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-- 
2.20.1



[PATCH AUTOSEL 4.19 038/101] powerpc/pci/of: Fix OF flags parsing for 64bit BARs

2019-07-18 Thread Sasha Levin
From: Alexey Kardashevskiy 

[ Upstream commit df5be5be8735ef2ae80d5ae1f2453cd81a035c4b ]

When the firmware does PCI BAR resource allocation, it passes the assigned
addresses and flags (prefetch/64bit/...) via the "reg" property of
a PCI device's device tree node, so the kernel does not need to do
resource allocation.

The flags are stored in resource::flags - the lower byte stores
PCI_BASE_ADDRESS_SPACE/etc bits and the other bytes are IORESOURCE_IO/etc.
Some flags from PCI_BASE_ADDRESS_xxx and IORESOURCE_xxx are duplicated,
such as PCI_BASE_ADDRESS_MEM_PREFETCH/PCI_BASE_ADDRESS_MEM_TYPE_64/etc.
When parsing the "reg" property, we copy the prefetch flag but we skip
on PCI_BASE_ADDRESS_MEM_TYPE_64 which leaves the flags out of sync.

The missing IORESOURCE_MEM_64 flag comes into play under 2 conditions:
1. we remove PCI_PROBE_ONLY for pseries (by hacking pSeries_setup_arch()
or by passing "/chosen/linux,pci-probe-only");
2. we request resource alignment (by passing pci=resource_alignment=
via the kernel cmd line to request PAGE_SIZE alignment or defining
ppc_md.pcibios_default_alignment which returns anything but 0). Note that
the alignment requests are ignored if PCI_PROBE_ONLY is enabled.

With 1) and 2), the generic PCI code in the kernel unconditionally
decides to:
- reassign the BARs in pci_specified_resource_alignment() (works fine)
- write new BARs to the device - this fails for 64bit BARs as the generic
code looks at IORESOURCE_MEM_64 (not set) and writes only the lower 32 bits
of the BAR, leaving the upper 32 bits unmodified, which breaks BAR mapping
in the hypervisor.

This fixes the issue by copying the flag. This is useful if we want to
enforce certain BAR alignment per platform, as handling subpage-sized BARs
has proven to cause problems with hotplug (SLOF already aligns BARs to 64k).

Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: Sam Bobroff 
Reviewed-by: Oliver O'Halloran 
Reviewed-by: Shawn Anastasio 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kernel/pci_of_scan.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
index 98f04725def7..c101b321dece 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -45,6 +45,8 @@ unsigned int pci_parse_of_flags(u32 addr0, int bridge)
	if (addr0 & 0x02000000) {
flags = IORESOURCE_MEM | PCI_BASE_ADDRESS_SPACE_MEMORY;
flags |= (addr0 >> 22) & PCI_BASE_ADDRESS_MEM_TYPE_64;
+   if (flags & PCI_BASE_ADDRESS_MEM_TYPE_64)
+   flags |= IORESOURCE_MEM_64;
flags |= (addr0 >> 28) & PCI_BASE_ADDRESS_MEM_TYPE_1M;
	if (addr0 & 0x40000000)
		flags |= IORESOURCE_PREFETCH
			 | PCI_BASE_ADDRESS_MEM_PREFETCH;
-- 
2.20.1



[PATCH AUTOSEL 4.19 030/101] powerpc/pseries/mobility: prevent cpu hotplug during DT update

2019-07-18 Thread Sasha Levin
From: Nathan Lynch 

[ Upstream commit e59a175faa8df9d674247946f2a5a9c29c835725 ]

CPU online/offline code paths are sensitive to parts of the device
tree (various cpu node properties, cache nodes) that can be changed as
a result of a migration.

Prevent CPU hotplug while the device tree is potentially inconsistent.

Fixes: 410bccf97881 ("powerpc/pseries: Partition migration in the kernel")
Signed-off-by: Nathan Lynch 
Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/platforms/pseries/mobility.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index f0e30dc94988..7b60fcf04dc4 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -9,6 +9,7 @@
  * 2 as published by the Free Software Foundation.
  */
 
+#include <linux/cpu.h>
 #include <linux/kernel.h>
 #include <linux/kobject.h>
 #include <linux/smp.h>
@@ -344,11 +345,19 @@ void post_mobility_fixup(void)
if (rc)
printk(KERN_ERR "Post-mobility activate-fw failed: %d\n", rc);
 
+   /*
+* We don't want CPUs to go online/offline while the device
+* tree is being updated.
+*/
+   cpus_read_lock();
+
rc = pseries_devicetree_update(MIGRATION_SCOPE);
if (rc)
printk(KERN_ERR "Post-mobility device tree update "
"failed: %d\n", rc);
 
+   cpus_read_unlock();
+
/* Possibly switch to a new RFI flush type */
pseries_setup_rfi_flush();
 
-- 
2.20.1



[PATCH AUTOSEL 5.1 119/141] powerpc/eeh: Handle hugepages in ioremap space

2019-07-18 Thread Sasha Levin
From: Oliver O'Halloran 

[ Upstream commit 33439620680be5225c1b8806579a291e0d761ca0 ]

In commit d909f9109c30 ("powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP")
support for using hugepages in the vmalloc and ioremap areas was
enabled for radix. Unfortunately this broke EEH MMIO error checking.

Detection works by inserting a hook which checks the results of the
ioreadXX() set of functions.  When a read returns a 0xFFs response we
need to check for an error, which we do by mapping the (virtual) MMIO
address back to a physical address, then mapping the physical address
to a PCI device via an interval tree.

When translating virt -> phys we currently assume the ioremap space is
only populated by PAGE_SIZE mappings. If a hugepage mapping is found we
emit a WARN_ON(), but otherwise handle the check as though a normal
page was found. In pathological cases, such as copying a buffer
containing a lot of 0xFFs from BAR memory, this can result in the system
not booting because it's too busy printing WARN_ON()s.

There's no real reason to assume huge pages can't be present, and we're
perfectly capable of handling them, so do that.

Fixes: d909f9109c30 ("powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP")
Reported-by: Sachin Sant 
Signed-off-by: Oliver O'Halloran 
Tested-by: Sachin Sant 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20190710150517.27114-1-ooh...@gmail.com
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kernel/eeh.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 289c0b37d845..0dc1865c84ce 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -367,10 +367,19 @@ static inline unsigned long eeh_token_to_phys(unsigned long token)
 	ptep = find_init_mm_pte(token, &hugepage_shift);
if (!ptep)
return token;
-   WARN_ON(hugepage_shift);
-   pa = pte_pfn(*ptep) << PAGE_SHIFT;
 
-   return pa | (token & (PAGE_SIZE-1));
+   pa = pte_pfn(*ptep);
+
+   /* On radix we can do hugepage mappings for io, so handle that */
+   if (hugepage_shift) {
+   pa <<= hugepage_shift;
+   pa |= token & ((1ul << hugepage_shift) - 1);
+   } else {
+   pa <<= PAGE_SHIFT;
+   pa |= token & (PAGE_SIZE - 1);
+   }
+
+   return pa;
 }
 
 /*
-- 
2.20.1



[PATCH AUTOSEL 5.1 112/141] powerpc/boot: add {get, put}_unaligned_be32 to xz_config.h

2019-07-18 Thread Sasha Levin
From: Masahiro Yamada 

[ Upstream commit 9e005b761e7ad153dcf40a6cba1d681fe0830ac6 ]

The next commit will make the way of passing CONFIG options more robust.
Unfortunately, it would uncover another hidden issue; without this
commit, skiroot_defconfig would be broken like this:

|   WRAP    arch/powerpc/boot/zImage.pseries
| arch/powerpc/boot/wrapper.a(decompress.o): In function `bcj_powerpc.isra.10':
| decompress.c:(.text+0x720): undefined reference to `get_unaligned_be32'
| decompress.c:(.text+0x7a8): undefined reference to `put_unaligned_be32'
| make[1]: *** [arch/powerpc/boot/Makefile:383: arch/powerpc/boot/zImage.pseries] Error 1
| make: *** [arch/powerpc/Makefile:295: zImage] Error 2

skiroot_defconfig is the only defconfig that enables CONFIG_KERNEL_XZ
for ppc, which has never been correctly built before.

I figured out the root cause in lib/decompress_unxz.c:

| #ifdef CONFIG_PPC
| #  define XZ_DEC_POWERPC
| #endif

CONFIG_PPC is undefined here in the ppc bootwrapper because autoconf.h
is not included except by arch/powerpc/boot/serial.c.

XZ_DEC_POWERPC is not defined, therefore, bcj_powerpc() is not compiled
for the bootwrapper.

With the next commit passing CONFIG_PPC correctly, we would realize that
{get,put}_unaligned_be32 was missing.

Unlike the other decompressors, the ppc bootwrapper duplicates all the
necessary helpers in arch/powerpc/boot/.

The other architectures define __KERNEL__ and pull in helpers for
building the decompressors.

If ppc bootwrapper had defined __KERNEL__, lib/xz/xz_private.h would
have included <asm/unaligned.h>:

| #ifdef __KERNEL__
| #   include <linux/xz.h>
| #   include <linux/kernel.h>
| #   include <asm/unaligned.h>

However, doing so would cause tons of definition conflicts since the
bootwrapper has duplicated everything.

I just added copies of {get,put}_unaligned_be32, following the
bootwrapper coding convention.

Signed-off-by: Masahiro Yamada 
Signed-off-by: Michael Ellerman 
Link: 
https://lore.kernel.org/r/20190705100144.28785-1-yamada.masah...@socionext.com
Signed-off-by: Sasha Levin 
---
 arch/powerpc/boot/xz_config.h | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/powerpc/boot/xz_config.h b/arch/powerpc/boot/xz_config.h
index e22e5b3770dd..ebfadd39e192 100644
--- a/arch/powerpc/boot/xz_config.h
+++ b/arch/powerpc/boot/xz_config.h
@@ -20,10 +20,30 @@ static inline uint32_t swab32p(void *p)
 
 #ifdef __LITTLE_ENDIAN__
 #define get_le32(p) (*((uint32_t *) (p)))
+#define cpu_to_be32(x) swab32(x)
+static inline u32 be32_to_cpup(const u32 *p)
+{
+   return swab32p((u32 *)p);
+}
 #else
 #define get_le32(p) swab32p(p)
+#define cpu_to_be32(x) (x)
+static inline u32 be32_to_cpup(const u32 *p)
+{
+   return *p;
+}
 #endif
 
+static inline uint32_t get_unaligned_be32(const void *p)
+{
+   return be32_to_cpup(p);
+}
+
+static inline void put_unaligned_be32(u32 val, void *p)
+{
+   *((u32 *)p) = cpu_to_be32(val);
+}
+
 #define memeq(a, b, size) (memcmp(a, b, size) == 0)
 #define memzero(buf, size) memset(buf, 0, size)
 
-- 
2.20.1



[PATCH AUTOSEL 5.1 086/141] powerpc/mm: Handle page table allocation failures

2019-07-18 Thread Sasha Levin
From: "Aneesh Kumar K.V" 

[ Upstream commit 2230ebf6e6dd0b7751e2921b40f6cfe34f09bb16 ]

This fixes a kernel crash that arises from not handling page table
allocation failures while allocating the hugetlb page table.

Fixes: e2b3d202d1db ("powerpc: Switch 16GB and 16MB explicit hugepages to a 
different page table format")
Signed-off-by: Aneesh Kumar K.V 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/mm/hugetlbpage.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 9e732bb2c84a..8e098a08 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -154,6 +154,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
} else {
pdshift = PUD_SHIFT;
pu = pud_alloc(mm, pg, addr);
+   if (!pu)
+   return NULL;
if (pshift == PUD_SHIFT)
return (pte_t *)pu;
else if (pshift > PMD_SHIFT) {
@@ -162,6 +164,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
} else {
pdshift = PMD_SHIFT;
pm = pmd_alloc(mm, pu, addr);
+   if (!pm)
+   return NULL;
if (pshift == PMD_SHIFT)
/* 16MB hugepage */
return (pte_t *)pm;
@@ -178,12 +182,16 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
} else {
pdshift = PUD_SHIFT;
pu = pud_alloc(mm, pg, addr);
+   if (!pu)
+   return NULL;
if (pshift >= PUD_SHIFT) {
ptl = pud_lockptr(mm, pu);
hpdp = (hugepd_t *)pu;
} else {
pdshift = PMD_SHIFT;
pm = pmd_alloc(mm, pu, addr);
+   if (!pm)
+   return NULL;
ptl = pmd_lockptr(mm, pm);
hpdp = (hugepd_t *)pm;
}
-- 
2.20.1



[PATCH AUTOSEL 5.1 081/141] powerpc/4xx/uic: clear pending interrupt after irq type/pol change

2019-07-18 Thread Sasha Levin
From: Christian Lamparter 

[ Upstream commit 3ab3a0689e74e6aa5b41360bc18861040ddef5b1 ]

When testing out gpio-keys with a button, a spurious
interrupt (and therefore a key press or release event)
gets triggered as soon as the driver enables the irq
line for the first time.

This patch clears any potential bogus generated interrupt
that was caused by the switching of the associated irq's
type and polarity.

Signed-off-by: Christian Lamparter 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/platforms/4xx/uic.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/4xx/uic.c b/arch/powerpc/platforms/4xx/uic.c
index 8b4dd0da0839..9e27cfe27026 100644
--- a/arch/powerpc/platforms/4xx/uic.c
+++ b/arch/powerpc/platforms/4xx/uic.c
@@ -158,6 +158,7 @@ static int uic_set_irq_type(struct irq_data *d, unsigned int flow_type)
 
mtdcr(uic->dcrbase + UIC_PR, pr);
mtdcr(uic->dcrbase + UIC_TR, tr);
+   mtdcr(uic->dcrbase + UIC_SR, ~mask);
 
	raw_spin_unlock_irqrestore(&uic->lock, flags);
 
-- 
2.20.1



[PATCH AUTOSEL 5.1 071/141] powerpc/rtas: retry when cpu offline races with suspend/migration

2019-07-18 Thread Sasha Levin
From: Nathan Lynch 

[ Upstream commit 9fb603050ffd94f8127df99c699cca2f575eb6a0 ]

The protocol for suspending or migrating an LPAR requires all present
processor threads to enter H_JOIN. So if we have threads offline, we
have to temporarily bring them up. This can race with administrator
actions such as SMT state changes. As of dfd718a2ed1f ("powerpc/rtas:
Fix a potential race between CPU-Offline & Migration"),
rtas_ibm_suspend_me() accounts for this, but errors out with -EBUSY
for what almost certainly is a transient condition in any reasonable
scenario.

Callers of rtas_ibm_suspend_me() already retry when -EAGAIN is
returned, and it is typical during a migration for that to happen
repeatedly for several minutes polling the H_VASI_STATE hcall result
before proceeding to the next stage.

So return -EAGAIN instead of -EBUSY when this race is
encountered. Additionally: logging this event is still appropriate but
use pr_info instead of pr_err; and remove use of unlikely() while here
as this is not a hot path at all.

Fixes: dfd718a2ed1f ("powerpc/rtas: Fix a potential race between CPU-Offline & 
Migration")
Signed-off-by: Nathan Lynch 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kernel/rtas.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index fbc676160adf..9b4d2a2ffb4f 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -984,10 +984,9 @@ int rtas_ibm_suspend_me(u64 handle)
cpu_hotplug_disable();
 
/* Check if we raced with a CPU-Offline Operation */
-   if (unlikely(!cpumask_equal(cpu_present_mask, cpu_online_mask))) {
-   pr_err("%s: Raced against a concurrent CPU-Offline\n",
-  __func__);
-   atomic_set(&data.error, -EBUSY);
+   if (!cpumask_equal(cpu_present_mask, cpu_online_mask)) {
+   pr_info("%s: Raced against a concurrent CPU-Offline\n", __func__);
+   atomic_set(&data.error, -EAGAIN);
goto out_hotplug_enable;
}
 
-- 
2.20.1



[PATCH AUTOSEL 5.1 070/141] powerpc/xmon: Fix disabling tracing while in xmon

2019-07-18 Thread Sasha Levin
From: "Naveen N. Rao" 

[ Upstream commit aaf06665f7ea3ee9f9754e16c1a507a89f1de5b1 ]

Commit ed49f7fd6438d ("powerpc/xmon: Disable tracing when entering
xmon") added code to disable recording trace entries while in xmon. The
commit introduced a variable 'tracing_enabled' to record if tracing was
enabled on xmon entry, and used this to conditionally enable tracing
during exit from xmon.

However, we are not checking the value of 'fromipi' variable in
xmon_core() when setting 'tracing_enabled'. Due to this, when secondary
cpus enter xmon, they will see tracing as being disabled already and
tracing won't be re-enabled on exit. Fix the same.

Fixes: ed49f7fd6438d ("powerpc/xmon: Disable tracing when entering xmon")
Signed-off-by: Naveen N. Rao 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/xmon/xmon.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index a0f44f992360..c608e3fa6d3b 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -466,8 +466,10 @@ static int xmon_core(struct pt_regs *regs, int fromipi)
local_irq_save(flags);
hard_irq_disable();
 
-   tracing_enabled = tracing_is_on();
-   tracing_off();
+   if (!fromipi) {
+   tracing_enabled = tracing_is_on();
+   tracing_off();
+   }
 
	bp = in_breakpoint_table(regs->nip, &offset);
if (bp != NULL) {
-- 
2.20.1



[PATCH AUTOSEL 5.1 069/141] powerpc/cacheflush: fix variable set but not used

2019-07-18 Thread Sasha Levin
From: Qian Cai 

[ Upstream commit 04db3ede40ae4fc23a5c4237254c4a53bbe4c1f2 ]

The powerpc flush_cache_vmap() is defined as a macro and never uses
both of its arguments, so it will generate a compilation warning,

lib/ioremap.c: In function 'ioremap_page_range':
lib/ioremap.c:203:16: warning: variable 'start' set but not used
[-Wunused-but-set-variable]

Fix it by making it an inline function.

Signed-off-by: Qian Cai 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/include/asm/cacheflush.h | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
index d5a8d7bf0759..b189f7aee222 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -32,9 +32,12 @@
  * not expect this type of fault. flush_cache_vmap is not exactly the right
  * place to put this, but it seems to work well enough.
  */
-#define flush_cache_vmap(start, end)   do { asm volatile("ptesync" ::: "memory"); } while (0)
+static inline void flush_cache_vmap(unsigned long start, unsigned long end)
+{
+   asm volatile("ptesync" ::: "memory");
+}
 #else
-#define flush_cache_vmap(start, end)   do { } while (0)
+static inline void flush_cache_vmap(unsigned long start, unsigned long end) { }
 #endif
 
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-- 
2.20.1



[PATCH AUTOSEL 5.1 058/141] powerpc/pci/of: Fix OF flags parsing for 64bit BARs

2019-07-18 Thread Sasha Levin
From: Alexey Kardashevskiy 

[ Upstream commit df5be5be8735ef2ae80d5ae1f2453cd81a035c4b ]

When the firmware does PCI BAR resource allocation, it passes the assigned
addresses and flags (prefetch/64bit/...) via the "reg" property of
a PCI device device tree node so the kernel does not need to do
resource allocation.

The flags are stored in resource::flags - the lower byte stores
PCI_BASE_ADDRESS_SPACE/etc bits and the other bytes are IORESOURCE_IO/etc.
Some flags from PCI_BASE_ADDRESS_xxx and IORESOURCE_xxx are duplicated,
such as PCI_BASE_ADDRESS_MEM_PREFETCH/PCI_BASE_ADDRESS_MEM_TYPE_64/etc.
When parsing the "reg" property, we copy the prefetch flag but we skip
on PCI_BASE_ADDRESS_MEM_TYPE_64 which leaves the flags out of sync.

The missing IORESOURCE_MEM_64 flag comes into play under 2 conditions:
1. we remove PCI_PROBE_ONLY for pseries (by hacking pSeries_setup_arch()
or by passing "/chosen/linux,pci-probe-only");
2. we request resource alignment (by passing pci=resource_alignment=
via the kernel cmd line to request PAGE_SIZE alignment or defining
ppc_md.pcibios_default_alignment which returns anything but 0). Note that
the alignment requests are ignored if PCI_PROBE_ONLY is enabled.

With 1) and 2), the generic PCI code in the kernel unconditionally
decides to:
- reassign the BARs in pci_specified_resource_alignment() (works fine)
- write new BARs to the device - this fails for 64bit BARs as the generic
code looks at IORESOURCE_MEM_64 (not set) and writes only lower 32bits
of the BAR and leaves the upper 32bit unmodified which breaks BAR mapping
in the hypervisor.

This fixes the issue by copying the flag. This is useful if we want to
enforce certain BAR alignment per platform as handling subpage sized BARs
is proven to cause problems with hotplug (SLOF already aligns BARs to 64k).
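
To make the bit layout concrete, here is a hedged, self-contained
sketch of decoding the phys.hi cell (addr0); the example value and the
program form are illustrative assumptions, not kernel code:

	#include <stdio.h>

	int main(void)
	{
		/* hypothetical phys.hi: non-prefetchable 64-bit memory BAR */
		unsigned int addr0 = 0x83000010;

		if (addr0 & 0x02000000) {	/* "ss" bits say memory space */
			/* bit 24 >> 22 == 0x4 == PCI_BASE_ADDRESS_MEM_TYPE_64 */
			int is64 = !!((addr0 >> 22) & 0x4);
			/* bit 30 is the OF prefetchable bit */
			int pref = !!(addr0 & 0x40000000);

			printf("mem BAR: 64-bit=%d prefetch=%d\n", is64, pref);
		}
		return 0;
	}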

Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: Sam Bobroff 
Reviewed-by: Oliver O'Halloran 
Reviewed-by: Shawn Anastasio 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kernel/pci_of_scan.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
index 24191ea2d9a7..64ad92016b63 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -45,6 +45,8 @@ unsigned int pci_parse_of_flags(u32 addr0, int bridge)
	if (addr0 & 0x02000000) {
flags = IORESOURCE_MEM | PCI_BASE_ADDRESS_SPACE_MEMORY;
flags |= (addr0 >> 22) & PCI_BASE_ADDRESS_MEM_TYPE_64;
+   if (flags & PCI_BASE_ADDRESS_MEM_TYPE_64)
+   flags |= IORESOURCE_MEM_64;
flags |= (addr0 >> 28) & PCI_BASE_ADDRESS_MEM_TYPE_1M;
	if (addr0 & 0x40000000)
flags |= IORESOURCE_PREFETCH
-- 
2.20.1



[PATCH AUTOSEL 5.1 043/141] powerpc/pseries/mobility: prevent cpu hotplug during DT update

2019-07-18 Thread Sasha Levin
From: Nathan Lynch 

[ Upstream commit e59a175faa8df9d674247946f2a5a9c29c835725 ]

CPU online/offline code paths are sensitive to parts of the device
tree (various cpu node properties, cache nodes) that can be changed as
a result of a migration.

Prevent CPU hotplug while the device tree potentially is inconsistent.

Fixes: 410bccf97881 ("powerpc/pseries: Partition migration in the kernel")
Signed-off-by: Nathan Lynch 
Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/platforms/pseries/mobility.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 88925f8ca8a0..edc1ec408589 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -9,6 +9,7 @@
  * 2 as published by the Free Software Foundation.
  */
 
+#include <linux/cpu.h>
 #include <linux/kernel.h>
 #include <linux/kobject.h>
 #include <linux/smp.h>
@@ -338,11 +339,19 @@ void post_mobility_fixup(void)
if (rc)
printk(KERN_ERR "Post-mobility activate-fw failed: %d\n", rc);
 
+   /*
+* We don't want CPUs to go online/offline while the device
+* tree is being updated.
+*/
+   cpus_read_lock();
+
rc = pseries_devicetree_update(MIGRATION_SCOPE);
if (rc)
printk(KERN_ERR "Post-mobility device tree update "
"failed: %d\n", rc);
 
+   cpus_read_unlock();
+
/* Possibly switch to a new RFI flush type */
pseries_setup_rfi_flush();
 
-- 
2.20.1



[PATCH AUTOSEL 5.2 148/171] powerpc/eeh: Handle hugepages in ioremap space

2019-07-18 Thread Sasha Levin
From: Oliver O'Halloran 

[ Upstream commit 33439620680be5225c1b8806579a291e0d761ca0 ]

In commit d909f9109c30 ("powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP")
support for using hugepages in the vmalloc and ioremap areas was
enabled for radix. Unfortunately this broke EEH MMIO error checking.

Detection works by inserting a hook which checks the results of the
ioreadXX() set of functions.  When a read returns an all-0xFFs response
we need to check for an error, which we do by mapping the (virtual) MMIO
address back to a physical address, then mapping the physical address to
a PCI device via an interval tree.

When translating virt -> phys we currently assume the ioremap space is
only populated by PAGE_SIZE mappings. If a hugepage mapping is found we
emit a WARN_ON(), but otherwise handle the check as though a normal
page was found. In pathological cases such as copying a buffer
containing a lot of 0xFFs from BAR memory this can result in the system
not booting because it's too busy printing WARN_ON()s.

There's no real reason to assume huge pages can't be present and we're
perfectly capable of handling them, so do that.

Fixes: d909f9109c30 ("powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP")
Reported-by: Sachin Sant 
Signed-off-by: Oliver O'Halloran 
Tested-by: Sachin Sant 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20190710150517.27114-1-ooh...@gmail.com
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kernel/eeh.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index f192d57db47d..c0e4b73191f3 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -354,10 +354,19 @@ static inline unsigned long eeh_token_to_phys(unsigned long token)
	ptep = find_init_mm_pte(token, &hugepage_shift);
if (!ptep)
return token;
-   WARN_ON(hugepage_shift);
-   pa = pte_pfn(*ptep) << PAGE_SHIFT;
 
-   return pa | (token & (PAGE_SIZE-1));
+   pa = pte_pfn(*ptep);
+
+   /* On radix we can do hugepage mappings for io, so handle that */
+   if (hugepage_shift) {
+   pa <<= hugepage_shift;
+   pa |= token & ((1ul << hugepage_shift) - 1);
+   } else {
+   pa <<= PAGE_SHIFT;
+   pa |= token & (PAGE_SIZE - 1);
+   }
+
+   return pa;
 }
 
 /*
-- 
2.20.1



[PATCH AUTOSEL 5.2 141/171] powerpc/boot: add {get, put}_unaligned_be32 to xz_config.h

2019-07-18 Thread Sasha Levin
From: Masahiro Yamada 

[ Upstream commit 9e005b761e7ad153dcf40a6cba1d681fe0830ac6 ]

The next commit will make the way of passing CONFIG options more robust.
Unfortunately, it would uncover another hidden issue; without this
commit, skiroot_defconfig would be broken like this:

|   WRAParch/powerpc/boot/zImage.pseries
| arch/powerpc/boot/wrapper.a(decompress.o): In function `bcj_powerpc.isra.10':
| decompress.c:(.text+0x720): undefined reference to `get_unaligned_be32'
| decompress.c:(.text+0x7a8): undefined reference to `put_unaligned_be32'
| make[1]: *** [arch/powerpc/boot/Makefile;383: 
arch/powerpc/boot/zImage.pseries] Error 1
| make: *** [arch/powerpc/Makefile;295: zImage] Error 2

skiroot_defconfig is the only defconfig that enables CONFIG_KERNEL_XZ
for ppc, which has never been correctly built before.

I figured out the root cause in lib/decompress_unxz.c:

| #ifdef CONFIG_PPC
| #  define XZ_DEC_POWERPC
| #endif

CONFIG_PPC is undefined here in the ppc bootwrapper because autoconf.h
is not included except by arch/powerpc/boot/serial.c

XZ_DEC_POWERPC is not defined, therefore, bcj_powerpc() is not compiled
for the bootwrapper.

With the next commit passing CONFIG_PPC correctly, we would realize that
{get,put}_unaligned_be32 was missing.

Unlike the other decompressors, the ppc bootwrapper duplicates all the
necessary helpers in arch/powerpc/boot/.

The other architectures define __KERNEL__ and pull in helpers for
building the decompressors.

If ppc bootwrapper had defined __KERNEL__, lib/xz/xz_private.h would
have included <asm/unaligned.h>:

| #ifdef __KERNEL__
| #   include <linux/xz.h>
| #   include <linux/kernel.h>
| #   include <asm/unaligned.h>

However, doing so would cause tons of definition conflicts since the
bootwrapper has duplicated everything.

I just added copies of {get,put}_unaligned_be32, following the
bootwrapper coding convention.

Signed-off-by: Masahiro Yamada 
Signed-off-by: Michael Ellerman 
Link: 
https://lore.kernel.org/r/20190705100144.28785-1-yamada.masah...@socionext.com
Signed-off-by: Sasha Levin 
---
 arch/powerpc/boot/xz_config.h | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/powerpc/boot/xz_config.h b/arch/powerpc/boot/xz_config.h
index e22e5b3770dd..ebfadd39e192 100644
--- a/arch/powerpc/boot/xz_config.h
+++ b/arch/powerpc/boot/xz_config.h
@@ -20,10 +20,30 @@ static inline uint32_t swab32p(void *p)
 
 #ifdef __LITTLE_ENDIAN__
 #define get_le32(p) (*((uint32_t *) (p)))
+#define cpu_to_be32(x) swab32(x)
+static inline u32 be32_to_cpup(const u32 *p)
+{
+   return swab32p((u32 *)p);
+}
 #else
 #define get_le32(p) swab32p(p)
+#define cpu_to_be32(x) (x)
+static inline u32 be32_to_cpup(const u32 *p)
+{
+   return *p;
+}
 #endif
 
+static inline uint32_t get_unaligned_be32(const void *p)
+{
+   return be32_to_cpup(p);
+}
+
+static inline void put_unaligned_be32(u32 val, void *p)
+{
+   *((u32 *)p) = cpu_to_be32(val);
+}
+
 #define memeq(a, b, size) (memcmp(a, b, size) == 0)
 #define memzero(buf, size) memset(buf, 0, size)
 
-- 
2.20.1



[PATCH AUTOSEL 5.2 140/171] powerpc/irq: Don't WARN continuously in arch_local_irq_restore()

2019-07-18 Thread Sasha Levin
From: Michael Ellerman 

[ Upstream commit 0fc12c022ad25532b66bf6f6c818ee1c1d63e702 ]

When CONFIG_PPC_IRQ_SOFT_MASK_DEBUG is enabled (uncommon), we have a
series of WARN_ON's in arch_local_irq_restore().

These are "should never happen" conditions, but if they do happen they
can flood the console and render the system unusable. So switch them
to WARN_ON_ONCE().

Fixes: e2b36d591720 ("powerpc/64: Don't trace code that runs with the soft irq 
mask unreconciled")
Fixes: 9b81c0211c24 ("powerpc/64s: make PACA_IRQ_HARD_DIS track MSR[EE] 
closely")
Fixes: 7c0482e3d055 ("powerpc/irq: Fix another case of lazy IRQ state getting 
out of sync")
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20190708061046.7075-1-...@ellerman.id.au
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kernel/irq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index bc68c53af67c..5645bc9cbc09 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -255,7 +255,7 @@ notrace void arch_local_irq_restore(unsigned long mask)
irq_happened = get_irq_happened();
if (!irq_happened) {
 #ifdef CONFIG_PPC_IRQ_SOFT_MASK_DEBUG
-   WARN_ON(!(mfmsr() & MSR_EE));
+   WARN_ON_ONCE(!(mfmsr() & MSR_EE));
 #endif
return;
}
@@ -268,7 +268,7 @@ notrace void arch_local_irq_restore(unsigned long mask)
 */
if (!(irq_happened & PACA_IRQ_HARD_DIS)) {
 #ifdef CONFIG_PPC_IRQ_SOFT_MASK_DEBUG
-   WARN_ON(!(mfmsr() & MSR_EE));
+   WARN_ON_ONCE(!(mfmsr() & MSR_EE));
 #endif
__hard_irq_disable();
 #ifdef CONFIG_PPC_IRQ_SOFT_MASK_DEBUG
@@ -279,7 +279,7 @@ notrace void arch_local_irq_restore(unsigned long mask)
 * warn if we are wrong. Only do that when IRQ tracing
 * is enabled as mfmsr() can be costly.
 */
-   if (WARN_ON(mfmsr() & MSR_EE))
+   if (WARN_ON_ONCE(mfmsr() & MSR_EE))
__hard_irq_disable();
 #endif
}
-- 
2.20.1



[PATCH AUTOSEL 5.2 112/171] powerpc/mm: Handle page table allocation failures

2019-07-18 Thread Sasha Levin
From: "Aneesh Kumar K.V" 

[ Upstream commit 2230ebf6e6dd0b7751e2921b40f6cfe34f09bb16 ]

This fixes a kernel crash that arises from not handling page table
allocation failures while allocating the hugetlb page table.

Fixes: e2b3d202d1db ("powerpc: Switch 16GB and 16MB explicit hugepages to a 
different page table format")
Signed-off-by: Aneesh Kumar K.V 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/mm/hugetlbpage.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index b5d92dc32844..1de0f43a68e5 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -130,6 +130,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
} else {
pdshift = PUD_SHIFT;
pu = pud_alloc(mm, pg, addr);
+   if (!pu)
+   return NULL;
if (pshift == PUD_SHIFT)
return (pte_t *)pu;
else if (pshift > PMD_SHIFT) {
@@ -138,6 +140,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
} else {
pdshift = PMD_SHIFT;
pm = pmd_alloc(mm, pu, addr);
+   if (!pm)
+   return NULL;
if (pshift == PMD_SHIFT)
/* 16MB hugepage */
return (pte_t *)pm;
@@ -154,12 +158,16 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
} else {
pdshift = PUD_SHIFT;
pu = pud_alloc(mm, pg, addr);
+   if (!pu)
+   return NULL;
if (pshift >= PUD_SHIFT) {
ptl = pud_lockptr(mm, pu);
hpdp = (hugepd_t *)pu;
} else {
pdshift = PMD_SHIFT;
pm = pmd_alloc(mm, pu, addr);
+   if (!pm)
+   return NULL;
ptl = pmd_lockptr(mm, pm);
hpdp = (hugepd_t *)pm;
}
-- 
2.20.1



[PATCH AUTOSEL 5.2 107/171] powerpc/mm: mark more tlb functions as __always_inline

2019-07-18 Thread Sasha Levin
From: Masahiro Yamada 

[ Upstream commit 6d3ca7e73642ce17398f4cd5df1780da4a1ccdaf ]

With CONFIG_OPTIMIZE_INLINING enabled, Laura Abbott reported an error
with gcc 9.1.1:

  arch/powerpc/mm/book3s64/radix_tlb.c: In function '_tlbiel_pid':
  arch/powerpc/mm/book3s64/radix_tlb.c:104:2: warning: asm operand 3 probably 
doesn't match constraints
104 |  asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1)
|  ^~~
  arch/powerpc/mm/book3s64/radix_tlb.c:104:2: error: impossible constraint in 
'asm'

Fixing _tlbiel_pid() is enough to address the warning above, but I
inlined more functions to fix all potential issues.

To meet the "i" (immediate) constraint for the asm operands, functions
propagating "ric" must be always inlined.
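
A minimal illustration of why (hypothetical code, not from the patch):
with an "i" operand the value must be a compile-time constant, which
only holds once the function is inlined into a caller passing a
literal:

	static __always_inline void do_op(unsigned long ric)
	{
		/* "i" demands an immediate; an out-of-line copy would
		 * leave 'ric' a runtime variable and break the build */
		asm volatile("# op %0" : : "i"(ric));
	}

	void caller(void)
	{
		do_op(2);	/* the constant propagates after inlining */
	}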

Fixes: 9012d011660e ("compiler: allow all arches to enable 
CONFIG_OPTIMIZE_INLINING")
Reported-by: Laura Abbott 
Signed-off-by: Masahiro Yamada 
Reviewed-by: Christophe Leroy 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/mm/book3s64/hash_native.c |  2 +-
 arch/powerpc/mm/book3s64/radix_tlb.c   | 32 +-
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/hash_native.c b/arch/powerpc/mm/book3s64/hash_native.c
index 30d62ffe3310..1322c59cb5dd 100644
--- a/arch/powerpc/mm/book3s64/hash_native.c
+++ b/arch/powerpc/mm/book3s64/hash_native.c
@@ -56,7 +56,7 @@ static inline void tlbiel_hash_set_isa206(unsigned int set, unsigned int is)
  * tlbiel instruction for hash, set invalidation
  * i.e., r=1 and is=01 or is=10 or is=11
  */
-static inline void tlbiel_hash_set_isa300(unsigned int set, unsigned int is,
+static __always_inline void tlbiel_hash_set_isa300(unsigned int set, unsigned int is,
unsigned int pid,
unsigned int ric, unsigned int prs)
 {
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index bb9835681315..d0cd5271a57c 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -25,7 +25,7 @@
  * tlbiel instruction for radix, set invalidation
  * i.e., r=1 and is=01 or is=10 or is=11
  */
-static inline void tlbiel_radix_set_isa300(unsigned int set, unsigned int is,
+static __always_inline void tlbiel_radix_set_isa300(unsigned int set, unsigned int is,
unsigned int pid,
unsigned int ric, unsigned int prs)
 {
@@ -146,8 +146,8 @@ static __always_inline void __tlbie_lpid(unsigned long lpid, unsigned long ric)
trace_tlbie(lpid, 0, rb, rs, ric, prs, r);
 }
 
-static inline void __tlbiel_lpid_guest(unsigned long lpid, int set,
-   unsigned long ric)
+static __always_inline void __tlbiel_lpid_guest(unsigned long lpid, int set,
+   unsigned long ric)
 {
unsigned long rb,rs,prs,r;
 
@@ -163,8 +163,8 @@ static inline void __tlbiel_lpid_guest(unsigned long lpid, int set,
 }
 
 
-static inline void __tlbiel_va(unsigned long va, unsigned long pid,
-  unsigned long ap, unsigned long ric)
+static __always_inline void __tlbiel_va(unsigned long va, unsigned long pid,
+   unsigned long ap, unsigned long ric)
 {
unsigned long rb,rs,prs,r;
 
@@ -179,8 +179,8 @@ static inline void __tlbiel_va(unsigned long va, unsigned long pid,
trace_tlbie(0, 1, rb, rs, ric, prs, r);
 }
 
-static inline void __tlbie_va(unsigned long va, unsigned long pid,
- unsigned long ap, unsigned long ric)
+static __always_inline void __tlbie_va(unsigned long va, unsigned long pid,
+  unsigned long ap, unsigned long ric)
 {
unsigned long rb,rs,prs,r;
 
@@ -195,8 +195,8 @@ static inline void __tlbie_va(unsigned long va, unsigned long pid,
trace_tlbie(0, 0, rb, rs, ric, prs, r);
 }
 
-static inline void __tlbie_lpid_va(unsigned long va, unsigned long lpid,
- unsigned long ap, unsigned long ric)
+static __always_inline void __tlbie_lpid_va(unsigned long va, unsigned long lpid,
+   unsigned long ap, unsigned long ric)
 {
unsigned long rb,rs,prs,r;
 
@@ -235,7 +235,7 @@ static inline void fixup_tlbie_lpid(unsigned long lpid)
 /*
  * We use 128 set in radix mode and 256 set in hpt mode.
  */
-static inline void _tlbiel_pid(unsigned long pid, unsigned long ric)
+static __always_inline void _tlbiel_pid(unsigned long pid, unsigned long ric)
 {
int set;
 
@@ -337,7 +337,7 @@ static inline void _tlbie_lpid(unsigned long lpid, unsigned long ric)
asm volatile("eieio; tlbsync; ptesync": : :"memory");
 }
 
-static inline void _tlbiel_lpid_guest(unsigned long lpid, unsigned long ric)
+static __always_inline void _tlbiel_lpid_guest(unsigned long lpid, unsigned long ric)
   

[PATCH AUTOSEL 5.2 106/171] powerpc/4xx/uic: clear pending interrupt after irq type/pol change

2019-07-18 Thread Sasha Levin
From: Christian Lamparter 

[ Upstream commit 3ab3a0689e74e6aa5b41360bc18861040ddef5b1 ]

When testing out gpio-keys with a button, a spurious
interrupt (and therefore a key press or release event)
gets triggered as soon as the driver enables the irq
line for the first time.

This patch clears any potential bogus generated interrupt
that was caused by the switching of the associated irq's
type and polarity.

Signed-off-by: Christian Lamparter 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/platforms/4xx/uic.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/4xx/uic.c b/arch/powerpc/platforms/4xx/uic.c
index 31f12ad37a98..36fb66ce54cf 100644
--- a/arch/powerpc/platforms/4xx/uic.c
+++ b/arch/powerpc/platforms/4xx/uic.c
@@ -154,6 +154,7 @@ static int uic_set_irq_type(struct irq_data *d, unsigned int flow_type)
 
mtdcr(uic->dcrbase + UIC_PR, pr);
mtdcr(uic->dcrbase + UIC_TR, tr);
+   mtdcr(uic->dcrbase + UIC_SR, ~mask);
 
	raw_spin_unlock_irqrestore(&uic->lock, flags);
 
-- 
2.20.1



[PATCH AUTOSEL 5.2 105/171] powerpc: silence a -Wcast-function-type warning in dawr_write_file_bool

2019-07-18 Thread Sasha Levin
From: Mathieu Malaterre 

[ Upstream commit 548c54acba5bd1388d50727a9a126a42d0cd4ad0 ]

In commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9
option") the following piece of code was added:

   smp_call_function((smp_call_func_t)set_dawr, &null_brk, 0);

Since GCC 8 this triggers the following warning about incompatible
function types:

  arch/powerpc/kernel/hw_breakpoint.c:408:21: error: cast between incompatible 
function types from 'int (*)(struct arch_hw_breakpoint *)' to 'void (*)(void 
*)' [-Werror=cast-function-type]

Since the warning is there for a reason, and should not be hidden behind
a cast, provide an intermediate callback function to avoid the warning.

Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option")
Suggested-by: Christoph Hellwig 
Signed-off-by: Mathieu Malaterre 
Signed-off-by: Michael Neuling 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kernel/hw_breakpoint.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c
index a293a53b4365..50262597c222 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -370,6 +370,11 @@ void hw_breakpoint_pmu_read(struct perf_event *bp)
 bool dawr_force_enable;
 EXPORT_SYMBOL_GPL(dawr_force_enable);
 
+static void set_dawr_cb(void *info)
+{
+   set_dawr(info);
+}
+
 static ssize_t dawr_write_file_bool(struct file *file,
const char __user *user_buf,
size_t count, loff_t *ppos)
@@ -389,7 +394,7 @@ static ssize_t dawr_write_file_bool(struct file *file,
 
/* If we are clearing, make sure all CPUs have the DAWR cleared */
if (!dawr_force_enable)
-   smp_call_function((smp_call_func_t)set_dawr, &null_brk, 0);
+   smp_call_function(set_dawr_cb, &null_brk, 0);
 
return rc;
 }
-- 
2.20.1



[PATCH AUTOSEL 5.2 094/171] powerpc/rtas: retry when cpu offline races with suspend/migration

2019-07-18 Thread Sasha Levin
From: Nathan Lynch 

[ Upstream commit 9fb603050ffd94f8127df99c699cca2f575eb6a0 ]

The protocol for suspending or migrating an LPAR requires all present
processor threads to enter H_JOIN. So if we have threads offline, we
have to temporarily bring them up. This can race with administrator
actions such as SMT state changes. As of dfd718a2ed1f ("powerpc/rtas:
Fix a potential race between CPU-Offline & Migration"),
rtas_ibm_suspend_me() accounts for this, but errors out with -EBUSY
for what almost certainly is a transient condition in any reasonable
scenario.

Callers of rtas_ibm_suspend_me() already retry when -EAGAIN is
returned, and it is typical during a migration for that to happen
repeatedly for several minutes polling the H_VASI_STATE hcall result
before proceeding to the next stage.

So return -EAGAIN instead of -EBUSY when this race is
encountered. Additionally: logging this event is still appropriate but
use pr_info instead of pr_err; and remove use of unlikely() while here
as this is not a hot path at all.

Fixes: dfd718a2ed1f ("powerpc/rtas: Fix a potential race between CPU-Offline & 
Migration")
Signed-off-by: Nathan Lynch 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kernel/rtas.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index b824f4c69622..fff2eb22427d 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -980,10 +980,9 @@ int rtas_ibm_suspend_me(u64 handle)
cpu_hotplug_disable();
 
/* Check if we raced with a CPU-Offline Operation */
-   if (unlikely(!cpumask_equal(cpu_present_mask, cpu_online_mask))) {
-   pr_err("%s: Raced against a concurrent CPU-Offline\n",
-  __func__);
-   atomic_set(&data.error, -EBUSY);
+   if (!cpumask_equal(cpu_present_mask, cpu_online_mask)) {
+   pr_info("%s: Raced against a concurrent CPU-Offline\n", __func__);
+   atomic_set(&data.error, -EAGAIN);
goto out_hotplug_enable;
}
 
-- 
2.20.1



[PATCH AUTOSEL 5.2 093/171] powerpc/xmon: Fix disabling tracing while in xmon

2019-07-18 Thread Sasha Levin
From: "Naveen N. Rao" 

[ Upstream commit aaf06665f7ea3ee9f9754e16c1a507a89f1de5b1 ]

Commit ed49f7fd6438d ("powerpc/xmon: Disable tracing when entering
xmon") added code to disable recording trace entries while in xmon. The
commit introduced a variable 'tracing_enabled' to record if tracing was
enabled on xmon entry, and used this to conditionally enable tracing
during exit from xmon.

However, we are not checking the value of 'fromipi' variable in
xmon_core() when setting 'tracing_enabled'. Due to this, when secondary
cpus enter xmon, they will see tracing as being disabled already and
tracing won't be re-enabled on exit. Fix the same.

Fixes: ed49f7fd6438d ("powerpc/xmon: Disable tracing when entering xmon")
Signed-off-by: Naveen N. Rao 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/xmon/xmon.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index d0620d762a5a..4a721fd62406 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -465,8 +465,10 @@ static int xmon_core(struct pt_regs *regs, int fromipi)
local_irq_save(flags);
hard_irq_disable();
 
-   tracing_enabled = tracing_is_on();
-   tracing_off();
+   if (!fromipi) {
+   tracing_enabled = tracing_is_on();
+   tracing_off();
+   }
 
	bp = in_breakpoint_table(regs->nip, &offset);
if (bp != NULL) {
-- 
2.20.1



[PATCH AUTOSEL 5.2 092/171] powerpc/cacheflush: fix variable set but not used

2019-07-18 Thread Sasha Levin
From: Qian Cai 

[ Upstream commit 04db3ede40ae4fc23a5c4237254c4a53bbe4c1f2 ]

The powerpc flush_cache_vmap() is defined as a macro and never uses
both of its arguments, so it will generate a compilation warning,

lib/ioremap.c: In function 'ioremap_page_range':
lib/ioremap.c:203:16: warning: variable 'start' set but not used
[-Wunused-but-set-variable]

Fix it by making it an inline function.

Signed-off-by: Qian Cai 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/include/asm/cacheflush.h | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
index 74d60cfe8ce5..fd318f7c3eed 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -29,9 +29,12 @@
  * not expect this type of fault. flush_cache_vmap is not exactly the right
  * place to put this, but it seems to work well enough.
  */
-#define flush_cache_vmap(start, end)   do { asm volatile("ptesync" ::: "memory"); } while (0)
+static inline void flush_cache_vmap(unsigned long start, unsigned long end)
+{
+   asm volatile("ptesync" ::: "memory");
+}
 #else
-#define flush_cache_vmap(start, end)   do { } while (0)
+static inline void flush_cache_vmap(unsigned long start, unsigned long end) { }
 #endif
 
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-- 
2.20.1



[PATCH AUTOSEL 5.2 077/171] powerpc/pci/of: Fix OF flags parsing for 64bit BARs

2019-07-18 Thread Sasha Levin
From: Alexey Kardashevskiy 

[ Upstream commit df5be5be8735ef2ae80d5ae1f2453cd81a035c4b ]

When the firmware does PCI BAR resource allocation, it passes the assigned
addresses and flags (prefetch/64bit/...) via the "reg" property of
a PCI device device tree node so the kernel does not need to do
resource allocation.

The flags are stored in resource::flags - the lower byte stores
PCI_BASE_ADDRESS_SPACE/etc bits and the other bytes are IORESOURCE_IO/etc.
Some flags from PCI_BASE_ADDRESS_xxx and IORESOURCE_xxx are duplicated,
such as PCI_BASE_ADDRESS_MEM_PREFETCH/PCI_BASE_ADDRESS_MEM_TYPE_64/etc.
When parsing the "reg" property, we copy the prefetch flag but we skip
on PCI_BASE_ADDRESS_MEM_TYPE_64 which leaves the flags out of sync.

The missing IORESOURCE_MEM_64 flag comes into play under 2 conditions:
1. we remove PCI_PROBE_ONLY for pseries (by hacking pSeries_setup_arch()
or by passing "/chosen/linux,pci-probe-only");
2. we request resource alignment (by passing pci=resource_alignment=
via the kernel cmd line to request PAGE_SIZE alignment or defining
ppc_md.pcibios_default_alignment which returns anything but 0). Note that
the alignment requests are ignored if PCI_PROBE_ONLY is enabled.

With 1) and 2), the generic PCI code in the kernel unconditionally
decides to:
- reassign the BARs in pci_specified_resource_alignment() (works fine)
- write new BARs to the device - this fails for 64bit BARs as the generic
code looks at IORESOURCE_MEM_64 (not set) and writes only lower 32bits
of the BAR and leaves the upper 32bit unmodified which breaks BAR mapping
in the hypervisor.

This fixes the issue by copying the flag. This is useful if we want to
enforce certain BAR alignment per platform as handling subpage sized BARs
is proven to cause problems with hotplug (SLOF already aligns BARs to 64k).

Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: Sam Bobroff 
Reviewed-by: Oliver O'Halloran 
Reviewed-by: Shawn Anastasio 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/kernel/pci_of_scan.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
index 24522aa37665..c63c53b37e8e 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -42,6 +42,8 @@ unsigned int pci_parse_of_flags(u32 addr0, int bridge)
	if (addr0 & 0x02000000) {
flags = IORESOURCE_MEM | PCI_BASE_ADDRESS_SPACE_MEMORY;
flags |= (addr0 >> 22) & PCI_BASE_ADDRESS_MEM_TYPE_64;
+   if (flags & PCI_BASE_ADDRESS_MEM_TYPE_64)
+   flags |= IORESOURCE_MEM_64;
flags |= (addr0 >> 28) & PCI_BASE_ADDRESS_MEM_TYPE_1M;
	if (addr0 & 0x40000000)
flags |= IORESOURCE_PREFETCH
-- 
2.20.1



[PATCH AUTOSEL 5.2 060/171] powerpc/pseries/mobility: prevent cpu hotplug during DT update

2019-07-18 Thread Sasha Levin
From: Nathan Lynch 

[ Upstream commit e59a175faa8df9d674247946f2a5a9c29c835725 ]

CPU online/offline code paths are sensitive to parts of the device
tree (various cpu node properties, cache nodes) that can be changed as
a result of a migration.

Prevent CPU hotplug while the device tree potentially is inconsistent.

Fixes: 410bccf97881 ("powerpc/pseries: Partition migration in the kernel")
Signed-off-by: Nathan Lynch 
Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
---
 arch/powerpc/platforms/pseries/mobility.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 0c48c8964783..50e7aee3c7f3 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -6,6 +6,7 @@
  * Copyright (C) 2010 IBM Corporation
  */
 
+#include <linux/cpu.h>
 #include <linux/kernel.h>
 #include <linux/kobject.h>
 #include <linux/smp.h>
@@ -335,11 +336,19 @@ void post_mobility_fixup(void)
if (rc)
printk(KERN_ERR "Post-mobility activate-fw failed: %d\n", rc);
 
+   /*
+* We don't want CPUs to go online/offline while the device
+* tree is being updated.
+*/
+   cpus_read_lock();
+
rc = pseries_devicetree_update(MIGRATION_SCOPE);
if (rc)
printk(KERN_ERR "Post-mobility device tree update "
"failed: %d\n", rc);
 
+   cpus_read_unlock();
+
/* Possibly switch to a new RFI flush type */
pseries_setup_rfi_flush();
 
-- 
2.20.1



Re: [PATCH] powerpc: remove meaningless KBUILD_ARFLAGS addition

2019-07-18 Thread Michael Ellerman
Segher Boessenkool  writes:
> On Thu, Jul 18, 2019 at 11:19:58AM +0900, Masahiro Yamada wrote:
>> On Thu, Jul 18, 2019 at 1:46 AM Segher Boessenkool
>>  wrote:
>> Kbuild always uses thin archives as far as vmlinux is concerned.
>> 
>> But, there are some other call-sites.
>> 
>> masahiro@pug:~/ref/linux$ git grep  '$(AR)' -- :^Documentation :^tools
>> arch/powerpc/boot/Makefile:BOOTAR := $(AR)
>> arch/unicore32/lib/Makefile:$(Q)$(AR) p $(GNU_LIBC_A) $(notdir $@) > $@
>> arch/unicore32/lib/Makefile:$(Q)$(AR) p $(GNU_LIBGCC_A) $(notdir $@) > $@
>> lib/raid6/test/Makefile: $(AR) cq $@ $^
>> scripts/Kbuild.include:ar-option = $(call try-run, $(AR) rc$(1)
>> "$$TMP",$(1),$(2))
>> scripts/Makefile.build:  cmd_ar_builtin = rm -f $@; $(AR)
>> rcSTP$(KBUILD_ARFLAGS) $@ $(real-prereqs)
>> scripts/Makefile.lib:  cmd_ar = rm -f $@; $(AR)
>> rcsTP$(KBUILD_ARFLAGS) $@ $(real-prereqs)
>> 
>> Probably, you are interested in arch/powerpc/boot/Makefile.
>
> That one seems fine actually.  The raid6 one I don't know.
>
>
> My original commit message was
>
> Without this, some versions of GNU ar fail to create
> an archive index if the object files it is packing
> together are of a different object format than ar's
> default format (for example, binutils compiled to
> default to 64-bit, with 32-bit objects).
>
> but I cannot reproduce the problem anymore.  Shortly after my patch the
> thin archive code happened to binutils, and that overhauled some other
> things, which might have fixed it already?
>
>> > Yes, I know.  This isn't about built-in.[oa], it is about *other*
>> > archives we at least *used to* create.  If we *know* we do not anymore,
>> > then this workaround can of course be removed (and good riddance).
>> 
>> If it is not about built-in.[oa],
>> which archive are you talking about?
>> 
>> Can you pin-point the one?
>
> No, not anymore.  Lost in the mists of time, I guess?  I think we'll
> just have to file it as "it seems to work fine now".

Yeah I think so. If someone finds a case it breaks we can fix it then.

> Thank you (and everyone else) for the time looking at this!

Likewise.

cheers


Re: [PATCH v2] powerpc: slightly improve cache helpers

2019-07-18 Thread Nathan Chancellor
On Mon, Jul 08, 2019 at 11:49:52PM -0700, Nathan Chancellor wrote:
> On Tue, Jul 09, 2019 at 07:04:43AM +0200, Christophe Leroy wrote:
> > Is that a Clang bug ?
> 
> No idea, it happens with clang-8 and clang-9 though (pretty sure there
> were fixes for PowerPC in clang-8 so something before it probably won't
> work but I haven't tried).
> 
> > 
> > Do you have a disassembly of the code both with and without this patch in
> > order to compare ?
> 
> I can give you whatever disassembly you want (or I can upload the raw
> files if that is easier).
> 
> Cheers,
> Nathan

Hi Christophe and Segher,

What disassembly/files did you need to start taking a look at this? I
can upload/send whatever you need.

If it is easier, we have a self contained clang build script available
to make it easier to reproduce this on your side (does assume an x86_64
host):

https://github.com/ClangBuiltLinux/tc-build

Cheers,
Nathan


Re: [PATCH v4 4/8] KVM: PPC: Ultravisor: Use UV_WRITE_PATE ucall to register a PATE

2019-07-18 Thread Michael Ellerman
Claudio Carvalho  writes:
> On 7/11/19 9:57 AM, Michael Ellerman wrote:
>>>  static pmd_t *get_pmd_from_cache(struct mm_struct *mm)
>>> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
>>> b/arch/powerpc/mm/book3s64/radix_pgtable.c
>>> index 8904aa1243d8..da6a6b76a040 100644
>>> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
>>> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
>>> @@ -656,8 +656,10 @@ void radix__early_init_mmu_secondary(void)
>>> lpcr = mfspr(SPRN_LPCR);
>>> mtspr(SPRN_LPCR, lpcr | LPCR_UPRT | LPCR_HR);
>>>  
>>> -   mtspr(SPRN_PTCR,
>>> - __pa(partition_tb) | (PATB_SIZE_SHIFT - 12));
>>> +   if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
>>> +   mtspr(SPRN_PTCR, __pa(partition_tb) |
>>> + (PATB_SIZE_SHIFT - 12));
>>> +
>>> radix_init_amor();
>>> }
>>>  
>>> @@ -673,7 +675,8 @@ void radix__mmu_cleanup_all(void)
>>> if (!firmware_has_feature(FW_FEATURE_LPAR)) {
>>> lpcr = mfspr(SPRN_LPCR);
>>> mtspr(SPRN_LPCR, lpcr & ~LPCR_UPRT);
>>> -   mtspr(SPRN_PTCR, 0);
>>> +   if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
>>> +   mtspr(SPRN_PTCR, 0);
>>> powernv_set_nmmu_ptcr(0);
>>> radix__flush_tlb_all();
>>> }
>> There's four of these case where we skip touching the PTCR, which is
>> right on the borderline of warranting an accessor. I guess we can do it
>> as a cleanup later.
>
> I agree.
>
> Since the kernel doesn't need to access a big number of ultravisor
> privileged registers, maybe we can define mtspr_<reg> and mfspr_<reg>
> inline functions in ultravisor.h that skip touching the register if an
> ultravisor is present and the register is ultravisor privileged. Thus,
> we don't need to replicate comments and that also would make it easier for
> developers to know what are the ultravisor privileged registers.
>
> Something like this:
>
> --- a/arch/powerpc/include/asm/ultravisor.h
> +++ b/arch/powerpc/include/asm/ultravisor.h
> @@ -10,10 +10,21 @@
>  
>  #include 
>  #include 
> +#include 
>  
>  int early_init_dt_scan_ultravisor(unsigned long node, const char *uname,
>   int depth, void *data);
>  
> +static inline void mtspr_ptcr(unsigned long val)
> +{
> +   /*
> +    * If the ultravisor firmware is present, it maintains the partition
> +    * table. PTCR becomes ultravisor privileged only for writing.
> +    */
> +   if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
> +   mtspr(SPRN_PTCR, val);
> +}
> +}
> +
>  static inline int uv_register_pate(u64 lpid, u64 dw0, u64 dw1)
>  {
>     return ucall_norets(UV_WRITE_PATE, lpid, dw0, dw1);
> diff --git a/arch/powerpc/mm/book3s64/pgtable.c
> b/arch/powerpc/mm/book3s64/pgtable.c
> index e1bbc48e730f..25156f9dfde8 100644
> --- a/arch/powerpc/mm/book3s64/pgtable.c
> +++ b/arch/powerpc/mm/book3s64/pgtable.c
> @@ -220,7 +220,7 @@ void __init mmu_partition_table_init(void)
>  * 64 K size.
>  */
>     ptcr = __pa(partition_tb) | (PATB_SIZE_SHIFT - 12);
> -   mtspr(SPRN_PTCR, ptcr);
> +   mtspr_ptcr(ptcr);
>     powernv_set_nmmu_ptcr(ptcr);
>  }
>
> What do you think?

I don't think that's actually clearer.

If the logic was always:

  if (ultravisor)
 do_ucall()
  else
 mtspr()

Then a wrapper called eg. set_ptcr() would make sense.

But because in some cases you do a ucall and some you don't, I don't
think it helps to hide that in an accessor like above.

That is confusing to a reader who sees all this code to setup a value
and then the write to PTCR does nothing.

And in fact you didn't explain why it's OK for those cases to not do the
write at all.

> An alternative could be to change the mtspr() and mfspr() macros as we
> proposed in the v1, but access to non-ultravisor privileged registers would
> be performance impacted because we always would need to check if the
> register is one of the few ultravisor registers that the kernel needs to
> access.

Yeah that and it would be very confusing to a reader who sees:

ptcr = ...;
mtspr(SPRN_PTCR, ptcr);
...

And then they discover the mtspr does *nothing* when the Ultravisor is
enabled.

cheers


Re: [PATCH v9 08/10] open: openat2(2) syscall

2019-07-18 Thread Aleksa Sarai
On 2019-07-19, Dmitry V. Levin  wrote:
> On Sun, Jul 07, 2019 at 12:57:35AM +1000, Aleksa Sarai wrote:
> [...]
> > +/**
> > + * Arguments for how openat2(2) should open the target path. If @extra is 
> > zero,
> > + * then openat2(2) is identical to openat(2).
> > + *
> > + * @flags: O_* flags (unknown flags ignored).
> 
> What was the rationale for implementing this semantics?
> Ignoring unknown flags makes potential extension of this new interface
> problematic.  This has bitten us many times already, so ...

I am mirroring the semantics of open(2) and openat(2).

To be clear, I am in favour of doing it -- and it would definitely be
possible to implement it with -EINVAL (you would just mask off
~VALID_OPEN_FLAGS for the older syscalls). But Linus' response to my
point about (the lack of) -EINVAL for unknown open(2) flags gave me the
impression he would be against this idea (though I might be
misunderstanding the point he was making).
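
For what it's worth, the stricter check is cheap to express. A rough
sketch (the mask names here are placeholders, not settled ABI):

	static int check_openat2_flags(u64 flags, u64 resolve)
	{
		if (flags & ~VALID_OPEN_FLAGS)		/* unknown O_* bit */
			return -EINVAL;
		if (resolve & ~VALID_RESOLVE_FLAGS)	/* unknown RESOLVE_* bit */
			return -EINVAL;
		return 0;
	}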

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH



signature.asc
Description: PGP signature


Re: [PATCH v9 08/10] open: openat2(2) syscall

2019-07-18 Thread Dmitry V. Levin
On Thu, Jul 18, 2019 at 11:29:50PM +0200, Arnd Bergmann wrote:
[...]
> 5. you get the same problem with seccomp and strace that
>clone3() has -- these and others only track the register
>arguments by default.

Just for the record, this is definitely not the case for strace:
it decodes arrays, structures, netlink messages, and so on by default.


-- 
ldv


signature.asc
Description: PGP signature


Re: [PATCH v9 08/10] open: openat2(2) syscall

2019-07-18 Thread Dmitry V. Levin
On Sun, Jul 07, 2019 at 12:57:35AM +1000, Aleksa Sarai wrote:
[...]
> +/**
> + * Arguments for how openat2(2) should open the target path. If @extra is 
> zero,
> + * then openat2(2) is identical to openat(2).
> + *
> + * @flags: O_* flags (unknown flags ignored).

What was the rationale for implementing this semantics?
Ignoring unknown flags makes potential extension of this new interface
problematic.  This has bitten us many times already, so ...

> + * @mode: O_CREAT file mode (ignored otherwise).
> + * @upgrade_mask: restrict how the O_PATH may be re-opened (ignored 
> otherwise).
> + * @resolve: RESOLVE_* flags (-EINVAL on unknown flags).

... could you consider implementing this (-EINVAL on unknown flags) semantics
for @flags as well, please?


-- 
ldv


signature.asc
Description: PGP signature


Re: [PATCH v2 03/13] powerpc/prom_init: Add the ESM call to prom_init

2019-07-18 Thread Thiago Jung Bauermann


Alexey Kardashevskiy  writes:

> On 19/07/2019 07:28, Thiago Jung Bauermann wrote:
>>
>> Hello Segher,
>>
>> Thanks for your review and suggestions!
>>
>> Segher Boessenkool  writes:
>>
>>> (Sorry to hijack your reply).
>>>
>>> On Thu, Jul 18, 2019 at 06:11:48PM +1000, Alexey Kardashevskiy wrote:
 On 13/07/2019 16:00, Thiago Jung Bauermann wrote:
> From: Ram Pai 
> +static int enter_secure_mode(unsigned long kbase, unsigned long fdt)
> +{
> + register uint64_t func asm("r3") = UV_ESM;
> + register uint64_t arg1 asm("r4") = (uint64_t)kbase;
> + register uint64_t arg2 asm("r5") = (uint64_t)fdt;

 What does UV do with kbase and fdt precisely? Few words in the commit
 log will do.
>
>
> What about this one? :)

Sorry, I don't have an elaborate answer yet. The non-elaborate answer is
that the ultravisor uses the kbase and fdt as part of integrity checking
of the secure guest.

--
Thiago Jung Bauermann
IBM Linux Technology Center



Re: [PATCH v2 03/13] powerpc/prom_init: Add the ESM call to prom_init

2019-07-18 Thread Alexey Kardashevskiy




On 19/07/2019 07:28, Thiago Jung Bauermann wrote:


Hello Segher,

Thanks for your review and suggestions!

Segher Boessenkool  writes:


(Sorry to hijack your reply).

On Thu, Jul 18, 2019 at 06:11:48PM +1000, Alexey Kardashevskiy wrote:

On 13/07/2019 16:00, Thiago Jung Bauermann wrote:

From: Ram Pai 
+static int enter_secure_mode(unsigned long kbase, unsigned long fdt)
+{
+   register uint64_t func asm("r3") = UV_ESM;
+   register uint64_t arg1 asm("r4") = (uint64_t)kbase;
+   register uint64_t arg2 asm("r5") = (uint64_t)fdt;


What does UV do with kbase and fdt precisely? Few words in the commit
log will do.



What about this one? :)





+
+   asm volatile("sc 2\n"
+: "=r"(func)
+: "0"(func), "r"(arg1), "r"(arg2)
+:);
+
+   return (int)func;


And why "func"? Is it "function"? Weird name. Thanks,


Yes, I believe func is for function. Perhaps ucall would be clearer
if the variable wasn't reused for the return value as Segher points out.


Maybe the three vars should just be called "r3", "r4", and "r5" --
r3 is used as return value as well, so "func" isn't a great name for it.


Yes, that does seem simpler.


Some other comments about this inline asm:

The "\n" makes the generated asm look funny and has no other function.
Instead of using backreferences you can use a "+" constraint, "inout".
Empty clobber list is strange.
Casts to the return type, like most other casts, are an invitation to
bugs and not actually useful.

So this can be written

static int enter_secure_mode(unsigned long kbase, unsigned long fdt)
{
register uint64_t r3 asm("r3") = UV_ESM;
register uint64_t r4 asm("r4") = kbase;
register uint64_t r4 asm("r5") = fdt;

asm volatile("sc 2" : "+r"(r3) : "r"(r4), "r"(r5));

return r3;
}


I'll adopt your version, it is cleaner indeed. Thanks for providing it!


(and it probably should use u64 instead of both uint64_t and unsigned long?)


Almost all of prom_init.c uses unsigned long, with u64 in just a few
places. uint64_t isn't used anywhere else in the file. I'll switch to
unsigned long everywhere, since this feature is only for 64 bit.
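
So the helper would end up looking roughly like this (a sketch under
the assumptions above; UV_ESM and the "sc 2" ultracall convention come
from the patch under review):

	static int enter_secure_mode(unsigned long kbase, unsigned long fdt)
	{
		register unsigned long r3 asm("r3") = UV_ESM;
		register unsigned long r4 asm("r4") = kbase;
		register unsigned long r5 asm("r5") = fdt;

		asm volatile("sc 2" : "+r"(r3) : "r"(r4), "r"(r5));

		return r3;
	}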



--
Alexey


Re: Crash in kvmppc_xive_release()

2019-07-18 Thread Cédric Le Goater
On 18/07/2019 15:14, Cédric Le Goater wrote:
> On 18/07/2019 14:49, Michael Ellerman wrote:
>> Anyone else seen this?
>>
>> This is running ~176 VMs on a Power9 (1 per thread), host crashes:
> 
> This is beyond the underlying limits of XIVE. 
> 
> As we allocate 2K vCPUs per VM, that is 16K EQs for interrupt events. The 
> overall
> EQ count is 1M. I let you calculate what is our max number of VMs ...
> 
>>   [   66.403750][ T6423] xive: OPAL failed to allocate VCPUs order 11, err 
>> -10
> 
> Hence, the OPAL XIVE driver fails which is good but ...
> 
>>   [188523.080935670,4] Spent 1783 msecs in OPAL call 135!
>>   [   66.484965][ T6250] BUG: Kernel NULL pointer dereference at 0x42e8
>>   [   66.485558][ T6250] Faulting instruction address: 0xc00811a33fcc
>>   [   66.485990][ T6250] Oops: Kernel access of bad area, sig: 7 [#1]
>>   [   66.486405][ T6250] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP 
>> NR_CPUS=2048 NUMA PowerNV
>>   [   66.486967][ T6250] Modules linked in: kvm_hv kvm
>>   [   66.487275][ T6250] CPU: 107 PID: 6250 Comm: qemu-system-ppc Not 
>> tainted 5.2.0-rc2-gcc9x-gf5a9e488d623 #1
>>   [   66.487902][ T6250] NIP:  c00811a33fcc LR: c00811a33fc4 CTR: 
>> c05d5970
>>   [   66.488383][ T6250] REGS: c01fabebb900 TRAP: 0300   Not tainted  
>> (5.2.0-rc2-gcc9x-gf5a9e488d623)
>>   [   66.488933][ T6250] MSR:  9280b033 
>>   CR: 24028224  XER: 
>>   [   66.489724][ T6250] CFAR: c05d6a4c DAR: 42e8 DSISR: 
>> 0008 IRQMASK: 0 
>>   [   66.489724][ T6250] GPR00: c00811a33fc4 c01fabebbb90 
>> c00811a5a200 c1399928 
>>   [   66.489724][ T6250] GPR04: 0001 c047b8d0 
>>  0001 
>>   [   66.489724][ T6250] GPR08:   
>> c01fa8c42f00 c00811a3af20 
>>   [   66.489724][ T6250] GPR12: 8000 c0002023ff65a880 
>> 00013a1b4000 0002 
>>   [   66.489724][ T6250] GPR16: 1000 0002 
>> 0001 00012b194cc0 
>>   [   66.489724][ T6250] GPR20: 7fffb1645250 0001 
>> 0031  
>>   [   66.489724][ T6250] GPR24: 7fffb16408d8 c01ffafb62e0 
>> c01f78699360 c01ff35d0620 
>>   [   66.489724][ T6250] GPR28: c01ed0ed c01ecd90 
>>  c01ed0ed 
>>   [   66.495211][ T6250] NIP [c00811a33fcc] 
>> kvmppc_xive_release+0x54/0x1b0 [kvm]
>>   [   66.495642][ T6250] LR [c00811a33fc4] 
>> kvmppc_xive_release+0x4c/0x1b0 [kvm]
>>   [   66.496101][ T6250] Call Trace:
>>   [   66.496314][ T6250] [c01fabebbb90] [c00811a33fc4] 
>> kvmppc_xive_release+0x4c/0x1b0 [kvm] (unreliable)
>>   [   66.496893][ T6250] [c01fabebbbf0] [c00811a18d54] 
>> kvm_device_release+0xac/0xf0 [kvm]
>>   [   66.497399][ T6250] [c01fabebbc30] [c0442f8c] 
>> __fput+0xec/0x310
>>   [   66.497815][ T6250] [c01fabebbc90] [c0145f94] 
>> task_work_run+0x114/0x170
>>   [   66.498296][ T6250] [c01fabebbce0] [c0115274] 
>> do_exit+0x454/0xee0
>>   [   66.498743][ T6250] [c01fabebbdc0] [c0115dd0] 
>> do_group_exit+0x60/0xe0
>>   [   66.499201][ T6250] [c01fabebbe00] [c0115e74] 
>> sys_exit_group+0x24/0x40
>>   [   66.499747][ T6250] [c01fabebbe20] [c000b83c] 
>> system_call+0x5c/0x70
>>   [   66.500261][ T6250] Instruction dump:
>>   [   66.500484][ T6250] fbe1fff8 fba1ffe8 fbc1fff0 7c7c1b78 f8010010 
>> f821ffa1 eba30010 e87d0010 
>>   [   66.501006][ T6250] ebdd 48006f61 e8410018 3920  
>> 913e42e8 48007f3d e8410018 
>>   [   66.501529][ T6250] ---[ end trace c021a6ca03594ec3 ]---
>>   [   66.513119][ T6150] xive: OPAL failed to allocate VCPUs order 11, err -10
> 
> 
> ... the rollback code in case of such error must be bogus. It was never 
> tested 
> clearly :/

Here is a fix. Could you give it a try on your system?

Thanks,

C.

From b6f728ca19a9540c8bf4f5a56991c4e3dab4cf56 Mon Sep 17 00:00:00 2001
From: Cédric Le Goater 
Date: Thu, 18 Jul 2019 22:15:31 +0200
Subject: [PATCH] KVM: PPC: Book3S HV: XIVE: fix rollback when
 kvmppc_xive_create fails
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The XIVE device structure is now allocated in kvmppc_xive_get_device()
and kfree'd in kvmppc_core_destroy_vm(). In case of an OPAL error when
allocating the XIVE VPs, the kfree() call in kvmppc_xive_*create()
will result in a double free and corrupt the host memory.

Fixes: 5422e95103cf ("KVM: PPC: Book3S HV: XIVE: Replace the 'destroy' method by a 'release' method")
Signed-off-by: Cédric Le Goater 
---
 arch/powerpc/kvm/book3s_xive.c| 4 +---
 arch/powerpc/kvm/book3s_xive_native.c | 4 ++--
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index 6ca0d7376a9f..e3ba67095895 100644
--- 

Re: [PATCH v9 08/10] open: openat2(2) syscall

2019-07-18 Thread Arnd Bergmann
On Thu, Jul 18, 2019 at 6:12 PM Aleksa Sarai  wrote:
> On 2019-07-18, Arnd Bergmann  wrote:
> > On Sat, Jul 6, 2019 at 5:00 PM Aleksa Sarai  wrote:
> >
> > In fact, that seems similar enough to the existing openat() that I think
> > you could also just add the fifth argument to the existing call when
> > a newly defined flag is set, similarly to how we only use the 'mode'
> > argument when O_CREAT or O_TMPFILE are set.
>
> I considered doing this (and even had a preliminary version of it), but
> I discovered that I was not in favour of this idea -- once I started to
> write tests using it -- for a few reasons:
>
>   1. It doesn't really allow for clean extension for a future 6th
>  argument (because you are using up O_* flags to signify "use the
>  next argument", and O_* flags don't give -EINVAL if they're
>  unknown). Now, yes you can do the on-start runtime check that
>  everyone does -- but I've never really liked having to do it.
>
>  Having reserved padding for later extensions (that is actually
>  checked and gives -EINVAL) matches more modern syscall designs.
>
>   2. I really was hoping that the variadic openat(2) could be done away
>  using this union setup (Linus said he didn't like it, and suggested
>  using something like 'struct stat' as an argument for openat(2) --
>  though personally I am not sure I would personally like to use an
>  interface like that).
>
>   3. In order to avoid wasting a syscall argument for mode/mask you need
>  to either have something like your suggested mode_mask (which makes
>  the syscall arguments less consistent) or have some sort of
>  mode-like argument that is treated specially (which is really awful
>  on multiple levels -- this one I also tried and even wrote my
>  original tests using). And in both cases, the shims for
>  open{,at}(2) are somewhat less clean.

These are all good reasons, thanks for providing the background.

> All of that being said, I'd be happy to switch to whatever you think
> makes the most sense. As long as it's possible to get an O_PATH with
> RESOLVE_IN_ROOT set, I'm happy.

I don't feel I should be in charge of making the decision. I'd still
prefer avoiding the indirect argument structure because

4. it's inconsistent with most other syscalls

5. you get the same problem with seccomp and strace that
   clone3() has -- these and others only track the register
   arguments by default.

6. copying the structure adds a small overhead compared to
   passing registers

7. the calling conventions may be inconvenient for  a user space
   library, so you end up with different prototypes for the low-level
   syscall and the libc abstraction.

I don't see any of the above seven points as a showstopper
either way, so I hope someone else has a strong opinion
and can make the decision easier for you.

In the meantime just keep what you have, so you don't have
to change it multiple times.

   Arnd


Re: [PATCH v2 03/13] powerpc/prom_init: Add the ESM call to prom_init

2019-07-18 Thread Thiago Jung Bauermann


Hello Segher,

Thanks for your review and suggestions!

Segher Boessenkool  writes:

> (Sorry to hijack your reply).
>
> On Thu, Jul 18, 2019 at 06:11:48PM +1000, Alexey Kardashevskiy wrote:
>> On 13/07/2019 16:00, Thiago Jung Bauermann wrote:
>> >From: Ram Pai 
>> >+static int enter_secure_mode(unsigned long kbase, unsigned long fdt)
>> >+{
>> >+   register uint64_t func asm("r3") = UV_ESM;
>> >+   register uint64_t arg1 asm("r4") = (uint64_t)kbase;
>> >+   register uint64_t arg2 asm("r5") = (uint64_t)fdt;
>> 
>> What does UV do with kbase and fdt precisely? Few words in the commit 
>> log will do.
>> 
>> >+
>> >+   asm volatile("sc 2\n"
>> >+: "=r"(func)
>> >+: "0"(func), "r"(arg1), "r"(arg2)
>> >+:);
>> >+
>> >+   return (int)func;
>> 
>> And why "func"? Is it "function"? Weird name. Thanks,

Yes, I believe func is for function. Perhaps ucall would be clearer
if the variable wasn't reused for the return value as Segher points out.

> Maybe the three vars should just be called "r3", "r4", and "r5" --
> r3 is used as return value as well, so "func" isn't a great name for it.

Yes, that does seem simpler.

> Some other comments about this inline asm:
>
> The "\n" makes the generated asm look funny and has no other function.
> Instead of using backreferences you can use a "+" constraint, "inout".
> Empty clobber list is strange.
> Casts to the return type, like most other casts, are an invitation to
> bugs and not actually useful.
>
> So this can be written
>
> static int enter_secure_mode(unsigned long kbase, unsigned long fdt)
> {
>   register uint64_t r3 asm("r3") = UV_ESM;
>   register uint64_t r4 asm("r4") = kbase;
>   register uint64_t r5 asm("r5") = fdt;
>
>   asm volatile("sc 2" : "+r"(r3) : "r"(r4), "r"(r5));
>
>   return r3;
> }

I'll adopt your version, it is cleaner indeed. Thanks for providing it!

> (and it probably should use u64 instead of both uint64_t and unsigned long?)

Almost all of prom_init.c uses unsigned long, with u64 in just a few
places. uint64_t isn't used anywhere else in the file. I'll switch to
unsigned long everywhere, since this feature is only for 64 bit.
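
For reference, a sketch of how the helper would then look with unsigned
long throughout (the version that finally lands may differ):

static int enter_secure_mode(unsigned long kbase, unsigned long fdt)
{
	register unsigned long r3 asm("r3") = UV_ESM;
	register unsigned long r4 asm("r4") = kbase;
	register unsigned long r5 asm("r5") = fdt;

	asm volatile("sc 2" : "+r"(r3) : "r"(r4), "r"(r5));

	return r3;
}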

-- 
Thiago Jung Bauermann
IBM Linux Technology Center


Re: [PATCH v4 4/8] KVM: PPC: Ultravisor: Use UV_WRITE_PATE ucall to register a PATE

2019-07-18 Thread Claudio Carvalho


On 7/11/19 9:57 AM, Michael Ellerman wrote:
>
>>  
>>  static pmd_t *get_pmd_from_cache(struct mm_struct *mm)
>> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
>> index 8904aa1243d8..da6a6b76a040 100644
>> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
>> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
>> @@ -656,8 +656,10 @@ void radix__early_init_mmu_secondary(void)
>>  lpcr = mfspr(SPRN_LPCR);
>>  mtspr(SPRN_LPCR, lpcr | LPCR_UPRT | LPCR_HR);
>>  
>> -mtspr(SPRN_PTCR,
>> -  __pa(partition_tb) | (PATB_SIZE_SHIFT - 12));
>> +if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
>> +mtspr(SPRN_PTCR, __pa(partition_tb) |
>> +  (PATB_SIZE_SHIFT - 12));
>> +
>>  radix_init_amor();
>>  }
>>  
>> @@ -673,7 +675,8 @@ void radix__mmu_cleanup_all(void)
>>  if (!firmware_has_feature(FW_FEATURE_LPAR)) {
>>  lpcr = mfspr(SPRN_LPCR);
>>  mtspr(SPRN_LPCR, lpcr & ~LPCR_UPRT);
>> -mtspr(SPRN_PTCR, 0);
>> +if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
>> +mtspr(SPRN_PTCR, 0);
>>  powernv_set_nmmu_ptcr(0);
>>  radix__flush_tlb_all();
>>  }
> There's four of these case where we skip touching the PTCR, which is
> right on the borderline of warranting an accessor. I guess we can do it
> as a cleanup later.

I agree.

Since the kernel doesn't need to access a big number of ultravisor
privileged registers, maybe we can define mtspr_<reg> and mfspr_<reg>
inline functions in ultravisor.h that skip touching the register if an
ultravisor is present and the register is ultravisor privileged. Thus,
we don't need to replicate comments, and it also would make it easier for
developers to know which registers are ultravisor privileged.

Something like this:

--- a/arch/powerpc/include/asm/ultravisor.h
+++ b/arch/powerpc/include/asm/ultravisor.h
@@ -10,10 +10,21 @@
 
 #include 
 #include 
+#include 
 
 int early_init_dt_scan_ultravisor(unsigned long node, const char *uname,
  int depth, void *data);
 
+static inline void mtspr_ptcr(unsigned long val)
+{
+   /*
+    * If the ultravisor firmware is present, it maintains the partition
+    * table. PTCR becomes ultravisor privileged only for writing.
+    */
+   if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
+   mtspr(SPRN_PTCR, val);
+}
+
 static inline int uv_register_pate(u64 lpid, u64 dw0, u64 dw1)
 {
    return ucall_norets(UV_WRITE_PATE, lpid, dw0, dw1);
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index e1bbc48e730f..25156f9dfde8 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -220,7 +220,7 @@ void __init mmu_partition_table_init(void)
 * 64 K size.
 */
    ptcr = __pa(partition_tb) | (PATB_SIZE_SHIFT - 12);
-   mtspr(SPRN_PTCR, ptcr);
+   mtspr_ptcr(ptcr);
    powernv_set_nmmu_ptcr(ptcr);
 }

What do you think?
An alternative could be to change the mtspr() and mfspr() macros as we
proposed in v1, but then accesses to non-ultravisor-privileged registers
would take a performance hit, because we would always need to check
whether the register is one of the few ultravisor privileged registers
that the kernel needs to access.
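
For illustration, a minimal sketch of what such an mtspr_<reg> family
could look like (the generator macro and the generated names here are
hypothetical, not proposed code):

#include <asm/firmware.h>
#include <asm/reg.h>

#define DEFINE_UV_MTSPR(name, spr)					\
static inline void mtspr_##name(unsigned long val)			\
{									\
	/* Skip the write when an ultravisor owns the register. */	\
	if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))		\
		mtspr(spr, val);					\
}

DEFINE_UV_MTSPR(ptcr, SPRN_PTCR)	/* expands to mtspr_ptcr() */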

Thanks,
Claudio


> cheers
>



Re: [PATCH] powerpc: remove meaningless KBUILD_ARFLAGS addition

2019-07-18 Thread Segher Boessenkool
Hi!

On Thu, Jul 18, 2019 at 11:19:58AM +0900, Masahiro Yamada wrote:
> On Thu, Jul 18, 2019 at 1:46 AM Segher Boessenkool
>  wrote:
> Kbuild always uses thin archives as far as vmlinux is concerned.
> 
> But, there are some other call-sites.
> 
> masahiro@pug:~/ref/linux$ git grep  '$(AR)' -- :^Documentation :^tools
> arch/powerpc/boot/Makefile:BOOTAR := $(AR)
> arch/unicore32/lib/Makefile:$(Q)$(AR) p $(GNU_LIBC_A) $(notdir $@) > $@
> arch/unicore32/lib/Makefile:$(Q)$(AR) p $(GNU_LIBGCC_A) $(notdir $@) > $@
> lib/raid6/test/Makefile: $(AR) cq $@ $^
> scripts/Kbuild.include:ar-option = $(call try-run, $(AR) rc$(1) "$$TMP",$(1),$(2))
> scripts/Makefile.build:  cmd_ar_builtin = rm -f $@; $(AR) rcSTP$(KBUILD_ARFLAGS) $@ $(real-prereqs)
> scripts/Makefile.lib:  cmd_ar = rm -f $@; $(AR) rcsTP$(KBUILD_ARFLAGS) $@ $(real-prereqs)
> 
> Probably, you are interested in arch/powerpc/boot/Makefile.

That one seems fine actually.  The raid6 one I don't know.


My original commit message was

Without this, some versions of GNU ar fail to create
an archive index if the object files it is packing
together are of a different object format than ar's
default format (for example, binutils compiled to
default to 64-bit, with 32-bit objects).

but I cannot reproduce the problem anymore.  Shortly after my patch the
thin archive code happened to binutils, and that overhauled some other
things, which might have fixed it already?

> > Yes, I know.  This isn't about built-in.[oa], it is about *other*
> > archives we at least *used to* create.  If we *know* we do not anymore,
> > then this workaround can of course be removed (and good riddance).
> 
> If it is not about built-in.[oa],
> which archive are you talking about?
> 
> Can you pin-point the one?

No, not anymore.  Lost in the mists of time, I guess?  I think we'll
just have to file it as "it seems to work fine now".

Thank you (and everyone else) for the time looking at this!


Segher


Re: [PATCH v2 04/13] powerpc/pseries/svm: Add helpers for UV_SHARE_PAGE and UV_UNSHARE_PAGE

2019-07-18 Thread Thiago Jung Bauermann


Hello Alexey,

Thanks for your review!

Alexey Kardashevskiy  writes:

> On 13/07/2019 16:00, Thiago Jung Bauermann wrote:
>> From: Ram Pai 
>>
>> These functions are used when the guest wants to grant the hypervisor
>> access to certain pages.
>>
>> Signed-off-by: Ram Pai 
>> Signed-off-by: Thiago Jung Bauermann 
>> ---
>>   arch/powerpc/include/asm/ultravisor-api.h |  2 ++
>>   arch/powerpc/include/asm/ultravisor.h | 15 +++
>>   2 files changed, 17 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/ultravisor-api.h b/arch/powerpc/include/asm/ultravisor-api.h
>> index fe9a0d8d7673..c7513bbadf57 100644
>> --- a/arch/powerpc/include/asm/ultravisor-api.h
>> +++ b/arch/powerpc/include/asm/ultravisor-api.h
>> @@ -25,6 +25,8 @@
>>   #define UV_UNREGISTER_MEM_SLOT 0xF124
>>   #define UV_PAGE_IN 0xF128
>>   #define UV_PAGE_OUT0xF12C
>> +#define UV_SHARE_PAGE   0xF130
>> +#define UV_UNSHARE_PAGE 0xF134
>>   #define UV_PAGE_INVAL  0xF138
>>   #define UV_SVM_TERMINATE   0xF13C
>> diff --git a/arch/powerpc/include/asm/ultravisor.h b/arch/powerpc/include/asm/ultravisor.h
>> index f5dc5af739b8..f7418b663a0e 100644
>> --- a/arch/powerpc/include/asm/ultravisor.h
>> +++ b/arch/powerpc/include/asm/ultravisor.h
>> @@ -91,6 +91,21 @@ static inline int uv_svm_terminate(u64 lpid)
>>  return ucall(UV_SVM_TERMINATE, retbuf, lpid);
>>   }
>> +
>> +static inline int uv_share_page(u64 pfn, u64 npages)
>> +{
>> +unsigned long retbuf[UCALL_BUFSIZE];
>> +
>> +return ucall(UV_SHARE_PAGE, retbuf, pfn, npages);
>
>
> What is in that retbuf? Can you pass NULL instead?

I think so, that buffer isn't used actually. Claudio is working on a
ucall_norets() which doesn't take the buffer and I can switch to that.
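
For illustration, assuming ucall_norets() keeps the same shape it has in
uv_register_pate() above, the helper could reduce to this sketch:

static inline int uv_share_page(u64 pfn, u64 npages)
{
	/* No return buffer: only the ucall status code is consumed. */
	return ucall_norets(UV_SHARE_PAGE, pfn, npages);
}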

--
Thiago Jung Bauermann
IBM Linux Technology Center


Re: [PATCH v2 03/13] powerpc/prom_init: Add the ESM call to prom_init

2019-07-18 Thread Segher Boessenkool
(Sorry to hijack your reply).

On Thu, Jul 18, 2019 at 06:11:48PM +1000, Alexey Kardashevskiy wrote:
> On 13/07/2019 16:00, Thiago Jung Bauermann wrote:
> >From: Ram Pai 
> >+static int enter_secure_mode(unsigned long kbase, unsigned long fdt)
> >+{
> >+register uint64_t func asm("r3") = UV_ESM;
> >+register uint64_t arg1 asm("r4") = (uint64_t)kbase;
> >+register uint64_t arg2 asm("r5") = (uint64_t)fdt;
> 
> What does UV do with kbase and fdt precisely? Few words in the commit 
> log will do.
> 
> >+
> >+asm volatile("sc 2\n"
> >+ : "=r"(func)
> >+ : "0"(func), "r"(arg1), "r"(arg2)
> >+ :);
> >+
> >+return (int)func;
> 
> And why "func"? Is it "function"? Weird name. Thanks,

Maybe the three vars should just be called "r3", "r4", and "r5" --
r3 is used as return value as well, so "func" isn't a great name for it.

Some other comments about this inline asm:

The "\n" makes the generated asm look funny and has no other function.
Instead of using backreferences you can use a "+" constraint, "inout".
Empty clobber list is strange.
Casts to the return type, like most other casts, are an invitation to
bugs and not actually useful.

So this can be written

static int enter_secure_mode(unsigned long kbase, unsigned long fdt)
{
register uint64_t r3 asm("r3") = UV_ESM;
register uint64_t r4 asm("r4") = kbase;
register uint64_t r5 asm("r5") = fdt;

asm volatile("sc 2" : "+r"(r3) : "r"(r4), "r"(r5));

return r3;
}

(and it probably should use u64 instead of both uint64_t and unsigned long?)


Segher


Re: [PATCH 2/3] DMA mapping: Move SME handling to x86-specific files

2019-07-18 Thread Thiago Jung Bauermann


Thomas Gleixner  writes:

> On Fri, 12 Jul 2019, Thiago Jung Bauermann wrote:
>> diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
>> index b310a9c18113..f2e399fb626b 100644
>> --- a/include/linux/mem_encrypt.h
>> +++ b/include/linux/mem_encrypt.h
>> @@ -21,23 +21,11 @@
>>  
>>  #else   /* !CONFIG_ARCH_HAS_MEM_ENCRYPT */
>>  
>> -#define sme_me_mask 0ULL
>> -
>> -static inline bool sme_active(void) { return false; }
>>  static inline bool sev_active(void) { return false; }
>
> You want to move out sev_active as well, the only relevant thing is
> mem_encrypt_active(). Everything SME/SEV is an architecture detail.

I'm sure you saw it. I addressed sev_active in a separate patch.

Thanks for reviewing this series!

>> +static inline bool mem_encrypt_active(void) { return false; }
>
> Thanks,
>
>   tglx


-- 
Thiago Jung Bauermann
IBM Linux Technology Center



Re: [PATCH 1/2] powerpc/rtas: use device model APIs and serialization during LPM

2019-07-18 Thread Nathan Lynch
Nathan Lynch  writes:

> During LPAR migration, cpu hotplug and migration operations can
> interleave like so:
>
> cd /sys/devices/system/cpu/cpu7/ | drmgr -m -c pmig -p pre \
> echo 0 > online                  |   -s 0xd7a884f83d830e6d -t 19 \
> echo 1 > online                  |   -n -d 1 5
> ---------------------------------+---------------------------------------------
> online_store() {                 |
>   device_offline() {             |
>     cpu_subsys_offline() {       |
>       cpu_down(7);               |
>     }                            |
>     dev->offline = true;         |
>   }                              | migration_store() {
> }                                |   rtas_ibm_suspend_me() {
>                                  |     rtas_online_cpus_mask() {
>                                  |       cpu_up(7);
>                                  |     }
>                                  |     cpu_hotplug_disable();
>                                  |     on_each_cpu(rtas_percpu_suspend_me());
>                                  |     cpu_hotplug_enable();
> online_store() {                 |
>   device_online() {              |
>     cpu_subsys_online() {        |
>       cpu_up(7);                 |
>     }                            |
>     dev->offline = false;        |
>   }                              |     rtas_offline_cpus_mask() {
> }                                |       rtas_cpu_state_change_mask() {
>                                  |         cpu_down(7);
>                                  |       }
>                                  |     }
>                                  |   }
>                                  | }

Actually I think this isn't a correct depiction of the race. I'll
rewrite and resend.



Re: [PATCH] powerpc/dma: Fix invalid DMA mmap behavior

2019-07-18 Thread Shawn Anastasio

On 7/18/19 4:52 AM, Christoph Hellwig wrote:

On Thu, Jul 18, 2019 at 10:49:34AM +0200, Christoph Hellwig wrote:

On Thu, Jul 18, 2019 at 01:45:16PM +1000, Oliver O'Halloran wrote:

Other than m68k, mips, and arm64, everybody else that doesn't have
ARCH_NO_COHERENT_DMA_MMAP set uses this default implementation, so
I assume this behavior is acceptable on those architectures.


It might be acceptable, but there's no reason to use pgprot_noncached
if the platform supports cache-coherent DMA.

Christoph (+cc) made the change so maybe he saw something we're missing.


I always found the forcing of noncached access even for coherent
devices a little odd, but this was inherited from the previous
implementation, which surprised me a bit as the different attributes
are usually problematic even on x86.  Let me dig into the history a
bit more, but I suspect the right fix is to default to cached mappings
for coherent devices.


Ok, some history:

The generic dma mmap implementation, which we are effectively still
using today was added by:

commit 64ccc9c033c6089b2d426dad3c56477ab066c999
Author: Marek Szyprowski 
Date:   Thu Jun 14 13:03:04 2012 +0200

 common: dma-mapping: add support for generic dma_mmap_* calls

and unconditionally uses pgprot_noncached in dma_common_mmap, which is
then used as the fallback by dma_mmap_attrs if no ->mmap method is
present.  At that point we already had the powerpc implementation
that only uses pgprot_noncached for non-coherent mappings, and
the arm one, which uses pgprot_writecombine if DMA_ATTR_WRITE_COMBINE
is set and otherwise pgprot_dmacoherent, which seems to be uncached.
Arm did support coherent platforms at that time, but they might have
been an afterthought and not handled properly.

So it might have been that we were all wrong for that time and might
have to fix it up.
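
In code terms, the suspected right fix would key the protection off
device coherence, roughly like this (a sketch only, assuming the
dev_is_dma_coherent() helper; not the eventual upstream change):

#include <linux/device.h>
#include <linux/dma-noncoherent.h>

static pgprot_t dma_mmap_pgprot(struct device *dev, pgprot_t prot)
{
	/* Cache-coherent device: a normal cached mapping is safe. */
	if (dev_is_dma_coherent(dev))
		return prot;
	/* Non-coherent device: fall back to an uncached mapping. */
	return pgprot_noncached(prot);
}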


Personally, I'm not a huge fan of an implicit default for something
inherently architecture-dependent like this at all. What I'd like to
see is a mechanism that forces architecture code to explicitly
opt in to the default pgprot settings if they don't provide an
implementation of arch_dma_mmap_pgprot. This could perhaps be done
by reversing ARCH_HAS_DMA_MMAP_PGPROT to something like
ARCH_USE_DEFAULT_DMA_MMAP_PGPROT.
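
Concretely, the common fallback would then only be compiled when an
architecture explicitly asks for it; a sketch using the hypothetical
symbol above:

/* kernel/dma/mapping.c (sketch only) */
#ifdef CONFIG_ARCH_USE_DEFAULT_DMA_MMAP_PGPROT
/* Architectures that did not opt in must provide their own
 * arch_dma_mmap_pgprot() and make an explicit pgprot decision. */
pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot,
			      unsigned long attrs)
{
	return pgprot_noncached(prot);
}
#endif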

This way as more systems are moved to use the common mmap code instead
of their ops->mmap, the people doing the refactoring have to make an
explicit decision about the pgprot settings to use. Such a configuration
would have likely prevented this situation with powerpc from happening.

That being said, if the default behavior doesn't make sense in the
general case it should probably be fixed as well.

Curious to hear some thoughts on this.


Re: [PATCH v3 0/6] Remove x86-specific code from generic headers

2019-07-18 Thread Thiago Jung Bauermann


Lendacky, Thomas  writes:

> On 7/17/19 10:28 PM, Thiago Jung Bauermann wrote:
>> Hello,
>> 
>> This version is mostly about splitting up patch 2/3 into three separate
>> patches, as suggested by Christoph Hellwig. Two other changes are a fix in
>> patch 1 which wasn't selecting ARCH_HAS_MEM_ENCRYPT for s390 spotted by
>> Janani and removal of sme_active and sev_active symbol exports as suggested
>> by Christoph Hellwig.
>> 
>> These patches are applied on top of today's dma-mapping/for-next.
>> 
>> I don't have a way to test SME, SEV, nor s390's PEF so the patches have only
>> been build tested.
>
> I'll try and get this tested quickly to be sure everything works for SME
> and SEV.

Thanks! And thanks for reviewing the patches.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center


[PATCH] scsi: ibmvscsi: remove casting dma_alloc_coherent

2019-07-18 Thread Vasyl Gomonovych
Fix allocation style
Generated by:  alloc_cast.cocci

Signed-off-by: Vasyl Gomonovych 
---
 drivers/scsi/ibmvscsi/ibmvscsi.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index 7f66a7783209..7e9b3e409851 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -715,8 +715,7 @@ static int map_sg_data(struct scsi_cmnd *cmd,
 
/* get indirect table */
if (!evt_struct->ext_list) {
-   evt_struct->ext_list = (struct srp_direct_buf *)
-   dma_alloc_coherent(dev,
+   evt_struct->ext_list = dma_alloc_coherent(dev,
					   SG_ALL * sizeof(struct srp_direct_buf),
					   &evt_struct->ext_list_token, 0);
if (!evt_struct->ext_list) {
-- 
2.17.1



[PATCH 1/2] powerpc/rtas: use device model APIs and serialization during LPM

2019-07-18 Thread Nathan Lynch
During LPAR migration, cpu hotplug and migration operations can
interleave like so:

cd /sys/devices/system/cpu/cpu7/ | drmgr -m -c pmig -p pre \
echo 0 > online                  |   -s 0xd7a884f83d830e6d -t 19 \
echo 1 > online                  |   -n -d 1 5
---------------------------------+---------------------------------------------
online_store() {                 |
  device_offline() {             |
    cpu_subsys_offline() {       |
      cpu_down(7);               |
    }                            |
    dev->offline = true;         |
  }                              | migration_store() {
}                                |   rtas_ibm_suspend_me() {
                                 |     rtas_online_cpus_mask() {
                                 |       cpu_up(7);
                                 |     }
                                 |     cpu_hotplug_disable();
                                 |     on_each_cpu(rtas_percpu_suspend_me());
                                 |     cpu_hotplug_enable();
online_store() {                 |
  device_online() {              |
    cpu_subsys_online() {        |
      cpu_up(7);                 |
    }                            |
    dev->offline = false;        |
  }                              |     rtas_offline_cpus_mask() {
}                                |       rtas_cpu_state_change_mask() {
                                 |         cpu_down(7);
                                 |       }
                                 |     }
                                 |   }
                                 | }

This leaves cpu7 in a state where the driver core considers the cpu
device online, but in all other respects it is offline and
unused. Attempts to online the cpu via sysfs appear to succeed but the
driver core actually does not pass the request to the lower-level
cpuhp support code. This makes the cpu unusable until the system is
rebooted.

Instead of directly calling cpu_up/cpu_down, the migration code should
use the higher-level device core APIs to maintain consistent state and
serialize operations.

Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation")
Signed-off-by: Nathan Lynch 
---
 arch/powerpc/kernel/rtas.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 9b4d2a2ffb4f..fbefd9ff6dab 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -875,15 +875,17 @@ static int rtas_cpu_state_change_mask(enum rtas_cpu_state state,
return 0;
 
for_each_cpu(cpu, cpus) {
+   struct device *dev = get_cpu_device(cpu);
+
switch (state) {
case DOWN:
-   cpuret = cpu_down(cpu);
+   cpuret = device_offline(dev);
break;
case UP:
-   cpuret = cpu_up(cpu);
+   cpuret = device_online(dev);
break;
}
-   if (cpuret) {
+   if (cpuret < 0) {
pr_debug("%s: cpu_%s for cpu#%d returned %d.\n",
__func__,
((state == UP) ? "up" : "down"),
@@ -972,6 +974,8 @@ int rtas_ibm_suspend_me(u64 handle)
data.token = rtas_token("ibm,suspend-me");
	data.complete = &done;
 
+   lock_device_hotplug();
+
/* All present CPUs must be online */
cpumask_andnot(offline_mask, cpu_present_mask, cpu_online_mask);
cpuret = rtas_online_cpus_mask(offline_mask);
@@ -1011,6 +1015,7 @@ int rtas_ibm_suspend_me(u64 handle)
__func__);
 
 out:
+   unlock_device_hotplug();
free_cpumask_var(offline_mask);
	return atomic_read(&data.error);
 }
-- 
2.20.1



[PATCH 2/2] powerpc/rtas: allow rescheduling while changing cpu states

2019-07-18 Thread Nathan Lynch
rtas_cpu_state_change_mask() potentially operates on scores of cpus,
so explicitly allow rescheduling in the loop body.

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/kernel/rtas.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index fbefd9ff6dab..396fb2f35c01 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -902,6 +903,7 @@ static int rtas_cpu_state_change_mask(enum rtas_cpu_state state,
cpumask_clear_cpu(cpu, cpus);
}
}
+   cond_resched();
}
 
return ret;
-- 
2.20.1



[PATCH 0/2] more migration vs CPU hotplug fixes

2019-07-18 Thread Nathan Lynch
Despite recent fixes, userspace-initiated CPU hotplug can still
destructively race with the migration code's CPU state manipulations
on the destination. Also, since such manipulations can consist of
mass-onlining and -offlining half or more of the CPUs in the system,
take care to reschedule when needed.

Nathan Lynch (2):
  powerpc/rtas: use device model APIs and serialization during LPM
  powerpc/rtas: allow rescheduling while changing cpu states

 arch/powerpc/kernel/rtas.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

-- 
2.20.1



[PATCH v2 3/3] tools/perf: Set 'trace_cycles' as default event for perf kvm record in powerpc

2019-07-18 Thread Anju T Sudhakar
Use 'trace_imc/trace_cycles' as the default event for 'perf kvm record'
in powerpc.

Signed-off-by: Anju T Sudhakar 
---
 tools/perf/arch/powerpc/util/kvm-stat.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/tools/perf/arch/powerpc/util/kvm-stat.c b/tools/perf/arch/powerpc/util/kvm-stat.c
index c55e7405940e..0a06626fb18a 100644
--- a/tools/perf/arch/powerpc/util/kvm-stat.c
+++ b/tools/perf/arch/powerpc/util/kvm-stat.c
@@ -177,8 +177,9 @@ int cpu_isa_init(struct perf_kvm_stat *kvm, const char *cpuid __maybe_unused)
 /*
  * Incase of powerpc architecture, pmu registers are programmable
  * by guest kernel. So monitoring guest via host may not provide
- * valid samples. It is better to fail the "perf kvm record"
- * with default "cycles" event to monitor guest in powerpc.
+ * valid samples with default 'cycles' event. It is better to use
+ * 'trace_imc/trace_cycles' event for guest profiling, since it
+ * can track the guest instruction pointer in the trace-record.
  *
  * Function to parse the arguments and return appropriate values.
  */
@@ -202,8 +203,14 @@ int kvm_add_default_arch_event(int *argc, const char **argv)
 
parse_options(j, tmp, event_options, NULL, PARSE_OPT_KEEP_UNKNOWN);
if (!event) {
-   free(tmp);
-   return -EINVAL;
+   if (pmu_have_event("trace_imc", "trace_cycles")) {
+   argv[j++] = strdup("-e");
+   argv[j++] = strdup("trace_imc/trace_cycles/");
+   *argc += 2;
+   } else {
+   free(tmp);
+   return -EINVAL;
+   }
}
 
free(tmp);
-- 
2.20.1



[PATCH v2 2/3] tools/perf: Add arch neutral function to choose event for perf kvm record

2019-07-18 Thread Anju T Sudhakar
'perf kvm record' uses 'cycles' (if the user did not specify any event) as
the default event to profile the guest.
This will not provide any proper samples from the guest on powerpc,
since there the PMUs are controlled by the guest rather than the host.

This patch adds a function to pick an arch-specific event for
'perf kvm record', instead of selecting 'cycles' as the default event for
all architectures.

For powerpc this function checks for any user-specified event, and if there
isn't any it returns an error instead of proceeding with the 'cycles' event.

Signed-off-by: Anju T Sudhakar 
---

Changes from v1->v2
* Cross-build issue for aarch64, reported by Ravi, is fixed.
---

 tools/perf/arch/powerpc/util/kvm-stat.c | 37 +
 tools/perf/builtin-kvm.c| 12 +++-
 tools/perf/util/kvm-stat.h  |  1 +
 3 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/tools/perf/arch/powerpc/util/kvm-stat.c b/tools/perf/arch/powerpc/util/kvm-stat.c
index f9db341c47b6..c55e7405940e 100644
--- a/tools/perf/arch/powerpc/util/kvm-stat.c
+++ b/tools/perf/arch/powerpc/util/kvm-stat.c
@@ -8,6 +8,7 @@
 
 #include "book3s_hv_exits.h"
 #include "book3s_hcalls.h"
+#include 
 
 #define NR_TPS 4
 
@@ -172,3 +173,39 @@ int cpu_isa_init(struct perf_kvm_stat *kvm, const char *cpuid __maybe_unused)
 
return ret;
 }
+
+/*
+ * Incase of powerpc architecture, pmu registers are programmable
+ * by guest kernel. So monitoring guest via host may not provide
+ * valid samples. It is better to fail the "perf kvm record"
+ * with default "cycles" event to monitor guest in powerpc.
+ *
+ * Function to parse the arguments and return appropriate values.
+ */
+int kvm_add_default_arch_event(int *argc, const char **argv)
+{
+   const char **tmp;
+   bool event = false;
+   int i, j = *argc;
+
+   const struct option event_options[] = {
+   OPT_BOOLEAN('e', "event", &event, NULL),
+   OPT_END()
+   };
+
+   tmp = calloc(j + 1, sizeof(char *));
+   if (!tmp)
+   return -EINVAL;
+
+   for (i = 0; i < j; i++)
+   tmp[i] = argv[i];
+
+   parse_options(j, tmp, event_options, NULL, PARSE_OPT_KEEP_UNKNOWN);
+   if (!event) {
+   free(tmp);
+   return -EINVAL;
+   }
+
+   free(tmp);
+   return 0;
+}
diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 5d2b34d290a3..d03750da051b 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -1510,11 +1510,21 @@ static int kvm_cmd_stat(const char *file_name, int argc, const char **argv)
 }
 #endif /* HAVE_KVM_STAT_SUPPORT */
 
+int __weak kvm_add_default_arch_event(int *argc __maybe_unused,
+   const char **argv __maybe_unused)
+{
+   return 0;
+}
+
 static int __cmd_record(const char *file_name, int argc, const char **argv)
 {
-   int rec_argc, i = 0, j;
+   int rec_argc, i = 0, j, ret;
const char **rec_argv;
 
+   ret = kvm_add_default_arch_event(&argc, argv);
+   if (ret)
+   return -EINVAL;
+
rec_argc = argc + 2;
rec_argv = calloc(rec_argc + 1, sizeof(char *));
rec_argv[i++] = strdup("record");
diff --git a/tools/perf/util/kvm-stat.h b/tools/perf/util/kvm-stat.h
index b3b2670e1a2b..81a5bf4fbc71 100644
--- a/tools/perf/util/kvm-stat.h
+++ b/tools/perf/util/kvm-stat.h
@@ -148,4 +148,5 @@ extern const char *kvm_entry_trace;
 extern const char *kvm_exit_trace;
 #endif /* HAVE_KVM_STAT_SUPPORT */
 
+extern int kvm_add_default_arch_event(int *argc, const char **argv);
 #endif /* __PERF_KVM_STAT_H */
-- 
2.20.1



[PATCH v2 1/3] tools/perf: Move kvm-stat header file from conditional inclusion to common include section

2019-07-18 Thread Anju T Sudhakar
Move the kvm-stat header file to the common include section, and place the
definitions in the header file under the conditional inclusion
`#ifdef HAVE_KVM_STAT_SUPPORT`.

This makes it possible to declare other perf kvm related function
prototypes in the kvm-stat header file, even ones that do not need
kvm-stat support.

Signed-off-by: Anju T Sudhakar 
---
 tools/perf/builtin-kvm.c   | 2 +-
 tools/perf/util/kvm-stat.h | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index b33c83489120..5d2b34d290a3 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -19,6 +19,7 @@
 #include "util/top.h"
 #include "util/data.h"
 #include "util/ordered-events.h"
+#include "util/kvm-stat.h"
 
 #include 
 #ifdef HAVE_TIMERFD_SUPPORT
@@ -55,7 +56,6 @@ static const char *get_filename_for_perf_kvm(void)
 }
 
 #ifdef HAVE_KVM_STAT_SUPPORT
-#include "util/kvm-stat.h"
 
 void exit_event_get_key(struct perf_evsel *evsel,
struct perf_sample *sample,
diff --git a/tools/perf/util/kvm-stat.h b/tools/perf/util/kvm-stat.h
index 1403dec189b4..b3b2670e1a2b 100644
--- a/tools/perf/util/kvm-stat.h
+++ b/tools/perf/util/kvm-stat.h
@@ -2,6 +2,8 @@
 #ifndef __PERF_KVM_STAT_H
 #define __PERF_KVM_STAT_H
 
+#ifdef HAVE_KVM_STAT_SUPPORT
+
 #include "../perf.h"
 #include "tool.h"
 #include "stat.h"
@@ -144,5 +146,6 @@ extern const int decode_str_len;
 extern const char *kvm_exit_reason;
 extern const char *kvm_entry_trace;
 extern const char *kvm_exit_trace;
+#endif /* HAVE_KVM_STAT_SUPPORT */
 
 #endif /* __PERF_KVM_STAT_H */
-- 
2.20.1



Re: [PATCH v3 4/6] x86,s390/mm: Move sme_active() and sme_me_mask to x86-specific header

2019-07-18 Thread Christoph Hellwig
On Thu, Jul 18, 2019 at 05:42:18PM +, Lendacky, Thomas wrote:
> You may want to try and build the out-of-tree nvidia driver just to be
> sure you can remove the EXPORT_SYMBOL(). But I believe that was related
> to the DMA mask check, which now removed, may no longer be a problem.

Out of tree drivers simply don't matter for kernel development decisions.


Re: [PATCH v3 0/6] Remove x86-specific code from generic headers

2019-07-18 Thread Lendacky, Thomas
On 7/17/19 10:28 PM, Thiago Jung Bauermann wrote:
> Hello,
> 
> This version is mostly about splitting up patch 2/3 into three separate
> patches, as suggested by Christoph Hellwig. Two other changes are a fix in
> patch 1 which wasn't selecting ARCH_HAS_MEM_ENCRYPT for s390 spotted by
> Janani and removal of sme_active and sev_active symbol exports as suggested
> by Christoph Hellwig.
> 
> These patches are applied on top of today's dma-mapping/for-next.
> 
> I don't have a way to test SME, SEV, nor s390's PEF so the patches have only
> been build tested.

I'll try and get this tested quickly to be sure everything works for SME
and SEV.

Thanks,
Tom

> 
> Changelog
> 
> Since v2:
> 
> - Patch "x86,s390: Move ARCH_HAS_MEM_ENCRYPT definition to arch/Kconfig"
>   - Added "select ARCH_HAS_MEM_ENCRYPT" to config S390. Suggested by Janani.
> 
> - Patch "DMA mapping: Move SME handling to x86-specific files"
>   - Split up into 3 new patches. Suggested by Christoph Hellwig.
> 
> - Patch "swiotlb: Remove call to sme_active()"
>   - New patch.
> 
> - Patch "dma-mapping: Remove dma_check_mask()"
>   - New patch.
> 
> - Patch "x86,s390/mm: Move sme_active() and sme_me_mask to x86-specific 
> header"
>   - New patch.
>   - Removed export of sme_active symbol. Suggested by Christoph Hellwig.
> 
> - Patch "fs/core/vmcore: Move sev_active() reference to x86 arch code"
>   - Removed export of sev_active symbol. Suggested by Christoph Hellwig.
> 
> - Patch "s390/mm: Remove sev_active() function"
>   - New patch.
> 
> Since v1:
> 
> - Patch "x86,s390: Move ARCH_HAS_MEM_ENCRYPT definition to arch/Kconfig"
>   - Remove definition of ARCH_HAS_MEM_ENCRYPT from s390/Kconfig as well.
>   - Reworded patch title and message a little bit.
> 
> - Patch "DMA mapping: Move SME handling to x86-specific files"
>   - Adapt s390's  as well.
>   - Remove dma_check_mask() from kernel/dma/mapping.c. Suggested by
> Christoph Hellwig.
> 
> Thiago Jung Bauermann (6):
>   x86,s390: Move ARCH_HAS_MEM_ENCRYPT definition to arch/Kconfig
>   swiotlb: Remove call to sme_active()
>   dma-mapping: Remove dma_check_mask()
>   x86,s390/mm: Move sme_active() and sme_me_mask to x86-specific header
>   fs/core/vmcore: Move sev_active() reference to x86 arch code
>   s390/mm: Remove sev_active() function
> 
>  arch/Kconfig|  3 +++
>  arch/s390/Kconfig   |  4 +---
>  arch/s390/include/asm/mem_encrypt.h |  5 +
>  arch/s390/mm/init.c |  8 +---
>  arch/x86/Kconfig|  4 +---
>  arch/x86/include/asm/mem_encrypt.h  | 10 ++
>  arch/x86/kernel/crash_dump_64.c |  5 +
>  arch/x86/mm/mem_encrypt.c   |  2 --
>  fs/proc/vmcore.c|  8 
>  include/linux/crash_dump.h  | 14 ++
>  include/linux/mem_encrypt.h | 15 +--
>  kernel/dma/mapping.c|  8 
>  kernel/dma/swiotlb.c|  3 +--
>  13 files changed, 42 insertions(+), 47 deletions(-)
> 


Re: [PATCH v3 5/6] fs/core/vmcore: Move sev_active() reference to x86 arch code

2019-07-18 Thread Lendacky, Thomas
On 7/17/19 10:28 PM, Thiago Jung Bauermann wrote:
> Secure Encrypted Virtualization is an x86-specific feature, so it shouldn't
> appear in generic kernel code because it forces non-x86 architectures to
> define the sev_active() function, which doesn't make a lot of sense.
> 
> To solve this problem, add an x86 elfcorehdr_read() function to override
> the generic weak implementation. To do that, it's necessary to make
> read_from_oldmem() public so that it can be used outside of vmcore.c.
> 
> Also, remove the export for sev_active() since it's only used in files that
> won't be built as modules.
> 
> Signed-off-by: Thiago Jung Bauermann 

Adding Lianbo and Baoquan, who recently worked on this, for their review.

Thanks,
Tom

> ---
>  arch/x86/kernel/crash_dump_64.c |  5 +
>  arch/x86/mm/mem_encrypt.c   |  1 -
>  fs/proc/vmcore.c|  8 
>  include/linux/crash_dump.h  | 14 ++
>  include/linux/mem_encrypt.h |  1 -
>  5 files changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/crash_dump_64.c b/arch/x86/kernel/crash_dump_64.c
> index 22369dd5de3b..045e82e8945b 100644
> --- a/arch/x86/kernel/crash_dump_64.c
> +++ b/arch/x86/kernel/crash_dump_64.c
> @@ -70,3 +70,8 @@ ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf, size_t csize,
>  {
>   return __copy_oldmem_page(pfn, buf, csize, offset, userbuf, true);
>  }
> +
> +ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos)
> +{
> + return read_from_oldmem(buf, count, ppos, 0, sev_active());
> +}
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 7139f2f43955..b1e823441093 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -349,7 +349,6 @@ bool sev_active(void)
>  {
>   return sme_me_mask && sev_enabled;
>  }
> -EXPORT_SYMBOL(sev_active);
>  
>  /* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */
>  bool force_dma_unencrypted(struct device *dev)
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 57957c91c6df..ca1f20bedd8c 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -100,9 +100,9 @@ static int pfn_is_ram(unsigned long pfn)
>  }
>  
>  /* Reads a page from the oldmem device from given offset. */
> -static ssize_t read_from_oldmem(char *buf, size_t count,
> - u64 *ppos, int userbuf,
> - bool encrypted)
> +ssize_t read_from_oldmem(char *buf, size_t count,
> +  u64 *ppos, int userbuf,
> +  bool encrypted)
>  {
>   unsigned long pfn, offset;
>   size_t nr_bytes;
> @@ -166,7 +166,7 @@ void __weak elfcorehdr_free(unsigned long long addr)
>   */
>  ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
>  {
> - return read_from_oldmem(buf, count, ppos, 0, sev_active());
> + return read_from_oldmem(buf, count, ppos, 0, false);
>  }
>  
>  /*
> diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
> index f774c5eb9e3c..4664fc1871de 100644
> --- a/include/linux/crash_dump.h
> +++ b/include/linux/crash_dump.h
> @@ -115,4 +115,18 @@ static inline int vmcore_add_device_dump(struct vmcoredd_data *data)
>   return -EOPNOTSUPP;
>  }
>  #endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
> +
> +#ifdef CONFIG_PROC_VMCORE
> +ssize_t read_from_oldmem(char *buf, size_t count,
> +  u64 *ppos, int userbuf,
> +  bool encrypted);
> +#else
> +static inline ssize_t read_from_oldmem(char *buf, size_t count,
> +u64 *ppos, int userbuf,
> +bool encrypted)
> +{
> + return -EOPNOTSUPP;
> +}
> +#endif /* CONFIG_PROC_VMCORE */
> +
>  #endif /* LINUX_CRASHDUMP_H */
> diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
> index 0c5b0ff9eb29..5c4a18a91f89 100644
> --- a/include/linux/mem_encrypt.h
> +++ b/include/linux/mem_encrypt.h
> @@ -19,7 +19,6 @@
>  #else/* !CONFIG_ARCH_HAS_MEM_ENCRYPT */
>  
>  static inline bool mem_encrypt_active(void) { return false; }
> -static inline bool sev_active(void) { return false; }
>  
>  #endif   /* CONFIG_ARCH_HAS_MEM_ENCRYPT */
>  
> 


Re: [PATCH v3 4/6] x86, s390/mm: Move sme_active() and sme_me_mask to x86-specific header

2019-07-18 Thread Lendacky, Thomas
On 7/17/19 10:28 PM, Thiago Jung Bauermann wrote:
> Now that generic code doesn't reference them, move sme_active() and
> sme_me_mask to x86's .
> 
> Also remove the export for sme_active() since it's only used in files that
> won't be built as modules. sme_me_mask on the other hand is used in
> arch/x86/kvm/svm.c (via __sme_set() and __psp_pa()) which can be built as a
> module so its export needs to stay.

You may want to try and build the out-of-tree nvidia driver just to be
sure you can remove the EXPORT_SYMBOL(). But I believe that was related
to the DMA mask check, which now removed, may no longer be a problem.

> 
> Signed-off-by: Thiago Jung Bauermann 

Reviewed-by: Tom Lendacky 

> ---
>  arch/s390/include/asm/mem_encrypt.h |  4 +---
>  arch/x86/include/asm/mem_encrypt.h  | 10 ++
>  arch/x86/mm/mem_encrypt.c   |  1 -
>  include/linux/mem_encrypt.h | 14 +-
>  4 files changed, 12 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/s390/include/asm/mem_encrypt.h b/arch/s390/include/asm/mem_encrypt.h
> index 3eb018508190..ff813a56bc30 100644
> --- a/arch/s390/include/asm/mem_encrypt.h
> +++ b/arch/s390/include/asm/mem_encrypt.h
> @@ -4,9 +4,7 @@
>  
>  #ifndef __ASSEMBLY__
>  
> -#define sme_me_mask  0ULL
> -
> -static inline bool sme_active(void) { return false; }
> +static inline bool mem_encrypt_active(void) { return false; }
>  extern bool sev_active(void);
>  
>  int set_memory_encrypted(unsigned long addr, int numpages);
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index 0c196c47d621..848ce43b9040 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -92,6 +92,16 @@ early_set_memory_encrypted(unsigned long vaddr, unsigned long size) { return 0;
>  
>  extern char __start_bss_decrypted[], __end_bss_decrypted[], __start_bss_decrypted_unused[];
>  
> +static inline bool mem_encrypt_active(void)
> +{
> + return sme_me_mask;
> +}
> +
> +static inline u64 sme_get_me_mask(void)
> +{
> + return sme_me_mask;
> +}
> +
>  #endif   /* __ASSEMBLY__ */
>  
>  #endif   /* __X86_MEM_ENCRYPT_H__ */
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index c805f0a5c16e..7139f2f43955 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -344,7 +344,6 @@ bool sme_active(void)
>  {
>   return sme_me_mask && !sev_enabled;
>  }
> -EXPORT_SYMBOL(sme_active);
>  
>  bool sev_active(void)
>  {
> diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
> index 470bd53a89df..0c5b0ff9eb29 100644
> --- a/include/linux/mem_encrypt.h
> +++ b/include/linux/mem_encrypt.h
> @@ -18,23 +18,11 @@
>  
>  #else/* !CONFIG_ARCH_HAS_MEM_ENCRYPT */
>  
> -#define sme_me_mask  0ULL
> -
> -static inline bool sme_active(void) { return false; }
> +static inline bool mem_encrypt_active(void) { return false; }
>  static inline bool sev_active(void) { return false; }
>  
>  #endif   /* CONFIG_ARCH_HAS_MEM_ENCRYPT */
>  
> -static inline bool mem_encrypt_active(void)
> -{
> - return sme_me_mask;
> -}
> -
> -static inline u64 sme_get_me_mask(void)
> -{
> - return sme_me_mask;
> -}
> -
>  #ifdef CONFIG_AMD_MEM_ENCRYPT
>  /*
>   * The __sme_set() and __sme_clr() macros are useful for adding or removing
> 


Re: [PATCH v3 3/6] dma-mapping: Remove dma_check_mask()

2019-07-18 Thread Lendacky, Thomas
On 7/17/19 10:28 PM, Thiago Jung Bauermann wrote:
> sme_active() is an x86-specific function so it's better not to call it from
> generic code. Christoph Hellwig mentioned that "There is no reason why we
> should have a special debug printk just for one specific reason why there
> is a requirement for a large DMA mask.", so just remove dma_check_mask().
> 
> Signed-off-by: Thiago Jung Bauermann 

Reviewed-by: Tom Lendacky 

> ---
>  kernel/dma/mapping.c | 8 
>  1 file changed, 8 deletions(-)
> 
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index 1f628e7ac709..61eeefbfcb36 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -291,12 +291,6 @@ void dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
>  }
>  EXPORT_SYMBOL(dma_free_attrs);
>  
> -static inline void dma_check_mask(struct device *dev, u64 mask)
> -{
> - if (sme_active() && (mask < (((u64)sme_get_me_mask() << 1) - 1)))
> - dev_warn(dev, "SME is active, device will require DMA bounce buffers\n");
> -}
> -
>  int dma_supported(struct device *dev, u64 mask)
>  {
>   const struct dma_map_ops *ops = get_dma_ops(dev);
> @@ -327,7 +321,6 @@ int dma_set_mask(struct device *dev, u64 mask)
>   return -EIO;
>  
>   arch_dma_set_mask(dev, mask);
> - dma_check_mask(dev, mask);
>   *dev->dma_mask = mask;
>   return 0;
>  }
> @@ -345,7 +338,6 @@ int dma_set_coherent_mask(struct device *dev, u64 mask)
>   if (!dma_supported(dev, mask))
>   return -EIO;
>  
> - dma_check_mask(dev, mask);
>   dev->coherent_dma_mask = mask;
>   return 0;
>  }
> 


Re: [PATCH v3 2/6] swiotlb: Remove call to sme_active()

2019-07-18 Thread Lendacky, Thomas
On 7/17/19 10:28 PM, Thiago Jung Bauermann wrote:
> sme_active() is an x86-specific function so it's better not to call it from
> generic code.
> 
> There's no need to mention which memory encryption feature is active, so
> just use a more generic message. Besides, other architectures will have
> different names for similar technology.
> 
> Signed-off-by: Thiago Jung Bauermann 

Reviewed-by: Tom Lendacky 

> ---
>  kernel/dma/swiotlb.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 62fa5a82a065..e52401f94e91 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -459,8 +459,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>   panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
>  
>   if (mem_encrypt_active())
> - pr_warn_once("%s is active and system is using DMA bounce buffers\n",
> -  sme_active() ? "SME" : "SEV");
> + pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
>  
>   mask = dma_get_seg_boundary(hwdev);
>  
> 


Re: [PATCH v3 6/6] s390/mm: Remove sev_active() function

2019-07-18 Thread Thiago Jung Bauermann


Halil Pasic  writes:

> On Thu, 18 Jul 2019 10:44:56 +0200
> Christoph Hellwig  wrote:
>
>> > -/* are we a protected virtualization guest? */
>> > -bool sev_active(void)
>> > -{
>> > -  return is_prot_virt_guest();
>> > -}
>> > -
>> >  bool force_dma_unencrypted(struct device *dev)
>> >  {
>> > -  return sev_active();
>> > +  return is_prot_virt_guest();
>> >  }
>> 
>> Do we want to keep the comment for force_dma_unencrypted?
>
> Yes we do. With the comment transferred:
>
> Reviewed-by: Halil Pasic 

Thanks for your review.

Here is the new version. Should I send a new patch series with this
patch and the Reviewed-by on the other ones?

-- 
Thiago Jung Bauermann
IBM Linux Technology Center


From 1726205c73fb9e29feaa3d8909c5a1b0f2054c04 Mon Sep 17 00:00:00 2001
From: Thiago Jung Bauermann 
Date: Mon, 15 Jul 2019 20:50:43 -0300
Subject: [PATCH v4] s390/mm: Remove sev_active() function

All references to sev_active() were moved to arch/x86 so we don't need to
define it for s390 anymore.

Signed-off-by: Thiago Jung Bauermann 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Halil Pasic 
---
 arch/s390/include/asm/mem_encrypt.h | 1 -
 arch/s390/mm/init.c | 7 +--
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/s390/include/asm/mem_encrypt.h b/arch/s390/include/asm/mem_encrypt.h
index ff813a56bc30..2542cbf7e2d1 100644
--- a/arch/s390/include/asm/mem_encrypt.h
+++ b/arch/s390/include/asm/mem_encrypt.h
@@ -5,7 +5,6 @@
 #ifndef __ASSEMBLY__
 
 static inline bool mem_encrypt_active(void) { return false; }
-extern bool sev_active(void);
 
 int set_memory_encrypted(unsigned long addr, int numpages);
 int set_memory_decrypted(unsigned long addr, int numpages);
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 78c319c5ce48..6c43a1ed1beb 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -156,14 +156,9 @@ int set_memory_decrypted(unsigned long addr, int numpages)
 }
 
 /* are we a protected virtualization guest? */
-bool sev_active(void)
-{
-   return is_prot_virt_guest();
-}
-
 bool force_dma_unencrypted(struct device *dev)
 {
-   return sev_active();
+   return is_prot_virt_guest();
 }
 
 /* protected virtualization */


Re: [PATCH v3 6/6] s390/mm: Remove sev_active() function

2019-07-18 Thread Thiago Jung Bauermann


Christoph Hellwig  writes:

>> -/* are we a protected virtualization guest? */
>> -bool sev_active(void)
>> -{
>> -return is_prot_virt_guest();
>> -}
>> -
>>  bool force_dma_unencrypted(struct device *dev)
>>  {
>> -return sev_active();
>> +return is_prot_virt_guest();
>>  }
>
> Do we want to keep the comment for force_dma_unencrypted?
>
> Otherwise looks good:
>
> Reviewed-by: Christoph Hellwig 

Thank you for your review on all these patches.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center



[PATCH] powerpc/rtas: unexport rtas_online_cpus_mask, rtas_offline_cpus_mask

2019-07-18 Thread Nathan Lynch
These aren't used by modular code, nor should they be.

Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation")
Signed-off-by: Nathan Lynch 
---
 arch/powerpc/kernel/rtas.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 9b4d2a2ffb4f..2d4c9a0c4f08 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -926,13 +926,11 @@ int rtas_online_cpus_mask(cpumask_var_t cpus)
 
return ret;
 }
-EXPORT_SYMBOL(rtas_online_cpus_mask);
 
 int rtas_offline_cpus_mask(cpumask_var_t cpus)
 {
return rtas_cpu_state_change_mask(DOWN, cpus);
 }
-EXPORT_SYMBOL(rtas_offline_cpus_mask);
 
 int rtas_ibm_suspend_me(u64 handle)
 {
-- 
2.20.1



Re: [PATCH v9 08/10] open: openat2(2) syscall

2019-07-18 Thread Aleksa Sarai
On 2019-07-18, Arnd Bergmann  wrote:
> On Sat, Jul 6, 2019 at 5:00 PM Aleksa Sarai  wrote:
> 
> > diff --git a/arch/alpha/kernel/syscalls/syscall.tbl 
> > b/arch/alpha/kernel/syscalls/syscall.tbl
> > index 9e7704e44f6d..1703d048c141 100644
> > --- a/arch/alpha/kernel/syscalls/syscall.tbl
> > +++ b/arch/alpha/kernel/syscalls/syscall.tbl
> > @@ -461,6 +461,7 @@
> >  530common  getegid sys_getegid
> >  531common  geteuid sys_geteuid
> >  532common  getppid sys_getppid
> > +533common  openat2 sys_openat2
> >  # all other architectures have common numbers for new syscall, alpha
> >  # is the exception.
> >  534common  pidfd_send_signal   sys_pidfd_send_signal
> 
> My plan here was to add new syscalls in the same order as everwhere else,
> just with the number 110 higher. In the long run, I hope we can automate
> this.

Alright, I will adjust this.

> > diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
> > index aaf479a9e92d..4ad262698396 100644
> > --- a/arch/arm/tools/syscall.tbl
> > +++ b/arch/arm/tools/syscall.tbl
> > @@ -447,3 +447,4 @@
> >  431common  fsconfigsys_fsconfig
> >  432common  fsmount sys_fsmount
> >  433common  fspick  sys_fspick
> > +434common  openat2 sys_openat2
> 
> 434 is already used in linux-next, I suggest you use 437 (Palmer
> just submitted fchmodat4, which could become 436).

437 sounds good to me.

> > +/**
> > + * Arguments for how openat2(2) should open the target path. If @extra is zero,
> > + * then openat2(2) is identical to openat(2).
> > + *
> > + * @flags: O_* flags (unknown flags ignored).
> > + * @mode: O_CREAT file mode (ignored otherwise).
> > + * @upgrade_mask: restrict how the O_PATH may be re-opened (ignored otherwise).
> > + * @resolve: RESOLVE_* flags (-EINVAL on unknown flags).
> > + * @reserved: reserved for future extensions, must be zeroed.
> > + */
> > +struct open_how {
> > +   __u32 flags;
> > +   union {
> > +   __u16 mode;
> > +   __u16 upgrade_mask;
> > +   };
> > +   __u16 resolve;
> > +   __u64 reserved[7]; /* must be zeroed */
> > +};
> 
> We can have system calls with up to six arguments on all architectures, so
> this could still be done more conventionally without the indirection: like
> 
> long openat2(int dfd, const char __user * filename, int flags, mode_t mode_mask, __u16 resolve);
> 
> In fact, that seems similar enough to the existing openat() that I think
> you could also just add the fifth argument to the existing call when
> a newly defined flag is set, similarly to how we only use the 'mode'
> argument when O_CREAT or O_TMPFILE are set.

I considered doing this (and even had a preliminary version of it), but
I discovered that I was not in favour of this idea -- once I started to
write tests using it -- for a few reasons:

  1. It doesn't really allow for clean extension for a future 6th
 argument (because you are using up O_* flags to signify "use the
 next argument", and O_* flags don't give -EINVAL if they're
 unknown). Now, yes you can do the on-start runtime check that
 everyone does -- but I've never really liked having to do it.

 Having reserved padding for later extensions (that is actually
 checked and gives -EINVAL) matches more modern syscall designs.

  2. I really was hoping that the variadic openat(2) could be done away
 using this union setup (Linus said he didn't like it, and suggested
 using something like 'struct stat' as an argument for openat(2) --
 though personally I am not sure I would personally like to use an
 interface like that).

  3. In order to avoid wasting a syscall argument for mode/mask you need
 to either have something like your suggested mode_mask (which makes
 the syscall arguments less consistent) or have some sort of
 mode-like argument that is treated specially (which is really awful
 on multiple levels -- this one I also tried and even wrote my
 original tests using). And in both cases, the shims for
 open{,at}(2) are somewhat less clean.

All of that being said, I'd be happy to switch to whatever you think
makes the most sense. As long as it's possible to get an O_PATH with
RESOLVE_IN_ROOT set, I'm happy.
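
To make point 1 above concrete, here is a sketch of the kernel-side
check that gives the reserved padding teeth (RESOLVE_ALL_FLAGS is a
stand-in name for the mask of known RESOLVE_* bits, not the real macro):

#include <linux/errno.h>
#include <linux/kernel.h>

#define RESOLVE_ALL_FLAGS	0x00ff	/* stand-in, not the real mask */

static int validate_open_how(const struct open_how *how)
{
	size_t i;

	/* Unknown resolve bits fail hard, unlike unknown O_* flags. */
	if (how->resolve & ~RESOLVE_ALL_FLAGS)
		return -EINVAL;

	/* Reserved space must be zero so it can gain meaning later. */
	for (i = 0; i < ARRAY_SIZE(how->reserved); i++)
		if (how->reserved[i] != 0)
			return -EINVAL;

	return 0;
}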

> > --- a/include/linux/syscalls.h
> > +++ b/include/linux/syscalls.h
> 
> This file seems to lack a declaration for the system call, which means it
> will cause a build failure on some architectures, e.g. arch/arc/kernel/sys.c:
> 
> #define __SYSCALL(nr, call) [nr] = (call),
> void *sys_call_table[NR_syscalls] = {
> [0 ... NR_syscalls-1] = sys_ni_syscall,
> #include 
> };

Thanks, I will fix this.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)

Re: [PATCH v9 08/10] open: openat2(2) syscall

2019-07-18 Thread Aleksa Sarai
On 2019-07-18, Rasmus Villemoes  wrote:
> On 06/07/2019 16.57, Aleksa Sarai wrote:
> > --- a/fs/open.c
> > +++ b/fs/open.c
> > @@ -928,24 +928,32 @@ struct file *open_with_fake_path(const struct path *path, int flags,
> >  }
> >  EXPORT_SYMBOL(open_with_fake_path);
> >  
> > -static inline int build_open_flags(int flags, umode_t mode, struct open_flags *op)
> > +static inline int build_open_flags(struct open_how how, struct open_flags *op)
> >  {
> 
> How does passing such a huge struct by value affect code generation?
> Does gcc actually inline the function (and does it even inline the old
> one given that it's already non-trivial and has more than one caller).

I'm not sure, but I'll just do what you suggested with passing a const
reference and just copying the few fields that actually are touched by
this function.
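
A rough sketch of that shape (kernel-internal details elided; only the
changed interface matters here):

static inline int build_open_flags(const struct open_how *how,
				   struct open_flags *op)
{
	/* Local copies of the only fields this function reads; the
	 * 64-byte struct itself is never copied by value. */
	int flags = how->flags;
	umode_t mode = how->mode;

	op->open_flag = flags;
	op->mode = mode;
	/* ... remaining checks as in the original function ... */
	return 0;
}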

> >  
> > diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h
> > index 2868ae6c8fc1..e59917292213 100644
> > --- a/include/linux/fcntl.h
> > +++ b/include/linux/fcntl.h
> > @@ -4,13 +4,26 @@
> >  
> >  #include 
> >  
> > -/* list of all valid flags for the open/openat flags argument: */
> > +/* Should open_how.mode be set for older syscalls wrappers? */
> > +#define OPENHOW_MODE(flags, mode) \
> > +   (((flags) | (O_CREAT | __O_TMPFILE)) ? (mode) : 0)
> > +
> 
> Typo: (((flags) & (O_CREAT | __O_TMPFILE)) ? (mode) : 0)

Yup, thanks. I'm not sure why my tests passed on v9 with this bug (they
didn't pass in my v10-draft until I fixed this bug earlier today).

> 
> > +/**
> > + * Arguments for how openat2(2) should open the target path. If @extra is 
> > zero,
> > + * then openat2(2) is identical to openat(2).
> > + *
> > + * @flags: O_* flags (unknown flags ignored).
> > + * @mode: O_CREAT file mode (ignored otherwise).
> 
> should probably say "O_CREAT/O_TMPFILE file mode".

:+1:

> > + * @upgrade_mask: restrict how the O_PATH may be re-opened (ignored 
> > otherwise).
> > + * @resolve: RESOLVE_* flags (-EINVAL on unknown flags).
> > + * @reserved: reserved for future extensions, must be zeroed.
> > + */
> > +struct open_how {
> > +   __u32 flags;
> > +   union {
> > +   __u16 mode;
> > +   __u16 upgrade_mask;
> > +   };
> > +   __u16 resolve;
> 
> So mode and upgrade_mask are naturally u16 aka mode_t. And yes, they
> probably never need to be used together, so the union works. That then
> makes the next member 2-byte aligned, so using a u16 for the resolve
> flags brings us to an 8-byte boundary, and 11 unused flag bits should be
> enough for a while. But it seems a bit artificial to cram all this
> together and then add 56 bytes of reserved space.

I will happily admit that padding to 64 bytes is probably _very_ extreme
(I picked it purely because it's the size of a cache-line so anything
bigger makes even less sense). I was hoping someone would suggest a
better size once I posted the patchset, since I couldn't think of a good
answer myself.

Do you have any suggestions for a better layout or padding size?
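
For scale, a layout reserving a single quadword (purely illustrative,
not what v9 does) would be:

    struct open_how {
            __u32 flags;
            union {
                    __u16 mode;
                    __u16 upgrade_mask;
            };
            __u16 resolve;
            __u64 reserved;         /* must be zeroed */
    };                              /* sizeof == 16 */

i.e. 4 + 2 + 2 = 8 bytes of payload plus 8 reserved, against 8 + 56 in
the 64-byte version quoted above.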

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH





Re: [PATCH v9 08/10] open: openat2(2) syscall

2019-07-18 Thread Arnd Bergmann
On Sat, Jul 6, 2019 at 5:00 PM Aleksa Sarai  wrote:

> diff --git a/arch/alpha/kernel/syscalls/syscall.tbl 
> b/arch/alpha/kernel/syscalls/syscall.tbl
> index 9e7704e44f6d..1703d048c141 100644
> --- a/arch/alpha/kernel/syscalls/syscall.tbl
> +++ b/arch/alpha/kernel/syscalls/syscall.tbl
> @@ -461,6 +461,7 @@
>  530common  getegid sys_getegid
>  531common  geteuid sys_geteuid
>  532common  getppid sys_getppid
> +533common  openat2 sys_openat2
>  # all other architectures have common numbers for new syscall, alpha
>  # is the exception.
>  534common  pidfd_send_signal   sys_pidfd_send_signal

My plan here was to add new syscalls in the same order as everywhere
else, just with the number 110 higher. In the long run, I hope we can
automate this.

> diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
> index aaf479a9e92d..4ad262698396 100644
> --- a/arch/arm/tools/syscall.tbl
> +++ b/arch/arm/tools/syscall.tbl
> @@ -447,3 +447,4 @@
>  431common  fsconfigsys_fsconfig
>  432common  fsmount sys_fsmount
>  433common  fspick  sys_fspick
> +434common  openat2 sys_openat2

434 is already used in linux-next, I suggest you use 437 (Palmer
just submitted fchmodat4, which could become 436).

> +/**
> + * Arguments for how openat2(2) should open the target path. If @extra is 
> zero,
> + * then openat2(2) is identical to openat(2).
> + *
> + * @flags: O_* flags (unknown flags ignored).
> + * @mode: O_CREAT file mode (ignored otherwise).
> + * @upgrade_mask: restrict how the O_PATH may be re-opened (ignored 
> otherwise).
> + * @resolve: RESOLVE_* flags (-EINVAL on unknown flags).
> + * @reserved: reserved for future extensions, must be zeroed.
> + */
> +struct open_how {
> +   __u32 flags;
> +   union {
> +   __u16 mode;
> +   __u16 upgrade_mask;
> +   };
> +   __u16 resolve;
> +   __u64 reserved[7]; /* must be zeroed */
> +};

We can have system calls with up to six arguments on all architectures,
so this could still be done more conventionally, without the
indirection, like:
long openat2(int dfd, const char __user * filename, int flags, mode_t
mode_mask, __u16 resolve);

In fact, that seems similar enough to the existing openat() that I think
you could also just add the fifth argument to the existing call when
a newly defined flag is set, similarly to how we only use the 'mode'
argument when O_CREAT or O_TMPFILE are set.
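
Illustrated as a hypothetical (the flag name and value below are made
up for the example, not an existing API):

    #define O_HAS_RESOLVE   040000000   /* hypothetical: "arg 5 is valid" */

    /* inside the existing openat() entry point: */
    if (flags & O_HAS_RESOLVE)
            how.resolve = resolve;

Note that old kernels would silently ignore the unknown O_* bit rather
than return -EINVAL, which is the extensibility concern Aleksa raises
earlier in this thread.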

> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h

This file seems to lack a declaration for the system call, which means it
will cause a build failure on some architectures, e.g. arch/arc/kernel/sys.c:

#define __SYSCALL(nr, call) [nr] = (call),
void *sys_call_table[NR_syscalls] = {
[0 ... NR_syscalls-1] = sys_ni_syscall,
#include 
};

Arnd


Re: [PATCH v9 08/10] open: openat2(2) syscall

2019-07-18 Thread Rasmus Villemoes
On 06/07/2019 16.57, Aleksa Sarai wrote:
> 
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -928,24 +928,32 @@ struct file *open_with_fake_path(const struct path 
> *path, int flags,
>  }
>  EXPORT_SYMBOL(open_with_fake_path);
>  
> -static inline int build_open_flags(int flags, umode_t mode, struct 
> open_flags *op)
> +static inline int build_open_flags(struct open_how how, struct open_flags 
> *op)
>  {

How does passing such a huge struct by value affect code generation?
Does gcc actually inline the function (and does it even inline the old
one given that it's already non-trivial and has more than one caller).

>   int lookup_flags = 0;
> - int acc_mode = ACC_MODE(flags);
> + int opath_mask = 0;
> + int acc_mode = ACC_MODE(how.flags);
> +
> + if (how.resolve & ~VALID_RESOLVE_FLAGS)
> + return -EINVAL;
> + if (!(how.flags & (O_PATH | O_CREAT | __O_TMPFILE)) && how.mode != 0)
> + return -EINVAL;
> + if (memchr_inv(how.reserved, 0, sizeof(how.reserved)))
> + return -EINVAL;

How about passing how by const reference, and copy the few fields you
need to local variables. That would at least simplify this patch by
eliminating a lot of the

> - flags &= VALID_OPEN_FLAGS;
> + how.flags &= VALID_OPEN_FLAGS;
>  
> - if (flags & (O_CREAT | __O_TMPFILE))
> - op->mode = (mode & S_IALLUGO) | S_IFREG;
> + if (how.flags & (O_CREAT | __O_TMPFILE))
> + op->mode = (how.mode & S_IALLUGO) | S_IFREG;

churn.

>  
> diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h
> index 2868ae6c8fc1..e59917292213 100644
> --- a/include/linux/fcntl.h
> +++ b/include/linux/fcntl.h
> @@ -4,13 +4,26 @@
>  
>  #include 
>  
> -/* list of all valid flags for the open/openat flags argument: */
> +/* Should open_how.mode be set for older syscalls wrappers? */
> +#define OPENHOW_MODE(flags, mode) \
> + (((flags) | (O_CREAT | __O_TMPFILE)) ? (mode) : 0)
> +

Typo: (((flags) & (O_CREAT | __O_TMPFILE)) ? (mode) : 0)
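
(With `|` the left-hand side is always non-zero, since O_CREAT |
__O_TMPFILE is a non-zero constant, so the condition is always true and
`mode` is passed through unconditionally, instead of only for
O_CREAT/O_TMPFILE opens.)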

> +/**
> + * Arguments for how openat2(2) should open the target path. If @extra is 
> zero,
> + * then openat2(2) is identical to openat(2).
> + *
> + * @flags: O_* flags (unknown flags ignored).
> + * @mode: O_CREAT file mode (ignored otherwise).

should probably say "O_CREAT/O_TMPFILE file mode".

> + * @upgrade_mask: restrict how the O_PATH may be re-opened (ignored 
> otherwise).
> + * @resolve: RESOLVE_* flags (-EINVAL on unknown flags).
> + * @reserved: reserved for future extensions, must be zeroed.
> + */
> +struct open_how {
> + __u32 flags;
> + union {
> + __u16 mode;
> + __u16 upgrade_mask;
> + };
> + __u16 resolve;

So mode and upgrade_mask are naturally u16 aka mode_t. And yes, they
probably never need to be used together, so the union works. That then
makes the next member 2-byte aligned, so using a u16 for the resolve
flags brings us to an 8-byte boundary, and 11 unused flag bits should be
enough for a while. But it seems a bit artificial to cram all this
together and then add 56 bytes of reserved space.

Rasmus


Re: Crash in kvmppc_xive_release()

2019-07-18 Thread Cédric Le Goater
On 18/07/2019 14:49, Michael Ellerman wrote:
> Anyone else seen this?
> 
> This is running ~176 VMs on a Power9 (1 per thread), host crashes:

This is beyond the underlying limits of XIVE. 

As we allocate 2K vCPUs per VM, that is 16K EQs for interrupt events.
The overall EQ count is 1M. I'll let you calculate our maximum number
of VMs ...
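
(Spelling the arithmetic out, assuming the 16K figure means 8 EQs -- one
per priority -- per vCPU: 2048 x 8 = 16384 EQs per VM, and 1048576 /
16384 = 64 VMs before the allocator runs dry, well short of the ~176
attempted here.)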

>   [   66.403750][ T6423] xive: OPAL failed to allocate VCPUs order 11, err -10

Hence, the OPAL XIVE driver fails which is good but ...

>   [188523.080935670,4] Spent 1783 msecs in OPAL call 135!
>   [   66.484965][ T6250] BUG: Kernel NULL pointer dereference at 0x42e8
>   [   66.485558][ T6250] Faulting instruction address: 0xc00811a33fcc
>   [   66.485990][ T6250] Oops: Kernel access of bad area, sig: 7 [#1]
>   [   66.486405][ T6250] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 
> NUMA PowerNV
>   [   66.486967][ T6250] Modules linked in: kvm_hv kvm
>   [   66.487275][ T6250] CPU: 107 PID: 6250 Comm: qemu-system-ppc Not tainted 
> 5.2.0-rc2-gcc9x-gf5a9e488d623 #1
>   [   66.487902][ T6250] NIP:  c00811a33fcc LR: c00811a33fc4 CTR: 
> c05d5970
>   [   66.488383][ T6250] REGS: c01fabebb900 TRAP: 0300   Not tainted  
> (5.2.0-rc2-gcc9x-gf5a9e488d623)
>   [   66.488933][ T6250] MSR:  9280b033 
>   CR: 24028224  XER: 
>   [   66.489724][ T6250] CFAR: c05d6a4c DAR: 42e8 DSISR: 
> 0008 IRQMASK: 0 
>   [   66.489724][ T6250] GPR00: c00811a33fc4 c01fabebbb90 
> c00811a5a200 c1399928 
>   [   66.489724][ T6250] GPR04: 0001 c047b8d0 
>  0001 
>   [   66.489724][ T6250] GPR08:   
> c01fa8c42f00 c00811a3af20 
>   [   66.489724][ T6250] GPR12: 8000 c0002023ff65a880 
> 00013a1b4000 0002 
>   [   66.489724][ T6250] GPR16: 1000 0002 
> 0001 00012b194cc0 
>   [   66.489724][ T6250] GPR20: 7fffb1645250 0001 
> 0031  
>   [   66.489724][ T6250] GPR24: 7fffb16408d8 c01ffafb62e0 
> c01f78699360 c01ff35d0620 
>   [   66.489724][ T6250] GPR28: c01ed0ed c01ecd90 
>  c01ed0ed 
>   [   66.495211][ T6250] NIP [c00811a33fcc] 
> kvmppc_xive_release+0x54/0x1b0 [kvm]
>   [   66.495642][ T6250] LR [c00811a33fc4] kvmppc_xive_release+0x4c/0x1b0 
> [kvm]
>   [   66.496101][ T6250] Call Trace:
>   [   66.496314][ T6250] [c01fabebbb90] [c00811a33fc4] 
> kvmppc_xive_release+0x4c/0x1b0 [kvm] (unreliable)
>   [   66.496893][ T6250] [c01fabebbbf0] [c00811a18d54] 
> kvm_device_release+0xac/0xf0 [kvm]
>   [   66.497399][ T6250] [c01fabebbc30] [c0442f8c] 
> __fput+0xec/0x310
>   [   66.497815][ T6250] [c01fabebbc90] [c0145f94] 
> task_work_run+0x114/0x170
>   [   66.498296][ T6250] [c01fabebbce0] [c0115274] 
> do_exit+0x454/0xee0
>   [   66.498743][ T6250] [c01fabebbdc0] [c0115dd0] 
> do_group_exit+0x60/0xe0
>   [   66.499201][ T6250] [c01fabebbe00] [c0115e74] 
> sys_exit_group+0x24/0x40
>   [   66.499747][ T6250] [c01fabebbe20] [c000b83c] 
> system_call+0x5c/0x70
>   [   66.500261][ T6250] Instruction dump:
>   [   66.500484][ T6250] fbe1fff8 fba1ffe8 fbc1fff0 7c7c1b78 f8010010 
> f821ffa1 eba30010 e87d0010 
>   [   66.501006][ T6250] ebdd 48006f61 e8410018 3920  
> 913e42e8 48007f3d e8410018 
>   [   66.501529][ T6250] ---[ end trace c021a6ca03594ec3 ]---
>   [   66.513119][ T6150] xive: OPAL failed to allocate VCPUs order 11, err -10


... the rollback code in case of such an error must be bogus. It
clearly was never tested :/

Thanks,

C.



Re: [PATCH] powerpc: mm: Limit rma_size to 1TB when running without HV mode

2019-07-18 Thread Michael Ellerman
On Wed, 2019-07-10 at 05:20:18 UTC, Suraj Jitindar Singh wrote:
> The virtual real mode addressing (VRMA) mechanism is used when a
> partition is using HPT (Hash Page Table) translation and performs
> real mode accesses (MSR[IR|DR] = 0) in non-hypervisor mode. In this
> mode effective address bits 0:23 are treated as zero (i.e. the access
> is aliased to 0) and the access is performed using an implicit 1TB SLB
> entry.
> 
> The size of the RMA (Real Memory Area) is communicated to the guest as
> the size of the first memory region in the device tree. Because of
> the mechanism described above, it can be expected not to exceed 1TB. In the
> event that the host erroneously represents the RMA as being larger than
> 1TB, guest accesses in real mode to memory addresses above 1TB will be
> aliased down to below 1TB. This means that a memory access performed in
> real mode may differ from one performed in virtual mode for the same memory
> address, which would likely have unintended consequences.
> 
> To avoid this outcome, have the guest explicitly limit the size of the
> RMA to the current maximum, which is 1TB. This means that even if the
> first memory block is larger than 1TB, only the first 1TB should be
> accessed in real mode.
> 
> Signed-off-by: Suraj Jitindar Singh 
> Tested-by: Satheesh Rajendran 
> Reviewed-by: David Gibson 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/da0ef93310e67ae6902efded60b6724dab27a5d1

cheers
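
For readers following along, the guest-side clamp is roughly as follows
(a sketch only; see the commit above for the exact code):

    /* In the early hash-MMU memory setup: a non-hypervisor (pseries)
     * guest reaches the RMA through the 1TB VRMA, so never advertise
     * more than that. */
    if (!early_cpu_has_feature(CPU_FTR_HVMODE))
            ppc64_rma_size = min_t(u64, ppc64_rma_size, 1UL << 40);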


Re: [PATCH 1/3] KVM: PPC: Book3S HV: Always save guest pmu for guest capable of nesting

2019-07-18 Thread Michael Ellerman
On Wed, 2019-07-03 at 01:20:20 UTC, Suraj Jitindar Singh wrote:
> The performance monitoring unit (PMU) registers are saved on guest exit
> when the guest has set the pmcregs_in_use flag in its lppaca, if it
> exists, or unconditionally if it doesn't. If a nested guest is being
> run then the hypervisor doesn't, and in most cases can't, know if the
> pmu registers are in use since it doesn't know the location of the lppaca
> for the nested guest, although it may have one for its immediate guest.
> This results in the values of these registers being lost across nested
> guest entry and exit in the case where the nested guest was making use
> of the performance monitoring facility while its nested guest hypervisor
> wasn't.
> 
> Furthermore, the hypervisor could interrupt a guest hypervisor between
> when it has loaded up the pmu registers and it calling H_ENTER_NESTED or
> between returning from the nested guest to the guest hypervisor and the
> guest hypervisor reading the pmu registers, in kvmhv_p9_guest_entry().
> This means that it isn't sufficient to just save the pmu registers when
> entering or exiting a nested guest, but that it is necessary to always
> save the pmu registers whenever a guest is capable of running nested guests
> to ensure the register values aren't lost in the context switch.
> 
> Ensure the pmu register values are preserved by always saving their
> value into the vcpu struct when a guest is capable of running nested
> guests.
> 
> This should have minimal performance impact however any impact can be
> avoided by booting a guest with "-machine pseries,cap-nested-hv=false"
> on the qemu commandline.
> 
> Fixes: 95a6432ce903 "KVM: PPC: Book3S HV: Streamlined guest entry/exit path 
> on P9 for radix guests"
> 
> Signed-off-by: Suraj Jitindar Singh 

Series applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/63279eeb7f93abb1692573c26f1e038e1a87358b

cheers
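
The core of the fix is roughly a one-liner in the P9 guest entry/exit
path (a sketch; see the commit above for the exact code):

    save_pmu = 1;
    if (vcpu->arch.vpa.pinned_addr) {
            struct lppaca *lp = vcpu->arch.vpa.pinned_addr;
            if (lp->pmcregs_in_use == 0)
                    save_pmu = 0;
    }
    /* Must save the PMU if this guest can itself run nested guests. */
    save_pmu |= nesting_enabled(vcpu->kvm);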


Re: [PATCH 1/1] powerpc: fix off by one in max_zone_pfn initialization for ZONE_DMA

2019-07-18 Thread Michael Ellerman
On Tue, 2019-06-25 at 14:17:27 UTC, Andrea Arcangeli wrote:
> 25078dc1f74be16b858e914f52cc8f4d03c2271a first introduced an off by
> one error in the ZONE_DMA initialization of PPC_BOOK3E_64=y and since
> 9739ab7eda459f0669ec9807e0d9be5020bab88c the off by one applies to
> PPC32=y too. This simply corrects the off by one and should resolve
> crashes like below:
> 
> [   65.179101] page 0x7fff outside node 0 zone DMA [ 0x0 - 0x7fff ]
> 
> Unfortunately in various MM places "max" means a non-inclusive end of
> range. free_area_init_nodes' max_zone_pfn parameter is one case, and
> MAX_ORDER is another (unrelated) one that comes to mind.
> 
> Reported-by: Zorro Lang 
> Fixes: 25078dc1f74b ("powerpc: use mm zones more sensibly")
> Fixes: 9739ab7eda45 ("powerpc: enable a 30-bit ZONE_DMA for 32-bit pmac")
> Signed-off-by: Andrea Arcangeli 
> Reviewed-by: Christoph Hellwig 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/03800e0526ee25ed7c843ca1e57b69ac2a5af642

cheers
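
(The off-by-one, spelled out: max_zone_pfn is an exclusive bound, so a
zone of S bytes must end at pfn S >> PAGE_SHIFT; computing it as
(S - 1) >> PAGE_SHIFT lands one page short. With 64K pages and a 2GB
zone that is 0x7fff instead of 0x8000, which lines up with the splat
quoted above.)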


Re: [PATCH v3 6/6] s390/mm: Remove sev_active() function

2019-07-18 Thread Halil Pasic
On Thu, 18 Jul 2019 10:44:56 +0200
Christoph Hellwig  wrote:

> > -/* are we a protected virtualization guest? */
> > -bool sev_active(void)
> > -{
> > -   return is_prot_virt_guest();
> > -}
> > -
> >  bool force_dma_unencrypted(struct device *dev)
> >  {
> > -   return sev_active();
> > +   return is_prot_virt_guest();
> >  }
> 
> Do we want to keep the comment for force_dma_unencrypted?

Yes we do. With the comment transferred:

Reviewed-by: Halil Pasic 

> 
> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig 



Crash in kvmppc_xive_release()

2019-07-18 Thread Michael Ellerman
Anyone else seen this?

This is running ~176 VMs on a Power9 (1 per thread), host crashes:

  [   66.403750][ T6423] xive: OPAL failed to allocate VCPUs order 11, err -10
  [188523.080935670,4] Spent 1783 msecs in OPAL call 135!
  [   66.484965][ T6250] BUG: Kernel NULL pointer dereference at 0x42e8
  [   66.485558][ T6250] Faulting instruction address: 0xc00811a33fcc
  [   66.485990][ T6250] Oops: Kernel access of bad area, sig: 7 [#1]
  [   66.486405][ T6250] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 
NUMA PowerNV
  [   66.486967][ T6250] Modules linked in: kvm_hv kvm
  [   66.487275][ T6250] CPU: 107 PID: 6250 Comm: qemu-system-ppc Not tainted 
5.2.0-rc2-gcc9x-gf5a9e488d623 #1
  [   66.487902][ T6250] NIP:  c00811a33fcc LR: c00811a33fc4 CTR: 
c05d5970
  [   66.488383][ T6250] REGS: c01fabebb900 TRAP: 0300   Not tainted  
(5.2.0-rc2-gcc9x-gf5a9e488d623)
  [   66.488933][ T6250] MSR:  9280b033 
  CR: 24028224  XER: 
  [   66.489724][ T6250] CFAR: c05d6a4c DAR: 42e8 DSISR: 
0008 IRQMASK: 0 
  [   66.489724][ T6250] GPR00: c00811a33fc4 c01fabebbb90 
c00811a5a200 c1399928 
  [   66.489724][ T6250] GPR04: 0001 c047b8d0 
 0001 
  [   66.489724][ T6250] GPR08:   
c01fa8c42f00 c00811a3af20 
  [   66.489724][ T6250] GPR12: 8000 c0002023ff65a880 
00013a1b4000 0002 
  [   66.489724][ T6250] GPR16: 1000 0002 
0001 00012b194cc0 
  [   66.489724][ T6250] GPR20: 7fffb1645250 0001 
0031  
  [   66.489724][ T6250] GPR24: 7fffb16408d8 c01ffafb62e0 
c01f78699360 c01ff35d0620 
  [   66.489724][ T6250] GPR28: c01ed0ed c01ecd90 
 c01ed0ed 
  [   66.495211][ T6250] NIP [c00811a33fcc] kvmppc_xive_release+0x54/0x1b0 
[kvm]
  [   66.495642][ T6250] LR [c00811a33fc4] kvmppc_xive_release+0x4c/0x1b0 
[kvm]
  [   66.496101][ T6250] Call Trace:
  [   66.496314][ T6250] [c01fabebbb90] [c00811a33fc4] 
kvmppc_xive_release+0x4c/0x1b0 [kvm] (unreliable)
  [   66.496893][ T6250] [c01fabebbbf0] [c00811a18d54] 
kvm_device_release+0xac/0xf0 [kvm]
  [   66.497399][ T6250] [c01fabebbc30] [c0442f8c] __fput+0xec/0x310
  [   66.497815][ T6250] [c01fabebbc90] [c0145f94] 
task_work_run+0x114/0x170
  [   66.498296][ T6250] [c01fabebbce0] [c0115274] 
do_exit+0x454/0xee0
  [   66.498743][ T6250] [c01fabebbdc0] [c0115dd0] 
do_group_exit+0x60/0xe0
  [   66.499201][ T6250] [c01fabebbe00] [c0115e74] 
sys_exit_group+0x24/0x40
  [   66.499747][ T6250] [c01fabebbe20] [c000b83c] 
system_call+0x5c/0x70
  [   66.500261][ T6250] Instruction dump:
  [   66.500484][ T6250] fbe1fff8 fba1ffe8 fbc1fff0 7c7c1b78 f8010010 f821ffa1 
eba30010 e87d0010 
  [   66.501006][ T6250] ebdd 48006f61 e8410018 3920  
913e42e8 48007f3d e8410018 
  [   66.501529][ T6250] ---[ end trace c021a6ca03594ec3 ]---
  [   66.513119][ T6150] xive: OPAL failed to allocate VCPUs order 11, err -10

cheers


[PATCH v7] cpufreq/pasemi: fix a use-after-free in pas_cpufreq_cpu_init()

2019-07-18 Thread Christian Zigotzky

On 09.07.2019 at 03:39am, wen.yan...@zte.com.cn wrote:

Hello Wen,

Thanks for your patch!

Did you test your patch with a P.A. Semi board?


Hello Christian, thank you.
We don't have a P.A. Semi board yet, so we didn't test it.
If you have such a board, could you please kindly help to test it?

--
Thanks and regards,
Wen


Hello Wen,

I successfully tested your pasemi cpufreq modifications with my P.A. 
Semi board [1] today.


First I patched the latest Git kernel with Viresh Kumar's patch [2]. 
After that I was able to patch the latest Git kernel with your v7 patch [3].


Then the kernel compiled without any errors.

Afterwards I successfully tested the new Git kernel with some cpufreq 
governors on openSUSE Tumbleweed 20190521 PowerPC64 [4] and on ubuntu 
MATE 16.04.6 LTS PowerPC32.


Thanks a lot for your work!

Tested-by: Christian Zigotzky 

Cheers,
Christian

[1] https://en.wikipedia.org/wiki/AmigaOne_X1000
[2] 
https://lore.kernel.org/lkml/ee8cf5fb4b4a01fdf9199037ff6d835b935cfd13.1562902877.git.viresh.ku...@linaro.org/#Z30drivers:cpufreq:pasemi-cpufreq.c

[3] https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-July/193735.html
[4] Screenshots: 
https://i.pinimg.com/originals/37/66/93/37669306cbc909a9d79270a849d18aa6.png 
and 
https://i.pinimg.com/originals/fe/f8/bf/fef8bfc90d95b5ae9cf31e175e8ba2da.png





Re: [PATCH] powerpc/dma: Fix invalid DMA mmap behavior

2019-07-18 Thread Christoph Hellwig
On Thu, Jul 18, 2019 at 10:49:34AM +0200, Christoph Hellwig wrote:
> On Thu, Jul 18, 2019 at 01:45:16PM +1000, Oliver O'Halloran wrote:
> > > Other than m68k, mips, and arm64, everybody else that doesn't have
> > > ARCH_NO_COHERENT_DMA_MMAP set uses this default implementation, so
> > > I assume this behavior is acceptable on those architectures.
> > 
> > It might be acceptable, but there's no reason to use pgprot_noncached
> > if the platform supports cache-coherent DMA.
> > 
> > Christoph (+cc) made the change so maybe he saw something we're missing.
> 
> I always found the forcing of noncached access even for coherent
> devices a little odd, but this was inherited from the previous
> implementation, which surprised me a bit as the different attributes
> are usually problematic even on x86.  Let me dig into the history a
> bit more, but I suspect the right fix is to default to cached mappings
> for coherent devices.

Ok, some history:

The generic dma mmap implementation, which we are effectively still
using today, was added by:

commit 64ccc9c033c6089b2d426dad3c56477ab066c999
Author: Marek Szyprowski 
Date:   Thu Jun 14 13:03:04 2012 +0200

common: dma-mapping: add support for generic dma_mmap_* calls

and unconditionally uses pgprot_noncached in dma_common_mmap, which is
then used as the fallback by dma_mmap_attrs if no ->mmap method is
present.  At that point we already had the powerpc implementation
that only uses pgprot_noncached for non-coherent mappings, and
the arm one, which uses pgprot_writecombine if DMA_ATTR_WRITE_COMBINE
is set and otherwise pgprot_dmacoherent, which seems to be uncached.
Arm did support coherent platforms at that time, but they might have
been an afterthought and not handled properly.

So it might be that we have all been wrong all this time and have to
fix it up.
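
A minimal sketch of that direction, assuming dev_is_dma_coherent() as
the test (not a tested patch):

    static pgprot_t dma_mmap_prot(struct device *dev, pgprot_t prot)
    {
            /* A cache-coherent device can share cacheable mappings
             * with the CPU; fall back to uncached only otherwise. */
            if (dev_is_dma_coherent(dev))
                    return prot;
            return pgprot_noncached(prot);
    }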


Re: [PATCH] powerpc/dma: Fix invalid DMA mmap behavior

2019-07-18 Thread Christoph Hellwig
On Thu, Jul 18, 2019 at 01:45:16PM +1000, Oliver O'Halloran wrote:
> > Other than m68k, mips, and arm64, everybody else that doesn't have
> > ARCH_NO_COHERENT_DMA_MMAP set uses this default implementation, so
> > I assume this behavior is acceptable on those architectures.
> 
> It might be acceptable, but there's no reason to use pgprot_noncached
> if the platform supports cache-coherent DMA.
> 
> Christoph (+cc) made the change so maybe he saw something we're missing.

I always found the forcing of noncached access even for coherent
devices a little odd, but this was inherited from the previous
implementation, which surprised me a bit as the different attributes
are usually problematic even on x86.  Let me dig into the history a
bit more, but I suspect the right fix is to default to cached mappings
for coherent devices.


Re: [PATCH V3] cpufreq: Make cpufreq_generic_init() return void

2019-07-18 Thread Rafael J. Wysocki
On Tuesday, July 16, 2019 6:06:08 AM CEST Viresh Kumar wrote:
> It always returns 0 (success) and its return type should really be void.
> On top of that, many drivers have added error handling code based on its
> return value, which is not required at all.
> 
> Change its return type to void and update all the callers.
> 
> Signed-off-by: Viresh Kumar 
> ---
> V2->V3:
> - Update bmips cpufreq driver to avoid "warning: 'ret' may be used
>   uninitialized".
> - Build bot reported this issue almost after 4 days of posting this
>   patch, I was expecting this a lot earlier :)
> 
>  drivers/cpufreq/bmips-cpufreq.c | 17 ++---
>  drivers/cpufreq/cpufreq.c   |  4 +---
>  drivers/cpufreq/davinci-cpufreq.c   |  3 ++-
>  drivers/cpufreq/imx6q-cpufreq.c |  6 ++
>  drivers/cpufreq/kirkwood-cpufreq.c  |  3 ++-
>  drivers/cpufreq/loongson1-cpufreq.c |  8 +++-
>  drivers/cpufreq/loongson2_cpufreq.c |  3 ++-
>  drivers/cpufreq/maple-cpufreq.c |  3 ++-
>  drivers/cpufreq/omap-cpufreq.c  | 15 +--
>  drivers/cpufreq/pasemi-cpufreq.c|  3 ++-
>  drivers/cpufreq/pmac32-cpufreq.c|  3 ++-
>  drivers/cpufreq/pmac64-cpufreq.c|  3 ++-
>  drivers/cpufreq/s3c2416-cpufreq.c   |  9 ++---
>  drivers/cpufreq/s3c64xx-cpufreq.c   | 15 +++
>  drivers/cpufreq/s5pv210-cpufreq.c   |  3 ++-
>  drivers/cpufreq/sa1100-cpufreq.c|  3 ++-
>  drivers/cpufreq/sa1110-cpufreq.c|  3 ++-
>  drivers/cpufreq/spear-cpufreq.c |  3 ++-
>  drivers/cpufreq/tegra20-cpufreq.c   |  8 +---
>  include/linux/cpufreq.h |  2 +-
>  20 files changed, 46 insertions(+), 71 deletions(-)
> 
> diff --git a/drivers/cpufreq/bmips-cpufreq.c b/drivers/cpufreq/bmips-cpufreq.c
> index 56a4ebbf00e0..f7c23fa468f0 100644
> --- a/drivers/cpufreq/bmips-cpufreq.c
> +++ b/drivers/cpufreq/bmips-cpufreq.c
> @@ -131,23 +131,18 @@ static int bmips_cpufreq_exit(struct cpufreq_policy 
> *policy)
>  static int bmips_cpufreq_init(struct cpufreq_policy *policy)
>  {
>   struct cpufreq_frequency_table *freq_table;
> - int ret;
>  
>   freq_table = bmips_cpufreq_get_freq_table(policy);
>   if (IS_ERR(freq_table)) {
> - ret = PTR_ERR(freq_table);
> - pr_err("%s: couldn't determine frequency table (%d).\n",
> - BMIPS_CPUFREQ_NAME, ret);
> - return ret;
> + pr_err("%s: couldn't determine frequency table (%ld).\n",
> + BMIPS_CPUFREQ_NAME, PTR_ERR(freq_table));
> + return PTR_ERR(freq_table);
>   }
>  
> - ret = cpufreq_generic_init(policy, freq_table, TRANSITION_LATENCY);
> - if (ret)
> - bmips_cpufreq_exit(policy);
> - else
> - pr_info("%s: registered\n", BMIPS_CPUFREQ_NAME);
> + cpufreq_generic_init(policy, freq_table, TRANSITION_LATENCY);
> + pr_info("%s: registered\n", BMIPS_CPUFREQ_NAME);
>  
> - return ret;
> + return 0;
>  }
>  
>  static struct cpufreq_driver bmips_cpufreq_driver = {
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 4d6043ee7834..8dda62367816 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -159,7 +159,7 @@ EXPORT_SYMBOL_GPL(arch_set_freq_scale);
>   * - set policies transition latency
>   * - policy->cpus with all possible CPUs
>   */
> -int cpufreq_generic_init(struct cpufreq_policy *policy,
> +void cpufreq_generic_init(struct cpufreq_policy *policy,
>   struct cpufreq_frequency_table *table,
>   unsigned int transition_latency)
>  {
> @@ -171,8 +171,6 @@ int cpufreq_generic_init(struct cpufreq_policy *policy,
>* share the clock and voltage and clock.
>*/
>   cpumask_setall(policy->cpus);
> -
> - return 0;
>  }
>  EXPORT_SYMBOL_GPL(cpufreq_generic_init);
>  
> diff --git a/drivers/cpufreq/davinci-cpufreq.c 
> b/drivers/cpufreq/davinci-cpufreq.c
> index 3de48ae60c29..297d23cad8b5 100644
> --- a/drivers/cpufreq/davinci-cpufreq.c
> +++ b/drivers/cpufreq/davinci-cpufreq.c
> @@ -90,7 +90,8 @@ static int davinci_cpu_init(struct cpufreq_policy *policy)
>* Setting the latency to 2000 us to accommodate addition of drivers
>* to pre/post change notification list.
>*/
> - return cpufreq_generic_init(policy, freq_table, 2000 * 1000);
> + cpufreq_generic_init(policy, freq_table, 2000 * 1000);
> + return 0;
>  }
>  
>  static struct cpufreq_driver davinci_driver = {
> diff --git a/drivers/cpufreq/imx6q-cpufreq.c b/drivers/cpufreq/imx6q-cpufreq.c
> index 47ccfa6b17b7..648a09a1778a 100644
> --- a/drivers/cpufreq/imx6q-cpufreq.c
> +++ b/drivers/cpufreq/imx6q-cpufreq.c
> @@ -190,14 +190,12 @@ static int imx6q_set_target(struct cpufreq_policy 
> *policy, unsigned int index)
>  
>  static int imx6q_cpufreq_init(struct cpufreq_policy *policy)
>  {
> - int ret;
> -
>   policy->clk = clks[ARM].clk;
> - ret = cpufreq_generic_init(policy, freq_table, transition_latency);
> +

Re: [PATCH v3 6/6] s390/mm: Remove sev_active() function

2019-07-18 Thread Christoph Hellwig
> -/* are we a protected virtualization guest? */
> -bool sev_active(void)
> -{
> - return is_prot_virt_guest();
> -}
> -
>  bool force_dma_unencrypted(struct device *dev)
>  {
> - return sev_active();
> + return is_prot_virt_guest();
>  }

Do we want to keep the comment for force_dma_unencrypted?

Otherwise looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH v3 5/6] fs/core/vmcore: Move sev_active() reference to x86 arch code

2019-07-18 Thread Christoph Hellwig
On Thu, Jul 18, 2019 at 12:28:57AM -0300, Thiago Jung Bauermann wrote:
> Secure Encrypted Virtualization is an x86-specific feature, so it shouldn't
> appear in generic kernel code because it forces non-x86 architectures to
> define the sev_active() function, which doesn't make a lot of sense.
> 
> To solve this problem, add an x86 elfcorehdr_read() function to override
> the generic weak implementation. To do that, it's necessary to make
> read_from_oldmem() public so that it can be used outside of vmcore.c.
> 
> Also, remove the export for sev_active() since it's only used in files that
> won't be built as modules.

I have to say I find the __weak overrides of the vmcore files very
confusing and wish we had a better scheme there.  But as this fits
into that scheme and allows removing the AMD SME vs SEV knowledge from
the core I'm fine with it.
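
For readers unfamiliar with the pattern, sketched (not the exact patch):

    /* fs/proc/vmcore.c: generic fallback, used unless overridden */
    ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
    {
            return read_from_oldmem(buf, count, ppos, 0, false);
    }

    /* arch/x86: the strong definition wins at link time */
    ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos)
    {
            return read_from_oldmem(buf, count, ppos, 0, sev_active());
    }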

Reviewed-by: Christoph Hellwig 


Re: [PATCH v3 4/6] x86,s390/mm: Move sme_active() and sme_me_mask to x86-specific header

2019-07-18 Thread Christoph Hellwig
On Thu, Jul 18, 2019 at 12:28:56AM -0300, Thiago Jung Bauermann wrote:
> Now that generic code doesn't reference them, move sme_active() and
> sme_me_mask to x86's .
> 
> Also remove the export for sme_active() since it's only used in files that
> won't be built as modules. sme_me_mask on the other hand is used in
> arch/x86/kvm/svm.c (via __sme_set() and __psp_pa()) which can be built as a
> module so its export needs to stay.
> 
> Signed-off-by: Thiago Jung Bauermann 

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH v3 3/6] dma-mapping: Remove dma_check_mask()

2019-07-18 Thread Christoph Hellwig
On Thu, Jul 18, 2019 at 12:28:55AM -0300, Thiago Jung Bauermann wrote:
> sme_active() is an x86-specific function so it's better not to call it from
> generic code. Christoph Hellwig mentioned that "There is no reason why we
> should have a special debug printk just for one specific reason why there
> is a requirement for a large DMA mask.", so just remove dma_check_mask().
> 
> Signed-off-by: Thiago Jung Bauermann 

Looks good,

Reviewed-by: Christoph Hellwig 

