Re: [PATCH v3 5/6] powerpc/pseries: implement paravirt qspinlocks for SPLPAR
On 7/24/20 3:10 PM, Waiman Long wrote:
> On 7/24/20 4:16 AM, Will Deacon wrote:
>> On Thu, Jul 23, 2020 at 08:47:59PM +0200, pet...@infradead.org wrote:
>>> On Thu, Jul 23, 2020 at 02:32:36PM -0400, Waiman Long wrote:
>>>> BTW, do you have any comment on my v2 lock holder cpu info qspinlock
>>>> patch? I will have to update the patch to fix the reported 0-day test
>>>> problem, but I want to collect other feedback before sending out v3.
>>>
>>> I want to say I hate it all, it adds instructions to a path we spend an
>>> awful lot of time optimizing without really getting anything back for it.
>>>
>>> Will, how do you feel about it?
>>
>> I can see it potentially being useful for debugging, but I hate the
>> limitation to 256 CPUs. Even arm64 is hitting that now.
>
> After thinking more about that, I think we can use all the remaining
> bits in the 16-bit locked_pending. Reserving 1 bit for locked and 1 bit
> for pending, there are 14 bits left. So as long as NR_CPUS < 16k (the
> requirement for a 16-bit locked_pending), we can put all possible cpu
> numbers into the lock. We can also just use smp_processor_id() without
> additional percpu data.

Sorry, that doesn't work. The extra bits in the pending byte won't get
cleared on unlock. That will have a noticeable performance impact.
Clearing the pending byte on unlock will cause other performance
problems. So I guess we will have to limit the cpu number to the locked
byte.

Regards,
Longman
Re: [PATCH v4 06/12] ppc64/kexec_file: restrict memory usage of kdump kernel
Hari Bathini writes:

> On 24/07/20 5:36 am, Thiago Jung Bauermann wrote:
>>
>> Hari Bathini writes:
>>
>>> Kdump kernel, used for capturing the kernel core image, is supposed
>>> to use only specific memory regions to avoid corrupting the image to
>>> be captured. The regions are the crashkernel range - the memory
>>> reserved explicitly for the kdump kernel, memory used for the
>>> tce-table, the OPAL region and the RTAS region as applicable.
>>> Restrict kdump kernel memory to use only these regions by setting up
>>> the usable-memory DT property. Also, tell the kdump kernel to run at
>>> the loaded address by setting the magic word at 0x5c.
>>>
>>> Signed-off-by: Hari Bathini
>>> Tested-by: Pingfan Liu
>>> ---
>>>
>>> v3 -> v4:
>>> * Updated get_node_path() to be an iterative function instead of a
>>>   recursive one.
>>> * Added comment explaining why low memory is added to kdump kernel's
>>>   usable memory ranges though it doesn't fall in crashkernel region.
>>> * For correctness, added fdt_add_mem_rsv() for the low memory being
>>>   added to kdump kernel's usable memory ranges.
>>
>> Good idea.
>>
>>> * Fixed prop pointer update in add_usable_mem_property() and changed
>>>   duple to tuple as suggested by Thiago.
>>
>>> +/**
>>> + * get_node_pathlen - Get the full path length of the given node.
>>> + * @dn: Node.
>>> + *
>>> + * Also, counts '/' at the end of the path.
>>> + * For example, /memory@0 will be "/memory@0/\0" => 11 bytes.
>>
>> Wouldn't this function return 10 in the case of /memory@0?
>
> Actually, it does return 11. The +1 while returning is for counting the
> NUL. On top of that, we count an extra '/' for the root node, so it ends
> up as 11: '/' 'memory@0' '/' '\0'. Note the extra '/' before '\0'. Let
> me handle the root node separately. That should avoid the confusion.

Ah, that is true. I forgot to count the iteration for the root node.
Sorry about that.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center
Re: [PATCH v4 0/6] powerpc: queued spinlocks and rwlocks
On 7/24/20 9:14 AM, Nicholas Piggin wrote:
> Updated with everybody's feedback (thanks all), and more performance
> results. What I've found is I might have been measuring the worst load
> point for the paravirt case, and by looking at a range of loads it's
> clear that queued spinlocks are overall better even on PV, doubly so
> when you look at the generally much improved worst case latencies.
>
> I have defaulted it to N even though I'm less concerned about the PV
> numbers now, just because I think it needs more stress testing. But
> it's very nicely selectable so should be low risk to include.
>
> All in all this is a very cool technology and great results especially
> on the big systems but even on smaller ones there are nice gains.
> Thanks Waiman and everyone who developed it.
>
> Thanks,
> Nick
>
> Nicholas Piggin (6):
>   powerpc/pseries: move some PAPR paravirt functions to their own file
>   powerpc: move spinlock implementation to simple_spinlock
>   powerpc/64s: implement queued spinlocks and rwlocks
>   powerpc/pseries: implement paravirt qspinlocks for SPLPAR
>   powerpc/qspinlock: optimised atomic_try_cmpxchg_lock that adds the
>     lock hint
>   powerpc: implement smp_cond_load_relaxed
>
>  arch/powerpc/Kconfig                          |  15 +
>  arch/powerpc/include/asm/Kbuild               |   1 +
>  arch/powerpc/include/asm/atomic.h             |  28 ++
>  arch/powerpc/include/asm/barrier.h            |  14 +
>  arch/powerpc/include/asm/paravirt.h           |  87 +
>  arch/powerpc/include/asm/qspinlock.h          |  91 ++
>  arch/powerpc/include/asm/qspinlock_paravirt.h |   7 +
>  arch/powerpc/include/asm/simple_spinlock.h    | 288
>  .../include/asm/simple_spinlock_types.h       |  21 ++
>  arch/powerpc/include/asm/spinlock.h           | 308 +-
>  arch/powerpc/include/asm/spinlock_types.h     |  17 +-
>  arch/powerpc/lib/Makefile                     |   3 +
>  arch/powerpc/lib/locks.c                      |  12 +-
>  arch/powerpc/platforms/pseries/Kconfig        |   9 +-
>  arch/powerpc/platforms/pseries/setup.c        |   4 +-
>  include/asm-generic/qspinlock.h               |   4 +
>  16 files changed, 588 insertions(+), 321 deletions(-)
>  create mode 100644 arch/powerpc/include/asm/paravirt.h
>  create mode 100644 arch/powerpc/include/asm/qspinlock.h
>  create mode 100644 arch/powerpc/include/asm/qspinlock_paravirt.h
>  create mode 100644 arch/powerpc/include/asm/simple_spinlock.h
>  create mode 100644 arch/powerpc/include/asm/simple_spinlock_types.h

That patch series looks good to me. Thanks for working on this.

For the series,

Acked-by: Waiman Long
Re: [PATCH v4 6/6] powerpc: implement smp_cond_load_relaxed
On 7/24/20 9:14 AM, Nicholas Piggin wrote:
> This implements smp_cond_load_relaed with the slowpath busy loop using the

Nit: "smp_cond_load_relaxed"

Cheers,
Longman
[PATCH v5 02/11] powerpc/kexec_file: mark PPC64 specific code
Some of the kexec_file_load code isn't PPC64 specific. Move PPC64
specific code from kexec/file_load.c to kexec/file_load_64.c. Also,
rename purgatory/trampoline.S to purgatory/trampoline_64.S in the same
spirit. No functional changes.

Signed-off-by: Hari Bathini
Tested-by: Pingfan Liu
Reviewed-by: Laurent Dufour
Reviewed-by: Thiago Jung Bauermann
---

v4 -> v5:
* Unchanged.

v3 -> v4:
* Moved common code back to setup_new_fdt() from setup_new_fdt_ppc64()
  function. Added Reviewed-by tags from Laurent & Thiago.

v2 -> v3:
* Unchanged. Added Tested-by tag from Pingfan.

v1 -> v2:
* No changes.

 arch/powerpc/include/asm/kexec.h       |   9 ++
 arch/powerpc/kexec/Makefile            |   2 -
 arch/powerpc/kexec/elf_64.c            |   7 +-
 arch/powerpc/kexec/file_load.c         |  19 +
 arch/powerpc/kexec/file_load_64.c      |  87
 arch/powerpc/purgatory/Makefile        |   4 +
 arch/powerpc/purgatory/trampoline.S    | 117
 arch/powerpc/purgatory/trampoline_64.S | 117
 8 files changed, 222 insertions(+), 140 deletions(-)
 create mode 100644 arch/powerpc/kexec/file_load_64.c
 delete mode 100644 arch/powerpc/purgatory/trampoline.S
 create mode 100644 arch/powerpc/purgatory/trampoline_64.S

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index c684768..ac8fd48 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -116,6 +116,15 @@ int setup_new_fdt(const struct kimage *image, void *fdt,
 			unsigned long initrd_load_addr, unsigned long initrd_len,
 			const char *cmdline);
 int delete_fdt_mem_rsv(void *fdt, unsigned long start, unsigned long size);
+
+#ifdef CONFIG_PPC64
+int setup_purgatory_ppc64(struct kimage *image, const void *slave_code,
+			  const void *fdt, unsigned long kernel_load_addr,
+			  unsigned long fdt_load_addr);
+int setup_new_fdt_ppc64(const struct kimage *image, void *fdt,
+			unsigned long initrd_load_addr,
+			unsigned long initrd_len, const char *cmdline);
+#endif /* CONFIG_PPC64 */
 #endif /* CONFIG_KEXEC_FILE */
 #else /* !CONFIG_KEXEC_CORE */
diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile
index 86380c6..67c3553 100644
--- a/arch/powerpc/kexec/Makefile
+++ b/arch/powerpc/kexec/Makefile
@@ -7,7 +7,7 @@ obj-y	+= core.o crash.o core_$(BITS).o
 obj-$(CONFIG_PPC32)	+= relocate_32.o
-obj-$(CONFIG_KEXEC_FILE)	+= file_load.o elf_$(BITS).o
+obj-$(CONFIG_KEXEC_FILE)	+= file_load.o file_load_$(BITS).o elf_$(BITS).o
 ifdef CONFIG_HAVE_IMA_KEXEC
 ifdef CONFIG_IMA
diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
index 3072fd6..23ad04c 100644
--- a/arch/powerpc/kexec/elf_64.c
+++ b/arch/powerpc/kexec/elf_64.c
@@ -88,7 +88,8 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,
 		goto out;
 	}
-	ret = setup_new_fdt(image, fdt, initrd_load_addr, initrd_len, cmdline);
+	ret = setup_new_fdt_ppc64(image, fdt, initrd_load_addr,
+				  initrd_len, cmdline);
 	if (ret)
 		goto out;
@@ -107,8 +108,8 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,
 	pr_debug("Loaded device tree at 0x%lx\n", fdt_load_addr);
 	slave_code = elf_info.buffer + elf_info.proghdrs[0].p_offset;
-	ret = setup_purgatory(image, slave_code, fdt, kernel_load_addr,
-			      fdt_load_addr);
+	ret = setup_purgatory_ppc64(image, slave_code, fdt, kernel_load_addr,
+				    fdt_load_addr);
 	if (ret)
 		pr_err("Error setting up the purgatory.\n");
diff --git a/arch/powerpc/kexec/file_load.c b/arch/powerpc/kexec/file_load.c
index 143c917..38439ab 100644
--- a/arch/powerpc/kexec/file_load.c
+++ b/arch/powerpc/kexec/file_load.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /*
- * ppc64 code to implement the kexec_file_load syscall
+ * powerpc code to implement the kexec_file_load syscall
  *
  * Copyright (C) 2004 Adam Litke (a...@us.ibm.com)
  * Copyright (C) 2004 IBM Corp.
@@ -20,22 +20,7 @@
 #include
 #include
-#define SLAVE_CODE_SIZE	256
-
-const struct kexec_file_ops * const kexec_file_loaders[] = {
-	&kexec_elf64_ops,
-	NULL
-};
-
-int arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
-				  unsigned long buf_len)
-{
-	/* We don't support crash kernels yet. */
-	if (image->type == KEXEC_TYPE_CRASH)
-		return -EOPNOTSUPP;
-
-	return kexec_image_probe_default(image, buf, buf_len);
-}
+#define SLAVE_CODE_SIZE	256	/* First 0x100 bytes */

 /**
  * setup_purgatory - initialize the purgatory's global variables
diff --git a/arch/
[PATCH v5 01/11] kexec_file: allow archs to handle special regions while locating memory hole
Some architectures may have special memory regions, within the given
memory range, which can't be used for the buffer in a kexec segment.
Implement a weak arch_kexec_locate_mem_hole() definition which arch code
may override, to take care of special regions, while trying to locate a
memory hole. Also, add the missing declarations for arch overridable
functions and drop the __weak descriptors in the declarations to avoid
non-weak definitions from becoming weak.

Reported-by: kernel test robot
[lkp: In v1, arch_kimage_file_post_load_cleanup() declaration was missing]
Signed-off-by: Hari Bathini
Tested-by: Pingfan Liu
Acked-by: Dave Young
Reviewed-by: Thiago Jung Bauermann
---

v4 -> v5:
* Unchanged.

v3 -> v4:
* Unchanged. Added Reviewed-by tag from Thiago.

v2 -> v3:
* Unchanged. Added Acked-by & Tested-by tags from Dave & Pingfan.

v1 -> v2:
* Introduced arch_kexec_locate_mem_hole() for override and dropped weak
  arch_kexec_add_buffer().
* Dropped __weak identifier for arch overridable functions.
* Fixed the missing declaration for arch_kimage_file_post_load_cleanup()
  reported by lkp. lkp report for reference:
  https://lore.kernel.org/patchwork/patch/1264418/

 include/linux/kexec.h | 29 ++---
 kernel/kexec_file.c   | 16 ++--
 2 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index ea67910..9e93bef 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -183,17 +183,24 @@ int kexec_purgatory_get_set_symbol(struct kimage *image, const char *name,
 				   bool get_value);
 void *kexec_purgatory_get_symbol_addr(struct kimage *image, const char *name);
-int __weak arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
-					 unsigned long buf_len);
-void * __weak arch_kexec_kernel_image_load(struct kimage *image);
-int __weak arch_kexec_apply_relocations_add(struct purgatory_info *pi,
-					    Elf_Shdr *section,
-					    const Elf_Shdr *relsec,
-					    const Elf_Shdr *symtab);
-int __weak arch_kexec_apply_relocations(struct purgatory_info *pi,
-					Elf_Shdr *section,
-					const Elf_Shdr *relsec,
-					const Elf_Shdr *symtab);
+/* Architectures may override the below functions */
+int arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
+				  unsigned long buf_len);
+void *arch_kexec_kernel_image_load(struct kimage *image);
+int arch_kexec_apply_relocations_add(struct purgatory_info *pi,
+				     Elf_Shdr *section,
+				     const Elf_Shdr *relsec,
+				     const Elf_Shdr *symtab);
+int arch_kexec_apply_relocations(struct purgatory_info *pi,
+				 Elf_Shdr *section,
+				 const Elf_Shdr *relsec,
+				 const Elf_Shdr *symtab);
+int arch_kimage_file_post_load_cleanup(struct kimage *image);
+#ifdef CONFIG_KEXEC_SIG
+int arch_kexec_kernel_verify_sig(struct kimage *image, void *buf,
+				 unsigned long buf_len);
+#endif
+int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf);

 extern int kexec_add_buffer(struct kexec_buf *kbuf);
 int kexec_locate_mem_hole(struct kexec_buf *kbuf);
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 09cc78d..e89912d 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -636,6 +636,19 @@ int kexec_locate_mem_hole(struct kexec_buf *kbuf)
 }

 /**
+ * arch_kexec_locate_mem_hole - Find free memory to place the segments.
+ * @kbuf:	Parameters for the memory search.
+ *
+ * On success, kbuf->mem will have the start address of the memory region found.
+ *
+ * Return: 0 on success, negative errno on error.
+ */
+int __weak arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
+{
+	return kexec_locate_mem_hole(kbuf);
+}
+
+/**
  * kexec_add_buffer - place a buffer in a kexec segment
  * @kbuf:	Buffer contents and memory parameters.
  *
@@ -647,7 +660,6 @@ int kexec_locate_mem_hole(struct kexec_buf *kbuf)
  */
 int kexec_add_buffer(struct kexec_buf *kbuf)
 {
-
 	struct kexec_segment *ksegment;
 	int ret;
@@ -675,7 +687,7 @@ int kexec_add_buffer(struct kexec_buf *kbuf)
 	kbuf->buf_align = max(kbuf->buf_align, PAGE_SIZE);

 	/* Walk the RAM ranges and allocate a suitable range for the buffer */
-	ret = kexec_locate_mem_hole(kbuf);
+	ret = arch_kexec_locate_mem_hole(kbuf);
 	if (ret)
 		return ret;
[PATCH v5 00/11] ppc64: enable kdump support for kexec_file_load syscall
This patch series enables kdump support for the kexec_file_load system
call (kexec -s -p) on PPC64. The changes are inspired from kexec-tools
code but heavily modified for kernel consumption.

The first patch adds a weak arch_kexec_locate_mem_hole() function to
override the locate-memory-hole logic to suit arch needs. There are
some special regions in ppc64 which should be avoided while loading the
buffer, and there are multiple callers to kexec_add_buffer, making it
complicated to maintain range sanity and use the generic lookup at the
same time.

The second patch marks ppc64 specific code within arch/powerpc/kexec
and arch/powerpc/purgatory to make the subsequent code changes easy to
understand.

The next patch adds helper functions to set up the different memory
ranges needed for loading the kdump kernel, booting into it and
exporting the crashing kernel's elfcore.

The fourth patch overrides the arch_kexec_locate_mem_hole() function to
locate a memory hole for kdump segments by accounting for the special
memory regions, referred to as excluded memory ranges, and sets
kbuf->mem when a suitable memory region is found.

The fifth patch moves walk_drmem_lmbs() out of the .init section with a
few changes to reuse it for setting up the kdump kernel's usable memory
ranges.

The next patch uses walk_drmem_lmbs() to look up the LMBs and set the
linux,drconf-usable-memory & linux,usable-memory properties in order to
restrict the kdump kernel's memory usage.

The seventh patch updates purgatory to set up r8 & r9 with the opal
base and opal entry addresses respectively, to aid kernels built with
CONFIG_PPC_EARLY_DEBUG_OPAL enabled.

The next patch sets up the backup region as a kexec segment while
loading the kdump kernel and teaches purgatory to copy data from source
to destination.

Patch 09 builds the elfcore header for the running kernel & passes the
info to the kdump kernel via the "elfcorehdr=" parameter to export it
as the /proc/vmcore file.

The next patch sets up the memory reserve map for the kexec kernel and
also claims kdump support, as all the necessary changes are added.

The last patch fixes a lookup issue for the `kexec -l -s` case when
memory is reserved for crashkernel.

Tested the changes successfully on P8, P9 lpars, a couple of OpenPOWER
boxes, one with secureboot enabled, a KVM guest and a simulator.

v4 -> v5:
* Dropped patches 07/12 & 08/12 and updated purgatory to do everything
  in assembly.
* Added a new patch (which was part of patch 08/12 in v4) to update
  r8 & r9 registers with opal base & opal entry addresses as it is
  expected on kernels built with CONFIG_PPC_EARLY_DEBUG_OPAL enabled.
* Fixed kexec load issue on KVM guest.

v3 -> v4:
* Updated get_node_path() function to be iterative instead of a
  recursive one.
* Added comment explaining why low memory is added to kdump kernel's
  usable memory ranges though it doesn't fall in crashkernel region.
* Fixed stack_buf to be quadword aligned in accordance with ABI.
* Added missing of_node_put() in setup_purgatory_ppc64().
* Added a FIXME tag to indicate issue in adding opal/rtas regions to
  core image.

v2 -> v3:
* Fixed TOC pointer calculation for purgatory by using section info
  that has relocations applied.
* Fixed arch_kexec_locate_mem_hole() function to fallback to generic
  kexec_locate_mem_hole() lookup if exclude ranges list is empty.
* Dropped check for backup_start in trampoline_64.S as purgatory()
  function takes care of it anyway.

v1 -> v2:
* Introduced arch_kexec_locate_mem_hole() for override and dropped
  weak arch_kexec_add_buffer().
* Addressed warnings reported by lkp.
* Added patch to address kexec load issue when memory is reserved for
  crashkernel.
* Used the appropriate license header for the new files added.
* Added an option to merge ranges to minimize reallocations while
  adding memory ranges.
* Dropped within_crashkernel parameter for add_opal_mem_range() &
  add_rtas_mem_range() functions as it is not really needed.

---

Hari Bathini (11):
  kexec_file: allow archs to handle special regions while locating
    memory hole
  powerpc/kexec_file: mark PPC64 specific code
  powerpc/kexec_file: add helper functions for getting memory ranges
  ppc64/kexec_file: avoid stomping memory used by special regions
  powerpc/drmem: make lmb walk a bit more flexible
  ppc64/kexec_file: restrict memory usage of kdump kernel
  ppc64/kexec_file: enable early kernel's OPAL calls
  ppc64/kexec_file: setup backup region for kdump kernel
  ppc64/kexec_file: prepare elfcore header for crashing kernel
  ppc64/kexec_file: add appropriate regions for memory reserve map
  ppc64/kexec_file: fix kexec load failure with lack of memory hole

 arch/powerpc/include/asm/crashdump-ppc64.h | 19
 arch/powerpc/include/asm/drmem.h           |  9
 arch/powerpc/include/asm/kexec.h           | 29 +
 arch/powerpc/include/asm/kexec_ranges.h    | 25 +
 arch/powerpc/kernel/prom.c                 | 13
 arch/powerpc/kexec/Makefile                |  2
 arch/powerpc/kexec/e
Re: [PATCH v3 5/6] powerpc/pseries: implement paravirt qspinlocks for SPLPAR
On 7/24/20 4:16 AM, Will Deacon wrote:
> On Thu, Jul 23, 2020 at 08:47:59PM +0200, pet...@infradead.org wrote:
>> On Thu, Jul 23, 2020 at 02:32:36PM -0400, Waiman Long wrote:
>>> BTW, do you have any comment on my v2 lock holder cpu info qspinlock
>>> patch? I will have to update the patch to fix the reported 0-day test
>>> problem, but I want to collect other feedback before sending out v3.
>>
>> I want to say I hate it all, it adds instructions to a path we spend an
>> awful lot of time optimizing without really getting anything back for it.
>>
>> Will, how do you feel about it?
>
> I can see it potentially being useful for debugging, but I hate the
> limitation to 256 CPUs. Even arm64 is hitting that now.

After thinking more about that, I think we can use all the remaining
bits in the 16-bit locked_pending. Reserving 1 bit for locked and 1 bit
for pending, there are 14 bits left. So as long as NR_CPUS < 16k (the
requirement for a 16-bit locked_pending), we can put all possible cpu
numbers into the lock. We can also just use smp_processor_id() without
additional percpu data.

> Also, you're talking ~1% gains here. I think our collective time would
> be better spent off reviewing the CNA series and trying to make it
> more deterministic.

I thought you guys are not interested in CNA. I do want to get CNA
merged, if possible. Let me review the current version again and see if
there are ways we can further improve it.

Cheers,
Longman
Re: [PATCH 5/9] powerpc/32s: Fix CONFIG_BOOK3S_601 uses
Michael Ellerman wrote:
> We have two uses of CONFIG_BOOK3S_601, which doesn't exist. Fix them to
> use CONFIG_PPC_BOOK3S_601 which is the correct symbol.
>
> Fixes: 12c3f1fd87bf ("powerpc/32s: get rid of CPU_FTR_601 feature")
> Signed-off-by: Michael Ellerman
> ---
> I think the bug in get_cycles() at least demonstrates that no one has
> booted a 601 since v5.4. Time to drop 601?

Would be great. I can submit a patch for that in August.

Christophe

> ---
>  arch/powerpc/include/asm/ptrace.h | 2 +-
>  arch/powerpc/include/asm/timex.h  | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
> index f194339cef3b..155a197c0aa1 100644
> --- a/arch/powerpc/include/asm/ptrace.h
> +++ b/arch/powerpc/include/asm/ptrace.h
> @@ -243,7 +243,7 @@ static inline void set_trap_norestart(struct pt_regs *regs)
>  }
>
>  #define arch_has_single_step()	(1)
> -#ifndef CONFIG_BOOK3S_601
> +#ifndef CONFIG_PPC_BOOK3S_601
>  #define arch_has_block_step()	(true)
>  #else
>  #define arch_has_block_step()	(false)
>
> diff --git a/arch/powerpc/include/asm/timex.h b/arch/powerpc/include/asm/timex.h
> index d2d2c4bd8435..6047402b0a4d 100644
> --- a/arch/powerpc/include/asm/timex.h
> +++ b/arch/powerpc/include/asm/timex.h
> @@ -17,7 +17,7 @@ typedef unsigned long cycles_t;
>  static inline cycles_t get_cycles(void)
>  {
> -	if (IS_ENABLED(CONFIG_BOOK3S_601))
> +	if (IS_ENABLED(CONFIG_PPC_BOOK3S_601))
>  		return 0;
>
>  	return mftb();
> --
> 2.25.1
Re: [v3 12/15] powerpc/perf: Add support for outputting extended regs in perf intr_regs
> On 24-Jul-2020, at 5:56 PM, Ravi Bangoria wrote:
>
> Hi Athira,
>
>> +/* Function to return the extended register values */
>> +static u64 get_ext_regs_value(int idx)
>> +{
>> +	switch (idx) {
>> +	case PERF_REG_POWERPC_MMCR0:
>> +		return mfspr(SPRN_MMCR0);
>> +	case PERF_REG_POWERPC_MMCR1:
>> +		return mfspr(SPRN_MMCR1);
>> +	case PERF_REG_POWERPC_MMCR2:
>> +		return mfspr(SPRN_MMCR2);
>> +	default: return 0;
>> +	}
>> +}
>> +
>>  u64 perf_reg_value(struct pt_regs *regs, int idx)
>>  {
>> -	if (WARN_ON_ONCE(idx >= PERF_REG_POWERPC_MAX))
>> -		return 0;
>> +	u64 PERF_REG_EXTENDED_MAX;
>
> PERF_REG_EXTENDED_MAX should be initialized. Otherwise ...
>
>> +
>> +	if (cpu_has_feature(CPU_FTR_ARCH_300))
>> +		PERF_REG_EXTENDED_MAX = PERF_REG_MAX_ISA_300;
>>
>>  	if (idx == PERF_REG_POWERPC_SIER &&
>>  	    (IS_ENABLED(CONFIG_FSL_EMB_PERF_EVENT) ||
>> @@ -85,6 +103,16 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
>>  	     IS_ENABLED(CONFIG_PPC32)))
>>  		return 0;
>>
>> +	if (idx >= PERF_REG_POWERPC_MAX && idx < PERF_REG_EXTENDED_MAX)
>> +		return get_ext_regs_value(idx);
>
> On a non-p9/p10 machine, PERF_REG_EXTENDED_MAX may contain a random
> value which will allow the user to pass this if condition
> unintentionally.
>
> Neat: PERF_REG_EXTENDED_MAX is a local variable so it should be in
> lowercase. Any specific reason to define it in capitals?

Hi Ravi,

There is no specific reason. I will include both these changes in the
next version.

Thanks,
Athira Rajeev
Re: [v3 13/15] tools/perf: Add perf tools support for extended register capability in powerpc
> On 24-Jul-2020, at 4:32 PM, Ravi Bangoria wrote:
>
> Hi Athira,
>
> On 7/17/20 8:08 PM, Athira Rajeev wrote:
>> From: Anju T Sudhakar
>>
>> Add extended regs to sample_reg_mask in the tool side to use
>> with `-I?` option. Perf tools side uses extended mask to display
>> the platform supported register names (with -I? option) to the user
>> and also send this mask to the kernel to capture the extended
>> registers in each sample. Hence decide the mask value based on the
>> processor version.
>>
>> Currently definitions for `mfspr`, `SPRN_PVR` are part of
>> `arch/powerpc/util/header.c`. Move this to a header file so that
>> these definitions can be re-used in other source files as well.
>
> It seems this patch has a regression.
>
> Without this patch:
>
> $ sudo ./perf record -I
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.458 MB perf.data (318 samples) ]
>
> With this patch:
>
> $ sudo ./perf record -I
> Error:
> dummy:HG: PMU Hardware doesn't support sampling/overflow-interrupts.
> Try 'perf stat'

Hi Ravi,

Thanks for reviewing this patch and also testing. The above issue
happens since commit 0a892c1c9472 ("perf record: Add dummy event during
system wide synthesis") which adds a dummy event. The fix for this
issue is currently discussed here:
https://lkml.org/lkml/2020/7/19/413

So once this fix is in, the issue will be resolved.

Thanks,
Athira

> Ravi
Re: [PATCHv3 2/2] powerpc/pseries: update device tree before ejecting hotplug uevents
Pingfan Liu writes:

> On Thu, Jul 23, 2020 at 9:27 PM Nathan Lynch wrote:
>> Pingfan Liu writes:
>>> This will introduce extra dt updating payload for each involved lmb
>>> when hotplug. But it should be fine since drmem_update_dt() is memory
>>> based operation and hotplug is not a hot path.
>>
>> This is great analysis but the performance implications of the change
>> are grave. The add/remove paths here are already O(n) where n is the
>> quantity of memory assigned to the LP, this change would make it
>> O(n^2):
>>
>>   dlpar_memory_add_by_count
>>     for_each_drmem_lmb             <--
>>       dlpar_add_lmb
>>         drmem_update_dt(_v1|_v2)
>>           for_each_drmem_lmb       <--
>>
>> Memory add/remove isn't a hot path but quadratic runtime complexity
>> isn't acceptable. Its current performance is bad enough that I have
> Yes, the quadratic runtime complexity sounds terrible.
> And I am curious about the bug. Does the system have thousands of lmb?

Yes.

>> Not to mention we leak memory every time drmem_update_dt is called
>> because we can't safely free device tree properties :-(
> Do you know what block us to free it?

It's a longstanding problem. References to device tree properties
aren't counted or tracked so there's no way to safely free them unless
the node itself is released. But the ibm,dynamic-reconfiguration-memory
node does not ever go away and its properties are only subject to
updates.

Maybe there's a way to address the specific case of
ibm,dynamic-reconfiguration-memory and the ibm,dynamic-memory(-v2)
properties, instead of tackling the general problem.

Regardless of all that, the drmem code needs better data structures and
lookup functions.
Re: [PATCH v4 06/12] ppc64/kexec_file: restrict memory usage of kdump kernel
On 24/07/20 5:36 am, Thiago Jung Bauermann wrote:
>
> Hari Bathini writes:
>
>> Kdump kernel, used for capturing the kernel core image, is supposed
>> to use only specific memory regions to avoid corrupting the image to
>> be captured. The regions are the crashkernel range - the memory
>> reserved explicitly for the kdump kernel, memory used for the
>> tce-table, the OPAL region and the RTAS region as applicable.
>> Restrict kdump kernel memory to use only these regions by setting up
>> the usable-memory DT property. Also, tell the kdump kernel to run at
>> the loaded address by setting the magic word at 0x5c.
>>
>> Signed-off-by: Hari Bathini
>> Tested-by: Pingfan Liu
>> ---
>>
>> v3 -> v4:
>> * Updated get_node_path() to be an iterative function instead of a
>>   recursive one.
>> * Added comment explaining why low memory is added to kdump kernel's
>>   usable memory ranges though it doesn't fall in crashkernel region.
>> * For correctness, added fdt_add_mem_rsv() for the low memory being
>>   added to kdump kernel's usable memory ranges.
>
> Good idea.
>
>> * Fixed prop pointer update in add_usable_mem_property() and changed
>>   duple to tuple as suggested by Thiago.
>
>> +/**
>> + * get_node_pathlen - Get the full path length of the given node.
>> + * @dn: Node.
>> + *
>> + * Also, counts '/' at the end of the path.
>> + * For example, /memory@0 will be "/memory@0/\0" => 11 bytes.
>
> Wouldn't this function return 10 in the case of /memory@0?

Actually, it does return 11. The +1 while returning is for counting the
NUL. On top of that, we count an extra '/' for the root node, so it
ends up as 11: '/' 'memory@0' '/' '\0'. Note the extra '/' before '\0'.
Let me handle the root node separately. That should avoid the
confusion.

>> + *
>> + * Returns the string length of the node's full path.
>> + */
>
> Maybe it's me (by analogy with strlen()), but I would expect "string
> length" to not include the terminating \0. I suggest renaming the
> function to something like get_node_path_size() and do s/length/size/
> in the comment above if it's supposed to count the terminating \0.

Sure, will update the function name.

Thanks,
Hari
Re: [PATCH 1/1 V4] : PCIE PHB reset
On Mon, 13 Jul 2020 09:39:33 -0500, wenxi...@linux.vnet.ibm.com wrote:
> Several device drivers hit EEH (Extended Error Handling) when
> triggering kdump on pseries PowerVM. This patch implements a reset of
> the PHBs in generic PCI code when triggering kdump. The PHB reset
> stops all PCI transactions from the normal kernel. We have tested the
> patch in several environments:
> - direct slot adapters
> - adapters under the switch
> - a VF adapter in PowerVM
> - a VF adapter/adapter in KVM guest.
>
> [...]

Applied to powerpc/next.

[1/1] powerpc/pseries: PCIE PHB reset
      https://git.kernel.org/powerpc/c/5a090f7c363fdc09b99222eae679506a58e7cc68

cheers
Re: [PATCH -next] powerpc: Remove unneeded inline functions
On Fri, 17 Jul 2020 19:27:14 +0800, YueHaibing wrote:
> Both of those functions are only called from 64-bit only code, so the
> stubs should not be needed at all.

Applied to powerpc/next.

[1/1] powerpc: Remove unneeded inline functions
      https://git.kernel.org/powerpc/c/a3f3f8aa1f72dafe1450ccf8cbdfb1d12d42853a

cheers
Re: [PATCH trivial] ppc64/mm: remove comment that is no longer valid
On Tue, 21 Jul 2020 14:49:15 +0530, Santosh Sivaraj wrote:
> hash_low_64.S was removed in [1], and since then flush_hash_page() is
> not called from any assembly routine.
>
> [1]: commit a43c0eb8364c0 ("powerpc/mm: Convert 4k insert from asm to C")

Applied to powerpc/next.

[1/1] powerpc/mm/hash64: Remove comment that is no longer valid
      https://git.kernel.org/powerpc/c/69507b984ddce803df81215cc7813825189adafa

cheers
Re: [PATCH v2 1/2] powerpc/mce: Add MCE notification chain
On Thu, 9 Jul 2020 19:21:41 +0530, Santosh Sivaraj wrote:
> Introduce a notification chain which lets us know about uncorrected
> memory errors (UE). This would help prospective users in the pmem or
> nvdimm subsystem to track bad blocks for better handling of persistent
> memory allocations.

Applied to powerpc/next.

[1/2] powerpc/mce: Add MCE notification chain
      https://git.kernel.org/powerpc/c/c37a63afc429ce959402168f67e4f094ab639ace
[2/2] powerpc/papr/scm: Add bad memory ranges to nvdimm bad ranges
      https://git.kernel.org/powerpc/c/85343a8da2d969df1a10ada8f7cb857d52ea70a6

cheers
Re: [PATCH v4 0/3] powernv/idle: Power9 idle cleanup
On Tue, 21 Jul 2020 21:07:05 +0530, Pratik Rajesh Sampat wrote:
> v3: https://lkml.org/lkml/2020/7/17/1093
>
> Changelog v3 --> v4:
> Based on comments from Nicholas Piggin and Gautham Shenoy,
> 1. Changed the naming of pnv_first_spr_loss_level from
>    pnv_first_fullstate_loss_level to deep_spr_loss_state
> 2. Make the P9 PVR check only on the top level function
>    pnv_probe_idle_states and let the rest of the checks be DT based
>    because it is faster to do so
>
> [...]

Applied to powerpc/next.

[1/3] powerpc/powernv/idle: Replace CPU feature check with PVR check
      https://git.kernel.org/powerpc/c/8747bf36f312356f8a295a0c39ff092d65ce75ae
[2/3] powerpc/powernv/idle: Rename pnv_first_spr_loss_level variable
      https://git.kernel.org/powerpc/c/dcbbfa6b05daca94ebcdbce80a7cf05c717d2942
[3/3] powerpc/powernv/idle: Exclude mfspr on HID1, 4, 5 on P9 and above
      https://git.kernel.org/powerpc/c/5c92fb1b46102e1efe0eed69e743f711bc1c7d2e

cheers
Re: [PATCH] powerpc/64: Fix an out of date comment about MMIO ordering
On Thu, 16 Jul 2020 12:38:20 -0700, Palmer Dabbelt wrote:
> This primitive has been renamed, but because it was spelled incorrectly in the
> first place it must have escaped the fixup patch. As far as I can tell this
> logic is still correct: smp_mb__after_spinlock() uses the default smp_mb()
> implementation, which is "sync" rather than "hwsync" but those are the same
> (though I'm not that familiar with PowerPC).

Applied to powerpc/next.

[1/1] powerpc/64: Fix an out of date comment about MMIO ordering
      https://git.kernel.org/powerpc/c/147c13413c04bc6a2bd76f2503402905e5e98cff

cheers
Re: [PATCH v3] powerpc: select ARCH_HAS_MEMBARRIER_SYNC_CORE
On Thu, 16 Jul 2020 11:35:22 +1000, Nicholas Piggin wrote:
> powerpc return from interrupt and return from system call sequences are
> context synchronising.

Applied to powerpc/next.

[1/1] powerpc: Select ARCH_HAS_MEMBARRIER_SYNC_CORE
      https://git.kernel.org/powerpc/c/2384b36f9156c3b815a5ce5f694edc5054ab7625

cheers
Re: [PATCH v2 0/3] remove PROT_SAO support and disable
On Fri, 3 Jul 2020 11:19:55 +1000, Nicholas Piggin wrote:
> It was suggested that I post this to a wider audience on account of
> the change to supported userspace features in patch 2 particularly.
>
> Thanks,
> Nick
>
> Nicholas Piggin (3):
>   powerpc: remove stale calc_vm_prot_bits comment
>   powerpc/64s: remove PROT_SAO support
>   powerpc/64s/hash: disable subpage_prot syscall by default
>
> [...]

Applied to powerpc/next.

[1/3] powerpc: Remove stale calc_vm_prot_bits() comment
      https://git.kernel.org/powerpc/c/f4ac1774f2cba44994ce9ac0a65772e4656ac2df
[2/3] powerpc/64s: Remove PROT_SAO support
      https://git.kernel.org/powerpc/c/5c9fa16e8abd342ce04dc830c1ebb2a03abf6c05
[3/3] powerpc/64s/hash: Disable subpage_prot syscall by default
      https://git.kernel.org/powerpc/c/63396ada804c676e070bd1b8663046f18698ab27

cheers
Re: [PATCH] powerpc/powernv: machine check handler for POWER10
On Fri, 3 Jul 2020 09:33:43 +1000, Nicholas Piggin wrote:
>

Applied to powerpc/next.

[1/1] powerpc/powernv: Machine check handler for POWER10
      https://git.kernel.org/powerpc/c/201220bb0e8cbc163ec7f550b3b7b3da46eb5877

cheers
Re: [PATCH v2] powerpc/spufs: Rework fcheck() usage
On Fri, 8 May 2020 23:06:33 +1000, Michael Ellerman wrote:
> Currently the spu coredump code triggers an RCU warning:
>
>   =
>   WARNING: suspicious RCU usage
>   5.7.0-rc3-01755-g7cd49f0b7ec7 #1 Not tainted
>   -
>   include/linux/fdtable.h:95 suspicious rcu_dereference_check() usage!
>
> [...]

Applied to powerpc/next.

[1/1] powerpc/spufs: Rework fcheck() usage
      https://git.kernel.org/powerpc/c/38b407be172d3d15afdbfe172691b7caad98120f

cheers
Re: [PATCH 1/2] powerpc/64s/exception: treat NIA below __end_interrupts as soft-masked
On Thu, 11 Jun 2020 18:12:02 +1000, Nicholas Piggin wrote:
> The scv instruction causes an interrupt which can enter the kernel with
> MSR[EE]=1, thus allowing interrupts to hit at any time. These must not
> be taken as normal interrupts, because they come from MSR[PR]=0 context,
> and yet the kernel stack is not yet set up and r13 is not set to the
> PACA).
>
> Treat this as a soft-masked interrupt regardless of the soft masked
> state. This does not affect behaviour yet, because currently all
> interrupts are taken with MSR[EE]=0.

Applied to powerpc/next.

[1/2] powerpc/64s/exception: treat NIA below __end_interrupts as soft-masked
      https://git.kernel.org/powerpc/c/b2dc2977cba48990df45e0a96150663d4f342700
[2/2] powerpc/64s: system call support for scv/rfscv instructions
      https://git.kernel.org/powerpc/c/7fa95f9adaee7e5cbb195d3359741120829e488b

cheers
Re: [PATCH] selftests/powerpc: Add test of memcmp at end of page
On Wed, 22 Jul 2020 15:53:15 +1000, Michael Ellerman wrote:
> Update our memcmp selftest, to test the case where we're comparing up
> to the end of a page and the subsequent page is not mapped. We have to
> make sure we don't read off the end of the page and cause a fault.
>
> We had a bug there in the past, fixed in commit
> d9470757398a ("powerpc/64: Fix memcmp reading past the end of src/dest").

Applied to powerpc/next.

[1/1] selftests/powerpc: Add test of memcmp at end of page
      https://git.kernel.org/powerpc/c/8ac9b9d61f0eceba6ce571e7527798465ae9a7c5

cheers
Re: [PATCH] selftests/powerpc: Run per_event_excludes test on Power8 or later
On Thu, 16 Jul 2020 22:21:42 +1000, Michael Ellerman wrote:
> The per_event_excludes test wants to run on Power8 or later. But
> currently it checks that AT_BASE_PLATFORM *equals* power8, which means
> it only runs on Power8.
>
> Fix it to check for the ISA 2.07 feature, which will be set on Power8
> and later CPUs.

Applied to powerpc/next.

[1/1] selftests/powerpc: Run per_event_excludes test on Power8 or later
      https://git.kernel.org/powerpc/c/9d1ebe9a98c1d7bf7cfbe1dba0052230c042ecdb

cheers
Re: [PATCH] powerpc/perf: fix missing is_sier_aviable() during build
On Sun, 14 Jun 2020 14:06:04 +0530, Madhavan Srinivasan wrote:
> Compilation error:
>
>   arch/powerpc/perf/perf_regs.c:80:undefined reference to `.is_sier_available'
>
> Currently is_sier_available() is part of core-book3s.c.
> But then, core-book3s.c is added to build based on
> CONFIG_PPC_PERF_CTRS. A config with CONFIG_PERF_EVENTS
> and without CONFIG_PPC_PERF_CTRS will have a build break
> because of missing is_sier_available(). Patch adds
> is_sier_available() in asm/perf_event.h to fix the build
> break for configs missing CONFIG_PPC_PERF_CTRS.

Applied to powerpc/next.

[1/1] powerpc/perf: Fix missing is_sier_aviable() during build
      https://git.kernel.org/powerpc/c/3c9450c053f88e525b2db1e6990cdf34d14e7696

cheers
Re: [PATCH 1/1] KVM/PPC: Fix typo on H_DISABLE_AND_GET hcall
On Mon, 6 Jul 2020 21:48:12 -0300, Leonardo Bras wrote:
> On PAPR+ the hcall() on 0x1B0 is called H_DISABLE_AND_GET, but got
> defined as H_DISABLE_AND_GETC instead.
>
> This define was introduced with a typo in commit
> ("[PATCH] powerpc: Extends HCALL interface for InfiniBand usage"), and was
> later used without having the typo noticed.

Applied to powerpc/next.

[1/1] KVM: PPC: Fix typo on H_DISABLE_AND_GET hcall
      https://git.kernel.org/powerpc/c/0f10228c6ff6af36cbb31af35b01f76cdb0b3fc1

cheers
Re: [PATCH 1/5] powerpc sstep: Add tests for prefixed integer load/stores
On Mon, 25 May 2020 12:59:19 +1000, Jordan Niethe wrote:
> Add tests for the prefixed versions of the integer load/stores that are
> currently tested. This includes the following instructions:
>   * Prefixed Load Doubleword (pld)
>   * Prefixed Load Word and Zero (plwz)
>   * Prefixed Store Doubleword (pstd)
>
> Skip the new tests if ISA v3.1 is unsupported.

Applied to powerpc/next.

[1/5] powerpc/sstep: Add tests for prefixed integer load/stores
      https://git.kernel.org/powerpc/c/b6b54b42722a2393056c891c0d05cd8cc40eb776
[2/5] powerpc/sstep: Add tests for prefixed floating-point load/stores
      https://git.kernel.org/powerpc/c/0396de6d8561c721b03fce386eb9682b37a26013
[3/5] powerpc/sstep: Set NIP in instruction emulation tests
      https://git.kernel.org/powerpc/c/1c89cf7fbed36f078b20fd47d308b4fc6dbff5f6
[4/5] powerpc/sstep: Let compute tests specify a required cpu feature
      https://git.kernel.org/powerpc/c/301ebf7d69f6709575d137a41a0291f69f343aed
[5/5] powerpc/sstep: Add tests for Prefixed Add Immediate
      https://git.kernel.org/powerpc/c/4f825900786e1c24e4c48622e12eb493a6cd27b6

cheers
Re: [PATCH 1/4] powerpc: Add a ppc_inst_as_str() helper
On Tue, 2 Jun 2020 15:27:25 +1000, Jordan Niethe wrote:
> There are quite a few places where instructions are printed, this is
> done using a '%x' format specifier. With the introduction of prefixed
> instructions, this does not work well. Currently in these places,
> ppc_inst_val() is used for the value for %x so only the first word of
> prefixed instructions are printed.
>
> When the instructions are word instructions, only a single word should
> be printed. For prefixed instructions both the prefix and suffix should
> be printed. To accommodate both of these situations, instead of a '%x'
> specifier use '%s' and introduce a helper, __ppc_inst_as_str() which
> returns a char *. The char * __ppc_inst_as_str() returns is buffer that
> is passed to it by the caller.
>
> [...]

Patches 1-2 applied to powerpc/next.

[1/4] powerpc: Add a ppc_inst_as_str() helper
      https://git.kernel.org/powerpc/c/50428fdc53ba48f6936b10dfdc0d644972403908
[2/4] powerpc/xmon: Improve dumping prefixed instructions
      https://git.kernel.org/powerpc/c/8b98afc117aaf825c66d7ddd59f1849e559b42cd

cheers
Re: [PATCH] powerpc/spufs: fix the type of ret in spufs_arch_write_note
On Wed, 10 Jun 2020 10:55:54 +0200, Christoph Hellwig wrote:
> Both the ->dump method and snprintf return an int. So switch to an
> int and properly handle errors from ->dump.

Applied to powerpc/next.

[1/1] powerpc/spufs: Fix the type of ret in spufs_arch_write_note
      https://git.kernel.org/powerpc/c/7c7ff885c7bce40a487e41c68f1dac14dd2c8033

cheers
Re: [PATCH] powerpc: Replace HTTP links with HTTPS ones
On Sat, 18 Jul 2020 12:39:58 +0200, Alexander A. Klimov wrote:
> Rationale:
> Reduces attack surface on kernel devs opening the links for MITM
> as HTTPS traffic is much harder to manipulate.
>
> Deterministic algorithm:
> For each file:
>   If not .svg:
>     For each line:
>       If doesn't contain `\bxmlns\b`:
>         For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
>           If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
>             If both the HTTP and HTTPS versions
>             return 200 OK and serve the same content:
>               Replace HTTP with HTTPS.

Applied to powerpc/next.

[1/1] powerpc: Replace HTTP links with HTTPS ones
      https://git.kernel.org/powerpc/c/c8ed9fc9d29e24dafd08971e6a0c6b302a8ade2d

cheers
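The quoted algorithm is concrete enough to sketch in code. Below is a minimal, hypothetical Python re-implementation of the per-line rewrite step (not Alexander's actual script); the regexes are taken from the description above, and the `serves_same_content` check is injected as a callable because the real step, fetching both URLs and comparing the responses, is deliberately left out of this offline sketch.

```python
import re

# Link pattern and exclusions quoted from the patch description
# (the link character class is written with \s for brevity).
LINK_RE = re.compile(r"\bhttp://[^#\s]*(?:\w|/)")
EXCLUDE_RE = re.compile(r"\bgnu\.org/license|\bmozilla\.org/MPL\b")

def upgrade_links(line: str, serves_same_content) -> str:
    """Replace http:// links with https:// on one line, when the
    injected serves_same_content(url) check approves the upgrade."""
    if re.search(r"\bxmlns\b", line):
        return line  # skip XML namespace declarations

    def repl(match: re.Match) -> str:
        url = match.group(0)
        if EXCLUDE_RE.search(url):
            return url  # keep license links untouched
        # The real script fetches both URLs and compares the bodies;
        # here that decision is delegated to the caller.
        if serves_same_content(url):
            return "https://" + url[len("http://"):]
        return url

    return LINK_RE.sub(repl, line)

# Example with a stubbed check that approves every upgrade.
print(upgrade_links("See http://www.kernel.org/doc/ for details.",
                    lambda url: True))
# -> See https://www.kernel.org/doc/ for details.
```

The injected callable also makes the rewrite step testable without any network access, which matches the deterministic spirit of the quoted algorithm.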
Re: [PATCH] macintosh/therm_adt746x: Replace HTTP links with HTTPS ones
On Fri, 17 Jul 2020 20:29:40 +0200, Alexander A. Klimov wrote:
> Rationale:
> Reduces attack surface on kernel devs opening the links for MITM
> as HTTPS traffic is much harder to manipulate.
>
> Deterministic algorithm:
> For each file:
>   If not .svg:
>     For each line:
>       If doesn't contain `\bxmlns\b`:
>         For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
>           If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
>             If both the HTTP and HTTPS versions
>             return 200 OK and serve the same content:
>               Replace HTTP with HTTPS.

Applied to powerpc/next.

[1/1] macintosh/therm_adt746x: Replace HTTP links with HTTPS ones
      https://git.kernel.org/powerpc/c/1666e5ea2f838f4266e50e4f3d973c0824256429

cheers
Re: [PATCH] macintosh/adb: Replace HTTP links with HTTPS ones
On Fri, 17 Jul 2020 20:35:22 +0200, Alexander A. Klimov wrote:
> Rationale:
> Reduces attack surface on kernel devs opening the links for MITM
> as HTTPS traffic is much harder to manipulate.
>
> Deterministic algorithm:
> For each file:
>   If not .svg:
>     For each line:
>       If doesn't contain `\bxmlns\b`:
>         For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
>           If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
>             If both the HTTP and HTTPS versions
>             return 200 OK and serve the same content:
>               Replace HTTP with HTTPS.

Applied to powerpc/next.

[1/1] macintosh/adb: Replace HTTP links with HTTPS ones
      https://git.kernel.org/powerpc/c/a7beab413e2e67dd1abe6bdd0001576892a89e81

cheers
Re: [PATCH v2 0/4] Prefixed instruction tests to cover negative cases
On Fri, 26 Jun 2020 15:21:54 +0530, Balamuruhan S wrote:
> This patchset adds support to test negative scenarios and adds testcase
> for paddi with few fixes. It is based on powerpc/next and on top of
> Jordan's tests for prefixed instructions patchsets,
>
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-May/211394.html
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-June/211768.html
>
> [...]

Applied to powerpc/next.

[1/4] powerpc/test_emulate_step: Enhancement to test negative scenarios
      https://git.kernel.org/powerpc/c/93c3a0ba2a0863a5c82a518d64044434f82a57f5
[2/4] powerpc/test_emulate_step: Add negative tests for prefixed addi
      https://git.kernel.org/powerpc/c/7e67c73b939b25d4ad18a536e52282aa35d8ee56
[3/4] powerpc/sstep: Introduce macros to retrieve Prefix instruction operands
      https://git.kernel.org/powerpc/c/68a180a44c29d7e918ae7d3c18a01b0751d1c22f
[4/4] powerpc/test_emulate_step: Move extern declaration to sstep.h
      https://git.kernel.org/powerpc/c/e93ad65e3611b06288efdf0cfd76c012df3feec1

cheers
Re: [v3 00/15] powerpc/perf: Add support for power10 PMU Hardware
On Fri, 17 Jul 2020 10:38:12 -0400, Athira Rajeev wrote:
> The patch series adds support for power10 PMU hardware.
>
> Patches 1..3 are the clean up patches which refactors the way how
> PMU SPR's are stored in core-book3s and in KVM book3s, as well as update
> data type for PMU cache_events.
>
> Patches 12 and 13 adds base support for perf extended register
> capability in powerpc. Support for extended regs in power10 is
> covered in patches 14,15
>
> [...]

Patches 1-11 applied to powerpc/next.

[01/15] powerpc/perf: Update cpu_hw_event to use `struct` for storing MMCR registers
        https://git.kernel.org/powerpc/c/78d76819e6f04672989506e7792895a51438516e
[02/15] KVM: PPC: Book3S HV: Cleanup updates for kvm vcpu MMCR
        https://git.kernel.org/powerpc/c/7e4a145e5b675d5a9182f756950f001eaa256795
[03/15] powerpc/perf: Update Power PMU cache_events to u64 type
        https://git.kernel.org/powerpc/c/9d4fc86dcd510dab5521a6c891f9bf379b85a7e0
[04/15] powerpc/perf: Add support for ISA3.1 PMU SPRs
        https://git.kernel.org/powerpc/c/c718547e4a92d74089f862457adf1f617c498e16
[05/15] KVM: PPC: Book3S HV: Save/restore new PMU registers
        https://git.kernel.org/powerpc/c/5752fe0b811bb3cee531c52074921c6dd09dc42d
[06/15] powerpc/xmon: Add PowerISA v3.1 PMU SPRs
        https://git.kernel.org/powerpc/c/1979ae8c7215718c7a98f038bad0122034ad6529
[07/15] powerpc/perf: Add Power10 PMU feature to DT CPU features
        https://git.kernel.org/powerpc/c/9908c826d5ed150637a3a4c0eec5146a0c438f21
[08/15] powerpc/perf: power10 Performance Monitoring support
        https://git.kernel.org/powerpc/c/a64e697cef23b3d24bac700f6d66c8e2bf8efccc
[09/15] powerpc/perf: Ignore the BHRB kernel address filtering for P10
        https://git.kernel.org/powerpc/c/bfe3b1945d5e0531103b3d4ab3a367a1a156d99a
[10/15] powerpc/perf: Add Power10 BHRB filter support for PERF_SAMPLE_BRANCH_IND_CALL/COND
        https://git.kernel.org/powerpc/c/80350a4bac992e3404067d31ff901ae9ff76aaa8
[11/15] powerpc/perf: BHRB control to disable BHRB logic when not used
        https://git.kernel.org/powerpc/c/1cade527f6e9bec6a6412d0641643c359ada8096

cheers
Re: [PATCH v6 00/23] powerpc/book3s/64/pkeys: Simplify the code
On Thu, 9 Jul 2020 08:59:23 +0530, Aneesh Kumar K.V wrote:
> This patch series update the pkey subsystem with more documentation and
> rename variables so that it is easy to follow the code. We drop the changes
> to support KUAP/KUEP with hash translation in this update. The changes
> are adding 200 cycles to null syscalls benchmark and I want to look at that
> closely before requesting a merge. The rest of the patches are included
> in this series. This should avoid having to carry a large patchset across
> the upstream merge. Some of the changes in here make the hash KUEP/KUAP
> addition simpler.
>
> [...]

Applied to powerpc/next.

[01/23] powerpc/book3s64/pkeys: Use PVR check instead of cpu feature
        https://git.kernel.org/powerpc/c/d79e7a5f26f1d179cbb915a8bf2469b6d7431c29
[02/23] powerpc/book3s64/pkeys: Fixup bit numbering
        https://git.kernel.org/powerpc/c/33699023f51f96ac9be38747e64967ea05e00bab
[03/23] powerpc/book3s64/pkeys: pkeys are supported only on hash on book3s.
        https://git.kernel.org/powerpc/c/b9658f83e721ddfcee3e08b16a6628420de424c3
[04/23] powerpc/book3s64/pkeys: Move pkey related bits in the linux page table
        https://git.kernel.org/powerpc/c/ee8b39331f89950b0a011c7965db5694f0153166
[05/23] powerpc/book3s64/pkeys: Explain key 1 reservation details
        https://git.kernel.org/powerpc/c/1f404058e2911afe08417ef82f17aba6adccfc63
[06/23] powerpc/book3s64/pkeys: Simplify the key initialization
        https://git.kernel.org/powerpc/c/f491fe3fb41eafc7a159874040e032ad41ade210
[07/23] powerpc/book3s64/pkeys: Prevent key 1 modification from userspace.
        https://git.kernel.org/powerpc/c/718d9b380174eb8fe16d67769395737b79654a02
[08/23] powerpc/book3s64/pkeys: kill cpu feature key CPU_FTR_PKEY
        https://git.kernel.org/powerpc/c/a24204c307962214996627e3f4caa8772b9b0cf4
[09/23] powerpc/book3s64/pkeys: Simplify pkey disable branch
        https://git.kernel.org/powerpc/c/a4678d4b477c3d2901f101986ca01406f3b7eaea
[10/23] powerpc/book3s64/pkeys: Convert pkey_total to num_pkey
        https://git.kernel.org/powerpc/c/c529afd7cbc71ae1dc44a31efc7c1c9db3c3a143
[11/23] powerpc/book3s64/pkeys: Make initial_allocation_mask static
        https://git.kernel.org/powerpc/c/3c8ab47362fe9a74f61b48efe957666a423c55a2
[12/23] powerpc/book3s64/pkeys: Mark all the pkeys above max pkey as reserved
        https://git.kernel.org/powerpc/c/3e4352aeb8b17eb1040ba288f586620e8294389d
[13/23] powerpc/book3s64/pkeys: Add MMU_FTR_PKEY
        https://git.kernel.org/powerpc/c/d3cd91fb8d2e202cf8ebb6f271898aaf37ecda8f
[14/23] powerpc/book3s64/kuep: Add MMU_FTR_KUEP
        https://git.kernel.org/powerpc/c/e10cc8715d180509a367d3ab25d40e4a1612cb2f
[15/23] powerpc/book3s64/pkeys: Use pkey_execute_disable_supported
        https://git.kernel.org/powerpc/c/2daf298de728dc37f32d0749fa4f59db36fa7d96
[16/23] powerpc/book3s64/pkeys: Use MMU_FTR_PKEY instead of pkey_disabled static key
        https://git.kernel.org/powerpc/c/f7045a45115b17fe695ea7075f5213706f202edb
[17/23] powerpc/book3s64/keys: Print information during boot.
        https://git.kernel.org/powerpc/c/7cdd3745f2d75aecc2b61368e2563ae54bfac59a
[18/23] powerpc/book3s64/keys/kuap: Reset AMR/IAMR values on kexec
        https://git.kernel.org/powerpc/c/000a42b35a54372597f0657f6b9875b38c641864
[19/23] powerpc/book3s64/kuap: Move UAMOR setup to key init function
        https://git.kernel.org/powerpc/c/e0d8e991be641ba0034c67785bf86f6c097869d6
[20/23] selftests/powerpc: ptrace-pkey: Rename variables to make it easier to follow code
        https://git.kernel.org/powerpc/c/9a11f12e0a6c374b3ef1ce81e32ce477d28eb1b8
[21/23] selftests/powerpc: ptrace-pkey: Update the test to mark an invalid pkey correctly
        https://git.kernel.org/powerpc/c/0eaa3b5ca7b5a76e3783639c828498343be66a01
[22/23] selftests/powerpc: ptrace-pkey: Don't update expected UAMOR value
        https://git.kernel.org/powerpc/c/3563b9bea0ca7f53e4218b5e268550341a49f333
[23/23] powerpc/book3s64/pkeys: Remove is_pkey_enabled()
        https://git.kernel.org/powerpc/c/482b9b3948675df60c015b2155011c1f93234992

cheers
Re: [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes
On Thu, 9 Jul 2020 18:49:21 +0530, Aneesh Kumar K.V wrote:
> This is the next version of the fixes for memory unplug on radix.
> The issues and the fix are described in the actual patches.
>
> Changes from v2:
> - Address review feedback
>
> Changes from v1:
> - Added back patch to drop split_kernel_mapping
> - Most of the split_kernel_mapping related issues are now described
>   in the removal patch
> - drop pte fragment change
> - use lmb size as the max mapping size.
> - Radix baremetal now use memory block size of 1G.
>
> [...]

Applied to powerpc/next.

[1/4] powerpc/mm/radix: Fix PTE/PMD fragment count for early page table mappings
      https://git.kernel.org/powerpc/c/645d5ce2f7d6cb4dcf6a4e087fb550e238d24283
[2/4] powerpc/mm/radix: Free PUD table when freeing pagetable
      https://git.kernel.org/powerpc/c/9ce8853b4a735c8115f55ac0e9c2b27a4c8f80b5
[3/4] powerpc/mm/radix: Remove split_kernel_mapping()
      https://git.kernel.org/powerpc/c/d6d6ebfc5dbb4008be21baa4ec2ad45606578966
[4/4] powerpc/mm/radix: Create separate mappings for hot-plugged memory
      https://git.kernel.org/powerpc/c/af9d00e93a4f062c5f160325d7b8f6f6744e

cheers
[PATCH 9/9] powerpc: Drop old comment about CONFIG_POWER
There's a comment in time.h referring to CONFIG_POWER, which doesn't
exist. That confuses scripts/checkkconfigsymbols.py.

Presumably the comment was referring to a CONFIG_POWER vs CONFIG_PPC, in
which case for CONFIG_POWER we would #define __USE_RTC to 1. But instead
we have CONFIG_PPC_BOOK3S_601, and these days we have IS_ENABLED().

So the comment is no longer relevant, drop it.

Signed-off-by: Michael Ellerman
---
 arch/powerpc/include/asm/time.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index b287cfc2dd85..cb326720a8a1 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -39,7 +39,6 @@ struct div_result {
 };
 
 /* Accessor functions for the timebase (RTC on 601) registers. */
-/* If one day CONFIG_POWER is added just define __USE_RTC as 1 */
 #define __USE_RTC()	(IS_ENABLED(CONFIG_PPC_BOOK3S_601))
 
 #ifdef CONFIG_PPC64
-- 
2.25.1
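Several patches in this series were flagged by scripts/checkkconfigsymbols.py, which cross-references CONFIG_ usages in the tree against symbols actually declared in Kconfig files. A minimal, hypothetical sketch of that idea (not the real script, which also handles ifdef expressions, defconfig fragments and git diff mode) might look like:

```python
import re

# CONFIG_ references as they appear in sources, comments and Makefiles,
# versus symbols declared by `config FOO` / `menuconfig FOO` lines.
REFERENCE_RE = re.compile(r"\bCONFIG_([A-Za-z0-9_]+)")
KCONFIG_DECL_RE = re.compile(r"^\s*(?:menu)?config\s+([A-Za-z0-9_]+)", re.M)

def undefined_references(source_texts, kconfig_texts):
    """Return CONFIG_ names referenced in the sources but never
    declared in any of the given Kconfig texts."""
    declared = set()
    for text in kconfig_texts:
        declared.update(KCONFIG_DECL_RE.findall(text))
    referenced = set()
    for text in source_texts:
        referenced.update(REFERENCE_RE.findall(text))
    return sorted(referenced - declared)

# The stale comment from time.h trips exactly this kind of check:
source = "/* If one day CONFIG_POWER is added just define __USE_RTC as 1 */"
kconfig = "config PPC_BOOK3S_601\n\tbool\n"
print(undefined_references([source], [kconfig]))
# -> ['POWER']
```

Even a comment counts as a "reference" under this scheme, which is why the patch above removes the comment rather than the code.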
[PATCH 8/9] powerpc/kvm: Use correct CONFIG symbol in comment
This comment refers to the non-existent CONFIG_PPC_BOOK3S_XX, which
confuses scripts/checkkconfigsymbols.py. Change it to use the correct
symbol.

Signed-off-by: Michael Ellerman
---
 arch/powerpc/kvm/book3s_interrupts.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_interrupts.S b/arch/powerpc/kvm/book3s_interrupts.S
index f7ad99d972ce..607a9b99c334 100644
--- a/arch/powerpc/kvm/book3s_interrupts.S
+++ b/arch/powerpc/kvm/book3s_interrupts.S
@@ -26,7 +26,7 @@
 #define FUNC(name)	name
 #define GET_SHADOW_VCPU(reg)	lwz	reg, (THREAD + THREAD_KVM_SVCPU)(r2)
 
-#endif /* CONFIG_PPC_BOOK3S_XX */
+#endif /* CONFIG_PPC_BOOK3S_64 */
 
 #define VCPU_LOAD_NVGPRS(vcpu) \
 	PPC_LL	r14, VCPU_GPR(R14)(vcpu);	\
-- 
2.25.1
[PATCH 7/9] powerpc/boot: Fix CONFIG_PPC_MPC52XX references
Commit 866bfc75f40e ("powerpc: conditionally compile platform-specific
serial drivers") made some code depend on CONFIG_PPC_MPC52XX, which
doesn't exist. Fix it to use CONFIG_PPC_MPC52xx.

Fixes: 866bfc75f40e ("powerpc: conditionally compile platform-specific serial drivers")
Signed-off-by: Michael Ellerman
---
 arch/powerpc/boot/Makefile | 2 +-
 arch/powerpc/boot/serial.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 4d43cb59b4a4..44af71543380 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -117,7 +117,7 @@ src-wlib-y := string.S crt0.S stdio.c decompress.c main.c \
 		elf_util.c $(zlib-y) devtree.c stdlib.c \
 		oflib.c ofconsole.c cuboot.c
 
-src-wlib-$(CONFIG_PPC_MPC52XX) += mpc52xx-psc.c
+src-wlib-$(CONFIG_PPC_MPC52xx) += mpc52xx-psc.c
 src-wlib-$(CONFIG_PPC64_BOOT_WRAPPER) += opal-calls.S opal.c
 ifndef CONFIG_PPC64_BOOT_WRAPPER
 src-wlib-y += crtsavres.S
diff --git a/arch/powerpc/boot/serial.c b/arch/powerpc/boot/serial.c
index 0bfa7e87e546..9a19e5905485 100644
--- a/arch/powerpc/boot/serial.c
+++ b/arch/powerpc/boot/serial.c
@@ -128,7 +128,7 @@ int serial_console_init(void)
 		 dt_is_compatible(devp, "fsl,cpm2-smc-uart"))
 		rc = cpm_console_init(devp, &serial_cd);
 #endif
-#ifdef CONFIG_PPC_MPC52XX
+#ifdef CONFIG_PPC_MPC52xx
 	else if (dt_is_compatible(devp, "fsl,mpc5200-psc-uart"))
 		rc = mpc5200_psc_console_init(devp, &serial_cd);
 #endif
-- 
2.25.1
[PATCH 5/9] powerpc/32s: Fix CONFIG_BOOK3S_601 uses
We have two uses of CONFIG_BOOK3S_601, which doesn't exist. Fix them to
use CONFIG_PPC_BOOK3S_601 which is the correct symbol.

Fixes: 12c3f1fd87bf ("powerpc/32s: get rid of CPU_FTR_601 feature")
Signed-off-by: Michael Ellerman
---
I think the bug in get_cycles() at least demonstrates that no one has
booted a 601 since v5.4. Time to drop 601?
---
 arch/powerpc/include/asm/ptrace.h | 2 +-
 arch/powerpc/include/asm/timex.h  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index f194339cef3b..155a197c0aa1 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -243,7 +243,7 @@ static inline void set_trap_norestart(struct pt_regs *regs)
 }
 
 #define arch_has_single_step()	(1)
-#ifndef CONFIG_BOOK3S_601
+#ifndef CONFIG_PPC_BOOK3S_601
 #define arch_has_block_step()	(true)
 #else
 #define arch_has_block_step()	(false)
diff --git a/arch/powerpc/include/asm/timex.h b/arch/powerpc/include/asm/timex.h
index d2d2c4bd8435..6047402b0a4d 100644
--- a/arch/powerpc/include/asm/timex.h
+++ b/arch/powerpc/include/asm/timex.h
@@ -17,7 +17,7 @@ typedef unsigned long cycles_t;
 
 static inline cycles_t get_cycles(void)
 {
-	if (IS_ENABLED(CONFIG_BOOK3S_601))
+	if (IS_ENABLED(CONFIG_PPC_BOOK3S_601))
 		return 0;
 
 	return mftb();
-- 
2.25.1
[PATCH 6/9] powerpc/32s: Remove TAUException wart in traps.c
All 32 and 64-bit builds that don't have CONFIG_TAU_INT enabled (all of
them), get a definition of TAUException() in traps.c.

On 64-bit it's completely useless, and just wastes ~120 bytes of text.
On 32-bit it allows the kernel to link because head_32.S calls it
unconditionally.

Instead follow the example of altivec_assist_exception(), and if
CONFIG_TAU_INT is not enabled just point it at unknown_exception using
the preprocessor.

Signed-off-by: Michael Ellerman
---
Can we just remove TAU_INT entirely? It's in zero defconfigs and doesn't
sound like something anyone really wants to enable:

  However, on some cpus it appears that the TAU interrupt hardware
  is buggy and can cause a situation which would lead unexplained
  hard lockups. Unless you are extending the TAU driver, or enjoy
  kernel/hardware debugging, leave this option off.
---
 arch/powerpc/kernel/head_32.S | 4 ++++
 arch/powerpc/kernel/traps.c   | 8 --------
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 705c042309d8..dcfb7dceb6d6 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -671,6 +671,10 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_NEED_DTLB_SW_LRU)
 
 #ifndef CONFIG_ALTIVEC
 #define altivec_assist_exception	unknown_exception
+#endif
+
+#ifndef CONFIG_TAU_INT
+#define TAUException	unknown_exception
 #endif
 
 EXCEPTION(0x1300, Trap_13, instruction_breakpoint_exception, EXC_XFER_STD)
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 97413a385720..d1ebe152f210 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -2060,14 +2060,6 @@
 NOKPROBE_SYMBOL(DebugException);
 #endif /* CONFIG_PPC_ADV_DEBUG_REGS */
 
-#if !defined(CONFIG_TAU_INT)
-void TAUException(struct pt_regs *regs)
-{
-	printk("TAU trap at PC: %lx, MSR: %lx, vector=%lx%s\n",
-	       regs->nip, regs->msr, regs->trap, print_tainted());
-}
-#endif /* CONFIG_INT_TAU */
-
 #ifdef CONFIG_ALTIVEC
 void altivec_assist_exception(struct pt_regs *regs)
 {
-- 
2.25.1
[PATCH 3/9] powerpc/52xx: Fix comment about CONFIG_BDI*
There's a comment in lite5200_sleep.S that refers to "CONFIG_BDI*". This
confuses scripts/checkkconfigsymbols.py, which thinks it should be able
to find CONFIG_BDI.

Change the comment to refer to CONFIG_BDI_SWITCH which is presumably
roughly what it was referring to. AFAICS there never has been a
CONFIG_BDI.

Signed-off-by: Michael Ellerman
---
If anyone has a better idea what it means feel free to reply.
---
 arch/powerpc/platforms/52xx/lite5200_sleep.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/52xx/lite5200_sleep.S b/arch/powerpc/platforms/52xx/lite5200_sleep.S
index 70083649c9ea..11475c58ea43 100644
--- a/arch/powerpc/platforms/52xx/lite5200_sleep.S
+++ b/arch/powerpc/platforms/52xx/lite5200_sleep.S
@@ -56,7 +56,7 @@
 /*
  * save stuff BDI overwrites
  * 0xf0 (0xe0->0x100 gets overwritten when BDI connected;
- *   even when CONFIG_BDI* is disabled and MMU XLAT commented; heisenbug?))
+ *   even when CONFIG_BDI_SWITCH is disabled and MMU XLAT commented; heisenbug?))
  * WARNING: self-refresh doesn't seem to work when BDI2000 is connected,
  * possibly because BDI sets SDRAM registers before wakeup code does
  */
-- 
2.25.1
[PATCH 4/9] powerpc/64e: Drop dead BOOK3E_MMU_TLB_STATS code
This code was merged 11 years ago in commit 13363ab9b9d0 ("powerpc: Add
definitions used by exception handling on 64-bit Book3E") but was never
able to be built because CONFIG_BOOK3E_MMU_TLB_STATS never existed.

Remove it.

Signed-off-by: Michael Ellerman
---
 arch/powerpc/include/asm/exception-64e.h | 53 +---
 arch/powerpc/mm/nohash/tlb_low_64e.S     | 47 ++---
 2 files changed, 4 insertions(+), 96 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64e.h b/arch/powerpc/include/asm/exception-64e.h
index 72b6657acd2d..40cdcb2fb057 100644
--- a/arch/powerpc/include/asm/exception-64e.h
+++ b/arch/powerpc/include/asm/exception-64e.h
@@ -66,14 +66,7 @@
 #define EX_TLB_SRR0	(10 * 8)
 #define EX_TLB_SRR1	(11 * 8)
 #define EX_TLB_R7	(12 * 8)
-#ifdef CONFIG_BOOK3E_MMU_TLB_STATS
-#define EX_TLB_R8	(13 * 8)
-#define EX_TLB_R9	(14 * 8)
-#define EX_TLB_LR	(15 * 8)
-#define EX_TLB_SIZE	(16 * 8)
-#else
 #define EX_TLB_SIZE	(13 * 8)
-#endif
 
 #define	START_EXCEPTION(label)				\
 	.globl exc_##label##_book3e;			\
@@ -110,8 +103,7 @@ exc_##label##_book3e:
 	std	r11,EX_TLB_R12(r12);			\
 	mtspr	SPRN_SPRG_TLB_EXFRAME,r14;		\
 	std	r15,EX_TLB_SRR1(r12);			\
-	std	r16,EX_TLB_SRR0(r12);			\
-	TLB_MISS_PROLOG_STATS
+	std	r16,EX_TLB_SRR0(r12);
 
 /* And these are the matching epilogs that restores things
  *
@@ -143,7 +135,6 @@ exc_##label##_book3e:
 	mtspr	SPRN_SRR0,r15;				\
 	ld	r15,EX_TLB_R15(r12);			\
 	mtspr	SPRN_SRR1,r16;				\
-	TLB_MISS_RESTORE_STATS				\
 	ld	r16,EX_TLB_R16(r12);			\
 	ld	r12,EX_TLB_R12(r12);			\
 
@@ -158,48 +149,6 @@ exc_##label##_book3e:
 	addi	r11,r13,PACA_EXTLB;			\
 	TLB_MISS_RESTORE(r11)
 
-#ifdef CONFIG_BOOK3E_MMU_TLB_STATS
-#define TLB_MISS_PROLOG_STATS				\
-	mflr	r10;					\
-	std	r8,EX_TLB_R8(r12);			\
-	std	r9,EX_TLB_R9(r12);			\
-	std	r10,EX_TLB_LR(r12);
-#define TLB_MISS_RESTORE_STATS				\
-	ld	r16,EX_TLB_LR(r12);			\
-	ld	r9,EX_TLB_R9(r12);			\
-	ld	r8,EX_TLB_R8(r12);			\
-	mtlr	r16;
-#define TLB_MISS_STATS_D(name)				\
-	addi	r9,r13,MMSTAT_DSTATS+name;		\
-	bl	tlb_stat_inc;
-#define TLB_MISS_STATS_I(name)				\
-	addi	r9,r13,MMSTAT_ISTATS+name;		\
-	bl	tlb_stat_inc;
-#define TLB_MISS_STATS_X(name)				\
-	ld	r8,PACA_EXTLB+EX_TLB_ESR(r13);		\
-	cmpdi	cr2,r8,-1;				\
-	beq	cr2,61f;				\
-	addi	r9,r13,MMSTAT_DSTATS+name;		\
-	b	62f;					\
-61:	addi	r9,r13,MMSTAT_ISTATS+name;		\
-62:	bl	tlb_stat_inc;
-#define TLB_MISS_STATS_SAVE_INFO			\
-	std	r14,EX_TLB_ESR(r12);	/* save ESR */
-#define TLB_MISS_STATS_SAVE_INFO_BOLTED			\
-	std	r14,PACA_EXTLB+EX_TLB_ESR(r13);	/* save ESR */
-#else
-#define TLB_MISS_PROLOG_STATS
-#define TLB_MISS_RESTORE_STATS
-#define TLB_MISS_PROLOG_STATS_BOLTED
-#define TLB_MISS_RESTORE_STATS_BOLTED
-#define TLB_MISS_STATS_D(name)
-#define TLB_MISS_STATS_I(name)
-#define TLB_MISS_STATS_X(name)
-#define TLB_MISS_STATS_Y(name)
-#define TLB_MISS_STATS_SAVE_INFO
-#define TLB_MISS_STATS_SAVE_INFO_BOLTED
-#endif
-
 #define SET_IVOR(vector_number, vector_offset)	\
 	LOAD_REG_ADDR(r3,interrupt_base_book3e);\
 	ori	r3,r3,vector_offset@l;		\
diff --git a/arch/powerpc/mm/nohash/tlb_low_64e.S b/arch/powerpc/mm/nohash/tlb_low_64e.S
index d5e2704d0096..bf24451f3e71 100644
--- a/arch/powerpc/mm/nohash/tlb_low
[PATCH 2/9] powerpc/configs: Remove dead symbols
Remove references to symbols that no longer exist as reported by
scripts/checkkconfigsymbols.py.

Signed-off-by: Michael Ellerman
---
 arch/powerpc/configs/44x/akebono_defconfig      | 1 -
 arch/powerpc/configs/85xx/xes_mpc85xx_defconfig | 3 ---
 arch/powerpc/configs/86xx-hw.config             | 2 --
 arch/powerpc/configs/fsl-emb-nonhw.config       | 1 -
 arch/powerpc/configs/g5_defconfig               | 1 -
 arch/powerpc/configs/linkstation_defconfig      | 1 -
 arch/powerpc/configs/mpc512x_defconfig          | 1 -
 arch/powerpc/configs/mpc83xx_defconfig          | 1 -
 arch/powerpc/configs/mvme5100_defconfig         | 1 -
 arch/powerpc/configs/pasemi_defconfig           | 1 -
 arch/powerpc/configs/pmac32_defconfig           | 8 --------
 arch/powerpc/configs/powernv_defconfig          | 1 -
 arch/powerpc/configs/ppc40x_defconfig           | 3 ---
 arch/powerpc/configs/ppc64_defconfig            | 1 -
 arch/powerpc/configs/pseries_defconfig          | 1 -
 15 files changed, 27 deletions(-)

diff --git a/arch/powerpc/configs/44x/akebono_defconfig b/arch/powerpc/configs/44x/akebono_defconfig
index 60d5fa2c3b93..3894ba8f8ffc 100644
--- a/arch/powerpc/configs/44x/akebono_defconfig
+++ b/arch/powerpc/configs/44x/akebono_defconfig
@@ -56,7 +56,6 @@ CONFIG_BLK_DEV_SD=y
 # CONFIG_NET_VENDOR_DEC is not set
 # CONFIG_NET_VENDOR_DLINK is not set
 # CONFIG_NET_VENDOR_EMULEX is not set
-# CONFIG_NET_VENDOR_EXAR is not set
 CONFIG_IBM_EMAC=y
 # CONFIG_NET_VENDOR_MARVELL is not set
 # CONFIG_NET_VENDOR_MELLANOX is not set
diff --git a/arch/powerpc/configs/85xx/xes_mpc85xx_defconfig b/arch/powerpc/configs/85xx/xes_mpc85xx_defconfig
index d50aca608736..3a6381aa9fdc 100644
--- a/arch/powerpc/configs/85xx/xes_mpc85xx_defconfig
+++ b/arch/powerpc/configs/85xx/xes_mpc85xx_defconfig
@@ -51,9 +51,6 @@ CONFIG_NET_IPIP=y
 CONFIG_IP_MROUTE=y
 CONFIG_IP_PIMSM_V1=y
 CONFIG_IP_PIMSM_V2=y
-# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
-# CONFIG_INET_XFRM_MODE_TUNNEL is not set
-# CONFIG_INET_XFRM_MODE_BEET is not set
 CONFIG_MTD=y
 CONFIG_MTD_REDBOOT_PARTS=y
 CONFIG_MTD_CMDLINE_PARTS=y
diff --git a/arch/powerpc/configs/86xx-hw.config b/arch/powerpc/configs/86xx-hw.config
index 151164cf8cb3..0cb24b33c88e 100644
--- a/arch/powerpc/configs/86xx-hw.config
+++ b/arch/powerpc/configs/86xx-hw.config
@@ -32,8 +32,6 @@ CONFIG_HW_RANDOM=y
 CONFIG_HZ_1000=y
 CONFIG_I2C_MPC=y
 CONFIG_I2C=y
-# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
-# CONFIG_INET_XFRM_MODE_TUNNEL is not set
 CONFIG_INPUT_FF_MEMLESS=m
 # CONFIG_INPUT_KEYBOARD is not set
 # CONFIG_INPUT_MOUSEDEV is not set
diff --git a/arch/powerpc/configs/fsl-emb-nonhw.config b/arch/powerpc/configs/fsl-emb-nonhw.config
index 3c7dad19a691..df37efed0aec 100644
--- a/arch/powerpc/configs/fsl-emb-nonhw.config
+++ b/arch/powerpc/configs/fsl-emb-nonhw.config
@@ -56,7 +56,6 @@ CONFIG_IKCONFIG=y
 CONFIG_INET_AH=y
 CONFIG_INET_ESP=y
 CONFIG_INET_IPCOMP=y
-# CONFIG_INET_XFRM_MODE_BEET is not set
 CONFIG_INET=y
 CONFIG_IP_ADVANCED_ROUTER=y
 CONFIG_IP_MROUTE=y
diff --git a/arch/powerpc/configs/g5_defconfig b/arch/powerpc/configs/g5_defconfig
index a68c7f3af10e..1c674c4c1d86 100644
--- a/arch/powerpc/configs/g5_defconfig
+++ b/arch/powerpc/configs/g5_defconfig
@@ -51,7 +51,6 @@ CONFIG_NF_CONNTRACK_FTP=m
 CONFIG_NF_CONNTRACK_IRC=m
 CONFIG_NF_CONNTRACK_TFTP=m
 CONFIG_NF_CT_NETLINK=m
-CONFIG_NF_CONNTRACK_IPV4=m
 CONFIG_DEVTMPFS=y
 CONFIG_DEVTMPFS_MOUNT=y
 CONFIG_BLK_DEV_LOOP=y
diff --git a/arch/powerpc/configs/linkstation_defconfig b/arch/powerpc/configs/linkstation_defconfig
index ea59f3d146df..d4be64f190ff 100644
--- a/arch/powerpc/configs/linkstation_defconfig
+++ b/arch/powerpc/configs/linkstation_defconfig
@@ -37,7 +37,6 @@ CONFIG_NF_CONNTRACK_TFTP=m
 CONFIG_NETFILTER_XT_MATCH_MAC=m
 CONFIG_NETFILTER_XT_MATCH_PKTTYPE=m
 CONFIG_NETFILTER_XT_MATCH_STATE=m
-CONFIG_NF_CONNTRACK_IPV4=m
 CONFIG_IP_NF_IPTABLES=m
 CONFIG_IP_NF_FILTER=m
 CONFIG_IP_NF_TARGET_REJECT=m
diff --git a/arch/powerpc/configs/mpc512x_defconfig b/arch/powerpc/configs/mpc512x_defconfig
index e39346b3dc3b..e75d3f3060c9 100644
--- a/arch/powerpc/configs/mpc512x_defconfig
+++ b/arch/powerpc/configs/mpc512x_defconfig
@@ -47,7 +47,6 @@ CONFIG_MTD_UBI=y
 CONFIG_BLK_DEV_RAM=y
 CONFIG_BLK_DEV_RAM_COUNT=1
 CONFIG_BLK_DEV_RAM_SIZE=8192
-CONFIG_BLK_DEV_RAM_DAX=y
 CONFIG_EEPROM_AT24=y
 CONFIG_EEPROM_AT25=y
 CONFIG_SCSI=y
diff --git a/arch/powerpc/configs/mpc83xx_defconfig b/arch/powerpc/configs/mpc83xx_defconfig
index be125729635c..95d43f8a3869 100644
--- a/arch/powerpc/configs/mpc83xx_defconfig
+++ b/arch/powerpc/configs/mpc83xx_defconfig
@@ -19,7 +19,6 @@ CONFIG_MPC836x_MDS=y
 CONFIG_MPC836x_RDK=y
 CONFIG_MPC837x_MDS=y
 CONFIG_MPC837x_RDB=y
-CONFIG_SBC834x=y
 CONFIG_ASP834x=y
 CONFIG_QE_GPIO=y
 CONFIG_MATH_EMULATION=y
diff --git a/arch/powerpc/configs/mvme5100_defconfig b/arch/powerpc/configs/mvme5100_defconfig
index 3d53d69ed36c..1fed6be95d53 100644
--- a/arch/powerpc/configs/mvme5100_defconfig
+++ b/arch/powerpc/configs/mvme5100_defconfig
@@
[PATCH 1/9] powerpc/configs: Drop old symbols from ppc6xx_defconfig
ppc6xx_defconfig refers to quite a few symbols that no longer exist, as reported by scripts/checkkconfigsymbols.py; remove them. Signed-off-by: Michael Ellerman --- arch/powerpc/configs/ppc6xx_defconfig | 39 --- 1 file changed, 39 deletions(-) diff --git a/arch/powerpc/configs/ppc6xx_defconfig b/arch/powerpc/configs/ppc6xx_defconfig index feb5d47d8d1e..5e6f92ba3210 100644 --- a/arch/powerpc/configs/ppc6xx_defconfig +++ b/arch/powerpc/configs/ppc6xx_defconfig @@ -53,7 +53,6 @@ CONFIG_MPC836x_MDS=y CONFIG_MPC836x_RDK=y CONFIG_MPC837x_MDS=y CONFIG_MPC837x_RDB=y -CONFIG_SBC834x=y CONFIG_ASP834x=y CONFIG_PPC_86xx=y CONFIG_MPC8641_HPCN=y @@ -187,7 +186,6 @@ CONFIG_NETFILTER_XT_MATCH_STRING=m CONFIG_NETFILTER_XT_MATCH_TCPMSS=m CONFIG_NETFILTER_XT_MATCH_TIME=m CONFIG_NETFILTER_XT_MATCH_U32=m -CONFIG_NF_CONNTRACK_IPV4=m CONFIG_IP_NF_IPTABLES=m CONFIG_IP_NF_MATCH_AH=m CONFIG_IP_NF_MATCH_ECN=m @@ -203,7 +201,6 @@ CONFIG_IP_NF_SECURITY=m CONFIG_IP_NF_ARPTABLES=m CONFIG_IP_NF_ARPFILTER=m CONFIG_IP_NF_ARP_MANGLE=m -CONFIG_NF_CONNTRACK_IPV6=m CONFIG_IP6_NF_IPTABLES=m CONFIG_IP6_NF_MATCH_AH=m CONFIG_IP6_NF_MATCH_EUI64=m @@ -241,7 +238,6 @@ CONFIG_BRIDGE_EBT_SNAT=m CONFIG_BRIDGE_EBT_LOG=m CONFIG_BRIDGE_EBT_NFLOG=m CONFIG_IP_DCCP=m -CONFIG_NET_DCCPPROBE=m CONFIG_TIPC=m CONFIG_ATM=m CONFIG_ATM_CLIP=m @@ -251,7 +247,6 @@ CONFIG_BRIDGE=m CONFIG_VLAN_8021Q=m CONFIG_DECNET=m CONFIG_DECNET_ROUTER=y -CONFIG_IPX=m CONFIG_ATALK=m CONFIG_DEV_APPLETALK=m CONFIG_IPDDP=m @@ -297,26 +292,6 @@ CONFIG_NET_ACT_NAT=m CONFIG_NET_ACT_PEDIT=m CONFIG_NET_ACT_SIMP=m CONFIG_NET_ACT_SKBEDIT=m -CONFIG_IRDA=m -CONFIG_IRLAN=m -CONFIG_IRNET=m -CONFIG_IRCOMM=m -CONFIG_IRDA_CACHE_LAST_LSAP=y -CONFIG_IRDA_FAST_RR=y -CONFIG_IRTTY_SIR=m -CONFIG_KINGSUN_DONGLE=m -CONFIG_KSDAZZLE_DONGLE=m -CONFIG_KS959_DONGLE=m -CONFIG_USB_IRDA=m -CONFIG_SIGMATEL_FIR=m -CONFIG_NSC_FIR=m -CONFIG_WINBOND_FIR=m -CONFIG_TOSHIBA_FIR=m -CONFIG_SMC_IRCC_FIR=m -CONFIG_ALI_FIR=m -CONFIG_VLSI_FIR=m -CONFIG_VIA_FIR=m -CONFIG_MCS_FIR=m 
CONFIG_BT=m CONFIG_BT_RFCOMM=m CONFIG_BT_RFCOMM_TTY=y @@ -332,7 +307,6 @@ CONFIG_BT_HCIBFUSB=m CONFIG_BT_HCIDTL1=m CONFIG_BT_HCIBT3C=m CONFIG_BT_HCIBLUECARD=m -CONFIG_BT_HCIBTUART=m CONFIG_BT_HCIVHCI=m CONFIG_CFG80211=m CONFIG_MAC80211=m @@ -366,7 +340,6 @@ CONFIG_EEPROM_93CX6=m CONFIG_RAID_ATTRS=m CONFIG_BLK_DEV_SD=y CONFIG_CHR_DEV_ST=m -CONFIG_CHR_DEV_OSST=m CONFIG_BLK_DEV_SR=m CONFIG_CHR_DEV_SG=y CONFIG_CHR_DEV_SCH=m @@ -663,7 +636,6 @@ CONFIG_I2C_MPC=m CONFIG_I2C_PCA_PLATFORM=m CONFIG_I2C_SIMTEC=m CONFIG_I2C_PARPORT=m -CONFIG_I2C_PARPORT_LIGHT=m CONFIG_I2C_TINY_USB=m CONFIG_I2C_PCA_ISA=m CONFIG_I2C_STUB=m @@ -676,7 +648,6 @@ CONFIG_W1_SLAVE_THERM=m CONFIG_W1_SLAVE_SMEM=m CONFIG_W1_SLAVE_DS2433=m CONFIG_W1_SLAVE_DS2433_CRC=y -CONFIG_W1_SLAVE_DS2760=m CONFIG_APM_POWER=m CONFIG_BATTERY_PMU=m CONFIG_HWMON=m @@ -1065,15 +1036,6 @@ CONFIG_CIFS_UPCALL=y CONFIG_CIFS_XATTR=y CONFIG_CIFS_POSIX=y CONFIG_CIFS_DFS_UPCALL=y -CONFIG_NCP_FS=m -CONFIG_NCPFS_PACKET_SIGNING=y -CONFIG_NCPFS_IOCTL_LOCKING=y -CONFIG_NCPFS_STRONG=y -CONFIG_NCPFS_NFS_NS=y -CONFIG_NCPFS_OS2_NS=y -CONFIG_NCPFS_SMALLDOS=y -CONFIG_NCPFS_NLS=y -CONFIG_NCPFS_EXTRAS=y CONFIG_CODA_FS=m CONFIG_9P_FS=m CONFIG_NLS_DEFAULT="utf8" @@ -1117,7 +1079,6 @@ CONFIG_NLS_KOI8_U=m CONFIG_DEBUG_INFO=y CONFIG_UNUSED_SYMBOLS=y CONFIG_HEADERS_INSTALL=y -CONFIG_HEADERS_CHECK=y CONFIG_MAGIC_SYSRQ=y CONFIG_DEBUG_KERNEL=y CONFIG_DEBUG_OBJECTS=y -- 2.25.1
[PATCH] powerpc/sstep: Fix incorrect CONFIG symbol in scv handling
When I "fixed" the ppc64e build in Nick's recent patch, I typoed the CONFIG symbol, resulting in one that doesn't exist. Fix it to use the correct symbol. Reported-by: Christophe Leroy Fixes: 7fa95f9adaee ("powerpc/64s: system call support for scv/rfscv instructions") Signed-off-by: Michael Ellerman --- arch/powerpc/lib/sstep.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c index 4194119eff82..c58ea9e787cb 100644 --- a/arch/powerpc/lib/sstep.c +++ b/arch/powerpc/lib/sstep.c @@ -3382,7 +3382,7 @@ int emulate_step(struct pt_regs *regs, struct ppc_inst instr) regs->msr = MSR_KERNEL; return 1; -#ifdef CONFIG_PPC64_BOOK3S +#ifdef CONFIG_PPC_BOOK3S_64 case SYSCALL_VECTORED_0:/* scv 0 */ regs->gpr[9] = regs->gpr[13]; regs->gpr[10] = MSR_KERNEL; -- 2.25.1
[PATCH v4 6/6] powerpc: implement smp_cond_load_relaxed
This implements smp_cond_load_relaxed with the slowpath busy loop using the preferred SMT priority pattern. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/barrier.h | 14 ++ 1 file changed, 14 insertions(+) diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h index 123adcefd40f..9b4671d38674 100644 --- a/arch/powerpc/include/asm/barrier.h +++ b/arch/powerpc/include/asm/barrier.h @@ -76,6 +76,20 @@ do { \ ___p1; \ }) +#define smp_cond_load_relaxed(ptr, cond_expr) ({ \ + typeof(ptr) __PTR = (ptr); \ + __unqual_scalar_typeof(*ptr) VAL; \ + VAL = READ_ONCE(*__PTR);\ + if (unlikely(!(cond_expr))) { \ + spin_begin(); \ + do {\ + VAL = READ_ONCE(*__PTR);\ + } while (!(cond_expr)); \ + spin_end(); \ + } \ + (typeof(*ptr))VAL; \ +}) + #ifdef CONFIG_PPC_BOOK3S_64 #define NOSPEC_BARRIER_SLOT nop #elif defined(CONFIG_PPC_FSL_BOOK3E) -- 2.23.0
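For readers outside the kernel tree, the shape of the macro above can be sketched in portable C11. This is an illustrative stand-in only: spin_begin()/spin_end() model powerpc's SMT-priority calls (HMT_low()/HMT_medium()) and are no-ops here, and READ_ONCE() is approximated with a relaxed atomic load.

```c
#include <stdatomic.h>
#include <pthread.h>

/* Stand-ins for powerpc's spin_begin()/spin_end() SMT priority hints;
 * no-ops in this portable sketch. */
static void spin_begin(void) { }
static void spin_end(void) { }

/* Sketch of the smp_cond_load_relaxed() shape from the patch: load once,
 * and only if the condition fails, drop into the low-priority poll loop. */
static int cond_load_until_nonzero(_Atomic int *ptr)
{
    int val = atomic_load_explicit(ptr, memory_order_relaxed);

    if (val == 0) {
        spin_begin();
        do {
            val = atomic_load_explicit(ptr, memory_order_relaxed);
        } while (val == 0);
        spin_end();
    }
    return val;
}

static _Atomic int flag;

static void *setter(void *unused)
{
    (void)unused;
    atomic_store(&flag, 42);
    return NULL;
}

/* Spawn a setter thread and wait for the flag with the sketch above. */
int demo_cond_load(void)
{
    pthread_t t;
    pthread_create(&t, NULL, setter, NULL);
    int v = cond_load_until_nonzero(&flag);
    pthread_join(t, NULL);
    return v;
}
```

Whether the waiter observes the value on the first load or inside the poll loop, it returns once the condition holds.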
[PATCH v4 5/6] powerpc/qspinlock: optimised atomic_try_cmpxchg_lock that adds the lock hint
This brings the behaviour of the uncontended fast path back to roughly equivalent to simple spinlocks -- a single atomic op with lock hint. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/atomic.h| 28 arch/powerpc/include/asm/qspinlock.h | 2 +- 2 files changed, 29 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h index 498785ffc25f..f6a3d145ffb7 100644 --- a/arch/powerpc/include/asm/atomic.h +++ b/arch/powerpc/include/asm/atomic.h @@ -193,6 +193,34 @@ static __inline__ int atomic_dec_return_relaxed(atomic_t *v) #define atomic_xchg(v, new) (xchg(&((v)->counter), new)) #define atomic_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new)) +/* + * Don't want to override the generic atomic_try_cmpxchg_acquire, because + * we add a lock hint to the lwarx, which may not be wanted for the + * _acquire case (and is not used by the other _acquire variants so it + * would be a surprise). + */ +static __always_inline bool +atomic_try_cmpxchg_lock(atomic_t *v, int *old, int new) +{ + int r, o = *old; + + __asm__ __volatile__ ( +"1:\t" PPC_LWARX(%0,0,%2,1) " # atomic_try_cmpxchg_acquire\n" +" cmpw0,%0,%3 \n" +" bne-2f \n" +" stwcx. 
%4,0,%2 \n" +" bne-1b \n" +"\t" PPC_ACQUIRE_BARRIER " \n" +"2:\n" + : "=&r" (r), "+m" (v->counter) + : "r" (&v->counter), "r" (o), "r" (new) + : "cr0", "memory"); + + if (unlikely(r != o)) + *old = r; + return likely(r == o); +} + /** * atomic_fetch_add_unless - add unless the number is a given value * @v: pointer of type atomic_t diff --git a/arch/powerpc/include/asm/qspinlock.h b/arch/powerpc/include/asm/qspinlock.h index f5066f00a08c..b752d34517b3 100644 --- a/arch/powerpc/include/asm/qspinlock.h +++ b/arch/powerpc/include/asm/qspinlock.h @@ -37,7 +37,7 @@ static __always_inline void queued_spin_lock(struct qspinlock *lock) { u32 val = 0; - if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL))) + if (likely(atomic_try_cmpxchg_lock(&lock->val, &val, _Q_LOCKED_VAL))) return; queued_spin_lock_slowpath(lock, val); -- 2.23.0
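The try_cmpxchg contract the patch relies on (return true on success; on failure, write the observed value back through the old pointer so the caller can hand it to the slowpath) maps directly onto C11's compare-exchange. A hedged, portable sketch of the fast path, with Q_LOCKED_VAL as an illustrative constant rather than the kernel's _Q_LOCKED_VAL:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define Q_LOCKED_VAL 1u   /* illustrative stand-in for _Q_LOCKED_VAL */

/* Portable analogue of atomic_try_cmpxchg_lock(): one compare-and-swap
 * of *old -> Q_LOCKED_VAL with acquire ordering on success. C11's
 * compare_exchange already stores the observed value into *old on
 * failure, matching the kernel's try_cmpxchg semantics (minus the
 * powerpc-specific lwarx lock hint, which has no portable equivalent). */
static bool try_lock_fastpath(_Atomic unsigned int *lock, unsigned int *old)
{
    return atomic_compare_exchange_strong_explicit(
            lock, old, Q_LOCKED_VAL,
            memory_order_acquire, memory_order_relaxed);
}
```

On success the caller holds the lock; on failure *old now carries the contended value that queued_spin_lock_slowpath() receives.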
[PATCH v4 4/6] powerpc/pseries: implement paravirt qspinlocks for SPLPAR
This implements the generic paravirt qspinlocks using H_PROD and H_CONFER to kick and wait. This uses an un-directed yield to any CPU rather than the directed yield to a pre-empted lock holder that paravirtualised simple spinlocks use, that requires no kick hcall. This is something that could be investigated and improved in future. Performance results can be found in the commit which added queued spinlocks. Acked-by: Peter Zijlstra (Intel) Acked-by: Waiman Long Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/paravirt.h | 28 arch/powerpc/include/asm/qspinlock.h | 66 +++ arch/powerpc/include/asm/qspinlock_paravirt.h | 7 ++ arch/powerpc/include/asm/spinlock.h | 4 ++ arch/powerpc/platforms/pseries/Kconfig| 9 ++- arch/powerpc/platforms/pseries/setup.c| 4 +- include/asm-generic/qspinlock.h | 2 + 7 files changed, 118 insertions(+), 2 deletions(-) create mode 100644 arch/powerpc/include/asm/qspinlock_paravirt.h diff --git a/arch/powerpc/include/asm/paravirt.h b/arch/powerpc/include/asm/paravirt.h index 339e8533464b..21e5f29ca251 100644 --- a/arch/powerpc/include/asm/paravirt.h +++ b/arch/powerpc/include/asm/paravirt.h @@ -28,6 +28,16 @@ static inline void yield_to_preempted(int cpu, u32 yield_count) { plpar_hcall_norets(H_CONFER, get_hard_smp_processor_id(cpu), yield_count); } + +static inline void prod_cpu(int cpu) +{ + plpar_hcall_norets(H_PROD, get_hard_smp_processor_id(cpu)); +} + +static inline void yield_to_any(void) +{ + plpar_hcall_norets(H_CONFER, -1, 0); +} #else static inline bool is_shared_processor(void) { @@ -44,6 +54,19 @@ static inline void yield_to_preempted(int cpu, u32 yield_count) { ___bad_yield_to_preempted(); /* This would be a bug */ } + +extern void ___bad_yield_to_any(void); +static inline void yield_to_any(void) +{ + ___bad_yield_to_any(); /* This would be a bug */ +} + +extern void ___bad_prod_cpu(void); +static inline void prod_cpu(int cpu) +{ + ___bad_prod_cpu(); /* This would be a bug */ +} + #endif #define vcpu_is_preempted 
vcpu_is_preempted @@ -56,4 +79,9 @@ static inline bool vcpu_is_preempted(int cpu) return false; } +static inline bool pv_is_native_spin_unlock(void) +{ + return !is_shared_processor(); +} + #endif /* _ASM_POWERPC_PARAVIRT_H */ diff --git a/arch/powerpc/include/asm/qspinlock.h b/arch/powerpc/include/asm/qspinlock.h index c49e33e24edd..f5066f00a08c 100644 --- a/arch/powerpc/include/asm/qspinlock.h +++ b/arch/powerpc/include/asm/qspinlock.h @@ -3,9 +3,47 @@ #define _ASM_POWERPC_QSPINLOCK_H #include +#include #define _Q_PENDING_LOOPS (1 << 9) /* not tuned */ +#ifdef CONFIG_PARAVIRT_SPINLOCKS +extern void native_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val); +extern void __pv_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val); +extern void __pv_queued_spin_unlock(struct qspinlock *lock); + +static __always_inline void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) +{ + if (!is_shared_processor()) + native_queued_spin_lock_slowpath(lock, val); + else + __pv_queued_spin_lock_slowpath(lock, val); +} + +#define queued_spin_unlock queued_spin_unlock +static inline void queued_spin_unlock(struct qspinlock *lock) +{ + if (!is_shared_processor()) + smp_store_release(&lock->locked, 0); + else + __pv_queued_spin_unlock(lock); +} + +#else +extern void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val); +#endif + +static __always_inline void queued_spin_lock(struct qspinlock *lock) +{ + u32 val = 0; + + if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL))) + return; + + queued_spin_lock_slowpath(lock, val); +} +#define queued_spin_lock queued_spin_lock + #define smp_mb__after_spinlock() smp_mb() static __always_inline int queued_spin_is_locked(struct qspinlock *lock) @@ -20,6 +58,34 @@ static __always_inline int queued_spin_is_locked(struct qspinlock *lock) } #define queued_spin_is_locked queued_spin_is_locked +#ifdef CONFIG_PARAVIRT_SPINLOCKS +#define SPIN_THRESHOLD (1<<15) /* not tuned */ + +static __always_inline void 
pv_wait(u8 *ptr, u8 val) +{ + if (*ptr != val) + return; + yield_to_any(); + /* +* We could pass in a CPU here if waiting in the queue and yield to +* the previous CPU in the queue. +*/ +} + +static __always_inline void pv_kick(int cpu) +{ + prod_cpu(cpu); +} + +extern void __pv_init_lock_hash(void); + +static inline void pv_spinlocks_init(void) +{ + __pv_init_lock_hash(); +} + +#endif + #include #endif /* _ASM_POWERPC_QSPINLOCK_H */ diff --git a/arch/powerpc/include/asm/qspinlock_paravirt.h b/arch/powerpc/include/asm/qspinlock_paravirt.h new file mode 100644 index ..6b60e7736a47 --- /dev/null +++
[PATCH v4 3/6] powerpc/64s: implement queued spinlocks and rwlocks
These have shown significantly improved performance and fairness when spinlock contention is moderate to high on very large systems. With this series including subsequent patches, on a 16 socket 1536 thread POWER9, a stress test such as same-file open/close from all CPUs gets big speedups, 11620op/s aggregate with simple spinlocks vs 384158op/s (33x faster), where the difference in throughput between the fastest and slowest thread goes from 7x to 1.4x. Thanks to the fast path being identical in terms of atomics and barriers (after a subsequent optimisation patch), single threaded performance is not changed (no measurable difference). On smaller systems, performance and fairness seems to be generally improved. Using dbench on tmpfs as a test (that starts to run into kernel spinlock contention), a 2-socket OpenPOWER POWER9 system was tested with bare metal and KVM guest configurations. Results can be found here: https://github.com/linuxppc/issues/issues/305#issuecomment-663487453 Observations are: - Queued spinlocks are equal when contention is insignificant, as expected and as measured with microbenchmarks. - When there is contention, on bare metal queued spinlocks have better throughput and max latency at all points. - When virtualised, queued spinlocks are slightly worse approaching peak throughput, but significantly better throughput and max latency at all points beyond peak, until queued spinlock maximum latency rises when clients are 2x vCPUs. The regressions haven't been analysed very well yet, there are a lot of things that can be tuned, particularly the paravirtualised locking, but the numbers already look like a good net win even on relatively small systems. 
Acked-by: Peter Zijlstra (Intel) Signed-off-by: Nicholas Piggin --- arch/powerpc/Kconfig | 15 ++ arch/powerpc/include/asm/Kbuild | 1 + arch/powerpc/include/asm/qspinlock.h | 25 +++ arch/powerpc/include/asm/spinlock.h | 5 + arch/powerpc/include/asm/spinlock_types.h | 5 + arch/powerpc/lib/Makefile | 3 +++ include/asm-generic/qspinlock.h | 2 ++ 7 files changed, 56 insertions(+) create mode 100644 arch/powerpc/include/asm/qspinlock.h diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 9fa23eb320ff..641946052d67 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -145,6 +145,8 @@ config PPC select ARCH_SUPPORTS_ATOMIC_RMW select ARCH_USE_BUILTIN_BSWAP select ARCH_USE_CMPXCHG_LOCKREF if PPC64 + select ARCH_USE_QUEUED_RWLOCKS if PPC_QUEUED_SPINLOCKS + select ARCH_USE_QUEUED_SPINLOCKS if PPC_QUEUED_SPINLOCKS select ARCH_WANT_IPC_PARSE_VERSION select ARCH_WEAK_RELEASE_ACQUIRE select BINFMT_ELF @@ -490,6 +492,19 @@ config HOTPLUG_CPU Say N if you are unsure. +config PPC_QUEUED_SPINLOCKS + bool "Queued spinlocks" + depends on SMP + help + Say Y here to use queued spinlocks which give better + scalability and fairness on large SMP and NUMA systems without + harming single threaded performance. + + This option is currently experimental, the code is more complex + and less tested so it defaults to "N" for the moment. + + If unsure, say "N". 
+ config ARCH_CPU_PROBE_RELEASE def_bool y depends on HOTPLUG_CPU diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild index dadbcf3a0b1e..27c2268dfd6c 100644 --- a/arch/powerpc/include/asm/Kbuild +++ b/arch/powerpc/include/asm/Kbuild @@ -6,5 +6,6 @@ generated-y += syscall_table_spu.h generic-y += export.h generic-y += local64.h generic-y += mcs_spinlock.h +generic-y += qrwlock.h generic-y += vtime.h generic-y += early_ioremap.h diff --git a/arch/powerpc/include/asm/qspinlock.h b/arch/powerpc/include/asm/qspinlock.h new file mode 100644 index ..c49e33e24edd --- /dev/null +++ b/arch/powerpc/include/asm/qspinlock.h @@ -0,0 +1,25 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_POWERPC_QSPINLOCK_H +#define _ASM_POWERPC_QSPINLOCK_H + +#include + +#define _Q_PENDING_LOOPS (1 << 9) /* not tuned */ + +#define smp_mb__after_spinlock() smp_mb() + +static __always_inline int queued_spin_is_locked(struct qspinlock *lock) +{ + /* +* This barrier was added to simple spinlocks by commit 51d7d5205d338, +* but it should now be possible to remove it, as arm64 has done with +* commit c6f5d02b6a0f. +*/ + smp_mb(); + return atomic_read(&lock->val); +} +#define queued_spin_is_locked queued_spin_is_locked + +#include + +#endif /* _ASM_POWERPC_QSPINLOCK_H */ diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h index 21357fe05fe0..434615f1d761 100644 --- a/arch/powerpc/include/asm/spinlock.h +++ b/arch/powerpc/include/asm/spinlock.h @@ -3,7 +3,12 @@
[PATCH v4 2/6] powerpc: move spinlock implementation to simple_spinlock
To prepare for queued spinlocks. This is a simple rename except to update preprocessor guard name and a file reference. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/simple_spinlock.h| 288 ++ .../include/asm/simple_spinlock_types.h | 21 ++ arch/powerpc/include/asm/spinlock.h | 285 + arch/powerpc/include/asm/spinlock_types.h | 12 +- 4 files changed, 311 insertions(+), 295 deletions(-) create mode 100644 arch/powerpc/include/asm/simple_spinlock.h create mode 100644 arch/powerpc/include/asm/simple_spinlock_types.h diff --git a/arch/powerpc/include/asm/simple_spinlock.h b/arch/powerpc/include/asm/simple_spinlock.h new file mode 100644 index ..fe6cff7df48e --- /dev/null +++ b/arch/powerpc/include/asm/simple_spinlock.h @@ -0,0 +1,288 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef _ASM_POWERPC_SIMPLE_SPINLOCK_H +#define _ASM_POWERPC_SIMPLE_SPINLOCK_H + +/* + * Simple spin lock operations. + * + * Copyright (C) 2001-2004 Paul Mackerras , IBM + * Copyright (C) 2001 Anton Blanchard , IBM + * Copyright (C) 2002 Dave Engebretsen , IBM + * Rework to support virtual processors + * + * Type of int is used as a full 64b word is not necessary. + * + * (the type definitions are in asm/simple_spinlock_types.h) + */ +#include +#include +#include +#include +#include + +#ifdef CONFIG_PPC64 +/* use 0x80yy when locked, where yy == CPU number */ +#ifdef __BIG_ENDIAN__ +#define LOCK_TOKEN (*(u32 *)(&get_paca()->lock_token)) +#else +#define LOCK_TOKEN (*(u32 *)(&get_paca()->paca_index)) +#endif +#else +#define LOCK_TOKEN 1 +#endif + +static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock) +{ + return lock.slock == 0; +} + +static inline int arch_spin_is_locked(arch_spinlock_t *lock) +{ + smp_mb(); + return !arch_spin_value_unlocked(*lock); +} + +/* + * This returns the old value in the lock, so we succeeded + * in getting the lock if the return value is 0. 
+ */ +static inline unsigned long __arch_spin_trylock(arch_spinlock_t *lock) +{ + unsigned long tmp, token; + + token = LOCK_TOKEN; + __asm__ __volatile__( +"1:" PPC_LWARX(%0,0,%2,1) "\n\ + cmpwi 0,%0,0\n\ + bne-2f\n\ + stwcx. %1,0,%2\n\ + bne-1b\n" + PPC_ACQUIRE_BARRIER +"2:" + : "=&r" (tmp) + : "r" (token), "r" (&lock->slock) + : "cr0", "memory"); + + return tmp; +} + +static inline int arch_spin_trylock(arch_spinlock_t *lock) +{ + return __arch_spin_trylock(lock) == 0; +} + +/* + * On a system with shared processors (that is, where a physical + * processor is multiplexed between several virtual processors), + * there is no point spinning on a lock if the holder of the lock + * isn't currently scheduled on a physical processor. Instead + * we detect this situation and ask the hypervisor to give the + * rest of our timeslice to the lock holder. + * + * So that we can tell which virtual processor is holding a lock, + * we put 0x8000 | smp_processor_id() in the lock when it is + * held. Conveniently, we have a word in the paca that holds this + * value. 
+ */ + +#if defined(CONFIG_PPC_SPLPAR) +/* We only yield to the hypervisor if we are in shared processor mode */ +void splpar_spin_yield(arch_spinlock_t *lock); +void splpar_rw_yield(arch_rwlock_t *lock); +#else /* SPLPAR */ +static inline void splpar_spin_yield(arch_spinlock_t *lock) {}; +static inline void splpar_rw_yield(arch_rwlock_t *lock) {}; +#endif + +static inline void spin_yield(arch_spinlock_t *lock) +{ + if (is_shared_processor()) + splpar_spin_yield(lock); + else + barrier(); +} + +static inline void rw_yield(arch_rwlock_t *lock) +{ + if (is_shared_processor()) + splpar_rw_yield(lock); + else + barrier(); +} + +static inline void arch_spin_lock(arch_spinlock_t *lock) +{ + while (1) { + if (likely(__arch_spin_trylock(lock) == 0)) + break; + do { + HMT_low(); + if (is_shared_processor()) + splpar_spin_yield(lock); + } while (unlikely(lock->slock != 0)); + HMT_medium(); + } +} + +static inline +void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags) +{ + unsigned long flags_dis; + + while (1) { + if (likely(__arch_spin_trylock(lock) == 0)) + break; + local_save_flags(flags_dis); + local_irq_restore(flags); + do { + HMT_low(); + if (is_shared_processor()) + splpar_spin_yield(lock); + } while (unlikely(lock->slock != 0)); + HMT_medium(); + local_irq_restore(flags_dis); +
[PATCH v4 1/6] powerpc/pseries: move some PAPR paravirt functions to their own file
These functions will be used by queued spinlock implementation, and may be useful elsewhere too, so move them out of spinlock.h. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/paravirt.h | 59 + arch/powerpc/include/asm/spinlock.h | 24 +--- arch/powerpc/lib/locks.c| 12 +++--- 3 files changed, 66 insertions(+), 29 deletions(-) create mode 100644 arch/powerpc/include/asm/paravirt.h diff --git a/arch/powerpc/include/asm/paravirt.h b/arch/powerpc/include/asm/paravirt.h new file mode 100644 index ..339e8533464b --- /dev/null +++ b/arch/powerpc/include/asm/paravirt.h @@ -0,0 +1,59 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef _ASM_POWERPC_PARAVIRT_H +#define _ASM_POWERPC_PARAVIRT_H + +#include +#include +#ifdef CONFIG_PPC64 +#include +#include +#endif + +#ifdef CONFIG_PPC_SPLPAR +DECLARE_STATIC_KEY_FALSE(shared_processor); + +static inline bool is_shared_processor(void) +{ + return static_branch_unlikely(&shared_processor); +} + +/* If bit 0 is set, the cpu has been preempted */ +static inline u32 yield_count_of(int cpu) +{ + __be32 yield_count = READ_ONCE(lppaca_of(cpu).yield_count); + return be32_to_cpu(yield_count); +} + +static inline void yield_to_preempted(int cpu, u32 yield_count) +{ + plpar_hcall_norets(H_CONFER, get_hard_smp_processor_id(cpu), yield_count); +} +#else +static inline bool is_shared_processor(void) +{ + return false; +} + +static inline u32 yield_count_of(int cpu) +{ + return 0; +} + +extern void ___bad_yield_to_preempted(void); +static inline void yield_to_preempted(int cpu, u32 yield_count) +{ + ___bad_yield_to_preempted(); /* This would be a bug */ +} +#endif + +#define vcpu_is_preempted vcpu_is_preempted +static inline bool vcpu_is_preempted(int cpu) +{ + if (!is_shared_processor()) + return false; + if (yield_count_of(cpu) & 1) + return true; + return false; +} + +#endif /* _ASM_POWERPC_PARAVIRT_H */ diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h index 
2d620896cdae..79be9bb10bbb 100644 --- a/arch/powerpc/include/asm/spinlock.h +++ b/arch/powerpc/include/asm/spinlock.h @@ -15,11 +15,10 @@ * * (the type definitions are in asm/spinlock_types.h) */ -#include #include +#include #ifdef CONFIG_PPC64 #include -#include #endif #include #include @@ -35,18 +34,6 @@ #define LOCK_TOKEN 1 #endif -#ifdef CONFIG_PPC_PSERIES -DECLARE_STATIC_KEY_FALSE(shared_processor); - -#define vcpu_is_preempted vcpu_is_preempted -static inline bool vcpu_is_preempted(int cpu) -{ - if (!static_branch_unlikely(&shared_processor)) - return false; - return !!(be32_to_cpu(lppaca_of(cpu).yield_count) & 1); -} -#endif - static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock) { return lock.slock == 0; @@ -110,15 +97,6 @@ static inline void splpar_spin_yield(arch_spinlock_t *lock) {}; static inline void splpar_rw_yield(arch_rwlock_t *lock) {}; #endif -static inline bool is_shared_processor(void) -{ -#ifdef CONFIG_PPC_SPLPAR - return static_branch_unlikely(&shared_processor); -#else - return false; -#endif -} - static inline void spin_yield(arch_spinlock_t *lock) { if (is_shared_processor()) diff --git a/arch/powerpc/lib/locks.c b/arch/powerpc/lib/locks.c index 6440d5943c00..04165b7a163f 100644 --- a/arch/powerpc/lib/locks.c +++ b/arch/powerpc/lib/locks.c @@ -27,14 +27,14 @@ void splpar_spin_yield(arch_spinlock_t *lock) return; holder_cpu = lock_value & 0x; BUG_ON(holder_cpu >= NR_CPUS); - yield_count = be32_to_cpu(lppaca_of(holder_cpu).yield_count); + + yield_count = yield_count_of(holder_cpu); if ((yield_count & 1) == 0) return; /* virtual cpu is currently running */ rmb(); if (lock->slock != lock_value) return; /* something has changed */ - plpar_hcall_norets(H_CONFER, - get_hard_smp_processor_id(holder_cpu), yield_count); + yield_to_preempted(holder_cpu, yield_count); } EXPORT_SYMBOL_GPL(splpar_spin_yield); @@ -53,13 +53,13 @@ void splpar_rw_yield(arch_rwlock_t *rw) return; /* no write lock at present */ holder_cpu = lock_value & 
0x; BUG_ON(holder_cpu >= NR_CPUS); - yield_count = be32_to_cpu(lppaca_of(holder_cpu).yield_count); + + yield_count = yield_count_of(holder_cpu); if ((yield_count & 1) == 0) return; /* virtual cpu is currently running */ rmb(); if (rw->lock != lock_value) return; /* something has changed */ - plpar_hcall_norets(H_CONFER, - get_hard_smp_processor_id(holder_cpu), yield_count); + yield_to_preempted(holder_cpu, yield_count); } #endif -- 2.23.0
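The yield-count convention the moved helpers encode can be shown in isolation. Under PAPR, the hypervisor bumps a per-vCPU dispatch counter such that an odd value means the vCPU is currently preempted; the sketch below models only that parity test (the real yield_count_of() reads the counter big-endian from the lppaca):

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the vcpu_is_preempted() parity check: bit 0 of the PAPR
 * yield count is set while the virtual CPU is preempted, clear while
 * it is dispatched on a physical CPU. */
static bool preempted_from_yield_count(uint32_t yield_count)
{
    return (yield_count & 1) != 0;
}
```

This is also why yield_to_preempted() passes the sampled count to H_CONFER: the hypervisor can discard the yield if the counter has moved on, i.e. if the holder has since been dispatched.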
[PATCH v4 0/6] powerpc: queued spinlocks and rwlocks
Updated with everybody's feedback (thanks all), and more performance results. What I've found is I might have been measuring the worst load point for the paravirt case, and by looking at a range of loads it's clear that queued spinlocks are overall better even on PV, doubly so when you look at the generally much improved worst case latencies. I have defaulted it to N even though I'm less concerned about the PV numbers now, just because I think it needs more stress testing. But it's very nicely selectable so should be low risk to include. All in all this is a very cool technology and great results especially on the big systems but even on smaller ones there are nice gains. Thanks Waiman and everyone who developed it. Thanks, Nick Nicholas Piggin (6): powerpc/pseries: move some PAPR paravirt functions to their own file powerpc: move spinlock implementation to simple_spinlock powerpc/64s: implement queued spinlocks and rwlocks powerpc/pseries: implement paravirt qspinlocks for SPLPAR powerpc/qspinlock: optimised atomic_try_cmpxchg_lock that adds the lock hint powerpc: implement smp_cond_load_relaxed arch/powerpc/Kconfig | 15 + arch/powerpc/include/asm/Kbuild | 1 + arch/powerpc/include/asm/atomic.h | 28 ++ arch/powerpc/include/asm/barrier.h| 14 + arch/powerpc/include/asm/paravirt.h | 87 + arch/powerpc/include/asm/qspinlock.h | 91 ++ arch/powerpc/include/asm/qspinlock_paravirt.h | 7 + arch/powerpc/include/asm/simple_spinlock.h| 288 .../include/asm/simple_spinlock_types.h | 21 ++ arch/powerpc/include/asm/spinlock.h | 308 +- arch/powerpc/include/asm/spinlock_types.h | 17 +- arch/powerpc/lib/Makefile | 3 + arch/powerpc/lib/locks.c | 12 +- arch/powerpc/platforms/pseries/Kconfig| 9 +- arch/powerpc/platforms/pseries/setup.c| 4 +- include/asm-generic/qspinlock.h | 4 + 16 files changed, 588 insertions(+), 321 deletions(-) create mode 100644 arch/powerpc/include/asm/paravirt.h create mode 100644 arch/powerpc/include/asm/qspinlock.h create mode 100644 
arch/powerpc/include/asm/qspinlock_paravirt.h create mode 100644 arch/powerpc/include/asm/simple_spinlock.h create mode 100644 arch/powerpc/include/asm/simple_spinlock_types.h -- 2.23.0
Re: [v3 12/15] powerpc/perf: Add support for outputting extended regs in perf intr_regs
Hi Athira, +/* Function to return the extended register values */ +static u64 get_ext_regs_value(int idx) +{ + switch (idx) { + case PERF_REG_POWERPC_MMCR0: + return mfspr(SPRN_MMCR0); + case PERF_REG_POWERPC_MMCR1: + return mfspr(SPRN_MMCR1); + case PERF_REG_POWERPC_MMCR2: + return mfspr(SPRN_MMCR2); + default: return 0; + } +} + u64 perf_reg_value(struct pt_regs *regs, int idx) { - if (WARN_ON_ONCE(idx >= PERF_REG_POWERPC_MAX)) - return 0; + u64 PERF_REG_EXTENDED_MAX; PERF_REG_EXTENDED_MAX should be initialized. Otherwise ... + + if (cpu_has_feature(CPU_FTR_ARCH_300)) + PERF_REG_EXTENDED_MAX = PERF_REG_MAX_ISA_300; if (idx == PERF_REG_POWERPC_SIER && (IS_ENABLED(CONFIG_FSL_EMB_PERF_EVENT) || @@ -85,6 +103,16 @@ u64 perf_reg_value(struct pt_regs *regs, int idx) IS_ENABLED(CONFIG_PPC32))) return 0; + if (idx >= PERF_REG_POWERPC_MAX && idx < PERF_REG_EXTENDED_MAX) + return get_ext_regs_value(idx); On a non-P9/P10 machine, PERF_REG_EXTENDED_MAX may contain a random value, which will allow a user to pass this if condition unintentionally. Nit: PERF_REG_EXTENDED_MAX is a local variable so it should be in lowercase. Any specific reason to define it in capitals? Ravi
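A minimal model of the fix being requested: initialise the bound so the extended-register range is empty unless the CPU feature is present. The enum values and the returned constants here are illustrative stand-ins, not the kernel's actual register numbering:

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    PERF_REG_POWERPC_MAX = 45,   /* illustrative, not the kernel's value */
    PERF_REG_MAX_ISA_300 = 48,   /* illustrative */
};

/* Sketch of perf_reg_value() with the reviewer's fix applied: default
 * ext_max to PERF_REG_POWERPC_MAX so "idx >= MAX && idx < ext_max" can
 * never match on CPUs without ISA 3.00, instead of reading an
 * uninitialised local. */
static uint64_t reg_value(int idx, bool has_arch_300)
{
    uint64_t ext_max = PERF_REG_POWERPC_MAX;   /* safe default: empty range */

    if (has_arch_300)
        ext_max = PERF_REG_MAX_ISA_300;

    if (idx >= PERF_REG_POWERPC_MAX && (uint64_t)idx < ext_max)
        return 0x1234;   /* stand-in for get_ext_regs_value(idx) */

    return 0;            /* stand-in for the regular pt_regs path */
}
```

With the uninitialised local, the extended-index branch could be taken (or skipped) at random on older CPUs; with the default it is deterministic.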
Re: [PATCHv3 2/2] powerpc/pseries: update device tree before ejecting hotplug uevents
On Thu, Jul 23, 2020 at 9:27 PM Nathan Lynch wrote: > > Pingfan Liu writes: > > A bug is observed on pseries by taking the following steps on rhel: > > -1. drmgr -c mem -r -q 5 > > -2. echo c > /proc/sysrq-trigger > > > > And then, the failure looks like: > > kdump: saving to /sysroot//var/crash/127.0.0.1-2020-01-16-02:06:14/ > > kdump: saving vmcore-dmesg.txt > > kdump: saving vmcore-dmesg.txt complete > > kdump: saving vmcore > > Checking for memory holes : [ 0.0 %] / > >Checking for memory holes : [100.0 %] | > > Excluding unnecessary pages : [100.0 %] > > \ Copying data : [ > > 0.3 %] - eta: 38s[ 44.337636] hash-mmu: mm: Hashing failure ! > > EA=0x7fffba40 access=0x8004 current=makedumpfile > > [ 44.337663] hash-mmu: trap=0x300 vsid=0x13a109c ssize=1 base psize=2 > > psize 2 pte=0xc0005504 > > [ 44.337677] hash-mmu: mm: Hashing failure ! EA=0x7fffba40 > > access=0x8004 current=makedumpfile > > [ 44.337692] hash-mmu: trap=0x300 vsid=0x13a109c ssize=1 base psize=2 > > psize 2 pte=0xc0005504 > > [ 44.337708] makedumpfile[469]: unhandled signal 7 at 7fffba40 > > nip 7fffbbc4d7fc lr 00011356ca3c code 2 > > [ 44.338548] Core dump to |/bin/false pipe failed > > /lib/kdump-lib-initramfs.sh: line 98: 469 Bus error > > $CORE_COLLECTOR /proc/vmcore > > $_mp/$KDUMP_PATH/$HOST_IP-$DATEDIR/vmcore-incomplete > > kdump: saving vmcore failed > > > > * Root cause * > > After analyzing, it turns out that in the current implementation, > > when hot-removing lmb, the KOBJ_REMOVE event ejects before the dt updating > > as > > the code __remove_memory() comes before drmem_update_dt(). > > So in kdump kernel, when read_from_oldmem() resorts to > > pSeries_lpar_hpte_insert() to install hpte, but fails with -2 due to > > non-exist pfn. And finally, low_hash_fault() raise SIGBUS to process, as it > > can be observed "Bus error" > > > > From a viewpoint of listener and publisher, the publisher notifies the > > listener before data is ready. 
This introduces a problem where udev
> > launches kexec-tools (due to KOBJ_REMOVE) and loads a stale dt before
> > updating. And in the capture kernel, makedumpfile will access memory
> > based on the stale dt info, and hit a SIGBUS error due to a nonexistent
> > lmb.
> >
> > * Fix *
> > In order to fix this issue, update the dt before __remove_memory(), and
> > accordingly apply the same rule in the hot-add path.
> >
> > This will introduce extra dt updating payload for each involved lmb when
> > hotplugging. But it should be fine since drmem_update_dt() is a
> > memory-based operation and hotplug is not a hot path.
>
> This is great analysis but the performance implications of the change
> are grave. The add/remove paths here are already O(n) where n is the
> quantity of memory assigned to the LP, this change would make it O(n^2):
>
>   dlpar_memory_add_by_count
>     for_each_drmem_lmb          <--
>       dlpar_add_lmb
>         drmem_update_dt(_v1|_v2)
>           for_each_drmem_lmb    <--
>
> Memory add/remove isn't a hot path but quadratic runtime complexity
> isn't acceptable. Its current performance is bad enough that I have

Yes, the quadratic runtime complexity sounds terrible. And I am curious
about the bug: does the system have thousands of LMBs?

> internal bugs open on it.
>
> Not to mention we leak memory every time drmem_update_dt is called
> because we can't safely free device tree properties :-(

Do you know what blocks us from freeing them?

>
> Also note that this sort of reverts (fixes?) 063b8b1251fd
> ("powerpc/pseries/memory-hotplug: Only update DT once per memory DLPAR
> request").

Yes. I think I need to come up with another method to fix it.

Thanks,
Pingfan
Re: [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes
On Fri, Jul 24, 2020 at 09:52:14PM +1000, Michael Ellerman wrote: > Bharata B Rao writes: > > On Tue, Jul 21, 2020 at 10:25:58PM +1000, Michael Ellerman wrote: > >> Bharata B Rao writes: > >> > On Tue, Jul 21, 2020 at 11:45:20AM +1000, Michael Ellerman wrote: > >> >> Nathan Lynch writes: > >> >> > "Aneesh Kumar K.V" writes: > >> >> >> This is the next version of the fixes for memory unplug on radix. > >> >> >> The issues and the fix are described in the actual patches. > >> >> > > >> >> > I guess this isn't actually causing problems at runtime right now, > >> >> > but I > >> >> > notice calls to resize_hpt_for_hotplug() from arch_add_memory() and > >> >> > arch_remove_memory(), which ought to be mmu-agnostic: > >> >> > > >> >> > int __ref arch_add_memory(int nid, u64 start, u64 size, > >> >> > struct mhp_params *params) > >> >> > { > >> >> > unsigned long start_pfn = start >> PAGE_SHIFT; > >> >> > unsigned long nr_pages = size >> PAGE_SHIFT; > >> >> > int rc; > >> >> > > >> >> > resize_hpt_for_hotplug(memblock_phys_mem_size()); > >> >> > > >> >> > start = (unsigned long)__va(start); > >> >> > rc = create_section_mapping(start, start + size, nid, > >> >> > params->pgprot); > >> >> > ... > >> >> > >> >> Hmm well spotted. > >> >> > >> >> That does return early if the ops are not setup: > >> >> > >> >> int resize_hpt_for_hotplug(unsigned long new_mem_size) > >> >> { > >> >> unsigned target_hpt_shift; > >> >> > >> >> if (!mmu_hash_ops.resize_hpt) > >> >> return 0; > >> >> > >> >> > >> >> And: > >> >> > >> >> void __init hpte_init_pseries(void) > >> >> { > >> >> ... > >> >> if (firmware_has_feature(FW_FEATURE_HPT_RESIZE)) > >> >> mmu_hash_ops.resize_hpt = pseries_lpar_resize_hpt; > >> >> > >> >> And that comes in via ibm,hypertas-functions: > >> >> > >> >> {FW_FEATURE_HPT_RESIZE, "hcall-hpt-resize"}, > >> >> > >> >> > >> >> But firmware is not necessarily going to add/remove that call based on > >> >> whether we're using hash/radix. 
> >> > > >> > Correct but hpte_init_pseries() will not be called for radix guests. > >> > >> Yeah, duh. You'd think the function name would have been a sufficient > >> clue for me :) > >> > >> >> So I think a follow-up patch is needed to make this more robust. > >> >> > >> >> Aneesh/Bharata what platform did you test this series on? I'm curious > >> >> how this didn't break. > >> > > >> > I have tested memory hotplug/unplug for radix guest on zz platform and > >> > sanity-tested this for hash guest on P8. > >> > > >> > As noted above, mmu_hash_ops.resize_hpt will not be set for radix > >> > guest and hence we won't see any breakage. > >> > >> OK. > >> > >> That's probably fine as it is then. Or maybe just a comment in > >> resize_hpt_for_hotplug() pointing out that resize_hpt will be NULL if > >> we're using radix. > > > > Or we could move these calls to hpt-only routines like below? > > That looks like it would be equivalent, and would nicely isolate those > calls in hash specific code. So yeah I think that's worth sending as a > proper patch, even better if you can test it. Sure I will send it as a proper patch. I did test minimal hotplug/unplug for hash guest with that patch, will do more extensive test and resend. > > > David - Do you remember if there was any particular reason to have > > these two hpt-resize calls within powerpc-generic memory hotplug code? > > I think the HPT resizing was developed before or concurrently with the > radix support, so I would guess it was just not something we thought > about at the time. Right. Regards, Bharata.
Re: [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes
Bharata B Rao writes: > On Tue, Jul 21, 2020 at 10:25:58PM +1000, Michael Ellerman wrote: >> Bharata B Rao writes: >> > On Tue, Jul 21, 2020 at 11:45:20AM +1000, Michael Ellerman wrote: >> >> Nathan Lynch writes: >> >> > "Aneesh Kumar K.V" writes: >> >> >> This is the next version of the fixes for memory unplug on radix. >> >> >> The issues and the fix are described in the actual patches. >> >> > >> >> > I guess this isn't actually causing problems at runtime right now, but I >> >> > notice calls to resize_hpt_for_hotplug() from arch_add_memory() and >> >> > arch_remove_memory(), which ought to be mmu-agnostic: >> >> > >> >> > int __ref arch_add_memory(int nid, u64 start, u64 size, >> >> > struct mhp_params *params) >> >> > { >> >> > unsigned long start_pfn = start >> PAGE_SHIFT; >> >> > unsigned long nr_pages = size >> PAGE_SHIFT; >> >> > int rc; >> >> > >> >> > resize_hpt_for_hotplug(memblock_phys_mem_size()); >> >> > >> >> > start = (unsigned long)__va(start); >> >> > rc = create_section_mapping(start, start + size, nid, >> >> > params->pgprot); >> >> > ... >> >> >> >> Hmm well spotted. >> >> >> >> That does return early if the ops are not setup: >> >> >> >> int resize_hpt_for_hotplug(unsigned long new_mem_size) >> >> { >> >> unsigned target_hpt_shift; >> >> >> >> if (!mmu_hash_ops.resize_hpt) >> >> return 0; >> >> >> >> >> >> And: >> >> >> >> void __init hpte_init_pseries(void) >> >> { >> >> ... >> >> if (firmware_has_feature(FW_FEATURE_HPT_RESIZE)) >> >> mmu_hash_ops.resize_hpt = pseries_lpar_resize_hpt; >> >> >> >> And that comes in via ibm,hypertas-functions: >> >> >> >> {FW_FEATURE_HPT_RESIZE, "hcall-hpt-resize"}, >> >> >> >> >> >> But firmware is not necessarily going to add/remove that call based on >> >> whether we're using hash/radix. >> > >> > Correct but hpte_init_pseries() will not be called for radix guests. >> >> Yeah, duh. 
You'd think the function name would have been a sufficient >> clue for me :) >> >> >> So I think a follow-up patch is needed to make this more robust. >> >> >> >> Aneesh/Bharata what platform did you test this series on? I'm curious >> >> how this didn't break. >> > >> > I have tested memory hotplug/unplug for radix guest on zz platform and >> > sanity-tested this for hash guest on P8. >> > >> > As noted above, mmu_hash_ops.resize_hpt will not be set for radix >> > guest and hence we won't see any breakage. >> >> OK. >> >> That's probably fine as it is then. Or maybe just a comment in >> resize_hpt_for_hotplug() pointing out that resize_hpt will be NULL if >> we're using radix. > > Or we could move these calls to hpt-only routines like below? That looks like it would be equivalent, and would nicely isolate those calls in hash specific code. So yeah I think that's worth sending as a proper patch, even better if you can test it. > David - Do you remember if there was any particular reason to have > these two hpt-resize calls within powerpc-generic memory hotplug code? I think the HPT resizing was developed before or concurrently with the radix support, so I would guess it was just not something we thought about at the time. 
cheers > diff --git a/arch/powerpc/include/asm/sparsemem.h > b/arch/powerpc/include/asm/sparsemem.h > index c89b32443cff..1e6fa371cc38 100644 > --- a/arch/powerpc/include/asm/sparsemem.h > +++ b/arch/powerpc/include/asm/sparsemem.h > @@ -17,12 +17,6 @@ extern int create_section_mapping(unsigned long start, > unsigned long end, > int nid, pgprot_t prot); > extern int remove_section_mapping(unsigned long start, unsigned long end); > > -#ifdef CONFIG_PPC_BOOK3S_64 > -extern int resize_hpt_for_hotplug(unsigned long new_mem_size); > -#else > -static inline int resize_hpt_for_hotplug(unsigned long new_mem_size) { > return 0; } > -#endif > - > #ifdef CONFIG_NUMA > extern int hot_add_scn_to_nid(unsigned long scn_addr); > #else > diff --git a/arch/powerpc/mm/book3s64/hash_utils.c > b/arch/powerpc/mm/book3s64/hash_utils.c > index eec6f4e5e481..5daf53ec7600 100644 > --- a/arch/powerpc/mm/book3s64/hash_utils.c > +++ b/arch/powerpc/mm/book3s64/hash_utils.c > @@ -787,7 +787,7 @@ static unsigned long __init htab_get_table_size(void) > } > > #ifdef CONFIG_MEMORY_HOTPLUG > -int resize_hpt_for_hotplug(unsigned long new_mem_size) > +static int resize_hpt_for_hotplug(unsigned long new_mem_size) > { > unsigned target_hpt_shift; > > @@ -821,6 +821,8 @@ int hash__create_section_mapping(unsigned long start, > unsigned long end, > return -1; > } > > + resize_hpt_for_hotplug(memblock_phys_mem_size()); > + > rc = htab_bolt_mapping(start, end, __pa(start), > pgprot_val(prot), mmu_linear_psize, > mmu_kernel_ssize); > @@ -838,6 +840,10 @@ int hash__remove_section_mappi
RE: [RFC PATCH] powerpc/pseries/svm: capture instruction faulting on MMIO access, in sprg0 register
Ram Pai writes: > On Wed, Jul 22, 2020 at 12:06:06PM +1000, Michael Ellerman wrote: >> Ram Pai writes: >> > An instruction accessing a mmio address, generates a HDSI fault. This >> > fault is >> > appropriately handled by the Hypervisor. However in the case of >> > secureVMs, the >> > fault is delivered to the ultravisor. >> > >> > Unfortunately the Ultravisor has no correct-way to fetch the faulting >> > instruction. The PEF architecture does not allow Ultravisor to enable MMU >> > translation. Walking the two level page table to read the instruction can >> > race >> > with other vcpus modifying the SVM's process scoped page table. >> >> You're trying to read the guest's kernel text IIUC, that mapping should >> be stable. Possibly permissions on it could change over time, but the >> virtual -> real mapping should not. > > Actually the code does not capture the address of the instruction in the > sprg0 register. It captures the instruction itself. So should the mapping > matter? >> >> > This problem can be correctly solved with some help from the kernel. >> > >> > Capture the faulting instruction in SPRG0 register, before executing the >> > faulting instruction. This enables the ultravisor to easily procure the >> > faulting instruction and emulate it. >> >> This is not something I'm going to merge. Sorry. > > Ok. Will consider other approaches. To elaborate ... You've basically invented a custom ucall ABI. But a really strange one which takes an instruction as its first parameter in SPRG0, and then subsequent parameters in any GPR depending on what the instruction was. The UV should either emulate the instruction, which means the guest should not be expected to do anything other than execute the instruction. Or it should be done with a proper ucall that the guest explicitly makes with a well defined ABI. cheers
Re: [v3 13/15] tools/perf: Add perf tools support for extended register capability in powerpc
Hi Athira, On 7/17/20 8:08 PM, Athira Rajeev wrote: From: Anju T Sudhakar Add extended regs to sample_reg_mask in the tool side to use with `-I?` option. Perf tools side uses extended mask to display the platform supported register names (with -I? option) to the user and also send this mask to the kernel to capture the extended registers in each sample. Hence decide the mask value based on the processor version. Currently definitions for `mfspr`, `SPRN_PVR` are part of `arch/powerpc/util/header.c`. Move this to a header file so that these definitions can be re-used in other source files as well. It seems this patch has a regression. Without this patch: $ sudo ./perf record -I ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.458 MB perf.data (318 samples) ] With this patch: $ sudo ./perf record -I Error: dummy:HG: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat' Ravi
[PATCH v2] powerpc/numa: Limit possible nodes to within num_possible_nodes
MAX_NUMNODES is the theoretical maximum number of nodes that is supported by the kernel. Device tree properties expose the number of possible nodes on the current platform. The kernel detects this and uses it for most of its resource allocations. If the platform now increases the nodes beyond what was already exposed, it may lead to inconsistencies. Hence limit it to the already exposed nodes.

Suggested-by: Nathan Lynch
Cc: linuxppc-dev
Cc: Michael Ellerman
Cc: Nicholas Piggin
Cc: Anton Blanchard
Cc: Nathan Lynch
Cc: Tyrel Datwyler
Signed-off-by: Srikar Dronamraju

Changelog v1 -> v2:
v1: https://lore.kernel.org/linuxppc-dev/20200715120534.3673-1-sri...@linux.vnet.ibm.com/t/#u
Use nr_node_ids instead of num_possible_nodes(). When nodes are sparse, as on PowerNV, nr_node_ids gets the right value, unlike num_possible_nodes().

Signed-off-by: Srikar Dronamraju
---
 arch/powerpc/mm/numa.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index e437a9ac4956..383359272270 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -221,7 +221,7 @@ static void initialize_distance_lookup_table(int nid,
 	}
 }
 
-/* Returns nid in the range [0..MAX_NUMNODES-1], or -1 if no useful numa
+/* Returns nid in the range [0..nr_node_ids], or -1 if no useful numa
  * info is found.
  */
 static int associativity_to_nid(const __be32 *associativity)
@@ -235,7 +235,7 @@ static int associativity_to_nid(const __be32 *associativity)
 	nid = of_read_number(&associativity[min_common_depth], 1);
 
 	/* POWER4 LPAR uses 0xffff as invalid node */
-	if (nid == 0xffff || nid >= MAX_NUMNODES)
+	if (nid == 0xffff || nid >= nr_node_ids)
 		nid = NUMA_NO_NODE;
 
 	if (nid > 0 &&
@@ -448,7 +448,7 @@ static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
 	index = lmb->aa_index * aa.array_sz + min_common_depth - 1;
 	nid = of_read_number(&aa.arrays[index], 1);
 
-	if (nid == 0xffff || nid >= MAX_NUMNODES)
+	if (nid == 0xffff || nid >= nr_node_ids)
 		nid = default_nid;
 
 	if (nid > 0) {
-- 
2.17.1
Re: [PATCH v 1/1] powerpc/64s: allow for clang's objdump differences
Hi Bill, Bill Wendling writes: > Clang's objdump emits slightly different output from GNU's objdump, > causing a list of warnings to be emitted during relocatable builds. > E.g., clang's objdump emits this: > >c004: 2c 00 00 48 b 0xc030 >... >c0005c6c: 10 00 82 40 bf 2, 0xc0005c7c > > while GNU objdump emits: > >c004: 2c 00 00 48 bc030 <__start+0x30> >... >c0005c6c: 10 00 82 40 bne c0005c7c > > > Adjust llvm-objdump's output to remove the extraneous '0x' and convert > 'bf' and 'bt' to 'bne' and 'beq' resp. to more closely match GNU > objdump's output. > > Note that clang's objdump doesn't yet output the relocation symbols on > PPC. > > Signed-off-by: Bill Wendling > --- > arch/powerpc/tools/unrel_branch_check.sh | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/arch/powerpc/tools/unrel_branch_check.sh > b/arch/powerpc/tools/unrel_branch_check.sh > index 77114755dc6f..71ce86b68d18 100755 > --- a/arch/powerpc/tools/unrel_branch_check.sh > +++ b/arch/powerpc/tools/unrel_branch_check.sh > @@ -31,6 +31,9 @@ grep -e > "^c[0-9a-f]*:[[:space:]]*\([0-9a-f][0-9a-f][[:space:]]\)\{4\}[[:space:]] > grep -v '\<__start_initialization_multiplatform>' | > grep -v -e 'b.\?.\?ctr' | > grep -v -e 'b.\?.\?lr' | > +sed 's/\bbt.\?[[:space:]]*[[:digit:]][[:digit:]]*,/beq/' | > +sed 's/\bbf.\?[[:space:]]*[[:digit:]][[:digit:]]*,/bne/' | > +sed 's/[[:space:]]0x/ /' | > sed 's/://' | I know you followed the example in the script of just doing everything as a separate entry in the pipeline, but I think we could consolidate all the seds into one? eg: sed -e 's/\bbt.\?[[:space:]]*[[:digit:]][[:digit:]]*,/beq/' \ -e 's/\bbf.\?[[:space:]]*[[:digit:]][[:digit:]]*,/bne/' \ -e 's/[[:space:]]0x/ /' \ -e 's/://' | Does that work? cheers
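The combined sed Michael suggests can be sanity-checked by feeding it a clang-objdump-style line and comparing against GNU objdump's rendering. This assumes GNU sed, which supports the `\b` and `\?` extensions in basic regular expressions used by the script:

```shell
# A line in llvm-objdump's format, as quoted in the patch description.
line='c0005c6c: 10 00 82 40   bf 2, 0xc0005c7c'

out=$(printf '%s\n' "$line" | sed \
	-e 's/\bbt.\?[[:space:]]*[[:digit:]][[:digit:]]*,/beq/' \
	-e 's/\bbf.\?[[:space:]]*[[:digit:]][[:digit:]]*,/bne/' \
	-e 's/[[:space:]]0x/ /' \
	-e 's/://')

printf '%s\n' "$out"   # "bf 2," becomes "bne", "0x" and ":" are stripped
```

The conditional mnemonic, the `0x` prefix, and the address colon are all normalised in one sed invocation, matching what the four separate pipeline stages did.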
Re: [PATCH 2/2] powerpc/64s: system call support for scv/rfscv instructions
Christophe Leroy writes: > Michael Ellerman a écrit : > >> Nicholas Piggin writes: >>> diff --git a/arch/powerpc/include/asm/ppc-opcode.h >>> b/arch/powerpc/include/asm/ppc-opcode.h >>> index 2a39c716c343..b2bdc4de1292 100644 >>> --- a/arch/powerpc/include/asm/ppc-opcode.h >>> +++ b/arch/powerpc/include/asm/ppc-opcode.h >>> @@ -257,6 +257,7 @@ >>> #define PPC_INST_MFVSRD0x7c66 >>> #define PPC_INST_MTVSRD0x7c000166 >>> #define PPC_INST_SC0x4402 >>> +#define PPC_INST_SCV 0x4401 >> ... >>> @@ -411,6 +412,7 @@ >> ... >>> +#define __PPC_LEV(l) (((l) & 0x7f) << 5) >> >> These conflicted and didn't seem to be used so I dropped them. >> >>> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c >>> index 5abe98216dc2..161bfccbc309 100644 >>> --- a/arch/powerpc/lib/sstep.c >>> +++ b/arch/powerpc/lib/sstep.c >>> @@ -3378,6 +3382,16 @@ int emulate_step(struct pt_regs *regs, >>> struct ppc_inst instr) >>> regs->msr = MSR_KERNEL; >>> return 1; >>> >>> + case SYSCALL_VECTORED_0:/* scv 0 */ >>> + regs->gpr[9] = regs->gpr[13]; >>> + regs->gpr[10] = MSR_KERNEL; >>> + regs->gpr[11] = regs->nip + 4; >>> + regs->gpr[12] = regs->msr & MSR_MASK; >>> + regs->gpr[13] = (unsigned long) get_paca(); >>> + regs->nip = (unsigned long) &system_call_vectored_emulate; >>> + regs->msr = MSR_KERNEL; >>> + return 1; >>> + >> >> This broke the ppc64e build: >> >> ld: arch/powerpc/lib/sstep.o:(.toc+0x0): undefined reference to >> `system_call_vectored_emulate' >> make[1]: *** [/home/michael/linux/Makefile:1139: vmlinux] Error 1 >> >> I wrapped it in #ifdef CONFIG_PPC64_BOOK3S. > > You mean CONFIG_PPC_BOOK3S_64 ? I hope so ... ## . Will send a fixup. Thanks for noticing. cheers
[PATCH v2 5/5] selftests/powerpc: Remove powerpc special cases from stack expansion test
Now that the powerpc code behaves the same as other architectures we can drop the special cases we had. Signed-off-by: Michael Ellerman --- .../powerpc/mm/stack_expansion_ldst.c | 41 +++ 1 file changed, 5 insertions(+), 36 deletions(-) v2: no change just rebased. diff --git a/tools/testing/selftests/powerpc/mm/stack_expansion_ldst.c b/tools/testing/selftests/powerpc/mm/stack_expansion_ldst.c index 8dbfb51acf0f..ed9143990888 100644 --- a/tools/testing/selftests/powerpc/mm/stack_expansion_ldst.c +++ b/tools/testing/selftests/powerpc/mm/stack_expansion_ldst.c @@ -56,13 +56,7 @@ int consume_stack(unsigned long target_sp, unsigned long stack_high, int delta, #else asm volatile ("mov %%rsp, %[sp]" : [sp] "=r" (stack_top_sp)); #endif - - // Kludge, delta < 0 indicates relative to SP - if (delta < 0) - target = stack_top_sp + delta; - else - target = stack_high - delta + 1; - + target = stack_high - delta + 1; volatile char *p = (char *)target; if (type == STORE) @@ -162,41 +156,16 @@ static int test_one(unsigned int stack_used, int delta, enum access_type type) static void test_one_type(enum access_type type, unsigned long page_size, unsigned long rlim_cur) { - assert(test_one(DEFAULT_SIZE, 512 * _KB, type) == 0); + unsigned long delta; - // powerpc has a special case to allow up to 1MB - assert(test_one(DEFAULT_SIZE, 1 * _MB, type) == 0); - -#ifdef __powerpc__ - // This fails on powerpc because it's > 1MB and is not a stdu & - // not close to r1 - assert(test_one(DEFAULT_SIZE, 1 * _MB + 8, type) != 0); -#else - assert(test_one(DEFAULT_SIZE, 1 * _MB + 8, type) == 0); -#endif - -#ifdef __powerpc__ - // Accessing way past the stack pointer is not allowed on powerpc - assert(test_one(DEFAULT_SIZE, rlim_cur, type) != 0); -#else // We should be able to access anywhere within the rlimit + for (delta = page_size; delta <= rlim_cur; delta += page_size) + assert(test_one(DEFAULT_SIZE, delta, type) == 0); + assert(test_one(DEFAULT_SIZE, rlim_cur, type) == 0); -#endif // But if we 
go past the rlimit it should fail assert(test_one(DEFAULT_SIZE, rlim_cur + 1, type) != 0); - - // Above 1MB powerpc only allows accesses within 4224 bytes of - // r1 for accesses that aren't stdu - assert(test_one(1 * _MB + page_size - 128, -4224, type) == 0); -#ifdef __powerpc__ - assert(test_one(1 * _MB + page_size - 128, -4225, type) != 0); -#else - assert(test_one(1 * _MB + page_size - 128, -4225, type) == 0); -#endif - - // By consuming 2MB of stack we test the stdu case - assert(test_one(2 * _MB + page_size - 128, -4224, type) == 0); } static int test(void) -- 2.25.1
[PATCH v2 4/5] powerpc/mm: Remove custom stack expansion checking
We have powerpc specific logic in our page fault handling to decide if an access to an unmapped address below the stack pointer should expand the stack VMA. The logic aims to prevent userspace from doing bad accesses below the stack pointer. However as long as the stack is < 1MB in size, we allow all accesses without further checks. Adding some debug I see that I can do a full kernel build and LTP run, and not a single process has used more than 1MB of stack. So for the majority of processes the logic never even fires. We also recently found a nasty bug in this code which could cause userspace programs to be killed during signal delivery. It went unnoticed presumably because most processes use < 1MB of stack. The generic mm code has also grown support for stack guard pages since this code was originally written, so the most heinous case of the stack expanding into other mappings is now handled for us. Finally although some other arches have special logic in this path, from what I can tell none of x86, arm64, arm and s390 impose any extra checks other than those in expand_stack(). So drop our complicated logic and like other architectures just let the stack expand as long as its within the rlimit. Signed-off-by: Michael Ellerman --- arch/powerpc/mm/fault.c | 109 ++-- 1 file changed, 5 insertions(+), 104 deletions(-) v2: no change just rebased. diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c index 3ebb1792e636..925a7231abb3 100644 --- a/arch/powerpc/mm/fault.c +++ b/arch/powerpc/mm/fault.c @@ -42,39 +42,7 @@ #include #include -/* - * Check whether the instruction inst is a store using - * an update addressing form which will update r1. 
- */ -static bool store_updates_sp(struct ppc_inst inst) -{ - /* check for 1 in the rA field */ - if (((ppc_inst_val(inst) >> 16) & 0x1f) != 1) - return false; - /* check major opcode */ - switch (ppc_inst_primary_opcode(inst)) { - case OP_STWU: - case OP_STBU: - case OP_STHU: - case OP_STFSU: - case OP_STFDU: - return true; - case OP_STD:/* std or stdu */ - return (ppc_inst_val(inst) & 3) == 1; - case OP_31: - /* check minor opcode */ - switch ((ppc_inst_val(inst) >> 1) & 0x3ff) { - case OP_31_XOP_STDUX: - case OP_31_XOP_STWUX: - case OP_31_XOP_STBUX: - case OP_31_XOP_STHUX: - case OP_31_XOP_STFSUX: - case OP_31_XOP_STFDUX: - return true; - } - } - return false; -} + /* * do_page_fault error handling helpers */ @@ -267,57 +235,6 @@ static bool bad_kernel_fault(struct pt_regs *regs, unsigned long error_code, return false; } -// This comes from 64-bit struct rt_sigframe + __SIGNAL_FRAMESIZE -#define SIGFRAME_MAX_SIZE (4096 + 128) - -static bool bad_stack_expansion(struct pt_regs *regs, unsigned long address, - struct vm_area_struct *vma, unsigned int flags, - bool *must_retry) -{ - /* -* N.B. The POWER/Open ABI allows programs to access up to -* 288 bytes below the stack pointer. -* The kernel signal delivery code writes a bit over 4KB -* below the stack pointer (r1) before decrementing it. -* The exec code can write slightly over 640kB to the stack -* before setting the user r1. Thus we allow the stack to -* expand to 1MB without further checks. -*/ - if (address + 0x10 < vma->vm_end) { - struct ppc_inst __user *nip = (struct ppc_inst __user *)regs->nip; - /* get user regs even if this fault is in kernel mode */ - struct pt_regs *uregs = current->thread.regs; - if (uregs == NULL) - return true; - - /* -* A user-mode access to an address a long way below -* the stack pointer is only valid if the instruction -* is one which would update the stack pointer to the -* address accessed if the instruction completed, -* i.e. 
either stwu rs,n(r1) or stwux rs,r1,rb -* (or the byte, halfword, float or double forms). -* -* If we don't check this then any write to the area -* between the last mapped region and the stack will -* expand the stack rather than segfaulting. -*/ - if (address + SIGFRAME_MAX_SIZE >= uregs->gpr[1]) - return false; - - if ((flags & FAULT_FLAG_WRITE) && (flags & FAULT_FLAG_USER) && - access_ok(nip, sizeof(*nip))) { - struct ppc_inst inst; - - if (!probe_user_read_inst(&inst, nip)) -
[PATCH v2 3/5] selftests/powerpc: Update the stack expansion test
Update the stack expansion load/store test to take into account the new allowance of 4224 bytes below the stack pointer. Signed-off-by: Michael Ellerman --- .../selftests/powerpc/mm/stack_expansion_ldst.c| 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) v2: Update for change of size to 4224. diff --git a/tools/testing/selftests/powerpc/mm/stack_expansion_ldst.c b/tools/testing/selftests/powerpc/mm/stack_expansion_ldst.c index 0587e11437f5..8dbfb51acf0f 100644 --- a/tools/testing/selftests/powerpc/mm/stack_expansion_ldst.c +++ b/tools/testing/selftests/powerpc/mm/stack_expansion_ldst.c @@ -186,17 +186,17 @@ static void test_one_type(enum access_type type, unsigned long page_size, unsign // But if we go past the rlimit it should fail assert(test_one(DEFAULT_SIZE, rlim_cur + 1, type) != 0); - // Above 1MB powerpc only allows accesses within 2048 bytes of + // Above 1MB powerpc only allows accesses within 4224 bytes of // r1 for accesses that aren't stdu - assert(test_one(1 * _MB + page_size - 128, -2048, type) == 0); + assert(test_one(1 * _MB + page_size - 128, -4224, type) == 0); #ifdef __powerpc__ - assert(test_one(1 * _MB + page_size - 128, -2049, type) != 0); + assert(test_one(1 * _MB + page_size - 128, -4225, type) != 0); #else - assert(test_one(1 * _MB + page_size - 128, -2049, type) == 0); + assert(test_one(1 * _MB + page_size - 128, -4225, type) == 0); #endif // By consuming 2MB of stack we test the stdu case - assert(test_one(2 * _MB + page_size - 128, -2048, type) == 0); + assert(test_one(2 * _MB + page_size - 128, -4224, type) == 0); } static int test(void) -- 2.25.1
[PATCH v2 2/5] powerpc: Allow 4224 bytes of stack expansion for the signal frame
We have powerpc specific logic in our page fault handling to decide if an access to an unmapped address below the stack pointer should expand the stack VMA. The code was originally added in 2004 "ported from 2.4". The rough logic is that the stack is allowed to grow to 1MB with no extra checking. Over 1MB the access must be within 2048 bytes of the stack pointer, or be from a user instruction that updates the stack pointer. The 2048 byte allowance below the stack pointer is there to cover the 288 byte "red zone" as well as the "about 1.5kB" needed by the signal delivery code. Unfortunately since then the signal frame has expanded, and is now 4224 bytes on 64-bit kernels with transactional memory enabled. This means if a process has consumed more than 1MB of stack, and its stack pointer lies less than 4224 bytes from the next page boundary, signal delivery will fault when trying to expand the stack and the process will see a SEGV. The total size of the signal frame is the size of struct rt_sigframe (which includes the red zone) plus __SIGNAL_FRAMESIZE (128 bytes on 64-bit). 
The 2048 byte allowance was correct until 2008, as the signal frame was:

struct rt_sigframe {
	struct ucontext uc;                   /*     0  1440 */
	/* --- cacheline 11 boundary (1408 bytes) was 32 bytes ago --- */
	long unsigned int _unused[2];         /*  1440    16 */
	unsigned int tramp[6];                /*  1456    24 */
	struct siginfo * pinfo;               /*  1480     8 */
	void * puc;                           /*  1488     8 */
	struct siginfo info;                  /*  1496   128 */
	/* --- cacheline 12 boundary (1536 bytes) was 88 bytes ago --- */
	char abigap[288];                     /*  1624   288 */

	/* size: 1920, cachelines: 15, members: 7 */
	/* padding: 8 */
};

1920 + 128 = 2048

Then in commit ce48b2100785 ("powerpc: Add VSX context save/restore, ptrace and signal support") (Jul 2008) the signal frame expanded to 2304 bytes:

struct rt_sigframe {
	struct ucontext uc;                   /*     0  1696 */ <--
	/* --- cacheline 13 boundary (1664 bytes) was 32 bytes ago --- */
	long unsigned int _unused[2];         /*  1696    16 */
	unsigned int tramp[6];                /*  1712    24 */
	struct siginfo * pinfo;               /*  1736     8 */
	void * puc;                           /*  1744     8 */
	struct siginfo info;                  /*  1752   128 */
	/* --- cacheline 14 boundary (1792 bytes) was 88 bytes ago --- */
	char abigap[288];                     /*  1880   288 */

	/* size: 2176, cachelines: 17, members: 7 */
	/* padding: 8 */
};

2176 + 128 = 2304

At this point we should have been exposed to the bug, though as far as I know it was never reported. I no longer have a system old enough to easily test on.

Then in 2010, commit 320b2b8de126 ("mm: keep a guard page below a grow-down stack segment") caused our stack expansion code to never trigger, as there was always a VMA found for a write up to PAGE_SIZE below r1.

That meant the bug was hidden as we continued to expand the signal frame in commit 2b0a576d15e0 ("powerpc: Add new transactional memory state to the signal context") (Feb 2013):

struct rt_sigframe {
	struct ucontext uc;                   /*     0  1696 */
	/* --- cacheline 13 boundary (1664 bytes) was 32 bytes ago --- */
	struct ucontext uc_transact;          /*  1696  1696 */ <--
	/* --- cacheline 26 boundary (3328 bytes) was 64 bytes ago --- */
	long unsigned int _unused[2];         /*  3392    16 */
	unsigned int tramp[6];                /*  3408    24 */
	struct siginfo * pinfo;               /*  3432     8 */
	void * puc;                           /*  3440     8 */
	struct siginfo info;                  /*  3448   128 */
	/* --- cacheline 27 boundary (3456 bytes) was 120 bytes ago --- */
	char abigap[288];                     /*  3576   288 */

	/* size: 3872, cachelines: 31, members: 8 */
	/* padding: 8 */
	/* last cacheline: 32 bytes */
};

3872 + 128 = 4000

And commit 573ebfa6601f ("powerpc: Increase stack redzone for 64-bit userspace to 512 bytes") (Feb 2014):

struct rt_sigframe {
	struct ucontext uc;                   /*     0  1696 */
	/* --- cacheline 13 boundary (1664 bytes) was 32 bytes ago --- */
	struct ucontext uc_transact;          /*  1696  1696 */
	/* --- cacheline 26 boundary (3328 bytes) was 64 bytes ago --- */
	long unsigned int _unused[2];         /*  3392    16 */
	unsigned int
[PATCH v2 1/5] selftests/powerpc: Add test of stack expansion logic
We have custom stack expansion checks that it turns out are extremely badly tested and contain bugs, surprise. So add some tests that exercise the code and capture the current boundary conditions.

The signal test currently fails on 64-bit kernels because the 2048 byte allowance for the signal frame is too small, we will fix that in a subsequent patch.

Signed-off-by: Michael Ellerman
---
v2:
 - Concentrate on used stack around the 1MB size, as that's where our custom logic kicks in.
 - Increment the used stack size by 64 so we can exercise the case where we overflow the page by less than 128 (__SIGNAL_FRAMESIZE).
---
 tools/testing/selftests/powerpc/mm/.gitignore |   2 +
 tools/testing/selftests/powerpc/mm/Makefile   |   9 +-
 .../powerpc/mm/stack_expansion_ldst.c         | 233 ++
 .../powerpc/mm/stack_expansion_signal.c       | 118 +
 tools/testing/selftests/powerpc/pmu/lib.h     |   1 +
 5 files changed, 362 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/mm/stack_expansion_ldst.c
 create mode 100644 tools/testing/selftests/powerpc/mm/stack_expansion_signal.c

diff --git a/tools/testing/selftests/powerpc/mm/.gitignore b/tools/testing/selftests/powerpc/mm/.gitignore
index 8d041f508a51..52308f42b7de 100644
--- a/tools/testing/selftests/powerpc/mm/.gitignore
+++ b/tools/testing/selftests/powerpc/mm/.gitignore
@@ -8,3 +8,5 @@
 large_vm_fork_separation
 bad_accesses
 tlbie_test
 pkey_exec_prot
+stack_expansion_ldst
+stack_expansion_signal
diff --git a/tools/testing/selftests/powerpc/mm/Makefile b/tools/testing/selftests/powerpc/mm/Makefile
index 5a86d59441dc..6cd772e0e374 100644
--- a/tools/testing/selftests/powerpc/mm/Makefile
+++ b/tools/testing/selftests/powerpc/mm/Makefile
@@ -3,7 +3,9 @@
 	$(MAKE) -C ../
 TEST_GEN_PROGS := hugetlb_vs_thp_test subpage_prot segv_errors wild_bctr \
-		  large_vm_fork_separation bad_accesses pkey_exec_prot
+		  large_vm_fork_separation bad_accesses pkey_exec_prot stack_expansion_signal \
+		  stack_expansion_ldst
+
 TEST_GEN_PROGS_EXTENDED := tlbie_test
 TEST_GEN_FILES := tempfile

@@ -17,6 +19,11 @@
 $(OUTPUT)/large_vm_fork_separation: CFLAGS += -m64
 $(OUTPUT)/bad_accesses: CFLAGS += -m64
 $(OUTPUT)/pkey_exec_prot: CFLAGS += -m64

+$(OUTPUT)/stack_expansion_signal: ../utils.c ../pmu/lib.c
+
+$(OUTPUT)/stack_expansion_ldst: CFLAGS += -fno-stack-protector
+$(OUTPUT)/stack_expansion_ldst: ../utils.c
+
 $(OUTPUT)/tempfile:
 	dd if=/dev/zero of=$@ bs=64k count=1
diff --git a/tools/testing/selftests/powerpc/mm/stack_expansion_ldst.c b/tools/testing/selftests/powerpc/mm/stack_expansion_ldst.c
new file mode 100644
index ..0587e11437f5
--- /dev/null
+++ b/tools/testing/selftests/powerpc/mm/stack_expansion_ldst.c
@@ -0,0 +1,233 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Test that loads/stores expand the stack segment, or trigger a SEGV, in
+ * various conditions.
+ *
+ * Based on test code by Tom Lane.
+ */
+
+#undef NDEBUG
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define _KB (1024)
+#define _MB (1024 * 1024)
+
+volatile char *stack_top_ptr;
+volatile unsigned long stack_top_sp;
+volatile char c;
+
+enum access_type {
+	LOAD,
+	STORE,
+};
+
+/*
+ * Consume stack until the stack pointer is below @target_sp, then do an access
+ * (load or store) at offset @delta from either the base of the stack or the
+ * current stack pointer.
+ */
+__attribute__ ((noinline))
+int consume_stack(unsigned long target_sp, unsigned long stack_high, int delta, enum access_type type)
+{
+	unsigned long target;
+	char stack_cur;
+
+	if ((unsigned long)&stack_cur > target_sp)
+		return consume_stack(target_sp, stack_high, delta, type);
+	else {
+		// We don't really need this, but without it GCC might not
+		// generate a recursive call above.
+		stack_top_ptr = &stack_cur;
+
+#ifdef __powerpc__
+		asm volatile ("mr %[sp], %%r1" : [sp] "=r" (stack_top_sp));
+#else
+		asm volatile ("mov %%rsp, %[sp]" : [sp] "=r" (stack_top_sp));
+#endif
+
+		// Kludge, delta < 0 indicates relative to SP
+		if (delta < 0)
+			target = stack_top_sp + delta;
+		else
+			target = stack_high - delta + 1;
+
+		volatile char *p = (char *)target;
+
+		if (type == STORE)
+			*p = c;
+		else
+			c = *p;
+
+		// Do something to prevent the stack frame being popped prior to
+		// our access above.
+		getpid();
+	}
+
+	return 0;
+}
+
+static int search_proc_maps(char *needle, unsigned long *low, unsigned long *high)
+{
+	unsigned long start, end;
+	static char buf[4096];
+
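The body of search_proc_maps() is truncated above. As a hedged sketch of what such a helper typically does (names and exact parsing are illustrative, not the selftest's actual code), it scans /proc/self/maps for a line containing the needle and parses the "start-end" address range from it:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustrative userspace sketch: find the mapping whose /proc/self/maps
 * line contains @needle and return its [low, high) range.
 */
static int find_mapping(const char *needle, unsigned long *low,
			unsigned long *high)
{
	char line[4096];
	FILE *f = fopen("/proc/self/maps", "r");
	int rc = -1;

	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f)) {
		if (!strstr(line, needle))
			continue;
		/* each maps line starts with "start-end perms offset ..." */
		if (sscanf(line, "%lx-%lx", low, high) == 2) {
			rc = 0;
			break;
		}
	}

	fclose(f);
	return rc;
}
```

For the stack tests above, the needle would be something like "[stack]", giving the VMA boundaries the test needs to position accesses relative to the stack base.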
Re: [PATCH 2/5] powerpc: Allow 4096 bytes of stack expansion for the signal frame
Daniel Axtens writes: > Hi Michael, > > Unfortunately, this patch doesn't completely solve the problem. > > Trying the original reproducer, I'm still able to trigger the crash even > with this patch, although not 100% of the time. (If I turn ASLR off > outside of tmux it reliably crashes, if I turn ASLR off _inside_ of tmux > it reliably succeeds; all of this is on a serial console.) > > ./foo 1241000 & sleep 1; killall -USR1 foo; echo ok > > If I add some debugging information, I see that I'm getting > address + 4096 = 7fed0fa0 > gpr1 = 7fed1020 > > So address + 4096 is 0x80 bytes below the 4k window. I haven't been able > to figure out why, gdb gives me a NIP in __kernel_sigtramp_rt64 but I > don't know what to make of that. Thanks for testing. I looked at it again this morning and it's fairly obvious when it's not 11pm :) We need space for struct rt_sigframe as well as another 128 bytes, which is __SIGNAL_FRAMESIZE. It's actually mentioned in the comment above struct rt_sigframe. I'll send a v2. > P.S. I don't know what your policy on linking to kernel bugzilla is, but > if you want: > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=205183 In general I prefer to keep things clean with just a single Link: tag pointing to the archive of the patch submission. That can then contain further links and other info, and has the advantage that people can reply to the patch submission in the future to add information to the thread that wasn't known at the time of the commit. cheers
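The arithmetic behind the fix can be sketched as follows. The numbers are assumptions for illustration: __SIGNAL_FRAMESIZE is 128 on ppc64, and the signal frame footprint is taken as a rounded 4096 bytes, which matches Daniel's observation that the faulting address was 0x80 bytes below the 4k window:

```c
#include <assert.h>

/* Illustrative values, not taken verbatim from kernel headers. */
#define __SIGNAL_FRAMESIZE	128	/* caller's back-chain frame on ppc64 */
#define RT_SIGFRAME_SPACE	4096	/* assumed rounded rt_sigframe footprint */

/*
 * The stack expansion allowance must cover the whole signal frame
 * *plus* __SIGNAL_FRAMESIZE below it, so a plain 4096 is 128 short.
 */
static unsigned long sigframe_allowance(void)
{
	return RT_SIGFRAME_SPACE + __SIGNAL_FRAMESIZE;
}
```

With these numbers the required allowance comes out 0x80 bytes larger than the 4096 used in the patch under discussion, which is exactly the gap Daniel measured.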
[PATCH] KVM: PPC: Book3S HV: rework secure mem slot dropping
When a secure memslot is dropped, all the pages backed in the secure device (aka really backed by secure memory by the Ultravisor) should be paged out to a normal page. Previously, this was achieved by triggering the page fault mechanism, which calls kvmppc_svm_page_out() on each page.

This can't work when hot unplugging a memory slot because the memory slot is flagged as invalid and gfn_to_pfn() then doesn't try to access the page, so the page fault mechanism is not triggered.

Since the final goal is to make a call to kvmppc_svm_page_out(), it seems simpler to call it directly instead of triggering such a mechanism. This way kvmppc_uvmem_drop_pages() can be called even when hot unplugging a memslot.

Since kvmppc_uvmem_drop_pages() is already holding kvm->arch.uvmem_lock, the call is made to __kvmppc_svm_page_out(). As __kvmppc_svm_page_out() needs the vma pointer to migrate the pages, the VMA is fetched in a lazy way, so as not to trigger find_vma() all the time. In addition, the mmap_sem is held in read mode during that time, not in write mode, since the virtual memory layout is not impacted, and kvm->arch.uvmem_lock prevents concurrent operations on the secure device.

Cc: Ram Pai
Cc: Bharata B Rao
Cc: Paul Mackerras
Signed-off-by: Ram Pai
	[modified the changelog description]
Signed-off-by: Laurent Dufour
	[modified check on the VMA in kvmppc_uvmem_drop_pages]
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 53 --
 1 file changed, 36 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index c772e921f769..5dd3e9acdcab 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -632,35 +632,54 @@
  * fault on them, do fault time migration to replace the device PTEs in
  * QEMU page table with normal PTEs from newly allocated pages.
  */
-void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
+void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *slot,
 			     struct kvm *kvm, bool skip_page_out)
 {
 	int i;
 	struct kvmppc_uvmem_page_pvt *pvt;
-	unsigned long pfn, uvmem_pfn;
-	unsigned long gfn = free->base_gfn;
+	struct page *uvmem_page;
+	struct vm_area_struct *vma = NULL;
+	unsigned long uvmem_pfn, gfn;
+	unsigned long addr, end;
+
+	mmap_read_lock(kvm->mm);
+
+	addr = slot->userspace_addr;
+	end = addr + (slot->npages * PAGE_SIZE);

-	for (i = free->npages; i; --i, ++gfn) {
-		struct page *uvmem_page;
+	gfn = slot->base_gfn;
+	for (i = slot->npages; i; --i, ++gfn, addr += PAGE_SIZE) {
+
+		/* Fetch the VMA if addr is not in the latest fetched one */
+		if (!vma || addr >= vma->vm_end) {
+			vma = find_vma_intersection(kvm->mm, addr, addr+1);
+			if (!vma) {
+				pr_err("Can't find VMA for gfn:0x%lx\n", gfn);
+				break;
+			}
+		}

 		mutex_lock(&kvm->arch.uvmem_lock);
-		if (!kvmppc_gfn_is_uvmem_pfn(gfn, kvm, &uvmem_pfn)) {
+
+		if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, &uvmem_pfn)) {
+			uvmem_page = pfn_to_page(uvmem_pfn);
+			pvt = uvmem_page->zone_device_data;
+			pvt->skip_page_out = skip_page_out;
+			pvt->remove_gfn = true;
+
+			if (__kvmppc_svm_page_out(vma, addr, addr + PAGE_SIZE,
+						  PAGE_SHIFT, kvm, pvt->gpa))
+				pr_err("Can't page out gpa:0x%lx addr:0x%lx\n",
+				       pvt->gpa, addr);
+		} else {
+			/* Remove the shared flag if any */
 			kvmppc_gfn_remove(gfn, kvm);
-			mutex_unlock(&kvm->arch.uvmem_lock);
-			continue;
 		}

-		uvmem_page = pfn_to_page(uvmem_pfn);
-		pvt = uvmem_page->zone_device_data;
-		pvt->skip_page_out = skip_page_out;
-		pvt->remove_gfn = true;
 		mutex_unlock(&kvm->arch.uvmem_lock);
-
-		pfn = gfn_to_pfn(kvm, gfn);
-		if (is_error_noslot_pfn(pfn))
-			continue;
-		kvm_release_pfn_clean(pfn);
 	}
+
+	mmap_read_unlock(kvm->mm);
 }

 unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm)
--
2.27.0
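The lazy VMA fetch in the patch is a general pattern: re-resolve the range only when the address walks past the end of the last one found, instead of doing a lookup per page. A minimal userspace sketch of the same pattern, using a toy range table in place of real VMAs (all names here are illustrative):

```c
#include <assert.h>
#include <stddef.h>

struct range { unsigned long start, end; };

/* Linear search standing in for find_vma_intersection(). */
static const struct range *lookup_range(const struct range *tbl, int n,
					unsigned long addr)
{
	for (int i = 0; i < n; i++)
		if (addr >= tbl[i].start && addr < tbl[i].end)
			return &tbl[i];
	return NULL;
}

/*
 * Walk [lo, hi) page by page, caching the current range; returns how
 * many lookups were needed (one per range crossed, not one per page).
 */
static int walk_with_cache(const struct range *tbl, int n,
			   unsigned long lo, unsigned long hi,
			   unsigned long page)
{
	const struct range *cur = NULL;
	int lookups = 0;

	for (unsigned long a = lo; a < hi; a += page) {
		if (!cur || a >= cur->end) {
			cur = lookup_range(tbl, n, a);
			lookups++;
			if (!cur)
				break;	/* hole: bail, as the patch does */
		}
		/* ... process one page of cur here ... */
	}
	return lookups;
}
```

Walking two adjacent 16KB ranges in 4KB steps costs two lookups instead of eight, which is the whole point of caching the VMA across loop iterations.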
Re: [PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
+++ Jarkko Sakkinen [24/07/20 10:36 +0300]: On Thu, Jul 23, 2020 at 03:42:09PM +0300, Ard Biesheuvel wrote: On Thu, 23 Jul 2020 at 04:52, Jarkko Sakkinen wrote: > > On Thu, Jul 16, 2020 at 06:49:09PM +0200, Christophe Leroy wrote: > > Jarkko Sakkinen a écrit : > > > > > Rename module_alloc() to text_alloc() and module_memfree() to > > > text_memfree(), and move them to kernel/text.c, which is unconditionally > > > compiled to the kernel proper. This allows kprobes, ftrace and bpf to > > > allocate space for executable code without requiring to compile the modules > > > support (CONFIG_MODULES=y) in. > > > > You are not changing enough in powerpc to have this work. > > On powerpc 32 bits (6xx), when STRICT_KERNEL_RWX is selected, the vmalloc > > space is set to NX (no exec) at segment level (ie by 256Mbytes zone) unless > > CONFIG_MODULES is selected. > > > > Christophe > > This has been deduced down to: > > https://lore.kernel.org/lkml/20200717030422.679972-1-jarkko.sakki...@linux.intel.com/ > > I.e. not intruding PPC anymore :-) > Ok, so after the elaborate discussion we had between Jessica, Russell, Peter, Will, Mark, you and myself, where we pointed out that a) a single text_alloc() abstraction for bpf, kprobes and ftrace does not fit other architectures very well, and b) that module_alloc() is not suitable as a default to base text_alloc() on, In the latest iteration (v5) it is conditionally available only if arch defines and fallback has been removed. you went ahead and implemented that anyway, but only cc'ing Peter, akpm, Masami and the mm list this time? No problems with that. Actually each patch gets everything that get_maintainer.pl gives with a cc cmd script, not just the ones explicitly listed in the patch. Should I explicitly CC you to the next version? I'm happy to grow the list when requested. Yes, please CC everybody that was part of the discussion last time especially during v2, and please use a consistent CC list for the whole patchset. 
It is difficult to review when you only receive patch 1 out of 6 with no mention of text_alloc() anywhere and without being CC'd on the cover letter. Jessica
Re: [PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
On Thu, Jul 23, 2020 at 03:42:09PM +0300, Ard Biesheuvel wrote: > On Thu, 23 Jul 2020 at 04:52, Jarkko Sakkinen > wrote: > > > > On Thu, Jul 16, 2020 at 06:49:09PM +0200, Christophe Leroy wrote: > > > Jarkko Sakkinen a écrit : > > > > > > > Rename module_alloc() to text_alloc() and module_memfree() to > > > > text_memfree(), and move them to kernel/text.c, which is unconditionally > > > > compiled to the kernel proper. This allows kprobes, ftrace and bpf to > > > > allocate space for executable code without requiring to compile the > > > > modules > > > > support (CONFIG_MODULES=y) in. > > > > > > You are not changing enough in powerpc to have this work. > > > On powerpc 32 bits (6xx), when STRICT_KERNEL_RWX is selected, the vmalloc > > > space is set to NX (no exec) at segment level (ie by 256Mbytes zone) > > > unless > > > CONFIG_MODULES is selected. > > > > > > Christophe > > > > This has been deduced down to: > > > > https://lore.kernel.org/lkml/20200717030422.679972-1-jarkko.sakki...@linux.intel.com/ > > > > I.e. not intruding PPC anymore :-) > > > > Ok, so after the elaborate discussion we had between Jessica, Russell, > Peter, Will, Mark, you and myself, where we pointed out that > a) a single text_alloc() abstraction for bpf, kprobes and ftrace does > not fit other architectures very well, and > b) that module_alloc() is not suitable as a default to base text_alloc() on, In the latest iteration (v5) it is conditionally available only if arch defines and fallback has been removed. > you went ahead and implemented that anyway, but only cc'ing Peter, > akpm, Masami and the mm list this time? No problems with that. Actually each patch gets everything that get_maintainer.pl gives with a cc cmd script, not just the ones explicitly listed in the patch. Should I explicitly CC you to the next version? I'm happy to grow the list when requested. > Sorry, but that is not how it works. 
> Once people get pulled into a discussion, you cannot dismiss them or
> their feedback like that and go off and do your own thing anyway.
> Generic features like this are tricky to get right, and it will likely
> take many iterations and input from many different people.

Sure. I'm not expecting this to move quickly. I don't think I've at least purposely done that. As you said, it's tricky to get this right.

/Jarkko
Re: [v3 12/15] powerpc/perf: Add support for outputting extended regs in perf intr_regs
> On 23-Jul-2020, at 8:26 PM, Arnaldo Carvalho de Melo wrote:
>
> Em Thu, Jul 23, 2020 at 11:14:16AM +0530, kajoljain escreveu:
>> On 7/21/20 11:32 AM, kajoljain wrote:
>>> On 7/17/20 8:08 PM, Athira Rajeev wrote:

 From: Anju T Sudhakar

 Add support for the perf extended register capability in powerpc. The
 capability flag PERF_PMU_CAP_EXTENDED_REGS is used to indicate the PMUs
 which support extended registers. The generic code defines the mask of
 extended registers as 0 for non-supported architectures.

 The patch adds extended regs support for the power9 platform by exposing
 MMCR0, MMCR1 and MMCR2 registers. The REG_RESERVED mask needs an update
 to include the extended regs. PERF_REG_EXTENDED_MASK, which contains the
 mask value of the supported registers, is defined at runtime in the
 kernel based on the platform, since the supported registers may differ
 from one processor version to another, and hence the MASK value.

 With the patch:

 available registers: r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14
 r15 r16 r17 r18 r19 r20 r21 r22 r23 r24 r25 r26 r27 r28 r29 r30 r31 nip
 msr orig_r3 ctr link xer ccr softe trap dar dsisr sier mmcra mmcr0
 mmcr1 mmcr2

 PERF_RECORD_SAMPLE(IP, 0x1): 4784/4784: 0 period: 1 addr: 0
 ... intr regs: mask 0x ABI 64-bit
 r0    0xc012b77c        r1    0xc03fe5e03930    r2    0xc1b0e000        r3    0xc03fdcddf800
 r4    0xc03fc788        r5    0x9c422724be      r6    0xc03fe5e03908    r7    0xff63bddc8706
 r8    0x9e4             r9    0x0               r10   0x1               r11   0x0
 r12   0xc01299c0        r13   0xc03c4800        r14   0x0               r15   0x7fffdd8b8b00
 r16   0x0               r17   0x7fffdd8be6b8    r18   0x7e7076607730    r19   0x2f
 r20   0xc0001fc26c68    r21   0xc0002041e4227e00 r22  0xc0002018fb60    r23   0x1
 r24   0xc03ffec4d900    r25   0x8000            r26   0x0               r27   0x1
 r28   0x1               r29   0xc1be1260        r30   0x6008010         r31   0xc03ffebb7218
 nip   0xc012b910        msr   0x90009033        orig_r3 0xc012b86c      ctr   0xc01299c0
 link  0xc012b77c        xer   0x0               ccr   0x2800            softe 0x1
 trap  0xf00             dar   0x0               dsisr 0x800             sier  0x0
 mmcra 0x800             mmcr0 0x82008090        mmcr1 0x1e00            mmcr2 0x0
 ...
thread: perf:4784 Signed-off-by: Anju T Sudhakar [Defined PERF_REG_EXTENDED_MASK at run time to add support for different platforms ] Signed-off-by: Athira Rajeev Reviewed-by: Madhavan Srinivasan --- >>> >>> Patch looks good to me. >>> >>> Reviewed-by: Kajol Jain >> >> Hi Arnaldo and Jiri, >> Please let me know if you have any comments on these patches. Can you >> pull/ack these >> patches if they seems fine to you. > > Can you please clarify something here, I think I saw a kernel build bot > complaint followed by a fix, in these cases I think, for reviewer's > sake, that this would entail a v4 patchkit? One that has no such build > issues? > > Or have I got something wrong? Hi Arnaldo, yes you are right, I will send version 4 as a new series with changes to add support for extended regs and including fix for the build issue. Thanks for your response. Athira > > - Arnaldo
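The key design point in the patch above is that PERF_REG_EXTENDED_MASK is computed at runtime per platform rather than being a compile-time constant, because the set of exposed PMU registers differs between processor versions. A hedged sketch of that idea (the register numbering here is made up for illustration; the real PERF_REG_POWERPC_* values live in the uapi headers):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative extended-register indices, not the real ABI numbers. */
enum { EXT_REG_MMCR0, EXT_REG_MMCR1, EXT_REG_MMCR2 };

/*
 * Build the extended-regs mask for the running platform; generic code
 * treats a zero mask as "no extended registers supported".
 */
static uint64_t perf_reg_extended_mask(int platform_has_ext_regs)
{
	if (!platform_has_ext_regs)
		return 0;

	return (1ull << EXT_REG_MMCR0) | (1ull << EXT_REG_MMCR1) |
	       (1ull << EXT_REG_MMCR2);
}
```

A later processor version could return a different mask from the same hook, which is exactly why the value cannot be baked in at build time.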
Re: [PATCH v3 5/6] powerpc/pseries: implement paravirt qspinlocks for SPLPAR
On Thu, Jul 23, 2020 at 08:47:59PM +0200, pet...@infradead.org wrote: > On Thu, Jul 23, 2020 at 02:32:36PM -0400, Waiman Long wrote: > > BTW, do you have any comment on my v2 lock holder cpu info qspinlock patch? > > I will have to update the patch to fix the reported 0-day test problem, but > > I want to collect other feedback before sending out v3. > > I want to say I hate it all, it adds instructions to a path we spend an > aweful lot of time optimizing without really getting anything back for > it. > > Will, how do you feel about it? I can see it potentially being useful for debugging, but I hate the limitation to 256 CPUs. Even arm64 is hitting that now. Also, you're talking ~1% gains here. I think our collective time would be better spent off reviewing the CNA series and trying to make it more deterministic. Will
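The bit-packing being debated can be made concrete with a small sketch. This is NOT the actual kernel qspinlock layout, just an illustration of the idea from the thread: the 16-bit locked_pending halfword, minus 1 bit for locked and 1 bit for pending, leaves 14 bits, enough for the holder's CPU number while NR_CPUS < 16k. (As Waiman notes above, the catch is that these extra bits would also have to be cleared on unlock, which is where the performance cost comes in.)

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative field layout, not the kernel's. */
#define Q_LOCKED_BIT	(1u << 0)
#define Q_PENDING_BIT	(1u << 1)
#define Q_CPU_SHIFT	2
#define Q_CPU_BITS	14
#define Q_CPU_MASK	(((1u << Q_CPU_BITS) - 1) << Q_CPU_SHIFT)

/* Encode locked/pending state plus the holder's CPU number. */
static inline uint16_t lp_encode(unsigned int holder_cpu, int pending)
{
	return (uint16_t)(Q_LOCKED_BIT |
			  (pending ? Q_PENDING_BIT : 0) |
			  ((holder_cpu << Q_CPU_SHIFT) & Q_CPU_MASK));
}

/* Recover the holder CPU from a locked_pending value. */
static inline unsigned int lp_holder_cpu(uint16_t locked_pending)
{
	return (locked_pending & Q_CPU_MASK) >> Q_CPU_SHIFT;
}
```

With 14 bits the largest representable CPU number is 16383, which is where the NR_CPUS < 16k requirement in the thread comes from.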
[PATCH] powerpc/book3s64/radix: Add kernel command line option to disable radix GTSE
This adds a kernel command line option that can be used to disable GTSE support. Disabling GTSE implies the kernel will make hcalls to invalidate TLB entries.

This was done so that we can do VM migration between configs that enable/disable GTSE support via the hypervisor. To migrate a VM from a system that supports GTSE to a system that doesn't, we can boot the guest with radix_gtse=off, thereby forcing the guest to use hcalls for TLB invalidates.

The check for hcall availability is done in pSeries_setup_arch so that the panic message appears on the console. This should only happen on a hypervisor that doesn't force the guest to hash translation even though it can't handle the radix GTSE=0 request via CAS. With radix_gtse=off, if the hypervisor doesn't support the hcall_rpt_invalidate hcall, it should force the LPAR to hash translation.

Signed-off-by: Aneesh Kumar K.V
---
 Documentation/admin-guide/kernel-parameters.txt |  3 +++
 arch/powerpc/include/asm/firmware.h             |  4 +++-
 arch/powerpc/kernel/prom_init.c                 | 13 +
 arch/powerpc/platforms/pseries/firmware.c       |  1 +
 arch/powerpc/platforms/pseries/setup.c          |  5 +
 5 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index fb95fad81c79..df20c98a8920 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -896,6 +896,9 @@
 	disable_radix	[PPC]
 			Disable RADIX MMU mode on POWER9

+	radix_gtse=off	[PPC/PSERIES]
+			Disable RADIX GTSE feature.
+
 	disable_tlbie	[PPC]
 			Disable TLBIE instruction. Currently does not work
 			with KVM, with HASH MMU, or with coherent accelerators.
diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h
index 6003c2e533a0..aa6a5ef5d483 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -52,6 +52,7 @@
 #define FW_FEATURE_PAPR_SCM	ASM_CONST(0x0020)
 #define FW_FEATURE_ULTRAVISOR	ASM_CONST(0x0040)
 #define FW_FEATURE_STUFF_TCE	ASM_CONST(0x0080)
+#define FW_FEATURE_RPT_INVALIDATE	ASM_CONST(0x0100)

 #ifndef __ASSEMBLY__

@@ -71,7 +72,8 @@
 		FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
 		FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRMEM_V2 |
 		FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE |
-		FW_FEATURE_PAPR_SCM | FW_FEATURE_ULTRAVISOR,
+		FW_FEATURE_PAPR_SCM | FW_FEATURE_ULTRAVISOR |
+		FW_FEATURE_RPT_INVALIDATE,
 	FW_FEATURE_PSERIES_ALWAYS = 0,
 	FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL | FW_FEATURE_ULTRAVISOR,
 	FW_FEATURE_POWERNV_ALWAYS = 0,
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index cbc605cfdec0..e7e91965fe6c 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -169,6 +169,7 @@
 static unsigned long __prombss prom_tce_alloc_end;

 #ifdef CONFIG_PPC_PSERIES
 static bool __prombss prom_radix_disable;
+static bool __prombss prom_radix_gtse_disable;
 static bool __prombss prom_xive_disable;
 #endif

@@ -823,6 +824,12 @@ static void __init early_cmdline_parse(void)
 	if (prom_radix_disable)
 		prom_debug("Radix disabled from cmdline\n");

+	opt = prom_strstr(prom_cmd_line, "radix_gtse=off");
+	if (opt) {
+		prom_radix_gtse_disable = true;
+		prom_debug("Radix GTSE disabled from cmdline\n");
+	}
+
 	opt = prom_strstr(prom_cmd_line, "xive=off");
 	if (opt) {
 		prom_xive_disable = true;
@@ -1285,10 +1292,8 @@ static void __init prom_parse_platform_support(u8 index, u8 val,
 		prom_parse_mmu_model(val & OV5_FEAT(OV5_MMU_SUPPORT), support);
 		break;
 	case OV5_INDX(OV5_RADIX_GTSE):	/* Radix Extensions */
-		if (val & OV5_FEAT(OV5_RADIX_GTSE)) {
-			prom_debug("Radix - GTSE supported\n");
-			support->radix_gtse = true;
-		}
+		if (val & OV5_FEAT(OV5_RADIX_GTSE))
+			support->radix_gtse = !prom_radix_gtse_disable;
 		break;
 	case OV5_INDX(OV5_XIVE_SUPPORT):	/* Interrupt mode */
 		prom_parse_xive_model(val & OV5_FEAT(OV5_XIVE_SUPPORT),
diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c
index 3e49cc23a97a..4c7b7f5a2ebc 100644
--- a/arch/powerpc/platforms/pseries/firmware.c
+++ b/arch/powerpc/platforms/pseries/firmware.c
@@ -65,6 +65,7 @@
 	{FW_FEATURE_HPT_RESIZE,		"hcall-hpt-resize"},
 	{FW_FEATURE_BLOCK_REMOVE,	"hcall-block-remove"},
 	{FW_FEATURE_PAPR_SCM
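The resulting decision logic is small: prom_init detects "radix_gtse=off" with a plain substring search (prom_strstr), and the GTSE platform-support bit is then only set when firmware reports it *and* the user did not disable it. A userspace sketch of that logic (the helper names here are illustrative, not kernel API):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for prom_strstr()-based early cmdline flag detection. */
static int cmdline_has(const char *cmdline, const char *opt)
{
	return strstr(cmdline, opt) != NULL;
}

/*
 * GTSE is usable only when firmware offers it via CAS and the user
 * has not passed radix_gtse=off.
 */
static int radix_gtse_supported(const char *cmdline, int fw_reports_gtse)
{
	return fw_reports_gtse && !cmdline_has(cmdline, "radix_gtse=off");
}
```

When the result is false, the guest falls back to hcalls for TLB invalidation, which is what makes migration between GTSE and non-GTSE hosts possible.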
Re: [PATCH v5 7/7] KVM: PPC: Book3S HV: rework secure mem slot dropping
Le 24/07/2020 à 05:03, Bharata B Rao a écrit : On Thu, Jul 23, 2020 at 01:07:24PM -0700, Ram Pai wrote: From: Laurent Dufour When a secure memslot is dropped, all the pages backed in the secure device (aka really backed by secure memory by the Ultravisor) should be paged out to a normal page. Previously, this was achieved by triggering the page fault mechanism which is calling kvmppc_svm_page_out() on each pages. This can't work when hot unplugging a memory slot because the memory slot is flagged as invalid and gfn_to_pfn() is then not trying to access the page, so the page fault mechanism is not triggered. Since the final goal is to make a call to kvmppc_svm_page_out() it seems simpler to call directly instead of triggering such a mechanism. This way kvmppc_uvmem_drop_pages() can be called even when hot unplugging a memslot. Since kvmppc_uvmem_drop_pages() is already holding kvm->arch.uvmem_lock, the call to __kvmppc_svm_page_out() is made. As __kvmppc_svm_page_out needs the vma pointer to migrate the pages, the VMA is fetched in a lazy way, to not trigger find_vma() all the time. In addition, the mmap_sem is held in read mode during that time, not in write mode since the virual memory layout is not impacted, and kvm->arch.uvmem_lock prevents concurrent operation on the secure device. Cc: Ram Pai Cc: Bharata B Rao Cc: Paul Mackerras Signed-off-by: Ram Pai [modified the changelog description] Signed-off-by: Laurent Dufour --- arch/powerpc/kvm/book3s_hv_uvmem.c | 54 ++ 1 file changed, 37 insertions(+), 17 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c index c772e92..daffa6e 100644 --- a/arch/powerpc/kvm/book3s_hv_uvmem.c +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c @@ -632,35 +632,55 @@ static inline int kvmppc_svm_page_out(struct vm_area_struct *vma, * fault on them, do fault time migration to replace the device PTEs in * QEMU page table with normal PTEs from newly allocated pages. 
*/ -void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free, +void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *slot, struct kvm *kvm, bool skip_page_out) { int i; struct kvmppc_uvmem_page_pvt *pvt; - unsigned long pfn, uvmem_pfn; - unsigned long gfn = free->base_gfn; + struct page *uvmem_page; + struct vm_area_struct *vma = NULL; + unsigned long uvmem_pfn, gfn; + unsigned long addr, end; + + mmap_read_lock(kvm->mm); + + addr = slot->userspace_addr; + end = addr + (slot->npages * PAGE_SIZE); - for (i = free->npages; i; --i, ++gfn) { - struct page *uvmem_page; + gfn = slot->base_gfn; + for (i = slot->npages; i; --i, ++gfn, addr += PAGE_SIZE) { + + /* Fetch the VMA if addr is not in the latest fetched one */ + if (!vma || (addr < vma->vm_start || addr >= vma->vm_end)) { + vma = find_vma_intersection(kvm->mm, addr, end); + if (!vma || + vma->vm_start > addr || vma->vm_end < end) { + pr_err("Can't find VMA for gfn:0x%lx\n", gfn); + break; + } There is a potential issue with the boundary condition check here which I discussed with Laurent yesterday. Guess he hasn't gotten around to look at it yet. Right, I'm working on that..
Re: [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone
On Wed, Jul 22, 2020 at 11:06 PM Atish Patra wrote:
>
> On Wed, Jul 22, 2020 at 1:23 PM Arnd Bergmann wrote:
> >
> > I just noticed that rv32 allows 2GB of lowmem rather than just the usual
> > 768MB or 1GB, at the expense of addressable user memory. This seems
> > like an unusual choice, but I also don't see any reason to change this
> > or make it more flexible unless actual users appear.
>
> I am a bit confused here. As per my understanding, RV32 supports 1GB
> of lowmem only as the page offset is set to 0xC0000000. The config
> option MAXPHYSMEM_2GB is misleading as RV32 actually allows 1GB of
> physical memory only.

Ok, in that case I was apparently misled by the Kconfig option name. I just tried building a kernel to see what the boundaries actually are, as this is not the only confusing bit. Here is what I see:

0x9dc00000 TASK_SIZE/FIXADDR_START  /* code comment says 0x9fc00000 */
0x9e000000 FIXADDR_TOP/PCI_IO_START
0x9f000000 PCI_IO_END/VMEMMAP_START
0xa0000000 VMEMMAP_END/VMALLOC_START
0xc0000000 VMALLOC_END/PAGE_OFFSET

Having exactly 1GB of linear map does make a lot of sense. Having PCI I/O, vmemmap and fixmap come out of the user range means you get slightly different behavior in user space if there are any changes to that set, but that is probably fine as well, if you want the flexibility to go to a 2GB linear map and expect user space to deal with that as well.

There is one common trick from arm32 however that you might want to consider: if vmalloc was moved above the linear map rather than below, the size of the vmalloc area can dynamically depend on the amount of RAM that is actually present rather than be set to a fixed value. On arm32, there is around 240MB of vmalloc space if the linear map is fully populated with RAM, but it can grow to use all of the available address space if less RAM was detected at boot time (up to 3GB depending on CONFIG_VMSPLIT).

> Any memory blocks beyond DRAM + 1GB are removed in setup_bootmem. IMHO, the
> current config should clarify that.
> > Moreover, we should add 2G split under a separate configuration if we > want to support that. Right. It's probably not needed immediately, but can't hurt either. Arnd
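The region sizes in the layout above follow directly from the boundaries. A small sketch that checks the arithmetic; note the hex values are taken from the discussion with their trailing zeros restored, which is an assumption since the archived message shows them truncated:

```c
#include <assert.h>

/* Assumed rv32 virtual layout from the thread above (illustrative). */
#define VMALLOC_START	0xa0000000ull	/* also VMEMMAP_END */
#define VMALLOC_END	0xc0000000ull	/* also PAGE_OFFSET */
#define ADDR_TOP	0x100000000ull	/* top of the 32-bit address space */

static unsigned long long vmalloc_size(void)
{
	return VMALLOC_END - VMALLOC_START;
}

static unsigned long long linear_map_size(void)
{
	/* the linear map runs from PAGE_OFFSET to the top of the space */
	return ADDR_TOP - VMALLOC_END;
}
```

This is what the reply means by "exactly 1GB of linear map": PAGE_OFFSET at 0xc0000000 leaves precisely 1GB above it, with a fixed 512MB vmalloc area below, the fixed size being the thing the arm32-style trick would relax.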
Re: [PATCH v3 05/10] powerpc/smp: Dont assume l2-cache to be superset of sibling
On Thu, Jul 23, 2020 at 02:21:11PM +0530, Srikar Dronamraju wrote: > Current code assumes that cpumask of cpus sharing a l2-cache mask will > always be a superset of cpu_sibling_mask. > > Lets stop that assumption. cpu_l2_cache_mask is a superset of > cpu_sibling_mask if and only if shared_caches is set. > > Cc: linuxppc-dev > Cc: LKML > Cc: Michael Ellerman > Cc: Nicholas Piggin > Cc: Anton Blanchard > Cc: Oliver O'Halloran > Cc: Nathan Lynch > Cc: Michael Neuling > Cc: Gautham R Shenoy > Cc: Ingo Molnar > Cc: Peter Zijlstra > Cc: Valentin Schneider > Cc: Jordan Niethe > Signed-off-by: Srikar Dronamraju Reviewed-by: Gautham R. Shenoy > --- > Changelog v1 -> v2: > Set cpumask after verifying l2-cache. (Gautham) > > arch/powerpc/kernel/smp.c | 28 +++- > 1 file changed, 15 insertions(+), 13 deletions(-) > > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c > index da27f6909be1..d997c7411664 100644 > --- a/arch/powerpc/kernel/smp.c > +++ b/arch/powerpc/kernel/smp.c > @@ -1194,6 +1194,7 @@ static bool update_mask_by_l2(int cpu, struct cpumask > *(*mask_fn)(int)) > if (!l2_cache) > return false; > > + cpumask_set_cpu(cpu, mask_fn(cpu)); > for_each_cpu(i, cpu_online_mask) { > /* >* when updating the marks the current CPU has not been marked > @@ -1276,29 +1277,30 @@ static void add_cpu_to_masks(int cpu) >* add it to it's own thread sibling mask. >*/ > cpumask_set_cpu(cpu, cpu_sibling_mask(cpu)); > + cpumask_set_cpu(cpu, cpu_core_mask(cpu)); > > for (i = first_thread; i < first_thread + threads_per_core; i++) > if (cpu_online(i)) > set_cpus_related(i, cpu, cpu_sibling_mask); > > add_cpu_to_smallcore_masks(cpu); > - /* > - * Copy the thread sibling mask into the cache sibling mask > - * and mark any CPUs that share an L2 with this CPU. 
> - */ > - for_each_cpu(i, cpu_sibling_mask(cpu)) > - set_cpus_related(cpu, i, cpu_l2_cache_mask); > update_mask_by_l2(cpu, cpu_l2_cache_mask); > > - /* > - * Copy the cache sibling mask into core sibling mask and mark > - * any CPUs on the same chip as this CPU. > - */ > - for_each_cpu(i, cpu_l2_cache_mask(cpu)) > - set_cpus_related(cpu, i, cpu_core_mask); > + if (pkg_id == -1) { > + struct cpumask *(*mask)(int) = cpu_sibling_mask; > + > + /* > + * Copy the sibling mask into core sibling mask and > + * mark any CPUs on the same chip as this CPU. > + */ > + if (shared_caches) > + mask = cpu_l2_cache_mask; > + > + for_each_cpu(i, mask(cpu)) > + set_cpus_related(cpu, i, cpu_core_mask); > > - if (pkg_id == -1) > return; > + } > > for_each_cpu(i, cpu_online_mask) > if (get_physical_package_id(i) == pkg_id) > -- > 2.18.2 >
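The invariant this patch stops assuming can be modelled in a few lines. In this toy sketch CPU masks are plain 64-bit bitmaps and the names are illustrative, not the kernel's cpumask API: the l2-cache mask is a superset of the sibling mask only when shared_caches is set, so the core mask has to be seeded from the right one.

```c
#include <assert.h>
#include <stdint.h>

/* True if every bit of @sub is also set in @super. */
static int is_superset(uint64_t super, uint64_t sub)
{
	return (sub & ~super) == 0;
}

/*
 * Pick the mask to build the core mask from: with shared caches the
 * l2 mask covers the siblings too; without it, only the sibling mask
 * is guaranteed to be correct.
 */
static uint64_t core_mask_seed(uint64_t sibling_mask, uint64_t l2_mask,
			       int shared_caches)
{
	return shared_caches ? l2_mask : sibling_mask;
}
```

Seeding from the l2 mask when it is *not* a superset of the sibling mask is exactly the "broken topology" case Srikar describes in the v2 discussion below.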
Re: [PATCH v2 05/10] powerpc/smp: Dont assume l2-cache to be superset of sibling
On Wed, Jul 22, 2020 at 12:27:47PM +0530, Srikar Dronamraju wrote:
> * Gautham R Shenoy [2020-07-22 11:51:14]:
> 
> > Hi Srikar,
> > 
> > > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> > > index 72f16dc0cb26..57468877499a 100644
> > > --- a/arch/powerpc/kernel/smp.c
> > > +++ b/arch/powerpc/kernel/smp.c
> > > @@ -1196,6 +1196,7 @@ static bool update_mask_by_l2(int cpu, struct cpumask *(*mask_fn)(int))
> > > 	if (!l2_cache)
> > > 		return false;
> > > 
> > > +	cpumask_set_cpu(cpu, mask_fn(cpu));
> > 
> > Ok, we need to do this because "cpu" is not yet set in the
> > cpu_online_mask. Prior to your patch the "cpu" was getting set in
> > cpu_l2_cache_map(cpu) as a side-effect of the code that is removed in
> > the patch.
> 
> Right.
> 
> > > 	for_each_cpu(i, cpu_online_mask) {
> > > 		/*
> > > 		 * when updating the marks the current CPU has not been marked
> > > @@ -1278,29 +1279,30 @@ static void add_cpu_to_masks(int cpu)
> > > 	 * add it to it's own thread sibling mask.
> > > 	 */
> > > 	cpumask_set_cpu(cpu, cpu_sibling_mask(cpu));
> > > +	cpumask_set_cpu(cpu, cpu_core_mask(cpu));
> 
> Note: Above, we are explicitly setting the cpu_core_mask.

You are right. I missed this.

> > > 	for (i = first_thread; i < first_thread + threads_per_core; i++)
> > > 		if (cpu_online(i))
> > > 			set_cpus_related(i, cpu, cpu_sibling_mask);
> > > 
> > > 	add_cpu_to_smallcore_masks(cpu);
> > > -	/*
> > > -	 * Copy the thread sibling mask into the cache sibling mask
> > > -	 * and mark any CPUs that share an L2 with this CPU.
> > > -	 */
> > > -	for_each_cpu(i, cpu_sibling_mask(cpu))
> > > -		set_cpus_related(cpu, i, cpu_l2_cache_mask);
> > > 	update_mask_by_l2(cpu, cpu_l2_cache_mask);
> > > 
> > > -	/*
> > > -	 * Copy the cache sibling mask into core sibling mask and mark
> > > -	 * any CPUs on the same chip as this CPU.
> > > -	 */
> > > -	for_each_cpu(i, cpu_l2_cache_mask(cpu))
> > > -		set_cpus_related(cpu, i, cpu_core_mask);
> > > +	if (pkg_id == -1) {
> > 
> > I suppose this "if" condition is an optimization, since if pkg_id != -1,
> > we anyway set these CPUs in the cpu_core_mask below.
> > 
> > However...
> 
> This is not just an optimization.
> The hunk removed would only work if cpu_l2_cache_mask is bigger than
> cpu_sibling_mask. (this was the previous assumption that we want to break)
> If the cpu_sibling_mask is bigger than cpu_l2_cache_mask and pkg_id is -1,
> then setting only cpu_l2_cache_mask in cpu_core_mask will result in a broken
> topology.
> 
> > > +		struct cpumask *(*mask)(int) = cpu_sibling_mask;
> > > +
> > > +		/*
> > > +		 * Copy the sibling mask into core sibling mask and
> > > +		 * mark any CPUs on the same chip as this CPU.
> > > +		 */
> > > +		if (shared_caches)
> > > +			mask = cpu_l2_cache_mask;
> > > +
> > > +		for_each_cpu(i, mask(cpu))
> > > +			set_cpus_related(cpu, i, cpu_core_mask);
> > > 
> > > -	if (pkg_id == -1)
> > > 		return;
> > > +	}
> > 
> > ... since "cpu" is not yet set in the cpu_online_mask, do we not miss
> > setting "cpu" in the cpu_core_mask(cpu) in the for-loop below ?
> 
> As noted above, we are setting it before, so we don't miss the cpu and
> hence it is no different from before.

Fair enough.

> > --
> > Thanks and Regards
> > gautham.
> 
> -- 
> Thanks and Regards
> Srikar Dronamraju
Re: [PATCH v3 04/10] powerpc/smp: Move topology fixups into a new function
On Thu, Jul 23, 2020 at 02:21:10PM +0530, Srikar Dronamraju wrote:
> Move topology fixup based on the platform attributes into its own
> function which is called just before set_sched_topology.
> 
> Cc: linuxppc-dev
> Cc: LKML
> Cc: Michael Ellerman
> Cc: Nicholas Piggin
> Cc: Anton Blanchard
> Cc: Oliver O'Halloran
> Cc: Nathan Lynch
> Cc: Michael Neuling
> Cc: Gautham R Shenoy
> Cc: Ingo Molnar
> Cc: Peter Zijlstra
> Cc: Valentin Schneider
> Cc: Jordan Niethe
> Signed-off-by: Srikar Dronamraju

Reviewed-by: Gautham R. Shenoy

> ---
> Changelog v2 -> v3:
> 	Rewrote changelog (Gautham)
> 	Renamed to powerpc/smp: Move topology fixups into a new function
> 
>  arch/powerpc/kernel/smp.c | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index a685915e5941..da27f6909be1 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1368,6 +1368,16 @@ int setup_profiling_timer(unsigned int multiplier)
> 	return 0;
> }
> 
> +static void fixup_topology(void)
> +{
> +#ifdef CONFIG_SCHED_SMT
> +	if (has_big_cores) {
> +		pr_info("Big cores detected but using small core scheduling\n");
> +		powerpc_topology[0].mask = smallcore_smt_mask;
> +	}
> +#endif
> +}
> +
> void __init smp_cpus_done(unsigned int max_cpus)
> {
> 	/*
> @@ -1381,12 +1391,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
> 
> 	dump_numa_cpu_topology();
> 
> -#ifdef CONFIG_SCHED_SMT
> -	if (has_big_cores) {
> -		pr_info("Big cores detected but using small core scheduling\n");
> -		powerpc_topology[0].mask = smallcore_smt_mask;
> -	}
> -#endif
> +	fixup_topology();
> 	set_sched_topology(powerpc_topology);
> }
> 
> -- 
> 2.18.2
> 
Re: [PATCH v3 02/10] powerpc/smp: Merge Power9 topology with Power topology
On Thu, Jul 23, 2020 at 02:21:08PM +0530, Srikar Dronamraju wrote:
> A new sched_domain_topology_level was added just for Power9. However the
> same can be achieved by merging powerpc_topology with power9_topology,
> which makes the code simpler, especially when adding a new sched domain.
> 
> Cc: linuxppc-dev
> Cc: LKML
> Cc: Michael Ellerman
> Cc: Nicholas Piggin
> Cc: Anton Blanchard
> Cc: Oliver O'Halloran
> Cc: Nathan Lynch
> Cc: Michael Neuling
> Cc: Gautham R Shenoy
> Cc: Ingo Molnar
> Cc: Peter Zijlstra
> Cc: Valentin Schneider
> Cc: Jordan Niethe
> Signed-off-by: Srikar Dronamraju

LGTM.

Reviewed-by: Gautham R. Shenoy

> ---
> Changelog v1 -> v2:
> 	Replaced a reference to cpu_smt_mask with per_cpu(cpu_sibling_map, cpu)
> 	since cpu_smt_mask is only defined under CONFIG_SCHED_SMT
> 
>  arch/powerpc/kernel/smp.c | 33 ++++++++++-----------------------
>  1 file changed, 10 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index edf94ca64eea..283a04e54f52 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1313,7 +1313,7 @@ int setup_profiling_timer(unsigned int multiplier)
> }
> 
> #ifdef CONFIG_SCHED_SMT
> -/* cpumask of CPUs with asymetric SMT dependancy */
> +/* cpumask of CPUs with asymmetric SMT dependency */
> static int powerpc_smt_flags(void)
> {
> 	int flags = SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES;
> @@ -1326,14 +1326,6 @@ static int powerpc_smt_flags(void)
> }
> #endif
> 
> -static struct sched_domain_topology_level powerpc_topology[] = {
> -#ifdef CONFIG_SCHED_SMT
> -	{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
> -#endif
> -	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
> -	{ NULL, },
> -};
> -
> /*
>  * P9 has a slightly odd architecture where pairs of cores share an L2 cache.
>  * This topology makes it *much* cheaper to migrate tasks between adjacent cores
> @@ -1351,7 +1343,13 @@ static int powerpc_shared_cache_flags(void)
>  */
> static const struct cpumask *shared_cache_mask(int cpu)
> {
> -	return cpu_l2_cache_mask(cpu);
> +	if (shared_caches)
> +		return cpu_l2_cache_mask(cpu);
> +
> +	if (has_big_cores)
> +		return cpu_smallcore_mask(cpu);
> +
> +	return per_cpu(cpu_sibling_map, cpu);
> }
> 
> #ifdef CONFIG_SCHED_SMT
> @@ -1361,7 +1359,7 @@ static const struct cpumask *smallcore_smt_mask(int cpu)
> }
> #endif
> 
> -static struct sched_domain_topology_level power9_topology[] = {
> +static struct sched_domain_topology_level powerpc_topology[] = {
> #ifdef CONFIG_SCHED_SMT
> 	{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
> #endif
> @@ -1386,21 +1384,10 @@ void __init smp_cpus_done(unsigned int max_cpus)
> #ifdef CONFIG_SCHED_SMT
> 	if (has_big_cores) {
> 		pr_info("Big cores detected but using small core scheduling\n");
> -		power9_topology[0].mask = smallcore_smt_mask;
> +		powerpc_topology[0].mask = smallcore_smt_mask;
> 	}
> #endif
> -	/*
> -	 * If any CPU detects that it's sharing a cache with another CPU then
> -	 * use the deeper topology that is aware of this sharing.
> -	 */
> -	if (shared_caches) {
> -		pr_info("Using shared cache scheduler topology\n");
> -		set_sched_topology(power9_topology);
> -	} else {
> -		pr_info("Using standard scheduler topology\n");
> -		set_sched_topology(powerpc_topology);
> -	}
> +	set_sched_topology(powerpc_topology);
> }
> 
> #ifdef CONFIG_HOTPLUG_CPU
> -- 
> 2.18.2
> 