[PATCH v5 2/2] powerpc/hv-24x7: Add sysfs files inside hv-24x7 device to show cpumask

2020-07-08 Thread Kajol Jain
This patch adds a cpumask attr to the hv_24x7 pmu along with ABI documentation.

The primary use of exposing the cpumask is for the perf tool, which has the
capability to parse the driver sysfs folder and understand the
cpumask file. Having a cpumask file reduces the number of perf command
line parameters (it avoids the "-C" option on the perf tool
command line). It also tells the user which cpu is
currently used to retrieve the counter data.

command:# cat /sys/devices/hv_24x7/interface/cpumask
0

Signed-off-by: Kajol Jain 
---
 .../ABI/testing/sysfs-bus-event_source-devices-hv_24x7| 7 +++
 arch/powerpc/perf/hv-24x7.c   | 8 
 2 files changed, 15 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7 
b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
index e8698afcd952..f7e32f218f73 100644
--- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
@@ -43,6 +43,13 @@ Description: read only
This sysfs interface exposes the number of cores per chip
present in the system.
 
+What:  /sys/devices/hv_24x7/interface/cpumask
+Date:  July 2020
+Contact:   Linux on PowerPC Developer List 
+Description:   read only
+   This sysfs file exposes the cpumask which is designated to make
+   HCALLs to retrieve hv-24x7 pmu event counter data.
+
 What:  /sys/bus/event_source/devices/hv_24x7/event_descs/
 Date:  February 2014
 Contact:   Linux on PowerPC Developer List 
diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 93b4700dcf8c..acc34148ad09 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -448,6 +448,12 @@ static ssize_t device_show_string(struct device *dev,
return sprintf(buf, "%s\n", (char *)d->var);
 }
 
+static ssize_t cpumask_show(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   return cpumap_print_to_pagebuf(true, buf, &hv_24x7_cpumask);
+}
+
 static ssize_t sockets_show(struct device *dev,
struct device_attribute *attr, char *buf)
 {
@@ -1115,6 +1121,7 @@ static DEVICE_ATTR_RO(domains);
 static DEVICE_ATTR_RO(sockets);
 static DEVICE_ATTR_RO(chipspersocket);
 static DEVICE_ATTR_RO(coresperchip);
+static DEVICE_ATTR_RO(cpumask);
 
 static struct bin_attribute *if_bin_attrs[] = {
&bin_attr_catalog,
@@ -1128,6 +1135,7 @@ static struct attribute *if_attrs[] = {
&dev_attr_sockets.attr,
&dev_attr_chipspersocket.attr,
&dev_attr_coresperchip.attr,
+   &dev_attr_cpumask.attr,
NULL,
 };
 
-- 
2.26.2



[PATCH v5 0/2] Add cpu hotplug support for powerpc/perf/hv-24x7

2020-07-08 Thread Kajol Jain
This patchset adds cpu hotplug support to the hv_24x7 driver by adding
online/offline cpu hotplug functions. It also adds a sysfs file
"cpumask" to expose the current online cpu that is used for the
hv_24x7 event count.

Changelog:
v4 -> v5
- Since we are making the PMU registration fail in case hotplug init
  failed, directly add the cpumask attr inside if_attrs rather than
  creating a new attribute_group, as suggested by Madhavan Srinivasan.

v3 -> v4
- Make PMU initialization fail in case hotplug init failed, rather than
  just printing an error msg.
- Did some nit changes like removing an extra comment and initialising
  the target value, as suggested by Michael Ellerman.
- Retained the Reviewed-by tag because the changes were fixes to some nits.

- In case we sequentially offline multiple cpus, taking cpumask_first()
  may add some latency in that scenario.

  So, I ran a benchmark on a power9 lpar with 16 cpus,
  off-lining cpus 0-14.

With cpumask_last, this is what I got:

real0m2.812s
user0m0.002s
sys 0m0.003s

With cpumask_any:
real0m3.690s
user0m0.002s
sys 0m0.062s

That's why I went with cpumask_last.

v2 -> v3
- Corrected some of the typo mistakes and update commit message
  as suggested by Gautham R Shenoy.
- Added Reviewed-by tag for the first patch in the patchset.

v1 -> v2
- Changed the function used to pick an active cpu in case of offline
  from "cpumask_any_but" to "cpumask_last", as the
  cpumask_any_but function picks the very next online cpu, and in case
  we are sequentially off-lining multiple cpus, "perf_pmu_migrate_context"
  can add extra latency.
  - Suggested-by: Gautham R Shenoy.

- Changed the documentation for cpumask and, rather than hardcoding the
  initialization for cpumask_attr_group, added a loop to get the very first
  NULL, as suggested by Gautham R Shenoy.

Kajol Jain (2):
  powerpc/perf/hv-24x7: Add cpu hotplug support
  powerpc/hv-24x7: Add sysfs files inside hv-24x7 device to show cpumask

 .../sysfs-bus-event_source-devices-hv_24x7|  7 +++
 arch/powerpc/perf/hv-24x7.c   | 54 +++
 include/linux/cpuhotplug.h|  1 +
 3 files changed, 62 insertions(+)

-- 
2.26.2



[PATCH v5 1/2] powerpc/perf/hv-24x7: Add cpu hotplug support

2020-07-08 Thread Kajol Jain
This patch adds cpu hotplug functions to the hv_24x7 pmu.
A new cpuhp_state "CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE" enum
is added.

The online callback function updates the cpumask only if it is
empty, as the primary intention of adding hotplug support
is to designate a CPU to make HCALLs to collect the
counter data.

The offline function tests and clears the corresponding cpu in the
cpumask and updates the cpumask to any other active cpu.

Signed-off-by: Kajol Jain 
Reviewed-by: Gautham R. Shenoy 
---
 arch/powerpc/perf/hv-24x7.c | 46 +
 include/linux/cpuhotplug.h  |  1 +
 2 files changed, 47 insertions(+)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index db213eb7cb02..93b4700dcf8c 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -31,6 +31,8 @@ static int interface_version;
 /* Whether we have to aggregate result data for some domains. */
 static bool aggregate_result_elements;
 
+static cpumask_t hv_24x7_cpumask;
+
 static bool domain_is_valid(unsigned domain)
 {
switch (domain) {
@@ -1641,6 +1643,45 @@ static struct pmu h_24x7_pmu = {
.capabilities = PERF_PMU_CAP_NO_EXCLUDE,
 };
 
+static int ppc_hv_24x7_cpu_online(unsigned int cpu)
+{
+   if (cpumask_empty(&hv_24x7_cpumask))
+   cpumask_set_cpu(cpu, &hv_24x7_cpumask);
+
+   return 0;
+}
+
+static int ppc_hv_24x7_cpu_offline(unsigned int cpu)
+{
+   int target;
+
+   /* Check if exiting cpu is used for collecting 24x7 events */
+   if (!cpumask_test_and_clear_cpu(cpu, &hv_24x7_cpumask))
+   return 0;
+
+   /* Find a new cpu to collect 24x7 events */
+   target = cpumask_last(cpu_active_mask);
+
+   if (target < 0 || target >= nr_cpu_ids) {
+   pr_err("hv_24x7: CPU hotplug init failed\n");
+   return -1;
+   }
+
+   /* Migrate 24x7 events to the new target */
+   cpumask_set_cpu(target, &hv_24x7_cpumask);
+   perf_pmu_migrate_context(&h_24x7_pmu, cpu, target);
+
+   return 0;
+}
+
+static int hv_24x7_cpu_hotplug_init(void)
+{
+   return cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE,
+ "perf/powerpc/hv_24x7:online",
+ ppc_hv_24x7_cpu_online,
+ ppc_hv_24x7_cpu_offline);
+}
+
 static int hv_24x7_init(void)
 {
int r;
@@ -1685,6 +1726,11 @@ static int hv_24x7_init(void)
if (r)
return r;
 
+   /* init cpuhotplug */
+   r = hv_24x7_cpu_hotplug_init();
+   if (r)
+   return r;
+
r = perf_pmu_register(&h_24x7_pmu, h_24x7_pmu.name, -1);
if (r)
return r;
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 191772d4a4d7..a2710e654b64 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -181,6 +181,7 @@ enum cpuhp_state {
CPUHP_AP_PERF_POWERPC_CORE_IMC_ONLINE,
CPUHP_AP_PERF_POWERPC_THREAD_IMC_ONLINE,
CPUHP_AP_PERF_POWERPC_TRACE_IMC_ONLINE,
+   CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE,
CPUHP_AP_WATCHDOG_ONLINE,
CPUHP_AP_WORKQUEUE_ONLINE,
CPUHP_AP_RCUTREE_ONLINE,
-- 
2.26.2



Re: [RFC PATCH v0 2/2] KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM

2020-07-08 Thread Paul Mackerras
On Fri, Jul 03, 2020 at 04:14:20PM +0530, Bharata B Rao wrote:
> In the nested KVM case, replace H_TLB_INVALIDATE by the new hcall
> H_RPT_INVALIDATE if available. The availability of this hcall
> is determined from "hcall-rpt-invalidate" string in ibm,hypertas-functions
> DT property.

What are we going to use when nested KVM supports HPT guests at L2?
L1 will need to do partition-scoped tlbies with R=0 via a hypercall,
but H_RPT_INVALIDATE says in its name that it only handles radix
page tables (i.e. R=1).

Paul.


Re: [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone

2020-07-08 Thread Palmer Dabbelt

On Sun, 07 Jun 2020 00:59:46 PDT (-0700), a...@ghiti.fr wrote:

This is a preparatory patch for relocatable kernel.

The kernel used to be linked at PAGE_OFFSET address and used to be loaded
physically at the beginning of the main memory. Therefore, we could use
the linear mapping for the kernel mapping.

But the relocated kernel base address will be different from PAGE_OFFSET
and since in the linear mapping, two different virtual addresses cannot
point to the same physical address, the kernel mapping needs to lie outside
the linear mapping.


I know it's been a while, but I keep opening this up to review it and just
can't get over how ugly it is to put the kernel's linear map in the vmalloc
region.

I guess I don't understand why this is necessary at all.  Specifically: why
can't we just relocate the kernel within the linear map?  That would let the
bootloader put the kernel wherever it wants, modulo the physical memory size we
support.  We'd need to handle the regions that are coupled to the kernel's
execution address, but we could just put them in an explicit memory region
which is what we should probably be doing anyway.


In addition, because modules and BPF must be close to the kernel (inside
+-2GB window), the kernel is placed at the end of the vmalloc zone minus
2GB, which leaves room for modules and BPF. The kernel could not be
placed at the beginning of the vmalloc zone since other vmalloc
allocations from the kernel could get all the +-2GB window around the
kernel which would prevent new modules and BPF programs from being loaded.


Well, that's not enough to make sure this doesn't happen -- it's just enough to
make sure it doesn't happen very quickly.  That's the same boat we're already
in, though, so it's not like it's worse.


Signed-off-by: Alexandre Ghiti 
Reviewed-by: Zong Li 
---
 arch/riscv/boot/loader.lds.S |  3 +-
 arch/riscv/include/asm/page.h| 10 +-
 arch/riscv/include/asm/pgtable.h | 38 ++---
 arch/riscv/kernel/head.S |  3 +-
 arch/riscv/kernel/module.c   |  4 +--
 arch/riscv/kernel/vmlinux.lds.S  |  3 +-
 arch/riscv/mm/init.c | 58 +---
 arch/riscv/mm/physaddr.c |  2 +-
 8 files changed, 88 insertions(+), 33 deletions(-)

diff --git a/arch/riscv/boot/loader.lds.S b/arch/riscv/boot/loader.lds.S
index 47a5003c2e28..62d94696a19c 100644
--- a/arch/riscv/boot/loader.lds.S
+++ b/arch/riscv/boot/loader.lds.S
@@ -1,13 +1,14 @@
 /* SPDX-License-Identifier: GPL-2.0 */

 #include 
+#include 

 OUTPUT_ARCH(riscv)
 ENTRY(_start)

 SECTIONS
 {
-   . = PAGE_OFFSET;
+   . = KERNEL_LINK_ADDR;

.payload : {
*(.payload)
diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index 2d50f76efe48..48bb09b6a9b7 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -90,18 +90,26 @@ typedef struct page *pgtable_t;

 #ifdef CONFIG_MMU
 extern unsigned long va_pa_offset;
+extern unsigned long va_kernel_pa_offset;
 extern unsigned long pfn_base;
 #define ARCH_PFN_OFFSET(pfn_base)
 #else
 #define va_pa_offset   0
+#define va_kernel_pa_offset0
 #define ARCH_PFN_OFFSET(PAGE_OFFSET >> PAGE_SHIFT)
 #endif /* CONFIG_MMU */

 extern unsigned long max_low_pfn;
 extern unsigned long min_low_pfn;
+extern unsigned long kernel_virt_addr;

 #define __pa_to_va_nodebug(x)  ((void *)((unsigned long) (x) + va_pa_offset))
-#define __va_to_pa_nodebug(x)  ((unsigned long)(x) - va_pa_offset)
+#define linear_mapping_va_to_pa(x) ((unsigned long)(x) - va_pa_offset)
+#define kernel_mapping_va_to_pa(x) \
+   ((unsigned long)(x) - va_kernel_pa_offset)
+#define __va_to_pa_nodebug(x)  \
+   (((x) >= PAGE_OFFSET) ?  \
+   linear_mapping_va_to_pa(x) : kernel_mapping_va_to_pa(x))

 #ifdef CONFIG_DEBUG_VIRTUAL
 extern phys_addr_t __virt_to_phys(unsigned long x);
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 35b60035b6b0..94ef3b49dfb6 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -11,23 +11,29 @@

 #include 

-#ifndef __ASSEMBLY__
-
-/* Page Upper Directory not used in RISC-V */
-#include 
-#include 
-#include 
-#include 
-
-#ifdef CONFIG_MMU
+#ifndef CONFIG_MMU
+#define KERNEL_VIRT_ADDR   PAGE_OFFSET
+#define KERNEL_LINK_ADDR   PAGE_OFFSET
+#else
+/*
+ * Leave 2GB for modules and BPF that must lie within a 2GB range around
+ * the kernel.
+ */
+#define KERNEL_VIRT_ADDR   (VMALLOC_END - SZ_2G + 1)
+#define KERNEL_LINK_ADDR   KERNEL_VIRT_ADDR


At a bare minimum this is going to make a mess of the 32-bit port, as
non-relocatable kernels are now going to get linked at 1GiB which is where user
code is supposed to live.  That's an easy fix, though, as the 32-bit stuff
doesn't need any module address restrictions.


 #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
 #define VMALLOC_END  

Re: [PATCH v2 2/3] powerpc/64s: remove PROT_SAO support

2020-07-08 Thread Paul Mackerras
On Fri, Jul 03, 2020 at 11:19:57AM +1000, Nicholas Piggin wrote:
> ISA v3.1 does not support the SAO storage control attribute required to
> implement PROT_SAO. PROT_SAO was used by specialised system software
> (Lx86) that has been discontinued for about 7 years, and is not thought
> to be used elsewhere, so removal should not cause problems.
> 
> We'd rather remove it than keep support for older processors, because
> live migrating guest partitions to newer processors may not be possible
> if SAO is in use (or worse, allowed with silent races).

This is actually a real problem for KVM, because now we have the
capabilities of the host affecting the characteristics of the guest
virtual machine in a manner which userspace (e.g. QEMU) is unable to
control.

It would probably be better to disallow SAO on all machines than have
it available on some hosts and not others.  (Yes I know there is a
check on CPU_FTR_ARCH_206 in there, but that has been a no-op since we
removed the PPC970 KVM support.)

Solving this properly will probably require creating a new KVM host
capability and associated machine parameter in QEMU, along with a new
machine type.

[snip]

> diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
> b/arch/powerpc/include/asm/kvm_book3s_64.h
> index 9bb9bb370b53..fac39ff659d4 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_64.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_64.h
> @@ -398,9 +398,10 @@ static inline bool hpte_cache_flags_ok(unsigned long 
> hptel, bool is_ci)
>  {
>   unsigned int wimg = hptel & HPTE_R_WIMG;
>  
> - /* Handle SAO */
> + /* Handle SAO for POWER7,8,9 */
>   if (wimg == (HPTE_R_W | HPTE_R_I | HPTE_R_M) &&
> - cpu_has_feature(CPU_FTR_ARCH_206))
> + cpu_has_feature(CPU_FTR_ARCH_206) &&
> + !cpu_has_feature(CPU_FTR_ARCH_31))
>   wimg = HPTE_R_M;

Paul.


[PATCH v2 2/5] powerpc/lib: Initialize a temporary mm for code patching

2020-07-08 Thread Christopher M. Riedl
When code patching a STRICT_KERNEL_RWX kernel the page containing the
address to be patched is temporarily mapped with permissive memory
protections. Currently, a per-cpu vmalloc patch area is used for this
purpose. While the patch area is per-cpu, the temporary page mapping is
inserted into the kernel page tables for the duration of the patching.
The mapping is exposed to CPUs other than the patching CPU - this is
undesirable from a hardening perspective.

Use the `poking_init` init hook to prepare a temporary mm and patching
address. Initialize the temporary mm by copying the init mm. Choose a
randomized patching address inside the temporary mm userspace address
portion. The next patch uses the temporary mm and patching address for
code patching.

Based on x86 implementation:

commit 4fc19708b165
("x86/alternatives: Initialize temporary mm for patching")

Signed-off-by: Christopher M. Riedl 
---
 arch/powerpc/lib/code-patching.c | 33 
 1 file changed, 33 insertions(+)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 0a051dfeb177..8ae1a9e5fe6e 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -11,6 +11,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -44,6 +46,37 @@ int raw_patch_instruction(struct ppc_inst *addr, struct 
ppc_inst instr)
 }
 
 #ifdef CONFIG_STRICT_KERNEL_RWX
+
+static struct mm_struct *patching_mm __ro_after_init;
+static unsigned long patching_addr __ro_after_init;
+
+void __init poking_init(void)
+{
+   spinlock_t *ptl; /* for protecting pte table */
+   pte_t *ptep;
+
+   /*
+* Some parts of the kernel (static keys for example) depend on
+* successful code patching. Code patching under STRICT_KERNEL_RWX
+* requires this setup - otherwise we cannot patch at all. We use
+* BUG_ON() here and later since an early failure is preferred to
+* buggy behavior and/or strange crashes later.
+*/
+   patching_mm = copy_init_mm();
+   BUG_ON(!patching_mm);
+
+   /*
+* In hash we cannot go above DEFAULT_MAP_WINDOW easily.
+* XXX: Do we want additional bits of entropy for radix?
+*/
+   patching_addr = (get_random_long() & PAGE_MASK) %
+   (DEFAULT_MAP_WINDOW - PAGE_SIZE);
+
+   ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
+   BUG_ON(!ptep);
+   pte_unmap_unlock(ptep, ptl);
+}
+
 static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
 
 static int text_area_cpu_up(unsigned int cpu)
-- 
2.27.0



[PATCH v2 3/5] powerpc/lib: Use a temporary mm for code patching

2020-07-08 Thread Christopher M. Riedl
Currently, code patching a STRICT_KERNEL_RWX kernel exposes the temporary
mappings to other CPUs. These mappings should be kept local to the CPU
doing the patching. Use the pre-initialized temporary mm and patching
address for this purpose. Also add a check after patching to ensure the
patch succeeded.

Use the KUAP functions on non-BOOK3S_64 platforms since the temporary
mapping for patching uses a userspace address (to keep the mapping
local). On BOOK3S_64 platforms hash does not implement KUAP and on radix
the use of PAGE_KERNEL sets EAA[0] for the PTE which means the AMR
(KUAP) protection is ignored (see PowerISA v3.0b, Fig. 35).

Based on x86 implementation:

commit b3fd8e83ada0
("x86/alternatives: Use temporary mm for text poking")

Signed-off-by: Christopher M. Riedl 
---
 arch/powerpc/lib/code-patching.c | 152 +++
 1 file changed, 54 insertions(+), 98 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 8ae1a9e5fe6e..80fe3864f377 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int __patch_instruction(struct ppc_inst *exec_addr, struct ppc_inst 
instr,
   struct ppc_inst *patch_addr)
@@ -77,106 +78,57 @@ void __init poking_init(void)
pte_unmap_unlock(ptep, ptl);
 }
 
-static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
-
-static int text_area_cpu_up(unsigned int cpu)
-{
-   struct vm_struct *area;
-
-   area = get_vm_area(PAGE_SIZE, VM_ALLOC);
-   if (!area) {
-   WARN_ONCE(1, "Failed to create text area for cpu %d\n",
-   cpu);
-   return -1;
-   }
-   this_cpu_write(text_poke_area, area);
-
-   return 0;
-}
-
-static int text_area_cpu_down(unsigned int cpu)
-{
-   free_vm_area(this_cpu_read(text_poke_area));
-   return 0;
-}
-
-/*
- * Run as a late init call. This allows all the boot time patching to be done
- * simply by patching the code, and then we're called here prior to
- * mark_rodata_ro(), which happens after all init calls are run. Although
- * BUG_ON() is rude, in this case it should only happen if ENOMEM, and we judge
- * it as being preferable to a kernel that will crash later when someone tries
- * to use patch_instruction().
- */
-static int __init setup_text_poke_area(void)
-{
-   BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
-   "powerpc/text_poke:online", text_area_cpu_up,
-   text_area_cpu_down));
-
-   return 0;
-}
-late_initcall(setup_text_poke_area);
+struct patch_mapping {
+   spinlock_t *ptl; /* for protecting pte table */
+   pte_t *ptep;
+   struct temp_mm temp_mm;
+};
 
 /*
  * This can be called for kernel text or a module.
  */
-static int map_patch_area(void *addr, unsigned long text_poke_addr)
+static int map_patch(const void *addr, struct patch_mapping *patch_mapping)
 {
-   unsigned long pfn;
-   int err;
+   struct page *page;
+   pte_t pte;
+   pgprot_t pgprot;
 
if (is_vmalloc_addr(addr))
-   pfn = vmalloc_to_pfn(addr);
+   page = vmalloc_to_page(addr);
else
-   pfn = __pa_symbol(addr) >> PAGE_SHIFT;
+   page = virt_to_page(addr);
 
-   err = map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), PAGE_KERNEL);
+   if (radix_enabled())
+   pgprot = PAGE_KERNEL;
+   else
+   pgprot = PAGE_SHARED;
 
-   pr_devel("Mapped addr %lx with pfn %lx:%d\n", text_poke_addr, pfn, err);
-   if (err)
+   patch_mapping->ptep = get_locked_pte(patching_mm, patching_addr,
+&patch_mapping->ptl);
+   if (unlikely(!patch_mapping->ptep)) {
+   pr_warn("map patch: failed to allocate pte for patching\n");
return -1;
+   }
+
+   pte = mk_pte(page, pgprot);
+   pte = pte_mkdirty(pte);
+   set_pte_at(patching_mm, patching_addr, patch_mapping->ptep, pte);
+
+   init_temp_mm(&patch_mapping->temp_mm, patching_mm);
+   use_temporary_mm(&patch_mapping->temp_mm);
 
return 0;
 }
 
-static inline int unmap_patch_area(unsigned long addr)
+static void unmap_patch(struct patch_mapping *patch_mapping)
 {
-   pte_t *ptep;
-   pmd_t *pmdp;
-   pud_t *pudp;
-   p4d_t *p4dp;
-   pgd_t *pgdp;
-
-   pgdp = pgd_offset_k(addr);
-   if (unlikely(!pgdp))
-   return -EINVAL;
-
-   p4dp = p4d_offset(pgdp, addr);
-   if (unlikely(!p4dp))
-   return -EINVAL;
-
-   pudp = pud_offset(p4dp, addr);
-   if (unlikely(!pudp))
-   return -EINVAL;
-
-   pmdp = pmd_offset(pudp, addr);
-   if (unlikely(!pmdp))
-   return -EINVAL;
-
-   ptep = pte_offset_kernel(pmdp, addr);
-   if (unlikely(!ptep))
-   return -EINVAL;
+   /* In hash, pte_clear flushes the 

[PATCH v2 5/5] powerpc: Add LKDTM test to hijack a patch mapping

2020-07-08 Thread Christopher M. Riedl
When live patching with STRICT_KERNEL_RWX, the CPU doing the patching
must use a temporary mapping which allows for writing to kernel text.
During the entire window of time when this temporary mapping is in use,
another CPU could write to the same mapping and maliciously alter kernel
text. Implement an LKDTM test to attempt to exploit such an opening when
a CPU is patching under STRICT_KERNEL_RWX. The test is only implemented
on powerpc for now.

The LKDTM "hijack" test works as follows:

1. A CPU executes an infinite loop to patch an instruction.
   This is the "patching" CPU.
2. Another CPU attempts to write to the address of the temporary
   mapping used by the "patching" CPU. This other CPU is the
   "hijacker" CPU. The hijack either fails with a segfault or
   succeeds, in which case some kernel text is now overwritten.

How to run the test:

mount -t debugfs none /sys/kernel/debug
(echo HIJACK_PATCH > /sys/kernel/debug/provoke-crash/DIRECT)

Signed-off-by: Christopher M. Riedl 
---
 drivers/misc/lkdtm/core.c  |  1 +
 drivers/misc/lkdtm/lkdtm.h |  1 +
 drivers/misc/lkdtm/perms.c | 99 ++
 3 files changed, 101 insertions(+)

diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
index a5e344df9166..482e72f6a1e1 100644
--- a/drivers/misc/lkdtm/core.c
+++ b/drivers/misc/lkdtm/core.c
@@ -145,6 +145,7 @@ static const struct crashtype crashtypes[] = {
CRASHTYPE(WRITE_RO),
CRASHTYPE(WRITE_RO_AFTER_INIT),
CRASHTYPE(WRITE_KERN),
+   CRASHTYPE(HIJACK_PATCH),
CRASHTYPE(REFCOUNT_INC_OVERFLOW),
CRASHTYPE(REFCOUNT_ADD_OVERFLOW),
CRASHTYPE(REFCOUNT_INC_NOT_ZERO_OVERFLOW),
diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h
index 601a2156a0d4..bfcf3542370d 100644
--- a/drivers/misc/lkdtm/lkdtm.h
+++ b/drivers/misc/lkdtm/lkdtm.h
@@ -62,6 +62,7 @@ void lkdtm_EXEC_USERSPACE(void);
 void lkdtm_EXEC_NULL(void);
 void lkdtm_ACCESS_USERSPACE(void);
 void lkdtm_ACCESS_NULL(void);
+void lkdtm_HIJACK_PATCH(void);
 
 /* lkdtm_refcount.c */
 void lkdtm_REFCOUNT_INC_OVERFLOW(void);
diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c
index 62f76d506f04..b7149daaeb6f 100644
--- a/drivers/misc/lkdtm/perms.c
+++ b/drivers/misc/lkdtm/perms.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /* Whether or not to fill the target memory area with do_nothing(). */
@@ -213,6 +214,104 @@ void lkdtm_ACCESS_NULL(void)
*ptr = tmp;
 }
 
+#if defined(CONFIG_PPC) && defined(CONFIG_STRICT_KERNEL_RWX)
+#include 
+
+static struct ppc_inst * const patch_site = (struct ppc_inst *)&do_nothing;
+
+static int lkdtm_patching_cpu(void *data)
+{
+   int err = 0;
+   struct ppc_inst insn = ppc_inst(0xdeadbeef);
+
+   pr_info("starting patching_cpu=%d\n", smp_processor_id());
+   do {
+   err = patch_instruction(patch_site, insn);
+   } while (ppc_inst_equal(ppc_inst_read(READ_ONCE(patch_site)), insn) &&
+   !err && !kthread_should_stop());
+
+   if (err)
+   pr_warn("patch_instruction returned error: %d\n", err);
+
+   set_current_state(TASK_INTERRUPTIBLE);
+   while (!kthread_should_stop()) {
+   schedule();
+   set_current_state(TASK_INTERRUPTIBLE);
+   }
+
+   return err;
+}
+
+void lkdtm_HIJACK_PATCH(void)
+{
+   struct task_struct *patching_kthrd;
+   struct ppc_inst original_insn;
+   int patching_cpu, hijacker_cpu, attempts;
+   unsigned long addr;
+   bool hijacked;
+
+   if (num_online_cpus() < 2) {
+   pr_warn("need at least two cpus\n");
+   return;
+   }
+
+   original_insn = ppc_inst_read(READ_ONCE(patch_site));
+
+   hijacker_cpu = smp_processor_id();
+   patching_cpu = cpumask_any_but(cpu_online_mask, hijacker_cpu);
+
+   patching_kthrd = kthread_create_on_node(&lkdtm_patching_cpu, NULL,
+   cpu_to_node(patching_cpu),
+   "lkdtm_patching_cpu");
+   kthread_bind(patching_kthrd, patching_cpu);
+   wake_up_process(patching_kthrd);
+
+   addr = offset_in_page(patch_site) | read_cpu_patching_addr(patching_cpu);
+
+   pr_info("starting hijacker_cpu=%d\n", hijacker_cpu);
+   for (attempts = 0; attempts < 10; ++attempts) {
+   /* Use __put_user to catch faults without an Oops */
+   hijacked = !__put_user(0xbad00bad, (unsigned int *)addr);
+
+   if (hijacked) {
+   if (kthread_stop(patching_kthrd))
+   goto out;
+   break;
+   }
+   }
+   pr_info("hijack attempts: %d\n", attempts);
+
+   if (hijacked) {
+   if (*(unsigned int *)READ_ONCE(patch_site) == 0xbad00bad)
+   pr_err("overwrote kernel 

Re: [PATCH 1/1] KVM/PPC: Fix typo on H_DISABLE_AND_GET hcall

2020-07-08 Thread Paul Mackerras
On Mon, Jul 06, 2020 at 09:48:12PM -0300, Leonardo Bras wrote:
> On PAPR+ the hcall() on 0x1B0 is called H_DISABLE_AND_GET, but got
> defined as H_DISABLE_AND_GETC instead.
> 
> This define was introduced with a typo in commit 
> ("[PATCH] powerpc: Extends HCALL interface for InfiniBand usage"), and was
> later used without having the typo noticed.
> 
> Signed-off-by: Leonardo Bras 

Acked-by: Paul Mackerras 

Since this hypercall is not implemented in KVM nor used by KVM guests,
I'll leave this one for Michael to pick up.

Paul.


[PATCH 0/5] Use per-CPU temporary mappings for patching

2020-07-08 Thread Christopher M. Riedl
When compiled with CONFIG_STRICT_KERNEL_RWX, the kernel must create
temporary mappings when patching itself. These mappings temporarily
override the strict RWX text protections to permit a write. Currently,
powerpc allocates a per-CPU VM area for patching. Patching occurs as
follows:

1. Map page of text to be patched to per-CPU VM area w/
   PAGE_KERNEL protection
2. Patch text
3. Remove the temporary mapping

While the VM area is per-CPU, the mapping is actually inserted into the
kernel page tables. Presumably, this could allow another CPU to access
the normally write-protected text - either maliciously or accidentally -
via this same mapping if the address of the VM area is known. Ideally,
the mapping should be kept local to the CPU doing the patching (or any
other sensitive operations requiring temporarily overriding memory
protections) [0].

x86 introduced "temporary mm" structs which allow the creation of
mappings local to a particular CPU [1]. This series intends to bring the
notion of a temporary mm to powerpc and harden powerpc by using such a
mapping for patching a kernel with strict RWX permissions.

The first patch introduces the temporary mm struct and API for powerpc
along with a new function to retrieve a current hw breakpoint.

The second patch uses the `poking_init` init hook added by the x86
patches to initialize a temporary mm and patching address. The patching
address is randomized between 0 and DEFAULT_MAP_WINDOW-PAGE_SIZE. The
upper limit is necessary due to how the hash MMU operates - by default
the space above DEFAULT_MAP_WINDOW is not available. For now, both hash
and radix randomize inside this range. The number of possible random
addresses is dependent on PAGE_SIZE and limited by DEFAULT_MAP_WINDOW.

Bits of entropy with 64K page size on BOOK3S_64:

bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)

PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
bits of entropy = log2(128TB / 64K)
bits of entropy = 31

Randomization occurs only once during initialization at boot.

The third patch replaces the VM area with the temporary mm in the
patching code. The page for patching has to be mapped PAGE_SHARED with
the hash MMU since hash prevents the kernel from accessing userspace
pages with PAGE_PRIVILEGED bit set. On the radix MMU the page is mapped with
PAGE_KERNEL which has the added benefit that we can skip KUAP. 

The fourth and fifth patches implement an LKDTM test "proof-of-concept" which
exploits the previous vulnerability (ie. the mapping during patching is exposed
in kernel page tables and accessible by other CPUs). The LKDTM test is somewhat
"rough" in that it uses a brute-force approach - I am open to any suggestions
and/or ideas to improve this. Currently, the LKDTM test passes with this series
on POWER8 (hash) and POWER9 (radix, hash) and fails without this series (ie.
the temporary mapping for patching is exposed to CPUs other than the patching
CPU).

The test can be applied to a tree without this new series by applying
the last two patches of this series, and then fixing up in
/arch/powerpc/lib/code-patching.c:

 #ifdef CONFIG_STRICT_KERNEL_RWX
 static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);

+#ifdef CONFIG_LKDTM
+unsigned long read_cpu_patching_addr(unsigned int cpu)
+{
+   return (unsigned long)(per_cpu(text_poke_area, cpu))->addr;
+}
+#endif
+
 static int text_area_cpu_up(unsigned int cpu)
 {
struct vm_struct *area;

I also have a tree with just linuxppc/next and the LKDTM test here:
https://github.com/cmr-informatik/linux/commits/percpu-lkdtm-only

Tested on Blackbird (8-core POWER9) w/ Hash (`disable_radix`) and Radix
MMUs.

v2: * Rebase on linuxppc/next:
  commit 105fb38124a4 ("powerpc/8xx: Modify ptep_get()")
* Always dirty pte when mapping patch
* Use `ppc_inst_len` instead of `sizeof` on instructions
* Declare LKDTM patching addr accessor in header where it
  belongs   

v1: * Rebase on linuxppc/next (4336b9337824)
* Save and restore second hw watchpoint
* Use new ppc_inst_* functions for patching check and in LKDTM
  test

rfc-v2: * Many fixes and improvements mostly based on extensive feedback and
  testing by Christophe Leroy (thanks!).
* Make patching_mm and patching_addr static and move '__ro_after_init'
  to after the variable name (more common in other parts of the kernel)
* Use 'asm/debug.h' header instead of 'asm/hw_breakpoint.h' to fix
  PPC64e compile
* Add comment explaining why we use BUG_ON() during the init call to
  setup for patching later
* Move ptep into patch_mapping to avoid walking page tables a second
  time when unmapping the temporary mapping
* Use KUAP under non-radix, also manually dirty the PTE for patch
  mapping on non-BOOK3S_64 platforms
* Properly return any error from 

[PATCH v2 1/5] powerpc/mm: Introduce temporary mm

2020-07-08 Thread Christopher M. Riedl
x86 supports the notion of a temporary mm which restricts access to
temporary PTEs to a single CPU. A temporary mm is useful for situations
where a CPU needs to perform sensitive operations (such as patching a
STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing
said mappings to other CPUs. A side benefit is that other CPU TLBs do
not need to be flushed when the temporary mm is torn down.

Mappings in the temporary mm can be set in the userspace portion of the
address-space.

Interrupts must be disabled while the temporary mm is in use. HW
breakpoints, which may have been set by userspace as watchpoints on
addresses now within the temporary mm, are saved and disabled when
loading the temporary mm. The HW breakpoints are restored when unloading
the temporary mm. All HW breakpoints are indiscriminately disabled while
the temporary mm is in use.

Based on x86 implementation:

commit cefa929c034e
("x86/mm: Introduce temporary mm structs")

Signed-off-by: Christopher M. Riedl 
---
 arch/powerpc/include/asm/debug.h   |  1 +
 arch/powerpc/include/asm/mmu_context.h | 64 ++
 arch/powerpc/kernel/process.c  |  5 ++
 3 files changed, 70 insertions(+)

diff --git a/arch/powerpc/include/asm/debug.h b/arch/powerpc/include/asm/debug.h
index ec57daf87f40..827350c9bcf3 100644
--- a/arch/powerpc/include/asm/debug.h
+++ b/arch/powerpc/include/asm/debug.h
@@ -46,6 +46,7 @@ static inline int debugger_fault_handler(struct pt_regs *regs) { return 0; }
 #endif
 
 void __set_breakpoint(int nr, struct arch_hw_breakpoint *brk);
+void __get_breakpoint(int nr, struct arch_hw_breakpoint *brk);
 bool ppc_breakpoint_available(void);
 #ifdef CONFIG_PPC_ADV_DEBUG_REGS
 extern void do_send_trap(struct pt_regs *regs, unsigned long address,
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 1a474f6b1992..9269c7c7b04e 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -10,6 +10,7 @@
 #include
 #include 
 #include 
+#include 
 
 /*
  * Most if the context management is out of line
@@ -300,5 +301,68 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm,
return 0;
 }
 
+struct temp_mm {
+   struct mm_struct *temp;
+   struct mm_struct *prev;
+   bool is_kernel_thread;
+   struct arch_hw_breakpoint brk[HBP_NUM_MAX];
+};
+
+static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct *mm)
+{
+   temp_mm->temp = mm;
+   temp_mm->prev = NULL;
+   temp_mm->is_kernel_thread = false;
+   memset(&temp_mm->brk, 0, sizeof(temp_mm->brk));
+}
+
+static inline void use_temporary_mm(struct temp_mm *temp_mm)
+{
+   lockdep_assert_irqs_disabled();
+
+   temp_mm->is_kernel_thread = current->mm == NULL;
+   if (temp_mm->is_kernel_thread)
+   temp_mm->prev = current->active_mm;
+   else
+   temp_mm->prev = current->mm;
+
+   /*
+* Hash requires a non-NULL current->mm to allocate a userspace address
+* when handling a page fault. Does not appear to hurt in Radix either.
+*/
+   current->mm = temp_mm->temp;
+   switch_mm_irqs_off(NULL, temp_mm->temp, current);
+
+   if (ppc_breakpoint_available()) {
+   struct arch_hw_breakpoint null_brk = {0};
+   int i = 0;
+
+   for (; i < nr_wp_slots(); ++i) {
+   __get_breakpoint(i, &temp_mm->brk[i]);
+   if (temp_mm->brk[i].type != 0)
+   __set_breakpoint(i, &null_brk);
+   }
+   }
+}
+
+static inline void unuse_temporary_mm(struct temp_mm *temp_mm)
+{
+   lockdep_assert_irqs_disabled();
+
+   if (temp_mm->is_kernel_thread)
+   current->mm = NULL;
+   else
+   current->mm = temp_mm->prev;
+   switch_mm_irqs_off(NULL, temp_mm->prev, current);
+
+   if (ppc_breakpoint_available()) {
+   int i = 0;
+
+   for (; i < nr_wp_slots(); ++i)
+   if (temp_mm->brk[i].type != 0)
+   __set_breakpoint(i, &temp_mm->brk[i]);
+   }
+}
+
 #endif /* __KERNEL__ */
 #endif /* __ASM_POWERPC_MMU_CONTEXT_H */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 4650b9bb217f..b6c123bf5edd 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -824,6 +824,11 @@ static inline int set_breakpoint_8xx(struct arch_hw_breakpoint *brk)
return 0;
 }
 
+void __get_breakpoint(int nr, struct arch_hw_breakpoint *brk)
+{
+   memcpy(brk, this_cpu_ptr(&current_brk[nr]), sizeof(*brk));
+}
+
 void __set_breakpoint(int nr, struct arch_hw_breakpoint *brk)
 {
	memcpy(this_cpu_ptr(&current_brk[nr]), brk, sizeof(*brk));
-- 
2.27.0



[PATCH v2 4/5] powerpc/lib: Add LKDTM accessor for patching addr

2020-07-08 Thread Christopher M. Riedl
When live patching a STRICT_RWX kernel, a mapping is installed at a
"patching address" with temporary write permissions. Provide an
LKDTM-only accessor function for this address in preparation for an LKDTM
test which attempts to "hijack" this mapping by writing to it from
another CPU.

Signed-off-by: Christopher M. Riedl 
---
 arch/powerpc/include/asm/code-patching.h | 4 
 arch/powerpc/lib/code-patching.c | 7 +++
 2 files changed, 11 insertions(+)

diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index eacc9102c251..ffc6dfdbbf8e 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -187,4 +187,8 @@ static inline unsigned long ppc_kallsyms_lookup_name(const char *name)
 ___PPC_RA(__REG_R1) | PPC_LR_STKOFF)
 #endif /* CONFIG_PPC64 */
 
+#ifdef CONFIG_LKDTM
+unsigned long read_cpu_patching_addr(unsigned int cpu);
+#endif
+
 #endif /* _ASM_POWERPC_CODE_PATCHING_H */
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 80fe3864f377..a12db2092947 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -51,6 +51,13 @@ int raw_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
 static struct mm_struct *patching_mm __ro_after_init;
 static unsigned long patching_addr __ro_after_init;
 
+#ifdef CONFIG_LKDTM
+unsigned long read_cpu_patching_addr(unsigned int cpu)
+{
+   return patching_addr;
+}
+#endif
+
 void __init poking_init(void)
 {
spinlock_t *ptl; /* for protecting pte table */
-- 
2.27.0



Re: [PATCH v2 09/10] tools/perf: Add perf tools support for extended register capability in powerpc

2020-07-08 Thread Athira Rajeev


> On 08-Jul-2020, at 5:34 PM, Michael Ellerman  wrote:
> 
> Athira Rajeev writes:
>> From: Anju T Sudhakar 
>> 
>> Add extended regs to sample_reg_mask in the tool side to use
>> with `-I?` option. Perf tools side uses extended mask to display
>> the platform supported register names (with -I? option) to the user
>> and also send this mask to the kernel to capture the extended registers
>> in each sample. Hence decide the mask value based on the processor
>> version.
>> 
>> Signed-off-by: Anju T Sudhakar 
>> [Decide extended mask at run time based on platform]
>> Signed-off-by: Athira Rajeev 
>> Reviewed-by: Madhavan Srinivasan 
> 
> Will need an ack from perf tools folks, who are not on Cc by the looks.
> 

Yes, my bad. Will make sure to add proper Cc 

>> diff --git a/tools/arch/powerpc/include/uapi/asm/perf_regs.h 
>> b/tools/arch/powerpc/include/uapi/asm/perf_regs.h
>> index f599064..485b1d5 100644
>> --- a/tools/arch/powerpc/include/uapi/asm/perf_regs.h
>> +++ b/tools/arch/powerpc/include/uapi/asm/perf_regs.h
>> @@ -48,6 +48,18 @@ enum perf_event_powerpc_regs {
>>  PERF_REG_POWERPC_DSISR,
>>  PERF_REG_POWERPC_SIER,
>>  PERF_REG_POWERPC_MMCRA,
>> -PERF_REG_POWERPC_MAX,
>> +/* Extended registers */
>> +PERF_REG_POWERPC_MMCR0,
>> +PERF_REG_POWERPC_MMCR1,
>> +PERF_REG_POWERPC_MMCR2,
>> +/* Max regs without the extended regs */
>> +PERF_REG_POWERPC_MAX = PERF_REG_POWERPC_MMCRA + 1,
> 
> I don't really understand this idea of a max that's not the max.
> 
>> };
>> +
>> +#define PERF_REG_PMU_MASK   ((1ULL << PERF_REG_POWERPC_MAX) - 1)
>> +
>> +/* PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_300 */
>> +#define PERF_REG_PMU_MASK_300   (((1ULL << (PERF_REG_POWERPC_MMCR2 + 1)) - 1) \
>> +- PERF_REG_PMU_MASK)
>> +
>> #endif /* _UAPI_ASM_POWERPC_PERF_REGS_H */
>> diff --git a/tools/perf/arch/powerpc/include/perf_regs.h 
>> b/tools/perf/arch/powerpc/include/perf_regs.h
>> index e18a355..46ed00d 100644
>> --- a/tools/perf/arch/powerpc/include/perf_regs.h
>> +++ b/tools/perf/arch/powerpc/include/perf_regs.h
>> @@ -64,7 +64,10 @@
>>  [PERF_REG_POWERPC_DAR] = "dar",
>>  [PERF_REG_POWERPC_DSISR] = "dsisr",
>>  [PERF_REG_POWERPC_SIER] = "sier",
>> -[PERF_REG_POWERPC_MMCRA] = "mmcra"
>> +[PERF_REG_POWERPC_MMCRA] = "mmcra",
>> +[PERF_REG_POWERPC_MMCR0] = "mmcr0",
>> +[PERF_REG_POWERPC_MMCR1] = "mmcr1",
>> +[PERF_REG_POWERPC_MMCR2] = "mmcr2",
>> };
>> 
>> static inline const char *perf_reg_name(int id)
>> diff --git a/tools/perf/arch/powerpc/util/perf_regs.c 
>> b/tools/perf/arch/powerpc/util/perf_regs.c
>> index 0a52429..9179230 100644
>> --- a/tools/perf/arch/powerpc/util/perf_regs.c
>> +++ b/tools/perf/arch/powerpc/util/perf_regs.c
>> @@ -6,9 +6,14 @@
>> 
>> #include "../../../util/perf_regs.h"
>> #include "../../../util/debug.h"
>> +#include "../../../util/event.h"
>> +#include "../../../util/header.h"
>> +#include "../../../perf-sys.h"
>> 
>> #include 
>> 
>> +#define PVR_POWER9  0x004E
>> +
>> const struct sample_reg sample_reg_masks[] = {
>>  SMPL_REG(r0, PERF_REG_POWERPC_R0),
>>  SMPL_REG(r1, PERF_REG_POWERPC_R1),
>> @@ -55,6 +60,9 @@
>>  SMPL_REG(dsisr, PERF_REG_POWERPC_DSISR),
>>  SMPL_REG(sier, PERF_REG_POWERPC_SIER),
>>  SMPL_REG(mmcra, PERF_REG_POWERPC_MMCRA),
>> +SMPL_REG(mmcr0, PERF_REG_POWERPC_MMCR0),
>> +SMPL_REG(mmcr1, PERF_REG_POWERPC_MMCR1),
>> +SMPL_REG(mmcr2, PERF_REG_POWERPC_MMCR2),
>>  SMPL_REG_END
>> };
>> 
>> @@ -163,3 +171,50 @@ int arch_sdt_arg_parse_op(char *old_op, char **new_op)
>> 
>>  return SDT_ARG_VALID;
>> }
>> +
>> +uint64_t arch__intr_reg_mask(void)
>> +{
>> +struct perf_event_attr attr = {
>> +.type   = PERF_TYPE_HARDWARE,
>> +.config = PERF_COUNT_HW_CPU_CYCLES,
>> +.sample_type= PERF_SAMPLE_REGS_INTR,
>> +.precise_ip = 1,
>> +.disabled   = 1,
>> +.exclude_kernel = 1,
>> +};
>> +int fd, ret;
>> +char buffer[64];
>> +u32 version;
>> +u64 extended_mask = 0;
>> +
>> +/* Get the PVR value to set the extended
>> + * mask specific to platform
> 
> Comment format is wrong, and punctuation please.
> 
>> + */
>> +get_cpuid(buffer, sizeof(buffer));
>> +ret = sscanf(buffer, "%u,", &version);
> 
> This is powerpc specific code, why not just use mfspr(SPRN_PVR), rather
> than redirecting via printf/sscanf.

Hi Michael

For perf tools, the defines for `mfspr` and `SPRN_PVR` are in
arch/powerpc/util/header.c, so I have re-used the existing utility.
Otherwise, we would need to duplicate these defines here as well.
Does that sound good?

> 
>> +
>> +if (ret != 1) {
>> +pr_debug("Failed to get the processor version, unable to output extended registers\n");
>> +return PERF_REGS_MASK;
>> +}
>> +
>> +if 
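The mask definitions quoted from perf_regs.h above can be sanity-checked numerically. The concrete index values below are inferred by counting the enum (32 GPRs followed by nip, msr, orig_r3, ctr, link, xer, ccr, softe, trap, dar, dsisr, sier, mmcra), so MMCRA landing at 44 is an assumption for illustration, not something this patch states explicitly:

```python
# Inferred positions from counting the enum as quoted in the patch.
PERF_REG_POWERPC_MMCRA = 44
PERF_REG_POWERPC_MMCR0 = 45
PERF_REG_POWERPC_MMCR1 = 46
PERF_REG_POWERPC_MMCR2 = 47

# "Max regs without the extended regs" -- the point Michael queried:
# MAX deliberately stops short of the extended registers.
PERF_REG_POWERPC_MAX = PERF_REG_POWERPC_MMCRA + 1

PERF_REG_PMU_MASK = (1 << PERF_REG_POWERPC_MAX) - 1

# The ISA v3.0 extended mask subtracts the base mask, leaving exactly
# the three MMCR0/MMCR1/MMCR2 bits.
PERF_REG_PMU_MASK_300 = ((1 << (PERF_REG_POWERPC_MMCR2 + 1)) - 1) - PERF_REG_PMU_MASK

assert PERF_REG_PMU_MASK_300 == (1 << 45) | (1 << 46) | (1 << 47)
```

Whatever the real index of MMCRA, the subtraction construction guarantees the extended mask and the base mask are disjoint, which is what lets the tool OR them together safely.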

Re: [PATCH v2 07/10] powerpc/perf: support BHRB disable bit and new filtering modes

2020-07-08 Thread Athira Rajeev


> On 08-Jul-2020, at 5:12 PM, Michael Ellerman  wrote:
> 
> Athira Rajeev writes:
> 
>> PowerISA v3.1 has few updates for the Branch History Rolling Buffer(BHRB).
>   ^
>   a
>> First is the addition of BHRB disable bit and second new filtering
>  ^
>  is
>> modes for BHRB.
>> 
>> BHRB disable is controlled via Monitor Mode Control Register A (MMCRA)
>> bit 26, namely "BHRB Recording Disable (BHRBRD)". This field controls
> 
> Most people call that bit 37.
> 
>> whether BHRB entries are written when BHRB recording is enabled by other
>> bits. Patch implements support for this BHRB disable bit.
>   ^
>   This
> 
>> Secondly PowerISA v3.1 introduce filtering support for
> 
> .. that should be in a separate patch please.
> 
>> PERF_SAMPLE_BRANCH_IND_CALL/COND. The patch adds BHRB filter support
>^
>This
>> for "ind_call" and "cond" in power10_bhrb_filter_map().
>> 
>> 'commit bb19af816025 ("powerpc/perf: Prevent kernel address leak to 
>> userspace via BHRB buffer")'
> 
> That doesn't need single quotes, and should be wrapped at 72 columns
> like the rest of the text.
> 
>> added a check in bhrb_read() to filter the kernel address from BHRB buffer. 
>> Patch here modified
>> it to avoid that check for PowerISA v3.1 based processors, since PowerISA 
>> v3.1 allows
>> only MSR[PR]=1 address to be written to BHRB buffer.
> 
> And that should be a separate patch again please.

Sure, I will split these to separate patches

> 
>> Signed-off-by: Athira Rajeev 
>> ---
>> arch/powerpc/perf/core-book3s.c   | 27 +--
>> arch/powerpc/perf/isa207-common.c | 13 +
>> arch/powerpc/perf/power10-pmu.c   | 13 +++--
>> arch/powerpc/platforms/powernv/idle.c | 14 ++
>> 4 files changed, 59 insertions(+), 8 deletions(-)
>> 
>> diff --git a/arch/powerpc/perf/core-book3s.c 
>> b/arch/powerpc/perf/core-book3s.c
>> index fad5159..9709606 100644
>> --- a/arch/powerpc/perf/core-book3s.c
>> +++ b/arch/powerpc/perf/core-book3s.c
>> @@ -466,9 +466,13 @@ static void power_pmu_bhrb_read(struct perf_event 
>> *event, struct cpu_hw_events *
>>   * addresses at this point. Check the privileges before
>>   * exporting it to userspace (avoid exposure of regions
>>   * where we could have speculative execution)
>> + * Incase of ISA 310, BHRB will capture only user-space
>   ^
>   In case of ISA v3.1,

Ok, 
> 
>> + * address,hence include a check before filtering code
>   ^  ^
>   addresses, hence   .
>>   */
>> -if (is_kernel_addr(addr) && perf_allow_kernel(&event->attr) != 0)
>> -continue;
>> +if (!(ppmu->flags & PPMU_ARCH_310S))
>> +if (is_kernel_addr(addr) &&
>> +perf_allow_kernel(&event->attr) != 0)
>> +continue;
> 
> The indentation is weird. You should just check all three conditions
> with &&.

Ok, will correct this.
> 
>> 
>>  /* Branches are read most recent first (ie. mfbhrb 0 is
>>   * the most recent branch).
>> @@ -1212,7 +1216,7 @@ static void write_mmcr0(struct cpu_hw_events *cpuhw, 
>> unsigned long mmcr0)
>> static void power_pmu_disable(struct pmu *pmu)
>> {
>>  struct cpu_hw_events *cpuhw;
>> -unsigned long flags, mmcr0, val;
>> +unsigned long flags, mmcr0, val, mmcra = 0;
> 
> You initialise it below.
> 
>>  if (!ppmu)
>>  return;
>> @@ -1245,12 +1249,23 @@ static void power_pmu_disable(struct pmu *pmu)
>>  mb();
>>  isync();
>> 
>> +val = mmcra = cpuhw->mmcr[2];
>> +
> 
> For mmcr0 (above), val is the variable we mutate and mmcr0 is the
> original value. But here you've done the reverse, which is confusing.

Yes, I am altering mmcra here and using val as original value. I should have 
done it reverse.

> 
>>  /*
>>   * Disable instruction sampling if it was enabled
>>   */
>> -if (cpuhw->mmcr[2] & MMCRA_SAMPLE_ENABLE) {
>> -mtspr(SPRN_MMCRA,
>> -  cpuhw->mmcr[2] & ~MMCRA_SAMPLE_ENABLE);
>> +if (cpuhw->mmcr[2] & MMCRA_SAMPLE_ENABLE)
>> +mmcra = cpuhw->mmcr[2] & ~MMCRA_SAMPLE_ENABLE;
> 
> You just loaded cpuhw->mmcr[2] into mmcra, use it rather than referring
> back to cpuhw->mmcr[2] over and over.
> 

Ok,
>> +
>> +/* Disable BHRB via 

Re: [PATCH v2 07/10] powerpc/perf: support BHRB disable bit and new filtering modes

2020-07-08 Thread Athira Rajeev


> On 08-Jul-2020, at 1:13 PM, Gautham R Shenoy  wrote:
> 
> On Tue, Jul 07, 2020 at 05:17:55PM +1000, Michael Neuling wrote:
>> On Wed, 2020-07-01 at 05:20 -0400, Athira Rajeev wrote:
>>> PowerISA v3.1 has few updates for the Branch History Rolling Buffer(BHRB).
>>> First is the addition of BHRB disable bit and second new filtering
>>> modes for BHRB.
>>> 
>>> BHRB disable is controlled via Monitor Mode Control Register A (MMCRA)
>>> bit 26, namely "BHRB Recording Disable (BHRBRD)". This field controls
>>> whether BHRB entries are written when BHRB recording is enabled by other
>>> bits. Patch implements support for this BHRB disable bit.
>> 
>> Probably good to note here that this is backwards compatible. So if you have 
>> a
>> kernel that doesn't know about this bit, it'll clear it and hence you still 
>> get
>> BHRB. 
>> 
>> You should also note why you'd want to do disable this (ie. the core will run
>> faster).
>> 
>>> Secondly PowerISA v3.1 introduce filtering support for
>>> PERF_SAMPLE_BRANCH_IND_CALL/COND. The patch adds BHRB filter support
>>> for "ind_call" and "cond" in power10_bhrb_filter_map().
>>> 
>>> 'commit bb19af816025 ("powerpc/perf: Prevent kernel address leak to 
>>> userspace
>>> via BHRB buffer")'
>>> added a check in bhrb_read() to filter the kernel address from BHRB buffer.
>>> Patch here modified
>>> it to avoid that check for PowerISA v3.1 based processors, since PowerISA 
>>> v3.1
>>> allows
>>> only MSR[PR]=1 address to be written to BHRB buffer.
>>> 
>>> Signed-off-by: Athira Rajeev 
>>> ---
>>> arch/powerpc/perf/core-book3s.c   | 27 +--
>>> arch/powerpc/perf/isa207-common.c | 13 +
>>> arch/powerpc/perf/power10-pmu.c   | 13 +++--
>>> arch/powerpc/platforms/powernv/idle.c | 14 ++
>> 
>> This touches the idle code so we should get those guys on CC (adding Vaidy 
>> and
>> Ego).
>> 
>>> 4 files changed, 59 insertions(+), 8 deletions(-)
>>> 
> 
> [..snip..]
> 
> 
>>> diff --git a/arch/powerpc/platforms/powernv/idle.c
>>> b/arch/powerpc/platforms/powernv/idle.c
>>> index 2dd4673..7db99c7 100644
>>> --- a/arch/powerpc/platforms/powernv/idle.c
>>> +++ b/arch/powerpc/platforms/powernv/idle.c
>>> @@ -611,6 +611,7 @@ static unsigned long power9_idle_stop(unsigned long 
>>> psscr,
>>> bool mmu_on)
>>> unsigned long srr1;
>>> unsigned long pls;
>>> unsigned long mmcr0 = 0;
>>> +   unsigned long mmcra_bhrb = 0;
> 
> We are saving the whole of MMCRA aren't we ? We might want to just
> name it mmcra in that case.
> 
>>> struct p9_sprs sprs = {}; /* avoid false used-uninitialised */
>>> bool sprs_saved = false;
>>> 
>>> @@ -657,6 +658,15 @@ static unsigned long power9_idle_stop(unsigned long
>>> psscr, bool mmu_on)
>>>   */
>>> mmcr0   = mfspr(SPRN_MMCR0);
>>> }
>>> +
>>> +   if (cpu_has_feature(CPU_FTR_ARCH_31)) {
>>> +   /* POWER10 uses MMCRA[:26] as BHRB disable bit
>>> +* to disable BHRB logic when not used. Hence Save and
>>> +* restore MMCRA after a state-loss idle.
>>> +*/
> 
> Multi-line comment usually has the first line blank.

Hi Gautham

Thanks for checking. I will change the comment format.
Yes, we are saving the whole of MMCRA.
> 
>   /*
>* Line 1
>* Line 2
>* .
>* .
>* .
>* Line N
>*/
> 
>>> +   mmcra_bhrb  = mfspr(SPRN_MMCRA);
>> 
>> 
>> Why is the bhrb bit of mmcra special here?
> 
> The comment above could include the consequence of not saving and
> restoring MMCRA i.e
> 
> - If the user hasn't asked for the BHRB to be
>  written the value of MMCRA[BHRBD] = 1.
> 
> - On wakeup from stop, MMCRA[BHRBD] will be 0, since MMCRA is not a
>  previleged resource and will be lost.
> 
> - Thus, if we do not save and restore the MMCRA[BHRBD], the hardware
>  will be needlessly writing to the BHRB in the problem mode.
> 
>> 
>>> +   }
>>> +
>>> if ((psscr & PSSCR_RL_MASK) >= pnv_first_spr_loss_level) {
>>> sprs.lpcr   = mfspr(SPRN_LPCR);
>>> sprs.hfscr  = mfspr(SPRN_HFSCR);
>>> @@ -721,6 +731,10 @@ static unsigned long power9_idle_stop(unsigned long
>>> psscr, bool mmu_on)
>>> mtspr(SPRN_MMCR0, mmcr0);
>>> }
>>> 
>>> +   /* Reload MMCRA to restore BHRB disable bit for POWER10 */
>>> +   if (cpu_has_feature(CPU_FTR_ARCH_31))
>>> +   mtspr(SPRN_MMCRA, mmcra_bhrb);
>>> +
>>> /*
>>>  * DD2.2 and earlier need to set then clear bit 60 in MMCRA
>>>  * to ensure the PMU starts running.
>> 
> 
> --
> Thanks and Regards
> gautham.



Re: [PATCH v2 03/10] powerpc/xmon: Add PowerISA v3.1 PMU SPRs

2020-07-08 Thread Athira Rajeev


> On 08-Jul-2020, at 4:34 PM, Michael Ellerman  wrote:
> 
> Athira Rajeev writes:
>> From: Madhavan Srinivasan 
>> 
>> PowerISA v3.1 added three new performance
>> monitoring unit (PMU) special purpose registers (SPRs).
>> They are Monitor Mode Control Register 3 (MMCR3),
>> Sampled Instruction Event Register 2 (SIER2),
>> Sampled Instruction Event Register 3 (SIER3).
>> 
>> Patch here adds a new dump function dump_310_sprs
>> to print these SPR values.
>> 
>> Signed-off-by: Madhavan Srinivasan 
>> ---
>> arch/powerpc/xmon/xmon.c | 15 +++
>> 1 file changed, 15 insertions(+)
>> 
>> diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
>> index 7efe4bc..8917fe8 100644
>> --- a/arch/powerpc/xmon/xmon.c
>> +++ b/arch/powerpc/xmon/xmon.c
>> @@ -2022,6 +2022,20 @@ static void dump_300_sprs(void)
>> #endif
>> }
>> 
>> +static void dump_310_sprs(void)
>> +{
>> +#ifdef CONFIG_PPC64
>> +if (!cpu_has_feature(CPU_FTR_ARCH_31))
>> +return;
>> +
>> +printf("mmcr3  = %.16lx\n",
>> +mfspr(SPRN_MMCR3));
>> +
>> +printf("sier2  = %.16lx  sier3  = %.16lx\n",
>> +mfspr(SPRN_SIER2), mfspr(SPRN_SIER3));
> 
> Why not all on one line like many of the others?

Sure, will change this to one line

Thanks
Athira
> 
> cheers



Re: [PATCH V4 0/3] arm64: Enable vmemmap mapping from device memory

2020-07-08 Thread Anshuman Khandual



On 07/06/2020 08:26 AM, Anshuman Khandual wrote:
> This series enables vmemmap backing memory allocation from device memory
> ranges on arm64. But before that, it enables vmemmap_populate_basepages()
> and vmemmap_alloc_block_buf() to accommodate struct vmem_altmap based
> allocation requests.
> 
> This series applies on 5.8-rc4.
> 
> Changes in V4:
> 
> - Dropped 'fallback' from vmemmap_alloc_block_buf() per Catalin

Hello Andrew,

This series has been a long running one :) Now that all three patches
here have been reviewed, could you please consider this series for merging
towards 5.9-rc1? Catalin had suggested earlier [1] that it should go via
the MM tree instead, as it touches multiple architectures. Thank you.

[1] https://patchwork.kernel.org/patch/11611103/

- Anshuman


[PATCH v6 23/23] powerpc/book3s64/pkeys: Remove is_pkey_enabled()

2020-07-08 Thread Aneesh Kumar K.V
There is only one caller of this function and the function is wrongly
named. Avoid further confusion w.r.t. the name and open code it at the
only call site. Also remove read_uamor(); it has no remaining users
after this.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/book3s64/pkeys.c | 31 +++
 1 file changed, 11 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 480ae31fad52..a4768c694720 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -251,25 +251,6 @@ static inline void write_iamr(u64 value)
mtspr(SPRN_IAMR, value);
 }
 
-static inline u64 read_uamor(void)
-{
-   return mfspr(SPRN_UAMOR);
-}
-
-static bool is_pkey_enabled(int pkey)
-{
-   u64 uamor = read_uamor();
-   u64 pkey_bits = 0x3ul << pkeyshift(pkey);
-   u64 uamor_pkey_bits = (uamor & pkey_bits);
-
-   /*
-* Both the bits in UAMOR corresponding to the key should be set or
-* reset.
-*/
-   WARN_ON(uamor_pkey_bits && (uamor_pkey_bits != pkey_bits));
-   return !!(uamor_pkey_bits);
-}
-
 static inline void init_amr(int pkey, u8 init_bits)
 {
u64 new_amr_bits = (((u64)init_bits & 0x3UL) << pkeyshift(pkey));
@@ -295,8 +276,18 @@ int __arch_set_user_pkey_access(struct task_struct *tsk, 
int pkey,
 {
u64 new_amr_bits = 0x0ul;
u64 new_iamr_bits = 0x0ul;
+   u64 pkey_bits, uamor_pkey_bits;
 
-   if (!is_pkey_enabled(pkey))
+   /*
+* Check whether the key is disabled by UAMOR.
+*/
+   pkey_bits = 0x3ul << pkeyshift(pkey);
+   uamor_pkey_bits = (default_uamor & pkey_bits);
+
+   /*
+* Both the bits in UAMOR corresponding to the key should be set
+*/
+   if (uamor_pkey_bits != pkey_bits)
return -EINVAL;
 
if (init_val & PKEY_DISABLE_EXECUTE) {
-- 
2.26.2



[PATCH v6 22/23] powerpc/selftest/ptrace-pkey: Don't update expected UAMOR value

2020-07-08 Thread Aneesh Kumar K.V
With commit 4a4a5e5d2aad ("powerpc/pkeys: key allocation/deallocation must not
change pkey registers") we no longer update UAMOR on key allocation, so don't
update the expected UAMOR value in the test.

Fixes: 4a4a5e5d2aad ("powerpc/pkeys: key allocation/deallocation must not 
change pkey registers")
Signed-off-by: Aneesh Kumar K.V 
---
 tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c b/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
index bc33d748d95b..3694613f418f 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
@@ -101,15 +101,20 @@ static int child(struct shared_info *info)
 */
info->invalid_amr = info->amr2 | (~0x0UL & ~info->expected_uamor);
 
+   /*
+* if PKEY_DISABLE_EXECUTE succeeded we should update the expected_iamr
+*/
if (disable_execute)
info->expected_iamr |= 1ul << pkeyshift(pkey1);
else
info->expected_iamr &= ~(1ul << pkeyshift(pkey1));
 
-   info->expected_iamr &= ~(1ul << pkeyshift(pkey2) | 1ul << pkeyshift(pkey3));
+   /*
+* We allocated pkey2 and pkey 3 above. Clear the IAMR bits.
+*/
+   info->expected_iamr &= ~(1ul << pkeyshift(pkey2));
+   info->expected_iamr &= ~(1ul << pkeyshift(pkey3));
 
-   info->expected_uamor |= 3ul << pkeyshift(pkey1) |
-   3ul << pkeyshift(pkey2);
/*
 * Create an IAMR value different from expected value.
 * Kernel will reject an IAMR and UAMOR change.
-- 
2.26.2



[PATCH v6 21/23] powerpc/selftest/ptrace-pkey: Update the test to mark an invalid pkey correctly

2020-07-08 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 .../selftests/powerpc/ptrace/ptrace-pkey.c| 30 ---
 1 file changed, 12 insertions(+), 18 deletions(-)

diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c b/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
index f9216c7a1829..bc33d748d95b 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
@@ -66,11 +66,6 @@ static int sys_pkey_alloc(unsigned long flags, unsigned long 
init_access_rights)
return syscall(__NR_pkey_alloc, flags, init_access_rights);
 }
 
-static int sys_pkey_free(int pkey)
-{
-   return syscall(__NR_pkey_free, pkey);
-}
-
 static int child(struct shared_info *info)
 {
unsigned long reg;
@@ -100,7 +95,11 @@ static int child(struct shared_info *info)
 
info->amr1 |= 3ul << pkeyshift(pkey1);
info->amr2 |= 3ul << pkeyshift(pkey2);
-   info->invalid_amr |= info->amr2 | 3ul << pkeyshift(pkey3);
+   /*
+* invalid amr value where we try to force write
+* things which are denied by a uamor setting.
+*/
+   info->invalid_amr = info->amr2 | (~0x0UL & ~info->expected_uamor);
 
if (disable_execute)
info->expected_iamr |= 1ul << pkeyshift(pkey1);
@@ -111,17 +110,12 @@ static int child(struct shared_info *info)
 
info->expected_uamor |= 3ul << pkeyshift(pkey1) |
3ul << pkeyshift(pkey2);
-   info->invalid_iamr |= 1ul << pkeyshift(pkey1) | 1ul << pkeyshift(pkey2);
-   info->invalid_uamor |= 3ul << pkeyshift(pkey1);
-
/*
-* We won't use pkey3. We just want a plausible but invalid key to test
-* whether ptrace will let us write to AMR bits we are not supposed to.
-*
-* This also tests whether the kernel restores the UAMOR permissions
-* after a key is freed.
+* Create an IAMR value different from expected value.
+* Kernel will reject an IAMR and UAMOR change.
 */
-   sys_pkey_free(pkey3);
+   info->invalid_iamr = info->expected_iamr | (1ul << pkeyshift(pkey1) | 1ul << pkeyshift(pkey2));
+   info->invalid_uamor = info->expected_uamor & ~(0x3ul << pkeyshift(pkey1));
 
printf("%-30s AMR: %016lx pkey1: %d pkey2: %d pkey3: %d\n",
   user_write, info->amr1, pkey1, pkey2, pkey3);
@@ -196,9 +190,9 @@ static int parent(struct shared_info *info, pid_t pid)
	PARENT_SKIP_IF_UNSUPPORTED(ret, &info->child_sync);
	PARENT_FAIL_IF(ret, &info->child_sync);
 
-   info->amr1 = info->amr2 = info->invalid_amr = regs[0];
-   info->expected_iamr = info->invalid_iamr = regs[1];
-   info->expected_uamor = info->invalid_uamor = regs[2];
+   info->amr1 = info->amr2 = regs[0];
+   info->expected_iamr = regs[1];
+   info->expected_uamor = regs[2];
 
/* Wake up child so that it can set itself up. */
	ret = prod_child(&info->child_sync);
-- 
2.26.2



[PATCH v6 19/23] powerpc/book3s64/kuap: Move UAMOR setup to key init function

2020-07-08 Thread Aneesh Kumar K.V
UAMOR values are not application-specific. The kernel initializes
its value based on different reserved keys. Remove the thread-specific
UAMOR value and don't switch the UAMOR on context switch.

Move UAMOR initialization to key initialization code and remove
thread_struct.uamor because it is not used anymore.

Before commit: 4a4a5e5d2aad ("powerpc/pkeys: key allocation/deallocation must 
not change pkey registers")
we used to update uamor based on key allocation and free.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pkeys.h |  2 ++
 arch/powerpc/include/asm/processor.h   |  1 -
 arch/powerpc/kernel/ptrace/ptrace-view.c   | 27 +++-
 arch/powerpc/kernel/smp.c  |  1 +
 arch/powerpc/mm/book3s64/hash_utils.c  |  4 +++
 arch/powerpc/mm/book3s64/pkeys.c   | 29 +-
 arch/powerpc/mm/book3s64/radix_pgtable.c   |  4 +++
 7 files changed, 44 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pkeys.h b/arch/powerpc/include/asm/book3s/64/pkeys.h
index 5b178139f3c0..b7d9f4267bcd 100644
--- a/arch/powerpc/include/asm/book3s/64/pkeys.h
+++ b/arch/powerpc/include/asm/book3s/64/pkeys.h
@@ -5,6 +5,8 @@
 
 #include 
 
+extern u64 __ro_after_init default_uamor;
+
 static inline u64 vmflag_to_pte_pkey_bits(u64 vm_flags)
 {
if (!mmu_has_feature(MMU_FTR_PKEY))
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 52a67835057a..6ac12168f1fe 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -237,7 +237,6 @@ struct thread_struct {
 #ifdef CONFIG_PPC_MEM_KEYS
unsigned long   amr;
unsigned long   iamr;
-   unsigned long   uamor;
 #endif
 #ifdef CONFIG_KVM_BOOK3S_32_HANDLER
void*   kvm_shadow_vcpu; /* KVM internal data */
diff --git a/arch/powerpc/kernel/ptrace/ptrace-view.c b/arch/powerpc/kernel/ptrace/ptrace-view.c
index caeb5822a8f4..ac7d480cb9c1 100644
--- a/arch/powerpc/kernel/ptrace/ptrace-view.c
+++ b/arch/powerpc/kernel/ptrace/ptrace-view.c
@@ -488,14 +488,21 @@ static int pkey_active(struct task_struct *target, const struct user_regset *reg
 static int pkey_get(struct task_struct *target, const struct user_regset *regset,
		    unsigned int pos, unsigned int count, void *kbuf, void __user *ubuf)
 {
+   int ret;
+
BUILD_BUG_ON(TSO(amr) + sizeof(unsigned long) != TSO(iamr));
-   BUILD_BUG_ON(TSO(iamr) + sizeof(unsigned long) != TSO(uamor));
 
if (!arch_pkeys_enabled())
return -ENODEV;
 
-   return user_regset_copyout(&pos, &count, &kbuf, &ubuf, &target->thread.amr,
-  0, ELF_NPKEY * sizeof(unsigned long));
+   ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, &target->thread.amr,
+ 0, 2 * sizeof(unsigned long));
+   if (ret)
+   return ret;
+
+   ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, &default_uamor,
+ 2 * sizeof(unsigned long), 3 * sizeof(unsigned long));
+   return ret;
 }
 
 static int pkey_set(struct task_struct *target, const struct user_regset *regset,
@@ -517,9 +524,17 @@ static int pkey_set(struct task_struct *target, const struct user_regset *regset
if (ret)
return ret;
 
-   /* UAMOR determines which bits of the AMR can be set from userspace. */
-   target->thread.amr = (new_amr & target->thread.uamor) |
-(target->thread.amr & ~target->thread.uamor);
+   /*
+* UAMOR determines which bits of the AMR can be set from userspace.
+* UAMOR value 0b11 indicates that the AMR value can be modified
+* from userspace. If the kernel is using a specific key, we avoid
+* userspace modifying the AMR value for that key by masking them
+* via UAMOR 0b00.
+*
+* Pick the AMR values for the keys that kernel is using. This
+* will be indicated by the ~default_uamor bits.
+*/
+   target->thread.amr = (new_amr & default_uamor) | (target->thread.amr & ~default_uamor);
 
return 0;
 }
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 73199470c265..8261999c7d52 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -59,6 +59,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef DEBUG
 #include 
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index eec6f4e5e481..9dfb0ceed5e3 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -1110,6 +1110,10 @@ void hash__early_init_mmu_secondary(void)
if (cpu_has_feature(CPU_FTR_ARCH_206)
&& cpu_has_feature(CPU_FTR_HVMODE))
tlbiel_all();
+
+#ifdef CONFIG_PPC_MEM_KEYS
+   mtspr(SPRN_UAMOR, default_uamor);
+#endif
 }
 #endif /* CONFIG_SMP */
 
diff --git a/arch/powerpc/mm/book3s64/pkeys.c 

[PATCH v6 18/23] powerpc/book3s64/keys/kuap: Reset AMR/IAMR values on kexec

2020-07-08 Thread Aneesh Kumar K.V
As we kexec across kernels that use AMR/IAMR for different purposes,
we need to ensure that new kernels get kexec'd with a reset value
of AMR/IAMR. For example, the new kernel can use key 0 for the kernel mapping
while the old AMR value prevents access to key 0.

This patch also removes the reset of IAMR and AMOR in kexec_sequence. The AMOR
reset is not needed, and the IAMR reset was partial (it was not done on
secondary cpus) and is made redundant by this patch.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/kexec.h | 23 ++
 arch/powerpc/include/asm/kexec.h   | 12 +++
 arch/powerpc/kernel/misc_64.S  | 14 -
 arch/powerpc/kexec/core_64.c   |  2 ++
 arch/powerpc/mm/book3s64/pgtable.c |  3 +++
 5 files changed, 40 insertions(+), 14 deletions(-)
 create mode 100644 arch/powerpc/include/asm/book3s/64/kexec.h

diff --git a/arch/powerpc/include/asm/book3s/64/kexec.h b/arch/powerpc/include/asm/book3s/64/kexec.h
new file mode 100644
index ..6b5c3a248ba2
--- /dev/null
+++ b/arch/powerpc/include/asm/book3s/64/kexec.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _ASM_POWERPC_BOOK3S_64_KEXEC_H_
+#define _ASM_POWERPC_BOOK3S_64_KEXEC_H_
+
+
+#define reset_sprs reset_sprs
+static inline void reset_sprs(void)
+{
+   if (cpu_has_feature(CPU_FTR_ARCH_206)) {
+   mtspr(SPRN_AMR, 0);
+   mtspr(SPRN_UAMOR, 0);
+   }
+
+   if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
+   mtspr(SPRN_IAMR, 0);
+   }
+
+   /*  Do we need isync()? We are going via a kexec reset */
+   isync();
+}
+
+#endif
diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index c68476818753..89f7e3462292 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -150,6 +150,18 @@ static inline void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *))
 }
 
 #endif /* CONFIG_KEXEC_CORE */
+
+#ifdef CONFIG_PPC_BOOK3S_64
+#include 
+#endif
+
+#ifndef reset_sprs
+#define reset_sprs reset_sprs
+static inline void reset_sprs(void)
+{
+}
+#endif
+
 #endif /* ! __ASSEMBLY__ */
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_KEXEC_H */
diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index 1864605eca29..7bb46ad98207 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -413,20 +413,6 @@ _GLOBAL(kexec_sequence)
li  r0,0
std r0,16(r1)
 
-BEGIN_FTR_SECTION
-   /*
-* This is the best time to turn AMR/IAMR off.
-* key 0 is used in radix for supervisor<->user
-* protection, but on hash key 0 is reserved
-* ideally we want to enter with a clean state.
-* NOTE, we rely on r0 being 0 from above.
-*/
-   mtspr   SPRN_IAMR,r0
-BEGIN_FTR_SECTION_NESTED(42)
-   mtspr   SPRN_AMOR,r0
-END_FTR_SECTION_NESTED_IFSET(CPU_FTR_HVMODE, 42)
-END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
-
/* save regs for local vars on new stack.
 * yes, we won't go back, but ...
 */
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index b4184092172a..8a449b2d8715 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -152,6 +152,8 @@ static void kexec_smp_down(void *arg)
if (ppc_md.kexec_cpu_down)
ppc_md.kexec_cpu_down(0, 1);
 
+   reset_sprs();
+
kexec_smp_wait();
/* NOTREACHED */
 }
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index c58ad1049909..e63fcc00744c 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -165,6 +166,8 @@ void mmu_cleanup_all(void)
radix__mmu_cleanup_all();
else if (mmu_hash_ops.hpte_clear_all)
mmu_hash_ops.hpte_clear_all();
+
+   reset_sprs();
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-- 
2.26.2



[PATCH v6 20/23] powerpc/selftest/ptrace-pkey: Rename variables to make it easier to follow code

2020-07-08 Thread Aneesh Kumar K.V
Rename variables to indicate that they hold invalid values, which we will
use to test ptrace updates of pkeys.

Signed-off-by: Aneesh Kumar K.V 
---
 .../selftests/powerpc/ptrace/ptrace-pkey.c| 26 +--
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c b/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
index bdbbbe8431e0..f9216c7a1829 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
@@ -44,7 +44,7 @@ struct shared_info {
unsigned long amr2;
 
/* AMR value that ptrace should refuse to write to the child. */
-   unsigned long amr3;
+   unsigned long invalid_amr;
 
/* IAMR value the parent expects to read from the child. */
unsigned long expected_iamr;
@@ -57,8 +57,8 @@ struct shared_info {
 * (even though they're valid ones) because userspace doesn't have
 * access to those registers.
 */
-   unsigned long new_iamr;
-   unsigned long new_uamor;
+   unsigned long invalid_iamr;
+   unsigned long invalid_uamor;
 };
 
 static int sys_pkey_alloc(unsigned long flags, unsigned long init_access_rights)
@@ -100,7 +100,7 @@ static int child(struct shared_info *info)
 
info->amr1 |= 3ul << pkeyshift(pkey1);
info->amr2 |= 3ul << pkeyshift(pkey2);
-   info->amr3 |= info->amr2 | 3ul << pkeyshift(pkey3);
+   info->invalid_amr |= info->amr2 | 3ul << pkeyshift(pkey3);
 
if (disable_execute)
info->expected_iamr |= 1ul << pkeyshift(pkey1);
@@ -111,8 +111,8 @@ static int child(struct shared_info *info)
 
info->expected_uamor |= 3ul << pkeyshift(pkey1) |
3ul << pkeyshift(pkey2);
-   info->new_iamr |= 1ul << pkeyshift(pkey1) | 1ul << pkeyshift(pkey2);
-   info->new_uamor |= 3ul << pkeyshift(pkey1);
+   info->invalid_iamr |= 1ul << pkeyshift(pkey1) | 1ul << pkeyshift(pkey2);
+   info->invalid_uamor |= 3ul << pkeyshift(pkey1);
 
/*
 * We won't use pkey3. We just want a plausible but invalid key to test
@@ -196,9 +196,9 @@ static int parent(struct shared_info *info, pid_t pid)
	PARENT_SKIP_IF_UNSUPPORTED(ret, &info->child_sync);
	PARENT_FAIL_IF(ret, &info->child_sync);
 
-   info->amr1 = info->amr2 = info->amr3 = regs[0];
-   info->expected_iamr = info->new_iamr = regs[1];
-   info->expected_uamor = info->new_uamor = regs[2];
+   info->amr1 = info->amr2 = info->invalid_amr = regs[0];
+   info->expected_iamr = info->invalid_iamr = regs[1];
+   info->expected_uamor = info->invalid_uamor = regs[2];
 
/* Wake up child so that it can set itself up. */
	ret = prod_child(&info->child_sync);
@@ -234,10 +234,10 @@ static int parent(struct shared_info *info, pid_t pid)
return ret;
 
/* Write invalid AMR value in child. */
-	ret = ptrace_write_regs(pid, NT_PPC_PKEY, &info->amr3, 1);
+	ret = ptrace_write_regs(pid, NT_PPC_PKEY, &info->invalid_amr, 1);
 	PARENT_FAIL_IF(ret, &info->child_sync);
 
-   printf("%-30s AMR: %016lx\n", ptrace_write_running, info->amr3);
+   printf("%-30s AMR: %016lx\n", ptrace_write_running, info->invalid_amr);
 
/* Wake up child so that it can verify it didn't change. */
	ret = prod_child(&info->child_sync);
@@ -249,7 +249,7 @@ static int parent(struct shared_info *info, pid_t pid)
 
/* Try to write to IAMR. */
regs[0] = info->amr1;
-   regs[1] = info->new_iamr;
+   regs[1] = info->invalid_iamr;
ret = ptrace_write_regs(pid, NT_PPC_PKEY, regs, 2);
	PARENT_FAIL_IF(!ret, &info->child_sync);
 
@@ -257,7 +257,7 @@ static int parent(struct shared_info *info, pid_t pid)
   ptrace_write_running, regs[0], regs[1]);
 
/* Try to write to IAMR and UAMOR. */
-   regs[2] = info->new_uamor;
+   regs[2] = info->invalid_uamor;
ret = ptrace_write_regs(pid, NT_PPC_PKEY, regs, 3);
	PARENT_FAIL_IF(!ret, &info->child_sync);
 
-- 
2.26.2



[PATCH v6 17/23] powerpc/book3s64/keys: Print information during boot.

2020-07-08 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/book3s64/pkeys.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index f388c8d5359a..c682eefd3fc1 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -208,6 +208,7 @@ void __init pkey_early_init_devtree(void)
 */
initial_allocation_mask |= reserved_allocation_mask;
 
+   pr_info("Enabling pkeys with max key count %d\n", num_pkey);
return;
 }
 
-- 
2.26.2



[PATCH v6 16/23] powerpc/book3s64/pkeys: Use MMU_FTR_PKEY instead of pkey_disabled static key

2020-07-08 Thread Aneesh Kumar K.V
Instead of the pkey_disabled static key, use the MMU feature MMU_FTR_PKEY.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pkeys.h |  2 +-
 arch/powerpc/include/asm/pkeys.h   | 14 ++
 arch/powerpc/mm/book3s64/pkeys.c   | 17 +++--
 3 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pkeys.h b/arch/powerpc/include/asm/book3s/64/pkeys.h
index 8174662a9173..5b178139f3c0 100644
--- a/arch/powerpc/include/asm/book3s/64/pkeys.h
+++ b/arch/powerpc/include/asm/book3s/64/pkeys.h
@@ -7,7 +7,7 @@
 
 static inline u64 vmflag_to_pte_pkey_bits(u64 vm_flags)
 {
-	if (static_branch_likely(&pkey_disabled))
+   if (!mmu_has_feature(MMU_FTR_PKEY))
return 0x0UL;
 
if (radix_enabled())
diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index f44a14d64d47..a7951049e129 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -11,7 +11,6 @@
 #include 
 #include 
 
-DECLARE_STATIC_KEY_FALSE(pkey_disabled);
 extern int num_pkey;
 extern u32 reserved_allocation_mask; /* bits set for reserved keys */
 
@@ -38,7 +37,7 @@ static inline u64 pkey_to_vmflag_bits(u16 pkey)
 
 static inline int vma_pkey(struct vm_area_struct *vma)
 {
-	if (static_branch_likely(&pkey_disabled))
+   if (!mmu_has_feature(MMU_FTR_PKEY))
return 0;
return (vma->vm_flags & ARCH_VM_PKEY_FLAGS) >> VM_PKEY_SHIFT;
 }
@@ -93,9 +92,8 @@ static inline int mm_pkey_alloc(struct mm_struct *mm)
u32 all_pkeys_mask = (u32)(~(0x0));
int ret;
 
-	if (static_branch_likely(&pkey_disabled))
+   if (!mmu_has_feature(MMU_FTR_PKEY))
return -1;
-
/*
 * Are we out of pkeys? We must handle this specially because ffz()
 * behavior is undefined if there are no zeros.
@@ -111,7 +109,7 @@ static inline int mm_pkey_alloc(struct mm_struct *mm)
 
 static inline int mm_pkey_free(struct mm_struct *mm, int pkey)
 {
-	if (static_branch_likely(&pkey_disabled))
+   if (!mmu_has_feature(MMU_FTR_PKEY))
return -1;
 
if (!mm_pkey_is_allocated(mm, pkey))
@@ -132,7 +130,7 @@ extern int __arch_override_mprotect_pkey(struct vm_area_struct *vma,
 static inline int arch_override_mprotect_pkey(struct vm_area_struct *vma,
  int prot, int pkey)
 {
-	if (static_branch_likely(&pkey_disabled))
+   if (!mmu_has_feature(MMU_FTR_PKEY))
return 0;
 
/*
@@ -150,7 +148,7 @@ extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 static inline int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
unsigned long init_val)
 {
-	if (static_branch_likely(&pkey_disabled))
+   if (!mmu_has_feature(MMU_FTR_PKEY))
return -EINVAL;
 
/*
@@ -167,7 +165,7 @@ static inline int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 
 static inline bool arch_pkeys_enabled(void)
 {
-	return !static_branch_likely(&pkey_disabled);
+   return mmu_has_feature(MMU_FTR_PKEY);
 }
 
 extern void pkey_mm_init(struct mm_struct *mm);
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index db2d0d34f515..f388c8d5359a 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -12,8 +12,6 @@
 #include 
 #include 
 
-
-DEFINE_STATIC_KEY_FALSE(pkey_disabled);
 int  num_pkey; /* Max number of pkeys supported */
 /*
  *  Keys marked in the reservation list cannot be allocated by  userspace
@@ -126,7 +124,6 @@ void __init pkey_early_init_devtree(void)
pkeys_total = scan_pkey_feature();
if (!pkeys_total) {
/* No support for pkey. Mark it disabled */
-		static_branch_enable(&pkey_disabled);
return;
}
 
@@ -216,7 +213,7 @@ void __init pkey_early_init_devtree(void)
 
 void pkey_mm_init(struct mm_struct *mm)
 {
-	if (static_branch_likely(&pkey_disabled))
+   if (!mmu_has_feature(MMU_FTR_PKEY))
return;
mm_pkey_allocation_map(mm) = initial_allocation_mask;
mm->context.execute_only_pkey = execute_only_key;
@@ -320,7 +317,7 @@ int __arch_set_user_pkey_access(struct task_struct *tsk, 
int pkey,
 
 void thread_pkey_regs_save(struct thread_struct *thread)
 {
-	if (static_branch_likely(&pkey_disabled))
+   if (!mmu_has_feature(MMU_FTR_PKEY))
return;
 
/*
@@ -334,7 +331,7 @@ void thread_pkey_regs_save(struct thread_struct *thread)
 void thread_pkey_regs_restore(struct thread_struct *new_thread,
  struct thread_struct *old_thread)
 {
-	if (static_branch_likely(&pkey_disabled))
+   if (!mmu_has_feature(MMU_FTR_PKEY))
return;
 
if (old_thread->amr != new_thread->amr)
@@ -347,7 +344,7 @@ void thread_pkey_regs_restore(struct 

[PATCH v6 15/23] powerpc/book3s64/pkeys: Use pkey_execute_disable_supported

2020-07-08 Thread Aneesh Kumar K.V
Use pkey_execute_disable_supported to check for execute key support instead
of pkey_disabled.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/pkeys.h | 10 +-
 arch/powerpc/mm/book3s64/pkeys.c |  6 +++---
 2 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index dd32e30f6767..f44a14d64d47 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -126,15 +126,7 @@ static inline int mm_pkey_free(struct mm_struct *mm, int pkey)
  * Try to dedicate one of the protection keys to be used as an
  * execute-only protection key.
  */
-extern int __execute_only_pkey(struct mm_struct *mm);
-static inline int execute_only_pkey(struct mm_struct *mm)
-{
-	if (static_branch_likely(&pkey_disabled))
-   return -1;
-
-   return __execute_only_pkey(mm);
-}
-
+extern int execute_only_pkey(struct mm_struct *mm);
 extern int __arch_override_mprotect_pkey(struct vm_area_struct *vma,
 int prot, int pkey);
 static inline int arch_override_mprotect_pkey(struct vm_area_struct *vma,
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index c795a28c1964..db2d0d34f515 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -23,7 +23,6 @@ u32 reserved_allocation_mask __ro_after_init;
 /* Bits set for the initially allocated keys */
 static u32 initial_allocation_mask __ro_after_init;
 
-static bool pkey_execute_disable_supported;
 /*
  * Even if we allocate keys with sys_pkey_alloc(), we need to make sure
  * other thread still find the access denied using the same keys.
@@ -38,6 +37,7 @@ static u64 default_uamor = ~0x0UL;
  * We pick key 2 because 0 is special key and 1 is reserved as per ISA.
  */
 static int execute_only_key = 2;
+static bool pkey_execute_disable_supported;
 
 
 #define AMR_BITS_PER_PKEY 2
@@ -151,7 +151,7 @@ void __init pkey_early_init_devtree(void)
num_pkey = pkeys_total;
 #endif
 
-   if (unlikely(num_pkey <= execute_only_key)) {
+	if (unlikely(num_pkey <= execute_only_key) || !pkey_execute_disable_supported) {
/*
 * Insufficient number of keys to support
 * execute only key. Mark it unavailable.
@@ -359,7 +359,7 @@ void thread_pkey_regs_init(struct thread_struct *thread)
write_uamor(default_uamor);
 }
 
-int __execute_only_pkey(struct mm_struct *mm)
+int execute_only_pkey(struct mm_struct *mm)
 {
return mm->context.execute_only_pkey;
 }
-- 
2.26.2



[PATCH v6 14/23] powerpc/book3s64/kuep: Add MMU_FTR_KUEP

2020-07-08 Thread Aneesh Kumar K.V
This will be used to enable/disable Kernel Userspace Execution
Prevention (KUEP).

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/mmu.h   | 8 
 arch/powerpc/mm/book3s64/radix_pgtable.c | 4 +++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index 88aed01fad81..df767315ec8c 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -28,6 +28,10 @@
  * Individual features below.
  */
 
+/*
+ * Support for KUEP feature.
+ */
+#define MMU_FTR_KUEP   ASM_CONST(0x0800)
 /*
  * Support for memory protection keys.
  */
@@ -184,6 +188,10 @@ enum {
 #ifdef CONFIG_PPC_MEM_KEYS
MMU_FTR_PKEY |
 #endif
+#ifdef CONFIG_PPC_KUEP
+   MMU_FTR_KUEP |
+#endif /* CONFIG_PPC_KUEP */
+
0,
 };
 
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index bb00e0cba119..6d814c9bb4bf 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -519,8 +519,10 @@ void setup_kuep(bool disabled)
if (disabled || !early_radix_enabled())
return;
 
-   if (smp_processor_id() == boot_cpuid)
+   if (smp_processor_id() == boot_cpuid) {
pr_info("Activating Kernel Userspace Execution Prevention\n");
+   cur_cpu_spec->mmu_features |= MMU_FTR_KUEP;
+   }
 
/*
 * Radix always uses key0 of the IAMR to determine if an access is
-- 
2.26.2



[PATCH v6 13/23] powerpc/book3s64/pkeys: Add MMU_FTR_PKEY

2020-07-08 Thread Aneesh Kumar K.V
Parse the storage-keys-related device tree entry in early_init_devtree
and enable the MMU feature MMU_FTR_PKEY if pkeys are supported.

An MMU feature is used instead of a CPU feature because this enables us
to group MMU_FTR_KUAP and MMU_FTR_PKEY in the asm feature fixup code.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/mmu.h |  6 +++
 arch/powerpc/include/asm/mmu.h   |  9 
 arch/powerpc/kernel/prom.c   |  5 +++
 arch/powerpc/mm/book3s64/pkeys.c | 52 ++--
 4 files changed, 50 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index 5393a535240c..3371ea05b7d3 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -209,6 +209,12 @@ extern int mmu_io_psize;
 void mmu_early_init_devtree(void);
 void hash__early_init_devtree(void);
 void radix__early_init_devtree(void);
+#ifdef CONFIG_PPC_MEM_KEYS
+void pkey_early_init_devtree(void);
+#else
+static inline void pkey_early_init_devtree(void) {}
+#endif
+
 extern void hash__early_init_mmu(void);
 extern void radix__early_init_mmu(void);
 static inline void __init early_init_mmu(void)
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index f4ac25d4df05..88aed01fad81 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -28,6 +28,10 @@
  * Individual features below.
  */
 
+/*
+ * Support for memory protection keys.
+ */
+#define MMU_FTR_PKEY   ASM_CONST(0x1000)
 /*
  * Support for 68 bit VA space. We added that from ISA 2.05
  */
@@ -177,6 +181,9 @@ enum {
MMU_FTR_RADIX_KUAP |
 #endif /* CONFIG_PPC_KUAP */
 #endif /* CONFIG_PPC_RADIX_MMU */
+#ifdef CONFIG_PPC_MEM_KEYS
+   MMU_FTR_PKEY |
+#endif
0,
 };
 
@@ -356,6 +363,8 @@ extern void setup_initial_memory_limit(phys_addr_t first_memblock_base,
   phys_addr_t first_memblock_size);
 static inline void mmu_early_init_devtree(void) { }
 
+static inline void pkey_early_init_devtree(void) {}
+
 extern void *abatron_pteptrs[2];
 #endif /* __ASSEMBLY__ */
 #endif
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 9cc49f265c86..4cb65fd5f532 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -815,6 +815,11 @@ void __init early_init_devtree(void *params)
/* Now try to figure out if we are running on LPAR and so on */
pseries_probe_fw_features();
 
+   /*
+* Initialize pkey features and default AMR/IAMR values
+*/
+   pkey_early_init_devtree();
+
 #ifdef CONFIG_PPC_PS3
/* Identify PS3 firmware */
if (of_flat_dt_is_compatible(of_get_flat_dt_root(), "sony,ps3"))
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index ac272166e5b4..c795a28c1964 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -10,7 +10,8 @@
 #include 
 #include 
 #include 
-#include 
+#include 
+
 
 DEFINE_STATIC_KEY_FALSE(pkey_disabled);
 int  num_pkey; /* Max number of pkeys supported */
@@ -46,31 +47,38 @@ static int execute_only_key = 2;
 #define PKEY_REG_BITS (sizeof(u64) * 8)
 #define pkeyshift(pkey) (PKEY_REG_BITS - ((pkey+1) * AMR_BITS_PER_PKEY))
 
+static int __init dt_scan_storage_keys(unsigned long node,
+  const char *uname, int depth,
+  void *data)
+{
+   const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
+   const __be32 *prop;
+   int *pkeys_total = (int *) data;
+
+   /* We are scanning "cpu" nodes only */
+   if (type == NULL || strcmp(type, "cpu") != 0)
+   return 0;
+
+   prop = of_get_flat_dt_prop(node, "ibm,processor-storage-keys", NULL);
+   if (!prop)
+   return 0;
+   *pkeys_total = be32_to_cpu(prop[0]);
+   return 1;
+}
+
 static int scan_pkey_feature(void)
 {
-   u32 vals[2];
+   int ret;
int pkeys_total = 0;
-   struct device_node *cpu;
 
/*
 * Pkey is not supported with Radix translation.
 */
-   if (radix_enabled())
-   return 0;
-
-   cpu = of_find_node_by_type(NULL, "cpu");
-   if (!cpu)
+   if (early_radix_enabled())
return 0;
 
-	if (of_property_read_u32_array(cpu,
-				       "ibm,processor-storage-keys", vals, 2) == 0) {
-   /*
-* Since any pkey can be used for data or execute, we will
-* just treat all keys as equal and track them as one entity.
-*/
-   pkeys_total = vals[0];
-   } else {
-
+	ret = of_scan_flat_dt(dt_scan_storage_keys, &pkeys_total);
+   if (ret == 0) {
/*
 * Let's assume 32 pkeys on P8/P9 bare metal, if its not 

[PATCH v6 12/23] powerpc/book3s64/pkeys: Mark all the pkeys above max pkey as reserved

2020-07-08 Thread Aneesh Kumar K.V
The hypervisor can return fewer than the maximum number of pkeys (for
example, 31 instead of 32). We should mark all the pkeys above the maximum
as reserved so that we avoid userspace allocating a wrong pkey (for
example, key 31 in the above case).
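The reservation loop added by this patch can be modeled in user space as
follows. This is a hypothetical sketch (not kernel code) using the
pkeyshift/AMR_BITS_PER_PKEY definitions from this series: every key from
num_pkey up to 32 is set in the reserved mask and has its 2-bit field
cleared in UAMOR.

```c
#include <assert.h>
#include <stdint.h>

#define AMR_BITS_PER_PKEY 2
#define PKEY_REG_BITS 64
#define pkeyshift(pkey) (PKEY_REG_BITS - (((pkey) + 1) * AMR_BITS_PER_PKEY))

/* Mark keys [num_pkey, 32) reserved and strip them from UAMOR. */
static uint32_t reserved_mask_for(int num_pkey, uint64_t *uamor)
{
	uint32_t reserved = 0;

	for (int i = num_pkey; i < 32; i++) {
		reserved |= 1u << i;
		*uamor &= ~(3ULL << pkeyshift(i));
	}
	return reserved;
}
```

For the example in the commit message (hypervisor reports 31 keys), only
key 31 is added to the reserved mask, and its UAMOR field (the low two bits
of the register under this layout) is cleared.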

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/book3s64/pkeys.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index ecbbf548e08f..ac272166e5b4 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -189,9 +189,10 @@ static int pkey_initialize(void)
 
/*
 * Prevent the usage of OS reserved keys. Update UAMOR
-* for those keys.
+* for those keys. Also mark the rest of the bits in the
+* 32 bit mask as reserved.
 */
-   for (i = num_pkey; i < pkeys_total; i++) {
+   for (i = num_pkey; i < 32 ; i++) {
reserved_allocation_mask |= (0x1 << i);
default_uamor &= ~(0x3ul << pkeyshift(i));
}
-- 
2.26.2



[PATCH v6 11/23] powerpc/book3s64/pkeys: Make initial_allocation_mask static

2020-07-08 Thread Aneesh Kumar K.V
initial_allocation_mask is not used outside this file.
Also mark reserved_allocation_mask and initial_allocation_mask as __ro_after_init.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/pkeys.h | 1 -
 arch/powerpc/mm/book3s64/pkeys.c | 7 +--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 26b20061512f..dd32e30f6767 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -13,7 +13,6 @@
 
 DECLARE_STATIC_KEY_FALSE(pkey_disabled);
 extern int num_pkey;
-extern u32 initial_allocation_mask; /*  bits set for the initially allocated keys */
 extern u32 reserved_allocation_mask; /* bits set for reserved keys */
 
 #define ARCH_VM_PKEY_FLAGS (VM_PKEY_BIT0 | VM_PKEY_BIT1 | VM_PKEY_BIT2 | \
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 42c236bd725f..ecbbf548e08f 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -14,11 +14,14 @@
 
 DEFINE_STATIC_KEY_FALSE(pkey_disabled);
 int  num_pkey; /* Max number of pkeys supported */
-u32  initial_allocation_mask;   /* Bits set for the initially allocated keys */
 /*
  *  Keys marked in the reservation list cannot be allocated by  userspace
  */
-u32  reserved_allocation_mask;
+u32 reserved_allocation_mask __ro_after_init;
+
+/* Bits set for the initially allocated keys */
+static u32 initial_allocation_mask __ro_after_init;
+
 static bool pkey_execute_disable_supported;
 /*
  * Even if we allocate keys with sys_pkey_alloc(), we need to make sure
-- 
2.26.2



[PATCH v6 10/23] powerpc/book3s64/pkeys: Convert pkey_total to num_pkey

2020-07-08 Thread Aneesh Kumar K.V
num_pkey now represents the maximum number of keys supported, such that
keys 0 to num_pkey - 1 are exposed to userspace.
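The renamed variable is derived from the device tree total, clamped by the
PTE format. A hypothetical user-space sketch of the clamp in
pkey_initialize() (the 4K-PTE case can only represent 8 keys, per the
comment in the diff below; names here are illustrative):

```c
#include <assert.h>

/* Model of the num_pkey derivation: with a 4K PTE only 8 keys fit,
 * otherwise all device-tree-reported keys are usable. */
static int clamp_num_pkey(int pkeys_total, int has_4k_pte)
{
	if (has_4k_pte)
		return pkeys_total < 8 ? pkeys_total : 8;
	return pkeys_total;
}
```

So a POWER9 reporting 32 keys exposes keys 0–7 on 4K pages and keys 0–31
on 64K pages.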

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/pkeys.h |  7 +--
 arch/powerpc/mm/book3s64/pkeys.c | 14 +++---
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index f984bfac814a..26b20061512f 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -12,7 +12,7 @@
 #include 
 
 DECLARE_STATIC_KEY_FALSE(pkey_disabled);
-extern int pkeys_total; /* total pkeys as per device tree */
+extern int num_pkey;
 extern u32 initial_allocation_mask; /*  bits set for the initially allocated keys */
 extern u32 reserved_allocation_mask; /* bits set for reserved keys */
 
@@ -44,7 +44,10 @@ static inline int vma_pkey(struct vm_area_struct *vma)
return (vma->vm_flags & ARCH_VM_PKEY_FLAGS) >> VM_PKEY_SHIFT;
 }
 
-#define arch_max_pkey() pkeys_total
+static inline int arch_max_pkey(void)
+{
+   return num_pkey;
+}
 
 #define pkey_alloc_mask(pkey) (0x1 << pkey)
 
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index c95fb0280cd9..42c236bd725f 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -13,7 +13,7 @@
 #include 
 
 DEFINE_STATIC_KEY_FALSE(pkey_disabled);
-int  pkeys_total;  /* Total pkeys as per device tree */
+int  num_pkey; /* Max number of pkeys supported */
 u32  initial_allocation_mask;   /* Bits set for the initially allocated keys */
 /*
  *  Keys marked in the reservation list cannot be allocated by  userspace
@@ -93,7 +93,7 @@ static int scan_pkey_feature(void)
 
 static int pkey_initialize(void)
 {
-   int os_reserved, i;
+   int pkeys_total, i;
 
/*
 * We define PKEY_DISABLE_EXECUTE in addition to the arch-neutral
@@ -133,12 +133,12 @@ static int pkey_initialize(void)
 * The OS can manage only 8 pkeys due to its inability to represent them
 * in the Linux 4K PTE. Mark all other keys reserved.
 */
-   os_reserved = pkeys_total - 8;
+   num_pkey = min(8, pkeys_total);
 #else
-   os_reserved = 0;
+   num_pkey = pkeys_total;
 #endif
 
-   if (unlikely((pkeys_total - os_reserved) <= execute_only_key)) {
+   if (unlikely(num_pkey <= execute_only_key)) {
/*
 * Insufficient number of keys to support
 * execute only key. Mark it unavailable.
@@ -185,10 +185,10 @@ static int pkey_initialize(void)
default_uamor &= ~(0x3ul << pkeyshift(1));
 
/*
-* Prevent the usage of OS reserved the keys. Update UAMOR
+* Prevent the usage of OS reserved keys. Update UAMOR
 * for those keys.
 */
-   for (i = (pkeys_total - os_reserved); i < pkeys_total; i++) {
+   for (i = num_pkey; i < pkeys_total; i++) {
reserved_allocation_mask |= (0x1 << i);
default_uamor &= ~(0x3ul << pkeyshift(i));
}
-- 
2.26.2



[PATCH v6 09/23] powerpc/book3s64/pkeys: Simplify pkey disable branch

2020-07-08 Thread Aneesh Kumar K.V
Make the default value FALSE (pkeys enabled) and set it to TRUE when we
find that the total number of keys supported is zero.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/pkeys.h | 2 +-
 arch/powerpc/mm/book3s64/pkeys.c | 7 +++
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index d06ec0948964..f984bfac814a 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -11,7 +11,7 @@
 #include 
 #include 
 
-DECLARE_STATIC_KEY_TRUE(pkey_disabled);
+DECLARE_STATIC_KEY_FALSE(pkey_disabled);
 extern int pkeys_total; /* total pkeys as per device tree */
 extern u32 initial_allocation_mask; /*  bits set for the initially allocated keys */
 extern u32 reserved_allocation_mask; /* bits set for reserved keys */
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 59950d1cffc9..c95fb0280cd9 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -12,7 +12,7 @@
 #include 
 #include 
 
-DEFINE_STATIC_KEY_TRUE(pkey_disabled);
+DEFINE_STATIC_KEY_FALSE(pkey_disabled);
 int  pkeys_total;  /* Total pkeys as per device tree */
 u32  initial_allocation_mask;   /* Bits set for the initially allocated keys */
 /*
@@ -113,9 +113,8 @@ static int pkey_initialize(void)
 
/* scan the device tree for pkey feature */
pkeys_total = scan_pkey_feature();
-   if (pkeys_total)
-		static_branch_disable(&pkey_disabled);
-   else {
+   if (!pkeys_total) {
+   /* No support for pkey. Mark it disabled */
 		static_branch_enable(&pkey_disabled);
return 0;
}
-- 
2.26.2



[PATCH v6 08/23] powerpc/book3s64/pkeys: kill cpu feature key CPU_FTR_PKEY

2020-07-08 Thread Aneesh Kumar K.V
We don't use CPU_FTR_PKEY anymore. Remove the feature bit and mark it
free.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/cputable.h | 13 ++---
 arch/powerpc/kernel/dt_cpu_ftrs.c   |  6 --
 2 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index bac2252c839e..dd0a2e77a695 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -198,7 +198,7 @@ static inline void cpu_feature_keys_init(void) { }
 #define CPU_FTR_STCX_CHECKS_ADDRESSLONG_ASM_CONST(0x8000)
 #define CPU_FTR_POPCNTB			LONG_ASM_CONST(0x0001)
 #define CPU_FTR_POPCNTD			LONG_ASM_CONST(0x0002)
-#define CPU_FTR_PKEY   LONG_ASM_CONST(0x0004)
+/* LONG_ASM_CONST(0x0004) Free */
 #define CPU_FTR_VMX_COPY   LONG_ASM_CONST(0x0008)
 #define CPU_FTR_TM LONG_ASM_CONST(0x0010)
 #define CPU_FTR_CFAR   LONG_ASM_CONST(0x0020)
@@ -438,7 +438,7 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_DSCR | CPU_FTR_SAO  | CPU_FTR_ASYM_SMT | \
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_CFAR | CPU_FTR_HVMODE | \
-   CPU_FTR_VMX_COPY | CPU_FTR_HAS_PPR | CPU_FTR_DABRX | CPU_FTR_PKEY)
+   CPU_FTR_VMX_COPY | CPU_FTR_HAS_PPR | CPU_FTR_DABRX )
 #define CPU_FTRS_POWER8 (CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | CPU_FTR_ARCH_206 |\
CPU_FTR_MMCRA | CPU_FTR_SMT | \
@@ -448,7 +448,7 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_DAWR | \
-   CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_PKEY)
+   CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP )
 #define CPU_FTRS_POWER8E (CPU_FTRS_POWER8 | CPU_FTR_PMAO_BUG)
 #define CPU_FTRS_POWER9 (CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | CPU_FTR_ARCH_206 |\
@@ -459,8 +459,8 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
-   CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_PKEY | \
	CPU_FTR_P9_TLBIE_STQ_BUG | CPU_FTR_P9_TLBIE_ERAT_BUG | CPU_FTR_P9_TIDR)
+   CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_P9_TLBIE_STQ_BUG | \
+   CPU_FTR_P9_TLBIE_ERAT_BUG | CPU_FTR_P9_TIDR)
 #define CPU_FTRS_POWER9_DD2_0 (CPU_FTRS_POWER9 | CPU_FTR_P9_RADIX_PREFETCH_BUG)
 #define CPU_FTRS_POWER9_DD2_1 (CPU_FTRS_POWER9 | \
   CPU_FTR_P9_RADIX_PREFETCH_BUG | \
@@ -477,8 +477,7 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
-   CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_PKEY | \
-   CPU_FTR_ARCH_31)
+   CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_ARCH_31)
 #define CPU_FTRS_CELL  (CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
index a0edeb391e3e..3765d211b88b 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -775,12 +775,6 @@ static __init void cpufeatures_cpu_quirks(void)
}
 
update_tlbie_feature_flag(version);
-   /*
-* PKEY was not in the initial base or feature node
-* specification, but it should become optional in the next
-* cpu feature version sequence.
-*/
-   cur_cpu_spec->cpu_features |= CPU_FTR_PKEY;
 }
 
 static void __init cpufeatures_setup_finished(void)
-- 
2.26.2



[PATCH v6 07/23] powerpc/book3s64/pkeys: Prevent key 1 modification from userspace.

2020-07-08 Thread Aneesh Kumar K.V
Key 1 is marked reserved by the ISA. Set up UAMOR to prevent userspace
modification of it.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/book3s64/pkeys.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 54d4868a9e68..59950d1cffc9 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -183,6 +183,7 @@ static int pkey_initialize(void)
 * programming note.
 */
reserved_allocation_mask |= (0x1 << 1);
+   default_uamor &= ~(0x3ul << pkeyshift(1));
 
/*
 * Prevent the usage of OS reserved the keys. Update UAMOR
-- 
2.26.2
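The one-line change above clears both UAMOR bits for key 1, so userspace can no longer modify its AMR/IAMR state. A minimal standalone sketch of that bit math, mirroring the pkeyshift() definition from pkeys.c (the helper name uamor_reserve_key is hypothetical, introduced only for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Two AMR/UAMOR bits per pkey; key 0 occupies the top bits of the
 * 64-bit register, matching the definitions in pkeys.c. */
#define PKEY_REG_BITS (sizeof(uint64_t) * 8)
#define AMR_BITS_PER_PKEY 2
#define pkeyshift(pkey) (PKEY_REG_BITS - (((pkey) + 1) * AMR_BITS_PER_PKEY))

/* Clear the two UAMOR bits for a key, making that key's AMR bits
 * immutable from userspace (sketch of default_uamor &= ~(0x3ul << ...)). */
static inline uint64_t uamor_reserve_key(uint64_t uamor, int pkey)
{
	return uamor & ~(0x3ULL << pkeyshift(pkey));
}
```

With 64 register bits and 2 bits per key, key 1 sits at shift 60, so the mask clears bits 61:60 of UAMOR.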



[PATCH v6 06/23] powerpc/book3s64/pkeys: Simplify the key initialization

2020-07-08 Thread Aneesh Kumar K.V
Add documentation explaining the execute_only_key. The reservation and
initialization mask details are also explained in this patch.

No functional change in this patch.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/book3s64/pkeys.c | 199 ++-
 1 file changed, 116 insertions(+), 83 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 6ff9fe4112ef..54d4868a9e68 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -15,48 +15,80 @@
 DEFINE_STATIC_KEY_TRUE(pkey_disabled);
 int  pkeys_total;  /* Total pkeys as per device tree */
 u32  initial_allocation_mask;   /* Bits set for the initially allocated keys */
-u32  reserved_allocation_mask;  /* Bits set for reserved keys */
+/*
+ * Keys marked in the reservation list cannot be allocated by userspace
+ */
+u32  reserved_allocation_mask;
 static bool pkey_execute_disable_supported;
-static bool pkeys_devtree_defined; /* property exported by device tree */
-static u64 pkey_amr_mask;  /* Bits in AMR not to be touched */
-static u64 pkey_iamr_mask; /* Bits in AMR not to be touched */
-static u64 pkey_uamor_mask;/* Bits in UMOR not to be touched */
+/*
+ * Even if we allocate keys with sys_pkey_alloc(), we need to make sure
+ * other threads still find access denied when using the same keys.
+ */
+static u64 default_amr = ~0x0UL;
+static u64 default_iamr = 0xUL;
+
+/* Allow all keys to be modified by default */
+static u64 default_uamor = ~0x0UL;
+/*
+ * Key used to implement PROT_EXEC mmap. Denies READ/WRITE.
+ * We pick key 2 because 0 is a special key and 1 is reserved as per ISA.
+ */
 static int execute_only_key = 2;
 
+
 #define AMR_BITS_PER_PKEY 2
 #define AMR_RD_BIT 0x1UL
 #define AMR_WR_BIT 0x2UL
 #define IAMR_EX_BIT 0x1UL
-#define PKEY_REG_BITS (sizeof(u64)*8)
+#define PKEY_REG_BITS (sizeof(u64) * 8)
 #define pkeyshift(pkey) (PKEY_REG_BITS - ((pkey+1) * AMR_BITS_PER_PKEY))
 
-static void scan_pkey_feature(void)
+static int scan_pkey_feature(void)
 {
u32 vals[2];
+   int pkeys_total = 0;
struct device_node *cpu;
 
+   /*
+* Pkey is not supported with Radix translation.
+*/
+   if (radix_enabled())
+   return 0;
+
cpu = of_find_node_by_type(NULL, "cpu");
if (!cpu)
-   return;
+   return 0;
 
if (of_property_read_u32_array(cpu,
-   "ibm,processor-storage-keys", vals, 2))
-   return;
+  "ibm,processor-storage-keys", vals, 2) == 0) {
+   /*
+* Since any pkey can be used for data or execute, we will
+* just treat all keys as equal and track them as one entity.
+*/
+   pkeys_total = vals[0];
+   } else {
+
+   /*
+* Let's assume 32 pkeys on P8/P9 bare metal, if its not defined by device
+* tree. We make this exception since some version of skiboot forgot to
+* expose this property on power8/9.
+*/
+   if (!firmware_has_feature(FW_FEATURE_LPAR)) {
+   unsigned long pvr = mfspr(SPRN_PVR);
+
+   if (PVR_VER(pvr) == PVR_POWER8 || PVR_VER(pvr) == PVR_POWER8E ||
+   PVR_VER(pvr) == PVR_POWER8NVL || PVR_VER(pvr) == PVR_POWER9)
+   pkeys_total = 32;
+   }
+   }
 
/*
-* Since any pkey can be used for data or execute, we will just treat
-* all keys as equal and track them as one entity.
+* Adjust the upper limit, based on the number of bits supported by
+* arch-neutral code.
 */
-   pkeys_total = vals[0];
-   pkeys_devtree_defined = true;
-}
-
-static inline bool pkey_mmu_enabled(void)
-{
-   if (firmware_has_feature(FW_FEATURE_LPAR))
-   return pkeys_total;
-   else
-   return cpu_has_feature(CPU_FTR_PKEY);
+   pkeys_total = min_t(int, pkeys_total,
+   ((ARCH_VM_PKEY_FLAGS >> VM_PKEY_SHIFT) + 1));
+   return pkeys_total;
 }
 
 static int pkey_initialize(void)
@@ -80,35 +112,13 @@ static int pkey_initialize(void)
!= (sizeof(u64) * BITS_PER_BYTE));
 
/* scan the device tree for pkey feature */
-   scan_pkey_feature();
-
-   /*
-* Let's assume 32 pkeys on P8/P9 bare metal, if its not defined by device
-* tree. We make this exception since some version of skiboot forgot to
-* expose this property on power8/9.
-*/
-   if (!pkeys_devtree_defined && !firmware_has_feature(FW_FEATURE_LPAR)) {
-   unsigned long pvr = mfspr(SPRN_PVR);
-
-   if (PVR_VER(pvr) == PVR_POWER8 || PVR_VER(pvr) == PVR_POWER8E ||
-   PVR_VER(pvr) 

[PATCH v6 05/23] powerpc/book3s64/pkeys: Explain key 1 reservation details

2020-07-08 Thread Aneesh Kumar K.V
This explains the reservation details w.r.t. key 1.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/book3s64/pkeys.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index d69b4cfc5792..6ff9fe4112ef 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -128,7 +128,10 @@ static int pkey_initialize(void)
 #else
os_reserved = 0;
 #endif
-   /* Bits are in LE format. */
+   /*
+* key 1 is recommended not to be used. PowerISA(3.0) page 1015,
+* programming note.
+*/
reserved_allocation_mask = (0x1 << 1) | (0x1 << execute_only_key);
 
/* register mask is in BE format */
-- 
2.26.2



[PATCH v6 04/23] powerpc/book3s64/pkeys: Move pkey related bits in the linux page table

2020-07-08 Thread Aneesh Kumar K.V
To keep things simple, all the pkey-related bits are kept together in the
Linux page table for the 64K config with hash translation. With hash-4k,
the kernel requires 4 bits to store slot details. This is done by
overloading some of the RPN bits for storing the slot details. Due to this,
PKEY_BIT0 on the 4K config is used for storing hash slot details.

64K before

||RSV1| RSV2| RSV3 | RSV4 | RPN44| RPN43   | | RSV5|
|| P4 |  P3 |  P2  |  P1  | Busy | HASHPTE | |  P0 |

after

||RSV1| RSV2| RSV3 | RSV4 | RPN44 | RPN43   | | RSV5 |
|| P4 |  P3 |  P2  |  P1  | P0| HASHPTE | | Busy |

4k before

|| RSV1 | RSV2 | RSV3 | RSV4 | RPN44| RPN43 | RSV5|
|| Busy |  HASHPTE |  P2  |  P1  | F_SEC| F_GIX |  P0 |

after

|| RSV1| RSV2| RSV3 | RSV4 | Free | RPN43 | RSV5 |
|| HASHPTE |  P2 |  P1  |  P0  | F_SEC| F_GIX | BUSY |

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-4k.h  | 16 
 arch/powerpc/include/asm/book3s/64/hash-64k.h | 12 ++--
 arch/powerpc/include/asm/book3s/64/pgtable.h  | 17 -
 3 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index f889d56bf8cf..082b98808701 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -34,11 +34,11 @@
 #define H_PUD_TABLE_SIZE   (sizeof(pud_t) << H_PUD_INDEX_SIZE)
 #define H_PGD_TABLE_SIZE   (sizeof(pgd_t) << H_PGD_INDEX_SIZE)
 
-#define H_PAGE_F_GIX_SHIFT 53
-#define H_PAGE_F_SECOND_RPAGE_RPN44/* HPTE is in 2ndary HPTEG */
-#define H_PAGE_F_GIX   (_RPAGE_RPN43 | _RPAGE_RPN42 | _RPAGE_RPN41)
-#define H_PAGE_BUSY_RPAGE_RSV1 /* software: PTE & hash are busy */
-#define H_PAGE_HASHPTE _RPAGE_RSV2 /* software: PTE & hash are busy */
+#define H_PAGE_F_GIX_SHIFT _PAGE_PA_MAX
+#define H_PAGE_F_SECOND_RPAGE_PKEY_BIT0 /* HPTE is in 2ndary HPTEG */
+#define H_PAGE_F_GIX   (_RPAGE_RPN43 | _RPAGE_RPN42 | _RPAGE_RPN41)
+#define H_PAGE_BUSY_RPAGE_RSV1
+#define H_PAGE_HASHPTE _RPAGE_PKEY_BIT4
 
 /* PTE flags to conserve for HPTE identification */
 #define _PAGE_HPTEFLAGS (H_PAGE_BUSY | H_PAGE_HASHPTE | \
@@ -59,9 +59,9 @@
 /* memory key bits, only 8 keys supported */
 #define H_PTE_PKEY_BIT4 0
 #define H_PTE_PKEY_BIT3 0
-#define H_PTE_PKEY_BIT2_RPAGE_RSV3
-#define H_PTE_PKEY_BIT1_RPAGE_RSV4
-#define H_PTE_PKEY_BIT0_RPAGE_RSV5
+#define H_PTE_PKEY_BIT2_RPAGE_PKEY_BIT3
+#define H_PTE_PKEY_BIT1_RPAGE_PKEY_BIT2
+#define H_PTE_PKEY_BIT0_RPAGE_PKEY_BIT1
 
 
 /*
diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 0a15fd14cf72..f20de1149ebe 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -32,15 +32,15 @@
  */
 #define H_PAGE_COMBO   _RPAGE_RPN0 /* this is a combo 4k page */
 #define H_PAGE_4K_PFN  _RPAGE_RPN1 /* PFN is for a single 4k page */
-#define H_PAGE_BUSY_RPAGE_RPN44 /* software: PTE & hash are busy */
+#define H_PAGE_BUSY_RPAGE_RSV1 /* software: PTE & hash are busy */
 #define H_PAGE_HASHPTE _RPAGE_RPN43/* PTE has associated HPTE */
 
 /* memory key bits. */
-#define H_PTE_PKEY_BIT4_RPAGE_RSV1
-#define H_PTE_PKEY_BIT3_RPAGE_RSV2
-#define H_PTE_PKEY_BIT2_RPAGE_RSV3
-#define H_PTE_PKEY_BIT1_RPAGE_RSV4
-#define H_PTE_PKEY_BIT0_RPAGE_RSV5
+#define H_PTE_PKEY_BIT4_RPAGE_PKEY_BIT4
+#define H_PTE_PKEY_BIT3_RPAGE_PKEY_BIT3
+#define H_PTE_PKEY_BIT2_RPAGE_PKEY_BIT2
+#define H_PTE_PKEY_BIT1_RPAGE_PKEY_BIT1
+#define H_PTE_PKEY_BIT0_RPAGE_PKEY_BIT0
 
 /*
  * We need to differentiate between explicit huge page and THP huge
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 25c3cb8272c0..495fc0ccb453 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -32,11 +32,13 @@
 #define _RPAGE_SW1 0x00800
 #define _RPAGE_SW2 0x00400
 #define _RPAGE_SW3 0x00200
-#define _RPAGE_RSV10x1000UL
-#define _RPAGE_RSV20x0800UL
-#define _RPAGE_RSV30x0400UL
-#define _RPAGE_RSV40x0200UL
-#define _RPAGE_RSV50x00040UL
+#define _RPAGE_RSV10x00040UL
+
+#define _RPAGE_PKEY_BIT4   0x1000UL
+#define _RPAGE_PKEY_BIT3   0x0800UL
+#define _RPAGE_PKEY_BIT2   0x0400UL
+#define _RPAGE_PKEY_BIT1   0x0200UL
+#define _RPAGE_PKEY_BIT0   0x0100UL
 
 #define _PAGE_PTE  

[PATCH v6 03/23] powerpc/book3s64/pkeys: pkeys are supported only on hash on book3s.

2020-07-08 Thread Aneesh Kumar K.V
Move them to a hash-specific file and add BUG() for the radix path.
---
 .../powerpc/include/asm/book3s/64/hash-pkey.h | 32 
 arch/powerpc/include/asm/book3s/64/pkeys.h| 25 +
 arch/powerpc/include/asm/pkeys.h  | 37 ---
 3 files changed, 64 insertions(+), 30 deletions(-)
 create mode 100644 arch/powerpc/include/asm/book3s/64/hash-pkey.h
 create mode 100644 arch/powerpc/include/asm/book3s/64/pkeys.h

diff --git a/arch/powerpc/include/asm/book3s/64/hash-pkey.h b/arch/powerpc/include/asm/book3s/64/hash-pkey.h
new file mode 100644
index ..795010897e5d
--- /dev/null
+++ b/arch/powerpc/include/asm/book3s/64/hash-pkey.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_BOOK3S_64_HASH_PKEY_H
+#define _ASM_POWERPC_BOOK3S_64_HASH_PKEY_H
+
+static inline u64 hash__vmflag_to_pte_pkey_bits(u64 vm_flags)
+{
+   return (((vm_flags & VM_PKEY_BIT0) ? H_PTE_PKEY_BIT0 : 0x0UL) |
+   ((vm_flags & VM_PKEY_BIT1) ? H_PTE_PKEY_BIT1 : 0x0UL) |
+   ((vm_flags & VM_PKEY_BIT2) ? H_PTE_PKEY_BIT2 : 0x0UL) |
+   ((vm_flags & VM_PKEY_BIT3) ? H_PTE_PKEY_BIT3 : 0x0UL) |
+   ((vm_flags & VM_PKEY_BIT4) ? H_PTE_PKEY_BIT4 : 0x0UL));
+}
+
+static inline u64 pte_to_hpte_pkey_bits(u64 pteflags)
+{
+   return (((pteflags & H_PTE_PKEY_BIT4) ? HPTE_R_KEY_BIT4 : 0x0UL) |
+   ((pteflags & H_PTE_PKEY_BIT3) ? HPTE_R_KEY_BIT3 : 0x0UL) |
+   ((pteflags & H_PTE_PKEY_BIT2) ? HPTE_R_KEY_BIT2 : 0x0UL) |
+   ((pteflags & H_PTE_PKEY_BIT1) ? HPTE_R_KEY_BIT1 : 0x0UL) |
+   ((pteflags & H_PTE_PKEY_BIT0) ? HPTE_R_KEY_BIT0 : 0x0UL));
+}
+
+static inline u16 hash__pte_to_pkey_bits(u64 pteflags)
+{
+   return (((pteflags & H_PTE_PKEY_BIT4) ? 0x10 : 0x0UL) |
+   ((pteflags & H_PTE_PKEY_BIT3) ? 0x8 : 0x0UL) |
+   ((pteflags & H_PTE_PKEY_BIT2) ? 0x4 : 0x0UL) |
+   ((pteflags & H_PTE_PKEY_BIT1) ? 0x2 : 0x0UL) |
+   ((pteflags & H_PTE_PKEY_BIT0) ? 0x1 : 0x0UL));
+}
+
+#endif
diff --git a/arch/powerpc/include/asm/book3s/64/pkeys.h b/arch/powerpc/include/asm/book3s/64/pkeys.h
new file mode 100644
index ..8174662a9173
--- /dev/null
+++ b/arch/powerpc/include/asm/book3s/64/pkeys.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+
+#ifndef _ASM_POWERPC_BOOK3S_64_PKEYS_H
+#define _ASM_POWERPC_BOOK3S_64_PKEYS_H
+
+#include <asm/book3s/64/hash-pkey.h>
+
+static inline u64 vmflag_to_pte_pkey_bits(u64 vm_flags)
+{
+   if (static_branch_likely(&pkey_disabled))
+   return 0x0UL;
+
+   if (radix_enabled())
+   BUG();
+   return hash__vmflag_to_pte_pkey_bits(vm_flags);
+}
+
+static inline u16 pte_to_pkey_bits(u64 pteflags)
+{
+   if (radix_enabled())
+   BUG();
+   return hash__pte_to_pkey_bits(pteflags);
+}
+
+#endif /*_ASM_POWERPC_KEYS_H */
diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index fec358782c04..d06ec0948964 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -25,23 +25,18 @@ extern u32 reserved_allocation_mask; /* bits set for reserved keys */
PKEY_DISABLE_WRITE  | \
PKEY_DISABLE_EXECUTE)
 
+#ifdef CONFIG_PPC_BOOK3S_64
+#include <asm/book3s/64/pkeys.h>
+#else
+#error "Not supported"
+#endif
+
+
 static inline u64 pkey_to_vmflag_bits(u16 pkey)
 {
return (((u64)pkey << VM_PKEY_SHIFT) & ARCH_VM_PKEY_FLAGS);
 }
 
-static inline u64 vmflag_to_pte_pkey_bits(u64 vm_flags)
-{
-   if (static_branch_likely(&pkey_disabled))
-   return 0x0UL;
-
-   return (((vm_flags & VM_PKEY_BIT0) ? H_PTE_PKEY_BIT0 : 0x0UL) |
-   ((vm_flags & VM_PKEY_BIT1) ? H_PTE_PKEY_BIT1 : 0x0UL) |
-   ((vm_flags & VM_PKEY_BIT2) ? H_PTE_PKEY_BIT2 : 0x0UL) |
-   ((vm_flags & VM_PKEY_BIT3) ? H_PTE_PKEY_BIT3 : 0x0UL) |
-   ((vm_flags & VM_PKEY_BIT4) ? H_PTE_PKEY_BIT4 : 0x0UL));
-}
-
 static inline int vma_pkey(struct vm_area_struct *vma)
 {
	if (static_branch_likely(&pkey_disabled))
@@ -51,24 +46,6 @@ static inline int vma_pkey(struct vm_area_struct *vma)
 
 #define arch_max_pkey() pkeys_total
 
-static inline u64 pte_to_hpte_pkey_bits(u64 pteflags)
-{
-   return (((pteflags & H_PTE_PKEY_BIT4) ? HPTE_R_KEY_BIT4 : 0x0UL) |
-   ((pteflags & H_PTE_PKEY_BIT3) ? HPTE_R_KEY_BIT3 : 0x0UL) |
-   ((pteflags & H_PTE_PKEY_BIT2) ? HPTE_R_KEY_BIT2 : 0x0UL) |
-   ((pteflags & H_PTE_PKEY_BIT1) ? HPTE_R_KEY_BIT1 : 0x0UL) |
-   ((pteflags & H_PTE_PKEY_BIT0) ? HPTE_R_KEY_BIT0 : 0x0UL));
-}
-
-static inline u16 pte_to_pkey_bits(u64 pteflags)
-{
-   return (((pteflags & H_PTE_PKEY_BIT4) ? 0x10 : 0x0UL) |
-   ((pteflags & H_PTE_PKEY_BIT3) ? 0x8 : 0x0UL) |
-   ((pteflags & H_PTE_PKEY_BIT2) ? 0x4 : 0x0UL) |
-   ((pteflags & H_PTE_PKEY_BIT1) ? 0x2 : 0x0UL) |
-   
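The helpers moved into hash-pkey.h above scatter and gather the five software pkey bits between the Linux PTE and a plain key number. A standalone sketch of that round trip, using hypothetical contiguous bit positions (the real _RPAGE_PKEY_BIT* values are non-contiguous and differ from these):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in PTE bit positions; the real H_PTE_PKEY_BIT*
 * macros map to scattered _RPAGE_PKEY_BIT* bits in pgtable.h. */
#define H_PTE_PKEY_BIT0 (1ULL << 8)
#define H_PTE_PKEY_BIT1 (1ULL << 9)
#define H_PTE_PKEY_BIT2 (1ULL << 10)
#define H_PTE_PKEY_BIT3 (1ULL << 11)
#define H_PTE_PKEY_BIT4 (1ULL << 12)

/* Gather the five PTE pkey bits into a 5-bit key number, following the
 * same pattern as hash__pte_to_pkey_bits(). */
static inline uint16_t pte_to_pkey_bits(uint64_t pteflags)
{
	return (((pteflags & H_PTE_PKEY_BIT4) ? 0x10 : 0x0) |
		((pteflags & H_PTE_PKEY_BIT3) ? 0x8 : 0x0) |
		((pteflags & H_PTE_PKEY_BIT2) ? 0x4 : 0x0) |
		((pteflags & H_PTE_PKEY_BIT1) ? 0x2 : 0x0) |
		((pteflags & H_PTE_PKEY_BIT0) ? 0x1 : 0x0));
}

/* Inverse: scatter a key number into the PTE bit positions, following
 * the same pattern as hash__vmflag_to_pte_pkey_bits(). */
static inline uint64_t pkey_to_pte_bits(uint16_t pkey)
{
	return (((pkey & 0x10) ? H_PTE_PKEY_BIT4 : 0) |
		((pkey & 0x8) ? H_PTE_PKEY_BIT3 : 0) |
		((pkey & 0x4) ? H_PTE_PKEY_BIT2 : 0) |
		((pkey & 0x2) ? H_PTE_PKEY_BIT1 : 0) |
		((pkey & 0x1) ? H_PTE_PKEY_BIT0 : 0));
}
```

The per-bit ternaries are what let the kernel keep the PTE bit positions independent of the key-number encoding.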

[PATCH v6 02/23] powerpc/book3s64/pkeys: Fixup bit numbering

2020-07-08 Thread Aneesh Kumar K.V
This numbers the pkey bits such that they are easy to follow. PKEY_BIT0 is
the lowest-order bit. This makes further changes easy to follow.

No functional change in this patch other than linux page table for
hash translation now maps pkeys differently.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-4k.h  |  9 +++
 arch/powerpc/include/asm/book3s/64/hash-64k.h |  8 +++
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  8 +++
 arch/powerpc/include/asm/pkeys.h  | 24 +--
 4 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index 3f9ae3585ab9..f889d56bf8cf 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -57,11 +57,12 @@
 #define H_PMD_FRAG_NR  (PAGE_SIZE >> H_PMD_FRAG_SIZE_SHIFT)
 
 /* memory key bits, only 8 keys supported */
-#define H_PTE_PKEY_BIT0 0
-#define H_PTE_PKEY_BIT1 0
+#define H_PTE_PKEY_BIT4 0
+#define H_PTE_PKEY_BIT3 0
 #define H_PTE_PKEY_BIT2_RPAGE_RSV3
-#define H_PTE_PKEY_BIT3_RPAGE_RSV4
-#define H_PTE_PKEY_BIT4_RPAGE_RSV5
+#define H_PTE_PKEY_BIT1_RPAGE_RSV4
+#define H_PTE_PKEY_BIT0_RPAGE_RSV5
+
 
 /*
  * On all 4K setups, remap_4k_pfn() equates to remap_pfn_range()
diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 0729c034e56f..0a15fd14cf72 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -36,11 +36,11 @@
 #define H_PAGE_HASHPTE _RPAGE_RPN43/* PTE has associated HPTE */
 
 /* memory key bits. */
-#define H_PTE_PKEY_BIT0_RPAGE_RSV1
-#define H_PTE_PKEY_BIT1_RPAGE_RSV2
+#define H_PTE_PKEY_BIT4_RPAGE_RSV1
+#define H_PTE_PKEY_BIT3_RPAGE_RSV2
 #define H_PTE_PKEY_BIT2_RPAGE_RSV3
-#define H_PTE_PKEY_BIT3_RPAGE_RSV4
-#define H_PTE_PKEY_BIT4_RPAGE_RSV5
+#define H_PTE_PKEY_BIT1_RPAGE_RSV4
+#define H_PTE_PKEY_BIT0_RPAGE_RSV5
 
 /*
  * We need to differentiate between explicit huge page and THP huge
diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 3fa1b962dc27..58fcc959f9d5 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -86,8 +86,8 @@
 #define HPTE_R_PP0 ASM_CONST(0x8000)
 #define HPTE_R_TS  ASM_CONST(0x4000)
 #define HPTE_R_KEY_HI  ASM_CONST(0x3000)
-#define HPTE_R_KEY_BIT0ASM_CONST(0x2000)
-#define HPTE_R_KEY_BIT1ASM_CONST(0x1000)
+#define HPTE_R_KEY_BIT4ASM_CONST(0x2000)
+#define HPTE_R_KEY_BIT3ASM_CONST(0x1000)
 #define HPTE_R_RPN_SHIFT   12
 #define HPTE_R_RPN ASM_CONST(0x0000)
 #define HPTE_R_RPN_3_0 ASM_CONST(0x01fff000)
@@ -103,8 +103,8 @@
 #define HPTE_R_R   ASM_CONST(0x0100)
 #define HPTE_R_KEY_LO  ASM_CONST(0x0e00)
 #define HPTE_R_KEY_BIT2ASM_CONST(0x0800)
-#define HPTE_R_KEY_BIT3ASM_CONST(0x0400)
-#define HPTE_R_KEY_BIT4ASM_CONST(0x0200)
+#define HPTE_R_KEY_BIT1ASM_CONST(0x0400)
+#define HPTE_R_KEY_BIT0ASM_CONST(0x0200)
 #define HPTE_R_KEY (HPTE_R_KEY_LO | HPTE_R_KEY_HI)
 
 #define HPTE_V_1TB_SEG ASM_CONST(0x4000)
diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 2fe6cae14d10..fec358782c04 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -35,11 +35,11 @@ static inline u64 vmflag_to_pte_pkey_bits(u64 vm_flags)
	if (static_branch_likely(&pkey_disabled))
return 0x0UL;
 
-   return (((vm_flags & VM_PKEY_BIT0) ? H_PTE_PKEY_BIT4 : 0x0UL) |
-   ((vm_flags & VM_PKEY_BIT1) ? H_PTE_PKEY_BIT3 : 0x0UL) |
+   return (((vm_flags & VM_PKEY_BIT0) ? H_PTE_PKEY_BIT0 : 0x0UL) |
+   ((vm_flags & VM_PKEY_BIT1) ? H_PTE_PKEY_BIT1 : 0x0UL) |
((vm_flags & VM_PKEY_BIT2) ? H_PTE_PKEY_BIT2 : 0x0UL) |
-   ((vm_flags & VM_PKEY_BIT3) ? H_PTE_PKEY_BIT1 : 0x0UL) |
-   ((vm_flags & VM_PKEY_BIT4) ? H_PTE_PKEY_BIT0 : 0x0UL));
+   ((vm_flags & VM_PKEY_BIT3) ? H_PTE_PKEY_BIT3 : 0x0UL) |
+   ((vm_flags & VM_PKEY_BIT4) ? H_PTE_PKEY_BIT4 : 0x0UL));
 }
 
 static inline int vma_pkey(struct vm_area_struct *vma)
@@ -53,20 +53,20 @@ static inline int vma_pkey(struct vm_area_struct *vma)
 
 static inline u64 pte_to_hpte_pkey_bits(u64 pteflags)
 {
-   return (((pteflags & H_PTE_PKEY_BIT0) ? 

[PATCH v6 01/23] powerpc/book3s64/pkeys: Use PVR check instead of cpu feature

2020-07-08 Thread Aneesh Kumar K.V
We are wrongly using CPU_FTRS_POWER8 to check for P8 support. Instead, we
should use the PVR value. Since we are currently using CPU_FTRS_POWER8, we
also returned true for P9 with older firmware. Keep the same behavior by
checking for the P9 PVR value.

Fixes: cf43d3b26452 ("powerpc: Enable pkey subsystem")
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/book3s64/pkeys.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index ca5fcb4bff32..d69b4cfc5792 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -83,13 +83,17 @@ static int pkey_initialize(void)
scan_pkey_feature();
 
/*
-* Let's assume 32 pkeys on P8 bare metal, if its not defined by device
-* tree. We make this exception since skiboot forgot to expose this
-* property on power8.
+* Let's assume 32 pkeys on P8/P9 bare metal, if its not defined by device
+* tree. We make this exception since some version of skiboot forgot to
+* expose this property on power8/9.
 */
-   if (!pkeys_devtree_defined && !firmware_has_feature(FW_FEATURE_LPAR) &&
-   cpu_has_feature(CPU_FTRS_POWER8))
-   pkeys_total = 32;
+   if (!pkeys_devtree_defined && !firmware_has_feature(FW_FEATURE_LPAR)) {
+   unsigned long pvr = mfspr(SPRN_PVR);
+
+   if (PVR_VER(pvr) == PVR_POWER8 || PVR_VER(pvr) == PVR_POWER8E ||
+   PVR_VER(pvr) == PVR_POWER8NVL || PVR_VER(pvr) == PVR_POWER9)
+   pkeys_total = 32;
+   }
 
/*
 * Adjust the upper limit, based on the number of bits supported by
-- 
2.26.2
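The PVR check introduced above can be exercised in isolation. A sketch assuming the PVR_VER() definition and the POWER8/POWER9 processor version numbers from asm/reg.h (the values are reproduced from memory, so treat them as assumptions; the helper name default_pkeys_for_pvr is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* PVR version-field extraction and POWER8/POWER9 version numbers,
 * mirroring the asm/reg.h definitions assumed by the patch. */
#define PVR_VER(pvr)	(((pvr) >> 16) & 0xFFFF)
#define PVR_POWER8E	0x004B
#define PVR_POWER8NVL	0x004C
#define PVR_POWER8	0x004D
#define PVR_POWER9	0x004E

/* The fallback from the patch: default to 32 pkeys on P8/P9 bare metal
 * when the device tree does not define the storage-keys property. */
static inline int default_pkeys_for_pvr(uint32_t pvr)
{
	if (PVR_VER(pvr) == PVR_POWER8 || PVR_VER(pvr) == PVR_POWER8E ||
	    PVR_VER(pvr) == PVR_POWER8NVL || PVR_VER(pvr) == PVR_POWER9)
		return 32;
	return 0;
}
```

Comparing the version field rather than a CPU feature mask is what keeps the fallback from matching unrelated processors that happen to share feature bits.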



[PATCH v6 00/23] powerpc/book3s/64/pkeys: Simplify the code

2020-07-08 Thread Aneesh Kumar K.V
This patch series updates the pkey subsystem with more documentation and
renames variables so that it is easy to follow the code. We drop the changes
to support KUAP/KUEP with hash translation in this update. The changes
are adding 200 cycles to null syscalls benchmark and I want to look at that
closely before requesting a merge. The rest of the patches are included
in this series. This should avoid having to carry a large patchset across
the upstream merge. Some of the changes in here make the hash KUEP/KUAP
addition simpler.

Changes from v5:
* Address review feedback.
* Dropped patches moving kup to generic name.
* Dropped static key changes related to execute only support.

Changes from v4:
* Drop hash KUAP/KUEP changes.

Changes from v3:
* Fix build error reported by kernel test robot 

Changes from v2:
* Rebase to the latest kernel.
* Fixed a bug with disabling KUEP/KUAP on kernel command line
* Added a patch to make kup key dynamic.

Changes from v1:
* Rebased on latest kernel

Aneesh Kumar K.V (23):
  powerpc/book3s64/pkeys: Use PVR check instead of cpu feature
  powerpc/book3s64/pkeys: Fixup bit numbering
  powerpc/book3s64/pkeys: pkeys are supported only on hash on book3s.
  powerpc/book3s64/pkeys: Move pkey related bits in the linux page table
  powerpc/book3s64/pkeys: Explain key 1 reservation details
  powerpc/book3s64/pkeys: Simplify the key initialization
  powerpc/book3s64/pkeys: Prevent key 1 modification from userspace.
  powerpc/book3s64/pkeys: kill cpu feature key CPU_FTR_PKEY
  powerpc/book3s64/pkeys: Simplify pkey disable branch
  powerpc/book3s64/pkeys: Convert pkey_total to num_pkey
  powerpc/book3s64/pkeys: Make initial_allocation_mask static
  powerpc/book3s64/pkeys: Mark all the pkeys above max pkey as reserved
  powerpc/book3s64/pkeys: Add MMU_FTR_PKEY
  powerpc/book3s64/kuep: Add MMU_FTR_KUEP
  powerpc/book3s64/pkeys: Use pkey_execute_disable_supported
  powerpc/book3s64/pkeys: Use MMU_FTR_PKEY instead of pkey_disabled
static key
  powerpc/book3s64/keys: Print information during boot.
  powerpc/book3s64/keys/kuap: Reset AMR/IAMR values on kexec
  powerpc/book3s64/kuap: Move UAMOR setup to key init function
  powerpc/selftest/ptrave-pkey: Rename variables to make it easier to
follow code
  powerpc/selftest/ptrace-pkey: Update the test to mark an invalid pkey
correctly
  powerpc/selftest/ptrace-pkey: Don't update expected UAMOR value
  powerpc/book3s64/pkeys: Remove is_pkey_enabled()

 arch/powerpc/include/asm/book3s/64/hash-4k.h  |  21 +-
 arch/powerpc/include/asm/book3s/64/hash-64k.h |  12 +-
 .../powerpc/include/asm/book3s/64/hash-pkey.h |  32 ++
 arch/powerpc/include/asm/book3s/64/kexec.h|  23 ++
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |   8 +-
 arch/powerpc/include/asm/book3s/64/mmu.h  |   6 +
 arch/powerpc/include/asm/book3s/64/pgtable.h  |  17 +-
 arch/powerpc/include/asm/book3s/64/pkeys.h|  27 ++
 arch/powerpc/include/asm/cputable.h   |  13 +-
 arch/powerpc/include/asm/kexec.h  |  12 +
 arch/powerpc/include/asm/mmu.h|  17 +
 arch/powerpc/include/asm/pkeys.h  |  65 +---
 arch/powerpc/include/asm/processor.h  |   1 -
 arch/powerpc/kernel/dt_cpu_ftrs.c |   6 -
 arch/powerpc/kernel/misc_64.S |  14 -
 arch/powerpc/kernel/prom.c|   5 +
 arch/powerpc/kernel/ptrace/ptrace-view.c  |  27 +-
 arch/powerpc/kernel/smp.c |   1 +
 arch/powerpc/kexec/core_64.c  |   2 +
 arch/powerpc/mm/book3s64/hash_utils.c |   4 +
 arch/powerpc/mm/book3s64/pgtable.c|   3 +
 arch/powerpc/mm/book3s64/pkeys.c  | 294 ++
 arch/powerpc/mm/book3s64/radix_pgtable.c  |   8 +-
 .../selftests/powerpc/ptrace/ptrace-pkey.c|  55 ++--
 24 files changed, 404 insertions(+), 269 deletions(-)
 create mode 100644 arch/powerpc/include/asm/book3s/64/hash-pkey.h
 create mode 100644 arch/powerpc/include/asm/book3s/64/kexec.h
 create mode 100644 arch/powerpc/include/asm/book3s/64/pkeys.h

-- 
2.26.2



[powerpc:merge] BUILD SUCCESS 71d6070a8e0e0a1ed82365544f97b86475cb161e

2020-07-08 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge
branch HEAD: 71d6070a8e0e0a1ed82365544f97b86475cb161e  Automatic merge of 'master', 'next' and 'fixes' (2020-07-08 23:12)

elapsed time: 828m

configs tested: 113
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

arm defconfig
arm  allyesconfig
arm  allmodconfig
arm   allnoconfig
arm64allyesconfig
arm64   defconfig
arm64allmodconfig
arm64 allnoconfig
xtensa virt_defconfig
pariscallnoconfig
mips  pic32mzda_defconfig
arm   netwinder_defconfig
openrisc simple_smp_defconfig
powerpc  pasemi_defconfig
armcerfcube_defconfig
xtensa  iss_defconfig
sh   se7343_defconfig
arm  ixp4xx_defconfig
sh   se7780_defconfig
m68k alldefconfig
arc  axs103_smp_defconfig
arm  prima2_defconfig
sh   sh2007_defconfig
mips  maltaaprp_defconfig
i386  allnoconfig
i386 allyesconfig
i386defconfig
i386  debian-10.3
ia64 allmodconfig
ia64defconfig
ia64  allnoconfig
ia64 allyesconfig
m68k allmodconfig
m68k  allnoconfig
m68k   sun3_defconfig
m68kdefconfig
m68k allyesconfig
nios2   defconfig
nios2allyesconfig
openriscdefconfig
c6x  allyesconfig
c6x   allnoconfig
openrisc allyesconfig
nds32   defconfig
nds32 allnoconfig
csky allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
xtensa   allyesconfig
h8300allyesconfig
h8300allmodconfig
xtensa  defconfig
arc defconfig
arc  allyesconfig
sh   allmodconfig
shallnoconfig
microblazeallnoconfig
mips allyesconfig
mips  allnoconfig
mips allmodconfig
parisc  defconfig
parisc   allyesconfig
parisc   allmodconfig
powerpc defconfig
powerpc  allyesconfig
powerpc  rhel-kconfig
powerpc  allmodconfig
powerpc   allnoconfig
i386 randconfig-a002-20200708
i386 randconfig-a001-20200708
i386 randconfig-a006-20200708
i386 randconfig-a005-20200708
i386 randconfig-a004-20200708
i386 randconfig-a003-20200708
i386 randconfig-a011-20200708
i386 randconfig-a015-20200708
i386 randconfig-a014-20200708
i386 randconfig-a016-20200708
i386 randconfig-a012-20200708
i386 randconfig-a013-20200708
x86_64   randconfig-a001-20200708
x86_64   randconfig-a006-20200708
x86_64   randconfig-a003-20200708
x86_64   randconfig-a002-20200708
x86_64   randconfig-a004-20200708
x86_64   randconfig-a005-20200708
riscvallyesconfig
riscv allnoconfig
riscv   defconfig
riscvallmodconfig
s390 allyesconfig
s390  allnoconfig
s390 allmodconfig
s390defconfig
sparcallyesconfig
sparc   defconfig
sparc64 defconfig
sparc64   allnoconfig
sparc64  allyesconfig
sparc64

[powerpc:next-test] BUILD SUCCESS 18fcf96cb354eb003297e14b1e19c1f7c067c49b

2020-07-08 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test
branch HEAD: 18fcf96cb354eb003297e14b1e19c1f7c067c49b  powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32

elapsed time: 722m

configs tested: 109
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

arm defconfig
arm  allyesconfig
arm  allmodconfig
arm   allnoconfig
arm64allyesconfig
arm64   defconfig
arm64allmodconfig
arm64 allnoconfig
powerpc  pasemi_defconfig
armcerfcube_defconfig
xtensa  iss_defconfig
sh   se7343_defconfig
arm  ixp4xx_defconfig
sh   se7780_defconfig
m68k alldefconfig
arc  axs103_smp_defconfig
arm  prima2_defconfig
sh   sh2007_defconfig
mips  maltaaprp_defconfig
i386  allnoconfig
i386 allyesconfig
i386defconfig
i386  debian-10.3
ia64 allmodconfig
ia64defconfig
ia64  allnoconfig
ia64 allyesconfig
m68k allmodconfig
m68k  allnoconfig
m68k   sun3_defconfig
m68kdefconfig
m68k allyesconfig
nios2   defconfig
nios2allyesconfig
openriscdefconfig
c6x  allyesconfig
c6x   allnoconfig
openrisc allyesconfig
nds32   defconfig
nds32 allnoconfig
csky allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
xtensa   allyesconfig
h8300allyesconfig
h8300allmodconfig
xtensa  defconfig
arc defconfig
arc  allyesconfig
sh   allmodconfig
shallnoconfig
microblazeallnoconfig
mips allyesconfig
mips  allnoconfig
mips allmodconfig
pariscallnoconfig
parisc  defconfig
parisc   allyesconfig
parisc   allmodconfig
powerpc defconfig
powerpc  allyesconfig
powerpc  rhel-kconfig
powerpc  allmodconfig
powerpc   allnoconfig
i386 randconfig-a002-20200708
i386 randconfig-a001-20200708
i386 randconfig-a006-20200708
i386 randconfig-a005-20200708
i386 randconfig-a004-20200708
i386 randconfig-a003-20200708
x86_64   randconfig-a001-20200708
x86_64   randconfig-a006-20200708
x86_64   randconfig-a003-20200708
x86_64   randconfig-a002-20200708
x86_64   randconfig-a004-20200708
x86_64   randconfig-a005-20200708
i386 randconfig-a011-20200708
i386 randconfig-a015-20200708
i386 randconfig-a014-20200708
i386 randconfig-a016-20200708
i386 randconfig-a012-20200708
i386 randconfig-a013-20200708
riscvallyesconfig
riscv allnoconfig
riscv   defconfig
riscvallmodconfig
s390 allyesconfig
s390  allnoconfig
s390 allmodconfig
s390defconfig
sparcallyesconfig
sparc   defconfig
sparc64 defconfig
sparc64   allnoconfig
sparc64  allyesconfig
sparc64  allmodconfig
um   allmodconfig
umallnoconfig
um   allyesconfig
um

[powerpc:fixes-test] BUILD SUCCESS 4557ac6b344b8cdf948ff8b007e8e1de34832f2e

2020-07-08 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git fixes-test
branch HEAD: 4557ac6b344b8cdf948ff8b007e8e1de34832f2e  powerpc/64s/exception: Fix 0x1500 interrupt handler crash

elapsed time: 829m

configs tested: 113
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

arm defconfig
arm  allyesconfig
arm  allmodconfig
arm   allnoconfig
arm64allyesconfig
arm64   defconfig
arm64allmodconfig
arm64 allnoconfig
xtensa virt_defconfig
parisc allnoconfig
mips  pic32mzda_defconfig
arm   netwinder_defconfig
openrisc simple_smp_defconfig
powerpc  pasemi_defconfig
arm cerfcube_defconfig
xtensa  iss_defconfig
sh   se7343_defconfig
arm  ixp4xx_defconfig
sh   se7780_defconfig
m68k alldefconfig
arc  axs103_smp_defconfig
arm  prima2_defconfig
sh   sh2007_defconfig
mips  maltaaprp_defconfig
i386  allnoconfig
i386 allyesconfig
i386defconfig
i386  debian-10.3
ia64 allmodconfig
ia64defconfig
ia64  allnoconfig
ia64 allyesconfig
m68k allmodconfig
m68k  allnoconfig
m68k   sun3_defconfig
m68k defconfig
m68k allyesconfig
nios2   defconfig
nios2 allyesconfig
openrisc defconfig
c6x  allyesconfig
c6x   allnoconfig
openrisc allyesconfig
nds32   defconfig
nds32 allnoconfig
csky allyesconfig
csky defconfig
alpha   defconfig
alpha allyesconfig
xtensa   allyesconfig
h8300 allyesconfig
h8300 allmodconfig
xtensa  defconfig
arc defconfig
arc  allyesconfig
sh   allmodconfig
sh allnoconfig
microblaze allnoconfig
mips allyesconfig
mips  allnoconfig
mips allmodconfig
parisc  defconfig
parisc   allyesconfig
parisc   allmodconfig
powerpc defconfig
powerpc  allyesconfig
powerpc  rhel-kconfig
powerpc  allmodconfig
powerpc   allnoconfig
i386 randconfig-a002-20200708
i386 randconfig-a001-20200708
i386 randconfig-a006-20200708
i386 randconfig-a005-20200708
i386 randconfig-a004-20200708
i386 randconfig-a003-20200708
x86_64   randconfig-a001-20200708
x86_64   randconfig-a006-20200708
x86_64   randconfig-a003-20200708
x86_64   randconfig-a002-20200708
x86_64   randconfig-a004-20200708
x86_64   randconfig-a005-20200708
i386 randconfig-a011-20200708
i386 randconfig-a015-20200708
i386 randconfig-a014-20200708
i386 randconfig-a016-20200708
i386 randconfig-a012-20200708
i386 randconfig-a013-20200708
riscv allyesconfig
riscv allnoconfig
riscv   defconfig
riscv allmodconfig
s390 allyesconfig
s390  allnoconfig
s390 allmodconfig
s390 defconfig
sparc allyesconfig
sparc   defconfig
sparc64 defconfig
sparc64   allnoconfig
sparc64  allyesconfig
sparc64

Re: [PATCH v2 01/10] powerpc/perf: Add support for ISA3.1 PMU SPRs

2020-07-08 Thread Athira Rajeev



> On 08-Jul-2020, at 4:32 PM, Michael Ellerman  wrote:
> 
> Athira Rajeev  writes:
> ...
>> diff --git a/arch/powerpc/perf/core-book3s.c 
>> b/arch/powerpc/perf/core-book3s.c
>> index cd6a742..5c64bd3 100644
>> --- a/arch/powerpc/perf/core-book3s.c
>> +++ b/arch/powerpc/perf/core-book3s.c
>> @@ -39,10 +39,10 @@ struct cpu_hw_events {
>>  unsigned int flags[MAX_HWEVENTS];
>>  /*
>>   * The order of the MMCR array is:
>> - *  - 64-bit, MMCR0, MMCR1, MMCRA, MMCR2
>> + *  - 64-bit, MMCR0, MMCR1, MMCRA, MMCR2, MMCR3
>>   *  - 32-bit, MMCR0, MMCR1, MMCR2
>>   */
>> -unsigned long mmcr[4];
>> +unsigned long mmcr[5];
>>  struct perf_event *limited_counter[MAX_LIMITED_HWCOUNTERS];
>>  u8  limited_hwidx[MAX_LIMITED_HWCOUNTERS];
>>  u64 alternatives[MAX_HWEVENTS][MAX_EVENT_ALTERNATIVES];
> ...
>> @@ -1310,6 +1326,10 @@ static void power_pmu_enable(struct pmu *pmu)
>>  if (!cpuhw->n_added) {
>>  mtspr(SPRN_MMCRA, cpuhw->mmcr[2] & ~MMCRA_SAMPLE_ENABLE);
>>  mtspr(SPRN_MMCR1, cpuhw->mmcr[1]);
>> +#ifdef CONFIG_PPC64
>> +if (ppmu->flags & PPMU_ARCH_310S)
>> +mtspr(SPRN_MMCR3, cpuhw->mmcr[4]);
>> +#endif /* CONFIG_PPC64 */
>>  goto out_enable;
>>  }
>> 
>> @@ -1353,6 +1373,11 @@ static void power_pmu_enable(struct pmu *pmu)
>>  if (ppmu->flags & PPMU_ARCH_207S)
>>  mtspr(SPRN_MMCR2, cpuhw->mmcr[3]);
>> 
>> +#ifdef CONFIG_PPC64
>> +if (ppmu->flags & PPMU_ARCH_310S)
>> +mtspr(SPRN_MMCR3, cpuhw->mmcr[4]);
>> +#endif /* CONFIG_PPC64 */
> 
> I don't think you need the #ifdef CONFIG_PPC64?

Hi Michael

Thanks for reviewing this series.

SPRN_MMCR3 is not defined for PPC32 and we hit build failure for 
pmac32_defconfig.
The #ifdef CONFIG_PPC64 is to address this.

Thanks
Athira


> 
> cheers
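
An aside on the quoted hunk: the bare numeric indices (mmcr[2] for MMCRA, mmcr[4] for MMCR3) are easy to mix up. A minimal, purely illustrative sketch of naming them — these enumerators are hypothetical, not the kernel's:

```c
#include <assert.h>

/* Hypothetical named indices for the 64-bit layout of cpu_hw_events.mmcr[],
 * following the order documented in the quoted comment:
 * MMCR0, MMCR1, MMCRA, MMCR2, MMCR3. */
enum mmcr_idx {
	IDX_MMCR0 = 0,
	IDX_MMCR1 = 1,
	IDX_MMCRA = 2,
	IDX_MMCR2 = 3,
	IDX_MMCR3 = 4,	/* new in ISA v3.1, written only when PPMU_ARCH_310S is set */
	NUM_MMCRS = 5,	/* i.e. unsigned long mmcr[NUM_MMCRS]; */
};
```

With names like these, the PPC64-only write would read mtspr(SPRN_MMCR3, cpuhw->mmcr[IDX_MMCR3]) and the array size tracks the enum automatically.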



Re: powerpc: Incorrect stw operand modifier in __set_pte_at

2020-07-08 Thread Segher Boessenkool
On Wed, Jul 08, 2020 at 06:16:54PM +0200, Christophe Leroy wrote:
> On 08/07/2020 at 16:45, Mathieu Desnoyers wrote:
> >Reviewing use of the patterns "Un%Xn" with lwz and stw instructions
> >(where n should be the operand number) within the Linux kernel led
> >me to spot those 2 weird cases:
> >
> >arch/powerpc/include/asm/nohash/pgtable.h:__set_pte_at()
> >
> > __asm__ __volatile__("\
> > stw%U0%X0 %2,%0\n\
> > eieio\n\
> > stw%U0%X0 %L2,%1"
> > : "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
> > : "r" (pte) : "memory");
> >
> >I would have expected the stw to be:
> >
> > stw%U1%X1 %L2,%1"
> >
> >and:
> >arch/powerpc/include/asm/book3s/32/pgtable.h:__set_pte_at()
> >
> > __asm__ __volatile__("\
> > stw%U0%X0 %2,%0\n\
> > eieio\n\
> > stw%U0%X0 %L2,%1"
> > : "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
> > : "r" (pte) : "memory");
> >
> >where I would have expected:
> >
> > stw%U1%X1 %L2,%1"
> >
> >Is it a bug or am I missing something?
> 
> Well spotted. I guess it's definitely a bug.

Yes :-)

> Introduced 12 years ago by commit 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9bf2b5cd
>  
> ("powerpc: Fixes for CONFIG_PTE_64BIT for SMP support").
> 
> It's gone unnoticed until now it seems.

Apparently it always could use offset form memory accesses?  Or even
when not, %0 and %1 are likely to use the same base register for
addressing :-)


Segher


Re: Failure to build librseq on ppc

2020-07-08 Thread Segher Boessenkool
On Wed, Jul 08, 2020 at 08:01:23PM -0400, Mathieu Desnoyers wrote:
> > > #define RSEQ_ASM_OP_CMPEQ(var, expect, label) 
> > >   \
> > > LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t" 
> > >   \
> > 
> > The way this hardcodes r17 *will* break, btw.  The compiler will not
> > likely want to use r17 as long as your code (after inlining etc.!) stays
> > small, but there is Murphy's law.
> 
> r17 is in the clobber list, so it should be ok.

What protects r17 *after* this asm statement?


Segher


Re: Failure to build librseq on ppc

2020-07-08 Thread Segher Boessenkool
On Wed, Jul 08, 2020 at 10:32:20AM -0400, Mathieu Desnoyers wrote:
> > As far as I can see, %U is mentioned in
> > https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html in the
> > powerpc subpart, at the "m" constraint.
> 
> Yep, I did notice it, but mistakenly thought it was only needed for "m<>" 
> operand,
> not "m".

Historically, "m" meant what "m<>" does now (in inline asm).  Too many
people couldn't get it right ever (on other targets -- not that the
situation was great for PowerPC, heh), so in inline asm "m" now means
"no pre-modify or post-modify".


Segher


Re: [PATCH] powerpc/64s/exception: Fix 0x1500 interrupt handler crash

2020-07-08 Thread Michael Ellerman
On Wed, 8 Jul 2020 17:49:42 +1000, Nicholas Piggin wrote:
> A typo caused the interrupt handler to branch immediately to the common
> "unknown interrupt" handler and skip the special case test for denormal
> cause.
> 
> This does not affect KVM softpatch handling (e.g., for POWER9 TM assist)
> because the KVM test was moved to common code by commit 9600f261acaa
> ("powerpc/64s/exception: Move KVM test to common code") just before this
> bug was introduced.

Applied to powerpc/fixes.

[1/1] powerpc/64s/exception: Fix 0x1500 interrupt handler crash
  https://git.kernel.org/powerpc/c/4557ac6b344b8cdf948ff8b007e8e1de34832f2e

cheers


Re: Failure to build librseq on ppc

2020-07-08 Thread Segher Boessenkool
Hi!

On Wed, Jul 08, 2020 at 10:00:01AM -0400, Mathieu Desnoyers wrote:
> >> So perhaps you have code like
> >> 
> >>  int *p;
> >>  int x;
> >>  ...
> >>  asm ("lwz %0,%1" : "=r"(x) : "m"(*p));
> > 
> > We indeed have explicit "lwz" and "stw" instructions in there.
> > 
> >> 
> >> where that last line should actually read
> >> 
> >>  asm ("lwz%X1 %0,%1" : "=r"(x) : "m"(*p));
> > 
> > Indeed, turning those into "lwzx" and "stwx" seems to fix the issue.
> > 
> > There has been some level of extra CPP macro coating around those 
> > instructions
> > to
> > support both ppc32 and ppc64 with the same assembly. So adding %X[arg] is 
> > not
> > trivial.
> > Let me see what can be done here.
> 
> I did the following changes which appear to generate valid asm.
> See attached corresponding .S output.
> 
> I grepped for uses of "m" asm operand in Linux powerpc code and noticed it's 
> pretty much
> always used with e.g. "lwz%U1%X1". I could find one blog post discussing that 
> %U is about
> update flag, and nothing about %X. Are those documented?

Historically, no machine-specific output modifiers were documented.
For GCC 10 i added a few (in
https://gcc.gnu.org/onlinedocs/gcc-10.1.0/gcc/Machine-Constraints.html#Machine-Constraints
), but not all (that user code should use!) yet.

> Although it appears to generate valid asm, I have the feeling I'm relying on 
> undocumented
> features here. :-/

It is supported for 30 years or so now.  GCC itself uses this a *lot*
internally as well.  It works, and it will work forever.

> -#define STORE_WORD "std "
> -#define LOAD_WORD  "ld "
> -#define LOADX_WORD "ldx "
> +#define STORE_WORD(arg)"std%U[" __rseq_str(arg) "]%X[" 
> __rseq_str(arg) "] "/* To memory ("m" constraint) */
> +#define LOAD_WORD(arg) "lwd%U[" __rseq_str(arg) "]%X[" __rseq_str(arg) "] "  
>   /* From memory ("m" constraint) */

That cannot work (you typoed "ld" here).

Some more advice about this code, pretty generic stuff:

The way this all uses r17 will likely not work reliably.

The way multiple asm statements are used seems to have missing
dependencies between the statements.

Don't try to work *against* the compiler.  You will not win.

Alternatively, write assembler code, if that is what you actually want
to do?  Not C code.

And don't macro-mess this: you want to be able to debug it, and you need
other people to be able to read it!


Segher
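
To make the %X discussion concrete: on powerpc, the modifier lets GCC switch to the indexed (reg+reg) mnemonic — lwzx/stwx — whenever it picks an indexed address for an "m" operand. A minimal sketch, with a plain-C fallback on other targets so it stays illustrative (the helper name is made up):

```c
#include <assert.h>

/* On powerpc, "lwz%X1" expands to "lwz" or "lwzx" depending on the addressing
 * mode GCC chose for operand 1; hardcoding "lwz" breaks whenever GCC picks an
 * indexed address.  On other targets, fall back to a plain C load. */
static int load_word(const int *p)
{
	int x;
#if defined(__powerpc__)
	__asm__("lwz%X1 %0,%1" : "=r"(x) : "m"(*p));
#else
	x = *p;
#endif
	return x;
}
```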


Re: Failure to build librseq on ppc

2020-07-08 Thread Mathieu Desnoyers


- Segher Boessenkool  wrote:
> Hi!
> 
> On Wed, Jul 08, 2020 at 10:27:27PM +1000, Michael Ellerman wrote:
> > Segher Boessenkool  writes:
> > > You'll have to show the actual failing machine code, and with enough
> > > context that we can relate this to the source code.
> > >
> > > -save-temps helps, or use -S instead of -c, etc.
> > 
> > Attached below.
> 
> Thanks!
> 
> > I think that's from:
> > 
> > #define LOAD_WORD   "ld "
> > 
> > #define RSEQ_ASM_OP_CMPEQ(var, expect, label)   
> > \
> > LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t"   
> > \
> 
> The way this hardcodes r17 *will* break, btw.  The compiler will not
> likely want to use r17 as long as your code (after inlining etc.!) stays
> small, but there is Murphy's law.

r17 is in the clobber list, so it should be ok.

> 
> Anyway...  something in rseq_str is wrong, missing %X.  This may
> have to do with the abuse of inline asm here, making a fix harder :-(

I just committed a fix which enhances the macros.

Thanks for your help!

Mathieu

> 
> 
> Segher

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH v3 0/6] powerpc: queued spinlocks and rwlocks

2020-07-08 Thread Waiman Long

On 7/8/20 7:50 PM, Waiman Long wrote:

On 7/8/20 1:10 AM, Nicholas Piggin wrote:

Excerpts from Waiman Long's message of July 8, 2020 1:33 pm:

On 7/7/20 1:57 AM, Nicholas Piggin wrote:

Yes, powerpc could certainly get more performance out of the slow
paths, and then there are a few parameters to tune.

We don't have a good alternate patching for function calls yet, but
that would be something to do for native vs pv.

And then there seem to be one or two tunable parameters we could
experiment with.

The paravirt locks may need a bit more tuning. Some simple testing
under KVM shows we might be a bit slower in some cases. Whether this
is fairness or something else I'm not sure. The current simple pv
spinlock code can do a directed yield to the lock holder CPU, whereas
the pv qspl here just does a general yield. I think we might actually
be able to change that to also support directed yield. Though I'm
not sure if this is actually the cause of the slowdown yet.

Regarding the paravirt lock, I have taken a further look into the
current PPC spinlock code. There is an equivalent of pv_wait() but no
pv_kick(). Maybe PPC doesn't really need that.

So powerpc has two types of wait, either undirected "all processors" or
directed to a specific processor which has been preempted by the
hypervisor.

The simple spinlock code does a directed wait, because it knows the CPU
which is holding the lock. In this case, there is a sequence that is
used to ensure we don't wait if the condition has become true, and the
target CPU does not need to kick the waiter; it will happen automatically
(see splpar_spin_yield). This is preferable because we only wait as
needed and don't require the kick operation.

Thanks for the explanation.


The pv spinlock code I did uses the undirected wait, because we don't
know the CPU number which we are waiting on. This is undesirable because
it's higher overhead and the wait is not so accurate.

I think perhaps we could change things so we wait on the correct CPU
when queued, which might be good enough (we could also put the lock
owner CPU in the spinlock word, if we add another format).


The LS byte of the lock word is used to indicate locking status. If we 
have less than 255 cpus, we can put the (cpu_nr + 1) into the lock 
byte. The special 0xff value can be used to indicate a cpu number >= 
255 for indirect yield. The required change to the qspinlock code will 
be minimal, I think. 


BTW, we can also keep track of the previous cpu in the waiting queue. 
Due to lock stealing, that may not be the cpu that is holding the lock. 
Maybe we can use this, if available, in case the cpu number is >= 255.


Regards,
Longman
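
The encoding described above can be sketched in a few lines. The helper names and the exact overflow cutoff are assumptions for illustration, not the qspinlock code:

```c
#include <assert.h>
#include <stdint.h>

#define OWNER_UNKNOWN 0xff	/* assumed sentinel: cpu number too large to encode */

/* Store (cpu_nr + 1) in the least-significant lock byte so that 0 still means
 * "unlocked"; OWNER_UNKNOWN flags a cpu that does not fit, forcing the waiter
 * to fall back to an undirected yield. */
static uint8_t encode_owner(unsigned int cpu_nr)
{
	if (cpu_nr + 1 >= OWNER_UNKNOWN)
		return OWNER_UNKNOWN;
	return (uint8_t)(cpu_nr + 1);
}

/* Returns the owning cpu, or -1 when the lock is free or the owner is not
 * representable (directed yield impossible). */
static int decode_owner(uint8_t lock_byte)
{
	if (lock_byte == 0 || lock_byte == OWNER_UNKNOWN)
		return -1;
	return (int)lock_byte - 1;
}
```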



Re: [PATCH v3 0/6] powerpc: queued spinlocks and rwlocks

2020-07-08 Thread Waiman Long

On 7/8/20 4:41 AM, Peter Zijlstra wrote:

On Tue, Jul 07, 2020 at 03:57:06PM +1000, Nicholas Piggin wrote:

Yes, powerpc could certainly get more performance out of the slow
paths, and then there are a few parameters to tune.

Can you clarify? The slow path is already in use on ARM64 which is weak,
so I doubt there's superfluous serialization present. And Will spent a
fair amount of time on making that thing guarantee forward progress, so
there just isn't too much room to play.


We don't have a good alternate patching for function calls yet, but
that would be something to do for native vs pv.

Going by your jump_label implementation, support for static_call should
be fairly straight forward too, no?

   https://lkml.kernel.org/r/20200624153024.794671...@infradead.org

Speaking of static_call, I am also looking forward to it. Do you have an 
idea when that will be merged?


Cheers,
Longman



Re: Failure to build librseq on ppc

2020-07-08 Thread Segher Boessenkool
Hi!

On Wed, Jul 08, 2020 at 10:27:27PM +1000, Michael Ellerman wrote:
> Segher Boessenkool  writes:
> > You'll have to show the actual failing machine code, and with enough
> > context that we can relate this to the source code.
> >
> > -save-temps helps, or use -S instead of -c, etc.
> 
> Attached below.

Thanks!

> I think that's from:
> 
> #define LOAD_WORD   "ld "
> 
> #define RSEQ_ASM_OP_CMPEQ(var, expect, label) 
>   \
> LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t" 
>   \

The way this hardcodes r17 *will* break, btw.  The compiler will not
likely want to use r17 as long as your code (after inlining etc.!) stays
small, but there is Murphy's law.

Anyway...  something in rseq_str is wrong, missing %X.  This may
have to do with the abuse of inline asm here, making a fix harder :-(


Segher


Re: [PATCH v3 0/6] powerpc: queued spinlocks and rwlocks

2020-07-08 Thread Waiman Long

On 7/8/20 4:32 AM, Peter Zijlstra wrote:

On Tue, Jul 07, 2020 at 11:33:45PM -0400, Waiman Long wrote:

 From 5d7941a498935fb225b2c7a3108cbf590114c3db Mon Sep 17 00:00:00 2001
From: Waiman Long 
Date: Tue, 7 Jul 2020 22:29:16 -0400
Subject: [PATCH 2/9] locking/pvqspinlock: Introduce
  CONFIG_PARAVIRT_QSPINLOCKS_LITE

Add a new PARAVIRT_QSPINLOCKS_LITE config option that allows
architectures to use the PV qspinlock code without the need to use or
implement a pv_kick() function, thus eliminating the atomic unlock
overhead. The non-atomic queued_spin_unlock() can be used instead.
The pv_wait() function will still be needed, but it can be a dummy
function.

With that option set, the hybrid PV queued/unfair locking code should
still be able to make it performant enough in a paravirtualized

How is this supposed to work? If there is no kick, you have no control
over who wakes up and fairness goes out the window entirely.

You don't even begin to explain...

I don't have a full understanding of how the PPC hypervisor works myself. 
Apparently, a cpu kick may not be needed.


This is just a test patch to see if it yields a better result. It is
subject to further modification.


Cheers,
Longman



Re: [PATCH v3 0/6] powerpc: queued spinlocks and rwlocks

2020-07-08 Thread Waiman Long

On 7/8/20 1:10 AM, Nicholas Piggin wrote:

Excerpts from Waiman Long's message of July 8, 2020 1:33 pm:

On 7/7/20 1:57 AM, Nicholas Piggin wrote:

Yes, powerpc could certainly get more performance out of the slow
paths, and then there are a few parameters to tune.

We don't have a good alternate patching for function calls yet, but
that would be something to do for native vs pv.

And then there seem to be one or two tunable parameters we could
experiment with.

The paravirt locks may need a bit more tuning. Some simple testing
under KVM shows we might be a bit slower in some cases. Whether this
is fairness or something else I'm not sure. The current simple pv
spinlock code can do a directed yield to the lock holder CPU, whereas
the pv qspl here just does a general yield. I think we might actually
be able to change that to also support directed yield. Though I'm
not sure if this is actually the cause of the slowdown yet.

Regarding the paravirt lock, I have taken a further look into the
current PPC spinlock code. There is an equivalent of pv_wait() but no
pv_kick(). Maybe PPC doesn't really need that.

So powerpc has two types of wait, either undirected "all processors" or
directed to a specific processor which has been preempted by the
hypervisor.

The simple spinlock code does a directed wait, because it knows the CPU
which is holding the lock. In this case, there is a sequence that is
used to ensure we don't wait if the condition has become true, and the
target CPU does not need to kick the waiter; it will happen automatically
(see splpar_spin_yield). This is preferable because we only wait as
needed and don't require the kick operation.

Thanks for the explanation.


The pv spinlock code I did uses the undirected wait, because we don't
know the CPU number which we are waiting on. This is undesirable because
it's higher overhead and the wait is not so accurate.

I think perhaps we could change things so we wait on the correct CPU
when queued, which might be good enough (we could also put the lock
owner CPU in the spinlock word, if we add another format).


The LS byte of the lock word is used to indicate locking status. If we 
have less than 255 cpus, we can put the (cpu_nr + 1) into the lock byte. 
The special 0xff value can be used to indicate a cpu number >= 255 for 
indirect yield. The required change to the qspinlock code will be 
minimal, I think.




Attached are two
additional qspinlock patches that adds a CONFIG_PARAVIRT_QSPINLOCKS_LITE
option to not require pv_kick(). There is also a fixup patch to be
applied after your patchset.

I don't have access to a PPC LPAR with shared processor at the moment,
so I can't test the performance of the paravirt code. Would you mind
adding my patches and do some performance test on your end to see if it
gives better result?

Great, I'll do some tests. Any suggestions for what to try?


I would just like to see whether it produces a better performance
result compared with your current version.


Cheers,
Longman



[PATCH 2/2] selftests/powerpc: Use proper error code to check fault address

2020-07-08 Thread Haren Myneni


ERR_NX_TRANSLATION (CSB.CC=5) is internal to VAS fault handling and
should not be used by the OS. ERR_NX_AT_FAULT (CSB.CC=250) is the
proper error code reported by the OS when NX encounters an address
translation failure.

This patch uses ERR_NX_AT_FAULT (CSB.CC=250) to determine the fault
address when the request is not successful.

Signed-off-by: Haren Myneni 
---
 tools/testing/selftests/powerpc/nx-gzip/gunz_test.c  | 4 ++--
 tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c 
b/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c
index 6ee0fde..7c23d3d 100644
--- a/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c
+++ b/tools/testing/selftests/powerpc/nx-gzip/gunz_test.c
@@ -698,13 +698,13 @@ int decompress_file(int argc, char **argv, void 
*devhandle)
 
switch (cc) {
 
-   case ERR_NX_TRANSLATION:
+   case ERR_NX_AT_FAULT:
 
/* We touched the pages ahead of time.  In the most common case
 * we shouldn't be here.  But may be some pages were paged out.
 * Kernel should have placed the faulting address to fsaddr.
 */
-   NXPRT(fprintf(stderr, "ERR_NX_TRANSLATION %p\n",
+   NXPRT(fprintf(stderr, "ERR_NX_AT_FAULT %p\n",
  (void *)cmdp->crb.csb.fsaddr));
 
if (pgfault_retries == NX_MAX_FAULTS) {
diff --git a/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c 
b/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c
index 7496a83..02dffb6 100644
--- a/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c
+++ b/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c
@@ -306,13 +306,13 @@ int compress_file(int argc, char **argv, void *handle)
lzcounts, cmdp, handle);
 
if (cc != ERR_NX_OK && cc != ERR_NX_TPBC_GT_SPBC &&
-   cc != ERR_NX_TRANSLATION) {
+   cc != ERR_NX_AT_FAULT) {
fprintf(stderr, "nx error: cc= %d\n", cc);
exit(-1);
}
 
/* Page faults are handled by the user code */
-   if (cc == ERR_NX_TRANSLATION) {
+   if (cc == ERR_NX_AT_FAULT) {
NXPRT(fprintf(stderr, "page fault: cc= %d, ", cc));
NXPRT(fprintf(stderr, "try= %d, fsa= %08llx\n",
  fault_tries,
-- 
1.8.3.1




[PATCH 1/2] powerpc/vas: Report proper error for address translation failure

2020-07-08 Thread Haren Myneni


The DMA controller uses CC=5 internally for translation fault handling. So
the OS should use CC=250 and report this error to user space when NX
encounters an address translation failure on the request buffer. This was
not an issue in earlier releases, as NX did not get faults on
kernel addresses.

This patch defines CSB_CC_ADDRESS_TRANSLATION(250) and updates
CSB.CC with this proper error code for user space.

Signed-off-by: Haren Myneni 
---
 Documentation/powerpc/vas-api.rst  | 2 +-
 arch/powerpc/include/asm/icswx.h   | 2 ++
 arch/powerpc/platforms/powernv/vas-fault.c | 2 +-
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/powerpc/vas-api.rst 
b/Documentation/powerpc/vas-api.rst
index 1217c2f..78627cc 100644
--- a/Documentation/powerpc/vas-api.rst
+++ b/Documentation/powerpc/vas-api.rst
@@ -213,7 +213,7 @@ request buffers are not in memory. The operating system 
handles the fault by
 updating CSB with the following data:
 
csb.flags = CSB_V;
-   csb.cc = CSB_CC_TRANSLATION;
+   csb.cc = CSB_CC_ADDRESS_TRANSLATION;
csb.ce = CSB_CE_TERMINATION;
csb.address = fault_address;
 
diff --git a/arch/powerpc/include/asm/icswx.h b/arch/powerpc/include/asm/icswx.h
index 965b1f3..b1c9a57 100644
--- a/arch/powerpc/include/asm/icswx.h
+++ b/arch/powerpc/include/asm/icswx.h
@@ -77,6 +77,8 @@ struct coprocessor_completion_block {
 #define CSB_CC_CHAIN   (37)
 #define CSB_CC_SEQUENCE(38)
 #define CSB_CC_HW  (39)
+/* User space address translation failure */
+#defineCSB_CC_ADDRESS_TRANSLATION  (250)
 
 #define CSB_SIZE   (0x10)
 #define CSB_ALIGN  CSB_SIZE
diff --git a/arch/powerpc/platforms/powernv/vas-fault.c 
b/arch/powerpc/platforms/powernv/vas-fault.c
index 266a6ca..33e89d4 100644
--- a/arch/powerpc/platforms/powernv/vas-fault.c
+++ b/arch/powerpc/platforms/powernv/vas-fault.c
@@ -79,7 +79,7 @@ static void update_csb(struct vas_window *window,
csb_addr = (void __user *)be64_to_cpu(crb->csb_addr);
 
memset(, 0, sizeof(csb));
-   csb.cc = CSB_CC_TRANSLATION;
+   csb.cc = CSB_CC_ADDRESS_TRANSLATION;
csb.ce = CSB_CE_TERMINATION;
csb.cs = 0;
csb.count = 0;
-- 
1.8.3.1
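
For user-space consumers, the visible contract after this patch is just the CC value written back in the CSB. A hedged sketch of how a request handler might branch on it — the classifier helper is hypothetical; only the two CC values come from the patch:

```c
#include <assert.h>
#include <string.h>

#define CSB_CC_TRANSLATION		5	/* internal to the DMA controller/VAS */
#define CSB_CC_ADDRESS_TRANSLATION	250	/* reported to user space on a fault */

/* Hypothetical classifier: only CC=250 means "touch the faulting address and
 * resubmit"; CC=5 should never reach user space once the OS reports 250. */
static const char *csb_cc_action(int cc)
{
	switch (cc) {
	case CSB_CC_ADDRESS_TRANSLATION:
		return "retry-after-touch";
	case CSB_CC_TRANSLATION:
		return "unexpected-internal";
	default:
		return "other";
	}
}
```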




Re: [PATCH v2 2/4] powerpc/mm/radix: Free PUD table when freeing pagetable

2020-07-08 Thread Reza Arbab

On Thu, Jun 25, 2020 at 12:15:45PM +0530, Aneesh Kumar K.V wrote:

remove_pagetable() isn't freeing PUD table. This causes memory
leak during memory unplug. Fix this.


This has come up before:
https://lore.kernel.org/linuxppc-dev/20190731061920.ga18...@in.ibm.com/

tl;dr, x86 intentionally does not free, and it wasn't quite clear if 
their motivation also applies to us. Probably not, but I thought it was 
worth mentioning again.


--
Reza Arbab


Re: kernel since 5.6 do not boot anymore on Apple PowerBook

2020-07-08 Thread Christophe Leroy




On 08/07/2020 at 19:36, Giuseppe Sacco wrote:

Hi Christophe,

On Wed, 08/07/2020 at 19.09 +0200, Christophe Leroy wrote:

Hi

On 08/07/2020 at 19:00, Giuseppe Sacco wrote:

Hello,
while trying to debug a problem using git bisect, I am now at a point
where I cannot build the kernel at all. This is the error message I
get:

$ LANG=C make ARCH=powerpc \
   CROSS_COMPILE=powerpc-linux- \
   CONFIG_MODULE_COMPRESS_GZIP=true \
   INSTALL_MOD_STRIP=1 CONFIG_MODULE_COMPRESS=1 \
   -j4 INSTALL_MOD_PATH=$BOOT INSTALL_PATH=$BOOT \
   CONFIG_DEBUG_INFO_COMPRESSED=1 \
   install modules_install
make[2]: *** No rule to make target 'vmlinux', needed by


Surprising.

Did you make any change to Makefiles ?


No


Are you in the middle of a bisect? If so, if the previous builds
worked, I'd do 'git bisect skip'


Yes, the previous one worked.


What's the result with:

LANG=C make ARCH=powerpc CROSS_COMPILE=powerpc-linux- vmlinux


$ LANG=C make ARCH=powerpc CROSS_COMPILE=powerpc-linux- vmlinux
   CALLscripts/checksyscalls.sh
   CALLscripts/atomic/check-atomics.sh
   CHK include/generated/compile.h
   CC  kernel/module.o
kernel/module.c: In function 'do_init_module':
kernel/module.c:3593:2: error: implicit declaration of function
'module_enable_ro'; did you mean 'module_enable_x'? [-Werror=implicit-
function-declaration]
  3593 |  module_enable_ro(mod, true);
   |  ^~~~
   |  module_enable_x
cc1: some warnings being treated as errors
make[1]: *** [scripts/Makefile.build:267: kernel/module.o] Error 1
make: *** [Makefile:1735: kernel] Error 2

So, should I 'git bisect skip'?


Ah yes, I had the exact same problem last time I bisected.

So yes, do 'git bisect skip'. You'll probably hit this problem half a
dozen times, but in the end you should get a useful bisect anyway.


Christophe


Re: [PATCH 3/3] misc: cxl: flash: Remove unused variable 'drc_index'

2020-07-08 Thread kernel test robot
Hi Lee,

I love your patch! Yet something to improve:

[auto build test ERROR on char-misc/char-misc-testing]
[also build test ERROR on soc/for-next linux/master linus/master v5.8-rc4 
next-20200708]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use  as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Lee-Jones/Mop-up-last-remaining-patches-for-Misc/20200708-205913
base:   https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git 
8ab11d705c3b33ae4c6ca05eefaf025b7c5dbeaf
config: powerpc-defconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross 
ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   drivers/misc/cxl/flash.c: In function 'update_devicetree':
>> drivers/misc/cxl/flash.c:216:6: error: value computed is not used 
>> [-Werror=unused-value]
 216 |  *data++;
 |  ^~~
   cc1: all warnings being treated as errors

vim +216 drivers/misc/cxl/flash.c

   172  
   173  static int update_devicetree(struct cxl *adapter, s32 scope)
   174  {
   175  struct update_nodes_workarea *unwa;
   176  u32 action, node_count;
   177  int token, rc, i;
   178  __be32 *data, phandle;
   179  char *buf;
   180  
   181  token = rtas_token("ibm,update-nodes");
   182  if (token == RTAS_UNKNOWN_SERVICE)
   183  return -EINVAL;
   184  
   185  buf = kzalloc(RTAS_DATA_BUF_SIZE, GFP_KERNEL);
   186  if (!buf)
   187  return -ENOMEM;
   188  
   189  unwa = (struct update_nodes_workarea *)[0];
   190  unwa->unit_address = cpu_to_be64(adapter->guest->handle);
   191  do {
   192  rc = rcall(token, buf, scope);
   193  if (rc && rc != 1)
   194  break;
   195  
   196  data = (__be32 *)buf + 4;
   197  while (be32_to_cpu(*data) & NODE_ACTION_MASK) {
   198  action = be32_to_cpu(*data) & NODE_ACTION_MASK;
   199  node_count = be32_to_cpu(*data) & 
NODE_COUNT_MASK;
   200  pr_devel("device reconfiguration - action: %#x, 
nodes: %#x\n",
   201   action, node_count);
   202  data++;
   203  
   204  for (i = 0; i < node_count; i++) {
   205  phandle = *data++;
   206  
   207  switch (action) {
   208  case OPCODE_DELETE:
   209  /* nothing to do */
   210  break;
   211  case OPCODE_UPDATE:
   212  update_node(phandle, scope);
   213  break;
   214  case OPCODE_ADD:
   215  /* nothing to do, just move 
pointer */
 > 216  *data++;
   217  break;
   218  }
   219  }
   220  }
   221  } while (rc == 1);
   222  
   223  kfree(buf);
   224  return 0;
   225  }
   226  
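
The warning is easy to reproduce outside the kernel: in *data++, the post-increment applies to the pointer and the dereference result is discarded, so the statement behaves exactly like data++ but trips -Wunused-value. A minimal sketch (the helper name is made up):

```c
#include <assert.h>

/* Skip one element of a phandle-like array: the dereference in "*data++" is
 * dead, so plain "data++" is the warning-free equivalent. */
static const unsigned int *skip_one(const unsigned int *data)
{
	data++;		/* was "*data++;", which -Wunused-value flags */
	return data;
}
```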

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip


[PATCH 1/1] powerpc: Fix incorrect stw{, ux, u, x} instructions in __set_pte_at

2020-07-08 Thread Mathieu Desnoyers
The placeholder for instruction selection should use the second
argument's operand, which is %1, not %0. This could generate incorrect
assembly code if the instruction selection for argument %0 ever differs
from argument %1.

Fixes: 9bf2b5cdc5fe ("powerpc: Fixes for CONFIG_PTE_64BIT for SMP support")
Signed-off-by: Mathieu Desnoyers 
Cc: Christophe Leroy 
Cc: Kumar Gala 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: linuxppc-dev@lists.ozlabs.org
Cc:  # v2.6.28+
---
 arch/powerpc/include/asm/book3s/32/pgtable.h | 2 +-
 arch/powerpc/include/asm/nohash/pgtable.h| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 224912432821..f1467b3c417a 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -529,7 +529,7 @@ static inline void __set_pte_at(struct mm_struct *mm, 
unsigned long addr,
__asm__ __volatile__("\
stw%U0%X0 %2,%0\n\
eieio\n\
-   stw%U0%X0 %L2,%1"
+   stw%U1%X1 %L2,%1"
: "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
: "r" (pte) : "memory");
 
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h 
b/arch/powerpc/include/asm/nohash/pgtable.h
index 4b7c3472eab1..a00e4c1746d6 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -199,7 +199,7 @@ static inline void __set_pte_at(struct mm_struct *mm, 
unsigned long addr,
__asm__ __volatile__("\
stw%U0%X0 %2,%0\n\
eieio\n\
-   stw%U0%X0 %L2,%1"
+   stw%U1%X1 %L2,%1"
: "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
: "r" (pte) : "memory");
return;
-- 
2.11.0



Re: kernel since 5.6 do not boot anymore on Apple PowerBook

2020-07-08 Thread Giuseppe Sacco
Hi Christophe,

On Wed, 08/07/2020 at 19.09 +0200, Christophe Leroy wrote:
> Hi
> 
> > On 08/07/2020 at 19:00, Giuseppe Sacco wrote:
> > Hello,
> > while trying to debug a problem using git bisect, I am now at a point
> > where I cannot build the kernel at all. This is the error message I
> > get:
> > 
> > $ LANG=C make ARCH=powerpc \
> >   CROSS_COMPILE=powerpc-linux- \
> >   CONFIG_MODULE_COMPRESS_GZIP=true \
> >   INSTALL_MOD_STRIP=1 CONFIG_MODULE_COMPRESS=1 \
> >   -j4 INSTALL_MOD_PATH=$BOOT INSTALL_PATH=$BOOT \
> >   CONFIG_DEBUG_INFO_COMPRESSED=1 \
> >   install modules_install
> > make[2]: *** No rule to make target 'vmlinux', needed by
> 
> Surprising.
> 
> Did you make any change to Makefiles ?

No

> Are you in the middle of a bisect ? If so, if the previous builds 
> worked, I'd do 'git bisect skip'

Yes, the previous one worked.

> What's the result with:
> 
> LANG=C make ARCH=powerpc CROSS_COMPILE=powerpc-linux- vmlinux

$ LANG=C make ARCH=powerpc CROSS_COMPILE=powerpc-linux- vmlinux
  CALLscripts/checksyscalls.sh
  CALLscripts/atomic/check-atomics.sh
  CHK include/generated/compile.h
  CC  kernel/module.o
kernel/module.c: In function 'do_init_module':
kernel/module.c:3593:2: error: implicit declaration of function
'module_enable_ro'; did you mean 'module_enable_x'? [-Werror=implicit-
function-declaration]
 3593 |  module_enable_ro(mod, true);
  |  ^~~~
  |  module_enable_x
cc1: some warnings being treated as errors
make[1]: *** [scripts/Makefile.build:267: kernel/module.o] Error 1
make: *** [Makefile:1735: kernel] Error 2

So, should I 'git bisect skip'?

Thank you,
Giuseppe



Re: kernel since 5.6 do not boot anymore on Apple PowerBook

2020-07-08 Thread Christophe Leroy

Hi

On 08/07/2020 at 19:00, Giuseppe Sacco wrote:

Hello,
while trying to debug a problem using git bisect, I am now at a point
where I cannot build the kernel at all. This is the error message I
get:

$ LANG=C make ARCH=powerpc \
  CROSS_COMPILE=powerpc-linux- \
  CONFIG_MODULE_COMPRESS_GZIP=true \
  INSTALL_MOD_STRIP=1 CONFIG_MODULE_COMPRESS=1 \
  -j4 INSTALL_MOD_PATH=$BOOT INSTALL_PATH=$BOOT \
  CONFIG_DEBUG_INFO_COMPRESSED=1 \
  install modules_install
make[2]: *** No rule to make target 'vmlinux', needed by
'arch/powerpc/boot/zImage.pmac'.  Stop.
make[1]: *** [arch/powerpc/Makefile:407: install] Error 2
make: *** [Makefile:328: __build_one_by_one] Error 2

How can I continue?

Thank you,
Giuseppe


Surprising.

Did you make any change to Makefiles ?

Are you in the middle of a bisect ? If so, if the previous builds
worked, I'd do 'git bisect skip'

What's the result with:

LANG=C make ARCH=powerpc CROSS_COMPILE=powerpc-linux- vmlinux

Christophe



Re: kernel since 5.6 do not boot anymore on Apple PowerBook

2020-07-08 Thread Giuseppe Sacco
Hello,
while trying to debug a problem using git bisect, I am now at a point
where I cannot build the kernel at all. This is the error message I
get:

$ LANG=C make ARCH=powerpc \
 CROSS_COMPILE=powerpc-linux- \
 CONFIG_MODULE_COMPRESS_GZIP=true \
 INSTALL_MOD_STRIP=1 CONFIG_MODULE_COMPRESS=1 \
 -j4 INSTALL_MOD_PATH=$BOOT INSTALL_PATH=$BOOT \
 CONFIG_DEBUG_INFO_COMPRESSED=1 \
 install modules_install
make[2]: *** No rule to make target 'vmlinux', needed by
'arch/powerpc/boot/zImage.pmac'.  Stop.
make[1]: *** [arch/powerpc/Makefile:407: install] Error 2
make: *** [Makefile:328: __build_one_by_one] Error 2

How can I continue?

Thank you,
Giuseppe



Re: powerpc: Incorrect stw operand modifier in __set_pte_at

2020-07-08 Thread Christophe Leroy




On 08/07/2020 at 16:45, Mathieu Desnoyers wrote:

Hi,

Reviewing use of the patterns "%Un%Xn" with lwz and stw instructions
(where n should be the operand number) within the Linux kernel led
me to spot those 2 weird cases:

arch/powerpc/include/asm/nohash/pgtable.h:__set_pte_at()

 __asm__ __volatile__("\
 stw%U0%X0 %2,%0\n\
 eieio\n\
 stw%U0%X0 %L2,%1"
 : "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
 : "r" (pte) : "memory");

I would have expected the stw to be:

 stw%U1%X1 %L2,%1"

and:
arch/powerpc/include/asm/book3s/32/pgtable.h:__set_pte_at()

 __asm__ __volatile__("\
 stw%U0%X0 %2,%0\n\
 eieio\n\
 stw%U0%X0 %L2,%1"
 : "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
 : "r" (pte) : "memory");

where I would have expected:

 stw%U1%X1 %L2,%1"

Is it a bug or am I missing something ?


Well spotted. I guess it's definitely a bug.

Introduced 12 years ago by commit 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9bf2b5cd 
("powerpc: Fixes for CONFIG_PTE_64BIT for SMP support").


It's gone unnoticed until now it seems.

Can you submit a patch for it ?

Christophe


Re: Failure to build librseq on ppc

2020-07-08 Thread Christophe Leroy




On 08/07/2020 at 16:32, Mathieu Desnoyers wrote:

- On Jul 8, 2020, at 10:21 AM, Christophe Leroy christophe.le...@csgroup.eu 
wrote:


On 08/07/2020 at 16:00, Mathieu Desnoyers wrote:

- On Jul 8, 2020, at 8:33 AM, Mathieu Desnoyers
mathieu.desnoy...@efficios.com wrote:


- On Jul 7, 2020, at 8:59 PM, Segher Boessenkool seg...@kernel.crashing.org
wrote:

[...]


So perhaps you have code like

   int *p;
   int x;
   ...
   asm ("lwz %0,%1" : "=r"(x) : "m"(*p));


We indeed have explicit "lwz" and "stw" instructions in there.



where that last line should actually read

   asm ("lwz%X1 %0,%1" : "=r"(x) : "m"(*p));


Indeed, turning those into "lwzx" and "stwx" seems to fix the issue.

There has been some level of extra CPP macro coating around those
instructions to support both ppc32 and ppc64 with the same assembly.
So adding %X[arg] is not trivial.
Let me see what can be done here.


I did the following changes which appear to generate valid asm.
See attached corresponding .S output.

I grepped for uses of the "m" asm operand in Linux powerpc code and noticed
it's pretty much always used with e.g. "lwz%U1%X1". I could find one blog
post discussing that %U is about the update flag, and nothing about %X.
Are those documented?


As far as I can see, %U is mentioned in
https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html in the
powerpc subpart, at the "m" constraint.


Yep, I did notice it, but mistakenly thought it was only needed for "m<>" 
operand,
not "m".


You are right, AFAIU on recent versions of GCC, %U has no effect without m<>

Christophe



Thanks,

Mathieu



For the %X I don't know.

Christophe



Although it appears to generate valid asm, I have the feeling I'm relying on
undocumented
features here. :-/




[PATCH 5/5] powerpc: use the generic dma_ops_bypass mode

2020-07-08 Thread Christoph Hellwig
Use the DMA API bypass mechanism for direct window mappings.  This uses
common code and speeds up the direct mapping case by avoiding indirect
calls when not using dma ops at all.  It also fixes a problem where
the sync_* methods were using the bypass check for DMA allocations, but
those are part of the streaming ops.

Note that this patch loses the DMA_ATTR_WEAK_ORDERING override, which
has never been well defined, and is only used by a few drivers, which
IIRC never showed up in the typical Cell blade setups that are affected
by the ordering workaround.

Fixes: efd176a04bef ("powerpc/pseries/dma: Allow SWIOTLB")
Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/Kconfig  |  1 +
 arch/powerpc/include/asm/device.h |  5 --
 arch/powerpc/kernel/dma-iommu.c   | 90 ---
 3 files changed, 10 insertions(+), 86 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index e9b091d3587222..be868bfbe76ecf 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -152,6 +152,7 @@ config PPC
select CLONE_BACKWARDS
select DCACHE_WORD_ACCESS   if PPC64 && CPU_LITTLE_ENDIAN
select DMA_OPS  if PPC64
+   select DMA_OPS_BYPASS   if PPC64
select DYNAMIC_FTRACE   if FUNCTION_TRACER
select EDAC_ATOMIC_SCRUB
select EDAC_SUPPORT
diff --git a/arch/powerpc/include/asm/device.h 
b/arch/powerpc/include/asm/device.h
index 266542769e4bd1..452402215e1210 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -18,11 +18,6 @@ struct iommu_table;
  * drivers/macintosh/macio_asic.c
  */
 struct dev_archdata {
-   /*
-* Set to %true if the dma_iommu_ops are requested to use a direct
-* window instead of dynamically mapping memory.
-*/
-   booliommu_bypass : 1;
/*
 * These two used to be a union. However, with the hybrid ops we need
 * both so here we store both a DMA offset for direct mappings and
diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index e486d1d78de288..569fecd7b5b234 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -14,23 +14,6 @@
  * Generic iommu implementation
  */
 
-/*
- * The coherent mask may be smaller than the real mask, check if we can
- * really use a direct window.
- */
-static inline bool dma_iommu_alloc_bypass(struct device *dev)
-{
-   return dev->archdata.iommu_bypass && !iommu_fixed_is_weak &&
-   dma_direct_supported(dev, dev->coherent_dma_mask);
-}
-
-static inline bool dma_iommu_map_bypass(struct device *dev,
-   unsigned long attrs)
-{
-   return dev->archdata.iommu_bypass &&
-   (!iommu_fixed_is_weak || (attrs & DMA_ATTR_WEAK_ORDERING));
-}
-
 /* Allocates a contiguous real buffer and creates mappings over it.
  * Returns the virtual address of the buffer and sets dma_handle
  * to the dma address (mapping) of the first page.
@@ -39,8 +22,6 @@ static void *dma_iommu_alloc_coherent(struct device *dev, 
size_t size,
  dma_addr_t *dma_handle, gfp_t flag,
  unsigned long attrs)
 {
-   if (dma_iommu_alloc_bypass(dev))
-   return dma_direct_alloc(dev, size, dma_handle, flag, attrs);
return iommu_alloc_coherent(dev, get_iommu_table_base(dev), size,
dma_handle, dev->coherent_dma_mask, flag,
dev_to_node(dev));
@@ -50,11 +31,7 @@ static void dma_iommu_free_coherent(struct device *dev, 
size_t size,
void *vaddr, dma_addr_t dma_handle,
unsigned long attrs)
 {
-   if (dma_iommu_alloc_bypass(dev))
-   dma_direct_free(dev, size, vaddr, dma_handle, attrs);
-   else
-   iommu_free_coherent(get_iommu_table_base(dev), size, vaddr,
-   dma_handle);
+   iommu_free_coherent(get_iommu_table_base(dev), size, vaddr, dma_handle);
 }
 
 /* Creates TCEs for a user provided buffer.  The user buffer must be
@@ -67,9 +44,6 @@ static dma_addr_t dma_iommu_map_page(struct device *dev, 
struct page *page,
 enum dma_data_direction direction,
 unsigned long attrs)
 {
-   if (dma_iommu_map_bypass(dev, attrs))
-   return dma_direct_map_page(dev, page, offset, size, direction,
-   attrs);
return iommu_map_page(dev, get_iommu_table_base(dev), page, offset,
  size, dma_get_mask(dev), direction, attrs);
 }
@@ -79,11 +53,8 @@ static void dma_iommu_unmap_page(struct device *dev, 
dma_addr_t dma_handle,
 size_t size, enum dma_data_direction direction,
  

[PATCH 4/5] dma-mapping: add a dma_ops_bypass flag to struct device

2020-07-08 Thread Christoph Hellwig
Several IOMMU drivers have a bypass mode where they can use a direct
mapping if the device's DMA mask is large enough.  Add generic support
to the core dma-mapping code to do that, to switch those drivers to
a common solution.

Signed-off-by: Christoph Hellwig 
---
 include/linux/device.h |  8 +
 kernel/dma/Kconfig |  8 +
 kernel/dma/mapping.c   | 74 +-
 3 files changed, 68 insertions(+), 22 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index 4c4af98321ebd6..1f71acf37f78d7 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -523,6 +523,11 @@ struct dev_links_info {
  *   sync_state() callback.
  * @dma_coherent: this particular device is dma coherent, even if the
  * architecture supports non-coherent devices.
+ * @dma_ops_bypass: If set to %true then the dma_ops are bypassed for the
+ * streaming DMA operations (->map_* / ->unmap_* / ->sync_*),
+ * and optionally (if the coherent mask is large enough) also
+ * for dma allocations.  This flag is managed by the dma ops
+ * instance from ->dma_supported.
  *
  * At the lowest level, every device in a Linux system is represented by an
  * instance of struct device. The device structure contains the information
@@ -623,6 +628,9 @@ struct device {
 defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
booldma_coherent:1;
 #endif
+#ifdef CONFIG_DMA_OPS_BYPASS
+   booldma_ops_bypass : 1;
+#endif
 };
 
 static inline struct device *kobj_to_dev(struct kobject *kobj)
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 5cfb2428593ac7..f4770fcfa62bb3 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -8,6 +8,14 @@ config HAS_DMA
 config DMA_OPS
bool
 
+#
+# IOMMU drivers that can bypass the IOMMU code and optionally use the direct
+# mapping fast path should select this option and set the dma_ops_bypass
+# flag in struct device where applicable
+#
+config DMA_OPS_BYPASS
+   bool
+
 config NEED_SG_DMA_LENGTH
bool
 
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index b53953024512fe..0d129421e75fc8 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -105,9 +105,35 @@ void *dmam_alloc_attrs(struct device *dev, size_t size, 
dma_addr_t *dma_handle,
 }
 EXPORT_SYMBOL(dmam_alloc_attrs);
 
-static inline bool dma_is_direct(const struct dma_map_ops *ops)
+static bool dma_go_direct(struct device *dev, dma_addr_t mask,
+   const struct dma_map_ops *ops)
 {
-   return likely(!ops);
+   if (likely(!ops))
+   return true;
+#ifdef CONFIG_DMA_OPS_BYPASS
+   if (dev->dma_ops_bypass)
+   return min_not_zero(mask, dev->bus_dma_limit) >=
+   dma_direct_get_required_mask(dev);
+#endif
+   return false;
+}
+
+
+/*
+ * Check if the device uses a direct mapping for streaming DMA operations.
+ * This allows IOMMU drivers to set a bypass mode if the DMA mask is large
+ * enough.
+ */
+static inline bool dma_alloc_direct(struct device *dev,
+   const struct dma_map_ops *ops)
+{
+   return dma_go_direct(dev, dev->coherent_dma_mask, ops);
+}
+
+static inline bool dma_map_direct(struct device *dev,
+   const struct dma_map_ops *ops)
+{
+   return dma_go_direct(dev, *dev->dma_mask, ops);
 }
 
 dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
@@ -118,7 +144,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct 
page *page,
dma_addr_t addr;
 
BUG_ON(!valid_dma_direction(dir));
-   if (dma_is_direct(ops))
+   if (dma_map_direct(dev, ops))
addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
else
addr = ops->map_page(dev, page, offset, size, dir, attrs);
@@ -134,7 +160,7 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t 
addr, size_t size,
const struct dma_map_ops *ops = get_dma_ops(dev);
 
BUG_ON(!valid_dma_direction(dir));
-   if (dma_is_direct(ops))
+   if (dma_map_direct(dev, ops))
dma_direct_unmap_page(dev, addr, size, dir, attrs);
else if (ops->unmap_page)
ops->unmap_page(dev, addr, size, dir, attrs);
@@ -153,7 +179,7 @@ int dma_map_sg_attrs(struct device *dev, struct scatterlist 
*sg, int nents,
int ents;
 
BUG_ON(!valid_dma_direction(dir));
-   if (dma_is_direct(ops))
+   if (dma_map_direct(dev, ops))
ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
else
ents = ops->map_sg(dev, sg, nents, dir, attrs);
@@ -172,7 +198,7 @@ void dma_unmap_sg_attrs(struct device *dev, struct 
scatterlist *sg,
 
BUG_ON(!valid_dma_direction(dir));
debug_dma_unmap_sg(dev, sg, nents, dir);
-   if (dma_is_direct(ops))
+   if (dma_map_direct(dev, ops))

[PATCH 3/5] dma-mapping: make support for dma ops optional

2020-07-08 Thread Christoph Hellwig
Avoid the overhead of the dma ops support for tiny builds that only
use the direct mapping.

Signed-off-by: Christoph Hellwig 
---
 arch/alpha/Kconfig  |  1 +
 arch/arm/Kconfig|  1 +
 arch/ia64/Kconfig   |  1 +
 arch/mips/Kconfig   |  1 +
 arch/parisc/Kconfig |  1 +
 arch/powerpc/Kconfig|  1 +
 arch/s390/Kconfig   |  1 +
 arch/sparc/Kconfig  |  1 +
 arch/x86/Kconfig|  1 +
 drivers/iommu/Kconfig   |  2 ++
 drivers/misc/mic/Kconfig|  1 +
 drivers/vdpa/Kconfig|  1 +
 drivers/xen/Kconfig |  1 +
 include/linux/device.h  |  3 ++-
 include/linux/dma-mapping.h | 12 +++-
 kernel/dma/Kconfig  |  4 
 kernel/dma/Makefile |  3 ++-
 17 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 10862c5a8c7682..9c5f06e8eb9bc0 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -7,6 +7,7 @@ config ALPHA
select ARCH_NO_PREEMPT
select ARCH_NO_SG_CHAIN
select ARCH_USE_CMPXCHG_LOCKREF
+   select DMA_OPS if PCI
select FORCE_PCI if !ALPHA_JENSEN
select PCI_DOMAINS if PCI
select PCI_SYSCALL if PCI
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 2ac74904a3ce58..bee35b0187e452 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -41,6 +41,7 @@ config ARM
select CPU_PM if SUSPEND || CPU_IDLE
select DCACHE_WORD_ACCESS if HAVE_EFFICIENT_UNALIGNED_ACCESS
select DMA_DECLARE_COHERENT
+   select DMA_OPS
select DMA_REMAP if MMU
select EDAC_SUPPORT
select EDAC_ATOMIC_SCRUB
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 1fa2fe2ef053f8..5b4ec80bf5863a 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -192,6 +192,7 @@ config IA64_SGI_UV
 
 config IA64_HP_SBA_IOMMU
bool "HP SBA IOMMU support"
+   select DMA_OPS
default y
help
  Say Y here to add support for the SBA IOMMU found on HP zx1 and
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 6fee1a133e9d6a..8a458105e445b6 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -367,6 +367,7 @@ config MACH_JAZZ
select ARC_PROMLIB
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_MIGHT_HAVE_PC_SERIO
+   select DMA_OPS
select FW_ARC
select FW_ARC32
select ARCH_MAY_HAVE_PC_FDC
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 8e4c3708773d08..38c1eafc1f1ae9 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -14,6 +14,7 @@ config PARISC
select ARCH_HAS_UBSAN_SANITIZE_ALL
select ARCH_NO_SG_CHAIN
select ARCH_SUPPORTS_MEMORY_FAILURE
+   select DMA_OPS
select RTC_CLASS
select RTC_DRV_GENERIC
select INIT_ALL_POSSIBLE
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9fa23eb320ff5a..e9b091d3587222 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -151,6 +151,7 @@ config PPC
select BUILDTIME_TABLE_SORT
select CLONE_BACKWARDS
select DCACHE_WORD_ACCESS   if PPC64 && CPU_LITTLE_ENDIAN
+   select DMA_OPS  if PPC64
select DYNAMIC_FTRACE   if FUNCTION_TRACER
select EDAC_ATOMIC_SCRUB
select EDAC_SUPPORT
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index c7d7ede6300c59..687fe23f61cc8d 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -113,6 +113,7 @@ config S390
select ARCH_WANT_IPC_PARSE_VERSION
select BUILDTIME_TABLE_SORT
select CLONE_BACKWARDS2
+   select DMA_OPS if PCI
select DYNAMIC_FTRACE if FUNCTION_TRACER
select GENERIC_CLOCKEVENTS
select GENERIC_CPU_AUTOPROBE
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 5bf2dc163540fc..5db1faaaee31c8 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -15,6 +15,7 @@ config SPARC
default y
select ARCH_MIGHT_HAVE_PC_PARPORT if SPARC64 && PCI
select ARCH_MIGHT_HAVE_PC_SERIO
+   select DMA_OPS
select OF
select OF_PROMTREE
select HAVE_ASM_MODVERSIONS
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 883da0abf7790c..96ab92754158dd 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -909,6 +909,7 @@ config DMI
 
 config GART_IOMMU
bool "Old AMD GART IOMMU support"
+   select DMA_OPS
select IOMMU_HELPER
select SWIOTLB
depends on X86_64 && PCI && AMD_NB
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 6dc49ed8377a5c..d6ce878a7e8684 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -97,6 +97,7 @@ config OF_IOMMU
 # IOMMU-agnostic DMA-mapping layer
 config IOMMU_DMA
bool
+   select DMA_OPS
select IOMMU_API
select IOMMU_IOVA
select IRQ_MSI_IOMMU
@@ -183,6 +184,7 @@ config DMAR_TABLE
 config INTEL_IOMMU
bool "Support 

[PATCH 2/5] dma-mapping: inline the fast path dma-direct calls

2020-07-08 Thread Christoph Hellwig
Inline the single page map/unmap/sync dma-direct calls into the now
out of line generic wrappers.  This restores the behavior of a single
function call that we had before moving the generic calls out of line.
Besides the dma-mapping callers there are just a few callers in IOMMU
drivers that have a bypass mode, and more of those are going to be
switched to the generic bypass soon.

Signed-off-by: Christoph Hellwig 
---
 include/linux/dma-direct.h | 92 --
 kernel/dma/direct.c| 65 ---
 2 files changed, 69 insertions(+), 88 deletions(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 78dc3524adf880..dbb19dd9869054 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -1,10 +1,16 @@
 /* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Internals of the DMA direct mapping implementation.  Only for use by the
+ * DMA mapping code and IOMMU drivers.
+ */
 #ifndef _LINUX_DMA_DIRECT_H
 #define _LINUX_DMA_DIRECT_H 1
 
 #include 
+#include 
 #include  /* for min_low_pfn */
 #include 
+#include 
 
 extern unsigned int zone_dma_bits;
 
@@ -86,25 +92,17 @@ int dma_direct_mmap(struct device *dev, struct 
vm_area_struct *vma,
unsigned long attrs);
 int dma_direct_supported(struct device *dev, u64 mask);
 bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr);
-dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
-   unsigned long offset, size_t size, enum dma_data_direction dir,
-   unsigned long attrs);
 int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
enum dma_data_direction dir, unsigned long attrs);
 dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
size_t size, enum dma_data_direction dir, unsigned long attrs);
+size_t dma_direct_max_mapping_size(struct device *dev);
 
 #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
 defined(CONFIG_SWIOTLB)
-void dma_direct_sync_single_for_device(struct device *dev,
-   dma_addr_t addr, size_t size, enum dma_data_direction dir);
-void dma_direct_sync_sg_for_device(struct device *dev,
-   struct scatterlist *sgl, int nents, enum dma_data_direction 
dir);
+void dma_direct_sync_sg_for_device(struct device *dev, struct scatterlist *sgl,
+   int nents, enum dma_data_direction dir);
 #else
-static inline void dma_direct_sync_single_for_device(struct device *dev,
-   dma_addr_t addr, size_t size, enum dma_data_direction dir)
-{
-}
 static inline void dma_direct_sync_sg_for_device(struct device *dev,
struct scatterlist *sgl, int nents, enum dma_data_direction dir)
 {
@@ -114,34 +112,82 @@ static inline void dma_direct_sync_sg_for_device(struct 
device *dev,
 #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
 defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) || \
 defined(CONFIG_SWIOTLB)
-void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
-   size_t size, enum dma_data_direction dir, unsigned long attrs);
 void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
int nents, enum dma_data_direction dir, unsigned long attrs);
-void dma_direct_sync_single_for_cpu(struct device *dev,
-   dma_addr_t addr, size_t size, enum dma_data_direction dir);
 void dma_direct_sync_sg_for_cpu(struct device *dev,
struct scatterlist *sgl, int nents, enum dma_data_direction 
dir);
 #else
-static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
-   size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
-}
 static inline void dma_direct_unmap_sg(struct device *dev,
struct scatterlist *sgl, int nents, enum dma_data_direction dir,
unsigned long attrs)
 {
 }
+static inline void dma_direct_sync_sg_for_cpu(struct device *dev,
+   struct scatterlist *sgl, int nents, enum dma_data_direction dir)
+{
+}
+#endif
+
+static inline void dma_direct_sync_single_for_device(struct device *dev,
+   dma_addr_t addr, size_t size, enum dma_data_direction dir)
+{
+   phys_addr_t paddr = dma_to_phys(dev, addr);
+
+   if (unlikely(is_swiotlb_buffer(paddr)))
+   swiotlb_tbl_sync_single(dev, paddr, size, dir, SYNC_FOR_DEVICE);
+
+   if (!dev_is_dma_coherent(dev))
+   arch_sync_dma_for_device(paddr, size, dir);
+}
+
 static inline void dma_direct_sync_single_for_cpu(struct device *dev,
dma_addr_t addr, size_t size, enum dma_data_direction dir)
 {
+   phys_addr_t paddr = dma_to_phys(dev, addr);
+
+   if (!dev_is_dma_coherent(dev)) {
+   arch_sync_dma_for_cpu(paddr, size, dir);
+   arch_sync_dma_for_cpu_all();
+   }
+
+   if (unlikely(is_swiotlb_buffer(paddr)))
+   swiotlb_tbl_sync_single(dev, paddr, size, dir, 

[PATCH 1/5] dma-mapping: move the remaining DMA API calls out of line

2020-07-08 Thread Christoph Hellwig
For a long time the DMA API has been implemented inline in dma-mapping.h,
but the function bodies can be quite large.  Move them all out of line.

This also removes all the dma_direct_* exports as those are just
implementation details and should never be used by drivers directly.

Signed-off-by: Christoph Hellwig 
---
 include/linux/dma-direct.h  |  58 +
 include/linux/dma-mapping.h | 247 
 kernel/dma/direct.c |   9 --
 kernel/dma/mapping.c| 164 
 4 files changed, 244 insertions(+), 234 deletions(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 5184735a0fe8eb..78dc3524adf880 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -86,4 +86,62 @@ int dma_direct_mmap(struct device *dev, struct 
vm_area_struct *vma,
unsigned long attrs);
 int dma_direct_supported(struct device *dev, u64 mask);
 bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr);
+dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
+   unsigned long offset, size_t size, enum dma_data_direction dir,
+   unsigned long attrs);
+int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
+   enum dma_data_direction dir, unsigned long attrs);
+dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
+   size_t size, enum dma_data_direction dir, unsigned long attrs);
+
+#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
+defined(CONFIG_SWIOTLB)
+void dma_direct_sync_single_for_device(struct device *dev,
+   dma_addr_t addr, size_t size, enum dma_data_direction dir);
+void dma_direct_sync_sg_for_device(struct device *dev,
+   struct scatterlist *sgl, int nents, enum dma_data_direction 
dir);
+#else
+static inline void dma_direct_sync_single_for_device(struct device *dev,
+   dma_addr_t addr, size_t size, enum dma_data_direction dir)
+{
+}
+static inline void dma_direct_sync_sg_for_device(struct device *dev,
+   struct scatterlist *sgl, int nents, enum dma_data_direction dir)
+{
+}
+#endif
+
+#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
+defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) || \
+defined(CONFIG_SWIOTLB)
+void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
+   size_t size, enum dma_data_direction dir, unsigned long attrs);
+void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
+   int nents, enum dma_data_direction dir, unsigned long attrs);
+void dma_direct_sync_single_for_cpu(struct device *dev,
+   dma_addr_t addr, size_t size, enum dma_data_direction dir);
+void dma_direct_sync_sg_for_cpu(struct device *dev,
+   struct scatterlist *sgl, int nents, enum dma_data_direction 
dir);
+#else
+static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
+   size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+}
+static inline void dma_direct_unmap_sg(struct device *dev,
+   struct scatterlist *sgl, int nents, enum dma_data_direction dir,
+   unsigned long attrs)
+{
+}
+static inline void dma_direct_sync_single_for_cpu(struct device *dev,
+   dma_addr_t addr, size_t size, enum dma_data_direction dir)
+{
+}
+static inline void dma_direct_sync_sg_for_cpu(struct device *dev,
+   struct scatterlist *sgl, int nents, enum dma_data_direction dir)
+{
+}
+#endif
+
+size_t dma_direct_max_mapping_size(struct device *dev);
+
 #endif /* _LINUX_DMA_DIRECT_H */
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index a33ed3954ed465..bd0a6f5ee44581 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -188,73 +188,6 @@ static inline int dma_mmap_from_global_coherent(struct 
vm_area_struct *vma,
 }
 #endif /* CONFIG_DMA_DECLARE_COHERENT */
 
-static inline bool dma_is_direct(const struct dma_map_ops *ops)
-{
-   return likely(!ops);
-}
-
-/*
- * All the dma_direct_* declarations are here just for the indirect call 
bypass,
- * and must not be used directly drivers!
- */
-dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
-   unsigned long offset, size_t size, enum dma_data_direction dir,
-   unsigned long attrs);
-int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
-   enum dma_data_direction dir, unsigned long attrs);
-dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
-   size_t size, enum dma_data_direction dir, unsigned long attrs);
-
-#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
-defined(CONFIG_SWIOTLB)
-void dma_direct_sync_single_for_device(struct device *dev,
-   dma_addr_t addr, size_t size, enum dma_data_direction dir);
-void dma_direct_sync_sg_for_device(struct device *dev,

generic DMA bypass flag v4

2020-07-08 Thread Christoph Hellwig
Hi all,

I've recently been chatting with Lu about using dma-iommu and
per-device DMA ops in the intel IOMMU driver, and one missing feature
in dma-iommu is a bypass mode where the direct mapping is used even
when an iommu is attached, to improve performance.  The powerpc
code already has a similar mode, so I'd like to move it to the core
DMA mapping code.  As part of that I noticed that the current
powerpc code has a little bug in that it used the wrong check in the
dma_sync_* routines to see if the direct mapping code is used.

These patches just add the generic code and move powerpc over;
the intel IOMMU bits will require a separate discussion.

The x86 AMD Gart code also has a bypass mode, but it is a lot
stranger, so I'm not going to touch it for now.

Note that as-is this breaks the XSK buffer pool, which unfortunately
poked directly into DMA internals.  A fix for that is already queued
up in the netdev tree.

Jesper and XDP gang: this should not regress any performance as
the dma-direct calls are now inlined into the out of line DMA mapping
calls.  But if you can verify the performance numbers that would be
greatly appreciated.

A git tree is available here:

git://git.infradead.org/users/hch/misc.git dma-bypass.4

Gitweb:

git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-bypass.4


Changes since v3:
 - add config options for the dma ops bypass and dma ops themselves
   to not increase the size of tinyconfig builds

Changes since v2:
 - move the dma mapping helpers out of line
 - check for possible direct mappings using the dma mask

Changes since v1:
 - rebased to the current dma-mapping-for-next tree


Diffstat:
 arch/alpha/Kconfig|1 
 arch/arm/Kconfig  |1 
 arch/ia64/Kconfig |1 
 arch/mips/Kconfig |1 
 arch/parisc/Kconfig   |1 
 arch/powerpc/Kconfig  |2 
 arch/powerpc/include/asm/device.h |5 
 arch/powerpc/kernel/dma-iommu.c   |   90 +
 arch/s390/Kconfig |1 
 arch/sparc/Kconfig|1 
 arch/x86/Kconfig  |1 
 drivers/iommu/Kconfig |2 
 drivers/misc/mic/Kconfig  |1 
 drivers/vdpa/Kconfig  |1 
 drivers/xen/Kconfig   |1 
 include/linux/device.h|   11 +
 include/linux/dma-direct.h|  104 +++
 include/linux/dma-mapping.h   |  251 --
 kernel/dma/Kconfig|   12 +
 kernel/dma/Makefile   |3 
 kernel/dma/direct.c   |   74 ---
 kernel/dma/mapping.c  |  214 ++--
 22 files changed, 385 insertions(+), 394 deletions(-)


powerpc: Incorrect stw operand modifier in __set_pte_at

2020-07-08 Thread Mathieu Desnoyers
Hi,

Reviewing use of the patterns "%Un%Xn" with lwz and stw instructions
(where n should be the operand number) within the Linux kernel led
me to spot those 2 weird cases:

arch/powerpc/include/asm/nohash/pgtable.h:__set_pte_at()

__asm__ __volatile__("\
stw%U0%X0 %2,%0\n\
eieio\n\
stw%U0%X0 %L2,%1"
: "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
: "r" (pte) : "memory");

I would have expected the stw to be:

stw%U1%X1 %L2,%1"

and:
arch/powerpc/include/asm/book3s/32/pgtable.h:__set_pte_at()

__asm__ __volatile__("\
stw%U0%X0 %2,%0\n\
eieio\n\
stw%U0%X0 %L2,%1"
: "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
: "r" (pte) : "memory");

where I would have expected:

stw%U1%X1 %L2,%1"

Is it a bug, or am I missing something?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: Failure to build librseq on ppc

2020-07-08 Thread Mathieu Desnoyers
- On Jul 8, 2020, at 10:21 AM, Christophe Leroy christophe.le...@csgroup.eu 
wrote:

> Le 08/07/2020 à 16:00, Mathieu Desnoyers a écrit :
>> - On Jul 8, 2020, at 8:33 AM, Mathieu Desnoyers
>> mathieu.desnoy...@efficios.com wrote:
>> 
>>> - On Jul 7, 2020, at 8:59 PM, Segher Boessenkool 
>>> seg...@kernel.crashing.org
>>> wrote:
>> [...]

 So perhaps you have code like

   int *p;
   int x;
   ...
   asm ("lwz %0,%1" : "=r"(x) : "m"(*p));
>>>
>>> We indeed have explicit "lwz" and "stw" instructions in there.
>>>

 where that last line should actually read

   asm ("lwz%X1 %0,%1" : "=r"(x) : "m"(*p));
>>>
>>> Indeed, turning those into "lwzx" and "stwx" seems to fix the issue.
>>>
>>> There has been some level of extra CPP macro coating around those 
>>> instructions
>>> to
>>> support both ppc32 and ppc64 with the same assembly. So adding %X[arg] is 
>>> not
>>> trivial.
>>> Let me see what can be done here.
>> 
>> I did the following changes which appear to generate valid asm.
>> See attached corresponding .S output.
>> 
>> I grepped for uses of "m" asm operand in Linux powerpc code and noticed it's
>> pretty much
>> always used with e.g. "lwz%U1%X1". I could find one blog post discussing 
>> that %U
>> is about
>> update flag, and nothing about %X. Are those documented ?
> 
> As far as I can see, %U is mentioned in
> https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html in the
> powerpc subpart, at the "m" constraint.

Yep, I did notice it, but mistakenly thought it was only needed for "m<>" 
operand,
not "m".

Thanks,

Mathieu

> 
> For the %X I don't know.
> 
> Christophe
> 
>> 
>> Although it appears to generate valid asm, I have the feeling I'm relying on
>> undocumented
>> features here. :-/

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: Failure to build librseq on ppc

2020-07-08 Thread Christophe Leroy




Le 08/07/2020 à 16:00, Mathieu Desnoyers a écrit :

- On Jul 8, 2020, at 8:33 AM, Mathieu Desnoyers 
mathieu.desnoy...@efficios.com wrote:


- On Jul 7, 2020, at 8:59 PM, Segher Boessenkool seg...@kernel.crashing.org
wrote:

[...]


So perhaps you have code like

  int *p;
  int x;
  ...
  asm ("lwz %0,%1" : "=r"(x) : "m"(*p));


We indeed have explicit "lwz" and "stw" instructions in there.



where that last line should actually read

  asm ("lwz%X1 %0,%1" : "=r"(x) : "m"(*p));


Indeed, turning those into "lwzx" and "stwx" seems to fix the issue.

There has been some level of extra CPP macro coating around those instructions
to
support both ppc32 and ppc64 with the same assembly. So adding %X[arg] is not
trivial.
Let me see what can be done here.


I did the following changes which appear to generate valid asm.
See attached corresponding .S output.

I grepped for uses of "m" asm operand in Linux powerpc code and noticed it's 
pretty much
always used with e.g. "lwz%U1%X1". I could find one blog post discussing that 
%U is about
update flag, and nothing about %X. Are those documented ?


As far as I can see, %U is mentioned in 
https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html in the 
powerpc subpart, at the "m" constraint.


For the %X I don't know.

Christophe



Although it appears to generate valid asm, I have the feeling I'm relying on 
undocumented
features here. :-/



Re: [PATCH] powerpc: select ARCH_HAS_MEMBARRIER_SYNC_CORE

2020-07-08 Thread Mathieu Desnoyers
- On Jul 8, 2020, at 1:17 AM, Nicholas Piggin npig...@gmail.com wrote:

> Excerpts from Mathieu Desnoyers's message of July 7, 2020 9:25 pm:
>> - On Jul 7, 2020, at 1:50 AM, Nicholas Piggin npig...@gmail.com wrote:
>> 
[...]
>>> I should actually change the comment for 64-bit because soft masked
>>> interrupt replay is an interesting case. I thought it was okay (because
>>> the IPI would cause a hard interrupt which does do the rfi) but that
>>> should at least be written.
>> 
>> Yes.
>> 
>>> The context synchronisation happens before
>>> the Linux IPI function is called, but for the purpose of membarrier I
>>> think that is okay (the membarrier just needs to have caused a memory
>>> barrier + context synchronisation by the time it has done).
>> 
>> Can you point me to the code implementing this logic ?
> 
> It's mostly in arch/powerpc/kernel/exception-64s.S and
> powerpc/kernel/irq.c, but a lot of asm so easier to explain.
> 
> When any Linux code does local_irq_disable(), we set interrupts as
> software-masked in a per-cpu flag. When interrupts (including IPIs) come
> in, the first thing we do is check that flag and if we are masked, then
> record that the interrupt needs to be "replayed" in another per-cpu
> flag. The interrupt handler then exits back using RFI (which is context
> synchronising the CPU). Later, when the kernel code does
> local_irq_enable(), it checks the replay flag to see if anything needs
> to be done. At that point we basically just call the interrupt handler
> code like a normal function, and when that returns there is no context
> synchronising instruction.

AFAIU this can only happen for interrupts nesting over irqoff sections,
therefore over kernel code, never userspace, right?

> 
> So membarrier IPI will always cause target CPUs to perform a context
> synchronising instruction, but sometimes it happens before the IPI
> handler function runs.

If my understanding is correct, the replayed interrupt handler logic
only nests over kernel code, which will eventually need to issue a
context synchronizing instruction before returning to user-space.

All we care about is that starting from the membarrier, each core
either:

- interrupts user-space to issue the context synchronizing instruction if
  it was running userspace, or
- _eventually_ issues a context synchronizing instruction before returning
  to user-space if it was running kernel code.

So your earlier statement "the membarrier just needs to have caused a memory
barrier + context synchronisation by the time it has done" is not strictly
correct: the context synchronizing instruction does not strictly need to
happen on each core before membarrier returns. A similar line of thought
can be followed for memory barriers.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: Failure to build librseq on ppc

2020-07-08 Thread Mathieu Desnoyers
- On Jul 8, 2020, at 8:33 AM, Mathieu Desnoyers 
mathieu.desnoy...@efficios.com wrote:

> - On Jul 7, 2020, at 8:59 PM, Segher Boessenkool 
> seg...@kernel.crashing.org
> wrote:
[...]
>> 
>> So perhaps you have code like
>> 
>>  int *p;
>>  int x;
>>  ...
>>  asm ("lwz %0,%1" : "=r"(x) : "m"(*p));
> 
> We indeed have explicit "lwz" and "stw" instructions in there.
> 
>> 
>> where that last line should actually read
>> 
>>  asm ("lwz%X1 %0,%1" : "=r"(x) : "m"(*p));
> 
> Indeed, turning those into "lwzx" and "stwx" seems to fix the issue.
> 
> There has been some level of extra CPP macro coating around those instructions
> to
> support both ppc32 and ppc64 with the same assembly. So adding %X[arg] is not
> trivial.
> Let me see what can be done here.

I did the following changes which appear to generate valid asm.
See attached corresponding .S output.

I grepped for uses of the "m" asm operand in Linux powerpc code and noticed
it's pretty much always used with e.g. "lwz%U1%X1". I could find one blog post
discussing that %U is about the update flag, and nothing about %X. Are those
documented?

Although it appears to generate valid asm, I have the feeling I'm relying on
undocumented features here. :-/

Here is the diff on 
https://git.kernel.org/pub/scm/libs/librseq/librseq.git/tree/include/rseq/rseq-ppc.h
It's only compile-tested on powerpc32 so far:

diff --git a/include/rseq/rseq-ppc.h b/include/rseq/rseq-ppc.h
index eb53953..f689fe9 100644
--- a/include/rseq/rseq-ppc.h
+++ b/include/rseq/rseq-ppc.h
@@ -47,9 +47,9 @@ do {  
\

 #ifdef __PPC64__

-#define STORE_WORD "std "
-#define LOAD_WORD  "ld "
-#define LOADX_WORD "ldx "
+#define STORE_WORD(arg)"std%U[" __rseq_str(arg) "]%X[" __rseq_str(arg) 
"] "/* To memory ("m" constraint) */
+#define LOAD_WORD(arg) "ld%U[" __rseq_str(arg) "]%X[" __rseq_str(arg) "] "
/* From memory ("m" constraint) */
+#define LOADX_WORD "ldx "  
/* From base register ("b" constraint) */
 #define CMP_WORD   "cmpd "

 #define __RSEQ_ASM_DEFINE_TABLE(label, version, flags, 
\
@@ -89,9 +89,9 @@ do {  
\

 #else /* #ifdef __PPC64__ */

-#define STORE_WORD "stw "
-#define LOAD_WORD  "lwz "
-#define LOADX_WORD "lwzx "
+#define STORE_WORD(arg)"stw%U[" __rseq_str(arg) "]%X[" __rseq_str(arg) 
"] "/* To memory ("m" constraint) */
+#define LOAD_WORD(arg) "lwz%U[" __rseq_str(arg) "]%X[" __rseq_str(arg) "] "
/* From memory ("m" constraint) */
+#define LOADX_WORD "lwzx " 
/* From base register ("b" constraint) */
 #define CMP_WORD   "cmpw "

 #define __RSEQ_ASM_DEFINE_TABLE(label, version, flags, 
\
@@ -125,7 +125,7 @@ do {
\
RSEQ_INJECT_ASM(1)  
\
"lis %%r17, (" __rseq_str(cs_label) ")@ha\n\t"  
\
"addi %%r17, %%r17, (" __rseq_str(cs_label) ")@l\n\t"   
\
-   "stw %%r17, %[" __rseq_str(rseq_cs) "]\n\t" 
\
+   "stw%U[" __rseq_str(rseq_cs) "]%X[" __rseq_str(rseq_cs) "] 
%%r17, %[" __rseq_str(rseq_cs) "]\n\t"   \
__rseq_str(label) ":\n\t"

 #endif /* #ifdef __PPC64__ */
@@ -136,7 +136,7 @@ do {
\

 #define RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, label) 
\
RSEQ_INJECT_ASM(2)  
\
-   "lwz %%r17, %[" __rseq_str(current_cpu_id) "]\n\t"  
\
+   "lwz%U[" __rseq_str(current_cpu_id) "]%X[" 
__rseq_str(current_cpu_id) "] %%r17, %[" __rseq_str(current_cpu_id) "]\n\t" \
"cmpw cr7, %[" __rseq_str(cpu_id) "], %%r17\n\t"
\
"bne- cr7, " __rseq_str(label) "\n\t"
@@ -153,25 +153,25 @@ do {  
\
  * RSEQ_ASM_OP_* (else): doesn't have hard-code registers(unless cr7)
  */
 #define RSEQ_ASM_OP_CMPEQ(var, expect, label)  
\
-   LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t"   
\
+   LOAD_WORD(var) "%%r17, %[" __rseq_str(var) "]\n\t"  
\
CMP_WORD "cr7, %%r17, %[" __rseq_str(expect) "]\n\t"
\
"bne- cr7, " __rseq_str(label) "\n\t"

 #define RSEQ_ASM_OP_CMPNE(var, expectnot, label)   
\
-   LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t"   
\
+   LOAD_WORD(var) "%%r17, %[" __rseq_str(var) "]\n\t"  
\
  

Re: [PATCH 18/20] Documentation: security/keys: eliminate duplicated word

2020-07-08 Thread Jarkko Sakkinen
On Tue, Jul 07, 2020 at 11:04:12AM -0700, Randy Dunlap wrote:
> Drop the doubled word "in".
> 
> Signed-off-by: Randy Dunlap 
> Cc: Jonathan Corbet 
> Cc: linux-...@vger.kernel.org
> Cc: James Bottomley 
> Cc: Jarkko Sakkinen 
> Cc: Mimi Zohar 
> Cc: linux-integr...@vger.kernel.org
> Cc: keyri...@vger.kernel.org

Acked-by: Jarkko Sakkinen 

/Jarkko


Re: [PATCH V4 2/3] mm/sparsemem: Enable vmem_altmap support in vmemmap_alloc_block_buf()

2020-07-08 Thread Catalin Marinas
On Mon, Jul 06, 2020 at 08:26:17AM +0530, Anshuman Khandual wrote:
> There are many instances where vmemmap allocation is often switched between
> regular memory and device memory just based on whether altmap is available
> or not. vmemmap_alloc_block_buf() is used in various platforms to allocate
> vmemmap mappings. Let's also enable it to handle altmap based device memory
> allocation along with existing regular memory allocations. This will help
> in avoiding the altmap based allocation switch in many places. To summarize
> there are two different methods to call vmemmap_alloc_block_buf().
> 
> vmemmap_alloc_block_buf(size, node, NULL)   /* Allocate from system RAM */
> vmemmap_alloc_block_buf(size, node, altmap) /* Allocate from altmap */
> 
> This converts altmap_alloc_block_buf() into a static function, drops it's

s/it's/its/

> entry from the header and updates Documentation/vm/memory-model.rst.
> 
> Cc: Jonathan Corbet 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Dave Hansen 
> Cc: Andy Lutomirski 
> Cc: Peter Zijlstra 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: "H. Peter Anvin" 
> Cc: Andrew Morton 
> Cc: linux-...@vger.kernel.org
> Cc: x...@kernel.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux...@kvack.org
> Cc: linux-ker...@vger.kernel.org
> Tested-by: Jia He 
> Suggested-by: Robin Murphy 
> Signed-off-by: Anshuman Khandual 

With the fallback argument dropped, the patch looks fine to me.

Reviewed-by: Catalin Marinas 


Re: [PATCH 3/3] misc: cxl: flash: Remove unused variable 'drc_index'

2020-07-08 Thread Andrew Donnellan

On 8/7/20 10:57 pm, Lee Jones wrote:

Keeping the pointer increment though.

Fixes the following W=1 kernel build warning:

  drivers/misc/cxl/flash.c: In function ‘update_devicetree’:
  drivers/misc/cxl/flash.c:178:16: warning: variable ‘drc_index’ set but not 
used [-Wunused-but-set-variable]
  178 | __be32 *data, drc_index, phandle;
  | ^

Cc: Frederic Barrat 
Cc: Andrew Donnellan 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Lee Jones 


Acked-by: Andrew Donnellan 

--
Andrew Donnellan  OzLabs, ADL Canberra
a...@linux.ibm.com IBM Australia Limited


Re: [PATCH 04/20] Documentation: kgdb: eliminate duplicated word

2020-07-08 Thread Daniel Thompson
On Tue, Jul 07, 2020 at 11:03:58AM -0700, Randy Dunlap wrote:
> Drop the doubled word "driver".
> 
> Signed-off-by: Randy Dunlap 
> Cc: Jonathan Corbet 
> Cc: linux-...@vger.kernel.org
> Cc: Jason Wessel 
> Cc: Daniel Thompson 
> Cc: Douglas Anderson 
> Cc: kgdb-bugrep...@lists.sourceforge.net

Acked-by: Daniel Thompson 


Daniel.


[PATCH 3/3] misc: cxl: flash: Remove unused variable 'drc_index'

2020-07-08 Thread Lee Jones
Keeping the pointer increment though.

Fixes the following W=1 kernel build warning:

 drivers/misc/cxl/flash.c: In function ‘update_devicetree’:
 drivers/misc/cxl/flash.c:178:16: warning: variable ‘drc_index’ set but not 
used [-Wunused-but-set-variable]
 178 | __be32 *data, drc_index, phandle;
 | ^

Cc: Frederic Barrat 
Cc: Andrew Donnellan 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Lee Jones 
---
 drivers/misc/cxl/flash.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/cxl/flash.c b/drivers/misc/cxl/flash.c
index cb9cca35a2263..774d582ddd70b 100644
--- a/drivers/misc/cxl/flash.c
+++ b/drivers/misc/cxl/flash.c
@@ -175,7 +175,7 @@ static int update_devicetree(struct cxl *adapter, s32 scope)
struct update_nodes_workarea *unwa;
u32 action, node_count;
int token, rc, i;
-   __be32 *data, drc_index, phandle;
+   __be32 *data, phandle;
char *buf;
 
token = rtas_token("ibm,update-nodes");
@@ -213,7 +213,7 @@ static int update_devicetree(struct cxl *adapter, s32 scope)
break;
case OPCODE_ADD:
/* nothing to do, just move pointer */
-   drc_index = *data++;
+   *data++;
break;
}
}
-- 
2.25.1



Re: [PATCH] selftests/powerpc: Purge extra count_pmc() calls of ebb selftests

2020-07-08 Thread Desnes Augusto Nunes do Rosario



On 7/8/20 7:38 AM, Sachin Sant wrote:



On 26-Jun-2020, at 10:17 PM, Desnes A. Nunes do Rosario  
wrote:

An extra count on ebb_state.stats.pmc_count[PMC_INDEX(pmc)] is being
performed when count_pmc() is used to reset PMCs on a few selftests. This
extra pmc_count can occasionally invalidate results, such as the ones from
cycles_test shown hereafter. The ebb_check_count() failed with an above
the upper limit error due to the extra value on ebb_state.stats.pmc_count.

Furthermore, this extra count is also indicated by extra PMC1 trace_log on
the output of the cycle test (as well as on pmc56_overflow_test):

==
   ...
   [21]: counter = 8
   [22]: register SPRN_MMCR0 = 0x8080
   [23]: register SPRN_PMC1  = 0x8004
   [24]: counter = 9
   [25]: register SPRN_MMCR0 = 0x8080
   [26]: register SPRN_PMC1  = 0x8004
   [27]: counter = 10
   [28]: register SPRN_MMCR0 = 0x8080
   [29]: register SPRN_PMC1  = 0x8004

[30]: register SPRN_PMC1  = 0x451e

PMC1 count (0x28546) above upper limit 0x283e8 (+0x15e)
[FAIL] Test FAILED on line 52
failure: cycles
==

Signed-off-by: Desnes A. Nunes do Rosario 
---

I too have run into similar failure with cycles_test. I will add that the 
failure
is inconsistent. I have run into this issue 1 out of 25 times. The failure 
always
happen at first instance. Subsequent tries work correctly.

Indeed; on my tests I was running 100 times to validate.
Thanks for the review, Sachin.


With this patch applied the test completes successfully 25 out of 25 times.

# ./cycles_test
test: cycles
…..
…..
   [25]: register SPRN_MMCR0 = 0x8080
   [26]: register SPRN_PMC1  = 0x8004
   [27]: counter = 10
   [28]: register SPRN_MMCR0 = 0x8080
   [29]: register SPRN_PMC1  = 0x8004
   [30]: register SPRN_PMC1  = 0x448f
PMC1 count (0x284b7) above upper limit 0x283e8 (+0xcf)
[FAIL] Test FAILED on line 52
failure: cycles

With the patch

# ./cycles_test
test: cycles
…..
…..
   [25]: register SPRN_MMCR0 = 0x8080
   [26]: register SPRN_PMC1  = 0x8004
   [27]: counter = 10
   [28]: register SPRN_MMCR0 = 0x8080
   [29]: register SPRN_PMC1  = 0x8004
PMC1 count (0x28028) is between 0x27c18 and 0x283e8 delta 
+0x410/-0x3c0
success: cycles
#

FWIW   Tested-by : Sachin Sant 

Thanks
-Sachin


--
Desnes A. Nunes do Rosario

Advisory Software Engineer - IBM
Virtual Onsite Engineer - Red Hat



Re: Failure to build librseq on ppc

2020-07-08 Thread Mathieu Desnoyers
- On Jul 7, 2020, at 8:59 PM, Segher Boessenkool seg...@kernel.crashing.org 
wrote:

> Hi!
> 
> On Tue, Jul 07, 2020 at 03:17:10PM -0400, Mathieu Desnoyers wrote:
>> I'm trying to build librseq at:
>> 
>> https://git.kernel.org/pub/scm/libs/librseq/librseq.git
>> 
>> on powerpc, and I get these errors when building the rseq basic
>> test mirrored from the kernel selftests code:
>> 
>> /tmp/ccieEWxU.s: Assembler messages:
>> /tmp/ccieEWxU.s:118: Error: syntax error; found `,', expected `('
>> /tmp/ccieEWxU.s:118: Error: junk at end of line: `,8'
>> /tmp/ccieEWxU.s:121: Error: syntax error; found `,', expected `('
>> /tmp/ccieEWxU.s:121: Error: junk at end of line: `,8'
>> /tmp/ccieEWxU.s:626: Error: syntax error; found `,', expected `('
>> /tmp/ccieEWxU.s:626: Error: junk at end of line: `,8'
>> /tmp/ccieEWxU.s:629: Error: syntax error; found `,', expected `('
>> /tmp/ccieEWxU.s:629: Error: junk at end of line: `,8'
>> /tmp/ccieEWxU.s:735: Error: syntax error; found `,', expected `('
>> /tmp/ccieEWxU.s:735: Error: junk at end of line: `,8'
>> /tmp/ccieEWxU.s:738: Error: syntax error; found `,', expected `('
>> /tmp/ccieEWxU.s:738: Error: junk at end of line: `,8'
>> /tmp/ccieEWxU.s:741: Error: syntax error; found `,', expected `('
>> /tmp/ccieEWxU.s:741: Error: junk at end of line: `,8'
>> Makefile:581: recipe for target 'basic_percpu_ops_test.o' failed
> 
> You'll have to show the actual failing machine code, and with enough
> context that we can relate this to the source code.
> 
> -save-temps helps, or use -S instead of -c, etc.

Sure, see attached .S file.

> 
>> I am using this compiler:
>> 
>> gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)
>> Target: powerpc-linux-gnu
>> 
>> So far, I got things to build by changing "m" operands to "Q" operands.
>> Based on
>> https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints
>> it seems that "Q" means "A memory operand addressed by just a base register."
> 
> Yup.
> 
>> I suspect that lwz and stw don't expect some kind of immediate offset which
>> can be kept with "m", and "Q" fixes this. Is that the right fix ?
>> 
>> And should we change all operands passed to lwz and stw to a "Q" operand ?
> 
> No, lwz and stw exactly *do* take an immediate offset.
> 
> It sounds like the compiler passed memory addressed by indexed
> addressing, instead.  Which is fine for "m", and also fine for those
> insns... well, you need lwzx and stwx.
> 
> So perhaps you have code like
> 
>  int *p;
>  int x;
>  ...
>  asm ("lwz %0,%1" : "=r"(x) : "m"(*p));

We indeed have explicit "lwz" and "stw" instructions in there.

> 
> where that last line should actually read
> 
>  asm ("lwz%X1 %0,%1" : "=r"(x) : "m"(*p));

Indeed, turning those into "lwzx" and "stwx" seems to fix the issue.

There has been some level of extra CPP macro coating around those instructions
to support both ppc32 and ppc64 with the same assembly. So adding %X[arg] is
not trivial.
Let me see what can be done here.

Thanks,

Mathieu


> 
> ?
> 
> 
> Segher

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


basic_percpu_ops_test.S
Description: Binary data


Re: Failure to build librseq on ppc

2020-07-08 Thread Michael Ellerman
Segher Boessenkool  writes:
> Hi!
>
> On Tue, Jul 07, 2020 at 03:17:10PM -0400, Mathieu Desnoyers wrote:
>> I'm trying to build librseq at:
>> 
>> https://git.kernel.org/pub/scm/libs/librseq/librseq.git
>> 
>> on powerpc, and I get these errors when building the rseq basic
>> test mirrored from the kernel selftests code:
>> 
>> /tmp/ccieEWxU.s: Assembler messages:
>> /tmp/ccieEWxU.s:118: Error: syntax error; found `,', expected `('
>> /tmp/ccieEWxU.s:118: Error: junk at end of line: `,8'
>> /tmp/ccieEWxU.s:121: Error: syntax error; found `,', expected `('
>> /tmp/ccieEWxU.s:121: Error: junk at end of line: `,8'
>> /tmp/ccieEWxU.s:626: Error: syntax error; found `,', expected `('
>> /tmp/ccieEWxU.s:626: Error: junk at end of line: `,8'
>> /tmp/ccieEWxU.s:629: Error: syntax error; found `,', expected `('
>> /tmp/ccieEWxU.s:629: Error: junk at end of line: `,8'
>> /tmp/ccieEWxU.s:735: Error: syntax error; found `,', expected `('
>> /tmp/ccieEWxU.s:735: Error: junk at end of line: `,8'
>> /tmp/ccieEWxU.s:738: Error: syntax error; found `,', expected `('
>> /tmp/ccieEWxU.s:738: Error: junk at end of line: `,8'
>> /tmp/ccieEWxU.s:741: Error: syntax error; found `,', expected `('
>> /tmp/ccieEWxU.s:741: Error: junk at end of line: `,8'
>> Makefile:581: recipe for target 'basic_percpu_ops_test.o' failed
>
> You'll have to show the actual failing machine code, and with enough
> context that we can relate this to the source code.
>
> -save-temps helps, or use -S instead of -c, etc.

Attached below.

$ gcc -Wall basic_percpu_ops_test.s 
basic_percpu_ops_test.s: Assembler messages:
basic_percpu_ops_test.s:133: Error: operand out of domain (3 is not a multiple 
of 4)
basic_percpu_ops_test.s:133: Error: syntax error; found `,', expected `('
basic_percpu_ops_test.s:133: Error: junk at end of line: `,8'
basic_percpu_ops_test.s:136: Error: operand out of domain (3 is not a multiple 
of 4)
basic_percpu_ops_test.s:136: Error: syntax error; found `,', expected `('
basic_percpu_ops_test.s:136: Error: junk at end of line: `,8'
basic_percpu_ops_test.s:818: Error: operand out of domain (3 is not a multiple 
of 4)
basic_percpu_ops_test.s:818: Error: syntax error; found `,', expected `('
basic_percpu_ops_test.s:818: Error: junk at end of line: `,8'
basic_percpu_ops_test.s:821: Error: operand out of domain (3 is not a multiple 
of 4)
basic_percpu_ops_test.s:821: Error: syntax error; found `,', expected `('
basic_percpu_ops_test.s:821: Error: junk at end of line: `,8'
basic_percpu_ops_test.s:955: Error: operand out of domain (3 is not a multiple 
of 4)
basic_percpu_ops_test.s:955: Error: syntax error; found `,', expected `('
basic_percpu_ops_test.s:955: Error: junk at end of line: `,8'
basic_percpu_ops_test.s:958: Error: operand out of domain (3 is not a multiple 
of 4)
basic_percpu_ops_test.s:958: Error: syntax error; found `,', expected `('
basic_percpu_ops_test.s:958: Error: junk at end of line: `,8'
basic_percpu_ops_test.s:961: Error: operand out of domain (3 is not a multiple 
of 4)
basic_percpu_ops_test.s:961: Error: syntax error; found `,', expected `('
basic_percpu_ops_test.s:961: Error: junk at end of line: `,8'

$ sed '133!d' basic_percpu_ops_test.s
ld %r17, 3,8
$ sed '136!d' basic_percpu_ops_test.s
std 7, 3,8
$ sed '818!d' basic_percpu_ops_test.s
ld %r17, 3,8
$ sed '821!d' basic_percpu_ops_test.s
std 4, 3,8
$ sed '955!d' basic_percpu_ops_test.s
ld %r17, 3,8
$ sed '958!d' basic_percpu_ops_test.s
ld %r17, 3,8
$ sed '961!d' basic_percpu_ops_test.s
std %r17, 3,8

 # 211 "../include/rseq/rseq-ppc.h" 1
.pushsection __rseq_cs, "aw"
.balign 32
3:
.long 0x0, 0x0
.quad 1f, (2f - 1f), 4f
.popsection
.pushsection __rseq_cs_ptr_array, "aw"
.quad 3b
.popsection
.pushsection __rseq_exit_point_array, "aw"
.quad 1f, .L8
.popsection
lis %r17, (3b)@highest
ori %r17, %r17, (3b)@higher
rldicr %r17, %r17, 32, 31
oris %r17, %r17, (3b)@high
ori %r17, %r17, (3b)@l
std %r17, 8(9)
1:
lwz %r17, 4(9)
cmpw cr7, 10, %r17
bne- cr7, 4f
ld %r17, 3,8<--- line 133
cmpd cr7, %r17, 6
bne- cr7, .L8
std 7, 3,8
2:
.pushsection __rseq_failure, "ax"
.long 0x0fe5000b
4:
b .L8
.popsection

Tracking back to the source is "interesting", given there's a lot of
macros involved :)

I think that's from:

#define LOAD_WORD   "ld "

#define RSEQ_ASM_OP_CMPEQ(var, expect, label)   
\
LOAD_WORD "%%r17, %[" __rseq_str(var) "]\n\t"   
\
CMP_WORD "cr7, %%r17, %[" __rseq_str(expect) "]\n\t"
\
"bne- cr7, " __rseq_str(label) "\n\t"

static inline __attribute__((always_inline))
int rseq_cmpeqv_storev(intptr_t *v, intptr_t expect, intptr_t newv, 

Re: [PATCH 2/2] KVM: PPC: Book3S HV: rework secure mem slot dropping

2020-07-08 Thread Laurent Dufour

Le 08/07/2020 à 13:25, Bharata B Rao a écrit :

On Fri, Jul 03, 2020 at 05:59:14PM +0200, Laurent Dufour wrote:

When a secure memslot is dropped, all the pages backed in the secure device
(aka really backed by secure memory by the Ultravisor) should be paged out
to a normal page. Previously, this was achieved by triggering the page
fault mechanism, which calls kvmppc_svm_page_out() on each page.

This can't work when hot unplugging a memory slot because the memory slot
is flagged as invalid and gfn_to_pfn() is then not trying to access the
page, so the page fault mechanism is not triggered.

Since the final goal is to make a call to kvmppc_svm_page_out() it seems
simpler to call it directly instead of triggering such a mechanism. This
way kvmppc_uvmem_drop_pages() can be called even when hot unplugging a
memslot.


Yes, this appears much simpler.


Thanks Bharata for reviewing this.





Since kvmppc_uvmem_drop_pages() is already holding kvm->arch.uvmem_lock,
the call to __kvmppc_svm_page_out() is made.
As __kvmppc_svm_page_out needs the vma pointer to migrate the pages, the
VMA is fetched in a lazy way, to not trigger find_vma() all the time. In
addition, the mmap_sem is held in read mode during that time, not in write
mode, since the virtual memory layout is not impacted, and
kvm->arch.uvmem_lock prevents concurrent operation on the secure device.

Cc: Ram Pai 
Cc: Bharata B Rao 
Cc: Paul Mackerras 
Signed-off-by: Laurent Dufour 
---
  arch/powerpc/kvm/book3s_hv_uvmem.c | 54 --
  1 file changed, 37 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 852cc9ae6a0b..479ddf16d18c 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -533,35 +533,55 @@ static inline int kvmppc_svm_page_out(struct 
vm_area_struct *vma,
   * fault on them, do fault time migration to replace the device PTEs in
   * QEMU page table with normal PTEs from newly allocated pages.
   */
-void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
+void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *slot,
 struct kvm *kvm, bool skip_page_out)
  {
int i;
struct kvmppc_uvmem_page_pvt *pvt;
-   unsigned long pfn, uvmem_pfn;
-   unsigned long gfn = free->base_gfn;
+   struct page *uvmem_page;
+   struct vm_area_struct *vma = NULL;
+   unsigned long uvmem_pfn, gfn;
+   unsigned long addr, end;
+
+   down_read(&kvm->mm->mmap_sem);


You should be using mmap_read_lock(kvm->mm) with recent kernels.


Absolutely, shame on me, I reviewed Michel's series about that!

Paul, Michael, could you fix that when pulling this patch, or should I send a
whole new series?





+
+   addr = slot->userspace_addr;
+   end = addr + (slot->npages * PAGE_SIZE);
  
-	for (i = free->npages; i; --i, ++gfn) {

-   struct page *uvmem_page;
+   gfn = slot->base_gfn;
+   for (i = slot->npages; i; --i, ++gfn, addr += PAGE_SIZE) {
+
+   /* Fetch the VMA if addr is not in the latest fetched one */
+   if (!vma || (addr < vma->vm_start || addr >= vma->vm_end)) {
+   vma = find_vma_intersection(kvm->mm, addr, end);
+   if (!vma ||
+   vma->vm_start > addr || vma->vm_end < end) {
+   pr_err("Can't find VMA for gfn:0x%lx\n", gfn);
+   break;
+   }
+   }


The first find_vma_intersection() was called for the range spanning the
entire memslot, but you have code to check if vma remains valid for the
new addr in each iteration. Guess you wanted to get vma for one page at
a time and use it for subsequent pages until it covers the range?


That's the goal: fetch the VMA once, and not again until we reach its end boundary.


Re: [PATCH v2 4/4] powerpc/mm/radix: Create separate mappings for hot-plugged memory

2020-07-08 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:
> On 7/8/20 10:14 AM, Michael Ellerman wrote:
>> "Aneesh Kumar K.V"  writes:
>>> To enable memory unplug without splitting kernel page table
>>> mapping, we force the max mapping size to the LMB size. LMB
>>> size is the unit in which hypervisor will do memory add/remove
>>> operation.
>>>
>>> This implies on pseries system, we now end up mapping
>> 
>> Please expand on why it "implies" that for pseries.
>> 
>>> memory with 2M page size instead of 1G. To improve
>>> that we want hypervisor to hint the kernel about the hotplug
>>> memory range.  This was added that as part of
>>   That
>>>
>>> commit b6eca183e23e ("powerpc/kernel: Enables memory
>>> hot-remove after reboot on pseries guests")
>>>
>>> But we still don't do that on PowerVM. Once we get PowerVM
>> 
>> I think you mean PowerVM doesn't provide that hint yet?
>> 
>> Realistically it won't until P10. So this means we'll always use 2MB on
>> Power9 PowerVM doesn't it?
>> 
>> What about KVM?
>> 
>> Have you done any benchmarking on the impact of switching the linear
>> mapping to 2MB pages?
>> 
>
> The TLB impact should be minimal because with a 256M LMB size partition 
> scoped entries are still 2M and hence we end up with TLBs of 2M size.
>
>
>>> updated, we can then force the 2M mapping only to hot-pluggable
>>> memory region using memblock_is_hotpluggable(). Till then
>>> let's depend on LMB size for finding the mapping page size
>>> for linear range.
>>>
>
> updated
>
>
> powerpc/mm/radix: Create separate mappings for hot-plugged memory
>
> To enable memory unplug without splitting kernel page table
> mapping, we force the max mapping size to the LMB size. LMB
> size is the unit in which hypervisor will do memory add/remove
> operation.
>
> Pseries systems support a max LMB size of 256MB. Hence on pseries,
> we now end up mapping memory with 2M page size instead of 1G. To improve
> that we want hypervisor to hint the kernel about the hotplug
> memory range.  That was added as part of
>
> commit b6eca18 ("powerpc/kernel: Enables memory
> hot-remove after reboot on pseries guests")
>
> But PowerVM doesn't provide that hint yet. Once we get PowerVM
> updated, we can then force the 2M mapping only to hot-pluggable
> memory region using memblock_is_hotpluggable(). Till then
> let's depend on LMB size for finding the mapping page size
> for linear range.
>
> With this change KVM guest will also be doing linear mapping with
> 2M page size.

...
>>> @@ -494,17 +544,27 @@ void __init radix__early_init_devtree(void)
>>>  * Try to find the available page sizes in the device-tree
>>>  */
>>> rc = of_scan_flat_dt(radix_dt_scan_page_sizes, NULL);
>>> -   if (rc != 0)  /* Found */
>>> -   goto found;
>>> +   if (rc == 0) {
>>> +   /*
>>> +* no page size details found in device tree
>>> +* let's assume we have page 4k and 64k support
>> 
>> Capitals and punctuation please?
>> 
>>> +*/
>>> +   mmu_psize_defs[MMU_PAGE_4K].shift = 12;
>>> +   mmu_psize_defs[MMU_PAGE_4K].ap = 0x0;
>>> +
>>> +   mmu_psize_defs[MMU_PAGE_64K].shift = 16;
>>> +   mmu_psize_defs[MMU_PAGE_64K].ap = 0x5;
>>> +   }
>> 
>> Moving that seems like an unrelated change. It's a reasonable change but
>> I'd rather you did it in a standalone patch.
>> 
>
> we needed that change so that we can call radix_memory_block_size() for 
> both found and !found case.

But the found and !found cases converge at found:, which is where you
call it. So I don't understand.

But as I said below, it would be even simpler if you worked out the
memory block size first.

cheers

>>> /*
>>> -* let's assume we have page 4k and 64k support
>>> +* Max mapping size used when mapping pages. We don't use
>>> +* ppc_md.memory_block_size() here because this get called
>>> +* early and we don't have machine probe called yet. Also
>>> +* the pseries implementation only check for ibm,lmb-size.
>>> +* All hypervisor supporting radix do expose that device
>>> +* tree node.
>>>  */
>>> -   mmu_psize_defs[MMU_PAGE_4K].shift = 12;
>>> -   mmu_psize_defs[MMU_PAGE_4K].ap = 0x0;
>>> -
>>> -   mmu_psize_defs[MMU_PAGE_64K].shift = 16;
>>> -   mmu_psize_defs[MMU_PAGE_64K].ap = 0x5;
>>> -found:
>>> +   radix_mem_block_size = radix_memory_block_size();
>> 
>> If you did that earlier in the function, before
>> radix_dt_scan_page_sizes(), the logic would be simpler.
>> 
>>> return;
>>>   }


Re: [PATCH v2 09/10] tools/perf: Add perf tools support for extended register capability in powerpc

2020-07-08 Thread Michael Ellerman
Athira Rajeev  writes:
> From: Anju T Sudhakar 
>
> Add extended regs to sample_reg_mask in the tool side to use
> with `-I?` option. Perf tools side uses extended mask to display
> the platform supported register names (with -I? option) to the user
> and also send this mask to the kernel to capture the extended registers
> in each sample. Hence decide the mask value based on the processor
> version.
>
> Signed-off-by: Anju T Sudhakar 
> [Decide extended mask at run time based on platform]
> Signed-off-by: Athira Rajeev 
> Reviewed-by: Madhavan Srinivasan 

Will need an ack from perf tools folks, who are not on Cc by the looks.

> diff --git a/tools/arch/powerpc/include/uapi/asm/perf_regs.h 
> b/tools/arch/powerpc/include/uapi/asm/perf_regs.h
> index f599064..485b1d5 100644
> --- a/tools/arch/powerpc/include/uapi/asm/perf_regs.h
> +++ b/tools/arch/powerpc/include/uapi/asm/perf_regs.h
> @@ -48,6 +48,18 @@ enum perf_event_powerpc_regs {
>   PERF_REG_POWERPC_DSISR,
>   PERF_REG_POWERPC_SIER,
>   PERF_REG_POWERPC_MMCRA,
> - PERF_REG_POWERPC_MAX,
> + /* Extended registers */
> + PERF_REG_POWERPC_MMCR0,
> + PERF_REG_POWERPC_MMCR1,
> + PERF_REG_POWERPC_MMCR2,
> + /* Max regs without the extended regs */
> + PERF_REG_POWERPC_MAX = PERF_REG_POWERPC_MMCRA + 1,

I don't really understand this idea of a max that's not the max.

>  };
> +
> +#define PERF_REG_PMU_MASK  ((1ULL << PERF_REG_POWERPC_MAX) - 1)
> +
> +/* PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_300 */
> +#define PERF_REG_PMU_MASK_300   (((1ULL << (PERF_REG_POWERPC_MMCR2 + 1)) - 1) \
> + - PERF_REG_PMU_MASK)
> +
>  #endif /* _UAPI_ASM_POWERPC_PERF_REGS_H */
> diff --git a/tools/perf/arch/powerpc/include/perf_regs.h 
> b/tools/perf/arch/powerpc/include/perf_regs.h
> index e18a355..46ed00d 100644
> --- a/tools/perf/arch/powerpc/include/perf_regs.h
> +++ b/tools/perf/arch/powerpc/include/perf_regs.h
> @@ -64,7 +64,10 @@
>   [PERF_REG_POWERPC_DAR] = "dar",
>   [PERF_REG_POWERPC_DSISR] = "dsisr",
>   [PERF_REG_POWERPC_SIER] = "sier",
> - [PERF_REG_POWERPC_MMCRA] = "mmcra"
> + [PERF_REG_POWERPC_MMCRA] = "mmcra",
> + [PERF_REG_POWERPC_MMCR0] = "mmcr0",
> + [PERF_REG_POWERPC_MMCR1] = "mmcr1",
> + [PERF_REG_POWERPC_MMCR2] = "mmcr2",
>  };
>  
>  static inline const char *perf_reg_name(int id)
> diff --git a/tools/perf/arch/powerpc/util/perf_regs.c 
> b/tools/perf/arch/powerpc/util/perf_regs.c
> index 0a52429..9179230 100644
> --- a/tools/perf/arch/powerpc/util/perf_regs.c
> +++ b/tools/perf/arch/powerpc/util/perf_regs.c
> @@ -6,9 +6,14 @@
>  
>  #include "../../../util/perf_regs.h"
>  #include "../../../util/debug.h"
> +#include "../../../util/event.h"
> +#include "../../../util/header.h"
> +#include "../../../perf-sys.h"
>  
>  #include 
>  
> +#define PVR_POWER9   0x004E
> +
>  const struct sample_reg sample_reg_masks[] = {
>   SMPL_REG(r0, PERF_REG_POWERPC_R0),
>   SMPL_REG(r1, PERF_REG_POWERPC_R1),
> @@ -55,6 +60,9 @@
>   SMPL_REG(dsisr, PERF_REG_POWERPC_DSISR),
>   SMPL_REG(sier, PERF_REG_POWERPC_SIER),
>   SMPL_REG(mmcra, PERF_REG_POWERPC_MMCRA),
> + SMPL_REG(mmcr0, PERF_REG_POWERPC_MMCR0),
> + SMPL_REG(mmcr1, PERF_REG_POWERPC_MMCR1),
> + SMPL_REG(mmcr2, PERF_REG_POWERPC_MMCR2),
>   SMPL_REG_END
>  };
>  
> @@ -163,3 +171,50 @@ int arch_sdt_arg_parse_op(char *old_op, char **new_op)
>  
>   return SDT_ARG_VALID;
>  }
> +
> +uint64_t arch__intr_reg_mask(void)
> +{
> + struct perf_event_attr attr = {
> + .type   = PERF_TYPE_HARDWARE,
> + .config = PERF_COUNT_HW_CPU_CYCLES,
> + .sample_type= PERF_SAMPLE_REGS_INTR,
> + .precise_ip = 1,
> + .disabled   = 1,
> + .exclude_kernel = 1,
> + };
> + int fd, ret;
> + char buffer[64];
> + u32 version;
> + u64 extended_mask = 0;
> +
> + /* Get the PVR value to set the extended
> +  * mask specific to platform

Comment format is wrong, and punctuation please.

> +  */
> + get_cpuid(buffer, sizeof(buffer));
> + ret = sscanf(buffer, "%u,", &version);

This is powerpc specific code, why not just use mfspr(SPRN_PVR), rather
than redirecting via printf/sscanf.

> +
> + if (ret != 1) {
> + pr_debug("Failed to get the processor version, unable to output extended registers\n");
> + return PERF_REGS_MASK;
> + }
> +
> + if (version == PVR_POWER9)
> + extended_mask = PERF_REG_PMU_MASK_300;
> + else
> + return PERF_REGS_MASK;
> +
> + attr.sample_regs_intr = extended_mask;
> + attr.sample_period = 1;
> + event_attr_init(&attr);
> +
> + /*
> +  * check if the pmu supports perf extended regs, before
> +  * returning the register mask to sample.
> +  */
> + fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
> + if (fd != 

Re: [PATCH v2 10/10] powerpc/perf: Add extended regs support for power10 platform

2020-07-08 Thread Michael Ellerman
Athira Rajeev  writes:
> Include capability flag `PERF_PMU_CAP_EXTENDED_REGS` for power10
> and expose MMCR3, SIER2, SIER3 registers as part of extended regs.
> Also introduce `PERF_REG_PMU_MASK_31` to define extended mask
> value at runtime for power10
>
> Signed-off-by: Athira Rajeev 
> ---
>  arch/powerpc/include/uapi/asm/perf_regs.h   |  6 ++
>  arch/powerpc/perf/perf_regs.c   | 10 +-
>  arch/powerpc/perf/power10-pmu.c |  6 ++
>  tools/arch/powerpc/include/uapi/asm/perf_regs.h |  6 ++
>  tools/perf/arch/powerpc/include/perf_regs.h |  3 +++
>  tools/perf/arch/powerpc/util/perf_regs.c|  6 ++

Please split into a kernel patch and a tools patch. And cc the tools people.

>  6 files changed, 36 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/uapi/asm/perf_regs.h 
> b/arch/powerpc/include/uapi/asm/perf_regs.h
> index 485b1d5..020b51c 100644
> --- a/arch/powerpc/include/uapi/asm/perf_regs.h
> +++ b/arch/powerpc/include/uapi/asm/perf_regs.h
> @@ -52,6 +52,9 @@ enum perf_event_powerpc_regs {
>   PERF_REG_POWERPC_MMCR0,
>   PERF_REG_POWERPC_MMCR1,
>   PERF_REG_POWERPC_MMCR2,
> + PERF_REG_POWERPC_MMCR3,
> + PERF_REG_POWERPC_SIER2,
> + PERF_REG_POWERPC_SIER3,
>   /* Max regs without the extended regs */
>   PERF_REG_POWERPC_MAX = PERF_REG_POWERPC_MMCRA + 1,
>  };
> @@ -62,4 +65,7 @@ enum perf_event_powerpc_regs {
>  #define PERF_REG_PMU_MASK_300   (((1ULL << (PERF_REG_POWERPC_MMCR2 + 1)) - 1) \
>   - PERF_REG_PMU_MASK)
>  
> +/* PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_31 */
> +#define PERF_REG_PMU_MASK_31 (((1ULL << (PERF_REG_POWERPC_SIER3 + 1)) - 1) \
> + - PERF_REG_PMU_MASK)

Wrapping that provides no benefit, just let it be long.

>  #endif /* _UAPI_ASM_POWERPC_PERF_REGS_H */
> diff --git a/arch/powerpc/perf/perf_regs.c b/arch/powerpc/perf/perf_regs.c
> index c8a7e8c..c969935 100644
> --- a/arch/powerpc/perf/perf_regs.c
> +++ b/arch/powerpc/perf/perf_regs.c
> @@ -81,6 +81,12 @@ static u64 get_ext_regs_value(int idx)
>   return mfspr(SPRN_MMCR1);
>   case PERF_REG_POWERPC_MMCR2:
>   return mfspr(SPRN_MMCR2);
> + case PERF_REG_POWERPC_MMCR3:
> + return mfspr(SPRN_MMCR3);
> + case PERF_REG_POWERPC_SIER2:
> + return mfspr(SPRN_SIER2);
> + case PERF_REG_POWERPC_SIER3:
> + return mfspr(SPRN_SIER3);

Indentation is wrong.

>   default: return 0;
>   }
>  }
> @@ -89,7 +95,9 @@ u64 perf_reg_value(struct pt_regs *regs, int idx)
>  {
>   u64 PERF_REG_EXTENDED_MAX;
>  
> - if (cpu_has_feature(CPU_FTR_ARCH_300))
> + if (cpu_has_feature(CPU_FTR_ARCH_31))
> + PERF_REG_EXTENDED_MAX = PERF_REG_POWERPC_SIER3 + 1;

There's no way to know if that's correct other than going back to the
header to look at the list of values.

So instead you should define it in the header, next to the other values,
with a meaningful name, like PERF_REG_MAX_ISA_31 or something.

> + else if (cpu_has_feature(CPU_FTR_ARCH_300))
>   PERF_REG_EXTENDED_MAX = PERF_REG_POWERPC_MMCR2 + 1;

Same.

>   if (idx == PERF_REG_POWERPC_SIER &&
> diff --git a/arch/powerpc/perf/power10-pmu.c b/arch/powerpc/perf/power10-pmu.c
> index 07fb919..51082d6 100644
> --- a/arch/powerpc/perf/power10-pmu.c
> +++ b/arch/powerpc/perf/power10-pmu.c
> @@ -86,6 +86,8 @@
>  #define POWER10_MMCRA_IFM3   0xC000UL
>  #define POWER10_MMCRA_BHRB_MASK  0xC000UL
>  
> +extern u64 mask_var;

Why is it extern? Also not a good name for a global.

Hang on, it's not even used? Is there some macro magic somewhere?

>  /* Table of alternatives, sorted by column 0 */
>  static const unsigned int power10_event_alternatives[][MAX_ALT] = {
>   { PM_RUN_CYC_ALT,   PM_RUN_CYC },
> @@ -397,6 +399,7 @@ static void power10_config_bhrb(u64 pmu_bhrb_filter)
> .cache_events   = &power10_cache_events,
>   .attr_groups= power10_pmu_attr_groups,
>   .bhrb_nr= 32,
> + .capabilities   = PERF_PMU_CAP_EXTENDED_REGS,
>  };
>  
>  int init_power10_pmu(void)
> @@ -408,6 +411,9 @@ int init_power10_pmu(void)
>   strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power10"))
>   return -ENODEV;
>  
> + /* Set the PERF_REG_EXTENDED_MASK here */
> + mask_var = PERF_REG_PMU_MASK_31;
> +
> rc = register_power_pmu(&power10_pmu);
>   if (rc)
>   return rc;


cheers


Re: [PATCH v2 07/10] powerpc/perf: support BHRB disable bit and new filtering modes

2020-07-08 Thread Michael Ellerman
Athira Rajeev  writes:

> PowerISA v3.1 has few updates for the Branch History Rolling Buffer(BHRB).
   ^
   a
> First is the addition of BHRB disable bit and second new filtering
  ^
  is
> modes for BHRB.
>
> BHRB disable is controlled via Monitor Mode Control Register A (MMCRA)
> bit 26, namely "BHRB Recording Disable (BHRBRD)". This field controls

Most people call that bit 37.

> whether BHRB entries are written when BHRB recording is enabled by other
> bits. Patch implements support for this BHRB disable bit.
   ^
   This

> Secondly PowerISA v3.1 introduce filtering support for

.. that should be in a separate patch please.

> PERF_SAMPLE_BRANCH_IND_CALL/COND. The patch adds BHRB filter support
^
This
> for "ind_call" and "cond" in power10_bhrb_filter_map().
>
> 'commit bb19af816025 ("powerpc/perf: Prevent kernel address leak to userspace 
> via BHRB buffer")'

That doesn't need single quotes, and should be wrapped at 72 columns
like the rest of the text.

> added a check in bhrb_read() to filter the kernel address from BHRB buffer. 
> Patch here modified
> it to avoid that check for PowerISA v3.1 based processors, since PowerISA 
> v3.1 allows
> only MSR[PR]=1 address to be written to BHRB buffer.

And that should be a separate patch again please.

> Signed-off-by: Athira Rajeev 
> ---
>  arch/powerpc/perf/core-book3s.c   | 27 +--
>  arch/powerpc/perf/isa207-common.c | 13 +
>  arch/powerpc/perf/power10-pmu.c   | 13 +++--
>  arch/powerpc/platforms/powernv/idle.c | 14 ++
>  4 files changed, 59 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index fad5159..9709606 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -466,9 +466,13 @@ static void power_pmu_bhrb_read(struct perf_event 
> *event, struct cpu_hw_events *
>* addresses at this point. Check the privileges before
>* exporting it to userspace (avoid exposure of regions
>* where we could have speculative execution)
> +  * Incase of ISA 310, BHRB will capture only user-space
   ^
   In case of ISA v3.1,

> +  * address,hence include a check before filtering code
   ^  ^
   addresses, hence   .
>*/
> - if (is_kernel_addr(addr) && perf_allow_kernel(&event->attr) != 0)
> - continue;
> + if (!(ppmu->flags & PPMU_ARCH_310S))
> + if (is_kernel_addr(addr) &&
> + perf_allow_kernel(&event->attr) != 0)
> + continue;

The indentation is weird. You should just check all three conditions
with &&.

>  
>   /* Branches are read most recent first (ie. mfbhrb 0 is
>* the most recent branch).
> @@ -1212,7 +1216,7 @@ static void write_mmcr0(struct cpu_hw_events *cpuhw, 
> unsigned long mmcr0)
>  static void power_pmu_disable(struct pmu *pmu)
>  {
>   struct cpu_hw_events *cpuhw;
> - unsigned long flags, mmcr0, val;
> + unsigned long flags, mmcr0, val, mmcra = 0;

You initialise it below.

>   if (!ppmu)
>   return;
> @@ -1245,12 +1249,23 @@ static void power_pmu_disable(struct pmu *pmu)
>   mb();
>   isync();
>  
> + val = mmcra = cpuhw->mmcr[2];
> +

For mmcr0 (above), val is the variable we mutate and mmcr0 is the
original value. But here you've done the reverse, which is confusing.

>   /*
>* Disable instruction sampling if it was enabled
>*/
> - if (cpuhw->mmcr[2] & MMCRA_SAMPLE_ENABLE) {
> - mtspr(SPRN_MMCRA,
> -   cpuhw->mmcr[2] & ~MMCRA_SAMPLE_ENABLE);
> + if (cpuhw->mmcr[2] & MMCRA_SAMPLE_ENABLE)
> + mmcra = cpuhw->mmcr[2] & ~MMCRA_SAMPLE_ENABLE;

You just loaded cpuhw->mmcr[2] into mmcra, use it rather than referring
back to cpuhw->mmcr[2] over and over.

> +
> + /* Disable BHRB via mmcra [:26] for p10 if needed */
> + if (!(cpuhw->mmcr[2] & MMCRA_BHRB_DISABLE))

You don't need to check that it's clear AFAICS. Just always set disable
and the check against val below will catch the nop case.

> + mmcra |= MMCRA_BHRB_DISABLE;
> +
> + /* Write SPRN_MMCRA if mmcra has either disabled

Comment format is wrong.

> +  * 

Re: [PATCH 2/2] KVM: PPC: Book3S HV: rework secure mem slot dropping

2020-07-08 Thread Bharata B Rao
On Fri, Jul 03, 2020 at 05:59:14PM +0200, Laurent Dufour wrote:
> When a secure memslot is dropped, all the pages backed in the secure device
> (aka really backed by secure memory by the Ultravisor) should be paged out
> to a normal page. Previously, this was achieved by triggering the page
> fault mechanism which calls kvmppc_svm_page_out() on each page.
> 
> This can't work when hot unplugging a memory slot because the memory slot
> is flagged as invalid and gfn_to_pfn() is then not trying to access the
> page, so the page fault mechanism is not triggered.
> 
> Since the final goal is to make a call to kvmppc_svm_page_out(), it seems
> simpler to call it directly instead of triggering such a mechanism. This
> way kvmppc_uvmem_drop_pages() can be called even when hot unplugging a
> memslot.

Yes, this appears much simpler.

> 
> Since kvmppc_uvmem_drop_pages() is already holding kvm->arch.uvmem_lock,
> the call to __kvmppc_svm_page_out() is made.
> As __kvmppc_svm_page_out needs the vma pointer to migrate the pages, the
> VMA is fetched in a lazy way, to not trigger find_vma() all the time. In
> addition, the mmap_sem is held in read mode during that time, not in write
> mode, since the virtual memory layout is not impacted, and
> kvm->arch.uvmem_lock prevents concurrent operation on the secure device.
> 
> Cc: Ram Pai 
> Cc: Bharata B Rao 
> Cc: Paul Mackerras 
> Signed-off-by: Laurent Dufour 
> ---
>  arch/powerpc/kvm/book3s_hv_uvmem.c | 54 --
>  1 file changed, 37 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
> b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index 852cc9ae6a0b..479ddf16d18c 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -533,35 +533,55 @@ static inline int kvmppc_svm_page_out(struct 
> vm_area_struct *vma,
>   * fault on them, do fault time migration to replace the device PTEs in
>   * QEMU page table with normal PTEs from newly allocated pages.
>   */
> -void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
> +void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *slot,
>struct kvm *kvm, bool skip_page_out)
>  {
>   int i;
>   struct kvmppc_uvmem_page_pvt *pvt;
> - unsigned long pfn, uvmem_pfn;
> - unsigned long gfn = free->base_gfn;
> + struct page *uvmem_page;
> + struct vm_area_struct *vma = NULL;
> + unsigned long uvmem_pfn, gfn;
> + unsigned long addr, end;
> +
> + down_read(&kvm->mm->mmap_sem);

You should be using mmap_read_lock(kvm->mm) with recent kernels.

> +
> + addr = slot->userspace_addr;
> + end = addr + (slot->npages * PAGE_SIZE);
>  
> - for (i = free->npages; i; --i, ++gfn) {
> - struct page *uvmem_page;
> + gfn = slot->base_gfn;
> + for (i = slot->npages; i; --i, ++gfn, addr += PAGE_SIZE) {
> +
> + /* Fetch the VMA if addr is not in the latest fetched one */
> + if (!vma || (addr < vma->vm_start || addr >= vma->vm_end)) {
> + vma = find_vma_intersection(kvm->mm, addr, end);
> + if (!vma ||
> + vma->vm_start > addr || vma->vm_end < end) {
> + pr_err("Can't find VMA for gfn:0x%lx\n", gfn);
> + break;
> + }
> + }

The first find_vma_intersection() was called for the range spanning the
entire memslot, but you have code to check if vma remains valid for the
new addr in each iteration. Guess you wanted to get vma for one page at
a time and use it for subsequent pages until it covers the range?

Regards,
Bharata.


Re: [PATCH v2 07/10] powerpc/perf: support BHRB disable bit and new filtering modes

2020-07-08 Thread Athira Rajeev


> On 07-Jul-2020, at 12:47 PM, Michael Neuling  wrote:
> 
> On Wed, 2020-07-01 at 05:20 -0400, Athira Rajeev wrote:
>> PowerISA v3.1 has few updates for the Branch History Rolling Buffer(BHRB).
>> First is the addition of BHRB disable bit and second new filtering
>> modes for BHRB.
>> 
>> BHRB disable is controlled via Monitor Mode Control Register A (MMCRA)
>> bit 26, namely "BHRB Recording Disable (BHRBRD)". This field controls
>> whether BHRB entries are written when BHRB recording is enabled by other
>> bits. Patch implements support for this BHRB disable bit.
> 
> Probably good to note here that this is backwards compatible. So if you have a
> kernel that doesn't know about this bit, it'll clear it and hence you still 
> get
> BHRB. 
> 
> You should also note why you'd want to disable this (ie. the core will run
> faster).
> 


Sure Mikey, will add this information to the commit message.

Thanks
Athira


>> Secondly PowerISA v3.1 introduce filtering support for
>> PERF_SAMPLE_BRANCH_IND_CALL/COND. The patch adds BHRB filter support
>> for "ind_call" and "cond" in power10_bhrb_filter_map().
>> 
>> 'commit bb19af816025 ("powerpc/perf: Prevent kernel address leak to userspace
>> via BHRB buffer")'
>> added a check in bhrb_read() to filter the kernel address from BHRB buffer.
>> Patch here modified
>> it to avoid that check for PowerISA v3.1 based processors, since PowerISA 
>> v3.1
>> allows
>> only MSR[PR]=1 address to be written to BHRB buffer.
>> 
>> Signed-off-by: Athira Rajeev 
>> ---
>> arch/powerpc/perf/core-book3s.c   | 27 +--
>> arch/powerpc/perf/isa207-common.c | 13 +
>> arch/powerpc/perf/power10-pmu.c   | 13 +++--
>> arch/powerpc/platforms/powernv/idle.c | 14 ++
> 
> This touches the idle code so we should get those guys on CC (adding Vaidy and
> Ego).
> 
>> 4 files changed, 59 insertions(+), 8 deletions(-)
>> 
>> diff --git a/arch/powerpc/perf/core-book3s.c 
>> b/arch/powerpc/perf/core-book3s.c
>> index fad5159..9709606 100644
>> --- a/arch/powerpc/perf/core-book3s.c
>> +++ b/arch/powerpc/perf/core-book3s.c
>> @@ -466,9 +466,13 @@ static void power_pmu_bhrb_read(struct perf_event 
>> *event,
>> struct cpu_hw_events *
>>   * addresses at this point. Check the privileges before
>>   * exporting it to userspace (avoid exposure of regions
>>   * where we could have speculative execution)
>> + * Incase of ISA 310, BHRB will capture only user-space
>> + * address,hence include a check before filtering code
>>   */
>> -if (is_kernel_addr(addr) && perf_allow_kernel(&event->attr) != 0)
>> -continue;
>> +if (!(ppmu->flags & PPMU_ARCH_310S))
>> +if (is_kernel_addr(addr) &&
>> +perf_allow_kernel(&event->attr) != 0)
>> +continue;
>> 
>>  /* Branches are read most recent first (ie. mfbhrb 0 is
>>   * the most recent branch).
>> @@ -1212,7 +1216,7 @@ static void write_mmcr0(struct cpu_hw_events *cpuhw,
>> unsigned long mmcr0)
>> static void power_pmu_disable(struct pmu *pmu)
>> {
>>  struct cpu_hw_events *cpuhw;
>> -unsigned long flags, mmcr0, val;
>> +unsigned long flags, mmcr0, val, mmcra = 0;
>> 
>>  if (!ppmu)
>>  return;
>> @@ -1245,12 +1249,23 @@ static void power_pmu_disable(struct pmu *pmu)
>>  mb();
>>  isync();
>> 
>> +val = mmcra = cpuhw->mmcr[2];
>> +
>>  /*
>>   * Disable instruction sampling if it was enabled
>>   */
>> -if (cpuhw->mmcr[2] & MMCRA_SAMPLE_ENABLE) {
>> -mtspr(SPRN_MMCRA,
>> -  cpuhw->mmcr[2] & ~MMCRA_SAMPLE_ENABLE);
>> +if (cpuhw->mmcr[2] & MMCRA_SAMPLE_ENABLE)
>> +mmcra = cpuhw->mmcr[2] & ~MMCRA_SAMPLE_ENABLE;
>> +
>> +/* Disable BHRB via mmcra [:26] for p10 if needed */
>> +if (!(cpuhw->mmcr[2] & MMCRA_BHRB_DISABLE))
>> +mmcra |= MMCRA_BHRB_DISABLE;
>> +
>> +/* Write SPRN_MMCRA if mmcra has either disabled
>> + * instruction sampling or BHRB
>> + */
>> +if (val != mmcra) {
>> +mtspr(SPRN_MMCRA, mmcra);
>>  mb();
>>  isync();
>>  }
>> diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-
>> common.c
>> index 7d4839e..463d925 100644
>> --- a/arch/powerpc/perf/isa207-common.c
>> +++ b/arch/powerpc/perf/isa207-common.c
>> @@ -404,6 +404,12 @@ int isa207_compute_mmcr(u64 event[], int n_ev,
>> 
>>  mmcra = mmcr1 = mmcr2 = mmcr3 = 0;
>> 
>> +/* Disable bhrb unless explicitly requested
>> + * by setting MMCRA [:26] bit.
>> + */

[PATCH] arch: powerpc: Remove unnecessary cast in kfree()

2020-07-08 Thread Xu Wang
Remove an unnecessary cast in the argument to kfree().

Signed-off-by: Xu Wang 
---
 arch/powerpc/platforms/pseries/dlpar.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/dlpar.c 
b/arch/powerpc/platforms/pseries/dlpar.c
index 16e86ba8aa20..1f3d26806295 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -379,7 +379,7 @@ static void pseries_hp_work_fn(struct work_struct *work)
handle_dlpar_errorlog(hp_work->errlog);
 
kfree(hp_work->errlog);
-   kfree((void *)work);
+   kfree(work);
 }
 
 void queue_hotplug_event(struct pseries_hp_errorlog *hp_errlog)
-- 
2.17.1



Re: [PATCH v2 04/10] powerpc/perf: Add power10_feat to dt_cpu_ftrs

2020-07-08 Thread Michael Ellerman
Athira Rajeev  writes:
> From: Madhavan Srinivasan 
>
> Add power10 feature function to dt_cpu_ftrs.c along
> with a power10 specific init() to initialize pmu sprs.
>
> Signed-off-by: Madhavan Srinivasan 
> ---
>  arch/powerpc/include/asm/reg.h|  3 +++
>  arch/powerpc/kernel/cpu_setup_power.S |  7 +++
>  arch/powerpc/kernel/dt_cpu_ftrs.c | 26 ++
>  3 files changed, 36 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index 21a1b2d..900ada1 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -1068,6 +1068,9 @@
>  #define MMCR0_PMC2_LOADMISSTIME  0x5
>  #endif
>  
> +/* BHRB disable bit for PowerISA v3.10 */
> +#define MMCRA_BHRB_DISABLE   0x0020
> +
>  /*
>   * SPRG usage:
>   *
> diff --git a/arch/powerpc/kernel/cpu_setup_power.S 
> b/arch/powerpc/kernel/cpu_setup_power.S
> index efdcfa7..e8b3370c 100644
> --- a/arch/powerpc/kernel/cpu_setup_power.S
> +++ b/arch/powerpc/kernel/cpu_setup_power.S
> @@ -233,3 +233,10 @@ __init_PMU_ISA207:
>   li  r5,0
>   mtspr   SPRN_MMCRS,r5
>   blr
> +
> +__init_PMU_ISA31:
> + li  r5,0
> + mtspr   SPRN_MMCR3,r5
> + LOAD_REG_IMMEDIATE(r5, MMCRA_BHRB_DISABLE)
> + mtspr   SPRN_MMCRA,r5
> + blr

This doesn't seem like it belongs in this patch. It's not called?

cheers


Re: [PATCH v2 03/10] powerpc/xmon: Add PowerISA v3.1 PMU SPRs

2020-07-08 Thread Michael Ellerman
Athira Rajeev  writes:
> From: Madhavan Srinivasan 
>
> PowerISA v3.1 added three new performance
> monitoring unit (PMU) special purpose registers (SPRs).
> They are Monitor Mode Control Register 3 (MMCR3),
> Sampled Instruction Event Register 2 (SIER2),
> Sampled Instruction Event Register 3 (SIER3).
>
> Patch here adds a new dump function dump_310_sprs
> to print these SPR values.
>
> Signed-off-by: Madhavan Srinivasan 
> ---
>  arch/powerpc/xmon/xmon.c | 15 +++
>  1 file changed, 15 insertions(+)
>
> diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
> index 7efe4bc..8917fe8 100644
> --- a/arch/powerpc/xmon/xmon.c
> +++ b/arch/powerpc/xmon/xmon.c
> @@ -2022,6 +2022,20 @@ static void dump_300_sprs(void)
>  #endif
>  }
>  
> +static void dump_310_sprs(void)
> +{
> +#ifdef CONFIG_PPC64
> + if (!cpu_has_feature(CPU_FTR_ARCH_31))
> + return;
> +
> + printf("mmcr3  = %.16lx\n",
> + mfspr(SPRN_MMCR3));
> +
> + printf("sier2  = %.16lx  sier3  = %.16lx\n",
> + mfspr(SPRN_SIER2), mfspr(SPRN_SIER3));

Why not all on one line like many of the others?

cheers


Re: [PATCH v2 01/10] powerpc/perf: Add support for ISA3.1 PMU SPRs

2020-07-08 Thread Michael Ellerman
Athira Rajeev  writes:
...
> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index cd6a742..5c64bd3 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -39,10 +39,10 @@ struct cpu_hw_events {
>   unsigned int flags[MAX_HWEVENTS];
>   /*
>* The order of the MMCR array is:
> -  *  - 64-bit, MMCR0, MMCR1, MMCRA, MMCR2
> +  *  - 64-bit, MMCR0, MMCR1, MMCRA, MMCR2, MMCR3
>*  - 32-bit, MMCR0, MMCR1, MMCR2
>*/
> - unsigned long mmcr[4];
> + unsigned long mmcr[5];
>   struct perf_event *limited_counter[MAX_LIMITED_HWCOUNTERS];
>   u8  limited_hwidx[MAX_LIMITED_HWCOUNTERS];
>   u64 alternatives[MAX_HWEVENTS][MAX_EVENT_ALTERNATIVES];
...
> @@ -1310,6 +1326,10 @@ static void power_pmu_enable(struct pmu *pmu)
>   if (!cpuhw->n_added) {
>   mtspr(SPRN_MMCRA, cpuhw->mmcr[2] & ~MMCRA_SAMPLE_ENABLE);
>   mtspr(SPRN_MMCR1, cpuhw->mmcr[1]);
> +#ifdef CONFIG_PPC64
> + if (ppmu->flags & PPMU_ARCH_310S)
> + mtspr(SPRN_MMCR3, cpuhw->mmcr[4]);
> +#endif /* CONFIG_PPC64 */
>   goto out_enable;
>   }
>  
> @@ -1353,6 +1373,11 @@ static void power_pmu_enable(struct pmu *pmu)
>   if (ppmu->flags & PPMU_ARCH_207S)
>   mtspr(SPRN_MMCR2, cpuhw->mmcr[3]);
>  
> +#ifdef CONFIG_PPC64
> + if (ppmu->flags & PPMU_ARCH_310S)
> + mtspr(SPRN_MMCR3, cpuhw->mmcr[4]);
> +#endif /* CONFIG_PPC64 */

I don't think you need the #ifdef CONFIG_PPC64?

cheers


Re: [PATCH v2 06/10] powerpc/perf: power10 Performance Monitoring support

2020-07-08 Thread Athira Rajeev



> On 07-Jul-2020, at 12:20 PM, Michael Neuling  wrote:
> 
> 
>> @@ -480,6 +520,7 @@ int isa207_compute_mmcr(u64 event[], int n_ev,
>>  mmcr[1] = mmcr1;
>>  mmcr[2] = mmcra;
>>  mmcr[3] = mmcr2;
>> +mmcr[4] = mmcr3;
> 
> This is fragile like the kvm vcpu case I commented on before but it gets 
> passed
> in via a function parameter?! Can you create a struct to store these in rather
> than this odd ball numbering?

Mikey,
Yes, it gets passed as the cpuhw->mmcr array.
I will check on these cleanup changes for the kvm vcpu case as well as the
cpu_hw_events mmcr array.

Thanks
Athira
> 
> The cleanup should start in patch 1/10 here:
> 
>/*
> * The order of the MMCR array is:
> -*  - 64-bit, MMCR0, MMCR1, MMCRA, MMCR2
> +*  - 64-bit, MMCR0, MMCR1, MMCRA, MMCR2, MMCR3
> *  - 32-bit, MMCR0, MMCR1, MMCR2
> */
> -   unsigned long mmcr[4];
> +   unsigned long mmcr[5];
> 
> 
> 
> mikey



Re: [PATCH] selftests/powerpc: Purge extra count_pmc() calls of ebb selftests

2020-07-08 Thread Sachin Sant



> On 26-Jun-2020, at 10:17 PM, Desnes A. Nunes do Rosario 
>  wrote:
> 
> An extra count on ebb_state.stats.pmc_count[PMC_INDEX(pmc)] is being
> performed when count_pmc() is used to reset PMCs on a few selftests. This
> extra pmc_count can occasionally invalidate results, such as the ones from
> cycles_test shown hereafter. The ebb_check_count() failed with an
> above-the-upper-limit error due to the extra value on ebb_state.stats.pmc_count.
> 
> Furthermore, this extra count is also indicated by extra PMC1 trace_log on
> the output of the cycle test (as well as on pmc56_overflow_test):
> 
> ==
>   ...
>   [21]: counter = 8
>   [22]: register SPRN_MMCR0 = 0x8080
>   [23]: register SPRN_PMC1  = 0x8004
>   [24]: counter = 9
>   [25]: register SPRN_MMCR0 = 0x8080
>   [26]: register SPRN_PMC1  = 0x8004
>   [27]: counter = 10
>   [28]: register SPRN_MMCR0 = 0x8080
>   [29]: register SPRN_PMC1  = 0x8004
>>> [30]: register SPRN_PMC1  = 0x451e
> PMC1 count (0x28546) above upper limit 0x283e8 (+0x15e)
> [FAIL] Test FAILED on line 52
> failure: cycles
> ==
> 
> Signed-off-by: Desnes A. Nunes do Rosario 
> ---

I too have run into a similar failure with cycles_test. I will add that the
failure is inconsistent: I have run into this issue 1 out of 25 times. The
failure always happens at the first instance; subsequent tries work correctly.

With this patch applied the test completes successfully 25 out of 25 times.

# ./cycles_test 
test: cycles
…..
…..
  [25]: register SPRN_MMCR0 = 0x8080
  [26]: register SPRN_PMC1  = 0x8004
  [27]: counter = 10
  [28]: register SPRN_MMCR0 = 0x8080
  [29]: register SPRN_PMC1  = 0x8004
  [30]: register SPRN_PMC1  = 0x448f
PMC1 count (0x284b7) above upper limit 0x283e8 (+0xcf)
[FAIL] Test FAILED on line 52
failure: cycles

With the patch

# ./cycles_test 
test: cycles
…..
…..
  [25]: register SPRN_MMCR0 = 0x8080
  [26]: register SPRN_PMC1  = 0x8004
  [27]: counter = 10
  [28]: register SPRN_MMCR0 = 0x8080
  [29]: register SPRN_PMC1  = 0x8004
PMC1 count (0x28028) is between 0x27c18 and 0x283e8 delta 
+0x410/-0x3c0
success: cycles
# 

FWIW, Tested-by: Sachin Sant

Thanks
-Sachin
