Re: [PATCH v6 0/3] Introduce 64b relocatable kernel

2021-06-30 Thread Palmer Dabbelt

On Thu, 17 Jun 2021 06:33:48 PDT (-0700), a...@ghiti.fr wrote:

On 18/05/2021 at 12:12, Alexandre Ghiti wrote:

After multiple attempts, this patchset is now based on the fact that the
64b kernel mapping was moved outside the linear mapping.

The first patch allows building relocatable kernels, but it is not selected
by default. That patch should ease a KASLR implementation a lot.
The second and third patches take advantage of an already existing powerpc
script that checks relocations at compile time, and use it for riscv.


@Palmer, any thoughts about that? There are no users for now; do you want
to wait for a KASLR implementation to use it before merging this? If so,
I can work on a KASLR implementation based on the older implementation from
Zong.


Sorry, I must have missed this patch set the first time through.  I 
don't see any reason to wait for KASLR before taking support for 
relocatable kernels, as relocatable kernels are useful on their own.


I'm not sure I'll have time to look at this for this cycle, but I'll try 
to find some time to do so, given that it was posted a while ago.




Thanks,



This patchset was tested on:

* kernel:
- rv32: OK
- rv64 with RELOCATABLE: OK and checked that "suspicious" relocations are 
caught.
- rv64 without RELOCATABLE: OK
- powerpc: build only and checked that "suspicious" relocations are caught.

* xipkernel:
- rv32: build only
- rv64: OK

* nommukernel:
- rv64: build only

Changes in v6:
   * Remove the kernel move to vmalloc zone
   * Rebased on top of for-next
   * Remove relocatable property from 32b kernel as the kernel is mapped in
 the linear mapping and would then need to be copied physically too
   * CONFIG_RELOCATABLE depends on !XIP_KERNEL
   * Remove Reviewed-by from first patch as it changed a bit

Changes in v5:
   * Add "static __init" to create_kernel_page_table function as reported by
 Kbuild test robot
   * Add reviewed-by from Zong
   * Rebase onto v5.7

Changes in v4:
   * Fix BPF region that overlapped with kernel's as suggested by Zong
   * Fix end of module region that could be larger than 2GB as suggested by Zong
   * Fix the size of the vm area reserved for the kernel as we could lose
 PMD_SIZE if the size was already aligned on PMD_SIZE
   * Split compile time relocations check patch into 2 patches as suggested by 
Anup
   * Applied Reviewed-by from Zong and Anup

Changes in v3:
   * Move kernel mapping to vmalloc

Changes in v2:
   * Make RELOCATABLE depend on MMU as suggested by Anup
   * Rename kernel_load_addr into kernel_virt_addr as suggested by Anup
   * Use __pa_symbol instead of __pa, as suggested by Zong
   * Rebased on top of v5.6-rc3
   * Tested with sv48 patchset
   * Add Reviewed/Tested-by from Zong and Anup

Alexandre Ghiti (3):
   riscv: Introduce CONFIG_RELOCATABLE
   powerpc: Move script to check relocations at compile time in scripts/
   riscv: Check relocations at compile time

  arch/powerpc/tools/relocs_check.sh | 18 ++
  arch/riscv/Kconfig                 | 12 +++
  arch/riscv/Makefile                |  5 ++-
  arch/riscv/Makefile.postlink       | 36 
  arch/riscv/kernel/vmlinux.lds.S    |  6 
  arch/riscv/mm/Makefile             |  4 +++
  arch/riscv/mm/init.c               | 53 +-
  arch/riscv/tools/relocs_check.sh   | 26 +++
  scripts/relocs_check.sh            | 20 +++
  9 files changed, 162 insertions(+), 18 deletions(-)
  create mode 100644 arch/riscv/Makefile.postlink
  create mode 100755 arch/riscv/tools/relocs_check.sh
  create mode 100755 scripts/relocs_check.sh



Re: [RESEND PATCH v4 05/11] powerpc/64s: Add ability to skip SLB preload

2021-06-30 Thread Christopher M. Riedl
On Wed Jun 30, 2021 at 11:15 PM CDT, Nicholas Piggin wrote:
> Excerpts from Christopher M. Riedl's message of July 1, 2021 1:48 pm:
> > On Sun Jun 20, 2021 at 10:13 PM CDT, Daniel Axtens wrote:
> >> "Christopher M. Riedl"  writes:
> >>
> >> > Switching to a different mm with Hash translation causes SLB entries to
> >> > be preloaded from the current thread_info. This reduces SLB faults, for
> >> > example when threads share a common mm but operate on different address
> >> > ranges.
> >> >
> >> > Preloading entries from the thread_info struct may not always be
> >> > appropriate - such as when switching to a temporary mm. Introduce a new
> >> > boolean in mm_context_t to skip the SLB preload entirely. Also move the
> >> > SLB preload code into a separate function since switch_slb() is already
> >> > quite long. The default behavior (preloading SLB entries from the
> >> > current thread_info struct) remains unchanged.
> >> >
> >> > Signed-off-by: Christopher M. Riedl 
> >> >
> >> > ---
> >> >
> >> > v4:  * New to series.
> >> > ---
> >> >  arch/powerpc/include/asm/book3s/64/mmu.h |  3 ++
> >> >  arch/powerpc/include/asm/mmu_context.h   | 13 ++
> >> >  arch/powerpc/mm/book3s64/mmu_context.c   |  2 +
> >> >  arch/powerpc/mm/book3s64/slb.c   | 56 ++--
> >> >  4 files changed, 50 insertions(+), 24 deletions(-)
> >> >
> >> > diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
> >> > b/arch/powerpc/include/asm/book3s/64/mmu.h
> >> > index eace8c3f7b0a1..b23a9dcdee5af 100644
> >> > --- a/arch/powerpc/include/asm/book3s/64/mmu.h
> >> > +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
> >> > @@ -130,6 +130,9 @@ typedef struct {
> >> >  u32 pkey_allocation_map;
> >> >  s16 execute_only_pkey; /* key holding execute-only protection */
> >> >  #endif
> >> > +
> >> > +/* Do not preload SLB entries from thread_info during switch_slb() */
> >> > +bool skip_slb_preload;
> >> >  } mm_context_t;
> >> >  
> >> >  static inline u16 mm_ctx_user_psize(mm_context_t *ctx)
> >> > diff --git a/arch/powerpc/include/asm/mmu_context.h 
> >> > b/arch/powerpc/include/asm/mmu_context.h
> >> > index 4bc45d3ed8b0e..264787e90b1a1 100644
> >> > --- a/arch/powerpc/include/asm/mmu_context.h
> >> > +++ b/arch/powerpc/include/asm/mmu_context.h
> >> > @@ -298,6 +298,19 @@ static inline int arch_dup_mmap(struct mm_struct 
> >> > *oldmm,
> >> >  return 0;
> >> >  }
> >> >  
> >> > +#ifdef CONFIG_PPC_BOOK3S_64
> >> > +
> >> > +static inline void skip_slb_preload_mm(struct mm_struct *mm)
> >> > +{
> >> > +mm->context.skip_slb_preload = true;
> >> > +}
> >> > +
> >> > +#else
> >> > +
> >> > +static inline void skip_slb_preload_mm(struct mm_struct *mm) {}
> >> > +
> >> > +#endif /* CONFIG_PPC_BOOK3S_64 */
> >> > +
> >> >  #include 
> >> >  
> >> >  #endif /* __KERNEL__ */
> >> > diff --git a/arch/powerpc/mm/book3s64/mmu_context.c 
> >> > b/arch/powerpc/mm/book3s64/mmu_context.c
> >> > index c10fc8a72fb37..3479910264c59 100644
> >> > --- a/arch/powerpc/mm/book3s64/mmu_context.c
> >> > +++ b/arch/powerpc/mm/book3s64/mmu_context.c
> >> > @@ -202,6 +202,8 @@ int init_new_context(struct task_struct *tsk, struct 
> >> > mm_struct *mm)
> >> >  atomic_set(&mm->context.active_cpus, 0);
> >> >  atomic_set(&mm->context.copros, 0);
> >> >  
> >> > +mm->context.skip_slb_preload = false;
> >> > +
> >> >  return 0;
> >> >  }
> >> >  
> >> > diff --git a/arch/powerpc/mm/book3s64/slb.c 
> >> > b/arch/powerpc/mm/book3s64/slb.c
> >> > index c91bd85eb90e3..da0836cb855af 100644
> >> > --- a/arch/powerpc/mm/book3s64/slb.c
> >> > +++ b/arch/powerpc/mm/book3s64/slb.c
> >> > @@ -441,10 +441,39 @@ static void slb_cache_slbie_user(unsigned int 
> >> > index)
> >> >  asm volatile("slbie %0" : : "r" (slbie_data));
> >> >  }
> >> >  
> >> > +static void preload_slb_entries(struct task_struct *tsk, struct 
> >> > mm_struct *mm)
> >> Should this be explicitly inline or even __always_inline? I'm thinking
> >> switch_slb is probably a fairly hot path on hash?
> > 
> > Yes absolutely. I'll make this change in v5.
> > 
> >>
> >> > +{
> >> > +struct thread_info *ti = task_thread_info(tsk);
> >> > +unsigned char i;
> >> > +
> >> > +/*
> >> > + * We gradually age out SLBs after a number of context switches to
> >> > + * reduce reload overhead of unused entries (like we do with FP/VEC
> >> > + * reload). Each time we wrap 256 switches, take an entry out of the
> >> > + * SLB preload cache.
> >> > + */
> >> > +tsk->thread.load_slb++;
> >> > +if (!tsk->thread.load_slb) {
> >> > +unsigned long pc = KSTK_EIP(tsk);
> >> > +
> >> > +preload_age(ti);
> >> > +preload_add(ti, pc);
> >> > +}
> >> > +
> >> > +for (i = 0; i < ti->slb_preload_nr; i++) {
> >> > +unsigned char idx;
> >> > +

Re: [RESEND PATCH v4 08/11] powerpc: Initialize and use a temporary mm for patching

2021-06-30 Thread Christopher M. Riedl
On Sun Jun 20, 2021 at 10:19 PM CDT, Daniel Axtens wrote:
> Hi Chris,
>
> > +   /*
> > +* Choose a randomized, page-aligned address from the range:
> > +* [PAGE_SIZE, DEFAULT_MAP_WINDOW - PAGE_SIZE]
> > +* The lower address bound is PAGE_SIZE to avoid the zero-page.
> > +* The upper address bound is DEFAULT_MAP_WINDOW - PAGE_SIZE to stay
> > +* under DEFAULT_MAP_WINDOW with the Book3s64 Hash MMU.
> > +*/
> > +   patching_addr = PAGE_SIZE + ((get_random_long() & PAGE_MASK)
> > +   % (DEFAULT_MAP_WINDOW - 2 * PAGE_SIZE));
>
> I checked and poking_init() comes after the functions that init the RNG,
> so this should be fine. The maths - while a bit fiddly to reason about -
> does check out.

Thanks for double checking.
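
For anyone following along, one way to see the bounds: both
(get_random_long() & PAGE_MASK) and (DEFAULT_MAP_WINDOW - 2 * PAGE_SIZE)
are multiples of PAGE_SIZE, so the modulo result is a page-aligned value
in [0, DEFAULT_MAP_WINDOW - 3 * PAGE_SIZE]. Adding PAGE_SIZE gives
patching_addr in [PAGE_SIZE, DEFAULT_MAP_WINDOW - 2 * PAGE_SIZE], which
avoids the zero page and stays under DEFAULT_MAP_WINDOW.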

>
> > +
> > +   /*
> > +* PTE allocation uses GFP_KERNEL which means we need to pre-allocate
> > +* the PTE here. We cannot do the allocation during patching with IRQs
> > +* disabled (ie. "atomic" context).
> > +*/
> > +   ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
> > +   BUG_ON(!ptep);
> > +   pte_unmap_unlock(ptep, ptl);
> > +}
> >  
> >  #if IS_BUILTIN(CONFIG_LKDTM)
> >  unsigned long read_cpu_patching_addr(unsigned int cpu)
> >  {
> > -   return (unsigned long)(per_cpu(text_poke_area, cpu))->addr;
> > +   return patching_addr;
> >  }
> >  #endif
> >  
> > -static int text_area_cpu_up(unsigned int cpu)
> > +struct patch_mapping {
> > +   spinlock_t *ptl; /* for protecting pte table */
> > +   pte_t *ptep;
> > +   struct temp_mm temp_mm;
> > +};
> > +
> > +#ifdef CONFIG_PPC_BOOK3S_64
> > +
> > +static inline int hash_prefault_mapping(pgprot_t pgprot)
> >  {
> > -   struct vm_struct *area;
> > +   int err;
> >  
> > -   area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> > -   if (!area) {
> > -   WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> > -   cpu);
> > -   return -1;
> > -   }
> > -   this_cpu_write(text_poke_area, area);
> > +   if (radix_enabled())
> > +   return 0;
> >  
> > -   return 0;
> > -}
> > +   err = slb_allocate_user(patching_mm, patching_addr);
> > +   if (err)
> > +   pr_warn("map patch: failed to allocate slb entry\n");
> >  
>
> Here if slb_allocate_user() fails, you'll print a warning and then fall
> through to the rest of the function. You do return err, but there's a
> later call to hash_page_mm() that also sets err. Can slb_allocate_user()
> fail while hash_page_mm() succeeds, and would that be a problem?

Hmm, yes I think this is a problem. If slb_allocate_user() fails then we
could potentially mask that error until the actual patching
fails/miscompares later (and that *will* certainly fail in this case). I
will return the error and exit the function early in v5 of the series.
Thanks!
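
Roughly something like this for v5 (untested sketch, reusing the calls
already in the hunk above):

	err = slb_allocate_user(patching_mm, patching_addr);
	if (err) {
		pr_warn("map patch: failed to allocate slb entry\n");
		return err;
	}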

>
> > -static int text_area_cpu_down(unsigned int cpu)
> > -{
> > -   free_vm_area(this_cpu_read(text_poke_area));
> > -   return 0;
> > +   err = hash_page_mm(patching_mm, patching_addr, pgprot_val(pgprot), 0,
> > +  HPTE_USE_KERNEL_KEY);
> > +   if (err)
> > +   pr_warn("map patch: failed to insert hashed page\n");
> > +
> > +   /* See comment in switch_slb() in mm/book3s64/slb.c */
> > +   isync();
> > +
>
> The comment reads:
>
> /*
> * Synchronize slbmte preloads with possible subsequent user memory
> * address accesses by the kernel (user mode won't happen until
> * rfid, which is safe).
> */
> isync();
>
> I have to say having read the description of isync I'm not 100% sure why
> that's enough (don't we also need stores to complete?) but I'm happy to
> take commit 5434ae74629a ("powerpc/64s/hash: Add a SLB preload cache")
> on trust here!
>
> I think it does make sense for you to have that barrier here: you are
> potentially about to start poking at the memory mapped through that SLB
> entry so you should make sure you're fully synchronised.
>
> > +   return err;
> >  }
> >  
>
> > +   init_temp_mm(&patch_mapping->temp_mm, patching_mm);
> > +   use_temporary_mm(&patch_mapping->temp_mm);
> >  
> > -   pmdp = pmd_offset(pudp, addr);
> > -   if (unlikely(!pmdp))
> > -   return -EINVAL;
> > +   /*
> > +* On Book3s64 with the Hash MMU we have to manually insert the SLB
> > +* entry and HPTE to prevent taking faults on the patching_addr later.
> > +*/
> > +   return(hash_prefault_mapping(pgprot));
>
> hmm, `return hash_prefault_mapping(pgprot);` or
> `return (hash_prefault_mapping((pgprot));` maybe?

Yeah, I noticed I left the extra parentheses here after the RESEND. I
think this is left-over when I had another wrapper here... anyway, I'll
clean it up for v5.

>
> Kind regards,
> Daniel



[PATCH v2 1/2] sched/topology: Skip updating masks for non-online nodes

2021-06-30 Thread Srikar Dronamraju
Currently the scheduler doesn't check if a node is online before adding
CPUs to the node mask. However, on some architectures the node distance
is only available for nodes that are online. It is not clear how much
the node distance can be relied upon when one of the nodes is offline.

If said node distance is fake (since one of the nodes is offline) and
the actual node distance is different, then the cpumask of such nodes
will be wrong once the nodes come online.

This can cause topology_span_sane() to throw up a warning message and
leave the rest of the topology not updated properly.

Resolve this by skipping the update of the cpumask for nodes that are
not online.

However, by skipping the update, relevant CPUs may not be set when nodes
are onlined, i.e. when coming up with NUMA masks at a certain NUMA
distance, CPUs that are part of other, already online nodes will not be
part of the NUMA mask. Hence, the first time a CPU is added to the newly
onlined node, add the other CPUs to the numa_mask.

Cc: LKML 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Nathan Lynch 
Cc: Michael Ellerman 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Valentin Schneider 
Cc: Gautham R Shenoy 
Cc: Dietmar Eggemann 
Cc: Mel Gorman 
Cc: Vincent Guittot 
Cc: Rik van Riel 
Cc: Geetika Moolchandani 
Cc: Laurent Dufour 
Reported-by: Geetika Moolchandani 
Signed-off-by: Srikar Dronamraju 
---
Changelog v1->v2:
v1 link: 
http://lore.kernel.org/lkml/20210520154427.1041031-4-sri...@linux.vnet.ibm.com/t/#u
Update the NUMA masks, whenever 1st CPU is added to cpuless node

 kernel/sched/topology.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index b77ad49dc14f..f25dbcab4fd2 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1833,6 +1833,9 @@ void sched_init_numa(void)
sched_domains_numa_masks[i][j] = mask;
 
for_each_node(k) {
+   if (!node_online(j))
+   continue;
+
if (sched_debug() && (node_distance(j, k) != node_distance(k, j)))
sched_numa_warn("Node-distance not symmetric");
 
@@ -1891,12 +1894,30 @@ void sched_init_numa(void)
 void sched_domains_numa_masks_set(unsigned int cpu)
 {
int node = cpu_to_node(cpu);
-   int i, j;
+   int i, j, empty;
 
+   empty = cpumask_empty(sched_domains_numa_masks[0][node]);
for (i = 0; i < sched_domains_numa_levels; i++) {
for (j = 0; j < nr_node_ids; j++) {
-   if (node_distance(j, node) <= sched_domains_numa_distance[i])
+   if (!node_online(j))
+   continue;
+
+   if (node_distance(j, node) <= sched_domains_numa_distance[i]) {
cpumask_set_cpu(cpu, sched_domains_numa_masks[i][j]);
+
+   /*
+* We skip updating numa_masks for offline
+* nodes. However, now that the node is
+* finally online, CPUs that were added
+* earlier should now be accommodated into
+* the newly onlined node's numa mask.
+*/
+   if (node != j && empty) {
+   cpumask_or(sched_domains_numa_masks[i][node],
+  sched_domains_numa_masks[i][node],
+  sched_domains_numa_masks[0][j]);
+   }
+   }
}
}
 }
-- 
2.27.0



[PATCH v2 2/2] powerpc/numa: Fill distance_lookup_table for offline nodes

2021-06-30 Thread Srikar Dronamraju
Currently the scheduler populates the distance map by looking at the
distance of each node from all other nodes. This should work for most
architectures and platforms.

The scheduler expects the number of unique node distances to be
available at boot. It uses node_distance() to calculate these unique
node distances. On Power Servers, node distances for offline nodes are
not available. However, Power Servers already know the possible unique
node distances. Fake the offline nodes' distance_lookup_table entries so
that all possible node distances are updated.

For example, the distance info from numactl on a fully populated 8-node
system at boot may look like this.

node distances:
node   0   1   2   3   4   5   6   7
  0:  10  20  40  40  40  40  40  40
  1:  20  10  40  40  40  40  40  40
  2:  40  40  10  20  40  40  40  40
  3:  40  40  20  10  40  40  40  40
  4:  40  40  40  40  10  20  40  40
  5:  40  40  40  40  20  10  40  40
  6:  40  40  40  40  40  40  10  20
  7:  40  40  40  40  40  40  20  10

However, when the same system boots with only two nodes online, the
distance info from numactl will look like:
node distances:
node   0   1
  0:  10  20
  1:  20  10

What node_distance(0, 3) returns, where node 0 is online and node 3 is
offline, may be implementation dependent. In the Power Servers case, it
returns LOCAL_DISTANCE (10). At boot the scheduler would then assume
that the max distance between nodes is 20. However, that would not be
true.

When nodes are onlined and CPUs from those nodes are hotplugged, the max
node distance would be 40.

However this only needs to be done if the number of unique node
distances that can be computed for online nodes is less than the
number of possible unique node distances as represented by
distance_ref_points_depth. When the node is actually onlined,
distance_lookup_table will be updated with actual entries.
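
As an illustration of the mechanism (example values only): with
distance_ref_points_depth = 3 there are four possible unique distances,
10, 20, 40 and 80. If only nodes 0 and 1 (distance 20) are online at
boot, the bits for 10 and 20 get set in the distance map, and the loop
over offline nodes then fakes one node at distance 40 and one at
distance 80, so that all four unique distances exist before the
scheduler builds its tables.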

Cc: LKML 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Nathan Lynch 
Cc: Michael Ellerman 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Valentin Schneider 
Cc: Gautham R Shenoy 
Cc: Dietmar Eggemann 
Cc: Mel Gorman 
Cc: Vincent Guittot 
Cc: Rik van Riel 
Cc: Geetika Moolchandani 
Cc: Laurent Dufour 
Reported-by: Geetika Moolchandani 
Signed-off-by: Srikar Dronamraju 
---
Changelog v1->v2:
Move to a Powerpc specific solution as suggested by Peter and Valentin

 arch/powerpc/mm/numa.c | 70 ++
 1 file changed, 70 insertions(+)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index f2bf98bdcea2..6d0d89127190 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -860,6 +860,75 @@ void __init dump_numa_cpu_topology(void)
}
 }
 
+/*
+ * The scheduler expects the number of unique node distances to be available
+ * at boot. It uses node distances to calculate them. On POWER, node
+ * distances for offline nodes are not available. However, POWER already
+ * knows the possible unique node distances. Fake the offline nodes'
+ * distance_lookup_table entries so that all possible node distances are
+ * updated.
+ */
+void __init fake_update_distance_lookup_table(void)
+{
+   unsigned long distance_map;
+   int i, nr_levels, nr_depth, node;
+
+   if (!numa_enabled)
+   return;
+
+   if (!form1_affinity)
+   return;
+
+   /*
+* distance_ref_points_depth lists the unique numa domains
+* available. However it ignores LOCAL_DISTANCE. So add +1
+* to get the actual number of unique distances.
+*/
+   nr_depth = distance_ref_points_depth + 1;
+
+   WARN_ON(nr_depth > sizeof(distance_map));
+
+   bitmap_zero(&distance_map, nr_depth);
+   bitmap_set(&distance_map, 0, 1);
+
+   for_each_online_node(node) {
+   int nd, distance = LOCAL_DISTANCE;
+
+   if (node == first_online_node)
+   continue;
+
+   nd = __node_distance(node, first_online_node);
+   for (i = 0; i < nr_depth; i++, distance *= 2) {
+   if (distance == nd) {
+   bitmap_set(&distance_map, i, 1);
+   break;
+   }
+   }
+   nr_levels = bitmap_weight(&distance_map, nr_depth);
+   if (nr_levels == nr_depth)
+   return;
+   }
+
+   for_each_node(node) {
+   if (node_online(node))
+   continue;
+
+   i = find_first_zero_bit(&distance_map, nr_depth);
+   if (i >= nr_depth || i == 0) {
+   pr_warn("Levels(%d) not matching levels(%d)", nr_levels, nr_depth);
+   return;
+   }
+
+   bitmap_set(&distance_map, i, 1);
+   while (i--)
+   distance_lookup_table[node][i] = node;
+
+   nr_levels = bitmap_weight(&distance_map, nr_depth);
+   if (nr_levels == nr_depth)
+   return;
+   }
+}
+
 /* Initialize NODE_DATA for a node on the local memory 

[PATCH v2 0/2] Skip numa distance for offline nodes

2021-06-30 Thread Srikar Dronamraju
Changelog v1->v2:
v1: 
http://lore.kernel.org/lkml/20210520154427.1041031-1-sri...@linux.vnet.ibm.com/t/#u
- Update the numa masks, whenever 1st CPU is added to cpuless node
- Populate all possible nodes distances in boot in a
powerpc specific function

Geetika reported yet another trace while doing a dlpar CPU add
operation. This was true even on top of a recent commit
6980d13f0dd1 ("powerpc/smp: Set numa node before updating mask") which fixed
a similar trace.

WARNING: CPU: 40 PID: 2954 at kernel/sched/topology.c:2088 
build_sched_domains+0x6e8/0x1540
Modules linked in: nft_counter nft_compat rpadlpar_io rpaphp mptcp_diag
xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag
netlink_diag bonding tls nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat
nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set rfkill nf_tables
nfnetlink dm_multipath pseries_rng xts vmx_crypto binfmt_misc ip_tables xfs
libcrc32c sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp dm_mirror
dm_region_hash dm_log dm_mod fuse
CPU: 40 PID: 2954 Comm: kworker/40:0 Not tainted 5.13.0-rc1+ #19
Workqueue: events cpuset_hotplug_workfn
NIP:  c01de588 LR: c01de584 CTR: 006cd36c
REGS: c0002772b250 TRAP: 0700   Not tainted  (5.12.0-rc5-master+)
MSR:  80029033   CR: 28828422  XER: 000d
CFAR: c020c2f8 IRQMASK: 0
GPR00: c01de584 c0002772b4f0 c1f55400 0036
GPR04: c063c6368010 c063c63f0a00 0027 c063c6368018
GPR08: 0023 c063c636ef48 0063c4de c063bfe9ffe8
GPR12: 28828424 c063fe68fe80  0417
GPR16: 0028 c740dcd8 c205db68 c1a3a4a0
GPR20: c00091ed7d20 c00091ed8520 0001 
GPR24: c000113a9600 0190 0028 c10e3ac0
GPR28:  c740dd00 c000317b5900 0190
NIP [c01de588] build_sched_domains+0x6e8/0x1540
LR [c01de584] build_sched_domains+0x6e4/0x1540
Call Trace:
[c0002772b4f0] [c01de584] build_sched_domains+0x6e4/0x1540 
(unreliable)
[c0002772b640] [c01e08dc] partition_sched_domains_locked+0x3ec/0x530
[c0002772b6e0] [c02a2144] rebuild_sched_domains_locked+0x524/0xbf0
[c0002772b7e0] [c02a5620] rebuild_sched_domains+0x40/0x70
[c0002772b810] [c02a58e4] cpuset_hotplug_workfn+0x294/0xe20
[c0002772bc30] [c0187510] process_one_work+0x300/0x670
[c0002772bd10] [c01878f8] worker_thread+0x78/0x520
[c0002772bda0] [c01937f0] kthread+0x1a0/0x1b0
[c0002772be10] [c000d6ec] ret_from_kernel_thread+0x5c/0x70
Instruction dump:
7ee5bb78 7f0ac378 7f29cb78 7f68db78 7f46d378 7f84e378 f8610068 3c62ff19
fbe10060 3863e558 4802dd31 6000 <0fe0> 3920fff4 f9210080 e86100b0

Detailed analysis of the failing scenario showed that the span in
question belongs to the NODE domain and, further, the cpumasks for some
CPUs in NODE overlapped. There are two possible reasons how we ended
up here:

(1) The numa node was offline or blank with no CPUs or memory. Hence
the sched_max_numa_distance could not be set correctly, or the
sched_domains_numa_distance happened to be partially populated.

(2) The cpumasks were populated based on a bogus node_distance of an
offline node. On the POWER platform, the node_distance is
correctly available only for an online node which has some CPU or
memory resource associated with it.

For example, the distance info from numactl on a fully populated 8-node
system at boot may look like this.

node distances:
node   0   1   2   3   4   5   6   7
  0:  10  20  40  40  40  40  40  40
  1:  20  10  40  40  40  40  40  40
  2:  40  40  10  20  40  40  40  40
  3:  40  40  20  10  40  40  40  40
  4:  40  40  40  40  10  20  40  40
  5:  40  40  40  40  20  10  40  40
  6:  40  40  40  40  40  40  10  20
  7:  40  40  40  40  40  40  20  10

However, when the same system boots with only two nodes online, the
numa topology will look like:
node distances:
node   0   1
  0:  10  20
  1:  20  10

This series tries to fix both these problems.
Note: These problems are now visible, thanks to
Commit ccf74128d66c ("sched/topology: Assert non-NUMA topology masks don't
(partially) overlap")

Cc: LKML 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Nathan Lynch 
Cc: Michael Ellerman 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Valentin Schneider 
Cc: Gautham R Shenoy 
Cc: Dietmar Eggemann 
Cc: Mel Gorman 
Cc: Vincent Guittot 
Cc: Rik van Riel 
Cc: Geetika Moolchandani 
Cc: Laurent Dufour 

Srikar Dronamraju (2):
  sched/topology: Skip updating masks for non-online nodes
  powerpc/numa: Fill distance_lookup_table for offline nodes

 arch/powerpc/mm/numa.c  | 70 +
 kernel/sched/topology.c | 25 

Re: [RESEND PATCH v4 05/11] powerpc/64s: Add ability to skip SLB preload

2021-06-30 Thread Nicholas Piggin
Excerpts from Christopher M. Riedl's message of July 1, 2021 1:48 pm:
> On Sun Jun 20, 2021 at 10:13 PM CDT, Daniel Axtens wrote:
>> "Christopher M. Riedl"  writes:
>>
>> > Switching to a different mm with Hash translation causes SLB entries to
>> > be preloaded from the current thread_info. This reduces SLB faults, for
>> > example when threads share a common mm but operate on different address
>> > ranges.
>> >
>> > Preloading entries from the thread_info struct may not always be
>> > appropriate - such as when switching to a temporary mm. Introduce a new
>> > boolean in mm_context_t to skip the SLB preload entirely. Also move the
>> > SLB preload code into a separate function since switch_slb() is already
>> > quite long. The default behavior (preloading SLB entries from the
>> > current thread_info struct) remains unchanged.
>> >
>> > Signed-off-by: Christopher M. Riedl 
>> >
>> > ---
>> >
>> > v4:  * New to series.
>> > ---
>> >  arch/powerpc/include/asm/book3s/64/mmu.h |  3 ++
>> >  arch/powerpc/include/asm/mmu_context.h   | 13 ++
>> >  arch/powerpc/mm/book3s64/mmu_context.c   |  2 +
>> >  arch/powerpc/mm/book3s64/slb.c   | 56 ++--
>> >  4 files changed, 50 insertions(+), 24 deletions(-)
>> >
>> > diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
>> > b/arch/powerpc/include/asm/book3s/64/mmu.h
>> > index eace8c3f7b0a1..b23a9dcdee5af 100644
>> > --- a/arch/powerpc/include/asm/book3s/64/mmu.h
>> > +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
>> > @@ -130,6 +130,9 @@ typedef struct {
>> >u32 pkey_allocation_map;
>> >s16 execute_only_pkey; /* key holding execute-only protection */
>> >  #endif
>> > +
>> > +  /* Do not preload SLB entries from thread_info during switch_slb() */
>> > +  bool skip_slb_preload;
>> >  } mm_context_t;
>> >  
>> >  static inline u16 mm_ctx_user_psize(mm_context_t *ctx)
>> > diff --git a/arch/powerpc/include/asm/mmu_context.h 
>> > b/arch/powerpc/include/asm/mmu_context.h
>> > index 4bc45d3ed8b0e..264787e90b1a1 100644
>> > --- a/arch/powerpc/include/asm/mmu_context.h
>> > +++ b/arch/powerpc/include/asm/mmu_context.h
>> > @@ -298,6 +298,19 @@ static inline int arch_dup_mmap(struct mm_struct 
>> > *oldmm,
>> >return 0;
>> >  }
>> >  
>> > +#ifdef CONFIG_PPC_BOOK3S_64
>> > +
>> > +static inline void skip_slb_preload_mm(struct mm_struct *mm)
>> > +{
>> > +  mm->context.skip_slb_preload = true;
>> > +}
>> > +
>> > +#else
>> > +
>> > +static inline void skip_slb_preload_mm(struct mm_struct *mm) {}
>> > +
>> > +#endif /* CONFIG_PPC_BOOK3S_64 */
>> > +
>> >  #include 
>> >  
>> >  #endif /* __KERNEL__ */
>> > diff --git a/arch/powerpc/mm/book3s64/mmu_context.c 
>> > b/arch/powerpc/mm/book3s64/mmu_context.c
>> > index c10fc8a72fb37..3479910264c59 100644
>> > --- a/arch/powerpc/mm/book3s64/mmu_context.c
>> > +++ b/arch/powerpc/mm/book3s64/mmu_context.c
>> > @@ -202,6 +202,8 @@ int init_new_context(struct task_struct *tsk, struct 
>> > mm_struct *mm)
>> >  atomic_set(&mm->context.active_cpus, 0);
>> >  atomic_set(&mm->context.copros, 0);
>> >  
>> > +  mm->context.skip_slb_preload = false;
>> > +
>> >return 0;
>> >  }
>> >  
>> > diff --git a/arch/powerpc/mm/book3s64/slb.c 
>> > b/arch/powerpc/mm/book3s64/slb.c
>> > index c91bd85eb90e3..da0836cb855af 100644
>> > --- a/arch/powerpc/mm/book3s64/slb.c
>> > +++ b/arch/powerpc/mm/book3s64/slb.c
>> > @@ -441,10 +441,39 @@ static void slb_cache_slbie_user(unsigned int index)
>> >asm volatile("slbie %0" : : "r" (slbie_data));
>> >  }
>> >  
>> > +static void preload_slb_entries(struct task_struct *tsk, struct mm_struct 
>> > *mm)
>> Should this be explicitly inline or even __always_inline? I'm thinking
>> switch_slb is probably a fairly hot path on hash?
> 
> Yes absolutely. I'll make this change in v5.
> 
>>
>> > +{
>> > +  struct thread_info *ti = task_thread_info(tsk);
>> > +  unsigned char i;
>> > +
>> > +  /*
>> > +   * We gradually age out SLBs after a number of context switches to
>> > +   * reduce reload overhead of unused entries (like we do with FP/VEC
>> > +   * reload). Each time we wrap 256 switches, take an entry out of the
>> > +   * SLB preload cache.
>> > +   */
>> > +  tsk->thread.load_slb++;
>> > +  if (!tsk->thread.load_slb) {
>> > +  unsigned long pc = KSTK_EIP(tsk);
>> > +
>> > +  preload_age(ti);
>> > +  preload_add(ti, pc);
>> > +  }
>> > +
>> > +  for (i = 0; i < ti->slb_preload_nr; i++) {
>> > +  unsigned char idx;
>> > +  unsigned long ea;
>> > +
>> > +  idx = (ti->slb_preload_tail + i) % SLB_PRELOAD_NR;
>> > +  ea = (unsigned long)ti->slb_preload_esid[idx] << SID_SHIFT;
>> > +
>> > +  slb_allocate_user(mm, ea);
>> > +  }
>> > +}
>> > +
>> >  /* Flush all user entries from the segment table of the current 
>> > processor. */
>> >  void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
>> >  {
>> > -  struct thread_info *ti = task_thread_info(tsk);
>> >unsigned char i;
>> >  
>> >/*
>> > 

Re: [RESEND PATCH v4 05/11] powerpc/64s: Add ability to skip SLB preload

2021-06-30 Thread Christopher M. Riedl
On Sun Jun 20, 2021 at 10:13 PM CDT, Daniel Axtens wrote:
> "Christopher M. Riedl"  writes:
>
> > Switching to a different mm with Hash translation causes SLB entries to
> > be preloaded from the current thread_info. This reduces SLB faults, for
> > example when threads share a common mm but operate on different address
> > ranges.
> >
> > Preloading entries from the thread_info struct may not always be
> > appropriate - such as when switching to a temporary mm. Introduce a new
> > boolean in mm_context_t to skip the SLB preload entirely. Also move the
> > SLB preload code into a separate function since switch_slb() is already
> > quite long. The default behavior (preloading SLB entries from the
> > current thread_info struct) remains unchanged.
> >
> > Signed-off-by: Christopher M. Riedl 
> >
> > ---
> >
> > v4:  * New to series.
> > ---
> >  arch/powerpc/include/asm/book3s/64/mmu.h |  3 ++
> >  arch/powerpc/include/asm/mmu_context.h   | 13 ++
> >  arch/powerpc/mm/book3s64/mmu_context.c   |  2 +
> >  arch/powerpc/mm/book3s64/slb.c   | 56 ++--
> >  4 files changed, 50 insertions(+), 24 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
> > b/arch/powerpc/include/asm/book3s/64/mmu.h
> > index eace8c3f7b0a1..b23a9dcdee5af 100644
> > --- a/arch/powerpc/include/asm/book3s/64/mmu.h
> > +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
> > @@ -130,6 +130,9 @@ typedef struct {
> > u32 pkey_allocation_map;
> > s16 execute_only_pkey; /* key holding execute-only protection */
> >  #endif
> > +
> > +   /* Do not preload SLB entries from thread_info during switch_slb() */
> > +   bool skip_slb_preload;
> >  } mm_context_t;
> >  
> >  static inline u16 mm_ctx_user_psize(mm_context_t *ctx)
> > diff --git a/arch/powerpc/include/asm/mmu_context.h 
> > b/arch/powerpc/include/asm/mmu_context.h
> > index 4bc45d3ed8b0e..264787e90b1a1 100644
> > --- a/arch/powerpc/include/asm/mmu_context.h
> > +++ b/arch/powerpc/include/asm/mmu_context.h
> > @@ -298,6 +298,19 @@ static inline int arch_dup_mmap(struct mm_struct 
> > *oldmm,
> > return 0;
> >  }
> >  
> > +#ifdef CONFIG_PPC_BOOK3S_64
> > +
> > +static inline void skip_slb_preload_mm(struct mm_struct *mm)
> > +{
> > +   mm->context.skip_slb_preload = true;
> > +}
> > +
> > +#else
> > +
> > +static inline void skip_slb_preload_mm(struct mm_struct *mm) {}
> > +
> > +#endif /* CONFIG_PPC_BOOK3S_64 */
> > +
> >  #include 
> >  
> >  #endif /* __KERNEL__ */
> > diff --git a/arch/powerpc/mm/book3s64/mmu_context.c 
> > b/arch/powerpc/mm/book3s64/mmu_context.c
> > index c10fc8a72fb37..3479910264c59 100644
> > --- a/arch/powerpc/mm/book3s64/mmu_context.c
> > +++ b/arch/powerpc/mm/book3s64/mmu_context.c
> > @@ -202,6 +202,8 @@ int init_new_context(struct task_struct *tsk, struct 
> > mm_struct *mm)
> >  atomic_set(&mm->context.active_cpus, 0);
> >  atomic_set(&mm->context.copros, 0);
> >  
> > +   mm->context.skip_slb_preload = false;
> > +
> > return 0;
> >  }
> >  
> > diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
> > index c91bd85eb90e3..da0836cb855af 100644
> > --- a/arch/powerpc/mm/book3s64/slb.c
> > +++ b/arch/powerpc/mm/book3s64/slb.c
> > @@ -441,10 +441,39 @@ static void slb_cache_slbie_user(unsigned int index)
> > asm volatile("slbie %0" : : "r" (slbie_data));
> >  }
> >  
> > +static void preload_slb_entries(struct task_struct *tsk, struct mm_struct 
> > *mm)
> Should this be explicitly inline or even __always_inline? I'm thinking
> switch_slb is probably a fairly hot path on hash?

Yes absolutely. I'll make this change in v5.
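
i.e. for v5 the declaration would become something like (sketch):

static __always_inline void preload_slb_entries(struct task_struct *tsk,
						struct mm_struct *mm)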

>
> > +{
> > +   struct thread_info *ti = task_thread_info(tsk);
> > +   unsigned char i;
> > +
> > +   /*
> > +* We gradually age out SLBs after a number of context switches to
> > +* reduce reload overhead of unused entries (like we do with FP/VEC
> > +* reload). Each time we wrap 256 switches, take an entry out of the
> > +* SLB preload cache.
> > +*/
> > +   tsk->thread.load_slb++;
> > +   if (!tsk->thread.load_slb) {
> > +   unsigned long pc = KSTK_EIP(tsk);
> > +
> > +   preload_age(ti);
> > +   preload_add(ti, pc);
> > +   }
> > +
> > +   for (i = 0; i < ti->slb_preload_nr; i++) {
> > +   unsigned char idx;
> > +   unsigned long ea;
> > +
> > +   idx = (ti->slb_preload_tail + i) % SLB_PRELOAD_NR;
> > +   ea = (unsigned long)ti->slb_preload_esid[idx] << SID_SHIFT;
> > +
> > +   slb_allocate_user(mm, ea);
> > +   }
> > +}
> > +
> >  /* Flush all user entries from the segment table of the current processor. 
> > */
> >  void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
> >  {
> > -   struct thread_info *ti = task_thread_info(tsk);
> > unsigned char i;
> >  
> > /*
> > @@ -502,29 +531,8 @@ void switch_slb(struct task_struct *tsk, struct 
> > mm_struct *mm)
> >  
> > copy_mm_to_paca(mm);
> >  
> > -   /*
> > -* We gradually age out SLBs 

[powerpc:merge] BUILD SUCCESS 086d9878e1092e7e69a69676ee9ec792690abb1d

2021-06-30 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
merge
branch HEAD: 086d9878e1092e7e69a69676ee9ec792690abb1d  Automatic merge of 
'master' into merge (2021-06-30 23:05)

elapsed time: 727m

configs tested: 101
configs skipped: 3

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm defconfig
arm64allyesconfig
arm64   defconfig
arm  allyesconfig
arm  allmodconfig
sh  urquell_defconfig
arm  pcm027_defconfig
arcvdk_hs38_smp_defconfig
arcnsimosci_defconfig
armvt8500_v6_v7_defconfig
powerpc  tqm8xx_defconfig
powerpc  mpc866_ads_defconfig
powerpc tqm8548_defconfig
sh   se7619_defconfig
ia64 allmodconfig
openriscdefconfig
powerpc  katmai_defconfig
powerpc   eiger_defconfig
mipse55_defconfig
powerpc pseries_defconfig
mips   bmips_be_defconfig
powerpc   currituck_defconfig
mips  ath25_defconfig
sh   j2_defconfig
ia64  tiger_defconfig
xtensa  audio_kc705_defconfig
arm mv78xx0_defconfig
um  defconfig
x86_64allnoconfig
ia64defconfig
ia64 allyesconfig
m68k allmodconfig
m68kdefconfig
m68k allyesconfig
nios2   defconfig
arc  allyesconfig
nds32 allnoconfig
nds32   defconfig
nios2allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
xtensa   allyesconfig
h8300allyesconfig
arc defconfig
sh   allmodconfig
parisc  defconfig
s390 allyesconfig
s390 allmodconfig
parisc   allyesconfig
s390defconfig
i386 allyesconfig
sparcallyesconfig
sparc   defconfig
i386defconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
x86_64   randconfig-a002-20210630
x86_64   randconfig-a001-20210630
x86_64   randconfig-a004-20210630
x86_64   randconfig-a005-20210630
x86_64   randconfig-a006-20210630
x86_64   randconfig-a003-20210630
i386 randconfig-a004-20210630
i386 randconfig-a001-20210630
i386 randconfig-a003-20210630
i386 randconfig-a002-20210630
i386 randconfig-a005-20210630
i386 randconfig-a006-20210630
i386 randconfig-a014-20210630
i386 randconfig-a011-20210630
i386 randconfig-a016-20210630
i386 randconfig-a012-20210630
i386 randconfig-a013-20210630
i386 randconfig-a015-20210630
riscvnommu_k210_defconfig
riscvallyesconfig
riscvnommu_virt_defconfig
riscv allnoconfig
riscv   defconfig
riscv  rv32_defconfig
riscvallmodconfig
x86_64rhel-8.3-kselftests
um   x86_64_defconfig
um i386_defconfig
umkunit_defconfig
x86_64   allyesconfig
x86_64  defconfig
x86_64   rhel-8.3
x86_64  rhel-8.3-kbuiltin
x86_64  kexec

clang tested configs:
x86_64   randconfig-b001-20210630
x86_64   randconfig-a012-20210630
x86_64   randconfig-a015-20210630
x86_64   randconfig-a016-20210630
x86_64   randconfig-a013-20210630
x86_64   randconfig-a011

[powerpc:next] BUILD REGRESSION 91fc46eced0f70526d74468ac6c932c90a8585b3

2021-06-30 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next
branch HEAD: 91fc46eced0f70526d74468ac6c932c90a8585b3  powerpc/64s: move 
ret_from_fork etc above __end_soft_masked

Error/Warning in current branch:

arch/powerpc/lib/restart_table.c:14:15: error: no previous prototype for function 'search_kernel_restart_table' [-Werror,-Wmissing-prototypes]
arch/powerpc/lib/restart_table.c:14:15: warning: no previous prototype for function 'search_kernel_restart_table' [-Wmissing-prototypes]
arch/powerpc/lib/restart_table.c:22:6: error: no previous prototype for function 'search_kernel_soft_mask_table' [-Werror,-Wmissing-prototypes]
arch/powerpc/lib/restart_table.c:22:6: warning: no previous prototype for function 'search_kernel_soft_mask_table' [-Wmissing-prototypes]

possible Error/Warning in current branch:

arch/powerpc/platforms/pseries/vas.c:186:13: warning: no previous prototype for 'pseries_vas_fault_thread_fn' [-Wmissing-prototypes]

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
`-- powerpc64-randconfig-r034-20210630
    `-- arch-powerpc-platforms-pseries-vas.c:warning:no-previous-prototype-for-pseries_vas_fault_thread_fn

clang_recent_errors
|-- powerpc-buildonly-randconfig-r003-20210630
|   |-- arch-powerpc-lib-restart_table.c:error:no-previous-prototype-for-function-search_kernel_restart_table-Werror-Wmissing-prototypes
|   `-- arch-powerpc-lib-restart_table.c:error:no-previous-prototype-for-function-search_kernel_soft_mask_table-Werror-Wmissing-prototypes
`-- powerpc-randconfig-r012-20210630
    |-- arch-powerpc-lib-restart_table.c:warning:no-previous-prototype-for-function-search_kernel_restart_table
    `-- arch-powerpc-lib-restart_table.c:warning:no-previous-prototype-for-function-search_kernel_soft_mask_table

elapsed time: 726m

configs tested: 137
configs skipped: 4

gcc tested configs:
arm defconfig
arm64allyesconfig
arm64   defconfig
arm  allyesconfig
arm  allmodconfig
mipsworkpad_defconfig
arm at91_dt_defconfig
sh   se7712_defconfig
ia64 bigsur_defconfig
mips  maltasmvp_defconfig
arm   aspeed_g4_defconfig
sh   se7724_defconfig
arm s5pv210_defconfig
m68km5272c3_defconfig
ia64 alldefconfig
m68km5307c3_defconfig
sh  sdk7786_defconfig
shdreamcast_defconfig
shedosk7705_defconfig
armzeus_defconfig
powerpc   maple_defconfig
powerpc mpc834x_mds_defconfig
arm  collie_defconfig
sh   se7721_defconfig
powerpc mpc8540_ads_defconfig
parisc   alldefconfig
mips db1xxx_defconfig
sh  r7785rp_defconfig
mips   xway_defconfig
powerpc sbc8548_defconfig
arc  axs103_defconfig
sh  rsk7203_defconfig
mipsqi_lb60_defconfig
powerpcwarp_defconfig
arm   h5000_defconfig
mips   sb1250_swarm_defconfig
s390 allyesconfig
arm hackkit_defconfig
powerpc   bluestone_defconfig
mips   capcella_defconfig
arm  pxa255-idp_defconfig
powerpc pseries_defconfig
mips   bmips_be_defconfig
powerpc   currituck_defconfig
mips  ath25_defconfig
mips rt305x_defconfig
sh   se7705_defconfig
powerpcfsp2_defconfig
ia64  tiger_defconfig
sh   rts7751r2dplus_defconfig
powerpc mpc836x_mds_defconfig
arm pxa_defconfig
sh   se7206_defconfig
mips mpc30x_defconfig
arc nsimosci_hs_defconfig
sh   se7722_defconfig
powerpc canyonlands_defconfig
sh   se7619_defconfig
shsh7785lcr_defconfig
powerpc  mpc885_ads_defconfig
powerpc  tqm8xx_defconfig
nios2alldefconfig
mips loongson1c_defconfig
sh   se7343_defconfig
x86_64allnoconfig
ia64 allmodconfig
ia64defconfig
ia64

Re: [PATCH v3 3/9] powerpc/64e: remove implicit soft-masking and interrupt exit restart logic

2021-06-30 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of June 30, 2021 5:56 pm:
> 
> 
> On 30/06/2021 at 09:46, Nicholas Piggin wrote:
>> The implicit soft-masking to speed up interrupt return was going to be
>> used by 64e as well, but it has not been extensively tested on that
>> platform and is not considered ready. It was intended to be disabled
>> before merge. Disable it for now.
>> 
>> Most of the restart code is common with 64s, so with more correctness
>> and performance testing this could be re-enabled again by adding the
>> extra soft-mask checks to interrupt handlers and flipping
>> exit_must_hard_disable().
>> 
>> Fixes: 9d1988ca87dd ("powerpc/64: treat low kernel text as irqs soft-masked")
>> Signed-off-by: Nicholas Piggin 
>> ---
>>   arch/powerpc/include/asm/interrupt.h | 33 
>>   arch/powerpc/kernel/exceptions-64e.S | 12 +-
>>   arch/powerpc/kernel/interrupt.c  |  2 +-
>>   arch/powerpc/kernel/interrupt_64.S   | 16 --
>>   4 files changed, 40 insertions(+), 23 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/interrupt.h 
>> b/arch/powerpc/include/asm/interrupt.h
>> index 8b4b1e84e110..f13c93b033c7 100644
>> --- a/arch/powerpc/include/asm/interrupt.h
>> +++ b/arch/powerpc/include/asm/interrupt.h
>> @@ -73,20 +73,34 @@
>>   #include 
>>   #include 
>>   
>> -#ifdef CONFIG_PPC64
>> +#ifdef CONFIG_PPC_BOOK3S_64
> 
> Can we avoid that ifdef and use IS_ENABLED(CONFIG_PPC_BOOK3S_64) below ?

Hey Christophe,

Thanks for the review, sorry it was a bit rushed to get these fixes in
before the pull. I agree with this and there's a few other cleanups we
might do as well. Something to look at next.
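
For reference, the IS_ENABLED() version would look something like this
(sketch only; it assumes the __end_soft_masked declaration stays visible
to all 64-bit builds so the dead branch still compiles):

static inline bool is_implicit_soft_masked(struct pt_regs *regs)
{
	if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64))
		return false;

	if (regs->msr & MSR_PR)
		return false;

	if (regs->nip >= (unsigned long)__end_soft_masked)
		return false;

	return true;
}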

> 
>>   extern char __end_soft_masked[];
>>   unsigned long search_kernel_restart_table(unsigned long addr);
>> -#endif
>>   
>> -#ifdef CONFIG_PPC_BOOK3S_64
>>   DECLARE_STATIC_KEY_FALSE(interrupt_exit_not_reentrant);
>>   
>> +static inline bool is_implicit_soft_masked(struct pt_regs *regs)
>> +{
>> +if (regs->msr & MSR_PR)
>> +return false;
>> +
>> +if (regs->nip >= (unsigned long)__end_soft_masked)
>> +return false;
>> +
>> +return true;
>> +}
>> +
>>   static inline void srr_regs_clobbered(void)
>>   {
>>  local_paca->srr_valid = 0;
>>  local_paca->hsrr_valid = 0;
>>   }
>>   #else
>> +static inline bool is_implicit_soft_masked(struct pt_regs *regs)
>> +{
>> +return false;
>> +}
>> +
>>   static inline void srr_regs_clobbered(void)
>>   {
>>   }
>> @@ -150,11 +164,13 @@ static inline void interrupt_enter_prepare(struct 
>> pt_regs *regs, struct interrup
>>   */
>>  if (TRAP(regs) != INTERRUPT_PROGRAM) {
>>  CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
>> -BUG_ON(regs->nip < (unsigned long)__end_soft_masked);
>> +BUG_ON(is_implicit_soft_masked(regs));
>>  }
>> +#ifdef CONFIG_PPC_BOOK3S
> 
> Although we are already in a PPC64 section, wouldn't it be better to use 
> CONFIG_PPC_BOOK3S_64 ?
> 
> Can we use IS_ENABLED(CONFIG_PPC_BOOK3S_64) instead ?

Good question, it's a matter of preference. I have used PPC_BOOK3S in 
other places, but maybe in files shared by 32-bit it would be better to 
have the _64?

On the other hand, in cases where you have an else or #else, then you
still need the PPC64 context to understand that.

I don't really have a preference, I would go with either. Making some
convention and using it everywhere is probably a good idea though.

Thanks,
Nick


Re: [RFC PATCH 19/43] KVM: PPC: Book3S HV P9: Add kvmppc_stop_thread to match kvmppc_start_thread

2021-06-30 Thread Fabiano Rosas
Nicholas Piggin  writes:

> Small cleanup makes it a bit easier to match up entry and exit
> operations.
>
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 

> ---
>  arch/powerpc/kvm/book3s_hv.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index b8b0695a9312..86c85e303a6d 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -2948,6 +2948,13 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu, 
> struct kvmppc_vcore *vc)
>   kvmppc_ipi_thread(cpu);
>  }
>
> +/* Old path does this in asm */
> +static void kvmppc_stop_thread(struct kvm_vcpu *vcpu)
> +{
> + vcpu->cpu = -1;
> + vcpu->arch.thread_cpu = -1;
> +}
> +
>  static void kvmppc_wait_for_nap(int n_threads)
>  {
>   int cpu = smp_processor_id();
> @@ -4154,8 +4161,6 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
> u64 time_limit,
>   dec = (s32) dec;
>   tb = mftb();
>   vcpu->arch.dec_expires = dec + tb;
> - vcpu->cpu = -1;
> - vcpu->arch.thread_cpu = -1;
>
>   store_spr_state(vcpu);
>
> @@ -4627,6 +4632,8 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 
> time_limit,
>
>   guest_exit_irqoff();
>
> + kvmppc_stop_thread(vcpu);
> +
>   powerpc_local_irq_pmu_restore(flags);
>
>   cpumask_clear_cpu(pcpu, &kvm->arch.cpu_in_guest);


Re: [RFC PATCH 07/43] KVM: PPC: Book3S HV: POWER10 enable HAIL when running radix guests

2021-06-30 Thread Fabiano Rosas
Nicholas Piggin  writes:

> HV interrupts may be taken with the MMU enabled when radix guests are
> running. Enable LPCR[HAIL] on ISA v3.1 processors for radix guests.
> Make this depend on the host LPCR[HAIL] being enabled. Currently that is
> always enabled, but having this test means any issue that might require
> LPCR[HAIL] to be disabled in the host will not have to be duplicated in
> KVM.
>
> -1380 cycles on P10 NULL hcall entry+exit
>
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 

> ---
>  arch/powerpc/kvm/book3s_hv.c | 29 +
>  1 file changed, 25 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 36e1db48fccf..ed713f49fbd5 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -4896,6 +4896,8 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu 
> *vcpu)
>   */
>  int kvmppc_switch_mmu_to_hpt(struct kvm *kvm)
>  {
> + unsigned long lpcr, lpcr_mask;
> +
>   if (nesting_enabled(kvm))
>   kvmhv_release_all_nested(kvm);
>   kvmppc_rmap_reset(kvm);
> @@ -4905,8 +4907,13 @@ int kvmppc_switch_mmu_to_hpt(struct kvm *kvm)
>   kvm->arch.radix = 0;
>   spin_unlock(&kvm->mmu_lock);
>   kvmppc_free_radix(kvm);
> - kvmppc_update_lpcr(kvm, LPCR_VPM1,
> -LPCR_VPM1 | LPCR_UPRT | LPCR_GTSE | LPCR_HR);
> +
> + lpcr = LPCR_VPM1;
> + lpcr_mask = LPCR_VPM1 | LPCR_UPRT | LPCR_GTSE | LPCR_HR;
> + if (cpu_has_feature(CPU_FTR_ARCH_31))
> + lpcr_mask |= LPCR_HAIL;
> + kvmppc_update_lpcr(kvm, lpcr, lpcr_mask);
> +
>   return 0;
>  }
>
> @@ -4916,6 +4923,7 @@ int kvmppc_switch_mmu_to_hpt(struct kvm *kvm)
>   */
>  int kvmppc_switch_mmu_to_radix(struct kvm *kvm)
>  {
> + unsigned long lpcr, lpcr_mask;
>   int err;
>
>   err = kvmppc_init_vm_radix(kvm);
> @@ -4927,8 +4935,17 @@ int kvmppc_switch_mmu_to_radix(struct kvm *kvm)
>   kvm->arch.radix = 1;
>   spin_unlock(&kvm->mmu_lock);
> - kvmppc_free_hpt(&kvm->arch.hpt);
> - kvmppc_update_lpcr(kvm, LPCR_UPRT | LPCR_GTSE | LPCR_HR,
> -LPCR_VPM1 | LPCR_UPRT | LPCR_GTSE | LPCR_HR);
> +
> + lpcr = LPCR_UPRT | LPCR_GTSE | LPCR_HR;
> + lpcr_mask = LPCR_VPM1 | LPCR_UPRT | LPCR_GTSE | LPCR_HR;
> + if (cpu_has_feature(CPU_FTR_ARCH_31)) {
> + lpcr_mask |= LPCR_HAIL;
> + if (cpu_has_feature(CPU_FTR_HVMODE) &&
> + (kvm->arch.host_lpcr & LPCR_HAIL))
> + lpcr |= LPCR_HAIL;
> + }
> + kvmppc_update_lpcr(kvm, lpcr, lpcr_mask);
> +
>   return 0;
>  }
>
> @@ -5092,6 +5109,10 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
>   kvm->arch.mmu_ready = 1;
>   lpcr &= ~LPCR_VPM1;
>   lpcr |= LPCR_UPRT | LPCR_GTSE | LPCR_HR;
> + if (cpu_has_feature(CPU_FTR_HVMODE) &&
> + cpu_has_feature(CPU_FTR_ARCH_31) &&
> + (kvm->arch.host_lpcr & LPCR_HAIL))
> + lpcr |= LPCR_HAIL;
>   ret = kvmppc_init_vm_radix(kvm);
>   if (ret) {
>   kvmppc_free_lpid(kvm->arch.lpid);


Re: [RFC PATCH 08/43] powerpc/64s: Keep AMOR SPR a constant ~0 at runtime

2021-06-30 Thread Fabiano Rosas
Nicholas Piggin  writes:

> This register controls supervisor SPR modifications, and as such is only
> relevant for KVM. KVM always sets AMOR to ~0 on guest entry, and never
> restores it coming back out to the host, so it can be kept constant and
> avoid the mtSPR in KVM guest entry.
>
> -21 cycles (9116) cycles POWER9 virt-mode NULL hcall
>
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 

> ---
>  arch/powerpc/kernel/cpu_setup_power.c|  8 
>  arch/powerpc/kernel/dt_cpu_ftrs.c|  2 ++
>  arch/powerpc/kvm/book3s_hv_p9_entry.c|  2 --
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S  |  2 --
>  arch/powerpc/mm/book3s64/radix_pgtable.c | 15 ---
>  arch/powerpc/platforms/powernv/idle.c|  8 +++-
>  6 files changed, 13 insertions(+), 24 deletions(-)
>
> diff --git a/arch/powerpc/kernel/cpu_setup_power.c 
> b/arch/powerpc/kernel/cpu_setup_power.c
> index 3cca88ee96d7..a29dc8326622 100644
> --- a/arch/powerpc/kernel/cpu_setup_power.c
> +++ b/arch/powerpc/kernel/cpu_setup_power.c
> @@ -137,6 +137,7 @@ void __setup_cpu_power7(unsigned long offset, struct 
> cpu_spec *t)
>   return;
>
>   mtspr(SPRN_LPID, 0);
> + mtspr(SPRN_AMOR, ~0);
>   mtspr(SPRN_PCR, PCR_MASK);
>   init_LPCR_ISA206(mfspr(SPRN_LPCR), LPCR_LPES1 >> LPCR_LPES_SH);
>  }
> @@ -150,6 +151,7 @@ void __restore_cpu_power7(void)
>   return;
>
>   mtspr(SPRN_LPID, 0);
> + mtspr(SPRN_AMOR, ~0);
>   mtspr(SPRN_PCR, PCR_MASK);
>   init_LPCR_ISA206(mfspr(SPRN_LPCR), LPCR_LPES1 >> LPCR_LPES_SH);
>  }
> @@ -164,6 +166,7 @@ void __setup_cpu_power8(unsigned long offset, struct 
> cpu_spec *t)
>   return;
>
>   mtspr(SPRN_LPID, 0);
> + mtspr(SPRN_AMOR, ~0);
>   mtspr(SPRN_PCR, PCR_MASK);
>   init_LPCR_ISA206(mfspr(SPRN_LPCR) | LPCR_PECEDH, 0); /* LPES = 0 */
>   init_HFSCR();
> @@ -184,6 +187,7 @@ void __restore_cpu_power8(void)
>   return;
>
>   mtspr(SPRN_LPID, 0);
> + mtspr(SPRN_AMOR, ~0);
>   mtspr(SPRN_PCR, PCR_MASK);
>   init_LPCR_ISA206(mfspr(SPRN_LPCR) | LPCR_PECEDH, 0); /* LPES = 0 */
>   init_HFSCR();
> @@ -202,6 +206,7 @@ void __setup_cpu_power9(unsigned long offset, struct 
> cpu_spec *t)
>   mtspr(SPRN_PSSCR, 0);
>   mtspr(SPRN_LPID, 0);
>   mtspr(SPRN_PID, 0);
> + mtspr(SPRN_AMOR, ~0);
>   mtspr(SPRN_PCR, PCR_MASK);
>   init_LPCR_ISA300((mfspr(SPRN_LPCR) | LPCR_PECEDH | LPCR_PECE_HVEE |\
>LPCR_HVICE | LPCR_HEIC) & ~(LPCR_UPRT | LPCR_HR), 0);
> @@ -223,6 +228,7 @@ void __restore_cpu_power9(void)
>   mtspr(SPRN_PSSCR, 0);
>   mtspr(SPRN_LPID, 0);
>   mtspr(SPRN_PID, 0);
> + mtspr(SPRN_AMOR, ~0);
>   mtspr(SPRN_PCR, PCR_MASK);
>   init_LPCR_ISA300((mfspr(SPRN_LPCR) | LPCR_PECEDH | LPCR_PECE_HVEE |\
>LPCR_HVICE | LPCR_HEIC) & ~(LPCR_UPRT | LPCR_HR), 0);
> @@ -242,6 +248,7 @@ void __setup_cpu_power10(unsigned long offset, struct 
> cpu_spec *t)
>   mtspr(SPRN_PSSCR, 0);
>   mtspr(SPRN_LPID, 0);
>   mtspr(SPRN_PID, 0);
> + mtspr(SPRN_AMOR, ~0);
>   mtspr(SPRN_PCR, PCR_MASK);
>   init_LPCR_ISA300((mfspr(SPRN_LPCR) | LPCR_PECEDH | LPCR_PECE_HVEE |\
>LPCR_HVICE | LPCR_HEIC) & ~(LPCR_UPRT | LPCR_HR), 0);
> @@ -264,6 +271,7 @@ void __restore_cpu_power10(void)
>   mtspr(SPRN_PSSCR, 0);
>   mtspr(SPRN_LPID, 0);
>   mtspr(SPRN_PID, 0);
> + mtspr(SPRN_AMOR, ~0);
>   mtspr(SPRN_PCR, PCR_MASK);
>   init_LPCR_ISA300((mfspr(SPRN_LPCR) | LPCR_PECEDH | LPCR_PECE_HVEE |\
>LPCR_HVICE | LPCR_HEIC) & ~(LPCR_UPRT | LPCR_HR), 0);
> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
> b/arch/powerpc/kernel/dt_cpu_ftrs.c
> index 358aee7c2d79..0a6b36b4bda8 100644
> --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
> +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
> @@ -80,6 +80,7 @@ static void __restore_cpu_cpufeatures(void)
>   mtspr(SPRN_LPCR, system_registers.lpcr);
>   if (hv_mode) {
>   mtspr(SPRN_LPID, 0);
> + mtspr(SPRN_AMOR, ~0);
>   mtspr(SPRN_HFSCR, system_registers.hfscr);
>   mtspr(SPRN_PCR, system_registers.pcr);
>   }
> @@ -216,6 +217,7 @@ static int __init feat_enable_hv(struct dt_cpu_feature *f)
>   }
>
>   mtspr(SPRN_LPID, 0);
> + mtspr(SPRN_AMOR, ~0);
>
>   lpcr = mfspr(SPRN_LPCR);
>   lpcr &=  ~LPCR_LPES0; /* HV external interrupts */
> diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c 
> b/arch/powerpc/kvm/book3s_hv_p9_entry.c
> index c4f3e066fcb4..a3281f0c9214 100644
> --- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
> +++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
> @@ -286,8 +286,6 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
> time_limit, unsigned long lpc
>   mtspr(SPRN_SPRG2, vcpu->arch.shregs.sprg2);
>   mtspr(SPRN_SPRG3, vcpu->arch.shregs.sprg3);
>
> - mtspr(SPRN_AMOR, ~0UL);
> -
>   local_paca->kvm_hstate.in_guest = 

Re: [RFC PATCH 38/43] KVM: PPC: Book3S HV P9: Test dawr_enabled() before saving host DAWR SPRs

2021-06-30 Thread Fabiano Rosas
Nicholas Piggin  writes:

> Some of the DAWR SPR access is already predicated on dawr_enabled(),
> apply this to the remainder of the accesses.
>
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/kvm/book3s_hv_p9_entry.c | 34 ---
>  1 file changed, 20 insertions(+), 14 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c 
> b/arch/powerpc/kvm/book3s_hv_p9_entry.c
> index 7aa72efcac6c..f305d1d6445c 100644
> --- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
> +++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
> @@ -638,13 +638,16 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
> time_limit, unsigned long lpc
>
>   host_hfscr = mfspr(SPRN_HFSCR);
>   host_ciabr = mfspr(SPRN_CIABR);
> - host_dawr0 = mfspr(SPRN_DAWR0);
> - host_dawrx0 = mfspr(SPRN_DAWRX0);
>   host_psscr = mfspr(SPRN_PSSCR);
>   host_pidr = mfspr(SPRN_PID);
> - if (cpu_has_feature(CPU_FTR_DAWR1)) {
> - host_dawr1 = mfspr(SPRN_DAWR1);
> - host_dawrx1 = mfspr(SPRN_DAWRX1);
> +
> + if (dawr_enabled()) {
> + host_dawr0 = mfspr(SPRN_DAWR0);
> + host_dawrx0 = mfspr(SPRN_DAWRX0);
> + if (cpu_has_feature(CPU_FTR_DAWR1)) {
> + host_dawr1 = mfspr(SPRN_DAWR1);
> + host_dawrx1 = mfspr(SPRN_DAWRX1);

The userspace needs to enable DAWR1 via KVM_CAP_PPC_DAWR1. That cap is
not even implemented in QEMU currently, so we never allow the guest to
set vcpu->arch.dawr1. If we check for kvm->arch.dawr1_enabled instead of
the CPU feature, we could shave some more time here.
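
i.e. something along these lines (untested sketch, using the field set
via KVM_CAP_PPC_DAWR1):

	if (dawr_enabled()) {
		host_dawr0 = mfspr(SPRN_DAWR0);
		host_dawrx0 = mfspr(SPRN_DAWRX0);
		if (vcpu->kvm->arch.dawr1_enabled) {
			host_dawr1 = mfspr(SPRN_DAWR1);
			host_dawrx1 = mfspr(SPRN_DAWRX1);
		}
	}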

> + }
>   }
>
>   local_paca->kvm_hstate.host_purr = mfspr(SPRN_PURR);
> @@ -951,15 +954,18 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
> time_limit, unsigned long lpc
>   mtspr(SPRN_HFSCR, host_hfscr);
>   if (vcpu->arch.ciabr != host_ciabr)
>   mtspr(SPRN_CIABR, host_ciabr);
> - if (vcpu->arch.dawr0 != host_dawr0)
> - mtspr(SPRN_DAWR0, host_dawr0);
> - if (vcpu->arch.dawrx0 != host_dawrx0)
> - mtspr(SPRN_DAWRX0, host_dawrx0);
> - if (cpu_has_feature(CPU_FTR_DAWR1)) {
> - if (vcpu->arch.dawr1 != host_dawr1)
> - mtspr(SPRN_DAWR1, host_dawr1);
> - if (vcpu->arch.dawrx1 != host_dawrx1)
> - mtspr(SPRN_DAWRX1, host_dawrx1);
> +
> + if (dawr_enabled()) {
> + if (vcpu->arch.dawr0 != host_dawr0)
> + mtspr(SPRN_DAWR0, host_dawr0);
> + if (vcpu->arch.dawrx0 != host_dawrx0)
> + mtspr(SPRN_DAWRX0, host_dawrx0);
> + if (cpu_has_feature(CPU_FTR_DAWR1)) {
> + if (vcpu->arch.dawr1 != host_dawr1)
> + mtspr(SPRN_DAWR1, host_dawr1);
> + if (vcpu->arch.dawrx1 != host_dawrx1)
> + mtspr(SPRN_DAWRX1, host_dawrx1);
> + }
>   }
>
>   if (vc->dpdes)


Re: [RFC PATCH 01/43] powerpc/64s: Remove WORT SPR from POWER9/10

2021-06-30 Thread Fabiano Rosas
Nicholas Piggin  writes:

> This register is not architected and not implemented in POWER9 or 10,
> it just reads back zeroes for compatibility.
>
> -78 cycles (9255) cycles POWER9 virt-mode NULL hcall
>
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 

> ---
>  arch/powerpc/kvm/book3s_hv.c  | 3 ---
>  arch/powerpc/platforms/powernv/idle.c | 2 --
>  2 files changed, 5 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 9228042bd54f..97f3d6d54b61 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -3640,7 +3640,6 @@ static void load_spr_state(struct kvm_vcpu *vcpu)
>   mtspr(SPRN_EBBHR, vcpu->arch.ebbhr);
>   mtspr(SPRN_EBBRR, vcpu->arch.ebbrr);
>   mtspr(SPRN_BESCR, vcpu->arch.bescr);
> - mtspr(SPRN_WORT, vcpu->arch.wort);
>   mtspr(SPRN_TIDR, vcpu->arch.tid);
>   mtspr(SPRN_AMR, vcpu->arch.amr);
>   mtspr(SPRN_UAMOR, vcpu->arch.uamor);
> @@ -3667,7 +3666,6 @@ static void store_spr_state(struct kvm_vcpu *vcpu)
>   vcpu->arch.ebbhr = mfspr(SPRN_EBBHR);
>   vcpu->arch.ebbrr = mfspr(SPRN_EBBRR);
>   vcpu->arch.bescr = mfspr(SPRN_BESCR);
> - vcpu->arch.wort = mfspr(SPRN_WORT);
>   vcpu->arch.tid = mfspr(SPRN_TIDR);
>   vcpu->arch.amr = mfspr(SPRN_AMR);
>   vcpu->arch.uamor = mfspr(SPRN_UAMOR);
> @@ -3699,7 +3697,6 @@ static void restore_p9_host_os_sprs(struct kvm_vcpu 
> *vcpu,
>   struct p9_host_os_sprs *host_os_sprs)
>  {
>   mtspr(SPRN_PSPB, 0);
> - mtspr(SPRN_WORT, 0);
>   mtspr(SPRN_UAMOR, 0);
>
>   mtspr(SPRN_DSCR, host_os_sprs->dscr);
> diff --git a/arch/powerpc/platforms/powernv/idle.c 
> b/arch/powerpc/platforms/powernv/idle.c
> index 528a7e0cf83a..180baecad914 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -667,7 +667,6 @@ static unsigned long power9_idle_stop(unsigned long psscr)
>   sprs.purr   = mfspr(SPRN_PURR);
>   sprs.spurr  = mfspr(SPRN_SPURR);
>   sprs.dscr   = mfspr(SPRN_DSCR);
> - sprs.wort   = mfspr(SPRN_WORT);
>   sprs.ciabr  = mfspr(SPRN_CIABR);
>
>   sprs.mmcra  = mfspr(SPRN_MMCRA);
> @@ -785,7 +784,6 @@ static unsigned long power9_idle_stop(unsigned long psscr)
>   mtspr(SPRN_PURR,sprs.purr);
>   mtspr(SPRN_SPURR,   sprs.spurr);
>   mtspr(SPRN_DSCR,sprs.dscr);
> - mtspr(SPRN_WORT,sprs.wort);
>   mtspr(SPRN_CIABR,   sprs.ciabr);
>
>   mtspr(SPRN_MMCRA,   sprs.mmcra);


Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-06-30 Thread Nathan Chancellor
Hi Will and Claire,

On Wed, Jun 30, 2021 at 12:43:48PM +0100, Will Deacon wrote:
> On Wed, Jun 30, 2021 at 05:17:27PM +0800, Claire Chang wrote:
> > On Wed, Jun 30, 2021 at 9:43 AM Nathan Chancellor  wrote:
> > >
> > > On Thu, Jun 24, 2021 at 11:55:20PM +0800, Claire Chang wrote:
> > > > Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and
> > > > use it to determine whether to bounce the data or not. This will be
> > > > useful later to allow for different pools.
> > > >
> > > > Signed-off-by: Claire Chang 
> > > > Reviewed-by: Christoph Hellwig 
> > > > Tested-by: Stefano Stabellini 
> > > > Tested-by: Will Deacon 
> > > > Acked-by: Stefano Stabellini 
> > >
> > > This patch as commit af452ec1b1a3 ("swiotlb: Use is_swiotlb_force_bounce
> > > for swiotlb data bouncing") causes my Ryzen 3 4300G system to fail to
> > > get to an X session consistently (although not every single time),
> > > presumably due to a crash in the AMDGPU driver that I see in dmesg.
> > >
> > > I have attached logs at af452ec1b1a3 and f127c9556a8e and I am happy
> > > to provide any further information, debug, or test patches as necessary.
> > 
> > Are you using swiotlb=force? or the swiotlb_map is called because of
> > !dma_capable? 
> > (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/dma/direct.h#n93)
> 
> The command line is in the dmesg:
> 
>   | Kernel command line: initrd=\amd-ucode.img 
> initrd=\initramfs-linux-next-llvm.img 
> root=PARTUUID=8680aa0c-cf09-4a69-8cf3-970478040ee7 rw intel_pstate=no_hwp 
> irqpoll
> 
> but I worry that this looks _very_ similar to the issue reported by Qian
> Cai which we thought we had fixed. Nathan -- is the failure deterministic?

Yes, for the most part. It does not happen every single boot, so when I
was bisecting, I did a series of seven boots and only considered a
revision good when all seven of them made it to LightDM's greeter. The
notes I kept show most bad revisions failed anywhere from four
to six times.

> > `BUG: unable to handle page fault for address: 003a8290` and
> > the fact it crashed at `_raw_spin_lock_irqsave` look like the memory
> > (maybe dev->dma_io_tlb_mem) was corrupted?
> > The dev->dma_io_tlb_mem should be set here
> > (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/probe.c#n2528)
> > through device_initialize.
> 
> I'm less sure about this. 'dma_io_tlb_mem' should be pointing at
> 'io_tlb_default_mem', which is a page-aligned allocation from memblock.
> The spinlock is at offset 0x24 in that structure, and looking at the
> register dump from the crash:
> 
> Jun 29 18:28:42 hp-4300G kernel: RSP: 0018:adb4013db9e8 EFLAGS: 00010006
> Jun 29 18:28:42 hp-4300G kernel: RAX: 003a8290 RBX:  
> RCX: 8900572ad580
> Jun 29 18:28:42 hp-4300G kernel: RDX: 89005653f024 RSI: 000c 
> RDI: 1d17
> Jun 29 18:28:42 hp-4300G kernel: RBP: 0a20d000 R08: 000c 
> R09: 
> Jun 29 18:28:42 hp-4300G kernel: R10: 0a20d000 R11: 89005653f000 
> R12: 0212
> Jun 29 18:28:42 hp-4300G kernel: R13: 1000 R14: 0002 
> R15: 0020
> Jun 29 18:28:42 hp-4300G kernel: FS:  7f1f8898ea40() 
> GS:89005728() knlGS:
> Jun 29 18:28:42 hp-4300G kernel: CS:  0010 DS:  ES:  CR0: 
> 80050033
> Jun 29 18:28:42 hp-4300G kernel: CR2: 003a8290 CR3: 0001020d 
> CR4: 00350ee0
> Jun 29 18:28:42 hp-4300G kernel: Call Trace:
> Jun 29 18:28:42 hp-4300G kernel:  _raw_spin_lock_irqsave+0x39/0x50
> Jun 29 18:28:42 hp-4300G kernel:  swiotlb_tbl_map_single+0x12b/0x4c0
> 
> Then that correlates with R11 holding the 'dma_io_tlb_mem' pointer and
> RDX pointing at the spinlock. Yet RAX is holding junk :/
> 
> I agree that enabling KASAN would be a good idea, but I also think we
> probably need to get some more information out of swiotlb_tbl_map_single()
> to see what exactly is going wrong in there.

I can certainly enable KASAN and if there is any debug print I can add
or dump anything, let me know!

Cheers,
Nathan


Re: [PATCH] powerpc/32s: Fix setup_{kuap/kuep}() on SMP

2021-06-30 Thread Michael Ellerman
On Mon, 28 Jun 2021 06:56:11 + (UTC), Christophe Leroy wrote:
> On SMP, setup_kup() is also called from start_secondary().
> 
> start_secondary() is not an __init function.
> 
> Remove the __init marker from setup_kuep() and setup_kuap().

Applied to powerpc/next.

[1/1] powerpc/32s: Fix setup_{kuap/kuep}() on SMP
  https://git.kernel.org/powerpc/c/c89e632658e793fbbdcbfbe80a6c13bbf7203e9b

cheers


Re: [PATCH v3 0/9] powerpc: fast interrupt exit bug and misc fixes

2021-06-30 Thread Michael Ellerman
On Wed, 30 Jun 2021 17:46:12 +1000, Nicholas Piggin wrote:
> This is a bunch of fixes for powerpc next, mostly a nasty hole in fast
> interrupt exit code found by Sachin and some other bits along the way
> while looking at it.
> 
> Since v2:
> - Fixed 64e patch 3 to really set exit_must_hard_disable.
> - Reworded some changelogs.
> 
> [...]

Applied to powerpc/next.

[1/9] powerpc/64s: fix hash page fault interrupt handler
  https://git.kernel.org/powerpc/c/5567b1ee29b7a83e8c01d99d34b5bbd306ce0bcf
[2/9] powerpc/64e: fix CONFIG_RELOCATABLE build warnings
  https://git.kernel.org/powerpc/c/fce01acf830a697110ed72ecace4b0afdbcd53cb
[3/9] powerpc/64e: remove implicit soft-masking and interrupt exit restart logic
  https://git.kernel.org/powerpc/c/9b69d48c7516a29cdaacd18d8bf5f575014a42a1
[4/9] powerpc/64s: add a table of implicit soft-masked addresses
  https://git.kernel.org/powerpc/c/325678fd052259e7c05ef29060a73c705ea90432
[5/9] powerpc/64s/interrupt: preserve regs->softe for NMI interrupts
  https://git.kernel.org/powerpc/c/1b0482229c302a3c6afd00d6b3bf0169cf279b44
[6/9] powerpc/64: enable MSR[EE] in irq replay pt_regs
  https://git.kernel.org/powerpc/c/2b43dd7653cca47d297756980846ebbfe8887fa1
[7/9] powerpc/64/interrupt: add missing kprobe annotations on interrupt exit 
symbols
  https://git.kernel.org/powerpc/c/98798f33c6be5a511ab61958b40835b3ef08def2
[8/9] powerpc/64s/interrupt: clean up interrupt return labels
  https://git.kernel.org/powerpc/c/c59458b00aec4ba580d9628d36d6c984af94d192
[9/9] powerpc/64s: move ret_from_fork etc above __end_soft_masked
  https://git.kernel.org/powerpc/c/91fc46eced0f70526d74468ac6c932c90a8585b3

cheers


Re: [PATCH v2] powerpc/4xx: Fix setup_kuep() on SMP

2021-06-30 Thread Michael Ellerman
On Tue, 29 Jun 2021 12:24:21 + (UTC), Christophe Leroy wrote:
> On SMP, setup_kuep() is also called from start_secondary() since
> commit 86f46f343272 ("powerpc/32s: Initialise KUAP and KUEP in C").
> 
> start_secondary() is not an __init function.
> 
> Remove the __init marker from setup_kuep() and bail out when
> not called on the first CPU, as the work is already done.
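
(For context, the bail-out boils down to something like the sketch
below; boot_cpuid and the pr_info line are assumptions about the 4xx
code, not a quote of the applied patch:)

	void setup_kuep(bool disabled)
	{
		/* Only the boot CPU does the one-time setup; secondaries
		 * return early since the work is already done.
		 */
		if (smp_processor_id() != boot_cpuid)
			return;

		pr_info("Activating Kernel Userspace Execution Prevention\n");
	}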

Applied to powerpc/next.

[1/1] powerpc/4xx: Fix setup_kuep() on SMP
  https://git.kernel.org/powerpc/c/fc4999864bca323f1b844fefe1b402632443c076

cheers


Re: [PATCH] crypto: DRBG - select SHA512

2021-06-30 Thread Sachin Sant


> On 30-Jun-2021, at 4:02 PM, Stephan Mueller  wrote:
> 
> With the switch to use HMAC(SHA-512) as the default DRBG type, the
> configuration must now also select SHA-512.
> 
> Fixes: 9b7b94683a9b "crypto: DRBG - switch to HMAC SHA512 DRBG as default
> DRBG"
> Reported-by: Sachin Sant 
> Signed-off-by: Stephan Mueller 
> ---

Thanks Stephan. This patch fixes the reported problem.

Tested-by: Sachin Sant 

-Sachin



[PATCH] powerpc/papr_scm: Move duplicate definitions to common header files

2021-06-30 Thread Shivaprasad G Bhat
papr_scm uses PDSM structures like nd_papr_pdsm_health as the
PDSM payload, which would be reused by ndtest. Since ndtest is arch
independent, move the PDSM header from arch/powerpc/include/uapi/
to the generic include/uapi/linux directory.

Also, there are some #defines common between papr_scm and ndtest
which are not exported to user space. So, move them to a header
file which can be shared across ndtest and papr_scm via the newly
introduced include/linux/papr_scm.h.

Signed-off-by: Shivaprasad G Bhat 
Suggested-by: "Aneesh Kumar K.V" 
---
 MAINTAINERS   |2 
 arch/powerpc/include/uapi/asm/papr_pdsm.h |  147 -
 arch/powerpc/platforms/pseries/papr_scm.c |   43 
 include/linux/papr_scm.h  |   48 +
 include/uapi/linux/papr_pdsm.h|  147 +
 tools/testing/nvdimm/test/ndtest.c|1 
 tools/testing/nvdimm/test/ndtest.h|   31 --
 7 files changed, 200 insertions(+), 219 deletions(-)
 delete mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h
 create mode 100644 include/linux/papr_scm.h
 create mode 100644 include/uapi/linux/papr_pdsm.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 008fcad7ac008..5d2d8a6b5eb53 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10494,6 +10494,8 @@ F:  drivers/rtc/rtc-opal.c
 F: drivers/scsi/ibmvscsi/
 F: drivers/tty/hvc/hvc_opal.c
 F: drivers/watchdog/wdrtas.c
+F: include/linux/papr_scm.h
+F: include/uapi/linux/papr_pdsm.h
 F: tools/testing/selftests/powerpc
 N: /pmac
 N: powermac
diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h 
b/arch/powerpc/include/uapi/asm/papr_pdsm.h
deleted file mode 100644
index 82488b1e7276e..0
--- a/arch/powerpc/include/uapi/asm/papr_pdsm.h
+++ /dev/null
@@ -1,147 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
-/*
- * PAPR nvDimm Specific Methods (PDSM) and structs for libndctl
- *
- * (C) Copyright IBM 2020
- *
- * Author: Vaibhav Jain 
- */
-
-#ifndef _UAPI_ASM_POWERPC_PAPR_PDSM_H_
-#define _UAPI_ASM_POWERPC_PAPR_PDSM_H_
-
-#include 
-#include 
-
-/*
- * PDSM Envelope:
- *
- * The ioctl ND_CMD_CALL exchange data between user-space and kernel via
- * envelope which consists of 2 headers sections and payload sections as
- * illustrated below:
- *  +-+---+---+
- *  |   64-Bytes  |   8-Bytes |   Max 184-Bytes   |
- *  +-+---+---+
- *  | ND-HEADER   |  PDSM-HEADER  |  PDSM-PAYLOAD |
- *  +-+---+---+
- *  | nd_family   |   |   |
- *  | nd_size_out | cmd_status|   |
- *  | nd_size_in  | reserved  | nd_pdsm_payload   |
- *  | nd_command  | payload   --> |   |
- *  | nd_fw_size  |   |   |
- *  | nd_payload ---> |   |   |
- *  +---+-+---+
- *
- * ND Header:
- * This is the generic libnvdimm header described as 'struct nd_cmd_pkg'
- * which is interpreted by libnvdimm before passed on to papr_scm. Important
- * member fields used are:
- * 'nd_family' : (In) NVDIMM_FAMILY_PAPR_SCM
- * 'nd_size_in': (In) PDSM-HEADER + PDSM-IN-PAYLOAD (usually 0)
- * 'nd_size_out': (In) PDSM-HEADER + PDSM-RETURN-PAYLOAD
- * 'nd_command' : (In) One of PAPR_PDSM_XXX
- * 'nd_fw_size' : (Out) PDSM-HEADER + size of actual payload returned
- *
- * PDSM Header:
- * This is papr-scm specific header that precedes the payload. This is defined
- * as nd_cmd_pdsm_pkg.  Following fields aare available in this header:
- *
- * 'cmd_status': (Out) Errors if any encountered while 
servicing PDSM.
- * 'reserved'  : Not used, reserved for future and should be set to 0.
- * 'payload': A union of all the possible payload structs
- *
- * PDSM Payload:
- *
- * The layout of the PDSM Payload is defined by various structs shared between
- * papr_scm and libndctl so that contents of payload can be interpreted. As 
such
- * its defined as a union of all possible payload structs as
- * 'union nd_pdsm_payload'. Based on the value of 'nd_cmd_pkg.nd_command'
- * appropriate member of the union is accessed.
- */
-
-/* Max payload size that we can handle */
-#define ND_PDSM_PAYLOAD_MAX_SIZE 184
-
-/* Max payload size that we can handle */
-#define ND_PDSM_HDR_SIZE \
-   (sizeof(struct nd_pkg_pdsm) - ND_PDSM_PAYLOAD_MAX_SIZE)
-
-/* Various nvdimm health indicators */
-#define PAPR_PDSM_DIMM_HEALTHY   0
-#define PAPR_PDSM_DIMM_UNHEALTHY 1
-#define PAPR_PDSM_DIMM_CRITICAL  2
-#define PAPR_PDSM_DIMM_FATAL 3
-
-/* struct nd_papr_pdsm_health.extension_flags field 

Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-06-30 Thread Will Deacon
On Wed, Jun 30, 2021 at 05:17:27PM +0800, Claire Chang wrote:
> On Wed, Jun 30, 2021 at 9:43 AM Nathan Chancellor  wrote:
> >
> > On Thu, Jun 24, 2021 at 11:55:20PM +0800, Claire Chang wrote:
> > > Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and
> > > use it to determine whether to bounce the data or not. This will be
> > > useful later to allow for different pools.
> > >
> > > Signed-off-by: Claire Chang 
> > > Reviewed-by: Christoph Hellwig 
> > > Tested-by: Stefano Stabellini 
> > > Tested-by: Will Deacon 
> > > Acked-by: Stefano Stabellini 
> >
> > This patch as commit af452ec1b1a3 ("swiotlb: Use is_swiotlb_force_bounce
> > for swiotlb data bouncing") causes my Ryzen 3 4300G system to fail to
> > get to an X session consistently (although not every single time),
> > presumably due to a crash in the AMDGPU driver that I see in dmesg.
> >
> > I have attached logs at af452ec1b1a3 and f127c9556a8e and I am happy
> > to provide any further information, debug, or test patches as necessary.
> 
> Are you using swiotlb=force? or the swiotlb_map is called because of
> !dma_capable? 
> (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/dma/direct.h#n93)

The command line is in the dmesg:

  | Kernel command line: initrd=\amd-ucode.img 
initrd=\initramfs-linux-next-llvm.img 
root=PARTUUID=8680aa0c-cf09-4a69-8cf3-970478040ee7 rw intel_pstate=no_hwp 
irqpoll

but I worry that this looks _very_ similar to the issue reported by Qian
Cai which we thought we had fixed. Nathan -- is the failure deterministic?

> `BUG: unable to handle page fault for address: 003a8290` and
> the fact it crashed at `_raw_spin_lock_irqsave` look like the memory
> (maybe dev->dma_io_tlb_mem) was corrupted?
> The dev->dma_io_tlb_mem should be set here
> (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/probe.c#n2528)
> through device_initialize.

I'm less sure about this. 'dma_io_tlb_mem' should be pointing at
'io_tlb_default_mem', which is a page-aligned allocation from memblock.
The spinlock is at offset 0x24 in that structure, and looking at the
register dump from the crash:

Jun 29 18:28:42 hp-4300G kernel: RSP: 0018:adb4013db9e8 EFLAGS: 00010006
Jun 29 18:28:42 hp-4300G kernel: RAX: 003a8290 RBX:  
RCX: 8900572ad580
Jun 29 18:28:42 hp-4300G kernel: RDX: 89005653f024 RSI: 000c 
RDI: 1d17
Jun 29 18:28:42 hp-4300G kernel: RBP: 0a20d000 R08: 000c 
R09: 
Jun 29 18:28:42 hp-4300G kernel: R10: 0a20d000 R11: 89005653f000 
R12: 0212
Jun 29 18:28:42 hp-4300G kernel: R13: 1000 R14: 0002 
R15: 0020
Jun 29 18:28:42 hp-4300G kernel: FS:  7f1f8898ea40() 
GS:89005728() knlGS:
Jun 29 18:28:42 hp-4300G kernel: CS:  0010 DS:  ES:  CR0: 
80050033
Jun 29 18:28:42 hp-4300G kernel: CR2: 003a8290 CR3: 0001020d 
CR4: 00350ee0
Jun 29 18:28:42 hp-4300G kernel: Call Trace:
Jun 29 18:28:42 hp-4300G kernel:  _raw_spin_lock_irqsave+0x39/0x50
Jun 29 18:28:42 hp-4300G kernel:  swiotlb_tbl_map_single+0x12b/0x4c0

Then that correlates with R11 holding the 'dma_io_tlb_mem' pointer and
RDX pointing at the spinlock. Yet RAX is holding junk :/

I agree that enabling KASAN would be a good idea, but I also think we
probably need to get some more information out of swiotlb_tbl_map_single()
to see what exactly is going wrong in there.
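
Something along the lines of the hack below, placed at the top of
swiotlb_tbl_map_single(), would at least tell us what 'mem' points at
before the lock is taken (a debugging sketch only; it assumes the local
'mem' pointer and the nslabs/used/force_bounce fields from this series):

	/* Debug only: dump the pool pointer and a few fields before
	 * mem->lock is taken, to confirm dev->dma_io_tlb_mem still points
	 * at io_tlb_default_mem when the fault happens.
	 */
	if (WARN_ON_ONCE(!mem))
		return (phys_addr_t)DMA_MAPPING_ERROR;
	dev_info(dev, "swiotlb: mem=%px nslabs=%lu used=%lu force_bounce=%d\n",
		 mem, mem->nslabs, mem->used, mem->force_bounce);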

Will


[PATCH] crypto: DRBG - select SHA512

2021-06-30 Thread Stephan Mueller
With the switch to use HMAC(SHA-512) as the default DRBG type, the
configuration must now also select SHA-512.

Fixes: 9b7b94683a9b "crypto: DRBG - switch to HMAC SHA512 DRBG as default
DRBG"
Reported-by: Sachin Sant 
Signed-off-by: Stephan Mueller 
---
 crypto/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index ca3b02dcbbfa..64b772c5d1c9 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1768,7 +1768,7 @@ config CRYPTO_DRBG_HMAC
bool
default y
select CRYPTO_HMAC
-   select CRYPTO_SHA256
+   select CRYPTO_SHA512
 
 config CRYPTO_DRBG_HASH
bool "Enable Hash DRBG"
-- 
2.31.1





Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-06-30 Thread Claire Chang
On Wed, Jun 30, 2021 at 9:43 AM Nathan Chancellor  wrote:
>
> On Thu, Jun 24, 2021 at 11:55:20PM +0800, Claire Chang wrote:
> > Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and
> > use it to determine whether to bounce the data or not. This will be
> > useful later to allow for different pools.
> >
> > Signed-off-by: Claire Chang 
> > Reviewed-by: Christoph Hellwig 
> > Tested-by: Stefano Stabellini 
> > Tested-by: Will Deacon 
> > Acked-by: Stefano Stabellini 
>
> This patch as commit af452ec1b1a3 ("swiotlb: Use is_swiotlb_force_bounce
> for swiotlb data bouncing") causes my Ryzen 3 4300G system to fail to
> get to an X session consistently (although not every single time),
> presumably due to a crash in the AMDGPU driver that I see in dmesg.
>
> I have attached logs at af452ec1b1a3 and f127c9556a8e and I am happy
> to provide any further information, debug, or test patches as necessary.

Are you using swiotlb=force? or the swiotlb_map is called because of
!dma_capable? 
(https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/dma/direct.h#n93)

`BUG: unable to handle page fault for address: 003a8290` and
the fact it crashed at `_raw_spin_lock_irqsave` look like the memory
(maybe dev->dma_io_tlb_mem) was corrupted?
The dev->dma_io_tlb_mem should be set here
(https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/probe.c#n2528)
through device_initialize.

I can't tell what happened from the logs, but maybe we could try KASAN
to see if it provides more clue.
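
(For reference, a generic-KASAN build is usually just a config change
along these lines; the exact options depend on the tree and arch:)

	CONFIG_KASAN=y
	CONFIG_KASAN_GENERIC=y
	CONFIG_KASAN_INLINE=y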

Thanks,
Claire

>
> Cheers,
> Nathan


Re: [PATCH v3 3/9] powerpc/64e: remove implicit soft-masking and interrupt exit restart logic

2021-06-30 Thread Christophe Leroy




On 30/06/2021 at 09:46, Nicholas Piggin wrote:

The implicit soft-masking to speed up interrupt return was going to be
used by 64e as well, but it has not been extensively tested on that
platform and is not considered ready. It was intended to be disabled
before merge. Disable it for now.

Most of the restart code is common with 64s, so with more correctness
and performance testing this could be re-enabled again by adding the
extra soft-mask checks to interrupt handlers and flipping
exit_must_hard_disable().

Fixes: 9d1988ca87dd ("powerpc/64: treat low kernel text as irqs soft-masked")
Signed-off-by: Nicholas Piggin 
---
  arch/powerpc/include/asm/interrupt.h | 33 
  arch/powerpc/kernel/exceptions-64e.S | 12 +-
  arch/powerpc/kernel/interrupt.c  |  2 +-
  arch/powerpc/kernel/interrupt_64.S   | 16 --
  4 files changed, 40 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/interrupt.h 
b/arch/powerpc/include/asm/interrupt.h
index 8b4b1e84e110..f13c93b033c7 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -73,20 +73,34 @@
  #include 
  #include 
  
-#ifdef CONFIG_PPC64

+#ifdef CONFIG_PPC_BOOK3S_64


Can we avoid that ifdef and use IS_ENABLED(CONFIG_PPC_BOOK3S_64) below?


  extern char __end_soft_masked[];
  unsigned long search_kernel_restart_table(unsigned long addr);
-#endif
  
-#ifdef CONFIG_PPC_BOOK3S_64

  DECLARE_STATIC_KEY_FALSE(interrupt_exit_not_reentrant);
  
+static inline bool is_implicit_soft_masked(struct pt_regs *regs)

+{
+   if (regs->msr & MSR_PR)
+   return false;
+
+   if (regs->nip >= (unsigned long)__end_soft_masked)
+   return false;
+
+   return true;
+}
+
  static inline void srr_regs_clobbered(void)
  {
local_paca->srr_valid = 0;
local_paca->hsrr_valid = 0;
  }
  #else
+static inline bool is_implicit_soft_masked(struct pt_regs *regs)
+{
+   return false;
+}
+
  static inline void srr_regs_clobbered(void)
  {
  }
@@ -150,11 +164,13 @@ static inline void interrupt_enter_prepare(struct pt_regs 
*regs, struct interrup
 */
if (TRAP(regs) != INTERRUPT_PROGRAM) {
CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
-   BUG_ON(regs->nip < (unsigned long)__end_soft_masked);
+   BUG_ON(is_implicit_soft_masked(regs));
}
+#ifdef CONFIG_PPC_BOOK3S


Although we are already in a PPC64 section, wouldn't it be better to use
CONFIG_PPC_BOOK3S_64?

Can we use IS_ENABLED(CONFIG_PPC_BOOK3S_64) instead?
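
Something like the following untested sketch, assuming the declaration
of search_kernel_restart_table() is made visible (or stubbed) when
CONFIG_PPC_BOOK3S_64 is not set:

	/* Move this under a debugging check */
	if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && arch_irq_disabled_regs(regs))
		BUG_ON(search_kernel_restart_table(regs->nip));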


/* Move this under a debugging check */
if (arch_irq_disabled_regs(regs))
BUG_ON(search_kernel_restart_table(regs->nip));
+#endif
}
  #endif
  
@@ -244,10 +260,9 @@ static inline void interrupt_nmi_enter_prepare(struct pt_regs *regs, struct inte

local_paca->irq_soft_mask = IRQS_ALL_DISABLED;
local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
  
-	if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && !(regs->msr & MSR_PR) &&

-   regs->nip < (unsigned long)__end_soft_masked) {
-   // Kernel code running below __end_soft_masked is
-   // implicitly soft-masked.
+   if (is_implicit_soft_masked(regs)) {
+   // Adjust regs->softe soft implicit soft-mask, so
+   // arch_irq_disabled_regs(regs) behaves as expected.
regs->softe = IRQS_ALL_DISABLED;
}
  
@@ -282,6 +297,7 @@ static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct inter

 */
  
  #ifdef CONFIG_PPC64

+#ifdef CONFIG_PPC_BOOK3S


IS_ENABLED(CONFIG_PPC_BOOK3S_64) instead?


if (arch_irq_disabled_regs(regs)) {
unsigned long rst = search_kernel_restart_table(regs->nip);
if (rst)
@@ -289,7 +305,6 @@ static inline void interrupt_nmi_exit_prepare(struct 
pt_regs *regs, struct inter
}
  #endif
  
-#ifdef CONFIG_PPC64

if (nmi_disables_ftrace(regs))
this_cpu_set_ftrace_enabled(state->ftrace_enabled);
  
diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S

index d634bfceed2c..1401787b0b93 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -342,17 +342,7 @@ ret_from_mc_except:
  #define PROLOG_ADDITION_MASKABLE_GEN(n)   
\
lbz r10,PACAIRQSOFTMASK(r13);   /* are irqs soft-masked? */ \
andi.   r10,r10,IRQS_DISABLED;  /* yes -> go out of line */ \
-   bne masked_interrupt_book3e_##n;\
-   /* Kernel code below __end_soft_masked is implicitly masked */  \
-   andi.   r10,r11,MSR_PR; \
-   bne 1f; /* user -> not masked */ \
-   std 

[PATCH v3 9/9] powerpc/64s: move ret_from_fork etc above __end_soft_masked

2021-06-30 Thread Nicholas Piggin
Code which runs with interrupts enabled should be moved above
__end_soft_masked where possible, because maskable interrupts that hit
below that symbol will need to consult the soft mask table, which is an
extra cost.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/interrupt_64.S | 52 +++---
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/kernel/interrupt_64.S 
b/arch/powerpc/kernel/interrupt_64.S
index c4336e2e2ce8..4063e8a3f704 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -449,32 +449,6 @@ _ASM_NOKPROBE_SYMBOL(tabort_syscall)
b   .   /* prevent speculative execution */
 #endif
 
-#ifdef CONFIG_PPC_BOOK3S
-_GLOBAL(ret_from_fork_scv)
-   bl  schedule_tail
-   REST_NVGPRS(r1)
-   li  r3,0/* fork() return value */
-   b   .Lsyscall_vectored_common_exit
-#endif
-
-_GLOBAL(ret_from_fork)
-   bl  schedule_tail
-   REST_NVGPRS(r1)
-   li  r3,0/* fork() return value */
-   b   .Lsyscall_exit
-
-_GLOBAL(ret_from_kernel_thread)
-   bl  schedule_tail
-   REST_NVGPRS(r1)
-   mtctr   r14
-   mr  r3,r15
-#ifdef PPC64_ELF_ABI_v2
-   mr  r12,r14
-#endif
-   bctrl
-   li  r3,0
-   b   .Lsyscall_exit
-
/*
 * If MSR EE/RI was never enabled, IRQs not reconciled, NVGPRs not
 * touched, no exit work created, then this can be used.
@@ -768,3 +742,29 @@ interrupt_return_macro hsrr
 __end_soft_masked:
 DEFINE_FIXED_SYMBOL(__end_soft_masked)
 #endif /* CONFIG_PPC_BOOK3S */
+
+#ifdef CONFIG_PPC_BOOK3S
+_GLOBAL(ret_from_fork_scv)
+   bl  schedule_tail
+   REST_NVGPRS(r1)
+   li  r3,0/* fork() return value */
+   b   .Lsyscall_vectored_common_exit
+#endif
+
+_GLOBAL(ret_from_fork)
+   bl  schedule_tail
+   REST_NVGPRS(r1)
+   li  r3,0/* fork() return value */
+   b   .Lsyscall_exit
+
+_GLOBAL(ret_from_kernel_thread)
+   bl  schedule_tail
+   REST_NVGPRS(r1)
+   mtctr   r14
+   mr  r3,r15
+#ifdef PPC64_ELF_ABI_v2
+   mr  r12,r14
+#endif
+   bctrl
+   li  r3,0
+   b   .Lsyscall_exit
-- 
2.23.0



[PATCH v3 8/9] powerpc/64s/interrupt: clean up interrupt return labels

2021-06-30 Thread Nicholas Piggin
Normal kernel-interrupt exits can get interrupt_return_srr_user_restart
in their backtrace, which is an unusual and notable function, and it is
part of the user-interrupt exit path, which is doubly confusing.

Add non-local labels for both user and kernel interrupt exit cases to
address this and make the user and kernel cases more symmetric. Also get
rid of an unused label.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/interrupt_64.S | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/interrupt_64.S 
b/arch/powerpc/kernel/interrupt_64.S
index 5c18362693fe..c4336e2e2ce8 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -509,7 +509,9 @@ interrupt_return_\srr\():
 _ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\())
ld  r4,_MSR(r1)
andi.   r0,r4,MSR_PR
-   beq .Lkernel_interrupt_return_\srr
+   beq interrupt_return_\srr\()_kernel
+interrupt_return_\srr\()_user: /* make backtraces match the _kernel variant */
+_ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\()_user)
addir3,r1,STACK_FRAME_OVERHEAD
bl  interrupt_exit_user_prepare
cmpdi   r3,0
@@ -623,8 +625,8 @@ RESTART_TABLE(.Linterrupt_return_\srr\()_user_rst_start, 
.Linterrupt_return_\srr
 #endif
 
.balign IFETCH_ALIGN_BYTES
-.Lkernel_interrupt_return_\srr\():
-.Linterrupt_return_\srr\()_kernel:
+interrupt_return_\srr\()_kernel:
+_ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\()_kernel)
addir3,r1,STACK_FRAME_OVERHEAD
bl  interrupt_exit_kernel_prepare
 
-- 
2.23.0



[PATCH v3 7/9] powerpc/64/interrupt: add missing kprobe annotations on interrupt exit symbols

2021-06-30 Thread Nicholas Piggin
If one interrupt exit symbol must not be kprobed, none of them can be,
without more justification for why it's safe. Disallow kprobing on any
of the (non-local) labels in the exit paths.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/interrupt_64.S | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/kernel/interrupt_64.S 
b/arch/powerpc/kernel/interrupt_64.S
index 2c92bbca02ca..5c18362693fe 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -197,6 +197,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 .Lsyscall_vectored_\name\()_rst_end:
 
 syscall_vectored_\name\()_restart:
+_ASM_NOKPROBE_SYMBOL(syscall_vectored_\name\()_restart)
GET_PACA(r13)
ld  r1,PACA_EXIT_SAVE_R1(r13)
ld  r2,PACATOC(r13)
@@ -238,6 +239,7 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectored_emulate)
.balign IFETCH_ALIGN_BYTES
.globl system_call_common_real
 system_call_common_real:
+_ASM_NOKPROBE_SYMBOL(system_call_common_real)
ld  r10,PACAKMSR(r13)   /* get MSR value for kernel */
mtmsrd  r10
 
@@ -402,6 +404,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
 #ifdef CONFIG_PPC_BOOK3S
 syscall_restart:
+_ASM_NOKPROBE_SYMBOL(syscall_restart)
GET_PACA(r13)
ld  r1,PACA_EXIT_SAVE_R1(r13)
ld  r2,PACATOC(r13)
@@ -420,6 +423,7 @@ RESTART_TABLE(.Lsyscall_rst_start, .Lsyscall_rst_end, 
syscall_restart)
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 tabort_syscall:
+_ASM_NOKPROBE_SYMBOL(tabort_syscall)
/* Firstly we need to enable TM in the kernel */
mfmsr   r10
li  r9, 1
@@ -602,6 +606,7 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
 
 #ifdef CONFIG_PPC_BOOK3S
 interrupt_return_\srr\()_user_restart:
+_ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\()_user_restart)
GET_PACA(r13)
ld  r1,PACA_EXIT_SAVE_R1(r13)
ld  r2,PACATOC(r13)
@@ -735,6 +740,7 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
 
 #ifdef CONFIG_PPC_BOOK3S
 interrupt_return_\srr\()_kernel_restart:
+_ASM_NOKPROBE_SYMBOL(interrupt_return_\srr\()_kernel_restart)
GET_PACA(r13)
ld  r1,PACA_EXIT_SAVE_R1(r13)
ld  r2,PACATOC(r13)
-- 
2.23.0



[PATCH v3 6/9] powerpc/64: enable MSR[EE] in irq replay pt_regs

2021-06-30 Thread Nicholas Piggin
Similar to commit 2b48e96be2f9f ("powerpc/64: fix irq replay
pt_regs->softe value"), enable MSR_EE in pt_regs->msr. This makes the
regs look more normal. It also allows some extra debug checks to be
added to interrupt handler entry.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/interrupt.h | 4 
 arch/powerpc/kernel/irq.c| 1 +
 2 files changed, 5 insertions(+)

diff --git a/arch/powerpc/include/asm/interrupt.h 
b/arch/powerpc/include/asm/interrupt.h
index 789311d1e283..d4bdf7d274ac 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -173,6 +173,8 @@ static inline void interrupt_enter_prepare(struct pt_regs 
*regs, struct interrup
BUG_ON(search_kernel_restart_table(regs->nip));
 #endif
}
+   if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
+   BUG_ON(!arch_irq_disabled_regs(regs) && !(regs->msr & MSR_EE));
 #endif
 
booke_restore_dbcr0();
@@ -268,6 +270,8 @@ static inline void interrupt_nmi_enter_prepare(struct 
pt_regs *regs, struct inte
// arch_irq_disabled_regs(regs) behaves as expected.
regs->softe = IRQS_ALL_DISABLED;
}
+   if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
+   BUG_ON(!arch_irq_disabled_regs(regs) && !(regs->msr & MSR_EE));
 
/* Don't do any per-CPU operations until interrupt state is fixed */
 
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 8428caf3194e..91e63eac4e8f 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -121,6 +121,7 @@ void replay_soft_interrupts(void)
 
ppc_save_regs();
regs.softe = IRQS_ENABLED;
+   regs.msr |= MSR_EE;
 
 again:
if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
-- 
2.23.0



[PATCH v3 5/9] powerpc/64s/interrupt: preserve regs->softe for NMI interrupts

2021-06-30 Thread Nicholas Piggin
If an NMI interrupt hits in an implicit soft-masked region, regs->softe
is modified to reflect that. Restoring it may not be necessary for
correctness at the moment, but it is less surprising, and leaving it
modified is unhelpful when debugging or adding checks.

Make sure this is changed back to how it was found before returning.

Fixes: 4ec5feec1ad0 ("powerpc/64s: Make NMI record implicitly soft-masked code 
as irqs disabled")
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/interrupt.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/include/asm/interrupt.h 
b/arch/powerpc/include/asm/interrupt.h
index d7df247a149c..789311d1e283 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -227,6 +227,7 @@ struct interrupt_nmi_state {
u8 irq_soft_mask;
u8 irq_happened;
u8 ftrace_enabled;
+   u64 softe;
 #endif
 };
 
@@ -252,6 +253,7 @@ static inline void interrupt_nmi_enter_prepare(struct 
pt_regs *regs, struct inte
 #ifdef CONFIG_PPC64
state->irq_soft_mask = local_paca->irq_soft_mask;
state->irq_happened = local_paca->irq_happened;
+   state->softe = regs->softe;
 
/*
 * Set IRQS_ALL_DISABLED unconditionally so irqs_disabled() does
@@ -311,6 +313,7 @@ static inline void interrupt_nmi_exit_prepare(struct 
pt_regs *regs, struct inter
 
/* Check we didn't change the pending interrupt mask. */
WARN_ON_ONCE((state->irq_happened | PACA_IRQ_HARD_DIS) != 
local_paca->irq_happened);
+   regs->softe = state->softe;
local_paca->irq_happened = state->irq_happened;
local_paca->irq_soft_mask = state->irq_soft_mask;
 #endif
-- 
2.23.0



[PATCH v3 4/9] powerpc/64s: add a table of implicit soft-masked addresses

2021-06-30 Thread Nicholas Piggin
Commit 9d1988ca87dd ("powerpc/64: treat low kernel text as irqs
soft-masked") ends up catching too much code, including ret_from_fork,
and parts of interrupt and syscall return that do not expect interrupts
to be soft-masked. If an interrupt gets marked pending, and the code
then proceeds out of the implicit soft-masked region, it will fail to
deal with the pending interrupt.

Fix this by adding a new table of addresses which explicitly marks
the regions of code that are soft-masked. This table is only checked
for interrupts that hit below __end_soft_masked, so most kernel interrupts
will not have the overhead of the table search.
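
For reference, the lookup added in arch/powerpc/lib/restart_table.c can
be pictured as a linear scan over the linker-generated table; a rough
sketch follows (field and symbol names are assumed, not copied from the
patch):

	struct soft_mask_table_entry {
		unsigned long start_addr;
		unsigned long end_addr;
	};

	extern struct soft_mask_table_entry __start___soft_mask_table[];
	extern struct soft_mask_table_entry __stop___soft_mask_table[];

	/* True if addr falls inside one of the regions recorded by the
	 * SOFT_MASK_TABLE() macro, i.e. code that is implicitly soft-masked.
	 */
	bool search_kernel_soft_mask_table(unsigned long addr)
	{
		struct soft_mask_table_entry *smte;

		for (smte = __start___soft_mask_table;
		     smte < __stop___soft_mask_table; smte++) {
			if (addr >= smte->start_addr && addr < smte->end_addr)
				return true;
		}
		return false;
	}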

Fixes: 9d1988ca87dd ("powerpc/64: treat low kernel text as irqs soft-masked")
Reported-by: Sachin Sant 
Tested-by: Sachin Sant 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/interrupt.h |  3 +-
 arch/powerpc/include/asm/ppc_asm.h   |  7 +++
 arch/powerpc/kernel/exceptions-64s.S | 64 +++-
 arch/powerpc/kernel/interrupt_64.S   |  8 
 arch/powerpc/kernel/vmlinux.lds.S|  9 
 arch/powerpc/lib/restart_table.c | 26 +++
 6 files changed, 106 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/interrupt.h 
b/arch/powerpc/include/asm/interrupt.h
index f13c93b033c7..d7df247a149c 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -75,6 +75,7 @@
 
 #ifdef CONFIG_PPC_BOOK3S_64
 extern char __end_soft_masked[];
+bool search_kernel_soft_mask_table(unsigned long addr);
 unsigned long search_kernel_restart_table(unsigned long addr);
 
 DECLARE_STATIC_KEY_FALSE(interrupt_exit_not_reentrant);
@@ -87,7 +88,7 @@ static inline bool is_implicit_soft_masked(struct pt_regs 
*regs)
if (regs->nip >= (unsigned long)__end_soft_masked)
return false;
 
-   return true;
+   return search_kernel_soft_mask_table(regs->nip);
 }
 
 static inline void srr_regs_clobbered(void)
diff --git a/arch/powerpc/include/asm/ppc_asm.h 
b/arch/powerpc/include/asm/ppc_asm.h
index c9c2c36c1f8f..116c1519728a 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -762,6 +762,13 @@ END_FTR_SECTION_NESTED(CPU_FTR_CELL_TB_BUG, 
CPU_FTR_CELL_TB_BUG, 96)
stringify_in_c(.long (_target) - . ;)   \
stringify_in_c(.previous)
 
+#define SOFT_MASK_TABLE(_start, _end)  \
+   stringify_in_c(.section __soft_mask_table,"a";)\
+   stringify_in_c(.balign 8;)  \
+   stringify_in_c(.llong (_start);)\
+   stringify_in_c(.llong (_end);)  \
+   stringify_in_c(.previous)
+
 #define RESTART_TABLE(_start, _end, _target)   \
stringify_in_c(.section __restart_table,"a";)\
stringify_in_c(.balign 8;)  \
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index ecd07bf604c5..4aec59a77d4c 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -428,21 +428,31 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real)
 
/* If coming from user, skip soft-mask tests. */
andi.   r10,r12,MSR_PR
-   bne 2f
+   bne 3f
 
/*
-* Kernel code running below __end_soft_masked is implicitly
-* soft-masked
+* Kernel code running below __end_soft_masked may be
+* implicitly soft-masked if it is within the regions
+* in the soft mask table.
 */
LOAD_HANDLER(r10, __end_soft_masked)
cmpld   r11,r10
-
+   bge+1f
+
+   /* SEARCH_SOFT_MASK_TABLE clobbers r9,r10,r12 */
+   mtctr   r12
+   stw r9,PACA_EXGEN+EX_CCR(r13)
+   SEARCH_SOFT_MASK_TABLE
+   cmpdi   r12,0
+   mfctr   r12 /* Restore r12 to SRR1 */
+   lwz r9,PACA_EXGEN+EX_CCR(r13)
+   beq 1f  /* Not in soft-mask table */
li  r10,IMASK
-   blt-1f
+   b   2f  /* In soft-mask table, always mask */
 
/* Test the soft mask state against our interrupt's bit */
-   lbz r10,PACAIRQSOFTMASK(r13)
-1: andi.   r10,r10,IMASK
+1: lbz r10,PACAIRQSOFTMASK(r13)
+2: andi.   r10,r10,IMASK
/* Associate vector numbers with bits in paca->irq_happened */
.if IVEC == 0x500 || IVEC == 0xea0
li  r10,PACA_IRQ_EE
@@ -473,7 +483,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real)
 
.if ISTACK
andi.   r10,r12,MSR_PR  /* See if coming from user  */
-2: mr  r10,r1  /* Save r1  */
+3: mr  r10,r1  /* Save r1  */
subir1,r1,INT_FRAME_SIZE/* alloc frame 

[PATCH v3 3/9] powerpc/64e: remove implicit soft-masking and interrupt exit restart logic

2021-06-30 Thread Nicholas Piggin
The implicit soft-masking to speed up interrupt return was going to be
used by 64e as well, but it has not been extensively tested on that
platform and is not considered ready. It was intended to be disabled
before merge. Disable it for now.

Most of the restart code is common with 64s, so with more correctness
and performance testing this could be re-enabled again by adding the
extra soft-mask checks to interrupt handlers and flipping
exit_must_hard_disable().

Fixes: 9d1988ca87dd ("powerpc/64: treat low kernel text as irqs soft-masked")
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/interrupt.h | 33 
 arch/powerpc/kernel/exceptions-64e.S | 12 +-
 arch/powerpc/kernel/interrupt.c  |  2 +-
 arch/powerpc/kernel/interrupt_64.S   | 16 --
 4 files changed, 40 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/interrupt.h 
b/arch/powerpc/include/asm/interrupt.h
index 8b4b1e84e110..f13c93b033c7 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -73,20 +73,34 @@
 #include 
 #include 
 
-#ifdef CONFIG_PPC64
+#ifdef CONFIG_PPC_BOOK3S_64
 extern char __end_soft_masked[];
 unsigned long search_kernel_restart_table(unsigned long addr);
-#endif
 
-#ifdef CONFIG_PPC_BOOK3S_64
 DECLARE_STATIC_KEY_FALSE(interrupt_exit_not_reentrant);
 
+static inline bool is_implicit_soft_masked(struct pt_regs *regs)
+{
+   if (regs->msr & MSR_PR)
+   return false;
+
+   if (regs->nip >= (unsigned long)__end_soft_masked)
+   return false;
+
+   return true;
+}
+
 static inline void srr_regs_clobbered(void)
 {
local_paca->srr_valid = 0;
local_paca->hsrr_valid = 0;
 }
 #else
+static inline bool is_implicit_soft_masked(struct pt_regs *regs)
+{
+   return false;
+}
+
 static inline void srr_regs_clobbered(void)
 {
 }
@@ -150,11 +164,13 @@ static inline void interrupt_enter_prepare(struct pt_regs 
*regs, struct interrup
 */
if (TRAP(regs) != INTERRUPT_PROGRAM) {
CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
-   BUG_ON(regs->nip < (unsigned long)__end_soft_masked);
+   BUG_ON(is_implicit_soft_masked(regs));
}
+#ifdef CONFIG_PPC_BOOK3S
/* Move this under a debugging check */
if (arch_irq_disabled_regs(regs))
BUG_ON(search_kernel_restart_table(regs->nip));
+#endif
}
 #endif
 
@@ -244,10 +260,9 @@ static inline void interrupt_nmi_enter_prepare(struct 
pt_regs *regs, struct inte
local_paca->irq_soft_mask = IRQS_ALL_DISABLED;
local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
 
-   if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && !(regs->msr & MSR_PR) &&
-   regs->nip < (unsigned long)__end_soft_masked) {
-   // Kernel code running below __end_soft_masked is
-   // implicitly soft-masked.
+   if (is_implicit_soft_masked(regs)) {
+   // Adjust regs->softe soft implicit soft-mask, so
+   // arch_irq_disabled_regs(regs) behaves as expected.
regs->softe = IRQS_ALL_DISABLED;
}
 
@@ -282,6 +297,7 @@ static inline void interrupt_nmi_exit_prepare(struct 
pt_regs *regs, struct inter
 */
 
 #ifdef CONFIG_PPC64
+#ifdef CONFIG_PPC_BOOK3S
if (arch_irq_disabled_regs(regs)) {
unsigned long rst = search_kernel_restart_table(regs->nip);
if (rst)
@@ -289,7 +305,6 @@ static inline void interrupt_nmi_exit_prepare(struct 
pt_regs *regs, struct inter
}
 #endif
 
-#ifdef CONFIG_PPC64
if (nmi_disables_ftrace(regs))
this_cpu_set_ftrace_enabled(state->ftrace_enabled);
 
diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S
index d634bfceed2c..1401787b0b93 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -342,17 +342,7 @@ ret_from_mc_except:
 #define PROLOG_ADDITION_MASKABLE_GEN(n)
\
lbz r10,PACAIRQSOFTMASK(r13);   /* are irqs soft-masked? */ \
andi.   r10,r10,IRQS_DISABLED;  /* yes -> go out of line */ \
-   bne masked_interrupt_book3e_##n;\
-   /* Kernel code below __end_soft_masked is implicitly masked */  \
-   andi.   r10,r11,MSR_PR; \
-   bne 1f; /* user -> not masked */\
-   std r14,PACA_EXGEN+EX_R14(r13); \
-   LOAD_REG_IMMEDIATE_SYM(r14, r10, __end_soft_masked);\
-   mfspr   r10,SPRN_SRR0;  \
-   cmpld   r10,r14;\
-   ld  r14,PACA_EXGEN+EX_R14(r13);  

[PATCH v3 0/9] powerpc: fast interrupt exit bug and misc fixes

2021-06-30 Thread Nicholas Piggin
This is a bunch of fixes for powerpc next, mostly a nasty hole in fast
interrupt exit code found by Sachin and some other bits along the way
while looking at it.

Since v2:
- Fixed 64e patch 3 to really set exit_must_hard_disable.
- Reworded some changelogs.

Since v1:
- Fixed a bisection compile error due to a fix incorrectly going to
  a later patch.
- Fixed the "add a table of implicit soft-masked addresses" patch to
  include the low scv vector range as a soft-masked table entry. scv
  was the original reason for implicit soft masking, and the previous
  version breaks it. My stress testing was using an image without
  scv glibc unfortunately.
- Fixed a bug with the same patch that would restore r12 with SRR1
  rather than HSRR1 in the case of masked hypervisor interrupts after
  searching the soft-mask table. Again unfortunately my stress testing
  was in a guest so no HV interrupts.

Thanks to Michael for noticing these issues.

- Pulled in the hash page fault interrupt handler fix into this series
  to make the dependencies clear (well it's not exactly dependent but
  assertions introduced later make the existing bug crash more often).

Thanks,
Nick

Nicholas Piggin (9):
  powerpc/64s: fix hash page fault interrupt handler
  powerpc/64e: fix CONFIG_RELOCATABLE build warnings
  powerpc/64e: remove implicit soft-masking and interrupt exit restart
logic
  powerpc/64s: add a table of implicit soft-masked addresses
  powerpc/64s/interrupt: preserve regs->softe for NMI interrupts
  powerpc/64: enable MSR[EE] in irq replay pt_regs
  powerpc/64/interrupt: add missing kprobe annotations on interrupt exit
symbols
  powerpc/64s/interrupt: clean up interrupt return labels
  powerpc/64s: move ret_from_fork etc above __end_soft_masked

 arch/powerpc/include/asm/interrupt.h  | 41 +---
 arch/powerpc/include/asm/ppc_asm.h|  7 +++
 arch/powerpc/kernel/exceptions-64e.S  | 23 +++
 arch/powerpc/kernel/exceptions-64s.S  | 64 ---
 arch/powerpc/kernel/interrupt.c   |  2 +-
 arch/powerpc/kernel/interrupt_64.S| 90 ++-
 arch/powerpc/kernel/irq.c |  1 +
 arch/powerpc/kernel/vmlinux.lds.S |  9 +++
 arch/powerpc/lib/restart_table.c  | 26 
 arch/powerpc/mm/book3s64/hash_utils.c | 24 ---
 10 files changed, 212 insertions(+), 75 deletions(-)

-- 
2.23.0



[PATCH v3 2/9] powerpc/64e: fix CONFIG_RELOCATABLE build warnings

2021-06-30 Thread Nicholas Piggin
CONFIG_RELOCATABLE=y causes build warnings from unresolved relocations.
Fix these by using TOC addressing for these cases.

Commit 24d33ac5b8ff ("powerpc/64s: Make prom_init require RELOCATABLE")
caused some 64e configs to select RELOCATABLE resulting in these
warnings, but the underlying issue was already there.

This passes basic qemu testing.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64e.S | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S
index 22fcd95dd8dc..d634bfceed2c 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -912,8 +912,14 @@ kernel_dbg_exc:
b   interrupt_return
 
 .macro SEARCH_RESTART_TABLE
+#ifdef CONFIG_RELOCATABLE
+   ld  r11,PACATOC(r13)
+   ld  r14,__start___restart_table@got(r11)
+   ld  r15,__stop___restart_table@got(r11)
+#else
LOAD_REG_IMMEDIATE_SYM(r14, r11, __start___restart_table)
LOAD_REG_IMMEDIATE_SYM(r15, r11, __stop___restart_table)
+#endif
 300:
cmpdr14,r15
beq 302f
@@ -1329,7 +1335,12 @@ a2_tlbinit_code_start:
 a2_tlbinit_after_linear_map:
 
/* Now we branch the new virtual address mapped by this entry */
+#ifdef CONFIG_RELOCATABLE
+   ld  r5,PACATOC(r13)
+   ld  r3,1f@got(r5)
+#else
LOAD_REG_IMMEDIATE_SYM(r3, r5, 1f)
+#endif
mtctr   r3
bctr
 
-- 
2.23.0



[PATCH v3 1/9] powerpc/64s: fix hash page fault interrupt handler

2021-06-30 Thread Nicholas Piggin
The early bad fault or key fault test in do_hash_fault() ends up calling
into ___do_page_fault without having gone through an interrupt handler
wrapper (except the initial _RAW one). This can end up calling local irq
functions while the interrupt has not been reconciled, which will likely
cause crashes and it trips up on a later patch that adds more assertions.

pkey_exec_prot from selftests causes this path to be executed.

There is no real reason the in_nmi() test should be performed
before the key fault check. In fact, if a perf interrupt in the hash
fault code did a stack walk that was made to take a key fault somehow
then running ___do_page_fault could possibly cause another hash fault
causing problems. Move the in_nmi() test first, and then do everything
else inside the regular interrupt handler function.

Fixes: 3a96570ffceb ("powerpc: convert interrupt handlers to use wrappers")
Reported-by: Sachin Sant 
Tested-by: Sachin Sant 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/mm/book3s64/hash_utils.c | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/hash_utils.c 
b/arch/powerpc/mm/book3s64/hash_utils.c
index 96d9aa164007..ac5720371c0d 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -1522,8 +1522,8 @@ int hash_page(unsigned long ea, unsigned long access, 
unsigned long trap,
 }
 EXPORT_SYMBOL_GPL(hash_page);
 
-DECLARE_INTERRUPT_HANDLER_RET(__do_hash_fault);
-DEFINE_INTERRUPT_HANDLER_RET(__do_hash_fault)
+DECLARE_INTERRUPT_HANDLER(__do_hash_fault);
+DEFINE_INTERRUPT_HANDLER(__do_hash_fault)
 {
unsigned long ea = regs->dar;
unsigned long dsisr = regs->dsisr;
@@ -1533,6 +1533,11 @@ DEFINE_INTERRUPT_HANDLER_RET(__do_hash_fault)
unsigned int region_id;
long err;
 
+   if (unlikely(dsisr & (DSISR_BAD_FAULT_64S | DSISR_KEYFAULT))) {
+   hash__do_page_fault(regs);
+   return;
+   }
+
region_id = get_region_id(ea);
if ((region_id == VMALLOC_REGION_ID) || (region_id == IO_REGION_ID))
mm = _mm;
@@ -1571,9 +1576,10 @@ DEFINE_INTERRUPT_HANDLER_RET(__do_hash_fault)
bad_page_fault(regs, SIGBUS);
}
err = 0;
-   }
 
-   return err;
+   } else if (err) {
+   hash__do_page_fault(regs);
+   }
 }
 
 /*
@@ -1582,13 +1588,6 @@ DEFINE_INTERRUPT_HANDLER_RET(__do_hash_fault)
  */
 DEFINE_INTERRUPT_HANDLER_RAW(do_hash_fault)
 {
-   unsigned long dsisr = regs->dsisr;
-
-   if (unlikely(dsisr & (DSISR_BAD_FAULT_64S | DSISR_KEYFAULT))) {
-   hash__do_page_fault(regs);
-   return 0;
-   }
-
/*
 * If we are in an "NMI" (e.g., an interrupt when soft-disabled), then
 * don't call hash_page, just fail the fault. This is required to
@@ -1607,8 +1606,7 @@ DEFINE_INTERRUPT_HANDLER_RAW(do_hash_fault)
return 0;
}
 
-   if (__do_hash_fault(regs))
-   hash__do_page_fault(regs);
+   __do_hash_fault(regs);
 
return 0;
 }
-- 
2.23.0



[powerpc][5.13.0-next20210629] Kernel WARN crypto/testmgr.c:5653 during boot

2021-06-30 Thread Sachin Sant
While booting 5.13.0-next20210629 on a Power server, the following
warning is seen:

[0.076955] DRBG: could not allocate digest TFM handle: hmac(sha512)
[0.076960] alg: drbg: Failed to reset rng
[0.076963] alg: drbg: Test 0 failed for drbg_nopr_hmac_sha512
[0.076967] [ cut here ]
[0.076970] alg: self-tests for drbg_nopr_hmac_sha512 (stdrng) failed 
(rc=-22)
[0.076977] WARNING: CPU: 10 PID: 153 at crypto/testmgr.c:5653 
alg_test+0x484/0x860
[0.076989] Modules linked in:
[0.076993] CPU: 10 PID: 153 Comm: cryptomgr_test Not tainted 
5.13.0-next-20210629 #1
[0.076998] NIP:  c063ea44 LR: c063ea40 CTR: c0730b40
[0.077003] REGS: ce7ff960 TRAP: 0700   Not tainted  
(5.13.0-next-20210629)
[0.077007] MSR:  80029033   CR: 28008222  
XER: 20040005
[0.077018] CFAR: c0150a00 IRQMASK: 0 
[0.077018] GPR00: c063ea40 ce7ffc00 c29bc300 
0042 
[0.077018] GPR04: 7fff ce7ff8c0 ce7ff8b8 
 
[0.077018] GPR08: 0009f98e c24f66f0 c24f66f0 
c2876838 
[0.077018] GPR12: 8000 c0001ec7a280 c018ce88 
c3b300c0 
[0.077018] GPR16:    
 
[0.077018] GPR20:   c2a9ab20 
c0fb5278 
[0.077018] GPR24: c00010a85200 c0d61fa8 c00010a85280 
0400 
[0.077018] GPR28: c00010a85200 000c c2ccf230 
ffea 
[0.077072] NIP [c063ea44] alg_test+0x484/0x860
[0.077077] LR [c063ea40] alg_test+0x480/0x860
[0.077082] Call Trace:
[0.077085] [ce7ffc00] [c063ea40] alg_test+0x480/0x860 
(unreliable)
[0.077091] [ce7ffd70] [c063ca60] cryptomgr_test+0x40/0x70
[0.077097] [ce7ffda0] [c018d014] kthread+0x194/0x1a0
[0.077103] [ce7ffe10] [c000c750] 
ret_from_kernel_thread+0x5c/0x6c
[0.077110] Instruction dump:
[0.077113] 409e0298 3d220031 89292f56 2f89 409e0288 3c62fe63 7f45d378 
7f84e378 
[0.077121] 7fe6fb78 38633260 4bb11f5d 6000 <0fe0> e8010180 eb210138 
7c0803a6 
[0.077131] ---[ end trace a1cc3999f90f0962 ]---
[0.077585] iommu: Default domain type: Translated 

This new self test was introduced with commit 8833272d876e
("crypto: drbg - self test for HMAC(SHA-512)").

Thanks
-Sachin