Re: [PATCH] powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU features

2018-04-12 Thread Nicholas Piggin
On Thu, 12 Apr 2018 23:47:51 +1000
Michael Ellerman  wrote:

> The cpu_has_feature() mechanism has an optimisation where at build
> time we construct a mask of the CPU feature bits that will always be
> true for the given .config, based on the platform/bitness/etc. that we
> are building for.
> 
> That is incompatible with DT CPU features, where the set of CPU
> features is dependent on feature flags that are given to us by
> firmware.
> 
> The result is that some feature bits can not be *disabled* by DT CPU
> features. Or more accurately, they can be disabled but they will still
> appear in the ALWAYS mask, meaning cpu_has_feature() will always
> return true for them.
> 
> In the past this hasn't really been a problem because on Book3S
> 64 (where we support DT CPU features), the set of ALWAYS bits has been
> very small. That was because we always built for POWER4 and later,
> meaning the set of common bits was small.
> 
> The only bit that could be cleared by DT CPU features that was also in
> the ALWAYS mask was CPU_FTR_NODSISRALIGN, and that was only used in
> the alignment handler to create a fake DSISR. That code was itself
> deleted in 31bfdb036f12 ("powerpc: Use instruction emulation
> infrastructure to handle alignment faults") (Sep 2017).
> 
> However the set of ALWAYS features changed with the recent commit
> db5ae1c155af ("powerpc/64s: Refine feature sets for little endian
> builds") which restricted the set of feature flags when building
> little endian to Power7 or later. That caused the ALWAYS mask to
> become much larger for little endian builds.
> 
> The result is that the following feature bits can currently not
> be *disabled* by DT CPU features:
> 
>   CPU_FTR_REAL_LE, CPU_FTR_MMCRA, CPU_FTR_CTRL, CPU_FTR_SMT,
>   CPU_FTR_PURR, CPU_FTR_SPURR, CPU_FTR_DSCR, CPU_FTR_PKEY,
>   CPU_FTR_VMX_COPY, CPU_FTR_CFAR, CPU_FTR_HAS_PPR.
> 
> To fix it we need to mask the set of ALWAYS features with the base set
> of DT CPU features, ie. the features that are always enabled by DT CPU
> features. That way there are no bits in the ALWAYS mask that are not
> also always set by DT CPU features.
> 
> Fixes: db5ae1c155af ("powerpc/64s: Refine feature sets for little endian 
> builds")
> Signed-off-by: Michael Ellerman 

Looks good to me.

Reviewed-by: Nicholas Piggin 


Re: [PATCH 2/2] powernv/npu: Prevent overwriting of pnv_npu2_init_contex() callback parameters

2018-04-12 Thread Mark Hairgrove


On Wed, 11 Apr 2018, Alistair Popple wrote:

> There is a single npu context per set of callback parameters. Callers
> should be prevented from overwriting existing callback values so instead
> return an error if different parameters are passed.
> 
> Signed-off-by: Alistair Popple 
> ---
>  arch/powerpc/include/asm/powernv.h   |  2 +-
>  arch/powerpc/platforms/powernv/npu-dma.c | 16 +---
>  2 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/powernv.h 
> b/arch/powerpc/include/asm/powernv.h
> index dc5f6a5d4575..362ea12a4501 100644
> --- a/arch/powerpc/include/asm/powernv.h
> +++ b/arch/powerpc/include/asm/powernv.h
> @@ -15,7 +15,7 @@
>  extern void powernv_set_nmmu_ptcr(unsigned long ptcr);
>  extern struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>   unsigned long flags,
> - struct npu_context *(*cb)(struct npu_context *, void *),
> + void (*cb)(struct npu_context *, void *),
>   void *priv);
>  extern void pnv_npu2_destroy_context(struct npu_context *context,
>   struct pci_dev *gpdev);
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
> b/arch/powerpc/platforms/powernv/npu-dma.c
> index cb77162f4e7a..193f43ea3fbc 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -407,7 +407,7 @@ struct npu_context {
>   bool nmmu_flush;
>  
>   /* Callback to stop translation requests on a given GPU */
> - struct npu_context *(*release_cb)(struct npu_context *, void *);
> + void (*release_cb)(struct npu_context *context, void *priv);
>  
>   /*
>* Private pointer passed to the above callback for usage by
> @@ -705,7 +705,7 @@ static const struct mmu_notifier_ops nv_nmmu_notifier_ops 
> = {
>   */
>  struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>   unsigned long flags,
> - struct npu_context *(*cb)(struct npu_context *, void *),
> + void (*cb)(struct npu_context *, void *),
>   void *priv)
>  {
>   int rc;
> @@ -763,8 +763,18 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev 
> *gpdev,
>*/
>   spin_lock(_context_lock);
>   npu_context = mm->context.npu_context;
> - if (npu_context)
> + if (npu_context) {
> + if (npu_context->release_cb != cb ||
> + npu_context->priv != priv) {
> + spin_unlock(_context_lock);
> + opal_npu_destroy_context(nphb->opal_id, mm->context.id,
> + PCI_DEVID(gpdev->bus->number,
> + gpdev->devfn));
> + return ERR_PTR(-EINVAL);
> + }
> +
>   WARN_ON(!kref_get_unless_zero(_context->kref));
> + }
>   spin_unlock(_context_lock);
>  
>   if (!npu_context) {
> -- 
> 2.11.0
> 
> 

Reviewed-by: Mark Hairgrove 
Tested-by: Mark Hairgrove 



Re: [PATCH 1/2] powernv/npu: Add lock to prevent race in concurrent context init/destroy

2018-04-12 Thread Mark Hairgrove


On Wed, 11 Apr 2018, Alistair Popple wrote:

> The pnv_npu2_init_context() and pnv_npu2_destroy_context() functions are
> used to allocate/free contexts to allow address translation and shootdown
> by the NPU on a particular GPU. Context initialisation is implicitly safe
> as it is protected by the requirement mmap_sem be held in write mode,
> however pnv_npu2_destroy_context() does not require mmap_sem to be held and
> it is not safe to call with a concurrent initialisation for a different
> GPU.
> 
> It was assumed the driver would ensure destruction was not called
> concurrently with initialisation. However the driver may be simplified by
> allowing concurrent initialisation and destruction for different GPUs. As
> npu context creation/destruction is not a performance critical path and the
> critical section is not large a single spinlock is used for simplicity.
> 
> Signed-off-by: Alistair Popple 
> ---
>  arch/powerpc/platforms/powernv/npu-dma.c | 51 
> ++--
>  1 file changed, 42 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
> b/arch/powerpc/platforms/powernv/npu-dma.c
> index 1cbef1f9cd37..cb77162f4e7a 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -34,6 +34,12 @@
>  #define npu_to_phb(x) container_of(x, struct pnv_phb, npu)
>  
>  /*
> + * spinlock to protect initialisation of an npu_context for a particular
> + * mm_struct.
> + */
> +DEFINE_SPINLOCK(npu_context_lock);

static DEFINE_SPINLOCK

> +
> +/*
>   * Other types of TCE cache invalidation are not functional in the
>   * hardware.
>   */
> @@ -694,7 +700,8 @@ static const struct mmu_notifier_ops nv_nmmu_notifier_ops 
> = {
>   * Returns an error if there no contexts are currently available or a
>   * npu_context which should be passed to pnv_npu2_handle_fault().
>   *
> - * mmap_sem must be held in write mode.
> + * mmap_sem must be held in write mode and must not be called from interrupt
> + * context.
>   */
>  struct npu_context *pnv_npu2_init_context(struct pci_dev *gpdev,
>   unsigned long flags,
> @@ -741,7 +748,9 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev 
> *gpdev,
>   /*
>* Setup the NPU context table for a particular GPU. These need to be
>* per-GPU as we need the tables to filter ATSDs when there are no
> -  * active contexts on a particular GPU.
> +  * active contexts on a particular GPU. It is safe for these to be
> +  * called concurrently with destroy as the OPAL call takes appropriate
> +  * locks and refcounts on init/destroy.
>*/
>   rc = opal_npu_init_context(nphb->opal_id, mm->context.id, flags,
>   PCI_DEVID(gpdev->bus->number, gpdev->devfn));
> @@ -752,8 +761,19 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev 
> *gpdev,
>* We store the npu pci device so we can more easily get at the
>* associated npus.
>*/
> + spin_lock(_context_lock);
>   npu_context = mm->context.npu_context;
> + if (npu_context)
> + WARN_ON(!kref_get_unless_zero(_context->kref));
> + spin_unlock(_context_lock);
> +
>   if (!npu_context) {
> + /*
> +  * We can set up these fields without holding the
> +  * npu_context_lock as the npu_context hasn't been returned to
> +  * the caller meaning it can't be destroyed. Parallel allocation
> +  * is protected against by mmap_sem.
> +  */
>   rc = -ENOMEM;
>   npu_context = kzalloc(sizeof(struct npu_context), GFP_KERNEL);
>   if (npu_context) {
> @@ -772,8 +792,6 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev 
> *gpdev,
>   }
>  
>   mm->context.npu_context = npu_context;
> - } else {
> - WARN_ON(!kref_get_unless_zero(_context->kref));
>   }
>  
>   npu_context->release_cb = cb;
> @@ -811,15 +829,16 @@ static void pnv_npu2_release_context(struct kref *kref)
>   mm_context_remove_copro(npu_context->mm);

mm_context_remove_copro will now be called while holding a spin lock. Just
as a sanity check, is that ok? I haven't hit any problems in testing
and I see radix__flush_all_mm call preempt_disable/enable so I assume so,
but it doesn't hurt to double-check my understanding.

>  
>   npu_context->mm->context.npu_context = NULL;
> - mmu_notifier_unregister(_context->mn,
> - npu_context->mm);
> -
> - kfree(npu_context);
>  }
>  
> +/*
> + * Destroy a context on the given GPU. May free the npu_context if it is no
> + * longer active on any GPUs. Must not be called from interrupt context.
> + */
>  void pnv_npu2_destroy_context(struct npu_context *npu_context,
>   struct pci_dev *gpdev)
>  {
> + int removed;
>   struct pnv_phb *nphb;
> 

Re: powerpc/mm/radix: Fix checkstops caused by invalid tlbiel

2018-04-12 Thread Michael Ellerman
On Thu, 2018-04-12 at 05:53:52 UTC, Michael Ellerman wrote:
> In tlbiel_radix_set_isa300() we use the PPC_TLBIEL() macro to
> construct tlbiel instructions. The instruction takes 5 fields, two of
> which are registers, and the others are constants. But because it's
> constructed with inline asm the compiler doesn't know that.
> 
> We got the constraint wrong on the 'r' field, using "r" tells the
> compiler to put the value in a register. The value we then get in the
> macro is the *register number*, not the value of the field.
> 
> That means when we mask the register number with 0x1 we get 0 or 1
> depending on which register the compiler happens to put the constant
> in, eg:
> 
>   li  r10,1
>   tlbiel  r8,r9,2,0,0
> 
>   li  r7,1
>   tlbiel  r10,r6,0,0,1
> 
> If we're unlucky we might generate an invalid instruction form, for
> example RIC=0, PRS=1 and R=0, tlbiel r8,r7,0,1,0, this has been
> observed to cause machine checks:
> 
>   Oops: Machine check, sig: 7 [#1]
>   CPU: 24 PID: 0 Comm: swapper
>   NIP:  000385f4 LR: 0100ed00 CTR: 007f
>   REGS: c110bb40 TRAP: 0200
>   MSR:  90201003   CR: 4800  XER: 2004
>   CFAR: 000385d0 DAR: 1c00 DSISR: 0200 SOFTE: 1
> 
> If the machine check happens early in boot while we have MSR_ME=0 it
> will escalate into a checkstop and kill the box entirely.
> 
> To fix it we could change the inline asm constraint to "i" which
> tells the compiler the value is a constant. But a better fix is to just
> pass a literal 1 into the macro, which bypasses any problems with inline
> asm constraints.
> 
> Fixes: d4748276ae14 ("powerpc/64s: Improve local TLB flush for boot and MCE 
> on POWER9")
> Cc: sta...@vger.kernel.org # v4.16+
> Signed-off-by: Michael Ellerman 
> Reviewed-by: Nicholas Piggin 

Applied to powerpc fixes.

https://git.kernel.org/powerpc/c/2675c13b293a007b7b7f8229514126

cheers


[PATCH] powerpc/sparse: fix plain integer as NULL pointer warning

2018-04-12 Thread Mathieu Malaterre
Trivial fix to remove the following sparse warnings:

  arch/powerpc/kernel/module_32.c:112:74: warning: Using plain integer as NULL 
pointer
  arch/powerpc/kernel/module_32.c:117:74: warning: Using plain integer as NULL 
pointer
  drivers/macintosh/via-pmu.c:1155:28: warning: Using plain integer as NULL 
pointer
  drivers/macintosh/via-pmu.c:1230:20: warning: Using plain integer as NULL 
pointer
  drivers/macintosh/via-pmu.c:1385:36: warning: Using plain integer as NULL 
pointer
  drivers/macintosh/via-pmu.c:1752:23: warning: Using plain integer as NULL 
pointer
  drivers/macintosh/via-pmu.c:2084:19: warning: Using plain integer as NULL 
pointer
  drivers/macintosh/via-pmu.c:2110:32: warning: Using plain integer as NULL 
pointer
  drivers/macintosh/via-pmu.c:2167:19: warning: Using plain integer as NULL 
pointer
  drivers/macintosh/via-pmu.c:2183:19: warning: Using plain integer as NULL 
pointer
  drivers/macintosh/via-pmu.c:277:20: warning: Using plain integer as NULL 
pointer
  arch/powerpc/platforms/powermac/setup.c:155:67: warning: Using plain integer 
as NULL pointer
  arch/powerpc/platforms/powermac/setup.c:247:27: warning: Using plain integer 
as NULL pointer
  arch/powerpc/platforms/powermac/setup.c:249:27: warning: Using plain integer 
as NULL pointer
  arch/powerpc/platforms/powermac/setup.c:252:37: warning: Using plain integer 
as NULL pointer
  arch/powerpc/mm/tlb_hash32.c:127:21: warning: Using plain integer as NULL 
pointer
  arch/powerpc/mm/tlb_hash32.c:148:21: warning: Using plain integer as NULL 
pointer
  arch/powerpc/mm/tlb_hash32.c:44:21: warning: Using plain integer as NULL 
pointer
  arch/powerpc/mm/tlb_hash32.c:57:21: warning: Using plain integer as NULL 
pointer
  arch/powerpc/mm/tlb_hash32.c:87:21: warning: Using plain integer as NULL 
pointer
  arch/powerpc/kernel/btext.c:160:31: warning: Using plain integer as NULL 
pointer
  arch/powerpc/kernel/btext.c:167:22: warning: Using plain integer as NULL 
pointer
  arch/powerpc/kernel/btext.c:274:21: warning: Using plain integer as NULL 
pointer
  arch/powerpc/kernel/btext.c:285:31: warning: Using plain integer as NULL 
pointer
  arch/powerpc/include/asm/hugetlb.h:204:16: warning: Using plain integer as 
NULL pointer
  arch/powerpc/mm/ppc_mmu_32.c:170:21: warning: Using plain integer as NULL 
pointer
  arch/powerpc/platforms/powermac/pci.c:1227:23: warning: Using plain integer 
as NULL pointer
  arch/powerpc/platforms/powermac/pci.c:65:24: warning: Using plain integer as 
NULL pointer

Also use `--fix` command line option from `script/checkpatch --strict` to
remove the following:

  CHECK: Comparison to NULL could be written "!dispDeviceBase"
  #72: FILE: arch/powerpc/kernel/btext.c:160:
  + if (dispDeviceBase == NULL)

  CHECK: Comparison to NULL could be written "!vbase"
  #80: FILE: arch/powerpc/kernel/btext.c:167:
  + if (vbase == NULL)

  CHECK: Comparison to NULL could be written "!base"
  #89: FILE: arch/powerpc/kernel/btext.c:274:
  + if (base == NULL)

  CHECK: Comparison to NULL could be written "!dispDeviceBase"
  #98: FILE: arch/powerpc/kernel/btext.c:285:
  + if (dispDeviceBase == NULL)

  CHECK: Comparison to NULL could be written "strstr"
  #117: FILE: arch/powerpc/kernel/module_32.c:117:
  + if (strstr(secstrings + sechdrs[i].sh_name, ".debug") != NULL)

  CHECK: Comparison to NULL could be written "!Hash"
  #130: FILE: arch/powerpc/mm/ppc_mmu_32.c:170:
  + if (Hash == NULL)

  CHECK: Comparison to NULL could be written "Hash"
  #143: FILE: arch/powerpc/mm/tlb_hash32.c:44:
  + if (Hash != NULL) {

  CHECK: Comparison to NULL could be written "!Hash"
  #152: FILE: arch/powerpc/mm/tlb_hash32.c:57:
  + if (Hash == NULL) {

  CHECK: Comparison to NULL could be written "!Hash"
  #161: FILE: arch/powerpc/mm/tlb_hash32.c:87:
  + if (Hash == NULL) {

  CHECK: Comparison to NULL could be written "!Hash"
  #170: FILE: arch/powerpc/mm/tlb_hash32.c:127:
  + if (Hash == NULL) {

  CHECK: Comparison to NULL could be written "!Hash"
  #179: FILE: arch/powerpc/mm/tlb_hash32.c:148:
  + if (Hash == NULL) {

  ERROR: space required after that ';' (ctx:VxV)
  #192: FILE: arch/powerpc/platforms/powermac/pci.c:65:
  + for (; node != NULL;node = node->sibling) {

  CHECK: Comparison to NULL could be written "node"
  #192: FILE: arch/powerpc/platforms/powermac/pci.c:65:
  + for (; node != NULL;node = node->sibling) {

  CHECK: Comparison to NULL could be written "!region"
  #201: FILE: arch/powerpc/platforms/powermac/pci.c:1227:
  + if (region == NULL)

  CHECK: Comparison to NULL could be written "of_get_property"
  #214: FILE: arch/powerpc/platforms/powermac/setup.c:155:
  + if (of_get_property(np, "cache-unified", NULL) != NULL && dc) {

  CHECK: Comparison to NULL could be written "!np"
  #223: FILE: arch/powerpc/platforms/powermac/setup.c:247:
  + if (np == NULL)

  CHECK: Comparison to NULL could be written "np"
  #226: FILE: 

[RFC PATCH for 4.18 14/23] powerpc: Wire up cpu_opv system call

2018-04-12 Thread Mathieu Desnoyers
Signed-off-by: Mathieu Desnoyers 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Michael Ellerman 
CC: Boqun Feng 
CC: Peter Zijlstra 
CC: "Paul E. McKenney" 
CC: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/include/asm/systbl.h  | 1 +
 arch/powerpc/include/asm/unistd.h  | 2 +-
 arch/powerpc/include/uapi/asm/unistd.h | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/systbl.h 
b/arch/powerpc/include/asm/systbl.h
index 45d4d37495fd..4131825b5a05 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -393,3 +393,4 @@ SYSCALL(pkey_alloc)
 SYSCALL(pkey_free)
 SYSCALL(pkey_mprotect)
 SYSCALL(rseq)
+SYSCALL(cpu_opv)
diff --git a/arch/powerpc/include/asm/unistd.h 
b/arch/powerpc/include/asm/unistd.h
index 1e9708632dce..c19379f0a32e 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -12,7 +12,7 @@
 #include 
 
 
-#define NR_syscalls388
+#define NR_syscalls389
 
 #define __NR__exit __NR_exit
 
diff --git a/arch/powerpc/include/uapi/asm/unistd.h 
b/arch/powerpc/include/uapi/asm/unistd.h
index ac5ba55066dd..f7a221bdb5df 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -399,5 +399,6 @@
 #define __NR_pkey_free 385
 #define __NR_pkey_mprotect 386
 #define __NR_rseq  387
+#define __NR_cpu_opv   388
 
 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
-- 
2.11.0



[RFC PATCH for 4.18 07/23] powerpc: Add support for restartable sequences

2018-04-12 Thread Mathieu Desnoyers
From: Boqun Feng 

Call the rseq_handle_notify_resume() function on return to userspace if
TIF_NOTIFY_RESUME thread flag is set.

Perform fixup on the pre-signal when a signal is delivered on top of a
restartable sequence critical section.

Signed-off-by: Boqun Feng 
Signed-off-by: Mathieu Desnoyers 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Michael Ellerman 
CC: Peter Zijlstra 
CC: "Paul E. McKenney" 
CC: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/Kconfig | 1 +
 arch/powerpc/kernel/signal.c | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 73ce5dd07642..90700b6918ef 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -223,6 +223,7 @@ config PPC
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_VIRT_CPU_ACCOUNTING
select HAVE_IRQ_TIME_ACCOUNTING
+   select HAVE_RSEQ
select IRQ_DOMAIN
select IRQ_FORCED_THREADING
select MODULES_USE_ELF_RELA
diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index 61db86ecd318..d3bb3aaaf5ac 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -133,6 +133,8 @@ static void do_signal(struct task_struct *tsk)
/* Re-enable the breakpoints for the signal stack */
thread_change_pc(tsk, tsk->thread.regs);
 
+   rseq_signal_deliver(tsk->thread.regs);
+
if (is32) {
if (ksig.ka.sa.sa_flags & SA_SIGINFO)
ret = handle_rt_signal32(, oldset, tsk);
@@ -164,6 +166,7 @@ void do_notify_resume(struct pt_regs *regs, unsigned long 
thread_info_flags)
if (thread_info_flags & _TIF_NOTIFY_RESUME) {
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(regs);
+   rseq_handle_notify_resume(regs);
}
 
user_enter();
-- 
2.11.0



[RFC PATCH for 4.18 08/23] powerpc: Wire up restartable sequences system call

2018-04-12 Thread Mathieu Desnoyers
From: Boqun Feng 

Wire up the rseq system call on powerpc.

This provides an ABI improving the speed of a user-space getcpu
operation on powerpc by skipping the getcpu system call on the fast
path, as well as improving the speed of user-space operations on per-cpu
data compared to using load-reservation/store-conditional atomics.

TODO: wire up rseq_syscall() on return from system call. It is used with
CONFIG_DEBUG_RSEQ=y to ensure system calls are not issued within rseq critical
section

Signed-off-by: Boqun Feng 
Signed-off-by: Mathieu Desnoyers 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Michael Ellerman 
CC: Peter Zijlstra 
CC: "Paul E. McKenney" 
CC: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/include/asm/systbl.h  | 1 +
 arch/powerpc/include/asm/unistd.h  | 2 +-
 arch/powerpc/include/uapi/asm/unistd.h | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/systbl.h 
b/arch/powerpc/include/asm/systbl.h
index d61f9c96d916..45d4d37495fd 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -392,3 +392,4 @@ SYSCALL(statx)
 SYSCALL(pkey_alloc)
 SYSCALL(pkey_free)
 SYSCALL(pkey_mprotect)
+SYSCALL(rseq)
diff --git a/arch/powerpc/include/asm/unistd.h 
b/arch/powerpc/include/asm/unistd.h
index daf1ba97a00c..1e9708632dce 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -12,7 +12,7 @@
 #include 
 
 
-#define NR_syscalls387
+#define NR_syscalls388
 
 #define __NR__exit __NR_exit
 
diff --git a/arch/powerpc/include/uapi/asm/unistd.h 
b/arch/powerpc/include/uapi/asm/unistd.h
index 389c36fd8299..ac5ba55066dd 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -398,5 +398,6 @@
 #define __NR_pkey_alloc384
 #define __NR_pkey_free 385
 #define __NR_pkey_mprotect 386
+#define __NR_rseq  387
 
 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
-- 
2.11.0



Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-12 Thread Dmitry V. Levin
On Thu, Apr 12, 2018 at 09:50:26AM -0700, Linus Torvalds wrote:
> Does this attached patch perhaps fix the ARM case?
> 
> It just uses FPE_FLTUNK as the default si_code for SIGFPE, which seems
> sane enough. And then gets rid of FPE_FIXME, which should resolve the
> nasty case.
> 
> Hmm? Entirely untested, and I didn't really look at the test-case in
> question since I can't really run it anyway.
> 
> Well, I could run it all on x86-64, but it doesn't have that FPE_FIXME
> case at all.
> 
>  Linus

>  arch/arm/include/uapi/asm/siginfo.h | 7 ---
>  arch/arm/vfp/vfpmodule.c| 4 ++--
>  2 files changed, 2 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm/include/uapi/asm/siginfo.h 
> b/arch/arm/include/uapi/asm/siginfo.h
> index d0513880be21..d87beeedb4c4 100644
> --- a/arch/arm/include/uapi/asm/siginfo.h
> +++ b/arch/arm/include/uapi/asm/siginfo.h
> @@ -3,11 +3,4 @@
>  
>  #include 
>  
> -/*
> - * SIGFPE si_codes
> - */
> -#ifdef __KERNEL__
> -#define FPE_FIXME0   /* Broken dup of SI_USER */
> -#endif /* __KERNEL__ */
> -
>  #endif

Looks like the whole file should go away.

> diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
> index 4c375e11ae95..012c6e690303 100644
> --- a/arch/arm/vfp/vfpmodule.c
> +++ b/arch/arm/vfp/vfpmodule.c
> @@ -251,13 +251,13 @@ static void vfp_panic(char *reason, u32 inst)
>   */
>  static void vfp_raise_exceptions(u32 exceptions, u32 inst, u32 fpscr, struct 
> pt_regs *regs)
>  {
> - int si_code = 0;
> + int si_code = FPE_FLTUNK;

Note that this change would affect the following code
at the end of vfp_raise_exceptions:

if (si_code)
vfp_raise_sigfpe(si_code, regs);

>   pr_debug("VFP: raising exceptions %08x\n", exceptions);
>  
>   if (exceptions == VFP_EXCEPTION_ERROR) {
>   vfp_panic("unhandled bounce", inst);
> - vfp_raise_sigfpe(FPE_FIXME, regs);
> + vfp_raise_sigfpe(si_code, regs);
>   return;
>   }
>  

To be on the safe side, I'd just change it this way:

diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index 4c375e1..66a73ba 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -257,7 +257,7 @@ static void vfp_raise_exceptions(u32 exceptions, u32 inst, 
u32 fpscr, struct pt_
 
if (exceptions == VFP_EXCEPTION_ERROR) {
vfp_panic("unhandled bounce", inst);
-   vfp_raise_sigfpe(FPE_FIXME, regs);
+   vfp_raise_sigfpe(FPE_FLTUNK, regs);
return;
}

-- 
ldv


signature.asc
Description: PGP signature


Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-12 Thread Linus Torvalds
On Thu, Apr 12, 2018 at 10:20 AM, Russell King - ARM Linux
 wrote:
>
> This file was created to contain FPE_FIXME, by the "signal/arm: Document
> conflicts with SI_USER and SIGFPE" commit so if we're removing it, it
> would be better to remove the whole file.

Fair enough. I'm not going to commit that anyway since I can't test
it, but yes, if it tests ok that sounds like the right thing to do.

   Linus


Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-12 Thread Russell King - ARM Linux
On Thu, Apr 12, 2018 at 09:50:26AM -0700, Linus Torvalds wrote:
> Does this attached patch perhaps fix the ARM case?
> 
> It just uses FPE_FLTUNK as the default si_code for SIGFPE, which seems
> sane enough. And then gets rid of FPE_FIXME, which should resolve the
> nasty case.
> 
> Hmm? Entirely untested, and I didn't really look at the test-case in
> question since I can't really run it anyway.

I'll test tomorrow and let you know.

>  arch/arm/include/uapi/asm/siginfo.h | 7 ---
>  arch/arm/vfp/vfpmodule.c| 4 ++--
>  2 files changed, 2 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm/include/uapi/asm/siginfo.h 
> b/arch/arm/include/uapi/asm/siginfo.h
> index d0513880be21..d87beeedb4c4 100644
> --- a/arch/arm/include/uapi/asm/siginfo.h
> +++ b/arch/arm/include/uapi/asm/siginfo.h
> @@ -3,11 +3,4 @@
>  
>  #include 
>  
> -/*
> - * SIGFPE si_codes
> - */
> -#ifdef __KERNEL__
> -#define FPE_FIXME0   /* Broken dup of SI_USER */
> -#endif /* __KERNEL__ */
> -
>  #endif

This file was created to contain FPE_FIXME, by the "signal/arm: Document
conflicts with SI_USER and SIGFPE" commit so if we're removing it, it
would be better to remove the whole file.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-12 Thread Linus Torvalds
Does this attached patch perhaps fix the ARM case?

It just uses FPE_FLTUNK as the default si_code for SIGFPE, which seems
sane enough. And then gets rid of FPE_FIXME, which should resolve the
nasty case.

Hmm? Entirely untested, and I didn't really look at the test-case in
question since I can't really run it anyway.

Well, I could run it all on x86-64, but it doesn't have that FPE_FIXME
case at all.

 Linus
 arch/arm/include/uapi/asm/siginfo.h | 7 ---
 arch/arm/vfp/vfpmodule.c| 4 ++--
 2 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/arch/arm/include/uapi/asm/siginfo.h b/arch/arm/include/uapi/asm/siginfo.h
index d0513880be21..d87beeedb4c4 100644
--- a/arch/arm/include/uapi/asm/siginfo.h
+++ b/arch/arm/include/uapi/asm/siginfo.h
@@ -3,11 +3,4 @@
 
 #include 
 
-/*
- * SIGFPE si_codes
- */
-#ifdef __KERNEL__
-#define FPE_FIXME	0	/* Broken dup of SI_USER */
-#endif /* __KERNEL__ */
-
 #endif
diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index 4c375e11ae95..012c6e690303 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -251,13 +251,13 @@ static void vfp_panic(char *reason, u32 inst)
  */
 static void vfp_raise_exceptions(u32 exceptions, u32 inst, u32 fpscr, struct pt_regs *regs)
 {
-	int si_code = 0;
+	int si_code = FPE_FLTUNK;
 
 	pr_debug("VFP: raising exceptions %08x\n", exceptions);
 
 	if (exceptions == VFP_EXCEPTION_ERROR) {
 		vfp_panic("unhandled bounce", inst);
-		vfp_raise_sigfpe(FPE_FIXME, regs);
+		vfp_raise_sigfpe(si_code, regs);
 		return;
 	}
 


Applied "ASoC: fsl_ssi: Fix mode setting when changing channel number" to the asoc tree

2018-04-12 Thread Mark Brown
The patch

   ASoC: fsl_ssi: Fix mode setting when changing channel number

has been applied to the asoc tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git 

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

>From fac8a5a5ea40b03dcbb0f46977094099ba2220b8 Mon Sep 17 00:00:00 2001
From: Nicolin Chen 
Date: Sat, 7 Apr 2018 21:40:21 -0700
Subject: [PATCH] ASoC: fsl_ssi: Fix mode setting when changing channel number
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This is a partial revert (in a cleaner way) of commit ebf08ae3bc90
("ASoC: fsl_ssi: Keep ssi->i2s_net updated") to fix a regression
at test cases when switching between mono and stereo audio.

The problem is that ssi->i2s_net is initialized in set_dai_fmt()
only, while this set_dai_fmt() is only called during the dai-link
probe(). The original patch assumed set_dai_fmt() would be called
during every playback instance, so it failed at the overriding use
cases.

This patch adds the local variable i2s_net back to let regular use
cases still follow the mode settings from the set_dai_fmt().

Meanwhile, the original commit of keeping ssi->i2s_net updated was
to make set_tdm_slot() clean by checking the ssi->i2s_net directly
instead of reading SCR register. However, the change itself is not
necessary (or even harmful) because the set_tdm_slot() might fail
to check the slot number for Normal-Mode-None-Net settings while
mono audio cases still need 2 slots. So this patch can also fix it.
And it adds an extra line of comments to declare ssi->i2s_net does
not reflect the register value but merely the initial setting from
the set_dai_fmt().

Reported-by: Mika Penttilä 
Signed-off-by: Nicolin Chen 
Tested-by: Mika Penttilä 
Signed-off-by: Mark Brown 
---
 sound/soc/fsl/fsl_ssi.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
index 0823b08923b5..89df2d9f63d7 100644
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -217,6 +217,7 @@ struct fsl_ssi_soc_data {
  * @dai_fmt: DAI configuration this device is currently used with
  * @streams: Mask of current active streams: BIT(TX) and BIT(RX)
  * @i2s_net: I2S and Network mode configurations of SCR register
+ *   (this is the initial settings based on the DAI format)
  * @synchronous: Use synchronous mode - both of TX and RX use STCK and SFCK
  * @use_dma: DMA is used or FIQ with stream filter
  * @use_dual_fifo: DMA with support for dual FIFO mode
@@ -829,16 +830,23 @@ static int fsl_ssi_hw_params(struct snd_pcm_substream 
*substream,
}
 
if (!fsl_ssi_is_ac97(ssi)) {
+   /*
+* Keep the ssi->i2s_net intact while having a local variable
+* to override settings for special use cases. Otherwise, the
+* ssi->i2s_net will lose the settings for regular use cases.
+*/
+   u8 i2s_net = ssi->i2s_net;
+
/* Normal + Network mode to send 16-bit data in 32-bit frames */
if (fsl_ssi_is_i2s_cbm_cfs(ssi) && sample_size == 16)
-   ssi->i2s_net = SSI_SCR_I2S_MODE_NORMAL | SSI_SCR_NET;
+   i2s_net = SSI_SCR_I2S_MODE_NORMAL | SSI_SCR_NET;
 
/* Use Normal mode to send mono data at 1st slot of 2 slots */
if (channels == 1)
-   ssi->i2s_net = SSI_SCR_I2S_MODE_NORMAL;
+   i2s_net = SSI_SCR_I2S_MODE_NORMAL;
 
regmap_update_bits(regs, REG_SSI_SCR,
-  SSI_SCR_I2S_NET_MASK, ssi->i2s_net);
+  SSI_SCR_I2S_NET_MASK, i2s_net);
}
 
/* In synchronous mode, the SSI uses STCCR for capture */
-- 
2.17.0



Applied "ASoC: fsl_esai: Fix divisor calculation failure at lower ratio" to the asoc tree

2018-04-12 Thread Mark Brown
The patch

   ASoC: fsl_esai: Fix divisor calculation failure at lower ratio

has been applied to the asoc tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git 

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

>From c656941df9bc80f7ec65b92ca73c42f8b0b62628 Mon Sep 17 00:00:00 2001
From: Nicolin Chen 
Date: Sun, 8 Apr 2018 16:57:35 -0700
Subject: [PATCH] ASoC: fsl_esai: Fix divisor calculation failure at lower
 ratio

When the desired ratio is less than 256, the savesub (tolerance)
in the calculation would become 0. This will then fail the loop-
search immediately without reporting any errors.

But if the ratio is smaller enough, there is no need to calculate
the tolerance because PM divisor alone is enough to get the ratio.

So a simple fix could be just to set PM directly instead of going
into the loop-search.

Reported-by: Marek Vasut 
Signed-off-by: Nicolin Chen 
Tested-by: Marek Vasut 
Reviewed-by: Fabio Estevam 
Signed-off-by: Mark Brown 
Cc: sta...@vger.kernel.org
---
 sound/soc/fsl/fsl_esai.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/sound/soc/fsl/fsl_esai.c b/sound/soc/fsl/fsl_esai.c
index 40a700493f4c..da8fd98c7f51 100644
--- a/sound/soc/fsl/fsl_esai.c
+++ b/sound/soc/fsl/fsl_esai.c
@@ -144,6 +144,13 @@ static int fsl_esai_divisor_cal(struct snd_soc_dai *dai, 
bool tx, u32 ratio,
 
psr = ratio <= 256 * maxfp ? ESAI_xCCR_xPSR_BYPASS : 
ESAI_xCCR_xPSR_DIV8;
 
+   /* Do not loop-search if PM (1 ~ 256) alone can serve the ratio */
+   if (ratio <= 256) {
+   pm = ratio;
+   fp = 1;
+   goto out;
+   }
+
/* Set the max fluctuation -- 0.1% of the max devisor */
savesub = (psr ? 1 : 8)  * 256 * maxfp / 1000;
 
-- 
2.17.0



[PATCH v2 00/13] y2038: convert IPC syscalls

2018-04-12 Thread Arnd Bergmann
This is an update of a series I posted a long time ago [1], updating
the IPC subsystem to pass down 64-bit time stamps to user space.

In particular, for sys_msgctl, sys_semctl and sys_shmctl, I do not
introduce a completely new set of replacement system calls, but instead
extend the existing ones to return data in the reserved fields of the
normal data structure.

This should be completely transparent to any existing user space, and
only after the 32-bit time_t wraps, it will make a difference in the
returned data.

libc implementations will consequently have to provide their own data
structures when they move to 64-bit time_t, and convert the structures
in user space from the ones returned by the kernel.

There are three cases here:

- little-endian architectures (except powerpc and mips) can use
  the normal layout and just cast the data structure to the user space
  type that contains 64-bit numbers.

- parisc and sparc can do the same thing with big-endian user space

- little-endian powerpc and most big-endian architectures have
  to flip the upper and lower 32-bit halves of the time_t value in memory,
  but can otherwise keep using the normal layout

- mips and big-endian xtensa need to be more careful because
  they are not consistent in their definitions, and they have to provide
  custom libc implementations for the system calls to use 64-bit time_t.

Changes to the previous version include

- Rebased to the latest kernel (4.17-rc)

- Dropped changes for removed architectures

- Simplified the IPC code changes, based on prior work from
  both Deepa and Eric

- Fixed a few bugs that I found during rebasing, in parcular the
  sparc version was incorrect.

If everyone agrees with the series, I'd like to have it merged through
the tip tree once Deepa's earlier syscall series in there (I have both
in my y2038 tree [2]).

Arnd

[1] https://lkml.org/lkml/2015/5/20/605
[2] git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git y2038-next

Arnd Bergmann (13):
  y2038: asm-generic: extend sysvipc data structures
  y2038: alpha: remove unneeded ipc uapi header files
  y2038: ia64: remove unneeded ipc uapi header files
  y2038: s390: remove unneeded ipc uapi header files
  y2038: arm64: extend sysvipc compat data structures
  y2038: mips: extend sysvipc data structures
  y2038: x86: extend sysvipc data structures
  y2038: parisc: extend sysvipc data structures
  y2038: sparc: extend sysvipc data structures
  y2038: powerpc: extend sysvipc data structures
  y2038: xtensa: extend sysvipc data structures
  y2038: ipc: use ktime_get_real_seconds consistently
  y2038: ipc: report long times to user space

Cc: linux-...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: libc-al...@sourceware.org
Cc: t...@linutronix.de
Cc: deepa.ker...@gmail.com
Cc: v...@zeniv.linux.org.uk
Cc: ebied...@xmission.com
Cc: albert.arib...@3adev.fr
Cc: linux-s...@vger.kernel.org
Cc: schwidef...@de.ibm.com
Cc: x...@kernel.org
Cc: catalin.mari...@arm.com
Cc: will.dea...@arm.com
Cc: linux-m...@linux-mips.org
Cc: jho...@kernel.org
Cc: r...@linux-mips.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org

 arch/alpha/include/asm/Kbuild  |  4 +++
 arch/alpha/include/uapi/asm/ipcbuf.h   |  2 --
 arch/alpha/include/uapi/asm/msgbuf.h   | 28 -
 arch/alpha/include/uapi/asm/sembuf.h   | 23 --
 arch/alpha/include/uapi/asm/shmbuf.h   | 39 ---
 arch/arm64/include/asm/compat.h| 32 +--
 arch/ia64/include/asm/Kbuild   |  4 +++
 arch/ia64/include/uapi/asm/ipcbuf.h|  2 --
 arch/ia64/include/uapi/asm/msgbuf.h| 28 -
 arch/ia64/include/uapi/asm/sembuf.h| 23 --
 arch/ia64/include/uapi/asm/shmbuf.h| 39 ---
 arch/mips/include/asm/compat.h | 38 ---
 arch/mips/include/uapi/asm/msgbuf.h| 57 ++
 arch/mips/include/uapi/asm/sembuf.h| 15 +++--
 arch/mips/include/uapi/asm/shmbuf.h| 23 --
 arch/parisc/include/asm/compat.h   | 32 +--
 arch/parisc/include/uapi/asm/msgbuf.h  | 33 ++--
 arch/parisc/include/uapi/asm/sembuf.h  | 16 +-
 arch/parisc/include/uapi/asm/shmbuf.h  | 19 +---
 arch/powerpc/include/asm/compat.h  | 32 +--
 arch/powerpc/include/uapi/asm/msgbuf.h | 18 +--
 arch/powerpc/include/uapi/asm/sembuf.h | 14 -
 arch/powerpc/include/uapi/asm/shmbuf.h | 19 +---
 arch/s390/include/asm/Kbuild   |  3 ++
 arch/s390/include/asm/compat.h | 32 +--
 arch/s390/include/uapi/asm/msgbuf.h| 38 ---
 arch/s390/include/uapi/asm/sembuf.h| 30 --
 arch/s390/include/uapi/asm/shmbuf.h| 49 -
 arch/sparc/include/asm/compat.h| 32 +--
 arch/sparc/include/uapi/asm/msgbuf.h   | 22 ++---
 

[PATCH] powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU features

2018-04-12 Thread Michael Ellerman
The cpu_has_feature() mechanism has an optimisation where at build
time we construct a mask of the CPU feature bits that will always be
true for the given .config, based on the platform/bitness/etc. that we
are building for.

That is incompatible with DT CPU features, where the set of CPU
features is dependent on feature flags that are given to us by
firmware.

The result is that some feature bits can not be *disabled* by DT CPU
features. Or more accurately, they can be disabled but they will still
appear in the ALWAYS mask, meaning cpu_has_feature() will always
return true for them.

In the past this hasn't really been a problem because on Book3S
64 (where we support DT CPU features), the set of ALWAYS bits has been
very small. That was because we always built for POWER4 and later,
meaning the set of common bits was small.

The only bit that could be cleared by DT CPU features that was also in
the ALWAYS mask was CPU_FTR_NODSISRALIGN, and that was only used in
the alignment handler to create a fake DSISR. That code was itself
deleted in 31bfdb036f12 ("powerpc: Use instruction emulation
infrastructure to handle alignment faults") (Sep 2017).

However the set of ALWAYS features changed with the recent commit
db5ae1c155af ("powerpc/64s: Refine feature sets for little endian
builds") which restricted the set of feature flags when building
little endian to Power7 or later. That caused the ALWAYS mask to
become much larger for little endian builds.

The result is that the following feature bits can currently not
be *disabled* by DT CPU features:

  CPU_FTR_REAL_LE, CPU_FTR_MMCRA, CPU_FTR_CTRL, CPU_FTR_SMT,
  CPU_FTR_PURR, CPU_FTR_SPURR, CPU_FTR_DSCR, CPU_FTR_PKEY,
  CPU_FTR_VMX_COPY, CPU_FTR_CFAR, CPU_FTR_HAS_PPR.

To fix it we need to mask the set of ALWAYS features with the base set
of DT CPU features, ie. the features that are always enabled by DT CPU
features. That way there are no bits in the ALWAYS mask that are not
also always set by DT CPU features.

Fixes: db5ae1c155af ("powerpc/64s: Refine feature sets for little endian 
builds")
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/cputable.h | 23 +--
 arch/powerpc/kernel/dt_cpu_ftrs.c   | 14 +-
 2 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index 931dda8be87c..66fcab13c8b4 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -545,18 +545,37 @@ enum {
 #ifdef CONFIG_PPC_BOOK3E
 #define CPU_FTRS_ALWAYS(CPU_FTRS_E6500 & CPU_FTRS_E5500)
 #else
+
+#ifdef CONFIG_PPC_DT_CPU_FTRS
+#define CPU_FTRS_DT_CPU_BASE   \
+   (CPU_FTR_LWSYNC |   \
+CPU_FTR_FPU_UNAVAILABLE |  \
+CPU_FTR_NODSISRALIGN | \
+CPU_FTR_NOEXECUTE |\
+CPU_FTR_COHERENT_ICACHE |  \
+CPU_FTR_STCX_CHECKS_ADDRESS |  \
+CPU_FTR_POPCNTB | CPU_FTR_POPCNTD |\
+CPU_FTR_DAWR | \
+CPU_FTR_ARCH_206 | \
+CPU_FTR_ARCH_207S)
+#else
+#define CPU_FTRS_DT_CPU_BASE   (~0ul)
+#endif
+
 #ifdef CONFIG_CPU_LITTLE_ENDIAN
 #define CPU_FTRS_ALWAYS \
(CPU_FTRS_POSSIBLE & ~CPU_FTR_HVMODE & CPU_FTRS_POWER7 & \
 CPU_FTRS_POWER8E & CPU_FTRS_POWER8 & CPU_FTRS_POWER8_DD1 & \
-CPU_FTRS_POWER9 & CPU_FTRS_POWER9_DD1 & CPU_FTRS_POWER9_DD2_1)
+CPU_FTRS_POWER9 & CPU_FTRS_POWER9_DD1 & CPU_FTRS_POWER9_DD2_1 & \
+CPU_FTRS_DT_CPU_BASE)
 #else
 #define CPU_FTRS_ALWAYS\
(CPU_FTRS_PPC970 & CPU_FTRS_POWER5 & \
 CPU_FTRS_POWER6 & CPU_FTRS_POWER7 & CPU_FTRS_CELL & \
 CPU_FTRS_PA6T & CPU_FTRS_POWER8 & CPU_FTRS_POWER8E & \
 CPU_FTRS_POWER8_DD1 & ~CPU_FTR_HVMODE & CPU_FTRS_POSSIBLE & \
-CPU_FTRS_POWER9 & CPU_FTRS_POWER9_DD1 & CPU_FTRS_POWER9_DD2_1)
+CPU_FTRS_POWER9 & CPU_FTRS_POWER9_DD1 & CPU_FTRS_POWER9_DD2_1 & \
+CPU_FTRS_DT_CPU_BASE)
 #endif /* CONFIG_CPU_LITTLE_ENDIAN */
 #endif
 #else
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
b/arch/powerpc/kernel/dt_cpu_ftrs.c
index e88fbb1fdb8f..8ab51f6ca03a 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -53,18 +53,6 @@ struct dt_cpu_feature {
int disabled;
 };
 
-#define CPU_FTRS_BASE \
-  (CPU_FTR_LWSYNC | \
-   CPU_FTR_FPU_UNAVAILABLE |\
-   CPU_FTR_NODSISRALIGN |\
-   CPU_FTR_NOEXECUTE |\
-   CPU_FTR_COHERENT_ICACHE | \
-   CPU_FTR_STCX_CHECKS_ADDRESS |\
-   CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
-   CPU_FTR_DAWR | \
-   CPU_FTR_ARCH_206 |\
-   CPU_FTR_ARCH_207S)
-
 #define MMU_FTRS_HASH_BASE (MMU_FTRS_POWER8)
 
 #define COMMON_USER_BASE   (PPC_FEATURE_32 | PPC_FEATURE_64 

Re: [PATCH v9 21/24] perf tools: Add support for the SPF perf event

2018-04-12 Thread Laurent Dufour
On 10/04/2018 08:47, David Rientjes wrote:
> On Mon, 26 Mar 2018, Andi Kleen wrote:
> 
>>> Aside: should there be a new spec_flt field for struct task_struct that 
>>> complements maj_flt and min_flt and be exported through /proc/pid/stat?
>>
>> No. task_struct is already too bloated. If you need per process tracking 
>> you can always get it through trace points.
>>
> 
> Hi Andi,
> 
> We have
> 
>   count_vm_event(PGFAULT);
>   count_memcg_event_mm(vma->vm_mm, PGFAULT);
> 
> in handle_mm_fault() but not counterpart for spf.  I think it would be 
> helpful to be able to determine how much faulting can be done 
> speculatively if there is no per-process tracking without tracing.

That sounds to be a good idea, I will create a separate patch a dedicated
speculative_pgfault counter as PGFAULT is.

Thanks,
Laurent.



Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-12 Thread Russell King - ARM Linux
On Thu, Apr 12, 2018 at 03:49:28PM +0300, Dmitry V. Levin wrote:
> The "KERNEL BUG" diagnostics I was talking about was added to strace yesterday
> as a part of workaround commit, see
> https://github.com/strace/strace/commit/34c7794cc16e2511eda7b1d5767c655a83b17309
> Before that change the test just failed.

Ah, seeing the test case really helps to see exactly what and why it's
broken.  Yes, Eric's commit was definitely wrong and needs to be
reverted, because it incorrectly changes what happens when kill(1) is
used to deliver a SIGFPE signal to a process.

Eric, please sort this out - you have a much better handle on whether
there are any dependencies here that would need to be resolved from
a simple revert of the offending commits, but that revert must happen
because you've caused a user visible regression.

The original code _was_ safe even if it wasn't correct to the specs,
as we'd end up copying the si_addr field (as the si_pid copy) and a
zeroed field as the si_uid copy.  It was just that si_code was
technically wrong, and that's something that would be even more
dangerous to change now.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-12 Thread Dmitry V. Levin
On Thu, Apr 12, 2018 at 01:19:49PM +0100, Russell King - ARM Linux wrote:
> On Thu, Apr 12, 2018 at 02:03:14PM +0300, Dmitry V. Levin wrote:
> > On Thu, Apr 12, 2018 at 10:58:11AM +0100, Russell King - ARM Linux wrote:
> > > On Thu, Apr 12, 2018 at 04:34:35AM +0300, Dmitry V. Levin wrote:
> > > > A similar commit v4.16-rc1~159^2~37
> > > > ("signal/arm: Document conflicts with SI_USER and SIGFPE") must have
> > > > introduced a similar ABI regression to compat arm.
> > > 
> > > So, could you explain how can this change cause a regression?
> > > 
> > > +#define FPE_FIXME  0
> > > -   vfp_raise_sigfpe(0, regs);
> > > +   vfp_raise_sigfpe(FPE_FIXME, regs);
> > 
> > No, this hunk hasn't caused the regression, but another one did:
> > 
> > diff --git a/arch/arm/include/uapi/asm/siginfo.h 
> > b/arch/arm/include/uapi/asm/siginfo.h
> > new file mode 100644
> > index 000..d051388
> > --- /dev/null
> > +++ b/arch/arm/include/uapi/asm/siginfo.h
> > @@ -0,0 +1,13 @@
> > +#ifndef __ASM_SIGINFO_H
> > +#define __ASM_SIGINFO_H
> > +
> > +#include 
> > +
> > +/*
> > + * SIGFPE si_codes
> > + */
> > +#ifdef __KERNEL__
> > +#define FPE_FIXME  0   /* Broken dup of SI_USER */
> > +#endif /* __KERNEL__ */
> > +
> > +#endif
> > 
> > This is due to FPE_FIXME handling in kernel/signal.c
> 
> Building strace 4.22 on ARM and running the test suite reveals no
> problems with the signal_receive test, tested on both 4.14 and 4.16
> kernels - there's no "KERNEL BUG" reports in any of the test results.

https://build.opensuse.org/public/build/home:ldv_alt/openSUSE_Factory_ARM/armv7l/strace/_log
- the test just fails there with
[   50s] + uname -a
[   50s] Linux armbuild01 4.16.0-1-lpae #1 SMP PREEMPT Wed Apr 4 13:35:56 UTC 
2018 (e16f96d) armv7l armv7l armv7l GNU/Linux
...
[  570s] FAIL: signal_receive.gen
[  570s]  SIGFPE {si_signo=SIGFPE, si_code=SI_USER, si_pid=25332, 
si_uid=399} ---
[  570s] +--- SIGFPE {si_signo=SIGFPE, si_code=SI_USER, si_pid=25332, si_uid=0} 
---
[  570s] signal_receive.gen.test: failed test: ../../strace -a16 -e trace=kill 
../signal_receive output mismatch

> However, stock strace 4.22 source doesn't appear to contain the
> "KERNEL BUG" string anywhere, so this may be a Suse specific addition
> to the test:

The "KERNEL BUG" diagnostics I was talking about was added to strace yesterday
as a part of workaround commit, see
https://github.com/strace/strace/commit/34c7794cc16e2511eda7b1d5767c655a83b17309
Before that change the test just failed.

[...]
> Any ideas where the "KERNEL BUG" in Suse builds is coming from?

strace developers use OBS to test strace.git for regressions.
The build environment is provided by OBS, all the rest comes from strace.git.

> Any ideas how to test it on other architectures (iow, where can we get
> source that contains this test?)

Just use master branch of https://github.com/strace/strace
or https://gitlab.com/strace/strace (they are the same).

> Based on previous experience, unfortunately folk don't tend to report
> user ABI regressions to kernel developers, so we'd probably never know
> that there's a problem - I do think the safer thing would've been to
> leave it well alone, and just accept that we'll end up copying more
> words to userspace than is actually intended.

Well, these changes caused visible regressions in strace test suite on arm, ppc,
and sparc - this is the reason why I have reported them to kernel developers.


-- 
ldv


signature.asc
Description: PGP signature


Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-12 Thread Russell King - ARM Linux
On Thu, Apr 12, 2018 at 02:03:14PM +0300, Dmitry V. Levin wrote:
> On Thu, Apr 12, 2018 at 10:58:11AM +0100, Russell King - ARM Linux wrote:
> > On Thu, Apr 12, 2018 at 04:34:35AM +0300, Dmitry V. Levin wrote:
> > > A similar commit v4.16-rc1~159^2~37
> > > ("signal/arm: Document conflicts with SI_USER and SIGFPE") must have
> > > introduced a similar ABI regression to compat arm.
> > 
> > So, could you explain how can this change cause a regression?
> > 
> > +#define FPE_FIXME  0
> > -   vfp_raise_sigfpe(0, regs);
> > +   vfp_raise_sigfpe(FPE_FIXME, regs);
> 
> No, this hunk hasn't caused the regression, but another one did:
> 
> diff --git a/arch/arm/include/uapi/asm/siginfo.h 
> b/arch/arm/include/uapi/asm/siginfo.h
> new file mode 100644
> index 000..d051388
> --- /dev/null
> +++ b/arch/arm/include/uapi/asm/siginfo.h
> @@ -0,0 +1,13 @@
> +#ifndef __ASM_SIGINFO_H
> +#define __ASM_SIGINFO_H
> +
> +#include 
> +
> +/*
> + * SIGFPE si_codes
> + */
> +#ifdef __KERNEL__
> +#define FPE_FIXME  0   /* Broken dup of SI_USER */
> +#endif /* __KERNEL__ */
> +
> +#endif
> 
> This is due to FPE_FIXME handling in kernel/signal.c

Building strace 4.22 on ARM and running the test suite reveals no
problems with the signal_receive test, tested on both 4.14 and 4.16
kernels - there's no "KERNEL BUG" reports in any of the test results.
However, stock strace 4.22 source doesn't appear to contain the
"KERNEL BUG" string anywhere, so this may be a Suse specific addition
to the test:

~/src/strace-4.22$ grep -ri 'KERNEL BUG' .
./strace.1:Arguably, every instance of such behavior is a kernel bug.)
./strace.1.in:Arguably, every instance of such behavior is a kernel bug.)
./NEWS:  * Worked around a kernel bug in tracing privileged executables.
./ChangeLog:aarch64: workaround gcc+kernel bug.
./ChangeLog:tests: workaround kernel bugs in seccomp-strict.test and 
prctl-seccomp-strict.test
./ChangeLog:instead.  We don't want the testsuite failing due to kernel 
bugs.
./ChangeLog:First guess is that it's a workaround for old kernel bugs:
./ChangeLog:This kernel bug is fixed long ago. This change removes the 
workaround.

Any ideas where the "KERNEL BUG" in Suse builds is coming from?  Any
ideas how to test it on other architectures (iow, where can we get
source that contains this test?)

Based on previous experience, unfortunately folk don't tend to report
user ABI regressions to kernel developers, so we'd probably never know
that there's a problem - I do think the safer thing would've been to
leave it well alone, and just accept that we'll end up copying more
words to userspace than is actually intended.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


[PATCH v4] mm: remove odd HAVE_PTE_SPECIAL

2018-04-12 Thread Laurent Dufour
Remove the additional define HAVE_PTE_SPECIAL and rely directly on
CONFIG_ARCH_HAS_PTE_SPECIAL.

There is no functional change introduced by this patch

Signed-off-by: Laurent Dufour 
---
 mm/memory.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 96910c625daa..345e562a138d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -817,17 +817,12 @@ static void print_bad_pte(struct vm_area_struct *vma, 
unsigned long addr,
  * PFNMAP mappings in order to support COWable mappings.
  *
  */
-#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
-# define HAVE_PTE_SPECIAL 1
-#else
-# define HAVE_PTE_SPECIAL 0
-#endif
 struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 pte_t pte, bool with_public_device)
 {
unsigned long pfn = pte_pfn(pte);
 
-   if (HAVE_PTE_SPECIAL) {
+   if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) {
if (likely(!pte_special(pte)))
goto check_pfn;
if (vma->vm_ops && vma->vm_ops->find_special_page)
@@ -862,7 +857,7 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, 
unsigned long addr,
return NULL;
}
 
-   /* !HAVE_PTE_SPECIAL case follows: */
+   /* !CONFIG_ARCH_HAS_PTE_SPECIAL case follows: */
 
if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
if (vma->vm_flags & VM_MIXEDMAP) {
@@ -881,6 +876,7 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, 
unsigned long addr,
 
if (is_zero_pfn(pfn))
return NULL;
+
 check_pfn:
if (unlikely(pfn > highest_memmap_pfn)) {
print_bad_pte(vma, addr, pte, NULL);
@@ -904,7 +900,7 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, 
unsigned long addr,
/*
 * There is no pmd_special() but there may be special pmds, e.g.
 * in a direct-access (dax) mapping, so let's just replicate the
-* !HAVE_PTE_SPECIAL case from vm_normal_page() here.
+* !CONFIG_ARCH_HAS_PTE_SPECIAL case from vm_normal_page() here.
 */
if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
if (vma->vm_flags & VM_MIXEDMAP) {
@@ -1933,7 +1929,8 @@ static int __vm_insert_mixed(struct vm_area_struct *vma, 
unsigned long addr,
 * than insert_pfn).  If a zero_pfn were inserted into a VM_MIXEDMAP
 * without pte special, it would there be refcounted as a normal page.
 */
-   if (!HAVE_PTE_SPECIAL && !pfn_t_devmap(pfn) && pfn_t_valid(pfn)) {
+   if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) &&
+   !pfn_t_devmap(pfn) && pfn_t_valid(pfn)) {
struct page *page;
 
/*
-- 
2.7.4



[PATCH] cxl: Set the PBCQ Tunnel BAR register when enabling capi mode

2018-04-12 Thread Philippe Bergheaud
Skiboot used to set the default Tunnel BAR register value when capi mode
was enabled. This approach was ok for the cxl driver, but prevented other
drivers from choosing different values.

Skiboot versions > 5.11 will not set the default value any longer. This
patch modifies the cxl driver to set/reset the Tunnel BAR register when
entering/exiting the cxl mode, with pnv_pci_set_tunnel_bar().

Signed-off-by: Philippe Bergheaud 
---
 drivers/misc/cxl/pci.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index 83f1d08058fc..3beff9188446 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -1742,6 +1742,9 @@ static int cxl_configure_adapter(struct cxl *adapter, 
struct pci_dev *dev)
/* Required for devices using CAPP DMA mode, harmless for others */
pci_set_master(dev);
 
+   if ((rc = pnv_pci_set_tunnel_bar(dev, 0x0002E000ull, 1)))
+   goto err;
+
if ((rc = pnv_phb_to_cxl_mode(dev, adapter->native->sl_ops->capi_mode)))
goto err;
 
@@ -1768,6 +1771,7 @@ static void cxl_deconfigure_adapter(struct cxl *adapter)
 {
struct pci_dev *pdev = to_pci_dev(adapter->dev.parent);
 
+   pnv_pci_set_tunnel_bar(pdev, 0x0002E000ull, 0);
cxl_native_release_psl_err_irq(adapter);
cxl_unmap_adapter_regs(adapter);
 
-- 
2.16.2



Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-12 Thread Dmitry V. Levin
On Thu, Apr 12, 2018 at 10:58:11AM +0100, Russell King - ARM Linux wrote:
> On Thu, Apr 12, 2018 at 04:34:35AM +0300, Dmitry V. Levin wrote:
> > A similar commit v4.16-rc1~159^2~37
> > ("signal/arm: Document conflicts with SI_USER and SIGFPE") must have
> > introduced a similar ABI regression to compat arm.
> 
> So, could you explain how can this change cause a regression?
> 
> +#define FPE_FIXME  0
> -   vfp_raise_sigfpe(0, regs);
> +   vfp_raise_sigfpe(FPE_FIXME, regs);

No, this hunk hasn't caused the regression, but another one did:

diff --git a/arch/arm/include/uapi/asm/siginfo.h 
b/arch/arm/include/uapi/asm/siginfo.h
new file mode 100644
index 000..d051388
--- /dev/null
+++ b/arch/arm/include/uapi/asm/siginfo.h
@@ -0,0 +1,13 @@
+#ifndef __ASM_SIGINFO_H
+#define __ASM_SIGINFO_H
+
+#include 
+
+/*
+ * SIGFPE si_codes
+ */
+#ifdef __KERNEL__
+#define FPE_FIXME  0   /* Broken dup of SI_USER */
+#endif /* __KERNEL__ */
+
+#endif

This is due to FPE_FIXME handling in kernel/signal.c


-- 
ldv


signature.asc
Description: PGP signature


Re: sparc/ppc/arm compat siginfo ABI regressions: sending SIGFPE via kill() returns wrong values in si_pid and si_uid

2018-04-12 Thread Russell King - ARM Linux
On Thu, Apr 12, 2018 at 04:34:35AM +0300, Dmitry V. Levin wrote:
> A similar commit v4.16-rc1~159^2~37
> ("signal/arm: Document conflicts with SI_USER and SIGFPE") must have
> introduced a similar ABI regression to compat arm.

So, could you explain how can this change cause a regression?

+#define FPE_FIXME  0
-   vfp_raise_sigfpe(0, regs);
+   vfp_raise_sigfpe(FPE_FIXME, regs);

I think you're talking garbage here - look at the damned change.
It subsitutes a definition for a constant, and vfp_raise_sigfpe()
ends up receiving exactly the same value bother before and after
the change.

The change is rather incomplete though because it should have
also changed:

int si_code = 0;

as well.

So, the commit log is accurate in this case: it _is_ about
documenting the conflicting cases between SI_USER and SIGFPE and
that bit of the change has no ABI effect.

What does slightly annoy me is the creation of uapi/asm/siginfo.h
to contain a definition that _isn't_ to be exposed as part of the
UAPI.  If it's not part of the UAPI, it doesn't belong in a UAPI
header, period.  In any case, I don't think that is exposed to
userspace.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


Re: [GIT PULL] asm-generic fixes for v4.17-rc1

2018-04-12 Thread Arnd Bergmann
On Thu, Apr 12, 2018 at 1:16 AM, Linus Torvalds
 wrote:
> On Wed, Apr 11, 2018 at 8:40 AM, Arnd Bergmann  wrote:
>>
>> are available in the git repository at:
>>
>>   
>> git+ssh://gitol...@ra.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git
>> tags/asm-generic
>
> Yeah, no they aren't available there at all.
>
> That tag is some old tag from two years ago that just contains your
> ancient "asm-generic: use compat version for preadv2 and pwritev2".
>
> Forgot to push out? Or forgot to use "-f" to overwrite the old tag?

Yes, something like that, I first tagged the local branch in the wrong
tree which had an old branch of the same name, noticed my mistake
as I pushed it, but then screwed up again when I tried to correct it:
I force-pushed the correct branch again, but not the tag.

Pushed the right tag now, please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git
tags/asm-generic

Arnd


[PATCH RFC 1/1] KVM: PPC: Book3S HV: pack VCORE IDs to access full VCPU ID space

2018-04-12 Thread Sam Bobroff
It is not currently possible to create the full number of possible
VCPUs (KVM_MAX_VCPUS) on Power9 with KVM-HV when the guest uses less
threads per core than it's core stride (or "VSMT mode"). This is
because the VCORE ID and XIVE offsets to grow beyond KVM_MAX_VCPUS
even though the VCPU ID is less than KVM_MAX_VCPU_ID.

To address this, "pack" the VCORE ID and XIVE offsets by using
knowledge of the way the VCPU IDs will be used when there are less
guest threads per core than the core stride. The primary thread of
each core will always be used first. Then, if the guest uses more than
one thread per core, these secondary threads will sequentially follow
the primary in each core.

So, the only way an ID above KVM_MAX_VCPUS can be seen, is if the
VCPUs are being spaced apart, so at least half of each core is empty
and IDs between KVM_MAX_VCPUS and (KVM_MAX_VCPUS * 2) can be mapped
into the second half of each core (4..7, in an 8-thread core).

Similarly, if IDs above KVM_MAX_VCPUS * 2 are seen, at least 3/4 of
each core is being left empty, and we can map down into the second and
third quarters of each core (2, 3 and 5, 6 in an 8-thread core).

Lastly, if IDs above KVM_MAX_VCPUS * 4 are seen, only the primary
threads are being used and 7/8 of the core is empty, allowing use of
the 1, 3, 5 and 7 thread slots.

(Strides less than 8 are handled similarly.)

This allows the VCORE ID or offset to be calculated quickly from the
VCPU ID or XIVE server numbers, without access to the VCPU structure.

Signed-off-by: Sam Bobroff 
---
Hello everyone,

I've tested this on P8 and P9, in lots of combinations of host and guest
threading modes and it has been fine but it does feel like a "tricky"
approach, so I still feel somewhat wary about it.

I've posted it as an RFC because I have not tested it with guest native-XIVE,
and I suspect that it will take some work to support it.

 arch/powerpc/include/asm/kvm_book3s.h | 19 +++
 arch/powerpc/kvm/book3s_hv.c  | 14 ++
 arch/powerpc/kvm/book3s_xive.c|  9 +++--
 3 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 376ae803b69c..1295056d564a 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -368,4 +368,23 @@ extern int kvmppc_h_logical_ci_store(struct kvm_vcpu 
*vcpu);
 #define SPLIT_HACK_MASK0xff00
 #define SPLIT_HACK_OFFS0xfb00
 
+/* Pack a VCPU ID from the [0..KVM_MAX_VCPU_ID) space down to the
+ * [0..KVM_MAX_VCPUS) space, while using knowledge of the guest's core stride
+ * (but not it's actual threading mode, which is not available) to avoid
+ * collisions.
+ */
+static inline u32 kvmppc_pack_vcpu_id(struct kvm *kvm, u32 id)
+{
+   const int block_offsets[MAX_SMT_THREADS] = {0, 4, 2, 6, 1, 5, 3, 7};
+   int stride = kvm->arch.emul_smt_mode > 1 ?
+kvm->arch.emul_smt_mode : kvm->arch.smt_mode;
+   int block = (id / KVM_MAX_VCPUS) * (MAX_SMT_THREADS / stride);
+   u32 packed_id;
+
+   BUG_ON(block >= MAX_SMT_THREADS);
+   packed_id = (id % KVM_MAX_VCPUS) + block_offsets[block];
+   BUG_ON(packed_id >= KVM_MAX_VCPUS);
+   return packed_id;
+}
+
 #endif /* __ASM_KVM_BOOK3S_H__ */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 9cb9448163c4..49165cc90051 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1762,7 +1762,7 @@ static int threads_per_vcore(struct kvm *kvm)
return threads_per_subcore;
 }
 
-static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
+static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int id)
 {
struct kvmppc_vcore *vcore;
 
@@ -1776,7 +1776,7 @@ static struct kvmppc_vcore *kvmppc_vcore_create(struct 
kvm *kvm, int core)
init_swait_queue_head(>wq);
vcore->preempt_tb = TB_NIL;
vcore->lpcr = kvm->arch.lpcr;
-   vcore->first_vcpuid = core * kvm->arch.smt_mode;
+   vcore->first_vcpuid = id;
vcore->kvm = kvm;
INIT_LIST_HEAD(>preempt_list);
 
@@ -1992,12 +1992,18 @@ static struct kvm_vcpu 
*kvmppc_core_vcpu_create_hv(struct kvm *kvm,
mutex_lock(>lock);
vcore = NULL;
err = -EINVAL;
-   core = id / kvm->arch.smt_mode;
+   if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+   BUG_ON(kvm->arch.smt_mode != 1);
+   core = kvmppc_pack_vcpu_id(kvm, id);
+   } else {
+   core = id / kvm->arch.smt_mode;
+   }
if (core < KVM_MAX_VCORES) {
vcore = kvm->arch.vcores[core];
+   BUG_ON(cpu_has_feature(CPU_FTR_ARCH_300) && vcore);
if (!vcore) {
err = -ENOMEM;
-   vcore = kvmppc_vcore_create(kvm, core);
+   vcore = 

Re: [PATCH] powerpc/mm/radix: Fix checkstops caused by invalid tlbiel

2018-04-12 Thread Nicholas Piggin
On Thu, 12 Apr 2018 15:53:52 +1000
Michael Ellerman  wrote:

> In tlbiel_radix_set_isa300() we use the PPC_TLBIEL() macro to
> construct tlbiel instructions. The instruction takes 5 fields, two of
> which are registers, and the others are constants. But because it's
> constructed with inline asm the compiler doesn't know that.
> 
> We got the constraint wrong on the 'r' field, using "r" tells the
> compiler to put the value in a register. The value we then get in the
> macro is the *register number*, not the value of the field.
> 
> That means when we mask the register number with 0x1 we get 0 or 1
> depending on which register the compiler happens to put the constant
> in, eg:
> 
>   li  r10,1
>   tlbiel  r8,r9,2,0,0
> 
>   li  r7,1
>   tlbiel  r10,r6,0,0,1
> 
> If we're unlucky we might generate an invalid instruction form, for
> example RIC=0, PRS=1 and R=0, tlbiel r8,r7,0,1,0, this has been
> observed to cause machine checks:
> 
>   Oops: Machine check, sig: 7 [#1]
>   CPU: 24 PID: 0 Comm: swapper
>   NIP:  000385f4 LR: 0100ed00 CTR: 007f
>   REGS: c110bb40 TRAP: 0200
>   MSR:  90201003   CR: 4800  XER: 2004
>   CFAR: 000385d0 DAR: 1c00 DSISR: 0200 SOFTE: 1
> 
> If the machine check happens early in boot while we have MSR_ME=0 it
> will escalate into a checkstop and kill the box entirely.
> 
> To fix it we could change the inline asm constraint to "i" which
> tells the compiler the value is a constant. But a better fix is to just
> pass a literal 1 into the macro, which bypasses any problems with inline
> asm constraints.
> 

Reviewed-by: Nicholas Piggin 

> Fixes: d4748276ae14 ("powerpc/64s: Improve local TLB flush for boot and MCE 
> on POWER9")
> Cc: sta...@vger.kernel.org # v4.16+
> Signed-off-by: Michael Ellerman 
> ---
>  arch/powerpc/mm/tlb-radix.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
> index 2fba6170ab3f..a5d7309c2d05 100644
> --- a/arch/powerpc/mm/tlb-radix.c
> +++ b/arch/powerpc/mm/tlb-radix.c
> @@ -33,13 +33,12 @@ static inline void tlbiel_radix_set_isa300(unsigned int 
> set, unsigned int is,
>  {
>   unsigned long rb;
>   unsigned long rs;
> - unsigned int r = 1; /* radix format */
>  
>   rb = (set << PPC_BITLSHIFT(51)) | (is << PPC_BITLSHIFT(53));
>   rs = ((unsigned long)pid << PPC_BITLSHIFT(31));
>  
> - asm volatile(PPC_TLBIEL(%0, %1, %2, %3, %4)
> -  : : "r"(rb), "r"(rs), "i"(ric), "i"(prs), "r"(r)
> + asm volatile(PPC_TLBIEL(%0, %1, %2, %3, 1)
> +  : : "r"(rb), "r"(rs), "i"(ric), "i"(prs)
>: "memory");
>  }
>