Re: [PATCH v4 1/2] powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'

2020-06-12 Thread Segher Boessenkool
Hi!

On Fri, Jun 12, 2020 at 02:33:09PM -0700, Nick Desaulniers wrote:
> On Thu, Jun 11, 2020 at 4:53 PM Segher Boessenkool
>  wrote:
> > The PowerPC part of
> > https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints
> > (sorry, no anchor) documents %U.
> 
> I thought those were constraints, not output templates?  Oh,
> The asm statement must also use %U as a placeholder for the
> “update” flag in the corresponding load or store instruction.
> got it.

Traditionally, *all* constraints were documented here, including the
ones that are only meant for GCC's internal use.  And the output
modifiers were largely not documented at all.

For GCC 10, for Power, I changed it to only document the constraints
that should be public in gcc.info (and everything in gccint.info).  The
output modifiers can neatly be documented here as well, since it is such a
short section now.  We're not quite there yet, but getting there.

> > Traditionally the source code is the documentation for this.  The code
> > here starts with the comment
> >   /* Write second word of DImode or DFmode reference.  Works on register
> >  or non-indexed memory only.  */
> > (which is very out-of-date itself, it works fine for e.g. TImode as well,
> > but alas).
> >
> > Unit tests are completely unsuitable for most compiler things like this.
> 
> What? No, surely one may write tests for output operands.  Grepping
> for `%L` in gcc/ was less fun than I was hoping.

You should look for 'L' instead (incl. those quotes) ;-)

Unit tests are 100x as much work, and catch <5% of the problems, compared
to regression tests.  Unit tests only test the stuff you should have
written *anyway*.  It is much more useful to test that much higher level
things work, IMNSHO.

> > HtH,
> 
> Yes, perfect, thank you so much!  So it looks like LLVM does not yet
> handle %L properly for memory operands.
> https://bugs.llvm.org/show_bug.cgi?id=46186#c4
> It's neat to see how this is implemented in GCC (and how many aren't
> implemented in LLVM, yikes :( ).  For reference, this is implemented
> in PPCAsmPrinter::PrintAsmOperand() and
> PPCAsmPrinter::PrintAsmMemoryOperand() in
> llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp.  GCC switches first on the
> modifier characters, then the operand type.

That is what the rs6000 backend currently does, yeah.  The print_operand
function just gets passed the modifier character (as "int code", or 0 if
there is no modifier).  Since there are so many modifiers there aren't
really any better options than just doing a "switch (code)" around
everything else (well, things can be factored, some helper functions,
etc., but this is mostly very old code, and it has grown organically).
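
A toy model of that "switch (code)" shape, for illustration only.  This is
not the rs6000 code: the operand struct and all names are invented here, and
only the no-modifier case and 'L' are modelled (second word = next register,
or the memory offset plus 4).

```c
#include <stdio.h>

/* Hypothetical operand representation, not GCC's rtx. */
enum toy_operand_kind { TOY_OP_REG, TOY_OP_MEM };

struct toy_operand {
    enum toy_operand_kind kind;
    int regno;   /* register number, or base register for TOY_OP_MEM */
    int offset;  /* displacement, used for TOY_OP_MEM only */
};

/*
 * Dispatch on the modifier character first, then on the operand kind.
 * code == 0 means "no modifier".  Returns the snprintf() result, or -1
 * for a modifier this toy does not model.
 */
int toy_print_operand(char *buf, size_t len,
                      const struct toy_operand *op, int code)
{
    switch (code) {
    case 0:
        if (op->kind == TOY_OP_REG)
            return snprintf(buf, len, "%d", op->regno);
        return snprintf(buf, len, "%d(%d)", op->offset, op->regno);
    case 'L':
        /* Second word of a two-word operand. */
        if (op->kind == TOY_OP_REG)
            return snprintf(buf, len, "%d", op->regno + 1);
        return snprintf(buf, len, "%d(%d)", op->offset + 4, op->regno);
    default:
        return -1;
    }
}
```

With a register operand 3 and a memory operand 0(5), the plain and 'L' forms
come out as 3, 4, 0(5) and 4(5), which lines up with the GCC output quoted
further down.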

> LLVM dispatches on operand type, then modifier.

That is neater, certainly for REG operands.

> When I was looking into LLVM's AsmPrinter class,
> I was surprised to see it's basically an assembler that just has
> complex logic to just do a bunch of prints, so it makes sense to see
> that pattern in GCC literally calling printf.

GCC always outputs assembler code.  This is usually a big advantage, for
things like output_operand.

> Some things I don't understand from PPC parlance is the "mode"
> (preinc, predec, premodify) and small data operands?

"mode" is "machine mode" -- SImode and the like.  PRE_DEC etc. are
*codes* (rtx codes), like,  (mem:DF (pre_dec:SI (reg:SI 39)))  (straight
from the manual).

> IIUC the bug report correctly, it looks like LLVM is failing for the
> __put_user_asm2_goto case for -m32.  A simple reproducer:
> https://godbolt.org/z/jBBF9b
> 
> void foo(long long in, long long* out) {
> asm volatile(
>   "stw%X1 %0, %1\n\t"
>   "stw%X1 %L0, %L1"
>   ::"r"(in), "m"(*out));
> }

This is wrong if operands[0] is a register, btw.  So it should use 'o'
as constraint (not 'm'), and then the 'X' output modifier has become
useless.
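
A sketch of the reproducer with that change applied, assuming GCC targeting
32-bit big-endian PowerPC.  The function name and the portable fallback are
additions for illustration and not from this thread; the memory operand is
also made an output here, since the asm writes to it.

```c
#include <string.h>

/* Hypothetical helper: store a 64-bit value as two 32-bit words. */
void toy_store_ll(long long in, long long *out)
{
#if defined(__GNUC__) && !defined(__clang__) && \
    defined(__powerpc__) && !defined(__powerpc64__)
    /*
     * 'o' asks for offsettable memory, so %L0 (the address plus 4) is
     * always valid and no update/indexed form can be picked, which is
     * why the %X modifier is dropped.
     */
    __asm__ volatile(
        "stw %1, %0\n\t"
        "stw %L1, %L0"
        : "=o"(*out)
        : "r"(in));
#else
    /* Anything else: plain C with the same effect. */
    memcpy(out, &in, sizeof(in));
#endif
}
```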

> prints (in GCC):
> foo:
>   stw 3, 0(5)
>   stw 4, 4(5)
>   blr
> (first time looking at ppc assembler, seems constants and registers
> are not as easy to distinguish,

The instruction mnemonic always tells you what types all arguments are.
Traditionally we don't write spaces after commas, either.  That is
actually easier to read -- well, if you are used to it, anyway! :-)

> https://developer.ibm.com/technologies/linux/articles/l-ppc/ say "Get
> used to it." LOL, ok).

For quite a while now you have been able to write your assembler using
register names as well.  Not using the dangerous macros the Linux kernel
had/has (with
which you can write "rN" in place of any "N", and it doesn't force you
to use the register name either, so you could write "li r3,r4" and
"mr r3,0" and even "addi r3,r0,1234", all very misleading).

> so that's "store word from register 3 into dereference of register 5
> plus 0, then store word from register 4 into dereference of register 5
> plus 4?"

Yup.

> Guessing the ppc32 abi is ILP32 putting long long's into two
> separate registers?

Yes, and the order is the same as it 

[PATCH v2 12/12] x86/traps: Fix up invalid PASID

2020-06-12 Thread Fenghua Yu
A #GP fault is generated when ENQCMD instruction is executed without
a valid PASID value programmed in the current thread's PASID MSR. The
#GP fault handler will initialize the MSR if a PASID has been allocated
for this process.

Decoding the user instruction is ugly and sets a bad architecture
precedent. It may not function if the faulting instruction is modified
after #GP.

Thomas suggested providing a reason for the #GP caused by executing ENQCMD
without a valid PASID value programmed. #GP error codes are 16 bits and all
16 bits are taken. Refer to SDM Vol 3, Chapter 16.13 for details. The other
choice was to reflect the error code in an MSR. ENQCMD can also cause #GP
when loading from the source operand, so that would not fully cover all
the reasons. Rather than special-casing ENQCMD, in future Intel may
choose a different fault mechanism for such cases if recovery is needed on
#GP.

The following heuristic is used to avoid decoding the user instructions
to determine the precise reason for the #GP fault:
1) If the mm for the process has not been allocated a PASID, this #GP
   cannot be fixed.
2) If the PASID MSR is already initialized, then the #GP was for some
   other reason.
3) Try initializing the PASID MSR and returning. If the #GP was from
   an ENQCMD this will fix it. If not, the #GP fault will be repeated
   and will hit case "2".
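
A minimal userspace model of the three steps above, as a sketch only.  The
struct, the names and the use of 0 for "no PASID" are assumptions made for
illustration; the valid bit follows the IA32_PASID description elsewhere in
this series.  It does not touch any real MSR.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PASID_VALID (1ULL << 31)  /* "valid" bit of the modelled MSR */

struct toy_thread {
    unsigned int mm_pasid;  /* 0: no PASID allocated for the mm (assumption) */
    uint64_t pasid_msr;     /* stand-in for the per-thread PASID MSR */
};

/* Returns true if the faulting instruction should simply be retried. */
bool toy_fixup_pasid(struct toy_thread *t)
{
    /* 1) The mm has no PASID: this #GP cannot be fixed here. */
    if (t->mm_pasid == 0)
        return false;

    /* 2) The MSR is already initialized: the #GP had another cause. */
    if (t->pasid_msr & TOY_PASID_VALID)
        return false;

    /* 3) Initialize the MSR and let the instruction run again. */
    t->pasid_msr = (uint64_t)t->mm_pasid | TOY_PASID_VALID;
    return true;
}
```

A wrong guess in step 3 costs one extra fault: the second call sees the valid
bit and returns false through case 2.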

Suggested-by: Thomas Gleixner 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Update the first paragraph of the commit message (Thomas)
- Add reasons why we don't decode the user instruction and don't use
  #GP error code (Thomas)
- Change get_task_mm() to current->mm (Thomas)
- Add comments on why IRQ is disabled during PASID fixup (Thomas)
- Add comment in fixup() that the function is called when #GP is from
  user (so mm is not NULL) (Dave Hansen)

 arch/x86/include/asm/iommu.h |  1 +
 arch/x86/kernel/traps.c  | 23 +
 drivers/iommu/intel/svm.c| 39 
 3 files changed, 63 insertions(+)

diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index ed41259fe7ac..e9365a5d6f7d 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -27,5 +27,6 @@ arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr)
 }
 
 void __free_pasid(struct mm_struct *mm);
+bool __fixup_pasid_exception(void);
 
 #endif /* _ASM_X86_IOMMU_H */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 4cc541051994..0f78d5cdddfe 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -59,6 +59,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_X86_64
 #include 
@@ -436,6 +437,16 @@ static enum kernel_gp_hint get_kernel_gp_address(struct pt_regs *regs,
return GP_CANONICAL;
 }
 
+static bool fixup_pasid_exception(void)
+{
+   if (!IS_ENABLED(CONFIG_INTEL_IOMMU_SVM))
+   return false;
+   if (!static_cpu_has(X86_FEATURE_ENQCMD))
+   return false;
+
+   return __fixup_pasid_exception();
+}
+
 #define GPFSTR "general protection fault"
 
 dotraplinkage void do_general_protection(struct pt_regs *regs, long error_code)
@@ -447,6 +458,18 @@ dotraplinkage void do_general_protection(struct pt_regs *regs, long error_code)
int ret;
 
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+   /*
+* Perform the check for a user mode PASID exception before enabling
+* interrupts. Doing this here ensures that the PASID MSR can be simply
+* accessed because the contents are known to be still associated
+* with the current process.
+*/
+   if (user_mode(regs) && fixup_pasid_exception()) {
+   cond_local_irq_enable(regs);
+   return;
+   }
+
cond_local_irq_enable(regs);
 
if (static_cpu_has(X86_FEATURE_UMIP)) {
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 27dc866b8461..81fd2380c0f9 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -1078,3 +1078,42 @@ void __free_pasid(struct mm_struct *mm)
 */
ioasid_free(pasid);
 }
+
+/*
+ * Apply some heuristics to see if the #GP fault was caused by a thread
+ * that hasn't had the IA32_PASID MSR initialized.  If it looks like that
+ * is the problem, try initializing the IA32_PASID MSR. If the heuristic
+ * guesses incorrectly, take one more #GP fault.
+ */
+bool __fixup_pasid_exception(void)
+{
+   u64 pasid_msr;
+   unsigned int pasid;
+
+   /*
+* This function is called only when this #GP was triggered from user
+* space. So the mm cannot be NULL.
+*/
+   pasid = current->mm->pasid;
+   /* If the mm doesn't have a valid PASID, then can't help. */
+   if (invalid_pasid(pasid))
+   return false;
+
+   /*
+* Since IRQ is disabled now, the current task still owns the FPU on
+* this CPU and the PASID MSR can be 

[PATCH v2 11/12] x86/mmu: Allocate/free PASID

2020-06-12 Thread Fenghua Yu
A PASID is allocated for an "mm" the first time any thread attaches
to an SVM capable device. Later device attachments (whether to the same
device or another SVM device) will re-use the same PASID.

The PASID is freed when the process exits (so no need to keep
reference counts on how many SVM devices are sharing the PASID).
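
The lifetime described above can be modelled in a few lines.  This is only a
sketch: the counter-based allocator, the toy_ names and the use of 0 for "no
PASID yet" are assumptions for illustration, not the ioasid API.

```c
#include <stdbool.h>

#define TOY_NO_PASID 0u  /* assumption: 0 means the mm has no PASID yet */

struct toy_mm {
    unsigned int pasid;
};

static unsigned int toy_next_pasid = 1;  /* toy allocator state */

/*
 * Called on every device attach.  Only the first call for a given mm
 * allocates; later calls hand back the same PASID.  *new_pasid tells the
 * caller whether an allocation happened.
 */
unsigned int toy_bind(struct toy_mm *mm, bool *new_pasid)
{
    *new_pasid = false;
    if (mm->pasid != TOY_NO_PASID)
        return mm->pasid;

    mm->pasid = toy_next_pasid++;
    *new_pasid = true;
    return mm->pasid;
}

/* Called once when the mm goes away; no per-device reference counting. */
void toy_mm_exit(struct toy_mm *mm)
{
    mm->pasid = TOY_NO_PASID;
}
```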

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Define a helper free_bind() to simplify error exit code in bind_mm()
  (Thomas)
- Fix a ret error code in bind_mm() (Thomas)
- Change pasid's type from "int" to "unsigned int" to have consistent
  pasid type in iommu (Thomas)
- Simplify alloc_pasid() a bit.

 arch/x86/include/asm/iommu.h   |   2 +
 arch/x86/include/asm/mmu_context.h |  14 
 drivers/iommu/intel/svm.c  | 101 +
 3 files changed, 105 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index bf1ed2ddc74b..ed41259fe7ac 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -26,4 +26,6 @@ arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr)
return -EINVAL;
 }
 
+void __free_pasid(struct mm_struct *mm);
+
 #endif /* _ASM_X86_IOMMU_H */
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 47562147e70b..f8c91ce8c451 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 extern atomic64_t last_mm_ctx_id;
 
@@ -117,9 +118,22 @@ static inline int init_new_context(struct task_struct *tsk,
init_new_context_ldt(mm);
return 0;
 }
+
+static inline void free_pasid(struct mm_struct *mm)
+{
+   if (!IS_ENABLED(CONFIG_INTEL_IOMMU_SVM))
+   return;
+
+   if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
+   return;
+
+   __free_pasid(mm);
+}
+
 static inline void destroy_context(struct mm_struct *mm)
 {
destroy_context_ldt(mm);
+   free_pasid(mm);
 }
 
 extern void switch_mm(struct mm_struct *prev, struct mm_struct *next,
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 4e775e12ae52..27dc866b8461 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -425,6 +425,53 @@ int intel_svm_unbind_gpasid(struct device *dev, unsigned int pasid)
return ret;
 }
 
+static void free_bind(struct intel_svm *svm, struct intel_svm_dev *sdev,
+ bool new_pasid)
+{
+   if (new_pasid)
+   ioasid_free(svm->pasid);
+   kfree(svm);
+   kfree(sdev);
+}
+
+/*
+ * If this mm already has a PASID, use it. Otherwise allocate a new one.
+ * Let the caller know if a new PASID is allocated via 'new_pasid'.
+ */
+static int alloc_pasid(struct intel_svm *svm, struct mm_struct *mm,
+  unsigned int pasid_max, bool *new_pasid,
+  unsigned int flags)
+{
+   unsigned int pasid;
+
+   *new_pasid = false;
+
+   /*
+* Reuse the PASID if the mm already has a PASID and a private
+* PASID is not requested.
+*/
+   if (mm && mm->pasid && !(flags & SVM_FLAG_PRIVATE_PASID)) {
+   /*
+* Once a PASID is allocated for this mm, the PASID
+* stays with the mm until the mm is dropped. Reuse
+* the PASID which has been already allocated for the
+* mm instead of allocating a new one.
+*/
+   ioasid_set_data(mm->pasid, svm);
+
+   return mm->pasid;
+   }
+
+   /* Allocate a new pasid. Do not use PASID 0, reserved for init PASID. */
+   pasid = ioasid_alloc(NULL, PASID_MIN, pasid_max - 1, svm);
+   if (pasid != INVALID_IOASID) {
+   /* A new pasid is allocated. */
+   *new_pasid = true;
+   }
+
+   return pasid;
+}
+
 /* Caller must hold pasid_mutex, mm reference */
 static int
 intel_svm_bind_mm(struct device *dev, unsigned int flags,
@@ -518,6 +565,8 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
init_rcu_head(&sdev->rcu);
 
if (!svm) {
+   bool new_pasid;
+
svm = kzalloc(sizeof(*svm), GFP_KERNEL);
if (!svm) {
ret = -ENOMEM;
@@ -529,12 +578,9 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
if (pasid_max > intel_pasid_max_id)
pasid_max = intel_pasid_max_id;
 
-   /* Do not use PASID 0, reserved for RID to PASID */
-   svm->pasid = ioasid_alloc(NULL, PASID_MIN,
- pasid_max - 1, svm);
+   svm->pasid = alloc_pasid(svm, mm, pasid_max, &new_pasid, flags);
if (svm->pasid == INVALID_IOASID) {
-   kfree(svm);
-   kfree(sdev);
+   free_bind(svm, sdev, new_pasid);
ret = 

[PATCH v2 09/12] fork: Clear PASID for new mm

2020-06-12 Thread Fenghua Yu
When a new mm is created, its PASID should be cleared, i.e. the PASID is
initialized to its init state 0 on both ARM and X86.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Add this patch to initialize PASID value for a new mm.

 include/linux/mm_types.h | 2 ++
 kernel/fork.c| 8 
 2 files changed, 10 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5778db3aa42d..904bc07411a9 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -22,6 +22,8 @@
 #endif
 #define AT_VECTOR_SIZE (2*(AT_VECTOR_SIZE_ARCH + AT_VECTOR_SIZE_BASE + 1))
 
+/* Initial PASID value is 0. */
+#define INIT_PASID 0
 
 struct address_space;
 struct mem_cgroup;
diff --git a/kernel/fork.c b/kernel/fork.c
index 142b23645d82..085e72d3e9eb 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1007,6 +1007,13 @@ static void mm_init_owner(struct mm_struct *mm, struct task_struct *p)
 #endif
 }
 
+static void mm_init_pasid(struct mm_struct *mm)
+{
+#ifdef CONFIG_PCI_PASID
+   mm->pasid = INIT_PASID;
+#endif
+}
+
 static void mm_init_uprobes_state(struct mm_struct *mm)
 {
 #ifdef CONFIG_UPROBES
@@ -1035,6 +1042,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
mm_init_cpumask(mm);
mm_init_aio(mm);
mm_init_owner(mm, p);
+   mm_init_pasid(mm);
RCU_INIT_POINTER(mm->exe_file, NULL);
mmu_notifier_subscriptions_init(mm);
init_tlb_flush_pending(mm);
-- 
2.19.1



[PATCH v2 10/12] x86/process: Clear PASID state for a newly forked/cloned thread

2020-06-12 Thread Fenghua Yu
The PASID state has to be cleared on forks, since the child has a
different address space. The PASID is also cleared for thread clone. While
it would be correct to inherit the PASID in this case, it is unknown
whether the new task will use ENQCMD. Giving it the PASID "just in case"
would have the downside of increased context switch overhead from setting
the PASID MSR.

Since #GP faults have to be handled on any threads that were created before
the PASID was assigned to the mm of the process, newly created threads
might as well be treated in a consistent way.

Suggested-by: Thomas Gleixner 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Modify init_task_pasid().

 arch/x86/kernel/process.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index f362ce0d5ac0..1b1492e337a6 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -121,6 +121,21 @@ static int set_new_tls(struct task_struct *p, unsigned long tls)
return do_set_thread_area_64(p, ARCH_SET_FS, tls);
 }
 
+/* Initialize the PASID state for the forked/cloned thread. */
+static void init_task_pasid(struct task_struct *task)
+{
+   struct ia32_pasid_state *ppasid;
+
+   /*
+* Initialize the PASID state so that the PASID MSR will be
+* initialized to its initial state (0) by XRSTORS when the task is
+* scheduled for the first time.
+*/
+   ppasid = get_xsave_addr(&task->thread.fpu.state.xsave, XFEATURE_PASID);
+   if (ppasid)
+   ppasid->pasid = INIT_PASID;
+}
+
 int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
unsigned long arg, struct task_struct *p, unsigned long tls)
 {
@@ -174,6 +189,9 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
task_user_gs(p) = get_user_gs(current_pt_regs());
 #endif
 
+   if (static_cpu_has(X86_FEATURE_ENQCMD))
+   init_task_pasid(p);
+
/* Set a new TLS for the child thread? */
if (clone_flags & CLONE_SETTLS)
ret = set_new_tls(p, tls);
-- 
2.19.1



[PATCH v2 06/12] x86/fpu/xstate: Add supervisor PASID state for ENQCMD feature

2020-06-12 Thread Fenghua Yu
From: Yu-cheng Yu 

ENQCMD instruction reads PASID from IA32_PASID MSR. The MSR is stored
in the task's supervisor FPU PASID state and is context switched by
XSAVES/XRSTORS.
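
For orientation, a sketch of where the new state component sits.  The enum
below mirrors the order of the kernel's "enum xfeature" as far as I know it
(only the last few entries are visible in this patch); the toy_ names are
invented for illustration.

```c
#include <stdint.h>

/* Each enumerator is the state-component number, i.e. its bit position. */
enum toy_xfeature {
    TOY_XFEATURE_FP,
    TOY_XFEATURE_SSE,
    TOY_XFEATURE_YMM,
    TOY_XFEATURE_BNDREGS,
    TOY_XFEATURE_BNDCSR,
    TOY_XFEATURE_OPMASK,
    TOY_XFEATURE_ZMM_Hi256,
    TOY_XFEATURE_Hi16_ZMM,
    TOY_XFEATURE_PT_UNIMPLEMENTED_SO_FAR,
    TOY_XFEATURE_PKRU,
    TOY_XFEATURE_PASID,
    TOY_XFEATURE_MAX,
};

/* Bit mask for one component, as used in the xfeatures bitmaps. */
uint64_t toy_xfeature_mask(enum toy_xfeature f)
{
    return 1ULL << f;
}
```

Under that ordering PASID is component 10, so its mask is 0x400.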

Signed-off-by: Yu-cheng Yu 
Co-developed-by: Fenghua Yu 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Modify the commit message (Thomas)

 arch/x86/include/asm/fpu/types.h  | 10 ++
 arch/x86/include/asm/fpu/xstate.h |  2 +-
 arch/x86/kernel/fpu/xstate.c  |  4 
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index f098f6cab94b..00f8efd4c07d 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -114,6 +114,7 @@ enum xfeature {
XFEATURE_Hi16_ZMM,
XFEATURE_PT_UNIMPLEMENTED_SO_FAR,
XFEATURE_PKRU,
+   XFEATURE_PASID,
 
XFEATURE_MAX,
 };
@@ -128,6 +129,7 @@ enum xfeature {
 #define XFEATURE_MASK_Hi16_ZMM (1 << XFEATURE_Hi16_ZMM)
 #define XFEATURE_MASK_PT   (1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR)
 #define XFEATURE_MASK_PKRU (1 << XFEATURE_PKRU)
+#define XFEATURE_MASK_PASID (1 << XFEATURE_PASID)
 
 #define XFEATURE_MASK_FPSSE (XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
 #define XFEATURE_MASK_AVX512   (XFEATURE_MASK_OPMASK \
@@ -229,6 +231,14 @@ struct pkru_state {
u32 pad;
 } __packed;
 
+/*
+ * State component 10 is supervisor state used for context-switching the
+ * PASID state.
+ */
+struct ia32_pasid_state {
+   u64 pasid;
+} __packed;
+
 struct xstate_header {
u64 xfeatures;
u64 xcomp_bv;
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 422d8369012a..ab9833c57aaa 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -33,7 +33,7 @@
  XFEATURE_MASK_BNDCSR)
 
 /* All currently supported supervisor features */
-#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (0)
+#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID)
 
 /*
  * Unsupported supervisor features. When a supervisor feature in this mask is
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index bda2e5eaca0e..31629e43383c 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -37,6 +37,7 @@ static const char *xfeature_names[] =
"AVX-512 ZMM_Hi256" ,
"Processor Trace (unused)"  ,
"Protection Keys User registers",
+   "PASID state",
"unknown xstate feature",
 };
 
@@ -51,6 +52,7 @@ static short xsave_cpuid_features[] __initdata = {
X86_FEATURE_AVX512F,
X86_FEATURE_INTEL_PT,
X86_FEATURE_PKU,
+   X86_FEATURE_ENQCMD,
 };
 
 /*
@@ -316,6 +318,7 @@ static void __init print_xstate_features(void)
print_xstate_feature(XFEATURE_MASK_ZMM_Hi256);
print_xstate_feature(XFEATURE_MASK_Hi16_ZMM);
print_xstate_feature(XFEATURE_MASK_PKRU);
+   print_xstate_feature(XFEATURE_MASK_PASID);
 }
 
 /*
@@ -590,6 +593,7 @@ static void check_xstate_against_struct(int nr)
XCHECK_SZ(sz, nr, XFEATURE_ZMM_Hi256, struct avx_512_zmm_uppers_state);
XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM,  struct avx_512_hi16_state);
XCHECK_SZ(sz, nr, XFEATURE_PKRU,  struct pkru_state);
+   XCHECK_SZ(sz, nr, XFEATURE_PASID, struct ia32_pasid_state);
 
/*
 * Make *SURE* to add any feature numbers in below if
-- 
2.19.1



[PATCH v2 08/12] mm: Define pasid in mm

2020-06-12 Thread Fenghua Yu
PASID is shared by all threads in a process. So the logical place to keep
track of it is in the "mm". Both ARM and X86 need to use the PASID in the
"mm".

Suggested-by: Christoph Hellwig 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- This new patch moves "pasid" from x86 specific mm_context_t to generic
  struct mm_struct per Christoph's comment: 
https://lore.kernel.org/linux-iommu/20200414170252.714402-1-jean-phili...@linaro.org/T/#mb57110ffe1aaa24750eeea4f93b611f0d1913911
- Jean-Philippe Brucker posted a virtually identical patch. I still put this
  patch in the series for better review. The upstream kernel only needs one
  of the two patches eventually.
https://lore.kernel.org/linux-iommu/20200519175502.2504091-2-jean-phili...@linaro.org/
- Change CONFIG_IOASID to CONFIG_PCI_PASID (Ashok)

 include/linux/mm_types.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 64ede5f150dc..5778db3aa42d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -538,6 +538,10 @@ struct mm_struct {
atomic_long_t hugetlb_usage;
 #endif
struct work_struct async_put_work;
+
+#ifdef CONFIG_PCI_PASID
+   unsigned int pasid;
+#endif
} __randomize_layout;
 
/*
-- 
2.19.1



[PATCH v2 07/12] x86/msr-index: Define IA32_PASID MSR

2020-06-12 Thread Fenghua Yu
The IA32_PASID MSR (0xd93) contains the Process Address Space Identifier
(PASID), a 20-bit value. Bit 31 must be set to indicate the value
programmed in the MSR is valid. Hardware uses PASID to identify process
address space and direct responses to the right address space.
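
A sketch of packing and unpacking that layout.  The helper names are
invented for illustration, and placing the 20-bit PASID in bits 19:0 is an
assumption here; the commit message only states the width and the valid bit.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PASID_MASK  0xFFFFFULL    /* 20-bit PASID, assumed in bits 19:0 */
#define TOY_PASID_VALID (1ULL << 31)  /* bit 31: MSR contents are valid */

/* Build an MSR value holding a valid PASID. */
uint64_t toy_pasid_msr_encode(uint32_t pasid)
{
    return ((uint64_t)pasid & TOY_PASID_MASK) | TOY_PASID_VALID;
}

/* True if the valid bit is set. */
bool toy_pasid_msr_is_valid(uint64_t msr)
{
    return (msr & TOY_PASID_VALID) != 0;
}

/* Extract the PASID field, ignoring the valid bit. */
uint32_t toy_pasid_msr_pasid(uint64_t msr)
{
    return (uint32_t)(msr & TOY_PASID_MASK);
}
```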

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Change "identify process" to "identify process address space" in the
  commit message (Thomas)

 arch/x86/include/asm/msr-index.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index e8370e64a155..e5f699ff1dd6 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -237,6 +237,9 @@
 #define MSR_IA32_LASTINTFROMIP 0x01dd
 #define MSR_IA32_LASTINTTOIP   0x01de
 
+#define MSR_IA32_PASID 0x0d93
+#define MSR_IA32_PASID_VALID   BIT_ULL(31)
+
 /* DEBUGCTLMSR bits (others vary by model): */
 #define DEBUGCTLMSR_LBR (1UL <<  0) /* last branch recording */
 #define DEBUGCTLMSR_BTF_SHIFT  1
-- 
2.19.1



[PATCH v2 04/12] docs: x86: Add documentation for SVA (Shared Virtual Addressing)

2020-06-12 Thread Fenghua Yu
From: Ashok Raj 

ENQCMD and Data Streaming Accelerator (DSA) and all of their associated
features are a complicated stack with lots of interconnected pieces.
This documentation provides a big picture overview for all of the
features.

Signed-off-by: Ashok Raj 
Co-developed-by: Fenghua Yu 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Fix the doc format and add the doc in toctree (Thomas)
- Modify the doc for better description (Thomas, Tony, Dave)

 Documentation/x86/index.rst |   1 +
 Documentation/x86/sva.rst   | 287 
 2 files changed, 288 insertions(+)
 create mode 100644 Documentation/x86/sva.rst

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 265d9e9a093b..e5d5ff096685 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -30,3 +30,4 @@ x86-specific Documentation
usb-legacy-support
i386/index
x86_64/index
+   sva
diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst
new file mode 100644
index 000000000000..1e52208c7dda
--- /dev/null
+++ b/Documentation/x86/sva.rst
@@ -0,0 +1,287 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================================
+Shared Virtual Addressing (SVA) with ENQCMD
+===========================================
+
+Background
+==========
+
+Shared Virtual Addressing (SVA) allows the processor and device to use the
+same virtual addresses avoiding the need for software to translate virtual
+addresses to physical addresses. SVA is what PCIe calls Shared Virtual
+Memory (SVM).
+
+In addition to the convenience of using application virtual addresses
+by the device, it also doesn't require pinning pages for DMA.
+PCIe Address Translation Services (ATS) along with Page Request Interface
+(PRI) allow devices to function much the same way as the CPU handling
+application page-faults. For more information please refer to PCIe
+specification Chapter 10: ATS Specification.
+
+Use of SVA requires IOMMU support in the platform. IOMMU also is required
+to support PCIe features ATS and PRI. ATS allows devices to cache
+translations for the virtual address. IOMMU driver uses the mmu_notifier()
+support to keep the device TLB cache and the CPU cache in sync. PRI allows
+the device to request that a virtual address be paged in before use if it
+is not present in the CPU page tables.
+
+
+Shared Hardware Workqueues
+==========================
+
+Unlike Single Root I/O Virtualization (SRIOV), Scalable IOV (SIOV) permits
+the use of Shared Work Queues (SWQ) by both applications and Virtual
+Machines (VMs). This allows better hardware utilization vs. hard
+partitioning resources that could result in underutilization. In order to
+allow the hardware to distinguish the context for which work is being
+executed in the hardware by SWQ interface, SIOV uses Process Address Space
+ID (PASID), which is a 20-bit number defined by the PCIe SIG.
+
+PASID value is encoded in all transactions from the device. This allows the
+IOMMU to track I/O on a per-PASID granularity in addition to using the PCIe
+Resource Identifier (RID) which is the Bus/Device/Function.
+
+
+ENQCMD
+======
+
+ENQCMD is a new instruction on Intel platforms that atomically submits a
+work descriptor to a device. The descriptor includes the operation to be
+performed, virtual addresses of all parameters, virtual address of a completion
+record, and the PASID (process address space ID) of the current process.
+
+ENQCMD works with non-posted semantics and carries a status back if the
+command was accepted by hardware. This allows the submitter to know if the
+submission needs to be retried or whether other device-specific mechanisms to
+implement fairness or ensure forward progress should be used.
+
+ENQCMD is the glue that ensures applications can directly submit commands
+to the hardware and also permit hardware to be aware of application context
+to perform I/O operations via use of PASID.
+
+Process Address Space Tagging
+=============================
+
+A new thread scoped MSR (IA32_PASID) provides the connection between
+user processes and the rest of the hardware. When an application first
+accesses an SVA capable device this MSR is initialized with a newly
+allocated PASID. The driver for the device calls an IOMMU specific API
+that sets up the routing for DMA and page-requests.
+
+For example, the Intel Data Streaming Accelerator (DSA) uses
+intel_svm_bind_mm(), which will do the following.
+
+- Allocate the PASID, and program the process page-table (cr3) in the PASID
+  context entries.
+- Register for mmu_notifier() to track any page-table invalidations to keep
+  the device tlb in sync. For example, when a page-table entry is invalidated,
+  IOMMU propagates the invalidation to device tlb. This will force any
+  future access by the device to this virtual address to participate in
+  ATS. If the IOMMU responds with proper response that a page is not
+  present, the device would request the 

[PATCH v2 05/12] x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions

2020-06-12 Thread Fenghua Yu
The work submission instruction comes in two flavors. ENQCMD can be called
both in ring 3 and ring 0 and always uses the contents of PASID MSR when
shipping the command to the device. ENQCMDS allows a kernel driver to
submit commands on behalf of a user process. The driver supplies the
PASID value in ENQCMDS. There isn't any usage of ENQCMD in the kernel
as of now.

The CPU feature flag is shown as "enqcmd" in /proc/cpuinfo.
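
The feature number added below is encoded as word * 32 + bit.  A small
sketch of decoding it; the helper names are invented for illustration, and
the note that word 16 corresponds to CPUID leaf 7 (subleaf 0) ECX is stated
from memory of cpufeatures.h rather than from this patch.

```c
/* Same encoding as the define added by this patch. */
#define TOY_X86_FEATURE_ENQCMD (16*32 + 29)

/* Index of the 32-bit capability word the feature lives in. */
unsigned int toy_feature_word(unsigned int feature)
{
    return feature / 32;
}

/* Bit position inside that word. */
unsigned int toy_feature_bit(unsigned int feature)
{
    return feature % 32;
}
```

So ENQCMD is bit 29 of word 16.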

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Re-write commit message (Thomas)

 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/cpu/cpuid-deps.c   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 02dabc9e77b0..4469618c410f 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -351,6 +351,7 @@
 #define X86_FEATURE_CLDEMOTE   (16*32+25) /* CLDEMOTE instruction */
 #define X86_FEATURE_MOVDIRI (16*32+27) /* MOVDIRI instruction */
 #define X86_FEATURE_MOVDIR64B  (16*32+28) /* MOVDIR64B instruction */
+#define X86_FEATURE_ENQCMD (16*32+29) /* ENQCMD and ENQCMDS instructions */
 
 /* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */
 #define X86_FEATURE_OVERFLOW_RECOV (17*32+ 0) /* MCA overflow recovery support */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 3cbe24ca80ab..3a02707c1f4d 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -69,6 +69,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_CQM_MBM_TOTAL,X86_FEATURE_CQM_LLC   },
{ X86_FEATURE_CQM_MBM_LOCAL,X86_FEATURE_CQM_LLC   },
{ X86_FEATURE_AVX512_BF16,  X86_FEATURE_AVX512VL  },
+   { X86_FEATURE_ENQCMD,   X86_FEATURE_XSAVES},
{}
 };
 
-- 
2.19.1



[PATCH v2 03/12] iommu/vt-d: Change flags type to unsigned int in binding mm

2020-06-12 Thread Fenghua Yu
"flags" passed to intel_svm_bind_mm() is a bit mask and should be
defined as "unsigned int" instead of "int".

Change its type to "unsigned int".

Suggested-by: Thomas Gleixner 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Add this new patch per Thomas' comment.

 drivers/iommu/intel/svm.c   | 7 ---
 include/linux/intel-iommu.h | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index b5618341b4b1..4e775e12ae52 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -427,7 +427,8 @@ int intel_svm_unbind_gpasid(struct device *dev, unsigned int pasid)
 
 /* Caller must hold pasid_mutex, mm reference */
 static int
-intel_svm_bind_mm(struct device *dev, int flags, struct svm_dev_ops *ops,
+intel_svm_bind_mm(struct device *dev, unsigned int flags,
+ struct svm_dev_ops *ops,
  struct mm_struct *mm, struct intel_svm_dev **sd)
 {
struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
@@ -954,7 +955,7 @@ intel_svm_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
 {
struct iommu_sva *sva = ERR_PTR(-EINVAL);
struct intel_svm_dev *sdev = NULL;
-   int flags = 0;
+   unsigned int flags = 0;
int ret;
 
/*
@@ -963,7 +964,7 @@ intel_svm_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
 * and intel_svm etc.
 */
if (drvdata)
-   flags = *(int *)drvdata;
+   flags = *(unsigned int *)drvdata;
mutex_lock(&pasid_mutex);
ret = intel_svm_bind_mm(dev, flags, NULL, mm, &sdev);
if (ret)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 44fa8879f829..9abc30cf10fc 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -759,7 +759,7 @@ struct intel_svm {
struct mm_struct *mm;
 
struct intel_iommu *iommu;
-   int flags;
+   unsigned int flags;
unsigned int pasid;
int gpasid; /* In case that guest PASID is different from host PASID */
struct list_head devs;
-- 
2.19.1
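
The rationale in the commit message above (a bit mask should be unsigned) can be illustrated with a small user-space sketch. The `DEMO_FLAG_*` names and both helpers are hypothetical and are not taken from the driver; the sketch also assumes a 32-bit `unsigned int`.

```c
#include <stdbool.h>

/* Hypothetical flag bits for illustration only; not the driver's flags. */
#define DEMO_FLAG_A   (1u << 0)
#define DEMO_FLAG_TOP (1u << 31)	/* "1 << 31" would overflow a signed int */

/* Test a bit in an unsigned mask: well defined for every bit position. */
static bool demo_flag_set(unsigned int flags, unsigned int bit)
{
	return (flags & bit) != 0;
}

/*
 * Shift the mask right. On unsigned values zeros are always shifted in,
 * whereas right-shifting a negative int is implementation-defined.
 */
static unsigned int demo_flags_high_half(unsigned int flags)
{
	return flags >> 16;
}
```

With a signed `int`, setting the top bit and shifting are the two places where the behaviour stops being portable, which is the usual argument for keeping masks unsigned.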



[PATCH v2 01/12] iommu: Change type of pasid to unsigned int

2020-06-12 Thread Fenghua Yu
PASID is defined as a few different types in iommu including "int",
"u32", and "unsigned int". To be consistent and to match with ioasid's
type, define PASID and its variations (e.g. max PASID) as "unsigned int".

No PASID type change in uapi.

Suggested-by: Thomas Gleixner 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Create this new patch to define PASID as "unsigned int" consistently in
  iommu (Thomas)

 drivers/gpu/drm/amd/amdkfd/kfd_iommu.c |  5 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h  |  2 +-
 drivers/iommu/amd/amd_iommu.h  | 13 
 drivers/iommu/amd/amd_iommu_types.h| 12 
 drivers/iommu/amd/init.c   |  4 +--
 drivers/iommu/amd/iommu.c  | 41 ++
 drivers/iommu/amd/iommu_v2.c   | 22 +++---
 drivers/iommu/intel/debugfs.c  |  2 +-
 drivers/iommu/intel/dmar.c | 13 
 drivers/iommu/intel/intel-pasid.h  | 21 ++---
 drivers/iommu/intel/iommu.c|  4 +--
 drivers/iommu/intel/pasid.c| 36 +++---
 drivers/iommu/intel/svm.c  | 12 
 drivers/iommu/iommu.c  |  2 +-
 drivers/misc/uacce/uacce.c |  2 +-
 include/linux/amd-iommu.h  |  9 +++---
 include/linux/intel-iommu.h| 18 +--
 include/linux/intel-svm.h  |  2 +-
 include/linux/iommu.h  |  8 ++---
 include/linux/uacce.h  |  2 +-
 20 files changed, 121 insertions(+), 109 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
index 7c8786b9eb0a..703d23deca76 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
@@ -139,7 +139,8 @@ void kfd_iommu_unbind_process(struct kfd_process *p)
 }
 
 /* Callback for process shutdown invoked by the IOMMU driver */
-static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int pasid)
+static void iommu_pasid_shutdown_callback(struct pci_dev *pdev,
+ unsigned int pasid)
 {
struct kfd_dev *dev = kfd_device_by_pci_dev(pdev);
struct kfd_process *p;
@@ -185,7 +186,7 @@ static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int pasid)
 }
 
 /* This function called by IOMMU driver on PPR failure */
-static int iommu_invalid_ppr_cb(struct pci_dev *pdev, int pasid,
+static int iommu_invalid_ppr_cb(struct pci_dev *pdev, unsigned int pasid,
unsigned long address, u16 flags)
 {
struct kfd_dev *dev;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index f0587d94294d..3c7d1f774afe 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -714,7 +714,7 @@ struct kfd_process {
/* We want to receive a notification when the mm_struct is destroyed */
struct mmu_notifier mmu_notifier;
 
-   uint16_t pasid;
+   unsigned int pasid;
unsigned int doorbell_index;
 
/*
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index f892992c8744..0914b5b6f879 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -45,12 +45,13 @@ extern int amd_iommu_register_ppr_notifier(struct notifier_block *nb);
 extern int amd_iommu_unregister_ppr_notifier(struct notifier_block *nb);
 extern void amd_iommu_domain_direct_map(struct iommu_domain *dom);
 extern int amd_iommu_domain_enable_v2(struct iommu_domain *dom, int pasids);
-extern int amd_iommu_flush_page(struct iommu_domain *dom, int pasid,
+extern int amd_iommu_flush_page(struct iommu_domain *dom, unsigned int pasid,
u64 address);
-extern int amd_iommu_flush_tlb(struct iommu_domain *dom, int pasid);
-extern int amd_iommu_domain_set_gcr3(struct iommu_domain *dom, int pasid,
-unsigned long cr3);
-extern int amd_iommu_domain_clear_gcr3(struct iommu_domain *dom, int pasid);
+extern int amd_iommu_flush_tlb(struct iommu_domain *dom, unsigned int pasid);
+extern int amd_iommu_domain_set_gcr3(struct iommu_domain *dom,
+unsigned int pasid, unsigned long cr3);
+extern int amd_iommu_domain_clear_gcr3(struct iommu_domain *dom,
+  unsigned int pasid);
 extern struct iommu_domain *amd_iommu_get_v2_domain(struct pci_dev *pdev);
 
 #ifdef CONFIG_IRQ_REMAP
@@ -66,7 +67,7 @@ static inline int amd_iommu_create_irq_domain(struct amd_iommu *iommu)
 #define PPR_INVALID		0x1
 #define PPR_FAILURE		0xf
 
-extern int amd_iommu_complete_ppr(struct pci_dev *pdev, int pasid,
+extern int amd_iommu_complete_ppr(struct pci_dev *pdev, unsigned int pasid,
  int status, int tag);
 
 static inline bool is_rd890_iommu(struct pci_dev *pdev)
diff --git a/drivers/iommu/amd/amd_iommu_types.h 

[PATCH v2 02/12] ocxl: Change type of pasid to unsigned int

2020-06-12 Thread Fenghua Yu
PASID is defined as "int" although it is a 20-bit value and should never
be negative. To be consistent with the type defined in iommu, define PASID
as "unsigned int".
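
As a minimal user-space sketch of the point made above, the 20-bit width means an unsigned PASID needs only one range check. The `DEMO_*` names and the helper are hypothetical and are not part of ocxl or the IOMMU code.

```c
#include <stdbool.h>

/* A PASID is a 20-bit value, as stated in the commit message. */
#define DEMO_PASID_BITS 20u
#define DEMO_PASID_MAX  ((1u << DEMO_PASID_BITS) - 1u)	/* 0xFFFFF */

/*
 * With an unsigned type a single upper-bound check is enough;
 * a signed int would additionally need a "pasid < 0" test.
 */
static bool demo_pasid_fits(unsigned int pasid)
{
	return pasid <= DEMO_PASID_MAX;
}
```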

Suggested-by: Thomas Gleixner 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Create this new patch to define PASID as "unsigned int" consistently in
  ocxl (Thomas)

 drivers/misc/ocxl/config.c|  3 ++-
 drivers/misc/ocxl/link.c  |  6 +++---
 drivers/misc/ocxl/ocxl_internal.h |  6 +++---
 drivers/misc/ocxl/pasid.c |  2 +-
 drivers/misc/ocxl/trace.h | 20 ++--
 include/misc/ocxl.h   |  6 +++---
 6 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index c8e19bfb5ef9..22d034caed3d 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -806,7 +806,8 @@ int ocxl_config_set_TL(struct pci_dev *dev, int tl_dvsec)
 }
 EXPORT_SYMBOL_GPL(ocxl_config_set_TL);
 
-int ocxl_config_terminate_pasid(struct pci_dev *dev, int afu_control, int pasid)
+int ocxl_config_terminate_pasid(struct pci_dev *dev, int afu_control,
+   unsigned int pasid)
 {
u32 val;
unsigned long timeout;
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index 58d111afd9f6..931f6ae022db 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -492,7 +492,7 @@ static u64 calculate_cfg_state(bool kernel)
return state;
 }
 
-int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
+int ocxl_link_add_pe(void *link_handle, unsigned int pasid, u32 pidr, u32 tidr,
u64 amr, struct mm_struct *mm,
void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
void *xsl_err_data)
@@ -572,7 +572,7 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
 }
 EXPORT_SYMBOL_GPL(ocxl_link_add_pe);
 
-int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid)
+int ocxl_link_update_pe(void *link_handle, unsigned int pasid, __u16 tid)
 {
struct ocxl_link *link = (struct ocxl_link *) link_handle;
struct spa *spa = link->spa;
@@ -608,7 +608,7 @@ int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid)
return rc;
 }
 
-int ocxl_link_remove_pe(void *link_handle, int pasid)
+int ocxl_link_remove_pe(void *link_handle, unsigned int pasid)
 {
struct ocxl_link *link = (struct ocxl_link *) link_handle;
struct spa *spa = link->spa;
diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
index 345bf843a38e..3ca982ba7472 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -41,7 +41,7 @@ struct ocxl_afu {
struct ocxl_afu_config config;
int pasid_base;
int pasid_count; /* opened contexts */
-   int pasid_max; /* maximum number of contexts */
+   unsigned int pasid_max; /* maximum number of contexts */
int actag_base;
int actag_enabled;
struct mutex contexts_lock;
@@ -69,7 +69,7 @@ struct ocxl_xsl_error {
 
 struct ocxl_context {
struct ocxl_afu *afu;
-   int pasid;
+   unsigned int pasid;
struct mutex status_mutex;
enum ocxl_context_status status;
struct address_space *mapping;
@@ -128,7 +128,7 @@ int ocxl_config_check_afu_index(struct pci_dev *dev,
  * pasid: the PASID for the AFU context
  * tid: the new thread id for the process element
  */
-int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid);
+int ocxl_link_update_pe(void *link_handle, unsigned int pasid, __u16 tid);
 
 int ocxl_context_mmap(struct ocxl_context *ctx,
struct vm_area_struct *vma);
diff --git a/drivers/misc/ocxl/pasid.c b/drivers/misc/ocxl/pasid.c
index d14cb56e6920..a151fc8f0bec 100644
--- a/drivers/misc/ocxl/pasid.c
+++ b/drivers/misc/ocxl/pasid.c
@@ -80,7 +80,7 @@ static void range_free(struct list_head *head, u32 start, u32 size,
 
 int ocxl_pasid_afu_alloc(struct ocxl_fn *fn, u32 size)
 {
-   int max_pasid;
+   unsigned int max_pasid;
 
if (fn->config.max_pasid_log < 0)
return -ENOSPC;
diff --git a/drivers/misc/ocxl/trace.h b/drivers/misc/ocxl/trace.h
index 17e21cb2addd..019e2fc63b1d 100644
--- a/drivers/misc/ocxl/trace.h
+++ b/drivers/misc/ocxl/trace.h
@@ -9,13 +9,13 @@
 #include 
 
 DECLARE_EVENT_CLASS(ocxl_context,
-   TP_PROTO(pid_t pid, void *spa, int pasid, u32 pidr, u32 tidr),
+   TP_PROTO(pid_t pid, void *spa, unsigned int pasid, u32 pidr, u32 tidr),
TP_ARGS(pid, spa, pasid, pidr, tidr),
 
TP_STRUCT__entry(
__field(pid_t, pid)
__field(void*, spa)
-   __field(int, pasid)
+   __field(unsigned int, pasid)
__field(u32, pidr)
__field(u32, tidr)
),
@@ -38,21 +38,21 @@ DECLARE_EVENT_CLASS(ocxl_context,
 );
 
 DEFINE_EVENT(ocxl_context, 

[PATCH v2 00/12] x86: tag application address space for devices

2020-06-12 Thread Fenghua Yu
Typical hardware devices require a driver stack to translate application
buffers to hardware addresses, and a kernel-user transition to notify the
hardware of new work. What if both the translation and transition overhead
could be eliminated? This is what Shared Virtual Address (SVA) and ENQCMD
enabled hardware like Data Streaming Accelerator (DSA) aims to achieve.
Applications map portals in their local-address-space and directly submit
work to them using a new instruction.

This series enables ENQCMD and associated management of the new MSR
(MSR_IA32_PASID). This new MSR allows an application address space to be
associated with what the PCIe spec calls a Process Address Space ID (PASID).
This PASID tag is carried along with all requests between applications and
devices and allows devices to interact with the process address space.

SVA and ENQCMD enabled device drivers need this series. The phase 2 DSA
patches with SVA and ENQCMD support were released on top of this series:
https://lore.kernel.org/patchwork/cover/1244060/

This series only provides simple and basic support for ENQCMD and the MSR:
1. Clean up type definitions (patches 1-3). These patches can be in a
   separate series.
   - Define "pasid" as "unsigned int" consistently (patches 1 and 2).
   - Define "flags" as "unsigned int" (patch 3).
2. Explain various technical terms used in the series (patch 4).
3. Enumerate support for ENQCMD in the processor (patch 5).
4. Handle FPU PASID state and the MSR during context switch (patches 6-7).
5. Define "pasid" in mm_struct (patch 8).
6. Clear PASID state for a new mm and for forked and cloned threads
   (patches 9-10).
7. Allocate and free PASID for a process (patch 11).
8. Fix up the PASID MSR in the #GP handler when one thread in a process
   executes ENQCMD for the first time (patch 12).

This patch series and the DSA phase 2 series are in
https://github.com/intel/idxd-driver/tree/idxd-stage2

References:
1. Detailed information on the ENQCMD/ENQCMDS instructions and the
IA32_PASID MSR can be found in Intel Architecture Instruction Set
Extensions and Future Features Programming Reference:
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf

2. Detailed information on DSA can be found in DSA specification:
https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification

Change log:
v2:
- Add patches 1-3 to define "pasid" and "flags" as "unsigned int"
  consistently (Thomas)
  (these 3 patches could be in a separate patch set)
- Add patch 8 to move "pasid" to generic mm_struct (Christoph).
  Jean-Philippe Brucker released a virtually identical patch. Upstream only
  needs one of the two.
- Add patch 9 to initialize PASID in a new mm.
- Plus other changes described in each patch (Thomas)

Ashok Raj (1):
  docs: x86: Add documentation for SVA (Shared Virtual Addressing)

Fenghua Yu (10):
  iommu: Change type of pasid to unsigned int
  ocxl: Change type of pasid to unsigned int
  iommu/vt-d: Change flags type to unsigned int in binding mm
  x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions
  x86/msr-index: Define IA32_PASID MSR
  mm: Define pasid in mm
  fork: Clear PASID for new mm
  x86/process: Clear PASID state for a newly forked/cloned thread
  x86/mmu: Allocate/free PASID
  x86/traps: Fix up invalid PASID

Yu-cheng Yu (1):
  x86/fpu/xstate: Add supervisor PASID state for ENQCMD feature

 Documentation/x86/index.rst|   1 +
 Documentation/x86/sva.rst  | 287 +
 arch/x86/include/asm/cpufeatures.h |   1 +
 arch/x86/include/asm/fpu/types.h   |  10 +
 arch/x86/include/asm/fpu/xstate.h  |   2 +-
 arch/x86/include/asm/iommu.h   |   3 +
 arch/x86/include/asm/mmu_context.h |  14 ++
 arch/x86/include/asm/msr-index.h   |   3 +
 arch/x86/kernel/cpu/cpuid-deps.c   |   1 +
 arch/x86/kernel/fpu/xstate.c   |   4 +
 arch/x86/kernel/process.c  |  18 ++
 arch/x86/kernel/traps.c|  23 ++
 drivers/gpu/drm/amd/amdkfd/kfd_iommu.c |   5 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h  |   2 +-
 drivers/iommu/amd/amd_iommu.h  |  13 +-
 drivers/iommu/amd/amd_iommu_types.h|  12 +-
 drivers/iommu/amd/init.c   |   4 +-
 drivers/iommu/amd/iommu.c  |  41 ++--
 drivers/iommu/amd/iommu_v2.c   |  22 +-
 drivers/iommu/intel/debugfs.c  |   2 +-
 drivers/iommu/intel/dmar.c |  13 +-
 drivers/iommu/intel/intel-pasid.h  |  21 +-
 drivers/iommu/intel/iommu.c|   4 +-
 drivers/iommu/intel/pasid.c|  36 ++--
 drivers/iommu/intel/svm.c  | 159 --
 drivers/iommu/iommu.c  |   2 +-
 drivers/misc/ocxl/config.c |   3 +-
 drivers/misc/ocxl/link.c   |   6 +-
 drivers/misc/ocxl/ocxl_internal.h  |   6 +-
 drivers/misc/ocxl/pasid.c  |   2 +-
 

Re: [PATCH kernel] KVM: PPC: Fix nested guest RC bits update

2020-06-12 Thread Michael Ellerman
On Thu, 11 Jun 2020 13:05:59 +1000, Alexey Kardashevskiy wrote:
> Before commit 6cdf30375f82 ("powerpc/kvm/book3s: Use kvm helpers
> to walk shadow or secondary table") we called __find_linux_pte() with
> a page table pointer from a kvm_nested_guest struct but
> now we rely on kvmhv_find_nested() which takes an L1 LPID and returns
> a kvm_nested_guest pointer, however we pass a L0 LPID there and
> the L2 guest hangs.
> 
> [...]

Applied to powerpc/fixes.

[1/1] KVM: PPC: Fix nested guest RC bits update
  https://git.kernel.org/powerpc/c/e881bfaf5a5f409390973e076333281465f2b0d9

cheers


Re: [PATCH net] ibmvnic: Flush existing work items before device removal

2020-06-12 Thread David Miller
From: Thomas Falcon 
Date: Fri, 12 Jun 2020 13:34:41 -0500

> Ensure that all scheduled work items have completed before continuing
> with device removal and after further event scheduling has been
> halted. This patch fixes a bug where a scheduled driver reset event
> is processed following device removal.
> 
> Signed-off-by: Thomas Falcon 

Applied, thank you.


Re: [PATCH net] ibmvnic: Harden device login requests

2020-06-12 Thread David Miller
From: Thomas Falcon 
Date: Fri, 12 Jun 2020 13:31:39 -0500

> @@ -841,13 +841,14 @@ static int ibmvnic_login(struct net_device *netdev)
>  {
>   struct ibmvnic_adapter *adapter = netdev_priv(netdev);
>   unsigned long timeout = msecs_to_jiffies(3);
> + int retries = 10;
>   int retry_count = 0;
>   bool retry;
>   int rc;

Reverse christmas tree, please.
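
For readers unfamiliar with the term: "reverse christmas tree" is the networking convention of ordering local variable declarations from the longest line to the shortest. The sketch below shows how the declarations from the quoted hunk might be reordered. It is only an illustration, not the author's follow-up patch; `struct demo_adapter`, `demo_login()` and all initial values are hypothetical placeholders so that it compiles in user space.

```c
#include <stdbool.h>

/* Hypothetical stand-in type so the sketch compiles outside the kernel. */
struct demo_adapter {
	int unused;
};

static int demo_login(struct demo_adapter *priv)
{
	/* Longest declaration first, shortest last; values are placeholders. */
	struct demo_adapter *adapter = priv;
	unsigned long timeout = 3;
	int retry_count = 0;
	bool retry = false;
	int retries = 10;
	int rc = 0;

	/* Silence unused-variable warnings; the body is irrelevant here. */
	(void)adapter;
	(void)timeout;
	(void)retry_count;
	(void)retry;
	(void)retries;
	return rc;
}
```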


Re: [PATCH] ASoC: fsl_ssi: Fix bclk calculation for mono channel

2020-06-12 Thread Nicolin Chen
On Tue, Jun 09, 2020 at 04:19:28PM +0800, Shengjiu Wang wrote:
> For mono channel, ssi will switch to normal mode. In normal
> mode, the Word Length Control bits control the word length
> divider in clock generator, which is different with I2S master
> mode, the word length is fixed to 32bit.
> 
> So we refine the formula for mono channel, otherwise there
> will be sound issue for S24_LE.
> 
> Fixes: b0a7043d5c2c ("ASoC: fsl_ssi: Caculate bit clock rate using slot number and width")
> Signed-off-by: Shengjiu Wang 
> ---
>  sound/soc/fsl/fsl_ssi.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
> index bad89b0d129e..e347776590f7 100644
> --- a/sound/soc/fsl/fsl_ssi.c
> +++ b/sound/soc/fsl/fsl_ssi.c
> @@ -695,6 +695,11 @@ static int fsl_ssi_set_bclk(struct snd_pcm_substream *substream,
>   /* Generate bit clock based on the slot number and slot width */
>   freq = slots * slot_width * params_rate(hw_params);
>  
> + /* The slot_width is not fixed to 32 for normal mode */
> + if (params_channels(hw_params) == 1)

This function has a local variable that you can reuse here:
unsigned int slots = params_channels(hw_params);

> + freq = (slots <= 1 ? 2 : slots) * params_width(hw_params) *
> +params_rate(hw_params);

We have a small section of slots and slot_width calculation
at the top of this function where we can squash these in.
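
One possible reading of this suggestion is to fold the mono case into the slots and slot_width setup so that a single formula computes the rate. The user-space sketch below models that idea. `demo_bclk_freq()` is a hypothetical helper, not driver code; it assumes slots defaults to the channel count and slot_width to 32, and it ignores any slot configuration the driver may apply on top.

```c
/* Hypothetical user-space model of the bit clock formula discussed above. */
static unsigned long demo_bclk_freq(unsigned int channels,
				    unsigned int sample_width,
				    unsigned int rate)
{
	unsigned int slots = channels;
	unsigned int slot_width = 32;	/* assumed fixed width outside mono */

	/*
	 * Mono: the word length follows the sample width and at least
	 * two slots are counted, as in the quoted hunk.
	 */
	if (channels == 1) {
		slots = 2;
		slot_width = sample_width;
	}

	return (unsigned long)slots * slot_width * rate;
}
```

For example, mono S24_LE at 48 kHz gives 2 * 24 * 48000 = 2304000 Hz under these assumptions, while stereo keeps 2 * 32 * 48000 = 3072000 Hz.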


[PATCH net] ibmvnic: Flush existing work items before device removal

2020-06-12 Thread Thomas Falcon
Ensure that all scheduled work items have completed before continuing
with device removal and after further event scheduling has been
halted. This patch fixes a bug where a scheduled driver reset event
is processed following device removal.

Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 1cb2b7f3b2cb..a66fa75976d3 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -5197,6 +5197,9 @@ static int ibmvnic_remove(struct vio_dev *dev)
adapter->state = VNIC_REMOVING;
spin_unlock_irqrestore(&adapter->state_lock, flags);
 
+   flush_work(&adapter->ibmvnic_reset);
+   flush_delayed_work(&adapter->ibmvnic_delayed_reset);
+
rtnl_lock();
unregister_netdevice(netdev);
 
-- 
2.18.1



[PATCH net] ibmvnic: Harden device login requests

2020-06-12 Thread Thomas Falcon
The VNIC driver's "login" command sequence is the final step
in the driver's initialization process with device firmware,
confirming the available device queue resources to be utilized
by the driver. Under high system load, firmware may not respond
to the request in a timely manner or may abort the request. In
such cases, the driver should reattempt the login command
sequence. In case of a device error, the number of retries
is bounded.

Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 197dc5b2c090..1cb2b7f3b2cb 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -841,13 +841,14 @@ static int ibmvnic_login(struct net_device *netdev)
 {
struct ibmvnic_adapter *adapter = netdev_priv(netdev);
unsigned long timeout = msecs_to_jiffies(3);
+   int retries = 10;
int retry_count = 0;
bool retry;
int rc;
 
do {
retry = false;
-   if (retry_count > IBMVNIC_MAX_QUEUES) {
+   if (retry_count > retries) {
netdev_warn(netdev, "Login attempts exceeded\n");
return -1;
}
@@ -862,11 +863,23 @@ static int ibmvnic_login(struct net_device *netdev)
 
if (!wait_for_completion_timeout(&adapter->init_done,
 timeout)) {
-   netdev_warn(netdev, "Login timed out\n");
-   return -1;
+   netdev_warn(netdev, "Login timed out, retrying...\n");
+   retry = true;
+   adapter->init_done_rc = 0;
+   retry_count++;
+   continue;
}
 
-   if (adapter->init_done_rc == PARTIALSUCCESS) {
+   if (adapter->init_done_rc == ABORTED) {
+   netdev_warn(netdev, "Login aborted, retrying...\n");
+   retry = true;
+   adapter->init_done_rc = 0;
+   retry_count++;
+   /* FW or device may be busy, so
+* wait a bit before retrying login
+*/
+   msleep(500);
+   } else if (adapter->init_done_rc == PARTIALSUCCESS) {
retry_count++;
release_sub_crqs(adapter, 1);
 
-- 
2.18.1
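
The bounded-retry behaviour described in the commit message above can be sketched in plain user-space C. Everything prefixed `demo_` is hypothetical: the result codes stand in for firmware responses, and the callback replaces the real login exchange. The real driver additionally sleeps before retrying an aborted login, which this sketch omits.

```c
/* Hypothetical result codes standing in for firmware responses. */
enum demo_rc { DEMO_OK, DEMO_TIMEOUT, DEMO_ABORTED };

/* Hypothetical login attempt supplied by the caller. */
typedef enum demo_rc (*demo_login_fn)(void *ctx);

/*
 * Retry on timeout or abort, but give up once the retry budget is spent.
 * Returns 0 on success and -1 when max_retries has been exceeded.
 */
static int demo_login_with_retries(demo_login_fn attempt, void *ctx,
				   int max_retries)
{
	int retry_count = 0;

	for (;;) {
		enum demo_rc rc;

		if (retry_count > max_retries)
			return -1;

		rc = attempt(ctx);
		if (rc == DEMO_OK)
			return 0;

		/* DEMO_TIMEOUT or DEMO_ABORTED: count it and try again. */
		retry_count++;
	}
}

/* Example attempt: fails *ctx times, then succeeds. */
static enum demo_rc demo_flaky_attempt(void *ctx)
{
	int *failures_left = ctx;

	if (*failures_left > 0) {
		(*failures_left)--;
		return DEMO_ABORTED;
	}
	return DEMO_OK;
}
```

The upper bound matters because a device that never answers would otherwise keep the caller looping forever.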



[PATCH v2] tty: serial: cpm_uart: Fix behaviour for non existing GPIOs

2020-06-12 Thread Christophe Leroy
devm_gpiod_get_index() doesn't return NULL but -ENOENT when the
requested GPIO doesn't exist,  leading to the following messages:

[2.742468] gpiod_direction_input: invalid GPIO (errorpointer)
[2.748147] can't set direction for gpio #2: -2
[2.753081] gpiod_direction_input: invalid GPIO (errorpointer)
[2.758724] can't set direction for gpio #3: -2
[2.763666] gpiod_direction_output: invalid GPIO (errorpointer)
[2.769394] can't set direction for gpio #4: -2
[2.774341] gpiod_direction_input: invalid GPIO (errorpointer)
[2.779981] can't set direction for gpio #5: -2
[2.784545] ff000a20.serial: ttyCPM1 at MMIO 0xfff00a20 (irq = 39, base_baud = 825) is a CPM UART

Use devm_gpiod_get_index_optional() instead.

At the same time, handle the error case and properly exit
with an error.

Fixes: 97cbaf2c829b ("tty: serial: cpm_uart: Convert to use GPIO descriptors")
Cc: sta...@vger.kernel.org
Cc: Linus Walleij 
Signed-off-by: Christophe Leroy 
---
v2: Using devm_gpiod_get_index_optional() and exiting if error
---
 drivers/tty/serial/cpm_uart/cpm_uart_core.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/serial/cpm_uart/cpm_uart_core.c b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
index a04f74d2e854..4df47d02b34b 100644
--- a/drivers/tty/serial/cpm_uart/cpm_uart_core.c
+++ b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
@@ -1215,7 +1215,12 @@ static int cpm_uart_init_port(struct device_node *np,
 
pinfo->gpios[i] = NULL;
 
-   gpiod = devm_gpiod_get_index(dev, NULL, i, GPIOD_ASIS);
+   gpiod = devm_gpiod_get_index_optional(dev, NULL, i, GPIOD_ASIS);
+
+   if (IS_ERR(gpiod)) {
+   ret = PTR_ERR(gpiod);
+   goto out_irq;
+   }
 
if (gpiod) {
if (i == GPIO_RTS || i == GPIO_DTR)
@@ -1237,6 +1242,8 @@ static int cpm_uart_init_port(struct device_node *np,
 
return cpm_uart_request_port(&pinfo->port);
 
+out_irq:
+   irq_dispose_mapping(pinfo->port.irq);
 out_pram:
cpm_uart_unmap_pram(pinfo, pram);
 out_mem:
-- 
2.25.0
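
The difference between the two lookup variants described in the commit message above (an error pointer for a missing GPIO versus NULL from the optional variant) can be modelled in user space. The helpers below only imitate the kernel's `ERR_PTR()`, `PTR_ERR()` and `IS_ERR()`; all `demo_` names, the storage array and the index convention are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* User-space imitations of the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(). */
static void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static long demo_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static int demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(-DEMO_MAX_ERRNO);
}

static int demo_gpio_storage[2];

/* Hypothetical lookup: index 0..1 exist, index 9 fails, others are absent. */
static void *demo_get(int index)
{
	if (index == 9)
		return demo_err_ptr(-EIO);
	if (index < 0 || index > 1)
		return demo_err_ptr(-ENOENT);	/* absent: error pointer, not NULL */
	return &demo_gpio_storage[index];
}

/* "Optional" variant: absence becomes NULL, real errors pass through. */
static void *demo_get_optional(int index)
{
	void *p = demo_get(index);

	if (demo_is_err(p) && demo_ptr_err(p) == -ENOENT)
		return NULL;
	return p;
}
```

A caller of the optional variant therefore needs two checks, as in the patch: an error check first, then a NULL check before using the descriptor.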



[PATCH V2] powerpc/pseries/svm: Drop unused align argument in alloc_shared_lppaca() function

2020-06-12 Thread Satheesh Rajendran
Argument "align" in alloc_shared_lppaca() was unused inside the
function. Let's drop it and update code comment for page alignment.

Cc: linux-ker...@vger.kernel.org
Cc: Thiago Jung Bauermann 
Cc: Ram Pai 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Cc: Michael Ellerman 
Reviewed-by: Thiago Jung Bauermann 
Signed-off-by: Satheesh Rajendran 
---

V2:
Added Reviewed-by from Thiago.
Dropped the align argument as per Michael's suggestion.
Modified the commit message.

V1: 
http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20200609113909.17236-1-sathn...@linux.vnet.ibm.com/
---
 arch/powerpc/kernel/paca.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 8d96169c597e..a174d64d9b4d 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -57,8 +57,8 @@ static void *__init alloc_paca_data(unsigned long size, unsigned long align,
 
 #define LPPACA_SIZE 0x400
 
-static void *__init alloc_shared_lppaca(unsigned long size, unsigned long align,
-   unsigned long limit, int cpu)
+static void *__init alloc_shared_lppaca(unsigned long size, unsigned long limit,
+   int cpu)
 {
size_t shared_lppaca_total_size = PAGE_ALIGN(nr_cpu_ids * LPPACA_SIZE);
static unsigned long shared_lppaca_size;
@@ -68,6 +68,12 @@ static void *__init alloc_shared_lppaca(unsigned long size, unsigned long align,
if (!shared_lppaca) {
memblock_set_bottom_up(true);
 
+   /* See Documentation/powerpc/ultravisor.rst for more details
+*
+* UV/HV data share is in PAGE granularity, In order to
+* minimize the number of pages shared and maximize the
+* use of a page, let's use page align.
+*/
shared_lppaca =
memblock_alloc_try_nid(shared_lppaca_total_size,
   PAGE_SIZE, MEMBLOCK_LOW_LIMIT,
@@ -122,7 +128,7 @@ static struct lppaca * __init new_lppaca(int cpu, unsigned long limit)
return NULL;
 
if (is_secure_guest())
-   lp = alloc_shared_lppaca(LPPACA_SIZE, 0x400, limit, cpu);
+   lp = alloc_shared_lppaca(LPPACA_SIZE, limit, cpu);
else
lp = alloc_paca_data(LPPACA_SIZE, 0x400, limit, cpu);
 
-- 
2.26.2
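
The page-granularity argument in the new code comment above can be made concrete with a small worked example. The sketch below mirrors the `PAGE_ALIGN(nr_cpu_ids * LPPACA_SIZE)` computation in user space. The 64 KiB page size is an assumption chosen for illustration, and the `demo_` names are hypothetical.

```c
#define DEMO_LPPACA_SIZE 0x400UL	/* from the patch: LPPACA_SIZE */
#define DEMO_PAGE_SIZE   0x10000UL	/* assumption: 64 KiB pages */

/* Round size up to the next multiple of the (power-of-two) page size. */
static unsigned long demo_page_align(unsigned long size)
{
	return (size + DEMO_PAGE_SIZE - 1) & ~(DEMO_PAGE_SIZE - 1);
}

/* Total bytes shared for nr_cpus lppacas, rounded up to whole pages. */
static unsigned long demo_shared_lppaca_total(unsigned long nr_cpus)
{
	return demo_page_align(nr_cpus * DEMO_LPPACA_SIZE);
}
```

Under these assumptions up to 64 lppacas fit in a single shared page, and the 65th CPU requires a second page.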



Re: [RFC PATCH v3 0/4] Reuse the dma channel if available in Back-End

2020-06-12 Thread Mark Brown
On Fri, 12 Jun 2020 15:37:47 +0800, Shengjiu Wang wrote:
> Reuse the dma channel if available in Back-End
> 
> Shengjiu Wang (4):
>   ASoC: soc-card: export snd_soc_lookup_component_nolocked
>   ASoC: dmaengine_pcm: export soc_component_to_pcm
>   ASoC: fsl_asrc_dma: Reuse the dma channel if available in Back-End
>   ASoC: fsl_asrc_dma: Fix data copying speed issue with EDMA
> 
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/4] ASoC: soc-card: export snd_soc_lookup_component_nolocked
  commit: 6fbea6b6a838f9aa941fe53a3637fd8d8aab1eba
[2/4] ASoC: dmaengine_pcm: export soc_component_to_pcm
  commit: a9a21e1eafc94b79502cab8272b392f7f63ef7bb
[3/4] ASoC: fsl_asrc_dma: Reuse the dma channel if available in Back-End
  commit: 706e2c8811585f42612b6cff218ab3adbe63a4ee
[4/4] ASoC: fsl_asrc_dma: Fix data copying speed issue with EDMA
  commit: b287a6d9723c601dd947f1c27d4cc0192e384a5a

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark


Re: [PATCH v3 20/41] powerpc/book3s64/kuap/kuep: Make KUAP and KUEP a subfeature of PPC_MEM_KEYS

2020-06-12 Thread kernel test robot
Hi Aneesh,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on next-20200611]
[cannot apply to v5.7]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Aneesh-Kumar-K-V/Kernel-userspace-access-execution-prevention-with-hash-translation/20200610-191943
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-randconfig-r003-20200612 (attached as .config)
compiler: powerpc64le-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>, old ones prefixed by <<):

arch/powerpc/mm/book3s64/pkeys.c: In function 'setup_kuep':
>> arch/powerpc/mm/book3s64/pkeys.c:207:28: error: 'boot_cpuid' undeclared (first use in this function)
207 |  if (smp_processor_id() == boot_cpuid) {
|^~
arch/powerpc/mm/book3s64/pkeys.c:207:28: note: each undeclared identifier is reported only once for each function it appears in
arch/powerpc/mm/book3s64/pkeys.c: In function 'setup_kuap':
arch/powerpc/mm/book3s64/pkeys.c:228:28: error: 'boot_cpuid' undeclared (first use in this function)
228 |  if (smp_processor_id() == boot_cpuid) {
|^~

vim +/boot_cpuid +207 arch/powerpc/mm/book3s64/pkeys.c

92e3da3cf193fd arch/powerpc/mm/pkeys.c          Ram Pai          2018-01-18  200  
82b6d6aaa3e29c arch/powerpc/mm/book3s64/pkeys.c Aneesh Kumar K.V 2020-06-10  201  #ifdef CONFIG_PPC_KUEP
82b6d6aaa3e29c arch/powerpc/mm/book3s64/pkeys.c Aneesh Kumar K.V 2020-06-10  202  void __init setup_kuep(bool disabled)
82b6d6aaa3e29c arch/powerpc/mm/book3s64/pkeys.c Aneesh Kumar K.V 2020-06-10  203  {
82b6d6aaa3e29c arch/powerpc/mm/book3s64/pkeys.c Aneesh Kumar K.V 2020-06-10  204  	if (disabled || !early_radix_enabled())
82b6d6aaa3e29c arch/powerpc/mm/book3s64/pkeys.c Aneesh Kumar K.V 2020-06-10  205  		return;
82b6d6aaa3e29c arch/powerpc/mm/book3s64/pkeys.c Aneesh Kumar K.V 2020-06-10  206  
82b6d6aaa3e29c arch/powerpc/mm/book3s64/pkeys.c Aneesh Kumar K.V 2020-06-10 @207  	if (smp_processor_id() == boot_cpuid) {
82b6d6aaa3e29c arch/powerpc/mm/book3s64/pkeys.c Aneesh Kumar K.V 2020-06-10  208  		pr_info("Activating Kernel Userspace Execution Prevention\n");
82b6d6aaa3e29c arch/powerpc/mm/book3s64/pkeys.c Aneesh Kumar K.V 2020-06-10  209  		cur_cpu_spec->mmu_features |= MMU_FTR_KUEP;
82b6d6aaa3e29c arch/powerpc/mm/book3s64/pkeys.c Aneesh Kumar K.V 2020-06-10  210  	}
82b6d6aaa3e29c arch/powerpc/mm/book3s64/pkeys.c Aneesh Kumar K.V 2020-06-10  211  
82b6d6aaa3e29c arch/powerpc/mm/book3s64/pkeys.c Aneesh Kumar K.V 2020-06-10  212  	/*
82b6d6aaa3e29c arch/powerpc/mm/book3s64/pkeys.c Aneesh Kumar K.V 2020-06-10  213  	 * Radix always uses key0 of the IAMR to determine if an access is
82b6d6aaa3e29c arch/powerpc/mm/book3s64/pkeys.c Aneesh Kumar K.V 2020-06-10  214  	 * allowed. We set bit 0 (IBM bit 1) of key0, to prevent instruction
82b6d6aaa3e29c arch/powerpc/mm/book3s64/pkeys.c Aneesh Kumar K.V 2020-06-10  215  	 * fetch.
82b6d6aaa3e29c arch/powerpc/mm/book3s64/pkeys.c Aneesh Kumar K.V 2020-06-10  216  	 */
82b6d6aaa3e29c arch/powerpc/mm/book3s64/pkeys.c Aneesh Kumar K.V 2020-06-10  217  	mtspr(SPRN_IAMR, AMR_KUEP_BLOCKED);
82b6d6aaa3e29c arch/powerpc/mm/book3s64/pkeys.c Aneesh Kumar K.V 2020-06-10  218  	isync();
82b6d6aaa3e29c arch/powerpc/mm/book3s64/pkeys.c Aneesh Kumar K.V 2020-06-10  219  }
82b6d6aaa3e29c arch/powerpc/mm/book3s64/pkeys.c Aneesh Kumar K.V 2020-06-10  220  #endif
82b6d6aaa3e29c arch/powerpc/mm/book3s64/pkeys.c Aneesh Kumar K.V 2020-06-10  221  

:: The code at line 207 was first introduced by commit
:: 82b6d6aaa3e29c1f61639eaf61333b3f84b34c4d powerpc/book3s64/kuep: Move KUEP related function outside radix

:: TO: Aneesh Kumar K.V 
:: CC: 0day robot 

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




Re: PowerPC KVM-PR issue

2020-06-12 Thread Christian Zigotzky

On 11 June 2020 at 04:47 pm, Christian Zigotzky wrote:

On 10 June 2020 at 01:23 pm, Christian Zigotzky wrote:

On 10 June 2020 at 11:06 am, Christian Zigotzky wrote:

On 10 June 2020 at 00:18 am, Christian Zigotzky wrote:

Hello,

KVM-PR doesn't work anymore on my Nemo board [1]. I figured out
that the Git kernels and the kernel 5.7 are affected.

Error message: Fienix kernel: kvmppc_exit_pr_progint: emulation at 700 failed ()

I can boot virtual QEMU PowerPC machines with KVM-PR with the
kernel 5.6 without any problems on my Nemo board.


I tested it with QEMU 2.5.0 and QEMU 5.0.0 today.

Could you please check KVM-PR on your PowerPC machine?

Thanks,
Christian

[1] https://en.wikipedia.org/wiki/AmigaOne_X1000


I figured out that the PowerPC updates 5.7-1 [1] are responsible for
the KVM-PR issue. Please test KVM-PR on your PowerPC machines and
check the PowerPC updates 5.7-1 [1].

Thanks

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d38c07afc356ddebaa3ed8ecb3f553340e05c969



I tested the latest Git kernel with Mac-on-Linux/KVM-PR today. 
Unfortunately I can't use KVM-PR with MoL anymore because of this 
issue (see screenshots [1]). Please check the PowerPC updates 5.7-1.


Thanks

[1]
- 
https://i.pinimg.com/originals/0c/b3/64/0cb364a40241fa2b7f297d4272bbb8b7.png
- 
https://i.pinimg.com/originals/9a/61/d1/9a61d170b1c9f514f7a78a3014ffd18f.png



Hi All,

I bisected today because of the KVM-PR issue.

Result:

9600f261acaaabd476d7833cec2dd20f2919f1a0 is the first bad commit
commit 9600f261acaaabd476d7833cec2dd20f2919f1a0
Author: Nicholas Piggin 
Date:   Wed Feb 26 03:35:21 2020 +1000

    powerpc/64s/exception: Move KVM test to common code

    This allows more code to be moved out of unrelocated regions. The
    system call KVMTEST is changed to be open-coded and remain in the
    tramp area to avoid having to move it to entry_64.S. The custom nature
    of the system call entry code means the hcall case can be made more
    streamlined than regular interrupt handlers.

    mpe: Incorporate fix from Nick:

    Moving KVM test to the common entry code missed the case of HMI and
    MCE, which do not do __GEN_COMMON_ENTRY (because they don't want to
    switch to virt mode).

    This means a MCE or HMI exception that is taken while KVM is running a
    guest context will not be switched out of that context, and KVM won't
    be notified. Found by running sigfuz in guest with patched host on
    POWER9 DD2.3, which causes some TM related HMI interrupts (which are
    expected and supposed to be handled by KVM).

    This fix adds a __GEN_REALMODE_COMMON_ENTRY for those handlers to add
    the KVM test. This makes them look a little more like other handlers
    that all use __GEN_COMMON_ENTRY.

    Signed-off-by: Nicholas Piggin 
    Signed-off-by: Michael Ellerman 
    Link: https://lore.kernel.org/r/20200225173541.1549955-13-npig...@gmail.com


:040000 040000 ec21cec22d165f8696d69532734cb2985d532cb0 87dd49a9cd7202ec79350e8ca26cea01f1dbd93d M    arch


-

The following commit is the problem: powerpc/64s/exception: Move KVM 
test to common code [1]


These changes were included in the PowerPC updates 5.7-1. [2]

Another test:

git checkout d38c07afc356ddebaa3ed8ecb3f553340e05c969 (PowerPC updates 5.7-1 [2]) -> KVM-PR doesn't work.


After that: git revert d38c07afc356ddebaa3ed8ecb3f553340e05c969 -m 1 -> KVM-PR works.


Could you please check the first bad commit? [1]

Thanks,
Christian


[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9600f261acaaabd476d7833cec2dd20f2919f1a0
[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d38c07afc356ddebaa3ed8ecb3f553340e05c969


Hi All,

I tried to revert the __GEN_REALMODE_COMMON_ENTRY fix for the latest Git
kernel and for the stable kernel 5.7.2, but without success. There was a
lot of restructuring work in the PowerPC area during the 5.7 development
cycle, so it isn't possible to reactivate the old code. That means we
have lost KVM-PR support entirely. I also reported this issue to
Alexander Graf two days ago. He wrote: "Howdy :). It looks pretty
broken. Have you ever made a bisect to see where the problem comes from?"


Please check the KVM-PR code.

Thanks,
Christian




Re: [PATCH kernel] powerpc/xive: Ignore kmemleak false positives

2020-06-12 Thread Michael Ellerman
Alexey Kardashevskiy  writes:
> xive_native_provision_pages() allocates memory and passes the pointer to
> OPAL so kmemleak cannot find the pointer usage in the kernel memory and
> produces a false positive report (below) (even if the kernel did scan
> OPAL memory, it is unable to deal with __pa() addresses anyway).
>
> This silences the warning.
>
> unreferenced object 0xc000200350c4 (size 65536):
>   comm "qemu-system-ppc", pid 2725, jiffies 4294946414 (age 70776.530s)
>   hex dump (first 32 bytes):
> 02 00 00 00 50 00 00 00 00 00 00 00 00 00 00 00  P...
> 01 00 08 07 00 00 00 00 00 00 00 00 00 00 00 00  
>   backtrace:
> [<81ff046c>] xive_native_alloc_vp_block+0x120/0x250
> [] kvmppc_xive_compute_vp_id+0x248/0x350 [kvm]
> [] kvmppc_xive_connect_vcpu+0xc0/0x520 [kvm]
> [<6acbc81c>] kvm_arch_vcpu_ioctl+0x308/0x580 [kvm]
> [<89c69580>] kvm_vcpu_ioctl+0x19c/0xae0 [kvm]
> [<902ae91e>] ksys_ioctl+0x184/0x1b0
> [] sys_ioctl+0x48/0xb0
> [<01b2c127>] system_call_exception+0x124/0x1f0
> [] system_call_common+0xe8/0x214
>
> Signed-off-by: Alexey Kardashevskiy 
> ---
>
> Does kmemleak actually check the OPAL memory?

No it shouldn't.

The memory used by OPAL should all be reserved in the device tree. That
means we never give it to any of the Linux memory allocators, and
therefore kmemleak will never see an allocation from those areas and add
that area to its list of areas to scan.

At least that's my understanding of how kmemleak works.

> Because if it did, we would still have a warning as kmemleak does not
> trace __pa() addresses anyway.

Right.

I think this patch is an OK solution.

It's kind of odd that we donate pages and don't keep track of them. But
they're used by xive until it's reset, and we don't do that until we
kexec, at which point we don't need to know about them anyway.
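
As a rough illustration of why the scanner cannot see such a reference, here is a minimal userspace sketch. It is not kmemleak's code: the scanner is reduced to a word-by-word range check, and `PAGE_OFFSET_MODEL`, `pa_model` and `scan_for_pointer` are hypothetical names made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical linear-map offset, standing in for the kernel's PAGE_OFFSET. */
#define PAGE_OFFSET_MODEL 0xc000000000000000ULL

/* Toy version of __pa(): strip the linear-map offset from a virtual address. */
static uint64_t pa_model(uint64_t va)
{
	return va - PAGE_OFFSET_MODEL;
}

/*
 * Toy scanner: report whether any word in the scanned area holds a value
 * that points into the tracked object [obj, obj + size).
 */
static bool scan_for_pointer(const uint64_t *area, size_t nwords,
			     uint64_t obj, uint64_t size)
{
	for (size_t i = 0; i < nwords; i++)
		if (area[i] >= obj && area[i] < obj + size)
			return true;
	return false;
}
```

In this model a stored virtual address is found, while the same address passed through `pa_model()` falls outside the object's range and is missed, which matches the false positive discussed above.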

cheers


Re: [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone

2020-06-12 Thread Alex Ghiti

Hi Atish,

On 6/11/20 at 5:34 PM, Atish Patra wrote:

On Sun, Jun 7, 2020 at 1:01 AM Alexandre Ghiti  wrote:

This is a preparatory patch for relocatable kernel.

The kernel used to be linked at PAGE_OFFSET address and used to be loaded
physically at the beginning of the main memory. Therefore, we could use
the linear mapping for the kernel mapping.

But the relocated kernel base address will be different from PAGE_OFFSET
and since in the linear mapping, two different virtual addresses cannot
point to the same physical address, the kernel mapping needs to lie outside
the linear mapping.
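
A minimal userspace sketch of the resulting two-mapping translation, assuming made-up addresses: every `*_MODEL` constant and the load address below are hypothetical values chosen for illustration, and `va_to_pa` only mirrors the shape of the `__va_to_pa_nodebug` macro from the diff further down.

```c
#include <stdint.h>

/* All values below are hypothetical and only serve the illustration. */
#define PAGE_OFFSET_MODEL 0xffffffe000000000ULL /* start of the linear mapping */
#define KERNEL_VIRT_MODEL 0xffffffdf80000000ULL /* kernel mapping, below it */
#define PHYS_RAM_BASE     0x80000000ULL         /* where RAM starts */
#define KERNEL_LOAD_PA    0x80200000ULL         /* where the kernel was loaded */

/* One offset per mapping, like va_pa_offset and va_kernel_pa_offset. */
static const uint64_t va_pa_offset = PAGE_OFFSET_MODEL - PHYS_RAM_BASE;
static const uint64_t va_kernel_pa_offset = KERNEL_VIRT_MODEL - KERNEL_LOAD_PA;

/* Pick the offset according to which mapping the virtual address lies in. */
static uint64_t va_to_pa(uint64_t va)
{
	return va >= PAGE_OFFSET_MODEL ? va - va_pa_offset
				       : va - va_kernel_pa_offset;
}
```

With these numbers, the kernel's first byte is reachable both through the kernel mapping and through its linear-mapping alias, and both translate to the same physical address.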

In addition, because modules and BPF must be close to the kernel (inside
+-2GB window), the kernel is placed at the end of the vmalloc zone minus
2GB, which leaves room for modules and BPF. The kernel could not be
placed at the beginning of the vmalloc zone since other vmalloc
allocations from the kernel could take up the whole +-2GB window around
the kernel, which would prevent new modules and BPF programs from being
loaded.
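
The placement arithmetic can be checked with a small userspace sketch. The `PAGE_OFFSET_MODEL` value is a hypothetical example, `within_2g` is a helper invented here, and the reach is modeled simply as a signed 32-bit offset.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_2G                  0x80000000ULL
#define PAGE_OFFSET_MODEL      0xffffffe000000000ULL /* hypothetical value */
#define VMALLOC_END_MODEL      (PAGE_OFFSET_MODEL - 1)
/* Same formula as KERNEL_VIRT_ADDR in the patch. */
#define KERNEL_VIRT_ADDR_MODEL (VMALLOC_END_MODEL - SZ_2G + 1)

/* True if target can be reached from pc with a signed 32-bit offset. */
static bool within_2g(uint64_t pc, uint64_t target)
{
	int64_t delta = (int64_t)(target - pc);

	return delta >= -(int64_t)SZ_2G && delta < (int64_t)SZ_2G;
}
```

Under these assumptions everything from the kernel base up to the end of the vmalloc zone stays in reach, and so does the 2GB just below the kernel base, but nothing beyond that.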

Signed-off-by: Alexandre Ghiti 
Reviewed-by: Zong Li 
---
  arch/riscv/boot/loader.lds.S |  3 +-
  arch/riscv/include/asm/page.h| 10 +-
  arch/riscv/include/asm/pgtable.h | 38 ++---
  arch/riscv/kernel/head.S |  3 +-
  arch/riscv/kernel/module.c   |  4 +--
  arch/riscv/kernel/vmlinux.lds.S  |  3 +-
  arch/riscv/mm/init.c | 58 +---
  arch/riscv/mm/physaddr.c |  2 +-
  8 files changed, 88 insertions(+), 33 deletions(-)

diff --git a/arch/riscv/boot/loader.lds.S b/arch/riscv/boot/loader.lds.S
index 47a5003c2e28..62d94696a19c 100644
--- a/arch/riscv/boot/loader.lds.S
+++ b/arch/riscv/boot/loader.lds.S
@@ -1,13 +1,14 @@
  /* SPDX-License-Identifier: GPL-2.0 */

  #include 
+#include 

  OUTPUT_ARCH(riscv)
  ENTRY(_start)

  SECTIONS
  {
-   . = PAGE_OFFSET;
+   . = KERNEL_LINK_ADDR;

 .payload : {
 *(.payload)
diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index 2d50f76efe48..48bb09b6a9b7 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -90,18 +90,26 @@ typedef struct page *pgtable_t;

  #ifdef CONFIG_MMU
  extern unsigned long va_pa_offset;
+extern unsigned long va_kernel_pa_offset;
  extern unsigned long pfn_base;
  #define ARCH_PFN_OFFSET        (pfn_base)
  #else
  #define va_pa_offset   0
+#define va_kernel_pa_offset    0
  #define ARCH_PFN_OFFSET        (PAGE_OFFSET >> PAGE_SHIFT)
  #endif /* CONFIG_MMU */

  extern unsigned long max_low_pfn;
  extern unsigned long min_low_pfn;
+extern unsigned long kernel_virt_addr;

  #define __pa_to_va_nodebug(x)  ((void *)((unsigned long) (x) + va_pa_offset))
-#define __va_to_pa_nodebug(x)  ((unsigned long)(x) - va_pa_offset)
+#define linear_mapping_va_to_pa(x) ((unsigned long)(x) - va_pa_offset)
+#define kernel_mapping_va_to_pa(x) \
+   ((unsigned long)(x) - va_kernel_pa_offset)
+#define __va_to_pa_nodebug(x)  \
+   (((x) >= PAGE_OFFSET) ? \
+   linear_mapping_va_to_pa(x) : kernel_mapping_va_to_pa(x))

  #ifdef CONFIG_DEBUG_VIRTUAL
  extern phys_addr_t __virt_to_phys(unsigned long x);
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 35b60035b6b0..94ef3b49dfb6 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -11,23 +11,29 @@

  #include 

-#ifndef __ASSEMBLY__
-
-/* Page Upper Directory not used in RISC-V */
-#include 
-#include 
-#include 
-#include 
-
-#ifdef CONFIG_MMU
+#ifndef CONFIG_MMU
+#define KERNEL_VIRT_ADDR   PAGE_OFFSET
+#define KERNEL_LINK_ADDR   PAGE_OFFSET
+#else
+/*
+ * Leave 2GB for modules and BPF that must lie within a 2GB range around
+ * the kernel.
+ */
+#define KERNEL_VIRT_ADDR   (VMALLOC_END - SZ_2G + 1)
+#define KERNEL_LINK_ADDR   KERNEL_VIRT_ADDR

  #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
  #define VMALLOC_END  (PAGE_OFFSET - 1)
  #define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)

  #define BPF_JIT_REGION_SIZE    (SZ_128M)
-#define BPF_JIT_REGION_START   (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
-#define BPF_JIT_REGION_END (VMALLOC_END)
+#define BPF_JIT_REGION_START   PFN_ALIGN((unsigned long)&_end)
+#define BPF_JIT_REGION_END (BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE)
+

As these mappings have changed a few times in recent months, including
this one, I think it would be better to have virtual memory layout
documentation for RISC-V, similar to other architectures.

If you can include the page table layout for 3/4 level page tables in
the same document, that would be really helpful.



Yes, I'll do that in a separate commit.

Thanks,

Alex



+#ifdef CONFIG_64BIT
+#define VMALLOC_MODULE_START   BPF_JIT_REGION_END
+#define VMALLOC_MODULE_END (((unsigned long)&_start & PAGE_MASK) + SZ_2G)
+#endif

  /*
   * Roughly size the vmemmap space to be large enough to fit enough
@@ -57,9 +63,16 @@
  

Re: [PATCH v2] All arch: remove system call sys_sysctl

2020-06-12 Thread Xiaoming Ni

On 2020/6/12 2:23, Eric W. Biederman wrote:

Rich Felker  writes:


On Thu, Jun 11, 2020 at 12:01:11PM -0500, Eric W. Biederman wrote:

Rich Felker  writes:


On Thu, Jun 11, 2020 at 06:43:00AM -0500, Eric W. Biederman wrote:

Xiaoming Ni  writes:


Since the commit 61a47c1ad3a4dc ("sysctl: Remove the sysctl system call"),
sys_sysctl is actually unavailable: any input can only return an error.

We have been warning about people using the sysctl system call for years
and believe there are no more users.  Even if there are users of this
interface if they have not complained or fixed their code by now they
probably are not going to, so there is no point in warning them any
longer.

So completely remove sys_sysctl on all architectures.






Signed-off-by: Xiaoming Ni 

changes in v2:
   According to Kees Cook's suggestion, completely remove sys_sysctl on all arch
   According to Eric W. Biederman's suggestion, update the commit log

V1: https://lore.kernel.org/lkml/1591683605-8585-1-git-send-email-nixiaom...@huawei.com/
   Delete the code of sys_sysctl and return -ENOSYS directly at the function entry
---
  include/uapi/linux/sysctl.h|  15 --

[snip]


diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h
index 27c1ed2..84b44c3 100644
--- a/include/uapi/linux/sysctl.h
+++ b/include/uapi/linux/sysctl.h
@@ -27,21 +27,6 @@
  #include 
  #include 
  
-#define CTL_MAXNAME 10		/* how many path components do we allow in a

-  call to sysctl?   In other words, what is
-  the largest acceptable value for the nlen
-  member of a struct __sysctl_args to have? */
-
-struct __sysctl_args {
-   int __user *name;
-   int nlen;
-   void __user *oldval;
-   size_t __user *oldlenp;
-   void __user *newval;
-   size_t newlen;
-   unsigned long __unused[4];
-};
-
  /* Define sysctl names first */
  
  /* Top-level names: */

[snip]

The uapi header change does not make sense.  The entire point of the
header is to allow userspace programs to be able to call sys_sysctl.
It either needs to all stay or all go.

As the concern with the uapi header is about userspace programs being
able to compile please leave the header for now.

We should leave auditing userspace and seeing if userspace code will
still compile if we remove this header for a separate patch.  The
concerns and justifications for the uapi header are completely different
than those for removing the sys_sysctl implementation.

Otherwise
Acked-by: "Eric W. Biederman" 


The UAPI header should be kept because it's defining an API not just
for the kernel the headers are supplied with, but for all past
kernels. In particular programs needing a failsafe CSPRNG source that
works on old kernels may (do) use this as a fallback only if modern
syscalls are missing. Removing the syscall is no problem since it
won't be used, but if you remove the types/macros from the UAPI
headers, they'll have to copy that into their own sources.
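
For illustration, a fallback of that kind might look roughly like the sketch below. It is not taken from any particular program: `fill_random` and the `*_COPY` names are hypothetical, the struct is a private copy of the layout shown in the diff above, and the three name values are the ones I believe include/uapi/linux/sysctl.h assigns to CTL_KERN, KERN_RANDOM and RANDOM_UUID.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Private copy of struct __sysctl_args, so the UAPI definition is not needed. */
struct sysctl_args_copy {
	int *name;
	int nlen;
	void *oldval;
	size_t *oldlenp;
	void *newval;
	size_t newlen;
	unsigned long unused[4];
};

/* Assumed to match CTL_KERN, KERN_RANDOM and RANDOM_UUID in the UAPI header. */
enum { CTL_KERN_COPY = 1, KERN_RANDOM_COPY = 40, RANDOM_UUID_COPY = 6 };

/* Fill buf with len random bytes. Returns 0 on success, -1 on failure. */
static int fill_random(unsigned char *buf, size_t len)
{
	size_t done = 0;

#ifdef SYS_getrandom
	/* Preferred path: the modern syscall. */
	while (done < len) {
		long n = syscall(SYS_getrandom, buf + done, len - done, 0);

		if (n > 0) {
			done += (size_t)n;
			continue;
		}
		if (n < 0 && errno == EINTR)
			continue;
		break;	/* ENOSYS on old kernels, or another error */
	}
#endif
#ifdef SYS__sysctl
	/* Fallback for old kernels: read kernel.random.uuid, 16 bytes per call. */
	while (done < len) {
		int name[] = { CTL_KERN_COPY, KERN_RANDOM_COPY, RANDOM_UUID_COPY };
		unsigned char uuid[16];
		size_t uuid_len = sizeof(uuid);
		struct sysctl_args_copy args = {
			.name = name, .nlen = 3,
			.oldval = uuid, .oldlenp = &uuid_len,
		};

		if (syscall(SYS__sysctl, &args) != 0 || uuid_len == 0)
			break;

		size_t n = len - done < uuid_len ? len - done : uuid_len;

		memcpy(buf + done, uuid, n);
		done += n;
	}
#endif
	return done == len ? 0 : -1;
}
```

On a current kernel only the first loop does any work. Note that a UUID carries a few fixed version bits, so real users of this trick would presumably post-process the bytes rather than use them raw.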


May we assume you know of at least one piece of userspace that will fail
to compile if this header file is removed?


I know at least one piece of software is using SYS_sysctl for a
fallback CSPRNG source. I'm not 100% sure that they're using the
kernel headers; they might have copied it already. I'm also not sure
how many there are.

Regardless, I think the principle stands. There's no need to remove
definitions that are essentially maintenance-free now that the
interface is no longer available in new kernels, and doing so
contributes to the myth that you're supposed to use kernel headers
matching runtime kernel rather than it always being safe to use latest
headers.


If there is no one using the definitions removing them saves people
having to remember what they are there for.

The big rule is don't break userspace.  The goal is to allow people to
upgrade their kernel without needing to worry about userspace breaking,
and to be able to downgrade to the extent possible to help in tracking
bugs.

Not being able to compile userspace seems like a pretty clear cut case.
Although there are some fuzzy edges given the history of the kernel
headers.  Things like your libc requiring kernel headers to be processed
before they can be used.  I think there are still some kernel headers
that have that restriction when used with glibc as glibc uses different
sizes for types like dev_t.

The bottom line is we can't do it casually, so any work in the
direction of removing from or deleting uapi headers needs to be its own
separate patch.

Given how much effort it can be to show that userspace is not using
something I don't expect us to be mucking with the uapi headers any time
soon.

Eric



Thanks everyone for your guidance. I will drop the uapi header change
in v3.


But here I am still a bit confused: how to modify include/uapi?

Before commit 61a47c1ad3a4dc ("sysctl: Remove the sysctl 

[PATCH] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed

2020-06-12 Thread Cédric Le Goater
When a passthrough IO adapter is removed from a pseries machine using
hash MMU and the XIVE interrupt mode, the POWER hypervisor, pHyp,
expects the guest OS to have cleared all page table entries related to
the adapter. If some are still present, the RTAS call which isolates
the PCI slot returns error 9001 "valid outstanding translations" and
the removal of the IO adapter fails.

INTx interrupt numbers need special care because Linux maps the
interrupts automatically in the Linux interrupt number space. For this
purpose, record the logical interrupt number of the INTx at the PHB
level and clear these interrupts when the PCI bus is removed. This
will also clear all the page table entries of the ESB pages when using
XIVE.

Cc: "Oliver O'Halloran" 
Signed-off-by: Cédric Le Goater 
---

 This deprecates patch:

 http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20200429075122.1216388-3-...@kaod.org/

 Thanks,

 arch/powerpc/include/asm/pci-bridge.h |  4 +++
 arch/powerpc/kernel/pci-common.c  | 45 +++
 2 files changed, 49 insertions(+)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index b92e81b256e5..9960dd249079 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -48,6 +48,8 @@ struct pci_controller_ops {
 
 /*
  * Structure of a PCI controller (host bridge)
+ *
+ * @intx: legacy INTx mappings
  */
 struct pci_controller {
struct pci_bus *bus;
@@ -127,6 +129,8 @@ struct pci_controller {
 
void *private_data;
struct npu *npu;
+
+   unsigned int intx[PCI_NUM_INTX];
 };
 
 /* These are used for config access before all the PCI probing
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index be108616a721..8c442627f465 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -353,6 +353,49 @@ struct pci_controller *pci_find_controller_for_domain(int domain_nr)
return NULL;
 }
 
+static void pci_intx_register(struct pci_dev *pdev, int virq)
+{
+   struct pci_controller *phb = pci_bus_to_host(pdev->bus);
+   int i;
+
+   for (i = 0; i < PCI_NUM_INTX; i++) {
+   /*
+* Look for an empty or an equivalent slot, as INTx
+* interrupts can be shared between adapters
+*/
+   if (phb->intx[i] == virq || !phb->intx[i]) {
+   phb->intx[i] = virq;
+   break;
+   }
+   }
+
+   if (i == PCI_NUM_INTX)
+   pr_err("PCI:%s INTx all mapped\n", pci_name(pdev));
+}
+
+/*
+ * Clearing the mapped INTx interrupts will also clear the underlying
+ * mappings of the ESB pages of the interrupts when under XIVE. It is
+ * a requirement of PowerVM to clear all memory mappings before
+ * removing a PHB.
+ */
+static void pci_intx_dispose(struct pci_bus *bus)
+{
+   struct pci_controller *phb = pci_bus_to_host(bus);
+   int i;
+
+   pr_debug("PCI: Clearing INTx for PHB %04x:%02x...\n",
+pci_domain_nr(bus), bus->number);
+   for (i = 0; i < PCI_NUM_INTX; i++)
+   irq_dispose_mapping(phb->intx[i]);
+}
+
+void pcibios_remove_bus(struct pci_bus *bus)
+{
+   pci_intx_dispose(bus);
+}
+EXPORT_SYMBOL_GPL(pcibios_remove_bus);
+
 /*
  * Reads the interrupt pin to determine if interrupt is use by card.
  * If the interrupt is used, then gets the interrupt line from the
@@ -401,6 +444,8 @@ static int pci_read_irq_line(struct pci_dev *pci_dev)
 
pci_dev->irq = virq;
 
+   /* Record all INTx mappings for later removal of a PHB */
+   pci_intx_register(pci_dev, virq);
return 0;
 }
 
-- 
2.25.4
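
The bookkeeping done by pci_intx_register() and pci_intx_dispose() above can be modeled in userspace as a minimal sketch. `struct phb_model`, `intx_register` and `intx_dispose` are hypothetical stand-ins, NUM_INTX is assumed to equal the kernel's PCI_NUM_INTX (4), and unlike the patch the model zeroes the slots on dispose so that it can be reused in a test.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_INTX 4	/* assumed to equal the kernel's PCI_NUM_INTX */

struct phb_model {
	unsigned int intx[NUM_INTX];	/* 0 means "slot unused" */
};

/* Record virq in the first free or matching slot; false if the table is full. */
static bool intx_register(struct phb_model *phb, unsigned int virq)
{
	for (size_t i = 0; i < NUM_INTX; i++) {
		if (phb->intx[i] == virq || phb->intx[i] == 0) {
			phb->intx[i] = virq;
			return true;
		}
	}
	return false;
}

/* Forget every recorded mapping and return how many slots were in use. */
static unsigned int intx_dispose(struct phb_model *phb)
{
	unsigned int used = 0;

	for (size_t i = 0; i < NUM_INTX; i++) {
		if (phb->intx[i])
			used++;
		phb->intx[i] = 0;
	}
	return used;
}
```

Registering the same interrupt number twice occupies a single slot, which is the shared-INTx case the comment in the patch refers to.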



Re: [RFC PATCH v3 4/4] ASoC: fsl_asrc_dma: Fix data copying speed issue with EDMA

2020-06-12 Thread Nicolin Chen
On Fri, Jun 12, 2020 at 03:37:51PM +0800, Shengjiu Wang wrote:
> With EDMA, there are two DMA channels that can be used for dev_to_dev:
> one is from ASRC, the other is from another peripheral (ESAI or SAI).
> 
> If we select the DMA channel of ASRC, there is an issue in the ideal
> ratio case: data is copied faster than the sample frequency, because
> ASRC outputs data very quickly in ideal ratio mode.
> 
> So it is reasonable to use the DMA channel of the Back-End peripheral.
> The copying speed of DMA is then controlled by the data consumption
> speed in the peripheral FIFO.
> 
> Signed-off-by: Shengjiu Wang 

Reviewed-by: Nicolin Chen 


Re: [RFC PATCH v3 3/4] ASoC: fsl_asrc_dma: Reuse the dma channel if available in Back-End

2020-06-12 Thread Nicolin Chen
On Fri, Jun 12, 2020 at 03:37:50PM +0800, Shengjiu Wang wrote:
> The dma channel has been requested by Back-End cpu dai driver already.
> If fsl_asrc_dma requests a dma chan with the same dma:tx symlink, then
> the warning below appears with SDMA.
> 
> [   48.174236] fsl-esai-dai 2024000.esai: Cannot create DMA dma:tx symlink
> 
> So if we can reuse the dma channel of Back-End, then the issue can be
> fixed.
> 
> In order to get the dma channel which is already requested in Back-End,
> we use the two exported functions (snd_soc_lookup_component_nolocked
> and soc_component_to_pcm). If we can get the dma channel, then reuse it;
> if we can't, then request a new one.
> 
> Signed-off-by: Shengjiu Wang 

Reviewed-by: Nicolin Chen 


Re: [RFC PATCH v3 2/4] ASoC: dmaengine_pcm: export soc_component_to_pcm

2020-06-12 Thread Nicolin Chen
On Fri, Jun 12, 2020 at 03:37:49PM +0800, Shengjiu Wang wrote:
> In DPCM case, Front-End needs to get the dma chan which has
> been requested by Back-End and reuse it.
> 
> Signed-off-by: Shengjiu Wang 

Reviewed-by: Nicolin Chen 


Re: [RFC PATCH v3 1/4] ASoC: soc-card: export snd_soc_lookup_component_nolocked

2020-06-12 Thread Nicolin Chen
On Fri, Jun 12, 2020 at 03:37:48PM +0800, Shengjiu Wang wrote:
> snd_soc_lookup_component_nolocked can be used in the DPCM case where
> the Front-End needs to get a platform component that it does not use
> itself but that was added by the Back-End cpu dai driver.
> 
> If the component is found, then we can get the dma chan created
> by the Back-End component and reuse it in the Front-End.
> 
> Signed-off-by: Shengjiu Wang 

Reviewed-by: Nicolin Chen 


[RFC PATCH v3 4/4] ASoC: fsl_asrc_dma: Fix data copying speed issue with EDMA

2020-06-12 Thread Shengjiu Wang
With EDMA, there are two DMA channels that can be used for dev_to_dev:
one is from ASRC, the other is from another peripheral (ESAI or SAI).

If we select the DMA channel of ASRC, there is an issue in the ideal
ratio case: data is copied faster than the sample frequency, because
ASRC outputs data very quickly in ideal ratio mode.

So it is reasonable to use the DMA channel of the Back-End peripheral.
The copying speed of DMA is then controlled by the data consumption
speed in the peripheral FIFO.

Signed-off-by: Shengjiu Wang 
---
 sound/soc/fsl/fsl_asrc_common.h |  2 ++
 sound/soc/fsl/fsl_asrc_dma.c| 26 +++---
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/sound/soc/fsl/fsl_asrc_common.h b/sound/soc/fsl/fsl_asrc_common.h
index 77665b15c8db..7e1c13ca37f1 100644
--- a/sound/soc/fsl/fsl_asrc_common.h
+++ b/sound/soc/fsl/fsl_asrc_common.h
@@ -32,6 +32,7 @@ enum asrc_pair_index {
  * @dma_chan: inputer and output DMA channels
  * @dma_data: private dma data
  * @pos: hardware pointer position
+ * @req_dma_chan: flag to release dev_to_dev chan
  * @private: pair private area
  */
 struct fsl_asrc_pair {
@@ -45,6 +46,7 @@ struct fsl_asrc_pair {
struct dma_chan *dma_chan[2];
struct imx_dma_data dma_data;
unsigned int pos;
+   bool req_dma_chan;
 
void *private;
 };
diff --git a/sound/soc/fsl/fsl_asrc_dma.c b/sound/soc/fsl/fsl_asrc_dma.c
index d88e6343e0a2..5f01a58f422a 100644
--- a/sound/soc/fsl/fsl_asrc_dma.c
+++ b/sound/soc/fsl/fsl_asrc_dma.c
@@ -233,11 +233,11 @@ static int fsl_asrc_dma_hw_params(struct snd_soc_component *component,
 
pair->dma_chan[dir] =
dma_request_channel(mask, filter, &pair->dma_data);
+   pair->req_dma_chan = true;
} else {
-   if (!be_chan)
-   dma_release_channel(tmp_chan);
-   pair->dma_chan[dir] =
-   asrc->get_dma_channel(pair, dir);
+   pair->dma_chan[dir] = tmp_chan;
+   /* Do not flag to release if we are reusing the Back-End one */
+   pair->req_dma_chan = !be_chan;
}
 
if (!pair->dma_chan[dir]) {
@@ -276,7 +276,8 @@ static int fsl_asrc_dma_hw_params(struct snd_soc_component *component,
ret = dmaengine_slave_config(pair->dma_chan[dir], &config_be);
if (ret) {
dev_err(dev, "failed to config DMA channel for Back-End\n");
-   dma_release_channel(pair->dma_chan[dir]);
+   if (pair->req_dma_chan)
+   dma_release_channel(pair->dma_chan[dir]);
return ret;
}
 
@@ -288,19 +289,22 @@ static int fsl_asrc_dma_hw_params(struct snd_soc_component *component,
 static int fsl_asrc_dma_hw_free(struct snd_soc_component *component,
struct snd_pcm_substream *substream)
 {
+   bool tx = substream->stream == SNDRV_PCM_STREAM_PLAYBACK;
struct snd_pcm_runtime *runtime = substream->runtime;
struct fsl_asrc_pair *pair = runtime->private_data;
+   u8 dir = tx ? OUT : IN;
 
snd_pcm_set_runtime_buffer(substream, NULL);
 
-   if (pair->dma_chan[IN])
-   dma_release_channel(pair->dma_chan[IN]);
+   if (pair->dma_chan[!dir])
+   dma_release_channel(pair->dma_chan[!dir]);
 
-   if (pair->dma_chan[OUT])
-   dma_release_channel(pair->dma_chan[OUT]);
+   /* release dev_to_dev chan if we aren't reusing the Back-End one */
+   if (pair->dma_chan[dir] && pair->req_dma_chan)
+   dma_release_channel(pair->dma_chan[dir]);
 
-   pair->dma_chan[IN] = NULL;
-   pair->dma_chan[OUT] = NULL;
+   pair->dma_chan[!dir] = NULL;
+   pair->dma_chan[dir] = NULL;
 
return 0;
 }
-- 
2.21.0
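
The ownership rule behind req_dma_chan can be summarized with a minimal userspace sketch. None of the names below exist in the driver: `chan_model` stands in for struct dma_chan with a counter instead of real hardware state, and `pair_acquire` and `pair_free` only mimic the decisions taken in hw_params and hw_free.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct dma_chan: only counts outstanding requests. */
struct chan_model {
	int refs;
};

struct pair_model {
	struct chan_model *chan;
	bool req_dma_chan;	/* true if this side requested chan itself */
};

static void chan_request(struct chan_model *c) { c->refs++; }
static void chan_release(struct chan_model *c) { c->refs--; }

/* Reuse the Back-End channel when there is one, otherwise request our own. */
static void pair_acquire(struct pair_model *p, struct chan_model *be_chan,
			 struct chan_model *fresh)
{
	if (be_chan) {
		p->chan = be_chan;
		p->req_dma_chan = false;
	} else {
		chan_request(fresh);
		p->chan = fresh;
		p->req_dma_chan = true;
	}
}

/* Release the channel only if we requested it; a borrowed one is left alone. */
static void pair_free(struct pair_model *p)
{
	if (p->chan && p->req_dma_chan)
		chan_release(p->chan);
	p->chan = NULL;
}
```

The point of the flag is that a borrowed channel keeps the reference count its owner gave it, while a channel requested locally is balanced by exactly one release.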



[RFC PATCH v3 3/4] ASoC: fsl_asrc_dma: Reuse the dma channel if available in Back-End

2020-06-12 Thread Shengjiu Wang
The dma channel has been requested by Back-End cpu dai driver already.
If fsl_asrc_dma requests a dma chan with the same dma:tx symlink, then
the warning below appears with SDMA.

[   48.174236] fsl-esai-dai 2024000.esai: Cannot create DMA dma:tx symlink

So if we can reuse the dma channel of Back-End, then the issue can be
fixed.

In order to get the dma channel which is already requested in Back-End,
we use the two exported functions (snd_soc_lookup_component_nolocked
and soc_component_to_pcm). If we can get the dma channel, then reuse it;
if we can't, then request a new one.

Signed-off-by: Shengjiu Wang 
---
 sound/soc/fsl/fsl_asrc_dma.c | 25 -
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/sound/soc/fsl/fsl_asrc_dma.c b/sound/soc/fsl/fsl_asrc_dma.c
index d6a3fc5f87e5..d88e6343e0a2 100644
--- a/sound/soc/fsl/fsl_asrc_dma.c
+++ b/sound/soc/fsl/fsl_asrc_dma.c
@@ -135,6 +135,8 @@ static int fsl_asrc_dma_hw_params(struct snd_soc_component *component,
struct snd_dmaengine_dai_dma_data *dma_params_be = NULL;
struct snd_pcm_runtime *runtime = substream->runtime;
struct fsl_asrc_pair *pair = runtime->private_data;
+   struct dma_chan *tmp_chan = NULL, *be_chan = NULL;
+   struct snd_soc_component *component_be = NULL;
struct fsl_asrc *asrc = pair->asrc;
struct dma_slave_config config_fe, config_be;
enum asrc_pair_index index = pair->index;
@@ -142,7 +144,6 @@ static int fsl_asrc_dma_hw_params(struct snd_soc_component *component,
int stream = substream->stream;
struct imx_dma_data *tmp_data;
struct snd_soc_dpcm *dpcm;
-   struct dma_chan *tmp_chan;
struct device *dev_be;
u8 dir = tx ? OUT : IN;
dma_cap_mask_t mask;
@@ -197,18 +198,30 @@ static int fsl_asrc_dma_hw_params(struct snd_soc_component *component,
dma_cap_set(DMA_SLAVE, mask);
dma_cap_set(DMA_CYCLIC, mask);
 
+   /*
+* The Back-End device might have already requested a DMA channel,
+* so try to reuse it first, and then request a new one upon NULL.
+*/
+   component_be = snd_soc_lookup_component_nolocked(dev_be, SND_DMAENGINE_PCM_DRV_NAME);
+   if (component_be) {
+   be_chan = soc_component_to_pcm(component_be)->chan[substream->stream];
+   tmp_chan = be_chan;
+   }
+   if (!tmp_chan)
+   tmp_chan = dma_request_slave_channel(dev_be, tx ? "tx" : "rx");
+
/*
 * An EDMA DEV_TO_DEV channel is fixed and bound with DMA event of each
 * peripheral, unlike SDMA channel that is allocated dynamically. So no
-* need to configure dma_request and dma_request2, but get dma_chan via
-* dma_request_slave_channel directly with dma name of Front-End device
+* need to configure dma_request and dma_request2, but get dma_chan of
+* Back-End device directly via dma_request_slave_channel.
 */
if (!asrc->use_edma) {
/* Get DMA request of Back-End */
-   tmp_chan = dma_request_slave_channel(dev_be, tx ? "tx" : "rx");
tmp_data = tmp_chan->private;
pair->dma_data.dma_request = tmp_data->dma_request;
-   dma_release_channel(tmp_chan);
+   if (!be_chan)
+   dma_release_channel(tmp_chan);
 
/* Get DMA request of Front-End */
tmp_chan = asrc->get_dma_channel(pair, dir);
@@ -221,6 +234,8 @@ static int fsl_asrc_dma_hw_params(struct snd_soc_component *component,
pair->dma_chan[dir] =
dma_request_channel(mask, filter, &pair->dma_data);
} else {
+   if (!be_chan)
+   dma_release_channel(tmp_chan);
pair->dma_chan[dir] =
asrc->get_dma_channel(pair, dir);
}
-- 
2.21.0



[RFC PATCH v3 2/4] ASoC: dmaengine_pcm: export soc_component_to_pcm

2020-06-12 Thread Shengjiu Wang
In DPCM case, Front-End needs to get the dma chan which has
been requested by Back-End and reuse it.

Signed-off-by: Shengjiu Wang 
---
 include/sound/dmaengine_pcm.h | 11 +++
 sound/soc/soc-generic-dmaengine-pcm.c | 12 
 2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/include/sound/dmaengine_pcm.h b/include/sound/dmaengine_pcm.h
index b65220685920..8c5e38180fb0 100644
--- a/include/sound/dmaengine_pcm.h
+++ b/include/sound/dmaengine_pcm.h
@@ -161,4 +161,15 @@ int snd_dmaengine_pcm_prepare_slave_config(struct snd_pcm_substream *substream,
 
 #define SND_DMAENGINE_PCM_DRV_NAME "snd_dmaengine_pcm"
 
+struct dmaengine_pcm {
+   struct dma_chan *chan[SNDRV_PCM_STREAM_LAST + 1];
+   const struct snd_dmaengine_pcm_config *config;
+   struct snd_soc_component component;
+   unsigned int flags;
+};
+
+static inline struct dmaengine_pcm *soc_component_to_pcm(struct snd_soc_component *p)
+{
+   return container_of(p, struct dmaengine_pcm, component);
+}
 #endif
diff --git a/sound/soc/soc-generic-dmaengine-pcm.c b/sound/soc/soc-generic-dmaengine-pcm.c
index f728309a0833..80a4e71f2d95 100644
--- a/sound/soc/soc-generic-dmaengine-pcm.c
+++ b/sound/soc/soc-generic-dmaengine-pcm.c
@@ -21,18 +21,6 @@
  */
 #define SND_DMAENGINE_PCM_FLAG_NO_RESIDUE BIT(31)
 
-struct dmaengine_pcm {
-   struct dma_chan *chan[SNDRV_PCM_STREAM_LAST + 1];
-   const struct snd_dmaengine_pcm_config *config;
-   struct snd_soc_component component;
-   unsigned int flags;
-};
-
-static struct dmaengine_pcm *soc_component_to_pcm(struct snd_soc_component *p)
-{
-   return container_of(p, struct dmaengine_pcm, component);
-}
-
 static struct device *dmaengine_dma_dev(struct dmaengine_pcm *pcm,
struct snd_pcm_substream *substream)
 {
-- 
2.21.0
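
For readers unfamiliar with the idiom used by soc_component_to_pcm(), here is a minimal userspace sketch of container_of. The `*_model` types and `component_to_pcm` are hypothetical simplifications, and the macro is a plain offsetof-based version without the kernel's type checking.

```c
#include <stddef.h>

/* Simplified container_of: step back from a member to its enclosing struct. */
#define container_of_model(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct component_model {
	int id;
};

struct pcm_model {
	void *chan[2];
	struct component_model component;	/* embedded, not a pointer */
	unsigned int flags;
};

/* Recover the enclosing pcm_model from a pointer to its embedded component. */
static struct pcm_model *component_to_pcm(struct component_model *c)
{
	return container_of_model(c, struct pcm_model, component);
}
```

This only works because the component is embedded in the larger structure, which is why the struct definition has to be visible to any caller and therefore moves into the header in this patch.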



[RFC PATCH v3 1/4] ASoC: soc-card: export snd_soc_lookup_component_nolocked

2020-06-12 Thread Shengjiu Wang
snd_soc_lookup_component_nolocked can be used in the DPCM case where
the Front-End needs to get a platform component that it does not use
itself but that was added by the Back-End cpu dai driver.

If the component is found, then we can get the dma chan created
by the Back-End component and reuse it in the Front-End.

Signed-off-by: Shengjiu Wang 
---
 include/sound/soc.h  | 2 ++
 sound/soc/soc-core.c | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/sound/soc.h b/include/sound/soc.h
index 74868436ac79..565612a8d690 100644
--- a/include/sound/soc.h
+++ b/include/sound/soc.h
@@ -444,6 +444,8 @@ int devm_snd_soc_register_component(struct device *dev,
 const struct snd_soc_component_driver *component_driver,
 struct snd_soc_dai_driver *dai_drv, int num_dai);
 void snd_soc_unregister_component(struct device *dev);
+struct snd_soc_component *snd_soc_lookup_component_nolocked(struct device *dev,
+   const char *driver_name);
 struct snd_soc_component *snd_soc_lookup_component(struct device *dev,
   const char *driver_name);
 
diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
index b07eca2c6ccc..d4c73e86d058 100644
--- a/sound/soc/soc-core.c
+++ b/sound/soc/soc-core.c
@@ -310,7 +310,7 @@ struct snd_soc_component *snd_soc_rtdcom_lookup(struct snd_soc_pcm_runtime *rtd,
 }
 EXPORT_SYMBOL_GPL(snd_soc_rtdcom_lookup);
 
-static struct snd_soc_component
+struct snd_soc_component
 *snd_soc_lookup_component_nolocked(struct device *dev, const char *driver_name)
 {
struct snd_soc_component *component;
@@ -329,6 +329,7 @@ static struct snd_soc_component
 
return found_component;
 }
+EXPORT_SYMBOL_GPL(snd_soc_lookup_component_nolocked);
 
 struct snd_soc_component *snd_soc_lookup_component(struct device *dev,
   const char *driver_name)
-- 
2.21.0



[RFC PATCH v3 0/4] Reuse the dma channel if available in Back-End

2020-06-12 Thread Shengjiu Wang
Reuse the dma channel if available in Back-End

Shengjiu Wang (4):
  ASoC: soc-card: export snd_soc_lookup_component_nolocked
  ASoC: dmaengine_pcm: export soc_component_to_pcm
  ASoC: fsl_asrc_dma: Reuse the dma channel if available in Back-End
  ASoC: fsl_asrc_dma: Fix data copying speed issue with EDMA

changes in v3:
- update according to Nicolin's comments
- split previous 0003 patch to two patches

changes in v2:
- update according to Mark's comments and split the patch

 include/sound/dmaengine_pcm.h | 11 +++
 include/sound/soc.h   |  2 ++
 sound/soc/fsl/fsl_asrc_common.h   |  2 ++
 sound/soc/fsl/fsl_asrc_dma.c  | 47 +++
 sound/soc/soc-core.c  |  3 +-
 sound/soc/soc-generic-dmaengine-pcm.c | 12 ---
 6 files changed, 50 insertions(+), 27 deletions(-)

-- 
2.21.0



Re: [PATCH 1/2] powerpc/64s: remove PROT_SAO support

2020-06-12 Thread Michael Ellerman
Nicholas Piggin  writes:
> ISA v3.1 does not support the SAO storage control attribute required to
> implement PROT_SAO. PROT_SAO was used by specialised system software
> (Lx86) that has been discontinued for about 7 years, and is not thought
> to be used elsewhere, so removal should not cause problems.
>
> We rather remove it than keep support for older processors, because
> live migrating guest partitions to newer processors may not be possible
> if SAO is in use.

The key details being:
 - you don't remove PROT_SAO from the uapi header, so code using the
   definition will still build.
 - you change arch_validate_prot() to reject PROT_SAO, which means code
   using it will see a failure from mmap() at runtime.
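
For userspace that still passes PROT_SAO, a fallback along these lines
would cope with both behaviours. This is only a sketch: map_prefer_sao()
is a hypothetical helper, PROT_SAO is defined locally so it builds on
other architectures, and it assumes the rejection shows up as a failed
mmap() call as described above.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Value from the powerpc uapi header; defined here only so the sketch
 * also builds on other architectures (assumption for illustration). */
#ifndef PROT_SAO
#define PROT_SAO 0x10
#endif

/*
 * Hypothetical helper: map anonymous memory, asking for strong access
 * ordering first and retrying without it if the kernel rejects the request.
 * *got_sao reports whether the mmap() call that included PROT_SAO succeeded.
 */
static void *map_prefer_sao(size_t len, int *got_sao)
{
	int prot = PROT_READ | PROT_WRITE;
	int flags = MAP_PRIVATE | MAP_ANONYMOUS;
	void *p;

	assert(len > 0);

	p = mmap(NULL, len, prot | PROT_SAO, flags, -1, 0);
	*got_sao = (p != MAP_FAILED);
	if (p == MAP_FAILED)
		p = mmap(NULL, len, prot, flags, -1, 0);

	return p;	/* MAP_FAILED if both attempts failed */
}
```

Code that actually depends on the ordering guarantee cannot simply fall
back, of course; it would have to treat got_sao == 0 as an error.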


This obviously risks breaking userspace, even if we think it won't in
practice. I guess we don't really have any option given the hardware
support is being dropped.

Can you repost with a wider Cc list, including linux-mm and linux-arch?

I wonder if we should add a comment to the uapi header, e.g.:

diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
index c0c737215b00..d4fdbe768997 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -11,7 +11,7 @@
 #include 
 
 
-#define PROT_SAO   0x10    /* Strong Access Ordering */
+#define PROT_SAO   0x10    /* Unsupported since v5.9 */
 
 #define MAP_RENAME  MAP_ANONYMOUS   /* In SunOS terminology */
 #define MAP_NORESERVE   0x40    /* don't reserve swap pages */


> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index f17442c3a092..d9e92586f8dc 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -20,9 +20,13 @@
>  #define _PAGE_RW (_PAGE_READ | _PAGE_WRITE)
>  #define _PAGE_RWX (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
>  #define _PAGE_PRIVILEGED 0x8 /* kernel access only */
> -#define _PAGE_SAO 0x00010 /* Strong access order */
> +
> +#define _PAGE_CACHE_CTL  0x00030 /* Bits for the folowing cache modes */
> + /*  No bits set is normal cacheable memory */
> + /*  0x00010 unused, is SAO bit on radix POWER9 */
>  #define _PAGE_NON_IDEMPOTENT 0x00020 /* non idempotent memory */
>  #define _PAGE_TOLERANT   0x00030 /* tolerant memory, cache inhibited */
> +

Why'd you do it that way vs just dropping _PAGE_SAO from the OR below?

> diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
> index bac2252c839e..c7e923ba 100644
> --- a/arch/powerpc/include/asm/cputable.h
> +++ b/arch/powerpc/include/asm/cputable.h
> @@ -191,7 +191,6 @@ static inline void cpu_feature_keys_init(void) { }
>  #define CPU_FTR_SPURR LONG_ASM_CONST(0x0100)
>  #define CPU_FTR_DSCR LONG_ASM_CONST(0x0200)
>  #define CPU_FTR_VSX  LONG_ASM_CONST(0x0400)
> -#define CPU_FTR_SAO  LONG_ASM_CONST(0x0800)

Can you do:

+// Free LONG_ASM_CONST(0x0800)

> diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
> index 9bb9bb370b53..579c9229124b 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_64.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_64.h
> @@ -400,7 +400,8 @@ static inline bool hpte_cache_flags_ok(unsigned long hptel, bool is_ci)
>  
>   /* Handle SAO */
>   if (wimg == (HPTE_R_W | HPTE_R_I | HPTE_R_M) &&
> - cpu_has_feature(CPU_FTR_ARCH_206))
> + cpu_has_feature(CPU_FTR_ARCH_206) &&
> + !cpu_has_feature(CPU_FTR_ARCH_31))
>   wimg = HPTE_R_M;

Shouldn't it reject that combination if the host can't support it?

Or I guess it does, but yikes that code is not clear.
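
To spell out why I think it does reject it, here is a standalone model of
the check. It is a sketch, not the kernel code: the rest of the function
after the quoted hunk is reconstructed from memory, the HPTE_R_* values
are assumed to match the hash MMU header, and the two bool parameters
stand in for the cpu_has_feature() calls.

```c
#include <assert.h>
#include <stdbool.h>

/* WIMG bits of the HPTE second doubleword (values assumed, see above) */
#define HPTE_R_W	0x40UL
#define HPTE_R_I	0x20UL
#define HPTE_R_M	0x10UL
#define HPTE_R_WIMG	0x78UL

/* Model of hpte_cache_flags_ok(); arch_206/arch_31 replace cpu_has_feature() */
static bool model_cache_flags_ok(unsigned long hptel, bool is_ci,
				 bool arch_206, bool arch_31)
{
	unsigned long wimg = hptel & HPTE_R_WIMG;

	/* SAO is encoded as W|I|M; fold it to plain M only where supported */
	if (wimg == (HPTE_R_W | HPTE_R_I | HPTE_R_M) && arch_206 && !arch_31)
		wimg = HPTE_R_M;

	/* Cacheable host mapping: only plain M is acceptable */
	if (!is_ci)
		return wimg == HPTE_R_M;

	/* Cache-inhibited host mapping: W must be clear and I must be set */
	if (wimg & HPTE_R_W)
		return false;
	return (wimg & HPTE_R_I) != 0;
}

/* Example checks for the SAO encoding under the assumptions above */
static void model_example_checks(void)
{
	unsigned long sao = HPTE_R_W | HPTE_R_I | HPTE_R_M;

	assert(model_cache_flags_ok(sao, false, true, false));	/* ISA 2.06..3.0 */
	assert(!model_cache_flags_ok(sao, false, true, true));	/* ISA 3.1, cacheable */
	assert(!model_cache_flags_ok(sao, true, true, true));	/* ISA 3.1, CI */
}
```

So on v3.1 the W|I|M value is left alone and then fails both later tests,
which is the rejection, just not an obvious one.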

> diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
> index d610c2e07b28..43a62f3e21a0 100644
> --- a/arch/powerpc/include/asm/mman.h
> +++ b/arch/powerpc/include/asm/mman.h
> @@ -13,38 +13,24 @@
>  #include 
>  #include 
>  
> -/*
> - * This file is included by linux/mman.h, so we can't use cacl_vm_prot_bits()
> - * here.  How important is the optimization?
> - */

This comment seems confused, but also unrelated to this patch?

> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
> index 3a409517c031..8d2e4043702f 100644
> --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
> +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
> @@ -622,7 +622,7 @@ static struct dt_cpu_feature_match __initdata
>   {"processor-control-facility-v3", feat_enable_dbell, CPU_FTR_DBELL},
>   {"processor-utilization-of-resources-register", feat_enable_purr, 0},
>   {"no-execute", feat_enable, 0},
> - {"strong-access-ordering", feat_enable,