Re: [PATCH v8 26/29] vfio-pci: Register an iommu fault handler

2019-06-18 Thread Jacob Pan
On Tue, 18 Jun 2019 15:04:36 +0100
Jean-Philippe Brucker  wrote:

> On 12/06/2019 19:53, Jacob Pan wrote:
> >>> You are right, the worst case of the spurious PS is to terminate
> >>> the group prematurely. Need to know the scope of the HW damage in
> >>> case of mdev where group IDs can be shared among mdevs belong to
> >>> the same PF.
> >>
> >> But from the IOMMU fault API point of view, the full page request
> >> is identified by both PRGI and PASID. Given that each mdev has its
> >> own set of PASIDs, it should be easy to isolate page responses per
> >> mdev. 
> > On Intel platform, devices sending page request with private data
> > must receive page response with matching private data. If we solely
> > depend on PRGI and PASID, we may send stale private data to the
> > device in those incorrect page response. Since private data may
> > represent PF device wide contexts, the consequence of sending page
> > response with wrong private data may affect other mdev/PASID.
> > 
> > One solution we are thinking to do is to inject the sequence #(e.g.
> > ktime raw mono clock) as vIOMMU private data into the guest.
> > Guest would return this fake private data in page response, then
> > host will send page response back to the device that matches PRGI
> > and PASID and private_data.
> > 
> > This solution does not expose HW context related private data to the
> > guest but need to extend page response in iommu uapi.
> > 
> > /**
> >  * struct iommu_page_response - Generic page response information
> >  * @version: API version of this structure
> >  * @flags: encodes whether the corresponding fields are valid
> >  * (IOMMU_FAULT_PAGE_RESPONSE_* values)
> >  * @pasid: Process Address Space ID
> >  * @grpid: Page Request Group Index
> >  * @code: response code from  iommu_page_response_code
> >  * @private_data: private data for the matching page request
> >  */
> > struct iommu_page_response {
> > #define IOMMU_PAGE_RESP_VERSION_1   1
> > __u32   version;
> > #define IOMMU_PAGE_RESP_PASID_VALID (1 << 0)
> > #define IOMMU_PAGE_RESP_PRIVATE_DATA(1 << 1)
> > __u32   flags;
> > __u32   pasid;
> > __u32   grpid;
> > __u32   code;
> > __u32   padding;
> > __u64   private_data[2];
> > };
> > 
> > There is also the change needed for separating storage for the real
> > and fake private data.
> > 
> > Sorry for the last minute change, did not realize the HW
> > implications.
> > 
> > I see this as a future extension due to limited testing,   
> 
> I'm wondering how we deal with:
> (1) old userspace that won't fill the new private_data field in
> page_response. A new kernel still has to support it.
> (2) old kernel that won't recognize the new PRIVATE_DATA flag.
> Currently iommu_page_response() rejects page responses with unknown
> flags.
> 
> I guess we'll need a two-way negotiation, where userspace queries
> whether the kernel supports the flag (2), and the kernel learns
> whether it should expect the private data to come back (1).
> 
I am not sure case (1) exists, in that there is no existing userspace
that supports PRQ without private data. Am I missing something?

For VT-d emulation, private data is always part of the scalable mode
PASID capability. If the vIOMMU queries the host and it supports PASID
and scalable mode, it will always support private data once PRQ is
enabled.

So I think we only need to negotiate (2), which should be covered by
the VT-d PASID cap.

> > perhaps for
> > now, can you add paddings similar to page request? Make it 64B as
> > well.  
> 
> I don't think padding is necessary, because iommu_page_response is
> sent by userspace to the kernel, unlike iommu_fault which is
> allocated by userspace and filled by the kernel.
> 
> Page response looks a lot more like existing VFIO mechanisms, so I
> suppose we'll wrap the iommu_page_response structure and include an
> argsz parameter at the top:
> 
>   struct vfio_iommu_page_response {
>   u32 argsz;
>   struct iommu_page_response pr;
>   };
> 
>   struct vfio_iommu_page_response vpr = {
>   .argsz = sizeof(vpr),
>   .pr = ...
>   ...
>   };
> 
>   ioctl(devfd, VFIO_IOMMU_PAGE_RESPONSE, &vpr);
> 
> In that case supporting private data can be done by simply appending a
> field at the end (plus the negotiation above).
> 
Do you mean at the end of struct vfio_iommu_page_response{}, or at
the end of struct iommu_page_response{}?

The consumer of the private data is the IOMMU driver, not VFIO. So I
think you want to add the new field at the end of
struct iommu_page_response, right?
I think that would work; just wanted to clarify.
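
To illustrate the matching described above, here is a rough sketch of how
the host could map the fake private data (the injected sequence number) back
to the real HW private data before completing the response. All of the names
below are made up for illustration; only struct iommu_page_response follows
the layout proposed earlier in this thread.

	#include <linux/list.h>
	#include <linux/types.h>

	/* Illustrative only: per-PF bookkeeping of outstanding page requests,
	 * keyed by PASID, PRGI and the fake private data given to the guest. */
	struct prq_pending {
		struct list_head node;
		u32 pasid;
		u32 grpid;
		u64 fake_private;	/* sequence number handed to the vIOMMU */
		u64 real_private[2];	/* value the PF device expects back */
	};

	static struct prq_pending *find_pending(struct list_head *pending,
						const struct iommu_page_response *msg)
	{
		struct prq_pending *p;

		list_for_each_entry(p, pending, node)
			if (p->pasid == msg->pasid && p->grpid == msg->grpid &&
			    p->fake_private == msg->private_data[0])
				return p;

		return NULL;	/* stale or spurious response: drop it */
	}

The page response actually sent to the device would then carry
p->real_private rather than the value returned by the guest.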


[PATCH v3 2/2] KVM: arm64: Skip more of the SError vaxorcism

2019-06-18 Thread James Morse
During __guest_exit() we need to consume any SError left pending by the
guest so it doesn't contaminate the host. With v8.2 we use the
ESB-instruction. For systems without v8.2, we use dsb+isb and unmask
SError. We do this on every guest exit.

Use the same dsb+isr_el1 trick: this lets us know if an SError is pending
after the dsb, allowing us to skip the isb and the self-synchronising
PSTATE write if it's not.

This means SError remains masked during KVM's world-switch, so any SError
that occurs during this time is reported by the host, instead of causing
a hyp-panic.

As we're benchmarking this code, let's polish the layout. If you give gcc
likely()/unlikely() hints in an if() condition, it shuffles the generated
assembly so that the likely case is immediately after the branch. Let's
do the same here.
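
As a toy C illustration of the hint mentioned above (not part of the patch):
the kernel's likely()/unlikely() helpers expand to __builtin_expect(), which
lets gcc keep the expected case on the fall-through path right after the
conditional branch. The function below is invented purely to show the shape.

	#define likely(x)	__builtin_expect(!!(x), 1)
	#define unlikely(x)	__builtin_expect(!!(x), 0)

	/* gcc lays out the likely "nothing pending" body immediately after the
	 * conditional branch and moves the unlikely body out of line, which is
	 * the layout the hand-written assembly in this patch mimics. */
	static int serror_pending(unsigned long isr_a_bit)
	{
		if (likely(!isr_a_bit))
			return 0;	/* fast path: falls straight through */
		return 1;		/* slow path: placed after the fast path */
	}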

Signed-off-by: James Morse 

Changes since v2:
 * Added isb after the dsb to prevent an early read
---
 arch/arm64/kvm/hyp/entry.S | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index 017ec4189a08..269e7b2da1fd 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -162,8 +162,16 @@ alternative_if ARM64_HAS_RAS_EXTN
orr x0, x0, #(1<<ARM_EXIT_WITH_SERROR_BIT)

[PATCH v3 1/2] KVM: arm64: Re-mask SError after the one instruction window

2019-06-18 Thread James Morse
KVM consumes any SErrors that were pending during guest exit with a
dsb/isb and by unmasking SError. It currently leaves SError unmasked for
the rest of world-switch.

This means any SError that occurs during this part of world-switch
will cause a hyp-panic. We'd much prefer it to remain pending until
we return to the host.

Signed-off-by: James Morse 
---
 arch/arm64/kvm/hyp/entry.S | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index d9a55503fab7..017ec4189a08 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -186,6 +186,8 @@ abort_guest_exit_start:
.global abort_guest_exit_end
 abort_guest_exit_end:
 
+   msr daifset, #4 // Mask aborts
+
// If the exception took place, restore the EL1 exception
// context so that we can report some information.
// Merge the exception code with the SError pending bit.
-- 
2.20.1



[PATCH v3 0/2] KVM: arm64: Skip more of the SError vaxorcism

2019-06-18 Thread James Morse
In guest_enter we used ISR_EL1 to know if an SError is pending as we
really don't want to take it as an exception. We can do the same
in guest_exit, which saves toggling bits in pstate.

This lets us leave SError masked for the remainder of world-switch
without having to toggle pstate twice.

Changes since v2:
 * Added patch 1 of this series to make the 'SError remains masked'
   behaviour explicit
 * Added missing isb before the isr_el1 read.

James Morse (2):
  KVM: arm64: Re-mask SError after the one instruction window
  KVM: arm64: Skip more of the SError vaxorcism

 arch/arm64/kvm/hyp/entry.S | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

-- 
2.20.1



[PATCH v3 6/6] arm64: Update silicon-errata.txt for Neoverse-N1 #1349291

2019-06-18 Thread James Morse
Neoverse-N1 cores affected by #1349291 may report an Uncontained RAS Error
as Unrecoverable. The kernel's architecture code already considers
Unrecoverable errors fatal, as without kernel-first support no
further error-handling is possible.

Now that KVM attributes SError to the host/guest more precisely,
the host's architecture code will always handle host errors that
become pending during world-switch.
Errors misclassified by this erratum that affected the guest will be
re-injected into the guest as an implementation-defined SError, which can
be uncontained.

Until kernel-first support is implemented, no workaround is needed
for this issue.

Signed-off-by: James Morse 
---
imp-def SError can mean uncontained. In the RAS spec, 2.4.2 "ESB and other
containable errors":
| It is [imp-def] whether [imp-def] and uncategorized SError interrupts
| are containable or Uncontainable.
---
 Documentation/arm64/silicon-errata.txt | 1 +
 arch/arm64/kernel/traps.c  | 4 
 2 files changed, 5 insertions(+)

diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
index 2735462d5958..51d506a1f8dc 100644
--- a/Documentation/arm64/silicon-errata.txt
+++ b/Documentation/arm64/silicon-errata.txt
@@ -63,6 +63,7 @@ stable kernels.
 | ARM| Cortex-A76  | #1286807| ARM64_ERRATUM_1286807   |
 | ARM| Cortex-A76  | #1463225| ARM64_ERRATUM_1463225   |
 | ARM| Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040   |
+| ARM| Neoverse-N1 | #1349291| N/A |
 | ARM| MMU-500 | #841119,826419  | N/A |
 || | | |
 | Cavium | ThunderX ITS| #22375,24313| CAVIUM_ERRATUM_22375|
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index e6be1a6efc0a..9f64cd609298 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -888,6 +888,10 @@ bool arm64_is_fatal_ras_serror(struct pt_regs *regs, unsigned int esr)
/*
 * The CPU can't make progress. The exception may have
 * been imprecise.
+*
+* Neoverse-N1 #1349291 means a non-KVM SError reported as
+* Unrecoverable should be treated as Uncontainable. We
+* call arm64_serror_panic() in both cases.
 */
return true;
 
-- 
2.20.1



[PATCH v3 4/6] KVM: arm64: Consume pending SError as early as possible

2019-06-18 Thread James Morse
On systems with v8.2 we switch the 'vaxorcism' of guest SError with an
alternative sequence that uses the ESB-instruction, then reads DISR_EL1.
This saves the unmasking and remasking of asynchronous exceptions.

We do this after we've saved the guest registers and restored the
host's. Any SError that becomes pending due to this will be accounted
to the guest, even though it actually occurred during host execution.

Move the ESB-instruction as early as possible. Any guest SError
will become pending due to this ESB-instruction and then be consumed
into DISR_EL1 before the host touches anything.

This lets us account for host/guest SError precisely on the guest
exit exception boundary.

Because the ESB-instruction now lands in the preamble section of
the vectors, we need to add it to the unpatched indirect vectors
too, and to any sequence that may be patched in over the top.

The ESB-instruction always lives in the head of the vectors,
so that it comes before any memory write, whereas the register-store
always lives in the tail.

Signed-off-by: James Morse 
---
Changes since v1:
 * nop in the invalid vector, now that we check its size
 * esb in the unpatched head for
   ARM64_HARDEN_EL2_VECTORS && !_HARDEN_BRANCH_PREDICTOR
---
 arch/arm64/include/asm/kvm_asm.h | 2 +-
 arch/arm64/kvm/hyp/entry.S   | 5 ++---
 arch/arm64/kvm/hyp/hyp-entry.S   | 6 +-
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 96c2d79063aa..38b46ce1dfee 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -45,7 +45,7 @@
  * Size of the HYP vectors preamble. kvm_patch_vector_branch() generates code
  * that jumps over this.
  */
-#define KVM_VECTOR_PREAMBLE(1 * AARCH64_INSN_SIZE)
+#define KVM_VECTOR_PREAMBLE(2 * AARCH64_INSN_SIZE)
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index 93ba3d7ef027..7863ec5266e2 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -138,8 +138,8 @@ ENTRY(__guest_exit)
 
 alternative_if ARM64_HAS_RAS_EXTN
// If we have the RAS extensions we can consume a pending error
-   // without an unmask-SError and isb.
-   esb
+   // without an unmask-SError and isb. The ESB-instruction consumed any
+   // pending guest error when we took the exception from the guest.
mrs_s   x2, SYS_DISR_EL1
str x2, [x1, #(VCPU_FAULT_DISR - VCPU_CONTEXT)]
cbz x2, 1f
@@ -157,7 +157,6 @@ alternative_else
mov x5, x0
 
dsb sy  // Synchronize against in-flight ld/st
-   nop
msr daifclr, #4 // Unmask aborts
 alternative_endif
 
diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
index 5f0412f124a3..8fbfac35f83f 100644
--- a/arch/arm64/kvm/hyp/hyp-entry.S
+++ b/arch/arm64/kvm/hyp/hyp-entry.S
@@ -237,6 +237,7 @@ ENDPROC(\label)
 .macro valid_vect target
.align 7
 661:
+   esb
stp x0, x1, [sp, #-16]!
 662:
b   \target
@@ -248,6 +249,7 @@ check_preamble_length 661b, 662b
.align 7
 661:
b   \target
+   nop
 662:
ldp x0, x1, [sp], #16
b   \target
@@ -280,7 +282,8 @@ ENDPROC(__kvm_hyp_vector)
 #ifdef CONFIG_KVM_INDIRECT_VECTORS
 .macro hyp_ventry
.align 7
-1: .rept 27
+1: esb
+   .rept 26
nop
.endr
 /*
@@ -328,6 +331,7 @@ ENTRY(__bp_harden_hyp_vecs_end)
.popsection
 
 ENTRY(__smccc_workaround_1_smc_start)
+   esb
sub sp, sp, #(8 * 4)
stp x2, x3, [sp, #(8 * 0)]
stp x0, x1, [sp, #(8 * 2)]
-- 
2.20.1



[PATCH v3 0/6] KVM: arm64: Account host/guest SError more precisely (Neoverse-N1 #1349291)

2019-06-18 Thread James Morse
Hello,

This series started as a workaround for Neoverse-N1 #1349291, but has
become an improvement in RAS error accounting for KVM on arm64.

Neoverse-N1 affected by #1349291 may report an Uncontained RAS Error
as Unrecoverable. [0] This is the difference between killing the thread and
killing the machine.
The workaround is to treat all Unrecoverable SError as Uncontained.
The arch code's SError handling already does this, due to its nascent
kernel-first support.

So only KVM needs some work: it has its own SError handling, because we want
KVM to handle guest:SError and the host to handle host:SError.

Instead of working around the erratum in KVM, we account SError as precisely
as we can. This means moving the ESB-instruction into the guest-exit
vectors, and deferring guest-entry if there is an SError pending (so that the
host's existing handling takes it).

Changes since v2:
 * Added isb after the dsb to prevent an early read of ISR_EL1.

Changes since v1:
 * Squashed v1:patch5 into v2:patch4. v1:patch6 to be posted separately.
 * Dropped all the performance numbers.
 * Added patch1, making the ESB macro emit a nop if the kconfig feature
   is disabled.
 * Tried to formalise the indirect vectors preamble a little more to
   make changes easier to review
 * Added preamble size checks to the invalid vector entries.
 * Pulled the size check out as a macro now there are two invocations.

Thanks,

James


[0] account-required: 
https://developer.arm.com/docs/sden885747/latest/arm-neoverse-n1-mp050-software-developer-errata-notice

[v1] 
https://lore.kernel.org/linux-arm-kernel/20190604144551.188107-1-james.mo...@arm.com/

James Morse (6):
  arm64: assembler: Switch ESB-instruction with a vanilla nop if
!ARM64_HAS_RAS
  KVM: arm64: Abstract the size of the HYP vectors pre-amble
  KVM: arm64: Make indirect vectors preamble behaviour symmetric
  KVM: arm64: Consume pending SError as early as possible
  KVM: arm64: Defer guest entry when an asynchronous exception is
pending
  arm64: Update silicon-errata.txt for Neoverse-N1 #1349291

 Documentation/arm64/silicon-errata.txt |  1 +
 arch/arm64/include/asm/assembler.h |  4 
 arch/arm64/include/asm/kvm_asm.h   |  6 ++
 arch/arm64/kernel/traps.c  |  4 
 arch/arm64/kvm/hyp/entry.S | 20 ++---
 arch/arm64/kvm/hyp/hyp-entry.S | 30 +-
 arch/arm64/kvm/va_layout.c |  7 +++---
 7 files changed, 60 insertions(+), 12 deletions(-)

-- 
2.20.1



[PATCH v3 3/6] KVM: arm64: Make indirect vectors preamble behaviour symmetric

2019-06-18 Thread James Morse
The KVM indirect vectors support is a little complicated. Different CPUs
may use different exception vectors for KVM that are generated at boot.
Adding new instructions involves checking that all the possible
combinations do the right thing.

To make changes here easier to review, let's state what we expect of the
preamble:
  1. The first vector run must always run the preamble.
  2. Patching the head or tail of the vector shouldn't remove
 preamble instructions.

Today, this is easy as we only have one instruction in the preamble.
Change the unpatched tail of the indirect vector so that it always
runs this, regardless of patching.

Signed-off-by: James Morse 
---
New for v2.
---
 arch/arm64/kvm/hyp/hyp-entry.S | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
index c7f9d5e271a9..5f0412f124a3 100644
--- a/arch/arm64/kvm/hyp/hyp-entry.S
+++ b/arch/arm64/kvm/hyp/hyp-entry.S
@@ -286,7 +286,7 @@ ENDPROC(__kvm_hyp_vector)
 /*
  * The default sequence is to directly branch to the KVM vectors,
  * using the computed offset. This applies for VHE as well as
- * !ARM64_HARDEN_EL2_VECTORS.
+ * !ARM64_HARDEN_EL2_VECTORS. The first vector must always run the preamble.
  *
  * For ARM64_HARDEN_EL2_VECTORS configurations, this gets replaced
  * with:
@@ -302,8 +302,8 @@ ENDPROC(__kvm_hyp_vector)
  * See kvm_patch_vector_branch for details.
  */
 alternative_cb kvm_patch_vector_branch
-   b   __kvm_hyp_vector + (1b - 0b)
-   nop
+   stp x0, x1, [sp, #-16]!
+   b   __kvm_hyp_vector + (1b - 0b + KVM_VECTOR_PREAMBLE)
nop
nop
nop
-- 
2.20.1



[PATCH v3 2/6] KVM: arm64: Abstract the size of the HYP vectors pre-amble

2019-06-18 Thread James Morse
The EL2 vector hardening feature causes KVM to generate vectors for
each type of CPU present in the system. The generated sequences already
do some of the early guest-exit work (i.e. saving registers). To avoid
duplication the generated vectors branch to the original vector just
after the preamble. This size is hard coded.

Adding new instructions to the HYP vector causes strange side effects,
which are difficult to debug as the affected code is patched in at
runtime.

Add KVM_VECTOR_PREAMBLE to tell kvm_patch_vector_branch() how big
the preamble is. The valid_vect macro can then validate this at
build time.

Reviewed-by: Julien Thierry 
Signed-off-by: James Morse 

---
Changes since v1:
 * Use AARCH64_INSN_SIZE
 * Add preamble build check to invalid_vect too
---
 arch/arm64/include/asm/kvm_asm.h |  6 ++
 arch/arm64/kvm/hyp/hyp-entry.S   | 18 +-
 arch/arm64/kvm/va_layout.c   |  7 +++
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index ff73f5462aca..96c2d79063aa 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -41,6 +41,12 @@
{ARM_EXCEPTION_TRAP,"TRAP"  },  \
{ARM_EXCEPTION_HYP_GONE,"HYP_GONE"  }
 
+/*
+ * Size of the HYP vectors preamble. kvm_patch_vector_branch() generates code
+ * that jumps over this.
+ */
+#define KVM_VECTOR_PREAMBLE(1 * AARCH64_INSN_SIZE)
+
 #ifndef __ASSEMBLY__
 
 #include 
diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
index 2b1e686772bf..c7f9d5e271a9 100644
--- a/arch/arm64/kvm/hyp/hyp-entry.S
+++ b/arch/arm64/kvm/hyp/hyp-entry.S
@@ -227,17 +227,32 @@ ENDPROC(\label)
 
.align 11
 
+.macro check_preamble_length start, end
+/* kvm_patch_vector_branch() generates code that jumps over the preamble. */
+.if ((\end-\start) != KVM_VECTOR_PREAMBLE)
+   .error "KVM vector preamble length mismatch"
+.endif
+.endm
+
 .macro valid_vect target
.align 7
+661:
stp x0, x1, [sp, #-16]!
+662:
b   \target
+
+check_preamble_length 661b, 662b
 .endm
 
 .macro invalid_vect target
.align 7
+661:
b   \target
+662:
ldp x0, x1, [sp], #16
b   \target
+
+check_preamble_length 661b, 662b
 .endm
 
 ENTRY(__kvm_hyp_vector)
@@ -282,7 +297,8 @@ ENDPROC(__kvm_hyp_vector)
  * movkx0, #((addr >> 32) & 0x), lsl #32
  * br  x0
  *
- * Where addr = kern_hyp_va(__kvm_hyp_vector) + vector-offset + 4.
+ * Where:
+ * addr = kern_hyp_va(__kvm_hyp_vector) + vector-offset + KVM_VECTOR_PREAMBLE.
  * See kvm_patch_vector_branch for details.
  */
 alternative_cb kvm_patch_vector_branch
diff --git a/arch/arm64/kvm/va_layout.c b/arch/arm64/kvm/va_layout.c
index c712a7376bc1..58b3a91db892 100644
--- a/arch/arm64/kvm/va_layout.c
+++ b/arch/arm64/kvm/va_layout.c
@@ -181,11 +181,10 @@ void kvm_patch_vector_branch(struct alt_instr *alt,
addr |= ((u64)origptr & GENMASK_ULL(10, 7));
 
/*
-* Branch to the second instruction in the vectors in order to
-* avoid the initial store on the stack (which we already
-* perform in the hardening vectors).
+* Branch over the preamble in order to avoid the initial store on
+* the stack (which we already perform in the hardening vectors).
 */
-   addr += AARCH64_INSN_SIZE;
+   addr += KVM_VECTOR_PREAMBLE;
 
/* stp x0, x1, [sp, #-16]! */
insn = aarch64_insn_gen_load_store_pair(AARCH64_INSN_REG_0,
-- 
2.20.1



[PATCH v3 5/6] KVM: arm64: Defer guest entry when an asynchronous exception is pending

2019-06-18 Thread James Morse
SErrors that occur during world-switch's entry to the guest will be
accounted to the guest, as the exception is masked until we enter the
guest... but we want to attribute the SError as precisely as possible.
Reading DISR_EL1 before guest entry requires free registers, and using
ESB+DISR_EL1 to consume and read back the ESR would leave KVM holding
a host SError... We would rather leave the SError pending and let the
host take it once we exit world-switch. To do this, we need to defer
guest-entry if an SError is pending.

Read the ISR to see if an SError (or an IRQ) is pending. If so, fake an
exit. Place this check between __guest_enter()'s save of the host
registers and its restore of the guest's. SErrors that occur between
here and the eret into the guest must have affected the guest's
registers, which we can naturally attribute to the guest.

The dsb is needed to ensure any previous writes have been done before
we read ISR_EL1. On systems without the v8.2 RAS extensions this
doesn't give us anything as we can't contain errors, and the ESR bits
to describe the severity are all implementation-defined. Replace
this with a nop for these systems.

Signed-off-by: James Morse 
---
Changes since v2:
 * Added isb after the dsb to prevent an early read
 * Fixed some spelling
Changes since v1:
 * Squashed later dsb/nop patch in here
---
 arch/arm64/kvm/hyp/entry.S | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index 7863ec5266e2..d9a55503fab7 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -17,6 +17,7 @@
 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -63,6 +64,20 @@ ENTRY(__guest_enter)
// Store the host regs
save_callee_saved_regs x1
 
+   // Now the host state is stored if we have a pending RAS SError it must
+   // affect the host. If any asynchronous exception is pending we defer
+   // the guest entry. The DSB isn't necessary before v8.2 as any SError
+   // would be fatal.
+alternative_if ARM64_HAS_RAS_EXTN
+   dsb nshst
+   isb
+alternative_else_nop_endif
+   mrs x1, isr_el1
+   cbz x1,  1f
+   mov x0, #ARM_EXCEPTION_IRQ
+   ret
+
+1:
add x18, x0, #VCPU_CONTEXT
 
// Macro ptrauth_switch_to_guest format:
-- 
2.20.1



[PATCH v3 1/6] arm64: assembler: Switch ESB-instruction with a vanilla nop if !ARM64_HAS_RAS

2019-06-18 Thread James Morse
The ESB-instruction is a nop on CPUs that don't implement the RAS
extensions. This lets us use it in places like the vectors without
having to use alternatives.

If someone disables CONFIG_ARM64_RAS_EXTN, this instruction still has
its RAS extensions behaviour, but we no longer read DISR_EL1 as this
register does depend on alternatives.

This could go wrong if we want to synchronize an SError from a KVM
guest. On a CPU that has the RAS extensions, but the KConfig option
was disabled, we consume the pending SError with no chance of ever
reading it.

Hide the ESB-instruction behind the CONFIG_ARM64_RAS_EXTN option,
outputting a regular nop if the feature has been disabled.

Reported-by: Julien Thierry 
Signed-off-by: James Morse 
---
New for v2. The esb where this would be a problem is added later in
this series, but there is no build-dependency.
---
 arch/arm64/include/asm/assembler.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 92b6b7cf67dd..2d2114b39846 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -107,7 +107,11 @@
  * RAS Error Synchronization barrier
  */
.macro  esb
+#ifdef CONFIG_ARM64_RAS_EXTN
hint#16
+#else
+   nop
+#endif
.endm
 
 /*
-- 
2.20.1



Re: [PATCH v2] KVM: arm64: Skip more of the SError vaxorcism

2019-06-18 Thread James Morse
Hi Marc,

On 10/06/2019 17:58, Marc Zyngier wrote:
> On 10/06/2019 17:30, James Morse wrote:
>> During __guest_exit() we need to consume any SError left pending by the
>> guest so it doesn't contaminate the host. With v8.2 we use the
>> ESB-instruction. For systems without v8.2, we use dsb+isb and unmask
>> SError. We do this on every guest exit.
>>
>> Use the same dsb+isr_el1 trick, this lets us know if an SError is pending
>> after the dsb, allowing us to skip the isb and self-synchronising PSTATE
>> write if its not.
>>
>> This means SError remains masked during KVM's world-switch, so any SError
>> that occurs during this time is reported by the host, instead of causing
>> a hyp-panic.
> 
> Ah, that'd be pretty good.

I'll add a patch to re-mask it so this intent is clear, and the 
behaviour/performance
stuff is done in separate patches.


>> If you give gcc likely()/unlikely() hints in an if() condition, it
>> shuffles the generated assembly so that the likely case is immediately
>> after the branch. Lets do the same here.
>>
>> Signed-off-by: James Morse 
>> ---
>> This patch was previously posted as part of:
>> [v1] 
>> https://lore.kernel.org/linux-arm-kernel/20190604144551.188107-1-james.mo...@arm.com/
>>
>>  arch/arm64/kvm/hyp/entry.S | 14 ++
>>  1 file changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
>> index a5a4254314a1..c2de1a1faaf4 100644
>> --- a/arch/arm64/kvm/hyp/entry.S
>> +++ b/arch/arm64/kvm/hyp/entry.S
>> @@ -161,18 +161,24 @@ alternative_if ARM64_HAS_RAS_EXTN
>>  orr x0, x0, #(1<<ARM_EXIT_WITH_SERROR_BIT)
>>  1:  ret
>>  alternative_else
>> -// If we have a pending asynchronous abort, now is the
>> -// time to find out. From your VAXorcist book, page 666:
>> +dsb sy  // Synchronize against in-flight ld/st
>> +mrs x2, isr_el1
> 
> The CPU is allowed to perform a system register access before the DSB
> completes if it doesn't have a side effect. Reading ISR_EL1 doesn't have
> such a side effect, so you could end up missing the abort. An ISB after
> DSB should cure that,

... bother ...


> but you'll need to verify that it doesn't make
> things much worse than what we already have.

Retested with isb in both patches, and Robin's better assembly.

This still saves the self-synchronising pstate modification (of which we'd
need two if we want to keep SError masked over the rest of world-switch).

On Xgene:
| 5.2.0-rc2-6-g9b94314 mean:3215 stddev:45
| 5.2.0-rc2-7-g5d37b0b mean:3176 stddev:30
| with this patch 1.23% faster

On Seattle:
| 5.2.0-rc2-6-g9b9431445730 mean:4474 stddev:10
| 5.2.0-rc2-7-g5d37b0b5dd65 mean:4410 stddev:27
| with this patch: 1.44% faster


Thanks,

James


Re: [PATCH v2] KVM: arm64: Skip more of the SError vaxorcism

2019-06-18 Thread James Morse
Hi Robin,

On 10/06/2019 17:38, Robin Murphy wrote:
> On 10/06/2019 17:30, James Morse wrote:
>> During __guest_exit() we need to consume any SError left pending by the
>> guest so it doesn't contaminate the host. With v8.2 we use the
>> ESB-instruction. For systems without v8.2, we use dsb+isb and unmask
>> SError. We do this on every guest exit.
>>
>> Use the same dsb+isr_el1 trick, this lets us know if an SError is pending
>> after the dsb, allowing us to skip the isb and self-synchronising PSTATE
>> write if its not.
>>
>> This means SError remains masked during KVM's world-switch, so any SError
>> that occurs during this time is reported by the host, instead of causing
>> a hyp-panic.
>>
>> If you give gcc likely()/unlikely() hints in an if() condition, it
>> shuffles the generated assembly so that the likely case is immediately
>> after the branch. Lets do the same here.

>> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
>> index a5a4254314a1..c2de1a1faaf4 100644
>> --- a/arch/arm64/kvm/hyp/entry.S
>> +++ b/arch/arm64/kvm/hyp/entry.S
>>   orr    x0, x0, #(1<<ARM_EXIT_WITH_SERROR_BIT)
>>   1:    ret
>>   orr    x0, x0, #(1<>   1:    ret
>>   alternative_else
>> -    // If we have a pending asynchronous abort, now is the
>> -    // time to find out. From your VAXorcist book, page 666:
>> +    dsb    sy    // Synchronize against in-flight ld/st
>> +    mrs    x2, isr_el1
>> +    and    x2, x2, #(1<<8)    // ISR_EL1.A
>> +    cbnz    x2, 2f

> It doesn't appear that anyone cares much about x2 containing the masked
> value after returning, so is this just a needlessly long-form TBNZ?

Yes, I'd make a third-rate compiler.

(I almost certainly had 'cmp x2, xzr' in there at some point!)


Thanks,

James


Re: [PATCH v8 26/29] vfio-pci: Register an iommu fault handler

2019-06-18 Thread Jean-Philippe Brucker
On 12/06/2019 19:53, Jacob Pan wrote:
>>> You are right, the worst case of the spurious PS is to terminate the
>>> group prematurely. Need to know the scope of the HW damage in case
>>> of mdev where group IDs can be shared among mdevs belong to the
>>> same PF.  
>>
>> But from the IOMMU fault API point of view, the full page request is
>> identified by both PRGI and PASID. Given that each mdev has its own
>> set of PASIDs, it should be easy to isolate page responses per mdev.
>>
> On Intel platform, devices sending page request with private data must
> receive page response with matching private data. If we solely depend
> on PRGI and PASID, we may send stale private data to the device in
> those incorrect page response. Since private data may represent PF
> device wide contexts, the consequence of sending page response with
> wrong private data may affect other mdev/PASID.
> 
> One solution we are thinking to do is to inject the sequence #(e.g.
> ktime raw mono clock) as vIOMMU private data into the guest. Guest
> would return this fake private data in page response, then host will
> send page response back to the device that matches PRGI and PASID and
> private_data.
> 
> This solution does not expose HW context related private data to the
> guest but need to extend page response in iommu uapi.
> 
> /**
>  * struct iommu_page_response - Generic page response information
>  * @version: API version of this structure
>  * @flags: encodes whether the corresponding fields are valid
>  * (IOMMU_FAULT_PAGE_RESPONSE_* values)
>  * @pasid: Process Address Space ID
>  * @grpid: Page Request Group Index
>  * @code: response code from  iommu_page_response_code
>  * @private_data: private data for the matching page request
>  */
> struct iommu_page_response {
> #define IOMMU_PAGE_RESP_VERSION_1 1
>   __u32   version;
> #define IOMMU_PAGE_RESP_PASID_VALID   (1 << 0)
> #define IOMMU_PAGE_RESP_PRIVATE_DATA  (1 << 1)
>   __u32   flags;
>   __u32   pasid;
>   __u32   grpid;
>   __u32   code;
>   __u32   padding;
>   __u64   private_data[2];
> };
> 
> There is also the change needed for separating storage for the real and
> fake private data.
> 
> Sorry for the last minute change, did not realize the HW implications.
> 
> I see this as a future extension due to limited testing, 

I'm wondering how we deal with:
(1) old userspace that won't fill the new private_data field in
page_response. A new kernel still has to support it.
(2) old kernel that won't recognize the new PRIVATE_DATA flag. Currently
iommu_page_response() rejects page responses with unknown flags.

I guess we'll need a two-way negotiation, where userspace queries
whether the kernel supports the flag (2), and the kernel learns whether
it should expect the private data to come back (1).

> perhaps for
> now, can you add paddings similar to page request? Make it 64B as well.

I don't think padding is necessary, because iommu_page_response is sent
by userspace to the kernel, unlike iommu_fault which is allocated by
userspace and filled by the kernel.

Page response looks a lot more like existing VFIO mechanisms, so I
suppose we'll wrap the iommu_page_response structure and include an
argsz parameter at the top:

	struct vfio_iommu_page_response {
		u32 argsz;
		struct iommu_page_response pr;
	};

	struct vfio_iommu_page_response vpr = {
		.argsz = sizeof(vpr),
		.pr = ...
		...
	};

	ioctl(devfd, VFIO_IOMMU_PAGE_RESPONSE, &vpr);

In that case supporting private data can be done by simply appending a
field at the end (plus the negotiation above).
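
As a sketch of that "append a field at the end" idea — assuming the argsz
wrapper above, and assuming for the sake of the sketch that the private data
is not folded into iommu_page_response but appended to the VFIO wrapper (the
follow-up discussion asks whether it should instead go at the end of
iommu_page_response) — the kernel could tell old userspace apart by comparing
argsz with the offset of the new field. The appended field and helper below
are hypothetical, not existing uapi; offsetofend() is the usual kernel helper
from linux/stddef.h.

	struct vfio_iommu_page_response {
		__u32	argsz;
		struct iommu_page_response pr;
		__u64	private_data[2];	/* hypothetical extension */
	};

	static bool vpr_has_private_data(const struct vfio_iommu_page_response *vpr)
	{
		/* Old userspace passes a smaller argsz and never sets the field. */
		return vpr->argsz >= offsetofend(struct vfio_iommu_page_response,
						 private_data);
	}

That covers direction (1); direction (2) still needs a way for userspace to
probe whether the kernel accepts the new flag before setting it.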

Thanks,
Jean