Re: [PATCH v8 26/29] vfio-pci: Register an iommu fault handler
On Tue, 18 Jun 2019 15:04:36 +0100 Jean-Philippe Brucker wrote:
> On 12/06/2019 19:53, Jacob Pan wrote:
> >>> You are right, the worst case of the spurious PS is to terminate
> >>> the group prematurely. Need to know the scope of the HW damage in
> >>> case of mdev where group IDs can be shared among mdevs belong to
> >>> the same PF.
> >>
> >> But from the IOMMU fault API point of view, the full page request
> >> is identified by both PRGI and PASID. Given that each mdev has its
> >> own set of PASIDs, it should be easy to isolate page responses per
> >> mdev.
> >
> > On Intel platform, devices sending page request with private data
> > must receive page response with matching private data. If we solely
> > depend on PRGI and PASID, we may send stale private data to the
> > device in those incorrect page response. Since private data may
> > represent PF device wide contexts, the consequence of sending page
> > response with wrong private data may affect other mdev/PASID.
> >
> > One solution we are thinking to do is to inject the sequence # (e.g.
> > ktime raw mono clock) as vIOMMU private data into the guest. Guest
> > would return this fake private data in page response, then host will
> > send page response back to the device that matches PRG1 and PASID
> > and private_data.
> >
> > This solution does not expose HW context related private data to the
> > guest but need to extend page response in iommu uapi.
> >
> > /**
> >  * struct iommu_page_response - Generic page response information
> >  * @version: API version of this structure
> >  * @flags: encodes whether the corresponding fields are valid
> >  *         (IOMMU_FAULT_PAGE_RESPONSE_* values)
> >  * @pasid: Process Address Space ID
> >  * @grpid: Page Request Group Index
> >  * @code: response code from iommu_page_response_code
> >  * @private_data: private data for the matching page request
> >  */
> > struct iommu_page_response {
> > #define IOMMU_PAGE_RESP_VERSION_1	1
> > 	__u32	version;
> > #define IOMMU_PAGE_RESP_PASID_VALID	(1 << 0)
> > #define IOMMU_PAGE_RESP_PRIVATE_DATA	(1 << 1)
> > 	__u32	flags;
> > 	__u32	pasid;
> > 	__u32	grpid;
> > 	__u32	code;
> > 	__u32	padding;
> > 	__u64	private_data[2];
> > };
> >
> > There is also the change needed for separating storage for the real
> > and fake private data.
> >
> > Sorry for the last minute change, did not realize the HW
> > implications.
> >
> > I see this as a future extension due to limited testing,
>
> I'm wondering how we deal with:
> (1) old userspace that won't fill the new private_data field in
>     page_response. A new kernel still has to support it.
> (2) old kernel that won't recognize the new PRIVATE_DATA flag.
>     Currently iommu_page_response() rejects page responses with
>     unknown flags.
>
> I guess we'll need a two-way negotiation, where userspace queries
> whether the kernel supports the flag (2), and the kernel learns
> whether it should expect the private data to come back (1).

I am not sure case (1) exists, in that there is no existing userspace
that supports PRQ w/o private data. Am I missing something?

For VT-d emulation, private data is always part of the scalable-mode
PASID capability. If the vIOMMU queries it and the host supports PASID
and scalable mode, it will always support private data once PRQ is
enabled. So I think we only need to negotiate (2), which should be
covered by the VT-d PASID cap.

> > perhaps for
> > now, can you add paddings similar to page request? Make it 64B as
> > well.
>
> I don't think padding is necessary, because iommu_page_response is
> sent by userspace to the kernel, unlike iommu_fault which is
> allocated by userspace and filled by the kernel.
>
> Page response looks a lot more like existing VFIO mechanisms, so I
> suppose we'll wrap the iommu_page_response structure and include an
> argsz parameter at the top:
>
> 	struct vfio_iommu_page_response {
> 		u32 argsz;
> 		struct iommu_page_response pr;
> 	};
>
> 	struct vfio_iommu_page_response vpr = {
> 		.argsz = sizeof(vpr),
> 		.pr = ...
> 		...
> 	};
>
> 	ioctl(devfd, VFIO_IOMMU_PAGE_RESPONSE, &vpr);
>
> In that case supporting private data can be done by simply appending a
> field at the end (plus the negotiation above).

Do you mean at the end of struct vfio_iommu_page_response{}, or at the
end of struct iommu_page_response{}? The consumer of the private data
is the iommu driver, not vfio, so I think you want to add the new field
at the end of struct iommu_page_response, right? I think that would
work, just to clarify.

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
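[Editor's note] The backward-compatibility concern (2) above — an old kernel rejecting a response that carries the new PRIVATE_DATA flag — can be illustrated with a small standalone sketch. This is not kernel code; `old_kernel_page_response()` is a toy model of the strict flag validation described for iommu_page_response(), using the flag values quoted in the thread:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Flag values from the iommu uapi struct quoted above. */
#define IOMMU_PAGE_RESP_PASID_VALID	(1u << 0)
#define IOMMU_PAGE_RESP_PRIVATE_DATA	(1u << 1)

/* An "old" kernel that predates PRIVATE_DATA only knows PASID_VALID. */
#define OLD_KERNEL_KNOWN_FLAGS		IOMMU_PAGE_RESP_PASID_VALID

/* Toy model of iommu_page_response()'s strict flag validation:
 * a page response carrying any flag the kernel doesn't know about
 * is rejected outright. */
static int old_kernel_page_response(uint32_t flags)
{
	if (flags & ~OLD_KERNEL_KNOWN_FLAGS)
		return -EINVAL;	/* unknown flag: response rejected */
	return 0;
}
```

Because the old kernel rejects the new flag rather than ignoring it, userspace cannot simply always set PRIVATE_DATA — it has to query support first, which is the negotiation being discussed.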
[PATCH v3 2/2] KVM: arm64: Skip more of the SError vaxorcism
During __guest_exit() we need to consume any SError left pending by the
guest so it doesn't contaminate the host. With v8.2 we use the
ESB-instruction. For systems without v8.2, we use dsb+isb and unmask
SError. We do this on every guest exit.

Use the same dsb+isr_el1 trick: this lets us know if an SError is
pending after the dsb, allowing us to skip the isb and the
self-synchronising PSTATE write if it's not.

This means SError remains masked during KVM's world-switch, so any
SError that occurs during this time is reported by the host, instead of
causing a hyp-panic.

As we're benchmarking this code, let's polish the layout. If you give
gcc likely()/unlikely() hints in an if() condition, it shuffles the
generated assembly so that the likely case is immediately after the
branch. Let's do the same here.

Signed-off-by: James Morse
---
Changes since v2:
 * Added isb after the dsb to prevent an early read
---
 arch/arm64/kvm/hyp/entry.S | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index 017ec4189a08..269e7b2da1fd 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -162,8 +162,16 @@ alternative_if ARM64_HAS_RAS_EXTN
 	orr	x0, x0, #(1
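[Editor's note] The non-v8.2 exit path described in the commit message can be sketched in C (illustrative only — the real code is hand-written assembly in arch/arm64/kvm/hyp/entry.S, and the helper name here is made up). After the dsb (and the isb added in v3 so the system-register read can't start early), ISR_EL1.A tells us whether an SError is actually pending; only then is the expensive unmask/consume/remask sequence needed:

```c
#include <stdbool.h>
#include <stdint.h>

#define ISR_EL1_A	(1u << 8)	/* ISR_EL1.A: SError interrupt pending */

/* Decide whether the slow "unmask SError + isb" consumption is needed.
 * In the likely case (no SError pending) the isb and the
 * self-synchronising PSTATE writes are skipped entirely. */
static bool serror_pending(uint64_t isr_el1)
{
	return isr_el1 & ISR_EL1_A;
}
```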
[PATCH v3 1/2] KVM: arm64: Re-mask SError after the one instruction window
KVM consumes any SError that was pending during guest exit with a
dsb/isb and unmasking SError. It currently leaves SError unmasked for
the rest of world-switch.

This means any SError that occurs during this part of world-switch will
cause a hyp-panic. We'd much prefer it to remain pending until we
return to the host.

Signed-off-by: James Morse
---
 arch/arm64/kvm/hyp/entry.S | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index d9a55503fab7..017ec4189a08 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -186,6 +186,8 @@ abort_guest_exit_start:
 
 	.global	abort_guest_exit_end
 abort_guest_exit_end:
 
+	msr	daifset, #4	// Mask aborts
+
 	// If the exception took place, restore the EL1 exception
 	// context so that we can report some information.
 	// Merge the exception code with the SError pending bit.
-- 
2.20.1
[PATCH v3 0/2] KVM: arm64: Skip more of the SError vaxorcism
In guest_enter we used ISR_EL1 to know if an SError is pending, as we
really don't want to take it as an exception. We can do the same in
guest_exit, which saves toggling bits in pstate.

This lets us leave SError masked for the remainder of world-switch
without having to toggle pstate twice.

Changes since v2:
 * Added patch 1 of this series to make the 'SError remains masked'
   behaviour explicit
 * Added missing isb before the isr_el1 read.

James Morse (2):
  KVM: arm64: Re-mask SError after the one instruction window
  KVM: arm64: Skip more of the SError vaxorcism

 arch/arm64/kvm/hyp/entry.S | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

-- 
2.20.1
[PATCH v3 6/6] arm64: Update silicon-errata.txt for Neoverse-N1 #1349291
Neoverse-N1 affected by #1349291 may report an Uncontained RAS Error as
Unrecoverable. The kernel's architecture code already considers
Unrecoverable errors as fatal, as without kernel-first support no
further error-handling is possible.

Now that KVM attributes SError to the host/guest more precisely, the
host's architecture code will always handle host errors that become
pending during world-switch. Errors misclassified by this erratum that
affected the guest will be re-injected to the guest as an
implementation-defined SError, which can be uncontained.

Until kernel-first support is implemented, no workaround is needed for
this issue.

Signed-off-by: James Morse
---
An imp-def SError can mean uncontained. In the RAS spec, 2.4.2 "ESB and
other containable errors":
| It is [imp-def] whether [imp-def] and uncategorized SError interrupts
| are containable or Uncontainable.
---
 Documentation/arm64/silicon-errata.txt | 1 +
 arch/arm64/kernel/traps.c              | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
index 2735462d5958..51d506a1f8dc 100644
--- a/Documentation/arm64/silicon-errata.txt
+++ b/Documentation/arm64/silicon-errata.txt
@@ -63,6 +63,7 @@ stable kernels.
 | ARM            | Cortex-A76      | #1286807        | ARM64_ERRATUM_1286807       |
 | ARM            | Cortex-A76      | #1463225        | ARM64_ERRATUM_1463225       |
 | ARM            | Neoverse-N1     | #1188873,1418040| ARM64_ERRATUM_1418040       |
+| ARM            | Neoverse-N1     | #1349291        | N/A                         |
 | ARM            | MMU-500         | #841119,826419  | N/A                         |
 |                |                 |                 |                             |
 | Cavium         | ThunderX ITS    | #22375,24313    | CAVIUM_ERRATUM_22375        |
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index e6be1a6efc0a..9f64cd609298 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -888,6 +888,10 @@ bool arm64_is_fatal_ras_serror(struct pt_regs *regs, unsigned int esr)
 		/*
 		 * The CPU can't make progress. The exception may have
 		 * been imprecise.
+		 *
+		 * Neoverse-N1 #1349291 means a non-KVM SError reported as
+		 * Unrecoverable should be treated as Uncontainable. We
+		 * call arm64_serror_panic() in both cases.
 		 */
 		return true;

-- 
2.20.1
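[Editor's note] The policy in the comment above — treat Unrecoverable like Uncontainable, since #1349291 may mis-report one as the other — can be modelled in a standalone sketch. The AET field position and encodings below follow the ARMv8 RAS extension (ESR_ELx.AET, bits [12:10]); the function is a simplified illustration, not the real arm64_is_fatal_ras_serror(), which also examines other ESR bits:

```c
#include <stdbool.h>
#include <stdint.h>

/* ESR_ELx.AET, bits [12:10], per the ARMv8 RAS extension. */
#define AET_SHIFT	10
#define AET_MASK	(0x7u << AET_SHIFT)
#define AET_UC		(0x0u << AET_SHIFT)	/* Uncontainable */
#define AET_UEU		(0x1u << AET_SHIFT)	/* Unrecoverable */
#define AET_UEO		(0x2u << AET_SHIFT)	/* Restartable */
#define AET_UER		(0x3u << AET_SHIFT)	/* Recoverable */
#define AET_CE		(0x6u << AET_SHIFT)	/* Corrected */

/* Simplified model of the classification: Unrecoverable is treated the
 * same as Uncontainable (both fatal), which also covers CPUs where
 * #1349291 reports an Uncontained error as Unrecoverable. */
static bool serror_is_fatal(uint32_t esr)
{
	switch (esr & AET_MASK) {
	case AET_UC:
	case AET_UEU:
		return true;
	default:
		return false;
	}
}
```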
[PATCH v3 4/6] KVM: arm64: Consume pending SError as early as possible
On systems with v8.2 we switch the 'vaxorcism' of guest SError for an
alternative sequence that uses the ESB-instruction, then reads
DISR_EL1. This saves the unmasking and remasking of asynchronous
exceptions.

We do this after we've saved the guest registers and restored the
host's. Any SError that becomes pending due to this will be accounted
to the guest, even though it actually occurred during host execution.

Move the ESB-instruction as early as possible. Any guest SError will
become pending due to this ESB-instruction and then be consumed to
DISR_EL1 before the host touches anything.

This lets us account for host/guest SError precisely on the guest exit
exception boundary.

Because the ESB-instruction now lands in the preamble section of the
vectors, we need to add it to the unpatched indirect vectors too, and
to any sequence that may be patched in over the top.

The ESB-instruction always lives in the head of the vectors, to be
before any memory write. Whereas the register-store always lives in
the tail.

Signed-off-by: James Morse
---
Changes since v1:
 * nop in the invalid vector, now that we check its size
 * esb in the unpatched head for ARM64_HARDEN_EL2_VECTORS &&
   !_HARDEN_BRANCH_PREDICTOR
---
 arch/arm64/include/asm/kvm_asm.h | 2 +-
 arch/arm64/kvm/hyp/entry.S       | 5 ++---
 arch/arm64/kvm/hyp/hyp-entry.S   | 6 +++++-
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 96c2d79063aa..38b46ce1dfee 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -45,7 +45,7 @@
  * Size of the HYP vectors preamble. kvm_patch_vector_branch() generates code
  * that jumps over this.
  */
-#define KVM_VECTOR_PREAMBLE	(1 * AARCH64_INSN_SIZE)
+#define KVM_VECTOR_PREAMBLE	(2 * AARCH64_INSN_SIZE)
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index 93ba3d7ef027..7863ec5266e2 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -138,8 +138,8 @@ ENTRY(__guest_exit)
 
 alternative_if ARM64_HAS_RAS_EXTN
 	// If we have the RAS extensions we can consume a pending error
-	// without an unmask-SError and isb.
-	esb
+	// without an unmask-SError and isb. The ESB-instruction consumed any
+	// pending guest error when we took the exception from the guest.
 	mrs_s	x2, SYS_DISR_EL1
 	str	x2, [x1, #(VCPU_FAULT_DISR - VCPU_CONTEXT)]
 	cbz	x2, 1f
@@ -157,7 +157,6 @@ alternative_else
 	mov	x5, x0
 
 	dsb	sy		// Synchronize against in-flight ld/st
-	nop
 	msr	daifclr, #4	// Unmask aborts
 alternative_endif
 
diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
index 5f0412f124a3..8fbfac35f83f 100644
--- a/arch/arm64/kvm/hyp/hyp-entry.S
+++ b/arch/arm64/kvm/hyp/hyp-entry.S
@@ -237,6 +237,7 @@ ENDPROC(\label)
 .macro valid_vect target
 	.align 7
 661:
+	esb
 	stp	x0, x1, [sp, #-16]!
 662:
 	b	\target
@@ -248,6 +249,7 @@ check_preamble_length 661b, 662b
 	.align 7
 661:
 	b	\target
+	nop
 662:
 	ldp	x0, x1, [sp], #16
 	b	\target
@@ -280,7 +282,8 @@ ENDPROC(__kvm_hyp_vector)
 #ifdef CONFIG_KVM_INDIRECT_VECTORS
 .macro hyp_ventry
 	.align 7
-1:	.rept 27
+1:	esb
+	.rept 26
 	nop
 	.endr
 	/*
@@ -328,6 +331,7 @@ ENTRY(__bp_harden_hyp_vecs_end)
 	.popsection
 
 ENTRY(__smccc_workaround_1_smc_start)
+	esb
 	sub	sp, sp, #(8 * 4)
 	stp	x2, x3, [sp, #(8 * 0)]
 	stp	x0, x1, [sp, #(8 * 2)]
-- 
2.20.1
[PATCH v3 0/6] KVM: arm64: Account host/guest SError more precisely (Neoverse-N1 #1349291)
Hello,

This series started as a workaround for Neoverse-N1 #1349291, but has
become an improvement in RAS error accounting for KVM on arm64.

Neoverse-N1 affected by #1349291 may report an Uncontained RAS Error as
Unrecoverable. [0] This is the difference between killing the thread
and killing the machine.

The workaround is to treat all Unrecoverable SError as Uncontained. The
arch code's SError handling already does this, due to its nascent
kernel-first support. So only KVM needs some work, as it has its own
SError handling: we want KVM to handle guest:SError and the host to
handle host:SError.

Instead of working around the erratum in KVM, we account SError as
precisely as we can. This means moving the ESB-instruction into the
guest-exit vectors, and deferring guest-entry if there is an SError
pending (so that the host's existing handling takes it).

Changes since v2:
 * Added isb after the dsb to prevent an early read of ISR_EL1.

Changes since v1:
 * Squashed v1:patch5 into v2:patch4. v1:patch6 to be posted separately.
 * Dropped all the performance numbers.
 * Added patch 1, making the ESB macro emit a nop if the kconfig
   feature is disabled.
 * Tried to formalise the indirect vectors preamble a little more to
   make changes easier to review.
 * Added preamble size checks to the invalid vector entries.
 * Pulled the size check out as a macro now there are two invocations.

Thanks,

James

[0] (account required)
https://developer.arm.com/docs/sden885747/latest/arm-neoverse-n1-mp050-software-developer-errata-notice

[v1]
https://lore.kernel.org/linux-arm-kernel/20190604144551.188107-1-james.mo...@arm.com/

James Morse (6):
  arm64: assembler: Switch ESB-instruction with a vanilla nop if
    !ARM64_HAS_RAS
  KVM: arm64: Abstract the size of the HYP vectors pre-amble
  KVM: arm64: Make indirect vectors preamble behaviour symmetric
  KVM: arm64: Consume pending SError as early as possible
  KVM: arm64: Defer guest entry when an asynchronous exception is
    pending
  arm64: Update silicon-errata.txt for Neoverse-N1 #1349291

 Documentation/arm64/silicon-errata.txt |  1 +
 arch/arm64/include/asm/assembler.h     |  4 ++++
 arch/arm64/include/asm/kvm_asm.h       |  6 ++++++
 arch/arm64/kernel/traps.c              |  4 ++++
 arch/arm64/kvm/hyp/entry.S             | 20 +++++++++++++++-----
 arch/arm64/kvm/hyp/hyp-entry.S         | 30 +++++++++++++++++++++++++++---
 arch/arm64/kvm/va_layout.c             |  7 ++++---
 7 files changed, 60 insertions(+), 12 deletions(-)

-- 
2.20.1
[PATCH v3 3/6] KVM: arm64: Make indirect vectors preamble behaviour symmetric
The KVM indirect vectors support is a little complicated. Different
CPUs may use different exception vectors for KVM, which are generated
at boot.

Adding new instructions involves checking that all the possible
combinations do the right thing. To make changes here easier to review,
let's state what we expect of the preamble:

  1. The first vector run must always run the preamble.
  2. Patching the head or tail of the vector shouldn't remove preamble
     instructions.

Today, this is easy as we only have one instruction in the preamble.
Change the unpatched tail of the indirect vector so that it always runs
this, regardless of patching.

Signed-off-by: James Morse
---
New for v2.
---
 arch/arm64/kvm/hyp/hyp-entry.S | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
index c7f9d5e271a9..5f0412f124a3 100644
--- a/arch/arm64/kvm/hyp/hyp-entry.S
+++ b/arch/arm64/kvm/hyp/hyp-entry.S
@@ -286,7 +286,7 @@ ENDPROC(__kvm_hyp_vector)
 	/*
 	 * The default sequence is to directly branch to the KVM vectors,
 	 * using the computed offset. This applies for VHE as well as
-	 * !ARM64_HARDEN_EL2_VECTORS.
+	 * !ARM64_HARDEN_EL2_VECTORS. The first vector must always run the preamble.
 	 *
 	 * For ARM64_HARDEN_EL2_VECTORS configurations, this gets replaced
 	 * with:
@@ -302,8 +302,8 @@ ENDPROC(__kvm_hyp_vector)
 	 * See kvm_patch_vector_branch for details.
 	 */
 alternative_cb	kvm_patch_vector_branch
-	b	__kvm_hyp_vector + (1b - 0b)
-	nop
+	stp	x0, x1, [sp, #-16]!
+	b	__kvm_hyp_vector + (1b - 0b + KVM_VECTOR_PREAMBLE)
 	nop
 	nop
 	nop
-- 
2.20.1
[PATCH v3 2/6] KVM: arm64: Abstract the size of the HYP vectors pre-amble
The EL2 vector hardening feature causes KVM to generate vectors for
each type of CPU present in the system. The generated sequences already
do some of the early guest-exit work (i.e. saving registers). To avoid
duplication the generated vectors branch to the original vector just
after the preamble. This size is hard coded.

Adding new instructions to the HYP vector causes strange side effects,
which are difficult to debug as the affected code is patched in at
runtime.

Add KVM_VECTOR_PREAMBLE to tell kvm_patch_vector_branch() how big the
preamble is. The valid_vect macro can then validate this at build time.

Reviewed-by: Julien Thierry
Signed-off-by: James Morse
---
Changes since v1:
 * Use AARCH64_INSN_SIZE
 * Add preamble build check to invalid_vect too
---
 arch/arm64/include/asm/kvm_asm.h |  6 ++++++
 arch/arm64/kvm/hyp/hyp-entry.S   | 18 +++++++++++++++++-
 arch/arm64/kvm/va_layout.c       |  7 +++----
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index ff73f5462aca..96c2d79063aa 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -41,6 +41,12 @@
 	{ARM_EXCEPTION_TRAP,		"TRAP"		},	\
 	{ARM_EXCEPTION_HYP_GONE,	"HYP_GONE"	}
 
+/*
+ * Size of the HYP vectors preamble. kvm_patch_vector_branch() generates code
+ * that jumps over this.
+ */
+#define KVM_VECTOR_PREAMBLE	(1 * AARCH64_INSN_SIZE)
+
 #ifndef __ASSEMBLY__
 
 #include
diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
index 2b1e686772bf..c7f9d5e271a9 100644
--- a/arch/arm64/kvm/hyp/hyp-entry.S
+++ b/arch/arm64/kvm/hyp/hyp-entry.S
@@ -227,17 +227,32 @@ ENDPROC(\label)
 
 	.align 11
 
+.macro check_preamble_length start, end
+/* kvm_patch_vector_branch() generates code that jumps over the preamble. */
+.if ((\end-\start) != KVM_VECTOR_PREAMBLE)
+	.error "KVM vector preamble length mismatch"
+.endif
+.endm
+
 .macro valid_vect target
 	.align 7
+661:
 	stp	x0, x1, [sp, #-16]!
+662:
 	b	\target
+
+check_preamble_length 661b, 662b
 .endm
 
 .macro invalid_vect target
 	.align 7
+661:
 	b	\target
+662:
 	ldp	x0, x1, [sp], #16
 	b	\target
+
+check_preamble_length 661b, 662b
 .endm
 
 ENTRY(__kvm_hyp_vector)
@@ -282,7 +297,8 @@ ENDPROC(__kvm_hyp_vector)
 	 *	movk	x0, #((addr >> 32) & 0xffff), lsl #32
 	 *	br	x0
 	 *
-	 * Where addr = kern_hyp_va(__kvm_hyp_vector) + vector-offset + 4.
+	 * Where:
+	 * addr = kern_hyp_va(__kvm_hyp_vector) + vector-offset + KVM_VECTOR_PREAMBLE.
 	 *
 	 * See kvm_patch_vector_branch for details.
 	 */
diff --git a/arch/arm64/kvm/va_layout.c b/arch/arm64/kvm/va_layout.c
index c712a7376bc1..58b3a91db892 100644
--- a/arch/arm64/kvm/va_layout.c
+++ b/arch/arm64/kvm/va_layout.c
@@ -181,11 +181,10 @@ void kvm_patch_vector_branch(struct alt_instr *alt,
 	addr |= ((u64)origptr & GENMASK_ULL(10, 7));
 
 	/*
-	 * Branch to the second instruction in the vectors in order to
-	 * avoid the initial store on the stack (which we already
-	 * perform in the hardening vectors).
+	 * Branch over the preamble in order to avoid the initial store on
+	 * the stack (which we already perform in the hardening vectors).
 	 */
-	addr += AARCH64_INSN_SIZE;
+	addr += KVM_VECTOR_PREAMBLE;
 
 	/* stp x0, x1, [sp, #-16]! */
 	insn = aarch64_insn_gen_load_store_pair(AARCH64_INSN_REG_0,
-- 
2.20.1
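[Editor's note] The point of the patch above is that the "jump over the preamble" offset now lives in one named constant instead of a magic `+ 4`. A minimal sketch of the address computation (names local to this sketch; the real work happens in kvm_patch_vector_branch()):

```c
#include <stdint.h>

#define AARCH64_INSN_SIZE	4
#define KVM_VECTOR_PREAMBLE	(1 * AARCH64_INSN_SIZE)

/* The patched branch skips the preamble. A later patch that grows the
 * preamble (e.g. to two instructions) only has to change
 * KVM_VECTOR_PREAMBLE; this computation stays the same. */
static uint64_t patched_branch_target(uint64_t vector_va, uint64_t vector_offset)
{
	return vector_va + vector_offset + KVM_VECTOR_PREAMBLE;
}
```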
[PATCH v3 5/6] KVM: arm64: Defer guest entry when an asynchronous exception is pending
SError that occur during world-switch's entry to the guest will be
accounted to the guest, as the exception is masked until we enter the
guest... but we want to attribute the SError as precisely as possible.

Reading DISR_EL1 before guest entry requires free registers, and using
ESB+DISR_EL1 to consume and read back the ESR would leave KVM holding a
host SError... We would rather leave the SError pending and let the
host take it once we exit world-switch. To do this, we need to defer
guest-entry if an SError is pending.

Read the ISR to see if SError (or an IRQ) is pending. If so, fake an
exit. Place this check between __guest_enter()'s save of the host
registers, and restore of the guest's. SError that occur between here
and the eret into the guest must have affected the guest's registers,
which we can naturally attribute to the guest.

The dsb is needed to ensure any previous writes have been done before
we read ISR_EL1. On systems without the v8.2 RAS extensions this
doesn't give us anything, as we can't contain errors, and the ESR bits
that describe the severity are all implementation-defined. Replace this
with a nop for these systems.

Signed-off-by: James Morse
---
Changes since v2:
 * Added isb after the dsb to prevent an early read
 * Fixed some spelling

Changes since v1:
 * Squashed later dsb/nop patch in here
---
 arch/arm64/kvm/hyp/entry.S | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index 7863ec5266e2..d9a55503fab7 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -17,6 +17,7 @@
 
 #include
+#include
 #include
 #include
 #include
@@ -63,6 +64,20 @@ ENTRY(__guest_enter)
 	// Store the host regs
 	save_callee_saved_regs x1
 
+	// Now the host state is stored if we have a pending RAS SError it must
+	// affect the host. If any asynchronous exception is pending we defer
+	// the guest entry. The DSB isn't necessary before v8.2 as any SError
+	// would be fatal.
+alternative_if ARM64_HAS_RAS_EXTN
+	dsb	nshst
+	isb
+alternative_else_nop_endif
+	mrs	x1, isr_el1
+	cbz	x1, 1f
+	mov	x0, #ARM_EXCEPTION_IRQ
+	ret
+
+1:
 	add	x18, x0, #VCPU_CONTEXT
 
 	// Macro ptrauth_switch_to_guest format:
-- 
2.20.1
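[Editor's note] The deferral decision in the diff above can be sketched in C (illustrative only — the real logic is the `mrs`/`cbz` sequence in __guest_enter, and the PROCEED sentinel here is local to this sketch, not a KVM exit code):

```c
#include <stdint.h>

#define ARM_EXCEPTION_IRQ	0	/* exit code, as in kvm_asm.h */
#define GUEST_ENTRY_PROCEED	(-1)	/* local sentinel: no exception pending */

/* After the host registers are saved, if any of ISR_EL1.{A,I,F} is set
 * we fake an IRQ exit instead of entering the guest; once SError is
 * unmasked back in the host, the host takes the pending exception. */
static int guest_entry_check(uint64_t isr_el1)
{
	if (isr_el1)
		return ARM_EXCEPTION_IRQ;
	return GUEST_ENTRY_PROCEED;
}
```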
[PATCH v3 1/6] arm64: assembler: Switch ESB-instruction with a vanilla nop if !ARM64_HAS_RAS
The ESB-instruction is a nop on CPUs that don't implement the RAS
extensions. This lets us use it in places like the vectors without
having to use alternatives.

If someone disables CONFIG_ARM64_RAS_EXTN, this instruction still has
its RAS extensions behaviour, but we no longer read DISR_EL1, as that
register does depend on alternatives.

This could go wrong if we want to synchronize an SError from a KVM
guest. On a CPU that has the RAS extensions, but the Kconfig option was
disabled, we consume the pending SError with no chance of ever reading
it.

Hide the ESB-instruction behind the CONFIG_ARM64_RAS_EXTN option,
outputting a regular nop if the feature has been disabled.

Reported-by: Julien Thierry
Signed-off-by: James Morse
---
New for v2. The esb where this would be a problem is added later in
this series, but there is no build-dependency.
---
 arch/arm64/include/asm/assembler.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 92b6b7cf67dd..2d2114b39846 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -107,7 +107,11 @@
  * RAS Error Synchronization barrier
  */
 	.macro esb
+#ifdef CONFIG_ARM64_RAS_EXTN
 	hint	#16
+#else
+	nop
+#endif
 	.endm
 
-- 
2.20.1
Re: [PATCH v2] KVM: arm64: Skip more of the SError vaxorcism
Hi Marc,

On 10/06/2019 17:58, Marc Zyngier wrote:
> On 10/06/2019 17:30, James Morse wrote:
>> During __guest_exit() we need to consume any SError left pending by the
>> guest so it doesn't contaminate the host. With v8.2 we use the
>> ESB-instruction. For systems without v8.2, we use dsb+isb and unmask
>> SError. We do this on every guest exit.
>>
>> Use the same dsb+isr_el1 trick, this lets us know if an SError is pending
>> after the dsb, allowing us to skip the isb and self-synchronising PSTATE
>> write if its not.
>>
>> This means SError remains masked during KVM's world-switch, so any SError
>> that occurs during this time is reported by the host, instead of causing
>> a hyp-panic.
>
> Ah, that'd be pretty good.

I'll add a patch to re-mask it so this intent is clear, and the
behaviour/performance stuff is done in separate patches.

>> If you give gcc likely()/unlikely() hints in an if() condition, it
>> shuffles the generated assembly so that the likely case is immediately
>> after the branch. Lets do the same here.
>>
>> Signed-off-by: James Morse
>> ---
>> This patch was previously posted as part of:
>> [v1]
>> https://lore.kernel.org/linux-arm-kernel/20190604144551.188107-1-james.mo...@arm.com/
>>
>>  arch/arm64/kvm/hyp/entry.S | 14 ++
>>  1 file changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
>> index a5a4254314a1..c2de1a1faaf4 100644
>> --- a/arch/arm64/kvm/hyp/entry.S
>> +++ b/arch/arm64/kvm/hyp/entry.S
>> @@ -161,18 +161,24 @@ alternative_if ARM64_HAS_RAS_EXTN
>>  	orr	x0, x0, #(1<
>> 1:	ret
>> alternative_else
>> -	// If we have a pending asynchronous abort, now is the
>> -	// time to find out. From your VAXorcist book, page 666:
>> +	dsb	sy	// Synchronize against in-flight ld/st
>> +	mrs	x2, isr_el1
>
> The CPU is allowed to perform a system register access before the DSB
> completes if it doesn't have a side effect. Reading ISR_EL1 doesn't
> have such side effect, so you could end up missing the abort. An ISB
> after DSB should cure that,

... bother ...

> but you'll need to verify that it doesn't make
> things much worse than what we already have.

Retested with isb in both patches, and Robin's better assembly. This
still saves the self-synchronising pstate modification (of which we'd
need two if we want to keep SError masked over the rest of
world-switch).

On Xgene:
| 5.2.0-rc2-6-g9b94314 mean:3215 stddev:45
| 5.2.0-rc2-7-g5d37b0b mean:3176 stddev:30
| with this patch: 1.23% faster

On Seattle:
| 5.2.0-rc2-6-g9b9431445730 mean:4474 stddev:10
| 5.2.0-rc2-7-g5d37b0b5dd65 mean:4410 stddev:27
| with this patch: 1.44% faster

Thanks,

James
Re: [PATCH v2] KVM: arm64: Skip more of the SError vaxorcism
Hi Robin,

On 10/06/2019 17:38, Robin Murphy wrote:
> On 10/06/2019 17:30, James Morse wrote:
>> During __guest_exit() we need to consume any SError left pending by the
>> guest so it doesn't contaminate the host. With v8.2 we use the
>> ESB-instruction. For systems without v8.2, we use dsb+isb and unmask
>> SError. We do this on every guest exit.
>>
>> Use the same dsb+isr_el1 trick, this lets us know if an SError is pending
>> after the dsb, allowing us to skip the isb and self-synchronising PSTATE
>> write if its not.
>>
>> This means SError remains masked during KVM's world-switch, so any SError
>> that occurs during this time is reported by the host, instead of causing
>> a hyp-panic.
>>
>> If you give gcc likely()/unlikely() hints in an if() condition, it
>> shuffles the generated assembly so that the likely case is immediately
>> after the branch. Lets do the same here.
>>
>> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
>> index a5a4254314a1..c2de1a1faaf4 100644
>> --- a/arch/arm64/kvm/hyp/entry.S
>> +++ b/arch/arm64/kvm/hyp/entry.S
>> @@ -161,18 +161,24 @@ alternative_if ARM64_HAS_RAS_EXTN
>>  	orr	x0, x0, #(1<
>> 1:	ret
>> alternative_else
>> -	// If we have a pending asynchronous abort, now is the
>> -	// time to find out. From your VAXorcist book, page 666:
>> +	dsb	sy	// Synchronize against in-flight ld/st
>> +	mrs	x2, isr_el1
>> +	and	x2, x2, #(1<<8)	// ISR_EL1.A
>> +	cbnz	x2, 2f
>
> It doesn't appear that anyone cares much about x2 containing the masked
> value after returning, so is this just a needlessly long-form TBNZ?

Yes, I'd make a third-rate compiler. (I almost certainly had
'cmp x2, xzr' in there at some point!)

Thanks,

James
Re: [PATCH v8 26/29] vfio-pci: Register an iommu fault handler
On 12/06/2019 19:53, Jacob Pan wrote:
>>> You are right, the worst case of the spurious PS is to terminate the
>>> group prematurely. Need to know the scope of the HW damage in case
>>> of mdev where group IDs can be shared among mdevs belong to the
>>> same PF.
>>
>> But from the IOMMU fault API point of view, the full page request is
>> identified by both PRGI and PASID. Given that each mdev has its own
>> set of PASIDs, it should be easy to isolate page responses per mdev.
>>
> On Intel platform, devices sending page request with private data must
> receive page response with matching private data. If we solely depend
> on PRGI and PASID, we may send stale private data to the device in
> those incorrect page response. Since private data may represent PF
> device wide contexts, the consequence of sending page response with
> wrong private data may affect other mdev/PASID.
>
> One solution we are thinking to do is to inject the sequence # (e.g.
> ktime raw mono clock) as vIOMMU private data into the guest. Guest
> would return this fake private data in page response, then host will
> send page response back to the device that matches PRG1 and PASID and
> private_data.
>
> This solution does not expose HW context related private data to the
> guest but need to extend page response in iommu uapi.
>
> /**
>  * struct iommu_page_response - Generic page response information
>  * @version: API version of this structure
>  * @flags: encodes whether the corresponding fields are valid
>  *         (IOMMU_FAULT_PAGE_RESPONSE_* values)
>  * @pasid: Process Address Space ID
>  * @grpid: Page Request Group Index
>  * @code: response code from iommu_page_response_code
>  * @private_data: private data for the matching page request
>  */
> struct iommu_page_response {
> #define IOMMU_PAGE_RESP_VERSION_1	1
> 	__u32	version;
> #define IOMMU_PAGE_RESP_PASID_VALID	(1 << 0)
> #define IOMMU_PAGE_RESP_PRIVATE_DATA	(1 << 1)
> 	__u32	flags;
> 	__u32	pasid;
> 	__u32	grpid;
> 	__u32	code;
> 	__u32	padding;
> 	__u64	private_data[2];
> };
>
> There is also the change needed for separating storage for the real
> and fake private data.
>
> Sorry for the last minute change, did not realize the HW implications.
>
> I see this as a future extension due to limited testing,

I'm wondering how we deal with:
(1) old userspace that won't fill the new private_data field in
    page_response. A new kernel still has to support it.
(2) old kernel that won't recognize the new PRIVATE_DATA flag.
    Currently iommu_page_response() rejects page responses with unknown
    flags.

I guess we'll need a two-way negotiation, where userspace queries
whether the kernel supports the flag (2), and the kernel learns whether
it should expect the private data to come back (1).

> perhaps for
> now, can you add paddings similar to page request? Make it 64B as
> well.

I don't think padding is necessary, because iommu_page_response is sent
by userspace to the kernel, unlike iommu_fault which is allocated by
userspace and filled by the kernel.

Page response looks a lot more like existing VFIO mechanisms, so I
suppose we'll wrap the iommu_page_response structure and include an
argsz parameter at the top:

	struct vfio_iommu_page_response {
		u32 argsz;
		struct iommu_page_response pr;
	};

	struct vfio_iommu_page_response vpr = {
		.argsz = sizeof(vpr),
		.pr = ...
		...
	};

	ioctl(devfd, VFIO_IOMMU_PAGE_RESPONSE, &vpr);

In that case supporting private data can be done by simply appending a
field at the end (plus the negotiation above).

Thanks,
Jean
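[Editor's note] The follow-up question in the thread is about *where* the appended field goes. The layout properties that make "append at the end of iommu_page_response" safe can be checked directly. A sketch using fixed-width types in place of `__u32`/`__u64` (sizes as on a typical LP64 ABI; `vfio_iommu_page_response` is the wrapper name proposed above, not an existing uapi struct):

```c
#include <stddef.h>
#include <stdint.h>

/* Layout quoted earlier in the thread. The explicit 'padding' member
 * means private_data already sits at an 8-byte boundary with no
 * implicit padding, so a u64 field appended after it keeps all earlier
 * offsets unchanged. */
struct iommu_page_response {
	uint32_t version;
	uint32_t flags;
	uint32_t pasid;
	uint32_t grpid;
	uint32_t code;
	uint32_t padding;
	uint64_t private_data[2];
};

/* The proposed VFIO wrapper: argsz tells the kernel how much of the
 * (possibly extended) structure userspace actually provided, so old
 * userspace passing the shorter layout keeps working. */
struct vfio_iommu_page_response {
	uint32_t argsz;
	struct iommu_page_response pr;
};
```

Appending the extension to `iommu_page_response` (rather than to the wrapper) matches the observation that the iommu driver, not vfio, consumes the field.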