Re: [PATCH 2/2] KVM: arm64: nVHE: Don't consume host SErrors with RAS

2020-08-11 Thread Andrew Scull
On Wed, Aug 05, 2020 at 03:37:27PM +0100, James Morse wrote:
> Hi Andrew,
> 
> On 31/07/2020 11:20, Andrew Scull wrote:
> > On Fri, Jul 31, 2020 at 09:00:03AM +0100, Marc Zyngier wrote:
> >> On 2020-07-30 23:31, Andrew Scull wrote:
> >>> On Thu, Jul 30, 2020 at 04:18:23PM +0100, Andrew Scull wrote:
>  The ESB at the start of the vectors causes any SErrors to be
>  consumed to
>  DISR_EL1. If the exception came from the host and the ESB caught an
>  SError, it would not be noticed until a guest exits and DISR_EL1 is
>  checked. Further, the SError would be attributed to the guest and not
>  the host.
> 
>  To avoid these problems, use a different exception vector for the host
>  that does not use an ESB but instead leaves any host SError pending. A
>  guest will not be entered if an SError is pending so it will always be
>  the host that will receive and handle it.
> >>>
> >>> Thinking further, I'm not sure this actually solves all of the problem.
> >>> It does prevent hyp from causing a host SError to be consumed but, IIUC,
> >>> there could be an SError already deferred by the host and logged in
> >>> DISR_EL1 that hyp would not preserve if a guest is run.
> >>>
> >>> I think the host's DISR_EL1 would need to be saved and restored in the
> >>> vcpu context switch which, from a cursory read of the ARM, is possible
> >>> without having to virtualize SErrors for the host.
> >>
> >> The question is what do you do if you have something pending in DISR_EL1
> >> at the point where you enter EL2? Context switching it is not going to
> >> help. One problem is that you'd need to do an ESB, corrupting DISR_EL1,
> >> before any memory access (I'm assuming you can get traps where all
> >> registers are live). I can't see how we square this circle.
> 
> > I'll expand on what I understand (or think I do) about RAS at the
> > moment. It should hopefully highlight anything wrong with my reasoning
> > for your questions.
> > 
> > DISR_EL1.A being set means a pending physical SError has been
> > consumed/cleared. The host has already deferred an SError so saving and
> > restoring (i.e. preserving) DISR_EL1 for the host would mean the
> > deferred SError is as it was on return to the host.
> 
> [..]
> 
> > If there is a pending physical SError, we'd have to keep it pending so
> > the host can consume it.
> 
> I agree!
> 
> 
> > __guest_enter has the dsb-isb for RAS so
> > SErrors will become pending, but it doesn't consume them.
> 
> 
> > I can't
> > remember now whether this was reliable or not; I assumed it was as it is
> > gated on the RAS config.
> 
> It should be reliable for 'asynchronous external abort', but not 'SError 
> Interrupt', both
> of which are signalled via the SError vector.

Uh oh, more terms with differences that I don't understand yet.

I assume that there aren't SError Interrupts that we would be concerned
about being delivered to the guest when they should have been delivered to
the host after checking for asynchronous external aborts? At least, that
seems to be the current behaviour if I'm reading things right.

> The DSB completes previous memory accesses, so that the components they land 
> in can return
> an abort if they need to. The ISB ensures the abort is seen as pending in 
> ISR_EL1 before
> the CPU reads it.
> 
> The ESB-instruction is potentially cheaper, (as cheap as a nop if all errors 
> are going to
> be reported as uncontained anyway), but it has the nasty side effect of 
> consuming the
> error, which we don't want here.
>
>
> > The above didn't need an ESB for the host but incorrect assumptions
> > might change that.
> > 
> >> Furthermore, assuming you find a way to do it, what do you do with it?
> >>
> >> (a) Either there was something pending already and it is still pending,
> 
> > If a physical SError is pending, you leave it pending and go back to the
> > host so it can consume it.
> 
> Great idea!
> If the CPU has IESB, you can just check ISR_EL1 on entry from the host.
> If not, you'll need some extra barriers.

I'm not sure it would be easy to check ISR_EL1 and bail out to the host in
the general case, as it would require a meaningful return code for each
context, something like EAGAIN, which doesn't exist in the hyp interfaces.

I was instead thinking that EL2 would continue as usual except that it won't
consume any SErrors with the ESB. Care is needed when a vCPU is being run,
and the DSB-ISB + ISR_EL1 check in __guest_enter is there to prevent a guest
being entered if the host needs to handle an SError.
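
To make the shape of that check concrete, here is a minimal C sketch of the
idea (the real check lives in the __guest_enter assembly; the function name,
the use of the kernel's dsb()/isb()/read_sysreg() helpers and the bail-out
policy are assumptions for illustration only):

#include <asm/barrier.h>
#include <asm/sysreg.h>

/*
 * Sketch only: complete outstanding memory accesses so that any error they
 * generate becomes a pending SError, then report whether ISR_EL1 shows
 * something pending. If it does, we would skip entering the guest and
 * return to the host so it can take the SError itself.
 */
static bool host_serror_check(void)
{
	dsb(sy);	/* let previous accesses report an abort */
	isb();		/* ...and make it visible as pending in ISR_EL1 */
	return read_sysreg(isr_el1) != 0;
}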

> >> (b) Or there was nothing pending and you now have an error that you
> >> don't know how to report (the host EL1 never issued an ESB)
> > 
> > If there isn't a physical SError pending, either there is no SError at
> > all (easy) or the SError has already been consumed to DISR_EL1 by a host
> > ESB and we'd preserve DISR_EL1 for the host to handle however it chooses.
> 
> (I think this is a host bug)
> 
> 
> >> We could just error out on 

Re: [PATCH 2/2] KVM: arm64: nVHE: Don't consume host SErrors with RAS

2020-08-11 Thread Andrew Scull
On Wed, Aug 05, 2020 at 03:34:11PM +0100, James Morse wrote:
> Hi Andrew,
> 
> On 30/07/2020 23:31, Andrew Scull wrote:
> > On Thu, Jul 30, 2020 at 04:18:23PM +0100, Andrew Scull wrote:
> >> The ESB at the start of the vectors causes any SErrors to be consumed to
> >> DISR_EL1. If the exception came from the host and the ESB caught an
> >> SError, it would not be noticed until a guest exits and DISR_EL1 is
> >> checked. Further, the SError would be attributed to the guest and not
> >> the host.
> >>
> >> To avoid these problems, use a different exception vector for the host
> >> that does not use an ESB but instead leaves any host SError pending. A
> >> guest will not be entered if an SError is pending so it will always be
> >> the host that will receive and handle it.
> 
> > Thinking further, I'm not sure this actually solves all of the problem.
> > It does prevent hyp from causing a host SError to be consumed but, IIUC,
> > there could be an SError already deferred by the host and logged in
> > DISR_EL1 that hyp would not preserve if a guest is run.
> 
> I think that would be a host bug.
> 
> The ESB-instruction is the only thing that writes to DISR_EL1, and only if 
> PSTATE.A is
> set. The host should:
> * Read DISR_EL1 after running the ESB-instruction,
> * Not call into HYP from SError masked code!

Good to know that this is the intent and not just what appears to happen
today.

> (VHE only does it to match the nVHE behaviour, as KVM isn't prepared to 
> handle these).
> 
> 
> 'ESB-instruction' is pedantry to avoid the risk of it being confused with 
> IESB, which is
> just the barrier bit, not the writes-to-DISR bit.
> 
> 
> > I think the host's DISR_EL1 would need to be saved and restored in the
> > vcpu context switch which, from a cursory read of the ARM, is possible
> > without having to virtualize SErrors for the host.
> 
> ... I thought this was a redirected register. Reads from EL1 when HCR_EL2.AMO 
> is set get
> the value from VDISR_EL2, meaning the guest can't read DISR_EL1 at all.
> (see 'Accessing DISR_EL1' in the register description, "D13.7.1
> DISR_EL1, Deferred Interrupt Status Register" of DDI0487F.a)

The host doesn't run with HCR_EL2.AMO set so it uses DISR_EL1 directly,
but hyp also uses DISR_EL1 directly during __guest_exit. That is the
clobbering I was concerned about. It may not be a problem most of the
time given what you said above, but I think something like the diff
below should be enough to be sure it is preserved:

 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
index 28f349288f3a..a34210c1c877 100644
--- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
+++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
@@ -56,8 +56,12 @@ static inline void __sysreg_save_el2_return_state(struct kvm_cpu_context *ctxt)
ctxt->regs.pc   = read_sysreg_el2(SYS_ELR);
ctxt->regs.pstate   = read_sysreg_el2(SYS_SPSR);
 
-   if (cpus_have_final_cap(ARM64_HAS_RAS_EXTN))
-   ctxt_sys_reg(ctxt, DISR_EL1) = read_sysreg_s(SYS_VDISR_EL2);
+   if (cpus_have_final_cap(ARM64_HAS_RAS_EXTN)) {
+   if (kvm_is_host_cpu_context(ctxt))
+   ctxt_sys_reg(ctxt, DISR_EL1) = read_sysreg_s(SYS_DISR_EL1);
+   else
+   ctxt_sys_reg(ctxt, DISR_EL1) = read_sysreg_s(SYS_VDISR_EL2);
+   }
 }
 
 static inline void __sysreg_restore_common_state(struct kvm_cpu_context *ctxt)
@@ -152,8 +156,12 @@ static inline void __sysreg_restore_el2_return_state(struct kvm_cpu_context *ctx
write_sysreg_el2(ctxt->regs.pc, SYS_ELR);
write_sysreg_el2(pstate,SYS_SPSR);
 
-   if (cpus_have_final_cap(ARM64_HAS_RAS_EXTN))
-   write_sysreg_s(ctxt_sys_reg(ctxt, DISR_EL1), SYS_VDISR_EL2);
+   if (cpus_have_final_cap(ARM64_HAS_RAS_EXTN)) {
+   if (kvm_is_host_cpu_context(ctxt))
+   write_sysreg_s(ctxt_sys_reg(ctxt, DISR_EL1), SYS_DISR_EL1);
+   else
+   write_sysreg_s(ctxt_sys_reg(ctxt, DISR_EL1), SYS_VDISR_EL2);
+   }
 }
 
 static inline void __sysreg32_save_state(struct kvm_vcpu *vcpu)
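
For what it's worth, kvm_is_host_cpu_context() above is a helper I'm assuming
rather than something that already exists. A minimal sketch, assuming the
host's context is the one embedded in the per-CPU kvm_host_data (the exact
per-CPU accessor would differ between VHE and nVHE hyp), could be:

/* Hypothetical helper: is this context the host's, not a vCPU's? */
static inline bool kvm_is_host_cpu_context(struct kvm_cpu_context *ctxt)
{
	return ctxt == &this_cpu_ptr(&kvm_host_data)->host_ctxt;
}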


> >> Hyp initialization is now passed the vector that is used for the host
> >> and the vector for guests is stored in a percpu variable as
> >> kvm_get_hyp_vector() is not suitable for calling from nVHE hyp.
> 
> 
> Thanks,
> 
> James


Re: [PATCH 2/2] KVM: arm64: nVHE: Don't consume host SErrors with RAS

2020-08-11 Thread Andrew Scull
On Wed, Aug 05, 2020 at 03:33:22PM +0100, James Morse wrote:
> Hi Andrew,
> 
> On 30/07/2020 16:18, Andrew Scull wrote:
> > The ESB at the start of the vectors causes any SErrors to be consumed to
> > DISR_EL1. If the exception came from the host and the ESB caught an
> > SError, it would not be noticed until a guest exits and DISR_EL1 is
> > checked. Further, the SError would be attributed to the guest and not
> > the host.
> 
> Yup, this happens because the world has moved underneath this code:

That's understandable!

> Previously any v8.2 RAS capable system would have been crazy to turn off VHE, 
> so v8.0 and
> v8.1 systems had a nop here instead, and v8.2 systems had VHE, so there were 
> no 'from the
> host' EL2 vectors.

RAS with nVHE is a buildable config and we're being foolish enough to
think about it so hopefully we can work something out :)

> > To avoid these problems, use a different exception vector for the host
> > that does not use an ESB but instead leaves any host SError pending. A
> > guest will not be entered if an SError is pending so it will always be
> > the host that will receive and handle it.
> > 
> > Hyp initialization is now passed the vector that is used for the host
> > and the vector for guests is stored in a percpu variable as
> > kvm_get_hyp_vector() is not suitable for calling from nVHE hyp.
> 
> > Fixes: 0e5b9c085dce ("KVM: arm64: Consume pending SError as early as 
> > possible")
> 
> Surely this can only happen if you had turned VHE off?

Correct, I'll make that clearer in the description.

> Thanks,
> 
> James


Re: [PATCH 2/2] KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not set

2020-08-11 Thread Suzuki K Poulose

On 08/11/2020 11:27 AM, Will Deacon wrote:

> When an MMU notifier call results in unmapping a range that spans multiple
> PGDs, we end up calling into cond_resched_lock() when crossing a PGD boundary,
> since this avoids running into RCU stalls during VM teardown. Unfortunately,
> if the VM is destroyed as a result of OOM, then blocking is not permitted
> and the call to the scheduler triggers the following BUG():
> 
>   | BUG: sleeping function called from invalid context at arch/arm64/kvm/mmu.c:394
>   | in_atomic(): 1, irqs_disabled(): 0, non_block: 1, pid: 36, name: oom_reaper
>   | INFO: lockdep is turned off.
>   | CPU: 3 PID: 36 Comm: oom_reaper Not tainted 5.8.0 #1
>   | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>   | Call trace:
>   |  dump_backtrace+0x0/0x284
>   |  show_stack+0x1c/0x28
>   |  dump_stack+0xf0/0x1a4
>   |  ___might_sleep+0x2bc/0x2cc
>   |  unmap_stage2_range+0x160/0x1ac
>   |  kvm_unmap_hva_range+0x1a0/0x1c8
>   |  kvm_mmu_notifier_invalidate_range_start+0x8c/0xf8
>   |  __mmu_notifier_invalidate_range_start+0x218/0x31c
>   |  mmu_notifier_invalidate_range_start_nonblock+0x78/0xb0
>   |  __oom_reap_task_mm+0x128/0x268
>   |  oom_reap_task+0xac/0x298
>   |  oom_reaper+0x178/0x17c
>   |  kthread+0x1e4/0x1fc
>   |  ret_from_fork+0x10/0x30
> 
> Use the new 'flags' argument to kvm_unmap_hva_range() to ensure that we
> only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is set in the notifier
> flags.
> 
> Cc: 
> Fixes: 8b3405e345b5 ("kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd")
> Cc: Marc Zyngier 
> Cc: Suzuki K Poulose 
> Cc: James Morse 
> Signed-off-by: Will Deacon 
> ---


Reviewed-by: Suzuki K Poulose 


[PATCH 0/2] KVM: arm64: Fix sleeping while atomic BUG() on OOM

2020-08-11 Thread Will Deacon
Hi all,

While stress-testing my arm64 stage-2 page-table rewrite [1], I ran into
a sleeping while atomic BUG() during OOM that I can reproduce with
mainline.

The problem is that the arm64 page-table code periodically calls
cond_resched_lock() when unmapping the stage-2 page-tables, but in the
case of OOM, this occurs in atomic context.

These couple of patches (based on 5.8) propagate the flags from the MMU
notifier range structure, which in turn indicate whether or not blocking
is permitted.
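
For reference, the check this boils down to in the arm64 backend (patch 2/2
below has the real diff) is roughly the following; the comment is mine:

static int kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
{
	unsigned flags = *(unsigned *)data;
	bool may_block = flags & MMU_NOTIFIER_RANGE_BLOCKABLE;

	/* only cond_resched_lock() while unmapping if blocking is allowed */
	__unmap_stage2_range(kvm, gpa, size, may_block);
	return 0;
}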

Cheers,

Will

[1] https://android-kvm.googlesource.com/linux/+/refs/heads/topic/pgtable

Cc: Marc Zyngier 
Cc: Suzuki K Poulose 
Cc: James Morse 
Cc: Thomas Bogendoerfer 
Cc: Paul Mackerras 
Cc: Paolo Bonzini 
Cc: Sean Christopherson 

--->8

Will Deacon (2):
  KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()
  KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not set

 arch/arm64/include/asm/kvm_host.h   |  2 +-
 arch/arm64/kvm/mmu.c                | 19 ++-
 arch/mips/include/asm/kvm_host.h    |  2 +-
 arch/mips/kvm/mmu.c                 |  3 ++-
 arch/powerpc/include/asm/kvm_host.h |  3 ++-
 arch/powerpc/kvm/book3s.c           |  3 ++-
 arch/powerpc/kvm/e500_mmu_host.c    |  3 ++-
 arch/x86/include/asm/kvm_host.h     |  3 ++-
 arch/x86/kvm/mmu/mmu.c              |  3 ++-
 virt/kvm/kvm_main.c                 |  3 ++-
 10 files changed, 30 insertions(+), 14 deletions(-)

-- 
2.28.0.236.gb10cc79966-goog



[PATCH 2/2] KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not set

2020-08-11 Thread Will Deacon
When an MMU notifier call results in unmapping a range that spans multiple
PGDs, we end up calling into cond_resched_lock() when crossing a PGD boundary,
since this avoids running into RCU stalls during VM teardown. Unfortunately,
if the VM is destroyed as a result of OOM, then blocking is not permitted
and the call to the scheduler triggers the following BUG():

 | BUG: sleeping function called from invalid context at arch/arm64/kvm/mmu.c:394
 | in_atomic(): 1, irqs_disabled(): 0, non_block: 1, pid: 36, name: oom_reaper
 | INFO: lockdep is turned off.
 | CPU: 3 PID: 36 Comm: oom_reaper Not tainted 5.8.0 #1
 | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
 | Call trace:
 |  dump_backtrace+0x0/0x284
 |  show_stack+0x1c/0x28
 |  dump_stack+0xf0/0x1a4
 |  ___might_sleep+0x2bc/0x2cc
 |  unmap_stage2_range+0x160/0x1ac
 |  kvm_unmap_hva_range+0x1a0/0x1c8
 |  kvm_mmu_notifier_invalidate_range_start+0x8c/0xf8
 |  __mmu_notifier_invalidate_range_start+0x218/0x31c
 |  mmu_notifier_invalidate_range_start_nonblock+0x78/0xb0
 |  __oom_reap_task_mm+0x128/0x268
 |  oom_reap_task+0xac/0x298
 |  oom_reaper+0x178/0x17c
 |  kthread+0x1e4/0x1fc
 |  ret_from_fork+0x10/0x30

Use the new 'flags' argument to kvm_unmap_hva_range() to ensure that we
only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is set in the notifier
flags.

Cc: 
Fixes: 8b3405e345b5 ("kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd")
Cc: Marc Zyngier 
Cc: Suzuki K Poulose 
Cc: James Morse 
Signed-off-by: Will Deacon 
---
 arch/arm64/kvm/mmu.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 5f6b35c33618..bd47f06739d6 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -365,7 +365,8 @@ static void unmap_stage2_p4ds(struct kvm *kvm, pgd_t *pgd,
  * destroying the VM), otherwise another faulting VCPU may come in and mess
  * with things behind our backs.
  */
-static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
+static void __unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size,
+bool may_block)
 {
pgd_t *pgd;
phys_addr_t addr = start, end = start + size;
@@ -390,11 +391,16 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
 * If the range is too large, release the kvm->mmu_lock
 * to prevent starvation and lockup detector warnings.
 */
-   if (next != end)
+   if (may_block && next != end)
cond_resched_lock(&kvm->mmu_lock);
} while (pgd++, addr = next, addr != end);
 }
 
+static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
+{
+   __unmap_stage2_range(kvm, start, size, true);
+}
+
 static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
  phys_addr_t addr, phys_addr_t end)
 {
@@ -2198,7 +2204,10 @@ static int handle_hva_to_gpa(struct kvm *kvm,
 
 static int kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
 {
-   unmap_stage2_range(kvm, gpa, size);
+   unsigned flags = *(unsigned *)data;
+   bool may_block = flags & MMU_NOTIFIER_RANGE_BLOCKABLE;
+
+   __unmap_stage2_range(kvm, gpa, size, may_block);
return 0;
 }
 
@@ -2209,7 +2218,7 @@ int kvm_unmap_hva_range(struct kvm *kvm,
return 0;
 
trace_kvm_unmap_hva_range(start, end);
-   handle_hva_to_gpa(kvm, start, end, &kvm_unmap_hva_handler, NULL);
+   handle_hva_to_gpa(kvm, start, end, &kvm_unmap_hva_handler, &flags);
return 0;
 }
 
-- 
2.28.0.236.gb10cc79966-goog



[PATCH 1/2] KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()

2020-08-11 Thread Will Deacon
The 'flags' field of 'struct mmu_notifier_range' is used to indicate
whether invalidate_range_{start,end}() are permitted to block. In the
case of kvm_mmu_notifier_invalidate_range_start(), this field is not
forwarded on to the architecture-specific implementation of
kvm_unmap_hva_range() and therefore the backend cannot sensibly decide
whether or not to block.

Add an extra 'flags' parameter to kvm_unmap_hva_range() so that
architectures are aware as to whether or not they are permitted to block.

Cc: 
Cc: Marc Zyngier 
Cc: Suzuki K Poulose 
Cc: James Morse 
Signed-off-by: Will Deacon 
---
 arch/arm64/include/asm/kvm_host.h   | 2 +-
 arch/arm64/kvm/mmu.c                | 2 +-
 arch/mips/include/asm/kvm_host.h    | 2 +-
 arch/mips/kvm/mmu.c                 | 3 ++-
 arch/powerpc/include/asm/kvm_host.h | 3 ++-
 arch/powerpc/kvm/book3s.c           | 3 ++-
 arch/powerpc/kvm/e500_mmu_host.c    | 3 ++-
 arch/x86/include/asm/kvm_host.h     | 3 ++-
 arch/x86/kvm/mmu/mmu.c              | 3 ++-
 virt/kvm/kvm_main.c                 | 3 ++-
 10 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e21d4a01372f..759d62343e1d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -443,7 +443,7 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
 
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 int kvm_unmap_hva_range(struct kvm *kvm,
-   unsigned long start, unsigned long end);
+   unsigned long start, unsigned long end, unsigned flags);
 int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
 int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 31058e6e7c2a..5f6b35c33618 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2203,7 +2203,7 @@ static int kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *dat
 }
 
 int kvm_unmap_hva_range(struct kvm *kvm,
-   unsigned long start, unsigned long end)
+   unsigned long start, unsigned long end, unsigned flags)
 {
if (!kvm->arch.pgd)
return 0;
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 363e7a89d173..ef1d25d49ec8 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -981,7 +981,7 @@ enum kvm_mips_fault_result kvm_trap_emul_gva_fault(struct kvm_vcpu *vcpu,
 
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 int kvm_unmap_hva_range(struct kvm *kvm,
-   unsigned long start, unsigned long end);
+   unsigned long start, unsigned long end, unsigned flags);
 int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
 int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index 49bd160f4d85..0783ac9b3240 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -518,7 +518,8 @@ static int kvm_unmap_hva_handler(struct kvm *kvm, gfn_t gfn, gfn_t gfn_end,
return 1;
 }
 
-int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end)
+int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end,
+   unsigned flags)
 {
handle_hva_to_gpa(kvm, start, end, &kvm_unmap_hva_handler, NULL);
 
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 7e2d061d0445..bccf0ba2da2e 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -58,7 +58,8 @@
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 
 extern int kvm_unmap_hva_range(struct kvm *kvm,
-  unsigned long start, unsigned long end);
+  unsigned long start, unsigned long end,
+  unsigned flags);
 extern int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
 extern int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
 extern int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 41fedec69ac3..49db50d1db04 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -834,7 +834,8 @@ void kvmppc_core_commit_memory_region(struct kvm *kvm,
kvm->arch.kvm_ops->commit_memory_region(kvm, mem, old, new, change);
 }
 
-int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end)
+int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end,
+   unsigned flags)
 {
return kvm->arch.kvm_ops->unmap_hva_range(kvm, start, end);
 }
diff --git a/arch/powerpc/kvm/e500_mmu_host.c 

Re: [kvm-unit-tests PATCH v3 00/10] arm/arm64: Add IPI/LPI/vtimer latency test

2020-08-11 Thread Marc Zyngier

On 2020-08-11 02:48, Jingyi Wang wrote:
> Hi Marc,
> 
> On 8/5/2020 8:13 PM, Marc Zyngier wrote:
>> On 2020-08-05 12:54, Jingyi Wang wrote:
>>> Hi all,
>>> 
>>> Currently, kvm-unit-tests only support GICv3 vLPI injection. May I ask
>>> is there any plan or suggestion on constructing irq bypass mechanism
>>> to test vLPI direct injection in kvm-unit-tests?
>> 
>> I'm not sure what you are asking for here. VLPIs are only delivered
>> from a HW device, and the offloading mechanism isn't visible from
>> userspace (you either have an enabled GICv4 implementation, or
>> you don't).
>> 
>> There are ways to *trigger* device MSIs from userspace and inject
>> them in a guest, but that's only a debug feature, which shouldn't
>> be enabled on a production system.
>> 
>> M.
> 
> Sorry for the late reply.
> 
> As I mentioned before, we want to add a vLPI direct injection test
> in KUT, and meanwhile measure the latency of hardware vLPI injection.
> 
> Sure, vLPI is triggered by hardware. Since the kernel supports sending
> an ITS INT command in a guest to trigger a vLPI, I wonder if it is possible

So can the host.

> to add an extra interface to make a vLPI hardware-offload (just as
> kvm_vgic_v4_set_forwarding() does). If so, vgic_its_trigger_msi()
> can inject the vLPI directly instead of using an LR.

The interface exists, it is in debugfs. But it mandates that the
device exists. And no, I am not willing to add an extra KVM userspace
API for this.

The whole concept of injecting an INT to measure the performance
of GICv4 is slightly bonkers, actually. Most of the cost is paid
on the injection path (queueing a pair of commands, waiting until
the ITS wakes up and generates the signal...).

What you really want to measure is the time from generation of
the LPI by a device until the guest acknowledges the interrupt
to the device itself, and this can only be implemented in the
device.

M.
--
Jazz is not dead. It just smells funny...