Re: [PATCH] KVM: arm/arm64: Don't enable/disable physical timer access on VHE

2017-11-20 Thread Jintack Lim
On Mon, Nov 20, 2017 at 6:16 AM, Christoffer Dall wrote:
> After the timer optimization rework we accidentally end up calling
> physical timer enable/disable functions on VHE systems, which is neither
> needed nor correct, since the CNTHCTL_EL2 register format is
> different when HCR_EL2.E2H is set.
>
> The CNTHCTL_EL2 register is initialized when CPUs come online in
> kvm_timer_init_vhe() and we don't have to call these functions on VHE
> systems, which also allows us to inline the non-VHE functionality.
>
> Reported-by: Jintack Lim 
> Signed-off-by: Christoffer Dall 

Reviewed-by: Jintack Lim 

Thanks,
Jintack

> ---
>  include/kvm/arm_arch_timer.h |  3 ---
>  virt/kvm/arm/arch_timer.c    |  6 ------
>  virt/kvm/arm/hyp/timer-sr.c  | 48 ++++++++++++++++++++----------------------------
>  3 files changed, 20 insertions(+), 37 deletions(-)
>
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index 01ee473517e2..6e45608b2399 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -93,7 +93,4 @@ void kvm_timer_init_vhe(void);
>  #define vcpu_vtimer(v) (&(v)->arch.timer_cpu.vtimer)
>  #define vcpu_ptimer(v) (&(v)->arch.timer_cpu.ptimer)
>
> -void enable_el1_phys_timer_access(void);
> -void disable_el1_phys_timer_access(void);
> -
>  #endif
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 4151250ce8da..190c99ed1b73 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -479,9 +479,6 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
>
> vtimer_restore_state(vcpu);
>
> -   if (has_vhe())
> -   disable_el1_phys_timer_access();
> -
> /* Set the background timer for the physical timer emulation. */
> phys_timer_emulate(vcpu);
>  }
> @@ -510,9 +507,6 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
> if (unlikely(!timer->enabled))
> return;
>
> -   if (has_vhe())
> -   enable_el1_phys_timer_access();
> -
> vtimer_save_state(vcpu);
>
> /*
> diff --git a/virt/kvm/arm/hyp/timer-sr.c b/virt/kvm/arm/hyp/timer-sr.c
> index f39861639f08..f24404b3c8df 100644
> --- a/virt/kvm/arm/hyp/timer-sr.c
> +++ b/virt/kvm/arm/hyp/timer-sr.c
> @@ -27,42 +27,34 @@ void __hyp_text __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high)
> write_sysreg(cntvoff, cntvoff_el2);
>  }
>
> -void __hyp_text enable_el1_phys_timer_access(void)
> -{
> -   u64 val;
> -
> -   /* Allow physical timer/counter access for the host */
> -   val = read_sysreg(cnthctl_el2);
> -   val |= CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;
> -   write_sysreg(val, cnthctl_el2);
> -}
> -
> -void __hyp_text disable_el1_phys_timer_access(void)
> -{
> -   u64 val;
> -
> -   /*
> -* Disallow physical timer access for the guest
> -* Physical counter access is allowed
> -*/
> -   val = read_sysreg(cnthctl_el2);
> -   val &= ~CNTHCTL_EL1PCEN;
> -   val |= CNTHCTL_EL1PCTEN;
> -   write_sysreg(val, cnthctl_el2);
> -}
> -
>  void __hyp_text __timer_disable_traps(struct kvm_vcpu *vcpu)
>  {
> /*
>  * We don't need to do this for VHE since the host kernel runs in EL2
>  * with HCR_EL2.TGE ==1, which makes those bits have no impact.
>  */
> -   if (!has_vhe())
> -   enable_el1_phys_timer_access();
> +   if (!has_vhe()) {
> +   u64 val;
> +
> +   /* Allow physical timer/counter access for the host */
> +   val = read_sysreg(cnthctl_el2);
> +   val |= CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;
> +   write_sysreg(val, cnthctl_el2);
> +   }
>  }
>
>  void __hyp_text __timer_enable_traps(struct kvm_vcpu *vcpu)
>  {
> -   if (!has_vhe())
> -   disable_el1_phys_timer_access();
> +   if (!has_vhe()) {
> +   u64 val;
> +
> +   /*
> +* Disallow physical timer access for the guest
> +* Physical counter access is allowed
> +*/
> +   val = read_sysreg(cnthctl_el2);
> +   val &= ~CNTHCTL_EL1PCEN;
> +   val |= CNTHCTL_EL1PCTEN;
> +   write_sysreg(val, cnthctl_el2);
> +   }
>  }
> --
> 2.14.2
>
>



[PATCH v5 4/8] KVM: arm/arm64: vgic: Support level-triggered mapped interrupts

2017-11-20 Thread Christoffer Dall
Level-triggered mapped IRQs are special because we only observe rising
edges as input to the VGIC, and we don't set the EOI flag and therefore
are not told when the level goes down, so we cannot know when to
re-queue a new interrupt when the level goes up again.

One way to solve this problem is to side-step the logic of the VGIC and
special case the validation in the injection path, but it has the
unfortunate drawback of having to peek into the physical GIC state
whenever we want to know if the interrupt is pending on the virtual
distributor.

Instead, we can maintain the current semantics of a level triggered
interrupt by sort of treating it as an edge-triggered interrupt,
following from the fact that we only observe an asserting edge.  This
requires us to be a bit careful when populating the LRs and when folding
the state back in though:

 * We lower the line level when populating the LR, so that when
   subsequently observing an asserting edge, the VGIC will do the right
   thing.

 * If the guest never acked the interrupt while running (for example if
   it had masked interrupts at the CPU level while running), we have
   to preserve the pending state of the LR and move it back to the
   line_level field of the struct irq when folding LR state.

   If the guest never acked the interrupt while running, but changed the
   device state and lowered the line (again with interrupts masked) then
   we need to observe this change in the line_level.

   Both of the above situations are solved by sampling the physical line
   and setting the line level when folding the LR back.

 * Finally, if the guest never acked the interrupt while running and
   sampling the line reveals that the device state has changed and the
   line has been lowered, we must clear the physical active state, since
   we will otherwise never be told when the interrupt becomes asserted
   again.

This has the added benefit of making the timer optimization patches
(https://lists.cs.columbia.edu/pipermail/kvmarm/2017-July/026343.html) a
bit simpler, because the timer code doesn't have to clear the active
state on the sync anymore.  It also potentially improves the performance
of the timer implementation because the VGIC knows the state of the LR
and only needs to clear the active state when the pending bit in the LR
is still set, whereas the timer previously had to clear it every time we
returned from running the guest with an injected timer interrupt.
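
For reference, the vgic.h hunk is not visible in the excerpt below; the
predicate the LR handlers rely on amounts to something like this sketch
(an assumption: it simply tests the level configuration plus the HW
mapping flag):

	static inline bool vgic_irq_is_mapped_level(struct vgic_irq *irq)
	{
		/* Only HW-mapped, level-triggered IRQs need the treatment above. */
		return irq->config == VGIC_CONFIG_LEVEL && irq->hw;
	}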

Reviewed-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---
 virt/kvm/arm/vgic/vgic-v2.c | 29 +++++++++++++++++++++++++++++
 virt/kvm/arm/vgic/vgic-v3.c | 29 +++++++++++++++++++++++++++++
 virt/kvm/arm/vgic/vgic.c    | 23 +++++++++++++++++++++++
 virt/kvm/arm/vgic/vgic.h    |  7 +++++++
 4 files changed, 88 insertions(+)

diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
index 80897102da26..c32d7b93ffd1 100644
--- a/virt/kvm/arm/vgic/vgic-v2.c
+++ b/virt/kvm/arm/vgic/vgic-v2.c
@@ -105,6 +105,26 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
irq->pending_latch = false;
}
 
+   /*
+* Level-triggered mapped IRQs are special because we only
+* observe rising edges as input to the VGIC.
+*
+* If the guest never acked the interrupt we have to sample
+* the physical line and set the line level, because the
+* device state could have changed or we simply need to
+* process the still pending interrupt later.
+*
+* If this causes us to lower the level, we have to also clear
+* the physical active state, since we will otherwise never be
+* told when the interrupt becomes asserted again.
+*/
+   if (vgic_irq_is_mapped_level(irq) && (val & GICH_LR_PENDING_BIT)) {
+   irq->line_level = vgic_get_phys_line_level(irq);
+
+   if (!irq->line_level)
+   vgic_irq_set_phys_active(irq, false);
+   }
+
	spin_unlock_irqrestore(&irq->irq_lock, flags);
vgic_put_irq(vcpu->kvm, irq);
}
@@ -162,6 +182,15 @@ void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
val |= GICH_LR_EOI;
}
 
+   /*
+* Level-triggered mapped IRQs are special because we only observe
+* rising edges as input to the VGIC.  We therefore lower the line
+* level here, so that we can take new virtual IRQs.  See
+* vgic_v2_fold_lr_state for more info.
+*/
+   if (vgic_irq_is_mapped_level(irq) && (val & GICH_LR_PENDING_BIT))
+   irq->line_level = false;
+
/* The GICv2 LR only holds five bits of priority. */
val |= (irq->priority >> 3) << GICH_LR_PRIORITY_SHIFT;
 
diff --git a/virt/kvm/arm/vgic/vgic-v3.c 

[PATCH v5 7/8] KVM: arm/arm64: Provide a get_input_level for the arch timer

2017-11-20 Thread Christoffer Dall
The VGIC can now support the life-cycle of mapped level-triggered
interrupts, and we no longer have to read back the timer state on every
exit from the VM if we had an asserted timer interrupt signal, because
the VGIC already knows if we hit the unlikely case where the guest
disables the timer without ACKing the virtual timer interrupt.

This means we rework a bit of the code to factor out the functionality
to snapshot the timer state from vtimer_save_state(), and we can reuse
this functionality in the sync path when we have an irqchip in
userspace, and also to support our implementation of the
get_input_level() function for the timer.

This change also means that we can no longer rely on the timer's view of
the interrupt line to set the active state, because we no longer
maintain this state for mapped interrupts when exiting from the guest.
Instead, we only set the active state if the virtual interrupt is
active, and otherwise we simply let the timer fire again and raise the
virtual interrupt from the ISR.
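
The diff excerpt below is cut off before it reaches the new function, but
putting the pieces above together, kvm_arch_timer_get_input_level() ends up
roughly as in this sketch (the BUG() on non-vtimer INTIDs is an assumption,
since only the vtimer is mapped at this point):

	bool kvm_arch_timer_get_input_level(int vintid)
	{
		struct kvm_vcpu *vcpu = kvm_arm_get_running_vcpu();
		struct arch_timer_context *timer;

		if (vintid == vcpu_vtimer(vcpu)->irq.irq)
			timer = vcpu_vtimer(vcpu);
		else
			BUG(); /* We only map the vtimer so far */

		/* Re-read the hardware state if it is currently loaded. */
		if (timer->loaded)
			__timer_snapshot_state(timer);

		return kvm_timer_should_fire(timer);
	}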

Signed-off-by: Christoffer Dall 
---
 include/kvm/arm_arch_timer.h |  2 ++
 virt/kvm/arm/arch_timer.c    | 76 +++++++++++++++++++++++++++++++++++++---------------------------------------
 2 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index 6e45608b2399..b1dcfde0a3ef 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -90,6 +90,8 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu);
 
 void kvm_timer_init_vhe(void);
 
+bool kvm_arch_timer_get_input_level(int vintid);
+
 #define vcpu_vtimer(v) (&(v)->arch.timer_cpu.vtimer)
 #define vcpu_ptimer(v) (&(v)->arch.timer_cpu.ptimer)
 
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 8f804ba5896f..bcde6bb99669 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -343,6 +343,12 @@ static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
phys_timer_emulate(vcpu);
 }
 
+static void __timer_snapshot_state(struct arch_timer_context *timer)
+{
+   timer->cnt_ctl = read_sysreg_el0(cntv_ctl);
+   timer->cnt_cval = read_sysreg_el0(cntv_cval);
+}
+
 static void vtimer_save_state(struct kvm_vcpu *vcpu)
 {
	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
@@ -354,10 +360,8 @@ static void vtimer_save_state(struct kvm_vcpu *vcpu)
if (!vtimer->loaded)
goto out;
 
-   if (timer->enabled) {
-   vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
-   vtimer->cnt_cval = read_sysreg_el0(cntv_cval);
-   }
+   if (timer->enabled)
+   __timer_snapshot_state(vtimer);
 
/* Disable the virtual timer */
write_sysreg_el0(0, cntv_ctl);
@@ -454,8 +458,7 @@ static void kvm_timer_vcpu_load_vgic(struct kvm_vcpu *vcpu)
bool phys_active;
int ret;
 
-   phys_active = vtimer->irq.level ||
- kvm_vgic_map_is_active(vcpu, vtimer->irq.irq);
+   phys_active = kvm_vgic_map_is_active(vcpu, vtimer->irq.irq);
 
ret = irq_set_irqchip_state(host_vtimer_irq,
IRQCHIP_STATE_ACTIVE,
@@ -535,27 +538,19 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
set_cntvoff(0);
 }
 
-static void unmask_vtimer_irq(struct kvm_vcpu *vcpu)
+/*
+ * With a userspace irqchip we have to check if the guest de-asserted the
+ * timer and if so, unmask the timer irq signal on the host interrupt
+ * controller to ensure that we see future timer signals.
+ */
+static void unmask_vtimer_irq_user(struct kvm_vcpu *vcpu)
 {
struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
 
if (unlikely(!irqchip_in_kernel(vcpu->kvm))) {
-   kvm_vtimer_update_mask_user(vcpu);
-   return;
-   }
-
-   /*
-* If the guest disabled the timer without acking the interrupt, then
-* we must make sure the physical and virtual active states are in
-* sync by deactivating the physical interrupt, because otherwise we
-* wouldn't see the next timer interrupt in the host.
-*/
-   if (!kvm_vgic_map_is_active(vcpu, vtimer->irq.irq)) {
-   int ret;
-   ret = irq_set_irqchip_state(host_vtimer_irq,
-   IRQCHIP_STATE_ACTIVE,
-   false);
-   WARN_ON(ret);
+   __timer_snapshot_state(vtimer);
+   if (!kvm_timer_should_fire(vtimer))
+   kvm_vtimer_update_mask_user(vcpu);
}
 }
 
@@ -568,21 +563,7 @@ static void unmask_vtimer_irq(struct kvm_vcpu *vcpu)
  */
 void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
 {
-   struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
-
-   /*
-* If we entered the guest with the vtimer output asserted we have to
-* check if the guest has modified the timer so that we should lower
-* the line at this point.
-*/
- 

[PATCH v5 1/8] KVM: arm/arm64: Remove redundant preemptible checks

2017-11-20 Thread Christoffer Dall
The __this_cpu_read() and __this_cpu_write() functions already implement
checks for the required preemption levels when using
CONFIG_DEBUG_PREEMPT which gives you nice error messages and such.
Therefore there is no need to explicitly check this using a BUG_ON() in
the code (which we don't do for other uses of per cpu variables either).
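
For reference, with CONFIG_DEBUG_PREEMPT the accessors expand to something
like the following (simplified sketch of include/linux/percpu-defs.h), so a
preemption check already happens before the per-cpu data is touched:

	#define __this_cpu_read(pcp)					\
	({								\
		__this_cpu_preempt_check("read"); /* warns if preemptible */ \
		raw_cpu_read(pcp);					\
	})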

Acked-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---
 virt/kvm/arm/arm.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index c13d74c083fe..28548aeaf164 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -71,7 +71,6 @@ static DEFINE_PER_CPU(unsigned char, 
kvm_arm_hardware_enabled);
 
 static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu)
 {
-   BUG_ON(preemptible());
__this_cpu_write(kvm_arm_running_vcpu, vcpu);
 }
 
@@ -81,7 +80,6 @@ static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu)
  */
 struct kvm_vcpu *kvm_arm_get_running_vcpu(void)
 {
-   BUG_ON(preemptible());
return __this_cpu_read(kvm_arm_running_vcpu);
 }
 
-- 
2.14.2



[PATCH v5 0/8] Handle forwarded level-triggered interrupts

2017-11-20 Thread Christoffer Dall
This series illustrates an alternative approach to Eric Auger's direct
EOI setup patches [1] in terms of the KVM VGIC support.

The idea is to maintain existing semantics for the VGIC for mapped
level-triggered IRQs and also support the timer using mapped IRQs with
the same VGIC support as VFIO interrupts.

Based on kvm-arm-gicv4-for-v4.15 (latest pull request of KVM/ARM changes
to the KVM tree).

Also available at:
git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git level-mapped-v5

(Apologies for sending this out during the merge window, but I've been
sitting on these for a while, and the VHE optimization patches have
gotten a fair amount of review lately, and I'd like to base the next
version on these patches.)

Changes since v4:
 - Rebased on the timer optimization series merged in the v4.15 merge
   window, which caused a fair amount of modifications to patch 3.
 - Added a static key to disable the sync operations when no VMs are
   using userspace irqchips to further optimize the performance
 - Fixed extra semicolon in vgic-mmio.c
 - Added commentary as requested during review
 - Dropped what was patch 4, because it was merged as part of GICv4
   support.
 - Factored out the VGIC input level function change as separate patch
   (helps bisect and debugging), before providing a function for the
   timer.

Changes since v3:
 - Added a number of patches and moved patches around a bit.
 - Check for uaccesses in the mmio handler functions
 - Fixed bugs in the mmio handler functions

Changes since v2:
 - Removed patch 5 from v2 and integrating the changes in what's now
   patch 5 to make it easier to reuse code when adding VFIO integration.
 - Changed the virtual distributor MMIO handling to use the
   pending_latch and more closely match the semantics of SPENDR and
   CPENDR for both level and edge mapped interrupts.

Changes since v1:
 - Added necessary changes to the timer (Patch 1)
 - Added handling of guest MMIO accesses to the virtual distributor
   (Patch 4)
 - Addressed Marc's comments from the initial RFC (mostly renames)

Thanks,
-Christoffer

Christoffer Dall (8):
  KVM: arm/arm64: Remove redundant preemptible checks
  KVM: arm/arm64: Factor out functionality to get vgic mmio
requester_vcpu
  KVM: arm/arm64: Don't cache the timer IRQ level
  KVM: arm/arm64: vgic: Support level-triggered mapped interrupts
  KVM: arm/arm64: Support a vgic interrupt line level sample function
  KVM: arm/arm64: Support VGIC dist pend/active changes for mapped IRQs
  KVM: arm/arm64: Provide a get_input_level for the arch timer
  KVM: arm/arm64: Avoid work when userspace irqchips are not used

 include/kvm/arm_arch_timer.h  |   2 +
 include/kvm/arm_vgic.h|  13 -
 virt/kvm/arm/arch_timer.c | 106 ++-
 virt/kvm/arm/arm.c|   2 -
 virt/kvm/arm/vgic/vgic-mmio.c | 114 ++
 virt/kvm/arm/vgic/vgic-v2.c   |  29 +++
 virt/kvm/arm/vgic/vgic-v3.c   |  29 +++
 virt/kvm/arm/vgic/vgic.c  |  42 ++--
 virt/kvm/arm/vgic/vgic.h  |   8 +++
 9 files changed, 270 insertions(+), 75 deletions(-)

-- 
2.14.2



[PATCH v5 5/8] KVM: arm/arm64: Support a vgic interrupt line level sample function

2017-11-20 Thread Christoffer Dall
The GIC sometimes need to sample the physical line of a mapped
interrupt.  As we know this to be notoriously slow, provide a callback
function for devices (such as the timer) which can do this much faster
than talking to the distributor, for example by comparing a few
in-memory values.  Fall back to the good old method of poking the
physical GIC if no callback is provided.
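
As a usage illustration (a hypothetical device, not part of this patch), a
driver that mirrors its line state in memory could register a callback when
mapping the IRQ:

	/* Hypothetical example: the device mirrors its line state in memory. */
	static struct {
		bool level_asserted;
	} my_dev;

	static bool my_dev_get_input_level(int vintid)
	{
		return my_dev.level_asserted;	/* no distributor access needed */
	}

	/* ...and at map time: */
	ret = kvm_vgic_map_phys_irq(vcpu, host_irq, vintid,
				    my_dev_get_input_level);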

Reviewed-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---
 include/kvm/arm_vgic.h    | 13 ++++++++++++-
 virt/kvm/arm/arch_timer.c |  3 ++-
 virt/kvm/arm/vgic/vgic.c  | 12 +++++++++---
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 8c896540a72c..cdbd142ca7f2 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -130,6 +130,17 @@ struct vgic_irq {
u8 priority;
enum vgic_irq_config config;/* Level or edge */
 
+   /*
+* Callback function pointer to in-kernel devices that can tell us the
+* state of the input level of mapped level-triggered IRQ faster than
+* peeking into the physical GIC.
+*
+* Always called in non-preemptible section and the functions can use
+* kvm_arm_get_running_vcpu() to get the vcpu pointer for private
+* IRQs.
+*/
+   bool (*get_input_level)(int vintid);
+
	void *owner;			/* Opaque pointer to reserve an
					   interrupt for in-kernel devices. */
 };
@@ -331,7 +342,7 @@ void kvm_vgic_init_cpu_hardware(void);
 int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
bool level, void *owner);
 int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, unsigned int host_irq,
- u32 vintid);
+ u32 vintid, bool (*get_input_level)(int vintid));
 int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int vintid);
 bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid);
 
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 5f8ad8e3f3ff..8f804ba5896f 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -834,7 +834,8 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
return -EINVAL;
}
 
-   ret = kvm_vgic_map_phys_irq(vcpu, host_vtimer_irq, vtimer->irq.irq);
+   ret = kvm_vgic_map_phys_irq(vcpu, host_vtimer_irq, vtimer->irq.irq,
+   NULL);
if (ret)
return ret;
 
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 423a572c92f5..91266960dbd4 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -151,6 +151,9 @@ bool vgic_get_phys_line_level(struct vgic_irq *irq)
 
BUG_ON(!irq->hw);
 
+   if (irq->get_input_level)
+   return irq->get_input_level(irq->intid);
+
WARN_ON(irq_get_irqchip_state(irq->host_irq,
  IRQCHIP_STATE_PENDING,
				      &line_level));
@@ -436,7 +439,8 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
 
 /* @irq->irq_lock must be held */
 static int kvm_vgic_map_irq(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
-   unsigned int host_irq)
+   unsigned int host_irq,
+   bool (*get_input_level)(int vintid))
 {
struct irq_desc *desc;
struct irq_data *data;
@@ -456,6 +460,7 @@ static int kvm_vgic_map_irq(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
irq->hw = true;
irq->host_irq = host_irq;
irq->hwintid = data->hwirq;
+   irq->get_input_level = get_input_level;
return 0;
 }
 
@@ -464,10 +469,11 @@ static inline void kvm_vgic_unmap_irq(struct vgic_irq *irq)
 {
irq->hw = false;
irq->hwintid = 0;
+   irq->get_input_level = NULL;
 }
 
 int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, unsigned int host_irq,
- u32 vintid)
+ u32 vintid, bool (*get_input_level)(int vintid))
 {
struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, vintid);
unsigned long flags;
@@ -476,7 +482,7 @@ int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, unsigned int host_irq,
BUG_ON(!irq);
 
	spin_lock_irqsave(&irq->irq_lock, flags);
-   ret = kvm_vgic_map_irq(vcpu, irq, host_irq);
+   ret = kvm_vgic_map_irq(vcpu, irq, host_irq, get_input_level);
	spin_unlock_irqrestore(&irq->irq_lock, flags);
vgic_put_irq(vcpu->kvm, irq);
 
-- 
2.14.2



[PATCH v5 8/8] KVM: arm/arm64: Avoid work when userspace irqchips are not used

2017-11-20 Thread Christoffer Dall
We currently check if the VM has a userspace irqchip on every exit from
the VCPU, and if so, we do some work to ensure correct timer behavior.
This is unfortunate, as we could avoid doing any work entirely, if we
didn't have to support irqchip in userspace.

Realizing that the userspace irqchip on ARM is mostly a developer or hobby
feature, and is unlikely to be used in servers or other scenarios where
performance is a priority, we can use a refcounted static key to only
check the irqchip configuration when we have at least one VM that uses
an irqchip in userspace.
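
Condensed, the refcounted static key pattern from the diff below looks like
this (a sketch; the hot-path branch is patched to a no-op while the count is
zero):

	static DEFINE_STATIC_KEY_FALSE(userspace_irqchip_in_use);

	/* VM init: take a reference for each VM with a userspace irqchip. */
	static_branch_inc(&userspace_irqchip_in_use);

	/* Hot path: effectively free when no such VM exists. */
	if (static_branch_unlikely(&userspace_irqchip_in_use))
		unmask_vtimer_irq_user(vcpu);

	/* VM teardown: drop the reference. */
	static_branch_dec(&userspace_irqchip_in_use);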

Signed-off-by: Christoffer Dall 
---
 virt/kvm/arm/arch_timer.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index bcde6bb99669..7b27316d4c3f 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -51,6 +51,8 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
 struct arch_timer_context *timer_ctx);
 static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx);
 
+static DEFINE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
+
 u64 kvm_phys_timer_read(void)
 {
return timecounter->cc->read(timecounter->cc);
@@ -563,7 +565,8 @@ static void unmask_vtimer_irq_user(struct kvm_vcpu *vcpu)
  */
 void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
 {
-   unmask_vtimer_irq_user(vcpu);
+   if (static_branch_unlikely(&userspace_irqchip_in_use))
+   unmask_vtimer_irq_user(vcpu);
 }
 
 int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
@@ -768,6 +771,8 @@ void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu)
	soft_timer_cancel(&timer->bg_timer, &timer->expired);
	soft_timer_cancel(&timer->phys_timer, NULL);
kvm_vgic_unmap_phys_irq(vcpu, vtimer->irq.irq);
+   if (timer->enabled && !irqchip_in_kernel(vcpu->kvm))
+   static_branch_dec(&userspace_irqchip_in_use);
 }
 
 static bool timer_irqs_are_valid(struct kvm_vcpu *vcpu)
@@ -821,8 +826,10 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
return 0;
 
/* Without a VGIC we do not map virtual IRQs to physical IRQs */
-   if (!irqchip_in_kernel(vcpu->kvm))
+   if (!irqchip_in_kernel(vcpu->kvm)) {
+   static_branch_inc(&userspace_irqchip_in_use);
goto no_vgic;
+   }
 
if (!vgic_initialized(vcpu->kvm))
return -ENODEV;
-- 
2.14.2



[PATCH v5 6/8] KVM: arm/arm64: Support VGIC dist pend/active changes for mapped IRQs

2017-11-20 Thread Christoffer Dall
For mapped IRQs (with the HW bit set in the LR) we have to follow some
rules of the architecture.  One of these rules is that a VM must not be
allowed to deactivate a virtual interrupt with the HW bit set unless the
physical interrupt is also active.

This works fine when injecting mapped interrupts, because we leave it up
to the injector to either set EOImode==1 or manually set the active
state of the physical interrupt.

However, the guest can set a virtual interrupt to be pending or active by
writing to the virtual distributor, which could lead to deactivating a
virtual interrupt with the HW bit set without the physical interrupt
being active.

We could set the physical interrupt to active whenever we are about to
enter the VM with a HW interrupt either pending or active, but that
would be really slow, especially on GICv2.  So we take the long way
around and do the hard work when needed, which is expected to be
extremely rare.

When the VM sets the pending state for a HW interrupt on the virtual
distributor we set the active state on the physical distributor, because
the virtual interrupt can become active and then the guest can
deactivate it.

When the VM clears the pending state we also clear it on the physical
side, because the injector might otherwise raise the interrupt.  We also
clear the physical active state when the virtual interrupt is not
active, since otherwise a SPEND/CPEND sequence from the guest would
prevent signaling of future interrupts.

Changing the state of mapped interrupts from userspace is not supported,
and it's expected that userspace unmaps devices from VFIO before
attempting to set the interrupt state, because the interrupt state is
driven by hardware.
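
The vgic_irq_set_phys_pending()/vgic_irq_set_phys_active() helpers used
below are not shown in this excerpt; presumably they wrap
irq_set_irqchip_state(), roughly as in this sketch of the pending variant:

	static void vgic_irq_set_phys_pending(struct vgic_irq *irq, bool pending)
	{
		/* Forward the state change to the physical distributor. */
		WARN_ON(irq_set_irqchip_state(irq->host_irq,
					      IRQCHIP_STATE_PENDING,
					      pending));
	}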

Reviewed-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---
 virt/kvm/arm/vgic/vgic-mmio.c | 71 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
 virt/kvm/arm/vgic/vgic.c      |  7 +++++++
 virt/kvm/arm/vgic/vgic.h  |  1 +
 3 files changed, 73 insertions(+), 6 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
index 6113cf850f47..294e949ece24 100644
--- a/virt/kvm/arm/vgic/vgic-mmio.c
+++ b/virt/kvm/arm/vgic/vgic-mmio.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "vgic.h"
@@ -142,10 +143,22 @@ static struct kvm_vcpu *vgic_get_mmio_requester_vcpu(void)
return vcpu;
 }
 
+/* Must be called with irq->irq_lock held */
+static void vgic_hw_irq_spending(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
+bool is_uaccess)
+{
+   if (!is_uaccess) {
+   irq->pending_latch = true;
+   vgic_irq_set_phys_active(irq, true);
+   }
+}
+
 void vgic_mmio_write_spending(struct kvm_vcpu *vcpu,
  gpa_t addr, unsigned int len,
  unsigned long val)
 {
+   bool is_uaccess = !vgic_get_mmio_requester_vcpu();
u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
int i;
unsigned long flags;
@@ -154,17 +167,45 @@ void vgic_mmio_write_spending(struct kvm_vcpu *vcpu,
struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
 
	spin_lock_irqsave(&irq->irq_lock, flags);
-   irq->pending_latch = true;
-
+   if (irq->hw)
+   vgic_hw_irq_spending(vcpu, irq, is_uaccess);
+   else
+   irq->pending_latch = true;
vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
vgic_put_irq(vcpu->kvm, irq);
}
 }
 
+/* Must be called with irq->irq_lock held */
+static void vgic_hw_irq_cpending(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
+bool is_uaccess)
+{
+   if (!is_uaccess) {
+   irq->pending_latch = false;
+
+   /*
+* We don't want the guest to effectively mask the physical
+* interrupt by doing a write to SPENDR followed by a write to
+* CPENDR for HW interrupts, so we clear the active state on
+* the physical side if the virtual interrupt is not active.
+* This may lead to taking an additional interrupt on the
+* host, but that should not be a problem as the worst that
+* can happen is an additional vgic injection.  We also clear
+* the pending state to maintain proper semantics for edge HW
+* interrupts.
+*/
+   vgic_irq_set_phys_pending(irq, false);
+   if (!irq->active)
+   vgic_irq_set_phys_active(irq, false);
+   }
+}
+
 void vgic_mmio_write_cpending(struct kvm_vcpu *vcpu,
  gpa_t addr, unsigned int len,
  unsigned long val)
 {
+   bool is_uaccess = !vgic_get_mmio_requester_vcpu();
u32 intid = 

[PATCH v5 3/8] KVM: arm/arm64: Don't cache the timer IRQ level

2017-11-20 Thread Christoffer Dall
The timer was modeled after a strict idea of modeling an interrupt line
level in software, meaning that only transitions in the level needed to
be reported to the VGIC.  This works well for the timer, because the
arch timer code is in complete control of the device and can track the
transitions of the line.

However, as we are about to support using the HW bit in the VGIC not
just for the timer, but also for VFIO which cannot track transitions of
the interrupt line, we have to decide on an interface for level
triggered mapped interrupts to the GIC, which both the timer and VFIO
can use.

VFIO only sees an asserting transition of the physical interrupt line,
and tells the VGIC when that happens.  That means that part of the
interrupt flow is offloaded to the hardware.

To use the same interface for VFIO devices and the timer, we therefore
have to change the timer (we cannot change VFIO because it doesn't know
the details of the device it is assigning to a VM).

Luckily, changing the timer is simple: we just need to stop 'caching'
the line level and instead let the VGIC know the state of the timer
every time there is a potential change in the line level, and when the
line level should be asserted from the timer ISR.  The VGIC can ignore
extra notifications using its validate mechanism.
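
For reference, the validate mechanism referred to above already lives in
vgic.c and is essentially the following: a level-triggered injection is only
queued when it changes the observed line level, so redundant notifications
from the timer are simply dropped (sketch):

	static bool vgic_validate_injection(struct vgic_irq *irq, bool level,
					    void *owner)
	{
		if (irq->owner != owner)
			return false;

		switch (irq->config) {
		case VGIC_CONFIG_LEVEL:
			return irq->line_level != level; /* only level changes */
		case VGIC_CONFIG_EDGE:
			return level;			 /* only rising edges */
		}
		return false;
	}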

Signed-off-by: Christoffer Dall 
---
 virt/kvm/arm/arch_timer.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 190c99ed1b73..5f8ad8e3f3ff 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -99,11 +99,9 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
}
vtimer = vcpu_vtimer(vcpu);
 
-   if (!vtimer->irq.level) {
-   vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
-   if (kvm_timer_irq_can_fire(vtimer))
-   kvm_timer_update_irq(vcpu, true, vtimer);
-   }
+   vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
+   if (kvm_timer_irq_can_fire(vtimer))
+   kvm_timer_update_irq(vcpu, true, vtimer);
 
if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
kvm_vtimer_update_mask_user(vcpu);
@@ -324,12 +322,20 @@ static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
+   bool level;
 
if (unlikely(!timer->enabled))
return;
 
-   if (kvm_timer_should_fire(vtimer) != vtimer->irq.level)
-   kvm_timer_update_irq(vcpu, !vtimer->irq.level, vtimer);
+   /*
+* The vtimer virtual interrupt is a 'mapped' interrupt, meaning part
+* of its lifecycle is offloaded to the hardware, and we therefore may
+* not have lowered the irq.level value before having to signal a new
+* interrupt, but have to signal an interrupt every time the level is
+* asserted.
+*/
+   level = kvm_timer_should_fire(vtimer);
+   kvm_timer_update_irq(vcpu, level, vtimer);
 
if (kvm_timer_should_fire(ptimer) != ptimer->irq.level)
kvm_timer_update_irq(vcpu, !ptimer->irq.level, ptimer);
-- 
2.14.2



[PATCH v5 2/8] KVM: arm/arm64: Factor out functionality to get vgic mmio requester_vcpu

2017-11-20 Thread Christoffer Dall
We are about to distinguish between userspace accesses and mmio traps
for a number of the mmio handlers.  When the requester vcpu is NULL, it
means we are handling a userspace access.

Factor out the functionality to get the requester vcpu into its own
function, mostly so we have a common place to document the semantics of
the return value.

Also take the chance to move the functionality outside of holding a
spinlock and instead explicitly disable and enable preemption.  This
supports PREEMPT_RT kernels as well.

Acked-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---
 virt/kvm/arm/vgic/vgic-mmio.c | 43 +++++++++++++++++++++++++++----------------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
index deb51ee16a3d..6113cf850f47 100644
--- a/virt/kvm/arm/vgic/vgic-mmio.c
+++ b/virt/kvm/arm/vgic/vgic-mmio.c
@@ -122,6 +122,26 @@ unsigned long vgic_mmio_read_pending(struct kvm_vcpu *vcpu,
return value;
 }
 
+/*
+ * This function will return the VCPU that performed the MMIO access and
+ * trapped from within the VM, and will return NULL if this is a userspace
+ * access.
+ *
+ * We can disable preemption locally around accessing the per-CPU variable
+ * because even if the current thread is migrated to another CPU, reading the
+ * per-CPU value later will give us the same value as we update the per-CPU
+ * variable in the preempt notifier handlers.
+ */
+static struct kvm_vcpu *vgic_get_mmio_requester_vcpu(void)
+{
+   struct kvm_vcpu *vcpu;
+
+   preempt_disable();
+   vcpu = kvm_arm_get_running_vcpu();
+   preempt_enable();
+   return vcpu;
+}
+
 void vgic_mmio_write_spending(struct kvm_vcpu *vcpu,
  gpa_t addr, unsigned int len,
  unsigned long val)
@@ -184,24 +204,10 @@ unsigned long vgic_mmio_read_active(struct kvm_vcpu *vcpu,
 static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
bool new_active_state)
 {
-   struct kvm_vcpu *requester_vcpu;
unsigned long flags;
-   spin_lock_irqsave(&irq->irq_lock, flags);
+   struct kvm_vcpu *requester_vcpu = vgic_get_mmio_requester_vcpu();
 
-   /*
-* The vcpu parameter here can mean multiple things depending on how
-* this function is called; when handling a trap from the kernel it
-* depends on the GIC version, and these functions are also called as
-* part of save/restore from userspace.
-*
-* Therefore, we have to figure out the requester in a reliable way.
-*
-* When accessing VGIC state from user space, the requester_vcpu is
-* NULL, which is fine, because we guarantee that no VCPUs are running
-* when accessing VGIC state from user space so irq->vcpu->cpu is
-* always -1.
-*/
-   requester_vcpu = kvm_arm_get_running_vcpu();
+   spin_lock_irqsave(&irq->irq_lock, flags);
 
/*
 * If this virtual IRQ was written into a list register, we
@@ -213,6 +219,11 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
 * vgic_change_active_prepare)  and still has to sync back this IRQ,
 * so we release and re-acquire the spin_lock to let the other thread
 * sync back the IRQ.
+*
+* When accessing VGIC state from user space, requester_vcpu is
+* NULL, which is fine, because we guarantee that no VCPUs are running
+* when accessing VGIC state from user space so irq->vcpu->cpu is
+* always -1.
 */
while (irq->vcpu && /* IRQ may have state in an LR somewhere */
   irq->vcpu != requester_vcpu && /* Current thread is not the VCPU thread */
-- 
2.14.2



Re: [PATCH v4 14/20] KVM: arm/arm64: Avoid timer save/restore in vcpu entry/exit

2017-11-20 Thread Jintack Lim
On Mon, Nov 20, 2017 at 6:15 AM, Christoffer Dall wrote:
> On Thu, Nov 16, 2017 at 03:30:39PM -0500, Jintack Lim wrote:
>> Hi Christoffer,
>>
>> On Fri, Oct 20, 2017 at 7:49 AM, Christoffer Dall wrote:
>> > From: Christoffer Dall 
>> >
>> > We don't need to save and restore the hardware timer state and examine
>> > if it generates interrupts on every entry/exit to the guest.  The
>> > timer hardware is perfectly capable of telling us when it has expired
>> > by signaling interrupts.
>> >
>> > When taking a vtimer interrupt in the host, we don't want to mess with
>> > the timer configuration, we just want to forward the physical interrupt
>> > to the guest as a virtual interrupt.  We can use the split priority drop
>> > and deactivate feature of the GIC to do this, which leaves an EOI'ed
>> > interrupt active on the physical distributor, making sure we don't keep
>> > taking timer interrupts which would prevent the guest from running.  We
>> > can then forward the physical interrupt to the VM using the HW bit in
>> > the LR of the GIC, like we do already, which lets the guest directly
>> > deactivate both the physical and virtual timer simultaneously, allowing
>> > the timer hardware to exit the VM and generate a new physical interrupt
>> > when the timer output is again asserted later on.
>> >
>> > We do need to capture this state when migrating VCPUs between physical
>> > CPUs, however, which we use the vcpu put/load functions for, which are
>> > called through preempt notifiers whenever the thread is scheduled away
>> > from the CPU or called directly if we return from the ioctl to
>> > userspace.
>> >
>> > One caveat is that we have to save and restore the timer state in both
>> > kvm_timer_vcpu_[put/load] and kvm_timer_[schedule/unschedule], because
>> > we can have the following flows:
>> >
>> >   1. kvm_vcpu_block
>> >   2. kvm_timer_schedule
>> >   3. schedule
>> >   4. kvm_timer_vcpu_put (preempt notifier)
>> >   5. schedule (vcpu thread gets scheduled back)
>> >   6. kvm_timer_vcpu_load (preempt notifier)
>> >   7. kvm_timer_unschedule
>> >
>> > And a version where we don't actually call schedule:
>> >
>> >   1. kvm_vcpu_block
>> >   2. kvm_timer_schedule
>> >   7. kvm_timer_unschedule
>> >
>> > Since kvm_timer_[schedule/unschedule] may not be followed by put/load,
>> > but put/load also may be called independently, we call the timer
>> > save/restore functions from both paths.  Since they rely on the loaded
>> > flag to never save/restore when unnecessary, this doesn't cause any
>> > harm, and we ensure that all invocations of either set of functions work
>> > as intended.
>> >
>> > An added benefit beyond not having to read and write the timer sysregs
>> > on every entry and exit is that we no longer have to actively write the
>> > active state to the physical distributor, because we configured the
>> > irq for the vtimer to only get a priority drop when handling the
>> > interrupt in the GIC driver (we called irq_set_vcpu_affinity()), and
>> > the interrupt stays active after firing on the host.
>> >
>> > Signed-off-by: Christoffer Dall 
>> > ---
>> >
>> > Notes:
>> > Changes since v3:
>> >  - Added comments explaining the 'loaded' flag and made other 
>> > clarifying
>> >comments.
>> >  - No longer rely on the armed flag to conditionally save/restore 
>> > state,
>> >as we already rely on the 'loaded' flag to not repetitively
>> >save/restore state.
>> >  - Reworded parts of the commit message.
>> >  - Removed renames not belonging to this patch.
>> >  - Added warning in kvm_arch_timer_handler in case we see spurious
>> >interrupts, for example if the hardware doesn't retire the
>> >level-triggered timer signal fast enough.
>> >
>> >  include/kvm/arm_arch_timer.h |  16 ++-
>> >  virt/kvm/arm/arch_timer.c    | 237 +++
>> >  virt/kvm/arm/arm.c   |  19 +++-
>> >  3 files changed, 178 insertions(+), 94 deletions(-)
>> >
>> > diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
>> > index 184c3ef2df93..c538f707e1c1 100644
>> > --- a/include/kvm/arm_arch_timer.h
>> > +++ b/include/kvm/arm_arch_timer.h
>> > @@ -31,8 +31,15 @@ struct arch_timer_context {
>> > /* Timer IRQ */
>> > struct kvm_irq_level irq;
>> >
>> > -   /* Active IRQ state caching */
>> > -   bool active_cleared_last;
>> > +   /*
>> > +* We have multiple paths which can save/restore the timer state
>> > +* onto the hardware, so we need some way of keeping track of
>> > +* where the latest state is.
>> > +*
>> > +* loaded == true:  State is loaded on the hardware registers.
>> > +* loaded == false: State is stored in memory.
>> > +*/
>> > +   bool loaded;
>> >
>> > 

Re: [PATCH v3 0/3] KVM: arm64: single step emulation instructions

2017-11-20 Thread Christoffer Dall
Hi Alex,

On Thu, Nov 16, 2017 at 03:39:18PM +, Alex Bennée wrote:
> Hi,
> 
> This is rev 3 of the series, practically the same as rev 2 but with a
> return 1->0 fix in the kvm_run loop that Julien caught. I've added his
> r-b tags to the other patches.
> 
> As usual, revision details are below the --- in each patch.

Thanks for taking care of this.

I have applied the series and slightly tweaked the commit messages and
commentary.

Did we simply decide to not worry about exiting to userspace if we do
fast-path emulation, such as for the errata workaround and GIC
marshaling in switch.c?

Thanks,
-Christoffer

> 
> Alex Bennée (3):
>   kvm: arm debug: introduce helper for single-step
>   kvm: arm64: handle single-stepping trapped instructions
>   kvm: arm64: handle single-step of userspace mmio instructions
> 
>  arch/arm/include/asm/kvm_host.h   |  5 +
>  arch/arm64/include/asm/kvm_host.h |  1 +
>  arch/arm64/kvm/debug.c| 22 ++
>  arch/arm64/kvm/handle_exit.c  | 47 +++
>  virt/kvm/arm/arm.c|  3 +++
>  5 files changed, 64 insertions(+), 14 deletions(-)
> 
> -- 
> 2.15.0
> 


Re: [PATCH 4/7] KVM: arm/arm64: vgic-its: Check result of allocation before use

2017-11-20 Thread Christoffer Dall
On Thu, Nov 16, 2017 at 05:58:18PM +, Marc Zyngier wrote:
> We miss a test against NULL after allocation.
> 
> Fixes: 6d03a68f8054 ("KVM: arm64: vgic-its: Turn device_id validation into generic ID validation")
> Cc: sta...@vger.kernel.org # 4.8
> Reported-by: AKASHI Takahiro 
> Signed-off-by: Marc Zyngier 
> ---
>  virt/kvm/arm/vgic/vgic-its.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
> index 370086006838..30f7c7e6d2f4 100644
> --- a/virt/kvm/arm/vgic/vgic-its.c
> +++ b/virt/kvm/arm/vgic/vgic-its.c
> @@ -821,6 +821,8 @@ static int vgic_its_alloc_collection(struct vgic_its *its,
>   return E_ITS_MAPC_COLLECTION_OOR;
>  
>   collection = kzalloc(sizeof(*collection), GFP_KERNEL);
> + if (!collection)
> + return -ENOMEM;

Our processing of this return value seems to be "shrug, something went
wrong, let's move on to the next command".  Is this really a valid thing
to do if we're so much under memory pressure that we cannot allocate a
collection structure?

My question notwithstanding:

Acked-by: Christoffer Dall 

>  
>   collection->collection_id = coll_id;
>   collection->target_addr = COLLECTION_NOT_MAPPED;
> -- 
> 2.14.2
> 


Re: [PATCH 1/7] KVM: arm/arm64: vgic-irqfd: Fix MSI entry allocation

2017-11-20 Thread Christoffer Dall
On Thu, Nov 16, 2017 at 05:58:15PM +, Marc Zyngier wrote:
> Using the size of the structure we're allocating is a good idea
> and avoids any surprise... In this case, we're happily confusing
> kvm_kernel_irq_routing_entry and kvm_irq_routing_entry...

Yikes.

Reviewed-by: Christoffer Dall 

> 
> Fixes: 95b110ab9a09 ("KVM: arm/arm64: Enable irqchip routing")
> Cc: sta...@vger.kernel.org # 4.8
> Reported-by: AKASHI Takahiro 
> Signed-off-by: Marc Zyngier 
> ---
>  virt/kvm/arm/vgic/vgic-irqfd.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-irqfd.c b/virt/kvm/arm/vgic/vgic-irqfd.c
> index b7baf581611a..99e026d2dade 100644
> --- a/virt/kvm/arm/vgic/vgic-irqfd.c
> +++ b/virt/kvm/arm/vgic/vgic-irqfd.c
> @@ -112,8 +112,7 @@ int kvm_vgic_setup_default_irq_routing(struct kvm *kvm)
>   u32 nr = dist->nr_spis;
>   int i, ret;
>  
> - entries = kcalloc(nr, sizeof(struct kvm_kernel_irq_routing_entry),
> -   GFP_KERNEL);
> + entries = kcalloc(nr, sizeof(*entries), GFP_KERNEL);
>   if (!entries)
>   return -ENOMEM;
>  
> -- 
> 2.14.2
> 


Re: [PATCH 7/7] arm: KVM: Fix VTTBR_BADDR_MASK BUG_ON off-by-one

2017-11-20 Thread Christoffer Dall
On Thu, Nov 16, 2017 at 05:58:21PM +, Marc Zyngier wrote:
> VTTBR_BADDR_MASK is used to sanity check the size and alignment of the
> VTTBR address. It seems to currently be off by one, thereby only
> allowing up to 39-bit addresses (instead of 40-bit) and also
> insufficiently checking the alignment. This patch fixes it.
> 
> This patch is the 32-bit counterpart of Kristina's arm64 fix, and
> she deserves the actual kudos for pinpointing that one.
> 
> Fixes: f7ed45be3ba52 ("KVM: ARM: World-switch implementation")
> Cc:  # 3.9
> Reported-by: Kristina Martsenko 
> Signed-off-by: Marc Zyngier 

Reviewed-by: Christoffer Dall 

> ---
>  arch/arm/include/asm/kvm_arm.h | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
> index c8781450905b..3ab8b3781bfe 100644
> --- a/arch/arm/include/asm/kvm_arm.h
> +++ b/arch/arm/include/asm/kvm_arm.h
> @@ -161,8 +161,7 @@
>  #else
>  #define VTTBR_X  (5 - KVM_T0SZ)
>  #endif
> -#define VTTBR_BADDR_SHIFT (VTTBR_X - 1)
> -#define VTTBR_BADDR_MASK  (((_AC(1, ULL) << (40 - VTTBR_X)) - 1) << VTTBR_BADDR_SHIFT)
> +#define VTTBR_BADDR_MASK  (((_AC(1, ULL) << (40 - VTTBR_X)) - 1) << VTTBR_X)
>  #define VTTBR_VMID_SHIFT  _AC(48, ULL)
>  #define VTTBR_VMID_MASK(size) (_AT(u64, (1 << size) - 1) << VTTBR_VMID_SHIFT)
>  
> -- 
> 2.14.2
> 


Re: [PATCH] arm64: KVM: fix VTTBR_BADDR_MASK BUG_ON off-by-one

2017-11-20 Thread Christoffer Dall
On Tue, Nov 14, 2017 at 02:37:43PM +, Kristina Martsenko wrote:
> VTTBR_BADDR_MASK is used to sanity check the size and alignment of the
> VTTBR address. It seems to currently be off by one, thereby only
> allowing up to 47-bit addresses (instead of 48-bit) and also
> insufficiently checking the alignment. This patch fixes it.
> 
> As an example, with 4k pages, before this patch we have:
> 
>   PHYS_MASK_SHIFT = 48
>   VTTBR_X = 37 - 24 = 13
>   VTTBR_BADDR_SHIFT = 13 - 1 = 12
>   VTTBR_BADDR_MASK = ((1 << 35) - 1) << 12 = 0x7ffffffff000
> 
> Which is wrong, because the mask doesn't allow bit 47 of the VTTBR
> address to be set, and only requires the address to be 12-bit (4k)
> aligned, while it actually needs to be 13-bit (8k) aligned because we
> concatenate two 4k tables.
> 
> With this patch, the mask becomes ((1 << 35) - 1) << 13 = 0xffffffffe000,
> which is what we want.
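
A quick standalone user-space check of the arithmetic (assuming
PHYS_MASK_SHIFT = 48 and VTTBR_X = 13 as in the example above):

	#include <stdio.h>

	int main(void)
	{
		unsigned long long bits = (1ULL << (48 - 13)) - 1;

		printf("before: %llx\n", bits << 12);	/* 7ffffffff000 */
		printf("after:  %llx\n", bits << 13);	/* ffffffffe000 */
		return 0;
	}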
> 
> Fixes: 0369f6a34b9f ("arm64: KVM: EL2 register definitions")
> Cc:  # 3.11.x
> Reviewed-by: Suzuki K Poulose 
> Signed-off-by: Kristina Martsenko 
> ---
>  arch/arm64/include/asm/kvm_arm.h | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index 61d694c2eae5..555d463c0eaa 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -170,8 +170,7 @@
>  #define VTCR_EL2_FLAGS   (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN_FLAGS)
>  #define VTTBR_X  (VTTBR_X_TGRAN_MAGIC - VTCR_EL2_T0SZ_IPA)
>  
> -#define VTTBR_BADDR_SHIFT (VTTBR_X - 1)
> -#define VTTBR_BADDR_MASK  (((UL(1) << (PHYS_MASK_SHIFT - VTTBR_X)) - 1) << VTTBR_BADDR_SHIFT)
> +#define VTTBR_BADDR_MASK  (((UL(1) << (PHYS_MASK_SHIFT - VTTBR_X)) - 1) << VTTBR_X)
>  #define VTTBR_VMID_SHIFT  (UL(48))
>  #define VTTBR_VMID_MASK(size) (_AT(u64, (1 << size) - 1) << VTTBR_VMID_SHIFT)
>  
> -- 
> 2.1.4
> 

Reviewed-by: Christoffer Dall 


Re: [PATCH] KVM: arm/arm64: VGIC: extend !vgic_is_initialized guard

2017-11-20 Thread Christoffer Dall
Hi Andre,

On Fri, Nov 17, 2017 at 05:58:21PM +, Andre Przywara wrote:
> Commit f39d16cbabf9 ("KVM: arm/arm64: Guard kvm_vgic_map_is_active against
> !vgic_initialized") introduced a check whether the VGIC has been
> initialized before accessing the spinlock and the VGIC data structure.
> However the vgic_get_irq() call in the variable declaration sneaked
> through the net, so lets make sure that this also gets called only after
> we actually allocated the arrays this function accesses.
> 
> Signed-off-by: Andre Przywara 
> ---
>  virt/kvm/arm/vgic/vgic.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index e54ef2fdf73d..967983a33ab2 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -786,13 +786,14 @@ void vgic_kick_vcpus(struct kvm *kvm)
>  
>  bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int virt_irq)
>  {
> - struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
> + struct vgic_irq *irq;
>   bool map_is_active;
>   unsigned long flags;
>  
>   if (!vgic_initialized(vcpu->kvm))
>   return false;
>  
> + irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
>   spin_lock_irqsave(&irq->irq_lock, flags);
>   map_is_active = irq->hw && irq->active;
>   spin_unlock_irqrestore(&irq->irq_lock, flags);
> -- 
> 2.14.1
> 
As explained in the other e-mail this isn't actually strictly necessary
anymore, but I think our current approach to "how do we handle calls
from generic code into the VGIC when the VGIC potentially hasn't been
initialized yet?" is to have an initialized check on most entry points
to the VGIC, unless strictly performance sensitive.

Therefore:

Reviewed-by: Christoffer Dall 


Re: [PATCH v5 04/20] KVM: arm/arm64: Guard kvm_vgic_map_is_active against !vgic_initialized

2017-11-20 Thread Christoffer Dall
On Thu, Nov 16, 2017 at 12:29:48PM +, Andre Przywara wrote:
> Hi,
> 
> On 27/10/17 09:34, Christoffer Dall wrote:
> > If the vgic is not initialized, don't try to grab its spinlocks or
> > traverse its data structures.
> > 
> > This is important because we soon have to start considering the active
> > state of a virtual interrupts when doing vcpu_load, which may happen
> > early on before the vgic is initialized.
> 
> I understand this patch is on its way to Linus already, but I just found
> this by browsing for VGIC changes...
> 
> > Signed-off-by: Christoffer Dall 
> > Acked-by: Marc Zyngier 
> > ---
> >  virt/kvm/arm/vgic/vgic.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> > index fed717e..e1f7dbc 100644
> > --- a/virt/kvm/arm/vgic/vgic.c
> > +++ b/virt/kvm/arm/vgic/vgic.c
> > @@ -777,6 +777,9 @@ bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int virt_irq)
> > struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
> 
> Isn't vgic_get_irq() already accessing VGIC data structures? In which
> case this assignment should be moved after the vgic_initialized() check
> below?

Theoretically, yes, but we're saved by two things:

First, this is only ever called on PPIs from the timer code, which are
always statically allocated as part of the VCPU.

Second, this isn't actually needed anymore after the irqchip in
userspace support, which partially reworked the timer init sequence
(done after the first version of these patches), so this patch could
actually have been entirely dropped, or reworked as you suggest.

Thanks,
-Christoffer


[PATCH] KVM: arm/arm64: Don't enable/disable physical timer access on VHE

2017-11-20 Thread Christoffer Dall
After the timer optimization rework we accidentally end up calling
physical timer enable/disable functions on VHE systems, which is neither
needed nor correct, since the CNTHCTL_EL2 register format is
different when HCR_EL2.E2H is set.

The CNTHCTL_EL2 register is initialized when CPUs come online in
kvm_timer_init_vhe() and we don't have to call these functions on VHE
systems, which also allows us to inline the non-VHE functionality.
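
For background, a sketch of why poking CNTHCTL_EL2.{EL1PCTEN,EL1PCEN} is
wrong under VHE (bit positions per the ARMv8.1 architecture, not taken from
this patch):

	/*
	 * CNTHCTL_EL2 layout:
	 *
	 *   HCR_EL2.E2H == 0:		HCR_EL2.E2H == 1:
	 *     bit 0: EL1PCTEN		  bit 0:  EL0PCTEN
	 *     bit 1: EL1PCEN		  bit 1:  EL0VCTEN
	 *				  bit 10: EL1PCTEN
	 *				  bit 11: EL1PTEN
	 *
	 * so flipping bits 0/1 on a VHE system changes EL0 counter/timer
	 * accessibility instead of the intended EL1 trap controls.
	 */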

Reported-by: Jintack Lim 
Signed-off-by: Christoffer Dall 
---
 include/kvm/arm_arch_timer.h |  3 ---
 virt/kvm/arm/arch_timer.c    |  6 ------
 virt/kvm/arm/hyp/timer-sr.c  | 48 ++++++++++++++++++++----------------------------
 3 files changed, 20 insertions(+), 37 deletions(-)

diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index 01ee473517e2..6e45608b2399 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -93,7 +93,4 @@ void kvm_timer_init_vhe(void);
 #define vcpu_vtimer(v) (&(v)->arch.timer_cpu.vtimer)
 #define vcpu_ptimer(v) (&(v)->arch.timer_cpu.ptimer)
 
-void enable_el1_phys_timer_access(void);
-void disable_el1_phys_timer_access(void);
-
 #endif
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 4151250ce8da..190c99ed1b73 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -479,9 +479,6 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
 
vtimer_restore_state(vcpu);
 
-   if (has_vhe())
-   disable_el1_phys_timer_access();
-
/* Set the background timer for the physical timer emulation. */
phys_timer_emulate(vcpu);
 }
@@ -510,9 +507,6 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
if (unlikely(!timer->enabled))
return;
 
-   if (has_vhe())
-   enable_el1_phys_timer_access();
-
vtimer_save_state(vcpu);
 
/*
diff --git a/virt/kvm/arm/hyp/timer-sr.c b/virt/kvm/arm/hyp/timer-sr.c
index f39861639f08..f24404b3c8df 100644
--- a/virt/kvm/arm/hyp/timer-sr.c
+++ b/virt/kvm/arm/hyp/timer-sr.c
@@ -27,42 +27,34 @@ void __hyp_text __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high)
write_sysreg(cntvoff, cntvoff_el2);
 }
 
-void __hyp_text enable_el1_phys_timer_access(void)
-{
-   u64 val;
-
-   /* Allow physical timer/counter access for the host */
-   val = read_sysreg(cnthctl_el2);
-   val |= CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;
-   write_sysreg(val, cnthctl_el2);
-}
-
-void __hyp_text disable_el1_phys_timer_access(void)
-{
-   u64 val;
-
-   /*
-* Disallow physical timer access for the guest
-* Physical counter access is allowed
-*/
-   val = read_sysreg(cnthctl_el2);
-   val &= ~CNTHCTL_EL1PCEN;
-   val |= CNTHCTL_EL1PCTEN;
-   write_sysreg(val, cnthctl_el2);
-}
-
 void __hyp_text __timer_disable_traps(struct kvm_vcpu *vcpu)
 {
/*
 * We don't need to do this for VHE since the host kernel runs in EL2
 * with HCR_EL2.TGE ==1, which makes those bits have no impact.
 */
-   if (!has_vhe())
-   enable_el1_phys_timer_access();
+   if (!has_vhe()) {
+   u64 val;
+
+   /* Allow physical timer/counter access for the host */
+   val = read_sysreg(cnthctl_el2);
+   val |= CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;
+   write_sysreg(val, cnthctl_el2);
+   }
 }
 
 void __hyp_text __timer_enable_traps(struct kvm_vcpu *vcpu)
 {
-   if (!has_vhe())
-   disable_el1_phys_timer_access();
+   if (!has_vhe()) {
+   u64 val;
+
+   /*
+* Disallow physical timer access for the guest
+* Physical counter access is allowed
+*/
+   val = read_sysreg(cnthctl_el2);
+   val &= ~CNTHCTL_EL1PCEN;
+   val |= CNTHCTL_EL1PCTEN;
+   write_sysreg(val, cnthctl_el2);
+   }
 }
-- 
2.14.2



Re: [PATCH v4 14/20] KVM: arm/arm64: Avoid timer save/restore in vcpu entry/exit

2017-11-20 Thread Christoffer Dall
On Thu, Nov 16, 2017 at 03:30:39PM -0500, Jintack Lim wrote:
> Hi Christoffer,
> 
> On Fri, Oct 20, 2017 at 7:49 AM, Christoffer Dall wrote:
> > From: Christoffer Dall 
> >
> > We don't need to save and restore the hardware timer state and examine
> > if it generates interrupts on every entry/exit to the guest.  The
> > timer hardware is perfectly capable of telling us when it has expired
> > by signaling interrupts.
> >
> > When taking a vtimer interrupt in the host, we don't want to mess with
> > the timer configuration, we just want to forward the physical interrupt
> > to the guest as a virtual interrupt.  We can use the split priority drop
> > and deactivate feature of the GIC to do this, which leaves an EOI'ed
> > interrupt active on the physical distributor, making sure we don't keep
> > taking timer interrupts which would prevent the guest from running.  We
> > can then forward the physical interrupt to the VM using the HW bit in
> > the LR of the GIC, like we do already, which lets the guest directly
> > deactivate both the physical and virtual timer simultaneously, allowing
> > the timer hardware to exit the VM and generate a new physical interrupt
> > when the timer output is again asserted later on.
> >
> > We do need to capture this state when migrating VCPUs between physical
> > CPUs, however, which we use the vcpu put/load functions for, which are
> > called through preempt notifiers whenever the thread is scheduled away
> > from the CPU or called directly if we return from the ioctl to
> > userspace.
> >
> > One caveat is that we have to save and restore the timer state in both
> > kvm_timer_vcpu_[put/load] and kvm_timer_[schedule/unschedule], because
> > we can have the following flows:
> >
> >   1. kvm_vcpu_block
> >   2. kvm_timer_schedule
> >   3. schedule
> >   4. kvm_timer_vcpu_put (preempt notifier)
> >   5. schedule (vcpu thread gets scheduled back)
> >   6. kvm_timer_vcpu_load (preempt notifier)
> >   7. kvm_timer_unschedule
> >
> > And a version where we don't actually call schedule:
> >
> >   1. kvm_vcpu_block
> >   2. kvm_timer_schedule
> >   7. kvm_timer_unschedule
> >
> > Since kvm_timer_[schedule/unschedule] may not be followed by put/load,
> > but put/load also may be called independently, we call the timer
> > save/restore functions from both paths.  Since they rely on the loaded
> > flag to never save/restore when unnecessary, this doesn't cause any
> > harm, and we ensure that all invocations of either set of functions work
> > as intended.
> >
> > An added benefit beyond not having to read and write the timer sysregs
> > on every entry and exit is that we no longer have to actively write the
> > active state to the physical distributor, because we configured the
> > irq for the vtimer to only get a priority drop when handling the
> > interrupt in the GIC driver (we called irq_set_vcpu_affinity()), and
> > the interrupt stays active after firing on the host.
> >
> > Signed-off-by: Christoffer Dall 
> > ---
> >
> > Notes:
> > Changes since v3:
> >  - Added comments explaining the 'loaded' flag and made other clarifying
> >comments.
> >  - No longer rely on the armed flag to conditionally save/restore state,
> >as we already rely on the 'loaded' flag to not repetitively
> >save/restore state.
> >  - Reworded parts of the commit message.
> >  - Removed renames not belonging to this patch.
> >  - Added warning in kvm_arch_timer_handler in case we see spurious
> >interrupts, for example if the hardware doesn't retire the
> >level-triggered timer signal fast enough.
> >
> >  include/kvm/arm_arch_timer.h |  16 ++-
> >  virt/kvm/arm/arch_timer.c| 237 +++
> >  virt/kvm/arm/arm.c   |  19 +++-
> >  3 files changed, 178 insertions(+), 94 deletions(-)
> >
> > diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> > index 184c3ef2df93..c538f707e1c1 100644
> > --- a/include/kvm/arm_arch_timer.h
> > +++ b/include/kvm/arm_arch_timer.h
> > @@ -31,8 +31,15 @@ struct arch_timer_context {
> > /* Timer IRQ */
> > struct kvm_irq_levelirq;
> >
> > -   /* Active IRQ state caching */
> > -   boolactive_cleared_last;
> > +   /*
> > +* We have multiple paths which can save/restore the timer state
> > +* onto the hardware, so we need some way of keeping track of
> > +* where the latest state is.
> > +*
> > +* loaded == true:  State is loaded on the hardware registers.
> > +* loaded == false: State is stored in memory.
> > +*/
> > +   boolloaded;
> >
> > /* Virtual offset */
> > u64 cntvoff;
> > @@ -78,10 +85,15 @@ void kvm_timer_unschedule(struct kvm_vcpu *vcpu);
> >
> >  u64 kvm_phys_timer_read(void);
> >
> > 
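
The 'loaded' flag described above is what makes the overlapping
schedule/unschedule and put/load paths harmless: whichever path runs second
sees the flag and does nothing. A simplified schematic of the save side (not
the exact kernel code, which also masks interrupts and checks
timer->enabled):

static void vtimer_save_state(struct kvm_vcpu *vcpu)
{
	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);

	if (!vtimer->loaded)
		return;		/* state already lives in memory */

	/* Capture the hardware state, then disable the virtual timer so
	 * it cannot fire while the vcpu is not running. */
	vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
	vtimer->cnt_cval = read_sysreg_el0(cntv_cval);
	write_sysreg_el0(0, cntv_ctl);

	vtimer->loaded = false;
}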

Re: [PATCH] KVM: arm/arm64: VGIC: extend !vgic_is_initialized guard

2017-11-20 Thread Auger Eric
Hi Andre,

On 17/11/2017 18:58, Andre Przywara wrote:
> Commit f39d16cbabf9 ("KVM: arm/arm64: Guard kvm_vgic_map_is_active against
> !vgic_initialized") introduced a check whether the VGIC has been
> initialized before accessing the spinlock and the VGIC data structure.
> However the vgic_get_irq() call in the variable declaration sneaked
> through the net, so let's make sure that this also gets called only after
> we actually allocated the arrays this function accesses.
> 
> Signed-off-by: Andre Przywara 
actually it does not directly apply on kvmarm/next. There is a small
conflict with 47bbd31 ("KVM: arm/arm64: vgic: restructure
kvm_vgic_(un)map_phys_irq"), I think.

Cheers

Eric
> ---
>  virt/kvm/arm/vgic/vgic.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index e54ef2fdf73d..967983a33ab2 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -786,13 +786,14 @@ void vgic_kick_vcpus(struct kvm *kvm)
>  
>  bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int virt_irq)
>  {
> - struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
> + struct vgic_irq *irq;
>   bool map_is_active;
>   unsigned long flags;
>  
>   if (!vgic_initialized(vcpu->kvm))
>   return false;
>  
> + irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
>   spin_lock_irqsave(&irq->irq_lock, flags);
>   map_is_active = irq->hw && irq->active;
>   spin_unlock_irqrestore(&irq->irq_lock, flags);
> 
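
The underlying C pitfall is that an initializer in a declaration runs before
any statement below it, so a guard placed afterwards never protects it. A
minimal illustration, with hypothetical types and helpers standing in for the
vgic structures:

#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types/helpers, purely to show the ordering bug. */
struct entry { bool active; };
struct table { struct entry *entries; size_t nr; };

static bool init_done(struct table *t) { return t->entries != NULL; }
static struct entry *lookup(struct table *t, int i) { return &t->entries[i]; }

/* Buggy: the initializer runs before the guard below it. */
static bool is_active_buggy(struct table *t, int idx)
{
	struct entry *e = lookup(t, idx);	/* indexes a possibly-unallocated array */

	if (!init_done(t))
		return false;
	return e->active;
}

/* Fixed: declare first, look up only after the guard. */
static bool is_active_fixed(struct table *t, int idx)
{
	struct entry *e;

	if (!init_done(t))
		return false;
	e = lookup(t, idx);
	return e->active;
}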


Re: [PATCH v4 00/21] SError rework + RAS for firmware first support

2017-11-20 Thread Christoffer Dall
On Tue, Nov 14, 2017 at 04:03:01PM +, James Morse wrote:
> Hi Christoffer,
> 
> On 13/11/17 11:29, Christoffer Dall wrote:
> > On Thu, Nov 09, 2017 at 06:14:56PM +, James Morse wrote:
> >> On 19/10/17 15:57, James Morse wrote:
> >>> Known issues:
> >> [...]
> >>>  * KVM-Migration: VDISR_EL2 is exposed to userspace as DISR_EL1, but how
> >>>should HCR_EL2.VSE or VSESR_EL2 be migrated when the guest has an
> >>>SError pending but hasn't taken it yet...?
> >>
> >> I've been trying to work out how this pending-SError-migration could work.
> >>
> >> If HCR_EL2.VSE is set then the guest will take a virtual SError when it
> >> next unmasks SError. Today this doesn't get migrated, but only KVM sets
> >> this bit as an attempt to kill the guest.
> >>
> >> This will be more of a problem with GengDongjiu's SError CAP for triggering
> >> guest SError from user-space, which will also allow the VSESR_EL2 to be
> >> specified. (this register becomes the guest ESR_EL1 when the virtual
> >> SError is taken and is used to emulate firmware-first's NOTIFY_SEI and
> >> eventually kernel-first RAS). These errors are likely to be handled by
> >> the guest.
> >>
> >>
> >> We don't want to expose VSESR_EL2 to user-space, and for migration it isn't
> >> enough as a value of '0' doesn't tell us if HCR_EL2.VSE is set.
> >>
> >> To get out of this corner: why not declare pending-SError-migration an
> >> invalid thing to do?
> 
> > To answer that question we'd have to know if that is generally a valid
> > thing to require.  How will higher level tools in the stack deal with
> > this (e.g. libvirt, and OpenStack).  Is it really valid to tell them
> > "nope, can't migrate right now".  I'm thinking if you have a failing
> > host and want to signal some error to the guest, that's probably a
> > really good time to migrate your mission-critical VM away to a different
> > host, and being told, "sorry, cannot do this" would be painful.  I'm
> > cc'ing Drew for his insight into libvirt and how this is done on x86,
> 
> Thanks,
> 
> 
> > but I'm not really crazy about this idea.
> 
> Excellent, so at the other extreme we could have an API to query all of this
> state, and another to set it. On systems without the RAS extensions this just
> moves the HCR_EL2.VSE bit. On systems with the RAS extensions it moves
> VSESR_EL2 too.
> 
> I was hoping to avoid exposing different information. I need to look into how
> that works. (and this is all while avoiding adding an EL2 register to
> vcpu_sysreg [0])
> 
> 
> >> We can give Qemu a way to query if a virtual SError is (still) pending.
> >> Qemu would need to check this on each vcpu after migration, just before
> >> it throws the switch and the guest runs on the new host. This way the
> >> VSESR_EL2 value doesn't need migrating at all.
> >>
> >> In the ideal world, Qemu could re-inject the last SError it triggered if
> >> there is still one pending when it migrates... but because KVM injects
> >> errors too, it would need to block migration until this flag is cleared.
> 
> > I don't understand your conclusion here.
> 
> I was trying to reduce it to exposing just HCR_EL2.VSE as 'bool
> serror_still_pending()', then let Qemu re-inject whatever SError it injected
> last. This then behaves the same regardless of the RAS support.
> But KVM's kvm_inject_vabt() breaks this: Qemu can't know whether this pending
> SError was from Qemu, or from KVM.
> 
> ... So we need VSESR_EL2 on systems which have that register ...
> 
> (or, get rid of kvm_inject_vabt(), but that would involve a new exit type, and
> some trickery for existing user-space)
> 
> > If QEMU can query the virtual SError pending state, it can also inject
> > that before running the VM after a restore, and we should have preserved
> > the same state.
> 
> [..]
> 
> >> Can anyone suggest a better way?
> 
> > I'm thinking this is analogous to migrating a VM that uses an irqchip in
> > userspace and has set the IRQ or FIQ lines using KVM_IRQ_LINE.  My
> > feeling is that this is also not supported today.
> 
> Does KVM change/update these values behind Qemu's back? It's kvm_inject_vabt()
> that is making this tricky. (or at least confusing me)
> 

Yes, the IRQ line can be set to high from userspace, and then KVM can
lower this value when the guest has taken the virtual IRQ/FIQ.  I think
it's completely similar to your problem.

Thanks,
-Christoffer
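
To make the proposed interface concrete, the userspace flow sketched in this
thread could look roughly like the following. The ioctl names and numbers
below are invented for illustration only; no such KVM API existed when this
was written:

#include <stdint.h>
#include <sys/ioctl.h>

/* Made-up ioctl requests, purely for illustration. */
#define KVM_ARM_GET_SERROR_PENDING	0xAE01
#define KVM_ARM_SET_SERROR		0xAE02

static void reinject_serror_after_migration(int vcpu_fd, uint64_t last_esr)
{
	int pending = 0;

	/* Ask KVM whether a virtual SError (HCR_EL2.VSE) is still pending. */
	if (ioctl(vcpu_fd, KVM_ARM_GET_SERROR_PENDING, &pending) < 0)
		return;

	/* Re-inject the last SError userspace triggered, with its ESR on
	 * systems that have the RAS extensions. */
	if (pending)
		ioctl(vcpu_fd, KVM_ARM_SET_SERROR, &last_esr);
}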


Re: [PATCH v4 00/21] SError rework + RAS for firmware first support

2017-11-20 Thread Christoffer Dall
On Mon, Nov 13, 2017 at 01:05:19PM +, Peter Maydell wrote:
> On 13 November 2017 at 11:29, Christoffer Dall  wrote:
> > I'm thinking this is analogous to migrating a VM that uses an irqchip in
> > userspace and has set the IRQ or FIQ lines using KVM_IRQ_LINE.  My
> > feeling is that this is also not supported today.
> 
> Oops, yes, we completely forgot about migration when we added
> that feature... I think you're right that we won't get that right.

So I think it might actually work for the timer, because we migrate the
timer state, and I think QEMU migrates the timer-to-GIC line state, and
if we're lucky the IRQ line from the userspace GIC to the KVM VCPU would
get updated after restore.

But in general, KVM_IRQ_LINE values don't get migrated, and I think
that's a problem we've probably had from the initial implementation, and
not introduced with userspace timers support.

IIRC, QEMU is really happy to continuously call KVM_IRQ_LINE with the
same value (which could potentially be further optimized), and that
currently may hide this problem.

Still, it's something we should look at and ensure is correct,
especially when adding a new similar state that needs migration.

Thanks,
-Christoffer
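
For reference, KVM_IRQ_LINE and struct kvm_irq_level are the real interface
under discussion; a VMM re-asserting a line after restore would do roughly
this (a minimal sketch, error handling elided):

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Re-assert a userspace-routed interrupt line via KVM_IRQ_LINE.  On
 * arm/arm64 the 'irq' field encodes the irq type (CPU IRQ/FIQ vs. SPI),
 * the vcpu index and the line number; 'level' is 1 for asserted,
 * 0 for deasserted.
 */
static int set_irq_line(int vm_fd, __u32 irq, __u32 level)
{
	struct kvm_irq_level irq_level = {
		.irq = irq,
		.level = level,
	};

	return ioctl(vm_fd, KVM_IRQ_LINE, &irq_level);
}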


Re: [PATCH] KVM: arm/arm64: VGIC: extend !vgic_is_initialized guard

2017-11-20 Thread Auger Eric
Hi Andre,

On 17/11/2017 18:58, Andre Przywara wrote:
> Commit f39d16cbabf9 ("KVM: arm/arm64: Guard kvm_vgic_map_is_active against
> !vgic_initialized") introduced a check whether the VGIC has been
> initialized before accessing the spinlock and the VGIC data structure.
> However the vgic_get_irq() call in the variable declaration sneaked
> through the net, so let's make sure that this also gets called only after
> we actually allocated the arrays this function accesses.
> 
> Signed-off-by: Andre Przywara 
Reviewed-by: Eric Auger 

Thanks

Eric
> ---
>  virt/kvm/arm/vgic/vgic.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index e54ef2fdf73d..967983a33ab2 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -786,13 +786,14 @@ void vgic_kick_vcpus(struct kvm *kvm)
>  
>  bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int virt_irq)
>  {
> - struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
> + struct vgic_irq *irq;
>   bool map_is_active;
>   unsigned long flags;
>  
>   if (!vgic_initialized(vcpu->kvm))
>   return false;
>  
> + irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
>   spin_lock_irqsave(&irq->irq_lock, flags);
>   map_is_active = irq->hw && irq->active;
>   spin_unlock_irqrestore(&irq->irq_lock, flags);
> 