Re: [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put

2018-02-14 Thread Marc Zyngier
On Wed, 14 Feb 2018 17:38:11 +,
Christoffer Dall wrote:
> 
> On Wed, Feb 14, 2018 at 02:43:42PM +, Dave Martin wrote:
> > [CC Ard, in case he has a view on how much we care about softirq NEON
> > performance regressions ... and whether my suggestions make sense]
> > 
> > On Wed, Feb 14, 2018 at 11:15:54AM +0100, Christoffer Dall wrote:
> > > On Tue, Feb 13, 2018 at 02:08:47PM +, Dave Martin wrote:
> > > > On Tue, Feb 13, 2018 at 09:51:30AM +0100, Christoffer Dall wrote:
> > > > > On Fri, Feb 09, 2018 at 03:59:30PM +, Dave Martin wrote:
> 
> [...]
> 
> > > > 
> > > > kvm_fpsimd_flush_cpu_state() is just an invalidation.  No state is
> > > > actually saved today because we explicitly don't care about preserving
> > > > the SVE state, because the syscall ABI throws the SVE regs away as
> > > > a side effect of any syscall, including ioctl(KVM_RUN); also (currently) KVM
> > > > ensures that the non-SVE FPSIMD bits _are_ restored by itself.
> > > > 
> > > > I think my proposal is that this hook might take on the role of
> > > > actually saving the state too, if we move that out of the KVM host
> > > > context save/restore code.
> > > > 
> > > > Perhaps we could even replace
> > > > 
> > > > preempt_disable();
> > > > kvm_fpsimd_flush_cpu_state();
> > > > /* ... */
> > > > preempt_enable();
> > > > 
> > > > with
> > > > 
> > > > kernel_neon_begin();
> > > > /* ... */
> > > > kernel_neon_end();
> > > 
> > > I'm not entirely sure where the begin and end points would be in the
> > > context of KVM?
> > 
> > Hmmm, actually there's a bug in your VHE changes now I look more
> > closely in this area:
> > 
> > You assume that the only way for the FPSIMD regs to get unexpectedly
> > dirtied is through a context switch, but actually this is not the case:
> > a softirq can use kernel-mode NEON any time that softirqs are enabled.
> > 
> > This means that in between kvm_arch_vcpu_load() and _put() (whether via
> > preempt notification or not), the guest's FPSIMD state in the regs may
> > be trashed by a softirq.
> 
> ouch.
> 
> > 
> > The simplest fix is to disable softirqs and preemption for that whole
> > region, but since we can stay in it indefinitely that's obviously not
> > the right approach.  Putting kernel_neon_begin() in _load() and
> > kernel_neon_end() in _put() achieves the same without disabling
> > softirq, but preemption is still disabled throughout, which is bad.
> > This effectively makes the run ioctl nonpreemptible...
> > 
> > A better fix would be to set the cpu's kernel_neon_busy flag, which
> > makes softirq code use non-NEON fallback code.
> > 
> > We could expose an interface from fpsimd.c to support that.
> > 
> > It still comes at a cost though: due to the switching from NEON to
> > fallback code in softirq handlers, we may get a big performance
> > regression in setups that rely heavily on NEON in softirq for
> > performance.
> > 
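
[A minimal sketch of the fpsimd.c interface Dave suggests above, with
hypothetical function names; kernel_neon_busy is the existing arm64
per-cpu flag that may_use_simd() consults, so setting it steers softirq
handlers onto their non-NEON fallback paths. The natural callers would
be kvm_arch_vcpu_load() and kvm_arch_vcpu_put(), which already run with
preemption disabled:

	/* Hypothetical fpsimd.c helpers -- a sketch, not merged code. */
	void fpsimd_block_kernel_neon(void)
	{
		WARN_ON(preemptible());
		__this_cpu_write(kernel_neon_busy, true);
	}

	void fpsimd_unblock_kernel_neon(void)
	{
		WARN_ON(preemptible());
		__this_cpu_write(kernel_neon_busy, false);
	}
]
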
> 
> I wasn't aware that softirqs would use fpsimd.
> 
> > 
> > Alternatively we could do something like the following, but it's a
> > rather gross abstraction violation:
> > 
> > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > index 2e43f9d..6a1ff3a 100644
> > --- a/virt/kvm/arm/arm.c
> > +++ b/virt/kvm/arm/arm.c
> > @@ -746,9 +746,24 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  * the effect of taking the interrupt again, in SVC
> >  * mode this time.
> >  */
> > +   local_bh_disable();
> > local_irq_enable();
> >  
> > /*
> > +* If we exited due to one or more pending interrupts, they
> > +* have now been handled.  If such an interrupt pended a
> > +* softirq, we shouldn't prevent that softirq from using
> > +* kernel-mode NEON indefinitely: instead, give FPSIMD back to
> > +* the host to manage as it likes.  We'll grab it again on the
> > +* next FPSIMD trap from the guest (if any).
> > +*/
> > +   if (local_softirq_pending() && FPSIMD untrapped for guest) {
> > +   /* save vcpu FPSIMD context */
> > +   /* enable FPSIMD trap for guest */
> > +   }
> > +   local_bh_enable();
> > +
> > +   /*
> >  * We do local_irq_enable() before calling guest_exit() so
> >  * that if a timer interrupt hits while running the guest we
> >  * account that tick as being spent in the guest.  We enable
> > 
> > [...]
> > 
> 
> I can't see this working, what if an IRQ comes in and a softirq gets
> pending immediately after local_bh_enable() above?
> 
> And as you say, it's really not pretty.
> 
> This is really making me think that I'll drop this part of the
> optimization and when we do optimize fpsimd handling, we do it properly
> by integrating it with the kernel tracking.
> 
> What do you think?

[catching up with the 

Re: [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put

2018-02-14 Thread Ard Biesheuvel
On 14 February 2018 at 17:38, Christoffer Dall wrote:
> On Wed, Feb 14, 2018 at 02:43:42PM +, Dave Martin wrote:
>> [CC Ard, in case he has a view on how much we care about softirq NEON
>> performance regressions ... and whether my suggestions make sense]
>>
>> On Wed, Feb 14, 2018 at 11:15:54AM +0100, Christoffer Dall wrote:
>> > On Tue, Feb 13, 2018 at 02:08:47PM +, Dave Martin wrote:
>> > > On Tue, Feb 13, 2018 at 09:51:30AM +0100, Christoffer Dall wrote:
>> > > > On Fri, Feb 09, 2018 at 03:59:30PM +, Dave Martin wrote:
>
> [...]
>
>> > >
>> > > kvm_fpsimd_flush_cpu_state() is just an invalidation.  No state is
>> > > actually saved today because we explicitly don't care about preserving
>> > > the SVE state, because the syscall ABI throws the SVE regs away as
>> > > a side effect of any syscall, including ioctl(KVM_RUN); also (currently) KVM
>> > > ensures that the non-SVE FPSIMD bits _are_ restored by itself.
>> > >
>> > > I think my proposal is that this hook might take on the role of
>> > > actually saving the state too, if we move that out of the KVM host
>> > > context save/restore code.
>> > >
>> > > Perhaps we could even replace
>> > >
>> > >   preempt_disable();
>> > >   kvm_fpsimd_flush_cpu_state();
>> > >   /* ... */
>> > >   preempt_enable();
>> > >
>> > > with
>> > >
>> > >   kernel_neon_begin();
>> > >   /* ... */
>> > >   kernel_neon_end();
>> >
>> > I'm not entirely sure where the begin and end points would be in the
>> > context of KVM?
>>
>> Hmmm, actually there's a bug in your VHE changes now I look more
>> closely in this area:
>>
>> You assume that the only way for the FPSIMD regs to get unexpectedly
>> dirtied is through a context switch, but actually this is not the case:
>> a softirq can use kernel-mode NEON any time that softirqs are enabled.
>>
>> This means that in between kvm_arch_vcpu_load() and _put() (whether via
>> preempt notification or not), the guest's FPSIMD state in the regs may
>> be trashed by a softirq.
>
> ouch.
>
>>
>> The simplest fix is to disable softirqs and preemption for that whole
>> region, but since we can stay in it indefinitely that's obviously not
>> the right approach.  Putting kernel_neon_begin() in _load() and
>> kernel_neon_end() in _put() achieves the same without disabling
>> softirq, but preemption is still disabled throughout, which is bad.
>> This effectively makes the run ioctl nonpreemptible...
>>
>> A better fix would be to set the cpu's kernel_neon_busy flag, which
>> makes softirq code use non-NEON fallback code.
>>
>> We could expose an interface from fpsimd.c to support that.
>>
>> It still comes at a cost though: due to the switching from NEON to
>> fallback code in softirq handlers, we may get a big performance
>> regression in setups that rely heavily on NEON in softirq for
>> performance.
>>
>
> I wasn't aware that softirqs would use fpsimd.
>

It is not common but it is permitted by the API, and there is mac80211
code and IPsec code that does this.

Performance penalties incurred by switching from accelerated
hardware-instruction-based crypto to scalar code can be as high as 20x,
so we should really avoid this if we can.
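
[For context, the pattern Ard is describing is the standard arm64
conditional-SIMD idiom: NEON users in softirq context check
may_use_simd() and fall back to scalar code when kernel-mode NEON is
unavailable, for instance while kernel_neon_busy is set. A sketch,
where the chacha_block_neon()/chacha_block_scalar() implementations
are hypothetical:

	#include <asm/neon.h>
	#include <asm/simd.h>

	static void chacha_do_block(void *dst, const void *src, int len)
	{
		if (may_use_simd()) {
			kernel_neon_begin();
			chacha_block_neon(dst, src, len);	/* NEON path */
			kernel_neon_end();
		} else {
			chacha_block_scalar(dst, src, len);	/* ~20x slower */
		}
	}
]
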

>>
>> Alternatively we could do something like the following, but it's a
>> rather gross abstraction violation:
>>
>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>> index 2e43f9d..6a1ff3a 100644
>> --- a/virt/kvm/arm/arm.c
>> +++ b/virt/kvm/arm/arm.c
>> @@ -746,9 +746,24 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>* the effect of taking the interrupt again, in SVC
>>* mode this time.
>>*/
>> + local_bh_disable();
>>   local_irq_enable();
>>
>>   /*
>> +  * If we exited due to one or more pending interrupts, they
>> +  * have now been handled.  If such an interrupt pended a
>> +  * softirq, we shouldn't prevent that softirq from using
>> +  * kernel-mode NEON indefinitely: instead, give FPSIMD back to
>> +  * the host to manage as it likes.  We'll grab it again on the
>> +  * next FPSIMD trap from the guest (if any).
>> +  */
>> + if (local_softirq_pending() && FPSIMD untrapped for guest) {
>> + /* save vcpu FPSIMD context */
>> + /* enable FPSIMD trap for guest */
>> + }
>> + local_bh_enable();
>> +
>> + /*
>>* We do local_irq_enable() before calling guest_exit() so
>>* that if a timer interrupt hits while running the guest we
>>* account that tick as being spent in the guest.  We enable
>>
>> [...]
>>
>
> I can't see this working, what if an IRQ comes in and a softirq gets
> pending immediately after local_bh_enable() above?
>
> And as you say, it's really not pretty.
>
> This is really making me think that I'll drop this part of the

Re: [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put

2018-02-14 Thread Christoffer Dall
On Wed, Feb 14, 2018 at 02:43:42PM +, Dave Martin wrote:
> [CC Ard, in case he has a view on how much we care about softirq NEON
> performance regressions ... and whether my suggestions make sense]
> 
> On Wed, Feb 14, 2018 at 11:15:54AM +0100, Christoffer Dall wrote:
> > On Tue, Feb 13, 2018 at 02:08:47PM +, Dave Martin wrote:
> > > On Tue, Feb 13, 2018 at 09:51:30AM +0100, Christoffer Dall wrote:
> > > > On Fri, Feb 09, 2018 at 03:59:30PM +, Dave Martin wrote:

[...]

> > > 
> > > kvm_fpsimd_flush_cpu_state() is just an invalidation.  No state is
> > > actually saved today because we explicitly don't care about preserving
> > > the SVE state, because the syscall ABI throws the SVE regs away as
> > > a side effect of any syscall, including ioctl(KVM_RUN); also (currently) KVM
> > > ensures that the non-SVE FPSIMD bits _are_ restored by itself.
> > > 
> > > I think my proposal is that this hook might take on the role of
> > > actually saving the state too, if we move that out of the KVM host
> > > context save/restore code.
> > > 
> > > Perhaps we could even replace
> > > 
> > >   preempt_disable();
> > >   kvm_fpsimd_flush_cpu_state();
> > >   /* ... */
> > >   preempt_enable();
> > > 
> > > with
> > > 
> > >   kernel_neon_begin();
> > >   /* ... */
> > >   kernel_neon_end();
> > 
> > I'm not entirely sure where the begin and end points would be in the
> > context of KVM?
> 
> Hmmm, actually there's a bug in your VHE changes now I look more
> closely in this area:
> 
> You assume that the only way for the FPSIMD regs to get unexpectedly
> dirtied is through a context switch, but actually this is not the case:
> a softirq can use kernel-mode NEON any time that softirqs are enabled.
> 
> This means that in between kvm_arch_vcpu_load() and _put() (whether via
> preempt notification or not), the guest's FPSIMD state in the regs may
> be trashed by a softirq.

ouch.

> 
> The simplest fix is to disable softirqs and preemption for that whole
> region, but since we can stay in it indefinitely that's obviously not
> the right approach.  Putting kernel_neon_begin() in _load() and
> kernel_neon_end() in _put() achieves the same without disabling
> softirq, but preemption is still disabled throughout, which is bad.
> This effectively makes the run ioctl nonpreemptible...
> 
> A better fix would be to set the cpu's kernel_neon_busy flag, which
> makes softirq code use non-NEON fallback code.
> 
> We could expose an interface from fpsimd.c to support that.
> 
> It still comes at a cost though: due to the switching from NEON to
> fallback code in softirq handlers, we may get a big performance
> regression in setups that rely heavily on NEON in softirq for
> performance.
> 

I wasn't aware that softirqs would use fpsimd.

> 
> Alternatively we could do something like the following, but it's a
> rather gross abstraction violation:
> 
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 2e43f9d..6a1ff3a 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -746,9 +746,24 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>* the effect of taking the interrupt again, in SVC
>* mode this time.
>*/
> + local_bh_disable();
>   local_irq_enable();
>  
>   /*
> +  * If we exited due to one or more pending interrupts, they
> +  * have now been handled.  If such an interrupt pended a
> +  * softirq, we shouldn't prevent that softirq from using
> +  * kernel-mode NEON indefinitely: instead, give FPSIMD back to
> +  * the host to manage as it likes.  We'll grab it again on the
> +  * next FPSIMD trap from the guest (if any).
> +  */
> + if (local_softirq_pending() && FPSIMD untrapped for guest) {
> + /* save vcpu FPSIMD context */
> + /* enable FPSIMD trap for guest */
> + }
> + local_bh_enable();
> +
> + /*
>* We do local_irq_enable() before calling guest_exit() so
>* that if a timer interrupt hits while running the guest we
>* account that tick as being spent in the guest.  We enable
> 
> [...]
> 

I can't see this working, what if an IRQ comes in and a softirq gets
pending immediately after local_bh_enable() above?

And as you say, it's really not pretty.

This is really making me think that I'll drop this part of the
optimization and when we do optimize fpsimd handling, we do it properly
by integrating it with the kernel tracking.

What do you think?

Thanks,
-Christoffer


Re: [PATCH 4/4] vfio: Allow type-1 IOMMU instantiation with a virtio-iommu

2018-02-14 Thread Robin Murphy

On 14/02/18 15:26, Alex Williamson wrote:

On Wed, 14 Feb 2018 14:53:40 +
Jean-Philippe Brucker wrote:


When enabling both VFIO and VIRTIO_IOMMU modules, automatically select
VFIO_IOMMU_TYPE1 as well.

Signed-off-by: Jean-Philippe Brucker 
---
  drivers/vfio/Kconfig | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index c84333eb5eb5..65a1e691110c 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -21,7 +21,7 @@ config VFIO_VIRQFD
  menuconfig VFIO
tristate "VFIO Non-Privileged userspace driver framework"
depends on IOMMU_API
-   select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM_SMMU || ARM_SMMU_V3)
+   select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM_SMMU || ARM_SMMU_V3 || VIRTIO_IOMMU)
select ANON_INODES
help
  VFIO provides a framework for secure userspace device drivers.


Why are we basing this on specific IOMMU drivers in the first place?
Only ARM is doing that.  Shouldn't IOMMU_API only be enabled for ARM
targets that support it and therefore we can forget about the specific
IOMMU drivers?  Thanks,


Makes sense - the majority of ARM systems (and mobile/embedded ARM64 
ones) making use of IOMMU_API won't actually support VFIO, but it can't 
hurt to allow them to select the type 1 driver regardless. Especially as 
multiplatform configs are liable to be pulling in the SMMU driver(s) anyway.


Robin.


Re: [PATCH 4/4] vfio: Allow type-1 IOMMU instantiation with a virtio-iommu

2018-02-14 Thread Alex Williamson
On Wed, 14 Feb 2018 14:53:40 +
Jean-Philippe Brucker wrote:

> When enabling both VFIO and VIRTIO_IOMMU modules, automatically select
> VFIO_IOMMU_TYPE1 as well.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/vfio/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index c84333eb5eb5..65a1e691110c 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -21,7 +21,7 @@ config VFIO_VIRQFD
>  menuconfig VFIO
>   tristate "VFIO Non-Privileged userspace driver framework"
>   depends on IOMMU_API
> - select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM_SMMU || ARM_SMMU_V3)
> + select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM_SMMU || ARM_SMMU_V3 || VIRTIO_IOMMU)
>   select ANON_INODES
>   help
> VFIO provides a framework for secure userspace device drivers.

Why are we basing this on specific IOMMU drivers in the first place?
Only ARM is doing that.  Shouldn't IOMMU_API only be enabled for ARM
targets that support it and therefore we can forget about the specific
IOMMU drivers?  Thanks,

Alex


[PATCH 4/4] vfio: Allow type-1 IOMMU instantiation with a virtio-iommu

2018-02-14 Thread Jean-Philippe Brucker
When enabling both VFIO and VIRTIO_IOMMU modules, automatically select
VFIO_IOMMU_TYPE1 as well.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/vfio/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index c84333eb5eb5..65a1e691110c 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -21,7 +21,7 @@ config VFIO_VIRQFD
 menuconfig VFIO
tristate "VFIO Non-Privileged userspace driver framework"
depends on IOMMU_API
-   select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM_SMMU || ARM_SMMU_V3)
+   select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM_SMMU || ARM_SMMU_V3 || VIRTIO_IOMMU)
select ANON_INODES
help
  VFIO provides a framework for secure userspace device drivers.
-- 
2.16.1



[PATCH 2/4] iommu/virtio: Add probe request

2018-02-14 Thread Jean-Philippe Brucker
When the device offers the probe feature, send a probe request for each
device managed by the IOMMU. Extract RESV_MEM information. When we
encounter an MSI doorbell region, set it up as an IOMMU_RESV_MSI region.
This will tell other subsystems that there is no need to map the MSI
doorbell in the virtio-iommu, because MSIs bypass it.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c  | 163 --
 include/uapi/linux/virtio_iommu.h |  37 +
 2 files changed, 193 insertions(+), 7 deletions(-)
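
[For orientation while reading the diff below: a probe request carries
an endpoint ID and is answered with a sequence of (type, length, value)
properties. The layouts sketched here are inferred from how the driver
dereferences them (probe->endpoint, prop->type, prop->length,
prop->value); the authoritative definitions are in the
include/uapi/linux/virtio_iommu.h hunk, which this archive cuts off:

	struct virtio_iommu_req_probe {
		struct virtio_iommu_req_head	head;
		__le32				endpoint;
		__u8				properties[];
		/* struct virtio_iommu_req_tail follows the properties */
	};

	struct virtio_iommu_probe_property {
		__le16				type;
		__le16				length;
		__u8				value[];
	};
]
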

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index a9c9245e8ba2..3ac4b38eaf19 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -45,6 +45,7 @@ struct viommu_dev {
struct iommu_domain_geometrygeometry;
u64 pgsize_bitmap;
u8  domain_bits;
+   u32 probe_size;
 };
 
 struct viommu_mapping {
@@ -72,6 +73,7 @@ struct viommu_domain {
 struct viommu_endpoint {
struct viommu_dev   *viommu;
struct viommu_domain*vdomain;
+   struct list_headresv_regions;
 };
 
 struct viommu_request {
@@ -140,6 +142,10 @@ static int viommu_get_req_size(struct viommu_dev *viommu,
case VIRTIO_IOMMU_T_UNMAP:
size = sizeof(r->unmap);
break;
+   case VIRTIO_IOMMU_T_PROBE:
+   *bottom += viommu->probe_size;
+   size = sizeof(r->probe) + *bottom;
+   break;
default:
return -EINVAL;
}
@@ -448,6 +454,105 @@ static int viommu_replay_mappings(struct viommu_domain *vdomain)
return ret;
 }
 
+static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
+  struct virtio_iommu_probe_resv_mem *mem,
+  size_t len)
+{
+   struct iommu_resv_region *region = NULL;
+   unsigned long prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
+
+   u64 addr = le64_to_cpu(mem->addr);
+   u64 size = le64_to_cpu(mem->size);
+
+   if (len < sizeof(*mem))
+   return -EINVAL;
+
+   switch (mem->subtype) {
+   case VIRTIO_IOMMU_RESV_MEM_T_MSI:
+   region = iommu_alloc_resv_region(addr, size, prot,
+IOMMU_RESV_MSI);
+   break;
+   case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:
+   default:
+   region = iommu_alloc_resv_region(addr, size, 0,
+IOMMU_RESV_RESERVED);
+   break;
+   }
+
+   list_add(&region->list, &vdev->resv_regions);
+
+   /*
+* Treat unknown subtype as RESERVED, but urge users to update their
+* driver.
+*/
+   if (mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_RESERVED &&
+   mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_MSI)
+   pr_warn("unknown resv mem subtype 0x%x\n", mem->subtype);
+
+   return 0;
+}
+
+static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device *dev)
+{
+   int ret;
+   u16 type, len;
+   size_t cur = 0;
+   struct virtio_iommu_req_probe *probe;
+   struct virtio_iommu_probe_property *prop;
+   struct iommu_fwspec *fwspec = dev->iommu_fwspec;
+   struct viommu_endpoint *vdev = fwspec->iommu_priv;
+
+   if (!fwspec->num_ids)
+   /* Trouble ahead. */
+   return -EINVAL;
+
+   probe = kzalloc(sizeof(*probe) + viommu->probe_size +
+   sizeof(struct virtio_iommu_req_tail), GFP_KERNEL);
+   if (!probe)
+   return -ENOMEM;
+
+   probe->head.type = VIRTIO_IOMMU_T_PROBE;
+   /*
+* For now, assume that properties of an endpoint that outputs multiple
+* IDs are consistent. Only probe the first one.
+*/
+   probe->endpoint = cpu_to_le32(fwspec->ids[0]);
+
+   ret = viommu_send_req_sync(viommu, probe);
+   if (ret)
+   goto out_free;
+
+   prop = (void *)probe->properties;
+   type = le16_to_cpu(prop->type) & VIRTIO_IOMMU_PROBE_T_MASK;
+
+   while (type != VIRTIO_IOMMU_PROBE_T_NONE &&
+  cur < viommu->probe_size) {
+   len = le16_to_cpu(prop->length);
+
+   switch (type) {
+   case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
+   ret = viommu_add_resv_mem(vdev, (void *)prop->value, len);
+   break;
+   default:
+   dev_dbg(dev, "unknown viommu prop 0x%x\n", type);
+   }
+
+   if (ret)
+   dev_err(dev, "failed to parse viommu prop 0x%x\n", type);
+
+   cur += sizeof(*prop) + len;
+   if (cur >= viommu->probe_size)
+   break;
+
+   prop = (void *)probe->properties + cur;
+ 

[PATCH 3/4] iommu/virtio: Add event queue

2018-02-14 Thread Jean-Philippe Brucker
The event queue offers a way for the device to report access faults from
endpoints. It is implemented on virtqueue #1. Whenever the host needs to
signal a fault, it fills one of the buffers offered by the guest and
interrupts it.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c  | 139 ++
 include/uapi/linux/virtio_iommu.h |  18 +
 2 files changed, 143 insertions(+), 14 deletions(-)
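
[The guest-side service routine for this queue is cut off in the diff
below, so here is a sketch of the shape the description implies: drain
fault records with virtqueue_get_buf(), hand them to
viommu_fault_handler() (added later in this patch), and immediately
return each buffer to the device. An illustration of the design, not
the literal patch code:

	static void viommu_event_handler(struct virtqueue *vq)
	{
		unsigned int len;
		struct scatterlist sg[1];
		struct viommu_event *evt;
		struct viommu_dev *viommu = vq->vdev->priv;

		while ((evt = virtqueue_get_buf(vq, &len)) != NULL) {
			if (len > sizeof(evt->head))
				viommu_fault_handler(viommu, &evt->fault);

			/* Give the buffer back so the device can report again. */
			sg_init_one(sg, evt, sizeof(*evt));
			if (virtqueue_add_inbuf(vq, sg, 1, evt, GFP_ATOMIC))
				dev_err(viommu->dev, "could not add event buffer\n");
		}

		virtqueue_kick(vq);
	}
]
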

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 3ac4b38eaf19..6b96f1b36d5a 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -30,6 +30,12 @@
#define MSI_IOVA_BASE  0x8000000
#define MSI_IOVA_LENGTH 0x100000
 
+enum viommu_vq_idx {
+   VIOMMU_REQUEST_VQ   = 0,
+   VIOMMU_EVENT_VQ = 1,
+   VIOMMU_NUM_VQS  = 2,
+};
+
 struct viommu_dev {
struct iommu_device iommu;
struct device   *dev;
@@ -37,9 +43,10 @@ struct viommu_dev {
 
struct ida  domain_ids;
 
-   struct virtqueue*vq;
+   struct virtqueue*vqs[VIOMMU_NUM_VQS];
/* Serialize anything touching the request queue */
spinlock_t  request_lock;
+   void*evts;
 
/* Device configuration */
struct iommu_domain_geometrygeometry;
@@ -84,6 +91,15 @@ struct viommu_request {
struct list_headlist;
 };
 
+#define VIOMMU_FAULT_RESV_MASK 0xff00
+
+struct viommu_event {
+   union {
+   u32 head;
+   struct virtio_iommu_fault fault;
+   };
+};
+
 #define to_viommu_domain(domain)   \
container_of(domain, struct viommu_domain, domain)
 
@@ -161,12 +177,13 @@ static int viommu_receive_resp(struct viommu_dev *viommu, int nr_sent,
unsigned int len;
int nr_received = 0;
struct viommu_request *req, *pending;
+   struct virtqueue *vq = viommu->vqs[VIOMMU_REQUEST_VQ];
 
pending = list_first_entry_or_null(sent, struct viommu_request, list);
if (WARN_ON(!pending))
return 0;
 
-   while ((req = virtqueue_get_buf(viommu->vq, &len)) != NULL) {
+   while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
if (req != pending) {
dev_warn(viommu->dev, "discarding stale request\n");
continue;
@@ -201,6 +218,7 @@ static int _viommu_send_reqs_sync(struct viommu_dev *viommu,
 * up the CPU in case of a device bug.
 */
unsigned long timeout_ms = 1000;
+   struct virtqueue *vq = viommu->vqs[VIOMMU_REQUEST_VQ];
 
*nr_sent = 0;
 
@@ -210,15 +228,14 @@ static int _viommu_send_reqs_sync(struct viommu_dev *viommu,
sg[0] = &req->top;
sg[1] = &req->bottom;
 
-   ret = virtqueue_add_sgs(viommu->vq, sg, 1, 1, req,
-   GFP_ATOMIC);
+   ret = virtqueue_add_sgs(vq, sg, 1, 1, req, GFP_ATOMIC);
if (ret)
break;
 
list_add_tail(&req->list, &pending);
}
 
-   if (i && !virtqueue_kick(viommu->vq))
+   if (i && !virtqueue_kick(vq))
return -EPIPE;
 
timeout = ktime_add_ms(ktime_get(), timeout_ms * i);
@@ -553,6 +570,70 @@ static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device *dev)
return ret;
 }
 
+static int viommu_fault_handler(struct viommu_dev *viommu,
+   struct virtio_iommu_fault *fault)
+{
+   char *reason_str;
+
+   u8 reason   = fault->reason;
+   u32 flags   = le32_to_cpu(fault->flags);
+   u32 endpoint= le32_to_cpu(fault->endpoint);
+   u64 address = le64_to_cpu(fault->address);
+
+   switch (reason) {
+   case VIRTIO_IOMMU_FAULT_R_DOMAIN:
+   reason_str = "domain";
+   break;
+   case VIRTIO_IOMMU_FAULT_R_MAPPING:
+   reason_str = "page";
+   break;
+   case VIRTIO_IOMMU_FAULT_R_UNKNOWN:
+   default:
+   reason_str = "unknown";
+   break;
+   }
+
+   /* TODO: find EP by ID and report_iommu_fault */
+   if (flags & VIRTIO_IOMMU_FAULT_F_ADDRESS)
+   dev_err_ratelimited(viommu->dev, "%s fault from EP %u at %#llx [%s%s%s]\n",
+   reason_str, endpoint, address,
+   flags & VIRTIO_IOMMU_FAULT_F_READ ? "R" : "",
+   flags & VIRTIO_IOMMU_FAULT_F_WRITE ? "W" : "",
+   flags & VIRTIO_IOMMU_FAULT_F_EXEC ? "X" : "");
+   else
+   dev_err_ratelimited(viommu->dev, "%s fault from EP %u\n",
+   reason_str, endpoint);
+
+  

[PATCH 0/4] Add virtio-iommu driver

2018-02-14 Thread Jean-Philippe Brucker
Implement the virtio-iommu driver following version 0.6 of the
specification [1]. The previous version, RFCv2, was sent in November [2].
This version addresses Eric's comments and changes the device number.
(Since last week I also tested and fixed the probe/release functions,
they now use devm properly.)

I did not include ACPI support because the next IORT specification
isn't ready yet (even though the virtio-iommu spec describes the node
format, a new node type has to be allocated). Therefore only device-tree
guests are supported for the moment but the x86 prototype, on branch
virtio-iommu/devel, doesn't add much complexity.

git://linux-arm.org/linux-jpb.git virtio-iommu/v0.6

Test it with Eric's latest QEMU device [3], for example with the
following command-line:

$ qemu-system-aarch64 -M virt -cpu cortex-a57 -nographic
-kernel Image -append 'console=ttyAMA0 root=/dev/vda rw'
-device virtio-iommu-device
-device virtio-blk-pci,iommu_platform,disable-legacy=on,drive=hd0
-drive if=none,file=rootfs.bin,id=hd0

You can also try the kvmtool device [4]. For example on AMD Seattle I
use the following commands to boot a guest with all devices behind a
virtio-iommu, then display mapping stats.

$ lkvm run -k Image --irqchip gicv2m --console virtio --vfio-pci 01:00.0
--viommu vfio --viommu virtio
$ lkvm debug -a -i stats

[1] [RFC] virtio-iommu version 0.6
https://www.spinics.net/lists/linux-virtualization/msg32628.html
[2] [RFC PATCH v2 0/5] Add virtio-iommu driver
https://www.spinics.net/lists/kvm/msg159047.html
[3] [RFC v6 00/22] VIRTIO-IOMMU device
http://lists.gnu.org/archive/html/qemu-arm/2018-02/msg00274.html
[4] git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.6

Jean-Philippe Brucker (4):
  iommu: Add virtio-iommu driver
  iommu/virtio: Add probe request
  iommu/virtio: Add event queue
  vfio: Allow type-1 IOMMU instantiation with a virtio-iommu

 MAINTAINERS   |6 +
 drivers/iommu/Kconfig |   11 +
 drivers/iommu/Makefile|1 +
 drivers/iommu/virtio-iommu.c  | 1220 +
 drivers/vfio/Kconfig  |2 +-
 include/uapi/linux/virtio_ids.h   |1 +
 include/uapi/linux/virtio_iommu.h |  171 ++
 7 files changed, 1411 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/virtio-iommu.c
 create mode 100644 include/uapi/linux/virtio_iommu.h

-- 
2.16.1



[PATCH 1/4] iommu: Add virtio-iommu driver

2018-02-14 Thread Jean-Philippe Brucker
The virtio IOMMU is a para-virtualized device that allows the guest to send
IOMMU requests such as map/unmap over the virtio-mmio transport, without emulating
page tables. This implementation handles ATTACH, DETACH, MAP and UNMAP
requests.

The bulk of the code transforms calls coming from the IOMMU API into
corresponding virtio requests. Mappings are kept in an interval tree
instead of page tables.

Signed-off-by: Jean-Philippe Brucker 
---
 MAINTAINERS   |   6 +
 drivers/iommu/Kconfig |  11 +
 drivers/iommu/Makefile|   1 +
 drivers/iommu/virtio-iommu.c  | 960 ++
 include/uapi/linux/virtio_ids.h   |   1 +
 include/uapi/linux/virtio_iommu.h | 116 +
 6 files changed, 1095 insertions(+)
 create mode 100644 drivers/iommu/virtio-iommu.c
 create mode 100644 include/uapi/linux/virtio_iommu.h
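
[To make "interval tree instead of page tables" concrete before the
driver body: on map, the driver records the IOVA range in a per-domain
tree and then tells the device about it; nothing builds a page table
the host would have to walk. A condensed sketch under those
assumptions (the real map path, with request marshalling, follows in
the driver below):

	static int viommu_add_mapping(struct viommu_domain *vdomain,
				      unsigned long iova, phys_addr_t paddr,
				      size_t size)
	{
		unsigned long irqflags;
		struct viommu_mapping *mapping;

		mapping = kzalloc(sizeof(*mapping), GFP_ATOMIC);
		if (!mapping)
			return -ENOMEM;

		mapping->paddr      = paddr;
		mapping->iova.start = iova;
		mapping->iova.last  = iova + size - 1;

		spin_lock_irqsave(&vdomain->mappings_lock, irqflags);
		interval_tree_insert(&mapping->iova, &vdomain->mappings);
		spin_unlock_irqrestore(&vdomain->mappings_lock, irqflags);

		/* ...then send a VIRTIO_IOMMU_T_MAP request for the range. */
		return 0;
	}
]
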

diff --git a/MAINTAINERS b/MAINTAINERS
index 3bdc260e36b7..2a181924d420 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14818,6 +14818,12 @@ S: Maintained
 F: drivers/virtio/virtio_input.c
 F: include/uapi/linux/virtio_input.h
 
+VIRTIO IOMMU DRIVER
+M: Jean-Philippe Brucker 
+S: Maintained
+F: drivers/iommu/virtio-iommu.c
+F: include/uapi/linux/virtio_iommu.h
+
 VIRTUAL BOX GUEST DEVICE DRIVER
 M: Hans de Goede 
 M: Arnd Bergmann 
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index f3a21343e636..1ea0ec74524f 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -381,4 +381,15 @@ config QCOM_IOMMU
help
  Support for IOMMU on certain Qualcomm SoCs.
 
+config VIRTIO_IOMMU
+   bool "Virtio IOMMU driver"
+   depends on VIRTIO_MMIO
+   select IOMMU_API
+   select INTERVAL_TREE
+   select ARM_DMA_USE_IOMMU if ARM
+   help
+ Para-virtualised IOMMU driver with virtio.
+
+ Say Y here if you intend to run this kernel as a guest.
+
 endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 1fb695854809..9c68be1365e1 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -29,3 +29,4 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
 obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
 obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
 obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
+obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
new file mode 100644
index ..a9c9245e8ba2
--- /dev/null
+++ b/drivers/iommu/virtio-iommu.c
@@ -0,0 +1,960 @@
+/*
+ * Virtio driver for the paravirtualized IOMMU
+ *
+ * Copyright (C) 2018 ARM Limited
+ * Author: Jean-Philippe Brucker 
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#define MSI_IOVA_BASE  0x8000000
+#define MSI_IOVA_LENGTH 0x100000
+
+struct viommu_dev {
+   struct iommu_device iommu;
+   struct device   *dev;
+   struct virtio_device*vdev;
+
+   struct ida  domain_ids;
+
+   struct virtqueue*vq;
+   /* Serialize anything touching the request queue */
+   spinlock_t  request_lock;
+
+   /* Device configuration */
+   struct iommu_domain_geometrygeometry;
+   u64 pgsize_bitmap;
+   u8  domain_bits;
+};
+
+struct viommu_mapping {
+   phys_addr_t paddr;
+   struct interval_tree_node   iova;
+   union {
+   struct virtio_iommu_req_map map;
+   struct virtio_iommu_req_unmap unmap;
+   } req;
+};
+
+struct viommu_domain {
+   struct iommu_domain domain;
+   struct viommu_dev   *viommu;
+   struct mutexmutex;
+   unsigned intid;
+
+   spinlock_t  mappings_lock;
+   struct rb_root_cached   mappings;
+
+   /* Number of endpoints attached to this domain */
+   unsigned long   endpoints;
+};
+
+struct viommu_endpoint {
+   struct viommu_dev   *viommu;
+   struct viommu_domain*vdomain;
+};
+
+struct viommu_request {
+   struct scatterlist  top;
+   struct scatterlist  bottom;
+
+   int written;
+   struct list_headlist;
+};
+
+#define to_viommu_domain(domain)   \
+   container_of(domain, struct viommu_domain, domain)
+
+/* Virtio transport */
+
+static int viommu_status_to_errno(u8 status)
+{
+   switch 

Re: [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put

2018-02-14 Thread Dave Martin
[CC Ard, in case he has a view on how much we care about softirq NEON
performance regressions ... and whether my suggestions make sense]

On Wed, Feb 14, 2018 at 11:15:54AM +0100, Christoffer Dall wrote:
> On Tue, Feb 13, 2018 at 02:08:47PM +, Dave Martin wrote:
> > On Tue, Feb 13, 2018 at 09:51:30AM +0100, Christoffer Dall wrote:
> > > On Fri, Feb 09, 2018 at 03:59:30PM +, Dave Martin wrote:

[...]

> > It's hard to gauge the impact of this: it seems unlikely to make a
> > massive difference, but will be highly workload-dependent.
> 
> So I thought it might be useful to have some idea of the frequency of
> events on a balanced workload, so I ran an 8-way SMP guest on Ubuntu
> 14.04 running SPECjvm2008, a memcached benchmark, a MySQL workloads, and
> some networking benchmarks, and I counted a few events:
> 
>  - Out of all the exits, from the guest to run-loop in EL1 on a non-VHE
>    system, fewer than 1% of them result in an exit to userspace (0.57%).
> 
>  - The VCPU thread was preempted (voluntarily or forced) in the kernel
>    less than 3% of the exits (2.72%).  That's just below 5 preemptions
>    per ioctl(KVM_RUN).
> 
>  - In 29% of the preemptions (vcpu_put), the guest had touched FPSIMD
>    registers and the host context was restored.
> 
>  - We store the host context about 1.38 times per ioctl(KVM_RUN).
> 
> So that tells me that (1) it's worth restoring the guest FPSIMD state
> lazily as opposed to proactively on vcpu_load, and (2) that there's a
> small opportunity for improvement by reducing redundant host vfp state
> saves.

That's really useful.  I guess it confirms that lazy guest FPSIMD
restore is desirable (though I wasn't disputing this) and that the
potential benefit from eliminating redundant host FPSIMD saves is
modest, assuming that this workload is representative.

So we shouldn't over-optimise for the latter if there are side costs
from doing so.

> > The redundancy occurs because of the deferred restore of the FPSIMD
> > registers for host userspace: as a result, the host FPSIMD regs are
> > either discardable (i.e., already saved) or not live at all between
> > one context switch and the next ret_to_user.
> > 
> > This means that if the vcpu run loop is preempted, then when the host
> > switches back to the run loop it is pointless to save or restore the
> > host FPSIMD state.
> > 
> > A typical sequence of events exposing this redundancy would be as
> > follows.  I assume here that there are two cpu-bound tasks A and B
> > competing for a host CPU, where A is a vcpu thread:
> > 
> >  - vcpu A is in the guest running a compute-heavy task
> >  - FPSIMD typically traps to the host before context switch
> >  X kvm saves the host FPSIMD state
> >  - kvm loads the guest FPSIMD state
> >  - vcpu A reenters the guest
> >  - host context switch IRQ preempts A back to the run loop
> >  Y kvm loads the host FPSIMD state via vcpu_put
> > 
> >  - host context switch:
> >  - TIF_FOREIGN_FPSTATE is set -> no save of user FPSIMD state
> >  - switch to B
> >  - B reaches ret_to_user
> >  Y B's user FPSIMD state is loaded: TIF_FOREIGN_FPSTATE now clear
> >  - B enters userspace
> > 
> >  - host context switch:
> >  - B enters kernel
> >  X TIF_FOREIGN_FPSTATE now set -> host saves B's FPSIMD state
> >  - switch to A -> set TIF_FOREIGN_FPSTATE for A
> >  - back to the KVM run loop
> > 
> >  - vcpu A enters guest
> >  - redo from start
> > 
> > Here, the two saves marked X are redundant with respect to each other,
> > and the two restores marked Y are redundant with respect to each other.
> > 
> 
> Right, ok, but if we have
> 
>  - ioctl(KVM_RUN)
>  - mark hardware FPSIMD register state as invalid
>  - load guest FPSIMD state
>  - enter guest
>  - exit guest
>  - save guest FPSIMD state
>  - return to user space
> 
> (I.e. we don't do any preemption in the guest)
> 
> Then we'll lose the host FPSIMD register state, potentially, right?

Yes.

However, (disregarding kernel-mode NEON) no host task's state can be
live in the FPSIMD regs other than current's.  If another context's
state is in the regs, it is either stale or a clean copy and we can
harmlessly invalidate the association with no ill effects.

The subtlety here comes from the SVE syscall ABI, which allows
current's non-FPSIMD SVE bits to be discarded across a syscall: in this
code, current _is_ in a syscall, so the fact that we can lose current's
SVE bits here is fine: TIF_SVE will have been cleared in entry.S on the
way in, and that means that SVE will trap for userspace giving a chance
to zero those regs lazily for userspace when/if they're used again.
Conversely, current's FPSIMD regs are preserved separately by KVM.

> Your original comment on this patch was that we didn't need to restore
> the host FPSIMD state in kvm_vcpu_put_sysregs, which would result in the
> scenario above.  The only way I can see this working is by making sure
> that kvm_fpsimd_flush_cpu_state() also saves the FPSIMD hardware
> register state if the 

Re: [PATCHv2] arm64/kvm: Prohibit guest LOR accesses

2018-02-14 Thread Christoffer Dall
On Tue, Feb 13, 2018 at 01:39:23PM +, Mark Rutland wrote:
> We don't currently limit guest accesses to the LOR registers, which we
> neither virtualize nor context-switch. As such, guests are provided with
> unusable information/controls, and are not isolated from each other (or
> the host).
> 
> To prevent these issues, we can trap register accesses and present the
> illusion that LORegions are unsupported by the CPU. To do this, we mask
> ID_AA64MMFR1.LO, and set HCR_EL2.TLOR to trap accesses to the following
> registers:
> 
> * LORC_EL1
> * LOREA_EL1
> * LORID_EL1
> * LORN_EL1
> * LORSA_EL1
> 
> ... when trapped, we inject an UNDEFINED exception to EL1, simulating
> their non-existence.
> 
> As noted in D7.2.67, when no LORegions are implemented, LoadLOAcquire
> and StoreLORelease must behave as LoadAcquire and StoreRelease
> respectively. We can ensure this by clearing LORC_EL1.EN when a CPU's
> EL2 is first initialized, as the host kernel will not modify this.
> 
> Signed-off-by: Mark Rutland 
> Cc: Vladimir Murzin 
> Cc: Catalin Marinas 
> Cc: Christoffer Dall 
> Cc: Marc Zyngier 
> Cc: Will Deacon 
> Cc: kvmarm@lists.cs.columbia.edu

Applied, thanks.
-Christoffer
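
[For illustration, the trap-and-UNDEF scheme described in the patch
boils down to a sys_reg handler of roughly this shape (a sketch
consistent with the description above, not a quotation of the patch):
with HCR_EL2.TLOR set, every LOR*_EL1 access traps to EL2 and is
reflected back to the guest as UNDEFINED:

	/* Wired to LORSA_EL1, LOREA_EL1, LORN_EL1, LORC_EL1 and LORID_EL1. */
	static bool trap_undef(struct kvm_vcpu *vcpu,
			       struct sys_reg_params *p,
			       const struct sys_reg_desc *r)
	{
		kvm_inject_undefined(vcpu);
		return false;	/* don't advance PC past the trapped access */
	}
]
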


Re: [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put

2018-02-14 Thread Christoffer Dall
On Tue, Feb 13, 2018 at 02:08:47PM +, Dave Martin wrote:
> On Tue, Feb 13, 2018 at 09:51:30AM +0100, Christoffer Dall wrote:
> > On Fri, Feb 09, 2018 at 03:59:30PM +, Dave Martin wrote:
> > > On Wed, Feb 07, 2018 at 06:56:44PM +0100, Christoffer Dall wrote:
> > > > On Wed, Feb 07, 2018 at 04:49:55PM +, Dave Martin wrote:
> 
> [...]
> 
> > > Simply entering the kernel and returning to userspace doesn't have
> > > this effect by itself.
> > > 
> > > 
> > > Prior to the SVE patches, KVM makes itself orthogonal to the host
> > > context switch machinery by ensuring that whatever the host had
> > > in the FPSIMD regs at guest entry is restored before returning to
> > > the host. (IIUC)  
> > 
> > Only if the guest actually touches FPSIMD state.  If the guest doesn't
> > touch FPSIMD (no trap to EL2), then we never touch the state at all.
> 
> I should have been clearer: KVM ensures that the state is _unchanged_
> before returning to the host, but can elide the save/restore when the
> guest doesn't touch the state...
> 
> > 
> > > This means that redundant save/restore work is
> > > done by KVM, but does have the advantage of simplicity.
> > 
> > I don't understand what the redundant part here is?  Isn't it only
> > redundant in the case where the host (for some reason) has already saved
> > its FPSIMD state?  I assume that won't be the common case, since
> > "userspace->kernel->kvm_run" won't save the FPSIMD state, as you just
> > explained above.
> 
> ...however, when this elision does not occur, it may duplicate
> save/restore done by the kernel, or it may save/restore worthless data
> if the host's FPSIMD state is non-live at the time.
> 
> It's hard to gauge the impact of this: it seems unlikely to make a
> massive difference, but will be highly workload-dependent.

So I thought it might be useful to have some idea of the frequency of
events on a balanced workload, so I ran an 8-way SMP guest on Ubuntu
14.04 running SPECjvm2008, a memcached benchmark, a MySQL workloads, and
some networking benchmarks, and I counted a few events:

 - Out of all the exits, from the guest to run-loop in EL1 on a non-VHE
   system, fewer than 1% of them result in an exit to userspace (0.57%).

 - The VCPU thread was preempted (voluntarily or forced) in the kernel
   in less than 3% of the exits (2.72%).  Given the roughly 175 exits per
   ioctl(KVM_RUN) implied by the 0.57% figure above, that's just below 5
   preemptions per ioctl(KVM_RUN).

 - In 29% of the preemptions (vcpu_put), the guest had touched FPSIMD
   registers and the host context was restored.

 - We store the host context about 1.38 times per ioctl(KVM_RUN).

So that tells me that (1) it's worth restoring the guest FPSIMD state
lazily as opposed to proactively on vcpu_load, and (2) that there's a
small opportunity for improvement by reducing redundant host vfp state
saves.

> 
> 
> The redundancy occurs because of the deferred restore of the FPSIMD
> registers for host userspace: as a result, the host FPSIMD regs are
> either discardable (i.e., already saved) or not live at all between
> one context switch and the next ret_to_user.
> 
> This means that if the vcpu run loop is preempted, then when the host
> switches back to the run loop it is pointless to save or restore the
> host FPSIMD state.
> 
> A typical sequence of events exposing this redundancy would be as
> follows.  I assume here that there are two cpu-bound tasks A and B
> competing for a host CPU, where A is a vcpu thread:
> 
>  - vcpu A is in the guest running a compute-heavy task
>  - FPSIMD typically traps to the host before context switch
>  X kvm saves the host FPSIMD state
>  - kvm loads the guest FPSIMD state
>  - vcpu A reenters the guest
>  - host context switch IRQ preempts A back to the run loop
>  Y kvm loads the host FPSIMD state via vcpu_put
> 
>  - host context switch:
>  - TIF_FOREIGN_FPSTATE is set -> no save of user FPSIMD state
>  - switch to B
>  - B reaches ret_to_user
>  Y B's user FPSIMD state is loaded: TIF_FOREIGN_FPSTATE now clear
>  - B enters userspace
> 
>  - host context switch:
>  - B enters kernel
>  X TIF_FOREIGN_FPSTATE now set -> host saves B's FPSIMD state
>  - switch to A -> set TIF_FOREIGN_FPSTATE for A
>  - back to the KVM run loop
> 
>  - vcpu A enters guest
>  - redo from start
> 
> Here, the two saves marked X are redundant with respect to each other,
> and the two restores marked Y are redundant with respect to each other.
> 

Right, ok, but if we have

 - ioctl(KVM_RUN)
 - mark hardware FPSIMD register state as invalid
 - load guest FPSIMD state
 - enter guest
 - exit guest
 - save guest FPSIMD state
 - return to user space

(I.e. we don't do any preemption in the guest)

Then we'll lose the host FPSIMD register state, potentially, right?

Your original comment on this patch was that we didn't need to restore
the host FPSIMD state in kvm_vcpu_put_sysregs, which would result in the
scenario above.  The only way I can see this working is by making sure
that kvm_fpsimd_flush_cpu_state() also saves the FPSIMD hardware
register state