Re: [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put
On Wed, 14 Feb 2018 17:38:11 +, Christoffer Dall wrote: > > On Wed, Feb 14, 2018 at 02:43:42PM +, Dave Martin wrote: > > [CC Ard, in case he has a view on how much we care about softirq NEON > > performance regressions ... and whether my suggestions make sense] > > > > On Wed, Feb 14, 2018 at 11:15:54AM +0100, Christoffer Dall wrote: > > > On Tue, Feb 13, 2018 at 02:08:47PM +, Dave Martin wrote: > > > > On Tue, Feb 13, 2018 at 09:51:30AM +0100, Christoffer Dall wrote: > > > > > On Fri, Feb 09, 2018 at 03:59:30PM +, Dave Martin wrote: > > [...] > > > > > > > > > kvm_fpsimd_flush_cpu_state() is just an invalidation. No state is > > > > actually saved today because we explicitly don't care about preserving > > > > the SVE state, because the syscall ABI throws the SVE regs away as > > > > a side effect any syscall including ioctl(KVM_RUN); also (currently) KVM > > > > ensures that the non-SVE FPSIMD bits _are_ restored by itself. > > > > > > > > I think my proposal is that this hook might take on the role of > > > > actually saving the state too, if we move that out of the KVM host > > > > context save/restore code. > > > > > > > > Perhaps we could even replace > > > > > > > > preempt_disable(); > > > > kvm_fpsimd_flush_cpu_state(); > > > > /* ... */ > > > > preempt_enable(); > > > > > > > > with > > > > > > > > kernel_neon_begin(); > > > > /* ... */ > > > > kernel_neon_end(); > > > > > > I'm not entirely sure where the begin and end points would be in the > > > context of KVM? > > > > Hmmm, actually there's a bug in your VHE changes now I look more > > closely in this area: > > > > You assume that the only way for the FPSIMD regs to get unexpectedly > > dirtied is through a context switch, but actually this is not the case: > > a softirq can use kernel-mode NEON any time that softirqs are enabled. 
> > > > This means that in between kvm_arch_vcpu_load() and _put() (whether via > > preempt notification or not), the guest's FPSIMD state in the regs may > > be trashed by a softirq. > > ouch. > > > > > The simplest fix is to disable softirqs and preemption for that whole > > region, but since we can stay in it indefinitely that's obviously not > > the right approach. Putting kernel_neon_begin() in _load() and > > kernel_neon_end() in _put() achieves the same without disabling > > softirq, but preemption is still disabled throughout, which is bad. > > This effectively makes the run ioctl nonpreemptible... > > > > A better fix would be to set the cpu's kernel_neon_busy flag, which > > makes softirq code use non-NEON fallback code. > > > > We could expose an interface from fpsimd.c to support that. > > > > It still comes at a cost though: due to the switching from NEON to > > fallback code in softirq handlers, we may get a big performance > > regression in setups that rely heavily on NEON in softirq for > > performance. > > > > I wasn't aware that softirqs would use fpsimd. > > > > > Alternatively we could do something like the following, but it's a > > rather gross abstraction violation: > > > > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c > > index 2e43f9d..6a1ff3a 100644 > > --- a/virt/kvm/arm/arm.c > > +++ b/virt/kvm/arm/arm.c > > @@ -746,9 +746,24 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, > > struct kvm_run *run) > > * the effect of taking the interrupt again, in SVC > > * mode this time. > > */ > > + local_bh_disable(); > > local_irq_enable(); > > > > /* > > +* If we exited due to one or mode pending interrupts, they > > +* have now been handled. If such an interrupt pended a > > +* softirq, we shouldn't prevent that softirq from using > > +* kernel-mode NEON indefinitely: instead, give FPSIMD back to > > +* the host to manage as it likes. We'll grab it again on the > > +* next FPSIMD trap from the guest (if any). 
> > +*/ > > + if (local_softirq_pending() && FPSIMD untrapped for guest) { > > + /* save vcpu FPSIMD context */ > > + /* enable FPSIMD trap for guest */ > > + } > > + local_bh_enable(); > > + > > + /* > > * We do local_irq_enable() before calling guest_exit() so > > * that if a timer interrupt hits while running the guest we > > * account that tick as being spent in the guest. We enable > > > > [...] > > > > I can't see this working, what if an IRQ comes in and a softirq gets > pending immediately after local_bh_enable() above? > > And as you say, it's really not pretty. > > This is really making me think that I'll drop this part of the > optimization and when we do optimize fpsimd handling, we do it properly > by integrating it with the kernel tracking. > > What do you think? [catching up with the
Re: [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put
On 14 February 2018 at 17:38, Christoffer Dallwrote: > On Wed, Feb 14, 2018 at 02:43:42PM +, Dave Martin wrote: >> [CC Ard, in case he has a view on how much we care about softirq NEON >> performance regressions ... and whether my suggestions make sense] >> >> On Wed, Feb 14, 2018 at 11:15:54AM +0100, Christoffer Dall wrote: >> > On Tue, Feb 13, 2018 at 02:08:47PM +, Dave Martin wrote: >> > > On Tue, Feb 13, 2018 at 09:51:30AM +0100, Christoffer Dall wrote: >> > > > On Fri, Feb 09, 2018 at 03:59:30PM +, Dave Martin wrote: > > [...] > >> > > >> > > kvm_fpsimd_flush_cpu_state() is just an invalidation. No state is >> > > actually saved today because we explicitly don't care about preserving >> > > the SVE state, because the syscall ABI throws the SVE regs away as >> > > a side effect any syscall including ioctl(KVM_RUN); also (currently) KVM >> > > ensures that the non-SVE FPSIMD bits _are_ restored by itself. >> > > >> > > I think my proposal is that this hook might take on the role of >> > > actually saving the state too, if we move that out of the KVM host >> > > context save/restore code. >> > > >> > > Perhaps we could even replace >> > > >> > > preempt_disable(); >> > > kvm_fpsimd_flush_cpu_state(); >> > > /* ... */ >> > > preempt_enable(); >> > > >> > > with >> > > >> > > kernel_neon_begin(); >> > > /* ... */ >> > > kernel_neon_end(); >> > >> > I'm not entirely sure where the begin and end points would be in the >> > context of KVM? >> >> Hmmm, actually there's a bug in your VHE changes now I look more >> closely in this area: >> >> You assume that the only way for the FPSIMD regs to get unexpectedly >> dirtied is through a context switch, but actually this is not the case: >> a softirq can use kernel-mode NEON any time that softirqs are enabled. >> >> This means that in between kvm_arch_vcpu_load() and _put() (whether via >> preempt notification or not), the guest's FPSIMD state in the regs may >> be trashed by a softirq. > > ouch. 
> >> >> The simplest fix is to disable softirqs and preemption for that whole >> region, but since we can stay in it indefinitely that's obviously not >> the right approach. Putting kernel_neon_begin() in _load() and >> kernel_neon_end() in _put() achieves the same without disabling >> softirq, but preemption is still disabled throughout, which is bad. >> This effectively makes the run ioctl nonpreemptible... >> >> A better fix would be to set the cpu's kernel_neon_busy flag, which >> makes softirq code use non-NEON fallback code. >> >> We could expose an interface from fpsimd.c to support that. >> >> It still comes at a cost though: due to the switching from NEON to >> fallback code in softirq handlers, we may get a big performance >> regression in setups that rely heavily on NEON in softirq for >> performance. >> > > I wasn't aware that softirqs would use fpsimd. > It is not common but it is permitted by the API, and there is mac80211 code and IPsec code that does this. Performance penalties incurred by switching from accelerated h/w instruction based crypto to scalar code can be as high as 20x, so we should really avoid this if we can. >> >> Alternatively we could do something like the following, but it's a >> rather gross abstraction violation: >> >> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c >> index 2e43f9d..6a1ff3a 100644 >> --- a/virt/kvm/arm/arm.c >> +++ b/virt/kvm/arm/arm.c >> @@ -746,9 +746,24 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, >> struct kvm_run *run) >>* the effect of taking the interrupt again, in SVC >>* mode this time. >>*/ >> + local_bh_disable(); >> local_irq_enable(); >> >> /* >> + * If we exited due to one or mode pending interrupts, they >> + * have now been handled. If such an interrupt pended a >> + * softirq, we shouldn't prevent that softirq from using >> + * kernel-mode NEON indefinitely: instead, give FPSIMD back to >> + * the host to manage as it likes. 
We'll grab it again on the >> + * next FPSIMD trap from the guest (if any). >> + */ >> + if (local_softirq_pending() && FPSIMD untrapped for guest) { >> + /* save vcpu FPSIMD context */ >> + /* enable FPSIMD trap for guest */ >> + } >> + local_bh_enable(); >> + >> + /* >>* We do local_irq_enable() before calling guest_exit() so >>* that if a timer interrupt hits while running the guest we >>* account that tick as being spent in the guest. We enable >> >> [...] >> > > I can't see this working, what if an IRQ comes in and a softirq gets > pending immediately after local_bh_enable() above? > > And as you say, it's really not pretty. > > This is really making me think that I'll drop this part of the
Re: [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put
On Wed, Feb 14, 2018 at 02:43:42PM +, Dave Martin wrote:
> [CC Ard, in case he has a view on how much we care about softirq NEON
> performance regressions ... and whether my suggestions make sense]
>
> On Wed, Feb 14, 2018 at 11:15:54AM +0100, Christoffer Dall wrote:
> > On Tue, Feb 13, 2018 at 02:08:47PM +, Dave Martin wrote:
> > > On Tue, Feb 13, 2018 at 09:51:30AM +0100, Christoffer Dall wrote:
> > > > On Fri, Feb 09, 2018 at 03:59:30PM +, Dave Martin wrote:

[...]

> > > kvm_fpsimd_flush_cpu_state() is just an invalidation.  No state is
> > > actually saved today because we explicitly don't care about
> > > preserving the SVE state, because the syscall ABI throws the SVE
> > > regs away as a side effect of any syscall, including ioctl(KVM_RUN);
> > > also (currently) KVM ensures that the non-SVE FPSIMD bits _are_
> > > restored by itself.
> > >
> > > I think my proposal is that this hook might take on the role of
> > > actually saving the state too, if we move that out of the KVM host
> > > context save/restore code.
> > >
> > > Perhaps we could even replace
> > >
> > > 	preempt_disable();
> > > 	kvm_fpsimd_flush_cpu_state();
> > > 	/* ... */
> > > 	preempt_enable();
> > >
> > > with
> > >
> > > 	kernel_neon_begin();
> > > 	/* ... */
> > > 	kernel_neon_end();
> >
> > I'm not entirely sure where the begin and end points would be in the
> > context of KVM?
>
> Hmmm, actually there's a bug in your VHE changes now I look more
> closely in this area:
>
> You assume that the only way for the FPSIMD regs to get unexpectedly
> dirtied is through a context switch, but actually this is not the case:
> a softirq can use kernel-mode NEON any time that softirqs are enabled.
>
> This means that in between kvm_arch_vcpu_load() and _put() (whether via
> preempt notification or not), the guest's FPSIMD state in the regs may
> be trashed by a softirq.

ouch.

> The simplest fix is to disable softirqs and preemption for that whole
> region, but since we can stay in it indefinitely that's obviously not
> the right approach.  Putting kernel_neon_begin() in _load() and
> kernel_neon_end() in _put() achieves the same without disabling
> softirq, but preemption is still disabled throughout, which is bad.
> This effectively makes the run ioctl nonpreemptible...
>
> A better fix would be to set the cpu's kernel_neon_busy flag, which
> makes softirq code use non-NEON fallback code.
>
> We could expose an interface from fpsimd.c to support that.
>
> It still comes at a cost though: due to the switching from NEON to
> fallback code in softirq handlers, we may get a big performance
> regression in setups that rely heavily on NEON in softirq for
> performance.

I wasn't aware that softirqs would use fpsimd.

> Alternatively we could do something like the following, but it's a
> rather gross abstraction violation:
>
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 2e43f9d..6a1ff3a 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -746,9 +746,24 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		 * the effect of taking the interrupt again, in SVC
>  		 * mode this time.
>  		 */
> +		local_bh_disable();
>  		local_irq_enable();
>  
>  		/*
> +		 * If we exited due to one or more pending interrupts, they
> +		 * have now been handled.  If such an interrupt pended a
> +		 * softirq, we shouldn't prevent that softirq from using
> +		 * kernel-mode NEON indefinitely: instead, give FPSIMD back
> +		 * to the host to manage as it likes.  We'll grab it again on
> +		 * the next FPSIMD trap from the guest (if any).
> +		 */
> +		if (local_softirq_pending() && FPSIMD untrapped for guest) {
> +			/* save vcpu FPSIMD context */
> +			/* enable FPSIMD trap for guest */
> +		}
> +		local_bh_enable();
> +
> +		/*
>  		 * We do local_irq_enable() before calling guest_exit() so
>  		 * that if a timer interrupt hits while running the guest we
>  		 * account that tick as being spent in the guest.  We enable
>
> [...]

I can't see this working: what if an IRQ comes in and a softirq gets
pending immediately after local_bh_enable() above?

And as you say, it's really not pretty.

This is really making me think that I'll drop this part of the
optimization, and when we do optimize fpsimd handling, we do it properly
by integrating it with the kernel tracking.

What do you think?

Thanks,
-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
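For readers following the exchange above: the policy being debated can be modelled in plain C outside the kernel. The following is only an illustrative sketch of the decision logic — the struct, flags, and helper name are invented for the example, not kernel APIs. After guest exit, with bottom halves disabled, if a softirq is pending while the guest still owns the FPSIMD registers, hand the registers back to the host and re-arm the FPSIMD trap.

```c
#include <assert.h>
#include <stdbool.h>

/* Invented state for the sketch: who currently owns the FPSIMD regs. */
struct vcpu_model {
	bool guest_owns_fpsimd;  /* FPSIMD untrapped for the guest */
	bool fpsimd_ctx_saved;   /* vcpu FPSIMD context saved to memory */
	bool trap_enabled;       /* guest will trap on its next FPSIMD use */
};

/*
 * Mirror of the proposed hunk in kvm_arch_vcpu_ioctl_run(): with
 * bottom halves disabled, check for a pending softirq and, if one is
 * pending, give FPSIMD back to the host so the softirq can use
 * kernel-mode NEON instead of slower fallback code.
 */
static void handle_exit_fpsimd(struct vcpu_model *v, bool softirq_pending)
{
	/* local_bh_disable() would go here in the real code */
	if (softirq_pending && v->guest_owns_fpsimd) {
		v->fpsimd_ctx_saved = true; /* save vcpu FPSIMD context */
		v->trap_enabled = true;     /* enable FPSIMD trap for guest */
		v->guest_owns_fpsimd = false;
	}
	/* local_bh_enable() would go here */
}
```

The model also makes Christoffer's objection concrete: the check only helps for softirqs already pending at this point; one raised immediately after `local_bh_enable()` still finds the guest owning FPSIMD.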
Re: [PATCH 4/4] vfio: Allow type-1 IOMMU instantiation with a virtio-iommu
On 14/02/18 15:26, Alex Williamson wrote:
> On Wed, 14 Feb 2018 14:53:40 +, Jean-Philippe Brucker wrote:
>> When enabling both VFIO and VIRTIO_IOMMU modules, automatically select
>> VFIO_IOMMU_TYPE1 as well.
>>
>> Signed-off-by: Jean-Philippe Brucker
>> ---
>>  drivers/vfio/Kconfig | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
>> index c84333eb5eb5..65a1e691110c 100644
>> --- a/drivers/vfio/Kconfig
>> +++ b/drivers/vfio/Kconfig
>> @@ -21,7 +21,7 @@ config VFIO_VIRQFD
>>  menuconfig VFIO
>>  	tristate "VFIO Non-Privileged userspace driver framework"
>>  	depends on IOMMU_API
>> -	select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM_SMMU || ARM_SMMU_V3)
>> +	select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM_SMMU || ARM_SMMU_V3 || VIRTIO_IOMMU)
>>  	select ANON_INODES
>>  	help
>>  	  VFIO provides a framework for secure userspace device drivers.
>
> Why are we basing this on specific IOMMU drivers in the first place?
> Only ARM is doing that.  Shouldn't IOMMU_API only be enabled for ARM
> targets that support it and therefore we can forget about the specific
> IOMMU drivers?  Thanks,

Makes sense - the majority of ARM systems (and mobile/embedded ARM64
ones) making use of IOMMU_API won't actually support VFIO, but it can't
hurt to allow them to select the type 1 driver regardless. Especially as
multiplatform configs are liable to be pulling in the SMMU driver(s)
anyway.

Robin.
Re: [PATCH 4/4] vfio: Allow type-1 IOMMU instantiation with a virtio-iommu
On Wed, 14 Feb 2018 14:53:40 +, Jean-Philippe Brucker wrote:
> When enabling both VFIO and VIRTIO_IOMMU modules, automatically select
> VFIO_IOMMU_TYPE1 as well.
>
> Signed-off-by: Jean-Philippe Brucker
> ---
>  drivers/vfio/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index c84333eb5eb5..65a1e691110c 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -21,7 +21,7 @@ config VFIO_VIRQFD
>  menuconfig VFIO
>  	tristate "VFIO Non-Privileged userspace driver framework"
>  	depends on IOMMU_API
> -	select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM_SMMU || ARM_SMMU_V3)
> +	select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM_SMMU || ARM_SMMU_V3 || VIRTIO_IOMMU)
>  	select ANON_INODES
>  	help
>  	  VFIO provides a framework for secure userspace device drivers.

Why are we basing this on specific IOMMU drivers in the first place?
Only ARM is doing that.  Shouldn't IOMMU_API only be enabled for ARM
targets that support it and therefore we can forget about the specific
IOMMU drivers?  Thanks,

Alex
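One way to read Alex's suggestion is to key the `select` on architectures rather than on individual IOMMU drivers. A hypothetical alternative hunk (not part of the posted patch, and subject to whether each arch gates IOMMU_API appropriately) might look like:

```kconfig
menuconfig VFIO
	tristate "VFIO Non-Privileged userspace driver framework"
	depends on IOMMU_API
	select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM || ARM64)
	select ANON_INODES
```

This avoids having to grow the driver list every time a new ARM IOMMU driver gains VFIO support, at the cost of building the type 1 backend on ARM configs that never use it — the trade-off Robin's reply accepts.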
[PATCH 4/4] vfio: Allow type-1 IOMMU instantiation with a virtio-iommu
When enabling both VFIO and VIRTIO_IOMMU modules, automatically select
VFIO_IOMMU_TYPE1 as well.

Signed-off-by: Jean-Philippe Brucker
---
 drivers/vfio/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index c84333eb5eb5..65a1e691110c 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -21,7 +21,7 @@ config VFIO_VIRQFD
 menuconfig VFIO
 	tristate "VFIO Non-Privileged userspace driver framework"
 	depends on IOMMU_API
-	select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM_SMMU || ARM_SMMU_V3)
+	select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM_SMMU || ARM_SMMU_V3 || VIRTIO_IOMMU)
 	select ANON_INODES
 	help
 	  VFIO provides a framework for secure userspace device drivers.
-- 
2.16.1
[PATCH 2/4] iommu/virtio: Add probe request
When the device offers the probe feature, send a probe request for each device managed by the IOMMU. Extract RESV_MEM information. When we encounter a MSI doorbell region, set it up as a IOMMU_RESV_MSI region. This will tell other subsystems that there is no need to map the MSI doorbell in the virtio-iommu, because MSIs bypass it. Signed-off-by: Jean-Philippe Brucker--- drivers/iommu/virtio-iommu.c | 163 -- include/uapi/linux/virtio_iommu.h | 37 + 2 files changed, 193 insertions(+), 7 deletions(-) diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index a9c9245e8ba2..3ac4b38eaf19 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -45,6 +45,7 @@ struct viommu_dev { struct iommu_domain_geometrygeometry; u64 pgsize_bitmap; u8 domain_bits; + u32 probe_size; }; struct viommu_mapping { @@ -72,6 +73,7 @@ struct viommu_domain { struct viommu_endpoint { struct viommu_dev *viommu; struct viommu_domain*vdomain; + struct list_headresv_regions; }; struct viommu_request { @@ -140,6 +142,10 @@ static int viommu_get_req_size(struct viommu_dev *viommu, case VIRTIO_IOMMU_T_UNMAP: size = sizeof(r->unmap); break; + case VIRTIO_IOMMU_T_PROBE: + *bottom += viommu->probe_size; + size = sizeof(r->probe) + *bottom; + break; default: return -EINVAL; } @@ -448,6 +454,105 @@ static int viommu_replay_mappings(struct viommu_domain *vdomain) return ret; } +static int viommu_add_resv_mem(struct viommu_endpoint *vdev, + struct virtio_iommu_probe_resv_mem *mem, + size_t len) +{ + struct iommu_resv_region *region = NULL; + unsigned long prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO; + + u64 addr = le64_to_cpu(mem->addr); + u64 size = le64_to_cpu(mem->size); + + if (len < sizeof(*mem)) + return -EINVAL; + + switch (mem->subtype) { + case VIRTIO_IOMMU_RESV_MEM_T_MSI: + region = iommu_alloc_resv_region(addr, size, prot, +IOMMU_RESV_MSI); + break; + case VIRTIO_IOMMU_RESV_MEM_T_RESERVED: + default: + region = iommu_alloc_resv_region(addr, size, 0, 
+IOMMU_RESV_RESERVED); + break; + } + + list_add(>resv_regions, >list); + + /* +* Treat unknown subtype as RESERVED, but urge users to update their +* driver. +*/ + if (mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_RESERVED && + mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_MSI) + pr_warn("unknown resv mem subtype 0x%x\n", mem->subtype); + + return 0; +} + +static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device *dev) +{ + int ret; + u16 type, len; + size_t cur = 0; + struct virtio_iommu_req_probe *probe; + struct virtio_iommu_probe_property *prop; + struct iommu_fwspec *fwspec = dev->iommu_fwspec; + struct viommu_endpoint *vdev = fwspec->iommu_priv; + + if (!fwspec->num_ids) + /* Trouble ahead. */ + return -EINVAL; + + probe = kzalloc(sizeof(*probe) + viommu->probe_size + + sizeof(struct virtio_iommu_req_tail), GFP_KERNEL); + if (!probe) + return -ENOMEM; + + probe->head.type = VIRTIO_IOMMU_T_PROBE; + /* +* For now, assume that properties of an endpoint that outputs multiple +* IDs are consistent. Only probe the first one. +*/ + probe->endpoint = cpu_to_le32(fwspec->ids[0]); + + ret = viommu_send_req_sync(viommu, probe); + if (ret) + goto out_free; + + prop = (void *)probe->properties; + type = le16_to_cpu(prop->type) & VIRTIO_IOMMU_PROBE_T_MASK; + + while (type != VIRTIO_IOMMU_PROBE_T_NONE && + cur < viommu->probe_size) { + len = le16_to_cpu(prop->length); + + switch (type) { + case VIRTIO_IOMMU_PROBE_T_RESV_MEM: + ret = viommu_add_resv_mem(vdev, (void *)prop->value, len); + break; + default: + dev_dbg(dev, "unknown viommu prop 0x%x\n", type); + } + + if (ret) + dev_err(dev, "failed to parse viommu prop 0x%x\n", type); + + cur += sizeof(*prop) + len; + if (cur >= viommu->probe_size) + break; + + prop = (void *)probe->properties + cur; +
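The probe parsing in viommu_probe_endpoint() above walks a type-length-value list of properties returned by the device. As a sanity check of that loop structure, here is a small userspace C model of the same TLV walk — the buffer layout and names are simplified for illustration and are not the kernel code. It stops at a NONE terminator or at the end of the probe buffer, and skips properties of unknown type using their advertised length, exactly as the driver does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PROBE_T_NONE     0
#define PROBE_T_RESV_MEM 1

/* Simplified property header matching the TLV shape used by the driver. */
struct probe_prop {
	uint16_t type;
	uint16_t length; /* bytes of value that follow the header */
};

/* Count recognised RESV_MEM properties in a probe buffer of probe_size bytes. */
static int count_resv_mem(const uint8_t *buf, size_t probe_size)
{
	size_t cur = 0;
	int nr = 0;

	while (cur + sizeof(struct probe_prop) <= probe_size) {
		struct probe_prop prop;

		memcpy(&prop, buf + cur, sizeof(prop));
		if (prop.type == PROBE_T_NONE)
			break;
		if (prop.type == PROBE_T_RESV_MEM)
			nr++;
		/* Unknown types are skipped using their advertised length. */
		cur += sizeof(prop) + prop.length;
	}
	return nr;
}
```

A malformed length can only make the walk terminate early (the bounds check fails), never read past `probe_size` — the same property the kernel loop needs.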
[PATCH 3/4] iommu/virtio: Add event queue
The event queue offers a way for the device to report access faults from endpoints. It is implemented on virtqueue #1. Whenever the host needs to signal a fault, it fills one of the buffers offered by the guest and interrupts it. Signed-off-by: Jean-Philippe Brucker--- drivers/iommu/virtio-iommu.c | 139 ++ include/uapi/linux/virtio_iommu.h | 18 + 2 files changed, 143 insertions(+), 14 deletions(-) diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index 3ac4b38eaf19..6b96f1b36d5a 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -30,6 +30,12 @@ #define MSI_IOVA_BASE 0x800 #define MSI_IOVA_LENGTH0x10 +enum viommu_vq_idx { + VIOMMU_REQUEST_VQ = 0, + VIOMMU_EVENT_VQ = 1, + VIOMMU_NUM_VQS = 2, +}; + struct viommu_dev { struct iommu_device iommu; struct device *dev; @@ -37,9 +43,10 @@ struct viommu_dev { struct ida domain_ids; - struct virtqueue*vq; + struct virtqueue*vqs[VIOMMU_NUM_VQS]; /* Serialize anything touching the request queue */ spinlock_t request_lock; + void*evts; /* Device configuration */ struct iommu_domain_geometrygeometry; @@ -84,6 +91,15 @@ struct viommu_request { struct list_headlist; }; +#define VIOMMU_FAULT_RESV_MASK 0xff00 + +struct viommu_event { + union { + u32 head; + struct virtio_iommu_fault fault; + }; +}; + #define to_viommu_domain(domain) \ container_of(domain, struct viommu_domain, domain) @@ -161,12 +177,13 @@ static int viommu_receive_resp(struct viommu_dev *viommu, int nr_sent, unsigned int len; int nr_received = 0; struct viommu_request *req, *pending; + struct virtqueue *vq = viommu->vqs[VIOMMU_REQUEST_VQ]; pending = list_first_entry_or_null(sent, struct viommu_request, list); if (WARN_ON(!pending)) return 0; - while ((req = virtqueue_get_buf(viommu->vq, )) != NULL) { + while ((req = virtqueue_get_buf(vq, )) != NULL) { if (req != pending) { dev_warn(viommu->dev, "discarding stale request\n"); continue; @@ -201,6 +218,7 @@ static int _viommu_send_reqs_sync(struct viommu_dev *viommu, 
* up the CPU in case of a device bug. */ unsigned long timeout_ms = 1000; + struct virtqueue *vq = viommu->vqs[VIOMMU_REQUEST_VQ]; *nr_sent = 0; @@ -210,15 +228,14 @@ static int _viommu_send_reqs_sync(struct viommu_dev *viommu, sg[0] = >top; sg[1] = >bottom; - ret = virtqueue_add_sgs(viommu->vq, sg, 1, 1, req, - GFP_ATOMIC); + ret = virtqueue_add_sgs(vq, sg, 1, 1, req, GFP_ATOMIC); if (ret) break; list_add_tail(>list, ); } - if (i && !virtqueue_kick(viommu->vq)) + if (i && !virtqueue_kick(vq)) return -EPIPE; timeout = ktime_add_ms(ktime_get(), timeout_ms * i); @@ -553,6 +570,70 @@ static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device *dev) return ret; } +static int viommu_fault_handler(struct viommu_dev *viommu, + struct virtio_iommu_fault *fault) +{ + char *reason_str; + + u8 reason = fault->reason; + u32 flags = le32_to_cpu(fault->flags); + u32 endpoint= le32_to_cpu(fault->endpoint); + u64 address = le64_to_cpu(fault->address); + + switch (reason) { + case VIRTIO_IOMMU_FAULT_R_DOMAIN: + reason_str = "domain"; + break; + case VIRTIO_IOMMU_FAULT_R_MAPPING: + reason_str = "page"; + break; + case VIRTIO_IOMMU_FAULT_R_UNKNOWN: + default: + reason_str = "unknown"; + break; + } + + /* TODO: find EP by ID and report_iommu_fault */ + if (flags & VIRTIO_IOMMU_FAULT_F_ADDRESS) + dev_err_ratelimited(viommu->dev, "%s fault from EP %u at %#llx [%s%s%s]\n", + reason_str, endpoint, address, + flags & VIRTIO_IOMMU_FAULT_F_READ ? "R" : "", + flags & VIRTIO_IOMMU_FAULT_F_WRITE ? "W" : "", + flags & VIRTIO_IOMMU_FAULT_F_EXEC ? "X" : ""); + else + dev_err_ratelimited(viommu->dev, "%s fault from EP %u\n", + reason_str, endpoint); + +
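The reason decoding in viommu_fault_handler() above is a plain switch from a fault reason code to a printable string. A standalone userspace sketch of the same mapping follows — the numeric constants mirror the draft spec's ordering but are inlined here for the example, so treat them as illustrative rather than authoritative. Known reasons map to "domain" and "page"; anything else, including the explicit UNKNOWN code, falls through to "unknown".

```c
#include <assert.h>
#include <string.h>

/* Fault reason codes, mirroring the virtio-iommu draft definitions. */
#define FAULT_R_UNKNOWN 0
#define FAULT_R_DOMAIN  1
#define FAULT_R_MAPPING 2

/* Map a fault reason byte to the string logged by the handler. */
static const char *fault_reason_str(unsigned char reason)
{
	switch (reason) {
	case FAULT_R_DOMAIN:
		return "domain";
	case FAULT_R_MAPPING:
		return "page";
	case FAULT_R_UNKNOWN:
	default:
		return "unknown";
	}
}
```

Folding unrecognised codes into the default case keeps the handler forward-compatible with reasons added to the spec later.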
[PATCH 0/4] Add virtio-iommu driver
Implement the virtio-iommu driver following version 0.6 of the
specification [1]. The previous version, RFCv2, was sent in November [2].
This version addresses Eric's comments and changes the device number.
(Since last week I also tested and fixed the probe/release functions;
they now use devm properly.)

I did not include ACPI support because the next IORT specification isn't
ready yet (even though the virtio-iommu spec describes the node format, a
new node type has to be allocated). Therefore only device-tree guests are
supported for the moment, but the x86 prototype, on branch
virtio-iommu/devel, doesn't add much complexity.

git://linux-arm.org/linux-jpb.git virtio-iommu/v0.6

Test it with Eric's latest QEMU device [3], for example with the
following command-line:

$ qemu-system-aarch64 -M virt -cpu cortex-a57 -nographic -kernel Image
  -append 'console=ttyAMA0 root=/dev/vda rw'
  -device virtio-iommu-device
  -device virtio-blk-pci,iommu_platform,disable-legacy=on,drive=hd0
  -drive if=none,file=rootfs.bin,id=hd0

You can also try the kvmtool device [4]. For example on AMD Seattle I use
the following commands to boot a guest with all devices behind a
virtio-iommu, then display mapping stats.

$ lkvm run -k Image --irqchip gicv2m --console virtio --vfio-pci 01:00.0
  --viommu vfio --viommu virtio
$ lkvm debug -a -i stats

[1] [RFC] virtio-iommu version 0.6
    https://www.spinics.net/lists/linux-virtualization/msg32628.html
[2] [RFC PATCH v2 0/5] Add virtio-iommu driver
    https://www.spinics.net/lists/kvm/msg159047.html
[3] [RFC v6 00/22] VIRTIO-IOMMU device
    http://lists.gnu.org/archive/html/qemu-arm/2018-02/msg00274.html
[4] git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.6

Jean-Philippe Brucker (4):
  iommu: Add virtio-iommu driver
  iommu/virtio: Add probe request
  iommu/virtio: Add event queue
  vfio: Allow type-1 IOMMU instantiation with a virtio-iommu

 MAINTAINERS                       |    6 +
 drivers/iommu/Kconfig             |   11 +
 drivers/iommu/Makefile            |    1 +
 drivers/iommu/virtio-iommu.c      | 1220 +
 drivers/vfio/Kconfig              |    2 +-
 include/uapi/linux/virtio_ids.h   |    1 +
 include/uapi/linux/virtio_iommu.h |  171 ++
 7 files changed, 1411 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/virtio-iommu.c
 create mode 100644 include/uapi/linux/virtio_iommu.h

-- 
2.16.1
[PATCH 1/4] iommu: Add virtio-iommu driver
The virtio IOMMU is a para-virtualized device, allowing to send IOMMU requests such as map/unmap over virtio-mmio transport without emulating page tables. This implementation handles ATTACH, DETACH, MAP and UNMAP requests. The bulk of the code transforms calls coming from the IOMMU API into corresponding virtio requests. Mappings are kept in an interval tree instead of page tables. Signed-off-by: Jean-Philippe Brucker--- MAINTAINERS | 6 + drivers/iommu/Kconfig | 11 + drivers/iommu/Makefile| 1 + drivers/iommu/virtio-iommu.c | 960 ++ include/uapi/linux/virtio_ids.h | 1 + include/uapi/linux/virtio_iommu.h | 116 + 6 files changed, 1095 insertions(+) create mode 100644 drivers/iommu/virtio-iommu.c create mode 100644 include/uapi/linux/virtio_iommu.h diff --git a/MAINTAINERS b/MAINTAINERS index 3bdc260e36b7..2a181924d420 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -14818,6 +14818,12 @@ S: Maintained F: drivers/virtio/virtio_input.c F: include/uapi/linux/virtio_input.h +VIRTIO IOMMU DRIVER +M: Jean-Philippe Brucker +S: Maintained +F: drivers/iommu/virtio-iommu.c +F: include/uapi/linux/virtio_iommu.h + VIRTUAL BOX GUEST DEVICE DRIVER M: Hans de Goede M: Arnd Bergmann diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index f3a21343e636..1ea0ec74524f 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -381,4 +381,15 @@ config QCOM_IOMMU help Support for IOMMU on certain Qualcomm SoCs. +config VIRTIO_IOMMU + bool "Virtio IOMMU driver" + depends on VIRTIO_MMIO + select IOMMU_API + select INTERVAL_TREE + select ARM_DMA_USE_IOMMU if ARM + help + Para-virtualised IOMMU driver with virtio. + + Say Y here if you intend to run this kernel as a guest. 
+ endif # IOMMU_SUPPORT diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 1fb695854809..9c68be1365e1 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -29,3 +29,4 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o obj-$(CONFIG_S390_IOMMU) += s390-iommu.o obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o +obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c new file mode 100644 index ..a9c9245e8ba2 --- /dev/null +++ b/drivers/iommu/virtio-iommu.c @@ -0,0 +1,960 @@ +/* + * Virtio driver for the paravirtualized IOMMU + * + * Copyright (C) 2018 ARM Limited + * Author: Jean-Philippe Brucker + * + * SPDX-License-Identifier: GPL-2.0 + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#define MSI_IOVA_BASE 0x800 +#define MSI_IOVA_LENGTH0x10 + +struct viommu_dev { + struct iommu_device iommu; + struct device *dev; + struct virtio_device*vdev; + + struct ida domain_ids; + + struct virtqueue*vq; + /* Serialize anything touching the request queue */ + spinlock_t request_lock; + + /* Device configuration */ + struct iommu_domain_geometrygeometry; + u64 pgsize_bitmap; + u8 domain_bits; +}; + +struct viommu_mapping { + phys_addr_t paddr; + struct interval_tree_node iova; + union { + struct virtio_iommu_req_map map; + struct virtio_iommu_req_unmap unmap; + } req; +}; + +struct viommu_domain { + struct iommu_domain domain; + struct viommu_dev *viommu; + struct mutexmutex; + unsigned intid; + + spinlock_t mappings_lock; + struct rb_root_cached mappings; + + /* Number of endpoints attached to this domain */ + unsigned long endpoints; +}; + +struct viommu_endpoint { + struct viommu_dev *viommu; + struct viommu_domain*vdomain; +}; + +struct viommu_request { + struct scatterlist 
top; + struct scatterlist bottom; + + int written; + struct list_headlist; +}; + +#define to_viommu_domain(domain) \ + container_of(domain, struct viommu_domain, domain) + +/* Virtio transport */ + +static int viommu_status_to_errno(u8 status) +{ + switch
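The archive truncates the patch here, just as viommu_status_to_errno() begins. A plausible completion of that function, based on the request status codes defined by the virtio-iommu specification (the exact mapping in the posted patch may differ), is sketched below as standalone C:

```c
#include <assert.h>
#include <errno.h>

/* Request status codes from the virtio-iommu specification. */
#define VIRTIO_IOMMU_S_OK     0
#define VIRTIO_IOMMU_S_IOERR  1
#define VIRTIO_IOMMU_S_UNSUPP 2
#define VIRTIO_IOMMU_S_DEVERR 3
#define VIRTIO_IOMMU_S_INVAL  4
#define VIRTIO_IOMMU_S_RANGE  5
#define VIRTIO_IOMMU_S_NOENT  6
#define VIRTIO_IOMMU_S_FAULT  7

/* Translate a device status byte into a negative errno for callers. */
static int viommu_status_to_errno(unsigned char status)
{
	switch (status) {
	case VIRTIO_IOMMU_S_OK:
		return 0;
	case VIRTIO_IOMMU_S_UNSUPP:
		return -ENOSYS;
	case VIRTIO_IOMMU_S_INVAL:
		return -EINVAL;
	case VIRTIO_IOMMU_S_RANGE:
		return -ERANGE;
	case VIRTIO_IOMMU_S_NOENT:
		return -ENOENT;
	case VIRTIO_IOMMU_S_FAULT:
		return -EFAULT;
	case VIRTIO_IOMMU_S_IOERR:
	case VIRTIO_IOMMU_S_DEVERR:
	default:
		return -EIO;
	}
}
```

Collapsing IOERR, DEVERR, and unknown statuses to -EIO keeps the driver tolerant of devices implementing a newer revision of the spec.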
Re: [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put
[CC Ard, in case he has a view on how much we care about softirq NEON performance regressions ... and whether my suggestions make sense] On Wed, Feb 14, 2018 at 11:15:54AM +0100, Christoffer Dall wrote: > On Tue, Feb 13, 2018 at 02:08:47PM +, Dave Martin wrote: > > On Tue, Feb 13, 2018 at 09:51:30AM +0100, Christoffer Dall wrote: > > > On Fri, Feb 09, 2018 at 03:59:30PM +, Dave Martin wrote: [...] > > It's hard to gauge the impact of this: it seems unlikely to make a > > massive difference, but will be highly workload-dependent. > > So I thought it might be useful to have some idea of the frequency of > events on a balanced workload, so I ran an 8-way SMP guest on Ubuntu > 14.04 running SPECjvm2008, a memcached benchmark, a MySQL workloads, and > some networking benchmarks, and I counted a few events: > > - Out of all the exits, from the guest to run-loop in EL1 on a non-VHE >system, fewer than 1% of them result in an exit to userspace (0.57%). > > - The VCPU thread was preempted (voluntarily or forced) in the kernel >less than 3% of the exits (2.72%). That's just below 5 preemptions >per ioctl(KVM_RUN). > > - In 29% of the preemptions (vcpu_put), the guest had touched FPSIMD >registers and the host context was restored. > > - We store the host context about 1.38 times per ioctl(KVM_RUN). > > So that tells me that (1) it's worth restoring the guest FPSIMD state > lazily as opposed to proactively on vcpu_load, and (2) that there's a > small opportunity for improvement by reducing redundant host vfp state > saves. That's really useful. I guess it confirms that lazy guest FPSIMD restore is desirable (though I wasn't disputing this) and that the potential benefit from eliminating redundant host FPSIMD saves is modest, assume that this workload is representative. So we shouldn't over-optimise for the latter if there are side costs from doing so. 
> > The redundancy occurs because of the deferred restore of the FPSIMD > > registers for host userspace: as a result, the host FPSIMD regs are > > either discardable (i.e., already saved) or not live at all between > > one context switch and the next ret_to_user. > > > > This means that if the vcpu run loop is preempted, then when the host > > switches back to the run loop it is pointless to save or restore the > > host FPSIMD state. > > > > A typical sequence of events exposing this redundancy would be as > > follows. I assume here that there are two cpu-bound tasks A and B > > competing for a host CPU, where A is a vcpu thread: > > > > - vcpu A is in the guest running a compute-heavy task > > - FPSIMD typically traps to the host before context switch > > X kvm saves the host FPSIMD state > > - kvm loads the guest FPSIMD state > > - vcpu A reenters the guest > > - host context switch IRQ preempts A back to the run loop > > Y kvm loads the host FPSIMD state via vcpu_put > > > > - host context switch: > > - TIF_FOREIGN_FPSTATE is set -> no save of user FPSIMD state > > - switch to B > > - B reaches ret_to_user > > Y B's user FPSIMD state is loaded: TIF_FOREIGN_FPSTATE now clear > > - B enters userspace > > > > - host context switch: > > - B enters kernel > > X TIF_FOREIGN_FPSTATE now set -> host saves B's FPSIMD state > > - switch to A -> set TIF_FOREIGN_FPSTATE for A > > - back to the KVM run loop > > > > - vcpu A enters guest > > - redo from start > > > > Here, the two saves marked X are redundant with respect to each other, > > and the two restores marked Y are redundant with respect to each other. > > > > Right, ok, but if we have > > - ioctl(KVM_RUN) > - mark hardware FPSIMD register state as invalid > - load guest FPSIMD state > - enter guest > - exit guest > - save guest FPSIMD state > - return to user space > > (I.e. we don't do any preemption in the guest) > > Then we'll lose the host FPSIMD register state, potentially, right? Yes. 
However, (disregarding kernel-mode NEON) no host task's state can be live in the FPSIMD regs other than current's. If another context's state is in the regs, it is either stale or a clean copy, and we can harmlessly invalidate the association. The subtlety here comes from the SVE syscall ABI, which allows current's non-FPSIMD SVE bits to be discarded across a syscall: in this code, current _is_ in a syscall, so the fact that we can lose current's SVE bits here is fine: TIF_SVE will have been cleared in entry.S on the way in, which means that SVE will trap for userspace, giving a chance to zero those regs lazily for userspace when/if they're used again. Conversely, current's FPSIMD regs are preserved separately by KVM. > Your original comment on this patch was that we didn't need to restore > the host FPSIMD state in kvm_vcpu_put_sysregs, which would result in the > scenario above. The only way I can see this working is by making sure > that kvm_fpsimd_flush_cpu_state() also saves the FPSIMD hardware > register state if the
Re: [PATCHv2] arm64/kvm: Prohibit guest LOR accesses
On Tue, Feb 13, 2018 at 01:39:23PM +, Mark Rutland wrote: > We don't currently limit guest accesses to the LOR registers, which we > neither virtualize nor context-switch. As such, guests are provided with > unusable information/controls, and are not isolated from each other (or > the host). > > To prevent these issues, we can trap register accesses and present the > illusion that LORegions are unsupported by the CPU. To do this, we mask > ID_AA64MMFR1.LO, and set HCR_EL2.TLOR to trap accesses to the following > registers: > > * LORC_EL1 > * LOREA_EL1 > * LORID_EL1 > * LORN_EL1 > * LORSA_EL1 > > ... when trapped, we inject an UNDEFINED exception to EL1, simulating > their non-existence. > > As noted in D7.2.67, when no LORegions are implemented, LoadLOAcquire > and StoreLORelease must behave as LoadAcquire and StoreRelease > respectively. We can ensure this by clearing LORC_EL1.EN when a CPU's > EL2 is first initialized, as the host kernel will not modify this. > > Signed-off-by: Mark Rutland > Cc: Vladimir Murzin > Cc: Catalin Marinas > Cc: Christoffer Dall > Cc: Marc Zyngier > Cc: Will Deacon > Cc: kvmarm@lists.cs.columbia.edu Applied, thanks. -Christoffer ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put
On Tue, Feb 13, 2018 at 02:08:47PM +, Dave Martin wrote: > On Tue, Feb 13, 2018 at 09:51:30AM +0100, Christoffer Dall wrote: > > On Fri, Feb 09, 2018 at 03:59:30PM +, Dave Martin wrote: > > > On Wed, Feb 07, 2018 at 06:56:44PM +0100, Christoffer Dall wrote: > > > > On Wed, Feb 07, 2018 at 04:49:55PM +, Dave Martin wrote: > > [...] > > > > Simply entering the kernel and returning to userspace doesn't have > > > this effect by itself. > > > > > > > > > Prior to the SVE patches, KVM makes itself orthogonal to the host > > > context switch machinery by ensuring that whatever the host had > > > in the FPSIMD regs at guest entry is restored before returning to > > > the host. (IIUC) > > > > Only if the guest actually touches FPSIMD state. If the guest doesn't > > touch FPSIMD (no trap to EL2), then we never touch the state at all. > > I should have been clearer: KVM ensures that the state is _unchanged_ > before returning to the host, but can elide the save/restore when the > guest doesn't touch the state... > > > > > > This means that redundant save/restore work is > > > done by KVM, but does have the advantage of simplicity. > > > > I don't understand what the redundant part here is? Isn't it only > > redundant in the case where the host (for some reason) has already saved > > its FPSIMD state? I assume that won't be the common case, since > > "userspace->kernel->kvm_run" won't save the FPSIMD state, as you just > > explained above. > > ...however, when this elision does not occur, it may duplicate > save/restore done by the kernel, or it may save/restore worthless data > if the host's FPSIMD state is non-live at the time. > > It's hard to gauge the impact of this: it seems unlikely to make a > massive difference, but will be highly workload-dependent. 
So I thought it might be useful to have some idea of the frequency of events on a balanced workload, so I ran an 8-way SMP guest on Ubuntu 14.04 running SPECjvm2008, a memcached benchmark, a MySQL workload, and some networking benchmarks, and I counted a few events: - Out of all the exits, from the guest to the run-loop in EL1 on a non-VHE system, fewer than 1% of them result in an exit to userspace (0.57%). - The VCPU thread was preempted (voluntarily or forced) in the kernel in less than 3% of the exits (2.72%). That's just below 5 preemptions per ioctl(KVM_RUN). - In 29% of the preemptions (vcpu_put), the guest had touched FPSIMD registers and the host context was restored. - We store the host context about 1.38 times per ioctl(KVM_RUN). So that tells me that (1) it's worth restoring the guest FPSIMD state lazily as opposed to proactively on vcpu_load, and (2) that there's a small opportunity for improvement by reducing redundant host VFP state saves. > > > The redundancy occurs because of the deferred restore of the FPSIMD > registers for host userspace: as a result, the host FPSIMD regs are > either discardable (i.e., already saved) or not live at all between > one context switch and the next ret_to_user. > > This means that if the vcpu run loop is preempted, then when the host > switches back to the run loop it is pointless to save or restore the > host FPSIMD state. > > A typical sequence of events exposing this redundancy would be as > follows. 
I assume here that there are two cpu-bound tasks A and B > competing for a host CPU, where A is a vcpu thread: > > - vcpu A is in the guest running a compute-heavy task > - FPSIMD typically traps to the host before context switch > X kvm saves the host FPSIMD state > - kvm loads the guest FPSIMD state > - vcpu A reenters the guest > - host context switch IRQ preempts A back to the run loop > Y kvm loads the host FPSIMD state via vcpu_put > > - host context switch: > - TIF_FOREIGN_FPSTATE is set -> no save of user FPSIMD state > - switch to B > - B reaches ret_to_user > Y B's user FPSIMD state is loaded: TIF_FOREIGN_FPSTATE now clear > - B enters userspace > > - host context switch: > - B enters kernel > X TIF_FOREIGN_FPSTATE now set -> host saves B's FPSIMD state > - switch to A -> set TIF_FOREIGN_FPSTATE for A > - back to the KVM run loop > > - vcpu A enters guest > - redo from start > > Here, the two saves marked X are redundant with respect to each other, > and the two restores marked Y are redundant with respect to each other. > Right, ok, but if we have - ioctl(KVM_RUN) - mark hardware FPSIMD register state as invalid - load guest FPSIMD state - enter guest - exit guest - save guest FPSIMD state - return to user space (I.e. we don't do any preemption in the guest) Then we'll lose the host FPSIMD register state, potentially, right? Your original comment on this patch was that we didn't need to restore the host FPSIMD state in kvm_vcpu_put_sysregs, which would result in the scenario above. The only way I can see this working is by making sure that kvm_fpsimd_flush_cpu_state() also saves the FPSIMD hardware register state