Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Friday, July 14, 2017 7:26 PM
>
> On 14/07/17 08:20, Tian, Kevin wrote:
>>> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
>>> Sent: Friday, July 7, 2017 11:15 PM
>>>
>>> On 07/07/17 07:21, Tian, Kevin wrote:
>>>> sorry I didn't quite get this part, and here is my understanding:
>>>>
>>>> Guest programs vIOMMU to map a gIOVA (used by MSI) to a GPA of the doorbell register of the virtual irqchip. vIOMMU then triggers VFIO map/unmap to update the physical IOMMU page table for gIOVA -> HPA of the real doorbell of the physical irqchip.
>>>
>>> At the moment (non-SVM), physical and virtual MSI doorbells are completely dissociated. VFIO itself maps the doorbell GPA->HPA during container initialization. The GPA, chosen arbitrarily by the host, is then removed from the guest GPA space.
>>
>> got you. I also got some basic understanding from the link below. :-)
>>
>> https://www.linaro.org/blog/core-dump/kvm-pciemsi-passthrough-armarm64/
>>
>>> When the guest programs the vIOMMU to map a gIOVA to the virtual irqchip doorbell, I suppose Qemu will notice that the GPA doesn't correspond to RAM and will withhold sending a VFIO_IOMMU_MAP_DMA request.
>>>
>>> (For SVM I don't want to go into the details just now, but we will probably need a separate VFIO mechanism to update the physical MSI-X tables with whatever gIOVA the guest mapped in its private stage-1 page tables.)
>>
>> I guess there may be either a terminology difference or a hardware difference here, since I noted you mentioned IOVA with stage-1 multiple times.
>>
>> For Intel VT-d:
>>
>> - stage-1 is only for VA translation, tagged with PASID
>> - stage-2 can be used for IOVA translation on bare metal, or GPA/gIOVA translation in virtualization, w/o PASID tagged
>
> The terminology is indeed a bit confusing, and the hardware slightly different. For me IOVA is the address used as input of the pIOMMU, PA is the output address, and GPA only exists if there is stage-1 + stage-2. So I think what I meant by gIOVA above was VA in your description.

In the Linux kernel, IOVA specifically refers to a pseudo address space remapped to PA (e.g. from pci_map), while VA is a real CPU virtual address (so-called SVM). Either IOVA or VA can be the input to the pIOMMU depending on the usage. When running inside a VM, those input addresses become gIOVA or GVA. What about following this convention here and in future discussions, though I agree conceptually IOVA can represent any input of the pIOMMU? :-)

> I understand your "stage-1" and "stage-2" are named "first-level" and "second-level" in the VT-d spec?

yes, VT-d uses first/second level.

> If I read the VT-d spec correctly, I think the main difference on ARM SMMU is that stage-2 always follows stage-1 translation, but either stage may be disabled (or both, for bypass mode). There is no mode like in VT-d, where non-PASID transactions go only through stage-2 and PASID transactions go only through stage-1. I believe this is (NESTE=0, T=000b/001b) in the Extended-Context-Entry.
>
> Something equivalent in SMMU is disabling stage-2 and using entry 0 in the PASID table for non-PASID traffic. In this mode, traffic that uses PASID#0 would be aborted. So using your terms, the SMMU can have VAs and IOVAs be translated by stage-1 and then, if enabled, be translated by stage-2 as well.

Clear to me. Thanks for the explanation.

Kevin
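[Editor's illustration] The SMMU behaviour described above (stage-2, when enabled, always follows stage-1, and either stage may be bypassed) can be contrasted with VT-d's either/or model using a toy sketch. In the C sketch below, a simple offset stands in for each stage's page tables; all struct and field names are invented for illustration and come from neither the SMMU nor the VT-d specification:

```c
#include <stdint.h>

/* Toy model of SMMU-style nested translation: stage-1 (VA/IOVA -> IPA)
 * optionally followed by stage-2 (IPA -> PA). An offset stands in for each
 * stage's page tables; real hardware walks page tables, of course. */
struct smmu_ctx {
    int s1_enabled;       /* stage-1 translates the input address to an IPA */
    int s2_enabled;       /* stage-2 translates the IPA to a PA */
    uint64_t s1_offset;   /* stand-in for the stage-1 page tables */
    uint64_t s2_offset;   /* stand-in for the stage-2 page tables */
};

/* On SMMU, stage-2 (when enabled) always follows stage-1, either stage may
 * be bypassed, and with both disabled the transaction is untranslated.
 * Contrast with VT-d, where a transaction goes through only one level,
 * selected by whether it carries a PASID. */
static uint64_t smmu_translate(const struct smmu_ctx *c, uint64_t addr)
{
    if (c->s1_enabled)
        addr += c->s1_offset;   /* VA or IOVA -> IPA (GPA) */
    if (c->s2_enabled)
        addr += c->s2_offset;   /* IPA (GPA) -> PA */
    return addr;
}
```

With both stages enabled, the two translations compose; disabling stage-2 gives the "stage-1 only" configuration discussed above, and disabling both gives bypass mode.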
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
On 14/07/17 08:20, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
>> Sent: Friday, July 7, 2017 11:15 PM
>>
>> On 07/07/17 07:21, Tian, Kevin wrote:
>>> sorry I didn't quite get this part, and here is my understanding:
>>>
>>> Guest programs vIOMMU to map a gIOVA (used by MSI) to a GPA of the doorbell register of the virtual irqchip. vIOMMU then triggers VFIO map/unmap to update the physical IOMMU page table for gIOVA -> HPA of the real doorbell of the physical irqchip.
>>
>> At the moment (non-SVM), physical and virtual MSI doorbells are completely dissociated. VFIO itself maps the doorbell GPA->HPA during container initialization. The GPA, chosen arbitrarily by the host, is then removed from the guest GPA space.
>
> got you. I also got some basic understanding from the link below. :-)
>
> https://www.linaro.org/blog/core-dump/kvm-pciemsi-passthrough-armarm64/
>
>> When the guest programs the vIOMMU to map a gIOVA to the virtual irqchip doorbell, I suppose Qemu will notice that the GPA doesn't correspond to RAM and will withhold sending a VFIO_IOMMU_MAP_DMA request.
>>
>> (For SVM I don't want to go into the details just now, but we will probably need a separate VFIO mechanism to update the physical MSI-X tables with whatever gIOVA the guest mapped in its private stage-1 page tables.)
>
> I guess there may be either a terminology difference or a hardware difference here, since I noted you mentioned IOVA with stage-1 multiple times.
>
> For Intel VT-d:
>
> - stage-1 is only for VA translation, tagged with PASID
> - stage-2 can be used for IOVA translation on bare metal, or GPA/gIOVA translation in virtualization, w/o PASID tagged

The terminology is indeed a bit confusing, and the hardware slightly different. For me IOVA is the address used as input of the pIOMMU, PA is the output address, and GPA only exists if there is stage-1 + stage-2. So I think what I meant by gIOVA above was VA in your description.

I understand your "stage-1" and "stage-2" are named "first-level" and "second-level" in the VT-d spec?

If I read the VT-d spec correctly, I think the main difference on ARM SMMU is that stage-2 always follows stage-1 translation, but either stage may be disabled (or both, for bypass mode). There is no mode like in VT-d, where non-PASID transactions go only through stage-2 and PASID transactions go only through stage-1. I believe this is (NESTE=0, T=000b/001b) in the Extended-Context-Entry.

Something equivalent in SMMU is disabling stage-2 and using entry 0 in the PASID table for non-PASID traffic. In this mode, traffic that uses PASID#0 would be aborted. So using your terms, the SMMU can have VAs and IOVAs be translated by stage-1 and then, if enabled, be translated by stage-2 as well.

Thanks,
Jean

> Does ARM SMMU allow stage-1 used for both VA and IOVA? IIRC you said PASID#0 reserved for traffic w/o PASID in some mail...
>
>>> (assume your irqchip will provide multiple doorbells so each device can have its own channel).
>>
>> In existing irqchips the doorbell is shared by endpoints, which are differentiated by their device ID (generally the BDF). I'm not sure why this matters here?
>
> Not matter now with device ID
>
>>> then once this update is done, later MSI interrupts from assigned device will go through physical IOMMU (gIOVA->HPA) then reach irqchip for irq remapping. vIOMMU is involved only in configuration path instead of actual interrupt path.
>>
>> Yes the vIOMMU is used to correlate the IOVA written by the guest in its virtual MSI-X table with the MAP request received by the vIOMMU. That is probably used to set up IRQFD routes with KVM. But the vIOMMU is not involved further than that in MSIs.
>>
>>> If my understanding is correct, above will be the natural flow then why is additional virtio-iommu change required? :-)
>>
>> The change is not *required* for ARM systems, I only proposed removing the doorbell address translation stage to make host implementation simpler (and since virtio-iommu on x86 won't translate the doorbell anyway, we have to add support for this to virtio-iommu). But for Qemu, since vSMMU needs to implement the natural flow anyway, it might not be a lot of effort to also do it for virtio-iommu. Other implementations (e.g. kvmtool) might piggy-back on the x86 way and declare the irqchip doorbell as untranslated.
>>
>> My proposal also breaks when confronted with virtual SVM on a physical ARM system, where the guest owns stage-1 page tables and *has* to map the doorbell if it wants MSIs to work, so you can disregard it :)
>
> It is a good learning. thanks.
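[Editor's illustration] The host-side decision discussed in this exchange (withhold the VFIO map when the guest maps a gIOVA to the virtual doorbell, because that GPA is not RAM, and only update MSI route bookkeeping instead) can be sketched as below. The constants and function names are hypothetical and not taken from Qemu:

```c
#include <stdint.h>

/* Hypothetical description of the virtual irqchip doorbell page; the actual
 * GPA is chosen by the virtual platform, these values are made up. */
#define VIRT_DOORBELL_GPA   0x08000000ULL
#define VIRT_DOORBELL_SIZE  0x1000ULL

enum map_action {
    ACTION_VFIO_MAP_DMA,        /* forward as a VFIO_IOMMU_MAP_DMA ioctl */
    ACTION_UPDATE_MSI_ROUTE,    /* only record gIOVA for KVM irqfd routing */
};

/* Decide how the host handles a guest MAP(gIOVA -> GPA) request: a GPA
 * inside the doorbell region does not correspond to RAM, so the VFIO map
 * is withheld and only the MSI route bookkeeping is updated. */
static enum map_action handle_viommu_map(uint64_t giova, uint64_t gpa)
{
    (void)giova;    /* a real implementation would record gIOVA -> GPA */

    if (gpa >= VIRT_DOORBELL_GPA &&
        gpa - VIRT_DOORBELL_GPA < VIRT_DOORBELL_SIZE)
        return ACTION_UPDATE_MSI_ROUTE;

    return ACTION_VFIO_MAP_DMA;
}
```

This matches the "natural flow" described above: the vIOMMU is involved only in the configuration path, never in the actual interrupt path.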
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Friday, July 7, 2017 11:15 PM
>
> On 07/07/17 07:21, Tian, Kevin wrote:
>> sorry I didn't quite get this part, and here is my understanding:
>>
>> Guest programs vIOMMU to map a gIOVA (used by MSI) to a GPA of the doorbell register of the virtual irqchip. vIOMMU then triggers VFIO map/unmap to update the physical IOMMU page table for gIOVA -> HPA of the real doorbell of the physical irqchip.
>
> At the moment (non-SVM), physical and virtual MSI doorbells are completely dissociated. VFIO itself maps the doorbell GPA->HPA during container initialization. The GPA, chosen arbitrarily by the host, is then removed from the guest GPA space.

got you. I also got some basic understanding from the link below. :-)

https://www.linaro.org/blog/core-dump/kvm-pciemsi-passthrough-armarm64/

> When the guest programs the vIOMMU to map a gIOVA to the virtual irqchip doorbell, I suppose Qemu will notice that the GPA doesn't correspond to RAM and will withhold sending a VFIO_IOMMU_MAP_DMA request.
>
> (For SVM I don't want to go into the details just now, but we will probably need a separate VFIO mechanism to update the physical MSI-X tables with whatever gIOVA the guest mapped in its private stage-1 page tables.)

I guess there may be either a terminology difference or a hardware difference here, since I noted you mentioned IOVA with stage-1 multiple times.

For Intel VT-d:

- stage-1 is only for VA translation, tagged with PASID
- stage-2 can be used for IOVA translation on bare metal, or GPA/gIOVA translation in virtualization, w/o PASID tagged

Does ARM SMMU allow stage-1 to be used for both VA and IOVA? IIRC you said PASID#0 is reserved for traffic w/o PASID in some mail...

>> (assume your irqchip will provide multiple doorbells so each device can have its own channel).
>
> In existing irqchips the doorbell is shared by endpoints, which are differentiated by their device ID (generally the BDF). I'm not sure why this matters here?

Doesn't matter now, with device ID.

>> then once this update is done, later MSI interrupts from the assigned device will go through the physical IOMMU (gIOVA->HPA) then reach the irqchip for irq remapping. vIOMMU is involved only in the configuration path instead of the actual interrupt path.
>
> Yes the vIOMMU is used to correlate the IOVA written by the guest in its virtual MSI-X table with the MAP request received by the vIOMMU. That is probably used to set up IRQFD routes with KVM. But the vIOMMU is not involved further than that in MSIs.
>
>> If my understanding is correct, the above will be the natural flow, then why is an additional virtio-iommu change required? :-)
>
> The change is not *required* for ARM systems, I only proposed removing the doorbell address translation stage to make host implementation simpler (and since virtio-iommu on x86 won't translate the doorbell anyway, we have to add support for this to virtio-iommu). But for Qemu, since vSMMU needs to implement the natural flow anyway, it might not be a lot of effort to also do it for virtio-iommu. Other implementations (e.g. kvmtool) might piggy-back on the x86 way and declare the irqchip doorbell as untranslated.
>
> My proposal also breaks when confronted with virtual SVM on a physical ARM system, where the guest owns stage-1 page tables and *has* to map the doorbell if it wants MSIs to work, so you can disregard it :)

It is a good learning. thanks.
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Wednesday, July 12, 2017 4:28 PM
> To: Bharat Bhushan ; Auger Eric ; eric.auger@gmail.com; peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org; qemu-devel@nongnu.org
> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; robin.mur...@arm.com; christoffer.d...@linaro.org
> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
>
> On 12/07/17 11:27, Bharat Bhushan wrote:
> >> -----Original Message-----
> >> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> >> Sent: Wednesday, July 12, 2017 3:48 PM
> >> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> >>
> >> On 12/07/17 04:50, Bharat Bhushan wrote:
> >> [...]
> >>>> The size of the virtio_iommu_req_probe structure is variable, and depends what fields the device implements. So the device initially computes the size it needs to fill virtio_iommu_req_probe, describes it in probe_size, and the driver allocates that many bytes for virtio_iommu_req_probe.content[]
> >>>>
> >>>>>> * When device offers VIRTIO_IOMMU_F_PROBE, the driver should send a VIRTIO_IOMMU_T_PROBE request for each new endpoint.
> >>>>>> * The driver allocates a device-writeable buffer of probe_size (plus framing) and sends it as a VIRTIO_IOMMU_T_PROBE request.
> >>>>>> * The device fills the buffer with various information.
> >>>>>>
> >>>>>> struct virtio_iommu_req_probe {
> >>>>>>     /* device-readable */
> >>>>>>     struct virtio_iommu_req_head head;
> >>>>>>     le32 device;
> >>>>>>     le32 flags;
> >>>>>>
> >>>>>>     /* maybe also le32 content_size, but it must be equal to probe_size */
> >>>>>
> >>>>> Can you please describe why we need to pass size of "probe_size" in probe request?
> >>>>
> >>>> We don't. I don't think we should add this 'content_size' field unless there is a compelling reason to do so.
> >>>>
> >>>>>>     /* device-writeable */
> >>>>>>     u8 content[];
> >>>>>
> >>>>> I assume content_size above is the size of array "content[]" and max value can be equal to probe_size advertised by device?
> >>>>
> >>>> probe_size is exactly the size of array content[]. The driver must allocate a buffer of this size (plus the space needed for head, device, flags and tail).
> >>>>
> >>>> Then the device is free to leave parts of content[] empty. Field 'type' 0 will be reserved and mark the end of the array.
> >>>>
> >>>>>>     struct virtio_iommu_req_tail tail;
> >>>>>> };
> >>>>>>
> >>>>>> I'm still struggling with the content and layout of the probe request, and would appreciate any feedback. To be easily extended, I think it should contain a list of fields of variable size:
> >>>>>>
> >>>>>>     |0    15|16    31|32      N|
> >>>>>>     | type  | length | values  |
> >>>>>>
> >>>>>> 'length' might be made optional if it can be deduced from type, but might make driver-side parsing more robust.
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
On 12/07/17 11:27, Bharat Bhushan wrote:
>> -----Original Message-----
>> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
>> Sent: Wednesday, July 12, 2017 3:48 PM
>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
>>
>> On 12/07/17 04:50, Bharat Bhushan wrote:
>> [...]
>>>> The size of the virtio_iommu_req_probe structure is variable, and depends what fields the device implements. So the device initially computes the size it needs to fill virtio_iommu_req_probe, describes it in probe_size, and the driver allocates that many bytes for virtio_iommu_req_probe.content[]
>>>>
>>>>>> * When device offers VIRTIO_IOMMU_F_PROBE, the driver should send a VIRTIO_IOMMU_T_PROBE request for each new endpoint.
>>>>>> * The driver allocates a device-writeable buffer of probe_size (plus framing) and sends it as a VIRTIO_IOMMU_T_PROBE request.
>>>>>> * The device fills the buffer with various information.
>>>>>>
>>>>>> struct virtio_iommu_req_probe {
>>>>>>     /* device-readable */
>>>>>>     struct virtio_iommu_req_head head;
>>>>>>     le32 device;
>>>>>>     le32 flags;
>>>>>>
>>>>>>     /* maybe also le32 content_size, but it must be equal to probe_size */
>>>>>
>>>>> Can you please describe why we need to pass size of "probe_size" in probe request?
>>>>
>>>> We don't. I don't think we should add this 'content_size' field unless there is a compelling reason to do so.
>>>>
>>>>>>     /* device-writeable */
>>>>>>     u8 content[];
>>>>>
>>>>> I assume content_size above is the size of array "content[]" and max value can be equal to probe_size advertised by device?
>>>>
>>>> probe_size is exactly the size of array content[]. The driver must allocate a buffer of this size (plus the space needed for head, device, flags and tail).
>>>>
>>>> Then the device is free to leave parts of content[] empty. Field 'type' 0 will be reserved and mark the end of the array.
>>>>
>>>>>>     struct virtio_iommu_req_tail tail;
>>>>>> };
>>>>>>
>>>>>> I'm still struggling with the content and layout of the probe request, and would appreciate any feedback. To be easily extended, I think it should contain a list of fields of variable size:
>>>>>>
>>>>>>     |0    15|16    31|32      N|
>>>>>>     | type  | length | values  |
>>>>>>
>>>>>> 'length' might be made optional if it can be deduced from type, but might make driver-side parsing more robust.
>>>>>>
>>>>>> The probe could either be done for each endpoint, or for each address space. I much prefer endpoint because it is the smallest granularity. The driver can then decide what endpoints to put together in the same address space based on their individual capabilities. The specification would describe how each endpoint property is combined when endpoints are put in the same address space. For example, take the minimum of all PASID sizes, the maximum of all page granularities, combine doorbell addresses, etc.
>>>>>>
>>>>>> If we did the probe on address spaces instead, the driver would have to re-send a probe request each time a new endpoint is attached to an existing address space, to see if it is still capable of page table handover or if the driver just combined a VFIO and an emulated endpoint by accident.
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Wednesday, July 12, 2017 3:48 PM
> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
>
> On 12/07/17 04:50, Bharat Bhushan wrote:
> [...]
> >> The size of the virtio_iommu_req_probe structure is variable, and depends what fields the device implements. So the device initially computes the size it needs to fill virtio_iommu_req_probe, describes it in probe_size, and the driver allocates that many bytes for virtio_iommu_req_probe.content[]
> >>
> >>>> * When device offers VIRTIO_IOMMU_F_PROBE, the driver should send a VIRTIO_IOMMU_T_PROBE request for each new endpoint.
> >>>> * The driver allocates a device-writeable buffer of probe_size (plus framing) and sends it as a VIRTIO_IOMMU_T_PROBE request.
> >>>> * The device fills the buffer with various information.
> >>>>
> >>>> struct virtio_iommu_req_probe {
> >>>>     /* device-readable */
> >>>>     struct virtio_iommu_req_head head;
> >>>>     le32 device;
> >>>>     le32 flags;
> >>>>
> >>>>     /* maybe also le32 content_size, but it must be equal to probe_size */
> >>>
> >>> Can you please describe why we need to pass size of "probe_size" in probe request?
> >>
> >> We don't. I don't think we should add this 'content_size' field unless there is a compelling reason to do so.
> >>
> >>>>     /* device-writeable */
> >>>>     u8 content[];
> >>>
> >>> I assume content_size above is the size of array "content[]" and max value can be equal to probe_size advertised by device?
> >>
> >> probe_size is exactly the size of array content[]. The driver must allocate a buffer of this size (plus the space needed for head, device, flags and tail).
> >>
> >> Then the device is free to leave parts of content[] empty. Field 'type' 0 will be reserved and mark the end of the array.
> >>
> >>>>     struct virtio_iommu_req_tail tail;
> >>>> };
> >>>>
> >>>> I'm still struggling with the content and layout of the probe request, and would appreciate any feedback. To be easily extended, I think it should contain a list of fields of variable size:
> >>>>
> >>>>     |0    15|16    31|32      N|
> >>>>     | type  | length | values  |
> >>>>
> >>>> 'length' might be made optional if it can be deduced from type, but might make driver-side parsing more robust.
> >>>>
> >>>> The probe could either be done for each endpoint, or for each address space. I much prefer endpoint because it is the smallest granularity. The driver can then decide what endpoints to put together in the same address space based on their individual capabilities. The specification would describe how each endpoint property is combined when endpoints are put in the same address space. For example, take the minimum of all PASID sizes, the maximum of all page granularities, combine doorbell addresses, etc.
> >>>>
> >>>> If we did the probe on address spaces instead, the driver would have to re-send a probe request each time a new endpoint is attached to an existing address space, to see if it is still capable of page table handover or if the driver just combined a VFIO and an emulated endpoint by accident.
> >>>>
> >>>> ***
> >>>>
> >>>> Using this framework, the device can declare doorbell regions by adding one or more RESV fields into the probe buffer:
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
On 12/07/17 04:50, Bharat Bhushan wrote:
[...]
>> The size of the virtio_iommu_req_probe structure is variable, and depends what fields the device implements. So the device initially computes the size it needs to fill virtio_iommu_req_probe, describes it in probe_size, and the driver allocates that many bytes for virtio_iommu_req_probe.content[]
>>
>>>> * When device offers VIRTIO_IOMMU_F_PROBE, the driver should send a VIRTIO_IOMMU_T_PROBE request for each new endpoint.
>>>> * The driver allocates a device-writeable buffer of probe_size (plus framing) and sends it as a VIRTIO_IOMMU_T_PROBE request.
>>>> * The device fills the buffer with various information.
>>>>
>>>> struct virtio_iommu_req_probe {
>>>>     /* device-readable */
>>>>     struct virtio_iommu_req_head head;
>>>>     le32 device;
>>>>     le32 flags;
>>>>
>>>>     /* maybe also le32 content_size, but it must be equal to probe_size */
>>>
>>> Can you please describe why we need to pass size of "probe_size" in probe request?
>>
>> We don't. I don't think we should add this 'content_size' field unless there is a compelling reason to do so.
>>
>>>>     /* device-writeable */
>>>>     u8 content[];
>>>
>>> I assume content_size above is the size of array "content[]" and max value can be equal to probe_size advertised by device?
>>
>> probe_size is exactly the size of array content[]. The driver must allocate a buffer of this size (plus the space needed for head, device, flags and tail).
>>
>> Then the device is free to leave parts of content[] empty. Field 'type' 0 will be reserved and mark the end of the array.
>>
>>>>     struct virtio_iommu_req_tail tail;
>>>> };
>>>>
>>>> I'm still struggling with the content and layout of the probe request, and would appreciate any feedback. To be easily extended, I think it should contain a list of fields of variable size:
>>>>
>>>>     |0    15|16    31|32      N|
>>>>     | type  | length | values  |
>>>>
>>>> 'length' might be made optional if it can be deduced from type, but might make driver-side parsing more robust.
>>>>
>>>> The probe could either be done for each endpoint, or for each address space. I much prefer endpoint because it is the smallest granularity. The driver can then decide what endpoints to put together in the same address space based on their individual capabilities. The specification would describe how each endpoint property is combined when endpoints are put in the same address space. For example, take the minimum of all PASID sizes, the maximum of all page granularities, combine doorbell addresses, etc.
>>>>
>>>> If we did the probe on address spaces instead, the driver would have to re-send a probe request each time a new endpoint is attached to an existing address space, to see if it is still capable of page table handover or if the driver just combined a VFIO and an emulated endpoint by accident.
>>>>
>>>> ***
>>>>
>>>> Using this framework, the device can declare doorbell regions by adding one or more RESV fields into the probe buffer:
>>>>
>>>>     /* 'type' */
>>>>     #define VIRTIO_IOMMU_PROBE_T_RESV 0x1
>>>>
>>>>     /* 'values'. 'length' is sizeof(struct virtio_iommu_probe_resv) */
>>>>     struct virtio_iommu_probe_resv {
>>>>         le64 gpa;
>>>>         le64 size;
>>>>     #define VIRTIO_IOMMU_PROBE_RESV_MSI 0x1
>>>>         u8 type;
>
> To be sure I am understanding it correctly, is this "type" in struct virtio_iommu_req_head?

No, virtio_iommu_req_head::type is the request type (ATTACH/DETACH/MAP/UNMAP/PROBE). Then virtio_iommu_probe_property::type is the property type (only RESV for the moment). And this is virtio_iommu_probe_resv::type, which is the type of the resv region (MSI). I renamed it to 'subtype' below, but I think it still is pretty confusing.

I did a number of changes to structures and naming when trying to integrate it into the specification:

* Added 64 bytes of padding in virtio_iommu_req_probe, so that future extensions can add fields in the device-readable part.
* Renamed "RESV" to "RESV_MEM".
* The resv_mem property now looks like this:

    struct virtio_iommu_probe_resv_mem {
        u8   subtype;
        u8   padding[3];
        le32 flags;
        le64 addr;
        le64 size;
    };

* subtype for MSI doorbells is now VIRTIO_IOMMU_PROBE_RESV_MEM_T_BYPASS (because transactions to this region bypass the IOMMU). 'flags' contains a hint VIRTIO_IOMMU_PROBE_RESV_MEM_F_MSI, telling the driver that this region is used for MSIs.

Here is an example of a probe request returning an MSI doorbell property.

     31                        7         0
    +----------------------------------+
    |            0          |   type   |  <- request type = PROBE (5)
    +----------------------------------+
    |              device              |
    +---
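[Editor's illustration] A driver-side walk of the property list described above could look like the following sketch. The 16-bit type/length header and the property type values follow the proposal in this thread, but the encoding is explicitly still under discussion, so treat every name and constant here as illustrative; the sketch also assumes a little-endian host, matching the le16 fields:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-wire header of one probe property: a 16-bit type and a
 * 16-bit length, followed by 'length' bytes of value. */
struct probe_property {
    uint16_t type;      /* 0 marks the end of the list */
    uint16_t length;    /* size of the value in bytes */
};

#define PROBE_T_NONE      0x0
#define PROBE_T_RESV_MEM  0x1   /* illustrative property type */

/* Walk content[] and count RESV_MEM properties. Returns -1 if a property's
 * value overruns the buffer. The device may leave the tail of content[]
 * empty; the type-0 terminator stops the walk. */
static int count_resv_mem(const uint8_t *content, size_t probe_size)
{
    size_t off = 0;
    int count = 0;

    while (off + sizeof(struct probe_property) <= probe_size) {
        struct probe_property prop;

        memcpy(&prop, content + off, sizeof(prop));
        if (prop.type == PROBE_T_NONE)      /* type 0: end of the array */
            break;
        off += sizeof(prop) + prop.length;
        if (off > probe_size)
            return -1;                      /* malformed value length */
        if (prop.type == PROBE_T_RESV_MEM)
            count++;
    }
    return count;
}
```

Keeping the explicit 'length' field, as the thread suggests, is what makes this parser robust: it can skip property types it does not understand without knowing their layout.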
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Tuesday, July 11, 2017 6:21 PM
> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
>
> On 11/07/17 06:54, Bharat Bhushan wrote:
> > Hi Jean,
> >
> >> -----Original Message-----
> >> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> >> Sent: Friday, July 07, 2017 8:50 PM
> >> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> >>
> >> On 07/07/17 12:36, Bharat Bhushan wrote:
> >>>>> In this proposal, QEMU reserves an iova-range for guest (not host) and guest kernel will use this as msi-iova untranslated (IOMMU_RESV_MSI). While this does not change the host interface and it will continue to use the host reserved mapping for actual interrupt generation, no?
> >>>> But then userspace needs to provide the IOMMU_RESV_MSI range to the guest kernel, right? What would be the proposed manner?
> >>>
> >>> Just an opinion: we can define a feature (VIRTIO_IOMMU_F_RES_MSI_RANGE) and provide this info via a command (VIRTIO_IOMMU_T_MSI_RANGE). The guest iommu-driver will make this call during initialization and store the value. This value will just replace the MSI_IOVA_BASE and MSI_IOVA_LENGTH #defines. The rest will remain the same in the virtio-iommu driver.
> >>
> >> Yes I had something similar in mind, although more generic since we'll need to get other bits of information from the device in future extensions (fault handling, page table formats and dynamic reserves of memory for SVM), and maybe also for finding out per-address-space page granularity (see my reply to patch 3/8). These are per-endpoint properties that cannot be advertised in the virtio config space.
> >>
> >> ***
> >>
> >> So I propose to add a per-endpoint probing mechanism on the request queue:
> >
> > What is per-endpoint? Is it "per-pci/platform-device"?
>
> Yes, it's a pci or platform device managed by the IOMMU. In the spec I'm now using the term "endpoint" to easily differentiate from the virtio-iommu device ("the device").
>
> >> * The device advertises a new command VIRTIO_IOMMU_T_PROBE with feature bit VIRTIO_IOMMU_F_PROBE.
> >> * When this feature is advertised, the device sets the probe_size field in the config space.
> >
> > Probably I did not get how virtio-iommu device emulation decides the value of "probe_size", can you share more info?
>
> The size of the virtio_iommu_req_probe structure is variable, and depends what fields the device implements. So the device initially computes the size it needs to fill virtio_iommu_req_probe, describes it in probe_size, and the driver allocates that many bytes for virtio_iommu_req_probe.content[]
>
> >> * When device offers VIRTIO_IOMMU_F_PROBE, the driver should send a VIRTIO_IOMMU_T_PROBE request for each new endpoint.
> >> * The driver allocates a device-writeable buffer of probe_size (plus framing) and sends it as a VIRTIO_IOMMU_T_PROBE request.
> >> * The device fills the buffer with various information.
> >>
> >> struct virtio_iommu_req_probe {
> >>     /* device-readable */
> >>     struct virtio_iommu_req_head head;
> >>     le32 device;
> >>     le32 flags;
> >>
> >>     /* maybe also le32 content_size, but it must be equal to probe_size */
> >
> > Can you please describe why we need to pass size of "probe_size" in probe request?
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
On 11/07/17 06:54, Bharat Bhushan wrote: > Hi Jean, > >> -Original Message- >> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com] >> Sent: Friday, July 07, 2017 8:50 PM >> To: Bharat Bhushan ; Auger Eric >> ; eric.auger@gmail.com; >> peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com; >> qemu-...@nongnu.org; qemu-devel@nongnu.org >> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; >> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; >> robin.mur...@arm.com; christoffer.d...@linaro.org >> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device >> >> On 07/07/17 12:36, Bharat Bhushan wrote: >>>>> In this proposal, QEMU reserves a iova-range for guest (not host) and >> guest >>>> kernel will use this as msi-iova untranslated (IOMMU_RESV_MSI). While >> this >>>> does not change host interface and it will continue to use host reserved >>>> mapping for actual interrupt generation, no? >>>> But then userspace needs to provide IOMMU_RESV_MSI range to guest >>>> kernel, right? What would be the proposed manner? >>> >>> Just an opinion, we can define feature >> (VIRTIO_IOMMU_F_RES_MSI_RANGE) and provide this info via a command >> (VIRTIO_IOMMU_T_MSI_RANGE). Guest iommu-driver will make this call >> during initialization and store the value. This value will just replace >> MSI_IOVA_BASE and MSI_IOVA_LENGHT hash define. Rest will remain same >> in virtio-iommu driver. >> >> Yes I had something similar in mind, although more generic since we'll >> need to get other bits of information from the device in future extensions >> (fault handling, page table formats and dynamic reserves of memory for >> SVM), and maybe also for finding out per-address-space page granularity >> (see my reply of patch 3/8). These are per-endpoint properties that cannot >> be advertise in the virtio config space. >> >> *** >> >> So I propose to add a per-endpoint probing mechanism on the request >> queue: > > What is per-endpoint? 
Is it "per-pci/platform-device"? Yes, it's a pci or platform device managed by the IOMMU. In the spec I'm now using the term "endpoint" to easily differentiate from the virtio-iommu device ("the device"). >> * The device advertises a new command VIRTIO_IOMMU_T_PROBE with >> feature >> bit VIRTIO_IOMMU_F_PROBE. >> * When this feature is advertised, the device sets probe_size field in the >> the config space. > > Probably I did not get how virtio-iommu device emulation decides value of > "probe_size", can you share more info? The size of the virtio_iommu_req_probe structure is variable, and depends what fields the device implements. So the device initially computes the size it needs to fill virtio_iommu_req_probe, describes it in probe_size, and the driver allocates that many bytes for virtio_iommu_req_probe.content[] >> * When device offers VIRTIO_IOMMU_F_PROBE, the driver should send an >> VIRTIO_IOMMU_T_PROBE request for each new endpoint. >> * The driver allocates a device-writeable buffer of probe_size (plus >> framing) and sends it as a VIRTIO_IOMMU_T_PROBE request. >> * The device fills the buffer with various information. >> >> struct virtio_iommu_req_probe { >> /* device-readable */ >> struct virtio_iommu_req_head head; >> le32 device; >> le32 flags; >> >> /* maybe also le32 content_size, but it must be equal to >> probe_size */ > > Can you please describe why we need to pass size of "probe_size" in probe > request? We don't. I don't think we should add this 'content_size' field unless there is a compelling reason to do so. >> >> /* device-writeable */ >> u8 content[]; > > I assume content_size above is the size of array "content[]" and max value > can be equal to probe_size advertised by device? probe_size is exactly the size of array content[]. The driver must allocate a buffer of this size (plus the space needed for head, device, flags and tail). Then the device is free to leave parts of content[] empty. 
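A rough sketch of the device-side sizing Jean describes above (the device sums up the fields it implements and advertises the total as probe_size). The |le16 type|le16 length|values| framing is assumed from the proposal later in the thread; names are illustrative, not final:

```c
#include <stdint.h>

/* Assumed field framing: 16-bit type + 16-bit length header, followed by
 * 'length' bytes of values. probe_size is the minimum content[] size the
 * device needs; it may advertise more and leave the tail empty. */

#define PROBE_FIELD_HDR_SIZE 4u /* le16 type + le16 length */

struct probe_field {
    uint16_t type;   /* e.g. VIRTIO_IOMMU_PROBE_T_RESV */
    uint16_t length; /* size of the 'values' part in bytes */
};

static uint32_t compute_probe_size(const struct probe_field *fields,
                                   unsigned int count)
{
    uint32_t size = 0;
    unsigned int i;

    /* One header plus one value blob per implemented field */
    for (i = 0; i < count; i++)
        size += PROBE_FIELD_HDR_SIZE + fields[i].length;
    return size;
}
```

For instance, a device implementing a single RESV field with a 17-byte value (le64 gpa + le64 size + u8 type, unpadded) would advertise at least probe_size = 21, and the driver would allocate that many bytes for content[].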
Field 'type' 0 will be reserved and mark the end of the array. >> struct virtio_iommu_req_tail tail; >> }; >> >> I'm still struggling with the content and layout of the probe request, and >> would appreciate any feedback. To be easily extended, I think it should >> contain a list of fields of variable size: >> >> |0 1
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
Hi Jean, > -Original Message- > From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com] > Sent: Friday, July 07, 2017 8:50 PM > To: Bharat Bhushan ; Auger Eric > ; eric.auger@gmail.com; > peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com; > qemu-...@nongnu.org; qemu-devel@nongnu.org > Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; > t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; > robin.mur...@arm.com; christoffer.d...@linaro.org > Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device > > On 07/07/17 12:36, Bharat Bhushan wrote: > >>> In this proposal, QEMU reserves a iova-range for guest (not host) and > guest > >> kernel will use this as msi-iova untranslated (IOMMU_RESV_MSI). While > this > >> does not change host interface and it will continue to use host reserved > >> mapping for actual interrupt generation, no? > >> But then userspace needs to provide IOMMU_RESV_MSI range to guest > >> kernel, right? What would be the proposed manner? > > > > Just an opinion, we can define feature > (VIRTIO_IOMMU_F_RES_MSI_RANGE) and provide this info via a command > (VIRTIO_IOMMU_T_MSI_RANGE). Guest iommu-driver will make this call > during initialization and store the value. This value will just replace > MSI_IOVA_BASE and MSI_IOVA_LENGHT hash define. Rest will remain same > in virtio-iommu driver. > > Yes I had something similar in mind, although more generic since we'll > need to get other bits of information from the device in future extensions > (fault handling, page table formats and dynamic reserves of memory for > SVM), and maybe also for finding out per-address-space page granularity > (see my reply of patch 3/8). These are per-endpoint properties that cannot > be advertise in the virtio config space. > > *** > > So I propose to add a per-endpoint probing mechanism on the request > queue: What is per-endpoint? Is it "per-pci/platform-device"? 
> > * The device advertises a new command VIRTIO_IOMMU_T_PROBE with > feature > bit VIRTIO_IOMMU_F_PROBE. > * When this feature is advertised, the device sets probe_size field in the > the config space. Probably I did not get how virtio-iommu device emulation decides value of "probe_size", can you share more info? > * When device offers VIRTIO_IOMMU_F_PROBE, the driver should send an > VIRTIO_IOMMU_T_PROBE request for each new endpoint. > * The driver allocates a device-writeable buffer of probe_size (plus > framing) and sends it as a VIRTIO_IOMMU_T_PROBE request. > * The device fills the buffer with various information. > > struct virtio_iommu_req_probe { > /* device-readable */ > struct virtio_iommu_req_head head; > le32 device; > le32 flags; > > /* maybe also le32 content_size, but it must be equal to > probe_size */ Can you please describe why we need to pass size of "probe_size" in probe request? > > /* device-writeable */ > u8 content[]; I assume content_size above is the size of array "content[]" and max value can be equal to probe_size advertised by device? > struct virtio_iommu_req_tail tail; > }; > > I'm still struggling with the content and layout of the probe request, and > would appreciate any feedback. To be easily extended, I think it should > contain a list of fields of variable size: > > |0 15|16 31|32 N| > | type |length | values | > > 'length' might be made optional if it can be deduced from type, but might > make driver-side parsing more robust. > > The probe could either be done for each endpoint, or for each address > space. I much prefer endpoint because it is the smallest granularity. The > driver can then decide what endpoints to put together in the same address > space based on their individual capabilities. The specification would > described how each endpoint property is combined when endpoints are put > in > the same address space. 
For example, take the minimum of all PASID size, > the maximum of all page granularities, combine doorbell addresses, etc. > > If we did the probe on address spaces instead, the driver would have to > re-send a probe request each time a new endpoint is attached to an > existing address space, to see if it is still capable of page table > handover or if the driver just combined a VFIO and an emulated endpoint by > accident. > > *** > > Using this framework, the device can declare doorbell regions by adding > one or more RE
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
On 07/07/17 12:36, Bharat Bhushan wrote: >>> In this proposal, QEMU reserves a iova-range for guest (not host) and guest >> kernel will use this as msi-iova untranslated (IOMMU_RESV_MSI). While this >> does not change host interface and it will continue to use host reserved >> mapping for actual interrupt generation, no? >> But then userspace needs to provide IOMMU_RESV_MSI range to guest >> kernel, right? What would be the proposed manner? > > Just an opinion, we can define feature (VIRTIO_IOMMU_F_RES_MSI_RANGE) and > provide this info via a command (VIRTIO_IOMMU_T_MSI_RANGE). Guest > iommu-driver will make this call during initialization and store the value. > This value will just replace MSI_IOVA_BASE and MSI_IOVA_LENGHT hash define. > Rest will remain same in virtio-iommu driver. Yes I had something similar in mind, although more generic since we'll need to get other bits of information from the device in future extensions (fault handling, page table formats and dynamic reserves of memory for SVM), and maybe also for finding out per-address-space page granularity (see my reply to patch 3/8). These are per-endpoint properties that cannot be advertised in the virtio config space.

***

So I propose to add a per-endpoint probing mechanism on the request queue:

* The device advertises a new command VIRTIO_IOMMU_T_PROBE with feature bit VIRTIO_IOMMU_F_PROBE.
* When this feature is advertised, the device sets the probe_size field in the config space.
* When the device offers VIRTIO_IOMMU_F_PROBE, the driver should send a VIRTIO_IOMMU_T_PROBE request for each new endpoint.
* The driver allocates a device-writeable buffer of probe_size (plus framing) and sends it as a VIRTIO_IOMMU_T_PROBE request.
* The device fills the buffer with various information.
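The driver-side allocation in the last two bullets can be sketched as below. Only the field names come from the RFC; the head/tail layouts here are simplified 4-byte placeholders, since their exact contents are not given in this thread:

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the RFC's request framing structures */
struct virtio_iommu_req_head { uint8_t type; uint8_t reserved[3]; };
struct virtio_iommu_req_tail { uint8_t status; uint8_t reserved[3]; };

/* The probe buffer is the device-writeable content[] of probe_size bytes
 * (read from config space), plus the device-readable framing around it. */
static size_t probe_buf_size(uint32_t probe_size)
{
    return sizeof(struct virtio_iommu_req_head)
           + sizeof(uint32_t)   /* le32 device */
           + sizeof(uint32_t)   /* le32 flags */
           + probe_size         /* u8 content[], filled by the device */
           + sizeof(struct virtio_iommu_req_tail);
}
```

The driver would allocate one such buffer and submit one probe request per endpoint whenever VIRTIO_IOMMU_F_PROBE is offered.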
struct virtio_iommu_req_probe {
	/* device-readable */
	struct virtio_iommu_req_head head;
	le32 device;
	le32 flags;

	/* maybe also le32 content_size, but it must be equal to probe_size */

	/* device-writeable */
	u8 content[];
	struct virtio_iommu_req_tail tail;
};

I'm still struggling with the content and layout of the probe request, and would appreciate any feedback. To be easily extended, I think it should contain a list of fields of variable size:

|0      15|16    31|32       N|
|  type   | length |  values  |

'length' might be made optional if it can be deduced from type, but keeping it makes driver-side parsing more robust. The probe could either be done for each endpoint, or for each address space. I much prefer endpoint because it is the smallest granularity. The driver can then decide what endpoints to put together in the same address space based on their individual capabilities. The specification would describe how each endpoint property is combined when endpoints are put in the same address space. For example, take the minimum of all PASID sizes, the maximum of all page granularities, combine doorbell addresses, etc. If we did the probe on address spaces instead, the driver would have to re-send a probe request each time a new endpoint is attached to an existing address space, to see if it is still capable of page table handover or if the driver just combined a VFIO and an emulated endpoint by accident.

***

Using this framework, the device can declare doorbell regions by adding one or more RESV fields into the probe buffer:

/* 'type' */
#define VIRTIO_IOMMU_PROBE_T_RESV	0x1

/* 'values'. 'length' is sizeof(struct virtio_iommu_probe_resv) */
struct virtio_iommu_probe_resv {
	le64 gpa;
	le64 size;
#define VIRTIO_IOMMU_PROBE_RESV_MSI	0x1
	u8 type;
};

Such a region would be subject to the following rules:
* Driver should not use any IOVA declared as RESV_MSI in a mapping.
* Device should let any transaction matching a RESV_MSI region pass through untranslated.
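Driver-side parsing of such a type/length/values list could look like the sketch below, walking content[] field by field, skipping unknown types thanks to 'length', and stopping at a type-0 terminator or at the end of the buffer. The framing (16-bit little-endian type and length) is an assumption based on the layout above:

```c
#include <stdint.h>

#define PROBE_T_NONE 0x0 /* assumed terminator, per the thread */
#define PROBE_T_RESV 0x1 /* VIRTIO_IOMMU_PROBE_T_RESV */

static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns the number of fields found, or -1 if a field overruns the
 * probe_size bytes the driver allocated for content[]. */
static int parse_probe(const uint8_t *content, uint32_t probe_size)
{
    uint32_t off = 0;
    int nfields = 0;

    while (off + 4 <= probe_size) {
        uint16_t type = get_le16(content + off);
        uint16_t length = get_le16(content + off + 2);

        if (type == PROBE_T_NONE)
            break; /* type 0 marks the end of the array */
        if (off + 4 + length > probe_size)
            return -1; /* malformed: field runs past the buffer */

        /* A real driver would dispatch on 'type' here, e.g. record
         * RESV regions; unknown types are simply skipped over, which
         * keeps the format forward-compatible. */
        nfields++;
        off += 4 + length;
    }
    return nfields;
}
```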
* If the device does not advertise any RESV region, then the driver should assume that MSI doorbells, like any other GPA, must be mapped with an arbitrary IOVA in order for the endpoint to access them.
* Given that the driver *should* perform a probe request if available, and it *should* understand the VIRTIO_IOMMU_PROBE_T_RESV field, this field tells the guest how it should handle MSI doorbells, and whether it should map the address via MAP requests or not.

Does this make sense and did I overlook something?

Thanks,
Jean
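Put together, the rules above reduce to one driver-side check when an MSI is set up: map the doorbell only if no RESV_MSI region covers it. A minimal illustration, with the struct layout loosely following the virtio_iommu_probe_resv sketch (names and types are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative mirror of the probe RESV field */
struct resv_region {
    uint64_t gpa;
    uint64_t size;
    uint8_t type; /* 0x1 = VIRTIO_IOMMU_PROBE_RESV_MSI */
};

/* True if the doorbell is an ordinary GPA that must be mapped at some
 * IOVA; false if a RESV_MSI region covers it, meaning the device lets
 * accesses to it pass through untranslated and no MAP request is needed. */
static bool doorbell_needs_mapping(const struct resv_region *resv, int n,
                                   uint64_t doorbell_gpa)
{
    int i;

    for (i = 0; i < n; i++) {
        if (resv[i].type == 0x1 &&
            doorbell_gpa >= resv[i].gpa &&
            doorbell_gpa - resv[i].gpa < resv[i].size)
            return false;
    }
    return true;
}
```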
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
On 06/07/17 22:11, Auger Eric wrote: > Hello Bharat, Jean-Philippe, > On 06/07/2017 12:02, Jean-Philippe Brucker wrote: >> On 05/07/17 09:49, Bharat Bhushan wrote:>>> Also when setup msi-route >> kvm_irqchip_add_msi_route() we needed to provide the translated address. > According to my understanding this is required because kernel does no go through viommu translation when generating interrupt, no? yes this is needed when KVM MSI routes are set up, ie. along with GICV3 ITS. With GICv2M, qemu direct gsi mapping is used and this is not needed. So I do not understand your previous sentence saying "MSI interrupts works without any change". >>> >>> I have almost completed vfio integration with virtio-iommu and now testing >>> the changes by assigning e1000 device to VM. For this I have changed >>> virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-msi and this does >>> not need changed in vfio_get_addr() and kvm_irqchip_add_msi_route() >> >> I understand you're reserving region 0x0800-0x0810 as >> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only works >> because Qemu places the vgic in that area as well (in hw/arm/virt.c). It's >> not a coincidence if the addresses are the same, because Eric chose them >> for the Linux SMMU drivers and I copied them. > > Yes I chose this region because it does not overlap with any guest RAM > region > >> >> We can't rely on that behavior, though, it will break MSIs in emulated >> devices. And if Qemu happens to move the MSI doorbell in future machine >> revisions, then it would also break VFIO. >> >> Just for my own understanding -- what happens, I think, is that in Linux >> iova_reserve_iommu_regions initially reserves the guest-physical doorbell >> 0x0800-0x0810. Then much later, when the device driver requests an >> MSI, the irqchip driver calls iommu_dma_map_msi_msg with the >> guest-physical gicv2m address 0x0802. 
The function finds the right >> page in msi_page_list, which was added by cookie_init_hw_msi_region, >> therefore bypassing the viommu and the GPA gets written in the MSI-X table. > > I share Jean's understanding. To me using IOMMU_RESV_MSI in the > virtio-iommu means this region is not translated by the IOMMU. as > cookie_init_hw_msi_region() pre-allocates the msi_page array, > iommu_dma_get_msi_page() does not do any IOMMU mapping. > >> >> If an emulated device such as virtio-net-pci were to generate an MSI, then >> Qemu would attempt to access the doorbell written by Linux into the MSI-X >> table, 0x0802, and fault because that address wasn't mapped in the >> viommu. > Yes so I am confused, how can it work with a virtio-net-pci or > passthrough'ed e1000e device using MSIs? >> >> So for VFIO, you either need to translate the MSI-X entry using the >> viommu, > > For the vsmmuv3 I created a dedicated IOMMUNotifier to handle the fact > the MSI doorbell is translated and MSI routes need to be updated. This > seems to work. > > or just assume that the vaddr corresponds to the only MSI doorbell >> accessible by this device (because how can we be certain that the guest >> already mapped the doorbell before writing the entry?) >> >> For ARM machines it's probably best to stick with IOMMU_RESV_SW_MSI. >> However, a nice way to use IOMMU_RESV_MSI would be for the virtio-iommu >> device to advertise identity-mapped/reserved regions, and bypass >> translation on these regions. Then the driver could reserve those with >> IOMMU_RESV_MSI. > > At least we may need to configure the virtio-iommu to either bypass MSIs > (x86) or translate MSIs (ARM)? Yes, see the VIRTIO_IOMMU_T_PROBE proposal in, er, my other reply. > For x86 we will need such a system, with an added IRQ >> remapping feature. > Meaning this must live along with vIR, is that what you mean? Also on > ARM this must live with vITS anyway. This is an orthogonal feature, right? 
Reserving doorbell regions on x86 is a must, otherwise MSIs won't work. IRQ remapping would be nice to add in some distant future. Thanks, Jean
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
On 07/07/17 07:21, Tian, Kevin wrote: > sorry I didn't quite get this part, and here is my understanding: > > Guest programs vIOMMU to map a gIOVA (used by MSI to a GPA > of doorbell register of virtual irqchip. vIOMMU then > triggers VFIO map/unmap to update physical IOMMU page > table for gIOVA -> HPA of real doorbell of physical irqchip At the moment (non-SVM), physical and virtual MSI doorbells are completely dissociated. VFIO itself maps the doorbell GPA->HPA during container initialization. The GPA, chosen arbitrarily by the host, is then removed from the guest GPA space. When the guest programs the vIOMMU to map a gIOVA to the virtual irqchip doorbell, I suppose Qemu will notice that the GPA doesn't correspond to RAM and will withhold sending a VFIO_IOMMU_MAP_DMA request. (For SVM I don't want to go into the details just now, but we will probably need a separate VFIO mechanism to update the physical MSI-X tables with whatever gIOVA the guest mapped in its private stage-1 page tables.) > (assume your irqchip will provide multiple doorbells so each > device can have its own channel). In existing irqchips the doorbell is shared by endpoints, which are differentiated by their device ID (generally the BDF). I'm not sure why this matters here? > then once this update is > done, later MSI interrupts from assigned device will go > through physical IOMMU (gIOVA->HPA) then reach irqchip > for irq remapping. vIOMMU is involved only in configuration > path instead of actual interrupt path. Yes, the vIOMMU is used to correlate the IOVA written by the guest in its virtual MSI-X table with the MAP request received by the vIOMMU. That is probably used to set up IRQFD routes with KVM. But the vIOMMU is not involved further than that in MSIs. > If my understanding is correct, above will be the natural flow then > why is additional virtio-iommu change required? 
:-) The change is not *required* for ARM systems; I only proposed removing the doorbell address translation stage to make host implementation simpler (and since virtio-iommu on x86 won't translate the doorbell anyway, we have to add support for this to virtio-iommu). But for Qemu, since vSMMU needs to implement the natural flow anyway, it might not be a lot of effort to also do it for virtio-iommu. Other implementations (e.g. kvmtool) might piggy-back on the x86 way and declare the irqchip doorbell as untranslated. My proposal also breaks when confronted with virtual SVM in a physical ARM system, where the guest owns stage-1 page tables and *has* to map the doorbell if it wants MSIs to work, so you can disregard it :) Thanks, Jean
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
Hi Eric, > >>>> -Original Message- > >>>> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com] > >>>> Sent: Thursday, July 06, 2017 3:33 PM > >>>> To: Bharat Bhushan ; Auger Eric > >>>> ; eric.auger@gmail.com; > >>>> peter.mayd...@linaro.org; alex.william...@redhat.com; > >> m...@redhat.com; > >>>> qemu-...@nongnu.org; qemu-devel@nongnu.org > >>>> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; > >>>> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; > >>>> robin.mur...@arm.com; christoffer.d...@linaro.org > >>>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device > >>>> > >>>> On 05/07/17 09:49, Bharat Bhushan wrote:>>> Also when setup > >>>> msi-route > >>>> kvm_irqchip_add_msi_route() we needed to > >>>>>> provide the translated address. > >>>>>>> According to my understanding this is required because kernel > >>>>>>> does no go > >>>>>> through viommu translation when generating interrupt, no? > >>>>>> > >>>>>> yes this is needed when KVM MSI routes are set up, ie. along with > >>>>>> GICV3 > >>>> ITS. > >>>>>> With GICv2M, qemu direct gsi mapping is used and this is not > needed. > >>>>>> > >>>>>> So I do not understand your previous sentence saying "MSI > >>>>>> interrupts works without any change". > >>>>> > >>>>> I have almost completed vfio integration with virtio-iommu and now > >>>>> testing the changes by assigning e1000 device to VM. For this I > >>>>> have changed virtio-iommu driver to use IOMMU_RESV_MSI rather > than > >>>>> sw- > >> msi > >>>>> and this does not need changed in vfio_get_addr() and > >>>>> kvm_irqchip_add_msi_route() > >>>> > >>>> I understand you're reserving region 0x0800-0x0810 as > >>>> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this > only > >> works > >>>> because Qemu places the vgic in that area as well (in hw/arm/virt.c). > >>>> It's not a coincidence if the addresses are the same, because Eric > >>>> chose them for the Linux SMMU drivers and I copied them. 
> >>>> > >>>> We can't rely on that behavior, though, it will break MSIs in > >>>> emulated devices. And if Qemu happens to move the MSI doorbell in > >>>> future machine revisions, then it would also break VFIO. > >>> > >>> Yes, make sense to me > >>> > >>>> > >>>> Just for my own understanding -- what happens, I think, is that in > >>>> Linux iova_reserve_iommu_regions initially reserves the > >>>> guest-physical doorbell 0x0800-0x0810. Then much later, > >>>> when the device driver requests an MSI, the irqchip driver calls > >>>> iommu_dma_map_msi_msg with the guest- physical gicv2m address > >>>> 0x0802. The function finds the right page in msi_page_list, > >>>> which was added by cookie_init_hw_msi_region, therefore bypassing > >>>> the > >> viommu and the GPA gets written in the MSI-X table. > >>> > >>> This means in case tomorrow when qemu changes virt machine address > >> map and vgic-its (its-translator register address) address range does > >> not fall in the msi_page_list then it will allocate a new iova, > >> create mapping in iommu. So this will no longer be identity mapped > >> and fail to work with new qemu? > >>> > >> Yes that's correct. > >>>> > >>>> If an emulated device such as virtio-net-pci were to generate an > >>>> MSI, then Qemu would attempt to access the doorbell written by > >>>> Linux into the MSI-X table, 0x0802, and fault because that > >>>> address wasn't mapped in the viommu. > >>>> > >>>> So for VFIO, you either need to translate the MSI-X entry using the > >>>> viommu, or just assume that the vaddr corresponds to the only MSI > >>>> doorbell accessible by this device (because how can we be certain > >>>> that the guest already mapped the doorbell before writing th
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
Hi, On 06/07/2017 23:11, Auger Eric wrote: > Hello Bharat, Jean-Philippe, > On 06/07/2017 12:02, Jean-Philippe Brucker wrote: >> On 05/07/17 09:49, Bharat Bhushan wrote:>>> Also when setup msi-route >> kvm_irqchip_add_msi_route() we needed to provide the translated address. > According to my understanding this is required because kernel does no go through viommu translation when generating interrupt, no? yes this is needed when KVM MSI routes are set up, ie. along with GICV3 ITS. With GICv2M, qemu direct gsi mapping is used and this is not needed. So I do not understand your previous sentence saying "MSI interrupts works without any change". >>> >>> I have almost completed vfio integration with virtio-iommu and now testing >>> the changes by assigning e1000 device to VM. For this I have changed >>> virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-msi and this does >>> not need changed in vfio_get_addr() and kvm_irqchip_add_msi_route() >> >> I understand you're reserving region 0x0800-0x0810 as >> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only works >> because Qemu places the vgic in that area as well (in hw/arm/virt.c). It's >> not a coincidence if the addresses are the same, because Eric chose them >> for the Linux SMMU drivers and I copied them. > > Yes I chose this region because it does not overlap with any guest RAM > region > >> >> We can't rely on that behavior, though, it will break MSIs in emulated >> devices. And if Qemu happens to move the MSI doorbell in future machine >> revisions, then it would also break VFIO. >> >> Just for my own understanding -- what happens, I think, is that in Linux >> iova_reserve_iommu_regions initially reserves the guest-physical doorbell >> 0x0800-0x0810. Then much later, when the device driver requests an >> MSI, the irqchip driver calls iommu_dma_map_msi_msg with the >> guest-physical gicv2m address 0x0802. 
The function finds the right >> page in msi_page_list, which was added by cookie_init_hw_msi_region, >> therefore bypassing the viommu and the GPA gets written in the MSI-X table. > > I share Jean's understanding. To me using IOMMU_RESV_MSI in the > virtio-iommu means this region is not translated by the IOMMU. as > cookie_init_hw_msi_region() pre-allocates the msi_page array, > iommu_dma_get_msi_page() does not do any IOMMU mapping. > >> >> If an emulated device such as virtio-net-pci were to generate an MSI, then >> Qemu would attempt to access the doorbell written by Linux into the MSI-X >> table, 0x0802, and fault because that address wasn't mapped in the >> viommu. > Yes so I am confused, how can it work with a virtio-net-pci or > passthrough'ed e1000e device using MSIs? >> >> So for VFIO, you either need to translate the MSI-X entry using the >> viommu, > > For the vsmmuv3 I created a dedicated IOMMUNotifier to handle the fact > the MSI doorbell is translated and MSI routes need to be updated. This > seems to work. > > or just assume that the vaddr corresponds to the only MSI doorbell >> accessible by this device (because how can we be certain that the guest >> already mapped the doorbell before writing the entry?) >> >> For ARM machines it's probably best to stick with IOMMU_RESV_SW_MSI. >> However, a nice way to use IOMMU_RESV_MSI would be for the virtio-iommu >> device to advertise identity-mapped/reserved regions, and bypass >> translation on these regions. Then the driver could reserve those with >> IOMMU_RESV_MSI. > > At least we may need to configure the virtio-iommu to either bypass MSIs > (x86) or translate MSIs (ARM)? Actually on x86 no MSI controller will attempt to map MSIs, as opposed to ARM GICv2M & ITS. So the only problem with exposing IOMMU_RESV_SW_MSI regions is that vfio_iommu_type1 will assess IRQ assignment safety using irq_domain_check_msi_remap() rather than the IOMMU_CAP_INTR_REMAP IOMMU capability. 
Thanks Eric > For x86 we will need such a system, with an added IRQ >> remapping feature. > Meaning this must live along with vIR, is that what you mean? Also on > ARM this must live with vITS anyway. This is an orthogonal feature, right? > > Thanks > > Eric >> >> Thanks, >> Jean >>
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
On 07/07/2017 08:25, Bharat Bhushan wrote: > Hi Eric, > >> -Original Message- >> From: Auger Eric [mailto:eric.au...@redhat.com] >> Sent: Friday, July 07, 2017 2:47 AM >> To: Bharat Bhushan ; Jean-Philippe Brucker >> ; eric.auger@gmail.com; >> peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com; >> qemu-...@nongnu.org; qemu-devel@nongnu.org >> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; >> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; >> robin.mur...@arm.com; christoffer.d...@linaro.org >> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device >> >> Hi Bharat, >> >> On 06/07/2017 13:24, Bharat Bhushan wrote: >>> >>> >>>> -Original Message- >>>> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com] >>>> Sent: Thursday, July 06, 2017 3:33 PM >>>> To: Bharat Bhushan ; Auger Eric >>>> ; eric.auger@gmail.com; >>>> peter.mayd...@linaro.org; alex.william...@redhat.com; >> m...@redhat.com; >>>> qemu-...@nongnu.org; qemu-devel@nongnu.org >>>> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; >>>> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; >>>> robin.mur...@arm.com; christoffer.d...@linaro.org >>>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device >>>> >>>> On 05/07/17 09:49, Bharat Bhushan wrote:>>> Also when setup msi-route >>>> kvm_irqchip_add_msi_route() we needed to >>>>>> provide the translated address. >>>>>>> According to my understanding this is required because kernel does >>>>>>> no go >>>>>> through viommu translation when generating interrupt, no? >>>>>> >>>>>> yes this is needed when KVM MSI routes are set up, ie. along with >>>>>> GICV3 >>>> ITS. >>>>>> With GICv2M, qemu direct gsi mapping is used and this is not needed. >>>>>> >>>>>> So I do not understand your previous sentence saying "MSI >>>>>> interrupts works without any change". 
>>>>> >>>>> I have almost completed vfio integration with virtio-iommu and now >>>>> testing the changes by assigning e1000 device to VM. For this I have >>>>> changed virtio-iommu driver to use IOMMU_RESV_MSI rather than sw- >> msi >>>>> and this does not need changed in vfio_get_addr() and >>>>> kvm_irqchip_add_msi_route() >>>> >>>> I understand you're reserving region 0x0800-0x0810 as >>>> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only >> works >>>> because Qemu places the vgic in that area as well (in hw/arm/virt.c). >>>> It's not a coincidence if the addresses are the same, because Eric >>>> chose them for the Linux SMMU drivers and I copied them. >>>> >>>> We can't rely on that behavior, though, it will break MSIs in >>>> emulated devices. And if Qemu happens to move the MSI doorbell in >>>> future machine revisions, then it would also break VFIO. >>> >>> Yes, make sense to me >>> >>>> >>>> Just for my own understanding -- what happens, I think, is that in >>>> Linux iova_reserve_iommu_regions initially reserves the >>>> guest-physical doorbell 0x0800-0x0810. Then much later, when >>>> the device driver requests an MSI, the irqchip driver calls >>>> iommu_dma_map_msi_msg with the guest- physical gicv2m address >>>> 0x0802. The function finds the right page in msi_page_list, which >>>> was added by cookie_init_hw_msi_region, therefore bypassing the >> viommu and the GPA gets written in the MSI-X table. >>> >>> This means in case tomorrow when qemu changes virt machine address >> map and vgic-its (its-translator register address) address range does not >> fall >> in the msi_page_list then it will allocate a new iova, create mapping in >> iommu. So this will no longer be identity mapped and fail to work with new >> qemu? >>> >> Yes that's correct. 
>>>> >>>> If an emulated device such as virtio-net-pci were to generate an MSI, >>>> then Qemu would attempt to access the doorbell written by Linux into >>>> the MSI-X table, 0x0802, and fault because that address wasn't >>>> mapped in the
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
Hi Eric, > -Original Message- > From: Auger Eric [mailto:eric.au...@redhat.com] > Sent: Friday, July 07, 2017 2:47 AM > To: Bharat Bhushan ; Jean-Philippe Brucker > ; eric.auger@gmail.com; > peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com; > qemu-...@nongnu.org; qemu-devel@nongnu.org > Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; > t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; > robin.mur...@arm.com; christoffer.d...@linaro.org > Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device > > Hi Bharat, > > On 06/07/2017 13:24, Bharat Bhushan wrote: > > > > > >> -Original Message- > >> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com] > >> Sent: Thursday, July 06, 2017 3:33 PM > >> To: Bharat Bhushan ; Auger Eric > >> ; eric.auger@gmail.com; > >> peter.mayd...@linaro.org; alex.william...@redhat.com; > m...@redhat.com; > >> qemu-...@nongnu.org; qemu-devel@nongnu.org > >> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; > >> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; > >> robin.mur...@arm.com; christoffer.d...@linaro.org > >> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device > >> > >> On 05/07/17 09:49, Bharat Bhushan wrote:>>> Also when setup msi-route > >> kvm_irqchip_add_msi_route() we needed to > >>>> provide the translated address. > >>>>> According to my understanding this is required because kernel does > >>>>> no go > >>>> through viommu translation when generating interrupt, no? > >>>> > >>>> yes this is needed when KVM MSI routes are set up, ie. along with > >>>> GICV3 > >> ITS. > >>>> With GICv2M, qemu direct gsi mapping is used and this is not needed. > >>>> > >>>> So I do not understand your previous sentence saying "MSI > >>>> interrupts works without any change". > >>> > >>> I have almost completed vfio integration with virtio-iommu and now > >>> testing the changes by assigning e1000 device to VM. 
For this I have > >>> changed virtio-iommu driver to use IOMMU_RESV_MSI rather than sw- > msi > >>> and this does not need changed in vfio_get_addr() and > >>> kvm_irqchip_add_msi_route() > >> > >> I understand you're reserving region 0x0800-0x0810 as > >> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only > works > >> because Qemu places the vgic in that area as well (in hw/arm/virt.c). > >> It's not a coincidence if the addresses are the same, because Eric > >> chose them for the Linux SMMU drivers and I copied them. > >> > >> We can't rely on that behavior, though, it will break MSIs in > >> emulated devices. And if Qemu happens to move the MSI doorbell in > >> future machine revisions, then it would also break VFIO. > > > > Yes, make sense to me > > > >> > >> Just for my own understanding -- what happens, I think, is that in > >> Linux iova_reserve_iommu_regions initially reserves the > >> guest-physical doorbell 0x0800-0x0810. Then much later, when > >> the device driver requests an MSI, the irqchip driver calls > >> iommu_dma_map_msi_msg with the guest- physical gicv2m address > >> 0x0802. The function finds the right page in msi_page_list, which > >> was added by cookie_init_hw_msi_region, therefore bypassing the > viommu and the GPA gets written in the MSI-X table. > > > > This means in case tomorrow when qemu changes virt machine address > map and vgic-its (its-translator register address) address range does not fall > in the msi_page_list then it will allocate a new iova, create mapping in > iommu. So this will no longer be identity mapped and fail to work with new > qemu? > > > Yes that's correct. > >> > >> If an emulated device such as virtio-net-pci were to generate an MSI, > >> then Qemu would attempt to access the doorbell written by Linux into > >> the MSI-X table, 0x0802, and fault because that address wasn't > >> mapped in the viommu. 
> >> > >> So for VFIO, you either need to translate the MSI-X entry using the > >> viommu, or just assume that the vaddr corresponds to the only MSI > >> doorbell accessible by this device (because how can we be certain > >> that the guest already mapped the doorbell before writing the entry?)
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com] > Sent: Wednesday, July 5, 2017 8:45 PM > > On 05/07/17 08:14, Tian, Kevin wrote: > >> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com] > >> Sent: Monday, June 19, 2017 6:15 PM > >> > >> On 19/06/17 08:54, Bharat Bhushan wrote: > >>> Hi Eric, > >>> > >>> I started added replay in virtio-iommu and came across how MSI > interrupts > >> with work with VFIO. > >>> I understand that on intel this works differently but vsmmu will have > same > >> requirement. > >>> kvm-msi-irq-route are added using the msi-address to be translated by > >> viommu and not the final translated address. > >>> While currently the irqfd framework does not know about emulated > >> iommus (virtio-iommu, vsmmuv3/vintel-iommu). > >>> So in my view we have following options: > >>> - Programming with translated address when setting up kvm-msi-irq- > route > >>> - Route the interrupts via QEMU, which is bad from performance > >>> - vhost-virtio-iommu may solve the problem in long term > >>> > >>> Is there any other better option I am missing? > >> > >> Since we're on the topic of MSIs... I'm currently trying to figure out how > >> we'll handle MSIs in the nested translation mode, where the guest > manages > >> S1 page tables and the host doesn't know about GVA->GPA translation. > >> > >> I'm also wondering about the benefits of having SW-mapped MSIs in the > >> guest. It seems unavoidable for vSMMU since that's what a physical > system > >> would do. But in a paravirtualized solution there doesn't seem to be any > >> compelling reason for having the guest map MSI doorbells. These > addresses > >> are never accessed directly, they are only used for setting up IRQ routing > >> (at least on kvmtool). So here's what I'd like to have. Note that I > >> haven't investigated the feasibility in Qemu yet, I don't know how it > >> deals with MSIs. > >> > >> (1) Guest uses the guest-physical MSI doorbell when setting up MSIs. 
For > >> ARM with GICv3 this would be GITS_TRANSLATER, for x86 it would be the > >> fixed MSI doorbell. This way the host wouldn't need to inspect IOMMU > >> mappings when handling writes to PCI MSI-X tables. > >> > > > > What do you mean by "fixed MSI doorbell"? PCI MSI-X table is part of > > PCI MMIO bar. Accessing to it is just a memory virtualization issue (e.g. > > trap by KVM and then emulated in Qemu) on x86. It's not a IOMMU > > problem. I guess you may mean same thing but want to double confirm > > here given the terminology confusion. Or do you mean the interrupt > > triggered by IOMMU itself? > > Yes I didn't mean access to the MSI-X table, but how we interpret the > address in the MSI message. In kvmtool I create MSI routes for VFIO > devices when the guest accesses the MSI-X tables. And on ARM the tables > contains an IOVA that needs to be translated into a PA, so handling a > write to an MSI-X entry might mean doing the IOVA->PA translation of the > doorbell. > > On x86 the MSI address is 0xfee, whether there is an IOMMU or not. > That's what I meant by fixed. And it is the IOMMU that performs IRQ > remapping. > > On physical ARM systems, the SMMU doesn't treat any special address range > as "MSI window". For the SMMU, an MSI is simply a memory transaction. > MSI > addresses are arbitrary IOVAs that get translated into PAs by the SMMU. > The SMMU doesn't perform any IRQ remapping, only address translation. > This > PA is a doorbell register in the irqchip, which performs IRQ remapping and > triggers an interrupt. Thanks for explanation. I see the background now. > > Therefore in an emulated ARM system, when the guest writes the MSI-X > table, it writes an IOVA. In a strict emulation the MSI would have to > first go through the vIOMMU, and then into the irqchip. I was wondering if > with virtio-iommu we could skip the address translation and go to the MSI > remapping component immediately, effectively implementing a "hardware > MSI > window". 
This is what x86 does, the difference being that MSI remapping is > done by the IOMMU on x86, and by the irqchip on ARM. Sorry, I didn't quite get this part, and here is my understanding: the guest programs the vIOMMU to map a gIOVA (used by the MSI) to the GPA of the doorbell register of the virtual irqchip. The vIOMMU then triggers VFIO map/unmap to update the physical IOMMU page table with gIOVA -> HPA of the real doorbell of the physical irqchip (assuming your irqchip provides multiple doorbells so each device can have its own channel). Once this update is done, later MSI interrupts from the assigned device will go through the physical IOMMU (gIOVA->HPA) and then reach the irqchip for IRQ remapping. The vIOMMU is involved only in the configuration path, not in the actual interrupt path. If my understanding is correct, the above will be the natural flow, so why is an additional virtio-iommu change required? :-) > > My current take is that we should keep the current behavior, but I will > try to sort out the different ways of implementing MSIs with virtio-iommu in the next specification draft.
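The configuration-path flow described above, where the vIOMMU composes the guest's gIOVA->GPA doorbell mapping with the host's GPA->HPA memory map before programming the physical IOMMU through VFIO, can be sketched as follows. This is an illustrative model only: the single-entry mapping table, the function names, and the addresses are assumptions, not QEMU or kernel APIs.

```c
#include <stdint.h>

/* One contiguous mapping: [base, base+size) -> [target, target+size). */
typedef struct {
    uint64_t base, size, target;
} mapping_t;

/* Translate addr through one stage; returns -1 on fault (not mapped). */
static int translate(const mapping_t *m, uint64_t addr, uint64_t *out)
{
    if (addr < m->base || addr - m->base >= m->size)
        return -1;
    *out = m->target + (addr - m->base);
    return 0;
}

/* Compose the guest stage (gIOVA -> GPA of the virtual doorbell) with the
 * host memory map (GPA -> HPA of the physical doorbell), yielding the
 * gIOVA -> HPA mapping that a VFIO_IOMMU_MAP_DMA call would install. */
static int compose(const mapping_t *guest, const mapping_t *host,
                   uint64_t giova, uint64_t *hpa)
{
    uint64_t gpa;
    if (translate(guest, giova, &gpa))
        return -1;
    return translate(host, gpa, hpa);
}
```

As the thread notes, only this composition (the configuration path) involves the vIOMMU; the interrupt itself then traverses the physical IOMMU directly.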
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
Hi Bharat, On 06/07/2017 13:24, Bharat Bhushan wrote: > > >> -Original Message- >> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com] >> Sent: Thursday, July 06, 2017 3:33 PM >> To: Bharat Bhushan ; Auger Eric >> ; eric.auger@gmail.com; >> peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com; >> qemu-...@nongnu.org; qemu-devel@nongnu.org >> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; >> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; >> robin.mur...@arm.com; christoffer.d...@linaro.org >> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device >> >> On 05/07/17 09:49, Bharat Bhushan wrote:>>> Also when setup msi-route >> kvm_irqchip_add_msi_route() we needed to >>>> provide the translated address. >>>>> According to my understanding this is required because kernel does >>>>> no go >>>> through viommu translation when generating interrupt, no? >>>> >>>> yes this is needed when KVM MSI routes are set up, ie. along with GICV3 >> ITS. >>>> With GICv2M, qemu direct gsi mapping is used and this is not needed. >>>> >>>> So I do not understand your previous sentence saying "MSI interrupts >>>> works without any change". >>> >>> I have almost completed vfio integration with virtio-iommu and now >>> testing the changes by assigning e1000 device to VM. For this I have >>> changed virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-msi >>> and this does not need changed in vfio_get_addr() and >>> kvm_irqchip_add_msi_route() >> >> I understand you're reserving region 0x0800-0x0810 as >> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only >> works because Qemu places the vgic in that area as well (in hw/arm/virt.c). >> It's not a coincidence if the addresses are the same, because Eric chose them >> for the Linux SMMU drivers and I copied them. >> >> We can't rely on that behavior, though, it will break MSIs in emulated >> devices. 
And if Qemu happens to move the MSI doorbell in future machine >> revisions, then it would also break VFIO. > > Yes, make sense to me > >> >> Just for my own understanding -- what happens, I think, is that in Linux >> iova_reserve_iommu_regions initially reserves the guest-physical doorbell >> 0x0800-0x0810. Then much later, when the device driver requests >> an MSI, the irqchip driver calls iommu_dma_map_msi_msg with the guest- >> physical gicv2m address 0x0802. The function finds the right page in >> msi_page_list, which was added by cookie_init_hw_msi_region, therefore >> bypassing the viommu and the GPA gets written in the MSI-X table. > > This means in case tomorrow when qemu changes virt machine address map and > vgic-its (its-translator register address) address range does not fall in the > msi_page_list then it will allocate a new iova, create mapping in iommu. So > this will no longer be identity mapped and fail to work with new qemu? > Yes that's correct. >> >> If an emulated device such as virtio-net-pci were to generate an MSI, then >> Qemu would attempt to access the doorbell written by Linux into the MSI-X >> table, 0x0802, and fault because that address wasn't mapped in the >> viommu. >> >> So for VFIO, you either need to translate the MSI-X entry using the viommu, >> or just assume that the vaddr corresponds to the only MSI doorbell >> accessible by this device (because how can we be certain that the guest >> already mapped the doorbell before writing the entry?) >> >> For ARM machines it's probably best to stick with IOMMU_RESV_SW_MSI. >> However, a nice way to use IOMMU_RESV_MSI would be for the virtio- >> iommu device to advertise identity-mapped/reserved regions, and bypass >> translation on these regions. Then the driver could reserve those with >> IOMMU_RESV_MSI. 
> > Correct me if I did not understood you correctly, today iommu-driver decides > msi-reserved region, what if we change this and virtio-iommu device will > provide the reserved msi region as per the emulated machine (virt/intel). So > virtio-iommu driver will use the address advertised by virtio-iommu device as > IOMMU_RESV_MSI. In this case msi-page-list will always have the reserved > region for MSI. > On qemu side, for emulated devices we will let virtio-iommu return same > address as translated address as it falls in MSI-reserved page already known > to it. I think what you're proposing
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
Hello Bharat, Jean-Philippe, On 06/07/2017 12:02, Jean-Philippe Brucker wrote: > On 05/07/17 09:49, Bharat Bhushan wrote:>>> Also when setup msi-route > kvm_irqchip_add_msi_route() we needed to >>> provide the translated address. According to my understanding this is required because kernel does no go >>> through viommu translation when generating interrupt, no? >>> >>> yes this is needed when KVM MSI routes are set up, ie. along with GICV3 ITS. >>> With GICv2M, qemu direct gsi mapping is used and this is not needed. >>> >>> So I do not understand your previous sentence saying "MSI interrupts works >>> without any change". >> >> I have almost completed vfio integration with virtio-iommu and now testing >> the changes by assigning e1000 device to VM. For this I have changed >> virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-msi and this does >> not need changed in vfio_get_addr() and kvm_irqchip_add_msi_route() > > I understand you're reserving region 0x0800-0x0810 as > IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only works > because Qemu places the vgic in that area as well (in hw/arm/virt.c). It's > not a coincidence if the addresses are the same, because Eric chose them > for the Linux SMMU drivers and I copied them. Yes I chose this region because it does not overlap with any guest RAM region > > We can't rely on that behavior, though, it will break MSIs in emulated > devices. And if Qemu happens to move the MSI doorbell in future machine > revisions, then it would also break VFIO. > > Just for my own understanding -- what happens, I think, is that in Linux > iova_reserve_iommu_regions initially reserves the guest-physical doorbell > 0x0800-0x0810. Then much later, when the device driver requests an > MSI, the irqchip driver calls iommu_dma_map_msi_msg with the > guest-physical gicv2m address 0x0802. 
The function finds the right > page in msi_page_list, which was added by cookie_init_hw_msi_region, > therefore bypassing the viommu and the GPA gets written in the MSI-X table. I share Jean's understanding. To me, using IOMMU_RESV_MSI in the virtio-iommu means this region is not translated by the IOMMU. As cookie_init_hw_msi_region() pre-allocates the msi_page array, iommu_dma_get_msi_page() does not do any IOMMU mapping. > > If an emulated device such as virtio-net-pci were to generate an MSI, then > Qemu would attempt to access the doorbell written by Linux into the MSI-X > table, 0x0802, and fault because that address wasn't mapped in the viommu. Yes, so I am confused: how can it work with a virtio-net-pci or passthrough'ed e1000e device using MSIs? > > So for VFIO, you either need to translate the MSI-X entry using the > viommu, For the vsmmuv3 I created a dedicated IOMMUNotifier to handle the fact that the MSI doorbell is translated and MSI routes need to be updated. This seems to work. > or just assume that the vaddr corresponds to the only MSI doorbell > accessible by this device (because how can we be certain that the guest > already mapped the doorbell before writing the entry?) > > For ARM machines it's probably best to stick with IOMMU_RESV_SW_MSI. > However, a nice way to use IOMMU_RESV_MSI would be for the virtio-iommu > device to advertise identity-mapped/reserved regions, and bypass > translation on these regions. Then the driver could reserve those with > IOMMU_RESV_MSI. At least we may need to configure the virtio-iommu to either bypass MSIs (x86) or translate MSIs (ARM)? > For x86 we will need such a system, with an added IRQ > remapping feature. Meaning this must live along with vIR, is that what you mean? Also on ARM this must live with vITS anyway. This is an orthogonal feature, right? Thanks, Eric > > Thanks, > Jean >
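Eric's point that an IOMMU_RESV_MSI region "is not translated by the IOMMU" amounts to a bypass check on the translation path: addresses inside the reserved window pass through identity-mapped, everything else goes through the page tables. A minimal sketch, assuming an illustrative window (the bounds and names below are not QEMU, virtio-iommu, or Linux identifiers):

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative MSI doorbell window, modeled loosely on the region the
 * thread discusses for the virt machine; not an API constant. */
#define MSI_RESV_START 0x08000000ULL
#define MSI_RESV_END   0x08100000ULL   /* exclusive */

static bool in_msi_window(uint64_t addr)
{
    return addr >= MSI_RESV_START && addr < MSI_RESV_END;
}

/* Translation with an identity-mapped MSI window: addresses inside the
 * window bypass the page tables; everything else needs a real lookup
 * (this sketch installs no mappings, so anything else faults). */
static int viommu_translate(uint64_t iova, uint64_t *out)
{
    if (in_msi_window(iova)) {
        *out = iova;            /* identity: GPA doorbell used as-is */
        return 0;
    }
    return -1;                  /* would fault, as for virtio-net-pci above */
}
```

This also makes Eric's confusion concrete: an emulated device whose doorbell address falls outside the bypass window would fault in the vIOMMU.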
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
On 06/07/17 12:24, Bharat Bhushan wrote: > > >> -Original Message- >> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com] >> Sent: Thursday, July 06, 2017 3:33 PM >> To: Bharat Bhushan ; Auger Eric >> ; eric.auger@gmail.com; >> peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com; >> qemu-...@nongnu.org; qemu-devel@nongnu.org >> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; >> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; >> robin.mur...@arm.com; christoffer.d...@linaro.org >> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device >> >> On 05/07/17 09:49, Bharat Bhushan wrote:>>> Also when setup msi-route >> kvm_irqchip_add_msi_route() we needed to >>>> provide the translated address. >>>>> According to my understanding this is required because kernel does >>>>> no go >>>> through viommu translation when generating interrupt, no? >>>> >>>> yes this is needed when KVM MSI routes are set up, ie. along with GICV3 >> ITS. >>>> With GICv2M, qemu direct gsi mapping is used and this is not needed. >>>> >>>> So I do not understand your previous sentence saying "MSI interrupts >>>> works without any change". >>> >>> I have almost completed vfio integration with virtio-iommu and now >>> testing the changes by assigning e1000 device to VM. For this I have >>> changed virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-msi >>> and this does not need changed in vfio_get_addr() and >>> kvm_irqchip_add_msi_route() >> >> I understand you're reserving region 0x0800-0x0810 as >> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only >> works because Qemu places the vgic in that area as well (in hw/arm/virt.c). >> It's not a coincidence if the addresses are the same, because Eric chose them >> for the Linux SMMU drivers and I copied them. >> >> We can't rely on that behavior, though, it will break MSIs in emulated >> devices. 
And if Qemu happens to move the MSI doorbell in future machine >> revisions, then it would also break VFIO. > > Yes, make sense to me > >> >> Just for my own understanding -- what happens, I think, is that in Linux >> iova_reserve_iommu_regions initially reserves the guest-physical doorbell >> 0x0800-0x0810. Then much later, when the device driver requests >> an MSI, the irqchip driver calls iommu_dma_map_msi_msg with the guest- >> physical gicv2m address 0x0802. The function finds the right page in >> msi_page_list, which was added by cookie_init_hw_msi_region, therefore >> bypassing the viommu and the GPA gets written in the MSI-X table. > > This means in case tomorrow when qemu changes virt machine address map and > vgic-its (its-translator register address) address range does not fall in the > msi_page_list then it will allocate a new iova, create mapping in iommu. So > this will no longer be identity mapped and fail to work with new qemu? Precisely >> >> If an emulated device such as virtio-net-pci were to generate an MSI, then >> Qemu would attempt to access the doorbell written by Linux into the MSI-X >> table, 0x0802, and fault because that address wasn't mapped in the >> viommu. >> >> So for VFIO, you either need to translate the MSI-X entry using the viommu, >> or just assume that the vaddr corresponds to the only MSI doorbell >> accessible by this device (because how can we be certain that the guest >> already mapped the doorbell before writing the entry?) >> >> For ARM machines it's probably best to stick with IOMMU_RESV_SW_MSI. >> However, a nice way to use IOMMU_RESV_MSI would be for the virtio- >> iommu device to advertise identity-mapped/reserved regions, and bypass >> translation on these regions. Then the driver could reserve those with >> IOMMU_RESV_MSI. 
> > Correct me if I did not understood you correctly, today iommu-driver decides > msi-reserved region, what if we change this and virtio-iommu device will > provide the reserved msi region as per the emulated machine (virt/intel). So > virtio-iommu driver will use the address advertised by virtio-iommu device as > IOMMU_RESV_MSI. In this case msi-page-list will always have the reserved > region for MSI. > On qemu side, for emulated devices we will let virtio-iommu return same > address as translated address as it falls in MSI-reserved page already known > to it. Yes that's it. For example on x86, the virtio-iommu device wi
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> -Original Message- > From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com] > Sent: Thursday, July 06, 2017 3:33 PM > To: Bharat Bhushan ; Auger Eric > ; eric.auger@gmail.com; > peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com; > qemu-...@nongnu.org; qemu-devel@nongnu.org > Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; > t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; > robin.mur...@arm.com; christoffer.d...@linaro.org > Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device > > On 05/07/17 09:49, Bharat Bhushan wrote:>>> Also when setup msi-route > kvm_irqchip_add_msi_route() we needed to > >> provide the translated address. > >>> According to my understanding this is required because kernel does > >>> no go > >> through viommu translation when generating interrupt, no? > >> > >> yes this is needed when KVM MSI routes are set up, ie. along with GICV3 > ITS. > >> With GICv2M, qemu direct gsi mapping is used and this is not needed. > >> > >> So I do not understand your previous sentence saying "MSI interrupts > >> works without any change". > > > > I have almost completed vfio integration with virtio-iommu and now > > testing the changes by assigning e1000 device to VM. For this I have > > changed virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-msi > > and this does not need changed in vfio_get_addr() and > > kvm_irqchip_add_msi_route() > > I understand you're reserving region 0x0800-0x0810 as > IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only > works because Qemu places the vgic in that area as well (in hw/arm/virt.c). > It's not a coincidence if the addresses are the same, because Eric chose them > for the Linux SMMU drivers and I copied them. > > We can't rely on that behavior, though, it will break MSIs in emulated > devices. And if Qemu happens to move the MSI doorbell in future machine > revisions, then it would also break VFIO. 
Yes, makes sense to me > > Just for my own understanding -- what happens, I think, is that in Linux > iova_reserve_iommu_regions initially reserves the guest-physical doorbell > 0x0800-0x0810. Then much later, when the device driver requests > an MSI, the irqchip driver calls iommu_dma_map_msi_msg with the guest- > physical gicv2m address 0x0802. The function finds the right page in > msi_page_list, which was added by cookie_init_hw_msi_region, therefore > bypassing the viommu and the GPA gets written in the MSI-X table. This means that if tomorrow QEMU changes the virt machine address map and the vgic-its (ITS translation register) address range no longer falls in the msi_page_list, then Linux will allocate a new IOVA and create a mapping in the IOMMU. So this will no longer be identity-mapped and will fail to work with the new QEMU? > > If an emulated device such as virtio-net-pci were to generate an MSI, then > Qemu would attempt to access the doorbell written by Linux into the MSI-X > table, 0x0802, and fault because that address wasn't mapped in the > viommu. > > So for VFIO, you either need to translate the MSI-X entry using the viommu, > or just assume that the vaddr corresponds to the only MSI doorbell > accessible by this device (because how can we be certain that the guest > already mapped the doorbell before writing the entry?) > > For ARM machines it's probably best to stick with IOMMU_RESV_SW_MSI. > However, a nice way to use IOMMU_RESV_MSI would be for the virtio- > iommu device to advertise identity-mapped/reserved regions, and bypass > translation on these regions. Then the driver could reserve those with > IOMMU_RESV_MSI.
In this case the msi-page-list will always have the reserved region for MSI. On the QEMU side, for emulated devices, we will let virtio-iommu return the same address as the translated address, as it falls in the MSI-reserved page already known to it. > For x86 we will need such a system, with an added IRQ > remapping feature. I do not understand x86 MSI interrupt generation, but if the above understanding is correct, then why do we need IRQ remapping for x86? Will the x86 machine emulated in QEMU provide a big address range for MSIs, and when actually generating an MSI need some extra processing (IRQ remapping) before actually generating the write transaction for the MSI interrupt? Thanks -Bharat > > Thanks, > Jean
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
On 05/07/17 09:49, Bharat Bhushan wrote: >>> Also when setting up an msi-route with kvm_irqchip_add_msi_route() we needed to >> provide the translated address. >>> According to my understanding this is required because the kernel does not go >> through viommu translation when generating the interrupt, no? >> >> yes this is needed when KVM MSI routes are set up, i.e. along with the GICv3 ITS. >> With GICv2M, qemu direct gsi mapping is used and this is not needed. >> >> So I do not understand your previous sentence saying "MSI interrupts works >> without any change". > > I have almost completed vfio integration with virtio-iommu and am now testing the changes by assigning an e1000 device to the VM. For this I have changed the virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-msi, and this does not need changes in vfio_get_addr() and kvm_irqchip_add_msi_route() I understand you're reserving region 0x0800-0x0810 as IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only works because Qemu places the vgic in that area as well (in hw/arm/virt.c). It's not a coincidence that the addresses are the same, because Eric chose them for the Linux SMMU drivers and I copied them. We can't rely on that behavior, though: it will break MSIs in emulated devices. And if Qemu happens to move the MSI doorbell in future machine revisions, then it would also break VFIO. Just for my own understanding -- what happens, I think, is that in Linux iova_reserve_iommu_regions initially reserves the guest-physical doorbell 0x0800-0x0810. Then much later, when the device driver requests an MSI, the irqchip driver calls iommu_dma_map_msi_msg with the guest-physical gicv2m address 0x0802. The function finds the right page in msi_page_list, which was added by cookie_init_hw_msi_region, therefore bypassing the viommu and the GPA gets written in the MSI-X table.
If an emulated device such as virtio-net-pci were to generate an MSI, then Qemu would attempt to access the doorbell written by Linux into the MSI-X table, 0x0802, and fault because that address wasn't mapped in the viommu. So for VFIO, you either need to translate the MSI-X entry using the viommu, or just assume that the vaddr corresponds to the only MSI doorbell accessible by this device (because how can we be certain that the guest already mapped the doorbell before writing the entry?) For ARM machines it's probably best to stick with IOMMU_RESV_SW_MSI. However, a nice way to use IOMMU_RESV_MSI would be for the virtio-iommu device to advertise identity-mapped/reserved regions, and bypass translation on these regions. Then the driver could reserve those with IOMMU_RESV_MSI. For x86 we will need such a system, with an added IRQ remapping feature. Thanks, Jean
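Jean's closing idea, a virtio-iommu device advertising identity-mapped/reserved regions that the driver then registers as IOMMU_RESV_MSI, could look roughly like the sketch below. The struct layout, flag value, and helper are purely hypothetical illustrations of the proposal, not the virtio-iommu specification or the Linux driver:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical descriptor a virtio-iommu device might expose to the
 * driver; the real advertisement mechanism may differ. */
struct resv_region {
    uint64_t start;
    uint64_t end;       /* inclusive */
    uint32_t type;      /* e.g. RESV_MSI below */
};

#define RESV_MSI 1u     /* illustrative flag, not a spec constant */

/* Driver-side helper: does a doorbell address fall inside a region the
 * device advertised as MSI-reserved (and thus bypassing translation)? */
static int msi_doorbell_reserved(const struct resv_region *r, size_t n,
                                 uint64_t doorbell)
{
    for (size_t i = 0; i < n; i++) {
        if (r[i].type == RESV_MSI &&
            doorbell >= r[i].start && doorbell <= r[i].end)
            return 1;
    }
    return 0;
}
```

The appeal of this direction is that the reserved region then matches whatever machine is emulated, instead of being hard-coded in the guest driver.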
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
On 05/07/17 08:25, Tian, Kevin wrote: >> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com] >> Sent: Tuesday, June 27, 2017 12:13 AM >> >> On 26/06/17 09:22, Auger Eric wrote: >>> Hi Jean-Philippe, >>> >>> On 19/06/2017 12:15, Jean-Philippe Brucker wrote: On 19/06/17 08:54, Bharat Bhushan wrote: > Hi Eric, > > I started added replay in virtio-iommu and came across how MSI >> interrupts with work with VFIO. > I understand that on intel this works differently but vsmmu will have >> same requirement. > kvm-msi-irq-route are added using the msi-address to be translated by >> viommu and not the final translated address. > While currently the irqfd framework does not know about emulated >> iommus (virtio-iommu, vsmmuv3/vintel-iommu). > So in my view we have following options: > - Programming with translated address when setting up kvm-msi-irq- >> route > - Route the interrupts via QEMU, which is bad from performance > - vhost-virtio-iommu may solve the problem in long term > > Is there any other better option I am missing? Since we're on the topic of MSIs... I'm currently trying to figure out how we'll handle MSIs in the nested translation mode, where the guest >> manages S1 page tables and the host doesn't know about GVA->GPA translation. >>> >>> I have a question about the "nested translation mode" terminology. Do >>> you mean in that case you use stage 1 + stage 2 of the physical IOMMU >>> (which the ARM spec normally advises or was meant for) or do you mean >>> stage 1 implemented in vIOMMU and stage 2 implemented in pIOMMU. At >> the >>> moment my understanding is for VFIO integration the pIOMMU uses a >> single >>> stage combining both the stage 1 and stage2 mappings but the host is not >>> aware of those 2 stages. >> >> Yes at the moment the VMM merges stage-1 (GVA->GPA) from the guest with >> its stage-2 mappings (GPA->HPA) and creates a stage-2 mapping (GVA->HPA) >> in the pIOMMU via VFIO_IOMMU_MAP_DMA. stage-1 is disabled in the >> pIOMMU. 
>> > > Curious whether you are describing current smmu status or general > vIOMMU status also applying to other vendors... This particular paragraph was about the non-SVM state of things. The rest was about stage-1 + stage-2 (what I call nested), which would indeed be required for SVM. I don't think SVM can work with software merging. Thanks, Jean > the usage what you described is about svm, while svm requires PASID. > At least PASID is tied to stage-1 on Intel VT-d. Only DMA w/o PASID > or nested translation from stage-1 will go through stage-2. Unless > ARM smmu has a completely different implementation, I'm not sure > how svm can be virtualized w/ stage-1 translation disabled. There > are multiple stage-1 page tables while only one stage-2 page table per > device. Could merging actually work here? > > The only case with merging happen today is for guest stage-2 usage > or so-called GIOVA usage. Guest programs GIOVA->GPA to vIOMMU > stage-2. Then vIOMMU invokes vfio map/unmap APIs to translate/ > merge to GIOVA->HPA to pIOMMU stage-2. Maybe what you > actually meant is this one? > > Thanks > Kevin >
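Kevin's argument that software merging breaks down with SVM can be made concrete: with multiple per-PASID stage-1 tables but only one stage-2 table per device, two PASIDs that map the same VA to different GPAs cannot be folded into a single merged table keyed by VA alone. An illustrative conflict check (the struct and function are assumptions for the sake of the example, not driver code):

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* One stage-1 entry belonging to a given PASID: va -> gpa. */
struct s1_entry {
    uint32_t pasid;
    uint64_t va;
    uint64_t gpa;
};

/* Software merging would need one combined table keyed by VA alone.
 * That only works if no two PASIDs translate the same VA differently. */
static bool mergeable(const struct s1_entry *e, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (e[i].va == e[j].va && e[i].gpa != e[j].gpa)
                return false;   /* same VA, conflicting translations */
    return true;
}
```

Since per-process address spaces routinely reuse the same VAs, such conflicts are the norm under SVM, which is why nested (stage-1 + stage-2) translation is needed instead of merging.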
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
On 05/07/17 08:14, Tian, Kevin wrote: >> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com] >> Sent: Monday, June 19, 2017 6:15 PM >> >> On 19/06/17 08:54, Bharat Bhushan wrote: >>> Hi Eric, >>> >>> I started adding replay in virtio-iommu and came across how MSI interrupts >> will work with VFIO. >>> I understand that on intel this works differently but vsmmu will have the same >> requirement. >>> kvm-msi-irq-routes are added using the msi-address to be translated by the >> viommu and not the final translated address. >>> While currently the irqfd framework does not know about emulated >> iommus (virtio-iommu, vsmmuv3/vintel-iommu). >>> So in my view we have the following options: >>> - Programming with the translated address when setting up the kvm-msi-irq-route >>> - Routing the interrupts via QEMU, which is bad for performance >>> - vhost-virtio-iommu may solve the problem in the long term >>> >>> Is there any other better option I am missing? >> >> Since we're on the topic of MSIs... I'm currently trying to figure out how >> we'll handle MSIs in the nested translation mode, where the guest manages >> S1 page tables and the host doesn't know about GVA->GPA translation. >> >> I'm also wondering about the benefits of having SW-mapped MSIs in the >> guest. It seems unavoidable for vSMMU since that's what a physical system >> would do. But in a paravirtualized solution there doesn't seem to be any >> compelling reason for having the guest map MSI doorbells. These addresses >> are never accessed directly, they are only used for setting up IRQ routing >> (at least on kvmtool). So here's what I'd like to have. Note that I >> haven't investigated the feasibility in Qemu yet, I don't know how it >> deals with MSIs. >> >> (1) Guest uses the guest-physical MSI doorbell when setting up MSIs.
This way the host wouldn't need to inspect IOMMU >> mappings when handling writes to PCI MSI-X tables. >> > > What do you mean by "fixed MSI doorbell"? The PCI MSI-X table is part of a > PCI MMIO BAR. Accessing it is just a memory virtualization issue (e.g. > trapped by KVM and then emulated in Qemu) on x86. It's not an IOMMU > problem. I guess you may mean the same thing but want to double confirm > here given the terminology confusion. Or do you mean the interrupt > triggered by the IOMMU itself? Yes, I didn't mean access to the MSI-X table, but how we interpret the address in the MSI message. In kvmtool I create MSI routes for VFIO devices when the guest accesses the MSI-X tables. And on ARM the table contains an IOVA that needs to be translated into a PA, so handling a write to an MSI-X entry might mean doing the IOVA->PA translation of the doorbell. On x86 the MSI address is 0xfee, whether there is an IOMMU or not. That's what I meant by fixed. And it is the IOMMU that performs IRQ remapping. On physical ARM systems, the SMMU doesn't treat any special address range as an "MSI window". For the SMMU, an MSI is simply a memory transaction. MSI addresses are arbitrary IOVAs that get translated into PAs by the SMMU. The SMMU doesn't perform any IRQ remapping, only address translation. This PA is a doorbell register in the irqchip, which performs IRQ remapping and triggers an interrupt. Therefore in an emulated ARM system, when the guest writes the MSI-X table, it writes an IOVA. In a strict emulation the MSI would have to first go through the vIOMMU, and then into the irqchip. I was wondering if with virtio-iommu we could skip the address translation and go to the MSI remapping component immediately, effectively implementing a "hardware MSI window".
My current take is that we should keep the current behavior, but I will try to sort out the different ways of implementing MSIs with virtio-iommu in the next specification draft. Thanks, Jean
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
Hi Eric, > -Original Message- > From: Auger Eric [mailto:eric.au...@redhat.com] > Sent: Wednesday, July 05, 2017 2:14 PM > To: Bharat Bhushan ; > eric.auger@gmail.com; peter.mayd...@linaro.org; > alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org; > qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com > Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; > t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; > robin.mur...@arm.com; christoffer.d...@linaro.org > Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device > > Hi Bharat, > > On 05/07/2017 10:23, Bharat Bhushan wrote: > > Hi Eric, > > > >> -Original Message- > >> From: Auger Eric [mailto:eric.au...@redhat.com] > >> Sent: Monday, June 26, 2017 1:25 PM > >> To: Bharat Bhushan ; > >> eric.auger@gmail.com; peter.mayd...@linaro.org; > >> alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org; > >> qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com > >> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; > >> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; > >> robin.mur...@arm.com; christoffer.d...@linaro.org > >> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device > >> > >> Hi Bharat, > >> > >> On 19/06/2017 09:54, Bharat Bhushan wrote: > >>> Hi Eric, > >>> > >>> I started added replay in virtio-iommu and came across how MSI > >>> interrupts > >> with work with VFIO. > >>> I understand that on intel this works differently but vsmmu will > >>> have same > >> requirement. > >>> kvm-msi-irq-route are added using the msi-address to be translated > >>> by > >> viommu and not the final translated address. > >>> While currently the irqfd framework does not know about emulated > >> iommus (virtio-iommu, vsmmuv3/vintel-iommu). 
> >>> So in my view we have following options: > >>> - Programming with translated address when setting up > >>> kvm-msi-irq-route > >>> - Route the interrupts via QEMU, which is bad from performance > >>> - vhost-virtio-iommu may solve the problem in long term > >> > >> Sorry for the delay. With regard to the vsmmuv3/vfio integration I > >> think we need to use the guest physical address otherwise the MSI > >> address will not be recognized as an MSI doorbell. > >> > >> Also the fact on ARM we map the MSI doorbell causes an assert in > >> vfio_get_vaddr() as the vITS doorbell is not a RAM region. We will > >> need to handle this specifically. > > > > Also when setup msi-route kvm_irqchip_add_msi_route() we needed to > provide the translated address. > > According to my understanding this is required because kernel does no go > through viommu translation when generating interrupt, no? > > yes this is needed when KVM MSI routes are set up, ie. along with GICV3 ITS. > With GICv2M, qemu direct gsi mapping is used and this is not needed. > > So I do not understand your previous sentence saying "MSI interrupts works > without any change". I have almost completed the VFIO integration with virtio-iommu and am now testing the changes by assigning an e1000 device to the VM. For this I have changed the virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-msi, and this does not require changes in vfio_get_vaddr() or kvm_irqchip_add_msi_route(). Thanks -Bharat > > Thanks > > Eric > > > > > Thanks > > -Bharat > > > >> > >> Besides I have not looked specifically at the virtio-iommu/vfio > >> integration yet. > >> > >> Thanks > >> > >> Eric > >>> > >>> Is there any other better option I am missing?
> >>> > >>> Thanks > >>> -Bharat
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
Hi Bharat, On 05/07/2017 10:23, Bharat Bhushan wrote: > Hi Eric, > >> -Original Message- >> From: Auger Eric [mailto:eric.au...@redhat.com] >> Sent: Monday, June 26, 2017 1:25 PM >> To: Bharat Bhushan ; >> eric.auger@gmail.com; peter.mayd...@linaro.org; >> alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org; >> qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com >> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; >> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; >> robin.mur...@arm.com; christoffer.d...@linaro.org >> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device >> >> Hi Bharat, >> >> On 19/06/2017 09:54, Bharat Bhushan wrote: >>> Hi Eric, >>> >>> I started added replay in virtio-iommu and came across how MSI interrupts >> with work with VFIO. >>> I understand that on intel this works differently but vsmmu will have same >> requirement. >>> kvm-msi-irq-route are added using the msi-address to be translated by >> viommu and not the final translated address. >>> While currently the irqfd framework does not know about emulated >> iommus (virtio-iommu, vsmmuv3/vintel-iommu). >>> So in my view we have following options: >>> - Programming with translated address when setting up >>> kvm-msi-irq-route >>> - Route the interrupts via QEMU, which is bad from performance >>> - vhost-virtio-iommu may solve the problem in long term >> >> Sorry for the delay. With regard to the vsmmuv3/vfio integration I think we >> need to use the guest physical address otherwise the MSI address will not be >> recognized as an MSI doorbell. >> >> Also the fact on ARM we map the MSI doorbell causes an assert in >> vfio_get_vaddr() as the vITS doorbell is not a RAM region. We will need to >> handle this specifically. > > Also when setup msi-route kvm_irqchip_add_msi_route() we needed to provide > the translated address. 
> According to my understanding this is required because kernel does no go > through viommu translation when generating interrupt, no? yes this is needed when KVM MSI routes are set up, ie. along with GICV3 ITS. With GICv2M, qemu direct gsi mapping is used and this is not needed. So I do not understand your previous sentence saying "MSI interrupts works without any change". Thanks Eric > > Thanks > -Bharat > >> >> Besides I have not looked specifically at the virtio-iommu/vfio integration >> yet. >> >> Thanks >> >> Eric >>> >>> Is there any other better option I am missing? >>> >>> Thanks >>> -Bharat >>> >>>> -Original Message- >>>> From: Auger Eric [mailto:eric.au...@redhat.com] >>>> Sent: Friday, June 09, 2017 5:24 PM >>>> To: Bharat Bhushan ; >>>> eric.auger@gmail.com; peter.mayd...@linaro.org; >>>> alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org; >>>> qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com >>>> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; >>>> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; >>>> robin.mur...@arm.com; christoffer.d...@linaro.org >>>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device >>>> >>>> Hi Bharat, >>>> >>>> On 09/06/2017 13:30, Bharat Bhushan wrote: >>>>> Hi Eric, >>>>> >>>>>> -Original Message- >>>>>> From: Auger Eric [mailto:eric.au...@redhat.com] >>>>>> Sent: Friday, June 09, 2017 12:14 PM >>>>>> To: Bharat Bhushan ; >>>>>> eric.auger@gmail.com; peter.mayd...@linaro.org; >>>>>> alex.william...@redhat.com; m...@redhat.com; qemu- >> a...@nongnu.org; >>>>>> qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com >>>>>> Cc: will.dea...@arm.com; robin.mur...@arm.com; >>>> kevin.t...@intel.com; >>>>>> marc.zyng...@arm.com; christoffer.d...@linaro.org; >>>>>> drjo...@redhat.com; w...@redhat.com; t...@semihalf.com >>>>>> Subject: Re: [RFC v2 0/8] VIRTIO-IOMMU device >>>>>> >>>>>> Hi Bharat, >>>>>> >>>>>> On 09/06/2017 08:16, Bharat Bhushan wrote: >>>>>>> Hi Eric, >>>>>>> 
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
Hi Eric, > -Original Message- > From: Auger Eric [mailto:eric.au...@redhat.com] > Sent: Monday, June 26, 2017 1:25 PM > To: Bharat Bhushan ; > eric.auger@gmail.com; peter.mayd...@linaro.org; > alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org; > qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com > Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; > t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; > robin.mur...@arm.com; christoffer.d...@linaro.org > Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device > > Hi Bharat, > > On 19/06/2017 09:54, Bharat Bhushan wrote: > > Hi Eric, > > > > I started added replay in virtio-iommu and came across how MSI interrupts > with work with VFIO. > > I understand that on intel this works differently but vsmmu will have same > requirement. > > kvm-msi-irq-route are added using the msi-address to be translated by > viommu and not the final translated address. > > While currently the irqfd framework does not know about emulated > iommus (virtio-iommu, vsmmuv3/vintel-iommu). > > So in my view we have following options: > > - Programming with translated address when setting up > > kvm-msi-irq-route > > - Route the interrupts via QEMU, which is bad from performance > > - vhost-virtio-iommu may solve the problem in long term > > Sorry for the delay. With regard to the vsmmuv3/vfio integration I think we > need to use the guest physical address otherwise the MSI address will not be > recognized as an MSI doorbell. > > Also the fact on ARM we map the MSI doorbell causes an assert in > vfio_get_vaddr() as the vITS doorbell is not a RAM region. We will need to > handle this specifically. Also when setup msi-route kvm_irqchip_add_msi_route() we needed to provide the translated address. According to my understanding this is required because kernel does no go through viommu translation when generating interrupt, no? 
Thanks -Bharat > > Besides I have not looked specifically at the virtio-iommu/vfio integration > yet. > > Thanks > > Eric > > > > Is there any other better option I am missing? > > > > Thanks > > -Bharat > > > >> -Original Message- > >> From: Auger Eric [mailto:eric.au...@redhat.com] > >> Sent: Friday, June 09, 2017 5:24 PM > >> To: Bharat Bhushan ; > >> eric.auger@gmail.com; peter.mayd...@linaro.org; > >> alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org; > >> qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com > >> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; > >> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; > >> robin.mur...@arm.com; christoffer.d...@linaro.org > >> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device > >> > >> Hi Bharat, > >> > >> On 09/06/2017 13:30, Bharat Bhushan wrote: > >>> Hi Eric, > >>> > >>>> -Original Message- > >>>> From: Auger Eric [mailto:eric.au...@redhat.com] > >>>> Sent: Friday, June 09, 2017 12:14 PM > >>>> To: Bharat Bhushan ; > >>>> eric.auger@gmail.com; peter.mayd...@linaro.org; > >>>> alex.william...@redhat.com; m...@redhat.com; qemu- > a...@nongnu.org; > >>>> qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com > >>>> Cc: will.dea...@arm.com; robin.mur...@arm.com; > >> kevin.t...@intel.com; > >>>> marc.zyng...@arm.com; christoffer.d...@linaro.org; > >>>> drjo...@redhat.com; w...@redhat.com; t...@semihalf.com > >>>> Subject: Re: [RFC v2 0/8] VIRTIO-IOMMU device > >>>> > >>>> Hi Bharat, > >>>> > >>>> On 09/06/2017 08:16, Bharat Bhushan wrote: > >>>>> Hi Eric, > >>>>> > >>>>>> -Original Message- > >>>>>> From: Eric Auger [mailto:eric.au...@redhat.com] > >>>>>> Sent: Wednesday, June 07, 2017 9:31 PM > >>>>>> To: eric.auger@gmail.com; eric.au...@redhat.com; > >>>>>> peter.mayd...@linaro.org; alex.william...@redhat.com; > >>>> m...@redhat.com; > >>>>>> qemu-...@nongnu.org; qemu-devel@nongnu.org; jean- > >>>>>> philippe.bruc...@arm.com > >>>>>> Cc: will.dea...@arm.com; 
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com] > Sent: Tuesday, June 27, 2017 12:13 AM > > On 26/06/17 09:22, Auger Eric wrote: > > Hi Jean-Philippe, > > > > On 19/06/2017 12:15, Jean-Philippe Brucker wrote: > >> On 19/06/17 08:54, Bharat Bhushan wrote: > >>> Hi Eric, > >>> > >>> I started added replay in virtio-iommu and came across how MSI > interrupts with work with VFIO. > >>> I understand that on intel this works differently but vsmmu will have > same requirement. > >>> kvm-msi-irq-route are added using the msi-address to be translated by > viommu and not the final translated address. > >>> While currently the irqfd framework does not know about emulated > iommus (virtio-iommu, vsmmuv3/vintel-iommu). > >>> So in my view we have following options: > >>> - Programming with translated address when setting up kvm-msi-irq- > route > >>> - Route the interrupts via QEMU, which is bad from performance > >>> - vhost-virtio-iommu may solve the problem in long term > >>> > >>> Is there any other better option I am missing? > >> > >> Since we're on the topic of MSIs... I'm currently trying to figure out how > >> we'll handle MSIs in the nested translation mode, where the guest > manages > >> S1 page tables and the host doesn't know about GVA->GPA translation. > > > > I have a question about the "nested translation mode" terminology. Do > > you mean in that case you use stage 1 + stage 2 of the physical IOMMU > > (which the ARM spec normally advises or was meant for) or do you mean > > stage 1 implemented in vIOMMU and stage 2 implemented in pIOMMU. At > the > > moment my understanding is for VFIO integration the pIOMMU uses a > single > > stage combining both the stage 1 and stage2 mappings but the host is not > > aware of those 2 stages. > > Yes at the moment the VMM merges stage-1 (GVA->GPA) from the guest with > its stage-2 mappings (GPA->HPA) and creates a stage-2 mapping (GVA->HPA) > in the pIOMMU via VFIO_IOMMU_MAP_DMA. 
stage-1 is disabled in the > pIOMMU. > Curious whether you are describing the current SMMU status or a general vIOMMU status that also applies to other vendors... The usage you described is about SVM, while SVM requires PASID. At least on Intel VT-d, PASID is tied to stage-1. Only DMA w/o PASID, or nested translation coming out of stage-1, will go through stage-2. Unless ARM SMMU has a completely different implementation, I'm not sure how SVM can be virtualized with stage-1 translation disabled. There are multiple stage-1 page tables while there is only one stage-2 page table per device. Could merging actually work here? The only case where merging happens today is the guest stage-2 usage, the so-called GIOVA usage. The guest programs GIOVA->GPA into vIOMMU stage-2. The vIOMMU then invokes the VFIO map/unmap APIs to translate/merge this into GIOVA->HPA in pIOMMU stage-2. Maybe what you actually meant is this one? Thanks Kevin
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
Hi Jean, > -Original Message- > From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com] > Sent: Monday, June 19, 2017 3:45 PM > To: Bharat Bhushan ; Auger Eric > ; eric.auger@gmail.com; > peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com; > qemu-...@nongnu.org; qemu-devel@nongnu.org > Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; > t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; > robin.mur...@arm.com; christoffer.d...@linaro.org > Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device > > On 19/06/17 08:54, Bharat Bhushan wrote: > > Hi Eric, > > > > I started added replay in virtio-iommu and came across how MSI interrupts > with work with VFIO. > > I understand that on intel this works differently but vsmmu will have same > requirement. > > kvm-msi-irq-route are added using the msi-address to be translated by > viommu and not the final translated address. > > While currently the irqfd framework does not know about emulated > iommus (virtio-iommu, vsmmuv3/vintel-iommu). > > So in my view we have following options: > > - Programming with translated address when setting up > > kvm-msi-irq-route > > - Route the interrupts via QEMU, which is bad from performance > > - vhost-virtio-iommu may solve the problem in long term > > > > Is there any other better option I am missing? > > Since we're on the topic of MSIs... I'm currently trying to figure out how > we'll > handle MSIs in the nested translation mode, where the guest manages > S1 page tables and the host doesn't know about GVA->GPA translation. > > I'm also wondering about the benefits of having SW-mapped MSIs in the > guest. It seems unavoidable for vSMMU since that's what a physical system > would do. But in a paravirtualized solution there doesn't seem to be any > compelling reason for having the guest map MSI doorbells. These addresses > are never accessed directly, they are only used for setting up IRQ routing (at > least on kvmtool). 
So here's what I'd like to have. Note that I haven't > investigated the feasibility in Qemu yet, I don't know how it deals with MSIs. > > (1) Guest uses the guest-physical MSI doorbell when setting up MSIs. For > ARM with GICv3 this would be GITS_TRANSLATER, for x86 it would be the > fixed MSI doorbell. This way the host wouldn't need to inspect IOMMU > mappings when handling writes to PCI MSI-X tables. Sorry for late reply, does this mean that we can use IOMMU_RESV_MSI for virtio-iommu driver? This will not create mapping in IOMMU? I tried this PCI pass-through using QEMU (integrated vfio with virtio-iommu) and MSI interrupts works without any change. Thanks -Bharat > > (2) In nested mode (with VFIO) on ARM, the pSMMU will still translate MSIs > via S1+S2. Therefore the host needs to map MSIs at stage-1, and I'd like to > use the (currently unused) TTB1 tables in that case. In addition, using > TTB1 would be useful for SVM, when endpoints write MSIs with PASIDs and > we don't want to map them in user address space. > > This means that the host needs to use different doorbell addresses in nested > mode, since it would be unable to map at S1 the same IOVA as S2 > (TTB1 manages negative addresses - 0x, which are not > representable as GPAs.) It also requires to use 32-bit page tables for > endpoints that are not capable of using 64-bit MSI addresses. > > Now (2) is entirely handled in the host kernel, so it's more a Linux question. > But does (1) seem acceptable for virtio-iommu in Qemu? > > Thanks, > Jean
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com] > Sent: Monday, June 19, 2017 6:15 PM > > On 19/06/17 08:54, Bharat Bhushan wrote: > > Hi Eric, > > > > I started added replay in virtio-iommu and came across how MSI interrupts > with work with VFIO. > > I understand that on intel this works differently but vsmmu will have same > requirement. > > kvm-msi-irq-route are added using the msi-address to be translated by > viommu and not the final translated address. > > While currently the irqfd framework does not know about emulated > iommus (virtio-iommu, vsmmuv3/vintel-iommu). > > So in my view we have following options: > > - Programming with translated address when setting up kvm-msi-irq-route > > - Route the interrupts via QEMU, which is bad from performance > > - vhost-virtio-iommu may solve the problem in long term > > > > Is there any other better option I am missing? > > Since we're on the topic of MSIs... I'm currently trying to figure out how > we'll handle MSIs in the nested translation mode, where the guest manages > S1 page tables and the host doesn't know about GVA->GPA translation. > > I'm also wondering about the benefits of having SW-mapped MSIs in the > guest. It seems unavoidable for vSMMU since that's what a physical system > would do. But in a paravirtualized solution there doesn't seem to be any > compelling reason for having the guest map MSI doorbells. These addresses > are never accessed directly, they are only used for setting up IRQ routing > (at least on kvmtool). So here's what I'd like to have. Note that I > haven't investigated the feasibility in Qemu yet, I don't know how it > deals with MSIs. > > (1) Guest uses the guest-physical MSI doorbell when setting up MSIs. For > ARM with GICv3 this would be GITS_TRANSLATER, for x86 it would be the > fixed MSI doorbell. This way the host wouldn't need to inspect IOMMU > mappings when handling writes to PCI MSI-X tables. > What do you mean by "fixed MSI doorbell"? 
The PCI MSI-X table is part of a PCI MMIO BAR. Accessing it is just a memory virtualization issue (e.g. trapped by KVM and then emulated in QEMU) on x86. It's not an IOMMU problem. I guess you may mean the same thing but want to double-confirm here given the terminology confusion. Or do you mean the interrupt triggered by the IOMMU itself? Thanks Kevin
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
Hi, On 27/06/2017 10:46, Will Deacon wrote: > Hi Eric, > > On Tue, Jun 27, 2017 at 08:38:48AM +0200, Auger Eric wrote: >> On 26/06/2017 18:13, Jean-Philippe Brucker wrote: >>> On 26/06/17 09:22, Auger Eric wrote: On 19/06/2017 12:15, Jean-Philippe Brucker wrote: > On 19/06/17 08:54, Bharat Bhushan wrote: >> I started added replay in virtio-iommu and came across how MSI >> interrupts with work with VFIO. >> I understand that on intel this works differently but vsmmu will have >> same requirement. >> kvm-msi-irq-route are added using the msi-address to be translated by >> viommu and not the final translated address. >> While currently the irqfd framework does not know about emulated iommus >> (virtio-iommu, vsmmuv3/vintel-iommu). >> So in my view we have following options: >> - Programming with translated address when setting up kvm-msi-irq-route >> - Route the interrupts via QEMU, which is bad from performance >> - vhost-virtio-iommu may solve the problem in long term >> >> Is there any other better option I am missing? > > Since we're on the topic of MSIs... I'm currently trying to figure out how > we'll handle MSIs in the nested translation mode, where the guest manages > S1 page tables and the host doesn't know about GVA->GPA translation. I have a question about the "nested translation mode" terminology. Do you mean in that case you use stage 1 + stage 2 of the physical IOMMU (which the ARM spec normally advises or was meant for) or do you mean stage 1 implemented in vIOMMU and stage 2 implemented in pIOMMU. At the moment my understanding is for VFIO integration the pIOMMU uses a single stage combining both the stage 1 and stage2 mappings but the host is not aware of those 2 stages. >>> >>> Yes at the moment the VMM merges stage-1 (GVA->GPA) from the guest with >>> its stage-2 mappings (GPA->HPA) and creates a stage-2 mapping (GVA->HPA) >>> in the pIOMMU via VFIO_IOMMU_MAP_DMA. stage-1 is disabled in the pIOMMU. 
>>> >>> What I mean by "nested mode" is stage 1 + stage 2 in the physical IOMMU. >>> I'm referring to the "Page Table Sharing" bit of the Future Work in the >>> initial RFC for virtio-iommu [1], and also PASID table binding [2] in the >>> case of vSMMU. In that mode, stage-1 page tables in the pIOMMU are managed >>> by the guest, and the VMM only maps GPA->HPA. >> >> OK I need to read that part more thoroughly. I was told in the past >> handling nested stages at pIOMMU was considered too complex and >> difficult to maintain. But definitively The SMMU architecture is devised >> for that. Michael asked why we did not use that already for vsmmu >> (nested stages are used on AMD IOMMU I think). > > Curious -- but what gave you that idea? I worry that something I might have > said wasn't clear or has been misunderstood. Lobby discussions I might not have correctly understood ;-) Anyway that's a new direction that I am happy to investigate then. Thanks Eric > > Will >
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
Hi Eric, On Tue, Jun 27, 2017 at 08:38:48AM +0200, Auger Eric wrote: > On 26/06/2017 18:13, Jean-Philippe Brucker wrote: > > On 26/06/17 09:22, Auger Eric wrote: > >> On 19/06/2017 12:15, Jean-Philippe Brucker wrote: > >>> On 19/06/17 08:54, Bharat Bhushan wrote: > I started added replay in virtio-iommu and came across how MSI > interrupts with work with VFIO. > I understand that on intel this works differently but vsmmu will have > same requirement. > kvm-msi-irq-route are added using the msi-address to be translated by > viommu and not the final translated address. > While currently the irqfd framework does not know about emulated iommus > (virtio-iommu, vsmmuv3/vintel-iommu). > So in my view we have following options: > - Programming with translated address when setting up kvm-msi-irq-route > - Route the interrupts via QEMU, which is bad from performance > - vhost-virtio-iommu may solve the problem in long term > > Is there any other better option I am missing? > >>> > >>> Since we're on the topic of MSIs... I'm currently trying to figure out how > >>> we'll handle MSIs in the nested translation mode, where the guest manages > >>> S1 page tables and the host doesn't know about GVA->GPA translation. > >> > >> I have a question about the "nested translation mode" terminology. Do > >> you mean in that case you use stage 1 + stage 2 of the physical IOMMU > >> (which the ARM spec normally advises or was meant for) or do you mean > >> stage 1 implemented in vIOMMU and stage 2 implemented in pIOMMU. At the > >> moment my understanding is for VFIO integration the pIOMMU uses a single > >> stage combining both the stage 1 and stage2 mappings but the host is not > >> aware of those 2 stages. > > > > Yes at the moment the VMM merges stage-1 (GVA->GPA) from the guest with > > its stage-2 mappings (GPA->HPA) and creates a stage-2 mapping (GVA->HPA) > > in the pIOMMU via VFIO_IOMMU_MAP_DMA. stage-1 is disabled in the pIOMMU. 
> > > > What I mean by "nested mode" is stage 1 + stage 2 in the physical IOMMU. > > I'm referring to the "Page Table Sharing" bit of the Future Work in the > > initial RFC for virtio-iommu [1], and also PASID table binding [2] in the > > case of vSMMU. In that mode, stage-1 page tables in the pIOMMU are managed > > by the guest, and the VMM only maps GPA->HPA. > > OK I need to read that part more thoroughly. I was told in the past > handling nested stages at pIOMMU was considered too complex and > difficult to maintain. But definitively The SMMU architecture is devised > for that. Michael asked why we did not use that already for vsmmu > (nested stages are used on AMD IOMMU I think). Curious -- but what gave you that idea? I worry that something I might have said wasn't clear or has been misunderstood. Will
Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
Hi Jean-Philippe, On 26/06/2017 18:13, Jean-Philippe Brucker wrote: > On 26/06/17 09:22, Auger Eric wrote: >> Hi Jean-Philippe, >> >> On 19/06/2017 12:15, Jean-Philippe Brucker wrote: >>> On 19/06/17 08:54, Bharat Bhushan wrote: Hi Eric, I started added replay in virtio-iommu and came across how MSI interrupts with work with VFIO. I understand that on intel this works differently but vsmmu will have same requirement. kvm-msi-irq-route are added using the msi-address to be translated by viommu and not the final translated address. While currently the irqfd framework does not know about emulated iommus (virtio-iommu, vsmmuv3/vintel-iommu). So in my view we have following options: - Programming with translated address when setting up kvm-msi-irq-route - Route the interrupts via QEMU, which is bad from performance - vhost-virtio-iommu may solve the problem in long term Is there any other better option I am missing? >>> >>> Since we're on the topic of MSIs... I'm currently trying to figure out how >>> we'll handle MSIs in the nested translation mode, where the guest manages >>> S1 page tables and the host doesn't know about GVA->GPA translation. >> >> I have a question about the "nested translation mode" terminology. Do >> you mean in that case you use stage 1 + stage 2 of the physical IOMMU >> (which the ARM spec normally advises or was meant for) or do you mean >> stage 1 implemented in vIOMMU and stage 2 implemented in pIOMMU. At the >> moment my understanding is for VFIO integration the pIOMMU uses a single >> stage combining both the stage 1 and stage2 mappings but the host is not >> aware of those 2 stages. > > Yes at the moment the VMM merges stage-1 (GVA->GPA) from the guest with > its stage-2 mappings (GPA->HPA) and creates a stage-2 mapping (GVA->HPA) > in the pIOMMU via VFIO_IOMMU_MAP_DMA. stage-1 is disabled in the pIOMMU. > > What I mean by "nested mode" is stage 1 + stage 2 in the physical IOMMU. 
> I'm referring to the "Page Table Sharing" bit of the Future Work in the > initial RFC for virtio-iommu [1], and also PASID table binding [2] in the > case of vSMMU. In that mode, stage-1 page tables in the pIOMMU are managed > by the guest, and the VMM only maps GPA->HPA. OK I need to read that part more thoroughly. I was told in the past handling nested stages at pIOMMU was considered too complex and difficult to maintain. But definitively The SMMU architecture is devised for that. Michael asked why we did not use that already for vsmmu (nested stages are used on AMD IOMMU I think). > > Since both s1 and s2 are then enabled in the pIOMMU, MSIs reaching the > pIOMMU will be translated at s1 then s2. To create nested translation for > MSIs, I see two solutions: > > A. The GPA of the doorbell that is exposed to the guest is mapped by the > VMM at S2. This mapping is GPA->(PA of the doorbell) with Dev-nGnRE memory > attributes. The guest creates a GVA->GPA mapping, then writes GVA in the > MSI-X tables. > - If the MSI-X table is emulated (as we currently do), VMM has to force > the host to rewrite the physical MSIX entry with the GVA. > - If the MSI-X table is mapped (see [3]), then the guest writes > the GVA into the physical MSI-X entry. (How does this work with lazy MSI > routing setup, that is based on trapping MSIX table?) > > B. The VMM exposes a fake doorbell. Hardware MSI vectors are programmed > upfront by the host. Since TTB0 is assigned to the guest, then host must > use TTB1 to create the GVA->GPA mapping. > > Solution B was my proposal (2) below, but I didn't take vSMMU into account > at the time. I think that for virtual SVM with the vSMMU, the VMM has to > hand the whole PASID table over to the guest. This is what Intel seems to > do [2]. Even if we emulated the PASID table instead of handing it over, we > wouldn't have a way to hide TTB1 from the guest. So with vSMMU we loose > control over TTB1 and (2) doesn't work. 
> > I don't really like A, but it might be the only way with vSMMU: > - Guest maps doorbell at S1, > - Guest writes the GVA in its virtual MSI-X tables, > - Host handles the GVA write and reprograms the hardware MSI-X tables, > - Device issues an MSI, which gets translated at S1+S2, then hits the > doorbell, > - VFIO handles the IRQ, which is forwarded to KVM via IRQFD, finds the > corresponding irqchip by GPA, then injects the MSI. I am about to experience A) with vsmmu/VFIO. Please let me few days before I answer accurately to this part. > >>> I'm also wondering about the benefits of having SW-mapped MSIs in the >>> guest. It seems unavoidable for vSMMU since that's what a physical system >>> would do. But in a paravirtualized solution there doesn't seem to be any >>> compelling reason for having the guest map MSI doorbells. >> >> If I understand correctly the virtio-iommu would not expose MSI reserved >> regions (saying it does not translates MSIs). In that case he VFIO
On 26/06/17 09:22, Auger Eric wrote:
> Hi Jean-Philippe,
>
> On 19/06/2017 12:15, Jean-Philippe Brucker wrote:
>> On 19/06/17 08:54, Bharat Bhushan wrote:
>>> Hi Eric,
>>>
>>> I started adding replay in virtio-iommu and came across how MSI
>>> interrupts will work with VFIO.
>>> I understand that on Intel this works differently, but vSMMU will have
>>> the same requirement.
>>> kvm-msi-irq-routes are added using the MSI address to be translated by
>>> the vIOMMU and not the final translated address,
>>> while currently the irqfd framework does not know about emulated IOMMUs
>>> (virtio-iommu, vSMMUv3/vIntel-iommu).
>>> So in my view we have the following options:
>>> - Program the translated address when setting up the kvm-msi-irq-route
>>> - Route the interrupts via QEMU, which is bad for performance
>>> - vhost-virtio-iommu may solve the problem in the long term
>>>
>>> Is there any other better option I am missing?
>>
>> Since we're on the topic of MSIs... I'm currently trying to figure out
>> how we'll handle MSIs in the nested translation mode, where the guest
>> manages S1 page tables and the host doesn't know about the GVA->GPA
>> translation.
>
> I have a question about the "nested translation mode" terminology. Do
> you mean in that case you use stage 1 + stage 2 of the physical IOMMU
> (which the ARM spec normally advises, or was meant for), or do you mean
> stage 1 implemented in the vIOMMU and stage 2 implemented in the pIOMMU?
> At the moment my understanding is that for VFIO integration the pIOMMU
> uses a single stage combining both the stage-1 and stage-2 mappings, but
> the host is not aware of those 2 stages.

Yes, at the moment the VMM merges stage-1 (GVA->GPA) from the guest with
its stage-2 mappings (GPA->HPA) and creates a stage-2 mapping (GVA->HPA)
in the pIOMMU via VFIO_IOMMU_MAP_DMA. Stage-1 is disabled in the pIOMMU.
What I mean by "nested mode" is stage 1 + stage 2 in the physical IOMMU.
I'm referring to the "Page Table Sharing" bit of the Future Work in the
initial RFC for virtio-iommu [1], and also PASID table binding [2] in the
case of vSMMU. In that mode, stage-1 page tables in the pIOMMU are managed
by the guest, and the VMM only maps GPA->HPA.

Since both s1 and s2 are then enabled in the pIOMMU, MSIs reaching the
pIOMMU will be translated at s1 then s2. To create nested translation for
MSIs, I see two solutions:

A. The GPA of the doorbell that is exposed to the guest is mapped by the
VMM at S2. This mapping is GPA->(PA of the doorbell) with Dev-nGnRE memory
attributes. The guest creates a GVA->GPA mapping, then writes the GVA in
the MSI-X tables.
- If the MSI-X table is emulated (as we currently do), the VMM has to
force the host to rewrite the physical MSI-X entry with the GVA.
- If the MSI-X table is mapped (see [3]), then the guest writes the GVA
into the physical MSI-X entry. (How does this work with lazy MSI routing
setup, which is based on trapping the MSI-X table?)

B. The VMM exposes a fake doorbell. Hardware MSI vectors are programmed
upfront by the host. Since TTB0 is assigned to the guest, the host must
use TTB1 to create the GVA->GPA mapping.

Solution B was my proposal (2) below, but I didn't take vSMMU into account
at the time. I think that for virtual SVM with the vSMMU, the VMM has to
hand the whole PASID table over to the guest. This is what Intel seems to
do [2]. Even if we emulated the PASID table instead of handing it over, we
wouldn't have a way to hide TTB1 from the guest. So with vSMMU we lose
control over TTB1 and (2) doesn't work.
I don't really like A, but it might be the only way with vSMMU:
- Guest maps doorbell at S1,
- Guest writes the GVA in its virtual MSI-X tables,
- Host handles the GVA write and reprograms the hardware MSI-X tables,
- Device issues an MSI, which gets translated at S1+S2, then hits the
doorbell,
- VFIO handles the IRQ, which is forwarded to KVM via IRQFD, finds the
corresponding irqchip by GPA, then injects the MSI.

>> I'm also wondering about the benefits of having SW-mapped MSIs in the
>> guest. It seems unavoidable for vSMMU since that's what a physical system
>> would do. But in a paravirtualized solution there doesn't seem to be any
>> compelling reason for having the guest map MSI doorbells.
>
> If I understand correctly the virtio-iommu would not expose MSI reserved
> regions (saying it does not translate MSIs). In that case the VFIO
> kernel code will not check irq_domain_check_msi_remap() but will
> check iommu_capable(bus, IOMMU_CAP_INTR_REMAP) instead. Would the
> virtio-iommu expose this capability? How would it isolate MSI
> transactions from different devices?

Yes, the virtio-iommu would expose IOMMU_CAP_INTR_REMAP to keep VFIO
happy. But the virtio-iommu device wouldn't do any MSI isolation. We have
software-mapped doorbells on ARM because MSI transactions are translated
by the SMMU before reaching the GIC, which then performs device isolation.
With virtio-iommu on ARM, the address translation stage seems unnecessary
if you already have
Hi Jean-Philippe,

On 19/06/2017 12:15, Jean-Philippe Brucker wrote:
> On 19/06/17 08:54, Bharat Bhushan wrote:
>> Hi Eric,
>>
>> I started adding replay in virtio-iommu and came across how MSI
>> interrupts will work with VFIO.
>> I understand that on Intel this works differently, but vSMMU will have
>> the same requirement.
>> kvm-msi-irq-routes are added using the MSI address to be translated by
>> the vIOMMU and not the final translated address,
>> while currently the irqfd framework does not know about emulated IOMMUs
>> (virtio-iommu, vSMMUv3/vIntel-iommu).
>> So in my view we have the following options:
>> - Program the translated address when setting up the kvm-msi-irq-route
>> - Route the interrupts via QEMU, which is bad for performance
>> - vhost-virtio-iommu may solve the problem in the long term
>>
>> Is there any other better option I am missing?
>
> Since we're on the topic of MSIs... I'm currently trying to figure out
> how we'll handle MSIs in the nested translation mode, where the guest
> manages S1 page tables and the host doesn't know about the GVA->GPA
> translation.

I have a question about the "nested translation mode" terminology. Do you
mean in that case you use stage 1 + stage 2 of the physical IOMMU (which
the ARM spec normally advises, or was meant for), or do you mean stage 1
implemented in the vIOMMU and stage 2 implemented in the pIOMMU? At the
moment my understanding is that for VFIO integration the pIOMMU uses a
single stage combining both the stage-1 and stage-2 mappings, but the host
is not aware of those 2 stages.

> I'm also wondering about the benefits of having SW-mapped MSIs in the
> guest. It seems unavoidable for vSMMU since that's what a physical system
> would do. But in a paravirtualized solution there doesn't seem to be any
> compelling reason for having the guest map MSI doorbells.

If I understand correctly the virtio-iommu would not expose MSI reserved
regions (saying it does not translate MSIs).
In that case the VFIO kernel code will not check
irq_domain_check_msi_remap() but will check
iommu_capable(bus, IOMMU_CAP_INTR_REMAP) instead. Would the virtio-iommu
expose this capability? How would it isolate MSI transactions from
different devices?

Thanks

Eric

> These addresses are never accessed directly, they are only used for
> setting up IRQ routing (at least on kvmtool). So here's what I'd like to
> have. Note that I haven't investigated the feasibility in Qemu yet, I
> don't know how it deals with MSIs.
>
> (1) Guest uses the guest-physical MSI doorbell when setting up MSIs. For
> ARM with GICv3 this would be GITS_TRANSLATER, for x86 it would be the
> fixed MSI doorbell. This way the host wouldn't need to inspect IOMMU
> mappings when handling writes to PCI MSI-X tables.
>
> (2) In nested mode (with VFIO) on ARM, the pSMMU will still translate
> MSIs via S1+S2. Therefore the host needs to map MSIs at stage-1, and I'd
> like to use the (currently unused) TTB1 tables in that case. In addition,
> using TTB1 would be useful for SVM, when endpoints write MSIs with PASIDs
> and we don't want to map them in user address space.
>
> This means that the host needs to use different doorbell addresses in
> nested mode, since it would be unable to map at S1 the same IOVA as S2
> (TTB1 manages negative addresses - 0x, which are not representable as
> GPAs.) It also requires using 32-bit page tables for endpoints that are
> not capable of using 64-bit MSI addresses.
>
> Now (2) is entirely handled in the host kernel, so it's more a Linux
> question. But does (1) seem acceptable for virtio-iommu in Qemu?
>
> Thanks,
> Jean
Hi Bharat,

On 19/06/2017 09:54, Bharat Bhushan wrote:
> Hi Eric,
>
> I started adding replay in virtio-iommu and came across how MSI
> interrupts will work with VFIO.
> I understand that on Intel this works differently, but vSMMU will have
> the same requirement.
> kvm-msi-irq-routes are added using the MSI address to be translated by
> the vIOMMU and not the final translated address,
> while currently the irqfd framework does not know about emulated IOMMUs
> (virtio-iommu, vSMMUv3/vIntel-iommu).
> So in my view we have the following options:
> - Program the translated address when setting up the kvm-msi-irq-route
> - Route the interrupts via QEMU, which is bad for performance
> - vhost-virtio-iommu may solve the problem in the long term

Sorry for the delay. With regard to the vSMMUv3/VFIO integration, I think
we need to use the guest physical address, otherwise the MSI address will
not be recognized as an MSI doorbell. Also, the fact that on ARM we map
the MSI doorbell causes an assert in vfio_get_vaddr(), as the vITS
doorbell is not a RAM region. We will need to handle this specifically.
Besides, I have not looked specifically at the virtio-iommu/VFIO
integration yet.

Thanks

Eric

>
> Is there any other better option I am missing?
> > Thanks > -Bharat > >> -Original Message- >> From: Auger Eric [mailto:eric.au...@redhat.com] >> Sent: Friday, June 09, 2017 5:24 PM >> To: Bharat Bhushan ; >> eric.auger@gmail.com; peter.mayd...@linaro.org; >> alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org; >> qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com >> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; >> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; >> robin.mur...@arm.com; christoffer.d...@linaro.org >> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device >> >> Hi Bharat, >> >> On 09/06/2017 13:30, Bharat Bhushan wrote: >>> Hi Eric, >>> >>>> -Original Message- >>>> From: Auger Eric [mailto:eric.au...@redhat.com] >>>> Sent: Friday, June 09, 2017 12:14 PM >>>> To: Bharat Bhushan ; >>>> eric.auger@gmail.com; peter.mayd...@linaro.org; >>>> alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org; >>>> qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com >>>> Cc: will.dea...@arm.com; robin.mur...@arm.com; >> kevin.t...@intel.com; >>>> marc.zyng...@arm.com; christoffer.d...@linaro.org; >>>> drjo...@redhat.com; w...@redhat.com; t...@semihalf.com >>>> Subject: Re: [RFC v2 0/8] VIRTIO-IOMMU device >>>> >>>> Hi Bharat, >>>> >>>> On 09/06/2017 08:16, Bharat Bhushan wrote: >>>>> Hi Eric, >>>>> >>>>>> -Original Message- >>>>>> From: Eric Auger [mailto:eric.au...@redhat.com] >>>>>> Sent: Wednesday, June 07, 2017 9:31 PM >>>>>> To: eric.auger@gmail.com; eric.au...@redhat.com; >>>>>> peter.mayd...@linaro.org; alex.william...@redhat.com; >>>> m...@redhat.com; >>>>>> qemu-...@nongnu.org; qemu-devel@nongnu.org; jean- >>>>>> philippe.bruc...@arm.com >>>>>> Cc: will.dea...@arm.com; robin.mur...@arm.com; >>>> kevin.t...@intel.com; >>>>>> marc.zyng...@arm.com; christoffer.d...@linaro.org; >>>>>> drjo...@redhat.com; w...@redhat.com; t...@semihalf.com; Bharat >>>> Bhushan >>>>>> >>>>>> Subject: [RFC v2 0/8] VIRTIO-IOMMU device >>>>>> >>>>>> This series implements 
the virtio-iommu device. This is a proof of >>>>>> concept based on the virtio-iommu specification written by >>>>>> Jean-Philippe >>>> Brucker [1]. >>>>>> This was tested with a guest using the virtio-iommu driver [2] and >>>>>> exposed with a virtio-net-pci using dma ops. >>>>>> >>>>>> The device gets instantiated using the "-device virtio-iommu-device" >>>>>> option. It currently works with ARM virt machine only as the >>>>>> machine must handle the dt binding between the virtio-mmio "iommu" >>>>>> node and the PCI host bridge node. ACPI booting is not yet supported. >>>>>> >>>>>> This should allow to start some benchmarking activities against >>>>>> pure emulated IOMMU (especially ARM SMMU). >>>>> >>>>> I am testing this on ARM64 and see below c
On 19/06/17 08:54, Bharat Bhushan wrote:
> Hi Eric,
>
> I started adding replay in virtio-iommu and came across how MSI
> interrupts will work with VFIO.
> I understand that on Intel this works differently, but vSMMU will have
> the same requirement.
> kvm-msi-irq-routes are added using the MSI address to be translated by
> the vIOMMU and not the final translated address,
> while currently the irqfd framework does not know about emulated IOMMUs
> (virtio-iommu, vSMMUv3/vIntel-iommu).
> So in my view we have the following options:
> - Program the translated address when setting up the kvm-msi-irq-route
> - Route the interrupts via QEMU, which is bad for performance
> - vhost-virtio-iommu may solve the problem in the long term
>
> Is there any other better option I am missing?

Since we're on the topic of MSIs... I'm currently trying to figure out how
we'll handle MSIs in the nested translation mode, where the guest manages
S1 page tables and the host doesn't know about the GVA->GPA translation.

I'm also wondering about the benefits of having SW-mapped MSIs in the
guest. It seems unavoidable for vSMMU since that's what a physical system
would do. But in a paravirtualized solution there doesn't seem to be any
compelling reason for having the guest map MSI doorbells. These addresses
are never accessed directly, they are only used for setting up IRQ routing
(at least on kvmtool). So here's what I'd like to have. Note that I
haven't investigated the feasibility in Qemu yet; I don't know how it
deals with MSIs.

(1) Guest uses the guest-physical MSI doorbell when setting up MSIs. For
ARM with GICv3 this would be GITS_TRANSLATER, for x86 it would be the
fixed MSI doorbell. This way the host wouldn't need to inspect IOMMU
mappings when handling writes to PCI MSI-X tables.

(2) In nested mode (with VFIO) on ARM, the pSMMU will still translate MSIs
via S1+S2. Therefore the host needs to map MSIs at stage-1, and I'd like
to use the (currently unused) TTB1 tables in that case.
In addition, using TTB1 would be useful for SVM, when endpoints write MSIs
with PASIDs and we don't want to map them in user address space.

This means that the host needs to use different doorbell addresses in
nested mode, since it would be unable to map at S1 the same IOVA as S2
(TTB1 manages negative addresses - 0x, which are not representable as
GPAs.) It also requires using 32-bit page tables for endpoints that are
not capable of using 64-bit MSI addresses.

Now (2) is entirely handled in the host kernel, so it's more a Linux
question. But does (1) seem acceptable for virtio-iommu in Qemu?

Thanks,
Jean
Hi Eric, I started added replay in virtio-iommu and came across how MSI interrupts with work with VFIO. I understand that on intel this works differently but vsmmu will have same requirement. kvm-msi-irq-route are added using the msi-address to be translated by viommu and not the final translated address. While currently the irqfd framework does not know about emulated iommus (virtio-iommu, vsmmuv3/vintel-iommu). So in my view we have following options: - Programming with translated address when setting up kvm-msi-irq-route - Route the interrupts via QEMU, which is bad from performance - vhost-virtio-iommu may solve the problem in long term Is there any other better option I am missing? Thanks -Bharat > -Original Message- > From: Auger Eric [mailto:eric.au...@redhat.com] > Sent: Friday, June 09, 2017 5:24 PM > To: Bharat Bhushan ; > eric.auger@gmail.com; peter.mayd...@linaro.org; > alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org; > qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com > Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; > t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; > robin.mur...@arm.com; christoffer.d...@linaro.org > Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device > > Hi Bharat, > > On 09/06/2017 13:30, Bharat Bhushan wrote: > > Hi Eric, > > > >> -Original Message- > >> From: Auger Eric [mailto:eric.au...@redhat.com] > >> Sent: Friday, June 09, 2017 12:14 PM > >> To: Bharat Bhushan ; > >> eric.auger@gmail.com; peter.mayd...@linaro.org; > >> alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org; > >> qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com > >> Cc: will.dea...@arm.com; robin.mur...@arm.com; > kevin.t...@intel.com; > >> marc.zyng...@arm.com; christoffer.d...@linaro.org; > >> drjo...@redhat.com; w...@redhat.com; t...@semihalf.com > >> Subject: Re: [RFC v2 0/8] VIRTIO-IOMMU device > >> > >> Hi Bharat, > >> > >> On 09/06/2017 08:16, Bharat Bhushan wrote: > >>> Hi Eric, > 
>>> > >>>> -Original Message- > >>>> From: Eric Auger [mailto:eric.au...@redhat.com] > >>>> Sent: Wednesday, June 07, 2017 9:31 PM > >>>> To: eric.auger@gmail.com; eric.au...@redhat.com; > >>>> peter.mayd...@linaro.org; alex.william...@redhat.com; > >> m...@redhat.com; > >>>> qemu-...@nongnu.org; qemu-devel@nongnu.org; jean- > >>>> philippe.bruc...@arm.com > >>>> Cc: will.dea...@arm.com; robin.mur...@arm.com; > >> kevin.t...@intel.com; > >>>> marc.zyng...@arm.com; christoffer.d...@linaro.org; > >>>> drjo...@redhat.com; w...@redhat.com; t...@semihalf.com; Bharat > >> Bhushan > >>>> > >>>> Subject: [RFC v2 0/8] VIRTIO-IOMMU device > >>>> > >>>> This series implements the virtio-iommu device. This is a proof of > >>>> concept based on the virtio-iommu specification written by > >>>> Jean-Philippe > >> Brucker [1]. > >>>> This was tested with a guest using the virtio-iommu driver [2] and > >>>> exposed with a virtio-net-pci using dma ops. > >>>> > >>>> The device gets instantiated using the "-device virtio-iommu-device" > >>>> option. It currently works with ARM virt machine only as the > >>>> machine must handle the dt binding between the virtio-mmio "iommu" > >>>> node and the PCI host bridge node. ACPI booting is not yet supported. > >>>> > >>>> This should allow to start some benchmarking activities against > >>>> pure emulated IOMMU (especially ARM SMMU). > >>> > >>> I am testing this on ARM64 and see below continuous error prints: > >>> > >>> virtio_iommu_translate sid=8 is not known!! > >>> virtio_iommu_translate sid=8 is not known!! > >>> virtio_iommu_translate sid=8 is not known!! > >>> virtio_iommu_translate sid=8 is not known!! > >>> virtio_iommu_translate sid=8 is not known!! > >>> virtio_iommu_translate sid=8 is not known!! > >>> virtio_iommu_translate sid=8 is not known!! > >>> virtio_iommu_translate sid=8 is not known!! > >>> virtio_iommu_translate sid=8 is not known!! > >>> virtio_iommu_translate sid=8 is not known!! 
> >>> > >>> > >>> Also in guest I do not see device-tree node with virtio-iommu. > >> do you mean
Hi, On 09/06/2017 13:30, Bharat Bhushan wrote: > Hi Eric, > >> -Original Message- >> From: Auger Eric [mailto:eric.au...@redhat.com] >> Sent: Friday, June 09, 2017 12:14 PM >> To: Bharat Bhushan ; >> eric.auger@gmail.com; peter.mayd...@linaro.org; >> alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org; >> qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com >> Cc: will.dea...@arm.com; robin.mur...@arm.com; kevin.t...@intel.com; >> marc.zyng...@arm.com; christoffer.d...@linaro.org; drjo...@redhat.com; >> w...@redhat.com; t...@semihalf.com >> Subject: Re: [RFC v2 0/8] VIRTIO-IOMMU device >> >> Hi Bharat, >> >> On 09/06/2017 08:16, Bharat Bhushan wrote: >>> Hi Eric, >>> -Original Message- From: Eric Auger [mailto:eric.au...@redhat.com] Sent: Wednesday, June 07, 2017 9:31 PM To: eric.auger@gmail.com; eric.au...@redhat.com; peter.mayd...@linaro.org; alex.william...@redhat.com; >> m...@redhat.com; qemu-...@nongnu.org; qemu-devel@nongnu.org; jean- philippe.bruc...@arm.com Cc: will.dea...@arm.com; robin.mur...@arm.com; >> kevin.t...@intel.com; marc.zyng...@arm.com; christoffer.d...@linaro.org; drjo...@redhat.com; w...@redhat.com; t...@semihalf.com; Bharat >> Bhushan Subject: [RFC v2 0/8] VIRTIO-IOMMU device This series implements the virtio-iommu device. This is a proof of concept based on the virtio-iommu specification written by Jean-Philippe >> Brucker [1]. This was tested with a guest using the virtio-iommu driver [2] and exposed with a virtio-net-pci using dma ops. The device gets instantiated using the "-device virtio-iommu-device" option. It currently works with ARM virt machine only as the machine must handle the dt binding between the virtio-mmio "iommu" node and the PCI host bridge node. ACPI booting is not yet supported. 
For those who may play with the device: this was tested with a
virtio-net-pci device using the following command:

-device virtio-net-pci,netdev=tap0,mac=,iommu_platform,disable-modern=off,disable-legacy=on \

I tried to run the guest using a virtio-blk-pci device with

-device virtio-blk-pci,scsi=off,drive=<>,iommu_platform=off,disable-modern=off,disable-legacy=on,werror=stop,rerror=stop \

and the guest does *not* boot, whereas it does without any IOMMU. However
I am not sure the issue is related to the actual virtual IOMMU device, as
I have the exact same issue with the vSMMUv3 emulated device (this was
originally reported by Tomasz). So the issue may come from the surrounding
infrastructure. To be further investigated ...

Thanks

Eric

>>>> This should allow to start some benchmarking activities against
>>>> pure emulated IOMMU (especially ARM SMMU).
>>>
>>> I am testing this on ARM64 and see below continuous error prints:
>>>
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>>
>>> Also in guest I do not see a device-tree node with virtio-iommu.
>> do you mean the virtio-mmio node with the #iommu-cells property?
>>
>> This one is created statically by the virt machine. I would be surprised
>> if it were not there. Are you using the virt >= virt-2.10 machine?
>> Machines before that do not support its instantiation.
>>
>> Please can you add a printf in hw/arm/virt.c create_virtio_mmio() at the
>> moment when this node is created. Also you can add a printf in
>> bind_virtio_iommu_device() to make sure the binding with the PCI host
>> bridge is added on machine init done.
>>
>> Also worth to check: CONFIG_VIRTIO_IOMMU=y on the guest side.
>
> It works on my side. The driver config was disabled and I was also using
> a guest kernel which did not have deferred probing. Now after fixing that
> it works on my side.
> I placed some prints to see that dma-map maps regions in virtio-iommu; it
> uses the emulated IOMMU.
>
> I will continue to add VFIO support now on this, and more testing !!
>
> Thanks
> -Bharat
>
>> Thanks
>>
>> Eric
>>
>>> I am using the qemu tree you mentioned below and the iommu-driver
>>> patches published by Jean-P.
>>> The Qemu command line has the additional "-device virtio-iommu-device".
>>> What am I missing?
>>>
>>> Thanks
>>> -Bharat
>>>
>>>> Best Regards
>>>>
>>>> Eric
>>>>
>>>> This series can be found at:
>>>> https://github.com/eauger/qemu/tree/virtio-iommu-rfcv2
>>>>
>>>> References:
>>>> [1] [RFC 0/3] virtio-iommu: a paravirtualized IOMMU
>>>> [2] [RFC PATCH linux] iommu: Add virtio-iommu driver
>>>> [3] [RFC PATCH kvmtool 00/15] Add virtio-iommu
>>>>
>>>> History:
>>
Hi Bharat, On 09/06/2017 13:30, Bharat Bhushan wrote: > Hi Eric, > >> -Original Message- >> From: Auger Eric [mailto:eric.au...@redhat.com] >> Sent: Friday, June 09, 2017 12:14 PM >> To: Bharat Bhushan ; >> eric.auger@gmail.com; peter.mayd...@linaro.org; >> alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org; >> qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com >> Cc: will.dea...@arm.com; robin.mur...@arm.com; kevin.t...@intel.com; >> marc.zyng...@arm.com; christoffer.d...@linaro.org; drjo...@redhat.com; >> w...@redhat.com; t...@semihalf.com >> Subject: Re: [RFC v2 0/8] VIRTIO-IOMMU device >> >> Hi Bharat, >> >> On 09/06/2017 08:16, Bharat Bhushan wrote: >>> Hi Eric, >>> -Original Message- From: Eric Auger [mailto:eric.au...@redhat.com] Sent: Wednesday, June 07, 2017 9:31 PM To: eric.auger@gmail.com; eric.au...@redhat.com; peter.mayd...@linaro.org; alex.william...@redhat.com; >> m...@redhat.com; qemu-...@nongnu.org; qemu-devel@nongnu.org; jean- philippe.bruc...@arm.com Cc: will.dea...@arm.com; robin.mur...@arm.com; >> kevin.t...@intel.com; marc.zyng...@arm.com; christoffer.d...@linaro.org; drjo...@redhat.com; w...@redhat.com; t...@semihalf.com; Bharat >> Bhushan Subject: [RFC v2 0/8] VIRTIO-IOMMU device This series implements the virtio-iommu device. This is a proof of concept based on the virtio-iommu specification written by Jean-Philippe >> Brucker [1]. This was tested with a guest using the virtio-iommu driver [2] and exposed with a virtio-net-pci using dma ops. The device gets instantiated using the "-device virtio-iommu-device" option. It currently works with ARM virt machine only as the machine must handle the dt binding between the virtio-mmio "iommu" node and the PCI host bridge node. ACPI booting is not yet supported. This should allow to start some benchmarking activities against pure emulated IOMMU (especially ARM SMMU). 
>>>
>>> I am testing this on ARM64 and see below continuous error prints:
>>>
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>>
>>> Also in guest I do not see a device-tree node with virtio-iommu.
>> do you mean the virtio-mmio node with the #iommu-cells property?
>>
>> This one is created statically by the virt machine. I would be surprised
>> if it were not there. Are you using the virt >= virt-2.10 machine?
>> Machines before that do not support its instantiation.
>>
>> Please can you add a printf in hw/arm/virt.c create_virtio_mmio() at the
>> moment when this node is created. Also you can add a printf in
>> bind_virtio_iommu_device() to make sure the binding with the PCI host
>> bridge is added on machine init done.
>>
>> Also worth to check: CONFIG_VIRTIO_IOMMU=y on the guest side.
>
> It works on my side.

Great.

> The driver config was disabled and also I was using a guest kernel which
> did not have deferred probing.

Yes, I did not mention in my cover letter that the guest I have been using
is based on Jean-Philippe's branch, featuring deferred IOMMU probing. I
have not tried yet with an upstream guest.

> Now after fixing it works on my side.
> I placed some prints to see that dma-map maps regions in virtio-iommu; it
> uses the emulated IOMMU.
>
> I will continue to add VFIO support now on this, and more testing !!

OK.
I will do the VFIO integration first on the vsmmuv3 device as I already prepared the VFIO replay and hopefully we will sync ;-) Thanks Eric > > Thanks > -Bharat > >> >> Thanks >> >> Eric >> >>> I am using qemu-tree you mentioned below and iommu-driver patches >> published by Jean-P. >>> Qemu command line have additional ""-device virtio-iommu-device". What >> I am missing ? >> >> >>> >>> Thanks >>> -Bharat >>> Best Regards Eric This series can be found at: https://github.com/eauger/qemu/tree/virtio-iommu-rfcv2 References: [1] [RFC 0/3] virtio-iommu: a paravirtualized IOMMU, [2] [RFC PATCH linux] iommu: Add virtio-iommu driver [3] [RFC PATCH kvmtool 00/15] Add virtio- iommu History: v1 -> v2: - fix redifinition of viommu_as typedef Eric Auger (8): update-linux-headers: import virtio_iommu.h linux-headers: Update for virtio-iommu virtio_iommu: add skeleton virtio-iommu: Decode the command payload virtio_iommu: Add the iommu regions virtio-iommu: Implement the translation and commands hw/arm/virt: Add 2.10 mac
Hi Eric, > -Original Message- > From: Auger Eric [mailto:eric.au...@redhat.com] > Sent: Friday, June 09, 2017 12:14 PM > To: Bharat Bhushan ; > eric.auger@gmail.com; peter.mayd...@linaro.org; > alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org; > qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com > Cc: will.dea...@arm.com; robin.mur...@arm.com; kevin.t...@intel.com; > marc.zyng...@arm.com; christoffer.d...@linaro.org; drjo...@redhat.com; > w...@redhat.com; t...@semihalf.com > Subject: Re: [RFC v2 0/8] VIRTIO-IOMMU device > > Hi Bharat, > > On 09/06/2017 08:16, Bharat Bhushan wrote: > > Hi Eric, > > > >> -Original Message- > >> From: Eric Auger [mailto:eric.au...@redhat.com] > >> Sent: Wednesday, June 07, 2017 9:31 PM > >> To: eric.auger@gmail.com; eric.au...@redhat.com; > >> peter.mayd...@linaro.org; alex.william...@redhat.com; > m...@redhat.com; > >> qemu-...@nongnu.org; qemu-devel@nongnu.org; jean- > >> philippe.bruc...@arm.com > >> Cc: will.dea...@arm.com; robin.mur...@arm.com; > kevin.t...@intel.com; > >> marc.zyng...@arm.com; christoffer.d...@linaro.org; > >> drjo...@redhat.com; w...@redhat.com; t...@semihalf.com; Bharat > Bhushan > >> > >> Subject: [RFC v2 0/8] VIRTIO-IOMMU device > >> > >> This series implements the virtio-iommu device. This is a proof of > >> concept based on the virtio-iommu specification written by Jean-Philippe > Brucker [1]. > >> This was tested with a guest using the virtio-iommu driver [2] and > >> exposed with a virtio-net-pci using dma ops. > >> > >> The device gets instantiated using the "-device virtio-iommu-device" > >> option. It currently works with ARM virt machine only as the machine > >> must handle the dt binding between the virtio-mmio "iommu" node and > >> the PCI host bridge node. ACPI booting is not yet supported. > >> > >> This should allow to start some benchmarking activities against pure > >> emulated IOMMU (especially ARM SMMU). 
> >
> > I am testing this on ARM64 and see below continuous error prints:
> >
> > virtio_iommu_translate sid=8 is not known!!
> > virtio_iommu_translate sid=8 is not known!!
> > virtio_iommu_translate sid=8 is not known!!
> > virtio_iommu_translate sid=8 is not known!!
> > virtio_iommu_translate sid=8 is not known!!
> > virtio_iommu_translate sid=8 is not known!!
> > virtio_iommu_translate sid=8 is not known!!
> > virtio_iommu_translate sid=8 is not known!!
> > virtio_iommu_translate sid=8 is not known!!
> > virtio_iommu_translate sid=8 is not known!!
> >
> > Also in guest I do not see a device-tree node with virtio-iommu.
> do you mean the virtio-mmio node with the #iommu-cells property?
>
> This one is created statically by the virt machine. I would be surprised
> if it were not there. Are you using the virt >= virt-2.10 machine?
> Machines before that do not support its instantiation.
>
> Please can you add a printf in hw/arm/virt.c create_virtio_mmio() at the
> moment when this node is created. Also you can add a printf in
> bind_virtio_iommu_device() to make sure the binding with the PCI host
> bridge is added on machine init done.
>
> Also worth to check: CONFIG_VIRTIO_IOMMU=y on the guest side.

It works on my side. The driver config was disabled and I was also using a
guest kernel which did not have deferred probing. Now after fixing that it
works on my side.
I placed some prints to see that dma-map maps regions in virtio-iommu; it
uses the emulated IOMMU.

I will continue to add VFIO support now on this, and more testing !!

Thanks
-Bharat

> > I am using the qemu tree you mentioned below and the iommu-driver
> > patches published by Jean-P.
> > The Qemu command line has the additional "-device virtio-iommu-device".
> > What am I missing?
> >
> > Thanks
> > -Bharat
> >
> >> Best Regards
> >>
> >> Eric
> >>
> >> This series can be found at:
> >> https://github.com/eauger/qemu/tree/virtio-iommu-rfcv2
> >>
> >> References:
> >> [1] [RFC 0/3] virtio-iommu: a paravirtualized IOMMU
> >> [2] [RFC PATCH linux] iommu: Add virtio-iommu driver
> >> [3] [RFC PATCH kvmtool 00/15] Add virtio-iommu
> >>
> >> History:
> >> v1 -> v2:
> >> - fix redefinition of viommu_as typedef
> >>
> >> Eric Auger (8):
> >>   update-linux-headers: import virtio_iommu.h
> >>   linux-headers: Update for virtio-iommu
> >>   virtio_iommu: add skeleton
> >>   virtio-iommu: Decode the command payload
> >>   virtio_iommu: Add the iommu regions
> >>   virtio-iommu: Implement the translation and commands
> >>   hw/arm/virt: Add 2.10 machine type
> >>   hw/arm/virt: Add virtio-iommu to the virt board
> >>
> >>  hw/arm/virt.c                                 | 116 -
> >>  hw/virtio/Makefile.objs                       |   1 +
> >>  hw/virtio/trace-events                        |  14 +
> >>  hw/virtio/virtio-iommu.c                      | 623 ++
> >>  include/hw/arm/virt.h                         |   5 +
> >>  include/hw/virtio/virtio-iommu.h              |  60 +++
> >>  include/standard-headers/linux/virtio_ids.h   |   1 +
> >>  include/standard-headers/linux/virtio_iommu.h | 142 ++
> >>  linux-headers/linux/virtio_iommu.h            |   1 +
> >>  scripts/update-linux-headers.sh               |   3 +
> >>  10 files changed, 957 insertions(+), 9 deletions(-)
> >>  create mode 100644 hw/virtio/virtio-iommu.c
> >>  create mode 100644 include/hw/virtio/virtio-iommu.h
> >>  create mode 100644 include/standard-headers/linux/virtio_iommu.h
> >>  create mode 100644 linux-headers/linux/virtio_iommu.h
> >>
> >> --
> >> 2.5.5
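For reference, the instantiation described in the cover letter ("-device virtio-iommu-device" on the virt machine) would look roughly like the sketch below. Only the device option and the 2.10 machine requirement come from this thread; the binary name, kernel and rootfs paths, and the remaining options are illustrative assumptions:

```shell
# Sketch only: paths and most options are placeholders.
# The guest kernel must be built with CONFIG_VIRTIO_IOMMU=y.
qemu-system-aarch64 \
    -M virt-2.10 -cpu cortex-a57 -m 1G -nographic \
    -kernel Image -append "console=ttyAMA0 root=/dev/vda" \
    -drive file=rootfs.img,if=virtio \
    -device virtio-iommu-device \
    -device virtio-net-pci,netdev=net0 -netdev user,id=net0
```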
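Eric's pointer to "the virtio-mmio node with the #iommu-cells property", and its binding to the PCI host bridge, has roughly the shape sketched below in the guest device tree. This fragment is an illustration only: the node names, addresses, interrupt specifier, and RID range are assumptions, not the actual output of the virt machine:

```dts
/* Illustrative fragment, not the virt machine's real dtb. */
viommu: virtio_mmio@a000000 {
        compatible = "virtio,mmio";
        reg = <0x0 0xa000000 0x0 0x200>;
        interrupts = <0 16 1>;
        #iommu-cells = <1>;     /* marks this node as an IOMMU provider */
};

pcie@10000000 {
        compatible = "pci-host-ecam-generic";
        /* ... */
        /* route all requester IDs (0x0-0xffff) through the virtio-iommu */
        iommu-map = <0x0 &viommu 0x0 0x10000>;
};
```

The printf suggestions in the thread (create_virtio_mmio() and bind_virtio_iommu_device()) are exactly where the virt machine would emit the #iommu-cells property and the iommu-map binding above.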