Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-16 Thread Tian, Kevin
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Friday, July 14, 2017 7:26 PM
> 
> On 14/07/17 08:20, Tian, Kevin wrote:
> >> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> >> Sent: Friday, July 7, 2017 11:15 PM
> >>
> >> On 07/07/17 07:21, Tian, Kevin wrote:
> >>> sorry I didn't quite get this part, and here is my understanding:
> >>>
> >>> Guest programs vIOMMU to map a gIOVA (used by MSI) to a GPA
> >>> of the doorbell register of the virtual irqchip. vIOMMU then
> >>> triggers VFIO map/unmap to update the physical IOMMU page
> >>> table for gIOVA -> HPA of the real doorbell of the physical irqchip
> >>
> >> At the moment (non-SVM), physical and virtual MSI doorbells are
> >> completely dissociated. VFIO itself maps the doorbell GPA->HPA during
> >> container initialization. The GPA, chosen arbitrarily by the host, is
> >> then removed from the guest GPA space.
> >
> > Got it. I also got some basic understanding from the link below. :-)
> >
> > https://www.linaro.org/blog/core-dump/kvm-pciemsi-passthrough-armarm64/
> >
> >>
> >> When the guest programs the vIOMMU to map a gIOVA to the virtual irqchip
> >> doorbell, I suppose Qemu will notice that the GPA doesn't correspond to
> >> RAM and will withhold sending a VFIO_IOMMU_MAP_DMA request.
> >>
> >> (For SVM I don't want to go into the details just now, but we will
> >> probably need a separate VFIO mechanism to update the physical MSI-X
> >> tables with whatever gIOVA the guest mapped in its private stage-1 page
> >> tables.)
> >
> > I guess there may be either a terminology difference or a hardware
> > difference here, since I noted you mentioned IOVA with stage-1
> > multiple times.
> >
> > For Intel VT-d:
> >
> > - stage-1 is only for VA translation, tagged with PASID
> > - stage-2 can be used for IOVA translation on bare metal or GPA/gIOVA
> > translation in virtualization, w/o PASID tagged
> 
> The terminology is indeed a bit confusing, and the hardware slightly
> different. For me IOVA is the address used as input of the pIOMMU, PA is
> the output address, and GPA only exists if there is stage-1 + stage-2. So
> I think what I meant by gIOVA above was VA in your description.

In the Linux kernel, IOVA specifically refers to a pseudo address space
remapped to PA (e.g. from pci_map), while VA means a real CPU virtual
address (so-called SVM). Either IOVA or VA can be the input to the pIOMMU,
depending on the usage. When running inside a VM, the input addresses
become gIOVA or GVA. What about following this convention here and in
future discussions, though I agree that conceptually IOVA can represent
any input of the pIOMMU? :-)
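
For illustration, this is how that convention shows up in the kernel's DMA
API (a minimal sketch, assuming kernel context):

#include <linux/dma-mapping.h>

/* 'buf' is a CPU virtual address; the returned dma_addr_t is the IOVA
 * that the endpoint emits on the bus, remapped to PA by the pIOMMU (or
 * by the direct mapping when no IOMMU is present). With SVM the device
 * would instead use the CPU VA directly. */
static dma_addr_t map_for_device(struct device *dev, void *buf, size_t len)
{
	return dma_map_single(dev, buf, len, DMA_TO_DEVICE);
}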

> 
> I understand your "stage-1" and "stage-2" are named "first-level" and
> "second level" in the VT-d spec?

yes, VT-d uses first/second level.

> 
> If I read the VT-d spec correctly, I think the main difference on ARM SMMU
> is that stage-2 always follows stage-1 translation, but either stage may
> be disabled (or both, for bypass mode). There is no mode like in VT-d,
> where non-PASID transactions go only through stage-2 and PASID
> transactions go only through stage-1. I believe this is (NESTE=0,
> T=000b/001b) in the Extended-Context-Entry.
> 
> Something equivalent in SMMU is disabling stage-2 and using the entry 0 in
> the PASID table for non-PASID traffic. In this mode, traffic that uses
> PASID#0 would be aborted. So using your terms, the SMMU can have VAs and
> IOVAs be translated by stage-1 and then, if enabled, be translated by
> stage-2 as well.
> 

Clear to me. Thanks for the explanation.

Kevin


Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-14 Thread Jean-Philippe Brucker
On 14/07/17 08:20, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
>> Sent: Friday, July 7, 2017 11:15 PM
>>
>> On 07/07/17 07:21, Tian, Kevin wrote:
>>> sorry I didn't quite get this part, and here is my understanding:
>>>
>>> Guest programs vIOMMU to map a gIOVA (used by MSI) to a GPA
>>> of the doorbell register of the virtual irqchip. vIOMMU then
>>> triggers VFIO map/unmap to update the physical IOMMU page
>>> table for gIOVA -> HPA of the real doorbell of the physical irqchip
>>
>> At the moment (non-SVM), physical and virtual MSI doorbells are completely
>> dissociated. VFIO itself maps the doorbell GPA->HPA during container
>> initialization. The GPA, chosen arbitrarily by the host, is then removed
>> from the guest GPA space.
> 
> Got it. I also got some basic understanding from the link below. :-)
> 
> https://www.linaro.org/blog/core-dump/kvm-pciemsi-passthrough-armarm64/
> 
>>
>> When the guest programs the vIOMMU to map a gIOVA to the virtual irqchip
>> doorbell, I suppose Qemu will notice that the GPA doesn't correspond to
>> RAM and will withhold sending a VFIO_IOMMU_MAP_DMA request.
>>
>> (For SVM I don't want to go into the details just now, but we will
>> probably need a separate VFIO mechanism to update the physical MSI-X
>> tables with whatever gIOVA the guest mapped in its private stage-1 page
>> tables.)
> 
> I guess there may be either a terminology difference or a hardware
> difference here, since I noted you mentioned IOVA with stage-1
> multiple times.
> 
> For Intel VT-d:
> 
> - stage-1 is only for VA translation, tagged with PASID
> - stage-2 can be used for IOVA translation on bare metal or GPA/gIOVA
> translation in virtualization, w/o PASID tagged

The terminology is indeed a bit confusing, and the hardware slightly
different. For me IOVA is the address used as input of the pIOMMU, PA is
the output address, and GPA only exists if there is stage-1 + stage-2. So
I think what I meant by gIOVA above was VA in your description.

I understand your "stage-1" and "stage-2" are named "first-level" and
"second level" in the VT-d spec?

If I read the VT-d spec correctly, I think the main difference on ARM SMMU
is that stage-2 always follows stage-1 translation, but either stage may
be disabled (or both, for bypass mode). There is no mode like in VT-d,
where non-PASID transactions go only through stage-2 and PASID
transactions go only through stage-1. I believe this is (NESTE=0,
T=000b/001b) in the Extended-Context-Entry.

Something equivalent in SMMU is disabling stage-2 and using the entry 0 in
the PASID table for non-PASID traffic. In this mode, traffic that uses
PASID#0 would be aborted. So using your terms, the SMMU can have VAs and
IOVAs be translated by stage-1 and then, if enabled, be translated by
stage-2 as well.
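
As a toy model of that flow (stage1_walk()/stage2_walk() are placeholders
for the real page table walks; a sketch, not SMMU driver code):

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t addr_t;

static addr_t stage1_walk(addr_t va)  { return va; }  /* placeholder walk */
static addr_t stage2_walk(addr_t ipa) { return ipa; } /* placeholder walk */

/* Stage-1, if enabled, maps VA/IOVA to IPA; stage-2, if enabled, then
 * maps IPA to PA. With both disabled the transaction bypasses the SMMU. */
static addr_t smmu_translate(addr_t input, bool s1_enabled, bool s2_enabled)
{
	addr_t ipa = s1_enabled ? stage1_walk(input) : input;

	return s2_enabled ? stage2_walk(ipa) : ipa;
}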

Thanks,
Jean

> Does ARM SMMU allow stage-1 to be used for both VA and IOVA? IIRC
> you said PASID#0 is reserved for traffic w/o PASID in some mail...
>>> (assume your irqchip will provide multiple doorbells so each
>>> device can have its own channel).
>>
>> In existing irqchips the doorbell is shared by endpoints, which are
>> differentiated by their device ID (generally the BDF). I'm not sure why
>> this matters here?
> 
> It doesn't matter now, given the device ID
> 
>>
>>> then once this update is
>>> done, later MSI interrupts from assigned device will go
>>> through physical IOMMU (gIOVA->HPA) then reach irqchip
>>> for irq remapping. vIOMMU is involved only in configuration
>>> path instead of actual interrupt path.
>>
>> Yes the vIOMMU is used to correlate the IOVA written by the guest in its
>> virtual MSI-X table with the MAP request received by the vIOMMU. That is
>> probably used to setup IRQFD routes with KVM. But the vIOMMU is not
>> involved further than that in MSIs.
>>
>>> If my understanding is correct, above will be the natural flow then
>>> why is additional virtio-iommu change required? :-)
>>
>> The change is not *required* for ARM systems, I only proposed removing the
>> doorbell address translation stage to make host implementation simpler
>> (and since virtio-iommu on x86 won't translate the doorbell anyway, we
>> have to add support for this to virtio-iommu). But for Qemu, since vSMMU
>> needs to implement the natural flow anyway, it might not be a lot of
>> effort to also do it for virtio-iommu. Other implementations (e.g.
>> kvmtool) might piggy-back on the x86 way and declare the irqchip doorbell
>> as untranslated.
>>
>> My proposal also breaks when confronted to virtual SVM in a physical ARM
>> system, where the guest owns stage-1 page tables and *has* to map the
>> doorbell if it wants MSIs to work, so you can disregard it :)
>>
> 
> It was a good learning experience. Thanks.
> 




Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-14 Thread Tian, Kevin
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Friday, July 7, 2017 11:15 PM
> 
> On 07/07/17 07:21, Tian, Kevin wrote:
> > sorry I didn't quite get this part, and here is my understanding:
> >
> > Guest programs vIOMMU to map a gIOVA (used by MSI) to a GPA
> > of the doorbell register of the virtual irqchip. vIOMMU then
> > triggers VFIO map/unmap to update the physical IOMMU page
> > table for gIOVA -> HPA of the real doorbell of the physical irqchip
> 
> At the moment (non-SVM), physical and virtual MSI doorbells are completely
> dissociated. VFIO itself maps the doorbell GPA->HPA during container
> initialization. The GPA, chosen arbitrarily by the host, is then removed
> from the guest GPA space.

Got it. I also got some basic understanding from the link below. :-)

https://www.linaro.org/blog/core-dump/kvm-pciemsi-passthrough-armarm64/

> 
> When the guest programs the vIOMMU to map a gIOVA to the virtual irqchip
> doorbell, I suppose Qemu will notice that the GPA doesn't correspond to
> RAM and will withhold sending a VFIO_IOMMU_MAP_DMA request.
> 
> (For SVM I don't want to go into the details just now, but we will
> probably need a separate VFIO mechanism to update the physical MSI-X
> tables with whatever gIOVA the guest mapped in its private stage-1 page
> tables.)

I guess there may be either a terminology difference or a hardware
difference here, since I noted you mentioned IOVA with stage-1
multiple times.

For Intel VT-d:

- stage-1 is only for VA translation, tagged with PASID
- stage-2 can be used for IOVA translation on bare metal or GPA/gIOVA
translation in virtualization, w/o PASID tagged

Does ARM SMMU allow stage-1 to be used for both VA and IOVA? IIRC
you said PASID#0 is reserved for traffic w/o PASID in some mail...

> 
> > (assume your irqchip will provide multiple doorbells so each
> > device can have its own channel).
> 
> In existing irqchips the doorbell is shared by endpoints, which are
> differentiated by their device ID (generally the BDF). I'm not sure why
> this matters here?

It doesn't matter now, given the device ID

> 
> > then once this update is
> > done, later MSI interrupts from assigned device will go
> > through physical IOMMU (gIOVA->HPA) then reach irqchip
> > for irq remapping. vIOMMU is involved only in configuration
> > path instead of actual interrupt path.
> 
> Yes the vIOMMU is used to correlate the IOVA written by the guest in its
> virtual MSI-X table with the MAP request received by the vIOMMU. That is
> probably used to setup IRQFD routes with KVM. But the vIOMMU is not
> involved further than that in MSIs.
> 
> > If my understanding is correct, above will be the natural flow then
> > why is additional virtio-iommu change required? :-)
> 
> The change is not *required* for ARM systems, I only proposed removing the
> doorbell address translation stage to make host implementation simpler
> (and since virtio-iommu on x86 won't translate the doorbell anyway, we
> have to add support for this to virtio-iommu). But for Qemu, since vSMMU
> needs to implement the natural flow anyway, it might not be a lot of
> effort to also do it for virtio-iommu. Other implementations (e.g.
> kvmtool) might piggy-back on the x86 way and declare the irqchip doorbell
> as untranslated.
> 
> My proposal also breaks when confronted to virtual SVM in a physical ARM
> system, where the guest owns stage-1 page tables and *has* to map the
> doorbell if it wants MSIs to work, so you can disregard it :)
> 

It was a good learning experience. Thanks.


Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-12 Thread Bharat Bhushan


> -Original Message-
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Wednesday, July 12, 2017 4:28 PM
> To: Bharat Bhushan ; Auger Eric
> ; eric.auger@gmail.com;
> peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com;
> qemu-...@nongnu.org; qemu-devel@nongnu.org
> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
> robin.mur...@arm.com; christoffer.d...@linaro.org
> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> 
> On 12/07/17 11:27, Bharat Bhushan wrote:
> >
> >
> >> -Original Message-
> >> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> >> Sent: Wednesday, July 12, 2017 3:48 PM
> >> To: Bharat Bhushan ; Auger Eric
> >> ; eric.auger@gmail.com;
> >> peter.mayd...@linaro.org; alex.william...@redhat.com;
> m...@redhat.com;
> >> qemu-...@nongnu.org; qemu-devel@nongnu.org
> >> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
> >> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
> >> robin.mur...@arm.com; christoffer.d...@linaro.org
> >> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> >>
> >> On 12/07/17 04:50, Bharat Bhushan wrote:
> >> [...]
> >>>> The size of the virtio_iommu_req_probe structure is variable, and
> >>>> depends on what fields the device implements. So the device initially
> >>>> computes the size it needs to fill virtio_iommu_req_probe, describes
> >>>> it in probe_size, and the driver allocates that many bytes for
> >>>> virtio_iommu_req_probe.content[]
> >>>>
> >>>>>> * When the device offers VIRTIO_IOMMU_F_PROBE, the driver should
> >>>>>> send a VIRTIO_IOMMU_T_PROBE request for each new endpoint.
> >>>>>> * The driver allocates a device-writeable buffer of probe_size
> >>>>>> (plus
> >>>>>> framing) and sends it as a VIRTIO_IOMMU_T_PROBE request.
> >>>>>> * The device fills the buffer with various information.
> >>>>>>
> >>>>>> struct virtio_iommu_req_probe {
> >>>>>>/* device-readable */
> >>>>>>struct virtio_iommu_req_head head;
> >>>>>>le32 device;
> >>>>>>le32 flags;
> >>>>>>
> >>>>>>/* maybe also le32 content_size, but it must be equal to
> >>>>>>   probe_size */
> >>>>>
> >>>>> Can you please describe why we need to pass size of "probe_size" in
> >>>>> probe request?
> >>>>
> >>>> We don't. I don't think we should add this 'content_size' field
> >>>> unless there is a compelling reason to do so.
> >>>>
> >>>>>>
> >>>>>>/* device-writeable */
> >>>>>>u8 content[];
> >>>>>
> >>>>> I assume content_size above is the size of array "content[]" and
> >>>>> max value can be equal to probe_size advertised by device?
> >>>>
> >>>> probe_size is exactly the size of array content[]. The driver must
> >>>> allocate a buffer of this size (plus the space needed for head, device,
> >>>> flags and tail).
> >>>>
> >>>> Then the device is free to leave parts of content[] empty. Field
> >>>> 'type' 0 will
> >> be
> >>>> reserved and mark the end of the array.
> >>>>
> >>>>>>struct virtio_iommu_req_tail tail; };
> >>>>>>
> >>>>>> I'm still struggling with the content and layout of the probe
> >>>>>> request, and would appreciate any feedback. To be easily
> >>>>>> extended, I think it should contain a list of fields of variable size:
> >>>>>>
> >>>>>>|0   15|16   31|32   N|
> >>>>>>| type |length |  values  |
> >>>>>>
> >>>>>> 'length' might be made optional if it can be deduced from type, but
> >>>>>> might make driver-side parsing more robust.

Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-12 Thread Jean-Philippe Brucker
On 12/07/17 11:27, Bharat Bhushan wrote:
> 
> 
>> -Original Message-
>> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
>> Sent: Wednesday, July 12, 2017 3:48 PM
>> To: Bharat Bhushan ; Auger Eric
>> ; eric.auger@gmail.com;
>> peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com;
>> qemu-...@nongnu.org; qemu-devel@nongnu.org
>> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
>> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
>> robin.mur...@arm.com; christoffer.d...@linaro.org
>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
>>
>> On 12/07/17 04:50, Bharat Bhushan wrote:
>> [...]
>>>> The size of the virtio_iommu_req_probe structure is variable, and
>>>> depends on what fields the device implements. So the device initially
>>>> computes the size it needs to fill virtio_iommu_req_probe, describes it
>>>> in probe_size, and the driver allocates that many bytes for
>>>> virtio_iommu_req_probe.content[]
>>>>
>>>>>> * When the device offers VIRTIO_IOMMU_F_PROBE, the driver should
>>>>>> send a VIRTIO_IOMMU_T_PROBE request for each new endpoint.
>>>>>> * The driver allocates a device-writeable buffer of probe_size (plus
>>>>>> framing) and sends it as a VIRTIO_IOMMU_T_PROBE request.
>>>>>> * The device fills the buffer with various information.
>>>>>>
>>>>>> struct virtio_iommu_req_probe {
>>>>>>  /* device-readable */
>>>>>>  struct virtio_iommu_req_head head;
>>>>>>  le32 device;
>>>>>>  le32 flags;
>>>>>>
>>>>>>  /* maybe also le32 content_size, but it must be equal to
>>>>>> probe_size */
>>>>>
>>>>> Can you please describe why we need to pass size of "probe_size" in
>>>>> probe request?
>>>>
>>>> We don't. I don't think we should add this 'content_size' field unless
>>>> there is a compelling reason to do so.
>>>>
>>>>>>
>>>>>>  /* device-writeable */
>>>>>>  u8 content[];
>>>>>
>>>>> I assume content_size above is the size of array "content[]" and max
>>>>> value can be equal to probe_size advertised by device?
>>>>
>>>> probe_size is exactly the size of array content[]. The driver must 
>>>> allocate a
>>>> buffer of this size (plus the space needed for head, device, flags and 
>>>> tail).
>>>>
>>>> Then the device is free to leave parts of content[] empty. Field 'type' 0
>>>> will be reserved and mark the end of the array.
>>>>
>>>>>>  struct virtio_iommu_req_tail tail;
>>>>>> };
>>>>>>
>>>>>> I'm still struggling with the content and layout of the probe
>>>>>> request, and would appreciate any feedback. To be easily extended, I
>>>>>> think it should contain a list of fields of variable size:
>>>>>>
>>>>>>  |0   15|16   31|32   N|
>>>>>>  | type |length |  values  |
>>>>>>
>>>>>> 'length' might be made optional if it can be deduced from type, but
>>>>>> might make driver-side parsing more robust.
>>>>>>
>>>>>> The probe could either be done for each endpoint, or for each address
>>>>>> space. I much prefer endpoint because it is the smallest granularity.
>>>>>> The driver can then decide what endpoints to put together in the same
>>>>>> address space based on their individual capabilities. The
>>>>>> specification would describe how each endpoint property is combined
>>>>>> when endpoints are put in the same address space. For example, take
>>>>>> the minimum of all PASID size, the maximum of all page granularities,
>>>>>> combine doorbell addresses, etc.
>>>>>>
>>>>>> If we did the probe on address spaces instead, the driver would have
>>>>>> to re-send a probe request each time a new endpoint is attached to an
>>>>>> existing address space, to see if it is still capable of page table
>>>>>> handover or if the driver just combined a VFIO and an emulated
>>>>>> endpoint by accident.

Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-12 Thread Bharat Bhushan


> -Original Message-
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Wednesday, July 12, 2017 3:48 PM
> To: Bharat Bhushan ; Auger Eric
> ; eric.auger@gmail.com;
> peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com;
> qemu-...@nongnu.org; qemu-devel@nongnu.org
> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
> robin.mur...@arm.com; christoffer.d...@linaro.org
> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> 
> On 12/07/17 04:50, Bharat Bhushan wrote:
> [...]
> >> The size of the virtio_iommu_req_probe structure is variable, and
> >> depends on what fields the device implements. So the device initially
> >> computes the size it needs to fill virtio_iommu_req_probe, describes it
> >> in probe_size, and the driver allocates that many bytes for
> >> virtio_iommu_req_probe.content[]
> >>
> >>>> * When the device offers VIRTIO_IOMMU_F_PROBE, the driver should
> >>>> send a VIRTIO_IOMMU_T_PROBE request for each new endpoint.
> >>>> * The driver allocates a device-writeable buffer of probe_size (plus
> >>>> framing) and sends it as a VIRTIO_IOMMU_T_PROBE request.
> >>>> * The device fills the buffer with various information.
> >>>>
> >>>> struct virtio_iommu_req_probe {
> >>>>  /* device-readable */
> >>>>  struct virtio_iommu_req_head head;
> >>>>  le32 device;
> >>>>  le32 flags;
> >>>>
> >>>>  /* maybe also le32 content_size, but it must be equal to
> >>>> probe_size */
> >>>
> >>> Can you please describe why we need to pass size of "probe_size" in
> >>> probe request?
> >>
> >> We don't. I don't think we should add this 'content_size' field unless
> >> there is a compelling reason to do so.
> >>
> >>>>
> >>>>  /* device-writeable */
> >>>>  u8 content[];
> >>>
> >>> I assume content_size above is the size of array "content[]" and max
> >>> value can be equal to probe_size advertised by device?
> >>
> >> probe_size is exactly the size of array content[]. The driver must 
> >> allocate a
> >> buffer of this size (plus the space needed for head, device, flags and 
> >> tail).
> >>
> >> Then the device is free to leave parts of content[] empty. Field 'type' 0
> >> will be reserved and mark the end of the array.
> >>
> >>>>  struct virtio_iommu_req_tail tail;
> >>>> };
> >>>>
> >>>> I'm still struggling with the content and layout of the probe
> >>>> request, and would appreciate any feedback. To be easily extended, I
> >>>> think it should contain a list of fields of variable size:
> >>>>
> >>>>  |0   15|16   31|32   N|
> >>>>  | type |length |  values  |
> >>>>
> >>>> 'length' might be made optional if it can be deduced from type, but
> >>>> might make driver-side parsing more robust.
> >>>>
> >>>> The probe could either be done for each endpoint, or for each address
> >>>> space. I much prefer endpoint because it is the smallest granularity.
> >>>> The driver can then decide what endpoints to put together in the same
> >>>> address space based on their individual capabilities. The
> >>>> specification would describe how each endpoint property is combined
> >>>> when endpoints are put in the same address space. For example, take
> >>>> the minimum of all PASID size, the maximum of all page granularities,
> >>>> combine doorbell addresses, etc.
> >>>>
> >>>> If we did the probe on address spaces instead, the driver would have
> >>>> to re-send a probe request each time a new endpoint is attached to an
> >>>> existing address space, to see if it is still capable of page table
> >>>> handover or if the driver just combined a VFIO and an emulated
> >>>> endpoint by accident.
> >>>>
> >>>>  ***
> >>>>
> >>>> Using this framework, the device can declare doorbell regions by
> >>>> adding one or more RESV fields into the probe buffer:

Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-12 Thread Jean-Philippe Brucker
On 12/07/17 04:50, Bharat Bhushan wrote:
[...]
>> The size of the virtio_iommu_req_probe structure is variable, and depends
>> on what fields the device implements. So the device initially computes the
>> size it needs to fill virtio_iommu_req_probe, describes it in probe_size,
>> and the driver allocates that many bytes for virtio_iommu_req_probe.content[]
>>
 * When the device offers VIRTIO_IOMMU_F_PROBE, the driver should send a
 VIRTIO_IOMMU_T_PROBE request for each new endpoint.
 * The driver allocates a device-writeable buffer of probe_size (plus
 framing) and sends it as a VIRTIO_IOMMU_T_PROBE request.
 * The device fills the buffer with various information.

 struct virtio_iommu_req_probe {
/* device-readable */
struct virtio_iommu_req_head head;
le32 device;
le32 flags;

/* maybe also le32 content_size, but it must be equal to
   probe_size */
>>>
>>> Can you please describe why we need to pass size of "probe_size" in probe
>>> request?
>>
>> We don't. I don't think we should add this 'content_size' field unless
>> there is a compelling reason to do so.
>>

/* device-writeable */
u8 content[];
>>>
>>> I assume content_size above is the size of array "content[]" and max value
>>> can be equal to probe_size advertised by device?
>>
>> probe_size is exactly the size of array content[]. The driver must allocate a
>> buffer of this size (plus the space needed for head, device, flags and tail).
>>
>> Then the device is free to leave parts of content[] empty. Field 'type' 0 
>> will be
>> reserved and mark the end of the array.
>>
struct virtio_iommu_req_tail tail;
 };

 I'm still struggling with the content and layout of the probe
 request, and would appreciate any feedback. To be easily extended, I
 think it should contain a list of fields of variable size:

|0   15|16   31|32   N|
| type |length |  values  |

 'length' might be made optional if it can be deduced from type, but
 might make driver-side parsing more robust.

 The probe could either be done for each endpoint, or for each address
 space. I much prefer endpoint because it is the smallest granularity.
 The driver can then decide what endpoints to put together in the same
 address space based on their individual capabilities. The
 specification would describe how each endpoint property is combined
 when endpoints are put in the same address space. For example, take
 the minimum of all PASID size, the maximum of all page granularities,
 combine doorbell addresses, etc.

 If we did the probe on address spaces instead, the driver would have
 to re-send a probe request each time a new endpoint is attached to an
 existing address space, to see if it is still capable of page table
 handover or if the driver just combined a VFIO and an emulated
 endpoint by accident.

  ***

 Using this framework, the device can declare doorbell regions by
 adding one or more RESV fields into the probe buffer:

 /* 'type' */
 #define VIRTIO_IOMMU_PROBE_T_RESV  0x1

 /* 'values'. 'length' is sizeof(struct virtio_iommu_probe_resv) */
 struct virtio_iommu_probe_resv {
le64 gpa;
le64 size;

 #define VIRTIO_IOMMU_PROBE_RESV_MSI0x1
u8 type;
> 
> To be sure I am understanding it correctly, is this "type" in struct
> virtio_iommu_req_head?

No, virtio_iommu_req_head::type is the request type
(ATTACH/DETACH/MAP/UNMAP/PROBE).

Then virtio_iommu_probe_property::type is the property type (only RESV for
the moment).

And this is virtio_iommu_probe_resv::type, which is the type of the resv
region (MSI). I renamed it to 'subtype' below, but I think it still is
pretty confusing.


I did a number of changes to structures and naming when trying to
integrate it to the specification:

* Added 64 bytes of padding in virtio_iommu_req_probe, so that future
extensions can add fields in the device-readable part.
* renamed "RESV" to "RESV_MEM".
* The resv_mem property now looks like this:
  struct virtio_iommu_probe_resv_mem {
u8  subtype;
u8  padding[3];
le32flags;
le64addr;
le64size;
  };
* subtype for MSI doorbells is now VIRTIO_IOMMU_PROBE_RESV_MEM_T_BYPASS
(because transactions to this region bypass the IOMMU). 'flags' contain a
hint VIRTIO_IOMMU_PROBE_RESV_MEM_F_MSI, telling the driver that this
region is used for MSIs.
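
As a rough sketch, a driver could then classify the property like this
(the constant values below are assumptions, not settled):

#define VIRTIO_IOMMU_PROBE_RESV_MEM_T_BYPASS	1	 /* assumed value */
#define VIRTIO_IOMMU_PROBE_RESV_MEM_F_MSI	(1 << 0) /* assumed value */

static bool resv_mem_is_msi_doorbell(const struct virtio_iommu_probe_resv_mem *r)
{
	/* Transactions to a BYPASS region skip translation; the MSI flag
	 * hints that the region is an MSI doorbell. */
	return r->subtype == VIRTIO_IOMMU_PROBE_RESV_MEM_T_BYPASS &&
	       (le32_to_cpu(r->flags) & VIRTIO_IOMMU_PROBE_RESV_MEM_F_MSI);
}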

Here is an example of a probe request returning an MSI doorbell property.

 31   7  0
+-+
|   0|  type  | <- request type = PROBE (5)
+-+
| device  |
+---

Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-11 Thread Bharat Bhushan


> -Original Message-
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Tuesday, July 11, 2017 6:21 PM
> To: Bharat Bhushan ; Auger Eric
> ; eric.auger@gmail.com;
> peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com;
> qemu-...@nongnu.org; qemu-devel@nongnu.org
> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
> robin.mur...@arm.com; christoffer.d...@linaro.org
> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> 
> On 11/07/17 06:54, Bharat Bhushan wrote:
> > Hi Jean,
> >
> >> -Original Message-
> >> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> >> Sent: Friday, July 07, 2017 8:50 PM
> >> To: Bharat Bhushan ; Auger Eric
> >> ; eric.auger@gmail.com;
> >> peter.mayd...@linaro.org; alex.william...@redhat.com;
> m...@redhat.com;
> >> qemu-...@nongnu.org; qemu-devel@nongnu.org
> >> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
> >> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
> >> robin.mur...@arm.com; christoffer.d...@linaro.org
> >> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> >>
> >> On 07/07/17 12:36, Bharat Bhushan wrote:
> >>>>> In this proposal, QEMU reserves an iova-range for the guest (not host)
> >>>>> and the guest kernel will use this as an untranslated msi-iova
> >>>>> (IOMMU_RESV_MSI). While this does not change the host interface and it
> >>>>> will continue to use the host reserved mapping for actual interrupt
> >>>>> generation, no?
> >>>> But then userspace needs to provide IOMMU_RESV_MSI range to guest
> >>>> kernel, right? What would be the proposed manner?
> >>>
> >>> Just an opinion, we can define a feature
> >>> (VIRTIO_IOMMU_F_RES_MSI_RANGE) and provide this info via a command
> >>> (VIRTIO_IOMMU_T_MSI_RANGE). The guest iommu-driver will make this call
> >>> during initialization and store the value. This value will just
> >>> replace the MSI_IOVA_BASE and MSI_IOVA_LENGTH #defines. The rest will
> >>> remain the same in the virtio-iommu driver.
> >>
> >> Yes I had something similar in mind, although more generic since
> >> we'll need to get other bits of information from the device in future
> >> extensions (fault handling, page table formats and dynamic reserves
> >> of memory for SVM), and maybe also for finding out per-address-space
> >> page granularity (see my reply of patch 3/8). These are per-endpoint
> >> properties that cannot be advertised in the virtio config space.
> >>
> >>  ***
> >>
> >> So I propose to add a per-endpoint probing mechanism on the request
> >> queue:
> >
> > What is per-endpoint? Is it "per-pci/platform-device"?
> 
> Yes, it's a pci or platform device managed by the IOMMU. In the spec I'm
> now using the term "endpoint" to easily differentiate from the virtio-iommu
> device ("the device").
> 
> >> * The device advertises a new command VIRTIO_IOMMU_T_PROBE with
> >> feature bit VIRTIO_IOMMU_F_PROBE.
> >> * When this feature is advertised, the device sets the probe_size field
> >> in the config space.
> >
> > Probably I did not get how virtio-iommu device emulation decides value of
> "probe_size", can you share more info?
> 
> The size of the virtio_iommu_req_probe structure is variable, and depends
> on what fields the device implements. So the device initially computes the
> size it needs to fill virtio_iommu_req_probe, describes it in probe_size,
> and the driver allocates that many bytes for virtio_iommu_req_probe.content[]
> 
> >> * When the device offers VIRTIO_IOMMU_F_PROBE, the driver should send a
> >> VIRTIO_IOMMU_T_PROBE request for each new endpoint.
> >> * The driver allocates a device-writeable buffer of probe_size (plus
> >> framing) and sends it as a VIRTIO_IOMMU_T_PROBE request.
> >> * The device fills the buffer with various information.
> >>
> >> struct virtio_iommu_req_probe {
> >>/* device-readable */
> >>struct virtio_iommu_req_head head;
> >>le32 device;
> >>le32 flags;
> >>
> >>/* maybe also le32 content_size, but it must be equal to
> >>   probe_size */
> >
> Can you please describe why we need to pass size of "probe_size" in probe
> request?

Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-11 Thread Jean-Philippe Brucker
On 11/07/17 06:54, Bharat Bhushan wrote:
> Hi Jean,
> 
>> -Original Message-
>> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
>> Sent: Friday, July 07, 2017 8:50 PM
>> To: Bharat Bhushan ; Auger Eric
>> ; eric.auger@gmail.com;
>> peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com;
>> qemu-...@nongnu.org; qemu-devel@nongnu.org
>> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
>> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
>> robin.mur...@arm.com; christoffer.d...@linaro.org
>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
>>
>> On 07/07/17 12:36, Bharat Bhushan wrote:
>>>>> In this proposal, QEMU reserves an iova-range for the guest (not host)
>>>>> and the guest kernel will use this as an untranslated msi-iova
>>>>> (IOMMU_RESV_MSI). While this does not change the host interface and it
>>>>> will continue to use the host reserved mapping for actual interrupt
>>>>> generation, no?
>>>> But then userspace needs to provide IOMMU_RESV_MSI range to guest
>>>> kernel, right? What would be the proposed manner?
>>>
>>> Just an opinion, we can define a feature
>>> (VIRTIO_IOMMU_F_RES_MSI_RANGE) and provide this info via a command
>>> (VIRTIO_IOMMU_T_MSI_RANGE). The guest iommu-driver will make this call
>>> during initialization and store the value. This value will just replace
>>> the MSI_IOVA_BASE and MSI_IOVA_LENGTH #defines. The rest will remain the
>>> same in the virtio-iommu driver.
>>
>> Yes I had something similar in mind, although more generic since we'll
>> need to get other bits of information from the device in future extensions
>> (fault handling, page table formats and dynamic reserves of memory for
>> SVM), and maybe also for finding out per-address-space page granularity
>> (see my reply of patch 3/8). These are per-endpoint properties that cannot
>> be advertised in the virtio config space.
>>
>>  ***
>>
>> So I propose to add a per-endpoint probing mechanism on the request
>> queue:
> 
> What is per-endpoint? Is it "per-pci/platform-device"?

Yes, it's a pci or platform device managed by the IOMMU. In the spec I'm
now using the term "endpoint" to easily differentiate from the
virtio-iommu device ("the device").

>> * The device advertises a new command VIRTIO_IOMMU_T_PROBE with
>> feature
>> bit VIRTIO_IOMMU_F_PROBE.
>> * When this feature is advertised, the device sets the probe_size field in
>> the config space.
> 
> Probably I did not get how virtio-iommu device emulation decides value of 
> "probe_size", can you share more info?

The size of the virtio_iommu_req_probe structure is variable, and depends
on what fields the device implements. So the device initially computes the
size it needs to fill virtio_iommu_req_probe, describes it in probe_size,
and the driver allocates that many bytes for virtio_iommu_req_probe.content[]
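
On the device side, the computation could be as simple as this rough
sketch (one RESV property per reserved region; the names are illustrative):

#include <stddef.h>
#include <stdint.h>

struct probe_property_hdr {
	uint16_t type;   /* le16 on the wire */
	uint16_t length; /* le16 on the wire */
};

static size_t viommu_compute_probe_size(size_t n_resv, size_t resv_payload)
{
	/* one header plus payload per RESV property, and room for a
	 * terminating type-0 header */
	return n_resv * (sizeof(struct probe_property_hdr) + resv_payload)
	       + sizeof(struct probe_property_hdr);
}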

>> * When the device offers VIRTIO_IOMMU_F_PROBE, the driver should send a
>> VIRTIO_IOMMU_T_PROBE request for each new endpoint.
>> * The driver allocates a device-writeable buffer of probe_size (plus
>> framing) and sends it as a VIRTIO_IOMMU_T_PROBE request.
>> * The device fills the buffer with various information.
>>
>> struct virtio_iommu_req_probe {
>>  /* device-readable */
>>  struct virtio_iommu_req_head head;
>>  le32 device;
>>  le32 flags;
>>
>>  /* maybe also le32 content_size, but it must be equal to
>> probe_size */
> 
> Can you please describe why we need to pass size of "probe_size" in probe 
> request?

We don't. I don't think we should add this 'content_size' field unless
there is a compelling reason to do so.

>>
>>  /* device-writeable */
>>  u8 content[];
> 
> I assume content_size above is the size of array "content[]" and max value 
> can be equal to probe_size advertised by device?

probe_size is exactly the size of array content[]. The driver must
allocate a buffer of this size (plus the space needed for head, device,
flags and tail).

Then the device is free to leave parts of content[] empty. Field 'type' 0
will be reserved and mark the end of the array.

>>  struct virtio_iommu_req_tail tail;
>> };
>>
>> I'm still struggling with the content and layout of the probe request, and
>> would appreciate any feedback. To be easily extended, I think it should
>> contain a list of fields of variable size:
>>
>>  |0   15|16   31|32   N|
>>  | type |length |  values  |

Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-10 Thread Bharat Bhushan
Hi Jean,

> -Original Message-
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Friday, July 07, 2017 8:50 PM
> To: Bharat Bhushan ; Auger Eric
> ; eric.auger@gmail.com;
> peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com;
> qemu-...@nongnu.org; qemu-devel@nongnu.org
> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
> robin.mur...@arm.com; christoffer.d...@linaro.org
> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> 
> On 07/07/17 12:36, Bharat Bhushan wrote:
> >>> In this proposal, QEMU reserves an iova-range for the guest (not host)
> >>> and the guest kernel will use this as an untranslated msi-iova
> >>> (IOMMU_RESV_MSI). While this does not change the host interface and it
> >>> will continue to use the host reserved mapping for actual interrupt
> >>> generation, no?
> >> But then userspace needs to provide IOMMU_RESV_MSI range to guest
> >> kernel, right? What would be the proposed manner?
> >
> > Just an opinion, we can define a feature
> > (VIRTIO_IOMMU_F_RES_MSI_RANGE) and provide this info via a command
> > (VIRTIO_IOMMU_T_MSI_RANGE). The guest iommu-driver will make this call
> > during initialization and store the value. This value will just replace
> > the MSI_IOVA_BASE and MSI_IOVA_LENGTH #defines. The rest will remain the
> > same in the virtio-iommu driver.
> 
> Yes I had something similar in mind, although more generic since we'll
> need to get other bits of information from the device in future extensions
> (fault handling, page table formats and dynamic reserves of memory for
> SVM), and maybe also for finding out per-address-space page granularity
> (see my reply of patch 3/8). These are per-endpoint properties that cannot
> be advertised in the virtio config space.
> 
>  ***
> 
> So I propose to add a per-endpoint probing mechanism on the request
> queue:

What is per-endpoint? Is it "per-pci/platform-device"?

> 
> * The device advertises a new command VIRTIO_IOMMU_T_PROBE with
> feature
> bit VIRTIO_IOMMU_F_PROBE.
> * When this feature is advertised, the device sets the probe_size field in
> the config space.

Probably I did not get how virtio-iommu device emulation decides value of 
"probe_size", can you share more info?

> * When the device offers VIRTIO_IOMMU_F_PROBE, the driver should send a
> VIRTIO_IOMMU_T_PROBE request for each new endpoint.
> * The driver allocates a device-writeable buffer of probe_size (plus
> framing) and sends it as a VIRTIO_IOMMU_T_PROBE request.
> * The device fills the buffer with various information.
> 
> struct virtio_iommu_req_probe {
>   /* device-readable */
>   struct virtio_iommu_req_head head;
>   le32 device;
>   le32 flags;
> 
>   /* maybe also le32 content_size, but it must be equal to
>  probe_size */

Can you please describe why we need to pass size of "probe_size" in probe 
request?

> 
>   /* device-writeable */
>   u8 content[];

I assume content_size above is the size of array "content[]" and max value can 
be equal to probe_size advertised by device?

>   struct virtio_iommu_req_tail tail;
> };
> 
> I'm still struggling with the content and layout of the probe request, and
> would appreciate any feedback. To be easily extended, I think it should
> contain a list of fields of variable size:
> 
>   |0   15|16   31|32   N|
>   | type |length |  values  |
> 
> 'length' might be made optional if it can be deduced from type, but might
> make driver-side parsing more robust.
> 
> The probe could either be done for each endpoint, or for each address
> space. I much prefer endpoint because it is the smallest granularity. The
> driver can then decide what endpoints to put together in the same address
> space based on their individual capabilities. The specification would
> describe how each endpoint property is combined when endpoints are put
> in
> the same address space. For example, take the minimum of all PASID size,
> the maximum of all page granularities, combine doorbell addresses, etc.
> 
> If we did the probe on address spaces instead, the driver would have to
> re-send a probe request each time a new endpoint is attached to an
> existing address space, to see if it is still capable of page table
> handover or if the driver just combined a VFIO and an emulated endpoint by
> accident.
> 
>  ***
> 
> Using this framework, the device can declare doorbell regions by adding
> one or more RESV fields into the probe buffer:

Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-07 Thread Jean-Philippe Brucker
On 07/07/17 12:36, Bharat Bhushan wrote:
>>> In this proposal, QEMU reserves an iova-range for the guest (not host) and
>>> the guest kernel will use this as an untranslated msi-iova (IOMMU_RESV_MSI).
>>> While this does not change the host interface and it will continue to use
>>> the host reserved mapping for actual interrupt generation, no?
>> But then userspace needs to provide IOMMU_RESV_MSI range to guest
>> kernel, right? What would be the proposed manner?
> 
> Just an opinion, we can define a feature (VIRTIO_IOMMU_F_RES_MSI_RANGE) and
> provide this info via a command (VIRTIO_IOMMU_T_MSI_RANGE). The guest
> iommu-driver will make this call during initialization and store the value.
> This value will just replace the MSI_IOVA_BASE and MSI_IOVA_LENGTH #defines.
> The rest will remain the same in the virtio-iommu driver.

Yes I had something similar in mind, although more generic since we'll
need to get other bits of information from the device in future extensions
(fault handling, page table formats and dynamic reserves of memory for
SVM), and maybe also for finding out per-address-space page granularity
(see my reply of patch 3/8). These are per-endpoint properties that cannot
be advertised in the virtio config space.

 ***

So I propose to add a per-endpoint probing mechanism on the request queue:

* The device advertises a new command VIRTIO_IOMMU_T_PROBE with feature
bit VIRTIO_IOMMU_F_PROBE.
* When this feature is advertised, the device sets the probe_size field in
the config space.
* When the device offers VIRTIO_IOMMU_F_PROBE, the driver should send a
VIRTIO_IOMMU_T_PROBE request for each new endpoint.
* The driver allocates a device-writeable buffer of probe_size (plus
framing) and sends it as a VIRTIO_IOMMU_T_PROBE request.
* The device fills the buffer with various information.

struct virtio_iommu_req_probe {
/* device-readable */
struct virtio_iommu_req_head head;
le32 device;
le32 flags;

/* maybe also le32 content_size, but it must be equal to
   probe_size */

/* device-writeable */
u8 content[];
struct virtio_iommu_req_tail tail;
};
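
Driver-side, issuing the request could look roughly like this (a sketch
assuming the layout above and a little-endian host; since content[] is
variable-sized, the request is assembled as one flat buffer):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct virtio_iommu_req_head { uint8_t type; uint8_t reserved[3]; };
struct virtio_iommu_req_tail { uint8_t status; uint8_t reserved[3]; };

#define VIRTIO_IOMMU_T_PROBE 5 /* PROBE request type */

static uint8_t *viommu_alloc_probe_req(uint32_t endpoint, size_t probe_size,
				       size_t *out_size)
{
	size_t head = sizeof(struct virtio_iommu_req_head) + 2 * sizeof(uint32_t);
	uint8_t *buf;

	*out_size = head + probe_size + sizeof(struct virtio_iommu_req_tail);
	buf = calloc(1, *out_size);
	if (!buf)
		return NULL;
	/* head, device and flags are device-readable; content[] and the
	 * tail stay zeroed for the device to fill */
	((struct virtio_iommu_req_head *)buf)->type = VIRTIO_IOMMU_T_PROBE;
	memcpy(buf + sizeof(struct virtio_iommu_req_head), &endpoint,
	       sizeof(endpoint));
	return buf;
}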

I'm still struggling with the content and layout of the probe request, and
would appreciate any feedback. To be easily extended, I think it should
contain a list of fields of variable size:

|0   15|16   31|32   N|
| type |length |  values  |

'length' might be made optional if it can be deduced from type, but might
make driver-side parsing more robust.
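
For instance, parsing on the driver side could go something like this (a
sketch assuming a little-endian host, with type 0 terminating the list;
handle_property() is a hypothetical callback):

#include <stddef.h>
#include <stdint.h>

struct probe_property {
	uint16_t type;    /* le16; 0 marks the end of content[] */
	uint16_t length;  /* le16; size of value[] in bytes */
	uint8_t value[];
};

static void parse_probe_content(const uint8_t *content, size_t probe_size,
				void (*handle_property)(uint16_t type,
							const uint8_t *value,
							size_t length))
{
	size_t off = 0;

	while (off + sizeof(struct probe_property) <= probe_size) {
		const struct probe_property *prop =
			(const void *)(content + off);

		if (!prop->type)	/* end of the property list */
			break;
		if (off + sizeof(*prop) + prop->length > probe_size)
			break;		/* malformed property, stop parsing */
		handle_property(prop->type, prop->value, prop->length);
		off += sizeof(*prop) + prop->length;
	}
}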

The probe could either be done for each endpoint, or for each address
space. I much prefer endpoint because it is the smallest granularity. The
driver can then decide what endpoints to put together in the same address
space based on their individual capabilities. The specification would
describe how each endpoint property is combined when endpoints are put in
the same address space. For example, take the minimum of all PASID size,
the maximum of all page granularities, combine doorbell addresses, etc.
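
A sketch of those combination rules, applied when an endpoint joins an
address space (the field names are made up for illustration):

struct as_caps {
	unsigned int pasid_bits;     /* take the minimum */
	unsigned long pgsize_bitmap; /* keep the common granularities */
};

static void as_combine_endpoint(struct as_caps *as, const struct as_caps *ep)
{
	if (ep->pasid_bits < as->pasid_bits)
		as->pasid_bits = ep->pasid_bits;
	/* retain only page sizes usable by every endpoint in the space */
	as->pgsize_bitmap &= ep->pgsize_bitmap;
}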

If we did the probe on address spaces instead, the driver would have to
re-send a probe request each time a new endpoint is attached to an
existing address space, to see if it is still capable of page table
handover or if the driver just combined a VFIO and an emulated endpoint by
accident.

 ***

Using this framework, the device can declare doorbell regions by adding
one or more RESV fields into the probe buffer:

/* 'type' */
#define VIRTIO_IOMMU_PROBE_T_RESV   0x1

/* 'values'. 'length' is sizeof(struct virtio_iommu_probe_resv) */
struct virtio_iommu_probe_resv {
le64 gpa;
le64 size;

#define VIRTIO_IOMMU_PROBE_RESV_MSI 0x1
u8 type;
};

Such a region would be subject to the following rules:

* Driver should not use any IOVA declared as RESV_MSI in a mapping.
* Device should let any transaction matching a RESV_MSI region pass
through untranslated.
* If the device does not advertise any RESV region, then the driver should
assume that MSI doorbells, like any other GPA, must be mapped with an
arbitrary IOVA in order for the endpoint to access them.
* Given that the driver *should* perform a probe request if available, and
it *should* understand the VIRTIO_IOMMU_PROBE_T_RESV field, then this
field tells the guest how it should handle MSI doorbells, and whether it
should map the address via MAP requests or not.
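
On Linux, the driver could then feed such a property into the existing
reserved-region machinery, roughly as follows (iommu_alloc_resv_region()
and IOMMU_RESV_MSI are existing kernel interfaces; the glue is a sketch):

static void viommu_add_resv_msi(struct virtio_iommu_probe_resv *resv,
				struct list_head *resv_regions)
{
	int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
	struct iommu_resv_region *region;

	if (resv->type != VIRTIO_IOMMU_PROBE_RESV_MSI)
		return;

	region = iommu_alloc_resv_region(le64_to_cpu(resv->gpa),
					 le64_to_cpu(resv->size),
					 prot, IOMMU_RESV_MSI);
	if (region)
		list_add_tail(&region->list, resv_regions);
}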

Does this make sense and did I overlook something?

Thanks,
Jean



Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-07 Thread Jean-Philippe Brucker
On 06/07/17 22:11, Auger Eric wrote:
> Hello Bharat, Jean-Philippe,
> On 06/07/2017 12:02, Jean-Philippe Brucker wrote:
>> On 05/07/17 09:49, Bharat Bhushan wrote:
>>> Also when setup msi-route kvm_irqchip_add_msi_route() we needed to
 provide the translated address.
> According to my understanding this is required because the kernel does not go
 through viommu translation when generating interrupt, no?

 yes this is needed when KVM MSI routes are set up, ie. along with GICV3 
 ITS.
 With GICv2M, qemu direct gsi mapping is used and this is not needed.

 So I do not understand your previous sentence saying "MSI interrupts works
 without any change".
>>>
>>> I have almost completed vfio integration with virtio-iommu and am now
>>> testing the changes by assigning an e1000 device to a VM. For this I have
>>> changed the virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-msi,
>>> and this does not need changes in vfio_get_addr() and
>>> kvm_irqchip_add_msi_route()
>>
>> I understand you're reserving region 0x0800-0x0810 as
>> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only works
>> because Qemu places the vgic in that area as well (in hw/arm/virt.c). It's
>> not a coincidence if the addresses are the same, because Eric chose them
>> for the Linux SMMU drivers and I copied them.
> 
> Yes I chose this region because it does not overlap with any guest RAM
> region
> 
>>
>> We can't rely on that behavior, though, it will break MSIs in emulated
>> devices. And if Qemu happens to move the MSI doorbell in future machine
>> revisions, then it would also break VFIO.
>>
>> Just for my own understanding -- what happens, I think, is that in Linux
>> iova_reserve_iommu_regions initially reserves the guest-physical doorbell
>> 0x0800-0x0810. Then much later, when the device driver requests an
>> MSI, the irqchip driver calls iommu_dma_map_msi_msg with the
>> guest-physical gicv2m address 0x0802. The function finds the right
>> page in msi_page_list, which was added by cookie_init_hw_msi_region,
>> therefore bypassing the viommu and the GPA gets written in the MSI-X table.
> 
> I share Jean's understanding. To me using IOMMU_RESV_MSI in the
> virtio-iommu means this region is not translated by the IOMMU. As
> cookie_init_hw_msi_region() pre-allocates the msi_page array,
> iommu_dma_get_msi_page() does not do any IOMMU mapping.
> 
>>
>> If an emulated device such as virtio-net-pci were to generate an MSI, then
>> Qemu would attempt to access the doorbell written by Linux into the MSI-X
>> table, 0x0802, and fault because that address wasn't mapped in the 
>> viommu.
> Yes so I am confused, how can it work with a virtio-net-pci or
> passthrough'ed e1000e device using MSIs?
>>
>> So for VFIO, you either need to translate the MSI-X entry using the
>> viommu,
> 
> For the vsmmuv3 I created a dedicated IOMMUNotifier to handle the fact
> the MSI doorbell is translated and MSI routes need to be updated. This
> seems to work.
> 
>> or just assume that the vaddr corresponds to the only MSI doorbell
>> accessible by this device (because how can we be certain that the guest
>> already mapped the doorbell before writing the entry?)
>>
>> For ARM machines it's probably best to stick with IOMMU_RESV_SW_MSI.
>> However, a nice way to use IOMMU_RESV_MSI would be for the virtio-iommu
>> device to advertise identity-mapped/reserved regions, and bypass
>> translation on these regions. Then the driver could reserve those with
>> IOMMU_RESV_MSI.
> 
> At least we may need to configure the virtio-iommu to either bypass MSIs
> (x86) or translate MSIs (ARM)?

Yes, see the VIRTIO_IOMMU_T_PROBE proposal in, er, my other reply.

>> For x86 we will need such a system, with an added IRQ
>> remapping feature.
> Meaning this must live along with vIR, is that what you mean? Also on
> ARM this must live with vITS anyway. This is an orthogonal feature, right?

Reserving doorbell regions on x86 is a must, otherwise MSIs won't work.
IRQ remapping would be nice to add in some distant future.

Thanks,
Jean




Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-07 Thread Jean-Philippe Brucker
On 07/07/17 07:21, Tian, Kevin wrote:
> sorry I didn't quite get this part, and here is my understanding:
> 
> Guest programs vIOMMU to map a gIOVA (used by MSI) to a GPA
> of the doorbell register of the virtual irqchip. vIOMMU then
> triggers VFIO map/unmap to update the physical IOMMU page
> table for gIOVA -> HPA of the real doorbell of the physical irqchip

At the moment (non-SVM), physical and virtual MSI doorbells are completely
dissociated. VFIO itself maps the doorbell GPA->HPA during container
initialization. The GPA, chosen arbitrarily by the host, is then removed
from the guest GPA space.

When the guest programs the vIOMMU to map a gIOVA to the virtual irqchip
doorbell, I suppose Qemu will notice that the GPA doesn't correspond to
RAM and will withhold sending a VFIO_IOMMU_MAP_DMA request.

(For SVM I don't want to go into the details just now, but we will
probably need a separate VFIO mechanism to update the physical MSI-X
tables with whatever gIOVA the guest mapped in its private stage-1 page
tables.)

> (assume your irqchip will provide multiple doorbells so each
> device can have its own channel).

In existing irqchips the doorbell is shared by endpoints, which are
differentiated by their device ID (generally the BDF). I'm not sure why
this matters here?

> then once this update is
> done, later MSI interrupts from assigned device will go 
> through physical IOMMU (gIOVA->HPA) then reach irqchip 
> for irq remapping. vIOMMU is involved only in configuration
> path instead of actual interrupt path.

Yes the vIOMMU is used to correlate the IOVA written by the guest in its
virtual MSI-X table with the MAP request received by the vIOMMU. That is
probably used to setup IRQFD routes with KVM. But the vIOMMU is not
involved further than that in MSIs.

> If my understanding is correct, above will be the natural flow then
> why is additional virtio-iommu change required? :-)

The change is not *required* for ARM systems, I only proposed removing the
doorbell address translation stage to make host implementation simpler
(and since virtio-iommu on x86 won't translate the doorbell anyway, we
have to add support for this to virtio-iommu). But for Qemu, since vSMMU
needs to implement the natural flow anyway, it might not be a lot of
effort to also do it for virtio-iommu. Other implementations (e.g.
kvmtool) might piggy-back on the x86 way and declare the irqchip doorbell
as untranslated.

My proposal also breaks when confronted to virtual SVM in a physical ARM
system, where the guest owns stage-1 page tables and *has* to map the
doorbell if it wants MSIs to work, so you can disregard it :)

Thanks,
Jean



Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-07 Thread Bharat Bhushan
Hi Eric,

> >>>> -Original Message-
> >>>> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> >>>> Sent: Thursday, July 06, 2017 3:33 PM
> >>>> To: Bharat Bhushan ; Auger Eric
> >>>> ; eric.auger@gmail.com;
> >>>> peter.mayd...@linaro.org; alex.william...@redhat.com;
> >> m...@redhat.com;
> >>>> qemu-...@nongnu.org; qemu-devel@nongnu.org
> >>>> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
> >>>> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
> >>>> robin.mur...@arm.com; christoffer.d...@linaro.org
> >>>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> >>>>
> >>>> On 05/07/17 09:49, Bharat Bhushan wrote:
> >>>>> Also when setup msi-route kvm_irqchip_add_msi_route() we needed to
> >>>>>> provide the translated address.
> >>>>>>> According to my understanding this is required because kernel
> >>>>>>> does not go
> >>>>>> through viommu translation when generating interrupt, no?
> >>>>>>
> >>>>>> yes this is needed when KVM MSI routes are set up, ie. along with
> >>>>>> GICV3
> >>>> ITS.
> >>>>>> With GICv2M, qemu direct gsi mapping is used and this is not
> needed.
> >>>>>>
> >>>>>> So I do not understand your previous sentence saying "MSI
> >>>>>> interrupts works without any change".
> >>>>>
> >>>>> I have almost completed vfio integration with virtio-iommu and am now
> >>>>> testing the changes by assigning an e1000 device to a VM. For this I
> >>>>> have changed the virtio-iommu driver to use IOMMU_RESV_MSI rather than
> >>>>> sw-msi, and this does not need changes in vfio_get_addr() and
> >>>>> kvm_irqchip_add_msi_route()
> >>>>
> >>>> I understand you're reserving region 0x0800-0x0810 as
> >>>> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only
> >>>> works
> >>>> because Qemu places the vgic in that area as well (in hw/arm/virt.c).
> >>>> It's not a coincidence if the addresses are the same, because Eric
> >>>> chose them for the Linux SMMU drivers and I copied them.
> >>>>
> >>>> We can't rely on that behavior, though, it will break MSIs in
> >>>> emulated devices. And if Qemu happens to move the MSI doorbell in
> >>>> future machine revisions, then it would also break VFIO.
> >>>
> >>> Yes, make sense to me
> >>>
> >>>>
> >>>> Just for my own understanding -- what happens, I think, is that in
> >>>> Linux iova_reserve_iommu_regions initially reserves the
> >>>> guest-physical doorbell 0x0800-0x0810. Then much later,
> >>>> when the device driver requests an MSI, the irqchip driver calls
> >>>> iommu_dma_map_msi_msg with the guest-physical gicv2m address
> >>>> 0x0802. The function finds the right page in msi_page_list,
> >>>> which was added by cookie_init_hw_msi_region, therefore bypassing
> >>>> the viommu and the GPA gets written in the MSI-X table.
> >>>
> >>> This means that if tomorrow qemu changes the virt machine address
> >>> map and the vgic-its (its-translator register) address range does
> >>> not fall in the msi_page_list, then it will allocate a new iova and
> >>> create a mapping in the iommu. So this will no longer be identity
> >>> mapped and will fail to work with the new qemu?
> >>>
> >> Yes that's correct.
> >>>>
> >>>> If an emulated device such as virtio-net-pci were to generate an
> >>>> MSI, then Qemu would attempt to access the doorbell written by
> >>>> Linux into the MSI-X table, 0x0802, and fault because that
> >>>> address wasn't mapped in the viommu.
> >>>>
> >>>> So for VFIO, you either need to translate the MSI-X entry using the
> >>>> viommu, or just assume that the vaddr corresponds to the only MSI
> >>>> doorbell accessible by this device (because how can we be certain
> >>>> that the guest already mapped the doorbell before writing the entry?)

Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-07 Thread Auger Eric
Hi,

On 06/07/2017 23:11, Auger Eric wrote:
> Hello Bharat, Jean-Philippe,
> On 06/07/2017 12:02, Jean-Philippe Brucker wrote:
>> On 05/07/17 09:49, Bharat Bhushan wrote:
>>> Also when setup msi-route kvm_irqchip_add_msi_route() we needed to
 provide the translated address.
> According to my understanding this is required because the kernel does not go
 through viommu translation when generating interrupt, no?

 yes this is needed when KVM MSI routes are set up, ie. along with GICV3 
 ITS.
 With GICv2M, qemu direct gsi mapping is used and this is not needed.

 So I do not understand your previous sentence saying "MSI interrupts works
 without any change".
>>>
>>> I have almost completed vfio integration with virtio-iommu and am now
>>> testing the changes by assigning an e1000 device to a VM. For this I have
>>> changed the virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-msi,
>>> and this does not need changes in vfio_get_addr() and
>>> kvm_irqchip_add_msi_route()
>>
>> I understand you're reserving region 0x08000000-0x08100000 as
>> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only works
>> because Qemu places the vgic in that area as well (in hw/arm/virt.c). It's
>> not a coincidence if the addresses are the same, because Eric chose them
>> for the Linux SMMU drivers and I copied them.
> 
> Yes I chose this region because it does not overlap with any guest RAM
> region
> 
>>
>> We can't rely on that behavior, though, it will break MSIs in emulated
>> devices. And if Qemu happens to move the MSI doorbell in future machine
>> revisions, then it would also break VFIO.
>>
>> Just for my own understanding -- what happens, I think, is that in Linux
>> iova_reserve_iommu_regions initially reserves the guest-physical doorbell
>> 0x08000000-0x08100000. Then much later, when the device driver requests an
>> MSI, the irqchip driver calls iommu_dma_map_msi_msg with the
>> guest-physical gicv2m address 0x08020000. The function finds the right
>> page in msi_page_list, which was added by cookie_init_hw_msi_region,
>> therefore bypassing the viommu and the GPA gets written in the MSI-X table.
> 
> I share Jean's understanding. To me using IOMMU_RESV_MSI in the
> virtio-iommu means this region is not translated by the IOMMU. As
> cookie_init_hw_msi_region() pre-allocates the msi_page array,
> iommu_dma_get_msi_page() does not do any IOMMU mapping.
> 
>>
>> If an emulated device such as virtio-net-pci were to generate an MSI, then
>> Qemu would attempt to access the doorbell written by Linux into the MSI-X
>> table, 0x08020000, and fault because that address wasn't mapped in the
>> viommu.
> Yes so I am confused, how can it work with a virtio-net-pci or
> passthrough'ed e1000e device using MSIs?
>>
>> So for VFIO, you either need to translate the MSI-X entry using the
>> viommu,
> 
> For the vsmmuv3 I created a dedicated IOMMUNotifier to handle the fact
> the MSI doorbell is translated and MSI routes need to be updated. This
> seems to work.
> 
>> or just assume that the vaddr corresponds to the only MSI doorbell
>> accessible by this device (because how can we be certain that the guest
>> already mapped the doorbell before writing the entry?)
>>
>> For ARM machines it's probably best to stick with IOMMU_RESV_SW_MSI.
>> However, a nice way to use IOMMU_RESV_MSI would be for the virtio-iommu
>> device to advertise identity-mapped/reserved regions, and bypass
>> translation on these regions. Then the driver could reserve those with
>> IOMMU_RESV_MSI.
> 
> At least we may need to configure the virtio-iommu to either bypass MSIs
> (x86) or translate MSIs (ARM)?

Actually on x86 no MSI controller will attempt to map MSIs, as opposed
to ARM GICv2M & ITS. So the only problem with exposing IOMMU_RESV_SW_MSI
regions is that vfio_iommu_type1 will assess the IRQ assignment safety
using irq_domain_check_msi_remap() and not with the IOMMU's
IOMMU_CAP_INTR_REMAP capability.
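
To make that check concrete, here is a simplified sketch (illustrative
only; the variable names are made up and this is not the actual kernel
code) of the decision vfio_iommu_type1 makes when attaching a group:

#include <stdbool.h>
#include <stdio.h>

/* Sketch of the IRQ-assignment safety decision: assignment is safe if
 * an MSI controller remaps MSIs (what irq_domain_check_msi_remap()
 * reports, e.g. a GICv3 ITS on ARM), or the IOMMU itself advertises
 * interrupt remapping (IOMMU_CAP_INTR_REMAP, the x86 case). */
static bool msi_domain_remaps_msis;   /* ARM: ITS with MSI remapping */
static bool iommu_caps_intr_remap;    /* x86: IOMMU does IRQ remapping */
static bool allow_unsafe_interrupts;  /* opt-in module parameter */

static bool irq_assignment_safe(void)
{
    return msi_domain_remaps_msis || iommu_caps_intr_remap ||
           allow_unsafe_interrupts;
}

int main(void)
{
    msi_domain_remaps_msis = true;  /* e.g. an ITS advertising remapping */
    printf("IRQ assignment %s\n",
           irq_assignment_safe() ? "allowed" : "refused");
    return 0;
}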

Thanks

Eric
>> For x86 we will need such a system, with an added IRQ
>> remapping feature.
> Meaning this must live along with vIR, is that what you mean? Also on
> ARM this must live with vITS anyway. This is an orthogonal feature, right?
> 
> Thanks
> 
> Eric
>>
>> Thanks,
>> Jean
>>



Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-07 Thread Auger Eric


On 07/07/2017 08:25, Bharat Bhushan wrote:
> Hi Eric,
> 
>> -Original Message-
>> From: Auger Eric [mailto:eric.au...@redhat.com]
>> Sent: Friday, July 07, 2017 2:47 AM
>> To: Bharat Bhushan ; Jean-Philippe Brucker
>> ; eric.auger@gmail.com;
>> peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com;
>> qemu-...@nongnu.org; qemu-devel@nongnu.org
>> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
>> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
>> robin.mur...@arm.com; christoffer.d...@linaro.org
>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
>>
>> Hi Bharat,
>>
>> On 06/07/2017 13:24, Bharat Bhushan wrote:
>>>
>>>
>>>> -Original Message-
>>>> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
>>>> Sent: Thursday, July 06, 2017 3:33 PM
>>>> To: Bharat Bhushan ; Auger Eric
>>>> ; eric.auger@gmail.com;
>>>> peter.mayd...@linaro.org; alex.william...@redhat.com;
>> m...@redhat.com;
>>>> qemu-...@nongnu.org; qemu-devel@nongnu.org
>>>> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
>>>> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
>>>> robin.mur...@arm.com; christoffer.d...@linaro.org
>>>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
>>>>
> >>>> On 05/07/17 09:49, Bharat Bhushan wrote:
> >>>>>>> Also when setup msi-route kvm_irqchip_add_msi_route() we needed to
> >>>>>>> provide the translated address.
> >>>>>>> According to my understanding this is required because kernel does
> >>>>>>> not go
> >>>>>>> through viommu translation when generating interrupt, no?
>>>>>>
>>>>>> yes this is needed when KVM MSI routes are set up, ie. along with
>>>>>> GICV3
>>>> ITS.
>>>>>> With GICv2M, qemu direct gsi mapping is used and this is not needed.
>>>>>>
>>>>>> So I do not understand your previous sentence saying "MSI
>>>>>> interrupts works without any change".
>>>>>
>>>>> I have almost completed vfio integration with virtio-iommu and now
>>>>> testing the changes by assigning e1000 device to VM. For this I have
>>>>> changed virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-msi
>>>>> and this does not need changes in vfio_get_vaddr() and
>>>>> kvm_irqchip_add_msi_route()
>>>>
>>>> I understand you're reserving region 0x08000000-0x08100000 as
>>>> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only
>>>> works
>>>> because Qemu places the vgic in that area as well (in hw/arm/virt.c).
>>>> It's not a coincidence if the addresses are the same, because Eric
>>>> chose them for the Linux SMMU drivers and I copied them.
>>>>
>>>> We can't rely on that behavior, though, it will break MSIs in
>>>> emulated devices. And if Qemu happens to move the MSI doorbell in
>>>> future machine revisions, then it would also break VFIO.
>>>
>>> Yes, make sense to me
>>>
>>>>
>>>> Just for my own understanding -- what happens, I think, is that in
>>>> Linux iova_reserve_iommu_regions initially reserves the
>>>> guest-physical doorbell 0x08000000-0x08100000. Then much later, when
>>>> the device driver requests an MSI, the irqchip driver calls
>>>> iommu_dma_map_msi_msg with the guest-physical gicv2m address
>>>> 0x08020000. The function finds the right page in msi_page_list, which
>>>> was added by cookie_init_hw_msi_region, therefore bypassing the
>> viommu and the GPA gets written in the MSI-X table.
>>>
>>> This means in case tomorrow when qemu changes virt machine address
>> map and vgic-its (its-translator register address) address range does not 
>> fall
>> in the msi_page_list then it will allocate a new iova, create mapping in
>> iommu. So this will no longer be identity mapped and fail to work with new
>> qemu?
>>>
>> Yes that's correct.
>>>>
>>>> If an emulated device such as virtio-net-pci were to generate an MSI,
>>>> then Qemu would attempt to access the doorbell written by Linux into
>>>> the MSI-X table, 0x08020000, and fault because that address wasn't
>>>> mapped in the viommu.

Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-06 Thread Bharat Bhushan
Hi Eric,

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: Friday, July 07, 2017 2:47 AM
> To: Bharat Bhushan ; Jean-Philippe Brucker
> ; eric.auger@gmail.com;
> peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com;
> qemu-...@nongnu.org; qemu-devel@nongnu.org
> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
> robin.mur...@arm.com; christoffer.d...@linaro.org
> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> 
> Hi Bharat,
> 
> On 06/07/2017 13:24, Bharat Bhushan wrote:
> >
> >
> >> -Original Message-
> >> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> >> Sent: Thursday, July 06, 2017 3:33 PM
> >> To: Bharat Bhushan ; Auger Eric
> >> ; eric.auger@gmail.com;
> >> peter.mayd...@linaro.org; alex.william...@redhat.com;
> m...@redhat.com;
> >> qemu-...@nongnu.org; qemu-devel@nongnu.org
> >> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
> >> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
> >> robin.mur...@arm.com; christoffer.d...@linaro.org
> >> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> >>
> >> On 05/07/17 09:49, Bharat Bhushan wrote:
> >>>>> Also when setup msi-route kvm_irqchip_add_msi_route() we needed to
> >>>>> provide the translated address.
> >>>>> According to my understanding this is required because kernel does
> >>>>> not go
> >>>>> through viommu translation when generating interrupt, no?
> >>>>
> >>>> yes this is needed when KVM MSI routes are set up, ie. along with
> >>>> GICV3
> >> ITS.
> >>>> With GICv2M, qemu direct gsi mapping is used and this is not needed.
> >>>>
> >>>> So I do not understand your previous sentence saying "MSI
> >>>> interrupts works without any change".
> >>>
> >>> I have almost completed vfio integration with virtio-iommu and now
> >>> testing the changes by assigning e1000 device to VM. For this I have
> >>> changed virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-msi
> >>> and this does not need changes in vfio_get_vaddr() and
> >>> kvm_irqchip_add_msi_route()
> >>
> >> I understand you're reserving region 0x08000000-0x08100000 as
> >> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only
> >> works
> >> because Qemu places the vgic in that area as well (in hw/arm/virt.c).
> >> It's not a coincidence if the addresses are the same, because Eric
> >> chose them for the Linux SMMU drivers and I copied them.
> >>
> >> We can't rely on that behavior, though, it will break MSIs in
> >> emulated devices. And if Qemu happens to move the MSI doorbell in
> >> future machine revisions, then it would also break VFIO.
> >
> > Yes, make sense to me
> >
> >>
> >> Just for my own understanding -- what happens, I think, is that in
> >> Linux iova_reserve_iommu_regions initially reserves the
> >> guest-physical doorbell 0x08000000-0x08100000. Then much later, when
> >> the device driver requests an MSI, the irqchip driver calls
> >> iommu_dma_map_msi_msg with the guest-physical gicv2m address
> >> 0x08020000. The function finds the right page in msi_page_list, which
> >> was added by cookie_init_hw_msi_region, therefore bypassing the
> viommu and the GPA gets written in the MSI-X table.
> >
> > This means in case tomorrow when qemu changes virt machine address
> map and vgic-its (its-translator register address) address range does not fall
> in the msi_page_list then it will allocate a new iova, create mapping in
> iommu. So this will no longer be identity mapped and fail to work with new
> qemu?
> >
> Yes that's correct.
> >>
> >> If an emulated device such as virtio-net-pci were to generate an MSI,
> >> then Qemu would attempt to access the doorbell written by Linux into
> >> the MSI-X table, 0x08020000, and fault because that address wasn't
> >> mapped in the viommu.
> >>
> >> So for VFIO, you either need to translate the MSI-X entry using the
> >> viommu, or just assume that the vaddr corresponds to the only MSI
> >> doorbell accessible by this device (because how can we be certain
> >> that the guest already mapped the doorbell before writing the entry?)

Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-06 Thread Tian, Kevin
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Wednesday, July 5, 2017 8:45 PM
> 
> On 05/07/17 08:14, Tian, Kevin wrote:
> >> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> >> Sent: Monday, June 19, 2017 6:15 PM
> >>
> >> On 19/06/17 08:54, Bharat Bhushan wrote:
> >>> Hi Eric,
> >>>
> >>> I started adding replay in virtio-iommu and came across how MSI
> >>> interrupts will work with VFIO.
> >>> I understand that on intel this works differently but vsmmu will have
> >>> the same requirement.
> >>> kvm-msi-irq-route are added using the msi-address to be translated by
> >> viommu and not the final translated address.
> >>> While currently the irqfd framework does not know about emulated
> >> iommus (virtio-iommu, vsmmuv3/vintel-iommu).
> >>> So in my view we have following options:
> >>> - Programming with translated address when setting up kvm-msi-irq-route
> >>> - Route the interrupts via QEMU, which is bad from performance
> >>> - vhost-virtio-iommu may solve the problem in long term
> >>>
> >>> Is there any other better option I am missing?
> >>
> >> Since we're on the topic of MSIs... I'm currently trying to figure out how
> >> we'll handle MSIs in the nested translation mode, where the guest manages
> >> S1 page tables and the host doesn't know about GVA->GPA translation.
> >>
> >> I'm also wondering about the benefits of having SW-mapped MSIs in the
> >> guest. It seems unavoidable for vSMMU since that's what a physical system
> >> would do. But in a paravirtualized solution there doesn't seem to be any
> >> compelling reason for having the guest map MSI doorbells. These addresses
> >> are never accessed directly, they are only used for setting up IRQ routing
> >> (at least on kvmtool). So here's what I'd like to have. Note that I
> >> haven't investigated the feasibility in Qemu yet, I don't know how it
> >> deals with MSIs.
> >>
> >> (1) Guest uses the guest-physical MSI doorbell when setting up MSIs. For
> >> ARM with GICv3 this would be GITS_TRANSLATER, for x86 it would be the
> >> fixed MSI doorbell. This way the host wouldn't need to inspect IOMMU
> >> mappings when handling writes to PCI MSI-X tables.
> >>
> >
> > What do you mean by "fixed MSI doorbell"? PCI MSI-X table is part of
> > PCI MMIO bar. Accessing it is just a memory virtualization issue (e.g.
> > trap by KVM and then emulated in Qemu) on x86. It's not an IOMMU
> > problem. I guess you may mean same thing but want to double confirm
> > here given the terminology confusion. Or do you mean the interrupt
> > triggered by IOMMU itself?
> 
> Yes I didn't mean access to the MSI-X table, but how we interpret the
> address in the MSI message. In kvmtool I create MSI routes for VFIO
> devices when the guest accesses the MSI-X tables. And on ARM the tables
> contains an IOVA that needs to be translated into a PA, so handling a
> write to an MSI-X entry might mean doing the IOVA->PA translation of the
> doorbell.
> 
> On x86 the MSI address is 0xfee00000, whether there is an IOMMU or not.
> That's what I meant by fixed. And it is the IOMMU that performs IRQ
> remapping.
> 
> On physical ARM systems, the SMMU doesn't treat any special address range
> as "MSI window". For the SMMU, an MSI is simply a memory transaction.
> MSI
> addresses are arbitrary IOVAs that get translated into PAs by the SMMU.
> The SMMU doesn't perform any IRQ remapping, only address translation.
> This
> PA is a doorbell register in the irqchip, which performs IRQ remapping and
> triggers an interrupt.

Thanks for the explanation. I see the background now.

> 
> Therefore in an emulated ARM system, when the guest writes the MSI-X
> table, it writes an IOVA. In a strict emulation the MSI would have to
> first go through the vIOMMU, and then into the irqchip. I was wondering if
> with virtio-iommu we could skip the address translation and go to the MSI
> remapping component immediately, effectively implementing a "hardware
> MSI
> window". This is what x86 does, the difference being that MSI remapping is
> done by the IOMMU on x86, and by the irqchip on ARM.

sorry I didn't quite get this part, and here is my understanding:

Guest programs vIOMMU to map a gIOVA (used by MSI) to a GPA
of the doorbell register of the virtual irqchip. vIOMMU then
triggers VFIO map/unmap to update the physical IOMMU page
table for gIOVA -> HPA of the real doorbell of the physical irqchip
(assume your irqchip will provide multiple doorbells so each
device can have its own channel). Then once this update is
done, later MSI interrupts from the assigned device will go
through the physical IOMMU (gIOVA->HPA) and then reach the irqchip
for irq remapping. vIOMMU is involved only in the configuration
path instead of the actual interrupt path.

If my understanding is correct, the above will be the natural flow, so
why is an additional virtio-iommu change required? :-)

> 
> My current take is that we should keep the current behavior, but I will
> try to sort out the different ways of implementing MSIs with virtio-iommu
> in the next specification draft.

Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-06 Thread Auger Eric
Hi Bharat,

On 06/07/2017 13:24, Bharat Bhushan wrote:
> 
> 
>> -Original Message-
>> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
>> Sent: Thursday, July 06, 2017 3:33 PM
>> To: Bharat Bhushan ; Auger Eric
>> ; eric.auger@gmail.com;
>> peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com;
>> qemu-...@nongnu.org; qemu-devel@nongnu.org
>> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
>> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
>> robin.mur...@arm.com; christoffer.d...@linaro.org
>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
>>
>> On 05/07/17 09:49, Bharat Bhushan wrote:
>>>>> Also when setup msi-route kvm_irqchip_add_msi_route() we needed to
>>>>> provide the translated address.
>>>>> According to my understanding this is required because kernel does
>>>>> not go
>>>>> through viommu translation when generating interrupt, no?
>>>>
>>>> yes this is needed when KVM MSI routes are set up, ie. along with GICV3
>> ITS.
>>>> With GICv2M, qemu direct gsi mapping is used and this is not needed.
>>>>
>>>> So I do not understand your previous sentence saying "MSI interrupts
>>>> works without any change".
>>>
>>> I have almost completed vfio integration with virtio-iommu and now
>>> testing the changes by assigning e1000 device to VM. For this I have
>>> changed virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-msi
>>> and this does not need changes in vfio_get_vaddr() and
>>> kvm_irqchip_add_msi_route()
>>
>> I understand you're reserving region 0x08000000-0x08100000 as
>> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only
>> works because Qemu places the vgic in that area as well (in hw/arm/virt.c).
>> It's not a coincidence if the addresses are the same, because Eric chose them
>> for the Linux SMMU drivers and I copied them.
>>
>> We can't rely on that behavior, though, it will break MSIs in emulated
>> devices. And if Qemu happens to move the MSI doorbell in future machine
>> revisions, then it would also break VFIO.
> 
> Yes, make sense to me
> 
>>
>> Just for my own understanding -- what happens, I think, is that in Linux
>> iova_reserve_iommu_regions initially reserves the guest-physical doorbell
>> 0x08000000-0x08100000. Then much later, when the device driver requests
>> an MSI, the irqchip driver calls iommu_dma_map_msi_msg with the guest-
>> physical gicv2m address 0x08020000. The function finds the right page in
>> msi_page_list, which was added by cookie_init_hw_msi_region, therefore
>> bypassing the viommu and the GPA gets written in the MSI-X table.
> 
> This means in case tomorrow when qemu changes virt machine address map and 
> vgic-its (its-translator register address) address range does not fall in the 
> msi_page_list then it will allocate a new iova, create mapping in iommu. So 
> this will no longer be identity mapped and fail to work with new qemu?
> 
Yes that's correct.
>>
>> If an emulated device such as virtio-net-pci were to generate an MSI, then
>> Qemu would attempt to access the doorbell written by Linux into the MSI-X
>> table, 0x08020000, and fault because that address wasn't mapped in the
>> viommu.
>>
>> So for VFIO, you either need to translate the MSI-X entry using the viommu,
>> or just assume that the vaddr corresponds to the only MSI doorbell
>> accessible by this device (because how can we be certain that the guest
>> already mapped the doorbell before writing the entry?)
>>
>> For ARM machines it's probably best to stick with IOMMU_RESV_SW_MSI.
>> However, a nice way to use IOMMU_RESV_MSI would be for the virtio-
>> iommu device to advertise identity-mapped/reserved regions, and bypass
>> translation on these regions. Then the driver could reserve those with
>> IOMMU_RESV_MSI.
> 
> Correct me if I did not understood you correctly, today iommu-driver decides 
> msi-reserved region, what if we change this and virtio-iommu device will 
> provide the reserved msi region as per the emulated machine (virt/intel). So 
> virtio-iommu driver will use the address advertised by virtio-iommu device as 
> IOMMU_RESV_MSI. In this case msi-page-list will always have the reserved 
> region for MSI.
> On qemu side, for emulated devices we will let virtio-iommu return same 
> address as translated address as it falls in MSI-reserved page already known 
> to it.

I think what you're proposing 

Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-06 Thread Auger Eric
Hello Bharat, Jean-Philippe,
On 06/07/2017 12:02, Jean-Philippe Brucker wrote:
> On 05/07/17 09:49, Bharat Bhushan wrote:
>>>> Also when setup msi-route kvm_irqchip_add_msi_route() we needed to
>>>> provide the translated address.
>>>> According to my understanding this is required because kernel does not go
>>>> through viommu translation when generating interrupt, no?
>>>
>>> yes this is needed when KVM MSI routes are set up, ie. along with GICV3 ITS.
>>> With GICv2M, qemu direct gsi mapping is used and this is not needed.
>>>
>>> So I do not understand your previous sentence saying "MSI interrupts works
>>> without any change".
>>
>> I have almost completed vfio integration with virtio-iommu and now testing 
>> the changes by assigning e1000 device to VM. For this I have changed 
>> virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-msi and this does 
>> not need changes in vfio_get_vaddr() and kvm_irqchip_add_msi_route()
> 
> I understand you're reserving region 0x08000000-0x08100000 as
> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only works
> because Qemu places the vgic in that area as well (in hw/arm/virt.c). It's
> not a coincidence if the addresses are the same, because Eric chose them
> for the Linux SMMU drivers and I copied them.

Yes I chose this region because it does not overlap with any guest RAM
region

> 
> We can't rely on that behavior, though, it will break MSIs in emulated
> devices. And if Qemu happens to move the MSI doorbell in future machine
> revisions, then it would also break VFIO.
> 
> Just for my own understanding -- what happens, I think, is that in Linux
> iova_reserve_iommu_regions initially reserves the guest-physical doorbell
> 0x08000000-0x08100000. Then much later, when the device driver requests an
> MSI, the irqchip driver calls iommu_dma_map_msi_msg with the
> guest-physical gicv2m address 0x08020000. The function finds the right
> page in msi_page_list, which was added by cookie_init_hw_msi_region,
> therefore bypassing the viommu and the GPA gets written in the MSI-X table.

I share Jean's understanding. To me using IOMMU_RESV_MSI in the
virtio-iommu means this region is not translated by the IOMMU. As
cookie_init_hw_msi_region() pre-allocates the msi_page array,
iommu_dma_get_msi_page() does not do any IOMMU mapping.

> 
> If an emulated device such as virtio-net-pci were to generate an MSI, then
> Qemu would attempt to access the doorbell written by Linux into the MSI-X
> table, 0x08020000, and fault because that address wasn't mapped in the viommu.
Yes, so I am confused: how can it work with a virtio-net-pci or a
passthrough'ed e1000e device using MSIs?
> 
> So for VFIO, you either need to translate the MSI-X entry using the
> viommu,

For the vsmmuv3 I created a dedicated IOMMUNotifier to handle the fact
the MSI doorbell is translated and MSI routes need to be updated. This
seems to work.

> or just assume that the vaddr corresponds to the only MSI doorbell
> accessible by this device (because how can we be certain that the guest
> already mapped the doorbell before writing the entry?)
> 
> For ARM machines it's probably best to stick with IOMMU_RESV_SW_MSI.
> However, a nice way to use IOMMU_RESV_MSI would be for the virtio-iommu
> device to advertise identity-mapped/reserved regions, and bypass
> translation on these regions. Then the driver could reserve those with
> IOMMU_RESV_MSI.

At least we may need to configure the virtio-iommu to either bypass MSIs
(x86) or translate MSIs (ARM)?
> For x86 we will need such a system, with an added IRQ
> remapping feature.
Meaning this must live along with vIR, is that what you mean? Also on
ARM this must live with vITS anyway. This is an orthogonal feature, right?

Thanks

Eric
> 
> Thanks,
> Jean
> 



Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-06 Thread Jean-Philippe Brucker
On 06/07/17 12:24, Bharat Bhushan wrote:
> 
> 
>> -Original Message-
>> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
>> Sent: Thursday, July 06, 2017 3:33 PM
>> To: Bharat Bhushan ; Auger Eric
>> ; eric.auger@gmail.com;
>> peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com;
>> qemu-...@nongnu.org; qemu-devel@nongnu.org
>> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
>> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
>> robin.mur...@arm.com; christoffer.d...@linaro.org
>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
>>
>> On 05/07/17 09:49, Bharat Bhushan wrote:
>>>>> Also when setup msi-route kvm_irqchip_add_msi_route() we needed to
>>>>> provide the translated address.
>>>>> According to my understanding this is required because kernel does
>>>>> not go
>>>>> through viommu translation when generating interrupt, no?
>>>>
>>>> yes this is needed when KVM MSI routes are set up, ie. along with GICV3
>> ITS.
>>>> With GICv2M, qemu direct gsi mapping is used and this is not needed.
>>>>
>>>> So I do not understand your previous sentence saying "MSI interrupts
>>>> works without any change".
>>>
>>> I have almost completed vfio integration with virtio-iommu and now
>>> testing the changes by assigning e1000 device to VM. For this I have
>>> changed virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-msi
>>> and this does not need changes in vfio_get_vaddr() and
>>> kvm_irqchip_add_msi_route()
>>
>> I understand you're reserving region 0x08000000-0x08100000 as
>> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only
>> works because Qemu places the vgic in that area as well (in hw/arm/virt.c).
>> It's not a coincidence if the addresses are the same, because Eric chose them
>> for the Linux SMMU drivers and I copied them.
>>
>> We can't rely on that behavior, though, it will break MSIs in emulated
>> devices. And if Qemu happens to move the MSI doorbell in future machine
>> revisions, then it would also break VFIO.
> 
> Yes, make sense to me
> 
>>
>> Just for my own understanding -- what happens, I think, is that in Linux
>> iova_reserve_iommu_regions initially reserves the guest-physical doorbell
>> 0x08000000-0x08100000. Then much later, when the device driver requests
>> an MSI, the irqchip driver calls iommu_dma_map_msi_msg with the guest-
>> physical gicv2m address 0x08020000. The function finds the right page in
>> msi_page_list, which was added by cookie_init_hw_msi_region, therefore
>> bypassing the viommu and the GPA gets written in the MSI-X table.
> 
> This means in case tomorrow when qemu changes virt machine address map and 
> vgic-its (its-translator register address) address range does not fall in the 
> msi_page_list then it will allocate a new iova, create mapping in iommu. So 
> this will no longer be identity mapped and fail to work with new qemu?

Precisely

>>
>> If an emulated device such as virtio-net-pci were to generate an MSI, then
>> Qemu would attempt to access the doorbell written by Linux into the MSI-X
>> table, 0x08020000, and fault because that address wasn't mapped in the
>> viommu.
>>
>> So for VFIO, you either need to translate the MSI-X entry using the viommu,
>> or just assume that the vaddr corresponds to the only MSI doorbell
>> accessible by this device (because how can we be certain that the guest
>> already mapped the doorbell before writing the entry?)
>>
>> For ARM machines it's probably best to stick with IOMMU_RESV_SW_MSI.
>> However, a nice way to use IOMMU_RESV_MSI would be for the virtio-
>> iommu device to advertise identity-mapped/reserved regions, and bypass
>> translation on these regions. Then the driver could reserve those with
>> IOMMU_RESV_MSI.
> 
> Correct me if I did not understood you correctly, today iommu-driver decides 
> msi-reserved region, what if we change this and virtio-iommu device will 
> provide the reserved msi region as per the emulated machine (virt/intel). So 
> virtio-iommu driver will use the address advertised by virtio-iommu device as 
> IOMMU_RESV_MSI. In this case msi-page-list will always have the reserved 
> region for MSI.
> On qemu side, for emulated devices we will let virtio-iommu return same 
> address as translated address as it falls in MSI-reserved page already known 
> to it.

Yes that's it. For example on x86, the virtio-iommu device wi

Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-06 Thread Bharat Bhushan


> -Original Message-
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Thursday, July 06, 2017 3:33 PM
> To: Bharat Bhushan ; Auger Eric
> ; eric.auger@gmail.com;
> peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com;
> qemu-...@nongnu.org; qemu-devel@nongnu.org
> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
> robin.mur...@arm.com; christoffer.d...@linaro.org
> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> 
> On 05/07/17 09:49, Bharat Bhushan wrote:
> >> Also when setup msi-route kvm_irqchip_add_msi_route() we needed to
> >> provide the translated address.
> >> According to my understanding this is required because kernel does
> >> not go
> >> through viommu translation when generating interrupt, no?
> >>
> >> yes this is needed when KVM MSI routes are set up, ie. along with GICV3
> ITS.
> >> With GICv2M, qemu direct gsi mapping is used and this is not needed.
> >>
> >> So I do not understand your previous sentence saying "MSI interrupts
> >> works without any change".
> >
> > I have almost completed vfio integration with virtio-iommu and now
> > testing the changes by assigning e1000 device to VM. For this I have
> > changed virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-msi
> > and this does not need changes in vfio_get_vaddr() and
> > kvm_irqchip_add_msi_route()
> 
> I understand you're reserving region 0x08000000-0x08100000 as
> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only
> works because Qemu places the vgic in that area as well (in hw/arm/virt.c).
> It's not a coincidence if the addresses are the same, because Eric chose them
> for the Linux SMMU drivers and I copied them.
> 
> We can't rely on that behavior, though, it will break MSIs in emulated
> devices. And if Qemu happens to move the MSI doorbell in future machine
> revisions, then it would also break VFIO.

Yes, make sense to me

> 
> Just for my own understanding -- what happens, I think, is that in Linux
> iova_reserve_iommu_regions initially reserves the guest-physical doorbell
> 0x08000000-0x08100000. Then much later, when the device driver requests
> an MSI, the irqchip driver calls iommu_dma_map_msi_msg with the guest-
> physical gicv2m address 0x08020000. The function finds the right page in
> msi_page_list, which was added by cookie_init_hw_msi_region, therefore
> bypassing the viommu and the GPA gets written in the MSI-X table.

This means that if tomorrow qemu changes the virt machine address map and the
vgic-its (its-translator register address) address range does not fall in the
msi_page_list, then it will allocate a new iova and create a mapping in the
iommu. So this will no longer be identity mapped and will fail to work with
the new qemu?

> 
> If an emulated device such as virtio-net-pci were to generate an MSI, then
> Qemu would attempt to access the doorbell written by Linux into the MSI-X
> table, 0x08020000, and fault because that address wasn't mapped in the
> viommu.
> 
> So for VFIO, you either need to translate the MSI-X entry using the viommu,
> or just assume that the vaddr corresponds to the only MSI doorbell
> accessible by this device (because how can we be certain that the guest
> already mapped the doorbell before writing the entry?)
> 
> For ARM machines it's probably best to stick with IOMMU_RESV_SW_MSI.
> However, a nice way to use IOMMU_RESV_MSI would be for the virtio-
> iommu device to advertise identity-mapped/reserved regions, and bypass
> translation on these regions. Then the driver could reserve those with
> IOMMU_RESV_MSI.

Correct me if I did not understand you correctly: today the iommu driver
decides the msi-reserved region. What if we change this so that the
virtio-iommu device provides the reserved msi region according to the
emulated machine (virt/intel)? The virtio-iommu driver would then use the
address advertised by the virtio-iommu device as IOMMU_RESV_MSI, and the
msi-page-list would always contain the reserved region for MSI.
On the qemu side, for emulated devices we would let virtio-iommu return the
same address as the translated address, since it falls in the MSI-reserved
page already known to it.
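
On the qemu side that could look roughly like this (a hypothetical sketch,
none of this is existing virtio-iommu code; the window values assume the
virt machine layout):

#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch of the proposal: the translate path returns the
 * input address unchanged when it falls inside the MSI window the
 * device advertised as reserved; anything else goes through the normal
 * viommu mappings (elided here). */
#define MSI_WINDOW_BASE 0x08000000ULL
#define MSI_WINDOW_SIZE 0x00100000ULL

static uint64_t viommu_translate(uint64_t addr)
{
    if (addr >= MSI_WINDOW_BASE && addr < MSI_WINDOW_BASE + MSI_WINDOW_SIZE)
        return addr;                /* MSI-reserved: identity */
    /* ... normal mapping lookup would go here ... */
    return (uint64_t)-1;            /* unmapped: fault in this sketch */
}

int main(void)
{
    printf("0x08020000 -> 0x%llx\n",
           (unsigned long long)viommu_translate(0x08020000ULL));
    return 0;
}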


> For x86 we will need such a system, with an added IRQ
> remapping feature.

I do not understand x86 MSI interrupt generation, but if the above
understanding is correct, then why do we need IRQ remapping for x86?
Will the x86 machine emulated in QEMU provide a big address range for MSIs,
and when actually generating an MSI will it need some extra processing
(IRQ-remapping processing) before actually generating the write transaction
for the MSI interrupt?

Thanks
-Bharat

> 
> Thanks,
> Jean


Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-06 Thread Jean-Philippe Brucker
On 05/07/17 09:49, Bharat Bhushan wrote:
>>> Also when setup msi-route kvm_irqchip_add_msi_route() we needed to
>>> provide the translated address.
>>> According to my understanding this is required because kernel does not go
>>> through viommu translation when generating interrupt, no?
>>
>> yes this is needed when KVM MSI routes are set up, ie. along with GICV3 ITS.
>> With GICv2M, qemu direct gsi mapping is used and this is not needed.
>>
>> So I do not understand your previous sentence saying "MSI interrupts works
>> without any change".
> 
> I have almost completed vfio integration with virtio-iommu and now testing 
> the changes by assigning e1000 device to VM. For this I have changed 
> virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-msi and this does 
> not need changes in vfio_get_vaddr() and kvm_irqchip_add_msi_route()

I understand you're reserving region 0x08000000-0x08100000 as
IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only works
because Qemu places the vgic in that area as well (in hw/arm/virt.c). It's
not a coincidence if the addresses are the same, because Eric chose them
for the Linux SMMU drivers and I copied them.

We can't rely on that behavior, though, it will break MSIs in emulated
devices. And if Qemu happens to move the MSI doorbell in future machine
revisions, then it would also break VFIO.

Just for my own understanding -- what happens, I think, is that in Linux
iova_reserve_iommu_regions initially reserves the guest-physical doorbell
0x08000000-0x08100000. Then much later, when the device driver requests an
MSI, the irqchip driver calls iommu_dma_map_msi_msg with the
guest-physical gicv2m address 0x08020000. The function finds the right
page in msi_page_list, which was added by cookie_init_hw_msi_region,
therefore bypassing the viommu and the GPA gets written in the MSI-X table.
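
In code, that lookup is shaped roughly like this toy model (the real
structures in the Linux iommu-dma code differ; this only shows why the
doorbell GPA comes back untranslated):

#include <stdint.h>
#include <stdio.h>

/* Toy model of the msi_page_list lookup: for an IOMMU_RESV_MSI region
 * the list is pre-filled with identity entries (iova == phys), so the
 * doorbell GPA is returned as-is and no (v)IOMMU mapping is created. */
struct msi_page { uint64_t phys; uint64_t iova; };

static struct msi_page msi_page_list[] = {
    { 0x08020000ULL, 0x08020000ULL },   /* gicv2m doorbell, identity */
};

static uint64_t get_msi_iova(uint64_t doorbell)
{
    for (unsigned i = 0; i < sizeof(msi_page_list) / sizeof(msi_page_list[0]); i++)
        if (msi_page_list[i].phys == doorbell)
            return msi_page_list[i].iova;
    return (uint64_t)-1;    /* miss: would allocate an IOVA and map it */
}

int main(void)
{
    printf("doorbell 0x08020000 -> iova 0x%llx\n",
           (unsigned long long)get_msi_iova(0x08020000ULL));
    return 0;
}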

If an emulated device such as virtio-net-pci were to generate an MSI, then
Qemu would attempt to access the doorbell written by Linux into the MSI-X
table, 0x08020000, and fault because that address wasn't mapped in the viommu.

So for VFIO, you either need to translate the MSI-X entry using the
viommu, or just assume that the vaddr corresponds to the only MSI doorbell
accessible by this device (because how can we be certain that the guest
already mapped the doorbell before writing the entry?)
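
The first option would look roughly like this (a sketch with made-up
helper names, not actual qemu code):

#include <stdint.h>
#include <stdio.h>

/* Sketch of option one: run the doorbell address the guest wrote into
 * an MSI-X entry through the viommu before programming the KVM route,
 * since the kernel injects MSIs without walking the viommu.
 * viommu_translate() and kvm_add_msi_route() are stand-ins. */
static uint64_t viommu_translate(uint64_t iova)
{
    return iova;    /* identity here; a real viommu walks its mappings */
}

static void kvm_add_msi_route(uint64_t doorbell_pa, uint32_t data)
{
    printf("route: doorbell=0x%llx data=0x%x\n",
           (unsigned long long)doorbell_pa, data);
}

static void msix_entry_written(uint64_t guest_addr, uint32_t data)
{
    kvm_add_msi_route(viommu_translate(guest_addr), data);
}

int main(void)
{
    msix_entry_written(0x08020000ULL, 42);
    return 0;
}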

For ARM machines it's probably best to stick with IOMMU_RESV_SW_MSI.
However, a nice way to use IOMMU_RESV_MSI would be for the virtio-iommu
device to advertise identity-mapped/reserved regions, and bypass
translation on these regions. Then the driver could reserve those with
IOMMU_RESV_MSI. For x86 we will need such a system, with an added IRQ
remapping feature.

Thanks,
Jean



Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-05 Thread Jean-Philippe Brucker
On 05/07/17 08:25, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
>> Sent: Tuesday, June 27, 2017 12:13 AM
>>
>> On 26/06/17 09:22, Auger Eric wrote:
>>> Hi Jean-Philippe,
>>>
>>> On 19/06/2017 12:15, Jean-Philippe Brucker wrote:
>>>> On 19/06/17 08:54, Bharat Bhushan wrote:
>>>>> Hi Eric,
>>>>>
>>>>> I started adding replay in virtio-iommu and came across how MSI
>>>>> interrupts will work with VFIO.
>>>>> I understand that on intel this works differently but vsmmu will have
>>>>> the same requirement.
>>>>> kvm-msi-irq-route are added using the msi-address to be translated by
>>>>> viommu and not the final translated address.
>>>>> While currently the irqfd framework does not know about emulated
>>>>> iommus (virtio-iommu, vsmmuv3/vintel-iommu).
>>>>> So in my view we have following options:
>>>>> - Programming with translated address when setting up kvm-msi-irq-route
>>>>> - Route the interrupts via QEMU, which is bad from performance
>>>>> - vhost-virtio-iommu may solve the problem in long term
>>>>>
>>>>> Is there any other better option I am missing?
>>>>
>>>> Since we're on the topic of MSIs... I'm currently trying to figure out how
>>>> we'll handle MSIs in the nested translation mode, where the guest manages
>>>> S1 page tables and the host doesn't know about GVA->GPA translation.
>>>
>>> I have a question about the "nested translation mode" terminology. Do
>>> you mean in that case you use stage 1 + stage 2 of the physical IOMMU
>>> (which the ARM spec normally advises or was meant for) or do you mean
>>> stage 1 implemented in vIOMMU and stage 2 implemented in pIOMMU. At the
>>> moment my understanding is for VFIO integration the pIOMMU uses a single
>>> stage combining both the stage 1 and stage2 mappings but the host is not
>>> aware of those 2 stages.
>>
>> Yes at the moment the VMM merges stage-1 (GVA->GPA) from the guest with
>> its stage-2 mappings (GPA->HPA) and creates a stage-2 mapping (GVA->HPA)
>> in the pIOMMU via VFIO_IOMMU_MAP_DMA. stage-1 is disabled in the
>> pIOMMU.
>>
> 
> Curious whether you are describing current smmu status or general
> vIOMMU status also applying to other vendors...

This particular paragraph was about the non-SVM state of things. The rest
was about stage-1 + stage-2 (what I call nested), which would indeed be
required for SVM. I don't think SVM can work with software merging.
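
For reference, "software merging" composes the two translations into the
single stage the pIOMMU sees; a toy model with arrays standing in for page
tables:

#include <stdint.h>
#include <stdio.h>

/* Toy model of software merging: the VMM composes the guest stage-1
 * (GVA->GPA) with its own stage-2 (GPA->HPA) and installs the result
 * (GVA->HPA) as the single stage used by the pIOMMU. With SVM there is
 * one stage-1 per PASID, so the VMM would need one merged table per
 * PASID and would have to shadow every guest table update. */
#define PAGES 4
static const uint64_t s1[PAGES] = { 2, 3, 0, 1 };  /* GVA -> GPA page */
static const uint64_t s2[PAGES] = { 7, 6, 5, 4 };  /* GPA -> HPA page */

int main(void)
{
    for (int gva = 0; gva < PAGES; gva++)
        printf("GVA page %d -> HPA page %llu\n",
               gva, (unsigned long long)s2[s1[gva]]);
    return 0;
}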

Thanks,
Jean

> the usage what you described is about svm, while svm requires PASID.
> At least PASID is tied to stage-1 on Intel VT-d. Only DMA w/o PASID
> or nested translation from stage-1 will go through stage-2. Unless
> ARM smmu has a completely different implementation, I'm not sure
> how svm can be virtualized w/ stage-1 translation disabled. There
> are multiple stage-1 page tables while only one stage-2 page table per
> device. Could merging actually work here?
> 
> The only case with merging happen today is for guest stage-2 usage
> or so-called GIOVA usage. Guest programs GIOVA->GPA to vIOMMU 
> stage-2. Then vIOMMU invokes vfio map/unmap APIs to translate/
> merge to GIOVA->HPA to pIOMMU stage-2. Maybe what you
> actually meant is this one?
> 
> Thanks
> Kevin
> 




Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-05 Thread Jean-Philippe Brucker
On 05/07/17 08:14, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
>> Sent: Monday, June 19, 2017 6:15 PM
>>
>> On 19/06/17 08:54, Bharat Bhushan wrote:
>>> Hi Eric,
>>>
>>> I started adding replay in virtio-iommu and came across how MSI interrupts
>>> will work with VFIO.
>>> I understand that on intel this works differently but vsmmu will have the
>>> same requirement.
>>> kvm-msi-irq-route are added using the msi-address to be translated by
>>> viommu and not the final translated address.
>>> While currently the irqfd framework does not know about emulated
>>> iommus (virtio-iommu, vsmmuv3/vintel-iommu).
>>> So in my view we have following options:
>>> - Programming with translated address when setting up kvm-msi-irq-route
>>> - Route the interrupts via QEMU, which is bad from performance
>>> - vhost-virtio-iommu may solve the problem in long term
>>>
>>> Is there any other better option I am missing?
>>
>> Since we're on the topic of MSIs... I'm currently trying to figure out how
>> we'll handle MSIs in the nested translation mode, where the guest manages
>> S1 page tables and the host doesn't know about GVA->GPA translation.
>>
>> I'm also wondering about the benefits of having SW-mapped MSIs in the
>> guest. It seems unavoidable for vSMMU since that's what a physical system
>> would do. But in a paravirtualized solution there doesn't seem to be any
>> compelling reason for having the guest map MSI doorbells. These addresses
>> are never accessed directly, they are only used for setting up IRQ routing
>> (at least on kvmtool). So here's what I'd like to have. Note that I
>> haven't investigated the feasibility in Qemu yet, I don't know how it
>> deals with MSIs.
>>
>> (1) Guest uses the guest-physical MSI doorbell when setting up MSIs. For
>> ARM with GICv3 this would be GITS_TRANSLATER, for x86 it would be the
>> fixed MSI doorbell. This way the host wouldn't need to inspect IOMMU
>> mappings when handling writes to PCI MSI-X tables.
>>
> 
> What do you mean by "fixed MSI doorbell"? PCI MSI-X table is part of the
> PCI MMIO bar. Accessing it is just a memory virtualization issue (e.g.
> trap by KVM and then emulated in Qemu) on x86. It's not an IOMMU
> problem. I guess you may mean the same thing but want to double confirm
> here given the terminology confusion. Or do you mean the interrupt
> triggered by the IOMMU itself?

Yes I didn't mean access to the MSI-X table, but how we interpret the
address in the MSI message. In kvmtool I create MSI routes for VFIO
devices when the guest accesses the MSI-X tables. And on ARM the tables
contain an IOVA that needs to be translated into a PA, so handling a
write to an MSI-X entry might mean doing the IOVA->PA translation of the
doorbell.

On x86 the MSI address is 0xfee00000, whether there is an IOMMU or not.
That's what I meant by fixed. And it is the IOMMU that performs IRQ remapping.
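
A minimal sketch of that fixed layout (destination APIC ID in bits 19:12;
the RH/DM bits are ignored here):

#include <stdint.h>
#include <stdio.h>

/* x86 MSI address: the 0xfee00000 window is fixed, whatever the IOMMU
 * situation; only the destination fields inside it vary. */
static uint32_t msi_address(uint8_t dest_apic_id)
{
    return 0xfee00000u | ((uint32_t)dest_apic_id << 12);
}

int main(void)
{
    printf("MSI address for APIC ID 3: 0x%08x\n", msi_address(3));
    return 0;
}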

On physical ARM systems, the SMMU doesn't treat any special address range
as "MSI window". For the SMMU, an MSI is simply a memory transaction. MSI
addresses are arbitrary IOVAs that get translated into PAs by the SMMU.
The SMMU doesn't perform any IRQ remapping, only address translation. This
PA is a doorbell register in the irqchip, which performs IRQ remapping and
triggers an interrupt.

Therefore in an emulated ARM system, when the guest writes the MSI-X
table, it writes an IOVA. In a strict emulation the MSI would have to
first go through the vIOMMU, and then into the irqchip. I was wondering if
with virtio-iommu we could skip the address translation and go to the MSI
remapping component immediately, effectively implementing a "hardware MSI
window". This is what x86 does, the difference being that MSI remapping is
done by the IOMMU on x86, and by the irqchip on ARM.

My current take is that we should keep the current behavior, but I will
try to sort out the different ways of implementing MSIs with virtio-iommu
in the next specification draft.

Thanks,
Jean



Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-05 Thread Bharat Bhushan
Hi Eric,

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: Wednesday, July 05, 2017 2:14 PM
> To: Bharat Bhushan ;
> eric.auger@gmail.com; peter.mayd...@linaro.org;
> alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org;
> qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com
> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
> robin.mur...@arm.com; christoffer.d...@linaro.org
> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> 
> Hi Bharat,
> 
> On 05/07/2017 10:23, Bharat Bhushan wrote:
> > Hi Eric,
> >
> >> -Original Message-
> >> From: Auger Eric [mailto:eric.au...@redhat.com]
> >> Sent: Monday, June 26, 2017 1:25 PM
> >> To: Bharat Bhushan ;
> >> eric.auger@gmail.com; peter.mayd...@linaro.org;
> >> alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org;
> >> qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com
> >> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
> >> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
> >> robin.mur...@arm.com; christoffer.d...@linaro.org
> >> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> >>
> >> Hi Bharat,
> >>
> >> On 19/06/2017 09:54, Bharat Bhushan wrote:
> >>> Hi Eric,
> >>>
> >>> I started adding replay in virtio-iommu and came across how MSI
> >>> interrupts will work with VFIO.
> >>> I understand that on intel this works differently but vsmmu will
> >>> have the same requirement.
> >>> kvm-msi-irq-route are added using the msi-address to be translated
> >>> by
> >> viommu and not the final translated address.
> >>> While currently the irqfd framework does not know about emulated
> >> iommus (virtio-iommu, vsmmuv3/vintel-iommu).
> >>> So in my view we have following options:
> >>> - Programming with translated address when setting up
> >>> kvm-msi-irq-route
> >>> - Route the interrupts via QEMU, which is bad from performance
> >>> - vhost-virtio-iommu may solve the problem in long term
> >>
> >> Sorry for the delay. With regard to the vsmmuv3/vfio integration I
> >> think we need to use the guest physical address otherwise the MSI
> >> address will not be recognized as an MSI doorbell.
> >>
> >> Also the fact on ARM we map the MSI doorbell causes an assert in
> >> vfio_get_vaddr() as the vITS doorbell is not a RAM region. We will
> >> need to handle this specifically.
> >
> > Also when setup msi-route kvm_irqchip_add_msi_route() we needed to
> > provide the translated address.
> > According to my understanding this is required because kernel does not go
> > through viommu translation when generating interrupt, no?
> 
> yes this is needed when KVM MSI routes are set up, ie. along with GICV3 ITS.
> With GICv2M, qemu direct gsi mapping is used and this is not needed.
> 
> So I do not understand your previous sentence saying "MSI interrupts works
> without any change".

I have almost completed vfio integration with virtio-iommu and am now testing
the changes by assigning an e1000 device to the VM. For this I have changed
the virtio-iommu driver to use IOMMU_RESV_MSI rather than sw-msi, and this
does not need changes in vfio_get_vaddr() and kvm_irqchip_add_msi_route().

Thanks
-Bharat

> 
> Thanks
> 
> Eric
> 
> >
> > Thanks
> > -Bharat
> >
> >>
> >> Besides I have not looked specifically at the virtio-iommu/vfio
> >> integration yet.
> >>
> >> Thanks
> >>
> >> Eric
> >>>
> >>> Is there any other better option I am missing?
> >>>
> >>> Thanks
> >>> -Bharat
> >>>
> >>>> -Original Message-
> >>>> From: Auger Eric [mailto:eric.au...@redhat.com]
> >>>> Sent: Friday, June 09, 2017 5:24 PM
> >>>> To: Bharat Bhushan ;
> >>>> eric.auger@gmail.com; peter.mayd...@linaro.org;
> >>>> alex.william...@redhat.com; m...@redhat.com; qemu-
> a...@nongnu.org;
> >>>> qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com
> >>>> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
> >>>> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
> >>>> robin.mur...@arm.com; christoffer.d...@linaro.org
> >>>> Subject: Re: [Qe

Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-05 Thread Auger Eric
Hi Bharat,

On 05/07/2017 10:23, Bharat Bhushan wrote:
> Hi Eric,
> 
>> -Original Message-
>> From: Auger Eric [mailto:eric.au...@redhat.com]
>> Sent: Monday, June 26, 2017 1:25 PM
>> To: Bharat Bhushan ;
>> eric.auger@gmail.com; peter.mayd...@linaro.org;
>> alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org;
>> qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com
>> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
>> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
>> robin.mur...@arm.com; christoffer.d...@linaro.org
>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
>>
>> Hi Bharat,
>>
>> On 19/06/2017 09:54, Bharat Bhushan wrote:
>>> Hi Eric,
>>>
>>> I started adding replay in virtio-iommu and came across how MSI interrupts
>>> will work with VFIO.
>>> I understand that on intel this works differently but vsmmu will have the
>>> same requirement.
>>> kvm-msi-irq-route are added using the msi-address to be translated by
>>> viommu and not the final translated address.
>>> While currently the irqfd framework does not know about emulated
>>> iommus (virtio-iommu, vsmmuv3/vintel-iommu).
>>> So in my view we have following options:
>>> - Programming with translated address when setting up
>>> kvm-msi-irq-route
>>> - Route the interrupts via QEMU, which is bad from performance
>>> - vhost-virtio-iommu may solve the problem in long term
>>
>> Sorry for the delay. With regard to the vsmmuv3/vfio integration I think we
>> need to use the guest physical address, otherwise the MSI address will not
>> be recognized as an MSI doorbell.
>>
>> Also, the fact that on ARM we map the MSI doorbell causes an assert in
>> vfio_get_vaddr(), as the vITS doorbell is not a RAM region. We will need to
>> handle this specifically.
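
Schematically, that assert comes from the RAM-region test in that path; a
sketch with illustrative types, not the actual qemu code:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Sketch: the IOTLB entry must resolve to guest RAM so that qemu can
 * hand a host virtual address to VFIO; an MMIO region like the vITS
 * doorbell has no such vaddr, hence the assert. */
struct region { bool is_ram; uint8_t *host_base; };

static bool get_vaddr(const struct region *mr, uint64_t offset, void **vaddr)
{
    if (!mr->is_ram)
        return false;           /* the case that trips today */
    *vaddr = mr->host_base + offset;
    return true;
}

int main(void)
{
    struct region vits = { .is_ram = false, .host_base = 0 };
    void *vaddr;
    printf("doorbell mappable: %s\n",
           get_vaddr(&vits, 0, &vaddr) ? "yes" : "no");
    return 0;
}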
> 
> Also when setup msi-route kvm_irqchip_add_msi_route() we needed to provide 
> the translated address.
> According to my understanding this is required because kernel does not go
> through viommu translation when generating interrupt, no? 

yes this is needed when KVM MSI routes are set up, ie. along with GICV3
ITS. With GICv2M, qemu direct gsi mapping is used and this is not needed.

So I do not understand your previous sentence saying "MSI interrupts
works without any change".

Thanks

Eric

> 
> Thanks
> -Bharat
> 
>>
>> Besides I have not looked specifically at the virtio-iommu/vfio integration
>> yet.
>>
>> Thanks
>>
>> Eric
>>>
>>> Is there any other better option I am missing?
>>>
>>> Thanks
>>> -Bharat
>>>
>>>> -Original Message-
>>>> From: Auger Eric [mailto:eric.au...@redhat.com]
>>>> Sent: Friday, June 09, 2017 5:24 PM
>>>> To: Bharat Bhushan ;
>>>> eric.auger@gmail.com; peter.mayd...@linaro.org;
>>>> alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org;
>>>> qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com
>>>> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
>>>> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
>>>> robin.mur...@arm.com; christoffer.d...@linaro.org
>>>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
>>>>
>>>> Hi Bharat,
>>>>
>>>> On 09/06/2017 13:30, Bharat Bhushan wrote:
>>>>> Hi Eric,
>>>>>
>>>>>> -Original Message-
>>>>>> From: Auger Eric [mailto:eric.au...@redhat.com]
>>>>>> Sent: Friday, June 09, 2017 12:14 PM
>>>>>> To: Bharat Bhushan ;
>>>>>> eric.auger@gmail.com; peter.mayd...@linaro.org;
>>>>>> alex.william...@redhat.com; m...@redhat.com; qemu-
>> a...@nongnu.org;
>>>>>> qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com
>>>>>> Cc: will.dea...@arm.com; robin.mur...@arm.com;
>>>> kevin.t...@intel.com;
>>>>>> marc.zyng...@arm.com; christoffer.d...@linaro.org;
>>>>>> drjo...@redhat.com; w...@redhat.com; t...@semihalf.com
>>>>>> Subject: Re: [RFC v2 0/8] VIRTIO-IOMMU device
>>>>>>
>>>>>> Hi Bharat,
>>>>>>
>>>>>> On 09/06/2017 08:16, Bharat Bhushan wrote:
>>>>>>> Hi Eric,
>>>>>>>
>>>>>>>> -Original Message-
>>>>>>>> From: Eric Auger [mailto:eric.au...@redhat.com]

Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-05 Thread Bharat Bhushan
Hi Eric,

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: Monday, June 26, 2017 1:25 PM
> To: Bharat Bhushan ;
> eric.auger@gmail.com; peter.mayd...@linaro.org;
> alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org;
> qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com
> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
> robin.mur...@arm.com; christoffer.d...@linaro.org
> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> 
> Hi Bharat,
> 
> On 19/06/2017 09:54, Bharat Bhushan wrote:
> > Hi Eric,
> >
> > I started adding replay in virtio-iommu and came across how MSI interrupts
> > will work with VFIO.
> > I understand that on intel this works differently but vsmmu will have the
> > same requirement.
> > kvm-msi-irq-route are added using the msi-address to be translated by
> > viommu and not the final translated address.
> > While currently the irqfd framework does not know about emulated
> > iommus (virtio-iommu, vsmmuv3/vintel-iommu).
> > So in my view we have following options:
> > - Programming with translated address when setting up
> > kvm-msi-irq-route
> > - Route the interrupts via QEMU, which is bad from performance
> > - vhost-virtio-iommu may solve the problem in long term
> 
> Sorry for the delay. With regard to the vsmmuv3/vfio integration I think we
> need to use the guest physical address otherwise the MSI address will not be
> recognized as an MSI doorbell.
> 
> Also the fact on ARM we map the MSI doorbell causes an assert in
> vfio_get_vaddr() as the vITS doorbell is not a RAM region. We will need to
> handle this specifically.

Also, when setting up an msi-route with kvm_irqchip_add_msi_route() we needed
to provide the translated address.
According to my understanding this is required because the kernel does not go
through viommu translation when generating the interrupt, no?

Thanks
-Bharat

> 
> Besides I have not looked specifically at the virtio-iommu/vfio integration
> yet.
> 
> Thanks
> 
> Eric
> >
> > Is there any other better option I am missing?
> >
> > Thanks
> > -Bharat
> >
> >> -Original Message-
> >> From: Auger Eric [mailto:eric.au...@redhat.com]
> >> Sent: Friday, June 09, 2017 5:24 PM
> >> To: Bharat Bhushan ;
> >> eric.auger@gmail.com; peter.mayd...@linaro.org;
> >> alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org;
> >> qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com
> >> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
> >> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
> >> robin.mur...@arm.com; christoffer.d...@linaro.org
> >> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> >>
> >> Hi Bharat,
> >>
> >> On 09/06/2017 13:30, Bharat Bhushan wrote:
> >>> Hi Eric,
> >>>
> >>>> -Original Message-
> >>>> From: Auger Eric [mailto:eric.au...@redhat.com]
> >>>> Sent: Friday, June 09, 2017 12:14 PM
> >>>> To: Bharat Bhushan ;
> >>>> eric.auger@gmail.com; peter.mayd...@linaro.org;
> >>>> alex.william...@redhat.com; m...@redhat.com; qemu-
> a...@nongnu.org;
> >>>> qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com
> >>>> Cc: will.dea...@arm.com; robin.mur...@arm.com;
> >> kevin.t...@intel.com;
> >>>> marc.zyng...@arm.com; christoffer.d...@linaro.org;
> >>>> drjo...@redhat.com; w...@redhat.com; t...@semihalf.com
> >>>> Subject: Re: [RFC v2 0/8] VIRTIO-IOMMU device
> >>>>
> >>>> Hi Bharat,
> >>>>
> >>>> On 09/06/2017 08:16, Bharat Bhushan wrote:
> >>>>> Hi Eric,
> >>>>>
> >>>>>> -Original Message-
> >>>>>> From: Eric Auger [mailto:eric.au...@redhat.com]
> >>>>>> Sent: Wednesday, June 07, 2017 9:31 PM
> >>>>>> To: eric.auger@gmail.com; eric.au...@redhat.com;
> >>>>>> peter.mayd...@linaro.org; alex.william...@redhat.com;
> >>>> m...@redhat.com;
> >>>>>> qemu-...@nongnu.org; qemu-devel@nongnu.org; jean-
> >>>>>> philippe.bruc...@arm.com
> >>>>>> Cc: will.dea...@arm.com; robin.mur...@arm.com;
> >>>> kevin.t...@intel.com;
> >>>>>> marc.zyng...@arm.com; christoffer.d...@linaro.org;
> >>>>>> drjo...@redhat.com; w...@redhat

Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-05 Thread Tian, Kevin
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Tuesday, June 27, 2017 12:13 AM
> 
> On 26/06/17 09:22, Auger Eric wrote:
> > Hi Jean-Philippe,
> >
> > On 19/06/2017 12:15, Jean-Philippe Brucker wrote:
> >> On 19/06/17 08:54, Bharat Bhushan wrote:
> >>> Hi Eric,
> >>>
> >>> I started adding replay in virtio-iommu and came across how MSI
> >>> interrupts work with VFIO.
> >>> I understand that on Intel this works differently but vsmmu will have
> >>> the same requirement.
> >>> kvm-msi-irq-routes are added using the MSI address that is yet to be
> >>> translated by the viommu, not the final translated address.
> >>> Currently the irqfd framework does not know about emulated
> >>> iommus (virtio-iommu, vsmmuv3/vintel-iommu).
> >>> So in my view we have the following options:
> >>> - Program the translated address when setting up the kvm-msi-irq-route
> >>> - Route the interrupts via QEMU, which is bad for performance
> >>> - vhost-virtio-iommu may solve the problem in the long term
> >>>
> >>> Is there any better option I am missing?
> >>
> >> Since we're on the topic of MSIs... I'm currently trying to figure out how
> >> we'll handle MSIs in the nested translation mode, where the guest
> manages
> >> S1 page tables and the host doesn't know about GVA->GPA translation.
> >
> > I have a question about the "nested translation mode" terminology. Do
> > you mean in that case you use stage 1 + stage 2 of the physical IOMMU
> > (which the ARM spec normally advises or was meant for), or do you mean
> > stage 1 implemented in the vIOMMU and stage 2 implemented in the pIOMMU?
> > At the moment my understanding is that for VFIO integration the pIOMMU
> > uses a single stage combining both the stage 1 and stage 2 mappings, but
> > the host is not aware of those 2 stages.
> 
> Yes at the moment the VMM merges stage-1 (GVA->GPA) from the guest with
> its stage-2 mappings (GPA->HPA) and creates a stage-2 mapping (GVA->HPA)
> in the pIOMMU via VFIO_IOMMU_MAP_DMA. stage-1 is disabled in the
> pIOMMU.
> 

Curious whether you are describing the current smmu status or the general
vIOMMU status that also applies to other vendors...

The usage you described is about SVM, and SVM requires PASID. At least on
Intel VT-d, PASID is tied to stage-1. Only DMA w/o PASID, or nested
translation from stage-1, will go through stage-2. Unless the ARM smmu has
a completely different implementation, I'm not sure how SVM can be
virtualized w/ stage-1 translation disabled. There are multiple stage-1
page tables, while there is only one stage-2 page table per device. Could
merging actually work here?

The only case where merging happens today is the guest stage-2 usage, the
so-called GIOVA usage. The guest programs GIOVA->GPA into the vIOMMU
stage-2. Then the vIOMMU invokes the VFIO map/unmap APIs to translate and
merge this into a GIOVA->HPA mapping in the pIOMMU stage-2. Maybe what you
actually meant is this one?
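(For readers following along, a sketch of that merge as seen from user space,
under the assumption that the vIOMMU delivers gIOVA->GPA mappings through an
IOTLB notifier. VFIO_IOMMU_MAP_DMA and its struct are the real UAPI;
gpa_to_hva() is a made-up stand-in for QEMU's guest-RAM lookup, and
IOMMUTLBEntry comes from QEMU's memory API.)

#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>
#include "exec/memory.h"        /* IOMMUTLBEntry */

void *gpa_to_hva(uint64_t gpa); /* hypothetical: guest-RAM lookup */

static void viommu_map_notify(int container_fd, IOMMUTLBEntry *iotlb)
{
    struct vfio_iommu_type1_dma_map map = {
        .argsz = sizeof(map),
        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
        .iova  = iotlb->iova,                   /* gIOVA */
        .size  = iotlb->addr_mask + 1,
        /* GPA -> HVA in the VMM; the kernel pins the page and resolves
         * HVA -> HPA, so the pIOMMU stage-2 ends up with gIOVA -> HPA. */
        .vaddr = (uint64_t)(uintptr_t)gpa_to_hva(iotlb->translated_addr),
    };

    if (ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map) < 0) {
        perror("VFIO_IOMMU_MAP_DMA");
    }
}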

Thanks
Kevin


Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-05 Thread Bharat Bhushan
Hi Jean,

> -Original Message-
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Monday, June 19, 2017 3:45 PM
> To: Bharat Bhushan ; Auger Eric
> ; eric.auger@gmail.com;
> peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com;
> qemu-...@nongnu.org; qemu-devel@nongnu.org
> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
> robin.mur...@arm.com; christoffer.d...@linaro.org
> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> 
> On 19/06/17 08:54, Bharat Bhushan wrote:
> > Hi Eric,
> >
> > I started adding replay in virtio-iommu and came across how MSI interrupts
> > work with VFIO.
> > I understand that on Intel this works differently but vsmmu will have the
> > same requirement.
> > kvm-msi-irq-routes are added using the MSI address that is yet to be
> > translated by the viommu, not the final translated address.
> > Currently the irqfd framework does not know about emulated
> > iommus (virtio-iommu, vsmmuv3/vintel-iommu).
> > So in my view we have the following options:
> > - Program the translated address when setting up the
> > kvm-msi-irq-route
> > - Route the interrupts via QEMU, which is bad for performance
> > - vhost-virtio-iommu may solve the problem in the long term
> >
> > Is there any better option I am missing?
> 
> Since we're on the topic of MSIs... I'm currently trying to figure out how 
> we'll
> handle MSIs in the nested translation mode, where the guest manages
> S1 page tables and the host doesn't know about GVA->GPA translation.
> 
> I'm also wondering about the benefits of having SW-mapped MSIs in the
> guest. It seems unavoidable for vSMMU since that's what a physical system
> would do. But in a paravirtualized solution there doesn't seem to be any
> compelling reason for having the guest map MSI doorbells. These addresses
> are never accessed directly, they are only used for setting up IRQ routing (at
> least on kvmtool). So here's what I'd like to have. Note that I haven't
> investigated the feasibility in Qemu yet, I don't know how it deals with MSIs.
> 
> (1) Guest uses the guest-physical MSI doorbell when setting up MSIs. For
> ARM with GICv3 this would be GITS_TRANSLATER, for x86 it would be the
> fixed MSI doorbell. This way the host wouldn't need to inspect IOMMU
> mappings when handling writes to PCI MSI-X tables.

Sorry for the late reply. Does this mean that we can use IOMMU_RESV_MSI for
the virtio-iommu driver, so that no mapping is created in the IOMMU?
I tried PCI pass-through using QEMU (VFIO integrated with virtio-iommu)
and MSI interrupts work without any change.
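(For reference, a sketch of what that could look like on the driver side,
assuming virtio-iommu went this route. iommu_alloc_resv_region() and
IOMMU_RESV_MSI are the real kernel APIs of that time; the doorbell
base/length values are hypothetical, e.g. the vGIC ITS translater page on
the virt machine.)

#include <linux/iommu.h>
#include <linux/list.h>

#define VIOMMU_MSI_BASE    0x08080000UL   /* hypothetical doorbell GPA */
#define VIOMMU_MSI_LENGTH  0x00100000UL

/* get_resv_regions() hook: report the doorbell as a hardware-managed MSI
 * region, so the IOVA allocator keeps clear of it and no mapping is ever
 * created for it in the vIOMMU. */
static void viommu_get_resv_regions(struct device *dev,
                                    struct list_head *head)
{
    int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
    struct iommu_resv_region *region;

    region = iommu_alloc_resv_region(VIOMMU_MSI_BASE, VIOMMU_MSI_LENGTH,
                                     prot, IOMMU_RESV_MSI);
    if (region)
        list_add_tail(&region->list, head);
}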

Thanks
-Bharat

> 
> (2) In nested mode (with VFIO) on ARM, the pSMMU will still translate MSIs
> via S1+S2. Therefore the host needs to map MSIs at stage-1, and I'd like to
> use the (currently unused) TTB1 tables in that case. In addition, using
> TTB1 would be useful for SVM, when endpoints write MSIs with PASIDs and
> we don't want to map them in user address space.
> 
> This means that the host needs to use different doorbell addresses in nested
> mode, since it would be unable to map at S1 the same IOVA as S2
> (TTB1 manages negative addresses - 0x, which are not
> representable as GPAs.) It also requires using 32-bit page tables for
> endpoints that are not capable of using 64-bit MSI addresses.
> 
> Now (2) is entirely handled in the host kernel, so it's more a Linux question.
> But does (1) seem acceptable for virtio-iommu in Qemu?
> 
> Thanks,
> Jean


Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-07-05 Thread Tian, Kevin
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Monday, June 19, 2017 6:15 PM
> 
> On 19/06/17 08:54, Bharat Bhushan wrote:
> > Hi Eric,
> >
> > I started adding replay in virtio-iommu and came across how MSI interrupts
> > work with VFIO.
> > I understand that on Intel this works differently but vsmmu will have the
> > same requirement.
> > kvm-msi-irq-routes are added using the MSI address that is yet to be
> > translated by the viommu, not the final translated address.
> > Currently the irqfd framework does not know about emulated
> > iommus (virtio-iommu, vsmmuv3/vintel-iommu).
> > So in my view we have the following options:
> > - Program the translated address when setting up the kvm-msi-irq-route
> > - Route the interrupts via QEMU, which is bad for performance
> > - vhost-virtio-iommu may solve the problem in the long term
> >
> > Is there any better option I am missing?
> 
> Since we're on the topic of MSIs... I'm currently trying to figure out how
> we'll handle MSIs in the nested translation mode, where the guest manages
> S1 page tables and the host doesn't know about GVA->GPA translation.
> 
> I'm also wondering about the benefits of having SW-mapped MSIs in the
> guest. It seems unavoidable for vSMMU since that's what a physical system
> would do. But in a paravirtualized solution there doesn't seem to be any
> compelling reason for having the guest map MSI doorbells. These addresses
> are never accessed directly, they are only used for setting up IRQ routing
> (at least on kvmtool). So here's what I'd like to have. Note that I
> haven't investigated the feasibility in Qemu yet, I don't know how it
> deals with MSIs.
> 
> (1) Guest uses the guest-physical MSI doorbell when setting up MSIs. For
> ARM with GICv3 this would be GITS_TRANSLATER, for x86 it would be the
> fixed MSI doorbell. This way the host wouldn't need to inspect IOMMU
> mappings when handling writes to PCI MSI-X tables.
> 

What do you mean by "fixed MSI doorbell"? The PCI MSI-X table is part of a
PCI MMIO BAR. Accessing it is just a memory virtualization issue (e.g.
trapped by KVM and then emulated in Qemu) on x86; it's not an IOMMU
problem. I guess you may mean the same thing but want to double-confirm
here given the terminology confusion. Or do you mean the interrupt
triggered by the IOMMU itself?
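(For concreteness on the terminology: the "fixed MSI doorbell" on x86 refers
to the architectural interrupt address window decoded by the LAPIC, not a RAM
or BAR address. A simplified sketch of the encoding, per the Intel SDM,
assuming physical destination mode, fixed delivery, edge trigger:)

#include <stdint.h>

#define MSI_ADDR_BASE 0xFEE00000u   /* architectural MSI address window */

/* Any DMA write by a device into this window is treated as an MSI. */
static inline uint32_t msi_addr(uint8_t dest_apic_id)
{
    return MSI_ADDR_BASE | ((uint32_t)dest_apic_id << 12);
}

static inline uint32_t msi_data(uint8_t vector)
{
    return vector;  /* low byte: vector; other fields zero in this mode */
}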

Thanks
Kevin


Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-06-27 Thread Auger Eric
Hi,

On 27/06/2017 10:46, Will Deacon wrote:
> Hi Eric,
> 
> On Tue, Jun 27, 2017 at 08:38:48AM +0200, Auger Eric wrote:
>> On 26/06/2017 18:13, Jean-Philippe Brucker wrote:
>>> On 26/06/17 09:22, Auger Eric wrote:
 On 19/06/2017 12:15, Jean-Philippe Brucker wrote:
> On 19/06/17 08:54, Bharat Bhushan wrote:
>> I started adding replay in virtio-iommu and came across how MSI
>> interrupts work with VFIO.
>> I understand that on Intel this works differently but vsmmu will have
>> the same requirement.
>> kvm-msi-irq-routes are added using the MSI address that is yet to be
>> translated by the viommu, not the final translated address.
>> Currently the irqfd framework does not know about emulated iommus
>> (virtio-iommu, vsmmuv3/vintel-iommu).
>> So in my view we have the following options:
>> - Program the translated address when setting up the kvm-msi-irq-route
>> - Route the interrupts via QEMU, which is bad for performance
>> - vhost-virtio-iommu may solve the problem in the long term
>>
>> Is there any better option I am missing?
>
> Since we're on the topic of MSIs... I'm currently trying to figure out how
> we'll handle MSIs in the nested translation mode, where the guest manages
> S1 page tables and the host doesn't know about GVA->GPA translation.

 I have a question about the "nested translation mode" terminology. Do
 you mean in that case you use stage 1 + stage 2 of the physical IOMMU
 (which the ARM spec normally advises or was meant for), or do you mean
 stage 1 implemented in the vIOMMU and stage 2 implemented in the pIOMMU?
 At the moment my understanding is that for VFIO integration the pIOMMU
 uses a single stage combining both the stage 1 and stage 2 mappings, but
 the host is not aware of those 2 stages.
>>>
>>> Yes at the moment the VMM merges stage-1 (GVA->GPA) from the guest with
>>> its stage-2 mappings (GPA->HPA) and creates a stage-2 mapping (GVA->HPA)
>>> in the pIOMMU via VFIO_IOMMU_MAP_DMA. stage-1 is disabled in the pIOMMU.
>>>
>>> What I mean by "nested mode" is stage 1 + stage 2 in the physical IOMMU.
>>> I'm referring to the "Page Table Sharing" bit of the Future Work in the
>>> initial RFC for virtio-iommu [1], and also PASID table binding [2] in the
>>> case of vSMMU. In that mode, stage-1 page tables in the pIOMMU are managed
>>> by the guest, and the VMM only maps GPA->HPA.
>>
>> OK, I need to read that part more thoroughly. I was told in the past
>> that handling nested stages at the pIOMMU was considered too complex and
>> difficult to maintain. But the SMMU architecture is definitely devised
>> for that. Michael asked why we did not already use that for vsmmu
>> (nested stages are used on the AMD IOMMU, I think).
> 
> Curious -- but what gave you that idea? I worry that something I might have
> said wasn't clear or has been misunderstood.

Lobby discussions that I might not have correctly understood ;-) Anyway,
that's a new direction that I am happy to investigate then.

Thanks

Eric
> 
> Will
> 



Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-06-27 Thread Will Deacon
Hi Eric,

On Tue, Jun 27, 2017 at 08:38:48AM +0200, Auger Eric wrote:
> On 26/06/2017 18:13, Jean-Philippe Brucker wrote:
> > On 26/06/17 09:22, Auger Eric wrote:
> >> On 19/06/2017 12:15, Jean-Philippe Brucker wrote:
> >>> On 19/06/17 08:54, Bharat Bhushan wrote:
>  I started adding replay in virtio-iommu and came across how MSI
>  interrupts work with VFIO.
>  I understand that on Intel this works differently but vsmmu will have
>  the same requirement.
>  kvm-msi-irq-routes are added using the MSI address that is yet to be
>  translated by the viommu, not the final translated address.
>  Currently the irqfd framework does not know about emulated iommus
>  (virtio-iommu, vsmmuv3/vintel-iommu).
>  So in my view we have the following options:
>  - Program the translated address when setting up the kvm-msi-irq-route
>  - Route the interrupts via QEMU, which is bad for performance
>  - vhost-virtio-iommu may solve the problem in the long term
> 
>  Is there any better option I am missing?
> >>>
> >>> Since we're on the topic of MSIs... I'm currently trying to figure out how
> >>> we'll handle MSIs in the nested translation mode, where the guest manages
> >>> S1 page tables and the host doesn't know about GVA->GPA translation.
> >>
> >> I have a question about the "nested translation mode" terminology. Do
> >> you mean in that case you use stage 1 + stage 2 of the physical IOMMU
> >> (which the ARM spec normally advises or was meant for), or do you mean
> >> stage 1 implemented in the vIOMMU and stage 2 implemented in the
> >> pIOMMU? At the moment my understanding is that for VFIO integration
> >> the pIOMMU uses a single stage combining both the stage 1 and stage 2
> >> mappings, but the host is not aware of those 2 stages.
> > 
> > Yes at the moment the VMM merges stage-1 (GVA->GPA) from the guest with
> > its stage-2 mappings (GPA->HPA) and creates a stage-2 mapping (GVA->HPA)
> > in the pIOMMU via VFIO_IOMMU_MAP_DMA. stage-1 is disabled in the pIOMMU.
> > 
> > What I mean by "nested mode" is stage 1 + stage 2 in the physical IOMMU.
> > I'm referring to the "Page Table Sharing" bit of the Future Work in the
> > initial RFC for virtio-iommu [1], and also PASID table binding [2] in the
> > case of vSMMU. In that mode, stage-1 page tables in the pIOMMU are managed
> > by the guest, and the VMM only maps GPA->HPA.
> 
> OK, I need to read that part more thoroughly. I was told in the past
> that handling nested stages at the pIOMMU was considered too complex and
> difficult to maintain. But the SMMU architecture is definitely devised
> for that. Michael asked why we did not already use that for vsmmu
> (nested stages are used on the AMD IOMMU, I think).

Curious -- but what gave you that idea? I worry that something I might have
said wasn't clear or has been misunderstood.

Will



Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-06-26 Thread Auger Eric
Hi Jean-Philippe,

On 26/06/2017 18:13, Jean-Philippe Brucker wrote:
> On 26/06/17 09:22, Auger Eric wrote:
>> Hi Jean-Philippe,
>>
>> On 19/06/2017 12:15, Jean-Philippe Brucker wrote:
>>> On 19/06/17 08:54, Bharat Bhushan wrote:
 Hi Eric,

 I started adding replay in virtio-iommu and came across how MSI interrupts
 work with VFIO.
 I understand that on Intel this works differently but vsmmu will have the
 same requirement.
 kvm-msi-irq-routes are added using the MSI address that is yet to be
 translated by the viommu, not the final translated address.
 Currently the irqfd framework does not know about emulated iommus
 (virtio-iommu, vsmmuv3/vintel-iommu).
 So in my view we have the following options:
 - Program the translated address when setting up the kvm-msi-irq-route
 - Route the interrupts via QEMU, which is bad for performance
 - vhost-virtio-iommu may solve the problem in the long term

 Is there any better option I am missing?
>>>
>>> Since we're on the topic of MSIs... I'm currently trying to figure out how
>>> we'll handle MSIs in the nested translation mode, where the guest manages
>>> S1 page tables and the host doesn't know about GVA->GPA translation.
>>
>> I have a question about the "nested translation mode" terminology. Do
>> you mean in that case you use stage 1 + stage 2 of the physical IOMMU
>> (which the ARM spec normally advises or was meant for), or do you mean
>> stage 1 implemented in the vIOMMU and stage 2 implemented in the pIOMMU?
>> At the moment my understanding is that for VFIO integration the pIOMMU
>> uses a single stage combining both the stage 1 and stage 2 mappings, but
>> the host is not aware of those 2 stages.
> 
> Yes at the moment the VMM merges stage-1 (GVA->GPA) from the guest with
> its stage-2 mappings (GPA->HPA) and creates a stage-2 mapping (GVA->HPA)
> in the pIOMMU via VFIO_IOMMU_MAP_DMA. stage-1 is disabled in the pIOMMU.
> 
> What I mean by "nested mode" is stage 1 + stage 2 in the physical IOMMU.
> I'm referring to the "Page Table Sharing" bit of the Future Work in the
> initial RFC for virtio-iommu [1], and also PASID table binding [2] in the
> case of vSMMU. In that mode, stage-1 page tables in the pIOMMU are managed
> by the guest, and the VMM only maps GPA->HPA.

OK, I need to read that part more thoroughly. I was told in the past
that handling nested stages at the pIOMMU was considered too complex and
difficult to maintain. But the SMMU architecture is definitely devised
for that. Michael asked why we did not already use that for vsmmu
(nested stages are used on the AMD IOMMU, I think).
> 
> Since both s1 and s2 are then enabled in the pIOMMU, MSIs reaching the
> pIOMMU will be translated at s1 then s2. To create nested translation for
> MSIs, I see two solutions:
> 
> A. The GPA of the doorbell that is exposed to the guest is mapped by the
> VMM at S2. This mapping is GPA->(PA of the doorbell) with Dev-nGnRE memory
> attributes. The guest creates a GVA->GPA mapping, then writes GVA in the
> MSI-X tables.
> - If the MSI-X table is emulated (as we currently do), the VMM has to
>   force the host to rewrite the physical MSI-X entry with the GVA.
> - If the MSI-X table is mapped (see [3]), then the guest writes
>   the GVA into the physical MSI-X entry. (How does this work with lazy MSI
>   routing setup, which is based on trapping the MSI-X table?)
> 
> B. The VMM exposes a fake doorbell. Hardware MSI vectors are programmed
> upfront by the host. Since TTB0 is assigned to the guest, the host must
> use TTB1 to create the GVA->GPA mapping.
> 
> Solution B was my proposal (2) below, but I didn't take vSMMU into account
> at the time. I think that for virtual SVM with the vSMMU, the VMM has to
> hand the whole PASID table over to the guest. This is what Intel seems to
> do [2]. Even if we emulated the PASID table instead of handing it over, we
> wouldn't have a way to hide TTB1 from the guest. So with vSMMU we lose
> control over TTB1 and (2) doesn't work.
> 
> I don't really like A, but it might be the only way with vSMMU:
> - Guest maps doorbell at S1,
> - Guest writes the GVA in its virtual MSI-X tables,
> - Host handles the GVA write and reprograms the hardware MSI-X tables,
> - Device issues an MSI, which gets translated at S1+S2, then hits the
>   doorbell,
> - VFIO handles the IRQ, which is forwarded to KVM via IRQFD, finds the
>   corresponding irqchip by GPA, then injects the MSI.

I am about to experiment with A) using vsmmu/VFIO. Please give me a few
days before I answer this part accurately.
> 
>>> I'm also wondering about the benefits of having SW-mapped MSIs in the
>>> guest. It seems unavoidable for vSMMU since that's what a physical system
>>> would do. But in a paravirtualized solution there doesn't seem to be any
>>> compelling reason for having the guest map MSI doorbells.
>>
>> If I understand correctly the virtio-iommu would not expose MSI reserved
>> regions (saying it does not translate MSIs). In that case the VFIO

Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-06-26 Thread Jean-Philippe Brucker
On 26/06/17 09:22, Auger Eric wrote:
> Hi Jean-Philippe,
> 
> On 19/06/2017 12:15, Jean-Philippe Brucker wrote:
>> On 19/06/17 08:54, Bharat Bhushan wrote:
>>> Hi Eric,
>>>
>>> I started adding replay in virtio-iommu and came across how MSI interrupts
>>> work with VFIO.
>>> I understand that on Intel this works differently but vsmmu will have
>>> the same requirement.
>>> kvm-msi-irq-routes are added using the MSI address that is yet to be
>>> translated by the viommu, not the final translated address.
>>> Currently the irqfd framework does not know about emulated iommus
>>> (virtio-iommu, vsmmuv3/vintel-iommu).
>>> So in my view we have the following options:
>>> - Program the translated address when setting up the kvm-msi-irq-route
>>> - Route the interrupts via QEMU, which is bad for performance
>>> - vhost-virtio-iommu may solve the problem in the long term
>>>
>>> Is there any better option I am missing?
>>
>> Since we're on the topic of MSIs... I'm currently trying to figure out how
>> we'll handle MSIs in the nested translation mode, where the guest manages
>> S1 page tables and the host doesn't know about GVA->GPA translation.
> 
> I have a question about the "nested translation mode" terminology. Do
> you mean in that case you use stage 1 + stage 2 of the physical IOMMU
> (which the ARM spec normally advises or was meant for), or do you mean
> stage 1 implemented in the vIOMMU and stage 2 implemented in the pIOMMU?
> At the moment my understanding is that for VFIO integration the pIOMMU
> uses a single stage combining both the stage 1 and stage 2 mappings, but
> the host is not aware of those 2 stages.

Yes at the moment the VMM merges stage-1 (GVA->GPA) from the guest with
its stage-2 mappings (GPA->HPA) and creates a stage-2 mapping (GVA->HPA)
in the pIOMMU via VFIO_IOMMU_MAP_DMA. stage-1 is disabled in the pIOMMU.

What I mean by "nested mode" is stage 1 + stage 2 in the physical IOMMU.
I'm referring to the "Page Table Sharing" bit of the Future Work in the
initial RFC for virtio-iommu [1], and also PASID table binding [2] in the
case of vSMMU. In that mode, stage-1 page tables in the pIOMMU are managed
by the guest, and the VMM only maps GPA->HPA.

Since both s1 and s2 are then enabled in the pIOMMU, MSIs reaching the
pIOMMU will be translated at s1 then s2. To create nested translation for
MSIs, I see two solutions:

A. The GPA of the doorbell that is exposed to the guest is mapped by the
VMM at S2. This mapping is GPA->(PA of the doorbell) with Dev-nGnRE memory
attributes. The guest creates a GVA->GPA mapping, then writes GVA in the
MSI-X tables.
- If the MSI-X table is emulated (as we currently do), the VMM has to
  force the host to rewrite the physical MSI-X entry with the GVA.
- If the MSI-X table is mapped (see [3]), then the guest writes
  the GVA into the physical MSI-X entry. (How does this work with lazy MSI
  routing setup, which is based on trapping the MSI-X table?)

B. The VMM exposes a fake doorbell. Hardware MSI vectors are programmed
upfront by the host. Since TTB0 is assigned to the guest, the host must
use TTB1 to create the GVA->GPA mapping.

Solution B was my proposal (2) below, but I didn't take vSMMU into account
at the time. I think that for virtual SVM with the vSMMU, the VMM has to
hand the whole PASID table over to the guest. This is what Intel seems to
do [2]. Even if we emulated the PASID table instead of handing it over, we
wouldn't have a way to hide TTB1 from the guest. So with vSMMU we lose
control over TTB1 and (2) doesn't work.

I don't really like A, but it might be the only way with vSMMU:
- Guest maps doorbell at S1,
- Guest writes the GVA in its virtual MSI-X tables,
- Host handles the GVA write and reprograms the hardware MSI-X tables,
- Device issues an MSI, which gets translated at S1+S2, then hits the
  doorbell,
- VFIO handles the IRQ, which is forwarded to KVM via IRQFD, finds the
  corresponding irqchip by GPA, then injects the MSI.
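(A user-space sketch of the last two steps, to show what the route actually
carries. The structures and KVM_SET_GSI_ROUTING are the real KVM UAPI; the
open question in this thread is precisely which address msi_addr must be.
KVM matches the doorbell write by GPA, so the route has to carry an address
the host irqchip recognizes, not the gIOVA/GVA the guest programmed.)

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int set_msi_route(int vm_fd, unsigned int gsi,
                         uint64_t msi_addr, uint32_t msi_data)
{
    struct {
        struct kvm_irq_routing hdr;        /* nr/flags header */
        struct kvm_irq_routing_entry e;    /* one MSI entry */
    } r;

    memset(&r, 0, sizeof(r));
    r.hdr.nr = 1;
    r.e.gsi  = gsi;
    r.e.type = KVM_IRQ_ROUTING_MSI;
    r.e.u.msi.address_lo = (uint32_t)msi_addr;
    r.e.u.msi.address_hi = (uint32_t)(msi_addr >> 32);
    r.e.u.msi.data       = msi_data;

    return ioctl(vm_fd, KVM_SET_GSI_ROUTING, &r.hdr);
}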

>> I'm also wondering about the benefits of having SW-mapped MSIs in the
>> guest. It seems unavoidable for vSMMU since that's what a physical system
>> would do. But in a paravirtualized solution there doesn't seem to be any
>> compelling reason for having the guest map MSI doorbells.
> 
> If I understand correctly the virtio-iommu would not expose MSI reserved
> regions (saying it does not translate MSIs). In that case the VFIO
> kernel code will not check irq_domain_check_msi_remap() but will
> check iommu_capable(bus, IOMMU_CAP_INTR_REMAP) instead. Would the
> virtio-iommu expose this capability? How would it isolate MSI
> transactions from different devices?

Yes, the virtio-iommu would expose IOMMU_CAP_INTR_REMAP to keep VFIO
happy. But the virtio-iommu device wouldn't do any MSI isolation. We have
a software-mapped doorbell on ARM because MSI transactions are translated by
the SMMU before reaching the GIC, which then performs device isolation.
With virtio-iommu on ARM, the address translation stage seems unnecessary
if you already hav

Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-06-26 Thread Auger Eric
Hi Jean-Philippe,

On 19/06/2017 12:15, Jean-Philippe Brucker wrote:
> On 19/06/17 08:54, Bharat Bhushan wrote:
>> Hi Eric,
>>
>> I started adding replay in virtio-iommu and came across how MSI interrupts
>> work with VFIO.
>> I understand that on Intel this works differently but vsmmu will have the
>> same requirement.
>> kvm-msi-irq-routes are added using the MSI address that is yet to be
>> translated by the viommu, not the final translated address.
>> Currently the irqfd framework does not know about emulated iommus
>> (virtio-iommu, vsmmuv3/vintel-iommu).
>> So in my view we have the following options:
>> - Program the translated address when setting up the kvm-msi-irq-route
>> - Route the interrupts via QEMU, which is bad for performance
>> - vhost-virtio-iommu may solve the problem in the long term
>>
>> Is there any better option I am missing?
> 
> Since we're on the topic of MSIs... I'm currently trying to figure out how
> we'll handle MSIs in the nested translation mode, where the guest manages
> S1 page tables and the host doesn't know about GVA->GPA translation.

I have a question about the "nested translation mode" terminology. Do
you mean in that case you use stage 1 + stage 2 of the physical IOMMU
(which the ARM spec normally advises or was meant for), or do you mean
stage 1 implemented in the vIOMMU and stage 2 implemented in the pIOMMU?
At the moment my understanding is that for VFIO integration the pIOMMU
uses a single stage combining both the stage 1 and stage 2 mappings, but
the host is not aware of those 2 stages.
> 
> I'm also wondering about the benefits of having SW-mapped MSIs in the
> guest. It seems unavoidable for vSMMU since that's what a physical system
> would do. But in a paravirtualized solution there doesn't seem to be any
> compelling reason for having the guest map MSI doorbells.

If I understand correctly the virtio-iommu would not expose MSI reserved
regions (saying it does not translate MSIs). In that case the VFIO
kernel code will not check irq_domain_check_msi_remap() but will
check iommu_capable(bus, IOMMU_CAP_INTR_REMAP) instead. Would the
virtio-iommu expose this capability? How would it isolate MSI
transactions from different devices?
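(For reference, a sketch of the driver-side hook that iommu_capable() ends up
consulting, assuming virtio-iommu chose to claim the capability as
Jean-Philippe suggests elsewhere in the thread. The iommu_ops layout follows
the kernel API of that era; the viommu_* names are made up.)

#include <linux/iommu.h>

static bool viommu_capable(enum iommu_cap cap)
{
    switch (cap) {
    case IOMMU_CAP_INTR_REMAP:
        /* Claim interrupt isolation so VFIO is satisfied; the actual
         * isolation must then come from the host irqchip, since the
         * virtio-iommu device itself does no MSI translation. */
        return true;
    default:
        return false;
    }
}

static const struct iommu_ops viommu_ops = {
    .capable = viommu_capable,
    /* .map/.unmap and the rest of the ops elided */
};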

Thanks

Eric


> These addresses
> are never accessed directly, they are only used for setting up IRQ routing
> (at least on kvmtool). So here's what I'd like to have. Note that I
> haven't investigated the feasibility in Qemu yet, I don't know how it
> deals with MSIs.
> 
> (1) Guest uses the guest-physical MSI doorbell when setting up MSIs. For
> ARM with GICv3 this would be GITS_TRANSLATER, for x86 it would be the
> fixed MSI doorbell. This way the host wouldn't need to inspect IOMMU
> mappings when handling writes to PCI MSI-X tables.
> 
> (2) In nested mode (with VFIO) on ARM, the pSMMU will still translate MSIs
> via S1+S2. Therefore the host needs to map MSIs at stage-1, and I'd like
> to use the (currently unused) TTB1 tables in that case. In addition, using
> TTB1 would be useful for SVM, when endpoints write MSIs with PASIDs and we
> don't want to map them in user address space.
> 
> This means that the host needs to use different doorbell addresses in
> nested mode, since it would be unable to map at S1 the same IOVA as S2
> (TTB1 manages negative addresses - 0x, which are not
> representable as GPAs.) It also requires using 32-bit page tables for
> endpoints that are not capable of using 64-bit MSI addresses.
> 
> 
> Now (2) is entirely handled in the host kernel, so it's more a Linux
> question. But does (1) seem acceptable for virtio-iommu in Qemu?
> 
> Thanks,
> Jean
> 



Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-06-26 Thread Auger Eric
Hi Bharat,

On 19/06/2017 09:54, Bharat Bhushan wrote:
> Hi Eric,
> 
> I started adding replay in virtio-iommu and came across how MSI interrupts
> work with VFIO.
> I understand that on Intel this works differently but vsmmu will have the
> same requirement.
> kvm-msi-irq-routes are added using the MSI address that is yet to be
> translated by the viommu, not the final translated address.
> Currently the irqfd framework does not know about emulated iommus
> (virtio-iommu, vsmmuv3/vintel-iommu).
> So in my view we have the following options:
> - Program the translated address when setting up the kvm-msi-irq-route
> - Route the interrupts via QEMU, which is bad for performance
> - vhost-virtio-iommu may solve the problem in the long term

Sorry for the delay. With regard to the vsmmuv3/vfio integration I think
we need to use the guest physical address, otherwise the MSI address will
not be recognized as an MSI doorbell.

Also, the fact that on ARM we map the MSI doorbell causes an assert in
vfio_get_vaddr(), as the vITS doorbell is not a RAM region. We will need
to handle this specifically.
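(A sketch of the kind of guard that path needs: skip, rather than abort on,
IOTLB entries whose target GPA is not guest RAM, such as the vITS doorbell
page. Shapes loosely follow QEMU's hw/vfio/common.c; treat it as an
illustration, not the actual fix.)

#include "qemu/osdep.h"
#include "exec/memory.h"

static bool viotlb_get_vaddr(IOMMUTLBEntry *iotlb, void **vaddr)
{
    MemoryRegion *mr;
    hwaddr xlat, len = iotlb->addr_mask + 1;

    mr = address_space_translate(&address_space_memory,
                                 iotlb->translated_addr,
                                 &xlat, &len, iotlb->perm & IOMMU_WO);
    if (!memory_region_is_ram(mr)) {
        /* MSI doorbell or other MMIO: nothing for VFIO to mmap here,
         * so handle it specifically instead of asserting. */
        return false;
    }
    *vaddr = memory_region_get_ram_ptr(mr) + xlat;
    return true;
}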

Besides I have not looked specifically at the virtio-iommu/vfio
integration yet.

Thanks

Eric
> 
> Is there any better option I am missing?
> 
> Thanks
> -Bharat
> 
>> -Original Message-
>> From: Auger Eric [mailto:eric.au...@redhat.com]
>> Sent: Friday, June 09, 2017 5:24 PM
>> To: Bharat Bhushan ;
>> eric.auger@gmail.com; peter.mayd...@linaro.org;
>> alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org;
>> qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com
>> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
>> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
>> robin.mur...@arm.com; christoffer.d...@linaro.org
>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
>>
>> Hi Bharat,
>>
>> On 09/06/2017 13:30, Bharat Bhushan wrote:
>>> Hi Eric,
>>>
>>>> -Original Message-
>>>> From: Auger Eric [mailto:eric.au...@redhat.com]
>>>> Sent: Friday, June 09, 2017 12:14 PM
>>>> To: Bharat Bhushan ;
>>>> eric.auger@gmail.com; peter.mayd...@linaro.org;
>>>> alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org;
>>>> qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com
>>>> Cc: will.dea...@arm.com; robin.mur...@arm.com;
>> kevin.t...@intel.com;
>>>> marc.zyng...@arm.com; christoffer.d...@linaro.org;
>>>> drjo...@redhat.com; w...@redhat.com; t...@semihalf.com
>>>> Subject: Re: [RFC v2 0/8] VIRTIO-IOMMU device
>>>>
>>>> Hi Bharat,
>>>>
>>>> On 09/06/2017 08:16, Bharat Bhushan wrote:
>>>>> Hi Eric,
>>>>>
>>>>>> -Original Message-
>>>>>> From: Eric Auger [mailto:eric.au...@redhat.com]
>>>>>> Sent: Wednesday, June 07, 2017 9:31 PM
>>>>>> To: eric.auger@gmail.com; eric.au...@redhat.com;
>>>>>> peter.mayd...@linaro.org; alex.william...@redhat.com;
>>>> m...@redhat.com;
>>>>>> qemu-...@nongnu.org; qemu-devel@nongnu.org; jean-
>>>>>> philippe.bruc...@arm.com
>>>>>> Cc: will.dea...@arm.com; robin.mur...@arm.com;
>>>> kevin.t...@intel.com;
>>>>>> marc.zyng...@arm.com; christoffer.d...@linaro.org;
>>>>>> drjo...@redhat.com; w...@redhat.com; t...@semihalf.com; Bharat
>>>> Bhushan
>>>>>> 
>>>>>> Subject: [RFC v2 0/8] VIRTIO-IOMMU device
>>>>>>
>>>>>> This series implements the virtio-iommu device. This is a proof of
>>>>>> concept based on the virtio-iommu specification written by
>>>>>> Jean-Philippe
>>>> Brucker [1].
>>>>>> This was tested with a guest using the virtio-iommu driver [2] and
>>>>>> exposed with a virtio-net-pci using dma ops.
>>>>>>
>>>>>> The device gets instantiated using the "-device virtio-iommu-device"
>>>>>> option. It currently works with ARM virt machine only as the
>>>>>> machine must handle the dt binding between the virtio-mmio "iommu"
>>>>>> node and the PCI host bridge node. ACPI booting is not yet supported.
>>>>>>
> >>>>>> This should allow starting some benchmarking activities against
>>>>>> pure emulated IOMMU (especially ARM SMMU).
>>>>>
> >>> I am testing this on ARM64 and see below continuous error prints:

Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-06-19 Thread Jean-Philippe Brucker
On 19/06/17 08:54, Bharat Bhushan wrote:
> Hi Eric,
> 
> I started adding replay in virtio-iommu and came across how MSI interrupts
> work with VFIO.
> I understand that on Intel this works differently but vsmmu will have the
> same requirement.
> kvm-msi-irq-routes are added using the MSI address that is yet to be
> translated by the viommu, not the final translated address.
> Currently the irqfd framework does not know about emulated iommus
> (virtio-iommu, vsmmuv3/vintel-iommu).
> So in my view we have the following options:
> - Program the translated address when setting up the kvm-msi-irq-route
> - Route the interrupts via QEMU, which is bad for performance
> - vhost-virtio-iommu may solve the problem in the long term
> 
> Is there any better option I am missing?

Since we're on the topic of MSIs... I'm currently trying to figure out how
we'll handle MSIs in the nested translation mode, where the guest manages
S1 page tables and the host doesn't know about GVA->GPA translation.

I'm also wondering about the benefits of having SW-mapped MSIs in the
guest. It seems unavoidable for vSMMU since that's what a physical system
would do. But in a paravirtualized solution there doesn't seem to be any
compelling reason for having the guest map MSI doorbells. These addresses
are never accessed directly, they are only used for setting up IRQ routing
(at least on kvmtool). So here's what I'd like to have. Note that I
haven't investigated the feasibility in Qemu yet, I don't know how it
deals with MSIs.

(1) Guest uses the guest-physical MSI doorbell when setting up MSIs. For
ARM with GICv3 this would be GITS_TRANSLATER, for x86 it would be the
fixed MSI doorbell. This way the host wouldn't need to inspect IOMMU
mappings when handling writes to PCI MSI-X tables.

(2) In nested mode (with VFIO) on ARM, the pSMMU will still translate MSIs
via S1+S2. Therefore the host needs to map MSIs at stage-1, and I'd like
to use the (currently unused) TTB1 tables in that case. In addition, using
TTB1 would be useful for SVM, when endpoints write MSIs with PASIDs and we
don't want to map them in user address space.

This means that the host needs to use different doorbell addresses in
nested mode, since it would be unable to map at S1 the same IOVA as S2
(TTB1 manages negative addresses - 0x, which are not
representable as GPAs.) It also requires using 32-bit page tables for
endpoints that are not capable of using 64-bit MSI addresses.


Now (2) is entirely handled in the host kernel, so it's more a Linux
question. But does (1) seem acceptable for virtio-iommu in Qemu?

Thanks,
Jean



Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-06-19 Thread Bharat Bhushan
Hi Eric,

I started adding replay in virtio-iommu and came across how MSI interrupts
work with VFIO.
I understand that on Intel this works differently but vsmmu will have the
same requirement.
kvm-msi-irq-routes are added using the MSI address that is yet to be
translated by the viommu, not the final translated address.
Currently the irqfd framework does not know about emulated iommus
(virtio-iommu, vsmmuv3/vintel-iommu).
So in my view we have the following options:
- Program the translated address when setting up the kvm-msi-irq-route
- Route the interrupts via QEMU, which is bad for performance
- vhost-virtio-iommu may solve the problem in the long term

Is there any better option I am missing?

Thanks
-Bharat

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: Friday, June 09, 2017 5:24 PM
> To: Bharat Bhushan ;
> eric.auger@gmail.com; peter.mayd...@linaro.org;
> alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org;
> qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com
> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
> robin.mur...@arm.com; christoffer.d...@linaro.org
> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> 
> Hi Bharat,
> 
> On 09/06/2017 13:30, Bharat Bhushan wrote:
> > Hi Eric,
> >
> >> -Original Message-
> >> From: Auger Eric [mailto:eric.au...@redhat.com]
> >> Sent: Friday, June 09, 2017 12:14 PM
> >> To: Bharat Bhushan ;
> >> eric.auger@gmail.com; peter.mayd...@linaro.org;
> >> alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org;
> >> qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com
> >> Cc: will.dea...@arm.com; robin.mur...@arm.com;
> kevin.t...@intel.com;
> >> marc.zyng...@arm.com; christoffer.d...@linaro.org;
> >> drjo...@redhat.com; w...@redhat.com; t...@semihalf.com
> >> Subject: Re: [RFC v2 0/8] VIRTIO-IOMMU device
> >>
> >> Hi Bharat,
> >>
> >> On 09/06/2017 08:16, Bharat Bhushan wrote:
> >>> Hi Eric,
> >>>
> >>>> -Original Message-
> >>>> From: Eric Auger [mailto:eric.au...@redhat.com]
> >>>> Sent: Wednesday, June 07, 2017 9:31 PM
> >>>> To: eric.auger@gmail.com; eric.au...@redhat.com;
> >>>> peter.mayd...@linaro.org; alex.william...@redhat.com;
> >> m...@redhat.com;
> >>>> qemu-...@nongnu.org; qemu-devel@nongnu.org; jean-
> >>>> philippe.bruc...@arm.com
> >>>> Cc: will.dea...@arm.com; robin.mur...@arm.com;
> >> kevin.t...@intel.com;
> >>>> marc.zyng...@arm.com; christoffer.d...@linaro.org;
> >>>> drjo...@redhat.com; w...@redhat.com; t...@semihalf.com; Bharat
> >> Bhushan
> >>>> 
> >>>> Subject: [RFC v2 0/8] VIRTIO-IOMMU device
> >>>>
> >>>> This series implements the virtio-iommu device. This is a proof of
> >>>> concept based on the virtio-iommu specification written by
> >>>> Jean-Philippe
> >> Brucker [1].
> >>>> This was tested with a guest using the virtio-iommu driver [2] and
> >>>> exposed with a virtio-net-pci using dma ops.
> >>>>
> >>>> The device gets instantiated using the "-device virtio-iommu-device"
> >>>> option. It currently works with ARM virt machine only as the
> >>>> machine must handle the dt binding between the virtio-mmio "iommu"
> >>>> node and the PCI host bridge node. ACPI booting is not yet supported.
> >>>>
> >>>> This should allow starting some benchmarking activities against
> >>>> pure emulated IOMMU (especially ARM SMMU).
> >>>
> >>> I am testing this on ARM64 and see below continuous error prints:
> >>>
> >>>   virtio_iommu_translate sid=8 is not known!!
> >>>   virtio_iommu_translate sid=8 is not known!!
> >>>   virtio_iommu_translate sid=8 is not known!!
> >>>   virtio_iommu_translate sid=8 is not known!!
> >>>   virtio_iommu_translate sid=8 is not known!!
> >>>   virtio_iommu_translate sid=8 is not known!!
> >>>   virtio_iommu_translate sid=8 is not known!!
> >>>   virtio_iommu_translate sid=8 is not known!!
> >>>   virtio_iommu_translate sid=8 is not known!!
> >>>   virtio_iommu_translate sid=8 is not known!!
> >>>
> >>>
> >>> Also in guest I do not see a device-tree node for virtio-iommu.
> >> do you mean the virtio-mmio node with the #iommu-cells property?

Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-06-09 Thread Auger Eric
Hi,

On 09/06/2017 13:30, Bharat Bhushan wrote:
> Hi Eric,
> 
>> -Original Message-
>> From: Auger Eric [mailto:eric.au...@redhat.com]
>> Sent: Friday, June 09, 2017 12:14 PM
>> To: Bharat Bhushan ;
>> eric.auger@gmail.com; peter.mayd...@linaro.org;
>> alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org;
>> qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com
>> Cc: will.dea...@arm.com; robin.mur...@arm.com; kevin.t...@intel.com;
>> marc.zyng...@arm.com; christoffer.d...@linaro.org; drjo...@redhat.com;
>> w...@redhat.com; t...@semihalf.com
>> Subject: Re: [RFC v2 0/8] VIRTIO-IOMMU device
>>
>> Hi Bharat,
>>
>> On 09/06/2017 08:16, Bharat Bhushan wrote:
>>> Hi Eric,
>>>
 -Original Message-
 From: Eric Auger [mailto:eric.au...@redhat.com]
 Sent: Wednesday, June 07, 2017 9:31 PM
 To: eric.auger@gmail.com; eric.au...@redhat.com;
 peter.mayd...@linaro.org; alex.william...@redhat.com;
>> m...@redhat.com;
 qemu-...@nongnu.org; qemu-devel@nongnu.org; jean-
 philippe.bruc...@arm.com
 Cc: will.dea...@arm.com; robin.mur...@arm.com;
>> kevin.t...@intel.com;
 marc.zyng...@arm.com; christoffer.d...@linaro.org;
 drjo...@redhat.com; w...@redhat.com; t...@semihalf.com; Bharat
>> Bhushan
 
 Subject: [RFC v2 0/8] VIRTIO-IOMMU device

 This series implements the virtio-iommu device. This is a proof of
 concept based on the virtio-iommu specification written by Jean-Philippe
>> Brucker [1].
 This was tested with a guest using the virtio-iommu driver [2] and
 exposed with a virtio-net-pci using dma ops.

 The device gets instantiated using the "-device virtio-iommu-device"
 option. It currently works with ARM virt machine only as the machine
 must handle the dt binding between the virtio-mmio "iommu" node and
 the PCI host bridge node. ACPI booting is not yet supported.

For those who may play with the device, this was tested with a
virtio-net-pci device using the following command:

-device
virtio-net-pci,netdev=tap0,mac=,iommu_platform,disable-modern=off,disable-legacy=on
\

I tried to run the guest with a virtio-blk-pci device using
-device
virtio-blk-pci,scsi=off,drive=<>,iommu_platform=off,disable-modern=off,disable-legacy=on,werror=stop,rerror=stop
\

and the guest does *not* boot whereas it does without any iommu.

However I am not sure the issue is related to the actual virtual iommu
device, as I have the exact same issue with the emulated vsmmuv3 device
(this was originally reported by Tomasz). So the issue may come from the
surrounding infrastructure. To be investigated further ...

Thanks

Eric


 This should allow starting some benchmarking activities against pure
 emulated IOMMU (especially ARM SMMU).
>>>
>>> I am testing this on ARM64 and see below continuous error prints:
>>>
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>>
>>>
>>> Also in guest I do not see a device-tree node for virtio-iommu.
>> do you mean the virtio-mmio node with the #iommu-cells property?
>>
>> This one is created statically by the virt machine. I would be surprised
>> if it were not there. Are you using the virt (= virt-2.10) machine?
>> Machines before that do not support its instantiation.
>>
>> Please can you add a printf in hw/arm/virt.c create_virtio_mmio() at the
>> moment when this node is created. Also you can add a printf in
>> bind_virtio_iommu_device() to make sure the binding with the PCI host
>> bridge is added on machine init done.
>>
>> Also worth checking: CONFIG_VIRTIO_IOMMU=y on the guest side.
> 
> It works on my side. The driver config was disabled, and I was also using
> a guest kernel which did not have deferred probing. Now after fixing both
> it works on my side.
> I placed some prints to see that dma-map maps regions in virtio-iommu; it
> uses the emulated iommu.
> 
> I will continue to add VFIO support on top of this and do more testing!
> 
> Thanks
> -Bharat
> 
>>
>> Thanks
>>
>> Eric
>>
>>> I am using the qemu tree you mentioned below and the iommu-driver
>>> patches published by Jean-P.
>>> The Qemu command line has the additional "-device virtio-iommu-device".
>>> What am I missing?
>>
>>
>>>
>>> Thanks
>>> -Bharat
>>>

 Best Regards

 Eric

 This series can be found at:
 https://github.com/eauger/qemu/tree/virtio-iommu-rfcv2

 References:
 [1] [RFC 0/3] virtio-iommu: a paravirtualized IOMMU, [2] [RFC PATCH
 linux]
 iommu: Add virtio-iommu driver [3] [RFC PATCH kvmtool 00/15] Add
 virtio- iommu

 History:
>>

Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-06-09 Thread Auger Eric
Hi Bharat,

On 09/06/2017 13:30, Bharat Bhushan wrote:
> Hi Eric,
> 
>> -Original Message-
>> From: Auger Eric [mailto:eric.au...@redhat.com]
>> Sent: Friday, June 09, 2017 12:14 PM
>> To: Bharat Bhushan ;
>> eric.auger@gmail.com; peter.mayd...@linaro.org;
>> alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org;
>> qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com
>> Cc: will.dea...@arm.com; robin.mur...@arm.com; kevin.t...@intel.com;
>> marc.zyng...@arm.com; christoffer.d...@linaro.org; drjo...@redhat.com;
>> w...@redhat.com; t...@semihalf.com
>> Subject: Re: [RFC v2 0/8] VIRTIO-IOMMU device
>>
>> Hi Bharat,
>>
>> On 09/06/2017 08:16, Bharat Bhushan wrote:
>>> Hi Eric,
>>>
 -Original Message-
 From: Eric Auger [mailto:eric.au...@redhat.com]
 Sent: Wednesday, June 07, 2017 9:31 PM
 To: eric.auger@gmail.com; eric.au...@redhat.com;
 peter.mayd...@linaro.org; alex.william...@redhat.com;
>> m...@redhat.com;
 qemu-...@nongnu.org; qemu-devel@nongnu.org; jean-
 philippe.bruc...@arm.com
 Cc: will.dea...@arm.com; robin.mur...@arm.com;
>> kevin.t...@intel.com;
 marc.zyng...@arm.com; christoffer.d...@linaro.org;
 drjo...@redhat.com; w...@redhat.com; t...@semihalf.com; Bharat
>> Bhushan
 
 Subject: [RFC v2 0/8] VIRTIO-IOMMU device

 This series implements the virtio-iommu device. This is a proof of
 concept based on the virtio-iommu specification written by Jean-Philippe
>> Brucker [1].
 This was tested with a guest using the virtio-iommu driver [2] and
 exposed with a virtio-net-pci using dma ops.

 The device gets instantiated using the "-device virtio-iommu-device"
 option. It currently works with ARM virt machine only as the machine
 must handle the dt binding between the virtio-mmio "iommu" node and
 the PCI host bridge node. ACPI booting is not yet supported.

 This should allow starting some benchmarking activities against pure
 emulated IOMMU (especially ARM SMMU).
>>>
>>> I am testing this on ARM64 and see below continuous error prints:
>>>
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>> virtio_iommu_translate sid=8 is not known!!
>>>
>>>
>>> Also in guest I do not see a device-tree node for virtio-iommu.
>> do you mean the virtio-mmio node with the #iommu-cells property?
>>
>> This one is created statically by the virt machine. I would be surprised
>> if it were not there. Are you using the virt (= virt-2.10) machine?
>> Machines before that do not support its instantiation.
>>
>> Please can you add a printf in hw/arm/virt.c create_virtio_mmio() at the
>> moment when this node is created. Also you can add a printf in
>> bind_virtio_iommu_device() to make sure the binding with the PCI host
>> bridge is added on machine init done.
>>
>> Also worth checking: CONFIG_VIRTIO_IOMMU=y on the guest side.
> 
> It works on my side.
Great.

 The driver config was disabled, and I was also using a guest kernel which
did not have deferred probing.
Yes, I did not mention in my cover letter that the guest I have been using
is based on Jean-Philippe's branch, featuring deferred IOMMU probing. I
have not tried with an upstream guest yet.
 Now after fixing both it works on my side
> I placed some prints to see that dma-map maps regions in virtio-iommu; it
> uses the emulated iommu.
> 
> I will continue to add VFIO support on top of this and do more testing!

OK. I will do the VFIO integration first on the vsmmuv3 device, as I have
already prepared the VFIO replay, and hopefully we will sync ;-)

Thanks

Eric
> 
> Thanks
> -Bharat
> 
>>
>> Thanks
>>
>> Eric
>>
>>> I am using the qemu tree you mentioned below and the iommu-driver
>>> patches published by Jean-P.
>>> The Qemu command line has the additional "-device virtio-iommu-device".
>>> What am I missing?
>>
>>
>>>
>>> Thanks
>>> -Bharat
>>>

 Best Regards

 Eric

 This series can be found at:
 https://github.com/eauger/qemu/tree/virtio-iommu-rfcv2

 References:
 [1] [RFC 0/3] virtio-iommu: a paravirtualized IOMMU, [2] [RFC PATCH
 linux]
 iommu: Add virtio-iommu driver [3] [RFC PATCH kvmtool 00/15] Add
 virtio- iommu

 History:
 v1 -> v2:
 - fix redefinition of viommu_as typedef

 Eric Auger (8):
   update-linux-headers: import virtio_iommu.h
   linux-headers: Update for virtio-iommu
   virtio_iommu: add skeleton
   virtio-iommu: Decode the command payload
   virtio_iommu: Add the iommu regions
   virtio-iommu: Implement the translation and commands
  hw/arm/virt: Add 2.10 machine type

Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-06-09 Thread Bharat Bhushan
Hi Eric,

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: Friday, June 09, 2017 12:14 PM
> To: Bharat Bhushan ;
> eric.auger@gmail.com; peter.mayd...@linaro.org;
> alex.william...@redhat.com; m...@redhat.com; qemu-...@nongnu.org;
> qemu-devel@nongnu.org; jean-philippe.bruc...@arm.com
> Cc: will.dea...@arm.com; robin.mur...@arm.com; kevin.t...@intel.com;
> marc.zyng...@arm.com; christoffer.d...@linaro.org; drjo...@redhat.com;
> w...@redhat.com; t...@semihalf.com
> Subject: Re: [RFC v2 0/8] VIRTIO-IOMMU device
> 
> Hi Bharat,
> 
> On 09/06/2017 08:16, Bharat Bhushan wrote:
> > Hi Eric,
> >
> >> -Original Message-
> >> From: Eric Auger [mailto:eric.au...@redhat.com]
> >> Sent: Wednesday, June 07, 2017 9:31 PM
> >> To: eric.auger@gmail.com; eric.au...@redhat.com;
> >> peter.mayd...@linaro.org; alex.william...@redhat.com;
> m...@redhat.com;
> >> qemu-...@nongnu.org; qemu-devel@nongnu.org; jean-
> >> philippe.bruc...@arm.com
> >> Cc: will.dea...@arm.com; robin.mur...@arm.com;
> kevin.t...@intel.com;
> >> marc.zyng...@arm.com; christoffer.d...@linaro.org;
> >> drjo...@redhat.com; w...@redhat.com; t...@semihalf.com; Bharat
> Bhushan
> >> 
> >> Subject: [RFC v2 0/8] VIRTIO-IOMMU device
> >>
> >> This series implements the virtio-iommu device. This is a proof of
> >> concept based on the virtio-iommu specification written by Jean-Philippe
> Brucker [1].
> >> This was tested with a guest using the virtio-iommu driver [2] and
> >> exposed with a virtio-net-pci using dma ops.
> >>
> >> The device gets instantiated using the "-device virtio-iommu-device"
> >> option. It currently works with ARM virt machine only as the machine
> >> must handle the dt binding between the virtio-mmio "iommu" node and
> >> the PCI host bridge node. ACPI booting is not yet supported.
> >>
> >> This should allow starting some benchmarking activities against pure
> >> emulated IOMMU (especially ARM SMMU).
> >
> > I am testing this on ARM64 and see below continuous error prints:
> >
> > virtio_iommu_translate sid=8 is not known!!
> > virtio_iommu_translate sid=8 is not known!!
> > virtio_iommu_translate sid=8 is not known!!
> > virtio_iommu_translate sid=8 is not known!!
> > virtio_iommu_translate sid=8 is not known!!
> > virtio_iommu_translate sid=8 is not known!!
> > virtio_iommu_translate sid=8 is not known!!
> > virtio_iommu_translate sid=8 is not known!!
> > virtio_iommu_translate sid=8 is not known!!
> > virtio_iommu_translate sid=8 is not known!!
> >
> >
> > Also in guest I do not see a device-tree node for virtio-iommu.
> do you mean the virtio-mmio node with the #iommu-cells property?
> 
> This one is created statically by the virt machine. I would be surprised
> if it were not there. Are you using the virt (= virt-2.10) machine?
> Machines before that do not support its instantiation.
> 
> Please can you add a printf in hw/arm/virt.c create_virtio_mmio() at the
> moment when this node is created. Also you can add a printf in
> bind_virtio_iommu_device() to make sure the binding with the PCI host
> bridge is added on machine init done.
> 
> Also worth checking: CONFIG_VIRTIO_IOMMU=y on the guest side.

It works on my side. The driver config was disabled, and I was also using a
guest kernel which did not have deferred probing. Now after fixing both it
works on my side.
I placed some prints to see that dma-map maps regions in virtio-iommu; it
uses the emulated iommu.

I will continue to add VFIO support on top of this and do more testing!

Thanks
-Bharat

> 
> Thanks
> 
> Eric
> 
> > I am using the qemu tree you mentioned below and the iommu-driver patches
> > published by Jean-P.
> > The Qemu command line has the additional "-device virtio-iommu-device".
> > What am I missing?
> 
> 
> >
> > Thanks
> > -Bharat
> >
> >>
> >> Best Regards
> >>
> >> Eric
> >>
> >> This series can be found at:
> >> https://github.com/eauger/qemu/tree/virtio-iommu-rfcv2
> >>
> >> References:
> >> [1] [RFC 0/3] virtio-iommu: a paravirtualized IOMMU, [2] [RFC PATCH
> >> linux]
> >> iommu: Add virtio-iommu driver [3] [RFC PATCH kvmtool 00/15] Add
> >> virtio- iommu
> >>
> >> History:
> >> v1 -> v2:
> >> - fix redefinition of viommu_as typedef
> >>
> >> Eric Auger (8):
> >>   update-linux-headers: import virtio_iommu.h
> >>   linux-headers: Update for virtio-iommu
> >>   virtio_iommu: add skeleton
> >>   virtio-iommu: Decode the command payload
> >>   virtio_iommu: Add the iommu regions
> >>   virtio-iommu: Implement the translation and commands
> >>   hw/arm/virt: Add 2.10 machine type
> >>   hw/arm/virt: Add virtio-iommu the virt board
> >>
> >>  hw/arm/virt.c | 116 -
> >>  hw/virtio/Makefile.objs   |   1 +
> >>  hw/virtio/trace-events|  14 +
> >>  hw/virtio/virtio-iommu.c  | 623
> ++
> >>  include/hw/arm/virt.h |   5 +
 include/hw/virtio/virtio-iommu.h  |  60 +++

Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-06-08 Thread Auger Eric
Hi Bharat,

On 09/06/2017 08:16, Bharat Bhushan wrote:
> Hi Eric,
> 
>> -Original Message-
>> From: Eric Auger [mailto:eric.au...@redhat.com]
>> Sent: Wednesday, June 07, 2017 9:31 PM
>> To: eric.auger@gmail.com; eric.au...@redhat.com;
>> peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com;
>> qemu-...@nongnu.org; qemu-devel@nongnu.org; jean-
>> philippe.bruc...@arm.com
>> Cc: will.dea...@arm.com; robin.mur...@arm.com; kevin.t...@intel.com;
>> marc.zyng...@arm.com; christoffer.d...@linaro.org; drjo...@redhat.com;
>> w...@redhat.com; t...@semihalf.com; Bharat Bhushan
>> 
>> Subject: [RFC v2 0/8] VIRTIO-IOMMU device
>>
>> This series implements the virtio-iommu device. This is a proof of concept
>> based on the virtio-iommu specification written by Jean-Philippe Brucker [1].
>> This was tested with a guest using the virtio-iommu driver [2] and exposed
>> with a virtio-net-pci using dma ops.
>>
>> The device gets instantiated using the "-device virtio-iommu-device"
>> option. It currently works with ARM virt machine only as the machine must
>> handle the dt binding between the virtio-mmio "iommu" node and the PCI
>> host bridge node. ACPI booting is not yet supported.
>>
>> This should allow starting some benchmarking activities against pure
>> emulated IOMMU (especially ARM SMMU).
> 
> I am testing this on ARM64 and see below continuous error prints:
> 
>   virtio_iommu_translate sid=8 is not known!!
>   virtio_iommu_translate sid=8 is not known!!
>   virtio_iommu_translate sid=8 is not known!!
>   virtio_iommu_translate sid=8 is not known!!
>   virtio_iommu_translate sid=8 is not known!!
>   virtio_iommu_translate sid=8 is not known!!
>   virtio_iommu_translate sid=8 is not known!!
>   virtio_iommu_translate sid=8 is not known!!
>   virtio_iommu_translate sid=8 is not known!!
>   virtio_iommu_translate sid=8 is not known!! 
> 
> 
> Also in guest I do not see a device-tree node for virtio-iommu.
do you mean the virtio-mmio node with the #iommu-cells property?

This one is created statically by the virt machine. I would be surprised
if it were not there. Are you using the virt (= virt-2.10) machine?
Machines before that do not support its instantiation.

Please can you add a printf in hw/arm/virt.c create_virtio_mmio() at the
moment when this node is created. Also you can add a printf in
bind_virtio_iommu_device() to make sure the binding with the PCI host
bridge is added on machine init done.

Also worth checking: CONFIG_VIRTIO_IOMMU=y on the guest side.

Thanks

Eric

> I am using the qemu tree you mentioned below and the iommu-driver patches
> published by Jean-P.
> The Qemu command line has the additional "-device virtio-iommu-device".
> What am I missing?


> 
> Thanks
> -Bharat
> 
>>
>> Best Regards
>>
>> Eric
>>
>> This series can be found at:
>> https://github.com/eauger/qemu/tree/virtio-iommu-rfcv2
>>
>> References:
>> [1] [RFC 0/3] virtio-iommu: a paravirtualized IOMMU
>> [2] [RFC PATCH linux] iommu: Add virtio-iommu driver
>> [3] [RFC PATCH kvmtool 00/15] Add virtio-iommu
>>
>> History:
>> v1 -> v2:
>> - fix redefinition of viommu_as typedef
>>
>> Eric Auger (8):
>>   update-linux-headers: import virtio_iommu.h
>>   linux-headers: Update for virtio-iommu
>>   virtio_iommu: add skeleton
>>   virtio-iommu: Decode the command payload
>>   virtio_iommu: Add the iommu regions
>>   virtio-iommu: Implement the translation and commands
>>   hw/arm/virt: Add 2.10 machine type
>>   hw/arm/virt: Add virtio-iommu to the virt board
>>
>>  hw/arm/virt.c | 116 -
>>  hw/virtio/Makefile.objs   |   1 +
>>  hw/virtio/trace-events|  14 +
>>  hw/virtio/virtio-iommu.c  | 623 ++
>>  include/hw/arm/virt.h |   5 +
>>  include/hw/virtio/virtio-iommu.h  |  60 +++
>>  include/standard-headers/linux/virtio_ids.h   |   1 +
>>  include/standard-headers/linux/virtio_iommu.h | 142 ++
>>  linux-headers/linux/virtio_iommu.h|   1 +
>>  scripts/update-linux-headers.sh   |   3 +
>>  10 files changed, 957 insertions(+), 9 deletions(-)
>>  create mode 100644 hw/virtio/virtio-iommu.c
>>  create mode 100644 include/hw/virtio/virtio-iommu.h
>>  create mode 100644 include/standard-headers/linux/virtio_iommu.h
>>  create mode 100644 linux-headers/linux/virtio_iommu.h
>>
>> --
>> 2.5.5
> 



Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device

2017-06-08 Thread Bharat Bhushan
Hi Eric,

> -Original Message-
> From: Eric Auger [mailto:eric.au...@redhat.com]
> Sent: Wednesday, June 07, 2017 9:31 PM
> To: eric.auger@gmail.com; eric.au...@redhat.com;
> peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com;
> qemu-...@nongnu.org; qemu-devel@nongnu.org; jean-
> philippe.bruc...@arm.com
> Cc: will.dea...@arm.com; robin.mur...@arm.com; kevin.t...@intel.com;
> marc.zyng...@arm.com; christoffer.d...@linaro.org; drjo...@redhat.com;
> w...@redhat.com; t...@semihalf.com; Bharat Bhushan
> 
> Subject: [RFC v2 0/8] VIRTIO-IOMMU device
> 
> This series implements the virtio-iommu device. This is a proof of concept
> based on the virtio-iommu specification written by Jean-Philippe Brucker [1].
> This was tested with a guest using the virtio-iommu driver [2],
> exercising a virtio-net-pci device through DMA ops.
> 
> The device gets instantiated using the "-device virtio-iommu-device"
> option. It currently works with the ARM virt machine only, as the machine must
> handle the dt binding between the virtio-mmio "iommu" node and the PCI
> host bridge node. ACPI booting is not yet supported.
> 
> This should make it possible to start benchmarking against a purely
> emulated IOMMU (especially the ARM SMMU).

I am testing this on ARM64 and see the error below printed continuously:

virtio_iommu_translate sid=8 is not known!!
virtio_iommu_translate sid=8 is not known!!
virtio_iommu_translate sid=8 is not known!!
virtio_iommu_translate sid=8 is not known!!
virtio_iommu_translate sid=8 is not known!!
virtio_iommu_translate sid=8 is not known!!
virtio_iommu_translate sid=8 is not known!!
virtio_iommu_translate sid=8 is not known!!
virtio_iommu_translate sid=8 is not known!!
virtio_iommu_translate sid=8 is not known!! 
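
For what it's worth, this trace means the device found no endpoint
recorded for stream ID 8, i.e. it never received an ATTACH request for
that sid -- consistent with the guest driver not probing at all. A rough
sketch of the lookup that fails (data structures and helper names are
illustrative, not the actual virtio-iommu.c code):

    /* illustrative: sid -> endpoint lookup on the translate path */
    static IOMMUTLBEntry virtio_iommu_translate(MemoryRegion *mr, hwaddr addr,
                                                bool is_write)
    {
        VirtIOIOMMU *s = viommu_from_region(mr);   /* hypothetical helper */
        uint32_t sid = sid_for_region(mr);         /* hypothetical helper */
        void *ep = g_tree_lookup(s->endpoints, GUINT_TO_POINTER(sid));

        if (!ep) {
            /* no ATTACH was ever received for this sid */
            error_report("virtio_iommu_translate sid=%d is not known!!", sid);
            return (IOMMUTLBEntry){ .perm = IOMMU_NONE };
        }
        /* ... otherwise look addr up in the endpoint's mappings ... */
    }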


Also, in the guest I do not see a device-tree node for virtio-iommu.
I am using the QEMU tree you mentioned below and the iommu-driver patches
published by Jean-Philippe.
The QEMU command line has the additional "-device virtio-iommu-device".
What am I missing?

Thanks
-Bharat

> 
> Best Regards
> 
> Eric
> 
> This series can be found at:
> https://github.com/eauger/qemu/tree/virtio-iommu-rfcv2
> 
> References:
> [1] [RFC 0/3] virtio-iommu: a paravirtualized IOMMU
> [2] [RFC PATCH linux] iommu: Add virtio-iommu driver
> [3] [RFC PATCH kvmtool 00/15] Add virtio-iommu
> 
> History:
> v1 -> v2:
> - fix redefinition of viommu_as typedef
> 
> Eric Auger (8):
>   update-linux-headers: import virtio_iommu.h
>   linux-headers: Update for virtio-iommu
>   virtio_iommu: add skeleton
>   virtio-iommu: Decode the command payload
>   virtio_iommu: Add the iommu regions
>   virtio-iommu: Implement the translation and commands
>   hw/arm/virt: Add 2.10 machine type
>   hw/arm/virt: Add virtio-iommu to the virt board
> 
>  hw/arm/virt.c | 116 -
>  hw/virtio/Makefile.objs   |   1 +
>  hw/virtio/trace-events|  14 +
>  hw/virtio/virtio-iommu.c  | 623 ++
>  include/hw/arm/virt.h |   5 +
>  include/hw/virtio/virtio-iommu.h  |  60 +++
>  include/standard-headers/linux/virtio_ids.h   |   1 +
>  include/standard-headers/linux/virtio_iommu.h | 142 ++
>  linux-headers/linux/virtio_iommu.h|   1 +
>  scripts/update-linux-headers.sh   |   3 +
>  10 files changed, 957 insertions(+), 9 deletions(-)
>  create mode 100644 hw/virtio/virtio-iommu.c
>  create mode 100644 include/hw/virtio/virtio-iommu.h
>  create mode 100644 include/standard-headers/linux/virtio_iommu.h
>  create mode 100644 linux-headers/linux/virtio_iommu.h
> 
> --
> 2.5.5