> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Wednesday, July 12, 2017 4:28 PM
> To: Bharat Bhushan <bharat.bhus...@nxp.com>; Auger Eric
> <eric.au...@redhat.com>; eric.auger....@gmail.com;
> peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com;
> qemu-...@nongnu.org; qemu-devel@nongnu.org
> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
> robin.mur...@arm.com; christoffer.d...@linaro.org
> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> 
> On 12/07/17 11:27, Bharat Bhushan wrote:
> >
> >
> >> -----Original Message-----
> >> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> >> Sent: Wednesday, July 12, 2017 3:48 PM
> >> To: Bharat Bhushan <bharat.bhus...@nxp.com>; Auger Eric
> >> <eric.au...@redhat.com>; eric.auger....@gmail.com;
> >> peter.mayd...@linaro.org; alex.william...@redhat.com;
> m...@redhat.com;
> >> qemu-...@nongnu.org; qemu-devel@nongnu.org
> >> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
> >> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
> >> robin.mur...@arm.com; christoffer.d...@linaro.org
> >> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> >>
> >> On 12/07/17 04:50, Bharat Bhushan wrote:
> >> [...]
> >>>> The size of the virtio_iommu_req_probe structure is variable, and
> >> depends
> >>>> what fields the device implements. So the device initially computes
> >>>> the
> >> size it
> >>>> needs to fill virtio_iommu_req_probe, describes it in probe_size,
> >>>> and the driver allocates that many bytes for
> >>>> virtio_iommu_req_probe.content[]
> >>>>
> >>>>>> * When device offers VIRTIO_IOMMU_F_PROBE, the driver should
> >> send
> >>>> an
> >>>>>> VIRTIO_IOMMU_T_PROBE request for each new endpoint.
> >>>>>> * The driver allocates a device-writeable buffer of probe_size
> >>>>>> (plus
> >>>>>> framing) and sends it as a VIRTIO_IOMMU_T_PROBE request.
> >>>>>> * The device fills the buffer with various information.
> >>>>>>
> >>>>>> struct virtio_iommu_req_probe {
> >>>>>>        /* device-readable */
> >>>>>>        struct virtio_iommu_req_head head;
> >>>>>>        le32 device;
> >>>>>>        le32 flags;
> >>>>>>
> >>>>>>        /* maybe also le32 content_size, but it must be equal to
> >>>>>>           probe_size */
> >>>>>
> >>>>> Can you please describe why we need to pass size of "probe_size"
> >>>>> in
> >> probe
> >>>> request?
> >>>>
> >>>> We don't. I don't think we should add this 'content_size' field
> >>>> unless there
> >> is
> >>>> a compelling reason to do so.
> >>>>
> >>>>>>
> >>>>>>        /* device-writeable */
> >>>>>>        u8 content[];
> >>>>>
> >>>>> I assume content_size above is the size of array "content[]" and
> >>>>> max
> >> value
> >>>> can be equal to probe_size advertised by device?
> >>>>
> >>>> probe_size is exactly the size of array content[]. The driver must
> >>>> allocate a buffer of this size (plus the space needed for head, device,
> flags and tail).
> >>>>
> >>>> Then the device is free to leave parts of content[] empty. Field
> >>>> 'type' 0 will
> >> be
> >>>> reserved and mark the end of the array.
> >>>>
> >>>>>>        struct virtio_iommu_req_tail tail; };
> >>>>>>
> >>>>>> I'm still struggling with the content and layout of the probe
> >>>>>> request, and would appreciate any feedback. To be easily
> >>>>>> extended, I think it should contain a list of fields of variable size:
> >>>>>>
> >>>>>>        |0           15|16           31|32               N|
> >>>>>>        |     type     |    length     |      values      |
> >>>>>>
> >>>>>> 'length' might be made optional if it can be deduced from type,
> >>>>>> but might make driver-side parsing more robust.
> >>>>>>
> >>>>>> The probe could either be done for each endpoint, or for each
> >>>>>> address space. I much prefer endpoint because it is the smallest
> granularity.
> >>>>>> The driver can then decide what endpoints to put together in the
> >>>>>> same address space based on their individual capabilities. The
> >>>>>> specification would described how each endpoint property is
> >>>>>> combined when endpoints are put in the same address space. For
> >>>>>> example, take the minimum of all PASID size, the maximum of all
> >>>>>> page granularities, combine doorbell addresses, etc.
> >>>>>>
> >>>>>> If we did the probe on address spaces instead, the driver would
> >>>>>> have to re-send a probe request each time a new endpoint is
> >>>>>> attached to an existing address space, to see if it is still
> >>>>>> capable of page table handover or if the driver just combined a
> >>>>>> VFIO and an emulated endpoint by accident.
> >>>>>>
> >>>>>>                                  ***
> >>>>>>
> >>>>>> Using this framework, the device can declare doorbell regions by
> >>>>>> adding one or more RESV fields into the probe buffer:
> >>>>>>
> >>>>>> /* 'type' */
> >>>>>> #define VIRTIO_IOMMU_PROBE_T_RESV      0x1
> >>>>>>
> >>>>>> /* 'values'. 'length' is sizeof(struct virtio_iommu_probe_resv)
> >>>>>> */ struct virtio_iommu_probe_resv {
> >>>>>>        le64 gpa;
> >>>>>>        le64 size;
> >>>>>>
> >>>>>> #define VIRTIO_IOMMU_PROBE_RESV_MSI    0x1
> >>>>>>        u8 type;
> >>>
> >>> To be sure I am understanding it correctly, Is this "type" in struct
> >> virtio_iommu_req_head?
> >>
> >> No, virtio_iommu_req_head::type is the request type
> >> (ATTACH/DETACH/MAP/UNMAP/PROBE).
> >>
> >> Then virtio_iommu_probe_property::type is the property type (only
> >> RESV for the moment).
> >>
> >> And this is virtio_iommu_probe_resv::type, which is the type of the
> >> resv region (MSI). I renamed it to 'subtype' below, but I think it
> >> still is pretty confusing.
> >>
> >>
> >> I did a number of changes to structures and naming when trying to
> >> integrate it to the specification:
> >>
> >> * Added 64 bytes of padding in virtio_iommu_req_probe, so that future
> >> extensions can add fields in the device-readable part.
> >> * renamed "RESV" to "RESV_MEM".
> >> * The resv_mem property now looks like this:
> >>   struct virtio_iommu_probe_resv_mem {
> >>         u8      subtype;
> >>         u8      padding[3];
> >>         le32    flags;
> >>         le64    addr;
> >>         le64    size;
> >>   };
> >> * subtype for MSI doorbells is now
> >> VIRTIO_IOMMU_PROBE_RESV_MEM_T_BYPASS
> >> (because transactions to this region bypass the IOMMU). 'flags'
> >> contain a hint VIRTIO_IOMMU_PROBE_RESV_MEM_F_MSI, telling the
> driver
> >> that this region is used for MSIs.
> >>
> >> Here is an example of a probe request returning an MSI doorbell
> property.
> >>
> >>      31                       7      0
> >>     +---------------------------------+
> >>     |           0            |  type  | <- request type = PROBE (5)
> >>     +---------------------------------+
> >>     |             device              |
> >>     +---------------------------------+
> >>     :                                 :
> >>     :          (64B padding)          :
> >>     :                                 :
> >>     +---------------------------------+
> >>   ^ |  length = 24   |    type = 1    | <- property type = RESV_MEM (1)
> >>   | +---------------------------------+
> >>   | |           0            |subtype | <- RESV_MEM subtype = BYPASS (1)
> >>  p| +---------------------------------+
> >>  r| |           flags = MSI           |
> >>  o| +---------------------------------+
> >>  b| |         addr = 0xfee00000       |
> >>  e| |                                 |
> >>  _| +---------------------------------+
> >>  s| |         size = 0x00100000       |
> >>  i| |                                 |
> >>  z| +---------------------------------+
> >>  e| |    length      |      type      | <- another property may start
> >>   | :                                 :    here
> >>   v :               ...               :
> >>     +---------------------------------+
> >>     |           0            | status | <- request tail
> >>     +---------------------------------+
> >
> > So we want a single probe will return info of all "types" and each "subtype"
> of given "type"? I was of impression that based on flags there will be
> separate probe request for a type.
> 
> Argh, now I'm lost :)
> 
> The virtio-iommu driver sends a single PROBE request for each endpoint.
> The virtio-iommu device fills the 'properties' field with a list of 
> properties.
> 
> And endpoint may have one or more reserved virtual addresses. Each such
> region is described by the virtio-iommu device with a RESV_MEM property in
> the properties list.
> 
> There will be other types of properties in the future, for other information
> than RESV_MEM. So the PROBE request is a generic channel for different
> types of properties, all aggregated into a single list. The virtio-iommu 
> device
> chooses which property it needs to communicate to the driver.
> 
> The list does not have a fixed layout, and the driver knows what properties it
> contains by reading the 'type' header of each property.
> 
> The virtio-iommu driver parses the 'properties' list filled by the device.
> If it encounters one or more RESV_MEM properties, the driver should take
> note of them. Thereafter, the driver should never create a mapping
> overlapping RESV_MEM regions for this endpoint.
> 
> If, in addition, the RESV_MEM property is of subtype BYPASS and has flag
> MSI, then the driver knows that it is an MSI doorbell and it doesn't need to
> create a mapping (using a MAP request) for this MSI doorbell.

Thanks for clarification, it is inline with my understanding.

Thanks
-Bharat

> 
> Maybe my prototype implementation will be more clear.
> 
> Thanks,
> Jean
> 
> 
> >>
> >>
> >> I'll try to send the next version of the spec out as soon as possible.
> >>
> >> Thanks,
> >> Jean
> >>
> >>
> >>> Thanks
> >>> -Bharat
> >>>
> >>>>>
> >>>>> type is 16 bit above?
> >>>>
> >>>> Ah, the naming isn't great. This is not the same as above, and
> >>>> could be
> >> called
> >>>> 'subtype' to avoid confusion. The above 16-bit type describes the
> >>>> field
> >> type,
> >>>> e.g. struct virtio_iommu_probe_resv. I proposed 16-bit because it
> >>>> seems easy to reach more than 255 kinds of endpoint properties, but
> >>>> 65535 should do.
> >>>>
> >>>> This subtype describes which kind of resv region is described in
> >>>> the
> >> structure.
> >>>> For the moment there only is VIRTIO_IOMMU_PROBE_RESV_MSI, but
> we
> >>>> could for example add resv regions that the driver should never use
> >>>> or
> >> that it
> >>>> should identity-map (equivalent to IOMMU_RESV_RESERVED/DIRECT
> in
> >>>> Linux). I think 8 bits should be enough to contain any future
> >>>> types, unless
> >> we
> >>>> make this a bitfield. For identity-map, there may be an additional
> >>>> flags
> >> field
> >>>> describing the protection.
> >>>>
> >>>>>> };
> >>>>>>
> >>>>>> Such a region would be subject to the following rules:
> >>>>>>
> >>>>>> * Driver should not use any IOVA declared as RESV_MSI in a
> mapping.
> >>>>>> * Device should leave any transaction matching a RESV_MSI region
> >>>>>> pass through untranslated.
> >>>>>> * If the device does not advertise any RESV region, then the
> >>>>>> driver should assume that MSI doorbells, like any other GPA, must
> >>>>>> be
> >> mapped
> >>>>>> with an arbitrary IOVA in order for the endpoint to access them.
> >>>>>> * Given that the driver *should* perform a probe request if
> >>>>>> available, and it *should* understand the
> >>>> VIRTIO_IOMMU_PROBE_T_RESV
> >>>>>> field, then this field tells the guest how it should handle MSI
> >>>>>> doorbells, and whether it should map the address via MAP requests
> >>>>>> or
> >>>> not.
> >>>>>>
> >>>>>> Does this make sense and did I overlook something?
> >>>>>
> >>>>> Overall it looks good to me. Do you have plans to implements this
> >>>>> in
> >> virtio-
> >>>> iommu driver and kvmtool?
> >>>>
> >>>> Yes, if there is no objection I'll try to formalize it and
> >>>> implement it right
> >> away.
> >>>>
> >>>> Thanks,
> >>>> Jean
> >

Reply via email to