On 12/07/17 11:27, Bharat Bhushan wrote: > > >> -----Original Message----- >> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com] >> Sent: Wednesday, July 12, 2017 3:48 PM >> To: Bharat Bhushan <bharat.bhus...@nxp.com>; Auger Eric >> <eric.au...@redhat.com>; eric.auger....@gmail.com; >> peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com; >> qemu-...@nongnu.org; qemu-devel@nongnu.org >> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; >> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; >> robin.mur...@arm.com; christoffer.d...@linaro.org >> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device >> >> On 12/07/17 04:50, Bharat Bhushan wrote: >> [...] >>>> The size of the virtio_iommu_req_probe structure is variable, and >> depends >>>> what fields the device implements. So the device initially computes the >> size it >>>> needs to fill virtio_iommu_req_probe, describes it in probe_size, and the >>>> driver allocates that many bytes for virtio_iommu_req_probe.content[] >>>> >>>>>> * When device offers VIRTIO_IOMMU_F_PROBE, the driver should >> send >>>> an >>>>>> VIRTIO_IOMMU_T_PROBE request for each new endpoint. >>>>>> * The driver allocates a device-writeable buffer of probe_size (plus >>>>>> framing) and sends it as a VIRTIO_IOMMU_T_PROBE request. >>>>>> * The device fills the buffer with various information. >>>>>> >>>>>> struct virtio_iommu_req_probe { >>>>>> /* device-readable */ >>>>>> struct virtio_iommu_req_head head; >>>>>> le32 device; >>>>>> le32 flags; >>>>>> >>>>>> /* maybe also le32 content_size, but it must be equal to >>>>>> probe_size */ >>>>> >>>>> Can you please describe why we need to pass size of "probe_size" in >> probe >>>> request? >>>> >>>> We don't. I don't think we should add this 'content_size' field unless >>>> there >> is >>>> a compelling reason to do so. >>>> >>>>>> >>>>>> /* device-writeable */ >>>>>> u8 content[]; >>>>> >>>>> I assume content_size above is the size of array "content[]" and max >> value >>>> can be equal to probe_size advertised by device? >>>> >>>> probe_size is exactly the size of array content[]. The driver must >>>> allocate a >>>> buffer of this size (plus the space needed for head, device, flags and >>>> tail). >>>> >>>> Then the device is free to leave parts of content[] empty. Field 'type' 0 >>>> will >> be >>>> reserved and mark the end of the array. >>>> >>>>>> struct virtio_iommu_req_tail tail; >>>>>> }; >>>>>> >>>>>> I'm still struggling with the content and layout of the probe >>>>>> request, and would appreciate any feedback. To be easily extended, I >>>>>> think it should contain a list of fields of variable size: >>>>>> >>>>>> |0 15|16 31|32 N| >>>>>> | type | length | values | >>>>>> >>>>>> 'length' might be made optional if it can be deduced from type, but >>>>>> might make driver-side parsing more robust. >>>>>> >>>>>> The probe could either be done for each endpoint, or for each address >>>>>> space. I much prefer endpoint because it is the smallest granularity. >>>>>> The driver can then decide what endpoints to put together in the same >>>>>> address space based on their individual capabilities. The >>>>>> specification would described how each endpoint property is combined >>>>>> when endpoints are put in the same address space. For example, take >>>>>> the minimum of all PASID size, the maximum of all page granularities, >>>>>> combine doorbell addresses, etc. >>>>>> >>>>>> If we did the probe on address spaces instead, the driver would have >>>>>> to re-send a probe request each time a new endpoint is attached to an >>>>>> existing address space, to see if it is still capable of page table >>>>>> handover or if the driver just combined a VFIO and an emulated >>>>>> endpoint by accident. >>>>>> >>>>>> *** >>>>>> >>>>>> Using this framework, the device can declare doorbell regions by >>>>>> adding one or more RESV fields into the probe buffer: >>>>>> >>>>>> /* 'type' */ >>>>>> #define VIRTIO_IOMMU_PROBE_T_RESV 0x1 >>>>>> >>>>>> /* 'values'. 'length' is sizeof(struct virtio_iommu_probe_resv) */ >>>>>> struct virtio_iommu_probe_resv { >>>>>> le64 gpa; >>>>>> le64 size; >>>>>> >>>>>> #define VIRTIO_IOMMU_PROBE_RESV_MSI 0x1 >>>>>> u8 type; >>> >>> To be sure I am understanding it correctly, Is this "type" in struct >> virtio_iommu_req_head? >> >> No, virtio_iommu_req_head::type is the request type >> (ATTACH/DETACH/MAP/UNMAP/PROBE). >> >> Then virtio_iommu_probe_property::type is the property type (only RESV >> for >> the moment). >> >> And this is virtio_iommu_probe_resv::type, which is the type of the resv >> region (MSI). I renamed it to 'subtype' below, but I think it still is >> pretty confusing. >> >> >> I did a number of changes to structures and naming when trying to >> integrate it to the specification: >> >> * Added 64 bytes of padding in virtio_iommu_req_probe, so that future >> extensions can add fields in the device-readable part. >> * renamed "RESV" to "RESV_MEM". >> * The resv_mem property now looks like this: >> struct virtio_iommu_probe_resv_mem { >> u8 subtype; >> u8 padding[3]; >> le32 flags; >> le64 addr; >> le64 size; >> }; >> * subtype for MSI doorbells is now >> VIRTIO_IOMMU_PROBE_RESV_MEM_T_BYPASS >> (because transactions to this region bypass the IOMMU). 'flags' contain a >> hint VIRTIO_IOMMU_PROBE_RESV_MEM_F_MSI, telling the driver that this >> region is used for MSIs. >> >> Here is an example of a probe request returning an MSI doorbell property. >> >> 31 7 0 >> +---------------------------------+ >> | 0 | type | <- request type = PROBE (5) >> +---------------------------------+ >> | device | >> +---------------------------------+ >> : : >> : (64B padding) : >> : : >> +---------------------------------+ >> ^ | length = 24 | type = 1 | <- property type = RESV_MEM (1) >> | +---------------------------------+ >> | | 0 |subtype | <- RESV_MEM subtype = BYPASS (1) >> p| +---------------------------------+ >> r| | flags = MSI | >> o| +---------------------------------+ >> b| | addr = 0xfee00000 | >> e| | | >> _| +---------------------------------+ >> s| | size = 0x00100000 | >> i| | | >> z| +---------------------------------+ >> e| | length | type | <- another property may start >> | : : here >> v : ... : >> +---------------------------------+ >> | 0 | status | <- request tail >> +---------------------------------+ > > So we want a single probe will return info of all "types" and each "subtype" > of given "type"? I was of impression that based on flags there will be > separate probe request for a type.
Argh, now I'm lost :) The virtio-iommu driver sends a single PROBE request for each endpoint. The virtio-iommu device fills the 'properties' field with a list of properties. And endpoint may have one or more reserved virtual addresses. Each such region is described by the virtio-iommu device with a RESV_MEM property in the properties list. There will be other types of properties in the future, for other information than RESV_MEM. So the PROBE request is a generic channel for different types of properties, all aggregated into a single list. The virtio-iommu device chooses which property it needs to communicate to the driver. The list does not have a fixed layout, and the driver knows what properties it contains by reading the 'type' header of each property. The virtio-iommu driver parses the 'properties' list filled by the device. If it encounters one or more RESV_MEM properties, the driver should take note of them. Thereafter, the driver should never create a mapping overlapping RESV_MEM regions for this endpoint. If, in addition, the RESV_MEM property is of subtype BYPASS and has flag MSI, then the driver knows that it is an MSI doorbell and it doesn't need to create a mapping (using a MAP request) for this MSI doorbell. Maybe my prototype implementation will be more clear. Thanks, Jean >> >> >> I'll try to send the next version of the spec out as soon as possible. >> >> Thanks, >> Jean >> >> >>> Thanks >>> -Bharat >>> >>>>> >>>>> type is 16 bit above? >>>> >>>> Ah, the naming isn't great. This is not the same as above, and could be >> called >>>> 'subtype' to avoid confusion. The above 16-bit type describes the field >> type, >>>> e.g. struct virtio_iommu_probe_resv. I proposed 16-bit because it seems >>>> easy to reach more than 255 kinds of endpoint properties, but >>>> 65535 should do. >>>> >>>> This subtype describes which kind of resv region is described in the >> structure. >>>> For the moment there only is VIRTIO_IOMMU_PROBE_RESV_MSI, but we >>>> could for example add resv regions that the driver should never use or >> that it >>>> should identity-map (equivalent to IOMMU_RESV_RESERVED/DIRECT in >>>> Linux). I think 8 bits should be enough to contain any future types, unless >> we >>>> make this a bitfield. For identity-map, there may be an additional flags >> field >>>> describing the protection. >>>> >>>>>> }; >>>>>> >>>>>> Such a region would be subject to the following rules: >>>>>> >>>>>> * Driver should not use any IOVA declared as RESV_MSI in a mapping. >>>>>> * Device should leave any transaction matching a RESV_MSI region pass >>>>>> through untranslated. >>>>>> * If the device does not advertise any RESV region, then the driver >>>>>> should assume that MSI doorbells, like any other GPA, must be >> mapped >>>>>> with an arbitrary IOVA in order for the endpoint to access them. >>>>>> * Given that the driver *should* perform a probe request if >>>>>> available, and it *should* understand the >>>> VIRTIO_IOMMU_PROBE_T_RESV >>>>>> field, then this field tells the guest how it should handle MSI >>>>>> doorbells, and whether it should map the address via MAP requests or >>>> not. >>>>>> >>>>>> Does this make sense and did I overlook something? >>>>> >>>>> Overall it looks good to me. Do you have plans to implements this in >> virtio- >>>> iommu driver and kvmtool? >>>> >>>> Yes, if there is no objection I'll try to formalize it and implement it >>>> right >> away. >>>> >>>> Thanks, >>>> Jean >