Hi Michael, >-----Original Message----- >From: Michael S. Tsirkin <m...@redhat.com> >Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do >compatibility check with host IOMMU cap/ecap > >On Fri, Apr 26, 2024 at 03:10:14AM +0000, Duan, Zhenzhong wrote: >> >> >> >-----Original Message----- >> >From: Cédric Le Goater <c...@redhat.com> >> >Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do >> >compatibility check with host IOMMU cap/ecap >> > >> >On 4/25/24 10:46, Duan, Zhenzhong wrote: >> >> Hi Cédric, >> >> >> >>> -----Original Message----- >> >>> From: Cédric Le Goater <c...@redhat.com> >> >>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do >> >>> compatibility check with host IOMMU cap/ecap >> >>> >> >>> Hello Zhenzhong, >> >>> >> >>> On 4/18/24 10:42, Duan, Zhenzhong wrote: >> >>>> Hi Cédric, >> >>>> >> >>>>> -----Original Message----- >> >>>>> From: Cédric Le Goater <c...@redhat.com> >> >>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do >> >>>>> compatibility check with host IOMMU cap/ecap >> >>>>> >> >>>>> Hello Zhenzhong >> >>>>> >> >>>>> On 4/17/24 11:24, Duan, Zhenzhong wrote: >> >>>>>> >> >>>>>> >> >>>>>>> -----Original Message----- >> >>>>>>> From: Cédric Le Goater <c...@redhat.com> >> >>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to >do >> >>>>>>> compatibility check with host IOMMU cap/ecap >> >>>>>>> >> >>>>>>> On 4/17/24 06:21, Duan, Zhenzhong wrote: >> >>>>>>>> >> >>>>>>>> >> >>>>>>>>> -----Original Message----- >> >>>>>>>>> From: Cédric Le Goater <c...@redhat.com> >> >>>>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to >do >> >>>>>>>>> compatibility check with host IOMMU cap/ecap >> >>>>>>>>> >> >>>>>>>>> Hello, >> >>>>>>>>> >> >>>>>>>>> On 4/16/24 09:09, Duan, Zhenzhong wrote: >> >>>>>>>>>> Hi Cédric, >> >>>>>>>>>> >> >>>>>>>>>>> -----Original Message----- >> >>>>>>>>>>> From: Cédric Le Goater <c...@redhat.com> >> >>>>>>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework >to >> >do >> >>>>>>>>>>> compatibility check with host IOMMU cap/ecap >> >>>>>>>>>>> >> >>>>>>>>>>> On 4/8/24 10:44, Zhenzhong Duan wrote: >> >>>>>>>>>>>> From: Yi Liu <yi.l....@intel.com> >> >>>>>>>>>>>> >> >>>>>>>>>>>> If check fails, the host side device(either vfio or vdpa device) >> >>> should >> >>>>>>> not >> >>>>>>>>>>>> be passed to guest. >> >>>>>>>>>>>> >> >>>>>>>>>>>> Implementation details for different backends will be in >> >>> following >> >>>>>>>>> patches. >> >>>>>>>>>>>> >> >>>>>>>>>>>> Signed-off-by: Yi Liu <yi.l....@intel.com> >> >>>>>>>>>>>> Signed-off-by: Yi Sun <yi.y....@linux.intel.com> >> >>>>>>>>>>>> Signed-off-by: Zhenzhong Duan ><zhenzhong.d...@intel.com> >> >>>>>>>>>>>> --- >> >>>>>>>>>>>> hw/i386/intel_iommu.c | 35 >> >>>>>>>>>>> +++++++++++++++++++++++++++++++++++ >> >>>>>>>>>>>> 1 file changed, 35 insertions(+) >> >>>>>>>>>>>> >> >>>>>>>>>>>> diff --git a/hw/i386/intel_iommu.c >b/hw/i386/intel_iommu.c >> >>>>>>>>>>>> index 4f84e2e801..a49b587c73 100644 >> >>>>>>>>>>>> --- a/hw/i386/intel_iommu.c >> >>>>>>>>>>>> +++ b/hw/i386/intel_iommu.c >> >>>>>>>>>>>> @@ -35,6 +35,7 @@ >> >>>>>>>>>>>> #include "sysemu/kvm.h" >> >>>>>>>>>>>> #include "sysemu/dma.h" >> >>>>>>>>>>>> #include "sysemu/sysemu.h" >> >>>>>>>>>>>> +#include "sysemu/iommufd.h" >> >>>>>>>>>>>> #include "hw/i386/apic_internal.h" >> >>>>>>>>>>>> #include "kvm/kvm_i386.h" >> >>>>>>>>>>>> #include "migration/vmstate.h" >> >>>>>>>>>>>> @@ -3819,6 +3820,32 @@ VTDAddressSpace >> >>>>>>>>>>> *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, >> >>>>>>>>>>>> return vtd_dev_as; >> >>>>>>>>>>>> } >> >>>>>>>>>>>> >> >>>>>>>>>>>> +static int vtd_check_legacy_hdev(IntelIOMMUState *s, >> >>>>>>>>>>>> + HostIOMMUDevice *hiod, >> >>>>>>>>>>>> + Error **errp) >> >>>>>>>>>>>> +{ >> >>>>>>>>>>>> + return 0; >> >>>>>>>>>>>> +} >> >>>>>>>>>>>> + >> >>>>>>>>>>>> +static int vtd_check_iommufd_hdev(IntelIOMMUState *s, >> >>>>>>>>>>>> + HostIOMMUDevice *hiod, >> >>>>>>>>>>>> + Error **errp) >> >>>>>>>>>>>> +{ >> >>>>>>>>>>>> + return 0; >> >>>>>>>>>>>> +} >> >>>>>>>>>>>> + >> >>>>>>>>>>>> +static int vtd_check_hdev(IntelIOMMUState *s, >> >>>>>>>>> VTDHostIOMMUDevice >> >>>>>>>>>>> *vtd_hdev, >> >>>>>>>>>>>> + Error **errp) >> >>>>>>>>>>>> +{ >> >>>>>>>>>>>> + HostIOMMUDevice *hiod = vtd_hdev->dev; >> >>>>>>>>>>>> + >> >>>>>>>>>>>> + if (object_dynamic_cast(OBJECT(hiod), >> >>> TYPE_HIOD_IOMMUFD)) >> >>>>> { >> >>>>>>>>>>>> + return vtd_check_iommufd_hdev(s, hiod, errp); >> >>>>>>>>>>>> + } >> >>>>>>>>>>>> + >> >>>>>>>>>>>> + return vtd_check_legacy_hdev(s, hiod, errp); >> >>>>>>>>>>>> +} >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> I think we should be using the .get_host_iommu_info() class >> >>> handler >> >>>>>>>>>>> instead. Can we refactor the code slightly to avoid this check >on >> >>>>>>>>>>> the type ? >> >>>>>>>>>> >> >>>>>>>>>> There is some difficulty ini avoiding this check, the behavior >of >> >>>>>>>>> vtd_check_legacy_hdev >> >>>>>>>>>> and vtd_check_iommufd_hdev are different especially after >> >>> nesting >> >>>>>>>>> support introduced. >> >>>>>>>>>> vtd_check_iommufd_hdev() has much wider check over >> >cap/ecap >> >>> bits >> >>>>>>>>> besides aw_bits. >> >>>>>>>>> >> >>>>>>>>> I think it is important to fully separate the vIOMMU model >from >> >the >> >>>>>>>>> host IOMMU backing device. >> >>>>> >> >>>>> This comment is true for the structures also. >> >>>>> >> >>>>>>>>> Could we introduce a new HostIOMMUDeviceClass >> >>>>>>>>> handler .check_hdev() handler, which would >> >>>>> call .get_host_iommu_info() ? >> >>>>> >> >>>>> This means that HIOD_LEGACY_INFO and HIOD_IOMMUFD_INFO >> >should >> >>> be >> >>>>> a common structure 'HostIOMMUDeviceInfo' holding all attributes >> >>>>> for the different backends. Each .get_host_iommu_info() >> >implementation >> >>>>> would translate the specific host iommu device data presentation >> >>>>> into the common 'HostIOMMUDeviceInfo', this is true for >> >host_aw_bits. >> >>>> >> >>>> I see, it's just not easy to define the unified elements in >> >>> HostIOMMUDeviceInfo >> >>>> so that they maps to bits or fields in host return IOMMU info. >> >>> >> >>> The proposal is adding a vIOMMU <-> HostIOMMUDevice interface >and a >> >>> new >> >>> API needs to be completely defined for it. The IOMMU backend >> >>> implementation >> >>> could be anything, legacy, iommufd, iommufd v2, some other >framework >> >>> and >> >>> the vIOMMU shouldn't be aware of its implementation. >> >>> >> >>> Exposing the kernel structures as done below should be avoided >because >> >>> they are part of the QEMU <-> kernel IOMMUFD interface. >> >>> >> >>> >> >>>> Different platform returned host IOMMU info is platform specific. >> >>>> For vtd and siommu: >> >>>> >> >>>> struct iommu_hw_info_vtd { >> >>>> __u32 flags; >> >>>> __u32 __reserved; >> >>>> __aligned_u64 cap_reg; >> >>>> __aligned_u64 ecap_reg; >> >>>> }; >> >>>> >> >>>> struct iommu_hw_info_arm_smmuv3 { >> >>>> __u32 flags; >> >>>> __u32 __reserved; >> >>>> __u32 idr[6]; >> >>>> __u32 iidr; >> >>>> __u32 aidr; >> >>>> }; >> >>>> >> >>>> I can think of two kinds of declaration of HostIOMMUDeviceInfo: >> >>>> >> >>>> struct HostIOMMUDeviceInfo { >> >>>> uint8_t aw_bits; >> >>>> enum iommu_hw_info_type type; >> >>>> union { >> >>>> struct iommu_hw_info_vtd vtd; >> >>>> struct iommu_hw_info_arm_smmuv3; >> >>>> ...... >> >>>> } data; >> >>>> } >> >>>> >> >>>> or >> >>>> >> >>>> struct HostIOMMUDeviceInfo { >> >>>> uint8_t aw_bits; >> >>>> enum iommu_hw_info_type type; >> >>>> __u32 flags; >> >>>> __aligned_u64 cap_reg; >> >>>> __aligned_u64 ecap_reg; >> >>>> __u32 idr[6]; >> >>>> __u32 iidr; >> >>>> __u32 aidr; >> >>>> ...... >> >>>> } >> >>>> >> >>>> Not clear if any is your expected format. >> >>>> >> >>>>> 'type' could be handled the same way, with a >'HostIOMMUDeviceInfo' >> >>>>> type attribute and host iommu device type definitions, or as you >> >>>>> suggested with a QOM interface. This is more complex however. In >> >>>>> this case, I would suggest to implement a .compatible() handler to >> >>>>> compare the host iommu device type with the vIOMMU type. >> >>>>> >> >>>>> The resulting check_hdev routine would look something like : >> >>>>> >> >>>>> static int vtd_check_hdev(IntelIOMMUState *s, >> >VTDHostIOMMUDevice >> >>>>> *vtd_hdev, >> >>>>> Error **errp) >> >>>>> { >> >>>>> HostIOMMUDevice *hiod = vtd_hdev->dev; >> >>>>> HostIOMMUDeviceClass *hiodc = >> >>>>> HOST_IOMMU_DEVICE_GET_CLASS(hiod); >> >>>>> HostIOMMUDevice info; >> >>>>> int host_aw_bits, ret; >> >>>>> >> >>>>> ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), >errp); >> >>>>> if (ret) { >> >>>>> return ret; >> >>>>> } >> >>>>> >> >>>>> ret = hiodc->is_compatible(hiod, VIOMMU_INTERFACE(s)); >> >>>>> if (ret) { >> >>>>> return ret; >> >>>>> } >> >>>>> >> >>>>> if (s->aw_bits > info.aw_bits) { >> >>>>> error_setg(errp, "aw-bits %d > host aw-bits %d", >> >>>>> s->aw_bits, info.aw_bits); >> >>>>> return -EINVAL; >> >>>>> } >> >>>>> } >> >>>>> >> >>>>> and the HostIOMMUDeviceClass::is_compatible() handler would >call a >> >>>>> vIOMMUInterface::compatible() handler simply returning >> >>>>> IOMMU_HW_INFO_TYPE_INTEL_VTD. How does that sound ? >> >>>> >> >>>> Not quite get what HostIOMMUDeviceClass::is_compatible() does. >> >>> >> >>> HostIOMMUDeviceClass::is_compatible() calls in the host IOMMU >> >backend >> >>> to determine which IOMMU types are exposed by the host, then calls >the >> >>> vIOMMUInterface::compatible() handler to do the compare. API is to >be >> >>> defined. >> >>> >> >>> As a refinement, we could introduce in the vIOMMU <-> >> >HostIOMMUDevice >> >>> interface capabilities, or features, to check more precisely the level >> >>> of compatibility between the vIOMMU and the host IOMMU device. >This >> >is >> >>> similar to what is done between QEMU and KVM. >> >>> >> >>> If you think this is too complex, include type in HostIOMMUDeviceInfo. >> >>> >> >>>> Currently legacy and IOMMUFD host device has different check logic, >> >how >> >>> it can help >> >>>> in merging vtd_check_legacy_hdev() and vtd_check_iommufd_hdev() >> >into >> >>> a single vtd_check_hdev()? >> >>> >> >>> IMHO, IOMMU shouldn't be aware of the IOMMU backend >> >implementation, >> >>> but >> >>> if you think the Intel vIOMMU should access directly the iommufd >> >backend >> >>> when available, then we should drop this proposal and revisit the >design >> >>> to take a different approach. >> >> >> >> I implemented a draft following your suggestions so we could explore >> >further. >> >> See >> >https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_nesting_ >pre >> >q_v3_tmp >> >> >> >> In this draft, it uses .check_cap() to query >HOST_IOMMU_DEVICE_CAP_xxx >> >> just like KVM CAPs. >> >> A common HostIOMMUDeviceCaps structure is introduced to be used >by >> >> both legacy and iommufd backend. >> >> >> >> It indeed is cleaner. Only problem is I failed to implement .compatible() >> >> as all the check could go ahead by just calling check_cap(). >> >> Could you help a quick check to see if I misunderstood any of your >> >suggestion? >> > >> >Thanks for the changes. It looks cleaner and simpler ! Some comments, >> > >> >* HostIOMMUDeviceIOMMUFDClass seems useless as it is empty. I don't >> > remember if you told me already you had plans for future changes. >> > Sorry about that if this is the case. I forgot. >> >> Never mind😊, reason is: >> >> In nesting series >> >https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_nesting >_rfcv2/ >> This commit >> >https://github.com/yiliu1765/qemu/commit/581fc900aa296988eaa48abee >6d68d3670faf8c9 >> implement [at|de]tach_hwpt handlers. >> >> So I add an extra layer of abstract HostIOMMUDeviceIOMMUFDClass to >define >> [at|de]tach_hwpt handlers. >> >> > >> >* I would use the 'host_iommu_device_' prefix for external routines >> > which are part of the HostIOMMUDevice API and use 'hiod_' for >> > internal routines where it makes sense, to limit the name length for >> > instance. >> >> Good idea, will do. >> >> > >> >* I would rename HOST_IOMMU_DEVICE_CAP_IOMMUFD_V1 to >> > HOST_IOMMU_DEVICE_CAP_IOMMUFD. I mentioned IOMMUFD v2 as >a >> > theoretical example of a different IOMMU interface. I don't think we >> > need to anticipate yet :) >> >> Will do. >> >> > >> >* HostIOMMUDeviceCaps is using 'enum iommu_hw_info_type' from >> > 'linux/iommufd.h', that's not my preferred choice but I won't >> > object. The result looks good. >> >> Ok, will keep it for now. We can change when you want in future. >> >> > >> >* HostIOMMUDevice now has a realize() routine to query the host >IOMMU >> > capability for later usage. This is a good idea. However, you could >> > change the return value to bool and avoid the ERRP_GUARD() prologue. >> >> Will do. >> >> > >> >* Beware of : >> > >> > struct Range { >> > /* >> > * Do not access members directly, use the functions! >> > * A non-empty range has @lob <= @upb. >> > * An empty range has @lob == @upb + 1. >> > */ >> > uint64_t lob; /* inclusive lower bound */ >> > uint64_t upb; /* inclusive upper bound */ >> > }; >> >> I remember😊, will add the change in formal version. >> >> > >> > >> >* I think you could introduce a new VFIOIOMMUClass attribute. Let's >> > call it VFIOIOMMUClass::hiod_typename. The creation of >> >HostIOMMUDevice >> > would become generic and you could move : >> > >> > hiod= >> >HOST_IOMMU_DEVICE(object_new(TYPE_HOST_IOMMU_DEVICE_LEGAC >Y_V >> >FIO)); >> > HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, >errp); >> > if (*errp) { >> > object_unref(hiod); >> > return -EINVAL; >> > } >> > vbasedev->hiod = hiod; >> > >> > at the end of vfio_attach_device(). >> >> Good suggestion! Will do. >> >> Thanks >> Zhenzhong > > >So I'm expecting v3 of this.
Just posted v6, https://lists.gnu.org/archive/html/qemu-devel/2024-06/msg00102.html Comments appreciated! Thanks Zhenzhong