Hi Cédric, >-----Original Message----- >From: Cédric Le Goater <c...@redhat.com> >Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do >compatibility check with host IOMMU cap/ecap > >Hello Zhenzhong, > >On 4/18/24 10:42, Duan, Zhenzhong wrote: >> Hi Cédric, >> >>> -----Original Message----- >>> From: Cédric Le Goater <c...@redhat.com> >>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do >>> compatibility check with host IOMMU cap/ecap >>> >>> Hello Zhenzhong >>> >>> On 4/17/24 11:24, Duan, Zhenzhong wrote: >>>> >>>> >>>>> -----Original Message----- >>>>> From: Cédric Le Goater <c...@redhat.com> >>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do >>>>> compatibility check with host IOMMU cap/ecap >>>>> >>>>> On 4/17/24 06:21, Duan, Zhenzhong wrote: >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Cédric Le Goater <c...@redhat.com> >>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do >>>>>>> compatibility check with host IOMMU cap/ecap >>>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> On 4/16/24 09:09, Duan, Zhenzhong wrote: >>>>>>>> Hi Cédric, >>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: Cédric Le Goater <c...@redhat.com> >>>>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do >>>>>>>>> compatibility check with host IOMMU cap/ecap >>>>>>>>> >>>>>>>>> On 4/8/24 10:44, Zhenzhong Duan wrote: >>>>>>>>>> From: Yi Liu <yi.l....@intel.com> >>>>>>>>>> >>>>>>>>>> If check fails, the host side device(either vfio or vdpa device) >should >>>>> not >>>>>>>>>> be passed to guest. >>>>>>>>>> >>>>>>>>>> Implementation details for different backends will be in >following >>>>>>> patches. >>>>>>>>>> >>>>>>>>>> Signed-off-by: Yi Liu <yi.l....@intel.com> >>>>>>>>>> Signed-off-by: Yi Sun <yi.y....@linux.intel.com> >>>>>>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.d...@intel.com> >>>>>>>>>> --- >>>>>>>>>> hw/i386/intel_iommu.c | 35 >>>>>>>>> +++++++++++++++++++++++++++++++++++ >>>>>>>>>> 1 file changed, 35 insertions(+) >>>>>>>>>> >>>>>>>>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c >>>>>>>>>> index 4f84e2e801..a49b587c73 100644 >>>>>>>>>> --- a/hw/i386/intel_iommu.c >>>>>>>>>> +++ b/hw/i386/intel_iommu.c >>>>>>>>>> @@ -35,6 +35,7 @@ >>>>>>>>>> #include "sysemu/kvm.h" >>>>>>>>>> #include "sysemu/dma.h" >>>>>>>>>> #include "sysemu/sysemu.h" >>>>>>>>>> +#include "sysemu/iommufd.h" >>>>>>>>>> #include "hw/i386/apic_internal.h" >>>>>>>>>> #include "kvm/kvm_i386.h" >>>>>>>>>> #include "migration/vmstate.h" >>>>>>>>>> @@ -3819,6 +3820,32 @@ VTDAddressSpace >>>>>>>>> *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, >>>>>>>>>> return vtd_dev_as; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> +static int vtd_check_legacy_hdev(IntelIOMMUState *s, >>>>>>>>>> + HostIOMMUDevice *hiod, >>>>>>>>>> + Error **errp) >>>>>>>>>> +{ >>>>>>>>>> + return 0; >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> +static int vtd_check_iommufd_hdev(IntelIOMMUState *s, >>>>>>>>>> + HostIOMMUDevice *hiod, >>>>>>>>>> + Error **errp) >>>>>>>>>> +{ >>>>>>>>>> + return 0; >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> +static int vtd_check_hdev(IntelIOMMUState *s, >>>>>>> VTDHostIOMMUDevice >>>>>>>>> *vtd_hdev, >>>>>>>>>> + Error **errp) >>>>>>>>>> +{ >>>>>>>>>> + HostIOMMUDevice *hiod = vtd_hdev->dev; >>>>>>>>>> + >>>>>>>>>> + if (object_dynamic_cast(OBJECT(hiod), >TYPE_HIOD_IOMMUFD)) >>> { >>>>>>>>>> + return vtd_check_iommufd_hdev(s, hiod, errp); >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> + return vtd_check_legacy_hdev(s, hiod, errp); >>>>>>>>>> +} >>>>>>>>> >>>>>>>>> >>>>>>>>> I think we should be using the .get_host_iommu_info() class >handler >>>>>>>>> instead. Can we refactor the code slightly to avoid this check on >>>>>>>>> the type ? >>>>>>>> >>>>>>>> There is some difficulty ini avoiding this check, the behavior of >>>>>>> vtd_check_legacy_hdev >>>>>>>> and vtd_check_iommufd_hdev are different especially after >nesting >>>>>>> support introduced. >>>>>>>> vtd_check_iommufd_hdev() has much wider check over cap/ecap >bits >>>>>>> besides aw_bits. >>>>>>> >>>>>>> I think it is important to fully separate the vIOMMU model from the >>>>>>> host IOMMU backing device. >>> >>> This comment is true for the structures also. >>> >>>>>>> Could we introduce a new HostIOMMUDeviceClass >>>>>>> handler .check_hdev() handler, which would >>> call .get_host_iommu_info() ? >>> >>> This means that HIOD_LEGACY_INFO and HIOD_IOMMUFD_INFO should >be >>> a common structure 'HostIOMMUDeviceInfo' holding all attributes >>> for the different backends. Each .get_host_iommu_info() implementation >>> would translate the specific host iommu device data presentation >>> into the common 'HostIOMMUDeviceInfo', this is true for host_aw_bits. >> >> I see, it's just not easy to define the unified elements in >HostIOMMUDeviceInfo >> so that they maps to bits or fields in host return IOMMU info. > >The proposal is adding a vIOMMU <-> HostIOMMUDevice interface and a >new >API needs to be completely defined for it. The IOMMU backend >implementation >could be anything, legacy, iommufd, iommufd v2, some other framework >and >the vIOMMU shouldn't be aware of its implementation. > >Exposing the kernel structures as done below should be avoided because >they are part of the QEMU <-> kernel IOMMUFD interface. > > >> Different platform returned host IOMMU info is platform specific. >> For vtd and siommu: >> >> struct iommu_hw_info_vtd { >> __u32 flags; >> __u32 __reserved; >> __aligned_u64 cap_reg; >> __aligned_u64 ecap_reg; >> }; >> >> struct iommu_hw_info_arm_smmuv3 { >> __u32 flags; >> __u32 __reserved; >> __u32 idr[6]; >> __u32 iidr; >> __u32 aidr; >> }; >> >> I can think of two kinds of declaration of HostIOMMUDeviceInfo: >> >> struct HostIOMMUDeviceInfo { >> uint8_t aw_bits; >> enum iommu_hw_info_type type; >> union { >> struct iommu_hw_info_vtd vtd; >> struct iommu_hw_info_arm_smmuv3; >> ...... >> } data; >> } >> >> or >> >> struct HostIOMMUDeviceInfo { >> uint8_t aw_bits; >> enum iommu_hw_info_type type; >> __u32 flags; >> __aligned_u64 cap_reg; >> __aligned_u64 ecap_reg; >> __u32 idr[6]; >> __u32 iidr; >> __u32 aidr; >> ...... >> } >> >> Not clear if any is your expected format. >> >>> 'type' could be handled the same way, with a 'HostIOMMUDeviceInfo' >>> type attribute and host iommu device type definitions, or as you >>> suggested with a QOM interface. This is more complex however. In >>> this case, I would suggest to implement a .compatible() handler to >>> compare the host iommu device type with the vIOMMU type. >>> >>> The resulting check_hdev routine would look something like : >>> >>> static int vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice >>> *vtd_hdev, >>> Error **errp) >>> { >>> HostIOMMUDevice *hiod = vtd_hdev->dev; >>> HostIOMMUDeviceClass *hiodc = >>> HOST_IOMMU_DEVICE_GET_CLASS(hiod); >>> HostIOMMUDevice info; >>> int host_aw_bits, ret; >>> >>> ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp); >>> if (ret) { >>> return ret; >>> } >>> >>> ret = hiodc->is_compatible(hiod, VIOMMU_INTERFACE(s)); >>> if (ret) { >>> return ret; >>> } >>> >>> if (s->aw_bits > info.aw_bits) { >>> error_setg(errp, "aw-bits %d > host aw-bits %d", >>> s->aw_bits, info.aw_bits); >>> return -EINVAL; >>> } >>> } >>> >>> and the HostIOMMUDeviceClass::is_compatible() handler would call a >>> vIOMMUInterface::compatible() handler simply returning >>> IOMMU_HW_INFO_TYPE_INTEL_VTD. How does that sound ? >> >> Not quite get what HostIOMMUDeviceClass::is_compatible() does. > >HostIOMMUDeviceClass::is_compatible() calls in the host IOMMU backend >to determine which IOMMU types are exposed by the host, then calls the >vIOMMUInterface::compatible() handler to do the compare. API is to be >defined. > >As a refinement, we could introduce in the vIOMMU <-> HostIOMMUDevice >interface capabilities, or features, to check more precisely the level >of compatibility between the vIOMMU and the host IOMMU device. This is >similar to what is done between QEMU and KVM. > >If you think this is too complex, include type in HostIOMMUDeviceInfo. > >> Currently legacy and IOMMUFD host device has different check logic, how >it can help >> in merging vtd_check_legacy_hdev() and vtd_check_iommufd_hdev() into >a single vtd_check_hdev()? > >IMHO, IOMMU shouldn't be aware of the IOMMU backend implementation, >but >if you think the Intel vIOMMU should access directly the iommufd backend >when available, then we should drop this proposal and revisit the design >to take a different approach.
I implemented a draft following your suggestions so we could explore further. See https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_nesting_preq_v3_tmp In this draft, it uses .check_cap() to query HOST_IOMMU_DEVICE_CAP_xxx just like KVM CAPs. A common HostIOMMUDeviceCaps structure is introduced to be used by both legacy and iommufd backend. It indeed is cleaner. Only problem is I failed to implement .compatible() as all the check could go ahead by just calling check_cap(). Could you help a quick check to see if I misunderstood any of your suggestion? Thanks Zhenzhong > >> Below is the two functions after nesting series, for your easy reference: >> >> static int vtd_check_legacy_hdev() >> { >> if (s->scalable_modern) { >> /* Modern vIOMMU and legacy backend */ >> error_setg(errp, "Need IOMMUFD backend in scalable modern >mode"); >> return -EINVAL; >> } > >This part would typically go in the compatible() handler. > >> >> ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp); >> if (ret) { >> return ret; >> } >> >> if (s->aw_bits > info.aw_bits) { >> error_setg(errp, "aw-bits %d > host aw-bits %d", >> s->aw_bits, info.aw_bits); >> return -EINVAL; >> } >> >> return 0; >> } >> >> static int vtd_check_iommufd_hdev(IntelIOMMUState *s, >> VTDHostIOMMUDevice *vtd_hdev, >> Error **errp) >> { >> ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp); >> if (ret) { >> return ret; >> } >> >> if (info.type != IOMMU_HW_INFO_TYPE_INTEL_VTD) { >> error_setg(errp, "IOMMU hardware is not compatible"); >> return -EINVAL; >> } >> >> vtd = &info.data.vtd; >> host_aw_bits = VTD_MGAW_FROM_CAP(vtd->cap_reg) + 1; >> if (s->aw_bits > host_aw_bits) { >> error_setg(errp, "aw-bits %d > host aw-bits %d", >> s->aw_bits, host_aw_bits); >> return -EINVAL; >> } >> >> if (!s->scalable_modern) { >> goto done; >> } >> >> if (!(vtd->ecap_reg & VTD_ECAP_NEST)) { >> error_setg(errp, "Host IOMMU doesn't support nested translation"); >> return -EINVAL; >> } >> >> if (s->fl1gp && !(vtd->cap_reg & VTD_CAP_FL1GP)) { >> error_setg(errp, "Stage-1 1GB huge page is unsupported by host >IOMMU"); >> return -EINVAL; >> } > >These checks above would typically go in the compatible() handler also. > >Now, the question is how useful will that framework be if hotplugging >devices always fail because the vIOMMU and host IOMMU devices have >incompatible settings/capabilities ? Shouldn't we also add properties >at the vIOMMU level ? > > >Thanks, > >C.