Hi Cédric, >-----Original Message----- >From: Cédric Le Goater <c...@redhat.com> >Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do >compatibility check with host IOMMU cap/ecap > >Hello Zhenzhong > >On 4/17/24 11:24, Duan, Zhenzhong wrote: >> >> >>> -----Original Message----- >>> From: Cédric Le Goater <c...@redhat.com> >>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do >>> compatibility check with host IOMMU cap/ecap >>> >>> On 4/17/24 06:21, Duan, Zhenzhong wrote: >>>> >>>> >>>>> -----Original Message----- >>>>> From: Cédric Le Goater <c...@redhat.com> >>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do >>>>> compatibility check with host IOMMU cap/ecap >>>>> >>>>> Hello, >>>>> >>>>> On 4/16/24 09:09, Duan, Zhenzhong wrote: >>>>>> Hi Cédric, >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Cédric Le Goater <c...@redhat.com> >>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do >>>>>>> compatibility check with host IOMMU cap/ecap >>>>>>> >>>>>>> On 4/8/24 10:44, Zhenzhong Duan wrote: >>>>>>>> From: Yi Liu <yi.l....@intel.com> >>>>>>>> >>>>>>>> If check fails, the host side device(either vfio or vdpa device) should >>> not >>>>>>>> be passed to guest. >>>>>>>> >>>>>>>> Implementation details for different backends will be in following >>>>> patches. >>>>>>>> >>>>>>>> Signed-off-by: Yi Liu <yi.l....@intel.com> >>>>>>>> Signed-off-by: Yi Sun <yi.y....@linux.intel.com> >>>>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.d...@intel.com> >>>>>>>> --- >>>>>>>> hw/i386/intel_iommu.c | 35 >>>>>>> +++++++++++++++++++++++++++++++++++ >>>>>>>> 1 file changed, 35 insertions(+) >>>>>>>> >>>>>>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c >>>>>>>> index 4f84e2e801..a49b587c73 100644 >>>>>>>> --- a/hw/i386/intel_iommu.c >>>>>>>> +++ b/hw/i386/intel_iommu.c >>>>>>>> @@ -35,6 +35,7 @@ >>>>>>>> #include "sysemu/kvm.h" >>>>>>>> #include "sysemu/dma.h" >>>>>>>> #include "sysemu/sysemu.h" >>>>>>>> +#include "sysemu/iommufd.h" >>>>>>>> #include "hw/i386/apic_internal.h" >>>>>>>> #include "kvm/kvm_i386.h" >>>>>>>> #include "migration/vmstate.h" >>>>>>>> @@ -3819,6 +3820,32 @@ VTDAddressSpace >>>>>>> *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, >>>>>>>> return vtd_dev_as; >>>>>>>> } >>>>>>>> >>>>>>>> +static int vtd_check_legacy_hdev(IntelIOMMUState *s, >>>>>>>> + HostIOMMUDevice *hiod, >>>>>>>> + Error **errp) >>>>>>>> +{ >>>>>>>> + return 0; >>>>>>>> +} >>>>>>>> + >>>>>>>> +static int vtd_check_iommufd_hdev(IntelIOMMUState *s, >>>>>>>> + HostIOMMUDevice *hiod, >>>>>>>> + Error **errp) >>>>>>>> +{ >>>>>>>> + return 0; >>>>>>>> +} >>>>>>>> + >>>>>>>> +static int vtd_check_hdev(IntelIOMMUState *s, >>>>> VTDHostIOMMUDevice >>>>>>> *vtd_hdev, >>>>>>>> + Error **errp) >>>>>>>> +{ >>>>>>>> + HostIOMMUDevice *hiod = vtd_hdev->dev; >>>>>>>> + >>>>>>>> + if (object_dynamic_cast(OBJECT(hiod), TYPE_HIOD_IOMMUFD)) >{ >>>>>>>> + return vtd_check_iommufd_hdev(s, hiod, errp); >>>>>>>> + } >>>>>>>> + >>>>>>>> + return vtd_check_legacy_hdev(s, hiod, errp); >>>>>>>> +} >>>>>>> >>>>>>> >>>>>>> I think we should be using the .get_host_iommu_info() class handler >>>>>>> instead. Can we refactor the code slightly to avoid this check on >>>>>>> the type ? >>>>>> >>>>>> There is some difficulty ini avoiding this check, the behavior of >>>>> vtd_check_legacy_hdev >>>>>> and vtd_check_iommufd_hdev are different especially after nesting >>>>> support introduced. >>>>>> vtd_check_iommufd_hdev() has much wider check over cap/ecap bits >>>>> besides aw_bits. >>>>> >>>>> I think it is important to fully separate the vIOMMU model from the >>>>> host IOMMU backing device. > >This comment is true for the structures also. > >>>>> Could we introduce a new HostIOMMUDeviceClass >>>>> handler .check_hdev() handler, which would >call .get_host_iommu_info() ? > >This means that HIOD_LEGACY_INFO and HIOD_IOMMUFD_INFO should be >a common structure 'HostIOMMUDeviceInfo' holding all attributes >for the different backends. Each .get_host_iommu_info() implementation >would translate the specific host iommu device data presentation >into the common 'HostIOMMUDeviceInfo', this is true for host_aw_bits.
I see, it's just not easy to define the unified elements in HostIOMMUDeviceInfo so that they maps to bits or fields in host return IOMMU info. Different platform returned host IOMMU info is platform specific. For vtd and siommu: struct iommu_hw_info_vtd { __u32 flags; __u32 __reserved; __aligned_u64 cap_reg; __aligned_u64 ecap_reg; }; struct iommu_hw_info_arm_smmuv3 { __u32 flags; __u32 __reserved; __u32 idr[6]; __u32 iidr; __u32 aidr; }; I can think of two kinds of declaration of HostIOMMUDeviceInfo: struct HostIOMMUDeviceInfo { uint8_t aw_bits; enum iommu_hw_info_type type; union { struct iommu_hw_info_vtd vtd; struct iommu_hw_info_arm_smmuv3; ...... } data; } or struct HostIOMMUDeviceInfo { uint8_t aw_bits; enum iommu_hw_info_type type; __u32 flags; __aligned_u64 cap_reg; __aligned_u64 ecap_reg; __u32 idr[6]; __u32 iidr; __u32 aidr; ...... } Not clear if any is your expected format. > >'type' could be handled the same way, with a 'HostIOMMUDeviceInfo' >type attribute and host iommu device type definitions, or as you >suggested with a QOM interface. This is more complex however. In >this case, I would suggest to implement a .compatible() handler to >compare the host iommu device type with the vIOMMU type. > >The resulting check_hdev routine would look something like : > >static int vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice >*vtd_hdev, > Error **errp) >{ > HostIOMMUDevice *hiod = vtd_hdev->dev; > HostIOMMUDeviceClass *hiodc = >HOST_IOMMU_DEVICE_GET_CLASS(hiod); > HostIOMMUDevice info; > int host_aw_bits, ret; > > ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp); > if (ret) { > return ret; > } > > ret = hiodc->is_compatible(hiod, VIOMMU_INTERFACE(s)); > if (ret) { > return ret; > } > > if (s->aw_bits > info.aw_bits) { > error_setg(errp, "aw-bits %d > host aw-bits %d", > s->aw_bits, info.aw_bits); > return -EINVAL; > } >} > >and the HostIOMMUDeviceClass::is_compatible() handler would call a >vIOMMUInterface::compatible() handler simply returning >IOMMU_HW_INFO_TYPE_INTEL_VTD. How does that sound ? Not quite get what HostIOMMUDeviceClass::is_compatible() does. Currently legacy and IOMMUFD host device has different check logic, how it can help in merging vtd_check_legacy_hdev() and vtd_check_iommufd_hdev() into a single vtd_check_hdev()? Below is the two functions after nesting series, for your easy reference: static int vtd_check_legacy_hdev() { if (s->scalable_modern) { /* Modern vIOMMU and legacy backend */ error_setg(errp, "Need IOMMUFD backend in scalable modern mode"); return -EINVAL; } ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp); if (ret) { return ret; } if (s->aw_bits > info.aw_bits) { error_setg(errp, "aw-bits %d > host aw-bits %d", s->aw_bits, info.aw_bits); return -EINVAL; } return 0; } static int vtd_check_iommufd_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice *vtd_hdev, Error **errp) { ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp); if (ret) { return ret; } if (info.type != IOMMU_HW_INFO_TYPE_INTEL_VTD) { error_setg(errp, "IOMMU hardware is not compatible"); return -EINVAL; } vtd = &info.data.vtd; host_aw_bits = VTD_MGAW_FROM_CAP(vtd->cap_reg) + 1; if (s->aw_bits > host_aw_bits) { error_setg(errp, "aw-bits %d > host aw-bits %d", s->aw_bits, host_aw_bits); return -EINVAL; } if (!s->scalable_modern) { goto done; } if (!(vtd->ecap_reg & VTD_ECAP_NEST)) { error_setg(errp, "Host IOMMU doesn't support nested translation"); return -EINVAL; } if (s->fl1gp && !(vtd->cap_reg & VTD_CAP_FL1GP)) { error_setg(errp, "Stage-1 1GB huge page is unsupported by host IOMMU"); return -EINVAL; } vtd_hdev->errata = vtd->flags & IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17; done: return 0; } Thanks Zhenzhong > >Including the type in HostIOMMUDeviceInfo is much simpler to start with. > >Thanks, > >C. > > > > > >>>> >>>> Understood, besides the new .check_hdev() handler, I think we also >need a >>> new interface >>>> class TYPE_IOMMU_CHECK_HDEV which has two handlers >>> check_[legacy|iommufd]_hdev(), >>>> and different vIOMMUs have different implementation. >>> >>> I am not sure to understand. Which class hierarchy would implement this >>> new "TYPE_IOMMU_CHECK_HDEV" interface ? vIOMMU or host iommu ? >>> >>> Could you please explain with an update of your diagram : >>> >>> HostIOMMUDevice >>> | .get_host_iommu_info() >>> | >>> | >>> .------------------------------------. >>> | | | >>> HIODLegacyVFIO [HIODLegacyVDPA] HIODIOMMUFD >>> | .vdev | [.vdev] | .iommufd >>> | .devid >>> | [.ioas_id] >>> | [.attach_hwpt()] >>> | [.detach_hwpt()] >>> | >>> .----------------------. >>> | | >>> HIODIOMMUFDVFIO [HIODIOMMUFDVDPA] >>> | .vdev | [.vdev] >>> >> >> Sure. >> >> HostIOMMUDevice >> | .get_host_iommu_info() >> | .check_hdev() >> | >> .------------------------------. >> | | >> HIODLegacy HIODIOMMUFD >> | | .iommufd >> .--------------. | .devid >> | | | [.ioas_id] >> HIODLegacyVFIO [HIODLegacyVDPA] | [.attach_hwpt()] >> | .vdev | [.vdev] | [.detach_hwpt()] >> | >> .----------------------. >> | | >> HIODIOMMUFDVFIO [HIODIOMMUFDVDPA] >> | .vdev | [.vdev] >> >> >> HostIOMMUDevice only declare .check_hdev(), but >> HIODLegacy and HIODIOMMUFD will implement .check_hdev(). >> E.g., hiod_legacy_check_hdev() and hiod_iommufd_check_hdev(). >> >> int hiod_legacy_check_hdev(HostIOMMUDevice *hiod, IOMMUCheckHDev >*viommu, Error **errp) >> { >> IOMMUCheckHDevClass *chdc = >IOMMU_CHECK_HDEV_GET_CLASS(viommu); >> >> return chdc->check_legacy_hdev(viommu, hiod, errp); >> } >> >> int hiod_iommufd_check_hdev(HostIOMMUDevice *hiod, >IOMMUCheckHDev *viommu, Error **errp) >> { >> IOMMUCheckHDevClass *chdc = >IOMMU_CHECK_HDEV_GET_CLASS(viommu); >> >> return chdc->check_iommufd_hdev(viommu, hiod, errp); >> } >> >> And we implement interface TYPE_IOMMU_CHECK_HDEV in intel-iommu >module. >> Certainly, we can also implement the same in other vIOMMUs we want. >> See below pseudo change: >> >> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c >> index 68380d50ca..173c702b9f 100644 >> --- a/hw/i386/intel_iommu.c >> +++ b/hw/i386/intel_iommu.c >> @@ -5521,12 +5521,9 @@ static int vtd_check_hdev(IntelIOMMUState >*s, VTDHostIOMMUDevice *vtd_hdev, >> Error **errp) >> { >> HostIOMMUDevice *hiod = vtd_hdev->dev; >> + HostIOMMUDeviceClass *hiodc = >HOST_IOMMU_DEVICE_GET_CLASS(hiod); >> >> - if (object_dynamic_cast(OBJECT(hiod), TYPE_HIOD_IOMMUFD)) { >> - return vtd_check_iommufd_hdev(s, vtd_hdev, errp); >> - } >> - >> - return vtd_check_legacy_hdev(s, hiod, errp); >> + return hiodc->check_hdev(IOMMU_CHECK_HDEV(s), hiod, errp); >> } >> >> static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int >devfn, >> @@ -6076,6 +6073,7 @@ static void vtd_class_init(ObjectClass *klass, >void *data) >> { >> DeviceClass *dc = DEVICE_CLASS(klass); >> X86IOMMUClass *x86_class = X86_IOMMU_DEVICE_CLASS(klass); >> + IOMMUCheckHDevClass *chdc = IOMMU_CHECK_HDEV_CLASS(klass); >> >> dc->reset = vtd_reset; >> dc->vmsd = &vtd_vmstate; >> @@ -6087,6 +6085,8 @@ static void vtd_class_init(ObjectClass *klass, >void *data) >> dc->user_creatable = true; >> set_bit(DEVICE_CATEGORY_MISC, dc->categories); >> dc->desc = "Intel IOMMU (VT-d) DMA Remapping device"; >> + chdc->check_legacy_hdev = vtd_check_legacy_hdev; >> + chdc->check_iommufd_hdev = vtd_check_iommufd_hdev; >> } >> >> static const TypeInfo vtd_info = { >> @@ -6094,6 +6094,10 @@ static const TypeInfo vtd_info = { >> .parent = TYPE_X86_IOMMU_DEVICE, >> .instance_size = sizeof(IntelIOMMUState), >> .class_init = vtd_class_init, >> + .interfaces = (InterfaceInfo[]) { >> + { TYPE_IOMMU_CHECK_HDEV }, >> + { } >> + } >> }; >> >> Thanks >> Zhenzhong