Hi Zhenzhong,

On 7/17/25 5:47 AM, Duan, Zhenzhong wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: Eric Auger <[email protected]>
>> Sent: Wednesday, July 16, 2025 8:09 PM
>> To: Duan, Zhenzhong <[email protected]>;
>> [email protected]
>> Cc: [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; Tian, Kevin <[email protected]>; Liu,
>> Yi L <[email protected]>; Peng, Chao P <[email protected]>
>> Subject: Re: [PATCH v3 07/20] intel_iommu: Check for compatibility with
>> IOMMUFD backed device when x-flts=on
>>
>> Hi Zhenzhong,
>>
>> On 7/16/25 12:31 PM, Duan, Zhenzhong wrote:
>>> Hi Eric,
>>>
>>>> -----Original Message-----
>>>> From: Eric Auger <[email protected]>
>>>> Subject: Re: [PATCH v3 07/20] intel_iommu: Check for compatibility with
>>>> IOMMUFD backed device when x-flts=on
>>>>
>>>> Hi Zhenzhong,
>>>>
>>>> On 7/8/25 1:05 PM, Zhenzhong Duan wrote:
>>>>> When vIOMMU is configured x-flts=on in scalable mode, stage-1 page table
>>>>> is passed to host to construct nested page table. We need to check
>>>>> compatibility of some critical IOMMU capabilities between vIOMMU and
>>>>> host IOMMU to ensure guest stage-1 page table could be used by host.
>>>>>
>>>>> For instance, vIOMMU supports stage-1 1GB huge page mapping, but host
>>>>> does not, then this IOMMUFD backed device should fail.
>>>>>
>>>>> Even if the checks pass, for now we willingly reject the association
>>>>> because all the bits are not there yet.
>>>>>
>>>>> Signed-off-by: Yi Liu <[email protected]>
>>>>> Signed-off-by: Zhenzhong Duan <[email protected]>
>>>>> ---
>>>>>  hw/i386/intel_iommu.c          | 30 +++++++++++++++++++++++++++++-
>>>>>  hw/i386/intel_iommu_internal.h |  1 +
>>>>>  2 files changed, 30 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>>>> index e90fd2f28f..c57ca02cdd 100644
>>>>> --- a/hw/i386/intel_iommu.c
>>>>> +++ b/hw/i386/intel_iommu.c
>>>>> @@ -40,6 +40,7 @@
>>>>>  #include "kvm/kvm_i386.h"
>>>>>  #include "migration/vmstate.h"
>>>>>  #include "trace.h"
>>>>> +#include "system/iommufd.h"
>>>>>
>>>>>  /* context entry operations */
>>>>>  #define VTD_CE_GET_RID2PASID(ce) \
>>>>> @@ -4355,7 +4356,34 @@ static bool vtd_check_hiod(IntelIOMMUState *s, HostIOMMUDevice *hiod,
>>>>>          return true;
>>>>>      }
>>>>>
>>>>> -    error_setg(errp, "host device is uncompatible with stage-1 translation");
>>>>> +#ifdef CONFIG_IOMMUFD
>>>>> +    struct HostIOMMUDeviceCaps *caps = &hiod->caps;
>>>>> +    struct iommu_hw_info_vtd *vtd = &caps->vendor_caps.vtd;
>>>> I am now confused about how this relates to vtd_get_viommu_cap().
>>>> PCIIOMMUOps.set_iommu_device = vtd_dev_set_iommu_device calls
>>>> vtd_check_hiod()
>>>> viommu might return HW_NESTED_CAP through
>>>> PCIIOMMUOps.get_viommu_cap
>>>> without making sure the underlying HW IOMMU does support it. Is that a
>>>> correct understanding? Maybe we should clarify the calling order between
>>>> set_iommu_device vs get_viommu_cap? Could we check HW IOMMU
>>>> prerequisites in vtd_get_viommu_cap() by enforcing this is called after
>>>> set_iommu_device? I think we should clarify the exact semantic of
>>>> get_viommu_cap().
>>>>
>>>> Thanks
>>>>
>>>> Eric
>>> My understanding is that get_viommu_cap() returns pure vIOMMU capabilities
>>> with no host IOMMU capabilities involved.
>>>
>>> For example, returned HW_NESTED_CAP means this vIOMMU has code
>>> to support creating nested hwpt and attaching, no matter if host IOMMU
>>> supports nesting or not.
>> Then I think you need to refine the description in 2/20 to make this clear,
>> stating explicitly that get_viommu_cap() returns theoretical capabilities
>> which are independent of the actual host capabilities they may depend on.
> Will do.
>
> For virtual vtd, we are unable to return capabilities that depend on host
> capabilities, because different host IOMMUs may have different
> capabilities. We want to return a consistent result that depends only on
> the user's cmdline config.
ok
>
>>> The compatibility check between host IOMMU and vIOMMU is done in
>>> set_iommu_device(), see vtd_check_hiod().
>>>
>>> It's too late for VFIO to call get_viommu_cap() after set_iommu_device()
>>> because we need get_viommu_cap() to determine whether to create a nested
>>> parent hwpt or not at the attach stage.
>> Isn't it possible to rework the call sequence?
> I think not. Current sequence:
>
> attach_device()
>   get_viommu_cap()
>   create hwpt
> ...
> create hiod
> set_iommu_device(hiod)
>
> Hiod realize needs iommufd, devid and hwpt_id, which are ready after
> attach_device().
OK. I would add this explanation in the commit msg too.
>
> Thanks
> Zhenzhong
Thanks
Eric
