Hi Zhenzhong,

On 7/17/25 5:47 AM, Duan, Zhenzhong wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: Eric Auger <[email protected]>
>> Sent: Wednesday, July 16, 2025 8:09 PM
>> To: Duan, Zhenzhong <[email protected]>;
>> [email protected]
>> Cc: [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; Tian, Kevin <[email protected]>; Liu,
>> Yi L <[email protected]>; Peng, Chao P <[email protected]>
>> Subject: Re: [PATCH v3 07/20] intel_iommu: Check for compatibility with
>> IOMMUFD backed device when x-flts=on
>>
>> Hi Zhenzhong,
>>
>> On 7/16/25 12:31 PM, Duan, Zhenzhong wrote:
>>> Hi Eric,
>>>
>>>> -----Original Message-----
>>>> From: Eric Auger <[email protected]>
>>>> Subject: Re: [PATCH v3 07/20] intel_iommu: Check for compatibility with
>>>> IOMMUFD backed device when x-flts=on
>>>>
>>>> Hi Zhenzhong,
>>>>
>>>> On 7/8/25 1:05 PM, Zhenzhong Duan wrote:
>>>>> When vIOMMU is configured x-flts=on in scalable mode, stage-1 page
>> table
>>>>> is passed to host to construct nested page table. We need to check
>>>>> compatibility of some critical IOMMU capabilities between vIOMMU and
>>>>> host IOMMU to ensure guest stage-1 page table could be used by host.
>>>>>
>>>>> For instance, vIOMMU supports stage-1 1GB huge page mapping, but
>> host
>>>>> does not, then this IOMMUFD backed device should fail.
>>>>>
>>>>> Even of the checks pass, for now we willingly reject the association
>>>>> because all the bits are not there yet.
>>>>>
>>>>> Signed-off-by: Yi Liu <[email protected]>
>>>>> Signed-off-by: Zhenzhong Duan <[email protected]>
>>>>> ---
>>>>>  hw/i386/intel_iommu.c          | 30
>>>> +++++++++++++++++++++++++++++-
>>>>>  hw/i386/intel_iommu_internal.h |  1 +
>>>>>  2 files changed, 30 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>>>> index e90fd2f28f..c57ca02cdd 100644
>>>>> --- a/hw/i386/intel_iommu.c
>>>>> +++ b/hw/i386/intel_iommu.c
>>>>> @@ -40,6 +40,7 @@
>>>>>  #include "kvm/kvm_i386.h"
>>>>>  #include "migration/vmstate.h"
>>>>>  #include "trace.h"
>>>>> +#include "system/iommufd.h"
>>>>>
>>>>>  /* context entry operations */
>>>>>  #define VTD_CE_GET_RID2PASID(ce) \
>>>>> @@ -4355,7 +4356,34 @@ static bool vtd_check_hiod(IntelIOMMUState
>> *s,
>>>> HostIOMMUDevice *hiod,
>>>>>          return true;
>>>>>      }
>>>>>
>>>>> -    error_setg(errp, "host device is uncompatible with stage-1
>>>> translation");
>>>>> +#ifdef CONFIG_IOMMUFD
>>>>> +    struct HostIOMMUDeviceCaps *caps = &hiod->caps;
>>>>> +    struct iommu_hw_info_vtd *vtd = &caps->vendor_caps.vtd;
>>>> I am now confused about how this relates to vtd_get_viommu_cap().
>>>> PCIIOMMUOps.set_iommu_device = vtd_dev_set_iommu_device calls
>>>> vtd_check_hiod()
>>>> viommu might return HW_NESTED_CAP through
>>>> PCIIOMMUOps.get_viommu_cap
>>>> without making sure the underlying HW IOMMU does support it. Is that a
>>>> correct understanding? Maybe we should clarify the calling order between
>>>> set_iommu_device vs get_viommu_cap? Could we check HW IOMMU
>>>> prerequisites in vtd_get_viommu_cap() by enforcing this is called after
>>>> set_iommu_device. I think we should clarify the exact semantic of
>>>> get_viommu_cap().Thanks Eric
>>> My understanding get_viommu_cap() returns pure vIOMMU's capabilities
>>> with no host IOMMU's capabilities involved.
>>>
>>> For example, returned HW_NESTED_CAP means this vIOMMU has code
>>> to support creating nested hwpt and attaching, no matter if host IOMMU
>>> supports nesting or not.
>> Then I think you need to refine the description in 2/20 to make this clear.
>> stating explicitly get_viommu_cap returns theoretical capabilities which
>> are independent on the actual host capabilities they may depend on.
> Will do.
>
> For virtual vtd, we are unable to return capabilities depending on host 
> capacities,
> Because different host IOMMU may have different capabilities, we want to 
> return
> a consistent result only depending on user's cmdline config.
ok
>
>>> The compatibility check between host IOMMU vs vIOMMU is done in
>>> set_iommu_device(), see vtd_check_hiod().
>>>
>>> It's too late for VFIO to call get_viommu_cap() after set_iommu_device()
>>> because we need get_viommu_cap() to determine if creating nested parent
>>> hwpt or not at attaching stage.
>> isn't it possible to rework the call sequence?
> I think not. Current sequence:
>
> attach_device()
>     get_viommu_cap()
>     create hwpt
> ...
> create hiod
> set_iommu_device(hiod)
>
> Hiod realize needs iommufd,devid and hwpt_id which are ready after 
> attach_device().
OK. I would add this explanation in the commit msg too.
>
> Thanks
> Zhenzhong
Thanks

Eric


Reply via email to