On 8/25/25 10:28 AM, Duan, Zhenzhong wrote:
>
>> -----Original Message-----
>> From: Nicolin Chen <nicol...@nvidia.com>
>> Subject: Re: [PATCH v5 05/21] vfio/iommufd: Force creating nested parent
>> domain
>>
>> On Fri, Aug 22, 2025 at 02:40:43AM -0400, Zhenzhong Duan wrote:
>>> Call pci_device_get_viommu_cap() to get if vIOMMU supports
>> VIOMMU_CAP_HW_NESTED,
>>> if yes, create nested parent domain which could be reused by vIOMMU to
>> create
>>> nested domain.
>>>
>>> Introduce helper vfio_device_viommu_get_nested to facilitate this
>>> implementation.
>> It'd be nicer to slightly mention the benefit of having it. Assuming
>> that QEMU commit message can be as long as 80 characters:
>>
>> -------------------------
>> Call pci_device_get_viommu_cap() to get if vIOMMU supports
>> VIOMMU_CAP_HW_NESTED.
>>
>> If yes, create a nesting parent domain and add it to the container's
>> hwpt_list,
>> letting this parent domain cover the entire stage-2 mappings (gPA=>PA).
>>
>> This allows a VFIO passthrough device to directly attach to this default
>> domain
>> and then to use the system address space and its listener.
>>
>> Introduce a vfio_device_viommu_get_nested() helper to facilitate this
>> implementation.
>> -------------------------
> Thanks, will do.
>
>>> It is safe because even if VIOMMU_CAP_HW_NESTED is returned, s->flts is
>>> forbidden and VFIO device fails in set_iommu_device() call, until we support
>>> passthrough device with x-flts=on.
>> I think this is too vendor specific to be mentioned here. Likely
>> the previous VTD patch is the place to have this.
>>
>> Or you could say:
>>
>> --------------------------
>> It is safe to do so because a vIOMMU will be able to fail in
>> set_iommu_device()
>> call, if something else related to the VFIO device or vIOMMU isn't
>> compatible.
>> --------------------------
> Will do.
I would say: it is safe to add the flags |=
IOMMU_HWPT_ALLOC_NEST_PARENT; at this stage, despite the whole
functionality is not in place because HW_NESTED is currently forced off in
set_iommu_device()
Eric
>
>>> +bool vfio_device_viommu_get_nested(VFIODevice *vbasedev)
>>> +{
>>> + VFIOPCIDevice *vdev = vfio_pci_from_vfio_device(vbasedev);
>>> +
>>> + if (vdev) {
>>> + return !!(pci_device_get_viommu_cap(&vdev->pdev) &
>>> + VIOMMU_CAP_HW_NESTED);
>> "get_nested" feels too general. Here it particularly means the cap:
>>
>> bool vfio_device_get_viommu_cap_hw_nested(VFIODevice *vbasedev)
> Will use vfio_device_get_viommu_cap_hw_nested()
>
>>> @@ -379,6 +379,14 @@ static bool
>> iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>> flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>> }
>>>
>>> + /*
>>> + * If vIOMMU supports stage-1 translation, force to create nested
>> parent
>>
>> "nested parent" is a contradictory phrase. Parent is a container
>> holding some nested items. A nested parent sounds like a "parent"
>> item that lives inside another parent container.
>>
>> In kernel kdoc/uAPI, we use:
>> - "nesting parent" for stage-2 object
>> - "nested hwpt", "nested domain" for stage-1 object
> Thanks for sharing this info, I didn't notice that. I will fix the whole
> series to use 'nesting parent'.
>
> BRs,
> Zhenzhong
>