>-----Original Message-----
>From: Nicolin Chen <nicol...@nvidia.com>
>Subject: Re: [PATCH rfcv3 15/21] intel_iommu: Bind/unbind guest page table to
>host
>
>On Wed, May 21, 2025 at 07:14:45PM +0800, Zhenzhong Duan wrote:
>> +static const MemoryListener iommufd_s2domain_memory_listener = {
>> +    .name = "iommufd_s2domain",
>> +    .priority = 1000,
>> +    .region_add = iommufd_listener_region_add_s2domain,
>> +    .region_del = iommufd_listener_region_del_s2domain,
>> +};
>
>Would you mind elaborating When and how vtd does all S2 mappings?

When guest trigger pasid cache invalidation, vIOMMU will attach device
to stage2 page table if guest's PGTT=PT or nested page table if PGTT=Stage1.
All these page tables are dynamically created during attach. We don't use
VFIO's shadow page table. The S2 mappings are also created during attach.
See:

vtd_device_attach_iommufd()
{
...
    vtd_device_attach_container();
    container->listener = iommufd_s2domain_memory_listener;
    memory_listener_register(&container->listener, &address_space_memory);
...
}

>
>On ARM, the default vfio_memory_listener could capture the entire
>guest RAM and add to the address space. So what we do is basically
>reusing the vfio_memory_listener:
>https://lore.kernel.org/qemu-devel/20250311141045.66620-13-
>shameerali.kolothum.th...@huawei.com/
>
>The thing is that when a VFIO device is attached to the container
>upon a nesting configuration, the ->get_address_space op should
>return the system address space as S1 nested HWPT isn't allocated
>yet. Then all the iommu as routines in vfio_listener_region_add()
>would be skipped, ending up with mapping the guest RAM in S2 HWPT
>correctly. Not until the S1 nested HWPT is allocated by the guest
>OS (after guest boots), can the ->get_address_space op return the
>iommu address space.

When S1 hwpt is allocated by guest, who will notify VFIO to call
->get_address_space op() again to get iommu address space?

>
>With this address space shift, S2 mappings can be simply captured
>and done by vfio_memory_listener. Then, such an s2domain listener
>would be largely redundant.

I didn't get how arm smmu supports switching address space, will VFIO call
->get_address_space() twice, once to get system address space and the other
for iommu address space?

>
>So the second question is:
>Does vtd have to own this iommufd_s2domain_memory_listener? IOW,
>does vtd_host_dma_iommu() have to return the iommu address space
>all the time?

Vtd only support to return a fixed address space, under the address space
there can be either system memory region or iommu memory region enabled.

>
>> +static int vtd_create_s1_hwpt(HostIOMMUDeviceIOMMUFD *idev,
>> +                              VTDS2Hwpt *s2_hwpt, VTDHwpt *hwpt,
>> +                              VTDPASIDEntry *pe, Error **errp)
>> +{
>> +    struct iommu_hwpt_vtd_s1 vtd;
>> +    uint32_t hwpt_id, s2_hwpt_id = s2_hwpt->hwpt_id;
>> +
>> +    vtd_init_s1_hwpt_data(&vtd, pe);
>> +
>> +    if (!iommufd_backend_alloc_hwpt(idev->iommufd, idev->devid,
>> +                                    s2_hwpt_id, 0, IOMMU_HWPT_DATA_VTD_S1,
>> +                                    sizeof(vtd), &vtd, &hwpt_id, errp)) {
>> +        return -EINVAL;
>> +    }
>> +
>> +    hwpt->hwpt_id = hwpt_id;
>> +
>> +    return 0;
>> +}
>> +
>> +static void vtd_destroy_s1_hwpt(HostIOMMUDeviceIOMMUFD *idev,
>VTDHwpt *hwpt)
>> +{
>> +    iommufd_backend_free_id(idev->iommufd, hwpt->hwpt_id);
>> +}
>
>I think you did some substantial work to isolate the get_hw_info
>part inside the iommufd backend code, which looks nice and clean
>as the vIOMMU code simply does iodc->get_cap().
>
>However, that then makes these direct raw backend function calls
>very awkward :-/
>
>In my view, the way to make sense is either:
>* We don't do any isolation, but just call raw backend functions
>  in vIOMMU code
>* We do every isolation, and never call raw backend functions in
>  vIOMMU code

Iommufd backend functions are general for all modules usage including
vIOMMU. I think we are not blocking vIOMMU calling raw backend functions.
We just provide a general interface for querying capabilities, not all 
capabilities
are from iommufd get_hw_info result, e.g., aw_bits.

Zhenzhong



Reply via email to