OK. Let me clarify this at the top, as I see the gap here now:

First, the vSMMU model is based on Zhenzhong's older series, which keeps
an ioas_id in the HostIOMMUDeviceIOMMUFD structure, whereas this RFCv3
series now keeps only an hwpt_id there. This ioas_id is allocated when a
passthrough cdev attaches to a VFIO container.

Second, the vSMMU model reuses the default IOAS via that ioas_id. Since
the VFIO container doesn't allocate a nesting parent S2 HWPT (maybe it
could?), the vSMMU allocates another S2 HWPT in the vIOMMU code.

Third, the vSMMU model, for invalidation efficiency and HW Queue support,
isolates all emulated devices out of the nesting-enabled vSMMU instance,
as suggested by Jason. So, only passthrough devices would use the
nesting-enabled vSMMU instance, meaning there is no need for
IOMMU_NOTIFIER_IOTLB_EVENTS:
 - MAP is not needed as there is no shadow page table. QEMU only traps
   the page table pointer and forwards it to the host kernel.
 - UNMAP is not needed as QEMU only traps invalidation requests and
   forwards them to the host kernel.

(Let's forget about the "address space switch" for MSI for now.)

So, in the vSMMU model, there is actually no need for the iommu AS. There
is only one IOAS in the VM instance, allocated by the VFIO container, and
this IOAS manages the GPA->PA mappings. So, get_address_space() returns
the system AS for passthrough devices.

On the other hand, the VT-d model is a bit different. It's a giant vIOMMU
for all devices (either passthrough or emulated). For all emulated
devices, it needs IOMMU_NOTIFIER_IOTLB_EVENTS, i.e. the iommu address
space returned via get_address_space().

That being said, IOMMU_NOTIFIER_IOTLB_EVENTS should not be needed for
passthrough devices, right?

IIUIC, in the VT-d model, a passthrough device also gets attached to the
VFIO container via iommufd_cdev_attach, allocating an IOAS. But it
returns the iommu address space, treating them like those emulated
devices, although the underlying MR of the returned IOMMU AS is backed by
a nodmar MR (that is essentially a system AS). This seems to completely
ignore the default IOAS owned by the VFIO container, because it needs to
bypass those RO mappings(?)
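
To put the difference between the two models in rough code: this is only
a sketch of my reading above, using made-up stand-in types (Dev, ASKind)
and made-up function names rather than the real QEMU AddressSpace
structures or the actual get_address_space() callbacks.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the real QEMU types. */
typedef enum { AS_SYSTEM, AS_IOMMU } ASKind;

typedef struct {
    const char *name;
    bool passthrough;   /* true: VFIO/iommufd cdev, false: emulated device */
} Dev;

/*
 * Nesting-enabled vSMMU instance: emulated devices are kept out of it, so
 * every device seen here is passthrough and stays in the system AS. The
 * VFIO container's default IOAS then carries the GPA->PA mappings.
 */
static ASKind vsmmu_nested_get_as(const Dev *d)
{
    (void)d;
    return AS_SYSTEM;
}

/*
 * VT-d model as I understand it: one vIOMMU for all devices, and the
 * iommu AS is returned for passthrough and emulated devices alike.
 */
static ASKind vtd_get_as(const Dev *d)
{
    (void)d;
    return AS_IOMMU;
}

/*
 * IOMMU_NOTIFIER_IOTLB_EVENTS (MAP/UNMAP) only matter where QEMU has to
 * replay guest mappings itself, i.e. for emulated devices.
 */
static bool needs_iotlb_notifier(const Dev *d)
{
    return !d->passthrough;
}
```

The point of the sketch is just that the passthrough case has no use for
the notifier-driven path in either model.
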
Then for passthrough devices, the VT-d model allocates an internal IOAS
that further requires an internal S2 listener, which seems a large
duplication of what the VFIO container already does..

So, here are things that I want us to conclude:
1) Since the VFIO container already has an IOAS for a passthrough device,
   and IOMMU_NOTIFIER_IOTLB_EVENTS isn't seemingly needed, why not set up
   this default IOAS to manage gPA=>PA mappings by returning the system
   AS via get_address_space() for passthrough devices?

   I got that the VT-d model might have some concern against this, as the
   default listener would map those RO regions. Yet, maybe the right
   approach is to figure out a way to bypass RO regions in the core vs.
   duplicating another ioas_alloc()/map() and S2 listener?

2) If (1) makes sense, I think we can further simplify the routine by
   allocating a nesting parent HWPT in iommufd_cdev_attach(), as long as
   the attaching device is identified as "passthrough" and there is
   "iommufd" in its "-device" string?

   After all, IOMMU_HWPT_ALLOC_NEST_PARENT is a common flag.

On Mon, May 26, 2025 at 03:24:50PM +0800, Yi Liu wrote:
> vfio_listener_region_add, section->mr->name: pc.bios, iova: fffc0000, size:
> 40000, vaddr: 7fb314200000, RO
> vfio_listener_region_add, section->mr->name: pc.rom, iova: c0000, size:
> 20000, vaddr: 7fb206c00000, RO
..
> vfio_listener_region_add, section->mr->name: pc.ram, iova: ce000, size:
> 1a000, vaddr: 7fb207ece000, RO

OK. They look like memory carveouts for FWs. "iova" is gPA, right? And
they can be in the range of guest RAM..

Mind elaborating why they shouldn't be mapped onto the nesting parent S2?

> IMHO. At least for vfio devices, I can see only one get_address_space()
> call. So even there are two ASs, how should the vfio be notified when the
> AS changed?
> Since vIOMMU is the source of map/umap requests, it looks fine
> to always return iommu AS and handle the AS switch by switching the enabled
> subregions according to the guest vIOMMU translation types.

No, VFIO doesn't get notified when the AS changes.

The vSMMU model wants VFIO to stay in the system AS since the VFIO
container manages the S2 mappings for guest PA.

The "switch" in the vSMMU model is only needed by KVM for MSI doorbell
translation. Thinking about it carefully, maybe it shouldn't switch AS,
because VFIO might be confused if it somehow does get_address_space
again in the future..

Thanks
Nic