Hi Zhenzhong, On 10/24/25 10:43 AM, Zhenzhong Duan wrote: > When guest enables scalable mode and setup first stage page table, we don't > want to use IOMMU MR but rather continue using the system MR for IOMMUFD > backed host device. > > Then default HWPT in VFIO contains GPA->HPA mappings which could be reused > as nesting parent HWPT to construct nested HWPT in vIOMMU.
we had a discussion thread with Nicolin and Shameer about usage of AS for nested SMMU (https://lore.kernel.org/all/[email protected]/) If I understand correctly you also rely on system MR for nested. I am not sure this is a good usage of the API/AS objects as in practice you have an actual translation in place (althout implemented by HW) while by using the system MR you do not reflect that. I encouraged Shameer to try using a dummy dedicated AS that can be shared. I think it would be better if we could align the strategies. > > Suggested-by: Yi Liu <[email protected]> > Signed-off-by: Zhenzhong Duan <[email protected]> > Reviewed-by: Eric Auger <[email protected]> > Reviewed-by: Yi Liu <[email protected]> > --- > hw/i386/intel_iommu.c | 36 ++++++++++++++++++++++++++++++++++-- > 1 file changed, 34 insertions(+), 2 deletions(-) > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c > index 4c83578c54..ce4c54165e 100644 > --- a/hw/i386/intel_iommu.c > +++ b/hw/i386/intel_iommu.c > @@ -41,6 +41,7 @@ > #include "migration/misc.h" > #include "migration/vmstate.h" > #include "trace.h" > +#include "system/iommufd.h" > > /* context entry operations */ > #define PASID_0 0 > @@ -1713,6 +1714,24 @@ static bool vtd_dev_pt_enabled(IntelIOMMUState *s, > VTDContextEntry *ce, > > } > > +static VTDHostIOMMUDevice *vtd_find_hiod_iommufd(VTDAddressSpace *as) > +{ > + IntelIOMMUState *s = as->iommu_state; > + struct vtd_as_key key = { > + .bus = as->bus, > + .devfn = as->devfn, > + }; > + VTDHostIOMMUDevice *vtd_hiod = g_hash_table_lookup(s->vtd_host_iommu_dev, > + &key); > + > + if (vtd_hiod && vtd_hiod->hiod && > + object_dynamic_cast(OBJECT(vtd_hiod->hiod), > + TYPE_HOST_IOMMU_DEVICE_IOMMUFD)) { > + return vtd_hiod; > + } > + return NULL; > +} > + > static bool vtd_as_pt_enabled(VTDAddressSpace *as) > { > IntelIOMMUState *s; > @@ -1738,12 +1757,25 @@ static bool vtd_as_pt_enabled(VTDAddressSpace *as) > /* Return whether the device is using IOMMU translation. */ > static bool vtd_switch_address_space(VTDAddressSpace *as) > { > + IntelIOMMUState *s; > bool use_iommu, pt; > > assert(as); > > - use_iommu = as->iommu_state->dmar_enabled && !vtd_as_pt_enabled(as); > - pt = as->iommu_state->dmar_enabled && vtd_as_pt_enabled(as); > + s = as->iommu_state; > + use_iommu = s->dmar_enabled && !vtd_as_pt_enabled(as); > + pt = s->dmar_enabled && vtd_as_pt_enabled(as); > + > + /* > + * When guest enables scalable mode and sets up first stage page table, > + * we stick to system MR for IOMMUFD backed host device. Then its > + * default hwpt contains GPA->HPA mappings which is used directly if > + * PGTT=PT and used as nesting parent if PGTT=FST. Otherwise fall back > + * to original processing. According to the above comment you have a S1 translation in place but you set use_iommu = false and use system MR? Revoking my R-bs for now because I am not convinced we shall use system MR when S1+S2 is setup. I may be wrong but at least I need more explanations ;-) Eric > + */ > + if (s->root_scalable && s->fsts && vtd_find_hiod_iommufd(as)) { > + use_iommu = false; > + } > > trace_vtd_switch_address_space(pci_bus_num(as->bus), > VTD_PCI_SLOT(as->devfn),
