On Mon, Sep 29, 2025 at 02:36:27PM +0100, Shameer Kolothum wrote:
> On ARM, when a device is behind an IOMMU, its MSI doorbell address is
> subject to translation by the IOMMU. This behavior affects vfio-pci
> passthrough devices assigned to guests using an accelerated SMMUv3.
>
> In this setup, we configure the host SMMUv3 in nested mode, where
> VFIO sets up the Stage-2 (S2) mappings for guest RAM, while the guest
> controls Stage-1 (S1). To allow VFIO to correctly configure S2 mappings,
> we currently return the system address space via the get_address_space()
> callback for vfio-pci devices.
>
> However, QEMU/KVM also uses this same callback path when resolving the
> address space for MSI doorbells:
>
> kvm_irqchip_add_msi_route()
> kvm_arch_fixup_msi_route()
> pci_device_iommu_address_space()
> get_address_space()
>
> This will cause the device to be configured with wrong MSI doorbell
> address if it return the system address space.
I think it'd be nicer to elaborate why a wrong address will be returned:
--------------------------------------------------------------------------
On ARM, a device behind an IOMMU requires translation for its MSI doorbell
address. When HW nested translation is enabled, the translation will also
happen in two stages: gIOVA => gPA => ITS page.

In the accelerated SMMUv3 mode, both stages are translated by the HW. So,
get_address_space() returns the system address space for stage-2 mappings,
as the smmuv3-accel model isn't involved in either stage.

On the other hand, this callback is also invoked by QEMU/KVM:

  kvm_irqchip_add_msi_route()
    kvm_arch_fixup_msi_route()
      pci_device_iommu_address_space()
        get_address_space()

What KVM wants is to translate an MSI doorbell gIOVA to a vITS page (gPA),
so as to inject IRQs to the guest VM. It expects get_address_space() to
return the address space for stage-1 mappings instead, so returning the
system address space breaks this path.

Introduce an optional get_msi_address_space() callback and use that in the
above path.
--------------------------------------------------------------------------
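
To illustrate what I mean by "optional", here is a minimal standalone
sketch of the lookup, not the actual QEMU code. The IOMMUOpsSketch type,
the msi_address_space() helper, and the sketch_* callbacks are all
hypothetical names, and falling back to get_address_space() when the new
callback is absent is my assumption of how the caller would behave:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real QEMU types. */
typedef struct AddressSpace {
    const char *name;
} AddressSpace;

typedef struct IOMMUOpsSketch {
    /* Mandatory: address space used for DMA (stage-2 mappings here). */
    AddressSpace *(*get_address_space)(void *opaque, int devfn);
    /* Optional: address space used to translate the MSI doorbell. */
    AddressSpace *(*get_msi_address_space)(void *opaque, int devfn);
} IOMMUOpsSketch;

/*
 * Resolve the address space for an MSI doorbell: prefer the MSI-specific
 * callback, and (assumption) fall back to the generic one if it is unset.
 */
static AddressSpace *msi_address_space(const IOMMUOpsSketch *ops,
                                       void *opaque, int devfn)
{
    assert(ops != NULL && ops->get_address_space != NULL);
    if (ops->get_msi_address_space) {
        return ops->get_msi_address_space(opaque, devfn);
    }
    return ops->get_address_space(opaque, devfn);
}

/* Two dummy address spaces to tell the two cases apart. */
static AddressSpace sketch_system_as = { "system" };
static AddressSpace sketch_stage1_as = { "stage-1" };

static AddressSpace *sketch_get_as(void *opaque, int devfn)
{
    (void)opaque;
    (void)devfn;
    return &sketch_system_as;
}

static AddressSpace *sketch_get_msi_as(void *opaque, int devfn)
{
    (void)opaque;
    (void)devfn;
    return &sketch_stage1_as;
}
```

With only sketch_get_as() registered, the helper returns the system address
space as before. With sketch_get_msi_as() registered as well, the MSI path
gets the stage-1 address space while DMA still uses the system one.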
> @@ -652,6 +652,21 @@ typedef struct PCIIOMMUOps {
> uint32_t pasid, bool priv_req, bool exec_req,
> hwaddr addr, bool lpig, uint16_t prgi, bool is_read,
> bool is_write);
> + /**
> + * @get_msi_address_space: get the address space for MSI doorbell address
> + * for devices
+ * @get_msi_address_space: get the address space to translate MSI doorbell
+ * address for a device
> + *
> + * Optional callback which returns a pointer to an #AddressSpace. This
> + * is required if MSI doorbell also gets translated through IOMMU(eg: ARM)
through vIOMMU (e.g. ARM).
With these,

Reviewed-by: Nicolin Chen <[email protected]>