On Mon, Sep 29, 2025 at 02:36:27PM +0100, Shameer Kolothum wrote:
> On ARM, when a device is behind an IOMMU, its MSI doorbell address is
> subject to translation by the IOMMU. This behavior affects vfio-pci
> passthrough devices assigned to guests using an accelerated SMMUv3.
> 
> In this setup, we configure the host SMMUv3 in nested mode, where
> VFIO sets up the Stage-2 (S2) mappings for guest RAM, while the guest
> controls Stage-1 (S1). To allow VFIO to correctly configure S2 mappings,
> we currently return the system address space via the get_address_space()
> callback for vfio-pci devices.
> 
> However, QEMU/KVM also uses this same callback path when resolving the
> address space for MSI doorbells:
> 
> kvm_irqchip_add_msi_route()
>   kvm_arch_fixup_msi_route()
>     pci_device_iommu_address_space()
>      get_address_space()
> 
> This will cause the device to be configured with wrong MSI doorbell
> address if it return the system address space.

I think it'd be nicer to elaborate why a wrong address will be returned:

--------------------------------------------------------------------------
On ARM, a device behind an IOMMU requires translation for its MSI doorbell
address. When HW nested translation is enabled, the translation will also
happen in two stages: gIOVA => gPA => ITS page.

In the accelerated SMMUv3 mode, both stages are translated by the HW. So,
get_address_space() returns the system address space for stage-2 mappings,
as the smmuv3-accel model isn't involved in either stage.

On the other hand, this callback is also invoked by QEMU/KVM:

 kvm_irqchip_add_msi_route()
   kvm_arch_fixup_msi_route()
     pci_device_iommu_address_space()
      get_address_space()

What KVM wants is to translate an MSI doorbell gIOVA to a vITS page (gPA),
so as to inject IRQs into the guest VM. For that, it expects
get_address_space() to return the address space for stage-1 mappings, not
the system address space. So, this path is broken.

Introduce an optional get_msi_address_space() callback and use that in the
above path.
--------------------------------------------------------------------------
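
To make the intent concrete, here is a minimal standalone sketch of how I
read the optional callback. It is not the actual QEMU code: the types, the
accel_* helpers and resolve_msi_address_space() are hypothetical stand-ins,
and the fallback to get_address_space() when the new callback is unset is
my assumption about the desired behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not QEMU's real definitions. */
typedef struct AddressSpace {
    const char *name;
} AddressSpace;

typedef struct IOMMUOps {
    /* Mandatory: address space used for the device's DMA mappings. */
    AddressSpace *(*get_address_space)(void *opaque);
    /* Optional: address space used to translate the MSI doorbell address. */
    AddressSpace *(*get_msi_address_space)(void *opaque);
} IOMMUOps;

static AddressSpace system_as = { "system" };
static AddressSpace stage1_as = { "stage-1" };

/* Accelerated model: stage-2 mappings are set up against system memory. */
static AddressSpace *accel_get_as(void *opaque)
{
    (void)opaque;
    return &system_as;
}

/* MSI path: the doorbell gIOVA must be resolved via the stage-1 view. */
static AddressSpace *accel_get_msi_as(void *opaque)
{
    (void)opaque;
    return &stage1_as;
}

/*
 * Prefer the optional MSI callback. Assumption: fall back to the regular
 * callback when it is not provided, so existing IOMMU models keep working.
 */
static AddressSpace *resolve_msi_address_space(const IOMMUOps *ops,
                                               void *opaque)
{
    assert(ops && ops->get_address_space);
    if (ops->get_msi_address_space) {
        return ops->get_msi_address_space(opaque);
    }
    return ops->get_address_space(opaque);
}
```

With both callbacks set, the MSI route is resolved against the stage-1
address space while VFIO still gets the system address space; a model that
leaves the new callback NULL sees no change in behaviour.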

> @@ -652,6 +652,21 @@ typedef struct PCIIOMMUOps {
>                              uint32_t pasid, bool priv_req, bool exec_req,
>                              hwaddr addr, bool lpig, uint16_t prgi, bool is_read,
>                              bool is_write);
> +    /**
> +     * @get_msi_address_space: get the address space for MSI doorbell address
> +     * for devices

+     * @get_msi_address_space: get the address space to translate MSI doorbell
+     * address for a device

> +     *
> +     * Optional callback which returns a pointer to an #AddressSpace. This
> +     * is required if MSI doorbell also gets translated through IOMMU(eg: ARM)

through vIOMMU (e.g. ARM).

With these,

Reviewed-by: Nicolin Chen <[email protected]>
