Hi Nicolin, Shameer,

On 10/17/25 12:30 AM, Nicolin Chen wrote:
> On Mon, Sep 29, 2025 at 02:36:27PM +0100, Shameer Kolothum wrote:
>> On ARM, when a device is behind an IOMMU, its MSI doorbell address is
>> subject to translation by the IOMMU. This behavior affects vfio-pci
>> passthrough devices assigned to guests using an accelerated SMMUv3.
>>
>> In this setup, we configure the host SMMUv3 in nested mode, where
>> VFIO sets up the Stage-2 (S2) mappings for guest RAM, while the guest
>> controls Stage-1 (S1). To allow VFIO to correctly configure S2 mappings,
>> we currently return the system address space via the get_address_space()
>> callback for vfio-pci devices.
>>
>> However, QEMU/KVM also uses this same callback path when resolving the
>> address space for MSI doorbells:
>>
>> kvm_irqchip_add_msi_route()
>>   kvm_arch_fixup_msi_route()
>>     pci_device_iommu_address_space()
>>      get_address_space()
>>
>> This will cause the device to be configured with the wrong MSI doorbell
>> address if it returns the system address space.
> I think it'd be nicer to elaborate why a wrong address will be returned:
>
> --------------------------------------------------------------------------
> On ARM, a device behind an IOMMU requires translation for its MSI doorbell
> address. When HW nested translation is enabled, the translation will also
> happen in two stages: gIOVA => gPA => ITS page.
>
> In the accelerated SMMUv3 mode, both stages are translated by the HW. So,
> get_address_space() returns the system address space for stage-2 mappings,
> as the smmuv3-accel model doesn't involve in either stage.
I don't understand "doesn't involve in either stage". It is still not
obvious to me why, for a HW-accelerated nested IOMMU, get_address_space()
should return the system address space. I think this deserves to be
explained, and maybe documented along with the callback.
>
> On the other hand, this callback is also invoked by QEMU/KVM:
>
>  kvm_irqchip_add_msi_route()
>    kvm_arch_fixup_msi_route()
>      pci_device_iommu_address_space()
>       get_address_space()
>
> What KVM wants is to translate an MSI doorbell gIOVA to a vITS page (gPA),
> so as to inject IRQs into the guest VM. It expected get_address_space()
> to return the address space for stage-1 mappings instead. Apparently, this
> is broken.
"Apparently this is broken". Please clarify what is broken. Definitely, if
pci_device_iommu_address_space(dev) returns &address_space_memory, no
translation is attempted.
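To illustrate what I mean, here is a toy model in C. Every type and function below is hypothetical (this is not QEMU's AddressSpace nor the real kvm_arch_fixup_msi_route()), and the addresses are arbitrary: when the device's address space is plain system memory, the doorbell address programmed by the guest is used as-is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an AddressSpace: either plain system memory
 * (identity mapping) or an IOMMU-backed space with a translate hook. */
typedef struct ToyAddressSpace {
    bool is_system_memory;
    uint64_t (*translate)(uint64_t iova); /* used only if !is_system_memory */
} ToyAddressSpace;

/* Hypothetical guest stage-1 mapping: gIOVA 0x10000000 -> vITS gPA 0x20000000.
 * Any other address is returned unchanged (toy model only). */
static uint64_t toy_stage1_translate(uint64_t iova)
{
    return iova == 0x10000000ULL ? 0x20000000ULL : iova;
}

static const ToyAddressSpace toy_system_memory = { true, NULL };
static const ToyAddressSpace toy_stage1_as = { false, toy_stage1_translate };

/* Models the MSI fixup: returns the doorbell address that ends up in the route. */
static uint64_t toy_fixup_msi_addr(const ToyAddressSpace *as, uint64_t doorbell)
{
    assert(as != NULL);
    if (as->is_system_memory) {
        /* No translation is attempted: the gIOVA is used unchanged. */
        return doorbell;
    }
    assert(as->translate != NULL);
    return as->translate(doorbell);
}
```

With get_address_space() returning system memory, only the first branch is ever taken, so the gIOVA rather than the vITS gPA would end up in the MSI route.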

kvm_arch_fixup_msi_route() was introduced by 
https://lore.kernel.org/all/[email protected]/

This relies on the vIOMMU translate callback, which is supposed to be bypassed
in general for VFIO devices. Isn't it only needed for emulated devices?

Maybe you and Shameer discussed that in a previous thread. It might be worth
adding the link to that discussion.

Thanks

Eric


>
> Introduce an optional get_msi_address_space() callback and use that in the
> above path.
> --------------------------------------------------------------------------
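If I read the proposal correctly, the MSI path would prefer the new optional callback and fall back to get_address_space() otherwise. A reduced sketch of that dispatch, assuming hypothetical ToyAS/ToyIOMMUOps types and a hypothetical toy_msi_address_space() helper (the real PCIIOMMUOps callbacks also take a PCIBus pointer, omitted here):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for AddressSpace and PCIIOMMUOps. */
typedef struct ToyAS {
    const char *name;
} ToyAS;

typedef struct ToyIOMMUOps {
    /* Mandatory: address space VFIO uses for the stage-2 (guest RAM) mappings. */
    ToyAS *(*get_address_space)(void *opaque, int devfn);
    /* Optional: address space used to translate the MSI doorbell. */
    ToyAS *(*get_msi_address_space)(void *opaque, int devfn);
} ToyIOMMUOps;

static ToyAS toy_sysmem = { "system-memory" };
static ToyAS toy_viommu_s1 = { "viommu-stage1" };

/* Example callbacks, as an accelerated vIOMMU model might provide them. */
static ToyAS *toy_get_as(void *opaque, int devfn)
{
    (void)opaque;
    (void)devfn;
    return &toy_sysmem;
}

static ToyAS *toy_get_msi_as(void *opaque, int devfn)
{
    (void)opaque;
    (void)devfn;
    return &toy_viommu_s1;
}

/* Hypothetical helper for the MSI path: prefer the optional callback,
 * otherwise fall back to get_address_space(). */
static ToyAS *toy_msi_address_space(const ToyIOMMUOps *ops, void *opaque,
                                    int devfn)
{
    assert(ops != NULL && ops->get_address_space != NULL);
    if (ops->get_msi_address_space) {
        return ops->get_msi_address_space(opaque, devfn);
    }
    return ops->get_address_space(opaque, devfn);
}
```

An IOMMU model that doesn't implement the new callback keeps today's behaviour, since the helper then returns whatever get_address_space() returns.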
>
>> @@ -652,6 +652,21 @@ typedef struct PCIIOMMUOps {
>>                              uint32_t pasid, bool priv_req, bool exec_req,
>>                              hwaddr addr, bool lpig, uint16_t prgi, bool is_read,
>>                              bool is_write);
>> +    /**
>> +     * @get_msi_address_space: get the address space for MSI doorbell address
>> +     * for devices
> +     * @get_msi_address_space: get the address space to translate MSI doorbell
> +     * address for a device
>
>> +     *
>> +     * Optional callback which returns a pointer to an #AddressSpace. This
>> +     * is required if MSI doorbell also gets translated through IOMMU(eg: ARM)
> through vIOMMU (e.g. ARM).
>
> With these,
>
> Reviewed-by: Nicolin Chen <[email protected]>
>

