Hi Nicolin,

On 10/20/25 8:00 PM, Nicolin Chen wrote:
> On Mon, Oct 20, 2025 at 06:14:33PM +0200, Eric Auger wrote:
>>>> This will cause the device to be configured with wrong MSI doorbell
>>>> address if it returns the system address space.
>>> I think it'd be nicer to elaborate why a wrong address will be returned:
>>>
>>> --------------------------------------------------------------------------
>>> On ARM, a device behind an IOMMU requires translation for its MSI doorbell
>>> address. When HW nested translation is enabled, the translation will also
>>> happen in two stages: gIOVA => gPA => ITS page.
>>>
>>> In the accelerated SMMUv3 mode, both stages are translated by the HW. So,
>>> get_address_space() returns the system address space for stage-2 mappings,
>>> as the smmuv3-accel model isn't involved in either stage.
>> I don't understand "isn't involved in either stage". It is still not
>> obvious to me why, for a HW-accelerated nested IOMMU, get_address_space()
>> should return the system address space. I think this deserves to be
>> explained, and maybe documented along with the callback.
> get_address_space() is used by pci_device_iommu_address_space(),
> which serves both device attachment and translation.
>
> In QEMU, we have an "iommu" type of memory region to represent
> the address space that provides the stage-1 translation.
>
> In the accel case, MSI aside, there is no need for "emulated iommu
> translation", since the HW/host SMMU takes care of both stages. Thus,
> the system address space is returned by get_address_space(), both to
> avoid stage-1 translation and to allow VFIO devices to attach to
> the system address space, which the VFIO core monitors to take
> care of stage-2 mappings.
But in general, if you return the system memory address space as the
output 'as', it rather means that you have no translation in place. This
is what I am not convinced about.

You say it aims at:
- avoiding stage-1 translation;
- allowing VFIO devices to attach to the system address space, which
  the VFIO core monitors to take care of stage-2 mappings.
Can you achieve the same goals with a proper address space?
>
>>> On the other hand, this callback is also invoked by QEMU/KVM:
>>>
>>>  kvm_irqchip_add_msi_route()
>>>    kvm_arch_fixup_msi_route()
>>>      pci_device_iommu_address_space()
>>>        get_address_space()
>>>
>>> What KVM wants is to translate an MSI doorbell gIOVA to a vITS page (gPA),
>>> so as to inject IRQs into the guest VM, and it expects get_address_space()
>>> to return the address space for stage-1 mappings instead. Apparently, this
>>> is broken.
>> "Apparently this is broken". Please clarify what is broken. Definitely, if
>> pci_device_iommu_address_space(dev) returns @address_space_memory, no
>> translation is attempted.
> Hmm, I thought my writing was clear:
>  - pci_device_iommu_address_space() returns the system address
>    space, which can't do a stage-1 translation.
>  - The KVM/MSI pathway requires an address space that can do a
>    stage-1 translation.
Understood, although I am not sure that using the system address space
is the best choice. But I may not be the best person to decide this.
>
>> kvm_arch_fixup_msi_route() was introduced by 
>> https://lore.kernel.org/all/[email protected]/
>>
>> This relies on the vIOMMU translate callback, which is supposed to be
>> bypassed in general with VFIO devices. Isn't it needed only for emulated
>> devices?
> Not only for emulated devices.
>
> This KVM function needs the translation to inject IRQs for
> VFIO devices as well.
Understood.
>
> Although we use an RMR so that the underlying HW bypasses stage-1,
> the gIOVA => vITS page (VIRT_GIC_ITS) translation still exists at
> the guest level. FWIW, it just doesn't have a stage-2 mapping,
> because the HW never uses the "gIOVA" but a hard-coded SW_MSI address.
>
> In the meantime, a VFIO device in the guest is programmed with a
> gIOVA as its MSI doorbell. KVM cannot use this gIOVA to inject
> IRQs; it needs the gPA (i.e. VIRT_GIC_ITS). So it needs an address
> space that can perform that translation.
>
> Hope this is clear now.
OK. I understand the need, but I am unsure that using the system address
space is the right choice.

Eric
>
> Thanks
> Nicolin
>

