On Mon, Oct 20, 2025 at 06:14:33PM +0200, Eric Auger wrote:
> >> This will cause the device to be configured with wrong MSI doorbell
> >> address if it return the system address space.
> >
> > I think it'd be nicer to elaborate why a wrong address will be returned:
> >
> > --------------------------------------------------------------------------
> > On ARM, a device behind an IOMMU requires translation for its MSI doorbell
> > address. When HW nested translation is enabled, the translation will also
> > happen in two stages: gIOVA => gPA => ITS page.
> >
> > In the accelerated SMMUv3 mode, both stages are translated by the HW. So,
> > get_address_space() returns the system address space for stage-2 mappings,
> > as the smmuv3-accel model doesn't involve in either stage.
> I don't understand "doesn't involve in either stage". This is still not
> obvious to me that for an HW accelerated nested IOMMU get_address_space()
> shall return the system address space. I think this deserves to be
> explained and maybe documented along with the callback.

get_address_space() is used by pci_device_iommu_address_space(),
which is for attach or translation.

In QEMU, we have an "iommu" type of memory region, to represent
the address space providing the stage-1 translation.

In the accel case, excluding MSI, there is no need for "emulated
iommu translation" since the HW/host SMMU takes care of both stages.
Thus, the system address space is returned by get_address_space(),
to avoid stage-1 translation and to also allow VFIO devices to
attach to the system address space that the VFIO core will monitor
to take care of stage-2 mappings.

> > On the other hand, this callback is also invoked by QEMU/KVM:
> >
> >  kvm_irqchip_add_msi_route()
> >    kvm_arch_fixup_msi_route()
> >      pci_device_iommu_address_space()
> >        get_address_space()
> >
> > What KVM wants is to translate an MSI doorbell gIOVA to a vITS page (gPA),
> > so as to inject IRQs to the guest VM. And it expected get_address_space()
> > to return the address space for stage-1 mappings instead. Apparently, this
> > is broken.

> "Apparently this is broken". Please clarify what is broken. Definitively if
>
> pci_device_iommu_address_space(dev) returns @adress_system_memory no
> translation is attempted.

Hmm, I thought my writing was clear:
 - pci_device_iommu_address_space() returns the system address
   space, which can't do a stage-1 translation.
 - The KVM/MSI pathway requires an address space that can do a
   stage-1 translation.

> kvm_arch_fixup_msi_route() was introduced by
> https://lore.kernel.org/all/[email protected]/
>
> This relies on the vIOMMU translate callback which is supposed to be bypassed
> in general with VFIO devices. Isn't needed only for emulated devices?

Not only for emulated devices.
This KVM function needs the translation for the IRQ injection for
VFIO devices as well.

Although we use RMR for the underlying HW to bypass stage-1, the
translation for gIOVA => vITS page (VIRT_GIC_ITS) still exists at
the guest level. FWIW, it just doesn't have the stage-2 mapping,
because HW never uses the "gIOVA" but a hard-coded SW_MSI address.

In the meantime, a VFIO device in the guest is programmed with a
gIOVA for its MSI doorbell. This gIOVA can't be used by the KVM
code to inject IRQs. It needs the gPA (i.e. VIRT_GIC_ITS). So, it
needs a translation address space to do that.

Hope this is clear now.

Thanks
Nicolin
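
For illustration only, the mismatch described above can be modelled with a
small toy sketch. This is not QEMU code: every class, function, and address
below is hypothetical and chosen only to show why an identity (system)
address space cannot turn the doorbell gIOVA into the vITS gPA, while an
address space backed by the guest stage-1 mappings can.

```python
# Toy model of the MSI doorbell lookup. All names and values are
# hypothetical; none of them are QEMU or KVM APIs.

PAGE_SIZE = 0x1000

# Arbitrary example addresses, chosen only for this sketch.
ITS_DOORBELL_GPA = 0x08090000    # page the guest ITS doorbell lives in (gPA)
DOORBELL_GIOVA = 0xFFFF0040      # address the guest programmed into the device


class SystemAddressSpace:
    """Identity view: the input is assumed to already be a gPA."""

    def translate(self, addr):
        return addr


class Stage1AddressSpace:
    """Models the guest stage-1 mappings: gIOVA page -> gPA page."""

    def __init__(self, mappings):
        self.mappings = dict(mappings)

    def translate(self, giova):
        page = giova & ~(PAGE_SIZE - 1)
        offset = giova & (PAGE_SIZE - 1)
        if page not in self.mappings:
            raise LookupError(f"no stage-1 mapping for {giova:#x}")
        return self.mappings[page] + offset


def fixup_msi_route(address_space, doorbell_addr):
    """Stand-in for the fixup step: resolve the doorbell to a gPA."""
    return address_space.translate(doorbell_addr)


# The guest mapped its doorbell gIOVA page onto the vITS page.
stage1 = Stage1AddressSpace({0xFFFF0000: ITS_DOORBELL_GPA})

# With a stage-1 capable address space, the gIOVA resolves to the vITS gPA.
print(hex(fixup_msi_route(stage1, DOORBELL_GIOVA)))

# With the system address space, the gIOVA comes back untranslated, so the
# route would point at an address that is not the vITS page.
print(hex(fixup_msi_route(SystemAddressSpace(), DOORBELL_GIOVA)))
```

In this model, the only difference between the two calls is which address
space the lookup is handed, which mirrors the point that the KVM/MSI pathway
needs a stage-1 capable address space even when DMA itself does not.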
