Hi Nicolin,

On 10/20/25 8:00 PM, Nicolin Chen wrote:
> On Mon, Oct 20, 2025 at 06:14:33PM +0200, Eric Auger wrote:
>>>> This will cause the device to be configured with a wrong MSI doorbell
>>>> address if it returns the system address space.
>>> I think it'd be nicer to elaborate why a wrong address will be returned:
>>>
>>> --------------------------------------------------------------------------
>>> On ARM, a device behind an IOMMU requires translation for its MSI doorbell
>>> address. When HW nested translation is enabled, the translation will also
>>> happen in two stages: gIOVA => gPA => ITS page.
>>>
>>> In the accelerated SMMUv3 mode, both stages are translated by the HW. So,
>>> get_address_space() returns the system address space for stage-2 mappings,
>>> as the smmuv3-accel model doesn't involve in either stage.
>> I don't understand "doesn't involve in either stage". It is still not
>> obvious to me that for an HW accelerated nested IOMMU get_address_space()
>> shall return the system address space. I think this deserves to be
>> explained and maybe documented along with the callback.
> get_address_space() is used by pci_device_iommu_address_space(),
> which is used for attach or translation.
>
> In QEMU, we have an "iommu" type of memory region, to represent
> the address space providing the stage-1 translation.
>
> In the accel case excluding MSI, there is no need for "emulated iommu
> translation" since the HW/host SMMU takes care of both stages. Thus,
> the system address space is returned by get_address_space(), to avoid
> stage-1 translation and also to allow VFIO devices to attach to
> the system address space that the VFIO core will monitor to take
> care of stage-2 mappings.
But in general, if you set the output 'as' to address_space_memory, it
rather means you have no translation in place. This is what I am not
convinced about.
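To make the point concrete, a toy C model may help. All names below (ToyAddressSpace, toy_translate(), toy_fixup_msi_route()) and all address values are hypothetical and are not QEMU's API. The sketch only assumes that the MSI route fixup rewrites the doorbell address through whatever address space get_address_space() hands back: a pass-through space (like address_space_memory) returns the guest gIOVA unchanged, while a stage-1-translating space returns the vITS gPA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model, NOT QEMU's API. */
#define TOY_PAGE_MASK 0xfffULL

/* One guest stage-1 entry: gIOVA page -> gPA page (illustrative only). */
typedef struct {
    uint64_t iova;
    uint64_t gpa;
} ToyS1Mapping;

/* An address space either passes addresses through (system-memory-like)
 * or applies the guest stage-1 mappings (IOMMU-region-like). */
typedef struct {
    bool translates;
    const ToyS1Mapping *map;
    size_t nmap;
} ToyAddressSpace;

/* Returns the translated address, the input unchanged for a pass-through
 * space, or UINT64_MAX when a translating space has no mapping (fault). */
static uint64_t toy_translate(const ToyAddressSpace *as, uint64_t addr)
{
    assert(as != NULL);
    if (!as->translates) {
        /* No translation in place: the gIOVA is used as if it were a gPA. */
        return addr;
    }
    for (size_t i = 0; i < as->nmap; i++) {
        if ((addr & ~TOY_PAGE_MASK) == as->map[i].iova) {
            return as->map[i].gpa | (addr & TOY_PAGE_MASK);
        }
    }
    return UINT64_MAX;
}

/* Stand-in for the MSI route fixup: rewrite the doorbell address through
 * the address space returned for the device. */
static uint64_t toy_fixup_msi_route(const ToyAddressSpace *as,
                                    uint64_t doorbell)
{
    return toy_translate(as, doorbell);
}
```

For instance, with a hypothetical stage-1 entry mapping gIOVA page 0x10000000 to gPA 0x08090000, a doorbell at 0x10000040 stays 0x10000040 through the pass-through space but becomes 0x08090040 through the translating one.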

You say it aims at:
- avoiding stage-1 translation;
- allowing VFIO devices to attach to the system address space that the VFIO
  core will monitor to take care of stage-2 mappings.

Can you achieve the same goals with a proper address space?
>
>>> On the other hand, this callback is also invoked by QEMU/KVM:
>>>
>>>   kvm_irqchip_add_msi_route()
>>>     kvm_arch_fixup_msi_route()
>>>       pci_device_iommu_address_space()
>>>         get_address_space()
>>>
>>> What KVM wants is to translate an MSI doorbell gIOVA to a vITS page (gPA),
>>> so as to inject IRQs into the guest VM. And it expects get_address_space()
>>> to return the address space for stage-1 mappings instead. Apparently, this
>>> is broken.
>> "Apparently this is broken". Please clarify what is broken. Definitely, if
>>
>> pci_device_iommu_address_space(dev) returns @address_space_memory, no
>> translation is attempted.
> Hmm, I thought my writing was clear:
> - pci_device_iommu_address_space() returns the system address
>   space that can't do a stage-1 translation.
> - The KVM/MSI pathway requires an address space that can do a stage-1
>   translation.
Understood, although I am not sure using the system address space is the
best choice. But I may not be the best person to decide about this.
>
>> kvm_arch_fixup_msi_route() was introduced by
>> https://lore.kernel.org/all/[email protected]/
>>
>> This relies on the vIOMMU translate callback, which is supposed to be
>> bypassed in general with VFIO devices. Isn't it needed only for emulated
>> devices?
> Not only for emulated devices.
>
> This KVM function needs the translation for the IRQ injection for
> VFIO devices as well.
Understood.
>
> Although we use RMR for the underlying HW to bypass the stage-1, the
> translation for gIOVA=>vITS page (VIRT_GIC_ITS) still exists at
> the guest level. FWIW, it just doesn't have the stage-2 mapping
> because the HW never uses the "gIOVA" but a hard-coded SW_MSI address.
>
> In the meantime, a VFIO device in the guest is programmed with a
> gIOVA for its MSI doorbell. This gIOVA can't be used by KVM code to
> inject IRQs. It needs the gPA (i.e. VIRT_GIC_ITS). So, it needs a
> translation address space to do that.
>
> Hope this is clear now.
OK. I understand the need, but I am unsure using the system address space
is the right choice.

Eric
>
> Thanks
> Nicolin
>
