Hi Nicolin,

On 10/21/25 8:56 PM, Nicolin Chen wrote:
> On Tue, Oct 21, 2025 at 06:26:39PM +0200, Eric Auger wrote:
>> Hi Nicolin,
>>
>> On 10/20/25 8:00 PM, Nicolin Chen wrote:
>>> On Mon, Oct 20, 2025 at 06:14:33PM +0200, Eric Auger wrote:
>>>>>> This will cause the device to be configured with wrong MSI doorbell
>>>>>> address if it return the system address space.
>>>>> I think it'd be nicer to elaborate why a wrong address will be returned:
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> On ARM, a device behind an IOMMU requires translation for its MSI doorbell
>>>>> address. When HW nested translation is enabled, the translation will also
>>>>> happen in two stages: gIOVA => gPA => ITS page.
>>>>>
>>>>> In the accelerated SMMUv3 mode, both stages are translated by the HW. So,
>>>>> get_address_space() returns the system address space for stage-2 mappings,
>>>>> as the smmuv3-accel model doesn't involve in either stage.
>>>> I don't understand "doesn't involve in either stage". This is still not
>>>> obious to me that for an HW accelerated nested IOMMU get_address_space()
>>>> shall return the system address space. I think this deserves to be
>>>> explained and maybe documented along with the callback.
>>> get_address_space() is used by pci_device_iommu_address_space(),
>>> which is for attach or translation.
>>>
>>> In QEMU, we have an "iommu" type of memory region, to represent
>>> the address space providing the stage-1 translation.
>>>
>>> In accel case excluding MSI, there is no need of "emulated iommu
>>> translation" since HW/host SMMU takes care of both stages. Thus,
>>> the system address is returned for get_address_space(), to avoid
>>> stage-1 translation and to also allow VFIO devices to attach to
>>> the system address space that the VFIO core will monitor to take
>>> care of stage-2 mappings.
>> but in general if you set as output 'as' the system_address_memory it
>> rather means you have no translation in place. This is what I am not
>> convinced about.
> You mean you are not convinced about "no translation"?
I am not convinced about the choice of using address_space_memory.
>
>> you say it aims at
>> - avoiding stage-1 translation - allow VFIO devices to attach to the
>> system address space that the VFIO core will monitor to take care of
>> stage-2 mappings. Can you achieve the same goals with a proper address
>> space?
> Would you please define "proper"?
an address space different from address_space_memory
>
> The disagreement is seemingly about using system address space or
> even address_space_memory, IIUIC.
Yes my doubt is about:

smmuv3_accel_find_add_as()
     * We are using the global &address_space_memory here, as this will
ensure
     * same system address space pointer for all devices behind the
accelerated
     * SMMUv3s in a VM. That way VFIO/iommufd can reuse a single IOAS ID in
     * iommufd_cdev_attach(), allowing the Stage-2 page tables to be shared
     * within the VM instead of duplicating them for every SMMUv3 instance.
     */
    if (vfio_pci) {
        return &address_space_memory;

I think it would be cleaner to a have an AddressSpace allocated on
purpose to support the VFIO accel use case, if possible.
To me returning address_space_memory pretends we are not doing any
translation. I understand it is "easy" to reuse that one but I wonder it
is the spirit of the get_address_space callback.

I would rather allocate a dedicated (shared) AddressSpace to support the
VFIO accel case. That's my suggestion.

>
> To our purpose here, so long as the vfio core can setup a proper
> listener to monitor the guest physical address space, we are fine
> with any alternative.
>
> The system address space just seems to be the simplest one. FWIW,
> kvm_arch_fixup_msi_route() also checks in the beginning:
>     if (as == &address_space_memory)
>
> So, returning @address_space_memory seems to be straightforward?
>
> I think I also need some education to understand why do we need
> an indirect address space that eventually will be routed back to
> address_space_memory?

Well I am not an expert of AddressSpaces either. Reading hw/pci/pci.h
and get_address_space() callback API doc comment, I understand this is
the output address space for the PCI device. If you return

address_space_memory, to me this means there is no translation in place. By the 
way, this was the interpretation of kvm_arch_fixup_msi_route() on ARM

    AddressSpace *as = pci_device_iommu_address_space(dev)

    if (as == &address_space_memory) {
        return 0;
    }

    /* MSI doorbell address is translated by an IOMMU */

Note: I am currently out of the office so I am not able to reply as fast as you 
may wish.

Thanks

Eric

>
> Thanks
> Nicolin
>


Reply via email to