On 06/06/2023 16:05, Peter Xu wrote:
> On Tue, Jun 06, 2023 at 11:03:11AM -0400, Peter Xu wrote:
>> On Tue, Jun 06, 2023 at 12:22:16PM +0100, Joao Martins wrote:
>>> On 05/06/2023 17:57, Peter Xu wrote:
>>>> On Tue, May 30, 2023 at 06:59:25PM +0100, Joao Martins wrote:
>>>>> Much like pci_device_iommu_address_space() fetches the IOMMU AS, add a
>>>>> pci_device_iommu_memory_region() which lets it return the IOMMU MR
>>>>> associated with it. The IOMMU MR is returned correctly for vIOMMUs using
>>>>> pci_setup_iommu_info(). Note that today most vIOMMUs create the address
>>>>> space and IOMMU MR at the same time; it's mainly that there's no API
>>>>> to make the latter available.
>>>>
>>>> Have you looked into other archs outside x86? IIRC on some other arch one
>>>> address space can have >1 IOMMU memory regions.. at least with such AS and
>>>> MR layering it seems always possible? Thanks,
>>>>
>>>
>>> I looked at all callers of pci_setup_iommu(), restricting to those that
>>> actually track an IOMMUMemoryRegion when they create an address space, as
>>> this is where pci_device_iommu_memory_region() is applicable. From looking
>>> at those[*], I always see a 1:1 association between the AS and the IOMMU MR
>>> in their initialization when iommu_fn is called. Unless I missed
>>> something... Is there an arch you were thinking of specifically?
>>
>> If only observing the ones that "track an IOMMUMemoryRegion when they
>> create an address space", probably we're fine. I was thinking ppc but I
>> don't really know the details, and I assume that's not in the scope.
>> Copying David Gibson just in case he has some comments here.
>>

/me nods
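
(To illustrate the 1:1 association: at iommu_fn time, the implementations I
looked at all essentially follow the pattern below, where the AS is
initialized directly on top of the IOMMU MR. Trimmed sketch with placeholder
names, not verbatim from any of them.)

#include "qemu/osdep.h"
#include "exec/memory.h"

/* Placeholder type/state names, just to illustrate the common pattern. */
#define TYPE_FOO_IOMMU_MEMORY_REGION "foo-iommu-memory-region"

typedef struct FooIOMMUState {
    IOMMUMemoryRegion iommu_mr;
    AddressSpace as;
} FooIOMMUState;

static void foo_iommu_init_as(FooIOMMUState *s)
{
    /* The IOMMU MR becomes the root of the AS handed back by iommu_fn,
     * so the two are created together, one of each per device/group. */
    memory_region_init_iommu(&s->iommu_mr, sizeof(s->iommu_mr),
                             TYPE_FOO_IOMMU_MEMORY_REGION, NULL,
                             "foo-iommu-mr", UINT64_MAX);
    address_space_init(&s->as, MEMORY_REGION(&s->iommu_mr), "foo-iommu-as");
}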
>>>
>>> [I am not sure we can track a 1:N AS->IOMMU association today in QEMU]
>>
>> IIUC we can? The address space only has a root MR, and with that, after
>> translate() upon the root MR (per address_space_translate_iommu(), it can
>> even be a few rounds of nested translations) it can go into whatever MR is
>> under it IIUC. Different ranges can map to a different IOMMU MR logically.
>>

I'll look some more into address_space_translate_iommu(). From a data
structure PoV it wasn't obvious how a single AS could be routed via two
different IOMMU MRs (or vice versa). Thanks for clarifying.

>>>
>>> [*] alpha, arm smmu, ppc, s390, virtio, and some pci bridges (pnv_phb3 and
>>> pnv_phb4)
>>
>> I just worried that what we need here is not a MR object but a higher level
>> object like the vIOMMU object. We used to have a requirement with Scalable
>> IOV (SVA) on Intel. I tried to dig a bit in my inbox, not sure whether
>> it's the latest status, just to show what I meant:
>>
>> https://lore.kernel.org/r/20210302203827.437645-6-yi.l....@intel.com
>>

Oh nice; I wasn't aware of this series.

>> Copying Yi for that too. From that aspect it makes more sense to me to
>> fetch things from either an IOMMUops or "an iommu object", rather than
>> relying on a specific MR (it'll also make it even harder when we can have
>> >1 vIOMMUs, so different MRs can point to different IOMMUs in the future).
>>
>> I assume the two goals have similar requirements, iiuc. If that's the case,
>> we'd better make sure we have one way that works for both.

Yeap, makes sense; it's definitely more future-proof. We were essentially
trying to do the exact same thing in the PCI layer, just for different
purposes. All I meant in this series is a way to fetch some vIOMMU attributes
that tell me whether DMA translation is enabled and what the max IOVA is for
the IOMMU under a particular PCI device.

Perhaps I would instead do a bit like that series and have a
pci_setup_iommu_ops(), and convert existing users to it as a separate step or
series, to avoid regressing those who don't care. I am happy to pick those up
if Yi is OK with it -- it should help for the nesting work down the road too.

	Joao
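
P.S. To make the attribute part concrete, the kind of hookup I have in mind
is roughly the below. Rough sketch only -- the callback names are made up for
the sake of discussion, not taken verbatim from Yi's series:

#include "qemu/osdep.h"
#include "hw/pci/pci.h"     /* PCIBus */
#include "exec/memory.h"    /* AddressSpace */

/*
 * Ops-based hookup replacing the bare iommu_fn; the last two callbacks
 * are what this series is after (whether DMA is translated at all, and
 * the max IOVA the vIOMMU can address for the device).
 */
typedef struct PCIIOMMUOps {
    AddressSpace *(*get_address_space)(PCIBus *bus, void *opaque, int devfn);
    bool (*get_dma_translation)(PCIBus *bus, void *opaque, int devfn);
    uint64_t (*get_max_iova)(PCIBus *bus, void *opaque, int devfn);
} PCIIOMMUOps;

void pci_setup_iommu_ops(PCIBus *bus, const PCIIOMMUOps *ops, void *opaque);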