Re: [git pull] IOMMU Updates for Linux v5.15
On Fri, Sep 03, 2021 at 11:43:31AM -0700, Linus Torvalds wrote:
> choice
> 	prompt "IOMMU default domain type"
> 	depends on IOMMU_API
> 	default IOMMU_DEFAULT_DMA_LAZY if AMD_IOMMU || INTEL_IOMMU
> 	default IOMMU_DEFAULT_DMA_STRICT

Huh, yeah, that is bogus. Seems like I overlooked that part of the patch-set because I was so amazed by the simplifications and cleanups in the rest of it.

> See what I'm saying? Making the default be based on some random "this
> driver is enabled" when it can then affect *other* drivers that are
> also enabled and not part of the decision seems to be a fundamental
> confusion about what is going on, when it's not at all clear which
> driver will actually be IN USE.

The Kconfig option itself was actually my suggestion, but how the default value is chosen certainly needs improvement. I will sort this out with the people involved.

> IOW, the fix might be to just say "the default is always lazy".
>
> Or the fix might be something that is global to a configuration and
> doesn't rely on which iommu is in use (eg "on x86, the default is
> always LAZY")
>
> Or the fix is to make that 'iommu_dma_strict' variable - and the
> default value for it - be a per-IOMMU thing rather than be a global.

My preference would be to make either 'lazy' or 'strict' the default for all, but the ARM folks might disagree. On the other hand, it also doesn't make sense to let IOMMU drivers override the user's Kconfig choice at runtime. We will discuss this and come up with something better.

Thanks,

	Joerg

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
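[The "global to a configuration" variant mentioned above can be sketched as a Kconfig fragment. This is purely illustrative, with X86 chosen arbitrarily as the stable key, and is not the fix that was eventually merged; the point is only that the default depends on something fixed for a given configuration rather than on which drivers happen to be compiled in.]

```kconfig
# Illustrative sketch only: key the default off the architecture, not off
# individual driver symbols, so that e.g. an active VIRTIO_IOMMU is not
# flipped to lazy mode just because AMD_IOMMU is also enabled in .config.
choice
	prompt "IOMMU default domain type"
	depends on IOMMU_API
	default IOMMU_DEFAULT_DMA_LAZY if X86
	default IOMMU_DEFAULT_DMA_STRICT

config IOMMU_DEFAULT_DMA_STRICT
	bool "Translated - Strict"

config IOMMU_DEFAULT_DMA_LAZY
	bool "Translated - Lazy"

endchoice
```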
Re: [PATCH 1/2] dt-bindings: iommu: renesas, ipmmu-vmsa: add r8a779a0 support
On Wed, 01 Sep 2021 19:27:04 +0900, Yoshihiro Shimoda wrote:
> Add support for r8a779a0 (R-Car V3U).
>
> Signed-off-by: Yoshihiro Shimoda
> ---
>  Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml | 1 +
>  1 file changed, 1 insertion(+)

Acked-by: Rob Herring
[PATCH] iommu/vt-d: Drop "0x" prefix from PCI bus & device addresses
From: Bjorn Helgaas

719a19335692 ("iommu/vt-d: Tweak the description of a DMA fault") changed the DMA fault reason from hex to decimal. It also added "0x" prefixes to the PCI bus/device, e.g.,

  - DMAR: [INTR-REMAP] Request device [00:00.5]
  + DMAR: [INTR-REMAP] Request device [0x00:0x00.5]

These no longer match dev_printk() and other similar messages in dmar_match_pci_path() and dmar_acpi_insert_dev_scope(). Drop the "0x" prefixes from the bus and device addresses.

Fixes: 719a19335692 ("iommu/vt-d: Tweak the description of a DMA fault")
Signed-off-by: Bjorn Helgaas
---
 drivers/iommu/intel/dmar.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index d66f79acd14d..8647a355dad0 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1944,18 +1944,18 @@ static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
 	reason = dmar_get_fault_reason(fault_reason, &fault_type);
 
 	if (fault_type == INTR_REMAP)
-		pr_err("[INTR-REMAP] Request device [0x%02x:0x%02x.%d] fault index 0x%llx [fault reason 0x%02x] %s\n",
+		pr_err("[INTR-REMAP] Request device [%02x:%02x.%d] fault index 0x%llx [fault reason 0x%02x] %s\n",
 		       source_id >> 8, PCI_SLOT(source_id & 0xFF),
 		       PCI_FUNC(source_id & 0xFF), addr >> 48,
 		       fault_reason, reason);
 	else if (pasid == INVALID_IOASID)
-		pr_err("[%s NO_PASID] Request device [0x%02x:0x%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
+		pr_err("[%s NO_PASID] Request device [%02x:%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
 		       type ? "DMA Read" : "DMA Write",
 		       source_id >> 8, PCI_SLOT(source_id & 0xFF),
 		       PCI_FUNC(source_id & 0xFF), addr,
 		       fault_reason, reason);
 	else
-		pr_err("[%s PASID 0x%x] Request device [0x%02x:0x%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
+		pr_err("[%s PASID 0x%x] Request device [%02x:%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
 		       type ? "DMA Read" : "DMA Write", pasid,
 		       source_id >> 8, PCI_SLOT(source_id & 0xFF),
 		       PCI_FUNC(source_id & 0xFF), addr,
-- 
2.25.1
Re: [git pull] IOMMU Updates for Linux v5.15
The pull request you sent on Fri, 3 Sep 2021 16:03:11 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git tags/iommu-updates-v5.15

has been merged into torvalds/linux.git:

https://git.kernel.org/torvalds/c/69a5c49a9147e9daca76201e3d6edfea5ed8403a

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html
Re: [git pull] IOMMU Updates for Linux v5.15
On Fri, Sep 3, 2021 at 7:03 AM Joerg Roedel wrote:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git tags/iommu-updates-v5.15

So I've merged this, but I'm not entirely happy with some of the rather insane Kconfig choices.

In particular, the defaults for this:

	choice
		prompt "IOMMU default domain type"
		depends on IOMMU_API
		default IOMMU_DEFAULT_DMA_LAZY if AMD_IOMMU || INTEL_IOMMU
		default IOMMU_DEFAULT_DMA_STRICT

seem fundamentally confused about what the h*ll is going on.

Why? Because a choice like "AMD_IOMMU" or "INTEL_IOMMU" isn't some exclusive thing. You can have one or the other - or both. Or you could have another IOMMU entirely, despite _also_ having AMD/INTEL_IOMMU enabled as a config option.

IOW, maybe INTEL or AMD_IOMMU is enabled in the kernel configuration, but that might not be the IOMMU that is then actually *active*. The active IOMMU might be VIRTIO_IOMMU, for example.

See what I'm saying? Making the default be based on some random "this driver is enabled" when it can then affect *other* drivers that are also enabled and not part of the decision seems to be a fundamental confusion about what is going on, when it's not at all clear which driver will actually be IN USE.

Now, I don't think this _matters_ that much in practice, and as mentioned, I already merged things. But I think people should seriously think about either

 (a) making that default something that is actually *reliable*, so that the fact that you have possibly enabled iommu X doesn't affect iommu Y that is actually the one in use

or

 (b) making the defaults be actually per-driver, and set when the driver is in *use* rather than this incorrect model of "enabled but maybe not even used".

IOW, the fix might be to just say "the default is always lazy".

Or the fix might be something that is global to a configuration and doesn't rely on which iommu is in use (eg "on x86, the default is always LAZY").

Or the fix is to make that 'iommu_dma_strict' variable - and the default value for it - be a per-IOMMU thing rather than be a global.

Hmm?

	Linus
Re: [PATCH v2 3/8] iommu/dma: Disable get_sgtable for granule > PAGE_SIZE
On Fri, Sep 3, 2021, at 17:45, Robin Murphy wrote:
> On 2021-09-03 16:16, Sven Peter wrote:
> >
> >
> > On Thu, Sep 2, 2021, at 21:42, Robin Murphy wrote:
> >> On 2021-09-02 19:19, Sven Peter wrote:
> >>>
> >>>
> >>> On Wed, Sep 1, 2021, at 23:10, Alyssa Rosenzweig wrote:
> > My biggest issue is that I do not understand how this function is
> > supposed
> > to be used correctly. It would work fine as-is if it only ever gets
> > passed buffers
> > allocated by the coherent API but there's no way to check or guarantee
> > that.
> > There may also be callers making assumptions that no longer hold when
> > iovad->granule > PAGE_SIZE.
> >
> > Regarding your case: I'm not convinced the function is meant to be used
> > there.
> > If I understand it correctly, your code first allocates memory with
> > dma_alloc_coherent
> > (which possibly creates a sgt internally and then maps it with
> > iommu_map_sg),
> > then coerces that back into a sgt with dma_get_sgtable, and then maps
> > that sgt to
> > another iommu domain with dma_map_sg while assuming that the result
> > will be contiguous
> > in IOVA space. It'll work out because dma_alloc_coherent is the very
> > thing
> > meant to allocate pages that can be mapped into kernel and device VA
> > space
> > as a single contiguous block and because both of your IOMMUs are
> > different
> > instances of the same HW block. Anything allocated by
> > dma_alloc_coherent for the
> > first IOMMU will have the right shape that will allow it to be mapped as
> > a single contiguous block for the second IOMMU.
> >
> > What could be done in your case is to instead use the IOMMU API,
> > allocate the pages yourself (while ensuring the sgt you create is made
> > up
> > of blocks with size and physaddr aligned to max(domain_a->granule,
> > domain_b->granule))
> > and then just use iommu_map_sg for both domains which actually comes
> > with the
> > guarantee that the result will be a single contiguous block in IOVA
> > space and
> > doesn't require the sgt roundtrip.
> >
> In principle I agree. I am getting the sense this function can't be used
> correctly in general, and yet is the function that's meant to be used.
> If my interpretation of prior LKML discussion holds, the problems are
> far deeper than my code or indeed page size problems...
> >>>
> >>> Right, which makes reasoning about this function and its behavior if the
> >>> IOMMU page size is unexpected very hard for me. I'm not opposed to just
> >>> keeping this function as-is when there's a mismatch between PAGE_SIZE and
> >>> the IOMMU page size (and it will probably work that way) but I'd like to
> >>> be sure that won't introduce unexpected behavior.
> >>>
> If the right way to handle this is with the IOMMU and IOVA APIs, I
> really wish
> that dance were wrapped up in a safe helper function instead of open
> coding it in every driver that does cross device sharing.
>
> We might even call that helper... hmm... dma_map_sg *ducks*
>
> >>>
> >>> There might be another way to do this correctly. I'm likely just a little
> >>> bit biased because I've spent the past weeks wrapping my head around the
> >>> IOMMU and DMA APIs and when all you have is a hammer everything looks like
> >>> a nail.
> >>>
> >>> But dma_map_sg operates at the DMA API level and at that point the dma-ops
> >>> for two different devices could be vastly different.
> >>> In the worst case one of them could be behind an IOMMU that can easily map > >>> non-contiguous pages while the other one is directly connected to the bus > >>> and > >>> can't even access >4G pages without swiotlb support. > >>> It's really only possible to guarantee that it will map N buffers to <= N > >>> DMA-addressable buffers (possibly by using an IOMMU or swiotlb > >>> internally) at > >>> that point. > >>> > >>> On the IOMMU API level you have much more information available about the > >>> actual > >>> hardware and can prepare the buffers in a way that makes both devices > >>> happy. > >>> That's why iommu_map_sgtable combined with iovad->granule aligned sgt > >>> entries > >>> can actually guarantee to map the entire list to a single contiguous IOVA > >>> block. > >> > >> Essentially there are two reasonable options, and doing pretend dma-buf > >> export/import between two devices effectively owned by the same driver > >> is neither of them. Handily, DRM happens to be exactly where all the > >> precedent is, too; unsurprisingly this is not a new concern. > >> > >> One is to go full IOMMU API, like rockchip or tegra, attaching the > >> relevant devices to your own unmanaged domain(s) and mapping pages > >> exactly where you choose. You still make dma_map/dma_unmap calls for the > >> sake of cache maintenance and other housekee
Re: [PATCH v2 3/8] iommu/dma: Disable get_sgtable for granule > PAGE_SIZE
> > On the IOMMU API level you have much more information available about the
> > actual
> > hardware and can prepare the buffers in a way that makes both devices happy.
> > That's why iommu_map_sgtable combined with iovad->granule aligned sgt
> > entries
> > can actually guarantee to map the entire list to a single contiguous IOVA
> > block.
>
> Essentially there are two reasonable options, and doing pretend dma-buf
> export/import between two devices effectively owned by the same driver is
> neither of them. Handily, DRM happens to be exactly where all the precedent
> is, too; unsurprisingly this is not a new concern.
>
> One is to go full IOMMU API, like rockchip or tegra, attaching the relevant
> devices to your own unmanaged domain(s) and mapping pages exactly where you
> choose. You still make dma_map/dma_unmap calls for the sake of cache
> maintenance and other housekeeping on the underlying memory, but you ignore
> the provided DMA addresses in favour of your own IOVAs when it comes to
> programming the devices.

I guess this is the way to go for DCP.

> The lazier option if you can rely on all relevant devices having equal DMA
> and IOMMU capabilities is to follow exynos, and herd the devices into a
> common default domain. Instead of allocating your own domain, you grab the
> current domain for one device (which will be its default domain) and
> manually attach the other devices to that. Then you forget all about IOMMUs
> but make sure to do all your regular DMA API calls using that first device,
> and the DMA addresses which come back should be magically valid for the
> other devices too. It was a bit of a cheeky hack TBH, but I'd still much
> prefer more of that over any usage of get_sgtable outside of actual
> dma-buf...

It'd probably be *possible* to get away with this for DCP, but it'd probably involve more hacks, since the DARTs are not 100% symmetric and there are some constraints on the different DARTs involved.

It'd also be less desirable -- there is no reason for the display coprocessor to know the actual *contents* of the framebuffer; it only needs an IOVA that is valid solely for the actual display hardware. These are two devices in hardware with two independent DARTs; by modeling them as such we reduce the amount we need to trust the coprocessor firmware blob.

> Note that where multiple IOMMU instances are involved, the latter approach
> does depend on the IOMMU driver being able to support sharing a single
> domain across them; I think that might sort-of-work for DART already, but
> may need a little more attention.

I think this already works (for USB-C).
Re: [PATCH v2 3/8] iommu/dma: Disable get_sgtable for granule > PAGE_SIZE
On 2021-09-03 16:16, Sven Peter wrote:

On Thu, Sep 2, 2021, at 21:42, Robin Murphy wrote:

On 2021-09-02 19:19, Sven Peter wrote:

On Wed, Sep 1, 2021, at 23:10, Alyssa Rosenzweig wrote:

My biggest issue is that I do not understand how this function is supposed to be used correctly. It would work fine as-is if it only ever gets passed buffers allocated by the coherent API but there's no way to check or guarantee that. There may also be callers making assumptions that no longer hold when iovad->granule > PAGE_SIZE.

Regarding your case: I'm not convinced the function is meant to be used there. If I understand it correctly, your code first allocates memory with dma_alloc_coherent (which possibly creates a sgt internally and then maps it with iommu_map_sg), then coerces that back into a sgt with dma_get_sgtable, and then maps that sgt to another iommu domain with dma_map_sg while assuming that the result will be contiguous in IOVA space. It'll work out because dma_alloc_coherent is the very thing meant to allocate pages that can be mapped into kernel and device VA space as a single contiguous block and because both of your IOMMUs are different instances of the same HW block. Anything allocated by dma_alloc_coherent for the first IOMMU will have the right shape that will allow it to be mapped as a single contiguous block for the second IOMMU.

What could be done in your case is to instead use the IOMMU API, allocate the pages yourself (while ensuring the sgt you create is made up of blocks with size and physaddr aligned to max(domain_a->granule, domain_b->granule)) and then just use iommu_map_sg for both domains which actually comes with the guarantee that the result will be a single contiguous block in IOVA space and doesn't require the sgt roundtrip.

In principle I agree. I am getting the sense this function can't be used correctly in general, and yet is the function that's meant to be used. If my interpretation of prior LKML discussion holds, the problems are far deeper than my code or indeed page size problems...

Right, which makes reasoning about this function and its behavior if the IOMMU page size is unexpected very hard for me. I'm not opposed to just keeping this function as-is when there's a mismatch between PAGE_SIZE and the IOMMU page size (and it will probably work that way) but I'd like to be sure that won't introduce unexpected behavior.

If the right way to handle this is with the IOMMU and IOVA APIs, I really wish that dance were wrapped up in a safe helper function instead of open coding it in every driver that does cross device sharing.

We might even call that helper... hmm... dma_map_sg *ducks*

There might be another way to do this correctly. I'm likely just a little bit biased because I've spent the past weeks wrapping my head around the IOMMU and DMA APIs and when all you have is a hammer everything looks like a nail.

But dma_map_sg operates at the DMA API level and at that point the dma-ops for two different devices could be vastly different. In the worst case one of them could be behind an IOMMU that can easily map non-contiguous pages while the other one is directly connected to the bus and can't even access >4G pages without swiotlb support. It's really only possible to guarantee that it will map N buffers to <= N DMA-addressable buffers (possibly by using an IOMMU or swiotlb internally) at that point.

On the IOMMU API level you have much more information available about the actual hardware and can prepare the buffers in a way that makes both devices happy. That's why iommu_map_sgtable combined with iovad->granule aligned sgt entries can actually guarantee to map the entire list to a single contiguous IOVA block.

Essentially there are two reasonable options, and doing pretend dma-buf export/import between two devices effectively owned by the same driver is neither of them.
Handily, DRM happens to be exactly where all the precedent is, too; unsurprisingly this is not a new concern.

One is to go full IOMMU API, like rockchip or tegra, attaching the relevant devices to your own unmanaged domain(s) and mapping pages exactly where you choose. You still make dma_map/dma_unmap calls for the sake of cache maintenance and other housekeeping on the underlying memory, but you ignore the provided DMA addresses in favour of your own IOVAs when it comes to programming the devices.

The lazier option if you can rely on all relevant devices having equal DMA and IOMMU capabilities is to follow exynos, and herd the devices into a common default domain. Instead of allocating your own domain, you grab the current domain for one device (which will be its default domain) and manually attach the other devices to that. Then you forget all about IOMMUs but make sure to do all your regular DMA API calls using that first device, and the DMA addresses which come back should be magically valid for the other devices too. It was a bit of a cheeky hack TBH, but I'd still much prefer
Re: [PATCH v2 1/5] dt-bindings: reserved-memory: Document memory region specifier
On Fri, Sep 03, 2021 at 09:36:33AM -0500, Rob Herring wrote: > On Fri, Sep 3, 2021 at 8:52 AM Thierry Reding > wrote: > > > > On Fri, Sep 03, 2021 at 08:20:55AM -0500, Rob Herring wrote: > > > On Wed, Sep 1, 2021 at 9:13 AM Thierry Reding > > > wrote: > > > > > > > > On Fri, Jul 02, 2021 at 05:16:25PM +0300, Dmitry Osipenko wrote: > > > > > 01.07.2021 21:14, Thierry Reding пишет: > > > > > > On Tue, Jun 08, 2021 at 06:51:40PM +0200, Thierry Reding wrote: > > > > > >> On Fri, May 28, 2021 at 06:54:55PM +0200, Thierry Reding wrote: > > > > > >>> On Thu, May 20, 2021 at 05:03:06PM -0500, Rob Herring wrote: > > > > > On Fri, Apr 23, 2021 at 06:32:30PM +0200, Thierry Reding wrote: > > > > > > From: Thierry Reding > > > > > > > > > > > > Reserved memory region phandle references can be accompanied by > > > > > > a > > > > > > specifier that provides additional information about how that > > > > > > specific > > > > > > reference should be treated. > > > > > > > > > > > > One use-case is to mark a memory region as needing an identity > > > > > > mapping > > > > > > in the system's IOMMU for the device that references the > > > > > > region. This is > > > > > > needed for example when the bootloader has set up hardware > > > > > > (such as a > > > > > > display controller) to actively access a memory region (e.g. a > > > > > > boot > > > > > > splash screen framebuffer) during boot. The operating system > > > > > > can use the > > > > > > identity mapping flag from the specifier to make sure an IOMMU > > > > > > identity > > > > > > mapping is set up for the framebuffer before IOMMU translations > > > > > > are > > > > > > enabled for the display controller. 
> > > > > > > > > > > > Signed-off-by: Thierry Reding > > > > > > --- > > > > > > .../reserved-memory/reserved-memory.txt | 21 > > > > > > +++ > > > > > > include/dt-bindings/reserved-memory.h | 8 +++ > > > > > > 2 files changed, 29 insertions(+) > > > > > > create mode 100644 include/dt-bindings/reserved-memory.h > > > > > > > > > > Sorry for being slow on this. I have 2 concerns. > > > > > > > > > > First, this creates an ABI issue. A DT with cells in > > > > > 'memory-region' > > > > > will not be understood by an existing OS. I'm less concerned > > > > > about this > > > > > if we address that with a stable fix. (Though I'm pretty sure > > > > > we've > > > > > naively added #?-cells in the past ignoring this issue.) > > > > > >>> > > > > > >>> A while ago I had proposed adding memory-region*s* as an > > > > > >>> alternative > > > > > >>> name for memory-region to make the naming more consistent with > > > > > >>> other > > > > > >>> types of properties (think clocks, resets, gpios, ...). If we > > > > > >>> added > > > > > >>> that, we could easily differentiate between the "legacy" cases > > > > > >>> where > > > > > >>> no #memory-region-cells was allowed and the new cases where it > > > > > >>> was. > > > > > >>> > > > > > Second, it could be the bootloader setting up the reserved > > > > > region. If a > > > > > node already has 'memory-region', then adding more regions is > > > > > more > > > > > complicated compared to adding new properties. And defining what > > > > > each > > > > > memory-region entry is or how many in schemas is impossible. > > > > > >>> > > > > > >>> It's true that updating the property gets a bit complicated, but > > > > > >>> it's > > > > > >>> not exactly rocket science. We really just need to splice the > > > > > >>> array. I > > > > > >>> have a working implemention for this in U-Boot. > > > > > >>> > > > > > >>> For what it's worth, we could run into the same issue with any new > > > > > >>> property that we add. 
Even if we renamed this to > > > > > >>> iommu-memory-region, > > > > > >>> it's still possible that a bootloader may have to update this > > > > > >>> property > > > > > >>> if it already exists (it could be hard-coded in DT, or it could > > > > > >>> have > > > > > >>> been added by some earlier bootloader or firmware). > > > > > >>> > > > > > Both could be addressed with a new property. Perhaps something > > > > > like > > > > > 'iommu-memory-region = <&phandle>;'. I think the 'iommu' prefix > > > > > is > > > > > appropriate given this is entirely because of the IOMMU being in > > > > > the > > > > > mix. I might feel differently if we had other uses for cells, > > > > > but I > > > > > don't really see it in this case. > > > > > >>> > > > > > >>> I'm afraid that down the road we'll end up with other cases and > > > > > >>> then we > > > > > >>> might proliferate a number of *-memory-region properties with > > > > > >>> varying > > > > > >>> prefixes. > > > > > >>> > > > > > >>>
Re: [PATCH v2 3/8] iommu/dma: Disable get_sgtable for granule > PAGE_SIZE
On Thu, Sep 2, 2021, at 21:42, Robin Murphy wrote:
> On 2021-09-02 19:19, Sven Peter wrote:
> >
> >
> > On Wed, Sep 1, 2021, at 23:10, Alyssa Rosenzweig wrote:
> >>> My biggest issue is that I do not understand how this function is supposed
> >>> to be used correctly. It would work fine as-is if it only ever gets
> >>> passed buffers
> >>> allocated by the coherent API but there's no way to check or guarantee
> >>> that.
> >>> There may also be callers making assumptions that no longer hold when
> >>> iovad->granule > PAGE_SIZE.
> >>>
> >>> Regarding your case: I'm not convinced the function is meant to be used
> >>> there.
> >>> If I understand it correctly, your code first allocates memory with
> >>> dma_alloc_coherent
> >>> (which possibly creates a sgt internally and then maps it with
> >>> iommu_map_sg),
> >>> then coerces that back into a sgt with dma_get_sgtable, and then maps
> >>> that sgt to
> >>> another iommu domain with dma_map_sg while assuming that the result will
> >>> be contiguous
> >>> in IOVA space. It'll work out because dma_alloc_coherent is the very thing
> >>> meant to allocate pages that can be mapped into kernel and device VA space
> >>> as a single contiguous block and because both of your IOMMUs are different
> >>> instances of the same HW block. Anything allocated by dma_alloc_coherent
> >>> for the
> >>> first IOMMU will have the right shape that will allow it to be mapped as
> >>> a single contiguous block for the second IOMMU.
> >>>
> >>> What could be done in your case is to instead use the IOMMU API,
> >>> allocate the pages yourself (while ensuring the sgt you create is made up
> >>> of blocks with size and physaddr aligned to max(domain_a->granule,
> >>> domain_b->granule))
> >>> and then just use iommu_map_sg for both domains which actually comes with
> >>> the
> >>> guarantee that the result will be a single contiguous block in IOVA space
> >>> and
> >>> doesn't require the sgt roundtrip.
> >>
> >> In principle I agree. I am getting the sense this function can't be used
> >> correctly in general, and yet is the function that's meant to be used.
> >> If my interpretation of prior LKML discussion holds, the problems are
> >> far deeper than my code or indeed page size problems...
> >
> > Right, which makes reasoning about this function and its behavior if the
> > IOMMU page size is unexpected very hard for me. I'm not opposed to just
> > keeping this function as-is when there's a mismatch between PAGE_SIZE and
> > the IOMMU page size (and it will probably work that way) but I'd like to
> > be sure that won't introduce unexpected behavior.
> >
> >> If the right way to handle this is with the IOMMU and IOVA APIs, I really
> >> wish
> >> that dance were wrapped up in a safe helper function instead of open
> >> coding it in every driver that does cross device sharing.
> >>
> >> We might even call that helper... hmm... dma_map_sg *ducks*
> >
> > There might be another way to do this correctly. I'm likely just a little
> > bit biased because I've spent the past weeks wrapping my head around the
> > IOMMU and DMA APIs and when all you have is a hammer everything looks like
> > a nail.
> >
> > But dma_map_sg operates at the DMA API level and at that point the dma-ops
> > for two different devices could be vastly different.
> > In the worst case one of them could be behind an IOMMU that can easily map
> > non-contiguous pages while the other one is directly connected to the bus
> > and
> > can't even access >4G pages without swiotlb support.
> > It's really only possible to guarantee that it will map N buffers to <= N
> > DMA-addressable buffers (possibly by using an IOMMU or swiotlb internally)
> > at
> > that point.
> >
> > On the IOMMU API level you have much more information available about the
> > actual
> > hardware and can prepare the buffers in a way that makes both devices happy.
> > That's why iommu_map_sgtable combined with iovad->granule aligned sgt > > entries > > can actually guarantee to map the entire list to a single contiguous IOVA > > block. > > Essentially there are two reasonable options, and doing pretend dma-buf > export/import between two devices effectively owned by the same driver > is neither of them. Handily, DRM happens to be exactly where all the > precedent is, too; unsurprisingly this is not a new concern. > > One is to go full IOMMU API, like rockchip or tegra, attaching the > relevant devices to your own unmanaged domain(s) and mapping pages > exactly where you choose. You still make dma_map/dma_unmap calls for the > sake of cache maintenance and other housekeeping on the underlying > memory, but you ignore the provided DMA addresses in favour of your own > IOVAs when it comes to programming the devices. > > The lazier option if you can rely on all relevant devices having equal > DMA and IOMMU capabilities is to follow exynos, and herd the devices > into a common default domain. Instead of
Re: [PATCH v2 1/5] dt-bindings: reserved-memory: Document memory region specifier
On Fri, Sep 3, 2021 at 8:52 AM Thierry Reding wrote:
>
> On Fri, Sep 03, 2021 at 08:20:55AM -0500, Rob Herring wrote:
> > On Wed, Sep 1, 2021 at 9:13 AM Thierry Reding wrote:
> > >
> > > On Fri, Jul 02, 2021 at 05:16:25PM +0300, Dmitry Osipenko wrote:
> > > > On 01.07.2021 21:14, Thierry Reding wrote:
> > > > > On Tue, Jun 08, 2021 at 06:51:40PM +0200, Thierry Reding wrote:
> > > > >> On Fri, May 28, 2021 at 06:54:55PM +0200, Thierry Reding wrote:
> > > > >>> On Thu, May 20, 2021 at 05:03:06PM -0500, Rob Herring wrote:
> > > > On Fri, Apr 23, 2021 at 06:32:30PM +0200, Thierry Reding wrote:
> > > > > From: Thierry Reding
> > > > >
> > > > > Reserved memory region phandle references can be accompanied by a specifier that provides additional information about how that specific reference should be treated.
> > > > >
> > > > > One use-case is to mark a memory region as needing an identity mapping in the system's IOMMU for the device that references the region. This is needed for example when the bootloader has set up hardware (such as a display controller) to actively access a memory region (e.g. a boot splash screen framebuffer) during boot. The operating system can use the identity mapping flag from the specifier to make sure an IOMMU identity mapping is set up for the framebuffer before IOMMU translations are enabled for the display controller.
> > > > >
> > > > > Signed-off-by: Thierry Reding
> > > > > ---
> > > > >  .../reserved-memory/reserved-memory.txt | 21 +++
> > > > >  include/dt-bindings/reserved-memory.h   |  8 +++
> > > > >  2 files changed, 29 insertions(+)
> > > > >  create mode 100644 include/dt-bindings/reserved-memory.h
> > > >
> > > > Sorry for being slow on this. I have 2 concerns.
> > > >
> > > > First, this creates an ABI issue. A DT with cells in 'memory-region' will not be understood by an existing OS. I'm less concerned about this if we address that with a stable fix. (Though I'm pretty sure we've naively added #?-cells in the past ignoring this issue.)
> > > > >>>
> > > > >>> A while ago I had proposed adding memory-region*s* as an alternative name for memory-region to make the naming more consistent with other types of properties (think clocks, resets, gpios, ...). If we added that, we could easily differentiate between the "legacy" cases where no #memory-region-cells was allowed and the new cases where it was.
> > > > >>>
> > > > Second, it could be the bootloader setting up the reserved region. If a node already has 'memory-region', then adding more regions is more complicated compared to adding new properties. And defining what each memory-region entry is or how many in schemas is impossible.
> > > > >>>
> > > > >>> It's true that updating the property gets a bit complicated, but it's not exactly rocket science. We really just need to splice the array. I have a working implementation for this in U-Boot.
> > > > >>>
> > > > >>> For what it's worth, we could run into the same issue with any new property that we add. Even if we renamed this to iommu-memory-region, it's still possible that a bootloader may have to update this property if it already exists (it could be hard-coded in DT, or it could have been added by some earlier bootloader or firmware).
> > > > >>>
> > > > Both could be addressed with a new property. Perhaps something like 'iommu-memory-region = <&phandle>;'. I think the 'iommu' prefix is appropriate given this is entirely because of the IOMMU being in the mix. I might feel differently if we had other uses for cells, but I don't really see it in this case.
> > > > >>>
> > > > >>> I'm afraid that down the road we'll end up with other cases and then we might proliferate a number of *-memory-region properties with varying prefixes.
> > > > >>>
> > > > >>> I am aware of one other case where we might need something like this: on some Tegra SoCs we have audio processors that will access memory buffers using a DMA engine. These processors are booted from early firmware using firmware from system memory. In order to avoid trashing the firmware, we need to reserve memory. We can do this using reserved memo
[git pull] IOMMU Updates for Linux v5.15
Hi Linus,

The tree is a bit more messy this time, mostly because there are many IOMMU core changes included and driver patches which depend on them living in different branches. So some cross-merging between branches was necessary. With that said:

The following changes since commit 7c60610d476766e128cc4284bb6349732cbd6606:

  Linux 5.14-rc6 (2021-08-15 13:40:53 -1000)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git tags/iommu-updates-v5.15

for you to fetch changes up to d8768d7eb9c21ef928adb93402d9348bcc4a6915:

  Merge branches 'apple/dart', 'arm/smmu', 'iommu/fixes', 'x86/amd', 'x86/vt-d' and 'core' into next (2021-08-20 17:14:35 +0200)

----------------------------------------------------------------
IOMMU Updates for Linux v5.15

Including:

	- New DART IOMMU driver for Apple Silicon M1 chips.

	- Optimizations for iommu_[map/unmap] performance

	- Selective TLB flush support for the AMD IOMMU driver to make it
	  more efficient on emulated IOMMUs.

	- Rework IOVA setup and default domain type setting to move more
	  code out of IOMMU drivers and to support runtime switching
	  between certain types of default domains.

	- VT-d Updates from Lu Baolu:
	  - Update the virtual command related registers
	  - Enable Intel IOMMU scalable mode by default
	  - Preset A/D bits for user space DMA usage
	  - Allow devices to have more than 32 outstanding PRs
	  - Various cleanups

	- ARM SMMU Updates from Will Deacon:
	  - SMMUv3: Minor optimisation to avoid zeroing struct members on CMD submission
	  - SMMUv3: Increased use of batched commands to reduce submission latency
	  - SMMUv3: Refactoring in preparation for ECMDQ support
	  - SMMUv2: Fix races when probing devices with identical StreamIDs
	  - SMMUv2: Optimise walk cache flushing for Qualcomm implementations
	  - SMMUv2: Allow deep sleep states for some Qualcomm SoCs with shared clocks

	- Various smaller optimizations, cleanups, and fixes

----------------------------------------------------------------
Andy Shevchenko (1):
      iommu/vt-d: Drop the kernel doc annotation

Ashish Mhetre (1):
      iommu: Fix race condition during default domain allocation

Ezequiel Garcia (1):
      iommu/dma: Fix leak in non-contiguous API

Fenghua Yu (1):
      iommu/vt-d: Fix PASID reference leak

Frank Wunderlich (1):
      iommu: Check if group is NULL before remove device

Geert Uytterhoeven (1):
      iommu/dart: APPLE_DART should depend on ARCH_APPLE

Isaac J. Manjarres (12):
      iommu/io-pgtable: Introduce unmap_pages() as a page table op
      iommu: Add an unmap_pages() op for IOMMU drivers
      iommu/io-pgtable: Introduce map_pages() as a page table op
      iommu: Add a map_pages() op for IOMMU drivers
      iommu: Add support for the map_pages() callback
      iommu/io-pgtable-arm: Prepare PTE methods for handling multiple entries
      iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages()
      iommu/io-pgtable-arm: Implement arm_lpae_map_pages()
      iommu/io-pgtable-arm-v7s: Implement arm_v7s_unmap_pages()
      iommu/io-pgtable-arm-v7s: Implement arm_v7s_map_pages()
      iommu/arm-smmu: Implement the unmap_pages() IOMMU driver callback
      iommu/arm-smmu: Implement the map_pages() IOMMU driver callback

Joerg Roedel (4):
      Merge remote-tracking branch 'korg/core' into x86/amd
      iommu/amd: Remove stale amd_iommu_unmap_flush usage
      Merge tag 'arm-smmu-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into arm/smmu
      Merge branches 'apple/dart', 'arm/smmu', 'iommu/fixes', 'x86/amd', 'x86/vt-d' and 'core' into next

John Garry (5):
      iommu: Deprecate Intel and AMD cmdline methods to enable strict mode
      iommu: Print strict or lazy mode at init time
      iommu: Remove mode argument from iommu_set_dma_strict()
      iommu/arm-smmu-v3: Remove some unneeded init in arm_smmu_cmdq_issue_cmdlist()
      iommu/arm-smmu-v3: Stop pre-zeroing batch commands

Krishna Reddy (1):
      iommu/arm-smmu: Fix race condition during iommu_group creation

Lennert Buytenhek (1):
      iommu/amd: Fix printing of IOMMU events when rate limiting kicks in

Liu Yi L (3):
      iommu/vt-d: Fix incomplete cache flush in intel_pasid_tear_down_entry()
      iommu/vt-d: Use pasid_pte_is_present() helper function
      iommu/vt-d: Add present bit check in pasid entry setup helpers

Lu Baolu (8):
      iommu/vt-d: Report real pgsize bitmap to iommu core
      iommu/vt-d: Implement map/unmap_pages() iommu_ops callback
      iommu/vt-d: Move clflush'es from iotlb_sync_map() to map_pages()
      iommu/vt-d: Update the virtual command related registers
      iommu/vt-d: Refactor Kconfig a bit
      iommu/vt-d: Enable Intel IOMMU scalable mode by default
      iommu/vt-d: Preset A/D bits for user space DMA usage
      iommu/vt-d: A
Re: [PATCH v2 1/5] dt-bindings: reserved-memory: Document memory region specifier
On Fri, Sep 03, 2021 at 08:20:55AM -0500, Rob Herring wrote:
> On Wed, Sep 1, 2021 at 9:13 AM Thierry Reding wrote:
> >
> > On Fri, Jul 02, 2021 at 05:16:25PM +0300, Dmitry Osipenko wrote:
> > > On 01.07.2021 21:14, Thierry Reding wrote:
> > > > On Tue, Jun 08, 2021 at 06:51:40PM +0200, Thierry Reding wrote:
> > > >> On Fri, May 28, 2021 at 06:54:55PM +0200, Thierry Reding wrote:
> > > >>> On Thu, May 20, 2021 at 05:03:06PM -0500, Rob Herring wrote:
> > > On Fri, Apr 23, 2021 at 06:32:30PM +0200, Thierry Reding wrote:
> > > > From: Thierry Reding
> > > >
> > > > Reserved memory region phandle references can be accompanied by a specifier that provides additional information about how that specific reference should be treated.
> > > >
> > > > One use-case is to mark a memory region as needing an identity mapping in the system's IOMMU for the device that references the region. This is needed for example when the bootloader has set up hardware (such as a display controller) to actively access a memory region (e.g. a boot splash screen framebuffer) during boot. The operating system can use the identity mapping flag from the specifier to make sure an IOMMU identity mapping is set up for the framebuffer before IOMMU translations are enabled for the display controller.
> > > >
> > > > Signed-off-by: Thierry Reding
> > > > ---
> > > >  .../reserved-memory/reserved-memory.txt | 21 +++
> > > >  include/dt-bindings/reserved-memory.h   |  8 +++
> > > >  2 files changed, 29 insertions(+)
> > > >  create mode 100644 include/dt-bindings/reserved-memory.h
> > >
> > > Sorry for being slow on this. I have 2 concerns.
> > >
> > > First, this creates an ABI issue. A DT with cells in 'memory-region' will not be understood by an existing OS. I'm less concerned about this if we address that with a stable fix. (Though I'm pretty sure we've naively added #?-cells in the past ignoring this issue.)
> > > >>>
> > > >>> A while ago I had proposed adding memory-region*s* as an alternative name for memory-region to make the naming more consistent with other types of properties (think clocks, resets, gpios, ...). If we added that, we could easily differentiate between the "legacy" cases where no #memory-region-cells was allowed and the new cases where it was.
> > > >>>
> > > Second, it could be the bootloader setting up the reserved region. If a node already has 'memory-region', then adding more regions is more complicated compared to adding new properties. And defining what each memory-region entry is or how many in schemas is impossible.
> > > >>>
> > > >>> It's true that updating the property gets a bit complicated, but it's not exactly rocket science. We really just need to splice the array. I have a working implementation for this in U-Boot.
> > > >>>
> > > >>> For what it's worth, we could run into the same issue with any new property that we add. Even if we renamed this to iommu-memory-region, it's still possible that a bootloader may have to update this property if it already exists (it could be hard-coded in DT, or it could have been added by some earlier bootloader or firmware).
> > > >>>
> > > Both could be addressed with a new property. Perhaps something like 'iommu-memory-region = <&phandle>;'. I think the 'iommu' prefix is appropriate given this is entirely because of the IOMMU being in the mix. I might feel differently if we had other uses for cells, but I don't really see it in this case.
> > > >>>
> > > >>> I'm afraid that down the road we'll end up with other cases and then we might proliferate a number of *-memory-region properties with varying prefixes.
> > > >>>
> > > >>> I am aware of one other case where we might need something like this: on some Tegra SoCs we have audio processors that will access memory buffers using a DMA engine. These processors are booted from early firmware using firmware from system memory. In order to avoid trashing the firmware, we need to reserve memory. We can do this using reserved memory nodes. However, the audio DMA engine also uses the SMMU, so we need to make sure that the firmware memory is marked as reserved within the SMMU. This is similar to the identity mapping case, but not exactly the same. Instead of creating a 1:1 mapping, we just want that IOVA region to be reserved (i.e. I
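For readers following the thread, the two competing bindings might look roughly like this in device tree source. This is a sketch only: the node names, addresses, and the MEMORY_REGION_IDENTITY_MAPPING constant are placeholders for illustration, not the final binding (the actual flag names live in the proposed include/dt-bindings/reserved-memory.h).

```dts
/ {
	reserved-memory {
		#address-cells = <2>;
		#size-cells = <2>;
		ranges;

		/* framebuffer left scanning out by the bootloader */
		fb: framebuffer@92000000 {
			reg = <0x0 0x92000000 0x0 0x800000>;
			no-map;
			/* proposed: region declares how many specifier cells
			 * a consumer reference carries */
			#memory-region-cells = <1>;
		};
	};

	display@15200000 {
		/* Thierry's proposal: phandle plus a specifier cell
		 * carrying per-reference flags */
		memory-region = <&fb MEMORY_REGION_IDENTITY_MAPPING>;

		/* Rob's alternative: a dedicated property instead of
		 * cells, no ABI change to memory-region itself:
		 *   iommu-memory-region = <&fb>;
		 */
	};
};
```

The ABI concern in the thread is visible here: an older OS that does not know about #memory-region-cells would misparse the two-cell entries in memory-region, whereas the dedicated property would simply be ignored.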
Re: [PATCH v2 1/5] dt-bindings: reserved-memory: Document memory region specifier
On Wed, Sep 1, 2021 at 9:13 AM Thierry Reding wrote:
>
> On Fri, Jul 02, 2021 at 05:16:25PM +0300, Dmitry Osipenko wrote:
> > On 01.07.2021 21:14, Thierry Reding wrote:
> > > On Tue, Jun 08, 2021 at 06:51:40PM +0200, Thierry Reding wrote:
> > >> On Fri, May 28, 2021 at 06:54:55PM +0200, Thierry Reding wrote:
> > >>> On Thu, May 20, 2021 at 05:03:06PM -0500, Rob Herring wrote:
> > On Fri, Apr 23, 2021 at 06:32:30PM +0200, Thierry Reding wrote:
> > > From: Thierry Reding
> > >
> > > Reserved memory region phandle references can be accompanied by a specifier that provides additional information about how that specific reference should be treated.
> > >
> > > One use-case is to mark a memory region as needing an identity mapping in the system's IOMMU for the device that references the region. This is needed for example when the bootloader has set up hardware (such as a display controller) to actively access a memory region (e.g. a boot splash screen framebuffer) during boot. The operating system can use the identity mapping flag from the specifier to make sure an IOMMU identity mapping is set up for the framebuffer before IOMMU translations are enabled for the display controller.
> > >
> > > Signed-off-by: Thierry Reding
> > > ---
> > >  .../reserved-memory/reserved-memory.txt | 21 +++
> > >  include/dt-bindings/reserved-memory.h   |  8 +++
> > >  2 files changed, 29 insertions(+)
> > >  create mode 100644 include/dt-bindings/reserved-memory.h
> >
> > Sorry for being slow on this. I have 2 concerns.
> >
> > First, this creates an ABI issue. A DT with cells in 'memory-region' will not be understood by an existing OS. I'm less concerned about this if we address that with a stable fix. (Though I'm pretty sure we've naively added #?-cells in the past ignoring this issue.)
> > >>>
> > >>> A while ago I had proposed adding memory-region*s* as an alternative name for memory-region to make the naming more consistent with other types of properties (think clocks, resets, gpios, ...). If we added that, we could easily differentiate between the "legacy" cases where no #memory-region-cells was allowed and the new cases where it was.
> > >>>
> > Second, it could be the bootloader setting up the reserved region. If a node already has 'memory-region', then adding more regions is more complicated compared to adding new properties. And defining what each memory-region entry is or how many in schemas is impossible.
> > >>>
> > >>> It's true that updating the property gets a bit complicated, but it's not exactly rocket science. We really just need to splice the array. I have a working implementation in U-Boot.
> > >>>
> > >>> For what it's worth, we could run into the same issue with any new property that we add. Even if we renamed this to iommu-memory-region, it's still possible that a bootloader may have to update this property if it already exists (it could be hard-coded in DT, or it could have been added by some earlier bootloader or firmware).
> > >>>
> > Both could be addressed with a new property. Perhaps something like 'iommu-memory-region = <&phandle>;'. I think the 'iommu' prefix is appropriate given this is entirely because of the IOMMU being in the mix. I might feel differently if we had other uses for cells, but I don't really see it in this case.
> > >>>
> > >>> I'm afraid that down the road we'll end up with other cases and then we might proliferate a number of *-memory-region properties with varying prefixes.
> > >>>
> > >>> I am aware of one other case where we might need something like this: on some Tegra SoCs we have audio processors that will access memory buffers using a DMA engine. These processors are booted from early firmware using firmware from system memory. In order to avoid trashing the firmware, we need to reserve memory. We can do this using reserved memory nodes. However, the audio DMA engine also uses the SMMU, so we need to make sure that the firmware memory is marked as reserved within the SMMU. This is similar to the identity mapping case, but not exactly the same. Instead of creating a 1:1 mapping, we just want that IOVA region to be reserved (i.e. IOMMU_RESV_RESERVED instead of IOMMU_RESV_DIRECT{,_RELAXABLE}).
> > >>>
> > >>> That would also fall into the IOMMU domain, but we can't reuse the iommu-memory-region property for that because then we don't have enough information to decide which type of reservation we need.
> > >>>
> > >>> We could obviously make iommu-m