Re: [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping
On Fri, Aug 07, 2015 at 02:38:39PM +0100, Robin Murphy wrote:
> Indeed, DMA_DEBUG will check that a driver is making DMA API calls to the arch code in the right way; this is a different check, to catch things like the arch code passing the wrong domain into this layer, or someone else having messed directly with the domain via the IOMMU API. If the iommu_unmap doesn't match the IOVA region we looked up, that means the IOMMU page tables have somehow become inconsistent with the IOVA allocator, so we are in an unrecoverable situation where we can no longer be sure what devices have access to. That's bad.

Sure, but the BUG_ON would also trigger on things like a double-free, which is bad to handle as a BUG_ON. A WARN_ON for this is sufficient.

> AFAIK, yes (this is just a slight tidyup of the existing code that 32-bit Exynos/Tegra/Rockchip/etc. devices are already using) - the display guys want increasingly massive contiguous allocations for framebuffers, layers, etc., so having IOMMU magic deal with that saves CMA for non-IOMMU devices that really need it.

Makes sense; I thought about something similar for x86 too, to avoid the high-order allocations we currently do. I guess the buffer will later be mapped into the vmalloc space for the CPU?

Joerg

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping
Hi Joerg,

On 11/08/15 10:37, Joerg Roedel wrote:
> On Fri, Aug 07, 2015 at 02:38:39PM +0100, Robin Murphy wrote:
>> Indeed, DMA_DEBUG will check that a driver is making DMA API calls to the arch code in the right way; this is a different check, to catch things like the arch code passing the wrong domain into this layer, or someone else having messed directly with the domain via the IOMMU API. If the iommu_unmap doesn't match the IOVA region we looked up, that means the IOMMU page tables have somehow become inconsistent with the IOVA allocator, so we are in an unrecoverable situation where we can no longer be sure what devices have access to. That's bad.
>
> Sure, but the BUG_ON would also trigger on things like a double-free, which is bad to handle as a BUG_ON. A WARN_ON for this is sufficient.

Oh dear, it gets even better than that: in the case of a simple double-unmap where the IOVA is already free, we wouldn't even get as far as that check, because we'd die calling iova_size(NULL). How on Earth did I get to v5 without spotting that? :(

Anyway, on reflection I think you're probably right; I've clearly been working on this for long enough to start falling into the "my thing is obviously more important than all the other things" trap.

>> AFAIK, yes (this is just a slight tidyup of the existing code that 32-bit Exynos/Tegra/Rockchip/etc. devices are already using) - the display guys want increasingly massive contiguous allocations for framebuffers, layers, etc., so having IOMMU magic deal with that saves CMA for non-IOMMU devices that really need it.
>
> Makes sense; I thought about something similar for x86 too, to avoid the high-order allocations we currently do. I guess the buffer will later be mapped into the vmalloc space for the CPU?
Indeed - for non-coherent devices we have to remap all allocations (IOMMU or not) anyway in order to get a non-cacheable CPU mapping of the buffer, so having non-contiguous pages is no bother; for coherent devices we can just do the same thing but keep the vmalloc mapping cacheable. There's also the DMA_ATTR_NO_KERNEL_MAPPING case (e.g. a GPU that just wants a big buffer to render into and read back out again) where we wouldn't need a CPU address at all, although on arm64 vmalloc space is cheap enough that we've no plans to implement that at the moment.

Robin.
Re: [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping
Hi Joerg,

Thanks for taking a look,

On 07/08/15 09:42, Joerg Roedel wrote:
> On Fri, Jul 31, 2015 at 06:18:27PM +0100, Robin Murphy wrote:
>> +int iommu_get_dma_cookie(struct iommu_domain *domain)
>> +{
>> +	struct iova_domain *iovad;
>> +
>> +	if (domain->dma_api_cookie)
>> +		return -EEXIST;
>
> Why do you call that dma_api_cookie? It is just a pointer to an iova allocator, you can just name it as such, like domain->iova.

Sure, it was more the case that since it had to be in the top-level generic domain structure, I didn't want it to be too implementation-specific. I figured this was a reasonable compromise that wouldn't be a waste of space for implementations with different per-domain DMA API data - e.g. the AMD IOMMU driver could then make use of protection_domain->domain->dma_api_cookie instead of having protection_domain->priv, but that's a patch that wouldn't belong in this series anyway. If you really hate that idea, then yeah, let's just call it iova and consider whether factoring out the redundancy is still applicable later.

>> +static struct iova *__alloc_iova(struct iova_domain *iovad, size_t size,
>> +		dma_addr_t dma_limit)
>
> I think you also need a struct device here to take segment boundary and dma_mask into account.

To the best of my understanding, those limits are only relevant when actually handing off a scatterlist to a client device doing hardware scatter-gather operations, so it's not so much the IOVA allocation that matters, but where the segments lie within it when handling dma_map_sg. However, you do raise a good point - in the current "map everything consecutively" approach, if there is a non-power-of-2-sized segment in the middle of a scatterlist, then subsequent segments could end up inadvertently straddling a boundary. That needs handling in iommu_dma_map_sg; I'll fix it appropriately.
>> +/* The IOVA allocator knows what we mapped, so just unmap whatever that was */
>> +static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr)
>> +{
>> +	struct iova_domain *iovad = domain->dma_api_cookie;
>> +	unsigned long shift = iova_shift(iovad);
>> +	unsigned long pfn = dma_addr >> shift;
>> +	struct iova *iova = find_iova(iovad, pfn);
>> +	size_t size = iova_size(iova) << shift;
>> +
>> +	/* ...and if we can't, then something is horribly, horribly wrong */
>> +	BUG_ON(iommu_unmap(domain, pfn << shift, size) < size);
>
> This is a WARN_ON at most, not a BUG_ON condition, especially since this type of bug is also caught by the dma-api debugging code.

Indeed, DMA_DEBUG will check that a driver is making DMA API calls to the arch code in the right way; this is a different check, to catch things like the arch code passing the wrong domain into this layer, or someone else having messed directly with the domain via the IOMMU API. If the iommu_unmap doesn't match the IOVA region we looked up, that means the IOMMU page tables have somehow become inconsistent with the IOVA allocator, so we are in an unrecoverable situation where we can no longer be sure what devices have access to. That's bad.

>> +static struct page **__iommu_dma_alloc_pages(unsigned int count, gfp_t gfp)
>> +{
>> +	struct page **pages;
>> +	unsigned int i = 0, array_size = count * sizeof(*pages);
>> +
>> +	if (array_size <= PAGE_SIZE)
>> +		pages = kzalloc(array_size, GFP_KERNEL);
>> +	else
>> +		pages = vzalloc(array_size);
>> +	if (!pages)
>> +		return NULL;
>> +
>> +	/* IOMMU can map any pages, so himem can also be used here */
>> +	gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
>> +
>> +	while (count) {
>> +		struct page *page = NULL;
>> +		int j, order = __fls(count);
>> +
>> +		/*
>> +		 * Higher-order allocations are a convenience rather
>> +		 * than a necessity, hence using __GFP_NORETRY until
>> +		 * falling back to single-page allocations.
>> +		 */
>> +		for (order = min(order, MAX_ORDER); order > 0; order--) {
>> +			page = alloc_pages(gfp | __GFP_NORETRY, order);
>> +			if (!page)
>> +				continue;
>> +			if (PageCompound(page)) {
>> +				if (!split_huge_page(page))
>> +					break;
>> +				__free_pages(page, order);
>> +			} else {
>> +				split_page(page, order);
>> +				break;
>> +			}
>> +		}
>> +		if (!page)
>> +			page = alloc_page(gfp);
>> +		if (!page) {
>> +			__iommu_dma_free_pages(pages, i);
>> +			return NULL;
>> +		}
>> +		j = 1 << order;
>> +		count -= j;
>> +		while (j--)
>> +			pages[i++] = page++;
>> +	}
>> +	return pages;
>> +}
>
> Hmm, most dma-api
Re: [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping
On Fri, Jul 31, 2015 at 06:18:27PM +0100, Robin Murphy wrote:
> +int iommu_get_dma_cookie(struct iommu_domain *domain)
> +{
> +	struct iova_domain *iovad;
> +
> +	if (domain->dma_api_cookie)
> +		return -EEXIST;

Why do you call that dma_api_cookie? It is just a pointer to an iova allocator; you can just name it as such, like domain->iova.

> +static struct iova *__alloc_iova(struct iova_domain *iovad, size_t size,
> +		dma_addr_t dma_limit)

I think you also need a struct device here to take the segment boundary and dma_mask into account.

> +/* The IOVA allocator knows what we mapped, so just unmap whatever that was */
> +static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr)
> +{
> +	struct iova_domain *iovad = domain->dma_api_cookie;
> +	unsigned long shift = iova_shift(iovad);
> +	unsigned long pfn = dma_addr >> shift;
> +	struct iova *iova = find_iova(iovad, pfn);
> +	size_t size = iova_size(iova) << shift;
> +
> +	/* ...and if we can't, then something is horribly, horribly wrong */
> +	BUG_ON(iommu_unmap(domain, pfn << shift, size) < size);

This is a WARN_ON at most, not a BUG_ON condition, especially since this type of bug is also caught by the dma-api debugging code.

> +static struct page **__iommu_dma_alloc_pages(unsigned int count, gfp_t gfp)
> +{
> +	struct page **pages;
> +	unsigned int i = 0, array_size = count * sizeof(*pages);
> +
> +	if (array_size <= PAGE_SIZE)
> +		pages = kzalloc(array_size, GFP_KERNEL);
> +	else
> +		pages = vzalloc(array_size);
> +	if (!pages)
> +		return NULL;
> +
> +	/* IOMMU can map any pages, so himem can also be used here */
> +	gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
> +
> +	while (count) {
> +		struct page *page = NULL;
> +		int j, order = __fls(count);
> +
> +		/*
> +		 * Higher-order allocations are a convenience rather
> +		 * than a necessity, hence using __GFP_NORETRY until
> +		 * falling back to single-page allocations.
> +		 */
> +		for (order = min(order, MAX_ORDER); order > 0; order--) {
> +			page = alloc_pages(gfp | __GFP_NORETRY, order);
> +			if (!page)
> +				continue;
> +			if (PageCompound(page)) {
> +				if (!split_huge_page(page))
> +					break;
> +				__free_pages(page, order);
> +			} else {
> +				split_page(page, order);
> +				break;
> +			}
> +		}
> +		if (!page)
> +			page = alloc_page(gfp);
> +		if (!page) {
> +			__iommu_dma_free_pages(pages, i);
> +			return NULL;
> +		}
> +		j = 1 << order;
> +		count -= j;
> +		while (j--)
> +			pages[i++] = page++;
> +	}
> +	return pages;
> +}

Hmm, most dma-api implementations just try to allocate a big enough region from the page allocator. Is it implemented differently here to avoid the use of CMA?

Joerg
Re: [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping
Hi Will,

On Thu, Aug 06, 2015 at 04:23:27PM +0100, Will Deacon wrote:
> We're quite keen to get this in for arm64, since we're without IOMMU DMA ops and need to get something upstream. Do you think this is likely to be merged for 4.3/4.4 or would we be better off doing our own arch-private implementation instead? Sorry to pester, but we've got people basing their patches and products on this and I don't want to end up having to support out-of-tree code.

I definitely plan to merge it soon, but I'm not sure it will make v4.3. There are a few things I have questions about or that need rework, but I am sure we can work those out.

Joerg
Re: [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping
Joerg,

On Fri, Jul 31, 2015 at 06:18:27PM +0100, Robin Murphy wrote:
> Taking inspiration from the existing arch/arm code, break out some generic functions to interface the DMA-API to the IOMMU-API. This will do the bulk of the heavy lifting for IOMMU-backed dma-mapping.
>
> Signed-off-by: Robin Murphy robin.mur...@arm.com
> ---
>  drivers/iommu/Kconfig     |   7 +
>  drivers/iommu/Makefile    |   1 +
>  drivers/iommu/dma-iommu.c | 534 ++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/dma-iommu.h |  84 ++++++++
>  include/linux/iommu.h     |   1 +
>  5 files changed, 627 insertions(+)
>  create mode 100644 drivers/iommu/dma-iommu.c
>  create mode 100644 include/linux/dma-iommu.h

We're quite keen to get this in for arm64, since we're without IOMMU DMA ops and need to get something upstream. Do you think this is likely to be merged for 4.3/4.4, or would we be better off doing our own arch-private implementation instead? Sorry to pester, but we've got people basing their patches and products on this and I don't want to end up having to support out-of-tree code.

Cheers,
Will
Re: [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping
On Fri, Jul 31, 2015 at 06:18:27PM +0100, Robin Murphy wrote:
> Taking inspiration from the existing arch/arm code, break out some generic functions to interface the DMA-API to the IOMMU-API. This will do the bulk of the heavy lifting for IOMMU-backed dma-mapping.
>
> Signed-off-by: Robin Murphy robin.mur...@arm.com

Acked-by: Catalin Marinas catalin.mari...@arm.com
[PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping
Taking inspiration from the existing arch/arm code, break out some generic functions to interface the DMA-API to the IOMMU-API. This will do the bulk of the heavy lifting for IOMMU-backed dma-mapping.

Signed-off-by: Robin Murphy robin.mur...@arm.com
---
 drivers/iommu/Kconfig     |   7 +
 drivers/iommu/Makefile    |   1 +
 drivers/iommu/dma-iommu.c | 534 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dma-iommu.h |  84 ++++++++
 include/linux/iommu.h     |   1 +
 5 files changed, 627 insertions(+)
 create mode 100644 drivers/iommu/dma-iommu.c
 create mode 100644 include/linux/dma-iommu.h

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 8a1bc38..4996dc3 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -48,6 +48,13 @@ config OF_IOMMU
 	def_bool y
 	depends on OF && IOMMU_API
 
+# IOMMU-agnostic DMA-mapping layer
+config IOMMU_DMA
+	bool
+	depends on NEED_SG_DMA_LENGTH
+	select IOMMU_API
+	select IOMMU_IOVA
+
 config FSL_PAMU
 	bool "Freescale IOMMU support"
 	depends on PPC32
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index dc6f511..45efa2a 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -1,6 +1,7 @@
 obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
+obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
new file mode 100644
index 000..f34fd46
--- /dev/null
+++ b/drivers/iommu/dma-iommu.c
@@ -0,0 +1,534 @@
+/*
+ * A fairly generic DMA-API to IOMMU-API glue layer.
+ *
+ * Copyright (C) 2014-2015 ARM Ltd.
+ *
+ * based in part on arch/arm/mm/dma-mapping.c:
+ * Copyright (C) 2000-2004 Russell King
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/device.h>
+#include <linux/dma-iommu.h>
+#include <linux/huge_mm.h>
+#include <linux/iommu.h>
+#include <linux/iova.h>
+#include <linux/mm.h>
+
+int iommu_dma_init(void)
+{
+	return iova_cache_get();
+}
+
+/**
+ * iommu_get_dma_cookie - Acquire DMA-API resources for a domain
+ * @domain: IOMMU domain to prepare for DMA-API usage
+ *
+ * IOMMU drivers should normally call this from their domain_alloc
+ * callback when domain->type == IOMMU_DOMAIN_DMA.
+ */
+int iommu_get_dma_cookie(struct iommu_domain *domain)
+{
+	struct iova_domain *iovad;
+
+	if (domain->dma_api_cookie)
+		return -EEXIST;
+
+	iovad = kzalloc(sizeof(*iovad), GFP_KERNEL);
+	domain->dma_api_cookie = iovad;
+
+	return iovad ? 0 : -ENOMEM;
+}
+EXPORT_SYMBOL(iommu_get_dma_cookie);
+
+/**
+ * iommu_put_dma_cookie - Release a domain's DMA mapping resources
+ * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
+ *
+ * IOMMU drivers should normally call this from their domain_free callback.
+ */
+void iommu_put_dma_cookie(struct iommu_domain *domain)
+{
+	struct iova_domain *iovad = domain->dma_api_cookie;
+
+	if (!iovad)
+		return;
+
+	put_iova_domain(iovad);
+	kfree(iovad);
+	domain->dma_api_cookie = NULL;
+}
+EXPORT_SYMBOL(iommu_put_dma_cookie);
+
+/**
+ * iommu_dma_init_domain - Initialise a DMA mapping domain
+ * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
+ * @base: IOVA at which the mappable address space starts
+ * @size: Size of IOVA space
+ *
+ * @base and @size should be exact multiples of IOMMU page granularity to
+ * avoid rounding surprises.
If necessary, we reserve the page at address 0
+ * to ensure it is an invalid IOVA.
+ */
+int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, u64 size)
+{
+	struct iova_domain *iovad = domain->dma_api_cookie;
+	unsigned long order, base_pfn, end_pfn;
+
+	if (!iovad)
+		return -ENODEV;
+
+	/* Use the smallest supported page size for IOVA granularity */
+	order = __ffs(domain->ops->pgsize_bitmap);
+	base_pfn = max_t(unsigned long, 1, base >> order);
+	end_pfn = (base + size - 1) >> order;
+
+	/* Check the domain allows at least some access to the device... */
+	if (domain->geometry.force_aperture) {
+		if (base > domain->geometry.aperture_end ||
+		    base + size <=