Re: [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping

2015-08-11 Thread Joerg Roedel
On Fri, Aug 07, 2015 at 02:38:39PM +0100, Robin Murphy wrote:
 Indeed, DMA_DEBUG will check that a driver is making DMA API calls
 to the arch code in the right way; this is a different check, to
 catch things like the arch code passing the wrong domain into this
 layer, or someone else having messed directly with the domain via
 the IOMMU API. If the iommu_unmap doesn't match the IOVA region we
 looked up, that means the IOMMU page tables have somehow become
 inconsistent with the IOVA allocator, so we are in an unrecoverable
 situation where we can no longer be sure what devices have access
 to. That's bad.

Sure, but the BUG_ON would also trigger on things like a double-free,
which is bad to handle as a BUG_ON. A WARN_ON for this is sufficient.

 AFAIK, yes (this is just a slight tidyup of the existing code that
 32-bit Exynos/Tegra/Rockchip/etc. devices are already using) - the
 display guys want increasingly massive contiguous allocations for
 framebuffers, layers, etc., so having IOMMU magic deal with that
 saves CMA for non-IOMMU devices that really need it.

Makes sense, I thought about something similar for x86 too to avoid the
high-order allocations we currently do. I guess the buffer will later be
mapped into the vmalloc space for the CPU?


Joerg



Re: [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping

2015-08-11 Thread Robin Murphy

Hi Joerg,

On 11/08/15 10:37, Joerg Roedel wrote:

On Fri, Aug 07, 2015 at 02:38:39PM +0100, Robin Murphy wrote:

Indeed, DMA_DEBUG will check that a driver is making DMA API calls
to the arch code in the right way; this is a different check, to
catch things like the arch code passing the wrong domain into this
layer, or someone else having messed directly with the domain via
the IOMMU API. If the iommu_unmap doesn't match the IOVA region we
looked up, that means the IOMMU page tables have somehow become
inconsistent with the IOVA allocator, so we are in an unrecoverable
situation where we can no longer be sure what devices have access
to. That's bad.


Sure, but the BUG_ON would also trigger on things like a double-free,
which is bad to handle as a BUG_ON. A WARN_ON for this is sufficient.


Oh dear, it gets even better than that; in the case of a simple 
double-unmap where the IOVA is already free, we wouldn't even get as far 
as that check because we'd die calling iova_size(NULL). How on Earth did 
I get to v5 without spotting that? :(


Anyway, on reflection I think you're probably right; I've clearly been
working on this for long enough to start falling into the "my thing is
obviously more important than all the other things" trap.
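
For the sake of discussion, the rework I have in mind looks roughly like
the below (sketch only, untested - in particular the __free_iova call at
the end just stands in for however the real unmap path ends up releasing
the IOVA):

static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr)
{
        struct iova_domain *iovad = domain->dma_api_cookie;
        unsigned long shift = iova_shift(iovad);
        unsigned long pfn = dma_addr >> shift;
        struct iova *iova = find_iova(iovad, pfn);
        size_t size;

        /* No IOVA found? Most likely a double-unmap - complain and bail */
        if (WARN_ON(!iova))
                return;

        size = iova_size(iova) << shift;
        /* A short unmap means the page tables and the allocator disagree */
        WARN_ON(iommu_unmap(domain, pfn << shift, size) < size);
        __free_iova(iovad, iova);
}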



AFAIK, yes (this is just a slight tidyup of the existing code that
32-bit Exynos/Tegra/Rockchip/etc. devices are already using) - the
display guys want increasingly massive contiguous allocations for
framebuffers, layers, etc., so having IOMMU magic deal with that
saves CMA for non-IOMMU devices that really need it.


Makes sense, I thought about something similar for x86 too to avoid the
high-order allocations we currently do. I guess the buffer will later be
mapped into the vmalloc space for the CPU?


Indeed - for non-coherent devices we have to remap all allocations 
(IOMMU or not) anyway in order to get a non-cacheable CPU mapping of the 
buffer, so having non-contiguous pages is no bother; for coherent 
devices we can just do the same thing but keep the vmalloc mapping 
cacheable. There's also the DMA_ATTR_NO_KERNEL_MAPPING case (e.g. GPU 
just wants a big buffer to render into and read back out again) where we 
wouldn't need a CPU address at all, although on arm64 vmalloc space is 
cheap enough that we've no plans to implement that at the moment.
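
In sketch form, the tail end of the allocation path would do something
like this (simplified: the helper name, the "coherent" flag and the use
of pgprot_writecombine() are illustrative stand-ins for whatever the real
code plumbs through per architecture):

/* Sketch: pick (or skip) the CPU mapping for an IOMMU-backed allocation */
static void *__iommu_dma_remap(struct page **pages, unsigned int count,
                               bool coherent, struct dma_attrs *attrs)
{
        pgprot_t prot;

        /* e.g. a GPU render buffer the CPU never needs to touch */
        if (dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs))
                return NULL;

        /* Coherent devices keep a cacheable mapping; others get non-cacheable */
        prot = coherent ? PAGE_KERNEL : pgprot_writecombine(PAGE_KERNEL);

        return vmap(pages, count, VM_MAP, prot);
}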


Robin.



Re: [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping

2015-08-07 Thread Robin Murphy

Hi Joerg,

Thanks for taking a look,

On 07/08/15 09:42, Joerg Roedel wrote:

On Fri, Jul 31, 2015 at 06:18:27PM +0100, Robin Murphy wrote:

+int iommu_get_dma_cookie(struct iommu_domain *domain)
+{
+   struct iova_domain *iovad;
+
+   if (domain->dma_api_cookie)
+   return -EEXIST;


Why do you call that dma_api_cookie? It is just a pointer to an iova
allocator, you can just name it as such, like domain->iova.


Sure, it was more the case that since it had to be in the top-level 
generic domain structure, I didn't want it to be too 
implementation-specific. I figured this was a reasonable compromise that 
wouldn't be a waste of space for implementations with different 
per-domain DMA API data - e.g. the AMD IOMMU driver could then make use 
of protection_domain->domain->dma_api_cookie instead of having
protection_domain->priv, but that's a patch that wouldn't belong in this
series anyway.


If you really hate that idea, then yeah, let's just call it iova and 
consider if factoring out redundancy is still applicable later.



+static struct iova *__alloc_iova(struct iova_domain *iovad, size_t size,
+   dma_addr_t dma_limit)


I think you also need a struct device here to take segment boundary and
dma_mask into account.


To the best of my understanding, those limits are only relevant when 
actually handing off a scatterlist to a client device doing hardware 
scatter-gather operations, so it's not so much the IOVA allocation that 
matters, but where the segments lie within it when handling dma_map_sg.


However, you do raise a good point - in the current "map everything
consecutively" approach, if there is a non-power-of-2-sized segment in
the middle of a scatterlist, then subsequent segments could possibly end 
up inadvertently straddling a boundary. That needs handling in 
iommu_dma_map_sg; I'll fix it appropriately.
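
Concretely, I'm thinking of something along these lines when laying out
the segments within the contiguous IOVA allocation (sketch only; dev, sg
and nents are the usual map_sg arguments, and I'm ignoring the degenerate
all-ones boundary mask case here):

        unsigned long mask = dma_get_seg_boundary(dev);
        struct scatterlist *s;
        size_t iova_off = 0;
        int i;

        for_each_sg(sg, s, nents, i) {
                size_t len = s->length;

                /* Would this segment straddle a boundary-sized chunk? */
                if ((iova_off ^ (iova_off + len - 1)) & ~mask)
                        iova_off = ALIGN(iova_off, mask + 1);

                /* Stash the offset; the IOVA base gets added once allocated */
                sg_dma_address(s) = iova_off;
                sg_dma_len(s) = len;
                iova_off += len;
        }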



+/* The IOVA allocator knows what we mapped, so just unmap whatever that was */
+static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr)
+{
+   struct iova_domain *iovad = domain->dma_api_cookie;
+   unsigned long shift = iova_shift(iovad);
+   unsigned long pfn = dma_addr >> shift;
+   struct iova *iova = find_iova(iovad, pfn);
+   size_t size = iova_size(iova) << shift;
+
+   /* ...and if we can't, then something is horribly, horribly wrong */
+   BUG_ON(iommu_unmap(domain, pfn << shift, size) < size);


This is a WARN_ON at most, not a BUG_ON condition, especially since this
type of bug is also caught by the dma-api debugging code.


Indeed, DMA_DEBUG will check that a driver is making DMA API calls to 
the arch code in the right way; this is a different check, to catch 
things like the arch code passing the wrong domain into this layer, or 
someone else having messed directly with the domain via the IOMMU API. 
If the iommu_unmap doesn't match the IOVA region we looked up, that 
means the IOMMU page tables have somehow become inconsistent with the 
IOVA allocator, so we are in an unrecoverable situation where we can no 
longer be sure what devices have access to. That's bad.



+static struct page **__iommu_dma_alloc_pages(unsigned int count, gfp_t gfp)
+{
+   struct page **pages;
+   unsigned int i = 0, array_size = count * sizeof(*pages);
+
+   if (array_size <= PAGE_SIZE)
+   pages = kzalloc(array_size, GFP_KERNEL);
+   else
+   pages = vzalloc(array_size);
+   if (!pages)
+   return NULL;
+
+   /* IOMMU can map any pages, so highmem can also be used here */
+   gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
+
+   while (count) {
+   struct page *page = NULL;
+   int j, order = __fls(count);
+
+   /*
+* Higher-order allocations are a convenience rather
+* than a necessity, hence using __GFP_NORETRY until
+* falling back to single-page allocations.
+*/
+   for (order = min(order, MAX_ORDER); order > 0; order--) {
+   page = alloc_pages(gfp | __GFP_NORETRY, order);
+   if (!page)
+   continue;
+   if (PageCompound(page)) {
+   if (!split_huge_page(page))
+   break;
+   __free_pages(page, order);
+   } else {
+   split_page(page, order);
+   break;
+   }
+   }
+   if (!page)
+   page = alloc_page(gfp);
+   if (!page) {
+   __iommu_dma_free_pages(pages, i);
+   return NULL;
+   }
+   j = 1 << order;
+   count -= j;
+   while (j--)
+   pages[i++] = page++;
+   }
+   return pages;
+}


Hmm, most dma-api implementations just try to allocate a big enough
region from the page allocator. Is it implemented differently here to
avoid the use of CMA?

Re: [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping

2015-08-07 Thread Joerg Roedel
On Fri, Jul 31, 2015 at 06:18:27PM +0100, Robin Murphy wrote:
 +int iommu_get_dma_cookie(struct iommu_domain *domain)
 +{
 + struct iova_domain *iovad;
 +
 + if (domain->dma_api_cookie)
 + return -EEXIST;

Why do you call that dma_api_cookie? It is just a pointer to an iova
allocator, you can just name it as such, like domain->iova.

 +static struct iova *__alloc_iova(struct iova_domain *iovad, size_t size,
 + dma_addr_t dma_limit)

I think you also need a struct device here to take segment boundary and
dma_mask into account.

 +/* The IOVA allocator knows what we mapped, so just unmap whatever that was */
 +static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr)
 +{
 + struct iova_domain *iovad = domain->dma_api_cookie;
 + unsigned long shift = iova_shift(iovad);
 + unsigned long pfn = dma_addr >> shift;
 + struct iova *iova = find_iova(iovad, pfn);
 + size_t size = iova_size(iova) << shift;
 +
 + /* ...and if we can't, then something is horribly, horribly wrong */
 + BUG_ON(iommu_unmap(domain, pfn << shift, size) < size);

This is a WARN_ON at most, not a BUG_ON condition, especially since this
type of bug is also caught by the dma-api debugging code.

 +static struct page **__iommu_dma_alloc_pages(unsigned int count, gfp_t gfp)
 +{
 + struct page **pages;
 + unsigned int i = 0, array_size = count * sizeof(*pages);
 +
 + if (array_size <= PAGE_SIZE)
 + pages = kzalloc(array_size, GFP_KERNEL);
 + else
 + pages = vzalloc(array_size);
 + if (!pages)
 + return NULL;
 +
 + /* IOMMU can map any pages, so highmem can also be used here */
 + gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
 +
 + while (count) {
 + struct page *page = NULL;
 + int j, order = __fls(count);
 +
 + /*
 +  * Higher-order allocations are a convenience rather
 +  * than a necessity, hence using __GFP_NORETRY until
 +  * falling back to single-page allocations.
 +  */
 + for (order = min(order, MAX_ORDER); order > 0; order--) {
 + page = alloc_pages(gfp | __GFP_NORETRY, order);
 + if (!page)
 + continue;
 + if (PageCompound(page)) {
 + if (!split_huge_page(page))
 + break;
 + __free_pages(page, order);
 + } else {
 + split_page(page, order);
 + break;
 + }
 + }
 + if (!page)
 + page = alloc_page(gfp);
 + if (!page) {
 + __iommu_dma_free_pages(pages, i);
 + return NULL;
 + }
 + j = 1 << order;
 + count -= j;
 + while (j--)
 + pages[i++] = page++;
 + }
 + return pages;
 +}

Hmm, most dma-api implementations just try to allocate a big enough
region from the page allocator. Is it implemented differently here to avoid
the use of CMA?
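
(For illustration, by "big enough region" I mean the usual pattern along
these lines - simplified, and the order of CMA vs. buddy fallbacks varies
between implementations:)

int order = get_order(size);
struct page *page = NULL;

/* one physically contiguous chunk, either from CMA or the page allocator */
if (dev_get_cma_area(dev))
        page = dma_alloc_from_contiguous(dev, size >> PAGE_SHIFT, order);
if (!page)
        page = alloc_pages(gfp, order);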


Joerg



Re: [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping

2015-08-06 Thread j...@8bytes.org
Hi Will,

On Thu, Aug 06, 2015 at 04:23:27PM +0100, Will Deacon wrote:
 We're quite keen to get this in for arm64, since we're without IOMMU DMA
 ops and need to get something upstream. Do you think this is likely to
 be merged for 4.3/4.4 or would we be better off doing our own
 arch-private implementation instead?
 
 Sorry to pester, but we've got people basing their patches and products
 on this and I don't want to end up having to support out-of-tree code.

I definitely plan to merge it soon, but not sure if it's getting into
v4.3. There are a few things I have questions about or that need rework, but
I am sure we can work this out.


Joerg



Re: [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping

2015-08-06 Thread Will Deacon
Joerg,

On Fri, Jul 31, 2015 at 06:18:27PM +0100, Robin Murphy wrote:
 Taking inspiration from the existing arch/arm code, break out some
 generic functions to interface the DMA-API to the IOMMU-API. This will
 do the bulk of the heavy lifting for IOMMU-backed dma-mapping.
 
 Signed-off-by: Robin Murphy robin.mur...@arm.com
 ---
  drivers/iommu/Kconfig |   7 +
  drivers/iommu/Makefile|   1 +
  drivers/iommu/dma-iommu.c | 534 ++
  include/linux/dma-iommu.h |  84 
  include/linux/iommu.h |   1 +
  5 files changed, 627 insertions(+)
  create mode 100644 drivers/iommu/dma-iommu.c
  create mode 100644 include/linux/dma-iommu.h

We're quite keen to get this in for arm64, since we're without IOMMU DMA
ops and need to get something upstream. Do you think this is likely to
be merged for 4.3/4.4 or would we be better off doing our own
arch-private implementation instead?

Sorry to pester, but we've got people basing their patches and products
on this and I don't want to end up having to support out-of-tree code.

Cheers,

Will


Re: [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping

2015-08-03 Thread Catalin Marinas
On Fri, Jul 31, 2015 at 06:18:27PM +0100, Robin Murphy wrote:
 Taking inspiration from the existing arch/arm code, break out some
 generic functions to interface the DMA-API to the IOMMU-API. This will
 do the bulk of the heavy lifting for IOMMU-backed dma-mapping.
 
 Signed-off-by: Robin Murphy robin.mur...@arm.com

Acked-by: Catalin Marinas catalin.mari...@arm.com


[PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping

2015-07-31 Thread Robin Murphy
Taking inspiration from the existing arch/arm code, break out some
generic functions to interface the DMA-API to the IOMMU-API. This will
do the bulk of the heavy lifting for IOMMU-backed dma-mapping.

Signed-off-by: Robin Murphy robin.mur...@arm.com
---
 drivers/iommu/Kconfig |   7 +
 drivers/iommu/Makefile|   1 +
 drivers/iommu/dma-iommu.c | 534 ++
 include/linux/dma-iommu.h |  84 
 include/linux/iommu.h |   1 +
 5 files changed, 627 insertions(+)
 create mode 100644 drivers/iommu/dma-iommu.c
 create mode 100644 include/linux/dma-iommu.h

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 8a1bc38..4996dc3 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -48,6 +48,13 @@ config OF_IOMMU
def_bool y
 depends on OF && IOMMU_API
 
+# IOMMU-agnostic DMA-mapping layer
+config IOMMU_DMA
+   bool
+   depends on NEED_SG_DMA_LENGTH
+   select IOMMU_API
+   select IOMMU_IOVA
+
 config FSL_PAMU
 bool "Freescale IOMMU support"
depends on PPC32
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index dc6f511..45efa2a 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -1,6 +1,7 @@
 obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
+obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
new file mode 100644
index 000..f34fd46
--- /dev/null
+++ b/drivers/iommu/dma-iommu.c
@@ -0,0 +1,534 @@
+/*
+ * A fairly generic DMA-API to IOMMU-API glue layer.
+ *
+ * Copyright (C) 2014-2015 ARM Ltd.
+ *
+ * based in part on arch/arm/mm/dma-mapping.c:
+ * Copyright (C) 2000-2004 Russell King
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/device.h>
+#include <linux/dma-iommu.h>
+#include <linux/huge_mm.h>
+#include <linux/iommu.h>
+#include <linux/iova.h>
+#include <linux/mm.h>
+
+int iommu_dma_init(void)
+{
+   return iova_cache_get();
+}
+
+/**
+ * iommu_get_dma_cookie - Acquire DMA-API resources for a domain
+ * @domain: IOMMU domain to prepare for DMA-API usage
+ *
+ * IOMMU drivers should normally call this from their domain_alloc
+ * callback when domain->type == IOMMU_DOMAIN_DMA.
+ */
+int iommu_get_dma_cookie(struct iommu_domain *domain)
+{
+   struct iova_domain *iovad;
+
+   if (domain->dma_api_cookie)
+   return -EEXIST;
+
+   iovad = kzalloc(sizeof(*iovad), GFP_KERNEL);
+   domain->dma_api_cookie = iovad;
+
+   return iovad ? 0 : -ENOMEM;
+}
+EXPORT_SYMBOL(iommu_get_dma_cookie);
+
+/**
+ * iommu_put_dma_cookie - Release a domain's DMA mapping resources
+ * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
+ *
+ * IOMMU drivers should normally call this from their domain_free callback.
+ */
+void iommu_put_dma_cookie(struct iommu_domain *domain)
+{
+   struct iova_domain *iovad = domain->dma_api_cookie;
+
+   if (!iovad)
+   return;
+
+   put_iova_domain(iovad);
+   kfree(iovad);
+   domain->dma_api_cookie = NULL;
+}
+EXPORT_SYMBOL(iommu_put_dma_cookie);
+
+/**
+ * iommu_dma_init_domain - Initialise a DMA mapping domain
+ * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
+ * @base: IOVA at which the mappable address space starts
+ * @size: Size of IOVA space
+ *
+ * @base and @size should be exact multiples of IOMMU page granularity to
+ * avoid rounding surprises. If necessary, we reserve the page at address 0
+ * to ensure it is an invalid IOVA.
+ */
+int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, u64 size)
+{
+   struct iova_domain *iovad = domain->dma_api_cookie;
+   unsigned long order, base_pfn, end_pfn;
+
+   if (!iovad)
+   return -ENODEV;
+
+   /* Use the smallest supported page size for IOVA granularity */
+   order = __ffs(domain->ops->pgsize_bitmap);
+   base_pfn = max_t(unsigned long, 1, base >> order);
+   end_pfn = (base + size - 1) >> order;
+
+   /* Check the domain allows at least some access to the device... */
+   if (domain->geometry.force_aperture) {
+   if (base > domain->geometry.aperture_end ||
+   base + size <=