Re: [PATCH] iommu/rockchip: Free irqs in shutdown handler

2018-09-20 Thread Marc Zyngier
On Mon, 27 Aug 2018 11:56:24 +0100,
Heiko Stuebner  wrote:
> 
> In the iommu's shutdown handler we disable runtime-pm which could
> result in the irq-handler running unclocked and since commit
> 3fc7c5c0cff3 ("iommu/rockchip: Handle errors returned from PM framework")
> we warn about that fact.
> 
> This can cause warnings on shutdown on some Rockchip machines, so
> free the irqs in the shutdown handler before we disable runtime-pm.
> 
> Reported-by: Enric Balletbo i Serra 
> Fixes: 3fc7c5c0cff3 ("iommu/rockchip: Handle errors returned from PM framework")
> Signed-off-by: Heiko Stuebner 
> ---
>  drivers/iommu/rockchip-iommu.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
> index 258115b10fa9..ad3e2b97469e 100644
> --- a/drivers/iommu/rockchip-iommu.c
> +++ b/drivers/iommu/rockchip-iommu.c
> @@ -1241,6 +1241,12 @@ static int rk_iommu_probe(struct platform_device *pdev)
>  
>  static void rk_iommu_shutdown(struct platform_device *pdev)
>  {
> + struct rk_iommu *iommu = platform_get_drvdata(pdev);
> + int i = 0, irq;
> +
> + while ((irq = platform_get_irq(pdev, i++)) != -ENXIO)
> + devm_free_irq(iommu->dev, irq, iommu);
> +
>   pm_runtime_force_suspend(&pdev->dev);
>  }
>  
> -- 
> 2.17.0
> 

Acked-by: Marc Zyngier 

M.

-- 
Jazz is not dead, it just smell funny.


Re: [PATCH 1/3] xtensa: remove partial support for DMA buffers in high memory

2018-09-20 Thread Max Filippov
On Thu, Sep 20, 2018 at 11:08 AM, Christoph Hellwig  wrote:
> On Thu, Sep 20, 2018 at 10:44:55AM -0700, Max Filippov wrote:
>> Hi Christoph,
>>
>> On Thu, Sep 20, 2018 at 10:15 AM, Christoph Hellwig  wrote:
>> > This reverts commit 6137e4166004e2ec383ac05d5ca15831f4668806.
>> >
>> > We explicitly clear GFP_HIGHMEM from the allowed dma flags at the beginning
>> > of the function (and the generic dma_alloc_attr function calling us does the
>> > same!), so this code just adds dead wood.
>>
>> No, not really: dma_alloc_from_contiguous does not accept flags (only
>> no_warn bit)
>> and may return arbitrary pages. That's the case that this code is handling.
>
> dma_alloc_from_contiguous calls cma_alloc to do the actual
> allocation, and that uses alloc_contig_range with the GFP_KERNEL
> flag.  How do you end up getting highmem pages from it?

I'm not familiar with the details of alloc_contig_range implementation, but
I don't see how gfp_mask is used to limit allocation to non-high memory.
So when alloc_contig_range gets start and end PFNs in high memory
(from the CMA region allocated in high memory) it just returns high memory
pages -- that's what I see.

-- 
Thanks.
-- Max


[PATCH 2/5] dma-direct: add an explicit dma_direct_get_required_mask

2018-09-20 Thread Christoph Hellwig
This is somewhat modelled after the powerpc version, and differs from
the legacy fallback in using fls64 instead of pointlessly splitting up the
address into low and high dwords, and in that it takes (__)phys_to_dma
into account.
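
For illustration, here is a small standalone sketch of the rounding this helper
performs; the address is invented and fls64() is modelled with a compiler
builtin, so this is just the arithmetic, not kernel code:

#include <stdint.h>
#include <stdio.h>

/* fls64(x) returns the 1-based index of the most significant set bit. */
static unsigned int fls64_demo(uint64_t x)
{
	return x ? 64 - __builtin_clzll(x) : 0;
}

int main(void)
{
	/* Made-up example: the last RAM page maps to DMA address 0x2_3fff_f000. */
	uint64_t max_dma = 0x23ffff000ULL;
	uint64_t mask = (1ULL << (fls64_demo(max_dma) - 1)) * 2 - 1;

	/* Prints 0x3ffffffff: the result is rounded up to a full 34-bit mask. */
	printf("required mask: %#llx\n", (unsigned long long)mask);
	return 0;
}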

Signed-off-by: Christoph Hellwig 
---
 include/linux/dma-direct.h |  1 +
 kernel/dma/direct.c| 21 ++---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 86a59ba5a7f3..b79496d8c75b 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -55,6 +55,7 @@ static inline void dma_mark_clean(void *addr, size_t size)
 }
 #endif /* CONFIG_ARCH_HAS_DMA_MARK_CLEAN */
 
+u64 dma_direct_get_required_mask(struct device *dev);
 void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
gfp_t gfp, unsigned long attrs);
 void dma_direct_free(struct device *dev, size_t size, void *cpu_addr,
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index c954f0a6dc62..81b73a5bba54 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -53,11 +53,25 @@ check_addr(struct device *dev, dma_addr_t dma_addr, size_t size,
return true;
 }
 
+static inline dma_addr_t phys_to_dma_direct(struct device *dev,
+   phys_addr_t phys)
+{
+   if (force_dma_unencrypted())
+   return __phys_to_dma(dev, phys);
+   return phys_to_dma(dev, phys);
+}
+
+u64 dma_direct_get_required_mask(struct device *dev)
+{
+   u64 max_dma = phys_to_dma_direct(dev, (max_pfn - 1) << PAGE_SHIFT);
+
+   return (1ULL << (fls64(max_dma) - 1)) * 2 - 1;
+}
+
 static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
 {
-   dma_addr_t addr = force_dma_unencrypted() ?
-   __phys_to_dma(dev, phys) : phys_to_dma(dev, phys);
-   return addr + size - 1 <= dev->coherent_dma_mask;
+   return phys_to_dma_direct(dev, phys) + size - 1 <=
+   dev->coherent_dma_mask;
 }
 
 void *dma_direct_alloc_pages(struct device *dev, size_t size,
@@ -296,6 +310,7 @@ const struct dma_map_ops dma_direct_ops = {
.unmap_page = dma_direct_unmap_page,
.unmap_sg   = dma_direct_unmap_sg,
 #endif
+   .get_required_mask  = dma_direct_get_required_mask,
.dma_supported  = dma_direct_supported,
.mapping_error  = dma_direct_mapping_error,
.cache_sync = arch_dma_cache_sync,
-- 
2.18.0



[PATCH 4/5] dma-direct: implement complete bus_dma_mask handling

2018-09-20 Thread Christoph Hellwig
Instead of rejecting devices with a too small bus_dma_mask we can handle
them by taking the bus dma_mask into account for allocations and bounce
buffering decisions.
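
As a rough standalone sketch (plain C, not the kernel code itself), the
addressing check effectively becomes:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch of dma_capable() after this patch: the effective limit is
 * min_not_zero(*dev->dma_mask, dev->bus_dma_mask), i.e. a zero
 * bus_dma_mask means "no additional bus limit".
 */
static bool dma_capable_sketch(uint64_t addr, size_t size,
			       uint64_t dev_mask, uint64_t bus_mask)
{
	uint64_t limit = dev_mask;

	if (bus_mask && bus_mask < limit)
		limit = bus_mask;

	return addr + size - 1 <= limit;
}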

Signed-off-by: Christoph Hellwig 
---
 include/linux/dma-direct.h |  3 ++-
 kernel/dma/direct.c| 21 +++--
 2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index b79496d8c75b..fbca184ff5a0 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -27,7 +27,8 @@ static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
if (!dev->dma_mask)
return false;
 
-   return addr + size - 1 <= *dev->dma_mask;
+   return addr + size - 1 <=
+   min_not_zero(*dev->dma_mask, dev->bus_dma_mask);
 }
 #endif /* !CONFIG_ARCH_HAS_PHYS_TO_DMA */
 
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 3c404e33d946..64466b7ef67b 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -43,10 +43,11 @@ check_addr(struct device *dev, dma_addr_t dma_addr, size_t size,
return false;
}
 
-   if (*dev->dma_mask >= DMA_BIT_MASK(32)) {
+   if (*dev->dma_mask >= DMA_BIT_MASK(32) || dev->bus_dma_mask) {
dev_err(dev,
-   "%s: overflow %pad+%zu of device mask %llx\n",
-   caller, _addr, size, *dev->dma_mask);
+   "%s: overflow %pad+%zu of device mask %llx bus 
mask %llx\n",
+   caller, _addr, size,
+   *dev->dma_mask, dev->bus_dma_mask);
}
return false;
}
@@ -65,12 +66,18 @@ u64 dma_direct_get_required_mask(struct device *dev)
 {
u64 max_dma = phys_to_dma_direct(dev, (max_pfn - 1) << PAGE_SHIFT);
 
+   if (dev->bus_dma_mask && dev->bus_dma_mask < max_dma)
+   max_dma = dev->bus_dma_mask;
+
return (1ULL << (fls64(max_dma) - 1)) * 2 - 1;
 }
 
 static gfp_t __dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
u64 *phys_mask)
 {
+   if (dev->bus_dma_mask && dev->bus_dma_mask < dma_mask)
+   dma_mask = dev->bus_dma_mask;
+
if (force_dma_unencrypted())
*phys_mask = __dma_to_phys(dev, dma_mask);
else
@@ -87,7 +94,7 @@ static gfp_t __dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
 static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
 {
return phys_to_dma_direct(dev, phys) + size - 1 <=
-   dev->coherent_dma_mask;
+   min_not_zero(dev->coherent_dma_mask, dev->bus_dma_mask);
 }
 
 void *dma_direct_alloc_pages(struct device *dev, size_t size,
@@ -291,12 +298,6 @@ int dma_direct_supported(struct device *dev, u64 mask)
if (mask < phys_to_dma(dev, DMA_BIT_MASK(32)))
return 0;
 #endif
-   /*
-* Upstream PCI/PCIe bridges or SoC interconnects may not carry
-* as many DMA address bits as the device itself supports.
-*/
-   if (dev->bus_dma_mask && mask > dev->bus_dma_mask)
-   return 0;
return 1;
 }
 
-- 
2.18.0



[PATCH 5/5] dma-direct: always allow dma mask <= physical memory size

2018-09-20 Thread Christoph Hellwig
This way an architecture with less than 4G of RAM can support dma_mask
smaller than 32-bit without a ZONE_DMA.  Apparently that is a common
case on powerpc.
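
A worked example with invented numbers may help; this is a standalone sketch of
the check's intent, not the patch itself:

#include <stdint.h>
#include <stdio.h>

#define DMA_BIT_MASK(n)	(((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

int main(void)
{
	/* Hypothetical machine: 1 GiB of RAM, 4 KiB pages, no ZONE_DMA. */
	uint64_t max_pfn = 0x40000;
	unsigned int page_shift = 12;
	uint64_t min_mask = DMA_BIT_MASK(32);
	uint64_t last_page = (max_pfn - 1) << page_shift;	/* 0x3ffff000 */

	if (last_page < min_mask)
		min_mask = last_page;

	/* A 30-bit device mask covers every RAM page, so it can be accepted. */
	printf("30-bit mask supported: %d\n", DMA_BIT_MASK(30) >= min_mask);
	return 0;
}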

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/direct.c | 28 
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 64466b7ef67b..d1e103c6b107 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -4,6 +4,7 @@
  *
  * DMA operations that map physical memory directly without using an IOMMU.
  */
+#include  /* for max_pfn */
 #include 
 #include 
 #include 
@@ -283,21 +284,24 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
return nents;
 }
 
+/*
+ * Because 32-bit DMA masks are so common we expect every architecture to be
+ * able to satisfy them - either by not supporting more physical memory, or by
+ * providing a ZONE_DMA32.  If neither is the case, the architecture needs to
+ * use an IOMMU instead of the direct mapping.
+ */
 int dma_direct_supported(struct device *dev, u64 mask)
 {
-#ifdef CONFIG_ZONE_DMA
-   if (mask < phys_to_dma(dev, DMA_BIT_MASK(ARCH_ZONE_DMA_BITS)))
-   return 0;
-#else
-   /*
-* Because 32-bit DMA masks are so common we expect every architecture
-* to be able to satisfy them - either by not supporting more physical
-* memory, or by providing a ZONE_DMA32.  If neither is the case, the
-* architecture needs to use an IOMMU instead of the direct mapping.
-*/
-   if (mask < phys_to_dma(dev, DMA_BIT_MASK(32)))
+   u64 min_mask;
+
+   if (IS_ENABLED(CONFIG_ZONE_DMA))
+   min_mask = DMA_BIT_MASK(ARCH_ZONE_DMA_BITS);
+   else
+   min_mask = min_t(u64, DMA_BIT_MASK(32),
+(max_pfn - 1) << PAGE_SHIFT);
+
+   if (mask >= phys_to_dma(dev, min_mask))
return 0;
-#endif
return 1;
 }
 
-- 
2.18.0



[PATCH 3/5] dma-direct: refine dma_direct_alloc zone selection

2018-09-20 Thread Christoph Hellwig
We need to take the DMA offset and encryption bit into account when
selecting a zone.  Use the opportunity to factor out the zone
selection into a helper for reuse.
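
To make the intent concrete, a simplified standalone sketch of the zone choice
once the mask has been translated back to a physical limit (constants and names
are illustrative, not the kernel's):

#include <stdint.h>

#define GFP_NONE	0u
#define GFP_DMA_DEMO	1u	/* stands in for GFP_DMA */
#define GFP_DMA32_DEMO	2u	/* stands in for GFP_DMA32 */

/*
 * phys_limit is the coherent mask converted back through dma_to_phys()
 * (or __dma_to_phys() when memory encryption forces the unencrypted view),
 * so a DMA offset can no longer skew the zone choice.
 */
static unsigned int optimal_gfp_sketch(uint64_t phys_limit,
				       unsigned int zone_dma_bits)
{
	if (phys_limit <= ((1ULL << zone_dma_bits) - 1))
		return GFP_DMA_DEMO;
	if (phys_limit <= 0xffffffffULL)
		return GFP_DMA32_DEMO;
	return GFP_NONE;
}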

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/direct.c | 31 +--
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 81b73a5bba54..3c404e33d946 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -68,6 +68,22 @@ u64 dma_direct_get_required_mask(struct device *dev)
return (1ULL << (fls64(max_dma) - 1)) * 2 - 1;
 }
 
+static gfp_t __dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
+   u64 *phys_mask)
+{
+   if (force_dma_unencrypted())
+   *phys_mask = __dma_to_phys(dev, dma_mask);
+   else
+   *phys_mask = dma_to_phys(dev, dma_mask);
+
+   /* GFP_DMA32 and GFP_DMA are no ops without the corresponding zones: */
+   if (*phys_mask <= DMA_BIT_MASK(ARCH_ZONE_DMA_BITS))
+   return GFP_DMA;
+   if (*phys_mask <= DMA_BIT_MASK(32))
+   return GFP_DMA32;
+   return 0;
+}
+
 static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
 {
return phys_to_dma_direct(dev, phys) + size - 1 <=
@@ -80,17 +96,13 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size,
unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
int page_order = get_order(size);
struct page *page = NULL;
+   u64 phys_mask;
void *ret;
 
/* we always manually zero the memory once we are done: */
gfp &= ~__GFP_ZERO;
-
-   /* GFP_DMA32 and GFP_DMA are no ops without the corresponding zones: */
-   if (dev->coherent_dma_mask <= DMA_BIT_MASK(ARCH_ZONE_DMA_BITS))
-   gfp |= GFP_DMA;
-   if (dev->coherent_dma_mask <= DMA_BIT_MASK(32) && !(gfp & GFP_DMA))
-   gfp |= GFP_DMA32;
-
+   gfp |= __dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
+   &phys_mask);
 again:
/* CMA can be used only in the context which permits sleeping */
if (gfpflags_allow_blocking(gfp)) {
@@ -109,15 +121,14 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size,
page = NULL;
 
if (IS_ENABLED(CONFIG_ZONE_DMA32) &&
-   dev->coherent_dma_mask < DMA_BIT_MASK(64) &&
+   phys_mask < DMA_BIT_MASK(64) &&
!(gfp & (GFP_DMA32 | GFP_DMA))) {
gfp |= GFP_DMA32;
goto again;
}
 
if (IS_ENABLED(CONFIG_ZONE_DMA) &&
-   dev->coherent_dma_mask < DMA_BIT_MASK(32) &&
-   !(gfp & GFP_DMA)) {
+   phys_mask < DMA_BIT_MASK(32) && !(gfp & GFP_DMA)) {
gfp = (gfp & ~GFP_DMA32) | GFP_DMA;
goto again;
}
-- 
2.18.0



[PATCH 1/5] dma-mapping: make the get_required_mask method available unconditionally

2018-09-20 Thread Christoph Hellwig
This saves some duplication for ia64, and makes the interface more
general.  In the long run we want each dma_map_ops instance to fill this
out, but this will take a little more prep work.
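
The resulting call path is roughly the following sketch (illustrative, not the
literal kernel code; the fallback helper name is made up):

u64 dma_get_required_mask_sketch(struct device *dev)
{
	const struct dma_map_ops *ops = get_dma_ops(dev);

	/* the method is consulted whenever the dma_map_ops instance provides it */
	if (ops && ops->get_required_mask)
		return ops->get_required_mask(dev);

	/* otherwise fall back to the old max_pfn-based default */
	return dma_default_required_mask(dev);	/* hypothetical helper */
}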

Signed-off-by: Christoph Hellwig 
---
 arch/ia64/include/asm/dma-mapping.h  |  2 --
 arch/ia64/include/asm/machvec.h  |  7 ---
 arch/ia64/include/asm/machvec_init.h |  1 -
 arch/ia64/include/asm/machvec_sn2.h  |  2 --
 arch/ia64/pci/pci.c  | 26 --
 arch/ia64/sn/pci/pci_dma.c   |  4 ++--
 drivers/base/platform.c  | 13 +++--
 drivers/pci/controller/vmd.c |  4 
 include/linux/dma-mapping.h  |  2 --
 9 files changed, 13 insertions(+), 48 deletions(-)

diff --git a/arch/ia64/include/asm/dma-mapping.h b/arch/ia64/include/asm/dma-mapping.h
index 76e4d6632d68..522745ae67bb 100644
--- a/arch/ia64/include/asm/dma-mapping.h
+++ b/arch/ia64/include/asm/dma-mapping.h
@@ -10,8 +10,6 @@
 #include 
 #include 
 
-#define ARCH_HAS_DMA_GET_REQUIRED_MASK
-
 extern const struct dma_map_ops *dma_ops;
 extern struct ia64_machine_vector ia64_mv;
 extern void set_iommu_machvec(void);
diff --git a/arch/ia64/include/asm/machvec.h b/arch/ia64/include/asm/machvec.h
index 267f4f170191..5133739966bc 100644
--- a/arch/ia64/include/asm/machvec.h
+++ b/arch/ia64/include/asm/machvec.h
@@ -44,7 +44,6 @@ typedef void ia64_mv_kernel_launch_event_t(void);
 
 /* DMA-mapping interface: */
 typedef void ia64_mv_dma_init (void);
-typedef u64 ia64_mv_dma_get_required_mask (struct device *);
 typedef const struct dma_map_ops *ia64_mv_dma_get_ops(struct device *);
 
 /*
@@ -127,7 +126,6 @@ extern void machvec_tlb_migrate_finish (struct mm_struct *);
 #  define platform_global_tlb_purgeia64_mv.global_tlb_purge
 #  define platform_tlb_migrate_finish  ia64_mv.tlb_migrate_finish
 #  define platform_dma_initia64_mv.dma_init
-#  define platform_dma_get_required_mask ia64_mv.dma_get_required_mask
 #  define platform_dma_get_ops ia64_mv.dma_get_ops
 #  define platform_irq_to_vector   ia64_mv.irq_to_vector
 #  define platform_local_vector_to_irq ia64_mv.local_vector_to_irq
@@ -171,7 +169,6 @@ struct ia64_machine_vector {
ia64_mv_global_tlb_purge_t *global_tlb_purge;
ia64_mv_tlb_migrate_finish_t *tlb_migrate_finish;
ia64_mv_dma_init *dma_init;
-   ia64_mv_dma_get_required_mask *dma_get_required_mask;
ia64_mv_dma_get_ops *dma_get_ops;
ia64_mv_irq_to_vector *irq_to_vector;
ia64_mv_local_vector_to_irq *local_vector_to_irq;
@@ -211,7 +208,6 @@ struct ia64_machine_vector {
platform_global_tlb_purge,  \
platform_tlb_migrate_finish,\
platform_dma_init,  \
-   platform_dma_get_required_mask, \
platform_dma_get_ops,   \
platform_irq_to_vector, \
platform_local_vector_to_irq,   \
@@ -286,9 +282,6 @@ extern const struct dma_map_ops *dma_get_ops(struct device 
*);
 #ifndef platform_dma_get_ops
 # define platform_dma_get_ops  dma_get_ops
 #endif
-#ifndef platform_dma_get_required_mask
-# define  platform_dma_get_required_mask   ia64_dma_get_required_mask
-#endif
 #ifndef platform_irq_to_vector
 # define platform_irq_to_vector__ia64_irq_to_vector
 #endif
diff --git a/arch/ia64/include/asm/machvec_init.h b/arch/ia64/include/asm/machvec_init.h
index 2b32fd06b7c6..2aafb69a3787 100644
--- a/arch/ia64/include/asm/machvec_init.h
+++ b/arch/ia64/include/asm/machvec_init.h
@@ -4,7 +4,6 @@
 
 extern ia64_mv_send_ipi_t ia64_send_ipi;
 extern ia64_mv_global_tlb_purge_t ia64_global_tlb_purge;
-extern ia64_mv_dma_get_required_mask ia64_dma_get_required_mask;
 extern ia64_mv_irq_to_vector __ia64_irq_to_vector;
 extern ia64_mv_local_vector_to_irq __ia64_local_vector_to_irq;
 extern ia64_mv_pci_get_legacy_mem_t ia64_pci_get_legacy_mem;
diff --git a/arch/ia64/include/asm/machvec_sn2.h b/arch/ia64/include/asm/machvec_sn2.h
index ece9fa85be88..b5153d300289 100644
--- a/arch/ia64/include/asm/machvec_sn2.h
+++ b/arch/ia64/include/asm/machvec_sn2.h
@@ -55,7 +55,6 @@ extern ia64_mv_readb_t __sn_readb_relaxed;
 extern ia64_mv_readw_t __sn_readw_relaxed;
 extern ia64_mv_readl_t __sn_readl_relaxed;
 extern ia64_mv_readq_t __sn_readq_relaxed;
-extern ia64_mv_dma_get_required_mask   sn_dma_get_required_mask;
 extern ia64_mv_dma_initsn_dma_init;
 extern ia64_mv_migrate_t   sn_migrate;
 extern ia64_mv_kernel_launch_event_t   sn_kernel_launch_event;
@@ -100,7 +99,6 @@ extern ia64_mv_pci_fixup_bus_t sn_pci_fixup_bus;
 #define platform_pci_get_legacy_memsn_pci_get_legacy_mem
 #define platform_pci_legacy_read   sn_pci_legacy_read
 #define platform_pci_legacy_write  sn_pci_legacy_write
-#define platform_dma_get_required_mask sn_dma_get_required_mask
 #define platform_dma_init  sn_dma_init
 #define 

dma mask related fixups (including full bus_dma_mask support)

2018-09-20 Thread Christoph Hellwig
Hi all,

the dma_get_required_mask DMA API implementation has always been a little
odd, in that by default we don't wire it up in struct dma_map_ops, but
instead hard code a default implementation.  powerpc and ia64 override
this default and either call a method or otherwise duplicate the default.

This series always enables the method and just falls back to the previous
default implementation when it is not available, as well as fixing up
a few bits in the default implementations.  This already allows removing
the ia64 override of the implementation, and will also allow removing
the powerpc one together with a few additional cleanups in the powerpc
code, but those will be sent separately with other powerpc DMA API
patches.  Last but not least the method will allow us to return a more
sensible value for typical iommu dma_ops eventually, but that is left
to another series as well.

Additionally the dma-direct code has been a bit sloppy about when it was
using phys_to_dma in a few places, so this gets fixed up, and the
bus_dma_mask is now actually respected everywhere instead of just being
used to reject dma masks that don't fit into it.

Altogether this should be all core dma-direct changes required to
move powerpc over to the generic code.


Re: [PATCH 1/3] xtensa: remove partial support for DMA buffers in high memory

2018-09-20 Thread Christoph Hellwig
On Thu, Sep 20, 2018 at 10:44:55AM -0700, Max Filippov wrote:
> Hi Christoph,
> 
> On Thu, Sep 20, 2018 at 10:15 AM, Christoph Hellwig  wrote:
> > This reverts commit 6137e4166004e2ec383ac05d5ca15831f4668806.
> >
> > We explicitly clear GFP_HIGHMEM from the allowed dma flags at the beginning
> > of the function (and the generic dma_alloc_attr function calling us does the
> > same!), so this code just adds dead wood.
> 
> No, not really: dma_alloc_from_contiguous does not accept flags (only
> no_warn bit)
> and may return arbitrary pages. That's the case that this code is handling.

dma_alloc_from_contiguous calls cma_alloc to do the actual
allocation, and that uses alloc_contig_range with the GFP_KERNEL
flag.  How do you end up getting highmem pages from it?


Re: [PATCH 1/3] xtensa: remove partial support for DMA buffers in high memory

2018-09-20 Thread Max Filippov
Hi Christoph,

On Thu, Sep 20, 2018 at 10:15 AM, Christoph Hellwig  wrote:
> This reverts commit 6137e4166004e2ec383ac05d5ca15831f4668806.
>
> We explicitly clear GFP_HIGHMEM from the allowed dma flags at the beginning
> of the function (and the generic dma_alloc_attr function calling us does the
> same!), so this code just adds dead wood.

No, not really: dma_alloc_from_contiguous does not accept flags (only
no_warn bit)
and may return arbitrary pages. That's the case that this code is handling.

-- 
Thanks.
-- Max


Re: [PATCH 04/13] iommu/io-pgtable-arm-v7s: Extend MediaTek 4GB mode

2018-09-20 Thread Robin Murphy

On 03/09/18 07:01, Yong Wu wrote:

MediaTek extends the ARM v7s descriptor to support dram over 4GB.

In the mt2712 and mt8173, this is called "4GB mode": the physical address
is from 0x4000_ to 0x1_3fff_, but from the EMI point of view it
is remapped to a high address from 0x1__ to 0x1__, so
bit32 is always enabled. Thus, in the M4U, we always enable bit9
for all PTEs, which means enabling bit32 of the physical address.

But in mt8183, the M4U supports dram from 0x4000_ to 0x3__,
which isn't remapped. We extend the PTEs: bit9 represents bit32 of
the PA and bit4 represents bit33 of the PA. Meanwhile the iova is still
32 bits.

In order to unify the code, in the "4GB mode" we add bit32 to the
physical address manually in our driver.

Correspondingly, adding bit32 and bit33 to the PA in iova_to_phys
has to be moved into v7s.

Signed-off-by: Yong Wu 
---
In mt8183, the PA is from 0x4000_ to 0x3__ while the iova
is still 32 bits. Actually, our HW extends the v7s pgtable: currently
the lvl1 pgtable is 16KB and our HW doubles it, but a 32-bit iova is
enough for us currently, thus we don't change it.
---
  drivers/iommu/io-pgtable-arm-v7s.c | 38 ++
  drivers/iommu/io-pgtable.h |  8 
  drivers/iommu/mtk_iommu.c  | 15 +--
  drivers/iommu/mtk_iommu.h  |  1 +
  4 files changed, 44 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
index b5948ba..47538dd 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -124,7 +124,9 @@
  #define ARM_V7S_TEX_MASK  0x7
  #define ARM_V7S_ATTR_TEX(val)	(((val) & ARM_V7S_TEX_MASK) << ARM_V7S_TEX_SHIFT)
  
-#define ARM_V7S_ATTR_MTK_4GB		BIT(9) /* MTK extend it for 4GB mode */

+/* MTK extend the two bits below for over 4GB mode */
+#define ARM_V7S_ATTR_MTK_PA_BIT32  BIT(9)
+#define ARM_V7S_ATTR_MTK_PA_BIT33  BIT(4)
  
  /* *well, except for TEX on level 2 large pages, of course :( */

  #define ARM_V7S_CONT_PAGE_TEX_SHIFT   6
@@ -268,7 +270,8 @@ static void __arm_v7s_set_pte(arm_v7s_iopte *ptep, arm_v7s_iopte pte,
  }
  
  static arm_v7s_iopte arm_v7s_prot_to_pte(int prot, int lvl,

-struct io_pgtable_cfg *cfg)
+struct io_pgtable_cfg *cfg,
+phys_addr_t paddr) /* Only for MTK */


I'd rather keep this function dedicated to just generating the 
permissions and attributes. If necessary I'm quite happy to add 
additional helpers for getting/setting the address, much like LPAE now 
has for handling 52-bit IOVAs.
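
Something along these lines is presumably what's meant, i.e. a helper that
folds the PA bits into the PTE outside of the prot/attr path (names are
illustrative; only the two MTK attribute bits come from the patch):

static arm_v7s_iopte to_mtk_iopte_sketch(phys_addr_t paddr, arm_v7s_iopte pte)
{
	/* encode PA[32] and PA[33] into the MediaTek-specific descriptor bits */
	if (paddr & BIT_ULL(32))
		pte |= ARM_V7S_ATTR_MTK_PA_BIT32;
	if (paddr & BIT_ULL(33))
		pte |= ARM_V7S_ATTR_MTK_PA_BIT33;
	return pte;
}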



  {
bool ap = !(cfg->quirks & IO_PGTABLE_QUIRK_NO_PERMS);
arm_v7s_iopte pte = ARM_V7S_ATTR_NG | ARM_V7S_ATTR_S;
@@ -295,8 +298,12 @@ static arm_v7s_iopte arm_v7s_prot_to_pte(int prot, int lvl,
if (lvl == 1 && (cfg->quirks & IO_PGTABLE_QUIRK_ARM_NS))
pte |= ARM_V7S_ATTR_NS_SECTION;
  
-	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_4GB)

-   pte |= ARM_V7S_ATTR_MTK_4GB;
+   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_4GB) {
+   if (paddr & BIT_ULL(32))
+   pte |= ARM_V7S_ATTR_MTK_PA_BIT32;
+   if (paddr & BIT_ULL(33))
+   pte |= ARM_V7S_ATTR_MTK_PA_BIT33;
+   }
  
  	return pte;

  }
@@ -392,7 +399,7 @@ static int arm_v7s_init_pte(struct arm_v7s_io_pgtable *data,
return -EEXIST;
}
  
-	pte = arm_v7s_prot_to_pte(prot, lvl, cfg);

+   pte = arm_v7s_prot_to_pte(prot, lvl, cfg, paddr);
if (num_entries > 1)
pte = arm_v7s_pte_to_cont(pte, lvl);
  
@@ -484,7 +491,11 @@ static int arm_v7s_map(struct io_pgtable_ops *ops, unsigned long iova,

if (!(prot & (IOMMU_READ | IOMMU_WRITE)))
return 0;
  
-	if (WARN_ON(upper_32_bits(iova) || upper_32_bits(paddr)))

+   if (WARN_ON(upper_32_bits(iova)))
+   return -ERANGE;
+
+   if (WARN_ON(upper_32_bits(paddr) &&
+   !(iop->cfg.quirks & IO_PGTABLE_QUIRK_ARM_MTK_4GB)))


TBH I'd just cram the quirk check into the right-hand-side of the || 
rather than introduce a separate if() - there's already too many 
parentheses to read the whole thing easily ;)



return -ERANGE;
  
  	ret = __arm_v7s_map(data, iova, paddr, size, prot, 1, data->pgd);

@@ -563,7 +574,7 @@ static size_t arm_v7s_split_blk_unmap(struct arm_v7s_io_pgtable *data,
num_entries = size >> ARM_V7S_LVL_SHIFT(2);
unmap_idx = ARM_V7S_LVL_IDX(iova, 2);
  
-	pte = arm_v7s_prot_to_pte(arm_v7s_pte_to_prot(blk_pte, 1), 2, cfg);

+   pte = arm_v7s_prot_to_pte(arm_v7s_pte_to_prot(blk_pte, 1), 2, cfg, 0);
if (num_entries > 1)
pte = arm_v7s_pte_to_cont(pte, 2);
  
@@ -677,7 +688,9 @@ static phys_addr_t arm_v7s_iova_to_phys(struct 

[PATCH v3 08/10] iommu/iopf: Handle mm faults

2018-09-20 Thread Jean-Philippe Brucker
When a recoverable page fault is handled by the fault workqueue, find the
associated mm and call handle_mm_fault.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/io-pgfault.c | 86 +-
 1 file changed, 84 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 29aa8c6ba459..f6d9f40b879b 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -7,6 +7,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -65,8 +66,65 @@ static int iopf_complete(struct device *dev, struct iommu_fault_event *evt,
 static enum page_response_code
 iopf_handle_single(struct iopf_fault *fault)
 {
-   /* TODO */
-   return -ENODEV;
+   vm_fault_t ret;
+   struct mm_struct *mm;
+   struct vm_area_struct *vma;
+   unsigned int access_flags = 0;
+   unsigned int fault_flags = FAULT_FLAG_REMOTE;
+   struct iommu_fault_event *evt = &fault->evt;
+   enum page_response_code status = IOMMU_PAGE_RESP_INVALID;
+
+   if (!evt->pasid_valid)
+   return status;
+
+   mm = iommu_sva_find(evt->pasid);
+   if (!mm)
+   return status;
+
+   down_read(&mm->mmap_sem);
+
+   vma = find_extend_vma(mm, evt->addr);
+   if (!vma)
+   /* Unmapped area */
+   goto out_put_mm;
+
+   if (evt->prot & IOMMU_FAULT_READ)
+   access_flags |= VM_READ;
+
+   if (evt->prot & IOMMU_FAULT_WRITE) {
+   access_flags |= VM_WRITE;
+   fault_flags |= FAULT_FLAG_WRITE;
+   }
+
+   if (evt->prot & IOMMU_FAULT_EXEC) {
+   access_flags |= VM_EXEC;
+   fault_flags |= FAULT_FLAG_INSTRUCTION;
+   }
+
+   if (!(evt->prot & IOMMU_FAULT_PRIV))
+   fault_flags |= FAULT_FLAG_USER;
+
+   if (access_flags & ~vma->vm_flags)
+   /* Access fault */
+   goto out_put_mm;
+
+   ret = handle_mm_fault(vma, evt->addr, fault_flags);
+   status = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID :
+   IOMMU_PAGE_RESP_SUCCESS;
+
+out_put_mm:
+   up_read(&mm->mmap_sem);
+
+   /*
+* If the process exits while we're handling the fault on its mm, we
+* can't do mmput(). exit_mmap() would release the MMU notifier, calling
+* iommu_notifier_release(), which has to flush the fault queue that
+* we're executing on... So mmput_async() moves the release of the mm to
+* another thread, if we're the last user.
+*/
+   mmput_async(mm);
+
+   return status;
 }
 
 static void iopf_handle_group(struct work_struct *work)
@@ -100,6 +158,30 @@ static void iopf_handle_group(struct work_struct *work)
  * @cookie: struct device, passed to iommu_register_device_fault_handler.
  *
  * Add a fault to the device workqueue, to be handled by mm.
+ *
+ * This module doesn't handle PCI PASID Stop Marker; IOMMU drivers must discard
+ * them before reporting faults. A PASID Stop Marker (LRW = 0b100) doesn't
+ * expect a response. It may be generated when disabling a PASID (issuing a
+ * PASID stop request) by some PCI devices.
+ *
+ * The PASID stop request is triggered by the mm_exit() callback. When the
+ * callback returns from the device driver, no page request is generated for
+ * this PASID anymore and outstanding ones have been pushed to the IOMMU (as per
+ * PCIe 4.0r1.0 - 6.20.1 and 10.4.1.2 - Managing PASID TLP Prefix Usage). Some
+ * PCI devices will wait for all outstanding page requests to come back with a
+ * response before completing the PASID stop request. Others do not wait for
+ * page responses, and instead issue this Stop Marker that tells us when the
+ * PASID can be reallocated.
+ *
+ * It is safe to discard the Stop Marker because it is an optimization.
+ * a. Page requests, which are posted requests, have been flushed to the IOMMU
+ *when mm_exit() returns,
+ * b. We flush all fault queues after mm_exit() returns and before freeing the
+ *PASID.
+ *
+ * So even though the Stop Marker might be issued by the device *after* the stop
+ * request completes, outstanding faults will have been dealt with by the time
+ * we free the PASID.
  */
 int iommu_queue_iopf(struct iommu_fault_event *evt, void *cookie)
 {
-- 
2.18.0



[RFC PATCH v3 10/10] iommu/sva: Add support for private PASIDs

2018-09-20 Thread Jean-Philippe Brucker
Provide an API for allocating PASIDs and populating them manually. To ease
cleanup and factor allocation code, reuse the io_mm structure for private
PASID. Private io_mm has a NULL mm_struct pointer, and cannot be bound to
multiple devices. The mm_alloc() IOMMU op must now check if the mm
argument is NULL, in which case it should allocate io_pgtables instead of
binding to an mm.

Signed-off-by: Jordan Crouse 
Signed-off-by: Jean-Philippe Brucker 
---
Sadly this probably won't be the final thing. The API in this patch is
used like this:

iommu_sva_alloc_pasid(dev, &io_mm) -> PASID
iommu_sva_map(io_mm, ...)
iommu_sva_unmap(io_mm, ...)
iommu_sva_free_pasid(dev, io_mm)

The proposed API for auxiliary domains is in an early stage but might
replace this patch and could be used like this:

iommu_enable_aux_domain(dev)
d = iommu_domain_alloc()
iommu_attach_aux(dev, d)
iommu_aux_id(d) -> PASID
iommu_map(d, ...)
iommu_unmap(d, ...)
iommu_detach_aux(dev, d)
iommu_domain_free(d)

The advantage being that the driver doesn't have to use a special
version of map/unmap/etc.
---
 drivers/iommu/iommu-sva.c | 209 ++
 drivers/iommu/iommu.c |  51 ++
 include/linux/iommu.h | 112 +++-
 3 files changed, 331 insertions(+), 41 deletions(-)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 1588a523a214..029776f64e7d 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -15,11 +15,11 @@
 /**
  * DOC: io_mm model
  *
- * The io_mm keeps track of process address spaces shared between CPU and IOMMU.
- * The following example illustrates the relation between structures
- * iommu_domain, io_mm and iommu_bond. An iommu_bond is a link between io_mm and
- * device. A device can have multiple io_mm and an io_mm may be bound to
- * multiple devices.
+ * When used with the bind()/unbind() functions, the io_mm keeps track of
+ * process address spaces shared between CPU and IOMMU. The following example
+ * illustrates the relation between structures iommu_domain, io_mm and
+ * iommu_bond. An iommu_bond is a link between io_mm and device. A device can
+ * have multiple io_mm and an io_mm may be bound to multiple devices.
  *  ___
  * |  IOMMU domain A   |
  * |   |
@@ -98,6 +98,12 @@
  * the first entry points to the io_pgtable pointer. In other IOMMUs the
  * io_pgtable pointer is held in the device table and PASID #0 is available to
  * the allocator.
+ *
+ * The io_mm can also represent a private IOMMU address space, which isn't
+ * shared with a process. The device driver calls iommu_sva_alloc_pasid which
+ * returns an io_mm that can be populated with the iommu_sva_map/unmap
+ * functions. The principle is the same as shared io_mm, except that a private
+ * io_mm cannot be bound to multiple devices.
  */
 
 struct iommu_bond {
@@ -131,6 +137,9 @@ static DEFINE_SPINLOCK(iommu_sva_lock);
 
 static struct mmu_notifier_ops iommu_mmu_notifier;
 
+#define io_mm_is_private(io_mm) ((io_mm) != NULL && (io_mm)->mm == NULL)
+#define io_mm_is_shared(io_mm) ((io_mm) != NULL && (io_mm)->mm != NULL)
+
 static struct io_mm *
 io_mm_alloc(struct iommu_domain *domain, struct device *dev,
struct mm_struct *mm, unsigned long flags)
@@ -149,19 +158,10 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev,
if (!io_mm)
return ERR_PTR(-ENOMEM);
 
-   /*
-* The mm must not be freed until after the driver frees the io_mm
-* (which may involve unpinning the CPU ASID for instance, requiring a
-* valid mm struct.)
-*/
-   mmgrab(mm);
-
io_mm->flags= flags;
io_mm->mm   = mm;
-   io_mm->notifier.ops = &iommu_mmu_notifier;
io_mm->release  = domain->ops->mm_free;
INIT_LIST_HEAD(&io_mm->devices);
-   /* Leave kref as zero until the io_mm is fully initialized */
 
idr_preload(GFP_KERNEL);
spin_lock(_sva_lock);
@@ -176,6 +176,32 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev,
goto err_free_mm;
}
 
+   return io_mm;
+
+err_free_mm:
+   io_mm->release(io_mm);
+   return ERR_PTR(ret);
+}
+
+static struct io_mm *
+io_mm_alloc_shared(struct iommu_domain *domain, struct device *dev,
+  struct mm_struct *mm, unsigned long flags)
+{
+   int ret;
+   struct io_mm *io_mm;
+
+   io_mm = io_mm_alloc(domain, dev, mm, flags);
+   if (IS_ERR(io_mm))
+   return io_mm;
+
+   /*
+* The mm must not be freed until after the driver frees the io_mm
+* (which may involve unpinning the CPU ASID for instance, requiring a
+* valid mm struct.)
+*/
+   mmgrab(mm);
+
+   io_mm->notifier.ops = &iommu_mmu_notifier;

[PATCH v3 09/10] iommu/sva: Register page fault handler

2018-09-20 Thread Jean-Philippe Brucker
Let users call iommu_sva_init_device() with the IOMMU_SVA_FEAT_IOPF flag,
which enables the I/O Page Fault queue. The IOMMU driver checks whether the
device supports a form of page fault, in which case it adds the device to
a fault queue. If the device doesn't support page faults, the IOMMU driver
aborts iommu_sva_init_device().

The fault queue must be flushed before any io_mm is freed, to make sure
that its PASID isn't used in any fault queue, and can be reallocated.
Add iopf_queue_flush() calls in a few strategic locations.
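
For context, the driver-facing side would look roughly like this (driver name,
helper and PASID range are invented; the function signature is the one
introduced earlier in this series):

static int my_drv_enable_sva(struct device *dev)
{
	/* 1..0xfffff is an illustrative PASID range, not from the patch */
	int ret = iommu_sva_init_device(dev, IOMMU_SVA_FEAT_IOPF,
					1, 0xfffff, my_drv_mm_exit);
	if (ret)
		return ret;	/* no PRI/stall support on this device */

	return 0;
}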

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/iommu-sva.c | 26 +-
 drivers/iommu/iommu.c |  6 +++---
 include/linux/iommu.h |  2 ++
 3 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index ee86f00ee1b9..1588a523a214 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -443,6 +443,8 @@ static void iommu_notifier_release(struct mmu_notifier *mn, struct mm_struct *mm
dev_WARN(bond->dev, "possible leak of PASID %u",
 io_mm->pasid);
 
+   iopf_queue_flush_dev(bond->dev, io_mm->pasid);
+
spin_lock(&iommu_sva_lock);
next = list_next_entry(bond, mm_head);
 
@@ -590,6 +592,12 @@ int __iommu_sva_unbind_device(struct device *dev, int pasid)
goto out_unlock;
}
 
+   /*
+* Caller stopped the device from issuing PASIDs, now make sure they are
+* out of the fault queue.
+*/
+   iopf_queue_flush_dev(dev, pasid);
+
/* spin_lock_irq matches the one in wait_event_lock_irq */
spin_lock_irq(&iommu_sva_lock);
list_for_each_entry(bond, &param->mm_list, dev_head) {
@@ -615,6 +623,8 @@ static void __iommu_sva_unbind_device_all(struct device *dev)
if (!param)
return;
 
+   iopf_queue_flush_dev(dev, IOMMU_PASID_INVALID);
+
spin_lock_irq(&iommu_sva_lock);
list_for_each_entry_safe(bond, next, &param->mm_list, dev_head)
io_mm_detach_locked(bond, true);
@@ -680,6 +690,9 @@ EXPORT_SYMBOL_GPL(iommu_sva_find);
  * overrides it. Similarly, @min_pasid overrides the lower PASID limit supported
  * by the IOMMU.
  *
+ * If the device should support recoverable I/O Page Faults (e.g. PCI PRI), the
+ * IOMMU_SVA_FEAT_IOPF feature must be requested.
+ *
  * @mm_exit is called when an address space bound to the device is about to be
  * torn down by exit_mmap. After @mm_exit returns, the device must not issue any
  * more transaction with the PASID given as argument. The handler gets an opaque
@@ -707,7 +720,7 @@ int iommu_sva_init_device(struct device *dev, unsigned long features,
if (!domain || !domain->ops->sva_init_device)
return -ENODEV;
 
-   if (features)
+   if (features & ~IOMMU_SVA_FEAT_IOPF)
return -EINVAL;
 
param = kzalloc(sizeof(*param), GFP_KERNEL);
@@ -734,10 +747,20 @@ int iommu_sva_init_device(struct device *dev, unsigned long features,
if (ret)
goto err_unlock;
 
+   if (features & IOMMU_SVA_FEAT_IOPF) {
+   ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf,
+ dev);
+   if (ret)
+   goto err_shutdown;
+   }
+
dev->iommu_param->sva_param = param;
mutex_unlock(&dev->iommu_param->sva_lock);
return 0;
 
+err_shutdown:
+   if (domain->ops->sva_shutdown_device)
+   domain->ops->sva_shutdown_device(dev);
 err_unlock:
mutex_unlock(&dev->iommu_param->sva_lock);
kfree(param);
@@ -766,6 +789,7 @@ void iommu_sva_shutdown_device(struct device *dev)
goto out_unlock;
 
__iommu_sva_unbind_device_all(dev);
+   iommu_unregister_device_fault_handler(dev);
 
if (domain->ops->sva_shutdown_device)
domain->ops->sva_shutdown_device(dev);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7113fe398b70..b493f5c4fe64 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2342,9 +2342,9 @@ EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
  * iommu_sva_init_device() must be called first, to initialize the required SVA
  * features. @flags must be a subset of these features.
  *
- * The caller must pin down using get_user_pages*() all mappings shared with the
- * device. mlock() isn't sufficient, as it doesn't prevent minor page faults
- * (e.g. copy-on-write).
+ * If IOMMU_SVA_FEAT_IOPF isn't requested, the caller must pin down using
+ * get_user_pages*() all mappings shared with the device. mlock() isn't
+ * sufficient, as it doesn't prevent minor page faults (e.g. copy-on-write).
  *
  * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an error
  * is returned.
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b7cd00ae7358..ad2b18883ae2 100644
--- 

[PATCH v3 05/10] iommu/sva: Track mm changes with an MMU notifier

2018-09-20 Thread Jean-Philippe Brucker
When creating an io_mm structure, register an MMU notifier that informs
us when the virtual address space changes and disappears.

Add a new operation to the IOMMU driver, mm_invalidate, called when a
range of addresses is unmapped to let the IOMMU driver send ATC
invalidations. mm_invalidate cannot sleep.

Adding the notifier complicates io_mm release. In one case device
drivers free the io_mm explicitly by calling unbind (or detaching the
device from its domain). In the other case the process could crash
before unbind, in which case the release notifier has to do all the
work.
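
Roughly, the notifier's invalidate path fans the unmapped range out to every
bond on the io_mm and calls the new op, along the lines of this sketch (field
and callback names are approximations of the patch, not verbatim):

static void iommu_notifier_invalidate_range_sketch(struct mmu_notifier *mn,
						   struct mm_struct *mm,
						   unsigned long start,
						   unsigned long end)
{
	struct iommu_bond *bond;
	struct io_mm *io_mm = container_of(mn, struct io_mm, notifier);

	/* cannot sleep: called from the core mm with locks held */
	list_for_each_entry(bond, &io_mm->devices, mm_head)
		bond->domain->ops->mm_invalidate(bond->domain, bond->dev,
						 io_mm, start, end - start);
}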

Signed-off-by: Jean-Philippe Brucker 
---
v2->v3: Add MMU_INVALIDATE_DOES_NOT_BLOCK flag to MMU notifier

In v2 Christian pointed out that letting mm_exit() linger for too long
(some devices could need minutes to flush a PASID context) might force
the OOM killer to kill additional tasks, for example if the victim has
mlocked all its memory, which the reaper thread cannot clean up.

If this turns out to be problematic to users, we might need to add some
complexity in IOMMU drivers in order to disable PASIDs and return to
exit_mmap() while DMA is still running. While invasive on the IOMMU
side, such change might not require modification of device drivers or
the API, since iommu_notifier_release() could simply schedule a call to
their mm_exit() instead of calling it synchronously. So we can tune this
behavior in a later series.

Note that some steps cannot be skipped: the ATC invalidation, which may
take up to a minute according to the PCI spec, must be done from the MMU
notifier context. The PCI stop PASID mechanism is an implicit ATC
invalidation, but if we postpone it then we'll have to perform an
explicit one.
---
 drivers/iommu/Kconfig |   1 +
 drivers/iommu/iommu-sva.c | 246 +++---
 include/linux/iommu.h |  10 ++
 3 files changed, 240 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 884580401919..88d6c68284f3 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -98,6 +98,7 @@ config IOMMU_DMA
 config IOMMU_SVA
bool
select IOMMU_API
+   select MMU_NOTIFIER
 
 config FSL_PAMU
bool "Freescale IOMMU support"
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 08da479dad68..5ff8967cb213 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -7,6 +7,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -107,6 +108,9 @@ struct iommu_bond {
struct list_headmm_head;
struct list_headdev_head;
struct list_headdomain_head;
+   refcount_t  refs;
+   struct wait_queue_head  mm_exit_wq;
+   boolmm_exit_active;
 
void*drvdata;
 };
@@ -125,6 +129,8 @@ static DEFINE_IDR(iommu_pasid_idr);
  */
 static DEFINE_SPINLOCK(iommu_sva_lock);
 
+static struct mmu_notifier_ops iommu_mmu_notifier;
+
 static struct io_mm *
 io_mm_alloc(struct iommu_domain *domain, struct device *dev,
struct mm_struct *mm, unsigned long flags)
@@ -152,6 +158,7 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev,
 
io_mm->flags= flags;
io_mm->mm   = mm;
+   io_mm->notifier.ops = &iommu_mmu_notifier;
io_mm->release  = domain->ops->mm_free;
INIT_LIST_HEAD(&io_mm->devices);
/* Leave kref as zero until the io_mm is fully initialized */
@@ -169,8 +176,29 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev,
goto err_free_mm;
}
 
-   /* TODO: keep track of mm. For the moment, abort. */
-   ret = -ENOSYS;
+   ret = mmu_notifier_register(&io_mm->notifier, mm);
+   if (ret)
+   goto err_free_pasid;
+
+   /*
+* Now that the MMU notifier is valid, we can allow users to grab this
+* io_mm by setting a valid refcount. Before that it was accessible in
+* the IDR but invalid.
+*
+* The following barrier ensures that users, who obtain the io_mm with
+* kref_get_unless_zero, don't read uninitialized fields in the
+* structure.
+*/
+   smp_wmb();
+   kref_init(&io_mm->kref);
+
+   return io_mm;
+
+err_free_pasid:
+   /*
+* Even if the io_mm is accessible from the IDR at this point, kref is
+* 0 so no user could get a reference to it. Free it manually.
+*/
spin_lock(&iommu_sva_lock);
idr_remove(&iommu_pasid_idr, io_mm->pasid);
spin_unlock(&iommu_sva_lock);
@@ -182,9 +210,13 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev,
return ERR_PTR(ret);
 }
 
-static void io_mm_free(struct io_mm *io_mm)
+static void io_mm_free(struct rcu_head *rcu)
 {
-   struct mm_struct *mm = io_mm->mm;
+   struct io_mm *io_mm;
+   struct mm_struct *mm;
+
+   io_mm = container_of(rcu, struct io_mm, rcu);
+   mm = 

[PATCH v3 07/10] iommu: Add a page fault handler

2018-09-20 Thread Jean-Philippe Brucker
Some systems allow devices to handle I/O Page Faults in the core mm. For
example systems implementing the PCI PRI extension or Arm SMMU stall
model. Infrastructure for reporting these recoverable page faults was
recently added to the IOMMU core for SVA virtualisation. Add a page fault
handler for host SVA.

IOMMU drivers can now instantiate several fault workqueues and link them to
IOPF-capable devices. Drivers can choose between a single global
workqueue, one per IOMMU device, one per low-level fault queue, one per
domain, etc.

When it receives a fault event, supposedly in an IRQ handler, the IOMMU
driver reports the fault using iommu_report_device_fault(), which calls
the registered handler. The page fault handler then calls the mm fault
handler, and reports either success or failure with iommu_page_response().
When the handler succeeds, the IOMMU retries the access.

The iopf_param pointer could be embedded into iommu_fault_param. But
putting iopf_param into the iommu_param structure allows us not to care
about ordering between calls to iopf_queue_add_device() and
iommu_register_device_fault_handler().
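
A sketch of the reporting path described above, as seen from an IOMMU driver's
PRI/stall IRQ handler (illustrative; the field names are the ones used by the
fault-event structure elsewhere in this series):

static void report_recoverable_fault(struct device *dev, u64 iova,
				     int pasid, int grpid)
{
	struct iommu_fault_event evt = {
		.addr			= iova,
		.pasid			= pasid,
		.pasid_valid		= true,
		.page_req_group_id	= grpid,
		.prot			= IOMMU_FAULT_READ,
	};

	/*
	 * Hands the fault to the handler registered with
	 * iommu_register_device_fault_handler() (iommu_queue_iopf() here),
	 * which calls handle_mm_fault() from the workqueue and answers
	 * with iommu_page_response().
	 */
	iommu_report_device_fault(dev, &evt);
}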

Signed-off-by: Jean-Philippe Brucker 
---
v2->v3:
* queue_flush now removes pending partial faults
* queue_flush now takes an optional PASID argument, allowing IOMMU
  drivers to selectively flush faults if possible
* remove PAGE_RESP_HANDLED/PAGE_RESP_CONTINUE
* rename iopf_context -> iopf_fault
---
 drivers/iommu/Kconfig  |   4 +
 drivers/iommu/Makefile |   1 +
 drivers/iommu/io-pgfault.c | 382 +
 include/linux/iommu.h  |  56 +-
 4 files changed, 442 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/io-pgfault.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 88d6c68284f3..27ead980 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -100,6 +100,10 @@ config IOMMU_SVA
select IOMMU_API
select MMU_NOTIFIER
 
+config IOMMU_PAGE_FAULT
+   bool
+   select IOMMU_API
+
 config FSL_PAMU
bool "Freescale IOMMU support"
depends on PCI
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 7d6332be5f0e..1c4b0be5d44b 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -5,6 +5,7 @@ obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
 obj-$(CONFIG_IOMMU_DEBUGFS) += iommu-debugfs.o
 obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
+obj-$(CONFIG_IOMMU_PAGE_FAULT) += io-pgfault.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
new file mode 100644
index ..29aa8c6ba459
--- /dev/null
+++ b/drivers/iommu/io-pgfault.c
@@ -0,0 +1,382 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Handle device page faults
+ *
+ * Copyright (C) 2018 ARM Ltd.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * struct iopf_queue - IO Page Fault queue
+ * @wq: the fault workqueue
+ * @flush: low-level flush callback
+ * @flush_arg: flush() argument
+ * @refs: references to this structure taken by producers
+ */
+struct iopf_queue {
+   struct workqueue_struct *wq;
+   iopf_queue_flush_t  flush;
+   void*flush_arg;
+   refcount_t  refs;
+};
+
+/**
+ * struct iopf_device_param - IO Page Fault data attached to a device
+ * @queue: IOPF queue
+ * @partial: faults that are part of a Page Request Group for which the last
+ *   request hasn't been submitted yet.
+ */
+struct iopf_device_param {
+   struct iopf_queue   *queue;
+   struct list_headpartial;
+};
+
+struct iopf_fault {
+   struct iommu_fault_eventevt;
+   struct list_headhead;
+};
+
+struct iopf_group {
+   struct iopf_fault   last_fault;
+   struct list_headfaults;
+   struct work_struct  work;
+   struct device   *dev;
+};
+
+static int iopf_complete(struct device *dev, struct iommu_fault_event *evt,
+enum page_response_code status)
+{
+   struct page_response_msg resp = {
+   .addr   = evt->addr,
+   .pasid  = evt->pasid,
+   .pasid_present  = evt->pasid_valid,
+   .page_req_group_id  = evt->page_req_group_id,
+   .private_data   = evt->iommu_private,
+   .resp_code  = status,
+   };
+
+   return iommu_page_response(dev, &resp);
+}
+
+static enum page_response_code
+iopf_handle_single(struct iopf_fault *fault)
+{
+   /* TODO */
+   return -ENODEV;
+}
+
+static void iopf_handle_group(struct work_struct *work)
+{
+   struct iopf_group *group;
+   struct iopf_fault *fault, *next;
+   

[PATCH v3 06/10] iommu/sva: Search mm by PASID

2018-09-20 Thread Jean-Philippe Brucker
The fault handler will need to find an mm given its PASID. This is the
reason we have an IDR for storing address spaces, so hook it up.
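
Typical use from a fault handler then looks like the following sketch
(mirroring what the next patch does; the actual fault-in step is elided):

static void handle_one_fault_sketch(int pasid)
{
	struct mm_struct *mm = iommu_sva_find(pasid);

	if (!mm)
		return;		/* PASID no longer (or never) bound */

	down_read(&mm->mmap_sem);
	/* find_extend_vma() + handle_mm_fault() would go here */
	up_read(&mm->mmap_sem);

	mmput(mm);		/* drop the reference taken by iommu_sva_find() */
}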

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/iommu-sva.c | 26 ++
 include/linux/iommu.h |  7 +++
 2 files changed, 33 insertions(+)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 5ff8967cb213..ee86f00ee1b9 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -636,6 +636,32 @@ void iommu_sva_unbind_device_all(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_sva_unbind_device_all);
 
+/**
+ * iommu_sva_find() - Find mm associated to the given PASID
+ * @pasid: Process Address Space ID assigned to the mm
+ *
+ * Returns the mm corresponding to this PASID, or NULL if not found. A reference
+ * to the mm is taken, and must be released with mmput().
+ */
+struct mm_struct *iommu_sva_find(int pasid)
+{
+   struct io_mm *io_mm;
+   struct mm_struct *mm = NULL;
+
+   spin_lock(&iommu_sva_lock);
+   io_mm = idr_find(&iommu_pasid_idr, pasid);
+   if (io_mm && io_mm_get_locked(io_mm)) {
+   if (mmget_not_zero(io_mm->mm))
+   mm = io_mm->mm;
+
+   io_mm_put_locked(io_mm);
+   }
+   spin_unlock(&iommu_sva_lock);
+
+   return mm;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_find);
+
 /**
  * iommu_sva_init_device() - Initialize Shared Virtual Addressing for a device
  * @dev: the device
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 429f3dc37a35..a457650b80de 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -987,6 +987,8 @@ extern int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
   void *drvdata);
 extern int __iommu_sva_unbind_device(struct device *dev, int pasid);
 extern void iommu_sva_unbind_device_all(struct device *dev);
+extern struct mm_struct *iommu_sva_find(int pasid);
+
 #else /* CONFIG_IOMMU_SVA */
 static inline int iommu_sva_init_device(struct device *dev,
unsigned long features,
@@ -1016,6 +1018,11 @@ static inline int __iommu_sva_unbind_device(struct device *dev, int pasid)
 static inline void iommu_sva_unbind_device_all(struct device *dev)
 {
 }
+
+static inline struct mm_struct *iommu_sva_find(int pasid)
+{
+   return NULL;
+}
 #endif /* CONFIG_IOMMU_SVA */
 
 #endif /* __LINUX_IOMMU_H */
-- 
2.18.0



[PATCH v3 03/10] iommu/sva: Manage process address spaces

2018-09-20 Thread Jean-Philippe Brucker
Allocate IOMMU mm structures and bind them to devices. Four operations are
added to IOMMU drivers:

* mm_alloc(): to create an io_mm structure and perform architecture-
  specific operations required to grab the process (for instance on ARM,
  pin down the CPU ASID so that the process doesn't get assigned a new
  ASID on rollover).

  There is a single valid io_mm structure per Linux mm. Future extensions
  may also use io_mm for kernel-managed address spaces, populated with
  map()/unmap() calls instead of bound to process address spaces. This
  patch focuses on "shared" io_mm.

* mm_attach(): attach an mm to a device. The IOMMU driver checks that the
  device is capable of sharing an address space, and writes the PASID
  table entry to install the pgd.

  Some IOMMU drivers will have a single PASID table per domain, for
  convenience. Others can implement it differently, but to help these
  drivers, mm_attach and mm_detach take 'attach_domain' and
  'detach_domain' parameters, that tell whether they need to set and clear
  the PASID entry or only send the required TLB invalidations.

* mm_detach(): detach an mm from a device. The IOMMU driver removes the
  PASID table entry and invalidates the IOTLBs.

* mm_free(): free a structure allocated by mm_alloc(), and let arch
  release the process.

mm_attach and mm_detach operations are serialized with a spinlock. When
trying to optimize this code, we should at least prevent concurrent
attach()/detach() on the same domain (so multi-level PASID table code can
allocate tables lazily). mm_alloc() can sleep, but mm_free must not
(because we'll have to call it from call_srcu later on).

At the moment we use an IDR for allocating PASIDs and retrieving contexts.
We also use a single spinlock. These can be refined and optimized later (a
custom allocator will be needed for top-down PASID allocation).

Keeping track of address spaces requires the use of MMU notifiers.
Handling process exit with regard to unbind() is tricky, so it is left for
another patch and we explicitly fail mm_alloc() for the moment.
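
Put together, the new driver-facing callbacks have roughly these shapes
(approximate prototypes reconstructed from the description above, so treat this
as a sketch rather than the exact patch):

/* in struct iommu_ops (sketch) */
struct io_mm *(*mm_alloc)(struct iommu_domain *domain, struct mm_struct *mm,
			  unsigned long flags);
void (*mm_free)(struct io_mm *io_mm);
int (*mm_attach)(struct iommu_domain *domain, struct device *dev,
		 struct io_mm *io_mm, bool attach_domain);
void (*mm_detach)(struct iommu_domain *domain, struct device *dev,
		  struct io_mm *io_mm, bool detach_domain);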

Signed-off-by: Jean-Philippe Brucker 
---
v2->v3: use sva_lock, comment updates
---
 drivers/iommu/iommu-sva.c | 397 +-
 drivers/iommu/iommu.c |   1 +
 include/linux/iommu.h |  29 +++
 3 files changed, 424 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index d60d4f0bb89e..a486bc947335 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -5,25 +5,415 @@
  * Copyright (C) 2018 ARM Ltd.
  */
 
+#include 
 #include 
+#include 
 #include 
+#include 
+
+/**
+ * DOC: io_mm model
+ *
+ * The io_mm keeps track of process address spaces shared between CPU and IOMMU.
+ * The following example illustrates the relation between structures
+ * iommu_domain, io_mm and iommu_bond. An iommu_bond is a link between io_mm and
+ * device. A device can have multiple io_mm and an io_mm may be bound to
+ * multiple devices.
+ *  ___
+ * |  IOMMU domain A   |
+ * |   |
+ * | |  IOMMU group   |+--- io_pgtables
+ * | |||
+ * | |   dev 00:00.0 +--- bond --- io_mm X
+ * | ||   \|
+ * |   '- bond ---.
+ * |___|   \
+ *  ___ \
+ * |  IOMMU domain B   |   io_mm Y
+ * |   |   / /
+ * | |  IOMMU group   ||  / /
+ * | ||| / /
+ * | |   dev 00:01.0  bond -' /
+ * | |   dev 00:01.1  bond --'
+ * | |||
+ * |   +--- io_pgtables
+ * |___|
+ *
+ * In this example, device 00:00.0 is in domain A, devices 00:01.* are in domain
+ * B. All devices within the same domain access the same address spaces. Device
+ * 00:00.0 accesses address spaces X and Y, each corresponding to an mm_struct.
+ * Devices 00:01.* only access address space Y. In addition each
+ * IOMMU_DOMAIN_DMA domain has a private address space, io_pgtable, that is
+ * managed with iommu_map()/iommu_unmap(), and isn't shared with the CPU MMU.
+ *
+ * To obtain the above configuration, users would for instance issue the
+ * following calls:
+ *
+ * iommu_sva_bind_device(dev 00:00.0, mm X, ...) -> PASID 1
+ * iommu_sva_bind_device(dev 00:00.0, mm Y, ...) -> PASID 2
+ * iommu_sva_bind_device(dev 00:01.0, mm Y, ...) -> PASID 2
+ * iommu_sva_bind_device(dev 00:01.1, mm Y, ...) -> PASID 2
+ *
+ * A single Process Address Space ID (PASID) is allocated for each mm. In the
+ * example, devices use PASID 1 

[PATCH v3 04/10] iommu/sva: Add a mm_exit callback for device drivers

2018-09-20 Thread Jean-Philippe Brucker
When an mm exits, devices that were bound to it must stop performing DMA
on its PASID. Let device drivers register a callback to be notified on mm
exit. Add the callback to the sva_param structure attached to struct
device.
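
A device driver would then supply something like the following (hypothetical
driver and helper; only the callback's prototype comes from this patch):

static int my_drv_mm_exit(struct device *dev, int pasid, void *drvdata)
{
	struct my_drv_ctx *ctx = drvdata;	/* whatever was passed to bind() */

	/* quiesce the queues using this PASID before returning */
	my_drv_stop_pasid(ctx, pasid);		/* hypothetical helper */
	return 0;
}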

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/iommu-sva.c | 10 +-
 include/linux/iommu.h |  8 ++--
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index a486bc947335..08da479dad68 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -436,6 +436,7 @@ EXPORT_SYMBOL_GPL(iommu_sva_unbind_device_all);
  * @features: bitmask of features that need to be initialized
  * @min_pasid: min PASID value supported by the device
  * @max_pasid: max PASID value supported by the device
+ * @mm_exit: callback for process address space release
  *
  * Users of the bind()/unbind() API must call this function to initialize all
  * features required for SVA.
@@ -447,13 +448,19 @@ EXPORT_SYMBOL_GPL(iommu_sva_unbind_device_all);
  * overrides it. Similarly, @min_pasid overrides the lower PASID limit supported
  * by the IOMMU.
  *
+ * @mm_exit is called when an address space bound to the device is about to be
+ * torn down by exit_mmap. After @mm_exit returns, the device must not issue any
+ * more transaction with the PASID given as argument. The handler gets an opaque
+ * pointer corresponding to the drvdata passed as argument to bind().
+ *
  * The device should not be performing any DMA while this function is running,
  * otherwise the behavior is undefined.
  *
  * Return 0 if initialization succeeded, or an error.
  */
 int iommu_sva_init_device(struct device *dev, unsigned long features,
-  unsigned int min_pasid, unsigned int max_pasid)
+ unsigned int min_pasid, unsigned int max_pasid,
+ iommu_mm_exit_handler_t mm_exit)
 {
int ret;
struct iommu_sva_param *param;
@@ -472,6 +479,7 @@ int iommu_sva_init_device(struct device *dev, unsigned long features,
param->features = features;
param->min_pasid= min_pasid;
param->max_pasid= max_pasid;
+   param->mm_exit  = mm_exit;
INIT_LIST_HEAD(>mm_list);
 
mutex_lock(>iommu_param->sva_lock);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 6a3ced6a5aa1..c95ff714ea66 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -60,6 +60,7 @@ struct iommu_fault_event;
 typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
struct device *, unsigned long, int, void *);
 typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault_event *, void *);
+typedef int (*iommu_mm_exit_handler_t)(struct device *dev, int pasid, void *);
 
 struct iommu_domain_geometry {
dma_addr_t aperture_start; /* First address that can be mapped*/
@@ -216,6 +217,7 @@ struct iommu_sva_param {
unsigned int min_pasid;
unsigned int max_pasid;
struct list_head mm_list;
+   iommu_mm_exit_handler_t mm_exit;
 };
 
 /**
@@ -967,7 +969,8 @@ static inline void iommu_debugfs_setup(void) {}
 #ifdef CONFIG_IOMMU_SVA
 extern int iommu_sva_init_device(struct device *dev, unsigned long features,
 unsigned int min_pasid,
-unsigned int max_pasid);
+unsigned int max_pasid,
+iommu_mm_exit_handler_t mm_exit);
 extern void iommu_sva_shutdown_device(struct device *dev);
 extern int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
   int *pasid, unsigned long flags,
@@ -978,7 +981,8 @@ extern void iommu_sva_unbind_device_all(struct device *dev);
 static inline int iommu_sva_init_device(struct device *dev,
unsigned long features,
unsigned int min_pasid,
-   unsigned int max_pasid)
+   unsigned int max_pasid,
+   iommu_mm_exit_handler_t mm_exit)
 {
return -ENODEV;
 }
-- 
2.18.0
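
For illustration, a driver consuming this interface would register its handler
at init time, roughly as in the hypothetical sketch below (the my_* names,
types and probe flow are made up and are not part of the patch):

struct my_dev { int dummy; };	/* hypothetical per-device driver state */

static void my_dev_stop_pasid(struct my_dev *mdev, int pasid)
{
	/* Quiesce hardware contexts using this PASID (device-specific). */
}

static int my_mm_exit(struct device *dev, int pasid, void *drvdata)
{
	struct my_dev *mdev = drvdata;

	/* Stop issuing DMA tagged with this PASID and drain queued work. */
	my_dev_stop_pasid(mdev, pasid);
	return 0;
}

static int my_probe(struct device *dev)
{
	/* No extra features; PASID limits come from the IOMMU/PCI caps. */
	return iommu_sva_init_device(dev, 0, 0, 0, my_mm_exit);
}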



[PATCH v3 02/10] iommu/sva: Bind process address spaces to devices

2018-09-20 Thread Jean-Philippe Brucker
Add bind() and unbind() operations to the IOMMU API. Bind() returns a
PASID that drivers can program in hardware, to let their devices access an
mm. This patch only adds skeletons for the device driver API, most of the
implementation is still missing.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/iommu-sva.c | 34 +++
 drivers/iommu/iommu.c | 90 +++
 include/linux/iommu.h | 37 
 3 files changed, 161 insertions(+)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 85ef98efede8..d60d4f0bb89e 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -8,6 +8,38 @@
 #include 
 #include 
 
+int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
+   unsigned long flags, void *drvdata)
+{
+   return -ENOSYS; /* TODO */
+}
+EXPORT_SYMBOL_GPL(__iommu_sva_bind_device);
+
+int __iommu_sva_unbind_device(struct device *dev, int pasid)
+{
+   return -ENOSYS; /* TODO */
+}
+EXPORT_SYMBOL_GPL(__iommu_sva_unbind_device);
+
+static void __iommu_sva_unbind_device_all(struct device *dev)
+{
+   /* TODO */
+}
+
+/**
+ * iommu_sva_unbind_device_all() - Detach all address spaces from this device
+ * @dev: the device
+ *
+ * When detaching @dev from a domain, IOMMU drivers should use this helper.
+ */
+void iommu_sva_unbind_device_all(struct device *dev)
+{
+   mutex_lock(>iommu_param->sva_lock);
+   __iommu_sva_unbind_device_all(dev);
+   mutex_unlock(>iommu_param->sva_lock);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_unbind_device_all);
+
 /**
  * iommu_sva_init_device() - Initialize Shared Virtual Addressing for a device
  * @dev: the device
@@ -96,6 +128,8 @@ void iommu_sva_shutdown_device(struct device *dev)
if (!param)
goto out_unlock;
 
+   __iommu_sva_unbind_device_all(dev);
+
if (domain->ops->sva_shutdown_device)
domain->ops->sva_shutdown_device(dev);
 
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index fa0561ed006f..aba3bf15d46c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2325,3 +2325,93 @@ int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids)
return 0;
 }
 EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
+
+/**
+ * iommu_sva_bind_device() - Bind a process address space to a device
+ * @dev: the device
+ * @mm: the mm to bind, caller must hold a reference to it
+ * @pasid: valid address where the PASID will be stored
+ * @flags: bond properties
+ * @drvdata: private data passed to the mm exit handler
+ *
+ * Create a bond between device and task, allowing the device to access the mm
+ * using the returned PASID. If unbind() isn't called first, a subsequent bind()
+ * for the same device and mm fails with -EEXIST.
+ *
+ * iommu_sva_init_device() must be called first, to initialize the required SVA
+ * features. @flags must be a subset of these features.
+ *
+ * The caller must pin down using get_user_pages*() all mappings shared with the
+ * device. mlock() isn't sufficient, as it doesn't prevent minor page faults
+ * (e.g. copy-on-write).
+ *
+ * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an error
+ * is returned.
+ */
+int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
+ unsigned long flags, void *drvdata)
+{
+   int ret = -EINVAL;
+   struct iommu_group *group;
+
+   if (!pasid)
+   return -EINVAL;
+
+   group = iommu_group_get(dev);
+   if (!group)
+   return -ENODEV;
+
+   /* Ensure device count and domain don't change while we're binding */
+   mutex_lock(>mutex);
+
+   /*
+* To keep things simple, SVA currently doesn't support IOMMU groups
+* with more than one device. Existing SVA-capable systems are not
+* affected by the problems that required IOMMU groups (lack of ACS
+* isolation, device ID aliasing and other hardware issues).
+*/
+   if (iommu_group_device_count(group) != 1)
+   goto out_unlock;
+
+   ret = __iommu_sva_bind_device(dev, mm, pasid, flags, drvdata);
+
+out_unlock:
+   mutex_unlock(>mutex);
+   iommu_group_put(group);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
+
+/**
+ * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device
+ * @dev: the device
+ * @pasid: the pasid returned by bind()
+ *
+ * Remove bond between device and address space identified by @pasid. Users
+ * should not call unbind() if the corresponding mm exited (as the PASID might
+ * have been reallocated for another process).
+ *
+ * The device must not be issuing any more transaction for this PASID. All
+ * outstanding page requests for this PASID must have been flushed to the IOMMU.
+ *
+ * Returns 0 on success, or an error value
+ */
+int iommu_sva_unbind_device(struct device *dev, int pasid)
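
For context, the expected calling pattern from a device driver looks roughly
like the hypothetical sketch below (not part of the patch; error handling
trimmed, helper names made up):

static int my_bind_current_mm(struct device *dev, int *pasid)
{
	int ret;

	/* The caller holds a reference on the mm it wants to share. */
	ret = iommu_sva_bind_device(dev, current->mm, pasid, 0, NULL);
	if (ret)
		return ret;

	/* Program *pasid into the device's hardware context, then start DMA. */
	return 0;
}

static void my_stop(struct device *dev, int pasid)
{
	/* Quiesce DMA and flush outstanding page requests first. */
	iommu_sva_unbind_device(dev, pasid);
}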

[PATCH v3 01/10] iommu: Introduce Shared Virtual Addressing API

2018-09-20 Thread Jean-Philippe Brucker
Shared Virtual Addressing (SVA) provides a way for device drivers to bind
process address spaces to devices. This requires the IOMMU to support page
table format and features compatible with the CPUs, and usually requires
the system to support I/O Page Faults (IOPF) and Process Address Space ID
(PASID). When all of these are available, DMA can access virtual addresses
of a process. A PASID is allocated for each process, and the device driver
programs it into the device in an implementation-specific way.

Add a new API for sharing process page tables with devices. Introduce two
IOMMU operations, sva_init_device() and sva_shutdown_device(), that
prepare the IOMMU driver for SVA. For example allocate PASID tables and
fault queues. Subsequent patches will implement the bind() and unbind()
operations.

Introduce a new mutex sva_lock on the device's IOMMU param to serialize
init(), shutdown(), bind() and unbind() operations. Using the existing
lock isn't possible because the unbind() and shutdown() operations will
have to wait while holding sva_lock for concurrent fault queue flushes to
terminate. These flushes will take the existing lock.

Support for I/O Page Faults will be added in a later patch using a new
feature bit (IOMMU_SVA_FEAT_IOPF). With the current API users must pin
down all shared mappings.

Signed-off-by: Jean-Philippe Brucker 
---
v2->v3:
* Add sva_lock to serialize init/bind/unbind/shutdown
* Rename functions for consistency with the rest of the API
---
 drivers/iommu/Kconfig |   4 ++
 drivers/iommu/Makefile|   1 +
 drivers/iommu/iommu-sva.c | 107 ++
 drivers/iommu/iommu.c |   1 +
 include/linux/iommu.h |  34 
 5 files changed, 147 insertions(+)
 create mode 100644 drivers/iommu/iommu-sva.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index c60395b7470f..884580401919 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -95,6 +95,10 @@ config IOMMU_DMA
select IOMMU_IOVA
select NEED_SG_DMA_LENGTH
 
+config IOMMU_SVA
+   bool
+   select IOMMU_API
+
 config FSL_PAMU
bool "Freescale IOMMU support"
depends on PCI
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index ab5eba6edf82..7d6332be5f0e 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
 obj-$(CONFIG_IOMMU_DEBUGFS) += iommu-debugfs.o
 obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
+obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
new file mode 100644
index ..85ef98efede8
--- /dev/null
+++ b/drivers/iommu/iommu-sva.c
@@ -0,0 +1,107 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Manage PASIDs and bind process address spaces to devices.
+ *
+ * Copyright (C) 2018 ARM Ltd.
+ */
+
+#include 
+#include 
+
+/**
+ * iommu_sva_init_device() - Initialize Shared Virtual Addressing for a device
+ * @dev: the device
+ * @features: bitmask of features that need to be initialized
+ * @min_pasid: min PASID value supported by the device
+ * @max_pasid: max PASID value supported by the device
+ *
+ * Users of the bind()/unbind() API must call this function to initialize all
+ * features required for SVA.
+ *
+ * The device must support multiple address spaces (e.g. PCI PASID). By default
+ * the PASID allocated during bind() is limited by the IOMMU capacity, and by
+ * the device PASID width defined in the PCI capability or in the firmware
+ * description. Setting @max_pasid to a non-zero value smaller than this limit
+ * overrides it. Similarly, @min_pasid overrides the lower PASID limit supported
+ * by the IOMMU.
+ *
+ * The device should not be performing any DMA while this function is running,
+ * otherwise the behavior is undefined.
+ *
+ * Return 0 if initialization succeeded, or an error.
+ */
+int iommu_sva_init_device(struct device *dev, unsigned long features,
+  unsigned int min_pasid, unsigned int max_pasid)
+{
+   int ret;
+   struct iommu_sva_param *param;
+   struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+   if (!domain || !domain->ops->sva_init_device)
+   return -ENODEV;
+
+   if (features)
+   return -EINVAL;
+
+   param = kzalloc(sizeof(*param), GFP_KERNEL);
+   if (!param)
+   return -ENOMEM;
+
+   param->features = features;
+   param->min_pasid= min_pasid;
+   param->max_pasid= max_pasid;
+
+   mutex_lock(>iommu_param->sva_lock);
+   if (dev->iommu_param->sva_param) {
+   ret = -EEXIST;
+   goto err_unlock;
+   }
+
+   /*
+* IOMMU driver updates the limits depending on the 

[PATCH v3 00/10] Shared Virtual Addressing for the IOMMU

2018-09-20 Thread Jean-Philippe Brucker
This is version 3 of the core changes for Shared Virtual Addressing in
the IOMMU. It provides an API for sharing process address spaces with
devices, using for example PCI PASID and PRI.

This time I didn't append the VFIO and SMMUv3 example users. A smaller
series is easier for me to manage and may be less intimidating for
reviewers. If you're adding or updating SVA support to your IOMMU or
device driver, you can find my complete patch stack at [1] for
inspiration.

Changes to patches 1-9 address feedback from v2 [2]. Mostly small tweaks
and comments, that I tried to detail on each patch.

I added patch 10/10 from Jordan Crouse, for managing PASID without
sharing process address spaces. This interface might get superseded by
auxiliary domains [3], since they are more suitable than io_mm for Intel
vt-d's mdev virtualization. Auxiliary domains reuse existing iommu_map/
iommu_unmap/etc ops instead of introducing new ones, so even though it's
more invasive, I tend to prefer that solution. But since we need
something for testing right now, I'm appending patch 10 to the series.

The series depends on Jacob Pan's patches for fault reporting [4].
  iommu: introduce device fault data
  driver core: add per device iommu param
  iommu: add a timeout parameter for prq response
  iommu: introduce device fault report API
  iommu: introduce page response function

[1] git://linux-arm.org/linux-jpb.git sva/v3
[2] https://www.spinics.net/lists/kvm/msg168742.html
[3] https://lwn.net/ml/linux-kernel/20180830040922.30426-1-baolu...@linux.intel.com/
[4] https://lwn.net/ml/linux-kernel/1526072055-86990-1-git-send-email-jacob.jun.pan%40linux.intel.com/

Jean-Philippe Brucker (10):
  iommu: Introduce Shared Virtual Addressing API
  iommu/sva: Bind process address spaces to devices
  iommu/sva: Manage process address spaces
  iommu/sva: Add a mm_exit callback for device drivers
  iommu/sva: Track mm changes with an MMU notifier
  iommu/sva: Search mm by PASID
  iommu: Add a page fault handler
  iommu/iopf: Handle mm faults
  iommu/sva: Register page fault handler
  iommu/sva: Add support for private PASIDs

 drivers/iommu/Kconfig  |   9 +
 drivers/iommu/Makefile |   2 +
 drivers/iommu/io-pgfault.c | 464 ++
 drivers/iommu/iommu-sva.c  | 967 +
 drivers/iommu/iommu.c  | 143 +-
 include/linux/iommu.h  | 289 ++-
 6 files changed, 1855 insertions(+), 19 deletions(-)
 create mode 100644 drivers/iommu/io-pgfault.c
 create mode 100644 drivers/iommu/iommu-sva.c

-- 
2.18.0



Re: [RFC v2 01/20] iommu: Introduce bind_pasid_table API

2018-09-20 Thread Jacob Pan
On Tue, 18 Sep 2018 16:24:38 +0200
Eric Auger  wrote:

> From: Jacob Pan 
> 
> In virtualization use case, when a guest is assigned
> a PCI host device, protected by a virtual IOMMU on a guest,
> the physical IOMMU must be programmed to be consistent with
> the guest mappings. If the physical IOMMU supports two
> translation stages it makes sense to program guest mappings
> onto the first stage/level (ARM/VTD terminology) while to host
> owns the stage/level 2.
> 
> In that case, it is mandated to trap on guest configuration
> settings and pass those to the physical iommu driver.
> 
> This patch adds a new API to the iommu subsystem that allows
> to bind and unbind the guest configuration data to the host.
> 
> A generic iommu_pasid_table_config struct is introduced in
> a new iommu.h uapi header. This is going to be used by the VFIO
> user API. We foresee at least two specializations of this struct,
> for PASID table passing and ARM SMMUv3.
> 
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> In practice, I think it would be simpler to have a single
> set_pasid_table function instead of bind/unbind. The "bypass" field
> tells the stage 1 is bypassed (equivalent to the unbind actually).
> On userspace we have notifications that the device context has
> changed. Calling either bind or unbind requires to have an understand
> of what was the previous state and call different notifiers. So to me
> the bind/unbind complexifies the user integration while not bring much
> benefits.
> 
I don't have a strong preference and I think having a single function
makes sense. In VT-d2, the bind/unbind operation is a result of PASID
cache invalidation from the guest. So there are no symmetrical
bind/unbind user calls.
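
A single replacement operation along the lines being discussed might look
roughly like the sketch below (purely illustrative, not taken from any posted
patch; the op name and the use of a bypass/NULL config to restore the default
translation are assumptions):

int iommu_set_pasid_table(struct iommu_domain *domain,
			  struct iommu_pasid_table_config *cfg)
{
	if (unlikely(!domain->ops->set_pasid_table))
		return -ENODEV;

	/*
	 * A config with the bypass field set (or a NULL cfg) would play the
	 * role of today's unbind and restore the default configuration.
	 */
	return domain->ops->set_pasid_table(domain, cfg);
}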

> This patch generalizes the API introduced by Jacob & co-authors in
> https://lwn.net/Articles/754331/
> 
> v1 -> v2:
> - restore the original pasid table name
> - remove the struct device * parameter in the API
> - reworked iommu_pasid_smmuv3
> ---
>  drivers/iommu/iommu.c  | 19 ++
>  include/linux/iommu.h  | 21 +++
>  include/uapi/linux/iommu.h | 52 ++
>  3 files changed, 92 insertions(+)
>  create mode 100644 include/uapi/linux/iommu.h
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 8c15c5980299..db2c7c9502ae 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1362,6 +1362,25 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(iommu_attach_device);
>  
> +int iommu_bind_pasid_table(struct iommu_domain *domain,
> +struct iommu_pasid_table_config *cfg)
> +{
> + if (unlikely(!domain->ops->bind_pasid_table))
> + return -ENODEV;
> +
> + return domain->ops->bind_pasid_table(domain, cfg);
> +}
> +EXPORT_SYMBOL_GPL(iommu_bind_pasid_table);
> +
> +void iommu_unbind_pasid_table(struct iommu_domain *domain)
> +{
> + if (unlikely(!domain->ops->unbind_pasid_table))
> + return;
> +
> + domain->ops->unbind_pasid_table(domain);
> +}
> +EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
> +
>  static void __iommu_detach_device(struct iommu_domain *domain,
> struct device *dev)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 87994c265bf5..e56cad4863f7 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define IOMMU_READ   (1 << 0)
>  #define IOMMU_WRITE  (1 << 1)
> @@ -185,6 +186,8 @@ struct iommu_resv_region {
>   * @domain_get_windows: Return the number of windows for a domain
>   * @of_xlate: add OF master IDs to iommu grouping
>   * @pgsize_bitmap: bitmap of all possible supported page sizes
> + * @bind_pasid_table: bind pasid table
> + * @unbind_pasid_table: unbind pasid table and restore defaults
>   */
>  struct iommu_ops {
>   bool (*capable)(enum iommu_cap);
> @@ -231,6 +234,10 @@ struct iommu_ops {
>  int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
>  bool (*is_attach_deferred)(struct iommu_domain *domain, struct device *dev);
> + int (*bind_pasid_table)(struct iommu_domain *domain,
> +                         struct iommu_pasid_table_config *cfg);
> + void (*unbind_pasid_table)(struct iommu_domain *domain);
> +
>   unsigned long pgsize_bitmap;
>  };
>  
> @@ -292,6 +299,9 @@ extern int iommu_attach_device(struct iommu_domain *domain, struct device *dev);
>  extern void iommu_detach_device(struct iommu_domain *domain,
>   struct device *dev);
> +extern int iommu_bind_pasid_table(struct iommu_domain *domain,
> +   struct iommu_pasid_table_config *cfg);
> +extern void iommu_unbind_pasid_table(struct iommu_domain *domain);
>  extern 

[PATCH 2/3] xtensa: remove ZONE_DMA

2018-09-20 Thread Christoph Hellwig
ZONE_DMA is intended for magic < 32-bit pools (usually ISA DMA), which
isn't required on xtensa.  Move all the non-highmem memory into
ZONE_NORMAL instead to match other architectures.

Signed-off-by: Christoph Hellwig 
---
 arch/xtensa/Kconfig   | 3 ---
 arch/xtensa/mm/init.c | 2 +-
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig
index 516694937b7a..9a7c654a7654 100644
--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -1,7 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0
-config ZONE_DMA
-   def_bool y
-
 config XTENSA
def_bool y
select ARCH_HAS_SYNC_DMA_FOR_CPU
diff --git a/arch/xtensa/mm/init.c b/arch/xtensa/mm/init.c
index 34aead7dcb48..b385e6b73065 100644
--- a/arch/xtensa/mm/init.c
+++ b/arch/xtensa/mm/init.c
@@ -71,7 +71,7 @@ void __init zones_init(void)
 {
/* All pages are DMA-able, so we put them all in the DMA zone. */
unsigned long zones_size[MAX_NR_ZONES] = {
-   [ZONE_DMA] = max_low_pfn - ARCH_PFN_OFFSET,
+   [ZONE_NORMAL] = max_low_pfn - ARCH_PFN_OFFSET,
 #ifdef CONFIG_HIGHMEM
[ZONE_HIGHMEM] = max_pfn - max_low_pfn,
 #endif
-- 
2.18.0



[PATCH 3/3] xtensa: use dma_direct_{alloc,free}_pages

2018-09-20 Thread Christoph Hellwig
Use the generic helpers for dma allocation instead of opencoding them
with slightly less bells and whistles.

Signed-off-by: Christoph Hellwig 
---
 arch/xtensa/kernel/pci-dma.c | 48 ++--
 1 file changed, 13 insertions(+), 35 deletions(-)

diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c
index a764d894ffdd..a74ca0dd728a 100644
--- a/arch/xtensa/kernel/pci-dma.c
+++ b/arch/xtensa/kernel/pci-dma.c
@@ -141,56 +141,34 @@ void __attribute__((weak)) *platform_vaddr_to_cached(void *p)
  * Note: We assume that the full memory space is always mapped to 'kseg'
  *  Otherwise we have to use page attributes (not implemented).
  */
-
-void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
-   gfp_t flag, unsigned long attrs)
+void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
+   gfp_t gfp, unsigned long attrs)
 {
-   unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   struct page *page = NULL;
-
-   /* ignore region speicifiers */
-
-   flag &= ~(__GFP_DMA | __GFP_HIGHMEM);
-
-   if (dev == NULL || (dev->coherent_dma_mask < 0x))
-   flag |= GFP_DMA;
-
-   if (gfpflags_allow_blocking(flag))
-   page = dma_alloc_from_contiguous(dev, count, get_order(size),
-flag & __GFP_NOWARN);
+   void *vaddr;
 
-   if (!page)
-   page = alloc_pages(flag, get_order(size));
-
-   if (!page)
+   vaddr = dma_direct_alloc_pages(dev, size, dma_handle, gfp, attrs);
+   if (!vaddr)
return NULL;
 
-   *handle = phys_to_dma(dev, page_to_phys(page));
+   if (attrs & DMA_ATTR_NO_KERNEL_MAPPING)
+   return virt_to_page(vaddr); /* just a random cookie */
 
-   if (attrs & DMA_ATTR_NO_KERNEL_MAPPING) {
-   return page;
-   }
-
-   BUG_ON(!platform_vaddr_cached(page_address(page)));
-   __invalidate_dcache_range((unsigned long)page_address(page), size);
-   return platform_vaddr_to_uncached(page_address(page));
+   BUG_ON(!platform_vaddr_cached(vaddr));
+   __invalidate_dcache_range((unsigned long)vaddr, size);
+   return platform_vaddr_to_uncached(vaddr);
 }
 
 void arch_dma_free(struct device *dev, size_t size, void *vaddr,
dma_addr_t dma_handle, unsigned long attrs)
 {
-   unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   struct page *page;
-
if (attrs & DMA_ATTR_NO_KERNEL_MAPPING) {
-   page = vaddr;
+   vaddr = page_to_virt((struct page *)vaddr); /* decode cookie */
} else if (platform_vaddr_uncached(vaddr)) {
-   page = virt_to_page(platform_vaddr_to_cached(vaddr));
+   vaddr = platform_vaddr_to_cached(vaddr);
} else {
WARN_ON_ONCE(1);
return;
}
 
-   if (!dma_release_from_contiguous(dev, page, count))
-   __free_pages(page, get_order(size));
+   dma_direct_free_pages(dev, size, vaddr, dma_handle, attrs);
 }
-- 
2.18.0
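
For reference, drivers never call arch_dma_alloc() directly; they go through
the generic DMA API, which on xtensa eventually reaches the arch hooks above.
A minimal, hypothetical caller (names made up for illustration):

static int my_setup_ring(struct device *dev)
{
	dma_addr_t dma;
	void *ring;

	/* dma_alloc_coherent() ends up in arch_dma_alloc() on this platform. */
	ring = dma_alloc_coherent(dev, PAGE_SIZE, &dma, GFP_KERNEL);
	if (!ring)
		return -ENOMEM;

	/* ... hand 'dma' to the device, access 'ring' from the CPU ... */

	dma_free_coherent(dev, PAGE_SIZE, ring, dma);
	return 0;
}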



[PATCH 1/3] xtensa: remove partial support for DMA buffers in high memory

2018-09-20 Thread Christoph Hellwig
This reverts commit 6137e4166004e2ec383ac05d5ca15831f4668806.

We explicitly clear GFP_HIGHMEM from the allowed dma flags at the beginning
of the function (and the generic dma_alloc_attr function calling us does the
same!), so this code just adds dead wood.

Signed-off-by: Christoph Hellwig 
---
 arch/xtensa/kernel/pci-dma.c | 20 ++--
 1 file changed, 2 insertions(+), 18 deletions(-)

diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c
index 1fc138b6bc0a..a764d894ffdd 100644
--- a/arch/xtensa/kernel/pci-dma.c
+++ b/arch/xtensa/kernel/pci-dma.c
@@ -171,20 +171,6 @@ void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
return page;
}
 
-#ifdef CONFIG_MMU
-   if (PageHighMem(page)) {
-   void *p;
-
-   p = dma_common_contiguous_remap(page, size, VM_MAP,
-   pgprot_noncached(PAGE_KERNEL),
-   __builtin_return_address(0));
-   if (!p) {
-   if (!dma_release_from_contiguous(dev, page, count))
-   __free_pages(page, get_order(size));
-   }
-   return p;
-   }
-#endif
BUG_ON(!platform_vaddr_cached(page_address(page)));
__invalidate_dcache_range((unsigned long)page_address(page), size);
return platform_vaddr_to_uncached(page_address(page));
@@ -201,10 +187,8 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
} else if (platform_vaddr_uncached(vaddr)) {
page = virt_to_page(platform_vaddr_to_cached(vaddr));
} else {
-#ifdef CONFIG_MMU
-   dma_common_free_remap(vaddr, size, VM_MAP);
-#endif
-   page = pfn_to_page(PHYS_PFN(dma_to_phys(dev, dma_handle)));
+   WARN_ON_ONCE(1);
+   return;
}
 
if (!dma_release_from_contiguous(dev, page, count))
-- 
2.18.0



xtensa dma-mapping tidyups

2018-09-20 Thread Christoph Hellwig
Hi Chris and Max,

this small series has a few tweaks to the xtensa dma-mapping code.

It is against the dma-mapping tree:

git://git.infradead.org/users/hch/dma-mapping.git for-next

Gitweb:


http://git.infradead.org/users/hch/dma-mapping.git/shortlog/refs/heads/for-next


Re: [RFC PATCH v1 0/6] Resolve unwanted DMA backing with IOMMU

2018-09-20 Thread Dmitry Osipenko

On 8/16/18 8:23 PM, Robin Murphy wrote:

On 15/08/18 20:56, Dmitry Osipenko wrote:

On Friday, 3 August 2018 18:43:41 MSK Robin Murphy wrote:

On 02/08/18 19:24, Dmitry Osipenko wrote:

On Friday, 27 July 2018 20:16:53 MSK Dmitry Osipenko wrote:

On Friday, 27 July 2018 20:03:26 MSK Jordan Crouse wrote:

On Fri, Jul 27, 2018 at 05:02:37PM +0100, Robin Murphy wrote:

On 27/07/18 15:10, Dmitry Osipenko wrote:

On Friday, 27 July 2018 12:03:28 MSK Will Deacon wrote:

On Fri, Jul 27, 2018 at 10:25:13AM +0200, Joerg Roedel wrote:

On Fri, Jul 27, 2018 at 02:16:18AM +0300, Dmitry Osipenko wrote:

The proposed solution adds a new option to the base device driver
structure that allows device drivers to explicitly convey to the drivers
core that the implicit IOMMU backing for devices must not happen.


Why is IOMMU mapping a problem for the Tegra GPU driver?

If we add something like this then it should not be the choice of the
device driver, but of the user and/or the firmware.


Agreed, and it would still need somebody to configure an identity domain
so that transactions aren't aborted immediately. We currently allow the
identity domain to be used by default via a command-line option, so I guess
we'd need a way for firmware to request that on a per-device basis.


The IOMMU mapping itself is not a problem, the problem is the management
of the IOMMU. For Tegra we don't want anything to intrude into the IOMMU
activities because:

1) GPU HW require additional configuration for the IOMMU usage and dumb
mapping of the allocations simply doesn't work.


Generally, that's already handled by the DRM drivers allocating
their own unmanaged domains. The only problem we really need to
solve in that regard is that currently the device DMA ops don't get
updated when moving away from the managed domain. That's been OK for
the VFIO case where the device is bound to a different driver which
we know won't make any explicit DMA API calls, but for the more
general case of IOMMU-aware drivers we could certainly do with a bit
of cooperation between the IOMMU API, DMA API, and arch code to
update the DMA ops dynamically to cope with intermediate subsystems
making DMA API calls on behalf of devices they don't know the
intimate details of.


2) Older Tegra generations have a limited resource and capabilities in
regards to IOMMU usage, allocating IOMMU domain per-device is just
impossible for example.

3) HW performs context switches and so particular allocations have to be
assigned to a particular context's IOMMU domain.


I understand Qualcomm SoCs have a similar thing too, and AFAICS that
case just doesn't fit into the current API model at all. We need the
IOMMU driver to somehow know about the specific details of which
devices have magic associations with specific contexts, and we
almost certainly need a more expressive interface than
iommu_domain_alloc() to have any hope of reliable results.


This is correct for Qualcomm GPUs - The GPU hardware context switching
requires a specific context and there are some restrictions around
secure contexts as well.

We don't really care if the DMA attaches to a context just as long as it
doesn't attach to the one(s) we care about. Perhaps a "valid context" mask
would work in from the DT or the device struct to give the subsystems a
clue as to which domains they were allowed to use. I recognize that there
isn't a one-size-fits-all solution to this problem so I'm open to
different ideas.


Designating whether implicit IOMMU backing is appropriate for a device via
device-tree property sounds a bit awkward because that will be a kinda
software description (of a custom Linux driver model), while device-tree is
supposed to describe HW.

What about to grant IOMMU drivers with ability to decide whether the
implicit backing for a device is appropriate? Like this:

bool implicit_iommu_for_dma_is_allowed(struct device *dev)
{
	const struct iommu_ops *ops = dev->bus->iommu_ops;
	struct iommu_group *group;

	group = iommu_group_get(dev);
	if (!group)
		return false;

	iommu_group_put(group);

	if (!ops->implicit_iommu_for_dma_is_allowed)
		return true;

	return ops->implicit_iommu_for_dma_is_allowed(dev);
}

Then arch_setup_dma_ops() could have a clue whether implicit IOMMU backing
for a device is appropriate.


Guys, does it sound good to you or maybe you have something else on your
mind? Even if it's not an ideal solution, it fixes the immediate problem
and should be good enough for the starter.


To me that looks like a step in the wrong direction that won't help at
all in actually addressing the underlying issues.

If the GPU driver wants to explicitly control IOMMU mappings instead of
relying on the IOMMU_DOMAIN_DMA abstraction, then it should use its own
unmanaged domain. At that point it shouldn't matter if a DMA ops domain
was allocated, since the GPU device will no longer be attached to it.


It is not obvious to me what solution you are proposing..

Are you 

Re: [PATCH] iommu: Fix passthrough option documentation

2018-09-20 Thread Jonathan Corbet
On Thu, 20 Sep 2018 14:14:26 +0100
Robin Murphy  wrote:

> Document that the default for "iommu.passthrough" is now configurable.
> 
> Fixes: 58d1131777a4 ("iommu: Add config option to set passthrough as default")
> Signed-off-by: Robin Murphy 
> ---
> 
> Joerg, Jon, as a pure doc fix I'll leave it to your discretion as to
> whose tree is most appropriate (it shouldn't conflict with the other
> IOMMU patches I'm working on)

I went ahead and applied it, thanks.

jon


RE: misc ia64 cleanups (resend)

2018-09-20 Thread Luck, Tony
> A couple random cleanups I stumbled upon when doing dma related work.

Christoph,

Thanks.  Applied. Will send pull request for next merge window.

-Tony


[PATCH v8 7/7] iommu/arm-smmu: Support non-strict mode

2018-09-20 Thread Robin Murphy
All we need is to wire up .flush_iotlb_all properly and implement the
domain attribute, and iommu-dma and io-pgtable will do the rest for us.
The only real subtlety is documenting the barrier semantics we're
introducing between io-pgtable and the drivers for non-strict flushes.

Signed-off-by: Robin Murphy 
---

v8:
 - Use nested switches for attrs
 - Fix barriers

 drivers/iommu/arm-smmu.c | 93 
 1 file changed, 66 insertions(+), 27 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index fd1b80ef9490..79ece565d96d 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -246,6 +246,7 @@ struct arm_smmu_domain {
const struct iommu_gather_ops   *tlb_ops;
struct arm_smmu_cfg cfg;
enum arm_smmu_domain_stage  stage;
+   boolnon_strict;
struct mutexinit_mutex; /* Protects smmu pointer */
spinlock_t  cb_lock; /* Serialises ATS1* ops and TLB syncs */
struct iommu_domain domain;
@@ -447,7 +448,11 @@ static void arm_smmu_tlb_inv_context_s1(void *cookie)
struct arm_smmu_cfg *cfg = _domain->cfg;
void __iomem *base = ARM_SMMU_CB(smmu_domain->smmu, cfg->cbndx);
 
-   writel_relaxed(cfg->asid, base + ARM_SMMU_CB_S1_TLBIASID);
+   /*
+* NOTE: this is not a relaxed write; it needs to guarantee that PTEs
+* cleared by the current CPU are visible to the SMMU before the TLBI.
+*/
+   writel(cfg->asid, base + ARM_SMMU_CB_S1_TLBIASID);
arm_smmu_tlb_sync_context(cookie);
 }
 
@@ -457,7 +462,8 @@ static void arm_smmu_tlb_inv_context_s2(void *cookie)
struct arm_smmu_device *smmu = smmu_domain->smmu;
void __iomem *base = ARM_SMMU_GR0(smmu);
 
-   writel_relaxed(smmu_domain->cfg.vmid, base + ARM_SMMU_GR0_TLBIVMID);
+   /* NOTE: see above */
+   writel(smmu_domain->cfg.vmid, base + ARM_SMMU_GR0_TLBIVMID);
arm_smmu_tlb_sync_global(smmu);
 }
 
@@ -863,6 +869,9 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK)
pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_NO_DMA;
 
+   if (smmu_domain->non_strict)
+   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
+
smmu_domain->smmu = smmu;
pgtbl_ops = alloc_io_pgtable_ops(fmt, _cfg, smmu_domain);
if (!pgtbl_ops) {
@@ -1252,6 +1261,14 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
return ops->unmap(ops, iova, size);
 }
 
+static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+   if (smmu_domain->tlb_ops)
+   smmu_domain->tlb_ops->tlb_flush_all(smmu_domain);
+}
+
 static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
 {
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
@@ -1470,15 +1487,27 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 {
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 
-   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
-   return -EINVAL;
-
-   switch (attr) {
-   case DOMAIN_ATTR_NESTING:
-   *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
-   return 0;
+   switch(domain->type) {
+   case IOMMU_DOMAIN_UNMANAGED:
+   switch (attr) {
+   case DOMAIN_ATTR_NESTING:
+   *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
+   return 0;
+   default:
+   return -ENODEV;
+   }
+   break;
+   case IOMMU_DOMAIN_DMA:
+   switch (attr) {
+   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
+   *(int *)data = smmu_domain->non_strict;
+   return 0;
+   default:
+   return -ENODEV;
+   }
+   break;
default:
-   return -ENODEV;
+   return -EINVAL;
}
 }
 
@@ -1488,28 +1517,38 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
int ret = 0;
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 
-   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
-   return -EINVAL;
-
mutex_lock(_domain->init_mutex);
 
-   switch (attr) {
-   case DOMAIN_ATTR_NESTING:
-   if (smmu_domain->smmu) {
-   ret = -EPERM;
-   goto out_unlock;
+   switch(domain->type) {
+   case IOMMU_DOMAIN_UNMANAGED:
+   switch (attr) {
+   case DOMAIN_ATTR_NESTING:
+   if (smmu_domain->smmu) {
+   ret = -EPERM;
+   

[PATCH v8 3/7] iommu: Add "iommu.strict" command line option

2018-09-20 Thread Robin Murphy
From: Zhen Lei 

Add a generic command line option to enable lazy unmapping via IOVA
flush queues, which will initially be supported by iommu-dma. This echoes
the semantics of "intel_iommu=strict" (albeit with the opposite default
value), but in the driver-agnostic fashion of "iommu.passthrough".

Signed-off-by: Zhen Lei 
[rm: move handling out of SMMUv3 driver, clean up documentation]
Signed-off-by: Robin Murphy 
---

v8:
 - Rename "non-strict" to "strict" to better match existing options
 - Rewrite doc text/commit message
 - Downgrade boot-time message from warn/taint to info

 .../admin-guide/kernel-parameters.txt | 12 ++
 drivers/iommu/iommu.c | 23 +++
 2 files changed, 35 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9871e649ffef..92ae12aeabf4 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1749,6 +1749,18 @@
nobypass[PPC/POWERNV]
Disable IOMMU bypass, using IOMMU for PCI devices.
 
+   iommu.strict=   [ARM64] Configure TLB invalidation behaviour
+   Format: { "0" | "1" }
+   0 - Lazy mode.
+ Request that DMA unmap operations use deferred
+ invalidation of hardware TLBs, for increased
+ throughput at the cost of reduced device isolation.
+ Will fall back to strict mode if not supported by
+ the relevant IOMMU driver.
+   1 - Strict mode (default).
+ DMA unmap operations invalidate IOMMU hardware TLBs
+ synchronously.
+
iommu.passthrough=
[ARM64] Configure DMA to bypass the IOMMU by default.
Format: { "0" | "1" }
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8c15c5980299..02b6603f0616 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -41,6 +41,7 @@ static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
 #else
 static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_DMA;
 #endif
+static bool iommu_dma_strict __read_mostly = true;
 
 struct iommu_callback_data {
const struct iommu_ops *ops;
@@ -131,6 +132,21 @@ static int __init iommu_set_def_domain_type(char *str)
 }
 early_param("iommu.passthrough", iommu_set_def_domain_type);
 
+static int __init iommu_dma_setup(char *str)
+{
+   int ret;
+
+   ret = kstrtobool(str, _dma_strict);
+   if (ret)
+   return ret;
+
+   if (!iommu_dma_strict)
+   pr_info("Enabling deferred TLB invalidation for DMA; protection against malicious/malfunctioning devices may be reduced.\n");
+
+   return 0;
+}
+early_param("iommu.strict", iommu_dma_setup);
+
 static ssize_t iommu_group_attr_show(struct kobject *kobj,
 struct attribute *__attr, char *buf)
 {
@@ -1072,6 +1088,13 @@ struct iommu_group *iommu_group_get_for_dev(struct device *dev)
group->default_domain = dom;
if (!group->domain)
group->domain = dom;
+
+   if (dom && !iommu_dma_strict) {
+   int attr = 1;
+   iommu_domain_set_attr(dom,
+ DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+ );
+   }
}
 
ret = iommu_group_add_device(group, dev);
-- 
2.19.0.dirty
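
For completeness, the option is simply set on the kernel command line, e.g.
(illustrative):

    iommu.strict=0   lazy mode: DMA unmaps use deferred TLB invalidation
    iommu.strict=1   strict mode (default): invalidate synchronously on unmap

Drivers opt in by acknowledging the DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE attribute,
as the SMMU patches later in this series do; otherwise the core falls back to
strict behaviour.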



[PATCH v8 5/7] iommu/arm-smmu-v3: Add support for non-strict mode

2018-09-20 Thread Robin Murphy
From: Zhen Lei 

Now that io-pgtable knows how to dodge strict TLB maintenance, all
that's left to do is bridge the gap between the IOMMU core requesting
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE for default domains, and showing the
appropriate IO_PGTABLE_QUIRK_NON_STRICT flag to alloc_io_pgtable_ops().

Signed-off-by: Zhen Lei 
[rm: convert to domain attribute, tweak commit message]
Signed-off-by: Robin Murphy 
---

v8:
 - Use nested switches for attrs
 - Document barrier semantics

 drivers/iommu/arm-smmu-v3.c | 79 ++---
 1 file changed, 56 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index f10c852479fc..db402e8b068b 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -612,6 +612,7 @@ struct arm_smmu_domain {
struct mutexinit_mutex; /* Protects smmu pointer */
 
struct io_pgtable_ops   *pgtbl_ops;
+   boolnon_strict;
 
enum arm_smmu_domain_stage  stage;
union {
@@ -1407,6 +1408,12 @@ static void arm_smmu_tlb_inv_context(void *cookie)
cmd.tlbi.vmid   = smmu_domain->s2_cfg.vmid;
}
 
+   /*
+* NOTE: when io-pgtable is in non-strict mode, we may get here with
+* PTEs previously cleared by unmaps on the current CPU not yet visible
+* to the SMMU. We are relying on the DSB implicit in queue_inc_prod()
+* to guarantee those are observed before the TLBI. Do be careful, 007.
+*/
arm_smmu_cmdq_issue_cmd(smmu, );
__arm_smmu_tlb_sync(smmu);
 }
@@ -1633,6 +1640,9 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
if (smmu->features & ARM_SMMU_FEAT_COHERENCY)
pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_NO_DMA;
 
+   if (smmu_domain->non_strict)
+   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
+
pgtbl_ops = alloc_io_pgtable_ops(fmt, _cfg, smmu_domain);
if (!pgtbl_ops)
return -ENOMEM;
@@ -1934,15 +1944,27 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 {
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 
-   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
-   return -EINVAL;
-
-   switch (attr) {
-   case DOMAIN_ATTR_NESTING:
-   *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
-   return 0;
+   switch (domain->type) {
+   case IOMMU_DOMAIN_UNMANAGED:
+   switch (attr) {
+   case DOMAIN_ATTR_NESTING:
+   *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
+   return 0;
+   default:
+   return -ENODEV;
+   }
+   break;
+   case IOMMU_DOMAIN_DMA:
+   switch (attr) {
+   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
+   *(int *)data = smmu_domain->non_strict;
+   return 0;
+   default:
+   return -ENODEV;
+   }
+   break;
default:
-   return -ENODEV;
+   return -EINVAL;
}
 }
 
@@ -1952,26 +1974,37 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
int ret = 0;
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 
-   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
-   return -EINVAL;
-
mutex_lock(_domain->init_mutex);
 
-   switch (attr) {
-   case DOMAIN_ATTR_NESTING:
-   if (smmu_domain->smmu) {
-   ret = -EPERM;
-   goto out_unlock;
+   switch (domain->type) {
+   case IOMMU_DOMAIN_UNMANAGED:
+   switch (attr) {
+   case DOMAIN_ATTR_NESTING:
+   if (smmu_domain->smmu) {
+   ret = -EPERM;
+   goto out_unlock;
+   }
+
+   if (*(int *)data)
+   smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED;
+   else
+   smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
+   break;
+   default:
+   ret = -ENODEV;
+   }
+   break;
+   case IOMMU_DOMAIN_DMA:
+   switch(attr) {
+   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
+   smmu_domain->non_strict = *(int *)data;
+   break;
+   default:
+   ret = -ENODEV;
}
-
-   if (*(int *)data)
-   smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED;
-   else
-   smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
-
break;
default:
-   ret = -ENODEV;
+ 

[PATCH v8 6/7] iommu/io-pgtable-arm-v7s: Add support for non-strict mode

2018-09-20 Thread Robin Murphy
As for LPAE, it's simply a case of skipping the leaf invalidation for a
regular unmap, and ensuring that the one in split_blk_unmap() is paired
with an explicit sync ASAP rather than relying on one which might only
eventually happen way down the line.

Signed-off-by: Robin Murphy 
---

v8: Do this properly instead of just hacking around not doing it
(look, it turns out barely any more complicated!)

 drivers/iommu/io-pgtable-arm-v7s.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
index b5948ba6b3b3..445c3bde0480 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -587,6 +587,7 @@ static size_t arm_v7s_split_blk_unmap(struct arm_v7s_io_pgtable *data,
}
 
io_pgtable_tlb_add_flush(>iop, iova, size, size, true);
+   io_pgtable_tlb_sync(>iop);
return size;
 }
 
@@ -642,6 +643,13 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
io_pgtable_tlb_sync(iop);
ptep = iopte_deref(pte[i], lvl);
__arm_v7s_free_table(ptep, lvl + 1, data);
+   } else if (iop->cfg.quirks & IO_PGTABLE_QUIRK_NON_STRICT) {
+   /*
+* Order the PTE update against queueing the IOVA, to
+* guarantee that a flush callback from a different CPU
+* has observed it before the TLBIALL can be issued.
+*/
+   smp_wmb();
} else {
io_pgtable_tlb_add_flush(iop, iova, blk_size,
 blk_size, true);
@@ -712,7 +720,8 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
IO_PGTABLE_QUIRK_NO_PERMS |
IO_PGTABLE_QUIRK_TLBI_ON_MAP |
IO_PGTABLE_QUIRK_ARM_MTK_4GB |
-   IO_PGTABLE_QUIRK_NO_DMA))
+   IO_PGTABLE_QUIRK_NO_DMA |
+   IO_PGTABLE_QUIRK_NON_STRICT))
return NULL;
 
/* If ARM_MTK_4GB is enabled, the NO_PERMS is also expected. */
-- 
2.19.0.dirty



[PATCH v8 4/7] iommu/io-pgtable-arm: Add support for non-strict mode

2018-09-20 Thread Robin Murphy
From: Zhen Lei 

Non-strict mode is simply a case of skipping 'regular' leaf TLBIs, since
the sync is already factored out into ops->iotlb_sync at the core API
level. Non-leaf invalidations where we change the page table structure
itself still have to be issued synchronously in order to maintain walk
caches correctly.

To save having to reason about it too much, make sure the invalidation
in arm_lpae_split_blk_unmap() just performs its own unconditional sync
to minimise the window in which we're technically violating the break-
before-make requirement on a live mapping. This might work out redundant
with an outer-level sync for strict unmaps, but we'll never be splitting
blocks on a DMA fastpath anyway.

Signed-off-by: Zhen Lei 
[rm: tweak comment, commit message, split_blk_unmap logic and barriers]
Signed-off-by: Robin Murphy 
---

v8: Add barrier for the fiddly cross-cpu flush case

 drivers/iommu/io-pgtable-arm.c | 14 --
 drivers/iommu/io-pgtable.h |  5 +
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 2f79efd16a05..237cacd4a62b 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -576,6 +576,7 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
tablep = iopte_deref(pte, data);
} else if (unmap_idx >= 0) {
io_pgtable_tlb_add_flush(>iop, iova, size, size, true);
+   io_pgtable_tlb_sync(>iop);
return size;
}
 
@@ -609,6 +610,13 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
io_pgtable_tlb_sync(iop);
ptep = iopte_deref(pte, data);
__arm_lpae_free_pgtable(data, lvl + 1, ptep);
+   } else if (iop->cfg.quirks & IO_PGTABLE_QUIRK_NON_STRICT) {
+   /*
+* Order the PTE update against queueing the IOVA, to
+* guarantee that a flush callback from a different CPU
+* has observed it before the TLBIALL can be issued.
+*/
+   smp_wmb();
} else {
io_pgtable_tlb_add_flush(iop, iova, size, size, true);
}
@@ -771,7 +779,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
u64 reg;
struct arm_lpae_io_pgtable *data;
 
-   if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA))
+   if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA |
+   IO_PGTABLE_QUIRK_NON_STRICT))
return NULL;
 
data = arm_lpae_alloc_pgtable(cfg);
@@ -863,7 +872,8 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
struct arm_lpae_io_pgtable *data;
 
/* The NS quirk doesn't apply at stage 2 */
-   if (cfg->quirks & ~IO_PGTABLE_QUIRK_NO_DMA)
+   if (cfg->quirks & ~(IO_PGTABLE_QUIRK_NO_DMA |
+   IO_PGTABLE_QUIRK_NON_STRICT))
return NULL;
 
data = arm_lpae_alloc_pgtable(cfg);
diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
index 2df79093cad9..47d5ae559329 100644
--- a/drivers/iommu/io-pgtable.h
+++ b/drivers/iommu/io-pgtable.h
@@ -71,12 +71,17 @@ struct io_pgtable_cfg {
 *  be accessed by a fully cache-coherent IOMMU or CPU (e.g. for a
 *  software-emulated IOMMU), such that pagetable updates need not
 *  be treated as explicit DMA data.
+*
+* IO_PGTABLE_QUIRK_NON_STRICT: Skip issuing synchronous leaf TLBIs
+*  on unmap, for DMA domains using the flush queue mechanism for
+*  delayed invalidation.
 */
#define IO_PGTABLE_QUIRK_ARM_NS BIT(0)
#define IO_PGTABLE_QUIRK_NO_PERMS   BIT(1)
#define IO_PGTABLE_QUIRK_TLBI_ON_MAPBIT(2)
#define IO_PGTABLE_QUIRK_ARM_MTK_4GBBIT(3)
#define IO_PGTABLE_QUIRK_NO_DMA BIT(4)
+   #define IO_PGTABLE_QUIRK_NON_STRICT BIT(5)
unsigned long   quirks;
unsigned long   pgsize_bitmap;
unsigned intias;
-- 
2.19.0.dirty
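
As a plain-language restatement of the barrier reasoning above (illustrative
only, not code from the series):

/*
 * Illustrative ordering sketch for IO_PGTABLE_QUIRK_NON_STRICT:
 *
 *   CPU A (unmap path)               CPU B (flush-queue drain)
 *   ------------------               -------------------------
 *   clear PTE                        ...
 *   smp_wmb()                        ...
 *   queue_iova(...)                  observes the queued IOVA
 *                                    flush callback -> TLBIALL
 *
 * The smp_wmb() guarantees that by the time CPU B sees the IOVA in the
 * queue, the cleared PTE is visible too, so the invalidation issued from
 * the flush callback cannot leave a stale walk of the old entry behind
 * before the IOVA is recycled.
 */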



[PATCH v8 2/7] iommu/dma: Add support for non-strict mode

2018-09-20 Thread Robin Murphy
From: Zhen Lei 

With the flush queue infrastructure already abstracted into IOVA
domains, hooking it up in iommu-dma is pretty simple. Since there is a
degree of dependency on the IOMMU driver knowing what to do to play
along, we key the whole thing off a domain attribute which will be set
on default DMA ops domains to request non-strict invalidation. That way,
drivers can indicate the appropriate support by acknowledging the
attribute, and we can easily fall back to strict invalidation otherwise.

The flush queue callback needs a handle on the iommu_domain which owns
our cookie, so we have to add a pointer back to that, but neatly, that's
also sufficient to indicate whether we're using a flush queue or not,
and thus which way to release IOVAs. The only slight subtlety is
switching __iommu_dma_unmap() from calling iommu_unmap() to explicit
iommu_unmap_fast()/iommu_tlb_sync() so that we can elide the sync
entirely in non-strict mode.

Signed-off-by: Zhen Lei 
[rm: convert to domain attribute, tweak comments and commit message]
Signed-off-by: Robin Murphy 
---

v8:
 - Rewrite commit message/comments
 - Don't initialise "attr" unnecessarily
 - Rename "domain" to "fq_domain" for clarity
 - Don't let init_iova_flush_queue() be called more than once

 drivers/iommu/dma-iommu.c | 32 +++-
 include/linux/iommu.h |  1 +
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 511ff9a1d6d9..cc1bf786cfac 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -55,6 +55,9 @@ struct iommu_dma_cookie {
};
struct list_headmsi_page_list;
spinlock_t  msi_lock;
+
+   /* Domain for flush queue callback; NULL if flush queue not in use */
+   struct iommu_domain *fq_domain;
 };
 
 static inline size_t cookie_msi_granule(struct iommu_dma_cookie *cookie)
@@ -257,6 +260,20 @@ static int iova_reserve_iommu_regions(struct device *dev,
return ret;
 }
 
+static void iommu_dma_flush_iotlb_all(struct iova_domain *iovad)
+{
+   struct iommu_dma_cookie *cookie;
+   struct iommu_domain *domain;
+
+   cookie = container_of(iovad, struct iommu_dma_cookie, iovad);
+   domain = cookie->fq_domain;
+   /*
+* The IOMMU driver supporting DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
+* implies that ops->flush_iotlb_all must be non-NULL.
+*/
+   domain->ops->flush_iotlb_all(domain);
+}
+
 /**
  * iommu_dma_init_domain - Initialise a DMA mapping domain
  * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
@@ -275,6 +292,7 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iova_domain *iovad = >iovad;
unsigned long order, base_pfn, end_pfn;
+   int attr;
 
if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE)
return -EINVAL;
@@ -308,6 +326,13 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
}
 
init_iova_domain(iovad, 1UL << order, base_pfn);
+
+   if (!cookie->fq_domain && !iommu_domain_get_attr(domain,
+   DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE, ) && attr) {
+   cookie->fq_domain = domain;
+   init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all, NULL);
+   }
+
if (!dev)
return 0;
 
@@ -393,6 +418,9 @@ static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,
/* The MSI case is only ever cleaning up its most recent allocation */
if (cookie->type == IOMMU_DMA_MSI_COOKIE)
cookie->msi_iova -= size;
+   else if (cookie->fq_domain) /* non-strict mode */
+   queue_iova(iovad, iova_pfn(iovad, iova),
+   size >> iova_shift(iovad), 0);
else
free_iova_fast(iovad, iova_pfn(iovad, iova),
size >> iova_shift(iovad));
@@ -408,7 +436,9 @@ static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr,
dma_addr -= iova_off;
size = iova_align(iovad, size + iova_off);
 
-   WARN_ON(iommu_unmap(domain, dma_addr, size) != size);
+   WARN_ON(iommu_unmap_fast(domain, dma_addr, size) != size);
+   if (!cookie->fq_domain)
+   iommu_tlb_sync(domain);
iommu_dma_free_iova(cookie, dma_addr, size);
 }
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 87994c265bf5..decabe8e8dbe 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -124,6 +124,7 @@ enum iommu_attr {
DOMAIN_ATTR_FSL_PAMU_ENABLE,
DOMAIN_ATTR_FSL_PAMUV1,
DOMAIN_ATTR_NESTING,/* two stages of translation */
+   DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
DOMAIN_ATTR_MAX,
 };
 
-- 
2.19.0.dirty


[PATCH v8 1/7] iommu/arm-smmu-v3: Implement flush_iotlb_all hook

2018-09-20 Thread Robin Murphy
From: Zhen Lei 

.flush_iotlb_all is currently stubbed to arm_smmu_iotlb_sync() since the
only time it would ever need to actually do anything is for callers
doing their own explicit batching, e.g.:

iommu_unmap_fast(domain, ...);
iommu_unmap_fast(domain, ...);
iommu_iotlb_flush_all(domain, ...);

where since io-pgtable still issues the TLBI commands implicitly in the
unmap instead of implementing .iotlb_range_add, the "flush" only needs
to ensure completion of those already-in-flight invalidations.

However, we're about to start using it in anger with flush queues, so
let's get a proper implementation wired up.

Signed-off-by: Zhen Lei 
Reviewed-by: Robin Murphy 
[rm: document why it wasn't a bug]
Signed-off-by: Robin Murphy 
---
 drivers/iommu/arm-smmu-v3.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index e395f1ff3f81..f10c852479fc 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1781,6 +1781,14 @@ arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
return ops->unmap(ops, iova, size);
 }
 
+static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+   if (smmu_domain->smmu)
+   arm_smmu_tlb_inv_context(smmu_domain);
+}
+
 static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
 {
struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
@@ -2008,7 +2016,7 @@ static struct iommu_ops arm_smmu_ops = {
.attach_dev = arm_smmu_attach_dev,
.map= arm_smmu_map,
.unmap  = arm_smmu_unmap,
-   .flush_iotlb_all= arm_smmu_iotlb_sync,
+   .flush_iotlb_all= arm_smmu_flush_iotlb_all,
.iotlb_sync = arm_smmu_iotlb_sync,
.iova_to_phys   = arm_smmu_iova_to_phys,
.add_device = arm_smmu_add_device,
-- 
2.19.0.dirty
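
To make the batching scenario from the commit message concrete, a hedged
sketch of an IOMMU API user doing its own explicit batching (illustrative
only, not from this series; helper name made up):

static void my_unmap_batch(struct iommu_domain *domain,
			   unsigned long *iovas, size_t *sizes, int n)
{
	int i;

	/* Tear down the mappings without waiting for TLB maintenance... */
	for (i = 0; i < n; i++)
		iommu_unmap_fast(domain, iovas[i], sizes[i]);

	/* ...then pay for a single invalidation covering the lot. */
	iommu_flush_tlb_all(domain);
}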



[PATCH v8 0/7] Add non-strict mode support for iommu-dma

2018-09-20 Thread Robin Murphy
Hi all,

Hopefully this is the last spin of the series - I've now dropped my light
touch and fixed up all the various prose text, plus implemented the proper
quirk support for short-descriptor because it's actually just a trivial
cut-and-paste job.

Robin.


Robin Murphy (2):
  iommu/io-pgtable-arm-v7s: Add support for non-strict mode
  iommu/arm-smmu: Support non-strict mode

Zhen Lei (5):
  iommu/arm-smmu-v3: Implement flush_iotlb_all hook
  iommu/dma: Add support for non-strict mode
  iommu: Add "iommu.strict" command line option
  iommu/io-pgtable-arm: Add support for non-strict mode
  iommu/arm-smmu-v3: Add support for non-strict mode

 .../admin-guide/kernel-parameters.txt | 12 +++
 drivers/iommu/arm-smmu-v3.c   | 89 +-
 drivers/iommu/arm-smmu.c  | 93 +--
 drivers/iommu/dma-iommu.c | 32 ++-
 drivers/iommu/io-pgtable-arm-v7s.c| 11 ++-
 drivers/iommu/io-pgtable-arm.c| 14 ++-
 drivers/iommu/io-pgtable.h|  5 +
 drivers/iommu/iommu.c | 23 +
 include/linux/iommu.h |  1 +
 9 files changed, 225 insertions(+), 55 deletions(-)

-- 
2.19.0.dirty



Re: [RFC PATCH v2 00/10] vfio/mdev: IOMMU aware mediated device

2018-09-20 Thread Jacob Pan
On Wed, 19 Sep 2018 02:22:03 +
"Tian, Kevin"  wrote:

> > From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> > Sent: Tuesday, September 18, 2018 11:47 PM
> > 
> > On 14/09/2018 22:04, Jacob Pan wrote:  
> > >> This example only needs to modify first-level translation, and
> > >> works with SMMUv3. The kernel here could be the host, in which
> > >> case second-level translation is disabled in the SMMU, or it
> > >> could be the guest, in which case second-level mappings are
> > >> created by QEMU and first-level translation is managed by
> > >> assigning PASID tables to the guest.  
> > > There is a difference in case of guest SVA. VT-d v3 will bind
> > > guest PASID and guest CR3 instead of the guest PASID table. Then
> > > turn on nesting. In case of mdev, the second level is obtained
> > > from the aux domain which was setup for the default PASID. Or in
> > > case of PCI device, second level is harvested from RID2PASID.  
> > 
> > Right, though I wasn't talking about the host managing guest SVA
> > here, but a kernel binding the address space of one of its
> > userspace drivers to the mdev.
> >   
> > >> So (2) would use iommu_sva_bind_device(),  
> > > We would need something different than that for guest bind, just
> > > to show the two cases:>
> > > int iommu_sva_bind_device(struct device *dev, struct mm_struct
> > > *mm,  
> > int  
> > > *pasid, unsigned long flags, void *drvdata)
> > >
> > > (WIP)
> > > int sva_bind_gpasid(struct device *dev, struct gpasid_bind_data
> > > *data) where:
> > > /**
> > >  * struct gpasid_bind_data - Information about device and guest
> > > PASID binding
> > >  * @pasid:   Process address space ID used for the guest mm
> > >  * @addr_width:  Guest address width. Paging mode can also be
> > > derived.
> > >  * @gcr3:    Guest CR3 value from guest mm
> > >  */
> > > struct gpasid_bind_data {
> > >     __u32 pasid;
> > >     __u64 gcr3;
> > >     __u32 addr_width;
> > >     __u32 flags;
> > > #define IOMMU_SVA_GPASID_SRE    BIT(0) /* supervisor request */
> > > };
> > > Perhaps there is room to merge with io_mm, but the life cycle management
> > > of the guest PASID and host PASID will be different if you rely on the mm
> > > release callback rather than the FD.  
> 
> let's not call it gpasid here - that term makes sense only in the
> bind_pasid_table proposal, where the pasid table and thus the pasid space
> is managed by the guest. In the above context it is always a host pasid
> (allocated system-wide), which could point to a host cr3 (user
> process) or a guest cr3 (vm case).
> 
I agree this gpasid is confusing; we have a system-wide PASID
namespace. It was just a way to differentiate the different bind cases,
perhaps just a flag indicating the PASID is used by a guest.
i.e.
struct pasid_bind_data {
     __u32 pasid;
     __u64 gcr3;
     __u32 addr_width;
     __u32 flags;
#define IOMMU_SVA_GPASID_SRE    BIT(0) /* supervisor request */
#define IOMMU_SVA_PASID_GUEST   BIT(1) /* host PASID used by a guest */
};

> > I think gpasid management should stay separate from io_mm, since in
> > your case VFIO mechanisms are used for life cycle management of the
> > VM, similarly to the former bind_pasid_table proposal. For example
> > closing the container fd would unbind all guest page tables. The
> > QEMU process' address space lifetime seems like the wrong thing to
> > track for gpasid.  
> 
> I sort of agree (though not thinking through all the flow carefully).
> PASIDs are allocated per iommu domain, thus release also happens when
> domain is detached (along with container fd close).
> 
I also prefer to keep gpasid separate.

But I don't think we need a PASID per IOMMU domain for the guest SVA
case, assuming you are talking about the host IOMMU domain. The PASID
bind call is the result of a guest PASID cache flush with a PASID that
was previously allocated. The host just needs to put the gCR3 into the
PASID entry and then harvest the second level from the existing domain.
> >   
> > >> but (1) needs something
> > >> else. Aren't auxiliary domains suitable for (1)? Why limit
> > >> auxiliary domain to second-level or nested translation? It seems
> > >> silly to use a different API for first-level, since the flow in
> > >> userspace and VFIO is the same as your second-level case as far
> > >> as MAP_DMA ioctl goes. The difference is that in your case the
> > >> auxiliary domain supports an additional operation which binds
> > >> first-level page tables. An auxiliary domain that only supports
> > >> first-level wouldn't support this operation, but it can still
> > >> implement iommu_map/unmap/etc. 
> > > I think the intention is that when an mdev is created, we don't
> > > know whether it will be used for SVA or IOVA. So the aux domain is
> > > here to "hold a spot" for the default PASID such that MAP_DMA
> > > calls can work as usual, which is second level only. Later, if
> > > SVA is used on the mdev there will be another PASID allocated for
> > > that purpose. Do we need to create an 

Re: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive

2018-09-20 Thread Jerome Glisse
On Thu, Sep 20, 2018 at 01:55:43PM +0800, Kenneth Lee wrote:
> On Tue, Sep 18, 2018 at 09:03:14AM -0400, Jerome Glisse wrote:
> > On Tue, Sep 18, 2018 at 02:00:14PM +0800, Kenneth Lee wrote:
> > > On Mon, Sep 17, 2018 at 08:37:45AM -0400, Jerome Glisse wrote:
> > > > On Mon, Sep 17, 2018 at 04:39:40PM +0800, Kenneth Lee wrote:
> > > > > On Sun, Sep 16, 2018 at 09:42:44PM -0400, Jerome Glisse wrote:
> > > > > > So I want to summarize the issues I have, as these threads have dug
> > > > > > deep into the details. For this I would like to differentiate two
> > > > > > cases: first the easy one, when relying on SVA/SVM, then the second
> > > > > > one, when there is no SVA/SVM.
> > > > > 
> > > > > Thank you very much for the summary.
> > > > > 
> > > > > > In both cases your objectives as I understand them:
> > > > > > 
> > > > > > [R1]- expose a common user space API that makes it easy to share
> > > > > >   boilerplate code across many devices (discovering devices, opening
> > > > > >   device, creating context, creating command queue ...).
> > > > > > [R2]- try to share the device as much as possible up to device 
> > > > > > limits
> > > > > >   (number of independent queues the device has)
> > > > > > [R3]- minimize syscall by allowing user space to directly schedule 
> > > > > > on the
> > > > > >   device queue without a round trip to the kernel
> > > > > > 
> > > > > > I don't think I missed any.
> > > > > > 
> > > > > > 
> > > > > > (1) Device with SVA/SVM
> > > > > > 
> > > > > > For that case it is easy, you do not need to be in VFIO or part of
> > > > > > anything specific in the kernel. There is no security risk (modulo a
> > > > > > bug in the SVA/SVM silicon). Fork/exec is properly handled and
> > > > > > binding a process to a device is just a couple dozen lines of code.
> > > > > > 
> > > > > 
> > > > > This is right...logically. But the kernel has no clear definition of
> > > > > "Device with SVA/SVM" and no boiler plate for doing so. Then VFIO may
> > > > > become one of the boiler plates.
> > > > > 
> > > > > VFIO is one of the wrappers for IOMMU for user space, and maybe it is
> > > > > the only one. If we add that support within VFIO, which solves most of
> > > > > the problems of SVA/SVM, it will save a lot of work in the future.
> > > > 
> > > > You do not need to "wrap" IOMMU for SVA/SVM. Existing upstream SVA/SVM
> > > > users all do the SVA/SVM setup in a couple dozen lines and I fail to see
> > > > how it would require any more than that in your case.
> > > > 
> > > > 
> > > > > I think this is the key conflict between us. So could Alex please say
> > > > > something here? If VFIO is going to take this into its scope, we can
> > > > > try together to solve all the problems on the way. If it is not, that
> > > > > is also simple: we can just go another way to fulfil this part of the
> > > > > requirements, even if we have to duplicate most of the code.
> > > > > 
> > > > > Another point I need to emphasize here: because we have to replace
> > > > > the hardware queue on fork, it won't be very simple even in the
> > > > > SVA/SVM case.
> > > > 
> > > > I am assuming the hardware queue can only be set up by the kernel and
> > > > thus you are totally safe fork-wise, as the queue is set up against a
> > > > PASID and the child does not bind to any PASID, and you use VM_DONTCOPY
> > > > on the mmap of the hardware MMIO queue because you should really use
> > > > that flag for that.
> > > > 
> > > > 
> > > > > > (2) Device does not have SVA/SVM (or it is disabled)
> > > > > > 
> > > > > > You want to still allow the device to be part of your framework.
> > > > > > However here I see fundamental security issues and you move the
> > > > > > burden of being careful to user space, which I think is a bad idea.
> > > > > > We should never trust userspace from kernel space.
> > > > > > 
> > > > > > To keep the same API for the user space code you want a 1:1 mapping
> > > > > > between device physical address and process virtual address (ie if
> > > > > > device access device physical address A it is accessing the same
> > > > > > memory as what is backing the virtual address A in the process.
> > > > > > 
> > > > > > Security issues are on two things:
> > > > > > [I1]- fork/exec: a process that opened any such device and created an
> > > > > >   active queue can, without its knowledge, transfer control of its
> > > > > >   command queue through COW. The parent maps some anonymous region
> > > > > >   to the device as a command queue buffer, but because of COW the
> > > > > >   parent can be the first to copy on write and thus the child can
> > > > > >   inherit the original pages that are mapped to the hardware.
> > > > > >   Here the parent loses control and the child gains it.
> > 
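
As an illustration of the fork concern discussed above, here is a hypothetical
driver mmap handler that maps a hardware queue/doorbell page to userspace and
keeps it out of any fork()ed child by setting VM_DONTCOPY. The acc_queue name
and its fields are invented for this sketch; the point is only the vm_flags
handling.

static int acc_queue_mmap(struct file *filp, struct vm_area_struct *vma)
{
	struct acc_queue *q = filp->private_data;	/* hypothetical per-queue state */
	unsigned long size = vma->vm_end - vma->vm_start;

	if (size != PAGE_SIZE)
		return -EINVAL;

	/* MMIO mapping: uncached, never copied into a child on fork() */
	vma->vm_flags |= VM_IO | VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP;
	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

	return remap_pfn_range(vma, vma->vm_start,
			       q->mmio_phys >> PAGE_SHIFT,
			       size, vma->vm_page_prot);
}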

[PATCH] iommu: Fix passthrough option documentation

2018-09-20 Thread Robin Murphy
Document that the default for "iommu.passthrough" is now configurable.

Fixes: 58d1131777a4 ("iommu: Add config option to set passthrough as default")
Signed-off-by: Robin Murphy 
---

Joerg, Jon, as a pure doc fix I'll leave it to your discretion as to
whose tree is most appropriate (it shouldn't conflict with the other
IOMMU patches I'm working on)

 Documentation/admin-guide/kernel-parameters.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 406b91759b62..8ab2984692ff 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1767,7 +1767,7 @@
Format: { "0" | "1" }
0 - Use IOMMU translation for DMA.
1 - Bypass the IOMMU for DMA.
-   unset - Use IOMMU translation for DMA.
+   unset - Use value of CONFIG_IOMMU_DEFAULT_PASSTHROUGH.
 
io7=[HW] IO7 for Marvel based alpha systems
See comment before marvel_specify_io7 in
-- 
2.19.0.dirty



Re: [PATCH v2 0/3] iommu: Avoid DMA ops domain refcount contention

2018-09-20 Thread John Garry

On 17/09/2018 12:20, John Garry wrote:

On 14/09/2018 13:48, Will Deacon wrote:

Hi Robin,



Hi Robin,

I just spoke with Dongdong and we will test this version also so that we
may provide a "Tested-by" tag.



I tested this, so for series:
Tested-by: John Garry 

Thanks,
John


Thanks,
John


On Wed, Sep 12, 2018 at 04:24:11PM +0100, Robin Murphy wrote:

John raised the issue[1] that we have some unnecessary refcount
contention
in the DMA ops path which shows scalability problems now that we have
more
real high-performance hardware using iommu-dma. The x86 IOMMU drivers
are
sidestepping this by stashing domain references in archdata, but since
that's not very nice for architecture-agnostic code, I think it's
time to
look at a generic API-level solution.

These are a couple of quick patches based on the idea I had back when
first implementing iommu-dma, but didn't have any way to justify at the
time. However, the reports of 10-25% better networking performance on v1
suggest that it's very worthwhile (and far more significant than I ever
would have guessed).

As far as merging goes, I don't mind at all whether this goes via IOMMU,
or via dma-mapping provided Joerg's happy to ack it.


I think it makes most sense for Joerg to take this series via his tree.

Anyway, I've been running them on my TX2 box and things are happy enough,
so:

Tested-by: Will Deacon 

Will

.





.






Re: [Linux-4.19] ThinkPad T470 not booting with "intel_iommu=on"

2018-09-20 Thread Sedat Dilek
On Wed, Sep 5, 2018 at 2:04 PM, Sedat Dilek  wrote:
> Hi Joerg,
>
> "intel_ioomu=on" was working fine here on my ThinkPad T470 with
> debian/testing AMD64 with Linux v4.17.y and v4.18.y.
>
> When testing Linux v4.19-rc1 and v4.19-rc2, my machine does not boot -
> black screen.
> These kernels are built with LLVM/Clang v7.0.0rc2 (v4.18.y too).
>
> Furthermore, I switched from GRUB to systemd-boot v239.
>
> [ /boot/efi/loader/entries/debian.conf ]
>
> title Debian GNU/Linux buster/sid [ debian kernel ]
> linux /vmlinuz-4.17.0-3-amd64
> initrd /initrd.img-4.17.0-3-amd64
> options root=UUID=4c2aa544-6e86-44d2-9329-572623867b3d ro intel_iommu=on
>
> [ /boot/efi/loader/entries/debian-llvmlinux.conf ]
>
> title Debian GNU/Linux buster/sid [ llvmlinux kernel (EXPERIMENTAL) ]
> linux /vmlinuz-4.19.0-rc2-3-iniza-llvmlinux
> initrd /initrd.img-4.19.0-rc2-3-iniza-llvmlinux
> options root=UUID=4c2aa544-6e86-44d2-9329-572623867b3d ro
>
> Any hints or tips to debug this?
> Any other kernel boot cheats [1] I can try?
>
> My kernel-config and dmesg output is attached.
>
> Thanks in advance.
>
> Regards,
> - Sedat -
>
>
> Documentation/intel_txt.txt
> Documentation/admin-guide/kernel-parameters.txt
> Documentation/Intel-IOMMU.txt
>
> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/kernel-parameters.txt#n1645
> [2] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/Intel-IOMMU.txt
> [3] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/intel_txt.txt

[ TO JR @ SuSE ]

I still have this issue with Linux v4.19-rc4+.

- Sedat -


Re: [PATCH 04/13] iommu/io-pgtable-arm-v7s: Extend MediaTek 4GB mode

2018-09-20 Thread Robin Murphy

On 20/09/18 06:54, Yong Wu wrote:

Hi Robin,

Could you help take a look at this patch since I changed the v7s
here?
Thanks.


Sorry, I did have a very quick skim through this series when it landed 
in my inbox but I've not yet found the chance to go through it properly. 
Since I am actually back doing io-pgtable stuff right now, I'll take a 
look shortly.


Thanks,
Robin.


On Mon, 2018-09-03 at 14:01 +0800, Yong Wu wrote:

MediaTek extends the arm v7s descriptor to support DRAM over 4GB.

In the mt2712 and mt8173, it's called "4GB mode": the physical address
is from 0x4000_ to 0x1_3fff_, but from the EMI point of view it
is remapped to a high address from 0x1__ to 0x1__, and
bit 32 is always enabled. Thus, in the M4U, we always enable bit 9
for all PTEs, which means enabling bit 32 of the physical address.

But in mt8183, the M4U supports DRAM from 0x4000_ to 0x3__
which isn't remapped. We extend the PTEs: bit 9 represents bit 32 of the
PA and bit 4 represents bit 33 of the PA. Meanwhile the IOVA is still
32 bits.

In order to unify code, in the "4GB mode", we add the bit32 for the
physical address manually in our driver.

Correspondingly, adding bit 32 and bit 33 to the PA in iova_to_phys
has to be moved into v7s.

Signed-off-by: Yong Wu 
---
In mt8183, the PA is from 0x4000_ to 0x3__ while the IOVA
is still 32 bits. Actually, our HW extends the v7s pgtable: currently
the lvl1 pgtable is 16KB and our HW doubles it, but a 32-bit IOVA is
enough for us currently, thus we don't change it.
---
  drivers/iommu/io-pgtable-arm-v7s.c | 38 ++
  drivers/iommu/io-pgtable.h |  8 
  drivers/iommu/mtk_iommu.c  | 15 +--
  drivers/iommu/mtk_iommu.h  |  1 +
  4 files changed, 44 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
b/drivers/iommu/io-pgtable-arm-v7s.c
index b5948ba..47538dd 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -124,7 +124,9 @@
  #define ARM_V7S_TEX_MASK  0x7
  #define ARM_V7S_ATTR_TEX(val) (((val) & ARM_V7S_TEX_MASK) << 
ARM_V7S_TEX_SHIFT)
  
-#define ARM_V7S_ATTR_MTK_4GB		BIT(9) /* MTK extend it for 4GB mode */

+/* MTK extend the two bits below for over 4GB mode */
+#define ARM_V7S_ATTR_MTK_PA_BIT32  BIT(9)
+#define ARM_V7S_ATTR_MTK_PA_BIT33  BIT(4)
  
  /* *well, except for TEX on level 2 large pages, of course :( */

  #define ARM_V7S_CONT_PAGE_TEX_SHIFT   6
@@ -268,7 +270,8 @@ static void __arm_v7s_set_pte(arm_v7s_iopte *ptep, 
arm_v7s_iopte pte,
  }
  
  static arm_v7s_iopte arm_v7s_prot_to_pte(int prot, int lvl,

-struct io_pgtable_cfg *cfg)
+struct io_pgtable_cfg *cfg,
+phys_addr_t paddr) /* Only for MTK */
  {
bool ap = !(cfg->quirks & IO_PGTABLE_QUIRK_NO_PERMS);
arm_v7s_iopte pte = ARM_V7S_ATTR_NG | ARM_V7S_ATTR_S;
@@ -295,8 +298,12 @@ static arm_v7s_iopte arm_v7s_prot_to_pte(int prot, int lvl,
if (lvl == 1 && (cfg->quirks & IO_PGTABLE_QUIRK_ARM_NS))
pte |= ARM_V7S_ATTR_NS_SECTION;
  
-	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_4GB)

-   pte |= ARM_V7S_ATTR_MTK_4GB;
+   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_4GB) {
+   if (paddr & BIT_ULL(32))
+   pte |= ARM_V7S_ATTR_MTK_PA_BIT32;
+   if (paddr & BIT_ULL(33))
+   pte |= ARM_V7S_ATTR_MTK_PA_BIT33;
+   }
  
  	return pte;

  }
@@ -392,7 +399,7 @@ static int arm_v7s_init_pte(struct arm_v7s_io_pgtable *data,
return -EEXIST;
}
  
-	pte = arm_v7s_prot_to_pte(prot, lvl, cfg);

+   pte = arm_v7s_prot_to_pte(prot, lvl, cfg, paddr);
if (num_entries > 1)
pte = arm_v7s_pte_to_cont(pte, lvl);
  
@@ -484,7 +491,11 @@ static int arm_v7s_map(struct io_pgtable_ops *ops, unsigned long iova,

if (!(prot & (IOMMU_READ | IOMMU_WRITE)))
return 0;
  
-	if (WARN_ON(upper_32_bits(iova) || upper_32_bits(paddr)))

+   if (WARN_ON(upper_32_bits(iova)))
+   return -ERANGE;
+
+   if (WARN_ON(upper_32_bits(paddr) &&
+   !(iop->cfg.quirks & IO_PGTABLE_QUIRK_ARM_MTK_4GB)))
return -ERANGE;
  
  	ret = __arm_v7s_map(data, iova, paddr, size, prot, 1, data->pgd);

@@ -563,7 +574,7 @@ static size_t arm_v7s_split_blk_unmap(struct 
arm_v7s_io_pgtable *data,
num_entries = size >> ARM_V7S_LVL_SHIFT(2);
unmap_idx = ARM_V7S_LVL_IDX(iova, 2);
  
-	pte = arm_v7s_prot_to_pte(arm_v7s_pte_to_prot(blk_pte, 1), 2, cfg);

+   pte = arm_v7s_prot_to_pte(arm_v7s_pte_to_prot(blk_pte, 1), 2, cfg, 0);
if (num_entries > 1)
pte = arm_v7s_pte_to_cont(pte, 2);
  
@@ -677,7 +688,9 @@ static phys_addr_t arm_v7s_iova_to_phys(struct io_pgtable_ops *ops,


Re: [PATCH 1/1] iommu/arm-smmu: Add support to use Last level cache

2018-09-20 Thread Vivek Gautam
Hi Will,

On Wed, Jun 27, 2018 at 10:07 PM Will Deacon  wrote:
>
> Hi Vivek,
>
> On Tue, Jun 19, 2018 at 02:04:44PM +0530, Vivek Gautam wrote:
> > On Fri, Jun 15, 2018 at 10:22 PM, Will Deacon  wrote:
> > > On Fri, Jun 15, 2018 at 04:23:29PM +0530, Vivek Gautam wrote:
> > >> Qualcomm SoCs have an additional level of cache called the
> > >> System cache or Last level cache[1]. This cache sits right
> > >> before the DDR, and is tightly coupled with the memory
> > >> controller.
> > >> The cache is available to all the clients present in the
> > >> SoC system. The clients request their slices from this system
> > >> cache, make it active, and can then start using it. For these
> > >> clients with smmu, to start using the system cache for
> > >> dma buffers and related page tables [2], few of the memory
> > >> attributes need to be set accordingly.
> > >> This change makes the related memory Outer-Shareable, and
> > >> updates the MAIR with necessary protection.
> > >>
> > >> The MAIR attribute requirements are:
> > >> Inner Cacheability = 0
> > >> Outer Cacheability = 1, Write-Back Write-Allocate
> > >> Outer Shareability = 1
> > >
> > > Hmm, so is this cache coherent with the CPU or not?
> >
> > Thanks for reviewing.
> > Yes, this LLC is cache coherent with the CPU, so we mark it Outer-cacheable.
> > The different masters such as the GPU are able to allocate and activate a
> > slice in this Last Level Cache.
>
> What I mean is, for example, if the CPU writes some data using Normal, Inner
> Shareable, Inner/Outer Cacheable, Inner/Outer Write-back, Non-transient
> Read/Write-allocate and a device reads that data using your MAIR encoding
> above, is the device guaranteed to see the CPU writes after the CPU has
> executed a DSB instruction?

No, these MAIR configurations don't guarantee that devices will have a
coherent view of what the CPU writes. Not all devices can snoop into CPU
caches (only IO-coherent devices can). So a normal cached memory
configuration in the CPU MMU tables and the SMMU page tables is valid only
for the few devices that are IO-coherent.

Moreover, the CPU can look up in the system cache, and so can all devices;
allocation will depend on h/w configuration and memory attributes.
So anything that the CPU caches in the system cache will be coherently
visible to devices.

>
> I don't think so, because the ARM ARM would say that there's a mismatch on
> the Inner Cacheability attribute.
>
> > > Why don't normal
> > > non-cacheable mappings allocated in the LLC by default?
> >
> > Sorry, I couldn't fully understand your question here.
> > Few of the masters on qcom socs are not io-coherent, so for them
> > the IC has to be marked as 0.
>
> By IC you mean Inner Cacheability? In your MAIR encoding above, it is zero
> so I don't understand the problem. What goes wrong if non-coherent devices
> use your MAIR encoding for their DMA buffers?
>
> > But they are able to use the LLC with OC marked as 1.
>
> The issue here is that whatever attributes we put in the SMMU need to align
> with the attributes used by the CPU in order to avoid introducing mismatched
> aliases.

Not really, right?
Devices can use Inner Non-Cacheable, Outer-Cacheable (IC=0, OC=1) to allocate
into the system cache (as these devices don't want to allocate in their inner
caches), and the CPU will have a coherent view of these buffers/page-tables.
This should be normal cached, non-IO-coherent memory.

But anything that the CPU writes using Normal, Inner Shareable, Inner/Outer
Cacheable, Inner/Outer Write-back, Non-transient Read/Write-allocate may not
be visible to the device.
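
For concreteness, the combination described here (Inner Non-Cacheable with
Outer Write-Back Write-Allocate) corresponds to a MAIR attribute byte of 0xf4
(Outer = 0b1111, Inner = 0b0100); shareability is set separately in the PTE SH
field. Below is a hypothetical sketch of how such an attribute could sit
alongside the existing io-pgtable-arm ones; the SYS_CACHE names are invented
for illustration and are not an existing API:

/* existing io-pgtable-arm MAIR attributes, for context */
#define ARM_LPAE_MAIR_ATTR_NC			0x44	/* Inner/Outer Non-Cacheable */
#define ARM_LPAE_MAIR_ATTR_WBRWA		0xff	/* Inner/Outer Write-Back RWA */
#define ARM_LPAE_MAIR_ATTR_DEVICE		0x04	/* Device-nGnRE */

/* hypothetical: Inner NC, Outer WB RWA, for system-cache-only allocation */
#define ARM_LPAE_MAIR_ATTR_INC_OWBRWA		0xf4
#define ARM_LPAE_MAIR_ATTR_IDX_SYS_CACHE	3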

Also added Jordan, and Pratik to this thread.

Thanks & Regards
Vivek

> Currently, we support three types of mapping in the SMMU:
>
> 1. DMA non-coherent (e.g. "dma-coherent" is not set on the device)
> Normal, Inner Shareable, Inner/Outer Non-Cacheable
>
> 2. DMA coherent (e.g. "dma-coherent" is set on the device) [IOMMU_CACHE]
> Normal, Inner Shareable, Inner/Outer Cacheable, Inner/Outer
> Write-back, Non-transient Read/Write-allocate
>
> 3. MMIO (e.g. MSI doorbell) [IOMMU_MMIO]
> Device-nGnRE (Outer Shareable)
>
> So either you override one of these types (I was suggesting (1)) or you need
> to create a new memory type, along with the infrastructure for it to be
> recognised on a per-device basis and used by the DMA API so that we don't
> get mismatched aliases on the CPU.
>
> Will



-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH 1/1] iommu/arm-smmu: Add support to use Last level cache

2018-09-20 Thread Vivek Gautam
On Thu, Sep 20, 2018 at 1:05 AM Jordan Crouse  wrote:
>
> On Tue, Jul 24, 2018 at 03:13:37PM +0530, Vivek Gautam wrote:
> > Hi Will,
> >
> >
> > On Wed, Jun 27, 2018 at 10:07 PM, Will Deacon  wrote:
> > > Hi Vivek,
> > >
> > > On Tue, Jun 19, 2018 at 02:04:44PM +0530, Vivek Gautam wrote:
> > >> On Fri, Jun 15, 2018 at 10:22 PM, Will Deacon  
> > >> wrote:
> > >> > On Fri, Jun 15, 2018 at 04:23:29PM +0530, Vivek Gautam wrote:
> > >> >> Qualcomm SoCs have an additional level of cache called as
> > >> >> System cache or Last level cache[1]. This cache sits right
> > >> >> before the DDR, and is tightly coupled with the memory
> > >> >> controller.
> > >> >> The cache is available to all the clients present in the
> > >> >> SoC system. The clients request their slices from this system
> > >> >> cache, make it active, and can then start using it. For these
> > >> >> clients with smmu, to start using the system cache for
> > >> >> dma buffers and related page tables [2], few of the memory
> > >> >> attributes need to be set accordingly.
> > >> >> This change makes the related memory Outer-Shareable, and
> > >> >> updates the MAIR with necessary protection.
> > >> >>
> > >> >> The MAIR attribute requirements are:
> > >> >> Inner Cacheablity = 0
> > >> >> Outer Cacheablity = 1, Write-Back Write Allocate
> > >> >> Outer Shareablity = 1
> > >> >
> > >> > Hmm, so is this cache coherent with the CPU or not?
> > >>
> > >> Thanks for reviewing.
> > >> Yes, this LLC is cache coherent with CPU, so we mark for Outer-cacheable.
> > >> The different masters such as GPU as able to allocated and activate a 
> > >> slice
> > >> in this Last Level Cache.
> > >
> > > What I mean is, for example, if the CPU writes some data using Normal, 
> > > Inner
> > > Shareable, Inner/Outer Cacheable, Inner/Outer Write-back, Non-transient
> > > Read/Write-allocate and a device reads that data using your MAIR encoding
> > > above, is the device guaranteed to see the CPU writes after the CPU has
> > > executed a DSB instruction?
> > >
> > > I don't think so, because the ARM ARM would say that there's a mismatch on
> > > the Inner Cacheability attribute.
> > >
> > >> > Why don't normal
> > >> > non-cacheable mappings allocated in the LLC by default?
> > >>
> > >> Sorry, I couldn't fully understand your question here.
> > >> Few of the masters on qcom socs are not io-coherent, so for them
> > >> the IC has to be marked as 0.
> > >
> > > By IC you mean Inner Cacheability? In your MAIR encoding above, it is zero
> > > so I don't understand the problem. What goes wrong if non-coherent devices
> > > use your MAIR encoding for their DMA buffers?
> > >
> > >> But they are able to use the LLC with OC marked as 1.
> > >
> > > The issue here is that whatever attributes we put in the SMMU need to 
> > > align
> > > with the attributes used by the CPU in order to avoid introducing 
> > > mismatched
> > > aliases. Currently, we support three types of mapping in the SMMU:
> > >
> > > 1. DMA non-coherent (e.g. "dma-coherent" is not set on the device)
> > > Normal, Inner Shareable, Inner/Outer Non-Cacheable
> > >
> > > 2. DMA coherent (e.g. "dma-coherent" is set on the device) [IOMMU_CACHE]
> > > Normal, Inner Shareable, Inner/Outer Cacheable, Inner/Outer
> > > Write-back, Non-transient Read/Write-allocate
> > >
> > > 3. MMIO (e.g. MSI doorbell) [IOMMU_MMIO]
> > > Device-nGnRE (Outer Shareable)
> > >
> > > So either you override one of these types (I was suggesting (1)) or you 
> > > need
> > > to create a new memory type, along with the infrastructure for it to be
> > > recognised on a per-device basis and used by the DMA API so that we don't
> > > get mismatched aliases on the CPU.
> >
> > My apologies for delay in responding to this thread.
> > I have been digging and getting in touch with internal tech teams
> > to get more information on this. I will update as soon as I have enough
> > details.
> > Thanks.
>
> Hi Vivek.  I want to revive this discussion. I believe that Andy has pulled
> in the base LLCC support, so this is the remaining dependency we need to
> implement the LLCC in the GPU driver.

Hi Jordan, yes, I was in the process of gathering information about system
cache usage and the attribute configurations required when devices use the
system cache.

Let me respond to Will's questions now.

Thanks
Vivek
>
> Thanks,
> Jordan
>
> --
> The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project


--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


[PATCH 8/9] ia64: remove machvec_dma_sync_{single,sg}

2018-09-20 Thread Christoph Hellwig
The original form of these was added (to the HP zx1 platform only) by
the following bitkeeper commit (by way of the historic.git tree):

commit 66b99421d118a5ddd98a72913670b0fcf0a38d45
Author: Andrew Morton 
Date:   Sat Mar 13 17:05:37 2004 -0800

[PATCH] DMA: Fill gaping hole in DMA API interfaces.

From: "David S. Miller" 

The commit does not explain why we'd need the memory barrier on ia64,
it never included the swiotlb or SGI IOMMU based platforms, and also
failed to address the map/unmap parts of the dma mapping interface,
which should provide the same ordering semantics and actually are
commonly used.  The conclusion of this is that they were added in
error and should be removed.

Signed-off-by: Christoph Hellwig 
---
 arch/ia64/hp/common/sba_iommu.c |  4 
 arch/ia64/include/asm/dma-mapping.h |  5 -
 arch/ia64/kernel/machvec.c  | 16 
 arch/ia64/kernel/pci-dma.c  |  5 -
 4 files changed, 30 deletions(-)

diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index 671ce1e3f6f2..e8a93b07283e 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -2207,10 +2207,6 @@ const struct dma_map_ops sba_dma_ops = {
.unmap_page = sba_unmap_page,
.map_sg = sba_map_sg_attrs,
.unmap_sg   = sba_unmap_sg_attrs,
-   .sync_single_for_cpu= machvec_dma_sync_single,
-   .sync_sg_for_cpu= machvec_dma_sync_sg,
-   .sync_single_for_device = machvec_dma_sync_single,
-   .sync_sg_for_device = machvec_dma_sync_sg,
.dma_supported  = sba_dma_supported,
.mapping_error  = sba_dma_mapping_error,
 };
diff --git a/arch/ia64/include/asm/dma-mapping.h 
b/arch/ia64/include/asm/dma-mapping.h
index 76e4d6632d68..2b8cd4a6d958 100644
--- a/arch/ia64/include/asm/dma-mapping.h
+++ b/arch/ia64/include/asm/dma-mapping.h
@@ -16,11 +16,6 @@ extern const struct dma_map_ops *dma_ops;
 extern struct ia64_machine_vector ia64_mv;
 extern void set_iommu_machvec(void);
 
-extern void machvec_dma_sync_single(struct device *, dma_addr_t, size_t,
-   enum dma_data_direction);
-extern void machvec_dma_sync_sg(struct device *, struct scatterlist *, int,
-   enum dma_data_direction);
-
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
return platform_dma_get_ops(NULL);
diff --git a/arch/ia64/kernel/machvec.c b/arch/ia64/kernel/machvec.c
index 7bfe98859911..1b604d02250b 100644
--- a/arch/ia64/kernel/machvec.c
+++ b/arch/ia64/kernel/machvec.c
@@ -73,19 +73,3 @@ machvec_timer_interrupt (int irq, void *dev_id)
 {
 }
 EXPORT_SYMBOL(machvec_timer_interrupt);
-
-void
-machvec_dma_sync_single(struct device *hwdev, dma_addr_t dma_handle, size_t 
size,
-   enum dma_data_direction dir)
-{
-   mb();
-}
-EXPORT_SYMBOL(machvec_dma_sync_single);
-
-void
-machvec_dma_sync_sg(struct device *hwdev, struct scatterlist *sg, int n,
-   enum dma_data_direction dir)
-{
-   mb();
-}
-EXPORT_SYMBOL(machvec_dma_sync_sg);
diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
index afb43677f9ca..5a5bf5a82ac2 100644
--- a/arch/ia64/kernel/pci-dma.c
+++ b/arch/ia64/kernel/pci-dma.c
@@ -41,11 +41,6 @@ void __init pci_iommu_alloc(void)
 {
dma_ops = &intel_dma_ops;
 
-   intel_dma_ops.sync_single_for_cpu = machvec_dma_sync_single;
-   intel_dma_ops.sync_sg_for_cpu = machvec_dma_sync_sg;
-   intel_dma_ops.sync_single_for_device = machvec_dma_sync_single;
-   intel_dma_ops.sync_sg_for_device = machvec_dma_sync_sg;
-
/*
 * The order of these functions is important for
 * fall-back/fail-over reasons
-- 
2.18.0



[PATCH 7/9] ia64/sn2: remove no-ops dma sync methods

2018-09-20 Thread Christoph Hellwig
These do nothing but duplicate an assert that would have triggered
earlier when setting the DMA mask, so remove them.

Signed-off-by: Christoph Hellwig 
---
 arch/ia64/sn/pci/pci_dma.c | 29 -
 1 file changed, 29 deletions(-)

diff --git a/arch/ia64/sn/pci/pci_dma.c b/arch/ia64/sn/pci/pci_dma.c
index 74c934a997bb..3b944fba0a3e 100644
--- a/arch/ia64/sn/pci/pci_dma.c
+++ b/arch/ia64/sn/pci/pci_dma.c
@@ -314,31 +314,6 @@ static int sn_dma_map_sg(struct device *dev, struct 
scatterlist *sgl,
return nhwentries;
 }
 
-static void sn_dma_sync_single_for_cpu(struct device *dev, dma_addr_t 
dma_handle,
-  size_t size, enum dma_data_direction dir)
-{
-   BUG_ON(!dev_is_pci(dev));
-}
-
-static void sn_dma_sync_single_for_device(struct device *dev, dma_addr_t 
dma_handle,
- size_t size,
- enum dma_data_direction dir)
-{
-   BUG_ON(!dev_is_pci(dev));
-}
-
-static void sn_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
-  int nelems, enum dma_data_direction dir)
-{
-   BUG_ON(!dev_is_pci(dev));
-}
-
-static void sn_dma_sync_sg_for_device(struct device *dev, struct scatterlist 
*sg,
- int nelems, enum dma_data_direction dir)
-{
-   BUG_ON(!dev_is_pci(dev));
-}
-
 static int sn_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
 {
return 0;
@@ -467,10 +442,6 @@ static struct dma_map_ops sn_dma_ops = {
.unmap_page = sn_dma_unmap_page,
.map_sg = sn_dma_map_sg,
.unmap_sg   = sn_dma_unmap_sg,
-   .sync_single_for_cpu= sn_dma_sync_single_for_cpu,
-   .sync_sg_for_cpu= sn_dma_sync_sg_for_cpu,
-   .sync_single_for_device = sn_dma_sync_single_for_device,
-   .sync_sg_for_device = sn_dma_sync_sg_for_device,
.mapping_error  = sn_dma_mapping_error,
.dma_supported  = sn_dma_supported,
 };
-- 
2.18.0



[PATCH 9/9] intel-iommu: mark intel_dma_ops static

2018-09-20 Thread Christoph Hellwig
ia64 currently explicitly assigns it to dma_ops, but that same work is
already done by intel_iommu_init a little later, so we can remove the
duplicate assignment and mark the variable static.

Signed-off-by: Christoph Hellwig 
---
 arch/ia64/kernel/pci-dma.c  | 4 
 drivers/iommu/intel-iommu.c | 2 +-
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
index 5a5bf5a82ac2..fe988c49f01c 100644
--- a/arch/ia64/kernel/pci-dma.c
+++ b/arch/ia64/kernel/pci-dma.c
@@ -24,8 +24,6 @@ int force_iommu __read_mostly;
 
 int iommu_pass_through;
 
-extern struct dma_map_ops intel_dma_ops;
-
 static int __init pci_iommu_init(void)
 {
if (iommu_detected)
@@ -39,8 +37,6 @@ fs_initcall(pci_iommu_init);
 
 void __init pci_iommu_alloc(void)
 {
-   dma_ops = &intel_dma_ops;
-
/*
 * The order of these functions is important for
 * fall-back/fail-over reasons
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index e72efef97924..0ee6516de41a 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3895,7 +3895,7 @@ static int intel_mapping_error(struct device *dev, 
dma_addr_t dma_addr)
return !dma_addr;
 }
 
-const struct dma_map_ops intel_dma_ops = {
+static const struct dma_map_ops intel_dma_ops = {
.alloc = intel_alloc_coherent,
.free = intel_free_coherent,
.map_sg = intel_map_sg,
-- 
2.18.0



[PATCH 6/9] ia64: remove the unused iommu_dma_init function

2018-09-20 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 arch/ia64/include/asm/iommu.h | 1 -
 arch/ia64/kernel/pci-dma.c| 6 --
 2 files changed, 7 deletions(-)

diff --git a/arch/ia64/include/asm/iommu.h b/arch/ia64/include/asm/iommu.h
index 5397e5aa3704..7429a72f3f92 100644
--- a/arch/ia64/include/asm/iommu.h
+++ b/arch/ia64/include/asm/iommu.h
@@ -15,7 +15,6 @@ extern int iommu_detected;
 #define no_iommu   (1)
 #define iommu_detected (0)
 #endif
-extern void iommu_dma_init(void);
 extern void machvec_init(const char *name);
 
 #endif
diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
index 936af0ec7f8f..afb43677f9ca 100644
--- a/arch/ia64/kernel/pci-dma.c
+++ b/arch/ia64/kernel/pci-dma.c
@@ -37,12 +37,6 @@ static int __init pci_iommu_init(void)
 /* Must execute after PCI subsystem */
 fs_initcall(pci_iommu_init);
 
-void __init
-iommu_dma_init(void)
-{
-   return;
-}
-
 void __init pci_iommu_alloc(void)
 {
dma_ops = &intel_dma_ops;
-- 
2.18.0



[PATCH 5/9] ia64: remove the unused pci_iommu_shutdown function

2018-09-20 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 arch/ia64/include/asm/iommu.h | 1 -
 arch/ia64/kernel/pci-dma.c| 5 -
 2 files changed, 6 deletions(-)

diff --git a/arch/ia64/include/asm/iommu.h b/arch/ia64/include/asm/iommu.h
index 156b9d8e1932..5397e5aa3704 100644
--- a/arch/ia64/include/asm/iommu.h
+++ b/arch/ia64/include/asm/iommu.h
@@ -5,7 +5,6 @@
 /* 10 seconds */
 #define DMAR_OPERATION_TIMEOUT (((cycles_t) local_cpu_data->itc_freq)*10)
 
-extern void pci_iommu_shutdown(void);
 extern void no_iommu_init(void);
 #ifdef CONFIG_INTEL_IOMMU
 extern int force_iommu, no_iommu;
diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
index 924966e5aa25..936af0ec7f8f 100644
--- a/arch/ia64/kernel/pci-dma.c
+++ b/arch/ia64/kernel/pci-dma.c
@@ -37,11 +37,6 @@ static int __init pci_iommu_init(void)
 /* Must execute after PCI subsystem */
 fs_initcall(pci_iommu_init);
 
-void pci_iommu_shutdown(void)
-{
-   return;
-}
-
 void __init
 iommu_dma_init(void)
 {
-- 
2.18.0



misc ia64 cleanups (resend)

2018-09-20 Thread Christoph Hellwig
A couple random cleanups I stumbled upon when doing dma related work.


[PATCH 2/9] ia64: remove the dead iommu_sac_force variable

2018-09-20 Thread Christoph Hellwig
Looks like copy and paste from x86 that never actually got used.

Signed-off-by: Christoph Hellwig 
---
 arch/ia64/kernel/pci-dma.c | 19 ---
 1 file changed, 19 deletions(-)

diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
index b5df084c0af4..50b6ad282a90 100644
--- a/arch/ia64/kernel/pci-dma.c
+++ b/arch/ia64/kernel/pci-dma.c
@@ -18,8 +18,6 @@
 dma_addr_t bad_dma_address __read_mostly;
 EXPORT_SYMBOL(bad_dma_address);
 
-static int iommu_sac_force __read_mostly;
-
 int no_iommu __read_mostly;
 #ifdef CONFIG_IOMMU_DEBUG
 int force_iommu __read_mostly = 1;
@@ -61,23 +59,6 @@ int iommu_dma_supported(struct device *dev, u64 mask)
if (mask < DMA_BIT_MASK(24))
return 0;
 
-   /* Tell the device to use SAC when IOMMU force is on.  This
-  allows the driver to use cheaper accesses in some cases.
-
-  Problem with this is that if we overflow the IOMMU area and
-  return DAC as fallback address the device may not handle it
-  correctly.
-
-  As a special case some controllers have a 39bit address
-  mode that is as efficient as 32bit (aic79xx). Don't force
-  SAC for these.  Assume all masks <= 40 bits are of this
-  type. Normally this doesn't make any difference, but gives
-  more gentle handling of IOMMU overflow. */
-   if (iommu_sac_force && (mask >= DMA_BIT_MASK(40))) {
-   dev_info(dev, "Force SAC with mask %llx\n", mask);
-   return 0;
-   }
-
return 1;
 }
 EXPORT_SYMBOL(iommu_dma_supported);
-- 
2.18.0



[PATCH 4/9] ia64: remove the unused bad_dma_address symbol

2018-09-20 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 arch/ia64/kernel/pci-dma.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
index 3c2884bef3d4..924966e5aa25 100644
--- a/arch/ia64/kernel/pci-dma.c
+++ b/arch/ia64/kernel/pci-dma.c
@@ -15,9 +15,6 @@
 #include 
 #include 
 
-dma_addr_t bad_dma_address __read_mostly;
-EXPORT_SYMBOL(bad_dma_address);
-
 int no_iommu __read_mostly;
 #ifdef CONFIG_IOMMU_DEBUG
 int force_iommu __read_mostly = 1;
-- 
2.18.0



[PATCH 3/9] ia64: remove iommu_dma_supported

2018-09-20 Thread Christoph Hellwig
The generic dma_direct_supported helper already used by intel-iommu on
x86 does a better job than the ia64 reimplementation.

Signed-off-by: Christoph Hellwig 
---
 arch/ia64/kernel/pci-dma.c  | 13 -
 drivers/iommu/intel-iommu.c |  2 --
 2 files changed, 15 deletions(-)

diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
index 50b6ad282a90..3c2884bef3d4 100644
--- a/arch/ia64/kernel/pci-dma.c
+++ b/arch/ia64/kernel/pci-dma.c
@@ -51,18 +51,6 @@ iommu_dma_init(void)
return;
 }
 
-int iommu_dma_supported(struct device *dev, u64 mask)
-{
-   /* Copied from i386. Doesn't make much sense, because it will
-  only work for pci_alloc_coherent.
-  The caller just has to use GFP_DMA in this case. */
-   if (mask < DMA_BIT_MASK(24))
-   return 0;
-
-   return 1;
-}
-EXPORT_SYMBOL(iommu_dma_supported);
-
 void __init pci_iommu_alloc(void)
 {
dma_ops = &intel_dma_ops;
@@ -71,7 +59,6 @@ void __init pci_iommu_alloc(void)
intel_dma_ops.sync_sg_for_cpu = machvec_dma_sync_sg;
intel_dma_ops.sync_single_for_device = machvec_dma_sync_single;
intel_dma_ops.sync_sg_for_device = machvec_dma_sync_sg;
-   intel_dma_ops.dma_supported = iommu_dma_supported;
 
/*
 * The order of these functions is important for
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 5f3f10cf9d9d..e72efef97924 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3903,9 +3903,7 @@ const struct dma_map_ops intel_dma_ops = {
.map_page = intel_map_page,
.unmap_page = intel_unmap_page,
.mapping_error = intel_mapping_error,
-#ifdef CONFIG_X86
.dma_supported = dma_direct_supported,
-#endif
 };
 
 static inline int iommu_domain_cache_init(void)
-- 
2.18.0



[PATCH 1/9] ia64: remove the kern_mem_attribute export

2018-09-20 Thread Christoph Hellwig
Not actually used anywhere.

Signed-off-by: Christoph Hellwig 
---
 arch/ia64/kernel/efi.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/ia64/kernel/efi.c b/arch/ia64/kernel/efi.c
index 9c09bf390cce..f77d80edddfe 100644
--- a/arch/ia64/kernel/efi.c
+++ b/arch/ia64/kernel/efi.c
@@ -842,7 +842,6 @@ kern_mem_attribute (unsigned long phys_addr, unsigned long 
size)
} while (md);
return 0;   /* never reached */
 }
-EXPORT_SYMBOL(kern_mem_attribute);
 
 int
 valid_phys_addr_range (phys_addr_t phys_addr, unsigned long size)
-- 
2.18.0



Re: merge dma_direct_ops and dma_noncoherent_ops v3

2018-09-20 Thread Christoph Hellwig
I've pulled this into the dma-mapping for-next tree now to make
progress.  Any further review comments should be dealt with in
incremental patches.


Re: [PATCH] kernel/dma: Fix panic caused by passing cma to command line

2018-09-20 Thread Christoph Hellwig
On Tue, Sep 18, 2018 at 11:28:33AM +0200, Marek Szyprowski wrote:
> Hi
> 
> On 2018-09-17 05:24, zhe...@windriver.com wrote:
> > From: He Zhe 
> >
> > early_cma does not check its input argument before passing it to
> > simple_strtoull. The argument would be a NULL pointer if "cma", without
> > its value, is set on the command line, and thus causes the following panic.
> >
> > PANIC: early exception 0xe3 IP 10:a3e9db8d error 0 cr2 0x0
> > [0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 
> > 4.19.0-rc3-yocto-standard+ #7
> > [0.00] RIP: 0010:_parse_integer_fixup_radix+0xd/0x70
> > ...
> > [0.00] Call Trace:
> > [0.00]  simple_strtoull+0x29/0x70
> > [0.00]  memparse+0x26/0x90
> > [0.00]  early_cma+0x17/0x6a
> > [0.00]  do_early_param+0x57/0x8e
> > [0.00]  parse_args+0x208/0x320
> > [0.00]  ? rdinit_setup+0x30/0x30
> > [0.00]  parse_early_options+0x29/0x2d
> > [0.00]  ? rdinit_setup+0x30/0x30
> > [0.00]  parse_early_param+0x36/0x4d
> > [0.00]  setup_arch+0x336/0x99e
> > [0.00]  start_kernel+0x6f/0x4e6
> > [0.00]  x86_64_start_reservations+0x24/0x26
> > [0.00]  x86_64_start_kernel+0x6f/0x72
> > [0.00]  secondary_startup_64+0xa4/0xb0
> >
> > This patch adds a check to prevent the panic.
> >
> > Signed-off-by: He Zhe 
> > Cc: sta...@vger.kernel.org
> > Cc: h...@lst.de
> > Cc: m.szyprow...@samsung.com
> > Cc: robin.mur...@arm.com
> 
> Thanks for the fix.
> 
> Reviewed-by: Marek Szyprowski 

Thanks, added to the dma-mapping tree.
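
For readers without the patch at hand, the guard amounts to rejecting a bare
"cma" option before memparse() ever sees the NULL argument. A rough sketch of
early_cma() in kernel/dma/contiguous.c with such a check (not necessarily the
exact hunk that was applied):

static int __init early_cma(char *p)
{
	/* "cma" given without "=<size>": p is NULL, bail out early */
	if (!p) {
		pr_err("cma: no size specified\n");
		return -EINVAL;
	}

	size_cmdline = memparse(p, &p);
	return 0;
}
early_param("cma", early_cma);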