Re: [PATCH v3 1/9] mm: Introduce new vm_insert_range API

2018-12-07 Thread Souptick Joarder
On Sat, Dec 8, 2018 at 2:40 AM Robin Murphy  wrote:
>
> On 2018-12-07 7:28 pm, Souptick Joarder wrote:
> > On Fri, Dec 7, 2018 at 10:41 PM Matthew Wilcox  wrote:
> >>
> >> On Fri, Dec 07, 2018 at 03:34:56PM +, Robin Murphy wrote:
>  +int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
>  +   struct page **pages, unsigned long page_count)
>  +{
>  +   unsigned long uaddr = addr;
>  +   int ret = 0, i;
> >>>
> >>> Some of the sites being replaced were effectively ensuring that vma and
> >>> pages were mutually compatible as an initial condition - would it be worth
> >>> adding something here for robustness, e.g.:
> >>>
> >>> + if (page_count != vma_pages(vma))
> >>> + return -ENXIO;
> >>
> >> I think we want to allow this to be used to populate part of a VMA.
> >> So perhaps:
> >>
> >>  if (page_count > vma_pages(vma))
> >>  return -ENXIO;
> >
> > Ok, This can be added.
> >
> > I think Patch [2/9] is the only leftover place where this
> > check could be removed.
>
> Right, 9/9 could also have relied on my stricter check here, but since
> it's really testing whether it actually managed to allocate vma_pages()
> worth of pages earlier, Matthew's more lenient version won't help for
> that one.
>
> (Why privcmd_buf_mmap() doesn't clean up and return an error
> as soon as that allocation loop fails, without taking the mutex under
> which it still does a bunch more pointless work to only undo it again,
> is a mind-boggling mystery, but that's not our problem here...)

I think some clean up can be done here in a separate patch.
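
For reference, a minimal sketch of how vm_insert_range() could end up
looking with the bounds check folded in. The loop body isn't quoted in
this thread, so it is reconstructed here from the vm_insert_page() call
sites being replaced -- treat it as an illustration, not the posted patch:

int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
			struct page **pages, unsigned long page_count)
{
	unsigned long uaddr = addr;
	int ret = 0, i;

	/* Matthew's suggestion: allow populating part of a VMA,
	 * but never more pages than the VMA can hold. */
	if (page_count > vma_pages(vma))
		return -ENXIO;

	for (i = 0; i < page_count; i++) {
		ret = vm_insert_page(vma, uaddr, pages[i]);
		if (ret)
			break;
		uaddr += PAGE_SIZE;
	}

	return ret;
}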


Re: [PATCH v3 1/9] mm: Introduce new vm_insert_range API

2018-12-07 Thread Robin Murphy

On 2018-12-07 7:28 pm, Souptick Joarder wrote:

On Fri, Dec 7, 2018 at 10:41 PM Matthew Wilcox  wrote:


On Fri, Dec 07, 2018 at 03:34:56PM +, Robin Murphy wrote:

+int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
+   struct page **pages, unsigned long page_count)
+{
+   unsigned long uaddr = addr;
+   int ret = 0, i;


Some of the sites being replaced were effectively ensuring that vma and
pages were mutually compatible as an initial condition - would it be worth
adding something here for robustness, e.g.:

+ if (page_count != vma_pages(vma))
+ return -ENXIO;


I think we want to allow this to be used to populate part of a VMA.
So perhaps:

 if (page_count > vma_pages(vma))
 return -ENXIO;


Ok, This can be added.

I think Patch [2/9] is the only leftover place where this
check could be removed.


Right, 9/9 could also have relied on my stricter check here, but since 
it's really testing whether it actually managed to allocate vma_pages() 
worth of pages earlier, Matthew's more lenient version won't help for 
that one. (Why privcmd_buf_mmap() doesn't clean up and return an error 
as soon as that allocation loop fails, without taking the mutex under 
which it still does a bunch more pointless work to only undo it again, 
is a mind-boggling mystery, but that's not our problem here...)


Robin.


Re: [PATCH v3 6/9] iommu/dma-iommu.c: Convert to use vm_insert_range

2018-12-07 Thread Souptick Joarder
On Fri, Dec 7, 2018 at 7:17 PM Robin Murphy  wrote:
>
> On 06/12/2018 18:43, Souptick Joarder wrote:
> > Convert to use vm_insert_range() to map range of kernel
> > memory to user vma.
> >
> > Signed-off-by: Souptick Joarder 
> > Reviewed-by: Matthew Wilcox 
> > ---
> >   drivers/iommu/dma-iommu.c | 13 +++--
> >   1 file changed, 3 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index d1b0475..a2c65e2 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -622,17 +622,10 @@ struct page **iommu_dma_alloc(struct device *dev, 
> > size_t size, gfp_t gfp,
> >
> >   int iommu_dma_mmap(struct page **pages, size_t size, struct 
> > vm_area_struct *vma)
> >   {
> > - unsigned long uaddr = vma->vm_start;
> > - unsigned int i, count = PAGE_ALIGN(size) >> PAGE_SHIFT;
> > - int ret = -ENXIO;
> > + unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
> >
> > - for (i = vma->vm_pgoff; i < count && uaddr < vma->vm_end; i++) {
> > - ret = vm_insert_page(vma, uaddr, pages[i]);
> > - if (ret)
> > - break;
> > - uaddr += PAGE_SIZE;
> > - }
> > - return ret;
> > + return vm_insert_range(vma, vma->vm_start,
> > + pages + vma->vm_pgoff, count);
>
> You also need to adjust count to compensate for the pages skipped by
> vm_pgoff, otherwise you've got an out-of-bounds dereference triggered
> from userspace, which is pretty high up the "not good" scale (not to
> mention the entire call would then propagate -EFAULT back from
> vm_insert_page() and thus always appear to fail for nonzero offsets).

So this should be something similar to ->

return vm_insert_range(vma, vma->vm_start,
pages + vma->vm_pgoff, count - vma->vm_pgoff);
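
i.e. the converted function would roughly look like the sketch below
(v3 conversion plus the offset fix; shown only to illustrate the intent,
not the final posted patch):

int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct *vma)
{
	unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;

	/* Skip the pages covered by vm_pgoff and shrink the count to match,
	 * so we neither walk past the end of pages[] nor spuriously fail
	 * for a nonzero offset. */
	return vm_insert_range(vma, vma->vm_start,
			pages + vma->vm_pgoff, count - vma->vm_pgoff);
}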


Re: [PATCH v3 1/9] mm: Introduce new vm_insert_range API

2018-12-07 Thread Souptick Joarder
On Fri, Dec 7, 2018 at 10:41 PM Matthew Wilcox  wrote:
>
> On Fri, Dec 07, 2018 at 03:34:56PM +, Robin Murphy wrote:
> > > +int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
> > > +   struct page **pages, unsigned long page_count)
> > > +{
> > > +   unsigned long uaddr = addr;
> > > +   int ret = 0, i;
> >
> > Some of the sites being replaced were effectively ensuring that vma and
> > pages were mutually compatible as an initial condition - would it be worth
> > adding something here for robustness, e.g.:
> >
> > + if (page_count != vma_pages(vma))
> > + return -ENXIO;
>
> I think we want to allow this to be used to populate part of a VMA.
> So perhaps:
>
> if (page_count > vma_pages(vma))
> return -ENXIO;

Ok, This can be added.

I think Patch [2/9] is the only leftover place where this
check could be removed.


[PATCH 15/15] dma-mapping: bypass indirect calls for dma-direct

2018-12-07 Thread Christoph Hellwig
Avoid expensive indirect calls in the fast path DMA mapping
operations by directly calling the dma_direct_* ops if we are using
the directly mapped DMA operations.
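
The idea is that a NULL dev->dma_ops now means "use the direct mapping",
so the inline wrappers can branch straight to the dma_direct_* helpers
instead of going through a retpolined indirect call. A simplified sketch
of the pattern for one entry point -- the real patch applies it to all of
the map/unmap/sync wrappers, and the exact helper shape may differ:

static inline bool dma_is_direct(const struct dma_map_ops *ops)
{
	/* NULL ops means the device uses the generic direct mapping. */
	return likely(!ops);
}

static inline dma_addr_t dma_map_page_attrs(struct device *dev,
		struct page *page, unsigned long offset, size_t size,
		enum dma_data_direction dir, unsigned long attrs)
{
	const struct dma_map_ops *ops = get_dma_ops(dev);
	dma_addr_t addr;

	BUG_ON(!valid_dma_direction(dir));
	if (dma_is_direct(ops))
		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
	else
		addr = ops->map_page(dev, page, offset, size, dir, attrs);
	debug_dma_map_page(dev, page, offset, size, dir, addr, false);

	return addr;
}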

Signed-off-by: Christoph Hellwig 
---
 arch/alpha/include/asm/dma-mapping.h |   2 +-
 arch/arc/mm/cache.c  |   2 +-
 arch/arm/include/asm/dma-mapping.h   |   2 +-
 arch/arm/mm/dma-mapping-nommu.c  |  14 +---
 arch/arm64/mm/dma-mapping.c  |   3 -
 arch/ia64/hp/common/hwsw_iommu.c |   2 +-
 arch/ia64/hp/common/sba_iommu.c  |   4 +-
 arch/ia64/kernel/dma-mapping.c   |   1 -
 arch/mips/include/asm/dma-mapping.h  |   2 +-
 arch/parisc/kernel/setup.c   |   4 -
 arch/sparc/include/asm/dma-mapping.h |   4 +-
 arch/x86/kernel/pci-dma.c|   2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c  |   2 +-
 drivers/iommu/amd_iommu.c|  13 +---
 include/asm-generic/dma-mapping.h|   2 +-
 include/linux/dma-direct.h   |  17 
 include/linux/dma-mapping.h  | 111 +++
 include/linux/dma-noncoherent.h  |   5 +-
 kernel/dma/direct.c  |  37 ++---
 kernel/dma/mapping.c |  40 ++
 20 files changed, 150 insertions(+), 119 deletions(-)

diff --git a/arch/alpha/include/asm/dma-mapping.h 
b/arch/alpha/include/asm/dma-mapping.h
index 8beeafd4f68e..0ee6a5c99b16 100644
--- a/arch/alpha/include/asm/dma-mapping.h
+++ b/arch/alpha/include/asm/dma-mapping.h
@@ -7,7 +7,7 @@ extern const struct dma_map_ops alpha_pci_ops;
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
 #ifdef CONFIG_ALPHA_JENSEN
-   return &dma_direct_ops;
+   return NULL;
 #else
return &alpha_pci_ops;
 #endif
diff --git a/arch/arc/mm/cache.c b/arch/arc/mm/cache.c
index f2701c13a66b..e188bb3ede53 100644
--- a/arch/arc/mm/cache.c
+++ b/arch/arc/mm/cache.c
@@ -1280,7 +1280,7 @@ void __init arc_cache_init_master(void)
/*
 * In case of IOC (say IOC+SLC case), pointers above could still be set
 * but end up not being relevant as the first function in chain is not
-* called at all for @dma_direct_ops
+* called at all for devices using coherent DMA.
 * arch_sync_dma_for_cpu() -> dma_cache_*() -> __dma_cache_*()
 */
 }
diff --git a/arch/arm/include/asm/dma-mapping.h 
b/arch/arm/include/asm/dma-mapping.h
index 965b7c846ecb..31d3b96f0f4b 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -18,7 +18,7 @@ extern const struct dma_map_ops arm_coherent_dma_ops;
 
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
-   return IS_ENABLED(CONFIG_MMU) ? &arm_dma_ops : &dma_direct_ops;
+   return IS_ENABLED(CONFIG_MMU) ? &arm_dma_ops : NULL;
 }
 
 #ifdef __arch_page_to_dma
diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index 712416ecd8e6..f304b10e23a4 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -22,7 +22,7 @@
 #include "dma.h"
 
 /*
- *  dma_direct_ops is used if
+ *  The generic direct mapping code is used if
  *   - MMU/MPU is off
  *   - cpu is v7m w/o cache support
  *   - device is coherent
@@ -209,16 +209,9 @@ const struct dma_map_ops arm_nommu_dma_ops = {
 };
 EXPORT_SYMBOL(arm_nommu_dma_ops);
 
-static const struct dma_map_ops *arm_nommu_get_dma_map_ops(bool coherent)
-{
-   return coherent ? &dma_direct_ops : &arm_nommu_dma_ops;
-}
-
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
const struct iommu_ops *iommu, bool coherent)
 {
-   const struct dma_map_ops *dma_ops;
-
if (IS_ENABLED(CONFIG_CPU_V7M)) {
/*
 * Cache support for v7m is optional, so can be treated as
@@ -234,7 +227,6 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, 
u64 size,
dev->archdata.dma_coherent = (get_cr() & CR_M) ? coherent : 
true;
}
 
-   dma_ops = arm_nommu_get_dma_map_ops(dev->archdata.dma_coherent);
-
-   set_dma_ops(dev, dma_ops);
+   if (!dev->archdata.dma_coherent)
+   set_dma_ops(dev, &arm_nommu_dma_ops);
 }
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index ab1e417204d0..95eda81e3f2d 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -462,9 +462,6 @@ static void __iommu_setup_dma_ops(struct device *dev, u64 
dma_base, u64 size,
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
const struct iommu_ops *iommu, bool coherent)
 {
-   if (!dev->dma_ops)
-   dev->dma_ops = &dma_direct_ops;
-
dev->dma_coherent = coherent;
__iommu_setup_dma_ops(dev, dma_base, size, iommu);
 
diff --git a/arch/ia64/hp/common/hwsw_iommu.c b/arch/ia64/hp/common/hwsw_iommu.c
index f40ca499b246..8840ed97712f 100644
--- a/arch/ia64/hp/common/hwsw_iommu.c
+++ b/arch/ia64/hp/common/hwsw_iommu.

[PATCH 14/15] vmd: use the proper dma_* APIs instead of direct methods calls

2018-12-07 Thread Christoph Hellwig
With the bypass support for the direct mapping we might not always have
methods to call, so use the proper APIs instead.  The only downside is
that we will create two dma-debug entries for each mapping if
CONFIG_DMA_DEBUG is enabled.

Signed-off-by: Christoph Hellwig 
---
 drivers/pci/controller/vmd.c | 42 +++-
 1 file changed, 17 insertions(+), 25 deletions(-)

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 98ce79eac128..3890812cdf87 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -307,39 +307,32 @@ static struct device *to_vmd_dev(struct device *dev)
return &vmd->dev->dev;
 }
 
-static const struct dma_map_ops *vmd_dma_ops(struct device *dev)
-{
-   return get_dma_ops(to_vmd_dev(dev));
-}
-
 static void *vmd_alloc(struct device *dev, size_t size, dma_addr_t *addr,
   gfp_t flag, unsigned long attrs)
 {
-   return vmd_dma_ops(dev)->alloc(to_vmd_dev(dev), size, addr, flag,
-  attrs);
+   return dma_alloc_attrs(to_vmd_dev(dev), size, addr, flag, attrs);
 }
 
 static void vmd_free(struct device *dev, size_t size, void *vaddr,
 dma_addr_t addr, unsigned long attrs)
 {
-   return vmd_dma_ops(dev)->free(to_vmd_dev(dev), size, vaddr, addr,
- attrs);
+   return dma_free_attrs(to_vmd_dev(dev), size, vaddr, addr, attrs);
 }
 
 static int vmd_mmap(struct device *dev, struct vm_area_struct *vma,
void *cpu_addr, dma_addr_t addr, size_t size,
unsigned long attrs)
 {
-   return vmd_dma_ops(dev)->mmap(to_vmd_dev(dev), vma, cpu_addr, addr,
- size, attrs);
+   return dma_mmap_attrs(to_vmd_dev(dev), vma, cpu_addr, addr, size,
+   attrs);
 }
 
 static int vmd_get_sgtable(struct device *dev, struct sg_table *sgt,
   void *cpu_addr, dma_addr_t addr, size_t size,
   unsigned long attrs)
 {
-   return vmd_dma_ops(dev)->get_sgtable(to_vmd_dev(dev), sgt, cpu_addr,
-addr, size, attrs);
+   return dma_get_sgtable_attrs(to_vmd_dev(dev), sgt, cpu_addr, addr, size,
+   attrs);
 }
 
 static dma_addr_t vmd_map_page(struct device *dev, struct page *page,
@@ -347,61 +340,60 @@ static dma_addr_t vmd_map_page(struct device *dev, struct 
page *page,
   enum dma_data_direction dir,
   unsigned long attrs)
 {
-   return vmd_dma_ops(dev)->map_page(to_vmd_dev(dev), page, offset, size,
- dir, attrs);
+   return dma_map_page_attrs(to_vmd_dev(dev), page, offset, size, dir,
+   attrs);
 }
 
 static void vmd_unmap_page(struct device *dev, dma_addr_t addr, size_t size,
   enum dma_data_direction dir, unsigned long attrs)
 {
-   vmd_dma_ops(dev)->unmap_page(to_vmd_dev(dev), addr, size, dir, attrs);
+   dma_unmap_page_attrs(to_vmd_dev(dev), addr, size, dir, attrs);
 }
 
 static int vmd_map_sg(struct device *dev, struct scatterlist *sg, int nents,
  enum dma_data_direction dir, unsigned long attrs)
 {
-   return vmd_dma_ops(dev)->map_sg(to_vmd_dev(dev), sg, nents, dir, attrs);
+   return dma_map_sg_attrs(to_vmd_dev(dev), sg, nents, dir, attrs);
 }
 
 static void vmd_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
 enum dma_data_direction dir, unsigned long attrs)
 {
-   vmd_dma_ops(dev)->unmap_sg(to_vmd_dev(dev), sg, nents, dir, attrs);
+   dma_unmap_sg_attrs(to_vmd_dev(dev), sg, nents, dir, attrs);
 }
 
 static void vmd_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
size_t size, enum dma_data_direction dir)
 {
-   vmd_dma_ops(dev)->sync_single_for_cpu(to_vmd_dev(dev), addr, size, dir);
+   dma_sync_single_for_cpu(to_vmd_dev(dev), addr, size, dir);
 }
 
 static void vmd_sync_single_for_device(struct device *dev, dma_addr_t addr,
   size_t size, enum dma_data_direction dir)
 {
-   vmd_dma_ops(dev)->sync_single_for_device(to_vmd_dev(dev), addr, size,
-dir);
+   dma_sync_single_for_device(to_vmd_dev(dev), addr, size, dir);
 }
 
 static void vmd_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
int nents, enum dma_data_direction dir)
 {
-   vmd_dma_ops(dev)->sync_sg_for_cpu(to_vmd_dev(dev), sg, nents, dir);
+   dma_sync_sg_for_cpu(to_vmd_dev(dev), sg, nents, dir);
 }
 
 static void vmd_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
   int nents, enum dma_data_direction dir)
 {
-   vmd_dma_ops(dev)->sync_sg_for_device(to_vmd_dev(dev), sg,

[PATCH 13/15] ACPI / scan: Refactor _CCA enforcement

2018-12-07 Thread Christoph Hellwig
From: Robin Murphy 

Rather than checking the DMA attribute at each callsite, just pass it
through for acpi_dma_configure() to handle directly. That can then deal
with the relatively exceptional DEV_DMA_NOT_SUPPORTED case by explicitly
installing dummy DMA ops instead of just skipping setup entirely. This
will then free up the dev->dma_ops == NULL case for some valuable
fastpath optimisations.

Signed-off-by: Robin Murphy 
Signed-off-by: Christoph Hellwig 
---
 drivers/acpi/scan.c  | 5 +
 drivers/base/platform.c  | 3 +--
 drivers/pci/pci-driver.c | 3 +--
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index bd1c59fb0e17..b75ae34ed188 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1456,6 +1456,11 @@ int acpi_dma_configure(struct device *dev, enum 
dev_dma_attr attr)
const struct iommu_ops *iommu;
u64 dma_addr = 0, size = 0;
 
+   if (attr == DEV_DMA_NOT_SUPPORTED) {
+   set_dma_ops(dev, &dma_dummy_ops);
+   return 0;
+   }
+
iort_dma_setup(dev, &dma_addr, &size);
 
iommu = iort_iommu_configure(dev);
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index eae841935a45..c1ddf191711e 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1138,8 +1138,7 @@ int platform_dma_configure(struct device *dev)
ret = of_dma_configure(dev, dev->of_node, true);
} else if (has_acpi_companion(dev)) {
attr = acpi_get_dma_attr(to_acpi_device_node(dev->fwnode));
-   if (attr != DEV_DMA_NOT_SUPPORTED)
-   ret = acpi_dma_configure(dev, attr);
+   ret = acpi_dma_configure(dev, attr);
}
 
return ret;
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index bef17c3fca67..1b58e058b13f 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1602,8 +1602,7 @@ static int pci_dma_configure(struct device *dev)
struct acpi_device *adev = to_acpi_device_node(bridge->fwnode);
enum dev_dma_attr attr = acpi_get_dma_attr(adev);
 
-   if (attr != DEV_DMA_NOT_SUPPORTED)
-   ret = acpi_dma_configure(dev, attr);
+   ret = acpi_dma_configure(dev, acpi_get_dma_attr(adev));
}
 
pci_put_host_bridge_device(bridge);
-- 
2.19.1



[PATCH 12/15] dma-mapping: factor out dummy DMA ops

2018-12-07 Thread Christoph Hellwig
From: Robin Murphy 

The dummy DMA ops are currently used by arm64 for any device which has
an invalid ACPI description and is thus barred from using DMA due to not
knowing whether it is cache-coherent or not. Factor these out into
general dma-mapping code so that they can be referenced from other
common code paths. In the process, we can prune all the optional
callbacks which just do the same thing as the default behaviour, and
fill in .map_resource for completeness.
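
The new kernel/dma/dummy.c is truncated below, but the consolidated ops
boil down to roughly the following sketch -- only the callbacks that must
fail explicitly are provided, everything else falls back to the core's
defaults; the exact callback set in the merged file may differ slightly:

#include <linux/dma-mapping.h>

static int dma_dummy_mmap(struct device *dev, struct vm_area_struct *vma,
		void *cpu_addr, dma_addr_t dma_addr, size_t size,
		unsigned long attrs)
{
	return -ENXIO;
}

static dma_addr_t dma_dummy_map_page(struct device *dev, struct page *page,
		unsigned long offset, size_t size, enum dma_data_direction dir,
		unsigned long attrs)
{
	return DMA_MAPPING_ERROR;
}

static int dma_dummy_map_sg(struct device *dev, struct scatterlist *sgl,
		int nelems, enum dma_data_direction dir, unsigned long attrs)
{
	return 0;
}

static int dma_dummy_supported(struct device *hwdev, u64 mask)
{
	return 0;
}

const struct dma_map_ops dma_dummy_ops = {
	.mmap		= dma_dummy_mmap,
	.map_page	= dma_dummy_map_page,
	.map_sg		= dma_dummy_map_sg,
	.dma_supported	= dma_dummy_supported,
};
EXPORT_SYMBOL(dma_dummy_ops);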

Signed-off-by: Robin Murphy 
[hch: moved to a separate source file]
Signed-off-by: Christoph Hellwig 
---
 arch/arm64/include/asm/dma-mapping.h |  4 +-
 arch/arm64/mm/dma-mapping.c  | 86 
 include/linux/dma-mapping.h  |  1 +
 kernel/dma/Makefile  |  2 +-
 kernel/dma/dummy.c   | 39 +
 5 files changed, 42 insertions(+), 90 deletions(-)
 create mode 100644 kernel/dma/dummy.c

diff --git a/arch/arm64/include/asm/dma-mapping.h 
b/arch/arm64/include/asm/dma-mapping.h
index c41f3fb1446c..273e778f7de2 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -24,15 +24,13 @@
 #include 
 #include 
 
-extern const struct dma_map_ops dummy_dma_ops;
-
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
/*
 * We expect no ISA devices, and all other DMA masters are expected to
 * have someone call arch_setup_dma_ops at device creation time.
 */
-   return &dummy_dma_ops;
+   return &dma_dummy_ops;
 }
 
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index e4effbb243b1..ab1e417204d0 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -89,92 +89,6 @@ static int __swiotlb_mmap_pfn(struct vm_area_struct *vma,
 }
 #endif /* CONFIG_IOMMU_DMA */
 
-/*******************************************
- * The following APIs are for dummy DMA ops *
- *******************************************/
-
-static void *__dummy_alloc(struct device *dev, size_t size,
-  dma_addr_t *dma_handle, gfp_t flags,
-  unsigned long attrs)
-{
-   return NULL;
-}
-
-static void __dummy_free(struct device *dev, size_t size,
-void *vaddr, dma_addr_t dma_handle,
-unsigned long attrs)
-{
-}
-
-static int __dummy_mmap(struct device *dev,
-   struct vm_area_struct *vma,
-   void *cpu_addr, dma_addr_t dma_addr, size_t size,
-   unsigned long attrs)
-{
-   return -ENXIO;
-}
-
-static dma_addr_t __dummy_map_page(struct device *dev, struct page *page,
-  unsigned long offset, size_t size,
-  enum dma_data_direction dir,
-  unsigned long attrs)
-{
-   return DMA_MAPPING_ERROR;
-}
-
-static void __dummy_unmap_page(struct device *dev, dma_addr_t dev_addr,
-  size_t size, enum dma_data_direction dir,
-  unsigned long attrs)
-{
-}
-
-static int __dummy_map_sg(struct device *dev, struct scatterlist *sgl,
- int nelems, enum dma_data_direction dir,
- unsigned long attrs)
-{
-   return 0;
-}
-
-static void __dummy_unmap_sg(struct device *dev,
-struct scatterlist *sgl, int nelems,
-enum dma_data_direction dir,
-unsigned long attrs)
-{
-}
-
-static void __dummy_sync_single(struct device *dev,
-   dma_addr_t dev_addr, size_t size,
-   enum dma_data_direction dir)
-{
-}
-
-static void __dummy_sync_sg(struct device *dev,
-   struct scatterlist *sgl, int nelems,
-   enum dma_data_direction dir)
-{
-}
-
-static int __dummy_dma_supported(struct device *hwdev, u64 mask)
-{
-   return 0;
-}
-
-const struct dma_map_ops dummy_dma_ops = {
-   .alloc  = __dummy_alloc,
-   .free   = __dummy_free,
-   .mmap   = __dummy_mmap,
-   .map_page   = __dummy_map_page,
-   .unmap_page = __dummy_unmap_page,
-   .map_sg = __dummy_map_sg,
-   .unmap_sg   = __dummy_unmap_sg,
-   .sync_single_for_cpu= __dummy_sync_single,
-   .sync_single_for_device = __dummy_sync_single,
-   .sync_sg_for_cpu= __dummy_sync_sg,
-   .sync_sg_for_device = __dummy_sync_sg,
-   .dma_supported  = __dummy_dma_supported,
-};
-EXPORT_SYMBOL(dummy_dma_ops);
-
 static int __init arm64_dma_init(void)
 {
WARN_TAINT(ARCH_DMA_MINALIGN < cache_line_size(),
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 0

[PATCH 08/15] dma-mapping: move dma_get_required_mask to kernel/dma

2018-12-07 Thread Christoph Hellwig
dma_get_required_mask should really be with the rest of the DMA mapping
implementation instead of in drivers/base as a lone outlier.

Signed-off-by: Christoph Hellwig 
---
 drivers/base/platform.c | 31 ---
 kernel/dma/mapping.c| 34 +-
 2 files changed, 33 insertions(+), 32 deletions(-)

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 41b91af95afb..eae841935a45 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1179,37 +1179,6 @@ int __init platform_bus_init(void)
return error;
 }
 
-#ifndef ARCH_HAS_DMA_GET_REQUIRED_MASK
-static u64 dma_default_get_required_mask(struct device *dev)
-{
-   u32 low_totalram = ((max_pfn - 1) << PAGE_SHIFT);
-   u32 high_totalram = ((max_pfn - 1) >> (32 - PAGE_SHIFT));
-   u64 mask;
-
-   if (!high_totalram) {
-   /* convert to mask just covering totalram */
-   low_totalram = (1 << (fls(low_totalram) - 1));
-   low_totalram += low_totalram - 1;
-   mask = low_totalram;
-   } else {
-   high_totalram = (1 << (fls(high_totalram) - 1));
-   high_totalram += high_totalram - 1;
-   mask = (((u64)high_totalram) << 32) + 0xffffffff;
-   }
-   return mask;
-}
-
-u64 dma_get_required_mask(struct device *dev)
-{
-   const struct dma_map_ops *ops = get_dma_ops(dev);
-
-   if (ops->get_required_mask)
-   return ops->get_required_mask(dev);
-   return dma_default_get_required_mask(dev);
-}
-EXPORT_SYMBOL_GPL(dma_get_required_mask);
-#endif
-
 static __initdata LIST_HEAD(early_platform_driver_list);
 static __initdata LIST_HEAD(early_platform_device_list);
 
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index dfbc3deb95cd..dfe29d18dba1 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -5,7 +5,7 @@
  * Copyright (c) 2006  SUSE Linux Products GmbH
  * Copyright (c) 2006  Tejun Heo 
  */
-
+#include  /* for max_pfn */
 #include 
 #include 
 #include 
@@ -262,3 +262,35 @@ int dma_common_mmap(struct device *dev, struct 
vm_area_struct *vma,
 #endif /* !CONFIG_ARCH_NO_COHERENT_DMA_MMAP */
 }
 EXPORT_SYMBOL(dma_common_mmap);
+
+#ifndef ARCH_HAS_DMA_GET_REQUIRED_MASK
+static u64 dma_default_get_required_mask(struct device *dev)
+{
+   u32 low_totalram = ((max_pfn - 1) << PAGE_SHIFT);
+   u32 high_totalram = ((max_pfn - 1) >> (32 - PAGE_SHIFT));
+   u64 mask;
+
+   if (!high_totalram) {
+   /* convert to mask just covering totalram */
+   low_totalram = (1 << (fls(low_totalram) - 1));
+   low_totalram += low_totalram - 1;
+   mask = low_totalram;
+   } else {
+   high_totalram = (1 << (fls(high_totalram) - 1));
+   high_totalram += high_totalram - 1;
+   mask = (((u64)high_totalram) << 32) + 0xffffffff;
+   }
+   return mask;
+}
+
+u64 dma_get_required_mask(struct device *dev)
+{
+   const struct dma_map_ops *ops = get_dma_ops(dev);
+
+   if (ops->get_required_mask)
+   return ops->get_required_mask(dev);
+   return dma_default_get_required_mask(dev);
+}
+EXPORT_SYMBOL_GPL(dma_get_required_mask);
+#endif
+
-- 
2.19.1



[PATCH 10/15] dma-mapping: move dma_cache_sync out of line

2018-12-07 Thread Christoph Hellwig
This isn't exactly a slow path routine, but it is not super critical
either, and moving it out of line will help to keep the include chain
clean for the following DMA indirection bypass work.

Signed-off-by: Christoph Hellwig 
---
 include/linux/dma-mapping.h | 12 ++--
 kernel/dma/mapping.c| 11 +++
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 0bbce52606c2..0f0078490df4 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -411,16 +411,8 @@ dma_sync_sg_for_device(struct device *dev, struct 
scatterlist *sg,
 #define dma_map_page(d, p, o, s, r) dma_map_page_attrs(d, p, o, s, r, 0)
 #define dma_unmap_page(d, a, s, r) dma_unmap_page_attrs(d, a, s, r, 0)
 
-static inline void
-dma_cache_sync(struct device *dev, void *vaddr, size_t size,
-   enum dma_data_direction dir)
-{
-   const struct dma_map_ops *ops = get_dma_ops(dev);
-
-   BUG_ON(!valid_dma_direction(dir));
-   if (ops->cache_sync)
-   ops->cache_sync(dev, vaddr, size, dir);
-}
+void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
+   enum dma_data_direction dir);
 
 extern int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 176ae3e08916..0b18cfbdde95 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -430,3 +430,14 @@ int dma_set_coherent_mask(struct device *dev, u64 mask)
 }
 EXPORT_SYMBOL(dma_set_coherent_mask);
 #endif
+
+void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
+   enum dma_data_direction dir)
+{
+   const struct dma_map_ops *ops = get_dma_ops(dev);
+
+   BUG_ON(!valid_dma_direction(dir));
+   if (ops->cache_sync)
+   ops->cache_sync(dev, vaddr, size, dir);
+}
+EXPORT_SYMBOL(dma_cache_sync);
-- 
2.19.1



[PATCH 11/15] dma-mapping: always build the direct mapping code

2018-12-07 Thread Christoph Hellwig
All architectures except for sparc64 use the dma-direct code in some
form, and even for sparc64 we had the discussion of a direct mapping
mode a while ago.  In preparation for directly calling the direct
mapping code, don't bother making it optional but always build the
code in.  This is a minor hardship for some powerpc and arm configs
that don't pull it in yet (although they should in a release or two),
and sparc64 which currently doesn't need it at all, but it will
reduce the ifdef mess we'd otherwise need significantly.

Signed-off-by: Christoph Hellwig 
---
 arch/alpha/Kconfig  | 1 -
 arch/arc/Kconfig| 1 -
 arch/arm/Kconfig| 1 -
 arch/arm64/Kconfig  | 1 -
 arch/c6x/Kconfig| 1 -
 arch/csky/Kconfig   | 1 -
 arch/h8300/Kconfig  | 1 -
 arch/hexagon/Kconfig| 1 -
 arch/m68k/Kconfig   | 1 -
 arch/microblaze/Kconfig | 1 -
 arch/mips/Kconfig   | 1 -
 arch/nds32/Kconfig  | 1 -
 arch/nios2/Kconfig  | 1 -
 arch/openrisc/Kconfig   | 1 -
 arch/parisc/Kconfig | 1 -
 arch/riscv/Kconfig  | 1 -
 arch/s390/Kconfig   | 1 -
 arch/sh/Kconfig | 1 -
 arch/sparc/Kconfig  | 1 -
 arch/unicore32/Kconfig  | 1 -
 arch/x86/Kconfig| 1 -
 arch/xtensa/Kconfig | 1 -
 kernel/dma/Kconfig  | 7 ---
 kernel/dma/Makefile | 3 +--
 24 files changed, 1 insertion(+), 31 deletions(-)

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index a7e748a46c18..5da6ff54b3e7 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -203,7 +203,6 @@ config ALPHA_EIGER
 config ALPHA_JENSEN
bool "Jensen"
depends on BROKEN
-   select DMA_DIRECT_OPS
help
  DEC PC 150 AXP (aka Jensen): This is a very old Digital system - one
  of the first-generation Alpha systems. A number of these systems
diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index fd48d698da29..7deaabeb531a 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -17,7 +17,6 @@ config ARC
select BUILDTIME_EXTABLE_SORT
select CLONE_BACKWARDS
select COMMON_CLK
-   select DMA_DIRECT_OPS
select GENERIC_ATOMIC64 if !ISA_ARCV2 || !(ARC_HAS_LL64 && ARC_HAS_LLSC)
select GENERIC_CLOCKEVENTS
select GENERIC_FIND_FIRST_BIT
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index a858ee791ef0..586fc30b23bd 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -30,7 +30,6 @@ config ARM
select CLONE_BACKWARDS
select CPU_PM if (SUSPEND || CPU_IDLE)
select DCACHE_WORD_ACCESS if HAVE_EFFICIENT_UNALIGNED_ACCESS
-   select DMA_DIRECT_OPS if !MMU
select DMA_REMAP if MMU
select EDAC_SUPPORT
select EDAC_ATOMIC_SCRUB
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 06cf0ef24367..2092080240b0 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -80,7 +80,6 @@ config ARM64
select CPU_PM if (SUSPEND || CPU_IDLE)
select CRC32
select DCACHE_WORD_ACCESS
-   select DMA_DIRECT_OPS
select DMA_DIRECT_REMAP
select EDAC_SUPPORT
select FRAME_POINTER
diff --git a/arch/c6x/Kconfig b/arch/c6x/Kconfig
index 84420109113d..456e154674d1 100644
--- a/arch/c6x/Kconfig
+++ b/arch/c6x/Kconfig
@@ -9,7 +9,6 @@ config C6X
select ARCH_HAS_SYNC_DMA_FOR_CPU
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select CLKDEV_LOOKUP
-   select DMA_DIRECT_OPS
select GENERIC_ATOMIC64
select GENERIC_IRQ_SHOW
select HAVE_ARCH_TRACEHOOK
diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig
index ea74f3a9eeaf..37bed8aadf95 100644
--- a/arch/csky/Kconfig
+++ b/arch/csky/Kconfig
@@ -7,7 +7,6 @@ config CSKY
select COMMON_CLK
select CLKSRC_MMIO
select CLKSRC_OF
-   select DMA_DIRECT_OPS
select DMA_DIRECT_REMAP
select IRQ_DOMAIN
select HANDLE_DOMAIN_IRQ
diff --git a/arch/h8300/Kconfig b/arch/h8300/Kconfig
index d19c6b16cd5d..6472a0685470 100644
--- a/arch/h8300/Kconfig
+++ b/arch/h8300/Kconfig
@@ -22,7 +22,6 @@ config H8300
select HAVE_ARCH_KGDB
select HAVE_ARCH_HASH
select CPU_NO_EFFICIENT_FFS
-   select DMA_DIRECT_OPS
 
 config CPU_BIG_ENDIAN
def_bool y
diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig
index 2b688af379e6..d71036c598de 100644
--- a/arch/hexagon/Kconfig
+++ b/arch/hexagon/Kconfig
@@ -31,7 +31,6 @@ config HEXAGON
select GENERIC_CLOCKEVENTS_BROADCAST
select MODULES_USE_ELF_RELA
select GENERIC_CPU_DEVICES
-   select DMA_DIRECT_OPS
---help---
  Qualcomm Hexagon is a processor architecture designed for high
  performance and low power across a wide variety of applications.
diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
index 1bc9f1ba759a..8a5868e9a3a0 100644
--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -26,7 +26,6 @@ config M68K
select MODULES_USE_ELF_RELA
select OLD_SIGSUSPEND3
select OLD_SIGACTION
-   select DMA_DI

[PATCH 09/15] dma-mapping: move various slow path functions out of line

2018-12-07 Thread Christoph Hellwig
There is no need to have all setup and coherent allocation / freeing
routines inline.  Move them out of line to keep the implementation
nicely encapsulated and save some kernel text size.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/include/asm/dma-mapping.h |   1 -
 include/linux/dma-mapping.h| 150 +++--
 kernel/dma/mapping.c   | 140 ++-
 3 files changed, 151 insertions(+), 140 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-mapping.h 
b/arch/powerpc/include/asm/dma-mapping.h
index 8fa394520af6..5201f2b7838c 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -108,7 +108,6 @@ static inline void set_dma_offset(struct device *dev, 
dma_addr_t off)
 }
 
 #define HAVE_ARCH_DMA_SET_MASK 1
-extern int dma_set_mask(struct device *dev, u64 dma_mask);
 
 extern u64 __dma_get_required_mask(struct device *dev);
 
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 3b431cc58794..0bbce52606c2 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -440,107 +440,24 @@ bool dma_in_atomic_pool(void *start, size_t size);
 void *dma_alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags);
 bool dma_free_from_pool(void *start, size_t size);
 
-/**
- * dma_mmap_attrs - map a coherent DMA allocation into user space
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @vma: vm_area_struct describing requested user mapping
- * @cpu_addr: kernel CPU-view address returned from dma_alloc_attrs
- * @handle: device-view address returned from dma_alloc_attrs
- * @size: size of memory originally requested in dma_alloc_attrs
- * @attrs: attributes of mapping properties requested in dma_alloc_attrs
- *
- * Map a coherent DMA buffer previously allocated by dma_alloc_attrs
- * into user space.  The coherent DMA buffer must not be freed by the
- * driver until the user space mapping has been released.
- */
-static inline int
-dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma, void *cpu_addr,
-  dma_addr_t dma_addr, size_t size, unsigned long attrs)
-{
-   const struct dma_map_ops *ops = get_dma_ops(dev);
-   BUG_ON(!ops);
-   if (ops->mmap)
-   return ops->mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
-   return dma_common_mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
-}
-
+int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
+   void *cpu_addr, dma_addr_t dma_addr, size_t size,
+   unsigned long attrs);
 #define dma_mmap_coherent(d, v, c, h, s) dma_mmap_attrs(d, v, c, h, s, 0)
 
 int
 dma_common_get_sgtable(struct device *dev, struct sg_table *sgt, void 
*cpu_addr,
dma_addr_t dma_addr, size_t size, unsigned long attrs);
 
-static inline int
-dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt, void *cpu_addr,
- dma_addr_t dma_addr, size_t size,
- unsigned long attrs)
-{
-   const struct dma_map_ops *ops = get_dma_ops(dev);
-   BUG_ON(!ops);
-   if (ops->get_sgtable)
-   return ops->get_sgtable(dev, sgt, cpu_addr, dma_addr, size,
-   attrs);
-   return dma_common_get_sgtable(dev, sgt, cpu_addr, dma_addr, size,
-   attrs);
-}
-
+int dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt,
+   void *cpu_addr, dma_addr_t dma_addr, size_t size,
+   unsigned long attrs);
 #define dma_get_sgtable(d, t, v, h, s) dma_get_sgtable_attrs(d, t, v, h, s, 0)
 
-#ifndef arch_dma_alloc_attrs
-#define arch_dma_alloc_attrs(dev)  (true)
-#endif
-
-static inline void *dma_alloc_attrs(struct device *dev, size_t size,
-  dma_addr_t *dma_handle, gfp_t flag,
-  unsigned long attrs)
-{
-   const struct dma_map_ops *ops = get_dma_ops(dev);
-   void *cpu_addr;
-
-   BUG_ON(!ops);
-   WARN_ON_ONCE(dev && !dev->coherent_dma_mask);
-
-   if (dma_alloc_from_dev_coherent(dev, size, dma_handle, &cpu_addr))
-   return cpu_addr;
-
-   /* let the implementation decide on the zone to allocate from: */
-   flag &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);
-
-   if (!arch_dma_alloc_attrs(&dev))
-   return NULL;
-   if (!ops->alloc)
-   return NULL;
-
-   cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
-   debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr);
-   return cpu_addr;
-}
-
-static inline void dma_free_attrs(struct device *dev, size_t size,
-void *cpu_addr, dma_addr_t dma_handle,
-unsigned long attrs)
-{
-   const struct dma_map_ops *ops = get_dma_ops(dev);
-
-   BUG_ON(!ops);
-
-   if (dma_release_from_dev_coherent(dev, get_o

[PATCH 07/15] dma-mapping: merge dma_unmap_page_attrs and dma_unmap_single_attrs

2018-12-07 Thread Christoph Hellwig
The two functions are exactly the same, so don't bother implementing
them twice.

Signed-off-by: Christoph Hellwig 
---
 include/linux/dma-mapping.h | 19 ++-
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 8916499d2805..3b431cc58794 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -253,6 +253,12 @@ static inline void dma_unmap_single_attrs(struct device 
*dev, dma_addr_t addr,
debug_dma_unmap_page(dev, addr, size, dir, true);
 }
 
+static inline void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr,
+   size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+   return dma_unmap_single_attrs(dev, addr, size, dir, attrs);
+}
+
 /*
  * dma_maps_sg_attrs returns 0 on error and > 0 on success.
  * It should never return a value < 0.
@@ -300,19 +306,6 @@ static inline dma_addr_t dma_map_page_attrs(struct device 
*dev,
return addr;
 }
 
-static inline void dma_unmap_page_attrs(struct device *dev,
-   dma_addr_t addr, size_t size,
-   enum dma_data_direction dir,
-   unsigned long attrs)
-{
-   const struct dma_map_ops *ops = get_dma_ops(dev);
-
-   BUG_ON(!valid_dma_direction(dir));
-   if (ops->unmap_page)
-   ops->unmap_page(dev, addr, size, dir, attrs);
-   debug_dma_unmap_page(dev, addr, size, dir, false);
-}
-
 static inline dma_addr_t dma_map_resource(struct device *dev,
  phys_addr_t phys_addr,
  size_t size,
-- 
2.19.1



[PATCH 06/15] dma-mapping: simplify the dma_sync_single_range_for_{cpu, device} implementation

2018-12-07 Thread Christoph Hellwig
We can just call the regular functions after adding the offset to the address
instead of reimplementing them.

Signed-off-by: Christoph Hellwig 
---
 include/linux/dma-debug.h   | 27 
 include/linux/dma-mapping.h | 34 +-
 kernel/dma/debug.c  | 42 -
 3 files changed, 10 insertions(+), 93 deletions(-)

diff --git a/include/linux/dma-debug.h b/include/linux/dma-debug.h
index 30213adbb6b9..c85e097a984c 100644
--- a/include/linux/dma-debug.h
+++ b/include/linux/dma-debug.h
@@ -72,17 +72,6 @@ extern void debug_dma_sync_single_for_device(struct device 
*dev,
 dma_addr_t dma_handle,
 size_t size, int direction);
 
-extern void debug_dma_sync_single_range_for_cpu(struct device *dev,
-   dma_addr_t dma_handle,
-   unsigned long offset,
-   size_t size,
-   int direction);
-
-extern void debug_dma_sync_single_range_for_device(struct device *dev,
-  dma_addr_t dma_handle,
-  unsigned long offset,
-  size_t size, int direction);
-
 extern void debug_dma_sync_sg_for_cpu(struct device *dev,
  struct scatterlist *sg,
  int nelems, int direction);
@@ -174,22 +163,6 @@ static inline void debug_dma_sync_single_for_device(struct 
device *dev,
 {
 }
 
-static inline void debug_dma_sync_single_range_for_cpu(struct device *dev,
-  dma_addr_t dma_handle,
-  unsigned long offset,
-  size_t size,
-  int direction)
-{
-}
-
-static inline void debug_dma_sync_single_range_for_device(struct device *dev,
- dma_addr_t dma_handle,
- unsigned long offset,
- size_t size,
- int direction)
-{
-}
-
 static inline void debug_dma_sync_sg_for_cpu(struct device *dev,
 struct scatterlist *sg,
 int nelems, int direction)
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 7799c2b27849..8916499d2805 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -360,6 +360,13 @@ static inline void dma_sync_single_for_cpu(struct device 
*dev, dma_addr_t addr,
debug_dma_sync_single_for_cpu(dev, addr, size, dir);
 }
 
+static inline void dma_sync_single_range_for_cpu(struct device *dev,
+   dma_addr_t addr, unsigned long offset, size_t size,
+   enum dma_data_direction dir)
+{
+   return dma_sync_single_for_cpu(dev, addr + offset, size, dir);
+}
+
 static inline void dma_sync_single_for_device(struct device *dev,
  dma_addr_t addr, size_t size,
  enum dma_data_direction dir)
@@ -372,32 +379,11 @@ static inline void dma_sync_single_for_device(struct 
device *dev,
debug_dma_sync_single_for_device(dev, addr, size, dir);
 }
 
-static inline void dma_sync_single_range_for_cpu(struct device *dev,
-dma_addr_t addr,
-unsigned long offset,
-size_t size,
-enum dma_data_direction dir)
-{
-   const struct dma_map_ops *ops = get_dma_ops(dev);
-
-   BUG_ON(!valid_dma_direction(dir));
-   if (ops->sync_single_for_cpu)
-   ops->sync_single_for_cpu(dev, addr + offset, size, dir);
-   debug_dma_sync_single_range_for_cpu(dev, addr, offset, size, dir);
-}
-
 static inline void dma_sync_single_range_for_device(struct device *dev,
-   dma_addr_t addr,
-   unsigned long offset,
-   size_t size,
-   enum dma_data_direction dir)
+   dma_addr_t addr, unsigned long offset, size_t size,
+   enum dma_data_direction dir)
 {
-   const struct dma_map_ops *ops = get_dma_ops(dev);
-
-   BUG_ON(!valid_dma_direction(dir));
-   if (ops->sync_single_for_device)
-   ops->sync_single_for_device(dev, addr 

[PATCH 05/15] dma-direct: merge swiotlb_dma_ops into the dma_direct code

2018-12-07 Thread Christoph Hellwig
While the dma-direct code is (relatively) clean and simple we actually
have to use the swiotlb ops for the mapping on many architectures due
to devices with addressing limits.  Instead of keeping two
implementations around this commit allows the dma-direct
implementation to call the swiotlb bounce buffering functions and
thus share the guts of the mapping implementation.  This also
simplifies the dma-mapping setup on a few architectures where we
no longer have to differentiate which implementation to use.
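
In other words, dma_direct_map_page() itself picks up the addressing
check and the bounce-buffer fallback, roughly along the lines of this
sketch (swiotlb_bounce_page() is the bounce helper visible elsewhere in
the series; the exact plumbing in the real patch may differ):

dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
		unsigned long offset, size_t size, enum dma_data_direction dir,
		unsigned long attrs)
{
	phys_addr_t phys = page_to_phys(page) + offset;
	dma_addr_t dma_addr = phys_to_dma(dev, phys);

	/* Bounce through swiotlb when forced or when the device cannot
	 * address the buffer, instead of keeping a separate ops set. */
	if (unlikely(swiotlb_force == SWIOTLB_FORCE) ||
	    !dma_capable(dev, dma_addr, size))
		dma_addr = swiotlb_bounce_page(dev, &phys, size, dir, attrs);

	if (dma_addr != DMA_MAPPING_ERROR &&
	    !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
		dma_direct_sync_single_for_device(dev, dma_addr, size, dir);
	return dma_addr;
}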

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/mm/dma-mapping.c  |   2 +-
 arch/ia64/hp/common/hwsw_iommu.c |   2 +-
 arch/ia64/hp/common/sba_iommu.c  |   6 +-
 arch/ia64/kernel/dma-mapping.c   |   2 +-
 arch/mips/include/asm/dma-mapping.h  |   2 -
 arch/powerpc/kernel/dma-swiotlb.c|  16 +-
 arch/riscv/include/asm/dma-mapping.h |  15 --
 arch/x86/kernel/pci-swiotlb.c|   4 +-
 arch/x86/mm/mem_encrypt.c|   7 -
 arch/x86/pci/sta2x11-fixup.c |   1 -
 include/linux/dma-direct.h   |  12 ++
 include/linux/swiotlb.h  |  74 -
 kernel/dma/direct.c  | 113 +
 kernel/dma/swiotlb.c | 232 ++-
 14 files changed, 150 insertions(+), 338 deletions(-)
 delete mode 100644 arch/riscv/include/asm/dma-mapping.h

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 4c0f498069e8..e4effbb243b1 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -549,7 +549,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, 
u64 size,
const struct iommu_ops *iommu, bool coherent)
 {
if (!dev->dma_ops)
-   dev->dma_ops = &swiotlb_dma_ops;
+   dev->dma_ops = &dma_direct_ops;
 
dev->dma_coherent = coherent;
__iommu_setup_dma_ops(dev, dma_base, size, iommu);
diff --git a/arch/ia64/hp/common/hwsw_iommu.c b/arch/ia64/hp/common/hwsw_iommu.c
index 58969039bed2..f40ca499b246 100644
--- a/arch/ia64/hp/common/hwsw_iommu.c
+++ b/arch/ia64/hp/common/hwsw_iommu.c
@@ -38,7 +38,7 @@ static inline int use_swiotlb(struct device *dev)
 const struct dma_map_ops *hwsw_dma_get_ops(struct device *dev)
 {
if (use_swiotlb(dev))
-   return &swiotlb_dma_ops;
+   return &dma_direct_ops;
return &sba_dma_ops;
 }
 EXPORT_SYMBOL(hwsw_dma_get_ops);
diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index 0d21c0b5b23d..5ee74820a0f6 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -2065,8 +2065,6 @@ static int __init acpi_sba_ioc_init_acpi(void)
 /* This has to run before acpi_scan_init(). */
 arch_initcall(acpi_sba_ioc_init_acpi);
 
-extern const struct dma_map_ops swiotlb_dma_ops;
-
 static int __init
 sba_init(void)
 {
@@ -2080,7 +2078,7 @@ sba_init(void)
 * a successful kdump kernel boot is to use the swiotlb.
 */
if (is_kdump_kernel()) {
-   dma_ops = &swiotlb_dma_ops;
+   dma_ops = &dma_direct_ops;
if (swiotlb_late_init_with_default_size(64 * (1<<20)) != 0)
panic("Unable to initialize software I/O TLB:"
  " Try machvec=dig boot option");
@@ -2102,7 +2100,7 @@ sba_init(void)
 * If we didn't find something sba_iommu can claim, we
 * need to setup the swiotlb and switch to the dig machvec.
 */
-   dma_ops = &swiotlb_dma_ops;
+   dma_ops = &dma_direct_ops;
if (swiotlb_late_init_with_default_size(64 * (1<<20)) != 0)
panic("Unable to find SBA IOMMU or initialize "
  "software I/O TLB: Try machvec=dig boot option");
diff --git a/arch/ia64/kernel/dma-mapping.c b/arch/ia64/kernel/dma-mapping.c
index 36dd6aa6d759..80cd3e1ea95a 100644
--- a/arch/ia64/kernel/dma-mapping.c
+++ b/arch/ia64/kernel/dma-mapping.c
@@ -36,7 +36,7 @@ long arch_dma_coherent_to_pfn(struct device *dev, void 
*cpu_addr,
 
 void __init swiotlb_dma_init(void)
 {
-   dma_ops = &swiotlb_dma_ops;
+   dma_ops = &dma_direct_ops;
swiotlb_init(1);
 }
 #endif
diff --git a/arch/mips/include/asm/dma-mapping.h 
b/arch/mips/include/asm/dma-mapping.h
index b4c477eb46ce..69f914667f3e 100644
--- a/arch/mips/include/asm/dma-mapping.h
+++ b/arch/mips/include/asm/dma-mapping.h
@@ -10,8 +10,6 @@ static inline const struct dma_map_ops 
*get_arch_dma_ops(struct bus_type *bus)
 {
 #if defined(CONFIG_MACH_JAZZ)
return &jazz_dma_ops;
-#elif defined(CONFIG_SWIOTLB)
-   return &swiotlb_dma_ops;
 #else
return &dma_direct_ops;
 #endif
diff --git a/arch/powerpc/kernel/dma-swiotlb.c 
b/arch/powerpc/kernel/dma-swiotlb.c
index 3d8df2cf8be9..430a7d0aa2cb 100644
--- a/arch/powerpc/kernel/dma-swiotlb.c
+++ b/arch/powerpc/kernel/dma-swiotlb.c
@@ -50,15 +50,15 @@ const struct dma_map_op

[PATCH 04/15] dma-direct: use dma_direct_map_page to implement dma_direct_map_sg

2018-12-07 Thread Christoph Hellwig
No need to duplicate the mapping logic.

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/direct.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index edb24f94ea1e..d45306473c90 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -217,6 +217,7 @@ static void dma_direct_sync_single_for_device(struct device 
*dev,
arch_sync_dma_for_device(dev, dma_to_phys(dev, addr), size, dir);
 }
 
+#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE)
 static void dma_direct_sync_sg_for_device(struct device *dev,
struct scatterlist *sgl, int nents, enum dma_data_direction dir)
 {
@@ -229,6 +230,7 @@ static void dma_direct_sync_sg_for_device(struct device 
*dev,
for_each_sg(sgl, sg, nents, i)
arch_sync_dma_for_device(dev, sg_phys(sg), sg->length, dir);
 }
+#endif
 
 #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
 defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
@@ -294,19 +296,13 @@ int dma_direct_map_sg(struct device *dev, struct 
scatterlist *sgl, int nents,
struct scatterlist *sg;
 
for_each_sg(sgl, sg, nents, i) {
-   BUG_ON(!sg_page(sg));
-
-   sg_dma_address(sg) = phys_to_dma(dev, sg_phys(sg));
-   if (unlikely(dev && !dma_capable(dev, sg_dma_address(sg),
-   sg->length))) {
-   report_addr(dev, sg_dma_address(sg), sg->length);
+   sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
+   sg->offset, sg->length, dir, attrs);
+   if (sg->dma_address == DMA_MAPPING_ERROR)
return 0;
-   }
sg_dma_len(sg) = sg->length;
}
 
-   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-   dma_direct_sync_sg_for_device(dev, sgl, nents, dir);
return nents;
 }
 
-- 
2.19.1



[RFC] avoid indirect calls for DMA direct mappings v2

2018-12-07 Thread Christoph Hellwig
Hi all,

a while ago Jesper reported major performance regressions due to the
spectre v2 mitigations in his XDP forwarding workloads.  A large part
of that is due to the DMA mapping API indirect calls.

It turns out that the most common implementation of the DMA API is the
direct mapping case, and now that we have merged almost all duplicate
implementations of that into a single generic one, it is easily feasible
to use direct calls for this fast path.

This series consolidates the DMA mapping code by merging the
swiotlb case into the dma-direct case, and then treats NULL dma_ops
as an indicator that we should directly call the direct mapping
code.  This recovers a large part of the retpoline-induced XDP slowdown.

This work is based on the dma-mapping tree, so you probably want to
use this git tree for testing:

git://git.infradead.org/users/hch/misc.git dma-direct-calls.2

Gitweb:


http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-direct-calls.2

Changes since v1:
 - now also includes all the prep patches relative to the dma-mapping
   for-next tree
 - move various slow path functions out of line
 - use a NULL dma ops as the indicator to use the direct mapping path
 - remove dma_direct_ops now that we always call it without the indirection
 - move the dummy dma ops to common code
 - explicitly set the dummy dma ops for devices that are indicated as not
   DMA capable by firmware


[PATCH 03/15] dma-direct: improve addressability error reporting

2018-12-07 Thread Christoph Hellwig
Only report a DMA addressability error once to avoid spewing the
kernel log with repeated messages.  Also provide a stack trace to make it
easy to find the actual caller that caused the problem.

Last but not least move the actual check into the fast path and only
leave the error reporting in a helper.

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/direct.c | 36 +++-
 1 file changed, 15 insertions(+), 21 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 308f88a750c8..edb24f94ea1e 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -30,27 +30,16 @@ static inline bool force_dma_unencrypted(void)
return sev_active();
 }
 
-static bool
-check_addr(struct device *dev, dma_addr_t dma_addr, size_t size,
-   const char *caller)
+static void report_addr(struct device *dev, dma_addr_t dma_addr, size_t size)
 {
-   if (unlikely(dev && !dma_capable(dev, dma_addr, size))) {
-   if (!dev->dma_mask) {
-   dev_err(dev,
-   "%s: call on device without dma_mask\n",
-   caller);
-   return false;
-   }
-
-   if (*dev->dma_mask >= DMA_BIT_MASK(32) || dev->bus_dma_mask) {
-   dev_err(dev,
-   "%s: overflow %pad+%zu of device mask %llx bus 
mask %llx\n",
-   caller, &dma_addr, size,
-   *dev->dma_mask, dev->bus_dma_mask);
-   }
-   return false;
+   if (!dev->dma_mask) {
+   dev_err_once(dev, "DMA map on device without dma_mask\n");
+   } else if (*dev->dma_mask >= DMA_BIT_MASK(32) || dev->bus_dma_mask) {
+   dev_err_once(dev,
+   "overflow %pad+%zu of DMA mask %llx bus mask %llx\n",
+   &dma_addr, size, *dev->dma_mask, dev->bus_dma_mask);
}
-   return true;
+   WARN_ON_ONCE(1);
 }
 
 static inline dma_addr_t phys_to_dma_direct(struct device *dev,
@@ -288,8 +277,10 @@ dma_addr_t dma_direct_map_page(struct device *dev, struct 
page *page,
phys_addr_t phys = page_to_phys(page) + offset;
dma_addr_t dma_addr = phys_to_dma(dev, phys);
 
-   if (!check_addr(dev, dma_addr, size, __func__))
+   if (unlikely(dev && !dma_capable(dev, dma_addr, size))) {
+   report_addr(dev, dma_addr, size);
return DMA_MAPPING_ERROR;
+   }
 
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
dma_direct_sync_single_for_device(dev, dma_addr, size, dir);
@@ -306,8 +297,11 @@ int dma_direct_map_sg(struct device *dev, struct 
scatterlist *sgl, int nents,
BUG_ON(!sg_page(sg));
 
sg_dma_address(sg) = phys_to_dma(dev, sg_phys(sg));
-   if (!check_addr(dev, sg_dma_address(sg), sg->length, __func__))
+   if (unlikely(dev && !dma_capable(dev, sg_dma_address(sg),
+   sg->length))) {
+   report_addr(dev, sg_dma_address(sg), sg->length);
return 0;
+   }
sg_dma_len(sg) = sg->length;
}
 
-- 
2.19.1



[PATCH 01/15] swiotlb: remove SWIOTLB_MAP_ERROR

2018-12-07 Thread Christoph Hellwig
We can use DMA_MAPPING_ERROR instead, which already maps to the same
value.

Signed-off-by: Christoph Hellwig 
---
 drivers/xen/swiotlb-xen.c | 4 ++--
 include/linux/swiotlb.h   | 3 ---
 kernel/dma/swiotlb.c  | 4 ++--
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 6dc969d5ea2f..833e80b46eb2 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -403,7 +403,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, 
struct page *page,
 
map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, dir,
 attrs);
-   if (map == SWIOTLB_MAP_ERROR)
+   if (map == DMA_MAPPING_ERROR)
return DMA_MAPPING_ERROR;
 
dev_addr = xen_phys_to_bus(map);
@@ -572,7 +572,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct 
scatterlist *sgl,
 sg_phys(sg),
 sg->length,
 dir, attrs);
-   if (map == SWIOTLB_MAP_ERROR) {
+   if (map == DMA_MAPPING_ERROR) {
dev_warn(hwdev, "swiotlb buffer is full\n");
/* Don't panic here, we expect map_sg users
   to do proper error handling. */
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index a387b59640a4..14aec0b70dd9 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -46,9 +46,6 @@ enum dma_sync_target {
SYNC_FOR_DEVICE = 1,
 };
 
-/* define the last possible byte of physical address space as a mapping error 
*/
-#define SWIOTLB_MAP_ERROR (~(phys_addr_t)0x0)
-
 extern phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
  dma_addr_t tbl_dma_addr,
  phys_addr_t phys, size_t size,
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index ff1ce81bb623..19ba8e473d71 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -526,7 +526,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
spin_unlock_irqrestore(&io_tlb_lock, flags);
if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", 
size);
-   return SWIOTLB_MAP_ERROR;
+   return DMA_MAPPING_ERROR;
 found:
spin_unlock_irqrestore(&io_tlb_lock, flags);
 
@@ -637,7 +637,7 @@ static dma_addr_t swiotlb_bounce_page(struct device *dev, 
phys_addr_t *phys,
/* Oh well, have to allocate and map a bounce buffer. */
*phys = swiotlb_tbl_map_single(dev, __phys_to_dma(dev, io_tlb_start),
*phys, size, dir, attrs);
-   if (*phys == SWIOTLB_MAP_ERROR)
+   if (*phys == DMA_MAPPING_ERROR)
return DMA_MAPPING_ERROR;
 
/* Ensure that the address returned is DMA'ble */
-- 
2.19.1



[PATCH 02/15] swiotlb: remove dma_mark_clean

2018-12-07 Thread Christoph Hellwig
Instead of providing a special dma_mark_clean hook just for ia64, switch
ia64 to use the normal arch_sync_dma_for_cpu hooks instead.

This means that we now also set the PG_arch_1 bit for pages in the
swiotlb buffer, which isn't strictly needed as we will never execute code
out of the swiotlb buffer, but otherwise harmless.

Signed-off-by: Christoph Hellwig 
---
 arch/ia64/Kconfig  |  3 ++-
 arch/ia64/kernel/dma-mapping.c | 20 +++-
 arch/ia64/mm/init.c| 18 +++---
 drivers/xen/swiotlb-xen.c  | 20 +---
 include/linux/dma-direct.h |  8 
 kernel/dma/swiotlb.c   | 18 +-
 6 files changed, 30 insertions(+), 57 deletions(-)

diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index d6f203658994..c587e3316c38 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -28,7 +28,8 @@ config IA64
select HAVE_ARCH_TRACEHOOK
select HAVE_MEMBLOCK_NODE_MAP
select HAVE_VIRT_CPU_ACCOUNTING
-   select ARCH_HAS_DMA_MARK_CLEAN
+   select ARCH_HAS_DMA_COHERENT_TO_PFN
+   select ARCH_HAS_SYNC_DMA_FOR_CPU
select VIRT_TO_BUS
select ARCH_DISCARD_MEMBLOCK
select GENERIC_IRQ_PROBE
diff --git a/arch/ia64/kernel/dma-mapping.c b/arch/ia64/kernel/dma-mapping.c
index 7a471d8d67d4..36dd6aa6d759 100644
--- a/arch/ia64/kernel/dma-mapping.c
+++ b/arch/ia64/kernel/dma-mapping.c
@@ -1,5 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
-#include 
+#include 
 #include 
 #include 
 
@@ -16,6 +16,24 @@ const struct dma_map_ops *dma_get_ops(struct device *dev)
 EXPORT_SYMBOL(dma_get_ops);
 
 #ifdef CONFIG_SWIOTLB
+void *arch_dma_alloc(struct device *dev, size_t size,
+   dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
+{
+   return dma_direct_alloc_pages(dev, size, dma_handle, gfp, attrs);
+}
+
+void arch_dma_free(struct device *dev, size_t size, void *cpu_addr,
+   dma_addr_t dma_addr, unsigned long attrs)
+{
+   dma_direct_free_pages(dev, size, cpu_addr, dma_addr, attrs);
+}
+
+long arch_dma_coherent_to_pfn(struct device *dev, void *cpu_addr,
+   dma_addr_t dma_addr)
+{
+   return page_to_pfn(virt_to_page(cpu_addr));
+}
+
 void __init swiotlb_dma_init(void)
 {
dma_ops = &swiotlb_dma_ops;
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index d5e12ff1d73c..2c51733f1dfd 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -71,18 +71,14 @@ __ia64_sync_icache_dcache (pte_t pte)
  * DMA can be marked as "clean" so that lazy_mmu_prot_update() doesn't have to
  * flush them when they get mapped into an executable vm-area.
  */
-void
-dma_mark_clean(void *addr, size_t size)
+void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
+   size_t size, enum dma_data_direction dir)
 {
-   unsigned long pg_addr, end;
-
-   pg_addr = PAGE_ALIGN((unsigned long) addr);
-   end = (unsigned long) addr + size;
-   while (pg_addr + PAGE_SIZE <= end) {
-   struct page *page = virt_to_page(pg_addr);
-   set_bit(PG_arch_1, &page->flags);
-   pg_addr += PAGE_SIZE;
-   }
+   unsigned long pfn = __phys_to_pfn(paddr);
+
+   do {
+   set_bit(PG_arch_1, &pfn_to_page(pfn)->flags);
+   } while (++pfn <= __phys_to_pfn(paddr + size - 1));
 }
 
 inline void
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 833e80b46eb2..989cf872b98c 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -441,21 +441,8 @@ static void xen_unmap_single(struct device *hwdev, 
dma_addr_t dev_addr,
xen_dma_unmap_page(hwdev, dev_addr, size, dir, attrs);
 
/* NOTE: We use dev_addr here, not paddr! */
-   if (is_xen_swiotlb_buffer(dev_addr)) {
+   if (is_xen_swiotlb_buffer(dev_addr))
swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs);
-   return;
-   }
-
-   if (dir != DMA_FROM_DEVICE)
-   return;
-
-   /*
-* phys_to_virt doesn't work with hihgmem page but we could
-* call dma_mark_clean() with hihgmem page here. However, we
-* are fine since dma_mark_clean() is null on POWERPC. We can
-* make dma_mark_clean() take a physical address if necessary.
-*/
-   dma_mark_clean(phys_to_virt(paddr), size);
 }
 
 static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
@@ -493,11 +480,6 @@ xen_swiotlb_sync_single(struct device *hwdev, dma_addr_t 
dev_addr,
 
if (target == SYNC_FOR_DEVICE)
xen_dma_sync_single_for_device(hwdev, dev_addr, size, dir);
-
-   if (dir != DMA_FROM_DEVICE)
-   return;
-
-   dma_mark_clean(phys_to_virt(paddr), size);
 }
 
 void
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 6e5a47ae7d64..1aa73f4907ae 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -48,14 +48,6 @@ static inline

Re: [virtio-dev] Re: [PATCH v5 5/7] iommu: Add virtio-iommu driver

2018-12-07 Thread Jean-Philippe Brucker
Sorry for the delay, I wanted to do a little more performance analysis
before continuing.

On 27/11/2018 18:10, Michael S. Tsirkin wrote:
> On Tue, Nov 27, 2018 at 05:55:20PM +, Jean-Philippe Brucker wrote:
 +  if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1) ||
 +  !virtio_has_feature(vdev, VIRTIO_IOMMU_F_MAP_UNMAP))
>>>
>>> Why bother with a feature bit for this then btw?
>>
>> We'll need a new feature bit for sharing page tables with the hardware,
>> because they require different requests (attach_table/invalidate instead
>> of map/unmap.) A future device supporting page table sharing won't
>> necessarily need to support map/unmap.
>>
> I don't see virtio iommu being extended to support ARM specific
> requests. This just won't scale, too many different
> descriptor formats out there.

They aren't really ARM specific requests. The two new requests are
ATTACH_TABLE and INVALIDATE, which would be used by x86 IOMMUs as well.

Sharing CPU address space with the HW IOMMU (SVM) has been in the scope
of virtio-iommu since the first RFC, and I've been working with that
extension in mind since the beginning. As an example you can have a look
at my current draft for this [1], which is inspired from the VFIO work
we've been doing with Intel.

The negotiation phase inevitably requires vendor-specific fields in the
descriptors - host tells which formats are supported, guest chooses a
format and attaches page tables. But invalidation and fault reporting
descriptors are fairly generic.

> If you want to go that way down the road, you should avoid
> virtio iommu, instead emulate and share code with the ARM SMMU (probably
> with a different vendor id so you can implement the
> report on map for devices without PRI).

vSMMU has to stay in userspace though. The main reason we're proposing
virtio-iommu is that emulating every possible vIOMMU model in the kernel
would be unmaintainable. With virtio-iommu we can process the fast path
in the host kernel, through vhost-iommu, and do the heavy lifting in
userspace. As said above, I'm trying to keep the fast path for
virtio-iommu generic.


More notes on what I consider to be the fast path, and comparison with
vSMMU:

(1) The primary use-case we have in mind for vIOMMU is something like
DPDK in the guest, assigning a hardware device to guest userspace. DPDK
maps a large amount of memory statically, to be used by a pass-through
device. For this case I don't think we care about vIOMMU performance.
Setup and teardown need to be reasonably fast, sure, but the MAP/UNMAP
requests don't have to be optimal.


(2) If the assigned device is owned by the guest kernel, then mappings
are dynamic and require dma_map/unmap() to be fast, but there generally
is no need for a vIOMMU, since device and drivers are trusted by the
guest kernel. Even when the user does enable a vIOMMU for this case
(allowing to over-commit guest memory, which needs to be pinned
otherwise), we generally play tricks like lazy TLBI (non-strict mode) to
make it faster. Here device and drivers are trusted, therefore the
vulnerability window of lazy mode isn't a concern.

If the reason to enable the vIOMMU is over-committing guest memory
however, you can't use nested translation because it requires pinning
the second-level tables. For this case performance matters a bit,
because your invalidate-on-map needs to be fast, even if you enable lazy
mode and only receive inval-on-unmap every 10ms. It won't ever be as
fast as nested translation, though. For this case I think vSMMU+Caching
Mode and userspace virtio-iommu with MAP/UNMAP would perform similarly
(given page-sized payloads), because the pagetable walk doesn't add a
lot of overhead compared to the context switch. But given the results
below, vhost-iommu would be faster than vSMMU+CM.


(3) Then there is SVM. For SVM, any destructive change to the process
address space requires a synchronous invalidation command to the
hardware (at least when using PCI ATS). Given that SVM is based on page
faults, fault reporting from host to guest also needs to be fast, as
well as fault response from guest to host.

I think this is where performance matters the most. To get a feel of the
advantage we get with virtio-iommu, I compared the vSMMU page-table
sharing implementation [2] and vhost-iommu + VFIO with page table
sharing (based on Tomasz Nowicki's vhost-iommu prototype). That's on a
ThunderX2 with a 10Gb NIC assigned to the guest kernel, which
corresponds to case (2) above, with nesting page tables and without the
lazy mode. The host's only job is forwarding invalidation to the HW SMMU.

vhost-iommu performed on average 1.8x and 5.5x better than vSMMU on
netperf TCP_STREAM and TCP_MAERTS respectively (~200 samples). I think
this can be further optimized (that was still polling under the vq
lock), and unlike vSMMU, virtio-iommu offers the possibility of
multi-queue for improved scalability. In addition, the guest will need
to send both TLB and ATC invalidations with v

Re: use generic DMA mapping code in powerpc V4

2018-12-07 Thread Christian Zigotzky
Next step: 13c1fdec5682b6e13257277fa16aa31f342d167d (powerpc/dma: move 
pci_dma_dev_setup_swiotlb to fsl_pci.c)

git checkout 13c1fdec5682b6e13257277fa16aa31f342d167d

Result: The PASEMI onboard ethernet works and the X5000 boots.

— Christian

Sent from my iPhone

> On 7. Dec 2018, at 14:45, Christian Zigotzky  wrote:
> 
>> On 06 December 2018 at 11:55AM, Christian Zigotzky wrote:
>>> On 05 December 2018 at 3:05PM, Christoph Hellwig wrote:
>>> 
>>> Thanks.  Can you try a few stepping points in the tree?
>>> 
>>> First just with commit 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6
>>> (the first one) applied?
>>> 
>>> Second with all commits up to 5da11e49df21f21dac25a2491aa788307bdacb6b
>>> 
>>> And if that still works with commits up to
>>> c1bfcad4b0cf38ce5b00f7ad880d3a13484c123a
>>> 
>> Hi Christoph,
>> 
>> I undid the commit 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6 with the 
>> following command:
>> 
>> git checkout 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6
>> 
>> Result: PASEMI onboard ethernet works again and the P5020 board boots.
>> 
>> I will test the other commits in the next days.
>> 
>> @All
>> It is really important that you also test Christoph's work on your PASEMI 
>> and NXP boards. Could you please help us with solving the issues?
>> 
>> 'git clone git://git.infradead.org/users/hch/misc.git -b powerpc-dma.5 a'
>> 
>> Thanks,
>> Christian
>> 
>> 
> Today I tested the commit 5da11e49df21f21dac25a2491aa788307bdacb6b.
> 
> git checkout 5da11e49df21f21dac25a2491aa788307bdacb6b
> 
> The PASEMI onboard ethernet works and the P5020 board boots.
> 
> -- Christian
> 

Re: [PATCH v4] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()

2018-12-07 Thread John Garry

On 30/11/2018 11:14, John Garry wrote:

From: Ganapatrao Kulkarni 



Hi Joerg,

A friendly reminder. Can you please let me know your position on this patch?

Cheers,
John


Change function __iommu_dma_alloc_pages() to allocate pages for DMA from
the respective device's NUMA node. The ternary operator that would otherwise
be needed for alloc_pages_node() is tidied up along with this.

The motivation for this change is to have a policy for page allocation
consistent with direct DMA mapping, which attempts to allocate pages local
to the device, as mentioned in [1].

In addition, a marginal performance improvement has been observed for
certain workloads: the patch yields a 0.9% average throughput improvement
when running tcrypt with the HiSilicon crypto engine.

We also include a modification to use kvzalloc() in place of the
kzalloc()/vzalloc() combination.

[1] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1692998.html

Signed-off-by: Ganapatrao Kulkarni 
[JPG: Added kvzalloc(), drop pages ** being device local, remove ternary 
operator, update message]
Signed-off-by: John Garry 
---
Difference:
v1->v2:
- Add Ganapatrao's tag and change author

v2->v3:
- removed ternary operator
- stopped making pages ** allocation local to device

v3->v4:
- Update commit message to include motivation for patch, including
  headline performance improvement for test.

Some notes:
This patch was originally posted by Ganapatrao in [2].

However, after initial review, it was never reposted (due to lack of
cycles, I think). In addition, the functionality in its sibling patches
was merged through other patches, as mentioned in [2]; this also refers to a
discussion on device local allocations vs CPU local allocations for DMA
pool, and which is better [1].

However, as mentioned in [1], dma_alloc_coherent() uses the locality
information from the device - as in direct DMA - so this patch is just
applying this same policy.

And from some testing, mentioned in commit message, shows marginal
improvement.

[2] https://lore.kernel.org/patchwork/patch/833004/
[3] https://lkml.org/lkml/2018/8/22/391
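
As background, kvzalloc(size, GFP_KERNEL) tries a kmalloc-style allocation
first and falls back to vmalloc for larger sizes, which is why the explicit
size check can go away; kvfree() (already used in __iommu_dma_free_pages())
frees either kind. A condensed view of the allocation change in the diff below:

	/* before: pick the allocator by hand */
	if (array_size <= PAGE_SIZE)
		pages = kzalloc(array_size, GFP_KERNEL);
	else
		pages = vzalloc(array_size);

	/* after: one call that makes the same decision internally */
	pages = kvzalloc(count * sizeof(*pages), GFP_KERNEL);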

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d1b0475..4afb1a8 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -449,20 +449,17 @@ static void __iommu_dma_free_pages(struct page **pages, 
int count)
kvfree(pages);
 }

-static struct page **__iommu_dma_alloc_pages(unsigned int count,
-   unsigned long order_mask, gfp_t gfp)
+static struct page **__iommu_dma_alloc_pages(struct device *dev,
+   unsigned int count, unsigned long order_mask, gfp_t gfp)
 {
struct page **pages;
-   unsigned int i = 0, array_size = count * sizeof(*pages);
+   unsigned int i = 0, nid = dev_to_node(dev);

order_mask &= (2U << MAX_ORDER) - 1;
if (!order_mask)
return NULL;

-   if (array_size <= PAGE_SIZE)
-   pages = kzalloc(array_size, GFP_KERNEL);
-   else
-   pages = vzalloc(array_size);
+   pages = kvzalloc(count * sizeof(*pages), GFP_KERNEL);
if (!pages)
return NULL;

@@ -481,10 +478,12 @@ static struct page **__iommu_dma_alloc_pages(unsigned int 
count,
for (order_mask &= (2U << __fls(count)) - 1;
 order_mask; order_mask &= ~order_size) {
unsigned int order = __fls(order_mask);
+   gfp_t alloc_flags = gfp;

order_size = 1U << order;
-   page = alloc_pages((order_mask - order_size) ?
-  gfp | __GFP_NORETRY : gfp, order);
+   if (order_mask > order_size)
+   alloc_flags |= __GFP_NORETRY;
+   page = alloc_pages_node(nid, alloc_flags, order);
if (!page)
continue;
if (!order)
@@ -569,7 +568,8 @@ struct page **iommu_dma_alloc(struct device *dev, size_t 
size, gfp_t gfp,
alloc_sizes = min_size;

count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   pages = __iommu_dma_alloc_pages(count, alloc_sizes >> PAGE_SHIFT, gfp);
+   pages = __iommu_dma_alloc_pages(dev, count, alloc_sizes >> PAGE_SHIFT,
+   gfp);
if (!pages)
return NULL;







Re: [PATCH 0/2] Refactor dummy DMA ops

2018-12-07 Thread Robin Murphy

On 07/12/2018 17:05, Christoph Hellwig wrote:

So I'd really prefer if we had a separate dummy.c file, like in
my take on your previous patch here:

http://git.infradead.org/users/hch/misc.git/commitdiff/e01adddc1733fa414dc16cd22e8f58be9b64a025

http://git.infradead.org/users/hch/misc.git/commitdiff/596bde76e5944a3f4beb8c2769067ca88dda127a

Otherwise this looks fine.  If you don't mind, I'll take your patches,
apply the move to a separate file and merge it into the above tree.


Sure - TBH I did consider creating a separate file, but then didn't for 
mysterious reasons that don't stand up to scrutiny. If you're happy to 
do the fixup on top that's fine by me (if culling .map_resource was 
intentional, please scrub the last bit of the commit message to match).


I'll make the equivalent change locally anyway just in case there's any 
other cause to resend.


Robin.


Re: [PATCH v3 1/9] mm: Introduce new vm_insert_range API

2018-12-07 Thread Matthew Wilcox
On Fri, Dec 07, 2018 at 03:34:56PM +, Robin Murphy wrote:
> > +int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
> > +   struct page **pages, unsigned long page_count)
> > +{
> > +   unsigned long uaddr = addr;
> > +   int ret = 0, i;
> 
> Some of the sites being replaced were effectively ensuring that vma and
> pages were mutually compatible as an initial condition - would it be worth
> adding something here for robustness, e.g.:
> 
> + if (page_count != vma_pages(vma))
> + return -ENXIO;

I think we want to allow this to be used to populate part of a VMA.
So perhaps:

if (page_count > vma_pages(vma))
return -ENXIO;
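
For illustration, a sketch of vm_insert_range() with that relaxed bound check
folded in (based on the v3 patch quoted above, not the final merged form):

int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
		struct page **pages, unsigned long page_count)
{
	unsigned long uaddr = addr;
	int ret, i;

	/* allow populating only part of the VMA, but never past its end */
	if (page_count > vma_pages(vma))
		return -ENXIO;

	for (i = 0; i < page_count; i++) {
		ret = vm_insert_page(vma, uaddr, pages[i]);
		if (ret < 0)
			return ret;
		uaddr += PAGE_SIZE;
	}

	return 0;
}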



Re: [PATCH 0/2] Refactor dummy DMA ops

2018-12-07 Thread Christoph Hellwig
So I'd really prefer if we had a separate dummy.c file, like in
my take on your previous patch here:

http://git.infradead.org/users/hch/misc.git/commitdiff/e01adddc1733fa414dc16cd22e8f58be9b64a025

http://git.infradead.org/users/hch/misc.git/commitdiff/596bde76e5944a3f4beb8c2769067ca88dda127a

Otherwise this looks fine.  If you don't mind, I'll take your patches,
apply the move to a separate file and merge it into the above tree.


[PATCH 0/2] Refactor dummy DMA ops

2018-12-07 Thread Robin Murphy
Hi all,

Tangential to Christoph's RFC for mitigating indirect call overhead in
common DMA mapping scenarios[1], this is a little reshuffle to prevent the
CONFIG_ACPI_CCA_REQUIRED case from getting in the way. This would best go
via the dma-mapping tree, so reviews and acks welcome.

Robin.

[1] https://lore.kernel.org/lkml/20181206153720.10702-1-...@lst.de/

Robin Murphy (2):
  dma-mapping: Factor out dummy DMA ops
  ACPI / scan: Refactor _CCA enforcement

 arch/arm64/include/asm/dma-mapping.h |  4 +-
 arch/arm64/mm/dma-mapping.c  | 92 
 drivers/acpi/scan.c  |  5 ++
 drivers/base/platform.c  |  3 +-
 drivers/pci/pci-driver.c |  3 +-
 include/linux/dma-mapping.h  |  1 +
 kernel/dma/mapping.c | 54 
 7 files changed, 63 insertions(+), 99 deletions(-)

-- 
2.19.1.dirty



[PATCH 1/2] dma-mapping: Factor out dummy DMA ops

2018-12-07 Thread Robin Murphy
The dummy DMA ops are currently used by arm64 for any device which has
an invalid ACPI description and is thus barred from using DMA due to not
knowing whether is is cache-coherent or not. Factor these out into
general dma-mapping code so that they can be referenced from other
common code paths. In the process, we can prune all the optional
callbacks which just do the same thing as the default behaviour, and
fill in .map_resource for completeness.

CC: Catalin Marinas 
CC: Will Deacon 
Signed-off-by: Robin Murphy 
---
 arch/arm64/include/asm/dma-mapping.h |  4 +-
 arch/arm64/mm/dma-mapping.c  | 92 
 include/linux/dma-mapping.h  |  1 +
 kernel/dma/mapping.c | 54 
 4 files changed, 56 insertions(+), 95 deletions(-)

diff --git a/arch/arm64/include/asm/dma-mapping.h 
b/arch/arm64/include/asm/dma-mapping.h
index c41f3fb1446c..273e778f7de2 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -24,15 +24,13 @@
 #include 
 #include 
 
-extern const struct dma_map_ops dummy_dma_ops;
-
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
/*
 * We expect no ISA devices, and all other DMA masters are expected to
 * have someone call arch_setup_dma_ops at device creation time.
 */
-   return &dummy_dma_ops;
+   return &dma_dummy_ops;
 }
 
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index a3ac26284845..f3af6cc52fad 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -252,98 +252,6 @@ static int __init atomic_pool_init(void)
return -ENOMEM;
 }
 
-/
- * The following APIs are for dummy DMA ops *
- /
-
-static void *__dummy_alloc(struct device *dev, size_t size,
-  dma_addr_t *dma_handle, gfp_t flags,
-  unsigned long attrs)
-{
-   return NULL;
-}
-
-static void __dummy_free(struct device *dev, size_t size,
-void *vaddr, dma_addr_t dma_handle,
-unsigned long attrs)
-{
-}
-
-static int __dummy_mmap(struct device *dev,
-   struct vm_area_struct *vma,
-   void *cpu_addr, dma_addr_t dma_addr, size_t size,
-   unsigned long attrs)
-{
-   return -ENXIO;
-}
-
-static dma_addr_t __dummy_map_page(struct device *dev, struct page *page,
-  unsigned long offset, size_t size,
-  enum dma_data_direction dir,
-  unsigned long attrs)
-{
-   return 0;
-}
-
-static void __dummy_unmap_page(struct device *dev, dma_addr_t dev_addr,
-  size_t size, enum dma_data_direction dir,
-  unsigned long attrs)
-{
-}
-
-static int __dummy_map_sg(struct device *dev, struct scatterlist *sgl,
- int nelems, enum dma_data_direction dir,
- unsigned long attrs)
-{
-   return 0;
-}
-
-static void __dummy_unmap_sg(struct device *dev,
-struct scatterlist *sgl, int nelems,
-enum dma_data_direction dir,
-unsigned long attrs)
-{
-}
-
-static void __dummy_sync_single(struct device *dev,
-   dma_addr_t dev_addr, size_t size,
-   enum dma_data_direction dir)
-{
-}
-
-static void __dummy_sync_sg(struct device *dev,
-   struct scatterlist *sgl, int nelems,
-   enum dma_data_direction dir)
-{
-}
-
-static int __dummy_mapping_error(struct device *hwdev, dma_addr_t dma_addr)
-{
-   return 1;
-}
-
-static int __dummy_dma_supported(struct device *hwdev, u64 mask)
-{
-   return 0;
-}
-
-const struct dma_map_ops dummy_dma_ops = {
-   .alloc  = __dummy_alloc,
-   .free   = __dummy_free,
-   .mmap   = __dummy_mmap,
-   .map_page   = __dummy_map_page,
-   .unmap_page = __dummy_unmap_page,
-   .map_sg = __dummy_map_sg,
-   .unmap_sg   = __dummy_unmap_sg,
-   .sync_single_for_cpu= __dummy_sync_single,
-   .sync_single_for_device = __dummy_sync_single,
-   .sync_sg_for_cpu= __dummy_sync_sg,
-   .sync_sg_for_device = __dummy_sync_sg,
-   .mapping_error  = __dummy_mapping_error,
-   .dma_supported  = __dummy_dma_supported,
-};
-EXPORT_SYMBOL(dummy_dma_ops);
-
 static int __init arm64_dma_init(void)
 {
WARN_TAINT(ARCH_DMA_MINALIGN < cache_line_size(),
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 15bd41447025..3a1

[PATCH 2/2] ACPI / scan: Refactor _CCA enforcement

2018-12-07 Thread Robin Murphy
Rather than checking the DMA attribute at each callsite, just pass it
through for acpi_dma_configure() to handle directly. That can then deal
with the relatively exceptional DEV_DMA_NOT_SUPPORTED case by explicitly
installing dummy DMA ops instead of just skipping setup entirely. This
will then free up the dev->dma_ops == NULL case for some valuable
fastpath optimisations.

CC: Rafael J. Wysocki 
CC: Len Brown 
CC: Greg Kroah-Hartman 
CC: Bjorn Helgaas 
Signed-off-by: Robin Murphy 
---
 drivers/acpi/scan.c  | 5 +
 drivers/base/platform.c  | 3 +--
 drivers/pci/pci-driver.c | 3 +--
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index bd1c59fb0e17..b75ae34ed188 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1456,6 +1456,11 @@ int acpi_dma_configure(struct device *dev, enum 
dev_dma_attr attr)
const struct iommu_ops *iommu;
u64 dma_addr = 0, size = 0;
 
+   if (attr == DEV_DMA_NOT_SUPPORTED) {
+   set_dma_ops(dev, &dma_dummy_ops);
+   return 0;
+   }
+
iort_dma_setup(dev, &dma_addr, &size);
 
iommu = iort_iommu_configure(dev);
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 41b91af95afb..c6daca875c17 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1138,8 +1138,7 @@ int platform_dma_configure(struct device *dev)
ret = of_dma_configure(dev, dev->of_node, true);
} else if (has_acpi_companion(dev)) {
attr = acpi_get_dma_attr(to_acpi_device_node(dev->fwnode));
-   if (attr != DEV_DMA_NOT_SUPPORTED)
-   ret = acpi_dma_configure(dev, attr);
+   ret = acpi_dma_configure(dev, attr);
}
 
return ret;
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index bef17c3fca67..f899a28b90f8 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1602,8 +1602,7 @@ static int pci_dma_configure(struct device *dev)
struct acpi_device *adev = to_acpi_device_node(bridge->fwnode);
enum dev_dma_attr attr = acpi_get_dma_attr(adev);
 
-   if (attr != DEV_DMA_NOT_SUPPORTED)
-   ret = acpi_dma_configure(dev, attr);
+   ret = acpi_dma_configure(dev, attr);
}
 
pci_put_host_bridge_device(bridge);
-- 
2.19.1.dirty



Re: [RFC] avoid indirect calls for DMA direct mappings

2018-12-07 Thread Jesper Dangaard Brouer
On Fri, 7 Dec 2018 16:44:35 +0100
Jesper Dangaard Brouer  wrote:

> On Fri, 7 Dec 2018 02:21:42 +0100
> Christoph Hellwig  wrote:
> 
> > On Thu, Dec 06, 2018 at 08:24:38PM +, Robin Murphy wrote:  
> > > On 06/12/2018 20:00, Christoph Hellwig wrote:
> > >> On Thu, Dec 06, 2018 at 06:54:17PM +, Robin Murphy wrote:
> > >>> I'm pretty sure we used to assign dummy_dma_ops explicitly to devices at
> > >>> the point we detected the ACPI properties are wrong - that shouldn't be 
> > >>> too
> > >>> much of a headache to go back to.
> > >>
> > >> Ok.  I've cooked up a patch to use NULL as the go direct marker.
> > >> This cleans up a few things nicely, but also means we now need to
> > >> do the bypass scheme for all ops, not just the fast path.  But we
> > >> probably should just move the slow path ops out of line anyway,
> > >> so I'm not worried about it.  This has survived some very basic
> > >> testing on x86, and really needs to be cleaned up and split into
> > >> multiple patches..
> > >
> > > I've also just finished hacking something up to keep the arm64 status quo 
> > > - 
> > > I'll need to actually test it tomorrow, but the overall diff looks like 
> > > the 
> > > below.
> > 
> > Nice.  I created a branch that picked up your bits and also the ideas
> > from Linus, and the result looks really nice.  I'll still need a signoff
> > for your bits, though.
> > 
> > Jesper, can you give this a spin if it changes the number even further?
> > 
> >   git://git.infradead.org/users/hch/misc.git dma-direct-calls.2
> > 
> >   
> > http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-direct-calls.2
> >   
> 
> I'll test it soon...
> 
> I looked at my perf stat recording on my existing tests[1] and there
> seems to be significantly more I-cache usage.

The I-cache pressure seems to be lower with the new branch, and
performance improved by about 4.5 nanoseconds per packet.

 
> Copy-paste from my summary[1]:
> [1] 
> https://github.com/xdp-project/xdp-project/blob/master/areas/dma/dma01_test_hellwig_direct_dma.org#summary-of-results

Updated:

* Summary of results

Using XDP_REDIRECT between drivers: RX ixgbe (10G), redirect TX i40e (40G),
via BPF devmap (using samples/bpf/xdp_redirect_map). (Note: a higher TX
link-speed was chosen to ensure we don't have a TX bottleneck.)
The baseline kernel is at commit
[[https://git.kernel.org/torvalds/c/ef78e5ec9214][ef78e5ec9214]], which is the
commit just before Hellwig's changes in this tree.

Performance numbers in packets/sec (XDP_REDIRECT ixgbe -> i40e):
 - 11913154 (11,913,154) pps - baseline compiled without retpoline
 -  7438283  (7,438,283) pps - regression due to CONFIG_RETPOLINE
 -  9610088  (9,610,088) pps - mitigation via Hellwig dma-direct-calls
 - 10049223 (10,049,223) pps - Hellwig branch dma-direct-calls.2

Do notice that at these extreme speeds the pps number increases rapidly with
small changes; e.g., the difference to the new branch is:
 - (1/9610088 - 1/10049223) * 10^9 = 4.54 nanoseconds faster per packet

From the inst per cycle, it is clear that retpolines are stalling the CPU
pipeline:

| pps        | insn per cycle |
|------------+----------------|
| 11,913,154 |           2.39 |
|  7,438,283 |           1.54 |
|  9,610,088 |           2.04 |
| 10,049,223 |           1.99 |


Strangely the Instruction-Cache is also under heavier pressure:

| pps        | l2_rqsts.all_code_rd | l2_rqsts.code_rd_hit | l2_rqsts.code_rd_miss |
|------------+----------------------+----------------------+-----------------------|
| 11,913,154 |              874,547 |              742,335 |               132,198 |
|  7,438,283 |              649,513 |              547,581 |               101,945 |
|  9,610,088 |            2,568,064 |            2,001,369 |               566,683 |
| 10,049,223 |            1,232,818 |            1,152,514 |                80,299 |

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [RFC] avoid indirect calls for DMA direct mappings

2018-12-07 Thread Jesper Dangaard Brouer
On Fri, 7 Dec 2018 02:21:42 +0100
Christoph Hellwig  wrote:

> On Thu, Dec 06, 2018 at 08:24:38PM +, Robin Murphy wrote:
> > On 06/12/2018 20:00, Christoph Hellwig wrote:  
> >> On Thu, Dec 06, 2018 at 06:54:17PM +, Robin Murphy wrote:  
> >>> I'm pretty sure we used to assign dummy_dma_ops explicitly to devices at
> >>> the point we detected the ACPI properties are wrong - that shouldn't be 
> >>> too
> >>> much of a headache to go back to.  
> >>
> >> Ok.  I've cooked up a patch to use NULL as the go direct marker.
> >> This cleans up a few things nicely, but also means we now need to
> >> do the bypass scheme for all ops, not just the fast path.  But we
> >> probably should just move the slow path ops out of line anyway,
> >> so I'm not worried about it.  This has survived some very basic
> >> testing on x86, and really needs to be cleaned up and split into
> >> multiple patches..  
> >
> > I've also just finished hacking something up to keep the arm64 status quo - 
> > I'll need to actually test it tomorrow, but the overall diff looks like the 
> > below.  
> 
> Nice.  I created a branch that picked up your bits and also the ideas
> from Linus, and the result looks really nice.  I'll still need a signoff
> for your bits, though.
> 
> Jesper, can you give this a spin if it changes the number even further?
> 
>   git://git.infradead.org/users/hch/misc.git dma-direct-calls.2
> 
>   
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-direct-calls.2

I'll test it soon...

I looked at my perf stat recording on my existing tests[1] and there
seems to be significantly more I-cache usage.

Copy-paste from my summary[1]:
 [1] 
https://github.com/xdp-project/xdp-project/blob/master/areas/dma/dma01_test_hellwig_direct_dma.org#summary-of-results

* Summary of results

Using XDP_REDIRECT between drivers: RX ixgbe (10G), redirect TX i40e (40G),
via BPF devmap (using samples/bpf/xdp_redirect_map). (Note: a higher TX
link-speed was chosen to ensure we don't have a TX bottleneck.)
The baseline kernel is at commit https://git.kernel.org/torvalds/c/ef78e5ec9214,
which is the commit just before Hellwig's changes in this tree.

Performance numbers in packets/sec (XDP_REDIRECT ixgbe -> i40e):
 - 11913154 (11,913,154) pps - baseline compiled without retpoline
 -  7438283  (7,438,283) pps - regression due to CONFIG_RETPOLINE
 -  9610088  (9,610,088) pps - mitigation via Hellwig dma-direct-calls

From the inst per cycle, it is clear that retpolines are stalling the CPU
pipeline:

| pps        | insn per cycle |
|------------+----------------|
| 11,913,154 |           2.39 |
|  7,438,283 |           1.54 |
|  9,610,088 |           2.04 |


Strangely the Instruction-Cache is also under heavier pressure:

| pps        | l2_rqsts.all_code_rd | l2_rqsts.code_rd_hit | l2_rqsts.code_rd_miss |
|------------+----------------------+----------------------+-----------------------|
| 11,913,154 |              874,547 |              742,335 |               132,198 |
|  7,438,283 |              649,513 |              547,581 |               101,945 |
|  9,610,088 |            2,568,064 |            2,001,369 |               566,683 |


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


dma_declare_coherent_memory on main memory

2018-12-07 Thread Christoph Hellwig
Hi all,

the ARM imx27/31 ports and various sh boards use
dma_declare_coherent_memory on main memory taken from the memblock
allocator.

Is there any good reason these couldn't be switched to CMA areas?
Getting rid of these magic dma_declare_coherent_memory areas would
help make the dma allocator a lot simpler.
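
For reference, the pattern in question looks roughly like this (a simplified
sketch of what such a board file does, not code from any specific board; the
exact memblock helper names differ between kernel versions):

	/* early boot: carve a chunk of ordinary RAM out of memblock */
	phys_addr_t base = memblock_alloc(size, PAGE_SIZE);

	/* later: dedicate it to one device as its private coherent pool */
	dma_declare_coherent_memory(dev, base, base, size, DMA_MEMORY_EXCLUSIVE);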


Re: [PATCH v3 1/9] mm: Introduce new vm_insert_range API

2018-12-07 Thread Robin Murphy

On 06/12/2018 18:39, Souptick Joarder wrote:

Previously, drivers had their own way of mapping a range of
kernel pages/memory into a user vma, and this was done by
invoking vm_insert_page() within a loop.

As this pattern is common across different drivers, it can
be generalized by creating a new function and using it across
the drivers.

vm_insert_range is the new API which will be used to map a
range of kernel memory/pages to user vma.

This API is tested by Heiko for Rockchip drm driver, on rk3188,
rk3288, rk3328 and rk3399 with graphics.

Signed-off-by: Souptick Joarder 
Reviewed-by: Matthew Wilcox 
Reviewed-by: Mike Rapoport 
Tested-by: Heiko Stuebner 
---
  include/linux/mm.h |  2 ++
  mm/memory.c| 38 ++
  mm/nommu.c |  7 +++
  3 files changed, 47 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index fcf9cc9..2bc399f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2506,6 +2506,8 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
  int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
unsigned long pfn, unsigned long size, pgprot_t);
  int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page 
*);
+int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
+   struct page **pages, unsigned long page_count);
  vm_fault_t vmf_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn);
  vm_fault_t vmf_insert_pfn_prot(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/memory.c b/mm/memory.c
index 15c417e..84ea46c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1478,6 +1478,44 @@ static int insert_page(struct vm_area_struct *vma, 
unsigned long addr,
  }
  
  /**

+ * vm_insert_range - insert range of kernel pages into user vma
+ * @vma: user vma to map to
+ * @addr: target user address of this page
+ * @pages: pointer to array of source kernel pages
+ * @page_count: number of pages need to insert into user vma
+ *
+ * This allows drivers to insert range of kernel pages they've allocated
+ * into a user vma. This is a generic function which drivers can use
+ * rather than using their own way of mapping range of kernel pages into
+ * user vma.
+ *
+ * If we fail to insert any page into the vma, the function will return
+ * immediately leaving any previously-inserted pages present.  Callers
+ * from the mmap handler may immediately return the error as their caller
+ * will destroy the vma, removing any successfully-inserted pages. Other
+ * callers should make their own arrangements for calling unmap_region().
+ *
+ * Context: Process context. Called by mmap handlers.
+ * Return: 0 on success and error code otherwise
+ */
+int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
+   struct page **pages, unsigned long page_count)
+{
+   unsigned long uaddr = addr;
+   int ret = 0, i;


Some of the sites being replaced were effectively ensuring that vma and 
pages were mutually compatible as an initial condition - would it be 
worth adding something here for robustness, e.g.:


+   if (page_count != vma_pages(vma))
+   return -ENXIO;

?

(then you could also clean up a couple more places where you're not 
already removing such checks)


Robin.


+
+   for (i = 0; i < page_count; i++) {
+   ret = vm_insert_page(vma, uaddr, pages[i]);
+   if (ret < 0)
+   return ret;
+   uaddr += PAGE_SIZE;
+   }
+
+   return ret;
+}
+EXPORT_SYMBOL(vm_insert_range);
+
+/**
   * vm_insert_page - insert single page into user vma
   * @vma: user vma to map to
   * @addr: target user address of this page
diff --git a/mm/nommu.c b/mm/nommu.c
index 749276b..d6ef5c7 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -473,6 +473,13 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned 
long addr,
  }
  EXPORT_SYMBOL(vm_insert_page);
  
+int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,

+   struct page **pages, unsigned long page_count)
+{
+   return -EINVAL;
+}
+EXPORT_SYMBOL(vm_insert_range);
+
  /*
   *  sys_brk() for the most part doesn't need the global kernel
   *  lock, except when an application is doing something nasty




Re: [PATCH 01/34] powerpc: use mm zones more sensibly

2018-12-07 Thread Christian Zigotzky
I will work at the weekend to figure out where the problematic commit is.

— Christian

Sent from my iPhone

> On 7. Dec 2018, at 15:09, Christoph Hellwig  wrote:
> 
>> On Fri, Dec 07, 2018 at 11:18:18PM +1100, Michael Ellerman wrote:
>> Christoph Hellwig  writes:
>> 
>>> Ben / Michael,
>>> 
>>> can we get this one queued up for 4.21 to prepare for the DMA work later
>>> on?
>> 
>> I was hoping the PASEMI / NXP regressions could be solved before
>> merging.
>> 
>> My p5020ds is booting fine with this series, so I'm not sure why it's
>> causing problems on Christian's machine.
>> 
>> The last time I turned on my PASEMI board it tripped some breakers, so I
>> need to investigate that before I can help test that.
>> 
>> I'll see how things look on Monday and either merge the commits you
>> identified or the whole series depending on if there's any more info
>> from Christian.
> 
> Christian just confirmed everything up to at least
> "powerpc/dma: stop overriding dma_get_required_mask" works for his
> setups.

Re: [PATCH v5 2/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging

2018-12-07 Thread Vlastimil Babka
On 12/7/18 7:16 AM, Nicolas Boichat wrote:
> IOMMUs using ARMv7 short-descriptor format require page tables
> (level 1 and 2) to be allocated within the first 4GB of RAM, even
> on 64-bit systems.
> 
> For level 1/2 pages, ensure GFP_DMA32 is used if CONFIG_ZONE_DMA32
> is defined (e.g. on arm64 platforms).
> 
> For level 2 pages, allocate a slab cache in SLAB_CACHE_DMA32. Note
> that we do not explicitly pass GFP_DMA[32] to kmem_cache_zalloc,
> as this is not strictly necessary, and would cause a warning
> in mm/sl*b.c, as we did not update GFP_SLAB_BUG_MASK.
> 
> Also, print an error when the physical address does not fit in
> 32-bit, to make debugging easier in the future.
> 
> Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")

Also, CC stable?

> Signed-off-by: Nicolas Boichat 
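
As a rough illustration of the allocation strategy described in the commit
message above (a sketch only; the real changes live in
drivers/iommu/io-pgtable-arm-v7s.c and the SLAB_CACHE_DMA32 patches earlier in
this series, and may differ in detail):

	/* level-1 table: full pages, constrained below 4 GB when ZONE_DMA32 exists */
	gfp_t gfp = GFP_KERNEL | __GFP_ZERO;

	if (IS_ENABLED(CONFIG_ZONE_DMA32))
		gfp |= GFP_DMA32;
	l1_table = (void *)__get_free_pages(gfp, get_order(l1_size));

	/* level-2 tables: sub-page objects from a slab cache backed by ZONE_DMA32 */
	l2_cache = kmem_cache_create("io-pgtable-arm-v7s-l2", l2_size, l2_size,
				     SLAB_CACHE_DMA32, NULL);
	l2_table = kmem_cache_zalloc(l2_cache, GFP_KERNEL);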


Re: [PATCH v3 1/9] mm: Introduce new vm_insert_range API

2018-12-07 Thread Mauro Carvalho Chehab
Em Fri, 7 Dec 2018 00:09:45 +0530
Souptick Joarder  escreveu:

> Previously, drivers had their own way of mapping a range of
> kernel pages/memory into a user vma, and this was done by
> invoking vm_insert_page() within a loop.
> 
> As this pattern is common across different drivers, it can
> be generalized by creating a new function and using it across
> the drivers.
> 
> vm_insert_range is the new API which will be used to map a
> range of kernel memory/pages to user vma.
> 
> This API is tested by Heiko for Rockchip drm driver, on rk3188,
> rk3288, rk3328 and rk3399 with graphics.
> 
> Signed-off-by: Souptick Joarder 
> Reviewed-by: Matthew Wilcox 
> Reviewed-by: Mike Rapoport 
> Tested-by: Heiko Stuebner 

Looks good to me.

Reviewed-by: Mauro Carvalho Chehab 

> ---
>  include/linux/mm.h |  2 ++
>  mm/memory.c| 38 ++
>  mm/nommu.c |  7 +++
>  3 files changed, 47 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index fcf9cc9..2bc399f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2506,6 +2506,8 @@ unsigned long change_prot_numa(struct vm_area_struct 
> *vma,
>  int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
>   unsigned long pfn, unsigned long size, pgprot_t);
>  int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page 
> *);
> +int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
> + struct page **pages, unsigned long page_count);
>  vm_fault_t vmf_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
>   unsigned long pfn);
>  vm_fault_t vmf_insert_pfn_prot(struct vm_area_struct *vma, unsigned long 
> addr,
> diff --git a/mm/memory.c b/mm/memory.c
> index 15c417e..84ea46c 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1478,6 +1478,44 @@ static int insert_page(struct vm_area_struct *vma, 
> unsigned long addr,
>  }
>  
>  /**
> + * vm_insert_range - insert range of kernel pages into user vma
> + * @vma: user vma to map to
> + * @addr: target user address of this page
> + * @pages: pointer to array of source kernel pages
> + * @page_count: number of pages need to insert into user vma
> + *
> + * This allows drivers to insert range of kernel pages they've allocated
> + * into a user vma. This is a generic function which drivers can use
> + * rather than using their own way of mapping range of kernel pages into
> + * user vma.
> + *
> + * If we fail to insert any page into the vma, the function will return
> + * immediately leaving any previously-inserted pages present.  Callers
> + * from the mmap handler may immediately return the error as their caller
> + * will destroy the vma, removing any successfully-inserted pages. Other
> + * callers should make their own arrangements for calling unmap_region().
> + *
> + * Context: Process context. Called by mmap handlers.
> + * Return: 0 on success and error code otherwise
> + */
> +int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
> + struct page **pages, unsigned long page_count)
> +{
> + unsigned long uaddr = addr;
> + int ret = 0, i;
> +
> + for (i = 0; i < page_count; i++) {
> + ret = vm_insert_page(vma, uaddr, pages[i]);
> + if (ret < 0)
> + return ret;
> + uaddr += PAGE_SIZE;
> + }
> +
> + return ret;
> +}
> +EXPORT_SYMBOL(vm_insert_range);
> +
> +/**
>   * vm_insert_page - insert single page into user vma
>   * @vma: user vma to map to
>   * @addr: target user address of this page
> diff --git a/mm/nommu.c b/mm/nommu.c
> index 749276b..d6ef5c7 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -473,6 +473,13 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned 
> long addr,
>  }
>  EXPORT_SYMBOL(vm_insert_page);
>  
> +int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
> + struct page **pages, unsigned long page_count)
> +{
> + return -EINVAL;
> +}
> +EXPORT_SYMBOL(vm_insert_range);
> +
>  /*
>   *  sys_brk() for the most part doesn't need the global kernel
>   *  lock, except when an application is doing something nasty



Thanks,
Mauro


Re: [PATCH 01/34] powerpc: use mm zones more sensibly

2018-12-07 Thread Christoph Hellwig
On Fri, Dec 07, 2018 at 11:18:18PM +1100, Michael Ellerman wrote:
> Christoph Hellwig  writes:
> 
> > Ben / Michael,
> >
> > can we get this one queued up for 4.21 to prepare for the DMA work later
> > on?
> 
> I was hoping the PASEMI / NXP regressions could be solved before
> merging.
> 
> My p5020ds is booting fine with this series, so I'm not sure why it's
> causing problems on Christian's machine.
> 
> The last time I turned on my PASEMI board it tripped some breakers, so I
> need to investigate that before I can help test that.
> 
> I'll see how things look on Monday and either merge the commits you
> identified or the whole series depending on if there's any more info
> from Christian.

Christian just confirmed everything up to at least
"powerpc/dma: stop overriding dma_get_required_mask" works for his
setups.


Re: [PATCH v3 6/9] iommu/dma-iommu.c: Convert to use vm_insert_range

2018-12-07 Thread Robin Murphy

On 06/12/2018 18:43, Souptick Joarder wrote:

Convert to use vm_insert_range() to map range of kernel
memory to user vma.

Signed-off-by: Souptick Joarder 
Reviewed-by: Matthew Wilcox 
---
  drivers/iommu/dma-iommu.c | 13 +++--
  1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d1b0475..a2c65e2 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -622,17 +622,10 @@ struct page **iommu_dma_alloc(struct device *dev, size_t 
size, gfp_t gfp,
  
  int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct *vma)

  {
-   unsigned long uaddr = vma->vm_start;
-   unsigned int i, count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   int ret = -ENXIO;
+   unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
  
-	for (i = vma->vm_pgoff; i < count && uaddr < vma->vm_end; i++) {

-   ret = vm_insert_page(vma, uaddr, pages[i]);
-   if (ret)
-   break;
-   uaddr += PAGE_SIZE;
-   }
-   return ret;
+   return vm_insert_range(vma, vma->vm_start,
+   pages + vma->vm_pgoff, count);


You also need to adjust count to compensate for the pages skipped by 
vm_pgoff, otherwise you've got an out-of-bounds dereference triggered 
from userspace, which is pretty high up the "not good" scale (not to 
mention the entire call would then propagate -EFAULT back from 
vm_insert_page() and thus always appear to fail for nonzero offsets).
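
A minimal sketch of the adjustment being asked for here (illustrative only,
not the final patch):

int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct *vma)
{
	unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;

	if (vma->vm_pgoff >= count)
		return -ENXIO;

	return vm_insert_range(vma, vma->vm_start, pages + vma->vm_pgoff,
			       count - vma->vm_pgoff);
}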


Robin.


  }
  
  static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,





Re: use generic DMA mapping code in powerpc V4

2018-12-07 Thread Christian Zigotzky

On 06 December 2018 at 11:55AM, Christian Zigotzky wrote:

On 05 December 2018 at 3:05PM, Christoph Hellwig wrote:


Thanks.  Can you try a few stepping points in the tree?

First just with commit 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6
(the first one) applied?

Second with all commits up to 5da11e49df21f21dac25a2491aa788307bdacb6b

And if that still works with commits up to
c1bfcad4b0cf38ce5b00f7ad880d3a13484c123a


Hi Christoph,

I undid the commit 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6 with the 
following command:


git checkout 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6

Result: PASEMI onboard ethernet works again and the P5020 board boots.

I will test the other commits in the next days.

@All
It is really important that you also test Christoph's work on your 
PASEMI and NXP boards. Could you please help us with solving the issues?


'git clone git://git.infradead.org/users/hch/misc.git -b powerpc-dma.5 a'

Thanks,
Christian



Today I tested the commit 5da11e49df21f21dac25a2491aa788307bdacb6b.

git checkout 5da11e49df21f21dac25a2491aa788307bdacb6b

The PASEMI onboard ethernet works and the P5020 board boots.

-- Christian


Re: [PATCH RFC 1/1] swiotlb: add debugfs to track swiotlb buffer usage

2018-12-07 Thread Robin Murphy

On 07/12/2018 05:49, Dongli Zhang wrote:



On 12/07/2018 12:12 AM, Joe Jin wrote:

Hi Dongli,

Maybe move d_swiotlb_usage declare into swiotlb_create_debugfs():


I assume the call of swiotlb_tbl_map_single() might be frequent in some
situations, e.g., when 'swiotlb=force'.

That's why I declare d_swiotlb_usage outside of any function and use "if
(unlikely(!d_swiotlb_usage))".

I think "if (unlikely(!d_swiotlb_usage))" incurs less performance overhead than
calling swiotlb_create_debugfs() every time to confirm whether debugfs is created. I
would declare d_swiotlb_usage statically inside swiotlb_create_debugfs() if the
performance overhead is acceptable (it is trivial indeed).


That is the reason I tag the patch with RFC because I am not sure if the
on-demand creation of debugfs is fine with maintainers/reviewers. If swiotlb
pages are never allocated, we would not be able to see the debugfs entry.

I would prefer to limit the modification within swiotlb and to not taint any
other files.

The drawback is there is no place to create or delete the debugfs entry because
swiotlb buffer could be initialized and uninitialized at very early stage.


Couldn't you just do it from an initcall? All you really need to care 
about is ordering after debugfs_init(), which is easy. If SWIOTLB 
initialisation does end up being skipped at any point, nobody's going to 
mind if debugfs still has an entry saying io_tlb_nslabs == 0 (in fact, 
that's arguably useful in itself as positive confirmation that the 
system is not using SWIOTLB).



void swiotlb_create_debugfs(void)
{
#ifdef CONFIG_DEBUG_FS
static struct dentry *d_swiotlb_usage = NULL;

if (d_swiotlb_usage)
return;

d_swiotlb_usage = debugfs_create_dir("swiotlb", NULL);

if (!d_swiotlb_usage)
return;

debugfs_create_file("usage", 0600, d_swiotlb_usage,
NULL, &swiotlb_usage_fops);


Maybe expose io_tlb_nslabs and io_tlb_used as separate entries? Then you 
could just use debugfs_create_ulong() to keep things really simple. That 
would also make the interface more consistent with dma-debug, which 
would be nice given how closely-related they are.
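
Putting the two suggestions together might look roughly like this (a sketch,
assuming the io_tlb_used counter added by the RFC patch quoted below):

static int __init swiotlb_create_debugfs(void)
{
	struct dentry *root = debugfs_create_dir("swiotlb", NULL);

	debugfs_create_ulong("io_tlb_nslabs", 0400, root, &io_tlb_nslabs);
	debugfs_create_ulong("io_tlb_used", 0400, root, &io_tlb_used);
	return 0;
}
late_initcall(swiotlb_create_debugfs);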


Robin.


#endif
}

And for io_tlb_used, possible add a check at the begin of 
swiotlb_tbl_map_single(),
if there were not any free slots or not enough slots, return fail directly?


This would optimize the slots allocation path. I will follow this in next
version after I got more suggestions and confirmations from maintainers.


Thank you very much!

Dongli Zhang



Thanks,
Joe
On 12/5/18 7:59 PM, Dongli Zhang wrote:

The device driver will not be able to do dma operations once swiotlb buffer
is full, either because the driver is using so many IO TLB blocks inflight,
or because there is memory leak issue in device driver. To export the
swiotlb buffer usage via debugfs would help the user estimate the size of
swiotlb buffer to pre-allocate or analyze device driver memory leak issue.

As the swiotlb can be initialized at very early stage when debugfs cannot
register successfully, this patch creates the debugfs entry on demand.

Signed-off-by: Dongli Zhang 
---
  kernel/dma/swiotlb.c | 57 
  1 file changed, 57 insertions(+)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 045930e..d3c8aa4 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -35,6 +35,9 @@
  #include 
  #include 
  #include 
+#ifdef CONFIG_DEBUG_FS
+#include 
+#endif
  
  #include 

  #include 
@@ -73,6 +76,13 @@ static phys_addr_t io_tlb_start, io_tlb_end;
   */
  static unsigned long io_tlb_nslabs;
  
+#ifdef CONFIG_DEBUG_FS

+/*
+ * The number of used IO TLB block
+ */
+static unsigned long io_tlb_used;
+#endif
+
  /*
   * This is a free list describing the number of free entries available from
   * each index
@@ -100,6 +110,41 @@ static DEFINE_SPINLOCK(io_tlb_lock);
  
  static int late_alloc;
  
+#ifdef CONFIG_DEBUG_FS

+
+static struct dentry *d_swiotlb_usage;
+
+static int swiotlb_usage_show(struct seq_file *m, void *v)
+{
+   seq_printf(m, "%lu\n%lu\n", io_tlb_used, io_tlb_nslabs);
+   return 0;
+}
+
+static int swiotlb_usage_open(struct inode *inode, struct file *filp)
+{
+   return single_open(filp, swiotlb_usage_show, NULL);
+}
+
+static const struct file_operations swiotlb_usage_fops = {
+   .open   = swiotlb_usage_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= single_release,
+};
+
+void swiotlb_create_debugfs(void)
+{
+   d_swiotlb_usage = debugfs_create_dir("swiotlb", NULL);
+
+   if (!d_swiotlb_usage)
+   return;
+
+   debugfs_create_file("usage", 0600, d_swiotlb_usage,
+   NULL, &swiotlb_usage_fops);
+}
+
+#endif
+
  static int __init
  setup_io_tlb_npages(char *str)
  {
@@ -449,6 +494,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
   

Re: [PATCH v2] iommu: fix amd_iommu=force_isolation

2018-12-07 Thread Joerg Roedel
On Thu, Dec 06, 2018 at 02:39:15PM -0700, Yu Zhao wrote:
> Fixes: aafd8ba0ca74 ("iommu/amd: Implement add_device and remove_device")
> 
> Signed-off-by: Yu Zhao 
> ---
>  drivers/iommu/amd_iommu.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)

Applied, thanks.



Re: [PATCH 01/34] powerpc: use mm zones more sensibly

2018-12-07 Thread Michael Ellerman
Christoph Hellwig  writes:

> Ben / Michael,
>
> can we get this one queued up for 4.21 to prepare for the DMA work later
> on?

I was hoping the PASEMI / NXP regressions could be solved before
merging.

My p5020ds is booting fine with this series, so I'm not sure why it's
causing problems on Christian's machine.

The last time I turned on my PASEMI board it tripped some breakers, so I
need to investigate that before I can help test that.

I'll see how things look on Monday and either merge the commits you
identified or the whole series depending on if there's any more info
from Christian.

cheers

> On Wed, Nov 14, 2018 at 09:22:41AM +0100, Christoph Hellwig wrote:
>> Powerpc has somewhat odd usage where ZONE_DMA is used for all memory on
>> common 64-bit configs, and ZONE_DMA32 is used for 31-bit schemes.
>> 
>> Move to a scheme closer to what other architectures use (and I dare to
>> say the intent of the system):
>> 
>>  - ZONE_DMA: optionally for memory < 31-bit (64-bit embedded only)
>>  - ZONE_NORMAL: everything addressable by the kernel
>>  - ZONE_HIGHMEM: memory > 32-bit for 32-bit kernels
>> 
>> Also provide information on how ZONE_DMA is used by defining
>> ARCH_ZONE_DMA_BITS.
>> 
>> Contains various fixes from Benjamin Herrenschmidt.
>> 
>> Signed-off-by: Christoph Hellwig 
>> ---
>>  arch/powerpc/Kconfig  |  8 +---
>>  arch/powerpc/include/asm/page.h   |  2 +
>>  arch/powerpc/include/asm/pgtable.h|  1 -
>>  arch/powerpc/kernel/dma-swiotlb.c |  6 +--
>>  arch/powerpc/kernel/dma.c |  7 +--
>>  arch/powerpc/mm/mem.c | 47 +++
>>  arch/powerpc/platforms/85xx/corenet_generic.c | 10 
>>  arch/powerpc/platforms/85xx/qemu_e500.c   |  9 
>>  include/linux/mmzone.h|  2 +-
>>  9 files changed, 25 insertions(+), 67 deletions(-)
>> 
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index 8be31261aec8..c3613bc1 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -374,9 +374,9 @@ config PPC_ADV_DEBUG_DAC_RANGE
>>  depends on PPC_ADV_DEBUG_REGS && 44x
>>  default y
>>  
>> -config ZONE_DMA32
>> +config ZONE_DMA
>>  bool
>> -default y if PPC64
>> +default y if PPC_BOOK3E_64
>>  
>>  config PGTABLE_LEVELS
>>  int
>> @@ -869,10 +869,6 @@ config ISA
>>have an IBM RS/6000 or pSeries machine, say Y.  If you have an
>>embedded board, consult your board documentation.
>>  
>> -config ZONE_DMA
>> -bool
>> -default y
>> -
>>  config GENERIC_ISA_DMA
>>  bool
>>  depends on ISA_DMA_API
>> diff --git a/arch/powerpc/include/asm/page.h 
>> b/arch/powerpc/include/asm/page.h
>> index f6a1265face2..fc8c9ac0c6be 100644
>> --- a/arch/powerpc/include/asm/page.h
>> +++ b/arch/powerpc/include/asm/page.h
>> @@ -354,4 +354,6 @@ typedef struct page *pgtable_t;
>>  #endif /* __ASSEMBLY__ */
>>  #include 
>>  
>> +#define ARCH_ZONE_DMA_BITS 31
>> +
>>  #endif /* _ASM_POWERPC_PAGE_H */
>> diff --git a/arch/powerpc/include/asm/pgtable.h 
>> b/arch/powerpc/include/asm/pgtable.h
>> index 9679b7519a35..8af32ce93c7f 100644
>> --- a/arch/powerpc/include/asm/pgtable.h
>> +++ b/arch/powerpc/include/asm/pgtable.h
>> @@ -66,7 +66,6 @@ extern unsigned long empty_zero_page[];
>>  
>>  extern pgd_t swapper_pg_dir[];
>>  
>> -void limit_zone_pfn(enum zone_type zone, unsigned long max_pfn);
>>  int dma_pfn_limit_to_zone(u64 pfn_limit);
>>  extern void paging_init(void);
>>  
>> diff --git a/arch/powerpc/kernel/dma-swiotlb.c 
>> b/arch/powerpc/kernel/dma-swiotlb.c
>> index 5fc335f4d9cd..678811abccfc 100644
>> --- a/arch/powerpc/kernel/dma-swiotlb.c
>> +++ b/arch/powerpc/kernel/dma-swiotlb.c
>> @@ -108,12 +108,8 @@ int __init swiotlb_setup_bus_notifier(void)
>>  
>>  void __init swiotlb_detect_4g(void)
>>  {
>> -if ((memblock_end_of_DRAM() - 1) > 0x) {
>> +if ((memblock_end_of_DRAM() - 1) > 0x)
>>  ppc_swiotlb_enable = 1;
>> -#ifdef CONFIG_ZONE_DMA32
>> -limit_zone_pfn(ZONE_DMA32, (1ULL << 32) >> PAGE_SHIFT);
>> -#endif
>> -}
>>  }
>>  
>>  static int __init check_swiotlb_enabled(void)
>> diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
>> index dbfc7056d7df..6551685a4ed0 100644
>> --- a/arch/powerpc/kernel/dma.c
>> +++ b/arch/powerpc/kernel/dma.c
>> @@ -50,7 +50,7 @@ static int dma_nommu_dma_supported(struct device *dev, u64 
>> mask)
>>  return 1;
>>  
>>  #ifdef CONFIG_FSL_SOC
>> -/* Freescale gets another chance via ZONE_DMA/ZONE_DMA32, however
>> +/* Freescale gets another chance via ZONE_DMA, however
>>   * that will have to be refined if/when they support iommus
>>   */
>>  return 1;
>> @@ -94,13 +94,10 @@ void *__dma_nommu_alloc_coherent(struct device *dev, 
>> size_t size,
>>  }
>>  
>>  switch (zone) {
>> +#ifdef CONFIG_ZONE_DMA
>>  case ZONE_DMA:
>>  flag |= GFP_DMA;
>> 

Re: [RFC PATCH 0/6] Auxiliary IOMMU domains and Arm SMMUv3

2018-12-07 Thread 'j...@8bytes.org'
Hi,

On Mon, Nov 26, 2018 at 07:29:45AM +, Tian, Kevin wrote:
> btw Baolu just reminded me one thing which is worthy of noting here.
> 'primary' vs. 'aux' concept makes sense only when we look from a device
> p.o.v. That binding relationship is not (*should not be*) carry-and-forwarded
> cross devices. every domain must be explicitly attached to other devices
> (instead of implicitly attached being above example), and new primary/aux
> attribute on another device will be decided at attach time.

Okay, so after all the discussions we had I learned a few more things
about the scalable mode feature and thought a bit longer about how to
best support it in the IOMMU-API.

The concept of sub-domains I initially proposed certainly makes no
sense, but scalable-mode specific attach/detach functions do. So instead
of a sub-domain mode, I'd like to propose device-feature sets.

The posted patch-set already includes this as device-attributes, but I
don't like this naming as we are really talking about additional
feature sets of a device. So how about we introduce this:

enum iommu_dev_features {
/* ... */
IOMMU_DEV_FEAT_AUX,
IOMMU_DEV_FEAT_SVA,
/* ... */
};

/* Check if a device supports a given feature of the IOMMU-API */
bool iommu_dev_has_feature(struct device *dev, enum iommu_dev_features *feat);

/*
 * Only works if iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)
 * returns true
 *
 * Also, as long as domains are attached to a device through
 * this interface, any attempt to call iommu_attach_device() should
 * fail (iommu_detach_device() can't fail, so we fail on the
 * attempt to re-attach). This should make us safe against a
 * device being attached to a guest as a whole while there are
 * still pasid users on it (aux and sva).
 */
int iommu_aux_attach_device(struct iommu_domain *domain,
struct device *dev);

int iommu_aux_detach_device(struct iommu_domain *domain,
struct device *dev);
/*
 * I know we are targeting a system-wide pasid-space, so that
 * the pasid would be the same for one domain on all devices,
 * but let's just keep the option open to have different
 * pasid-spaces in one system. Also this way we can use it to
 * check whether the domain is attached to this device at all.
 *
 * returns pasid or <0 if domain has no pasid on that device.
 */
int iommu_aux_get_pasid(struct iommu_domain *domain,
struct device *dev);

/* So do we need an iommu_aux_detach_all()? */

This concept can also be easily extended for supporting SVA in parallel
on the same device, with the same constraints regarding the behavior of
iommu_attach_device()/iommu_detach_device().
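
To make the intended call flow a bit more concrete, a user of the aux
interface would look roughly like the sketch below. Only the functions
proposed above are assumed to exist; the caller name and error handling
are made up for illustration:

	/* Sketch only: example_attach_aux() and its error handling are illustrative. */
	static int example_attach_aux(struct device *dev, struct iommu_domain *domain)
	{
		enum iommu_dev_features feat = IOMMU_DEV_FEAT_AUX;
		int ret, pasid;

		if (!iommu_dev_has_feature(dev, &feat))
			return -ENODEV;

		ret = iommu_aux_attach_device(domain, dev);
		if (ret)
			return ret;

		pasid = iommu_aux_get_pasid(domain, dev);
		if (pasid < 0) {
			iommu_aux_detach_device(domain, dev);
			return pasid;
		}

		/* program 'pasid' into the device for this aux domain */
		return 0;
	}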

So what do you think about that approach?

Regards,

Joerg


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 08/17] locking/lockdep: Add support for nestable terminal locks

2018-12-07 Thread Peter Zijlstra
On Fri, Dec 07, 2018 at 10:22:52AM +0100, Peter Zijlstra wrote:
> On Mon, Nov 19, 2018 at 01:55:17PM -0500, Waiman Long wrote:
> > There are use cases where we want to allow nesting of one terminal lock
> > underneath another terminal-like lock. That new lock type is called
> > nestable terminal lock which can optionally allow the acquisition of
> > no more than one regular (non-nestable) terminal lock underneath it.
> 
> I think I asked for a more coherent changelog on this. The above is
> still self contradictory and doesn't explain why you'd ever want such a
> 'misfeature' :-)

So maybe call the thing penterminal (contraction of penultimate and
terminal) locks and explain why this annotation is safe -- in great
detail.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 09/17] debugobjects: Make object hash locks nestable terminal locks

2018-12-07 Thread Peter Zijlstra
On Mon, Nov 19, 2018 at 01:55:18PM -0500, Waiman Long wrote:
> By making the object hash locks nestable terminal locks, we can avoid
> a bunch of unnecessary lockdep validations as well as saving space
> in the lockdep tables.

So the 'problem'; which you've again not explained; is that debugobjects
has the following lock order:

&db->lock
  &pool_lock

And you seem to want to tag '&db->lock' as terminal, which is obviously
a big fat lie.

You've also not explained why it is safe to do this (I think it actually
is, but you really should've spelled it out).

Furthermore; you've not justified any of this 'insanity' with numbers.
What do we gain with this nestable madness that justifies the crazy?


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/5] iommu/of: Use device_iommu_mapped()

2018-12-07 Thread Joerg Roedel
On Thu, Dec 06, 2018 at 05:42:16PM +, Robin Murphy wrote:
> For sure - although I am now wondering whether "mapped" is perhaps a little
> ambiguous in the naming, since the answer to "can I use the API" is yes even
> when the device may currently be attached to an identity/passthrough domain
> or blocked completely, neither of which involve any "mapping". Maybe simply
> "device_has_iommu()" would convey the intent better?

The name is a shorter version of:

device_is_behind_an_iommu_remapping_its_dma_transactions()
:)

The name is not perfect, but device_has_iommu() is not better because it
might be read as a check of whether the device itself has an IOMMU
built in.

In the end an identity-mapping is also still a mapping (whether the iommu
needs a page-table for that is an implementation detail), as is a
mapping with no page-table entries at all (blocking). So I think
device_iommu_mapped() is a reasonable choice.
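
For illustration only, the kind of caller I have in mind simply gates
its use of the IOMMU-API on that check; a rough sketch (not taken from
any actual driver):

	/* Only walk the IOMMU-API path when the device is actually translated. */
	if (device_iommu_mapped(dev))
		domain = iommu_get_domain_for_dev(dev);
	else
		domain = NULL;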

Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/1] iommu/arm-smmu: Add support to use Last level cache

2018-12-07 Thread Vivek Gautam
Hi Robin,

On Tue, Dec 4, 2018 at 8:51 PM Robin Murphy  wrote:
>
> On 04/12/2018 11:01, Vivek Gautam wrote:
> > Qualcomm SoCs have an additional level of cache called as
> > System cache, aka. Last level cache (LLC). This cache sits right
> > before the DDR, and is tightly coupled with the memory controller.
> > The cache is available to all the clients present in the SoC system.
> > The clients request their slices from this system cache, make it
> > active, and can then start using it.
> > For these clients with smmu, to start using the system cache for
> > buffers and related page tables [1], memory attributes need to be
> > set accordingly.
> > This change updates the MAIR and TCR configurations with correct
> > attributes to use this system cache.
> >
> > To explain a little about memory attribute requirements here:
> >
> > Non-coherent I/O devices can't look-up into inner caches. However,
> > coherent I/O devices can. But both can allocate in the system cache
> > based on system policy and configured memory attributes in page
> > tables.
> > CPUs can access both inner and outer caches (including system cache,
> > aka. Last level cache), and can allocate into system cache too
> > based on memory attributes, and system policy.
> >
> > Further looking at memory types, we have following -
> > a) Normal uncached :- MAIR 0x44, inner non-cacheable,
> >outer non-cacheable;
> > b) Normal cached :-   MAIR 0xff, inner read write-back non-transient,
> >outer read write-back non-transient;
> > >attribute setting for coherent I/O devices.
> >
> > and, for non-coherent i/o devices that can allocate in system cache
> > another type gets added -
> > c) Normal sys-cached/non-inner-cached :-
> >MAIR 0xf4, inner non-cacheable,
> >outer read write-back non-transient
> >
> > So, CPU will automatically use the system cache for memory marked as
> > normal cached. The normal sys-cached is downgraded to normal non-cached
> > memory for CPUs.
> > Coherent I/O devices can use system cache by marking the memory as
> > normal cached.
> > Non-coherent I/O devices, to use system cache, should mark the memory as
> > normal sys-cached in page tables.
> >
> > This change is a realisation of following changes
> > from downstream msm-4.9:
> > iommu: io-pgtable-arm: Support DOMAIN_ATTRIBUTE_USE_UPSTREAM_HINT[2]
> > iommu: io-pgtable-arm: Implement IOMMU_USE_UPSTREAM_HINT[3]
> >
> > [1] https://patchwork.kernel.org/patch/10302791/
> > [2] https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=bf762276796e79ca90014992f4d9da5593fa7d51
> > [3] https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=d4c72c413ea27c43f60825193d4de9cb8ffd9602
> >
> > Signed-off-by: Vivek Gautam 
> > ---
> >
> > Changes since v1:
> >   - Addressed Tomasz's comments for basing the change on
> > "NO_INNER_CACHE" concept for non-coherent I/O devices
> > rather than capturing "SYS_CACHE". This is to indicate
> > clearly the intent of non-coherent I/O devices that
> > can't access inner caches.
>
> That seems backwards to me - there is already a fundamental assumption
> that non-coherent devices can't access caches. What we're adding here is
> a weird exception where they *can* use some level of cache despite still
> being non-coherent overall.
>
> In other words, it's not a case of downgrading coherent devices'
> accesses to bypass inner caches, it's upgrading non-coherent devices'
> accesses to hit the outer cache. That's certainly the understanding I
> got from talking with Pratik at Plumbers, and it does appear to fit with
> your explanation above despite the final conclusion you draw being
> different.

Thanks for the thorough review of the change.
Right, I guess it's rather an upgrade for non-coherent devices to use
an outer cache than a downgrade for coherent devices.
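
(Just to restate the attribute encodings from the description above in
code form -- the macro names here are purely illustrative, not what the
patch defines:)

	/* Illustrative only: MAIR attribute encodings discussed above. */
	#define EXAMPLE_MAIR_NORMAL_NC         0x44  /* inner + outer non-cacheable */
	#define EXAMPLE_MAIR_NORMAL_WB         0xff  /* inner + outer write-back */
	#define EXAMPLE_MAIR_NORMAL_SYS_CACHED 0xf4  /* inner non-cacheable, outer write-back */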

>
> I do see what Tomasz meant in terms of the TCR attributes, but what we
> currently do there is a little unintuitive and not at all representative
> of actual mapping attributes - I'll come back to that inline.
>
> >   drivers/iommu/arm-smmu.c   | 15 +++
> >   drivers/iommu/dma-iommu.c  |  3 +++
> >   drivers/iommu/io-pgtable-arm.c | 22 +-
> >   drivers/iommu/io-pgtable.h |  5 +
> >   include/linux/iommu.h  |  3 +++
> >   5 files changed, 43 insertions(+), 5 deletions(-)
>
> As a minor nit, I'd prefer this as at least two patches to separate the
> io-pgtable changes and arm-smmu changes - basically I'd expect it to
> look much the same as the non-strict mode support did.

Sure, will split the patch.

>
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > index ba18d89d4732..047f7ff95b0d 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -255,6 +255,7 @@ struct arm_smmu_domain {
> >   struct mutex   

Re: [PATCH v2 08/17] locking/lockdep: Add support for nestable terminal locks

2018-12-07 Thread Peter Zijlstra
On Mon, Nov 19, 2018 at 01:55:17PM -0500, Waiman Long wrote:
> There are use cases where we want to allow nesting of one terminal lock
> underneath another terminal-like lock. That new lock type is called
> nestable terminal lock which can optionally allow the acquisition of
> no more than one regular (non-nestable) terminal lock underneath it.

I think I asked for a more coherent changelog on this. The above is
still self contradictory and doesn't explain why you'd ever want such a
'misfeature' :-)
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections

2018-12-07 Thread Peter Zijlstra
On Mon, Nov 19, 2018 at 01:55:16PM -0500, Waiman Long wrote:
> The db->lock is a raw spinlock and so the lock hold time is supposed
> to be short. This will not be the case when printk() is being involved
> in some of the critical sections. In order to avoid the long hold time,
> in case some messages need to be printed, the debug_object_is_on_stack()
> and debug_print_object() calls are now moved out of those critical
> sections.

That's not why you did this patch though; you want to make these locks
terminal locks and that means no printk() inside, as that uses locks
again.

Please write relevant changelogs.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 03/17] locking/lockdep: Add a new terminal lock type

2018-12-07 Thread Peter Zijlstra
On Mon, Nov 19, 2018 at 01:55:12PM -0500, Waiman Long wrote:
> A terminal lock is a lock where further locking or unlocking on another
> lock is not allowed. IOW, no forward dependency is permitted.
> 
> With such a restriction in place, we don't really need to do a full
> validation of the lock chain involving a terminal lock.  Instead,
> we just check if there is any further locking or unlocking on another
> lock when a terminal lock is being held.
> 
> Only spinlocks which are acquired by the _irq or _irqsave variants
> or in IRQ disabled context should be classified as terminal locks.
> 
> By adding this new lock type, we can save entries in lock_chains[],
> chain_hlocks[], list_entries[] and stack_trace[]. By marking suitable
> locks as terminal, we reduce the chance of overflowing those tables
> allowing them to focus on locks that can have both forward and backward
> dependencies.
> 
> Four bits are stolen from the pin_count of the held_lock structure
> to hold a new 4-bit flags field. The pin_count field is essentially a
> summation of 16-bit random cookie values. Removing 4 bits still allows
> the pin_count to accumulate up to almost 4096 of those cookie values.
> 
> Signed-off-by: Waiman Long 
> ---
>  include/linux/lockdep.h| 29 ---
>  kernel/locking/lockdep.c   | 47 
> --
>  kernel/locking/lockdep_internals.h |  5 
>  kernel/locking/lockdep_proc.c  | 11 +++--
>  4 files changed, 80 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> index 8fe5b4f..a146bca 100644
> --- a/include/linux/lockdep.h
> +++ b/include/linux/lockdep.h
> @@ -144,9 +144,20 @@ struct lock_class_stats {
>  
>  /*
>   * Lockdep class type flags
> + *
>   * 1) LOCKDEP_FLAG_NOVALIDATE: No full validation, just simple checks.
> + * 2) LOCKDEP_FLAG_TERMINAL: This is a terminal lock where lock/unlock on
> + *another lock within its critical section is not allowed.
> + *
> + * Only the least significant 4 bits of the flags will be copied to the
> + * held_lock structure.
>   */
> -#define LOCKDEP_FLAG_NOVALIDATE  (1 << 0)
> +#define LOCKDEP_FLAG_TERMINAL  (1 << 0)
> +#define LOCKDEP_FLAG_NOVALIDATE  (1 << 4)

Just leave the novalidate thing alone, then you don't get to do silly
things like this.


Also; I have a pending patch (that I never quite finished) that tests
lock nesting type (ie. raw_spinlock_t < spinlock_t < struct mutex) that
wanted to use many of these same holes you took.

I think we can easily fit the lot together in bitfields though, since
you really don't need that many flags.
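
Something along these lines is what I have in mind -- the field names and
widths below are only a sketch, not the actual held_lock layout:

	/* Sketch only: pack the stolen bits as explicit bitfields. */
	struct held_lock_bits_sketch {
		unsigned int pin_count:12;	/* cookie sum; still ~4k cookies */
		unsigned int flags:4;		/* terminal / nestable / ... */
		unsigned int wait_type_outer:2;	/* for the wait-type checks below */
		unsigned int wait_type_inner:2;
	};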

I refreshed the below patch a number of months ago (no idea if it still
applies, I think it was before Paul munged all of RCU). You need to kill
printk and lift a few RT patches for the below to 'work' IIRC.

---
Subject: lockdep: Introduce wait-type checks
From: Peter Zijlstra 
Date: Tue, 19 Nov 2013 21:45:48 +0100

This patch extends lockdep to validate lock wait-type context.

The current wait-types are:

LD_WAIT_FREE,   /* wait free, rcu etc.. */
LD_WAIT_SPIN,   /* spin loops, raw_spinlock_t etc.. */
LD_WAIT_CONFIG, /* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */
LD_WAIT_SLEEP,  /* sleeping locks, mutex_t etc.. */

Where lockdep validates that the current lock (the one being acquired)
fits in the current wait-context (as generated by the held stack).

This ensures that we do not try and acquire mutices while holding
spinlocks, do not attempt to acquire spinlocks while holding
raw_spinlocks and so on. In other words, its a more fancy
might_sleep().

Obviously RCU made the entire ordeal more complex than a simple single
value test because we can acquire RCU in (pretty much) any context and
while it presents a context to nested locks it is not the same as it
got acquired in.

Therefore we needed to split the wait_type into two values, one
representing the acquire (outer) and one representing the nested
context (inner). For most 'normal' locks these two are the same.

[ To make static initialization easier we have the rule that:
  .outer == INV means .outer == .inner; because INV == 0. ]

It further means that we need to find the minimal .inner of the held
stack to compare against the outer of the new lock; because while
'normal' RCU presents a CONFIG type to nested locks, if it is taken
while already holding a SPIN type it obviously doesn't relax the
rules.

Below is an example output; generated by the trivial example:

raw_spin_lock(&foo);
spin_lock(&bar);
spin_unlock(&bar);
raw_spin_unlock(&foo);

The way to read it is to look at the new -{n,m} part in the lock
description; -{3:3} for our attempted lock, and try and match that up
to the held locks, which in this case is the one: -{2,2}.

This tells us that the acquiring lock requires a more relaxed
environment than presented by the lock stack.

Re: [PATCH v5 2/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging

2018-12-07 Thread Nicolas Boichat
On Fri, Dec 7, 2018 at 4:05 PM Matthew Wilcox  wrote:
>
> On Fri, Dec 07, 2018 at 02:16:19PM +0800, Nicolas Boichat wrote:
> > +#ifdef CONFIG_ZONE_DMA32
> > +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
> > +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32
>
> This name doesn't make any sense.  Why not ARM_V7S_TABLE_SLAB_FLAGS ?

Sure, will fix in v6 then.

> > +#else
> > +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA
> > +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA
>
> Can you remind me again why it is, on machines which don't support
> ZONE_DMA32, why we have to allocate from ZONE_DMA?  My understanding
> is that 64-bit machines have ZONE_DMA32 and 32-bit machines don't.
> So shouldn't this rather be GFP_KERNEL?

Sorry, I meant to reply on the v4 thread; Christoph raised the same
question (https://patchwork.kernel.org/patch/10713025/).

I don't know, and I don't have all the hardware needed to test this
:-( Robin and Will both didn't seem sure.

I'd rather not introduce a new regression; this patch series tries to
fix a known arm64 regression where we _need_ tables to be in DMA32.
If we want to change 32-bit hardware to use GFP_KERNEL, and we're sure
it works, that's fine by me, but it should be in another patch set.

Hope this makes sense,

Thanks,

> Actually, maybe we could centralise this in gfp.h:
>
> #ifdef CONFIG_64BIT
> # ifdef CONFIG_ZONE_DMA32
> #define GFP_32BIT   GFP_DMA32
> # else
> #define GFP_32BIT   GFP_DMA
> # endif
> #else /* 32-bit */
> #define GFP_32BIT   GFP_KERNEL
> #endif
>
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 2/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging

2018-12-07 Thread Matthew Wilcox
On Fri, Dec 07, 2018 at 02:16:19PM +0800, Nicolas Boichat wrote:
> +#ifdef CONFIG_ZONE_DMA32
> +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
> +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32

This name doesn't make any sense.  Why not ARM_V7S_TABLE_SLAB_FLAGS ?

> +#else
> +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA
> +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA

Can you remind me again why it is, on machines which don't support
ZONE_DMA32, why we have to allocate from ZONE_DMA?  My understanding
is that 64-bit machines have ZONE_DMA32 and 32-bit machines don't.
So shouldn't this rather be GFP_KERNEL?

Actually, maybe we could centralise this in gfp.h:

#ifdef CONFIG_64BIT
# ifdef CONFIG_ZONE_DMA32
#define GFP_32BIT   GFP_DMA32
# else
#define GFP_32BIT   GFP_DMA
# endif
#else /* 32-bit */
#define GFP_32BIT   GFP_KERNEL
#endif
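
A user could then pick the right zone with something like the below
(untested sketch, assuming the GFP_32BIT definition above):

	/* Allocate a page-table page that is guaranteed 32-bit addressable. */
	page = alloc_pages(gfp | GFP_32BIT | __GFP_ZERO, get_order(size));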

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu