Re: [PATCH 6/6] sparc: merge 32-bit and 64-bit version of pci.h

2018-12-08 Thread David Miller
From: Christoph Hellwig 
Date: Sat,  8 Dec 2018 09:41:15 -0800

> There are enough common definitions that a single header seems nicer.
> 
> Also drop the pointless  include.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Sam Ravnborg 

Acked-by: David S. Miller 

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH 4/6] sparc: remove not required includes from dma-mapping.h

2018-12-08 Thread David Miller
From: Christoph Hellwig 
Date: Sat,  8 Dec 2018 09:41:13 -0800

> The only thing we need to explicitly pull in is the defines for the
> CPU type.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: David S. Miller 



Re: [PATCH 5/6] sparc: move the leon PCI memory space comment to

2018-12-08 Thread David Miller
From: Christoph Hellwig 
Date: Sat,  8 Dec 2018 09:41:14 -0800

> It has nothing to do with the content of the pci.h header.
> 
> Suggested-by: Sam Ravnborg 
> Signed-off-by: Christoph Hellwig 

Acked-by: David S. Miller 



Re: [PATCH 07/10] sparc64/pci_sun4v: move code around a bit

2018-12-08 Thread David Miller
From: Christoph Hellwig 
Date: Sat,  8 Dec 2018 09:36:59 -0800

> Move the alloc / free routines down the file so that we can easily use
> the map / unmap helpers to implement non-consistent allocations.
> 
> Also drop the _coherent postfix to match the method name.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: David S. Miller 



Re: [PATCH 06/10] sparc64/iommu: implement DMA_ATTR_NON_CONSISTENT

2018-12-08 Thread David Miller
From: Christoph Hellwig 
Date: Sat,  8 Dec 2018 09:36:58 -0800

> Just allocate the memory and use map_page to map the memory.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: David S. Miller 



Re: [PATCH 08/10] sparc64/pci_sun4v: implement DMA_ATTR_NON_CONSISTENT

2018-12-08 Thread David Miller
From: Christoph Hellwig 
Date: Sat,  8 Dec 2018 09:37:00 -0800

> Just allocate the memory and use map_page to map the memory.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: David S. Miller 



Re: [PATCH 05/10] sparc64/iommu: move code around a bit

2018-12-08 Thread David Miller
From: Christoph Hellwig 
Date: Sat,  8 Dec 2018 09:36:57 -0800

> Move the alloc / free routines down the file so that we can easily use
> the map / unmap helpers to implement non-consistent allocations.
> 
> Also drop the _coherent postfix to match the method name.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: David S. Miller 



Re: [PATCH 1/6] sparc: remove not needed sbus_dma_ops methods

2018-12-08 Thread David Miller
From: Christoph Hellwig 
Date: Sat,  8 Dec 2018 09:41:10 -0800

> No need to BUG_ON() on the cache maintenance ops - they are no-ops
> by default, and there is nothing in the DMA API contract that prohibits
> calling them on sbus devices (even if such drivers are unlikely to
> ever appear).
> 
> Similarly a dma_supported method that always returns 0 is rather
> pointless.  The only thing that indicates is that no one ever calls
> the method on sbus devices.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: David S. Miller 



Re: [PATCH 2/6] sparc: factor the dma coherent mapping into helper

2018-12-08 Thread David Miller
From: Christoph Hellwig 
Date: Sat,  8 Dec 2018 09:41:11 -0800

> Factor the code to remap memory returned from the DMA coherent allocator
> into two helpers that can be shared by the IOMMU and direct mapping code.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: David S. Miller 



[PATCH 6/6] sparc: merge 32-bit and 64-bit version of pci.h

2018-12-08 Thread Christoph Hellwig
There are enough common definitions that a single header seems nicer.

Also drop the pointless  include.

Signed-off-by: Christoph Hellwig 
Acked-by: Sam Ravnborg 
---
 arch/sparc/include/asm/pci.h| 53 ++---
 arch/sparc/include/asm/pci_32.h | 32 
 arch/sparc/include/asm/pci_64.h | 52 
 3 files changed, 49 insertions(+), 88 deletions(-)
 delete mode 100644 arch/sparc/include/asm/pci_32.h
 delete mode 100644 arch/sparc/include/asm/pci_64.h

diff --git a/arch/sparc/include/asm/pci.h b/arch/sparc/include/asm/pci.h
index cad79a6ce0e4..cfec79bb1831 100644
--- a/arch/sparc/include/asm/pci.h
+++ b/arch/sparc/include/asm/pci.h
@@ -1,9 +1,54 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef ___ASM_SPARC_PCI_H
 #define ___ASM_SPARC_PCI_H
-#if defined(__sparc__) && defined(__arch64__)
-#include 
+
+
+/* Can be used to override the logic in pci_scan_bus for skipping
+ * already-configured bus numbers - to be used for buggy BIOSes
+ * or architectures with incomplete PCI setup by the loader.
+ */
+#define pcibios_assign_all_busses()0
+
+#define PCIBIOS_MIN_IO 0UL
+#define PCIBIOS_MIN_MEM0UL
+
+#define PCI_IRQ_NONE   0x
+
+
+#ifdef CONFIG_SPARC64
+
+/* PCI IOMMU mapping bypass support. */
+
+/* PCI 64-bit addressing works for all slots on all controller
+ * types on sparc64.  However, it requires that the device
+ * can drive enough of the 64 bits.
+ */
+#define PCI64_REQUIRED_MASK(~(u64)0)
+#define PCI64_ADDR_BASE0xfffcUL
+
+/* Return the index of the PCI controller for device PDEV. */
+int pci_domain_nr(struct pci_bus *bus);
+static inline int pci_proc_domain(struct pci_bus *bus)
+{
+   return 1;
+}
+
+/* Platform support for /proc/bus/pci/X/Y mmap()s. */
+#define HAVE_PCI_MMAP
+#define arch_can_pci_mmap_io() 1
+#define HAVE_ARCH_PCI_GET_UNMAPPED_AREA
+#define get_pci_unmapped_area get_fb_unmapped_area
+
+#define HAVE_ARCH_PCI_RESOURCE_TO_USER
+#endif /* CONFIG_SPARC64 */
+
+#if defined(CONFIG_SPARC64) || defined(CONFIG_LEON_PCI)
+static inline int pci_get_legacy_ide_irq(struct pci_dev *dev, int channel)
+{
+   return PCI_IRQ_NONE;
+}
 #else
-#include 
-#endif
+#include 
 #endif
+
+#endif /* ___ASM_SPARC_PCI_H */
diff --git a/arch/sparc/include/asm/pci_32.h b/arch/sparc/include/asm/pci_32.h
deleted file mode 100644
index a475380ea108..
--- a/arch/sparc/include/asm/pci_32.h
+++ /dev/null
@@ -1,32 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __SPARC_PCI_H
-#define __SPARC_PCI_H
-
-#ifdef __KERNEL__
-
-#include 
-
-/* Can be used to override the logic in pci_scan_bus for skipping
- * already-configured bus numbers - to be used for buggy BIOSes
- * or architectures with incomplete PCI setup by the loader.
- */
-#define pcibios_assign_all_busses()0
-
-#define PCIBIOS_MIN_IO 0UL
-#define PCIBIOS_MIN_MEM0UL
-
-#define PCI_IRQ_NONE   0x
-
-#endif /* __KERNEL__ */
-
-#ifndef CONFIG_LEON_PCI
-/* generic pci stuff */
-#include 
-#else
-static inline int pci_get_legacy_ide_irq(struct pci_dev *dev, int channel)
-{
-   return PCI_IRQ_NONE;
-}
-#endif
-
-#endif /* __SPARC_PCI_H */
diff --git a/arch/sparc/include/asm/pci_64.h b/arch/sparc/include/asm/pci_64.h
deleted file mode 100644
index fac77813402c..
--- a/arch/sparc/include/asm/pci_64.h
+++ /dev/null
@@ -1,52 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __SPARC64_PCI_H
-#define __SPARC64_PCI_H
-
-#ifdef __KERNEL__
-
-#include 
-
-/* Can be used to override the logic in pci_scan_bus for skipping
- * already-configured bus numbers - to be used for buggy BIOSes
- * or architectures with incomplete PCI setup by the loader.
- */
-#define pcibios_assign_all_busses()0
-
-#define PCIBIOS_MIN_IO 0UL
-#define PCIBIOS_MIN_MEM0UL
-
-#define PCI_IRQ_NONE   0x
-
-/* PCI IOMMU mapping bypass support. */
-
-/* PCI 64-bit addressing works for all slots on all controller
- * types on sparc64.  However, it requires that the device
- * can drive enough of the 64 bits.
- */
-#define PCI64_REQUIRED_MASK(~(u64)0)
-#define PCI64_ADDR_BASE0xfffcUL
-
-/* Return the index of the PCI controller for device PDEV. */
-
-int pci_domain_nr(struct pci_bus *bus);
-static inline int pci_proc_domain(struct pci_bus *bus)
-{
-   return 1;
-}
-
-/* Platform support for /proc/bus/pci/X/Y mmap()s. */
-
-#define HAVE_PCI_MMAP
-#define arch_can_pci_mmap_io() 1
-#define HAVE_ARCH_PCI_GET_UNMAPPED_AREA
-#define get_pci_unmapped_area get_fb_unmapped_area
-
-static inline int pci_get_legacy_ide_irq(struct pci_dev *dev, int channel)
-{
-   return PCI_IRQ_NONE;
-}
-
-#define HAVE_ARCH_PCI_RESOURCE_TO_USER
-#endif /* __KERNEL__ */
-
-#endif /* __SPARC64_PCI_H */
-- 
2.19.2



[PATCH 4/6] sparc: remove not required includes from dma-mapping.h

2018-12-08 Thread Christoph Hellwig
The only thing we need to explicitly pull in is the defines for the
CPU type.

Signed-off-by: Christoph Hellwig 
---
 arch/sparc/include/asm/dma-mapping.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/sparc/include/asm/dma-mapping.h 
b/arch/sparc/include/asm/dma-mapping.h
index b0bb2fcaf1c9..55a44f08a9a4 100644
--- a/arch/sparc/include/asm/dma-mapping.h
+++ b/arch/sparc/include/asm/dma-mapping.h
@@ -2,9 +2,7 @@
 #ifndef ___ASM_SPARC_DMA_MAPPING_H
 #define ___ASM_SPARC_DMA_MAPPING_H
 
-#include 
-#include 
-#include 
+#include 
 
 extern const struct dma_map_ops *dma_ops;
 
-- 
2.19.2




[PATCH 5/6] sparc: move the leon PCI memory space comment to

2018-12-08 Thread Christoph Hellwig
It has nothing to do with the content of the pci.h header.

Suggested-by: Sam Ravnborg 
Signed-off-by: Christoph Hellwig 
---
 arch/sparc/include/asm/leon.h   | 9 +
 arch/sparc/include/asm/pci_32.h | 9 -
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/sparc/include/asm/leon.h b/arch/sparc/include/asm/leon.h
index c68bb5b76e3d..77ea406ff9df 100644
--- a/arch/sparc/include/asm/leon.h
+++ b/arch/sparc/include/asm/leon.h
@@ -255,4 +255,13 @@ extern int leon_ipi_irq;
 #define _pfn_valid(pfn) ((pfn < last_valid_pfn) && (pfn >= 
PFN(phys_base)))
 #define _SRMMU_PTE_PMASK_LEON 0x
 
+/*
+ * On LEON PCI Memory space is mapped 1:1 with physical address space.
+ *
+ * I/O space is located at low 64Kbytes in PCI I/O space. The I/O addresses
+ * are converted into CPU addresses to virtual addresses that are mapped with
+ * MMU to the PCI Host PCI I/O space window which are translated to the low
+ * 64Kbytes by the Host controller.
+ */
+
 #endif
diff --git a/arch/sparc/include/asm/pci_32.h b/arch/sparc/include/asm/pci_32.h
index cfc0ee9476c6..a475380ea108 100644
--- a/arch/sparc/include/asm/pci_32.h
+++ b/arch/sparc/include/asm/pci_32.h
@@ -23,15 +23,6 @@
 /* generic pci stuff */
 #include 
 #else
-/*
- * On LEON PCI Memory space is mapped 1:1 with physical address space.
- *
- * I/O space is located at low 64Kbytes in PCI I/O space. The I/O addresses
- * are converted into CPU addresses to virtual addresses that are mapped with
- * MMU to the PCI Host PCI I/O space window which are translated to the low
- * 64Kbytes by the Host controller.
- */
-
 static inline int pci_get_legacy_ide_irq(struct pci_dev *dev, int channel)
 {
return PCI_IRQ_NONE;
-- 
2.19.2




[PATCH 3/6] sparc: remove the sparc32_dma_ops indirection

2018-12-08 Thread Christoph Hellwig
There is no good reason to have a double indirection for the sparc32
dma ops, so remove the sparc32_dma_ops and define separate dma_map_ops
instance for the different IOMMU types.

Signed-off-by: Christoph Hellwig 
---
 arch/sparc/include/asm/dma.h |  48 +---
 arch/sparc/kernel/ioport.c   | 124 +--
 arch/sparc/mm/io-unit.c  |  65 -
 arch/sparc/mm/iommu.c| 137 ++-
 4 files changed, 138 insertions(+), 236 deletions(-)

diff --git a/arch/sparc/include/asm/dma.h b/arch/sparc/include/asm/dma.h
index a1d7c86917c6..462e7c794a09 100644
--- a/arch/sparc/include/asm/dma.h
+++ b/arch/sparc/include/asm/dma.h
@@ -91,54 +91,10 @@ extern int isa_dma_bridge_buggy;
 #endif
 
 #ifdef CONFIG_SPARC32
-
-/* Routines for data transfer buffers. */
 struct device;
-struct scatterlist;
-
-struct sparc32_dma_ops {
-   __u32 (*get_scsi_one)(struct device *, char *, unsigned long);
-   void (*get_scsi_sgl)(struct device *, struct scatterlist *, int);
-   void (*release_scsi_one)(struct device *, __u32, unsigned long);
-   void (*release_scsi_sgl)(struct device *, struct scatterlist *,int);
-#ifdef CONFIG_SBUS
-   int (*map_dma_area)(struct device *, dma_addr_t *, unsigned long, 
unsigned long, int);
-   void (*unmap_dma_area)(struct device *, unsigned long, int);
-#endif
-};
-extern const struct sparc32_dma_ops *sparc32_dma_ops;
-
-#define mmu_get_scsi_one(dev,vaddr,len) \
-   sparc32_dma_ops->get_scsi_one(dev, vaddr, len)
-#define mmu_get_scsi_sgl(dev,sg,sz) \
-   sparc32_dma_ops->get_scsi_sgl(dev, sg, sz)
-#define mmu_release_scsi_one(dev,vaddr,len) \
-   sparc32_dma_ops->release_scsi_one(dev, vaddr,len)
-#define mmu_release_scsi_sgl(dev,sg,sz) \
-   sparc32_dma_ops->release_scsi_sgl(dev, sg, sz)
-
-#ifdef CONFIG_SBUS
-/*
- * mmu_map/unmap are provided by iommu/iounit; Invalid to call on IIep.
- *
- * The mmu_map_dma_area establishes two mappings in one go.
- * These mappings point to pages normally mapped at 'va' (linear address).
- * First mapping is for CPU visible address at 'a', uncached.
- * This is an alias, but it works because it is an uncached mapping.
- * Second mapping is for device visible address, or "bus" address.
- * The bus address is returned at '*pba'.
- *
- * These functions seem distinct, but are hard to split.
- * On sun4m, page attributes depend on the CPU type, so we have to
- * know if we are mapping RAM or I/O, so it has to be an additional argument
- * to a separate mapping function for CPU visible mappings.
- */
-#define sbus_map_dma_area(dev,pba,va,a,len) \
-   sparc32_dma_ops->map_dma_area(dev, pba, va, a, len)
-#define sbus_unmap_dma_area(dev,ba,len) \
-   sparc32_dma_ops->unmap_dma_area(dev, ba, len)
-#endif /* CONFIG_SBUS */
 
+unsigned long sparc_dma_alloc_resource(struct device *dev, size_t len);
+bool sparc_dma_free_resource(void *cpu_addr, size_t size);
 #endif
 
 #endif /* !(_ASM_SPARC_DMA_H) */
diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
index fd7a41c6d688..f46213035637 100644
--- a/arch/sparc/kernel/ioport.c
+++ b/arch/sparc/kernel/ioport.c
@@ -52,8 +52,6 @@
 #include 
 #include 
 
-const struct sparc32_dma_ops *sparc32_dma_ops;
-
 /* This function must make sure that caches and memory are coherent after DMA
  * On LEON systems without cache snooping it flushes the entire D-CACHE.
  */
@@ -247,7 +245,7 @@ static void _sparc_free_io(struct resource *res)
release_resource(res);
 }
 
-static unsigned long sparc_dma_alloc_resource(struct device *dev, size_t len)
+unsigned long sparc_dma_alloc_resource(struct device *dev, size_t len)
 {
struct resource *res;
 
@@ -266,7 +264,7 @@ static unsigned long sparc_dma_alloc_resource(struct device 
*dev, size_t len)
return res->start;
 }
 
-static bool sparc_dma_free_resource(void *cpu_addr, size_t size)
+bool sparc_dma_free_resource(void *cpu_addr, size_t size)
 {
unsigned long addr = (unsigned long)cpu_addr;
struct resource *res;
@@ -302,122 +300,6 @@ void sbus_set_sbus64(struct device *dev, int x)
 }
 EXPORT_SYMBOL(sbus_set_sbus64);
 
-/*
- * Allocate a chunk of memory suitable for DMA.
- * Typically devices use them for control blocks.
- * CPU may access them without any explicit flushing.
- */
-static void *sbus_alloc_coherent(struct device *dev, size_t len,
-dma_addr_t *dma_addrp, gfp_t gfp,
-unsigned long attrs)
-{
-   unsigned long len_total = PAGE_ALIGN(len);
-   unsigned long va, addr;
-   int order;
-
-   /* XXX why are some lengths signed, others unsigned? */
-   if (len <= 0) {
-   return NULL;
-   }
-   /* XXX So what is maxphys for us and how do drivers know it? */
-   if (len > 256*1024) {   /* __get_free_pages() limit */
-   return NULL;
-   }
-
-   order = get_order(len_total);
-   va = 

make the non-consistent DMA allocator more useful (resend)

2018-12-08 Thread Christoph Hellwig
[sorry for the spam, had to resend due to a wrongly typed linux-arm-kernel
 address]

Hi all,

we had all kinds of discussions about how to best allocate DMAable memory
without having to deal with the problem that your normal "coherent"
DMA allocator can be very slow on platforms where DMA is not DMA
coherent.

To work around this, drivers basically have two choices at the moment:

 (1) just allocate memory using the page or slab allocator and the call
 one of the dma_map_* APIs on it.  This has a few drawbacks:

   - normal GFP_KERNEL memory might not actually be DMA addressable
 for all devices, forcing fallbacks to slow bounce buffering
   - there is no easy way to access the CMA allocator for large
 chunks, or to map small pages into a single device- and virtually
 contiguous chunk using the iommu and vmap

 (2) use dma_alloc_attrs with the DMA_ATTR_NON_CONSISTENT flag.  This
 has a different set of drawbacks

   - only very few architectures actually implement this API fully,
 if it is not implemented it falls back to the potentially
 uncached and slow coherent allocator
   - the dma_cache_sync API to use with it is not very well
 specified and problematic in that it does not clearly
 transfer ownership

Based on that I've been planning to introduce a proper API for
allocating DMAable memory for a while.  In the end I ended up
improving the DMA_ATTR_NON_CONSISTENT flag instead of designing
something new.  To make it useful we need to:

 (a) ensure we don't fall back to the slow coherent allocator except
 on fully coherent platforms where they are the same anyway
 (b) replace the odd dma_cache_sync calls with the proper
 dma_sync_* APIs that we also use for other ownership transfers

This turned out to be surprisingly simple now that we have consolidated
most of the direct mapping code.  Note that this series is missing
the updates for powerpc which is in the process of being migrated to
the common direct mapping code in another series and would be covered
by that.

Note that these patches don't use iommu/vmap coalescing as they can
be problematic depending on the cache architecture.  But we could
opt into those when we know we don't have cache interaction problems
based on the API.

All the patches are on top of the dma-mapping for-next tree and also
available as a git tree here:

git://git.infradead.org/users/hch/misc.git dma-noncoherent-allocator

Gitweb:


http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-noncoherent-allocator



[PATCH 2/6] sparc: factor the dma coherent mapping into helper

2018-12-08 Thread Christoph Hellwig
Factor the code to remap memory returned from the DMA coherent allocator
into two helpers that can be shared by the IOMMU and direct mapping code.

Signed-off-by: Christoph Hellwig 
---
 arch/sparc/kernel/ioport.c | 151 -
 1 file changed, 67 insertions(+), 84 deletions(-)

diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
index 4b2167a0ec0b..fd7a41c6d688 100644
--- a/arch/sparc/kernel/ioport.c
+++ b/arch/sparc/kernel/ioport.c
@@ -247,6 +247,53 @@ static void _sparc_free_io(struct resource *res)
release_resource(res);
 }
 
+static unsigned long sparc_dma_alloc_resource(struct device *dev, size_t len)
+{
+   struct resource *res;
+
+   res = kzalloc(sizeof(*res), GFP_KERNEL);
+   if (!res)
+   return 0;
+   res->name = dev->of_node->name;
+
+   if (allocate_resource(&_sparc_dvma, res, len, _sparc_dvma.start,
+   _sparc_dvma.end, PAGE_SIZE, NULL, NULL) != 0) {
+   printk("sbus_alloc_consistent: cannot occupy 0x%zx", len);
+   kfree(res);
+   return 0;
+   }
+
+   return res->start;
+}
+
+static bool sparc_dma_free_resource(void *cpu_addr, size_t size)
+{
+   unsigned long addr = (unsigned long)cpu_addr;
+   struct resource *res;
+
+   res = lookup_resource(&_sparc_dvma, addr);
+   if (!res) {
+   printk("%s: cannot free %p\n", __func__, cpu_addr);
+   return false;
+   }
+
+   if ((addr & (PAGE_SIZE - 1)) != 0) {
+   printk("%s: unaligned va %p\n", __func__, cpu_addr);
+   return false;
+   }
+
+   size = PAGE_ALIGN(size);
+   if (resource_size(res) != size) {
+   printk("%s: region 0x%lx asked 0x%zx\n",
+   __func__, (long)resource_size(res), size);
+   return false;
+   }
+
+   release_resource(res);
+   kfree(res);
+   return true;
+}
+
 #ifdef CONFIG_SBUS
 
 void sbus_set_sbus64(struct device *dev, int x)
@@ -264,10 +311,8 @@ static void *sbus_alloc_coherent(struct device *dev, 
size_t len,
 dma_addr_t *dma_addrp, gfp_t gfp,
 unsigned long attrs)
 {
-   struct platform_device *op = to_platform_device(dev);
unsigned long len_total = PAGE_ALIGN(len);
-   unsigned long va;
-   struct resource *res;
+   unsigned long va, addr;
int order;
 
/* XXX why are some lengths signed, others unsigned? */
@@ -284,32 +329,23 @@ static void *sbus_alloc_coherent(struct device *dev, 
size_t len,
if (va == 0)
goto err_nopages;
 
-   if ((res = kzalloc(sizeof(struct resource), GFP_KERNEL)) == NULL)
+   addr = sparc_dma_alloc_resource(dev, len_total);
+   if (!addr)
goto err_nomem;
 
-   if (allocate_resource(&_sparc_dvma, res, len_total,
-   _sparc_dvma.start, _sparc_dvma.end, PAGE_SIZE, NULL, NULL) != 0) {
-   printk("sbus_alloc_consistent: cannot occupy 0x%lx", len_total);
-   goto err_nova;
-   }
-
// XXX The sbus_map_dma_area does this for us below, see comments.
// srmmu_mapiorange(0, virt_to_phys(va), res->start, len_total);
/*
 * XXX That's where sdev would be used. Currently we load
 * all iommu tables with the same translations.
 */
-   if (sbus_map_dma_area(dev, dma_addrp, va, res->start, len_total) != 0)
+   if (sbus_map_dma_area(dev, dma_addrp, va, addr, len_total) != 0)
goto err_noiommu;
 
-   res->name = op->dev.of_node->name;
-
-   return (void *)(unsigned long)res->start;
+   return (void *)addr;
 
 err_noiommu:
-   release_resource(res);
-err_nova:
-   kfree(res);
+   sparc_dma_free_resource((void *)addr, len_total);
 err_nomem:
free_pages(va, order);
 err_nopages:
@@ -319,29 +355,11 @@ static void *sbus_alloc_coherent(struct device *dev, 
size_t len,
 static void sbus_free_coherent(struct device *dev, size_t n, void *p,
   dma_addr_t ba, unsigned long attrs)
 {
-   struct resource *res;
struct page *pgv;
 
-   if ((res = lookup_resource(&_sparc_dvma,
-   (unsigned long)p)) == NULL) {
-   printk("sbus_free_consistent: cannot free %p\n", p);
-   return;
-   }
-
-   if (((unsigned long)p & (PAGE_SIZE-1)) != 0) {
-   printk("sbus_free_consistent: unaligned va %p\n", p);
-   return;
-   }
-
n = PAGE_ALIGN(n);
-   if (resource_size(res) != n) {
-   printk("sbus_free_consistent: region 0x%lx asked 0x%zx\n",
-   (long)resource_size(res), n);
+   if (!sparc_dma_free_resource(p, n))
return;
-   }
-
-   release_resource(res);
-   kfree(res);
 
pgv = virt_to_page(p);
sbus_unmap_dma_area(dev, ba, n);
@@ -418,45 +436,30 @@ 

[PATCH 1/6] sparc: remove not needed sbus_dma_ops methods

2018-12-08 Thread Christoph Hellwig
No need to BUG_ON() on the cache maintenance ops - they are no-ops
by default, and there is nothing in the DMA API contract that prohibits
calling them on sbus devices (even if such drivers are unlikely to
ever appear).

Similarly a dma_supported method that always returns 0 is rather
pointless.  The only thing that indicates is that no one ever calls
the method on sbus devices.

Signed-off-by: Christoph Hellwig 
---
 arch/sparc/kernel/ioport.c | 20 
 1 file changed, 20 deletions(-)

diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
index 6799c93c9f27..4b2167a0ec0b 100644
--- a/arch/sparc/kernel/ioport.c
+++ b/arch/sparc/kernel/ioport.c
@@ -391,23 +391,6 @@ static void sbus_unmap_sg(struct device *dev, struct 
scatterlist *sg, int n,
mmu_release_scsi_sgl(dev, sg, n);
 }
 
-static void sbus_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
-int n, enum dma_data_direction dir)
-{
-   BUG();
-}
-
-static void sbus_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
-   int n, enum dma_data_direction dir)
-{
-   BUG();
-}
-
-static int sbus_dma_supported(struct device *dev, u64 mask)
-{
-   return 0;
-}
-
 static const struct dma_map_ops sbus_dma_ops = {
.alloc  = sbus_alloc_coherent,
.free   = sbus_free_coherent,
@@ -415,9 +398,6 @@ static const struct dma_map_ops sbus_dma_ops = {
.unmap_page = sbus_unmap_page,
.map_sg = sbus_map_sg,
.unmap_sg   = sbus_unmap_sg,
-   .sync_sg_for_cpu= sbus_sync_sg_for_cpu,
-   .sync_sg_for_device = sbus_sync_sg_for_device,
-   .dma_supported  = sbus_dma_supported,
 };
 
 static int __init sparc_register_ioport(void)
-- 
2.19.2




[PATCH 03/10] arm64/iommu: implement support for DMA_ATTR_NON_CONSISTENT

2018-12-08 Thread Christoph Hellwig
DMA_ATTR_NON_CONSISTENT forces contiguous allocations as we don't
want to remap, and is otherwise forced down the same path as if we
were always on a coherent device.  No new code required except for
a few conditionals.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/mm/dma-mapping.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index d39b60113539..0010688ca30e 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -240,7 +240,8 @@ static void *__iommu_alloc_attrs(struct device *dev, size_t 
size,
dma_free_from_pool(addr, size);
addr = NULL;
}
-   } else if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
+   } else if (attrs & (DMA_ATTR_FORCE_CONTIGUOUS |
+   DMA_ATTR_NON_CONSISTENT)) {
pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs);
struct page *page;
 
@@ -256,7 +257,7 @@ static void *__iommu_alloc_attrs(struct device *dev, size_t 
size,
return NULL;
}
 
-   if (coherent) {
+   if (coherent || (attrs & DMA_ATTR_NON_CONSISTENT)) {
memset(addr, 0, size);
return addr;
}
@@ -309,7 +310,8 @@ static void __iommu_free_attrs(struct device *dev, size_t 
size, void *cpu_addr,
if (dma_in_atomic_pool(cpu_addr, size)) {
iommu_dma_unmap_page(dev, handle, iosize, 0, 0);
dma_free_from_pool(cpu_addr, size);
-   } else if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
+   } else if (attrs & (DMA_ATTR_FORCE_CONTIGUOUS |
+   DMA_ATTR_NON_CONSISTENT)) {
struct page *page = vmalloc_to_page(cpu_addr);
 
iommu_dma_unmap_page(dev, handle, iosize, 0, attrs);
@@ -342,10 +344,11 @@ static int __iommu_mmap_attrs(struct device *dev, struct 
vm_area_struct *vma,
	if (dma_mmap_from_dev_coherent(dev, vma, cpu_addr, size, &ret))
return ret;
 
-   if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
+   if (attrs & (DMA_ATTR_FORCE_CONTIGUOUS | DMA_ATTR_NON_CONSISTENT)) {
unsigned long pfn;
 
-   if (dev_is_dma_coherent(dev))
+   if (dev_is_dma_coherent(dev) ||
+   (attrs & DMA_ATTR_NON_CONSISTENT))
pfn = virt_to_pfn(cpu_addr);
else
pfn = vmalloc_to_pfn(cpu_addr);
@@ -366,10 +369,11 @@ static int __iommu_get_sgtable(struct device *dev, struct 
sg_table *sgt,
unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
struct vm_struct *area = find_vm_area(cpu_addr);
 
-   if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
+   if (attrs & (DMA_ATTR_FORCE_CONTIGUOUS | DMA_ATTR_NON_CONSISTENT)) {
struct page *page;
 
-   if (dev_is_dma_coherent(dev))
+   if (dev_is_dma_coherent(dev) ||
+   (attrs & DMA_ATTR_NON_CONSISTENT))
page = virt_to_page(cpu_addr);
else
page = vmalloc_to_page(cpu_addr);
-- 
2.19.2




[PATCH 05/10] sparc64/iommu: move code around a bit

2018-12-08 Thread Christoph Hellwig
Move the alloc / free routines down the file so that we can easily use
the map / unmap helpers to implement non-consistent allocations.

Also drop the _coherent postfix to match the method name.

Signed-off-by: Christoph Hellwig 
---
 arch/sparc/kernel/iommu.c | 135 +++---
 1 file changed, 67 insertions(+), 68 deletions(-)

diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c
index 0626bae5e3da..4bf0497e0704 100644
--- a/arch/sparc/kernel/iommu.c
+++ b/arch/sparc/kernel/iommu.c
@@ -195,72 +195,6 @@ static inline void iommu_free_ctx(struct iommu *iommu, int 
ctx)
}
 }
 
-static void *dma_4u_alloc_coherent(struct device *dev, size_t size,
-  dma_addr_t *dma_addrp, gfp_t gfp,
-  unsigned long attrs)
-{
-   unsigned long order, first_page;
-   struct iommu *iommu;
-   struct page *page;
-   int npages, nid;
-   iopte_t *iopte;
-   void *ret;
-
-   size = IO_PAGE_ALIGN(size);
-   order = get_order(size);
-   if (order >= 10)
-   return NULL;
-
-   nid = dev->archdata.numa_node;
-   page = alloc_pages_node(nid, gfp, order);
-   if (unlikely(!page))
-   return NULL;
-
-   first_page = (unsigned long) page_address(page);
-   memset((char *)first_page, 0, PAGE_SIZE << order);
-
-   iommu = dev->archdata.iommu;
-
-   iopte = alloc_npages(dev, iommu, size >> IO_PAGE_SHIFT);
-
-   if (unlikely(iopte == NULL)) {
-   free_pages(first_page, order);
-   return NULL;
-   }
-
-   *dma_addrp = (iommu->tbl.table_map_base +
- ((iopte - iommu->page_table) << IO_PAGE_SHIFT));
-   ret = (void *) first_page;
-   npages = size >> IO_PAGE_SHIFT;
-   first_page = __pa(first_page);
-   while (npages--) {
-   iopte_val(*iopte) = (IOPTE_CONSISTENT(0UL) |
-IOPTE_WRITE |
-(first_page & IOPTE_PAGE));
-   iopte++;
-   first_page += IO_PAGE_SIZE;
-   }
-
-   return ret;
-}
-
-static void dma_4u_free_coherent(struct device *dev, size_t size,
-void *cpu, dma_addr_t dvma,
-unsigned long attrs)
-{
-   struct iommu *iommu;
-   unsigned long order, npages;
-
-   npages = IO_PAGE_ALIGN(size) >> IO_PAGE_SHIFT;
-   iommu = dev->archdata.iommu;
-
-   iommu_tbl_range_free(&iommu->tbl, dvma, npages, IOMMU_ERROR_CODE);
-
-   order = get_order(size);
-   if (order < 10)
-   free_pages((unsigned long)cpu, order);
-}
-
 static dma_addr_t dma_4u_map_page(struct device *dev, struct page *page,
  unsigned long offset, size_t sz,
  enum dma_data_direction direction,
@@ -742,6 +676,71 @@ static void dma_4u_sync_sg_for_cpu(struct device *dev,
	spin_unlock_irqrestore(&iommu->lock, flags);
 }
 
+static void *dma_4u_alloc(struct device *dev, size_t size,
+ dma_addr_t *dma_addrp, gfp_t gfp, unsigned long attrs)
+{
+   unsigned long order, first_page;
+   struct iommu *iommu;
+   struct page *page;
+   int npages, nid;
+   iopte_t *iopte;
+   void *ret;
+
+   size = IO_PAGE_ALIGN(size);
+   order = get_order(size);
+   if (order >= 10)
+   return NULL;
+
+   nid = dev->archdata.numa_node;
+   page = alloc_pages_node(nid, gfp, order);
+   if (unlikely(!page))
+   return NULL;
+
+   first_page = (unsigned long) page_address(page);
+   memset((char *)first_page, 0, PAGE_SIZE << order);
+
+   iommu = dev->archdata.iommu;
+
+   iopte = alloc_npages(dev, iommu, size >> IO_PAGE_SHIFT);
+
+   if (unlikely(iopte == NULL)) {
+   free_pages(first_page, order);
+   return NULL;
+   }
+
+   *dma_addrp = (iommu->tbl.table_map_base +
+ ((iopte - iommu->page_table) << IO_PAGE_SHIFT));
+   ret = (void *) first_page;
+   npages = size >> IO_PAGE_SHIFT;
+   first_page = __pa(first_page);
+   while (npages--) {
+   iopte_val(*iopte) = (IOPTE_CONSISTENT(0UL) |
+IOPTE_WRITE |
+(first_page & IOPTE_PAGE));
+   iopte++;
+   first_page += IO_PAGE_SIZE;
+   }
+
+   return ret;
+}
+
+static void dma_4u_free(struct device *dev, size_t size, void *cpu,
+   dma_addr_t dvma, unsigned long attrs)
+{
+   struct iommu *iommu;
+   unsigned long order, npages;
+
+   npages = IO_PAGE_ALIGN(size) >> IO_PAGE_SHIFT;
+   iommu = dev->archdata.iommu;
+
+   iommu_tbl_range_free(&iommu->tbl, dvma, npages, IOMMU_ERROR_CODE);
+
+   order = get_order(size);
+   if (order < 10)
+   free_pages((unsigned long)cpu, order);
+}
+

[PATCH 07/10] sparc64/pci_sun4v: move code around a bit

2018-12-08 Thread Christoph Hellwig
Move the alloc / free routines down the file so that we can easily use
the map / unmap helpers to implement non-consistent allocations.

Also drop the _coherent postfix to match the method name.

Signed-off-by: Christoph Hellwig 
---
 arch/sparc/kernel/pci_sun4v.c | 229 +-
 1 file changed, 114 insertions(+), 115 deletions(-)

diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index fa0e42b4cbfb..b95c70136559 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -171,87 +171,6 @@ static inline long iommu_batch_end(u64 mask)
return iommu_batch_flush(p, mask);
 }
 
-static void *dma_4v_alloc_coherent(struct device *dev, size_t size,
-  dma_addr_t *dma_addrp, gfp_t gfp,
-  unsigned long attrs)
-{
-   u64 mask;
-   unsigned long flags, order, first_page, npages, n;
-   unsigned long prot = 0;
-   struct iommu *iommu;
-   struct atu *atu;
-   struct iommu_map_table *tbl;
-   struct page *page;
-   void *ret;
-   long entry;
-   int nid;
-
-   size = IO_PAGE_ALIGN(size);
-   order = get_order(size);
-   if (unlikely(order >= MAX_ORDER))
-   return NULL;
-
-   npages = size >> IO_PAGE_SHIFT;
-
-   if (attrs & DMA_ATTR_WEAK_ORDERING)
-   prot = HV_PCI_MAP_ATTR_RELAXED_ORDER;
-
-   nid = dev->archdata.numa_node;
-   page = alloc_pages_node(nid, gfp, order);
-   if (unlikely(!page))
-   return NULL;
-
-   first_page = (unsigned long) page_address(page);
-   memset((char *)first_page, 0, PAGE_SIZE << order);
-
-   iommu = dev->archdata.iommu;
-   atu = iommu->atu;
-
-   mask = dev->coherent_dma_mask;
-   if (mask <= DMA_BIT_MASK(32))
-   tbl = &iommu->tbl;
-   else
-   tbl = &atu->tbl;
-
-   entry = iommu_tbl_range_alloc(dev, tbl, npages, NULL,
- (unsigned long)(-1), 0);
-
-   if (unlikely(entry == IOMMU_ERROR_CODE))
-   goto range_alloc_fail;
-
-   *dma_addrp = (tbl->table_map_base + (entry << IO_PAGE_SHIFT));
-   ret = (void *) first_page;
-   first_page = __pa(first_page);
-
-   local_irq_save(flags);
-
-   iommu_batch_start(dev,
- (HV_PCI_MAP_ATTR_READ | prot |
-  HV_PCI_MAP_ATTR_WRITE),
- entry);
-
-   for (n = 0; n < npages; n++) {
-   long err = iommu_batch_add(first_page + (n * PAGE_SIZE), mask);
-   if (unlikely(err < 0L))
-   goto iommu_map_fail;
-   }
-
-   if (unlikely(iommu_batch_end(mask) < 0L))
-   goto iommu_map_fail;
-
-   local_irq_restore(flags);
-
-   return ret;
-
-iommu_map_fail:
-   local_irq_restore(flags);
-   iommu_tbl_range_free(tbl, *dma_addrp, npages, IOMMU_ERROR_CODE);
-
-range_alloc_fail:
-   free_pages(first_page, order);
-   return NULL;
-}
-
 unsigned long dma_4v_iotsb_bind(unsigned long devhandle,
unsigned long iotsb_num,
struct pci_bus *bus_dev)
@@ -316,38 +235,6 @@ static void dma_4v_iommu_demap(struct device *dev, unsigned long devhandle,
local_irq_restore(flags);
 }
 
-static void dma_4v_free_coherent(struct device *dev, size_t size, void *cpu,
-dma_addr_t dvma, unsigned long attrs)
-{
-   struct pci_pbm_info *pbm;
-   struct iommu *iommu;
-   struct atu *atu;
-   struct iommu_map_table *tbl;
-   unsigned long order, npages, entry;
-   unsigned long iotsb_num;
-   u32 devhandle;
-
-   npages = IO_PAGE_ALIGN(size) >> IO_PAGE_SHIFT;
-   iommu = dev->archdata.iommu;
-   pbm = dev->archdata.host_controller;
-   atu = iommu->atu;
-   devhandle = pbm->devhandle;
-
-   if (dvma <= DMA_BIT_MASK(32)) {
-   tbl = &iommu->tbl;
-   iotsb_num = 0; /* we don't care for legacy iommu */
-   } else {
-   tbl = &atu->tbl;
-   iotsb_num = atu->iotsb->iotsb_num;
-   }
-   entry = ((dvma - tbl->table_map_base) >> IO_PAGE_SHIFT);
-   dma_4v_iommu_demap(dev, devhandle, dvma, iotsb_num, entry, npages);
-   iommu_tbl_range_free(tbl, dvma, npages, IOMMU_ERROR_CODE);
-   order = get_order(size);
-   if (order < 10)
-   free_pages((unsigned long)cpu, order);
-}
-
 static dma_addr_t dma_4v_map_page(struct device *dev, struct page *page,
  unsigned long offset, size_t sz,
  enum dma_data_direction direction,
@@ -671,6 +558,118 @@ static void dma_4v_unmap_sg(struct device *dev, struct scatterlist *sglist,
local_irq_restore(flags);
 }
 
+static void *dma_4v_alloc(struct device *dev, size_t size,
+ dma_addr_t *dma_addrp, gfp_t gfp, unsigned long attrs)

[PATCH 04/10] arm: implement DMA_ATTR_NON_CONSISTENT

2018-12-08 Thread Christoph Hellwig
For the iommu ops we can just use the implementation for DMA coherent
devices.  For the regular ops we need to mix and match a bit so that
we either use the CMA allocator without remapping, but with a special
error handling case for highmem pages, or the simple allocator.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/mm/dma-mapping.c | 49 ---
 1 file changed, 35 insertions(+), 14 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 2cfb17bad1e6..b3b66b41c450 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -49,6 +49,7 @@ struct arm_dma_alloc_args {
const void *caller;
bool want_vaddr;
int coherent_flag;
+   bool nonconsistent_flag;
 };
 
 struct arm_dma_free_args {
@@ -57,6 +58,7 @@ struct arm_dma_free_args {
void *cpu_addr;
struct page *page;
bool want_vaddr;
+   bool nonconsistent_flag;
 };
 
 #define NORMAL 0
@@ -348,7 +350,8 @@ static void __dma_free_buffer(struct page *page, size_t size)
 static void *__alloc_from_contiguous(struct device *dev, size_t size,
 pgprot_t prot, struct page **ret_page,
 const void *caller, bool want_vaddr,
-int coherent_flag, gfp_t gfp);
+int coherent_flag, bool nonconsistent_flag,
+gfp_t gfp);
 
 static void *__alloc_remap_buffer(struct device *dev, size_t size, gfp_t gfp,
 pgprot_t prot, struct page **ret_page,
@@ -405,7 +408,7 @@ static int __init atomic_pool_init(void)
if (dev_get_cma_area(NULL))
ptr = __alloc_from_contiguous(NULL, atomic_pool_size, prot,
  &page, atomic_pool_init, true, NORMAL,
- GFP_KERNEL);
+ false, GFP_KERNEL);
else
ptr = __alloc_remap_buffer(NULL, atomic_pool_size, gfp, prot,
   &page, atomic_pool_init, true);
@@ -579,7 +582,8 @@ static int __free_from_pool(void *start, size_t size)
 static void *__alloc_from_contiguous(struct device *dev, size_t size,
 pgprot_t prot, struct page **ret_page,
 const void *caller, bool want_vaddr,
-int coherent_flag, gfp_t gfp)
+int coherent_flag, bool nonconsistent_flag,
+gfp_t gfp)
 {
unsigned long order = get_order(size);
size_t count = size >> PAGE_SHIFT;
@@ -595,12 +599,16 @@ static void *__alloc_from_contiguous(struct device *dev, size_t size,
if (!want_vaddr)
goto out;
 
+   if (nonconsistent_flag) {
+   if (PageHighMem(page))
+   goto fail;
+   goto out;
+   }
+
if (PageHighMem(page)) {
ptr = __dma_alloc_remap(page, size, GFP_KERNEL, prot, caller);
-   if (!ptr) {
-   dma_release_from_contiguous(dev, page, count);
-   return NULL;
-   }
+   if (!ptr)
+   goto fail;
} else {
__dma_remap(page, size, prot);
ptr = page_address(page);
@@ -609,12 +617,15 @@ static void *__alloc_from_contiguous(struct device *dev, size_t size,
  out:
*ret_page = page;
return ptr;
+ fail:
+   dma_release_from_contiguous(dev, page, count);
+   return NULL;
 }
 
 static void __free_from_contiguous(struct device *dev, struct page *page,
-  void *cpu_addr, size_t size, bool want_vaddr)
+  void *cpu_addr, size_t size, bool remapped)
 {
-   if (want_vaddr) {
+   if (remapped) {
if (PageHighMem(page))
__dma_free_remap(cpu_addr, size);
else
@@ -635,7 +646,11 @@ static void *__alloc_simple_buffer(struct device *dev, size_t size, gfp_t gfp,
   struct page **ret_page)
 {
struct page *page;
-   /* __alloc_simple_buffer is only called when the device is coherent */
+   /*
+* __alloc_simple_buffer is only called when the device is coherent,
+* or if the caller explicitly asked for an allocation that is not
+* consistent.
+*/
page = __dma_alloc_buffer(dev, size, gfp, COHERENT);
if (!page)
return NULL;
@@ -667,13 +682,15 @@ static void *cma_allocator_alloc(struct arm_dma_alloc_args *args,
return __alloc_from_contiguous(args->dev, args->size, args->prot,
   ret_page, args->caller,
   args->want_vaddr, args->coherent_flag,
+

[PATCH 10/10] Documentation: update the description for DMA_ATTR_NON_CONSISTENT

2018-12-08 Thread Christoph Hellwig
We got rid of the odd selective consistent or not behavior, and now
want the normal dma_sync_single_* functions to be used for strict
ownership transfers.  While dma_cache_sync hasn't been removed from
the tree yet, it should not be used in any new callers, so its
documentation is dropped here.

Signed-off-by: Christoph Hellwig 
---
 Documentation/DMA-API.txt| 30 --
 Documentation/DMA-attributes.txt |  9 +
 include/linux/dma-mapping.h  |  3 +++
 3 files changed, 12 insertions(+), 30 deletions(-)

diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index ac66ae2509a9..c81fe8a4aeec 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -518,20 +518,9 @@ API at all.
dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
gfp_t flag, unsigned long attrs)
 
-Identical to dma_alloc_coherent() except that when the
-DMA_ATTR_NON_CONSISTENT flags is passed in the attrs argument, the
-platform will choose to return either consistent or non-consistent memory
-as it sees fit.  By using this API, you are guaranteeing to the platform
-that you have all the correct and necessary sync points for this memory
-in the driver should it choose to return non-consistent memory.
-
-Note: where the platform can return consistent memory, it will
-guarantee that the sync points become nops.
-
-Warning:  Handling non-consistent memory is a real pain.  You should
-only use this API if you positively know your driver will be
-required to work on one of the rare (usually non-PCI) architectures
-that simply cannot make consistent memory.
+Similar to dma_alloc_coherent(), except that the behavior can be controlled
+in more detail using the attrs argument.  See Documentation/DMA-attributes.txt
+for more details.
 
 ::
 
@@ -540,7 +529,7 @@ that simply cannot make consistent memory.
   dma_addr_t dma_handle, unsigned long attrs)
 
 Free memory allocated by the dma_alloc_attrs().  All parameters common
-parameters must identical to those otherwise passed to dma_fre_coherent,
+parameters must identical to those otherwise passed to dma_free_coherent,
 and the attrs argument must be identical to the attrs passed to
 dma_alloc_attrs().
 
@@ -560,17 +549,6 @@ memory or doing partial flushes.
into the width returned by this call.  It will also always be a power
of two for easy alignment.
 
-::
-
-   void
-   dma_cache_sync(struct device *dev, void *vaddr, size_t size,
-  enum dma_data_direction direction)
-
-Do a partial sync of memory that was allocated by dma_alloc_attrs() with
-the DMA_ATTR_NON_CONSISTENT flag starting at virtual address vaddr and
-continuing on for size.  Again, you *must* observe the cache line
-boundaries when doing this.
-
 ::
 
int
diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt
index 8f8d97f65d73..2bb3fc0a621b 100644
--- a/Documentation/DMA-attributes.txt
+++ b/Documentation/DMA-attributes.txt
@@ -46,10 +46,11 @@ behavior.
 DMA_ATTR_NON_CONSISTENT
 ---
 
-DMA_ATTR_NON_CONSISTENT lets the platform to choose to return either
-consistent or non-consistent memory as it sees fit.  By using this API,
-you are guaranteeing to the platform that you have all the correct and
-necessary sync points for this memory in the driver.
+DMA_ATTR_NON_CONSISTENT specifies that the memory returned is not
+required to be consistent.  The memory is owned by the device when
+returned from this function, and ownership must be explicitly
+transferred to the CPU using dma_sync_single_for_cpu, and back to the
+device using dma_sync_single_for_device.
 
 DMA_ATTR_NO_KERNEL_MAPPING
 --
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 8c81fa5d1f44..8757ad5087c4 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -432,6 +432,9 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 #define dma_map_page(d, p, o, s, r) dma_map_page_attrs(d, p, o, s, r, 0)
 #define dma_unmap_page(d, a, s, r) dma_unmap_page_attrs(d, a, s, r, 0)
 
+/*
+ * Don't use in new code, use dma_sync_single_for_{device,cpu} instead.
+ */
 static inline void
 dma_cache_sync(struct device *dev, void *vaddr, size_t size,
enum dma_data_direction dir)
-- 
2.19.2
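
A caller that is being moved off dma_cache_sync, as the documentation
change above asks, would be converted roughly as follows.  This is a
hedged, illustrative sketch only — dev, dma_handle, vaddr and len are
placeholders from a hypothetical driver, not identifiers taken from this
series:

```c
/* Sketch: the ownership-transfer discipline the updated docs describe.
 * All names here are made up for illustration.
 */
static void example_rx_complete(struct device *dev, dma_addr_t dma_handle,
				void *vaddr, size_t len)
{
	/* Old style, no longer documented:
	 *	dma_cache_sync(dev, vaddr, len, DMA_FROM_DEVICE);
	 */

	/* New style: explicitly transfer ownership to the CPU ... */
	dma_sync_single_for_cpu(dev, dma_handle, len, DMA_FROM_DEVICE);

	/* ... the CPU may now read the buffer through vaddr ... */

	/* ... then hand it back to the device before the next transfer */
	dma_sync_single_for_device(dev, dma_handle, len, DMA_FROM_DEVICE);
}
```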




[PATCH 06/10] sparc64/iommu: implement DMA_ATTR_NON_CONSISTENT

2018-12-08 Thread Christoph Hellwig
Just allocate the memory and use map_page to map it.

Signed-off-by: Christoph Hellwig 
---
 arch/sparc/kernel/iommu.c | 33 +++--
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c
index 4bf0497e0704..4ce24c9dc691 100644
--- a/arch/sparc/kernel/iommu.c
+++ b/arch/sparc/kernel/iommu.c
@@ -699,14 +699,19 @@ static void *dma_4u_alloc(struct device *dev, size_t size,
first_page = (unsigned long) page_address(page);
memset((char *)first_page, 0, PAGE_SIZE << order);
 
+   if (attrs & DMA_ATTR_NON_CONSISTENT) {
+   *dma_addrp = dma_4u_map_page(dev, page, 0, size,
+DMA_BIDIRECTIONAL, 0);
+   if (*dma_addrp == DMA_MAPPING_ERROR)
+   goto out_free_page;
+   return page_address(page);
+   }
+
iommu = dev->archdata.iommu;
 
iopte = alloc_npages(dev, iommu, size >> IO_PAGE_SHIFT);
-
-   if (unlikely(iopte == NULL)) {
-   free_pages(first_page, order);
-   return NULL;
-   }
+   if (unlikely(iopte == NULL))
+   goto out_free_page;
 
*dma_addrp = (iommu->tbl.table_map_base +
  ((iopte - iommu->page_table) << IO_PAGE_SHIFT));
@@ -722,18 +727,26 @@ static void *dma_4u_alloc(struct device *dev, size_t size,
}
 
return ret;
+
+out_free_page:
+   free_pages(first_page, order);
+   return NULL;
 }
 
 static void dma_4u_free(struct device *dev, size_t size, void *cpu,
dma_addr_t dvma, unsigned long attrs)
 {
-   struct iommu *iommu;
-   unsigned long order, npages;
+   unsigned long order;
 
-   npages = IO_PAGE_ALIGN(size) >> IO_PAGE_SHIFT;
-   iommu = dev->archdata.iommu;
+   if (attrs & DMA_ATTR_NON_CONSISTENT) {
+   dma_4u_unmap_page(dev, dvma, size, DMA_BIDIRECTIONAL, 0);
+   } else {
+   struct iommu *iommu = dev->archdata.iommu;
 
-   iommu_tbl_range_free(&iommu->tbl, dvma, npages, IOMMU_ERROR_CODE);
+   iommu_tbl_range_free(&iommu->tbl, dvma,
+IO_PAGE_ALIGN(size) >> IO_PAGE_SHIFT,
+IOMMU_ERROR_CODE);
+   }
 
order = get_order(size);
if (order < 10)
-- 
2.19.2




make the non-consistent DMA allocator more useful

2018-12-08 Thread Christoph Hellwig
Hi all,

we had all kinds of discussions about how to best allocate DMAable memory
without having to deal with the problem that your normal "coherent"
DMA allocator can be very slow on platforms where DMA is not DMA
coherent.

To work around this, drivers basically have two choices at the moment:

 (1) just allocate memory using the page or slab allocator and then call
 one of the dma_map_* APIs on it.  This has a few drawbacks:

   - normal GFP_KERNEL memory might not actually be DMA addressable
 for all devices, forcing fallbacks to slow bounce buffering
   - there is no easy way to access the CMA allocator for large
  chunks, or to map small pages into a single device-contiguous and
  virtually contiguous chunk using the iommu and vmap

 (2) use dma_alloc_attrs with the DMA_ATTR_NON_CONSISTENT flag.  This
 has a different set of drawbacks

   - only very few architectures actually implement this API fully,
 if it is not implemented it falls back to the potentially
 uncached and slow coherent allocator
   - the dma_cache_sync API to use with it is not very well
 specified and problematic in that it does not clearly
 transfer ownership

Based on that I've been planning to introduce a proper API for
allocating DMAable memory for a while.  In the end I've ended up
improving the DMA_ATTR_NON_CONSISTENT flag instead of designing
something new.  To make it useful we need to:

 (a) ensure we don't fall back to the slow coherent allocator except
 on fully coherent platforms where they are the same anyway
 (b) replace the odd dma_cache_sync calls with the proper
     dma_sync_* APIs that we also use for other ownership transfers
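
The driver-facing result of (a) and (b) can be sketched as follows.
This is an illustrative fragment only, not code from this series;
example_do_io, dev and buf_size are made-up names, and error handling is
reduced to the minimum:

```c
/* Hedged sketch of the intended DMA_ATTR_NON_CONSISTENT usage. */
#include <linux/dma-mapping.h>

static int example_do_io(struct device *dev, size_t buf_size)
{
	dma_addr_t dma_handle;
	void *buf;

	/* (a) fast, possibly cacheable memory instead of the coherent pool */
	buf = dma_alloc_attrs(dev, buf_size, &dma_handle, GFP_KERNEL,
			      DMA_ATTR_NON_CONSISTENT);
	if (!buf)
		return -ENOMEM;

	/* the device owns the memory on return; claim it for the CPU */
	dma_sync_single_for_cpu(dev, dma_handle, buf_size, DMA_TO_DEVICE);
	memset(buf, 0xff, buf_size);	/* fill in the data to send */

	/* (b) explicit ownership transfer back, instead of dma_cache_sync */
	dma_sync_single_for_device(dev, dma_handle, buf_size, DMA_TO_DEVICE);

	/* ... kick off the device transfer using dma_handle here ... */

	dma_free_attrs(dev, buf_size, buf, dma_handle,
		       DMA_ATTR_NON_CONSISTENT);
	return 0;
}
```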

This turned out to be surprisingly simple now that we have consolidated
most of the direct mapping code.  Note that this series is missing
the updates for powerpc, which is in the process of being migrated to
the common direct mapping code in another series and would be covered
by that.

Note that these patches don't use iommu/vmap coalescing as they can
be problematic depending on the cache architecture.  But we could
opt into those when we know we don't have cache interaction problems
based on the API.

All the patches are on top of the dma-mapping for-net tree and also
available as a git tree here:

git://git.infradead.org/users/hch/misc.git dma-noncoherent-allocator

Gitweb:


http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-noncoherent-allocator



[PATCH 08/10] sparc64/pci_sun4v: implement DMA_ATTR_NON_CONSISTENT

2018-12-08 Thread Christoph Hellwig
Just allocate the memory and use map_page to map it.

Signed-off-by: Christoph Hellwig 
---
 arch/sparc/kernel/pci_sun4v.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index b95c70136559..24a76ecf2986 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -590,6 +590,14 @@ static void *dma_4v_alloc(struct device *dev, size_t size,
first_page = (unsigned long) page_address(page);
memset((char *)first_page, 0, PAGE_SIZE << order);
 
+   if (attrs & DMA_ATTR_NON_CONSISTENT) {
+   *dma_addrp = dma_4v_map_page(dev, page, 0, size,
+DMA_BIDIRECTIONAL, 0);
+   if (*dma_addrp == DMA_MAPPING_ERROR)
+   goto range_alloc_fail;
+   return page_address(page);
+   }
+
iommu = dev->archdata.iommu;
atu = iommu->atu;
 
@@ -649,6 +657,11 @@ static void dma_4v_free(struct device *dev, size_t size, void *cpu,
unsigned long iotsb_num;
u32 devhandle;
 
+   if (attrs & DMA_ATTR_NON_CONSISTENT) {
+   dma_4v_unmap_page(dev, dvma, size, DMA_BIDIRECTIONAL, 0);
+   goto free_pages;
+   }
+
npages = IO_PAGE_ALIGN(size) >> IO_PAGE_SHIFT;
iommu = dev->archdata.iommu;
pbm = dev->archdata.host_controller;
@@ -665,6 +678,7 @@ static void dma_4v_free(struct device *dev, size_t size, void *cpu,
entry = ((dvma - tbl->table_map_base) >> IO_PAGE_SHIFT);
dma_4v_iommu_demap(dev, devhandle, dvma, iotsb_num, entry, npages);
iommu_tbl_range_free(tbl, dvma, npages, IOMMU_ERROR_CODE);
+free_pages:
order = get_order(size);
if (order < 10)
free_pages((unsigned long)cpu, order);
-- 
2.19.2




[PATCH 02/10] arm64/iommu: don't remap contiguous allocations for coherent devices

2018-12-08 Thread Christoph Hellwig
There is no need to have an additional kernel mapping for a contiguous
allocation if the device already is DMA coherent, so skip it.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/mm/dma-mapping.c | 35 ++-
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 4c0f498069e8..d39b60113539 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -255,13 +255,18 @@ static void *__iommu_alloc_attrs(struct device *dev, size_t size,
size >> PAGE_SHIFT);
return NULL;
}
+
+   if (coherent) {
+   memset(addr, 0, size);
+   return addr;
+   }
+
addr = dma_common_contiguous_remap(page, size, VM_USERMAP,
   prot,
   __builtin_return_address(0));
if (addr) {
memset(addr, 0, size);
-   if (!coherent)
-   __dma_flush_area(page_to_virt(page), iosize);
+   __dma_flush_area(page_to_virt(page), iosize);
} else {
iommu_dma_unmap_page(dev, *handle, iosize, 0, attrs);
dma_release_from_contiguous(dev, page,
@@ -309,7 +314,9 @@ static void __iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
 
iommu_dma_unmap_page(dev, handle, iosize, 0, attrs);
dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT);
-   dma_common_free_remap(cpu_addr, size, VM_USERMAP);
+
+   if (!dev_is_dma_coherent(dev))
+   dma_common_free_remap(cpu_addr, size, VM_USERMAP);
} else if (is_vmalloc_addr(cpu_addr)){
struct vm_struct *area = find_vm_area(cpu_addr);
 
@@ -336,11 +343,12 @@ static int __iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
return ret;
 
if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
-   /*
-* DMA_ATTR_FORCE_CONTIGUOUS allocations are always remapped,
-* hence in the vmalloc space.
-*/
-   unsigned long pfn = vmalloc_to_pfn(cpu_addr);
+   unsigned long pfn;
+
+   if (dev_is_dma_coherent(dev))
+   pfn = virt_to_pfn(cpu_addr);
+   else
+   pfn = vmalloc_to_pfn(cpu_addr);
return __swiotlb_mmap_pfn(vma, pfn, size);
}
 
@@ -359,11 +367,12 @@ static int __iommu_get_sgtable(struct device *dev, struct sg_table *sgt,
struct vm_struct *area = find_vm_area(cpu_addr);
 
if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
-   /*
-* DMA_ATTR_FORCE_CONTIGUOUS allocations are always remapped,
-* hence in the vmalloc space.
-*/
-   struct page *page = vmalloc_to_page(cpu_addr);
+   struct page *page;
+
+   if (dev_is_dma_coherent(dev))
+   page = virt_to_page(cpu_addr);
+   else
+   page = vmalloc_to_page(cpu_addr);
return __swiotlb_get_sgtable_page(sgt, page, size);
}
 
-- 
2.19.2




[PATCH 09/10] dma-mapping: skip declared coherent memory for DMA_ATTR_NON_CONSISTENT

2018-12-08 Thread Christoph Hellwig
Memory declared using dma_declare_coherent is ioremapped and thus not
always suitable for our tightened DMA_ATTR_NON_CONSISTENT definition.

Skip it given that none of the existing callers use
DMA_ATTR_NON_CONSISTENT anyway.

Signed-off-by: Christoph Hellwig 
---
 include/linux/dma-mapping.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 7799c2b27849..8c81fa5d1f44 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -521,7 +521,8 @@ static inline void *dma_alloc_attrs(struct device *dev, size_t size,
BUG_ON(!ops);
WARN_ON_ONCE(dev && !dev->coherent_dma_mask);
 
-   if (dma_alloc_from_dev_coherent(dev, size, dma_handle, &cpu_addr))
+   if (!(attrs & DMA_ATTR_NON_CONSISTENT) &&
+   dma_alloc_from_dev_coherent(dev, size, dma_handle, &cpu_addr))
return cpu_addr;
 
/* let the implementation decide on the zone to allocate from: */
-- 
2.19.2

