Re: [PATCH v3 1/9] mm: Introduce new vm_insert_range API
On Sat, Dec 8, 2018 at 2:40 AM Robin Murphy wrote:
>
> On 2018-12-07 7:28 pm, Souptick Joarder wrote:
> > On Fri, Dec 7, 2018 at 10:41 PM Matthew Wilcox wrote:
> >>
> >> On Fri, Dec 07, 2018 at 03:34:56PM +, Robin Murphy wrote:
> >>>> +int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
> >>>> +			struct page **pages, unsigned long page_count)
> >>>> +{
> >>>> +	unsigned long uaddr = addr;
> >>>> +	int ret = 0, i;
> >>>
> >>> Some of the sites being replaced were effectively ensuring that vma and
> >>> pages were mutually compatible as an initial condition - would it be worth
> >>> adding something here for robustness, e.g.:
> >>>
> >>> +	if (page_count != vma_pages(vma))
> >>> +		return -ENXIO;
> >>
> >> I think we want to allow this to be used to populate part of a VMA.
> >> So perhaps:
> >>
> >>	if (page_count > vma_pages(vma))
> >>		return -ENXIO;
> >
> > Ok, This can be added.
> >
> > I think Patch [2/9] is the only leftover place where this
> > check could be removed.
>
> Right, 9/9 could also have relied on my stricter check here, but since
> it's really testing whether it actually managed to allocate vma_pages()
> worth of pages earlier, Matthew's more lenient version won't help for
> that one. (Why privcmd_buf_mmap() doesn't clean up and return an error
> as soon as that allocation loop fails, without taking the mutex under
> which it still does a bunch more pointless work to only undo it again,
> is a mind-boggling mystery, but that's not our problem here...)

I think some clean up can be done here in a separate patch.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 1/9] mm: Introduce new vm_insert_range API
On 2018-12-07 7:28 pm, Souptick Joarder wrote:
> On Fri, Dec 7, 2018 at 10:41 PM Matthew Wilcox wrote:
>>
>> On Fri, Dec 07, 2018 at 03:34:56PM +, Robin Murphy wrote:
>>>> +int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
>>>> +			struct page **pages, unsigned long page_count)
>>>> +{
>>>> +	unsigned long uaddr = addr;
>>>> +	int ret = 0, i;
>>>
>>> Some of the sites being replaced were effectively ensuring that vma and
>>> pages were mutually compatible as an initial condition - would it be worth
>>> adding something here for robustness, e.g.:
>>>
>>> +	if (page_count != vma_pages(vma))
>>> +		return -ENXIO;
>>
>> I think we want to allow this to be used to populate part of a VMA.
>> So perhaps:
>>
>>	if (page_count > vma_pages(vma))
>>		return -ENXIO;
>
> Ok, This can be added.
>
> I think Patch [2/9] is the only leftover place where this
> check could be removed.

Right, 9/9 could also have relied on my stricter check here, but since
it's really testing whether it actually managed to allocate vma_pages()
worth of pages earlier, Matthew's more lenient version won't help for
that one. (Why privcmd_buf_mmap() doesn't clean up and return an error
as soon as that allocation loop fails, without taking the mutex under
which it still does a bunch more pointless work to only undo it again,
is a mind-boggling mystery, but that's not our problem here...)

Robin.
Re: [PATCH v3 6/9] iommu/dma-iommu.c: Convert to use vm_insert_range
On Fri, Dec 7, 2018 at 7:17 PM Robin Murphy wrote:
>
> On 06/12/2018 18:43, Souptick Joarder wrote:
> > Convert to use vm_insert_range() to map range of kernel
> > memory to user vma.
> >
> > Signed-off-by: Souptick Joarder
> > Reviewed-by: Matthew Wilcox
> > ---
> >   drivers/iommu/dma-iommu.c | 13 +++--
> >   1 file changed, 3 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index d1b0475..a2c65e2 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -622,17 +622,10 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
> >
> >   int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct *vma)
> >   {
> > -	unsigned long uaddr = vma->vm_start;
> > -	unsigned int i, count = PAGE_ALIGN(size) >> PAGE_SHIFT;
> > -	int ret = -ENXIO;
> > +	unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
> >
> > -	for (i = vma->vm_pgoff; i < count && uaddr < vma->vm_end; i++) {
> > -		ret = vm_insert_page(vma, uaddr, pages[i]);
> > -		if (ret)
> > -			break;
> > -		uaddr += PAGE_SIZE;
> > -	}
> > -	return ret;
> > +	return vm_insert_range(vma, vma->vm_start,
> > +				pages + vma->vm_pgoff, count);
>
> You also need to adjust count to compensate for the pages skipped by
> vm_pgoff, otherwise you've got an out-of-bounds dereference triggered
> from userspace, which is pretty high up the "not good" scale (not to
> mention the entire call would then propagate -EFAULT back from
> vm_insert_page() and thus always appear to fail for nonzero offsets).

So this should be something similar to ->

	return vm_insert_range(vma, vma->vm_start,
			pages + vma->vm_pgoff, count - vma->vm_pgoff);
Re: [PATCH v3 1/9] mm: Introduce new vm_insert_range API
On Fri, Dec 7, 2018 at 10:41 PM Matthew Wilcox wrote:
>
> On Fri, Dec 07, 2018 at 03:34:56PM +, Robin Murphy wrote:
> > > +int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
> > > +			struct page **pages, unsigned long page_count)
> > > +{
> > > +	unsigned long uaddr = addr;
> > > +	int ret = 0, i;
> >
> > Some of the sites being replaced were effectively ensuring that vma and
> > pages were mutually compatible as an initial condition - would it be worth
> > adding something here for robustness, e.g.:
> >
> > +	if (page_count != vma_pages(vma))
> > +		return -ENXIO;
>
> I think we want to allow this to be used to populate part of a VMA.
> So perhaps:
>
>	if (page_count > vma_pages(vma))
>		return -ENXIO;

Ok, This can be added.

I think Patch [2/9] is the only leftover place where this
check could be removed.
[PATCH 15/15] dma-mapping: bypass indirect calls for dma-direct
Avoid expensive indirect calls in the fast-path DMA mapping operations
by directly calling the dma_direct_* ops if we are using the directly
mapped DMA operations.

Signed-off-by: Christoph Hellwig
---
 arch/alpha/include/asm/dma-mapping.h |   2 +-
 arch/arc/mm/cache.c                  |   2 +-
 arch/arm/include/asm/dma-mapping.h   |   2 +-
 arch/arm/mm/dma-mapping-nommu.c      |  14 +---
 arch/arm64/mm/dma-mapping.c          |   3 -
 arch/ia64/hp/common/hwsw_iommu.c     |   2 +-
 arch/ia64/hp/common/sba_iommu.c      |   4 +-
 arch/ia64/kernel/dma-mapping.c       |   1 -
 arch/mips/include/asm/dma-mapping.h  |   2 +-
 arch/parisc/kernel/setup.c           |   4 -
 arch/sparc/include/asm/dma-mapping.h |   4 +-
 arch/x86/kernel/pci-dma.c            |   2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c  |   2 +-
 drivers/iommu/amd_iommu.c            |  13 +---
 include/asm-generic/dma-mapping.h    |   2 +-
 include/linux/dma-direct.h           |  17
 include/linux/dma-mapping.h          | 111 +++
 include/linux/dma-noncoherent.h      |   5 +-
 kernel/dma/direct.c                  |  37 ++---
 kernel/dma/mapping.c                 |  40 ++
 20 files changed, 150 insertions(+), 119 deletions(-)

diff --git a/arch/alpha/include/asm/dma-mapping.h b/arch/alpha/include/asm/dma-mapping.h
index 8beeafd4f68e..0ee6a5c99b16 100644
--- a/arch/alpha/include/asm/dma-mapping.h
+++ b/arch/alpha/include/asm/dma-mapping.h
@@ -7,7 +7,7 @@ extern const struct dma_map_ops alpha_pci_ops;
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
 #ifdef CONFIG_ALPHA_JENSEN
-	return &dma_direct_ops;
+	return NULL;
 #else
 	return &alpha_pci_ops;
 #endif
diff --git a/arch/arc/mm/cache.c b/arch/arc/mm/cache.c
index f2701c13a66b..e188bb3ede53 100644
--- a/arch/arc/mm/cache.c
+++ b/arch/arc/mm/cache.c
@@ -1280,7 +1280,7 @@ void __init arc_cache_init_master(void)
 	/*
 	 * In case of IOC (say IOC+SLC case), pointers above could still be set
 	 * but end up not being relevant as the first function in chain is not
-	 * called at all for @dma_direct_ops
+	 * called at all for devices using coherent DMA.
 	 *	arch_sync_dma_for_cpu() -> dma_cache_*() -> __dma_cache_*()
 	 */
 }
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 965b7c846ecb..31d3b96f0f4b 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -18,7 +18,7 @@ extern const struct dma_map_ops arm_coherent_dma_ops;
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
-	return IS_ENABLED(CONFIG_MMU) ? &arm_dma_ops : &dma_direct_ops;
+	return IS_ENABLED(CONFIG_MMU) ? &arm_dma_ops : NULL;
 }
 
 #ifdef __arch_page_to_dma
diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index 712416ecd8e6..f304b10e23a4 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -22,7 +22,7 @@
 #include "dma.h"
 
 /*
- * dma_direct_ops is used if
+ * The generic direct mapping code is used if
  *  - MMU/MPU is off
  *  - cpu is v7m w/o cache support
  *  - device is coherent
@@ -209,16 +209,9 @@ const struct dma_map_ops arm_nommu_dma_ops = {
 };
 EXPORT_SYMBOL(arm_nommu_dma_ops);
 
-static const struct dma_map_ops *arm_nommu_get_dma_map_ops(bool coherent)
-{
-	return coherent ? &dma_direct_ops : &arm_nommu_dma_ops;
-}
-
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 			const struct iommu_ops *iommu, bool coherent)
 {
-	const struct dma_map_ops *dma_ops;
-
 	if (IS_ENABLED(CONFIG_CPU_V7M)) {
 		/*
 		 * Cache support for v7m is optional, so can be treated as
@@ -234,7 +227,6 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 		dev->archdata.dma_coherent = (get_cr() & CR_M) ? coherent : true;
 	}
 
-	dma_ops = arm_nommu_get_dma_map_ops(dev->archdata.dma_coherent);
-
-	set_dma_ops(dev, dma_ops);
+	if (!dev->archdata.dma_coherent)
+		set_dma_ops(dev, &arm_nommu_dma_ops);
 }
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index ab1e417204d0..95eda81e3f2d 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -462,9 +462,6 @@ static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 			const struct iommu_ops *iommu, bool coherent)
 {
-	if (!dev->dma_ops)
-		dev->dma_ops = &dma_direct_ops;
-
 	dev->dma_coherent = coherent;
 	__iommu_setup_dma_ops(dev, dma_base, size, iommu);
 
diff --git a/arch/ia64/hp/common/hwsw_iommu.c b/arch/ia64/hp/common/hwsw_iommu.c
index f40ca499b246..8840ed97712f 100644
--- a/arch/ia64/hp/common/hwsw_iommu.c
+++ b/arch/ia64/hp/common/hwsw_iommu.
[PATCH 14/15] vmd: use the proper dma_* APIs instead of direct method calls
With the bypass support for the direct mapping we might not always have
methods to call, so use the proper APIs instead.  The only downside is
that we will create two dma-debug entries for each mapping if
CONFIG_DMA_DEBUG is enabled.

Signed-off-by: Christoph Hellwig
---
 drivers/pci/controller/vmd.c | 42 +++-
 1 file changed, 17 insertions(+), 25 deletions(-)

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 98ce79eac128..3890812cdf87 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -307,39 +307,32 @@ static struct device *to_vmd_dev(struct device *dev)
 	return &vmd->dev->dev;
 }
 
-static const struct dma_map_ops *vmd_dma_ops(struct device *dev)
-{
-	return get_dma_ops(to_vmd_dev(dev));
-}
-
 static void *vmd_alloc(struct device *dev, size_t size, dma_addr_t *addr,
 		       gfp_t flag, unsigned long attrs)
 {
-	return vmd_dma_ops(dev)->alloc(to_vmd_dev(dev), size, addr, flag,
-				       attrs);
+	return dma_alloc_attrs(to_vmd_dev(dev), size, addr, flag, attrs);
 }
 
 static void vmd_free(struct device *dev, size_t size, void *vaddr,
 		     dma_addr_t addr, unsigned long attrs)
 {
-	return vmd_dma_ops(dev)->free(to_vmd_dev(dev), size, vaddr, addr,
-				      attrs);
+	return dma_free_attrs(to_vmd_dev(dev), size, vaddr, addr, attrs);
 }
 
 static int vmd_mmap(struct device *dev, struct vm_area_struct *vma,
 		    void *cpu_addr, dma_addr_t addr, size_t size,
 		    unsigned long attrs)
 {
-	return vmd_dma_ops(dev)->mmap(to_vmd_dev(dev), vma, cpu_addr, addr,
-				      size, attrs);
+	return dma_mmap_attrs(to_vmd_dev(dev), vma, cpu_addr, addr, size,
+			attrs);
 }
 
 static int vmd_get_sgtable(struct device *dev, struct sg_table *sgt,
 			   void *cpu_addr, dma_addr_t addr, size_t size,
 			   unsigned long attrs)
 {
-	return vmd_dma_ops(dev)->get_sgtable(to_vmd_dev(dev), sgt, cpu_addr,
-					     addr, size, attrs);
+	return dma_get_sgtable_attrs(to_vmd_dev(dev), sgt, cpu_addr, addr, size,
+			attrs);
 }
 
 static dma_addr_t vmd_map_page(struct device *dev, struct page *page,
@@ -347,61 +340,60 @@ static dma_addr_t vmd_map_page(struct device *dev, struct page *page,
 			       enum dma_data_direction dir, unsigned long attrs)
 {
-	return vmd_dma_ops(dev)->map_page(to_vmd_dev(dev), page, offset, size,
-					  dir, attrs);
+	return dma_map_page_attrs(to_vmd_dev(dev), page, offset, size, dir,
+			attrs);
 }
 
 static void vmd_unmap_page(struct device *dev, dma_addr_t addr, size_t size,
 			   enum dma_data_direction dir, unsigned long attrs)
 {
-	vmd_dma_ops(dev)->unmap_page(to_vmd_dev(dev), addr, size, dir, attrs);
+	dma_unmap_page_attrs(to_vmd_dev(dev), addr, size, dir, attrs);
 }
 
 static int vmd_map_sg(struct device *dev, struct scatterlist *sg, int nents,
 		      enum dma_data_direction dir, unsigned long attrs)
 {
-	return vmd_dma_ops(dev)->map_sg(to_vmd_dev(dev), sg, nents, dir, attrs);
+	return dma_map_sg_attrs(to_vmd_dev(dev), sg, nents, dir, attrs);
 }
 
 static void vmd_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
 			 enum dma_data_direction dir, unsigned long attrs)
 {
-	vmd_dma_ops(dev)->unmap_sg(to_vmd_dev(dev), sg, nents, dir, attrs);
+	dma_unmap_sg_attrs(to_vmd_dev(dev), sg, nents, dir, attrs);
 }
 
 static void vmd_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
 				    size_t size, enum dma_data_direction dir)
 {
-	vmd_dma_ops(dev)->sync_single_for_cpu(to_vmd_dev(dev), addr, size, dir);
+	dma_sync_single_for_cpu(to_vmd_dev(dev), addr, size, dir);
 }
 
 static void vmd_sync_single_for_device(struct device *dev, dma_addr_t addr,
 				       size_t size, enum dma_data_direction dir)
 {
-	vmd_dma_ops(dev)->sync_single_for_device(to_vmd_dev(dev), addr, size,
-						 dir);
+	dma_sync_single_for_device(to_vmd_dev(dev), addr, size, dir);
 }
 
 static void vmd_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
 				int nents, enum dma_data_direction dir)
 {
-	vmd_dma_ops(dev)->sync_sg_for_cpu(to_vmd_dev(dev), sg, nents, dir);
+	dma_sync_sg_for_cpu(to_vmd_dev(dev), sg, nents, dir);
 }
 
 static void vmd_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 				   int nents, enum dma_data_direction dir)
 {
-	vmd_dma_ops(dev)->sync_sg_for_device(to_vmd_dev(dev), sg,
[PATCH 13/15] ACPI / scan: Refactor _CCA enforcement
From: Robin Murphy

Rather than checking the DMA attribute at each callsite, just pass it
through for acpi_dma_configure() to handle directly.  That can then deal
with the relatively exceptional DEV_DMA_NOT_SUPPORTED case by explicitly
installing dummy DMA ops instead of just skipping setup entirely.  This
will then free up the dev->dma_ops == NULL case for some valuable
fastpath optimisations.

Signed-off-by: Robin Murphy
Signed-off-by: Christoph Hellwig
---
 drivers/acpi/scan.c      | 5
 drivers/base/platform.c  | 3 +--
 drivers/pci/pci-driver.c | 3 +--
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index bd1c59fb0e17..b75ae34ed188 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1456,6 +1456,11 @@ int acpi_dma_configure(struct device *dev, enum dev_dma_attr attr)
 	const struct iommu_ops *iommu;
 	u64 dma_addr = 0, size = 0;
 
+	if (attr == DEV_DMA_NOT_SUPPORTED) {
+		set_dma_ops(dev, &dma_dummy_ops);
+		return 0;
+	}
+
 	iort_dma_setup(dev, &dma_addr, &size);
 
 	iommu = iort_iommu_configure(dev);
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index eae841935a45..c1ddf191711e 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1138,8 +1138,7 @@ int platform_dma_configure(struct device *dev)
 		ret = of_dma_configure(dev, dev->of_node, true);
 	} else if (has_acpi_companion(dev)) {
 		attr = acpi_get_dma_attr(to_acpi_device_node(dev->fwnode));
-		if (attr != DEV_DMA_NOT_SUPPORTED)
-			ret = acpi_dma_configure(dev, attr);
+		ret = acpi_dma_configure(dev, attr);
 	}
 
 	return ret;
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index bef17c3fca67..1b58e058b13f 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1602,8 +1602,7 @@ static int pci_dma_configure(struct device *dev)
 		struct acpi_device *adev = to_acpi_device_node(bridge->fwnode);
-		enum dev_dma_attr attr = acpi_get_dma_attr(adev);
 
-		if (attr != DEV_DMA_NOT_SUPPORTED)
-			ret = acpi_dma_configure(dev, attr);
+		ret = acpi_dma_configure(dev, acpi_get_dma_attr(adev));
 	}
 
 	pci_put_host_bridge_device(bridge);
-- 
2.19.1
[PATCH 12/15] dma-mapping: factor out dummy DMA ops
From: Robin Murphy

The dummy DMA ops are currently used by arm64 for any device which has
an invalid ACPI description and is thus barred from using DMA due to not
knowing whether it is cache-coherent or not.  Factor these out into
general dma-mapping code so that they can be referenced from other
common code paths.  In the process, we can prune all the optional
callbacks which just do the same thing as the default behaviour, and
fill in .map_resource for completeness.

Signed-off-by: Robin Murphy
[hch: moved to a separate source file]
Signed-off-by: Christoph Hellwig
---
 arch/arm64/include/asm/dma-mapping.h |  4 +-
 arch/arm64/mm/dma-mapping.c          | 86
 include/linux/dma-mapping.h          |  1 +
 kernel/dma/Makefile                  |  2 +-
 kernel/dma/dummy.c                   | 39 +
 5 files changed, 42 insertions(+), 90 deletions(-)
 create mode 100644 kernel/dma/dummy.c

diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
index c41f3fb1446c..273e778f7de2 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -24,15 +24,13 @@
 #include
 #include
 
-extern const struct dma_map_ops dummy_dma_ops;
-
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
 	/*
 	 * We expect no ISA devices, and all other DMA masters are expected to
 	 * have someone call arch_setup_dma_ops at device creation time.
 	 */
-	return &dummy_dma_ops;
+	return &dma_dummy_ops;
 }
 
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index e4effbb243b1..ab1e417204d0 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -89,92 +89,6 @@ static int __swiotlb_mmap_pfn(struct vm_area_struct *vma,
 }
 #endif /* CONFIG_IOMMU_DMA */
 
-/*
- * The following APIs are for dummy DMA ops
- */
-
-static void *__dummy_alloc(struct device *dev, size_t size,
-			   dma_addr_t *dma_handle, gfp_t flags,
-			   unsigned long attrs)
-{
-	return NULL;
-}
-
-static void __dummy_free(struct device *dev, size_t size,
-			 void *vaddr, dma_addr_t dma_handle,
-			 unsigned long attrs)
-{
-}
-
-static int __dummy_mmap(struct device *dev,
-			struct vm_area_struct *vma,
-			void *cpu_addr, dma_addr_t dma_addr, size_t size,
-			unsigned long attrs)
-{
-	return -ENXIO;
-}
-
-static dma_addr_t __dummy_map_page(struct device *dev, struct page *page,
-				   unsigned long offset, size_t size,
-				   enum dma_data_direction dir,
-				   unsigned long attrs)
-{
-	return DMA_MAPPING_ERROR;
-}
-
-static void __dummy_unmap_page(struct device *dev, dma_addr_t dev_addr,
-			       size_t size, enum dma_data_direction dir,
-			       unsigned long attrs)
-{
-}
-
-static int __dummy_map_sg(struct device *dev, struct scatterlist *sgl,
-			  int nelems, enum dma_data_direction dir,
-			  unsigned long attrs)
-{
-	return 0;
-}
-
-static void __dummy_unmap_sg(struct device *dev,
-			     struct scatterlist *sgl, int nelems,
-			     enum dma_data_direction dir,
-			     unsigned long attrs)
-{
-}
-
-static void __dummy_sync_single(struct device *dev,
-				dma_addr_t dev_addr, size_t size,
-				enum dma_data_direction dir)
-{
-}
-
-static void __dummy_sync_sg(struct device *dev,
-			    struct scatterlist *sgl, int nelems,
-			    enum dma_data_direction dir)
-{
-}
-
-static int __dummy_dma_supported(struct device *hwdev, u64 mask)
-{
-	return 0;
-}
-
-const struct dma_map_ops dummy_dma_ops = {
-	.alloc			= __dummy_alloc,
-	.free			= __dummy_free,
-	.mmap			= __dummy_mmap,
-	.map_page		= __dummy_map_page,
-	.unmap_page		= __dummy_unmap_page,
-	.map_sg			= __dummy_map_sg,
-	.unmap_sg		= __dummy_unmap_sg,
-	.sync_single_for_cpu	= __dummy_sync_single,
-	.sync_single_for_device	= __dummy_sync_single,
-	.sync_sg_for_cpu	= __dummy_sync_sg,
-	.sync_sg_for_device	= __dummy_sync_sg,
-	.dma_supported		= __dummy_dma_supported,
-};
-EXPORT_SYMBOL(dummy_dma_ops);
-
 static int __init arm64_dma_init(void)
 {
 	WARN_TAINT(ARCH_DMA_MINALIGN < cache_line_size(),
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 0
[PATCH 08/15] dma-mapping: move dma_get_required_mask to kernel/dma
dma_get_required_mask should really be with the rest of the DMA mapping
implementation instead of in drivers/base as a lone outlier.

Signed-off-by: Christoph Hellwig
---
 drivers/base/platform.c | 31 ---
 kernel/dma/mapping.c    | 34 +-
 2 files changed, 33 insertions(+), 32 deletions(-)

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 41b91af95afb..eae841935a45 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1179,37 +1179,6 @@ int __init platform_bus_init(void)
 	return error;
 }
 
-#ifndef ARCH_HAS_DMA_GET_REQUIRED_MASK
-static u64 dma_default_get_required_mask(struct device *dev)
-{
-	u32 low_totalram = ((max_pfn - 1) << PAGE_SHIFT);
-	u32 high_totalram = ((max_pfn - 1) >> (32 - PAGE_SHIFT));
-	u64 mask;
-
-	if (!high_totalram) {
-		/* convert to mask just covering totalram */
-		low_totalram = (1 << (fls(low_totalram) - 1));
-		low_totalram += low_totalram - 1;
-		mask = low_totalram;
-	} else {
-		high_totalram = (1 << (fls(high_totalram) - 1));
-		high_totalram += high_totalram - 1;
-		mask = (((u64)high_totalram) << 32) + 0x;
-	}
-	return mask;
-}
-
-u64 dma_get_required_mask(struct device *dev)
-{
-	const struct dma_map_ops *ops = get_dma_ops(dev);
-
-	if (ops->get_required_mask)
-		return ops->get_required_mask(dev);
-	return dma_default_get_required_mask(dev);
-}
-EXPORT_SYMBOL_GPL(dma_get_required_mask);
-#endif
-
 static __initdata LIST_HEAD(early_platform_driver_list);
 static __initdata LIST_HEAD(early_platform_device_list);
 
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index dfbc3deb95cd..dfe29d18dba1 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -5,7 +5,7 @@
  * Copyright (c) 2006  SUSE Linux Products GmbH
  * Copyright (c) 2006  Tejun Heo
  */
-
+#include	/* for max_pfn */
 #include
 #include
 #include
@@ -262,3 +262,35 @@ int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
 #endif /* !CONFIG_ARCH_NO_COHERENT_DMA_MMAP */
 }
 EXPORT_SYMBOL(dma_common_mmap);
+
+#ifndef ARCH_HAS_DMA_GET_REQUIRED_MASK
+static u64 dma_default_get_required_mask(struct device *dev)
+{
+	u32 low_totalram = ((max_pfn - 1) << PAGE_SHIFT);
+	u32 high_totalram = ((max_pfn - 1) >> (32 - PAGE_SHIFT));
+	u64 mask;
+
+	if (!high_totalram) {
+		/* convert to mask just covering totalram */
+		low_totalram = (1 << (fls(low_totalram) - 1));
+		low_totalram += low_totalram - 1;
+		mask = low_totalram;
+	} else {
+		high_totalram = (1 << (fls(high_totalram) - 1));
+		high_totalram += high_totalram - 1;
+		mask = (((u64)high_totalram) << 32) + 0x;
+	}
+	return mask;
+}
+
+u64 dma_get_required_mask(struct device *dev)
+{
+	const struct dma_map_ops *ops = get_dma_ops(dev);
+
+	if (ops->get_required_mask)
+		return ops->get_required_mask(dev);
+	return dma_default_get_required_mask(dev);
+}
+EXPORT_SYMBOL_GPL(dma_get_required_mask);
+#endif
-- 
2.19.1
[PATCH 10/15] dma-mapping: move dma_cache_sync out of line
This isn't exactly a slow path routine, but it is not super critical
either, and moving it out of line will help to keep the include chain
clean for the following DMA indirection bypass work.

Signed-off-by: Christoph Hellwig
---
 include/linux/dma-mapping.h | 12 ++--
 kernel/dma/mapping.c        | 11 +++
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 0bbce52606c2..0f0078490df4 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -411,16 +411,8 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 #define dma_map_page(d, p, o, s, r) dma_map_page_attrs(d, p, o, s, r, 0)
 #define dma_unmap_page(d, a, s, r) dma_unmap_page_attrs(d, a, s, r, 0)
 
-static inline void
-dma_cache_sync(struct device *dev, void *vaddr, size_t size,
-		enum dma_data_direction dir)
-{
-	const struct dma_map_ops *ops = get_dma_ops(dev);
-
-	BUG_ON(!valid_dma_direction(dir));
-	if (ops->cache_sync)
-		ops->cache_sync(dev, vaddr, size, dir);
-}
+void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
+		enum dma_data_direction dir);
 
 extern int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
 		void *cpu_addr, dma_addr_t dma_addr, size_t size,
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 176ae3e08916..0b18cfbdde95 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -430,3 +430,14 @@ int dma_set_coherent_mask(struct device *dev, u64 mask)
 }
 EXPORT_SYMBOL(dma_set_coherent_mask);
 #endif
+
+void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
+		enum dma_data_direction dir)
+{
+	const struct dma_map_ops *ops = get_dma_ops(dev);
+
+	BUG_ON(!valid_dma_direction(dir));
+	if (ops->cache_sync)
+		ops->cache_sync(dev, vaddr, size, dir);
+}
+EXPORT_SYMBOL(dma_cache_sync);
-- 
2.19.1
[PATCH 11/15] dma-mapping: always build the direct mapping code
All architectures except for sparc64 use the dma-direct code in some
form, and even for sparc64 we had the discussion of a direct mapping
mode a while ago.  In preparation for directly calling the direct
mapping code don't bother making it optional, but always build the code
in.  This is a minor hardship for some powerpc and arm configs that
don't pull it in yet (although they should in a release or two), and
sparc64 which currently doesn't need it at all, but it will reduce the
ifdef mess we'd otherwise need significantly.

Signed-off-by: Christoph Hellwig
---
 arch/alpha/Kconfig      | 1 -
 arch/arc/Kconfig        | 1 -
 arch/arm/Kconfig        | 1 -
 arch/arm64/Kconfig      | 1 -
 arch/c6x/Kconfig        | 1 -
 arch/csky/Kconfig       | 1 -
 arch/h8300/Kconfig      | 1 -
 arch/hexagon/Kconfig    | 1 -
 arch/m68k/Kconfig       | 1 -
 arch/microblaze/Kconfig | 1 -
 arch/mips/Kconfig       | 1 -
 arch/nds32/Kconfig      | 1 -
 arch/nios2/Kconfig      | 1 -
 arch/openrisc/Kconfig   | 1 -
 arch/parisc/Kconfig     | 1 -
 arch/riscv/Kconfig      | 1 -
 arch/s390/Kconfig       | 1 -
 arch/sh/Kconfig         | 1 -
 arch/sparc/Kconfig      | 1 -
 arch/unicore32/Kconfig  | 1 -
 arch/x86/Kconfig        | 1 -
 arch/xtensa/Kconfig     | 1 -
 kernel/dma/Kconfig      | 7 ---
 kernel/dma/Makefile     | 3 +--
 24 files changed, 1 insertion(+), 31 deletions(-)

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index a7e748a46c18..5da6ff54b3e7 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -203,7 +203,6 @@ config ALPHA_EIGER
 config ALPHA_JENSEN
 	bool "Jensen"
 	depends on BROKEN
-	select DMA_DIRECT_OPS
 	help
 	  DEC PC 150 AXP (aka Jensen): This is a very old Digital system - one
 	  of the first-generation Alpha systems. A number of these systems
diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index fd48d698da29..7deaabeb531a 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -17,7 +17,6 @@ config ARC
 	select BUILDTIME_EXTABLE_SORT
 	select CLONE_BACKWARDS
 	select COMMON_CLK
-	select DMA_DIRECT_OPS
 	select GENERIC_ATOMIC64 if !ISA_ARCV2 || !(ARC_HAS_LL64 && ARC_HAS_LLSC)
 	select GENERIC_CLOCKEVENTS
 	select GENERIC_FIND_FIRST_BIT
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index a858ee791ef0..586fc30b23bd 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -30,7 +30,6 @@ config ARM
 	select CLONE_BACKWARDS
 	select CPU_PM if (SUSPEND || CPU_IDLE)
 	select DCACHE_WORD_ACCESS if HAVE_EFFICIENT_UNALIGNED_ACCESS
-	select DMA_DIRECT_OPS if !MMU
 	select DMA_REMAP if MMU
 	select EDAC_SUPPORT
 	select EDAC_ATOMIC_SCRUB
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 06cf0ef24367..2092080240b0 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -80,7 +80,6 @@ config ARM64
 	select CPU_PM if (SUSPEND || CPU_IDLE)
 	select CRC32
 	select DCACHE_WORD_ACCESS
-	select DMA_DIRECT_OPS
 	select DMA_DIRECT_REMAP
 	select EDAC_SUPPORT
 	select FRAME_POINTER
diff --git a/arch/c6x/Kconfig b/arch/c6x/Kconfig
index 84420109113d..456e154674d1 100644
--- a/arch/c6x/Kconfig
+++ b/arch/c6x/Kconfig
@@ -9,7 +9,6 @@ config C6X
 	select ARCH_HAS_SYNC_DMA_FOR_CPU
 	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
 	select CLKDEV_LOOKUP
-	select DMA_DIRECT_OPS
 	select GENERIC_ATOMIC64
 	select GENERIC_IRQ_SHOW
 	select HAVE_ARCH_TRACEHOOK
diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig
index ea74f3a9eeaf..37bed8aadf95 100644
--- a/arch/csky/Kconfig
+++ b/arch/csky/Kconfig
@@ -7,7 +7,6 @@ config CSKY
 	select COMMON_CLK
 	select CLKSRC_MMIO
 	select CLKSRC_OF
-	select DMA_DIRECT_OPS
 	select DMA_DIRECT_REMAP
 	select IRQ_DOMAIN
 	select HANDLE_DOMAIN_IRQ
diff --git a/arch/h8300/Kconfig b/arch/h8300/Kconfig
index d19c6b16cd5d..6472a0685470 100644
--- a/arch/h8300/Kconfig
+++ b/arch/h8300/Kconfig
@@ -22,7 +22,6 @@ config H8300
 	select HAVE_ARCH_KGDB
 	select HAVE_ARCH_HASH
 	select CPU_NO_EFFICIENT_FFS
-	select DMA_DIRECT_OPS
 
 config CPU_BIG_ENDIAN
 	def_bool y
diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig
index 2b688af379e6..d71036c598de 100644
--- a/arch/hexagon/Kconfig
+++ b/arch/hexagon/Kconfig
@@ -31,7 +31,6 @@ config HEXAGON
 	select GENERIC_CLOCKEVENTS_BROADCAST
 	select MODULES_USE_ELF_RELA
 	select GENERIC_CPU_DEVICES
-	select DMA_DIRECT_OPS
 	---help---
 	  Qualcomm Hexagon is a processor architecture designed for high
 	  performance and low power across a wide variety of applications.
diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
index 1bc9f1ba759a..8a5868e9a3a0 100644
--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -26,7 +26,6 @@ config M68K
 	select MODULES_USE_ELF_RELA
 	select OLD_SIGSUSPEND3
 	select OLD_SIGACTION
-	select DMA_DI
[PATCH 09/15] dma-mapping: move various slow path functions out of line
There is no need to have all setup and coherent allocation / freeing
routines inline.  Move them out of line to keep the implementation
nicely encapsulated and save some kernel text size.

Signed-off-by: Christoph Hellwig
---
 arch/powerpc/include/asm/dma-mapping.h |   1 -
 include/linux/dma-mapping.h            | 150 +++--
 kernel/dma/mapping.c                   | 140 ++-
 3 files changed, 151 insertions(+), 140 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index 8fa394520af6..5201f2b7838c 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -108,7 +108,6 @@ static inline void set_dma_offset(struct device *dev, dma_addr_t off)
 }
 
 #define HAVE_ARCH_DMA_SET_MASK 1
-extern int dma_set_mask(struct device *dev, u64 dma_mask);
 
 extern u64 __dma_get_required_mask(struct device *dev);
 
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 3b431cc58794..0bbce52606c2 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -440,107 +440,24 @@ bool dma_in_atomic_pool(void *start, size_t size);
 void *dma_alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags);
 bool dma_free_from_pool(void *start, size_t size);
 
-/**
- * dma_mmap_attrs - map a coherent DMA allocation into user space
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @vma: vm_area_struct describing requested user mapping
- * @cpu_addr: kernel CPU-view address returned from dma_alloc_attrs
- * @handle: device-view address returned from dma_alloc_attrs
- * @size: size of memory originally requested in dma_alloc_attrs
- * @attrs: attributes of mapping properties requested in dma_alloc_attrs
- *
- * Map a coherent DMA buffer previously allocated by dma_alloc_attrs
- * into user space.  The coherent DMA buffer must not be freed by the
- * driver until the user space mapping has been released.
- */
-static inline int
-dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma, void *cpu_addr,
-		dma_addr_t dma_addr, size_t size, unsigned long attrs)
-{
-	const struct dma_map_ops *ops = get_dma_ops(dev);
-
-	BUG_ON(!ops);
-	if (ops->mmap)
-		return ops->mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
-	return dma_common_mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
-}
-
+int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
+		void *cpu_addr, dma_addr_t dma_addr, size_t size,
+		unsigned long attrs);
 #define dma_mmap_coherent(d, v, c, h, s) dma_mmap_attrs(d, v, c, h, s, 0)
 
 int
 dma_common_get_sgtable(struct device *dev, struct sg_table *sgt,
 		void *cpu_addr, dma_addr_t dma_addr, size_t size,
 		unsigned long attrs);
 
-static inline int
-dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt, void *cpu_addr,
-		dma_addr_t dma_addr, size_t size,
-		unsigned long attrs)
-{
-	const struct dma_map_ops *ops = get_dma_ops(dev);
-
-	BUG_ON(!ops);
-	if (ops->get_sgtable)
-		return ops->get_sgtable(dev, sgt, cpu_addr, dma_addr, size,
-					attrs);
-	return dma_common_get_sgtable(dev, sgt, cpu_addr, dma_addr, size,
-			attrs);
-}
-
+int dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt,
+		void *cpu_addr, dma_addr_t dma_addr, size_t size,
+		unsigned long attrs);
 #define dma_get_sgtable(d, t, v, h, s) dma_get_sgtable_attrs(d, t, v, h, s, 0)
 
-#ifndef arch_dma_alloc_attrs
-#define arch_dma_alloc_attrs(dev)	(true)
-#endif
-
-static inline void *dma_alloc_attrs(struct device *dev, size_t size,
-		dma_addr_t *dma_handle, gfp_t flag, unsigned long attrs)
-{
-	const struct dma_map_ops *ops = get_dma_ops(dev);
-	void *cpu_addr;
-
-	BUG_ON(!ops);
-	WARN_ON_ONCE(dev && !dev->coherent_dma_mask);
-
-	if (dma_alloc_from_dev_coherent(dev, size, dma_handle, &cpu_addr))
-		return cpu_addr;
-
-	/* let the implementation decide on the zone to allocate from: */
-	flag &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);
-
-	if (!arch_dma_alloc_attrs(&dev))
-		return NULL;
-	if (!ops->alloc)
-		return NULL;
-
-	cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
-	debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr);
-	return cpu_addr;
-}
-
-static inline void dma_free_attrs(struct device *dev, size_t size,
-		void *cpu_addr, dma_addr_t dma_handle, unsigned long attrs)
-{
-	const struct dma_map_ops *ops = get_dma_ops(dev);
-
-	BUG_ON(!ops);
-
-	if (dma_release_from_dev_coherent(dev, get_o
[PATCH 07/15] dma-mapping: merge dma_unmap_page_attrs and dma_unmap_single_attrs
The two functions are exactly the same, so don't bother implementing them twice. Signed-off-by: Christoph Hellwig --- include/linux/dma-mapping.h | 19 ++- 1 file changed, 6 insertions(+), 13 deletions(-) diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index 8916499d2805..3b431cc58794 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -253,6 +253,12 @@ static inline void dma_unmap_single_attrs(struct device *dev, dma_addr_t addr, debug_dma_unmap_page(dev, addr, size, dir, true); } +static inline void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, + size_t size, enum dma_data_direction dir, unsigned long attrs) +{ + return dma_unmap_single_attrs(dev, addr, size, dir, attrs); +} + /* * dma_maps_sg_attrs returns 0 on error and > 0 on success. * It should never return a value < 0. @@ -300,19 +306,6 @@ static inline dma_addr_t dma_map_page_attrs(struct device *dev, return addr; } -static inline void dma_unmap_page_attrs(struct device *dev, - dma_addr_t addr, size_t size, - enum dma_data_direction dir, - unsigned long attrs) -{ - const struct dma_map_ops *ops = get_dma_ops(dev); - - BUG_ON(!valid_dma_direction(dir)); - if (ops->unmap_page) - ops->unmap_page(dev, addr, size, dir, attrs); - debug_dma_unmap_page(dev, addr, size, dir, false); -} - static inline dma_addr_t dma_map_resource(struct device *dev, phys_addr_t phys_addr, size_t size, -- 2.19.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 06/15] dma-mapping: simplify the dma_sync_single_range_for_{cpu, device} implementation
We can just call the regular functions after adding the offset to the address instead of reimplementing them. Signed-off-by: Christoph Hellwig --- include/linux/dma-debug.h | 27 include/linux/dma-mapping.h | 34 +- kernel/dma/debug.c | 42 - 3 files changed, 10 insertions(+), 93 deletions(-) diff --git a/include/linux/dma-debug.h b/include/linux/dma-debug.h index 30213adbb6b9..c85e097a984c 100644 --- a/include/linux/dma-debug.h +++ b/include/linux/dma-debug.h @@ -72,17 +72,6 @@ extern void debug_dma_sync_single_for_device(struct device *dev, dma_addr_t dma_handle, size_t size, int direction); -extern void debug_dma_sync_single_range_for_cpu(struct device *dev, - dma_addr_t dma_handle, - unsigned long offset, - size_t size, - int direction); - -extern void debug_dma_sync_single_range_for_device(struct device *dev, - dma_addr_t dma_handle, - unsigned long offset, - size_t size, int direction); - extern void debug_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, int nelems, int direction); @@ -174,22 +163,6 @@ static inline void debug_dma_sync_single_for_device(struct device *dev, { } -static inline void debug_dma_sync_single_range_for_cpu(struct device *dev, - dma_addr_t dma_handle, - unsigned long offset, - size_t size, - int direction) -{ -} - -static inline void debug_dma_sync_single_range_for_device(struct device *dev, - dma_addr_t dma_handle, - unsigned long offset, - size_t size, - int direction) -{ -} - static inline void debug_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, int nelems, int direction) diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index 7799c2b27849..8916499d2805 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -360,6 +360,13 @@ static inline void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr, debug_dma_sync_single_for_cpu(dev, addr, size, dir); } +static inline void dma_sync_single_range_for_cpu(struct device *dev, + dma_addr_t addr, unsigned long offset, 
size_t size, + enum dma_data_direction dir) +{ + return dma_sync_single_for_cpu(dev, addr + offset, size, dir); +} + static inline void dma_sync_single_for_device(struct device *dev, dma_addr_t addr, size_t size, enum dma_data_direction dir) @@ -372,32 +379,11 @@ static inline void dma_sync_single_for_device(struct device *dev, debug_dma_sync_single_for_device(dev, addr, size, dir); } -static inline void dma_sync_single_range_for_cpu(struct device *dev, -dma_addr_t addr, -unsigned long offset, -size_t size, -enum dma_data_direction dir) -{ - const struct dma_map_ops *ops = get_dma_ops(dev); - - BUG_ON(!valid_dma_direction(dir)); - if (ops->sync_single_for_cpu) - ops->sync_single_for_cpu(dev, addr + offset, size, dir); - debug_dma_sync_single_range_for_cpu(dev, addr, offset, size, dir); -} - static inline void dma_sync_single_range_for_device(struct device *dev, - dma_addr_t addr, - unsigned long offset, - size_t size, - enum dma_data_direction dir) + dma_addr_t addr, unsigned long offset, size_t size, + enum dma_data_direction dir) { - const struct dma_map_ops *ops = get_dma_ops(dev); - - BUG_ON(!valid_dma_direction(dir)); - if (ops->sync_single_for_device) - ops->sync_single_for_device(dev, addr
[PATCH 05/15] dma-direct: merge swiotlb_dma_ops into the dma_direct code
While the dma-direct code is (relatively) clean and simple we actually have to use the swiotlb ops for the mapping on many architectures due to devices with addressing limits. Instead of keeping two implementations around this commit allows the dma-direct implementation to call the swiotlb bounce buffering functions and thus share the guts of the mapping implementation. This also simplifies the dma-mapping setup on a few architectures where we don't have to differentiate which implementation to use. Signed-off-by: Christoph Hellwig --- arch/arm64/mm/dma-mapping.c | 2 +- arch/ia64/hp/common/hwsw_iommu.c | 2 +- arch/ia64/hp/common/sba_iommu.c | 6 +- arch/ia64/kernel/dma-mapping.c | 2 +- arch/mips/include/asm/dma-mapping.h | 2 - arch/powerpc/kernel/dma-swiotlb.c| 16 +- arch/riscv/include/asm/dma-mapping.h | 15 -- arch/x86/kernel/pci-swiotlb.c| 4 +- arch/x86/mm/mem_encrypt.c| 7 - arch/x86/pci/sta2x11-fixup.c | 1 - include/linux/dma-direct.h | 12 ++ include/linux/swiotlb.h | 74 - kernel/dma/direct.c | 113 + kernel/dma/swiotlb.c | 232 ++- 14 files changed, 150 insertions(+), 338 deletions(-) delete mode 100644 arch/riscv/include/asm/dma-mapping.h diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c index 4c0f498069e8..e4effbb243b1 100644 --- a/arch/arm64/mm/dma-mapping.c +++ b/arch/arm64/mm/dma-mapping.c @@ -549,7 +549,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, const struct iommu_ops *iommu, bool coherent) { if (!dev->dma_ops) - dev->dma_ops = &swiotlb_dma_ops; + dev->dma_ops = &dma_direct_ops; dev->dma_coherent = coherent; __iommu_setup_dma_ops(dev, dma_base, size, iommu); diff --git a/arch/ia64/hp/common/hwsw_iommu.c b/arch/ia64/hp/common/hwsw_iommu.c index 58969039bed2..f40ca499b246 100644 --- a/arch/ia64/hp/common/hwsw_iommu.c +++ b/arch/ia64/hp/common/hwsw_iommu.c @@ -38,7 +38,7 @@ static inline int use_swiotlb(struct device *dev) const struct dma_map_ops *hwsw_dma_get_ops(struct device *dev) { if (use_swiotlb(dev)) - 
return &swiotlb_dma_ops; + return &dma_direct_ops; return &sba_dma_ops; } EXPORT_SYMBOL(hwsw_dma_get_ops); diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c index 0d21c0b5b23d..5ee74820a0f6 100644 --- a/arch/ia64/hp/common/sba_iommu.c +++ b/arch/ia64/hp/common/sba_iommu.c @@ -2065,8 +2065,6 @@ static int __init acpi_sba_ioc_init_acpi(void) /* This has to run before acpi_scan_init(). */ arch_initcall(acpi_sba_ioc_init_acpi); -extern const struct dma_map_ops swiotlb_dma_ops; - static int __init sba_init(void) { @@ -2080,7 +2078,7 @@ sba_init(void) * a successful kdump kernel boot is to use the swiotlb. */ if (is_kdump_kernel()) { - dma_ops = &swiotlb_dma_ops; + dma_ops = &dma_direct_ops; if (swiotlb_late_init_with_default_size(64 * (1<<20)) != 0) panic("Unable to initialize software I/O TLB:" " Try machvec=dig boot option"); @@ -2102,7 +2100,7 @@ sba_init(void) * If we didn't find something sba_iommu can claim, we * need to setup the swiotlb and switch to the dig machvec. 
*/ - dma_ops = &swiotlb_dma_ops; + dma_ops = &dma_direct_ops; if (swiotlb_late_init_with_default_size(64 * (1<<20)) != 0) panic("Unable to find SBA IOMMU or initialize " "software I/O TLB: Try machvec=dig boot option"); diff --git a/arch/ia64/kernel/dma-mapping.c b/arch/ia64/kernel/dma-mapping.c index 36dd6aa6d759..80cd3e1ea95a 100644 --- a/arch/ia64/kernel/dma-mapping.c +++ b/arch/ia64/kernel/dma-mapping.c @@ -36,7 +36,7 @@ long arch_dma_coherent_to_pfn(struct device *dev, void *cpu_addr, void __init swiotlb_dma_init(void) { - dma_ops = &swiotlb_dma_ops; + dma_ops = &dma_direct_ops; swiotlb_init(1); } #endif diff --git a/arch/mips/include/asm/dma-mapping.h b/arch/mips/include/asm/dma-mapping.h index b4c477eb46ce..69f914667f3e 100644 --- a/arch/mips/include/asm/dma-mapping.h +++ b/arch/mips/include/asm/dma-mapping.h @@ -10,8 +10,6 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus) { #if defined(CONFIG_MACH_JAZZ) return &jazz_dma_ops; -#elif defined(CONFIG_SWIOTLB) - return &swiotlb_dma_ops; #else return &dma_direct_ops; #endif diff --git a/arch/powerpc/kernel/dma-swiotlb.c b/arch/powerpc/kernel/dma-swiotlb.c index 3d8df2cf8be9..430a7d0aa2cb 100644 --- a/arch/powerpc/kernel/dma-swiotlb.c +++ b/arch/powerpc/kernel/dma-swiotlb.c @@ -50,15 +50,15 @@ const struct dma_map_op
[PATCH 04/15] dma-direct: use dma_direct_map_page to implement dma_direct_map_sg
No need to duplicate the mapping logic. Signed-off-by: Christoph Hellwig --- kernel/dma/direct.c | 14 +- 1 file changed, 5 insertions(+), 9 deletions(-) diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index edb24f94ea1e..d45306473c90 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -217,6 +217,7 @@ static void dma_direct_sync_single_for_device(struct device *dev, arch_sync_dma_for_device(dev, dma_to_phys(dev, addr), size, dir); } +#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) static void dma_direct_sync_sg_for_device(struct device *dev, struct scatterlist *sgl, int nents, enum dma_data_direction dir) { @@ -229,6 +230,7 @@ static void dma_direct_sync_sg_for_device(struct device *dev, for_each_sg(sgl, sg, nents, i) arch_sync_dma_for_device(dev, sg_phys(sg), sg->length, dir); } +#endif #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \ defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) @@ -294,19 +296,13 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents, struct scatterlist *sg; for_each_sg(sgl, sg, nents, i) { - BUG_ON(!sg_page(sg)); - - sg_dma_address(sg) = phys_to_dma(dev, sg_phys(sg)); - if (unlikely(dev && !dma_capable(dev, sg_dma_address(sg), - sg->length))) { - report_addr(dev, sg_dma_address(sg), sg->length); + sg->dma_address = dma_direct_map_page(dev, sg_page(sg), + sg->offset, sg->length, dir, attrs); + if (sg->dma_address == DMA_MAPPING_ERROR) return 0; - } sg_dma_len(sg) = sg->length; } - if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) - dma_direct_sync_sg_for_device(dev, sgl, nents, dir); return nents; } -- 2.19.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
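The shape of this patch — implementing the scatterlist mapper as a loop over the single-entry mapper, so the addressability check lives in exactly one place — can be sketched in plain userspace C. All types and names below are illustrative stand-ins, not the kernel's real API:

```c
#include <assert.h>

typedef unsigned long long dma_addr_t;
#define DMA_MAPPING_ERROR ((dma_addr_t)-1)

struct scatterlist {
	unsigned long long phys;
	unsigned int length;
	dma_addr_t dma_address;
	unsigned int dma_len;
};

/* Single-entry mapper: the one place that checks device addressability
 * (here a pretend 32-bit DMA mask). */
static dma_addr_t map_page(unsigned long long phys, unsigned int len)
{
	if (phys + len > (1ULL << 32))
		return DMA_MAPPING_ERROR;
	return phys;	/* direct mapping: DMA address == physical address */
}

/* Scatterlist mapper built on top of it, mirroring the patched
 * dma_direct_map_sg(): 0 signals failure, nents signals success. */
static int map_sg(struct scatterlist *sgl, int nents)
{
	int i;

	for (i = 0; i < nents; i++) {
		struct scatterlist *sg = &sgl[i];

		sg->dma_address = map_page(sg->phys, sg->length);
		if (sg->dma_address == DMA_MAPPING_ERROR)
			return 0;
		sg->dma_len = sg->length;
	}
	return nents;
}
```

Because the per-entry check now happens inside map_page(), the separate sync-on-map step in the old map_sg() loop also disappears, which is exactly the deduplication the patch performs.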
[RFC] avoid indirect calls for DMA direct mappings v2
Hi all, a while ago Jesper reported major performance regressions due to the spectre v2 mitigations in his XDP forwarding workloads. A large part of that is due to the DMA mapping API indirect calls. It turns out that the most common implementation of the DMA API is the direct mapping case, and now that we have merged almost all duplicate implementations of that into a single generic one, it is easily feasible to use direct calls for this fast path. This series consolidates the DMA mapping code by merging the swiotlb case into the dma-direct case, and then treats NULL dma_ops as an indicator that we should directly call the direct mapping case. This recovers a large part of the retpoline-induced XDP slowdown. This work is based on the dma-mapping tree, so you probably want to use this git tree for testing: git://git.infradead.org/users/hch/misc.git dma-direct-calls.2 Gitweb: http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-direct-calls.2 Changes since v1: - now also includes all the prep patches relative to the dma-mapping for-next tree - move various slow path functions out of line - use a NULL dma ops as the indicator to use the direct mapping path - remove dma_direct_ops now that we always call it without the indirection - move the dummy dma ops to common code - explicitly set the dummy dma ops for devices that are indicated as not DMA capable by firmware ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
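The central trick of the series — treating a NULL dma_ops pointer as "use the direct mapping", so the fast path takes a predictable conditional branch instead of a retpolined indirect call — can be modelled in a few lines of userspace C. The names below are illustrative, not the kernel's actual signatures:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long dma_addr_t;

struct dma_map_ops {
	dma_addr_t (*map_page)(unsigned long long phys);
};

/* The common case: direct mapping, no address translation. */
static dma_addr_t dma_direct_map_page(unsigned long long phys)
{
	return phys;
}

/* An example non-direct implementation reached via an indirect call. */
static dma_addr_t iommu_map_page(unsigned long long phys)
{
	return phys + 0x100000000ULL;	/* pretend IOVA remapping */
}

static const struct dma_map_ops iommu_ops = { .map_page = iommu_map_page };

/* Dispatcher: before the series every call went through ops->map_page;
 * afterwards a NULL ops pointer means "direct", so the dominant case
 * avoids the indirect call (and its retpoline) entirely. */
static dma_addr_t dma_map_page(const struct dma_map_ops *ops,
			       unsigned long long phys)
{
	if (ops == NULL)
		return dma_direct_map_page(phys);
	return ops->map_page(phys);
}
```

With spectre v2 mitigations enabled, the indirect call is the expensive part; the NULL check turns the hot path into a direct call the compiler and branch predictor handle cheaply.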
[PATCH 03/15] dma-direct: improve addressability error reporting
Only report a DMA addressability problem once to avoid spewing the kernel log with repeated messages. Also provide a stack trace to make it easy to find the actual caller that caused the problem. Last but not least move the actual check into the fast path and only leave the error reporting in a helper. Signed-off-by: Christoph Hellwig --- kernel/dma/direct.c | 36 +++- 1 file changed, 15 insertions(+), 21 deletions(-) diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 308f88a750c8..edb24f94ea1e 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -30,27 +30,16 @@ static inline bool force_dma_unencrypted(void) return sev_active(); } -static bool -check_addr(struct device *dev, dma_addr_t dma_addr, size_t size, - const char *caller) +static void report_addr(struct device *dev, dma_addr_t dma_addr, size_t size) { - if (unlikely(dev && !dma_capable(dev, dma_addr, size))) { - if (!dev->dma_mask) { - dev_err(dev, - "%s: call on device without dma_mask\n", - caller); - return false; - } - - if (*dev->dma_mask >= DMA_BIT_MASK(32) || dev->bus_dma_mask) { - dev_err(dev, - "%s: overflow %pad+%zu of device mask %llx bus mask %llx\n", - caller, &dma_addr, size, - *dev->dma_mask, dev->bus_dma_mask); - } - return false; + if (!dev->dma_mask) { + dev_err_once(dev, "DMA map on device without dma_mask\n"); + } else if (*dev->dma_mask >= DMA_BIT_MASK(32) || dev->bus_dma_mask) { + dev_err_once(dev, + "overflow %pad+%zu of DMA mask %llx bus mask %llx\n", + &dma_addr, size, *dev->dma_mask, dev->bus_dma_mask); } - return true; + WARN_ON_ONCE(1); } static inline dma_addr_t phys_to_dma_direct(struct device *dev, @@ -288,8 +277,10 @@ dma_addr_t dma_direct_map_page(struct device *dev, struct page *page, phys_addr_t phys = page_to_phys(page) + offset; dma_addr_t dma_addr = phys_to_dma(dev, phys); - if (!check_addr(dev, dma_addr, size, __func__)) + if (unlikely(dev && !dma_capable(dev, dma_addr, size))) { + report_addr(dev, dma_addr, size); return DMA_MAPPING_ERROR; + }
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) dma_direct_sync_single_for_device(dev, dma_addr, size, dir); @@ -306,8 +297,11 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents, BUG_ON(!sg_page(sg)); sg_dma_address(sg) = phys_to_dma(dev, sg_phys(sg)); - if (!check_addr(dev, sg_dma_address(sg), sg->length, __func__)) + if (unlikely(dev && !dma_capable(dev, sg_dma_address(sg), + sg->length))) { + report_addr(dev, sg_dma_address(sg), sg->length); return 0; + } sg_dma_len(sg) = sg->length; } -- 2.19.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
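The dev_err_once()-style pattern this patch switches to — log the first offending mapping, stay quiet on repeats — can be modelled in userspace with a function-local static flag. This is a sketch only; printed_count exists purely so the behaviour can be observed, and the message format loosely mirrors the patch:

```c
#include <assert.h>
#include <stdio.h>

static int printed_count;	/* illustration only: how often we logged */

/* Userspace model of report_addr(): the first unaddressable mapping is
 * reported, later ones are silently dropped so the log isn't spammed. */
static void report_addr(unsigned long long dma_addr, unsigned long size,
			unsigned long long dma_mask)
{
	static int printed;

	if (printed)
		return;
	printed = 1;
	printed_count++;
	fprintf(stderr, "overflow %llx+%lu of DMA mask %llx\n",
		dma_addr, size, dma_mask);
}
```

In the kernel the once-only semantics come from dev_err_once() and WARN_ON_ONCE() (the latter also providing the stack trace mentioned in the commit message); the static flag here just makes the same idea concrete.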
[PATCH 01/15] swiotlb: remove SWIOTLB_MAP_ERROR
We can use DMA_MAPPING_ERROR instead, which already maps to the same value. Signed-off-by: Christoph Hellwig --- drivers/xen/swiotlb-xen.c | 4 ++-- include/linux/swiotlb.h | 3 --- kernel/dma/swiotlb.c | 4 ++-- 3 files changed, 4 insertions(+), 7 deletions(-) diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index 6dc969d5ea2f..833e80b46eb2 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -403,7 +403,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page, map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, dir, attrs); - if (map == SWIOTLB_MAP_ERROR) + if (map == DMA_MAPPING_ERROR) return DMA_MAPPING_ERROR; dev_addr = xen_phys_to_bus(map); @@ -572,7 +572,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl, sg_phys(sg), sg->length, dir, attrs); - if (map == SWIOTLB_MAP_ERROR) { + if (map == DMA_MAPPING_ERROR) { dev_warn(hwdev, "swiotlb buffer is full\n"); /* Don't panic here, we expect map_sg users to do proper error handling. 
*/ diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index a387b59640a4..14aec0b70dd9 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -46,9 +46,6 @@ enum dma_sync_target { SYNC_FOR_DEVICE = 1, }; -/* define the last possible byte of physical address space as a mapping error */ -#define SWIOTLB_MAP_ERROR (~(phys_addr_t)0x0) - extern phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, dma_addr_t tbl_dma_addr, phys_addr_t phys, size_t size, diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index ff1ce81bb623..19ba8e473d71 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -526,7 +526,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, spin_unlock_irqrestore(&io_tlb_lock, flags); if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit()) dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", size); - return SWIOTLB_MAP_ERROR; + return DMA_MAPPING_ERROR; found: spin_unlock_irqrestore(&io_tlb_lock, flags); @@ -637,7 +637,7 @@ static dma_addr_t swiotlb_bounce_page(struct device *dev, phys_addr_t *phys, /* Oh well, have to allocate and map a bounce buffer. */ *phys = swiotlb_tbl_map_single(dev, __phys_to_dma(dev, io_tlb_start), *phys, size, dir, attrs); - if (*phys == SWIOTLB_MAP_ERROR) + if (*phys == DMA_MAPPING_ERROR) return DMA_MAPPING_ERROR; /* Ensure that the address returned is DMA'ble */ -- 2.19.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 02/15] swiotlb: remove dma_mark_clean
Instead of providing a special dma_mark_clean hook just for ia64, switch ia64 to use the normal arch_sync_dma_for_cpu hooks instead. This means that we now also set the PG_arch_1 bit for pages in the swiotlb buffer, which isn't stricly needed as we will never execute code out of the swiotlb buffer, but otherwise harmless. Signed-off-by: Christoph Hellwig --- arch/ia64/Kconfig | 3 ++- arch/ia64/kernel/dma-mapping.c | 20 +++- arch/ia64/mm/init.c| 18 +++--- drivers/xen/swiotlb-xen.c | 20 +--- include/linux/dma-direct.h | 8 kernel/dma/swiotlb.c | 18 +- 6 files changed, 30 insertions(+), 57 deletions(-) diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index d6f203658994..c587e3316c38 100644 --- a/arch/ia64/Kconfig +++ b/arch/ia64/Kconfig @@ -28,7 +28,8 @@ config IA64 select HAVE_ARCH_TRACEHOOK select HAVE_MEMBLOCK_NODE_MAP select HAVE_VIRT_CPU_ACCOUNTING - select ARCH_HAS_DMA_MARK_CLEAN + select ARCH_HAS_DMA_COHERENT_TO_PFN + select ARCH_HAS_SYNC_DMA_FOR_CPU select VIRT_TO_BUS select ARCH_DISCARD_MEMBLOCK select GENERIC_IRQ_PROBE diff --git a/arch/ia64/kernel/dma-mapping.c b/arch/ia64/kernel/dma-mapping.c index 7a471d8d67d4..36dd6aa6d759 100644 --- a/arch/ia64/kernel/dma-mapping.c +++ b/arch/ia64/kernel/dma-mapping.c @@ -1,5 +1,5 @@ // SPDX-License-Identifier: GPL-2.0 -#include +#include #include #include @@ -16,6 +16,24 @@ const struct dma_map_ops *dma_get_ops(struct device *dev) EXPORT_SYMBOL(dma_get_ops); #ifdef CONFIG_SWIOTLB +void *arch_dma_alloc(struct device *dev, size_t size, + dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs) +{ + return dma_direct_alloc_pages(dev, size, dma_handle, gfp, attrs); +} + +void arch_dma_free(struct device *dev, size_t size, void *cpu_addr, + dma_addr_t dma_addr, unsigned long attrs) +{ + dma_direct_free_pages(dev, size, cpu_addr, dma_addr, attrs); +} + +long arch_dma_coherent_to_pfn(struct device *dev, void *cpu_addr, + dma_addr_t dma_addr) +{ + return page_to_pfn(virt_to_page(cpu_addr)); +} + void __init 
swiotlb_dma_init(void) { dma_ops = &swiotlb_dma_ops; diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c index d5e12ff1d73c..2c51733f1dfd 100644 --- a/arch/ia64/mm/init.c +++ b/arch/ia64/mm/init.c @@ -71,18 +71,14 @@ __ia64_sync_icache_dcache (pte_t pte) * DMA can be marked as "clean" so that lazy_mmu_prot_update() doesn't have to * flush them when they get mapped into an executable vm-area. */ -void -dma_mark_clean(void *addr, size_t size) +void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr, + size_t size, enum dma_data_direction dir) { - unsigned long pg_addr, end; - - pg_addr = PAGE_ALIGN((unsigned long) addr); - end = (unsigned long) addr + size; - while (pg_addr + PAGE_SIZE <= end) { - struct page *page = virt_to_page(pg_addr); - set_bit(PG_arch_1, &page->flags); - pg_addr += PAGE_SIZE; - } + unsigned long pfn = __phys_to_pfn(paddr); + + do { + set_bit(PG_arch_1, &pfn_to_page(pfn)->flags); + } while (++pfn <= __phys_to_pfn(paddr + size - 1)); } inline void diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index 833e80b46eb2..989cf872b98c 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -441,21 +441,8 @@ static void xen_unmap_single(struct device *hwdev, dma_addr_t dev_addr, xen_dma_unmap_page(hwdev, dev_addr, size, dir, attrs); /* NOTE: We use dev_addr here, not paddr! */ - if (is_xen_swiotlb_buffer(dev_addr)) { + if (is_xen_swiotlb_buffer(dev_addr)) swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs); - return; - } - - if (dir != DMA_FROM_DEVICE) - return; - - /* -* phys_to_virt doesn't work with hihgmem page but we could -* call dma_mark_clean() with hihgmem page here. However, we -* are fine since dma_mark_clean() is null on POWERPC. We can -* make dma_mark_clean() take a physical address if necessary. 
-*/ - dma_mark_clean(phys_to_virt(paddr), size); } static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr, @@ -493,11 +480,6 @@ xen_swiotlb_sync_single(struct device *hwdev, dma_addr_t dev_addr, if (target == SYNC_FOR_DEVICE) xen_dma_sync_single_for_device(hwdev, dev_addr, size, dir); - - if (dir != DMA_FROM_DEVICE) - return; - - dma_mark_clean(phys_to_virt(paddr), size); } void diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h index 6e5a47ae7d64..1aa73f4907ae 100644 --- a/include/linux/dma-direct.h +++ b/include/linux/dma-direct.h @@ -48,14 +48,6 @@ static inline
Re: [virtio-dev] Re: [PATCH v5 5/7] iommu: Add virtio-iommu driver
Sorry for the delay, I wanted to do a little more performance analysis before continuing. On 27/11/2018 18:10, Michael S. Tsirkin wrote: > On Tue, Nov 27, 2018 at 05:55:20PM +, Jean-Philippe Brucker wrote: + if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1) || + !virtio_has_feature(vdev, VIRTIO_IOMMU_F_MAP_UNMAP)) >>> >>> Why bother with a feature bit for this then btw? >> >> We'll need a new feature bit for sharing page tables with the hardware, >> because they require different requests (attach_table/invalidate instead >> of map/unmap.) A future device supporting page table sharing won't >> necessarily need to support map/unmap. >> > I don't see virtio iommu being extended to support ARM specific > requests. This just won't scale, too many different > descriptor formats out there. They aren't really ARM specific requests. The two new requests are ATTACH_TABLE and INVALIDATE, which would be used by x86 IOMMUs as well. Sharing CPU address space with the HW IOMMU (SVM) has been in the scope of virtio-iommu since the first RFC, and I've been working with that extension in mind since the beginning. As an example you can have a look at my current draft for this [1], which is inspired from the VFIO work we've been doing with Intel. The negotiation phase inevitably requires vendor-specific fields in the descriptors - host tells which formats are supported, guest chooses a format and attaches page tables. But invalidation and fault reporting descriptors are fairly generic. > If you want to go that way down the road, you should avoid > virtio iommu, instead emulate and share code with the ARM SMMU (probably > with a different vendor id so you can implement the > report on map for devices without PRI). vSMMU has to stay in userspace though. The main reason we're proposing virtio-iommu is that emulating every possible vIOMMU model in the kernel would be unmaintainable. 
With virtio-iommu we can process the fast path in the host kernel, through vhost-iommu, and do the heavy lifting in userspace. As said above, I'm trying to keep the fast path for virtio-iommu generic. More notes on what I consider to be the fast path, and comparison with vSMMU:

(1) The primary use-case we have in mind for vIOMMU is something like DPDK in the guest, assigning a hardware device to guest userspace. DPDK maps a large amount of memory statically, to be used by a pass-through device. For this case I don't think we care about vIOMMU performance. Setup and teardown need to be reasonably fast, sure, but the MAP/UNMAP requests don't have to be optimal.

(2) If the assigned device is owned by the guest kernel, then mappings are dynamic and require dma_map/unmap() to be fast, but there generally is no need for a vIOMMU, since device and drivers are trusted by the guest kernel. Even when the user does enable a vIOMMU for this case (allowing to over-commit guest memory, which needs to be pinned otherwise), we generally play tricks like lazy TLBI (non-strict mode) to make it faster. Here device and drivers are trusted, therefore the vulnerability window of lazy mode isn't a concern. If the reason to enable the vIOMMU is over-committing guest memory however, you can't use nested translation because it requires pinning the second-level tables. For this case performance matters a bit, because your invalidate-on-map needs to be fast, even if you enable lazy mode and only receive inval-on-unmap every 10ms. It won't ever be as fast as nested translation, though. For this case I think vSMMU+Caching Mode and userspace virtio-iommu with MAP/UNMAP would perform similarly (given page-sized payloads), because the pagetable walk doesn't add a lot of overhead compared to the context switch. But given the results below, vhost-iommu would be faster than vSMMU+CM.

(3) Then there is SVM.
For SVM, any destructive change to the process address space requires a synchronous invalidation command to the hardware (at least when using PCI ATS). Given that SVM is based on page faults, fault reporting from host to guest also needs to be fast, as well as fault response from guest to host. I think this is where performance matters the most. To get a feel of the advantage we get with virtio-iommu, I compared the vSMMU page-table sharing implementation [2] and vhost-iommu + VFIO with page table sharing (based on Tomasz Nowicki's vhost-iommu prototype). That's on a ThunderX2 with a 10Gb NIC assigned to the guest kernel, which corresponds to case (2) above, with nesting page tables and without the lazy mode. The host's only job is forwarding invalidation to the HW SMMU. vhost-iommu performed on average 1.8x and 5.5x better than vSMMU on netperf TCP_STREAM and TCP_MAERTS respectively (~200 samples). I think this can be further optimized (that was still polling under the vq lock), and unlike vSMMU, virtio-iommu offers the possibility of multi-queue for improved scalability. In addition, the guest will need to send both TLB and ATC invalidations with v
Re: use generic DMA mapping code in powerpc V4
Next step: 13c1fdec5682b6e13257277fa16aa31f342d167d (powerpc/dma: move pci_dma_dev_setup_swiotlb to fsl_pci.c) git checkout 13c1fdec5682b6e13257277fa16aa31f342d167d Result: The PASEMI onboard ethernet works and the X5000 boots. — Christian Sent from my iPhone > On 7. Dec 2018, at 14:45, Christian Zigotzky wrote: > >> On 06 December 2018 at 11:55AM, Christian Zigotzky wrote: >>> On 05 December 2018 at 3:05PM, Christoph Hellwig wrote: >>> >>> Thanks. Can you try a few stepping points in the tree? >>> >>> First just with commit 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6 >>> (the first one) applied? >>> >>> Second with all commits up to 5da11e49df21f21dac25a2491aa788307bdacb6b >>> >>> And if that still works with commits up to >>> c1bfcad4b0cf38ce5b00f7ad880d3a13484c123a >>> >> Hi Christoph, >> >> I undid the commit 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6 with the >> following command: >> >> git checkout 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6 >> >> Result: PASEMI onboard ethernet works again and the P5020 board boots. >> >> I will test the other commits in the next days. >> >> @All >> It is really important, that you also test Christoph's work on your PASEMI >> and NXP boards. Could you please help us with solving the issues? >> >> 'git clone git://git.infradead.org/users/hch/misc.git -b powerpc-dma.5 a' >> >> Thanks, >> Christian >> >> > Today I tested the commit 5da11e49df21f21dac25a2491aa788307bdacb6b. > > git checkout 5da11e49df21f21dac25a2491aa788307bdacb6b > > The PASEMI onboard ethernet works and the P5020 board boots. > > -- Christian > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()
On 30/11/2018 11:14, John Garry wrote: From: Ganapatrao Kulkarni Hi Joerg, A friendly reminder. Can you please let me know your position on this patch? Cheers, John Change function __iommu_dma_alloc_pages() to allocate pages for DMA from the respective device's NUMA node. The ternary operator which would be needed for alloc_pages_node() is tidied along with this. The motivation for this change is to have a policy for page allocation consistent with direct DMA mapping, which attempts to allocate pages local to the device, as mentioned in [1]. In addition, a marginal performance improvement has been observed for certain workloads. With the patch, a 0.9% average throughput improvement was observed when running tcrypt with the HiSilicon crypto engine. We also include a modification to use kvzalloc() for the kzalloc()/vzalloc() combination. [1] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1692998.html Signed-off-by: Ganapatrao Kulkarni [JPG: Added kvzalloc(), drop pages ** being device local, remove ternary operator, update message] Signed-off-by: John Garry --- Difference: v1->v2: - Add Ganapatrao's tag and change author v2->v3: - removed ternary operator - stopped making pages ** allocation local to device v3->v4: - Update commit message to include motivation for patch, including headline performance improvement for test. Some notes: This patch was originally posted by Ganapatrao in [2]. However, after initial review, it was never reposted (due to lack of cycles, I think). In addition, the functionality in its sibling patches was merged through patches, as mentioned in [2]; this also refers to a discussion on device local allocations vs CPU local allocations for DMA pool, and which is better [1]. However, as mentioned in [1], dma_alloc_coherent() uses the locality information from the device - as in direct DMA - so this patch is just applying this same policy. And some testing, mentioned in the commit message, shows a marginal improvement.
[2] https://lore.kernel.org/patchwork/patch/833004/
[3] https://lkml.org/lkml/2018/8/22/391

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d1b0475..4afb1a8 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -449,20 +449,17 @@ static void __iommu_dma_free_pages(struct page **pages, int count)
 	kvfree(pages);
 }
 
-static struct page **__iommu_dma_alloc_pages(unsigned int count,
-		unsigned long order_mask, gfp_t gfp)
+static struct page **__iommu_dma_alloc_pages(struct device *dev,
+		unsigned int count, unsigned long order_mask, gfp_t gfp)
 {
 	struct page **pages;
-	unsigned int i = 0, array_size = count * sizeof(*pages);
+	unsigned int i = 0, nid = dev_to_node(dev);
 
 	order_mask &= (2U << MAX_ORDER) - 1;
 	if (!order_mask)
 		return NULL;
 
-	if (array_size <= PAGE_SIZE)
-		pages = kzalloc(array_size, GFP_KERNEL);
-	else
-		pages = vzalloc(array_size);
+	pages = kvzalloc(count * sizeof(*pages), GFP_KERNEL);
 	if (!pages)
 		return NULL;
 
@@ -481,10 +478,12 @@ static struct page **__iommu_dma_alloc_pages(unsigned int count,
 		for (order_mask &= (2U << __fls(count)) - 1;
 		     order_mask; order_mask &= ~order_size) {
 			unsigned int order = __fls(order_mask);
+			gfp_t alloc_flags = gfp;
 
 			order_size = 1U << order;
-			page = alloc_pages((order_mask - order_size) ?
-					   gfp | __GFP_NORETRY : gfp, order);
+			if (order_mask > order_size)
+				alloc_flags |= __GFP_NORETRY;
+			page = alloc_pages_node(nid, alloc_flags, order);
 			if (!page)
 				continue;
 			if (!order)
@@ -569,7 +568,8 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
 
 	alloc_sizes = min_size;
 	count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-	pages = __iommu_dma_alloc_pages(count, alloc_sizes >> PAGE_SHIFT, gfp);
+	pages = __iommu_dma_alloc_pages(dev, count, alloc_sizes >> PAGE_SHIFT,
+					gfp);
 	if (!pages)
 		return NULL;

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
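The high-order splitting loop touched by the hunk above can be tricky to follow. Below is a hypothetical userspace model of it (not kernel code; `fls_u32()` stands in for the kernel's `__fls()`, and the retry-on-allocation-failure path is omitted), showing how a page count is greedily split into the largest permitted power-of-two chunks:

```c
#include <assert.h>

/* Index of the highest set bit, like the kernel's __fls(). */
static unsigned int fls_u32(unsigned int x)
{
	unsigned int r = 0;

	while (x >>= 1)
		r++;
	return r;
}

/*
 * Hypothetical model of the splitting logic in __iommu_dma_alloc_pages():
 * repeatedly take the largest order allowed by both order_mask and the
 * remaining count. Returns the number of chunks emitted, or -1 if the
 * count cannot be covered with the permitted orders.
 */
static int split_into_orders(unsigned int count, unsigned int order_mask,
			     unsigned int *orders, int max_orders)
{
	int n = 0;

	while (count) {
		/* Clamp to orders no bigger than the remaining count,
		 * mirroring "order_mask &= (2U << __fls(count)) - 1". */
		unsigned int mask = order_mask & ((2U << fls_u32(count)) - 1);
		unsigned int order;

		if (!mask || n >= max_orders)
			return -1;
		order = fls_u32(mask);	/* largest permitted order */
		orders[n++] = order;
		count -= 1U << order;
	}
	return n;
}
```

For example, 12 pages with orders 0-3 permitted split into one order-3 chunk (8 pages) plus one order-2 chunk (4 pages).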
Re: [PATCH 0/2] Refactor dummy DMA ops
On 07/12/2018 17:05, Christoph Hellwig wrote:
> So I'd really prefer if we had a separate dummy.c file, like in my take on your previous patch here:
>
> http://git.infradead.org/users/hch/misc.git/commitdiff/e01adddc1733fa414dc16cd22e8f58be9b64a025
> http://git.infradead.org/users/hch/misc.git/commitdiff/596bde76e5944a3f4beb8c2769067ca88dda127a
>
> Otherwise this looks fine. If you don't mind I'll take your patches, apply the move to a separate file and merge it into the above tree.

Sure - TBH I did consider creating a separate file, but then didn't for mysterious reasons that don't stand up to scrutiny. If you're happy to do the fixup on top, that's fine by me (if culling .map_resource was intentional, please scrub the last bit of the commit message to match). I'll make the equivalent change locally anyway just in case there's any other cause to resend.

Robin.
Re: [PATCH v3 1/9] mm: Introduce new vm_insert_range API
On Fri, Dec 07, 2018 at 03:34:56PM +, Robin Murphy wrote:
> > +int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
> > +		struct page **pages, unsigned long page_count)
> > +{
> > +	unsigned long uaddr = addr;
> > +	int ret = 0, i;
>
> Some of the sites being replaced were effectively ensuring that vma and pages were mutually compatible as an initial condition - would it be worth adding something here for robustness, e.g.:
>
> +	if (page_count != vma_pages(vma))
> +		return -ENXIO;

I think we want to allow this to be used to populate part of a VMA. So perhaps:

	if (page_count > vma_pages(vma))
		return -ENXIO;
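To make the difference between the two proposals concrete, here is a hypothetical userspace model (plain integers instead of a real `struct vm_area_struct`): the strict check rejects anything but an exact fit, while the lenient one also permits populating just part of the VMA:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of Robin's strict check: the page array must
 * exactly fill the VMA. vma_pages is passed in directly rather than
 * derived from a VMA. */
static int check_strict(unsigned long page_count, unsigned long vma_pages)
{
	return page_count != vma_pages ? -ENXIO : 0;
}

/* Hypothetical model of Matthew's lenient check: partial population
 * is allowed, but never more pages than the VMA can hold. */
static int check_lenient(unsigned long page_count, unsigned long vma_pages)
{
	return page_count > vma_pages ? -ENXIO : 0;
}
```

A caller mapping 2 pages into a 4-page VMA passes the lenient check but fails the strict one; both reject 5 pages into a 4-page VMA.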
Re: [PATCH 0/2] Refactor dummy DMA ops
So I'd really prefer if we had a separate dummy.c file, like in my take on your previous patch here:

http://git.infradead.org/users/hch/misc.git/commitdiff/e01adddc1733fa414dc16cd22e8f58be9b64a025
http://git.infradead.org/users/hch/misc.git/commitdiff/596bde76e5944a3f4beb8c2769067ca88dda127a

Otherwise this looks fine. If you don't mind I'll take your patches, apply the move to a separate file and merge it into the above tree.
[PATCH 0/2] Refactor dummy DMA ops
Hi all,

Tangential to Christoph's RFC for mitigating indirect call overhead in common DMA mapping scenarios[1], this is a little reshuffle to prevent the CONFIG_ACPI_CCA_REQUIRED case from getting in the way. This would best go via the dma-mapping tree, so reviews and acks welcome.

Robin.

[1] https://lore.kernel.org/lkml/20181206153720.10702-1-...@lst.de/

Robin Murphy (2):
  dma-mapping: Factor out dummy DMA ops
  ACPI / scan: Refactor _CCA enforcement

 arch/arm64/include/asm/dma-mapping.h |  4 +-
 arch/arm64/mm/dma-mapping.c          | 92 
 drivers/acpi/scan.c                  |  5 ++
 drivers/base/platform.c              |  3 +-
 drivers/pci/pci-driver.c             |  3 +-
 include/linux/dma-mapping.h          |  1 +
 kernel/dma/mapping.c                 | 54 
 7 files changed, 63 insertions(+), 99 deletions(-)

-- 
2.19.1.dirty
[PATCH 1/2] dma-mapping: Factor out dummy DMA ops
The dummy DMA ops are currently used by arm64 for any device which has an invalid ACPI description and is thus barred from using DMA due to not knowing whether it is cache-coherent or not. Factor these out into general dma-mapping code so that they can be referenced from other common code paths. In the process, we can prune all the optional callbacks which just do the same thing as the default behaviour, and fill in .map_resource for completeness.

CC: Catalin Marinas
CC: Will Deacon
Signed-off-by: Robin Murphy
---
 arch/arm64/include/asm/dma-mapping.h |  4 +-
 arch/arm64/mm/dma-mapping.c          | 92 
 include/linux/dma-mapping.h          |  1 +
 kernel/dma/mapping.c                 | 54 
 4 files changed, 56 insertions(+), 95 deletions(-)

diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
index c41f3fb1446c..273e778f7de2 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -24,15 +24,13 @@
 #include
 #include
 
-extern const struct dma_map_ops dummy_dma_ops;
-
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
 	/*
 	 * We expect no ISA devices, and all other DMA masters are expected to
 	 * have someone call arch_setup_dma_ops at device creation time.
 	 */
-	return &dummy_dma_ops;
+	return &dma_dummy_ops;
 }
 
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index a3ac26284845..f3af6cc52fad 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -252,98 +252,6 @@ static int __init atomic_pool_init(void)
 	return -ENOMEM;
 }
 
-/*
- * The following APIs are for dummy DMA ops *
- */
-
-static void *__dummy_alloc(struct device *dev, size_t size,
-			   dma_addr_t *dma_handle, gfp_t flags,
-			   unsigned long attrs)
-{
-	return NULL;
-}
-
-static void __dummy_free(struct device *dev, size_t size,
-			 void *vaddr, dma_addr_t dma_handle,
-			 unsigned long attrs)
-{
-}
-
-static int __dummy_mmap(struct device *dev,
-			struct vm_area_struct *vma,
-			void *cpu_addr, dma_addr_t dma_addr, size_t size,
-			unsigned long attrs)
-{
-	return -ENXIO;
-}
-
-static dma_addr_t __dummy_map_page(struct device *dev, struct page *page,
-				   unsigned long offset, size_t size,
-				   enum dma_data_direction dir,
-				   unsigned long attrs)
-{
-	return 0;
-}
-
-static void __dummy_unmap_page(struct device *dev, dma_addr_t dev_addr,
-			       size_t size, enum dma_data_direction dir,
-			       unsigned long attrs)
-{
-}
-
-static int __dummy_map_sg(struct device *dev, struct scatterlist *sgl,
-			  int nelems, enum dma_data_direction dir,
-			  unsigned long attrs)
-{
-	return 0;
-}
-
-static void __dummy_unmap_sg(struct device *dev,
-			     struct scatterlist *sgl, int nelems,
-			     enum dma_data_direction dir,
-			     unsigned long attrs)
-{
-}
-
-static void __dummy_sync_single(struct device *dev,
-				dma_addr_t dev_addr, size_t size,
-				enum dma_data_direction dir)
-{
-}
-
-static void __dummy_sync_sg(struct device *dev,
-			    struct scatterlist *sgl, int nelems,
-			    enum dma_data_direction dir)
-{
-}
-
-static int __dummy_mapping_error(struct device *hwdev, dma_addr_t dma_addr)
-{
-	return 1;
-}
-
-static int __dummy_dma_supported(struct device *hwdev, u64 mask)
-{
-	return 0;
-}
-
-const struct dma_map_ops dummy_dma_ops = {
-	.alloc			= __dummy_alloc,
-	.free			= __dummy_free,
-	.mmap			= __dummy_mmap,
-	.map_page		= __dummy_map_page,
-	.unmap_page		= __dummy_unmap_page,
-	.map_sg			= __dummy_map_sg,
-	.unmap_sg		= __dummy_unmap_sg,
-	.sync_single_for_cpu	= __dummy_sync_single,
-	.sync_single_for_device	= __dummy_sync_single,
-	.sync_sg_for_cpu	= __dummy_sync_sg,
-	.sync_sg_for_device	= __dummy_sync_sg,
-	.mapping_error		= __dummy_mapping_error,
-	.dma_supported		= __dummy_dma_supported,
-};
-EXPORT_SYMBOL(dummy_dma_ops);
-
 static int __init arm64_dma_init(void)
 {
 	WARN_TAINT(ARCH_DMA_MINALIGN < cache_line_size(),
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 15bd41447025..3a1
[PATCH 2/2] ACPI / scan: Refactor _CCA enforcement
Rather than checking the DMA attribute at each callsite, just pass it through for acpi_dma_configure() to handle directly. That can then deal with the relatively exceptional DEV_DMA_NOT_SUPPORTED case by explicitly installing dummy DMA ops instead of just skipping setup entirely. This will then free up the dev->dma_ops == NULL case for some valuable fastpath optimisations.

CC: Rafael J. Wysocki
CC: Len Brown
CC: Greg Kroah-Hartman
CC: Bjorn Helgaas
Signed-off-by: Robin Murphy
---
 drivers/acpi/scan.c      | 5 +
 drivers/base/platform.c  | 3 +--
 drivers/pci/pci-driver.c | 3 +--
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index bd1c59fb0e17..b75ae34ed188 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1456,6 +1456,11 @@ int acpi_dma_configure(struct device *dev, enum dev_dma_attr attr)
 	const struct iommu_ops *iommu;
 	u64 dma_addr = 0, size = 0;
 
+	if (attr == DEV_DMA_NOT_SUPPORTED) {
+		set_dma_ops(dev, &dma_dummy_ops);
+		return 0;
+	}
+
 	iort_dma_setup(dev, &dma_addr, &size);
 
 	iommu = iort_iommu_configure(dev);
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 41b91af95afb..c6daca875c17 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1138,8 +1138,7 @@ int platform_dma_configure(struct device *dev)
 		ret = of_dma_configure(dev, dev->of_node, true);
 	} else if (has_acpi_companion(dev)) {
 		attr = acpi_get_dma_attr(to_acpi_device_node(dev->fwnode));
-		if (attr != DEV_DMA_NOT_SUPPORTED)
-			ret = acpi_dma_configure(dev, attr);
+		ret = acpi_dma_configure(dev, attr);
 	}
 
 	return ret;
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index bef17c3fca67..f899a28b90f8 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1602,8 +1602,7 @@ static int pci_dma_configure(struct device *dev)
 		struct acpi_device *adev = to_acpi_device_node(bridge->fwnode);
 		enum dev_dma_attr attr = acpi_get_dma_attr(adev);
 
-		if (attr != DEV_DMA_NOT_SUPPORTED)
-			ret = acpi_dma_configure(dev, attr);
+		ret = acpi_dma_configure(dev, attr);
 	}
 
 	pci_put_host_bridge_device(bridge);
-- 
2.19.1.dirty
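The shape of the refactor can be modelled in userspace: instead of every caller filtering out DEV_DMA_NOT_SUPPORTED, the configure routine handles it itself by installing dummy ops. The names below are illustrative stand-ins, not the real ACPI/DMA API:

```c
#include <assert.h>

/* Illustrative stand-ins for the real kernel types. */
enum dev_dma_attr {
	DEV_DMA_NOT_SUPPORTED,
	DEV_DMA_NON_COHERENT,
	DEV_DMA_COHERENT,
};

enum ops_kind { OPS_NONE, OPS_DUMMY, OPS_REAL };

struct fake_device {
	enum ops_kind ops;
};

/*
 * Model of the refactored acpi_dma_configure(): the NOT_SUPPORTED case
 * installs dummy ops (so dev->dma_ops is never left NULL for such
 * devices) rather than being filtered out by every caller.
 */
static int model_dma_configure(struct fake_device *dev, enum dev_dma_attr attr)
{
	if (attr == DEV_DMA_NOT_SUPPORTED) {
		dev->ops = OPS_DUMMY;
		return 0;
	}
	dev->ops = OPS_REAL;	/* stands in for the iort/iommu setup path */
	return 0;
}
```

With this shape, callers such as the platform and PCI configure paths can call the function unconditionally, which is exactly what the two hunks above do.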
Re: [RFC] avoid indirect calls for DMA direct mappings
On Fri, 7 Dec 2018 16:44:35 +0100 Jesper Dangaard Brouer wrote:

> On Fri, 7 Dec 2018 02:21:42 +0100 Christoph Hellwig wrote:
>
> > On Thu, Dec 06, 2018 at 08:24:38PM +, Robin Murphy wrote:
> > > On 06/12/2018 20:00, Christoph Hellwig wrote:
> > >> On Thu, Dec 06, 2018 at 06:54:17PM +, Robin Murphy wrote:
> > >>> I'm pretty sure we used to assign dummy_dma_ops explicitly to devices at the point we detected the ACPI properties are wrong - that shouldn't be too much of a headache to go back to.
> > >>
> > >> Ok. I've cooked up a patch to use NULL as the go direct marker. This cleans up a few things nicely, but also means we now need to do the bypass scheme for all ops, not just the fast path. But we probably should just move the slow path ops out of line anyway, so I'm not worried about it. This has survived some very basic testing on x86, and really needs to be cleaned up and split into multiple patches..
> > >
> > > I've also just finished hacking something up to keep the arm64 status quo - I'll need to actually test it tomorrow, but the overall diff looks like the below.
> >
> > Nice. I created a branch that picked up your bits and also the ideas from Linus, and the result looks really nice. I'll still need a signoff for your bits, though.
> >
> > Jesper, can you give this a spin if it changes the number even further?
> >
> > git://git.infradead.org/users/hch/misc.git dma-direct-calls.2
> > http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-direct-calls.2
>
> I'll test it soon...
>
> I looked at my perf stat recording on my existing tests[1] and there seems to be significantly more I-cache usage.

The I-cache pressure seems to be lower with the new branch, and performance improved by 4.5 nanoseconds per packet.
> Copy-paste from my summary[1]:
> [1] https://github.com/xdp-project/xdp-project/blob/master/areas/dma/dma01_test_hellwig_direct_dma.org#summary-of-results

Updated:

* Summary of results

Using XDP_REDIRECT between drivers, RX ixgbe(10G) redirect TX i40e(40G), via BPF devmap (used samples/bpf/xdp_redirect_map). (Note: choose a higher TX link-speed to ensure that we don't have a TX bottleneck.)

The baseline kernel is at commit [[https://git.kernel.org/torvalds/c/ef78e5ec9214][ef78e5ec9214]], which is the commit just before Hellwig's changes in this tree.

Performance numbers in packets/sec (XDP_REDIRECT ixgbe -> i40e):
- 11913154 (11,913,154) pps - baseline compiled without retpoline
- 7438283 (7,438,283) pps - regression due to CONFIG_RETPOLINE
- 9610088 (9,610,088) pps - mitigation via Hellwig dma-direct-calls
- 10049223 (10,049,223) pps - Hellwig branch dma-direct-calls.2

Do notice that at these extreme speeds the pps numbers change rapidly with small per-packet differences, e.g. the difference to the new branch is:
- (1/9610088 - 1/10049223)*10^9 = 4.54 nanosec faster

From the instructions per cycle, it is clear that retpolines are stalling the CPU pipeline:

| pps        | insn per cycle |
|------------+----------------|
| 11,913,154 |           2.39 |
|  7,438,283 |           1.54 |
|  9,610,088 |           2.04 |
| 10,049,223 |           1.99 |

Strangely the Instruction-Cache is also under heavier pressure:

| pps        | l2_rqsts.all_code_rd | l2_rqsts.code_rd_hit | l2_rqsts.code_rd_miss |
|------------+----------------------+----------------------+-----------------------|
| 11,913,154 |              874,547 |              742,335 |               132,198 |
|  7,438,283 |              649,513 |              547,581 |               101,945 |
|  9,610,088 |            2,568,064 |            2,001,369 |               566,683 |
| 10,049,223 |            1,232,818 |            1,152,514 |                80,299 |

-- 
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
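The per-packet latency figure quoted above follows directly from the two throughput numbers. A quick check of the arithmetic, as a small helper converting a pair of packets-per-second rates into a per-packet time difference:

```c
#include <assert.h>
#include <math.h>

/*
 * Per-packet latency difference, in nanoseconds, between two
 * packets-per-second throughput figures: (1/pps_before - 1/pps_after)
 * scaled to ns. Positive means pps_after is faster per packet.
 */
static double pps_delta_ns(double pps_before, double pps_after)
{
	return (1.0 / pps_before - 1.0 / pps_after) * 1e9;
}
```

Plugging in the numbers from the summary, 9,610,088 pps is about 104.06 ns/packet and 10,049,223 pps about 99.51 ns/packet, so the new branch is roughly 4.5 ns faster per packet, matching the figure in the email.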
Re: [RFC] avoid indirect calls for DMA direct mappings
On Fri, 7 Dec 2018 02:21:42 +0100 Christoph Hellwig wrote:

> On Thu, Dec 06, 2018 at 08:24:38PM +, Robin Murphy wrote:
> > On 06/12/2018 20:00, Christoph Hellwig wrote:
> >> On Thu, Dec 06, 2018 at 06:54:17PM +, Robin Murphy wrote:
> >>> I'm pretty sure we used to assign dummy_dma_ops explicitly to devices at the point we detected the ACPI properties are wrong - that shouldn't be too much of a headache to go back to.
> >>
> >> Ok. I've cooked up a patch to use NULL as the go direct marker. This cleans up a few things nicely, but also means we now need to do the bypass scheme for all ops, not just the fast path. But we probably should just move the slow path ops out of line anyway, so I'm not worried about it. This has survived some very basic testing on x86, and really needs to be cleaned up and split into multiple patches..
> >
> > I've also just finished hacking something up to keep the arm64 status quo - I'll need to actually test it tomorrow, but the overall diff looks like the below.
>
> Nice. I created a branch that picked up your bits and also the ideas from Linus, and the result looks really nice. I'll still need a signoff for your bits, though.
>
> Jesper, can you give this a spin if it changes the number even further?
>
> git://git.infradead.org/users/hch/misc.git dma-direct-calls.2
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-direct-calls.2

I'll test it soon...

I looked at my perf stat recording on my existing tests[1] and there seems to be significantly more I-cache usage.

Copy-paste from my summary[1]:
[1] https://github.com/xdp-project/xdp-project/blob/master/areas/dma/dma01_test_hellwig_direct_dma.org#summary-of-results

* Summary of results

Using XDP_REDIRECT between drivers, RX ixgbe(10G) redirect TX i40e(40G), via BPF devmap (used samples/bpf/xdp_redirect_map). (Note: choose a higher TX link-speed to ensure that we don't have a TX bottleneck.)
The baseline kernel is at commit https://git.kernel.org/torvalds/c/ef78e5ec9214, which is the commit just before Hellwig's changes in this tree.

Performance numbers in packets/sec (XDP_REDIRECT ixgbe -> i40e):
- 11913154 (11,913,154) pps - baseline compiled without retpoline
- 7438283 (7,438,283) pps - regression due to CONFIG_RETPOLINE
- 9610088 (9,610,088) pps - mitigation via Hellwig dma-direct-calls

From the instructions per cycle, it is clear that retpolines are stalling the CPU pipeline:

| pps        | insn per cycle |
|------------+----------------|
| 11,913,154 |           2.39 |
|  7,438,283 |           1.54 |
|  9,610,088 |           2.04 |

Strangely the Instruction-Cache is also under heavier pressure:

| pps        | l2_rqsts.all_code_rd | l2_rqsts.code_rd_hit | l2_rqsts.code_rd_miss |
|------------+----------------------+----------------------+-----------------------|
| 11,913,154 |              874,547 |              742,335 |               132,198 |
|  7,438,283 |              649,513 |              547,581 |               101,945 |
|  9,610,088 |            2,568,064 |            2,001,369 |               566,683 |

-- 
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
dma_declare_coherent_memory on main memory
Hi all,

the ARM imx27/31 ports and various sh boards use dma_declare_coherent_memory() on main memory taken from the memblock allocator. Is there any good reason these couldn't be switched to CMA areas? Getting rid of these magic dma_declare_coherent_memory() areas would help make the dma allocator a lot simpler.
Re: [PATCH v3 1/9] mm: Introduce new vm_insert_range API
On 06/12/2018 18:39, Souptick Joarder wrote:

Previously drivers have had their own way of mapping a range of kernel pages/memory into a user vma, and this was done by invoking vm_insert_page() within a loop.

As this pattern is common across different drivers, it can be generalized by creating a new function and using it across the drivers.

vm_insert_range is the new API which will be used to map a range of kernel memory/pages to a user vma.

This API is tested by Heiko for the Rockchip drm driver, on rk3188, rk3288, rk3328 and rk3399 with graphics.

Signed-off-by: Souptick Joarder
Reviewed-by: Matthew Wilcox
Reviewed-by: Mike Rapoport
Tested-by: Heiko Stuebner
---
 include/linux/mm.h |  2 ++
 mm/memory.c        | 38 ++
 mm/nommu.c         |  7 +++
 3 files changed, 47 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index fcf9cc9..2bc399f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2506,6 +2506,8 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
 int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
 			unsigned long pfn, unsigned long size, pgprot_t);
 int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
+int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
+			struct page **pages, unsigned long page_count);
 vm_fault_t vmf_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 			unsigned long pfn);
 vm_fault_t vmf_insert_pfn_prot(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/memory.c b/mm/memory.c
index 15c417e..84ea46c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1478,6 +1478,44 @@ static int insert_page(struct vm_area_struct *vma, unsigned long addr,
 }
 
 /**
+ * vm_insert_range - insert range of kernel pages into user vma
+ * @vma: user vma to map to
+ * @addr: target user address of this page
+ * @pages: pointer to array of source kernel pages
+ * @page_count: number of pages need to insert into user vma
+ *
+ * This allows drivers to insert range of kernel pages they've allocated
+ * into a user vma. This is a generic function which drivers can use
+ * rather than using their own way of mapping range of kernel pages into
+ * user vma.
+ *
+ * If we fail to insert any page into the vma, the function will return
+ * immediately leaving any previously-inserted pages present. Callers
+ * from the mmap handler may immediately return the error as their caller
+ * will destroy the vma, removing any successfully-inserted pages. Other
+ * callers should make their own arrangements for calling unmap_region().
+ *
+ * Context: Process context. Called by mmap handlers.
+ * Return: 0 on success and error code otherwise
+ */
+int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
+			struct page **pages, unsigned long page_count)
+{
+	unsigned long uaddr = addr;
+	int ret = 0, i;

Some of the sites being replaced were effectively ensuring that vma and pages were mutually compatible as an initial condition - would it be worth adding something here for robustness, e.g.:

+	if (page_count != vma_pages(vma))
+		return -ENXIO;

? (then you could also clean up a couple more places where you're not already removing such checks)

Robin.

+
+	for (i = 0; i < page_count; i++) {
+		ret = vm_insert_page(vma, uaddr, pages[i]);
+		if (ret < 0)
+			return ret;
+		uaddr += PAGE_SIZE;
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(vm_insert_range);
+
+/**
 * vm_insert_page - insert single page into user vma
 * @vma: user vma to map to
 * @addr: target user address of this page
diff --git a/mm/nommu.c b/mm/nommu.c
index 749276b..d6ef5c7 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -473,6 +473,13 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
 }
 EXPORT_SYMBOL(vm_insert_page);
 
+int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
+			struct page **pages, unsigned long page_count)
+{
+	return -EINVAL;
+}
+EXPORT_SYMBOL(vm_insert_range);
+
 /*
 * sys_brk() for the most part doesn't need the global kernel
 * lock, except when an application is doing something nasty
Re: [PATCH 01/34] powerpc: use mm zones more sensibly
I will work at the weekend to figure out where the problematic commit is.

— Christian

Sent from my iPhone

> On 7. Dec 2018, at 15:09, Christoph Hellwig wrote:
>
>> On Fri, Dec 07, 2018 at 11:18:18PM +1100, Michael Ellerman wrote:
>> Christoph Hellwig writes:
>>
>>> Ben / Michael,
>>>
>>> can we get this one queued up for 4.21 to prepare for the DMA work later on?
>>
>> I was hoping the PASEMI / NXP regressions could be solved before merging.
>>
>> My p5020ds is booting fine with this series, so I'm not sure why it's causing problems on Christian's machine.
>>
>> The last time I turned on my PASEMI board it tripped some breakers, so I need to investigate that before I can help test that.
>>
>> I'll see how things look on Monday and either merge the commits you identified or the whole series depending on if there's any more info from Christian.
>
> Christian just confirmed everything up to at least "powerpc/dma: stop overriding dma_get_required_mask" works for his setups.
Re: [PATCH v5 2/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging
On 12/7/18 7:16 AM, Nicolas Boichat wrote:
> IOMMUs using ARMv7 short-descriptor format require page tables (level 1 and 2) to be allocated within the first 4GB of RAM, even on 64-bit systems.
>
> For level 1/2 pages, ensure GFP_DMA32 is used if CONFIG_ZONE_DMA32 is defined (e.g. on arm64 platforms).
>
> For level 2 pages, allocate a slab cache in SLAB_CACHE_DMA32. Note that we do not explicitly pass GFP_DMA[32] to kmem_cache_zalloc, as this is not strictly necessary, and would cause a warning in mm/sl*b.c, as we did not update GFP_SLAB_BUG_MASK.
>
> Also, print an error when the physical address does not fit in 32-bit, to make debugging easier in the future.
>
> Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")

Also, CC stable?

> Signed-off-by: Nicolas Boichat
Re: [PATCH v3 1/9] mm: Introduce new vm_insert_range API
On Fri, 7 Dec 2018 00:09:45 +0530, Souptick Joarder wrote:

> Previously drivers have had their own way of mapping a range of kernel pages/memory into a user vma, and this was done by invoking vm_insert_page() within a loop.
>
> As this pattern is common across different drivers, it can be generalized by creating a new function and using it across the drivers.
>
> vm_insert_range is the new API which will be used to map a range of kernel memory/pages to a user vma.
>
> This API is tested by Heiko for the Rockchip drm driver, on rk3188, rk3288, rk3328 and rk3399 with graphics.
>
> Signed-off-by: Souptick Joarder
> Reviewed-by: Matthew Wilcox
> Reviewed-by: Mike Rapoport
> Tested-by: Heiko Stuebner

Looks good to me.

Reviewed-by: Mauro Carvalho Chehab

> ---
> include/linux/mm.h |  2 ++
> mm/memory.c        | 38 ++
> mm/nommu.c         |  7 +++
> 3 files changed, 47 insertions(+)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index fcf9cc9..2bc399f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2506,6 +2506,8 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
> int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
> 			unsigned long pfn, unsigned long size, pgprot_t);
> int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
> +int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
> +			struct page **pages, unsigned long page_count);
> vm_fault_t vmf_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
> 			unsigned long pfn);
> vm_fault_t vmf_insert_pfn_prot(struct vm_area_struct *vma, unsigned long addr,
> diff --git a/mm/memory.c b/mm/memory.c
> index 15c417e..84ea46c 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1478,6 +1478,44 @@ static int insert_page(struct vm_area_struct *vma, unsigned long addr,
> }
>
> /**
> + * vm_insert_range - insert range of kernel pages into user vma
> + * @vma: user vma to map to
> + * @addr: target user address of this page
> + * @pages: pointer to array of source kernel pages
> + * @page_count: number of pages need to insert into user vma
> + *
> + * This allows drivers to insert range of kernel pages they've allocated
> + * into a user vma. This is a generic function which drivers can use
> + * rather than using their own way of mapping range of kernel pages into
> + * user vma.
> + *
> + * If we fail to insert any page into the vma, the function will return
> + * immediately leaving any previously-inserted pages present. Callers
> + * from the mmap handler may immediately return the error as their caller
> + * will destroy the vma, removing any successfully-inserted pages. Other
> + * callers should make their own arrangements for calling unmap_region().
> + *
> + * Context: Process context. Called by mmap handlers.
> + * Return: 0 on success and error code otherwise
> + */
> +int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
> +			struct page **pages, unsigned long page_count)
> +{
> +	unsigned long uaddr = addr;
> +	int ret = 0, i;
> +
> +	for (i = 0; i < page_count; i++) {
> +		ret = vm_insert_page(vma, uaddr, pages[i]);
> +		if (ret < 0)
> +			return ret;
> +		uaddr += PAGE_SIZE;
> +	}
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(vm_insert_range);
> +
> +/**
> * vm_insert_page - insert single page into user vma
> * @vma: user vma to map to
> * @addr: target user address of this page
> diff --git a/mm/nommu.c b/mm/nommu.c
> index 749276b..d6ef5c7 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -473,6 +473,13 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
> }
> EXPORT_SYMBOL(vm_insert_page);
>
> +int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
> +			struct page **pages, unsigned long page_count)
> +{
> +	return -EINVAL;
> +}
> +EXPORT_SYMBOL(vm_insert_range);
> +
> /*
> * sys_brk() for the most part doesn't need the global kernel
> * lock, except when an application is doing something nasty

Thanks,
Mauro
Re: [PATCH 01/34] powerpc: use mm zones more sensibly
On Fri, Dec 07, 2018 at 11:18:18PM +1100, Michael Ellerman wrote:
> Christoph Hellwig writes:
>
> > Ben / Michael,
> >
> > can we get this one queued up for 4.21 to prepare for the DMA work later on?
>
> I was hoping the PASEMI / NXP regressions could be solved before merging.
>
> My p5020ds is booting fine with this series, so I'm not sure why it's causing problems on Christian's machine.
>
> The last time I turned on my PASEMI board it tripped some breakers, so I need to investigate that before I can help test that.
>
> I'll see how things look on Monday and either merge the commits you identified or the whole series depending on if there's any more info from Christian.

Christian just confirmed everything up to at least "powerpc/dma: stop overriding dma_get_required_mask" works for his setups.
Re: [PATCH v3 6/9] iommu/dma-iommu.c: Convert to use vm_insert_range
On 06/12/2018 18:43, Souptick Joarder wrote:

Convert to use vm_insert_range() to map a range of kernel memory to a user vma.

Signed-off-by: Souptick Joarder
Reviewed-by: Matthew Wilcox
---
 drivers/iommu/dma-iommu.c | 13 +++--
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d1b0475..a2c65e2 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -622,17 +622,10 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
 
 int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct *vma)
 {
-	unsigned long uaddr = vma->vm_start;
-	unsigned int i, count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-	int ret = -ENXIO;
+	unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
 
-	for (i = vma->vm_pgoff; i < count && uaddr < vma->vm_end; i++) {
-		ret = vm_insert_page(vma, uaddr, pages[i]);
-		if (ret)
-			break;
-		uaddr += PAGE_SIZE;
-	}
-	return ret;
+	return vm_insert_range(vma, vma->vm_start,
+				pages + vma->vm_pgoff, count);

You also need to adjust count to compensate for the pages skipped by vm_pgoff, otherwise you've got an out-of-bounds dereference triggered from userspace, which is pretty high up the "not good" scale (not to mention the entire call would then propagate -EFAULT back from vm_insert_page() and thus always appear to fail for nonzero offsets).

Robin.

 }
 
 static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
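The out-of-bounds condition Robin describes is easy to model in userspace: with `count` left unadjusted, the last index touched is `vm_pgoff + count - 1`, which can run past the end of the pages array. A hypothetical helper showing the adjusted count (plain integers rather than a real VMA; the clamp-vs-reject policy is an assumption for illustration):

```c
#include <assert.h>
#include <errno.h>

/*
 * Model of the offset handling: total_pages is the length of the
 * pages[] array, pgoff the number of leading pages to skip. Returns
 * how many entries may safely be passed starting at pages + pgoff
 * (cf. vm_insert_range()), or -ENXIO if the offset itself is out of
 * range.
 */
static long usable_count(unsigned long total_pages, unsigned long pgoff,
			 unsigned long count)
{
	if (pgoff > total_pages)
		return -ENXIO;
	if (count > total_pages - pgoff)
		count = total_pages - pgoff;	/* clamp, don't overrun */
	return (long)count;
}
```

For an 8-page buffer mapped with vm_pgoff == 2, only 6 entries remain valid; passing the unadjusted count of 8 would read two array slots past the end.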
Re: use generic DMA mapping code in powerpc V4
On 06 December 2018 at 11:55AM, Christian Zigotzky wrote:

On 05 December 2018 at 3:05PM, Christoph Hellwig wrote:

Thanks. Can you try a few stepping points in the tree?

First just with commit 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6 (the first one) applied?

Second with all commits up to 5da11e49df21f21dac25a2491aa788307bdacb6b

And if that still works with commits up to c1bfcad4b0cf38ce5b00f7ad880d3a13484c123a

Hi Christoph,

I checked out commit 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6 with the following command:

git checkout 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6

Result: the PASEMI onboard ethernet works again and the P5020 board boots.

I will test the other commits in the next days.

@All: It is really important that you also test Christoph's work on your PASEMI and NXP boards. Could you please help us with solving the issues?

'git clone git://git.infradead.org/users/hch/misc.git -b powerpc-dma.5 a'

Thanks,
Christian

Today I tested commit 5da11e49df21f21dac25a2491aa788307bdacb6b:

git checkout 5da11e49df21f21dac25a2491aa788307bdacb6b

The PASEMI onboard ethernet works and the P5020 board boots.

-- 
Christian
Re: [PATCH RFC 1/1] swiotlb: add debugfs to track swiotlb buffer usage
On 07/12/2018 05:49, Dongli Zhang wrote: On 12/07/2018 12:12 AM, Joe Jin wrote: Hi Dongli, Maybe move d_swiotlb_usage declare into swiotlb_create_debugfs(): I assume the call of swiotlb_tbl_map_single() might be frequent in some situations, e.g., when 'swiotlb=force'. That's why I declare the d_swiotlb_usage out of any functions and use "if (unlikely(!d_swiotlb_usage))". I think "if (unlikely(!d_swiotlb_usage))" incur less performance overhead than calling swiotlb_create_debugfs() every time to confirm if debugfs is created. I would declare d_swiotlb_usage statically inside swiotlb_create_debugfs() if the performance overhead is acceptable (it is trivial indeed). That is the reason I tag the patch with RFC because I am not sure if the on-demand creation of debugfs is fine with maintainers/reviewers. If swiotlb pages are never allocated, we would not be able to see the debugfs entry. I would prefer to limit the modification within swiotlb and to not taint any other files. The drawback is there is no place to create or delete the debugfs entry because swiotlb buffer could be initialized and uninitialized at very early stage. Couldn't you just do it from an initcall? All you really need to care about is ordering after debugfs_init(), which is easy. If SWIOTLB initialisation does end up being skipped at any point, nobody's going to mind if debugfs still has an entry saying io_tlb_nslabs == 0 (in fact, that's arguably useful in itself as positive confirmation that the system is not using SWIOTLB). void swiotlb_create_debugfs(void) { #ifdef CONFIG_DEBUG_FS static struct dentry *d_swiotlb_usage = NULL; if (d_swiotlb_usage) return; d_swiotlb_usage = debugfs_create_dir("swiotlb", NULL); if (!d_swiotlb_usage) return; debugfs_create_file("usage", 0600, d_swiotlb_usage, NULL, &swiotlb_usage_fops); Maybe expose io_tlb_nslabs and io_tlb_used as separate entries? Then you could just use debugfs_create_ulong() to keep things really simple. 
That would also make the interface more consistent with dma-debug, which would be nice given how closely-related they are. Robin. #endif } And for io_tlb_used, possibly add a check at the beginning of swiotlb_tbl_map_single(): if there are no free slots, or not enough slots, fail directly? This would optimize the slot allocation path. I will follow this in the next version after I get more suggestions and confirmations from maintainers. Thank you very much! Dongli Zhang Thanks, Joe On 12/5/18 7:59 PM, Dongli Zhang wrote: The device driver will not be able to do DMA operations once the swiotlb buffer is full, either because the driver is using too many IO TLB blocks in flight, or because there is a memory leak in the device driver. Exporting the swiotlb buffer usage via debugfs would help the user estimate the size of the swiotlb buffer to pre-allocate, or analyze a device driver memory leak. As the swiotlb can be initialized at a very early stage, when debugfs cannot register successfully, this patch creates the debugfs entry on demand. 
Signed-off-by: Dongli Zhang --- kernel/dma/swiotlb.c | 57 1 file changed, 57 insertions(+) diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 045930e..d3c8aa4 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -35,6 +35,9 @@ #include #include #include +#ifdef CONFIG_DEBUG_FS +#include +#endif #include #include @@ -73,6 +76,13 @@ static phys_addr_t io_tlb_start, io_tlb_end; */ static unsigned long io_tlb_nslabs; +#ifdef CONFIG_DEBUG_FS +/* + * The number of used IO TLB block + */ +static unsigned long io_tlb_used; +#endif + /* * This is a free list describing the number of free entries available from * each index @@ -100,6 +110,41 @@ static DEFINE_SPINLOCK(io_tlb_lock); static int late_alloc; +#ifdef CONFIG_DEBUG_FS + +static struct dentry *d_swiotlb_usage; + +static int swiotlb_usage_show(struct seq_file *m, void *v) +{ + seq_printf(m, "%lu\n%lu\n", io_tlb_used, ); + return 0; +} + +static int swiotlb_usage_open(struct inode *inode, struct file *filp) +{ + return single_open(filp, swiotlb_usage_show, NULL); +} + +static const struct file_operations swiotlb_usage_fops = { + .open = swiotlb_usage_open, + .read = seq_read, + .llseek = seq_lseek, + .release= single_release, +}; + +void swiotlb_create_debugfs(void) +{ + d_swiotlb_usage = debugfs_create_dir("swiotlb", NULL); + + if (!d_swiotlb_usage) + return; + + debugfs_create_file("usage", 0600, d_swiotlb_usage, + NULL, &swiotlb_usage_fops); +} + +#endif + static int __init setup_io_tlb_npages(char *str) { @@ -449,6 +494,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
Re: [PATCH v2] iommu: fix amd_iommu=force_isolation
On Thu, Dec 06, 2018 at 02:39:15PM -0700, Yu Zhao wrote: > Fixes: aafd8ba0ca74 ("iommu/amd: Implement add_device and remove_device") > > Signed-off-by: Yu Zhao > --- > drivers/iommu/amd_iommu.c | 9 - > 1 file changed, 8 insertions(+), 1 deletion(-) Applied, thanks.
Re: [PATCH 01/34] powerpc: use mm zones more sensibly
Christoph Hellwig writes: > Ben / Michael, > > can we get this one queued up for 4.21 to prepare for the DMA work later > on? I was hoping the PASEMI / NXP regressions could be solved before merging. My p5020ds is booting fine with this series, so I'm not sure why it's causing problems on Christian's machine. The last time I turned on my PASEMI board it tripped some breakers, so I need to investigate that before I can help test that. I'll see how things look on Monday and either merge the commits you identified or the whole series depending on whether there's any more info from Christian. cheers > On Wed, Nov 14, 2018 at 09:22:41AM +0100, Christoph Hellwig wrote: >> Powerpc has somewhat odd usage where ZONE_DMA is used for all memory on >> common 64-bit configs, and ZONE_DMA32 is used for 31-bit schemes. >> >> Move to a scheme closer to what other architectures use (and I dare to >> say the intent of the system): >> >> - ZONE_DMA: optionally for memory < 31-bit (64-bit embedded only) >> - ZONE_NORMAL: everything addressable by the kernel >> - ZONE_HIGHMEM: memory > 32-bit for 32-bit kernels >> >> Also provide information on how ZONE_DMA is used by defining >> ARCH_ZONE_DMA_BITS. >> >> Contains various fixes from Benjamin Herrenschmidt. 
>> >> Signed-off-by: Christoph Hellwig >> --- >> arch/powerpc/Kconfig | 8 +--- >> arch/powerpc/include/asm/page.h | 2 + >> arch/powerpc/include/asm/pgtable.h| 1 - >> arch/powerpc/kernel/dma-swiotlb.c | 6 +-- >> arch/powerpc/kernel/dma.c | 7 +-- >> arch/powerpc/mm/mem.c | 47 +++ >> arch/powerpc/platforms/85xx/corenet_generic.c | 10 >> arch/powerpc/platforms/85xx/qemu_e500.c | 9 >> include/linux/mmzone.h| 2 +- >> 9 files changed, 25 insertions(+), 67 deletions(-) >> >> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig >> index 8be31261aec8..c3613bc1 100644 >> --- a/arch/powerpc/Kconfig >> +++ b/arch/powerpc/Kconfig >> @@ -374,9 +374,9 @@ config PPC_ADV_DEBUG_DAC_RANGE >> depends on PPC_ADV_DEBUG_REGS && 44x >> default y >> >> -config ZONE_DMA32 >> +config ZONE_DMA >> bool >> -default y if PPC64 >> +default y if PPC_BOOK3E_64 >> >> config PGTABLE_LEVELS >> int >> @@ -869,10 +869,6 @@ config ISA >>have an IBM RS/6000 or pSeries machine, say Y. If you have an >>embedded board, consult your board documentation. 
>> >> -config ZONE_DMA >> -bool >> -default y >> - >> config GENERIC_ISA_DMA >> bool >> depends on ISA_DMA_API >> diff --git a/arch/powerpc/include/asm/page.h >> b/arch/powerpc/include/asm/page.h >> index f6a1265face2..fc8c9ac0c6be 100644 >> --- a/arch/powerpc/include/asm/page.h >> +++ b/arch/powerpc/include/asm/page.h >> @@ -354,4 +354,6 @@ typedef struct page *pgtable_t; >> #endif /* __ASSEMBLY__ */ >> #include >> >> +#define ARCH_ZONE_DMA_BITS 31 >> + >> #endif /* _ASM_POWERPC_PAGE_H */ >> diff --git a/arch/powerpc/include/asm/pgtable.h >> b/arch/powerpc/include/asm/pgtable.h >> index 9679b7519a35..8af32ce93c7f 100644 >> --- a/arch/powerpc/include/asm/pgtable.h >> +++ b/arch/powerpc/include/asm/pgtable.h >> @@ -66,7 +66,6 @@ extern unsigned long empty_zero_page[]; >> >> extern pgd_t swapper_pg_dir[]; >> >> -void limit_zone_pfn(enum zone_type zone, unsigned long max_pfn); >> int dma_pfn_limit_to_zone(u64 pfn_limit); >> extern void paging_init(void); >> >> diff --git a/arch/powerpc/kernel/dma-swiotlb.c >> b/arch/powerpc/kernel/dma-swiotlb.c >> index 5fc335f4d9cd..678811abccfc 100644 >> --- a/arch/powerpc/kernel/dma-swiotlb.c >> +++ b/arch/powerpc/kernel/dma-swiotlb.c >> @@ -108,12 +108,8 @@ int __init swiotlb_setup_bus_notifier(void) >> >> void __init swiotlb_detect_4g(void) >> { >> -if ((memblock_end_of_DRAM() - 1) > 0x) { >> +if ((memblock_end_of_DRAM() - 1) > 0x) >> ppc_swiotlb_enable = 1; >> -#ifdef CONFIG_ZONE_DMA32 >> -limit_zone_pfn(ZONE_DMA32, (1ULL << 32) >> PAGE_SHIFT); >> -#endif >> -} >> } >> >> static int __init check_swiotlb_enabled(void) >> diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c >> index dbfc7056d7df..6551685a4ed0 100644 >> --- a/arch/powerpc/kernel/dma.c >> +++ b/arch/powerpc/kernel/dma.c >> @@ -50,7 +50,7 @@ static int dma_nommu_dma_supported(struct device *dev, u64 >> mask) >> return 1; >> >> #ifdef CONFIG_FSL_SOC >> -/* Freescale gets another chance via ZONE_DMA/ZONE_DMA32, however >> +/* Freescale gets another chance 
via ZONE_DMA, however >> * that will have to be refined if/when they support iommus >> */ >> return 1; >> @@ -94,13 +94,10 @@ void *__dma_nommu_alloc_coherent(struct device *dev, >> size_t size, >> } >> >> switch (zone) { >> +#ifdef CONFIG_ZONE_DMA >> case ZONE_DMA: >> flag |= GFP_DMA; >>
Re: [RFC PATCH 0/6] Auxiliary IOMMU domains and Arm SMMUv3
Hi, On Mon, Nov 26, 2018 at 07:29:45AM +, Tian, Kevin wrote: > btw Baolu just reminded me one thing which is worthy of noting here. > 'primary' vs. 'aux' concept makes sense only when we look from a device > p.o.v. That binding relationship is not (*should not be*) carry-and-forwarded > cross devices. every domain must be explicitly attached to other devices > (instead of implicitly attached being above example), and new primary/aux > attribute on another device will be decided at attach time. Okay, so after all the discussions we had I learned a few more things about the scalable mode feature and thought a bit longer about how to best support it in the IOMMU-API. The concept of sub-domains I initially proposed certainly makes no sense, but scalable-mode specific attach/detach functions do. So instead of a sub-domain mode, I'd like to propose device-feature sets. The posted patch-set already includes this as device-attributes, but I don't like this naming as we are really talking about additional feature sets of a device. So how about we introduce this: enum iommu_dev_features { /* ... */ IOMMU_DEV_FEAT_AUX, IOMMU_DEV_FEAT_SVA, /* ... */ }; /* Check if a device supports a given feature of the IOMMU-API */ bool iommu_dev_has_feature(struct device *dev, enum iommu_dev_features *feat); /* * Only works if iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX) * returns true * * Also, as long as domains are attached to a device through * this interface, any attempt to call iommu_attach_device() should * fail (iommu_detach_device() can't fail, so we fail on the * attempt to re-attach). This should make us safe against a * device being attached to a guest as a whole while there are * still pasid users on it (aux and sva). 
*/ int iommu_aux_attach_device(struct iommu_domain *domain, struct device *dev); int iommu_aux_detach_device(struct iommu_domain *domain, struct device *dev); /* * I know we are targeting a system-wide pasid-space, so that * the pasid would be the same for one domain on all devices, * let's just keep the option open to have different * pasid-spaces in one system. Also this way we can use it to * check whether the domain is attached to this device at all. * * returns pasid or <0 if domain has no pasid on that device. */ int iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev); /* So we need a iommu_aux_detach_all()? */ This concept can also be easily extended for supporting SVA in parallel on the same device, with the same constraints regarding the behavior of iommu_attach_device()/iommu_detach_device(). So what do you think about that approach? Regards, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 08/17] locking/lockdep: Add support for nestable terminal locks
On Fri, Dec 07, 2018 at 10:22:52AM +0100, Peter Zijlstra wrote: > On Mon, Nov 19, 2018 at 01:55:17PM -0500, Waiman Long wrote: > > There are use cases where we want to allow nesting of one terminal lock > > underneath another terminal-like lock. That new lock type is called > > nestable terminal lock which can optionally allow the acquisition of > > no more than one regular (non-nestable) terminal lock underneath it. > > I think I asked for a more coherent changelog on this. The above is > still self contradictory and doesn't explain why you'd ever want such a > 'misfeature' :-) So maybe call the thing penterminal (contraction of penultimate and terminal) locks and explain why this annotation is safe -- in great detail.
Re: [PATCH v2 09/17] debugobjects: Make object hash locks nestable terminal locks
On Mon, Nov 19, 2018 at 01:55:18PM -0500, Waiman Long wrote: > By making the object hash locks nestable terminal locks, we can avoid > a bunch of unnecessary lockdep validations as well as saving space > in the lockdep tables. So the 'problem'; which you've again not explained; is that debugobjects has the following lock order: &db->lock &pool_lock And you seem to want to tag '&db->lock' as terminal, which is obviously a big fat lie. You've also not explained why it is safe to do this (I think it actually is, but you really should've spelled it out). Furthermore; you've not justified any of this 'insanity' with numbers. What do we gain with this nestable madness that justifies the crazy?
Re: [PATCH 2/5] iommu/of: Use device_iommu_mapped()
On Thu, Dec 06, 2018 at 05:42:16PM +, Robin Murphy wrote: > For sure - although I am now wondering whether "mapped" is perhaps a little > ambiguous in the naming, since the answer to "can I use the API" is yes even > when the device may currently be attached to an identity/passthrough domain > or blocked completely, neither of which involve any "mapping". Maybe simply > "device_has_iommu()" would convey the intent better? The name is a shorter version of: device_is_behind_an_iommu_remapping_its_dma_transactions() :) The name is not perfect, but device_has_iommu() is not better because it might be read as a check of whether the device itself has an IOMMU built in. In the end an identity-mapping is still a mapping (whether the iommu needs a page-table for it is an implementation detail), as is a mapping with no page-table entries at all (blocking). So I think device_iommu_mapped() is a reasonable choice. Regards, Joerg
Re: [PATCH 1/1] iommu/arm-smmu: Add support to use Last level cache
Hi Robin, On Tue, Dec 4, 2018 at 8:51 PM Robin Murphy wrote: > > On 04/12/2018 11:01, Vivek Gautam wrote: > > Qualcomm SoCs have an additional level of cache called as > > System cache, aka. Last level cache (LLC). This cache sits right > > before the DDR, and is tightly coupled with the memory controller. > > The cache is available to all the clients present in the SoC system. > > The clients request their slices from this system cache, make it > > active, and can then start using it. > > For these clients with smmu, to start using the system cache for > > buffers and, related page tables [1], memory attributes need to be > > set accordingly. > > This change updates the MAIR and TCR configurations with correct > > attributes to use this system cache. > > > > To explain a little about memory attribute requirements here: > > > > Non-coherent I/O devices can't look-up into inner caches. However, > > coherent I/O devices can. But both can allocate in the system cache > > based on system policy and configured memory attributes in page > > tables. > > CPUs can access both inner and outer caches (including system cache, > > aka. Last level cache), and can allocate into system cache too > > based on memory attributes, and system policy. > > > > Further looking at memory types, we have following - > > a) Normal uncached :- MAIR 0x44, inner non-cacheable, > >outer non-cacheable; > > b) Normal cached :- MAIR 0xff, inner read write-back non-transient, > >outer read write-back non-transient; > >attribute setting for coherenet I/O devices. > > > > and, for non-coherent i/o devices that can allocate in system cache > > another type gets added - > > c) Normal sys-cached/non-inner-cached :- > >MAIR 0xf4, inner non-cacheable, > >outer read write-back non-transient > > > > So, CPU will automatically use the system cache for memory marked as > > normal cached. The normal sys-cached is downgraded to normal non-cached > > memory for CPUs. 
> > Coherent I/O devices can use system cache by marking the memory as > > normal cached. > > Non-coherent I/O devices, to use system cache, should mark the memory as > > normal sys-cached in page tables. > > > > This change is a realisation of following changes > > from downstream msm-4.9: > > iommu: io-pgtable-arm: Support DOMAIN_ATTRIBUTE_USE_UPSTREAM_HINT[2] > > iommu: io-pgtable-arm: Implement IOMMU_USE_UPSTREAM_HINT[3] > > > > [1] https://patchwork.kernel.org/patch/10302791/ > > [2] > > https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=bf762276796e79ca90014992f4d9da5593fa7d51 > > [3] > > https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=d4c72c413ea27c43f60825193d4de9cb8ffd9602 > > > > Signed-off-by: Vivek Gautam > > --- > > > > Changes since v1: > > - Addressed Tomasz's comments for basing the change on > > "NO_INNER_CACHE" concept for non-coherent I/O devices > > rather than capturing "SYS_CACHE". This is to indicate > > clearly the intent of non-coherent I/O devices that > > can't access inner caches. > > That seems backwards to me - there is already a fundamental assumption > that non-coherent devices can't access caches. What we're adding here is > a weird exception where they *can* use some level of cache despite still > being non-coherent overall. > > In other words, it's not a case of downgrading coherent devices' > accesses to bypass inner caches, it's upgrading non-coherent devices' > accesses to hit the outer cache. That's certainly the understanding I > got from talking with Pratik at Plumbers, and it does appear to fit with > your explanation above despite the final conclusion you draw being > different. Thanks for the thorough review of the change. Right, I guess it's rather an upgrade for non-coherent devices to use an outer cache than a downgrade for coherent devices. 
> > I do see what Tomasz meant in terms of the TCR attributes, but what we > currently do there is a little unintuitive and not at all representative > of actual mapping attributes - I'll come back to that inline. > > > drivers/iommu/arm-smmu.c | 15 +++ > > drivers/iommu/dma-iommu.c | 3 +++ > > drivers/iommu/io-pgtable-arm.c | 22 +- > > drivers/iommu/io-pgtable.h | 5 + > > include/linux/iommu.h | 3 +++ > > 5 files changed, 43 insertions(+), 5 deletions(-) > > As a minor nit, I'd prefer this as at least two patches to separate the > io-pgtable changes and arm-smmu changes - basically I'd expect it to > look much the same as the non-strict mode support did. Sure, will split the patch. > > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c > > index ba18d89d4732..047f7ff95b0d 100644 > > --- a/drivers/iommu/arm-smmu.c > > +++ b/drivers/iommu/arm-smmu.c > > @@ -255,6 +255,7 @@ struct arm_smmu_domain { > > struct mutex
Re: [PATCH v2 08/17] locking/lockdep: Add support for nestable terminal locks
On Mon, Nov 19, 2018 at 01:55:17PM -0500, Waiman Long wrote: > There are use cases where we want to allow nesting of one terminal lock > underneath another terminal-like lock. That new lock type is called > nestable terminal lock which can optionally allow the acquisition of > no more than one regular (non-nestable) terminal lock underneath it. I think I asked for a more coherent changelog on this. The above is still self contradictory and doesn't explain why you'd ever want such a 'misfeature' :-)
Re: [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections
On Mon, Nov 19, 2018 at 01:55:16PM -0500, Waiman Long wrote: > The db->lock is a raw spinlock and so the lock hold time is supposed > to be short. This will not be the case when printk() is being involved > in some of the critical sections. In order to avoid the long hold time, > in case some messages need to be printed, the debug_object_is_on_stack() > and debug_print_object() calls are now moved out of those critical > sections. That's not why you did this patch though; you want to make these locks terminal locks and that means no printk() inside, as that uses locks again. Please write relevant changelogs.
Re: [PATCH v2 03/17] locking/lockdep: Add a new terminal lock type
On Mon, Nov 19, 2018 at 01:55:12PM -0500, Waiman Long wrote: > A terminal lock is a lock where further locking or unlocking on another > lock is not allowed. IOW, no forward dependency is permitted. > > With such a restriction in place, we don't really need to do a full > validation of the lock chain involving a terminal lock. Instead, > we just check if there is any further locking or unlocking on another > lock when a terminal lock is being held. > > Only spinlocks which are acquired by the _irq or _irqsave variants > or in IRQ disabled context should be classified as terminal locks. > > By adding this new lock type, we can save entries in lock_chains[], > chain_hlocks[], list_entries[] and stack_trace[]. By marking suitable > locks as terminal, we reduce the chance of overflowing those tables > allowing them to focus on locks that can have both forward and backward > dependencies. > > Four bits are stolen from the pin_count of the held_lock structure > to hold a new 4-bit flags field. The pin_count field is essentially a > summation of 16-bit random cookie values. Removing 4 bits still allow > the pin_count to accumulate up to almost 4096 of those cookie values. > > Signed-off-by: Waiman Long > --- > include/linux/lockdep.h| 29 --- > kernel/locking/lockdep.c | 47 > -- > kernel/locking/lockdep_internals.h | 5 > kernel/locking/lockdep_proc.c | 11 +++-- > 4 files changed, 80 insertions(+), 12 deletions(-) > > diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h > index 8fe5b4f..a146bca 100644 > --- a/include/linux/lockdep.h > +++ b/include/linux/lockdep.h > @@ -144,9 +144,20 @@ struct lock_class_stats { > > /* > * Lockdep class type flags > + * > * 1) LOCKDEP_FLAG_NOVALIDATE: No full validation, just simple checks. > + * 2) LOCKDEP_FLAG_TERMINAL: This is a terminal lock where lock/unlock on > + *another lock within its critical section is not allowed. 
> + * > + * Only the least significant 4 bits of the flags will be copied to the > + * held_lock structure. > */ > -#define LOCKDEP_FLAG_NOVALIDATE (1 << 0) > +#define LOCKDEP_FLAG_TERMINAL(1 << 0) > +#define LOCKDEP_FLAG_NOVALIDATE (1 << 4) Just leave the novalidate thing along, then you don't get to do silly things like this. Also; I have a pending patch (that I never quite finished) that tests lock nesting type (ie. raw_spinlock_t < spinlock_t < struct mutex) that wanted to use many of these same holes you took. I think we can easily fit the lot together in bitfields though, since you really don't need that many flags. I refreshed the below patch a number of months ago (no idea if it still applies, I think it was before Paul munged all of RCU). You need to kill printk and lift a few RT patches for the below to 'work' IIRC. --- Subject: lockdep: Introduce wait-type checks From: Peter Zijlstra Date: Tue, 19 Nov 2013 21:45:48 +0100 This patch extends lockdep to validate lock wait-type context. The current wait-types are: LD_WAIT_FREE, /* wait free, rcu etc.. */ LD_WAIT_SPIN, /* spin loops, raw_spinlock_t etc.. */ LD_WAIT_CONFIG, /* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */ LD_WAIT_SLEEP, /* sleeping locks, mutex_t etc.. */ Where lockdep validates that the current lock (the one being acquired) fits in the current wait-context (as generated by the held stack). This ensures that we do not try and acquire mutices while holding spinlocks, do not attempt to acquire spinlocks while holding raw_spinlocks and so on. In other words, its a more fancy might_sleep(). Obviously RCU made the entire ordeal more complex than a simple single value test because we can acquire RCU in (pretty much) any context and while it presents a context to nested locks it is not the same as it got acquired in. Therefore we needed to split the wait_type into two values, one representing the acquire (outer) and one representing the nested context (inner). 
For most 'normal' locks these two are the same. [ To make static initialization easier we have the rule that: .outer == INV means .outer == .inner; because INV == 0. ] It further means that we need to find the minimal .inner of the held stack to compare against the outer of the new lock; because while 'normal' RCU presents a CONFIG type to nested locks, if it is taken while already holding a SPIN type it obviously doesn't relax the rules. Below is an example output; generated by the trivial example: raw_spin_lock(&foo); spin_lock(&bar); spin_unlock(&bar); raw_spin_unlock(&foo); The way to read it is to look at the new -{n,m} part in the lock description; -{3:3} for our attempted lock, and try and match that up to the held locks, which in this case is the one: -{2,2}. This tells us that the acquiring lock requires a more relaxed environment that presented by the lock sta
Re: [PATCH v5 2/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging
On Fri, Dec 7, 2018 at 4:05 PM Matthew Wilcox wrote: > > On Fri, Dec 07, 2018 at 02:16:19PM +0800, Nicolas Boichat wrote: > > +#ifdef CONFIG_ZONE_DMA32 > > +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32 > > +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32 > > This name doesn't make any sense. Why not ARM_V7S_TABLE_SLAB_FLAGS ? Sure, will fix in v6 then. > > +#else > > +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA > > +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA > > Can you remind me again why it is, on machines which don't support > ZONE_DMA32, why we have to allocate from ZONE_DMA? My understanding > is that 64-bit machines have ZONE_DMA32 and 32-bit machines don't. > So shouldn't this rather be GFP_KERNEL? Sorry, I meant to reply on the v4 thread; Christoph raised the same question (https://patchwork.kernel.org/patch/10713025/). I don't know, and I don't have all the hardware needed to test this ,-( Robin and Will both didn't seem sure. I'd rather not introduce a new regression: this patch series tries to fix a known arm64 regression, where we _need_ tables to be in DMA32. If we want to change 32-bit hardware to use GFP_KERNEL, and we're sure it works, that's fine by me, but it should be in another patch set. Hope this makes sense, Thanks, > Actually, maybe we could centralise this in gfp.h: > > #ifdef CONFIG_64BIT > # ifdef CONFIG_ZONE_DMA32 > #define GFP_32BIT GFP_DMA32 > # else > #define GFP_32BIT GFP_DMA > # endif > #else /* 32-bit */ > #define GFP_32BIT GFP_KERNEL > #endif >
Re: [PATCH v5 2/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging
On Fri, Dec 07, 2018 at 02:16:19PM +0800, Nicolas Boichat wrote: > +#ifdef CONFIG_ZONE_DMA32 > +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32 > +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32 This name doesn't make any sense. Why not ARM_V7S_TABLE_SLAB_FLAGS ? > +#else > +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA > +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA Can you remind me again why it is, on machines which don't support ZONE_DMA32, why we have to allocate from ZONE_DMA? My understanding is that 64-bit machines have ZONE_DMA32 and 32-bit machines don't. So shouldn't this rather be GFP_KERNEL? Actually, maybe we could centralise this in gfp.h: #ifdef CONFIG_64BIT # ifdef CONFIG_ZONE_DMA32 #define GFP_32BIT GFP_DMA32 # else #define GFP_32BIT GFP_DMA # endif #else /* 32-bit */ #define GFP_32BIT GFP_KERNEL #endif