Re: [PATCH v3 1/9] mm: Introduce new vm_insert_range API

2018-12-07 Thread Souptick Joarder
On Sat, Dec 8, 2018 at 2:40 AM Robin Murphy wrote: > > On 2018-12-07 7:28 pm, Souptick Joarder wrote: > > On Fri, Dec 7, 2018 at 10:41 PM Matthew Wilcox wrote: > >> > >> On Fri, Dec 07, 2018 at 03:34:56PM +, Robin Murphy wrote: > +int vm_insert_range(struct vm_area_struct *vma, unsigned

Re: [PATCH v3 1/9] mm: Introduce new vm_insert_range API

2018-12-07 Thread Robin Murphy
On 2018-12-07 7:28 pm, Souptick Joarder wrote: On Fri, Dec 7, 2018 at 10:41 PM Matthew Wilcox wrote: On Fri, Dec 07, 2018 at 03:34:56PM +, Robin Murphy wrote: +int vm_insert_range(struct vm_area_struct *vma, unsigned long addr, + struct page **pages, unsigned long

Re: [PATCH v3 6/9] iommu/dma-iommu.c: Convert to use vm_insert_range

2018-12-07 Thread Souptick Joarder
On Fri, Dec 7, 2018 at 7:17 PM Robin Murphy wrote: > > On 06/12/2018 18:43, Souptick Joarder wrote: > > Convert to use vm_insert_range() to map range of kernel > > memory to user vma. > > > > Signed-off-by: Souptick Joarder > > Reviewed-by: Matthew Wilcox > > --- > > drivers/iommu/dma-iommu.c

Re: [PATCH v3 1/9] mm: Introduce new vm_insert_range API

2018-12-07 Thread Souptick Joarder
On Fri, Dec 7, 2018 at 10:41 PM Matthew Wilcox wrote: > > On Fri, Dec 07, 2018 at 03:34:56PM +, Robin Murphy wrote: > > > +int vm_insert_range(struct vm_area_struct *vma, unsigned long addr, > > > + struct page **pages, unsigned long page_count) > > > +{ > > > + unsigned

[PATCH 15/15] dma-mapping: bypass indirect calls for dma-direct

2018-12-07 Thread Christoph Hellwig
Avoid expensive indirect calls in the fast path DMA mapping operations by directly calling the dma_direct_* ops if we are using the directly mapped DMA operations. Signed-off-by: Christoph Hellwig --- arch/alpha/include/asm/dma-mapping.h | 2 +- arch/arc/mm/cache.c | 2 +-
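To make the idea concrete, here is a small userspace model of the fast-path dispatch. It is not the kernel code. It assumes that a NULL `dma_ops` pointer marks a device that uses direct mapping. `dma_map_page_model`, `fake_iommu_ops`, and the call counters are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for kernel types (userspace model). */
typedef uint64_t dma_addr_t;

struct device;

struct dma_map_ops {
	dma_addr_t (*map_page)(struct device *dev, uint64_t phys, size_t size);
};

struct device {
	const struct dma_map_ops *dma_ops; /* NULL means "use dma-direct" in this model */
};

/* Counters so a caller can see which path was taken. */
static int direct_calls, indirect_calls;

/* Direct mapping: in this model the bus address equals the physical address. */
static dma_addr_t dma_direct_map_page_model(struct device *dev, uint64_t phys, size_t size)
{
	(void)dev; (void)size;
	direct_calls++;
	return phys;
}

/* Fast path: a plain, predictable branch avoids the indirect call, which is
 * costly under Spectre v2 mitigations such as retpolines. */
static dma_addr_t dma_map_page_model(struct device *dev, uint64_t phys, size_t size)
{
	if (dev->dma_ops == NULL)
		return dma_direct_map_page_model(dev, phys, size);
	indirect_calls++;
	return dev->dma_ops->map_page(dev, phys, size);
}

/* Example IOMMU-style ops that remap into a hypothetical IOVA window. */
static dma_addr_t fake_iommu_map_page(struct device *dev, uint64_t phys, size_t size)
{
	(void)dev; (void)size;
	return 0x80000000ULL | (phys & 0xfff);
}

static const struct dma_map_ops fake_iommu_ops = { .map_page = fake_iommu_map_page };
```

A device with `dma_ops == NULL` is served by a direct call. Only devices with real ops, such as an IOMMU, still pay for the indirect call.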

[PATCH 13/15] ACPI / scan: Refactor _CCA enforcement

2018-12-07 Thread Christoph Hellwig
From: Robin Murphy Rather than checking the DMA attribute at each callsite, just pass it through for acpi_dma_configure() to handle directly. That can then deal with the relatively exceptional DEV_DMA_NOT_SUPPORTED case by explicitly installing dummy DMA ops instead of just skipping setup

[PATCH 12/15] dma-mapping: factor out dummy DMA ops

2018-12-07 Thread Christoph Hellwig
From: Robin Murphy The dummy DMA ops are currently used by arm64 for any device which has an invalid ACPI description and is thus barred from using DMA due to not knowing whether it is cache-coherent or not. Factor these out into general dma-mapping code so that they can be referenced from other

[PATCH 08/15] dma-mapping: move dma_get_required_mask to kernel/dma

2018-12-07 Thread Christoph Hellwig
dma_get_required_mask should really be with the rest of the DMA mapping implementation instead of in drivers/base as a lone outlier. Signed-off-by: Christoph Hellwig --- drivers/base/platform.c | 31 --- kernel/dma/mapping.c| 34 +-

[PATCH 10/15] dma-mapping: move dma_cache_sync out of line

2018-12-07 Thread Christoph Hellwig
This isn't exactly a slow path routine, but it is not super critical either, and moving it out of line will help to keep the include chain clean for the following DMA indirection bypass work. Signed-off-by: Christoph Hellwig --- include/linux/dma-mapping.h | 12 ++--

[PATCH 11/15] dma-mapping: always build the direct mapping code

2018-12-07 Thread Christoph Hellwig
All architectures except for sparc64 use the dma-direct code in some form, and even for sparc64 we had the discussion of a direct mapping mode a while ago. In preparation for directly calling the direct mapping code, don't bother making it optional; always build it in. This is a minor

[PATCH 09/15] dma-mapping: move various slow path functions out of line

2018-12-07 Thread Christoph Hellwig
There is no need to have all setup and coherent allocation / freeing routines inline. Move them out of line to keep the implementation nicely encapsulated and save some kernel text size. Signed-off-by: Christoph Hellwig --- arch/powerpc/include/asm/dma-mapping.h | 1 -

[PATCH 06/15] dma-mapping: simplify the dma_sync_single_range_for_{cpu, device} implementation

2018-12-07 Thread Christoph Hellwig
We can just call the regular calls after adding the offset to the address instead of reimplementing them. Signed-off-by: Christoph Hellwig --- include/linux/dma-debug.h | 27 include/linux/dma-mapping.h | 34 +- kernel/dma/debug.c |
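Here is a userspace sketch of that simplification. The `_model` and `_stub` names are hypothetical stand-ins for the kernel functions. The range variant only folds `offset` into the DMA address and forwards to the non-range call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;
struct device { int id; };
enum dma_data_direction { DMA_BIDIRECTIONAL, DMA_TO_DEVICE, DMA_FROM_DEVICE };

/* Records the last sync request so callers can inspect it (model only). */
static dma_addr_t last_sync_addr;
static size_t last_sync_size;

/* Stub standing in for the regular, non-range sync call. */
static void sync_single_for_cpu_stub(struct device *dev, dma_addr_t addr,
				     size_t size, enum dma_data_direction dir)
{
	(void)dev; (void)dir;
	last_sync_addr = addr;
	last_sync_size = size;
}

/* The range variant needs no implementation of its own: it forwards to the
 * regular call with the offset folded into the address. */
static void sync_single_range_for_cpu_model(struct device *dev, dma_addr_t addr,
					    unsigned long offset, size_t size,
					    enum dma_data_direction dir)
{
	sync_single_for_cpu_stub(dev, addr + offset, size, dir);
}
```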

[PATCH 05/15] dma-direct: merge swiotlb_dma_ops into the dma_direct code

2018-12-07 Thread Christoph Hellwig
While the dma-direct code is (relatively) clean and simple, we actually have to use the swiotlb ops for the mapping on many architectures due to devices with addressing limits. Instead of keeping two implementations around, this commit allows the dma-direct implementation to call the swiotlb bounce
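As a rough illustration of "devices with addressing limits", the sketch below decides for each mapping whether a buffer can go to the device directly or must be bounced. The mask test, the fixed `BOUNCE_POOL_BASE`, and all function names are assumptions of this model. A real swiotlb allocates slots from a pool and copies data in and out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;

struct device {
	uint64_t dma_mask; /* highest bus address the device can reach */
};

/* Hypothetical bounce pool placed below 4 GiB; a real swiotlb manages slots. */
#define BOUNCE_POOL_BASE 0x01000000ULL
static int bounces;

/* True if the whole buffer lies within the device's mask (size assumed non-zero). */
static bool dma_capable_model(const struct device *dev, dma_addr_t addr, size_t size)
{
	return addr + size - 1 <= dev->dma_mask;
}

/* One map path that falls back to bouncing only when needed. */
static dma_addr_t map_page_with_bounce(struct device *dev, uint64_t phys, size_t size)
{
	if (dma_capable_model(dev, phys, size))
		return phys; /* device can address it: map directly */
	bounces++;
	/* A real implementation would copy the data into a bounce slot here. */
	return BOUNCE_POOL_BASE;
}
```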

[PATCH 04/15] dma-direct: use dma_direct_map_page to implement dma_direct_map_sg

2018-12-07 Thread Christoph Hellwig
No need to duplicate the mapping logic. Signed-off-by: Christoph Hellwig --- kernel/dma/direct.c | 14 +- 1 file changed, 5 insertions(+), 9 deletions(-) diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index edb24f94ea1e..d45306473c90 100644 --- a/kernel/dma/direct.c +++
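The pattern this patch describes builds the scatter/gather mapper on top of the single-page mapper. It can be sketched in userspace as below. `DMA_MAPPING_ERROR` is defined the same way as in the kernel headers. The `sg_entry` type, the 4 GiB limit, and the `_model` functions are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;
#define DMA_MAPPING_ERROR (~(dma_addr_t)0)

/* Minimal stand-in for a scatterlist entry. */
struct sg_entry {
	uint64_t phys;
	size_t length;
	dma_addr_t dma_address;
};

/* Per-page mapping; fails for buffers above a hypothetical 4 GiB limit. */
static dma_addr_t map_page_model(uint64_t phys, size_t size)
{
	if (phys + size - 1 > 0xffffffffULL)
		return DMA_MAPPING_ERROR;
	return phys;
}

/* map_sg reuses the per-page path instead of duplicating its logic.
 * Returns the number of mapped entries, or 0 on failure. */
static int map_sg_model(struct sg_entry *sgl, int nents)
{
	for (int i = 0; i < nents; i++) {
		sgl[i].dma_address = map_page_model(sgl[i].phys, sgl[i].length);
		if (sgl[i].dma_address == DMA_MAPPING_ERROR)
			return 0;
	}
	return nents;
}
```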

[RFC] avoid indirect calls for DMA direct mappings v2

2018-12-07 Thread Christoph Hellwig
Hi all, a while ago Jesper reported major performance regressions due to the Spectre v2 mitigations in his XDP forwarding workloads. A large part of that is due to the DMA mapping API indirect calls. It turns out that the most common implementation of the DMA API is the direct mapping case, and

[PATCH 01/15] swiotlb: remove SWIOTLB_MAP_ERROR

2018-12-07 Thread Christoph Hellwig
We can use DMA_MAPPING_ERROR instead, which already maps to the same value. Signed-off-by: Christoph Hellwig --- drivers/xen/swiotlb-xen.c | 4 ++-- include/linux/swiotlb.h | 3 --- kernel/dma/swiotlb.c | 4 ++-- 3 files changed, 4 insertions(+), 7 deletions(-) diff --git

Re: [virtio-dev] Re: [PATCH v5 5/7] iommu: Add virtio-iommu driver

2018-12-07 Thread Jean-Philippe Brucker
Sorry for the delay, I wanted to do a little more performance analysis before continuing. On 27/11/2018 18:10, Michael S. Tsirkin wrote: > On Tue, Nov 27, 2018 at 05:55:20PM +, Jean-Philippe Brucker wrote: + if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1) || +

Re: [PATCH 0/2] Refactor dummy DMA ops

2018-12-07 Thread Robin Murphy
On 07/12/2018 17:05, Christoph Hellwig wrote: So I'd really prefer if we had a separate dummy.c file, like in my take on your previous patch here: http://git.infradead.org/users/hch/misc.git/commitdiff/e01adddc1733fa414dc16cd22e8f58be9b64a025

Re: [PATCH v3 1/9] mm: Introduce new vm_insert_range API

2018-12-07 Thread Matthew Wilcox
On Fri, Dec 07, 2018 at 03:34:56PM +, Robin Murphy wrote: > > +int vm_insert_range(struct vm_area_struct *vma, unsigned long addr, > > + struct page **pages, unsigned long page_count) > > +{ > > + unsigned long uaddr = addr; > > + int ret = 0, i; > > Some of the sites

Re: [PATCH 0/2] Refactor dummy DMA ops

2018-12-07 Thread Christoph Hellwig
So I'd really prefer if we had a separate dummy.c file, like in my take on your previous patch here: http://git.infradead.org/users/hch/misc.git/commitdiff/e01adddc1733fa414dc16cd22e8f58be9b64a025 http://git.infradead.org/users/hch/misc.git/commitdiff/596bde76e5944a3f4beb8c2769067ca88dda127a

[PATCH 0/2] Refactor dummy DMA ops

2018-12-07 Thread Robin Murphy
Hi all, Tangential to Christoph's RFC for mitigating indirect call overhead in common DMA mapping scenarios[1], this is a little reshuffle to prevent the CONFIG_ACPI_CCA_REQUIRED case from getting in the way. This would best go via the dma-mapping tree, so reviews and acks welcome. Robin. [1]

[PATCH 1/2] dma-mapping: Factor out dummy DMA ops

2018-12-07 Thread Robin Murphy
The dummy DMA ops are currently used by arm64 for any device which has an invalid ACPI description and is thus barred from using DMA due to not knowing whether it is cache-coherent or not. Factor these out into general dma-mapping code so that they can be referenced from other common code paths.
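Here is a minimal model of what "dummy" DMA ops amount to: an ops table in which every hook refuses. The four-hook `dma_ops_model` struct and the specific return values are assumptions chosen for illustration, not a copy of the patch. Those values are `DMA_MAPPING_ERROR`, zero mapped entries, `-ENXIO`, and "mask not supported".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;
#define DMA_MAPPING_ERROR (~(dma_addr_t)0)

struct device;

/* Cut-down ops table; the kernel's struct dma_map_ops has many more hooks. */
struct dma_ops_model {
	int (*mmap)(struct device *dev);
	dma_addr_t (*map_page)(struct device *dev, uint64_t phys, size_t size);
	int (*map_sg)(struct device *dev, int nents);
	int (*dma_supported)(struct device *dev, uint64_t mask);
};

/* Every hook refuses, so the device never gets a usable DMA address. */
static int dummy_mmap(struct device *dev) { (void)dev; return -ENXIO; }
static dma_addr_t dummy_map_page(struct device *dev, uint64_t phys, size_t size)
{ (void)dev; (void)phys; (void)size; return DMA_MAPPING_ERROR; }
static int dummy_map_sg(struct device *dev, int nents) { (void)dev; (void)nents; return 0; }
static int dummy_supported(struct device *dev, uint64_t mask) { (void)dev; (void)mask; return 0; }

static const struct dma_ops_model dma_dummy_ops_model = {
	.mmap = dummy_mmap,
	.map_page = dummy_map_page,
	.map_sg = dummy_map_sg,
	.dma_supported = dummy_supported,
};
```

Because the table lives in common code, any bus or firmware path can install it without depending on arm64 internals.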

[PATCH 2/2] ACPI / scan: Refactor _CCA enforcement

2018-12-07 Thread Robin Murphy
Rather than checking the DMA attribute at each callsite, just pass it through for acpi_dma_configure() to handle directly. That can then deal with the relatively exceptional DEV_DMA_NOT_SUPPORTED case by explicitly installing dummy DMA ops instead of just skipping setup entirely. This will then
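The centralised decision can be sketched as follows. The three attribute values mirror the kernel's `enum dev_dma_attr`. Everything else is a hypothetical stand-in: `acpi_dma_configure_model`, the `device` fields, and `dummy_ops`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same three states as the kernel's enum dev_dma_attr. */
enum dev_dma_attr { DEV_DMA_NOT_SUPPORTED, DEV_DMA_NON_COHERENT, DEV_DMA_COHERENT };

struct dma_ops_model { const char *name; };
static const struct dma_ops_model dummy_ops = { "dummy" };

struct device {
	const struct dma_ops_model *dma_ops;
	bool coherent;
	bool dma_configured;
};

/* One place decides what each _CCA result means, instead of every caller. */
static int acpi_dma_configure_model(struct device *dev, enum dev_dma_attr attr)
{
	if (attr == DEV_DMA_NOT_SUPPORTED) {
		/* Explicitly install ops that refuse DMA rather than skipping setup. */
		dev->dma_ops = &dummy_ops;
		return 0;
	}
	dev->coherent = (attr == DEV_DMA_COHERENT);
	dev->dma_configured = true;
	return 0;
}
```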

Re: [RFC] avoid indirect calls for DMA direct mappings

2018-12-07 Thread Jesper Dangaard Brouer
On Fri, 7 Dec 2018 16:44:35 +0100 Jesper Dangaard Brouer wrote: > On Fri, 7 Dec 2018 02:21:42 +0100 > Christoph Hellwig wrote: > > > On Thu, Dec 06, 2018 at 08:24:38PM +, Robin Murphy wrote: > > > On 06/12/2018 20:00, Christoph Hellwig wrote: > > >> On Thu, Dec 06, 2018 at 06:54:17PM

Re: [RFC] avoid indirect calls for DMA direct mappings

2018-12-07 Thread Jesper Dangaard Brouer
On Fri, 7 Dec 2018 02:21:42 +0100 Christoph Hellwig wrote: > On Thu, Dec 06, 2018 at 08:24:38PM +, Robin Murphy wrote: > > On 06/12/2018 20:00, Christoph Hellwig wrote: > >> On Thu, Dec 06, 2018 at 06:54:17PM +, Robin Murphy wrote: > >>> I'm pretty sure we used to assign

dma_declare_coherent_memory on main memory

2018-12-07 Thread Christoph Hellwig
Hi all, the ARM imx27/31 ports and various sh boards use dma_declare_coherent_memory on main memory taken from the memblock allocator. Is there any good reason these couldn't be switched to CMA areas? Getting rid of these magic dma_declare_coherent_memory areas would help make the dma allocator

Re: [PATCH v3 1/9] mm: Introduce new vm_insert_range API

2018-12-07 Thread Robin Murphy
On 06/12/2018 18:39, Souptick Joarder wrote: Previously, drivers had their own way of mapping a range of kernel pages/memory into a user vma, and this was done by invoking vm_insert_page() within a loop. As this pattern is common across different drivers, it can be generalized by creating a new
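Here is a userspace model of the proposed helper: the per-page insertion that drivers open-coded in a loop, folded into one function. `insert_page_model` stands in for `vm_insert_page()`. The `vma_model` layout, the 16-page cap, and the up-front size check are assumptions of this sketch rather than the patch's exact code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL

/* Hypothetical stand-ins: a "page" carries an id, and a VMA records what got
 * mapped. This model assumes the VMA spans at most 16 pages. */
struct page { int id; };
struct vma_model {
	unsigned long vm_start, vm_end;
	struct page *mapped[16]; /* slot i covers vm_start + i * PAGE_SIZE */
};

/* Models vm_insert_page(): map a single page at one user address. */
static int insert_page_model(struct vma_model *vma, unsigned long addr, struct page *page)
{
	if (addr < vma->vm_start || addr >= vma->vm_end)
		return -EFAULT;
	vma->mapped[(addr - vma->vm_start) / PAGE_SIZE] = page;
	return 0;
}

/* The loop drivers open-coded, as one helper: map page_count pages at
 * consecutive user addresses, stopping at the first error. */
static int insert_range_model(struct vma_model *vma, unsigned long addr,
			      struct page **pages, unsigned long page_count)
{
	unsigned long uaddr = addr;

	/* Reject requests larger than the whole VMA up front; pages that still
	 * fall past vm_end are caught by insert_page_model(). */
	if (page_count > (vma->vm_end - vma->vm_start) / PAGE_SIZE)
		return -ENXIO;

	for (unsigned long i = 0; i < page_count; i++) {
		int ret = insert_page_model(vma, uaddr, pages[i]);
		if (ret < 0)
			return ret;
		uaddr += PAGE_SIZE;
	}
	return 0;
}
```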

Re: [PATCH 01/34] powerpc: use mm zones more sensibly

2018-12-07 Thread Christian Zigotzky
I will work at the weekend to figure out where the problematic commit is. Christian > On 7. Dec 2018, at 15:09, Christoph Hellwig wrote: > >> On Fri, Dec 07, 2018 at 11:18:18PM +1100, Michael Ellerman wrote: >> Christoph Hellwig writes: >> >>> Ben / Michael, >>> >>>

Re: [PATCH v5 2/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging

2018-12-07 Thread Vlastimil Babka
On 12/7/18 7:16 AM, Nicolas Boichat wrote: > IOMMUs using ARMv7 short-descriptor format require page tables > (level 1 and 2) to be allocated within the first 4GB of RAM, even > on 64-bit systems. > > For level 1/2 pages, ensure GFP_DMA32 is used if CONFIG_ZONE_DMA32 > is defined (e.g. on arm64
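A 32-bit descriptor shows why the table has to sit below 4 GiB. A first-level short-descriptor entry that points to a level-2 table carries the table's physical address in its upper bits and has type bits `0b01`. An address above 4 GiB therefore cannot be encoded. The sketch below assumes 1 KiB table alignment, and the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t v7s_pte_model; /* short-descriptor entries are 32 bits wide */

#define V7S_TYPE_TABLE  0x1u   /* first-level descriptor type bits[1:0] = 0b01 */
#define V7S_TABLE_ALIGN 1024u  /* level-2 tables are 1 KiB aligned */

/* True if a table at 'phys' can be referenced from a 32-bit descriptor. */
static bool table_addr_fits(uint64_t phys)
{
	return (v7s_pte_model)phys == phys;
}

/* Build a first-level "points to level-2 table" descriptor, or return 0
 * (an invalid entry) if the table is out of reach or misaligned. */
static v7s_pte_model make_table_pte(uint64_t l2_table_phys)
{
	if (!table_addr_fits(l2_table_phys) || (l2_table_phys % V7S_TABLE_ALIGN))
		return 0;
	return (v7s_pte_model)l2_table_phys | V7S_TYPE_TABLE;
}
```

On a 64-bit system without a DMA32-style allocation constraint, a table may land above 4 GiB and hit the second, unencodable case.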

Re: [PATCH v3 1/9] mm: Introduce new vm_insert_range API

2018-12-07 Thread Mauro Carvalho Chehab
On Fri, 7 Dec 2018 00:09:45 +0530 Souptick Joarder wrote: > Previously, drivers had their own way of mapping a range of > kernel pages/memory into a user vma, and this was done by > invoking vm_insert_page() within a loop. > > As this pattern is common across different drivers, it can > be

Re: [PATCH 01/34] powerpc: use mm zones more sensibly

2018-12-07 Thread Christoph Hellwig
On Fri, Dec 07, 2018 at 11:18:18PM +1100, Michael Ellerman wrote: > Christoph Hellwig writes: > > > Ben / Michael, > > > > can we get this one queued up for 4.21 to prepare for the DMA work later > > on? > > I was hoping the PASEMI / NXP regressions could be solved before > merging. > > My

Re: [PATCH v3 6/9] iommu/dma-iommu.c: Convert to use vm_insert_range

2018-12-07 Thread Robin Murphy
On 06/12/2018 18:43, Souptick Joarder wrote: Convert to use vm_insert_range() to map range of kernel memory to user vma. Signed-off-by: Souptick Joarder Reviewed-by: Matthew Wilcox --- drivers/iommu/dma-iommu.c | 13 +++-- 1 file changed, 3 insertions(+), 10 deletions(-) diff

Re: use generic DMA mapping code in powerpc V4

2018-12-07 Thread Christian Zigotzky
On 06 December 2018 at 11:55AM, Christian Zigotzky wrote: On 05 December 2018 at 3:05PM, Christoph Hellwig wrote: Thanks.  Can you try a few stepping points in the tree? First just with commit 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6 (the first one) applied? Second with all commits up to

Re: [PATCH RFC 1/1] swiotlb: add debugfs to track swiotlb buffer usage

2018-12-07 Thread Robin Murphy
On 07/12/2018 05:49, Dongli Zhang wrote: On 12/07/2018 12:12 AM, Joe Jin wrote: Hi Dongli, Maybe move the d_swiotlb_usage declaration into swiotlb_create_debugfs(): I assume the call of swiotlb_tbl_map_single() might be frequent in some situations, e.g., when 'swiotlb=force'. That's why I declare

Re: [PATCH v2] iommu: fix amd_iommu=force_isolation

2018-12-07 Thread Joerg Roedel
On Thu, Dec 06, 2018 at 02:39:15PM -0700, Yu Zhao wrote: > Fixes: aafd8ba0ca74 ("iommu/amd: Implement add_device and remove_device") > > Signed-off-by: Yu Zhao > --- > drivers/iommu/amd_iommu.c | 9 - > 1 file changed, 8 insertions(+), 1 deletion(-) Applied, thanks.

Re: [PATCH 01/34] powerpc: use mm zones more sensibly

2018-12-07 Thread Michael Ellerman
Christoph Hellwig writes: > Ben / Michael, > > can we get this one queued up for 4.21 to prepare for the DMA work later > on? I was hoping the PASEMI / NXP regressions could be solved before merging. My p5020ds is booting fine with this series, so I'm not sure why it's causing problems on

Re: [RFC PATCH 0/6] Auxiliary IOMMU domains and Arm SMMUv3

2018-12-07 Thread 'j...@8bytes.org'
Hi, On Mon, Nov 26, 2018 at 07:29:45AM +, Tian, Kevin wrote: > btw Baolu just reminded me one thing which is worthy of noting here. > 'primary' vs. 'aux' concept makes sense only when we look from a device > p.o.v. That binding relationship is not (*should not be*) carry-and-forwarded > cross

Re: [PATCH v2 08/17] locking/lockdep: Add support for nestable terminal locks

2018-12-07 Thread Peter Zijlstra
On Fri, Dec 07, 2018 at 10:22:52AM +0100, Peter Zijlstra wrote: > On Mon, Nov 19, 2018 at 01:55:17PM -0500, Waiman Long wrote: > > There are use cases where we want to allow nesting of one terminal lock > > underneath another terminal-like lock. That new lock type is called > > nestable terminal

Re: [PATCH v2 09/17] debugobjects: Make object hash locks nestable terminal locks

2018-12-07 Thread Peter Zijlstra
On Mon, Nov 19, 2018 at 01:55:18PM -0500, Waiman Long wrote: > By making the object hash locks nestable terminal locks, we can avoid > a bunch of unnecessary lockdep validations as well as saving space > in the lockdep tables. So the 'problem'; which you've again not explained; is that

Re: [PATCH 2/5] iommu/of: Use device_iommu_mapped()

2018-12-07 Thread Joerg Roedel
On Thu, Dec 06, 2018 at 05:42:16PM +, Robin Murphy wrote: > For sure - although I am now wondering whether "mapped" is perhaps a little > ambiguous in the naming, since the answer to "can I use the API" is yes even > when the device may currently be attached to an identity/passthrough domain >

Re: [PATCH 1/1] iommu/arm-smmu: Add support to use Last level cache

2018-12-07 Thread Vivek Gautam
Hi Robin, On Tue, Dec 4, 2018 at 8:51 PM Robin Murphy wrote: > > On 04/12/2018 11:01, Vivek Gautam wrote: > > Qualcomm SoCs have an additional level of cache called > > System cache, aka Last level cache (LLC). This cache sits right > > before the DDR, and is tightly coupled with the memory

Re: [PATCH v2 08/17] locking/lockdep: Add support for nestable terminal locks

2018-12-07 Thread Peter Zijlstra
On Mon, Nov 19, 2018 at 01:55:17PM -0500, Waiman Long wrote: > There are use cases where we want to allow nesting of one terminal lock > underneath another terminal-like lock. That new lock type is called > nestable terminal lock which can optionally allow the acquisition of > no more than one

Re: [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections

2018-12-07 Thread Peter Zijlstra
On Mon, Nov 19, 2018 at 01:55:16PM -0500, Waiman Long wrote: > The db->lock is a raw spinlock and so the lock hold time is supposed > to be short. This will not be the case when printk() is involved > in some of the critical sections. In order to avoid the long hold time, > in case some

Re: [PATCH v2 03/17] locking/lockdep: Add a new terminal lock type

2018-12-07 Thread Peter Zijlstra
On Mon, Nov 19, 2018 at 01:55:12PM -0500, Waiman Long wrote: > A terminal lock is a lock where further locking or unlocking on another > lock is not allowed. IOW, no forward dependency is permitted. > > With such a restriction in place, we don't really need to do a full > validation of the lock
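Here is a toy model of the rule as stated: once a terminal lock is held, any further acquisition is a violation. It tracks a single task's held locks in a LIFO stack and checks acquisitions only, although the quoted rule also forbids unlocking other locks. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one task's held-lock stack. */
struct lock_model { const char *name; bool terminal; };

#define MAX_HELD 8
static const struct lock_model *held[MAX_HELD];
static int depth;

/* Returns false (a "splat") if taking 'l' would create a forward dependency
 * from a terminal lock, which the terminal type forbids. A terminal lock,
 * when held, is always on top of the stack, since nothing may follow it. */
static bool acquire_model(const struct lock_model *l)
{
	if (depth > 0 && held[depth - 1]->terminal)
		return false;
	if (depth == MAX_HELD)
		return false;
	held[depth++] = l;
	return true;
}

/* Release the most recently acquired lock (strict LIFO in this model). */
static void release_model(void)
{
	if (depth > 0)
		depth--;
}
```

Nothing may be taken after a terminal lock, so no dependency chains lead out of one. This is why, as the patch argues, a full validation is unnecessary for these locks.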

Re: [PATCH v5 2/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging

2018-12-07 Thread Nicolas Boichat
On Fri, Dec 7, 2018 at 4:05 PM Matthew Wilcox wrote: > > On Fri, Dec 07, 2018 at 02:16:19PM +0800, Nicolas Boichat wrote: > > +#ifdef CONFIG_ZONE_DMA32 > > +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32 > > +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32 > > This name doesn't make any sense. Why

Re: [PATCH v5 2/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging

2018-12-07 Thread Matthew Wilcox
On Fri, Dec 07, 2018 at 02:16:19PM +0800, Nicolas Boichat wrote: > +#ifdef CONFIG_ZONE_DMA32 > +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32 > +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32 This name doesn't make any sense. Why not ARM_V7S_TABLE_SLAB_FLAGS ? > +#else > +#define