Re: [RFC] avoid indirect calls for DMA direct mappings v2

2018-12-10 Thread Christoph Hellwig
On Mon, Dec 10, 2018 at 01:51:13PM -0800, Luck, Tony wrote:
> But the ia64 build fails with:

Yes, I just got the same complaint from the buildbot. Unfortunately
I don't have a good ia64 cross-compiler locally, given that Debian
is lacking one, and the one provided by the buildbot doesn't build on
Debian either.

This should fix it:

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 2c51733f1dfd..a007afaa556c 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
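
For context, the point of the series is to avoid the dma_map_ops
indirect call whenever the device uses the direct mapping, by checking
for that case inline and branching straight to the dma-direct code.
Roughly, as a sketch of the idea rather than the exact patch (names
follow the dma-direct-calls branch; dma-debug hooks and sanity checks
omitted):

static inline dma_addr_t dma_map_page_attrs(struct device *dev,
		struct page *page, size_t offset, size_t size,
		enum dma_data_direction dir, unsigned long attrs)
{
	const struct dma_map_ops *ops = get_dma_ops(dev);

	/* one well-predicted branch instead of an indirect call */
	if (dma_is_direct(ops))
		return dma_direct_map_page(dev, page, offset, size, dir, attrs);
	return ops->map_page(dev, page, offset, size, dir, attrs);
}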
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 5/7] iommu: Add virtio-iommu driver

2018-12-10 Thread Michael S. Tsirkin
On Mon, Dec 10, 2018 at 03:06:47PM +, Jean-Philippe Brucker wrote:
> On 27/11/2018 18:53, Michael S. Tsirkin wrote:
> > On Tue, Nov 27, 2018 at 06:10:46PM +, Jean-Philippe Brucker wrote:
> >> On 27/11/2018 18:04, Michael S. Tsirkin wrote:
> >>> On Tue, Nov 27, 2018 at 05:50:50PM +, Jean-Philippe Brucker wrote:
>  On 23/11/2018 22:02, Michael S. Tsirkin wrote:
> >> +/*
> >> + * __viommu_sync_req - Complete all in-flight requests
> >> + *
> >> + * Wait for all added requests to complete. When this function 
> >> returns, all
> >> + * requests that were in-flight at the time of the call have 
> >> completed.
> >> + */
> >> +static int __viommu_sync_req(struct viommu_dev *viommu)
> >> +{
> >> +  int ret = 0;
> >> +  unsigned int len;
> >> +  size_t write_len;
> >> +  struct viommu_request *req;
> >> +  struct virtqueue *vq = viommu->vqs[VIOMMU_REQUEST_VQ];
> >> +
> >> +  assert_spin_locked(&viommu->request_lock);
> >> +
> >> +  virtqueue_kick(vq);
> >> +
> >> +  while (!list_empty(>requests)) {
> >> +  len = 0;
> >> +  req = virtqueue_get_buf(vq, &len);
> >> +  if (!req)
> >> +  continue;
> >> +
> >> +  if (!len)
> >> +  viommu_set_req_status(req->buf, req->len,
> >> +VIRTIO_IOMMU_S_IOERR);
> >> +
> >> +  write_len = req->len - req->write_offset;
> >> +  if (req->writeback && len == write_len)
> >> +  memcpy(req->writeback, req->buf + 
> >> req->write_offset,
> >> + write_len);
> >> +
> >> +  list_del(&req->list);
> >> +  kfree(req);
> >> +  }
> >
> > I didn't notice this in the past but it seems this will spin
> > with interrupts disabled until host handles the request.
> > Please do not do this - host execution can be another
> > task that needs the same host CPU. This will then disable
> > interrupts for a very very long time.
> 
>  In the guest yes, but that doesn't prevent the host from running another
>  task right?
> >>>
> >>> Doesn't prevent it but it will delay it significantly
> >>> until scheduler decides to kick the VCPU task out.
> >>>
>  My tests run fine when QEMU is bound to a single CPU, even
>  though vcpu and viommu run in different threads
> 
> > What to do then? Queue in software and wake up task.
> 
>  Unfortunately I can't do anything here, because IOMMU drivers can't
>  sleep in the iommu_map() or iommu_unmap() path.
> 
>  The problem is the same
>  for all IOMMU drivers. That's because the DMA API allows drivers to call
>  some functions with interrupts disabled. For example
>  Documentation/DMA-API-HOWTO.txt allows dma_alloc_coherent() and
>  dma_unmap_single() to be called in interrupt context.
> >>>
> >>> In fact I don't really understand how it's supposed to
> >>> work at all: you only sync when ring is full.
> >>> So host may not have seen your map request if ring
> >>> is not full.
> >>> Why is it safe to use the address with a device then?
> >>
> >> viommu_map() calls viommu_send_req_sync(), which does the sync
> >> immediately after adding the MAP request.
> >>
> >> Thanks,
> >> Jean
> > 
> > I see. So it happens on every request. Maybe you should clear
> > event index then. This way if exits are disabled you know that
> > host is processing the ring. Event index is good for when
> > you don't care when it will be processed, you just want
> > to reduce number of exits as much as possible.
> > 
> 
> I think that's already the case: since we don't attach a callback to the
> request queue, VRING_AVAIL_F_NO_INTERRUPT is set in avail_flags_shadow,
> which causes the used event index to stay clear.
> 
> Thanks,
> Jean

VRING_AVAIL_F_NO_INTERRUPT has no effect when the event index
feature has been negotiated. In any case, it also does not
affect kick notifications from guest - it affects
device interrupts.


-- 
MST
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC] avoid indirect calls for DMA direct mappings v2

2018-12-10 Thread Luck, Tony
On Fri, Dec 07, 2018 at 11:07:05AM -0800, Christoph Hellwig wrote:
> This work is based on the dma-mapping tree, so you probably want to
> use this git tree for testing:
> 
> git://git.infradead.org/users/hch/misc.git dma-direct-calls.2

Pulled this tree. Got HEAD

33b9fc015171 ("dma-mapping: bypass indirect calls for dma-direct")

But the ia64 build fails with:

arch/ia64/mm/init.c:75:21: warning: 'enum dma_data_direction' declared inside 
parameter list [enabled by default]
arch/ia64/mm/init.c:75:21: warning: its scope is only this definition or 
declaration, which is probably not what you want [enabled by default]
arch/ia64/mm/init.c:75:40: error: parameter 4 ('dir') has incomplete type
arch/ia64/mm/init.c:74:6: error: function declaration isn't a prototype 
[-Werror=strict-prototypes]
arch/ia64/mm/init.c: In function 'arch_sync_dma_for_cpu':
arch/ia64/mm/init.c:77:2: error: implicit declaration of function 
'__phys_to_pfn' [-Werror=implicit-function-declaration]
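
The first two warnings are plain C scoping: an enum type first named
inside a parameter list is scoped to that one declaration, so the type
stays incomplete everywhere else.  A minimal sketch, independent of the
kernel headers (the enumerators are the usual dma_data_direction values,
spelled out only to keep the example self-contained):

void f(enum dma_data_direction dir);
/* warning: 'enum dma_data_direction' declared inside parameter list */

enum dma_data_direction { DMA_BIDIRECTIONAL, DMA_TO_DEVICE,
			  DMA_FROM_DEVICE, DMA_NONE };

void g(enum dma_data_direction dir);	/* fine: type declared beforehand */

which is why adding the missing include fixes the build.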

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 5/7] x86/dma/amd-gart: Stop resizing dma_debug_entry pool

2018-12-10 Thread Thomas Gleixner
On Mon, 10 Dec 2018, Robin Murphy wrote:

> dma-debug is now capable of adding new entries to its pool on-demand if
> the initial preallocation was insufficient, so the IOMMU_LEAK logic no
> longer needs to explicitly change the pool size. This does lose it the
> ability to save a couple of megabytes of RAM by reducing the pool size
> below its default, but it seems unlikely that that is a realistic
> concern these days (or indeed that anyone is actively debugging AGP
> drivers' DMA usage any more). Getting rid of dma_debug_resize_entries()
> will make room for further streamlining in the dma-debug code itself.
> 
> Removing the call reveals quite a lot of cruft which has been useless
> for nearly a decade since commit 19c1a6f5764d ("x86 gart: reimplement
> IOMMU_LEAK feature by using DMA_API_DEBUG"), including the entire
> 'iommu=leak' parameter, which controlled nothing except whether
> dma_debug_resize_entries() was called or not.
> 
> CC: Thomas Gleixner 
> CC: Ingo Molnar 
> CC: Borislav Petkov 
> CC: "H. Peter Anvin" 
> CC: x...@kernel.org
> Reviewed-by: Christoph Hellwig 
> Signed-off-by: Robin Murphy 

Acked-by: Thomas Gleixner 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/2] swiotlb: add debugfs to track swiotlb buffer usage

2018-12-10 Thread Joe Jin
On 12/10/18 12:00 PM, Tim Chen wrote:
>> @@ -528,6 +538,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>>  dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", 
>> size);
>>  return SWIOTLB_MAP_ERROR;
>>  found:
>> +#ifdef CONFIG_DEBUG_FS
>> +io_tlb_used += nslots;
>> +#endif
> One nit I have about this patch is there are too many CONFIG_DEBUG_FS.
> 
> For example here, instead of io_tlb_used, we can have a macro defined,
> perhaps something like inc_iotlb_used(nslots).  It can be placed in the
> same section that swiotlb_create_debugfs is defined so there's a single
> place where all the CONFIG_DEBUG_FS stuff is located.
> 
> Then define inc_iotlb_used to be null when we don't have
> CONFIG_DEBUG_FS.
> 

Dongli removed the above ifdef/endif in his next patch, "[PATCH v2 2/2]
swiotlb: checking whether swiotlb buffer is full with io_tlb_used".

Thanks,
Joe
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/2] swiotlb: add debugfs to track swiotlb buffer usage

2018-12-10 Thread Tim Chen




On 12/9/18 4:37 PM, Dongli Zhang wrote:

> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 045930e..3979c2c 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -35,6 +35,9 @@
>  #include 
>  #include 
>  #include 
> +#ifdef CONFIG_DEBUG_FS
> +#include 
> +#endif

Seems like the #ifdef CONFIG_DEBUG_FS is better located inside debugfs.h
(which already provides no-op stubs when CONFIG_DEBUG_FS is off), instead
of requiring every file that includes it to wrap the #include in
#ifdef CONFIG_DEBUG_FS.

>  
>  #include 
>  #include 
> @@ -73,6 +76,13 @@ static phys_addr_t io_tlb_start, io_tlb_end;
>   */
>  static unsigned long io_tlb_nslabs;
>  
> +#ifdef CONFIG_DEBUG_FS
> +/*
> + * The number of used IO TLB block
> + */
> +static unsigned long io_tlb_used;
> +#endif
> +
>  /*
>   * This is a free list describing the number of free entries available from
>   * each index
> @@ -528,6 +538,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>   dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", 
> size);
>   return SWIOTLB_MAP_ERROR;
>  found:
> +#ifdef CONFIG_DEBUG_FS
> + io_tlb_used += nslots;
> +#endif

One nit I have about this patch is there are too many CONFIG_DEBUG_FS.

For example here, instead of io_tlb_used, we can have a macro defined,
perhaps something like inc_iotlb_used(nslots).  It can be placed in the
same section that swiotlb_create_debugfs is defined so there's a single
place where all the CONFIG_DEBUG_FS stuff is located.

Then define inc_iotlb_used to be null when we don't have
CONFIG_DEBUG_FS.
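
Roughly, as an untested sketch (the helpers would need to sit above
their first caller in swiotlb.c, and dec_iotlb_used() is the obvious
counterpart for the unmap path):

#ifdef CONFIG_DEBUG_FS
static unsigned long io_tlb_used;	/* number of used IO TLB slots */

static inline void inc_iotlb_used(int nslots)
{
	io_tlb_used += nslots;
}

static inline void dec_iotlb_used(int nslots)
{
	io_tlb_used -= nslots;
}
#else
static inline void inc_iotlb_used(int nslots) { }
static inline void dec_iotlb_used(int nslots) { }
#endif

so that swiotlb_tbl_map_single()/swiotlb_tbl_unmap_single() can call
them unconditionally.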

>   spin_unlock_irqrestore(&io_tlb_lock, flags);
>  
>   /*
> @@ -588,6 +601,10 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
> phys_addr_t tlb_addr,
>*/
>   for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != 
> IO_TLB_SEGSIZE -1) && io_tlb_list[i]; i--)
>   io_tlb_list[i] = ++count;
> +
> +#ifdef CONFIG_DEBUG_FS
> + io_tlb_used -= nslots;
> +#endif
>   }
>   spin_unlock_irqrestore(&io_tlb_lock, flags);
>  }
> @@ -883,3 +900,36 @@ const struct dma_map_ops swiotlb_dma_ops = {
>   .dma_supported  = dma_direct_supported,
>  };
>  EXPORT_SYMBOL(swiotlb_dma_ops);
> +
> +#ifdef CONFIG_DEBUG_FS
> +

Maybe move io_tlb_used definition here and define
inc_iotlb_used here.

> +static int __init swiotlb_create_debugfs(void)
> +{
> + static struct dentry *d_swiotlb_usage;
> + struct dentry *ent;
> +
> + d_swiotlb_usage = debugfs_create_dir("swiotlb", NULL);
> +
> + if (!d_swiotlb_usage)
> + return -ENOMEM;
> +
> + ent = debugfs_create_ulong("io_tlb_nslabs", 0400,
> +d_swiotlb_usage, &io_tlb_nslabs);
> + if (!ent)
> + goto fail;
> +
> + ent = debugfs_create_ulong("io_tlb_used", 0400,
> + d_swiotlb_usage, &io_tlb_used);
> + if (!ent)
> + goto fail;
> +
> + return 0;
> +
> +fail:
> + debugfs_remove_recursive(d_swiotlb_usage);
> + return -ENOMEM;
> +}
> +
> +late_initcall(swiotlb_create_debugfs);
> +
> +#endif
> 

Thanks.

Tim

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 6/6] sparc: merge 32-bit and 64-bit version of pci.h

2018-12-10 Thread David Miller
From: Christoph Hellwig 
Date: Mon, 10 Dec 2018 20:22:28 +0100

> On Mon, Dec 10, 2018 at 10:10:39AM -0800, David Miller wrote:
>> From: Christoph Hellwig 
>> Date: Mon, 10 Dec 2018 17:32:56 +0100
>> 
>> > Dave, can you pick the series up through the sparc tree?  I could also
>> > merge it through the dma-mapping tree, but given that there is no
> >> > dependency on it the sparc tree seems like the better fit.
>> 
>> I thought that some of this is a prerequisite for the DMA mapping
>> work and overall consolidation you are doing.  So I kinda assumed
>> you wanted to merge it via your tree.
>> 
>> I anticipate no conflicts at all, even until the next merge window,
>> so it could very easily go through your tree.
>> 
>> Let me know if you still want me to merge this.
> 
> These patches are purely cleanups I found while auditing the DMA
> mapping code, at least as of now there are no dependencies.
> 
> That being said now that I looked into it a bit more they do however
> depend on the ->mapping_error removal, so I'll take them through
> the dma-mapping tree.

Ok, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] dma-mapping: remove a pointless memset in dma_atomic_pool_init

2018-12-10 Thread Christoph Hellwig
ping?

On Tue, Dec 04, 2018 at 08:06:19AM -0800, Christoph Hellwig wrote:
> We already zero the memory after allocating it from the pool that
> this function fills, and having the memset here in this form means
> we can't support CMA highmem allocations (page_address() returns NULL
> for highmem pages with no kernel mapping).
> 
> Signed-off-by: Christoph Hellwig 
> Reported-by: Russell King - ARM Linux 
> ---
>  kernel/dma/remap.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c
> index 68a64e3ff6a1..6e784824326f 100644
> --- a/kernel/dma/remap.c
> +++ b/kernel/dma/remap.c
> @@ -121,7 +121,6 @@ int __init dma_atomic_pool_init(gfp_t gfp, pgprot_t prot)
>   if (!page)
>   goto out;
>  
> - memset(page_address(page), 0, atomic_pool_size);
>   arch_dma_prep_coherent(page, atomic_pool_size);
>  
>   atomic_pool = gen_pool_create(PAGE_SHIFT, -1);
> -- 
> 2.19.1
> 
> ___
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
---end quoted text---
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 02/10] arm64/iommu: don't remap contiguous allocations for coherent devices

2018-12-10 Thread Christoph Hellwig
On Mon, Dec 10, 2018 at 07:19:30PM +, Robin Murphy wrote:
> On 08/12/2018 17:36, Christoph Hellwig wrote:
>> There is no need to have an additional kernel mapping for a contiguous
>> allocation if the device already is DMA coherent, so skip it.
>
> FWIW, the "need" was that it kept the code in this path simple and the 
> mapping behaviour consistent with the regular iommu_dma_alloc() case. One 
> could quite easily retort that there is no need for the extra complexity of 
> this patch, since vmalloc is cheap on a 64-bit architecture ;)

Heh.  Well, without the remap we do less work, we prepare for a simple
implementation of DMA_ATTR_NON_CONSISTENT, and also prepare the code
to be more easily reusable by architectures that don't do remapping of
DMA allocations at all.

>>  if (addr) {
>>  memset(addr, 0, size);
>> -if (!coherent)
>> -__dma_flush_area(page_to_virt(page), iosize);
>> +__dma_flush_area(page_to_virt(page), iosize);
>
> Oh poo - seems I missed it at the time but the existing logic here is 
> wrong. Let me send a separate fix to flip those statements into the correct 
> order...

Yes, flushing the remapped alias only after zeroing it looks odd.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 6/6] sparc: merge 32-bit and 64-bit version of pci.h

2018-12-10 Thread Christoph Hellwig
On Mon, Dec 10, 2018 at 10:10:39AM -0800, David Miller wrote:
> From: Christoph Hellwig 
> Date: Mon, 10 Dec 2018 17:32:56 +0100
> 
> > Dave, can you pick the series up through the sparc tree?  I could also
> > merge it through the dma-mapping tree, but given that there is no
> > > dependency on it the sparc tree seems like the better fit.
> 
> I thought that some of this is a prerequisite for the DMA mapping
> work and overall consolidation you are doing.  So I kinda assumed
> you wanted to merge it via your tree.
> 
> I anticipate no conflicts at all, even until the next merge window,
> so it could very easily go through your tree.
> 
> Let me know if you still want me to merge this.

These patches are purely cleanups I found while auditing the DMA
mapping code, at least as of now there are no dependencies.

That being said now that I looked into it a bit more they do however
depend on the ->mapping_error removal, so I'll take them through
the dma-mapping tree.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 02/10] arm64/iommu: don't remap contiguous allocations for coherent devices

2018-12-10 Thread Robin Murphy

On 08/12/2018 17:36, Christoph Hellwig wrote:

There is no need to have an additional kernel mapping for a contiguous
allocation if the device already is DMA coherent, so skip it.


FWIW, the "need" was that it kept the code in this path simple and the 
mapping behaviour consistent with the regular iommu_dma_alloc() case. 
One could quite easily retort that there is no need for the extra 
complexity of this patch, since vmalloc is cheap on a 64-bit architecture ;)



Signed-off-by: Christoph Hellwig 
---
  arch/arm64/mm/dma-mapping.c | 35 ++-
  1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 4c0f498069e8..d39b60113539 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -255,13 +255,18 @@ static void *__iommu_alloc_attrs(struct device *dev, 
size_t size,
size >> PAGE_SHIFT);
return NULL;
}
+
+   if (coherent) {
+   memset(addr, 0, size);
+   return addr;
+   }
+
addr = dma_common_contiguous_remap(page, size, VM_USERMAP,
   prot,
   __builtin_return_address(0));
if (addr) {
memset(addr, 0, size);
-   if (!coherent)
-   __dma_flush_area(page_to_virt(page), iosize);
+   __dma_flush_area(page_to_virt(page), iosize);


Oh poo - seems I missed it at the time but the existing logic here is 
wrong. Let me send a separate fix to flip those statements into the 
correct order...


Robin.


} else {
iommu_dma_unmap_page(dev, *handle, iosize, 0, attrs);
dma_release_from_contiguous(dev, page,
@@ -309,7 +314,9 @@ static void __iommu_free_attrs(struct device *dev, size_t 
size, void *cpu_addr,
  
  		iommu_dma_unmap_page(dev, handle, iosize, 0, attrs);

dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT);
-   dma_common_free_remap(cpu_addr, size, VM_USERMAP);
+
+   if (!dev_is_dma_coherent(dev))
+   dma_common_free_remap(cpu_addr, size, VM_USERMAP);
} else if (is_vmalloc_addr(cpu_addr)){
struct vm_struct *area = find_vm_area(cpu_addr);
  
@@ -336,11 +343,12 @@ static int __iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,

return ret;
  
  	if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {

-   /*
-* DMA_ATTR_FORCE_CONTIGUOUS allocations are always remapped,
-* hence in the vmalloc space.
-*/
-   unsigned long pfn = vmalloc_to_pfn(cpu_addr);
+   unsigned long pfn;
+
+   if (dev_is_dma_coherent(dev))
+   pfn = virt_to_pfn(cpu_addr);
+   else
+   pfn = vmalloc_to_pfn(cpu_addr);
return __swiotlb_mmap_pfn(vma, pfn, size);
}
  
@@ -359,11 +367,12 @@ static int __iommu_get_sgtable(struct device *dev, struct sg_table *sgt,

struct vm_struct *area = find_vm_area(cpu_addr);
  
  	if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {

-   /*
-* DMA_ATTR_FORCE_CONTIGUOUS allocations are always remapped,
-* hence in the vmalloc space.
-*/
-   struct page *page = vmalloc_to_page(cpu_addr);
+   struct page *page;
+
+   if (dev_is_dma_coherent(dev))
+   page = virt_to_page(cpu_addr);
+   else
+   page = vmalloc_to_page(cpu_addr);
return __swiotlb_get_sgtable_page(sgt, page, size);
}
  


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 04/10] arm: implement DMA_ATTR_NON_CONSISTENT

2018-12-10 Thread Christoph Hellwig
On Sat, Dec 08, 2018 at 07:52:04PM -0300, Ezequiel Garcia wrote:
> >  #ifdef CONFIG_DMA_API_DEBUG
> > @@ -773,7 +791,7 @@ static void *__dma_alloc(struct device *dev, size_t 
> > size, dma_addr_t *handle,
> >  
> > if (cma)
> > buf->allocator = &cma_allocator;
> > -   else if (is_coherent)
> > +   else if (is_coherent || (attrs & DMA_ATTR_NON_CONSISTENT))
> > buf->allocator = &simple_allocator;
> 
> Reading through your code I can't really see where the pgprot is changed
> for non-consistent requests. Namely, __get_dma_pgprot only
> returns writecombine or coherent memory.

We don't look at the pgprot at all for the simple allocator, and
don't look at prot for the DMA_ATTR_NON_CONSISTENT case in the
CMA allocator, so this should not be a problem.  However we need to
take DMA_ATTR_NON_CONSISTENT into account for calculating the mmap
pgprot, with something like this as an incremental patch:

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index b3b66b41c450..6ac7e430a47c 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -873,7 +873,8 @@ int arm_dma_mmap(struct device *dev, struct vm_area_struct 
*vma,
 void *cpu_addr, dma_addr_t dma_addr, size_t size,
 unsigned long attrs)
 {
-   vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot);
+   if (!(attrs & DMA_ATTR_NON_CONSISTENT))
+   vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot);
return __arm_dma_mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
 }
 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: dma_declare_coherent_memory on main memory

2018-12-10 Thread Christoph Hellwig
On Mon, Dec 10, 2018 at 08:26:48AM +0100, Sascha Hauer wrote:
> > the ARM imx27/31 ports and various sh boards use
> > dma_declare_coherent_memory on main memory taken from the memblock
> > allocator.
> > 
> > Is there any good reason these couldn't be switched to CMA areas?
> > Getting rid of these magic dma_declare_coherent_memory area would
> > help making the dma allocator a lot simpler.
> 
> At least for i.MX27/31 I'd say this predates CMA support, so historical
> reasons.

Ok.  Do you still have test hardware for i.MX?  If so I'd love to see
if the patch below works.  Note that this is the brute force approach,
just having a global CMA pool might be a little more elegant.

diff --git a/arch/arm/configs/imx_v4_v5_defconfig 
b/arch/arm/configs/imx_v4_v5_defconfig
index 8661dd9b064a..88852e7e5e7e 100644
--- a/arch/arm/configs/imx_v4_v5_defconfig
+++ b/arch/arm/configs/imx_v4_v5_defconfig
@@ -185,3 +185,5 @@ CONFIG_NLS_ISO8859_1=y
 CONFIG_NLS_ISO8859_15=m
 CONFIG_FONTS=y
 CONFIG_FONT_8x8=y
+CONFIG_CMA=y
+CONFIG_DMA_CMA=y
diff --git a/arch/arm/configs/imx_v6_v7_defconfig 
b/arch/arm/configs/imx_v6_v7_defconfig
index 1ad5736c8fa6..16c8d717316c 100644
--- a/arch/arm/configs/imx_v6_v7_defconfig
+++ b/arch/arm/configs/imx_v6_v7_defconfig
@@ -444,3 +444,5 @@ CONFIG_PROVE_LOCKING=y
 # CONFIG_DEBUG_BUGVERBOSE is not set
 # CONFIG_FTRACE is not set
 # CONFIG_ARM_UNWIND is not set
+CONFIG_CMA=y
+CONFIG_DMA_CMA=y
diff --git a/arch/arm/mach-imx/mach-imx27_visstrim_m10.c 
b/arch/arm/mach-imx/mach-imx27_visstrim_m10.c
index 5169dfba9718..2339a50d5459 100644
--- a/arch/arm/mach-imx/mach-imx27_visstrim_m10.c
+++ b/arch/arm/mach-imx/mach-imx27_visstrim_m10.c
@@ -23,6 +23,7 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include 
 #include 
 #include 
 #include 
@@ -35,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -245,6 +247,7 @@ static phys_addr_t mx2_camera_base __initdata;
 static void __init visstrim_analog_camera_init(void)
 {
struct platform_device *pdev;
+   struct cma *cma;
 
gpio_set_value(TVP5150_PWDN, 1);
ndelay(1);
@@ -257,9 +260,10 @@ static void __init visstrim_analog_camera_init(void)
if (IS_ERR(pdev))
return;
 
-   dma_declare_coherent_memory(&pdev->dev, mx2_camera_base,
-   mx2_camera_base, MX2_CAMERA_BUF_SIZE,
-   DMA_MEMORY_EXCLUSIVE);
+   if (cma_init_reserved_mem(mx2_camera_base, MX2_CAMERA_BUF_SIZE, 0,
+   "visstrim-cam", &cma))
+   return;
+   dma_contiguous_early_fixup(mx2_camera_base, MX2_CAMERA_BUF_SIZE);
 }
 
 static void __init visstrim_reserve(void)
@@ -440,13 +444,16 @@ static const struct imx_ssi_platform_data 
visstrim_m10_ssi_pdata __initconst = {
 static void __init visstrim_coda_init(void)
 {
struct platform_device *pdev;
+   struct cma *cma;
 
pdev = imx27_add_coda();
-   dma_declare_coherent_memory(&pdev->dev,
-   mx2_camera_base + MX2_CAMERA_BUF_SIZE,
-   mx2_camera_base + MX2_CAMERA_BUF_SIZE,
-   MX2_CAMERA_BUF_SIZE,
-   DMA_MEMORY_EXCLUSIVE);
+   if (IS_ERR(pdev))
+   return;
+   if (cma_init_reserved_mem(mx2_camera_base + MX2_CAMERA_BUF_SIZE,
+   MX2_CAMERA_BUF_SIZE, 0, "visstrim-coda", &cma))
+   return;
+   dma_contiguous_early_fixup(mx2_camera_base + MX2_CAMERA_BUF_SIZE,
+   MX2_CAMERA_BUF_SIZE);
 }
 
 /* DMA deinterlace */
@@ -459,21 +466,22 @@ static void __init visstrim_deinterlace_init(void)
 {
int ret = -ENOMEM;
struct platform_device *pdev = &visstrim_deinterlace;
+   struct cma *cma;
 
ret = platform_device_register(pdev);
 
-   dma_declare_coherent_memory(&pdev->dev,
-   mx2_camera_base + 2 * MX2_CAMERA_BUF_SIZE,
-   mx2_camera_base + 2 * MX2_CAMERA_BUF_SIZE,
-   MX2_CAMERA_BUF_SIZE,
-   DMA_MEMORY_EXCLUSIVE);
+   if (cma_init_reserved_mem(mx2_camera_base + 2 * MX2_CAMERA_BUF_SIZE,
+   MX2_CAMERA_BUF_SIZE, 0, "visstrim-deinterlace", &cma))
+   return;
+   dma_contiguous_early_fixup(mx2_camera_base + 2 * MX2_CAMERA_BUF_SIZE,
+   MX2_CAMERA_BUF_SIZE);
 }
 
 /* Emma-PrP for format conversion */
 static void __init visstrim_emmaprp_init(void)
 {
struct platform_device *pdev;
-   int ret;
+   struct cma *cma;
 
pdev = imx27_add_mx2_emmaprp();
if (IS_ERR(pdev))
@@ -483,12 +491,12 @@ static void __init visstrim_emmaprp_init(void)
 * Use the same memory area as the analog camera since both
 * devices are, by nature, exclusive.
 */
-   ret = dma_declare_coherent_memory(&pdev->dev,
-   

Re: [PATCH 6/6] sparc: merge 32-bit and 64-bit version of pci.h

2018-12-10 Thread David Miller
From: Christoph Hellwig 
Date: Mon, 10 Dec 2018 17:32:56 +0100

> Dave, can you pick the series up through the sparc tree?  I could also
> merge it through the dma-mapping tree, but given that there is no
> dependency on it the sparc tree seems like the better fit.

I thought that some of this is a prerequisite for the DMA mapping
work and overall consolidation you are doing.  So I kinda assumed
you wanted to merge it via your tree.

I anticipate no conflicts at all, even until the next merge window,
so it could very easily go through your tree.

Let me know if you still want me to merge this.

Thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/2] swiotlb: add debugfs to track swiotlb buffer usage

2018-12-10 Thread Joe Jin
On 12/9/18 4:37 PM, Dongli Zhang wrote:
> The device driver will not be able to do DMA operations once the swiotlb
> buffer is full, either because the driver is using too many IO TLB blocks
> in flight, or because there is a memory leak in the device driver. Exporting
> the swiotlb buffer usage via debugfs helps the user estimate the swiotlb
> buffer size to pre-allocate, or analyze a device driver memory leak.
> 
> Signed-off-by: Dongli Zhang 

Reviewed-by: Joe Jin 

> ---
> Changed since v1:
>   * init debugfs with late_initcall (suggested by Robin Murphy)
>   * create debugfs entries with debugfs_create_ulong (suggested by Robin
> Murphy)
> 
>  kernel/dma/swiotlb.c | 50 ++
>  1 file changed, 50 insertions(+)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 045930e..3979c2c 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -35,6 +35,9 @@
>  #include 
>  #include 
>  #include 
> +#ifdef CONFIG_DEBUG_FS
> +#include 
> +#endif
>  
>  #include 
>  #include 
> @@ -73,6 +76,13 @@ static phys_addr_t io_tlb_start, io_tlb_end;
>   */
>  static unsigned long io_tlb_nslabs;
>  
> +#ifdef CONFIG_DEBUG_FS
> +/*
> + * The number of used IO TLB block
> + */
> +static unsigned long io_tlb_used;
> +#endif
> +
>  /*
>   * This is a free list describing the number of free entries available from
>   * each index
> @@ -528,6 +538,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>   dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", 
> size);
>   return SWIOTLB_MAP_ERROR;
>  found:
> +#ifdef CONFIG_DEBUG_FS
> + io_tlb_used += nslots;
> +#endif
>   spin_unlock_irqrestore(&io_tlb_lock, flags);
>  
>   /*
> @@ -588,6 +601,10 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
> phys_addr_t tlb_addr,
>*/
>   for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != 
> IO_TLB_SEGSIZE -1) && io_tlb_list[i]; i--)
>   io_tlb_list[i] = ++count;
> +
> +#ifdef CONFIG_DEBUG_FS
> + io_tlb_used -= nslots;
> +#endif
>   }
>   spin_unlock_irqrestore(&io_tlb_lock, flags);
>  }
> @@ -883,3 +900,36 @@ const struct dma_map_ops swiotlb_dma_ops = {
>   .dma_supported  = dma_direct_supported,
>  };
>  EXPORT_SYMBOL(swiotlb_dma_ops);
> +
> +#ifdef CONFIG_DEBUG_FS
> +
> +static int __init swiotlb_create_debugfs(void)
> +{
> + static struct dentry *d_swiotlb_usage;
> + struct dentry *ent;
> +
> + d_swiotlb_usage = debugfs_create_dir("swiotlb", NULL);
> +
> + if (!d_swiotlb_usage)
> + return -ENOMEM;
> +
> + ent = debugfs_create_ulong("io_tlb_nslabs", 0400,
> +d_swiotlb_usage, &io_tlb_nslabs);
> + if (!ent)
> + goto fail;
> +
> + ent = debugfs_create_ulong("io_tlb_used", 0400,
> + d_swiotlb_usage, &io_tlb_used);
> + if (!ent)
> + goto fail;
> +
> + return 0;
> +
> +fail:
> + debugfs_remove_recursive(d_swiotlb_usage);
> + return -ENOMEM;
> +}
> +
> +late_initcall(swiotlb_create_debugfs);
> +
> +#endif
> 


-- 
Oracle 
Joe Jin | Software Development Director 
ORACLE | Linux and Virtualization
500 Oracle Parkway Redwood City, CA US 94065
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/2] swiotlb: checking whether swiotlb buffer is full with io_tlb_used

2018-12-10 Thread Joe Jin
On 12/9/18 4:37 PM, Dongli Zhang wrote:
> This patch uses io_tlb_used to help check whether the swiotlb buffer is
> full. io_tlb_used is no longer used only for debugfs; it is also used to
> help optimize swiotlb_tbl_map_single().
> 
> Suggested-by: Joe Jin 
> Signed-off-by: Dongli Zhang 

Reviewed-by: Joe Jin 

> ---
>  kernel/dma/swiotlb.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 3979c2c..9300341 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -76,12 +76,10 @@ static phys_addr_t io_tlb_start, io_tlb_end;
>   */
>  static unsigned long io_tlb_nslabs;
>  
> -#ifdef CONFIG_DEBUG_FS
>  /*
>   * The number of used IO TLB block
>   */
>  static unsigned long io_tlb_used;
> -#endif
>  
>  /*
>   * This is a free list describing the number of free entries available from
> @@ -489,6 +487,10 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>* request and allocate a buffer from that IO TLB pool.
>*/
> + spin_lock_irqsave(&io_tlb_lock, flags);
> +
> + if (unlikely(nslots > io_tlb_nslabs - io_tlb_used))
> + goto not_found;
> +
>   index = ALIGN(io_tlb_index, stride);
>   if (index >= io_tlb_nslabs)
>   index = 0;
> @@ -538,9 +540,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>   dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", 
> size);
>   return SWIOTLB_MAP_ERROR;
>  found:
> -#ifdef CONFIG_DEBUG_FS
>   io_tlb_used += nslots;
> -#endif
>   spin_unlock_irqrestore(&io_tlb_lock, flags);
>  
>   /*
> @@ -602,9 +602,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
> phys_addr_t tlb_addr,
>   for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != 
> IO_TLB_SEGSIZE -1) && io_tlb_list[i]; i--)
>   io_tlb_list[i] = ++count;
>  
> -#ifdef CONFIG_DEBUG_FS
>   io_tlb_used -= nslots;
> -#endif
>   }
>   spin_unlock_irqrestore(&io_tlb_lock, flags);
>  }
> 


-- 
Oracle 
Joe Jin | Software Development Director 
ORACLE | Linux and Virtualization
500 Oracle Parkway Redwood City, CA US 94065
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 6/6] sparc: merge 32-bit and 64-bit version of pci.h

2018-12-10 Thread Christoph Hellwig
Dave, can you pick the series up through the sparc tree?  I could also
merge it through the dma-mapping tree, but given that there is no
dependency on it the sparc tree seems like the better fit.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 6/9] iommu/dma-iommu.c: Convert to use vm_insert_range

2018-12-10 Thread Robin Murphy

On 07/12/2018 20:41, Souptick Joarder wrote:

On Fri, Dec 7, 2018 at 7:17 PM Robin Murphy  wrote:


On 06/12/2018 18:43, Souptick Joarder wrote:

Convert to use vm_insert_range() to map range of kernel
memory to user vma.

Signed-off-by: Souptick Joarder 
Reviewed-by: Matthew Wilcox 
---
   drivers/iommu/dma-iommu.c | 13 +++--
   1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d1b0475..a2c65e2 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -622,17 +622,10 @@ struct page **iommu_dma_alloc(struct device *dev, size_t 
size, gfp_t gfp,

   int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct 
*vma)
   {
- unsigned long uaddr = vma->vm_start;
- unsigned int i, count = PAGE_ALIGN(size) >> PAGE_SHIFT;
- int ret = -ENXIO;
+ unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;

- for (i = vma->vm_pgoff; i < count && uaddr < vma->vm_end; i++) {
- ret = vm_insert_page(vma, uaddr, pages[i]);
- if (ret)
- break;
- uaddr += PAGE_SIZE;
- }
- return ret;
+ return vm_insert_range(vma, vma->vm_start,
+ pages + vma->vm_pgoff, count);


You also need to adjust count to compensate for the pages skipped by
vm_pgoff, otherwise you've got an out-of-bounds dereference triggered
from userspace, which is pretty high up the "not good" scale (not to
mention the entire call would then propagate -EFAULT back from
vm_insert_page() and thus always appear to fail for nonzero offsets).


So this should be something similar to ->

 return vm_insert_range(vma, vma->vm_start,
 pages + vma->vm_pgoff, count - vma->vm_pgoff);



Yup, I think that looks appropriate.

Robin.
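
For completeness, the call site with both corrections folded in, as a
sketch against the proposed vm_insert_range() (the vm_pgoff bounds check
preserves the old loop's -ENXIO result for offsets beyond the buffer):

int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct *vma)
{
	unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;

	/* refuse offsets past the end of the buffer, as the old loop did */
	if (vma->vm_pgoff >= count)
		return -ENXIO;

	return vm_insert_range(vma, vma->vm_start,
			       pages + vma->vm_pgoff, count - vma->vm_pgoff);
}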
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: use generic DMA mapping code in powerpc V4

2018-12-10 Thread Christian Zigotzky
Next step: 64ecd2c160bbef31465c4d34efc0f076a2aad4df (powerpc/dma: use 
phys_to_dma instead of get_dma_offset)


The P5020 board boots and the PASEMI onboard ethernet works.

-- Christian


On 09 December 2018 at 7:26PM, Christian Zigotzky wrote:
Next step: c1bfcad4b0cf38ce5b00f7ad880d3a13484c123a (dma-mapping, 
powerpc: simplify the arch dma_set_mask override)


Result: No problems with the PASEMI onboard ethernet and with booting 
the X5000 (P5020 board).


-- Christian


On 09 December 2018 at 3:20PM, Christian Zigotzky wrote:
Next step: 602307b034734ce77a05da4b99333a2eaf6b6482 (powerpc/fsl_pci: 
simplify fsl_pci_dma_set_mask)


git checkout 602307b034734ce77a05da4b99333a2eaf6b6482

The PASEMI onboard ethernet works and the X5000 boots.

-- Christian


On 08 December 2018 at 2:47PM, Christian Zigotzky wrote:
Next step: e15cd8173ef85e9cc3e2a9c7cc2982f5c1355615 (powerpc/dma: 
fix an off-by-one in dma_capable)


git checkout e15cd8173ef85e9cc3e2a9c7cc2982f5c1355615

The PASEMI onboard ethernet also works with this commit and the 
X5000 boots without any problems.


-- Christian


On 08 December 2018 at 11:29AM, Christian Zigotzky wrote:
Next step: 7ebc44c535f6bd726d553756d38b137acc718443 (powerpc/dma: 
remove max_direct_dma_addr)


git checkout 7ebc44c535f6bd726d553756d38b137acc718443

OK, the PASEMI onboard ethernet works and the P5020 board boots.

-- Christian


On 07 December 2018 at 7:33PM, Christian Zigotzky wrote:
Next step: 13c1fdec5682b6e13257277fa16aa31f342d167d (powerpc/dma: 
move pci_dma_dev_setup_swiotlb to fsl_pci.c)


git checkout 13c1fdec5682b6e13257277fa16aa31f342d167d

Result: The PASEMI onboard ethernet works and the P5020 board boots.

— Christian















___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 0/7] dma-debug cleanup and dynamic allocation

2018-12-10 Thread Qian Cai
On 12/10/18 9:00 AM, Robin Murphy wrote:
> Hi all,
> 
> Here's some assorted cleanup and improvements to dma-debug which grew
> out of the problem that certain drivers use very large numbers of DMA
> mappings, and knowing when to override "dma_debug_entries=..." and what
> value to override it with can be a less-than-obvious task for users.
> 
> The main part is patch #3, wherein we make dma-debug clever enough
> to allocate more entries dynamically if needed, such that the
> preallocation value becomes more of a quality-of-life option than a
> necessity. Patches #5 and #6 do some cruft-removal to allow patch #7
> to make the allocation behaviour more efficient in general.
> 
> Patches #1, #2 and #4 are some other cleanup and handy features which
> fell out of the discussion/development.
> 
> Robin.
> 
> 
> Robin Murphy (7):
>   dma-debug: Use pr_fmt()
>   dma-debug: Expose nr_total_entries in debugfs
>   dma-debug: Dynamically expand the dma_debug_entry pool
>   dma-debug: Make leak-like behaviour apparent
>   x86/dma/amd-gart: Stop resizing dma_debug_entry pool
>   dma/debug: Remove dma_debug_resize_entries()
>   dma-debug: Batch dma_debug_entry allocation
> 
>  Documentation/DMA-API.txt |  20 +-
>  Documentation/x86/x86_64/boot-options.txt |   5 +-
>  arch/x86/kernel/amd_gart_64.c |  23 ---
>  include/linux/dma-debug.h |   7 -
>  kernel/dma/debug.c| 217 ++
>  5 files changed, 109 insertions(+), 163 deletions(-)
> 

Tested-by: Qian Cai 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 5/7] iommu: Add virtio-iommu driver

2018-12-10 Thread Jean-Philippe Brucker
On 27/11/2018 18:53, Michael S. Tsirkin wrote:
> On Tue, Nov 27, 2018 at 06:10:46PM +, Jean-Philippe Brucker wrote:
>> On 27/11/2018 18:04, Michael S. Tsirkin wrote:
>>> On Tue, Nov 27, 2018 at 05:50:50PM +, Jean-Philippe Brucker wrote:
 On 23/11/2018 22:02, Michael S. Tsirkin wrote:
>> +/*
>> + * __viommu_sync_req - Complete all in-flight requests
>> + *
>> + * Wait for all added requests to complete. When this function returns, 
>> all
>> + * requests that were in-flight at the time of the call have completed.
>> + */
>> +static int __viommu_sync_req(struct viommu_dev *viommu)
>> +{
>> +int ret = 0;
>> +unsigned int len;
>> +size_t write_len;
>> +struct viommu_request *req;
>> +struct virtqueue *vq = viommu->vqs[VIOMMU_REQUEST_VQ];
>> +
>> +assert_spin_locked(&viommu->request_lock);
>> +
>> +virtqueue_kick(vq);
>> +
>> +while (!list_empty(>requests)) {
>> +len = 0;
>> +req = virtqueue_get_buf(vq, &len);
>> +if (!req)
>> +continue;
>> +
>> +if (!len)
>> +viommu_set_req_status(req->buf, req->len,
>> +  VIRTIO_IOMMU_S_IOERR);
>> +
>> +write_len = req->len - req->write_offset;
>> +if (req->writeback && len == write_len)
>> +memcpy(req->writeback, req->buf + 
>> req->write_offset,
>> +   write_len);
>> +
>> +list_del(&req->list);
>> +kfree(req);
>> +}
>
> I didn't notice this in the past but it seems this will spin
> with interrupts disabled until host handles the request.
> Please do not do this - host execution can be another
> task that needs the same host CPU. This will then disable
> interrupts for a very very long time.

 In the guest yes, but that doesn't prevent the host from running another
 task right?
>>>
>>> Doesn't prevent it but it will delay it significantly
>>> until scheduler decides to kick the VCPU task out.
>>>
 My tests run fine when QEMU is bound to a single CPU, even
 though vcpu and viommu run in different threads

> What to do then? Queue in software and wake up task.

 Unfortunately I can't do anything here, because IOMMU drivers can't
 sleep in the iommu_map() or iommu_unmap() path.

 The problem is the same
 for all IOMMU drivers. That's because the DMA API allows drivers to call
 some functions with interrupts disabled. For example
 Documentation/DMA-API-HOWTO.txt allows dma_alloc_coherent() and
 dma_unmap_single() to be called in interrupt context.
>>>
>>> In fact I don't really understand how it's supposed to
>>> work at all: you only sync when ring is full.
>>> So host may not have seen your map request if ring
>>> is not full.
>>> Why is it safe to use the address with a device then?
>>
>> viommu_map() calls viommu_send_req_sync(), which does the sync
>> immediately after adding the MAP request.
>>
>> Thanks,
>> Jean
> 
> I see. So it happens on every request. Maybe you should clear
> event index then. This way if exits are disabled you know that
> host is processing the ring. Event index is good for when
> you don't care when it will be processed, you just want
> to reduce number of exits as much as possible.
> 

I think that's already the case: since we don't attach a callback to the
request queue, VRING_AVAIL_F_NO_INTERRUPT is set in avail_flags_shadow,
which causes the used event index to stay clear.

Thanks,
Jean
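
To illustrate the constraint Jean mentions from a consumer's point of
view: a hypothetical driver fragment (names purely illustrative) that is
perfectly legal per Documentation/DMA-API-HOWTO.txt, and which ends up
in the IOMMU driver's unmap path with interrupts disabled:

#include <linux/dma-mapping.h>
#include <linux/interrupt.h>

struct mydev {
	struct device *dev;
	dma_addr_t rx_dma;
	size_t rx_len;
};

static irqreturn_t mydev_irq(int irq, void *data)
{
	struct mydev *md = data;

	/* hard-irq context: the IOMMU unmap path below must not sleep */
	dma_unmap_single(md->dev, md->rx_dma, md->rx_len, DMA_FROM_DEVICE);

	return IRQ_HANDLED;
}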
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 7/7] dma-debug: Batch dma_debug_entry allocation

2018-12-10 Thread Robin Murphy
DMA debug entries are one of those things which aren't that useful
individually - we will always want some larger quantity of them - and
which we don't really need to manage the exact number of - we only care
about having 'enough'. In that regard, the current behaviour of creating
them one-by-one leads to a lot of unwarranted function call overhead and
memory wasted on alignment padding.

Now that we don't have to worry about freeing anything via
dma_debug_resize_entries(), we can optimise the allocation behaviour by
grabbing whole pages at once, which will save considerably on the
aforementioned overheads, and probably offer a little more cache/TLB
locality benefit for traversing the lists under normal operation. This
should also give even less reason for an architecture-level override of
the preallocation size, so make the definition unconditional - if there
is still any desire to change the compile-time value for some platforms
it would be better off as a Kconfig option anyway.

Since freeing a whole page of entries at once becomes enough of a
challenge that it's not really worth complicating dma_debug_init(), we
may as well tweak the preallocation behaviour such that as long as we
manage to allocate *some* pages, we can leave debugging enabled on a
best-effort basis rather than otherwise wasting them.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Robin Murphy 
---

v3: Clean up #ifdef, misc. cosmetic tweaks, add Christoph's review tag

 Documentation/DMA-API.txt |  4 +++-
 kernel/dma/debug.c| 50 ---
 2 files changed, 24 insertions(+), 30 deletions(-)

diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index 7a7d8a415ce8..016eb6909b8a 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -747,7 +747,9 @@ driver afterwards. This filter can be disabled or changed 
later using debugfs.
 When the code disables itself at runtime this is most likely because it ran
 out of dma_debug_entries and was unable to allocate more on-demand. 65536
 entries are preallocated at boot - if this is too low for you boot with
-'dma_debug_entries=' to overwrite the default. The
+'dma_debug_entries=' to overwrite the default. Note
+that the code allocates entries in batches, so the exact number of
+preallocated entries may be greater than the actual number requested. The
 code will print to the kernel log each time it has dynamically allocated
 as many entries as were initially preallocated. This is to indicate that a
 larger preallocation size may be appropriate, or if it happens continually
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 36a42874b05f..20ab0f6c1b70 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -43,12 +43,9 @@
 #define HASH_FN_SHIFT   13
 #define HASH_FN_MASK(HASH_SIZE - 1)
 
-/* allow architectures to override this if absolutely required */
-#ifndef PREALLOC_DMA_DEBUG_ENTRIES
 #define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16)
-#endif
 /* If the pool runs out, add this many new entries at once */
-#define DMA_DEBUG_DYNAMIC_ENTRIES 256
+#define DMA_DEBUG_DYNAMIC_ENTRIES (PAGE_SIZE / sizeof(struct dma_debug_entry))
 
 enum {
dma_debug_single,
@@ -648,32 +645,22 @@ static void add_dma_entry(struct dma_debug_entry *entry)
 */
 }
 
-static int dma_debug_create_entries(u32 num_entries, gfp_t gfp)
+static int dma_debug_create_entries(gfp_t gfp)
 {
-   struct dma_debug_entry *entry, *next_entry;
+   struct dma_debug_entry *entry;
int i;
 
-   for (i = 0; i < num_entries; ++i) {
-   entry = kzalloc(sizeof(*entry), gfp);
-   if (!entry)
-   goto out_err;
+   entry = (void *)get_zeroed_page(gfp);
+   if (!entry)
+   return -ENOMEM;
 
-   list_add_tail(&entry->list, &free_entries);
-   }
+   for (i = 0; i < DMA_DEBUG_DYNAMIC_ENTRIES; i++)
+   list_add_tail(&entry[i].list, &free_entries);
 
-   num_free_entries += num_entries;
-   nr_total_entries += num_entries;
+   num_free_entries += DMA_DEBUG_DYNAMIC_ENTRIES;
+   nr_total_entries += DMA_DEBUG_DYNAMIC_ENTRIES;
 
return 0;
-
-out_err:
-
-   list_for_each_entry_safe(entry, next_entry, &free_entries, list) {
-   list_del(&entry->list);
-   kfree(entry);
-   }
-
-   return -ENOMEM;
 }
 
 static struct dma_debug_entry *__dma_entry_alloc(void)
@@ -715,8 +702,7 @@ static struct dma_debug_entry *dma_entry_alloc(void)
 
spin_lock_irqsave(&free_entries_lock, flags);
if (num_free_entries == 0) {
-   if (dma_debug_create_entries(DMA_DEBUG_DYNAMIC_ENTRIES,
-GFP_ATOMIC)) {
+   if (dma_debug_create_entries(GFP_ATOMIC)) {
global_disable = true;
spin_unlock_irqrestore(&free_entries_lock, flags);
pr_err("debugging out of memory - disabling\n");
@@ -987,7 +973,7 @@ void 

[PATCH v3 6/7] dma/debug: Remove dma_debug_resize_entries()

2018-12-10 Thread Robin Murphy
With the only caller now gone, we can clean up this part of dma-debug's
exposed internals and make way to tweak the allocation behaviour.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Robin Murphy 
---

v3: Add Christoph's review tag

 include/linux/dma-debug.h |  7 --
 kernel/dma/debug.c| 46 ---
 2 files changed, 53 deletions(-)

diff --git a/include/linux/dma-debug.h b/include/linux/dma-debug.h
index 30213adbb6b9..46e6131a72b6 100644
--- a/include/linux/dma-debug.h
+++ b/include/linux/dma-debug.h
@@ -30,8 +30,6 @@ struct bus_type;
 
 extern void dma_debug_add_bus(struct bus_type *bus);
 
-extern int dma_debug_resize_entries(u32 num_entries);
-
 extern void debug_dma_map_single(struct device *dev, const void *addr,
 unsigned long len);
 
@@ -101,11 +99,6 @@ static inline void dma_debug_add_bus(struct bus_type *bus)
 {
 }
 
-static inline int dma_debug_resize_entries(u32 num_entries)
-{
-   return 0;
-}
-
 static inline void debug_dma_map_single(struct device *dev, const void *addr,
unsigned long len)
 {
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 912c23f4c177..36a42874b05f 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -755,52 +755,6 @@ static void dma_entry_free(struct dma_debug_entry *entry)
spin_unlock_irqrestore(_entries_lock, flags);
 }
 
-int dma_debug_resize_entries(u32 num_entries)
-{
-   int i, delta, ret = 0;
-   unsigned long flags;
-   struct dma_debug_entry *entry;
-   LIST_HEAD(tmp);
-
-   spin_lock_irqsave(&free_entries_lock, flags);
-
-   if (nr_total_entries < num_entries) {
-   delta = num_entries - nr_total_entries;
-
-   spin_unlock_irqrestore(&free_entries_lock, flags);
-
-   for (i = 0; i < delta; i++) {
-   entry = kzalloc(sizeof(*entry), GFP_KERNEL);
-   if (!entry)
-   break;
-
-   list_add_tail(&entry->list, &tmp);
-   }
-
-   spin_lock_irqsave(&free_entries_lock, flags);
-
-   list_splice(&tmp, &free_entries);
-   nr_total_entries += i;
-   num_free_entries += i;
-   } else {
-   delta = nr_total_entries - num_entries;
-
-   for (i = 0; i < delta && !list_empty(&free_entries); i++) {
-   entry = __dma_entry_alloc();
-   kfree(entry);
-   }
-
-   nr_total_entries -= i;
-   }
-
-   if (nr_total_entries != num_entries)
-   ret = 1;
-
-   spin_unlock_irqrestore(&free_entries_lock, flags);
-
-   return ret;
-}
-
 /*
  * DMA-API debugging init code
  *
-- 
2.19.1.dirty

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 4/7] dma-debug: Make leak-like behaviour apparent

2018-12-10 Thread Robin Murphy
Now that we can dynamically allocate DMA debug entries to cope with
drivers maintaining excessively large numbers of live mappings, a driver
which *does* actually have a bug leaking mappings (and is not unloaded)
will no longer trigger the "DMA-API: debugging out of memory - disabling"
message until it gets to actual kernel OOM conditions, which means it
could go unnoticed for a while. To that end, let's inform the user each
time the pool has grown to a multiple of its initial size, which should
make it apparent that they either have a leak or might want to increase
the preallocation size.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Robin Murphy 
---

v3: Add Christoph's review tag

 Documentation/DMA-API.txt |  6 +-
 kernel/dma/debug.c| 13 +
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index 0fcb7561af1e..7a7d8a415ce8 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -747,7 +747,11 @@ driver afterwards. This filter can be disabled or changed 
later using debugfs.
 When the code disables itself at runtime this is most likely because it ran
 out of dma_debug_entries and was unable to allocate more on-demand. 65536
 entries are preallocated at boot - if this is too low for you boot with
-'dma_debug_entries=' to overwrite the default.
+'dma_debug_entries=' to overwrite the default. The
+code will print to the kernel log each time it has dynamically allocated
+as many entries as were initially preallocated. This is to indicate that a
+larger preallocation size may be appropriate, or if it happens continually
+that a driver may be leaking mappings.
 
 ::
 
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index ef7c90b7a346..912c23f4c177 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -691,6 +691,18 @@ static struct dma_debug_entry *__dma_entry_alloc(void)
return entry;
 }
 
+void __dma_entry_alloc_check_leak(void)
+{
+   u32 tmp = nr_total_entries % nr_prealloc_entries;
+
+   /* Shout each time we tick over some multiple of the initial pool */
+   if (tmp < DMA_DEBUG_DYNAMIC_ENTRIES) {
+   pr_info("dma_debug_entry pool grown to %u (%u00%%)\n",
+   nr_total_entries,
+   (nr_total_entries / nr_prealloc_entries));
+   }
+}
+
 /* struct dma_entry allocator
  *
  * The next two functions implement the allocator for
@@ -710,6 +722,7 @@ static struct dma_debug_entry *dma_entry_alloc(void)
pr_err("debugging out of memory - disabling\n");
return NULL;
}
+   __dma_entry_alloc_check_leak();
}
 
entry = __dma_entry_alloc();
-- 
2.19.1.dirty

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 5/7] x86/dma/amd-gart: Stop resizing dma_debug_entry pool

2018-12-10 Thread Robin Murphy
dma-debug is now capable of adding new entries to its pool on-demand if
the initial preallocation was insufficient, so the IOMMU_LEAK logic no
longer needs to explicitly change the pool size. This does lose it the
ability to save a couple of megabytes of RAM by reducing the pool size
below its default, but it seems unlikely that that is a realistic
concern these days (or indeed that anyone is actively debugging AGP
drivers' DMA usage any more). Getting rid of dma_debug_resize_entries()
will make room for further streamlining in the dma-debug code itself.

Removing the call reveals quite a lot of cruft which has been useless
for nearly a decade since commit 19c1a6f5764d ("x86 gart: reimplement
IOMMU_LEAK feature by using DMA_API_DEBUG"), including the entire
'iommu=leak' parameter, which controlled nothing except whether
dma_debug_resize_entries() was called or not.

CC: Thomas Gleixner 
CC: Ingo Molnar 
CC: Borislav Petkov 
CC: "H. Peter Anvin" 
CC: x...@kernel.org
Reviewed-by: Christoph Hellwig 
Signed-off-by: Robin Murphy 
---

v3: Add Christoph's review tag

 Documentation/x86/x86_64/boot-options.txt |  5 +
 arch/x86/kernel/amd_gart_64.c | 23 ---
 2 files changed, 1 insertion(+), 27 deletions(-)

diff --git a/Documentation/x86/x86_64/boot-options.txt 
b/Documentation/x86/x86_64/boot-options.txt
index ad6d2a80cf05..abc53886655e 100644
--- a/Documentation/x86/x86_64/boot-options.txt
+++ b/Documentation/x86/x86_64/boot-options.txt
@@ -209,7 +209,7 @@ IOMMU (input/output memory management unit)
   mapping with memory protection, etc.
   Kernel boot message: "PCI-DMA: Using Calgary IOMMU"
 
- iommu=[][,noagp][,off][,force][,noforce][,leak[=]
+ iommu=[][,noagp][,off][,force][,noforce]
[,memaper[=]][,merge][,fullflush][,nomerge]
[,noaperture][,calgary]
 
@@ -228,9 +228,6 @@ IOMMU (input/output memory management unit)
 allowedOverwrite iommu off workarounds for specific chipsets.
 fullflush  Flush IOMMU on each allocation (default).
 nofullflushDon't use IOMMU fullflush.
-leak   Turn on simple iommu leak tracing (only when
-   CONFIG_IOMMU_LEAK is on). Default number of leak pages
-   is 20.
 memaper[=]  Allocate an own aperture over RAM with size 32MB

[PATCH v3 3/7] dma-debug: Dynamically expand the dma_debug_entry pool

2018-12-10 Thread Robin Murphy
Certain drivers such as large multi-queue network adapters can use pools
of mapped DMA buffers larger than the default dma_debug_entry pool of
65536 entries, with the result that merely probing such a device can
cause DMA debug to disable itself during boot unless explicitly given an
appropriate "dma_debug_entries=..." option.

Developers trying to debug some other driver on such a system may not be
immediately aware of this, and at worst it can hide bugs if they fail to
realise that dma-debug has already disabled itself unexpectedly by the
time their code of interest gets to run. Even once they do realise, it
can be a bit of a pain to empirically determine a suitable number of
preallocated entries to configure, short of massively over-allocating.

There's really no need for such a static limit, though, since we can
quite easily expand the pool at runtime in those rare cases that the
preallocated entries are insufficient, which is arguably the least
surprising and most useful behaviour. To that end, refactor the
prealloc_memory() logic a little bit to generalise it for runtime
reallocations as well.

Signed-off-by: Robin Murphy 
---

v3: Drop refactoring of dma_debug_resize_entries(), misc. cosmetic tweaks

 Documentation/DMA-API.txt | 11 +++---
 kernel/dma/debug.c| 79 ---
 2 files changed, 46 insertions(+), 44 deletions(-)

diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index 6bdb095393b0..0fcb7561af1e 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -717,8 +717,8 @@ dma-api/num_errors  The number in this file shows 
how many
 dma-api/min_free_entries   This read-only file can be read to get the
minimum number of free dma_debug_entries the
allocator has ever seen. If this value goes
-   down to zero the code will disable itself
-   because it is not longer reliable.
+   down to zero the code will attempt to increase
+   nr_total_entries to compensate.
 
 dma-api/num_free_entries   The current number of free dma_debug_entries
in the allocator.
@@ -745,10 +745,9 @@ driver filter at boot time. The debug code will only print 
errors for that
 driver afterwards. This filter can be disabled or changed later using debugfs.
 
 When the code disables itself at runtime this is most likely because it ran
-out of dma_debug_entries. These entries are preallocated at boot. The number
-of preallocated entries is defined per architecture. If it is too low for you
-boot with 'dma_debug_entries=' to overwrite the
-architectural default.
+out of dma_debug_entries and was unable to allocate more on-demand. 65536
+entries are preallocated at boot - if this is too low for you boot with
+'dma_debug_entries=' to overwrite the default.
 
 ::
 
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 29486eb9d1dc..ef7c90b7a346 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -47,6 +47,8 @@
 #ifndef PREALLOC_DMA_DEBUG_ENTRIES
 #define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16)
 #endif
+/* If the pool runs out, add this many new entries at once */
+#define DMA_DEBUG_DYNAMIC_ENTRIES 256
 
 enum {
dma_debug_single,
@@ -646,6 +648,34 @@ static void add_dma_entry(struct dma_debug_entry *entry)
 */
 }
 
+static int dma_debug_create_entries(u32 num_entries, gfp_t gfp)
+{
+   struct dma_debug_entry *entry, *next_entry;
+   int i;
+
+   for (i = 0; i < num_entries; ++i) {
+   entry = kzalloc(sizeof(*entry), gfp);
+   if (!entry)
+   goto out_err;
+
+   list_add_tail(&entry->list, &free_entries);
+   }
+
+   num_free_entries += num_entries;
+   nr_total_entries += num_entries;
+
+   return 0;
+
+out_err:
+
+   list_for_each_entry_safe(entry, next_entry, &free_entries, list) {
+   list_del(&entry->list);
+   kfree(entry);
+   }
+
+   return -ENOMEM;
+}
+
 static struct dma_debug_entry *__dma_entry_alloc(void)
 {
struct dma_debug_entry *entry;
@@ -672,12 +702,14 @@ static struct dma_debug_entry *dma_entry_alloc(void)
unsigned long flags;
 
	spin_lock_irqsave(&free_entries_lock, flags);
-
-   if (list_empty(&free_entries)) {
-   global_disable = true;
-   spin_unlock_irqrestore(&free_entries_lock, flags);
-   pr_err("debugging out of memory - disabling\n");
-   return NULL;
+   if (num_free_entries == 0) {
+   if (dma_debug_create_entries(DMA_DEBUG_DYNAMIC_ENTRIES,
+GFP_ATOMIC)) {
+   global_disable = true;
+   spin_unlock_irqrestore(&free_entries_lock, flags);
+   pr_err("debugging out of memory - disabling\n");
+   return NULL;

[PATCH v3 2/7] dma-debug: Expose nr_total_entries in debugfs

2018-12-10 Thread Robin Murphy
Expose nr_total_entries in debugfs, so that {num,min}_free_entries
become even more meaningful to users interested in current/maximum
utilisation. This becomes more relevant once nr_total_entries may
change at runtime beyond just the existing AMD GART debug code.
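
The mechanism itself is a single line of debugfs plumbing. As a generic
sketch (the "things" directory and counter are made-up names for
illustration), debugfs_create_u32() binds a u32 variable directly to a
file, so no custom show() method is needed:

#include <linux/debugfs.h>

static u32 nr_things;

static int __init things_debugfs_init(void)
{
	struct dentry *dir = debugfs_create_dir("things", NULL);

	if (!dir)
		return -ENOMEM;
	/* 0444: read-only for everyone */
	debugfs_create_u32("nr_things", 0444, dir, &nr_things);
	return 0;
}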

Reviewed-by: Christoph Hellwig 
Suggested-by: John Garry 
Signed-off-by: Robin Murphy 
---

v3: Add Christoph's review tag

 Documentation/DMA-API.txt | 3 +++
 kernel/dma/debug.c| 7 +++
 2 files changed, 10 insertions(+)

diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index ac66ae2509a9..6bdb095393b0 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -723,6 +723,9 @@ dma-api/min_free_entries	This read-only file can be read to get the
 dma-api/num_free_entries   The current number of free dma_debug_entries
in the allocator.
 
+dma-api/nr_total_entries   The total number of dma_debug_entries in the
+   allocator, both free and used.
+
 dma-api/driver-filter  You can write a name of a driver into this file
to limit the debug output to requests from that
particular driver. Write an empty string to
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 91b84140e4a5..29486eb9d1dc 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -144,6 +144,7 @@ static struct dentry *show_all_errors_dent  __read_mostly;
 static struct dentry *show_num_errors_dent  __read_mostly;
 static struct dentry *num_free_entries_dent __read_mostly;
 static struct dentry *min_free_entries_dent __read_mostly;
+static struct dentry *nr_total_entries_dent __read_mostly;
 static struct dentry *filter_dent   __read_mostly;
 
 /* per-driver filter related state */
@@ -928,6 +929,12 @@ static int dma_debug_fs_init(void)
if (!min_free_entries_dent)
goto out_err;
 
+   nr_total_entries_dent = debugfs_create_u32("nr_total_entries", 0444,
+   dma_debug_dent,
+   &nr_total_entries);
+   if (!nr_total_entries_dent)
+   goto out_err;
+
filter_dent = debugfs_create_file("driver_filter", 0644,
  dma_debug_dent, NULL, &filter_fops);
if (!filter_dent)
-- 
2.19.1.dirty

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 1/7] dma-debug: Use pr_fmt()

2018-12-10 Thread Robin Murphy
Use pr_fmt() to generate the "DMA-API: " prefix consistently. This
results in it being added to a couple of pr_*() messages which were
missing it before, and for the err_printk() calls moves it to the actual
start of the message instead of somewhere in the middle.
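
For reference, the mechanism works as in the generic sketch below
("MYDRV" is a placeholder): pr_fmt() has to be defined before the
printk header is pulled in, after which every pr_*() call picks up the
prefix automatically:

#define pr_fmt(fmt) "MYDRV: " fmt

#include <linux/printk.h>

static void example(void)
{
	pr_info("ready\n");	/* logs "MYDRV: ready" */
}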

Reviewed-by: Christoph Hellwig 
Signed-off-by: Robin Murphy 
---

v3: No change

 kernel/dma/debug.c | 74 --
 1 file changed, 38 insertions(+), 36 deletions(-)

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 231ca4628062..91b84140e4a5 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -17,6 +17,8 @@
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
  */
 
+#define pr_fmt(fmt)"DMA-API: " fmt
+
 #include 
 #include 
 #include 
@@ -234,7 +236,7 @@ static bool driver_filter(struct device *dev)
error_count += 1;   \
if (driver_filter(dev) &&   \
(show_all_errors || show_num_errors > 0)) { \
-   WARN(1, "%s %s: " format,   \
+   WARN(1, pr_fmt("%s %s: ") format,   \
 dev ? dev_driver_string(dev) : "NULL", \
 dev ? dev_name(dev) : "NULL", ## arg); \
dump_entry_trace(entry);\
@@ -519,7 +521,7 @@ static void active_cacheline_inc_overlap(phys_addr_t cln)
 * prematurely.
 */
WARN_ONCE(overlap > ACTIVE_CACHELINE_MAX_OVERLAP,
- "DMA-API: exceeded %d overlapping mappings of cacheline 
%pa\n",
+ pr_fmt("exceeded %d overlapping mappings of cacheline %pa\n"),
  ACTIVE_CACHELINE_MAX_OVERLAP, );
 }
 
@@ -614,7 +616,7 @@ void debug_dma_assert_idle(struct page *page)
 
cln = to_cacheline_number(entry);
err_printk(entry->dev, entry,
-  "DMA-API: cpu touching an active dma mapped cacheline 
[cln=%pa]\n",
+  "cpu touching an active dma mapped cacheline [cln=%pa]\n",
   &cln);
 }
 
@@ -634,7 +636,7 @@ static void add_dma_entry(struct dma_debug_entry *entry)
 
rc = active_cacheline_insert(entry);
if (rc == -ENOMEM) {
-   pr_err("DMA-API: cacheline tracking ENOMEM, dma-debug 
disabled\n");
+   pr_err("cacheline tracking ENOMEM, dma-debug disabled\n");
global_disable = true;
}
 
@@ -673,7 +675,7 @@ static struct dma_debug_entry *dma_entry_alloc(void)
	if (list_empty(&free_entries)) {
global_disable = true;
spin_unlock_irqrestore(_entries_lock, flags);
-   pr_err("DMA-API: debugging out of memory - disabling\n");
+   pr_err("debugging out of memory - disabling\n");
return NULL;
}
 
@@ -777,7 +779,7 @@ static int prealloc_memory(u32 num_entries)
num_free_entries = num_entries;
min_free_entries = num_entries;
 
-   pr_info("DMA-API: preallocated %d debug entries\n", num_entries);
+   pr_info("preallocated %d debug entries\n", num_entries);
 
return 0;
 
@@ -850,7 +852,7 @@ static ssize_t filter_write(struct file *file, const char __user *userbuf,
 * switched off.
 */
if (current_driver_name[0])
-   pr_info("DMA-API: switching off dma-debug driver 
filter\n");
+   pr_info("switching off dma-debug driver filter\n");
current_driver_name[0] = 0;
current_driver = NULL;
goto out_unlock;
@@ -868,7 +870,7 @@ static ssize_t filter_write(struct file *file, const char __user *userbuf,
current_driver_name[i] = 0;
current_driver = NULL;
 
-   pr_info("DMA-API: enable driver filter for driver [%s]\n",
+   pr_info("enable driver filter for driver [%s]\n",
current_driver_name);
 
 out_unlock:
@@ -887,7 +889,7 @@ static int dma_debug_fs_init(void)
 {
dma_debug_dent = debugfs_create_dir("dma-api", NULL);
if (!dma_debug_dent) {
-   pr_err("DMA-API: can not create debugfs directory\n");
+   pr_err("can not create debugfs directory\n");
return -ENOMEM;
}
 
@@ -973,7 +975,7 @@ static int dma_debug_device_change(struct notifier_block *nb, unsigned long action, void *data)
	count = device_dma_allocations(dev, &entry);
if (count == 0)
break;
-   err_printk(dev, entry, "DMA-API: device driver has pending "
+   err_printk(dev, entry, "device driver has pending "
"DMA allocations while released from device "
"[count=%d]\n"
"One of leaked entries details: "
@@ -1023,14 +1025,14 @@ static int dma_debug_init(void)
  

[PATCH v3 0/7] dma-debug cleanup and dynamic allocation

2018-12-10 Thread Robin Murphy
Hi all,

Here's some assorted cleanup and improvements to dma-debug which grew
out of the problem that certain drivers use very large numbers of DMA
mappings, and knowing when to override "dma_debug_entries=..." and what
value to override it with can be a less-than-obvious task for users.

The main part is patch #3, wherein we make dma-debug clever enough
to allocate more entries dynamically if needed, such that the
preallocation value becomes more of a quality-of-life option than a
necessity. Patches #5 and #6 do some cruft-removal to allow patch #7
to make the allocation behaviour more efficient in general.

Patches #1, #2 and #4 are some other cleanup and handy features which
fell out of the discussion/development.

Robin.


Robin Murphy (7):
  dma-debug: Use pr_fmt()
  dma-debug: Expose nr_total_entries in debugfs
  dma-debug: Dynamically expand the dma_debug_entry pool
  dma-debug: Make leak-like behaviour apparent
  x86/dma/amd-gart: Stop resizing dma_debug_entry pool
  dma/debug: Remove dma_debug_resize_entries()
  dma-debug: Batch dma_debug_entry allocation

 Documentation/DMA-API.txt |  20 +-
 Documentation/x86/x86_64/boot-options.txt |   5 +-
 arch/x86/kernel/amd_gart_64.c |  23 ---
 include/linux/dma-debug.h |   7 -
 kernel/dma/debug.c| 217 ++
 5 files changed, 109 insertions(+), 163 deletions(-)

-- 
2.19.1.dirty

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()

2018-12-10 Thread Robin Murphy

On 30/11/2018 11:14, John Garry wrote:

From: Ganapatrao Kulkarni 

Change function __iommu_dma_alloc_pages() to allocate pages for DMA from
the device's NUMA node. The ternary operator which would otherwise be
needed for the alloc_pages_node() call is tidied up along with this.

The motivation for this change is to have a policy for page allocation
consistent with direct DMA mapping, which attempts to allocate pages local
to the device, as mentioned in [1].

In addition, a marginal performance improvement has been observed for
certain workloads: running tcrypt with the HiSilicon crypto engine
showed a 0.9% average throughput improvement with this patch.

We also include a modification to use kvzalloc() in place of the
kzalloc()/vzalloc() combination.

[1] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1692998.html
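
The essence of the change in isolation (a sketch with illustrative
names, not the patch itself) is to key the allocation on dev_to_node()
rather than on whichever CPU happens to run the mapping call:

/* alloc_pages_node() falls back to the current node if dev_to_node()
 * returns NUMA_NO_NODE, so no special-casing is needed here.
 */
static struct page *alloc_page_near_device(struct device *dev, gfp_t gfp,
					   unsigned int order)
{
	return alloc_pages_node(dev_to_node(dev), gfp, order);
}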


Reviewed-by: Robin Murphy 


Signed-off-by: Ganapatrao Kulkarni 
[JPG: Added kvzalloc(), drop pages ** being device local, remove ternary operator, update message]
Signed-off-by: John Garry 
---
Difference:
v1->v2:
- Add Ganapatrao's tag and change author

v2->v3:
- removed ternary operator
- stopped making pages ** allocation local to device

v3->v4:
- Update commit message to include motivation for patch, including
   headline performance improvement for test.

Some notes:
This patch was originally posted by Ganapatrao in [2].

However, after initial review, it was never reposted (due to lack of
cycles, I think). In addition, the functionality in its sibling patches
was merged through other patches, as mentioned in [2]; this also refers to a
discussion on device local allocations vs CPU local allocations for DMA
pool, and which is better [1].

However, as mentioned in [1], dma_alloc_coherent() uses the locality
information from the device - as in direct DMA - so this patch is just
applying this same policy.

And from some testing, mentioned in commit message, shows marginal
improvement.

[2] https://lore.kernel.org/patchwork/patch/833004/
[3] https://lkml.org/lkml/2018/8/22/391

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d1b0475..4afb1a8 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -449,20 +449,17 @@ static void __iommu_dma_free_pages(struct page **pages, int count)
kvfree(pages);
  }
  
-static struct page **__iommu_dma_alloc_pages(unsigned int count,
-		unsigned long order_mask, gfp_t gfp)
+static struct page **__iommu_dma_alloc_pages(struct device *dev,
+   unsigned int count, unsigned long order_mask, gfp_t gfp)
  {
struct page **pages;
-   unsigned int i = 0, array_size = count * sizeof(*pages);
+   unsigned int i = 0, nid = dev_to_node(dev);
  
	order_mask &= (2U << MAX_ORDER) - 1;
	if (!order_mask)
return NULL;
  
-	if (array_size <= PAGE_SIZE)
-   pages = kzalloc(array_size, GFP_KERNEL);
-   else
-   pages = vzalloc(array_size);
+   pages = kvzalloc(count * sizeof(*pages), GFP_KERNEL);
if (!pages)
return NULL;
  
@@ -481,10 +478,12 @@ static struct page **__iommu_dma_alloc_pages(unsigned int count,
for (order_mask &= (2U << __fls(count)) - 1;
 order_mask; order_mask &= ~order_size) {
unsigned int order = __fls(order_mask);
+   gfp_t alloc_flags = gfp;
  
			order_size = 1U << order;
-   page = alloc_pages((order_mask - order_size) ?
-  gfp | __GFP_NORETRY : gfp, order);
+   if (order_mask > order_size)
+   alloc_flags |= __GFP_NORETRY;
+   page = alloc_pages_node(nid, alloc_flags, order);
if (!page)
continue;
if (!order)
@@ -569,7 +568,8 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
alloc_sizes = min_size;
  
	count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   pages = __iommu_dma_alloc_pages(count, alloc_sizes >> PAGE_SHIFT, gfp);
+   pages = __iommu_dma_alloc_pages(dev, count, alloc_sizes >> PAGE_SHIFT,
+   gfp);
if (!pages)
return NULL;
  


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()

2018-12-10 Thread Joerg Roedel
On Fri, Dec 07, 2018 at 05:57:42PM +, John Garry wrote:
> A friendly reminder. Can you please let me know your position on this patch?

I am waiting for a Reviewed-by or at least Acked-by from Robin on it
before I put it in my tree.


Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 0/6] Auxiliary IOMMU domains and Arm SMMUv3

2018-12-10 Thread 'j...@8bytes.org'
Hi Lu Baolu,

On Mon, Dec 10, 2018 at 10:57:22AM +0800, Lu Baolu wrote:
> > /* Check if a device supports a given feature of the IOMMU-API */
> > bool iommu_dev_has_feature(struct device *dev, enum iommu_dev_features *feat);
> 
> Here we pass in a pointer of "enum iommu_dev_features", do we want
> to return anything here?

Yeah, this is a mistake on my side. The enum parameter should be passed
by value, so no pointer there. It doesn't need to return anything in
this parameter.
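
So the corrected prototype would presumably read:

/* Feature passed by value; the return value alone reports support */
bool iommu_dev_has_feature(struct device *dev, enum iommu_dev_features feat);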

Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 0/6] Auxiliary IOMMU domains and Arm SMMUv3

2018-12-10 Thread 'j...@8bytes.org'
Hi Kevin,

On Mon, Dec 10, 2018 at 02:06:44AM +, Tian, Kevin wrote:
> Can I interpret above as that you agree with the aux domain concept (i.e. one
> device can be linked to multiple domains) in general, and now we're just
> trying to address the remaining open in API level?

Yes, I thought about alternatives, but in the end they are all harder to
use than the aux-domain concept. We just need to make sure that we have
a clear definition of what the API extensions do and how they impact the
behavior of existing functions.

> > enum iommu_dev_features {
> > /* ... */
> > IOMMU_DEV_FEAT_AUX,
> > IOMMU_DEV_FEAT_SVA,
> > /* ... */
> > };
> > 
> 
> Does above represent whether a device implements aux/sva features,
> or whether a device has been enabled by driver to support aux/sva 
> features?

These represent whether the device together with the IOMMU support them,
basically whether these features are usable via the IOMMU-API.


> 
> > /* Check if a device supports a given feature of the IOMMU-API */
> > bool iommu_dev_has_feature(struct device *dev, enum iommu_dev_features *feat);
> 
> If the latter we also need iommu_dev_set_feature so driver can poke
> it based on its own configuration.

Do you mean we need an explicit enable-API for the features? I thought
about that too, but some features clearly don't need it and I didn't
want to complicate the usage. My assumption was that when the feature is
available, it is also enabled.
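
For illustration only, the explicit opt-in that Kevin hints at could be
a pair along these lines (a hypothetical sketch, nothing of the sort is
settled in this thread):

int iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features feat);
void iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features feat);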

> > /* So we need a iommu_aux_detach_all()? */
> 
> for what scenario?

Maybe for detaching any PASID users from a device so that it can be
assigned to a VM as a whole. But I am not sure it is needed.

Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: dma_declare_coherent_memory on main memory

2018-12-10 Thread Sascha Hauer
On Fri, Dec 07, 2018 at 04:34:32PM +0100, Christoph Hellwig wrote:
> Hi all,
> 
> the ARM imx27/31 ports and various sh boards use
> dma_declare_coherent_memory on main memory taken from the memblock
> allocator.
> 
> Is there any good reason these couldn't be switched to CMA areas?
> Getting rid of these magic dma_declare_coherent_memory area would
> help making the dma allocator a lot simpler.

At least for i.MX27/31 I'd say this predates CMA support, so historical
reasons.

Sascha

-- 
Pengutronix e.K.   | |
Industrial Linux Solutions | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0|
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu