Re: [RFC PATCH 1/7] vfio/spimdev: Add documents for WarpDrive framework

2018-08-07 Thread Kenneth Lee



On Monday, 6 August 2018 at 08:27 PM, Pavel Machek wrote:

Hi!


WarpDrive is a common user space accelerator framework.  Its main component
in the kernel is called spimdev, the Share Parent IOMMU Mediated Device. It exposes

spimdev is a really unfortunate name. It looks like it has something to do with
SPI, but it does not.


Yes. Let me change it to Share (IOMMU) Domain MDev, SDMdev:)

+++ b/Documentation/warpdrive/warpdrive.rst
@@ -0,0 +1,153 @@
+Introduction of WarpDrive
+=========================
+
+*WarpDrive* is a general accelerator framework built on top of vfio.
+It can be taken as a lightweight virtual function, which you can use without
+an *SR-IOV*-like facility and which can be shared among multiple processes.
+
+It can be used as a quick channel for accelerators, network adaptors or
+other hardware in user space. It can make some implementations simpler.  E.g.
+you can reuse most of the *netdev* driver and just share some ring buffer with
+the user space driver for *DPDK* or *ODP*. Or you can combine an RSA
+accelerator with the *netdev* in user space as a Web reverse proxy, etc.

What is DPDK? ODP?

DPDK:https://www.dpdk.org/about/
ODP: https://www.opendataplane.org/

will add the reference in the next RFC



+How does it work
+================
+
+*WarpDrive* treats the hardware accelerator as a heterogeneous processor which
+can take some of the load off the CPU:
+
+.. image:: wd.svg
+        :alt: This is a .svg image; if your browser cannot show it,
+              try to download and view it locally
+
+So it provides the capability for the user application to:
+
+1. Send requests to the hardware
+2. Share memory with the application and other accelerators
+
+These requirements can be fulfilled by VFIO if the accelerator can serve each
+application with a separate Virtual Function. But an *SR-IOV*-like VF (we will
+call it *HVF* hereinafter) design is too heavy for an accelerator which
+services thousands of processes.

VFIO? VF? HVF?

Also "gup" might be worth spelling out.

But I think the reference [1] has explained this.



+References
+==
+.. [1] According to the comment in mm/gup.c, *gup* is only safe within
+   a syscall, because it can only keep the physical memory in place
+   without making sure the VMA will always point to it. Maybe we should
+   raise the VM_PINNED patchset (see
+   https://lists.gt.net/linux/kernel/1931993) again to solve this problem.


I went through the docs, but I still don't know what it does.

Will refine the doc in the next RFC; hope it will help.


Pavel




Re: [RFC PATCH 3/7] vfio: add spimdev support

2018-08-07 Thread Kenneth Lee



On Tuesday, 7 August 2018 at 01:05 AM, Alex Williamson wrote:

On Mon, 6 Aug 2018 09:34:28 -0700
"Raj, Ashok"  wrote:


On Mon, Aug 06, 2018 at 09:49:40AM -0600, Alex Williamson wrote:

On Mon, 6 Aug 2018 09:40:04 +0800
Kenneth Lee  wrote:

1. It supports thousands of processes. Take the zip accelerator as an example: any
application that needs data compression/decompression will need to interact with the
accelerator. To support that, you would have to create tens of thousands of mdevs for
their usage. I don't think it is a good idea to have so many devices in the
system.

Each mdev is a device, regardless of whether there are hardware
resources committed to the device, so I don't understand this argument.


2. The application does not want to own the mdev for long. It just needs an
access point for the hardware service. If it has to interact with a management
agent for allocation and release, this makes the problem complex.

I don't see how the length of the usage plays a role here either.  Are
you concerned that the time it takes to create and remove an mdev is
significant compared to the usage time?  Userspace is certainly welcome
to create a pool of devices, but why should it be the kernel's
responsibility to dynamically assign resources to an mdev?  What's the
usage model when resources are unavailable?  It seems there's
complexity in either case, but it's generally userspace's responsibility
to impose a policy.
   

Can vfio devs created to represent an mdev be shared between several
processes?  It doesn't need to be exclusive.

The path to hardware is established by the processes binding to SVM and
the IOMMU ensuring that the PASID is plumbed properly.  One can think of the
same hardware as shared between several processes; the hardware knows the
isolation is via the PASID.

For these cases it isn't required to create a dev per process.

The iommu group is the unit of ownership, a vfio group mirrors an iommu
group, therefore a vfio group only allows a single open(2).  A group
also represents the minimum isolation set of devices, therefore devices
within a group are not considered isolated and must share the same
address space represented by the vfio container.  Beyond that, it is
possible to share devices among processes, but (I think) it generally
implies a hierarchical rather than peer relationship between
processes.  Thanks,
Actually, this is the key problem we are concerned about. Our logic was: the
PASID refers to the connection between the device and the process, so the
resource should be allocated only when the process "makes use of" the
device. This strategy also brings another advantage: the kernel driver
can also make use of the resource if no user application has it open.


We do have another branch that allocates resources to the mdev directly. It
does not look so nice (many mdevs, and a user agent is required for resource
management). If the conclusion here is to keep the mdev's original
semantics, we will send that branch for discussion in the next RFC.


Cheers
Kenneth


Alex




Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

2018-08-07 Thread Kenneth Lee



On Monday, 6 August 2018 at 11:32 PM, Jerome Glisse wrote:

On Mon, Aug 06, 2018 at 11:12:52AM +0800, Kenneth Lee wrote:

On Fri, Aug 03, 2018 at 10:39:44AM -0400, Jerome Glisse wrote:

On Fri, Aug 03, 2018 at 11:47:21AM +0800, Kenneth Lee wrote:

On Thu, Aug 02, 2018 at 10:22:43AM -0400, Jerome Glisse wrote:

On Thu, Aug 02, 2018 at 12:05:57PM +0800, Kenneth Lee wrote:

On Thu, Aug 02, 2018 at 02:33:12AM +, Tian, Kevin wrote:

On Wed, Aug 01, 2018 at 06:22:14PM +0800, Kenneth Lee wrote:

[...]


But the doorbell is just a notification. Except for DoS (making the hardware busy),
it cannot actually take or change anything in the kernel space. And the DoS
problem can always be seen as the problem of a group of processes sharing the
same kernel entity.

In the coming HIP09 hardware, the doorbell will come with a random number so
that only the process that allocated the queue can knock it correctly.

When the doorbell is rung, does the hardware start fetching commands from
the queue and executing them? If so, then a rogue process B might
ring the doorbell of process A, which would start execution of
random commands (i.e. whatever random memory values are left
inside the command buffer memory; they could be old commands, I guess).

If this is not how this doorbell works then, yes, it can only do
a denial of service, I guess. The issue I have with doorbells is that
I have seen 10 different implementations in 10 different pieces of hardware,
and each differs in what ringing the doorbell, or the value written to it,
actually does. It is painful to track what is what for each piece of hardware.


In our implementation, the doorbell is simply a notification, just like an interrupt
to the accelerator. The command is entirely defined by what is in the queue.

I agree that there is no simple and standard way to track the shared IO space.
But I think we have to trust the driver in some way. If the driver is malicious,
even a simple ioctl can become an attack.

Trusting a kernel space driver is fine; trusting a user space driver is
not, in my view. AFAICT every driver developer so far has always made
sure that someone could not abuse their device to do harmful things to
other processes.


Fully agree. That is why this driver shares only the doorbell space. Only
the doorbell is shared in the whole page, nothing else.

Maybe you are concerned that the user driver will give malicious commands to the
hardware? But these commands cannot influence other processes. If we can trust
the hardware design, the process cannot do any harm.

My question was what happens if a process B rings the doorbell of
process A.

On some hardware the value written to the doorbell is used as an
index into the command buffer. On others it just wakes up the hardware to go
look at a structure private to the process. There are other variations
on those themes.

If it is the former, i.e. the value is used to advance in the command
buffer, then a rogue process can force another process to advance its
command buffer, and what is in the command buffer can be some random
old memory values, which can be more harmful than just denial of
service.


Yes, we have considered that. There is no other information in the
doorbell. The indexes, such as head and tail pointers, are all in the
memory shared between the hardware and the user process. Other
processes cannot touch them.
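
As a rough illustration of the layout described above (an editorial sketch with
made-up names, not code from the WarpDrive patches or the real ABI): only the
doorbell register lives in the page that is shared with user space, while the
ring indexes and command slots stay in memory visible only to the owning
process and the device, so another process that writes the doorbell cannot
move the queue.

/*
 * Hypothetical sketch only; field names and sizes are illustrative.
 */
#include <stdint.h>

struct hw_cmd {
	uint64_t src, dst, len;         /* an example command format */
};

struct hw_doorbell {                    /* the only thing in the shared page */
	uint32_t ring;                  /* write-only "go look at the queue" */
};

struct hw_queue {                       /* visible only to owner + device */
	uint32_t head;                  /* consumer index, moved by hardware */
	uint32_t tail;                  /* producer index, moved by the owner */
	struct hw_cmd cmds[256];
};

/*
 * Only the queue owner can build commands and move the tail.  A stray
 * doorbell write from another process finds head == tail, so the
 * hardware has nothing to fetch; at worst it wastes a wakeup (DoS).
 */
static void submit(struct hw_queue *q, volatile struct hw_doorbell *db,
		   struct hw_cmd cmd)
{
	q->cmds[q->tail % 256] = cmd;
	q->tail++;
	db->ring = 1;                   /* pure notification, no index */
}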



My more general question is: do we want to grow VFIO to become
a more generic device driver API? This patchset adds a command
queue concept to it (I don't think it exists today, but I have
not followed VFIO closely).


The thing is, VFIO is the only place to support DMA from user land. If we don't
put it here, we have to create another similar facility to support the same.

No it is not: network devices, GPUs, block devices, ... they all do
support DMA. The point I am trying to make here is that even in

Sorry, wait a minute, are we talking about the same thing? I meant "DMA from user
land", not "DMA from the kernel driver". To do that we have to manipulate the
IOMMU (unit). I think it can only be done by the default_domain or a vfio domain, or
the user space has to directly access the IOMMU.

GPUs do DMA in the sense that you pass the kernel a valid
virtual address (the kernel driver does all the proper checks) and
then you can use the GPU to copy from or to that range of
virtual addresses. Exactly how you want to use this compression
engine. It does not rely on SVM, but SVM going forward would
still be the preferred option.


No, SVM is not the reason why we rely on Jean's SVM (SVA) series. We rely on
Jean's series because of its multi-process (PASID or substream ID) support.

But of course, WarpDrive can still benefit from the SVM feature.

We are getting sidetracked here. PASID/ID do not require VFIO.


Yes, PASID itself does not require VFIO. But what if we need to:

1. Support DMA from user space.
2. The hardware makes use of standard IOMMU/SMMU for IO address translation.
3. The IOMMU facility is shared by both kernel and user drivers.
4. Support PASID with the current IOMMU facility

your mechanisms the userspace must have a specific 

Re: [PATCH 1/3] hexagon: remove the sync_single_for_cpu DMA operation

2018-08-07 Thread Richard Kuo
On Tue, Aug 07, 2018 at 09:01:36AM +0200, Christoph Hellwig wrote:
> On Tue, Jul 31, 2018 at 05:22:29PM +0200, Christoph Hellwig wrote:
> > On Wed, Jul 25, 2018 at 06:39:27AM +0200, Christoph Hellwig wrote:
> > > On Tue, Jul 24, 2018 at 10:29:48PM -0500, Richard Kuo wrote:
> > > > Patch series looks good.  Definitely appreciate the cleanup.
> > > > 
> > > > I can take it through my tree, or if not:
> > > > 
> > > > Acked-by: Richard Kuo 
> > > 
> > > Please take it through your tree, thanks!
> > 
> > I haven't seen it in linux-next yet, do you still plan to take it?
> > 
> > Otherwise I'll merge it in the dma-mapping tree.
> 
> Given that I haven't seen this in linux-next nor haven't heard back
> from you I assume you are on your well deserved summer vacation.
> 
> If I don't hear back until tomorrow night I'll merge it through the
> dma-mapping tree, so that I have it in linux-next before heading out
> to my summer vacation.

I am here, and I have the patch queued up but it's waiting for approval
before I get to push it out through my tree.

That said, I am perfectly fine with this going through a different tree
if expedience is needed.


Thanks,
Richard Kuo


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, 
a Linux Foundation Collaborative Project


Re: [PATCH 1/2] arm64: dts: stratix10: Add Stratix10 SMMU support

2018-08-07 Thread Rob Herring
On Wed, Jul 25, 2018 at 01:24:36PM -0500, thor.tha...@linux.intel.com wrote:
> From: Thor Thayer 
> 
> Add SMMU support to the Stratix10 Device Tree which
> includes adding the SMMU node and adding IOMMU stream
> ids to the SMMU peripherals. Update bindings.
> 
> Signed-off-by: Thor Thayer 
> ---
> This patch is dependent on the patch series
> "iommu/arm-smmu: Add runtime pm/sleep support"
> (https://patchwork.ozlabs.org/cover/946160/)

I don't think so.

> ---
>  .../devicetree/bindings/iommu/arm,smmu.txt | 25 ++
>  arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi  | 30 ++
>  2 files changed, 55 insertions(+)

Reviewed-by: Rob Herring 


[PATCH v3 10/10] [SCSI] mpt3sas: replace chain_dma_pool

2018-08-07 Thread Tony Battersby
Replace chain_dma_pool with direct calls to dma_alloc_coherent() and
dma_free_coherent().  Since the chain lookup can involve hundreds of
thousands of allocations, it is worthwhile to avoid the overhead of the
dma_pool API.

Signed-off-by: Tony Battersby 
---

No changes since v1.

The original code called _base_release_memory_pools() before "goto out"
if dma_pool_alloc() failed, but this was unnecessary because
mpt3sas_base_attach() will call _base_release_memory_pools() after "goto
out_free_resources".  It may have been that way because the out-of-tree
vendor driver (from https://www.broadcom.com/support/download-search)
has a slightly-more-complicated error handler there that adjusts
max_request_credit, calls _base_release_memory_pools() and then does
"goto retry_allocation" under some circumstances, but that is missing
from the in-tree driver.

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c 
b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 569392d..2cb567a 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -4224,6 +4224,134 @@ void mpt3sas_base_clear_st(struct MPT3SAS_ADAPTER *ioc,
 }
 
 /**
+ * _base_release_chain_lookup - release chain_lookup memory pools
+ * @ioc: per adapter object
+ *
+ * Free memory allocated from _base_allocate_chain_lookup.
+ */
+static void
+_base_release_chain_lookup(struct MPT3SAS_ADAPTER *ioc)
+{
+   unsigned int chains_avail = 0;
+   struct chain_tracker *ct;
+   int i, j;
+
+   if (!ioc->chain_lookup)
+   return;
+
+   /*
+* NOTE
+*
+* To make this code easier to understand and maintain, the for loops
+* and the management of the chains_avail value are designed to be
+* similar to the _base_allocate_chain_lookup() function.  That way,
+* the code for freeing the memory is similar to the code for
+* allocating the memory.
+*/
+   for (i = 0; i < ioc->scsiio_depth; i++) {
+   if (!ioc->chain_lookup[i].chains_per_smid)
+   break;
+
+   for (j = ioc->chains_per_prp_buffer;
+   j < ioc->chains_needed_per_io; j++) {
+   /*
+* If chains_avail is 0, then the chain represents a
+* real allocation, so free it.
+*
+* If chains_avail is nonzero, then the chain was
+* initialized at an offset from a previous allocation,
+* so don't free it.
+*/
+   if (chains_avail == 0) {
+   ct = &ioc->chain_lookup[i].chains_per_smid[j];
+   if (ct->chain_buffer)
+   dma_free_coherent(
+   &ioc->pdev->dev,
+   ioc->chain_allocation_sz,
+   ct->chain_buffer,
+   ct->chain_buffer_dma);
+   chains_avail = ioc->chains_per_allocation;
+   }
+   chains_avail--;
+   }
+   kfree(ioc->chain_lookup[i].chains_per_smid);
+   }
+
+   kfree(ioc->chain_lookup);
+   ioc->chain_lookup = NULL;
+}
+
+/**
+ * _base_allocate_chain_lookup - allocate chain_lookup memory pools
+ * @ioc: per adapter object
+ * @total_sz: external value that tracks total amount of memory allocated
+ *
+ * Return: 0 success, anything else error
+ */
+static int
+_base_allocate_chain_lookup(struct MPT3SAS_ADAPTER *ioc, u32 *total_sz)
+{
+   unsigned int aligned_chain_segment_sz;
+   const unsigned int align = 16;
+   unsigned int chains_avail = 0;
+   struct chain_tracker *ct;
+   dma_addr_t dma_addr = 0;
+   void *vaddr = NULL;
+   int i, j;
+
+   /* Round up the allocation size for alignment. */
+   aligned_chain_segment_sz = ioc->chain_segment_sz;
+   if (aligned_chain_segment_sz % align != 0)
+   aligned_chain_segment_sz =
+   ALIGN(aligned_chain_segment_sz, align);
+
+   /* Allocate a page of chain buffers at a time. */
+   ioc->chain_allocation_sz =
+   max_t(unsigned int, aligned_chain_segment_sz, PAGE_SIZE);
+
+   /* Calculate how many chain buffers we can get from one allocation. */
+   ioc->chains_per_allocation =
+   ioc->chain_allocation_sz / aligned_chain_segment_sz;
+
+   for (i = 0; i < ioc->scsiio_depth; i++) {
+   for (j = ioc->chains_per_prp_buffer;
+   j < ioc->chains_needed_per_io; j++) {
+   /*
+* Check if there are any chain buffers left in the
+* previously-allocated block.
+*/
+   if 

[PATCH v3 09/10] dmapool: debug: prevent endless loop in case of corruption

2018-08-07 Thread Tony Battersby
Prevent a possible endless loop with DMAPOOL_DEBUG enabled if a buggy
driver corrupts DMA pool memory.

Signed-off-by: Tony Battersby 
---

Changes since v2:
This is closer to the improved version from "dmapool: reduce footprint
in struct page" in v2 thanks to a previous patch adding blks_per_alloc.

--- linux/mm/dmapool.c.orig 2018-08-06 17:52:53.0 -0400
+++ linux/mm/dmapool.c  2018-08-06 17:53:31.0 -0400
@@ -454,17 +454,39 @@ void dma_pool_free(struct dma_pool *pool
{
void *page_vaddr = vaddr - offset;
unsigned int chain = page->dma_free_off;
+   unsigned int free_blks = 0;
+
while (chain < pool->allocation) {
-   if (chain != offset) {
-   chain = *(int *)(page_vaddr + chain);
-   continue;
+   if (unlikely(chain == offset)) {
+   spin_unlock_irqrestore(&pool->lock, flags);
+   dev_err(pool->dev,
+   "dma_pool_free %s, dma %pad already free\n",
+   pool->name, &dma);
+   return;
+   }
+
+   /*
+* A buggy driver could corrupt the freelist by
+* use-after-free, buffer overflow, etc.  Besides
+* checking for corruption, this also prevents an
+* endless loop in case corruption causes a circular
+* loop in the freelist.
+*/
+   if (unlikely(++free_blks + page->dma_in_use >
+pool->blks_per_alloc)) {
+ freelist_corrupt:
+   spin_unlock_irqrestore(&pool->lock, flags);
+   dev_err(pool->dev,
+   "dma_pool_free %s, freelist corrupted\n",
+   pool->name);
+   return;
}
-   spin_unlock_irqrestore(&pool->lock, flags);
-   dev_err(pool->dev,
-   "dma_pool_free %s, dma %pad already free\n",
-   pool->name, &dma);
-   return;
+
+   chain = *(int *)(page_vaddr + chain);
}
+   if (unlikely(free_blks + page->dma_in_use !=
+pool->blks_per_alloc))
+   goto freelist_corrupt;
}
memset(vaddr, POOL_POISON_FREED, pool->size);
 #endif



[PATCH v3 08/10] dmapool: improve accuracy of debug statistics

2018-08-07 Thread Tony Battersby
The "total number of blocks in pool" debug statistic currently does not
take the boundary value into account, so it diverges from the "total
number of blocks in use" statistic when a boundary is in effect.  Add a
calculation for the number of blocks per allocation that takes the
boundary into account, and use it to replace the inaccurate calculation.

Signed-off-by: Tony Battersby 
---

This was split off from "dmapool: reduce footprint in struct page" in v2.

This depends on patch #1 "dmapool: fix boundary comparison" for the
calculated blks_per_alloc value to be correct.

The added blks_per_alloc value will also be used in the next patch.
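
As an aside (my own example, not part of the patch): with size = 768,
boundary = 2048 and allocation = 4096, each 2048-byte boundary chunk only
holds two 768-byte blocks, so one allocation really yields 4 usable blocks
even though allocation / size would report 5. A stand-alone sketch of the
two calculations:

/* Stand-alone sketch of the blks_per_alloc formula introduced by this
 * patch; the numbers are an illustrative example, not from a real pool. */
#include <stdio.h>

int main(void)
{
	unsigned int size = 768, boundary = 2048, allocation = 4096;

	/* Old statistic: ignores the boundary, over-reports capacity. */
	unsigned int old_total = allocation / size;              /* 5 */

	/* New statistic: whole boundary chunks plus the partial tail. */
	unsigned int blks_per_alloc =
		(allocation / boundary) * (boundary / size) +
		(allocation % boundary) / size;                  /* 2*2 + 0 = 4 */

	printf("old=%u, boundary-aware=%u\n", old_total, blks_per_alloc);
	return 0;
}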

--- linux/mm/dmapool.c.orig 2018-08-06 17:48:54.0 -0400
+++ linux/mm/dmapool.c  2018-08-06 17:52:53.0 -0400
@@ -61,6 +61,7 @@ struct dma_pool { /* the pool */
struct device *dev;
unsigned int allocation;
unsigned int boundary;
+   unsigned int blks_per_alloc;
char name[32];
struct list_head pools;
 };
@@ -105,8 +106,7 @@ show_pools(struct device *dev, struct de
/* per-pool info, no real statistics yet */
temp = scnprintf(next, size, "%-16s %4zu %4zu %4u %2u\n",
 pool->name, blocks,
-(size_t) pages *
-(pool->allocation / pool->size),
+(size_t) pages * pool->blks_per_alloc,
 pool->size, pages);
size -= temp;
next += temp;
@@ -182,6 +182,9 @@ struct dma_pool *dma_pool_create(const c
retval->size = size;
retval->boundary = boundary;
retval->allocation = allocation;
+   retval->blks_per_alloc =
+   (allocation / boundary) * (boundary / size) +
+   (allocation % boundary) / size;
 
INIT_LIST_HEAD(&retval->pools);
 



[PATCH v3 07/10] dmapool: cleanup integer types

2018-08-07 Thread Tony Battersby
To represent the size of a single allocation, dmapool currently uses
'unsigned int' in some places and 'size_t' in other places.  Standardize
on 'unsigned int' to reduce overhead, but use 'size_t' when counting all
the blocks in the entire pool.

Signed-off-by: Tony Battersby 
---

This was split off from "dmapool: reduce footprint in struct page" in v2.

This puts an upper bound on 'size' of INT_MAX to avoid overflowing the
following comparison in pool_initialize_free_block_list():

unsigned int offset = 0;
unsigned int next = offset + pool->size;
if (unlikely((next + pool->size) > ...

The actual maximum allocation size is probably lower anyway, probably
KMALLOC_MAX_SIZE, but that gets into the implementation details of other
subsystems which don't export a predefined maximum, so I didn't want to
hardcode it here.  The purpose of the added bounds check is to avoid
overflowing integers, not to check the actual
(platform/device/config-specific?) maximum allocation size.

'boundary' is passed in as a size_t but gets stored as an unsigned int. 
'boundary' values >= 'allocation' do not have any effect, so clipping
'boundary' to 'allocation' keeps it within the range of unsigned int
without affecting anything else.  A few lines above (not in the diff)
you can see that if 'boundary' is passed in as 0 then it is set to
'allocation', so it is nothing new.  For reference, here is the
relevant code after being patched:

if (!boundary)
boundary = allocation;
else if ((boundary < size) || (boundary & (boundary - 1)))
return NULL;

boundary = min(boundary, allocation);

--- linux/mm/dmapool.c.orig 2018-08-06 17:48:19.0 -0400
+++ linux/mm/dmapool.c  2018-08-06 17:48:54.0 -0400
@@ -57,10 +57,10 @@ struct dma_pool {   /* the pool */
 #define POOL_MAX_IDX2
struct list_head page_list[POOL_MAX_IDX];
spinlock_t lock;
-   size_t size;
+   unsigned int size;
struct device *dev;
-   size_t allocation;
-   size_t boundary;
+   unsigned int allocation;
+   unsigned int boundary;
char name[32];
struct list_head pools;
 };
@@ -86,7 +86,7 @@ show_pools(struct device *dev, struct de
mutex_lock(&pools_lock);
list_for_each_entry(pool, &dev->dma_pools, pools) {
unsigned pages = 0;
-   unsigned blocks = 0;
+   size_t blocks = 0;
int list_idx;
 
spin_lock_irq(&pool->lock);
@@ -103,9 +103,10 @@ show_pools(struct device *dev, struct de
spin_unlock_irq(&pool->lock);
 
/* per-pool info, no real statistics yet */
-   temp = scnprintf(next, size, "%-16s %4u %4zu %4zu %2u\n",
+   temp = scnprintf(next, size, "%-16s %4zu %4zu %4u %2u\n",
 pool->name, blocks,
-pages * (pool->allocation / pool->size),
+(size_t) pages *
+(pool->allocation / pool->size),
 pool->size, pages);
size -= temp;
next += temp;
@@ -150,7 +151,7 @@ struct dma_pool *dma_pool_create(const c
else if (align & (align - 1))
return NULL;
 
-   if (size == 0)
+   if (size == 0 || size > INT_MAX)
return NULL;
else if (size < 4)
size = 4;
@@ -165,6 +166,8 @@ struct dma_pool *dma_pool_create(const c
else if ((boundary < size) || (boundary & (boundary - 1)))
return NULL;
 
+   boundary = min(boundary, allocation);
+
retval = kmalloc_node(sizeof(*retval), GFP_KERNEL, dev_to_node(dev));
if (!retval)
return retval;
@@ -344,7 +347,7 @@ void *dma_pool_alloc(struct dma_pool *po
 {
unsigned long flags;
struct page *page;
-   size_t offset;
+   unsigned int offset;
void *retval;
void *vaddr;
 



[PATCH v3 03/10] dmapool: cleanup dma_pool_destroy

2018-08-07 Thread Tony Battersby
Remove a small amount of code duplication between dma_pool_destroy() and
pool_free_page() in preparation for adding more code without having to
duplicate it.  No functional changes.

Signed-off-by: Tony Battersby 
---

No changes since v2.

--- linux/mm/dmapool.c.orig 2018-08-02 09:59:15.0 -0400
+++ linux/mm/dmapool.c  2018-08-02 10:01:26.0 -0400
@@ -249,13 +249,22 @@ static inline bool is_page_busy(struct d
 
 static void pool_free_page(struct dma_pool *pool, struct dma_page *page)
 {
+   void *vaddr = page->vaddr;
dma_addr_t dma = page->dma;
 
+   list_del(&page->page_list);
+
+   if (is_page_busy(page)) {
+   dev_err(pool->dev,
+   "dma_pool_destroy %s, %p busy\n",
+   pool->name, vaddr);
+   /* leak the still-in-use consistent memory */
+   } else {
 #ifdef DMAPOOL_DEBUG
-   memset(page->vaddr, POOL_POISON_FREED, pool->allocation);
+   memset(vaddr, POOL_POISON_FREED, pool->allocation);
 #endif
-   dma_free_coherent(pool->dev, pool->allocation, page->vaddr, dma);
-   list_del(&page->page_list);
+   dma_free_coherent(pool->dev, pool->allocation, vaddr, dma);
+   }
kfree(page);
 }
 
@@ -269,6 +278,7 @@ static void pool_free_page(struct dma_po
  */
 void dma_pool_destroy(struct dma_pool *pool)
 {
+   struct dma_page *page;
bool empty = false;
 
if (unlikely(!pool))
@@ -284,19 +294,10 @@ void dma_pool_destroy(struct dma_pool *p
device_remove_file(pool->dev, &dev_attr_pools);
mutex_unlock(&pools_reg_lock);
 
-   while (!list_empty(&pool->page_list)) {
-   struct dma_page *page;
-   page = list_entry(pool->page_list.next,
- struct dma_page, page_list);
-   if (is_page_busy(page)) {
-   dev_err(pool->dev,
-   "dma_pool_destroy %s, %p busy\n",
-   pool->name, page->vaddr);
-   /* leak the still-in-use consistent memory */
-   list_del(&page->page_list);
-   kfree(page);
-   } else
-   pool_free_page(pool, page);
+   while ((page = list_first_entry_or_null(&pool->page_list,
+   struct dma_page,
+   page_list))) {
+   pool_free_page(pool, page);
}
 
kfree(pool);



[PATCH v3 06/10] dmapool: improve scalability of dma_pool_free

2018-08-07 Thread Tony Battersby
dma_pool_free() scales poorly when the pool contains many pages because
pool_find_page() does a linear scan of all allocated pages.  Improve its
scalability by replacing the linear scan with virt_to_page() and storing
dmapool private data directly in 'struct page', thereby eliminating
'struct dma_page'.  In big O notation, this improves the algorithm from
O(n^2) to O(n) while also reducing memory usage.

Thanks to Matthew Wilcox for the suggestion to use struct page.

Signed-off-by: Tony Battersby 
---

Changes since v2:
Just a re-diff after the changes in prior patches.

--- linux/include/linux/mm_types.h.orig 2018-08-01 17:59:46.0 -0400
+++ linux/include/linux/mm_types.h  2018-08-01 17:59:56.0 -0400
@@ -153,6 +153,12 @@ struct page {
unsigned long _zd_pad_1;/* uses mapping */
};
 
+   struct {/* dma_pool pages */
+   struct list_head dma_list;
+   dma_addr_t dma;
+   unsigned int dma_free_off;
+   };
+
/** @rcu_head: You can use this to free a page by RCU. */
struct rcu_head rcu_head;
};
@@ -174,6 +180,8 @@ struct page {
 
unsigned int active;/* SLAB */
int units;  /* SLOB */
+
+   unsigned int dma_in_use;/* dma_pool pages */
};
 
/* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
--- linux/mm/dmapool.c.orig 2018-08-03 17:47:03.0 -0400
+++ linux/mm/dmapool.c  2018-08-03 17:47:22.0 -0400
@@ -25,6 +25,10 @@
  * list tracks pages that have one or more free blocks, and the 'full' list
  * tracks pages that have no free blocks.  Pages are moved from one list to
  * the other as their blocks are allocated and freed.
+ *
+ * When allocating DMA pages, we use some available space in 'struct page' to
+ * store data private to dmapool; search 'dma_pool' in the definition of
+ * 'struct page' for details.
  */
 
 #include 
@@ -61,14 +65,6 @@ struct dma_pool {/* the pool */
struct list_head pools;
 };
 
-struct dma_page {  /* cacheable header for 'allocation' bytes */
-   struct list_head dma_list;
-   void *vaddr;
-   dma_addr_t dma;
-   unsigned int dma_in_use;
-   unsigned int dma_free_off;
-};
-
 static DEFINE_MUTEX(pools_lock);
 static DEFINE_MUTEX(pools_reg_lock);
 
@@ -95,7 +91,7 @@ show_pools(struct device *dev, struct de
 
spin_lock_irq(&pool->lock);
for (list_idx = 0; list_idx < POOL_MAX_IDX; list_idx++) {
-   struct dma_page *page;
+   struct page *page;
 
list_for_each_entry(page,
&pool->page_list[list_idx],
@@ -218,7 +214,7 @@ struct dma_pool *dma_pool_create(const c
 }
 EXPORT_SYMBOL(dma_pool_create);
 
-static void pool_initialise_page(struct dma_pool *pool, struct dma_page *page)
+static void pool_initialize_free_block_list(struct dma_pool *pool, void *vaddr)
 {
unsigned int offset = 0;
unsigned int next_boundary = pool->boundary;
@@ -229,47 +225,57 @@ static void pool_initialise_page(struct 
next = next_boundary;
next_boundary += pool->boundary;
}
-   *(int *)(page->vaddr + offset) = next;
+   *(int *)(vaddr + offset) = next;
offset = next;
} while (offset < pool->allocation);
 }
 
-static struct dma_page *pool_alloc_page(struct dma_pool *pool, gfp_t mem_flags)
+static struct page *pool_alloc_page(struct dma_pool *pool, gfp_t mem_flags)
 {
-   struct dma_page *page;
+   struct page *page;
+   dma_addr_t dma;
+   void *vaddr;
 
-   page = kmalloc(sizeof(*page), mem_flags);
-   if (!page)
+   vaddr = dma_alloc_coherent(pool->dev, pool->allocation, &dma,
+  mem_flags);
+   if (!vaddr)
return NULL;
-   page->vaddr = dma_alloc_coherent(pool->dev, pool->allocation,
-  &page->dma, mem_flags);
-   if (page->vaddr) {
+
 #ifdef DMAPOOL_DEBUG
-   memset(page->vaddr, POOL_POISON_FREED, pool->allocation);
+   memset(vaddr, POOL_POISON_FREED, pool->allocation);
 #endif
-   pool_initialise_page(pool, page);
-   page->dma_in_use = 0;
-   page->dma_free_off = 0;
-   } else {
-   kfree(page);
-   page = NULL;
-   }
+   pool_initialize_free_block_list(pool, vaddr);
+
+   page = virt_to_page(vaddr);
+   page->dma = dma;
+   page->dma_free_off = 0;
+   page->dma_in_use = 0;
+
return page;
 }
 
-static inline bool is_page_busy(struct dma_page *page)
+static inline bool is_page_busy(struct page *page)
 {
return page->dma_in_use != 0;
 }
 
-static void 

[PATCH v3 05/10] dmapool: rename fields in dma_page

2018-08-07 Thread Tony Battersby
Rename fields in 'struct dma_page' in preparation for moving them into
'struct page'.  No functional changes.

in_use -> dma_in_use
offset -> dma_free_off

Signed-off-by: Tony Battersby 
---

Changes since v2:
Use dma_free_off instead of dma_free_o.

--- linux/mm/dmapool.c.orig 2018-08-03 17:46:13.0 -0400
+++ linux/mm/dmapool.c  2018-08-03 17:46:24.0 -0400
@@ -65,8 +65,8 @@ struct dma_page { /* cacheable header f
struct list_head dma_list;
void *vaddr;
dma_addr_t dma;
-   unsigned int in_use;
-   unsigned int offset;
+   unsigned int dma_in_use;
+   unsigned int dma_free_off;
 };
 
 static DEFINE_MUTEX(pools_lock);
@@ -101,7 +101,7 @@ show_pools(struct device *dev, struct de
&pool->page_list[list_idx],
dma_list) {
pages++;
-   blocks += page->in_use;
+   blocks += page->dma_in_use;
}
}
spin_unlock_irq(&pool->lock);
@@ -248,8 +248,8 @@ static struct dma_page *pool_alloc_page(
memset(page->vaddr, POOL_POISON_FREED, pool->allocation);
 #endif
pool_initialise_page(pool, page);
-   page->in_use = 0;
-   page->offset = 0;
+   page->dma_in_use = 0;
+   page->dma_free_off = 0;
} else {
kfree(page);
page = NULL;
@@ -259,7 +259,7 @@ static struct dma_page *pool_alloc_page(
 
 static inline bool is_page_busy(struct dma_page *page)
 {
-   return page->in_use != 0;
+   return page->dma_in_use != 0;
 }
 
 static void pool_free_page(struct dma_pool *pool, struct dma_page *page)
@@ -362,10 +362,10 @@ void *dma_pool_alloc(struct dma_pool *po
 
list_add(&page->dma_list, &pool->page_list[POOL_AVAIL_IDX]);
  ready:
-   page->in_use++;
-   offset = page->offset;
-   page->offset = *(int *)(page->vaddr + offset);
-   if (page->offset >= pool->allocation)
+   page->dma_in_use++;
+   offset = page->dma_free_off;
+   page->dma_free_off = *(int *)(page->vaddr + offset);
+   if (page->dma_free_off >= pool->allocation)
/* Move page from the "available" list to the "full" list. */
list_move_tail(&page->dma_list,
   &pool->page_list[POOL_FULL_IDX]);
@@ -375,8 +375,8 @@ void *dma_pool_alloc(struct dma_pool *po
{
int i;
u8 *data = retval;
-   /* page->offset is stored in first 4 bytes */
-   for (i = sizeof(page->offset); i < pool->size; i++) {
+   /* page->dma_free_off is stored in first 4 bytes */
+   for (i = sizeof(page->dma_free_off); i < pool->size; i++) {
if (data[i] == POOL_POISON_FREED)
continue;
dev_err(pool->dev,
@@ -458,7 +458,7 @@ void dma_pool_free(struct dma_pool *pool
return;
}
{
-   unsigned int chain = page->offset;
+   unsigned int chain = page->dma_free_off;
while (chain < pool->allocation) {
if (chain != offset) {
chain = *(int *)(page->vaddr + chain);
@@ -474,12 +474,12 @@ void dma_pool_free(struct dma_pool *pool
memset(vaddr, POOL_POISON_FREED, pool->size);
 #endif
 
-   page->in_use--;
-   if (page->offset >= pool->allocation)
+   page->dma_in_use--;
+   if (page->dma_free_off >= pool->allocation)
/* Move page from the "full" list to the "available" list. */
list_move(&page->dma_list, &pool->page_list[POOL_AVAIL_IDX]);
-   *(int *)vaddr = page->offset;
-   page->offset = offset;
+   *(int *)vaddr = page->dma_free_off;
+   page->dma_free_off = offset;
/*
 * Resist a temptation to do
 *if (!is_page_busy(page)) pool_free_page(pool, page);



[PATCH v3 04/10] dmapool: improve scalability of dma_pool_alloc

2018-08-07 Thread Tony Battersby
dma_pool_alloc() scales poorly when allocating a large number of pages
because it does a linear scan of all previously-allocated pages before
allocating a new one.  Improve its scalability by maintaining a separate
list of pages that have free blocks ready to (re)allocate.  In big O
notation, this improves the algorithm from O(n^2) to O(n).

Signed-off-by: Tony Battersby 
---

Changes since v2:
*) Use list_move()/list_move_tail() instead of list_del+list_add().
*) Renamed POOL_N_LISTS to POOL_MAX_IDX.
*) Use defined names instead of 0/1 indexes for INIT_LIST_HEAD().

--- linux/mm/dmapool.c.orig 2018-08-03 16:16:49.0 -0400
+++ linux/mm/dmapool.c  2018-08-03 16:45:33.0 -0400
@@ -15,11 +15,16 @@
  * Many older drivers still have their own code to do this.
  *
  * The current design of this allocator is fairly simple.  The pool is
- * represented by the 'struct dma_pool' which keeps a doubly-linked list of
- * allocated pages.  Each page in the page_list is split into blocks of at
- * least 'size' bytes.  Free blocks are tracked in an unsorted singly-linked
- * list of free blocks within the page.  Used blocks aren't tracked, but we
- * keep a count of how many are currently allocated from each page.
+ * represented by the 'struct dma_pool'.  Each allocated page is split into
+ * blocks of at least 'size' bytes.  Free blocks are tracked in an unsorted
+ * singly-linked list of free blocks within the page.  Used blocks aren't
+ * tracked, but we keep a count of how many are currently allocated from each
+ * page.
+ *
+ * The pool keeps two doubly-linked list of allocated pages.  The 'available'
+ * list tracks pages that have one or more free blocks, and the 'full' list
+ * tracks pages that have no free blocks.  Pages are moved from one list to
+ * the other as their blocks are allocated and freed.
  */
 
 #include 
@@ -43,7 +48,10 @@
 #endif
 
 struct dma_pool {  /* the pool */
-   struct list_head page_list;
+#define POOL_FULL_IDX   0
+#define POOL_AVAIL_IDX  1
+#define POOL_MAX_IDX2
+   struct list_head page_list[POOL_MAX_IDX];
spinlock_t lock;
size_t size;
struct device *dev;
@@ -54,7 +62,7 @@ struct dma_pool { /* the pool */
 };
 
 struct dma_page {  /* cacheable header for 'allocation' bytes */
-   struct list_head page_list;
+   struct list_head dma_list;
void *vaddr;
dma_addr_t dma;
unsigned int in_use;
@@ -70,7 +78,6 @@ show_pools(struct device *dev, struct de
unsigned temp;
unsigned size;
char *next;
-   struct dma_page *page;
struct dma_pool *pool;
 
next = buf;
@@ -84,11 +91,18 @@ show_pools(struct device *dev, struct de
list_for_each_entry(pool, >dma_pools, pools) {
unsigned pages = 0;
unsigned blocks = 0;
+   int list_idx;
 
spin_lock_irq(&pool->lock);
-   list_for_each_entry(page, &pool->page_list, page_list) {
-   pages++;
-   blocks += page->in_use;
+   for (list_idx = 0; list_idx < POOL_MAX_IDX; list_idx++) {
+   struct dma_page *page;
+
+   list_for_each_entry(page,
+   &pool->page_list[list_idx],
+   dma_list) {
+   pages++;
+   blocks += page->in_use;
+   }
}
spin_unlock_irq(&pool->lock);
 
@@ -163,7 +177,8 @@ struct dma_pool *dma_pool_create(const c
 
retval->dev = dev;
 
-   INIT_LIST_HEAD(&retval->page_list);
+   INIT_LIST_HEAD(&retval->page_list[POOL_FULL_IDX]);
+   INIT_LIST_HEAD(&retval->page_list[POOL_AVAIL_IDX]);
spin_lock_init(&retval->lock);
retval->size = size;
retval->boundary = boundary;
@@ -252,7 +267,7 @@ static void pool_free_page(struct dma_po
void *vaddr = page->vaddr;
dma_addr_t dma = page->dma;
 
-   list_del(&page->page_list);
+   list_del(&page->dma_list);
 
if (is_page_busy(page)) {
dev_err(pool->dev,
@@ -278,8 +293,8 @@ static void pool_free_page(struct dma_po
  */
 void dma_pool_destroy(struct dma_pool *pool)
 {
-   struct dma_page *page;
bool empty = false;
+   int list_idx;
 
if (unlikely(!pool))
return;
@@ -294,10 +309,15 @@ void dma_pool_destroy(struct dma_pool *p
device_remove_file(pool->dev, &dev_attr_pools);
mutex_unlock(&pools_reg_lock);
 
-   while ((page = list_first_entry_or_null(&pool->page_list,
-   struct dma_page,
-   page_list))) {
-   pool_free_page(pool, page);
+   for (list_idx = 0; list_idx < POOL_MAX_IDX; list_idx++) {
+   struct dma_page *page;
+
+   while ((page = list_first_entry_or_null(
+  

[PATCH v3 02/10] dmapool: remove checks for dev == NULL

2018-08-07 Thread Tony Battersby
dmapool originally tried to support pools without a device because
dma_alloc_coherent() supports allocations without a device.  But nobody
ended up using dma pools without a device, so the current checks in
dmapool.c for pool->dev == NULL are both insufficient and causing bloat.
Remove them.

Signed-off-by: Tony Battersby 
---

Changes since v2:
*) This was "dmapool: cleanup error messages" in v2.
*) Remove one more check for dev == NULL in dma_pool_destroy() that is
   unrelated to error messages.

--- linux/mm/dmapool.c.orig 2018-08-03 16:12:23.0 -0400
+++ linux/mm/dmapool.c  2018-08-03 16:13:44.0 -0400
@@ -277,7 +277,7 @@ void dma_pool_destroy(struct dma_pool *p
mutex_lock(&pools_reg_lock);
mutex_lock(&pools_lock);
list_del(&pool->pools);
-   if (pool->dev && list_empty(&pool->dev->dma_pools))
+   if (list_empty(&pool->dev->dma_pools))
empty = true;
mutex_unlock(&pools_lock);
if (empty)
@@ -289,13 +289,9 @@ void dma_pool_destroy(struct dma_pool *p
page = list_entry(pool->page_list.next,
  struct dma_page, page_list);
if (is_page_busy(page)) {
-   if (pool->dev)
-   dev_err(pool->dev,
-   "dma_pool_destroy %s, %p busy\n",
-   pool->name, page->vaddr);
-   else
-   pr_err("dma_pool_destroy %s, %p busy\n",
-  pool->name, page->vaddr);
+   dev_err(pool->dev,
+   "dma_pool_destroy %s, %p busy\n",
+   pool->name, page->vaddr);
/* leak the still-in-use consistent memory */
list_del(&page->page_list);
kfree(page);
@@ -357,13 +353,9 @@ void *dma_pool_alloc(struct dma_pool *po
for (i = sizeof(page->offset); i < pool->size; i++) {
if (data[i] == POOL_POISON_FREED)
continue;
-   if (pool->dev)
-   dev_err(pool->dev,
-   "dma_pool_alloc %s, %p (corrupted)\n",
-   pool->name, retval);
-   else
-   pr_err("dma_pool_alloc %s, %p (corrupted)\n",
-   pool->name, retval);
+   dev_err(pool->dev,
+   "dma_pool_alloc %s, %p (corrupted)\n",
+   pool->name, retval);
 
/*
 * Dump the first 4 bytes even if they are not
@@ -418,13 +410,9 @@ void dma_pool_free(struct dma_pool *pool
page = pool_find_page(pool, dma);
if (!page) {
spin_unlock_irqrestore(&pool->lock, flags);
-   if (pool->dev)
-   dev_err(pool->dev,
-   "dma_pool_free %s, %p/%lx (bad dma)\n",
-   pool->name, vaddr, (unsigned long)dma);
-   else
-   pr_err("dma_pool_free %s, %p/%lx (bad dma)\n",
-  pool->name, vaddr, (unsigned long)dma);
+   dev_err(pool->dev,
+   "dma_pool_free %s, %p/%lx (bad dma)\n",
+   pool->name, vaddr, (unsigned long)dma);
return;
}
 
@@ -432,13 +420,9 @@ void dma_pool_free(struct dma_pool *pool
 #ifdef DMAPOOL_DEBUG
if ((dma - page->dma) != offset) {
spin_unlock_irqrestore(&pool->lock, flags);
-   if (pool->dev)
-   dev_err(pool->dev,
-   "dma_pool_free %s, %p (bad vaddr)/%pad\n",
-   pool->name, vaddr, &dma);
-   else
-   pr_err("dma_pool_free %s, %p (bad vaddr)/%pad\n",
-  pool->name, vaddr, &dma);
+   dev_err(pool->dev,
+   "dma_pool_free %s, %p (bad vaddr)/%pad\n",
+   pool->name, vaddr, &dma);
return;
}
{
@@ -449,12 +433,9 @@ void dma_pool_free(struct dma_pool *pool
continue;
}
spin_unlock_irqrestore(&pool->lock, flags);
-   if (pool->dev)
-   dev_err(pool->dev, "dma_pool_free %s, dma %pad already free\n",
-   pool->name, &dma);
-   else
-   pr_err("dma_pool_free %s, dma %pad already free\n",
-  pool->name, &dma);
+   dev_err(pool->dev,
+   "dma_pool_free %s, dma %pad already free\n",
+   pool->name, &dma);
  

[PATCH v3 01/10] dmapool: fix boundary comparison

2018-08-07 Thread Tony Battersby
Fix the boundary comparison when constructing the list of free blocks
for the case that 'size' is a power of two.  Since 'boundary' is also a
power of two, that would make 'boundary' a multiple of 'size', in which
case a single block would never cross the boundary.  This bug would
cause some of the allocated memory to be wasted (but not leaked).

Example:

size   = 512
boundary   = 2048
allocation = 4096

Address range
   0 -  511
 512 - 1023
1024 - 1535
1536 - 2047 *
2048 - 2559
2560 - 3071
3072 - 3583
3584 - 4095 *

Prior to this fix, the address ranges marked with "*" would not have
been used even though they didn't cross the given boundary.
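
To make the effect concrete, here is a small stand-alone simulation (my own
sketch mirroring the pool_initialise_page() free-list walk and the example
values above, not code from the patch) that counts how many free blocks each
comparison produces:

/* Counts free blocks created per allocation for the ">=" (old) and ">"
 * (fixed) comparisons; example values from the commit message. */
#include <stdio.h>
#include <stdbool.h>

static unsigned int count_blocks(unsigned int size, unsigned int boundary,
				 unsigned int allocation, bool buggy_cmp)
{
	unsigned int offset = 0, next_boundary = boundary, blocks = 0;

	do {
		unsigned int next = offset + size;
		bool crosses = buggy_cmp ? (next + size) >= next_boundary
					 : (next + size) >  next_boundary;
		if (crosses) {
			next = next_boundary;
			next_boundary += boundary;
		}
		blocks++;               /* block lives at [offset, offset+size) */
		offset = next;
	} while (offset < allocation);

	return blocks;
}

int main(void)
{
	/* size 512, boundary 2048, allocation 4096, as in the example */
	printf("with >= : %u blocks\n", count_blocks(512, 2048, 4096, true));
	printf("with >  : %u blocks\n", count_blocks(512, 2048, 4096, false));
	return 0;
}

With the old ">=" comparison this reports 6 blocks per 4096-byte allocation
(the two starred ranges are skipped); with ">" it reports the full 8.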

Fixes: e34f44b3517f ("pool: Improve memory usage for devices which can't cross 
boundaries")
Signed-off-by: Tony Battersby 
---

No changes since v2.

Even though I described this as a "fix", it does not seem important
enough to Cc: stable from a strict reading of the stable kernel rules. 
IOW, it is not "bothering" anyone.

--- linux/mm/dmapool.c.orig 2018-08-01 17:57:04.0 -0400
+++ linux/mm/dmapool.c  2018-08-01 17:57:16.0 -0400
@@ -210,7 +210,7 @@ static void pool_initialise_page(struct 
 
do {
unsigned int next = offset + pool->size;
-   if (unlikely((next + pool->size) >= next_boundary)) {
+   if (unlikely((next + pool->size) > next_boundary)) {
next = next_boundary;
next_boundary += pool->boundary;
}




[PATCH v3 00/10] mpt3sas and dmapool scalability

2018-08-07 Thread Tony Battersby
Major changes since v2:

*) Addressed review comments.

*) Changed the description of patch #2.

*) Dropped "dmapool: reduce footprint in struct page", but split off
   parts of it for merging (patches #7 and #8) and used it to improve
   patch #9.

---

drivers/scsi/mpt3sas is running into a scalability problem with the
kernel's DMA pool implementation.  With a LSI/Broadcom SAS 9300-8i
12Gb/s HBA and max_sgl_entries=256, during modprobe, mpt3sas does the
equivalent of:

chain_dma_pool = dma_pool_create(size = 128);
for (i = 0; i < 373959; i++)
{
dma_addr[i] = dma_pool_alloc(chain_dma_pool);
}

And at rmmod, system shutdown, or system reboot, mpt3sas does the
equivalent of:

for (i = 0; i < 373959; i++)
{
dma_pool_free(chain_dma_pool, dma_addr[i]);
}
dma_pool_destroy(chain_dma_pool);

With this usage, both dma_pool_alloc() and dma_pool_free() exhibit
O(n^2) complexity, although dma_pool_free() is much worse due to
implementation details.  On my system, the dma_pool_free() loop above
takes about 9 seconds to run.  Note that the problem was even worse
before commit 74522a92bbf0 ("scsi: mpt3sas: Optimize I/O memory
consumption in driver."), where the dma_pool_free() loop could take ~30
seconds.

mpt3sas also has some other DMA pools, but chain_dma_pool is the only
one with so many allocations:

cat /sys/devices/pci:80/:80:07.0/:85:00.0/pools
(manually cleaned up column alignment)
poolinfo - 0.1
reply_post_free_array pool  1  21 192 1
reply_free pool 1  1  41728   1
reply pool  1  1  1335296 1
sense pool  1  1  970272  1
chain pool  373959 386048 128 12064
reply_post_free pool12 12 166528  12

The first 9 patches in this series improve the scalability of the DMA
pool implementation, which significantly reduces the running time of the
DMA alloc/free loops.

The last patch modifies mpt3sas to replace chain_dma_pool with direct
calls to dma_alloc_coherent() and dma_free_coherent(), which reduces
its overhead even further.

The mpt3sas patch is independent of the dmapool patches; it can be used
with or without them.  If either the dmapool patches or the mpt3sas
patch is applied, then "modprobe mpt3sas", "rmmod mpt3sas", and system
shutdown/reboot with mpt3sas loaded are significantly faster.  Here are
some benchmarks (of DMA alloc/free only, not the entire modprobe/rmmod):

dma_pool_create() + dma_pool_alloc() loop, size = 128, count = 373959
  original:350 ms ( 1x)
  dmapool patches:  17 ms (21x)
  mpt3sas patch: 7 ms (51x)

dma_pool_free() loop + dma_pool_destroy(), size = 128, count = 373959
  original:8901 ms (   1x)
  dmapool patches:   15 ms ( 618x)
  mpt3sas patch:  2 ms (4245x)

Considering that LSI/Broadcom offer an out-of-tree vendor driver that
works across multiple kernel versions that won't get the dmapool
patches, it may be worth it for them to patch mpt3sas to avoid the
problem on older kernels.  The downside is that the code is a bit more
complicated.  So I leave it to their judgement whether they think it is
worth it to apply the mpt3sas patch.



Re: [PATCH 1/2] iommu/rockchip: Handle errors returned from PM framework

2018-08-07 Thread Marc Zyngier
On 07/08/18 14:15, Heiko Stuebner wrote:
> On Tuesday, 7 August 2018 at 14:31:49 CEST, Marc Zyngier wrote:
>> On 07/08/18 13:09, Heiko Stuebner wrote:
>>> Hi Marc,
>>>
>>> On Tuesday, 7 August 2018 at 10:54:05 CEST, Marc Zyngier wrote:
 pm_runtime_get_if_in_use can fail: either PM has been disabled
 altogether (-EINVAL), or the device hasn't been enabled yet (0).
 Sadly, the Rockchip IOMMU driver tends to conflate the two things
 by considering a non-zero return value as successful.

 This has the consequence of hiding other bugs, so let's handle this
 case throughout the driver, with a WARN_ON_ONCE so that we can try
 and work out what happened.

 Fixes: 0f181d3cf7d98 ("iommu/rockchip: Add runtime PM support")
 Signed-off-by: Marc Zyngier 
>>>
>>> I'm still not sure about the !CONFIG_PM case, as it was probably silently
>>> working in that case before
>>
>> Do we agree that this is an orthogonal problem though?
> 
> Nope ;-)  I.e. right now the code ignores the -EINVAL from disabled PM
> and continues, possibly even handling the irq correctly.

Ah, I now see what you mean. Yeah, this is a bit rubbish. It would have
been better if the API returned something more sensible in that case,
but that's a bit late...

> Whether it actually worked is a different matter, as I guess nobody really
> tried with !PM in the past.

I don't think anyone noticed. !CONFIG_PM on something like rk3399
probably isn't very popular, and certainly comes for free on a
multiplatform kernel.

> Now with error-handling we always return IRQ_NONE for !PM.

Yup.

>>> But on the other hand we're also already running over it in other places
>>> like in the iommu-shutdown and I guess if someone _really_ disabled
>>> CONFIG_PM, a lot of additional stuff would fail anyway.
>>>
>>> So should we wrap that in some #ifdef magic, just ignore it or simply
>>> select PM similar to what Tegra, Renesas and Vexpress seem to do?
>>>
>>> I guess I like the 3rd option best ;-)
>>
>> It probably doesn't hurt. At what level do you want it? As a dependency
>> to the IOMMU? or to the platform?
> 
> I guess it might be best to go the Tegra, etc. way. Who in their right
> mind would want to drive a mobile platform without any form of power
> management ;-) .
> 
> I can do these patches for arm32+arm64 myself ... I just wanted to put
> that thought out there - in case that was just a stupid idea of mine :-D .

Not stupid at all. Regarding this very patch: where do you want me to
take it?

M.
-- 
Jazz is not dead. It just smells funny...


Re: DMA related cleanups for IA64

2018-08-07 Thread Christoph Hellwig
Tony, Fenghua,

any chance you could look over these patches for 4.19?  That would
make my life for the next merge window a lot easier.

On Wed, Aug 01, 2018 at 06:02:43PM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> this is a resend of the last two series plus additional cleanups.  The
> driver of it were the odd dma barriers in the ia64 sync_single_* methods,
> but once I started to look into that area the fallback got bigger and
> bigger..
---end quoted text---


Re: [PATCH 1/2] iommu/rockchip: Handle errors returned from PM framework

2018-08-07 Thread Heiko Stuebner
On Tuesday, 7 August 2018 at 14:31:49 CEST, Marc Zyngier wrote:
> On 07/08/18 13:09, Heiko Stuebner wrote:
> > Hi Marc,
> > 
> > On Tuesday, 7 August 2018 at 10:54:05 CEST, Marc Zyngier wrote:
> >> pm_runtime_get_if_in_use can fail: either PM has been disabled
> >> altogether (-EINVAL), or the device hasn't been enabled yet (0).
> >> Sadly, the Rockchip IOMMU driver tends to conflate the two things
> >> by considering a non-zero return value as successful.
> >>
> >> This has the consequence of hiding other bugs, so let's handle this
> >> case throughout the driver, with a WARN_ON_ONCE so that we can try
> >> and work out what happened.
> >>
> >> Fixes: 0f181d3cf7d98 ("iommu/rockchip: Add runtime PM support")
> >> Signed-off-by: Marc Zyngier 
> > 
> > I'm still not sure about the !CONFIG_PM case, as it was probably silently
> > working in that case before
> 
> Do we agree that this is an orthogonal problem though?

Nope ;-)  I.e. right now the code ignores the -EINVAL from disabled PM
and continues, possibly even handling the irq correctly.

Whether it actually worked is a different matter, as I guess nobody really
tried with !PM in the past.

Now with error-handling we always return IRQ_NONE for !PM.


> > But on the other hand we're also already running over it in other places
> > like in the iommu-shutdown and I guess if someone _really_ disabled
> > CONFIG_PM, a lot of additional stuff would fail anyway.
> > 
> > So should we wrap that in some #ifdef magic, just ignore it or simply
> > select PM similar to what Tegra, Renesas and Vexpress seem to do?
> > 
> > I guess I like the 3rd option best ;-)
> 
> It probably doesn't hurt. At what level do you want it? As a dependency
> to the IOMMU? or to the platform?

I guess it might be best to go the Tegra, etc. way. Who in their right
mind would want to drive a mobile platform without any form of power
management ;-) .

I can do these patches for arm32+arm64 myself ... I just wanted to put
that thought out there - in case that was just a stupid idea of mine :-D .


Heiko




Re: [PATCH 1/2] iommu/rockchip: Handle errors returned from PM framework

2018-08-07 Thread Marc Zyngier
On 07/08/18 13:09, Heiko Stuebner wrote:
> Hi Marc,
> 
> On Tuesday, 7 August 2018 at 10:54:05 CEST, Marc Zyngier wrote:
>> pm_runtime_get_if_in_use can fail: either PM has been disabled
>> altogether (-EINVAL), or the device hasn't been enabled yet (0).
>> Sadly, the Rockchip IOMMU driver tends to conflate the two things
>> by considering a non-zero return value as successful.
>>
>> This has the consequence of hiding other bugs, so let's handle this
>> case throughout the driver, with a WARN_ON_ONCE so that we can try
>> and work out what happened.
>>
>> Fixes: 0f181d3cf7d98 ("iommu/rockchip: Add runtime PM support")
>> Signed-off-by: Marc Zyngier 
> 
> I'm still not sure about the !CONFIG_PM case, as it was probably silently
> working in that case before

Do we agree that this is an orthogonal problem though?

> 
> But on the other hand we're also already running over it in other places
> like in the iommu-shutdown and I guess if someone _really_ disabled
> CONFIG_PM, a lot of additional stuff would fail anyway.
> 
> So should we wrap that in some #ifdef magic, just ignore it or simply
> select PM similar to what Tegra, Renesas and Vexpress seem to do?
> 
> I guess I like the 3rd option best ;-)

It probably doesn't hurt. At what level do you want it? As a dependency
to the IOMMU? or to the platform?

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH 1/2] iommu/rockchip: Handle errors returned from PM framework

2018-08-07 Thread Heiko Stuebner
Hi Marc,

On Tuesday, 7 August 2018 at 10:54:05 CEST, Marc Zyngier wrote:
> pm_runtime_get_if_in_use can fail: either PM has been disabled
> altogether (-EINVAL), or the device hasn't been enabled yet (0).
> Sadly, the Rockchip IOMMU driver tends to conflate the two things
> by considering a non-zero return value as successful.
> 
> This has the consequence of hiding other bugs, so let's handle this
> case throughout the driver, with a WARN_ON_ONCE so that we can try
> and work out what happened.
> 
> Fixes: 0f181d3cf7d98 ("iommu/rockchip: Add runtime PM support")
> Signed-off-by: Marc Zyngier 

I'm still not sure about the !CONFIG_PM case, as it was probably silently
working in that case before.

But on the other hand we're also already running over it in other places
like in the iommu-shutdown and I guess if someone _really_ disabled
CONFIG_PM, a lot of additional stuff would fail anyway.


So should we wrap that in some #ifdef magic, just ignore it or simply
select PM similar to what Tegra, Renesas and Vexpress seem to do?

I guess I like the 3rd option best ;-)


Heiko



>  drivers/iommu/rockchip-iommu.c | 21 +++--
>  1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
> index 054cd2c8e9c8..4e0f9b61cd7f 100644
> --- a/drivers/iommu/rockchip-iommu.c
> +++ b/drivers/iommu/rockchip-iommu.c
> @@ -521,10 +521,11 @@ static irqreturn_t rk_iommu_irq(int irq, void *dev_id)
>   u32 int_status;
>   dma_addr_t iova;
>   irqreturn_t ret = IRQ_NONE;
> - int i;
> + int i, err;
>  
> - if (WARN_ON(!pm_runtime_get_if_in_use(iommu->dev)))
> - return 0;
> + err = pm_runtime_get_if_in_use(iommu->dev);
> + if (WARN_ON_ONCE(err <= 0))
> + return ret;
>  
>   if (WARN_ON(clk_bulk_enable(iommu->num_clocks, iommu->clocks)))
>   goto out;
> @@ -620,11 +621,15 @@ static void rk_iommu_zap_iova(struct rk_iommu_domain 
> *rk_domain,
>   spin_lock_irqsave(&rk_domain->iommus_lock, flags);
>   list_for_each(pos, &rk_domain->iommus) {
>   struct rk_iommu *iommu;
> + int ret;
>  
>   iommu = list_entry(pos, struct rk_iommu, node);
>  
>   /* Only zap TLBs of IOMMUs that are powered on. */
> - if (pm_runtime_get_if_in_use(iommu->dev)) {
> + ret = pm_runtime_get_if_in_use(iommu->dev);
> + if (WARN_ON_ONCE(ret < 0))
> + continue;
> + if (ret) {
>   WARN_ON(clk_bulk_enable(iommu->num_clocks,
>   iommu->clocks));
>   rk_iommu_zap_lines(iommu, iova, size);
> @@ -891,6 +896,7 @@ static void rk_iommu_detach_device(struct iommu_domain 
> *domain,
>   struct rk_iommu *iommu;
>   struct rk_iommu_domain *rk_domain = to_rk_domain(domain);
>   unsigned long flags;
> + int ret;
>  
>   /* Allow 'virtual devices' (eg drm) to detach from domain */
>   iommu = rk_iommu_from_dev(dev);
> @@ -909,7 +915,9 @@ static void rk_iommu_detach_device(struct iommu_domain *domain,
>   list_del_init(&iommu->node);
>   spin_unlock_irqrestore(&rk_domain->iommus_lock, flags);
>  
> - if (pm_runtime_get_if_in_use(iommu->dev)) {
> + ret = pm_runtime_get_if_in_use(iommu->dev);
> + WARN_ON_ONCE(ret < 0);
> + if (ret > 0) {
>   rk_iommu_disable(iommu);
>   pm_runtime_put(iommu->dev);
>   }
> @@ -946,7 +954,8 @@ static int rk_iommu_attach_device(struct iommu_domain *domain,
>   list_add_tail(&iommu->node, &rk_domain->iommus);
>   spin_unlock_irqrestore(&rk_domain->iommus_lock, flags);
>  
> - if (!pm_runtime_get_if_in_use(iommu->dev))
> + ret = pm_runtime_get_if_in_use(iommu->dev);
> + if (!ret || WARN_ON_ONCE(ret < 0))
>   return 0;
>  
> ret = rk_iommu_enable(iommu);
> 






Re: [PATCH 2/2] iommu/rockchip: Move irq request past pm_runtime_enable

2018-08-07 Thread Heiko Stuebner
On Tuesday, 7 August 2018 at 10:54:06 CEST, Marc Zyngier wrote:
> Enabling the interrupt early, before power has been applied to the
> device, can result in an interrupt being delivered too early if:
> 
> - the IOMMU shares an interrupt with a VOP
> - the VOP has a pending interrupt (after a kexec, for example)
> 
> In these conditions, we end up taking the interrupt without
> the IOMMU being ready to handle it (not powered on).
> 
> Moving the interrupt request past the pm_runtime_enable() call
> makes sure we can at least access the IOMMU registers. Note that
> this is only a partial fix, and that the VOP interrupt will still
> be screaming until the VOP driver kicks in, which advocates for
> a more synchronized interrupt enabling/disabling approach.
> 
> Fixes: 0f181d3cf7d98 ("iommu/rockchip: Add runtime PM support")
> Signed-off-by: Marc Zyngier 

Reviewed-by: Heiko Stuebner 




[PATCH] iommu/iova: Optimise attempts to allocate iova from 32bit address range

2018-08-07 Thread Ganapatrao Kulkarni
As an optimisation for PCI devices, an attempt is always made first to
allocate the IOVA from the 32-bit (SAC) address range. When that range
has no free space left, this leads to unnecessary attempts and function
calls on every allocation.

This patch optimises that path by adding a flag that records a failed
32-bit allocation and skips further 32-bit attempts until an IOVA in
that range is freed again.

Signed-off-by: Ganapatrao Kulkarni 
---
This patch is based on comments from Robin Murphy 
for patch [1]

[1] https://lkml.org/lkml/2018/4/19/780
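
For context, the caller pattern that motivates this change is the SAC-first
allocation done for PCI devices. The sketch below is a simplified
illustration of that pattern; example_alloc() is a made-up wrapper, not
verbatim code from dma-iommu.c:

/*
 * Simplified sketch: 64-bit capable PCI devices always try the 32-bit
 * (SAC) window first. Once that window is exhausted, every allocation
 * pays for a doomed rbtree walk unless the failure can be remembered
 * and skipped cheaply - which is what the free_32bit_pfns flag below
 * makes possible.
 */
static struct iova *example_alloc(struct iova_domain *iovad,
				  unsigned long iova_len,
				  unsigned long limit_pfn,
				  bool is_pci)
{
	struct iova *iova = NULL;

	/* Try to give PCI devices a SAC address first */
	if (is_pci && limit_pfn > iovad->dma_32bit_pfn)
		iova = alloc_iova(iovad, iova_len,
				  iovad->dma_32bit_pfn, true);
	if (!iova)
		iova = alloc_iova(iovad, iova_len, limit_pfn, true);

	return iova;
}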

 drivers/iommu/iova.c | 11 ++-
 include/linux/iova.h |  1 +
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 83fe262..d97bb5a 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -56,6 +56,7 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,
iovad->granule = granule;
iovad->start_pfn = start_pfn;
iovad->dma_32bit_pfn = 1UL << (32 - iova_shift(iovad));
+   iovad->free_32bit_pfns = true;
iovad->flush_cb = NULL;
iovad->fq = NULL;
iovad->anchor.pfn_lo = iovad->anchor.pfn_hi = IOVA_ANCHOR;
@@ -139,8 +140,10 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
 
cached_iova = rb_entry(iovad->cached32_node, struct iova, node);
if (free->pfn_hi < iovad->dma_32bit_pfn &&
-   free->pfn_lo >= cached_iova->pfn_lo)
+   free->pfn_lo >= cached_iova->pfn_lo) {
iovad->cached32_node = rb_next(&free->node);
+   iovad->free_32bit_pfns = true;
+   }
 
cached_iova = rb_entry(iovad->cached_node, struct iova, node);
if (free->pfn_lo >= cached_iova->pfn_lo)
@@ -290,6 +293,10 @@ alloc_iova(struct iova_domain *iovad, unsigned long size,
struct iova *new_iova;
int ret;
 
+   if (limit_pfn <= iovad->dma_32bit_pfn &&
+   !iovad->free_32bit_pfns)
+   return NULL;
+
new_iova = alloc_iova_mem();
if (!new_iova)
return NULL;
@@ -299,6 +306,8 @@ alloc_iova(struct iova_domain *iovad, unsigned long size,
 
if (ret) {
free_iova_mem(new_iova);
+   if (limit_pfn <= iovad->dma_32bit_pfn)
+   iovad->free_32bit_pfns = false;
return NULL;
}
 
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 928442d..3810ba9 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -96,6 +96,7 @@ struct iova_domain {
   flush-queues */
atomic_t fq_timer_on;   /* 1 when timer is active, 0
   when not */
+   bool free_32bit_pfns;
 };
 
 static inline unsigned long iova_size(struct iova *iova)
-- 
2.9.4



[PATCH 2/2] iommu/rockchip: Move irq request past pm_runtime_enable

2018-08-07 Thread Marc Zyngier
Enabling the interrupt early, before power has been applied to the
device, can result in an interrupt being delivered too early if:

- the IOMMU shares an interrupt with a VOP
- the VOP has a pending interrupt (after a kexec, for example)

In these conditions, we end up taking the interrupt without
the IOMMU being ready to handle it (not powered on).

Moving the interrupt request past the pm_runtime_enable() call
makes sure we can at least access the IOMMU registers. Note that
this is only a partial fix, and that the VOP interrupt will still
be screaming until the VOP driver kicks in, which advocates for
a more synchronized interrupt enabling/disabling approach.

Fixes: 0f181d3cf7d98 ("iommu/rockchip: Add runtime PM support")
Signed-off-by: Marc Zyngier 
---
 drivers/iommu/rockchip-iommu.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 4e0f9b61cd7f..2b1724e8d307 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -1161,17 +1161,6 @@ static int rk_iommu_probe(struct platform_device *pdev)
if (iommu->num_mmu == 0)
return PTR_ERR(iommu->bases[0]);
 
-   i = 0;
-   while ((irq = platform_get_irq(pdev, i++)) != -ENXIO) {
-   if (irq < 0)
-   return irq;
-
-   err = devm_request_irq(iommu->dev, irq, rk_iommu_irq,
-  IRQF_SHARED, dev_name(dev), iommu);
-   if (err)
-   return err;
-   }
-
iommu->reset_disabled = device_property_read_bool(dev,
"rockchip,disable-mmu-reset");
 
@@ -1228,6 +1217,19 @@ static int rk_iommu_probe(struct platform_device *pdev)
 
pm_runtime_enable(dev);
 
+   i = 0;
+   while ((irq = platform_get_irq(pdev, i++)) != -ENXIO) {
+   if (irq < 0)
+   return irq;
+
+   err = devm_request_irq(iommu->dev, irq, rk_iommu_irq,
+  IRQF_SHARED, dev_name(dev), iommu);
+   if (err) {
+   pm_runtime_disable(dev);
+   goto err_remove_sysfs;
+   }
+   }
+
return 0;
 err_remove_sysfs:
iommu_device_sysfs_remove(&iommu->iommu);
-- 
2.18.0
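
For context, the hazard described in the commit message is that a shared,
already-pending interrupt line can invoke the handler as soon as it is
requested. A minimal sketch of the resulting probe ordering follows;
example_probe() and example_irq_handler() are hypothetical names, not code
from the patch:

static irqreturn_t example_irq_handler(int irq, void *dev_id)
{
	/*
	 * May run immediately after devm_request_irq() if the shared
	 * line is already pending (e.g. a VOP after kexec), so whatever
	 * it touches (clocks, runtime PM state, registers) must already
	 * be set up.
	 */
	return IRQ_NONE;
}

static int example_probe(struct platform_device *pdev)
{
	struct device *dev = &pdev->dev;
	int irq, err;

	/* 1. map registers, acquire clocks, register the IOMMU ... */

	pm_runtime_enable(dev);			/* 2. runtime PM ready      */

	irq = platform_get_irq(pdev, 0);	/* 3. only now hook the IRQ */
	if (irq < 0) {
		pm_runtime_disable(dev);
		return irq;
	}

	err = devm_request_irq(dev, irq, example_irq_handler,
			       IRQF_SHARED, dev_name(dev), dev);
	if (err)
		pm_runtime_disable(dev);

	return err;
}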



[PATCH 0/2] iommu/rockchip: Runtime PM fixes

2018-08-07 Thread Marc Zyngier
This small series addresses a couple of runtime PM issues I've spotted
while running 4.18 on a Chromebook Plus (kevin, rk3399) platform,
specifically when doing kexec.

Note that even with these two patches, kexec is still fairly broken on
rk3399, as the VOP is never turned off (see [1] for a fix).

[1] https://www.spinics.net/lists/arm-kernel/msg670229.html

Marc Zyngier (2):
  iommu/rockchip: Handle errors returned from PM framework
  iommu/rockchip: Move irq request past pm_runtime_enable

 drivers/iommu/rockchip-iommu.c | 45 +-
 1 file changed, 28 insertions(+), 17 deletions(-)

-- 
2.18.0



[PATCH 1/2] iommu/rockchip: Handle errors returned from PM framework

2018-08-07 Thread Marc Zyngier
pm_runtime_get_if_in_use can fail: either PM has been disabled
altogether (-EINVAL), or the device hasn't been enabled yet (0).
Sadly, the Rockchip IOMMU driver tends to conflate the two things
by considering a non-zero return value as successful.

This has the consequence of hiding other bugs, so let's handle this
case throughout the driver, with a WARN_ON_ONCE so that we can try
and work out what happened.

Fixes: 0f181d3cf7d98 ("iommu/rockchip: Add runtime PM support")
Signed-off-by: Marc Zyngier 
---
 drivers/iommu/rockchip-iommu.c | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 054cd2c8e9c8..4e0f9b61cd7f 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -521,10 +521,11 @@ static irqreturn_t rk_iommu_irq(int irq, void *dev_id)
u32 int_status;
dma_addr_t iova;
irqreturn_t ret = IRQ_NONE;
-   int i;
+   int i, err;
 
-   if (WARN_ON(!pm_runtime_get_if_in_use(iommu->dev)))
-   return 0;
+   err = pm_runtime_get_if_in_use(iommu->dev);
+   if (WARN_ON_ONCE(err <= 0))
+   return ret;
 
if (WARN_ON(clk_bulk_enable(iommu->num_clocks, iommu->clocks)))
goto out;
@@ -620,11 +621,15 @@ static void rk_iommu_zap_iova(struct rk_iommu_domain *rk_domain,
spin_lock_irqsave(&rk_domain->iommus_lock, flags);
list_for_each(pos, &rk_domain->iommus) {
struct rk_iommu *iommu;
+   int ret;
 
iommu = list_entry(pos, struct rk_iommu, node);
 
/* Only zap TLBs of IOMMUs that are powered on. */
-   if (pm_runtime_get_if_in_use(iommu->dev)) {
+   ret = pm_runtime_get_if_in_use(iommu->dev);
+   if (WARN_ON_ONCE(ret < 0))
+   continue;
+   if (ret) {
WARN_ON(clk_bulk_enable(iommu->num_clocks,
iommu->clocks));
rk_iommu_zap_lines(iommu, iova, size);
@@ -891,6 +896,7 @@ static void rk_iommu_detach_device(struct iommu_domain *domain,
struct rk_iommu *iommu;
struct rk_iommu_domain *rk_domain = to_rk_domain(domain);
unsigned long flags;
+   int ret;
 
/* Allow 'virtual devices' (eg drm) to detach from domain */
iommu = rk_iommu_from_dev(dev);
@@ -909,7 +915,9 @@ static void rk_iommu_detach_device(struct iommu_domain *domain,
list_del_init(&iommu->node);
spin_unlock_irqrestore(&rk_domain->iommus_lock, flags);
 
-   if (pm_runtime_get_if_in_use(iommu->dev)) {
+   ret = pm_runtime_get_if_in_use(iommu->dev);
+   WARN_ON_ONCE(ret < 0);
+   if (ret > 0) {
rk_iommu_disable(iommu);
pm_runtime_put(iommu->dev);
}
@@ -946,7 +954,8 @@ static int rk_iommu_attach_device(struct iommu_domain *domain,
list_add_tail(&iommu->node, &rk_domain->iommus);
spin_unlock_irqrestore(&rk_domain->iommus_lock, flags);
 
-   if (!pm_runtime_get_if_in_use(iommu->dev))
+   ret = pm_runtime_get_if_in_use(iommu->dev);
+   if (!ret || WARN_ON_ONCE(ret < 0))
return 0;
 
ret = rk_iommu_enable(iommu);
-- 
2.18.0



Re: [PATCH 1/3] hexagon: remove the sync_single_for_cpu DMA operation

2018-08-07 Thread Christoph Hellwig
On Tue, Jul 31, 2018 at 05:22:29PM +0200, Christoph Hellwig wrote:
> On Wed, Jul 25, 2018 at 06:39:27AM +0200, Christoph Hellwig wrote:
> > On Tue, Jul 24, 2018 at 10:29:48PM -0500, Richard Kuo wrote:
> > > Patch series looks good.  Definitely appreciate the cleanup.
> > > 
> > > I can take it through my tree, or if not:
> > > 
> > > Acked-by: Richard Kuo 
> > 
> > Please take it through your tree, thanks!
> 
> I haven't seen it in linux-next yet, do you still plan to take it?
> 
> Otherwise I'll merge it in the dma-mapping tree.

Given that I haven't seen this in linux-next nor heard back from you,
I assume you are on your well-deserved summer vacation.

If I don't hear back by tomorrow night, I'll merge it through the
dma-mapping tree so that I have it in linux-next before heading out
for my summer vacation.