Re: [GIT PULL] iommu/arm-smmu: Fixes for 4.11

2017-03-22 Thread Joerg Roedel
On Wed, Mar 22, 2017 at 06:38:50PM +0000, Will Deacon wrote:
> The following changes since commit c1ae3cfa0e89fa1a7ecc4c99031f5e9ae99d9201:
> 
>   Linux 4.11-rc1 (2017-03-05 12:59:56 -0800)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git for-joerg/arm-smmu/fixes

Pulled, thanks Will.



Re: [RFC PATCH 24/30] iommu: Specify PASID state when unbinding a task

2017-03-22 Thread Joerg Roedel
On Wed, Mar 22, 2017 at 06:31:01PM +0000, Jean-Philippe Brucker wrote:
> The problem might be too tied to the specifics of the SMMU. As implemented
> in this series, the normal flow for a PPR with the SMMU is the following:
> 
> (1) PCI device issues a PPR for PASID 1
> (2) The PPR is queued by the SMMU in the (hardware) PRI queue
> (3) The SMMU driver receives an interrupt, dequeues the PPR and moves it
> to a software work queue.
> (4) The PPR is finally handled and a PRI response is sent to the device.

There are two ways a PASID could get shut down:

1) The device driver calls unbind()
2) The mm_struct bound to that PASID is going away

Case 1) is the easy one: we can safely assume that the device driver did
everything necessary to stop new PPR requests from being created for that
PASID. In this case we just shut down PPR processing by waiting until
everything is handled, replying INVALID to any further PPR requests, and
only then removing the PASID from the per-device IOMMU data structures and
flushing caches.

In case 2) we have more work to do. The mm_struct is going away
(probably because the task segfaulted) and we can't assume that the
device driver has shut everything down already. But for this case we have
the call-back into the device driver to tell it that it should clean
everything up for that PASID and stop the device from creating further
requests.

After that call-back returns it is the same as in case 1): we drain the
queue and deny any further requests that come in.
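
Sketched as code, the shutdown sequence for both cases looks roughly like
this; every name below is a hypothetical stand-in for driver internals,
not an existing kernel API:

#include <linux/types.h>

struct device;

/* Hypothetical helpers, standing in for IOMMU-driver internals. */
void driver_stop_pasid(struct device *dev, int pasid);  /* call-back for case 2) */
void mark_pasid_invalid(struct device *dev, int pasid); /* new PPRs get INVALID  */
void drain_ppr_queue(struct device *dev);               /* wait for pending PPRs */
void free_pasid_and_flush(struct device *dev, int pasid);

static void pasid_shutdown(struct device *dev, int pasid, bool mm_exit)
{
	/*
	 * Case 2): the mm_struct is going away and the device driver has
	 * not necessarily cleaned up yet, so call back into it first.
	 */
	if (mm_exit)
		driver_stop_pasid(dev, pasid);

	/* From here on it is the same as case 1). */
	mark_pasid_invalid(dev, pasid);	/* deny new requests   */
	drain_ppr_queue(dev);		/* finish pending ones */

	/* Only now remove the PASID from the per-device structures. */
	free_pasid_and_flush(dev, pasid);
}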

> The case that worries me is if someone unbinds PASID 1 between (2) and
> (3), while the PPR is still in the hardware queue, and immediately binds
> it to a new address space.
> 
> Then (3) and (4) happen, the PPR is handled and the fault is for the new
> address space. It's certainly undesirable, but I don't know if it could be
> exploited. We don't kill the task for an unhandled fault at the moment,
> simply report a failed PPR to the device, so I might be worrying for nothing.

As I wrote above, when the device driver calls unbind() we should
assume that the device does not send any further requests with that
PASID. If it does, we just answer with INVALID.

> Having the caller tell us if PPRs might still be pending in the hardware
> PRI queue ensures that the SMMU driver waits until it's entirely safe:
> 
> * If the device has no outstanding PPR, PASID can be reallocated
> * If the device has outstanding PPRs, wait for a Stop Marker, or drain
>   the PRI queue after a while (if the Stop Marker was lost in a PRI queue
>   overflow).

That can't happen when the device driver does its job right. It has to
shut down the context that causes the PPR requests for the PASID on the
device. This includes stopping the context and waiting until all PPR
requests it sent are processed.

And the device driver has to do this either before it calls unbind() or
in the call-back it provided. Only after this should the PASID be freed.

> Draining the PRI queue is very costly, we need to block the PRI thread to
> inspect the queue, risking an overflow. And with these PASID state flags
> we avoid flushing any queue.

There is a configurable maximum of PPR requests a device can have
in-flight. If you take that into account when allocating the PPR queue
for the SMMU, there can't be any overflows. The AMD driver allocates a
queue with 512 entries and allows devices to have a maximum of 32
outstanding requests.
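
To make the sizing argument concrete (the macro names are illustrative;
only the numbers come from the description above):

/* Queue sizing: numbers as described above for the AMD driver. */
#define PPR_LOG_ENTRIES        512	/* entries in the IOMMU PPR queue */
#define PPR_MAX_PER_DEVICE      32	/* max in-flight PPRs per device  */

/*
 * The queue cannot overflow as long as no more than
 * 512 / 32 = 16 devices have requests in flight at the same time.
 */
#define PPR_SAFE_DEVICE_COUNT  (PPR_LOG_ENTRIES / PPR_MAX_PER_DEVICE)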

> But since the problem seems too centered around the SMMU, I might just
> drop this patch along with the CLEAN/FLUSHED flags in my next version, and
> go with the full-drain solution. After all, unbind should be a fairly rare
> event.

I don't think all this is SMMU-specific; it is the same on all other
IOMMUs that have the ATS/PRI/PASID features.



Joerg



Re: [iommu:core 3/3] drivers/iommu/iova.c:36: error: redefinition of 'init_iova_domain'

2017-03-22 Thread Joerg Roedel
On Wed, Mar 22, 2017 at 07:36:52PM +0100, Thierry Reding wrote:
> *sigh* I think I messed up the #ifdef line. The attached .config has
> CONFIG_IOMMU_IOVA=m, which means that the #ifdef won't be true. I think
> the proper fix would be to:
> 
> -#ifdef CONFIG_IOMMU_IOVA
> +#if IS_ENABLED(CONFIG_IOMMU_IOVA)
> 
> Joerg, can you fix that up yourself or do you want me to resend the
> patch with the fix rolled in?

I'll fix that up tomorrow.



[GIT PULL] iommu/arm-smmu: Fixes for 4.11

2017-03-22 Thread Will Deacon
Hi Joerg,

Please pull these two ARM io-pgtable fixes from Oleksandr for 4.11. They're
not critical, but they mean that we detect misuses of the iommu_{map,unmap}
API instead of dereferencing junk pointers in the kernel. I've had them queued
locally for a while, so Robin and I have given them a fair workout.

Thanks,

Will

--->8

The following changes since commit c1ae3cfa0e89fa1a7ecc4c99031f5e9ae99d9201:

  Linux 4.11-rc1 (2017-03-05 12:59:56 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git for-joerg/arm-smmu/fixes

for you to fetch changes up to a03849e7210277fa212779b7cd9c30e1ab6194b2:

  iommu/io-pgtable-arm-v7s: Check for leaf entry before dereferencing it (2017-03-10 18:23:34 +0000)


Oleksandr Tyshchenko (2):
  iommu/io-pgtable-arm: Check for leaf entry before dereferencing it
  iommu/io-pgtable-arm-v7s: Check for leaf entry before dereferencing it

 drivers/iommu/io-pgtable-arm-v7s.c | 6 +++++-
 drivers/iommu/io-pgtable-arm.c     | 6 +++++-
 2 files changed, 10 insertions(+), 2 deletions(-)


Re: [iommu:core 3/3] drivers/iommu/iova.c:36: error: redefinition of 'init_iova_domain'

2017-03-22 Thread Thierry Reding
On Thu, Mar 23, 2017 at 02:28:27AM +0800, kbuild test robot wrote:
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git core
> head:   21aff52ab2c831c2f07d48e2fa8d4bab26a66992
> commit: 21aff52ab2c831c2f07d48e2fa8d4bab26a66992 [3/3] iommu: Add dummy implementations for !IOMMU_IOVA
> config: x86_64-randconfig-a0-03222342 (attached as .config)
> compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7
> reproduce:
> git checkout 21aff52ab2c831c2f07d48e2fa8d4bab26a66992
> # save the attached .config to linux build tree
> make ARCH=x86_64 
> 
> All errors (new ones prefixed by >>):
> 
> >> drivers/iommu/iova.c:36: error: redefinition of 'init_iova_domain'
>    include/linux/iova.h:170: note: previous definition of 'init_iova_domain' was here
> >> drivers/iommu/iova.c:202: error: redefinition of 'alloc_iova_mem'
>    include/linux/iova.h:120: note: previous definition of 'alloc_iova_mem' was here
> >> drivers/iommu/iova.c:208: error: redefinition of 'free_iova_mem'
>    include/linux/iova.h:125: note: previous definition of 'free_iova_mem' was here
> >> drivers/iommu/iova.c:214: error: redefinition of 'iova_cache_get'
>    include/linux/iova.h:111: note: previous definition of 'iova_cache_get' was here
> >> drivers/iommu/iova.c:235: error: redefinition of 'iova_cache_put'
>    include/linux/iova.h:116: note: previous definition of 'iova_cache_put' was here
> >> drivers/iommu/iova.c:261: error: redefinition of 'alloc_iova'
>    include/linux/iova.h:137: note: previous definition of 'alloc_iova' was here
> >> drivers/iommu/iova.c:323: error: redefinition of 'find_iova'
>    include/linux/iova.h:177: note: previous definition of 'find_iova' was here
> >> drivers/iommu/iova.c:343: error: redefinition of '__free_iova'
>    include/linux/iova.h:133: note: previous definition of '__free_iova' was here
> >> drivers/iommu/iova.c:361: error: redefinition of 'free_iova'
>    include/linux/iova.h:129: note: previous definition of 'free_iova' was here
> >> drivers/iommu/iova.c:380: error: redefinition of 'alloc_iova_fast'
>    include/linux/iova.h:151: note: previous definition of 'alloc_iova_fast' was here
> >> drivers/iommu/iova.c:421: error: redefinition of 'free_iova_fast'
>    include/linux/iova.h:145: note: previous definition of 'free_iova_fast' was here
> >> drivers/iommu/iova.c:435: error: redefinition of 'put_iova_domain'
>    include/linux/iova.h:183: note: previous definition of 'put_iova_domain' was here
> >> drivers/iommu/iova.c:511: error: redefinition of 'reserve_iova'
>    include/linux/iova.h:158: note: previous definition of 'reserve_iova' was here
> >> drivers/iommu/iova.c:552: error: redefinition of 'copy_reserved_iova'
>    include/linux/iova.h:165: note: previous definition of 'copy_reserved_iova' was here
> >> drivers/iommu/iova.c:572: error: redefinition of 'split_and_remove_iova'
>    include/linux/iova.h:187: note: previous definition of 'split_and_remove_iova' was here
> >> drivers/iommu/iova.c:887: error: redefinition of 'free_cpu_cached_iovas'
>    include/linux/iova.h:195: note: previous definition of 'free_cpu_cached_iovas' was here

*sigh* I think I messed up the #ifdef line. The attached .config has
CONFIG_IOMMU_IOVA=m, which means that the #ifdef won't be true. I think
the proper fix would be to:

-#ifdef CONFIG_IOMMU_IOVA
+#if IS_ENABLED(CONFIG_IOMMU_IOVA)
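
For reference, IS_ENABLED() has to be used with #if rather than #ifdef, and
it evaluates to 1 for both =y and =m, which is what the failing
CONFIG_IOMMU_IOVA=m config needs. A minimal sketch of the intended header
layout (the dummy body here is illustrative):

#include <linux/kconfig.h>

#if IS_ENABLED(CONFIG_IOMMU_IOVA)
/* real declarations, implemented in drivers/iommu/iova.c */
void init_iova_domain(struct iova_domain *iovad, unsigned long granule,
		      unsigned long start_pfn, unsigned long pfn_32bit);
#else
/* dummy implementation for !IOMMU_IOVA builds */
static inline void init_iova_domain(struct iova_domain *iovad,
				    unsigned long granule,
				    unsigned long start_pfn,
				    unsigned long pfn_32bit)
{
}
#endif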

Joerg, can you fix that up yourself or do you want me to resend the
patch with the fix rolled in?

Thierry



[iommu:core 3/3] drivers/iommu/iova.c:36: error: redefinition of 'init_iova_domain'

2017-03-22 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git core
head:   21aff52ab2c831c2f07d48e2fa8d4bab26a66992
commit: 21aff52ab2c831c2f07d48e2fa8d4bab26a66992 [3/3] iommu: Add dummy implementations for !IOMMU_IOVA
config: x86_64-randconfig-a0-03222342 (attached as .config)
compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7
reproduce:
git checkout 21aff52ab2c831c2f07d48e2fa8d4bab26a66992
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

>> drivers/iommu/iova.c:36: error: redefinition of 'init_iova_domain'
   include/linux/iova.h:170: note: previous definition of 'init_iova_domain' was here
>> drivers/iommu/iova.c:202: error: redefinition of 'alloc_iova_mem'
   include/linux/iova.h:120: note: previous definition of 'alloc_iova_mem' was here
>> drivers/iommu/iova.c:208: error: redefinition of 'free_iova_mem'
   include/linux/iova.h:125: note: previous definition of 'free_iova_mem' was here
>> drivers/iommu/iova.c:214: error: redefinition of 'iova_cache_get'
   include/linux/iova.h:111: note: previous definition of 'iova_cache_get' was here
>> drivers/iommu/iova.c:235: error: redefinition of 'iova_cache_put'
   include/linux/iova.h:116: note: previous definition of 'iova_cache_put' was here
>> drivers/iommu/iova.c:261: error: redefinition of 'alloc_iova'
   include/linux/iova.h:137: note: previous definition of 'alloc_iova' was here
>> drivers/iommu/iova.c:323: error: redefinition of 'find_iova'
   include/linux/iova.h:177: note: previous definition of 'find_iova' was here
>> drivers/iommu/iova.c:343: error: redefinition of '__free_iova'
   include/linux/iova.h:133: note: previous definition of '__free_iova' was here
>> drivers/iommu/iova.c:361: error: redefinition of 'free_iova'
   include/linux/iova.h:129: note: previous definition of 'free_iova' was here
>> drivers/iommu/iova.c:380: error: redefinition of 'alloc_iova_fast'
   include/linux/iova.h:151: note: previous definition of 'alloc_iova_fast' was here
>> drivers/iommu/iova.c:421: error: redefinition of 'free_iova_fast'
   include/linux/iova.h:145: note: previous definition of 'free_iova_fast' was here
>> drivers/iommu/iova.c:435: error: redefinition of 'put_iova_domain'
   include/linux/iova.h:183: note: previous definition of 'put_iova_domain' was here
>> drivers/iommu/iova.c:511: error: redefinition of 'reserve_iova'
   include/linux/iova.h:158: note: previous definition of 'reserve_iova' was here
>> drivers/iommu/iova.c:552: error: redefinition of 'copy_reserved_iova'
   include/linux/iova.h:165: note: previous definition of 'copy_reserved_iova' was here
>> drivers/iommu/iova.c:572: error: redefinition of 'split_and_remove_iova'
   include/linux/iova.h:187: note: previous definition of 'split_and_remove_iova' was here
>> drivers/iommu/iova.c:887: error: redefinition of 'free_cpu_cached_iovas'
   include/linux/iova.h:195: note: previous definition of 'free_cpu_cached_iovas' was here

vim +/init_iova_domain +36 drivers/iommu/iova.c

9257b4a20 drivers/iommu/iova.c Omer Peleg            2016-04-20   30  					unsigned long size,
9257b4a20 drivers/iommu/iova.c Omer Peleg            2016-04-20   31  					unsigned long limit_pfn);
9257b4a20 drivers/iommu/iova.c Omer Peleg            2016-04-20   32  static void init_iova_rcaches(struct iova_domain *iovad);
9257b4a20 drivers/iommu/iova.c Omer Peleg            2016-04-20   33  static void free_iova_rcaches(struct iova_domain *iovad);
85b454562 drivers/iommu/iova.c Robin Murphy          2015-01-12   34  
f8de50eb6 drivers/pci/iova.c   Keshavamurthy, Anil S 2007-10-21   35  void
0fb5fe874 drivers/iommu/iova.c Robin Murphy          2015-01-12  @36  init_iova_domain(struct iova_domain *iovad, unsigned long granule,
0fb5fe874 drivers/iommu/iova.c Robin Murphy          2015-01-12   37  	unsigned long start_pfn, unsigned long pfn_32bit)
f8de50eb6 drivers/pci/iova.c   Keshavamurthy, Anil S 2007-10-21   38  {
0fb5fe874 drivers/iommu/iova.c Robin Murphy          2015-01-12   39  	/*
0fb5fe874 drivers/iommu/iova.c Robin Murphy          2015-01-12   40  	 * IOVA granularity will normally be equal to the smallest
0fb5fe874 drivers/iommu/iova.c Robin Murphy          2015-01-12   41  	 * supported IOMMU page size; both *must* be capable of
0fb5fe874 drivers/iommu/iova.c Robin Murphy          2015-01-12   42  	 * representing individual CPU pages exactly.
0fb5fe874 drivers/iommu/iova.c Robin Murphy          2015-01-12   43  	 */
0fb5fe874 drivers/iommu/iova.c Robin Murphy          2015-01-12   44  	BUG_ON((granule > PAGE_SIZE) || !is_power_of_2(granule));
0fb5fe874 drivers/iommu/iova.c Robin Murphy          2015-01-12   45  
f8de50eb6 drivers/pci/iova.c   Keshavamurthy, Anil S 2007-10-21   46  	spin_lock_init(&iovad->iova_rbtree_lock);
f8de50eb6 drivers/pci/iova.c   Keshavamurthy, Anil S 2007-10-21   47  	iovad->rbroot = RB_ROOT;
f8de50eb6 drivers/pci/iova.c   

Re: [RFC PATCH 24/30] iommu: Specify PASID state when unbinding a task

2017-03-22 Thread Jean-Philippe Brucker
On 22/03/17 15:44, Joerg Roedel wrote:
> On Mon, Feb 27, 2017 at 07:54:35PM +, Jean-Philippe Brucker wrote:
>> It is an important distinction because, if the IOMMU driver reassigns a
>> PASID while the IOMMU still holds pending PPR targeting that PASID
>> internally, the PPR will trigger a fault in the wrong address space.
> 
> The IOMMU driver also controls a device's ability to issue PPR requests
> (at least on PCI), so it already knows whether a device still has
> requests pending or if it even can create new ones.

Apart from resetting the PRI capability, the SMMU doesn't have any control
over the device's PPR requests, so we simply mandate that the caller has done
the required work to stop the device from issuing them before calling
iommu_unbind.

> Further, the IOMMU driver can already wait for all pending faults to be
> processed before it shuts down a PASID. So it is not clear to me why the
> device driver needs to be involved here.

The problem might be too tied to the specifics of the SMMU. As implemented
in this series, the normal flow for a PPR with the SMMU is the following:

(1) PCI device issues a PPR for PASID 1
(2) The PPR is queued by the SMMU in the (hardware) PRI queue
(3) The SMMU driver receives an interrupt, dequeues the PPR and moves it
to a software work queue.
(4) The PPR is finally handled and a PRI response is sent to the device.
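
As a rough illustration of steps (2)-(4), the interrupt thread might look
like the sketch below. All names here are hypothetical, for illustration
only, and not the actual SMMU driver code:

#include <linux/interrupt.h>
#include <linux/slab.h>
#include <linux/types.h>
#include <linux/workqueue.h>

struct pri_entry { u32 pasid; u64 addr; };                /* hypothetical */
struct pri_work  { struct work_struct work; struct pri_entry entry; };

extern struct workqueue_struct *fault_wq;                 /* hypothetical */
bool priq_pop(void *dev, struct pri_entry *entry);        /* hypothetical */
void handle_ppr(struct work_struct *work);                /* step (4)     */

static irqreturn_t smmu_priq_thread(int irq, void *dev)
{
	struct pri_entry entry;

	/* Step (3): drain the hardware PRI queue into software. */
	while (priq_pop(dev, &entry)) {
		struct pri_work *pw = kzalloc(sizeof(*pw), GFP_KERNEL);

		if (!pw)
			break;	/* a real driver would auto-respond here */
		pw->entry = entry;
		INIT_WORK(&pw->work, handle_ppr);
		queue_work(fault_wq, &pw->work);
	}
	return IRQ_HANDLED;
}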

The case that worries me is if someone unbinds PASID 1 between (2) and
(3), while the PPR is still in the hardware queue, and immediately binds
it to a new address space.

Then (3) and (4) happen, the PPR is handled and the fault is for the new
address space. It's certainly undesirable, but I don't know if it could be
exploited. We don't kill the task for an unhandled fault at the moment,
simply report a failed PPR to the device, so I might be worrying for nothing.

Having the caller tell us if PPRs might still be pending in the hardware
PRI queue ensures that the SMMU driver waits until it's entirely safe:

* If the device has no outstanding PPR, PASID can be reallocated
* If the device has outstanding PPRs, wait for a Stop Marker, or drain
  the PRI queue after a while (if the Stop Marker was lost in a PRI queue
  overflow).

Draining the PRI queue is very costly, we need to block the PRI thread to
inspect the queue, risking an overflow. And with these PASID state flags
we avoid flushing any queue.

But since the problem seems too centered around the SMMU, I might just
drop this patch along with the CLEAN/FLUSHED flags in my next version, and
go with the full-drain solution. After all, unbind should be a fairly rare
event.

Thanks,
Jean-Philippe

> When the device driver issues a PASID-unbind call, the iommu driver
> just waits until all pending faults are processed, answers new faults
> with INVALID, then switches off the device's capability to issue new
> faults, and then releases the PASID.
> 



Re: [RFC PATCH 22/30] iommu: Bind/unbind tasks to/from devices

2017-03-22 Thread Jean-Philippe Brucker
Hi Joerg,

On 22/03/17 15:36, Joerg Roedel wrote:
> On Fri, Mar 03, 2017 at 06:39:58PM +0000, Jean-Philippe Brucker wrote:
>> Yes, it would be nice to have a common PASID allocator. But I don't
>> think that a system-wide PASID space is workable for us. At the moment
>> systems might have a few identical devices all supporting 20 bits of
>> PASID. But consider the case where one odd device can only handle four
>> address spaces, and supports a maximum of two PASID bits. We'd quickly
>> run out of PASIDs to hand to such devices, even though we could easily
>> have one PASID space per endpoint (from a quick glance at the specs, I
>> assume that both Intel and AMD IOMMUs offer one PASID table per RID.)
> 
> But that shouldn't be a problem if we allocate PASIDs top-down (meaning
> starting from the biggest value supported by a given device), right?
> 
> Then we can satisfy the devices with 16- or 20-bit PASIDs and still have
> the 2-bit PASIDs free for the devices that need them.

But if there are more than four devices that only support 2-bit PASIDs, you
still get a starvation problem that you wouldn't get with a per-domain/device
PASID allocator. Admittedly, I have no real-world example to back this up; we
can probably expect vendors to always implement a sane number of PASID bits.
Unifying the API is certainly more important than imagining all the
twisted configurations possible, and a PASID allocator with per-task
top-down allocation seems to me like an acceptable compromise.

Thanks,
Jean-Philippe


Re: [PATCH 2/2] iommu: Add dummy implementations for !IOMMU_IOVA

2017-03-22 Thread Thierry Reding
On Wed, Mar 22, 2017 at 03:55:30PM +0100, Joerg Roedel wrote:
> Hi Thierry
> 
> On Mon, Mar 20, 2017 at 08:14:31PM +0100, Thierry Reding wrote:
> > I've got a series of patches that I'd like to merge for v4.12 that have
> > a build-time dependency on this patch. It would therefore be great to
> > get your Acked-by on this so that I can merge it through the DRM tree
> > with the rest of the patches. I can provide a stable branch with only
> > this patch for you to pull into the IOMMU tree.
> 
> I applied both patches to my 'core' branch. There is not much in there
> yet besides your patches, so you can easily pull that branch in once it
> is published (later today).

Works for me. Thanks!

Thierry



Re: [PATCH 0/3] IOVA allocation improvements for iommu-dma

2017-03-22 Thread Nate Watterson

On 2017-03-15 09:33, Robin Murphy wrote:

Hi all,

Hi Robin,


Here's the first bit of lock contention removal to chew on - feedback
welcome! Note that for the current users of the io-pgtable framework,
this is most likely to simply push more contention onto the io-pgtable
lock, so may not show a great improvement alone. Will and I both have
rough proof-of-concept implementations of lock-free io-pgtable code
which we need to sit down and agree on at some point, hopefully fairly
soon.

I've taken the opportunity to do a bit of cleanup and refactoring
within the series to make the final state of the code nicer, but the
diffstat still turns out surprisingly reasonable in the end - it would
actually be negative but for the new comments!

Magnus, Shimoda-san, the first two patches should be of interest as they
constitute the allocation rework I mentioned a while back[1] - if you
still need to implement that scary workaround, this should make it
simple to hook IPMMU-specific calls into the alloc and free paths, and
let the driver take care of the details internally.


I've tested your patches on a QDF2400 platform and generally
see modest improvements in iperf/fio performance. As you
suspected would happen, contention has indeed moved to the
io-pgtable lock. I am looking forward to testing with the
lock-free io-pgtable implementation, however I suspect that
there will still be contention issues acquiring the (SMMUv3)
cmdq lock on the unmap path.

Reviewed/Tested-by: Nate Watterson 


Robin.

[1]:https://lists.linuxfoundation.org/pipermail/iommu/2017-January/020189.html

Robin Murphy (3):
  iommu/dma: Convert to address-based allocation
  iommu/dma: Clean up MSI IOVA allocation
  iommu/dma: Plumb in the per-CPU IOVA caches

 drivers/iommu/dma-iommu.c | 176 ++++++++++++++++++++++----------------------

 1 file changed, 90 insertions(+), 86 deletions(-)


--
Qualcomm Datacenter Technologies, Inc. on behalf of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.


Re: [RFC PATCH 22/30] iommu: Bind/unbind tasks to/from devices

2017-03-22 Thread Joerg Roedel
Hi Jean-Philippe,

On Mon, Feb 27, 2017 at 07:54:33PM +0000, Jean-Philippe Brucker wrote:
> +extern int iommu_set_svm_ops(struct device *dev,
> +  const struct iommu_svm_ops *svm_ops);
> +extern int iommu_bind_task(struct device *dev, struct task_struct *task,
> +int *pasid, int flags, void *priv);
> +
> +extern int iommu_unbind_task(struct device *dev, int pasid, int flags);

I really like that API, it is simpler than what the AMD driver
currently implements but should work for it too (once we adapt the
AMD-KFD driver to it).

One issue I would like to have discussed is whether we can make a global PASID
allocation (with a one-PASID-per-task model) workable with the SMMU too.



Joerg



Re: [RFC PATCH 24/30] iommu: Specify PASID state when unbinding a task

2017-03-22 Thread Joerg Roedel
On Mon, Feb 27, 2017 at 07:54:35PM +0000, Jean-Philippe Brucker wrote:
> It is an important distinction because, if the IOMMU driver reassigns a
> PASID while the IOMMU still holds pending PPR targeting that PASID
> internally, the PPR will trigger a fault in the wrong address space.

The IOMMU driver also controls a device's ability to issue PPR requests
(at least on PCI), so it already knows whether a device still has
requests pending or if it even can create new ones.

Further, the IOMMU driver can already wait for all pending faults to be
processed before it shuts down a PASID. So it is not clear to me why the
device driver needs to be involved here.

When the device driver issues a PASID-unbind call, the iommu driver
just waits until all pending faults are processed, answers new faults
with INVALID, then switches off the device's capability to issue new
faults, and then releases the PASID.



Joerg



Re: [RFC PATCH 22/30] iommu: Bind/unbind tasks to/from devices

2017-03-22 Thread Joerg Roedel
On Fri, Mar 03, 2017 at 06:39:58PM +0000, Jean-Philippe Brucker wrote:
> Yes, it would be nice to have a common PASID allocator. But I don't
> think that a system-wide PASID space is workable for us. At the moment
> systems might have a few identical devices all supporting 20 bits of
> PASID. But consider the case where one odd device can only handle four
> address spaces, and supports a maximum of two PASID bits. We'd quickly
> run out of PASIDs to hand to such devices, even though we could easily
> have one PASID space per endpoint (from a quick glance at the specs, I
> assume that both Intel and AMD IOMMUs offer one PASID table per RID.)

But that shouldn't be a problem if we allocate PASIDs top-down (meaning
starting from the biggest value supported by a given device), right?

Then we can satisfy the devices with 16- or 20-bit PASIDs and still have
the 2-bit PASIDs free for the devices that need them.
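
A top-down allocator over a shared PASID space is only a few lines on top
of a bitmap; a minimal sketch, with hypothetical names and locking omitted:

#include <linux/bitmap.h>
#include <linux/errno.h>
#include <linux/types.h>

/* Walk the shared PASID bitmap downwards from the largest value the
 * device supports, so the low PASIDs stay free for constrained devices.
 */
static int pasid_alloc_topdown(unsigned long *pasid_map, u32 dev_max_pasid)
{
	u32 pasid;

	for (pasid = dev_max_pasid; pasid > 0; pasid--)
		if (!test_and_set_bit(pasid, pasid_map))
			return pasid;

	return -ENOSPC;	/* every PASID the device can address is taken */
}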



Joerg



Re: [PATCH v2 4/4] iommu/dma: Make PCI window reservation generic

2017-03-22 Thread Joerg Roedel
On Thu, Mar 16, 2017 at 05:00:19PM +0000, Robin Murphy wrote:
> Now that we're applying the IOMMU API reserved regions to our IOVA
> domains, we shouldn't need to privately special-case PCI windows, or
> indeed anything else which isn't specific to our iommu-dma layer.
> However, since those aren't IOMMU-specific either, rather than start
> duplicating code into IOMMU drivers let's transform the existing
> function into an iommu_get_resv_regions() helper that they can share.
> 
> Signed-off-by: Robin Murphy 
> ---
> 
> Notes:
> v2:
> - New
> 
>  drivers/iommu/arm-smmu-v3.c |  2 ++
>  drivers/iommu/arm-smmu.c|  2 ++
>  drivers/iommu/dma-iommu.c   | 38 --
>  include/linux/dma-iommu.h   |  5 +
>  4 files changed, 37 insertions(+), 10 deletions(-)

Applied patches 2-4 to my arm/core branch, thanks.



Re: [PATCH v2 1/4] iommu: Disambiguate MSI region types

2017-03-22 Thread Joerg Roedel
On Thu, Mar 16, 2017 at 05:00:16PM +0000, Robin Murphy wrote:
> The introduction of reserved regions has left a couple of rough edges
> which we could do with sorting out sooner rather than later. Since we
> are not yet addressing the potential dynamic aspect of software-managed
> reservations and presenting them at arbitrary fixed addresses, it is
> incongruous that we end up displaying hardware vs. software-managed MSI
> regions to userspace differently, especially since ARM-based systems may
> actually require one or the other, or even potentially both at once
> (which iommu-dma currently has no hope of dealing with at all). Let's
> resolve the former user-visible inconsistency ASAP before the ABI has
> been baked into a kernel release, in a way that also lays the groundwork
> for the latter shortcoming to be addressed by follow-up patches.
> 
> For clarity, rename the software-managed type to IOMMU_RESV_SW_MSI, use
> IOMMU_RESV_MSI to describe the hardware type, and document everything a
> little bit. Since the x86 MSI remapping hardware falls squarely under
> this meaning of IOMMU_RESV_MSI, apply that type to their regions as well,
> so that we tell the same story to userspace across all platforms.
> 
> Secondly, as the various region types require quite different handling,
> and it really makes little sense to ever try combining them, convert the
> bitfield-esque #defines to a plain enum in the process before anyone
> gets the wrong impression.
> 
> Fixes: d30ddcaa7b02 ("iommu: Add a new type field in iommu_resv_region")
> Reviewed-by: Eric Auger 
> CC: Alex Williamson 
> CC: David Woodhouse 
> CC: k...@vger.kernel.org
> Signed-off-by: Robin Murphy 

Applied this one to iommu/fixes.
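
For context, the conversion described in the patch ends up with a plain
enum along these lines (reconstructed from the commit message above, not
quoted from the patch itself):

enum iommu_resv_type {
	IOMMU_RESV_DIRECT,	/* regions which must be directly mapped   */
	IOMMU_RESV_RESERVED,	/* arbitrary "never map this" ranges       */
	IOMMU_RESV_MSI,		/* hardware MSI region                     */
	IOMMU_RESV_SW_MSI,	/* software-managed MSI translation window */
};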



Re: [PATCH v3 0/2] Handle Cavium ThunderX2 PCI topology quirk

2017-03-22 Thread Jon Masters
On 03/22/2017 04:51 AM, Jayachandran C wrote:
> Hi Bjorn, Alex,
> 
> Here is v3 of the patchset to handle the PCIe topology quirk of
> Cavium ThunderX2 (previously called Broadcom Vulcan).
> 
> The earlier discussions on this can be seen at:
> http://www.spinics.net/lists/linux-pci/msg51001.html
> https://patchwork.ozlabs.org/patch/582633/ and
> https://lists.linuxfoundation.org/pipermail/iommu/2016-June/017681.html
> 
> The earlier discussion on this patchset had stalled with a suggestion
> that it may be possible to fix up this quirk by handling the issue in
> the function argument of pci_for_each_dma_alias(). But at that point
> all the ACPI and OF code for SMMU and GIC was yet to be merged, and we did
> not have a reasonable codebase to make the changes.
> 
> For 4.11, I tried to fix it in both the SMMU and the GIC ITS code based
> on this suggestion, but after going thru the effort, that does not look
> like the right approach. I have the code changes at:
> https://github.com/jchandra-cavm/linux/commits/rid-xlate-fixup
> if anyone want to look over the code.
> 
> The problems with that approach is:
>  - of the 14 uses of pci_for_each_dma_alias() in the kernel tree, I have
>    to fix up 6 callers (which is all but one of the callers outside x86)
>  - 4 of these can be reasonably handled (please see the github repo above),
>    but the calls in drivers/irqchip/irq-gic-v3-its-pci-msi.c and
>    drivers/iommu/iommu.c cannot be reasonably fixed up.
>  - Even without the above two changes I can get it to work for now.
>    But pci_for_each_dma_alias does not work as expected on this platform
>    and we have to be aware of that for all future uses of the function.
>   
> For now, I have ruled out that approach, and I have rebased the earlier
> patch on to 4.11-rc and submitting again for review. The changes are:
> 
> v2>v3:
>  - changed device flag name from PCI_DEV_FLAGS_DMA_ALIAS_ROOT to
>PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT
>  - updated commit message to make the quirk clearer.
> 
> Let me know your comments and suggestions.

My opinion FWIW is that the quirk you have is one of the least intrusive
ways of handling this. Generally in the case of ARM servers, we have a
difference vs. x86 in that the latter usually has a magic RC at the
top level that everything sits beneath (and then, presumably, Intel
do some magic for multi-socket to fudge things over Q/UPI so that
things look nice and boring to software). On ARM, we're typically
dealing with third party RC IP that's disjoint from other parts of
the SoC. We're certainly in the process of bolstering the specs to
set some expectations and greater guidance around topologies that
we would like to avoid, so I don't see this getting out of hand.

That's my $0.02.

Jon.

-- 
Computer Architect | Sent from my Fedora powered laptop



Re: [PATCH 2/2] iommu: Add dummy implementations for !IOMMU_IOVA

2017-03-22 Thread Joerg Roedel
Hi Thierry

On Mon, Mar 20, 2017 at 08:14:31PM +0100, Thierry Reding wrote:
> I've got a series of patches that I'd like to merge for v4.12 that have
> a build-time dependency on this patch. It would therefore be great to
> get your Acked-by on this so that I can merge it through the DRM tree
> with the rest of the patches. I can provide a stable branch with only
> this patch for you to pull into the IOMMU tree.

I applied both patches to my 'core' branch. There is not much in there
yet besides your patches, so you can easily pull that branch in once it
is published (later today).



Joerg


Re: [PATCH 1/2] iommu/exynos: Block SYSMMU while invalidating FLPD cache

2017-03-22 Thread Joerg Roedel
On Mon, Mar 20, 2017 at 10:17:56AM +0100, Marek Szyprowski wrote:
> Documentation specifies that SYSMMU should be in blocked state while
> performing TLB/FLPD cache invalidation, so add needed calls to
> sysmmu_block/unblock.
> 
> Fixes: 66a7ed84b345d ("iommu/exynos: Apply workaround of caching fault page table entries")
> CC: sta...@vger.kernel.org # v4.10+
> Signed-off-by: Marek Szyprowski 
> ---
> This issue has been introduced long time ago, but this fix applies only for
> v4.10+ kernels due to other changes in the surrounding code.
> ---
>  drivers/iommu/exynos-iommu.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)

Applied both patches to iommu/fixes, thanks.



Re: [PATCH v2 1/5] iommu/dmar: Rectify return code handling in detect_intel_iommu()

2017-03-22 Thread Joerg Roedel
On Thu, Mar 16, 2017 at 04:23:51PM +0200, Andy Shevchenko wrote:
> There is inconsistency in return codes across the functions called from
> detect_intel_iommu().
> 
> Make it consistent and propagate return code to the caller.
> 
> Signed-off-by: Andy Shevchenko 

Applied all, thanks.



Re: IOMMU detection via sysfs

2017-03-22 Thread Alex Williamson
On Wed, 22 Mar 2017 11:06:47 +0000
Gabor Locsei  wrote:

> Hello,
> 
> I have systems with IOMMU optionally enabled, and I would like to find a 
> method to check these capabilities that doesn't rely on dmesg.
> The evident problem is that dmesg eventually clears boot-time messages, so the
> relevant lines may just disappear.
> 
> Looks like there are sysfs entries that seem fit for this purpose.
> 
> DMAR   The file /sys/firmware/acpi/tables/DMAR exists. (This file
>        exists on every lab node.)
> IOMMU  The directory /sys/devices/virtual/iommu/ exists and there
>        are dmar0, etc. device subdirectories in it.
> 
> I have looked into the ACPI DMAR tables with iasl, but they look the same 
> regardless of the cmdline iommu settings.
> The question is, provided that the platform has ACPI, is it safe to rely on
> sysfs? Is the above enough to prove that the IOMMU is available? The oldest
> kernel version to verify against is 3.10.0 from RHEL.

DMAR is a firmware table; it's present so long as VT-d is enabled in
the BIOS, irrespective of whether the IOMMU is enabled in the kernel.
The IOMMU sysfs support was added exactly for this purpose (I tend to
prefer reference via /sys/class/iommu).
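
A minimal userspace check along those lines (nothing assumed beyond the
sysfs path):

#include <dirent.h>
#include <stdio.h>

/* Exit 0 if /sys/class/iommu contains at least one IOMMU device. */
int main(void)
{
	DIR *dir = opendir("/sys/class/iommu");
	struct dirent *d;
	int found = 0;

	if (!dir)
		return 1;	/* no sysfs IOMMU class (kernel too old?) */
	while ((d = readdir(dir)))
		if (d->d_name[0] != '.')
			found = 1;
	closedir(dir);
	printf("IOMMU %s\n", found ? "present" : "not present");
	return !found;
}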

IOMMU sysfs support was added in v3.17, but note that the RHEL version
of v3.10 is a moving target, not a fixed point in time.  This support
was added in the RHEL 7.1 kernel.  Thanks,

Alex


Re: [PATCH v3 0/3] iommu/ipmmu-vmsa: r8a7796 support V3

2017-03-22 Thread Geert Uytterhoeven
Hi Jörg,

On Wed, Mar 22, 2017 at 3:23 PM, Joerg Roedel  wrote:
> On Sun, Mar 12, 2017 at 02:38:20PM +0900, Magnus Damm wrote:
>> iommu/ipmmu-vmsa: r8a7796 support V3
>>
>> [PATCH v3 1/3] iommu/ipmmu-vmsa: Add r8a7796 DT binding
>> [PATCH v3 2/3] iommu/ipmmu-vmsa: Increase maximum micro-TLBS to 48
>> [PATCH v3 3/3] iommu/ipmmu-vmsa: Hook up r8a7796 DT matching code
>
> I just tried to apply a few of your patch-sets, but failed because they
> don't apply cleanly and it is not clear what the dependencies are and in
> which order they need to be applied.
>
> So for now I skipped all ipmmu-vmsa patches that are currently in my
> inbox. Please rebase your different series to v4.11-rc3 and resend them
> in the right order, with all Reviewed-bys and other tags that people
> gave you included. Then I will take another look at applying them.

For the record, they apply cleanly on v4.11-rc1, cfr.
https://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers.git/log/?h=topic/r8a7796-ipmmu-v3

As I had no conflicts when creating
https://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers.git/tag/?h=renesas-drivers-2017-03-21-v4.11-rc3
I expect them to apply cleanly to your next branch, too.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH v3 0/3] iommu/ipmmu-vmsa: r8a7796 support V3

2017-03-22 Thread Joerg Roedel
Hey Magnus,

On Sun, Mar 12, 2017 at 02:38:20PM +0900, Magnus Damm wrote:
> iommu/ipmmu-vmsa: r8a7796 support V3
> 
> [PATCH v3 1/3] iommu/ipmmu-vmsa: Add r8a7796 DT binding
> [PATCH v3 2/3] iommu/ipmmu-vmsa: Increase maximum micro-TLBS to 48
> [PATCH v3 3/3] iommu/ipmmu-vmsa: Hook up r8a7796 DT matching code

I just tried to apply a few of your patch-sets, but failed because they
don't apply cleanly and it is not clear what the dependencies are and in
which order they need to be applied.

So for now I skipped all ipmmu-vmsa patches that are currently in my
inbox. Please rebase your different series to v4.11-rc3 and resend them
in the right order, with all Reviewed-bys and other tags that people
gave you included. Then I will take another look at applying them.


Thanks,

Joerg


Re: [PATCH v2 5/5] iommu: Allow default domain type to be set on the kernel command line

2017-03-22 Thread Joerg Roedel
On Tue, Mar 21, 2017 at 05:21:37PM +0000, Will Deacon wrote:
> On Tue, Mar 21, 2017 at 04:45:27PM +0100, Joerg Roedel wrote:
> > On Fri, Mar 10, 2017 at 08:49:36PM +0000, Will Deacon wrote:
> > > @@ -1014,8 +1027,8 @@ struct iommu_group *iommu_group_get_for_dev(struct device *dev)
> > >* IOMMU driver.
> > >*/
> > >   if (!group->default_domain) {
> > > - group->default_domain = __iommu_domain_alloc(dev->bus,
> > > -  IOMMU_DOMAIN_DMA);
> > > + group->default_domain =
> > > + __iommu_domain_alloc(dev->bus, iommu_def_domain_type);
> > 
> > It would be good to have a fall-back here if we are talking to an IOMMU
> > driver that uses default domains, but does not support identity-mapped
> > domains (yet). Exynos and Rockchip IOMMU drivers seem to fall into this
> > category. A dev_warn() also makes sense in case allocating an identity
> > domain fails.
> 
> Sure, something like the diff below?

Yes, this looks good.
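
The diff itself is not preserved in this archive; the shape being agreed on
is roughly the following sketch, reconstructed from the conversation rather
than quoted from the patch:

	if (!group->default_domain) {
		struct iommu_domain *dom;

		dom = __iommu_domain_alloc(dev->bus, iommu_def_domain_type);
		if (!dom && iommu_def_domain_type != IOMMU_DOMAIN_DMA) {
			dev_warn(dev, "failed to allocate default IOMMU domain of type %u; falling back to IOMMU_DOMAIN_DMA\n",
				 iommu_def_domain_type);
			dom = __iommu_domain_alloc(dev->bus, IOMMU_DOMAIN_DMA);
		}
		group->default_domain = dom;
	}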


Thanks,

Joerg



Re: amd-iommu: can't boot with amdgpu, AMD-Vi: Completion-Wait loop timed out

2017-03-22 Thread 'j...@8bytes.org'
On Tue, Mar 21, 2017 at 04:30:55PM +0000, Deucher, Alexander wrote:
> > I am preparing a debug-patch that disables ATS for these GPUs so someone
> > with such a chip can test it.
> 
> Thanks Joerg.

Here is a debug patch, using the hard hammer of disabling the use of ATS
completely in the AMD IOMMU driver. If it fixes the issue I am going to
write a more upstreamable version.

But for now, please test if this fixes the issue.

Thanks,

Joerg

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 98940d1..f019aa6 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -467,7 +467,7 @@ static int iommu_init_device(struct device *dev)
struct amd_iommu *iommu;
 
iommu = amd_iommu_rlookup_table[dev_data->devid];
-   dev_data->iommu_v2 = iommu->is_iommu_v2;
+   dev_data->iommu_v2 = false;
}
 
dev->archdata.iommu = dev_data;
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 6130278..41d0e64 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -171,7 +171,7 @@ int amd_iommus_present;
 
 /* IOMMUs have a non-present cache? */
 bool amd_iommu_np_cache __read_mostly;
-bool amd_iommu_iotlb_sup __read_mostly = true;
+bool amd_iommu_iotlb_sup __read_mostly = false;
 
 u32 amd_iommu_max_pasid __read_mostly = ~0;
 


IOMMU detection via sysfs

2017-03-22 Thread Gabor Locsei
Hello,

I have systems with IOMMU optionally enabled, and I would like to find a method 
to check these capabilities that doesn't rely on dmesg.
The evident problem is that dmesg eventually clears boot-time messages, so the
relevant lines may just disappear.

Looks like there are sysfs entries that seem fit for this purpose.

DMAR   The file /sys/firmware/acpi/tables/DMAR exists. (This file exists on
       every lab node.)
IOMMU  The directory /sys/devices/virtual/iommu/ exists and there are dmar0,
       etc. device subdirectories in it.

I have looked into the ACPI DMAR tables with iasl, but they look the same 
regardless of the cmdline iommu settings.
The question is, provided that the platform has ACPI, is it safe to rely on
sysfs? Is the above enough to prove that the IOMMU is available? The oldest
kernel version to verify against is 3.10.0 from RHEL.

Thanks!

best regards,
Gabor


Re: [PATCH v3 1/2] PCI: Add device flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT

2017-03-22 Thread Jayachandran C
On Wed, Mar 22, 2017 at 08:51:10AM +0000, Jayachandran C wrote:

> From: Jayachandran C 

Looks like I did not fix up the author to my new mail ID. Please ignore
this part.

I can send out a clean revision if needed; until then, please drop the
broadcom.com mail id in any reply to avoid bounces. Sorry for the mess up.

Thanks,
JC.


[PATCH v3 2/2] PCI: quirks: Fix ThunderX2 dma alias handling

2017-03-22 Thread Jayachandran C
From: Jayachandran C 

On the Cavium ThunderX2 arm64 SoCs (earlier called Broadcom Vulcan), the PCI
topology is slightly unusual. For a multi-node system, it looks like:

[node level PCI bridges - one per node]
[SoC PCI devices with MSI-X but no IOMMU]
[PCI-PCIe "glue" bridges - up to 14, one per real bridge below]
[PCIe real root ports associated with IOMMU and GICv3 ITS]
[External PCI devices connected to PCIe links]

The top two levels of bridges should have introduced aliases since they
are PCI and PCI/PCIe bridges, but in the case of ThunderX2 they do not.
In the case of external PCIe devices, the "real" root ports are connected
to the SMMU and the GIC ITS, so there are no aliases. The SoC PCI devices
are directly connected to the GIC ITS, and the bridges do not introduce
an alias either.

To handle this quirk, we mark the real PCIe root ports and node level
PCIe bridges with the flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT.  With this,
pci_for_each_dma_alias() works correctly for external PCIe devices and
SoC PCI devices.

For the current revision of Cavium ThunderX2, the VendorID and Device ID
are from Broadcom Vulcan (14e4:90XX).

Signed-off-by: Jayachandran C 
---
 drivers/pci/quirks.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6736836..564a84a 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3958,6 +3958,20 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2260, quirk_mic_x200_dma_alias);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2264, quirk_mic_x200_dma_alias);
 
 /*
+ * The IOMMU and interrupt controller on Broadcom Vulcan/Cavium ThunderX2 are
+ * associated not at the root bus, but at a bridge below. This quirk flag
+ * will ensure that the aliases are identified correctly.
+ */
+static void quirk_bridge_cavm_thrx2_pcie_root(struct pci_dev *pdev)
+{
+   pdev->dev_flags |= PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT;
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_BROADCOM, 0x9000,
+   quirk_bridge_cavm_thrx2_pcie_root);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_BROADCOM, 0x9084,
+   quirk_bridge_cavm_thrx2_pcie_root);
+
+/*
  * Intersil/Techwell TW686[4589]-based video capture cards have an empty (zero)
  * class code.  Fix it.
  */
-- 
2.7.4



[PATCH v3 1/2] PCI: Add device flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT

2017-03-22 Thread Jayachandran C
From: Jayachandran C 

Add a new quirk flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT to limit the DMA
alias search to go no further than the bridge where the IOMMU unit is
attached.

The flag will be used to indicate a bridge device which forwards the
address translation requests to the IOMMU, i.e where the interrupt and
DMA requests leave the PCIe hierarchy and go into the system blocks.

Usually this happens at the PCI RC, so this flag is not needed. But
on systems where there are bridges that introduce aliases above the
"real" root bridge, this flag is needed to ensure that the function
pci_for_each_dma_alias() works correctly.

The function pci_for_each_dma_alias() is updated to stop when it sees a
bridge with this flag set.

Signed-off-by: Jayachandran C 
---
 drivers/pci/search.c | 4 
 include/linux/pci.h  | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/drivers/pci/search.c b/drivers/pci/search.c
index 33e0f03..4c6044a 100644
--- a/drivers/pci/search.c
+++ b/drivers/pci/search.c
@@ -60,6 +60,10 @@ int pci_for_each_dma_alias(struct pci_dev *pdev,
 
tmp = bus->self;
 
+   /* stop at bridge where translation unit is associated */
+   if (tmp->dev_flags & PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT)
+   return ret;
+
/*
 * PCIe-to-PCI/X bridges alias transactions from downstream
 * devices using the subordinate bus number (PCI Express to
diff --git a/include/linux/pci.h b/include/linux/pci.h
index eb3da1a..3f596ac 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -178,6 +178,8 @@ enum pci_dev_flags {
PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
/* Get VPD from function 0 VPD */
PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
+   /* a non-root bridge where translation occurs, stop alias search here */
+   PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT = (__force pci_dev_flags_t) (1 << 9),
 };
 
 enum pci_irq_reroute_variant {
-- 
2.7.4



[PATCH v3 0/2] Handle Cavium ThunderX2 PCI topology quirk

2017-03-22 Thread Jayachandran C
Hi Bjorn, Alex,

Here is v3 of the patchset to handle the PCIe topology quirk of
Cavium ThunderX2 (previously called Broadcom Vulcan).

The earlier discussions on this can be seen at:
http://www.spinics.net/lists/linux-pci/msg51001.html
https://patchwork.ozlabs.org/patch/582633/ and
https://lists.linuxfoundation.org/pipermail/iommu/2016-June/017681.html

The earlier discussion on this patchset had stalled with a suggestion
that it may be possible to fix up this quirk by handling the issue in
the function argument of pci_for_each_dma_alias(). But at that point
all the ACPI and OF code for SMMU and GIC was yet to be merged, and we did
not have a reasonable codebase to make the changes.

For 4.11, I tried to fix it in both the SMMU and the GIC ITS code based
on this suggestion, but after going thru the effort, that does not look
like the right approach. I have the code changes at:
https://github.com/jchandra-cavm/linux/commits/rid-xlate-fixup
if anyone want to look over the code.

The problems with that approach is:
 - of the 14 uses of pci_for_each_dma_alias() in the kernel tree, I have to
   fix up 6 callers (which is all but one of the callers outside x86)
 - 4 of these can be reasonably handled (please see the github repo above),
   but the calls in drivers/irqchip/irq-gic-v3-its-pci-msi.c and
   drivers/iommu/iommu.c cannot be reasonably fixed up.
 - Even without the above two changes I can get it to work for now.
   But pci_for_each_dma_alias does not work as expected on this platform
   and we have to be aware of that for all future uses of the function.
  
For now, I have ruled out that approach, and I have rebased the earlier
patch on to 4.11-rc and submitting again for review. The changes are:

v2>v3:
 - changed device flag name from PCI_DEV_FLAGS_DMA_ALIAS_ROOT to
   PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT
 - updated commit message to make the quirk clearer.

Let me know your comments and suggestions.

Thanks,
JC.

Jayachandran C (2):
  PCI: Add device flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT
  PCI: quirks: Fix ThunderX2 dma alias handling

 drivers/pci/quirks.c | 14 ++
 drivers/pci/search.c |  4 
 include/linux/pci.h  |  2 ++
 3 files changed, 20 insertions(+)

-- 
2.7.4



[PATCH v12 05/10] iommu/amd: Introduce amd_iommu_get_num_iommus()

2017-03-22 Thread Suravee Suthikulpanit
Introduce amd_iommu_get_num_iommus(), which returns the value of
amd_iommus_present. The function is used to replace direct access to
the variable, which is now declared as static.

This function will also be used by the Perf AMD IOMMU driver.

Cc: Borislav Petkov 
Cc: Joerg Roedel 
Signed-off-by: Suravee Suthikulpanit 
---
 arch/x86/events/amd/iommu.h |  2 ++
 drivers/iommu/amd_iommu.c   |  6 +++---
 drivers/iommu/amd_iommu_init.c  | 11 +--
 drivers/iommu/amd_iommu_proto.h |  1 +
 drivers/iommu/amd_iommu_types.h |  3 ---
 5 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/amd/iommu.h b/arch/x86/events/amd/iommu.h
index 845d173..5c5c932 100644
--- a/arch/x86/events/amd/iommu.h
+++ b/arch/x86/events/amd/iommu.h
@@ -28,6 +28,8 @@
 #define IOMMU_BASE_DEVID   0x0000
 
 /* amd_iommu_init.c external support functions */
+extern int amd_iommu_get_num_iommus(void);
+
 extern bool amd_iommu_pc_supported(void);
 
 extern u8 amd_iommu_pc_get_max_banks(u16 devid);
diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 98940d1..5d4dd94 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1234,7 +1234,7 @@ static void __domain_flush_pages(struct protection_domain 
*domain,
 
	build_inv_iommu_pages(&cmd, address, size, domain->id, pde);
 
-   for (i = 0; i < amd_iommus_present; ++i) {
+   for (i = 0; i < amd_iommu_get_num_iommus(); ++i) {
if (!domain->dev_iommu[i])
continue;
 
@@ -1278,7 +1278,7 @@ static void domain_flush_complete(struct 
protection_domain *domain)
 {
int i;
 
-   for (i = 0; i < amd_iommus_present; ++i) {
+   for (i = 0; i < amd_iommu_get_num_iommus(); ++i) {
if (domain && !domain->dev_iommu[i])
continue;
 
@@ -3363,7 +3363,7 @@ static int __flush_pasid(struct protection_domain 
*domain, int pasid,
 * IOMMU TLB needs to be flushed before Device TLB to
 * prevent device TLB refill from IOMMU TLB
 */
-   for (i = 0; i < amd_iommus_present; ++i) {
+   for (i = 0; i < amd_iommu_get_num_iommus(); ++i) {
if (domain->dev_iommu[i] == 0)
continue;
 
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index ce65a47..d3ec9c3 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -167,7 +167,9 @@ struct ivmd_header {
 
 /* Array to assign indices to IOMMUs*/
 struct amd_iommu *amd_iommus[MAX_IOMMUS];
-int amd_iommus_present;
+
+/* Number of IOMMUs present in the system */
+static int amd_iommus_present;
 
 /* IOMMUs have a non-present cache? */
 bool amd_iommu_np_cache __read_mostly;
@@ -272,6 +274,11 @@ static inline unsigned long tbl_size(int entry_size)
return 1UL << shift;
 }
 
+int amd_iommu_get_num_iommus(void)
+{
+   return amd_iommus_present;
+}
+
 /* Access to l1 and l2 indexed register spaces */
 
 static u32 iommu_read_l1(struct amd_iommu *iommu, u16 l1, u8 address)
@@ -1336,7 +1343,7 @@ static int __init init_iommu_one(struct amd_iommu *iommu, 
struct ivhd_header *h)
 
/* Add IOMMU to internal data structures */
	list_add_tail(&iommu->list, &amd_iommu_list);
-   iommu->index = amd_iommus_present++;
+   iommu->index = amd_iommus_present++;
 
if (unlikely(iommu->index >= MAX_IOMMUS)) {
		WARN(1, "AMD-Vi: System has more IOMMUs than supported by this driver\n");
diff --git a/drivers/iommu/amd_iommu_proto.h b/drivers/iommu/amd_iommu_proto.h
index 7eb60c1..e8f0710 100644
--- a/drivers/iommu/amd_iommu_proto.h
+++ b/drivers/iommu/amd_iommu_proto.h
@@ -21,6 +21,7 @@
 
 #include "amd_iommu_types.h"
 
+extern int amd_iommu_get_num_iommus(void);
 extern int amd_iommu_init_dma_ops(void);
 extern int amd_iommu_init_passthrough(void);
 extern irqreturn_t amd_iommu_int_thread(int irq, void *data);
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 003f3ce..4de8f41 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -611,9 +611,6 @@ struct devid_map {
  */
 extern struct amd_iommu *amd_iommus[MAX_IOMMUS];
 
-/* Number of IOMMUs present in the system */
-extern int amd_iommus_present;
-
 /*
  * Declarations for the global list of all protection domains
  */
-- 
1.8.3.1



[PATCH v12 08/10] perf/amd/iommu: Fix sysfs perf attribute groups

2017-03-22 Thread Suravee Suthikulpanit
From: Suravee Suthikulpanit 

Introduce a static amd_iommu_attr_groups array to simplify the
sysfs attribute initialization code.

Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Signed-off-by: Suravee Suthikulpanit 
---
 arch/x86/events/amd/iommu.c | 81 ++---
 1 file changed, 32 insertions(+), 49 deletions(-)

diff --git a/arch/x86/events/amd/iommu.c b/arch/x86/events/amd/iommu.c
index 88fbc80..7ac8138 100644
--- a/arch/x86/events/amd/iommu.c
+++ b/arch/x86/events/amd/iommu.c
@@ -43,14 +43,8 @@ struct perf_amd_iommu {
u8 max_counters;
u64 cntr_assign_mask;
raw_spinlock_t lock;
-   const struct attribute_group *attr_groups[4];
 };
 
-#define format_group   attr_groups[0]
-#define cpumask_group  attr_groups[1]
-#define events_group   attr_groups[2]
-#define null_group attr_groups[3]
-
 /*---------------------------------------------
  * sysfs format attributes
  *---------------------------------------------*/
@@ -81,6 +75,10 @@ struct perf_amd_iommu {
 /*---------------------------------------------
  * sysfs events attributes
  *---------------------------------------------*/
+static struct attribute_group amd_iommu_events_group = {
+   .name = "events",
+};
+
 struct amd_iommu_event_desc {
struct kobj_attribute attr;
const char *event;
@@ -384,76 +382,60 @@ static void perf_iommu_del(struct perf_event *event, int 
flags)
perf_event_update_userpage(event);
 }
 
-static __init int _init_events_attrs(struct perf_amd_iommu *perf_iommu)
+static __init int _init_events_attrs(void)
 {
-   struct attribute **attrs;
-   struct attribute_group *attr_group;
int i = 0, j;
+   struct attribute **attrs;
 
while (amd_iommu_v2_event_descs[i].attr.attr.name)
i++;
 
-   attr_group = kzalloc(sizeof(struct attribute *)
-   * (i + 1) + sizeof(*attr_group), GFP_KERNEL);
-   if (!attr_group)
+   attrs = kzalloc(sizeof(struct attribute **) * (i + 1), GFP_KERNEL);
+   if (!attrs)
return -ENOMEM;
 
-   attrs = (struct attribute **)(attr_group + 1);
for (j = 0; j < i; j++)
		attrs[j] = &amd_iommu_v2_event_descs[j].attr.attr;
 
-   attr_group->name = "events";
-   attr_group->attrs = attrs;
-   perf_iommu->events_group = attr_group;
-
+   amd_iommu_events_group.attrs = attrs;
return 0;
 }
 
 static __init void amd_iommu_pc_exit(void)
 {
-   if (__perf_iommu.events_group != NULL) {
-   kfree(__perf_iommu.events_group);
-   __perf_iommu.events_group = NULL;
-   }
+   kfree(amd_iommu_events_group.attrs);
 }
 
-static __init int _init_perf_amd_iommu(
-   struct perf_amd_iommu *perf_iommu, char *name)
+const struct attribute_group *amd_iommu_attr_groups[] = {
+   &amd_iommu_format_group,
+   &amd_iommu_cpumask_group,
+   &amd_iommu_events_group,
+   NULL,
+};
+
+static __init int
+_init_perf_amd_iommu(struct perf_amd_iommu *perf_iommu, char *name)
 {
int ret;
 
	raw_spin_lock_init(&perf_iommu->lock);

-	perf_iommu->format_group = &amd_iommu_format_group;
-
	/* Init cpumask attributes to only core 0 */
	cpumask_set_cpu(0, &iommu_cpumask);
-	perf_iommu->cpumask_group = &amd_iommu_cpumask_group;
-
-   ret = _init_events_attrs(perf_iommu);
-   if (ret) {
-   pr_err("Error initializing AMD IOMMU perf events.\n");
-   return ret;
-   }
 
perf_iommu->max_banks= amd_iommu_pc_get_max_banks(0);
perf_iommu->max_counters = amd_iommu_pc_get_max_counters(0);
if (!perf_iommu->max_banks || !perf_iommu->max_counters)
return -EINVAL;
 
-   perf_iommu->null_group = NULL;
-   perf_iommu->pmu.attr_groups = perf_iommu->attr_groups;
-
+   perf_iommu->pmu.attr_groups = amd_iommu_attr_groups;
	ret = perf_pmu_register(&perf_iommu->pmu, name, -1);
-   if (ret) {
+   if (ret)
pr_err("Error initializing AMD IOMMU perf counters.\n");
-   amd_iommu_pc_exit();
-   } else {
+   else
pr_info("Detected AMD IOMMU (%d banks, %d counters/bank).\n",
amd_iommu_pc_get_max_banks(0),
amd_iommu_pc_get_max_counters(0));
-   }
-
return ret;
 }
 
@@ -467,24 +449,25 @@ static __init int _init_perf_amd_iommu(
.stop   = perf_iommu_stop,
.read   = perf_iommu_read,
},
-   .max_banks  = 0x00,
-   .max_counters   = 0x00,
-   .cntr_assign_mask   = 0ULL,
-   .format_group   = NULL,
-   .cpumask_group  = NULL,
-   .events_group   = NULL,
-   .null_group = NULL,
 };
 
 static __init int amd_iommu_pc_init(void)
 {
+   int ret;
+
/* Make sure the IOMMU 

[PATCH v12 09/10] perf/amd/iommu: Introduce amd_iommu-specific struct in struct hw_perf_event

2017-03-22 Thread Suravee Suthikulpanit
From: Suravee Suthikulpanit 

The current AMD IOMMU perf PMU inappropriately uses a hardware struct
inside the union in struct hw_perf_event, mainly via extra_reg.

Instead, introduce an amd_iommu-specific struct with the parameters
required to program the IOMMU performance counter control registers.

Also update the pasid field from 16 to 20 bits.
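
For reference, a minimal sketch of the amd_iommu-specific member added
to the union in struct hw_perf_event; the field names follow the usage
in the diff below, while the exact layout and padding are an assumption:

	struct { /* amd_iommu */
		u8	iommu_bank;	/* assigned counter bank */
		u8	iommu_cntr;	/* assigned counter within the bank */
		u16	padding;	/* assumption: explicit pad */
		u64	conf;		/* from event->attr.config */
		u64	conf1;		/* from event->attr.config1 */
	};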

Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Signed-off-by: Suravee Suthikulpanit 
---
 arch/x86/events/amd/iommu.c | 102 +++-
 include/linux/perf_event.h  |   7 +++
 2 files changed, 52 insertions(+), 57 deletions(-)

diff --git a/arch/x86/events/amd/iommu.c b/arch/x86/events/amd/iommu.c
index 7ac8138..915a20c 100644
--- a/arch/x86/events/amd/iommu.c
+++ b/arch/x86/events/amd/iommu.c
@@ -23,17 +23,16 @@
 
 #define COUNTER_SHIFT  16
 
-#define _GET_BANK(ev)   ((u8)(ev->hw.extra_reg.reg >> 8))
-#define _GET_CNTR(ev)   ((u8)(ev->hw.extra_reg.reg))
-
-/* iommu pmu config masks */
-#define _GET_CSOURCE(ev)((ev->hw.config & 0xFFULL))
-#define _GET_DEVID(ev)  ((ev->hw.config >> 8)  & 0xFFFFULL)
-#define _GET_PASID(ev)  ((ev->hw.config >> 24) & 0xFFFFULL)
-#define _GET_DOMID(ev)  ((ev->hw.config >> 40) & 0xFFFFULL)
-#define _GET_DEVID_MASK(ev) ((ev->hw.extra_reg.config)  & 0xFFFFULL)
-#define _GET_PASID_MASK(ev) ((ev->hw.extra_reg.config >> 16) & 0xFFFFULL)
-#define _GET_DOMID_MASK(ev) ((ev->hw.extra_reg.config >> 32) & 0xFFFFULL)
+/* iommu pmu conf masks */
+#define GET_CSOURCE(x) (x->conf & 0xFFULL)
+#define GET_DEVID(x)   ((x->conf >> 8)  & 0xFFFFULL)
+#define GET_DOMID(x)   ((x->conf >> 24) & 0xFFFFULL)
+#define GET_PASID(x)   ((x->conf >> 40) & 0xFFFFFULL)
+
+/* iommu pmu conf1 masks */
+#define GET_DEVID_MASK(x)  (x->conf1  & 0xFFFFULL)
+#define GET_DOMID_MASK(x)  ((x->conf1 >> 16) & 0xFFFFULL)
+#define GET_PASID_MASK(x)  ((x->conf1 >> 32) & 0xFFFFFULL)
 
 static struct perf_amd_iommu __perf_iommu;
 
@@ -50,11 +49,11 @@ struct perf_amd_iommu {
  *-*/
 PMU_FORMAT_ATTR(csource,"config:0-7");
 PMU_FORMAT_ATTR(devid,  "config:8-23");
-PMU_FORMAT_ATTR(pasid,  "config:24-39");
-PMU_FORMAT_ATTR(domid,  "config:40-55");
+PMU_FORMAT_ATTR(domid,  "config:24-39");
+PMU_FORMAT_ATTR(pasid,  "config:40-59");
 PMU_FORMAT_ATTR(devid_mask, "config1:0-15");
-PMU_FORMAT_ATTR(pasid_mask, "config1:16-31");
-PMU_FORMAT_ATTR(domid_mask, "config1:32-47");
+PMU_FORMAT_ATTR(domid_mask, "config1:16-31");
+PMU_FORMAT_ATTR(pasid_mask, "config1:32-51");
 
 static struct attribute *iommu_format_attrs[] = {
	&format_attr_csource.attr,
@@ -150,10 +149,13 @@ static ssize_t _iommu_cpumask_show(struct device *dev,
 
 /*-*/
 
-static int get_next_avail_iommu_bnk_cntr(struct perf_amd_iommu *perf_iommu)
+static int get_next_avail_iommu_bnk_cntr(struct perf_event *event)
 {
unsigned long flags;
-   int shift, bank, cntr, retval;
+   u32 shift, bank, cntr;
+   int retval;
+   struct perf_amd_iommu *perf_iommu =
+   container_of(event->pmu, struct perf_amd_iommu, pmu);
int max_banks = perf_iommu->max_banks;
int max_cntrs = perf_iommu->max_counters;
 
@@ -166,7 +168,9 @@ static int get_next_avail_iommu_bnk_cntr(struct perf_amd_iommu *perf_iommu)
continue;
} else {
perf_iommu->cntr_assign_mask |= BIT_ULL(shift);
-   retval = ((bank & 0xFF) << 8) | (cntr & 0xFF);
+   event->hw.iommu_bank = bank;
+   event->hw.iommu_cntr = cntr;
+   retval = 0;
goto out;
}
}
@@ -202,8 +206,6 @@ static int clear_avail_iommu_bnk_cntr(struct perf_amd_iommu *perf_iommu,
 static int perf_iommu_event_init(struct perf_event *event)
 {
	struct hw_perf_event *hwc = &event->hw;
-   struct perf_amd_iommu *perf_iommu;
-   u64 config, config1;
 
/* test the event attr type check for PMU enumeration */
if (event->attr.type != event->pmu->type)
@@ -225,21 +227,9 @@ static int perf_iommu_event_init(struct perf_event *event)
if (event->cpu < 0)
return -EINVAL;
 
-   perf_iommu = &__perf_iommu;
-
-	if (event->pmu != &perf_iommu->pmu)
-   return -ENOENT;
-
-   if (perf_iommu) {
-   config = event->attr.config;
-   config1 = event->attr.config1;
-   } else {
-   return -EINVAL;
-   }
-
/* update the hw_perf_event struct with the iommu config data */
-   hwc->config = config;
-   hwc->extra_reg.config = config1;
+	hwc->conf  = event->attr.config;
+	hwc->conf1 = event->attr.config1;

[PATCH v12 07/10] perf/amd/iommu: Modify amd_iommu_pc_get_set_reg_val() to allow specifying IOMMU

2017-03-22 Thread Suravee Suthikulpanit
From: Suravee Suthikulpanit 

The current amd_iommu_pc_get_set_reg_val() cannot support multiple IOMMUs.
So, modify it to allow callers to specify an IOMMU. This prepares the
driver for multi-IOMMU support in a subsequent patch.
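
A sketch of the resulting interface, reconstructed from the call sites
in this patch; the exact prototypes are an assumption:

	struct amd_iommu;

	int amd_iommu_pc_get_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,
				 u8 fxn, u64 *value);
	int amd_iommu_pc_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,
				 u8 fxn, u64 *value);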

Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Joerg Roedel 

Signed-off-by: Suravee Suthikulpanit 
---
 arch/x86/events/amd/iommu.c | 38 
 arch/x86/events/amd/iommu.h |  9 +++--
 drivers/iommu/amd_iommu_init.c  | 43 +++--
 drivers/iommu/amd_iommu_proto.h |  5 -
 4 files changed, 48 insertions(+), 47 deletions(-)

diff --git a/arch/x86/events/amd/iommu.c b/arch/x86/events/amd/iommu.c
index 10f67d3..88fbc80 100644
--- a/arch/x86/events/amd/iommu.c
+++ b/arch/x86/events/amd/iommu.c
@@ -248,49 +248,45 @@ static int perf_iommu_event_init(struct perf_event *event)
 
 static void perf_iommu_enable_event(struct perf_event *ev)
 {
+   struct amd_iommu *iommu = get_amd_iommu(0);
u8 csource = _GET_CSOURCE(ev);
u16 devid = _GET_DEVID(ev);
+   u8 bank = _GET_BANK(ev);
+   u8 cntr = _GET_CNTR(ev);
u64 reg = 0ULL;
 
reg = csource;
-	amd_iommu_pc_get_set_reg_val(devid,
-			_GET_BANK(ev), _GET_CNTR(ev) ,
-			IOMMU_PC_COUNTER_SRC_REG, &reg, true);
+	amd_iommu_pc_set_reg(iommu, bank, cntr, IOMMU_PC_COUNTER_SRC_REG, &reg);
 
reg = devid | (_GET_DEVID_MASK(ev) << 32);
if (reg)
reg |= BIT(31);
-	amd_iommu_pc_get_set_reg_val(devid,
-			_GET_BANK(ev), _GET_CNTR(ev) ,
-			IOMMU_PC_DEVID_MATCH_REG, &reg, true);
+	amd_iommu_pc_set_reg(iommu, bank, cntr, IOMMU_PC_DEVID_MATCH_REG, &reg);
 
reg = _GET_PASID(ev) | (_GET_PASID_MASK(ev) << 32);
if (reg)
reg |= BIT(31);
-	amd_iommu_pc_get_set_reg_val(devid,
-			_GET_BANK(ev), _GET_CNTR(ev) ,
-			IOMMU_PC_PASID_MATCH_REG, &reg, true);
+	amd_iommu_pc_set_reg(iommu, bank, cntr, IOMMU_PC_PASID_MATCH_REG, &reg);
 
reg = _GET_DOMID(ev) | (_GET_DOMID_MASK(ev) << 32);
if (reg)
reg |= BIT(31);
-	amd_iommu_pc_get_set_reg_val(devid,
-			_GET_BANK(ev), _GET_CNTR(ev) ,
-			IOMMU_PC_DOMID_MATCH_REG, &reg, true);
+	amd_iommu_pc_set_reg(iommu, bank, cntr, IOMMU_PC_DOMID_MATCH_REG, &reg);
 }
 
 static void perf_iommu_disable_event(struct perf_event *event)
 {
+   struct amd_iommu *iommu = get_amd_iommu(0);
u64 reg = 0ULL;
 
-	amd_iommu_pc_get_set_reg_val(_GET_DEVID(event),
-			_GET_BANK(event), _GET_CNTR(event),
-			IOMMU_PC_COUNTER_SRC_REG, &reg, true);
+	amd_iommu_pc_set_reg(iommu, _GET_BANK(event), _GET_CNTR(event),
+			     IOMMU_PC_COUNTER_SRC_REG, &reg);
 }
 
 static void perf_iommu_start(struct perf_event *event, int flags)
 {
	struct hw_perf_event *hwc = &event->hw;
+   struct amd_iommu *iommu = get_amd_iommu(0);
 
if (WARN_ON_ONCE(!(hwc->state & PERF_HES_STOPPED)))
return;
@@ -300,9 +296,8 @@ static void perf_iommu_start(struct perf_event *event, int flags)
 
if (flags & PERF_EF_RELOAD) {
 		u64 prev_raw_count =  local64_read(&hwc->prev_count);
-		amd_iommu_pc_get_set_reg_val(_GET_DEVID(event),
-				_GET_BANK(event), _GET_CNTR(event),
-				IOMMU_PC_COUNTER_REG, &prev_raw_count, true);
+		amd_iommu_pc_set_reg(iommu, _GET_BANK(event), _GET_CNTR(event),
+				     IOMMU_PC_COUNTER_REG, &prev_raw_count);
}
 
perf_iommu_enable_event(event);
@@ -314,10 +309,11 @@ static void perf_iommu_read(struct perf_event *event)
 {
u64 count, prev, delta;
	struct hw_perf_event *hwc = &event->hw;
+   struct amd_iommu *iommu = get_amd_iommu(0);
 
-	amd_iommu_pc_get_set_reg_val(_GET_DEVID(event),
-			_GET_BANK(event), _GET_CNTR(event),
-			IOMMU_PC_COUNTER_REG, &count, false);
+	if (amd_iommu_pc_get_reg(iommu, _GET_BANK(event), _GET_CNTR(event),
+				 IOMMU_PC_COUNTER_REG, &count))
+   return;
 
/* IOMMU pc counter register is only 48 bits */
count &= GENMASK_ULL(47, 0);
diff --git a/arch/x86/events/amd/iommu.h b/arch/x86/events/amd/iommu.h
index b775107..62e0702 100644
--- a/arch/x86/events/amd/iommu.h
+++ b/arch/x86/events/amd/iommu.h
@@ -24,6 +24,8 @@
 #define PC_MAX_SPEC_BNKS   64
 #define PC_MAX_SPEC_CNTRS  16
 
+struct amd_iommu;
+
 /* amd_iommu_init.c external support functions */
 extern int amd_iommu_get_num_iommus(void);
 
@@ -33,8 +35,11 @@
 
 extern u8 

[PATCH v12 10/10] perf/amd/iommu: Enable support for multiple IOMMUs

2017-03-22 Thread Suravee Suthikulpanit
From: Suravee Suthikulpanit 

Add multi-IOMMU support for perf by exposing an AMD IOMMU PMU
for each IOMMU found in the system via:

  /sys/bus/event_source/devices/amd_iommu_x

where x is the IOMMU index. This allows users to specify
different events to be programmed onto performance counters
of each IOMMU.
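
For example, the same event can then be counted on two IOMMUs
separately; an illustrative invocation (the csource value is made up,
not a documented event id):

  perf stat -e 'amd_iommu_0/csource=0x1/' \
            -e 'amd_iommu_1/csource=0x1/' -a sleep 5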

Cc: Borislav Petkov 
Cc: Peter Zijlstra 
Signed-off-by: Suravee Suthikulpanit 
---
 arch/x86/events/amd/iommu.c | 111 
 1 file changed, 70 insertions(+), 41 deletions(-)

diff --git a/arch/x86/events/amd/iommu.c b/arch/x86/events/amd/iommu.c
index 915a20c..9a12077 100644
--- a/arch/x86/events/amd/iommu.c
+++ b/arch/x86/events/amd/iommu.c
@@ -34,16 +34,21 @@
 #define GET_DOMID_MASK(x)  ((x->conf1 >> 16) & 0xFFFFULL)
 #define GET_PASID_MASK(x)  ((x->conf1 >> 32) & 0xFFFFFULL)
 
-static struct perf_amd_iommu __perf_iommu;
+#define PERF_AMD_IOMMU_NAME_SIZE   16
 
 struct perf_amd_iommu {
+   struct list_head list;
struct pmu pmu;
+   struct amd_iommu *iommu;
+   char name[PERF_AMD_IOMMU_NAME_SIZE];
u8 max_banks;
u8 max_counters;
u64 cntr_assign_mask;
raw_spinlock_t lock;
 };
 
+static LIST_HEAD(perf_amd_iommu_list);
+
 /*-
  * sysfs format attributes
  *-*/
@@ -234,9 +239,14 @@ static int perf_iommu_event_init(struct perf_event *event)
return 0;
 }
 
+static inline struct amd_iommu *perf_event_2_iommu(struct perf_event *ev)
+{
+   return (container_of(ev->pmu, struct perf_amd_iommu, pmu))->iommu;
+}
+
 static void perf_iommu_enable_event(struct perf_event *ev)
 {
-   struct amd_iommu *iommu = get_amd_iommu(0);
+   struct amd_iommu *iommu = perf_event_2_iommu(ev);
	struct hw_perf_event *hwc = &ev->hw;
u8 bank = hwc->iommu_bank;
u8 cntr = hwc->iommu_cntr;
@@ -266,7 +276,7 @@ static void perf_iommu_enable_event(struct perf_event *ev)
 
 static void perf_iommu_disable_event(struct perf_event *event)
 {
-   struct amd_iommu *iommu = get_amd_iommu(0);
+   struct amd_iommu *iommu = perf_event_2_iommu(event);
	struct hw_perf_event *hwc = &event->hw;
u64 reg = 0ULL;
 
@@ -286,7 +296,7 @@ static void perf_iommu_start(struct perf_event *event, int flags)
 
if (flags & PERF_EF_RELOAD) {
		u64 prev_raw_count = local64_read(&hwc->prev_count);
-   struct amd_iommu *iommu = get_amd_iommu(0);
+   struct amd_iommu *iommu = perf_event_2_iommu(event);
 
amd_iommu_pc_set_reg(iommu, hwc->iommu_bank, hwc->iommu_cntr,
				     IOMMU_PC_COUNTER_REG, &prev_raw_count);
@@ -301,7 +311,7 @@ static void perf_iommu_read(struct perf_event *event)
 {
u64 count, prev, delta;
	struct hw_perf_event *hwc = &event->hw;
-   struct amd_iommu *iommu = get_amd_iommu(0);
+   struct amd_iommu *iommu = perf_event_2_iommu(event);
 
if (amd_iommu_pc_get_reg(iommu, hwc->iommu_bank, hwc->iommu_cntr,
				 IOMMU_PC_COUNTER_REG, &count))
@@ -389,11 +399,6 @@ static __init int _init_events_attrs(void)
return 0;
 }
 
-static __init void amd_iommu_pc_exit(void)
-{
-   kfree(amd_iommu_events_group.attrs);
-}
-
 const struct attribute_group *amd_iommu_attr_groups[] = {
	&amd_iommu_format_group,
	&amd_iommu_cpumask_group,
@@ -401,47 +406,56 @@ static __init void amd_iommu_pc_exit(void)
NULL,
 };
 
+static struct pmu iommu_pmu = {
+   .event_init = perf_iommu_event_init,
+   .add= perf_iommu_add,
+   .del= perf_iommu_del,
+   .start  = perf_iommu_start,
+   .stop   = perf_iommu_stop,
+   .read   = perf_iommu_read,
+   .task_ctx_nr= perf_invalid_context,
+   .attr_groups= amd_iommu_attr_groups,
+};
+
 static __init int
-_init_perf_amd_iommu(struct perf_amd_iommu *perf_iommu, char *name)
+init_one_iommu(unsigned int idx)
 {
int ret;
+   struct perf_amd_iommu *perf_iommu;
 
-	raw_spin_lock_init(&perf_iommu->lock);
+   perf_iommu = kzalloc(sizeof(struct perf_amd_iommu), GFP_KERNEL);
+   if (!perf_iommu)
+   return -ENOMEM;
 
-   /* Init cpumask attributes to only core 0 */
-	cpumask_set_cpu(0, &iommu_cpumask);
+	raw_spin_lock_init(&perf_iommu->lock);
 
-   perf_iommu->max_banks= amd_iommu_pc_get_max_banks(0);
-   perf_iommu->max_counters = amd_iommu_pc_get_max_counters(0);
-   if (!perf_iommu->max_banks || !perf_iommu->max_counters)
+   perf_iommu->pmu  = iommu_pmu;
+   perf_iommu->iommu= get_amd_iommu(idx);
+   perf_iommu->max_banks= amd_iommu_pc_get_max_banks(idx);
+   perf_iommu->max_counters = amd_iommu_pc_get_max_counters(idx);
+   if (!perf_iommu->iommu || !perf_iommu->max_banks ||
+   

[PATCH v12 04/10] iommu/amd: Clean up iommu_pc_get_set_reg()

2017-03-22 Thread Suravee Suthikulpanit
From: Suravee Suthikulpanit 

Clean up the coding style and fix a bug in the 64-bit register read
logic: the low-half read used '=' instead of '|=', overwriting the
upper 32 bits that had just been read.
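
As a worked example of the offset computation below (values are
illustrative): with bank = 1, cntr = 2 and fxn = 0x18,

	offset = ((0x40 | 1) << 12) | (2 << 8) | 0x18
	       = 0x41000 | 0x200 | 0x18 = 0x41218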

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd_iommu_init.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 6130278..ce65a47 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -2763,22 +2763,25 @@ static int iommu_pc_get_set_reg_val(struct amd_iommu *iommu,
if (WARN_ON((fxn > 0x28) || (fxn & 7)))
return -ENODEV;
 
-   offset = (u32)(((0x40|bank) << 12) | (cntr << 8) | fxn);
+   offset = (u32)(((0x40 | bank) << 12) | (cntr << 8) | fxn);
 
/* Limit the offset to the hw defined mmio region aperture */
-   max_offset_lim = (u32)(((0x40|iommu->max_banks) << 12) |
+   max_offset_lim = (u32)(((0x40 | iommu->max_banks) << 12) |
(iommu->max_counters << 8) | 0x28);
if ((offset < MMIO_CNTR_REG_OFFSET) ||
(offset > max_offset_lim))
return -EINVAL;
 
if (is_write) {
-   writel((u32)*value, iommu->mmio_base + offset);
-   writel((*value >> 32), iommu->mmio_base + offset + 4);
+   u64 val = *value & GENMASK_ULL(47, 0);
+
+   writel((u32)val, iommu->mmio_base + offset);
+   writel((val >> 32), iommu->mmio_base + offset + 4);
} else {
*value = readl(iommu->mmio_base + offset + 4);
*value <<= 32;
-   *value = readl(iommu->mmio_base + offset);
+   *value |= readl(iommu->mmio_base + offset);
+   *value &= GENMASK_ULL(47, 0);
}
 
return 0;
-- 
1.8.3.1



[PATCH v12 03/10] perf/amd/iommu: Clean up perf_iommu_read()

2017-03-22 Thread Suravee Suthikulpanit
Fix coding style and make use of GENMASK_ULL macro.
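
For reference, GENMASK_ULL(47, 0) expands to 0x0000ffffffffffffULL, the
48-bit counter mask previously spelled out by hand:

	count &= GENMASK_ULL(47, 0);	/* == count &= 0xFFFFFFFFFFFFULL */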

Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Signed-off-by: Suravee Suthikulpanit 
---
 arch/x86/events/amd/iommu.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/arch/x86/events/amd/iommu.c b/arch/x86/events/amd/iommu.c
index e112f49..d4375da 100644
--- a/arch/x86/events/amd/iommu.c
+++ b/arch/x86/events/amd/iommu.c
@@ -320,9 +320,7 @@ static void perf_iommu_start(struct perf_event *event, int flags)
 
 static void perf_iommu_read(struct perf_event *event)
 {
-   u64 count = 0ULL;
-   u64 prev_raw_count = 0ULL;
-   u64 delta = 0ULL;
+   u64 count, prev, delta;
	struct hw_perf_event *hwc = &event->hw;
 
	amd_iommu_pc_get_set_reg_val(_GET_DEVID(event),
			_GET_BANK(event), _GET_CNTR(event),
			IOMMU_PC_COUNTER_REG, &count, false);
 
/* IOMMU pc counter register is only 48 bits */
-	count &= 0xFFFFFFFFFFFFULL;
+   count &= GENMASK_ULL(47, 0);
 
-	prev_raw_count =  local64_read(&hwc->prev_count);
-	if (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
-			count) != prev_raw_count)
+	prev = local64_read(&hwc->prev_count);
+	if (local64_cmpxchg(&hwc->prev_count, prev, count) != prev)
return;
 
-   /* Handling 48-bit counter overflowing */
-   delta = (count << COUNTER_SHIFT) - (prev_raw_count << COUNTER_SHIFT);
+   /* Handle 48-bit counter overflow */
+   delta = (count << COUNTER_SHIFT) - (prev << COUNTER_SHIFT);
delta >>= COUNTER_SHIFT;
	local64_add(delta, &event->count);
-
 }
 
 static void perf_iommu_stop(struct perf_event *event, int flags)
-- 
1.8.3.1



[PATCH v12 01/10] perf/amd/iommu: Declare pr_fmt and remove unnecessary pr_debug

2017-03-22 Thread Suravee Suthikulpanit
Declare pr_fmt for perf/amd_iommu and remove unnecessary pr_debug.

Also check the return value of _init_events_attrs and bail out when
it fails.
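
With pr_fmt defined at the top of the file, every pr_*() call in it is
prefixed automatically; a small illustration (the output line is what
one would expect, not a captured log):

	#define pr_fmt(fmt)	"perf/amd_iommu: " fmt

	pr_info("Detected AMD IOMMU (%d banks, %d counters/bank).\n", 2, 4);
	/* console: perf/amd_iommu: Detected AMD IOMMU (2 banks, 4 counters/bank). */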

Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Signed-off-by: Suravee Suthikulpanit 
---
 arch/x86/events/amd/iommu.c | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/arch/x86/events/amd/iommu.c b/arch/x86/events/amd/iommu.c
index b28200d..8d8ed40 100644
--- a/arch/x86/events/amd/iommu.c
+++ b/arch/x86/events/amd/iommu.c
@@ -11,6 +11,8 @@
  * published by the Free Software Foundation.
  */
 
+#define pr_fmt(fmt)"perf/amd_iommu: " fmt
+
 #include 
 #include 
 #include 
@@ -298,7 +300,6 @@ static void perf_iommu_start(struct perf_event *event, int flags)
 {
	struct hw_perf_event *hwc = &event->hw;
 
-   pr_debug("perf: amd_iommu:perf_iommu_start\n");
if (WARN_ON_ONCE(!(hwc->state & PERF_HES_STOPPED)))
return;
 
@@ -323,7 +324,6 @@ static void perf_iommu_read(struct perf_event *event)
u64 prev_raw_count = 0ULL;
u64 delta = 0ULL;
	struct hw_perf_event *hwc = &event->hw;
-   pr_debug("perf: amd_iommu:perf_iommu_read\n");
 
amd_iommu_pc_get_set_reg_val(_GET_DEVID(event),
_GET_BANK(event), _GET_CNTR(event),
@@ -349,8 +349,6 @@ static void perf_iommu_stop(struct perf_event *event, int flags)
	struct hw_perf_event *hwc = &event->hw;
u64 config;
 
-   pr_debug("perf: amd_iommu:perf_iommu_stop\n");
-
if (hwc->state & PERF_HES_UPTODATE)
return;
 
@@ -372,7 +370,6 @@ static int perf_iommu_add(struct perf_event *event, int flags)
struct perf_amd_iommu *perf_iommu =
container_of(event->pmu, struct perf_amd_iommu, pmu);
 
-   pr_debug("perf: amd_iommu:perf_iommu_add\n");
event->hw.state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
 
/* request an iommu bank/counter */
@@ -393,7 +390,6 @@ static void perf_iommu_del(struct perf_event *event, int flags)
struct perf_amd_iommu *perf_iommu =
container_of(event->pmu, struct perf_amd_iommu, pmu);
 
-   pr_debug("perf: amd_iommu:perf_iommu_del\n");
perf_iommu_stop(event, PERF_EF_UPDATE);
 
/* clear the assigned iommu bank/counter */
@@ -444,27 +440,27 @@ static __init int _init_perf_amd_iommu(
 
raw_spin_lock_init(_iommu->lock);
 
-   /* Init format attributes */
	perf_iommu->format_group = &amd_iommu_format_group;
 
/* Init cpumask attributes to only core 0 */
	cpumask_set_cpu(0, &iommu_cpumask);
	perf_iommu->cpumask_group = &amd_iommu_cpumask_group;
 
-   /* Init events attributes */
-   if (_init_events_attrs(perf_iommu) != 0)
-   pr_err("perf: amd_iommu: Only support raw events.\n");
+   ret = _init_events_attrs(perf_iommu);
+   if (ret) {
+   pr_err("Error initializing AMD IOMMU perf events.\n");
+   return ret;
+   }
 
-   /* Init null attributes */
perf_iommu->null_group = NULL;
perf_iommu->pmu.attr_groups = perf_iommu->attr_groups;
 
	ret = perf_pmu_register(&perf_iommu->pmu, name, -1);
if (ret) {
-   pr_err("perf: amd_iommu: Failed to initialized.\n");
+   pr_err("Error initializing AMD IOMMU perf counters.\n");
amd_iommu_pc_exit();
} else {
-   pr_info("perf: amd_iommu: Detected. (%d banks, %d 
counters/bank)\n",
+   pr_info("Detected AMD IOMMU (%d banks, %d counters/bank).\n",
amd_iommu_pc_get_max_banks(IOMMU_BASE_DEVID),
amd_iommu_pc_get_max_counters(IOMMU_BASE_DEVID));
}
-- 
1.8.3.1



[PATCH v12 02/10] perf/amd/iommu: Clean up bitwise operations

2017-03-22 Thread Suravee Suthikulpanit
From: Suravee Suthikulpanit 

Clean up register initialization and make use of BIT_ULL(x)
where appropriate. This should not affect logic or functionality.

Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Signed-off-by: Suravee Suthikulpanit 
---
 arch/x86/events/amd/iommu.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/events/amd/iommu.c b/arch/x86/events/amd/iommu.c
index 8d8ed40..e112f49 100644
--- a/arch/x86/events/amd/iommu.c
+++ b/arch/x86/events/amd/iommu.c
@@ -164,11 +164,11 @@ static int get_next_avail_iommu_bnk_cntr(struct perf_amd_iommu *perf_iommu)
for (bank = 0, shift = 0; bank < max_banks; bank++) {
for (cntr = 0; cntr < max_cntrs; cntr++) {
shift = bank + (bank*3) + cntr;
-			if (perf_iommu->cntr_assign_mask & (1ULL<<shift)) {
+			if (perf_iommu->cntr_assign_mask & BIT_ULL(shift)) {
continue;
} else {
-				perf_iommu->cntr_assign_mask |= (1ULL<<shift);
+				perf_iommu->cntr_assign_mask |= BIT_ULL(shift);
+   retval = ((bank & 0xFF) << 8) | (cntr & 0xFF);
goto out;
}
}
@@ -265,23 +265,23 @@ static void perf_iommu_enable_event(struct perf_event *ev)
_GET_BANK(ev), _GET_CNTR(ev) ,
			 IOMMU_PC_COUNTER_SRC_REG, &reg, true);
 
-   reg = 0ULL | devid | (_GET_DEVID_MASK(ev) << 32);
+   reg = devid | (_GET_DEVID_MASK(ev) << 32);
if (reg)
-   reg |= (1UL << 31);
+   reg |= BIT(31);
amd_iommu_pc_get_set_reg_val(devid,
_GET_BANK(ev), _GET_CNTR(ev) ,
			 IOMMU_PC_DEVID_MATCH_REG, &reg, true);
 
-   reg = 0ULL | _GET_PASID(ev) | (_GET_PASID_MASK(ev) << 32);
+   reg = _GET_PASID(ev) | (_GET_PASID_MASK(ev) << 32);
if (reg)
-   reg |= (1UL << 31);
+   reg |= BIT(31);
amd_iommu_pc_get_set_reg_val(devid,
_GET_BANK(ev), _GET_CNTR(ev) ,
			 IOMMU_PC_PASID_MATCH_REG, &reg, true);
 
-   reg = 0ULL | _GET_DOMID(ev) | (_GET_DOMID_MASK(ev) << 32);
+   reg = _GET_DOMID(ev) | (_GET_DOMID_MASK(ev) << 32);
if (reg)
-   reg |= (1UL << 31);
+   reg |= BIT(31);
amd_iommu_pc_get_set_reg_val(devid,
_GET_BANK(ev), _GET_CNTR(ev) ,
			 IOMMU_PC_DOMID_MATCH_REG, &reg, true);
-- 
1.8.3.1



[PATCH v12 00/10] perf/amd/iommu: Enable multi-IOMMU support

2017-03-22 Thread Suravee Suthikulpanit
From: Suravee Suthikulpanit 

This patch series modifies the existing IOMMU and Perf drivers to support
systems with multiple IOMMUs by allocating an amd_iommu PMU per IOMMU instance.
This allows users to specify performance events and filters separately for each
IOMMU.

This has been tested on a new family17h-based server w/ multiple IOMMUs.

Git branch containing this patch series is available here:

https://github.com/ssuthiku/linux.git  perf-iommu-v12

Changes from V11 (https://lkml.org/lkml/2017/2/24/114)
  * Cleaning up in patch 9 and 10 per Boris' suggestions.

Changes from V10 (https://lkml.org/lkml/2017/2/15/1202)
  * Introduce amd_iommu specific struct in struct hw_perf_event
(patch 9/10).
  * Minor clean up on pr_info (patch 1/10 per Joe Perches).

Changes from V9 (https://lkml.org/lkml/2017/2/7/105)
  * Since each IOMMU PMU in a system functions independently from 
one another, the amd_iommu_pc_init() function has been modified
to allow certain IOMMU PMU initialization to fail w/o disabling
the others (patch 9/9).
  * Define a static pmu (iommu_pmu), and use struct assignment directly
    (per Boris).
  * Refactor code clean up into a separate patch (4/9) (per Boris)

Thanks,
Suravee

Suravee Suthikulpanit (10):
  perf/amd/iommu: Declare pr_fmt and remove unnecessary pr_debug
  perf/amd/iommu: Clean up bitwise operations
  perf/amd/iommu: Clean up perf_iommu_read()
  iommu/amd: Clean up iommu_pc_get_set_reg()
  iommu/amd: Introduce amd_iommu_get_num_iommus()
  perf/amd/iommu: Modify functions to query max banks and counters
  perf/amd/iommu: Modify amd_iommu_pc_get_set_reg_val() to allow
specifying IOMMU
  perf/amd/iommu: Fix sysfs perf attribute groups
  perf/amd/iommu: Introduce amd_iommu-specific struct in struct
hw_perf_event
  perf/amd/iommu: Enable support for multiple IOMMUs

 arch/x86/events/amd/iommu.c | 321 +++-
 arch/x86/events/amd/iommu.h |  18 ++-
 drivers/iommu/amd_iommu.c   |   6 +-
 drivers/iommu/amd_iommu_init.c  | 101 -
 drivers/iommu/amd_iommu_proto.h |   8 +-
 drivers/iommu/amd_iommu_types.h |   3 -
 include/linux/perf_event.h  |   7 +
 7 files changed, 237 insertions(+), 227 deletions(-)

-- 
1.8.3.1



[PATCH 7/7] iommu/iova: fix iovad->dma_32bit_pfn as the last pfn of dma32

2017-03-22 Thread Zhen Lei
Set iovad->dma_32bit_pfn to the last pfn of the dma32 range, so that
iovad->cached32_node and iovad->cached64_node exactly control the dma32
and dma64 areas. This also lets us remove the pfn_32bit parameter of
init_iova_domain.
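
A sketch of the resulting interface, assuming init_iova_domain() now
fixes iovad->dma_32bit_pfn internally to IOVA_PFN(DMA_BIT_MASK(32)):

	void init_iova_domain(struct iova_domain *iovad, unsigned long granule,
			      unsigned long start_pfn);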

Signed-off-by: Zhen Lei 
---
 drivers/iommu/amd_iommu.c|  7 ++-
 drivers/iommu/dma-iommu.c| 22 +-
 drivers/iommu/intel-iommu.c  | 11 +++
 drivers/iommu/iova.c |  4 ++--
 drivers/misc/mic/scif/scif_rma.c |  3 +--
 include/linux/iova.h |  2 +-
 6 files changed, 14 insertions(+), 35 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 98940d1..78c8b93 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -61,7 +61,6 @@
 /* IO virtual address start page frame number */
 #define IOVA_START_PFN (1)
 #define IOVA_PFN(addr) ((addr) >> PAGE_SHIFT)
-#define DMA_32BIT_PFN  IOVA_PFN(DMA_BIT_MASK(32))
 
 /* Reserved IOVA ranges */
 #define MSI_RANGE_START		(0xfee00000)
@@ -1776,8 +1775,7 @@ static struct dma_ops_domain *dma_ops_domain_alloc(void)
if (!dma_dom->domain.pt_root)
goto free_dma_dom;
 
-	init_iova_domain(&dma_dom->iovad, PAGE_SIZE,
-			 IOVA_START_PFN, DMA_32BIT_PFN);
+	init_iova_domain(&dma_dom->iovad, PAGE_SIZE, IOVA_START_PFN);
 
/* Initialize reserved ranges */
	copy_reserved_iova(&reserved_iova_ranges, &dma_dom->iovad);
@@ -2747,8 +2745,7 @@ static int init_reserved_iova_ranges(void)
struct pci_dev *pdev = NULL;
struct iova *val;
 
-	init_iova_domain(&reserved_iova_ranges, PAGE_SIZE,
-			 IOVA_START_PFN, DMA_32BIT_PFN);
+	init_iova_domain(&reserved_iova_ranges, PAGE_SIZE, IOVA_START_PFN);
 
	lockdep_set_class(&reserved_iova_ranges.iova_rbtree_lock,
			  &reserved_rbtree_key);
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 48d36ce..7064d32 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -223,18 +223,7 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
/* ...then finally give it a kicking to make sure it fits */
base_pfn = max_t(unsigned long, base_pfn,
domain->geometry.aperture_start >> order);
-   end_pfn = min_t(unsigned long, end_pfn,
-   domain->geometry.aperture_end >> order);
}
-   /*
-* PCI devices may have larger DMA masks, but still prefer allocating
-* within a 32-bit mask to avoid DAC addressing. Such limitations don't
-* apply to the typical platform device, so for those we may as well
-* leave the cache limit at the top of their range to save an rb_last()
-* traversal on every allocation.
-*/
-   if (pci)
-   end_pfn &= DMA_BIT_MASK(32) >> order;
 
/* start_pfn is always nonzero for an already-initialised domain */
if (iovad->start_pfn) {
@@ -243,16 +232,15 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
pr_warn("Incompatible range for DMA domain\n");
return -EFAULT;
}
-   /*
-* If we have devices with different DMA masks, move the free
-* area cache limit down for the benefit of the smaller one.
-*/
-   iovad->dma_32bit_pfn = min(end_pfn, iovad->dma_32bit_pfn);
} else {
-   init_iova_domain(iovad, 1UL << order, base_pfn, end_pfn);
+   init_iova_domain(iovad, 1UL << order, base_pfn);
if (pci)
iova_reserve_pci_windows(to_pci_dev(dev), iovad);
}
+
+   if (end_pfn < iovad->dma_32bit_pfn)
+   dev_dbg(dev, "ancient device or dma range missed some bits?");
+
return 0;
 }
 EXPORT_SYMBOL(iommu_dma_init_domain);
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 238ad34..de467c1 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -82,8 +82,6 @@
 #define IOVA_START_PFN (1)
 
 #define IOVA_PFN(addr) ((addr) >> PAGE_SHIFT)
-#define DMA_32BIT_PFN  IOVA_PFN(DMA_BIT_MASK(32))
-#define DMA_64BIT_PFN  IOVA_PFN(DMA_BIT_MASK(64))
 
 /* page table handling */
 #define LEVEL_STRIDE   (9)
@@ -1869,8 +1867,7 @@ static int dmar_init_reserved_ranges(void)
struct iova *iova;
int i;
 
-	init_iova_domain(&reserved_iova_list, VTD_PAGE_SIZE, IOVA_START_PFN,
-			DMA_32BIT_PFN);
+	init_iova_domain(&reserved_iova_list, VTD_PAGE_SIZE, IOVA_START_PFN);
 
	lockdep_set_class(&reserved_iova_list.iova_rbtree_lock,
			&reserved_rbtree_key);
@@ -1928,8 +1925,7 @@ static int domain_init(struct dmar_domain *domain, struct intel_iommu *iommu,
int adjust_width, agaw;
unsigned long sagaw;
 
-   

[PATCH 1/7] iommu/iova: fix incorrect variable types

2017-03-22 Thread Zhen Lei
Keep these four variables' types consistent with the parameters of
function __alloc_and_insert_iova_range and the members of struct iova:

1. static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
unsigned long size, unsigned long limit_pfn,

2. struct iova {
unsigned long   pfn_hi;
unsigned long   pfn_lo;

In fact, limit_pfn is most likely wider than 32 bits on DMA64.

Signed-off-by: Zhen Lei 
---
 drivers/iommu/iova.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index b7268a1..8ba8b496 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -104,8 +104,8 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
  * Computes the padding size required, to make the start address
  * naturally aligned on the power-of-two order of its size
  */
-static unsigned int
-iova_get_pad_size(unsigned int size, unsigned int limit_pfn)
+static unsigned long
+iova_get_pad_size(unsigned long size, unsigned long limit_pfn)
 {
return (limit_pfn + 1 - size) & (__roundup_pow_of_two(size) - 1);
 }
@@ -117,7 +117,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
struct rb_node *prev, *curr = NULL;
unsigned long flags;
unsigned long saved_pfn;
-   unsigned int pad_size = 0;
+   unsigned long pad_size = 0;
 
/* Walk the tree backwards */
	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
-- 
2.5.0




[PATCH 5/7] iommu/iova: to optimize the allocation performance of dma64

2017-03-22 Thread Zhen Lei
Currently we always search free iova space for dma64 beginning at the
last node of the iovad rb-tree. In the worst case there may be many
nodes at the tail, so the first loop in __alloc_and_insert_iova_range
has to traverse them repeatedly. As we traced, more than 10K iterations
for the case of iperf.

__alloc_and_insert_iova_range:
..
	curr = __get_cached_rbnode(iovad, &limit_pfn);
			//--> return rb_last(&iovad->rbroot);
while (curr) {
..
curr = rb_prev(curr);
}

So add cached64_node to take the same effect as cached32_node, and add
a boundary at the start of the dma64 range, to prevent an iova from
crossing both the dma32 and dma64 areas:

|<------- cached32_node ------->|<------- cached64_node ------->|
^                               ^
start_pfn                       dma_32bit_pfn + 1

Signed-off-by: Zhen Lei 
---
 drivers/iommu/iova.c | 46 +++---
 include/linux/iova.h |  5 +++--
 2 files changed, 30 insertions(+), 21 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 87a9332..23abe84 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -37,10 +37,15 @@ insert_iova_boundary(struct iova_domain *iovad)
 {
struct iova *iova;
unsigned long start_pfn_32bit = iovad->start_pfn;
+   unsigned long start_pfn_64bit = iovad->dma_32bit_pfn + 1;
 
iova = reserve_iova(iovad, start_pfn_32bit, start_pfn_32bit);
BUG_ON(!iova);
	iovad->cached32_node = &iova->node;
+
+	iova = reserve_iova(iovad, start_pfn_64bit, start_pfn_64bit);
+	BUG_ON(!iova);
+	iovad->cached64_node = &iova->node;
 }
 
 void
@@ -62,8 +67,8 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,
init_iova_rcaches(iovad);
 
/*
-* Insert boundary nodes for dma32. So cached32_node can not be NULL in
-* future.
+* Insert boundary nodes for dma32 and dma64. So cached32_node and
+* cached64_node can not be NULL in future.
 */
insert_iova_boundary(iovad);
 }
@@ -75,10 +80,10 @@ __get_cached_rbnode(struct iova_domain *iovad, unsigned long *limit_pfn)
struct rb_node *cached_node;
struct rb_node *next_node;
 
-   if (*limit_pfn > iovad->dma_32bit_pfn)
-		return rb_last(&iovad->rbroot);
-   else
+   if (*limit_pfn <= iovad->dma_32bit_pfn)
cached_node = iovad->cached32_node;
+   else
+   cached_node = iovad->cached64_node;
 
next_node = rb_next(cached_node);
if (next_node) {
@@ -94,29 +99,32 @@ static void
 __cached_rbnode_insert_update(struct iova_domain *iovad, struct iova *new)
 {
struct iova *cached_iova;
+   struct rb_node **cached_node;
 
-   if (new->pfn_hi > iovad->dma_32bit_pfn)
-   return;
+   if (new->pfn_hi <= iovad->dma_32bit_pfn)
+		cached_node = &iovad->cached32_node;
+	else
+		cached_node = &iovad->cached64_node;
 
-   cached_iova = rb_entry(iovad->cached32_node, struct iova, node);
+   cached_iova = rb_entry(*cached_node, struct iova, node);
if (new->pfn_lo <= cached_iova->pfn_lo)
-		iovad->cached32_node = rb_prev(&new->node);
+		*cached_node = rb_prev(&new->node);
 }
 
 static void
 __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
 {
struct iova *cached_iova;
-   struct rb_node *curr;
+   struct rb_node **cached_node;
 
-   curr = iovad->cached32_node;
-   cached_iova = rb_entry(curr, struct iova, node);
+   if (free->pfn_hi <= iovad->dma_32bit_pfn)
+		cached_node = &iovad->cached32_node;
+	else
+		cached_node = &iovad->cached64_node;
 
-   if (free->pfn_lo >= cached_iova->pfn_lo) {
-   /* only cache if it's below 32bit pfn */
-   if (free->pfn_hi <= iovad->dma_32bit_pfn)
-			iovad->cached32_node = rb_prev(&free->node);
-   }
+   cached_iova = rb_entry(*cached_node, struct iova, node);
+   if (free->pfn_lo >= cached_iova->pfn_lo)
+		*cached_node = rb_prev(&free->node);
 }
 
 /*
@@ -283,7 +291,7 @@ EXPORT_SYMBOL_GPL(iova_cache_put);
  * alloc_iova - allocates an iova
  * @iovad: - iova domain in question
  * @size: - size of page frames to allocate
- * @limit_pfn: - max limit address
+ * @limit_pfn: - max limit address(included)
  * @size_aligned: - set if size_aligned address range is required
  * This function allocates an iova in the range iovad->start_pfn to limit_pfn,
  * searching top-down from limit_pfn to iovad->start_pfn. If the size_aligned
@@ -402,7 +410,7 @@ EXPORT_SYMBOL_GPL(free_iova);
  * alloc_iova_fast - allocates an iova from rcache
  * @iovad: - iova domain in question
  * @size: - size of page frames to allocate
- * @limit_pfn: - max limit address
+ * @limit_pfn: - max limit address(included)
  * This function tries 

[PATCH 3/7] iommu/iova: insert start_pfn boundary of dma32

2017-03-22 Thread Zhen Lei
Reserve the first granule of memory (starting at start_pfn) as a
boundary iova, to make sure that iovad->cached32_node can never be NULL
in the future: the tree then always contains at least this node below
the dma32 limit. Meanwhile, change the assignment of
iovad->cached32_node from rb_next to rb_prev of &free->node in
function __cached_rbnode_delete_update.

Signed-off-by: Zhen Lei 
---
 drivers/iommu/iova.c | 63 ++--
 1 file changed, 37 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 1c49969..b5a148e 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -32,6 +32,17 @@ static unsigned long iova_rcache_get(struct iova_domain *iovad,
 static void init_iova_rcaches(struct iova_domain *iovad);
 static void free_iova_rcaches(struct iova_domain *iovad);
 
+static void
+insert_iova_boundary(struct iova_domain *iovad)
+{
+   struct iova *iova;
+   unsigned long start_pfn_32bit = iovad->start_pfn;
+
+   iova = reserve_iova(iovad, start_pfn_32bit, start_pfn_32bit);
+   BUG_ON(!iova);
+	iovad->cached32_node = &iova->node;
+}
+
 void
 init_iova_domain(struct iova_domain *iovad, unsigned long granule,
unsigned long start_pfn, unsigned long pfn_32bit)
@@ -45,27 +56,38 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,
 
spin_lock_init(>iova_rbtree_lock);
iovad->rbroot = RB_ROOT;
-   iovad->cached32_node = NULL;
iovad->granule = granule;
iovad->start_pfn = start_pfn;
iovad->dma_32bit_pfn = pfn_32bit;
init_iova_rcaches(iovad);
+
+   /*
+* Insert boundary nodes for dma32. So cached32_node can not be NULL in
+* future.
+*/
+   insert_iova_boundary(iovad);
 }
 EXPORT_SYMBOL_GPL(init_iova_domain);
 
 static struct rb_node *
 __get_cached_rbnode(struct iova_domain *iovad, unsigned long *limit_pfn)
 {
-   if ((*limit_pfn > iovad->dma_32bit_pfn) ||
-   (iovad->cached32_node == NULL))
+   struct rb_node *cached_node;
+   struct rb_node *next_node;
+
+   if (*limit_pfn > iovad->dma_32bit_pfn)
return rb_last(>rbroot);
-   else {
-   struct rb_node *prev_node = rb_prev(iovad->cached32_node);
-   struct iova *curr_iova =
-   rb_entry(iovad->cached32_node, struct iova, node);
-   *limit_pfn = curr_iova->pfn_lo - 1;
-   return prev_node;
+   else
+   cached_node = iovad->cached32_node;
+
+   next_node = rb_next(cached_node);
+   if (next_node) {
+   struct iova *next_iova = rb_entry(next_node, struct iova, node);
+
+   *limit_pfn = min(*limit_pfn, next_iova->pfn_lo - 1);
}
+
+   return cached_node;
 }
 
 static void
@@ -83,20 +105,13 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
struct iova *cached_iova;
struct rb_node *curr;
 
-   if (!iovad->cached32_node)
-   return;
curr = iovad->cached32_node;
cached_iova = rb_entry(curr, struct iova, node);
 
if (free->pfn_lo >= cached_iova->pfn_lo) {
-		struct rb_node *node = rb_next(&free->node);
-   struct iova *iova = rb_entry(node, struct iova, node);
-
/* only cache if it's below 32bit pfn */
-   if (node && iova->pfn_lo < iovad->dma_32bit_pfn)
-   iovad->cached32_node = node;
-   else
-   iovad->cached32_node = NULL;
+   if (free->pfn_hi <= iovad->dma_32bit_pfn)
+			iovad->cached32_node = rb_prev(&free->node);
}
 }
 
@@ -114,7 +129,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
unsigned long size, unsigned long limit_pfn,
struct iova *new, bool size_aligned)
 {
-   struct rb_node *prev, *curr = NULL;
+   struct rb_node *prev, *curr;
unsigned long flags;
unsigned long saved_pfn;
unsigned long pad_size = 0;
@@ -144,13 +159,9 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
curr = rb_prev(curr);
}
 
-   if (!curr) {
-   if (size_aligned)
-   pad_size = iova_get_pad_size(size, limit_pfn);
-   if ((iovad->start_pfn + size + pad_size) > limit_pfn) {
-			spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
-   return -ENOMEM;
-   }
+   if (unlikely(!curr)) {
+		spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+   return -ENOMEM;
}
 
/* pfn_lo will point to size aligned address if size_aligned is set */
-- 
2.5.0




[PATCH 2/7] iommu/iova: cut down judgement times

2017-03-22 Thread Zhen Lei
The check below can only be satisfied on the last iteration, so with N
failed iterations (and 0 or 1 successes) it performs 2N comparisons in vain:

if ((pfn >= iova->pfn_lo) && (pfn <= iova->pfn_hi)) {
return iova;
}
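
Folding the range check into the traversal roughly halves the worst
case: the old loop performs up to four comparisons per visited node
(two for the range check, up to two to pick a direction), while the
rewritten loop does at most two. For an illustrative 20-node search
path that is up to 80 comparisons versus at most 40.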

Signed-off-by: Zhen Lei 
---
 drivers/iommu/iova.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 8ba8b496..1c49969 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -312,15 +312,12 @@ private_find_iova(struct iova_domain *iovad, unsigned long pfn)
while (node) {
struct iova *iova = rb_entry(node, struct iova, node);
 
-   /* If pfn falls within iova's range, return iova */
-   if ((pfn >= iova->pfn_lo) && (pfn <= iova->pfn_hi)) {
-   return iova;
-   }
-
if (pfn < iova->pfn_lo)
node = node->rb_left;
-   else if (pfn > iova->pfn_lo)
+   else if (pfn > iova->pfn_hi)
node = node->rb_right;
+   else
+   return iova;/* pfn falls within iova's range */
}
 
return NULL;
-- 
2.5.0




[PATCH 6/7] iommu/iova: move the calculation of pad mask out of loop

2017-03-22 Thread Zhen Lei
I'm not sure whether the compiler can optimize it, but moving the
calculation out of the loop is better. At least, it does not require
lock protection.
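
A worked example with illustrative numbers: for size = 5 pages,
__roundup_pow_of_two(5) = 8, so pad_mask = 0x7 is computed once; then
with limit_pfn = 0x100 inside the loop,

	pad_size = (limit_pfn + 1 - size) & pad_mask
	         = (0x100 + 1 - 5) & 0x7 = 0xfc & 0x7 = 4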

Signed-off-by: Zhen Lei 
---
 drivers/iommu/iova.c | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 23abe84..68754e4 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -127,23 +127,16 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
	*cached_node = rb_prev(&free->node);
 }
 
-/*
- * Computes the padding size required, to make the start address
- * naturally aligned on the power-of-two order of its size
- */
-static unsigned long
-iova_get_pad_size(unsigned long size, unsigned long limit_pfn)
-{
-   return (limit_pfn + 1 - size) & (__roundup_pow_of_two(size) - 1);
-}
-
 static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
unsigned long size, unsigned long limit_pfn,
struct iova *new, bool size_aligned)
 {
struct rb_node *prev, *curr;
unsigned long flags;
-   unsigned long pad_size = 0;
+   unsigned long pad_mask, pad_size = 0;
+
+   if (size_aligned)
+   pad_mask = __roundup_pow_of_two(size) - 1;
 
/* Walk the tree backwards */
	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
@@ -157,8 +150,13 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
else if (limit_pfn < curr_iova->pfn_hi)
goto adjust_limit_pfn;
else {
+   /*
+* Computes the padding size required, to make the start
+* address naturally aligned on the power-of-two order
+* of its size
+*/
if (size_aligned)
-   pad_size = iova_get_pad_size(size, limit_pfn);
+   pad_size = (limit_pfn + 1 - size) & pad_mask;
if ((curr_iova->pfn_hi + size + pad_size) <= limit_pfn)
break;  /* found a free slot */
}
-- 
2.5.0




[PATCH 0/7] iommu/iova: improve the allocation performance of dma64

2017-03-22 Thread Zhen Lei
64-bit devices are very common now, but currently we only define a
cached32_node to optimize the allocation performance of dma32, and I
saw some dma64 drivers choose to allocate iovas from the dma32 space
first, maybe because of the current dma64 performance problem or some
other reasons.

For example (in drivers/iommu/amd_iommu.c):
static unsigned long dma_ops_alloc_iova(..
{
	..
	if (dma_mask > DMA_BIT_MASK(32))
		pfn = alloc_iova_fast(&dma_dom->iovad, pages,
				      IOVA_PFN(DMA_BIT_MASK(32)));
	if (!pfn)
		pfn = alloc_iova_fast(&dma_dom->iovad, pages,
				      IOVA_PFN(dma_mask));

For the details of why dma64 iova allocation performance is very bad,
please refer to the description of patch 5.

In this patch series, I added a cached64_node to manage the dma64 iova
space (iova >= 4G); it takes the same effect as cached32_node (iova < 4G).

Below is the performance data before and after this patch series:
(before)$ iperf -s

Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)

[  4] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 35898
[ ID] Interval   Transfer Bandwidth
[  4]  0.0-10.2 sec  7.88 MBytes  6.48 Mbits/sec
[  5] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 35900
[  5]  0.0-10.3 sec  7.88 MBytes  6.43 Mbits/sec
[  4] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 35902
[  4]  0.0-10.3 sec  7.88 MBytes  6.43 Mbits/sec

(after)$ iperf -s

Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)

[  4] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 36330
[ ID] Interval   Transfer Bandwidth
[  4]  0.0-10.0 sec  1.09 GBytes   933 Mbits/sec
[  5] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 36332
[  5]  0.0-10.0 sec  1.10 GBytes   939 Mbits/sec
[  4] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 36334
[  4]  0.0-10.0 sec  1.10 GBytes   938 Mbits/sec


Zhen Lei (7):
  iommu/iova: fix incorrect variable types
  iommu/iova: cut down judgement times
  iommu/iova: insert start_pfn boundary of dma32
  iommu/iova: adjust __cached_rbnode_insert_update
  iommu/iova: to optimize the allocation performance of dma64
  iommu/iova: move the calculation of pad mask out of loop
  iommu/iova: fix iovad->dma_32bit_pfn as the last pfn of dma32

 drivers/iommu/amd_iommu.c|   7 +-
 drivers/iommu/dma-iommu.c|  22 ++
 drivers/iommu/intel-iommu.c  |  11 +--
 drivers/iommu/iova.c | 143 +--
 drivers/misc/mic/scif/scif_rma.c |   3 +-
 include/linux/iova.h |   7 +-
 6 files changed, 94 insertions(+), 99 deletions(-)

-- 
2.5.0




[PATCH 4/7] iommu/iova: adjust __cached_rbnode_insert_update

2017-03-22 Thread Zhen Lei
For cases 2 and 3, adjust cached32_node to the new place; case 1 needs
no change.

For example:
case 1: (the right part was allocated)
|---------------------------------------|
|<-------free------->|<---new_iova----->|
^
|
cached32_node (no change)

case 2: (all was allocated)
|---------------------------------------|
|<--------------new_iova--------------->|
^
|
cached32_node

case 3:
|------------|...........|--------------|
|....free....|<--new_iova-->|
^                          ^
|                          |
cached32_node(new)    cached32_node(old)

Signed-off-by: Zhen Lei 
---
 drivers/iommu/iova.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index b5a148e..87a9332 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -91,12 +91,16 @@ __get_cached_rbnode(struct iova_domain *iovad, unsigned long *limit_pfn)
 }
 
 static void
-__cached_rbnode_insert_update(struct iova_domain *iovad,
-   unsigned long limit_pfn, struct iova *new)
+__cached_rbnode_insert_update(struct iova_domain *iovad, struct iova *new)
 {
-   if (limit_pfn != iovad->dma_32bit_pfn)
+   struct iova *cached_iova;
+
+   if (new->pfn_hi > iovad->dma_32bit_pfn)
return;
-	iovad->cached32_node = &new->node;
+
+   cached_iova = rb_entry(iovad->cached32_node, struct iova, node);
+   if (new->pfn_lo <= cached_iova->pfn_lo)
+		iovad->cached32_node = rb_prev(&new->node);
 }
 
 static void
@@ -131,12 +135,10 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
 {
struct rb_node *prev, *curr;
unsigned long flags;
-   unsigned long saved_pfn;
unsigned long pad_size = 0;
 
/* Walk the tree backwards */
	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
-   saved_pfn = limit_pfn;
	curr = __get_cached_rbnode(iovad, &limit_pfn);
prev = curr;
while (curr) {
@@ -197,11 +199,10 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
		rb_link_node(&new->node, parent, entry);
		rb_insert_color(&new->node, &iovad->rbroot);
}
-   __cached_rbnode_insert_update(iovad, saved_pfn, new);
+   __cached_rbnode_insert_update(iovad, new);
 
spin_unlock_irqrestore(>iova_rbtree_lock, flags);
 
-
return 0;
 }
 
-- 
2.5.0

