Re: [PATCH] devicetree: Add generic IOMMU device tree bindings

2014-05-20 Thread Arnd Bergmann
On Monday 19 May 2014 22:59:46 Thierry Reding wrote:
 On Mon, May 19, 2014 at 08:34:07PM +0200, Arnd Bergmann wrote:
  On Monday 19 May 2014 14:53:37 Thierry Reding wrote:
   On Mon, May 19, 2014 at 12:26:35PM +0200, Arnd Bergmann wrote:
On Friday 16 May 2014 14:23:18 Thierry Reding wrote:
 From: Thierry Reding tred...@nvidia.com
 
 This commit introduces a generic device tree binding for IOMMU 
 devices.
 Only a very minimal subset is described here, but it is enough to 
 cover
 the requirements of both the Exynos System MMU and Tegra SMMU as
 discussed here:
 
 https://lkml.org/lkml/2014/4/27/346
 
 More advanced functionality such as the dma-ranges property can easily
 be added in a backwards-compatible way. In the absence of a dma-ranges
 property it should be safe to default to the whole address space.
 

The basic binding looks fine, but I'd like it to be more explicit
about dma-ranges. Most importantly, what does the whole address space
mean?
   
   The whole point was to leave out any mention of dma-ranges from the
   binding until we've figured out more of the puzzle.
   
   So what I was trying to avoid was another lengthy discussion on the
   topic of dma-ranges. Oh well... =)
  
  I think that can't work, because we need a way to specify the
  ranges for some iommu drivers. We have to make sure we at least
  don't prevent it from working.
 
 That was precisely why I wanted to let this out of the binding for now
 so that we can actually focus on making existing hardware work rather
 than speculate about what we may or may not need.
 
 If you think the current minimal binding will cause future use-cases to
 break, then we should change it to avoid that. What you're proposing is
 to make the binding more complex on the assumption that we'll get it
 right.
 
 Wouldn't it be safer to keep things to the bare minimum necessary to
 represent hardware that we have access to and understand now, rather
 than speculating about what may be necessary at some point. I'd much
 prefer to add complexity on an as-needed basis and when we have a better
 understand of where we're headed.

I just want to think it through for the cases we know about. We don't
have to implement it all at once, but I think there is a danger in making
an important binding too simple or too complicated, both of which are
equally bad.

After giving the ranges stuff some more thought, I have come to the
conclusion that using #iommu-cells should work fine for almost
all cases, including windowed iommus, because the window is not
actually needed in the device, but only in the iommu, wihch is of course
free to interpret the arguments as addresses.
 
Finally, it makes no sense to use the dma-ranges property of the 
master's
parent bus, because that bus isn't actually involved in the translation.
   
   My understanding here is mostly based on the OpenFirmware working group
   proposal for the dma-ranges property[0]. I'll give another example to
   try and clarify how I had imagined this to work:
   
 / {
 #address-cells = 2;
 #size-cells = 2;
   
 iommu {
 /*
  * This is somewhat unusual (or maybe not) in that we
  * need 2 cells to represent the size of an address
  * space that is 32 bits long.
  */
 #address-cells = 1;
 #size-cells = 2;
  
  You should never need #size-cells  #address-cells
 
 That was always my impression as well. But how then do you represent the
 full 4 GiB address space in a 32-bit system? It starts at 0 and ends at
 4 GiB - 1, which makes it 4 GiB large. That's:
 
   0 1 0
 
 With #address-cells = 1 and #size-cells = 1 the best you can do is:
 
   0 0x
 
 but that's not accurate.

I think we've done both in the past, either extended #size-cells or
taken 0x as a special token. Note that in your example,
the iommu actually needs #address-cells = 2 anyway.

 #iommu-cells = 1;
 };
   
 master {
 iommus = /iommu 42;
 /*
  * Map I/O addresses 0 - 4 GiB to physical addresses
  * 2 GiB - 6 GiB.
  */
 dma-ranges = 0x 0 0x8000 1 0;
 };
 };
   
   This is somewhat incompatible with [0] in that #address-cells used to
   parse the child address must be taken from the iommu node rather than
   the child node. But that seems to me to be the only reasonable thing
   to do, because after all the IOMMU creates a completely new address
   space for the master.
   
   [0]: http://www.openfirmware.org/ofwg/proposals/Closed/Accepted/410-it.txt
  
  I don't think you can have a dma-ranges without a #address-cells and
  #size-cells property in the same node. 

Re: [PATCH] devicetree: Add generic IOMMU device tree bindings

2014-05-20 Thread Arnd Bergmann
On Monday 19 May 2014 22:32:33 Thierry Reding wrote:
 On Mon, May 19, 2014 at 06:22:31PM +0100, Dave Martin wrote:
  On Mon, May 19, 2014 at 01:53:37PM +0100, Thierry Reding wrote:
 [...]
   My understanding here is mostly based on the OpenFirmware working group
   proposal for the dma-ranges property[0]. I'll give another example to
   try and clarify how I had imagined this to work:
   
   / {
   #address-cells = 2;
   #size-cells = 2;
   
   iommu {
   /*
* This is somewhat unusual (or maybe not) in that we
* need 2 cells to represent the size of an address
* space that is 32 bits long.
*/
   #address-cells = 1;
   #size-cells = 2;
   
   #iommu-cells = 1;
   };
   
   master {
   iommus = /iommu 42;
  
  Comparing this with the other discussion thread, we have a similar
  concept here, in that the iommu is made a logical parent, however...
  
  Firstly, there's an implicit assumption here that the only kind of
  thing the master could possibly be connected to is an IOMMU, with
  no non-trivial interconnect in between.  I'm not sure this is going
  to scale to complex SoCs.
 
 Here we go again. We're now in the very same bad spot that we've been in
 so many times before. There are at least two SoCs that we know do not
 require anything fancy, yet we're blocking adding support for those use
 cases because we think that at some point some IOMMU may require more
 than that. But at the same time it seems like we don't have enough data
 about what exactly that is, so we keep speculating. At this rate we're
 making it impossible to get a reasonable feature set supported upstream.
 
 That's very frustrating.

I agree. While I just commented that I want to think through how this
would look for other IOMMUs, the generic case of all masters that Dave
wants is going overboard with the complexity and we'd be better off
deferring this to whenever someone needs it, which I assume is never.

  If a range of Stream IDs may be issued (especially from something like
  a PCIe RC where the stream ID may be a many-bit value), describing
  the IDs individually may be impractical.
 
 The IOMMU specifier is completely specific to the IOMMU. If an IOMMU has
 a need to represent the stream IDs as a range there's nothing keeping it
 from defining the specifier accordingly.

If we make the IOMMU address space look like the PCI address space, we
can have a simple representation for this in DT. For the code, I assume
that we will always have to treat PCI and platform devices differently.

Arnd
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] devicetree: Add generic IOMMU device tree bindings

2014-05-20 Thread Arnd Bergmann
On Tuesday 20 May 2014 13:05:37 Thierry Reding wrote:
 On Tue, May 20, 2014 at 12:04:54PM +0200, Arnd Bergmann wrote:
  On Monday 19 May 2014 22:59:46 Thierry Reding wrote:
   On Mon, May 19, 2014 at 08:34:07PM +0200, Arnd Bergmann wrote:
On Monday 19 May 2014 14:53:37 Thierry Reding wrote:
 On Mon, May 19, 2014 at 12:26:35PM +0200, Arnd Bergmann wrote:
  On Friday 16 May 2014 14:23:18 Thierry Reding wrote:
   From: Thierry Reding tred...@nvidia.com
 [...]
  Finally, it makes no sense to use the dma-ranges property of the 
  master's
  parent bus, because that bus isn't actually involved in the 
  translation.
 
 My understanding here is mostly based on the OpenFirmware working 
 group
 proposal for the dma-ranges property[0]. I'll give another example to
 try and clarify how I had imagined this to work:
 
   / {
   #address-cells = 2;
   #size-cells = 2;
 
   iommu {
   /*
* This is somewhat unusual (or maybe not) in 
 that we
* need 2 cells to represent the size of an 
 address
* space that is 32 bits long.
*/
   #address-cells = 1;
   #size-cells = 2;

You should never need #size-cells  #address-cells
   
   That was always my impression as well. But how then do you represent the
   full 4 GiB address space in a 32-bit system? It starts at 0 and ends at
   4 GiB - 1, which makes it 4 GiB large. That's:
   
 0 1 0
   
   With #address-cells = 1 and #size-cells = 1 the best you can do is:
   
 0 0x
   
   but that's not accurate.
  
  I think we've done both in the past, either extended #size-cells or
  taken 0x as a special token. Note that in your example,
  the iommu actually needs #address-cells = 2 anyway.
 
 But it needs #address-cells = 2 only to encode an ID in addition to
 the address. If this was a single-master IOMMU then there'd be no need
 for the ID.

Right. But for a single-master IOMMU, there is no need to specify
any additional data, it could have #address-cells=0 if we take the
optimization you suggested.

 This really isn't specific to IOMMUs though. It really applies to all
 cases where #address-cells and #size-cells are parsed. While it's way
 too late to change the semantics of standard properties, perhaps for
 properties that are introduced it would make more sense to encode this
 as a start limit pair, both of length #address-cells, to avoid this
 type of corner case.
 
 On the other hand doing so would make it inconsistent with existing
 bindings which may not be desirable either.
 
 But since it seems like we're headed for something completely different
 for IOMMUs, perhaps it would be worth considering to describe the IOMMU
 range as start limit. Since it will likely use #iommu-cells rather
 than #address-cells we have an opportunity to change the semantics.

I'd still prefer #address-cells/#size-cells over #iommu-cells.

  / {
  #address-cells = 1;
  #size-cells = 1;
  
  iommu {
  #address-cells = 2; // ID, address
  #size-cells = 2;
  };
  
  master@a {
  iommus =  {/iommu} 0xa 0x0  0x1 0x0; // 4GB ID '0xa'
  }
  
  bus1 {
  #address-cells = 1;
  #size-cells = 1;
  ranges;
  iommus =  {/iommu} 0 0   0x100 0; // all IDs
  dma-ranges = 0  0xb 0  1 0; // child devices use ID '0xb'
  
  anothermaster {
  // no iommus link, implied by dma-ranges above
  };
  };
  };
  
  If you set #size-cells=0, you can't really do that but instead would
  require an iommus property in each master, which is not a big concern
  either.
 
 I'm not sure I understand the need for 0x100 (all IDs) entry above. If
 bus1's iommus property applies to all devices on the bus, why can't the
 ID 0xb be listed in the iommus property?
 
   bus1 {
   #address-cells = 1;
   #size-cells = 1;
   ranges;
   iommus = {/iommu} 0xb 0  0x1 0x0; // 4GB ID '0xb'
   dma-ranges = 0  0xb 0  0x1 0x0;
 
   anothermaster {
   ...
   };
   };

It depends on how the address is interpreted, but we could make this
a valid case too.

 In which case I guess dma-ranges would be redundant.

No, because the iommus property doesn't translate the address range, it
just creates a new address space. bus1 and iommu in the example have
different #address-cells, so you definitely need a non-empty ranges
property.

  The main advantage I think would be for IOMMUs that use the PCI b/d/f
  numbers as IDs. These can have #address-cells=3, #size-cells=2
  and have an empty dma-ranges property in the PCI host bridge node,
  and 

Re: [PATCH] devicetree: Add generic IOMMU device tree bindings

2014-05-20 Thread Thierry Reding
On Tue, May 20, 2014 at 01:15:48PM +0200, Arnd Bergmann wrote:
 On Tuesday 20 May 2014 13:05:37 Thierry Reding wrote:
  On Tue, May 20, 2014 at 12:04:54PM +0200, Arnd Bergmann wrote:
   On Monday 19 May 2014 22:59:46 Thierry Reding wrote:
On Mon, May 19, 2014 at 08:34:07PM +0200, Arnd Bergmann wrote:
[...]
 You should never need #size-cells  #address-cells

That was always my impression as well. But how then do you represent the
full 4 GiB address space in a 32-bit system? It starts at 0 and ends at
4 GiB - 1, which makes it 4 GiB large. That's:

0 1 0

With #address-cells = 1 and #size-cells = 1 the best you can do is:

0 0x

but that's not accurate.
   
   I think we've done both in the past, either extended #size-cells or
   taken 0x as a special token. Note that in your example,
   the iommu actually needs #address-cells = 2 anyway.
  
  But it needs #address-cells = 2 only to encode an ID in addition to
  the address. If this was a single-master IOMMU then there'd be no need
  for the ID.
 
 Right. But for a single-master IOMMU, there is no need to specify
 any additional data, it could have #address-cells=0 if we take the
 optimization you suggested.

Couldn't a single-master IOMMU be windowed?

  I'm not sure I understand the need for 0x100 (all IDs) entry above. If
  bus1's iommus property applies to all devices on the bus, why can't the
  ID 0xb be listed in the iommus property?
  
  bus1 {
  #address-cells = 1;
  #size-cells = 1;
  ranges;
  iommus = {/iommu} 0xb 0  0x1 0x0; // 4GB ID '0xb'
  dma-ranges = 0  0xb 0  0x1 0x0;
  
  anothermaster {
  ...
  };
  };
 
 It depends on how the address is interpreted, but we could make this
 a valid case too.
 
  In which case I guess dma-ranges would be redundant.
 
 No, because the iommus property doesn't translate the address range, it
 just creates a new address space. bus1 and iommu in the example have
 different #address-cells, so you definitely need a non-empty ranges
 property.

Ah, now I get it.

   The main advantage I think would be for IOMMUs that use the PCI b/d/f
   numbers as IDs. These can have #address-cells=3, #size-cells=2
   and have an empty dma-ranges property in the PCI host bridge node,
   and interpret this as using the same encoding as the PCI BARs in
   the ranges property.
  
  I'm somewhat confused here, since you said earlier:
  
   After giving the ranges stuff some more thought, I have come to the
   conclusion that using #iommu-cells should work fine for almost
   all cases, including windowed iommus, because the window is not
   actually needed in the device, but only in the iommu, wihch is of course
   free to interpret the arguments as addresses.
  
  But now you seem to be saying that we should still be using the
  #address-cells and #size-cells properties in the IOMMU node to determine
  the length of the specifier.
 
 I probably wasn't clear. I think we can make it work either way, but
 my feeling is that using #address-cells/#size-cells gives us a nicer
 syntax for the more complex cases.

Okay, so in summary we'd have something like this for simple cases:

Required properties:

- #address-cells: The number of cells in an IOMMU specifier needed to encode
  an address.
- #size-cells: The number of cells in an IOMMU specifier needed to represent
  the length of an address range.

Typical values for the above include:
- #address-cells = 0, size-cells = 0: Single master IOMMU devices are not
  configurable and therefore no additional information needs to be encoded in
  the specifier. This may also apply to multiple master IOMMU devices that do
  not allow the association of masters to be configured.
- #address-cells = 1, size-cells = 0: Multiple master IOMMU devices may
  need to be configured in order to enable translation for a given master. In
  such cases the single address cell corresponds to the master device's ID.
- #address-cells = 2, size-cells = 2: Some IOMMU devices allow the DMA
  window for masters to be configured. The first cell of the address in this
  may contain the master device's ID for example, while the second cell could
  contain the start of the DMA window for the given device. The length of the
  DMA window is specified by two additional cells.

Examples:
=

Single-master IOMMU:


iommu {
#address-cells = 0;
#size-cells = 0;
};

master {
iommus = /iommu;
};

Multiple-master IOMMU with fixed associations:
--

/* multiple-master IOMMU */
iommu {
/*
 * Masters are statically associated with this IOMMU and
 * address translation is always enabled.
 */

[PATCH 0/6] trivial cleanup for iommu/vt-d

2014-05-20 Thread Yijing Wang
Some cleanup patches for iommu/vt-d.

Yijing Wang (6):
  iommu/vt-d: Use list_for_each_safe() to simplify code
  iommu/vt-d: move up no_iommu and dmar_disabled check
  iommu/vt-d: clear the redundant assignment in dmar_enable_qi
  iommu/vt-d: clear the redundant assignment for domain-nid
  iommu/vt-d: use inline function dma_pte_superpage instead of macros
  iommu/vt-d: fix reference count in iommu_prepare_isa

 drivers/iommu/dmar.c|3 ---
 drivers/iommu/intel-iommu.c |   18 --
 2 files changed, 8 insertions(+), 13 deletions(-)


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 3/6] iommu/vt-d: clear the redundant assignment in dmar_enable_qi

2014-05-20 Thread Yijing Wang
__dmar_enable_qi() will initialize free_head,free_tail and
free_cnt for q_inval. Remove the redundant initialization
in dmar_enable_qi().

Signed-off-by: Yijing Wang wangyij...@huawei.com
---
 drivers/iommu/dmar.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 39f8b71..1168469 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1339,9 +1339,6 @@ int dmar_enable_qi(struct intel_iommu *iommu)
return -ENOMEM;
}
 
-   qi-free_head = qi-free_tail = 0;
-   qi-free_cnt = QI_LENGTH;
-
raw_spin_lock_init(qi-q_lock);
 
__dmar_enable_qi(iommu);
-- 
1.7.1


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 4/6] iommu/vt-d: clear the redundant assignment for domain-nid

2014-05-20 Thread Yijing Wang
Alloc_domain() will initialize domain-nid to -1. So the
initialization for domain-nid in md_domain_init() is redundant,
clear it.

Signed-off-by: Yijing Wang wangyij...@huawei.com
---
 drivers/iommu/intel-iommu.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 6b71608..d1d6636 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4134,7 +4134,6 @@ static int md_domain_init(struct dmar_domain *domain, int 
guest_width)
domain-iommu_snooping = 0;
domain-iommu_superpage = 0;
domain-max_addr = 0;
-   domain-nid = -1;
 
/* always allocate the top pgd */
domain-pgd = (struct dma_pte *)alloc_pgtable_page(domain-nid);
-- 
1.7.1


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 6/6] iommu/vt-d: fix reference count in iommu_prepare_isa

2014-05-20 Thread Yijing Wang
Decrease the device reference count avoid memory leak.

Signed-off-by: Yijing Wang wangyij...@huawei.com
---
 drivers/iommu/intel-iommu.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 455896c..a78a824 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2401,7 +2401,8 @@ static inline void iommu_prepare_isa(void)
if (ret)
printk(KERN_ERR IOMMU: Failed to create 0-16MiB identity map; 
   floppy might not work\n);
-
+   
+   pci_dev_put(pdev);
 }
 #else
 static inline void iommu_prepare_isa(void)
-- 
1.7.1


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 5/6] iommu/vt-d: use inline function dma_pte_superpage instead of macros

2014-05-20 Thread Yijing Wang
Use inline function dma_pte_superpage() instead of macro for
better readability.

Signed-off-by: Yijing Wang wangyij...@huawei.com
---
 drivers/iommu/intel-iommu.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index d1d6636..455896c 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -891,7 +891,7 @@ static struct dma_pte *dma_pfn_level_pte(struct dmar_domain 
*domain,
break;
}
 
-   if (pte-val  DMA_PTE_LARGE_PAGE) {
+   if (dma_pte_superpage(pte)) {
*large_page = total;
return pte;
}
-- 
1.7.1


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/6] iommu/vt-d: Use list_for_each_safe() to simplify code

2014-05-20 Thread Yijing Wang
Use list_for_each_entry_safe() instead of list_entry()
to simplify code.

Signed-off-by: Yijing Wang wangyij...@huawei.com
---
 drivers/iommu/intel-iommu.c |6 ++
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index f256ffc..e020dcf 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2101,13 +2101,11 @@ static inline void unlink_domain_info(struct 
device_domain_info *info)
 
 static void domain_remove_dev_info(struct dmar_domain *domain)
 {
-   struct device_domain_info *info;
+   struct device_domain_info *info, *tmp;
unsigned long flags, flags2;
 
spin_lock_irqsave(device_domain_lock, flags);
-   while (!list_empty(domain-devices)) {
-   info = list_entry(domain-devices.next,
-   struct device_domain_info, link);
+   list_for_each_entry_safe(info, tmp, domain-devices, link) {
unlink_domain_info(info);
spin_unlock_irqrestore(device_domain_lock, flags);
 
-- 
1.7.1


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2] arm: dma-iommu: Clean up redundant variable

2014-05-20 Thread Marek Szyprowski

Hello,

On 2014-05-20 06:32, ritesh.harj...@gmail.com wrote:

From: Ritesh Harjani ritesh.harj...@gmail.com

mapping-size can be derived from mapping-bits  PAGE_SHIFT
which makes mapping-size as redundant.

Clean this up.

Signed-off-by: Ritesh Harjani ritesh.harj...@gmail.com
Reported-by: Will Deacon will.dea...@arm.com


Thanks for this cleanup. I will add it to my tree.


---
  arch/arm/include/asm/dma-iommu.h |  1 -
  arch/arm/mm/dma-mapping.c| 11 ++-
  2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm/include/asm/dma-iommu.h b/arch/arm/include/asm/dma-iommu.h
index eec0a12..8e3fcb9 100644
--- a/arch/arm/include/asm/dma-iommu.h
+++ b/arch/arm/include/asm/dma-iommu.h
@@ -18,7 +18,6 @@ struct dma_iommu_mapping {
unsigned intextensions;
size_t  bitmap_size;/* size of a single bitmap */
size_t  bits;   /* per bitmap */
-   unsigned intsize;   /* per bitmap */
dma_addr_t  base;
  
  	spinlock_t		lock;

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 6b00be1..3d43c41 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1074,6 +1074,7 @@ static inline dma_addr_t __alloc_iova(struct 
dma_iommu_mapping *mapping,
unsigned int order = get_order(size);
unsigned int align = 0;
unsigned int count, start;
+   size_t mapping_size = mapping-bits  PAGE_SHIFT;
unsigned long flags;
dma_addr_t iova;
int i;
@@ -1119,7 +1120,7 @@ static inline dma_addr_t __alloc_iova(struct 
dma_iommu_mapping *mapping,
}
spin_unlock_irqrestore(mapping-lock, flags);
  
-	iova = mapping-base + (mapping-size * i);

+   iova = mapping-base + (mapping_size * i);
iova += start  PAGE_SHIFT;
  
  	return iova;

@@ -1129,6 +1130,7 @@ static inline void __free_iova(struct dma_iommu_mapping 
*mapping,
   dma_addr_t addr, size_t size)
  {
unsigned int start, count;
+   size_t mapping_size = mapping-bits  PAGE_SHIFT;
unsigned long flags;
dma_addr_t bitmap_base;
u32 bitmap_index;
@@ -1136,14 +1138,14 @@ static inline void __free_iova(struct dma_iommu_mapping 
*mapping,
if (!size)
return;
  
-	bitmap_index = (u32) (addr - mapping-base) / (u32) mapping-size;

+   bitmap_index = (u32) (addr - mapping-base) / (u32) mapping_size;
BUG_ON(addr  mapping-base || bitmap_index  mapping-extensions);
  
-	bitmap_base = mapping-base + mapping-size * bitmap_index;

+   bitmap_base = mapping-base + mapping_size * bitmap_index;
  
  	start = (addr - bitmap_base) 	PAGE_SHIFT;
  
-	if (addr + size  bitmap_base + mapping-size) {

+   if (addr + size  bitmap_base + mapping_size) {
/*
 * The address range to be freed reaches into the iova
 * range of the next bitmap. This should not happen as
@@ -1964,7 +1966,6 @@ arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t 
base, size_t size)
mapping-extensions = extensions;
mapping-base = base;
mapping-bits = BITS_PER_BYTE * bitmap_size;
-   mapping-size = mapping-bits  PAGE_SHIFT;
  
  	spin_lock_init(mapping-lock);
  


Best regards
--
Marek Szyprowski, PhD
Samsung RD Institute Poland

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] devicetree: Add generic IOMMU device tree bindings

2014-05-20 Thread Thierry Reding
On Tue, May 20, 2014 at 02:41:18PM +0200, Arnd Bergmann wrote:
 On Tuesday 20 May 2014 14:02:43 Thierry Reding wrote:
[...]
  Couldn't a single-master IOMMU be windowed?
 
 Ah, yes. That would actually be like an IBM pSeries, which has a windowed
 IOMMU but uses one window per virtual machine. In that case, the window could
 be a property of the iommu node though, rather than part of the address
 in the link.

Does that mean that the IOMMU has one statically configured window which
is the same for each virtual machine? That would require some other
mechanism to assign separate address spaces to each virtual machine,
wouldn't it? But I suspect that if the IOMMU allows that it could be
allocated dynamically at runtime.

  Multiple-master IOMMU with fixed associations:
  --
  
  /* multiple-master IOMMU */
  iommu {
  /*
   * Masters are statically associated with this IOMMU and
   * address translation is always enabled.
   */
  #iommu-cells = 0;
  };
 
 copied wrong? I guess you mean #address-cells=0/#size-cells=0 here.

Yes, I obviously wasn't careful when I copied this over.

  Multiple-master device:
  ---
  
  /* single-master IOMMU */
  iommu@1 {
  reg = 1;
  #address-cells = 0;
  #size-cells = 0;
  };
  
  /* multiple-master IOMMU */
  iommu@2 {
  reg = 2;
  #address-cells = 1;
  #size-cells = 0;
  };
  
  /* device with two master interfaces */
  master {
  iommus = /iommu@1,/* master of the single-master IOMMU */
   /iommu@2 42; /* ID 42 in multiple-master IOMMU */
  };
[...]
  Does that sound about right?
 
 Yes, sounds great. I would probably leave out the Multiple-master device
 from the examples, since that seems to be a rather obscure case.

Agreed. We can easily add such examples if/when such device start to
appear.

 I would like to add an explanation about dma-ranges to the binding:
 
 8
 The parent bus of the iommu must have a valid dma-ranges property
 describing how the physical address space of the IOMMU maps into
 memory.

With physical address space you mean the addresses after translation,
not the I/O virtual addresses, right? But even so, how will this work
when there are multiple IOMMU devices? What determines which IOMMU is
mapped via which entry?

Perhaps having multiple IOMMUs implies that there will have to be some
partitioning of the parent address space to make sure two IOMMUs don't
translate to the same ranges?

 A device with an iommus property will ignore the dma-ranges property
 of the parent node and rely on the IOMMU for translation instead.

Do we need to consider the case where an IOMMU listed in iommus isn't
enabled (status = disabled)? In that case presumably the device would
either not function or may optionally continue to master onto the parent
untranslated.

 Using an iommus property in bus device nodes with dma-ranges
 specifying how child devices relate to the IOMMU is a possible extension
 but is not recommended until this binding gets extended.

Just for my understanding, bus device nodes with iommus and dma-ranges
properties could be equivalently written by explicitly moving the iommus
properties into the child device nodes, right? In which case they should
be the same as the other examples. So that concept is a convenience
notation to reduce duplication, but doesn't fundamentally introduce any
new concept.

Thierry


pgpb8EcG_HoBX.pgp
Description: PGP signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] devicetree: Add generic IOMMU device tree bindings

2014-05-20 Thread Arnd Bergmann
On Tuesday 20 May 2014 14:07:09 Dave Martin wrote:
 On Tue, May 20, 2014 at 12:08:44PM +0200, Arnd Bergmann wrote:
  On Monday 19 May 2014 22:32:33 Thierry Reding wrote:
   On Mon, May 19, 2014 at 06:22:31PM +0100, Dave Martin wrote:
On Mon, May 19, 2014 at 01:53:37PM +0100, Thierry Reding wrote:
   [...]
 My understanding here is mostly based on the OpenFirmware working 
 group
 proposal for the dma-ranges property[0]. I'll give another example to
 try and clarify how I had imagined this to work:
 
 / {
 #address-cells = 2;
 #size-cells = 2;
 
 iommu {
 /*
  * This is somewhat unusual (or maybe not) in 
 that we
  * need 2 cells to represent the size of an 
 address
  * space that is 32 bits long.
  */
 #address-cells = 1;
 #size-cells = 2;
 
 #iommu-cells = 1;
 };
 
 master {
 iommus = /iommu 42;

Comparing this with the other discussion thread, we have a similar
concept here, in that the iommu is made a logical parent, however...

Firstly, there's an implicit assumption here that the only kind of
thing the master could possibly be connected to is an IOMMU, with
no non-trivial interconnect in between.  I'm not sure this is going
to scale to complex SoCs.
   
   Here we go again. We're now in the very same bad spot that we've been in
   so many times before. There are at least two SoCs that we know do not
   require anything fancy, yet we're blocking adding support for those use
   cases because we think that at some point some IOMMU may require more
   than that. But at the same time it seems like we don't have enough data
   about what exactly that is, so we keep speculating. At this rate we're
   making it impossible to get a reasonable feature set supported upstream.
   
   That's very frustrating.
  
  I agree. While I just commented that I want to think through how this
  would look for other IOMMUs, the generic case of all masters that Dave
  wants is going overboard with the complexity and we'd be better off
 
 I don't want that, and actually I agree with the conclusion overboard.
 
 It was really about exploring the problem space, including things that
 we can reasonably foresee.  Any bindings today should necessarily be
 much simpler, and that's certainly the correct approach.
 
 I've been neglecting this thread (apologies) -- but although I have to
 think a bit about Thierry's most recent suggestions, I think there's
 actually reasonable alignment now.
 
  deferring this to whenever someone needs it, which I assume is never.
 
 Well, some of it might be.  But I'm not saying we should give in to
 wild speculation (except to get a feel for what we're ruling out, and
 the consequences of that).


Ok, fair enough.

If a range of Stream IDs may be issued (especially from something like
a PCIe RC where the stream ID may be a many-bit value), describing
the IDs individually may be impractical.
   
   The IOMMU specifier is completely specific to the IOMMU. If an IOMMU has
   a need to represent the stream IDs as a range there's nothing keeping it
   from defining the specifier accordingly.
  
  If we make the IOMMU address space look like the PCI address space, we
  can have a simple representation for this in DT. For the code, I assume
  that we will always have to treat PCI and platform devices differently.
 
 Can you elaborate on that?
 
 Master ID signals in ARM systems and how they are handled are rather
 under-specified today.  Treating the master ID bits as some extra bits
 in some kind of extended bus address could work for ARM IOMMUs etc.,
 but I didn't have a complete answer yet.

The PCI binding at http://www.openfirmware.org/1275/bindings/pci/pci2_1.pdf
defines a 96-bit address space for MMIO registers and other things, so
it can uniquely identify not just an address on the bus, but also 
any standard resource as seen by the device, quoting from there:

2.2.1.1. Numerical Representation
(The Numerical Representation of an address is the format that Open
 Firmware uses for storing an address within a property value and on
 the stack, as an argument to a package method.) The numerical
 representation of a PCI address consists of three cells, encoded as
 follows. For this purpose, the least-significant 32 bits of a cell is used;
 if the cell size is larger than 32 bits, any additional high-order bits
 are zero. Bit# 0 refers to the least-significant bit.

Bit#   3322  1100 
   10987654 32109876 54321098 76543210
phys.hi cell:  npt000ss  dfff 
phys.mid cell:    
phys.lo cell:     


Re: [PATCH] devicetree: Add generic IOMMU device tree bindings

2014-05-20 Thread Thierry Reding
On Tue, May 20, 2014 at 03:34:46PM +0200, Arnd Bergmann wrote:
 On Tuesday 20 May 2014 15:17:43 Thierry Reding wrote:
  On Tue, May 20, 2014 at 02:41:18PM +0200, Arnd Bergmann wrote:
   On Tuesday 20 May 2014 14:02:43 Thierry Reding wrote:
  [...]
Couldn't a single-master IOMMU be windowed?
   
   Ah, yes. That would actually be like an IBM pSeries, which has a windowed
   IOMMU but uses one window per virtual machine. In that case, the window 
   could
   be a property of the iommu node though, rather than part of the address
   in the link.
  
  Does that mean that the IOMMU has one statically configured window which
  is the same for each virtual machine? That would require some other
  mechanism to assign separate address spaces to each virtual machine,
  wouldn't it? But I suspect that if the IOMMU allows that it could be
  allocated dynamically at runtime.
 
 The way it works on pSeries is that upon VM creation, the guest is assigned
 one 256MB window for use by assigned DMA capable devices. When the guest
 creates a mapping, it uses a hypercall to associate a bus address in that
 range with a guest physical address. The hypervisor checks that the bus
 address is within the allowed range, and translates the guest physical
 address into a host physical address, then enters both into the I/O page
 table or I/O TLB.

So when a VM is booted it is passed a device tree with that DMA window?
Given what you describe above this seems to be more of a configuration
option to restrict the IOMMU to a subset of the physical memory for
purposes of virtualization. So I agree that this wouldn't be a good fit
for what we're trying to achieve with iommus or dma-ranges in this
binding.

   I would like to add an explanation about dma-ranges to the binding:
   
   8
   The parent bus of the iommu must have a valid dma-ranges property
   describing how the physical address space of the IOMMU maps into
   memory.
  
  With physical address space you mean the addresses after translation,
  not the I/O virtual addresses, right? But even so, how will this work
  when there are multiple IOMMU devices? What determines which IOMMU is
  mapped via which entry?
  
  Perhaps having multiple IOMMUs implies that there will have to be some
  partitioning of the parent address space to make sure two IOMMUs don't
  translate to the same ranges?
 
 These dma-ranges properties would almost always be for the entire RAM,
 and we can treat anything else as a bug.

Would it typically be a 1:1 mapping? In that case could we define an
empty dma-ranges property to mean exactly that? That would make it
consistent with the ranges property.

 The mapping between what goes into the IOMMU and what comes out of it
 is not reflected in DT at all, since it only happens at runtime.
 The dma-ranges property I mean above describes how what comes out of
 the IOMMU maps into physical memory.

Understood. That makes sense.

   A device with an iommus property will ignore the dma-ranges property
   of the parent node and rely on the IOMMU for translation instead.
  
  Do we need to consider the case where an IOMMU listed in iommus isn't
  enabled (status = disabled)? In that case presumably the device would
  either not function or may optionally continue to master onto the parent
  untranslated.
 
 My reasoning was that the DT should specify whether we use the IOMMU
 or not. Being able to just switch on or off the IOMMU sounds nice as
 well, so we could change the text above to do that.
 
 Another option would be to do this in the IOMMU code, basically
 falling back to the IOMMU parent's dma-ranges property and using
 linear dma_map_ops when that is disabled.

Yes, it should be trivial for the IOMMU core code to take care of this
special case. Still I think it's worth mentioning it in the binding so
that it's clearly specified.

   Using an iommus property in bus device nodes with dma-ranges
   specifying how child devices relate to the IOMMU is a possible extension
   but is not recommended until this binding gets extended.
  
  Just for my understanding, bus device nodes with iommus and dma-ranges
  properties could be equivalently written by explicitly moving the iommus
  properties into the child device nodes, right? In which case they should
  be the same as the other examples. So that concept is a convenience
  notation to reduce duplication, but doesn't fundamentally introduce any
  new concept.
 
 The one case  where that doesn't work is PCI, because we don't list the
 PCI devices in DT normally, and the iommus property would only exist
 at the PCI host controller node.

But it could work in classic OpenFirmware where the device tree can be
populated with the tree of PCI devices enumerated by OF. There are also
devices that have a fixed configuration and where technically the PCI
devices can be listed in the device tree. This is somewhat important if
for example one PCI device is a GPIO controller and needs to be
referenced by phandle 

RE: [PATCH] iommu/amd: Fix for L2 race with VM invalidation

2014-05-20 Thread Cornwall, Jay
 From: Joerg Roedel [mailto:j...@8bytes.org]
 Sent: Tuesday, May 20, 2014 08:33
 
 On Wed, May 14, 2014 at 11:38:25AM +, Cornwall, Jay wrote:
  Hi,
 
  I'm not sure why you're submitting this, Suravee?
 
  We already agreed that we need the extra mmu_notifier point, proposed
  by Joerg back in 2011, to eliminate the race. I had thought we were
  waiting on that to be implemented.
 
 Speaking of this, would it also be possible to hold back all faults (so not
 sending back ppr-comletions) between invalidate_range_start/end()?
 Or will the hardware run into a timeout when it takes too long to process a
 fault or when it issued the maximum number of parallel faults?

I haven't located a comment on this in the HW documentation. The ATC will set a 
MMIO register flag if a response takes more than 2^32 cycles (I'm not certain 
of the frequency). I don't know if this is fatal. I have experimented with 
adding significant delays into the PPR handler in the past without affecting 
test cases.

My larger concern with this design is that the only situations in which there 
is a conflict between the MM and IOTLB activity is where the IOTLB client is 
racing with munmap() or physical page relocation. Both of these are low 
proability scenarios. However, blocking PPR completions for all MM 
invalidations would impact all IOTLB client activity.

This is particularly true for heterogeneous applications.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3] PCI: Introduce new device binding path using pci_dev.driver_override

2014-05-20 Thread Alex Williamson
The driver_override field allows us to specify the driver for a device
rather than relying on the driver to provide a positive match of the
device.  This shortcuts the existing process of looking up the vendor
and device ID, adding them to the driver new_id, binding the device,
then removing the ID, but it also provides a couple advantages.

First, the above existing process allows the driver to bind to any
device matching the new_id for the window where it's enabled.  This is
often not desired, such as the case of trying to bind a single device
to a meta driver like pci-stub or vfio-pci.  Using driver_override we
can do this deterministically using:

echo pci-stub  /sys/bus/pci/devices/:03:00.0/driver_override
echo :03:00.0  /sys/bus/pci/devices/:03:00.0/driver/unbind
echo :03:00.0  /sys/bus/pci/drivers_probe

Previously we could not invoke drivers_probe after adding a device
to new_id for a driver as we get non-deterministic behavior whether
the driver we intend or the standard driver will claim the device.
Now it becomes a deterministic process, only the driver matching
driver_override will probe the device.

To return the device to the standard driver, we simply clear the
driver_override and reprobe the device:

echo  /sys/bus/pci/devices/:03:00.0/driver_override
echo :03:00.0  /sys/bus/pci/devices/:03:00.0/driver/unbind
echo :03:00.0  /sys/bus/pci/drivers_probe

Another advantage to this approach is that we can specify a driver
override to force a specific binding or prevent any binding.  For
instance when an IOMMU group is exposed to userspace through VFIO
we require that all devices within that group are owned by VFIO.
However, devices can be hot-added into an IOMMU group, in which case
we want to prevent the device from binding to any driver (override
driver = none) or perhaps have it automatically bind to vfio-pci.
With driver_override it's a simple matter for this field to be set
internally when the device is first discovered to prevent driver
matches.

Signed-off-by: Alex Williamson alex.william...@redhat.com
Cc: Greg Kroah-Hartman gre...@linuxfoundation.org
---

v3: kfree() override buffer on device release, noted by Alex Graf

v2: Use strchr() as suggested by Guenter Roeck and adopted by the
platform driver version of this same interface.

 Documentation/ABI/testing/sysfs-bus-pci |   21 
 drivers/pci/pci-driver.c|   25 +--
 drivers/pci/pci-sysfs.c |   40 +++
 drivers/pci/probe.c |1 +
 include/linux/pci.h |1 +
 5 files changed, 85 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-pci 
b/Documentation/ABI/testing/sysfs-bus-pci
index a3c5a66..898ddc4 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -250,3 +250,24 @@ Description:
valid.  For example, writing a 2 to this file when sriov_numvfs
is not 0 and not 2 already will return an error. Writing a 10
when the value of sriov_totalvfs is 8 will return an error.
+
+What:  /sys/bus/pci/devices/.../driver_override
+Date:  April 2014
+Contact:   Alex Williamson alex.william...@redhat.com
+Description:
+   This file allows the driver for a device to be specified which
+   will override standard static and dynamic ID matching.  When
+   specified, only a driver with a name matching the value written
+   to driver_override will have an opportunity to bind to the
+   device.  The override is specified by writing a string to the
+   driver_override file (echo pci-stub  driver_override) and
+   may be cleared with an empty string (echo  driver_override).
+   This returns the device to standard matching rules binding.
+   Writing to driver_override does not automatically unbind the
+   device from its current driver or make any attempt to
+   automatically load the specified driver.  If no driver with a
+   matching name is currently loaded in the kernel, the device
+   will not bind to any driver.  This also allows devices to
+   opt-out of driver binding using a driver_override name such as
+   none.  Only a single driver may be specified in the override,
+   there is no support for parsing delimiters.
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index d911e0c..4393c12 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -216,6 +216,13 @@ const struct pci_device_id *pci_match_id(const struct 
pci_device_id *ids,
return NULL;
 }
 
+static const struct pci_device_id pci_device_id_any = {
+   .vendor = PCI_ANY_ID,
+   .device = PCI_ANY_ID,
+   .subvendor = PCI_ANY_ID,
+   .subdevice = 

Re: [PATCH] devicetree: Add generic IOMMU device tree bindings

2014-05-20 Thread Dave Martin
On Tue, May 20, 2014 at 02:41:18PM +0200, Arnd Bergmann wrote:
 On Tuesday 20 May 2014 14:02:43 Thierry Reding wrote:
  On Tue, May 20, 2014 at 01:15:48PM +0200, Arnd Bergmann wrote:
   On Tuesday 20 May 2014 13:05:37 Thierry Reding wrote:
On Tue, May 20, 2014 at 12:04:54PM +0200, Arnd Bergmann wrote:
 On Monday 19 May 2014 22:59:46 Thierry Reding wrote:
  On Mon, May 19, 2014 at 08:34:07PM +0200, Arnd Bergmann wrote:
  [...]
   You should never need #size-cells  #address-cells
  
  That was always my impression as well. But how then do you 
  represent the
  full 4 GiB address space in a 32-bit system? It starts at 0 and 
  ends at
  4 GiB - 1, which makes it 4 GiB large. That's:
  
  0 1 0
  
  With #address-cells = 1 and #size-cells = 1 the best you can do 
  is:
  
  0 0x
  
  but that's not accurate.
 
 I think we've done both in the past, either extended #size-cells or
 taken 0x as a special token. Note that in your example,
 the iommu actually needs #address-cells = 2 anyway.

But it needs #address-cells = 2 only to encode an ID in addition to
the address. If this was a single-master IOMMU then there'd be no need
for the ID.
   
   Right. But for a single-master IOMMU, there is no need to specify
   any additional data, it could have #address-cells=0 if we take the
   optimization you suggested.
  
  Couldn't a single-master IOMMU be windowed?
 
 Ah, yes. That would actually be like an IBM pSeries, which has a windowed
 IOMMU but uses one window per virtual machine. In that case, the window could
 be a property of the iommu node though, rather than part of the address
 in the link.
 
 The main advantage I think would be for IOMMUs that use the PCI b/d/f
 numbers as IDs. These can have #address-cells=3, #size-cells=2
 and have an empty dma-ranges property in the PCI host bridge node,
 and interpret this as using the same encoding as the PCI BARs in
 the ranges property.

I'm somewhat confused here, since you said earlier:

 After giving the ranges stuff some more thought, I have come to the
 conclusion that using #iommu-cells should work fine for almost
 all cases, including windowed iommus, because the window is not
 actually needed in the device, but only in the iommu, wihch is of 
 course
 free to interpret the arguments as addresses.

But now you seem to be saying that we should still be using the
#address-cells and #size-cells properties in the IOMMU node to determine
the length of the specifier.
   
   I probably wasn't clear. I think we can make it work either way, but
   my feeling is that using #address-cells/#size-cells gives us a nicer
   syntax for the more complex cases.
  
  Okay, so in summary we'd have something like this for simple cases:
  
  Required properties:
  
  - #address-cells: The number of cells in an IOMMU specifier needed to encode
an address.
  - #size-cells: The number of cells in an IOMMU specifier needed to represent
the length of an address range.
  
  Typical values for the above include:
  - #address-cells = 0, size-cells = 0: Single master IOMMU devices are 
  not
configurable and therefore no additional information needs to be encoded 
  in
the specifier. This may also apply to multiple master IOMMU devices that 
  do
not allow the association of masters to be configured.
  - #address-cells = 1, size-cells = 0: Multiple master IOMMU devices may
need to be configured in order to enable translation for a given master. 
  In
such cases the single address cell corresponds to the master device's ID.
  - #address-cells = 2, size-cells = 2: Some IOMMU devices allow the DMA
window for masters to be configured. The first cell of the address in this
may contain the master device's ID for example, while the second cell 
  could
contain the start of the DMA window for the given device. The length of 
  the
DMA window is specified by two additional cells.

I was trying to figure out how to describe the different kinds of
transformation we could have on the address/ID input to the IOMMU.
Treating the whole thing as opaque gets us off the hook there.

IDs are probably not propagated, not remapped, or we simply don't care
about them; whereas the address transformation is software-controlled,
so we don't describe that anyway.

Delegating grokking the mapping to the iommu driver makes sense --
it's what it's there for, after all.


I'm not sure whether the windowed IOMMU case is special actually.

Since the address to program into the master is found by calling the
IOMMU driver to create some mappings, does anything except the IOMMU
driver need to understand that there is windowing?

  
  Examples:
  =
  
  Single-master IOMMU:
  
  
  iommu {
  #address-cells = 0;

Re: [PATCH] devicetree: Add generic IOMMU device tree bindings

2014-05-20 Thread Will Deacon
On Tue, May 20, 2014 at 02:23:47PM +0100, Arnd Bergmann wrote:
   Bit#   3322  1100 
  10987654 32109876 54321098 76543210
 phys.hi cell:  npt000ss  dfff 
 phys.mid cell:    
 phys.lo cell:     
 
 where:
 n is 0 if the address is relocatable, 1 otherwise
 p is 1 if the addressable region is prefetchable, 0 otherwise
 t is 1 if the address is aliased (for non-relocatable I/O),
  below 1 MB (for Memory), or below 64 KB (for relocatable I/O).
 ss is the space code, denoting the address space
  is the 8-bit Bus Number
 d is the 5-bit Device Number
 fff is the 3-bit Function Number
  is the 8-bit Register Number
 hh...hh is a 32-bit unsigned number
 ll...ll is a 32-bit unsigned number
 
 We can ignore n, p, t and r here, and use the same format for a DMA
 address, then define an empty dma-ranges property. That would
 imply that using b/d/f is sufficient to identify each master at the
 iommu. Any device outside of the PCI host but connected to the same
 iommu can use the same notation to list the logical b/d/f that gets
 sent to the IOMMU in bus master transactions.
 
 Do you think this is sufficient for the ARM SMMU, or do we need
 something beyond that?

I think it can define the common-cases for the existing implementations,
yes. I anticipate Stream-IDs becoming  16-bit in the near future though,
so we'd need extra bits if we're describing other devices coming into the
SMMU.

Note that we already have a binding for the current SMMU driver, so I'm not
really in a position to shift over to a new binding until the next version of
the SMMU architecture comes along...

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] devicetree: Add generic IOMMU device tree bindings

2014-05-20 Thread Arnd Bergmann
On Tuesday 20 May 2014 16:24:59 Dave Martin wrote:
 On Tue, May 20, 2014 at 02:41:18PM +0200, Arnd Bergmann wrote:
  On Tuesday 20 May 2014 14:02:43 Thierry Reding wrote:
   On Tue, May 20, 2014 at 01:15:48PM +0200, Arnd Bergmann wrote:
   Typical values for the above include:
   - #address-cells = 0, size-cells = 0: Single master IOMMU devices are 
   not
 configurable and therefore no additional information needs to be 
   encoded in
 the specifier. This may also apply to multiple master IOMMU devices 
   that do
 not allow the association of masters to be configured.
   - #address-cells = 1, size-cells = 0: Multiple master IOMMU devices 
   may
 need to be configured in order to enable translation for a given 
   master. In
 such cases the single address cell corresponds to the master device's 
   ID.
   - #address-cells = 2, size-cells = 2: Some IOMMU devices allow the DMA
 window for masters to be configured. The first cell of the address in 
   this
 may contain the master device's ID for example, while the second cell 
   could
 contain the start of the DMA window for the given device. The length of 
   the
 DMA window is specified by two additional cells.
 
 I was trying to figure out how to describe the different kinds of
 transformation we could have on the address/ID input to the IOMMU.
 Treating the whole thing as opaque gets us off the hook there.
 
 IDs are probably not propagated, not remapped, or we simply don't care
 about them; whereas the address transformation is software-controlled,
 so we don't describe that anyway.
 
 Delegating grokking the mapping to the iommu driver makes sense --
 it's what it's there for, after all.
 
 
 I'm not sure whether the windowed IOMMU case is special actually.
 
 Since the address to program into the master is found by calling the
 IOMMU driver to create some mappings, does anything except the IOMMU
 driver need to understand that there is windowing?

No. I tried to explain that earlier today, and in my earlier mails
I hadn't thought that part through. Only the IOMMU driver needs to care
about the window.

   
   Examples:
   =
   
   Single-master IOMMU:
   
   
 iommu {
 #address-cells = 0;
 #size-cells = 0;
 };
   
 master {
 iommus = /iommu;
 };
   
   Multiple-master IOMMU with fixed associations:
   --
   
 /* multiple-master IOMMU */
 iommu {
 /*
  * Masters are statically associated with this IOMMU and
  * address translation is always enabled.
  */
 #iommu-cells = 0;
 };
  
  copied wrong? I guess you mean #address-cells=0/#size-cells=0 here.
  
 /* static association with IOMMU */
 master@1 {
 reg = 1;
 
 Just for clarification, reg just has its standard meaning here, and
 is nothing to do with the IOMMU?

correct

 iommus = /iommu;
 
 In effect, iommus is doing the same thing as my slaves property.
 
 The way #address-cells and #size-cells determine the address and range
 size for mastering into the IOMMU is also similar.  The main difference
 is that I didn't build the ID into the address.

Right. I think the difference is more about what we want to call
things: Calling it iommu means we want to specifically describe
the case of iommus that needs to get handled by all OSs in a particular
way, while the more generic slave connection doesn't correspond to
a specific concept in the OS.

 };
   
 /* static association with IOMMU */
 master@2 {
 reg = 2;
 iommus = /iommu;
 };
   
   Multiple-master IOMMU:
   --
   
 iommu {
 /* the specifier represents the ID of the master */
 #address-cells = 1;
 #size-cells = 0;
 
 How do we know the size of the input address to the IOMMU?  Do we
 get cases for example where the IOMMU only accepts a 32-bit input
 address, but some 64-bit capable masters are connected through it?

I was stuck on this question for a while before, but then I realized
that it doesn't matter at all: It's the IOMMU driver itself that
manages the address space, and it doesn't matter if a slave can
address a larger range than the IOMMU can accept. If the IOMMU
needs to deal with the opposite case (64-bit input addresses
but a 32-bit master), that limitation can be put into the specifier.

 The size of the output address from the IOMMU will be determined
 by its own mastering destination, which by default in ePAPR is the
 IOMMU node's parent.  I think that's what you intended, and what we
 expect in this case.

Rihgt.

 For determining dma masks, it is the output address that it
 important.  Santosh's code can probably be taught to handle this,
 if given an additional traversal rule for following iommus
 properties.  However, deploying an IOMMU whose output address size
 is 

Re: [PATCH] devicetree: Add generic IOMMU device tree bindings

2014-05-20 Thread Arnd Bergmann
On Tuesday 20 May 2014 16:00:02 Thierry Reding wrote:
 On Tue, May 20, 2014 at 03:34:46PM +0200, Arnd Bergmann wrote:
  On Tuesday 20 May 2014 15:17:43 Thierry Reding wrote:
   On Tue, May 20, 2014 at 02:41:18PM +0200, Arnd Bergmann wrote:
On Tuesday 20 May 2014 14:02:43 Thierry Reding wrote:
   [...]
 Couldn't a single-master IOMMU be windowed?

Ah, yes. That would actually be like an IBM pSeries, which has a 
windowed
IOMMU but uses one window per virtual machine. In that case, the window 
could
be a property of the iommu node though, rather than part of the address
in the link.
   
   Does that mean that the IOMMU has one statically configured window which
   is the same for each virtual machine? That would require some other
   mechanism to assign separate address spaces to each virtual machine,
   wouldn't it? But I suspect that if the IOMMU allows that it could be
   allocated dynamically at runtime.
  
  The way it works on pSeries is that upon VM creation, the guest is assigned
  one 256MB window for use by assigned DMA capable devices. When the guest
  creates a mapping, it uses a hypercall to associate a bus address in that
  range with a guest physical address. The hypervisor checks that the bus
  address is within the allowed range, and translates the guest physical
  address into a host physical address, then enters both into the I/O page
  table or I/O TLB.
 
 So when a VM is booted it is passed a device tree with that DMA window?

Correct.

 Given what you describe above this seems to be more of a configuration
 option to restrict the IOMMU to a subset of the physical memory for
 purposes of virtualization. So I agree that this wouldn't be a good fit
 for what we're trying to achieve with iommus or dma-ranges in this
 binding.

Thinking about it again now, I wonder if there are any other use cases
for windowed IOMMUs. If this is the only one, there might be no use
in the #address-cells model I suggested instead of your original
#iommu-cells.

I would like to add an explanation about dma-ranges to the binding:

8
The parent bus of the iommu must have a valid dma-ranges property
describing how the physical address space of the IOMMU maps into
memory.
   
   With physical address space you mean the addresses after translation,
   not the I/O virtual addresses, right? But even so, how will this work
   when there are multiple IOMMU devices? What determines which IOMMU is
   mapped via which entry?
   
   Perhaps having multiple IOMMUs implies that there will have to be some
   partitioning of the parent address space to make sure two IOMMUs don't
   translate to the same ranges?
  
  These dma-ranges properties would almost always be for the entire RAM,
  and we can treat anything else as a bug.
 
 Would it typically be a 1:1 mapping? In that case could we define an
 empty dma-ranges property to mean exactly that? That would make it
 consistent with the ranges property.

Yes, I believe that is how it's already defined.

A device with an iommus property will ignore the dma-ranges property
of the parent node and rely on the IOMMU for translation instead.
   
   Do we need to consider the case where an IOMMU listed in iommus isn't
   enabled (status = disabled)? In that case presumably the device would
   either not function or may optionally continue to master onto the parent
   untranslated.
  
  My reasoning was that the DT should specify whether we use the IOMMU
  or not. Being able to just switch on or off the IOMMU sounds nice as
  well, so we could change the text above to do that.
  
  Another option would be to do this in the IOMMU code, basically
  falling back to the IOMMU parent's dma-ranges property and using
  linear dma_map_ops when that is disabled.
 
 Yes, it should be trivial for the IOMMU core code to take care of this
 special case. Still I think it's worth mentioning it in the binding so
 that it's clearly specified.

Agreed.

Using an iommus property in bus device nodes with dma-ranges
specifying how child devices relate to the IOMMU is a possible extension
but is not recommended until this binding gets extended.
   
   Just for my understanding, bus device nodes with iommus and dma-ranges
   properties could be equivalently written by explicitly moving the iommus
   properties into the child device nodes, right? In which case they should
   be the same as the other examples. So that concept is a convenience
   notation to reduce duplication, but doesn't fundamentally introduce any
   new concept.
  
  The one case  where that doesn't work is PCI, because we don't list the
  PCI devices in DT normally, and the iommus property would only exist
  at the PCI host controller node.
 
 But it could work in classic OpenFirmware where the device tree can be
 populated with the tree of PCI devices enumerated by OF. There are also
 devices that have a fixed configuration and where technically the 

Re: [PATCH] devicetree: Add generic IOMMU device tree bindings

2014-05-20 Thread Arnd Bergmann
On Tuesday 20 May 2014 17:39:12 Dave Martin wrote:
 On Tue, May 20, 2014 at 04:26:59PM +0100, Will Deacon wrote:
  On Tue, May 20, 2014 at 02:23:47PM +0100, Arnd Bergmann wrote:
 Bit#   3322  1100 
10987654 32109876 54321098 76543210
   phys.hi cell:  npt000ss  dfff 
   phys.mid cell:    
   phys.lo cell:     
   
   where:
   n is 0 if the address is relocatable, 1 otherwise
   p is 1 if the addressable region is prefetchable, 0 otherwise
   t is 1 if the address is aliased (for non-relocatable I/O),
below 1 MB (for Memory), or below 64 KB (for relocatable I/O).
   ss is the space code, denoting the address space
    is the 8-bit Bus Number
   d is the 5-bit Device Number
   fff is the 3-bit Function Number
    is the 8-bit Register Number
   hh...hh is a 32-bit unsigned number
   ll...ll is a 32-bit unsigned number
   
   We can ignore n, p, t and r here, and use the same format for a DMA
   address, then define an empty dma-ranges property. That would
   imply that using b/d/f is sufficient to identify each master at the
   iommu. Any device outside of the PCI host but connected to the same
   iommu can use the same notation to list the logical b/d/f that gets
   sent to the IOMMU in bus master transactions.
   
   Do you think this is sufficient for the ARM SMMU, or do we need
   something beyond that?
  
  I think it can define the common-cases for the existing implementations,
  yes. I anticipate Stream-IDs becoming  16-bit in the near future though,
  so we'd need extra bits if we're describing other devices coming into the
  SMMU.
  
  Note that we already have a binding for the current SMMU driver, so I'm not
  really in a position to shift over to a new binding until the next version 
  of
  the SMMU architecture comes along...
 
 How much code relies on the meaning of the nptsbdf bits?

Not much at all, I think this was defined mostly for open firmware
client interfaces, which we don't use with FDT.

n, t and r are probably only used in PCI functions that are listed
in DT, not in the host bridge.

b/d/f I think is mostly used for the interrupt-maps property, when
describing hardwired interrupts.

p and s are used when setting up the translation for inbound MMIO
and PIO.

Arnd
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 4/5] iommu/amd: Remove IOMMUv2 pasid_state_list

2014-05-20 Thread Joerg Roedel
From: Joerg Roedel jroe...@suse.de

This list was only used for the task_exit notifier function.
Now that it is gone we can remove it.

Signed-off-by: Joerg Roedel jroe...@suse.de
Tested-by: Jay Cornwall jay.cornw...@amd.com
---
 drivers/iommu/amd_iommu_v2.c |   26 --
 1 file changed, 26 deletions(-)

diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
index c06a29a..d9309a0 100644
--- a/drivers/iommu/amd_iommu_v2.c
+++ b/drivers/iommu/amd_iommu_v2.c
@@ -86,10 +86,6 @@ struct fault {
 static LIST_HEAD(state_list);
 static spinlock_t state_lock;
 
-/* List and lock for all pasid_states */
-static LIST_HEAD(pasid_state_list);
-static DEFINE_SPINLOCK(ps_lock);
-
 static struct workqueue_struct *iommu_wq;
 
 /*
@@ -171,25 +167,6 @@ static void put_device_state_wait(struct device_state 
*dev_state)
free_device_state(dev_state);
 }
 
-static void link_pasid_state(struct pasid_state *pasid_state)
-{
-   spin_lock(ps_lock);
-   list_add_tail(pasid_state-list, pasid_state_list);
-   spin_unlock(ps_lock);
-}
-
-static void __unlink_pasid_state(struct pasid_state *pasid_state)
-{
-   list_del(pasid_state-list);
-}
-
-static void unlink_pasid_state(struct pasid_state *pasid_state)
-{
-   spin_lock(ps_lock);
-   __unlink_pasid_state(pasid_state);
-   spin_unlock(ps_lock);
-}
-
 /* Must be called under dev_state-lock */
 static struct pasid_state **__get_pasid_state_ptr(struct device_state 
*dev_state,
  int pasid, bool alloc)
@@ -346,7 +323,6 @@ static void unbind_pasid(struct device_state *dev_state, 
int pasid)
if (pasid_state == NULL)
return;
 
-   unlink_pasid_state(pasid_state);
__unbind_pasid(pasid_state);
put_pasid_state_wait(pasid_state); /* Reference taken in this function 
*/
 }
@@ -687,8 +663,6 @@ int amd_iommu_bind_pasid(struct pci_dev *pdev, int pasid,
if (ret)
goto out_clear_state;
 
-   link_pasid_state(pasid_state);
-
return 0;
 
 out_clear_state:
-- 
1.7.9.5


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/5] iommu/amd: Convert IOMMUv2 state_table into state_list

2014-05-20 Thread Joerg Roedel
From: Joerg Roedel jroe...@suse.de

The state_table consumes 512kb of memory and is only sparsly
populated. Convert it into a list to save memory. There
should be no measurable performance impact.

Signed-off-by: Joerg Roedel jroe...@suse.de
Tested-by: Jay Cornwall jay.cornw...@amd.com
---
 drivers/iommu/amd_iommu_v2.c |   39 ++-
 1 file changed, 18 insertions(+), 21 deletions(-)

diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
index 8c3086a..05e93f1 100644
--- a/drivers/iommu/amd_iommu_v2.c
+++ b/drivers/iommu/amd_iommu_v2.c
@@ -56,6 +56,8 @@ struct pasid_state {
 };
 
 struct device_state {
+   struct list_head list;
+   u16 devid;
atomic_t count;
struct pci_dev *pdev;
struct pasid_state **states;
@@ -81,7 +83,7 @@ struct fault {
u16 flags;
 };
 
-static struct device_state **state_table;
+static LIST_HEAD(state_list);
 static spinlock_t state_lock;
 
 /* List and lock for all pasid_states */
@@ -113,7 +115,14 @@ static u16 device_id(struct pci_dev *pdev)
 
 static struct device_state *__get_device_state(u16 devid)
 {
-   return state_table[devid];
+   struct device_state *dev_state;
+
+   list_for_each_entry(dev_state, state_list, list) {
+   if (dev_state-devid == devid)
+   return dev_state;
+   }
+
+   return NULL;
 }
 
 static struct device_state *get_device_state(u16 devid)
@@ -776,7 +785,8 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
 
spin_lock_init(dev_state-lock);
init_waitqueue_head(dev_state-wq);
-   dev_state-pdev = pdev;
+   dev_state-pdev  = pdev;
+   dev_state-devid = devid;
 
tmp = pasids;
for (dev_state-pasid_levels = 0; (tmp - 1)  ~0x1ff; tmp = 9)
@@ -806,13 +816,13 @@ int amd_iommu_init_device(struct pci_dev *pdev, int 
pasids)
 
spin_lock_irqsave(state_lock, flags);
 
-   if (state_table[devid] != NULL) {
+   if (__get_device_state(devid) != NULL) {
spin_unlock_irqrestore(state_lock, flags);
ret = -EBUSY;
goto out_free_domain;
}
 
-   state_table[devid] = dev_state;
+   list_add_tail(dev_state-list, state_list);
 
spin_unlock_irqrestore(state_lock, flags);
 
@@ -850,7 +860,7 @@ void amd_iommu_free_device(struct pci_dev *pdev)
return;
}
 
-   state_table[devid] = NULL;
+   list_del(dev_state-list);
 
spin_unlock_irqrestore(state_lock, flags);
 
@@ -925,7 +935,6 @@ EXPORT_SYMBOL(amd_iommu_set_invalidate_ctx_cb);
 
 static int __init amd_iommu_v2_init(void)
 {
-   size_t state_table_size;
int ret;
 
pr_info(AMD IOMMUv2 driver by Joerg Roedel joerg.roe...@amd.com\n);
@@ -941,16 +950,10 @@ static int __init amd_iommu_v2_init(void)
 
spin_lock_init(state_lock);
 
-   state_table_size = MAX_DEVICES * sizeof(struct device_state *);
-   state_table = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
-  get_order(state_table_size));
-   if (state_table == NULL)
-   return -ENOMEM;
-
ret = -ENOMEM;
iommu_wq = create_workqueue(amd_iommu_v2);
if (iommu_wq == NULL)
-   goto out_free;
+   goto out;
 
ret = -ENOMEM;
empty_page_table = (u64 *)get_zeroed_page(GFP_KERNEL);
@@ -965,16 +968,13 @@ static int __init amd_iommu_v2_init(void)
 out_destroy_wq:
destroy_workqueue(iommu_wq);
 
-out_free:
-   free_pages((unsigned long)state_table, get_order(state_table_size));
-
+out:
return ret;
 }
 
 static void __exit amd_iommu_v2_exit(void)
 {
struct device_state *dev_state;
-   size_t state_table_size;
int i;
 
if (!amd_iommu_v2_supported())
@@ -1003,9 +1003,6 @@ static void __exit amd_iommu_v2_exit(void)
 
destroy_workqueue(iommu_wq);
 
-   state_table_size = MAX_DEVICES * sizeof(struct device_state *);
-   free_pages((unsigned long)state_table, get_order(state_table_size));
-
free_page((unsigned long)empty_page_table);
 }
 
-- 
1.7.9.5


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/5] iommu/amd: Don't access IOMMUv2 state_table directly

2014-05-20 Thread Joerg Roedel
From: Joerg Roedel jroe...@suse.de

This is a preparation for converting the state_table into a
state_list.

Signed-off-by: Joerg Roedel jroe...@suse.de
Tested-by: Jay Cornwall jay.cornw...@amd.com
---
 drivers/iommu/amd_iommu_v2.c |   13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
index 5208828..8c3086a 100644
--- a/drivers/iommu/amd_iommu_v2.c
+++ b/drivers/iommu/amd_iommu_v2.c
@@ -111,13 +111,18 @@ static u16 device_id(struct pci_dev *pdev)
return devid;
 }
 
+static struct device_state *__get_device_state(u16 devid)
+{
+   return state_table[devid];
+}
+
 static struct device_state *get_device_state(u16 devid)
 {
struct device_state *dev_state;
unsigned long flags;
 
spin_lock_irqsave(state_lock, flags);
-   dev_state = state_table[devid];
+   dev_state = __get_device_state(devid);
if (dev_state != NULL)
atomic_inc(dev_state-count);
spin_unlock_irqrestore(state_lock, flags);
@@ -839,7 +844,7 @@ void amd_iommu_free_device(struct pci_dev *pdev)
 
spin_lock_irqsave(state_lock, flags);
 
-   dev_state = state_table[devid];
+   dev_state = __get_device_state(devid);
if (dev_state == NULL) {
spin_unlock_irqrestore(state_lock, flags);
return;
@@ -872,7 +877,7 @@ int amd_iommu_set_invalid_ppr_cb(struct pci_dev *pdev,
spin_lock_irqsave(state_lock, flags);
 
ret = -EINVAL;
-   dev_state = state_table[devid];
+   dev_state = __get_device_state(devid);
if (dev_state == NULL)
goto out_unlock;
 
@@ -903,7 +908,7 @@ int amd_iommu_set_invalidate_ctx_cb(struct pci_dev *pdev,
spin_lock_irqsave(state_lock, flags);
 
ret = -EINVAL;
-   dev_state = state_table[devid];
+   dev_state = __get_device_state(devid);
if (dev_state == NULL)
goto out_unlock;
 
-- 
1.7.9.5


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 5/5] iommu/amd: Handle parallel invalidate_range_start/end calls correctly

2014-05-20 Thread Joerg Roedel
From: Joerg Roedel jroe...@suse.de

Add a counter to the pasid_state so that we do not restore
the original page-table before all invalidate_range_start
to invalidate_range_end sections have finished.

Signed-off-by: Joerg Roedel jroe...@suse.de
---
 drivers/iommu/amd_iommu_v2.c |   17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
index d9309a0..9ed9da2 100644
--- a/drivers/iommu/amd_iommu_v2.c
+++ b/drivers/iommu/amd_iommu_v2.c
@@ -45,6 +45,8 @@ struct pri_queue {
 struct pasid_state {
struct list_head list;  /* For global state-list */
atomic_t count; /* Reference count */
+   atomic_t mmu_notifier_count;/* Counting nested mmu_notifier
+  calls */
struct task_struct *task;   /* Task bound to this PASID */
struct mm_struct *mm;   /* mm_struct for the faults */
struct mmu_notifier mn; /* mmu_otifier handle */
@@ -433,8 +435,11 @@ static void mn_invalidate_range_start(struct mmu_notifier 
*mn,
pasid_state = mn_to_state(mn);
dev_state   = pasid_state-device_state;
 
-   amd_iommu_domain_set_gcr3(dev_state-domain, pasid_state-pasid,
- __pa(empty_page_table));
+   if (atomic_add_return(1, pasid_state-mmu_notifier_count) == 1) {
+   amd_iommu_domain_set_gcr3(dev_state-domain,
+ pasid_state-pasid,
+ __pa(empty_page_table));
+   }
 }
 
 static void mn_invalidate_range_end(struct mmu_notifier *mn,
@@ -447,8 +452,11 @@ static void mn_invalidate_range_end(struct mmu_notifier 
*mn,
pasid_state = mn_to_state(mn);
dev_state   = pasid_state-device_state;
 
-   amd_iommu_domain_set_gcr3(dev_state-domain, pasid_state-pasid,
- __pa(pasid_state-mm-pgd));
+   if (atomic_dec_and_test(pasid_state-mmu_notifier_count)) {
+   amd_iommu_domain_set_gcr3(dev_state-domain,
+ pasid_state-pasid,
+ __pa(pasid_state-mm-pgd));
+   }
 }
 
 static void mn_release(struct mmu_notifier *mn, struct mm_struct *mm)
@@ -640,6 +648,7 @@ int amd_iommu_bind_pasid(struct pci_dev *pdev, int pasid,
goto out;
 
atomic_set(pasid_state-count, 1);
+   atomic_set(pasid_state-mmu_notifier_count, 0);
init_waitqueue_head(pasid_state-wq);
spin_lock_init(pasid_state-lock);
 
-- 
1.7.9.5


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 0/5] AMD IOMMUv2 Cleanups and Fixes

2014-05-20 Thread Joerg Roedel
Hi,

here is a small patch-set with some clean-ups for the AMD IOMMUv2
driver. The most important change is the conversion to use the
mmu_notifier release call-back instead of the task_exit notifier to get
notified when a MM dies. Please review.

Thanks,

Joerg

Diffstat:

 drivers/iommu/amd_iommu_v2.c |  184 +-

Shortlog:

Joerg Roedel (5):
  iommu/amd: Don't access IOMMUv2 state_table directly
  iommu/amd: Convert IOMMUv2 state_table into state_list
  iommu/amd: Implement mmu_notifier_release call-back
  iommu/amd: Remove IOMMUv2 pasid_state_list
  iommu/amd: Handle parallel invalidate_range_start/end calls correctly

 1 file changed, 74 insertions(+), 110 deletions(-)

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH v5_v2 01/11] driver core: platform: add device binding path 'driver_override'

2014-05-20 Thread Kim Phillips
From: Kim Phillips kim.phill...@freescale.com

Needed by platform device drivers, such as the vfio-platform driver
later in series, in order to bypass the existing OF, ACPI, id_table and
name string matches, and successfully be able to be bound to any
device, like so:

echo vfio-platform  /sys/bus/platform/devices/fff51000.ethernet/driver_override
echo fff51000.ethernet  
/sys/bus/platform/devices/fff51000.ethernet/driver/unbind
echo fff51000.ethernet  /sys/bus/platform/drivers_probe

This mimics PCI: Introduce new device binding path using
pci_dev.driver_override [1], which is an interface enhancement
for more deterministic PCI device binding, e.g., when in the
presence of hotplug.

[1] https://lists.cs.columbia.edu/pipermail/kvmarm/2014-May/009527.html

Suggested-by: Alex Williamson alex.william...@redhat.com
Signed-off-by: Kim Phillips kim.phill...@freescale.com
---
changes in v2 patch of v5 of this patchseries:
- rebased onto today's Linus' ToT
- added kfree to match PCI counterpart fix, as Alex Williamson
  just posted a v3 of the patch (thanks Christoffer for the
  notification)
- in the commit text, replaced vfio platform driver reference with
  'later in series', and updated the PCI version mailing list reference
  to the v3 version.

Is it safe to assume this patch will continue to as part of the VFIO
platform driver patchseries, and be submitted by Antonis?  If so, can
we start collecting some {Reviewed,Acked}-bys?   Thanks, Kim.

 Documentation/ABI/testing/sysfs-bus-platform | 20 
 drivers/base/platform.c  | 47 
 include/linux/platform_device.h  |  1 +
 3 files changed, 68 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-platform

diff --git a/Documentation/ABI/testing/sysfs-bus-platform 
b/Documentation/ABI/testing/sysfs-bus-platform
new file mode 100644
index 000..5172a61
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-platform
@@ -0,0 +1,20 @@
+What:  /sys/bus/platform/devices/.../driver_override
+Date:  April 2014
+Contact:   Kim Phillips kim.phill...@freescale.com
+Description:
+   This file allows the driver for a device to be specified which
+   will override standard OF, ACPI, ID table, and name matching.
+   When specified, only a driver with a name matching the value
+   written to driver_override will have an opportunity to bind
+   to the device.  The override is specified by writing a string
+   to the driver_override file (echo vfio-platform  \
+   driver_override) and may be cleared with an empty string
+   (echo  driver_override).  This returns the device to standard
+   matching rules binding.  Writing to driver_override does not
+   automatically unbind the device from its current driver or make
+   any attempt to automatically load the specified driver.  If no
+   driver with a matching name is currently loaded in the kernel,
+   the device will not bind to any driver.  This also allows
+   devices to opt-out of driver binding using a driver_override
+   name such as none.  Only a single driver may be specified in
+   the override, there is no support for parsing delimiters.
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 5b47210..4f47563 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -23,6 +23,7 @@
 #include linux/pm_runtime.h
 #include linux/idr.h
 #include linux/acpi.h
+#include linux/limits.h
 
 #include base.h
 #include power/power.h
@@ -188,6 +189,7 @@ static void platform_device_release(struct device *dev)
kfree(pa-pdev.dev.platform_data);
kfree(pa-pdev.mfd_cell);
kfree(pa-pdev.resource);
+   kfree(pa-pdev.driver_override);
kfree(pa);
 }
 
@@ -695,8 +697,49 @@ static ssize_t modalias_show(struct device *dev, struct 
device_attribute *a,
 }
 static DEVICE_ATTR_RO(modalias);
 
+static ssize_t driver_override_store(struct device *dev,
+struct device_attribute *attr,
+const char *buf, size_t count)
+{
+   struct platform_device *pdev = to_platform_device(dev);
+   char *driver_override, *old = pdev-driver_override, *cp;
+
+   if (count  PATH_MAX)
+   return -EINVAL;
+
+   driver_override = kstrndup(buf, count, GFP_KERNEL);
+   if (!driver_override)
+   return -ENOMEM;
+
+   cp = strchr(driver_override, '\n');
+   if (cp)
+   *cp = '\0';
+
+   if (strlen(driver_override)) {
+   pdev-driver_override = driver_override;
+   } else {
+   kfree(driver_override);
+   pdev-driver_override = NULL;
+   }
+
+   kfree(old);
+
+   return count;
+}
+
+static ssize_t driver_override_show(struct device *dev,
+  

Re: [PATCH v3 00/15] PCI/iommu: Fix DMA alias problems

2014-05-20 Thread colin
Alex Williamson wrote:
 Wow, I didn't think that kind of broken was possible.  Maybe instead of
 a bitmap of function aliases we could have a single devfn alias for a
 device.  That means we'd only be able to support a single alias for a
 device, but since I don't think we've seen devices that use more than a
 single alias, maybe that's ok.

In my (never finished) patch set for the same problem, the first thing
I did was

diff --git a/include/linux/pci.h b/include/linux/pci.h
index a13d6825..7788870a 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -251,12 +251,13 @@ struct pci_dev {
struct proc_dir_entry *procent; /* device entry in /proc/bus/pci */
struct pci_slot *slot;  /* Physical slot this device is in */
 
-   unsigned intdevfn;  /* encoded device  function index */
unsigned short  vendor;
unsigned short  device;
unsigned short  subsystem_vendor;
unsigned short  subsystem_device;
unsigned intclass;  /* 3 bytes: (base,sub,prog-if) */
+   u8  devfn;  /* encoded device  function index */
+   u8  devfn_quirk;/* zero is non-quirky */
u8  revision;   /* PCI revision, low byte of class word 
*/
u8  hdr_type;   /* PCI header type (`multi' flag masked 
out) */
u8  pcie_cap;   /* PCIe capability offset */

I encoded devfn_quirk as a delta to devfn, so that zero would mean
no quirk, and no existing intialization would need changing.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu: fsl_pamu.c: Fix for possible null pointer dereference

2014-05-20 Thread Rickard Strandqvist
There is otherwise a risk of a possible null pointer dereference.

Was largely found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist rickard_strandqv...@spectrumdigital.se
---
 drivers/iommu/fsl_pamu.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/iommu/fsl_pamu.c b/drivers/iommu/fsl_pamu.c
index cba0498..b99dd88 100644
--- a/drivers/iommu/fsl_pamu.c
+++ b/drivers/iommu/fsl_pamu.c
@@ -592,8 +592,7 @@ found_cpu_node:
/* advance to next node in cache hierarchy */
node = of_find_node_by_phandle(*prop);
if (!node) {
-   pr_debug(Invalid node for cache hierarchy %s\n,
-   node-full_name);
+   pr_debug(Invalid node for cache hierarchy\n);
return ~(u32)0;
}
}
-- 
1.7.10.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] remove duplicate checking code

2014-05-20 Thread Vaughan Cao
amd_iommu_rlookup_table[devid] != NULL is already guaranteed by check_device
called before, it's fine to attach device at this point.

Signed-off-by: Vaughan Cao vaughan@oracle.com
---
 drivers/iommu/amd_iommu.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index c949520..252851a 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3514,12 +3514,6 @@ int __init amd_iommu_init_passthrough(void)
dev_data = get_dev_data(dev-dev);
dev_data-passthrough = true;
 
-   devid = get_device_id(dev-dev);
-
-   iommu = amd_iommu_rlookup_table[devid];
-   if (!iommu)
-   continue;
-
attach_device(dev-dev, pt_domain);
}
 
-- 
1.9.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu