Re: [PATCH kernel v9 18/32] powerpc/iommu/powernv: Release replaced TCE

2015-04-29 Thread Alexey Kardashevskiy

On 04/29/2015 02:18 PM, David Gibson wrote:

On Sat, Apr 25, 2015 at 10:14:42PM +1000, Alexey Kardashevskiy wrote:

At the moment, writing a new TCE value to the IOMMU table fails with EBUSY
if there is already a valid entry. However, the PAPR specification allows
the guest to write a new TCE value without clearing the old one first.

Another problem this patch addresses is the use of pool locks for
external IOMMU users such as VFIO. The pool locks protect the DMA page
allocator rather than the entries themselves, and since the host kernel
does not control which pages are in use, there is no point in taking the
pool locks: exchange()+put_page(oldtce) is sufficient to avoid possible races.

This adds an exchange() callback to iommu_table_ops which does the same
thing as set() but also returns the replaced TCE and the DMA direction so
the caller can release the pages afterwards. Unlike set(), which receives
a linear mapping address, exchange() receives a physical address; and it
returns a physical address, as clear() does.

This implements exchange() for P5IOC2/IODA/IODA2 and adds a requirement
for a platform to implement exchange() in order to support VFIO.

This replaces iommu_tce_build() and iommu_clear_tce() with
a single iommu_tce_xchg().

This makes sure that the TCE permission bits are not set in a TCE passed
to the IOMMU API, as those are calculated by platform code from the DMA
direction.

This moves SetPageDirty() into the IOMMU code so that it works for both
the VFIO ioctl interface and in-kernel TCE acceleration (when the latter
becomes available later).

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
[aw: for the vfio related changes]
Acked-by: Alex Williamson alex.william...@redhat.com


This looks mostly good, but there are a couple of details that need fixing.




[...]


diff --git a/arch/powerpc/platforms/powernv/pci.c 
b/arch/powerpc/platforms/powernv/pci.c
index ba75aa5..e8802ac 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -598,6 +598,23 @@ int pnv_tce_build(struct iommu_table *tbl, long index, 
long npages,
return 0;
  }

+#ifdef CONFIG_IOMMU_API
+int pnv_tce_xchg(struct iommu_table *tbl, long index,
+   unsigned long *tce, enum dma_data_direction *direction)
+{
+   u64 proto_tce = iommu_direction_to_tce_perm(*direction);
+   unsigned long newtce = *tce | proto_tce;
+   unsigned long idx = index - tbl->it_offset;


Should this have a BUG_ON or WARN_ON if the supplied tce has bits set
below the page mask?



Why? The caller already checks these bits; do we really need to duplicate the check here?



+   *tce = xchg(pnv_tce(tbl, idx), cpu_to_be64(newtce));
+   *tce = be64_to_cpu(*tce);
+   *direction = iommu_tce_direction(*tce);
+   *tce &= ~(TCE_PCI_READ | TCE_PCI_WRITE);
+
+   return 0;
+}
+#endif




--
Alexey
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH kernel v9 18/32] powerpc/iommu/powernv: Release replaced TCE

2015-04-29 Thread David Gibson
On Wed, Apr 29, 2015 at 07:51:21PM +1000, Alexey Kardashevskiy wrote:
 On 04/29/2015 02:18 PM, David Gibson wrote:
 On Sat, Apr 25, 2015 at 10:14:42PM +1000, Alexey Kardashevskiy wrote:
 At the moment, writing a new TCE value to the IOMMU table fails with EBUSY
 if there is already a valid entry. However, the PAPR specification allows
 the guest to write a new TCE value without clearing the old one first.
 
 Another problem this patch addresses is the use of pool locks for
 external IOMMU users such as VFIO. The pool locks protect the DMA page
 allocator rather than the entries themselves, and since the host kernel
 does not control which pages are in use, there is no point in taking the
 pool locks: exchange()+put_page(oldtce) is sufficient to avoid possible
 races.
 
 This adds an exchange() callback to iommu_table_ops which does the same
 thing as set() but also returns the replaced TCE and the DMA direction so
 the caller can release the pages afterwards. Unlike set(), which receives
 a linear mapping address, exchange() receives a physical address; and it
 returns a physical address, as clear() does.
 
 This implements exchange() for P5IOC2/IODA/IODA2 and adds a requirement
 for a platform to implement exchange() in order to support VFIO.
 
 This replaces iommu_tce_build() and iommu_clear_tce() with
 a single iommu_tce_xchg().
 
 This makes sure that the TCE permission bits are not set in a TCE passed
 to the IOMMU API, as those are calculated by platform code from the DMA
 direction.
 
 This moves SetPageDirty() into the IOMMU code so that it works for both
 the VFIO ioctl interface and in-kernel TCE acceleration (when the latter
 becomes available later).
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 [aw: for the vfio related changes]
 Acked-by: Alex Williamson alex.william...@redhat.com
 
 This looks mostly good, but there are a couple of details that need fixing.
 
 
 
 [...]
 
 diff --git a/arch/powerpc/platforms/powernv/pci.c 
 b/arch/powerpc/platforms/powernv/pci.c
 index ba75aa5..e8802ac 100644
 --- a/arch/powerpc/platforms/powernv/pci.c
 +++ b/arch/powerpc/platforms/powernv/pci.c
 @@ -598,6 +598,23 @@ int pnv_tce_build(struct iommu_table *tbl, long index, 
 long npages,
 return 0;
   }
 
 +#ifdef CONFIG_IOMMU_API
 +int pnv_tce_xchg(struct iommu_table *tbl, long index,
 +   unsigned long *tce, enum dma_data_direction *direction)
 +{
 +   u64 proto_tce = iommu_direction_to_tce_perm(*direction);
 +   unsigned long newtce = *tce | proto_tce;
 +   unsigned long idx = index - tbl->it_offset;
 
 Should this have a BUG_ON or WARN_ON if the supplied tce has bits set
 below the page mask?
 
 
 Why? The caller already checks these bits; do we really need to duplicate
 the check here?

Because this is the crunch point where bad bits will actually cause
strange things to happen.

As much as anything, the point of a BUG_ON would be to document that
this function expects the parameter to be page-aligned.

 
 
 +   *tce = xchg(pnv_tce(tbl, idx), cpu_to_be64(newtce));
 +   *tce = be64_to_cpu(*tce);
 +   *direction = iommu_tce_direction(*tce);
 +   *tce &= ~(TCE_PCI_READ | TCE_PCI_WRITE);
 +
 +   return 0;
 +}
 +#endif
 
 
 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [PATCH kernel v9 18/32] powerpc/iommu/powernv: Release replaced TCE

2015-04-28 Thread David Gibson
On Sat, Apr 25, 2015 at 10:14:42PM +1000, Alexey Kardashevskiy wrote:
 At the moment, writing a new TCE value to the IOMMU table fails with EBUSY
 if there is already a valid entry. However, the PAPR specification allows
 the guest to write a new TCE value without clearing the old one first.
 
 Another problem this patch addresses is the use of pool locks for
 external IOMMU users such as VFIO. The pool locks protect the DMA page
 allocator rather than the entries themselves, and since the host kernel
 does not control which pages are in use, there is no point in taking the
 pool locks: exchange()+put_page(oldtce) is sufficient to avoid possible
 races.
 
 This adds an exchange() callback to iommu_table_ops which does the same
 thing as set() but also returns the replaced TCE and the DMA direction so
 the caller can release the pages afterwards. Unlike set(), which receives
 a linear mapping address, exchange() receives a physical address; and it
 returns a physical address, as clear() does.
 
 This implements exchange() for P5IOC2/IODA/IODA2 and adds a requirement
 for a platform to implement exchange() in order to support VFIO.
 
 This replaces iommu_tce_build() and iommu_clear_tce() with
 a single iommu_tce_xchg().
 
 This makes sure that the TCE permission bits are not set in a TCE passed
 to the IOMMU API, as those are calculated by platform code from the DMA
 direction.
 
 This moves SetPageDirty() into the IOMMU code so that it works for both
 the VFIO ioctl interface and in-kernel TCE acceleration (when the latter
 becomes available later).
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 [aw: for the vfio related changes]
 Acked-by: Alex Williamson alex.william...@redhat.com

This looks mostly good, but there are a couple of details that need fixing.

 ---
 Changes:
 v9:
 * changed exchange() to work with physical addresses as these addresses
 are never accessed by the code and physical addresses are actual values
 we put into the IOMMU table
 ---
  arch/powerpc/include/asm/iommu.h| 22 +--
  arch/powerpc/kernel/iommu.c | 57 +---
  arch/powerpc/platforms/powernv/pci-ioda.c   | 34 +
  arch/powerpc/platforms/powernv/pci-p5ioc2.c |  3 ++
  arch/powerpc/platforms/powernv/pci.c| 17 +
  arch/powerpc/platforms/powernv/pci.h|  2 +
  drivers/vfio/vfio_iommu_spapr_tce.c | 58 
 ++---
  7 files changed, 128 insertions(+), 65 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/iommu.h 
 b/arch/powerpc/include/asm/iommu.h
 index e63419e..7e7ca0a 100644
 --- a/arch/powerpc/include/asm/iommu.h
 +++ b/arch/powerpc/include/asm/iommu.h
 @@ -45,13 +45,29 @@ extern int iommu_is_off;
  extern int iommu_force_on;
  
  struct iommu_table_ops {
 + /*
 +  * When called with direction==DMA_NONE, it is equal to clear().
 +  * uaddr is a linear map address.
 +  */
   int (*set)(struct iommu_table *tbl,
   long index, long npages,
   unsigned long uaddr,
   enum dma_data_direction direction,
   struct dma_attrs *attrs);
 +#ifdef CONFIG_IOMMU_API
 + /*
 +  * Exchanges existing TCE with new TCE plus direction bits;
 +  * returns old TCE and DMA direction mask.
 +  * @tce is a physical address.
 +  */
 + int (*exchange)(struct iommu_table *tbl,
 + long index,
 + unsigned long *tce,

I'd prefer to call this address or paddr or something, since it's
not a full TCE entry (which would contain permission bits).

 + enum dma_data_direction *direction);
 +#endif
   void (*clear)(struct iommu_table *tbl,
   long index, long npages);
 + /* get() returns a physical address */
   unsigned long (*get)(struct iommu_table *tbl, long index);
   void (*flush)(struct iommu_table *tbl);
  };
 @@ -152,6 +168,8 @@ extern void iommu_register_group(struct iommu_table_group 
 *table_group,
  extern int iommu_add_device(struct device *dev);
  extern void iommu_del_device(struct device *dev);
  extern int __init tce_iommu_bus_notifier_init(void);
 +extern long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
 + unsigned long *tce, enum dma_data_direction *direction);
  #else
  static inline void iommu_register_group(struct iommu_table_group 
 *table_group,
   int pci_domain_number,
 @@ -231,10 +249,6 @@ extern int iommu_tce_clear_param_check(struct 
 iommu_table *tbl,
   unsigned long npages);
  extern int iommu_tce_put_param_check(struct iommu_table *tbl,
   unsigned long ioba, unsigned long tce);
 -extern int iommu_tce_build(struct iommu_table *tbl, unsigned long entry,
 - unsigned long hwaddr, enum dma_data_direction direction);
 -extern unsigned long iommu_clear_tce(struct iommu_table *tbl,
 - unsigned long entry);
  
 extern void iommu_flush_tce(struct iommu_table *tbl);

[PATCH kernel v9 18/32] powerpc/iommu/powernv: Release replaced TCE

2015-04-25 Thread Alexey Kardashevskiy
At the moment, writing a new TCE value to the IOMMU table fails with EBUSY
if there is already a valid entry. However, the PAPR specification allows
the guest to write a new TCE value without clearing the old one first.

Another problem this patch addresses is the use of pool locks for
external IOMMU users such as VFIO. The pool locks protect the DMA page
allocator rather than the entries themselves, and since the host kernel
does not control which pages are in use, there is no point in taking the
pool locks: exchange()+put_page(oldtce) is sufficient to avoid possible
races.

This adds an exchange() callback to iommu_table_ops which does the same
thing as set() but also returns the replaced TCE and the DMA direction so
the caller can release the pages afterwards. Unlike set(), which receives
a linear mapping address, exchange() receives a physical address; and it
returns a physical address, as clear() does.

This implements exchange() for P5IOC2/IODA/IODA2 and adds a requirement
for a platform to implement exchange() in order to support VFIO.

This replaces iommu_tce_build() and iommu_clear_tce() with
a single iommu_tce_xchg().

This makes sure that the TCE permission bits are not set in a TCE passed
to the IOMMU API, as those are calculated by platform code from the DMA
direction.

This moves SetPageDirty() into the IOMMU code so that it works for both
the VFIO ioctl interface and in-kernel TCE acceleration (when the latter
becomes available later).

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
[aw: for the vfio related changes]
Acked-by: Alex Williamson alex.william...@redhat.com
---
Changes:
v9:
* changed exchange() to work with physical addresses as these addresses
are never accessed by the code and physical addresses are actual values
we put into the IOMMU table
---
 arch/powerpc/include/asm/iommu.h| 22 +--
 arch/powerpc/kernel/iommu.c | 57 +---
 arch/powerpc/platforms/powernv/pci-ioda.c   | 34 +
 arch/powerpc/platforms/powernv/pci-p5ioc2.c |  3 ++
 arch/powerpc/platforms/powernv/pci.c| 17 +
 arch/powerpc/platforms/powernv/pci.h|  2 +
 drivers/vfio/vfio_iommu_spapr_tce.c | 58 ++---
 7 files changed, 128 insertions(+), 65 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index e63419e..7e7ca0a 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -45,13 +45,29 @@ extern int iommu_is_off;
 extern int iommu_force_on;
 
 struct iommu_table_ops {
+   /*
+* When called with direction==DMA_NONE, it is equal to clear().
+* uaddr is a linear map address.
+*/
int (*set)(struct iommu_table *tbl,
long index, long npages,
unsigned long uaddr,
enum dma_data_direction direction,
struct dma_attrs *attrs);
+#ifdef CONFIG_IOMMU_API
+   /*
+* Exchanges existing TCE with new TCE plus direction bits;
+* returns old TCE and DMA direction mask.
+* @tce is a physical address.
+*/
+   int (*exchange)(struct iommu_table *tbl,
+   long index,
+   unsigned long *tce,
+   enum dma_data_direction *direction);
+#endif
void (*clear)(struct iommu_table *tbl,
long index, long npages);
+   /* get() returns a physical address */
unsigned long (*get)(struct iommu_table *tbl, long index);
void (*flush)(struct iommu_table *tbl);
 };
@@ -152,6 +168,8 @@ extern void iommu_register_group(struct iommu_table_group 
*table_group,
 extern int iommu_add_device(struct device *dev);
 extern void iommu_del_device(struct device *dev);
 extern int __init tce_iommu_bus_notifier_init(void);
+extern long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
+   unsigned long *tce, enum dma_data_direction *direction);
 #else
 static inline void iommu_register_group(struct iommu_table_group *table_group,
int pci_domain_number,
@@ -231,10 +249,6 @@ extern int iommu_tce_clear_param_check(struct iommu_table 
*tbl,
unsigned long npages);
 extern int iommu_tce_put_param_check(struct iommu_table *tbl,
unsigned long ioba, unsigned long tce);
-extern int iommu_tce_build(struct iommu_table *tbl, unsigned long entry,
-   unsigned long hwaddr, enum dma_data_direction direction);
-extern unsigned long iommu_clear_tce(struct iommu_table *tbl,
-   unsigned long entry);
 
 extern void iommu_flush_tce(struct iommu_table *tbl);
 extern int iommu_take_ownership(struct iommu_table *tbl);
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index ea2c8ba..2eaba0c 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -975,9 +975,6 @@ EXPORT_SYMBOL_GPL(iommu_tce_clear_param_check);
 int