Re: [PATCH kernel v11 17/34] powerpc/spapr: vfio: Switch from iommu_table to new iommu_table_group

2015-06-04 Thread Alexey Kardashevskiy

On 06/01/2015 04:24 PM, David Gibson wrote:
> On Fri, May 29, 2015 at 06:44:41PM +1000, Alexey Kardashevskiy wrote:
>> Modern IBM POWERPC systems support multiple (currently two) TCE tables
>> per IOMMU group (a.k.a. PE). This adds a iommu_table_group container
>> for TCE tables. Right now just one table is supported.
>>
>> For IODA, instead of embedding iommu_table, the new iommu_table_group
>> keeps pointers to those. The iommu_table structs are allocated
>> dynamically now by a pnv_pci_table_alloc() helper as PCI hotplug
>> code (for EEH recovery) and SRIOV are supported there.
>>
>> For P5IOC2, both iommu_table_group and iommu_table are embedded into
>> PE struct. As there is no EEH and SRIOV support for P5IOC2,
>> iommu_free_table() should not be called on iommu_table struct pointers
>> so we can keep it embedded in pnv_phb::p5ioc2.
>>
>> For pSeries, this replaces multiple calls of kzalloc_node() with a new
>> iommu_pseries_group_alloc() helper and stores the table group struct
>> pointer into the pci_dn struct. For release, a iommu_table_group_free()
>> helper is added.
>>
>> This moves iommu_table struct allocation from SR-IOV code to
>> the generic DMA initialization code in pnv_pci_ioda2_setup_dma_pe.
>>
>> This replaces a single pointer to iommu_group with a list of
>> iommu_table_group structs. For now it is just a single iommu_table_group
>> in this list but later with TCE table sharing enabled, the list will
>> keep all the IOMMU groups which use the particular table. The list
>> uses iommu_table_group_link structs rather than iommu_table_group::next
>> as a VFIO container may have 2 IOMMU tables, each will have its own list
>> head pointer as it is mainly for TCE invalidation code which should
>> walk through all attached groups and invalidate TCE cache so
>> the table has to keep the list head pointer. The other option would
>> be storing list head in a VFIO container but it would not work as
>> the platform code (which does TCE table update and invalidation) has
>> no idea about VFIO.
>>
>> This should cause no behavioural change.
>>
>> Signed-off-by: Alexey Kardashevskiy
>> [aw: for the vfio related changes]
>> Acked-by: Alex Williamson
>> Reviewed-by: David Gibson
>> Reviewed-by: Gavin Shan
>
> It looks like this commit message doesn't match the code - it seems
> like an older or newer version of the message from the previous patch.
>
> This patch seems instead to be about changing the table_group <-> table
> relationship from 1:1 to many:many.



I'll put this:

===
So far one TCE table could only be used by one IOMMU group. However,
IODA2 hardware allows programming the same TCE table address to
multiple PEs, which allows tables to be shared.

This replaces the single pointer to a group in the iommu_table struct
with a linked list of groups, which provides a way of invalidating
the TCE cache for every PE when the actual TCE table is updated. This adds
pnv_pci_link_table_and_group() and pnv_pci_unlink_table_and_group()
helpers to manage the list. However, without VFIO there is still going
to be a single IOMMU group per iommu_table.

This changes iommu_add_device() to add a device to the first group
in a table's group list, as it is only called from the platform
init code or the PCI bus notifier, and at those moments there is only
one group per table.

This does not change the TCE invalidation code to loop through all
attached groups, both to keep this patch simple and because
it is not really needed in most cases. IODA2 is fixed in a later
patch.

===
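
For illustration only, the idea in the message above (a per-table list of attached groups that an update can walk to invalidate every PE, with iommu_add_device() using the first entry) might be sketched in plain user-space C as below. This is not the kernel code from the patch: every demo_* name is hypothetical, a hand-rolled singly linked list stands in for the kernel's list_head, and "invalidation" is only a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_TABLE_ENTRIES 8

/* Hypothetical stand-in for a PE / IOMMU group; it only counts invalidations. */
struct demo_group {
	int pe_number;
	unsigned int invalidations;
};

/* One node per (table, group) attachment, owned by the table's list. */
struct demo_link {
	struct demo_link *next;
	struct demo_group *group;
};

/* Hypothetical stand-in for a TCE table that keeps the list head itself. */
struct demo_table {
	struct demo_link *groups;
	unsigned long entries[DEMO_TABLE_ENTRIES];
};

/* Attach a group to a table; returns 0 on success, -1 if allocation fails. */
int demo_link_table_and_group(struct demo_table *tbl, struct demo_group *grp)
{
	struct demo_link *l = malloc(sizeof(*l));

	if (!l)
		return -1;
	l->group = grp;
	l->next = tbl->groups;
	tbl->groups = l;
	return 0;
}

/* Detach a group from a table; does nothing if the group is not attached. */
void demo_unlink_table_and_group(struct demo_table *tbl, struct demo_group *grp)
{
	struct demo_link **pp = &tbl->groups;

	while (*pp) {
		struct demo_link *l = *pp;

		if (l->group == grp) {
			*pp = l->next;
			free(l);
			return;
		}
		pp = &l->next;
	}
}

/* Update one entry, then "invalidate" the cache of every attached group. */
void demo_tce_update(struct demo_table *tbl, size_t idx, unsigned long tce)
{
	assert(idx < DEMO_TABLE_ENTRIES);
	tbl->entries[idx] = tce;
	for (struct demo_link *l = tbl->groups; l; l = l->next)
		l->group->invalidations++;
}

/* First attached group, or NULL when the list is empty. */
struct demo_group *demo_first_group(const struct demo_table *tbl)
{
	return tbl->groups ? tbl->groups->group : NULL;
}
```

With two groups linked to one table, a single demo_tce_update() call bumps both counters, and after one group is unlinked only the remaining group is touched. The empty-list case of demo_first_group() plays the role of the list_empty() check in the diff.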



>> ---
>> Changes:
>> v10:
>> * iommu_table is not embedded into iommu_table_group but allocated
>> dynamically
>> * iommu_table allocation is moved to a single place for IODA2's
>> pnv_pci_ioda_setup_dma_pe where it belongs to
>> * added list of groups into iommu_table; most of the code just looks at
>> the first item to keep the patch simpler
>>
>> v9:
>> * s/it_group/it_table_group/
>> * added and used iommu_table_group_free(), from now iommu_free_table()
>> is only used for VIO
>> * added iommu_pseries_group_alloc()
>> * squashed "powerpc/iommu: Introduce iommu_table_alloc() helper" into this
>> ---
>>  arch/powerpc/include/asm/iommu.h            |   8 +-
>>  arch/powerpc/kernel/iommu.c                 |   9 +-
>>  arch/powerpc/platforms/powernv/pci-ioda.c   |  45 ++
>>  arch/powerpc/platforms/powernv/pci-p5ioc2.c |   3 +
>>  arch/powerpc/platforms/powernv/pci.c        |  76 +
>>  arch/powerpc/platforms/powernv/pci.h        |   7 ++
>>  arch/powerpc/platforms/pseries/iommu.c      |  33 +++-
>>  drivers/vfio/vfio_iommu_spapr_tce.c         | 122
>>  8 files changed, 242 insertions(+), 61 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
>> index 5a7267f..44a20cc 100644
>> --- a/arch/powerpc/include/asm/iommu.h
>> +++ b/arch/powerpc/include/asm/iommu.h
>> @@ -91,7 +91,7 @@ struct iommu_table {
>>  	struct iommu_pool pools[IOMMU_NR_POOLS];
>>  	unsigned long *it_map;   /* A simple allocation bitmap for now */
>>  	unsigned long  it_page_shift;/* table iommu page size */
>> -	struct iommu_table_group *it_table_group;
>> +	struct list_head

Re: [PATCH kernel v11 17/34] powerpc/spapr: vfio: Switch from iommu_table to new iommu_table_group

2015-06-01 Thread David Gibson
On Fri, May 29, 2015 at 06:44:41PM +1000, Alexey Kardashevskiy wrote:
> Modern IBM POWERPC systems support multiple (currently two) TCE tables
> per IOMMU group (a.k.a. PE). This adds a iommu_table_group container
> for TCE tables. Right now just one table is supported.
> 
> For IODA, instead of embedding iommu_table, the new iommu_table_group
> keeps pointers to those. The iommu_table structs are allocated
> dynamically now by a pnv_pci_table_alloc() helper as PCI hotplug
> code (for EEH recovery) and SRIOV are supported there.
> 
> For P5IOC2, both iommu_table_group and iommu_table are embedded into
> PE struct. As there is no EEH and SRIOV support for P5IOC2,
> iommu_free_table() should not be called on iommu_table struct pointers
> so we can keep it embedded in pnv_phb::p5ioc2.
> 
> For pSeries, this replaces multiple calls of kzalloc_node() with a new
> iommu_pseries_group_alloc() helper and stores the table group struct
> pointer into the pci_dn struct. For release, a iommu_table_group_free()
> helper is added.
> 
> This moves iommu_table struct allocation from SR-IOV code to
> the generic DMA initialization code in pnv_pci_ioda2_setup_dma_pe.
> 
> This replaces a single pointer to iommu_group with a list of
> iommu_table_group structs. For now it is just a single iommu_table_group
> in this list but later with TCE table sharing enabled, the list will
> keep all the IOMMU groups which use the particular table. The list
> uses iommu_table_group_link structs rather than iommu_table_group::next
> as a VFIO container may have 2 IOMMU tables, each will have its own list
> head pointer as it is mainly for TCE invalidation code which should
> walk through all attached groups and invalidate TCE cache so
> the table has to keep the list head pointer. The other option would
> be storing list head in a VFIO container but it would not work as
> the platform code (which does TCE table update and invalidation) has
> no idea about VFIO.
> 
> This should cause no behavioural change.
> 
> Signed-off-by: Alexey Kardashevskiy 
> [aw: for the vfio related changes]
> Acked-by: Alex Williamson 
> Reviewed-by: David Gibson 
> Reviewed-by: Gavin Shan 

It looks like this commit message doesn't match the code - it seems
like an older or newer version of the message from the previous patch.

This patch seems instead to be about changing the table_group <-> table
relationship from 1:1 to many:many.
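
As a rough illustration of the many:many point, the sketch below shows why a separate link node is needed rather than a next pointer embedded in the group: one group can then sit on the lists of several tables while one table lists several groups. It is a user-space approximation, not the patch itself. The demo_* names are hypothetical, a plain singly linked list replaces list_head, and two tables per group is assumed (the patch as posted still defines IOMMU_TABLE_GROUP_MAX_TABLES as 1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_TABLES 2	/* assumption: two tables per group */

struct demo_table;

/* Hypothetical table group: knows which tables it uses. */
struct demo_table_group {
	int id;
	struct demo_table *tables[DEMO_MAX_TABLES];
};

/* Separate node, so the same group can appear on several tables' lists. */
struct demo_group_link {
	struct demo_group_link *next;
	struct demo_table_group *table_group;
};

/* Hypothetical table: keeps the head of the list of groups using it. */
struct demo_table {
	struct demo_group_link *group_list;
};

/* Record that grp uses tbl in the given slot; the caller provides the link. */
void demo_attach(struct demo_table *tbl, int slot,
		 struct demo_table_group *grp, struct demo_group_link *link)
{
	assert(slot >= 0 && slot < DEMO_MAX_TABLES);
	link->table_group = grp;
	link->next = tbl->group_list;
	tbl->group_list = link;
	grp->tables[slot] = tbl;
}

/* True if grp is on tbl's list. */
bool demo_table_has_group(const struct demo_table *tbl,
			  const struct demo_table_group *grp)
{
	for (const struct demo_group_link *l = tbl->group_list; l; l = l->next)
		if (l->table_group == grp)
			return true;
	return false;
}

/* Number of groups currently attached to tbl. */
size_t demo_count_groups(const struct demo_table *tbl)
{
	size_t n = 0;

	for (const struct demo_group_link *l = tbl->group_list; l; l = l->next)
		n++;
	return n;
}
```

If group g0 is attached to tables t0 and t1 while g1 is attached to t0 only, t0 lists two groups and t1 lists one. A single next pointer inside the group could not put g0 on both lists at once.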

> ---
> Changes:
> v10:
> * iommu_table is not embedded into iommu_table_group but allocated
> dynamically
> * iommu_table allocation is moved to a single place for IODA2's
> pnv_pci_ioda_setup_dma_pe where it belongs to
> * added list of groups into iommu_table; most of the code just looks at
> the first item to keep the patch simpler
> 
> v9:
> * s/it_group/it_table_group/
> * added and used iommu_table_group_free(), from now iommu_free_table()
> is only used for VIO
> * added iommu_pseries_group_alloc()
> * squashed "powerpc/iommu: Introduce iommu_table_alloc() helper" into this
> ---
>  arch/powerpc/include/asm/iommu.h            |   8 +-
>  arch/powerpc/kernel/iommu.c                 |   9 +-
>  arch/powerpc/platforms/powernv/pci-ioda.c   |  45 ++
>  arch/powerpc/platforms/powernv/pci-p5ioc2.c |   3 +
>  arch/powerpc/platforms/powernv/pci.c        |  76 +
>  arch/powerpc/platforms/powernv/pci.h        |   7 ++
>  arch/powerpc/platforms/pseries/iommu.c      |  33 +++-
>  drivers/vfio/vfio_iommu_spapr_tce.c         | 122
>  8 files changed, 242 insertions(+), 61 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> index 5a7267f..44a20cc 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -91,7 +91,7 @@ struct iommu_table {
>   struct iommu_pool pools[IOMMU_NR_POOLS];
>   unsigned long *it_map;   /* A simple allocation bitmap for now */
>   unsigned long  it_page_shift;/* table iommu page size */
> - struct iommu_table_group *it_table_group;
> + struct list_head it_group_list;/* List of iommu_table_group_link */
>   struct iommu_table_ops *it_ops;
>   void (*set_bypass)(struct iommu_table *tbl, bool enable);
>  };
> @@ -126,6 +126,12 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
>   int nid);
>  #define IOMMU_TABLE_GROUP_MAX_TABLES 1
>  
> +struct iommu_table_group_link {
> + struct list_head next;
> + struct rcu_head rcu;
> + struct iommu_table_group *table_group;
> +};
> +
>  struct iommu_table_group {
>   struct iommu_group *group;
>   struct iommu_table *tables[IOMMU_TABLE_GROUP_MAX_TABLES];
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index 719f048..e305a8f 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -1078,6 +1078,7 @@ EXPORT_SYMBOL_GPL(iommu_release_ownership);
>  int 

[PATCH kernel v11 17/34] powerpc/spapr: vfio: Switch from iommu_table to new iommu_table_group

2015-05-29 Thread Alexey Kardashevskiy
Modern IBM POWERPC systems support multiple (currently two) TCE tables
per IOMMU group (a.k.a. PE). This adds a iommu_table_group container
for TCE tables. Right now just one table is supported.

For IODA, instead of embedding iommu_table, the new iommu_table_group
keeps pointers to those. The iommu_table structs are allocated
dynamically now by a pnv_pci_table_alloc() helper as PCI hotplug
code (for EEH recovery) and SRIOV are supported there.

For P5IOC2, both iommu_table_group and iommu_table are embedded into
PE struct. As there is no EEH and SRIOV support for P5IOC2,
iommu_free_table() should not be called on iommu_table struct pointers
so we can keep it embedded in pnv_phb::p5ioc2.

For pSeries, this replaces multiple calls of kzalloc_node() with a new
iommu_pseries_group_alloc() helper and stores the table group struct
pointer into the pci_dn struct. For release, a iommu_table_group_free()
helper is added.

This moves iommu_table struct allocation from SR-IOV code to
the generic DMA initialization code in pnv_pci_ioda2_setup_dma_pe.

This replaces a single pointer to iommu_group with a list of
iommu_table_group structs. For now it is just a single iommu_table_group
in this list but later with TCE table sharing enabled, the list will
keep all the IOMMU groups which use the particular table. The list
uses iommu_table_group_link structs rather than iommu_table_group::next
as a VFIO container may have 2 IOMMU tables, each will have its own list
head pointer as it is mainly for TCE invalidation code which should
walk through all attached groups and invalidate TCE cache so
the table has to keep the list head pointer. The other option would
be storing list head in a VFIO container but it would not work as
the platform code (which does TCE table update and invalidation) has
no idea about VFIO.

This should cause no behavioural change.

Signed-off-by: Alexey Kardashevskiy 
[aw: for the vfio related changes]
Acked-by: Alex Williamson 
Reviewed-by: David Gibson 
Reviewed-by: Gavin Shan 
---
Changes:
v10:
* iommu_table is not embedded into iommu_table_group but allocated
dynamically
* iommu_table allocation is moved to a single place for IODA2's
pnv_pci_ioda_setup_dma_pe where it belongs to
* added list of groups into iommu_table; most of the code just looks at
the first item to keep the patch simpler

v9:
* s/it_group/it_table_group/
* added and used iommu_table_group_free(), from now iommu_free_table()
is only used for VIO
* added iommu_pseries_group_alloc()
* squashed "powerpc/iommu: Introduce iommu_table_alloc() helper" into this
---
 arch/powerpc/include/asm/iommu.h            |   8 +-
 arch/powerpc/kernel/iommu.c                 |   9 +-
 arch/powerpc/platforms/powernv/pci-ioda.c   |  45 ++
 arch/powerpc/platforms/powernv/pci-p5ioc2.c |   3 +
 arch/powerpc/platforms/powernv/pci.c        |  76 +
 arch/powerpc/platforms/powernv/pci.h        |   7 ++
 arch/powerpc/platforms/pseries/iommu.c      |  33 +++-
 drivers/vfio/vfio_iommu_spapr_tce.c         | 122
 8 files changed, 242 insertions(+), 61 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 5a7267f..44a20cc 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -91,7 +91,7 @@ struct iommu_table {
 	struct iommu_pool pools[IOMMU_NR_POOLS];
 	unsigned long *it_map;   /* A simple allocation bitmap for now */
 	unsigned long  it_page_shift;/* table iommu page size */
-	struct iommu_table_group *it_table_group;
+	struct list_head it_group_list;/* List of iommu_table_group_link */
 	struct iommu_table_ops *it_ops;
 	void (*set_bypass)(struct iommu_table *tbl, bool enable);
 };
@@ -126,6 +126,12 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
 					    int nid);
 #define IOMMU_TABLE_GROUP_MAX_TABLES	1
 
+struct iommu_table_group_link {
+	struct list_head next;
+	struct rcu_head rcu;
+	struct iommu_table_group *table_group;
+};
+
 struct iommu_table_group {
 	struct iommu_group *group;
 	struct iommu_table *tables[IOMMU_TABLE_GROUP_MAX_TABLES];
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 719f048..e305a8f 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1078,6 +1078,7 @@ EXPORT_SYMBOL_GPL(iommu_release_ownership);
 int iommu_add_device(struct device *dev)
 {
 	struct iommu_table *tbl;
+	struct iommu_table_group_link *tgl;
 
 	/*
 	 * The sysfs entries should be populated before
@@ -1095,15 +1096,17 @@ int iommu_add_device(struct device *dev)
 	}
 
 	tbl = get_iommu_table_base(dev);
-	if (!tbl || !tbl->it_table_group || !tbl->it_table_group->group) {
+	if (!tbl || list_empty(&tbl->it_group_list)) {
 		pr_debug("%s: Skipping device %s with no tbl\n",
 
