Re: [PATCH] iommu: Check that iommu_device_create has completed successfully in alloc_iommu

2015-12-28 Thread Wan ZongShun
2015-12-29 13:10 GMT+08:00 Nicholas Krause :
> This adds the proper check to alloc_iommu to make sure that the call
> to iommu_device_create has completed successfully. If it has not,
> free the resources previously allocated by alloc_iommu and return the
> error code to the caller.
>
> Signed-off-by: Nicholas Krause 
> ---
>  drivers/iommu/dmar.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index 80e3c17..27333b6 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -1069,9 +1069,12 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
> iommu->iommu_dev = iommu_device_create(NULL, iommu,
>intel_iommu_groups,
>"%s", iommu->name);
> +   if (IS_ERR(iommu->iommu_dev)) {
> +   err = PTR_ERR(iommu->iommu_dev);
> +   goto err_unmap;
> +   }

If iommu_device_create() fails here, will the bad 'iommu->iommu_dev' actually
break your IOMMU setup? The check seems unnecessary to me.

>
> return 0;
> -
>  err_unmap:
> unmap_iommu(iommu);
>  error_free_seq_id:
> --
> 2.5.0
>



-- 
---
Vincent Wan(Zongshun)
www.mcuos.com


[PATCH] iommu: Check that iommu_device_create has completed successfully in alloc_iommu

2015-12-28 Thread Nicholas Krause
This adds the proper check to alloc_iommu to make sure that the call
to iommu_device_create has completed successfully. If it has not,
free the resources previously allocated by alloc_iommu and return the
error code to the caller.

Signed-off-by: Nicholas Krause 
---
 drivers/iommu/dmar.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 80e3c17..27333b6 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1069,9 +1069,12 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
iommu->iommu_dev = iommu_device_create(NULL, iommu,
   intel_iommu_groups,
   "%s", iommu->name);
+   if (IS_ERR(iommu->iommu_dev)) {
+   err = PTR_ERR(iommu->iommu_dev);
+   goto err_unmap;
+   }
 
return 0;
-
 err_unmap:
unmap_iommu(iommu);
 error_free_seq_id:
-- 
2.5.0



Re: [PATCH 06/06] iommu/ipmmu-vmsa: Drop LPAE Kconfig dependency

2015-12-28 Thread Laurent Pinchart
Hi Magnus,

Thank you for the patch.

On Tuesday 15 December 2015 21:03:08 Magnus Damm wrote:
> From: Magnus Damm 
> 
> Neither the ARM page table code enabled by IOMMU_IO_PGTABLE_LPAE
> nor the IPMMU_VMSA driver actually depends on ARM_LPAE, so get
> rid of the dependency.
> 
> Tested with ipmmu-vmsa on r8a7794 ALT and a kernel config using:
>  # CONFIG_ARM_LPAE is not set
> 
> Signed-off-by: Magnus Damm 

Acked-by: Laurent Pinchart 

> ---
> 
>  This time the result also compiles on x86. It needs to be
>  applied as the last patch in the following series:
>  [PATCH 00/06] iommu/ipmmu-vmsa: IPMMU multi-arch update
> 
>  drivers/iommu/Kconfig |1 -
>  1 file changed, 1 deletion(-)
> 
> --- 0001/drivers/iommu/Kconfig
> +++ work/drivers/iommu/Kconfig2015-10-18 14:58:09.080513000 +0900
> @@ -324,7 +324,6 @@ config SHMOBILE_IOMMU_L1SIZE
> 
>  config IPMMU_VMSA
>   bool "Renesas VMSA-compatible IPMMU"
> - depends on ARM_LPAE
>   depends on ARCH_SHMOBILE || COMPILE_TEST
>   select IOMMU_API
>   select IOMMU_IO_PGTABLE_LPAE

-- 
Regards,

Laurent Pinchart



Re: [PATCH 05/06] iommu/ipmmu-vmsa: Break out 32-bit ARM mapping code

2015-12-28 Thread Laurent Pinchart
Hi Magnus,

Thank you for the patch.

On Tuesday 15 December 2015 21:02:58 Magnus Damm wrote:
> From: Magnus Damm 
> 
> Make the driver compile on more than just 32-bit ARM
> by breaking out and wrapping ARM-specific functions
> in #ifdefs. This is needed to use the driver on
> other architectures.

The only other architecture where we need to use the driver is ARM64, and I'm 
not sure this is the right fix to correctly support ARM and ARM64. I believe 
we should instead improve the ARM implementation to get rid of 
arm_iommu_create_mapping() and arm_iommu_attach_device() the same way that 
ARM64 did.

> Signed-off-by: Magnus Damm 
> ---
> 
> drivers/iommu/ipmmu-vmsa.c |   94 ++---
> 1 file changed, 62 insertions(+), 32 deletions(-)
> 
> --- 0007/drivers/iommu/ipmmu-vmsa.c
> +++ work/drivers/iommu/ipmmu-vmsa.c   2015-12-15 13:20:26.580513000 +0900
> @@ -22,8 +22,10 @@
>  #include 
>  #include 
> 
> +#ifdef CONFIG_ARM
>  #include 
>  #include 
> +#endif
> 
>  #include "io-pgtable.h"
> 
> @@ -38,7 +40,9 @@ struct ipmmu_vmsa_device {
>   DECLARE_BITMAP(ctx, IPMMU_CTX_MAX);
>   struct ipmmu_vmsa_domain *domains[IPMMU_CTX_MAX];
> 
> +#ifdef CONFIG_ARM
>   struct dma_iommu_mapping *mapping;
> +#endif
>  };
> 
>  struct ipmmu_vmsa_domain {
> @@ -621,6 +625,60 @@ static int ipmmu_find_utlbs(struct ipmmu
>   return 0;
>  }
> 
> +#ifdef CONFIG_ARM
> +static int ipmmu_map_attach(struct device *dev, struct ipmmu_vmsa_device
> *mmu)
> +{
> + int ret;
> +
> + /*
> +  * Create the ARM mapping, used by the ARM DMA mapping core to allocate
> +  * VAs. This will allocate a corresponding IOMMU domain.
> +  *
> +  * TODO:
> +  * - Create one mapping per context (TLB).
> +  * - Make the mapping size configurable ? We currently use a 2GB mapping
> +  *   at a 1GB offset to ensure that NULL VAs will fault.
> +  */
> + if (!mmu->mapping) {
> + struct dma_iommu_mapping *mapping;
> +
> + mapping = arm_iommu_create_mapping(&platform_bus_type,
> +SZ_1G, SZ_2G);
> + if (IS_ERR(mapping)) {
> + dev_err(mmu->dev, "failed to create ARM IOMMU 
> mapping\n");
> + return PTR_ERR(mapping);
> + }
> +
> + mmu->mapping = mapping;
> + }
> +
> + /* Attach the ARM VA mapping to the device. */
> + ret = arm_iommu_attach_device(dev, mmu->mapping);
> + if (ret < 0) {
> + dev_err(dev, "Failed to attach device to VA mapping\n");
> + arm_iommu_release_mapping(mmu->mapping);
> + }
> +
> + return ret;
> +}
> +static inline void ipmmu_detach(struct device *dev)
> +{
> + arm_iommu_detach_device(dev);
> +}
> +static inline void ipmmu_release_mapping(struct ipmmu_vmsa_device *mmu)
> +{
> + arm_iommu_release_mapping(mmu->mapping);
> +}
> +#else
> +static inline int ipmmu_map_attach(struct device *dev,
> +struct ipmmu_vmsa_device *mmu)
> +{
> + return 0;
> +}
> +static inline void ipmmu_detach(struct device *dev) {}
> +static inline void ipmmu_release_mapping(struct ipmmu_vmsa_device *mmu) {}
> +#endif
> +
>  static int ipmmu_add_device(struct device *dev)
>  {
>   struct ipmmu_vmsa_dev_data *dev_data = get_dev_data(dev);
> @@ -701,41 +759,13 @@ static int ipmmu_add_device(struct devic
>   dev_data->num_utlbs = num_utlbs;
>   set_dev_data(dev, dev_data);
> 
> - /*
> -  * Create the ARM mapping, used by the ARM DMA mapping core to allocate
> -  * VAs. This will allocate a corresponding IOMMU domain.
> -  *
> -  * TODO:
> -  * - Create one mapping per context (TLB).
> -  * - Make the mapping size configurable ? We currently use a 2GB mapping
> -  *   at a 1GB offset to ensure that NULL VAs will fault.
> -  */
> - if (!mmu->mapping) {
> - struct dma_iommu_mapping *mapping;
> -
> - mapping = arm_iommu_create_mapping(&platform_bus_type,
> -SZ_1G, SZ_2G);
> - if (IS_ERR(mapping)) {
> - dev_err(mmu->dev, "failed to create ARM IOMMU 
> mapping\n");
> - ret = PTR_ERR(mapping);
> - goto error;
> - }
> -
> - mmu->mapping = mapping;
> - }
> -
> - /* Attach the ARM VA mapping to the device. */
> - ret = arm_iommu_attach_device(dev, mmu->mapping);
> - if (ret < 0) {
> - dev_err(dev, "Failed to attach device to VA mapping\n");
> + ret = ipmmu_map_attach(dev, mmu);
> + if (ret < 0)
>   goto error;
> - }
> 
>   return 0;
> 
>  error:
> - arm_iommu_release_mapping(mmu->mapping);
> -
>   kfree(dev_data);
>   kfree(utlbs);
> 
> @@ -751,7 +781,7 @@ static void ipmmu_remove_device(struct d
>  {
>   struct ipmmu_vmsa_dev_data *dev_data = get_dev_data(dev);
> 
> -   

Re: [PATCH 04/06] iommu/ipmmu-vmsa: Rework interrupt code and use bitmap for context

2015-12-28 Thread Laurent Pinchart
Hi Magnus,

Thank you for the patch.

On Tuesday 15 December 2015 21:02:49 Magnus Damm wrote:
> From: Magnus Damm 
> 
> Introduce a bitmap for context handling and convert the
> interrupt routine to handle all registered contexts.
> 
> At this point the number of contexts is still limited.

That's all nice, but without seeing support for multiple contexts it's hard to
tell whether the implementation is correct for the multiple-context case.

> The purpose of this patch is to remove the use of the
> ARM specific mapping variable from ipmmu_irq().

Why do you want to do that?

> Signed-off-by: Magnus Damm 
> ---
> 
>  drivers/iommu/ipmmu-vmsa.c |   37 ++---
>  1 file changed, 26 insertions(+), 11 deletions(-)
> 
> --- 0007/drivers/iommu/ipmmu-vmsa.c
> +++ work/drivers/iommu/ipmmu-vmsa.c   2015-12-15 13:14:35.540513000 +0900
> @@ -8,6 +8,7 @@
>   * the Free Software Foundation; version 2 of the License.
>   */
> 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -26,12 +27,16 @@
> 
>  #include "io-pgtable.h"
> 
> +#define IPMMU_CTX_MAX 1
> +
>  struct ipmmu_vmsa_device {
>   struct device *dev;
>   void __iomem *base;
>   struct list_head list;
> 
>   unsigned int num_utlbs;
> + DECLARE_BITMAP(ctx, IPMMU_CTX_MAX);

We have up to 4 contexts on Gen2 and 8 on Gen3; a bitmap might be slightly
overkill.

> + struct ipmmu_vmsa_domain *domains[IPMMU_CTX_MAX];
> 
>   struct dma_iommu_mapping *mapping;
>  };
> @@ -319,6 +324,7 @@ static struct iommu_gather_ops ipmmu_gat
>  static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
>  {
>   phys_addr_t ttbr;
> + int ret;
> 
>   /*
>* Allocate the page table operations.
> @@ -348,10 +354,16 @@ static int ipmmu_domain_init_context(str
>   return -EINVAL;
> 
>   /*
> -  * TODO: When adding support for multiple contexts, find an unused
> -  * context.
> +  * Find an unused context.

We need to support multiple devices per context or we will very soon run out
of contexts. How to pick a proper context is a topic that needs to be
researched; I believe IOMMU groups might come into play.

>*/
> - domain->context_id = 0;
> + ret = bitmap_find_free_region(domain->mmu->ctx, IPMMU_CTX_MAX, 0);
> + if (ret < 0) {
> + free_io_pgtable_ops(domain->iop);
> + return ret;
> + }
> +
> + domain->context_id = ret;
> + domain->mmu->domains[ret] = domain;

This requires locking to protect against races with the interrupt handler.
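
A minimal sketch of one way to close that race (the mmu->lock spinlock below
is hypothetical and not part of the posted patch; the interrupt handler would
have to take the same lock when it walks the contexts):

static int ipmmu_domain_allocate_context(struct ipmmu_vmsa_device *mmu,
                                         struct ipmmu_vmsa_domain *domain)
{
        unsigned long flags;
        int ret;

        spin_lock_irqsave(&mmu->lock, flags);

        /* Claim a free context slot and publish the domain atomically. */
        ret = find_first_zero_bit(mmu->ctx, IPMMU_CTX_MAX);
        if (ret != IPMMU_CTX_MAX) {
                mmu->domains[ret] = domain;
                set_bit(ret, mmu->ctx);
        } else {
                ret = -EBUSY;
        }

        spin_unlock_irqrestore(&mmu->lock, flags);

        return ret;
}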

> 
>   /* TTBR0 */
>   ttbr = domain->cfg.arm_lpae_s1_cfg.ttbr[0];
> @@ -395,6 +407,8 @@ static int ipmmu_domain_init_context(str
> 
>  static void ipmmu_domain_destroy_context(struct ipmmu_vmsa_domain *domain)
>  {
> + bitmap_release_region(domain->mmu->ctx, domain->context_id, 0);
> +
>   /*
>* Disable the context. Flush the TLB as required when modifying the
>* context registers.
> @@ -460,16 +474,16 @@ static irqreturn_t ipmmu_domain_irq(stru
>  static irqreturn_t ipmmu_irq(int irq, void *dev)
>  {
>   struct ipmmu_vmsa_device *mmu = dev;
> - struct iommu_domain *io_domain;
> - struct ipmmu_vmsa_domain *domain;
> -
> - if (!mmu->mapping)
> - return IRQ_NONE;
> + irqreturn_t status = IRQ_NONE;
> + unsigned int k;

i is a perfectly fine loop counter :-)

> - io_domain = mmu->mapping->domain;
> - domain = to_vmsa_domain(io_domain);
> + /* Check interrupts for all active contexts */
> + for (k = find_first_bit(mmu->ctx, IPMMU_CTX_MAX);
> +  k < IPMMU_CTX_MAX && status == IRQ_NONE;
> +  k = find_next_bit(mmu->ctx, IPMMU_CTX_MAX, k))

You can just loop over mmu->domains and skip NULL entries.

> + status = ipmmu_domain_irq(mmu->domains[k]);

Only the status of the last domain is taken into account.
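
A minimal sketch of the suggested shape (illustrative only), assuming unused
mmu->domains[] slots stay NULL and updates to the array are serialized
against this handler:

static irqreturn_t ipmmu_irq(int irq, void *dev)
{
        struct ipmmu_vmsa_device *mmu = dev;
        irqreturn_t status = IRQ_NONE;
        unsigned int i;

        /* Check every registered context; report IRQ_HANDLED if any of
         * them handled the interrupt. */
        for (i = 0; i < IPMMU_CTX_MAX; i++) {
                if (!mmu->domains[i])
                        continue;
                if (ipmmu_domain_irq(mmu->domains[i]) == IRQ_HANDLED)
                        status = IRQ_HANDLED;
        }

        return status;
}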

> - return ipmmu_domain_irq(domain);
> + return status;
>  }
> 
>  /* 
> @@ -788,6 +802,7 @@ static int ipmmu_probe(struct platform_d
> 
>   mmu->dev = &pdev->dev;
>   mmu->num_utlbs = 32;
> + bitmap_zero(mmu->ctx, IPMMU_CTX_MAX);
> 
>   /* Map I/O memory and request IRQ. */
>   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);

-- 
Regards,

Laurent Pinchart



Re: [PATCH 03/06] iommu/ipmmu-vmsa: Break out utlb control function

2015-12-28 Thread Laurent Pinchart
Hi Magnus,

On Tuesday 15 December 2015 16:48:42 Geert Uytterhoeven wrote:
> On Tue, Dec 15, 2015 at 1:02 PM, Magnus Damm  wrote:
> > --- 0004/drivers/iommu/ipmmu-vmsa.c
> > +++ work/drivers/iommu/ipmmu-vmsa.c 2015-12-15 13:17:40.580513000
> > +0900
> > @@ -279,9 +279,18 @@ static void ipmmu_utlb_enable(struct ipm
> >  static void ipmmu_utlb_disable(struct ipmmu_vmsa_domain *domain,
> >unsigned int utlb)
> >  {
> > -   struct ipmmu_vmsa_device *mmu = domain->mmu;
> > +   ipmmu_write(domain->mmu, IMUCTR(utlb), 0);
> > +}
> > +
> > +static void ipmmu_utlb_ctrl(struct ipmmu_vmsa_domain *domain,
> > +   void (*fn)(struct ipmmu_vmsa_domain *,
> > +  unsigned int utlb), struct device
> > *dev)
> > +{
> > +   struct ipmmu_vmsa_dev_data *dev_data = get_dev_data(dev);
> > +   unsigned int i;
> > 
> > -   ipmmu_write(mmu, IMUCTR(utlb), 0);
> > +   for (i = 0; i < dev_data->num_utlbs; ++i)
> > +   fn(domain, dev_data->utlbs[i]);
> > 
> >  }
> 
> Unless you have further plans with the "fn" parameter, I would simply pass
> a bool enable/disable flag instead of a function pointer.

I agree with Geert. What's your plan here?
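
A minimal sketch of the bool-flag variant being suggested (illustrative only,
reusing the helpers from this series):

static void ipmmu_utlb_ctrl(struct ipmmu_vmsa_domain *domain,
                            struct device *dev, bool enable)
{
        struct ipmmu_vmsa_dev_data *dev_data = get_dev_data(dev);
        unsigned int i;

        /* Enable or disable every micro-TLB assigned to this device. */
        for (i = 0; i < dev_data->num_utlbs; ++i) {
                if (enable)
                        ipmmu_utlb_enable(domain, dev_data->utlbs[i]);
                else
                        ipmmu_utlb_disable(domain, dev_data->utlbs[i]);
        }
}

Call sites then simply pass true on attach and false on detach.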

-- 
Regards,

Laurent Pinchart



Re: [PATCH 02/06] iommu/ipmmu-vmsa: Convert to dev_data

2015-12-28 Thread Laurent Pinchart
Hi Magnus,

Thank you for the patch.

On Tuesday 15 December 2015 21:02:30 Magnus Damm wrote:
> From: Magnus Damm 
> 
> Rename ipmmu_vmsa_archdata to ipmmu_vmsa_dev_data to avoid
> confusion when using the driver on multiple architectures.

I'm sorry, but this patch makes the driver more confusing to me. archdata is
a well-established concept for IOMMU on ARM; I don't see how hiding it would
help.

> The data now stored in ipmmu_vmsa_dev_data is used to point
> various on-chip devices to the actual IPMMU instances.
> 
> Signed-off-by: Magnus Damm 
> ---
> 
>  drivers/iommu/ipmmu-vmsa.c |   58 +
>  1 file changed, 36 insertions(+), 22 deletions(-)
> 
> --- 0003/drivers/iommu/ipmmu-vmsa.c
> +++ work/drivers/iommu/ipmmu-vmsa.c   2015-12-15 13:16:27.160513000 +0900
> @@ -47,7 +47,7 @@ struct ipmmu_vmsa_domain {
>   spinlock_t lock;/* Protects mappings */
>  };
> 
> -struct ipmmu_vmsa_archdata {
> +struct ipmmu_vmsa_dev_data {
>   struct ipmmu_vmsa_device *mmu;
>   unsigned int *utlbs;
>   unsigned int num_utlbs;
> @@ -182,6 +182,20 @@ static struct ipmmu_vmsa_domain *to_vmsa
>  #define IMUASID_ASID0_SHIFT  0
> 
>  /* 
> + * Consumer device side private data handling
> + */
> +
> +static struct ipmmu_vmsa_dev_data *get_dev_data(struct device *dev)
> +{
> + return dev->archdata.iommu;
> +}
> +
> +static void set_dev_data(struct device *dev, struct ipmmu_vmsa_dev_data
> *data)
> +{
> + dev->archdata.iommu = data;
> +}
> +
> +/* 
>   * Read/Write Access
>   */
> 
> @@ -485,8 +499,8 @@ static void ipmmu_domain_free(struct iom
>  static int ipmmu_attach_device(struct iommu_domain *io_domain,
>  struct device *dev)
>  {
> - struct ipmmu_vmsa_archdata *archdata = dev->archdata.iommu;
> - struct ipmmu_vmsa_device *mmu = archdata->mmu;
> + struct ipmmu_vmsa_dev_data *dev_data = get_dev_data(dev);
> + struct ipmmu_vmsa_device *mmu = dev_data->mmu;
>   struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
>   unsigned long flags;
>   unsigned int i;
> @@ -518,8 +532,8 @@ static int ipmmu_attach_device(struct io
>   if (ret < 0)
>   return ret;
> 
> - for (i = 0; i < archdata->num_utlbs; ++i)
> - ipmmu_utlb_enable(domain, archdata->utlbs[i]);
> + for (i = 0; i < dev_data->num_utlbs; ++i)
> + ipmmu_utlb_enable(domain, dev_data->utlbs[i]);
> 
>   return 0;
>  }
> @@ -527,12 +541,12 @@ static int ipmmu_attach_device(struct io
>  static void ipmmu_detach_device(struct iommu_domain *io_domain,
>   struct device *dev)
>  {
> - struct ipmmu_vmsa_archdata *archdata = dev->archdata.iommu;
> + struct ipmmu_vmsa_dev_data *dev_data = get_dev_data(dev);
>   struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
>   unsigned int i;
> 
> - for (i = 0; i < archdata->num_utlbs; ++i)
> - ipmmu_utlb_disable(domain, archdata->utlbs[i]);
> + for (i = 0; i < dev_data->num_utlbs; ++i)
> + ipmmu_utlb_disable(domain, dev_data->utlbs[i]);
> 
>   /*
>* TODO: Optimize by disabling the context when no device is attached.
> @@ -595,7 +609,7 @@ static int ipmmu_find_utlbs(struct ipmmu
> 
>  static int ipmmu_add_device(struct device *dev)
>  {
> - struct ipmmu_vmsa_archdata *archdata;
> + struct ipmmu_vmsa_dev_data *dev_data = get_dev_data(dev);
>   struct ipmmu_vmsa_device *mmu;
>   struct iommu_group *group = NULL;
>   unsigned int *utlbs;
> @@ -603,7 +617,7 @@ static int ipmmu_add_device(struct devic
>   int num_utlbs;
>   int ret = -ENODEV;
> 
> - if (dev->archdata.iommu) {
> + if (dev_data) {
>   dev_warn(dev, "IOMMU driver already assigned to device %s\n",
>dev_name(dev));
>   return -EINVAL;
> @@ -662,16 +676,16 @@ static int ipmmu_add_device(struct devic
>   goto error;
>   }
> 
> - archdata = kzalloc(sizeof(*archdata), GFP_KERNEL);
> - if (!archdata) {
> + dev_data = kzalloc(sizeof(*dev_data), GFP_KERNEL);
> + if (!dev_data) {
>   ret = -ENOMEM;
>   goto error;
>   }
> 
> - archdata->mmu = mmu;
> - archdata->utlbs = utlbs;
> - archdata->num_utlbs = num_utlbs;
> - dev->archdata.iommu = archdata;
> + dev_data->mmu = mmu;
> + dev_data->utlbs = utlbs;
> + dev_data->num_utlbs = num_utlbs;
> + set_dev_data(dev, dev_data);
> 
>   /*
>* Create the ARM mapping, used by the ARM DMA mapping core to allocate
> @@ -708,10 +722,10 @@ static int ipmmu_add_device(struct devic
>  error:
>   arm_iommu_release_mapping(mmu->mapping);
> 
> - kfree(dev->archdata.iommu);
> + kfree(dev_data);
>   kfree(utlbs);
> 
> - 

Re: [PATCH 01/06] iommu/ipmmu-vmsa: Remove platform data handling

2015-12-28 Thread Laurent Pinchart
Hi Magnus,

Thank you for the patch.

On Tuesday 15 December 2015 21:02:21 Magnus Damm wrote:
> From: Magnus Damm 
> 
> The IPMMU driver is using DT these days, and platform data is no longer
> used by the driver. Remove unused code.
> 
> Signed-off-by: Magnus Damm 

Reviewed-by: Laurent Pinchart 

> ---
> 
>  drivers/iommu/ipmmu-vmsa.c |5 -
>  1 file changed, 5 deletions(-)
> 
> --- 0001/drivers/iommu/ipmmu-vmsa.c
> +++ work/drivers/iommu/ipmmu-vmsa.c   2015-12-15 11:35:00.490513000 +0900
> @@ -766,11 +766,6 @@ static int ipmmu_probe(struct platform_d
>   int irq;
>   int ret;
> 
> - if (!IS_ENABLED(CONFIG_OF) && !pdev->dev.platform_data) {
> - dev_err(&pdev->dev, "missing platform data\n");
> - return -EINVAL;
> - }
> -
>   mmu = devm_kzalloc(&pdev->dev, sizeof(*mmu), GFP_KERNEL);
>   if (!mmu) {
>   dev_err(&pdev->dev, "cannot allocate device data\n");

-- 
Regards,

Laurent Pinchart



Re: Regression since 4.1

2015-12-28 Thread Joerg Roedel
Hi Tobias,

On Thu, Dec 10, 2015 at 12:07:57AM +0100, Tobias Geiger wrote:
> It all comes down to kernel > 4.1 && iommu=on = not working here
> 
> Hardware is:
> Intel DX58SO, Intel CPU i7 920... all pretty solid, running for years now, 
> always with iommu=on.
> 
> I'd love to debug this further, just don't know how...

Sounds like only bisecting the issue will help here. You can find an
introduction on bisecting here:

https://git-scm.com/book/en/v2/Git-Tools-Debugging-with-Git

Would be great if you can find the time to bisect the issue.



Joerg



[PATCH 6/7] iommu: change intel-iommu to use IOVA frame numbers

2015-12-28 Thread Adam Morrison
From: Omer Peleg 

Make intel-iommu map/unmap/invalidate work with IOVA pfns instead of
pointers to "struct iova". This avoids using the iova struct from the IOVA
red-black tree and the resulting explicit find_iova() on unmap.

This patch will allow us to cache IOVAs in the next patch, in order to
avoid rbtree operations for the majority of map/unmap operations.

Note: In eliminating the find_iova() operation, we have also eliminated
the sanity check previously done in the unmap flow. Arguably, this was
overhead that is better avoided in production code, but it could be
brought back as a debug option for driver development.

Signed-off-by: Omer Peleg 
[m...@cs.technion.ac.il: rebased and reworded the commit message]
Signed-off-by: Adam Morrison 
---
 drivers/iommu/intel-iommu.c | 73 ++---
 drivers/iommu/iova.c|  8 ++---
 include/linux/iova.h|  2 +-
 3 files changed, 40 insertions(+), 43 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 6c37bbc..c2de0e7 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -455,7 +455,7 @@ static LIST_HEAD(dmar_rmrr_units);
 static void flush_unmaps_timeout(unsigned long data);
 
 struct deferred_flush_entry {
-   struct iova *iova;
+   unsigned long iova_pfn;
unsigned long nrpages;
struct dmar_domain *domain;
struct page *freelist;
@@ -3264,11 +3264,11 @@ error:
 }
 
 /* This takes a number of _MM_ pages, not VTD pages */
-static struct iova *intel_alloc_iova(struct device *dev,
+static unsigned long intel_alloc_iova(struct device *dev,
 struct dmar_domain *domain,
 unsigned long nrpages, uint64_t dma_mask)
 {
-   struct iova *iova = NULL;
+   unsigned long iova_pfn;
 
/* Restrict dma_mask to the width that the iommu can handle */
dma_mask = min_t(uint64_t, DOMAIN_MAX_ADDR(domain->gaw), dma_mask);
@@ -3281,19 +3281,19 @@ static struct iova *intel_alloc_iova(struct device *dev,
 * DMA_BIT_MASK(32) and if that fails then try allocating
 * from higher range
 */
-   iova = alloc_iova(&domain->iovad, nrpages,
- IOVA_PFN(DMA_BIT_MASK(32)), 1);
-   if (iova)
-   return iova;
+   iova_pfn = alloc_iova(&domain->iovad, nrpages,
+ IOVA_PFN(DMA_BIT_MASK(32)), 1);
+   if (iova_pfn)
+   return iova_pfn;
}
-   iova = alloc_iova(&domain->iovad, nrpages, IOVA_PFN(dma_mask), 1);
-   if (unlikely(!iova)) {
+   iova_pfn = alloc_iova(&domain->iovad, nrpages, IOVA_PFN(dma_mask), 1);
+   if (unlikely(!iova_pfn)) {
pr_err("Allocating %ld-page iova for %s failed",
   nrpages, dev_name(dev));
-   return NULL;
+   return 0;
}
 
-   return iova;
+   return iova_pfn;
 }
 
 static struct dmar_domain *__get_valid_domain_for_dev(struct device *dev)
@@ -3391,7 +3391,7 @@ static dma_addr_t __intel_map_single(struct device *dev, 
phys_addr_t paddr,
 {
struct dmar_domain *domain;
phys_addr_t start_paddr;
-   struct iova *iova;
+   unsigned long iova_pfn;
int prot = 0;
int ret;
struct intel_iommu *iommu;
@@ -3409,8 +3409,8 @@ static dma_addr_t __intel_map_single(struct device *dev, 
phys_addr_t paddr,
iommu = domain_get_iommu(domain);
size = aligned_nrpages(paddr, size);
 
-   iova = intel_alloc_iova(dev, domain, dma_to_mm_pfn(size), dma_mask);
-   if (!iova)
+   iova_pfn = intel_alloc_iova(dev, domain, dma_to_mm_pfn(size), dma_mask);
+   if (!iova_pfn)
goto error;
 
/*
@@ -3428,7 +3428,7 @@ static dma_addr_t __intel_map_single(struct device *dev, 
phys_addr_t paddr,
 * might have two guest_addr mapping to the same host paddr, but this
 * is not a big problem
 */
-   ret = domain_pfn_mapping(domain, mm_to_dma_pfn(iova->pfn_lo),
+   ret = domain_pfn_mapping(domain, mm_to_dma_pfn(iova_pfn),
 mm_to_dma_pfn(paddr_pfn), size, prot);
if (ret)
goto error;
@@ -3436,18 +3436,18 @@ static dma_addr_t __intel_map_single(struct device 
*dev, phys_addr_t paddr,
/* it's a non-present to present mapping. Only flush if caching mode */
if (cap_caching_mode(iommu->cap))
iommu_flush_iotlb_psi(iommu, domain,
- mm_to_dma_pfn(iova->pfn_lo),
+ mm_to_dma_pfn(iova_pfn),
  size, 0, 1);
else
iommu_flush_write_buffer(iommu);
 
-   start_paddr = (phys_addr_t)iova->pfn_lo << PAGE_SHIFT;
+   start_paddr = (phys_addr_t)iova_pfn << PAGE_SHIFT;

[PATCH 5/7] iommu: avoid dev iotlb logic in intel-iommu for domains with no dev iotlbs

2015-12-28 Thread Adam Morrison
From: Omer Peleg 

This patch avoids taking the device_domain_lock in iommu_flush_dev_iotlb()
for domains with no dev iotlb devices.

Signed-off-by: Omer Peleg 
[m...@cs.technion.ac.il: rebased and reworded the commit message]
Signed-off-by: Adam Morrison 
---
 drivers/iommu/intel-iommu.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index c2a6f9e..fc23adc 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -391,6 +391,7 @@ struct dmar_domain {
 * to VT-d spec, section 9.3 */
 
struct list_head devices;   /* all devices' list */
+   bool has_iotlb_device;
struct iova_domain iovad;   /* iova's that belong to this domain */
 
struct dma_pte  *pgd;   /* virtual address */
@@ -1463,6 +1464,30 @@ iommu_support_dev_iotlb (struct dmar_domain *domain, 
struct intel_iommu *iommu,
return NULL;
 }
 
+static void domain_update_iotlb(struct dmar_domain *domain)
+{
+   struct device_domain_info *info;
+   bool has_iotlb_device = false;
+   unsigned long flags;
+
+   spin_lock_irqsave(&device_domain_lock, flags);
+   list_for_each_entry(info, &domain->devices, link) {
+   struct pci_dev *pdev;
+
+   if (!info->dev || !dev_is_pci(info->dev))
+   continue;
+
+   pdev = to_pci_dev(info->dev);
+   if (pdev->ats_enabled) {
+   has_iotlb_device = true;
+   break;
+   }
+   }
+
+   domain->has_iotlb_device = has_iotlb_device;
+   spin_unlock_irqrestore(&device_domain_lock, flags);
+}
+
 static void iommu_enable_dev_iotlb(struct device_domain_info *info)
 {
struct pci_dev *pdev;
@@ -1486,6 +1511,7 @@ static void iommu_enable_dev_iotlb(struct 
device_domain_info *info)
 #endif
if (info->ats_supported && !pci_enable_ats(pdev, VTD_PAGE_SHIFT)) {
info->ats_enabled = 1;
+   domain_update_iotlb(info->domain);
info->ats_qdep = pci_ats_queue_depth(pdev);
}
 }
@@ -1502,6 +1528,7 @@ static void iommu_disable_dev_iotlb(struct 
device_domain_info *info)
if (info->ats_enabled) {
pci_disable_ats(pdev);
info->ats_enabled = 0;
+   domain_update_iotlb(info->domain);
}
 #ifdef CONFIG_INTEL_IOMMU_SVM
if (info->pri_enabled) {
@@ -1522,6 +1549,9 @@ static void iommu_flush_dev_iotlb(struct dmar_domain 
*domain,
unsigned long flags;
struct device_domain_info *info;
 
+   if (!domain->has_iotlb_device)
+   return;
+
spin_lock_irqsave(&device_domain_lock, flags);
list_for_each_entry(info, &domain->devices, link) {
if (!info->ats_enabled)
@@ -1739,6 +1769,7 @@ static struct dmar_domain *alloc_domain(int flags)
memset(domain, 0, sizeof(*domain));
domain->nid = -1;
domain->flags = flags;
+   domain->has_iotlb_device = false;
INIT_LIST_HEAD(&domain->devices);
 
return domain;
-- 
1.9.1



[PATCH 7/7] iommu: introduce per-cpu caching to iova allocation

2015-12-28 Thread Adam Morrison
From: Omer Peleg 

This patch introduces global and per-CPU caches of IOVAs, so that
CPUs can usually allocate IOVAs without taking the rbtree spinlock
to do so.  The caching is based on magazines, as described in "Magazines
and Vmem: Extending the Slab Allocator to Many CPUs and Arbitrary
Resources" (currently available at
https://www.usenix.org/legacy/event/usenix01/bonwick.html)

Adding caching on top of the existing rbtree allocator maintains the
property that IOVAs are densely packed in the IO virtual address space,
which is important for keeping IOMMU page table usage low.
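
For readers unfamiliar with the magazine scheme, here is a minimal sketch of
the idea; names and sizes are illustrative only and do not reflect the actual
iova.c changes in this series:

#define IOVA_MAG_SIZE  128     /* illustrative per-CPU capacity */

struct iova_magazine {
        unsigned long size;
        unsigned long pfns[IOVA_MAG_SIZE];
};

/* Per-CPU fast path for alloc: pop a cached pfn, or return 0 (the series'
 * "no IOVA" value) so the caller falls back to the rbtree allocator. */
static unsigned long mag_alloc(struct iova_magazine *mag)
{
        return mag->size ? mag->pfns[--mag->size] : 0;
}

/* Per-CPU fast path for free: push the pfn, or report that the magazine is
 * full so the caller can swap it for an empty one from a global depot. */
static bool mag_free(struct iova_magazine *mag, unsigned long pfn)
{
        if (mag->size == IOVA_MAG_SIZE)
                return false;
        mag->pfns[mag->size++] = pfn;
        return true;
}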

Signed-off-by: Omer Peleg 
[m...@cs.technion.ac.il: rebased, cleaned up and reworded the commit message]
Signed-off-by: Adam Morrison 
---
 drivers/iommu/intel-iommu.c |  12 +-
 drivers/iommu/iova.c| 326 +---
 include/linux/iova.h|  21 ++-
 3 files changed, 302 insertions(+), 57 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index c2de0e7..7e9c2dd 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3282,11 +3282,11 @@ static unsigned long intel_alloc_iova(struct device 
*dev,
 * from higher range
 */
iova_pfn = alloc_iova(&domain->iovad, nrpages,
- IOVA_PFN(DMA_BIT_MASK(32)), 1);
+ IOVA_PFN(DMA_BIT_MASK(32)));
if (iova_pfn)
return iova_pfn;
}
-   iova_pfn = alloc_iova(&domain->iovad, nrpages, IOVA_PFN(dma_mask), 1);
+   iova_pfn = alloc_iova(&domain->iovad, nrpages, IOVA_PFN(dma_mask));
if (unlikely(!iova_pfn)) {
pr_err("Allocating %ld-page iova for %s failed",
   nrpages, dev_name(dev));
@@ -3447,7 +3447,7 @@ static dma_addr_t __intel_map_single(struct device *dev, 
phys_addr_t paddr,
 
 error:
if (iova_pfn)
-   free_iova(&domain->iovad, iova_pfn);
+   free_iova(&domain->iovad, iova_pfn, dma_to_mm_pfn(size));
pr_err("Device %s request: %zx@%llx dir %d --- failed\n",
dev_name(dev), size, (unsigned long long)paddr, dir);
return 0;
@@ -3502,7 +3502,7 @@ static void flush_unmaps(struct deferred_flush_data 
*flush_data)
iommu_flush_dev_iotlb(domain,
(uint64_t)iova_pfn << 
PAGE_SHIFT, mask);
}
-   free_iova(&domain->iovad, iova_pfn);
+   free_iova(&domain->iovad, iova_pfn, nrpages);
if (freelist)
dma_free_pagelist(freelist);
}
@@ -3593,7 +3593,7 @@ static void intel_unmap(struct device *dev, dma_addr_t 
dev_addr, size_t size)
iommu_flush_iotlb_psi(iommu, domain, start_pfn,
  nrpages, !freelist, 0);
/* free iova */
-   free_iova(&domain->iovad, iova_pfn);
+   free_iova(&domain->iovad, iova_pfn, dma_to_mm_pfn(nrpages));
dma_free_pagelist(freelist);
} else {
add_unmap(domain, iova_pfn, nrpages, freelist);
@@ -3751,7 +3751,7 @@ static int intel_map_sg(struct device *dev, struct 
scatterlist *sglist, int nele
if (unlikely(ret)) {
dma_pte_free_pagetable(domain, start_vpfn,
   start_vpfn + size - 1);
-   free_iova(&domain->iovad, iova_pfn);
+   free_iova(&domain->iovad, iova_pfn, dma_to_mm_pfn(size));
return 0;
}
 
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 9009ce6..38dd03a 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -20,11 +20,24 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+
+static bool iova_rcache_insert(struct iova_domain *iovad,
+  struct iova_rcache *rcache,
+  unsigned long iova_pfn);
+static unsigned long iova_rcache_get(struct iova_rcache *rcache);
+
+static void init_iova_rcache(struct iova_rcache *rcache);
+static void free_iova_rcache(struct iova_domain *iovad,
+struct iova_rcache *rcache);
 
 void
 init_iova_domain(struct iova_domain *iovad, unsigned long granule,
unsigned long start_pfn, unsigned long pfn_32bit)
 {
+   int i;
+
/*
 * IOVA granularity will normally be equal to the smallest
 * supported IOMMU page size; both *must* be capable of
@@ -38,6 +51,9 @@ init_iova_domain(struct iova_domain *iovad, unsigned long 
granule,
iovad->granule = granule;
iovad->start_pfn = start_pfn;
iovad->dma_32bit_pfn = pfn_32bit;
+
+   for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i)
+   init_iova_rcache(&iovad->rcaches[i]);
 }
 EXPORT_SYMBOL_GPL(init_iova_domain);
 
@@ -100,7 +116,7 @@ io

[PATCH 3/7] iommu: correct flush_unmaps pfn usage

2015-12-28 Thread Adam Morrison
From: Omer Peleg 

Change flush_unmaps() to correctly pass iommu_flush_iotlb_psi()
dma addresses.  (Intel mm and dma have the same size for pages
at the moment, but this usage improves consistency.)

Signed-off-by: Omer Peleg 
[m...@cs.technion.ac.il: rebased and reworded the commit message]
Signed-off-by: Adam Morrison 
---
 drivers/iommu/intel-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 5b3530c..6d6229e 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3463,7 +3463,8 @@ static void flush_unmaps(struct deferred_flush_data 
*flush_data)
/* On real hardware multiple invalidations are 
expensive */
if (cap_caching_mode(iommu->cap))
iommu_flush_iotlb_psi(iommu, domain,
-   iova->pfn_lo, iova_size(iova),
+   mm_to_dma_pfn(iova->pfn_lo),
+   mm_to_dma_pfn(iova_size(iova)),
!freelist, 0);
else {
mask = ilog2(mm_to_dma_pfn(iova_size(iova)));
-- 
1.9.1



[PATCH 4/7] iommu: only unmap mapped entries

2015-12-28 Thread Adam Morrison
From: Omer Peleg 

Current unmap implementation unmaps the entire area covered by the IOVA range,
which is a power-of-2 aligned region. The corresponding map, however, only maps
those pages originally mapped by the user. This discrepancy can lead to
unmapping of already unmapped entries, which is unneeded work.

With this patch, only mapped pages are unmapped. This is also a baseline for a
map/unmap implementation based on IOVAs and not iova structures, which will
allow caching.

Signed-off-by: Omer Peleg 
[m...@cs.technion.ac.il: rebased and reworded the commit message]
Signed-off-by: Adam Morrison 
---
 drivers/iommu/intel-iommu.c | 36 +---
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 6d6229e..e4d023b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -455,6 +455,7 @@ static void flush_unmaps_timeout(unsigned long data);
 
 struct deferred_flush_entry {
struct iova *iova;
+   unsigned long nrpages;
struct dmar_domain *domain;
struct page *freelist;
 };
@@ -3457,6 +3458,7 @@ static void flush_unmaps(struct deferred_flush_data 
*flush_data)
struct deferred_flush_entry *entry =
&flush_table->entries[j];
struct iova *iova = entry->iova;
+   unsigned long nrpages = entry->nrpages;
struct dmar_domain *domain = entry->domain;
struct page *freelist = entry->freelist;
 
@@ -3464,10 +3466,9 @@ static void flush_unmaps(struct deferred_flush_data 
*flush_data)
if (cap_caching_mode(iommu->cap))
iommu_flush_iotlb_psi(iommu, domain,
mm_to_dma_pfn(iova->pfn_lo),
-   mm_to_dma_pfn(iova_size(iova)),
-   !freelist, 0);
+   nrpages, !freelist, 0);
else {
-   mask = ilog2(mm_to_dma_pfn(iova_size(iova)));
+   mask = ilog2(nrpages);
iommu_flush_dev_iotlb(domain,
(uint64_t)iova->pfn_lo << 
PAGE_SHIFT, mask);
}
@@ -3491,7 +3492,8 @@ static void flush_unmaps_timeout(unsigned long cpuid)
spin_unlock_irqrestore(&flush_data->lock, flags);
 }
 
-static void add_unmap(struct dmar_domain *dom, struct iova *iova, struct page 
*freelist)
+static void add_unmap(struct dmar_domain *dom, struct iova *iova,
+ unsigned long nrpages, struct page *freelist)
 {
unsigned long flags;
int entry_id, iommu_id;
@@ -3516,6 +3518,7 @@ static void add_unmap(struct dmar_domain *dom, struct 
iova *iova, struct page *f
entry = &flush_data->tables[iommu_id].entries[entry_id];
entry->domain = dom;
entry->iova = iova;
+   entry->nrpages = nrpages;
entry->freelist = freelist;
 
if (!flush_data->timer_on) {
@@ -3528,10 +3531,11 @@ static void add_unmap(struct dmar_domain *dom, struct 
iova *iova, struct page *f
put_cpu();
 }
 
-static void intel_unmap(struct device *dev, dma_addr_t dev_addr)
+static void intel_unmap(struct device *dev, dma_addr_t dev_addr, size_t size)
 {
struct dmar_domain *domain;
unsigned long start_pfn, last_pfn;
+   unsigned long nrpages;
struct iova *iova;
struct intel_iommu *iommu;
struct page *freelist;
@@ -3549,8 +3553,9 @@ static void intel_unmap(struct device *dev, dma_addr_t 
dev_addr)
  (unsigned long long)dev_addr))
return;
 
+   nrpages = aligned_nrpages(dev_addr, size);
start_pfn = mm_to_dma_pfn(iova->pfn_lo);
-   last_pfn = mm_to_dma_pfn(iova->pfn_hi + 1) - 1;
+   last_pfn = start_pfn + nrpages - 1;
 
pr_debug("Device %s unmapping: pfn %lx-%lx\n",
 dev_name(dev), start_pfn, last_pfn);
@@ -3559,12 +3564,12 @@ static void intel_unmap(struct device *dev, dma_addr_t 
dev_addr)
 
if (intel_iommu_strict) {
iommu_flush_iotlb_psi(iommu, domain, start_pfn,
- last_pfn - start_pfn + 1, !freelist, 0);
+ nrpages, !freelist, 0);
/* free iova */
__free_iova(&domain->iovad, iova);
dma_free_pagelist(freelist);
} else {
-   add_unmap(domain, iova, freelist);
+   add_unmap(domain, iova, nrpages, freelist);
/*
 * queue up the release of the unmap to save the 1/6th of the
 * cpu used up by the iotlb flush operation...
@@ -3576,7 +3581,7 @@ static void intel_unmap_page(struct device *dev, 
dma_addr_t dev_a

[PATCH 1/7] iommu: refactoring of deferred flush entries

2015-12-28 Thread Adam Morrison
From: Omer Peleg 

Currently, each deferred flush's information is striped across several
arrays in the flush tables. Instead, move all information about a specific
flush into a single entry in the table.

This patch does not introduce any functional change.

Signed-off-by: Omer Peleg 
[m...@cs.technion.ac.il: rebased and reworded the commit message]
Signed-off-by: Adam Morrison 
---
 drivers/iommu/intel-iommu.c | 48 +++--
 1 file changed, 29 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 16b243e..de7a9fc 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -455,15 +455,19 @@ static void flush_unmaps_timeout(unsigned long data);
 
 static DEFINE_TIMER(unmap_timer,  flush_unmaps_timeout, 0, 0);
 
+struct deferred_flush_entry {
+   struct iova *iova;
+   struct dmar_domain *domain;
+   struct page *freelist;
+};
+
 #define HIGH_WATER_MARK 250
-struct deferred_flush_tables {
+struct deferred_flush_table {
int next;
-   struct iova *iova[HIGH_WATER_MARK];
-   struct dmar_domain *domain[HIGH_WATER_MARK];
-   struct page *freelist[HIGH_WATER_MARK];
+   struct deferred_flush_entry entries[HIGH_WATER_MARK];
 };
 
-static struct deferred_flush_tables *deferred_flush;
+static struct deferred_flush_table *deferred_flush;
 
 /* bitmap for indexing intel_iommus */
 static int g_num_of_iommus;
@@ -3039,7 +3043,7 @@ static int __init init_dmars(void)
}
 
deferred_flush = kzalloc(g_num_of_iommus *
-   sizeof(struct deferred_flush_tables), GFP_KERNEL);
+   sizeof(struct deferred_flush_table), GFP_KERNEL);
if (!deferred_flush) {
ret = -ENOMEM;
goto free_g_iommus;
@@ -3435,22 +3439,25 @@ static void flush_unmaps(void)
 DMA_TLB_GLOBAL_FLUSH);
for (j = 0; j < deferred_flush[i].next; j++) {
unsigned long mask;
-   struct iova *iova = deferred_flush[i].iova[j];
-   struct dmar_domain *domain = 
deferred_flush[i].domain[j];
+   struct deferred_flush_entry *entry =
+   &deferred_flush[i].entries[j];
+   struct iova *iova = entry->iova;
+   struct dmar_domain *domain = entry->domain;
+   struct page *freelist = entry->freelist;
 
/* On real hardware multiple invalidations are 
expensive */
if (cap_caching_mode(iommu->cap))
iommu_flush_iotlb_psi(iommu, domain,
iova->pfn_lo, iova_size(iova),
-   !deferred_flush[i].freelist[j], 0);
+   !freelist, 0);
else {
mask = ilog2(mm_to_dma_pfn(iova_size(iova)));
-   
iommu_flush_dev_iotlb(deferred_flush[i].domain[j],
+   iommu_flush_dev_iotlb(domain,
(uint64_t)iova->pfn_lo << 
PAGE_SHIFT, mask);
}
-   __free_iova(&deferred_flush[i].domain[j]->iovad, iova);
-   if (deferred_flush[i].freelist[j])
-   
dma_free_pagelist(deferred_flush[i].freelist[j]);
+   __free_iova(&domain->iovad, iova);
+   if (freelist)
+   dma_free_pagelist(freelist);
}
deferred_flush[i].next = 0;
}
@@ -3470,8 +3477,9 @@ static void flush_unmaps_timeout(unsigned long data)
 static void add_unmap(struct dmar_domain *dom, struct iova *iova, struct page 
*freelist)
 {
unsigned long flags;
-   int next, iommu_id;
+   int entry_id, iommu_id;
struct intel_iommu *iommu;
+   struct deferred_flush_entry *entry;
 
spin_lock_irqsave(&async_umap_flush_lock, flags);
if (list_size == HIGH_WATER_MARK)
@@ -3480,11 +3488,13 @@ static void add_unmap(struct dmar_domain *dom, struct 
iova *iova, struct page *f
iommu = domain_get_iommu(dom);
iommu_id = iommu->seq_id;
 
-   next = deferred_flush[iommu_id].next;
-   deferred_flush[iommu_id].domain[next] = dom;
-   deferred_flush[iommu_id].iova[next] = iova;
-   deferred_flush[iommu_id].freelist[next] = freelist;
-   deferred_flush[iommu_id].next++;
+   entry_id = deferred_flush[iommu_id].next;
+   ++(deferred_flush[iommu_id].next);
+
+   entry = &deferred_flush[iommu_id].entries[entry_id];
+   entry->domain = dom;
+   entry->iova = iova;
+   entry->freelist = freelist;
 
if (!timer_on) {
mod_timer(&unmap_timer, jiffies + msecs_to_jiffies(10));
-- 
1.9.1


[PATCH 2/7] iommu: per-cpu deferred invalidation queues

2015-12-28 Thread Adam Morrison
From: Omer Peleg 

The IOMMU's IOTLB invalidation is a costly process.  When iommu mode
is not set to "strict", it is done asynchronously. Current code
amortizes the cost of invalidating IOTLB entries by batching all the
invalidations in the system and performing a single global invalidation
instead. The code queues pending invalidations in a global queue that
is accessed under the global "async_umap_flush_lock" spinlock, which
can result in significant spinlock contention.

This patch splits this deferred queue into multiple per-cpu deferred
queues, and thus gets rid of the "async_umap_flush_lock" and its
contention.

Signed-off-by: Omer Peleg 
[m...@cs.technion.ac.il: rebased, cleaned up and reworded the commit message]
Signed-off-by: Adam Morrison 
---
 drivers/iommu/intel-iommu.c | 105 +++-
 1 file changed, 64 insertions(+), 41 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index de7a9fc..7118ed3 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -453,8 +453,6 @@ static LIST_HEAD(dmar_rmrr_units);
 
 static void flush_unmaps_timeout(unsigned long data);
 
-static DEFINE_TIMER(unmap_timer,  flush_unmaps_timeout, 0, 0);
-
 struct deferred_flush_entry {
struct iova *iova;
struct dmar_domain *domain;
@@ -467,17 +465,19 @@ struct deferred_flush_table {
struct deferred_flush_entry entries[HIGH_WATER_MARK];
 };
 
-static struct deferred_flush_table *deferred_flush;
+struct deferred_flush_data {
+   spinlock_t lock;
+   int timer_on;
+   struct timer_list timer;
+   long size;
+   struct deferred_flush_table *tables;
+};
+
+DEFINE_PER_CPU(struct deferred_flush_data, deferred_flush);
 
 /* bitmap for indexing intel_iommus */
 static int g_num_of_iommus;
 
-static DEFINE_SPINLOCK(async_umap_flush_lock);
-static LIST_HEAD(unmaps_to_do);
-
-static int timer_on;
-static long list_size;
-
 static void domain_exit(struct dmar_domain *domain);
 static void domain_remove_dev_info(struct dmar_domain *domain);
 static void dmar_remove_one_dev_info(struct dmar_domain *domain,
@@ -1869,8 +1869,12 @@ static void domain_exit(struct dmar_domain *domain)
return;
 
/* Flush any lazy unmaps that may reference this domain */
-   if (!intel_iommu_strict)
-   flush_unmaps_timeout(0);
+   if (!intel_iommu_strict) {
+   int cpu;
+
+   for_each_possible_cpu(cpu)
+   flush_unmaps_timeout(cpu);
+   }
 
/* Remove associated devices and clear attached or cached domains */
rcu_read_lock();
@@ -3009,7 +3013,7 @@ static int __init init_dmars(void)
bool copied_tables = false;
struct device *dev;
struct intel_iommu *iommu;
-   int i, ret;
+   int i, ret, cpu;
 
/*
 * for each drhd
@@ -3042,11 +3046,20 @@ static int __init init_dmars(void)
goto error;
}
 
-   deferred_flush = kzalloc(g_num_of_iommus *
-   sizeof(struct deferred_flush_table), GFP_KERNEL);
-   if (!deferred_flush) {
-   ret = -ENOMEM;
-   goto free_g_iommus;
+   for_each_possible_cpu(cpu) {
+   struct deferred_flush_data *dfd = per_cpu_ptr(&deferred_flush,
+ cpu);
+
+   dfd->tables = kzalloc(g_num_of_iommus *
+ sizeof(struct deferred_flush_table),
+ GFP_KERNEL);
+   if (!dfd->tables) {
+   ret = -ENOMEM;
+   goto free_g_iommus;
+   }
+
+   spin_lock_init(&dfd->lock);
+   setup_timer(&dfd->timer, flush_unmaps_timeout, cpu);
}
 
for_each_active_iommu(iommu, drhd) {
@@ -3212,8 +3225,9 @@ free_iommu:
disable_dmar_iommu(iommu);
free_dmar_iommu(iommu);
}
-   kfree(deferred_flush);
 free_g_iommus:
+   for_each_possible_cpu(cpu)
+   kfree(per_cpu_ptr(&deferred_flush, cpu)->tables);
kfree(g_iommus);
 error:
return ret;
@@ -3418,29 +3432,31 @@ static dma_addr_t intel_map_page(struct device *dev, 
struct page *page,
  dir, *dev->dma_mask);
 }
 
-static void flush_unmaps(void)
+static void flush_unmaps(struct deferred_flush_data *flush_data)
 {
int i, j;
 
-   timer_on = 0;
+   flush_data->timer_on = 0;
 
/* just flush them all */
for (i = 0; i < g_num_of_iommus; i++) {
struct intel_iommu *iommu = g_iommus[i];
+   struct deferred_flush_table *flush_table =
+   &flush_data->tables[i];
if (!iommu)
continue;
 
-   if (!deferred_flush[i].next)
+   if (!flush_table->next)
continue;
 
   

[PATCH 0/7] Intel IOMMU scalability improvements

2015-12-28 Thread Adam Morrison
This patchset improves the scalability of the Intel IOMMU code by
resolving two spinlock bottlenecks, yielding up to ~10x performance
improvement and approaching iommu=off performance.

For example, here's the throughput obtained by 16 memcached instances
running on a 16-core Sandy Bridge system, accessed using memslap on
another machine that has iommu=off, using the default memslap config
(64-byte keys, 1024-byte values, and 10%/90% SET/GET ops):

stock iommu=off:
   1,088,996 memcached transactions/sec (=100%, median of 10 runs).
stock iommu=on:
   123,760 memcached transactions/sec (=11%).
   [perf: 43.56%  0.86%  memcached  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave]
patched iommu=on:
   1,067,586 memcached transactions/sec (=98%).
   [perf:  0.75%  0.75%  memcached  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave]

The two resolved spinlocks:

 - Deferred IOTLB invalidations are batched in a global data structure
   and serialized under a spinlock (add_unmap() & flush_unmaps()); this
   patchset batches IOTLB invalidations in a per-CPU data structure.

 - IOVA management (alloc_iova() & __free_iova()) is serialized under
   the rbtree spinlock; this patchset adds per-CPU caches of allocated
   IOVAs so that the rbtree doesn't get accessed frequently. (Adding a
   cache above the existing IOVA allocator is less intrusive than dynamic
   identity mapping and helps keep IOMMU page table usage low; see
   Patch 7.)

The paper "Utilizing the IOMMU Scalably" (presented at the 2015 USENIX
Annual Technical Conference) contains many more details and experiments:

  https://www.usenix.org/system/files/conference/atc15/atc15-paper-peleg.pdf


Omer Peleg (7):
  iommu: refactoring of deferred flush entries
  iommu: per-cpu deferred invalidation queues
  iommu: correct flush_unmaps pfn usage
  iommu: only unmap mapped entries
  iommu: avoid dev iotlb logic in intel-iommu for domains with no dev
iotlbs
  iommu: change intel-iommu to use IOVA frame numbers
  iommu: introduce per-cpu caching to iova allocation

 drivers/iommu/intel-iommu.c | 264 +-
 drivers/iommu/iova.c| 334 +---
 include/linux/iova.h|  23 ++-
 3 files changed, 470 insertions(+), 151 deletions(-)

-- 
1.9.1



Re: [PATCH] iommu/ipmmu-vmsa: Don't truncate ttbr if LPAE is not enabled

2015-12-28 Thread Joerg Roedel
On Tue, Dec 22, 2015 at 08:01:06PM +0100, Geert Uytterhoeven wrote:
>  static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
>  {
> - phys_addr_t ttbr;
> + u64 ttbr;
>   u32 tmp;
>   int ret;

Didn't apply cleanly to v4.4-rc7, but fixed it up and queued it to
iommu/fixes.


Re: [PATCH 1/3] iommu/dma: Add some missing #includes

2015-12-28 Thread Joerg Roedel
On Fri, Dec 18, 2015 at 05:01:46PM +, Robin Murphy wrote:
> Hi Joerg,
> 
> here are a couple more minor fixes for some obscure subtleties in the
> common DMA code. I see I've just missed the rc5 fixes pull, but I hope
> you can pick these up for rc6 (unless of course you're also just about
> to disappear for 2 weeks like I am).

Applied 1 and 3 to iommu/fixes, thanks.



Re: [PATCH 2/3] iommu/dma: Use correct offset in map_sg

2015-12-28 Thread Joerg Roedel
On Fri, Dec 18, 2015 at 05:01:47PM +, Robin Murphy wrote:
> When mapping a non-page-aligned scatterlist entry, we copy the original
> offset to the output DMA address before aligning it to hand off to
> iommu_map_sg(), then later adding the IOVA page address portion to get
> the final mapped address. However, when the IOVA page size is smaller
> than the CPU page size, it is the offset within the IOVA page we want,
> not that within the CPU page, which can easily be larger than an IOVA
> page and thus result in an incorrect final address.
> 
> Fix the bug by taking only the IOVA-aligned part of the offset as the
> basis of the DMA address, not the whole thing.
> 
> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/dma-iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 982e716..03811e3 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -458,7 +458,7 @@ int iommu_dma_map_sg(struct device *dev, struct 
> scatterlist *sg,
>   size_t s_length = s->length;
>   size_t pad_len = (mask - iova_len + 1) & mask;
>  
> - sg_dma_address(s) = s->offset;
> + sg_dma_address(s) = s_offset;

Does not apply on v4.4-rc7.



Re: [PATCH 3/6] iommu/amd: Introduce amd_iommu_get_num_iommus()

2015-12-28 Thread Joerg Roedel
On Tue, Dec 22, 2015 at 01:19:14PM -0600, Suthikulpanit, Suravee wrote:
> This patch introduces amd_iommu_get_num_iommus(). Initially, this is
> intended to be used by the Perf AMD IOMMU driver.
> 
> Signed-off-by: Suravee Suthikulpanit 
> ---
>  drivers/iommu/amd_iommu_init.c| 16 
>  include/linux/perf/perf_event_amd_iommu.h |  2 ++
>  2 files changed, 18 insertions(+)
> 
> diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> index 275c0f5..9c62613 100644
> --- a/drivers/iommu/amd_iommu_init.c
> +++ b/drivers/iommu/amd_iommu_init.c
> @@ -2244,6 +2244,22 @@ bool amd_iommu_v2_supported(void)
>  }
>  EXPORT_SYMBOL(amd_iommu_v2_supported);
>  
> +static int amd_iommu_cnt;
> +
> +int amd_iommu_get_num_iommus(void)
> +{
> + struct amd_iommu *iommu;
> +
> + if (amd_iommu_cnt)
> + return amd_iommu_cnt;
> +
> + for_each_iommu(iommu)
> + amd_iommu_cnt++;

It is better to set amd_iommu_cnt during IOMMU initialization. You can
just increment this value after an IOMMU has been set up.



Joerg
