Re: [PATCH] iommu: Check that iommu_device_create has completed successfully in alloc_iommu
2015-12-29 13:10 GMT+08:00 Nicholas Krause:
> This adds the proper check to alloc_iommu to make sure that the call
> to iommu_device_create has completed successfully and if not return
> to the caller the error code returned after freeing up resources
> allocated previously by alloc_iommu.
>
> Signed-off-by: Nicholas Krause
> ---
>  drivers/iommu/dmar.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index 80e3c17..27333b6 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -1069,9 +1069,12 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
>  	iommu->iommu_dev = iommu_device_create(NULL, iommu,
>  					       intel_iommu_groups,
>  					       "%s", iommu->name);
> +	if (IS_ERR(iommu->iommu_dev)) {
> +		err = PTR_ERR(iommu->iommu_dev);
> +		goto err_unmap;
> +	}

If so, will this bad 'iommu->iommu_dev' break your iommu work?
It seems not necessary.

>
>  	return 0;
> -
>  err_unmap:
>  	unmap_iommu(iommu);
>  error_free_seq_id:
> --
> 2.5.0

--
Vincent Wan (Zongshun)
www.mcuos.com

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] iommu: Check that iommu_device_create has completed successfully in alloc_iommu
This adds the proper check to alloc_iommu to make sure that the call
to iommu_device_create has completed successfully and if not return
to the caller the error code returned after freeing up resources
allocated previously by alloc_iommu.

Signed-off-by: Nicholas Krause
---
 drivers/iommu/dmar.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 80e3c17..27333b6 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1069,9 +1069,12 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
 	iommu->iommu_dev = iommu_device_create(NULL, iommu,
 					       intel_iommu_groups,
 					       "%s", iommu->name);
+	if (IS_ERR(iommu->iommu_dev)) {
+		err = PTR_ERR(iommu->iommu_dev);
+		goto err_unmap;
+	}

 	return 0;
-
 err_unmap:
 	unmap_iommu(iommu);
 error_free_seq_id:
--
2.5.0
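The IS_ERR()/PTR_ERR() idiom the patch relies on can be sketched in userspace. This is a minimal illustration only: MAX_ERRNO and the helpers below are simplified re-implementations of the kernel macros, and create_device() is a hypothetical stand-in for iommu_device_create().

```c
#include <errno.h>

/*
 * Simplified sketch of the kernel's error-pointer idiom: error codes are
 * encoded in the top 4095 values of the pointer range, so a single pointer
 * return carries either a valid object or a negative errno value.
 */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical stand-in for iommu_device_create(). */
static void *create_device(int fail)
{
	static int dummy;
	return fail ? ERR_PTR(-ENOMEM) : (void *)&dummy;
}

/* Mirrors the patch's flow: check the returned pointer, propagate -errno. */
int demo_alloc(int fail)
{
	void *dev = create_device(fail);

	if (IS_ERR(dev))
		return (int)PTR_ERR(dev);
	return 0;
}
```

The point of the idiom is that a caller never has to thread a separate `int *err` out-parameter through the allocation path.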
Re: [PATCH 06/06] iommu/ipmmu-vmsa: Drop LPAE Kconfig dependency
Hi Magnus,

Thank you for the patch.

On Tuesday 15 December 2015 21:03:08 Magnus Damm wrote:
> From: Magnus Damm
>
> Neither the ARM page table code enabled by IOMMU_IO_PGTABLE_LPAE
> nor the IPMMU_VMSA driver actually depends on ARM_LPAE, so get
> rid of the dependency.
>
> Tested with ipmmu-vmsa on r8a7794 ALT and a kernel config using:
> # CONFIG_ARM_LPAE is not set
>
> Signed-off-by: Magnus Damm

Acked-by: Laurent Pinchart

> ---
>
> This time the result also compiles on x86. Need to be
> applied as last patch in the following series:
> [PATCH 00/06] iommu/ipmmu-vmsa: IPMMU multi-arch update
>
>  drivers/iommu/Kconfig | 1 -
>  1 file changed, 1 deletion(-)
>
> --- 0001/drivers/iommu/Kconfig
> +++ work/drivers/iommu/Kconfig	2015-10-18 14:58:09.080513000 +0900
> @@ -324,7 +324,6 @@ config SHMOBILE_IOMMU_L1SIZE
>
>  config IPMMU_VMSA
>  	bool "Renesas VMSA-compatible IPMMU"
> -	depends on ARM_LPAE
>  	depends on ARCH_SHMOBILE || COMPILE_TEST
>  	select IOMMU_API
>  	select IOMMU_IO_PGTABLE_LPAE

--
Regards,

Laurent Pinchart
Re: [PATCH 05/06] iommu/ipmmu-vmsa: Break out 32-bit ARM mapping code
Hi Magnus,

Thank you for the patch.

On Tuesday 15 December 2015 21:02:58 Magnus Damm wrote:
> From: Magnus Damm
>
> Make the driver compile on more than just 32-bit ARM
> by breaking out and wrapping ARM specific functions
> in #ifdefs. Needed to be able to use the driver on
> other architectures.

The only other architecture where we need to use the driver is ARM64, and
I'm not sure this is the right fix to correctly support ARM and ARM64. I
believe we should instead improve the ARM implementation to get rid of
arm_iommu_create_mapping() and arm_iommu_attach_device() the same way that
ARM64 did.

> Signed-off-by: Magnus Damm
> ---
>
>  drivers/iommu/ipmmu-vmsa.c | 94 ++---
>  1 file changed, 62 insertions(+), 32 deletions(-)
>
> --- 0007/drivers/iommu/ipmmu-vmsa.c
> +++ work/drivers/iommu/ipmmu-vmsa.c	2015-12-15 13:20:26.580513000 +0900
> @@ -22,8 +22,10 @@
>  #include
>  #include
>
> +#ifdef CONFIG_ARM
>  #include
>  #include
> +#endif
>
>  #include "io-pgtable.h"
>
> @@ -38,7 +40,9 @@ struct ipmmu_vmsa_device {
>  	DECLARE_BITMAP(ctx, IPMMU_CTX_MAX);
>  	struct ipmmu_vmsa_domain *domains[IPMMU_CTX_MAX];
>
> +#ifdef CONFIG_ARM
>  	struct dma_iommu_mapping *mapping;
> +#endif
>  };
>
>  struct ipmmu_vmsa_domain {
> @@ -621,6 +625,60 @@ static int ipmmu_find_utlbs(struct ipmmu
>  	return 0;
>  }
>
> +#ifdef CONFIG_ARM
> +static int ipmmu_map_attach(struct device *dev, struct ipmmu_vmsa_device *mmu)
> +{
> +	int ret;
> +
> +	/*
> +	 * Create the ARM mapping, used by the ARM DMA mapping core to allocate
> +	 * VAs. This will allocate a corresponding IOMMU domain.
> +	 *
> +	 * TODO:
> +	 * - Create one mapping per context (TLB).
> +	 * - Make the mapping size configurable ? We currently use a 2GB mapping
> +	 *   at a 1GB offset to ensure that NULL VAs will fault.
> +	 */
> +	if (!mmu->mapping) {
> +		struct dma_iommu_mapping *mapping;
> +
> +		mapping = arm_iommu_create_mapping(&platform_bus_type,
> +						   SZ_1G, SZ_2G);
> +		if (IS_ERR(mapping)) {
> +			dev_err(mmu->dev, "failed to create ARM IOMMU mapping\n");
> +			return PTR_ERR(mapping);
> +		}
> +
> +		mmu->mapping = mapping;
> +	}
> +
> +	/* Attach the ARM VA mapping to the device. */
> +	ret = arm_iommu_attach_device(dev, mmu->mapping);
> +	if (ret < 0) {
> +		dev_err(dev, "Failed to attach device to VA mapping\n");
> +		arm_iommu_release_mapping(mmu->mapping);
> +	}
> +
> +	return ret;
> +}
> +
> +static inline void ipmmu_detach(struct device *dev)
> +{
> +	arm_iommu_detach_device(dev);
> +}
> +
> +static inline void ipmmu_release_mapping(struct ipmmu_vmsa_device *mmu)
> +{
> +	arm_iommu_release_mapping(mmu->mapping);
> +}
> +#else
> +static inline int ipmmu_map_attach(struct device *dev,
> +				   struct ipmmu_vmsa_device *mmu)
> +{
> +	return 0;
> +}
> +
> +static inline void ipmmu_detach(struct device *dev) {}
> +static inline void ipmmu_release_mapping(struct ipmmu_vmsa_device *mmu) {}
> +#endif
> +
>  static int ipmmu_add_device(struct device *dev)
>  {
>  	struct ipmmu_vmsa_dev_data *dev_data = get_dev_data(dev);
> @@ -701,41 +759,13 @@ static int ipmmu_add_device(struct devic
>  	dev_data->num_utlbs = num_utlbs;
>  	set_dev_data(dev, dev_data);
>
> -	/*
> -	 * Create the ARM mapping, used by the ARM DMA mapping core to allocate
> -	 * VAs. This will allocate a corresponding IOMMU domain.
> -	 *
> -	 * TODO:
> -	 * - Create one mapping per context (TLB).
> -	 * - Make the mapping size configurable ? We currently use a 2GB mapping
> -	 *   at a 1GB offset to ensure that NULL VAs will fault.
> -	 */
> -	if (!mmu->mapping) {
> -		struct dma_iommu_mapping *mapping;
> -
> -		mapping = arm_iommu_create_mapping(&platform_bus_type,
> -						   SZ_1G, SZ_2G);
> -		if (IS_ERR(mapping)) {
> -			dev_err(mmu->dev, "failed to create ARM IOMMU mapping\n");
> -			ret = PTR_ERR(mapping);
> -			goto error;
> -		}
> -
> -		mmu->mapping = mapping;
> -	}
> -
> -	/* Attach the ARM VA mapping to the device. */
> -	ret = arm_iommu_attach_device(dev, mmu->mapping);
> -	if (ret < 0) {
> -		dev_err(dev, "Failed to attach device to VA mapping\n");
> +	ret = ipmmu_map_attach(dev, mmu);
> +	if (ret < 0)
>  		goto error;
> -	}
>
>  	return 0;
>
>  error:
> -	arm_iommu_release_mapping(mmu->mapping);
> -
>  	kfree(dev_data);
>  	kfree(utlbs);
>
> @@ -751,7 +781,7 @@ static void ipmmu_remove_device(struct d
>  {
>  	struct ipmmu_vmsa_dev_data *dev_data = get_dev_data(dev);
>
> -
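The #ifdef strategy under discussion — real helpers on ARM, static inline no-op stubs elsewhere so callers stay free of #ifdefs — can be reduced to a small sketch. CONFIG_ARM stands in for the Kconfig symbol and the function names are illustrative, not the driver's:

```c
/*
 * Sketch of the stub pattern from the patch: the ARM-specific helper is
 * compiled only when CONFIG_ARM is defined, while other architectures get
 * a static inline no-op, so the caller needs no #ifdefs of its own.
 */
#ifdef CONFIG_ARM
static int ipmmu_map_attach_sketch(void)
{
	/* the real arm_iommu_create_mapping()/attach calls would go here */
	return 1;
}
#else
static inline int ipmmu_map_attach_sketch(void)
{
	return 0;	/* nothing to do on non-ARM builds */
}
#endif

/* The caller is identical on every architecture. */
int probe_sketch(void)
{
	return ipmmu_map_attach_sketch();
}
```

The static inline stubs compile away entirely, so the non-ARM build pays no cost for the abstraction.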
Re: [PATCH 04/06] iommu/ipmmu-vmsa: Rework interrupt code and use bitmap for context
Hi Magnus,

Thank you for the patch.

On Tuesday 15 December 2015 21:02:49 Magnus Damm wrote:
> From: Magnus Damm
>
> Introduce a bitmap for context handling and convert the
> interrupt routine to handle all registered contexts.
>
> At this point the number of contexts is still limited.

That's all nice, but without seeing support for multiple contexts it's hard
to tell if the implementation is correct for multiple context purposes.

> The purpose of this patch is to remove the use of the
> ARM specific mapping variable from ipmmu_irq().

Why do you want to do that ?

> Signed-off-by: Magnus Damm
> ---
>
>  drivers/iommu/ipmmu-vmsa.c | 37 ++---
>  1 file changed, 26 insertions(+), 11 deletions(-)
>
> --- 0007/drivers/iommu/ipmmu-vmsa.c
> +++ work/drivers/iommu/ipmmu-vmsa.c	2015-12-15 13:14:35.540513000 +0900
> @@ -8,6 +8,7 @@
>   * the Free Software Foundation; version 2 of the License.
>   */
>
> +#include
>  #include
>  #include
>  #include
> @@ -26,12 +27,16 @@
>
>  #include "io-pgtable.h"
>
> +#define IPMMU_CTX_MAX 1
> +
>  struct ipmmu_vmsa_device {
>  	struct device *dev;
>  	void __iomem *base;
>  	struct list_head list;
>
>  	unsigned int num_utlbs;
> +	DECLARE_BITMAP(ctx, IPMMU_CTX_MAX);

We have up to 4 contexts on Gen2 and 8 on Gen3, a bitmap might be slightly
overkill.

> +	struct ipmmu_vmsa_domain *domains[IPMMU_CTX_MAX];
>
>  	struct dma_iommu_mapping *mapping;
>  };
> @@ -319,6 +324,7 @@ static struct iommu_gather_ops ipmmu_gat
>  static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
>  {
>  	phys_addr_t ttbr;
> +	int ret;
>
>  	/*
>  	 * Allocate the page table operations.
> @@ -348,10 +354,16 @@ static int ipmmu_domain_init_context(str
>  		return -EINVAL;
>
>  	/*
> -	 * TODO: When adding support for multiple contexts, find an unused
> -	 * context.
> +	 * Find an unused context.

We need to support multiple devices per context or we will very soon run
out of contexts. How to pick a proper context is a topic that needs to be
researched, I believe IOMMU groups might come into play.

>  	 */
> -	domain->context_id = 0;
> +	ret = bitmap_find_free_region(domain->mmu->ctx, IPMMU_CTX_MAX, 0);
> +	if (ret < 0) {
> +		free_io_pgtable_ops(domain->iop);
> +		return ret;
> +	}
> +
> +	domain->context_id = ret;
> +	domain->mmu->domains[ret] = domain;

This requires locking to protect against races with the interrupt handler.

>
>  	/* TTBR0 */
>  	ttbr = domain->cfg.arm_lpae_s1_cfg.ttbr[0];
> @@ -395,6 +407,8 @@ static int ipmmu_domain_init_context(str
>
>  static void ipmmu_domain_destroy_context(struct ipmmu_vmsa_domain *domain)
>  {
> +	bitmap_release_region(domain->mmu->ctx, domain->context_id, 0);
> +
>  	/*
>  	 * Disable the context. Flush the TLB as required when modifying the
>  	 * context registers.
> @@ -460,16 +474,16 @@ static irqreturn_t ipmmu_domain_irq(stru
>  static irqreturn_t ipmmu_irq(int irq, void *dev)
>  {
>  	struct ipmmu_vmsa_device *mmu = dev;
> -	struct iommu_domain *io_domain;
> -	struct ipmmu_vmsa_domain *domain;
> -
> -	if (!mmu->mapping)
> -		return IRQ_NONE;
> +	irqreturn_t status = IRQ_NONE;
> +	unsigned int k;

i is a perfectly fine loop counter :-)

> -	io_domain = mmu->mapping->domain;
> -	domain = to_vmsa_domain(io_domain);
> +	/* Check interrupts for all active contexts */
> +	for (k = find_first_bit(mmu->ctx, IPMMU_CTX_MAX);
> +	     k < IPMMU_CTX_MAX && status == IRQ_NONE;
> +	     k = find_next_bit(mmu->ctx, IPMMU_CTX_MAX, k))

You can just loop over mmu->domains and skip NULL entries.

> +		status = ipmmu_domain_irq(mmu->domains[k]);

Only the status of the last domain is taken into account.

> -	return ipmmu_domain_irq(domain);
> +	return status;
>  }
>
>  /*
> @@ -788,6 +802,7 @@ static int ipmmu_probe(struct platform_d
>
>  	mmu->dev = &pdev->dev;
>  	mmu->num_utlbs = 32;
> +	bitmap_zero(mmu->ctx, IPMMU_CTX_MAX);
>
>  	/* Map I/O memory and request IRQ. */
>  	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);

--
Regards,

Laurent Pinchart
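On the "only the status of the last domain is taken into account" point, one conventional fix — a sketch, not the driver's actual code — is to OR the per-context results together, so that IRQ_HANDLED from any context survives the loop:

```c
/*
 * Simplified irqreturn_t values: under bitwise OR, IRQ_HANDLED from any
 * context dominates IRQ_NONE, so the handler correctly reports "handled"
 * if at least one active context raised the interrupt.
 */
enum { IRQ_NONE = 0, IRQ_HANDLED = 1 };

int aggregate_irq(const int *ctx_status, int n)
{
	int status = IRQ_NONE;
	int i;

	for (i = 0; i < n; i++)
		status |= ctx_status[i];	/* was: status = ctx_status[i]; */
	return status;
}
```

This also lets the loop visit every active context instead of stopping at the first handled one, which matters when several contexts fault in the same interrupt.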
Re: [PATCH 03/06] iommu/ipmmu-vmsa: Break out utlb control function
Hi Magnus,

On Tuesday 15 December 2015 16:48:42 Geert Uytterhoeven wrote:
> On Tue, Dec 15, 2015 at 1:02 PM, Magnus Damm wrote:
> > --- 0004/drivers/iommu/ipmmu-vmsa.c
> > +++ work/drivers/iommu/ipmmu-vmsa.c	2015-12-15 13:17:40.580513000 +0900
> > @@ -279,9 +279,18 @@ static void ipmmu_utlb_enable(struct ipm
> >  static void ipmmu_utlb_disable(struct ipmmu_vmsa_domain *domain,
> >  			       unsigned int utlb)
> >  {
> > -	struct ipmmu_vmsa_device *mmu = domain->mmu;
> > +	ipmmu_write(domain->mmu, IMUCTR(utlb), 0);
> > +}
> > +
> > +static void ipmmu_utlb_ctrl(struct ipmmu_vmsa_domain *domain,
> > +			    void (*fn)(struct ipmmu_vmsa_domain *,
> > +				       unsigned int utlb), struct device *dev)
> > +{
> > +	struct ipmmu_vmsa_dev_data *dev_data = get_dev_data(dev);
> > +	unsigned int i;
> >
> > -	ipmmu_write(mmu, IMUCTR(utlb), 0);
> > +	for (i = 0; i < dev_data->num_utlbs; ++i)
> > +		fn(domain, dev_data->utlbs[i]);
> >  }
>
> Unless you have further plans with the "fn" parameter, I would simply pass
> a bool enable/disable flag instead of a function pointer.

I agree with Geert. What's your plan here ?

--
Regards,

Laurent Pinchart
Re: [PATCH 02/06] iommu/ipmmu-vmsa: Convert to dev_data
Hi Magnus,

Thank you for the patch.

On Tuesday 15 December 2015 21:02:30 Magnus Damm wrote:
> From: Magnus Damm
>
> Rename ipmmu_vmsa_archdata to ipmmu_vmsa_dev_data to avoid
> confusion when using the driver on multiple architectures.

I'm sorry but this patch makes the driver more confusing to me. archdata is
a well established concept for IOMMU on ARM, I don't see how hiding it
would help.

> The data now stored in ipmmu_vmsa_dev_data is used to point
> various on-chip devices to the actual IPMMU instances.
>
> Signed-off-by: Magnus Damm
> ---
>
>  drivers/iommu/ipmmu-vmsa.c | 58 +
>  1 file changed, 36 insertions(+), 22 deletions(-)
>
> --- 0003/drivers/iommu/ipmmu-vmsa.c
> +++ work/drivers/iommu/ipmmu-vmsa.c	2015-12-15 13:16:27.160513000 +0900
> @@ -47,7 +47,7 @@ struct ipmmu_vmsa_domain {
>  	spinlock_t lock;	/* Protects mappings */
>  };
>
> -struct ipmmu_vmsa_archdata {
> +struct ipmmu_vmsa_dev_data {
>  	struct ipmmu_vmsa_device *mmu;
>  	unsigned int *utlbs;
>  	unsigned int num_utlbs;
> @@ -182,6 +182,20 @@ static struct ipmmu_vmsa_domain *to_vmsa
>  #define IMUASID_ASID0_SHIFT	0
>
>  /*
> + * Consumer device side private data handling
> + */
> +
> +static struct ipmmu_vmsa_dev_data *get_dev_data(struct device *dev)
> +{
> +	return dev->archdata.iommu;
> +}
> +
> +static void set_dev_data(struct device *dev, struct ipmmu_vmsa_dev_data *data)
> +{
> +	dev->archdata.iommu = data;
> +}
> +
> +/*
>   * Read/Write Access
>   */
>
> @@ -485,8 +499,8 @@ static void ipmmu_domain_free(struct iom
>  static int ipmmu_attach_device(struct iommu_domain *io_domain,
>  			       struct device *dev)
>  {
> -	struct ipmmu_vmsa_archdata *archdata = dev->archdata.iommu;
> -	struct ipmmu_vmsa_device *mmu = archdata->mmu;
> +	struct ipmmu_vmsa_dev_data *dev_data = get_dev_data(dev);
> +	struct ipmmu_vmsa_device *mmu = dev_data->mmu;
>  	struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
>  	unsigned long flags;
>  	unsigned int i;
> @@ -518,8 +532,8 @@ static int ipmmu_attach_device(struct io
>  	if (ret < 0)
>  		return ret;
>
> -	for (i = 0; i < archdata->num_utlbs; ++i)
> -		ipmmu_utlb_enable(domain, archdata->utlbs[i]);
> +	for (i = 0; i < dev_data->num_utlbs; ++i)
> +		ipmmu_utlb_enable(domain, dev_data->utlbs[i]);
>
>  	return 0;
>  }
> @@ -527,12 +541,12 @@ static int ipmmu_attach_device(struct io
>  static void ipmmu_detach_device(struct iommu_domain *io_domain,
>  				struct device *dev)
>  {
> -	struct ipmmu_vmsa_archdata *archdata = dev->archdata.iommu;
> +	struct ipmmu_vmsa_dev_data *dev_data = get_dev_data(dev);
>  	struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
>  	unsigned int i;
>
> -	for (i = 0; i < archdata->num_utlbs; ++i)
> -		ipmmu_utlb_disable(domain, archdata->utlbs[i]);
> +	for (i = 0; i < dev_data->num_utlbs; ++i)
> +		ipmmu_utlb_disable(domain, dev_data->utlbs[i]);
>
>  	/*
>  	 * TODO: Optimize by disabling the context when no device is attached.
> @@ -595,7 +609,7 @@ static int ipmmu_find_utlbs(struct ipmmu
>
>  static int ipmmu_add_device(struct device *dev)
>  {
> -	struct ipmmu_vmsa_archdata *archdata;
> +	struct ipmmu_vmsa_dev_data *dev_data = get_dev_data(dev);
>  	struct ipmmu_vmsa_device *mmu;
>  	struct iommu_group *group = NULL;
>  	unsigned int *utlbs;
> @@ -603,7 +617,7 @@ static int ipmmu_add_device(struct devic
>  	int num_utlbs;
>  	int ret = -ENODEV;
>
> -	if (dev->archdata.iommu) {
> +	if (dev_data) {
>  		dev_warn(dev, "IOMMU driver already assigned to device %s\n",
>  			 dev_name(dev));
>  		return -EINVAL;
> @@ -662,16 +676,16 @@ static int ipmmu_add_device(struct devic
>  		goto error;
>  	}
>
> -	archdata = kzalloc(sizeof(*archdata), GFP_KERNEL);
> -	if (!archdata) {
> +	dev_data = kzalloc(sizeof(*dev_data), GFP_KERNEL);
> +	if (!dev_data) {
>  		ret = -ENOMEM;
>  		goto error;
>  	}
>
> -	archdata->mmu = mmu;
> -	archdata->utlbs = utlbs;
> -	archdata->num_utlbs = num_utlbs;
> -	dev->archdata.iommu = archdata;
> +	dev_data->mmu = mmu;
> +	dev_data->utlbs = utlbs;
> +	dev_data->num_utlbs = num_utlbs;
> +	set_dev_data(dev, dev_data);
>
>  	/*
>  	 * Create the ARM mapping, used by the ARM DMA mapping core to allocate
> @@ -708,10 +722,10 @@ static int ipmmu_add_device(struct devic
>  error:
>  	arm_iommu_release_mapping(mmu->mapping);
>
> -	kfree(dev->archdata.iommu);
> +	kfree(dev_data);
>  	kfree(utlbs);
>
> -
Re: [PATCH 01/06] iommu/ipmmu-vmsa: Remove platform data handling
Hi Magnus,

Thank you for the patch.

On Tuesday 15 December 2015 21:02:21 Magnus Damm wrote:
> From: Magnus Damm
>
> The IPMMU driver is using DT these days, and platform data is no longer
> used by the driver. Remove unused code.
>
> Signed-off-by: Magnus Damm

Reviewed-by: Laurent Pinchart

> ---
>
>  drivers/iommu/ipmmu-vmsa.c | 5 -
>  1 file changed, 5 deletions(-)
>
> --- 0001/drivers/iommu/ipmmu-vmsa.c
> +++ work/drivers/iommu/ipmmu-vmsa.c	2015-12-15 11:35:00.490513000 +0900
> @@ -766,11 +766,6 @@ static int ipmmu_probe(struct platform_d
>  	int irq;
>  	int ret;
>
> -	if (!IS_ENABLED(CONFIG_OF) && !pdev->dev.platform_data) {
> -		dev_err(&pdev->dev, "missing platform data\n");
> -		return -EINVAL;
> -	}
> -
>  	mmu = devm_kzalloc(&pdev->dev, sizeof(*mmu), GFP_KERNEL);
>  	if (!mmu) {
>  		dev_err(&pdev->dev, "cannot allocate device data\n");

--
Regards,

Laurent Pinchart
Re: Regression since 4.1
Hi Tobias,

On Thu, Dec 10, 2015 at 12:07:57AM +0100, Tobias Geiger wrote:
> It all comes down to kernel > 4.1 && iommu=on = not working here
>
> Hardware is:
> Intel DX58SO, Intel CPU i7 920... all pretty solid running for years now,
> always with iommu=on.
>
> I'd love to debug this further, just don't know how...

Sounds like only bisecting the issue will help here. You can find an
introduction on bisecting here:

	https://git-scm.com/book/en/v2/Git-Tools-Debugging-with-Git

Would be great if you can find the time to bisect the issue.

	Joerg
[PATCH 6/7] iommu: change intel-iommu to use IOVA frame numbers
From: Omer Peleg

Make intel-iommu map/unmap/invalidate work with IOVA pfns instead of
pointers to "struct iova". This avoids using the iova struct from the
IOVA red-black tree and the resulting explicit find_iova() on unmap.

This patch will allow us to cache IOVAs in the next patch, in order to
avoid rbtree operations for the majority of map/unmap operations.

Note: In eliminating the find_iova() operation, we have also eliminated
the sanity check previously done in the unmap flow. Arguably, this was
overhead that is better avoided in production code, but it could be
brought back as a debug option for driver development.

Signed-off-by: Omer Peleg
[m...@cs.technion.ac.il: rebased and reworded the commit message]
Signed-off-by: Adam Morrison
---
 drivers/iommu/intel-iommu.c | 73 ++---
 drivers/iommu/iova.c        |  8 ++---
 include/linux/iova.h        |  2 +-
 3 files changed, 40 insertions(+), 43 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 6c37bbc..c2de0e7 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -455,7 +455,7 @@ static LIST_HEAD(dmar_rmrr_units);
 static void flush_unmaps_timeout(unsigned long data);

 struct deferred_flush_entry {
-	struct iova *iova;
+	unsigned long iova_pfn;
 	unsigned long nrpages;
 	struct dmar_domain *domain;
 	struct page *freelist;
@@ -3264,11 +3264,11 @@ error:
 }

 /* This takes a number of _MM_ pages, not VTD pages */
-static struct iova *intel_alloc_iova(struct device *dev,
+static unsigned long intel_alloc_iova(struct device *dev,
 				     struct dmar_domain *domain,
 				     unsigned long nrpages, uint64_t dma_mask)
 {
-	struct iova *iova = NULL;
+	unsigned long iova_pfn;

 	/* Restrict dma_mask to the width that the iommu can handle */
 	dma_mask = min_t(uint64_t, DOMAIN_MAX_ADDR(domain->gaw), dma_mask);
@@ -3281,19 +3281,19 @@ static struct iova *intel_alloc_iova(struct device *dev,
 		 * DMA_BIT_MASK(32) and if that fails then try allocating
 		 * from higher range
 		 */
-		iova = alloc_iova(&domain->iovad, nrpages,
-				  IOVA_PFN(DMA_BIT_MASK(32)), 1);
-		if (iova)
-			return iova;
+		iova_pfn = alloc_iova(&domain->iovad, nrpages,
+				      IOVA_PFN(DMA_BIT_MASK(32)), 1);
+		if (iova_pfn)
+			return iova_pfn;
 	}
-	iova = alloc_iova(&domain->iovad, nrpages, IOVA_PFN(dma_mask), 1);
-	if (unlikely(!iova)) {
+	iova_pfn = alloc_iova(&domain->iovad, nrpages, IOVA_PFN(dma_mask), 1);
+	if (unlikely(!iova_pfn)) {
 		pr_err("Allocating %ld-page iova for %s failed",
 		       nrpages, dev_name(dev));
-		return NULL;
+		return 0;
 	}

-	return iova;
+	return iova_pfn;
 }

 static struct dmar_domain *__get_valid_domain_for_dev(struct device *dev)
@@ -3391,7 +3391,7 @@ static dma_addr_t __intel_map_single(struct device *dev, phys_addr_t paddr,
 {
 	struct dmar_domain *domain;
 	phys_addr_t start_paddr;
-	struct iova *iova;
+	unsigned long iova_pfn;
 	int prot = 0;
 	int ret;
 	struct intel_iommu *iommu;
@@ -3409,8 +3409,8 @@ static dma_addr_t __intel_map_single(struct device *dev, phys_addr_t paddr,
 	iommu = domain_get_iommu(domain);
 	size = aligned_nrpages(paddr, size);

-	iova = intel_alloc_iova(dev, domain, dma_to_mm_pfn(size), dma_mask);
-	if (!iova)
+	iova_pfn = intel_alloc_iova(dev, domain, dma_to_mm_pfn(size), dma_mask);
+	if (!iova_pfn)
 		goto error;

 	/*
@@ -3428,7 +3428,7 @@ static dma_addr_t __intel_map_single(struct device *dev, phys_addr_t paddr,
 	 * might have two guest_addr mapping to the same host paddr, but this
 	 * is not a big problem
 	 */
-	ret = domain_pfn_mapping(domain, mm_to_dma_pfn(iova->pfn_lo),
+	ret = domain_pfn_mapping(domain, mm_to_dma_pfn(iova_pfn),
 				 mm_to_dma_pfn(paddr_pfn), size, prot);
 	if (ret)
 		goto error;
@@ -3436,18 +3436,18 @@ static dma_addr_t __intel_map_single(struct device *dev, phys_addr_t paddr,
 	/* it's a non-present to present mapping. Only flush if caching mode */
 	if (cap_caching_mode(iommu->cap))
 		iommu_flush_iotlb_psi(iommu, domain,
-				      mm_to_dma_pfn(iova->pfn_lo),
+				      mm_to_dma_pfn(iova_pfn),
 				      size, 0, 1);
 	else
 		iommu_flush_write_buffer(iommu);

-	start_paddr = (phys_addr_t)iova->pfn_lo << PAGE_SHIFT;
+	start_paddr = (phys_addr_t)iova_pfn << PAGE_SHIFT;
[PATCH 5/7] iommu: avoid dev iotlb logic in intel-iommu for domains with no dev iotlbs
From: Omer Peleg

This patch avoids taking the device_domain_lock in iommu_flush_dev_iotlb()
for domains with no dev iotlb devices.

Signed-off-by: Omer Peleg
[m...@cs.technion.ac.il: rebased and reworded the commit message]
Signed-off-by: Adam Morrison
---
 drivers/iommu/intel-iommu.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index c2a6f9e..fc23adc 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -391,6 +391,7 @@ struct dmar_domain {
 					 * to VT-d spec, section 9.3 */

 	struct list_head devices;	/* all devices' list */
+	bool has_iotlb_device;
 	struct iova_domain iovad;	/* iova's that belong to this domain */

 	struct dma_pte *pgd;		/* virtual address */
@@ -1463,6 +1464,30 @@ iommu_support_dev_iotlb (struct dmar_domain *domain, struct intel_iommu *iommu,
 	return NULL;
 }

+static void domain_update_iotlb(struct dmar_domain *domain)
+{
+	struct device_domain_info *info;
+	bool has_iotlb_device = false;
+	unsigned long flags;
+
+	spin_lock_irqsave(&device_domain_lock, flags);
+	list_for_each_entry(info, &domain->devices, link) {
+		struct pci_dev *pdev;
+
+		if (!info->dev || !dev_is_pci(info->dev))
+			continue;
+
+		pdev = to_pci_dev(info->dev);
+		if (pdev->ats_enabled) {
+			has_iotlb_device = true;
+			break;
+		}
+	}
+
+	domain->has_iotlb_device = has_iotlb_device;
+	spin_unlock_irqrestore(&device_domain_lock, flags);
+}
+
 static void iommu_enable_dev_iotlb(struct device_domain_info *info)
 {
 	struct pci_dev *pdev;
@@ -1486,6 +1511,7 @@ static void iommu_enable_dev_iotlb(struct device_domain_info *info)
 #endif
 	if (info->ats_supported && !pci_enable_ats(pdev, VTD_PAGE_SHIFT)) {
 		info->ats_enabled = 1;
+		domain_update_iotlb(info);
 		info->ats_qdep = pci_ats_queue_depth(pdev);
 	}
 }
@@ -1502,6 +1528,7 @@ static void iommu_disable_dev_iotlb(struct device_domain_info *info)
 	if (info->ats_enabled) {
 		pci_disable_ats(pdev);
 		info->ats_enabled = 0;
+		domain_update_iotlb(info);
 	}
 #ifdef CONFIG_INTEL_IOMMU_SVM
 	if (info->pri_enabled) {
@@ -1522,6 +1549,9 @@ static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
 	unsigned long flags;
 	struct device_domain_info *info;

+	if (!domain->has_iotlb_device)
+		return;
+
 	spin_lock_irqsave(&device_domain_lock, flags);
 	list_for_each_entry(info, &domain->devices, link) {
 		if (!info->ats_enabled)
@@ -1739,6 +1769,7 @@ static struct dmar_domain *alloc_domain(int flags)
 	memset(domain, 0, sizeof(*domain));
 	domain->nid = -1;
 	domain->flags = flags;
+	domain->has_iotlb_device = false;
 	INIT_LIST_HEAD(&domain->devices);

 	return domain;
--
1.9.1
[PATCH 7/7] iommu: introduce per-cpu caching to iova allocation
From: Omer Peleg

This patch introduces global and per-CPU caches of IOVAs, so that CPUs
can usually allocate IOVAs without taking the rbtree spinlock to do so.

The caching is based on magazines, as described in "Magazines and Vmem:
Extending the Slab Allocator to Many CPUs and Arbitrary Resources"
(currently available at
https://www.usenix.org/legacy/event/usenix01/bonwick.html)

Adding caching on top of the existing rbtree allocator maintains the
property that IOVAs are densely packed in the IO virtual address space,
which is important for keeping IOMMU page table usage low.

Signed-off-by: Omer Peleg
[m...@cs.technion.ac.il: rebased, cleaned up and reworded the commit message]
Signed-off-by: Adam Morrison
---
 drivers/iommu/intel-iommu.c |  12 +-
 drivers/iommu/iova.c        | 326 +---
 include/linux/iova.h        |  21 ++-
 3 files changed, 302 insertions(+), 57 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index c2de0e7..7e9c2dd 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3282,11 +3282,11 @@ static unsigned long intel_alloc_iova(struct device *dev,
 		 * from higher range
 		 */
 		iova_pfn = alloc_iova(&domain->iovad, nrpages,
-				      IOVA_PFN(DMA_BIT_MASK(32)), 1);
+				      IOVA_PFN(DMA_BIT_MASK(32)));
 		if (iova_pfn)
 			return iova_pfn;
 	}
-	iova_pfn = alloc_iova(&domain->iovad, nrpages, IOVA_PFN(dma_mask), 1);
+	iova_pfn = alloc_iova(&domain->iovad, nrpages, IOVA_PFN(dma_mask));
 	if (unlikely(!iova_pfn)) {
 		pr_err("Allocating %ld-page iova for %s failed",
 		       nrpages, dev_name(dev));
@@ -3447,7 +3447,7 @@ static dma_addr_t __intel_map_single(struct device *dev, phys_addr_t paddr,
 error:
 	if (iova_pfn)
-		free_iova(&domain->iovad, iova_pfn);
+		free_iova(&domain->iovad, iova_pfn, dma_to_mm_pfn(size));
 	pr_err("Device %s request: %zx@%llx dir %d --- failed\n",
 		dev_name(dev), size, (unsigned long long)paddr, dir);
 	return 0;
@@ -3502,7 +3502,7 @@ static void flush_unmaps(struct deferred_flush_data *flush_data)
 				iommu_flush_dev_iotlb(domain,
 						(uint64_t)iova_pfn << PAGE_SHIFT, mask);
 			}
-			free_iova(&domain->iovad, iova_pfn);
+			free_iova(&domain->iovad, iova_pfn, nrpages);
 			if (freelist)
 				dma_free_pagelist(freelist);
 		}
@@ -3593,7 +3593,7 @@ static void intel_unmap(struct device *dev, dma_addr_t dev_addr, size_t size)
 		iommu_flush_iotlb_psi(iommu, domain, start_pfn,
 				      nrpages, !freelist, 0);
 		/* free iova */
-		free_iova(&domain->iovad, iova_pfn);
+		free_iova(&domain->iovad, iova_pfn, dma_to_mm_pfn(nrpages));
 		dma_free_pagelist(freelist);
 	} else {
 		add_unmap(domain, iova_pfn, nrpages, freelist);
@@ -3751,7 +3751,7 @@ static int intel_map_sg(struct device *dev, struct scatterlist *sglist, int nele
 	if (unlikely(ret)) {
 		dma_pte_free_pagetable(domain, start_vpfn,
 				       start_vpfn + size - 1);
-		free_iova(&domain->iovad, iova_pfn);
+		free_iova(&domain->iovad, iova_pfn, dma_to_mm_pfn(size));
 		return 0;
 	}

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 9009ce6..38dd03a 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -20,11 +20,24 @@
 #include
 #include
 #include
+#include
+#include
+
+static bool iova_rcache_insert(struct iova_domain *iovad,
+			       struct iova_rcache *rcache,
+			       unsigned long iova_pfn);
+static unsigned long iova_rcache_get(struct iova_rcache *rcache);
+
+static void init_iova_rcache(struct iova_rcache *rcache);
+static void free_iova_rcache(struct iova_domain *iovad,
+			     struct iova_rcache *rcache);

 void
 init_iova_domain(struct iova_domain *iovad, unsigned long granule,
 	unsigned long start_pfn, unsigned long pfn_32bit)
 {
+	int i;
+
 	/*
 	 * IOVA granularity will normally be equal to the smallest
 	 * supported IOMMU page size; both *must* be capable of
@@ -38,6 +51,9 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,
 	iovad->granule = granule;
 	iovad->start_pfn = start_pfn;
 	iovad->dma_32bit_pfn = pfn_32bit;
+
+	for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i)
+		init_iova_rcache(&iovad->rcaches[i]);
 }
 EXPORT_SYMBOL_GPL(init_iova_domain);

@@ -100,7 +116,7 @@ io
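The magazine idea from the Bonwick paper cited in the commit message can be reduced to a single-CPU userspace sketch. Names and sizes below are illustrative only; the real patch keeps per-CPU magazines plus a global depot, and falls back to the rbtree allocator (modeled here by a bump counter) when a magazine is empty:

```c
/*
 * Minimal magazine sketch: a small stack of recently freed IOVA pfns is
 * tried first on allocation, so the fast path touches no tree and, in the
 * kernel, no lock. The global allocator is only hit on a miss.
 */
#define MAG_SIZE 8

struct magazine {
	int top;
	unsigned long pfns[MAG_SIZE];
};

static unsigned long next_pfn = 0x1000;	/* stand-in for the rbtree allocator */

unsigned long alloc_pfn(struct magazine *mag)
{
	if (mag->top > 0)
		return mag->pfns[--mag->top];	/* fast path: pop from cache */
	return next_pfn++;			/* slow path: global allocator */
}

int free_pfn(struct magazine *mag, unsigned long pfn)
{
	if (mag->top < MAG_SIZE) {
		mag->pfns[mag->top++] = pfn;	/* cache for reuse */
		return 1;
	}
	return 0;	/* magazine full: would be returned to the tree */
}
```

Because frees immediately refill the magazine, a steady map/unmap workload recycles the same handful of pfns, which is exactly why the patch keeps IOVAs densely packed.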
[PATCH 3/7] iommu: correct flush_unmaps pfn usage
From: Omer Peleg

Change flush_unmaps() to correctly pass iommu_flush_iotlb_psi() dma
addresses. (Intel mm and dma have the same size for pages at the moment,
but this usage improves consistency.)

Signed-off-by: Omer Peleg
[m...@cs.technion.ac.il: rebased and reworded the commit message]
Signed-off-by: Adam Morrison
---
 drivers/iommu/intel-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 5b3530c..6d6229e 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3463,7 +3463,8 @@ static void flush_unmaps(struct deferred_flush_data *flush_data)
 			/* On real hardware multiple invalidations are expensive */
 			if (cap_caching_mode(iommu->cap))
 				iommu_flush_iotlb_psi(iommu, domain,
-					iova->pfn_lo, iova_size(iova),
+					mm_to_dma_pfn(iova->pfn_lo),
+					mm_to_dma_pfn(iova_size(iova)),
 					!freelist, 0);
 			else {
 				mask = ilog2(mm_to_dma_pfn(iova_size(iova)));
--
1.9.1
[PATCH 4/7] iommu: only unmap mapped entries
From: Omer Peleg

Current unmap implementation unmaps the entire area covered by the IOVA
range, which is a power-of-2 aligned region. The corresponding map,
however, only maps those pages originally mapped by the user. This
discrepancy can lead to unmapping of already unmapped entries, which is
unneeded work.

With this patch, only mapped pages are unmapped. This is also a baseline
for a map/unmap implementation based on IOVAs and not iova structures,
which will allow caching.

Signed-off-by: Omer Peleg
[m...@cs.technion.ac.il: rebased and reworded the commit message]
Signed-off-by: Adam Morrison
---
 drivers/iommu/intel-iommu.c | 36 +---
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 6d6229e..e4d023b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -455,6 +455,7 @@ static void flush_unmaps_timeout(unsigned long data);

 struct deferred_flush_entry {
 	struct iova *iova;
+	unsigned long nrpages;
 	struct dmar_domain *domain;
 	struct page *freelist;
 };
@@ -3457,6 +3458,7 @@ static void flush_unmaps(struct deferred_flush_data *flush_data)
 			struct deferred_flush_entry *entry =
 				&flush_table->entries[j];
 			struct iova *iova = entry->iova;
+			unsigned long nrpages = entry->nrpages;
 			struct dmar_domain *domain = entry->domain;
 			struct page *freelist = entry->freelist;

@@ -3464,10 +3466,9 @@ static void flush_unmaps(struct deferred_flush_data *flush_data)
 			if (cap_caching_mode(iommu->cap))
 				iommu_flush_iotlb_psi(iommu, domain,
 					mm_to_dma_pfn(iova->pfn_lo),
-					mm_to_dma_pfn(iova_size(iova)),
-					!freelist, 0);
+					nrpages, !freelist, 0);
 			else {
-				mask = ilog2(mm_to_dma_pfn(iova_size(iova)));
+				mask = ilog2(nrpages);
 				iommu_flush_dev_iotlb(domain,
 					(uint64_t)iova->pfn_lo << PAGE_SHIFT, mask);
 			}
@@ -3491,7 +3492,8 @@ static void flush_unmaps_timeout(unsigned long cpuid)
 	spin_unlock_irqrestore(&flush_data->lock, flags);
 }

-static void add_unmap(struct dmar_domain *dom, struct iova *iova, struct page *freelist)
+static void add_unmap(struct dmar_domain *dom, struct iova *iova,
+		      unsigned long nrpages, struct page *freelist)
 {
 	unsigned long flags;
 	int entry_id, iommu_id;
@@ -3516,6 +3518,7 @@ static void add_unmap(struct dmar_domain *dom, struct iova *iova, struct page *f
 	entry = &flush_data->tables[iommu_id].entries[entry_id];
 	entry->domain = dom;
 	entry->iova = iova;
+	entry->nrpages = nrpages;
 	entry->freelist = freelist;

 	if (!flush_data->timer_on) {
@@ -3528,10 +3531,11 @@ static void add_unmap(struct dmar_domain *dom, struct iova *iova, struct page *f
 	put_cpu();
 }

-static void intel_unmap(struct device *dev, dma_addr_t dev_addr)
+static void intel_unmap(struct device *dev, dma_addr_t dev_addr, size_t size)
 {
 	struct dmar_domain *domain;
 	unsigned long start_pfn, last_pfn;
+	unsigned long nrpages;
 	struct iova *iova;
 	struct intel_iommu *iommu;
 	struct page *freelist;
@@ -3549,8 +3553,9 @@ static void intel_unmap(struct device *dev, dma_addr_t dev_addr)
 		      (unsigned long long)dev_addr))
 		return;

+	nrpages = aligned_nrpages(dev_addr, size);
 	start_pfn = mm_to_dma_pfn(iova->pfn_lo);
-	last_pfn = mm_to_dma_pfn(iova->pfn_hi + 1) - 1;
+	last_pfn = start_pfn + nrpages - 1;

 	pr_debug("Device %s unmapping: pfn %lx-%lx\n",
 		 dev_name(dev), start_pfn, last_pfn);
@@ -3559,12 +3564,12 @@ static void intel_unmap(struct device *dev, dma_addr_t dev_addr)

 	if (intel_iommu_strict) {
 		iommu_flush_iotlb_psi(iommu, domain, start_pfn,
-				      last_pfn - start_pfn + 1, !freelist, 0);
+				      nrpages, !freelist, 0);
 		/* free iova */
 		__free_iova(&domain->iovad, iova);
 		dma_free_pagelist(freelist);
 	} else {
-		add_unmap(domain, iova, freelist);
+		add_unmap(domain, iova, nrpages, freelist);
 		/*
 		 * queue up the release of the unmap to save the 1/6th of the
 		 * cpu used up by the iotlb flush operation...
@@ -3576,7 +3581,7 @@ static void intel_unmap_page(struct device *dev, dma_addr_t dev_a
[PATCH 1/7] iommu: refactoring of deferred flush entries
From: Omer Peleg

Currently, deferred flushes' info is striped between several lists in
the flush tables. Instead, move all information about a specific flush
to a single entry in this table.

This patch does not introduce any functional change.

Signed-off-by: Omer Peleg
[m...@cs.technion.ac.il: rebased and reworded the commit message]
Signed-off-by: Adam Morrison
---
 drivers/iommu/intel-iommu.c | 48 +++--
 1 file changed, 29 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 16b243e..de7a9fc 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -455,15 +455,19 @@ static void flush_unmaps_timeout(unsigned long data);

 static DEFINE_TIMER(unmap_timer, flush_unmaps_timeout, 0, 0);

+struct deferred_flush_entry {
+	struct iova *iova;
+	struct dmar_domain *domain;
+	struct page *freelist;
+};
+
 #define HIGH_WATER_MARK 250
-struct deferred_flush_tables {
+struct deferred_flush_table {
 	int next;
-	struct iova *iova[HIGH_WATER_MARK];
-	struct dmar_domain *domain[HIGH_WATER_MARK];
-	struct page *freelist[HIGH_WATER_MARK];
+	struct deferred_flush_entry entries[HIGH_WATER_MARK];
 };

-static struct deferred_flush_tables *deferred_flush;
+static struct deferred_flush_table *deferred_flush;

 /* bitmap for indexing intel_iommus */
 static int g_num_of_iommus;
@@ -3039,7 +3043,7 @@ static int __init init_dmars(void)
 	}

 	deferred_flush = kzalloc(g_num_of_iommus *
-		sizeof(struct deferred_flush_tables), GFP_KERNEL);
+		sizeof(struct deferred_flush_table), GFP_KERNEL);
 	if (!deferred_flush) {
 		ret = -ENOMEM;
 		goto free_g_iommus;
 	}
@@ -3435,22 +3439,25 @@ static void flush_unmaps(void)
 					 DMA_TLB_GLOBAL_FLUSH);
 		for (j = 0; j < deferred_flush[i].next; j++) {
 			unsigned long mask;
-			struct iova *iova = deferred_flush[i].iova[j];
-			struct dmar_domain *domain = deferred_flush[i].domain[j];
+			struct deferred_flush_entry *entry =
+				&deferred_flush->entries[j];
+			struct iova *iova = entry->iova;
+			struct dmar_domain *domain = entry->domain;
+			struct page *freelist = entry->freelist;

 			/* On real hardware multiple invalidations are expensive */
 			if (cap_caching_mode(iommu->cap))
 				iommu_flush_iotlb_psi(iommu, domain,
 					iova->pfn_lo, iova_size(iova),
-					!deferred_flush[i].freelist[j], 0);
+					!freelist, 0);
 			else {
 				mask = ilog2(mm_to_dma_pfn(iova_size(iova)));
-				iommu_flush_dev_iotlb(deferred_flush[i].domain[j],
+				iommu_flush_dev_iotlb(domain,
 					(uint64_t)iova->pfn_lo << PAGE_SHIFT, mask);
 			}
-			__free_iova(&deferred_flush[i].domain[j]->iovad, iova);
-			if (deferred_flush[i].freelist[j])
-				dma_free_pagelist(deferred_flush[i].freelist[j]);
+			__free_iova(&domain->iovad, iova);
+			if (freelist)
+				dma_free_pagelist(freelist);
 		}
 		deferred_flush[i].next = 0;
 	}
@@ -3470,8 +3477,9 @@ static void flush_unmaps_timeout(unsigned long data)

 static void add_unmap(struct dmar_domain *dom, struct iova *iova, struct page *freelist)
 {
 	unsigned long flags;
-	int next, iommu_id;
+	int entry_id, iommu_id;
 	struct intel_iommu *iommu;
+	struct deferred_flush_entry *entry;

 	spin_lock_irqsave(&async_umap_flush_lock, flags);
 	if (list_size == HIGH_WATER_MARK)
@@ -3480,11 +3488,13 @@ static void add_unmap(struct dmar_domain *dom, struct iova *iova, struct page *f
 	iommu = domain_get_iommu(dom);
 	iommu_id = iommu->seq_id;

-	next = deferred_flush[iommu_id].next;
-	deferred_flush[iommu_id].domain[next] = dom;
-	deferred_flush[iommu_id].iova[next] = iova;
-	deferred_flush[iommu_id].freelist[next] = freelist;
-	deferred_flush[iommu_id].next++;
+	entry_id = deferred_flush[iommu_id].next;
+	++(deferred_flush[iommu_id].next);
+
+	entry = &deferred_flush[iommu_id].entries[entry_id];
+	entry->domain = dom;
+	entry->iova = iova;
+	entry->freelist = freelist;

 	if (!timer_on) {
 		mod_timer(&unmap_timer, jiffies + msecs_to_jiffies(10));
--
1.9.1
[PATCH 2/7] iommu: per-cpu deferred invalidation queues
From: Omer Peleg

The IOMMU's IOTLB invalidation is a costly process. When iommu mode is
not set to "strict", it is done asynchronously. Current code amortizes
the cost of invalidating IOTLB entries by batching all the invalidations
in the system and performing a single global invalidation instead. The
code queues pending invalidations in a global queue that is accessed
under the global "async_umap_flush_lock" spinlock, which can result in
significant spinlock contention.

This patch splits this deferred queue into multiple per-cpu deferred
queues, and thus gets rid of the "async_umap_flush_lock" and its
contention.

Signed-off-by: Omer Peleg
[m...@cs.technion.ac.il: rebased, cleaned up and reworded the commit message]
Signed-off-by: Adam Morrison
---
 drivers/iommu/intel-iommu.c | 105 +++-
 1 file changed, 64 insertions(+), 41 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index de7a9fc..7118ed3 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -453,8 +453,6 @@ static LIST_HEAD(dmar_rmrr_units);

 static void flush_unmaps_timeout(unsigned long data);

-static DEFINE_TIMER(unmap_timer, flush_unmaps_timeout, 0, 0);
-
 struct deferred_flush_entry {
 	struct iova *iova;
 	struct dmar_domain *domain;
@@ -467,17 +465,19 @@ struct deferred_flush_table {
 	struct deferred_flush_entry entries[HIGH_WATER_MARK];
 };

-static struct deferred_flush_table *deferred_flush;
+struct deferred_flush_data {
+	spinlock_t lock;
+	int timer_on;
+	struct timer_list timer;
+	long size;
+	struct deferred_flush_table *tables;
+};
+
+DEFINE_PER_CPU(struct deferred_flush_data, deferred_flush);

 /* bitmap for indexing intel_iommus */
 static int g_num_of_iommus;

-static DEFINE_SPINLOCK(async_umap_flush_lock);
-static LIST_HEAD(unmaps_to_do);
-
-static int timer_on;
-static long list_size;
-
 static void domain_exit(struct dmar_domain *domain);
 static void domain_remove_dev_info(struct dmar_domain *domain);
 static void dmar_remove_one_dev_info(struct dmar_domain *domain,
@@ -1869,8 +1869,12 @@ static void domain_exit(struct dmar_domain *domain)
 		return;

 	/* Flush any lazy unmaps that may reference this domain */
-	if (!intel_iommu_strict)
-		flush_unmaps_timeout(0);
+	if (!intel_iommu_strict) {
+		int cpu;
+
+		for_each_possible_cpu(cpu)
+			flush_unmaps_timeout(cpu);
+	}

 	/* Remove associated devices and clear attached or cached domains */
 	rcu_read_lock();
@@ -3009,7 +3013,7 @@ static int __init init_dmars(void)
 	bool copied_tables = false;
 	struct device *dev;
 	struct intel_iommu *iommu;
-	int i, ret;
+	int i, ret, cpu;

 	/*
 	 * for each drhd
@@ -3042,11 +3046,20 @@ static int __init init_dmars(void)
 		goto error;
 	}

-	deferred_flush = kzalloc(g_num_of_iommus *
-		sizeof(struct deferred_flush_table), GFP_KERNEL);
-	if (!deferred_flush) {
-		ret = -ENOMEM;
-		goto free_g_iommus;
+	for_each_possible_cpu(cpu) {
+		struct deferred_flush_data *dfd = per_cpu_ptr(&deferred_flush,
+							      cpu);
+
+		dfd->tables = kzalloc(g_num_of_iommus *
+				      sizeof(struct deferred_flush_table),
+				      GFP_KERNEL);
+		if (!dfd->tables) {
+			ret = -ENOMEM;
+			goto free_g_iommus;
+		}
+
+		spin_lock_init(&dfd->lock);
+		setup_timer(&dfd->timer, flush_unmaps_timeout, cpu);
 	}

 	for_each_active_iommu(iommu, drhd) {
@@ -3212,8 +3225,9 @@ free_iommu:
 		disable_dmar_iommu(iommu);
 		free_dmar_iommu(iommu);
 	}
-	kfree(deferred_flush);
 free_g_iommus:
+	for_each_possible_cpu(cpu)
+		kfree(per_cpu_ptr(&deferred_flush, cpu)->tables);
 	kfree(g_iommus);
 error:
 	return ret;
@@ -3418,29 +3432,31 @@ static dma_addr_t intel_map_page(struct device *dev, struct page *page,
 				  dir, *dev->dma_mask);
 }

-static void flush_unmaps(void)
+static void flush_unmaps(struct deferred_flush_data *flush_data)
 {
 	int i, j;

-	timer_on = 0;
+	flush_data->timer_on = 0;

 	/* just flush them all */
 	for (i = 0; i < g_num_of_iommus; i++) {
 		struct intel_iommu *iommu = g_iommus[i];
+		struct deferred_flush_table *flush_table =
+				&flush_data->tables[i];
 		if (!iommu)
 			continue;

-		if (!deferred_flush[i].next)
+		if (!flush_table->next)
 			continue;
[PATCH 0/7] Intel IOMMU scalability improvements
This patchset improves the scalability of the Intel IOMMU code by
resolving two spinlock bottlenecks, yielding up to ~10x performance
improvement and approaching iommu=off performance.

For example, here's the throughput obtained by 16 memcached instances
running on a 16-core Sandy Bridge system, accessed using memslap on
another machine that has iommu=off, using the default memslap config
(64-byte keys, 1024-byte values, and 10%/90% SET/GET ops):

	stock iommu=off:  1,088,996 memcached transactions/sec
			  (=100%, median of 10 runs).
	stock iommu=on:     123,760 memcached transactions/sec (=11%).
		[perf: 43.56% 0.86% memcached [kernel.kallsyms] [k] _raw_spin_lock_irqsave]
	patched iommu=on: 1,067,586 memcached transactions/sec (=98%).
		[perf:  0.75% 0.75% memcached [kernel.kallsyms] [k] _raw_spin_lock_irqsave]

The two resolved spinlocks:

 - Deferred IOTLB invalidations are batched in a global data structure
   and serialized under a spinlock (add_unmap() & flush_unmaps()); this
   patchset batches IOTLB invalidations in a per-CPU data structure.

 - IOVA management (alloc_iova() & __free_iova()) is serialized under
   the rbtree spinlock; this patchset adds per-CPU caches of allocated
   IOVAs so that the rbtree doesn't get accessed frequently. (Adding a
   cache above the existing IOVA allocator is less intrusive than
   dynamic identity mapping and helps keep IOMMU page table usage low;
   see Patch 7.)

The paper "Utilizing the IOMMU Scalably" (presented at the 2015 USENIX
Annual Technical Conference) contains many more details and experiments:

	https://www.usenix.org/system/files/conference/atc15/atc15-paper-peleg.pdf

Omer Peleg (7):
  iommu: refactoring of deferred flush entries
  iommu: per-cpu deferred invalidation queues
  iommu: correct flush_unmaps pfn usage
  iommu: only unmap mapped entries
  iommu: avoid dev iotlb logic in intel-iommu for domains with no dev iotlbs
  iommu: change intel-iommu to use IOVA frame numbers
  iommu: introduce per-cpu caching to iova allocation

 drivers/iommu/intel-iommu.c | 264 +-
 drivers/iommu/iova.c        | 334 +---
 include/linux/iova.h        |  23 ++-
 3 files changed, 470 insertions(+), 151 deletions(-)

--
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/ipmmu-vmsa: Don't truncate ttbr if LPAE is not enabled
On Tue, Dec 22, 2015 at 08:01:06PM +0100, Geert Uytterhoeven wrote:
>  static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
>  {
> -	phys_addr_t ttbr;
> +	u64 ttbr;
>  	u32 tmp;
>  	int ret;

Didn't apply cleanly to v4.4-rc7, but fixed it up and queued it to
iommu/fixes.
Re: [PATCH 1/3] iommu/dma: Add some missing #includes
On Fri, Dec 18, 2015 at 05:01:46PM +, Robin Murphy wrote:
> Hi Joerg,
>
> here are a couple more minor fixes for some obscure subtleties in the
> common DMA code. I see I've just missed the rc5 fixes pull, but I hope
> you can pick these up for rc6 (unless of course you're also just about
> to disappear for 2 weeks like I am).

Applied 1 and 3 to iommu/fixes, thanks.
Re: [PATCH 2/3] iommu/dma: Use correct offset in map_sg
On Fri, Dec 18, 2015 at 05:01:47PM +, Robin Murphy wrote:
> When mapping a non-page-aligned scatterlist entry, we copy the original
> offset to the output DMA address before aligning it to hand off to
> iommu_map_sg(), then later adding the IOVA page address portion to get
> the final mapped address. However, when the IOVA page size is smaller
> than the CPU page size, it is the offset within the IOVA page we want,
> not that within the CPU page, which can easily be larger than an IOVA
> page and thus result in an incorrect final address.
>
> Fix the bug by taking only the IOVA-aligned part of the offset as the
> basis of the DMA address, not the whole thing.
>
> Signed-off-by: Robin Murphy
> ---
>  drivers/iommu/dma-iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 982e716..03811e3 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -458,7 +458,7 @@ int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>  		size_t s_length = s->length;
>  		size_t pad_len = (mask - iova_len + 1) & mask;
>
> -		sg_dma_address(s) = s->offset;
> +		sg_dma_address(s) = s_offset;

Does not apply on v4.4-rc7.
Re: [PATCH 3/6] iommu/amd: Introduce amd_iommu_get_num_iommus()
On Tue, Dec 22, 2015 at 01:19:14PM -0600, Suthikulpanit, Suravee wrote:
> This patch introduces amd_iommu_get_num_iommus(). Initially, this is
> intended to be used by Perf AMD IOMMU driver.
>
> Signed-off-by: Suravee Suthikulpanit
> ---
>  drivers/iommu/amd_iommu_init.c            | 16
>  include/linux/perf/perf_event_amd_iommu.h |  2 ++
>  2 files changed, 18 insertions(+)
>
> diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> index 275c0f5..9c62613 100644
> --- a/drivers/iommu/amd_iommu_init.c
> +++ b/drivers/iommu/amd_iommu_init.c
> @@ -2244,6 +2244,22 @@ bool amd_iommu_v2_supported(void)
>  }
>  EXPORT_SYMBOL(amd_iommu_v2_supported);
>
> +static int amd_iommu_cnt;
> +
> +int amd_iommu_get_num_iommus(void)
> +{
> +	struct amd_iommu *iommu;
> +
> +	if (amd_iommu_cnt)
> +		return amd_iommu_cnt;
> +
> +	for_each_iommu(iommu)
> +		amd_iommu_cnt++;

It is better to set amd_iommu_cnt during IOMMU initialization. You can
just increment this value after an IOMMU has been set up.

	Joerg