[PATCH v3] iommu/vt-d: consider real PCI device when checking if mapping is needed
From: Jon Derrick

The PCI devices handled by intel-iommu may have a DMA requester on another bus, such as VMD subdevices needing to use the VMD endpoint. The real DMA device is now used for the DMA mapping, but one case was missed earlier: if the VMD device (and hence its subdevices too) are under IOMMU_DOMAIN_IDENTITY, mappings do not work.

Codepaths like intel_map_page() handle the IOMMU_DOMAIN_DMA case by creating an iommu DMA mapping, and fall back on dma_direct_map_page() for the IOMMU_DOMAIN_IDENTITY case. However, handling of the IDENTITY case is broken when intel_map_page() handles a subdevice.

We observe that at iommu attach time, dmar_insert_one_dev_info() for the subdevices will never set dev->archdata.iommu. This is because that function uses find_domain() to check if there is already an IOMMU for the device, and find_domain() then defers to the real DMA device, which does have one. Thus dmar_insert_one_dev_info() returns without assigning dev->archdata.iommu.

Then, later:

1. intel_map_page() checks if an IOMMU mapping is needed by calling
   iommu_need_mapping() on the subdevice. identity_mapping() returns
   false because dev->archdata.iommu is NULL, so this function returns
   true, indicating that mapping is needed.
2. __intel_map_single() is called to create the mapping.
3. __intel_map_single() calls find_domain(). This function now returns
   the IDENTITY domain corresponding to the real DMA device.
4. __intel_map_single() calls domain_get_iommu() on this "real" domain.
   A failure is hit and the entire operation is aborted, because this
   codepath is not intended to handle IDENTITY mappings:

	if (WARN_ON(domain->domain.type != IOMMU_DOMAIN_DMA))
		return NULL;

Fix this by using the real DMA device when checking whether a mapping is needed, while still considering the subdevice's DMA mask. The IDENTITY case will then directly fall back on dma_direct_map_page().
Reported-by: Daniel Drake
Fixes: b0140c69637e ("iommu/vt-d: Use pci_real_dma_dev() for mapping")
Signed-off-by: Jon Derrick
Signed-off-by: Daniel Drake
---
 drivers/iommu/intel-iommu.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 9dc37672bf89..7ffd252bf835 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3582,12 +3582,16 @@ static struct dmar_domain *get_private_domain_for_dev(struct device *dev)
 /* Check if the dev needs to go through non-identity map and unmap process.*/
 static bool iommu_need_mapping(struct device *dev)
 {
+	struct device *dma_dev = dev;
 	int ret;
 
 	if (iommu_dummy(dev))
 		return false;
 
-	ret = identity_mapping(dev);
+	if (dev_is_pci(dev))
+		dma_dev = &pci_real_dma_dev(to_pci_dev(dev))->dev;
+
+	ret = identity_mapping(dma_dev);
 	if (ret) {
 		u64 dma_mask = *dev->dma_mask;
@@ -3601,19 +3605,19 @@ static bool iommu_need_mapping(struct device *dev)
 		 * 32 bit DMA is removed from si_domain and fall back to
 		 * non-identity mapping.
 		 */
-		dmar_remove_one_dev_info(dev);
-		ret = iommu_request_dma_domain_for_dev(dev);
+		dmar_remove_one_dev_info(dma_dev);
+		ret = iommu_request_dma_domain_for_dev(dma_dev);
 		if (ret) {
 			struct iommu_domain *domain;
 			struct dmar_domain *dmar_domain;
 
-			domain = iommu_get_domain_for_dev(dev);
+			domain = iommu_get_domain_for_dev(dma_dev);
 			if (domain) {
 				dmar_domain = to_dmar_domain(domain);
 				dmar_domain->flags |= DOMAIN_FLAG_LOSE_CHILDREN;
 			}
-			dmar_remove_one_dev_info(dev);
-			get_private_domain_for_dev(dev);
+			dmar_remove_one_dev_info(dma_dev);
+			get_private_domain_for_dev(dma_dev);
 		}
 
 		dev_info(dev, "32bit DMA uses non-identity mapping\n");
-- 
2.20.1
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC v3 1/8] vfio: Add VFIO_IOMMU_PASID_REQUEST(alloc/free)
> From: Liu, Yi L
> Sent: Friday, January 31, 2020 8:41 PM
> To: Alex Williamson
> Subject: RE: [RFC v3 1/8] vfio: Add VFIO_IOMMU_PASID_REQUEST(alloc/free)
>
> > > +static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu,
> > > +				       unsigned int pasid)
> > > +{
> > > +	struct vfio_mm *vmm = iommu->vmm;
> > > +	int ret = 0;
> > > +
> > > +	mutex_lock(&iommu->lock);
> > > +	if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
> >
> > But we could have been IOMMU backed when the pasid was allocated, did
> > we just leak something? In fact, I didn't spot anything in this series
> > that handles a container with pasids allocated losing iommu backing.
> > I'd think we want to release all pasids when that happens since
> > permission for the user to hold pasids goes along with having an iommu
> > backed device.
>
> oh, yes. If a container loses its iommu backend, then we need to
> reclaim the allocated PASIDs, right? I'll add it. :-)

Hi Alex,

I went through the flow again. Maybe the current series has already covered it. There is vfio_mm, which is used to track allocated PASIDs. Its life cycle is type1 driver open and release. If I understand it correctly, type1 driver release happens when there are no more iommu backed groups in a container.

static void __vfio_group_unset_container(struct vfio_group *group)
{
	[...]
	/* Detaching the last group deprivileges a container, remove iommu */
	if (driver && list_empty(&container->group_list)) {
		driver->ops->release(container->iommu_data);
		module_put(driver->ops->owner);
		container->iommu_driver = NULL;
		container->iommu_data = NULL;
	}
	[...]
}

Regards,
Yi Liu
Re: [PATCH 3/5] iommu/vt-d: Do deferred attachment in iommu_need_mapping()
Hi Joerg,

Thanks for doing this.

On 2020/2/18 3:38, Joerg Roedel wrote:
> From: Joerg Roedel
>
> The attachment of deferred devices needs to happen before the check
> whether the device is identity mapped or not. Otherwise the check will
> return wrong results, causing warnings and boot failures in kdump
> kernels, like:
>
>   WARNING: CPU: 0 PID: 318 at ../drivers/iommu/intel-iommu.c:592 domain_get_iommu+0x61/0x70
>   [...]
>   Call Trace:
>    __intel_map_single+0x55/0x190
>    intel_alloc_coherent+0xac/0x110
>    dmam_alloc_attrs+0x50/0xa0
>    ahci_port_start+0xfb/0x1f0 [libahci]
>    ata_host_start.part.39+0x104/0x1e0 [libata]
>
> With the earlier check the kdump boot succeeds and a crashdump is
> written.
>
> Signed-off-by: Joerg Roedel
> ---
>  drivers/iommu/intel-iommu.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 42cdcce1602e..32f43695a22b 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -2541,9 +2541,6 @@ static void do_deferred_attach(struct device *dev)
>
>  static struct dmar_domain *deferred_attach_domain(struct device *dev)
>  {
> -	if (unlikely(attach_deferred(dev)))
> -		do_deferred_attach(dev);
> -

This should also be moved to the call place of deferred_attach_domain() in bounce_map_single().

bounce_map_single() assumes that devices always use a DMA domain, so it doesn't call iommu_need_mapping(). We could do_deferred_attach() there manually.

Best regards,
baolu

>  	return find_domain(dev);
>  }
>
> @@ -3595,6 +3592,9 @@ static bool iommu_need_mapping(struct device *dev)
>  	if (iommu_dummy(dev))
>  		return false;
>
> +	if (unlikely(attach_deferred(dev)))
> +		do_deferred_attach(dev);
> +
>  	ret = identity_mapping(dev);
>  	if (ret) {
>  		u64 dma_mask = *dev->dma_mask;
Re: [PATCH 5/5] iommu/vt-d: Simplify check in identity_mapping()
On Mon Feb 17 20, Joerg Roedel wrote:
> From: Joerg Roedel
>
> The function only has one call-site, and there it is never called with
> dummy or deferred devices. Simplify the check in the function to
> account for that.
>
> Signed-off-by: Joerg Roedel

Reviewed-by: Jerry Snitselaar
Re: [PATCH 3/5] iommu/vt-d: Do deferred attachment in iommu_need_mapping()
On Mon Feb 17 20, Joerg Roedel wrote:
> From: Joerg Roedel
>
> The attachment of deferred devices needs to happen before the check
> whether the device is identity mapped or not. Otherwise the check will
> return wrong results, causing warnings and boot failures in kdump
> kernels, like:
>
>   WARNING: CPU: 0 PID: 318 at ../drivers/iommu/intel-iommu.c:592 domain_get_iommu+0x61/0x70
>   [...]
>   Call Trace:
>    __intel_map_single+0x55/0x190
>    intel_alloc_coherent+0xac/0x110
>    dmam_alloc_attrs+0x50/0xa0
>    ahci_port_start+0xfb/0x1f0 [libahci]
>    ata_host_start.part.39+0x104/0x1e0 [libata]
>
> With the earlier check the kdump boot succeeds and a crashdump is
> written.
>
> Signed-off-by: Joerg Roedel

Reviewed-by: Jerry Snitselaar
Re: [PATCH 4/5] iommu/vt-d: Remove deferred_attach_domain()
On Mon Feb 17 20, Joerg Roedel wrote:
> From: Joerg Roedel
>
> The function is now only a wrapper around find_domain(). Remove the
> function and call find_domain() directly at the call-sites.
>
> Signed-off-by: Joerg Roedel

Reviewed-by: Jerry Snitselaar
Re: [PATCH 2/5] iommu/vt-d: Move deferred device attachment into helper function
On Mon Feb 17 20, Joerg Roedel wrote:
> From: Joerg Roedel
>
> Move the code that does the deferred device attachment into a separate
> helper function.
>
> Signed-off-by: Joerg Roedel

Reviewed-by: Jerry Snitselaar
Re: [PATCH 1/5] iommu/vt-d: Add attach_deferred() helper
On Mon Feb 17 20, Joerg Roedel wrote:
> From: Joerg Roedel
>
> Implement a helper function to check whether a device's attach process
> is deferred.
>
> Signed-off-by: Joerg Roedel

Reviewed-by: Jerry Snitselaar
[PATCH 4/5] iommu/vt-d: Remove deferred_attach_domain()
From: Joerg Roedel

The function is now only a wrapper around find_domain(). Remove the
function and call find_domain() directly at the call-sites.

Signed-off-by: Joerg Roedel
---
 drivers/iommu/intel-iommu.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 32f43695a22b..16e47ca505eb 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2539,11 +2539,6 @@ static void do_deferred_attach(struct device *dev)
 	intel_iommu_attach_device(domain, dev);
 }
 
-static struct dmar_domain *deferred_attach_domain(struct device *dev)
-{
-	return find_domain(dev);
-}
-
 static inline struct device_domain_info *
 dmar_search_domain_by_dev_info(int segment, int bus, int devfn)
 {
@@ -3643,7 +3638,7 @@ static dma_addr_t __intel_map_single(struct device *dev, phys_addr_t paddr,
 
 	BUG_ON(dir == DMA_NONE);
 
-	domain = deferred_attach_domain(dev);
+	domain = find_domain(dev);
 	if (!domain)
 		return DMA_MAPPING_ERROR;
 
@@ -3863,7 +3858,7 @@ static int intel_map_sg(struct device *dev, struct scatterlist *sglist, int nelems,
 	if (!iommu_need_mapping(dev))
 		return dma_direct_map_sg(dev, sglist, nelems, dir, attrs);
 
-	domain = deferred_attach_domain(dev);
+	domain = find_domain(dev);
 	if (!domain)
 		return 0;
 
@@ -3958,7 +3953,7 @@ bounce_map_single(struct device *dev, phys_addr_t paddr, size_t size,
 	int prot = 0;
 	int ret;
 
-	domain = deferred_attach_domain(dev);
+	domain = find_domain(dev);
 	if (WARN_ON(dir == DMA_NONE || !domain))
 		return DMA_MAPPING_ERROR;
 
-- 
2.17.1
[PATCH 3/5] iommu/vt-d: Do deferred attachment in iommu_need_mapping()
From: Joerg Roedel

The attachment of deferred devices needs to happen before the check
whether the device is identity mapped or not. Otherwise the check will
return wrong results, causing warnings and boot failures in kdump
kernels, like:

  WARNING: CPU: 0 PID: 318 at ../drivers/iommu/intel-iommu.c:592 domain_get_iommu+0x61/0x70
  [...]
  Call Trace:
   __intel_map_single+0x55/0x190
   intel_alloc_coherent+0xac/0x110
   dmam_alloc_attrs+0x50/0xa0
   ahci_port_start+0xfb/0x1f0 [libahci]
   ata_host_start.part.39+0x104/0x1e0 [libata]

With the earlier check the kdump boot succeeds and a crashdump is
written.

Signed-off-by: Joerg Roedel
---
 drivers/iommu/intel-iommu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 42cdcce1602e..32f43695a22b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2541,9 +2541,6 @@ static void do_deferred_attach(struct device *dev)
 
 static struct dmar_domain *deferred_attach_domain(struct device *dev)
 {
-	if (unlikely(attach_deferred(dev)))
-		do_deferred_attach(dev);
-
 	return find_domain(dev);
 }
 
@@ -3595,6 +3592,9 @@ static bool iommu_need_mapping(struct device *dev)
 	if (iommu_dummy(dev))
 		return false;
 
+	if (unlikely(attach_deferred(dev)))
+		do_deferred_attach(dev);
+
 	ret = identity_mapping(dev);
 	if (ret) {
 		u64 dma_mask = *dev->dma_mask;
-- 
2.17.1
[PATCH 2/5] iommu/vt-d: Move deferred device attachment into helper function
From: Joerg Roedel

Move the code that does the deferred device attachment into a separate
helper function.

Signed-off-by: Joerg Roedel
---
 drivers/iommu/intel-iommu.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 80f2332a5466..42cdcce1602e 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2529,16 +2529,20 @@ struct dmar_domain *find_domain(struct device *dev)
 	return NULL;
 }
 
-static struct dmar_domain *deferred_attach_domain(struct device *dev)
+static void do_deferred_attach(struct device *dev)
 {
-	if (unlikely(attach_deferred(dev))) {
-		struct iommu_domain *domain;
+	struct iommu_domain *domain;
 
-		dev->archdata.iommu = NULL;
-		domain = iommu_get_domain_for_dev(dev);
-		if (domain)
-			intel_iommu_attach_device(domain, dev);
-	}
+	dev->archdata.iommu = NULL;
+	domain = iommu_get_domain_for_dev(dev);
+	if (domain)
+		intel_iommu_attach_device(domain, dev);
+}
+
+static struct dmar_domain *deferred_attach_domain(struct device *dev)
+{
+	if (unlikely(attach_deferred(dev)))
+		do_deferred_attach(dev);
 
 	return find_domain(dev);
 }
-- 
2.17.1
[PATCH 0/5] iommu/vt-d: Fix kdump boot with VT-d enabled
Hi,

booting into a crashdump kernel with the Intel IOMMU enabled and
configured into passthrough mode does not succeed with the current
kernel. The reason is that the check for identity mappings happens
before the check for deferred device attachments. That results in wrong
results returned from iommu_need_mapping() and subsequently in a wrong
domain-type used in __intel_map_single(). A stripped oops is in the
commit message of patch 3.

The patch-set fixes the issue and does a few code cleanups along the
way. I have not yet researched the stable and fixes tags, but when the
patches are fine I will add the tags before applying the patches.

Please review.

Thanks,

	Joerg

Joerg Roedel (5):
  iommu/vt-d: Add attach_deferred() helper
  iommu/vt-d: Move deferred device attachment into helper function
  iommu/vt-d: Do deferred attachment in iommu_need_mapping()
  iommu/vt-d: Remove deferred_attach_domain()
  iommu/vt-d: Simplify check in identity_mapping()

 drivers/iommu/intel-iommu.c | 37 ++++++++++++++++++-------------------
 1 file changed, 20 insertions(+), 17 deletions(-)

-- 
2.17.1
[PATCH 5/5] iommu/vt-d: Simplify check in identity_mapping()
From: Joerg Roedel

The function only has one call-site, and there it is never called with
dummy or deferred devices. Simplify the check in the function to
account for that.

Signed-off-by: Joerg Roedel
---
 drivers/iommu/intel-iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 16e47ca505eb..0b5a0fadbc0c 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2916,7 +2916,7 @@ static int identity_mapping(struct device *dev)
 	struct device_domain_info *info;
 
 	info = dev->archdata.iommu;
-	if (info && info != DUMMY_DEVICE_DOMAIN_INFO && info != DEFER_DEVICE_DOMAIN_INFO)
+	if (info)
 		return (info->domain == si_domain);
 
 	return 0;
-- 
2.17.1
[PATCH 1/5] iommu/vt-d: Add attach_deferred() helper
From: Joerg Roedel

Implement a helper function to check whether a device's attach process
is deferred.

Signed-off-by: Joerg Roedel
---
 drivers/iommu/intel-iommu.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 9dc37672bf89..80f2332a5466 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -762,6 +762,11 @@ static int iommu_dummy(struct device *dev)
 	return dev->archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO;
 }
 
+static bool attach_deferred(struct device *dev)
+{
+	return dev->archdata.iommu == DEFER_DEVICE_DOMAIN_INFO;
+}
+
 /**
  * is_downstream_to_pci_bridge - test if a device belongs to the PCI
  *				 sub-hierarchy of a candidate PCI-PCI bridge
@@ -2510,8 +2515,7 @@ struct dmar_domain *find_domain(struct device *dev)
 {
 	struct device_domain_info *info;
 
-	if (unlikely(dev->archdata.iommu == DEFER_DEVICE_DOMAIN_INFO ||
-		     dev->archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO))
+	if (unlikely(attach_deferred(dev) || iommu_dummy(dev)))
 		return NULL;
 
 	if (dev_is_pci(dev))
@@ -2527,7 +2531,7 @@ struct dmar_domain *find_domain(struct device *dev)
 
 static struct dmar_domain *deferred_attach_domain(struct device *dev)
 {
-	if (unlikely(dev->archdata.iommu == DEFER_DEVICE_DOMAIN_INFO)) {
+	if (unlikely(attach_deferred(dev))) {
 		struct iommu_domain *domain;
 
 		dev->archdata.iommu = NULL;
@@ -6133,7 +6137,7 @@ intel_iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
 static bool intel_iommu_is_attach_deferred(struct iommu_domain *domain,
 					   struct device *dev)
 {
-	return dev->archdata.iommu == DEFER_DEVICE_DOMAIN_INFO;
+	return attach_deferred(dev);
 }
 
 static int
-- 
2.17.1
Re: [PATCH v2] iommu/arm-smmu-v3: Add SMMUv3.2 range invalidation support
On 13/02/2020 9:49 pm, Rob Herring wrote:
> On Thu, Jan 30, 2020 at 11:34 AM Robin Murphy wrote:
>> On 30/01/2020 3:06 pm, Auger Eric wrote:
>>> Hi Rob,
>>>
>>> On 1/17/20 10:16 PM, Rob Herring wrote:
>>>> Arm SMMUv3.2 adds support for TLB range invalidate operations.
>>>> Support for range invalidate is determined by the RIL bit in the
>>>> IDR3 register. The range invalidate is in units of the leaf page
>>>> size and operates on 1-32 chunks of a power of 2 multiple pages.
>>>> First, we determine from the size what power of 2 multiple we can
>>>> use. Then we calculate how many chunks (1-31) of the power of 2 size
>>>> for the range on the iteration. On each iteration, we move up in
>>>> size by at least 5 bits.
>>>>
>>>> Cc: Eric Auger
>>>> Cc: Jean-Philippe Brucker
>>>> Cc: Will Deacon
>>>> Cc: Robin Murphy
>>>> Cc: Joerg Roedel
>>>> Signed-off-by: Rob Herring
>>>>
>>>> @@ -2003,7 +2024,7 @@ static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
>>>>  {
>>>>  	u64 cmds[CMDQ_BATCH_ENTRIES * CMDQ_ENT_DWORDS];
>>>>  	struct arm_smmu_device *smmu = smmu_domain->smmu;
>>>> -	unsigned long start = iova, end = iova + size;
>>>> +	unsigned long start = iova, end = iova + size, num_pages = 0, tg = 0;
>>>>  	int i = 0;
>>>>  	struct arm_smmu_cmdq_ent cmd = {
>>>>  		.tlbi = {
>>>> @@ -2022,12 +2043,50 @@ static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
>>>>  		cmd.tlbi.vmid = smmu_domain->s2_cfg.vmid;
>>>>  	}
>>>>
>>>> +	if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
>>>> +		/* Get the leaf page size */
>>>> +		tg = __ffs(smmu_domain->domain.pgsize_bitmap);
>>>> +
>>>> +		/* Convert page size of 12,14,16 (log2) to 1,2,3 */
>>>> +		cmd.tlbi.tg = ((tg - ilog2(SZ_4K)) / 2) + 1;
>>
>> Given the comment, I think "(tg - 10) / 2" would suffice ;)
>
> Well, duh...
>
>>>> +
>>>> +		/* Determine what level the granule is at */
>>>> +		cmd.tlbi.ttl = 4 - ((ilog2(granule) - 3) / (tg - 3));
>>
>> Is it possible to rephrase that with logs and shifts to avoid the
>> division?
>
> Well, with a loop is the only other way I came up with:
>
> 	bpl = tg - 3;
> 	ttl = 3;
> 	mask = BIT(bpl) - 1;
> 	while ((granule & (mask << ((4 - ttl) * bpl + 3))) == 0)
> 		ttl--;
>
> Doesn't seem like much improvement to me given we have h/w divide...

Sure, it doesn't take too many extra instructions to start matching typical IDIV latency, so there's no point being silly if there really isn't a clean alternative. Some quick scribbling suggests "4 - ilog2(granule) / 10" might actually be close enough, but perhaps that's a bit too cheeky.

>>>> +
>>>> +		num_pages = size / (1UL << tg);
>>
>> Similarly, in my experience GCC has always seemed too cautious to
>> elide non-constant division even in a seemingly-obvious case like
>> this, so explicit "size >> tg" might be preferable.
>
> My experience has been gcc is smart enough. I checked and there's only
> one divide and it is the prior one. But I'll change it anyways.

Now that I think about it, the case that frustrated me may have had the shift and divide in separate statements - that's probably a lot harder to analyse than a single expression. Either way, the simple right shift is easier to read IMO.

>>>> +	}
>>>> +
>>>>  	while (iova < end) {
>>>>  		if (i == CMDQ_BATCH_ENTRIES) {
>>>>  			arm_smmu_cmdq_issue_cmdlist(smmu, cmds, i, false);
>>>>  			i = 0;
>>>>  		}
>>>>
>>>> +		if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
>>>> +			/*
>>>> +			 * On each iteration of the loop, the range is 5 bits
>>>> +			 * worth of the aligned size remaining.
>>>> +			 * The range in pages is:
>>>> +			 *
>>>> +			 * range = (num_pages & (0x1f << __ffs(num_pages)))
>>>> +			 */
>>>> +			unsigned long scale, num;
>>>> +
>>>> +			/* Determine the power of 2 multiple number of pages */
>>>> +			scale = __ffs(num_pages);
>>>> +			cmd.tlbi.scale = scale;
>>>> +
>>>> +			/* Determine how many chunks of 2^scale size we have */
>>>> +			num = (num_pages >> scale) & CMDQ_TLBI_RANGE_NUM_MAX;
>>>> +			cmd.tlbi.num = num - 1;
>>>> +
>>>> +			/* range is num * 2^scale * pgsize */
>>>> +			granule = num << (scale + tg);
>>>> +
>>>> +			/* Clear out the lower order bits for the next iteration */
>>>> +			num_pages -= num << scale;
>>>
>>> Regarding the 2 options given in
>>> https://lore.kernel.org/linux-arm-kernel/CAL_JsqKABoE+0crGwyZdNogNgEoG=moopf6deqgh6s73c0u...@mail.gmail.com/raw,
>>> I understand you implemented 2) but I still do not understand why you
>>> preferred that one against 1).
>>>
>>> In your case of 1023*4k pages this will invalidate by 31*32*2^0*4K +
>>> 31*2^0*4K pages whereas you could achieve that with 10 invalidations
>>> with the 1st algo. I did not get the case where it is more efficient.
>>> Please can you detail.

I guess essentially we want to solve for two variables to get a range as close to size as possible. There might be a more elegant numerical method, but for the n
Re: [PATCH v2] iommu/arm-smmu-v3: Batch ATC invalidation commands
On Thu, Feb 13, 2020 at 02:56:00PM -0600, Rob Herring wrote:
> Similar to commit 2af2e72b18b4 ("iommu/arm-smmu-v3: Defer TLB
> invalidation until ->iotlb_sync()"), build up a list of ATC
> invalidation commands and submit them all at once to the command queue
> instead of one-by-one.
>
> As there is only one caller of arm_smmu_atc_inv_master() left, we can
> simplify it and avoid passing in struct arm_smmu_cmdq_ent.
>
> Cc: Jean-Philippe Brucker
> Cc: Will Deacon
> Cc: Robin Murphy
> Cc: Joerg Roedel
> Signed-off-by: Rob Herring

Reviewed-by: Jean-Philippe Brucker

Since I'm adding a third user of cmdq batching [1], I had a go at
factoring them. I can send the attached patch with my next version, if
it looks OK.

Thanks,
Jean

[1] https://lore.kernel.org/linux-iommu/20200213101435.229932-4-jean-phili...@linaro.org/

From b304f322e6293be4ec8b5a01e2ef67e8fa34143c Mon Sep 17 00:00:00 2001
From: Jean-Philippe Brucker
Date: Mon, 17 Feb 2020 17:42:54 +0100
Subject: [PATCH] iommu/arm-smmu-v3: Factor command queue batching

Factor the code for command queue batching, which is now repeated three
times for TLB, ATC and CFG invalidations.

Signed-off-by: Jean-Philippe Brucker
---
 drivers/iommu/arm-smmu-v3.c | 66 +++++++++++++++++++------------------
 1 file changed, 34 insertions(+), 32 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 45da5c251b65..04c3077589be 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -548,6 +548,11 @@ struct arm_smmu_cmdq {
 	atomic_t		lock;
 };
 
+struct arm_smmu_cmdq_batch {
+	u64			cmds[CMDQ_BATCH_ENTRIES * CMDQ_ENT_DWORDS];
+	int			num;
+};
+
 struct arm_smmu_evtq {
 	struct arm_smmu_queue		q;
 	u32				max_stalls;
@@ -1482,15 +1487,33 @@ static int arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
 	return arm_smmu_cmdq_issue_cmdlist(smmu, NULL, 0, true);
 }
 
+static void arm_smmu_cmdq_batch_add(struct arm_smmu_device *smmu,
+				    struct arm_smmu_cmdq_batch *cmds,
+				    struct arm_smmu_cmdq_ent *cmd)
+{
+	if (cmds->num == CMDQ_BATCH_ENTRIES) {
+		arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmds, cmds->num, false);
+		cmds->num = 0;
+	}
+	arm_smmu_cmdq_build_cmd(&cmds->cmds[cmds->num * CMDQ_ENT_DWORDS], cmd);
+	cmds->num++;
+}
+
+static int arm_smmu_cmdq_batch_submit(struct arm_smmu_device *smmu,
+				      struct arm_smmu_cmdq_batch *cmds)
+{
+	return arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmds, cmds->num, true);
+}
+
 /* Context descriptor manipulation functions */
 static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain,
			     int ssid, bool leaf)
 {
 	size_t i;
-	int cmdn = 0;
 	unsigned long flags;
 	struct arm_smmu_master *master;
-	u64 cmds[CMDQ_BATCH_ENTRIES * CMDQ_ENT_DWORDS];
+	struct arm_smmu_cmdq_batch cmds = {};
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_cmdq_ent cmd = {
 		.opcode	= CMDQ_OP_CFGI_CD,
@@ -1503,19 +1526,13 @@ static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain,
 	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
 	list_for_each_entry(master, &smmu_domain->devices, domain_head) {
 		for (i = 0; i < master->num_sids; i++) {
-			if (cmdn == CMDQ_BATCH_ENTRIES) {
-				arm_smmu_cmdq_issue_cmdlist(smmu, cmds, cmdn, false);
-				cmdn = 0;
-			}
-
 			cmd.cfgi.sid = master->sids[i];
-			arm_smmu_cmdq_build_cmd(&cmds[cmdn * CMDQ_ENT_DWORDS], &cmd);
-			cmdn++;
+			arm_smmu_cmdq_batch_add(smmu, &cmds, &cmd);
 		}
 	}
 	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
 
-	arm_smmu_cmdq_issue_cmdlist(smmu, cmds, cmdn, true);
+	arm_smmu_cmdq_batch_submit(smmu, &cmds);
 }
 
 static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu,
@@ -2160,11 +2177,11 @@ static int arm_smmu_atc_inv_master(struct arm_smmu_master *master)
 static int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
 				   int ssid, unsigned long iova, size_t size)
 {
-	int i, cmdn = 0;
+	int i;
 	unsigned long flags;
 	struct arm_smmu_cmdq_ent cmd;
 	struct arm_smmu_master *master;
-	u64 cmds[CMDQ_BATCH_ENTRIES * CMDQ_ENT_DWORDS];
+	struct arm_smmu_cmdq_batch cmds = {};
 
 	if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS))
 		return 0;
@@ -2194,20 +2211,13 @@ static int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
Re: [RFC PATCH] iommu/iova: Support limiting IOVA alignment
On 14/02/2020 8:30 pm, Liam Mark wrote:
> When the IOVA framework applies IOVA alignment it aligns all IOVAs to
> the smallest PAGE_SIZE order which is greater than or equal to the
> requested IOVA size.
>
> We support use cases that require large buffers (> 64 MB in size) to
> be allocated and mapped in their stage 1 page tables.
>
> However, with this alignment scheme we find ourselves running out of
> IOVA space for 32 bit devices, so we are proposing this config, along
> the similar vein as CONFIG_CMA_ALIGNMENT for CMA allocations.

As per [1], I'd really like to better understand the allocation patterns that lead to such a sparsely-occupied address space to begin with, given that the rbtree allocator is supposed to try to maintain locality as far as possible, and the rcaches should further improve on that. Are you also frequently cycling intermediate-sized buffers which are smaller than 64MB but still too big to be cached? Are there a lot of non-power-of-two allocations?

> Add CONFIG_IOMMU_LIMIT_IOVA_ALIGNMENT to limit the alignment of IOVAs
> to some desired PAGE_SIZE order, specified by
> CONFIG_IOMMU_IOVA_ALIGNMENT. This helps reduce the impact of
> fragmentation caused by the current IOVA alignment scheme, and gives
> better IOVA space utilization.

Even if the general change did prove reasonable, this IOVA allocator is not owned by the DMA API, so entirely removing the option of strict size-alignment feels a bit uncomfortable. Personally I'd replace the bool argument with an actual alignment value, to at least hand the authority out to individual callers.

Furthermore, even in DMA API terms, is anyone really ever going to bother tuning that config? Since iommu-dma is supposed to be a transparent layer, arguably it shouldn't behave unnecessarily differently from CMA, so simply piggy-backing off CONFIG_CMA_ALIGNMENT would seem logical.

Robin.

[1] https://lore.kernel.org/linux-iommu/1581721602-17010-1-git-send-email-isa...@codeaurora.org/

> Signed-off-by: Liam Mark
> ---
>  drivers/iommu/Kconfig | 31 +++++++++++++++++++++++++++++++
>  drivers/iommu/iova.c  | 20 +++++++++++++++++++-
>  2 files changed, 50 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index d2fade984999..9684a153cc72 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -3,6 +3,37 @@
>  config IOMMU_IOVA
>  	tristate
>
> +if IOMMU_IOVA
> +
> +config IOMMU_LIMIT_IOVA_ALIGNMENT
> +	bool "Limit IOVA alignment"
> +	help
> +	  When the IOVA framework applies IOVA alignment it aligns all
> +	  IOVAs to the smallest PAGE_SIZE order which is greater than or
> +	  equal to the requested IOVA size. This works fine for sizes up
> +	  to several MiB, but for larger sizes it results in address
> +	  space wastage and fragmentation. For example drivers with a 4
> +	  GiB IOVA space might run out of IOVA space when allocating
> +	  buffers greater than 64 MiB.
> +
> +	  Enable this option to impose a limit on the alignment of IOVAs.
> +
> +	  If unsure, say N.
> +
> +config IOMMU_IOVA_ALIGNMENT
> +	int "Maximum PAGE_SIZE order of alignment for IOVAs"
> +	depends on IOMMU_LIMIT_IOVA_ALIGNMENT
> +	range 4 9
> +	default 9
> +	help
> +	  With this parameter you can specify the maximum PAGE_SIZE order
> +	  for IOVAs. Larger IOVAs will be aligned only to this specified
> +	  order. The order is expressed as a power of two multiplied by
> +	  the PAGE_SIZE.
> +
> +	  If unsure, leave the default value "9".
> +
> +endif
> +
>  # The IOASID library may also be used by non-IOMMU_API users
>  config IOASID
>  	tristate
>
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index 0e6a9536eca6..259884c8dbd1 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -177,6 +177,24 @@ int init_iova_flush_queue(struct iova_domain *iovad,
>  	rb_insert_color(&iova->node, root);
>  }
>
> +#ifdef CONFIG_IOMMU_LIMIT_IOVA_ALIGNMENT
> +static unsigned long limit_align_shift(struct iova_domain *iovad,
> +				       unsigned long shift)
> +{
> +	unsigned long max_align_shift;
> +
> +	max_align_shift = CONFIG_IOMMU_IOVA_ALIGNMENT + PAGE_SHIFT
> +			  - iova_shift(iovad);
> +	return min_t(unsigned long, max_align_shift, shift);
> +}
> +#else
> +static unsigned long limit_align_shift(struct iova_domain *iovad,
> +				       unsigned long shift)
> +{
> +	return shift;
> +}
> +#endif
> +
>  static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
>  		unsigned long size, unsigned long limit_pfn,
>  		struct iova *new, bool size_aligned)
> @@ -188,7 +206,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
>  	unsigned long align_mask = ~0UL;
>
>  	if (size_aligned)
> -		align_mask <<= fls_long(size - 1);
> +		align_mask <<= limit_align_shift(iovad, fls_long(size - 1));
>
>  	/* Walk the tree backwards */
>  	spin_
Re: [RFC PATCH] iommu/iova: Add a best-fit algorithm
On 14/02/2020 11:06 pm, Isaac J. Manjarres wrote: From: Liam Mark Using the best-fit algorithm, instead of the first-fit algorithm, may reduce fragmentation when allocating IOVAs. What kind of pathological allocation patterns make that a serious problem? Is there any scope for simply changing the order of things in the callers? Do these drivers also run under other DMA API backends (e.g. 32-bit Arm)? More generally, if a driver knows enough to want to second-guess a generic DMA API allocator, that's a reasonable argument that it should perhaps be properly IOMMU-aware and managing its own address space anyway. Perhaps this effort might be better spent finishing off the DMA ops bypass stuff to make that approach more robust and welcoming. Robin. Signed-off-by: Isaac J. Manjarres --- drivers/iommu/dma-iommu.c | 17 +++ drivers/iommu/iova.c | 73 +-- include/linux/dma-iommu.h | 7 + include/linux/iova.h | 1 + 4 files changed, 96 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index a2e96a5..af08770 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -364,9 +364,26 @@ static int iommu_dma_deferred_attach(struct device *dev, if (unlikely(ops->is_attach_deferred && ops->is_attach_deferred(domain, dev))) return iommu_attach_device(domain, dev); + return 0; +} + +/* + * Should be called prior to using dma-apis. 
+ */
+int iommu_dma_enable_best_fit_algo(struct device *dev)
+{
+	struct iommu_domain *domain;
+	struct iova_domain *iovad;
+
+	domain = iommu_get_domain_for_dev(dev);
+	if (!domain || !domain->iova_cookie)
+		return -EINVAL;
+	iovad = &((struct iommu_dma_cookie *)domain->iova_cookie)->iovad;
+	iovad->best_fit = true;
 	return 0;
 }
+EXPORT_SYMBOL(iommu_dma_enable_best_fit_algo);
 
 /**
  * dma_info_to_prot - Translate DMA API directions and attributes to IOMMU API
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 0e6a953..716b05f 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -50,6 +50,7 @@ static unsigned long iova_rcache_get(struct iova_domain *iovad,
 	iovad->anchor.pfn_lo = iovad->anchor.pfn_hi = IOVA_ANCHOR;
 	rb_link_node(&iovad->anchor.node, NULL, &iovad->rbroot.rb_node);
 	rb_insert_color(&iovad->anchor.node, &iovad->rbroot);
+	iovad->best_fit = false;
 	init_iova_rcaches(iovad);
 }
 EXPORT_SYMBOL_GPL(init_iova_domain);
@@ -227,6 +228,69 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
 	return -ENOMEM;
 }
 
+static int __alloc_and_insert_iova_best_fit(struct iova_domain *iovad,
+		unsigned long size, unsigned long limit_pfn,
+		struct iova *new, bool size_aligned)
+{
+	struct rb_node *curr, *prev;
+	struct iova *curr_iova, *prev_iova;
+	unsigned long flags;
+	unsigned long align_mask = ~0UL;
+	struct rb_node *candidate_rb_parent;
+	unsigned long new_pfn, candidate_pfn = ~0UL;
+	unsigned long gap, candidate_gap = ~0UL;
+
+	if (size_aligned)
+		align_mask <<= limit_align(iovad, fls_long(size - 1));
+
+	/* Walk the tree backwards */
+	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
+	curr = &iovad->anchor.node;
+	prev = rb_prev(curr);
+	for (; prev; curr = prev, prev = rb_prev(curr)) {
+		curr_iova = rb_entry(curr, struct iova, node);
+		prev_iova = rb_entry(prev, struct iova, node);
+
+		limit_pfn = min(limit_pfn, curr_iova->pfn_lo);
+		new_pfn = (limit_pfn - size) & align_mask;
+		gap = curr_iova->pfn_lo - prev_iova->pfn_hi - 1;
+
+		if ((limit_pfn >= size) && (new_pfn > prev_iova->pfn_hi)
+				&& (gap < candidate_gap)) {
+			candidate_gap = gap;
+			candidate_pfn = new_pfn;
+			candidate_rb_parent = curr;
+			if (gap == size)
+				goto insert;
+		}
+	}
+
+	curr_iova = rb_entry(curr, struct iova, node);
+	limit_pfn = min(limit_pfn, curr_iova->pfn_lo);
+	new_pfn = (limit_pfn - size) & align_mask;
+	gap = curr_iova->pfn_lo - iovad->start_pfn;
+	if (limit_pfn >= size && new_pfn >= iovad->start_pfn &&
+			gap < candidate_gap) {
+		candidate_gap = gap;
+		candidate_pfn = new_pfn;
+		candidate_rb_parent = curr;
+	}
+
+insert:
+	if (candidate_pfn == ~0UL) {
+		spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+		return -ENOMEM;
+	}
+
+	/* pfn_lo will point to size aligned address if size_aligned is set */
+	new->pfn_lo = candidate_pfn;
+
Re: [RFC PATCH] iommu/dma: Allow drivers to reserve an iova range
On 17/02/2020 8:01 am, Christoph Hellwig wrote: On Fri, Feb 14, 2020 at 02:58:16PM -0800, Isaac J. Manjarres wrote: From: Liam Mark Some devices have a memory map which contains gaps or holes. In order for the device to have as much IOVA space as possible, allow its driver to inform the DMA-IOMMU layer that it should not allocate addresses from these holes. Layering violation. dma-iommu is the translation layer between the DMA API and the IOMMU API. And calls into it from drivers performing DMA mappings need to go through the DMA API (and be documented there). +1 More than that, though, we already have "holes in the address space" support for the sake of PCI host bridge windows - assuming this is the same kind of thing (i.e. the holes are between memory regions and other resources in PA space, so are only relevant once address translation comes into the picture), then this is IOMMU API level stuff, so even a DMA API level interface would be inappropriate. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 02/11] PCI: Add ats_supported host bridge flag
On Sat, Feb 15, 2020 at 03:10:47PM -0600, Bjorn Helgaas wrote: > On Thu, Feb 13, 2020 at 05:50:40PM +0100, Jean-Philippe Brucker wrote: > > Each vendor has their own way of describing whether a host bridge > > supports ATS. The Intel and AMD ACPI tables selectively enable or > > disable ATS per device or sub-tree, while Arm has a single bit for each > > host bridge. For those that need it, add an ats_supported bit to the > > host bridge structure. > > > > Signed-off-by: Jean-Philippe Brucker > > --- > > drivers/pci/probe.c | 7 +++ > > include/linux/pci.h | 1 + > > 2 files changed, 8 insertions(+) > > > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > > index 512cb4312ddd..75c0a25af44e 100644 > > --- a/drivers/pci/probe.c > > +++ b/drivers/pci/probe.c > > @@ -598,6 +598,13 @@ static void pci_init_host_bridge(struct > > pci_host_bridge *bridge) > > bridge->native_shpc_hotplug = 1; > > bridge->native_pme = 1; > > bridge->native_ltr = 1; > > + > > + /* > > +* Some systems may disable ATS at the host bridge (ACPI IORT, > > +* device-tree), other filter it with a smaller granularity (ACPI DMAR > > +* and IVRS). > > +*/ > > + bridge->ats_supported = 1; > > The cover letter says it's important to enable ATS only if the host > bridge supports it. From the other patches, it looks like we learn if > the host bridge supports ATS from either a DT "ats-supported" property > or an ACPI IORT table. If that's the case, shouldn't the default here > be "ATS is *not* supported"? The ACPI IVRS table (AMD) doesn't have a property for the host bridge, it can only deselect ATS for a sub-range of devices. Similarly the DMAR table (Intel) declares that ATS is supported either by the whole PCIe domain or for sub-ranges of devices. I selected ats_supported at the bridge by default since IVRS needs it and DMAR has its own fine-grained ATS support configuration. 
I'm still not sure this is the right approach, given that the ats_supported bridge property doesn't exactly correspond to a firmware property on all platforms. Maybe the device-tree implementation should follow the IORT one where each device carries a fwspec property stating "root-complex supports ATS". But it isn't nice either so I tried a cleaner implementation (as discussed with Robin back on the ATS-with-SMMUv3 series [1]). Thanks, Jean [1] https://lore.kernel.org/linux-iommu/c10c7adb-c7f6-f8c6-05cc-f4f143427...@arm.com/
Re: arm-smmu.1.auto: Unhandled context fault starting with 5.4-rc1
On Mon Feb 17 20, Robin Murphy wrote: On 16/02/2020 10:11 pm, Jerry Snitselaar wrote: On Fri Feb 14 20, Robin Murphy wrote: Hi Jerry, On 2020-02-14 8:13 pm, Jerry Snitselaar wrote: Hi Will, On a gigabyte system with Cavium CN8xx, when doing a fio test against an nvme drive we are seeing the following: [ 637.161194] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8010003f6000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.174329] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x80136000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.186887] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8010002ee000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.199275] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8010003c7000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.211885] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x801000392000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.224580] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x80118000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.237241] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x80100036, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.249657] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x801ba000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.262120] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8013e000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.274468] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x801000304000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 Those "IOVAs" don't look much like IOVAs from the DMA allocator - if they were physical addresses, would they correspond to an expected region of the physical memory map? I would suspect that this is most likely misbehaviour in the NVMe driver (issuing a write to a non-DMA-mapped address), and the SMMU is just doing its job in blocking and reporting it. 
I also reproduced with 5.5-rc7, and will check 5.6-rc1 later today. I couldn't narrow it down further into 5.4-rc1. I don't know smmu or the code well, any thoughts on where to start digging into this? fio test that is being run is: #fio -filename=/dev/nvme0n1 -iodepth=64 -thread -rw=randwrite -ioengine=libaio -bs=4k -runtime=43200 -size=-group_reporting -name=mytest -numjobs=32 Just to clarify, do other tests work OK on the same device? Thanks, Robin. I was able to get back on the system today. I think I know what the problem is: [ 0.036189] iommu: Gigabyte R120-T34-00 detected, force iommu passthrough mode [ 6.324282] iommu: Default domain type: Translated So the new default domain code in 5.4 overrides the iommu quirk code setting default passthrough. Testing a quick patch that tracks whether the default domain was set in the quirk code, and leaves it alone if it was. So far it seems to be working. Ah, OK. Could you point me at that quirk code? I can't seem to track it down in mainline, and seeing this much leaves me dubious that it's even correct - matching a particular board implies that it's a firmware issue (as far as I'm aware the SMMUs in CN88xx SoCs are usable in general), but if the firmware description is wrong to the point that DMA ops translation doesn't work, then no other translation (e.g. VFIO) is likely to work either. In that case it's simply not safe to enable the SMMU at all, and fudging the default domain type merely hides one symptom of the problem. Robin. Ugh. It is a RHEL only patch, but for some reason it is applied to the ark kernel builds as well. Sorry for the noise. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 3/3] iommu/virtio: Enable x86 support
On 17/02/2020 1:31 pm, Michael S. Tsirkin wrote: On Mon, Feb 17, 2020 at 01:22:44PM +, Robin Murphy wrote: On 17/02/2020 1:01 pm, Michael S. Tsirkin wrote: On Mon, Feb 17, 2020 at 10:01:07AM +0100, Jean-Philippe Brucker wrote: On Sun, Feb 16, 2020 at 04:50:33AM -0500, Michael S. Tsirkin wrote: On Fri, Feb 14, 2020 at 04:57:11PM +, Robin Murphy wrote: On 14/02/2020 4:04 pm, Jean-Philippe Brucker wrote: With the built-in topology description in place, x86 platforms can now use the virtio-iommu. Signed-off-by: Jean-Philippe Brucker --- drivers/iommu/Kconfig | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 068d4e0e3541..adcbda44d473 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -508,8 +508,9 @@ config HYPERV_IOMMU config VIRTIO_IOMMU bool "Virtio IOMMU driver" depends on VIRTIO=y - depends on ARM64 + depends on (ARM64 || X86) select IOMMU_API + select IOMMU_DMA Can that have an "if X86" for clarity? AIUI it's not necessary for virtio-iommu itself (and really shouldn't be), but is merely to satisfy the x86 arch code's expectation that IOMMU drivers bring their own DMA ops, right? Robin. In fact does not this work on any platform now? There is ongoing work to use the generic IOMMU_DMA ops on X86. AMD IOMMU has been converted recently [1] but VT-d still implements its own DMA ops (conversion patches are on the list [2]). On Arm the arch Kconfig selects IOMMU_DMA, and I assume we'll have the same on X86 once Tom's work is complete. Until then I can add a "if X86" here for clarity. Thanks, Jean [1] https://lore.kernel.org/linux-iommu/20190613223901.9523-1-murph...@tcd.ie/ [2] https://lore.kernel.org/linux-iommu/20191221150402.13868-1-murph...@tcd.ie/ What about others? E.g. PPC? That was the point I was getting at - while iommu-dma should build just fine for the likes of PPC, s390, 32-bit Arm, etc., they have no architecture code to correctly wire up iommu_dma_ops to devices. 
Thus there's currently no point pulling it in and pretending it's anything more than a waste of space for architectures other than arm64 and x86. It's merely a historical artefact of the x86 DMA API implementation that when the IOMMU drivers were split out to form drivers/iommu they took some of their relevant arch code with them. Robin. Rather than white-listing architectures, how about making the architectures in question set some kind of symbol, and depend on it? Umm, that's basically what we have already? Architectures that use iommu_dma_ops select IOMMU_DMA. The only issue is the oddity of x86 treating IOMMU drivers as part of its arch code, which has never come up against a cross-architecture driver until now. Hence the options of either maintaining that paradigm and having the 'x86 arch code' aspect of this driver "select IOMMU_DMA if x86" such that it works out equivalent to AMD_IOMMU, or a more involved cleanup to move that responsibility out of drivers/iommu/Kconfig entirely and have arch/x86/Kconfig do something like "select IOMMU_DMA if IOMMU_API", as Jean suggested up-thread. In the specific context of IOMMU_DMA we're not talking about any kind of white-list, merely a one-off special case for one particular architecture. Robin.
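Sketched as Kconfig fragments, the two options Robin describes would look roughly like this (an untested illustration of the discussion, not a proposed patch):

```kconfig
# Option 1: keep the paradigm, handled in drivers/iommu/Kconfig,
# equivalent in effect to what AMD_IOMMU does today.
config VIRTIO_IOMMU
	bool "Virtio IOMMU driver"
	depends on VIRTIO=y
	depends on (ARM64 || X86)
	select IOMMU_API
	select IOMMU_DMA if X86

# Option 2: move the responsibility into arch/x86/Kconfig entirely,
# so any IOMMU driver gets iommu-dma wired up on x86.
config X86
	select IOMMU_DMA if IOMMU_API
```

Option 1 is the one-line fix; option 2 is the "more involved cleanup" that would also remove the need for every future cross-architecture driver to know about the x86 quirk.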
Re: [PATCH 3/3] iommu/virtio: Enable x86 support
On Mon, Feb 17, 2020 at 01:22:44PM +, Robin Murphy wrote: > On 17/02/2020 1:01 pm, Michael S. Tsirkin wrote: > > On Mon, Feb 17, 2020 at 10:01:07AM +0100, Jean-Philippe Brucker wrote: > > > On Sun, Feb 16, 2020 at 04:50:33AM -0500, Michael S. Tsirkin wrote: > > > > On Fri, Feb 14, 2020 at 04:57:11PM +, Robin Murphy wrote: > > > > > On 14/02/2020 4:04 pm, Jean-Philippe Brucker wrote: > > > > > > With the built-in topology description in place, x86 platforms can > > > > > > now > > > > > > use the virtio-iommu. > > > > > > > > > > > > Signed-off-by: Jean-Philippe Brucker > > > > > > --- > > > > > >drivers/iommu/Kconfig | 3 ++- > > > > > >1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig > > > > > > index 068d4e0e3541..adcbda44d473 100644 > > > > > > --- a/drivers/iommu/Kconfig > > > > > > +++ b/drivers/iommu/Kconfig > > > > > > @@ -508,8 +508,9 @@ config HYPERV_IOMMU > > > > > >config VIRTIO_IOMMU > > > > > > bool "Virtio IOMMU driver" > > > > > > depends on VIRTIO=y > > > > > > - depends on ARM64 > > > > > > + depends on (ARM64 || X86) > > > > > > select IOMMU_API > > > > > > + select IOMMU_DMA > > > > > > > > > > Can that have an "if X86" for clarity? AIUI it's not necessary for > > > > > virtio-iommu itself (and really shouldn't be), but is merely to > > > > > satisfy the > > > > > x86 arch code's expectation that IOMMU drivers bring their own DMA > > > > > ops, > > > > > right? > > > > > > > > > > Robin. > > > > > > > > In fact does not this work on any platform now? > > > > > > There is ongoing work to use the generic IOMMU_DMA ops on X86. AMD IOMMU > > > has been converted recently [1] but VT-d still implements its own DMA ops > > > (conversion patches are on the list [2]). On Arm the arch Kconfig selects > > > IOMMU_DMA, and I assume we'll have the same on X86 once Tom's work is > > > complete. Until then I can add a "if X86" here for clarity. 
> > > > > > Thanks, > > > Jean > > > > > > [1] > > > https://lore.kernel.org/linux-iommu/20190613223901.9523-1-murph...@tcd.ie/ > > > [2] > > > https://lore.kernel.org/linux-iommu/20191221150402.13868-1-murph...@tcd.ie/ > > > > What about others? E.g. PPC? > > That was the point I was getting at - while iommu-dma should build just fine > for the likes of PPC, s390, 32-bit Arm, etc., they have no architecture code > to correctly wire up iommu_dma_ops to devices. Thus there's currently no > point pulling it in and pretending it's anything more than a waste of space > for architectures other than arm64 and x86. It's merely a historical > artefact of the x86 DMA API implementation that when the IOMMU drivers were > split out to form drivers/iommu they took some of their relevant arch code > with them. > > Robin. Rather than white-listing architectures, how about making the architectures in question set some kind of symbol, and depend on it? -- MST
Re: [PATCH 3/3] iommu/virtio: Enable x86 support
On 17/02/2020 1:01 pm, Michael S. Tsirkin wrote: On Mon, Feb 17, 2020 at 10:01:07AM +0100, Jean-Philippe Brucker wrote: On Sun, Feb 16, 2020 at 04:50:33AM -0500, Michael S. Tsirkin wrote: On Fri, Feb 14, 2020 at 04:57:11PM +, Robin Murphy wrote: On 14/02/2020 4:04 pm, Jean-Philippe Brucker wrote: With the built-in topology description in place, x86 platforms can now use the virtio-iommu. Signed-off-by: Jean-Philippe Brucker --- drivers/iommu/Kconfig | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 068d4e0e3541..adcbda44d473 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -508,8 +508,9 @@ config HYPERV_IOMMU config VIRTIO_IOMMU bool "Virtio IOMMU driver" depends on VIRTIO=y - depends on ARM64 + depends on (ARM64 || X86) select IOMMU_API + select IOMMU_DMA Can that have an "if X86" for clarity? AIUI it's not necessary for virtio-iommu itself (and really shouldn't be), but is merely to satisfy the x86 arch code's expectation that IOMMU drivers bring their own DMA ops, right? Robin. In fact does not this work on any platform now? There is ongoing work to use the generic IOMMU_DMA ops on X86. AMD IOMMU has been converted recently [1] but VT-d still implements its own DMA ops (conversion patches are on the list [2]). On Arm the arch Kconfig selects IOMMU_DMA, and I assume we'll have the same on X86 once Tom's work is complete. Until then I can add a "if X86" here for clarity. Thanks, Jean [1] https://lore.kernel.org/linux-iommu/20190613223901.9523-1-murph...@tcd.ie/ [2] https://lore.kernel.org/linux-iommu/20191221150402.13868-1-murph...@tcd.ie/ What about others? E.g. PPC? That was the point I was getting at - while iommu-dma should build just fine for the likes of PPC, s390, 32-bit Arm, etc., they have no architecture code to correctly wire up iommu_dma_ops to devices. 
Thus there's currently no point pulling it in and pretending it's anything more than a waste of space for architectures other than arm64 and x86. It's merely a historical artefact of the x86 DMA API implementation that when the IOMMU drivers were split out to form drivers/iommu they took some of their relevant arch code with them. Robin.
Re: arm-smmu.1.auto: Unhandled context fault starting with 5.4-rc1
On 16/02/2020 10:11 pm, Jerry Snitselaar wrote: On Fri Feb 14 20, Robin Murphy wrote: Hi Jerry, On 2020-02-14 8:13 pm, Jerry Snitselaar wrote: Hi Will, On a gigabyte system with Cavium CN8xx, when doing a fio test against an nvme drive we are seeing the following: [ 637.161194] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8010003f6000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.174329] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x80136000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.186887] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8010002ee000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.199275] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8010003c7000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.211885] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x801000392000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.224580] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x80118000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.237241] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x80100036, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.249657] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x801ba000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.262120] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x8013e000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 [ 637.274468] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x8402, iova=0x801000304000, fsynr=0x70091, cbfrsynra=0x9000, cb=7 Those "IOVAs" don't look much like IOVAs from the DMA allocator - if they were physical addresses, would they correspond to an expected region of the physical memory map? I would suspect that this is most likely misbehaviour in the NVMe driver (issuing a write to a non-DMA-mapped address), and the SMMU is just doing its job in blocking and reporting it. 
I also reproduced with 5.5-rc7, and will check 5.6-rc1 later today. I couldn't narrow it down further into 5.4-rc1. I don't know smmu or the code well, any thoughts on where to start digging into this? fio test that is being run is: #fio -filename=/dev/nvme0n1 -iodepth=64 -thread -rw=randwrite -ioengine=libaio -bs=4k -runtime=43200 -size=-group_reporting -name=mytest -numjobs=32 Just to clarify, do other tests work OK on the same device? Thanks, Robin. I was able to get back on the system today. I think I know what the problem is: [ 0.036189] iommu: Gigabyte R120-T34-00 detected, force iommu passthrough mode [ 6.324282] iommu: Default domain type: Translated So the new default domain code in 5.4 overrides the iommu quirk code setting default passthrough. Testing a quick patch that tracks whether the default domain was set in the quirk code, and leaves it alone if it was. So far it seems to be working. Ah, OK. Could you point me at that quirk code? I can't seem to track it down in mainline, and seeing this much leaves me dubious that it's even correct - matching a particular board implies that it's a firmware issue (as far as I'm aware the SMMUs in CN88xx SoCs are usable in general), but if the firmware description is wrong to the point that DMA ops translation doesn't work, then no other translation (e.g. VFIO) is likely to work either. In that case it's simply not safe to enable the SMMU at all, and fudging the default domain type merely hides one symptom of the problem. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 3/3] iommu/virtio: Enable x86 support
On Mon, Feb 17, 2020 at 10:01:07AM +0100, Jean-Philippe Brucker wrote: > On Sun, Feb 16, 2020 at 04:50:33AM -0500, Michael S. Tsirkin wrote: > > On Fri, Feb 14, 2020 at 04:57:11PM +, Robin Murphy wrote: > > > On 14/02/2020 4:04 pm, Jean-Philippe Brucker wrote: > > > > With the built-in topology description in place, x86 platforms can now > > > > use the virtio-iommu. > > > > > > > > Signed-off-by: Jean-Philippe Brucker > > > > --- > > > > drivers/iommu/Kconfig | 3 ++- > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig > > > > index 068d4e0e3541..adcbda44d473 100644 > > > > --- a/drivers/iommu/Kconfig > > > > +++ b/drivers/iommu/Kconfig > > > > @@ -508,8 +508,9 @@ config HYPERV_IOMMU > > > > config VIRTIO_IOMMU > > > > bool "Virtio IOMMU driver" > > > > depends on VIRTIO=y > > > > - depends on ARM64 > > > > + depends on (ARM64 || X86) > > > > select IOMMU_API > > > > + select IOMMU_DMA > > > > > > Can that have an "if X86" for clarity? AIUI it's not necessary for > > > virtio-iommu itself (and really shouldn't be), but is merely to satisfy > > > the > > > x86 arch code's expectation that IOMMU drivers bring their own DMA ops, > > > right? > > > > > > Robin. > > > > In fact does not this work on any platform now? > > There is ongoing work to use the generic IOMMU_DMA ops on X86. AMD IOMMU > has been converted recently [1] but VT-d still implements its own DMA ops > (conversion patches are on the list [2]). On Arm the arch Kconfig selects > IOMMU_DMA, and I assume we'll have the same on X86 once Tom's work is > complete. Until then I can add a "if X86" here for clarity. > > Thanks, > Jean > > [1] https://lore.kernel.org/linux-iommu/20190613223901.9523-1-murph...@tcd.ie/ > [2] > https://lore.kernel.org/linux-iommu/20191221150402.13868-1-murph...@tcd.ie/ What about others? E.g. PPC? 
-- MST
Re: [PATCH 03/11] PCI: OF: Check whether the host bridge supports ATS
On Thu, Feb 13, 2020 at 12:26:46PM -0600, Rob Herring wrote: > On Thu, Feb 13, 2020 at 10:52 AM Jean-Philippe Brucker > wrote: > > > > Copy the ats-supported flag into the pci_host_bridge structure. > > > > Signed-off-by: Jean-Philippe Brucker > > --- > > drivers/pci/controller/pci-host-common.c | 1 + > > drivers/pci/of.c | 9 + > > include/linux/of_pci.h | 3 +++ > > 3 files changed, 13 insertions(+) > > > > diff --git a/drivers/pci/controller/pci-host-common.c > > b/drivers/pci/controller/pci-host-common.c > > index 250a3fc80ec6..a6ac927be291 100644 > > --- a/drivers/pci/controller/pci-host-common.c > > +++ b/drivers/pci/controller/pci-host-common.c > > @@ -92,6 +92,7 @@ int pci_host_common_probe(struct platform_device *pdev, > > return ret; > > } > > > > + of_pci_host_check_ats(bridge); > > platform_set_drvdata(pdev, bridge->bus); > > return 0; > > } > > diff --git a/drivers/pci/of.c b/drivers/pci/of.c > > index 81ceeaa6f1d5..4b8a877f1e9f 100644 > > --- a/drivers/pci/of.c > > +++ b/drivers/pci/of.c > > @@ -576,6 +576,15 @@ int pci_parse_request_of_pci_ranges(struct device *dev, > > } > > EXPORT_SYMBOL_GPL(pci_parse_request_of_pci_ranges); > > > > +void of_pci_host_check_ats(struct pci_host_bridge *bridge) > > +{ > > + struct device_node *np = bridge->bus->dev.of_node; > > + > > + if (!np) > > + return; > > + > > + bridge->ats_supported = of_property_read_bool(np, "ats-supported"); > > +} > > Not really any point in a common function if we expect this to be only > for ECAM hosts which it seems to be based on the binding. I'll move this to pci-host-common.c Thanks, Jean ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
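For context, the binding under discussion would be consumed like this. The node below is a hypothetical, incomplete generic ECAM host bridge example (only the property name comes from the patch):

```dts
/* Hypothetical ECAM host bridge node advertising ATS support */
pcie@40000000 {
	compatible = "pci-host-ecam-generic";
	device_type = "pci";
	reg = <0x0 0x40000000 0x0 0x10000000>;
	/* Boolean property read via of_property_read_bool() in
	 * of_pci_host_check_ats() / pci-host-common.c */
	ats-supported;
};
```

Since `of_property_read_bool()` simply tests for the property's presence, omitting `ats-supported` leaves `bridge->ats_supported` false, which matches the "default off for DT" behaviour discussed in the thread.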
Re: arm64 iommu groups issue
Right, and even worse is that it relies on the port driver even existing at all. All this iommu group assignment should be taken outside device driver probe paths. However we could still consider device links for sync'ing the SMMU and each device probing. Yes, we should get that for DT now thanks to the of_devlink stuff, but cooking up some equivalent for IORT might be worthwhile. It doesn't solve this problem, but at least we could remove the iommu_ops check in iort_iommu_xlate(). We would need to carve out a path from pci_device_add() or even device_add() to solve all cases. Another thought that crosses my mind is that when pci_device_group() walks up to the point of ACS isolation and doesn't find an existing group, it can still infer that everything it walked past *should* be put in the same group it's then eventually going to return. Unfortunately I can't see an obvious way for it to act on that knowledge, though, since recursive iommu_probe_device() is unlikely to end well. [...] And this looks to be the reason why the current iommu_bus_init()->bus_for_each_device(..., add_iommu_group) also fails. Of course, just adding a 'correct' add_device replay without the of_xlate process doesn't help at all. No wonder this looked suspiciously simpler than where the first idea left off... (on reflection, the core of this idea seems to be recycling the existing iommu_bus_init walk rather than building up a separate "waiting list", while forgetting that that wasn't the difficult part of the original idea anyway) We could still use a bus walk to add the group per iommu, but we would need an additional check to ensure the device is associated with the IOMMU. Regarding the current code, the principle seems wrong to me - we call bus_for_each_device(..., add_iommu_group) for the first SMMU in the system which probes, but we attempt add_iommu_group() for all devices on the bus, even though the SMMU for a given device may not yet have probed.
Yes, iommu_bus_init() is one of the places still holding a deeply-ingrained assumption that the ops go live for all IOMMU instances at once, which is what warranted the further replay in of_iommu_configure() originally. Moving that out of of_platform_device_create() to support probe deferral is where the trouble really started. I'm not too familiar with the history here, but could this be reverted now with the introduction of of_devlink stuff? Cheers, John
Re: [PATCH 2/2] iommu/mediatek: add support for MT8167
Hi Fabien, Thanks very much for your patch. On Mon, 2020-02-17 at 09:15 +0800, CK Hu wrote: > +Yong.Wu. > > On Fri, 2020-01-03 at 17:26 +0100, Fabien Parent wrote: > > Add support for the IOMMU on MT8167 > > > > Signed-off-by: Fabien Parent > > --- > > drivers/iommu/mtk_iommu.c | 11 ++- > > drivers/iommu/mtk_iommu.h | 1 + > > 2 files changed, 11 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c > > index 6fc1f5ecf91e..5fc6178a82dc 100644 > > --- a/drivers/iommu/mtk_iommu.c > > +++ b/drivers/iommu/mtk_iommu.c > > @@ -569,7 +569,8 @@ static int mtk_iommu_hw_init(const struct > > mtk_iommu_data *data) > > F_INT_PRETETCH_TRANSATION_FIFO_FAULT; > > writel_relaxed(regval, data->base + REG_MMU_INT_MAIN_CONTROL); > > > > - if (data->plat_data->m4u_plat == M4U_MT8173) > > + if (data->plat_data->m4u_plat == M4U_MT8173 || > > + data->plat_data->m4u_plat == M4U_MT8167) I didn't know MT8167 would go upstream. My original thought was that only MT8173 uses this setting and later SoCs won't, so I used "m4u_plat" directly here. If we also need to support MT8167, then CK's suggestion is reasonable: we could add a new variable like "legacy_ivrp_paddr", named after its register, in a separate patch, and then support MT8167 in a new patch. > > regval = (data->protect_base >> 1) | (data->enable_4GB << 31); > > else > > regval = lower_32_bits(data->protect_base) | > > @@ -782,6 +783,13 @@ static const struct mtk_iommu_plat_data mt2712_data = { > > .larbid_remap = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, > > }; > > > > +static const struct mtk_iommu_plat_data mt8167_data = { > > + .m4u_plat = M4U_MT8167, > > + .has_4gb_mode = true, > > + .reset_axi= true, > > + .larbid_remap = {0, 1, 2, 3, 4, 5}, /* Linear mapping. 
*/ > > +}; > > + > > static const struct mtk_iommu_plat_data mt8173_data = { > > .m4u_plat = M4U_MT8173, > > .has_4gb_mode = true, > > @@ -798,6 +806,7 @@ static const struct mtk_iommu_plat_data mt8183_data = { > > > > static const struct of_device_id mtk_iommu_of_ids[] = { > > { .compatible = "mediatek,mt2712-m4u", .data = &mt2712_data}, > > + { .compatible = "mediatek,mt8167-m4u", .data = &mt8167_data}, > > { .compatible = "mediatek,mt8173-m4u", .data = &mt8173_data}, > > { .compatible = "mediatek,mt8183-m4u", .data = &mt8183_data}, > > {} > > diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h > > index ea949a324e33..cb8fd5970cd4 100644 > > --- a/drivers/iommu/mtk_iommu.h > > +++ b/drivers/iommu/mtk_iommu.h > > @@ -30,6 +30,7 @@ struct mtk_iommu_suspend_reg { > > enum mtk_iommu_plat { > > M4U_MT2701, > > M4U_MT2712, > > + M4U_MT8167, > > M4U_MT8173, > > M4U_MT8183, > > }; > > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/3] PCI: Add DMA configuration for virtual platforms
On Fri, Feb 14, 2020 at 05:03:16PM +, Robin Murphy wrote:
> On 14/02/2020 4:04 pm, Jean-Philippe Brucker wrote:
> > Hardware platforms usually describe the IOMMU topology using either
> > device-tree pointers or vendor-specific ACPI tables. For virtual
> > platforms that don't provide a device-tree, the virtio-iommu device
> > contains a description of the endpoints it manages. That information
> > allows us to probe endpoints after the IOMMU is probed (possibly as late
> > as userspace modprobe), provided it is discovered early enough.
> >
> > Add a hook to pci_dma_configure(), which returns -EPROBE_DEFER if the
> > endpoint is managed by a vIOMMU that will be loaded later, or 0 in any
> > other case to avoid disturbing the normal DMA configuration methods.
> > When CONFIG_VIRTIO_IOMMU_TOPOLOGY isn't selected, the call to
> > virt_dma_configure() is compiled out.
> >
> > As long as the information is consistent, platforms can provide both a
> > device-tree and a built-in topology, and the IOMMU infrastructure is
> > able to deal with multiple DMA configuration methods.
>
> Urgh, it's already been established[1] that having IOMMU setup tied to
> DMA configuration at driver probe time is not just conceptually wrong but
> actually broken, so the concept here worries me a bit. In a world where
> of_iommu_configure() and friends are being called much earlier around
> iommu_probe_device() time, how badly will this fall apart?

If present, the DT configuration should take precedence over this built-in
method, so the earlier it is called the better. virt_dma_configure()
currently gives up if the device already has iommu_ops (well, it still
calls setup_dma_ops(), which is safe enough, but I think I'll change that
to have virt_iommu_setup() return NULL if iommu_ops are present). I don't
have the full picture of the changes you intend for
{of,acpi}_iommu_configure(); do you think checking the validity of
dev->iommu_fwspec will remain sufficient to have both methods coexist?
Thanks,
Jean
Re: [PATCH 3/3] iommu/virtio: Enable x86 support
On Sun, Feb 16, 2020 at 04:50:33AM -0500, Michael S. Tsirkin wrote:
> On Fri, Feb 14, 2020 at 04:57:11PM +, Robin Murphy wrote:
> > On 14/02/2020 4:04 pm, Jean-Philippe Brucker wrote:
> > > With the built-in topology description in place, x86 platforms can now
> > > use the virtio-iommu.
> > >
> > > Signed-off-by: Jean-Philippe Brucker
> > > ---
> > >  drivers/iommu/Kconfig | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > > index 068d4e0e3541..adcbda44d473 100644
> > > --- a/drivers/iommu/Kconfig
> > > +++ b/drivers/iommu/Kconfig
> > > @@ -508,8 +508,9 @@ config HYPERV_IOMMU
> > >  config VIRTIO_IOMMU
> > >  	bool "Virtio IOMMU driver"
> > >  	depends on VIRTIO=y
> > > -	depends on ARM64
> > > +	depends on (ARM64 || X86)
> > >  	select IOMMU_API
> > > +	select IOMMU_DMA
> >
> > Can that have an "if X86" for clarity? AIUI it's not necessary for
> > virtio-iommu itself (and really shouldn't be), but is merely to satisfy
> > the x86 arch code's expectation that IOMMU drivers bring their own DMA
> > ops, right?
> >
> > Robin.
>
> In fact, does this not work on any platform now?

There is ongoing work to use the generic IOMMU_DMA ops on x86. AMD IOMMU
has been converted recently [1], but VT-d still implements its own DMA ops
(conversion patches are on the list [2]). On Arm the arch Kconfig selects
IOMMU_DMA, and I assume we'll have the same on x86 once Tom's work is
complete. Until then I can add an "if X86" here for clarity.

Thanks,
Jean

[1] https://lore.kernel.org/linux-iommu/20190613223901.9523-1-murph...@tcd.ie/
[2] https://lore.kernel.org/linux-iommu/20191221150402.13868-1-murph...@tcd.ie/
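With Robin's suggestion applied, the entry would presumably read as follows
(a sketch of the amended Kconfig fragment, assuming nothing else in the
entry changes):

```kconfig
config VIRTIO_IOMMU
	bool "Virtio IOMMU driver"
	depends on VIRTIO=y
	depends on (ARM64 || X86)
	select IOMMU_API
	select IOMMU_DMA if X86
```

The conditional select documents that IOMMU_DMA is an x86 arch-code
expectation rather than a requirement of virtio-iommu itself.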
Re: [RFC PATCH] iommu/iova: Add a best-fit algorithm
On Fri, Feb 14, 2020 at 03:06:42PM -0800, Isaac J. Manjarres wrote:
> From: Liam Mark
>
> Using the best-fit algorithm, instead of the first-fit
> algorithm, may reduce fragmentation when allocating
> IOVAs.

As we like to say in standards groups: "may" also implies "may not".
Please provide numbers showing that this helps, and preferably an
explanation of how it doesn't hurt as well.

> + * Should be called prior to using dma-apis.
> + */
> +int iommu_dma_enable_best_fit_algo(struct device *dev)
> +{
> +	struct iommu_domain *domain;
> +	struct iova_domain *iovad;
> +
> +	domain = iommu_get_domain_for_dev(dev);
> +	if (!domain || !domain->iova_cookie)
> +		return -EINVAL;
> +
> +	iovad = &((struct iommu_dma_cookie *)domain->iova_cookie)->iovad;
> +	iovad->best_fit = true;
> +	return 0;
> +}
> +EXPORT_SYMBOL(iommu_dma_enable_best_fit_algo);

Who is going to call this? Patches adding exports always need a user that
goes along with the export. Also, drivers have no business calling directly
into dma-iommu.
Re: [RFC PATCH] iommu/dma: Allow drivers to reserve an iova range
On Fri, Feb 14, 2020 at 02:58:16PM -0800, Isaac J. Manjarres wrote:
> From: Liam Mark
>
> Some devices have a memory map which contains gaps or holes.
> In order for the device to have as much IOVA space as possible,
> allow its driver to inform the DMA-IOMMU layer that it should
> not allocate addresses from these holes.

Layering violation. dma-iommu is the translation layer between the DMA API
and the IOMMU API. Calls into it from drivers performing DMA mappings need
to go through the DMA API (and be documented there).