Re: [PATCH v9 03/10] iommu: Separate IOMMU_DEV_FEAT_IOPF from IOMMU_DEV_FEAT_SVA
Hi Jean, On 2021/1/15 0:41, Jean-Philippe Brucker wrote: I guess detailing what's needed for nested IOPF can help the discussion, although I haven't seen any concrete plan about implementing it, and it still seems a couple of years away. There are two important steps with nested IOPF: (1) Figuring out whether a fault comes from L1 or L2. An SMMU stall event comes with this information, but a PRI page request doesn't. The IOMMU driver has to first translate the IOVA to a GPA, injecting the fault into the guest via the usual iommu_report_device_fault() if this translation fails. (2) Translating the faulting GPA to an HVA that can be fed to handle_mm_fault(). That requires help from KVM, so another interface - either KVM registering GPA->HVA translation tables or the IOMMU driver querying each translation. Either way it should be reusable by device drivers that implement IOPF themselves. (1) could be enabled with iommu_dev_enable_feature(). (2) requires a more complex interface. (2) alone might also be desirable - demand-paging for level 2 only, no SVA for level 1. Anyway, back to this patch. What I'm trying to convey is "can the IOMMU receive incoming I/O page faults for this device and, when SVA is enabled, feed them to the mm subsystem? Enable that or return an error." I'm stuck on the name. IOPF alone is too vague. Not IOPF_L1 as Kevin noted, since L1 is also used in virtualization. IOPF_BIND and IOPF_SVA could also mean (2) above. IOMMU_DEV_FEAT_IOPF_FLAT? That leaves space for the nested extensions. (1) above could be IOMMU_FEAT_IOPF_NESTED, and (2) requires some new interfacing with KVM (or just an external fault handler) and could be used with either IOPF_FLAT or IOPF_NESTED. We can figure out the details later. What do you think? I agree that we can define IOPF_ for current usage and leave space for future extensions. IOPF_FLAT represents IOPF on first-level translation. Currently, first-level translation can be used in the cases below:
1) FL w/ internal Page Table: kernel IOVA
2) FL w/ external Page Table: VFIO passthrough
3) FL w/ shared CPU page table: SVA

We don't need to support IOPF for case 1), so let's put it aside. IOPF handling for 2) and 3) is different. Do we need to define different names to distinguish these two cases? Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v13 08/15] iommu/smmuv3: Implement cache_invalidate
Hi Eric, On 2020/11/18 19:21, Eric Auger wrote: Implement domain-selective and page-selective IOTLB invalidations.

Signed-off-by: Eric Auger
---
v7 -> v8:
- ASID based invalidation using iommu_inv_pasid_info
- check ARCHID/PASID flags in addr based invalidation
- use __arm_smmu_tlb_inv_context and __arm_smmu_tlb_inv_range_nosync
v6 -> v7:
- check the uapi version
v3 -> v4:
- adapt to changes in the uapi
- add support for leaf parameter
- do not use arm_smmu_tlb_inv_range_nosync or arm_smmu_tlb_inv_context anymore
v2 -> v3:
- replace __arm_smmu_tlb_sync by arm_smmu_cmdq_issue_sync
v1 -> v2:
- properly pass the asid
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 53 +
 1 file changed, 53 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index fdecc9f17b36..24124361dd3b 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2771,6 +2771,58 @@ static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
 	mutex_unlock(&smmu_domain->init_mutex);
 }

+static int
+arm_smmu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
+			  struct iommu_cache_invalidate_info *inv_info)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+		return -EINVAL;
+
+	if (!smmu)
+		return -EINVAL;
+
+	if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+		return -EINVAL;
+
+	if (inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB) {
+		if (inv_info->granularity == IOMMU_INV_GRANU_PASID) {
+			struct iommu_inv_pasid_info *info =
+				&inv_info->granu.pasid_info;
+
+			if (!(info->flags & IOMMU_INV_PASID_FLAGS_ARCHID) ||
+			    (info->flags & IOMMU_INV_PASID_FLAGS_PASID))
+				return -EINVAL;
+
+			__arm_smmu_tlb_inv_context(smmu_domain, info->archid);
+
+		} else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR) {
+			struct iommu_inv_addr_info *info =
+				&inv_info->granu.addr_info;
+			size_t size = info->nb_granules * info->granule_size;
+			bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
+
+			if (!(info->flags & IOMMU_INV_ADDR_FLAGS_ARCHID) ||
+			    (info->flags & IOMMU_INV_ADDR_FLAGS_PASID))
+				return -EINVAL;
+
+			__arm_smmu_tlb_inv_range(info->addr, size,
+						 info->granule_size, leaf,
+						 smmu_domain, info->archid);

When debugging with vSMMU on an ARM64 Huawei platform, there is an issue: the RIL feature is enabled on the guest OS while it is not supported on the host OS. With some operations (such as rmmod of a driver during iperf between guest and host), SMMU translation errors occur frequently, and everything works well if the RIL feature is disabled on the guest OS. We find that in function vfio_iommu_unmap_notify() (QEMU code) the total size of the TLB range (num_pages * granule_size) is passed to the host OS as the granule size: addr_info->granule_size = size, addr_info->nb_granules = 1. So the total size (num_pages * granule_size) is passed as the granule size to function __arm_smmu_tlb_inv_range(); if the RIL feature is not supported, that total size may not be a granule size the host OS can recognize, and the host will only invalidate part of the TLB range it wants to invalidate. For example, if 4K/2M/1G page sizes are supported on the host OS and total_size = 8K, then addr_info->granule_size = 8K and addr_info->nb_granules = 1; since the RIL feature is not supported, the host invalidates just 4K at a time although it thought it had invalidated 8K. Please have a check of the issue; we have a temporary fix for it as follows.
 hw/arm/smmuv3.c       | 3 ++-
 hw/vfio/common.c      | 6 +++---
 include/exec/memory.h | 1 +
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 6725019..891a65d 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -821,7 +821,8 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
     entry.target_as = &address_space_memory;
     entry.iova = iova;
-    entry.addr_mask = num_pages * (1 << granule) - 1;
+    entry.addr_mask = (1 << granule) - 1;
+    entry.num_pages = num_pages;
     entry.perm = IOMMU_NONE;
     entry.arch_id = asid;
     entry.flags = IOMMU_INV_GRANU_ADDR;
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 5d365e0..c0164b1 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -604,7 +604,7 @@ static void vfio_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     hwaddr start = iotlb->iova +
Re: [RFC PATCH v3 1/2] iommu: Add capability IOMMU_CAP_VIOMMU
Hi, On 2021/1/15 14:31, Leon Romanovsky wrote: On Fri, Jan 15, 2021 at 07:49:47AM +0800, Lu Baolu wrote: Hi Leon, On 1/14/21 9:26 PM, Leon Romanovsky wrote: On Thu, Jan 14, 2021 at 09:30:02AM +0800, Lu Baolu wrote: Some vendor IOMMU drivers are able to declare that they are running in a VM context. This is very valuable for features that should only be supported on bare metal. Add a capability bit so that it can be used. And how is it used? Who and how will set it? Use the existing iommu_capable(). I should add more description about who uses it and how. I want to see the code that sets this capability. Currently we have Intel VT-d and virtio-iommu setting this capability.

static bool intel_iommu_capable(enum iommu_cap cap)
{
	if (cap == IOMMU_CAP_CACHE_COHERENCY)
		return domain_update_iommu_snooping(NULL) == 1;
	if (cap == IOMMU_CAP_INTR_REMAP)
		return irq_remapping_enabled == 1;
+	if (cap == IOMMU_CAP_VIOMMU)
+		return caching_mode_enabled();
	return false;
}

And,

+static bool viommu_capable(enum iommu_cap cap)
+{
+	if (cap == IOMMU_CAP_VIOMMU)
+		return true;
+
+	return false;
+}

Best regards, baolu
Re: [PATCH 6/6] vfio/iommu_type1: Drop parameter "pgsize" of update_user_bitmap
On Thu, 7 Jan 2021 12:44:01 +0800 Keqian Zhu wrote: > We always use the smallest supported page size of vfio_iommu as > pgsize. Drop parameter "pgsize" of update_user_bitmap. > > Signed-off-by: Keqian Zhu > --- > drivers/vfio/vfio_iommu_type1.c | 9 - > 1 file changed, 4 insertions(+), 5 deletions(-) > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index 82649a040148..bceda5e8baaa 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -978,10 +978,9 @@ static void vfio_update_pgsize_bitmap(struct vfio_iommu > *iommu) > } > > static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu, > - struct vfio_dma *dma, dma_addr_t base_iova, > - size_t pgsize) > + struct vfio_dma *dma, dma_addr_t base_iova) > { > - unsigned long pgshift = __ffs(pgsize); > + unsigned long pgshift = __ffs(iommu->pgsize_bitmap); > unsigned long nbits = dma->size >> pgshift; > unsigned long bit_offset = (dma->iova - base_iova) >> pgshift; > unsigned long copy_offset = bit_offset / BITS_PER_LONG; > @@ -1046,7 +1045,7 @@ static int vfio_iova_dirty_bitmap(u64 __user *bitmap, > struct vfio_iommu *iommu, > if (dma->iova > iova + size - 1) > break; > > - ret = update_user_bitmap(bitmap, iommu, dma, iova, pgsize); > + ret = update_user_bitmap(bitmap, iommu, dma, iova); > if (ret) > return ret; > > @@ -1192,7 +1191,7 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu, > > if (unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) { > ret = update_user_bitmap(bitmap->data, iommu, dma, > - unmap->iova, pgsize); > + unmap->iova); > if (ret) > break; > } Same as the previous, both call sites already have both pgsize and pgshift, pass both rather than recalculate. Thanks, Alex ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 5/6] vfio/iommu_type1: Drop parameter "pgsize" of vfio_iova_dirty_bitmap
On Thu, 7 Jan 2021 12:44:00 +0800 Keqian Zhu wrote: > We always use the smallest supported page size of vfio_iommu as > pgsize. Remove parameter "pgsize" of vfio_iova_dirty_bitmap. > > Signed-off-by: Keqian Zhu > --- > drivers/vfio/vfio_iommu_type1.c | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index 080c05b129ee..82649a040148 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -1015,11 +1015,12 @@ static int update_user_bitmap(u64 __user *bitmap, > struct vfio_iommu *iommu, > } > > static int vfio_iova_dirty_bitmap(u64 __user *bitmap, struct vfio_iommu > *iommu, > - dma_addr_t iova, size_t size, size_t pgsize) > + dma_addr_t iova, size_t size) > { > struct vfio_dma *dma; > struct rb_node *n; > - unsigned long pgshift = __ffs(pgsize); > + unsigned long pgshift = __ffs(iommu->pgsize_bitmap); > + size_t pgsize = (size_t)1 << pgshift; > int ret; > > /* > @@ -2824,8 +2825,7 @@ static int vfio_iommu_type1_dirty_pages(struct > vfio_iommu *iommu, > if (iommu->dirty_page_tracking) > ret = vfio_iova_dirty_bitmap(range.bitmap.data, >iommu, range.iova, > - range.size, > - range.bitmap.pgsize); > + range.size); > else > ret = -EINVAL; > out_unlock: In this case the caller has actually already calculated both pgsize and pgshift, the better optimization would be to pass both rather than recalculate. Thanks, Alex ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 4/6] vfio/iommu_type1: Drop parameter "pgsize" of vfio_dma_bitmap_alloc_all
On Thu, 7 Jan 2021 12:43:59 +0800 Keqian Zhu wrote: > We always use the smallest supported page size of vfio_iommu as > pgsize. Remove parameter "pgsize" of vfio_dma_bitmap_alloc_all. > > Signed-off-by: Keqian Zhu > --- > drivers/vfio/vfio_iommu_type1.c | 8 +++- > 1 file changed, 3 insertions(+), 5 deletions(-) > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index b596c482487b..080c05b129ee 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -236,9 +236,10 @@ static void vfio_dma_populate_bitmap(struct vfio_dma > *dma, size_t pgsize) > } > } > > -static int vfio_dma_bitmap_alloc_all(struct vfio_iommu *iommu, size_t pgsize) > +static int vfio_dma_bitmap_alloc_all(struct vfio_iommu *iommu) > { > struct rb_node *n; > + size_t pgsize = (size_t)1 << __ffs(iommu->pgsize_bitmap); > > for (n = rb_first(>dma_list); n; n = rb_next(n)) { > struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node); > @@ -2761,12 +2762,9 @@ static int vfio_iommu_type1_dirty_pages(struct > vfio_iommu *iommu, > return -EINVAL; > > if (dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_START) { > - size_t pgsize; > - > mutex_lock(>lock); > - pgsize = 1 << __ffs(iommu->pgsize_bitmap); > if (!iommu->dirty_page_tracking) { > - ret = vfio_dma_bitmap_alloc_all(iommu, pgsize); > + ret = vfio_dma_bitmap_alloc_all(iommu); > if (!ret) > iommu->dirty_page_tracking = true; > } This just moves the same calculation from one place to another, what's the point? Thanks, Alex ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 3/6] vfio/iommu_type1: Initially set the pinned_page_dirty_scope
On Thu, 7 Jan 2021 12:43:58 +0800 Keqian Zhu wrote: > For now there are 3 ways to promote the pinned_page_dirty_scope > status of vfio_iommu: > > 1. Through vfio pin interface. > 2. Detach a group without pinned_dirty_scope. > 3. Attach a group with pinned_dirty_scope. > > For point 3, the only chance to promote the pinned_page_dirty_scope > status is when vfio_iommu is newly created. As we can safely set > empty vfio_iommu to be at pinned status, then the point 3 can be > removed to reduce operations. > > Signed-off-by: Keqian Zhu > --- > drivers/vfio/vfio_iommu_type1.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index 110ada24ee91..b596c482487b 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -2045,11 +2045,8 @@ static int vfio_iommu_type1_attach_group(void > *iommu_data, >* Non-iommu backed group cannot dirty memory directly, >* it can only use interfaces that provide dirty >* tracking. > - * The iommu scope can only be promoted with the > - * addition of a dirty tracking group. >*/ > group->pinned_page_dirty_scope = true; > - promote_pinned_page_dirty_scope(iommu); > mutex_unlock(>lock); > > return 0; > @@ -2436,6 +2433,7 @@ static void *vfio_iommu_type1_open(unsigned long arg) > INIT_LIST_HEAD(>iova_list); > iommu->dma_list = RB_ROOT; > iommu->dma_avail = dma_entry_limit; > + iommu->pinned_page_dirty_scope = true; > mutex_init(>lock); > BLOCKING_INIT_NOTIFIER_HEAD(>notifier); > This would be resolved automatically if we used the counter approach I mentioned on the previous patch, adding a pinned-page scope group simply wouldn't increment the iommu counter, which would initially be zero indicating no "all-dma" groups. Thanks, Alex ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/6] vfio/iommu_type1: Ignore external domain when promote pinned_scope
On Thu, 7 Jan 2021 12:43:57 +0800 Keqian Zhu wrote: > The pinned_scope of external domain's groups are always true, that's > to say we can safely ignore external domain when promote pinned_scope > status of vfio_iommu. > > Signed-off-by: Keqian Zhu > --- > drivers/vfio/vfio_iommu_type1.c | 14 +++--- > 1 file changed, 3 insertions(+), 11 deletions(-) > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index 334a8240e1da..110ada24ee91 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -1637,14 +1637,7 @@ static void promote_pinned_page_dirty_scope(struct > vfio_iommu *iommu) > } > } > > - if (iommu->external_domain) { > - domain = iommu->external_domain; > - list_for_each_entry(group, >group_list, next) { > - if (!group->pinned_page_dirty_scope) > - return; > - } > - } > - > + /* The external domain always passes check */ > iommu->pinned_page_dirty_scope = true; > } > > @@ -2347,7 +2340,6 @@ static void vfio_iommu_type1_detach_group(void > *iommu_data, > if (iommu->external_domain) { > group = find_iommu_group(iommu->external_domain, iommu_group); > if (group) { > - promote_dirty_scope = !group->pinned_page_dirty_scope; With this, vfio_group.pinned_page_dirty_scope is effectively a dead field on the struct for groups on the external_domain group list and handled specially. That's not great. If you actually want to make more than a trivial improvement to scope tracking, what about making a counter on our struct vfio_iommu for all the non-pinned-page (ie. all-dma) scope groups attached to the container. Groups on the external domain would still set their group dirty scope to pinned pages, groups making use of an iommu domain would have an all-dma scope initially and increment that counter when attached. Groups that still have an all-dma scope on detach would decrement the counter. If a group changes from all-dma to pinned-page scope, the counter is also decremented. 
We'd never need to search across group lists. Thanks, Alex > list_del(>next); > kfree(group); > > @@ -2360,7 +2352,8 @@ static void vfio_iommu_type1_detach_group(void > *iommu_data, > kfree(iommu->external_domain); > iommu->external_domain = NULL; > } > - goto detach_group_done; > + mutex_unlock(>lock); > + return; > } > } > > @@ -2408,7 +2401,6 @@ static void vfio_iommu_type1_detach_group(void > *iommu_data, > else > vfio_iommu_iova_free(_copy); > > -detach_group_done: > /* >* Removal of a group without dirty tracking may allow the iommu scope >* to be promoted. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/6] vfio/iommu_type1: Make an explicit "promote" semantic
On Thu, 7 Jan 2021 12:43:56 +0800 Keqian Zhu wrote: > When we want to promote the pinned_page_dirty_scope of vfio_iommu, > we call the "update" function to visit all vfio_group, but when we > want to downgrade this, we can set the flag as false directly. I agree that the transition can only go in one direction, but it's still conditional on the scope of all groups involved. We are "updating" the iommu state based on the change of a group. Renaming this to "promote" seems like a matter of personal preference. > So we'd better make an explicit "promote" semantic to the "update" > function. BTW, if vfio_iommu already has been promoted, then return > early. Currently it's the caller that avoids using this function when the iommu scope is already correct. In fact the changes induces a redundant test in the pin_pages code path, we're changing a group from non-pinned-page-scope to pinned-page-scope, therefore the iommu scope cannot initially be scope limited. In the attach_group call path, we're moving that test from the caller, so at best we've introduced an additional function call. The function as it exists today is also more versatile whereas the "promote" version here forces it to a single task with no appreciable difference in complexity or code. This seems like a frivolous change. 
Thanks, Alex > Signed-off-by: Keqian Zhu > --- > drivers/vfio/vfio_iommu_type1.c | 30 ++ > 1 file changed, 14 insertions(+), 16 deletions(-) > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index 0b4dedaa9128..334a8240e1da 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -148,7 +148,7 @@ static int put_pfn(unsigned long pfn, int prot); > static struct vfio_group *vfio_iommu_find_iommu_group(struct vfio_iommu > *iommu, > struct iommu_group *iommu_group); > > -static void update_pinned_page_dirty_scope(struct vfio_iommu *iommu); > +static void promote_pinned_page_dirty_scope(struct vfio_iommu *iommu); > /* > * This code handles mapping and unmapping of user data buffers > * into DMA'ble space using the IOMMU > @@ -714,7 +714,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data, > group = vfio_iommu_find_iommu_group(iommu, iommu_group); > if (!group->pinned_page_dirty_scope) { > group->pinned_page_dirty_scope = true; > - update_pinned_page_dirty_scope(iommu); > + promote_pinned_page_dirty_scope(iommu); > } > > goto pin_done; > @@ -1622,27 +1622,26 @@ static struct vfio_group > *vfio_iommu_find_iommu_group(struct vfio_iommu *iommu, > return group; > } > > -static void update_pinned_page_dirty_scope(struct vfio_iommu *iommu) > +static void promote_pinned_page_dirty_scope(struct vfio_iommu *iommu) > { > struct vfio_domain *domain; > struct vfio_group *group; > > + if (iommu->pinned_page_dirty_scope) > + return; > + > list_for_each_entry(domain, >domain_list, next) { > list_for_each_entry(group, >group_list, next) { > - if (!group->pinned_page_dirty_scope) { > - iommu->pinned_page_dirty_scope = false; > + if (!group->pinned_page_dirty_scope) > return; > - } > } > } > > if (iommu->external_domain) { > domain = iommu->external_domain; > list_for_each_entry(group, >group_list, next) { > - if (!group->pinned_page_dirty_scope) { > - iommu->pinned_page_dirty_scope = false; > + if 
(!group->pinned_page_dirty_scope) > return; > - } > } > } > > @@ -2057,8 +2056,7 @@ static int vfio_iommu_type1_attach_group(void > *iommu_data, >* addition of a dirty tracking group. >*/ > group->pinned_page_dirty_scope = true; > - if (!iommu->pinned_page_dirty_scope) > - update_pinned_page_dirty_scope(iommu); > + promote_pinned_page_dirty_scope(iommu); > mutex_unlock(>lock); > > return 0; > @@ -2341,7 +2339,7 @@ static void vfio_iommu_type1_detach_group(void > *iommu_data, > struct vfio_iommu *iommu = iommu_data; > struct vfio_domain *domain; > struct vfio_group *group; > - bool update_dirty_scope = false; > + bool promote_dirty_scope = false; > LIST_HEAD(iova_copy); > > mutex_lock(>lock); > @@ -2349,7 +2347,7 @@ static void vfio_iommu_type1_detach_group(void > *iommu_data, > if (iommu->external_domain) { > group = find_iommu_group(iommu->external_domain, iommu_group); > if (group) { > -
Re: [PATCH v4 3/3] iommu/iova: Flush CPU rcache for when a depot fills
On 2021-01-15 17:32, Jean-Philippe Brucker wrote: On Thu, Dec 10, 2020 at 02:23:09AM +0800, John Garry wrote: Leizhen reported some time ago that IOVA performance may degrade over time [0], but unfortunately his solution to fix this problem was not given attention. To summarize, the issue is that as time goes by, the CPU rcache and depot rcache continue to grow. As such, IOVA RB tree access time also continues to grow. At a certain point, a depot may become full, and also some CPU rcaches may also be full when inserting another IOVA is attempted. For this scenario, currently the "loaded" CPU rcache is freed and a new one is created. This freeing means that many IOVAs in the RB tree need to be freed, which makes IO throughput performance fall off a cliff in some storage scenarios: Jobs: 12 (f=12): [] [0.0% done] [6314MB/0KB/0KB /s] [1616K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [5669MB/0KB/0KB /s] [1451K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [6031MB/0KB/0KB /s] [1544K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [6673MB/0KB/0KB /s] [1708K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [6705MB/0KB/0KB /s] [1717K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [6031MB/0KB/0KB /s] [1544K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [6761MB/0KB/0KB /s] [1731K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [6705MB/0KB/0KB /s] [1717K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [6685MB/0KB/0KB /s] [1711K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [6178MB/0KB/0KB /s] [1582K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [6731MB/0KB/0KB /s] [1723K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [2387MB/0KB/0KB /s] [611K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [2689MB/0KB/0KB /s] [688K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [2278MB/0KB/0KB /s] [583K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [1288MB/0KB/0KB /s] [330K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [1632MB/0KB/0KB /s] [418K/0/0 iops] Jobs: 12 (f=12): [] [0.0% done] [1765MB/0KB/0KB /s] [452K/0/0 iops] And continue in this fashion, without 
recovering. Note that in this example it was required to wait 16 hours for this to occur. Also note that IO throughput also becomes gradually becomes more unstable leading up to this point. This problem is only seen for non-strict mode. For strict mode, the rcaches stay quite compact. It would be good to understand why the rcache doesn't stabilize. Could be a bug, or just need some tuning In strict mode, if a driver does Alloc-Free-Alloc and the first alloc misses the rcache, the second allocation hits it. The same sequence in non-strict mode misses the cache twice, because the IOVA is added to the flush queue on Free. So rather than AFAFAF.. we get AAA..FFF.., only once the fq_timer triggers or the FQ is full. Interestingly the FQ size is 2x IOVA_MAG_SIZE, so we could allocate 2 magazines worth of fresh IOVAs before alloc starts hitting the cache. If a job allocates more than that, some magazines are going to the depot, and with multi-CPU jobs those will get used on other CPUs during the next alloc bursts, causing the progressive increase in rcache consumption. I wonder if setting IOVA_MAG_SIZE > IOVA_FQ_SIZE helps reuse of IOVAs? Then again I haven't worked out the details, might be entirely wrong. I'll have another look next week. I did start digging into the data (thanks for that!) before Christmas, but between being generally frazzled and trying to remember how to write Perl to massage the numbers out of the log dump I never got round to responding, sorry. 
The partial thoughts that I can recall right now are firstly that the total numbers of IOVAs are actually pretty meaningless, it really needs to be broken down by size (that's where my Perl-hacking stalled...); secondly that the pattern is far more than just a steady increase - the CPU rcache count looks to be heading asymptotically towards ~65K IOVAs all the time, representing (IIRC) two sizes being in heavy rotation, while the depot is happily ticking along in a steady state as expected, until it suddenly explodes out of nowhere; thirdly, I'd really like to see instrumentation of the flush queues at the same time, since I think they're the real culprit. My theory so far is that everyone is calling queue_iova() frequently enough to keep the timer at bay and their own queues drained. Then at the ~16H mark, *something* happens that pauses unmaps long enough for the timer to fire, and at that point all hell breaks loose. The depot is suddenly flooded with IOVAs of *all* sizes, indicative of all the queues being flushed at once (note that the two most common sizes have been hovering perilously close to "full" the whole time), but then, crucially, *that keeps happening*. My guess is that the load of fq_flush_timeout() slows things down enough that the the
Re: [PATCH v2 2/2] vfio/iommu_type1: Sanity check pfn_list when remove vfio_dma
On Fri, 15 Jan 2021 17:26:43 +0800 Keqian Zhu wrote: > vfio_sanity_check_pfn_list() is used to check whether pfn_list of > vfio_dma is empty when remove the external domain, so it makes a > wrong assumption that only external domain will add pfn to dma pfn_list. > > Now we apply this check when remove a specific vfio_dma and extract > the notifier check just for external domain. The page pinning interface is gated by having a notifier registered for unmaps, therefore non-external domains would also need to register a notifier. There's currently no other way to add entries to the pfn_list. So if we allow pinning for such domains, then it's wrong to WARN_ON() when the notifier list is not-empty when removing an external domain. Long term we should probably extend page {un}pinning for the caller to pass their notifier to be validated against the notifier list rather than just allowing page pinning if *any* notifier is registered. Thanks, Alex > Fixes: a54eb55045ae ("vfio iommu type1: Add support for mediated devices") > Signed-off-by: Keqian Zhu > --- > drivers/vfio/vfio_iommu_type1.c | 24 +--- > 1 file changed, 5 insertions(+), 19 deletions(-) > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index 4e82b9a3440f..a9bc15e84a4e 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -958,6 +958,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, > struct vfio_dma *dma, > > static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma) > { > + WARN_ON(!RB_EMPTY_ROOT(>pfn_list); > vfio_unmap_unpin(iommu, dma, true); > vfio_unlink_dma(iommu, dma); > put_task_struct(dma->task); > @@ -2251,23 +2252,6 @@ static void vfio_iommu_unmap_unpin_reaccount(struct > vfio_iommu *iommu) > } > } > > -static void vfio_sanity_check_pfn_list(struct vfio_iommu *iommu) > -{ > - struct rb_node *n; > - > - n = rb_first(>dma_list); > - for (; n; n = rb_next(n)) { > - struct vfio_dma *dma; > - > - dma = 
rb_entry(n, struct vfio_dma, node); > - > - if (WARN_ON(!RB_EMPTY_ROOT(>pfn_list))) > - break; > - } > - /* mdev vendor driver must unregister notifier */ > - WARN_ON(iommu->notifier.head); > -} > - > /* > * Called when a domain is removed in detach. It is possible that > * the removed domain decided the iova aperture window. Modify the > @@ -2367,7 +2351,8 @@ static void vfio_iommu_type1_detach_group(void > *iommu_data, > kfree(group); > > if (list_empty(>external_domain->group_list)) { > - vfio_sanity_check_pfn_list(iommu); > + /* mdev vendor driver must unregister notifier > */ > + WARN_ON(iommu->notifier.head); > > if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) > vfio_iommu_unmap_unpin_all(iommu); > @@ -2491,7 +2476,8 @@ static void vfio_iommu_type1_release(void *iommu_data) > > if (iommu->external_domain) { > vfio_release_domain(iommu->external_domain, true); > - vfio_sanity_check_pfn_list(iommu); > + /* mdev vendor driver must unregister notifier */ > + WARN_ON(iommu->notifier.head); > kfree(iommu->external_domain); > } > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [GIT PULL] IOMMU fixes for -rc4
The pull request you sent on Fri, 15 Jan 2021 11:26:20 +: > git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git tags/iommu-fixes has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/7aec71cd9c1f251ef17eae5f898c10133d49421d Thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/prtracker.html ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 1/2] vfio/iommu_type1: Populate full dirty when detach non-pinned group
On Fri, 15 Jan 2021 17:26:42 +0800 Keqian Zhu wrote: > If a group with non-pinned-page dirty scope is detached with dirty > logging enabled, we should fully populate the dirty bitmaps at the > time it's removed since we don't know the extent of its previous DMA, > nor will the group be present to trigger the full bitmap when the user > retrieves the dirty bitmap. > > Fixes: d6a4c185660c ("vfio iommu: Implementation of ioctl for dirty pages > tracking") > Suggested-by: Alex Williamson > Signed-off-by: Keqian Zhu > --- > drivers/vfio/vfio_iommu_type1.c | 18 +- > 1 file changed, 17 insertions(+), 1 deletion(-) > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index 0b4dedaa9128..4e82b9a3440f 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -236,6 +236,19 @@ static void vfio_dma_populate_bitmap(struct vfio_dma > *dma, size_t pgsize) > } > } > > +static void vfio_iommu_populate_bitmap_full(struct vfio_iommu *iommu) > +{ > + struct rb_node *n; > + unsigned long pgshift = __ffs(iommu->pgsize_bitmap); > + > + for (n = rb_first(>dma_list); n; n = rb_next(n)) { > + struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node); > + > + if (dma->iommu_mapped) > + bitmap_set(dma->bitmap, 0, dma->size >> pgshift); > + } > +} > + > static int vfio_dma_bitmap_alloc_all(struct vfio_iommu *iommu, size_t pgsize) > { > struct rb_node *n; > @@ -2415,8 +2428,11 @@ static void vfio_iommu_type1_detach_group(void > *iommu_data, >* Removal of a group without dirty tracking may allow the iommu scope >* to be promoted. >*/ > - if (update_dirty_scope) > + if (update_dirty_scope) { > update_pinned_page_dirty_scope(iommu); > + if (iommu->dirty_page_tracking) > + vfio_iommu_populate_bitmap_full(iommu); > + } > mutex_unlock(>lock); > } > This doesn't do the right thing. 
This marks the bitmap dirty if:
* the detached group's dirty scope was not limited to pinned pages, AND
* dirty tracking is enabled, AND
* the vfio_dma is *currently* (i.e. after the detach) iommu_mapped.
We need to mark the bitmap dirty based on whether the vfio_dma *was* iommu_mapped by the group that is now detached. Thanks, Alex
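Alex's point about using the pre-detach state can be sketched in plain C. This is a user-space toy model, not kernel code: the struct, the `was_iommu_mapped` flag, and the helper names are invented for illustration; only the pgshift/bitmap arithmetic mirrors the patch (`__ffs(iommu->pgsize_bitmap)` and `bitmap_set(dma->bitmap, 0, dma->size >> pgshift)`).

```c
/* Toy model of vfio_iommu_populate_bitmap_full(): pgshift comes from the
 * smallest supported page size, and every page of a mapped vfio_dma is
 * marked dirty.  The was_iommu_mapped flag is hypothetical, capturing
 * Alex's point that the *pre-detach* mapping state is what matters. */
#include <assert.h>
#include <stdint.h>

struct vfio_dma_sim {
	uint64_t size;            /* bytes covered by this mapping */
	int was_iommu_mapped;     /* state recorded before the group detach */
	unsigned char bitmap[16]; /* 1 bit per page */
};

static void bitmap_set_sim(unsigned char *map, unsigned long start,
			   unsigned long nbits)
{
	for (unsigned long i = start; i < start + nbits; i++)
		map[i / 8] |= 1u << (i % 8);
}

/* Mark the whole range dirty, mirroring
 * bitmap_set(dma->bitmap, 0, dma->size >> pgshift),
 * but keyed on the state recorded before the detach. */
static unsigned long populate_full(struct vfio_dma_sim *dma,
				   uint64_t pgsize_bitmap)
{
	unsigned long pgshift = __builtin_ctzll(pgsize_bitmap); /* ~__ffs() */
	unsigned long npages = dma->size >> pgshift;

	if (dma->was_iommu_mapped)
		bitmap_set_sim(dma->bitmap, 0, npages);
	return npages;
}
```

Recording the mapped state before the group goes away is what makes the full-bitmap population correct, since the group is no longer present by the time the user reads the bitmap.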
Re: [PATCH v4 3/3] iommu/iova: Flush CPU rcache for when a depot fills
On Thu, Dec 10, 2020 at 02:23:09AM +0800, John Garry wrote: > Leizhen reported some time ago that IOVA performance may degrade over time > [0], but unfortunately his solution to fix this problem was not given > attention. > > To summarize, the issue is that as time goes by, the CPU rcache and depot > rcache continue to grow. As such, IOVA RB tree access time also continues > to grow. > > At a certain point, a depot may become full, and also some CPU rcaches may > also be full when inserting another IOVA is attempted. For this scenario, > currently the "loaded" CPU rcache is freed and a new one is created. This > freeing means that many IOVAs in the RB tree need to be freed, which > makes IO throughput performance fall off a cliff in some storage scenarios: > > Jobs: 12 (f=12): [] [0.0% done] [6314MB/0KB/0KB /s] [1616K/0/0 > iops] > Jobs: 12 (f=12): [] [0.0% done] [5669MB/0KB/0KB /s] [1451K/0/0 > iops] > Jobs: 12 (f=12): [] [0.0% done] [6031MB/0KB/0KB /s] [1544K/0/0 > iops] > Jobs: 12 (f=12): [] [0.0% done] [6673MB/0KB/0KB /s] [1708K/0/0 > iops] > Jobs: 12 (f=12): [] [0.0% done] [6705MB/0KB/0KB /s] [1717K/0/0 > iops] > Jobs: 12 (f=12): [] [0.0% done] [6031MB/0KB/0KB /s] [1544K/0/0 > iops] > Jobs: 12 (f=12): [] [0.0% done] [6761MB/0KB/0KB /s] [1731K/0/0 > iops] > Jobs: 12 (f=12): [] [0.0% done] [6705MB/0KB/0KB /s] [1717K/0/0 > iops] > Jobs: 12 (f=12): [] [0.0% done] [6685MB/0KB/0KB /s] [1711K/0/0 > iops] > Jobs: 12 (f=12): [] [0.0% done] [6178MB/0KB/0KB /s] [1582K/0/0 > iops] > Jobs: 12 (f=12): [] [0.0% done] [6731MB/0KB/0KB /s] [1723K/0/0 > iops] > Jobs: 12 (f=12): [] [0.0% done] [2387MB/0KB/0KB /s] [611K/0/0 > iops] > Jobs: 12 (f=12): [] [0.0% done] [2689MB/0KB/0KB /s] [688K/0/0 > iops] > Jobs: 12 (f=12): [] [0.0% done] [2278MB/0KB/0KB /s] [583K/0/0 > iops] > Jobs: 12 (f=12): [] [0.0% done] [1288MB/0KB/0KB /s] [330K/0/0 > iops] > Jobs: 12 (f=12): [] [0.0% done] [1632MB/0KB/0KB /s] [418K/0/0 > iops] > Jobs: 12 (f=12): [] [0.0% done] [1765MB/0KB/0KB /s] [452K/0/0 > 
iops] > > And continue in this fashion, without recovering. Note that in this > example it was required to wait 16 hours for this to occur. Also note that > IO throughput also gradually becomes more unstable leading up to > this point. > > This problem is only seen for non-strict mode. For strict mode, the rcaches > stay quite compact. It would be good to understand why the rcache doesn't stabilize. Could be a bug, or it may just need some tuning. In strict mode, if a driver does Alloc-Free-Alloc and the first alloc misses the rcache, the second allocation hits it. The same sequence in non-strict mode misses the cache twice, because the IOVA is added to the flush queue on Free. So rather than AFAFAF.. we get AAA..FFF.., only once the fq_timer triggers or the FQ is full. Interestingly the FQ size is 2x IOVA_MAG_SIZE, so we could allocate 2 magazines' worth of fresh IOVAs before alloc starts hitting the cache. If a job allocates more than that, some magazines are going to the depot, and with multi-CPU jobs those will get used on other CPUs during the next alloc bursts, causing the progressive increase in rcache consumption. I wonder if setting IOVA_MAG_SIZE > IOVA_FQ_SIZE helps reuse of IOVAs? Then again I haven't worked out the details, might be entirely wrong. I'll have another look next week. Thanks, Jean > As a solution to this issue, judge that the IOVA caches have grown too big > when cached magazines need to be freed, and just flush all the CPU rcaches > instead. > > The depot rcaches, however, are not flushed, as they can be used to > immediately replenish active CPUs. > > In future, some IOVA compaction could be implemented to solve the > instability issue, which I figure could be quite complex to implement. 
> > [0] > https://lore.kernel.org/linux-iommu/20190815121104.29140-3-thunder.leiz...@huawei.com/ > > Analyzed-by: Zhen Lei > Reported-by: Xiang Chen > Tested-by: Xiang Chen > Signed-off-by: John Garry > Reviewed-by: Zhen Lei > --- > drivers/iommu/iova.c | 16 ++-- > 1 file changed, 6 insertions(+), 10 deletions(-) > > diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c > index 732ee687e0e2..39b7488de8bb 100644 > --- a/drivers/iommu/iova.c > +++ b/drivers/iommu/iova.c > @@ -841,7 +841,6 @@ static bool __iova_rcache_insert(struct iova_domain > *iovad, >struct iova_rcache *rcache, >unsigned long iova_pfn) > { > - struct iova_magazine *mag_to_free = NULL; > struct iova_cpu_rcache *cpu_rcache; > bool can_insert = false; > unsigned long flags; > @@ -863,13 +862,12 @@ static bool __iova_rcache_insert(struct iova_domain > *iovad, > if (cpu_rcache->loaded) >
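Jean-Philippe's strict-versus-non-strict analysis above is easy to reproduce with a toy model. The cache below is a single counter standing in for the loaded magazine, and the sizes are illustrative rather than the kernel's; the point is only that an alternating alloc/free stream stays warm, while a deferred-free burst misses on every allocation and then spills the surplus frees toward the depot.

```c
/* Toy model of the per-CPU IOVA rcache behaviour described above.
 * TOY_MAG_SIZE and the burst length are made up for the example. */
#include <assert.h>

#define TOY_MAG_SIZE 128

static int toy_depth; /* entries currently cached */

static int toy_alloc(void) /* returns 1 on a cache miss (RB-tree fallback) */
{
	if (toy_depth > 0) {
		toy_depth--;
		return 0;
	}
	return 1;
}

static void toy_free(void)
{
	if (toy_depth < TOY_MAG_SIZE)
		toy_depth++;
	/* else: magazine full, the entry would spill to the depot */
}

/* Strict mode: alloc and free alternate (AFAF...). */
static int misses_strict(int n)
{
	int i, misses = 0;

	toy_depth = 0;
	for (i = 0; i < n; i++) {
		misses += toy_alloc();
		toy_free();
	}
	return misses;
}

/* Non-strict mode: frees sit in the flush queue, so the allocs arrive
 * as one burst followed by one burst of frees (AAA..FFF..). */
static int misses_batched(int n)
{
	int i, misses = 0;

	toy_depth = 0;
	for (i = 0; i < n; i++)
		misses += toy_alloc();
	for (i = 0; i < n; i++)
		toy_free();
	return misses;
}
```

With a 256-operation burst, the strict pattern misses only on the very first allocation, while the batched pattern misses 256 times and can retain at most one magazine's worth of the returned IOVAs, matching the progressive rcache growth described in the thread.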
Re: [PATCH v4 2/3] iommu/iova: Avoid double-negatives in magazine helpers
On Thu, Dec 10, 2020 at 02:23:08AM +0800, John Garry wrote: > A similar crash to the following could be observed if initial CPU rcache > magazine allocations fail in init_iova_rcaches(): Any idea why that's happening? This fix seems ok but if we're expecting allocation failures for the loaded magazine then we could easily get it for cpu_rcaches too, and get a similar abort at runtime. Thanks, Jean > Unable to handle kernel NULL pointer dereference at virtual address > > Mem abort info: > > free_iova_fast+0xfc/0x280 > iommu_dma_free_iova+0x64/0x70 > __iommu_dma_unmap+0x9c/0xf8 > iommu_dma_unmap_sg+0xa8/0xc8 > dma_unmap_sg_attrs+0x28/0x50 > cq_thread_v3_hw+0x2dc/0x528 > irq_thread_fn+0x2c/0xa0 > irq_thread+0x130/0x1e0 > kthread+0x154/0x158 > ret_from_fork+0x10/0x34 > > The issue is that expression !iova_magazine_full(NULL) evaluates true; this > falls over in __iova_rcache_insert() when we attempt to cache a mag and > cpu_rcache->loaded == NULL: > > if (!iova_magazine_full(cpu_rcache->loaded)) { > can_insert = true; > ... > > if (can_insert) > iova_magazine_push(cpu_rcache->loaded, iova_pfn); > > As above, can_insert is evaluated true, which it shouldn't be, and we try > to insert pfns in a NULL mag, which is not safe. > > To avoid this, stop using double-negatives, like !iova_magazine_full() and > !iova_magazine_empty(), and use positive tests, like > iova_magazine_has_space() and iova_magazine_has_pfns(), respectively; these > can safely deal with cpu_rcache->{loaded, prev} = NULL. 
> > Signed-off-by: John Garry > Tested-by: Xiang Chen > Reviewed-by: Zhen Lei > --- > drivers/iommu/iova.c | 29 + > 1 file changed, 17 insertions(+), 12 deletions(-) > > diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c > index cf1aacda2fe4..732ee687e0e2 100644 > --- a/drivers/iommu/iova.c > +++ b/drivers/iommu/iova.c > @@ -767,14 +767,18 @@ iova_magazine_free_pfns(struct iova_magazine *mag, > struct iova_domain *iovad) > mag->size = 0; > } > > -static bool iova_magazine_full(struct iova_magazine *mag) > +static bool iova_magazine_has_space(struct iova_magazine *mag) > { > - return (mag && mag->size == IOVA_MAG_SIZE); > + if (!mag) > + return false; > + return mag->size < IOVA_MAG_SIZE; > } > > -static bool iova_magazine_empty(struct iova_magazine *mag) > +static bool iova_magazine_has_pfns(struct iova_magazine *mag) > { > - return (!mag || mag->size == 0); > + if (!mag) > + return false; > + return mag->size; > } > > static unsigned long iova_magazine_pop(struct iova_magazine *mag, > @@ -783,7 +787,7 @@ static unsigned long iova_magazine_pop(struct > iova_magazine *mag, > int i; > unsigned long pfn; > > - BUG_ON(iova_magazine_empty(mag)); > + BUG_ON(!iova_magazine_has_pfns(mag)); > > /* Only fall back to the rbtree if we have no suitable pfns at all */ > for (i = mag->size - 1; mag->pfns[i] > limit_pfn; i--) > @@ -799,7 +803,7 @@ static unsigned long iova_magazine_pop(struct > iova_magazine *mag, > > static void iova_magazine_push(struct iova_magazine *mag, unsigned long pfn) > { > - BUG_ON(iova_magazine_full(mag)); > + BUG_ON(!iova_magazine_has_space(mag)); > > mag->pfns[mag->size++] = pfn; > } > @@ -845,9 +849,9 @@ static bool __iova_rcache_insert(struct iova_domain > *iovad, > cpu_rcache = raw_cpu_ptr(rcache->cpu_rcaches); > spin_lock_irqsave(&cpu_rcache->lock, flags); > > - if (!iova_magazine_full(cpu_rcache->loaded)) { > + if (iova_magazine_has_space(cpu_rcache->loaded)) { > can_insert = true; > - } else if (!iova_magazine_full(cpu_rcache->prev)) { > + } 
else if (iova_magazine_has_space(cpu_rcache->prev)) { > swap(cpu_rcache->prev, cpu_rcache->loaded); > can_insert = true; > } else { > @@ -856,8 +860,9 @@ static bool __iova_rcache_insert(struct iova_domain > *iovad, > if (new_mag) { > spin_lock(&rcache->lock); > if (rcache->depot_size < MAX_GLOBAL_MAGS) { > - rcache->depot[rcache->depot_size++] = > - cpu_rcache->loaded; > + if (cpu_rcache->loaded) > + rcache->depot[rcache->depot_size++] = > + cpu_rcache->loaded; > } else { > mag_to_free = cpu_rcache->loaded; > } > @@ -908,9 +913,9 @@ static unsigned long __iova_rcache_get(struct iova_rcache > *rcache, > cpu_rcache = raw_cpu_ptr(rcache->cpu_rcaches); > spin_lock_irqsave(&cpu_rcache->lock, flags); > > - if (!iova_magazine_empty(cpu_rcache->loaded)) { > + if (iova_magazine_has_pfns(cpu_rcache->loaded)) { > has_pfn = true; > - } else
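The double-negative hazard John describes can be shown in a few lines of stand-alone C: `iova_magazine_full(NULL)` is false, so `!iova_magazine_full(NULL)` is true and a NULL magazine looks insertable, whereas the positive `iova_magazine_has_space()` form is safely false for NULL. The helpers are reproduced here in user-space form from the patch; the struct is reduced to the one field that matters.

```c
/* Demonstration of the NULL-magazine bug fixed above. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IOVA_MAG_SIZE 128

struct iova_magazine {
	unsigned long size;
	/* pfns[] omitted; only the size check matters here */
};

/* Old helper: false for NULL, so its negation wrongly reads as
 * "there is room in this (nonexistent) magazine". */
static bool iova_magazine_full(struct iova_magazine *mag)
{
	return mag && mag->size == IOVA_MAG_SIZE;
}

/* New helper: explicitly false for NULL, so a NULL magazine is never
 * considered a valid target for iova_magazine_push(). */
static bool iova_magazine_has_space(struct iova_magazine *mag)
{
	if (!mag)
		return false;
	return mag->size < IOVA_MAG_SIZE;
}
```

With the old test, `can_insert` becomes true for `cpu_rcache->loaded == NULL` and the subsequent push dereferences NULL, which is exactly the crash in the commit log.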
Re: [PATCH v4 1/3] iommu/iova: Add free_all_cpu_cached_iovas()
On Thu, Dec 10, 2020 at 02:23:07AM +0800, John Garry wrote: > Add a helper function to free the CPU rcache for all online CPUs. > > There also exists a function of the same name in > drivers/iommu/intel/iommu.c, but the parameters are different, and there > should be no conflict. > > Signed-off-by: John Garry > Tested-by: Xiang Chen > Reviewed-by: Zhen Lei Reviewed-by: Jean-Philippe Brucker (unless we find a better solution for patch 3) > --- > drivers/iommu/iova.c | 13 + > 1 file changed, 9 insertions(+), 4 deletions(-) > > diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c > index f9c35852018d..cf1aacda2fe4 100644 > --- a/drivers/iommu/iova.c > +++ b/drivers/iommu/iova.c > @@ -238,6 +238,14 @@ static int __alloc_and_insert_iova_range(struct > iova_domain *iovad, > return -ENOMEM; > } > > +static void free_all_cpu_cached_iovas(struct iova_domain *iovad) > +{ > + unsigned int cpu; > + > + for_each_online_cpu(cpu) > + free_cpu_cached_iovas(cpu, iovad); > +} > + > static struct kmem_cache *iova_cache; > static unsigned int iova_cache_users; > static DEFINE_MUTEX(iova_cache_mutex); > @@ -435,15 +443,12 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned > long size, > retry: > new_iova = alloc_iova(iovad, size, limit_pfn, true); > if (!new_iova) { > - unsigned int cpu; > - > if (!flush_rcache) > return 0; > > /* Try replenishing IOVAs by flushing rcache. */ > flush_rcache = false; > - for_each_online_cpu(cpu) > - free_cpu_cached_iovas(cpu, iovad); > + free_all_cpu_cached_iovas(iovad); > free_global_cached_iovas(iovad); > goto retry; > } > -- > 2.26.2 > > ___ > iommu mailing list > iommu@lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/iommu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Consult on ARM SMMU debugfs
On 2021-01-15 15:14, Russell King - ARM Linux admin wrote: On Mon, Jan 11, 2021 at 08:01:48PM +, Robin Murphy wrote: On 2021-01-07 02:45, chenxiang (M) wrote: Hi Will, Robin or other guys, When debugging SMMU/SVA issue on huawei ARM64 board, we find that it lacks enough debugfs support for the ARM SMMU driver (such as the value of STE/CD which we need to check sometimes). Currently it creates a top-level iommu directory in debugfs, but there is no debugfs for the ARM SMMU driver specifically. Do you know whether ARM has a plan to do that recently? FWIW I don't think I've ever felt the need to inspect the Stream Table on a live system. So far the nature of the STE code has been simple enough that it's very hard for any given STE to be *wrong* - either it's set up as expected and thus works fine, or it's not initialised at all and you get C_BAD_STE, where 99% of the time you then just cross-reference the Stream ID against the firmware and find that the DT/IORT is wrong. Similarly I don't think I've ever even *seen* an issue that could be attributed to a context descriptor, although I appreciate that as we start landing more PASID and SVA support the scope for that starts to widen considerably. Feel free to propose a patch if you believe it would be genuinely useful and won't just bit-rot into a maintenance burden, but it's not something that's on our roadmap here. I do think that the IOMMU stuff needs better debugging. I've hit the WARN_ON() in __arm_lpae_map(), and it's been pretty much undebuggable, so I've resorted to putting the IOMMU into bypass mode permanently to work around the issue. The reason that it's undebuggable is that if one puts printk() or trace statements in the code and boots the platform, you get flooded with those debugging messages, because every access to the rootfs generates and tears down a mapping. It would be nice to be able to inspect the IOMMU page tables and state of the IOMMU, rather than having to resort to effectively disabling the IOMMU. 
Certainly once we get to stuff like unpinned VFIO, having the ability to inspect pagetables for arbitrary IOMMU API usage will indeed be useful. From the DMA mapping perspective, though, unless you're working on the io-pgtable code itself it's not really going to tell you much that dumping the mappings from dma-debug can't already. FWIW whenever I encounter that particular warning in iommu-dma context, I don't care where the existing mapping is pointing, since it's merely a symptom of the damage already having been done. At that point I'd usually go off and audit all the DMA API calls in the offending driver, since it's typically caused by corruption in the IOVA allocator from passing the wrong size in a dma_unmap_*() call, and those can often be spotted by inspection. For active debugging, what you really want to know is the *history* of operations around that IOVA, since you're primarily interested in the request that last mapped it, then the corresponding unmap request for nominally the same buffer (which allowed the IOVA region to be freed for reuse) that for some reason didn't cover one or more pages that it should have. The IOMMU API tracepoints can be a handy tool there. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Consult on ARM SMMU debugfs
On Mon, Jan 11, 2021 at 08:01:48PM +, Robin Murphy wrote: > On 2021-01-07 02:45, chenxiang (M) wrote: > > Hi Will, Robin or other guys, > > > > When debugging SMMU/SVA issue on huawei ARM64 board, we find that it > > lacks enough debugfs support for ARM SMMU driver (such as > > > > the value of STE/CD which we need to check sometimes). Currently it > > creates a top-level iommu directory in debugfs, but there is no debugfs > > > > for the ARM SMMU driver specifically. Do you know whether ARM has a plan to > > do that recently? > > FWIW I don't think I've ever felt the need to inspect the Stream > Table on a live system. So far the nature of the STE code has been simple > enough that it's very hard for any given STE to be *wrong* - either it's set > up as expected and thus works fine, or it's not initialised at all and you > get C_BAD_STE, where 99% of the time you then just cross-reference the > Stream ID against the firmware and find that the DT/IORT is wrong. > > Similarly I don't think I've ever even *seen* an issue that could be > attributed to a context descriptor, although I appreciate that as we start > landing more PASID and SVA support the scope for that starts to widen > considerably. > > Feel free to propose a patch if you believe it would be genuinely useful and > won't just bit-rot into a maintenance burden, but it's not something that's > on our roadmap here. I do think that the IOMMU stuff needs better debugging. I've hit the WARN_ON() in __arm_lpae_map(), and it's been pretty much undebuggable, so I've resorted to putting the IOMMU into bypass mode permanently to work around the issue. The reason that it's undebuggable is that if one puts printk() or trace statements in the code and boots the platform, you get flooded with those debugging messages, because every access to the rootfs generates and tears down a mapping. 
It would be nice to be able to inspect the IOMMU page tables and state of the IOMMU, rather than having to resort to effectively disabling the IOMMU. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Re: [PATCH] iommu: check for the deferred attach when attaching a device
On 2021-01-15 14:26, lijiang wrote: Hi, Robin Thank you for the comment. On 2021-01-13 01:29, Robin Murphy wrote: On 2021-01-05 07:52, lijiang wrote: On 2021-01-05 11:55, lijiang wrote: Hi, Also add Joerg to cc list. Also add more people to cc list, Jerry Snitselaar and Tom Lendacky. Thanks. Thanks. Lianbo On 2020-12-26 13:39, Lianbo Jiang wrote: Currently, because domain attach is allowed to be deferred from iommu driver to device driver, and when iommu initializes, the devices on the bus will be scanned and the default groups will be allocated. Due to the above changes, some devices could be added to the same group as below: [ 3.859417] pci :01:00.0: Adding to iommu group 16 [ 3.864572] pci :01:00.1: Adding to iommu group 16 [ 3.869738] pci :02:00.0: Adding to iommu group 17 [ 3.874892] pci :02:00.1: Adding to iommu group 17 But when attaching these devices, it doesn't allow a group to have more than one device, otherwise it will return an error. This conflicts with the deferred attaching. Unfortunately, it has two devices in the same group on my side, for example: [ 9.627014] iommu_group_device_count(): device name[0]::01:00.0 [ 9.633545] iommu_group_device_count(): device name[1]::01:00.1 ... [ 10.255609] iommu_group_device_count(): device name[0]::02:00.0 [ 10.262144] iommu_group_device_count(): device name[1]::02:00.1 Finally, this caused the failure of the tg3 driver when tg3 driver calls the dma_alloc_coherent() to allocate coherent memory in the tg3_test_dma(). 
[ 9.660310] tg3 :01:00.0: DMA engine test failed, aborting [ 9.754085] tg3: probe of :01:00.0 failed with error -12 [ 9.997512] tg3 :01:00.1: DMA engine test failed, aborting [ 10.043053] tg3: probe of :01:00.1 failed with error -12 [ 10.288905] tg3 :02:00.0: DMA engine test failed, aborting [ 10.334070] tg3: probe of :02:00.0 failed with error -12 [ 10.578303] tg3 :02:00.1: DMA engine test failed, aborting [ 10.622629] tg3: probe of :02:00.1 failed with error -12 In addition, the similar situations also occur in other drivers such as the bnxt_en driver. That can be reproduced easily in kdump kernel when SME is active. Add a check for the deferred attach in the iommu_attach_device() and allow to attach the deferred device regardless of how many devices are in a group. Is this iommu_attach_device() call is coming from iommu-dma? (if not, then whoever's calling it probably shouldn't be) Yes, you are right, the iommu_attach_device call is coming from iommu-dma. Assuming so, then probably what should happen is to move the handling currently in iommu_dma_deferred_attach() into the core so that it can call __iommu_attach_device() directly - the intent is just to replay that exact call skipped in iommu_group_add_device(), so the legacy external iommu_attach_device() interface isn't really the right tool for the job Sounds good. I will check if this can work in various cases. If it's OK, I will post again. 
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index f0305e6aac1b..5e7da902ac36 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -23,7 +23,6 @@ #include #include #include -#include #include struct iommu_dma_msi_page { @@ -378,21 +377,6 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, return iova_reserve_iommu_regions(dev, domain); } -static int iommu_dma_deferred_attach(struct device *dev, - struct iommu_domain *domain) -{ - const struct iommu_ops *ops = domain->ops; - - if (!is_kdump_kernel()) - return 0; - - if (unlikely(ops->is_attach_deferred && - ops->is_attach_deferred(domain, dev))) - return iommu_attach_device(domain, dev); - - return 0; -} - /** * dma_info_to_prot - Translate DMA API directions and attributes to IOMMU API *page flags. diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index ffeebda8d6de..4fed1567b498 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -23,6 +23,7 @@ #include #include #include +#include #include static struct kset *iommu_group_kset; @@ -1952,6 +1953,21 @@ static int __iommu_attach_device(struct iommu_domain *domain, return ret; } +int iommu_dma_deferred_attach(struct device *dev, +struct iommu_domain *domain) +{ +const struct iommu_ops *ops = domain->ops; + +if (!is_kdump_kernel()) +return 0; + +if (unlikely(ops->is_attach_deferred && +ops->is_attach_deferred(domain, dev))) +return __iommu_attach_device(domain, dev); + +return 0; +} + int iommu_attach_device(struct iommu_domain *domain, struct device *dev) { struct iommu_group *group; diff --git a/include/linux/iommu.h
RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
Hi Joerg, Thanks. Hope you are doing well now. Edgar -----Original Message----- From: jroe...@suse.de Sent: Friday, 15 January 2021 09:18 To: Merger, Edgar [AUTOSOL/MAS/AUGS] Cc: iommu@lists.linux-foundation.org Subject: Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled Hi Edgar, On Mon, Nov 23, 2020 at 06:41:18AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote: > Just wanted to follow-up on that topic. > Is that quirk already put into upstream kernel? Sorry for the late reply, I had to take an extended sick leave. I will take care of sending this fix upstream next week. Regards, Joerg
Re: [PATCH v3 5/6] media: uvcvideo: Use dma_alloc_noncontiguos API
On Mon, Jan 11, 2021 at 9:36 AM Christoph Hellwig wrote: > > On Thu, Jan 07, 2021 at 03:14:08PM +0100, Ricardo Ribalda wrote: > > > So I think we do want these branches for coherent vs non-coherent as they > > > have very different semantics and I do not think that hiding them under > > > the same API helps people to understand those vastly different semantics. > > > > > > OTOH we should look into a fallback for DMA API instances that do not > > > support the discontiguous allocations. > > > > > > I think between your comments and those from Ricardo I have a good idea > > > for a somewhat updated API. I'll try to post something in the next days. > > > > Did you have time to look into this? > > > > No hurry, I just want to make sure that I didn't miss anything ;) > > Haven't managed to get to it, sorry. No worries! Is there something we can do to help you with this? > > > > > Best regards! > > > > > > > > -- > > Ricardo Ribalda > ---end quoted text--- -- Ricardo Ribalda
[PATCH RFC v1 15/15] iommu/virtio: Update fault type and reason info for viommu fault
Fault type information can tell about a page request fault or an unreceoverable fault, and further additions to fault reasons and the related PASID information can help in handling faults efficiently. Signed-off-by: Vivek Gautam Cc: Joerg Roedel Cc: Will Deacon Cc: Michael S. Tsirkin Cc: Robin Murphy Cc: Jean-Philippe Brucker Cc: Eric Auger Cc: Alex Williamson Cc: Kevin Tian Cc: Jacob Pan Cc: Liu Yi L Cc: Lorenzo Pieralisi Cc: Shameerali Kolothum Thodi --- drivers/iommu/virtio-iommu.c | 27 +-- include/uapi/linux/virtio_iommu.h | 13 - 2 files changed, 37 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index 9cc3d35125e9..10ef9e98214a 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -652,9 +652,16 @@ static int viommu_fault_handler(struct viommu_dev *viommu, char *reason_str; u8 reason = fault->reason; + u16 type= fault->flt_type; u32 flags = le32_to_cpu(fault->flags); u32 endpoint= le32_to_cpu(fault->endpoint); u64 address = le64_to_cpu(fault->address); + u32 pasid = le32_to_cpu(fault->pasid); + + if (type == VIRTIO_IOMMU_FAULT_F_PAGE_REQ) { + dev_info(viommu->dev, "Page request fault - unhandled\n"); + return 0; + } switch (reason) { case VIRTIO_IOMMU_FAULT_R_DOMAIN: @@ -663,6 +670,21 @@ static int viommu_fault_handler(struct viommu_dev *viommu, case VIRTIO_IOMMU_FAULT_R_MAPPING: reason_str = "page"; break; + case VIRTIO_IOMMU_FAULT_R_WALK_EABT: + reason_str = "page walk external abort"; + break; + case VIRTIO_IOMMU_FAULT_R_PTE_FETCH: + reason_str = "pte fetch"; + break; + case VIRTIO_IOMMU_FAULT_R_PERMISSION: + reason_str = "permission"; + break; + case VIRTIO_IOMMU_FAULT_R_ACCESS: + reason_str = "access"; + break; + case VIRTIO_IOMMU_FAULT_R_OOR_ADDRESS: + reason_str = "output address"; + break; case VIRTIO_IOMMU_FAULT_R_UNKNOWN: default: reason_str = "unknown"; @@ -671,8 +693,9 @@ static int viommu_fault_handler(struct viommu_dev *viommu, /* TODO: find EP by ID and 
report_iommu_fault */ if (flags & VIRTIO_IOMMU_FAULT_F_ADDRESS) - dev_err_ratelimited(viommu->dev, "%s fault from EP %u at %#llx [%s%s%s]\n", - reason_str, endpoint, address, + dev_err_ratelimited(viommu->dev, + "%s fault from EP %u PASID %u at %#llx [%s%s%s]\n", + reason_str, endpoint, pasid, address, flags & VIRTIO_IOMMU_FAULT_F_READ ? "R" : "", flags & VIRTIO_IOMMU_FAULT_F_WRITE ? "W" : "", flags & VIRTIO_IOMMU_FAULT_F_EXEC ? "X" : ""); diff --git a/include/uapi/linux/virtio_iommu.h b/include/uapi/linux/virtio_iommu.h index 608c8d642e1f..a537d82777f7 100644 --- a/include/uapi/linux/virtio_iommu.h +++ b/include/uapi/linux/virtio_iommu.h @@ -290,19 +290,30 @@ struct virtio_iommu_req_invalidate { #define VIRTIO_IOMMU_FAULT_R_UNKNOWN 0 #define VIRTIO_IOMMU_FAULT_R_DOMAIN1 #define VIRTIO_IOMMU_FAULT_R_MAPPING 2 +#define VIRTIO_IOMMU_FAULT_R_WALK_EABT 3 +#define VIRTIO_IOMMU_FAULT_R_PTE_FETCH 4 +#define VIRTIO_IOMMU_FAULT_R_PERMISSION5 +#define VIRTIO_IOMMU_FAULT_R_ACCESS6 +#define VIRTIO_IOMMU_FAULT_R_OOR_ADDRESS 7 #define VIRTIO_IOMMU_FAULT_F_READ (1 << 0) #define VIRTIO_IOMMU_FAULT_F_WRITE (1 << 1) #define VIRTIO_IOMMU_FAULT_F_EXEC (1 << 2) #define VIRTIO_IOMMU_FAULT_F_ADDRESS (1 << 8) +#define VIRTIO_IOMMU_FAULT_F_DMA_UNRECOV 1 +#define VIRTIO_IOMMU_FAULT_F_PAGE_REQ 2 + struct virtio_iommu_fault { __u8reason; - __u8reserved[3]; + __le16 flt_type; + __u8reserved; __le32 flags; __le32 endpoint; __u8reserved2[4]; __le64 address; + __le32 pasid; + __u8reserved3[4]; }; #endif -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
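A quick way to sanity-check the extended record is to decode it in user space. The field names below follow the proposed `struct virtio_iommu_fault` from the patch, but the host-endian struct, the reduced reason table, and the helper names are simplifications for illustration; the real driver uses `le16`/`le32`/`le64` accessors and the full reason list.

```c
/* User-space sketch of decoding the extended virtio-iommu fault record.
 * Assumes a little-endian host; layout and helpers are illustrative. */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VIRTIO_IOMMU_FAULT_R_DOMAIN     1
#define VIRTIO_IOMMU_FAULT_R_MAPPING    2
#define VIRTIO_IOMMU_FAULT_R_PERMISSION 5
#define VIRTIO_IOMMU_FAULT_F_PAGE_REQ   2

struct fault_rec {
	uint8_t reason;     /* VIRTIO_IOMMU_FAULT_R_* */
	uint16_t flt_type;  /* unrecoverable fault vs page request */
	uint32_t flags;
	uint32_t endpoint;
	uint64_t address;
	uint32_t pasid;     /* new PASID field from the patch */
};

static const char *fault_reason_str(uint8_t reason)
{
	switch (reason) {
	case VIRTIO_IOMMU_FAULT_R_DOMAIN:     return "domain";
	case VIRTIO_IOMMU_FAULT_R_MAPPING:    return "page";
	case VIRTIO_IOMMU_FAULT_R_PERMISSION: return "permission";
	default:                              return "unknown";
	}
}

/* Mirrors the early-out in viommu_fault_handler(): page requests are
 * routed differently from unrecoverable faults. */
static int is_page_request(const struct fault_rec *f)
{
	return f->flt_type == VIRTIO_IOMMU_FAULT_F_PAGE_REQ;
}
```

This mirrors the driver's flow: check `flt_type` first to separate page requests from unrecoverable faults, then map `reason` to a string for reporting.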
[PATCH RFC v1 14/15] iommu/virtio: Add support for Arm LPAE page table format
From: Jean-Philippe Brucker When PASID isn't supported, we can still register one set of tables. Add support to register Arm LPAE based page table. Signed-off-by: Jean-Philippe Brucker [Vivek: Clean-ups to add right tcr definitions and accomodate with parent patches] Signed-off-by: Vivek Gautam Cc: Joerg Roedel Cc: Will Deacon Cc: Michael S. Tsirkin Cc: Robin Murphy Cc: Jean-Philippe Brucker Cc: Eric Auger Cc: Alex Williamson Cc: Kevin Tian Cc: Jacob Pan Cc: Liu Yi L Cc: Lorenzo Pieralisi Cc: Shameerali Kolothum Thodi --- drivers/iommu/virtio-iommu.c | 131 +- include/uapi/linux/virtio_iommu.h | 30 +++ 2 files changed, 139 insertions(+), 22 deletions(-) diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index b5222da1dc74..9cc3d35125e9 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -135,6 +135,13 @@ struct viommu_event { #define to_viommu_domain(domain) \ container_of(domain, struct viommu_domain, domain) +#define VIRTIO_FIELD_PREP(_mask, _shift, _val) \ + ({ \ + (((_val) << VIRTIO_IOMMU_PGTF_ARM_ ## _shift) & \ +(VIRTIO_IOMMU_PGTF_ARM_ ## _mask <<\ + VIRTIO_IOMMU_PGTF_ARM_ ## _shift)); \ + }) + static int viommu_get_req_errno(void *buf, size_t len) { struct virtio_iommu_req_tail *tail = buf + len - sizeof(*tail); @@ -897,6 +904,76 @@ static int viommu_simple_attach(struct viommu_domain *vdomain, return ret; } +static int viommu_config_arm_pgt(struct viommu_endpoint *vdev, +struct io_pgtable_cfg *cfg, +struct virtio_iommu_req_attach_pgt_arm *req, +u64 *asid) +{ + int id; + struct virtio_iommu_probe_table_format *pgtf = (void *)vdev->pgtf; + typeof(>arm_lpae_s1_cfg.tcr) tcr = >arm_lpae_s1_cfg.tcr; + u64 __tcr; + + if (pgtf->asid_bits != 8 && pgtf->asid_bits != 16) + return -EINVAL; + + id = ida_simple_get(_ida, 1, 1 << pgtf->asid_bits, GFP_KERNEL); + if (id < 0) + return -ENOMEM; + + __tcr = VIRTIO_FIELD_PREP(T0SZ_MASK, T0SZ_SHIFT, tcr->tsz) | + VIRTIO_FIELD_PREP(IRGN0_MASK, IRGN0_SHIFT, tcr->irgn) | + 
VIRTIO_FIELD_PREP(ORGN0_MASK, ORGN0_SHIFT, tcr->orgn) | + VIRTIO_FIELD_PREP(SH0_MASK, SH0_SHIFT, tcr->sh) | + VIRTIO_FIELD_PREP(TG0_MASK, TG0_SHIFT, tcr->tg) | + VIRTIO_IOMMU_PGTF_ARM_EPD1 | VIRTIO_IOMMU_PGTF_ARM_HPD0 | + VIRTIO_IOMMU_PGTF_ARM_HPD1; + + req->format = cpu_to_le16(VIRTIO_IOMMU_FOMRAT_PGTF_ARM_LPAE); + req->ttbr = cpu_to_le64(cfg->arm_lpae_s1_cfg.ttbr); + req->tcr= cpu_to_le64(__tcr); + req->mair = cpu_to_le64(cfg->arm_lpae_s1_cfg.mair); + req->asid = cpu_to_le16(id); + + *asid = id; + return 0; +} + +static int viommu_attach_pgtable(struct viommu_endpoint *vdev, +struct viommu_domain *vdomain, +enum io_pgtable_fmt fmt, +struct io_pgtable_cfg *cfg, +u64 *asid) +{ + int ret; + int i, eid; + + struct virtio_iommu_req_attach_table req = { + .head.type = VIRTIO_IOMMU_T_ATTACH_TABLE, + .domain = cpu_to_le32(vdomain->id), + }; + + switch (fmt) { + case ARM_64_LPAE_S1: + ret = viommu_config_arm_pgt(vdev, cfg, (void *), asid); + if (ret) + return ret; + break; + default: + WARN_ON(1); + return -EINVAL; + } + + vdev_for_each_id(i, eid, vdev) { + req.endpoint = cpu_to_le32(eid); + ret = viommu_send_req_sync(vdomain->viommu, , sizeof(req)); + if (ret) + return ret; + } + + return 0; +} + static int viommu_teardown_pgtable(struct viommu_domain *vdomain) { struct iommu_vendor_psdtable_cfg *pst_cfg; @@ -972,32 +1049,42 @@ static int viommu_setup_pgtable(struct viommu_endpoint *vdev, if (!ops) return -ENOMEM; - pst_cfg = >cfg; - cfgi = _cfg->vendor.cfg; - id = ida_simple_get(_ida, 1, 1 << desc->asid_bits, GFP_KERNEL); - if (id < 0) { - ret = id; - goto err_free_pgtable; - } + if (!tbl) { + /* No PASID support, send attach_table */ + ret = viommu_attach_pgtable(vdev, vdomain, fmt, , + >mm.archid); + if (ret) + goto err_free_pgtable; + } else { + pst_cfg = >cfg; +
[PATCH RFC v1 13/15] iommu/virtio: Attach Arm PASID tables when available
From: Jean-Philippe Brucker When the ARM PASID table format is reported in a probe, send an attach request and install the page tables for iommu_map/iommu_unmap use. Architecture-specific components are already abstracted to libraries. We just need to pass config bits around and setup an alternative mechanism to the mapping tree. We reuse the convention already adopted by other IOMMU architectures (ARM SMMU and AMD IOMMU), that entry 0 in the PASID table is reserved for non-PASID traffic. Bind the PASID table, and setup entry 0 to be modified with iommu_map/unmap. Signed-off-by: Jean-Philippe Brucker [Vivek: Bunch of refactoring and clean-ups to use iommu-pasid-table APIs, creating iommu_pasid_table, and configuring based on reported pasid format. Couple of additional methods have also been created to configure vendor specific pasid configuration] Signed-off-by: Vivek Gautam Cc: Joerg Roedel Cc: Will Deacon Cc: Michael S. Tsirkin Cc: Robin Murphy Cc: Jean-Philippe Brucker Cc: Eric Auger Cc: Alex Williamson Cc: Kevin Tian Cc: Jacob Pan Cc: Liu Yi L Cc: Lorenzo Pieralisi Cc: Shameerali Kolothum Thodi --- drivers/iommu/virtio-iommu.c | 314 +++ 1 file changed, 314 insertions(+) diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index 004ea94e3731..b5222da1dc74 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -25,6 +25,7 @@ #include #include +#include "iommu-pasid-table.h" #define MSI_IOVA_BASE 0x800 #define MSI_IOVA_LENGTH0x10 @@ -33,6 +34,9 @@ #define VIOMMU_EVENT_VQ1 #define VIOMMU_NR_VQS 2 +/* Some architectures need an Address Space ID for each page table */ +static DEFINE_IDA(asid_ida); + struct viommu_dev { struct iommu_device iommu; struct device *dev; @@ -55,6 +59,7 @@ struct viommu_dev { u32 probe_size; boolhas_map:1; + boolhas_table:1; }; struct viommu_mapping { @@ -76,6 +81,7 @@ struct viommu_domain { struct mutexmutex; /* protects viommu pointer */ unsigned intid; u32 map_flags; + struct 
iommu_pasid_table*pasid_tbl; /* Default address space when a table is bound */ struct viommu_mmmm; @@ -891,6 +897,285 @@ static int viommu_simple_attach(struct viommu_domain *vdomain, return ret; } +static int viommu_teardown_pgtable(struct viommu_domain *vdomain) +{ + struct iommu_vendor_psdtable_cfg *pst_cfg; + struct arm_smmu_cfg_info *cfgi; + u32 asid; + + if (!vdomain->mm.ops) + return 0; + + free_io_pgtable_ops(vdomain->mm.ops); + vdomain->mm.ops = NULL; + + if (vdomain->pasid_tbl) { + pst_cfg = >pasid_tbl->cfg; + cfgi = _cfg->vendor.cfg; + asid = cfgi->s1_cfg->cd.asid; + + iommu_psdtable_write(vdomain->pasid_tbl, pst_cfg, 0, NULL); + ida_simple_remove(_ida, asid); + } + + return 0; +} + +static int viommu_setup_pgtable(struct viommu_endpoint *vdev, + struct viommu_domain *vdomain) +{ + int ret, id; + u32 asid; + enum io_pgtable_fmt fmt; + struct io_pgtable_ops *ops = NULL; + struct viommu_dev *viommu = vdev->viommu; + struct virtio_iommu_probe_table_format *desc = vdev->pgtf; + struct iommu_pasid_table *tbl = vdomain->pasid_tbl; + struct iommu_vendor_psdtable_cfg *pst_cfg; + struct arm_smmu_cfg_info *cfgi; + struct io_pgtable_cfg cfg = { + .iommu_dev = viommu->dev->parent, + .tlb= _flush_ops, + .pgsize_bitmap = vdev->pgsize_mask ? vdev->pgsize_mask : + vdomain->domain.pgsize_bitmap, + .ias= (vdev->input_end ? ilog2(vdev->input_end) : + ilog2(vdomain->domain.geometry.aperture_end)) + 1, + .oas= vdev->output_bits, + }; + + if (!desc) + return -EINVAL; + + if (!vdev->output_bits) + return -ENODEV; + + switch (le16_to_cpu(desc->format)) { + case VIRTIO_IOMMU_FOMRAT_PGTF_ARM_LPAE: + fmt = ARM_64_LPAE_S1; + break; + default: + dev_err(vdev->dev, "unsupported page table format 0x%x\n", + le16_to_cpu(desc->format)); + return -EINVAL; + } + + if (vdomain->mm.ops) { + /* +* TODO: attach additional endpoint to the domain. Check that +* the config is sane. +*/ +
[PATCH RFC v1 12/15] iommu/virtio: Add support for INVALIDATE request
From: Jean-Philippe Brucker Add support for tlb invalidation ops that can send invalidation requests to back-end virtio-iommu when stage-1 page tables are supported. Signed-off-by: Jean-Philippe Brucker [Vivek: Refactoring the iommu_flush_ops, and adding only one pasid sync op that's needed with current iommu-pasid-table infrastructure. Also updating uapi defines as required by latest changes] Signed-off-by: Vivek Gautam Cc: Joerg Roedel Cc: Will Deacon Cc: Michael S. Tsirkin Cc: Robin Murphy Cc: Jean-Philippe Brucker Cc: Eric Auger Cc: Alex Williamson Cc: Kevin Tian Cc: Jacob Pan Cc: Liu Yi L Cc: Lorenzo Pieralisi Cc: Shameerali Kolothum Thodi --- drivers/iommu/virtio-iommu.c | 95 1 file changed, 95 insertions(+) diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index ae5dfd3f8269..004ea94e3731 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -63,6 +64,8 @@ struct viommu_mapping { }; struct viommu_mm { + int pasid; + u64 archid; struct io_pgtable_ops *ops; struct viommu_domain*domain; }; @@ -692,6 +695,98 @@ static void viommu_event_handler(struct virtqueue *vq) virtqueue_kick(vq); } +/* PASID and pgtable APIs */ + +static void __viommu_flush_pasid_tlb_all(struct viommu_domain *vdomain, +int pasid, u64 arch_id, int type) +{ + struct virtio_iommu_req_invalidate req = { + .head.type = VIRTIO_IOMMU_T_INVALIDATE, + .inv_gran = cpu_to_le32(VIRTIO_IOMMU_INVAL_G_PASID), + .flags = cpu_to_le32(VIRTIO_IOMMU_INVAL_F_PASID), + .inv_type = cpu_to_le32(type), + + .domain = cpu_to_le32(vdomain->id), + .pasid = cpu_to_le32(pasid), + .archid = cpu_to_le64(arch_id), + }; + + if (viommu_send_req_sync(vdomain->viommu, , sizeof(req))) + pr_debug("could not send invalidate request\n"); +} + +static void viommu_flush_tlb_add(struct iommu_iotlb_gather *gather, +unsigned long iova, size_t granule, +void *cookie) +{ + struct viommu_mm *viommu_mm = cookie; 
+ struct viommu_domain *vdomain = viommu_mm->domain; + struct iommu_domain *domain = >domain; + + iommu_iotlb_gather_add_page(domain, gather, iova, granule); +} + +static void viommu_flush_tlb_walk(unsigned long iova, size_t size, + size_t granule, void *cookie) +{ + struct viommu_mm *viommu_mm = cookie; + struct viommu_domain *vdomain = viommu_mm->domain; + struct virtio_iommu_req_invalidate req = { + .head.type = VIRTIO_IOMMU_T_INVALIDATE, + .inv_gran = cpu_to_le32(VIRTIO_IOMMU_INVAL_G_VA), + .inv_type = cpu_to_le32(VIRTIO_IOMMU_INV_T_IOTLB), + .flags = cpu_to_le32(VIRTIO_IOMMU_INVAL_F_ARCHID), + + .domain = cpu_to_le32(vdomain->id), + .pasid = cpu_to_le32(viommu_mm->pasid), + .archid = cpu_to_le64(viommu_mm->archid), + .virt_start = cpu_to_le64(iova), + .nr_pages = cpu_to_le64(size / granule), + .granule= ilog2(granule), + }; + + if (viommu_add_req(vdomain->viommu, , sizeof(req))) + pr_debug("could not add invalidate request\n"); +} + +static void viommu_flush_tlb_all(void *cookie) +{ + struct viommu_mm *viommu_mm = cookie; + + if (!viommu_mm->archid) + return; + + __viommu_flush_pasid_tlb_all(viommu_mm->domain, viommu_mm->pasid, +viommu_mm->archid, +VIRTIO_IOMMU_INV_T_IOTLB); +} + +static struct iommu_flush_ops viommu_flush_ops = { + .tlb_flush_all = viommu_flush_tlb_all, + .tlb_flush_walk = viommu_flush_tlb_walk, + .tlb_add_page = viommu_flush_tlb_add, +}; + +static void viommu_flush_pasid(void *cookie, int pasid, bool leaf) +{ + struct viommu_domain *vdomain = cookie; + struct virtio_iommu_req_invalidate req = { + .head.type = VIRTIO_IOMMU_T_INVALIDATE, + .inv_gran = cpu_to_le32(VIRTIO_IOMMU_INVAL_G_PASID), + .inv_type = cpu_to_le32(VIRTIO_IOMMU_INV_T_PASID), + .flags = cpu_to_le32(VIRTIO_IOMMU_INVAL_F_PASID), + + .domain = cpu_to_le32(vdomain->id), + .pasid = cpu_to_le32(pasid), + }; + + if (leaf) + req.flags |=
[PATCH RFC v1 11/15] iommu/virtio: Add headers for binding pasid table in iommu
From: Jean-Philippe Brucker Add the required UAPI defines for binding pasid tables in virtio-iommu. This mode allows to hand stage-1 page tables over to the guest. Signed-off-by: Jean-Philippe Brucker [Vivek: Refactor to cleanup headers for invalidation] Signed-off-by: Vivek Gautam Cc: Joerg Roedel Cc: Will Deacon Cc: Michael S. Tsirkin Cc: Robin Murphy Cc: Jean-Philippe Brucker Cc: Eric Auger Cc: Alex Williamson Cc: Kevin Tian Cc: Jacob Pan Cc: Liu Yi L Cc: Lorenzo Pieralisi Cc: Shameerali Kolothum Thodi --- include/uapi/linux/virtio_iommu.h | 68 +++ 1 file changed, 68 insertions(+) diff --git a/include/uapi/linux/virtio_iommu.h b/include/uapi/linux/virtio_iommu.h index 8a0624bab4b2..3481e4a3dd24 100644 --- a/include/uapi/linux/virtio_iommu.h +++ b/include/uapi/linux/virtio_iommu.h @@ -16,6 +16,7 @@ #define VIRTIO_IOMMU_F_BYPASS 3 #define VIRTIO_IOMMU_F_PROBE 4 #define VIRTIO_IOMMU_F_MMIO5 +#define VIRTIO_IOMMU_F_ATTACH_TABLE6 struct virtio_iommu_range_64 { __le64 start; @@ -44,6 +45,8 @@ struct virtio_iommu_config { #define VIRTIO_IOMMU_T_MAP 0x03 #define VIRTIO_IOMMU_T_UNMAP 0x04 #define VIRTIO_IOMMU_T_PROBE 0x05 +#define VIRTIO_IOMMU_T_ATTACH_TABLE0x06 +#define VIRTIO_IOMMU_T_INVALIDATE 0x07 /* Status types */ #define VIRTIO_IOMMU_S_OK 0x00 @@ -82,6 +85,37 @@ struct virtio_iommu_req_detach { struct virtio_iommu_req_tailtail; }; +struct virtio_iommu_req_attach_table { + struct virtio_iommu_req_headhead; + __le32 domain; + __le32 endpoint; + __le16 format; + __u8reserved[62]; + struct virtio_iommu_req_tailtail; +}; + +#define VIRTIO_IOMMU_PSTF_ARM_SMMU_V3_LINEAR 0x0 +#define VIRTIO_IOMMU_PSTF_ARM_SMMU_V3_4KL2 0x1 +#define VIRTIO_IOMMU_PSTF_ARM_SMMU_V3_64KL20x2 + +#define VIRTIO_IOMMU_PSTF_ARM_SMMU_V3_DSS_TERM 0x0 +#define VIRTIO_IOMMU_PSTF_ARM_SMMU_V3_DSS_BYPASS 0x1 +#define VIRTIO_IOMMU_PSTF_ARM_SMMU_V3_DSS_00x2 + +/* Arm SMMUv3 PASID Table Descriptor */ +struct virtio_iommu_req_attach_pst_arm { + struct virtio_iommu_req_headhead; + __le32 domain; + __le32 
endpoint; + __le16 format; + __u8s1fmt; + __u8s1dss; + __le64 s1contextptr; + __le32 s1cdmax; + __u8reserved[48]; + struct virtio_iommu_req_tailtail; +}; + #define VIRTIO_IOMMU_MAP_F_READ(1 << 0) #define VIRTIO_IOMMU_MAP_F_WRITE (1 << 1) #define VIRTIO_IOMMU_MAP_F_MMIO(1 << 2) @@ -188,6 +222,40 @@ struct virtio_iommu_req_probe { */ }; +#define VIRTIO_IOMMU_INVAL_G_DOMAIN(1 << 0) +#define VIRTIO_IOMMU_INVAL_G_PASID (1 << 1) +#define VIRTIO_IOMMU_INVAL_G_VA(1 << 2) + +#define VIRTIO_IOMMU_INV_T_IOTLB (1 << 0) +#define VIRTIO_IOMMU_INV_T_DEV_IOTLB (1 << 1) +#define VIRTIO_IOMMU_INV_T_PASID (1 << 2) + +#define VIRTIO_IOMMU_INVAL_F_PASID (1 << 0) +#define VIRTIO_IOMMU_INVAL_F_ARCHID(1 << 1) +#define VIRTIO_IOMMU_INVAL_F_LEAF (1 << 2) + +struct virtio_iommu_req_invalidate { + struct virtio_iommu_req_headhead; + __le16 inv_gran; + __le16 inv_type; + + __le16 flags; + __u8reserved1[2]; + __le32 domain; + + __le32 pasid; + __u8reserved2[4]; + + __le64 archid; + __le64 virt_start; + __le64 nr_pages; + + /* Page size, in nr of bits, typically 12 for 4k, 30 for 2MB, etc.) */ + __u8granule; + __u8reserved3[11]; + struct virtio_iommu_req_tailtail; +}; + /* Fault types */ #define VIRTIO_IOMMU_FAULT_R_UNKNOWN 0 #define VIRTIO_IOMMU_FAULT_R_DOMAIN1 -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org
[PATCH RFC v1 10/15] iommu/virtio: Prepare to add attach pasid table infrastructure
In preparation to add attach pasid table op, separate out the existing attach request code to a separate method. Signed-off-by: Vivek Gautam Cc: Joerg Roedel Cc: Will Deacon Cc: Michael S. Tsirkin Cc: Robin Murphy Cc: Jean-Philippe Brucker Cc: Eric Auger Cc: Alex Williamson Cc: Kevin Tian Cc: Jacob Pan Cc: Liu Yi L Cc: Lorenzo Pieralisi Cc: Shameerali Kolothum Thodi --- drivers/iommu/virtio-iommu.c | 73 +--- 1 file changed, 51 insertions(+), 22 deletions(-) diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index 12d73321dbf4..ae5dfd3f8269 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -52,6 +52,8 @@ struct viommu_dev { /* Supported MAP flags */ u32 map_flags; u32 probe_size; + + boolhas_map:1; }; struct viommu_mapping { @@ -60,6 +62,11 @@ struct viommu_mapping { u32 flags; }; +struct viommu_mm { + struct io_pgtable_ops *ops; + struct viommu_domain*domain; +}; + struct viommu_domain { struct iommu_domain domain; struct viommu_dev *viommu; @@ -67,12 +74,20 @@ struct viommu_domain { unsigned intid; u32 map_flags; + /* Default address space when a table is bound */ + struct viommu_mmmm; + + /* When no table is bound, use generic mappings */ spinlock_t mappings_lock; struct rb_root_cached mappings; unsigned long nr_endpoints; }; +#define vdev_for_each_id(i, eid, vdev) \ + for (i = 0; i < vdev->dev->iommu->fwspec->num_ids &&\ + ({ eid = vdev->dev->iommu->fwspec->ids[i]; 1; }); i++) + struct viommu_endpoint { struct device *dev; struct viommu_dev *viommu; @@ -750,12 +765,40 @@ static void viommu_domain_free(struct iommu_domain *domain) kfree(vdomain); } +static int viommu_simple_attach(struct viommu_domain *vdomain, + struct viommu_endpoint *vdev) +{ + int i, eid, ret; + struct virtio_iommu_req_attach req = { + .head.type = VIRTIO_IOMMU_T_ATTACH, + .domain = cpu_to_le32(vdomain->id), + }; + + if (!vdomain->viommu->has_map) + return -ENODEV; + + vdev_for_each_id(i, eid, vdev) { + req.endpoint = cpu_to_le32(eid); + + 
ret = viommu_send_req_sync(vdomain->viommu, , sizeof(req)); + if (ret) + return ret; + } + + if (!vdomain->nr_endpoints) { + /* +* This endpoint is the first to be attached to the domain. +* Replay existing mappings if any (e.g. SW MSI). +*/ + ret = viommu_replay_mappings(vdomain); + } + + return ret; +} + static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev) { - int i; int ret = 0; - struct virtio_iommu_req_attach req; - struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); struct viommu_endpoint *vdev = dev_iommu_priv_get(dev); struct viommu_domain *vdomain = to_viommu_domain(domain); @@ -790,25 +833,9 @@ static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev) if (vdev->vdomain) vdev->vdomain->nr_endpoints--; - req = (struct virtio_iommu_req_attach) { - .head.type = VIRTIO_IOMMU_T_ATTACH, - .domain = cpu_to_le32(vdomain->id), - }; - - for (i = 0; i < fwspec->num_ids; i++) { - req.endpoint = cpu_to_le32(fwspec->ids[i]); - - ret = viommu_send_req_sync(vdomain->viommu, , sizeof(req)); - if (ret) - return ret; - } - - if (!vdomain->nr_endpoints) { - /* -* This endpoint is the first to be attached to the domain. -* Replay existing mappings (e.g. SW MSI). -*/ - ret = viommu_replay_mappings(vdomain); + if (!vdomain->mm.ops) { + /* If we couldn't bind any table, use the mapping tree */ + ret = viommu_simple_attach(vdomain, vdev); if (ret) return ret; } @@ -1142,6 +1169,8 @@ static int viommu_probe(struct virtio_device *vdev) struct virtio_iommu_config, probe_size, >probe_size); + viommu->has_map = virtio_has_feature(vdev, VIRTIO_IOMMU_F_MAP_UNMAP); + viommu->geometry = (struct iommu_domain_geometry) {
[PATCH RFC v1 09/15] iommu/virtio: Update table format probing header
Add info about asid_bits and additional flags to table format probing header. Signed-off-by: Vivek Gautam Cc: Joerg Roedel Cc: Will Deacon Cc: Michael S. Tsirkin Cc: Robin Murphy Cc: Jean-Philippe Brucker Cc: Eric Auger Cc: Alex Williamson Cc: Kevin Tian Cc: Jacob Pan Cc: Liu Yi L Cc: Lorenzo Pieralisi Cc: Shameerali Kolothum Thodi --- include/uapi/linux/virtio_iommu.h | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/virtio_iommu.h b/include/uapi/linux/virtio_iommu.h index 43821e33e7af..8a0624bab4b2 100644 --- a/include/uapi/linux/virtio_iommu.h +++ b/include/uapi/linux/virtio_iommu.h @@ -169,7 +169,10 @@ struct virtio_iommu_probe_pasid_size { struct virtio_iommu_probe_table_format { struct virtio_iommu_probe_property head; __le16 format; - __u8reserved[2]; + __le16 asid_bits; + + __le32 flags; + __u8reserved[4]; }; struct virtio_iommu_req_probe { -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH RFC v1 08/15] iommu: Add asid_bits to arm smmu-v3 stage1 table info
asid_bits data is required to prepare stage-1 tables for arm-smmu-v3. Signed-off-by: Vivek Gautam Cc: Joerg Roedel Cc: Will Deacon Cc: Robin Murphy Cc: Jean-Philippe Brucker Cc: Eric Auger Cc: Alex Williamson Cc: Kevin Tian Cc: Jacob Pan Cc: Liu Yi L Cc: Lorenzo Pieralisi Cc: Shameerali Kolothum Thodi --- include/uapi/linux/iommu.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h index 082d758dd016..96abbfc7c643 100644 --- a/include/uapi/linux/iommu.h +++ b/include/uapi/linux/iommu.h @@ -357,7 +357,7 @@ struct iommu_pasid_smmuv3 { __u32 version; __u8 s1fmt; __u8 s1dss; - __u8 padding[2]; + __u16 asid_bits; }; /** -- 2.17.1
[PATCH RFC v1 07/15] iommu/virtio: Add table format probing
From: Jean-Philippe Brucker The device may provide information about hardware tables and additional capabilities for each device. Parse the new probe fields. Signed-off-by: Jean-Philippe Brucker [Vivek: Refactor to use "struct virtio_iommu_probe_table_format" rather than separate structures for page table and pasid table format.] Signed-off-by: Vivek Gautam Cc: Joerg Roedel Cc: Will Deacon Cc: Michael S. Tsirkin Cc: Robin Murphy Cc: Jean-Philippe Brucker Cc: Eric Auger Cc: Alex Williamson Cc: Kevin Tian Cc: Jacob Pan Cc: Liu Yi L Cc: Lorenzo Pieralisi Cc: Shameerali Kolothum Thodi --- drivers/iommu/virtio-iommu.c | 102 ++- 1 file changed, 101 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index 2bfdd5734844..12d73321dbf4 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -78,6 +78,17 @@ struct viommu_endpoint { struct viommu_dev *viommu; struct viommu_domain*vdomain; struct list_headresv_regions; + + /* properties of the physical IOMMU */ + u64 pgsize_mask; + u64 input_start; + u64 input_end; + u8 output_bits; + u8 pasid_bits; + /* Preferred PASID table format */ + void*pstf; + /* Preferred page table format */ + void*pgtf; }; struct viommu_request { @@ -457,6 +468,72 @@ static int viommu_add_resv_mem(struct viommu_endpoint *vdev, return 0; } +static int viommu_add_pgsize_mask(struct viommu_endpoint *vdev, + struct virtio_iommu_probe_page_size_mask *prop, + size_t len) +{ + if (len < sizeof(*prop)) + return -EINVAL; + vdev->pgsize_mask = le64_to_cpu(prop->mask); + return 0; +} + +static int viommu_add_input_range(struct viommu_endpoint *vdev, + struct virtio_iommu_probe_input_range *prop, + size_t len) +{ + if (len < sizeof(*prop)) + return -EINVAL; + vdev->input_start = le64_to_cpu(prop->start); + vdev->input_end = le64_to_cpu(prop->end); + return 0; +} + +static int viommu_add_output_size(struct viommu_endpoint *vdev, + struct virtio_iommu_probe_output_size *prop, + size_t len) +{ 
+ if (len < sizeof(*prop)) + return -EINVAL; + vdev->output_bits = prop->bits; + return 0; +} + +static int viommu_add_pasid_size(struct viommu_endpoint *vdev, +struct virtio_iommu_probe_pasid_size *prop, +size_t len) +{ + if (len < sizeof(*prop)) + return -EINVAL; + vdev->pasid_bits = prop->bits; + return 0; +} + +static int viommu_add_pgtf(struct viommu_endpoint *vdev, void *pgtf, size_t len) +{ + /* Select the first page table format available */ + if (len < sizeof(struct virtio_iommu_probe_table_format) || vdev->pgtf) + return -EINVAL; + + vdev->pgtf = kmemdup(pgtf, len, GFP_KERNEL); + if (!vdev->pgtf) + return -ENOMEM; + + return 0; +} + +static int viommu_add_pstf(struct viommu_endpoint *vdev, void *pstf, size_t len) +{ + if (len < sizeof(struct virtio_iommu_probe_table_format) || vdev->pstf) + return -EINVAL; + + vdev->pstf = kmemdup(pstf, len, GFP_KERNEL); + if (!vdev->pstf) + return -ENOMEM; + + return 0; +} + static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device *dev) { int ret; @@ -493,11 +570,30 @@ static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device *dev) while (type != VIRTIO_IOMMU_PROBE_T_NONE && cur < viommu->probe_size) { + void *value = prop; len = le16_to_cpu(prop->length) + sizeof(*prop); switch (type) { case VIRTIO_IOMMU_PROBE_T_RESV_MEM: - ret = viommu_add_resv_mem(vdev, (void *)prop, len); + ret = viommu_add_resv_mem(vdev, value, len); + break; + case VIRTIO_IOMMU_PROBE_T_PAGE_SIZE_MASK: + ret = viommu_add_pgsize_mask(vdev, value, len); + break; + case VIRTIO_IOMMU_PROBE_T_INPUT_RANGE: + ret = viommu_add_input_range(vdev, value, len); + break; + case VIRTIO_IOMMU_PROBE_T_OUTPUT_SIZE: + ret = viommu_add_output_size(vdev, value, len); + break; +
[PATCH RFC v1 06/15] iommu/virtio: Add headers for table format probing
From: Jean-Philippe Brucker Add required UAPI defines for probing table format for underlying iommu hardware. The device may provide information about hardware tables and additional capabilities for each device. This allows guest to correctly fabricate stage-1 page tables. Signed-off-by: Jean-Philippe Brucker [Vivek: Use a single "struct virtio_iommu_probe_table_format" rather than separate structures for page table and pasid table format. Also update commit message.] Signed-off-by: Vivek Gautam Cc: Joerg Roedel Cc: Will Deacon Cc: Michael S. Tsirkin Cc: Robin Murphy Cc: Jean-Philippe Brucker Cc: Eric Auger Cc: Alex Williamson Cc: Kevin Tian Cc: Jacob Pan Cc: Liu Yi L Cc: Lorenzo Pieralisi Cc: Shameerali Kolothum Thodi --- include/uapi/linux/virtio_iommu.h | 44 ++- 1 file changed, 43 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/virtio_iommu.h b/include/uapi/linux/virtio_iommu.h index 237e36a280cb..43821e33e7af 100644 --- a/include/uapi/linux/virtio_iommu.h +++ b/include/uapi/linux/virtio_iommu.h @@ -2,7 +2,7 @@ /* * Virtio-iommu definition v0.12 * - * Copyright (C) 2019 Arm Ltd. + * Copyright (C) 2019-2021 Arm Ltd. 
*/ #ifndef _UAPI_LINUX_VIRTIO_IOMMU_H #define _UAPI_LINUX_VIRTIO_IOMMU_H @@ -111,6 +111,12 @@ struct virtio_iommu_req_unmap { #define VIRTIO_IOMMU_PROBE_T_NONE 0 #define VIRTIO_IOMMU_PROBE_T_RESV_MEM 1 +#define VIRTIO_IOMMU_PROBE_T_PAGE_SIZE_MASK2 +#define VIRTIO_IOMMU_PROBE_T_INPUT_RANGE 3 +#define VIRTIO_IOMMU_PROBE_T_OUTPUT_SIZE 4 +#define VIRTIO_IOMMU_PROBE_T_PASID_SIZE5 +#define VIRTIO_IOMMU_PROBE_T_PAGE_TABLE_FMT6 +#define VIRTIO_IOMMU_PROBE_T_PASID_TABLE_FMT 7 #define VIRTIO_IOMMU_PROBE_T_MASK 0xfff @@ -130,6 +136,42 @@ struct virtio_iommu_probe_resv_mem { __le64 end; }; +struct virtio_iommu_probe_page_size_mask { + struct virtio_iommu_probe_property head; + __u8reserved[4]; + __le64 mask; +}; + +struct virtio_iommu_probe_input_range { + struct virtio_iommu_probe_property head; + __u8reserved[4]; + __le64 start; + __le64 end; +}; + +struct virtio_iommu_probe_output_size { + struct virtio_iommu_probe_property head; + __u8bits; + __u8reserved[3]; +}; + +struct virtio_iommu_probe_pasid_size { + struct virtio_iommu_probe_property head; + __u8bits; + __u8reserved[3]; +}; + +/* Arm LPAE page table format */ +#define VIRTIO_IOMMU_FOMRAT_PGTF_ARM_LPAE 1 +/* Arm smmu-v3 type PASID table format */ +#define VIRTIO_IOMMU_FORMAT_PSTF_ARM_SMMU_V3 2 + +struct virtio_iommu_probe_table_format { + struct virtio_iommu_probe_property head; + __le16 format; + __u8reserved[2]; +}; + struct virtio_iommu_req_probe { struct virtio_iommu_req_headhead; __le32 endpoint; -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH RFC v1 05/15] iommu/arm-smmu-v3: Set sync op from consumer driver of cd-lib
The change allows different consumers of arm-smmu-v3-cd-lib to set their respective sync op for pasid entries. Signed-off-by: Vivek Gautam Cc: Joerg Roedel Cc: Will Deacon Cc: Robin Murphy Cc: Jean-Philippe Brucker Cc: Eric Auger Cc: Alex Williamson Cc: Kevin Tian Cc: Jacob Pan Cc: Liu Yi L Cc: Lorenzo Pieralisi Cc: Shameerali Kolothum Thodi --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c | 1 - drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 7 +++ 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c index ec37476c8d09..acaa09acecdd 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c @@ -265,7 +265,6 @@ struct iommu_vendor_psdtable_ops arm_cd_table_ops = { .free = arm_smmu_free_cd_tables, .prepare = arm_smmu_prepare_cd, .write = arm_smmu_write_ctx_desc, - .sync = arm_smmu_sync_cd, }; struct iommu_pasid_table *arm_smmu_register_cd_table(struct device *dev, diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 2f86c6ac42b6..0c644be22b4b 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1869,6 +1869,13 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain, if (ret) goto out_free_cd_tables; + /* + * Strange to setup an op here? + * cd-lib is the actual user of sync op, and therefore the platform + * drivers should assign this sync/maintenance ops as per need. + */ + tbl->ops->sync = arm_smmu_sync_cd; + /* * Note that this will end up calling arm_smmu_sync_cd() before * the master has been added to the devices list for this domain. -- 2.17.1
[PATCH RFC v1 04/15] iommu/arm-smmu-v3: Update CD base address info for user-space
Update base address information in vendor pasid table info to pass that to user-space for stage1 table management. Signed-off-by: Vivek Gautam Cc: Joerg Roedel Cc: Will Deacon Cc: Robin Murphy Cc: Jean-Philippe Brucker Cc: Eric Auger Cc: Alex Williamson Cc: Kevin Tian Cc: Jacob Pan Cc: Liu Yi L Cc: Lorenzo Pieralisi Cc: Shameerali Kolothum Thodi --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c index 8a7187534706..ec37476c8d09 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c @@ -55,6 +55,9 @@ static __le64 *arm_smmu_get_cd_ptr(struct iommu_vendor_psdtable_cfg *pst_cfg, if (arm_smmu_alloc_cd_leaf_table(dev, l1_desc)) return NULL; + if (s1cfg->s1fmt == STRTAB_STE_0_S1FMT_LINEAR) + pst_cfg->base = l1_desc->l2ptr_dma; + l1ptr = cdcfg->cdtab + idx * CTXDESC_L1_DESC_DWORDS; arm_smmu_write_cd_l1_desc(l1ptr, l1_desc); /* An invalid L1CD can be cached */ @@ -211,6 +214,9 @@ static int arm_smmu_alloc_cd_tables(struct iommu_vendor_psdtable_cfg *pst_cfg) goto err_free_l1; } + if (s1cfg->s1fmt == STRTAB_STE_0_S1FMT_64K_L2) + pst_cfg->base = cdcfg->cdtab_dma; + return 0; err_free_l1: -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH RFC v1 03/15] iommu/arm-smmu-v3: Update drivers to work with iommu-pasid-table
Update arm-smmu-v3 context descriptor (CD) library driver to work with iommu-pasid-table APIs. These APIs are then used in arm-smmu-v3 drivers to manage CD tables. Signed-off-by: Vivek Gautam Cc: Joerg Roedel Cc: Will Deacon Cc: Robin Murphy Cc: Jean-Philippe Brucker Cc: Eric Auger Cc: Alex Williamson Cc: Kevin Tian Cc: Jacob Pan Cc: Liu Yi L Cc: Lorenzo Pieralisi Cc: Shameerali Kolothum Thodi --- .../arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c | 127 +- .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 16 ++- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 47 --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 7 +- drivers/iommu/iommu-pasid-table.h | 10 +- 5 files changed, 144 insertions(+), 63 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c index 97d1786a8a70..8a7187534706 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c @@ -8,17 +8,17 @@ #include #include "arm-smmu-v3.h" +#include "../../iommu-pasid-table.h" -static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu, +static int arm_smmu_alloc_cd_leaf_table(struct device *dev, struct arm_smmu_l1_ctx_desc *l1_desc) { size_t size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3); - l1_desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, + l1_desc->l2ptr = dmam_alloc_coherent(dev, size, _desc->l2ptr_dma, GFP_KERNEL); if (!l1_desc->l2ptr) { - dev_warn(smmu->dev, -"failed to allocate context descriptor table\n"); + dev_warn(dev, "failed to allocate context descriptor table\n"); return -ENOMEM; } return 0; @@ -34,35 +34,39 @@ static void arm_smmu_write_cd_l1_desc(__le64 *dst, WRITE_ONCE(*dst, cpu_to_le64(val)); } -static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_domain *smmu_domain, +static __le64 *arm_smmu_get_cd_ptr(struct iommu_vendor_psdtable_cfg *pst_cfg, u32 ssid) { __le64 *l1ptr; unsigned int idx; + struct device *dev = pst_cfg->iommu_dev; + struct arm_smmu_cfg_info 
*cfgi = _cfg->vendor.cfg; + struct arm_smmu_s1_cfg *s1cfg = cfgi->s1_cfg; + struct arm_smmu_ctx_desc_cfg *cdcfg = >cdcfg; struct arm_smmu_l1_ctx_desc *l1_desc; - struct arm_smmu_device *smmu = smmu_domain->smmu; - struct arm_smmu_ctx_desc_cfg *cdcfg = _domain->s1_cfg.cdcfg; + struct iommu_pasid_table *tbl = pasid_table_cfg_to_table(pst_cfg); - if (smmu_domain->s1_cfg.s1fmt == STRTAB_STE_0_S1FMT_LINEAR) + if (s1cfg->s1fmt == STRTAB_STE_0_S1FMT_LINEAR) return cdcfg->cdtab + ssid * CTXDESC_CD_DWORDS; idx = ssid >> CTXDESC_SPLIT; l1_desc = >l1_desc[idx]; if (!l1_desc->l2ptr) { - if (arm_smmu_alloc_cd_leaf_table(smmu, l1_desc)) + if (arm_smmu_alloc_cd_leaf_table(dev, l1_desc)) return NULL; l1ptr = cdcfg->cdtab + idx * CTXDESC_L1_DESC_DWORDS; arm_smmu_write_cd_l1_desc(l1ptr, l1_desc); /* An invalid L1CD can be cached */ - arm_smmu_sync_cd(smmu_domain, ssid, false); + if (iommu_psdtable_sync(tbl, tbl->cookie, ssid, false)) + return NULL; } idx = ssid & (CTXDESC_L2_ENTRIES - 1); return l1_desc->l2ptr + idx * CTXDESC_CD_DWORDS; } -int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain, int ssid, - struct arm_smmu_ctx_desc *cd) +static int arm_smmu_write_ctx_desc(struct iommu_vendor_psdtable_cfg *pst_cfg, + int ssid, void *cookie) { /* * This function handles the following cases: @@ -78,12 +82,15 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain, int ssid, u64 val; bool cd_live; __le64 *cdptr; - struct arm_smmu_device *smmu = smmu_domain->smmu; + struct arm_smmu_cfg_info *cfgi = _cfg->vendor.cfg; + struct arm_smmu_s1_cfg *s1cfg = cfgi->s1_cfg; + struct iommu_pasid_table *tbl = pasid_table_cfg_to_table(pst_cfg); + struct arm_smmu_ctx_desc *cd = cookie; - if (WARN_ON(ssid >= (1 << smmu_domain->s1_cfg.s1cdmax))) + if (WARN_ON(ssid >= (1 << s1cfg->s1cdmax))) return -E2BIG; - cdptr = arm_smmu_get_cd_ptr(smmu_domain, ssid); + cdptr = arm_smmu_get_cd_ptr(pst_cfg, ssid); if (!cdptr) return -ENOMEM; @@ -111,7 +118,8 @@ int arm_smmu_write_ctx_desc(struct 
arm_smmu_domain *smmu_domain, int ssid, * order. Ensure that it observes valid values before reading * V=1. */ -
[PATCH RFC v1 02/15] iommu: Add a simple PASID table library
Add a small API in the iommu subsystem to handle PASID table allocation requests from different consumer drivers, such as a paravirtualized iommu driver. The API provides ops for allocating and freeing PASID tables, writing to them and managing the table caches. The library also allows registering a vendor API that attaches to these ops; the vendor APIs eventually provide the arch-level implementations of these PASID tables. Signed-off-by: Vivek Gautam Cc: Joerg Roedel Cc: Will Deacon Cc: Robin Murphy Cc: Jean-Philippe Brucker Cc: Eric Auger Cc: Alex Williamson Cc: Kevin Tian Cc: Jacob Pan Cc: Liu Yi L Cc: Lorenzo Pieralisi Cc: Shameerali Kolothum Thodi --- drivers/iommu/iommu-pasid-table.h | 134 ++ 1 file changed, 134 insertions(+) create mode 100644 drivers/iommu/iommu-pasid-table.h diff --git a/drivers/iommu/iommu-pasid-table.h b/drivers/iommu/iommu-pasid-table.h new file mode 100644 index ..bd4f57656f67 --- /dev/null +++ b/drivers/iommu/iommu-pasid-table.h @@ -0,0 +1,134 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * PASID table management for the IOMMU + * + * Copyright (C) 2021 Arm Ltd. 
+ */ +#ifndef __IOMMU_PASID_TABLE_H +#define __IOMMU_PASID_TABLE_H + +#include + +#include "arm/arm-smmu-v3/arm-smmu-v3.h" + +enum pasid_table_fmt { + PASID_TABLE_ARM_SMMU_V3, + PASID_TABLE_NUM_FMTS, +}; + +/** + * struct arm_smmu_cfg_info - arm-smmu-v3 specific configuration data + * + * @s1_cfg: arm-smmu-v3 stage1 config data + * @feat_flag: features supported by arm-smmu-v3 implementation + */ +struct arm_smmu_cfg_info { + struct arm_smmu_s1_cfg *s1_cfg; + u32 feat_flag; +}; + +/** + * struct iommu_vendor_psdtable_cfg - Configuration data for PASID tables + * + * @iommu_dev: device performing the DMA table walks + * @fmt: The PASID table format + * @base: DMA address of the allocated table, set by the vendor driver + * @cfg: arm-smmu-v3 specific config data + */ +struct iommu_vendor_psdtable_cfg { + struct device *iommu_dev; + enum pasid_table_fmt fmt; + dma_addr_t base; + union { + struct arm_smmu_cfg_info cfg; + } vendor; +}; + +struct iommu_vendor_psdtable_ops; + +/** + * struct iommu_pasid_table - describes a set of PASID tables + * + * @cookie: An opaque token provided by the IOMMU driver and passed back to any + * callback routine. 
+ * @cfg: A copy of the PASID table configuration + * @ops: The PASID table operations in use for this set of page tables + */ +struct iommu_pasid_table { + void *cookie; + struct iommu_vendor_psdtable_cfg cfg; + struct iommu_vendor_psdtable_ops *ops; +}; + +#define pasid_table_cfg_to_table(pst_cfg) \ + container_of((pst_cfg), struct iommu_pasid_table, cfg) + +struct iommu_vendor_psdtable_ops { + int (*alloc)(struct iommu_vendor_psdtable_cfg *cfg); + void (*free)(struct iommu_vendor_psdtable_cfg *cfg); + void (*prepare)(struct iommu_vendor_psdtable_cfg *cfg, + struct io_pgtable_cfg *pgtbl_cfg, u32 asid); + int (*write)(struct iommu_vendor_psdtable_cfg *cfg, int ssid, +void *cookie); + void (*sync)(void *cookie, int ssid, bool leaf); +}; + +static inline int iommu_psdtable_alloc(struct iommu_pasid_table *tbl, + struct iommu_vendor_psdtable_cfg *cfg) +{ + if (!tbl->ops->alloc) + return -ENOSYS; + + return tbl->ops->alloc(cfg); +} + +static inline void iommu_psdtable_free(struct iommu_pasid_table *tbl, + struct iommu_vendor_psdtable_cfg *cfg) +{ + if (!tbl->ops->free) + return; + + tbl->ops->free(cfg); +} + +static inline int iommu_psdtable_prepare(struct iommu_pasid_table *tbl, +struct iommu_vendor_psdtable_cfg *cfg, +struct io_pgtable_cfg *pgtbl_cfg, +u32 asid) +{ + if (!tbl->ops->prepare) + return -ENOSYS; + + tbl->ops->prepare(cfg, pgtbl_cfg, asid); + return 0; +} + +static inline int iommu_psdtable_write(struct iommu_pasid_table *tbl, + struct iommu_vendor_psdtable_cfg *cfg, + int ssid, void *cookie) +{ + if (!tbl->ops->write) + return -ENOSYS; + + return tbl->ops->write(cfg, ssid, cookie); +} + +static inline int iommu_psdtable_sync(struct iommu_pasid_table *tbl, + void *cookie, int ssid, bool leaf) +{ + if (!tbl->ops->sync) + return -ENOSYS; + + tbl->ops->sync(cookie, ssid, leaf); + return 0; +} + +/* A placeholder to register vendor specific pasid layer */ +static inline struct iommu_pasid_table *
[PATCH RFC v1 01/15] iommu/arm-smmu-v3: Create a Context Descriptor library
Para-virtualized iommu drivers in a guest may need to create and manage context descriptor (CD) tables as part of PASID table allocations. The PASID tables are passed to the host to configure stage-1 tables in hardware. Make way for a library driver for CD management to allow a para-virtualized iommu driver to call such code. Signed-off-by: Vivek Gautam Cc: Joerg Roedel Cc: Will Deacon Cc: Robin Murphy Cc: Jean-Philippe Brucker Cc: Eric Auger Cc: Alex Williamson Cc: Kevin Tian Cc: Jacob Pan Cc: Liu Yi L Cc: Lorenzo Pieralisi Cc: Shameerali Kolothum Thodi --- drivers/iommu/arm/arm-smmu-v3/Makefile| 2 +- .../arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c | 223 ++ drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 216 + drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 + 4 files changed, 228 insertions(+), 216 deletions(-) create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile index 54feb1ecccad..ca1a05b8b8ad 100644 --- a/drivers/iommu/arm/arm-smmu-v3/Makefile +++ b/drivers/iommu/arm/arm-smmu-v3/Makefile @@ -1,5 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o -arm_smmu_v3-objs-y += arm-smmu-v3.o +arm_smmu_v3-objs-y += arm-smmu-v3.o arm-smmu-v3-cd-lib.o arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o arm_smmu_v3-objs := $(arm_smmu_v3-objs-y) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c new file mode 100644 index ..97d1786a8a70 --- /dev/null +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c @@ -0,0 +1,223 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * arm-smmu-v3 context descriptor handling library driver + * + * Copyright (C) 2021 Arm Ltd. 
+ */ + +#include + +#include "arm-smmu-v3.h" + +static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu, + struct arm_smmu_l1_ctx_desc *l1_desc) +{ + size_t size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3); + + l1_desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, +_desc->l2ptr_dma, GFP_KERNEL); + if (!l1_desc->l2ptr) { + dev_warn(smmu->dev, +"failed to allocate context descriptor table\n"); + return -ENOMEM; + } + return 0; +} + +static void arm_smmu_write_cd_l1_desc(__le64 *dst, + struct arm_smmu_l1_ctx_desc *l1_desc) +{ + u64 val = (l1_desc->l2ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) | + CTXDESC_L1_DESC_V; + + /* See comment in arm_smmu_write_ctx_desc() */ + WRITE_ONCE(*dst, cpu_to_le64(val)); +} + +static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_domain *smmu_domain, + u32 ssid) +{ + __le64 *l1ptr; + unsigned int idx; + struct arm_smmu_l1_ctx_desc *l1_desc; + struct arm_smmu_device *smmu = smmu_domain->smmu; + struct arm_smmu_ctx_desc_cfg *cdcfg = _domain->s1_cfg.cdcfg; + + if (smmu_domain->s1_cfg.s1fmt == STRTAB_STE_0_S1FMT_LINEAR) + return cdcfg->cdtab + ssid * CTXDESC_CD_DWORDS; + + idx = ssid >> CTXDESC_SPLIT; + l1_desc = >l1_desc[idx]; + if (!l1_desc->l2ptr) { + if (arm_smmu_alloc_cd_leaf_table(smmu, l1_desc)) + return NULL; + + l1ptr = cdcfg->cdtab + idx * CTXDESC_L1_DESC_DWORDS; + arm_smmu_write_cd_l1_desc(l1ptr, l1_desc); + /* An invalid L1CD can be cached */ + arm_smmu_sync_cd(smmu_domain, ssid, false); + } + idx = ssid & (CTXDESC_L2_ENTRIES - 1); + return l1_desc->l2ptr + idx * CTXDESC_CD_DWORDS; +} + +int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain, int ssid, + struct arm_smmu_ctx_desc *cd) +{ + /* +* This function handles the following cases: +* +* (1) Install primary CD, for normal DMA traffic (SSID = 0). +* (2) Install a secondary CD, for SID+SSID traffic. +* (3) Update ASID of a CD. Atomically write the first 64 bits of the +* CD, then invalidate the old entry and mappings. 
+* (4) Quiesce the context without clearing the valid bit. Disable +* translation, and ignore any translation fault. +* (5) Remove a secondary CD. +*/ + u64 val; + bool cd_live; + __le64 *cdptr; + struct arm_smmu_device *smmu = smmu_domain->smmu; + + if (WARN_ON(ssid >= (1 << smmu_domain->s1_cfg.s1cdmax))) + return -E2BIG; + + cdptr = arm_smmu_get_cd_ptr(smmu_domain, ssid); + if (!cdptr) + return -ENOMEM; + + val = le64_to_cpu(cdptr[0]); + cd_live = !!(val & CTXDESC_CD_0_V); + + if (!cd) { /* (5) */ +
[PATCH RFC v1 00/15] iommu/virtio: Nested stage support with Arm
This patch-series aims at enabling nested stage translation in guests using virtio-iommu as the paravirtualized IOMMU. The backend is supported with Arm SMMU-v3, which provides nested stage-1 and stage-2 translation. This series derives its purpose from the various efforts happening to add support for Shared Virtual Addressing (SVA) in host and guest. On Arm, most of the support for SVA has already landed. The support for nested stage translation and fault reporting to the guest has been proposed [1]. The related changes required in the VFIO framework [2] have also been put forward. This series proposes changes in virtio-iommu to program PASID tables and related stage-1 page tables. A simple iommu-pasid-table library is added for this purpose that interacts with vendor drivers to allocate and populate PASID tables. For Arm SMMUv3 we propose to pull the Context Descriptor (CD) management code out of the arm-smmu-v3 driver and add it as a glue vendor layer that supports allocating CD tables and populating them with the right values. These CD tables are essentially the PASID tables and also contain stage-1 page table configurations. A request to set up these CD tables comes from the virtio-iommu driver, using the iommu-pasid-table library, when running on Arm. The virtio-iommu driver then passes these PASID tables to the host using the right virtio backend and support in the VMM. For testing we have added the necessary support in kvmtool. The changes in kvmtool are based on the virtio-iommu development branch by Jean-Philippe Brucker [3]. The tested kernel branch contains the following, in order from bottom to top of the git history: a) v5.11-rc3 b) arm-smmu-v3 [1] and vfio [2] changes from Eric to add nested page table support for Arm. c) SMMU test engine patches from Jean-Philippe's branch [4] d) This series e) Domain nesting info patches [5][6][7]. f) Changes to add arm-smmu-v3 specific nesting info (to be sent to the list). This kernel is tested on the Neoverse reference software stack with a Fixed Virtual Platform. 
Public version of the software stack and FVP is available here[8][9]. A big thanks to Jean-Philippe for his contributions towards this work and for his valuable guidance. [1] https://lore.kernel.org/linux-iommu/20201118112151.25412-1-eric.au...@redhat.com/T/ [2] https://lore.kernel.org/kvmarm/20201116110030.32335-12-eric.au...@redhat.com/T/ [3] https://jpbrucker.net/git/kvmtool/log/?h=virtio-iommu/devel [4] https://jpbrucker.net/git/linux/log/?h=sva/smmute [5] https://lore.kernel.org/kvm/1599734733-6431-2-git-send-email-yi.l@intel.com/ [6] https://lore.kernel.org/kvm/1599734733-6431-3-git-send-email-yi.l@intel.com/ [7] https://lore.kernel.org/kvm/1599734733-6431-4-git-send-email-yi.l@intel.com/ [8] https://developer.arm.com/tools-and-software/open-source-software/arm-platforms-software/arm-ecosystem-fvps [9] https://git.linaro.org/landing-teams/working/arm/arm-reference-platforms.git/about/docs/rdn1edge/user-guide.rst Jean-Philippe Brucker (6): iommu/virtio: Add headers for table format probing iommu/virtio: Add table format probing iommu/virtio: Add headers for binding pasid table in iommu iommu/virtio: Add support for INVALIDATE request iommu/virtio: Attach Arm PASID tables when available iommu/virtio: Add support for Arm LPAE page table format Vivek Gautam (9): iommu/arm-smmu-v3: Create a Context Descriptor library iommu: Add a simple PASID table library iommu/arm-smmu-v3: Update drivers to work with iommu-pasid-table iommu/arm-smmu-v3: Update CD base address info for user-space iommu/arm-smmu-v3: Set sync op from consumer driver of cd-lib iommu: Add asid_bits to arm smmu-v3 stage1 table info iommu/virtio: Update table format probing header iommu/virtio: Prepare to add attach pasid table infrastructure iommu/virtio: Update fault type and reason info for viommu fault drivers/iommu/arm/arm-smmu-v3/Makefile| 2 +- .../arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c | 283 +++ .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 16 +- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 268 
+-- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 4 +- drivers/iommu/iommu-pasid-table.h | 140 drivers/iommu/virtio-iommu.c | 692 +- include/uapi/linux/iommu.h| 2 +- include/uapi/linux/virtio_iommu.h | 158 +++- 9 files changed, 1303 insertions(+), 262 deletions(-) create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c create mode 100644 drivers/iommu/iommu-pasid-table.h -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RESEND PATCH v3 0/4] iommu/iova: Solve longterm IOVA issue
+ linux-scsi (see https://lore.kernel.org/linux-iommu/1607538189-237944-4-git-send-email-john.ga...@huawei.com/) On 17/11/2020 10:25, John Garry wrote: This series contains a patch to solve the longterm IOVA issue which leizhen originally tried to address at [0]. A sieved kernel log is at the following, showing periodic dumps of IOVA sizes, per CPU and per depot bin, per IOVA size granule: https://raw.githubusercontent.com/hisilicon/kernel-dev/topic-iommu-5.10-iova-debug-v3/aging_test Notice, for example, the following logs: [13175.355584] print_iova1 cpu_total=40135 depot_total=3866 total=44001 [83483.457858] print_iova1 cpu_total=62532 depot_total=24476 total=87008 Where total IOVA rcache size has grown from 44K->87K over a long time. JFYI, I am able to reproduce this aging issue on another storage card, an LSI SAS 3008, so now it's harder to say it's an issue specific to a (buggy) single driver. A log of the IOVA size dumps is here: https://raw.githubusercontent.com/hisilicon/kernel-dev/064c4dc8869b3f2ad07edffceafde0b129f276b0/lsi3008_dmesg Notice again how the total IOVA size goes up over time, like: [ 68.176914] print_iova1 cpu_total=23663 depot_total=256 total=23919 [ 2337.008194] print_iova1 cpu_total=67361 depot_total=9088 total=76449 [17141.860078] print_iova1 cpu_total=73397 depot_total=10368 total=83765 [27087.850830] print_iova1 cpu_total=73386 depot_total=10624 total=84010 [10434.042877] print_iova1 cpu_total=90652 depot_total=12928 total=103580 I had to change some settings for that storage card to reproduce, though [0]. Could explain why no other reports. So please consider this issue again... 
Thanks, John [0] https://lore.kernel.org/linux-scsi/dd8e6fdc-397d-b6ad-3371-0b65d1932...@huawei.com/T/#m953d21446a5756981412c92d0924ca65c8d2f3a5 Along with this patch, I included the following: - A smaller helper to clear all IOVAs for a domain - Change the polarity of the IOVA magazine helpers - A small optimisation from Cong Wang, which was never applied [1]. There was some debate about the other patches in that series, but this one is quite straightforward. Differences to v2: - Update commit message for patch 3/4
[GIT PULL] IOMMU fixes for -rc4
Hi Linus, Please pull these three IOMMU fixes for -rc4. The main one is a change to the Intel IOMMU driver to fix the handling of unaligned addresses when invalidating the TLB. The fix itself is a bit ugly (the caller does a bunch of shifting which is then effectively undone later in the callchain), but Lu has patches to clean all of this up in 5.12. Thanks, Will --->8 The following changes since commit 7c29ada5e70083805bc3a68daa23441df421fbee: iommu/vt-d: Fix ineffective devTLB invalidation for subdevices (2021-01-07 14:38:15 +) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git tags/iommu-fixes for you to fetch changes up to 694a1c0adebee9152a9ba0320468f7921aca647d: iommu/vt-d: Fix duplicate included linux/dma-map-ops.h (2021-01-12 16:56:20 +) iommu fixes for -rc4 - Fix address alignment handling for VT-D TLB invalidation - Enable workarounds for buggy Qualcomm firmware on two more SoCs - Drop duplicate #include Konrad Dybcio (1): iommu: arm-smmu-qcom: Add sdm630/msm8998 compatibles for qcom quirks Lu Baolu (1): iommu/vt-d: Fix unaligned addresses for intel_flush_svm_range_dev() Tian Tao (1): iommu/vt-d: Fix duplicate included linux/dma-map-ops.h drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 2 ++ drivers/iommu/intel/iommu.c| 1 - drivers/iommu/intel/svm.c | 22 -- 3 files changed, 22 insertions(+), 3 deletions(-) ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Consult on ARM SMMU debugfs
On 2021/1/12 4:01, Robin Murphy wrote: On 2021-01-07 02:45, chenxiang (M) wrote: Hi Will, Robin or other folks, When debugging SMMU/SVA issues on a Huawei ARM64 board, we found that there is not enough debugfs support for the ARM SMMU driver (for example the STE/CD values, which we sometimes need to check). Currently a top-level iommu directory is created in debugfs, but there is nothing specific to the ARM SMMU driver. Do you know whether Arm has plans to add that soon? FWIW I don't think I've ever felt the need to inspect the Stream Table on a live system. So far the nature of the STE code has been simple enough that it's very hard for any given STE to be *wrong* - either it's set up as expected and thus works fine, or it's not initialised at all and you get C_BAD_STE, where 99% of the time you then just cross-reference the Stream ID against the firmware and find that the DT/IORT is wrong. Similarly I don't think I've ever even *seen* an issue that could be attributed to a context descriptor, although I appreciate that as we start landing more PASID and SVA support the scope for that starts to widen considerably. Thank you for your reply. I agree that it is very hard for the contents of an STE/CD to be wrong in the current code, but there are more features being upstreamed (such as SVA/vSMMU) that are related to the SMMU, and when debugging those features, interfaces to dump this information would help us locate issues. Also, when debugging together with hardware engineers, the contents of such dumps are important for them too. Feel free to propose a patch if you believe it would be genuinely useful and won't just bit-rot into a maintenance burden, but it's not something that's on our roadmap here. OK, we are considering incorporating the following requirements into the plan: - Dump the STE/CD values of devices - Dump the page table entries of devices - Dump the entry info of the CMDQ/EVENTQ Thanks, Robin.
[RESEND PATCH v2 1/2] vfio/iommu_type1: Populate full dirty when detach non-pinned group
If a group with non-pinned-page dirty scope is detached with dirty logging enabled, we should fully populate the dirty bitmaps at the time it's removed since we don't know the extent of its previous DMA, nor will the group be present to trigger the full bitmap when the user retrieves the dirty bitmap. Fixes: d6a4c185660c ("vfio iommu: Implementation of ioctl for dirty pages tracking") Suggested-by: Alex Williamson Signed-off-by: Keqian Zhu --- drivers/vfio/vfio_iommu_type1.c | 19 ++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 0b4dedaa9128..c16924cd54e7 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -236,6 +236,19 @@ static void vfio_dma_populate_bitmap(struct vfio_dma *dma, size_t pgsize) } } +static void vfio_iommu_populate_bitmap_full(struct vfio_iommu *iommu) +{ + struct rb_node *n; + unsigned long pgshift = __ffs(iommu->pgsize_bitmap); + + for (n = rb_first(&iommu->dma_list); n; n = rb_next(n)) { + struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node); + + if (dma->iommu_mapped) + bitmap_set(dma->bitmap, 0, dma->size >> pgshift); + } +} + static int vfio_dma_bitmap_alloc_all(struct vfio_iommu *iommu, size_t pgsize) { struct rb_node *n; @@ -2415,8 +2428,12 @@ static void vfio_iommu_type1_detach_group(void *iommu_data, * Removal of a group without dirty tracking may allow the iommu scope * to be promoted. */ - if (update_dirty_scope) + if (update_dirty_scope) { update_pinned_page_dirty_scope(iommu); + /* Promote pinned_scope successfully during dirty tracking? */ + if (iommu->dirty_page_tracking && iommu->pinned_page_dirty_scope) + vfio_iommu_populate_bitmap_full(iommu); + } mutex_unlock(&iommu->lock); } -- 2.19.1
[RESEND PATCH v2 2/2] vfio/iommu_type1: Sanity check pfn_list when remove vfio_dma
vfio_sanity_check_pfn_list() is used to check whether pfn_list of vfio_dma is empty when remove the external domain, so it makes a wrong assumption that only external domain will add pfn to dma pfn_list. Now we apply this check when remove a specific vfio_dma and extract the notifier check just for external domain. Fixes: a54eb55045ae ("vfio iommu type1: Add support for mediated devices") Signed-off-by: Keqian Zhu --- drivers/vfio/vfio_iommu_type1.c | 24 +--- 1 file changed, 5 insertions(+), 19 deletions(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index c16924cd54e7..9b7fcff6bd81 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -958,6 +958,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma, static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma) { + WARN_ON(!RB_EMPTY_ROOT(>pfn_list)); vfio_unmap_unpin(iommu, dma, true); vfio_unlink_dma(iommu, dma); put_task_struct(dma->task); @@ -2251,23 +2252,6 @@ static void vfio_iommu_unmap_unpin_reaccount(struct vfio_iommu *iommu) } } -static void vfio_sanity_check_pfn_list(struct vfio_iommu *iommu) -{ - struct rb_node *n; - - n = rb_first(>dma_list); - for (; n; n = rb_next(n)) { - struct vfio_dma *dma; - - dma = rb_entry(n, struct vfio_dma, node); - - if (WARN_ON(!RB_EMPTY_ROOT(>pfn_list))) - break; - } - /* mdev vendor driver must unregister notifier */ - WARN_ON(iommu->notifier.head); -} - /* * Called when a domain is removed in detach. It is possible that * the removed domain decided the iova aperture window. 
Modify the @@ -2367,7 +2351,8 @@ static void vfio_iommu_type1_detach_group(void *iommu_data, kfree(group); if (list_empty(>external_domain->group_list)) { - vfio_sanity_check_pfn_list(iommu); + /* mdev vendor driver must unregister notifier */ + WARN_ON(iommu->notifier.head); if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) vfio_iommu_unmap_unpin_all(iommu); @@ -2492,7 +2477,8 @@ static void vfio_iommu_type1_release(void *iommu_data) if (iommu->external_domain) { vfio_release_domain(iommu->external_domain, true); - vfio_sanity_check_pfn_list(iommu); + /* mdev vendor driver must unregister notifier */ + WARN_ON(iommu->notifier.head); kfree(iommu->external_domain); } -- 2.19.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RESEND PATCH v2 0/2] vfio/iommu_type1: some fixes
changelog: v2: - Address suggestions from Alex. - Remove unnecessary patches. Keqian Zhu (2): vfio/iommu_type1: Populate full dirty when detach non-pinned group vfio/iommu_type1: Sanity check pfn_list when remove vfio_dma drivers/vfio/vfio_iommu_type1.c | 42 + 1 file changed, 22 insertions(+), 20 deletions(-) -- 2.19.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/5] vfio/iommu_type1: Fixes vfio_dma_populate_bitmap to avoid dirty lose
On 2021/1/15 1:14, Alex Williamson wrote: > On Thu, 14 Jan 2021 21:05:23 +0800 > Keqian Zhu wrote: > >> Hi Alex, >> >> On 2021/1/13 5:20, Alex Williamson wrote: >>> On Thu, 7 Jan 2021 17:28:57 +0800 >>> Keqian Zhu wrote: >>> Defer checking whether vfio_dma is of fully-dirty in update_user_bitmap is easy to lose dirty log. For example, after promoting pinned_scope of vfio_iommu, vfio_dma is not considered as fully-dirty, then we may lose dirty log that occurs before vfio_iommu is promoted. The key point is that pinned-dirty is not a real dirty tracking way, it can't continuously track dirty pages, but just restrict dirty scope. It is essentially the same as fully-dirty. Fully-dirty is of full-scope and pinned-dirty is of pinned-scope. So we must mark pinned-dirty or fully-dirty after we start dirty tracking or clear dirty bitmap, to ensure that dirty log is marked right away. >>> >>> I was initially convinced by these first three patches, but upon >>> further review, I think the premise is wrong. AIUI, the concern across >>> these patches is that our dirty bitmap is only populated with pages >>> dirtied by pinning and we only take into account the pinned page dirty >>> scope at the time the bitmap is retrieved by the user. You suppose >>> this presents a gap where if a vendor driver has not yet identified >>> with a page pinning scope that the entire bitmap should be considered >>> dirty regardless of whether that driver later pins pages prior to the >>> user retrieving the dirty bitmap. >> Yes, this is my concern. And there are some other scenarios. >> >> 1. If a non-pinned iommu-backed domain is detached between starting >> dirty log and retrieving dirty bitmap, then the dirty log is missed >> (As you said in the last paragraph). >> >> 2. If all vendor drivers pinned (means iommu pinned_scope is true), >> and an vendor driver do pin/unpin pair between starting dirty log and >> retrieving dirty bitmap, then the dirty log of these unpinned pages >> are missed. 
> > Can you identify where this happens? I believe this assertion is > incorrect. When dirty logging is enabled vfio_dma_populate_bitmap() > marks all existing pinned pages as dirty. In the scenario you > describe, the iommu pinned page dirty scope is irrelevant. We only > track pinned or external DMA access pages for exactly this reason. > Unpinning a page never clears the dirty bitmap, only the user > retrieving the bitmap or disabling dirty logging clears the bitmap. A > page that has been unpinned is transiently dirty, it is not repopulated > in the bitmap after the user has retrieved the bitmap because the > device no longer has access to it to consider it perpetually dirty. Right, thanks for making the logic clearer to me ;-). Then just one issue to fix. > >>> I don't think this is how we intended the cooperation between the >>> iommu driver and vendor driver to work. By pinning pages a vendor >>> driver is not declaring that only their future dirty page scope is >>> limited to pinned pages, instead they're declaring themselves as a >>> participant in dirty page tracking and take responsibility for >>> pinning any necessary pages. For example we might extend >>> VFIO_IOMMU_DIRTY_PAGES_FLAG_START to trigger a blocking >>> notification to groups to not only begin dirty tracking, but also >>> to synchronously register their current device DMA footprint. This >>> patch would require a vendor driver to possibly perform a >>> gratuitous page pinning in order to set the scope prior to dirty >>> logging being enabled, or else the initial bitmap will be fully >>> dirty. >> I get what you mean ;-). You said that there is a time gap between when we >> attach a device and when the device declares itself dirty traceable. >> >> However, this makes it difficult to manage dirty log tracking (I >> listed two bugs). If vfio devices could declare themselves dirty >> traceable when attaching to a container, then the logic of dirty >> tracking would be much clearer ;-). 
> > There's only one bug above afaict, which should be easily fixed. I > think if you actually dig into the problem of a device declaring > themselves as dirty tracking capable, when the IOMMU backend works on > group, not devices, and it's groups that are attached to containers, > you might see that the logic is not so clear. I also don't see this as > a significant issue, you're focusing on a niche scenario where a device > is hot-added to a VM, which is immediately migrated before the device > identifies itself by pinning pages. In that scenario the iommu dirty > scope would be overly broad and we'd indicate all pages are dirty. > This errors on the side of reporting too much is dirty, which still > provides a correct result to the user. The more common scenario would > be migration of a "long" running, stable VM, where drivers are active > and devices have already pinned pages if they intend to. > OK ;-) Now I
[PATCH v2 2/2] vfio/iommu_type1: Sanity check pfn_list when remove vfio_dma
vfio_sanity_check_pfn_list() is used to check whether pfn_list of vfio_dma is empty when remove the external domain, so it makes a wrong assumption that only external domain will add pfn to dma pfn_list. Now we apply this check when remove a specific vfio_dma and extract the notifier check just for external domain. Fixes: a54eb55045ae ("vfio iommu type1: Add support for mediated devices") Signed-off-by: Keqian Zhu --- drivers/vfio/vfio_iommu_type1.c | 24 +--- 1 file changed, 5 insertions(+), 19 deletions(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 4e82b9a3440f..a9bc15e84a4e 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -958,6 +958,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma, static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma) { + WARN_ON(!RB_EMPTY_ROOT(>pfn_list); vfio_unmap_unpin(iommu, dma, true); vfio_unlink_dma(iommu, dma); put_task_struct(dma->task); @@ -2251,23 +2252,6 @@ static void vfio_iommu_unmap_unpin_reaccount(struct vfio_iommu *iommu) } } -static void vfio_sanity_check_pfn_list(struct vfio_iommu *iommu) -{ - struct rb_node *n; - - n = rb_first(>dma_list); - for (; n; n = rb_next(n)) { - struct vfio_dma *dma; - - dma = rb_entry(n, struct vfio_dma, node); - - if (WARN_ON(!RB_EMPTY_ROOT(>pfn_list))) - break; - } - /* mdev vendor driver must unregister notifier */ - WARN_ON(iommu->notifier.head); -} - /* * Called when a domain is removed in detach. It is possible that * the removed domain decided the iova aperture window. 
Modify the @@ -2367,7 +2351,8 @@ static void vfio_iommu_type1_detach_group(void *iommu_data, kfree(group); if (list_empty(>external_domain->group_list)) { - vfio_sanity_check_pfn_list(iommu); + /* mdev vendor driver must unregister notifier */ + WARN_ON(iommu->notifier.head); if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) vfio_iommu_unmap_unpin_all(iommu); @@ -2491,7 +2476,8 @@ static void vfio_iommu_type1_release(void *iommu_data) if (iommu->external_domain) { vfio_release_domain(iommu->external_domain, true); - vfio_sanity_check_pfn_list(iommu); + /* mdev vendor driver must unregister notifier */ + WARN_ON(iommu->notifier.head); kfree(iommu->external_domain); } -- 2.19.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 1/2] vfio/iommu_type1: Populate full dirty when detach non-pinned group
If a group with non-pinned-page dirty scope is detached with dirty logging enabled, we should fully populate the dirty bitmaps at the time it's removed since we don't know the extent of its previous DMA, nor will the group be present to trigger the full bitmap when the user retrieves the dirty bitmap. Fixes: d6a4c185660c ("vfio iommu: Implementation of ioctl for dirty pages tracking") Suggested-by: Alex Williamson Signed-off-by: Keqian Zhu --- drivers/vfio/vfio_iommu_type1.c | 18 +- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 0b4dedaa9128..4e82b9a3440f 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -236,6 +236,19 @@ static void vfio_dma_populate_bitmap(struct vfio_dma *dma, size_t pgsize) } } +static void vfio_iommu_populate_bitmap_full(struct vfio_iommu *iommu) +{ + struct rb_node *n; + unsigned long pgshift = __ffs(iommu->pgsize_bitmap); + + for (n = rb_first(>dma_list); n; n = rb_next(n)) { + struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node); + + if (dma->iommu_mapped) + bitmap_set(dma->bitmap, 0, dma->size >> pgshift); + } +} + static int vfio_dma_bitmap_alloc_all(struct vfio_iommu *iommu, size_t pgsize) { struct rb_node *n; @@ -2415,8 +2428,11 @@ static void vfio_iommu_type1_detach_group(void *iommu_data, * Removal of a group without dirty tracking may allow the iommu scope * to be promoted. */ - if (update_dirty_scope) + if (update_dirty_scope) { update_pinned_page_dirty_scope(iommu); + if (iommu->dirty_page_tracking) + vfio_iommu_populate_bitmap_full(iommu); + } mutex_unlock(>lock); } -- 2.19.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 0/2] vfio/iommu_type1: some fixes
changelog:

v2:
 - Address suggestions from Alex.
 - Remove unnecessary patches.

Keqian Zhu (2):
  vfio/iommu_type1: Populate full dirty when detach non-pinned group
  vfio/iommu_type1: Sanity check pfn_list when remove vfio_dma

 drivers/vfio/vfio_iommu_type1.c | 42 ++++++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 20 deletions(-)

-- 
2.19.1
[PATCH 1/2] dt-bindings: arm-smmu: Add sm8350 compatible string
Add compatible string for sm8350 iommu to documentation.

Signed-off-by: Vinod Koul
---
 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index 3b63f2ae24db..161a5d389808 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -37,6 +37,7 @@ properties:
               - qcom,sdm845-smmu-500
               - qcom,sm8150-smmu-500
               - qcom,sm8250-smmu-500
+              - qcom,sm8350-smmu-500
           - const: arm,mmu-500
       - description: Qcom Adreno GPUs implementing "arm,smmu-v2"
         items:
-- 
2.26.2
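For context, a board DTS would then describe the SMMU with the new string first and the fallback second, as the binding's `items` list requires. The node name, addresses, and interrupt specifiers below are purely illustrative, not taken from an actual SM8350 DTS:

```dts
apps_smmu: iommu@15000000 {
	/* new SoC-specific compatible, falling back to arm,mmu-500 */
	compatible = "qcom,sm8350-smmu-500", "arm,mmu-500";
	reg = <0x15000000 0x100000>;		/* illustrative address/size */
	#global-interrupts = <1>;		/* illustrative count */
	#iommu-cells = <2>;
	interrupts = <GIC_SPI 65 IRQ_TYPE_LEVEL_HIGH>,	/* illustrative */
		     <GIC_SPI 97 IRQ_TYPE_LEVEL_HIGH>;
};
```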
[PATCH 2/2] iommu: arm-smmu-impl: Add SM8350 qcom iommu implementation
Add the SM8350 qcom iommu implementation to the qcom_smmu_impl_of_match
table, which brings in iommu support for the SM8350 SoC.

Signed-off-by: Vinod Koul
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 5dff7ffbef11..8044a9bfca66 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -327,6 +327,7 @@ static const struct of_device_id __maybe_unused qcom_smmu_impl_of_match[] = {
 	{ .compatible = "qcom,sdm845-smmu-500" },
 	{ .compatible = "qcom,sm8150-smmu-500" },
 	{ .compatible = "qcom,sm8250-smmu-500" },
+	{ .compatible = "qcom,sm8350-smmu-500" },
 	{ }
 };
-- 
2.26.2
Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
Hi Edgar,

On Mon, Nov 23, 2020 at 06:41:18AM +, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote:
> Just wanted to follow-up on that topic.
> Is that quirk already put into upstream kernel?

Sorry for the late reply, I had to take an extended sick leave. I will
take care of sending this fix upstream next week.

Regards,

Joerg