Re: [PATCH 0/5] iommu/amd: fixes for suspend/resume
On Wed, 26 Jan 2022 at 10:12, Maxim Levitsky wrote:
>
> Great, your system does seem to support GA log
> (but a patch to check that, rather than blindly assume that it is
> supported, is something that should be done).
>
> So could you bump the LOOP_TIMEOUT like by 10x or so and see if the
> problem goes away?
>
> (that code should be rewritten to a time-based wait and not just
> blindly loop like that; I can also prepare a patch for that as well).
>
> Best regards,
> Maxim Levitsky

Hi,

I've done quite a few restarts with the LOOP_TIMEOUT increased and I've not seen the issue since.

Cheers,
Mike

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Wed, Jan 26 2022 at 09:36, Fenghua Yu wrote:
> On Wed, Jan 26, 2022 at 03:23:42PM +0100, Thomas Gleixner wrote:
>> On Tue, Jan 25 2022 at 07:18, Fenghua Yu wrote:
>> While looking at ioasid_put() usage I tripped over the following UAF
>> issue:
>>
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -4817,8 +4817,10 @@ static int aux_domain_add_dev(struct dma
>>      auxiliary_unlink_device(domain, dev);
>>  link_failed:
>>      spin_unlock_irqrestore(&device_domain_lock, flags);
>> -    if (list_empty(&domain->subdevices) && domain->default_pasid > 0)
>> +    if (list_empty(&domain->subdevices) && domain->default_pasid > 0) {
>>          ioasid_put(domain->default_pasid);
>> +        domain->default_pasid = INVALID_IOASID;
>> +    }
>>
>>      return ret;
>>  }
>> @@ -4847,8 +4849,10 @@ static void aux_domain_remove_dev(struct
>>
>>      spin_unlock_irqrestore(&device_domain_lock, flags);
>>
>> -    if (list_empty(&domain->subdevices) && domain->default_pasid > 0)
>> +    if (list_empty(&domain->subdevices) && domain->default_pasid > 0) {
>>          ioasid_put(domain->default_pasid);
>> +        domain->default_pasid = INVALID_IOASID;
>> +    }
>>  }
>>
>>  static int prepare_domain_attach_device(struct iommu_domain *domain,
>
> The above patch fixes an existing issue. I will put it in a separate patch,
> right?

Correct.

> It cannot be applied cleanly to the upstream tree. Do you want me to base
> the above patch (and the whole patch set) to the upstream tree or a specific
> tip branch?

Against Linus tree please so that the bugfix applies.

> I will fold the following patch into patch #5. The patch #11 (the doc patch)
> also needs to remove one paragraph talking about refcount.
>
> So I will send the whole patch set with the following changes:
> 1. One new bug fix patch (the above patch)
> 2. Updated patch #5 (with the following patch folded)
> 3. Updated patch #11 (removing refcount description)

Looks good.
Thanks,

        tglx
Re: [PATCH] iommu/iova: Separate out rcache init
Hi Robin,

>> Signed-off-by: John Garry
>
> Mangled patch? (no "---" separator here)

hmm... not sure. As an experiment, I just downloaded this patch from lore.kernel.org and it applies ok.

> Overall this looks great, just a few comments further down...

...

>> +}
>> +EXPORT_SYMBOL_GPL(iova_domain_init_rcaches);
>> +
>> +void iova_domain_free_rcaches(struct iova_domain *iovad)
>> +{
>> +    cpuhp_state_remove_instance_nocalls(CPUHP_IOMMU_IOVA_DEAD,
>> +                                        &iovad->cpuhp_dead);
>> +    free_iova_rcaches(iovad);
>>  }
>> +EXPORT_SYMBOL_GPL(iova_domain_free_rcaches);
>
> I think we should continue to expect external callers to clean up with
> put_iova_domain().

ok, fine, makes sense

> If they aren't doing that already they have a bug (albeit minor), and we
> don't want to give the impression that it's OK to free the caches at any
> point *other* than tearing down the whole iova_domain, since the
> implementation really wouldn't expect that.

>>  /*
>>   * Try inserting IOVA range starting with 'iova_pfn' into 'rcache', and
>> @@ -831,7 +872,7 @@ static unsigned long iova_rcache_get(struct iova_domain *iovad,
>>  {
>>      unsigned int log_size = order_base_2(size);
>>
>> -    if (log_size >= IOVA_RANGE_CACHE_MAX_SIZE)
>> +    if (log_size >= IOVA_RANGE_CACHE_MAX_SIZE || !iovad->rcaches)
>>          return 0;

..

>> @@ -102,6 +92,8 @@ struct iova *reserve_iova(struct iova_domain *iovad, unsigned long pfn_lo,
>>      unsigned long pfn_hi);
>>  void init_iova_domain(struct iova_domain *iovad, unsigned long granule,
>>      unsigned long start_pfn);
>> +int iova_domain_init_rcaches(struct iova_domain *iovad);
>> +void iova_domain_free_rcaches(struct iova_domain *iovad);
>
> As above, I vote for just forward-declaring the free routine in iova.c
> and keeping it entirely private.
ok

>>  struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn);
>>  void put_iova_domain(struct iova_domain *iovad);
>>  #else
>> @@ -157,6 +149,15 @@ static inline void init_iova_domain(struct iova_domain *iovad,
>>  {
>>  }
>>
>> +static inline int iova_domain_init_rcaches(struct iova_domain *iovad)
>> +{
>> +    return -ENOTSUPP;
>> +}
>> +
>> +static inline void iova_domain_free_rcaches(struct iova_domain *iovad)
>> +{
>> +}
>> +
>
> I'd be inclined not to add stubs at all - I think it's a reasonable
> assumption that anyone involved enough to care about rcaches has a hard
> dependency on IOMMU_IOVA already.

So iova_domain_free_rcaches() would disappear from here as a result of the changes discussed above.

As for iova_domain_init_rcaches(), I was just following the IOMMU/IOVA coding practice - that is, always stub.

> It's certainly the case today, and I'd hardly want to encourage more
> users anyway. I think that stronger deterrents would be needed :)

Anyway, I can remove it.

Thanks,
John
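The calling convention the thread converges on — an infallible base init, an optional rcache init that can fail, and a single teardown that checks whether the caches were ever allocated — can be sketched as a self-contained userspace analogue. All names below are illustrative stand-ins for init_iova_domain()/iova_domain_init_rcaches()/put_iova_domain(), not the kernel API itself.

```c
#include <assert.h>
#include <stdlib.h>

struct cache { int depot; };

struct domain {
    int granule;
    struct cache *rcaches;   /* NULL until the "fast" init runs */
};

/* Base init: cannot fail, mirrors init_iova_domain(). */
static void domain_init(struct domain *d, int granule)
{
    d->granule = granule;
    d->rcaches = NULL;
}

/* Optional init for "fast" users: allocation can fail, so it
 * returns an error for the caller to check. */
static int domain_init_rcaches(struct domain *d)
{
    d->rcaches = calloc(6, sizeof(*d->rcaches));
    if (!d->rcaches)
        return -1;           /* kernel code would return -ENOMEM */
    return 0;
}

/* Single teardown: frees the caches only if they were allocated,
 * mirroring the "if (iovad->rcaches)" check in put_iova_domain(). */
static void domain_put(struct domain *d)
{
    free(d->rcaches);
    d->rcaches = NULL;
}
```

Slow-path users pair domain_init() with domain_put() as before; fast-path users additionally call domain_init_rcaches() and bail out on error, and teardown stays the same for both.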
Re: [PATCH v2 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi, Thomas,

On Wed, Jan 26, 2022 at 03:23:42PM +0100, Thomas Gleixner wrote:
> On Tue, Jan 25 2022 at 07:18, Fenghua Yu wrote:
> > On Mon, Jan 24, 2022 at 09:55:56PM +0100, Thomas Gleixner wrote:
> > /**
> >  * ioasid_put - Release a reference to an ioasid
> >  * @ioasid: the ID to remove
>
> which in turn makes ioasid_put() a misnomer and the whole refcounting of
> the ioasid a pointless exercise.
>
> While looking at ioasid_put() usage I tripped over the following UAF
> issue:
>
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4817,8 +4817,10 @@ static int aux_domain_add_dev(struct dma
>      auxiliary_unlink_device(domain, dev);
>  link_failed:
>      spin_unlock_irqrestore(&device_domain_lock, flags);
> -    if (list_empty(&domain->subdevices) && domain->default_pasid > 0)
> +    if (list_empty(&domain->subdevices) && domain->default_pasid > 0) {
>          ioasid_put(domain->default_pasid);
> +        domain->default_pasid = INVALID_IOASID;
> +    }
>
>      return ret;
>  }
> @@ -4847,8 +4849,10 @@ static void aux_domain_remove_dev(struct
>
>      spin_unlock_irqrestore(&device_domain_lock, flags);
>
> -    if (list_empty(&domain->subdevices) && domain->default_pasid > 0)
> +    if (list_empty(&domain->subdevices) && domain->default_pasid > 0) {
>          ioasid_put(domain->default_pasid);
> +        domain->default_pasid = INVALID_IOASID;
> +    }
>  }
>
>  static int prepare_domain_attach_device(struct iommu_domain *domain,

The above patch fixes an existing issue. I will put it in a separate patch, right?

It cannot be applied cleanly to the upstream tree. Do you want me to base the above patch (and the whole patch set) to the upstream tree or a specific tip branch?

I will fold the following patch into patch #5. The patch #11 (the doc patch) also needs to remove one paragraph talking about refcount.

So I will send the whole patch set with the following changes:
1. One new bug fix patch (the above patch)
2. Updated patch #5 (with the following patch folded)
3. Updated patch #11 (removing refcount description)

Are the changes OK to you?

> Vs. ioasid_put() I think we should fold the following:
>
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4818,7 +4818,7 @@ static int aux_domain_add_dev(struct dma
>  link_failed:
>      spin_unlock_irqrestore(&device_domain_lock, flags);
>      if (list_empty(&domain->subdevices) && domain->default_pasid > 0) {
> -        ioasid_put(domain->default_pasid);
> +        ioasid_free(domain->default_pasid);
>          domain->default_pasid = INVALID_IOASID;
>      }
>
> @@ -4850,7 +4850,7 @@ static void aux_domain_remove_dev(struct
>      spin_unlock_irqrestore(&device_domain_lock, flags);
>
>      if (list_empty(&domain->subdevices) && domain->default_pasid > 0) {
> -        ioasid_put(domain->default_pasid);
> +        ioasid_free(domain->default_pasid);
>          domain->default_pasid = INVALID_IOASID;
>      }
>  }
> --- a/drivers/iommu/ioasid.c
> +++ b/drivers/iommu/ioasid.c
> @@ -2,7 +2,7 @@
>  /*
>   * I/O Address Space ID allocator. There is one global IOASID space, split into
>   * subsets. Users create a subset with DECLARE_IOASID_SET, then allocate and
> - * free IOASIDs with ioasid_alloc and ioasid_put.
> + * free IOASIDs with ioasid_alloc() and ioasid_free().
>   */
>  #include
>  #include
> @@ -15,7 +15,6 @@ struct ioasid_data {
>      struct ioasid_set *set;
>      void *private;
>      struct rcu_head rcu;
> -    refcount_t refs;
>  };
>
>  /*
> @@ -315,7 +314,6 @@ ioasid_t ioasid_alloc(struct ioasid_set
>
>      data->set = set;
>      data->private = private;
> -    refcount_set(&data->refs, 1);
>
>      /*
>       * Custom allocator needs allocator data to perform platform specific
> @@ -348,17 +346,11 @@ ioasid_t ioasid_alloc(struct ioasid_set
>  EXPORT_SYMBOL_GPL(ioasid_alloc);
>
>  /**
> - * ioasid_put - Release a reference to an ioasid
> + * ioasid_free - Free an ioasid
>   * @ioasid: the ID to remove
> - *
> - * Put a reference to the IOASID, free it when the number of references drops to
> - * zero.
> - *
> - * Return: %true if the IOASID was freed, %false otherwise.
>   */
> -bool ioasid_put(ioasid_t ioasid)
> +void ioasid_free(ioasid_t ioasid)
>  {
> -    bool free = false;
>      struct ioasid_data *ioasid_data;
>
>      spin_lock(&ioasid_allocator_lock);
> @@ -368,10 +360,6 @@ bool ioasid_put(ioasid_t ioasid)
>          goto exit_unlock;
>      }
>
> -    free = refcount_dec_and_test(&ioasid_data->refs);
> -    if (!free)
> -        goto exit_unlock;
> -
>      active_allocator->ops->free(ioasid, active_allocator->ops->pdata);
>      /* Custom allocator needs additional steps to free the xa element */
>      if (active_allocator->flags & IOASID_ALLOCATOR_CUSTOM) {
> @@ -381,9 +369,8 @@ bool ioasid_put(ioasid_t ioasid)
>
>  exit_unlock:
>
Re: [PATCH] iommu/iova: Separate out rcache init
On 2022-01-26 13:55, John Garry wrote:
> Currently the rcache structures are allocated for all IOVA domains, even
> if they do not use "fast" alloc+free interface. This is wasteful of
> memory.
>
> In addition, fails in init_iova_rcaches() are not handled safely, which
> is less than ideal.
>
> Make "fast" users call a separate rcache init explicitly, which includes
> error checking.
>
> Signed-off-by: John Garry

Mangled patch? (no "---" separator here)

Overall this looks great, just a few comments further down...

> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 3a46f2cc9e5d..dd066d990809 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -525,6 +525,7 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>      struct iommu_dma_cookie *cookie = domain->iova_cookie;
>      unsigned long order, base_pfn;
>      struct iova_domain *iovad;
> +    int ret;
>
>      if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE)
>          return -EINVAL;
> @@ -559,6 +560,9 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>      }
>
>      init_iova_domain(iovad, 1UL << order, base_pfn);
> +    ret = iova_domain_init_rcaches(iovad);
> +    if (ret)
> +        return ret;
>
>      /* If the FQ fails we can simply fall back to strict mode */
>      if (domain->type == IOMMU_DOMAIN_DMA_FQ && iommu_dma_init_fq(domain))
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index b28c9435b898..d3adc6ea5710 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -15,13 +15,14 @@
>  /* The anchor node sits above the top of the usable address space */
>  #define IOVA_ANCHOR    ~0UL
>
> +#define IOVA_RANGE_CACHE_MAX_SIZE 6    /* log of max cached IOVA range size (in pages) */
> +
>  static bool iova_rcache_insert(struct iova_domain *iovad,
>                                 unsigned long pfn,
>                                 unsigned long size);
>  static unsigned long iova_rcache_get(struct iova_domain *iovad,
>                                      unsigned long size,
>                                      unsigned long limit_pfn);
> -static void init_iova_rcaches(struct iova_domain *iovad);
>  static void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad);
>  static void free_iova_rcaches(struct iova_domain *iovad);
>
> @@ -64,8 +65,6 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,
>      iovad->anchor.pfn_lo = iovad->anchor.pfn_hi = IOVA_ANCHOR;
>      rb_link_node(&iovad->anchor.node, NULL, &iovad->rbroot.rb_node);
>      rb_insert_color(&iovad->anchor.node, &iovad->rbroot);
> -    cpuhp_state_add_instance_nocalls(CPUHP_IOMMU_IOVA_DEAD, &iovad->cpuhp_dead);
> -    init_iova_rcaches(iovad);
>  }
>  EXPORT_SYMBOL_GPL(init_iova_domain);
>
> @@ -497,9 +496,9 @@ void put_iova_domain(struct iova_domain *iovad)
>  {
>      struct iova *iova, *tmp;
>
> -    cpuhp_state_remove_instance_nocalls(CPUHP_IOMMU_IOVA_DEAD,
> -                                        &iovad->cpuhp_dead);
> -    free_iova_rcaches(iovad);
> +    if (iovad->rcaches)
> +        iova_domain_free_rcaches(iovad);
> +
>      rbtree_postorder_for_each_entry_safe(iova, tmp, &iovad->rbroot, node)
>          free_iova_mem(iova);
>  }
> @@ -608,6 +607,7 @@ EXPORT_SYMBOL_GPL(reserve_iova);
>   */
>  #define IOVA_MAG_SIZE 128
> +#define MAX_GLOBAL_MAGS 32    /* magazines per bin */
>
>  struct iova_magazine {
>      unsigned long size;
> @@ -620,6 +620,13 @@ struct iova_cpu_rcache {
>      struct iova_magazine *prev;
>  };
>
> +struct iova_rcache {
> +    spinlock_t lock;
> +    unsigned long depot_size;
> +    struct iova_magazine *depot[MAX_GLOBAL_MAGS];
> +    struct iova_cpu_rcache __percpu *cpu_rcaches;
> +};
> +
>  static struct iova_magazine *iova_magazine_alloc(gfp_t flags)
>  {
>      return kzalloc(sizeof(struct iova_magazine), flags);
> @@ -693,28 +700,62 @@ static void iova_magazine_push(struct iova_magazine *mag, unsigned long pfn)
>      mag->pfns[mag->size++] = pfn;
>  }
>
> -static void init_iova_rcaches(struct iova_domain *iovad)
> +int iova_domain_init_rcaches(struct iova_domain *iovad)
>  {
> -    struct iova_cpu_rcache *cpu_rcache;
> -    struct iova_rcache *rcache;
>      unsigned int cpu;
> -    int i;
> +    int i, ret;
> +
> +    iovad->rcaches = kcalloc(IOVA_RANGE_CACHE_MAX_SIZE,
> +                             sizeof(struct iova_rcache),
> +                             GFP_KERNEL);
> +    if (!iovad->rcaches)
> +        return -ENOMEM;
>
>      for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
> +        struct iova_cpu_rcache *cpu_rcache;
> +        struct iova_rcache *rcache;
> +
>          rcache = &iovad->rcaches[i];
>          spin_lock_init(&rcache->lock);
>          rcache->depot_size = 0;
> -        rcache->cpu_rcaches = __alloc_percpu(sizeof(*cpu_rcache), cache_line_size());
> -        if (WARN_ON(!rcache->cpu_rcaches))
> -            continue;
> +
Re: [PATCH 0/3] Some minor SWIOTLB cleanups
Thanks, applied to the dma-mapping for-next tree.
[PATCH 2/2] x86/hyperv: Set swiotlb_alloc_from_low_pages to false
From: Tianyu Lan

In a Hyper-V Isolation VM, the swiotlb bounce buffer may be as large as 1G, and there may not be enough memory from 0 to 4G according to the memory layout. Devices in an Isolation VM can use memory above 4G as DMA memory, so set swiotlb_alloc_from_low_pages to false in Isolation VMs.

Signed-off-by: Tianyu Lan
---
 arch/x86/kernel/cpu/mshyperv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 5a99f993e639..80a0423ac75d 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -343,6 +343,7 @@ static void __init ms_hyperv_init_platform(void)
      * use swiotlb bounce buffer for dma transaction.
      */
     swiotlb_force = SWIOTLB_FORCE;
+    swiotlb_alloc_from_low_pages = false;
 #endif
 }
-- 
2.25.1
[PATCH 1/2] Swiotlb: Add swiotlb_alloc_from_low_pages switch
From: Tianyu Lan

Hyper-V Isolation VMs and AMD SEV VMs use a swiotlb bounce buffer to share memory with the hypervisor. Currently the swiotlb bounce buffer is only allocated from 0 to ARCH_LOW_ADDRESS_LIMIT, which defaults to 0xffffffffUL. An Isolation VM or AMD SEV VM needs up to a 1G bounce buffer; allocation will fail when there is not enough contiguous memory from 0 to 4G, and devices may also use memory above the 4G address space as DMA memory.

Expose swiotlb_alloc_from_low_pages so that a platform may set it to false when it is not necessary to limit the bounce buffer to memory between 0 and 4G.

Signed-off-by: Tianyu Lan
---
 include/linux/swiotlb.h |  1 +
 kernel/dma/swiotlb.c    | 13 +++++++++++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index f6c3638255d5..55c178e8eee0 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -191,5 +191,6 @@ static inline bool is_swiotlb_for_alloc(struct device *dev)
 #endif /* CONFIG_DMA_RESTRICTED_POOL */
 
 extern phys_addr_t swiotlb_unencrypted_base;
+extern bool swiotlb_alloc_from_low_pages;
 
 #endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index f1e7ea160b43..159fef80f3db 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -73,6 +73,12 @@ enum swiotlb_force swiotlb_force;
 
 struct io_tlb_mem io_tlb_default_mem;
 
+/*
+ * Get IO TLB memory from the low pages if swiotlb_alloc_from_low_pages
+ * is set.
+ */
+bool swiotlb_alloc_from_low_pages = true;
+
 phys_addr_t swiotlb_unencrypted_base;
 
 /*
@@ -284,8 +290,11 @@ swiotlb_init(int verbose)
     if (swiotlb_force == SWIOTLB_NO_FORCE)
         return;
 
-    /* Get IO TLB memory from the low pages */
-    tlb = memblock_alloc_low(bytes, PAGE_SIZE);
+    if (swiotlb_alloc_from_low_pages)
+        tlb = memblock_alloc_low(bytes, PAGE_SIZE);
+    else
+        tlb = memblock_alloc(bytes, PAGE_SIZE);
+
     if (!tlb)
         goto fail;
     if (swiotlb_init_with_tbl(tlb, default_nslabs, verbose))
-- 
2.25.1
[PATCH 0/2] x86/hyperv/Swiotlb: Add swiotlb_alloc_from_low_pages switch
From: Tianyu Lan

A Hyper-V Isolation VM may fail to allocate the swiotlb bounce buffer because in some cases there is not enough contiguous memory from 0 to 4G; the current swiotlb code allocates the bounce buffer in low-end memory.

This patchset adds a switch "swiotlb_alloc_from_low_pages", which is set to true by default; a platform may clear it if necessary. Devices in a Hyper-V Isolation VM may use memory above 4G as DMA memory and set the switch to false in order to avoid running out of contiguous memory in the low-end address space.

Tianyu Lan (2):
  Swiotlb: Add swiotlb_alloc_from_low_pages switch
  x86/hyperv: Set swiotlb_alloc_from_low_pages to false

 arch/x86/kernel/cpu/mshyperv.c |  1 +
 include/linux/swiotlb.h        |  1 +
 kernel/dma/swiotlb.c           | 13 +++++++++++--
 3 files changed, 13 insertions(+), 2 deletions(-)

-- 
2.25.1
Re: [PATCH] swiotlb: Do not zero buffer in set_memory_decrypted()
Thanks, applied to the dma-mapping for-next tree.
Re: [PATCH v2 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Tue, Jan 25 2022 at 07:18, Fenghua Yu wrote:
> On Mon, Jan 24, 2022 at 09:55:56PM +0100, Thomas Gleixner wrote:
> /**
>  * ioasid_put - Release a reference to an ioasid
>  * @ioasid: the ID to remove

which in turn makes ioasid_put() a misnomer and the whole refcounting of the ioasid a pointless exercise.

While looking at ioasid_put() usage I tripped over the following UAF issue:

--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4817,8 +4817,10 @@ static int aux_domain_add_dev(struct dma
     auxiliary_unlink_device(domain, dev);
 link_failed:
     spin_unlock_irqrestore(&device_domain_lock, flags);
-    if (list_empty(&domain->subdevices) && domain->default_pasid > 0)
+    if (list_empty(&domain->subdevices) && domain->default_pasid > 0) {
         ioasid_put(domain->default_pasid);
+        domain->default_pasid = INVALID_IOASID;
+    }
 
     return ret;
 }
@@ -4847,8 +4849,10 @@ static void aux_domain_remove_dev(struct
 
     spin_unlock_irqrestore(&device_domain_lock, flags);
 
-    if (list_empty(&domain->subdevices) && domain->default_pasid > 0)
+    if (list_empty(&domain->subdevices) && domain->default_pasid > 0) {
         ioasid_put(domain->default_pasid);
+        domain->default_pasid = INVALID_IOASID;
+    }
 }
 
 static int prepare_domain_attach_device(struct iommu_domain *domain,

Vs. ioasid_put() I think we should fold the following:

--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4818,7 +4818,7 @@ static int aux_domain_add_dev(struct dma
 link_failed:
     spin_unlock_irqrestore(&device_domain_lock, flags);
     if (list_empty(&domain->subdevices) && domain->default_pasid > 0) {
-        ioasid_put(domain->default_pasid);
+        ioasid_free(domain->default_pasid);
         domain->default_pasid = INVALID_IOASID;
     }
 
@@ -4850,7 +4850,7 @@ static void aux_domain_remove_dev(struct
     spin_unlock_irqrestore(&device_domain_lock, flags);
 
     if (list_empty(&domain->subdevices) && domain->default_pasid > 0) {
-        ioasid_put(domain->default_pasid);
+        ioasid_free(domain->default_pasid);
         domain->default_pasid = INVALID_IOASID;
     }
 }
--- a/drivers/iommu/ioasid.c
+++ b/drivers/iommu/ioasid.c
@@ -2,7 +2,7 @@
 /*
  * I/O Address Space ID allocator. There is one global IOASID space, split into
  * subsets. Users create a subset with DECLARE_IOASID_SET, then allocate and
- * free IOASIDs with ioasid_alloc and ioasid_put.
+ * free IOASIDs with ioasid_alloc() and ioasid_free().
  */
 #include
 #include
@@ -15,7 +15,6 @@ struct ioasid_data {
     struct ioasid_set *set;
     void *private;
     struct rcu_head rcu;
-    refcount_t refs;
 };
 
 /*
@@ -315,7 +314,6 @@ ioasid_t ioasid_alloc(struct ioasid_set
 
     data->set = set;
     data->private = private;
-    refcount_set(&data->refs, 1);
 
     /*
      * Custom allocator needs allocator data to perform platform specific
@@ -348,17 +346,11 @@ ioasid_t ioasid_alloc(struct ioasid_set
 EXPORT_SYMBOL_GPL(ioasid_alloc);
 
 /**
- * ioasid_put - Release a reference to an ioasid
+ * ioasid_free - Free an ioasid
  * @ioasid: the ID to remove
- *
- * Put a reference to the IOASID, free it when the number of references drops to
- * zero.
- *
- * Return: %true if the IOASID was freed, %false otherwise.
  */
-bool ioasid_put(ioasid_t ioasid)
+void ioasid_free(ioasid_t ioasid)
 {
-    bool free = false;
     struct ioasid_data *ioasid_data;
 
     spin_lock(&ioasid_allocator_lock);
@@ -368,10 +360,6 @@ bool ioasid_put(ioasid_t ioasid)
         goto exit_unlock;
     }
 
-    free = refcount_dec_and_test(&ioasid_data->refs);
-    if (!free)
-        goto exit_unlock;
-
     active_allocator->ops->free(ioasid, active_allocator->ops->pdata);
     /* Custom allocator needs additional steps to free the xa element */
     if (active_allocator->flags & IOASID_ALLOCATOR_CUSTOM) {
@@ -381,9 +369,8 @@ bool ioasid_put(ioasid_t ioasid)
 
 exit_unlock:
     spin_unlock(&ioasid_allocator_lock);
-    return free;
 }
-EXPORT_SYMBOL_GPL(ioasid_put);
+EXPORT_SYMBOL_GPL(ioasid_free);
 
 /**
  * ioasid_find - Find IOASID data
--- a/include/linux/ioasid.h
+++ b/include/linux/ioasid.h
@@ -34,7 +34,7 @@ struct ioasid_allocator_ops {
 #if IS_ENABLED(CONFIG_IOASID)
 ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
                       void *private);
-bool ioasid_put(ioasid_t ioasid);
+void ioasid_free(ioasid_t ioasid);
 void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
                   bool (*getter)(void *));
 int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
@@ -52,10 +52,7 @@ static inline ioasid_t ioasid_alloc(stru
     return INVALID_IOASID;
 }
 
-static inline bool ioasid_put(ioasid_t ioasid)
-{
-    return false;
-}
+static inline void ioa
[PATCH] iommu/iova: Separate out rcache init
Currently the rcache structures are allocated for all IOVA domains, even if they do not use "fast" alloc+free interface. This is wasteful of memory.

In addition, fails in init_iova_rcaches() are not handled safely, which is less than ideal.

Make "fast" users call a separate rcache init explicitly, which includes error checking.

Signed-off-by: John Garry

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 3a46f2cc9e5d..dd066d990809 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -525,6 +525,7 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
     struct iommu_dma_cookie *cookie = domain->iova_cookie;
     unsigned long order, base_pfn;
     struct iova_domain *iovad;
+    int ret;
 
     if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE)
         return -EINVAL;
@@ -559,6 +560,9 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
     }
 
     init_iova_domain(iovad, 1UL << order, base_pfn);
+    ret = iova_domain_init_rcaches(iovad);
+    if (ret)
+        return ret;
 
     /* If the FQ fails we can simply fall back to strict mode */
     if (domain->type == IOMMU_DOMAIN_DMA_FQ && iommu_dma_init_fq(domain))
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index b28c9435b898..d3adc6ea5710 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -15,13 +15,14 @@
 /* The anchor node sits above the top of the usable address space */
 #define IOVA_ANCHOR    ~0UL
 
+#define IOVA_RANGE_CACHE_MAX_SIZE 6    /* log of max cached IOVA range size (in pages) */
+
 static bool iova_rcache_insert(struct iova_domain *iovad,
                                unsigned long pfn,
                                unsigned long size);
 static unsigned long iova_rcache_get(struct iova_domain *iovad,
                                     unsigned long size,
                                     unsigned long limit_pfn);
-static void init_iova_rcaches(struct iova_domain *iovad);
 static void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad);
 static void free_iova_rcaches(struct iova_domain *iovad);
 
@@ -64,8 +65,6 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,
     iovad->anchor.pfn_lo = iovad->anchor.pfn_hi = IOVA_ANCHOR;
     rb_link_node(&iovad->anchor.node, NULL, &iovad->rbroot.rb_node);
     rb_insert_color(&iovad->anchor.node, &iovad->rbroot);
-    cpuhp_state_add_instance_nocalls(CPUHP_IOMMU_IOVA_DEAD, &iovad->cpuhp_dead);
-    init_iova_rcaches(iovad);
 }
 EXPORT_SYMBOL_GPL(init_iova_domain);
 
@@ -497,9 +496,9 @@ void put_iova_domain(struct iova_domain *iovad)
 {
     struct iova *iova, *tmp;
 
-    cpuhp_state_remove_instance_nocalls(CPUHP_IOMMU_IOVA_DEAD,
-                                        &iovad->cpuhp_dead);
-    free_iova_rcaches(iovad);
+    if (iovad->rcaches)
+        iova_domain_free_rcaches(iovad);
+
     rbtree_postorder_for_each_entry_safe(iova, tmp, &iovad->rbroot, node)
         free_iova_mem(iova);
 }
@@ -608,6 +607,7 @@ EXPORT_SYMBOL_GPL(reserve_iova);
  */
 #define IOVA_MAG_SIZE 128
+#define MAX_GLOBAL_MAGS 32    /* magazines per bin */
 
 struct iova_magazine {
     unsigned long size;
@@ -620,6 +620,13 @@ struct iova_cpu_rcache {
     struct iova_magazine *prev;
 };
 
+struct iova_rcache {
+    spinlock_t lock;
+    unsigned long depot_size;
+    struct iova_magazine *depot[MAX_GLOBAL_MAGS];
+    struct iova_cpu_rcache __percpu *cpu_rcaches;
+};
+
 static struct iova_magazine *iova_magazine_alloc(gfp_t flags)
 {
     return kzalloc(sizeof(struct iova_magazine), flags);
@@ -693,28 +700,62 @@ static void iova_magazine_push(struct iova_magazine *mag, unsigned long pfn)
     mag->pfns[mag->size++] = pfn;
 }
 
-static void init_iova_rcaches(struct iova_domain *iovad)
+int iova_domain_init_rcaches(struct iova_domain *iovad)
 {
-    struct iova_cpu_rcache *cpu_rcache;
-    struct iova_rcache *rcache;
     unsigned int cpu;
-    int i;
+    int i, ret;
+
+    iovad->rcaches = kcalloc(IOVA_RANGE_CACHE_MAX_SIZE,
+                             sizeof(struct iova_rcache),
+                             GFP_KERNEL);
+    if (!iovad->rcaches)
+        return -ENOMEM;
 
     for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
+        struct iova_cpu_rcache *cpu_rcache;
+        struct iova_rcache *rcache;
+
         rcache = &iovad->rcaches[i];
         spin_lock_init(&rcache->lock);
         rcache->depot_size = 0;
-        rcache->cpu_rcaches = __alloc_percpu(sizeof(*cpu_rcache), cache_line_size());
-        if (WARN_ON(!rcache->cpu_rcaches))
-            continue;
+        rcache->cpu_rcaches = __alloc_percpu(sizeof(*cpu_rcache),
+                                             cache_line_size());
+        if
Re: [PATCH 0/7] iommu cleanup and refactoring
On 2022-01-26 13:27, Jason Gunthorpe via iommu wrote:
> On Wed, Jan 26, 2022 at 09:51:36AM +0800, Lu Baolu wrote:
>>>> they are fundamentally different things in their own right, and the ideal
>>>> API should give us the orthogonality to also bind a device to an SVA domain
>>>> without PASID (e.g. for KVM stage 2, or userspace assignment of simpler
>>>> fault/stall-tolerant devices), or attach PASIDs to regular iommu_domains.
>>>
>>> Yes, these are orthogonal things. A iommu driver that supports PASID
>>> ideally should support PASID enabled attach/detatch for every
>>> iommu_domain type it supports.
>>>
>>> SVA should not be entangled with PASID beyond that SVA is often used
>>> with PASID - a SVA iommu_domain should be fully usable with a RID too.
>>
>> The prototype of PASID enabled attach/detach ops could look like:
>>
>>     int (*attach_dev_pasid)(struct iommu_domain *domain,
>>                             struct device *dev, ioasid_t id);
>>     void (*detach_dev_pasid)(struct iommu_domain *domain,
>>                              struct device *dev, ioasid_t id);
>
> It seems reasonable and straightforward to me.. These would be domain
> ops?
>
>> But the iommu driver should implement different callbacks for
>>
>> 1) attaching an IOMMU DMA domain to a PASID on device;
>>    - kernel DMA with PASID
>>    - mdev-like device passthrough
>>    - etc.
>> 2) attaching a CPU-shared domain to a PASID on device;
>>    - SVA
>>    - guest PASID
>>    - etc.
>
> But this you mean domain->ops would be different? Seems fine, up to the
> driver.

Indeed, it makes little practical difference whether the driver provides distinct sets of ops or just common callbacks which switch on the domain type internally. I expect that's largely going to come down to personal preference and how much internal driver code is common between the paths.

> I'd hope to see some flow like:
>
>     domain = device->bus->iommu_ops->alloc_sva_domain(dev)
>     domain->ops->attach_dev_pasid(domain, dev, current->pasid)
>
> To duplicate the current SVA APIs

+1 - I'd concur that attach/detach belong as domain ops in general.

There's quite a nice logical split forming here where domain ops are the ones that make sense for external subsystems and endpoint drivers to use too, while device ops, with the sole exception of domain_alloc, are IOMMU API internals. By that reasoning, attach/bind/etc. are firmly in the former category.

Thanks,
Robin.
Re: [PATCH 0/7] iommu cleanup and refactoring
On Wed, Jan 26, 2022 at 09:51:36AM +0800, Lu Baolu wrote:
> > > they are fundamentally different things in their own right, and the ideal
> > > API should give us the orthogonality to also bind a device to an SVA domain
> > > without PASID (e.g. for KVM stage 2, or userspace assignment of simpler
> > > fault/stall-tolerant devices), or attach PASIDs to regular iommu_domains.
> >
> > Yes, these are orthogonal things. A iommu driver that supports PASID
> > ideally should support PASID enabled attach/detatch for every
> > iommu_domain type it supports.
> >
> > SVA should not be entangled with PASID beyond that SVA is often used
> > with PASID - a SVA iommu_domain should be fully usable with a RID too.
>
> The prototype of PASID enabled attach/detach ops could look like:
>
>     int (*attach_dev_pasid)(struct iommu_domain *domain,
>                             struct device *dev, ioasid_t id);
>     void (*detach_dev_pasid)(struct iommu_domain *domain,
>                              struct device *dev, ioasid_t id);

It seems reasonable and straightforward to me.. These would be domain ops?

> But the iommu driver should implement different callbacks for
>
> 1) attaching an IOMMU DMA domain to a PASID on device;
>    - kernel DMA with PASID
>    - mdev-like device passthrough
>    - etc.
> 2) attaching a CPU-shared domain to a PASID on device;
>    - SVA
>    - guest PASID
>    - etc.

But this you mean domain->ops would be different? Seems fine, up to the driver.

I'd hope to see some flow like:

    domain = device->bus->iommu_ops->alloc_sva_domain(dev)
    domain->ops->attach_dev_pasid(domain, dev, current->pasid)

To duplicate the current SVA APIs

Jason
Re: [RFC][DISCUSSION] dma-mapping: allocating noncoherent buffer without mapping
On 2022-01-26 11:54, Hyeonggon Yoo wrote:
> Last month we discussed that drivers using kmalloc(GFP_DMA) for
> noncoherent mappings should be converted to use the DMA API [1].
> A simple grep for GFP_DMA shows that many drivers are mistakenly using
> the GFP_DMA flag. So our purpose was to make the DMA API choose the
> right zone depending on the device's DMA mask.
>
> Baoquan and I are trying to convert drivers to use
> dma_alloc_noncoherent() when allocating the buffer. But it's not simple
> for some drivers; there is a gap between dma_alloc_noncoherent() and
> the previous convention (allocating the buffer from the buddy or slab
> allocator and mapping it when needed).
>
> For example, some drivers allocate a buffer and reuse it, just mapping
> and unmapping it whenever needed. And some drivers do not even map the
> buffer (some drivers do not use the DMA API at all).
>
> So I think we need to implement a version of dma_alloc_noncoherent()
> that does not map the buffer.

This doesn't make sense to me. The point of dma_alloc_* is you get back
a dedicated DMA buffer which can then be used at any time. In the
noncoherent case you have to put in explicit dma_sync_*() calls around
accesses when the CPU or device is expected to have updated the buffer
contents, but it's still fundamentally the same paradigm.

If you want to update drivers from using streaming mappings on
specially-allocated pages to using proper dedicated DMA buffers, then
update the logic in those drivers to use dedicated DMA buffers
appropriately. Adding an API to allocate a DMA buffer which isn't
actually a DMA buffer so we can "update" drivers from misusing streaming
mappings to still misusing streaming mappings is nonsense.

Robin.

> I think implementing a helper that internally calls
> __dma_direct_alloc_pages() will be okay.
>
> As I'm not an expert in this area, I want to hear others' opinions.
>
> [1] https://lkml.org/lkml/2021/12/13/1121
>
> Thanks,
> Hyeonggon.
[PATCH] dt-bindings: iommu: renesas, ipmmu-vmsa: Reformat renesas, ipmmu-main description
Remove trailing whitespace and break overly long lines.

Signed-off-by: Geert Uytterhoeven
---
 .../devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml
index 507853fcc746eea1..67da53e8d4d10aa0 100644
--- a/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml
+++ b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml
@@ -69,9 +69,9 @@ properties:
     items:
       - items:
           - description: phandle to main IPMMU
-          - description: the interrupt bit number associated with the particular
-              cache IPMMU device. The interrupt bit number needs to match the main
-              IPMMU IMSSTR register. Only used by cache IPMMU instances.
+          - description: the interrupt bit number associated with the particular
+              cache IPMMU device. The interrupt bit number needs to match the
+              main IPMMU IMSSTR register. Only used by cache IPMMU instances.
     description:
       Reference to the main IPMMU phandle plus 1 cell. The cell is
       the interrupt bit number associated with the particular cache IPMMU
--
2.25.1
[RFC][DISCUSSION] dma-mapping: allocating noncoherent buffer without mapping
Last month we discussed that drivers using kmalloc(GFP_DMA) for
noncoherent mappings should be converted to use the DMA API [1].
A simple grep for GFP_DMA shows that many drivers are mistakenly using
the GFP_DMA flag. So our purpose was to make the DMA API choose the
right zone depending on the device's DMA mask.

Baoquan and I are trying to convert drivers to use
dma_alloc_noncoherent() when allocating the buffer. But it's not simple
for some drivers; there is a gap between dma_alloc_noncoherent() and the
previous convention (allocating the buffer from the buddy or slab
allocator and mapping it when needed).

For example, some drivers allocate a buffer and reuse it, just mapping
and unmapping it whenever needed. And some drivers do not even map the
buffer (some drivers do not use the DMA API at all).

So I think we need to implement a version of dma_alloc_noncoherent()
that does not map the buffer. I think implementing a helper that
internally calls __dma_direct_alloc_pages() will be okay.

As I'm not an expert in this area, I want to hear others' opinions.

[1] https://lkml.org/lkml/2021/12/13/1121

Thanks,
Hyeonggon.
Re: [PATCH 0/5] iommu/amd: fixes for suspend/resume
On Wed, 2022-01-26 at 09:54 +0000, Mike Lothian wrote:
> On Wed, 26 Jan 2022 at 07:34, Maxim Levitsky wrote:
> > Could you post the whole dmesg, or at least:
> >
> > dmesg | grep AMD-Vi
> >
> > What CPU does your system have?
> >
> > I suspect that your system doesn't have the GA log feature enabled in
> > the IOMMU, and the code never checks for that, so here it fails
> > enabling it, which before my patches was just ignored silently.
> >
> > Best regards,
> > Maxim Levitsky
> >
> > > Hope that helps
> > >
> > > Mike
>
> Hi
>
> It's an AMD Ryzen 9 5900HX
>
> [    0.186350] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR0, rdevid:160
> [    0.186353] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR1, rdevid:160
> [    0.186354] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR2, rdevid:160
> [    0.186355] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR3, rdevid:160
> [    0.355628] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
> [    0.356134] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
> [    0.356136] AMD-Vi: Extended features (0x206d73ef22254ade): PPR X2APIC NX GT IA GA PC GA_vAPIC
> [    0.356140] AMD-Vi: Interrupt remapping enabled
> [    0.356141] AMD-Vi: Virtual APIC enabled
> [    0.356142] AMD-Vi: X2APIC enabled
> [    0.431377] AMD-Vi: AMD IOMMUv2 loaded and initialized
>
> I've attached the dmesg, I notice that some boots it doesn't happen
>
> Cheers
>
> Mike

Great, your system does seem to support the GA log (but a patch to check
for it, rather than blindly assuming it is supported, is something that
should be done).

So could you bump LOOP_TIMEOUT by 10x or so and see if the problem goes
away?

(That code should be rewritten to use a time-based wait instead of just
blindly looping like that; I can prepare a patch for that as well.)

Best regards,
	Maxim Levitsky
Re: [PATCH 0/5] iommu/amd: fixes for suspend/resume
On Wed, 26 Jan 2022 at 07:34, Maxim Levitsky wrote:
>
> Could you post the whole dmesg, or at least:
>
> dmesg | grep AMD-Vi
>
> What CPU does your system have?
>
> I suspect that your system doesn't have the GA log feature enabled in
> the IOMMU, and the code never checks for that, so here it fails
> enabling it, which before my patches was just ignored silently.
>
> Best regards,
> Maxim Levitsky
>
> > Hope that helps
> >
> > Mike
>

Hi

It's an AMD Ryzen 9 5900HX

[    0.186350] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR0, rdevid:160
[    0.186353] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR1, rdevid:160
[    0.186354] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR2, rdevid:160
[    0.186355] AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR3, rdevid:160
[    0.355628] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[    0.356134] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[    0.356136] AMD-Vi: Extended features (0x206d73ef22254ade): PPR X2APIC NX GT IA GA PC GA_vAPIC
[    0.356140] AMD-Vi: Interrupt remapping enabled
[    0.356141] AMD-Vi: Virtual APIC enabled
[    0.356142] AMD-Vi: X2APIC enabled
[    0.431377] AMD-Vi: AMD IOMMUv2 loaded and initialized

I've attached the dmesg, I notice that some boots it doesn't happen

Cheers

Mike

x2apic.dmesg
Description: Binary data
Re: [PATCH 7/7] iommu: Add iommu_domain::domain_ops
On Mon, Jan 24, 2022 at 12:33:02PM -0400, Jason Gunthorpe wrote:
> On Mon, Jan 24, 2022 at 10:16:07AM +0000, Jean-Philippe Brucker wrote:
> > On Mon, Jan 24, 2022 at 09:58:18AM +0000, Tian, Kevin wrote:
> > > > From: Lu Baolu
> > > > Sent: Monday, January 24, 2022 3:11 PM
> > > > +/**
> > > > + * struct domain_ops - per-domain ops
> > > > + * @attach_dev: attach an iommu domain to a device
> > > > + * @detach_dev: detach an iommu domain from a device
> > >
> > > What is the criteria about whether an op should be iommu_ops or
> > > domain_ops when it requires both domain and device pointers like the
> > > above two (and future PASID-based attach)?
> > >
> > > Other examples include:
> > >   @apply_resv_region
> > >   @is_attach_deferred
> >
> > Could attach_dev() be an IOMMU op? So a driver could set the domain ops
> > in attach_dev() rather than domain_alloc(). That would allow installing
> > map()/unmap() ops that are tailored for the device's IOMMU, which we
> > don't know at domain_alloc() time.
>
> I think we should be moving toward 'domain_alloc' returning the
> correct domain and the way the driver implements the domain shouldn't
> change after that.
>
> > I'm thinking about a guest that has both physical and virtual
> > endpoints, which would ideally use different kinds of domain ops to
> > support both efficiently (caching mode vs page tables)
>
> In this case shouldn't domain_alloc() reached from the struct device
> already do the correct thing?

Sure, if we can finalise the domains before attach that could also clean
up the drivers a bit.

Thanks,
Jean