Re: [PATCH v3] iommu/amd: Add support for fast IOTLB flushing
Hi Joerg,

On 2/13/18 8:29 PM, Joerg Roedel wrote:
> Hi Suravee,
>
> thanks for working on this.
>
> On Wed, Jan 31, 2018 at 12:01:14AM -0500, Suravee Suthikulpanit wrote:
>> +static void amd_iommu_iotlb_range_add(struct iommu_domain *domain,
>> +                                      unsigned long iova, size_t size)
>> +{
>> +        struct amd_iommu_flush_entries *entry, *p;
>> +        unsigned long flags;
>> +        bool found = false;
>> +
>> +        spin_lock_irqsave(&amd_iommu_flush_list_lock, flags);
>
> I am not happy with introducing or using global locks when they are
> not necessary. Can this be a per-domain lock?
>
> Besides, did you check whether it actually makes sense to keep track
> of the ranges here? My approach would be to just make iotlb_range_add()
> a no-op and do a full domain flush in iotlb_sync(). But maybe you did
> measurements you can share here that show there is a benefit.
>
>         Joerg

Alright, I'll send out v4 with iotlb_range_add() as a no-op and
iotlb_sync() as a full domain flush. This should be sufficient to get
started with adopting the fast TLB flushing interface. I'll submit
support for fine-grained TLB invalidation as a separate series.

Thanks,
Suravee
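For reference, the v4 direction agreed on above would look roughly like
the following, reusing the flush helpers already present in the driver
(a sketch of the plan, not the actual v4 submission):

static void amd_iommu_flush_iotlb_all(struct iommu_domain *domain)
{
        struct protection_domain *dom = to_pdomain(domain);

        domain_flush_tlb_pde(dom);
        domain_flush_complete(dom);
}

/* Deliberately a no-op: no per-range state is tracked. */
static void amd_iommu_iotlb_range_add(struct iommu_domain *domain,
                                      unsigned long iova, size_t size)
{
}

/* Sync degrades to a full-domain flush. */
static void amd_iommu_iotlb_sync(struct iommu_domain *domain)
{
        amd_iommu_flush_iotlb_all(domain);
}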
Re: [PATCH v3] iommu/amd: Add support for fast IOTLB flushing
Hi Suravee,

thanks for working on this.

On Wed, Jan 31, 2018 at 12:01:14AM -0500, Suravee Suthikulpanit wrote:
> +static void amd_iommu_iotlb_range_add(struct iommu_domain *domain,
> +                                      unsigned long iova, size_t size)
> +{
> +        struct amd_iommu_flush_entries *entry, *p;
> +        unsigned long flags;
> +        bool found = false;
> +
> +        spin_lock_irqsave(&amd_iommu_flush_list_lock, flags);

I am not happy with introducing or using global locks when they are not
necessary. Can this be a per-domain lock?

Besides, did you check whether it actually makes sense to keep track of
the ranges here? My approach would be to just make iotlb_range_add() a
no-op and do a full domain flush in iotlb_sync(). But maybe you did
measurements you can share here that show there is a benefit.

        Joerg
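A per-domain variant of the above would hang the list and lock off
struct protection_domain instead of the globals. A rough sketch (the
flush_list/flush_list_lock fields are illustrative additions, not taken
from the patch; note GFP_ATOMIC, since the allocation happens under a
spinlock with interrupts disabled):

/* Hypothetical additions to struct protection_domain: */
struct protection_domain {
        /* ... existing fields ... */
        struct list_head flush_list;    /* pending IOVA ranges, this domain */
        spinlock_t flush_list_lock;     /* protects flush_list */
};

static void amd_iommu_iotlb_range_add(struct iommu_domain *domain,
                                      unsigned long iova, size_t size)
{
        struct protection_domain *pdom = to_pdomain(domain);
        struct amd_iommu_flush_entries *entry;
        unsigned long flags;

        /* Contention is now limited to unmaps on the same domain. */
        spin_lock_irqsave(&pdom->flush_list_lock, flags);
        list_for_each_entry(entry, &pdom->flush_list, list) {
                if (entry->iova == iova) {
                        if (size > entry->size)
                                entry->size = size;
                        goto out_unlock;
                }
        }

        /* GFP_ATOMIC because we allocate under a spinlock here. */
        entry = kzalloc(sizeof(*entry), GFP_ATOMIC);
        if (entry) {
                entry->iova = iova;
                entry->size = size;
                list_add(&entry->list, &pdom->flush_list);
        }
out_unlock:
        spin_unlock_irqrestore(&pdom->flush_list_lock, flags);
}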
[PATCH v3] iommu/amd: Add support for fast IOTLB flushing
Implement the newly added IOTLB flushing interface for AMD IOMMU.

Cc: Joerg Roedel
Signed-off-by: Suravee Suthikulpanit
---
Changes from v2 (https://lkml.org/lkml/2017/12/27/44)
  * Call domain_flush_complete() after domain_flush_tlb_pde().

 drivers/iommu/amd_iommu.c       | 77 ++++++++++++++++++++++++++++++++++++++---
 drivers/iommu/amd_iommu_init.c  |  7 +++++++
 drivers/iommu/amd_iommu_types.h |  7 +++++++
 3 files changed, 88 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 3609f51..6c7ac3f 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -129,6 +129,12 @@ struct dma_ops_domain {
 static struct iova_domain reserved_iova_ranges;
 static struct lock_class_key reserved_rbtree_key;
 
+struct amd_iommu_flush_entries {
+        struct list_head list;
+        unsigned long iova;
+        size_t size;
+};
+
 /****************************************************************************
  *
  * Helper functions
@@ -3043,9 +3049,6 @@ static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
         unmap_size = iommu_unmap_page(domain, iova, page_size);
         mutex_unlock(&domain->api_lock);
 
-        domain_flush_tlb_pde(domain);
-        domain_flush_complete(domain);
-
         return unmap_size;
 }
 
@@ -3163,6 +3166,71 @@ static bool amd_iommu_is_attach_deferred(struct iommu_domain *domain,
         return dev_data->defer_attach;
 }
 
+static void amd_iommu_flush_iotlb_all(struct iommu_domain *domain)
+{
+        struct protection_domain *dom = to_pdomain(domain);
+
+        domain_flush_tlb_pde(dom);
+        domain_flush_complete(dom);
+}
+
+static void amd_iommu_iotlb_range_add(struct iommu_domain *domain,
+                                      unsigned long iova, size_t size)
+{
+        struct amd_iommu_flush_entries *entry, *p;
+        unsigned long flags;
+        bool found = false;
+
+        spin_lock_irqsave(&amd_iommu_flush_list_lock, flags);
+        list_for_each_entry(p, &amd_iommu_flush_list, list) {
+                if (iova != p->iova)
+                        continue;
+
+                if (size > p->size) {
+                        p->size = size;
+                        pr_debug("%s: update range: iova=%#lx, size = %#lx\n",
+                                 __func__, p->iova, p->size);
+                }
+                found = true;
+                break;
+        }
+
+        if (!found) {
+                entry = kzalloc(sizeof(struct amd_iommu_flush_entries),
+                                GFP_KERNEL);
+                if (entry) {
+                        pr_debug("%s: new range: iova=%lx, size=%#lx\n",
+                                 __func__, iova, size);
+
+                        entry->iova = iova;
+                        entry->size = size;
+                        list_add(&entry->list, &amd_iommu_flush_list);
+                }
+        }
+        spin_unlock_irqrestore(&amd_iommu_flush_list_lock, flags);
+}
+
+static void amd_iommu_iotlb_sync(struct iommu_domain *domain)
+{
+        struct protection_domain *pdom = to_pdomain(domain);
+        struct amd_iommu_flush_entries *entry, *next;
+        unsigned long flags;
+
+        /* Note:
+         * Currently, IOMMU driver just flushes the whole IO/TLB for
+         * a given domain. So, just remove entries from the list here.
+         */
+        spin_lock_irqsave(&amd_iommu_flush_list_lock, flags);
+        list_for_each_entry_safe(entry, next, &amd_iommu_flush_list, list) {
+                list_del(&entry->list);
+                kfree(entry);
+        }
+        spin_unlock_irqrestore(&amd_iommu_flush_list_lock, flags);
+
+        domain_flush_tlb_pde(pdom);
+        domain_flush_complete(pdom);
+}
+
 const struct iommu_ops amd_iommu_ops = {
         .capable = amd_iommu_capable,
         .domain_alloc = amd_iommu_domain_alloc,
@@ -3181,6 +3249,9 @@ static bool amd_iommu_is_attach_deferred(struct iommu_domain *domain,
         .apply_resv_region = amd_iommu_apply_resv_region,
         .is_attach_deferred = amd_iommu_is_attach_deferred,
         .pgsize_bitmap = AMD_IOMMU_PGSIZES,
+        .flush_iotlb_all = amd_iommu_flush_iotlb_all,
+        .iotlb_range_add = amd_iommu_iotlb_range_add,
+        .iotlb_sync = amd_iommu_iotlb_sync,
 };
 
 /*
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 6fe2d03..e8f8cee 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -185,6 +185,12 @@ struct ivmd_header {
 bool amd_iommu_force_isolation __read_mostly;
 
 /*
+ * IOTLB flush list
+ */
+LIST_HEAD(amd_iommu_flush_list);
+spinlock_t amd_iommu_flush_list_lock;
+
+/*
  * List of protection domains - used during resume
  */
 LIST_HEAD(amd_iommu_pd_list);
@@ -2490,6 +2496,7 @@ static int __init early_amd_iommu_init(void)
         __set_bit(0, amd_iommu_pd_alloc_bitmap);
 
         spin_lock_init(&amd_iommu_pd_lock);
+        spin_lock_init(&amd_iommu_flush_list_lock);
 
         /*
          * now the data
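For context, the callbacks added by this patch are driven through the
core IOMMU wrappers, which let callers batch unmaps and defer the
flush. A minimal caller-side sketch (the loop shape and function name
are illustrative; real call sites such as VFIO differ):

static size_t unmap_range_deferred(struct iommu_domain *domain,
                                   unsigned long iova, size_t pgsize,
                                   unsigned int count)
{
        size_t total = 0;
        unsigned int i;

        for (i = 0; i < count; i++) {
                /* Unmap without an implicit IOTLB flush ... */
                size_t unmapped = iommu_unmap_fast(domain, iova, pgsize);

                if (!unmapped)
                        break;
                /* ... and queue the range for a deferred flush. */
                iommu_tlb_range_add(domain, iova, unmapped);
                iova += unmapped;
                total += unmapped;
        }

        /* A single sync at the end flushes everything queued above. */
        iommu_tlb_sync(domain);

        return total;
}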