Re: [RFC PATCHES 1/2] iommu: Add RCU-protected page free support
On Thu, Jun 16, 2022 at 10:27:02AM +0800, Baolu Lu wrote:
> On 2022/6/15 23:40, Jason Gunthorpe wrote:
> > On Fri, Jun 10, 2022 at 01:37:20PM +0800, Baolu Lu wrote:
> > > On 2022/6/9 20:49, Jason Gunthorpe wrote:
> > > > > +void iommu_free_pgtbl_pages(struct iommu_domain *domain,
> > > > > +			    struct list_head *pages)
> > > > > +{
> > > > > +	struct page *page, *next;
> > > > > +
> > > > > +	if (!domain->concurrent_traversal) {
> > > > > +		put_pages_list(pages);
> > > > > +		return;
> > > > > +	}
> > > > > +
> > > > > +	list_for_each_entry_safe(page, next, pages, lru) {
> > > > > +		list_del(&page->lru);
> > > > > +		call_rcu(&page->rcu_head, pgtble_page_free_rcu);
> > > > > +	}
> > > >
> > > > It seems OK, but I wonder if there is benefit to using
> > > > put_pages_list() from the rcu callback
> > >
> > > The price is that we need to allocate a "struct list_head" and free it
> > > in the rcu callback as well. Currently the list_head is sitting on the
> > > stack.
> >
> > You'd have to use a different struct page layout so that the list_head
> > was in the struct page and didn't overlap with the rcu_head
>
> Okay, let me head this direction in the next version.

I'm not sure it is worth it performance wise, would be interesting to
know perhaps. But see my prior email about how slab has a custom struct
page.

Jason
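A note on the layout overlap discussed above: in struct page, "lru" and
"rcu_head" occupy the same union slot (both are two pointers at the same
offset), so a page queued with call_rcu() can no longer be linked on an
lru-based freelist. The sketch below only illustrates the conflict and a
slab-style way around it; these structs and field names are hypothetical,
not the real mm_types.h layout.

#include <linux/list.h>
#include <linux/rcupdate.h>

/* Illustration only -- mirrors the conflict in struct page. */
struct overlapped_page {
	unsigned long flags;
	union {
		struct list_head lru;		/* two words */
		struct rcu_head rcu_head;	/* the same two words */
	};
};

/*
 * A hypothetical custom overlay in the spirit of slab's own view of
 * struct page: keep the list linkage in words the rcu_head does not
 * use, so a page can sit on a freelist and still be queued for an
 * RCU-deferred free.
 */
struct pgtbl_page {
	unsigned long flags;		/* word 0, as in struct page */
	struct rcu_head rcu_head;	/* words 1-2 */
	struct list_head free_list;	/* words 3-4, no overlap */
};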
Re: [RFC PATCHES 1/2] iommu: Add RCU-protected page free support
On 2022/6/15 23:40, Jason Gunthorpe wrote:
> On Fri, Jun 10, 2022 at 01:37:20PM +0800, Baolu Lu wrote:
> > On 2022/6/9 20:49, Jason Gunthorpe wrote:
> > > > +void iommu_free_pgtbl_pages(struct iommu_domain *domain,
> > > > +			    struct list_head *pages)
> > > > +{
> > > > +	struct page *page, *next;
> > > > +
> > > > +	if (!domain->concurrent_traversal) {
> > > > +		put_pages_list(pages);
> > > > +		return;
> > > > +	}
> > > > +
> > > > +	list_for_each_entry_safe(page, next, pages, lru) {
> > > > +		list_del(&page->lru);
> > > > +		call_rcu(&page->rcu_head, pgtble_page_free_rcu);
> > > > +	}
> > >
> > > It seems OK, but I wonder if there is benefit to using
> > > put_pages_list() from the rcu callback
> >
> > The price is that we need to allocate a "struct list_head" and free it
> > in the rcu callback as well. Currently the list_head is sitting on the
> > stack.
>
> You'd have to use a different struct page layout so that the list_head
> was in the struct page and didn't overlap with the rcu_head

Okay, let me head this direction in the next version.

Best regards,
baolu
Re: [RFC PATCHES 1/2] iommu: Add RCU-protected page free support
On Fri, Jun 10, 2022 at 01:37:20PM +0800, Baolu Lu wrote:
> On 2022/6/9 20:49, Jason Gunthorpe wrote:
> > > +void iommu_free_pgtbl_pages(struct iommu_domain *domain,
> > > +			    struct list_head *pages)
> > > +{
> > > +	struct page *page, *next;
> > > +
> > > +	if (!domain->concurrent_traversal) {
> > > +		put_pages_list(pages);
> > > +		return;
> > > +	}
> > > +
> > > +	list_for_each_entry_safe(page, next, pages, lru) {
> > > +		list_del(&page->lru);
> > > +		call_rcu(&page->rcu_head, pgtble_page_free_rcu);
> > > +	}
> >
> > It seems OK, but I wonder if there is benefit to using
> > put_pages_list() from the rcu callback
>
> The price is that we need to allocate a "struct list_head" and free it
> in the rcu callback as well. Currently the list_head is sitting on the
> stack.

You'd have to use a different struct page layout so that the list_head
was in the struct page and didn't overlap with the rcu_head

Jason
Re: [RFC PATCHES 1/2] iommu: Add RCU-protected page free support
On 2022/6/10 01:06, Raj, Ashok wrote:
> On Thu, Jun 09, 2022 at 03:08:10PM +0800, Lu Baolu wrote:
> > The IOMMU page tables are updated using iommu_map/unmap() interfaces.
> > Currently, there is no mandatory requirement for drivers to use locks
> > to ensure concurrent updates to page tables, because it's assumed that
> > overlapping IOVA ranges do not have concurrent updates. Therefore the
> > IOMMU drivers only need to take care of concurrent updates to level
> > page table entries.
>
> The last part doesn't read well..
>
> s/updates to level page table entries/
>   updates to page-table entries at the same level
>
> > But enabling new features challenges this assumption. For example, the
> > hardware assisted dirty page tracking feature requires scanning page
> > tables in interfaces other than mapping and unmapping. This might result
> > in a use-after-free scenario in which a level page table has been freed
> > by the unmap() interface, while another thread is scanning the next level
> > page table.
> >
> > This adds RCU-protected page free support so that the pages are really
> > freed and reused after a RCU grace period. Hence, the page tables are
> > safe for scanning within a rcu_read_lock critical region. Considering
> > that scanning the page table is a rare case, this also adds a domain
> > flag and the RCU-protected page free is only used when this flat is set.
>
> s/flat/flag

Above updated. Thank you!

> > Signed-off-by: Lu Baolu
> > ---
> >  include/linux/iommu.h |  9 +
> >  drivers/iommu/iommu.c | 23 +++
> >  2 files changed, 32 insertions(+)
> >
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 5e1afe169549..6f68eabb8567 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -95,6 +95,7 @@ struct iommu_domain {
> >  	void *handler_token;
> >  	struct iommu_domain_geometry geometry;
> >  	struct iommu_dma_cookie *iova_cookie;
> > +	unsigned long concurrent_traversal:1;
>
> Does this need to be a bitfield? Even though you are needing just one
> bit now, you could probably make it a mask with flag bits?

As discussed in another reply, I am about to drop the driver opt-in and
wrap the helper around the iommu_iotlb_gather.

> >  };
> >
> >  static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
> > @@ -657,6 +658,12 @@ static inline void dev_iommu_priv_set(struct device *dev, void *priv)
> >  	dev->iommu->priv = priv;
> >  }
> >
> > +static inline void domain_set_concurrent_traversal(struct iommu_domain *domain,
> > +						   bool value)
> > +{
> > +	domain->concurrent_traversal = value;
> > +}
> > +
> >  int iommu_probe_device(struct device *dev);
> >  void iommu_release_device(struct device *dev);
> >
> > @@ -677,6 +684,8 @@ int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner);
> >  void iommu_group_release_dma_owner(struct iommu_group *group);
> >  bool iommu_group_dma_owner_claimed(struct iommu_group *group);
> >
> > +void iommu_free_pgtbl_pages(struct iommu_domain *domain,
> > +			    struct list_head *pages);
> >  #else /* CONFIG_IOMMU_API */
> >
> >  struct iommu_ops {};
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 847ad47a2dfd..ceeb97ebe3e2 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -3252,3 +3252,26 @@ bool iommu_group_dma_owner_claimed(struct iommu_group *group)
> >  	return user;
> >  }
> >  EXPORT_SYMBOL_GPL(iommu_group_dma_owner_claimed);
> > +
> > +static void pgtble_page_free_rcu(struct rcu_head *rcu)
>
> maybe the names can be consistent? pgtble_ vs pgtbl below.
> vote to drop the 'e' :-)

Updated.

> > +{
> > +	struct page *page = container_of(rcu, struct page, rcu_head);
> > +
> > +	__free_pages(page, 0);
> > +}
> > +
> > +void iommu_free_pgtbl_pages(struct iommu_domain *domain,
> > +			    struct list_head *pages)
> > +{
> > +	struct page *page, *next;
> > +
> > +	if (!domain->concurrent_traversal) {
> > +		put_pages_list(pages);
> > +		return;
> > +	}
> > +
> > +	list_for_each_entry_safe(page, next, pages, lru) {
> > +		list_del(&page->lru);
> > +		call_rcu(&page->rcu_head, pgtble_page_free_rcu);
> > +	}
> > +}
> > --
> > 2.25.1

Best regards,
baolu
Re: [RFC PATCHES 1/2] iommu: Add RCU-protected page free support
On 2022/6/9 21:32, Jason Gunthorpe wrote:
> On Thu, Jun 09, 2022 at 02:19:06PM +0100, Robin Murphy wrote:
> > Is there a significant benefit to keeping both paths, or could we get
> > away with just always using RCU? Realistically, pagetable pages aren't
> > likely to be freed all that frequently, except perhaps at domain
> > teardown, but that shouldn't really be performance-critical, and I
> > guess we could stick an RCU sync point in iommu_domain_free() if we're
> > really worried about releasing larger quantities of pages back to the
> > allocator ASAP?
>
> I think you are right, anything that uses the iommu_iotlb_gather may as
> well use RCU too.
>
> IIRC the allocators already know that RCU is often sitting on freed
> memory and have some contingency to flush it out before OOMing, so
> nothing special should be needed.

Fair enough. How about the code below?

static void pgtble_page_free_rcu(struct rcu_head *rcu)
{
	struct page *page = container_of(rcu, struct page, rcu_head);

	__free_pages(page, 0);
}

/*
 * Free pages gathered in the freelist of iommu_iotlb_gather. Free them
 * in an RCU-protected way so that it's safe for lock-free page table
 * walks.
 */
void iommu_free_iotlb_gather_pages(struct iommu_iotlb_gather *iotlb_gather)
{
	struct page *page, *next;

	list_for_each_entry_safe(page, next, &iotlb_gather->freelist, lru) {
		list_del(&page->lru);
		call_rcu(&page->rcu_head, pgtble_page_free_rcu);
	}
}

Best regards,
baolu
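To make concrete what the RCU free buys, here is a minimal sketch of the
reader side it protects; walk_dirty_bits() is a hypothetical stand-in
for a driver's real lock-free page table walker, not an existing kernel
function:

/*
 * Sketch: table pages freed concurrently through call_rcu() above stay
 * valid (i.e. are not reused) until this read-side critical section
 * ends, so the walk never dereferences recycled memory.
 */
static void scan_dirty_pages(struct iommu_domain *domain,
			     unsigned long iova, size_t size)
{
	rcu_read_lock();
	walk_dirty_bits(domain, iova, size);	/* hypothetical walker */
	rcu_read_unlock();
}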
Re: [RFC PATCHES 1/2] iommu: Add RCU-protected page free support
On 2022/6/9 20:49, Jason Gunthorpe wrote:
> > +void iommu_free_pgtbl_pages(struct iommu_domain *domain,
> > +			    struct list_head *pages)
> > +{
> > +	struct page *page, *next;
> > +
> > +	if (!domain->concurrent_traversal) {
> > +		put_pages_list(pages);
> > +		return;
> > +	}
> > +
> > +	list_for_each_entry_safe(page, next, pages, lru) {
> > +		list_del(&page->lru);
> > +		call_rcu(&page->rcu_head, pgtble_page_free_rcu);
> > +	}
>
> It seems OK, but I wonder if there is benefit to using
> put_pages_list() from the rcu callback

The price is that we need to allocate a "struct list_head" and free it
in the rcu callback as well. Currently the list_head is sitting on the
stack.

Best regards,
baolu
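For reference, the "price" Baolu describes might look like the sketch
below: one heap allocation carries the rcu_head for the whole batch, so
page->lru stays intact and put_pages_list() can run from the callback
after a single grace period. The struct and function names here are
hypothetical:

struct rcu_page_batch {
	struct rcu_head rcu;
	struct list_head pages;
};

static void batch_free_rcu(struct rcu_head *rcu)
{
	struct rcu_page_batch *batch =
		container_of(rcu, struct rcu_page_batch, rcu);

	put_pages_list(&batch->pages);	/* bulk free after the grace period */
	kfree(batch);
}

static void queue_pages_rcu(struct list_head *pages)
{
	struct rcu_page_batch *batch = kmalloc(sizeof(*batch), GFP_ATOMIC);

	if (!batch)
		return;			/* a real version needs a fallback */
	INIT_LIST_HEAD(&batch->pages);
	list_splice_init(pages, &batch->pages);
	call_rcu(&batch->rcu, batch_free_rcu);
}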
Re: [RFC PATCHES 1/2] iommu: Add RCU-protected page free support
On Thu, Jun 09, 2022 at 03:08:10PM +0800, Lu Baolu wrote:
> The IOMMU page tables are updated using iommu_map/unmap() interfaces.
> Currently, there is no mandatory requirement for drivers to use locks
> to ensure concurrent updates to page tables, because it's assumed that
> overlapping IOVA ranges do not have concurrent updates. Therefore the
> IOMMU drivers only need to take care of concurrent updates to level
> page table entries.

The last part doesn't read well..

s/updates to level page table entries/
  updates to page-table entries at the same level

> But enabling new features challenges this assumption. For example, the
> hardware assisted dirty page tracking feature requires scanning page
> tables in interfaces other than mapping and unmapping. This might result
> in a use-after-free scenario in which a level page table has been freed
> by the unmap() interface, while another thread is scanning the next level
> page table.
>
> This adds RCU-protected page free support so that the pages are really
> freed and reused after a RCU grace period. Hence, the page tables are
> safe for scanning within a rcu_read_lock critical region. Considering
> that scanning the page table is a rare case, this also adds a domain
> flag and the RCU-protected page free is only used when this flat is set.

s/flat/flag

> Signed-off-by: Lu Baolu
> ---
>  include/linux/iommu.h |  9 +
>  drivers/iommu/iommu.c | 23 +++
>  2 files changed, 32 insertions(+)
>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 5e1afe169549..6f68eabb8567 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -95,6 +95,7 @@ struct iommu_domain {
>  	void *handler_token;
>  	struct iommu_domain_geometry geometry;
>  	struct iommu_dma_cookie *iova_cookie;
> +	unsigned long concurrent_traversal:1;

Does this need to be a bitfield? Even though you are needing just one
bit now, you could probably make it a mask with flag bits?

>  };
>
>  static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
> @@ -657,6 +658,12 @@ static inline void dev_iommu_priv_set(struct device *dev, void *priv)
>  	dev->iommu->priv = priv;
>  }
>
> +static inline void domain_set_concurrent_traversal(struct iommu_domain *domain,
> +						   bool value)
> +{
> +	domain->concurrent_traversal = value;
> +}
> +
>  int iommu_probe_device(struct device *dev);
>  void iommu_release_device(struct device *dev);
>
> @@ -677,6 +684,8 @@ int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner);
>  void iommu_group_release_dma_owner(struct iommu_group *group);
>  bool iommu_group_dma_owner_claimed(struct iommu_group *group);
>
> +void iommu_free_pgtbl_pages(struct iommu_domain *domain,
> +			    struct list_head *pages);
>  #else /* CONFIG_IOMMU_API */
>
>  struct iommu_ops {};
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 847ad47a2dfd..ceeb97ebe3e2 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -3252,3 +3252,26 @@ bool iommu_group_dma_owner_claimed(struct iommu_group *group)
>  	return user;
>  }
>  EXPORT_SYMBOL_GPL(iommu_group_dma_owner_claimed);
> +
> +static void pgtble_page_free_rcu(struct rcu_head *rcu)

maybe the names can be consistent? pgtble_ vs pgtbl below.
vote to drop the 'e' :-)

> +{
> +	struct page *page = container_of(rcu, struct page, rcu_head);
> +
> +	__free_pages(page, 0);
> +}
> +
> +void iommu_free_pgtbl_pages(struct iommu_domain *domain,
> +			    struct list_head *pages)
> +{
> +	struct page *page, *next;
> +
> +	if (!domain->concurrent_traversal) {
> +		put_pages_list(pages);
> +		return;
> +	}
> +
> +	list_for_each_entry_safe(page, next, pages, lru) {
> +		list_del(&page->lru);
> +		call_rcu(&page->rcu_head, pgtble_page_free_rcu);
> +	}
> +}
> --
> 2.25.1
Re: [RFC PATCHES 1/2] iommu: Add RCU-protected page free support
On Thu, Jun 09, 2022 at 02:19:06PM +0100, Robin Murphy wrote:
> Is there a significant benefit to keeping both paths, or could we get
> away with just always using RCU? Realistically, pagetable pages aren't
> likely to be freed all that frequently, except perhaps at domain
> teardown, but that shouldn't really be performance-critical, and I
> guess we could stick an RCU sync point in iommu_domain_free() if we're
> really worried about releasing larger quantities of pages back to the
> allocator ASAP?

I think you are right, anything that uses the iommu_iotlb_gather may as
well use RCU too.

IIRC the allocators already know that RCU is often sitting on freed
memory and have some contingency to flush it out before OOMing, so
nothing special should be needed.

Jason
Re: [RFC PATCHES 1/2] iommu: Add RCU-protected page free support
On 2022-06-09 13:49, Jason Gunthorpe wrote:
> On Thu, Jun 09, 2022 at 03:08:10PM +0800, Lu Baolu wrote:
> > The IOMMU page tables are updated using iommu_map/unmap() interfaces.
> > Currently, there is no mandatory requirement for drivers to use locks
> > to ensure concurrent updates to page tables, because it's assumed that
> > overlapping IOVA ranges do not have concurrent updates. Therefore the
> > IOMMU drivers only need to take care of concurrent updates to level
> > page table entries.
> >
> > But enabling new features challenges this assumption. For example, the
> > hardware assisted dirty page tracking feature requires scanning page
> > tables in interfaces other than mapping and unmapping. This might result
> > in a use-after-free scenario in which a level page table has been freed
> > by the unmap() interface, while another thread is scanning the next level
> > page table.
> >
> > This adds RCU-protected page free support so that the pages are really
> > freed and reused after a RCU grace period. Hence, the page tables are
> > safe for scanning within a rcu_read_lock critical region. Considering
> > that scanning the page table is a rare case, this also adds a domain
> > flag and the RCU-protected page free is only used when this flat is set.
> >
> > Signed-off-by: Lu Baolu
> > ---
> >  include/linux/iommu.h |  9 +
> >  drivers/iommu/iommu.c | 23 +++
> >  2 files changed, 32 insertions(+)
> >
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 5e1afe169549..6f68eabb8567 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -95,6 +95,7 @@ struct iommu_domain {
> >  	void *handler_token;
> >  	struct iommu_domain_geometry geometry;
> >  	struct iommu_dma_cookie *iova_cookie;
> > +	unsigned long concurrent_traversal:1;
> >  };
> >
> >  static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
> > @@ -657,6 +658,12 @@ static inline void dev_iommu_priv_set(struct device *dev, void *priv)
> >  	dev->iommu->priv = priv;
> >  }
> >
> > +static inline void domain_set_concurrent_traversal(struct iommu_domain *domain,
> > +						   bool value)
> > +{
> > +	domain->concurrent_traversal = value;
> > +}
>
> ?? If you want it to be a driver opt in I would just add a flags to the
> domain ops. "DOMAIN_FLAG_RCU_FREE_PAGES"

Is there a significant benefit to keeping both paths, or could we get
away with just always using RCU? Realistically, pagetable pages aren't
likely to be freed all that frequently, except perhaps at domain
teardown, but that shouldn't really be performance-critical, and I guess
we could stick an RCU sync point in iommu_domain_free() if we're really
worried about releasing larger quantities of pages back to the allocator
ASAP?

It's already a driver opt-in to use the iommu_iotlb_gather freelist in
the first place, and right now the ones that do are also the ones that
do lock-free table walks, so they will ultimately all want this as well.

Robin.

> > +void iommu_free_pgtbl_pages(struct iommu_domain *domain,
> > +			    struct list_head *pages)
> > +{
> > +	struct page *page, *next;
> > +
> > +	if (!domain->concurrent_traversal) {
> > +		put_pages_list(pages);
> > +		return;
> > +	}
> > +
> > +	list_for_each_entry_safe(page, next, pages, lru) {
> > +		list_del(&page->lru);
> > +		call_rcu(&page->rcu_head, pgtble_page_free_rcu);
> > +	}
>
> It seems OK, but I wonder if there is benefit to using
> put_pages_list() from the rcu callback
>
> Jason
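Robin's "RCU sync point" would be small; a hedged sketch (not the actual
iommu_domain_free() implementation) of what the idea might look like:

/*
 * Sketch only: if table pages are freed via call_rcu(), a teardown path
 * that wants the memory back promptly can wait for the queued callbacks.
 * rcu_barrier() (unlike synchronize_rcu()) waits for already-pending
 * callbacks to actually run.
 */
void iommu_domain_free(struct iommu_domain *domain)
{
	domain->ops->free(domain);	/* may queue pages via call_rcu() */
	rcu_barrier();			/* all queued frees have now run */
}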
Re: [RFC PATCHES 1/2] iommu: Add RCU-protected page free support
On Thu, Jun 09, 2022 at 03:08:10PM +0800, Lu Baolu wrote:
> The IOMMU page tables are updated using iommu_map/unmap() interfaces.
> Currently, there is no mandatory requirement for drivers to use locks
> to ensure concurrent updates to page tables, because it's assumed that
> overlapping IOVA ranges do not have concurrent updates. Therefore the
> IOMMU drivers only need to take care of concurrent updates to level
> page table entries.
>
> But enabling new features challenges this assumption. For example, the
> hardware assisted dirty page tracking feature requires scanning page
> tables in interfaces other than mapping and unmapping. This might result
> in a use-after-free scenario in which a level page table has been freed
> by the unmap() interface, while another thread is scanning the next level
> page table.
>
> This adds RCU-protected page free support so that the pages are really
> freed and reused after a RCU grace period. Hence, the page tables are
> safe for scanning within a rcu_read_lock critical region. Considering
> that scanning the page table is a rare case, this also adds a domain
> flag and the RCU-protected page free is only used when this flat is set.
>
> Signed-off-by: Lu Baolu
> ---
>  include/linux/iommu.h |  9 +
>  drivers/iommu/iommu.c | 23 +++
>  2 files changed, 32 insertions(+)
>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 5e1afe169549..6f68eabb8567 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -95,6 +95,7 @@ struct iommu_domain {
>  	void *handler_token;
>  	struct iommu_domain_geometry geometry;
>  	struct iommu_dma_cookie *iova_cookie;
> +	unsigned long concurrent_traversal:1;
>  };
>
>  static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
> @@ -657,6 +658,12 @@ static inline void dev_iommu_priv_set(struct device *dev, void *priv)
>  	dev->iommu->priv = priv;
>  }
>
> +static inline void domain_set_concurrent_traversal(struct iommu_domain *domain,
> +						   bool value)
> +{
> +	domain->concurrent_traversal = value;
> +}

?? If you want it to be a driver opt in I would just add a flags to the
domain ops. "DOMAIN_FLAG_RCU_FREE_PAGES"

> +void iommu_free_pgtbl_pages(struct iommu_domain *domain,
> +			    struct list_head *pages)
> +{
> +	struct page *page, *next;
> +
> +	if (!domain->concurrent_traversal) {
> +		put_pages_list(pages);
> +		return;
> +	}
> +
> +	list_for_each_entry_safe(page, next, pages, lru) {
> +		list_del(&page->lru);
> +		call_rcu(&page->rcu_head, pgtble_page_free_rcu);
> +	}

It seems OK, but I wonder if there is benefit to using
put_pages_list() from the rcu callback

Jason
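A hedged sketch of the flags-based opt-in suggested above, assuming a
hypothetical flags word in the domain ops (DOMAIN_FLAG_RCU_FREE_PAGES is
Jason's suggested name, not an existing kernel flag):

/* Hypothetical opt-in flag advertised by the driver's domain ops. */
#define DOMAIN_FLAG_RCU_FREE_PAGES	BIT(0)

void iommu_free_pgtbl_pages(struct iommu_domain *domain,
			    struct list_head *pages)
{
	struct page *page, *next;

	/* Drivers that never walk tables lock-free keep the cheap path. */
	if (!(domain->ops->flags & DOMAIN_FLAG_RCU_FREE_PAGES)) {
		put_pages_list(pages);
		return;
	}

	list_for_each_entry_safe(page, next, pages, lru) {
		list_del(&page->lru);
		call_rcu(&page->rcu_head, pgtble_page_free_rcu);
	}
}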