Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()
On 5/23/19 3:50 PM, John Hubbard wrote:
> [...]
> I was thinking of it as a temporary measure, only up until, but not
> including the point where put_user_pages() becomes active. That is, the
> point when put_user_pages starts decrementing GUP_PIN_COUNTING_BIAS,
> instead of just forwarding to put_page().
>
> (For other readers, that's this patch:
>
> "mm/gup: debug tracking of get_user_pages() references"
>
> ...in https://github.com/johnhubbard/linux/tree/gup_dma_core )

Arggh, correction, I meant this patch:

"mm/gup: track gup-pinned pages"

...sorry for any confusion there.

thanks,
--
John Hubbard
NVIDIA
Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()
On 5/23/19 3:37 PM, Ira Weiny wrote:
[...]
> I've dug in further and I see now that release_pages() implements (almost
> the same thing, see below) as put_page().
>
> However, I think we need to be careful here because put_page_testzero()
> calls
>
> 	page_ref_dec_and_test(page);
>
> ... and after your changes it will need to call ...
>
> 	page_ref_sub_return(page, GUP_PIN_COUNTING_BIAS);
>
> ... on a GUP page:
>
> So how do you propose calling release_pages() from within put_user_pages()?
> Or were you thinking this would be temporary?

I was thinking of it as a temporary measure, only up until, but not including
the point where put_user_pages() becomes active. That is, the point when
put_user_pages starts decrementing GUP_PIN_COUNTING_BIAS, instead of just
forwarding to put_page().

(For other readers, that's this patch:

"mm/gup: debug tracking of get_user_pages() references"

...in https://github.com/johnhubbard/linux/tree/gup_dma_core )

> That said, there are 2 differences I see between release_pages() and
> put_page():
>
> 1) release_pages() will only work for a MEMORY_DEVICE_PUBLIC page and not
>    all devmem pages... I think this is a bug, patch to follow shortly.
>
> 2) release_pages() calls __ClearPageActive() while put_page() does not.
>
> I have no idea if the second difference is a bug or not. But it smells of
> one...
>
> It would be nice to know if the open coding of put_page is really a
> performance benefit or not. It seems like an attempt to optimize the taking
> of the page data lock.
>
> Does anyone have any information about the performance advantage here?
>
> Given the changes above it seems like it would be a benefit to merge the 2
> call paths more closely to make sure we do the right thing.

Yes, it does. Maybe best to not do the temporary measure, then, while this
stuff gets improved. I'll look at your other patch...

thanks,
--
John Hubbard
NVIDIA
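For other readers: the mismatch being discussed above is easiest to see side by side. While put_user_page() merely forwards to put_page(), releasing a gup-pinned page is a single reference drop, which is exactly what release_pages() does internally via put_page_testzero(); once the tracking patch is active, the pin becomes a large refcount bias that has to be subtracted instead. A rough sketch follows; the helper names and the bias value are chosen here for illustration and are not taken verbatim from the series.

#include <linux/mm.h>
#include <linux/page_ref.h>

/* Illustrative only; the real name/value comes from the gup_dma_core tree. */
#define GUP_PIN_COUNTING_BIAS 1024

/*
 * Interim behavior: releasing a gup-pinned page is one reference drop,
 * which is exactly what release_pages() does internally through
 * put_page_testzero() -> page_ref_dec_and_test().
 */
static inline void put_user_page_interim(struct page *page)
{
	put_page(page);
}

/*
 * Once "mm/gup: track gup-pinned pages" is active, the pin is a large bias
 * on the refcount, so the release has to subtract that bias -- something
 * release_pages() has no way of doing as-is.
 */
static inline void put_user_page_tracked(struct page *page)
{
	if (page_ref_sub_and_test(page, GUP_PIN_COUNTING_BIAS))
		__put_page(page);
}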
Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()
On Thu, May 23, 2019 at 12:13:59PM -0700, John Hubbard wrote: > On 5/23/19 12:04 PM, Ira Weiny wrote: > > On Thu, May 23, 2019 at 10:46:38AM -0700, John Hubbard wrote: > > > On 5/23/19 10:32 AM, Jason Gunthorpe wrote: > > > > On Thu, May 23, 2019 at 10:28:52AM -0700, Ira Weiny wrote: > > > > > > @@ -686,8 +686,8 @@ int ib_umem_odp_map_dma_pages(struct > > > > > > ib_umem_odp *umem_odp, u64 user_virt, > > > > > > * ib_umem_odp_map_dma_single_page(). > > > > > > */ > > > > > > if (npages - (j + 1) > 0) > > > > > > - release_pages(_page_list[j+1], > > > > > > - npages - (j + 1)); > > > > > > + put_user_pages(_page_list[j+1], > > > > > > + npages - (j + 1)); > > > > > > > > > > I don't know if we discussed this before but it looks like the use of > > > > > release_pages() was not entirely correct (or at least not necessary) > > > > > here. So > > > > > I think this is ok. > > > > > > > > Oh? John switched it from a put_pages loop to release_pages() here: > > > > > > > > commit 75a3e6a3c129cddcc683538d8702c6ef998ec589 > > > > Author: John Hubbard > > > > Date: Mon Mar 4 11:46:45 2019 -0800 > > > > > > > > RDMA/umem: minor bug fix in error handling path > > > > 1. Bug fix: fix an off by one error in the code that cleans up if > > > > it fails > > > > to dma-map a page, after having done a get_user_pages_remote() > > > > on a > > > > range of pages. > > > > 2. Refinement: for that same cleanup code, release_pages() is > > > > better than > > > > put_page() in a loop. > > > > > > > > And now we are going to back something called put_pages() that > > > > implements the same for loop the above removed? > > > > > > > > Seems like we are going in circles?? John? > > > > > > > > > > put_user_pages() is meant to be a drop-in replacement for release_pages(), > > > so I made the above change as an interim step in moving the callsite from > > > a loop, to a single call. > > > > > > And at some point, it may be possible to find a way to optimize > > > put_user_pages() > > > in a similar way to the batching that release_pages() does, that was part > > > of the plan for this. > > > > > > But I do see what you mean: in the interim, maybe put_user_pages() should > > > just be calling release_pages(), how does that change sound? > > > > I'm certainly not the expert here but FWICT release_pages() was originally > > designed to work with the page cache. > > > > aabfb57296e3 mm: memcontrol: do not kill uncharge batching in > > free_pages_and_swap_cache > > > > But at some point it was changed to be more general? > > > > ea1754a08476 mm, fs: remove remaining PAGE_CACHE_* and > > page_cache_{get,release} usage > > > > ... and it is exported and used outside of the swapping code... and used at > > lease 1 place to directly "put" pages gotten from get_user_pages_fast() > > [arch/x86/kvm/svm.c] > > > > From that it seems like it is safe. > > > > But I don't see where release_page() actually calls put_page() anywhere? > > What > > am I missing? > > > > For that question, I recall having to look closely at this function, as well: > > void release_pages(struct page **pages, int nr) > { > int i; > LIST_HEAD(pages_to_free); > struct pglist_data *locked_pgdat = NULL; > struct lruvec *lruvec; > unsigned long uninitialized_var(flags); > unsigned int uninitialized_var(lock_batch); > > for (i = 0; i < nr; i++) { > struct page *page = pages[i]; > > /* >* Make sure the IRQ-safe lock-holding time does not get >* excessive with a continuous string of pages from the >* same pgdat. The lock is held only if pgdat != NULL. 
> 	 */
> 		if (locked_pgdat && ++lock_batch == SWAP_CLUSTER_MAX) {
> 			spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags);
> 			locked_pgdat = NULL;
> 		}
>
> 		if (is_huge_zero_page(page))
> 			continue;
>
> 		/* Device public page can not be huge page */
> 		if (is_device_public_page(page)) {
> 			if (locked_pgdat) {
> 				spin_unlock_irqrestore(&locked_pgdat->lru_lock,
> 						       flags);
> 				locked_pgdat = NULL;
> 			}
> 			put_devmap_managed_page(page);
> 			continue;
> 		}
>
> 		page = compound_head(page);
> 		if (!put_page_testzero(page))
>
> ^here is where it does the put_page() call, is that what
> you were looking for?

Yes I saw that... I've dug in further and I see
Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()
On Thu, May 23, 2019 at 10:46:38AM -0700, John Hubbard wrote: > On 5/23/19 10:32 AM, Jason Gunthorpe wrote: > > On Thu, May 23, 2019 at 10:28:52AM -0700, Ira Weiny wrote: > > > > @@ -686,8 +686,8 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp > > > > *umem_odp, u64 user_virt, > > > > * ib_umem_odp_map_dma_single_page(). > > > > */ > > > > if (npages - (j + 1) > 0) > > > > - release_pages(_page_list[j+1], > > > > - npages - (j + 1)); > > > > + put_user_pages(_page_list[j+1], > > > > + npages - (j + 1)); > > > > > > I don't know if we discussed this before but it looks like the use of > > > release_pages() was not entirely correct (or at least not necessary) > > > here. So > > > I think this is ok. > > > > Oh? John switched it from a put_pages loop to release_pages() here: > > > > commit 75a3e6a3c129cddcc683538d8702c6ef998ec589 > > Author: John Hubbard > > Date: Mon Mar 4 11:46:45 2019 -0800 > > > > RDMA/umem: minor bug fix in error handling path > > 1. Bug fix: fix an off by one error in the code that cleans up if it > > fails > > to dma-map a page, after having done a get_user_pages_remote() on a > > range of pages. > > 2. Refinement: for that same cleanup code, release_pages() is better > > than > > put_page() in a loop. > > > > And now we are going to back something called put_pages() that > > implements the same for loop the above removed? > > > > Seems like we are going in circles?? John? > > > > put_user_pages() is meant to be a drop-in replacement for release_pages(), > so I made the above change as an interim step in moving the callsite from > a loop, to a single call. > > And at some point, it may be possible to find a way to optimize > put_user_pages() > in a similar way to the batching that release_pages() does, that was part > of the plan for this. > > But I do see what you mean: in the interim, maybe put_user_pages() should > just be calling release_pages(), how does that change sound? It would have made it more consistent.. But it seems this isn't a functional problem in this patch Jason
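For other readers weighing the two interim options mentioned above: a plain loop over put_user_page() (roughly the shape the posted series uses) versus forwarding to release_pages() (which keeps its LRU-lock batching, but is only equivalent while put_user_page() still boils down to put_page()). A sketch of both, with made-up function names so they can sit side by side; neither is verbatim from the series:

#include <linux/mm.h>
#include <linux/pagemap.h>

/* Shape 1: a plain loop over put_user_page(), roughly what the series does. */
static void put_user_pages_as_loop(struct page **pages, unsigned long npages)
{
	unsigned long i;

	for (i = 0; i < npages; i++)
		put_user_page(pages[i]);
}

/*
 * Shape 2: forward to release_pages() and keep its batching of the LRU lock.
 * Only a true drop-in while put_user_page() == put_page(); it stops being
 * correct once gup-pinned pages carry a refcount bias.
 */
static void put_user_pages_as_forward(struct page **pages, unsigned long npages)
{
	release_pages(pages, npages);
}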
Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()
On 5/23/19 12:04 PM, Ira Weiny wrote: On Thu, May 23, 2019 at 10:46:38AM -0700, John Hubbard wrote: On 5/23/19 10:32 AM, Jason Gunthorpe wrote: On Thu, May 23, 2019 at 10:28:52AM -0700, Ira Weiny wrote: @@ -686,8 +686,8 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt, * ib_umem_odp_map_dma_single_page(). */ if (npages - (j + 1) > 0) - release_pages(_page_list[j+1], - npages - (j + 1)); + put_user_pages(_page_list[j+1], + npages - (j + 1)); I don't know if we discussed this before but it looks like the use of release_pages() was not entirely correct (or at least not necessary) here. So I think this is ok. Oh? John switched it from a put_pages loop to release_pages() here: commit 75a3e6a3c129cddcc683538d8702c6ef998ec589 Author: John Hubbard Date: Mon Mar 4 11:46:45 2019 -0800 RDMA/umem: minor bug fix in error handling path 1. Bug fix: fix an off by one error in the code that cleans up if it fails to dma-map a page, after having done a get_user_pages_remote() on a range of pages. 2. Refinement: for that same cleanup code, release_pages() is better than put_page() in a loop. And now we are going to back something called put_pages() that implements the same for loop the above removed? Seems like we are going in circles?? John? put_user_pages() is meant to be a drop-in replacement for release_pages(), so I made the above change as an interim step in moving the callsite from a loop, to a single call. And at some point, it may be possible to find a way to optimize put_user_pages() in a similar way to the batching that release_pages() does, that was part of the plan for this. But I do see what you mean: in the interim, maybe put_user_pages() should just be calling release_pages(), how does that change sound? I'm certainly not the expert here but FWICT release_pages() was originally designed to work with the page cache. aabfb57296e3 mm: memcontrol: do not kill uncharge batching in free_pages_and_swap_cache But at some point it was changed to be more general? ea1754a08476 mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage ... and it is exported and used outside of the swapping code... and used at lease 1 place to directly "put" pages gotten from get_user_pages_fast() [arch/x86/kvm/svm.c] From that it seems like it is safe. But I don't see where release_page() actually calls put_page() anywhere? What am I missing? For that question, I recall having to look closely at this function, as well: void release_pages(struct page **pages, int nr) { int i; LIST_HEAD(pages_to_free); struct pglist_data *locked_pgdat = NULL; struct lruvec *lruvec; unsigned long uninitialized_var(flags); unsigned int uninitialized_var(lock_batch); for (i = 0; i < nr; i++) { struct page *page = pages[i]; /* * Make sure the IRQ-safe lock-holding time does not get * excessive with a continuous string of pages from the * same pgdat. The lock is held only if pgdat != NULL. */ if (locked_pgdat && ++lock_batch == SWAP_CLUSTER_MAX) { spin_unlock_irqrestore(_pgdat->lru_lock, flags); locked_pgdat = NULL; } if (is_huge_zero_page(page)) continue; /* Device public page can not be huge page */ if (is_device_public_page(page)) { if (locked_pgdat) { spin_unlock_irqrestore(_pgdat->lru_lock, flags); locked_pgdat = NULL; } put_devmap_managed_page(page); continue; } page = compound_head(page); if (!put_page_testzero(page)) ^here is where it does the put_page() call, is that what you were looking for? thanks, -- John Hubbard NVIDIA
Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()
On Thu, May 23, 2019 at 10:46:38AM -0700, John Hubbard wrote: > On 5/23/19 10:32 AM, Jason Gunthorpe wrote: > > On Thu, May 23, 2019 at 10:28:52AM -0700, Ira Weiny wrote: > > > > @@ -686,8 +686,8 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp > > > > *umem_odp, u64 user_virt, > > > > * ib_umem_odp_map_dma_single_page(). > > > > */ > > > > if (npages - (j + 1) > 0) > > > > - release_pages(_page_list[j+1], > > > > - npages - (j + 1)); > > > > + put_user_pages(_page_list[j+1], > > > > + npages - (j + 1)); > > > > > > I don't know if we discussed this before but it looks like the use of > > > release_pages() was not entirely correct (or at least not necessary) > > > here. So > > > I think this is ok. > > > > Oh? John switched it from a put_pages loop to release_pages() here: > > > > commit 75a3e6a3c129cddcc683538d8702c6ef998ec589 > > Author: John Hubbard > > Date: Mon Mar 4 11:46:45 2019 -0800 > > > > RDMA/umem: minor bug fix in error handling path > > 1. Bug fix: fix an off by one error in the code that cleans up if it > > fails > > to dma-map a page, after having done a get_user_pages_remote() on a > > range of pages. > > 2. Refinement: for that same cleanup code, release_pages() is better > > than > > put_page() in a loop. > > > > And now we are going to back something called put_pages() that > > implements the same for loop the above removed? > > > > Seems like we are going in circles?? John? > > > > put_user_pages() is meant to be a drop-in replacement for release_pages(), > so I made the above change as an interim step in moving the callsite from > a loop, to a single call. > > And at some point, it may be possible to find a way to optimize > put_user_pages() > in a similar way to the batching that release_pages() does, that was part > of the plan for this. > > But I do see what you mean: in the interim, maybe put_user_pages() should > just be calling release_pages(), how does that change sound? I'm certainly not the expert here but FWICT release_pages() was originally designed to work with the page cache. aabfb57296e3 mm: memcontrol: do not kill uncharge batching in free_pages_and_swap_cache But at some point it was changed to be more general? ea1754a08476 mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage ... and it is exported and used outside of the swapping code... and used at lease 1 place to directly "put" pages gotten from get_user_pages_fast() [arch/x86/kvm/svm.c] >From that it seems like it is safe. But I don't see where release_page() actually calls put_page() anywhere? What am I missing? Ira
Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()
On 5/23/19 8:31 AM, Jerome Glisse wrote:
[...]
> Reviewed-by: Jérôme Glisse

Thanks for the review!

> Between i have a wishlist see below
[...]
>> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
>> index e7ea819fcb11..673f0d240b3e 100644
>> --- a/drivers/infiniband/core/umem.c
>> +++ b/drivers/infiniband/core/umem.c
>> @@ -54,9 +54,10 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
>>
>>  	for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) {
>>  		page = sg_page_iter_page(&sg_iter);
>> -		if (!PageDirty(page) && umem->writable && dirty)
>> -			set_page_dirty_lock(page);
>> -		put_page(page);
>> +		if (umem->writable && dirty)
>> +			put_user_pages_dirty_lock(&page, 1);
>> +		else
>> +			put_user_page(page);
>
> Can we get a put_user_page_dirty(struct page **pages, bool dirty, npages) ?
>
> It is a common pattern that we might have to conditionally dirty the pages
> and i feel it would look cleaner if we could move the branch within the
> put_user_page*() function.

This sounds reasonable to me, do others have a preference on this?

Last time we discussed it, I recall there was interest in trying to handle
the sg lists, which was where a lot of focus was. I'm not sure if there was
a preference one way or the other, on adding more of these helpers.

thanks,
--
John Hubbard
NVIDIA
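For other readers: the helper Jérôme is asking for would fold the "conditionally dirty, then release" pattern into one call. A hypothetical sketch follows; the name put_user_pages_maybe_dirty() is invented here, and nothing with this signature exists in the posted series.

#include <linux/mm.h>
#include <linux/page-flags.h>

/*
 * Hypothetical helper along the lines of Jerome's wishlist: conditionally
 * dirty each page, then release the gup reference.
 */
static void put_user_pages_maybe_dirty(struct page **pages,
				       unsigned long npages, bool dirty)
{
	unsigned long i;

	for (i = 0; i < npages; i++) {
		if (dirty && !PageDirty(pages[i]))
			set_page_dirty_lock(pages[i]);
		put_user_page(pages[i]);
	}
}

With something like that, the __ib_umem_release() hunk quoted above could collapse to a single call, e.g. put_user_pages_maybe_dirty(&page, 1, umem->writable && dirty).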
Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()
On 5/23/19 10:32 AM, Jason Gunthorpe wrote: On Thu, May 23, 2019 at 10:28:52AM -0700, Ira Weiny wrote: @@ -686,8 +686,8 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt, * ib_umem_odp_map_dma_single_page(). */ if (npages - (j + 1) > 0) - release_pages(_page_list[j+1], - npages - (j + 1)); + put_user_pages(_page_list[j+1], + npages - (j + 1)); I don't know if we discussed this before but it looks like the use of release_pages() was not entirely correct (or at least not necessary) here. So I think this is ok. Oh? John switched it from a put_pages loop to release_pages() here: commit 75a3e6a3c129cddcc683538d8702c6ef998ec589 Author: John Hubbard Date: Mon Mar 4 11:46:45 2019 -0800 RDMA/umem: minor bug fix in error handling path 1. Bug fix: fix an off by one error in the code that cleans up if it fails to dma-map a page, after having done a get_user_pages_remote() on a range of pages. 2. Refinement: for that same cleanup code, release_pages() is better than put_page() in a loop. And now we are going to back something called put_pages() that implements the same for loop the above removed? Seems like we are going in circles?? John? put_user_pages() is meant to be a drop-in replacement for release_pages(), so I made the above change as an interim step in moving the callsite from a loop, to a single call. And at some point, it may be possible to find a way to optimize put_user_pages() in a similar way to the batching that release_pages() does, that was part of the plan for this. But I do see what you mean: in the interim, maybe put_user_pages() should just be calling release_pages(), how does that change sound? thanks, -- John Hubbard NVIDIA
Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()
On Thu, May 23, 2019 at 10:28:52AM -0700, Ira Weiny wrote:
> > @@ -686,8 +686,8 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
> >  			 * ib_umem_odp_map_dma_single_page().
> >  			 */
> >  			if (npages - (j + 1) > 0)
> > -				release_pages(&local_page_list[j+1],
> > -					      npages - (j + 1));
> > +				put_user_pages(&local_page_list[j+1],
> > +					       npages - (j + 1));
>
> I don't know if we discussed this before but it looks like the use of
> release_pages() was not entirely correct (or at least not necessary) here. So
> I think this is ok.

Oh? John switched it from a put_pages loop to release_pages() here:

commit 75a3e6a3c129cddcc683538d8702c6ef998ec589
Author: John Hubbard
Date: Mon Mar 4 11:46:45 2019 -0800

    RDMA/umem: minor bug fix in error handling path

    1. Bug fix: fix an off by one error in the code that cleans up if it fails
       to dma-map a page, after having done a get_user_pages_remote() on a
       range of pages.
    2. Refinement: for that same cleanup code, release_pages() is better than
       put_page() in a loop.

And now we are going to back something called put_pages() that
implements the same for loop the above removed?

Seems like we are going in circles?? John?

Jason
Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()
On Thu, May 23, 2019 at 12:25:37AM -0700, john.hubb...@gmail.com wrote: > From: John Hubbard > > For infiniband code that retains pages via get_user_pages*(), > release those pages via the new put_user_page(), or > put_user_pages*(), instead of put_page() > > This is a tiny part of the second step of fixing the problem described > in [1]. The steps are: > > 1) Provide put_user_page*() routines, intended to be used >for releasing pages that were pinned via get_user_pages*(). > > 2) Convert all of the call sites for get_user_pages*(), to >invoke put_user_page*(), instead of put_page(). This involves dozens of >call sites, and will take some time. > > 3) After (2) is complete, use get_user_pages*() and put_user_page*() to >implement tracking of these pages. This tracking will be separate from >the existing struct page refcounting. > > 4) Use the tracking and identification of these pages, to implement >special handling (especially in writeback paths) when the pages are >backed by a filesystem. Again, [1] provides details as to why that is >desirable. > > [1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()" > > Cc: Doug Ledford > Cc: Jason Gunthorpe > Cc: Mike Marciniszyn > Cc: Dennis Dalessandro > Cc: Christian Benvenuti > > Reviewed-by: Jan Kara > Reviewed-by: Dennis Dalessandro > Acked-by: Jason Gunthorpe > Tested-by: Ira Weiny > Signed-off-by: John Hubbard > --- > drivers/infiniband/core/umem.c | 7 --- > drivers/infiniband/core/umem_odp.c | 10 +- > drivers/infiniband/hw/hfi1/user_pages.c | 11 --- > drivers/infiniband/hw/mthca/mthca_memfree.c | 6 +++--- > drivers/infiniband/hw/qib/qib_user_pages.c | 11 --- > drivers/infiniband/hw/qib/qib_user_sdma.c | 6 +++--- > drivers/infiniband/hw/usnic/usnic_uiom.c| 7 --- > 7 files changed, 27 insertions(+), 31 deletions(-) > > diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c > index e7ea819fcb11..673f0d240b3e 100644 > --- a/drivers/infiniband/core/umem.c > +++ b/drivers/infiniband/core/umem.c > @@ -54,9 +54,10 @@ static void __ib_umem_release(struct ib_device *dev, > struct ib_umem *umem, int d > > for_each_sg_page(umem->sg_head.sgl, _iter, umem->sg_nents, 0) { > page = sg_page_iter_page(_iter); > - if (!PageDirty(page) && umem->writable && dirty) > - set_page_dirty_lock(page); > - put_page(page); > + if (umem->writable && dirty) > + put_user_pages_dirty_lock(, 1); > + else > + put_user_page(page); > } > > sg_free_table(>sg_head); > diff --git a/drivers/infiniband/core/umem_odp.c > b/drivers/infiniband/core/umem_odp.c > index f962b5bbfa40..17e46df3990a 100644 > --- a/drivers/infiniband/core/umem_odp.c > +++ b/drivers/infiniband/core/umem_odp.c > @@ -487,7 +487,7 @@ void ib_umem_odp_release(struct ib_umem_odp *umem_odp) > * The function returns -EFAULT if the DMA mapping operation fails. It > returns > * -EAGAIN if a concurrent invalidation prevents us from updating the page. > * > - * The page is released via put_page even if the operation failed. For > + * The page is released via put_user_page even if the operation failed. For > * on-demand pinning, the page is released whenever it isn't stored in the > * umem. 
>   */
> @@ -536,7 +536,7 @@ static int ib_umem_odp_map_dma_single_page(
>  	}
>
>  out:
> -	put_page(page);
> +	put_user_page(page);
>
>  	if (remove_existing_mapping) {
>  		ib_umem_notifier_start_account(umem_odp);
> @@ -659,7 +659,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
>  				ret = -EFAULT;
>  				break;
>  			}
> -			put_page(local_page_list[j]);
> +			put_user_page(local_page_list[j]);
>  			continue;
>  		}
>
> @@ -686,8 +686,8 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
>  			 * ib_umem_odp_map_dma_single_page().
>  			 */
>  			if (npages - (j + 1) > 0)
> -				release_pages(&local_page_list[j+1],
> -					      npages - (j + 1));
> +				put_user_pages(&local_page_list[j+1],
> +					       npages - (j + 1));

I don't know if we discussed this before but it looks like the use of
release_pages() was not entirely correct (or at least not necessary) here. So
I think this is ok.

As for testing, I have been running with this patch for a while but I don't
have ODP hardware so that testing would not cover this code path.

So you can add my:

Reviewed-by: Ira Weiny
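For other readers: in the version of the series under review here, the new calls in the hunks above are still thin wrappers over existing refcounting, which is why the conversion is functionally a no-op for now. Roughly, per page, put_user_pages_dirty_lock() does the following (a paraphrased sketch, not a verbatim copy of the series; put_user_page() itself currently just forwards to put_page()):

#include <linux/mm.h>
#include <linux/page-flags.h>

static void put_user_pages_dirty_lock_sketch(struct page **pages,
					     unsigned long npages)
{
	unsigned long i;

	for (i = 0; i < npages; i++) {
		struct page *page = compound_head(pages[i]);

		/* Mark dirty under the page lock, if not already dirty... */
		if (!PageDirty(page))
			set_page_dirty_lock(page);
		/* ...then drop the gup reference. */
		put_user_page(page);
	}
}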
Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()
On Thu, May 23, 2019 at 12:25:37AM -0700, john.hubb...@gmail.com wrote:
> From: John Hubbard
>
> For infiniband code that retains pages via get_user_pages*(),
> release those pages via the new put_user_page(), or
> put_user_pages*(), instead of put_page()
>
> This is a tiny part of the second step of fixing the problem described
> in [1]. The steps are:
>
> 1) Provide put_user_page*() routines, intended to be used
>    for releasing pages that were pinned via get_user_pages*().
>
> 2) Convert all of the call sites for get_user_pages*(), to
>    invoke put_user_page*(), instead of put_page(). This involves dozens of
>    call sites, and will take some time.
>
> 3) After (2) is complete, use get_user_pages*() and put_user_page*() to
>    implement tracking of these pages. This tracking will be separate from
>    the existing struct page refcounting.
>
> 4) Use the tracking and identification of these pages, to implement
>    special handling (especially in writeback paths) when the pages are
>    backed by a filesystem. Again, [1] provides details as to why that is
>    desirable.
>
> [1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"
>
> Cc: Doug Ledford
> Cc: Jason Gunthorpe
> Cc: Mike Marciniszyn
> Cc: Dennis Dalessandro
> Cc: Christian Benvenuti
>
> Reviewed-by: Jan Kara
> Reviewed-by: Dennis Dalessandro
> Acked-by: Jason Gunthorpe
> Tested-by: Ira Weiny
> Signed-off-by: John Hubbard

Reviewed-by: Jérôme Glisse

Between i have a wishlist see below

> ---
>  drivers/infiniband/core/umem.c              |  7 ---
>  drivers/infiniband/core/umem_odp.c          | 10 +-
>  drivers/infiniband/hw/hfi1/user_pages.c     | 11 ---
>  drivers/infiniband/hw/mthca/mthca_memfree.c |  6 +++---
>  drivers/infiniband/hw/qib/qib_user_pages.c  | 11 ---
>  drivers/infiniband/hw/qib/qib_user_sdma.c   |  6 +++---
>  drivers/infiniband/hw/usnic/usnic_uiom.c    |  7 ---
>  7 files changed, 27 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index e7ea819fcb11..673f0d240b3e 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -54,9 +54,10 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
>
>  	for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) {
>  		page = sg_page_iter_page(&sg_iter);
> -		if (!PageDirty(page) && umem->writable && dirty)
> -			set_page_dirty_lock(page);
> -		put_page(page);
> +		if (umem->writable && dirty)
> +			put_user_pages_dirty_lock(&page, 1);
> +		else
> +			put_user_page(page);

Can we get a put_user_page_dirty(struct page **pages, bool dirty, npages) ?

It is a common pattern that we might have to conditionally dirty the pages
and i feel it would look cleaner if we could move the branch within the
put_user_page*() function.

Cheers,
Jérôme
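For other readers arriving at the bottom of the thread: the change that steps 1) and 2) of the quoted changelog ask of driver code is deliberately mechanical. A generic before/after sketch follows; the function names are invented for illustration and do not come from any real driver.

#include <linux/mm.h>

/* Generic example of a driver's unpin path before the conversion. */
static void driver_unpin_before(struct page **pages, unsigned long npages,
				bool dirty)
{
	unsigned long i;

	for (i = 0; i < npages; i++) {
		if (dirty)
			set_page_dirty_lock(pages[i]);
		put_page(pages[i]);
	}
}

/*
 * The same path after the conversion: release through the new wrappers so
 * that gup-pinned pages can later be identified and tracked separately
 * (steps 3 and 4 of the changelog).
 */
static void driver_unpin_after(struct page **pages, unsigned long npages,
			       bool dirty)
{
	if (dirty)
		put_user_pages_dirty_lock(pages, npages);
	else
		put_user_pages(pages, npages);
}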