Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()

2019-05-23 Thread John Hubbard
On 5/23/19 3:50 PM, John Hubbard wrote:
> [...] 
> I was thinking of it as a temporary measure, only up until, but not
> including the point where put_user_pages() becomes active. That is, the
> point when put_user_pages() starts decrementing GUP_PIN_COUNTING_BIAS,
> instead of just forwarding to put_page().
> 
> (For other readers, that's this patch:
> 
> "mm/gup: debug tracking of get_user_pages() references"
> 
> ...in https://github.com/johnhubbard/linux/tree/gup_dma_core )
> 

Arggh, correction, I meant this patch:

"mm/gup: track gup-pinned pages"

...sorry for any confusion there.

thanks,
-- 
John Hubbard
NVIDIA



Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()

2019-05-23 Thread John Hubbard
On 5/23/19 3:37 PM, Ira Weiny wrote:
[...] 
> I've dug in further and I see now that release_pages() implements (almost,
> see below) the same thing as put_page().
> 
> However, I think we need to be careful here because put_page_testzero() calls
> 
>   page_ref_dec_and_test(page);
> 
> ... and after your changes it will need to call ...
> 
>   page_ref_sub_return(page, GUP_PIN_COUNTING_BIAS);
> 
> ... on a GUP page.
> 
> So how do you propose calling release_pages() from within
> put_user_pages()? Or were you thinking this would be temporary?

I was thinking of it as a temporary measure, only up until, but not
including the point where put_user_pages() becomes active. That is, the
point when put_user_pages() starts decrementing GUP_PIN_COUNTING_BIAS,
instead of just forwarding to put_page().

(For other readers, that's this patch:

"mm/gup: debug tracking of get_user_pages() references"

...in https://github.com/johnhubbard/linux/tree/gup_dma_core )
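
(Also for other readers, a rough sketch of that distinction -- illustrative
only, not the actual patches. The GUP_PIN_COUNTING_BIAS value and the
put_user_page_tracked() name below are invented for this example:

#include <linux/mm.h>

#define GUP_PIN_COUNTING_BIAS 1024	/* illustrative value only */

/* Interim behavior: just forward to put_page(). */
static inline void put_user_page(struct page *page)
{
	put_page(page);
}

/* After the tracking patch: drop the whole bias taken at pin time. */
static inline void put_user_page_tracked(struct page *page)
{
	page = compound_head(page);
	if (page_ref_sub_and_test(page, GUP_PIN_COUNTING_BIAS))
		__put_page(page);
}

...so put_page_testzero()'s single-reference page_ref_dec_and_test() no
longer matches how gup-pinned pages are released, which is Ira's point
above. )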

> 
> That said, there are 2 differences I see between release_pages() and
> put_page():
> 
> 1) release_pages() will only work for a MEMORY_DEVICE_PUBLIC page and not
>    all devmem pages... I think this is a bug, patch to follow shortly.
> 
> 2) release_pages() calls __ClearPageActive() while put_page() does not
> 
> I have no idea if the second difference is a bug or not.  But it smells of
> one...
> 
> It would be nice to know if the open coding of put_page() is really a
> performance benefit or not. It seems like an attempt to optimize the
> taking of the pgdat lru_lock.
> 
> Does anyone have any information about the performance advantage here?
> 
> Given the changes above it seems like it would be a benefit to merge the
> 2 call paths more closely, to make sure we do the right thing.
> 

Yes, it does seem that way. Maybe best not to do the temporary measure,
then, while this stuff gets improved. I'll look at your other patch...


thanks,
-- 
John Hubbard
NVIDIA


Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()

2019-05-23 Thread Ira Weiny
On Thu, May 23, 2019 at 12:13:59PM -0700, John Hubbard wrote:
> On 5/23/19 12:04 PM, Ira Weiny wrote:
> > On Thu, May 23, 2019 at 10:46:38AM -0700, John Hubbard wrote:
> > > On 5/23/19 10:32 AM, Jason Gunthorpe wrote:
> > > > On Thu, May 23, 2019 at 10:28:52AM -0700, Ira Weiny wrote:
> > > > > > @@ -686,8 +686,8 @@ int ib_umem_odp_map_dma_pages(struct 
> > > > > > ib_umem_odp *umem_odp, u64 user_virt,
> > > > > >  * ib_umem_odp_map_dma_single_page().
> > > > > >  */
> > > > > > if (npages - (j + 1) > 0)
> > > > > > -   release_pages(&local_page_list[j+1],
> > > > > > - npages - (j + 1));
> > > > > > +   put_user_pages(&local_page_list[j+1],
> > > > > > +  npages - (j + 1));
> > > > > 
> > > > > I don't know if we discussed this before but it looks like the use of
> > > > > release_pages() was not entirely correct (or at least not necessary) 
> > > > > here.  So
> > > > > I think this is ok.
> > > > 
> > > > Oh? John switched it from a put_page() loop to release_pages() here:
> > > > 
> > > > commit 75a3e6a3c129cddcc683538d8702c6ef998ec589
> > > > Author: John Hubbard 
> > > > Date:   Mon Mar 4 11:46:45 2019 -0800
> > > > 
> > > >   RDMA/umem: minor bug fix in error handling path
> > > >   1. Bug fix: fix an off by one error in the code that cleans up if
> > > >  it fails to dma-map a page, after having done a
> > > >  get_user_pages_remote() on a range of pages.
> > > >   2. Refinement: for that same cleanup code, release_pages() is
> > > >  better than put_page() in a loop.
> > > > 
> > > > And now we are going back to something called put_user_pages() that
> > > > implements the same for loop the above removed?
> > > > 
> > > > Seems like we are going in circles?? John?
> > > > 
> > > 
> > > put_user_pages() is meant to be a drop-in replacement for release_pages(),
> > > so I made the above change as an interim step in moving the callsite from
> > > a loop, to a single call.
> > > 
> > > And at some point, it may be possible to find a way to optimize
> > > put_user_pages() in a similar way to the batching that
> > > release_pages() does; that was part of the plan for this.
> > > 
> > > But I do see what you mean: in the interim, maybe put_user_pages() should
> > > just be calling release_pages(), how does that change sound?
> > 
> > I'm certainly not the expert here but FWICT release_pages() was originally
> > designed to work with the page cache.
> > 
> > aabfb57296e3  mm: memcontrol: do not kill uncharge batching in 
> > free_pages_and_swap_cache
> > 
> > But at some point it was changed to be more general?
> > 
> > ea1754a08476 mm, fs: remove remaining PAGE_CACHE_* and 
> > page_cache_{get,release} usage
> > 
> > ... and it is exported and used outside of the swapping code... and used
> > in at least one place to directly "put" pages gotten from
> > get_user_pages_fast() [arch/x86/kvm/svm.c]
> > 
> > From that it seems like it is safe.
> > 
> > But I don't see where release_pages() actually calls put_page()
> > anywhere? What am I missing?
> > 
> 
> For that question, I recall having to look closely at this function, as well:
> 
> void release_pages(struct page **pages, int nr)
> {
>   int i;
>   LIST_HEAD(pages_to_free);
>   struct pglist_data *locked_pgdat = NULL;
>   struct lruvec *lruvec;
>   unsigned long uninitialized_var(flags);
>   unsigned int uninitialized_var(lock_batch);
> 
>   for (i = 0; i < nr; i++) {
>   struct page *page = pages[i];
> 
>   /*
>* Make sure the IRQ-safe lock-holding time does not get
>* excessive with a continuous string of pages from the
>* same pgdat. The lock is held only if pgdat != NULL.
>*/
>   if (locked_pgdat && ++lock_batch == SWAP_CLUSTER_MAX) {
>   spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags);
>   locked_pgdat = NULL;
>   }
> 
>   if (is_huge_zero_page(page))
>   continue;
> 
>   /* Device public page can not be huge page */
>   if (is_device_public_page(page)) {
>   if (locked_pgdat) {
>   spin_unlock_irqrestore(&locked_pgdat->lru_lock,
>  flags);
>   locked_pgdat = NULL;
>   }
>   put_devmap_managed_page(page);
>   continue;
>   }
> 
>   page = compound_head(page);
>   if (!put_page_testzero(page))
> 
>^here is where it does the put_page() call, is that what
>   you were looking for?

Yes I saw that...

I've dug in further and I see now that release_pages() implements (almost,
see below) the same thing as put_page(). [...]
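
(For reference, put_page_testzero() is essentially just the
decrement-and-test -- paraphrased here from include/linux/mm.h, with the
VM_BUG_ON_PAGE debug check omitted -- so the decrement itself is the "put":

static inline int put_page_testzero(struct page *page)
{
	return page_ref_dec_and_test(page);
}
)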

Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()

2019-05-23 Thread Jason Gunthorpe
On Thu, May 23, 2019 at 10:46:38AM -0700, John Hubbard wrote:
> On 5/23/19 10:32 AM, Jason Gunthorpe wrote:
> > On Thu, May 23, 2019 at 10:28:52AM -0700, Ira Weiny wrote:
> > > > @@ -686,8 +686,8 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp 
> > > > *umem_odp, u64 user_virt,
> > > >  * ib_umem_odp_map_dma_single_page().
> > > >  */
> > > > if (npages - (j + 1) > 0)
> > > > -   release_pages(&local_page_list[j+1],
> > > > - npages - (j + 1));
> > > > +   put_user_pages(&local_page_list[j+1],
> > > > +  npages - (j + 1));
> > > 
> > > I don't know if we discussed this before but it looks like the use of
> > > release_pages() was not entirely correct (or at least not necessary) 
> > > here.  So
> > > I think this is ok.
> > 
> > Oh? John switched it from a put_page() loop to release_pages() here:
> > 
> > commit 75a3e6a3c129cddcc683538d8702c6ef998ec589
> > Author: John Hubbard 
> > Date:   Mon Mar 4 11:46:45 2019 -0800
> > 
> >  RDMA/umem: minor bug fix in error handling path
> >  1. Bug fix: fix an off by one error in the code that cleans up if it
> > fails to dma-map a page, after having done a get_user_pages_remote()
> > on a range of pages.
> >  2. Refinement: for that same cleanup code, release_pages() is better
> > than put_page() in a loop.
> > 
> > And now we are going back to something called put_user_pages() that
> > implements the same for loop the above removed?
> > 
> > Seems like we are going in circles?? John?
> > 
> 
> put_user_pages() is meant to be a drop-in replacement for release_pages(),
> so I made the above change as an interim step in moving the callsite from
> a loop, to a single call.
> 
> And at some point, it may be possible to find a way to optimize
> put_user_pages() in a similar way to the batching that release_pages()
> does; that was part of the plan for this.
> 
> But I do see what you mean: in the interim, maybe put_user_pages() should
> just be calling release_pages(), how does that change sound?

It would have made it more consistent... But it seems this isn't a
functional problem in this patch.

Jason


Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()

2019-05-23 Thread John Hubbard

On 5/23/19 12:04 PM, Ira Weiny wrote:

On Thu, May 23, 2019 at 10:46:38AM -0700, John Hubbard wrote:

On 5/23/19 10:32 AM, Jason Gunthorpe wrote:

On Thu, May 23, 2019 at 10:28:52AM -0700, Ira Weiny wrote:

@@ -686,8 +686,8 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, 
u64 user_virt,
 * ib_umem_odp_map_dma_single_page().
 */
if (npages - (j + 1) > 0)
-   release_pages(&local_page_list[j+1],
- npages - (j + 1));
+   put_user_pages(&local_page_list[j+1],
+  npages - (j + 1));


I don't know if we discussed this before but it looks like the use of
release_pages() was not entirely correct (or at least not necessary) here.  So
I think this is ok.


Oh? John switched it from a put_page() loop to release_pages() here:

commit 75a3e6a3c129cddcc683538d8702c6ef998ec589
Author: John Hubbard 
Date:   Mon Mar 4 11:46:45 2019 -0800

  RDMA/umem: minor bug fix in error handling path
  1. Bug fix: fix an off by one error in the code that cleans up if it fails
 to dma-map a page, after having done a get_user_pages_remote() on a
 range of pages.
  2. Refinement: for that same cleanup code, release_pages() is better than
 put_page() in a loop.

And now we are going back to something called put_user_pages() that
implements the same for loop the above removed?

Seems like we are going in circles?? John?



put_user_pages() is meant to be a drop-in replacement for release_pages(),
so I made the above change as an interim step in moving the callsite from
a loop, to a single call.

And at some point, it may be possible to find a way to optimize
put_user_pages() in a similar way to the batching that release_pages()
does; that was part of the plan for this.

But I do see what you mean: in the interim, maybe put_user_pages() should
just be calling release_pages(), how does that change sound?


I'm certainly not the expert here but FWICT release_pages() was originally
designed to work with the page cache.

aabfb57296e3  mm: memcontrol: do not kill uncharge batching in 
free_pages_and_swap_cache

But at some point it was changed to be more general?

ea1754a08476 mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} 
usage

... and it is exported and used outside of the swapping code... and used in
at least one place to directly "put" pages gotten from
get_user_pages_fast() [arch/x86/kvm/svm.c]

From that it seems like it is safe.

But I don't see where release_pages() actually calls put_page() anywhere?
What am I missing?



For that question, I recall having to look closely at this function, as well:

void release_pages(struct page **pages, int nr)
{
int i;
LIST_HEAD(pages_to_free);
struct pglist_data *locked_pgdat = NULL;
struct lruvec *lruvec;
unsigned long uninitialized_var(flags);
unsigned int uninitialized_var(lock_batch);

for (i = 0; i < nr; i++) {
struct page *page = pages[i];

/*
 * Make sure the IRQ-safe lock-holding time does not get
 * excessive with a continuous string of pages from the
 * same pgdat. The lock is held only if pgdat != NULL.
 */
if (locked_pgdat && ++lock_batch == SWAP_CLUSTER_MAX) {
spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags);
locked_pgdat = NULL;
}

if (is_huge_zero_page(page))
continue;

/* Device public page can not be huge page */
if (is_device_public_page(page)) {
if (locked_pgdat) {
spin_unlock_irqrestore(&locked_pgdat->lru_lock,
   flags);
locked_pgdat = NULL;
}
put_devmap_managed_page(page);
continue;
}

page = compound_head(page);
if (!put_page_testzero(page))

 ^here is where it does the put_page() call, is that what
you were looking for?



thanks,
--
John Hubbard
NVIDIA


Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()

2019-05-23 Thread Ira Weiny
On Thu, May 23, 2019 at 10:46:38AM -0700, John Hubbard wrote:
> On 5/23/19 10:32 AM, Jason Gunthorpe wrote:
> > On Thu, May 23, 2019 at 10:28:52AM -0700, Ira Weiny wrote:
> > > > @@ -686,8 +686,8 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp 
> > > > *umem_odp, u64 user_virt,
> > > >  * ib_umem_odp_map_dma_single_page().
> > > >  */
> > > > if (npages - (j + 1) > 0)
> > > > -   release_pages(&local_page_list[j+1],
> > > > - npages - (j + 1));
> > > > +   put_user_pages(&local_page_list[j+1],
> > > > +  npages - (j + 1));
> > > 
> > > I don't know if we discussed this before but it looks like the use of
> > > release_pages() was not entirely correct (or at least not necessary) 
> > > here.  So
> > > I think this is ok.
> > 
> > Oh? John switched it from a put_page() loop to release_pages() here:
> > 
> > commit 75a3e6a3c129cddcc683538d8702c6ef998ec589
> > Author: John Hubbard 
> > Date:   Mon Mar 4 11:46:45 2019 -0800
> > 
> >  RDMA/umem: minor bug fix in error handling path
> >  1. Bug fix: fix an off by one error in the code that cleans up if it
> > fails to dma-map a page, after having done a get_user_pages_remote()
> > on a range of pages.
> >  2. Refinement: for that same cleanup code, release_pages() is better
> > than put_page() in a loop.
> > 
> > And now we are going back to something called put_user_pages() that
> > implements the same for loop the above removed?
> > 
> > Seems like we are going in circles?? John?
> > 
> 
> put_user_pages() is meant to be a drop-in replacement for release_pages(),
> so I made the above change as an interim step in moving the callsite from
> a loop, to a single call.
> 
> And at some point, it may be possible to find a way to optimize
> put_user_pages() in a similar way to the batching that release_pages()
> does; that was part of the plan for this.
> 
> But I do see what you mean: in the interim, maybe put_user_pages() should
> just be calling release_pages(), how does that change sound?

I'm certainly not the expert here but FWICT release_pages() was originally
designed to work with the page cache.

aabfb57296e3  mm: memcontrol: do not kill uncharge batching in 
free_pages_and_swap_cache

But at some point it was changed to be more general?

ea1754a08476 mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} 
usage

... and it is exported and used outside of the swapping code... and used in
at least one place to directly "put" pages gotten from
get_user_pages_fast() [arch/x86/kvm/svm.c]

From that it seems like it is safe.

But I don't see where release_pages() actually calls put_page() anywhere?
What am I missing?

Ira



Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()

2019-05-23 Thread John Hubbard

On 5/23/19 8:31 AM, Jerome Glisse wrote:
[...]


Reviewed-by: Jérôme Glisse 



Thanks for the review!


By the way, I have a wishlist, see below

[...]

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index e7ea819fcb11..673f0d240b3e 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -54,9 +54,10 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int dirty)
 
 	for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) {
 		page = sg_page_iter_page(&sg_iter);
-		if (!PageDirty(page) && umem->writable && dirty)
-			set_page_dirty_lock(page);
-		put_page(page);
+		if (umem->writable && dirty)
+			put_user_pages_dirty_lock(&page, 1);
+		else
+			put_user_page(page);


Can we get a put_user_page_dirty(struct page **pages, bool dirty, npages) ?

It is a common pattern that we might have to conditionally dirty the pages,
and I feel it would look cleaner if we could move the branch into the
put_user_page*() function.



This sounds reasonable to me; do others have a preference on this? Last time
we discussed it, I recall there was interest in trying to handle the sg
lists, which was where a lot of the focus was. I'm not sure if there was a
preference one way or the other on adding more of these helpers.



thanks,
--
John Hubbard
NVIDIA


Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()

2019-05-23 Thread John Hubbard

On 5/23/19 10:32 AM, Jason Gunthorpe wrote:

On Thu, May 23, 2019 at 10:28:52AM -0700, Ira Weiny wrote:
  
@@ -686,8 +686,8 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,

 * ib_umem_odp_map_dma_single_page().
 */
if (npages - (j + 1) > 0)
-   release_pages(&local_page_list[j+1],
- npages - (j + 1));
+   put_user_pages(&local_page_list[j+1],
+  npages - (j + 1));


I don't know if we discussed this before but it looks like the use of
release_pages() was not entirely correct (or at least not necessary) here.  So
I think this is ok.


Oh? John switched it from a put_page() loop to release_pages() here:

commit 75a3e6a3c129cddcc683538d8702c6ef998ec589
Author: John Hubbard 
Date:   Mon Mar 4 11:46:45 2019 -0800

 RDMA/umem: minor bug fix in error handling path

 1. Bug fix: fix an off by one error in the code that cleans up if it fails
to dma-map a page, after having done a get_user_pages_remote() on a
range of pages.

 2. Refinement: for that same cleanup code, release_pages() is better than
put_page() in a loop.


And now we are going back to something called put_user_pages() that
implements the same for loop the above removed?

Seems like we are going in circles?? John?



put_user_pages() is meant to be a drop-in replacement for release_pages(),
so I made the above change as an interim step in moving the callsite from
a loop, to a single call.

And at some point, it may be possible to find a way to optimize
put_user_pages() in a similar way to the batching that release_pages()
does; that was part of the plan for this.

But I do see what you mean: in the interim, maybe put_user_pages() should
just be calling release_pages(), how does that change sound?
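
(A minimal sketch of that interim step -- assuming release_pages() really is
safe for gup-pinned pages, which is the open question in this thread; this
is not the posted patch:

void put_user_pages(struct page **pages, unsigned long npages)
{
	/*
	 * While put_user_page() still forwards to put_page(), the batched
	 * variant can simply reuse release_pages() and inherit its pgdat
	 * lru_lock batching.
	 */
	release_pages(pages, npages);
}
)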


thanks,
--
John Hubbard
NVIDIA


Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()

2019-05-23 Thread Jason Gunthorpe
On Thu, May 23, 2019 at 10:28:52AM -0700, Ira Weiny wrote:
> >  
> > @@ -686,8 +686,8 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp 
> > *umem_odp, u64 user_virt,
> >  * ib_umem_odp_map_dma_single_page().
> >  */
> > if (npages - (j + 1) > 0)
> > -   release_pages(&local_page_list[j+1],
> > - npages - (j + 1));
> > +   put_user_pages(&local_page_list[j+1],
> > +  npages - (j + 1));
> 
> I don't know if we discussed this before but it looks like the use of
> release_pages() was not entirely correct (or at least not necessary) here.  So
> I think this is ok.

Oh? John switched it from a put_page() loop to release_pages() here:

commit 75a3e6a3c129cddcc683538d8702c6ef998ec589
Author: John Hubbard 
Date:   Mon Mar 4 11:46:45 2019 -0800

RDMA/umem: minor bug fix in error handling path

1. Bug fix: fix an off by one error in the code that cleans up if it fails
   to dma-map a page, after having done a get_user_pages_remote() on a
   range of pages.

2. Refinement: for that same cleanup code, release_pages() is better than
   put_page() in a loop.


And now we are going back to something called put_user_pages() that
implements the same for loop the above removed?

Seems like we are going in circles?? John?

Jason


Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()

2019-05-23 Thread Ira Weiny
On Thu, May 23, 2019 at 12:25:37AM -0700, john.hubb...@gmail.com wrote:
> From: John Hubbard 
> 
> For infiniband code that retains pages via get_user_pages*(),
> release those pages via the new put_user_page(), or
> put_user_pages*(), instead of put_page().
> 
> This is a tiny part of the second step of fixing the problem described
> in [1]. The steps are:
> 
> 1) Provide put_user_page*() routines, intended to be used
>for releasing pages that were pinned via get_user_pages*().
> 
> 2) Convert all of the call sites for get_user_pages*(), to
>invoke put_user_page*(), instead of put_page(). This involves dozens of
>call sites, and will take some time.
> 
> 3) After (2) is complete, use get_user_pages*() and put_user_page*() to
>implement tracking of these pages. This tracking will be separate from
>the existing struct page refcounting.
> 
> 4) Use the tracking and identification of these pages, to implement
>special handling (especially in writeback paths) when the pages are
>backed by a filesystem. Again, [1] provides details as to why that is
>desirable.
> 
> [1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"
> 
> Cc: Doug Ledford 
> Cc: Jason Gunthorpe 
> Cc: Mike Marciniszyn 
> Cc: Dennis Dalessandro 
> Cc: Christian Benvenuti 
> 
> Reviewed-by: Jan Kara 
> Reviewed-by: Dennis Dalessandro 
> Acked-by: Jason Gunthorpe 
> Tested-by: Ira Weiny 
> Signed-off-by: John Hubbard 
> ---
>  drivers/infiniband/core/umem.c  |  7 ---
>  drivers/infiniband/core/umem_odp.c  | 10 +-
>  drivers/infiniband/hw/hfi1/user_pages.c | 11 ---
>  drivers/infiniband/hw/mthca/mthca_memfree.c |  6 +++---
>  drivers/infiniband/hw/qib/qib_user_pages.c  | 11 ---
>  drivers/infiniband/hw/qib/qib_user_sdma.c   |  6 +++---
>  drivers/infiniband/hw/usnic/usnic_uiom.c|  7 ---
>  7 files changed, 27 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index e7ea819fcb11..673f0d240b3e 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -54,9 +54,10 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int dirty)
>  
>  	for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) {
>  		page = sg_page_iter_page(&sg_iter);
> -		if (!PageDirty(page) && umem->writable && dirty)
> -			set_page_dirty_lock(page);
> -		put_page(page);
> +		if (umem->writable && dirty)
> +			put_user_pages_dirty_lock(&page, 1);
> +		else
> +			put_user_page(page);
>  	}
>  
>  	sg_free_table(&umem->sg_head);
> diff --git a/drivers/infiniband/core/umem_odp.c 
> b/drivers/infiniband/core/umem_odp.c
> index f962b5bbfa40..17e46df3990a 100644
> --- a/drivers/infiniband/core/umem_odp.c
> +++ b/drivers/infiniband/core/umem_odp.c
> @@ -487,7 +487,7 @@ void ib_umem_odp_release(struct ib_umem_odp *umem_odp)
>   * The function returns -EFAULT if the DMA mapping operation fails. It 
> returns
>   * -EAGAIN if a concurrent invalidation prevents us from updating the page.
>   *
> - * The page is released via put_page even if the operation failed. For
> + * The page is released via put_user_page even if the operation failed. For
>   * on-demand pinning, the page is released whenever it isn't stored in the
>   * umem.
>   */
> @@ -536,7 +536,7 @@ static int ib_umem_odp_map_dma_single_page(
>   }
>  
>  out:
> - put_page(page);
> + put_user_page(page);
>  
>   if (remove_existing_mapping) {
>   ib_umem_notifier_start_account(umem_odp);
> @@ -659,7 +659,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp 
> *umem_odp, u64 user_virt,
>   ret = -EFAULT;
>   break;
>   }
> - put_page(local_page_list[j]);
> + put_user_page(local_page_list[j]);
>   continue;
>   }
>  
> @@ -686,8 +686,8 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp 
> *umem_odp, u64 user_virt,
>* ib_umem_odp_map_dma_single_page().
>*/
>   if (npages - (j + 1) > 0)
> - release_pages(&local_page_list[j+1],
> -   npages - (j + 1));
> + put_user_pages(&local_page_list[j+1],
> +npages - (j + 1));

I don't know if we discussed this before but it looks like the use of
release_pages() was not entirely correct (or at least not necessary) here.  So
I think this is ok.

As for testing, I have been running with this patch for a while but I don't
have ODP hardware so that testing would not cover this code path.  So you can
add my:

Reviewed-by: Ira Weiny 


Re: [PATCH 1/1] infiniband/mm: convert put_page() to put_user_page*()

2019-05-23 Thread Jerome Glisse
On Thu, May 23, 2019 at 12:25:37AM -0700, john.hubb...@gmail.com wrote:
> From: John Hubbard 
> 
> For infiniband code that retains pages via get_user_pages*(),
> release those pages via the new put_user_page(), or
> put_user_pages*(), instead of put_page().
> 
> This is a tiny part of the second step of fixing the problem described
> in [1]. The steps are:
> 
> 1) Provide put_user_page*() routines, intended to be used
>for releasing pages that were pinned via get_user_pages*().
> 
> 2) Convert all of the call sites for get_user_pages*(), to
>invoke put_user_page*(), instead of put_page(). This involves dozens of
>call sites, and will take some time.
> 
> 3) After (2) is complete, use get_user_pages*() and put_user_page*() to
>implement tracking of these pages. This tracking will be separate from
>the existing struct page refcounting.
> 
> 4) Use the tracking and identification of these pages, to implement
>special handling (especially in writeback paths) when the pages are
>backed by a filesystem. Again, [1] provides details as to why that is
>desirable.
> 
> [1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"
> 
> Cc: Doug Ledford 
> Cc: Jason Gunthorpe 
> Cc: Mike Marciniszyn 
> Cc: Dennis Dalessandro 
> Cc: Christian Benvenuti 
> 
> Reviewed-by: Jan Kara 
> Reviewed-by: Dennis Dalessandro 
> Acked-by: Jason Gunthorpe 
> Tested-by: Ira Weiny 
> Signed-off-by: John Hubbard 

Reviewed-by: Jérôme Glisse 

By the way, I have a wishlist, see below


> ---
>  drivers/infiniband/core/umem.c  |  7 ---
>  drivers/infiniband/core/umem_odp.c  | 10 +-
>  drivers/infiniband/hw/hfi1/user_pages.c | 11 ---
>  drivers/infiniband/hw/mthca/mthca_memfree.c |  6 +++---
>  drivers/infiniband/hw/qib/qib_user_pages.c  | 11 ---
>  drivers/infiniband/hw/qib/qib_user_sdma.c   |  6 +++---
>  drivers/infiniband/hw/usnic/usnic_uiom.c|  7 ---
>  7 files changed, 27 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index e7ea819fcb11..673f0d240b3e 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -54,9 +54,10 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int dirty)
>  
>  	for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) {
>  		page = sg_page_iter_page(&sg_iter);
> -		if (!PageDirty(page) && umem->writable && dirty)
> -			set_page_dirty_lock(page);
> -		put_page(page);
> +		if (umem->writable && dirty)
> +			put_user_pages_dirty_lock(&page, 1);
> +		else
> +			put_user_page(page);

Can we get a put_user_page_dirty(struct page **pages, bool dirty, npages) ?

It is a common pattern that we might have to conditionally dirty the pages,
and I feel it would look cleaner if we could move the branch into the
put_user_page*() function.
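
Presumably something along these lines -- the signature follows Jérôme's
suggestion, and the body is only a guess built on the helpers this series
introduces, not part of the posted patches:

void put_user_pages_dirty(struct page **pages, bool dirty,
			  unsigned long npages)
{
	/* put_user_pages_dirty_lock() dirties via set_page_dirty_lock() */
	if (dirty)
		put_user_pages_dirty_lock(pages, npages);
	else
		put_user_pages(pages, npages);
}

With that, __ib_umem_release() above could collapse its branch into a single
call: put_user_pages_dirty(&page, umem->writable && dirty, 1).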

Cheers,
Jérôme