Re: [PATCH v2] infiniband/mm: convert put_page() to put_user_page*()

2019-05-27 Thread Jason Gunthorpe
On Fri, May 24, 2019 at 06:45:22PM -0700, john.hubb...@gmail.com wrote:
> From: John Hubbard 
> 
> For infiniband code that retains pages via get_user_pages*(),
> release those pages via the new put_user_page(), or
> put_user_pages*(), instead of put_page()
> 
> This is a tiny part of the second step of fixing the problem described
> in [1]. The steps are:
> 
> 1) Provide put_user_page*() routines, intended to be used
>    for releasing pages that were pinned via get_user_pages*().
> 
> 2) Convert all of the call sites for get_user_pages*(), to
>    invoke put_user_page*(), instead of put_page(). This involves dozens of
>    call sites, and will take some time.
> 
> 3) After (2) is complete, use get_user_pages*() and put_user_page*() to
>    implement tracking of these pages. This tracking will be separate from
>    the existing struct page refcounting.
> 
> 4) Use the tracking and identification of these pages, to implement
>    special handling (especially in writeback paths) when the pages are
>    backed by a filesystem. Again, [1] provides details as to why that is
>    desirable.
> 
> [1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"
> 
> Cc: Doug Ledford 
> Cc: Jason Gunthorpe 
> Cc: Mike Marciniszyn 
> Cc: Dennis Dalessandro 
> Cc: Christian Benvenuti 
> 
> Reviewed-by: Jan Kara 
> Reviewed-by: Dennis Dalessandro 
> Reviewed-by: Ira Weiny 
> Reviewed-by: Jérôme Glisse 
> Acked-by: Jason Gunthorpe 
> Tested-by: Ira Weiny 
> Signed-off-by: John Hubbard 
> ---
>  drivers/infiniband/core/umem.c  |  7 ---
>  drivers/infiniband/core/umem_odp.c  | 10 +-
>  drivers/infiniband/hw/hfi1/user_pages.c | 11 ---
>  drivers/infiniband/hw/mthca/mthca_memfree.c |  6 +++---
>  drivers/infiniband/hw/qib/qib_user_pages.c  | 11 ---
>  drivers/infiniband/hw/qib/qib_user_sdma.c   |  6 +++---
>  drivers/infiniband/hw/usnic/usnic_uiom.c|  7 ---
>  7 files changed, 27 insertions(+), 31 deletions(-)

Applied to for-next, thanks

Jason


Re: [PATCH v2] infiniband/mm: convert put_page() to put_user_page*()

2019-05-27 Thread Jason Gunthorpe
On Sun, May 26, 2019 at 04:06:31AM -0700, Matthew Wilcox wrote:
> On Fri, May 24, 2019 at 06:45:22PM -0700, john.hubb...@gmail.com wrote:
> > For infiniband code that retains pages via get_user_pages*(),
> > release those pages via the new put_user_page(), or
> > put_user_pages*(), instead of put_page()
> 
> I have no objection to this particular patch, but ...
> 
> > This is a tiny part of the second step of fixing the problem described
> > in [1]. The steps are:
> > 
> > 1) Provide put_user_page*() routines, intended to be used
> >    for releasing pages that were pinned via get_user_pages*().
> > 
> > 2) Convert all of the call sites for get_user_pages*(), to
> >    invoke put_user_page*(), instead of put_page(). This involves dozens of
> >    call sites, and will take some time.
> > 
> > 3) After (2) is complete, use get_user_pages*() and put_user_page*() to
> >    implement tracking of these pages. This tracking will be separate from
> >    the existing struct page refcounting.
> > 
> > 4) Use the tracking and identification of these pages, to implement
> >    special handling (especially in writeback paths) when the pages are
> >    backed by a filesystem. Again, [1] provides details as to why that is
> >    desirable.
> 
> I thought we agreed at LSFMM that the future is a new get_user_bvec()
> / put_user_bvec().  This is largely going to touch the same places as
> step 2 in your list above.  Is it worth doing step 2?

I think so, as these two conversions can run in parallel. Whichever we
finish first, biovec or put_user_pages, lets John progress to step #3.

Jason


Re: [PATCH v2] infiniband/mm: convert put_page() to put_user_page*()

2019-05-26 Thread Christoph Hellwig
On Sun, May 26, 2019 at 04:06:31AM -0700, Matthew Wilcox wrote:
> I thought we agreed at LSFMM that the future is a new get_user_bvec()
> / put_user_bvec().  This is largely going to touch the same places as
> step 2 in your list above.  Is it worth doing step 2?
> 
> One of the advantages of put_user_bvec() is that it would be quite easy
> to miss a conversion from put_page() to put_user_page(), but it'll be
> a type error to miss a conversion from put_page() to put_user_bvec().

FYI, I've got a prototype for get_user_pages_bvec.  I'll post an RFC
series in a few days.


Re: [PATCH v2] infiniband/mm: convert put_page() to put_user_page*()

2019-05-26 Thread Matthew Wilcox
On Fri, May 24, 2019 at 06:45:22PM -0700, john.hubb...@gmail.com wrote:
> For infiniband code that retains pages via get_user_pages*(),
> release those pages via the new put_user_page(), or
> put_user_pages*(), instead of put_page()

I have no objection to this particular patch, but ...

> This is a tiny part of the second step of fixing the problem described
> in [1]. The steps are:
> 
> 1) Provide put_user_page*() routines, intended to be used
>    for releasing pages that were pinned via get_user_pages*().
> 
> 2) Convert all of the call sites for get_user_pages*(), to
>    invoke put_user_page*(), instead of put_page(). This involves dozens of
>    call sites, and will take some time.
> 
> 3) After (2) is complete, use get_user_pages*() and put_user_page*() to
>    implement tracking of these pages. This tracking will be separate from
>    the existing struct page refcounting.
> 
> 4) Use the tracking and identification of these pages, to implement
>    special handling (especially in writeback paths) when the pages are
>    backed by a filesystem. Again, [1] provides details as to why that is
>    desirable.

I thought we agreed at LSFMM that the future is a new get_user_bvec()
/ put_user_bvec().  This is largely going to touch the same places as
step 2 in your list above.  Is it worth doing step 2?

One of the advantages of put_user_bvec() is that it would be quite easy
to miss a conversion from put_page() to put_user_page(), but it'll be
a type error to miss a conversion from put_page() to put_user_bvec().
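
To illustrate the type-error argument, here is a minimal userspace sketch. Everything in it is an assumption made for illustration: struct mock_page, struct mock_user_bvec and the mock_*() helpers are hypothetical stand-ins, and the real get_user_bvec() / put_user_bvec() interface had not been posted at this point in the thread, so its signatures may look quite different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct page. */
struct mock_page {
	int refcount;
};

/* Hypothetical: a pinned user page wrapped in its own distinct type. */
struct mock_user_bvec {
	struct mock_page *page;
	unsigned int len;
	unsigned int offset;
};

/* Stand-in for put_page(): only accepts a struct mock_page pointer. */
static void mock_put_page(struct mock_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/* Hypothetical stand-in for get_user_bvec(): pin one page into a bvec. */
static void mock_get_user_bvec(struct mock_user_bvec *bv,
			       struct mock_page *page, unsigned int len)
{
	page->refcount++;
	bv->page = page;
	bv->len = len;
	bv->offset = 0;
}

/* Hypothetical stand-in for put_user_bvec(): release what the bvec pinned. */
static void mock_put_user_bvec(struct mock_user_bvec *bv)
{
	mock_put_page(bv->page);
	bv->page = NULL;
	bv->len = 0;
}

/*
 * An unconverted call site such as
 *	mock_put_page(&bv);
 * passes a struct mock_user_bvec pointer where a struct mock_page pointer
 * is expected, so the compiler diagnoses it rather than accepting it
 * silently.
 */
```

By contrast, put_page() and put_user_page() both take a struct page pointer, so a call site that was never converted still compiles cleanly. The sketch does not stop a caller from reaching into bv->page by hand; it only shows why a forgotten conversion would be flagged at build time.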


[PATCH v2] infiniband/mm: convert put_page() to put_user_page*()

2019-05-24 Thread john . hubbard
From: John Hubbard 

For infiniband code that retains pages via get_user_pages*(),
release those pages via the new put_user_page(), or
put_user_pages*(), instead of put_page()

This is a tiny part of the second step of fixing the problem described
in [1]. The steps are:

1) Provide put_user_page*() routines, intended to be used
   for releasing pages that were pinned via get_user_pages*().

2) Convert all of the call sites for get_user_pages*(), to
   invoke put_user_page*(), instead of put_page(). This involves dozens of
   call sites, and will take some time.

3) After (2) is complete, use get_user_pages*() and put_user_page*() to
   implement tracking of these pages. This tracking will be separate from
   the existing struct page refcounting.

4) Use the tracking and identification of these pages, to implement
   special handling (especially in writeback paths) when the pages are
   backed by a filesystem. Again, [1] provides details as to why that is
   desirable.

[1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"
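
As a rough illustration of steps 1 to 3, the following userspace sketch models why the release path needs its own entry point. It is not kernel code: struct mock_page, the gup_pins counter and all mock_*() helpers are hypothetical names invented here, and the tracking scheme eventually chosen for step 3 may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct page. */
struct mock_page {
	int refcount;	/* ordinary reference count */
	int gup_pins;	/* assumed separate pin tracking (step 3) */
	bool dirty;
};

/* Stand-in for get_user_pages*(): take a reference and record a pin. */
static void mock_get_user_page(struct mock_page *page)
{
	page->refcount++;
	page->gup_pins++;
}

/* Stand-in for put_page(): drops a reference, knows nothing about pins. */
static void mock_put_page(struct mock_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/* Stand-in for put_user_page(): drop the pin, then the reference. */
static void mock_put_user_page(struct mock_page *page)
{
	assert(page->gup_pins > 0);
	page->gup_pins--;
	mock_put_page(page);
}

/* Stand-in for put_user_pages_dirty_lock(): mark dirty, then release. */
static void mock_put_user_pages_dirty(struct mock_page **pages, size_t npages)
{
	for (size_t i = 0; i < npages; i++) {
		pages[i]->dirty = true;
		mock_put_user_page(pages[i]);
	}
}

/* A converted call site: choose the release helper based on dirtiness. */
static void mock_release(struct mock_page *page, bool writable, bool dirty)
{
	if (writable && dirty)
		mock_put_user_pages_dirty(&page, 1);
	else
		mock_put_user_page(page);
}
```

In this model, a call site that still used mock_put_page() for a pinned page would leave gup_pins stuck above zero, which is why every call site has to be converted in step 2 before the tracking of step 3 can be relied on.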

Cc: Doug Ledford 
Cc: Jason Gunthorpe 
Cc: Mike Marciniszyn 
Cc: Dennis Dalessandro 
Cc: Christian Benvenuti 

Reviewed-by: Jan Kara 
Reviewed-by: Dennis Dalessandro 
Reviewed-by: Ira Weiny 
Reviewed-by: Jérôme Glisse 
Acked-by: Jason Gunthorpe 
Tested-by: Ira Weiny 
Signed-off-by: John Hubbard 
---
 drivers/infiniband/core/umem.c  |  7 ---
 drivers/infiniband/core/umem_odp.c  | 10 +-
 drivers/infiniband/hw/hfi1/user_pages.c | 11 ---
 drivers/infiniband/hw/mthca/mthca_memfree.c |  6 +++---
 drivers/infiniband/hw/qib/qib_user_pages.c  | 11 ---
 drivers/infiniband/hw/qib/qib_user_sdma.c   |  6 +++---
 drivers/infiniband/hw/usnic/usnic_uiom.c|  7 ---
 7 files changed, 27 insertions(+), 31 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index e7ea819fcb11..673f0d240b3e 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -54,9 +54,10 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
 
 	for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) {
 		page = sg_page_iter_page(&sg_iter);
-		if (!PageDirty(page) && umem->writable && dirty)
-			set_page_dirty_lock(page);
-		put_page(page);
+		if (umem->writable && dirty)
+			put_user_pages_dirty_lock(&page, 1);
+		else
+			put_user_page(page);
 	}
 
 	sg_free_table(&umem->sg_head);
diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index f962b5bbfa40..17e46df3990a 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -487,7 +487,7 @@ void ib_umem_odp_release(struct ib_umem_odp *umem_odp)
  * The function returns -EFAULT if the DMA mapping operation fails. It returns
  * -EAGAIN if a concurrent invalidation prevents us from updating the page.
  *
- * The page is released via put_page even if the operation failed. For
+ * The page is released via put_user_page even if the operation failed. For
  * on-demand pinning, the page is released whenever it isn't stored in the
  * umem.
  */
@@ -536,7 +536,7 @@ static int ib_umem_odp_map_dma_single_page(
 	}
 
 out:
-	put_page(page);
+	put_user_page(page);
 
 	if (remove_existing_mapping) {
 		ib_umem_notifier_start_account(umem_odp);
@@ -659,7 +659,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
 					ret = -EFAULT;
 					break;
 				}
-				put_page(local_page_list[j]);
+				put_user_page(local_page_list[j]);
 				continue;
 			}
 
@@ -686,8 +686,8 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
 				 * ib_umem_odp_map_dma_single_page().
 				 */
 				if (npages - (j + 1) > 0)
-					release_pages(&local_page_list[j+1],
-						      npages - (j + 1));
+					put_user_pages(&local_page_list[j+1],
+						       npages - (j + 1));
 				break;
 			}
 		}
diff --git a/drivers/infiniband/hw/hfi1/user_pages.c b/drivers/infiniband/hw/hfi1/user_pages.c
index 02eee8eff1db..b89a9b9aef7a 100644
--- a/drivers/infiniband/hw/hfi1/user_pages.c
+++ b/drivers/infiniband/hw/hfi1/user_pages.c
@@ -118,13 +118,10 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, unsigned long vaddr, size_t np
 void hfi1_release_user_pages(struct mm_struct *mm, struct page **p,
 			     size_t npages, bool dirty)
 {
-	size_t i;
-
-   for