Re: [PATCH v2 3/3] bio: convert get_user_pages_fast() --> pin_user_pages_fast()

2020-08-29 Thread John Hubbard

On 8/29/20 8:02 AM, Christoph Hellwig wrote:

-   size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, );
+   size = iov_iter_pin_user_pages(iter, pages, LONG_MAX, nr_pages, 
);


This is really a comment to the previous patch, but I only spotted it
here:  I think the right name is iov_iter_pin_pages, as bvec, kvec and
pipe aren't usually user pages.  Same as iov_iter_get_pages vs
get_user_pages.  Same for the _alloc variant.



Yes, it is clearly misnamed now! Will fix.


+ * here on.  It will run one unpin_user_page() against each page
+ * and will run one bio_put() against the BIO.


Nit: the ant and the will still fit on the previous line.



Sorry about that, *usually* my text editor does the Right Thing for
those, I must have interfered with the natural flow of things. :)


thanks,
--
John Hubbard
NVIDIA


Re: [PATCH v2 3/3] bio: convert get_user_pages_fast() --> pin_user_pages_fast()

2020-08-29 Thread Christoph Hellwig
> - size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, );
> + size = iov_iter_pin_user_pages(iter, pages, LONG_MAX, nr_pages, 
> );

This is really a comment to the previous patch, but I only spotted it
here:  I think the right name is iov_iter_pin_pages, as bvec, kvec and
pipe aren't usually user pages.  Same as iov_iter_get_pages vs
get_user_pages.  Same for the _alloc variant.

> + * here on.  It will run one unpin_user_page() against each page
> + * and will run one bio_put() against the BIO.

Nit: the ant and the will still fit on the previous line.


[PATCH v2 3/3] bio: convert get_user_pages_fast() --> pin_user_pages_fast()

2020-08-29 Thread John Hubbard
Change generic block/bio Direct IO routines, to acquire FOLL_PIN user
pages via the recently added routines:

iov_iter_pin_user_pages()
iov_iter_pin_user_pages_alloc()
pin_user_page()

This effectively converts several file systems (ext4, for example) that
use the common Direct IO routines.

Change the corresponding page release calls from put_page() to
unpin_user_page().

Change bio_release_pages() to handle FOLL_PIN pages. In fact, that
is now the *only* type of pages it handles now.

Design notes


Quite a few approaches have been considered over the years. This one is
inspired by Christoph Hellwig's July, 2019 observation that there are
only 5 ITER_ types, and we can simplify handling of them for Direct IO
[1]. Accordingly, this patch implements the following pseudocode:

Direct IO behavior:

ITER_IOVEC:
pin_user_pages_fast();
break;

ITER_PIPE:
for each page:
 pin_user_page();
break;

ITER_KVEC:// already elevated page refcount, leave alone
ITER_BVEC:// already elevated page refcount, leave alone
ITER_DISCARD: // discard
return -EFAULT or -ENVALID;

...which works for callers that already have sorted out which case they
are in. Such as, Direct IO in the block/bio layers.

Now, this does leave ITER_KVEC and ITER_BVEC unconverted, but on the
other hand, it's not clear that these are actually affected in the real
world, by the get_user_pages()+filesystem interaction problems of [2].
If it turns out to matter, then those can be handled too, but it's just
more refactoring and surgery to do so.

Page acquisition: The iov_iter_get_pages*() routines
above are at just the right level in the call stack: the callers already
know which system to use, and so it's a small change to just drop in the
replacement routines. And it's a fan-in/fan-out point: block/bio call
sites for Direct IO funnel their page acquisitions through the
iov_iter_get_pages*() routines, and there are many other callers of
those. And we can't convert all of the callers at once--too many
subsystems are involved, and it would be a too large and too risky
patch.

Page release: there are already separate release routines: put_page()
vs. unpin_user_page(), so it's already done there.

[1] https://lore.kernel.org/kvm/20190724061750.ga19...@infradead.org/

[2] "Explicit pinning of user-space pages":
https://lwn.net/Articles/807108/

Signed-off-by: John Hubbard 
---
 block/bio.c  | 24 
 block/blk-map.c  |  6 +++---
 fs/direct-io.c   | 28 ++--
 fs/iomap/direct-io.c |  2 +-
 4 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index a9931f23d933..f54e9414e6d9 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -955,7 +955,7 @@ void bio_release_pages(struct bio *bio, bool mark_dirty)
bio_for_each_segment_all(bvec, bio, iter_all) {
if (mark_dirty && !PageCompound(bvec->bv_page))
set_page_dirty_lock(bvec->bv_page);
-   put_page(bvec->bv_page);
+   unpin_user_page(bvec->bv_page);
}
 }
 EXPORT_SYMBOL_GPL(bio_release_pages);
@@ -986,9 +986,9 @@ static int __bio_iov_bvec_add_pages(struct bio *bio, struct 
iov_iter *iter)
  * @iter: iov iterator describing the region to be mapped
  *
  * Pins pages from *iter and appends them to @bio's bvec array. The
- * pages will have to be released using put_page() when done.
- * For multi-segment *iter, this function only adds pages from the
- * next non-empty segment of the iov iterator.
+ * pages will have to be released using put_page() or unpin_user_page() when
+ * done. For multi-segment *iter, this function only adds pages from the next
+ * non-empty segment of the iov iterator.
  */
 static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 {
@@ -1009,7 +1009,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, 
struct iov_iter *iter)
BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
 
-   size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, );
+   size = iov_iter_pin_user_pages(iter, pages, LONG_MAX, nr_pages, 
);
if (unlikely(size <= 0))
return size ? size : -EFAULT;
 
@@ -1020,7 +1020,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, 
struct iov_iter *iter)
 
if (__bio_try_merge_page(bio, page, len, offset, _page)) {
if (same_page)
-   put_page(page);
+   unpin_user_page(page);
} else {
if (WARN_ON_ONCE(bio_full(bio, len)))
 return -EINVAL;
@@ -1056,7 +1056,7 @@ static int __bio_iov_append_get_pages(struct bio *bio, 
struct iov_iter *iter)
BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
 
-