Change generic block/bio Direct IO routines, to acquire FOLL_PIN user
pages via the recently added routines:
iov_iter_pin_user_pages()
iov_iter_pin_user_pages_alloc()
pin_user_page()
This effectively converts several file systems (ext4, for example) that
use the common Direct IO routines.
Change the corresponding page release calls from put_page() to
unpin_user_page().
Change bio_release_pages() to handle FOLL_PIN pages. In fact, that
is now the *only* type of pages it handles now.
Design notes
Quite a few approaches have been considered over the years. This one is
inspired by Christoph Hellwig's July, 2019 observation that there are
only 5 ITER_ types, and we can simplify handling of them for Direct IO
[1]. Accordingly, this patch implements the following pseudocode:
Direct IO behavior:
ITER_IOVEC:
pin_user_pages_fast();
break;
ITER_PIPE:
for each page:
pin_user_page();
break;
ITER_KVEC:// already elevated page refcount, leave alone
ITER_BVEC:// already elevated page refcount, leave alone
ITER_DISCARD: // discard
return -EFAULT or -ENVALID;
...which works for callers that already have sorted out which case they
are in. Such as, Direct IO in the block/bio layers.
Now, this does leave ITER_KVEC and ITER_BVEC unconverted, but on the
other hand, it's not clear that these are actually affected in the real
world, by the get_user_pages()+filesystem interaction problems of [2].
If it turns out to matter, then those can be handled too, but it's just
more refactoring and surgery to do so.
Page acquisition: The iov_iter_get_pages*() routines
above are at just the right level in the call stack: the callers already
know which system to use, and so it's a small change to just drop in the
replacement routines. And it's a fan-in/fan-out point: block/bio call
sites for Direct IO funnel their page acquisitions through the
iov_iter_get_pages*() routines, and there are many other callers of
those. And we can't convert all of the callers at once--too many
subsystems are involved, and it would be a too large and too risky
patch.
Page release: there are already separate release routines: put_page()
vs. unpin_user_page(), so it's already done there.
[1] https://lore.kernel.org/kvm/20190724061750.ga19...@infradead.org/
[2] "Explicit pinning of user-space pages":
https://lwn.net/Articles/807108/
Signed-off-by: John Hubbard
---
block/bio.c | 24
block/blk-map.c | 6 +++---
fs/direct-io.c | 28 ++--
fs/iomap/direct-io.c | 2 +-
4 files changed, 30 insertions(+), 30 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index a9931f23d933..f54e9414e6d9 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -955,7 +955,7 @@ void bio_release_pages(struct bio *bio, bool mark_dirty)
bio_for_each_segment_all(bvec, bio, iter_all) {
if (mark_dirty && !PageCompound(bvec->bv_page))
set_page_dirty_lock(bvec->bv_page);
- put_page(bvec->bv_page);
+ unpin_user_page(bvec->bv_page);
}
}
EXPORT_SYMBOL_GPL(bio_release_pages);
@@ -986,9 +986,9 @@ static int __bio_iov_bvec_add_pages(struct bio *bio, struct
iov_iter *iter)
* @iter: iov iterator describing the region to be mapped
*
* Pins pages from *iter and appends them to @bio's bvec array. The
- * pages will have to be released using put_page() when done.
- * For multi-segment *iter, this function only adds pages from the
- * next non-empty segment of the iov iterator.
+ * pages will have to be released using put_page() or unpin_user_page() when
+ * done. For multi-segment *iter, this function only adds pages from the next
+ * non-empty segment of the iov iterator.
*/
static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
{
@@ -1009,7 +1009,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio,
struct iov_iter *iter)
BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
- size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, );
+ size = iov_iter_pin_user_pages(iter, pages, LONG_MAX, nr_pages,
);
if (unlikely(size <= 0))
return size ? size : -EFAULT;
@@ -1020,7 +1020,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio,
struct iov_iter *iter)
if (__bio_try_merge_page(bio, page, len, offset, _page)) {
if (same_page)
- put_page(page);
+ unpin_user_page(page);
} else {
if (WARN_ON_ONCE(bio_full(bio, len)))
return -EINVAL;
@@ -1056,7 +1056,7 @@ static int __bio_iov_append_get_pages(struct bio *bio,
struct iov_iter *iter)
BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
-