[PATCH v2 2/3] mm/gup: introduce FOLL_PIN flag for get_user_pages()

2019-08-20 Thread John Hubbard
As explained in the newly added documentation for FOLL_PIN and
FOLL_LONGTERM, in every case where vaddr_pin_pages() is required,
FOLL_PIN must be set. That reason, plus a desire to keep FOLL_PIN
an internal (to get_user_pages() and follow_page()) detail, is why
vaddr_pin_pages() sets FOLL_PIN.

FOLL_LONGTERM, on the other hand, in only set in *some* cases, but
not all. For that reason, this patch moves the setting of FOLL_LONGTERM
out to the caller.

Also add fairly extensive documentation of the meaning and use
of both FOLL_PIN and FOLL_LONGTERM.

Thanks to Jan Kara and Vlastimil Babka for explaining the 4 cases
in this documentation. (I've reworded it and expanded on it slightly.)

The motivation behind moving away from "bare" get_user_pages() calls
is described in more detail in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Vlastimil Babka 
Cc: Jan Kara 
Cc: Michal Hocko 
Cc: Ira Weiny 
Signed-off-by: John Hubbard 
---
 drivers/infiniband/core/umem.c |  1 +
 include/linux/mm.h | 56 ++
 mm/gup.c   |  2 +-
 3 files changed, 52 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index e69eecb0023f..d84f1bfb8d21 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -300,6 +300,7 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, 
unsigned long addr,
 
while (npages) {
down_read(>mmap_sem);
+   gup_flags |= FOLL_LONGTERM;
ret = vaddr_pin_pages(cur_base,
 min_t(unsigned long, npages,
   PAGE_SIZE / sizeof (struct page *)),
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bc675e94ddf8..6e7de424bf5e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2644,6 +2644,8 @@ static inline vm_fault_t vmf_error(int err)
 struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 unsigned int foll_flags);
 
+/* Flags for follow_page(), get_user_pages ("GUP"), and vaddr_pin_pages(): */
+
 #define FOLL_WRITE 0x01/* check pte is writable */
 #define FOLL_TOUCH 0x02/* mark page accessed */
 #define FOLL_GET   0x04/* do get_page on page */
@@ -2663,13 +2665,15 @@ struct page *follow_page(struct vm_area_struct *vma, 
unsigned long address,
 #define FOLL_ANON  0x8000  /* don't do file mappings */
 #define FOLL_LONGTERM  0x1 /* mapping lifetime is indefinite: see below */
 #define FOLL_SPLIT_PMD 0x2 /* split huge pmd before returning */
+#define FOLL_PIN   0x4 /* pages must be released via put_user_page() */
 
 /*
- * NOTE on FOLL_LONGTERM:
+ * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
+ * other. Here is what they mean, and how to use them:
  *
  * FOLL_LONGTERM indicates that the page will be held for an indefinite time
- * period _often_ under userspace control.  This is contrasted with
- * iov_iter_get_pages() where usages which are transient.
+ * period _often_ under userspace control.  This is in contrast to
+ * iov_iter_get_pages(), where usages which are transient.
  *
  * FIXME: For pages which are part of a filesystem, mappings are subject to the
  * lifetime enforced by the filesystem and we need guarantees that longterm
@@ -2684,11 +2688,51 @@ struct page *follow_page(struct vm_area_struct *vma, 
unsigned long address,
  * Currently only get_user_pages() and get_user_pages_fast() support this flag
  * and calls to get_user_pages_[un]locked are specifically not allowed.  This
  * is due to an incompatibility with the FS DAX check and
- * FAULT_FLAG_ALLOW_RETRY
+ * FAULT_FLAG_ALLOW_RETRY.
  *
- * In the CMA case: longterm pins in a CMA region would unnecessarily fragment
- * that region.  And so CMA attempts to migrate the page before pinning when
+ * In the CMA case: long term pins in a CMA region would unnecessarily fragment
+ * that region.  And so, CMA attempts to migrate the page before pinning, when
  * FOLL_LONGTERM is specified.
+ *
+ * FOLL_PIN indicates that a special kind of tracking (not just 
page->_refcount,
+ * but an additional pin counting system) will be invoked. This is intended for
+ * anything that gets a page reference and then touches page data (for example,
+ * Direct IO). This lets the filesystem know that some non-file-system entity 
is
+ * potentially changing the pages' data. FOLL_PIN pages must be released,
+ * ultimately, by a call to put_user_page(). Typically that will be via one of
+ * the vaddr_unpin_pages() variants.
+ *
+ * FIXME: note that this special tracking is not in place yet. However, the
+ * pages should still be released by put_user_page().
+ *
+ * When and where to use each flag:
+ *
+ * CASE 1: Direct IO (DIO). There are GUP references to pages that are serving
+ * as DIO buffers. These buffers are needed for a relatively 

Re: [RFC PATCH v2 2/3] mm/gup: introduce FOLL_PIN flag for get_user_pages()

2019-08-19 Thread John Hubbard

On 8/16/19 7:36 PM, John Hubbard wrote:

On 8/16/19 7:24 PM, jhubb...@nvidia.com wrote:

From: John Hubbard 
DKIM-Signature: v a a-sha256; claxed/relaxed; d idia.com; s;
t66008674; bhMai0va6k/z2enpQJ4Nfvbj5WByFxGAO1JwdIBbXioh 
PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer:
 In-Reply-To:References:MIME-Version:X-NVConfidentiality:
 Content-Transfer-Encoding:Content-Type;
bÖUDSde9XF/IsNteBaYOBWeKiHhWmeU9ekUJNvCviHssBDCtw0T+M/2TlEPEzomIT
 fGXzIQNlGN6MXFbaBoyBmF/zjCu02TmTNExbVJ3/5N6PTyOuJFCx9ZN1/5gXsB11m1
 xAHIWE+VOZs4qqDeHDBqKZq+FaxQHNvGz0j6lyVBA70TfseNoZqZZrSil8uvaKJwKd
 TQ1ht+AGWbw9p610JmaPb4u6o/eV6Ns8Sl3EVnjWWu94T6ISNIaWCiC6wQQF6L1YCH
 G5Pjn+0rEjhk6XG4TyLudi5lWp3IVBHd8+WlWlnl+bvLCC55RUAjPJLn7LaVyVdh0F
 nLHwm3bN2Jotg


I cannot readily explain the above email glitch, but I did just now switch
back to mailgw.nvidia.com for this patchset, in order to get the nice behavior
of having "From:" really be my native NVIDIA email address. That's very nice,
but if the glitches happen again, I'll switch back to using gmail for
git-send-email.

Sorry about the weirdness. It does still let you apply the patch, I
just now checked on that.



Hi Ira, could you please let me know if you'd like me to repost this patch, or
the entire patchset, or if you're able to deal with it as-is? As it stands, the
DKIM-Signature cruft above needs to be manually removed, either from the patch, 
or
from the commit log after applying the patch.

Also, as noted in the email thread involving Bharath and sgi-gru [1], I'm
currently planning on branching from your tree, and continuing the misc
call site conversions from there. And then just adapting to whatever API
changes are made to vaddr_*() functions. And the biovec call site conversions 
should
be based on that as well.

[1] https://lore.kernel.org/r/0c2ad29b-934c-ec30-66c3-b153baf1f...@nvidia.com

thanks,
--
John Hubbard
NVIDIA



Re: [RFC PATCH v2 2/3] mm/gup: introduce FOLL_PIN flag for get_user_pages()

2019-08-16 Thread John Hubbard
On 8/16/19 7:24 PM, jhubb...@nvidia.com wrote:
> From: John Hubbard 
> DKIM-Signature: v a a-sha256; claxed/relaxed; d idia.com; s;
>   t66008674; bhMai0va6k/z2enpQJ4Nfvbj5WByFxGAO1JwdIBbXioh 
> PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer:
>In-Reply-To:References:MIME-Version:X-NVConfidentiality:
>Content-Transfer-Encoding:Content-Type;
>   bÖUDSde9XF/IsNteBaYOBWeKiHhWmeU9ekUJNvCviHssBDCtw0T+M/2TlEPEzomIT
>fGXzIQNlGN6MXFbaBoyBmF/zjCu02TmTNExbVJ3/5N6PTyOuJFCx9ZN1/5gXsB11m1
>xAHIWE+VOZs4qqDeHDBqKZq+FaxQHNvGz0j6lyVBA70TfseNoZqZZrSil8uvaKJwKd
>TQ1ht+AGWbw9p610JmaPb4u6o/eV6Ns8Sl3EVnjWWu94T6ISNIaWCiC6wQQF6L1YCH
>G5Pjn+0rEjhk6XG4TyLudi5lWp3IVBHd8+WlWlnl+bvLCC55RUAjPJLn7LaVyVdh0F
>nLHwm3bN2Jotg

I cannot readily explain the above email glitch, but I did just now switch
back to mailgw.nvidia.com for this patchset, in order to get the nice behavior
of having "From:" really be my native NVIDIA email address. That's very nice,
but if the glitches happen again, I'll switch back to using gmail for 
git-send-email.

Sorry about the weirdness. It does still let you apply the patch, I
just now checked on that.

thanks,
-- 
John Hubbard
NVIDIA



[RFC PATCH v2 2/3] mm/gup: introduce FOLL_PIN flag for get_user_pages()

2019-08-16 Thread jhubbard
From: John Hubbard 
DKIM-Signature: v a*
- * In the CMA case: longterm pins in a CMA region would unnecessarily fragment
- * that region.  And so CMA attempts to migrate the page before pinning when
+ * In the CMA case: long term pins in a CMA region would unnecessarily fragment
+ * that region.  And so, CMA attempts to migrate the page before pinning, when
  * FOLL_LONGTERM is specified.
+ *
+ * FOLL_PIN indicates that a special kind of tracking (not just 
page->_refcount,
+ * but an additional pin counting system) will be invoked. This is intended for
+ * anything that gets a page reference and then touches page data (for example,
+ * Direct IO). This lets the filesystem know that some non-file-system entity 
is
+ * potentially changing the pages' data. FOLL_PIN pages must be released,
+ * ultimately, by a call to put_user_page(). Typically that will be via one of
+ * the vaddr_unpin_pages() variants.
+ *
+ * FIXME: note that this special tracking is not in place yet. However, the
+ * pages should still be released by put_user_page().
+ *
+ * When and where to use each flag:
+ *
+ * CASE 1: Direct IO (DIO). There are GUP references to pages that are serving
+ * as DIO buffers. These buffers are needed for a relatively short time (so 
they
+ * are not "long term"). No special synchronization with page_mkclean() or
+ * munmap() is provided. Therefore, flags to set at the call site are:
+ *
+ * FOLL_PIN
+ *
+ * CASE 2: RDMA. There are GUP references to pages that are serving as DMA
+ * buffers. These buffers are needed for a long time ("long term"). No special
+ * synchronization with page_mkclean() or munmap() is provided. Therefore, 
flags
+ * to set at the call site are:
+ *
+ * FOLL_PIN | FOLL_LONGTERM
+ *
+ * There is also a special case when the pages are DAX pages: in addition to 
the
+ * above flags, the caller needs a file lease. This is provided via the struct
+ * vaddr_pin argument to vaddr_pin_pages().
+ *
+ * CASE 3: ODP (Mellanox/Infiniband On Demand Paging: the hardware supports
+ * replayable page faulting). There are GUP references to pages serving as DMA
+ * buffers. For ODP, MMU notifiers are used to synchronize with page_mkclean()
+ * and munmap(). Therefore, normal GUP calls are sufficient, so neither flag
+ * needs to be set.
+ *
+ * CASE 4: pinning for struct page manipulation only. Here, normal GUP calls 
are
+ * sufficient, so neither flag needs to be set.
  */
 
 static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags)
-- 
2.22.1