Re: [PATCH 2/2] mm/hmm: make full use of walk_page_range()

2019-07-24 Thread Ralph Campbell



On 7/24/19 4:53 AM, Jason Gunthorpe wrote:
> On Wed, Jul 24, 2019 at 08:51:46AM +0200, Christoph Hellwig wrote:
> > On Tue, Jul 23, 2019 at 04:30:16PM -0700, Ralph Campbell wrote:
> > > hmm_range_snapshot() and hmm_range_fault() both call find_vma() and
> > > walk_page_range() in a loop. This is unnecessary duplication since
> > > walk_page_range() calls find_vma() in a loop already.
> > > Simplify hmm_range_snapshot() and hmm_range_fault() by defining a
> > > walk_test() callback function to filter unhandled vmas.
> > 
> > I like the approach a lot!
> > 
> > But we really need to sort out the duplication between hmm_range_fault
> > and hmm_range_snapshot first, as they are basically the same code.  I
> > have patches here:
> > 
> > http://git.infradead.org/users/hch/misc.git/commitdiff/a34ccd30ee8a8a3111d9e91711c12901ed7dea74
> > 
> > http://git.infradead.org/users/hch/misc.git/commitdiff/81f442ebac7170815af7770a1efa9c4ab662137e
> 
> Yeah, that is a straightforward improvement, maybe Ralph should grab
> these two as part of his series?

Sure, no problem.
I'll add them in v2 when I fix the other issues in the series.

> > That being said we don't really have any users for the snapshot mode
> > or non-blocking faults, and I don't see any in the immediate pipeline
> > either.
> 
> If this code were production ready I'd use it in ODP right away.
> 
> When we first create an ODP MR we'd want to snapshot to pre-load the
> NIC tables with something other than the page-fault marker, but not
> fault anything in.
> 
> This would be a big performance improvement for ODP.
> 
> Jason



Re: [PATCH 2/2] mm/hmm: make full use of walk_page_range()

2019-07-24 Thread Jason Gunthorpe
On Wed, Jul 24, 2019 at 08:51:46AM +0200, Christoph Hellwig wrote:
> On Tue, Jul 23, 2019 at 04:30:16PM -0700, Ralph Campbell wrote:
> > hmm_range_snapshot() and hmm_range_fault() both call find_vma() and
> > walk_page_range() in a loop. This is unnecessary duplication since
> > walk_page_range() calls find_vma() in a loop already.
> > Simplify hmm_range_snapshot() and hmm_range_fault() by defining a
> > walk_test() callback function to filter unhandled vmas.
> 
> I like the approach a lot!
> 
> But we really need to sort out the duplication between hmm_range_fault
> and hmm_range_snapshot first, as they are basically the same code.  I
> have patches here:
> 
> http://git.infradead.org/users/hch/misc.git/commitdiff/a34ccd30ee8a8a3111d9e91711c12901ed7dea74
> 
> http://git.infradead.org/users/hch/misc.git/commitdiff/81f442ebac7170815af7770a1efa9c4ab662137e

Yeah, that is a straightforward improvement, maybe Ralph should grab
these two as part of his series?

> That being said we don't really have any users for the snapshot mode
> or non-blocking faults, and I don't see any in the immediate pipeline
> either.

If this code were production ready I'd use it in ODP right away.

When we first create an ODP MR we'd want to snapshot to pre-load the
NIC tables with something other than the page-fault marker, but not
fault anything in.

This would be a big performance improvement for ODP.

Jason
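
To make the idea concrete, here is a minimal sketch (not code from the
thread) of such a snapshot-based pre-load against the HMM API of that
era, assuming the range has already been set up with
hmm_range_register(); odp_nic_load_table() is a hypothetical driver
helper:

static int odp_mr_preload(struct hmm_range *range, struct mm_struct *mm)
{
	long ret;

	do {
		/* Wait out any in-flight invalidation before walking. */
		if (!hmm_range_wait_until_valid(range,
						HMM_RANGE_DEFAULT_TIMEOUT))
			return -EBUSY;

		down_read(&mm->mmap_sem);
		ret = hmm_range_snapshot(range);	/* fills range->pfns[] */
		up_read(&mm->mmap_sem);

		/* -EBUSY: the range was invalidated mid-walk, so retry. */
	} while (ret == -EBUSY);

	if (ret < 0)
		return ret;

	/* Hypothetical helper: program the NIC tables from range->pfns[]. */
	return odp_nic_load_table(range);
}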


Re: [PATCH 2/2] mm/hmm: make full use of walk_page_range()

2019-07-24 Thread Christoph Hellwig
On Tue, Jul 23, 2019 at 04:30:16PM -0700, Ralph Campbell wrote:
> hmm_range_snapshot() and hmm_range_fault() both call find_vma() and
> walk_page_range() in a loop. This is unnecessary duplication since
> walk_page_range() calls find_vma() in a loop already.
> Simplify hmm_range_snapshot() and hmm_range_fault() by defining a
> walk_test() callback function to filter unhandled vmas.

I like the approach a lot!

But we really need to sort out the duplication between hmm_range_fault
and hmm_range_snapshot first, as they are basically the same code.  I
have patches here:

http://git.infradead.org/users/hch/misc.git/commitdiff/a34ccd30ee8a8a3111d9e91711c12901ed7dea74

http://git.infradead.org/users/hch/misc.git/commitdiff/81f442ebac7170815af7770a1efa9c4ab662137e

That being said we don't really have any users for the snapshot mode
or non-blocking faults, and I don't see any in the immediate pipeline
either.  It might actually be a better idea to just kill that stuff off
for now until we have a user, as code without users is by definition
untested and will just bitrot and break.

> + const unsigned long device_vma = VM_IO | VM_PFNMAP | VM_MIXEDMAP;
> + struct hmm_vma_walk *hmm_vma_walk = walk->private;
> + struct hmm_range *range = hmm_vma_walk->range;
> + struct vm_area_struct *vma = walk->vma;
> +
> + /* If range is no longer valid, force retry. */
> + if (!range->valid)
> + return -EBUSY;
> +
> + if (vma->vm_flags & device_vma)

Can we just kill off this odd device_vma variable?

if (vma->vm_flags & (VM_IO | VM_PFNMAP | VM_MIXEDMAP))

and maybe add a comment on why we are skipping them (because they
don't have struct page backing I guess..)
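
Put together, the suggested test then reads (a sketch of the above, not
code from the thread):

	/*
	 * Skip vmas without struct page backing: HMM cannot snapshot
	 * or fault device and raw-PFN mappings.
	 */
	if (vma->vm_flags & (VM_IO | VM_PFNMAP | VM_MIXEDMAP))
		return -EFAULT;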


[PATCH 2/2] mm/hmm: make full use of walk_page_range()

2019-07-23 Thread Ralph Campbell
hmm_range_snapshot() and hmm_range_fault() both call find_vma() and
walk_page_range() in a loop. This is unnecessary duplication since
walk_page_range() calls find_vma() in a loop already.
Simplify hmm_range_snapshot() and hmm_range_fault() by defining a
walk_test() callback function to filter unhandled vmas.

Signed-off-by: Ralph Campbell 
Cc: "Jérôme Glisse" 
Cc: Jason Gunthorpe 
Cc: Christoph Hellwig 
---
 mm/hmm.c | 197 ---
 1 file changed, 70 insertions(+), 127 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 8271f110c243..218557451070 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -839,13 +839,40 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
 #endif
 }
 
-static void hmm_pfns_clear(struct hmm_range *range,
-  uint64_t *pfns,
-  unsigned long addr,
-  unsigned long end)
+static int hmm_vma_walk_test(unsigned long start,
+unsigned long end,
+struct mm_walk *walk)
 {
-   for (; addr < end; addr += PAGE_SIZE, pfns++)
-   *pfns = range->values[HMM_PFN_NONE];
+   const unsigned long device_vma = VM_IO | VM_PFNMAP | VM_MIXEDMAP;
+   struct hmm_vma_walk *hmm_vma_walk = walk->private;
+   struct hmm_range *range = hmm_vma_walk->range;
+   struct vm_area_struct *vma = walk->vma;
+
+   /* If range is no longer valid, force retry. */
+   if (!range->valid)
+   return -EBUSY;
+
+   if (vma->vm_flags & device_vma)
+   return -EFAULT;
+
+   if (is_vm_hugetlb_page(vma)) {
+   if (huge_page_shift(hstate_vma(vma)) != range->page_shift &&
+   range->page_shift != PAGE_SHIFT)
+   return -EINVAL;
+   } else {
+   if (range->page_shift != PAGE_SHIFT)
+   return -EINVAL;
+   }
+
+   /*
+* If vma does not allow read access, then assume that it does not
+* allow write access, either. HMM does not support architectures
+* that allow write without read.
+*/
+   if (!(vma->vm_flags & VM_READ))
+   return -EPERM;
+
+   return 0;
 }
 
 /*
@@ -949,65 +976,28 @@ EXPORT_SYMBOL(hmm_range_unregister);
  */
 long hmm_range_snapshot(struct hmm_range *range)
 {
-   const unsigned long device_vma = VM_IO | VM_PFNMAP | VM_MIXEDMAP;
-   unsigned long start = range->start, end;
-   struct hmm_vma_walk hmm_vma_walk;
+   unsigned long start = range->start;
+   struct hmm_vma_walk hmm_vma_walk = {};
struct hmm *hmm = range->hmm;
-   struct vm_area_struct *vma;
-   struct mm_walk mm_walk;
+   struct mm_walk mm_walk = {};
+   int ret;
 
	lockdep_assert_held(&hmm->mm->mmap_sem);
-   do {
-   /* If range is no longer valid force retry. */
-   if (!range->valid)
-   return -EBUSY;
 
-   vma = find_vma(hmm->mm, start);
-   if (vma == NULL || (vma->vm_flags & device_vma))
-   return -EFAULT;
-
-   if (is_vm_hugetlb_page(vma)) {
-   if (huge_page_shift(hstate_vma(vma)) !=
-   range->page_shift &&
-   range->page_shift != PAGE_SHIFT)
-   return -EINVAL;
-   } else {
-   if (range->page_shift != PAGE_SHIFT)
-   return -EINVAL;
-   }
+   hmm_vma_walk.range = range;
+   hmm_vma_walk.last = start;
+   mm_walk.private = &hmm_vma_walk;
 
-   if (!(vma->vm_flags & VM_READ)) {
-   /*
-* If vma do not allow read access, then assume that it
-* does not allow write access, either. HMM does not
-* support architecture that allow write without read.
-*/
-   hmm_pfns_clear(range, range->pfns,
-   range->start, range->end);
-   return -EPERM;
-   }
+   mm_walk.mm = hmm->mm;
+   mm_walk.pud_entry = hmm_vma_walk_pud;
+   mm_walk.pmd_entry = hmm_vma_walk_pmd;
+   mm_walk.pte_hole = hmm_vma_walk_hole;
+   mm_walk.hugetlb_entry = hmm_vma_walk_hugetlb_entry;
+   mm_walk.test_walk = hmm_vma_walk_test;
 
-   range->vma = vma;
-   hmm_vma_walk.pgmap = NULL;
-   hmm_vma_walk.last = start;
-   hmm_vma_walk.fault = false;
-   hmm_vma_walk.range = range;
-   mm_walk.private = &hmm_vma_walk;
-   end = min(range->end, vma->vm_end);
-
-   mm_walk.vma = vma;
-   mm_walk.mm = vma->vm_mm;
-   mm_walk.pte_entry = NULL;
-   mm_walk.test_walk = NULL;
-   mm_walk.hugetlb_entry =
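
For reference, the ->test_walk contract this patch relies on,
paraphrased from mm/pagewalk.c of that era (an illustrative sketch, not
part of the patch): a negative errno aborts walk_page_range() and
propagates to the caller, which is how the -EBUSY above forces a retry;
a positive return skips the current vma; zero walks its page tables
normally.

/* Paraphrased sketch of how walk_page_range() consumes ->test_walk. */
static int walk_one_vma(unsigned long start, unsigned long end,
			struct mm_walk *walk)
{
	int err = 0;

	if (walk->test_walk)
		err = walk->test_walk(start, end, walk);
	if (err > 0)
		return 0;	/* skip this vma, keep walking */
	if (err < 0)
		return err;	/* abort: -EBUSY, -EFAULT, -EPERM, ... */

	return __walk_page_range(start, end, walk);	/* do the walk */
}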