Re: [PATCH 2/2] mm/hmm: make full use of walk_page_range()
On 7/24/19 4:53 AM, Jason Gunthorpe wrote:
> On Wed, Jul 24, 2019 at 08:51:46AM +0200, Christoph Hellwig wrote:
>> On Tue, Jul 23, 2019 at 04:30:16PM -0700, Ralph Campbell wrote:
>>> hmm_range_snapshot() and hmm_range_fault() both call find_vma() and
>>> walk_page_range() in a loop. This is unnecessary duplication since
>>> walk_page_range() calls find_vma() in a loop already.
>>> Simplify hmm_range_snapshot() and hmm_range_fault() by defining a
>>> walk_test() callback function to filter unhandled vmas.
>>
>> I like the approach a lot!
>>
>> But we really need to sort out the duplication between hmm_range_fault
>> and hmm_range_snapshot first, as they are basically the same code. I
>> have patches here:
>>
>> http://git.infradead.org/users/hch/misc.git/commitdiff/a34ccd30ee8a8a3111d9e91711c12901ed7dea74
>>
>> http://git.infradead.org/users/hch/misc.git/commitdiff/81f442ebac7170815af7770a1efa9c4ab662137e
>
> Yeah, that is a straightforward improvement, maybe Ralph should grab
> these two as part of his series?

Sure, no problem. I'll add them in v2 when I fix the other issues in
the series.

>> That being said we don't really have any users for the snapshot mode
>> or non-blocking faults, and I don't see any in the immediate pipeline
>> either.
>
> If this code was production ready I'd use it in ODP right away. When
> we first create an ODP MR we'd want to snapshot to pre-load the NIC
> tables with something other than page fault, but not fault anything.
>
> This would be a big performance improvement for ODP.
>
> Jason
Re: [PATCH 2/2] mm/hmm: make full use of walk_page_range()
On Wed, Jul 24, 2019 at 08:51:46AM +0200, Christoph Hellwig wrote:
> On Tue, Jul 23, 2019 at 04:30:16PM -0700, Ralph Campbell wrote:
> > hmm_range_snapshot() and hmm_range_fault() both call find_vma() and
> > walk_page_range() in a loop. This is unnecessary duplication since
> > walk_page_range() calls find_vma() in a loop already.
> > Simplify hmm_range_snapshot() and hmm_range_fault() by defining a
> > walk_test() callback function to filter unhandled vmas.
> 
> I like the approach a lot!
> 
> But we really need to sort out the duplication between hmm_range_fault
> and hmm_range_snapshot first, as they are basically the same code. I
> have patches here:
> 
> http://git.infradead.org/users/hch/misc.git/commitdiff/a34ccd30ee8a8a3111d9e91711c12901ed7dea74
> 
> http://git.infradead.org/users/hch/misc.git/commitdiff/81f442ebac7170815af7770a1efa9c4ab662137e

Yeah, that is a straightforward improvement, maybe Ralph should grab
these two as part of his series?

> That being said we don't really have any users for the snapshot mode
> or non-blocking faults, and I don't see any in the immediate pipeline
> either.

If this code was production ready I'd use it in ODP right away. When we
first create an ODP MR we'd want to snapshot to pre-load the NIC tables
with something other than page fault, but not fault anything.

This would be a big performance improvement for ODP.

Jason
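A rough sketch of the flow Jason describes (not code from this thread):
snapshot the CPU page tables when the ODP MR is created and pre-load the
NIC page tables without faulting anything in. odp_load_nic_ptes() is a
hypothetical driver helper, and the sketch assumes the 5.3-era behavior
where hmm_range_snapshot() returns the number of valid entries filled
into range->pfns[] or a negative errno:

	static long odp_mr_preload(struct mm_struct *mm, struct hmm_range *range)
	{
		long ret;

		down_read(&mm->mmap_sem);
		ret = hmm_range_snapshot(range);	/* read-only walk: never faults */
		up_read(&mm->mmap_sem);

		if (ret == -EBUSY)
			return -EAGAIN;	/* range invalidated mid-walk; caller retries */
		if (ret < 0)
			return ret;

		/* The first ret entries of range->pfns[] are valid; push them to HW. */
		return odp_load_nic_ptes(range->pfns, ret);
	}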
Re: [PATCH 2/2] mm/hmm: make full use of walk_page_range()
On Tue, Jul 23, 2019 at 04:30:16PM -0700, Ralph Campbell wrote:
> hmm_range_snapshot() and hmm_range_fault() both call find_vma() and
> walk_page_range() in a loop. This is unnecessary duplication since
> walk_page_range() calls find_vma() in a loop already.
> Simplify hmm_range_snapshot() and hmm_range_fault() by defining a
> walk_test() callback function to filter unhandled vmas.

I like the approach a lot!

But we really need to sort out the duplication between hmm_range_fault
and hmm_range_snapshot first, as they are basically the same code. I
have patches here:

http://git.infradead.org/users/hch/misc.git/commitdiff/a34ccd30ee8a8a3111d9e91711c12901ed7dea74

http://git.infradead.org/users/hch/misc.git/commitdiff/81f442ebac7170815af7770a1efa9c4ab662137e

That being said we don't really have any users for the snapshot mode or
non-blocking faults, and I don't see any in the immediate pipeline
either. It might actually be a better idea to just kill that stuff off
for now until we have a user, as code without users is by definition
untested and will just bitrot and break.

> +	const unsigned long device_vma = VM_IO | VM_PFNMAP | VM_MIXEDMAP;
> +	struct hmm_vma_walk *hmm_vma_walk = walk->private;
> +	struct hmm_range *range = hmm_vma_walk->range;
> +	struct vm_area_struct *vma = walk->vma;
> +
> +	/* If range is no longer valid, force retry. */
> +	if (!range->valid)
> +		return -EBUSY;
> +
> +	if (vma->vm_flags & device_vma)

Can we just kill off this odd device_vma variable?

	if (vma->vm_flags & (VM_IO | VM_PFNMAP | VM_MIXEDMAP))

and maybe add a comment on why we are skipping them (because they don't
have struct page backing, I guess..)
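Applied to hmm_vma_walk_test(), the suggestion might read as follows (a
sketch of Christoph's proposal, not a tested patch; the comment wording
is an assumption based on his "no struct page backing" guess):

	/*
	 * vmas with VM_IO, VM_PFNMAP or VM_MIXEDMAP have no struct page
	 * backing their mappings, so HMM cannot snapshot or fault them.
	 */
	if (vma->vm_flags & (VM_IO | VM_PFNMAP | VM_MIXEDMAP))
		return -EFAULT;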
[PATCH 2/2] mm/hmm: make full use of walk_page_range()
hmm_range_snapshot() and hmm_range_fault() both call find_vma() and
walk_page_range() in a loop. This is unnecessary duplication since
walk_page_range() calls find_vma() in a loop already.

Simplify hmm_range_snapshot() and hmm_range_fault() by defining a
walk_test() callback function to filter unhandled vmas.

Signed-off-by: Ralph Campbell
Cc: "Jérôme Glisse"
Cc: Jason Gunthorpe
Cc: Christoph Hellwig
---
 mm/hmm.c | 197 ++++++++++++++++++++++---------------------------------------
 1 file changed, 70 insertions(+), 127 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 8271f110c243..218557451070 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -839,13 +839,40 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
 #endif
 }
 
-static void hmm_pfns_clear(struct hmm_range *range,
-			   uint64_t *pfns,
-			   unsigned long addr,
-			   unsigned long end)
+static int hmm_vma_walk_test(unsigned long start,
+			     unsigned long end,
+			     struct mm_walk *walk)
 {
-	for (; addr < end; addr += PAGE_SIZE, pfns++)
-		*pfns = range->values[HMM_PFN_NONE];
+	const unsigned long device_vma = VM_IO | VM_PFNMAP | VM_MIXEDMAP;
+	struct hmm_vma_walk *hmm_vma_walk = walk->private;
+	struct hmm_range *range = hmm_vma_walk->range;
+	struct vm_area_struct *vma = walk->vma;
+
+	/* If range is no longer valid, force retry. */
+	if (!range->valid)
+		return -EBUSY;
+
+	if (vma->vm_flags & device_vma)
+		return -EFAULT;
+
+	if (is_vm_hugetlb_page(vma)) {
+		if (huge_page_shift(hstate_vma(vma)) != range->page_shift &&
+		    range->page_shift != PAGE_SHIFT)
+			return -EINVAL;
+	} else {
+		if (range->page_shift != PAGE_SHIFT)
+			return -EINVAL;
+	}
+
+	/*
+	 * If vma does not allow read access, then assume that it does not
+	 * allow write access, either. HMM does not support architectures
+	 * that allow write without read.
+	 */
+	if (!(vma->vm_flags & VM_READ))
+		return -EPERM;
+
+	return 0;
 }
 
 /*
@@ -949,65 +976,28 @@ EXPORT_SYMBOL(hmm_range_unregister);
  */
 long hmm_range_snapshot(struct hmm_range *range)
 {
-	const unsigned long device_vma = VM_IO | VM_PFNMAP | VM_MIXEDMAP;
-	unsigned long start = range->start, end;
-	struct hmm_vma_walk hmm_vma_walk;
+	unsigned long start = range->start;
+	struct hmm_vma_walk hmm_vma_walk = {};
 	struct hmm *hmm = range->hmm;
-	struct vm_area_struct *vma;
-	struct mm_walk mm_walk;
+	struct mm_walk mm_walk = {};
+	int ret;
 
 	lockdep_assert_held(&hmm->mm->mmap_sem);
 
-	do {
-		/* If range is no longer valid force retry. */
-		if (!range->valid)
-			return -EBUSY;
-
-		vma = find_vma(hmm->mm, start);
-		if (vma == NULL || (vma->vm_flags & device_vma))
-			return -EFAULT;
-
-		if (is_vm_hugetlb_page(vma)) {
-			if (huge_page_shift(hstate_vma(vma)) !=
-			    range->page_shift &&
-			    range->page_shift != PAGE_SHIFT)
-				return -EINVAL;
-		} else {
-			if (range->page_shift != PAGE_SHIFT)
-				return -EINVAL;
-		}
+	hmm_vma_walk.range = range;
+	hmm_vma_walk.last = start;
+	mm_walk.private = &hmm_vma_walk;
 
-		if (!(vma->vm_flags & VM_READ)) {
-			/*
-			 * If vma do not allow read access, then assume that it
-			 * does not allow write access, either. HMM does not
-			 * support architecture that allow write without read.
-			 */
-			hmm_pfns_clear(range, range->pfns,
-				       range->start, range->end);
-			return -EPERM;
-		}
+	mm_walk.mm = hmm->mm;
+	mm_walk.pud_entry = hmm_vma_walk_pud;
+	mm_walk.pmd_entry = hmm_vma_walk_pmd;
+	mm_walk.pte_hole = hmm_vma_walk_hole;
+	mm_walk.hugetlb_entry = hmm_vma_walk_hugetlb_entry;
+	mm_walk.test_walk = hmm_vma_walk_test;
 
-		range->vma = vma;
-		hmm_vma_walk.pgmap = NULL;
-		hmm_vma_walk.last = start;
-		hmm_vma_walk.fault = false;
-		hmm_vma_walk.range = range;
-		mm_walk.private = &hmm_vma_walk;
-		end = min(range->end, vma->vm_end);
-
-		mm_walk.vma = vma;
-		mm_walk.mm = vma->vm_mm;
-		mm_walk.pte_entry = NULL;
-		mm_walk.test_walk = NULL;
-		mm_walk.hugetlb_entry =
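The commit message's premise is that walk_page_range() already loops
over vmas with find_vma() internally and consults the ->test_walk hook
before descending into each one, so a caller needs a single call plus a
filter callback instead of its own find_vma() loop. A minimal sketch of
that pattern, assuming the pre-5.4 mm_walk API this patch is written
against (my_test_walk, my_pmd_entry and my_walk are illustrative names,
not kernel functions):

	#include <linux/mm.h>

	/* Return 0 to walk this vma, 1 to skip it, a negative errno to abort. */
	static int my_test_walk(unsigned long start, unsigned long end,
				struct mm_walk *walk)
	{
		return (walk->vma->vm_flags & VM_PFNMAP) ? -EFAULT : 0;
	}

	/* Trivial entry callback: count pmd ranges visited (illustration only). */
	static int my_pmd_entry(pmd_t *pmd, unsigned long addr,
				unsigned long next, struct mm_walk *walk)
	{
		(*(unsigned long *)walk->private)++;
		return 0;
	}

	static int my_walk(struct mm_struct *mm, unsigned long start,
			   unsigned long end)
	{
		unsigned long nr_pmds = 0;		/* stand-in private state */
		struct mm_walk mm_walk = {
			.mm		= mm,
			.pmd_entry	= my_pmd_entry,
			.test_walk	= my_test_walk,
			.private	= &nr_pmds,
		};
		int ret;

		down_read(&mm->mmap_sem);
		/* One call covers every vma in [start, end); no caller-side loop. */
		ret = walk_page_range(start, end, &mm_walk);
		up_read(&mm->mmap_sem);
		return ret;
	}

This is exactly the shape hmm_range_snapshot() takes after the patch:
the per-vma checks move into hmm_vma_walk_test(), and the do/while over
find_vma() disappears.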