Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
On Thu, Mar 17, 2022 at 09:13:50AM +0100, David Hildenbrand wrote: > On 17.03.22 03:54, Alistair Popple wrote: > > Felix Kuehling writes: > > > >> On 2022-03-11 04:16, David Hildenbrand wrote: > >>> On 10.03.22 18:26, Alex Sierra wrote: > DEVICE_COHERENT pages introduce a subtle distinction in the way > "normal" pages can be used by various callers throughout the kernel. > They behave like normal pages for purposes of mapping in CPU page > tables, and for COW. But they do not support LRU lists, NUMA > migration or THP. Therefore we split vm_normal_page into two > functions vm_normal_any_page and vm_normal_lru_page. The latter will > only return pages that can be put on an LRU list and that support > NUMA migration, KSM and THP. > > We also introduced a FOLL_LRU flag that adds the same behaviour to > follow_page and related APIs, to allow callers to specify that they > expect to put pages on an LRU list. > > >>> I still don't see the need for s/vm_normal_page/vm_normal_any_page/. And > >>> as this patch is dominated by that change, I'd suggest (again) to just > >>> drop it as I don't see any value of that renaming. No specifier implies > >>> any. > >> > >> OK. If nobody objects, we can adopts that naming convention. > > > > I'd prefer we avoid the churn too, but I don't think we should make > > vm_normal_page() the equivalent of vm_normal_any_page(). It would mean > > vm_normal_page() would return non-LRU device coherent pages, but to me at > > least > > device coherent pages seem special and not what I'd expect from a function > > with > > "normal" in the name. > > > > So I think it would be better to s/vm_normal_lru_page/vm_normal_page/ and > > keep > > vm_normal_any_page() (or perhaps call it vm_any_page?). This is basically > > what > > the previous incarnation of this feature did: > > > > struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr, > > pte_t pte, bool with_public_device); > > #define vm_normal_page(vma, addr, pte) _vm_normal_page(vma, addr, pte, > > false) > > > > Except we should add: > > > > #define vm_normal_any_page(vma, addr, pte) _vm_normal_page(vma, addr, pte, > > true) > > > > "normal" simply tells us that this is not a special mapping -- IOW, we > want the VM to take a look at the memmap and not treat it like a PFN > map. What we're changing is that we're now also returning non-lru pages. > Fair enough, that's why we introduce vm_normal_lru_page() as a > replacement where we really can only deal with lru pages. > > vm_normal_page vs vm_normal_lru_page is good enough. "lru" further > limits what we get via vm_normal_page, that's even how it's implemented. This naming makes sense to me. Jason
Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
On 17.03.22 03:54, Alistair Popple wrote: > Felix Kuehling writes: > >> On 2022-03-11 04:16, David Hildenbrand wrote: >>> On 10.03.22 18:26, Alex Sierra wrote: DEVICE_COHERENT pages introduce a subtle distinction in the way "normal" pages can be used by various callers throughout the kernel. They behave like normal pages for purposes of mapping in CPU page tables, and for COW. But they do not support LRU lists, NUMA migration or THP. Therefore we split vm_normal_page into two functions vm_normal_any_page and vm_normal_lru_page. The latter will only return pages that can be put on an LRU list and that support NUMA migration, KSM and THP. We also introduced a FOLL_LRU flag that adds the same behaviour to follow_page and related APIs, to allow callers to specify that they expect to put pages on an LRU list. >>> I still don't see the need for s/vm_normal_page/vm_normal_any_page/. And >>> as this patch is dominated by that change, I'd suggest (again) to just >>> drop it as I don't see any value of that renaming. No specifier implies any. >> >> OK. If nobody objects, we can adopts that naming convention. > > I'd prefer we avoid the churn too, but I don't think we should make > vm_normal_page() the equivalent of vm_normal_any_page(). It would mean > vm_normal_page() would return non-LRU device coherent pages, but to me at > least > device coherent pages seem special and not what I'd expect from a function > with > "normal" in the name. > > So I think it would be better to s/vm_normal_lru_page/vm_normal_page/ and keep > vm_normal_any_page() (or perhaps call it vm_any_page?). This is basically what > the previous incarnation of this feature did: > > struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr, > pte_t pte, bool with_public_device); > #define vm_normal_page(vma, addr, pte) _vm_normal_page(vma, addr, pte, false) > > Except we should add: > > #define vm_normal_any_page(vma, addr, pte) _vm_normal_page(vma, addr, pte, > true) > "normal" simply tells us that this is not a special mapping -- IOW, we want the VM to take a look at the memmap and not treat it like a PFN map. What we're changing is that we're now also returning non-lru pages. Fair enough, that's why we introduce vm_normal_lru_page() as a replacement where we really can only deal with lru pages. vm_normal_page vs vm_normal_lru_page is good enough. "lru" further limits what we get via vm_normal_page, that's even how it's implemented. vm_normal_page vs vm_normal_any_page is confusing IMHO. -- Thanks, David / dhildenb
Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
Felix Kuehling writes: > On 2022-03-11 04:16, David Hildenbrand wrote: >> On 10.03.22 18:26, Alex Sierra wrote: >>> DEVICE_COHERENT pages introduce a subtle distinction in the way >>> "normal" pages can be used by various callers throughout the kernel. >>> They behave like normal pages for purposes of mapping in CPU page >>> tables, and for COW. But they do not support LRU lists, NUMA >>> migration or THP. Therefore we split vm_normal_page into two >>> functions vm_normal_any_page and vm_normal_lru_page. The latter will >>> only return pages that can be put on an LRU list and that support >>> NUMA migration, KSM and THP. >>> >>> We also introduced a FOLL_LRU flag that adds the same behaviour to >>> follow_page and related APIs, to allow callers to specify that they >>> expect to put pages on an LRU list. >>> >> I still don't see the need for s/vm_normal_page/vm_normal_any_page/. And >> as this patch is dominated by that change, I'd suggest (again) to just >> drop it as I don't see any value of that renaming. No specifier implies any. > > OK. If nobody objects, we can adopts that naming convention. I'd prefer we avoid the churn too, but I don't think we should make vm_normal_page() the equivalent of vm_normal_any_page(). It would mean vm_normal_page() would return non-LRU device coherent pages, but to me at least device coherent pages seem special and not what I'd expect from a function with "normal" in the name. So I think it would be better to s/vm_normal_lru_page/vm_normal_page/ and keep vm_normal_any_page() (or perhaps call it vm_any_page?). This is basically what the previous incarnation of this feature did: struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr, pte_t pte, bool with_public_device); #define vm_normal_page(vma, addr, pte) _vm_normal_page(vma, addr, pte, false) Except we should add: #define vm_normal_any_page(vma, addr, pte) _vm_normal_page(vma, addr, pte, true) >> The general idea of this change LGTM. >> >> >> I wonder how this interacts with the actual DEVICE_COHERENT coherent >> series. Is this a preparation? Should it be part of the DEVICE_COHERENT >> series? > > Yes, it should be part of that series. Alex developed it on top of the series > for now. But I think eventually it would need to be spliced into it. Agreed, this needs to go at the start of the DEVICE_COHERENT series. Thanks. Alistair > Patch1 would need to go somewhere before the other DEVICE_COHERENT patches > (with > minor modifications). Patch 2 could be squashed into "tools: add hmm gup test > for long term pinned device pages" or go next to it. Patch 3 doesn't have a > direct dependency on device-coherent pages. It only mentions them in comments. > > >> >> IOW, should this patch start with >> >> "With DEVICE_COHERENT, we'll soon have vm_normal_pages() return >> device-managed anonymous pages that are not LRU pages. Although they >> behave like normal pages for purposes of mapping in CPU page, and for >> COW, they do not support LRU lists, NUMA migration or THP. [...]" > > Yes, that makes sense. > > Regards, > Felix > > >> >> But then, I'm confused by patch 2 and 3, because it feels more like we'd >> already have DEVICE_COHERENT then ("hmm_is_coherent_type"). >> >>
Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
Felix Kuehling writes: > Am 2022-03-10 um 14:25 schrieb Matthew Wilcox: >> On Thu, Mar 10, 2022 at 11:26:31AM -0600, Alex Sierra wrote: >>> @@ -606,7 +606,7 @@ static void print_bad_pte(struct vm_area_struct *vma, >>> unsigned long addr, >>>* PFNMAP mappings in order to support COWable mappings. >>>* >>>*/ >>> -struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, >>> +struct page *vm_normal_any_page(struct vm_area_struct *vma, unsigned long >>> addr, >>> pte_t pte) >>> { >>> unsigned long pfn = pte_pfn(pte); >>> @@ -620,8 +620,6 @@ struct page *vm_normal_page(struct vm_area_struct *vma, >>> unsigned long addr, >>> return NULL; >>> if (is_zero_pfn(pfn)) >>> return NULL; >>> - if (pte_devmap(pte)) >>> - return NULL; >>> print_bad_pte(vma, addr, pte, NULL); >>> return NULL; >> ... what? >> >> Haven't you just made it so that a devmap page always prints a bad PTE >> message, and then returns NULL anyway? > > Yeah, that was stupid. :/ I think the long-term goal was to get rid of > pte_devmap. But for now, as long as we have pte_special with pte_devmap, > we'll need a special case to handle that like a normal page. > > I only see the PFN_DEV|PFN_MAP flags set in a few places: > drivers/dax/device.c, > drivers/nvdimm/pmem.c, fs/fuse/virtio_fs.c. I guess we need to test at least > one > of them for this patch series to make sure we're not breaking them. > > >> >> Surely this should be: >> >> if (pte_devmap(pte)) >> -return NULL; >> +return pfn_to_page(pfn); >> >> or maybe >> >> +goto check_pfn; >> >> But I don't know about that highest_memmap_pfn check. > > Looks to me like it should work. highest_memmap_pfn gets updated in > memremap_pages -> pagemap_range -> move_pfn_range_to_zone -> > memmap_init_range. FWIW the previous version of this feature which was removed in 25b2995a35b6 ("mm: remove MEMORY_DEVICE_PUBLIC support") had a similar comparison with highest_memmap_pfn: if (likely(pfn <= highest_memmap_pfn)) { struct page *page = pfn_to_page(pfn); if (is_device_public_page(page)) { if (with_public_device) return page; return NULL; } } > Regards, > Felix > > >> >>> @@ -661,6 +659,22 @@ struct page *vm_normal_page(struct vm_area_struct >>> *vma, unsigned long addr, >>> return pfn_to_page(pfn); >>> } >>> +/* >>> + * vm_normal_lru_page -- This function gets the "struct page" associated >>> + * with a pte only for page cache and anon page. These pages are LRU >>> handled. >>> + */ >>> +struct page *vm_normal_lru_page(struct vm_area_struct *vma, unsigned long >>> addr, >>> + pte_t pte) >> It seems a shame to add a new function without proper kernel-doc. >>
Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
On 2022-03-11 04:16, David Hildenbrand wrote: On 10.03.22 18:26, Alex Sierra wrote: DEVICE_COHERENT pages introduce a subtle distinction in the way "normal" pages can be used by various callers throughout the kernel. They behave like normal pages for purposes of mapping in CPU page tables, and for COW. But they do not support LRU lists, NUMA migration or THP. Therefore we split vm_normal_page into two functions vm_normal_any_page and vm_normal_lru_page. The latter will only return pages that can be put on an LRU list and that support NUMA migration, KSM and THP. We also introduced a FOLL_LRU flag that adds the same behaviour to follow_page and related APIs, to allow callers to specify that they expect to put pages on an LRU list. I still don't see the need for s/vm_normal_page/vm_normal_any_page/. And as this patch is dominated by that change, I'd suggest (again) to just drop it as I don't see any value of that renaming. No specifier implies any. OK. If nobody objects, we can adopts that naming convention. The general idea of this change LGTM. I wonder how this interacts with the actual DEVICE_COHERENT coherent series. Is this a preparation? Should it be part of the DEVICE_COHERENT series? Yes, it should be part of that series. Alex developed it on top of the series for now. But I think eventually it would need to be spliced into it. Patch1 would need to go somewhere before the other DEVICE_COHERENT patches (with minor modifications). Patch 2 could be squashed into "tools: add hmm gup test for long term pinned device pages" or go next to it. Patch 3 doesn't have a direct dependency on device-coherent pages. It only mentions them in comments. IOW, should this patch start with "With DEVICE_COHERENT, we'll soon have vm_normal_pages() return device-managed anonymous pages that are not LRU pages. Although they behave like normal pages for purposes of mapping in CPU page, and for COW, they do not support LRU lists, NUMA migration or THP. [...]" Yes, that makes sense. Regards, Felix But then, I'm confused by patch 2 and 3, because it feels more like we'd already have DEVICE_COHERENT then ("hmm_is_coherent_type").
Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
On 10.03.22 18:26, Alex Sierra wrote: > DEVICE_COHERENT pages introduce a subtle distinction in the way > "normal" pages can be used by various callers throughout the kernel. > They behave like normal pages for purposes of mapping in CPU page > tables, and for COW. But they do not support LRU lists, NUMA > migration or THP. Therefore we split vm_normal_page into two > functions vm_normal_any_page and vm_normal_lru_page. The latter will > only return pages that can be put on an LRU list and that support > NUMA migration, KSM and THP. > > We also introduced a FOLL_LRU flag that adds the same behaviour to > follow_page and related APIs, to allow callers to specify that they > expect to put pages on an LRU list. > I still don't see the need for s/vm_normal_page/vm_normal_any_page/. And as this patch is dominated by that change, I'd suggest (again) to just drop it as I don't see any value of that renaming. No specifier implies any. The general idea of this change LGTM. I wonder how this interacts with the actual DEVICE_COHERENT coherent series. Is this a preparation? Should it be part of the DEVICE_COHERENT series? IOW, should this patch start with "With DEVICE_COHERENT, we'll soon have vm_normal_pages() return device-managed anonymous pages that are not LRU pages. Although they behave like normal pages for purposes of mapping in CPU page, and for COW, they do not support LRU lists, NUMA migration or THP. [...]" But then, I'm confused by patch 2 and 3, because it feels more like we'd already have DEVICE_COHERENT then ("hmm_is_coherent_type"). -- Thanks, David / dhildenb
Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
Am 2022-03-10 um 14:25 schrieb Matthew Wilcox: On Thu, Mar 10, 2022 at 11:26:31AM -0600, Alex Sierra wrote: @@ -606,7 +606,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr, * PFNMAP mappings in order to support COWable mappings. * */ -struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, +struct page *vm_normal_any_page(struct vm_area_struct *vma, unsigned long addr, pte_t pte) { unsigned long pfn = pte_pfn(pte); @@ -620,8 +620,6 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, return NULL; if (is_zero_pfn(pfn)) return NULL; - if (pte_devmap(pte)) - return NULL; print_bad_pte(vma, addr, pte, NULL); return NULL; ... what? Haven't you just made it so that a devmap page always prints a bad PTE message, and then returns NULL anyway? Yeah, that was stupid. :/ I think the long-term goal was to get rid of pte_devmap. But for now, as long as we have pte_special with pte_devmap, we'll need a special case to handle that like a normal page. I only see the PFN_DEV|PFN_MAP flags set in a few places: drivers/dax/device.c, drivers/nvdimm/pmem.c, fs/fuse/virtio_fs.c. I guess we need to test at least one of them for this patch series to make sure we're not breaking them. Surely this should be: if (pte_devmap(pte)) - return NULL; + return pfn_to_page(pfn); or maybe + goto check_pfn; But I don't know about that highest_memmap_pfn check. Looks to me like it should work. highest_memmap_pfn gets updated in memremap_pages -> pagemap_range -> move_pfn_range_to_zone -> memmap_init_range. Regards, Felix @@ -661,6 +659,22 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, return pfn_to_page(pfn); } +/* + * vm_normal_lru_page -- This function gets the "struct page" associated + * with a pte only for page cache and anon page. These pages are LRU handled. + */ +struct page *vm_normal_lru_page(struct vm_area_struct *vma, unsigned long addr, + pte_t pte) It seems a shame to add a new function without proper kernel-doc.
Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
On Thu, Mar 10, 2022 at 11:26:31AM -0600, Alex Sierra wrote: > @@ -606,7 +606,7 @@ static void print_bad_pte(struct vm_area_struct *vma, > unsigned long addr, > * PFNMAP mappings in order to support COWable mappings. > * > */ > -struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, > +struct page *vm_normal_any_page(struct vm_area_struct *vma, unsigned long > addr, > pte_t pte) > { > unsigned long pfn = pte_pfn(pte); > @@ -620,8 +620,6 @@ struct page *vm_normal_page(struct vm_area_struct *vma, > unsigned long addr, > return NULL; > if (is_zero_pfn(pfn)) > return NULL; > - if (pte_devmap(pte)) > - return NULL; > > print_bad_pte(vma, addr, pte, NULL); > return NULL; ... what? Haven't you just made it so that a devmap page always prints a bad PTE message, and then returns NULL anyway? Surely this should be: if (pte_devmap(pte)) - return NULL; + return pfn_to_page(pfn); or maybe + goto check_pfn; But I don't know about that highest_memmap_pfn check. > @@ -661,6 +659,22 @@ struct page *vm_normal_page(struct vm_area_struct *vma, > unsigned long addr, > return pfn_to_page(pfn); > } > > +/* > + * vm_normal_lru_page -- This function gets the "struct page" associated > + * with a pte only for page cache and anon page. These pages are LRU handled. > + */ > +struct page *vm_normal_lru_page(struct vm_area_struct *vma, unsigned long > addr, > + pte_t pte) It seems a shame to add a new function without proper kernel-doc.
[PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
DEVICE_COHERENT pages introduce a subtle distinction in the way "normal" pages can be used by various callers throughout the kernel. They behave like normal pages for purposes of mapping in CPU page tables, and for COW. But they do not support LRU lists, NUMA migration or THP. Therefore we split vm_normal_page into two functions vm_normal_any_page and vm_normal_lru_page. The latter will only return pages that can be put on an LRU list and that support NUMA migration, KSM and THP. We also introduced a FOLL_LRU flag that adds the same behaviour to follow_page and related APIs, to allow callers to specify that they expect to put pages on an LRU list. Signed-off-by: Alex Sierra Acked-by: Felix Kuehling --- fs/proc/task_mmu.c | 12 ++-- include/linux/mm.h | 11 +++ mm/gup.c| 10 ++ mm/hmm.c| 2 +- mm/huge_memory.c| 2 +- mm/khugepaged.c | 8 mm/ksm.c| 4 ++-- mm/madvise.c| 4 ++-- mm/memcontrol.c | 2 +- mm/memory.c | 38 ++ mm/mempolicy.c | 4 ++-- mm/migrate.c| 2 +- mm/migrate_device.c | 2 +- mm/mlock.c | 6 +++--- mm/mprotect.c | 2 +- 15 files changed, 64 insertions(+), 45 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index f46060eb91b5..cada3e702693 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -528,7 +528,7 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr, bool migration = false; if (pte_present(*pte)) { - page = vm_normal_page(vma, addr, *pte); + page = vm_normal_any_page(vma, addr, *pte); } else if (is_swap_pte(*pte)) { swp_entry_t swpent = pte_to_swp_entry(*pte); @@ -723,7 +723,7 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask, struct page *page = NULL; if (pte_present(*pte)) { - page = vm_normal_page(vma, addr, *pte); + page = vm_normal_any_page(vma, addr, *pte); } else if (is_swap_pte(*pte)) { swp_entry_t swpent = pte_to_swp_entry(*pte); @@ -1077,7 +1077,7 @@ static inline bool pte_is_pinned(struct vm_area_struct *vma, unsigned long addr, return false; if (likely(!test_bit(MMF_HAS_PINNED, >vm_mm->flags))) return false; - page = vm_normal_page(vma, addr, pte); + page = vm_normal_any_page(vma, addr, pte); if (!page) return false; return page_maybe_dma_pinned(page); @@ -1190,7 +1190,7 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr, if (!pte_present(ptent)) continue; - page = vm_normal_page(vma, addr, ptent); + page = vm_normal_any_page(vma, addr, ptent); if (!page) continue; @@ -1402,7 +1402,7 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm, if (pm->show_pfn) frame = pte_pfn(pte); flags |= PM_PRESENT; - page = vm_normal_page(vma, addr, pte); + page = vm_normal_any_page(vma, addr, pte); if (pte_soft_dirty(pte)) flags |= PM_SOFT_DIRTY; if (pte_uffd_wp(pte)) @@ -1784,7 +1784,7 @@ static struct page *can_gather_numa_stats(pte_t pte, struct vm_area_struct *vma, if (!pte_present(pte)) return NULL; - page = vm_normal_page(vma, addr, pte); + page = vm_normal_lru_page(vma, addr, pte); if (!page) return NULL; diff --git a/include/linux/mm.h b/include/linux/mm.h index d507c32724c0..46b0bb43cef3 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -593,8 +593,8 @@ struct vm_operations_struct { unsigned long addr); #endif /* -* Called by vm_normal_page() for special PTEs to find the -* page for @addr. This is useful if the default behavior +* Called by vm_normal_*_page() for special PTEs to find the +* page for @addr. This is useful if the default behavior * (using pte_page()) would not find the correct page. */ struct page *(*find_special_page)(struct vm_area_struct *vma, @@ -1781,7 +1781,9 @@ static inline bool can_do_mlock(void) { return false; } extern int user_shm_lock(size_t, struct ucounts *); extern void user_shm_unlock(size_t, struct ucounts *); -struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, +struct page *vm_normal_any_page(struct vm_area_struct *vma, unsigned long addr, +pte_t pte); +struct page *vm_normal_lru_page(struct vm_area_struct *vma, unsigned long addr, pte_t pte); struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,