[PATCH 04/13] mm: Make HPAGE_PXD_* macros even if !THP

2023-12-18 Thread peterx
From: Peter Xu These macros can be helpful when we plan to merge hugetlb code into generic code. Move them out and define them even if !THP. We actually already defined HPAGE_PMD_NR for other reasons even if !THP. Reorganize these macros. Reviewed-by: Christoph Hellwig Signed-off-by: Peter
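
As a rough illustration of what the HPAGE_PXD_* family expresses, here is a minimal userspace sketch. It is not taken from the patch: the shift values assume an x86-64-like layout with 4K base pages, and pmd_subpage_index() is a hypothetical helper added only to show the mask in use.

```c
#include <assert.h>

/* Assumed geometry (x86-64-like); real values are per-architecture. */
#define PAGE_SHIFT	12
#define PMD_SHIFT	21
#define PUD_SHIFT	30

/* PMD-level huge page geometry, derived only from the shifts above. */
#define HPAGE_PMD_SHIFT	PMD_SHIFT
#define HPAGE_PMD_SIZE	(1UL << HPAGE_PMD_SHIFT)
#define HPAGE_PMD_MASK	(~(HPAGE_PMD_SIZE - 1))
#define HPAGE_PMD_ORDER	(HPAGE_PMD_SHIFT - PAGE_SHIFT)
#define HPAGE_PMD_NR	(1UL << HPAGE_PMD_ORDER)

/* PUD-level huge page geometry, same pattern one level up. */
#define HPAGE_PUD_SHIFT	PUD_SHIFT
#define HPAGE_PUD_SIZE	(1UL << HPAGE_PUD_SHIFT)
#define HPAGE_PUD_MASK	(~(HPAGE_PUD_SIZE - 1))
#define HPAGE_PUD_ORDER	(HPAGE_PUD_SHIFT - PAGE_SHIFT)
#define HPAGE_PUD_NR	(1UL << HPAGE_PUD_ORDER)

/* With the assumed shifts, a PMD leaf covers 512 base pages. */
static_assert(HPAGE_PMD_NR == 512, "unexpected PMD geometry");

/* Hypothetical helper: index of the base page that addr falls on
 * inside its PMD-sized huge page. */
static unsigned long pmd_subpage_index(unsigned long addr)
{
	return (addr & ~HPAGE_PMD_MASK) >> PAGE_SHIFT;
}
```

Nothing in these definitions depends on THP itself, which is presumably why they can be shared with hugetlb code.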

[PATCH 12/13] mm/gup: Handle hugepd for follow_page()

2023-12-19 Thread peterx
From: Peter Xu Hugepd is so far only used on PowerPC, on 4K page size kernels where the hash MMU is used. follow_page_mask() used to leverage hugetlb APIs to access hugepd entries. Teach follow_page_mask() to handle hugepd itself. With the previous refactoring of fast-gup gup_huge_pd(), most of the code can be

[PATCH 13/13] mm/gup: Handle hugetlb in the generic follow_page_mask code

2023-12-19 Thread peterx
From: Peter Xu Now follow_page() is ready to handle hugetlb pages in whatever form, and over all architectures. Switch to the generic code path. Time to retire hugetlb_follow_page_mask(), following the previous retirement of follow_hugetlb_page() in 4849807114b8. There may be a slight

[PATCH 00/13] mm/gup: Unify hugetlb, part 2

2023-12-18 Thread peterx
From: Peter Xu This is v1 of the series. The series removes the hugetlb slow gup path after previous refactoring work [1], so that slow gup now uses the exact same path to handle all kinds of memory including hugetlb. It's based on latest mm-unstable (c13bdc82ada9). RFC->v1 (use old version's

[PATCH 10/13] mm/gup: Handle huge pud for follow_pud_mask()

2023-12-19 Thread peterx
From: Peter Xu Teach follow_pud_mask() to be able to handle normal PUD pages like hugetlb. Rename follow_devmap_pud() to follow_huge_pud() so that it can process either huge devmap or hugetlb. Move it out of TRANSPARENT_HUGEPAGE_PUD and huge_memory.c (which relies on CONFIG_THP). In the

[PATCH 09/13] mm/gup: Cache *pudp in follow_pud_mask()

2023-12-19 Thread peterx
From: Peter Xu Introduce "pud_t pud" in the function, so the code won't dereference *pudp multiple times. This is not only because that looks less straightforward, but also because, if the dereference really happened, it's not clear whether there can be a race that sees different *pudp values if it's being
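
The idea of caching the entry can be sketched in userspace as below. This is an illustration under assumptions, not the patch itself: the pud_t layout, the DEMO_PUD_* bits, pud_read_once() and classify_pud() are all hypothetical, and the volatile load only stands in for the kernel's READ_ONCE(), which the v2 cover letter in this listing says is used to fetch the pud entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t val; } pud_t;

/* Hypothetical entry bits, for illustration only. */
#define DEMO_PUD_PRESENT	(1ULL << 0)
#define DEMO_PUD_LEAF		(1ULL << 7)

enum pud_kind { PUD_KIND_NONE, PUD_KIND_LEAF, PUD_KIND_TABLE };

/* Stand-in for READ_ONCE(*pudp): one load through a volatile pointer. */
static inline pud_t pud_read_once(const pud_t *pudp)
{
	return *(const volatile pud_t *)pudp;
}

static enum pud_kind classify_pud(const pud_t *pudp)
{
	pud_t pud;

	assert(pudp != NULL);

	/* Take one snapshot; every check below sees the same value even
	 * if *pudp is modified concurrently. */
	pud = pud_read_once(pudp);

	if (!(pud.val & DEMO_PUD_PRESENT))
		return PUD_KIND_NONE;
	if (pud.val & DEMO_PUD_LEAF)
		return PUD_KIND_LEAF;
	return PUD_KIND_TABLE;
}
```

Dereferencing pudp separately in each branch would let two checks disagree about the same entry, which is the race the commit message worries about.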

[PATCH 02/13] mm/hugetlb: Declare hugetlbfs_pagecache_present() non-static

2023-12-18 Thread peterx
From: Peter Xu It will be used outside hugetlb.c soon. Signed-off-by: Peter Xu --- include/linux/hugetlb.h | 9 + mm/hugetlb.c| 4 ++-- 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index

[PATCH 03/13] mm: Provide generic pmd_thp_or_huge()

2023-12-18 Thread peterx
From: Peter Xu ARM defines pmd_thp_or_huge(), detecting either a THP or a huge PMD. It can be a useful helper if we want to merge more THP and hugetlb code paths. Make it a generic default implementation that only exists when CONFIG_MMU. An arch can override it by defining its own version. For
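
The "generic default that an arch can override" pattern can be sketched as follows. This is a userspace mock-up under assumptions: the pmd_t layout, the DEMO_PMD_* bits and the stub pmd_huge()/pmd_trans_huge() bodies are hypothetical, and the #ifndef guard is the usual kernel convention rather than a quote from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint64_t val; } pmd_t;

/* Hypothetical flag bits standing in for real arch encodings. */
#define DEMO_PMD_HUGETLB	(1ULL << 0)
#define DEMO_PMD_THP		(1ULL << 1)

static_assert(DEMO_PMD_HUGETLB != DEMO_PMD_THP, "demo bits must differ");

/* Stubs for the two existing predicates. */
static inline bool pmd_huge(pmd_t pmd)
{
	return (pmd.val & DEMO_PMD_HUGETLB) != 0;
}

static inline bool pmd_trans_huge(pmd_t pmd)
{
	return (pmd.val & DEMO_PMD_THP) != 0;
}

/* An arch header included earlier could define pmd_thp_or_huge itself;
 * the generic fallback only applies when it did not. */
#ifndef pmd_thp_or_huge
#define pmd_thp_or_huge(pmd)	(pmd_huge(pmd) || pmd_trans_huge(pmd))
#endif
```

Note that the v3 cover letter in this listing drops this helper in favour of pXd_leaf(), so the sketch only reflects the v1/v2 approach.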

[PATCH 01/13] mm/Kconfig: CONFIG_PGTABLE_HAS_HUGE_LEAVES

2023-12-18 Thread peterx
From: Peter Xu Introduce a config option that will be selected as long as huge leaves are involved in pgtable (thp or hugetlbfs). It would be useful to mark any code with this new config that can process either hugetlb or thp pages in any level that is higher than pte level. Signed-off-by:

[PATCH 05/13] mm: Introduce vma_pgtable_walk_{begin|end}()

2023-12-19 Thread peterx
From: Peter Xu Introduce per-vma begin()/end() helpers for pgtable walks. This is preparation work to merge hugetlb pgtable walkers with generic mm. The helpers need to be called before and after a pgtable walk, and will start to be needed once the pgtable walker code supports hugetlb pages. It's

[PATCH 08/13] mm/gup: Handle hugetlb for no_page_table()

2023-12-19 Thread peterx
From: Peter Xu no_page_table() is not yet used for hugetlb code paths. Make it prepared. The major difference here is hugetlb will return -EFAULT as long as page cache does not exist, even if VM_SHARED. See hugetlb_follow_page_mask(). Pass "address" into no_page_table() too, as hugetlb will

[PATCH 06/13] mm/gup: Drop folio_fast_pin_allowed() in hugepd processing

2023-12-19 Thread peterx
From: Peter Xu The hugepd format for GUP is only used on PowerPC with hugetlbfs. There is some kernel usage of hugepd (see hugepd_populate_kernel() for PPC_8XX); however, those pages are not candidates for GUP. Commit a6e79df92e4a ("mm/gup: disallow FOLL_LONGTERM GUP-fast writing to

[PATCH 07/13] mm/gup: Refactor record_subpages() to find 1st small page

2023-12-19 Thread peterx
From: Peter Xu All the fast-gup functions operate on a tail page and always need to do page mask calculations before feeding it into record_subpages(). Merge that logic into record_subpages(), so that it will do the nth_page() calculation. Signed-off-by: Peter Xu --- mm/gup.c | 25
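
A userspace model of the merged calculation might look like the sketch below. It makes several assumptions that are not in the snippet: pages are represented by plain page frame numbers instead of struct page, PAGE_SHIFT is 12, and the parameter list is illustrative rather than the kernel's actual signature.

```c
#include <assert.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)

/*
 * Model of record_subpages(): head_pfn identifies the first base page of a
 * huge mapping of sz bytes (a power of two). Record one pfn per base page
 * in [addr, end) and return how many were recorded.
 */
static int record_subpages(unsigned long head_pfn, unsigned long sz,
			   unsigned long addr, unsigned long end,
			   unsigned long *pfns)
{
	unsigned long start_pfn;
	int nr;

	assert(sz >= PAGE_SIZE && (sz & (sz - 1)) == 0);

	/* The page mask calculation callers used to do themselves:
	 * offset of addr inside the huge mapping, in base pages. */
	start_pfn = head_pfn + ((addr & (sz - 1)) >> PAGE_SHIFT);

	for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
		pfns[nr] = start_pfn + nr;

	return nr;
}
```

With the offset computed inside the helper, every caller can pass the start of the huge mapping and the address range, instead of repeating the mask arithmetic.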

[PATCH 11/13] mm/gup: Handle huge pmd for follow_pmd_mask()

2023-12-19 Thread peterx
From: Peter Xu Replace pmd_trans_huge() with pmd_thp_or_huge() to also cover pmd_huge() as long as it is enabled. FOLL_TOUCH and FOLL_SPLIT_PMD only apply to THP, not yet huge. Since follow_trans_huge_pmd() can now process hugetlb pages, rename it to follow_huge_pmd() to match what it does.

[PATCH v2 00/13] mm/gup: Unify hugetlb, part 2

2024-01-03 Thread peterx
From: Peter Xu v2: - Collect acks - Patch 9: - Use READ_ONCE() to fetch pud entry [James] rfc: https://lore.kernel.org/r/20231116012908.392077-1-pet...@redhat.com v1: https://lore.kernel.org/r/20231219075538.414708-1-pet...@redhat.com This is v2 of the series, based on latest mm-unstable

[PATCH v2 13/13] mm/gup: Handle hugetlb in the generic follow_page_mask code

2024-01-03 Thread peterx
From: Peter Xu Now follow_page() is ready to handle hugetlb pages in whatever form, and over all architectures. Switch to the generic code path. Time to retire hugetlb_follow_page_mask(), following the previous retirement of follow_hugetlb_page() in 4849807114b8. There may be a slight

[PATCH v2 10/13] mm/gup: Handle huge pud for follow_pud_mask()

2024-01-03 Thread peterx
From: Peter Xu Teach follow_pud_mask() to be able to handle normal PUD pages like hugetlb. Rename follow_devmap_pud() to follow_huge_pud() so that it can process either huge devmap or hugetlb. Move it out of TRANSPARENT_HUGEPAGE_PUD and huge_memory.c (which relies on CONFIG_THP). In the

[PATCH v2 11/13] mm/gup: Handle huge pmd for follow_pmd_mask()

2024-01-03 Thread peterx
From: Peter Xu Replace pmd_trans_huge() with pmd_thp_or_huge() to also cover pmd_huge() as long as it is enabled. FOLL_TOUCH and FOLL_SPLIT_PMD only apply to THP, not yet huge. Since follow_trans_huge_pmd() can now process hugetlb pages, rename it to follow_huge_pmd() to match what it does.

[PATCH v2 03/13] mm: Provide generic pmd_thp_or_huge()

2024-01-03 Thread peterx
From: Peter Xu ARM defines pmd_thp_or_huge(), detecting either a THP or a huge PMD. It can be a useful helper if we want to merge more THP and hugetlb code paths. Make it a generic default implementation that only exists when CONFIG_MMU. An arch can override it by defining its own version. For

[PATCH v2 12/13] mm/gup: Handle hugepd for follow_page()

2024-01-03 Thread peterx
From: Peter Xu Hugepd is so far only used on PowerPC, on 4K page size kernels where the hash MMU is used. follow_page_mask() used to leverage hugetlb APIs to access hugepd entries. Teach follow_page_mask() to handle hugepd itself. With the previous refactoring of fast-gup gup_huge_pd(), most of the code can be

[PATCH v2 05/13] mm: Introduce vma_pgtable_walk_{begin|end}()

2024-01-03 Thread peterx
From: Peter Xu Introduce per-vma begin()/end() helpers for pgtable walks. This is preparation work to merge hugetlb pgtable walkers with generic mm. The helpers need to be called before and after a pgtable walk, and will start to be needed once the pgtable walker code supports hugetlb pages. It's

[PATCH v2 04/13] mm: Make HPAGE_PXD_* macros even if !THP

2024-01-03 Thread peterx
From: Peter Xu These macros can be helpful when we plan to merge hugetlb code into generic code. Move them out and define them even if !THP. We actually already defined HPAGE_PMD_NR for other reasons even if !THP. Reorganize these macros. Reviewed-by: Christoph Hellwig Signed-off-by: Peter

[PATCH v2 06/13] mm/gup: Drop folio_fast_pin_allowed() in hugepd processing

2024-01-03 Thread peterx
From: Peter Xu The hugepd format for GUP is only used on PowerPC with hugetlbfs. There is some kernel usage of hugepd (see hugepd_populate_kernel() for PPC_8XX); however, those pages are not candidates for GUP. Commit a6e79df92e4a ("mm/gup: disallow FOLL_LONGTERM GUP-fast writing to

[PATCH v2 07/13] mm/gup: Refactor record_subpages() to find 1st small page

2024-01-03 Thread peterx
From: Peter Xu All the fast-gup functions operate on a tail page and always need to do page mask calculations before feeding it into record_subpages(). Merge that logic into record_subpages(), so that it will do the nth_page() calculation. Signed-off-by: Peter Xu --- mm/gup.c | 25

[PATCH v2 09/13] mm/gup: Cache *pudp in follow_pud_mask()

2024-01-03 Thread peterx
From: Peter Xu Introduce "pud_t pud" in the function, so the code won't dereference *pudp multiple times. This is not only because that looks less straightforward, but also because, if the dereference really happened, it's not clear whether there can be a race that sees different *pudp values if it's being

[PATCH v2 01/13] mm/Kconfig: CONFIG_PGTABLE_HAS_HUGE_LEAVES

2024-01-03 Thread peterx
From: Peter Xu Introduce a config option that will be selected as long as huge leaves are involved in pgtable (thp or hugetlbfs). It would be useful to mark any code with this new config that can process either hugetlb or thp pages in any level that is higher than pte level. Signed-off-by:

[PATCH v2 02/13] mm/hugetlb: Declare hugetlbfs_pagecache_present() non-static

2024-01-03 Thread peterx
From: Peter Xu It will be used outside hugetlb.c soon. Signed-off-by: Peter Xu --- include/linux/hugetlb.h | 9 + mm/hugetlb.c| 4 ++-- 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index

[PATCH v2 08/13] mm/gup: Handle hugetlb for no_page_table()

2024-01-03 Thread peterx
From: Peter Xu no_page_table() is not yet used for hugetlb code paths. Make it prepared. The major difference here is hugetlb will return -EFAULT as long as page cache does not exist, even if VM_SHARED. See hugetlb_follow_page_mask(). Pass "address" into no_page_table() too, as hugetlb will

[PATCH RFC 08/13] mm/arm64: Merge pXd_huge() and pXd_leaf() definitions

2024-03-06 Thread peterx
From: Peter Xu Unlike most archs, aarch64 defines pXd_huge() and pXd_leaf() slightly differently. Redefine pXd_huge() with pXd_leaf(). While reading the surrounding code, I found that the old aarch64 definitions of these APIs used to have two traps: (1) 4797ec2dc83a ("arm64: fix

[PATCH RFC 07/13] mm/arm: Redefine pmd_huge() with pmd_leaf()

2024-03-06 Thread peterx
From: Peter Xu Most of the archs already define these two APIs the same way. ARM is more complicated in two aspects: - For pXd_huge() it's always checking against !PXD_TABLE_BIT, while for pXd_leaf() it's always checking against PXD_TYPE_SECT. - SECT/TABLE bits are defined differently

[PATCH RFC 09/13] mm/powerpc: Redefine pXd_huge() with pXd_leaf()

2024-03-06 Thread peterx
From: Peter Xu PowerPC book3s 4K mostly has the same definition for both, except that pXd_huge() constantly returns 0 for hash MMUs. AFAICT that is fine to remove, because pXd_huge() reflects a hugetlb entry, while its own hugetlb pgtable lookup function (__find_linux_pte() shared by all powerpc

[PATCH RFC 04/13] mm/x86: Change pXd_huge() behavior to exclude swap entries

2024-03-06 Thread peterx
From: Peter Xu This patch partly reverts the commits below: 3a194f3f8ad0 ("mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry") cbef8478bee5 ("mm/hugetlb: pmd_huge() returns true for non-present hugepage") Right now, the pXd_huge() definition across the kernel is unclear. We

[PATCH RFC 05/13] mm/sparc: Change pXd_huge() behavior to exclude swap entries

2024-03-06 Thread peterx
From: Peter Xu Please refer to the previous patch on the reasoning for x86. Now sparc is the only architecture that will allow swap entries to be reported as pXd_huge(). After this patch, all architectures should forbid swap entries in pXd_huge(). Cc: David S. Miller Cc: Andreas Larsson Cc:

[PATCH RFC 00/13] mm/treewide: Remove pXd_huge() API

2024-03-06 Thread peterx
From: Peter Xu [based on akpm/mm-unstable latest commit a7f399ae964e] In previous work [1], we removed the pXd_large() API, which is arch specific. This patchset further removes the hugetlb pXd_huge() API. Hugetlb was never special on creating huge mappings when compared with other huge

[PATCH RFC 06/13] mm/arm: Use macros to define pmd/pud helpers

2024-03-06 Thread peterx
From: Peter Xu It's already confusing that ARM 2-level vs. 3-level define the SECT bit differently on pmd/puds. Always use a macro, which is much clearer. Cc: Russell King Cc: Shawn Guo Cc: Krzysztof Kozlowski Cc: Bjorn Andersson Cc: Arnd Bergmann Cc: Konrad Dybcio Cc: Fabio Estevam Cc:

[PATCH RFC 13/13] mm: Document pXd_leaf() API

2024-03-06 Thread peterx
From: Peter Xu There's one small section already, but since we're going to remove pXd_huge(), that comment may become obsolete. Rewrite that section with more information, so that hopefully it is crystal clear what the API implies. Signed-off-by: Peter Xu --- include/linux/pgtable.h |

[PATCH RFC 03/13] mm/gup: Check p4d presence before going on

2024-03-06 Thread peterx
From: Peter Xu Currently there should be no p4d swap entries, so it may not matter much; however, this may help us rule out swap entries in the pXd_huge() API, which will include p4d_huge(). The p4d_present() checks make it 100% clear that we won't rely on p4d_huge() for swap entries.

[PATCH RFC 11/13] mm/treewide: Replace pXd_huge() with pXd_leaf()

2024-03-06 Thread peterx
From: Peter Xu Now that we're sure all pXd_huge() definitions are the same as pXd_leaf(), reuse it. Luckily, pXd_huge() isn't widely used. Signed-off-by: Peter Xu --- arch/arm/include/asm/pgtable-3level.h | 2 +- arch/arm64/include/asm/pgtable.h | 2 +- arch/arm64/mm/hugetlbpage.c

[PATCH RFC 10/13] mm/gup: Merge pXd huge mapping checks

2024-03-06 Thread peterx
From: Peter Xu Huge mapping checks in GUP are slightly redundant and can be simplified. pXd_huge() now is the same as pXd_leaf(). pmd_trans_huge() and pXd_devmap() should both imply pXd_leaf(). Time to merge them into one. Signed-off-by: Peter Xu --- mm/gup.c | 7 +++ 1 file changed, 3

[PATCH RFC 12/13] mm/treewide: Remove pXd_huge()

2024-03-06 Thread peterx
From: Peter Xu This API is not used anymore, drop it for the whole tree. Signed-off-by: Peter Xu --- arch/arm/mm/Makefile | 1 - arch/arm/mm/hugetlbpage.c | 29 --- arch/arm64/mm/hugetlbpage.c | 10 ---

[PATCH RFC 02/13] mm/gup: Cache p4d in follow_p4d_mask()

2024-03-06 Thread peterx
From: Peter Xu Add a variable to cache p4d in follow_p4d_mask(). It's a good practise to make sure all the following checks will have a consistent view of the entry. Signed-off-by: Peter Xu --- mm/gup.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git

[PATCH RFC 01/13] mm/hmm: Process pud swap entry without pud_huge()

2024-03-06 Thread peterx
From: Peter Xu Swap pud entries do not always return true for pud_huge() for all archs. x86 and sparc (so far) allow it, but all the rest do not accept a swap entry to be reported as pud_huge(). So it's not safe to check swap entries within pud_huge(). Check swap entries before pud_huge(), so

[PATCH v2 04/14] mm/x86: Change pXd_huge() behavior to exclude swap entries

2024-03-18 Thread peterx
From: Peter Xu This patch partly reverts the commits below: 3a194f3f8ad0 ("mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry") cbef8478bee5 ("mm/hugetlb: pmd_huge() returns true for non-present hugepage") Right now, the pXd_huge() definition across the kernel is unclear. We

[PATCH v2 06/14] mm/arm: Use macros to define pmd/pud helpers

2024-03-18 Thread peterx
From: Peter Xu It's already confusing that ARM 2-level vs. 3-level define the SECT bit differently on pmd/puds. Always use a macro, which is much clearer. Cc: Russell King Cc: Shawn Guo Cc: Krzysztof Kozlowski Cc: Bjorn Andersson Cc: Arnd Bergmann Cc: Konrad Dybcio Cc: Fabio Estevam Cc:

[PATCH v2 05/14] mm/sparc: Change pXd_huge() behavior to exclude swap entries

2024-03-18 Thread peterx
From: Peter Xu Please refer to the previous patch on the reasoning for x86. Now sparc is the only architecture that will allow swap entries to be reported as pXd_huge(). After this patch, all architectures should forbid swap entries in pXd_huge(). Cc: David S. Miller Cc: Andreas Larsson Cc:

[PATCH v2 08/14] mm/arm64: Merge pXd_huge() and pXd_leaf() definitions

2024-03-18 Thread peterx
From: Peter Xu Unlike most archs, aarch64 defines pXd_huge() and pXd_leaf() slightly differently. Redefine pXd_huge() with pXd_leaf(). While reading the surrounding code, I found that the old aarch64 definitions of these APIs used to have two traps: (1) 4797ec2dc83a ("arm64: fix

[PATCH v2 11/14] mm/treewide: Replace pXd_huge() with pXd_leaf()

2024-03-18 Thread peterx
From: Peter Xu Now that we're sure all pXd_huge() definitions are the same as pXd_leaf(), reuse it. Luckily, pXd_huge() isn't widely used. Signed-off-by: Peter Xu --- arch/arm/include/asm/pgtable-3level.h | 2 +- arch/arm64/include/asm/pgtable.h | 2 +- arch/arm64/mm/hugetlbpage.c

[PATCH v2 09/14] mm/powerpc: Redefine pXd_huge() with pXd_leaf()

2024-03-18 Thread peterx
From: Peter Xu PowerPC book3s 4K mostly has the same definition on both, except pXd_huge() constantly returns 0 for hash MMUs. As Michael Ellerman pointed out [1], it is safe to check _PAGE_PTE on hash MMUs, as the bit will never be set so it will keep returning false. As a reference,

[PATCH v2 10/14] mm/gup: Merge pXd huge mapping checks

2024-03-18 Thread peterx
From: Peter Xu Huge mapping checks in GUP are slightly redundant and can be simplified. pXd_huge() now is the same as pXd_leaf(). pmd_trans_huge() and pXd_devmap() should both imply pXd_leaf(). Time to merge them into one. Reviewed-by: Jason Gunthorpe Signed-off-by: Peter Xu --- mm/gup.c |

[PATCH v2 13/14] mm/arm: Remove pmd_thp_or_huge()

2024-03-18 Thread peterx
From: Peter Xu ARM/ARM64 used to define pmd_thp_or_huge(). Now this macro is completely redundant. Remove it and use pmd_leaf(). Cc: Mark Salter Cc: Catalin Marinas Cc: Will Deacon Cc: Russell King Cc: Shawn Guo Cc: Krzysztof Kozlowski Cc: Bjorn Andersson Cc: Arnd Bergmann Cc: Konrad

[PATCH v2 12/14] mm/treewide: Remove pXd_huge()

2024-03-18 Thread peterx
From: Peter Xu This API is not used anymore, drop it for the whole tree. Signed-off-by: Peter Xu --- arch/arm/mm/Makefile | 1 - arch/arm/mm/hugetlbpage.c | 29 --- arch/arm64/mm/hugetlbpage.c | 10 ---

[PATCH v2 07/14] mm/arm: Redefine pmd_huge() with pmd_leaf()

2024-03-18 Thread peterx
From: Peter Xu Most of the archs already define these two APIs the same way. ARM is more complicated in two aspects: - For pXd_huge() it's always checking against !PXD_TABLE_BIT, while for pXd_leaf() it's always checking against PXD_TYPE_SECT. - SECT/TABLE bits are defined differently

[PATCH v2 00/14] mm/treewide: Remove pXd_huge() API

2024-03-18 Thread peterx
From: Peter Xu [based on akpm/mm-unstable commit b66d4391d8fe, March 18th] v2: - Add a patch to cleanup ARM's pmd_thp_or_huge [Christophe] - Enhance commit message for PowerPC patch on hugepd [Christophe] v1: https://lore.kernel.org/r/20240313214719.253873-1-pet...@redhat.com In previous work

[PATCH v2 01/14] mm/hmm: Process pud swap entry without pud_huge()

2024-03-18 Thread peterx
From: Peter Xu Swap pud entries do not always return true for pud_huge() for all archs. x86 and sparc (so far) allow it, but all the rest do not accept a swap entry to be reported as pud_huge(). So it's not safe to check swap entries within pud_huge(). Check swap entries before pud_huge(), so

[PATCH v2 14/14] mm: Document pXd_leaf() API

2024-03-18 Thread peterx
From: Peter Xu There's one small section already, but since we're going to remove pXd_huge(), that comment may become obsolete. Rewrite that section with more information, so that hopefully it is crystal clear what the API implies. Reviewed-by: Jason Gunthorpe Signed-off-by: Peter Xu

[PATCH v2 02/14] mm/gup: Cache p4d in follow_p4d_mask()

2024-03-18 Thread peterx
From: Peter Xu Add a variable to cache p4d in follow_p4d_mask(). It's a good practise to make sure all the following checks will have a consistent view of the entry. Signed-off-by: Peter Xu --- mm/gup.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git

[PATCH v2 03/14] mm/gup: Check p4d presence before going on

2024-03-18 Thread peterx
From: Peter Xu Currently there should be no p4d swap entries, so it may not matter much; however, this may help us rule out swap entries in the pXd_huge() API, which will include p4d_huge(). The p4d_present() checks make it 100% clear that we won't rely on p4d_huge() for swap entries.

[PATCH v3 01/12] mm/Kconfig: CONFIG_PGTABLE_HAS_HUGE_LEAVES

2024-03-21 Thread peterx
From: Peter Xu Introduce a config option that will be selected as long as huge leaves are involved in pgtable (thp or hugetlbfs). It would be useful to mark any code with this new config that can process either hugetlb or thp pages in any level that is higher than pte level. Reviewed-by: Jason

[PATCH v3 03/12] mm: Make HPAGE_PXD_* macros even if !THP

2024-03-21 Thread peterx
From: Peter Xu These macros can be helpful when we plan to merge hugetlb code into generic code. Move them out and define them even if !THP. We actually already defined HPAGE_PMD_NR for other reasons even if !THP. Reorganize these macros. Reviewed-by: Christoph Hellwig Reviewed-by: Jason

[PATCH v3 02/12] mm/hugetlb: Declare hugetlbfs_pagecache_present() non-static

2024-03-21 Thread peterx
From: Peter Xu It will be used outside hugetlb.c soon. Signed-off-by: Peter Xu --- include/linux/hugetlb.h | 9 + mm/hugetlb.c| 4 ++-- 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index

[PATCH v3 00/12] mm/gup: Unify hugetlb, part 2

2024-03-21 Thread peterx
From: Peter Xu v3: - Rebased to latest mm-unstable (a824831a082f, of March 21st) - Dropped patch to introduce pmd_thp_or_huge(), replace such uses (and also pXd_huge() users) with pXd_leaf() [Jason] - Add a comment for CONFIG_PGTABLE_HAS_HUGE_LEAVES [Jason] - Use IS_ENABLED() in

[PATCH v3 04/12] mm: Introduce vma_pgtable_walk_{begin|end}()

2024-03-21 Thread peterx
From: Peter Xu Introduce per-vma begin()/end() helpers for pgtable walks. This is preparation work to merge hugetlb pgtable walkers with generic mm. The helpers need to be called before and after a pgtable walk, and will start to be needed once the pgtable walker code supports hugetlb pages. It's

[PATCH v3 12/12] mm/gup: Handle hugetlb in the generic follow_page_mask code

2024-03-21 Thread peterx
From: Peter Xu Now follow_page() is ready to handle hugetlb pages in whatever form, and over all architectures. Switch to the generic code path. Time to retire hugetlb_follow_page_mask(), following the previous retirement of follow_hugetlb_page() in 4849807114b8. There may be a slight

[PATCH v3 10/12] mm/gup: Handle huge pmd for follow_pmd_mask()

2024-03-21 Thread peterx
From: Peter Xu Replace pmd_trans_huge() with pmd_leaf() to also cover pmd_huge() as long as it is enabled. FOLL_TOUCH and FOLL_SPLIT_PMD only apply to THP, not yet huge. Since follow_trans_huge_pmd() can now process hugetlb pages, rename it to follow_huge_pmd() to match what it does. Move it

[PATCH v3 11/12] mm/gup: Handle hugepd for follow_page()

2024-03-21 Thread peterx
From: Peter Xu Hugepd is so far only used on PowerPC, on 4K page size kernels where the hash MMU is used. follow_page_mask() used to leverage hugetlb APIs to access hugepd entries. Teach follow_page_mask() to handle hugepd itself. With the previous refactoring of fast-gup gup_huge_pd(), most of the code can be

[PATCH v3 05/12] mm/gup: Drop folio_fast_pin_allowed() in hugepd processing

2024-03-21 Thread peterx
From: Peter Xu The hugepd format for GUP is only used on PowerPC with hugetlbfs. There is some kernel usage of hugepd (see hugepd_populate_kernel() for PPC_8XX); however, those pages are not candidates for GUP. Commit a6e79df92e4a ("mm/gup: disallow FOLL_LONGTERM GUP-fast writing to

[PATCH v3 07/12] mm/gup: Handle hugetlb for no_page_table()

2024-03-21 Thread peterx
From: Peter Xu no_page_table() is not yet used for hugetlb code paths. Make it prepared. The major difference here is hugetlb will return -EFAULT as long as page cache does not exist, even if VM_SHARED. See hugetlb_follow_page_mask(). Pass "address" into no_page_table() too, as hugetlb will

[PATCH v3 06/12] mm/gup: Refactor record_subpages() to find 1st small page

2024-03-21 Thread peterx
From: Peter Xu All the fast-gup functions operate on a tail page and always need to do page mask calculations before feeding it into record_subpages(). Merge that logic into record_subpages(), so that it will do the nth_page() calculation. Reviewed-by: Jason Gunthorpe Signed-off-by: Peter

[PATCH v3 09/12] mm/gup: Handle huge pud for follow_pud_mask()

2024-03-21 Thread peterx
From: Peter Xu Teach follow_pud_mask() to be able to handle normal PUD pages like hugetlb. Rename follow_devmap_pud() to follow_huge_pud() so that it can process either huge devmap or hugetlb. Move it out of TRANSPARENT_HUGEPAGE_PUD and huge_memory.c (which relies on CONFIG_THP). Switch to

[PATCH v3 08/12] mm/gup: Cache *pudp in follow_pud_mask()

2024-03-21 Thread peterx
From: Peter Xu Introduce "pud_t pud" in the function, so the code won't dereference *pudp multiple times. This is not only because that looks less straightforward, but also because, if the dereference really happened, it's not clear whether there can be a race that sees different *pudp values if it's being

[PATCH 00/13] mm/treewide: Remove pXd_huge() API

2024-03-13 Thread peterx
From: Peter Xu [based on akpm/mm-unstable latest commit 9af2e4c429b5] v1: - Rebase, remove RFC tag - Fixed powerpc patch build issue, enhancing commit message [Michael] - Optimize patch 1 & 3 on "none || !present" check [Jason] In previous work [1], we removed the pXd_large() API, which is

[PATCH 11/13] mm/treewide: Replace pXd_huge() with pXd_leaf()

2024-03-13 Thread peterx
From: Peter Xu Now that we're sure all pXd_huge() definitions are the same as pXd_leaf(), reuse it. Luckily, pXd_huge() isn't widely used. Signed-off-by: Peter Xu --- arch/arm/include/asm/pgtable-3level.h | 2 +- arch/arm64/include/asm/pgtable.h | 2 +- arch/arm64/mm/hugetlbpage.c

[PATCH 10/13] mm/gup: Merge pXd huge mapping checks

2024-03-13 Thread peterx
From: Peter Xu Huge mapping checks in GUP are slightly redundant and can be simplified. pXd_huge() now is the same as pXd_leaf(). pmd_trans_huge() and pXd_devmap() should both imply pXd_leaf(). Time to merge them into one. Reviewed-by: Jason Gunthorpe Signed-off-by: Peter Xu --- mm/gup.c |

[PATCH 12/13] mm/treewide: Remove pXd_huge()

2024-03-13 Thread peterx
From: Peter Xu This API is not used anymore, drop it for the whole tree. Signed-off-by: Peter Xu --- arch/arm/mm/Makefile | 1 - arch/arm/mm/hugetlbpage.c | 29 --- arch/arm64/mm/hugetlbpage.c | 10 ---

[PATCH 13/13] mm: Document pXd_leaf() API

2024-03-13 Thread peterx
From: Peter Xu There's one small section already, but since we're going to remove pXd_huge(), that comment may become obsolete. Rewrite that section with more information, so that hopefully it is crystal clear what the API implies. Reviewed-by: Jason Gunthorpe Signed-off-by: Peter Xu

[PATCH 01/13] mm/hmm: Process pud swap entry without pud_huge()

2024-03-13 Thread peterx
From: Peter Xu Swap pud entries do not always return true for pud_huge() for all archs. x86 and sparc (so far) allow it, but all the rest do not accept a swap entry to be reported as pud_huge(). So it's not safe to check swap entries within pud_huge(). Check swap entries before pud_huge(), so

[PATCH 04/13] mm/x86: Change pXd_huge() behavior to exclude swap entries

2024-03-13 Thread peterx
From: Peter Xu This patch partly reverts the commits below: 3a194f3f8ad0 ("mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry") cbef8478bee5 ("mm/hugetlb: pmd_huge() returns true for non-present hugepage") Right now, the pXd_huge() definition across the kernel is unclear. We

[PATCH 03/13] mm/gup: Check p4d presence before going on

2024-03-13 Thread peterx
From: Peter Xu Currently there should be no p4d swap entries, so it may not matter much; however, this may help us rule out swap entries in the pXd_huge() API, which will include p4d_huge(). The p4d_present() checks make it 100% clear that we won't rely on p4d_huge() for swap entries.

[PATCH 02/13] mm/gup: Cache p4d in follow_p4d_mask()

2024-03-13 Thread peterx
From: Peter Xu Add a variable to cache p4d in follow_p4d_mask(). It's a good practise to make sure all the following checks will have a consistent view of the entry. Signed-off-by: Peter Xu --- mm/gup.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git

[PATCH 05/13] mm/sparc: Change pXd_huge() behavior to exclude swap entries

2024-03-13 Thread peterx
From: Peter Xu Please refer to the previous patch on the reasoning for x86. Now sparc is the only architecture that will allow swap entries to be reported as pXd_huge(). After this patch, all architectures should forbid swap entries in pXd_huge(). Cc: David S. Miller Cc: Andreas Larsson Cc:

[PATCH 06/13] mm/arm: Use macros to define pmd/pud helpers

2024-03-13 Thread peterx
From: Peter Xu It's already confusing that ARM 2-level vs. 3-level define the SECT bit differently on pmd/puds. Always use a macro, which is much clearer. Cc: Russell King Cc: Shawn Guo Cc: Krzysztof Kozlowski Cc: Bjorn Andersson Cc: Arnd Bergmann Cc: Konrad Dybcio Cc: Fabio Estevam Cc:

[PATCH 07/13] mm/arm: Redefine pmd_huge() with pmd_leaf()

2024-03-13 Thread peterx
From: Peter Xu Most of the archs already define these two APIs the same way. ARM is more complicated in two aspects: - For pXd_huge() it's always checking against !PXD_TABLE_BIT, while for pXd_leaf() it's always checking against PXD_TYPE_SECT. - SECT/TABLE bits are defined differently

[PATCH 09/13] mm/powerpc: Redefine pXd_huge() with pXd_leaf()

2024-03-13 Thread peterx
From: Peter Xu PowerPC book3s 4K mostly has the same definition on both, except pXd_huge() constantly returns 0 for hash MMUs. As Michael Ellerman pointed out [1], it is safe to check _PAGE_PTE on hash MMUs, as the bit will never be set so it will keep returning false. As a reference,

[PATCH 08/13] mm/arm64: Merge pXd_huge() and pXd_leaf() definitions

2024-03-13 Thread peterx
From: Peter Xu Unlike most archs, aarch64 defines pXd_huge() and pXd_leaf() slightly differently. Redefine pXd_huge() with pXd_leaf(). While reading the surrounding code, I found that the old aarch64 definitions of these APIs used to have two traps: (1) 4797ec2dc83a ("arm64: fix

[PATCH v4 01/13] mm/Kconfig: CONFIG_PGTABLE_HAS_HUGE_LEAVES

2024-03-27 Thread peterx
From: Peter Xu Introduce a config option that will be selected as long as huge leaves are involved in pgtable (thp or hugetlbfs). It would be useful to mark any code with this new config that can process either hugetlb or thp pages in any level that is higher than pte level. Reviewed-by: Jason

[PATCH v4 00/13] mm/gup: Unify hugetlb, part 2

2024-03-27 Thread peterx
From: Peter Xu v4: - Fix build issues, tested on more archs/configs ([x86_64, i386, arm, arm64, powerpc, riscv, s390] x [allno, alldef, allmod]). - Squashed the fixup series into v3, touched up commit messages [1] - Added the patch to fix pud_pfn() into the series [2] - Fixed one more

[PATCH v4 05/13] mm/arch: Provide pud_pfn() fallback

2024-03-27 Thread peterx
From: Peter Xu The comment in the code explains the reasons. We took a different approach compared to pmd_pfn() by providing a fallback function. Another option is to provide some lower-level config options (compared to HUGETLB_PAGE or THP) to identify which layer an arch can support for such

[PATCH v4 04/13] mm: Introduce vma_pgtable_walk_{begin|end}()

2024-03-27 Thread peterx
From: Peter Xu Introduce per-vma begin()/end() helpers for pgtable walks. This is preparation work for merging hugetlb pgtable walkers with generic mm. The helpers need to be called before and after a pgtable walk, and will start to be needed once the pgtable walker code supports hugetlb pages. It's

[PATCH v4 13/13] mm/gup: Handle hugetlb in the generic follow_page_mask code

2024-03-27 Thread peterx
From: Peter Xu Now follow_page() is ready to handle hugetlb pages in whatever form, and over all architectures. Switch to the generic code path. Time to retire hugetlb_follow_page_mask(), following the previous retirement of follow_hugetlb_page() in 4849807114b8. There may be a slight

[PATCH v4 03/13] mm: Make HPAGE_PXD_* macros even if !THP

2024-03-27 Thread peterx
From: Peter Xu These macros can be helpful when we plan to merge hugetlb code into generic code. Move them out and define them as long as PGTABLE_HAS_HUGE_LEAVES is selected, because there are systems that only define HUGETLB_PAGE not THP. One note here is HPAGE_PMD_SHIFT must be defined even

[PATCH v4 06/13] mm/gup: Drop folio_fast_pin_allowed() in hugepd processing

2024-03-27 Thread peterx
From: Peter Xu Hugepd format for GUP is only used in PowerPC with hugetlbfs. There is some kernel usage of hugepd (see hugepd_populate_kernel() for PPC_8XX); however, those pages are not candidates for GUP. Commit a6e79df92e4a ("mm/gup: disallow FOLL_LONGTERM GUP-fast writing to

[PATCH v4 07/13] mm/gup: Refactor record_subpages() to find 1st small page

2024-03-27 Thread peterx
From: Peter Xu All the fast-gup functions operate on a tail page and always need to do page mask calculations before feeding it into record_subpages(). Merge that logic into record_subpages(), so that it will do the nth_page() calculation. Reviewed-by: Jason Gunthorpe Signed-off-by: Peter
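A minimal user-space sketch of the idea, assuming the caller passes the first small page of the huge mapping and the helper locates the starting subpage itself. The `demo_*` names, the 4K page size, and modelling pages as a plain array are assumptions for illustration, not the kernel implementation.

```c
#include <stddef.h>

#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)

/* Stand-in for a page descriptor; contiguous array elements model subpages. */
struct demo_page { int unused; };

/*
 * head: first small page of the huge mapping
 * sz:   size of the huge mapping in bytes (power of two)
 * [addr, end): page-aligned range to record, inside the huge mapping
 * Returns the number of subpages stored in pages[].
 */
static int demo_record_subpages(struct demo_page *head, unsigned long sz,
                                unsigned long addr, unsigned long end,
                                struct demo_page **pages)
{
    /* Offset of addr inside the huge mapping, converted to a page index. */
    struct demo_page *start = head + ((addr & (sz - 1)) >> DEMO_PAGE_SHIFT);
    int nr;

    for (nr = 0; addr != end; nr++, addr += DEMO_PAGE_SIZE)
        pages[nr] = start + nr;

    return nr;
}
```

Folding the mask calculation into the helper means each caller passes the same head page and size rather than repeating the offset arithmetic.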

[PATCH v4 11/13] mm/gup: Handle huge pmd for follow_pmd_mask()

2024-03-27 Thread peterx
From: Peter Xu Replace pmd_trans_huge() with pmd_leaf() to also cover pmd_huge() as long as enabled. FOLL_TOUCH and FOLL_SPLIT_PMD only apply to THP, not yet huge. Now that follow_trans_huge_pmd() can process hugetlb pages, rename it to follow_huge_pmd() to match what it does. Move it

[PATCH v4 12/13] mm/gup: Handle hugepd for follow_page()

2024-03-27 Thread peterx
From: Peter Xu Hugepd is only used in PowerPC so far on 4K page size kernels where the hash MMU is used. follow_page_mask() used to leverage hugetlb APIs to access hugepd entries. Teach follow_page_mask() itself about hugepd. With previous refactors on fast-gup gup_huge_pd(), most of the code can be

[PATCH v4 02/13] mm/hugetlb: Declare hugetlbfs_pagecache_present() non-static

2024-03-27 Thread peterx
From: Peter Xu It will be used outside hugetlb.c soon. Signed-off-by: Peter Xu --- include/linux/hugetlb.h | 9 + mm/hugetlb.c | 4 ++-- 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index

[PATCH v4 09/13] mm/gup: Cache *pudp in follow_pud_mask()

2024-03-27 Thread peterx
From: Peter Xu Introduce "pud_t pud" in the function, so the code won't dereference *pudp multiple times. Not only because that looks less straightforward, but also because if the dereference really happened, it's not clear whether there can be a race that sees different *pudp values if it's being
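The pattern can be sketched in user-space C as follows: the shared entry is loaded once into a local, and every later check uses that snapshot. The `demo_*` names, the flag bits, and the use of a C11 atomic load are assumptions for illustration and are not taken from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared page-table slot that another thread may rewrite. */
typedef _Atomic unsigned long demo_pud_slot;

#define DEMO_PRESENT 0x1UL
#define DEMO_LEAF    0x2UL

enum demo_kind { DEMO_NONE, DEMO_HUGE, DEMO_TABLE };

static enum demo_kind demo_classify(demo_pud_slot *pudp)
{
    /* Single load; all checks below see the same value. */
    unsigned long pud = atomic_load_explicit(pudp, memory_order_relaxed);

    if (!(pud & DEMO_PRESENT))
        return DEMO_NONE;
    if (pud & DEMO_LEAF)
        return DEMO_HUGE;
    return DEMO_TABLE;
}
```

Had each check dereferenced `pudp` separately, a concurrent update between two checks could make them disagree about the same entry.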

[PATCH v4 08/13] mm/gup: Handle hugetlb for no_page_table()

2024-03-27 Thread peterx
From: Peter Xu no_page_table() is not yet used for hugetlb code paths. Make it prepared. The major difference here is hugetlb will return -EFAULT as long as page cache does not exist, even if VM_SHARED. See hugetlb_follow_page_mask(). Pass "address" into no_page_table() too, as hugetlb will

[PATCH v4 10/13] mm/gup: Handle huge pud for follow_pud_mask()

2024-03-27 Thread peterx
From: Peter Xu Teach follow_pud_mask() to be able to handle normal PUD pages like hugetlb. Rename follow_devmap_pud() to follow_huge_pud() so that it can process either huge devmap or hugetlb. Move it out of TRANSPARENT_HUGEPAGE_PUD and huge_memory.c (which relies on CONFIG_THP). Switch to

[PATCH 5/5] mm/treewide: Drop pXd_large()

2024-02-28 Thread peterx
From: Peter Xu They're not used anymore, drop all of them. Signed-off-by: Peter Xu --- arch/arm/include/asm/pgtable-2level.h | 1 - arch/arm/include/asm/pgtable-3level.h | 1 - arch/powerpc/include/asm/book3s/64/pgtable.h | 2 -- arch/powerpc/include/asm/pgtable.h |

[PATCH 3/5] mm/treewide: Replace pmd_large() with pmd_leaf()

2024-02-28 Thread peterx
From: Peter Xu pmd_large() is always defined as pmd_leaf(). Merge their usages. Chose pmd_leaf() because pmd_leaf() is a global API, while pmd_large() is not. Signed-off-by: Peter Xu --- arch/arm/mm/dump.c | 4 ++-- arch/powerpc/mm/book3s64/pgtable.c | 2 +-
