Re: API for setting multiple PTEs at once
On Tuesday, 14 February 2023 10:55:43 CET Alexandre Ghiti wrote:
> Hi Matthew,
>
> On 2/7/23 21:27, Matthew Wilcox wrote:
> > On Thu, Feb 02, 2023 at 09:14:23PM +, Matthew Wilcox wrote:
> >> For those of you not subscribed, linux-mm is currently discussing
> >> how best to handle page faults on large folios. I simply made it work
> >> when adding large folio support. Now Yin Fengwei is working on
> >> making it fast.
> >
> > OK, here's an actual implementation:
> >
> > https://lore.kernel.org/linux-mm/20230207194937.122543-3-willy@infradead.org/
> >
> > It survives a run of xfstests. If your architecture doesn't store its
> > PFNs at PAGE_SHIFT, you're going to want to implement your own set_ptes(),
> > or you'll see entirely the wrong pages mapped into userspace. You may
> > also wish to implement set_ptes() if it can be done more efficiently
> > than __pte(pte_val(pte) + PAGE_SIZE).
> >
> > Architectures that implement things like flush_icache_page() and
> > update_mmu_cache() may want to propose batched versions of those.
> > That's alpha, csky, m68k, mips, nios2, parisc, sh,
> > arm, loongarch, openrisc, powerpc, riscv, sparc and xtensa.
> > Maintainers BCC'd, mailing lists CC'd.
> >
> > I'm happy to collect implementations and submit them as part of a v6.
>
> Please find below the riscv implementation for set_ptes:
>
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index ebee56d47003..10bf812776a4 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -463,6 +463,20 @@ static inline void set_pte_at(struct mm_struct *mm,
>         __set_pte_at(mm, addr, ptep, pteval);
>  }
>
> +#define set_ptes set_ptes
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +                           pte_t *ptep, pte_t pte, unsigned int nr)
> +{
> +       for (;;) {
> +               set_pte_at(mm, addr, ptep, pte);
> +               if (--nr == 0)
> +                       break;
> +               ptep++;
> +               addr += PAGE_SIZE;
> +               pte = __pte(pte_val(pte) + (1 << _PAGE_PFN_SHIFT));
> +       }
> +}

Given that this is the same code as the original version (surprise!), what
about doing something like this in the generic code instead:

#ifndef PTE_PAGE_STEP
#define PTE_PAGE_STEP PAGE_SIZE
#endif

[…]

> +               pte = __pte(pte_val(pte) + PTE_PAGE_STEP);

The name of the define is an obvious candidate for bike-shedding, feel free
to name it as you want.

Or if that isn't sound enough, maybe introduce something like:

static inline pte_t set_ptes_next_pte(pte_t pte)
{
        return __pte(pte_val(pte) + (1 << _PAGE_PFN_SHIFT));
}

Greetings,

Eike
--
Rolf Eike Beer, emlix GmbH, http://www.emlix.com
Fon +49 551 30664-0, Fax +49 551 30664-11
Gothaer Platz 3, 37083 Göttingen, Germany
Registered office: Göttingen, Amtsgericht Göttingen HR B 3160
Managing directors: Heike Jordan, Dr. Uwe Kracke
VAT ID: DE 205 198 055

emlix - smart embedded open source
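To make the suggestion above concrete, here is a minimal userspace sketch of a generic loop whose per-page step can be overridden by an architecture. Everything prefixed `mock_` or `MOCK_` is a hypothetical stand-in (the real `pte_t`, `set_pte_at()` and `PAGE_SIZE` are kernel definitions), and the bit-10 PFN position is an assumed layout chosen to mirror the riscv patch, so treat this as an illustration of the idea rather than the code that was merged.

```c
#include <assert.h>
#include <stdint.h>

/* Mock PTE type; the kernel's pte_t is architecture-defined. */
typedef struct { uint64_t val; } mock_pte_t;

#define MOCK_PAGE_SHIFT 12u
#define MOCK_PAGE_SIZE  (UINT64_C(1) << MOCK_PAGE_SHIFT)

/*
 * "Architecture" override (assumed layout: PFN field starts at bit 10).
 * Delete this define to fall back to the generic step below.
 */
#define PTE_PAGE_STEP (UINT64_C(1) << 10)

/* "Generic" default, following the #ifndef pattern proposed in the mail. */
#ifndef PTE_PAGE_STEP
#define PTE_PAGE_STEP MOCK_PAGE_SIZE
#endif

/* Stand-in for set_pte_at(): only stores the entry. */
static void mock_set_pte_at(mock_pte_t *ptep, mock_pte_t pte)
{
    *ptep = pte;
}

/*
 * Same loop shape as the patch. nr must be at least 1: with nr == 0 the
 * unsigned decrement would wrap around instead of terminating the loop.
 */
void mock_set_ptes(mock_pte_t *ptep, mock_pte_t pte, unsigned int nr)
{
    for (;;) {
        mock_set_pte_at(ptep, pte);
        if (--nr == 0)
            break;
        ptep++;
        pte.val += PTE_PAGE_STEP; /* advance the PFN field by one page */
    }
}
```

Calling `mock_set_ptes(table, first, 4)` on a four-entry array fills consecutive entries whose PFN fields increase by one while the low flag bits stay unchanged. The only architecture-specific piece is the single `PTE_PAGE_STEP` define, which is the point of the proposal.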
Re: API for setting multiple PTEs at once
Hi Matthew,

On 2/7/23 21:27, Matthew Wilcox wrote:
> On Thu, Feb 02, 2023 at 09:14:23PM +, Matthew Wilcox wrote:
>> For those of you not subscribed, linux-mm is currently discussing
>> how best to handle page faults on large folios. I simply made it work
>> when adding large folio support. Now Yin Fengwei is working on
>> making it fast.
> OK, here's an actual implementation:
>
> https://lore.kernel.org/linux-mm/20230207194937.122543-3-wi...@infradead.org/
>
> It survives a run of xfstests. If your architecture doesn't store its
> PFNs at PAGE_SHIFT, you're going to want to implement your own set_ptes(),
> or you'll see entirely the wrong pages mapped into userspace. You may
> also wish to implement set_ptes() if it can be done more efficiently
> than __pte(pteval(pte) + PAGE_SIZE).
>
> Architectures that implement things like flush_icache_page() and
> update_mmu_cache() may want to propose batched versions of those.
> That's alpha, csky, m68k, mips, nios2, parisc, sh,
> arm, loongarch, openrisc, powerpc, riscv, sparc and xtensa.
> Maintainers BCC'd, mailing lists CC'd.
>
> I'm happy to collect implementations and submit them as part of a v6.

Please find below the riscv implementation for set_ptes:

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index ebee56d47003..10bf812776a4 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -463,6 +463,20 @@ static inline void set_pte_at(struct mm_struct *mm,
        __set_pte_at(mm, addr, ptep, pteval);
 }

+#define set_ptes set_ptes
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+                           pte_t *ptep, pte_t pte, unsigned int nr)
+{
+       for (;;) {
+               set_pte_at(mm, addr, ptep, pte);
+               if (--nr == 0)
+                       break;
+               ptep++;
+               addr += PAGE_SIZE;
+               pte = __pte(pte_val(pte) + (1 << _PAGE_PFN_SHIFT));
+       }
+}
+
 static inline void pte_clear(struct mm_struct *mm,
        unsigned long addr, pte_t *ptep)
 {

Thanks,

Alex
Re: API for setting multiple PTEs at once
On Wed, Feb 08, 2023 at 08:09:00PM +0800, Yin, Fengwei wrote:
>
>
> On 2/8/2023 7:23 PM, Alexandre Ghiti wrote:
> > Hi Matthew,
> >
> > On 2/7/23 21:27, Matthew Wilcox wrote:
> >> On Thu, Feb 02, 2023 at 09:14:23PM +, Matthew Wilcox wrote:
> >>> For those of you not subscribed, linux-mm is currently discussing
> >>> how best to handle page faults on large folios. I simply made it work
> >>> when adding large folio support. Now Yin Fengwei is working on
> >>> making it fast.
> >> OK, here's an actual implementation:
> >>
> >> https://lore.kernel.org/linux-mm/20230207194937.122543-3-wi...@infradead.org/
> >>
> >> It survives a run of xfstests. If your architecture doesn't store its
> >> PFNs at PAGE_SHIFT, you're going to want to implement your own set_ptes(),
> >
> >
> > riscv stores its pfn at _PAGE_PFN_SHIFT instead of PAGE_SHIFT, so we need to
> > reimplement set_ptes. But I have been playing with your patchset and we
> > never fall into the case where set_ptes is called with nr > 1, any idea
> > why? I booted a large ubuntu defconfig and launched
> > will_it_scale.page_fault4.
> Need to use xfs filesystem to get large folio for file mapping.
> Other filesystem may be also OK. But I just tried xfs. Thanks.

XFS is certainly the flagship filesystem to support large folios, but
others have added support, including AFS and EROFS. You can also get large
folios in tmpfs (which is slightly different as it focuses on THPs
rather than generic large folios).

You also have to have CONFIG_TRANSPARENT_HUGEPAGE selected, which riscv
can do. That restriction will be lifted at some point, but for now
large folios depend on the THP infrastructure.
Re: API for setting multiple PTEs at once
On 2/8/2023 7:23 PM, Alexandre Ghiti wrote:
> Hi Matthew,
>
> On 2/7/23 21:27, Matthew Wilcox wrote:
>> On Thu, Feb 02, 2023 at 09:14:23PM +, Matthew Wilcox wrote:
>>> For those of you not subscribed, linux-mm is currently discussing
>>> how best to handle page faults on large folios. I simply made it work
>>> when adding large folio support. Now Yin Fengwei is working on
>>> making it fast.
>> OK, here's an actual implementation:
>>
>> https://lore.kernel.org/linux-mm/20230207194937.122543-3-wi...@infradead.org/
>>
>> It survives a run of xfstests. If your architecture doesn't store its
>> PFNs at PAGE_SHIFT, you're going to want to implement your own set_ptes(),
>
>
> riscv stores its pfn at _PAGE_PFN_SHIFT instead of PAGE_SHIFT, so we need to
> reimplement set_ptes. But I have been playing with your patchset and we never
> fall into the case where set_ptes is called with nr > 1, any idea why? I
> booted a large ubuntu defconfig and launched will_it_scale.page_fault4.

You need to use the xfs filesystem to get large folios for file mappings.
Other filesystems may also be OK, but I have only tried xfs. Thanks.

Regards
Yin, Fengwei

>
> I'll come up with the proper implementation of set_ptes anyway soon.
>
> Thanks,
>
> Alex
>
>
>> or you'll see entirely the wrong pages mapped into userspace. You may
>> also wish to implement set_ptes() if it can be done more efficiently
>> than __pte(pte_val(pte) + PAGE_SIZE).
>>
>> Architectures that implement things like flush_icache_page() and
>> update_mmu_cache() may want to propose batched versions of those.
>> That's alpha, csky, m68k, mips, nios2, parisc, sh,
>> arm, loongarch, openrisc, powerpc, riscv, sparc and xtensa.
>> Maintainers BCC'd, mailing lists CC'd.
>>
>> I'm happy to collect implementations and submit them as part of a v6.
>>
>> ___
>> linux-riscv mailing list
>> linux-ri...@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-riscv
Re: API for setting multiple PTEs at once
Hi Matthew,

On 2/7/23 21:27, Matthew Wilcox wrote:
> On Thu, Feb 02, 2023 at 09:14:23PM +, Matthew Wilcox wrote:
>> For those of you not subscribed, linux-mm is currently discussing
>> how best to handle page faults on large folios. I simply made it work
>> when adding large folio support. Now Yin Fengwei is working on
>> making it fast.
> OK, here's an actual implementation:
>
> https://lore.kernel.org/linux-mm/20230207194937.122543-3-wi...@infradead.org/
>
> It survives a run of xfstests. If your architecture doesn't store its
> PFNs at PAGE_SHIFT, you're going to want to implement your own set_ptes(),

riscv stores its pfn at _PAGE_PFN_SHIFT instead of PAGE_SHIFT, so we need to
reimplement set_ptes. But I have been playing with your patchset and we never
fall into the case where set_ptes is called with nr > 1, any idea why? I
booted a large ubuntu defconfig and launched will_it_scale.page_fault4.

I'll come up with the proper implementation of set_ptes anyway soon.

Thanks,

Alex

> or you'll see entirely the wrong pages mapped into userspace. You may
> also wish to implement set_ptes() if it can be done more efficiently
> than __pte(pte_val(pte) + PAGE_SIZE).
>
> Architectures that implement things like flush_icache_page() and
> update_mmu_cache() may want to propose batched versions of those.
> That's alpha, csky, m68k, mips, nios2, parisc, sh,
> arm, loongarch, openrisc, powerpc, riscv, sparc and xtensa.
> Maintainers BCC'd, mailing lists CC'd.
>
> I'm happy to collect implementations and submit them as part of a v6.
Re: API for setting multiple PTEs at once
On Thu, Feb 02, 2023 at 09:14:23PM +, Matthew Wilcox wrote:
> For those of you not subscribed, linux-mm is currently discussing
> how best to handle page faults on large folios. I simply made it work
> when adding large folio support. Now Yin Fengwei is working on
> making it fast.

OK, here's an actual implementation:

https://lore.kernel.org/linux-mm/20230207194937.122543-3-wi...@infradead.org/

It survives a run of xfstests. If your architecture doesn't store its
PFNs at PAGE_SHIFT, you're going to want to implement your own set_ptes(),
or you'll see entirely the wrong pages mapped into userspace. You may
also wish to implement set_ptes() if it can be done more efficiently
than __pte(pte_val(pte) + PAGE_SIZE).

Architectures that implement things like flush_icache_page() and
update_mmu_cache() may want to propose batched versions of those.
That's alpha, csky, m68k, mips, nios2, parisc, sh,
arm, loongarch, openrisc, powerpc, riscv, sparc and xtensa.
Maintainers BCC'd, mailing lists CC'd.

I'm happy to collect implementations and submit them as part of a v6.
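The warning about "entirely the wrong pages" can be checked with a little arithmetic. The sketch below is plain userspace C and assumes a layout with 4 KiB pages and a PFN field starting at bit 10. The `DEMO_` constants and `demo_` helpers are hypothetical names for illustration and are not taken from any kernel header.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants (assumptions, not kernel definitions). */
#define DEMO_PAGE_SHIFT 12u                              /* 4 KiB pages */
#define DEMO_PAGE_SIZE  (UINT64_C(1) << DEMO_PAGE_SHIFT)
#define DEMO_PFN_SHIFT  10u                              /* PFN starts at bit 10 */
#define DEMO_FLAGS_MASK ((UINT64_C(1) << DEMO_PFN_SHIFT) - 1)

/* Build a raw PTE value from a PFN and low flag bits. */
uint64_t demo_mk_pte(uint64_t pfn, uint64_t flags)
{
    return (pfn << DEMO_PFN_SHIFT) | (flags & DEMO_FLAGS_MASK);
}

/* Extract the PFN from a raw PTE value. */
uint64_t demo_pte_pfn(uint64_t pteval)
{
    return pteval >> DEMO_PFN_SHIFT;
}

/* Generic step: only correct when the PFN field starts at PAGE_SHIFT. */
uint64_t demo_next_generic(uint64_t pteval)
{
    return pteval + DEMO_PAGE_SIZE;
}

/* Layout-aware step: advances the PFN field by exactly one. */
uint64_t demo_next_arch(uint64_t pteval)
{
    return pteval + (UINT64_C(1) << DEMO_PFN_SHIFT);
}
```

Under these assumptions, adding `PAGE_SIZE` to a PTE for PFN 0x1000 yields PFN 0x1004 instead of 0x1001, because 2^12 / 2^10 = 4. Each successive entry would therefore skip three pages, which is why such an architecture needs its own step.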