Re: [PATCH v2 1/3] arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions
Hi Arnd,

Arnd Bergmann writes:
> From: Arnd Bergmann
>
> These four architectures define the same Kconfig symbols for configuring
> the page size. Move the logic into a common place where it can be shared
> with all other architectures.
>
> Signed-off-by: Arnd Bergmann
> ---
> Changes from v1:
> - improve Kconfig help texts
> - fix Hexagon Kconfig
>
> arch/Kconfig                      | 92 ++-
> arch/hexagon/Kconfig              | 24 ++--
> arch/hexagon/include/asm/page.h   |  6 +-
> arch/loongarch/Kconfig            | 21 ++-
> arch/loongarch/include/asm/page.h | 10 +---
> arch/mips/Kconfig                 | 58 ++-
> arch/mips/include/asm/page.h      | 16 +-
> arch/sh/include/asm/page.h        | 13 +
> arch/sh/mm/Kconfig                | 42 --
> 9 files changed, 121 insertions(+), 161 deletions(-)

There's a few "help" lines missing, which breaks the build:

  arch/Kconfig:1134: syntax error
  arch/Kconfig:1133: invalid statement
  arch/Kconfig:1134: invalid statement
  arch/Kconfig:1135:warning: ignoring unsupported character '.'
  arch/Kconfig:1135:warning: ignoring unsupported character '.'
  arch/Kconfig:1135: invalid statement
  arch/Kconfig:1136: invalid statement
  arch/Kconfig:1137:warning: ignoring unsupported character '.'
  arch/Kconfig:1137: invalid statement
  arch/Kconfig:1143: syntax error
  arch/Kconfig:1142: invalid statement
  arch/Kconfig:1143: invalid statement
  arch/Kconfig:1144:warning: ignoring unsupported character '.'
  arch/Kconfig:1144: invalid statement
  arch/Kconfig:1145: invalid statement
  arch/Kconfig:1146: invalid statement
  arch/Kconfig:1147: invalid statement
  arch/Kconfig:1148:warning: ignoring unsupported character '.'
  arch/Kconfig:1148: invalid statement
  make[4]: *** [../scripts/kconfig/Makefile:85: syncconfig] Error 1

Fixup diff is:

diff --git a/arch/Kconfig b/arch/Kconfig
index 56d45a75f625..f2295fa3b48c 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1130,6 +1130,7 @@ config PAGE_SIZE_16KB
 config PAGE_SIZE_32KB
 	bool "32KiB pages"
 	depends on HAVE_PAGE_SIZE_32KB
+	help
 	  Using 32KiB page size will result in slightly higher performance
 	  kernel at the price of higher memory consumption compared to
 	  16KiB pages. This option is available only on cnMIPS cores.
@@ -1139,6 +1140,7 @@ config PAGE_SIZE_32KB
 config PAGE_SIZE_64KB
 	bool "64KiB pages"
 	depends on HAVE_PAGE_SIZE_64KB
+	help
 	  Using 64KiB page size will result in slightly higher performance
 	  kernel at the price of much higher memory consumption compared to
 	  4KiB or 16KiB pages.

cheers
Re: [v2 PATCH 0/3] arch: mm, vdso: consolidate PAGE_SIZE definition
On Wed, Mar 06 2024 at 15:14, Arnd Bergmann wrote:
> From: Arnd Bergmann
>
> Naresh noticed that the newly added usage of the PAGE_SIZE macro in
> include/vdso/datapage.h introduced a build regression. I had an older
> patch that I revived to have this defined through Kconfig rather than
> through including asm/page.h, which is not allowed in vdso code.
>
> The vdso patch series now has a temporary workaround, but I still want to
> get this into v6.9 so we can replace the hack with CONFIG_PAGE_SIZE
> in the vdso.

Thank you for cleaning this up!

tglx
Re: [PATCH v2 3/3] arch: define CONFIG_PAGE_SIZE_*KB on all architectures
On Wed, Mar 06 2024 at 15:14, Arnd Bergmann wrote:
> From: Arnd Bergmann
>
> Most architectures only support a single hardcoded page size. In order
> to ensure that each one of these sets the corresponding Kconfig symbols,
> change over the PAGE_SHIFT definition to the common one and allow
> only the hardware page size to be selected.
>
> Acked-by: Guo Ren
> Acked-by: Heiko Carstens
> Acked-by: Stafford Horne
> Acked-by: Johannes Berg
> Signed-off-by: Arnd Bergmann

Reviewed-by: Thomas Gleixner
Re: [PATCH v2 2/3] arch: simplify architecture specific page size configuration
On Wed, Mar 06 2024 at 15:14, Arnd Bergmann wrote:
> From: Arnd Bergmann
>
> arc, arm64, parisc and powerpc all have their own Kconfig symbols
> in place of the common CONFIG_PAGE_SIZE_4KB symbols. Change these
> so the common symbols are the ones that are actually used, while
> leaving the architecture specific ones as the user visible
> place for configuring it, to avoid breaking user configs.
>
> Reviewed-by: Christophe Leroy (powerpc32)
> Acked-by: Catalin Marinas
> Acked-by: Helge Deller # parisc
> Signed-off-by: Arnd Bergmann

Reviewed-by: Thomas Gleixner
Re: [PATCH v2 1/3] arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions
On Wed, Mar 06 2024 at 15:14, Arnd Bergmann wrote:
> From: Arnd Bergmann
>
> These four architectures define the same Kconfig symbols for configuring
> the page size. Move the logic into a common place where it can be shared
> with all other architectures.
>
> Signed-off-by: Arnd Bergmann

Reviewed-by: Thomas Gleixner
Re: [PATCH v2 3/3] arch: define CONFIG_PAGE_SIZE_*KB on all architectures
On Wed, Mar 6, 2024 at 3:15 PM Arnd Bergmann wrote:
> From: Arnd Bergmann
>
> Most architectures only support a single hardcoded page size. In order
> to ensure that each one of these sets the corresponding Kconfig symbols,
> change over the PAGE_SHIFT definition to the common one and allow
> only the hardware page size to be selected.
>
> Acked-by: Guo Ren
> Acked-by: Heiko Carstens
> Acked-by: Stafford Horne
> Acked-by: Johannes Berg
> Signed-off-by: Arnd Bergmann
> ---
> No changes from v1

> arch/m68k/Kconfig            | 3 +++
> arch/m68k/Kconfig.cpu        | 2 ++
> arch/m68k/include/asm/page.h | 6 +-

Acked-by: Geert Uytterhoeven

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
[PATCH v2 3/3] arch: define CONFIG_PAGE_SIZE_*KB on all architectures
From: Arnd Bergmann Most architectures only support a single hardcoded page size. In order to ensure that each one of these sets the corresponding Kconfig symbols, change over the PAGE_SHIFT definition to the common one and allow only the hardware page size to be selected. Acked-by: Guo Ren Acked-by: Heiko Carstens Acked-by: Stafford Horne Acked-by: Johannes Berg Signed-off-by: Arnd Bergmann --- No changes from v1 arch/alpha/Kconfig | 1 + arch/alpha/include/asm/page.h | 2 +- arch/arm/Kconfig | 1 + arch/arm/include/asm/page.h| 2 +- arch/csky/Kconfig | 1 + arch/csky/include/asm/page.h | 2 +- arch/m68k/Kconfig | 3 +++ arch/m68k/Kconfig.cpu | 2 ++ arch/m68k/include/asm/page.h | 6 +- arch/microblaze/Kconfig| 1 + arch/microblaze/include/asm/page.h | 2 +- arch/nios2/Kconfig | 1 + arch/nios2/include/asm/page.h | 2 +- arch/openrisc/Kconfig | 1 + arch/openrisc/include/asm/page.h | 2 +- arch/riscv/Kconfig | 1 + arch/riscv/include/asm/page.h | 2 +- arch/s390/Kconfig | 1 + arch/s390/include/asm/page.h | 2 +- arch/sparc/Kconfig | 2 ++ arch/sparc/include/asm/page_32.h | 2 +- arch/sparc/include/asm/page_64.h | 3 +-- arch/um/Kconfig| 1 + arch/um/include/asm/page.h | 2 +- arch/x86/Kconfig | 1 + arch/x86/include/asm/page_types.h | 2 +- arch/xtensa/Kconfig| 1 + arch/xtensa/include/asm/page.h | 2 +- 28 files changed, 32 insertions(+), 19 deletions(-) diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig index d6968d090d49..4f490250d323 100644 --- a/arch/alpha/Kconfig +++ b/arch/alpha/Kconfig @@ -14,6 +14,7 @@ config ALPHA select PCI_DOMAINS if PCI select PCI_SYSCALL if PCI select HAVE_ASM_MODVERSIONS + select HAVE_PAGE_SIZE_8KB select HAVE_PCSPKR_PLATFORM select HAVE_PERF_EVENTS select NEED_DMA_MAP_STATE diff --git a/arch/alpha/include/asm/page.h b/arch/alpha/include/asm/page.h index 4db1ebc0ed99..70419e6be1a3 100644 --- a/arch/alpha/include/asm/page.h +++ b/arch/alpha/include/asm/page.h @@ -6,7 +6,7 @@ #include /* PAGE_SHIFT determines the page size */ -#define PAGE_SHIFT 13 +#define 
PAGE_SHIFT CONFIG_PAGE_SHIFT #define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT) #define PAGE_MASK (~(PAGE_SIZE-1)) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 0af6709570d1..9d52ba3a8ad1 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -116,6 +116,7 @@ config ARM select HAVE_MOD_ARCH_SPECIFIC select HAVE_NMI select HAVE_OPTPROBES if !THUMB2_KERNEL + select HAVE_PAGE_SIZE_4KB select HAVE_PCI if MMU select HAVE_PERF_EVENTS select HAVE_PERF_REGS diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h index 119aa85d1feb..62af9f7f9e96 100644 --- a/arch/arm/include/asm/page.h +++ b/arch/arm/include/asm/page.h @@ -8,7 +8,7 @@ #define _ASMARM_PAGE_H /* PAGE_SHIFT determines the page size */ -#define PAGE_SHIFT 12 +#define PAGE_SHIFT CONFIG_PAGE_SHIFT #define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT) #define PAGE_MASK (~((1 << PAGE_SHIFT) - 1)) diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig index cf2a6fd7dff8..9c2723ab1c94 100644 --- a/arch/csky/Kconfig +++ b/arch/csky/Kconfig @@ -89,6 +89,7 @@ config CSKY select HAVE_KPROBES if !CPU_CK610 select HAVE_KPROBES_ON_FTRACE if !CPU_CK610 select HAVE_KRETPROBES if !CPU_CK610 + select HAVE_PAGE_SIZE_4KB select HAVE_PERF_EVENTS select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP diff --git a/arch/csky/include/asm/page.h b/arch/csky/include/asm/page.h index 866855e1ab43..0ca6c408c07f 100644 --- a/arch/csky/include/asm/page.h +++ b/arch/csky/include/asm/page.h @@ -10,7 +10,7 @@ /* * PAGE_SHIFT determines the page size: 4KB */ -#define PAGE_SHIFT 12 +#define PAGE_SHIFT CONFIG_PAGE_SHIFT #define PAGE_SIZE (_AC(1, UL) << PAGE_SHIFT) #define PAGE_MASK (~(PAGE_SIZE - 1)) #define THREAD_SIZE(PAGE_SIZE * 2) diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig index 4b3e93cac723..7b709453d5e7 100644 --- a/arch/m68k/Kconfig +++ b/arch/m68k/Kconfig @@ -84,12 +84,15 @@ config MMU config MMU_MOTOROLA bool + select HAVE_PAGE_SIZE_4KB config MMU_COLDFIRE + select HAVE_PAGE_SIZE_8KB bool config MMU_SUN3 bool + select 
HAVE_PAGE_SIZE_8KB depends on MMU && !MMU_MOTOROLA && !MMU_COLDFIRE config ARCH_SUPPORTS_KEXEC diff --git a/arch/m68k/Kconfig.cpu b/arch/m68k/Kconfig.cpu index 9dcf245c9cbf..c777a129768a 100644 --- a/arch/m68k/Kconfig.cpu +++ b/arch/m68k/Kconfig.cpu @@ -30,6 +30,7 @@ config COLDFIRE select GENERIC_CSUM select GPIOLIB
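For readers following the header hunks in this patch: once PAGE_SHIFT is taken from CONFIG_PAGE_SHIFT, the PAGE_SIZE and PAGE_MASK definitions shown in the context lines follow from it mechanically. Below is a minimal user-space sketch of that relationship. It assumes a stand-in CONFIG_PAGE_SHIFT of 13 (the 8KiB Alpha case) and uses plain 1UL in place of the kernel's _AC(1,UL); the three helper functions are hypothetical names for illustration and are not part of the patch.

```c
/*
 * Stand-in for the value Kconfig would generate; 13 corresponds to the
 * 8KiB page size that the Alpha hunk selects. This is an assumption for
 * the sketch, not something the build system provides in user space.
 */
#ifndef CONFIG_PAGE_SHIFT
#define CONFIG_PAGE_SHIFT 13
#endif

/* Same shape as the per-architecture definitions touched by the patch. */
#define PAGE_SHIFT CONFIG_PAGE_SHIFT
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Hypothetical helper: round an address down to the start of its page. */
static inline unsigned long page_align_down(unsigned long addr)
{
	return addr & PAGE_MASK;
}

/* Hypothetical helper: round an address up to the next page boundary. */
static inline unsigned long page_align_up(unsigned long addr)
{
	return (addr + PAGE_SIZE - 1) & PAGE_MASK;
}

/* Hypothetical helper: byte offset of an address within its page. */
static inline unsigned long page_offset_of(unsigned long addr)
{
	return addr & ~PAGE_MASK;
}
```

Building the same file with -DCONFIG_PAGE_SHIFT=12 gives the 4KiB variants, which is the point of routing every architecture through one symbol.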
[PATCH v2 2/3] arch: simplify architecture specific page size configuration
From: Arnd Bergmann

arc, arm64, parisc and powerpc all have their own Kconfig symbols
in place of the common CONFIG_PAGE_SIZE_4KB symbols. Change these
so the common symbols are the ones that are actually used, while
leaving the architecture specific ones as the user visible
place for configuring it, to avoid breaking user configs.

Reviewed-by: Christophe Leroy (powerpc32)
Acked-by: Catalin Marinas
Acked-by: Helge Deller # parisc
Signed-off-by: Arnd Bergmann
---
No changes from v1

 arch/arc/Kconfig                  |  3 +++
 arch/arc/include/uapi/asm/page.h  |  6 ++
 arch/arm64/Kconfig                | 29 +
 arch/arm64/include/asm/page-def.h |  2 +-
 arch/parisc/Kconfig               |  3 +++
 arch/parisc/include/asm/page.h    | 10 +-
 arch/powerpc/Kconfig              | 31 ++-
 arch/powerpc/include/asm/page.h   |  2 +-
 scripts/gdb/linux/constants.py.in |  2 +-
 scripts/gdb/linux/mm.py           |  2 +-
 10 files changed, 32 insertions(+), 58 deletions(-)

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index 1b0483c51cc1..4092bec198be 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -284,14 +284,17 @@ choice

 config ARC_PAGE_SIZE_8K
 	bool "8KB"
+	select HAVE_PAGE_SIZE_8KB
 	help
 	  Choose between 8k vs 16k

 config ARC_PAGE_SIZE_16K
+	select HAVE_PAGE_SIZE_16KB
 	bool "16KB"

 config ARC_PAGE_SIZE_4K
 	bool "4KB"
+	select HAVE_PAGE_SIZE_4KB
 	depends on ARC_MMU_V3 || ARC_MMU_V4

 endchoice
diff --git a/arch/arc/include/uapi/asm/page.h b/arch/arc/include/uapi/asm/page.h
index 2a4ad619abfb..7fd9e741b527 100644
--- a/arch/arc/include/uapi/asm/page.h
+++ b/arch/arc/include/uapi/asm/page.h
@@ -13,10 +13,8 @@
 #include

 /* PAGE_SHIFT determines the page size */
-#if defined(CONFIG_ARC_PAGE_SIZE_16K)
-#define PAGE_SHIFT 14
-#elif defined(CONFIG_ARC_PAGE_SIZE_4K)
-#define PAGE_SHIFT 12
+#ifdef __KERNEL__
+#define PAGE_SHIFT CONFIG_PAGE_SHIFT
 #else
 /*
  * Default 8k
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index aa7c1d435139..29290b8cb36d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -277,27 +277,21 @@ config 64BIT
 config MMU
 	def_bool y

-config
ARM64_PAGE_SHIFT - int - default 16 if ARM64_64K_PAGES - default 14 if ARM64_16K_PAGES - default 12 - config ARM64_CONT_PTE_SHIFT int - default 5 if ARM64_64K_PAGES - default 7 if ARM64_16K_PAGES + default 5 if PAGE_SIZE_64KB + default 7 if PAGE_SIZE_16KB default 4 config ARM64_CONT_PMD_SHIFT int - default 5 if ARM64_64K_PAGES - default 5 if ARM64_16K_PAGES + default 5 if PAGE_SIZE_64KB + default 5 if PAGE_SIZE_16KB default 4 config ARCH_MMAP_RND_BITS_MIN - default 14 if ARM64_64K_PAGES - default 16 if ARM64_16K_PAGES + default 14 if PAGE_SIZE_64KB + default 16 if PAGE_SIZE_16KB default 18 # max bits determined by the following formula: @@ -1259,11 +1253,13 @@ choice config ARM64_4K_PAGES bool "4KB" + select HAVE_PAGE_SIZE_4KB help This feature enables 4KB pages support. config ARM64_16K_PAGES bool "16KB" + select HAVE_PAGE_SIZE_16KB help The system will use 16KB pages support. AArch32 emulation requires applications compiled with 16K (or a multiple of 16K) @@ -1271,6 +1267,7 @@ config ARM64_16K_PAGES config ARM64_64K_PAGES bool "64KB" + select HAVE_PAGE_SIZE_64KB help This feature enables 64KB pages support (4KB by default) allowing only two levels of page tables and faster TLB @@ -1291,19 +1288,19 @@ choice config ARM64_VA_BITS_36 bool "36-bit" if EXPERT - depends on ARM64_16K_PAGES + depends on PAGE_SIZE_16KB config ARM64_VA_BITS_39 bool "39-bit" - depends on ARM64_4K_PAGES + depends on PAGE_SIZE_4KB config ARM64_VA_BITS_42 bool "42-bit" - depends on ARM64_64K_PAGES + depends on PAGE_SIZE_64KB config ARM64_VA_BITS_47 bool "47-bit" - depends on ARM64_16K_PAGES + depends on PAGE_SIZE_16KB config ARM64_VA_BITS_48 bool "48-bit" diff --git a/arch/arm64/include/asm/page-def.h b/arch/arm64/include/asm/page-def.h index 2403f7b4cdbf..792e9fe881dc 100644 --- a/arch/arm64/include/asm/page-def.h +++ b/arch/arm64/include/asm/page-def.h @@ -11,7 +11,7 @@ #include /* PAGE_SHIFT determines the page size */ -#define PAGE_SHIFT CONFIG_ARM64_PAGE_SHIFT +#define PAGE_SHIFT 
CONFIG_PAGE_SHIFT #define PAGE_SIZE (_AC(1, UL) << PAGE_SHIFT) #define PAGE_MASK (~(PAGE_SIZE-1)) diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig index 5c845e8d59d9..b180e684fa0d 100644 --- a/arch/parisc/Kconfig +++ b/arch/parisc/Kconfig @@ -273,6 +273,7 @@ choice config PARISC_PAGE_SIZE_4KB
[PATCH v2 1/3] arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions
From: Arnd Bergmann These four architectures define the same Kconfig symbols for configuring the page size. Move the logic into a common place where it can be shared with all other architectures. Signed-off-by: Arnd Bergmann --- Changes from v1: - improve Kconfig help texts - fix Hexagon Kconfig arch/Kconfig | 92 ++- arch/hexagon/Kconfig | 24 ++-- arch/hexagon/include/asm/page.h | 6 +- arch/loongarch/Kconfig| 21 ++- arch/loongarch/include/asm/page.h | 10 +--- arch/mips/Kconfig | 58 ++- arch/mips/include/asm/page.h | 16 +- arch/sh/include/asm/page.h| 13 + arch/sh/mm/Kconfig| 42 -- 9 files changed, 121 insertions(+), 161 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index a5af0edd3eb8..c63034e092d0 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -1078,17 +1078,105 @@ config HAVE_ARCH_COMPAT_MMAP_BASES and vice-versa 32-bit applications to call 64-bit mmap(). Required for applications doing different bitness syscalls. +config HAVE_PAGE_SIZE_4KB + bool + +config HAVE_PAGE_SIZE_8KB + bool + +config HAVE_PAGE_SIZE_16KB + bool + +config HAVE_PAGE_SIZE_32KB + bool + +config HAVE_PAGE_SIZE_64KB + bool + +config HAVE_PAGE_SIZE_256KB + bool + +choice + prompt "MMU page size" + +config PAGE_SIZE_4KB + bool "4KiB pages" + depends on HAVE_PAGE_SIZE_4KB + help + This option select the standard 4KiB Linux page size and the only + available option on many architectures. Using 4KiB page size will + minimize memory consumption and is therefore recommended for low + memory systems. + Some software that is written for x86 systems makes incorrect + assumptions about the page size and only runs on 4KiB pages. + +config PAGE_SIZE_8KB + bool "8KiB pages" + depends on HAVE_PAGE_SIZE_8KB + help + This option is the only supported page size on a few older + processors, and can be slightly faster than 4KiB pages. 
+ +config PAGE_SIZE_16KB + bool "16KiB pages" + depends on HAVE_PAGE_SIZE_16KB + help + This option is usually a good compromise between memory + consumption and performance for typical desktop and server + workloads, often saving a level of page table lookups compared + to 4KB pages as well as reducing TLB pressure and overhead of + per-page operations in the kernel at the expense of a larger + page cache. + +config PAGE_SIZE_32KB + bool "32KiB pages" + depends on HAVE_PAGE_SIZE_32KB + Using 32KiB page size will result in slightly higher performance + kernel at the price of higher memory consumption compared to + 16KiB pages. This option is available only on cnMIPS cores. + Note that you will need a suitable Linux distribution to + support this. + +config PAGE_SIZE_64KB + bool "64KiB pages" + depends on HAVE_PAGE_SIZE_64KB + Using 64KiB page size will result in slightly higher performance + kernel at the price of much higher memory consumption compared to + 4KiB or 16KiB pages. + This is not suitable for general-purpose workloads but the + better performance may be worth the cost for certain types of + supercomputing or database applications that work mostly with + large in-memory data rather than small files. + +config PAGE_SIZE_256KB + bool "256KiB pages" + depends on HAVE_PAGE_SIZE_256KB + help + 256KiB pages have little practical value due to their extreme + memory usage. The kernel will only be able to run applications + that have been compiled with '-zmax-page-size' set to 256KiB + (the default is 64KiB or 4KiB on most architectures). 
+ +endchoice + config PAGE_SIZE_LESS_THAN_64KB def_bool y - depends on !ARM64_64K_PAGES depends on !PAGE_SIZE_64KB - depends on !PARISC_PAGE_SIZE_64KB depends on PAGE_SIZE_LESS_THAN_256KB config PAGE_SIZE_LESS_THAN_256KB def_bool y depends on !PAGE_SIZE_256KB +config PAGE_SHIFT + int + default 12 if PAGE_SIZE_4KB + default 13 if PAGE_SIZE_8KB + default 14 if PAGE_SIZE_16KB + default 15 if PAGE_SIZE_32KB + default 16 if PAGE_SIZE_64KB + default 18 if PAGE_SIZE_256KB + # This allows to use a set of generic functions to determine mmap base # address by giving priority to top-down scheme only if the process # is not in legacy mode (compat task, unlimited stack size or diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig index a880ee067d2e..1414052e7d6b 100644 --- a/arch/hexagon/Kconfig +++ b/arch/hexagon/Kconfig @@ -8,6 +8,10 @@ config HEXAGON select ARCH_HAS_SYNC_DMA_FOR_DEVICE select
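The PAGE_SHIFT defaults added to arch/Kconfig in the hunk above amount to a lookup from the selected page size to its base-2 logarithm. The following C sketch restates that table so the relationship can be checked outside Kconfig. The function names page_shift_for_kib and shift_matches_size are hypothetical and do not exist in the kernel; only the size/shift pairs are taken from the patch.

```c
/*
 * Mirror of the "config PAGE_SHIFT" defaults from the patch:
 * maps a page size in KiB to the shift value Kconfig would pick.
 * Returns -1 for sizes that have no PAGE_SIZE_*KB symbol.
 */
static int page_shift_for_kib(unsigned int kib)
{
	switch (kib) {
	case 4:   return 12;
	case 8:   return 13;
	case 16:  return 14;
	case 32:  return 15;
	case 64:  return 16;
	case 256: return 18;
	default:  return -1;
	}
}

/*
 * Sanity check: a listed size must satisfy (1 << shift) == size in bytes.
 * Returns 1 when the table entry is consistent, 0 otherwise.
 */
static int shift_matches_size(unsigned int kib)
{
	int shift = page_shift_for_kib(kib);

	if (shift < 0)
		return 0;
	return (1UL << shift) == (unsigned long)kib * 1024UL;
}
```

Note that 128KiB is absent from the table, matching the patch, which defines no HAVE_PAGE_SIZE_128KB symbol.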
[v2 PATCH 0/3] arch: mm, vdso: consolidate PAGE_SIZE definition
From: Arnd Bergmann

Naresh noticed that the newly added usage of the PAGE_SIZE macro in
include/vdso/datapage.h introduced a build regression. I had an older
patch that I revived to have this defined through Kconfig rather than
through including asm/page.h, which is not allowed in vdso code.

The vdso patch series now has a temporary workaround, but I still want to
get this into v6.9 so we can replace the hack with CONFIG_PAGE_SIZE
in the vdso.

I've applied this to the asm-generic tree already, please let me know if
there are still remaining issues. It's really close to the merge window
already, so I'd probably give this a few more days before I send a pull
request, or defer it to v6.10 if anything goes wrong.

Sorry for the delay, I was still waiting to resolve the m68k question,
but there were no further replies in the end, so I kept my original
version.

Changes from v1:
 - improve Kconfig help texts
 - remove an extraneous line in hexagon

      Arnd

Link: https://lore.kernel.org/lkml/ca+g9fytrxxm_ko9fnpz3xarxhv7ud_yqp-teupqrnrhu+_0...@mail.gmail.com/
Link: https://lore.kernel.org/all/65dc6c14.170a0220.f4a3f.9...@mx.google.com/
Link: https://lore.kernel.org/lkml/20240226161414.2316610-1-a...@kernel.org/

Arnd Bergmann (3):
  arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions
  arch: simplify architecture specific page size configuration
  arch: define CONFIG_PAGE_SIZE_*KB on all architectures

 arch/Kconfig                       | 92 +-
 arch/alpha/Kconfig                 |  1 +
 arch/alpha/include/asm/page.h      |  2 +-
 arch/arc/Kconfig                   |  3 +
 arch/arc/include/uapi/asm/page.h   |  6 +-
 arch/arm/Kconfig                   |  1 +
 arch/arm/include/asm/page.h        |  2 +-
 arch/arm64/Kconfig                 | 29 +-
 arch/arm64/include/asm/page-def.h  |  2 +-
 arch/csky/Kconfig                  |  1 +
 arch/csky/include/asm/page.h       |  2 +-
 arch/hexagon/Kconfig               | 24 ++--
 arch/hexagon/include/asm/page.h    |  6 +-
 arch/loongarch/Kconfig             | 21 ++-
 arch/loongarch/include/asm/page.h  | 10 +---
 arch/m68k/Kconfig                  |  3 +
 arch/m68k/Kconfig.cpu              |  2 +
 arch/m68k/include/asm/page.h       |  6 +-
 arch/microblaze/Kconfig            |  1 +
 arch/microblaze/include/asm/page.h |  2 +-
 arch/mips/Kconfig                  | 58 ++-
 arch/mips/include/asm/page.h       | 16 +-
 arch/nios2/Kconfig                 |  1 +
 arch/nios2/include/asm/page.h      |  2 +-
 arch/openrisc/Kconfig              |  1 +
 arch/openrisc/include/asm/page.h   |  2 +-
 arch/parisc/Kconfig                |  3 +
 arch/parisc/include/asm/page.h     | 10 +---
 arch/powerpc/Kconfig               | 31 ++
 arch/powerpc/include/asm/page.h    |  2 +-
 arch/riscv/Kconfig                 |  1 +
 arch/riscv/include/asm/page.h      |  2 +-
 arch/s390/Kconfig                  |  1 +
 arch/s390/include/asm/page.h       |  2 +-
 arch/sh/include/asm/page.h         | 13 +
 arch/sh/mm/Kconfig                 | 42 --
 arch/sparc/Kconfig                 |  2 +
 arch/sparc/include/asm/page_32.h   |  2 +-
 arch/sparc/include/asm/page_64.h   |  3 +-
 arch/um/Kconfig                    |  1 +
 arch/um/include/asm/page.h         |  2 +-
 arch/x86/Kconfig                   |  1 +
 arch/x86/include/asm/page_types.h  |  2 +-
 arch/xtensa/Kconfig                |  1 +
 arch/xtensa/include/asm/page.h     |  2 +-
 scripts/gdb/linux/constants.py.in  |  2 +-
 scripts/gdb/linux/mm.py            |  2 +-
 47 files changed, 185 insertions(+), 238 deletions(-)

-- 
2.39.2

To: Thomas Gleixner
To: Vincenzo Frascino
To: Kees Cook
To: Anna-Maria Behnsen
Cc: Matt Turner
Cc: Vineet Gupta
Cc: Russell King
Cc: Catalin Marinas
Cc: Guo Ren
Cc: Brian Cain
Cc: Huacai Chen
Cc: Geert Uytterhoeven
Cc: Michal Simek
Cc: Thomas Bogendoerfer
Cc: Helge Deller
Cc: Michael Ellerman
Cc: Christophe Leroy
Cc: Palmer Dabbelt
Cc: John Paul Adrian Glaubitz
Cc: Andreas Larsson
Cc: Richard Weinberger
Cc: x...@kernel.org
Cc: Max Filippov
Cc: Andy Lutomirski
Cc: Vincenzo Frascino
Cc: Jan Kiszka
Cc: Kieran Bingham
Cc: Andrew Morton
Cc: Arnd Bergmann
Cc: linux-ker...@vger.kernel.org
Cc: linux-alpha@vger.kernel.org
Cc: linux-snps-...@lists.infradead.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-c...@vger.kernel.org
Cc: linux-hexa...@vger.kernel.org
Cc: loonga...@lists.linux.dev
Cc: linux-m...@lists.linux-m68k.org
Cc: linux-m...@vger.kernel.org
Cc: linux-openr...@vger.kernel.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-ri...@lists.infradead.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linux...@lists.infradead.org
Re: [RFC PATCH 00/14] Introducing TIF_NOTIFY_IPI flag
On Wed, 6 Mar 2024 at 11:18, K Prateek Nayak wrote: > > Hello Vincent, > > Thank you for taking a look at the series. > > On 3/6/2024 3:29 PM, Vincent Guittot wrote: > > Hi Prateek, > > > > Adding Julia who could be interested in this patchset. Your patchset > > should trigger idle load balance instead of newly idle load balance > > now when the polling is used. This was one reason for not migrating > > task in idle CPU > > Thank you. > > > > > On Tue, 20 Feb 2024 at 18:15, K Prateek Nayak > > wrote: > >> > >> Hello everyone, > >> > >> [..snip..] > >> > >> > >> Skipping newidle_balance() > >> == > >> > >> In an earlier attempt to solve the challenge of the long IRQ disabled > >> section, newidle_balance() was skipped when a CPU waking up from idle > >> was found to have no runnable tasks, and was transitioning back to > >> idle [2]. Tim [3] and David [4] had pointed out that newidle_balance() > >> may be viable for CPUs that are idling with tick enabled, where the > >> newidle_balance() has the opportunity to pull tasks onto the idle CPU. > >> > >> Vincent [5] pointed out a case where the idle load kick will fail to > >> run on an idle CPU since the IPI handler launching the ILB will check > >> for need_resched(). In such cases, the idle CPU relies on > >> newidle_balance() to pull tasks towards itself. > > > > Calling newidle_balance() instead of the normal idle load balance > > prevents the CPU to pull tasks from other groups > > Thank you for the correction. > > > > >> > >> Using an alternate flag instead of NEED_RESCHED to indicate a pending > >> IPI was suggested as the correct approach to solve this problem on the > >> same thread. 
> >> > >> > >> Proposed solution: TIF_NOTIFY_IPI > >> = > >> > >> Instead of reusing TIF_NEED_RESCHED bit to pull an TIF_POLLING CPU out > >> of idle, TIF_NOTIFY_IPI is a newly introduced flag that > >> call_function_single_prep_ipi() sets on a target TIF_POLLING CPU to > >> indicate a pending IPI, which the idle CPU promises to process soon. > >> > >> On architectures that do not support the TIF_NOTIFY_IPI flag (this > >> series only adds support for x86 and ARM processors for now), > > > > I'm surprised that you are mentioning ARM processors because they > > don't use TIF_POLLING. > > Yup I just realised that after Linus Walleij pointed it out on the > thread. > > > > >> call_function_single_prep_ipi() will fallback to setting > >> TIF_NEED_RESCHED bit to pull the TIF_POLLING CPU out of idle. > >> > >> Since the pending IPI handlers are processed before the call to > >> schedule_idle() in do_idle(), schedule_idle() will only be called if the > >> IPI handler have woken / migrated a new task on the idle CPU and has set > >> TIF_NEED_RESCHED bit to indicate the same. This avoids running into the > >> long IRQ disabled section in schedule_idle() unnecessarily, and any > >> need_resched() check within a call function will accurately notify if a > >> task is waiting for CPU time on the CPU handling the IPI. > >> > >> Following is the crude visualization of how the situation changes with > >> the newly introduced TIF_NOTIFY_IPI flag: > >> -- > >> CPU0CPU1 > >> > >> do_idle() { > >> > >> __current_set_polling(); > >> ... > >> > >> monitor(addr); > >> if > >> (!need_resched_or_ipi()) > >> > >> mwait() { > >> /* > >> Waiting */ > >> smp_call_function_single(CPU1, func, wait = 1) { > >> ... > >> ... > >> ... > >> set_nr_if_polling(CPU1) { > >> ... > >> /* Realizes CPU1 is polling */ > >> ... > >> try_cmpxchg(addr, > >> ... > >> , > >> ... > >> val | _TIF_NOTIFY_IPI); > >> ... > >> } /* Does not send an IPI */ > >> ... > >> ... 
} > >> /* mwait exit due to write at addr */ > >> csd_lock_wait() { ... > >> /* Waiting */ > >>
Re: [RFC PATCH 00/14] Introducing TIF_NOTIFY_IPI flag
Hello Vincent,

Thank you for taking a look at the series.

On 3/6/2024 3:29 PM, Vincent Guittot wrote:
> Hi Prateek,
>
> Adding Julia who could be interested in this patchset. Your patchset
> should trigger idle load balance instead of newly idle load balance
> now when the polling is used. This was one reason for not migrating
> task in idle CPU

Thank you.

>
> On Tue, 20 Feb 2024 at 18:15, K Prateek Nayak wrote:
>>
>> Hello everyone,
>>
>> [..snip..]
>>
>>
>> Skipping newidle_balance()
>> ==========================
>>
>> In an earlier attempt to solve the challenge of the long IRQ disabled
>> section, newidle_balance() was skipped when a CPU waking up from idle
>> was found to have no runnable tasks, and was transitioning back to
>> idle [2]. Tim [3] and David [4] had pointed out that newidle_balance()
>> may be viable for CPUs that are idling with tick enabled, where the
>> newidle_balance() has the opportunity to pull tasks onto the idle CPU.
>>
>> Vincent [5] pointed out a case where the idle load kick will fail to
>> run on an idle CPU since the IPI handler launching the ILB will check
>> for need_resched(). In such cases, the idle CPU relies on
>> newidle_balance() to pull tasks towards itself.
>
> Calling newidle_balance() instead of the normal idle load balance
> prevents the CPU to pull tasks from other groups

Thank you for the correction.

>
>>
>> Using an alternate flag instead of NEED_RESCHED to indicate a pending
>> IPI was suggested as the correct approach to solve this problem on the
>> same thread.
>>
>>
>> Proposed solution: TIF_NOTIFY_IPI
>> =================================
>>
>> Instead of reusing TIF_NEED_RESCHED bit to pull an TIF_POLLING CPU out
>> of idle, TIF_NOTIFY_IPI is a newly introduced flag that
>> call_function_single_prep_ipi() sets on a target TIF_POLLING CPU to
>> indicate a pending IPI, which the idle CPU promises to process soon.
>> >> On architectures that do not support the TIF_NOTIFY_IPI flag (this >> series only adds support for x86 and ARM processors for now), > > I'm surprised that you are mentioning ARM processors because they > don't use TIF_POLLING. Yup I just realised that after Linus Walleij pointed it out on the thread. > >> call_function_single_prep_ipi() will fallback to setting >> TIF_NEED_RESCHED bit to pull the TIF_POLLING CPU out of idle. >> >> Since the pending IPI handlers are processed before the call to >> schedule_idle() in do_idle(), schedule_idle() will only be called if the >> IPI handler have woken / migrated a new task on the idle CPU and has set >> TIF_NEED_RESCHED bit to indicate the same. This avoids running into the >> long IRQ disabled section in schedule_idle() unnecessarily, and any >> need_resched() check within a call function will accurately notify if a >> task is waiting for CPU time on the CPU handling the IPI. >> >> Following is the crude visualization of how the situation changes with >> the newly introduced TIF_NOTIFY_IPI flag: >> -- >> CPU0CPU1 >> >> do_idle() { >> >> __current_set_polling(); >> ... >> >> monitor(addr); >> if >> (!need_resched_or_ipi()) >> >> mwait() { >> /* >> Waiting */ >> smp_call_function_single(CPU1, func, wait = 1) { >>... >> ... >>... >> set_nr_if_polling(CPU1) { >>... >> /* Realizes CPU1 is polling */ >>... >> try_cmpxchg(addr, >>... >> , >>... >> val | _TIF_NOTIFY_IPI); >>... >> } /* Does not send an IPI */ >>... >> ... } /* >> mwait exit due to write at addr */ >> csd_lock_wait() { ... >> /* Waiting */ >> preempt_fold_need_resched(); /* fold if NEED_RESCHED */ >> ... >> __current_clr_polling(); >> ... >> flush_smp_call_function_queue() { >> ...
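The sender-side step in the diagram above (check that the target is polling, then set the notification bit with a compare-and-exchange instead of sending a real IPI) can be sketched in user-space C11 as follows. This is only an illustration of the pattern under stated assumptions, not the series' implementation: the bit positions are made up for the example (the real values are architecture specific), the function set_notify_ipi_if_polling is a hypothetical name, and C11 atomics stand in for the kernel's try_cmpxchg().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define TIF_NEED_RESCHED_BIT (1UL << 0)
#define TIF_POLLING_BIT      (1UL << 1)
#define TIF_NOTIFY_IPI_BIT   (1UL << 2)

/*
 * Sender side: if the target CPU advertises that it is polling on its
 * flags word, set the notify bit and report that no real IPI is needed.
 * Returns false when the target is not polling, in which case the
 * caller would have to send an actual IPI.
 */
static bool set_notify_ipi_if_polling(_Atomic unsigned long *flags)
{
	unsigned long val = atomic_load(flags);

	do {
		if (!(val & TIF_POLLING_BIT))
			return false;	/* not polling: real IPI required */
		if (val & TIF_NOTIFY_IPI_BIT)
			return true;	/* already notified by someone else */
		/* On failure, val is reloaded and the checks run again. */
	} while (!atomic_compare_exchange_weak(flags, &val,
					       val | TIF_NOTIFY_IPI_BIT));

	return true;
}

/*
 * Idle side: the condition the polling loop would test before going
 * back to waiting, modelled on need_resched_or_ipi() in the diagram.
 */
static bool need_resched_or_ipi(_Atomic unsigned long *flags)
{
	return (atomic_load(flags) &
		(TIF_NEED_RESCHED_BIT | TIF_NOTIFY_IPI_BIT)) != 0;
}
```

The key property the sketch tries to show is that the write to the flags word is what wakes the polling CPU, and that TIF_NEED_RESCHED stays clear unless a handler actually queues work.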
Re: [RFC PATCH 00/14] Introducing TIF_NOTIFY_IPI flag
Hi Prateek, Adding Julia who could be interested in this patchset. Your patchset should trigger idle load balance instead of newly idle load balance now when the polling is used. This was one reason for not migrating task in idle CPU On Tue, 20 Feb 2024 at 18:15, K Prateek Nayak wrote: > > Hello everyone, > > Before jumping into the issue, let me clarify the Cc list. Everyone have > been cc'ed on Patch 0 through Patch 3. Respective arch maintainers, > reviewers, and committers returned by scripts/get_maintainer.pl have > been cc'ed on the respective arch side changes. Scheduler and CPU Idle > maintainers and reviewers have been included for the entire series. If I > have missed anyone, please do add them. If you would like to be dropped > from the cc list, wholly or partially, for the future iterations, please > do let me know. > > With that out of the way ... > > Problem statement > = > > When measuring IPI throughput using a modified version of Anton > Blanchard's ipistorm benchmark [1], configured to measure time taken to > perform a fixed number of smp_call_function_single() (with wait set to > 1), an increase in benchmark time was observed between v5.7 and the > current upstream release (v6.7-rc6 at the time of encounter). > > Bisection pointed to commit b2a02fc43a1f ("smp: Optimize > send_call_function_single_ipi()") as the reason behind this increase in > runtime. > > > Experiments > === > > Since the commit cannot be cleanly reverted on top of the current > tip:sched/core, the effects of the optimizations were reverted by: > > 1. Removing the check for call_function_single_prep_ipi() in >send_call_function_single_ipi(). With this change >send_call_function_single_ipi() always calls >arch_send_call_function_single_ipi() > > 2. Removing the call to flush_smp_call_function_queue() in do_idle() >since every smp_call_function, with (1.), would unconditionally send >an IPI to an idle CPU in TIF_POLLING mode. 
> > Following is the diff of the above described changes which will be > henceforth referred to as the "revert": > > diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c > index 31231925f1ec..735184d98c0f 100644 > --- a/kernel/sched/idle.c > +++ b/kernel/sched/idle.c > @@ -332,11 +332,6 @@ static void do_idle(void) > */ > smp_mb__after_atomic(); > > - /* > -* RCU relies on this call to be done outside of an RCU read-side > -* critical section. > -*/ > - flush_smp_call_function_queue(); > schedule_idle(); > > if (unlikely(klp_patch_pending(current))) > diff --git a/kernel/smp.c b/kernel/smp.c > index f085ebcdf9e7..2ff100c41885 100644 > --- a/kernel/smp.c > +++ b/kernel/smp.c > @@ -111,11 +111,9 @@ void __init call_function_init(void) > static __always_inline void > send_call_function_single_ipi(int cpu) > { > - if (call_function_single_prep_ipi(cpu)) { > - trace_ipi_send_cpu(cpu, _RET_IP_, > - > generic_smp_call_function_single_interrupt); > - arch_send_call_function_single_ipi(cpu); > - } > + trace_ipi_send_cpu(cpu, _RET_IP_, > + generic_smp_call_function_single_interrupt); > + arch_send_call_function_single_ipi(cpu); > } > > static __always_inline void > -- > > With the revert, the time taken to complete a fixed set of IPIs using > ipistorm improves significantly. Following are the numbers from a dual > socket 3rd Generation EPYC system (2 x 64C/128T) (boost on, C2 disabled) > running ipistorm between CPU8 and CPU16: > > cmdline: insmod ipistorm.ko numipi=10 single=1 offset=8 cpulist=8 wait=1 > > (tip:sched/core at tag "sched-core-2024-01-08" for all the testing done > below) > > == > Test : ipistorm (modified) > Units : Normalized runtime > Interpretation: Lower is better > Statistic : AMean > == > kernel: time [pct imp] > tip:sched/core1.00 [0.00] > tip:sched/core + revert 0.81 [19.36] > > Although the revert improves ipistorm performance, it also regresses > tbench and netperf, supporting the validity of the optimization. 
> Following are netperf and tbench numbers from the same machine comparing > vanilla tip:sched/core and the revert applied on top: > > == > Test : tbench > Units : Normalized throughput > Interpretation: Higher is better > Statistic : AMean > == > Clients:tip[pct imp](CV) revert[pct imp](CV) > 1 1.00 [ 0.00]( 0.24) 0.91 [ -8.96]( 0.30) > 2 1.00 [ 0.00]( 0.25) 0.92 [ -8.20]( 0.97) > 4 1.00 [ 0.00]( 0.23) 0.91 [ -9.20]( 1.75)