[PATCH v4 10/14] treewide: Use initializer for struct vm_unmapped_area_info
Future changes will need to add a new member to struct vm_unmapped_area_info. This would cause trouble for any call site that doesn't initialize the struct. Currently every caller sets each member manually, so if new ones are added they will be uninitialized and the core code parsing the struct will see garbage in the new member.

It could be possible to initialize the new member manually to 0 at each call site. This and a couple of other options were discussed. Having some struct vm_unmapped_area_info instances not zero initialized will put those sites at risk of feeding garbage into vm_unmapped_area() if the convention is to zero initialize the struct and a new field addition misses a call site that initializes each field manually. So it is useful to do things similarly across the kernel.

The consensus (see links) was that, taking into account both code cleanliness and minimizing the chance of introducing bugs, the best general way to accomplish this was C99 static initialization. As in:

   struct vm_unmapped_area_info info = {};

With this method of initialization, the whole struct will be zero initialized, and any statements setting fields to zero will be unneeded. The change should not leave cleanup work behind at the call sites.

While iterating through the possible solutions, a few archs kindly acked other variations that still zero initialized the struct. These sites have been modified in previous changes using the pattern acked by the respective arch.

So to reduce the chance of bugs via uninitialized fields, perform a tree wide change using the consensus for the best general way to do this change. Use C99 static initializing to zero the struct and remove any statements that simply set members to zero.
Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook Cc: linux...@kvack.org Cc: linux-alpha@vger.kernel.org Cc: linux-snps-...@lists.infradead.org Cc: linux-arm-ker...@lists.infradead.org Cc: linux-c...@vger.kernel.org Cc: loonga...@lists.linux.dev Cc: linux-m...@vger.kernel.org Cc: linux-s...@vger.kernel.org Cc: linux...@vger.kernel.org Cc: sparcli...@vger.kernel.org Link: https://lore.kernel.org/lkml/202402280912.33AEE7A9CF@keescook/#t Link: https://lore.kernel.org/lkml/j7bfvig3gew3qruouxrh7z7ehjjafrgkbcmg6tcghhfh3rhmzi@wzlcoecgy5rs/ Link: https://lore.kernel.org/lkml/ec3e377a-c0a0-4dd3-9cb9-96517e54d...@csgroup.eu/ --- v4: - Trivial rebase conflict in s390 Hi archs, For some context, this is part of a larger series to improve shadow stack guard gaps. It involves plumbing a new field via struct vm_unmapped_area_info. The first user is x86, but arm and riscv may likely use it as well. The change is compile tested only for non-x86. Thanks, Rick --- arch/alpha/kernel/osf_sys.c | 5 + arch/arc/mm/mmap.c | 4 +--- arch/arm/mm/mmap.c | 5 ++--- arch/loongarch/mm/mmap.c | 3 +-- arch/mips/mm/mmap.c | 3 +-- arch/s390/mm/hugetlbpage.c | 7 ++- arch/s390/mm/mmap.c | 5 ++--- arch/sh/mm/mmap.c| 5 ++--- arch/sparc/kernel/sys_sparc_32.c | 3 +-- arch/sparc/kernel/sys_sparc_64.c | 5 ++--- arch/sparc/mm/hugetlbpage.c | 7 ++- arch/x86/kernel/sys_x86_64.c | 7 ++- arch/x86/mm/hugetlbpage.c| 7 ++- fs/hugetlbfs/inode.c | 7 ++- mm/mmap.c| 9 ++--- 15 files changed, 25 insertions(+), 57 deletions(-) diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c index 5db88b627439..e5f881bc8288 100644 --- a/arch/alpha/kernel/osf_sys.c +++ b/arch/alpha/kernel/osf_sys.c @@ -1218,14 +1218,11 @@ static unsigned long arch_get_unmapped_area_1(unsigned long addr, unsigned long len, unsigned long limit) { - struct vm_unmapped_area_info info; + struct vm_unmapped_area_info info = {}; - info.flags = 0; info.length = len; info.low_limit = addr; info.high_limit = limit; - info.align_mask = 0; - 
info.align_offset = 0; return vm_unmapped_area(&info); } diff --git a/arch/arc/mm/mmap.c b/arch/arc/mm/mmap.c index 3c1c7ae73292..69a915297155 100644 --- a/arch/arc/mm/mmap.c +++ b/arch/arc/mm/mmap.c @@ -27,7 +27,7 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr, { struct mm_struct *mm = current->mm; struct vm_area_struct *vma; - struct vm_unmapped_area_info info; + struct vm_unmapped_area_info info = {}; /* * We enforce the MAP_FIXED case. @@ -51,11 +51,9 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr, return addr; } - info.flags = 0; info.length = len; info.low_limit = mm->mmap_base; info.high_limit = TASK_SIZE; - info.align_mask = 0; info.align_offset = pgoff << PAGE_SHIFT; return vm_unmapped_area(&info); } diff --git a/arch/arm/mm/mmap.c b/arch/arm/mm/mmap.c index a0f8a0ca0788..d65d0e6ed10a 100644 ---
Re: [PATCH v3 08/12] treewide: Use initializer for struct vm_unmapped_area_info
On Tue, 2024-03-12 at 20:18 -0700, Kees Cook wrote:
>
> Thanks! This looks to do exactly what it describes. :)
>
> Reviewed-by: Kees Cook

Thanks!
Re: [PATCH v3 08/12] treewide: Use initializer for struct vm_unmapped_area_info
On Tue, Mar 12, 2024 at 03:28:39PM -0700, Rick Edgecombe wrote:
> So to be reduce the chance of bugs via uninitialized fields, perform a
> tree wide change using the consensus for the best general way to do this
> change. Use C99 static initializing to zero the struct and remove and
> statements that simply set members to zero.
>
> Signed-off-by: Rick Edgecombe

Thanks! This looks to do exactly what it describes. :)

Reviewed-by: Kees Cook

--
Kees Cook
[PATCH v3 08/12] treewide: Use initializer for struct vm_unmapped_area_info
Future changes will need to add a new member to struct vm_unmapped_area_info. This would cause trouble for any call site that doesn't initialize the struct. Currently every caller sets each member manually, so if new ones are added they will be uninitialized and the core code parsing the struct will see garbage in the new member.

It could be possible to initialize the new member manually to 0 at each call site. This and a couple of other options were discussed. Having some struct vm_unmapped_area_info instances not zero initialized will put those sites at risk of feeding garbage into vm_unmapped_area() if the convention is to zero initialize the struct and a new field addition misses a call site that initializes each field manually. So it is useful to do things similarly across the kernel.

The consensus (see links) was that, taking into account both code cleanliness and minimizing the chance of introducing bugs, the best general way to accomplish this was C99 static initialization. As in:

   struct vm_unmapped_area_info info = {};

With this method of initialization, the whole struct will be zero initialized, and any statements setting fields to zero will be unneeded. The change should not leave cleanup work behind at the call sites.

While iterating through the possible solutions, a few archs kindly acked other variations that still zero initialized the struct. These sites have been modified in previous changes using the pattern acked by the respective arch.

So to reduce the chance of bugs via uninitialized fields, perform a tree wide change using the consensus for the best general way to do this change. Use C99 static initializing to zero the struct and remove any statements that simply set members to zero.
Signed-off-by: Rick Edgecombe Cc: linux...@kvack.org Cc: linux-alpha@vger.kernel.org Cc: linux-snps-...@lists.infradead.org Cc: linux-arm-ker...@lists.infradead.org Cc: linux-c...@vger.kernel.org Cc: loonga...@lists.linux.dev Cc: linux-m...@vger.kernel.org Cc: linux-s...@vger.kernel.org Cc: linux...@vger.kernel.org Cc: sparcli...@vger.kernel.org Link: https://lore.kernel.org/lkml/202402280912.33AEE7A9CF@keescook/#t Link: https://lore.kernel.org/lkml/j7bfvig3gew3qruouxrh7z7ehjjafrgkbcmg6tcghhfh3rhmzi@wzlcoecgy5rs/ Link: https://lore.kernel.org/lkml/ec3e377a-c0a0-4dd3-9cb9-96517e54d...@csgroup.eu/ --- Hi archs, For some context, this is part of a larger series to improve shadow stack guard gaps. It involves plumbing a new field via struct vm_unmapped_area_info. The first user is x86, but arm and riscv may likely use it as well. The change is compile tested only for non-x86. Thanks, Rick --- arch/alpha/kernel/osf_sys.c | 5 + arch/arc/mm/mmap.c | 4 +--- arch/arm/mm/mmap.c | 5 ++--- arch/loongarch/mm/mmap.c | 3 +-- arch/mips/mm/mmap.c | 3 +-- arch/s390/mm/hugetlbpage.c | 7 ++- arch/s390/mm/mmap.c | 11 --- arch/sh/mm/mmap.c| 5 ++--- arch/sparc/kernel/sys_sparc_32.c | 3 +-- arch/sparc/kernel/sys_sparc_64.c | 5 ++--- arch/sparc/mm/hugetlbpage.c | 7 ++- arch/x86/kernel/sys_x86_64.c | 7 ++- arch/x86/mm/hugetlbpage.c| 7 ++- fs/hugetlbfs/inode.c | 7 ++- mm/mmap.c| 9 ++--- 15 files changed, 27 insertions(+), 61 deletions(-) diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c index 5db88b627439..e5f881bc8288 100644 --- a/arch/alpha/kernel/osf_sys.c +++ b/arch/alpha/kernel/osf_sys.c @@ -1218,14 +1218,11 @@ static unsigned long arch_get_unmapped_area_1(unsigned long addr, unsigned long len, unsigned long limit) { - struct vm_unmapped_area_info info; + struct vm_unmapped_area_info info = {}; - info.flags = 0; info.length = len; info.low_limit = addr; info.high_limit = limit; - info.align_mask = 0; - info.align_offset = 0; return vm_unmapped_area(&info); } diff --git 
a/arch/arc/mm/mmap.c b/arch/arc/mm/mmap.c index 3c1c7ae73292..69a915297155 100644 --- a/arch/arc/mm/mmap.c +++ b/arch/arc/mm/mmap.c @@ -27,7 +27,7 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr, { struct mm_struct *mm = current->mm; struct vm_area_struct *vma; - struct vm_unmapped_area_info info; + struct vm_unmapped_area_info info = {}; /* * We enforce the MAP_FIXED case. @@ -51,11 +51,9 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr, return addr; } - info.flags = 0; info.length = len; info.low_limit = mm->mmap_base; info.high_limit = TASK_SIZE; - info.align_mask = 0; info.align_offset = pgoff << PAGE_SHIFT; return vm_unmapped_area(&info); } diff --git a/arch/arm/mm/mmap.c b/arch/arm/mm/mmap.c index a0f8a0ca0788..d65d0e6ed10a 100644 --- a/arch/arm/mm/mmap.c +++ b/arch/arm/mm/mmap.c @@
Re: [v2 PATCH 0/3] arch: mm, vdso: consolidate PAGE_SIZE definition
On 06/03/2024 14:14, Arnd Bergmann wrote:
> From: Arnd Bergmann
>
> Naresh noticed that the newly added usage of the PAGE_SIZE macro in
> include/vdso/datapage.h introduced a build regression. I had an older
> patch that I revived to have this defined through Kconfig rather than
> through including asm/page.h, which is not allowed in vdso code.
>
> The vdso patch series now has a temporary workaround, but I still want to
> get this into v6.9 so we can replace the hack with CONFIG_PAGE_SIZE
> in the vdso.
>
> I've applied this to the asm-generic tree already, please let me know if
> there are still remaining issues. It's really close to the merge window
> already, so I'd probably give this a few more days before I send a pull
> request, or defer it to v6.10 if anything goes wrong.
>
> Sorry for the delay, I was still waiting to resolve the m68k question,
> but there were no further replies in the end, so I kept my original
> version.
>
> Changes from v1:
>
> - improve Kconfig help texts
> - remove an extraneous line in hexagon
>
> Arnd

Thanks Arnd, looks good to me.

Reviewed-by: Vincenzo Frascino
Re: [RFC PATCH 00/14] Introducing TIF_NOTIFY_IPI flag
On Wed, 6 Mar 2024, Vincent Guittot wrote:
> Hi Prateek,
>
> Adding Julia who could be interested in this patchset. Your patchset
> should trigger idle load balance instead of newly idle load balance
> now when the polling is used. This was one reason for not migrating
> task in idle CPU

My situation is roughly as follows: The machine is an Intel 6130 with two sockets and 32 hardware threads (subsequently referred to as cores) per socket. The test is bt.B of the OpenMP version of the NAS benchmark suite. Initially there is one thread per core. NUMA balancing occurs, resulting in a move, and thus 31 threads on one socket and 33 on the other.

Load balancing should result in the idle core pulling one of the threads from the other socket. But that doesn't happen in normal load balancing, because all 33 threads on the overloaded socket are considered to have a preference for that socket. Active balancing could pull a thread, but it is not triggered because the idle core is seen as being newly idle.

The question is then why a core that has been idle for up to multiple seconds is continually seen as newly idle. Every 4ms, a scheduler tick submits some work to try to load balance. This submission process previously broke out of the idle loop due to a need_resched, hence the same issue as involved in this patch series. The need_resched caused invocation of schedule, which would then see that there was no task to pick, making the core be considered to be newly idle. The classification as newly idle doesn't take into account whether any task was running prior to the call to schedule. The load balancing work that was submitted every 4ms is also a NOP due to a test for need_resched.

This patch series no longer makes need_resched be the only way out of the idle loop. Without the need_resched, the load balancing work that is submitted every 4ms can actually try to do load balancing. The core is not newly idle, so active balancing could in principle occur.
But now nothing happens because the work is run by ksoftirqd. The presence of ksoftirqd on the idle core means that the core is no longer idle. Thus there is no more need for load balancing. So this patch series in itself doesn't solve the problem. I did 500 runs with this patch series and 500 runs with the Linux kernel that this patch series builds on, and there is essentially no difference in the performance. julia > > On Tue, 20 Feb 2024 at 18:15, K Prateek Nayak wrote: > > > > Hello everyone, > > > > Before jumping into the issue, let me clarify the Cc list. Everyone have > > been cc'ed on Patch 0 through Patch 3. Respective arch maintainers, > > reviewers, and committers returned by scripts/get_maintainer.pl have > > been cc'ed on the respective arch side changes. Scheduler and CPU Idle > > maintainers and reviewers have been included for the entire series. If I > > have missed anyone, please do add them. If you would like to be dropped > > from the cc list, wholly or partially, for the future iterations, please > > do let me know. > > > > With that out of the way ... > > > > Problem statement > > = > > > > When measuring IPI throughput using a modified version of Anton > > Blanchard's ipistorm benchmark [1], configured to measure time taken to > > perform a fixed number of smp_call_function_single() (with wait set to > > 1), an increase in benchmark time was observed between v5.7 and the > > current upstream release (v6.7-rc6 at the time of encounter). > > > > Bisection pointed to commit b2a02fc43a1f ("smp: Optimize > > send_call_function_single_ipi()") as the reason behind this increase in > > runtime. > > > > > > Experiments > > === > > > > Since the commit cannot be cleanly reverted on top of the current > > tip:sched/core, the effects of the optimizations were reverted by: > > > > 1. Removing the check for call_function_single_prep_ipi() in > >send_call_function_single_ipi(). 
With this change > >send_call_function_single_ipi() always calls > >arch_send_call_function_single_ipi() > > > > 2. Removing the call to flush_smp_call_function_queue() in do_idle() > >since every smp_call_function, with (1.), would unconditionally send > >an IPI to an idle CPU in TIF_POLLING mode. > > > > Following is the diff of the above described changes which will be > > henceforth referred to as the "revert": > > > > diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c > > index 31231925f1ec..735184d98c0f 100644 > > --- a/kernel/sched/idle.c > > +++ b/kernel/sched/idle.c > > @@ -332,11 +332,6 @@ static void do_idle(void) > > */ > > smp_mb__after_atomic(); > > > > - /* > > -* RCU relies on this call to be done outside of an RCU read-side > > -* critical section. > > -*/ > > - flush_smp_call_function_queue(); > > schedule_idle(); > > > > if (unlikely(klp_patch_pending(current))) > > diff --git a/kernel/smp.c b/kernel/smp.c > > index
Re: [PATCH v2 3/3] arch: define CONFIG_PAGE_SIZE_*KB on all architectures
On 2024-03-06 15:14, Arnd Bergmann wrote:
> From: Arnd Bergmann
>
> Most architectures only support a single hardcoded page size. In order
> to ensure that each one of these sets the corresponding Kconfig symbols,
> change over the PAGE_SHIFT definition to the common one and allow
> only the hardware page size to be selected.
>
> Acked-by: Guo Ren
> Acked-by: Heiko Carstens
> Acked-by: Stafford Horne
> Acked-by: Johannes Berg
> Signed-off-by: Arnd Bergmann
> ---
> No changes from v1
> arch/sparc/Kconfig | 2 ++
> arch/sparc/include/asm/page_32.h | 2 +-
> arch/sparc/include/asm/page_64.h | 3 +--

Acked-by: Andreas Larsson

Thanks,
Andreas
Re: [PATCH v2 2/3] arch: simplify architecture specific page size configuration
Arnd Bergmann writes:
> From: Arnd Bergmann
>
> arc, arm64, parisc and powerpc all have their own Kconfig symbols
> in place of the common CONFIG_PAGE_SIZE_4KB symbols. Change these
> so the common symbols are the ones that are actually used, while
> leaving the arhcitecture specific ones as the user visible
> place for configuring it, to avoid breaking user configs.
>
> Reviewed-by: Christophe Leroy (powerpc32)
> Acked-by: Catalin Marinas
> Acked-by: Helge Deller # parisc
> Signed-off-by: Arnd Bergmann
> ---
> No changes from v1
>
> arch/arc/Kconfig | 3 +++
> arch/arc/include/uapi/asm/page.h | 6 ++
> arch/arm64/Kconfig| 29 +
> arch/arm64/include/asm/page-def.h | 2 +-
> arch/parisc/Kconfig | 3 +++
> arch/parisc/include/asm/page.h| 10 +-
> arch/powerpc/Kconfig | 31 ++-
> arch/powerpc/include/asm/page.h | 2 +-
> scripts/gdb/linux/constants.py.in | 2 +-
> scripts/gdb/linux/mm.py | 2 +-
> 10 files changed, 32 insertions(+), 58 deletions(-)

Acked-by: Michael Ellerman (powerpc)

cheers
Re: [PATCH v2 1/3] arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions
Hi Arnd,

Arnd Bergmann writes:
> From: Arnd Bergmann
>
> These four architectures define the same Kconfig symbols for configuring
> the page size. Move the logic into a common place where it can be shared
> with all other architectures.
>
> Signed-off-by: Arnd Bergmann
> ---
> Changes from v1:
> - improve Kconfig help texts
> - fix Hexagon Kconfig
>
> arch/Kconfig | 92 ++-
> arch/hexagon/Kconfig | 24 ++--
> arch/hexagon/include/asm/page.h | 6 +-
> arch/loongarch/Kconfig| 21 ++-
> arch/loongarch/include/asm/page.h | 10 +---
> arch/mips/Kconfig | 58 ++-
> arch/mips/include/asm/page.h | 16 +-
> arch/sh/include/asm/page.h| 13 +
> arch/sh/mm/Kconfig| 42 --
> 9 files changed, 121 insertions(+), 161 deletions(-)

There's a few "help" lines missing, which breaks the build:

arch/Kconfig:1134: syntax error
arch/Kconfig:1133: invalid statement
arch/Kconfig:1134: invalid statement
arch/Kconfig:1135:warning: ignoring unsupported character '.'
arch/Kconfig:1135:warning: ignoring unsupported character '.'
arch/Kconfig:1135: invalid statement
arch/Kconfig:1136: invalid statement
arch/Kconfig:1137:warning: ignoring unsupported character '.'
arch/Kconfig:1137: invalid statement
arch/Kconfig:1143: syntax error
arch/Kconfig:1142: invalid statement
arch/Kconfig:1143: invalid statement
arch/Kconfig:1144:warning: ignoring unsupported character '.'
arch/Kconfig:1144: invalid statement
arch/Kconfig:1145: invalid statement
arch/Kconfig:1146: invalid statement
arch/Kconfig:1147: invalid statement
arch/Kconfig:1148:warning: ignoring unsupported character '.'
arch/Kconfig:1148: invalid statement
make[4]: *** [../scripts/kconfig/Makefile:85: syncconfig] Error 1

Fixup diff is:

diff --git a/arch/Kconfig b/arch/Kconfig index 56d45a75f625..f2295fa3b48c 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -1130,6 +1130,7 @@ config PAGE_SIZE_16KB config PAGE_SIZE_32KB bool "32KiB pages" depends on HAVE_PAGE_SIZE_32KB + help Using 32KiB page size will result in slightly higher performance kernel at the price of higher memory consumption compared to 16KiB pages. This option is available only on cnMIPS cores. @@ -1139,6 +1140,7 @@ config PAGE_SIZE_64KB bool "64KiB pages" depends on HAVE_PAGE_SIZE_64KB + help Using 64KiB page size will result in slightly higher performance kernel at the price of much higher memory consumption compared to 4KiB or 16KiB pages.

cheers
Re: [v2 PATCH 0/3] arch: mm, vdso: consolidate PAGE_SIZE definition
On Wed, Mar 06 2024 at 15:14, Arnd Bergmann wrote:
> From: Arnd Bergmann
>
> Naresh noticed that the newly added usage of the PAGE_SIZE macro in
> include/vdso/datapage.h introduced a build regression. I had an older
> patch that I revived to have this defined through Kconfig rather than
> through including asm/page.h, which is not allowed in vdso code.
>
> The vdso patch series now has a temporary workaround, but I still want to
> get this into v6.9 so we can replace the hack with CONFIG_PAGE_SIZE
> in the vdso.

Thank you for cleaning this up!

tglx
Re: [PATCH v2 3/3] arch: define CONFIG_PAGE_SIZE_*KB on all architectures
On Wed, Mar 06 2024 at 15:14, Arnd Bergmann wrote:
> From: Arnd Bergmann
>
> Most architectures only support a single hardcoded page size. In order
> to ensure that each one of these sets the corresponding Kconfig symbols,
> change over the PAGE_SHIFT definition to the common one and allow
> only the hardware page size to be selected.
>
> Acked-by: Guo Ren
> Acked-by: Heiko Carstens
> Acked-by: Stafford Horne
> Acked-by: Johannes Berg
> Signed-off-by: Arnd Bergmann

Reviewed-by: Thomas Gleixner
Re: [PATCH v2 2/3] arch: simplify architecture specific page size configuration
On Wed, Mar 06 2024 at 15:14, Arnd Bergmann wrote:
> From: Arnd Bergmann
>
> arc, arm64, parisc and powerpc all have their own Kconfig symbols
> in place of the common CONFIG_PAGE_SIZE_4KB symbols. Change these
> so the common symbols are the ones that are actually used, while
> leaving the arhcitecture specific ones as the user visible
> place for configuring it, to avoid breaking user configs.
>
> Reviewed-by: Christophe Leroy (powerpc32)
> Acked-by: Catalin Marinas
> Acked-by: Helge Deller # parisc
> Signed-off-by: Arnd Bergmann

Reviewed-by: Thomas Gleixner
Re: [PATCH v2 1/3] arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions
On Wed, Mar 06 2024 at 15:14, Arnd Bergmann wrote:
> From: Arnd Bergmann
>
> These four architectures define the same Kconfig symbols for configuring
> the page size. Move the logic into a common place where it can be shared
> with all other architectures.
>
> Signed-off-by: Arnd Bergmann

Reviewed-by: Thomas Gleixner
Re: [PATCH v2 3/3] arch: define CONFIG_PAGE_SIZE_*KB on all architectures
On Wed, Mar 6, 2024 at 3:15 PM Arnd Bergmann wrote:
> From: Arnd Bergmann
>
> Most architectures only support a single hardcoded page size. In order
> to ensure that each one of these sets the corresponding Kconfig symbols,
> change over the PAGE_SHIFT definition to the common one and allow
> only the hardware page size to be selected.
>
> Acked-by: Guo Ren
> Acked-by: Heiko Carstens
> Acked-by: Stafford Horne
> Acked-by: Johannes Berg
> Signed-off-by: Arnd Bergmann
> ---
> No changes from v1
> arch/m68k/Kconfig | 3 +++
> arch/m68k/Kconfig.cpu | 2 ++
> arch/m68k/include/asm/page.h | 6 +-

Acked-by: Geert Uytterhoeven

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
[PATCH v2 3/3] arch: define CONFIG_PAGE_SIZE_*KB on all architectures
From: Arnd Bergmann Most architectures only support a single hardcoded page size. In order to ensure that each one of these sets the corresponding Kconfig symbols, change over the PAGE_SHIFT definition to the common one and allow only the hardware page size to be selected. Acked-by: Guo Ren Acked-by: Heiko Carstens Acked-by: Stafford Horne Acked-by: Johannes Berg Signed-off-by: Arnd Bergmann --- No changes from v1 arch/alpha/Kconfig | 1 + arch/alpha/include/asm/page.h | 2 +- arch/arm/Kconfig | 1 + arch/arm/include/asm/page.h| 2 +- arch/csky/Kconfig | 1 + arch/csky/include/asm/page.h | 2 +- arch/m68k/Kconfig | 3 +++ arch/m68k/Kconfig.cpu | 2 ++ arch/m68k/include/asm/page.h | 6 +- arch/microblaze/Kconfig| 1 + arch/microblaze/include/asm/page.h | 2 +- arch/nios2/Kconfig | 1 + arch/nios2/include/asm/page.h | 2 +- arch/openrisc/Kconfig | 1 + arch/openrisc/include/asm/page.h | 2 +- arch/riscv/Kconfig | 1 + arch/riscv/include/asm/page.h | 2 +- arch/s390/Kconfig | 1 + arch/s390/include/asm/page.h | 2 +- arch/sparc/Kconfig | 2 ++ arch/sparc/include/asm/page_32.h | 2 +- arch/sparc/include/asm/page_64.h | 3 +-- arch/um/Kconfig| 1 + arch/um/include/asm/page.h | 2 +- arch/x86/Kconfig | 1 + arch/x86/include/asm/page_types.h | 2 +- arch/xtensa/Kconfig| 1 + arch/xtensa/include/asm/page.h | 2 +- 28 files changed, 32 insertions(+), 19 deletions(-) diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig index d6968d090d49..4f490250d323 100644 --- a/arch/alpha/Kconfig +++ b/arch/alpha/Kconfig @@ -14,6 +14,7 @@ config ALPHA select PCI_DOMAINS if PCI select PCI_SYSCALL if PCI select HAVE_ASM_MODVERSIONS + select HAVE_PAGE_SIZE_8KB select HAVE_PCSPKR_PLATFORM select HAVE_PERF_EVENTS select NEED_DMA_MAP_STATE diff --git a/arch/alpha/include/asm/page.h b/arch/alpha/include/asm/page.h index 4db1ebc0ed99..70419e6be1a3 100644 --- a/arch/alpha/include/asm/page.h +++ b/arch/alpha/include/asm/page.h @@ -6,7 +6,7 @@ #include /* PAGE_SHIFT determines the page size */ -#define PAGE_SHIFT 13 +#define 
PAGE_SHIFT CONFIG_PAGE_SHIFT #define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT) #define PAGE_MASK (~(PAGE_SIZE-1)) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 0af6709570d1..9d52ba3a8ad1 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -116,6 +116,7 @@ config ARM select HAVE_MOD_ARCH_SPECIFIC select HAVE_NMI select HAVE_OPTPROBES if !THUMB2_KERNEL + select HAVE_PAGE_SIZE_4KB select HAVE_PCI if MMU select HAVE_PERF_EVENTS select HAVE_PERF_REGS diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h index 119aa85d1feb..62af9f7f9e96 100644 --- a/arch/arm/include/asm/page.h +++ b/arch/arm/include/asm/page.h @@ -8,7 +8,7 @@ #define _ASMARM_PAGE_H /* PAGE_SHIFT determines the page size */ -#define PAGE_SHIFT 12 +#define PAGE_SHIFT CONFIG_PAGE_SHIFT #define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT) #define PAGE_MASK (~((1 << PAGE_SHIFT) - 1)) diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig index cf2a6fd7dff8..9c2723ab1c94 100644 --- a/arch/csky/Kconfig +++ b/arch/csky/Kconfig @@ -89,6 +89,7 @@ config CSKY select HAVE_KPROBES if !CPU_CK610 select HAVE_KPROBES_ON_FTRACE if !CPU_CK610 select HAVE_KRETPROBES if !CPU_CK610 + select HAVE_PAGE_SIZE_4KB select HAVE_PERF_EVENTS select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP diff --git a/arch/csky/include/asm/page.h b/arch/csky/include/asm/page.h index 866855e1ab43..0ca6c408c07f 100644 --- a/arch/csky/include/asm/page.h +++ b/arch/csky/include/asm/page.h @@ -10,7 +10,7 @@ /* * PAGE_SHIFT determines the page size: 4KB */ -#define PAGE_SHIFT 12 +#define PAGE_SHIFT CONFIG_PAGE_SHIFT #define PAGE_SIZE (_AC(1, UL) << PAGE_SHIFT) #define PAGE_MASK (~(PAGE_SIZE - 1)) #define THREAD_SIZE(PAGE_SIZE * 2) diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig index 4b3e93cac723..7b709453d5e7 100644 --- a/arch/m68k/Kconfig +++ b/arch/m68k/Kconfig @@ -84,12 +84,15 @@ config MMU config MMU_MOTOROLA bool + select HAVE_PAGE_SIZE_4KB config MMU_COLDFIRE + select HAVE_PAGE_SIZE_8KB bool config MMU_SUN3 bool + select 
HAVE_PAGE_SIZE_8KB depends on MMU && !MMU_MOTOROLA && !MMU_COLDFIRE config ARCH_SUPPORTS_KEXEC diff --git a/arch/m68k/Kconfig.cpu b/arch/m68k/Kconfig.cpu index 9dcf245c9cbf..c777a129768a 100644 --- a/arch/m68k/Kconfig.cpu +++ b/arch/m68k/Kconfig.cpu @@ -30,6 +30,7 @@ config COLDFIRE select GENERIC_CSUM select GPIOLIB
[PATCH v2 2/3] arch: simplify architecture specific page size configuration
From: Arnd Bergmann arc, arm64, parisc and powerpc all have their own Kconfig symbols in place of the common CONFIG_PAGE_SIZE_4KB symbols. Change these so the common symbols are the ones that are actually used, while leaving the architecture specific ones as the user visible place for configuring it, to avoid breaking user configs. Reviewed-by: Christophe Leroy (powerpc32) Acked-by: Catalin Marinas Acked-by: Helge Deller # parisc Signed-off-by: Arnd Bergmann --- No changes from v1 arch/arc/Kconfig | 3 +++ arch/arc/include/uapi/asm/page.h | 6 ++ arch/arm64/Kconfig| 29 + arch/arm64/include/asm/page-def.h | 2 +- arch/parisc/Kconfig | 3 +++ arch/parisc/include/asm/page.h| 10 +- arch/powerpc/Kconfig | 31 ++- arch/powerpc/include/asm/page.h | 2 +- scripts/gdb/linux/constants.py.in | 2 +- scripts/gdb/linux/mm.py | 2 +- 10 files changed, 32 insertions(+), 58 deletions(-) diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig index 1b0483c51cc1..4092bec198be 100644 --- a/arch/arc/Kconfig +++ b/arch/arc/Kconfig @@ -284,14 +284,17 @@ choice config ARC_PAGE_SIZE_8K bool "8KB" + select HAVE_PAGE_SIZE_8KB help Choose between 8k vs 16k config ARC_PAGE_SIZE_16K + select HAVE_PAGE_SIZE_16KB bool "16KB" config ARC_PAGE_SIZE_4K bool "4KB" + select HAVE_PAGE_SIZE_4KB depends on ARC_MMU_V3 || ARC_MMU_V4 endchoice diff --git a/arch/arc/include/uapi/asm/page.h b/arch/arc/include/uapi/asm/page.h index 2a4ad619abfb..7fd9e741b527 100644 --- a/arch/arc/include/uapi/asm/page.h +++ b/arch/arc/include/uapi/asm/page.h @@ -13,10 +13,8 @@ #include /* PAGE_SHIFT determines the page size */ -#if defined(CONFIG_ARC_PAGE_SIZE_16K) -#define PAGE_SHIFT 14 -#elif defined(CONFIG_ARC_PAGE_SIZE_4K) -#define PAGE_SHIFT 12 +#ifdef __KERNEL__ +#define PAGE_SHIFT CONFIG_PAGE_SHIFT #else /* * Default 8k diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index aa7c1d435139..29290b8cb36d 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -277,27 +277,21 @@ config 64BIT config MMU def_bool y -config 
ARM64_PAGE_SHIFT - int - default 16 if ARM64_64K_PAGES - default 14 if ARM64_16K_PAGES - default 12 - config ARM64_CONT_PTE_SHIFT int - default 5 if ARM64_64K_PAGES - default 7 if ARM64_16K_PAGES + default 5 if PAGE_SIZE_64KB + default 7 if PAGE_SIZE_16KB default 4 config ARM64_CONT_PMD_SHIFT int - default 5 if ARM64_64K_PAGES - default 5 if ARM64_16K_PAGES + default 5 if PAGE_SIZE_64KB + default 5 if PAGE_SIZE_16KB default 4 config ARCH_MMAP_RND_BITS_MIN - default 14 if ARM64_64K_PAGES - default 16 if ARM64_16K_PAGES + default 14 if PAGE_SIZE_64KB + default 16 if PAGE_SIZE_16KB default 18 # max bits determined by the following formula: @@ -1259,11 +1253,13 @@ choice config ARM64_4K_PAGES bool "4KB" + select HAVE_PAGE_SIZE_4KB help This feature enables 4KB pages support. config ARM64_16K_PAGES bool "16KB" + select HAVE_PAGE_SIZE_16KB help The system will use 16KB pages support. AArch32 emulation requires applications compiled with 16K (or a multiple of 16K) @@ -1271,6 +1267,7 @@ config ARM64_16K_PAGES config ARM64_64K_PAGES bool "64KB" + select HAVE_PAGE_SIZE_64KB help This feature enables 64KB pages support (4KB by default) allowing only two levels of page tables and faster TLB @@ -1291,19 +1288,19 @@ choice config ARM64_VA_BITS_36 bool "36-bit" if EXPERT - depends on ARM64_16K_PAGES + depends on PAGE_SIZE_16KB config ARM64_VA_BITS_39 bool "39-bit" - depends on ARM64_4K_PAGES + depends on PAGE_SIZE_4KB config ARM64_VA_BITS_42 bool "42-bit" - depends on ARM64_64K_PAGES + depends on PAGE_SIZE_64KB config ARM64_VA_BITS_47 bool "47-bit" - depends on ARM64_16K_PAGES + depends on PAGE_SIZE_16KB config ARM64_VA_BITS_48 bool "48-bit" diff --git a/arch/arm64/include/asm/page-def.h b/arch/arm64/include/asm/page-def.h index 2403f7b4cdbf..792e9fe881dc 100644 --- a/arch/arm64/include/asm/page-def.h +++ b/arch/arm64/include/asm/page-def.h @@ -11,7 +11,7 @@ #include /* PAGE_SHIFT determines the page size */ -#define PAGE_SHIFT CONFIG_ARM64_PAGE_SHIFT +#define PAGE_SHIFT 
CONFIG_PAGE_SHIFT #define PAGE_SIZE (_AC(1, UL) << PAGE_SHIFT) #define PAGE_MASK (~(PAGE_SIZE-1)) diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig index 5c845e8d59d9..b180e684fa0d 100644 --- a/arch/parisc/Kconfig +++ b/arch/parisc/Kconfig @@ -273,6 +273,7 @@ choice config PARISC_PAGE_SIZE_4KB
[PATCH v2 1/3] arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions
From: Arnd Bergmann These four architectures define the same Kconfig symbols for configuring the page size. Move the logic into a common place where it can be shared with all other architectures. Signed-off-by: Arnd Bergmann --- Changes from v1: - improve Kconfig help texts - fix Hexagon Kconfig arch/Kconfig | 92 ++- arch/hexagon/Kconfig | 24 ++-- arch/hexagon/include/asm/page.h | 6 +- arch/loongarch/Kconfig| 21 ++- arch/loongarch/include/asm/page.h | 10 +--- arch/mips/Kconfig | 58 ++- arch/mips/include/asm/page.h | 16 +- arch/sh/include/asm/page.h| 13 + arch/sh/mm/Kconfig| 42 -- 9 files changed, 121 insertions(+), 161 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index a5af0edd3eb8..c63034e092d0 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -1078,17 +1078,105 @@ config HAVE_ARCH_COMPAT_MMAP_BASES and vice-versa 32-bit applications to call 64-bit mmap(). Required for applications doing different bitness syscalls. +config HAVE_PAGE_SIZE_4KB + bool + +config HAVE_PAGE_SIZE_8KB + bool + +config HAVE_PAGE_SIZE_16KB + bool + +config HAVE_PAGE_SIZE_32KB + bool + +config HAVE_PAGE_SIZE_64KB + bool + +config HAVE_PAGE_SIZE_256KB + bool + +choice + prompt "MMU page size" + +config PAGE_SIZE_4KB + bool "4KiB pages" + depends on HAVE_PAGE_SIZE_4KB + help + This option select the standard 4KiB Linux page size and the only + available option on many architectures. Using 4KiB page size will + minimize memory consumption and is therefore recommended for low + memory systems. + Some software that is written for x86 systems makes incorrect + assumptions about the page size and only runs on 4KiB pages. + +config PAGE_SIZE_8KB + bool "8KiB pages" + depends on HAVE_PAGE_SIZE_8KB + help + This option is the only supported page size on a few older + processors, and can be slightly faster than 4KiB pages. 
+ +config PAGE_SIZE_16KB + bool "16KiB pages" + depends on HAVE_PAGE_SIZE_16KB + help + This option is usually a good compromise between memory + consumption and performance for typical desktop and server + workloads, often saving a level of page table lookups compared + to 4KB pages as well as reducing TLB pressure and overhead of + per-page operations in the kernel at the expense of a larger + page cache. + +config PAGE_SIZE_32KB + bool "32KiB pages" + depends on HAVE_PAGE_SIZE_32KB + help + Using 32KiB page size will result in slightly higher performance + kernel at the price of higher memory consumption compared to + 16KiB pages. This option is available only on cnMIPS cores. + Note that you will need a suitable Linux distribution to + support this. + +config PAGE_SIZE_64KB + bool "64KiB pages" + depends on HAVE_PAGE_SIZE_64KB + help + Using 64KiB page size will result in slightly higher performance + kernel at the price of much higher memory consumption compared to + 4KiB or 16KiB pages. + This is not suitable for general-purpose workloads but the + better performance may be worth the cost for certain types of + supercomputing or database applications that work mostly with + large in-memory data rather than small files. + +config PAGE_SIZE_256KB + bool "256KiB pages" + depends on HAVE_PAGE_SIZE_256KB + help + 256KiB pages have little practical value due to their extreme + memory usage. The kernel will only be able to run applications + that have been compiled with '-zmax-page-size' set to 256KiB + (the default is 64KiB or 4KiB on most architectures). 
+ +endchoice + config PAGE_SIZE_LESS_THAN_64KB def_bool y - depends on !ARM64_64K_PAGES depends on !PAGE_SIZE_64KB - depends on !PARISC_PAGE_SIZE_64KB depends on PAGE_SIZE_LESS_THAN_256KB config PAGE_SIZE_LESS_THAN_256KB def_bool y depends on !PAGE_SIZE_256KB +config PAGE_SHIFT + int + default 12 if PAGE_SIZE_4KB + default 13 if PAGE_SIZE_8KB + default 14 if PAGE_SIZE_16KB + default 15 if PAGE_SIZE_32KB + default 16 if PAGE_SIZE_64KB + default 18 if PAGE_SIZE_256KB + # This allows to use a set of generic functions to determine mmap base # address by giving priority to top-down scheme only if the process # is not in legacy mode (compat task, unlimited stack size or diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig index a880ee067d2e..1414052e7d6b 100644 --- a/arch/hexagon/Kconfig +++ b/arch/hexagon/Kconfig @@ -8,6 +8,10 @@ config HEXAGON select ARCH_HAS_SYNC_DMA_FOR_DEVICE select
[v2 PATCH 0/3] arch: mm, vdso: consolidate PAGE_SIZE definition
From: Arnd Bergmann Naresh noticed that the newly added usage of the PAGE_SIZE macro in include/vdso/datapage.h introduced a build regression. I had an older patch that I revived to have this defined through Kconfig rather than through including asm/page.h, which is not allowed in vdso code. The vdso patch series now has a temporary workaround, but I still want to get this into v6.9 so we can replace the hack with CONFIG_PAGE_SIZE in the vdso. I've applied this to the asm-generic tree already, please let me know if there are still remaining issues. It's really close to the merge window already, so I'd probably give this a few more days before I send a pull request, or defer it to v6.10 if anything goes wrong. Sorry for the delay, I was still waiting to resolve the m68k question, but there were no further replies in the end, so I kept my original version. Changes from v1: - improve Kconfig help texts - remove an extraneous line in hexagon Arnd Link: https://lore.kernel.org/lkml/ca+g9fytrxxm_ko9fnpz3xarxhv7ud_yqp-teupqrnrhu+_0...@mail.gmail.com/ Link: https://lore.kernel.org/all/65dc6c14.170a0220.f4a3f.9...@mx.google.com/ Link: https://lore.kernel.org/lkml/20240226161414.2316610-1-a...@kernel.org/ Arnd Bergmann (3): arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions arch: simplify architecture specific page size configuration arch: define CONFIG_PAGE_SIZE_*KB on all architectures arch/Kconfig | 92 +- arch/alpha/Kconfig | 1 + arch/alpha/include/asm/page.h | 2 +- arch/arc/Kconfig | 3 + arch/arc/include/uapi/asm/page.h | 6 +- arch/arm/Kconfig | 1 + arch/arm/include/asm/page.h| 2 +- arch/arm64/Kconfig | 29 +- arch/arm64/include/asm/page-def.h | 2 +- arch/csky/Kconfig | 1 + arch/csky/include/asm/page.h | 2 +- arch/hexagon/Kconfig | 24 ++-- arch/hexagon/include/asm/page.h| 6 +- arch/loongarch/Kconfig | 21 ++- arch/loongarch/include/asm/page.h | 10 +--- arch/m68k/Kconfig | 3 + arch/m68k/Kconfig.cpu | 2 + arch/m68k/include/asm/page.h | 6 +- arch/microblaze/Kconfig| 1 
+ arch/microblaze/include/asm/page.h | 2 +- arch/mips/Kconfig | 58 ++- arch/mips/include/asm/page.h | 16 +- arch/nios2/Kconfig | 1 + arch/nios2/include/asm/page.h | 2 +- arch/openrisc/Kconfig | 1 + arch/openrisc/include/asm/page.h | 2 +- arch/parisc/Kconfig| 3 + arch/parisc/include/asm/page.h | 10 +--- arch/powerpc/Kconfig | 31 ++ arch/powerpc/include/asm/page.h| 2 +- arch/riscv/Kconfig | 1 + arch/riscv/include/asm/page.h | 2 +- arch/s390/Kconfig | 1 + arch/s390/include/asm/page.h | 2 +- arch/sh/include/asm/page.h | 13 + arch/sh/mm/Kconfig | 42 -- arch/sparc/Kconfig | 2 + arch/sparc/include/asm/page_32.h | 2 +- arch/sparc/include/asm/page_64.h | 3 +- arch/um/Kconfig| 1 + arch/um/include/asm/page.h | 2 +- arch/x86/Kconfig | 1 + arch/x86/include/asm/page_types.h | 2 +- arch/xtensa/Kconfig| 1 + arch/xtensa/include/asm/page.h | 2 +- scripts/gdb/linux/constants.py.in | 2 +- scripts/gdb/linux/mm.py| 2 +- 47 files changed, 185 insertions(+), 238 deletions(-) -- 2.39.2 To: Thomas Gleixner To: Vincenzo Frascino To: Kees Cook To: Anna-Maria Behnsen Cc: Matt Turner Cc: Vineet Gupta Cc: Russell King Cc: Catalin Marinas Cc: Guo Ren Cc: Brian Cain Cc: Huacai Chen Cc: Geert Uytterhoeven Cc: Michal Simek Cc: Thomas Bogendoerfer Cc: Helge Deller Cc: Michael Ellerman Cc: Christophe Leroy Cc: Palmer Dabbelt Cc: John Paul Adrian Glaubitz Cc: Andreas Larsson Cc: Richard Weinberger Cc: x...@kernel.org Cc: Max Filippov Cc: Andy Lutomirski Cc: Vincenzo Frascino Cc: Jan Kiszka Cc: Kieran Bingham Cc: Andrew Morton Cc: Arnd Bergmann Cc: linux-ker...@vger.kernel.org Cc: linux-alpha@vger.kernel.org Cc: linux-snps-...@lists.infradead.org Cc: linux-arm-ker...@lists.infradead.org Cc: linux-c...@vger.kernel.org Cc: linux-hexa...@vger.kernel.org Cc: loonga...@lists.linux.dev Cc: linux-m...@lists.linux-m68k.org Cc: linux-m...@vger.kernel.org Cc: linux-openr...@vger.kernel.org Cc: linux-par...@vger.kernel.org Cc: linuxppc-...@lists.ozlabs.org Cc: linux-ri...@lists.infradead.org Cc: 
linux-s...@vger.kernel.org Cc: linux...@vger.kernel.org Cc: sparcli...@vger.kernel.org Cc: linux...@lists.infradead.org
Re: [RFC PATCH 00/14] Introducing TIF_NOTIFY_IPI flag
On Wed, 6 Mar 2024 at 11:18, K Prateek Nayak wrote: > > Hello Vincent, > > Thank you for taking a look at the series. > > On 3/6/2024 3:29 PM, Vincent Guittot wrote: > > Hi Prateek, > > > > Adding Julia who could be interested in this patchset. Your patchset > > should trigger idle load balance instead of newly idle load balance > > now when the polling is used. This was one reason for not migrating > > task in idle CPU > > Thank you. > > > > > On Tue, 20 Feb 2024 at 18:15, K Prateek Nayak > > wrote: > >> > >> Hello everyone, > >> > >> [..snip..] > >> > >> > >> Skipping newidle_balance() > >> == > >> > >> In an earlier attempt to solve the challenge of the long IRQ disabled > >> section, newidle_balance() was skipped when a CPU waking up from idle > >> was found to have no runnable tasks, and was transitioning back to > >> idle [2]. Tim [3] and David [4] had pointed out that newidle_balance() > >> may be viable for CPUs that are idling with tick enabled, where the > >> newidle_balance() has the opportunity to pull tasks onto the idle CPU. > >> > >> Vincent [5] pointed out a case where the idle load kick will fail to > >> run on an idle CPU since the IPI handler launching the ILB will check > >> for need_resched(). In such cases, the idle CPU relies on > >> newidle_balance() to pull tasks towards itself. > > > > Calling newidle_balance() instead of the normal idle load balance > > prevents the CPU to pull tasks from other groups > > Thank you for the correction. > > > > >> > >> Using an alternate flag instead of NEED_RESCHED to indicate a pending > >> IPI was suggested as the correct approach to solve this problem on the > >> same thread. 
> >> > >> > >> Proposed solution: TIF_NOTIFY_IPI > >> = > >> > >> Instead of reusing TIF_NEED_RESCHED bit to pull an TIF_POLLING CPU out > >> of idle, TIF_NOTIFY_IPI is a newly introduced flag that > >> call_function_single_prep_ipi() sets on a target TIF_POLLING CPU to > >> indicate a pending IPI, which the idle CPU promises to process soon. > >> > >> On architectures that do not support the TIF_NOTIFY_IPI flag (this > >> series only adds support for x86 and ARM processors for now), > > > > I'm surprised that you are mentioning ARM processors because they > > don't use TIF_POLLING. > > Yup I just realised that after Linus Walleij pointed it out on the > thread. > > > > >> call_function_single_prep_ipi() will fallback to setting > >> TIF_NEED_RESCHED bit to pull the TIF_POLLING CPU out of idle. > >> > >> Since the pending IPI handlers are processed before the call to > >> schedule_idle() in do_idle(), schedule_idle() will only be called if the > >> IPI handler have woken / migrated a new task on the idle CPU and has set > >> TIF_NEED_RESCHED bit to indicate the same. This avoids running into the > >> long IRQ disabled section in schedule_idle() unnecessarily, and any > >> need_resched() check within a call function will accurately notify if a > >> task is waiting for CPU time on the CPU handling the IPI. > >> > >> Following is the crude visualization of how the situation changes with > >> the newly introduced TIF_NOTIFY_IPI flag: > >> -- > >> CPU0CPU1 > >> > >> do_idle() { > >> > >> __current_set_polling(); > >> ... > >> > >> monitor(addr); > >> if > >> (!need_resched_or_ipi()) > >> > >> mwait() { > >> /* > >> Waiting */ > >> smp_call_function_single(CPU1, func, wait = 1) { > >> ... > >> ... > >> ... > >> set_nr_if_polling(CPU1) { > >> ... > >> /* Realizes CPU1 is polling */ > >> ... > >> try_cmpxchg(addr, > >> ... > >> , > >> ... > >> val | _TIF_NOTIFY_IPI); > >> ... > >> } /* Does not send an IPI */ > >> ... > >> ... 
} > >> /* mwait exit due to write at addr */ > >> csd_lock_wait() { ... > >> /* Waiting */ > >>
Re: [RFC PATCH 00/14] Introducing TIF_NOTIFY_IPI flag
Hello Vincent, Thank you for taking a look at the series. On 3/6/2024 3:29 PM, Vincent Guittot wrote: > Hi Prateek, > > Adding Julia who could be interested in this patchset. Your patchset > should trigger idle load balance instead of newly idle load balance > now when the polling is used. This was one reason for not migrating > task in idle CPU Thank you. > > On Tue, 20 Feb 2024 at 18:15, K Prateek Nayak wrote: >> >> Hello everyone, >> >> [..snip..] >> >> >> Skipping newidle_balance() >> == >> >> In an earlier attempt to solve the challenge of the long IRQ disabled >> section, newidle_balance() was skipped when a CPU waking up from idle >> was found to have no runnable tasks, and was transitioning back to >> idle [2]. Tim [3] and David [4] had pointed out that newidle_balance() >> may be viable for CPUs that are idling with tick enabled, where the >> newidle_balance() has the opportunity to pull tasks onto the idle CPU. >> >> Vincent [5] pointed out a case where the idle load kick will fail to >> run on an idle CPU since the IPI handler launching the ILB will check >> for need_resched(). In such cases, the idle CPU relies on >> newidle_balance() to pull tasks towards itself. > > Calling newidle_balance() instead of the normal idle load balance > prevents the CPU to pull tasks from other groups Thank you for the correction. > >> >> Using an alternate flag instead of NEED_RESCHED to indicate a pending >> IPI was suggested as the correct approach to solve this problem on the >> same thread. >> >> >> Proposed solution: TIF_NOTIFY_IPI >> = >> >> Instead of reusing TIF_NEED_RESCHED bit to pull an TIF_POLLING CPU out >> of idle, TIF_NOTIFY_IPI is a newly introduced flag that >> call_function_single_prep_ipi() sets on a target TIF_POLLING CPU to >> indicate a pending IPI, which the idle CPU promises to process soon. 
>> >> On architectures that do not support the TIF_NOTIFY_IPI flag (this >> series only adds support for x86 and ARM processors for now), > > I'm surprised that you are mentioning ARM processors because they > don't use TIF_POLLING. Yup I just realised that after Linus Walleij pointed it out on the thread. > >> call_function_single_prep_ipi() will fallback to setting >> TIF_NEED_RESCHED bit to pull the TIF_POLLING CPU out of idle. >> >> Since the pending IPI handlers are processed before the call to >> schedule_idle() in do_idle(), schedule_idle() will only be called if the >> IPI handler have woken / migrated a new task on the idle CPU and has set >> TIF_NEED_RESCHED bit to indicate the same. This avoids running into the >> long IRQ disabled section in schedule_idle() unnecessarily, and any >> need_resched() check within a call function will accurately notify if a >> task is waiting for CPU time on the CPU handling the IPI. >> >> Following is the crude visualization of how the situation changes with >> the newly introduced TIF_NOTIFY_IPI flag: >> -- >> CPU0CPU1 >> >> do_idle() { >> >> __current_set_polling(); >> ... >> >> monitor(addr); >> if >> (!need_resched_or_ipi()) >> >> mwait() { >> /* >> Waiting */ >> smp_call_function_single(CPU1, func, wait = 1) { >>... >> ... >>... >> set_nr_if_polling(CPU1) { >>... >> /* Realizes CPU1 is polling */ >>... >> try_cmpxchg(addr, >>... >> , >>... >> val | _TIF_NOTIFY_IPI); >>... >> } /* Does not send an IPI */ >>... >> ... } /* >> mwait exit due to write at addr */ >> csd_lock_wait() { ... >> /* Waiting */ >> preempt_fold_need_resched(); /* fold if NEED_RESCHED */ >> ... >> __current_clr_polling(); >> ... >> flush_smp_call_function_queue() { >> ...
Re: [RFC PATCH 00/14] Introducing TIF_NOTIFY_IPI flag
Hi Prateek, Adding Julia who could be interested in this patchset. Your patchset should trigger idle load balance instead of newly idle load balance now when the polling is used. This was one reason for not migrating task in idle CPU On Tue, 20 Feb 2024 at 18:15, K Prateek Nayak wrote: > > Hello everyone, > > Before jumping into the issue, let me clarify the Cc list. Everyone have > been cc'ed on Patch 0 through Patch 3. Respective arch maintainers, > reviewers, and committers returned by scripts/get_maintainer.pl have > been cc'ed on the respective arch side changes. Scheduler and CPU Idle > maintainers and reviewers have been included for the entire series. If I > have missed anyone, please do add them. If you would like to be dropped > from the cc list, wholly or partially, for the future iterations, please > do let me know. > > With that out of the way ... > > Problem statement > = > > When measuring IPI throughput using a modified version of Anton > Blanchard's ipistorm benchmark [1], configured to measure time taken to > perform a fixed number of smp_call_function_single() (with wait set to > 1), an increase in benchmark time was observed between v5.7 and the > current upstream release (v6.7-rc6 at the time of encounter). > > Bisection pointed to commit b2a02fc43a1f ("smp: Optimize > send_call_function_single_ipi()") as the reason behind this increase in > runtime. > > > Experiments > === > > Since the commit cannot be cleanly reverted on top of the current > tip:sched/core, the effects of the optimizations were reverted by: > > 1. Removing the check for call_function_single_prep_ipi() in >send_call_function_single_ipi(). With this change >send_call_function_single_ipi() always calls >arch_send_call_function_single_ipi() > > 2. Removing the call to flush_smp_call_function_queue() in do_idle() >since every smp_call_function, with (1.), would unconditionally send >an IPI to an idle CPU in TIF_POLLING mode. 
> > Following is the diff of the above described changes which will be > henceforth referred to as the "revert": > > diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c > index 31231925f1ec..735184d98c0f 100644 > --- a/kernel/sched/idle.c > +++ b/kernel/sched/idle.c > @@ -332,11 +332,6 @@ static void do_idle(void) > */ > smp_mb__after_atomic(); > > - /* > -* RCU relies on this call to be done outside of an RCU read-side > -* critical section. > -*/ > - flush_smp_call_function_queue(); > schedule_idle(); > > if (unlikely(klp_patch_pending(current))) > diff --git a/kernel/smp.c b/kernel/smp.c > index f085ebcdf9e7..2ff100c41885 100644 > --- a/kernel/smp.c > +++ b/kernel/smp.c > @@ -111,11 +111,9 @@ void __init call_function_init(void) > static __always_inline void > send_call_function_single_ipi(int cpu) > { > - if (call_function_single_prep_ipi(cpu)) { > - trace_ipi_send_cpu(cpu, _RET_IP_, > - > generic_smp_call_function_single_interrupt); > - arch_send_call_function_single_ipi(cpu); > - } > + trace_ipi_send_cpu(cpu, _RET_IP_, > + generic_smp_call_function_single_interrupt); > + arch_send_call_function_single_ipi(cpu); > } > > static __always_inline void > -- > > With the revert, the time taken to complete a fixed set of IPIs using > ipistorm improves significantly. Following are the numbers from a dual > socket 3rd Generation EPYC system (2 x 64C/128T) (boost on, C2 disabled) > running ipistorm between CPU8 and CPU16: > > cmdline: insmod ipistorm.ko numipi=10 single=1 offset=8 cpulist=8 wait=1 > > (tip:sched/core at tag "sched-core-2024-01-08" for all the testing done > below) > > == > Test : ipistorm (modified) > Units : Normalized runtime > Interpretation: Lower is better > Statistic : AMean > == > kernel: time [pct imp] > tip:sched/core1.00 [0.00] > tip:sched/core + revert 0.81 [19.36] > > Although the revert improves ipistorm performance, it also regresses > tbench and netperf, supporting the validity of the optimization. 
> Following are netperf and tbench numbers from the same machine comparing > vanilla tip:sched/core and the revert applied on top: > > == > Test : tbench > Units : Normalized throughput > Interpretation: Higher is better > Statistic : AMean > == > Clients:tip[pct imp](CV) revert[pct imp](CV) > 1 1.00 [ 0.00]( 0.24) 0.91 [ -8.96]( 0.30) > 2 1.00 [ 0.00]( 0.25) 0.92 [ -8.20]( 0.97) > 4 1.00 [ 0.00]( 0.23) 0.91 [ -9.20]( 1.75)
Re: [PATCH 3/4] arch: define CONFIG_PAGE_SIZE_*KB on all architectures
On Mon, 2024-02-26 at 17:14 +0100, Arnd Bergmann wrote: > > arch/um/Kconfig| 1 + > arch/um/include/asm/page.h | 2 +- LGTM, thanks. Acked-by: Johannes Berg johannes
Re: [PATCH v2 5/9] mm: Initialize struct vm_unmapped_area_info
On Mon, 2024-03-04 at 18:00 +0000, Christophe Leroy wrote: > > Personally, I think a single patch that sets "= {}" for all of them > > and > > drop the all the "= 0" or "= NULL" assignments would be the > > cleanest way > > to go. > > I agree with Kees, set = {} and drop all the "something = 0;" stuff. Thanks. Now some of the archs have very nicely acked and reviewed the existing patches. I'll leave those as is, and do this for anyone that doesn't respond.
Re: [PATCH v2 5/9] mm: Initialize struct vm_unmapped_area_info
Le 02/03/2024 à 02:51, Kees Cook a écrit : > On Sat, Mar 02, 2024 at 12:47:08AM +, Edgecombe, Rick P wrote: >> On Wed, 2024-02-28 at 09:21 -0800, Kees Cook wrote: >>> I totally understand. If the "uninitialized" warnings were actually >>> reliable, I would agree. I look at it this way: >>> >>> - initializations can be missed either in static initializers or via >>> run time initializers. (So the risk of mistake here is matched -- >>> though I'd argue it's easier to *find* static initializers when >>> adding >>> new struct members.) >>> - uninitialized warnings are inconsistent (this becomes an unknown >>> risk) >>> - when a run time initializer is missed, the contents are whatever >>> was >>> on the stack (high risk) >>> - what a static initializer is missed, the content is 0 (low risk) >>> >>> I think unambiguous state (always 0) is significantly more important >>> for >>> the safety of the system as a whole. Yes, individual cases maybe bad >>> ("what uid should this be? root?!") but from a general memory safety >>> perspective the value doesn't become potentially influenced by order >>> of >>> operations, leftover stack memory, etc. >>> >>> I'd agree, lifting everything into a static initializer does seem >>> cleanest of all the choices. >> >> Hi Kees, >> >> Well, I just gave this a try. It is giving me flashbacks of when I last >> had to do a tree wide change that I couldn't fully test and the >> breakage was caught by Linus. > > Yeah, testing isn't fun for these kinds of things. This is traditionally > why the "obviously correct" changes tend to have an easier time landing > (i.e. adding "= {}" to all of them). > >> Could you let me know if you think this is additionally worthwhile >> cleanup outside of the guard gap improvements of this series? Because I >> was thinking a more cowardly approach could be a new vm_unmapped_area() >> variant that takes the new start gap member as a separate argument >> outside of struct vm_unmapped_area_info. 
It would be kind of strange to >> keep them separate, but it would be less likely to bump something. > > I think you want a new member -- AIUI, that's what that struct is for. > > Looking at this resulting set of patches, I do kinda think just adding > the "= {}" in a single patch is more sensible. Having to split things > that are know at the top of the function from the stuff known at the > existing initialization time is rather awkward. > > Personally, I think a single patch that sets "= {}" for all of them and > drop the all the "= 0" or "= NULL" assignments would be the cleanest way > to go. I agree with Kees, set = {} and drop all the "something = 0;" stuff. Christophe
Re: [PATCH v2 5/9] mm: Initialize struct vm_unmapped_area_info
On Sat, Mar 02, 2024 at 12:47:08AM +, Edgecombe, Rick P wrote: > On Wed, 2024-02-28 at 09:21 -0800, Kees Cook wrote: > > I totally understand. If the "uninitialized" warnings were actually > > reliable, I would agree. I look at it this way: > > > > - initializations can be missed either in static initializers or via > > run time initializers. (So the risk of mistake here is matched -- > > though I'd argue it's easier to *find* static initializers when > > adding > > new struct members.) > > - uninitialized warnings are inconsistent (this becomes an unknown > > risk) > > - when a run time initializer is missed, the contents are whatever > > was > > on the stack (high risk) > > - what a static initializer is missed, the content is 0 (low risk) > > > > I think unambiguous state (always 0) is significantly more important > > for > > the safety of the system as a whole. Yes, individual cases maybe bad > > ("what uid should this be? root?!") but from a general memory safety > > perspective the value doesn't become potentially influenced by order > > of > > operations, leftover stack memory, etc. > > > > I'd agree, lifting everything into a static initializer does seem > > cleanest of all the choices. > > Hi Kees, > > Well, I just gave this a try. It is giving me flashbacks of when I last > had to do a tree wide change that I couldn't fully test and the > breakage was caught by Linus. Yeah, testing isn't fun for these kinds of things. This is traditionally why the "obviously correct" changes tend to have an easier time landing (i.e. adding "= {}" to all of them). > Could you let me know if you think this is additionally worthwhile > cleanup outside of the guard gap improvements of this series? Because I > was thinking a more cowardly approach could be a new vm_unmapped_area() > variant that takes the new start gap member as a separate argument > outside of struct vm_unmapped_area_info. 
It would be kind of strange to > keep them separate, but it would be less likely to bump something. I think you want a new member -- AIUI, that's what that struct is for. Looking at this resulting set of patches, I do kinda think just adding the "= {}" in a single patch is more sensible. Having to split things that are known at the top of the function from the stuff known at the existing initialization time is rather awkward. Personally, I think a single patch that sets "= {}" for all of them and drops all the "= 0" or "= NULL" assignments would be the cleanest way to go. -Kees -- Kees Cook
Re: [PATCH v2 5/9] mm: Initialize struct vm_unmapped_area_info
On Wed, 2024-02-28 at 09:21 -0800, Kees Cook wrote: > I totally understand. If the "uninitialized" warnings were actually > reliable, I would agree. I look at it this way: > > - initializations can be missed either in static initializers or via > run time initializers. (So the risk of mistake here is matched -- > though I'd argue it's easier to *find* static initializers when > adding > new struct members.) > - uninitialized warnings are inconsistent (this becomes an unknown > risk) > - when a run time initializer is missed, the contents are whatever > was > on the stack (high risk) > - what a static initializer is missed, the content is 0 (low risk) > > I think unambiguous state (always 0) is significantly more important > for > the safety of the system as a whole. Yes, individual cases maybe bad > ("what uid should this be? root?!") but from a general memory safety > perspective the value doesn't become potentially influenced by order > of > operations, leftover stack memory, etc. > > I'd agree, lifting everything into a static initializer does seem > cleanest of all the choices. Hi Kees, Well, I just gave this a try. It is giving me flashbacks of when I last had to do a tree wide change that I couldn't fully test and the breakage was caught by Linus. Could you let me know if you think this is additionally worthwhile cleanup outside of the guard gap improvements of this series? Because I was thinking a more cowardly approach could be a new vm_unmapped_area() variant that takes the new start gap member as a separate argument outside of struct vm_unmapped_area_info. It would be kind of strange to keep them separate, but it would be less likely to bump something. Thanks, Rick
Re: [PATCH v2 5/9] mm: Initialize struct vm_unmapped_area_info
Le 28/02/2024 à 18:01, Edgecombe, Rick P a écrit : > On Wed, 2024-02-28 at 13:22 +, Christophe Leroy wrote: >>> Any preference? Or maybe am I missing your point and talking >>> nonsense? >>> >> >> So my preference would go to the addition of: >> >> info.new_field = 0; >> >> But that's very minor and if you think it is easier to manage and >> maintain by performing {} initialisation at declaration, lets go for >> that. > > Appreciate the clarification and help getting this right. I'm thinking > Kees' and now Kirill's point about this patch resulting in unnecessary > manual zero initialization of the structs is probably something that > needs to be addressed. > > If I created a bunch of patches to change each call site, I think the > the best is probably to do the designated field zero initialization > way. > > But I can do something for powerpc special if you want. I'll first try > with powerpc matching the others, and if it seems objectionable, please > let me know. > My comments were generic, it was not powerpc oriented. Please keep powerpc as similar as possible with others. Christophe
Re: [PATCH 3/4] arch: define CONFIG_PAGE_SIZE_*KB on all architectures
On Mon, Feb 26, 2024 at 05:14:13PM +0100, Arnd Bergmann wrote: > From: Arnd Bergmann > > Most architectures only support a single hardcoded page size. In order > to ensure that each one of these sets the corresponding Kconfig symbols, > change over the PAGE_SHIFT definition to the common one and allow > only the hardware page size to be selected. > > Signed-off-by: Arnd Bergmann > --- > arch/alpha/Kconfig | 1 + > arch/alpha/include/asm/page.h | 2 +- > arch/arm/Kconfig | 1 + > arch/arm/include/asm/page.h| 2 +- > arch/csky/Kconfig | 1 + > arch/csky/include/asm/page.h | 2 +- > arch/m68k/Kconfig | 3 +++ > arch/m68k/Kconfig.cpu | 2 ++ > arch/m68k/include/asm/page.h | 6 +- > arch/microblaze/Kconfig| 1 + > arch/microblaze/include/asm/page.h | 2 +- > arch/nios2/Kconfig | 1 + > arch/nios2/include/asm/page.h | 2 +- > arch/openrisc/Kconfig | 1 + > arch/openrisc/include/asm/page.h | 2 +- > arch/riscv/Kconfig | 1 + > arch/riscv/include/asm/page.h | 2 +- > arch/s390/Kconfig | 1 + > arch/s390/include/asm/page.h | 2 +- > arch/sparc/Kconfig | 2 ++ > arch/sparc/include/asm/page_32.h | 2 +- > arch/sparc/include/asm/page_64.h | 3 +-- > arch/um/Kconfig| 1 + > arch/um/include/asm/page.h | 2 +- > arch/x86/Kconfig | 1 + > arch/x86/include/asm/page_types.h | 2 +- > arch/xtensa/Kconfig| 1 + > arch/xtensa/include/asm/page.h | 2 +- > 28 files changed, 32 insertions(+), 19 deletions(-) > diff --git a/arch/openrisc/Kconfig b/arch/openrisc/Kconfig > index fd9bb76a610b..3586cda55bde 100644 > --- a/arch/openrisc/Kconfig > +++ b/arch/openrisc/Kconfig > @@ -25,6 +25,7 @@ config OPENRISC > select GENERIC_CPU_DEVICES > select HAVE_PCI > select HAVE_UID16 > + select HAVE_PAGE_SIZE_8KB > select GENERIC_ATOMIC64 > select GENERIC_CLOCKEVENTS_BROADCAST > select GENERIC_SMP_IDLE_THREAD > diff --git a/arch/openrisc/include/asm/page.h > b/arch/openrisc/include/asm/page.h > index 44fc1fd56717..7925ce09ab5a 100644 > --- a/arch/openrisc/include/asm/page.h > +++ b/arch/openrisc/include/asm/page.h > @@ -18,7 +18,7 
@@ > > /* PAGE_SHIFT determines the page size */ > > -#define PAGE_SHIFT 13 > +#define PAGE_SHIFT CONFIG_PAGE_SHIFT > #ifdef __ASSEMBLY__ > #define PAGE_SIZE (1 << PAGE_SHIFT) > #else For the openrisc bits, Acked-by: Stafford Horne
Re: [PATCH v2 5/9] mm: Initialize struct vm_unmapped_area_info
On Wed, Feb 28, 2024 at 01:22:09PM +0000, Christophe Leroy wrote: > [...] > My worry with initialisation at declaration is it often hides missing > assignments. Let's take following simple exemple: > > char *colour(int num) > { > char *name; > > if (num == 0) { > name = "black"; > } else if (num == 1) { > name = "white"; > } else if (num == 2) { > } else { > name = "no colour"; > } > > return name; > } > > Here, GCC warns about a missing initialisation of variable 'name'. Sometimes. :( We build with -Wno-maybe-uninitialized because GCC gets this wrong too often. Also, like with large structs like this, all uninit warnings get suppressed if anything takes it by reference. So, if before your "return name" statement above, you had something like: do_something(); it won't warn with any option enabled. > But if I declare it as > > char *name = "no colour"; > > Then GCC won't warn anymore that we are missing a value for when num is 2. > > During my life I have so many times spent huge amount of time > investigating issues and bugs due to missing assignments that were going > undetected due to default initialisation at declaration. I totally understand. If the "uninitialized" warnings were actually reliable, I would agree. I look at it this way: - initializations can be missed either in static initializers or via run time initializers. (So the risk of mistake here is matched -- though I'd argue it's easier to *find* static initializers when adding new struct members.) - uninitialized warnings are inconsistent (this becomes an unknown risk) - when a run time initializer is missed, the contents are whatever was on the stack (high risk) - when a static initializer is missed, the content is 0 (low risk) I think unambiguous state (always 0) is significantly more important for the safety of the system as a whole. Yes, individual cases maybe bad ("what uid should this be? 
root?!") but from a general memory safety perspective the value doesn't become potentially influenced by order of operations, leftover stack memory, etc. I'd agree, lifting everything into a static initializer does seem cleanest of all the choices. -Kees -- Kees Cook
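Kees' risk ranking above can be seen in a small user-space sketch. The struct below is a simplified stand-in (the field names are illustrative, not the kernel's actual vm_unmapped_area_info layout): a caller written before a new member existed still hands the core code a guaranteed zero, because C99 designated initializers zero every member that is not named.

```c
#include <assert.h>

/* Simplified stand-in for a parameter-passing struct like
 * vm_unmapped_area_info; field names are illustrative only. */
struct area_info {
	unsigned long flags;
	unsigned long length;
	unsigned long align_mask;
	unsigned long new_field;	/* imagine this member was added later */
};

/* A caller written before new_field existed. With C99 designated
 * initializers, every member that is not named is zero-initialized,
 * so the "missed" new member is 0 rather than leftover stack memory. */
static struct area_info make_info(unsigned long len)
{
	struct area_info info = {
		.length = len,
		.align_mask = 0xfffUL,
	};
	return info;
}
```

Had `info` been declared without any initializer and the members assigned one by one, `new_field` would instead hold whatever was on the stack, which is the "high risk" case in the list above.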
Re: [PATCH v2 5/9] mm: Initialize struct vm_unmapped_area_info
On Wed, 2024-02-28 at 13:22 +0000, Christophe Leroy wrote:
> > Any preference? Or maybe am I missing your point and talking
> > nonsense?
>
> So my preference would go to the addition of:
>
> 	info.new_field = 0;
>
> But that's very minor and if you think it is easier to manage and
> maintain by performing {} initialisation at declaration, let's go for
> that.

Appreciate the clarification and help getting this right. I'm thinking
Kees' and now Kirill's point about this patch resulting in unnecessary
manual zero initialization of the structs is probably something that
needs to be addressed. If I created a bunch of patches to change each
call site, I think the best is probably to do the designated-field zero
initialization way. But I can do something special for powerpc if you
want. I'll first try with powerpc matching the others, and if it seems
objectionable, please let me know.

Thanks,
Rick
Re: [PATCH v2 5/9] mm: Initialize struct vm_unmapped_area_info
Le 27/02/2024 à 21:25, Edgecombe, Rick P a écrit :
> On Tue, 2024-02-27 at 18:16 +0000, Christophe Leroy wrote:
>>>> Why doing a full init of the struct when all fields are re-written
>>>> a few lines after ?
>>>
>>> It's a nice change for robustness and makes future changes easier.
>>> It's not actually wasteful since the compiler will throw away all
>>> redundant stores.
>>
>> Well, I tend to dislike default init at declaration because it often
>> hides missed real init. When a field is not initialized GCC should
>> emit a Warning, at least when built with W=2 which sets
>> -Wmissing-field-initializers ?
>
> Sorry, I'm not following where you are going with this. There aren't
> any struct vm_unmapped_area_info users that use initializers today, so
> that warning won't apply in this case. Meanwhile, designated style
> struct initialization (which would zero new members) is very common, as
> well as not get anything checked by that warning. Anything with this
> many members is probably going to use the designated style.
>
> If we are optimizing to avoid bugs, the way this struct is used today
> is not great. It is essentially being used as an argument passer.
> Normally when a function signature changes, but a caller is missed, of
> course the compiler will notice loudly. But not here. So I think
> probably zero initializing it is safer than being setup to pass
> garbage.

No worry, if everybody thinks that init at declaration is worth it in
that case it is OK for me and I'm not going to ask for something special
on powerpc; my comment was more general although I used powerpc as an
example.

My worry with initialisation at declaration is it often hides missing
assignments. Let's take the following simple example:

	char *colour(int num)
	{
		char *name;

		if (num == 0) {
			name = "black";
		} else if (num == 1) {
			name = "white";
		} else if (num == 2) {
		} else {
			name = "no colour";
		}

		return name;
	}

Here, GCC warns about a missing initialisation of variable 'name'.

But if I declare it as

	char *name = "no colour";

Then GCC won't warn anymore that we are missing a value for when num is 2.

During my life I have so many times spent huge amounts of time
investigating issues and bugs due to missing assignments that were going
undetected due to default initialisation at declaration.

> I'm trying to figure out what to do here. If I changed it so that just
> powerpc set the new field manually, then the convention across the
> kernel would be for everything to be default zero, and future other new
> parameters could have a greater chance of turning into garbage on
> powerpc. Since it could be easy to miss that powerpc was special. Would
> you prefer it?
>
> Or maybe I could try a new vm_unmapped_area() that takes the extra
> argument separately? The old callers could call the old function and
> not need any arch updates. It all seems strange though, because
> automatic zero initializing struct members is so common in the kernel.
> But it also wouldn't add the cleanup Kees was pointing out. Hmm.
>
> Any preference? Or maybe am I missing your point and talking nonsense?

So my preference would go to the addition of:

	info.new_field = 0;

But that's very minor and if you think it is easier to manage and
maintain by performing {} initialisation at declaration, let's go for
that.

Christophe
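Christophe's colour() example can be completed to show exactly what he is worried about. Below is the same function rewritten with initialization at declaration: it compiles cleanly under every warning level, but the forgotten `num == 2` branch is now silently masked by the default value instead of being flagged.

```c
#include <assert.h>
#include <string.h>

/* Christophe's colour() example rewritten with initialization at
 * declaration. The forgotten num == 2 branch no longer triggers any
 * compiler warning; it is silently masked by the default value. */
static const char *colour(int num)
{
	const char *name = "no colour";	/* default hides the missing branch */

	if (num == 0)
		name = "black";
	else if (num == 1)
		name = "white";
	else if (num == 2)
		;	/* assignment forgotten -- compiles without a peep */

	return name;
}
```

The behavioral difference from the uninitialized version is that the bug is now deterministic (the caller always gets the fallback string for `num == 2`) rather than undefined, which is the trade-off the thread is weighing.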
Re: [PATCH v2 5/9] mm: Initialize struct vm_unmapped_area_info
On Mon, Feb 26, 2024 at 11:09:47AM -0800, Rick Edgecombe wrote:
> diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
> index 5db88b627439..dd6801bb9240 100644
> --- a/arch/alpha/kernel/osf_sys.c
> +++ b/arch/alpha/kernel/osf_sys.c
> @@ -1218,7 +1218,7 @@ static unsigned long
> arch_get_unmapped_area_1(unsigned long addr, unsigned long len,
> 	unsigned long limit)
> {
> -	struct vm_unmapped_area_info info;
> +	struct vm_unmapped_area_info info = {};
>
> 	info.flags = 0;
> 	info.length = len;

Can we make a step forward and actually move initialization inside the
initializer? Something like below. I understand that it is substantially
more work, but I think it is useful.

diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
index 5db88b627439..c40ddede3b13 100644
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -1218,14 +1218,12 @@ static unsigned long
 arch_get_unmapped_area_1(unsigned long addr, unsigned long len,
 	unsigned long limit)
 {
-	struct vm_unmapped_area_info info;
+	struct vm_unmapped_area_info info = {
+		.length = len,
+		.low_limit = addr,
+		.high_limit = limit,
+	};
 
-	info.flags = 0;
-	info.length = len;
-	info.low_limit = addr;
-	info.high_limit = limit;
-	info.align_mask = 0;
-	info.align_offset = 0;
 	return vm_unmapped_area(&info);
 }

-- 
Kiryl Shutsemau / Kirill A. Shutemov
Re: [PATCH v2 5/9] mm: Initialize struct vm_unmapped_area_info
On Tue, 2024-02-27 at 18:16 +0000, Christophe Leroy wrote:
> > > Why doing a full init of the struct when all fields are re-written
> > > a few lines after ?
> >
> > It's a nice change for robustness and makes future changes easier.
> > It's not actually wasteful since the compiler will throw away all
> > redundant stores.
>
> Well, I tend to dislike default init at declaration because it often
> hides missed real init. When a field is not initialized GCC should
> emit a Warning, at least when built with W=2 which sets
> -Wmissing-field-initializers ?

Sorry, I'm not following where you are going with this. There aren't
any struct vm_unmapped_area_info users that use initializers today, so
that warning won't apply in this case. Meanwhile, designated style
struct initialization (which would zero new members) is very common, as
well as not get anything checked by that warning. Anything with this
many members is probably going to use the designated style.

If we are optimizing to avoid bugs, the way this struct is used today
is not great. It is essentially being used as an argument passer.
Normally when a function signature changes, but a caller is missed, of
course the compiler will notice loudly. But not here. So I think
probably zero initializing it is safer than being setup to pass
garbage.

I'm trying to figure out what to do here. If I changed it so that just
powerpc set the new field manually, then the convention across the
kernel would be for everything to be default zero, and future other new
parameters could have a greater chance of turning into garbage on
powerpc. Since it could be easy to miss that powerpc was special. Would
you prefer it?

Or maybe I could try a new vm_unmapped_area() that takes the extra
argument separately? The old callers could call the old function and
not need any arch updates. It all seems strange though, because
automatic zero initializing struct members is so common in the kernel.
But it also wouldn't add the cleanup Kees was pointing out. Hmm.

Any preference? Or maybe am I missing your point and talking nonsense?
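The "separate argument" alternative floated above can be sketched in a few lines. All names below are made up for illustration; this is not the kernel's vm_unmapped_area() API, just the shape of keeping the old entry point while adding a variant that takes the new parameter explicitly.

```c
#include <assert.h>

/* Hypothetical sketch: old callers keep the original signature,
 * new callers pass the extra argument through a second entry point.
 * Struct and function names are illustrative, not kernel API. */
struct area_info {
	unsigned long length;
	unsigned long low_limit;
	unsigned long high_limit;
};

/* Core search, now parameterized on the extra argument. */
static unsigned long area_search(const struct area_info *info,
				 unsigned long guard_gap)
{
	/* Placeholder body: only demonstrates the plumbing. */
	return info->low_limit + guard_gap;
}

/* New callers pass the extra argument directly... */
static unsigned long get_area_with_gap(const struct area_info *info,
				       unsigned long guard_gap)
{
	return area_search(info, guard_gap);
}

/* ...old callers keep the original signature and get a zero default,
 * so no existing arch call site would need to change. */
static unsigned long get_area(const struct area_info *info)
{
	return get_area_with_gap(info, 0);
}
```

The cost Rick notes still applies: this avoids touching arch code, but it gives up the call-site cleanup that the `= {}` approach enables, and every future optional parameter would need yet another wrapper.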
Re: [PATCH v2 5/9] mm: Initialize struct vm_unmapped_area_info
Le 27/02/2024 à 19:07, Kees Cook a écrit :
> On Tue, Feb 27, 2024 at 07:02:59AM +0000, Christophe Leroy wrote:
>> Le 26/02/2024 à 20:09, Rick Edgecombe a écrit :
>>> Future changes will need to add a field to struct vm_unmapped_area_info.
>>> This would cause trouble for any archs that don't initialize the
>>> struct. Currently every user sets each field, so if new fields are
>>> added, the core code parsing the struct will see garbage in the new
>>> field.
>>>
>>> It could be possible to initialize the new field for each arch to 0, but
>>> instead simply initialize the field with a C99 struct initializing syntax.
>>
>> Why doing a full init of the struct when all fields are re-written a few
>> lines after ?
>
> It's a nice change for robustness and makes future changes easier. It's
> not actually wasteful since the compiler will throw away all redundant
> stores.

Well, I tend to dislike default init at declaration because it often
hides missed real init. When a field is not initialized GCC should emit
a Warning, at least when built with W=2 which sets
-Wmissing-field-initializers ?

>> If I take the example of powerpc function slice_find_area_bottomup():
>>
>> 	struct vm_unmapped_area_info info;
>>
>> 	info.flags = 0;
>> 	info.length = len;
>> 	info.align_mask = PAGE_MASK & ((1ul << pshift) - 1);
>> 	info.align_offset = 0;
>
> But one cleanup that is possible from explicitly zero-initializing the
> whole structure would be dropping all the individual "= 0" assignments.
> :)

Sure, if we decide to go that direction all those 0 assignments become void.
Re: [PATCH v2 5/9] mm: Initialize struct vm_unmapped_area_info
On Tue, Feb 27, 2024 at 07:02:59AM +0000, Christophe Leroy wrote:
> Le 26/02/2024 à 20:09, Rick Edgecombe a écrit :
> > Future changes will need to add a field to struct vm_unmapped_area_info.
> > This would cause trouble for any archs that don't initialize the
> > struct. Currently every user sets each field, so if new fields are
> > added, the core code parsing the struct will see garbage in the new
> > field.
> >
> > It could be possible to initialize the new field for each arch to 0, but
> > instead simply initialize the field with a C99 struct initializing syntax.
>
> Why doing a full init of the struct when all fields are re-written a few
> lines after ?

It's a nice change for robustness and makes future changes easier. It's
not actually wasteful since the compiler will throw away all redundant
stores.

> If I take the example of powerpc function slice_find_area_bottomup():
>
> 	struct vm_unmapped_area_info info;
>
> 	info.flags = 0;
> 	info.length = len;
> 	info.align_mask = PAGE_MASK & ((1ul << pshift) - 1);
> 	info.align_offset = 0;

But one cleanup that is possible from explicitly zero-initializing the
whole structure would be dropping all the individual "= 0" assignments.
:)

-- 
Kees Cook
Re: [PATCH 1/4] arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions
On Tue, Feb 27, 2024, at 16:44, Christophe Leroy wrote: > Le 27/02/2024 à 16:40, Arnd Bergmann a écrit : >> On Mon, Feb 26, 2024, at 17:55, Samuel Holland wrote: > > > For 256K pages, powerpc has the following help. I think you should have > it too: > > The kernel will only be able to run applications that have been > compiled with '-zmax-page-size' set to 256K (the default is 64K) using > binutils later than 2.17.50.0.3, or by patching the ELF_MAXPAGESIZE > definition from 0x1 to 0x4 in older versions. I don't think we need to mention pre-2.18 binutils any more, but the rest seems useful, changed the text now to config PAGE_SIZE_256KB bool "256KiB pages" depends on HAVE_PAGE_SIZE_256KB help 256KiB pages have little practical value due to their extreme memory usage. The kernel will only be able to run applications that have been compiled with '-zmax-page-size' set to 256KiB (the default is 64KiB or 4KiB on most architectures). Arnd
Re: [PATCH 1/4] arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions
Le 27/02/2024 à 16:40, Arnd Bergmann a écrit : > On Mon, Feb 26, 2024, at 17:55, Samuel Holland wrote: >> On 2024-02-26 10:14 AM, Arnd Bergmann wrote: >>> >>> +config HAVE_PAGE_SIZE_4KB >>> + bool >>> + >>> +config HAVE_PAGE_SIZE_8KB >>> + bool >>> + >>> +config HAVE_PAGE_SIZE_16KB >>> + bool >>> + >>> +config HAVE_PAGE_SIZE_32KB >>> + bool >>> + >>> +config HAVE_PAGE_SIZE_64KB >>> + bool >>> + >>> +config HAVE_PAGE_SIZE_256KB >>> + bool >>> + >>> +choice >>> + prompt "MMU page size" >> >> Should this have some generic help text (at least a warning about >> compatibility)? > > Good point. I've added some of this now, based on the mips > text with some generalizations for other architectures: > > config PAGE_SIZE_4KB > bool "4KiB pages" > depends on HAVE_PAGE_SIZE_4KB > help >This option select the standard 4KiB Linux page size and the only >available option on many architectures. Using 4KiB page size will >minimize memory consumption and is therefore recommended for low >memory systems. >Some software that is written for x86 systems makes incorrect >assumptions about the page size and only runs on 4KiB pages. > > config PAGE_SIZE_8KB > bool "8KiB pages" > depends on HAVE_PAGE_SIZE_8KB > help >This option is the only supported page size on a few older >processors, and can be slightly faster than 4KiB pages. > > config PAGE_SIZE_16KB > bool "16KiB pages" > depends on HAVE_PAGE_SIZE_16KB > help >This option is usually a good compromise between memory >consumption and performance for typical desktop and server >workloads, often saving a level of page table lookups compared >to 4KB pages as well as reducing TLB pressure and overhead of >per-page operations in the kernel at the expense of a larger >page cache. > > config PAGE_SIZE_32KB > bool "32KiB pages" > depends on HAVE_PAGE_SIZE_32KB >Using 32KiB page size will result in slightly higher performance >kernel at the price of higher memory consumption compared to >16KiB pages. 
This option is available only on cnMIPS cores. >Note that you will need a suitable Linux distribution to >support this. > > config PAGE_SIZE_64KB > bool "64KiB pages" > depends on HAVE_PAGE_SIZE_64KB >Using 64KiB page size will result in slightly higher performance >kernel at the price of much higher memory consumption compared to >4KiB or 16KiB pages. >This is not suitable for general-purpose workloads but the >better performance may be worth the cost for certain types of >supercomputing or database applications that work mostly with >large in-memory data rather than small files. > > config PAGE_SIZE_256KB > bool "256KiB pages" > depends on HAVE_PAGE_SIZE_256KB > help >256KB pages have little practical value due to their extreme >memory usage. For 256K pages, powerpc has the following help. I think you should have it too: The kernel will only be able to run applications that have been compiled with '-zmax-page-size' set to 256K (the default is 64K) using binutils later than 2.17.50.0.3, or by patching the ELF_MAXPAGESIZE definition from 0x1 to 0x4 in older versions.
Re: [PATCH 1/4] arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions
On Tue, Feb 27, 2024, at 09:45, Geert Uytterhoeven wrote: > >> +config PAGE_SIZE_4KB >> + bool "4KB pages" > > Now you got rid of the 4000-byte ("4kB") pages and friends, please > do not replace these by Kelvin-bytes, and use the official binary > prefixes => "4 KiB". > Done, thanks. Arnd
Re: [PATCH 1/4] arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions
On Mon, Feb 26, 2024, at 20:02, Christophe Leroy wrote:
> Le 26/02/2024 à 17:14, Arnd Bergmann a écrit :
>> From: Arnd Bergmann
>
> That's a nice re-factor.
>
> The only drawback I see is that we are losing several interesting
> arch-specific comments/help text. Don't know if there could be an easy
> way to keep them.

This is what I have now, trying to write it as generic as possible
while still giving useful advice:

config PAGE_SIZE_4KB
	bool "4KiB pages"
	depends on HAVE_PAGE_SIZE_4KB
	help
	  This option selects the standard 4KiB Linux page size and the only
	  available option on many architectures. Using 4KiB page size will
	  minimize memory consumption and is therefore recommended for low
	  memory systems.
	  Some software that is written for x86 systems makes incorrect
	  assumptions about the page size and only runs on 4KiB pages.

config PAGE_SIZE_8KB
	bool "8KiB pages"
	depends on HAVE_PAGE_SIZE_8KB
	help
	  This option is the only supported page size on a few older
	  processors, and can be slightly faster than 4KiB pages.

config PAGE_SIZE_16KB
	bool "16KiB pages"
	depends on HAVE_PAGE_SIZE_16KB
	help
	  This option is usually a good compromise between memory
	  consumption and performance for typical desktop and server
	  workloads, often saving a level of page table lookups compared
	  to 4KB pages as well as reducing TLB pressure and overhead of
	  per-page operations in the kernel at the expense of a larger
	  page cache.

config PAGE_SIZE_32KB
	bool "32KiB pages"
	depends on HAVE_PAGE_SIZE_32KB
	  Using 32KiB page size will result in slightly higher performance
	  kernel at the price of higher memory consumption compared to
	  16KiB pages. This option is available only on cnMIPS cores.
	  Note that you will need a suitable Linux distribution to
	  support this.

config PAGE_SIZE_64KB
	bool "64KiB pages"
	depends on HAVE_PAGE_SIZE_64KB
	  Using 64KiB page size will result in slightly higher performance
	  kernel at the price of much higher memory consumption compared to
	  4KiB or 16KiB pages.
	  This is not suitable for general-purpose workloads but the
	  better performance may be worth the cost for certain types of
	  supercomputing or database applications that work mostly with
	  large in-memory data rather than small files.

config PAGE_SIZE_256KB
	bool "256KiB pages"
	depends on HAVE_PAGE_SIZE_256KB
	help
	  256KB pages have little practical value due to their extreme
	  memory usage.

Let me know if you think some of this should be adapted further.

>>
>> +#define PAGE_SHIFT	CONFIG_PAGE_SHIFT
>>  #define PAGE_SIZE	(1UL << PAGE_SHIFT)
>>  #define PAGE_MASK	(~((1 << PAGE_SHIFT) - 1))
>>
>
> Could we move PAGE_SIZE and PAGE_MASK in a generic/core header instead
> of having it duplicated for each arch ?

Yes, but I'm leaving this for a follow-up series, since I had to stop
somewhere and there is always room for cleaning up headers further ;-)

Arnd
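The PAGE_SIZE/PAGE_MASK duplication Christophe points at is small enough to see in a user-space rendering. Below, the macros are spelled out for a hypothetical CONFIG_PAGE_SHIFT of 13 (8 KiB pages, the m68k sun3/coldfire value); this is an illustration of how the derived macros follow from the one Kconfig symbol, not the kernel headers themselves.

```c
#include <assert.h>

/* Everything below derives from the single per-arch Kconfig value;
 * 13 is chosen here as an example (8 KiB pages). */
#define CONFIG_PAGE_SHIFT 13
#define PAGE_SHIFT CONFIG_PAGE_SHIFT
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Typical use of PAGE_MASK: round an address down to a page boundary. */
static unsigned long page_align_down(unsigned long addr)
{
	return addr & PAGE_MASK;
}
```

Since PAGE_SIZE and PAGE_MASK are pure functions of PAGE_SHIFT, moving them to a generic header (as suggested above) only requires each arch to stop redefining them, which is why Arnd defers it to a follow-up cleanup.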
Re: [PATCH 1/4] arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions
On Mon, Feb 26, 2024, at 17:55, Samuel Holland wrote: > On 2024-02-26 10:14 AM, Arnd Bergmann wrote: >> >> +config HAVE_PAGE_SIZE_4KB >> +bool >> + >> +config HAVE_PAGE_SIZE_8KB >> +bool >> + >> +config HAVE_PAGE_SIZE_16KB >> +bool >> + >> +config HAVE_PAGE_SIZE_32KB >> +bool >> + >> +config HAVE_PAGE_SIZE_64KB >> +bool >> + >> +config HAVE_PAGE_SIZE_256KB >> +bool >> + >> +choice >> +prompt "MMU page size" > > Should this have some generic help text (at least a warning about > compatibility)? Good point. I've added some of this now, based on the mips text with some generalizations for other architectures: config PAGE_SIZE_4KB bool "4KiB pages" depends on HAVE_PAGE_SIZE_4KB help This option select the standard 4KiB Linux page size and the only available option on many architectures. Using 4KiB page size will minimize memory consumption and is therefore recommended for low memory systems. Some software that is written for x86 systems makes incorrect assumptions about the page size and only runs on 4KiB pages. config PAGE_SIZE_8KB bool "8KiB pages" depends on HAVE_PAGE_SIZE_8KB help This option is the only supported page size on a few older processors, and can be slightly faster than 4KiB pages. config PAGE_SIZE_16KB bool "16KiB pages" depends on HAVE_PAGE_SIZE_16KB help This option is usually a good compromise between memory consumption and performance for typical desktop and server workloads, often saving a level of page table lookups compared to 4KB pages as well as reducing TLB pressure and overhead of per-page operations in the kernel at the expense of a larger page cache. config PAGE_SIZE_32KB bool "32KiB pages" depends on HAVE_PAGE_SIZE_32KB Using 32KiB page size will result in slightly higher performance kernel at the price of higher memory consumption compared to 16KiB pages. This option is available only on cnMIPS cores. Note that you will need a suitable Linux distribution to support this. 
config PAGE_SIZE_64KB bool "64KiB pages" depends on HAVE_PAGE_SIZE_64KB Using 64KiB page size will result in slightly higher performance kernel at the price of much higher memory consumption compared to 4KiB or 16KiB pages. This is not suitable for general-purpose workloads but the better performance may be worth the cost for certain types of supercomputing or database applications that work mostly with large in-memory data rather than small files. config PAGE_SIZE_256KB bool "256KiB pages" depends on HAVE_PAGE_SIZE_256KB help 256KB pages have little practical value due to their extreme memory usage. >> diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig >> index a880ee067d2e..aac46ee1a000 100644 >> --- a/arch/hexagon/Kconfig >> +++ b/arch/hexagon/Kconfig >> @@ -8,6 +8,11 @@ config HEXAGON >> select ARCH_HAS_SYNC_DMA_FOR_DEVICE >> select ARCH_NO_PREEMPT >> select DMA_GLOBAL_POOL >> +select FRAME_POINTER > > Looks like a paste error. > Fixed, thanks! I think that happened during a rebase. >> #ifdef CONFIG_PAGE_SIZE_1MB >> -#define PAGE_SHIFT 20 >> #define HEXAGON_L1_PTE_SIZE __HVM_PDE_S_1MB >> #endif > > The corresponding Kconfig option does not exist (and did not exist before this > patch). Yes, I noticed that as well. It's clearly harmless. Arnd
Re: [PATCH v2 5/9] mm: Initialize struct vm_unmapped_area_info
On Tue, 2024-02-27 at 07:02 +0000, Christophe Leroy wrote:
> > It could be possible to initialize the new field for each arch to
> > 0, but instead simply initialize the field with a C99 struct
> > initializing syntax.
>
> Why doing a full init of the struct when all fields are re-written a
> few lines after ?
>
> If I take the example of powerpc function slice_find_area_bottomup():
>
> 	struct vm_unmapped_area_info info;
>
> 	info.flags = 0;
> 	info.length = len;
> 	info.align_mask = PAGE_MASK & ((1ul << pshift) - 1);
> 	info.align_offset = 0;
>
> For me it looks better to just add:
>
> 	info.new_field = 0; /* or whatever value it needs to have */

Hi,

Thanks for taking a look. Yes, I guess that should have some
justification. I was thinking of two reasons:
1. No future additions of optional parameters would need to make tree
wide changes like this.
2. The change is easier to review and get correct because the necessary
context is within a single line. For example, in that function some of
the members are set within a while loop. The place you pointed to seems
to be the correct one, but a diff that had the new field set after:

	info.high_limit = addr;

...would look correct too, but not be.

What is the concern with C99 initialization? FWIW, the full series also
removes an indirect branch, and probably is a net win for performance
in this path.
Re: [PATCH 3/4] arch: define CONFIG_PAGE_SIZE_*KB on all architectures
On Tue, Feb 27, 2024, at 12:12, Geert Uytterhoeven wrote: > On Tue, Feb 27, 2024 at 11:59 AM Arnd Bergmann wrote: >> On Tue, Feb 27, 2024, at 09:54, Geert Uytterhoeven wrote: >> I was a bit unsure about how to best do this since there >> is not really a need for a fixed page size on nommu kernels, >> whereas the three MMU configs clearly tie the page size to >> the MMU rather than the platform. >> >> There should be no reason for coldfire to have a different >> page size from dragonball if neither of them actually uses >> hardware pages, so one of them could be changed later. > > Indeed, in theory, PAGE_SIZE doesn't matter for nommu, but the concept > of pages is used all over the place in Linux. > > I'm mostly worried about some Coldfire code relying on the actual value > of PAGE_SIZE in some other context. e.g. for configuring non-cacheable > regions. Right, any change here would have to be carefully tested. I would expect that a 4K page size would reduce memory consumption even on NOMMU systems that should have the same tradeoffs for representing files in the page cache and in mem_map[]. > And does this impact running nommu binaries on a system with MMU? > I.e. if nommu binaries were built with a 4 KiB PAGE_SIZE, do they > still run on MMU systems with an 8 KiB PAGE_SIZE (coldfire and sun3), > or are there some subtleties to take into account? As far as I understand, binaries have to be built and linked for the largest page size they can run on, so running them on a kernel with smaller page size usually works. One notable exception is sys_mmap2(), which on most architectures takes units of 4KiB but on m68k is actually written to take PAGE_SIZE units. As Al pointed out in f8b7256096a2 ("Unify sys_mmap*"), it has always been wrong on sun3, presumably because users of that predate modern glibc. Running coldfire nommu binaries on coldfire mmu kernels would run into the same bug if either of them changes PAGE_SIZE. 
If you can run coldfire nommu binaries on classic m68k, that is already broken in the same way. Arnd
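Arnd's sys_mmap2() point above comes down to a unit mismatch: the page-offset argument is in fixed 4096-byte units on most architectures, but the m68k implementation takes it in PAGE_SIZE units, so with an 8 KiB PAGE_SIZE the same byte offset encodes differently. A simplified model (not the actual syscall code):

```c
#include <assert.h>

/* mmap2()'s offset argument counts in units, not bytes. Most arches
 * fix the unit at 4096 bytes; the m68k implementation uses PAGE_SIZE
 * units instead. This helper only illustrates the encoding mismatch. */
static unsigned long byte_off_to_mmap2_units(unsigned long byte_off,
					     unsigned long unit)
{
	return byte_off / unit;
}
```

A binary linked against a 4096-unit userspace that runs on a kernel interpreting the argument in 8192-byte units would map at twice the intended file offset, which is why changing PAGE_SIZE on either side of the coldfire nommu/mmu pair would reintroduce the sun3 bug Al noted.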
Re: [PATCH 3/4] arch: define CONFIG_PAGE_SIZE_*KB on all architectures
On Mon, Feb 26, 2024 at 05:14:13PM +0100, Arnd Bergmann wrote: > From: Arnd Bergmann > > Most architectures only support a single hardcoded page size. In order > to ensure that each one of these sets the corresponding Kconfig symbols, > change over the PAGE_SHIFT definition to the common one and allow > only the hardware page size to be selected. > > Signed-off-by: Arnd Bergmann > --- ... > arch/s390/Kconfig | 1 + > arch/s390/include/asm/page.h | 2 +- ... > diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig > index fe565f3a3a91..b61c74c10050 100644 > --- a/arch/s390/Kconfig > +++ b/arch/s390/Kconfig > @@ -199,6 +199,7 @@ config S390 > select HAVE_MOD_ARCH_SPECIFIC > select HAVE_NMI > select HAVE_NOP_MCOUNT > + select HAVE_PAGE_SIZE_4KB > select HAVE_PCI > select HAVE_PERF_EVENTS > select HAVE_PERF_REGS > diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h > index 73b9c3bf377f..ded9548d11d9 100644 > --- a/arch/s390/include/asm/page.h > +++ b/arch/s390/include/asm/page.h > @@ -11,7 +11,7 @@ > #include > #include > > -#define _PAGE_SHIFT 12 > +#define _PAGE_SHIFT CONFIG_PAGE_SHIFT Acked-by: Heiko Carstens
Re: [PATCH 2/4] arch: simplify architecture specific page size configuration
On 2/26/24 17:14, Arnd Bergmann wrote:
> From: Arnd Bergmann
>
> arc, arm64, parisc and powerpc all have their own Kconfig symbols
> in place of the common CONFIG_PAGE_SIZE_4KB symbols. Change these
> so the common symbols are the ones that are actually used, while
> leaving the architecture specific ones as the user visible place
> for configuring it, to avoid breaking user configs.
>
> Signed-off-by: Arnd Bergmann
> ---
>  arch/arc/Kconfig                  | 3 +++
>  arch/arc/include/uapi/asm/page.h  | 6 ++
>  arch/arm64/Kconfig                | 29 +
>  arch/arm64/include/asm/page-def.h | 2 +-
>  arch/parisc/Kconfig               | 3 +++
>  arch/parisc/include/asm/page.h    | 10 +-

Acked-by: Helge Deller # parisc

Thanks for the cleanups!

Helge
Re: [PATCH 4/4] vdso: avoid including asm/page.h
On Mon, Feb 26, 2024 at 05:14:14PM +0100, Arnd Bergmann wrote: > From: Arnd Bergmann > > The recent change to the vdso_data_store broke building compat VDSO > on at least arm64 because it includes headers outside of the include/vdso/ > namespace: > > In file included from arch/arm64/include/asm/lse.h:5, > from arch/arm64/include/asm/cmpxchg.h:14, > from arch/arm64/include/asm/atomic.h:16, > from include/linux/atomic.h:7, > from include/asm-generic/bitops/atomic.h:5, > from arch/arm64/include/asm/bitops.h:25, > from include/linux/bitops.h:68, > from arch/arm64/include/asm/memory.h:209, > from arch/arm64/include/asm/page.h:46, > from include/vdso/datapage.h:22, > from lib/vdso/gettimeofday.c:5, > from : > arch/arm64/include/asm/atomic_ll_sc.h:298:9: error: unknown type name 'u128' > 298 | u128 full; > > Use an open-coded page size calculation based on the new CONFIG_PAGE_SHIFT > Kconfig symbol instead. > > Reported-by: Linux Kernel Functional Testing > Fixes: a0d2fcd62ac2 ("vdso/ARM: Make union vdso_data_store available for all > architectures") > Link: > https://lore.kernel.org/lkml/ca+g9fytrxxm_ko9fnpz3xarxhv7ud_yqp-teupqrnrhu+_0...@mail.gmail.com/ > Signed-off-by: Arnd Bergmann Acked-by: Catalin Marinas
Re: [PATCH 2/4] arch: simplify architecture specific page size configuration
On Mon, Feb 26, 2024 at 05:14:12PM +0100, Arnd Bergmann wrote: > From: Arnd Bergmann > > arc, arm64, parisc and powerpc all have their own Kconfig symbols > in place of the common CONFIG_PAGE_SIZE_4KB symbols. Change these > so the common symbols are the ones that are actually used, while > leaving the arhcitecture specific ones as the user visible > place for configuring it, to avoid breaking user configs. > > Signed-off-by: Arnd Bergmann For arm64: Acked-by: Catalin Marinas
Re: [PATCH 4/4] vdso: avoid including asm/page.h
Christophe Leroy writes: > Le 26/02/2024 à 17:14, Arnd Bergmann a écrit : >> From: Arnd Bergmann >> >> The recent change to the vdso_data_store broke building compat VDSO >> on at least arm64 because it includes headers outside of the include/vdso/ >> namespace: > > I understand that powerpc64 also has an issue, see > https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20231221120410.2226678-1-...@ellerman.id.au/ Yeah, and that patch would silently conflict with this series, which is not ideal. I could delay merging my patch above until after this series goes in, mine only fixes a fairly obscure build warning. cheers
Re: [PATCH 3/4] arch: define CONFIG_PAGE_SIZE_*KB on all architectures
Hi Arnd, CC Greg On Tue, Feb 27, 2024 at 11:59 AM Arnd Bergmann wrote: > On Tue, Feb 27, 2024, at 09:54, Geert Uytterhoeven wrote: > >> diff --git a/arch/m68k/Kconfig.cpu b/arch/m68k/Kconfig.cpu > >> index 9dcf245c9cbf..c777a129768a 100644 > >> --- a/arch/m68k/Kconfig.cpu > >> +++ b/arch/m68k/Kconfig.cpu > >> @@ -30,6 +30,7 @@ config COLDFIRE > >> select GENERIC_CSUM > >> select GPIOLIB > >> select HAVE_LEGACY_CLK > >> + select HAVE_PAGE_SIZE_8KB if !MMU > > > > if you would drop the !MMU-dependency here. > > > >> > >> endchoice > >> > >> @@ -45,6 +46,7 @@ config M68000 > >> select GENERIC_CSUM > >> select CPU_NO_EFFICIENT_FFS > >> select HAVE_ARCH_HASH > >> + select HAVE_PAGE_SIZE_4KB > > > > Perhaps replace this by > > > > config M68KCLASSIC > > bool "Classic M68K CPU family support" > > select HAVE_ARCH_PFN_VALID > > + select HAVE_PAGE_SIZE_4KB if !MMU > > > > so it covers all 680x0 CPUs without MMU? > > I was a bit unsure about how to best do this since there > is not really a need for a fixed page size on nommu kernels, > whereas the three MMU configs clearly tie the page size to > the MMU rather than the platform. > > There should be no reason for coldfire to have a different > page size from dragonball if neither of them actually uses > hardware pages, so one of them could be changed later. Indeed, in theory, PAGE_SIZE doesn't matter for nommu, but the concept of pages is used all over the place in Linux. I'm mostly worried about some Coldfire code relying on the actual value of PAGE_SIZE in some other context. e.g. for configuring non-cacheable regions. And does this impact running nommu binaries on a system with MMU? I.e. if nommu binaries were built with a 4 KiB PAGE_SIZE, do they still run on MMU systems with an 8 KiB PAGE_SIZE (coldfire and sun3), or are there some subtleties to take into account? 
Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [PATCH 3/4] arch: define CONFIG_PAGE_SIZE_*KB on all architectures
On Tue, Feb 27, 2024, at 09:54, Geert Uytterhoeven wrote: > Hi Arnd, >> diff --git a/arch/m68k/Kconfig.cpu b/arch/m68k/Kconfig.cpu >> index 9dcf245c9cbf..c777a129768a 100644 >> --- a/arch/m68k/Kconfig.cpu >> +++ b/arch/m68k/Kconfig.cpu >> @@ -30,6 +30,7 @@ config COLDFIRE >> select GENERIC_CSUM >> select GPIOLIB >> select HAVE_LEGACY_CLK >> + select HAVE_PAGE_SIZE_8KB if !MMU > > if you would drop the !MMU-dependency here. > >> >> endchoice >> >> @@ -45,6 +46,7 @@ config M68000 >> select GENERIC_CSUM >> select CPU_NO_EFFICIENT_FFS >> select HAVE_ARCH_HASH >> + select HAVE_PAGE_SIZE_4KB > > Perhaps replace this by > > config M68KCLASSIC > bool "Classic M68K CPU family support" > select HAVE_ARCH_PFN_VALID > + select HAVE_PAGE_SIZE_4KB if !MMU > > so it covers all 680x0 CPUs without MMU? I was a bit unsure about how to best do this since there is not really a need for a fixed page size on nommu kernels, whereas the three MMU configs clearly tie the page size to the MMU rather than the platform. There should be no reason for coldfire to have a different page size from dragonball if neither of them actually uses hardware pages, so one of them could be changed later. Let me know if that makes sense to you, or you still prefer me to change it like you suggested. Arnd
Re: [PATCH 3/4] arch: define CONFIG_PAGE_SIZE_*KB on all architectures
Hi Arnd, On Mon, Feb 26, 2024 at 5:15 PM Arnd Bergmann wrote: > From: Arnd Bergmann > > Most architectures only support a single hardcoded page size. In order > to ensure that each one of these sets the corresponding Kconfig symbols, > change over the PAGE_SHIFT definition to the common one and allow > only the hardware page size to be selected. > > Signed-off-by: Arnd Bergmann Thanks for your patch! > --- a/arch/m68k/Kconfig > +++ b/arch/m68k/Kconfig > @@ -84,12 +84,15 @@ config MMU > > config MMU_MOTOROLA > bool > + select HAVE_PAGE_SIZE_4KB > > config MMU_COLDFIRE > + select HAVE_PAGE_SIZE_8KB I think you can do without this... > bool > > config MMU_SUN3 > bool > + select HAVE_PAGE_SIZE_8KB > depends on MMU && !MMU_MOTOROLA && !MMU_COLDFIRE > > config ARCH_SUPPORTS_KEXEC > diff --git a/arch/m68k/Kconfig.cpu b/arch/m68k/Kconfig.cpu > index 9dcf245c9cbf..c777a129768a 100644 > --- a/arch/m68k/Kconfig.cpu > +++ b/arch/m68k/Kconfig.cpu > @@ -30,6 +30,7 @@ config COLDFIRE > select GENERIC_CSUM > select GPIOLIB > select HAVE_LEGACY_CLK > + select HAVE_PAGE_SIZE_8KB if !MMU if you would drop the !MMU-dependency here. > > endchoice > > @@ -45,6 +46,7 @@ config M68000 > select GENERIC_CSUM > select CPU_NO_EFFICIENT_FFS > select HAVE_ARCH_HASH > + select HAVE_PAGE_SIZE_4KB Perhaps replace this by config M68KCLASSIC bool "Classic M68K CPU family support" select HAVE_ARCH_PFN_VALID + select HAVE_PAGE_SIZE_4KB if !MMU so it covers all 680x0 CPUs without MMU? > select LEGACY_TIMER_TICK > help > The Freescale (was Motorola) 68000 CPU is the first generation of
Re: [PATCH 1/4] arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions
Hi Arnd, On Mon, Feb 26, 2024 at 5:14 PM Arnd Bergmann wrote: > From: Arnd Bergmann > > These four architectures define the same Kconfig symbols for configuring > the page size. Move the logic into a common place where it can be shared > with all other architectures. > > Signed-off-by: Arnd Bergmann Thanks for your patch! > --- a/arch/Kconfig > +++ b/arch/Kconfig > +config PAGE_SIZE_4KB > + bool "4KB pages" Now you got rid of the 4000-byte ("4kB") pages and friends, please do not replace these by Kelvin-bytes, and use the official binary prefixes => "4 KiB". > + depends on HAVE_PAGE_SIZE_4KB
Re: [PATCH v2 5/9] mm: Initialize struct vm_unmapped_area_info
On 26/02/2024 at 20:09, Rick Edgecombe wrote: > Future changes will need to add a field to struct vm_unmapped_area_info. > This would cause trouble for any archs that don't initialize the > struct. Currently every user sets each field, so if new fields are > added, the core code parsing the struct will see garbage in the new > field. > > It could be possible to initialize the new field for each arch to 0, but > instead simply inialize the field with a C99 struct inializing syntax. Why do a full init of the struct when all the fields are rewritten a few lines later? Take the example of the powerpc function slice_find_area_bottomup(): struct vm_unmapped_area_info info; info.flags = 0; info.length = len; info.align_mask = PAGE_MASK & ((1ul << pshift) - 1); info.align_offset = 0; To me it looks better to just add: info.new_field = 0; /* or whatever value it needs to have */ Christophe > > Cc: linux...@kvack.org > Cc: linux-alpha@vger.kernel.org > Cc: linux-snps-...@lists.infradead.org > Cc: linux-arm-ker...@lists.infradead.org > Cc: linux-c...@vger.kernel.org > Cc: loonga...@lists.linux.dev > Cc: linux-m...@vger.kernel.org > Cc: linux-par...@vger.kernel.org > Cc: linuxppc-...@lists.ozlabs.org > Cc: linux-s...@vger.kernel.org > Cc: linux...@vger.kernel.org > Cc: sparcli...@vger.kernel.org > Cc: x...@kernel.org > Suggested-by: Kirill A. Shutemov > Signed-off-by: Rick Edgecombe > Link: > https://lore.kernel.org/lkml/3ynogxcgokc6i6xojbxzzwqectg472laes24u7jmtktlxcch5e@dfytra3ia3zc/#t > --- > Hi archs, > > For some context, this is part of a larger series to improve shadow stack > guard gaps. It involves plumbing a new field via > struct vm_unmapped_area_info. The first user is x86, but arm and riscv may > likely use it as well. The change is compile tested only for non-x86 but > seems like a relatively safe one. 
> > Thanks, > > Rick > > v2: > - New patch > --- > arch/alpha/kernel/osf_sys.c | 2 +- > arch/arc/mm/mmap.c | 2 +- > arch/arm/mm/mmap.c | 4 ++-- > arch/csky/abiv1/mmap.c | 2 +- > arch/loongarch/mm/mmap.c | 2 +- > arch/mips/mm/mmap.c | 2 +- > arch/parisc/kernel/sys_parisc.c | 2 +- > arch/powerpc/mm/book3s64/slice.c | 4 ++-- > arch/s390/mm/hugetlbpage.c | 4 ++-- > arch/s390/mm/mmap.c | 4 ++-- > arch/sh/mm/mmap.c| 4 ++-- > arch/sparc/kernel/sys_sparc_32.c | 2 +- > arch/sparc/kernel/sys_sparc_64.c | 4 ++-- > arch/sparc/mm/hugetlbpage.c | 4 ++-- > arch/x86/kernel/sys_x86_64.c | 4 ++-- > arch/x86/mm/hugetlbpage.c| 4 ++-- > fs/hugetlbfs/inode.c | 4 ++-- > mm/mmap.c| 4 ++-- > 18 files changed, 29 insertions(+), 29 deletions(-) > > diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c > index 5db88b627439..dd6801bb9240 100644 > --- a/arch/alpha/kernel/osf_sys.c > +++ b/arch/alpha/kernel/osf_sys.c > @@ -1218,7 +1218,7 @@ static unsigned long > arch_get_unmapped_area_1(unsigned long addr, unsigned long len, >unsigned long limit) > { > - struct vm_unmapped_area_info info; > + struct vm_unmapped_area_info info = {}; > > info.flags = 0; > info.length = len; > diff --git a/arch/arc/mm/mmap.c b/arch/arc/mm/mmap.c > index 3c1c7ae73292..6549b3375f54 100644 > --- a/arch/arc/mm/mmap.c > +++ b/arch/arc/mm/mmap.c > @@ -27,7 +27,7 @@ arch_get_unmapped_area(struct file *filp, unsigned long > addr, > { > struct mm_struct *mm = current->mm; > struct vm_area_struct *vma; > - struct vm_unmapped_area_info info; > + struct vm_unmapped_area_info info = {}; > > /* >* We enforce the MAP_FIXED case. 
> diff --git a/arch/arm/mm/mmap.c b/arch/arm/mm/mmap.c > index a0f8a0ca0788..525795578c29 100644 > --- a/arch/arm/mm/mmap.c > +++ b/arch/arm/mm/mmap.c > @@ -34,7 +34,7 @@ arch_get_unmapped_area(struct file *filp, unsigned long > addr, > struct vm_area_struct *vma; > int do_align = 0; > int aliasing = cache_is_vipt_aliasing(); > - struct vm_unmapped_area_info info; > + struct vm_unmapped_area_info info = {}; > > /* >* We only need to do colour alignment if either the I or D > @@ -87,7 +87,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const > unsigned long addr0, > unsigned long addr = addr0; > int do_align = 0; > int aliasing = cache_is_vipt_aliasing(); > - struct vm_unmapped_area_info info; > + struct vm_unmapped_area_info info = {}; > > /* >* We only need to do colour alignment if either the I or D > diff --git a/arch/csky/abiv1/mmap.c b/arch/csky/abiv1/mmap.c > index 6792aca4..726659d41fa9 100644 > --- a/arch/csky/abiv1/mmap.c > +++ b/arch/csky/abiv1/mmap.c > @@ -28,7 +28,7 @@ arch_get_unmapped_area(struct file *filp, unsigned long > addr, > struct mm_struct *mm =
Re: [PATCH 3/4] arch: define CONFIG_PAGE_SIZE_*KB on all architectures
On Tue, Feb 27, 2024 at 12:15 AM Arnd Bergmann wrote: > > From: Arnd Bergmann > > Most architectures only support a single hardcoded page size. In order > to ensure that each one of these sets the corresponding Kconfig symbols, > change over the PAGE_SHIFT definition to the common one and allow > only the hardware page size to be selected. > > Signed-off-by: Arnd Bergmann > --- > arch/alpha/Kconfig | 1 + > arch/alpha/include/asm/page.h | 2 +- > arch/arm/Kconfig | 1 + > arch/arm/include/asm/page.h| 2 +- > arch/csky/Kconfig | 1 + > arch/csky/include/asm/page.h | 2 +- > arch/m68k/Kconfig | 3 +++ > arch/m68k/Kconfig.cpu | 2 ++ > arch/m68k/include/asm/page.h | 6 +- > arch/microblaze/Kconfig| 1 + > arch/microblaze/include/asm/page.h | 2 +- > arch/nios2/Kconfig | 1 + > arch/nios2/include/asm/page.h | 2 +- > arch/openrisc/Kconfig | 1 + > arch/openrisc/include/asm/page.h | 2 +- > arch/riscv/Kconfig | 1 + > arch/riscv/include/asm/page.h | 2 +- > arch/s390/Kconfig | 1 + > arch/s390/include/asm/page.h | 2 +- > arch/sparc/Kconfig | 2 ++ > arch/sparc/include/asm/page_32.h | 2 +- > arch/sparc/include/asm/page_64.h | 3 +-- > arch/um/Kconfig| 1 + > arch/um/include/asm/page.h | 2 +- > arch/x86/Kconfig | 1 + > arch/x86/include/asm/page_types.h | 2 +- > arch/xtensa/Kconfig| 1 + > arch/xtensa/include/asm/page.h | 2 +- > 28 files changed, 32 insertions(+), 19 deletions(-) > > diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig > index d6968d090d49..4f490250d323 100644 > --- a/arch/alpha/Kconfig > +++ b/arch/alpha/Kconfig > @@ -14,6 +14,7 @@ config ALPHA > select PCI_DOMAINS if PCI > select PCI_SYSCALL if PCI > select HAVE_ASM_MODVERSIONS > + select HAVE_PAGE_SIZE_8KB > select HAVE_PCSPKR_PLATFORM > select HAVE_PERF_EVENTS > select NEED_DMA_MAP_STATE > diff --git a/arch/alpha/include/asm/page.h b/arch/alpha/include/asm/page.h > index 4db1ebc0ed99..70419e6be1a3 100644 > --- a/arch/alpha/include/asm/page.h > +++ b/arch/alpha/include/asm/page.h > @@ -6,7 +6,7 @@ > #include > > /* 
PAGE_SHIFT determines the page size */ > -#define PAGE_SHIFT 13 > +#define PAGE_SHIFT CONFIG_PAGE_SHIFT > #define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT) > #define PAGE_MASK (~(PAGE_SIZE-1)) > > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig > index 0af6709570d1..9d52ba3a8ad1 100644 > --- a/arch/arm/Kconfig > +++ b/arch/arm/Kconfig > @@ -116,6 +116,7 @@ config ARM > select HAVE_MOD_ARCH_SPECIFIC > select HAVE_NMI > select HAVE_OPTPROBES if !THUMB2_KERNEL > + select HAVE_PAGE_SIZE_4KB > select HAVE_PCI if MMU > select HAVE_PERF_EVENTS > select HAVE_PERF_REGS > diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h > index 119aa85d1feb..62af9f7f9e96 100644 > --- a/arch/arm/include/asm/page.h > +++ b/arch/arm/include/asm/page.h > @@ -8,7 +8,7 @@ > #define _ASMARM_PAGE_H > > /* PAGE_SHIFT determines the page size */ > -#define PAGE_SHIFT 12 > +#define PAGE_SHIFT CONFIG_PAGE_SHIFT > #define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT) > #define PAGE_MASK (~((1 << PAGE_SHIFT) - 1)) > > diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig > index cf2a6fd7dff8..9c2723ab1c94 100644 > --- a/arch/csky/Kconfig > +++ b/arch/csky/Kconfig > @@ -89,6 +89,7 @@ config CSKY > select HAVE_KPROBES if !CPU_CK610 > select HAVE_KPROBES_ON_FTRACE if !CPU_CK610 > select HAVE_KRETPROBES if !CPU_CK610 > + select HAVE_PAGE_SIZE_4KB > select HAVE_PERF_EVENTS > select HAVE_PERF_REGS > select HAVE_PERF_USER_STACK_DUMP > diff --git a/arch/csky/include/asm/page.h b/arch/csky/include/asm/page.h > index 4a0502e324a6..f70f37402d75 100644 > --- a/arch/csky/include/asm/page.h > +++ b/arch/csky/include/asm/page.h > @@ -10,7 +10,7 @@ > /* > * PAGE_SHIFT determines the page size: 4KB > */ > -#define PAGE_SHIFT 12 > +#define PAGE_SHIFT CONFIG_PAGE_SHIFT LGTM, thx. 
Acked-by: Guo Ren > #define PAGE_SIZE (_AC(1, UL) << PAGE_SHIFT) > #define PAGE_MASK (~(PAGE_SIZE - 1)) > #define THREAD_SIZE(PAGE_SIZE * 2) > diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig > index 4b3e93cac723..7b709453d5e7 100644 > --- a/arch/m68k/Kconfig > +++ b/arch/m68k/Kconfig > @@ -84,12 +84,15 @@ config MMU > > config MMU_MOTOROLA > bool > + select HAVE_PAGE_SIZE_4KB > > config MMU_COLDFIRE > + select HAVE_PAGE_SIZE_8KB > bool > > config MMU_SUN3 > bool > + select HAVE_PAGE_SIZE_8KB > depends on MMU && !MMU_MOTOROLA && !MMU_COLDFIRE > > config ARCH_SUPPORTS_KEXEC > diff --git
[PATCH v2 5/9] mm: Initialize struct vm_unmapped_area_info
Future changes will need to add a field to struct vm_unmapped_area_info. This would cause trouble for any archs that don't initialize the struct. Currently every user sets each field, so if new fields are added, the core code parsing the struct will see garbage in the new field. The new field could be initialized to 0 at each call site, but instead simply initialize the whole struct with C99 struct initializer syntax. Cc: linux...@kvack.org Cc: linux-alpha@vger.kernel.org Cc: linux-snps-...@lists.infradead.org Cc: linux-arm-ker...@lists.infradead.org Cc: linux-c...@vger.kernel.org Cc: loonga...@lists.linux.dev Cc: linux-m...@vger.kernel.org Cc: linux-par...@vger.kernel.org Cc: linuxppc-...@lists.ozlabs.org Cc: linux-s...@vger.kernel.org Cc: linux...@vger.kernel.org Cc: sparcli...@vger.kernel.org Cc: x...@kernel.org Suggested-by: Kirill A. Shutemov Signed-off-by: Rick Edgecombe Link: https://lore.kernel.org/lkml/3ynogxcgokc6i6xojbxzzwqectg472laes24u7jmtktlxcch5e@dfytra3ia3zc/#t --- Hi archs, For some context, this is part of a larger series to improve shadow stack guard gaps. It involves plumbing a new field via struct vm_unmapped_area_info. The first user is x86, but arm and riscv will likely use it as well. The change is compile tested only for non-x86 but seems like a relatively safe one. 
Thanks, Rick v2: - New patch --- arch/alpha/kernel/osf_sys.c | 2 +- arch/arc/mm/mmap.c | 2 +- arch/arm/mm/mmap.c | 4 ++-- arch/csky/abiv1/mmap.c | 2 +- arch/loongarch/mm/mmap.c | 2 +- arch/mips/mm/mmap.c | 2 +- arch/parisc/kernel/sys_parisc.c | 2 +- arch/powerpc/mm/book3s64/slice.c | 4 ++-- arch/s390/mm/hugetlbpage.c | 4 ++-- arch/s390/mm/mmap.c | 4 ++-- arch/sh/mm/mmap.c| 4 ++-- arch/sparc/kernel/sys_sparc_32.c | 2 +- arch/sparc/kernel/sys_sparc_64.c | 4 ++-- arch/sparc/mm/hugetlbpage.c | 4 ++-- arch/x86/kernel/sys_x86_64.c | 4 ++-- arch/x86/mm/hugetlbpage.c| 4 ++-- fs/hugetlbfs/inode.c | 4 ++-- mm/mmap.c| 4 ++-- 18 files changed, 29 insertions(+), 29 deletions(-) diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c index 5db88b627439..dd6801bb9240 100644 --- a/arch/alpha/kernel/osf_sys.c +++ b/arch/alpha/kernel/osf_sys.c @@ -1218,7 +1218,7 @@ static unsigned long arch_get_unmapped_area_1(unsigned long addr, unsigned long len, unsigned long limit) { - struct vm_unmapped_area_info info; + struct vm_unmapped_area_info info = {}; info.flags = 0; info.length = len; diff --git a/arch/arc/mm/mmap.c b/arch/arc/mm/mmap.c index 3c1c7ae73292..6549b3375f54 100644 --- a/arch/arc/mm/mmap.c +++ b/arch/arc/mm/mmap.c @@ -27,7 +27,7 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr, { struct mm_struct *mm = current->mm; struct vm_area_struct *vma; - struct vm_unmapped_area_info info; + struct vm_unmapped_area_info info = {}; /* * We enforce the MAP_FIXED case. 
diff --git a/arch/arm/mm/mmap.c b/arch/arm/mm/mmap.c index a0f8a0ca0788..525795578c29 100644 --- a/arch/arm/mm/mmap.c +++ b/arch/arm/mm/mmap.c @@ -34,7 +34,7 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr, struct vm_area_struct *vma; int do_align = 0; int aliasing = cache_is_vipt_aliasing(); - struct vm_unmapped_area_info info; + struct vm_unmapped_area_info info = {}; /* * We only need to do colour alignment if either the I or D @@ -87,7 +87,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, unsigned long addr = addr0; int do_align = 0; int aliasing = cache_is_vipt_aliasing(); - struct vm_unmapped_area_info info; + struct vm_unmapped_area_info info = {}; /* * We only need to do colour alignment if either the I or D diff --git a/arch/csky/abiv1/mmap.c b/arch/csky/abiv1/mmap.c index 6792aca4..726659d41fa9 100644 --- a/arch/csky/abiv1/mmap.c +++ b/arch/csky/abiv1/mmap.c @@ -28,7 +28,7 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr, struct mm_struct *mm = current->mm; struct vm_area_struct *vma; int do_align = 0; - struct vm_unmapped_area_info info; + struct vm_unmapped_area_info info = {}; /* * We only need to do colour alignment if either the I or D diff --git a/arch/loongarch/mm/mmap.c b/arch/loongarch/mm/mmap.c index a9630a81b38a..664bf4abfdcf 100644 --- a/arch/loongarch/mm/mmap.c +++ b/arch/loongarch/mm/mmap.c @@ -24,7 +24,7 @@ static unsigned long arch_get_unmapped_area_common(struct file *filp, struct vm_area_struct *vma; unsigned long addr = addr0; int do_color_align; - struct vm_unmapped_area_info info; + struct vm_unmapped_area_info info = {}; if (unlikely(len > TASK_SIZE))
Re: [PATCH 2/4] arch: simplify architecture specific page size configuration
On 26/02/2024 at 17:14, Arnd Bergmann wrote: > From: Arnd Bergmann > > arc, arm64, parisc and powerpc all have their own Kconfig symbols > in place of the common CONFIG_PAGE_SIZE_4KB symbols. Change these > so the common symbols are the ones that are actually used, while > leaving the architecture specific ones as the user visible > place for configuring it, to avoid breaking user configs. > > Signed-off-by: Arnd Bergmann Reviewed-by: Christophe Leroy (powerpc32) > --- > arch/arc/Kconfig | 3 +++ > arch/arc/include/uapi/asm/page.h | 6 ++ > arch/arm64/Kconfig| 29 + > arch/arm64/include/asm/page-def.h | 2 +- > arch/parisc/Kconfig | 3 +++ > arch/parisc/include/asm/page.h| 10 +- > arch/powerpc/Kconfig | 31 ++- > arch/powerpc/include/asm/page.h | 2 +- > scripts/gdb/linux/constants.py.in | 2 +- > scripts/gdb/linux/mm.py | 2 +- > 10 files changed, 32 insertions(+), 58 deletions(-) > > diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig > index 1b0483c51cc1..4092bec198be 100644 > --- a/arch/arc/Kconfig > +++ b/arch/arc/Kconfig > @@ -284,14 +284,17 @@ choice > > config ARC_PAGE_SIZE_8K > bool "8KB" > + select HAVE_PAGE_SIZE_8KB > help > Choose between 8k vs 16k > > config ARC_PAGE_SIZE_16K > + select HAVE_PAGE_SIZE_16KB > bool "16KB" > > config ARC_PAGE_SIZE_4K > bool "4KB" > + select HAVE_PAGE_SIZE_4KB > depends on ARC_MMU_V3 || ARC_MMU_V4 > > endchoice > diff --git a/arch/arc/include/uapi/asm/page.h > b/arch/arc/include/uapi/asm/page.h > index 2a4ad619abfb..7fd9e741b527 100644 > --- a/arch/arc/include/uapi/asm/page.h > +++ b/arch/arc/include/uapi/asm/page.h > @@ -13,10 +13,8 @@ > #include > > /* PAGE_SHIFT determines the page size */ > -#if defined(CONFIG_ARC_PAGE_SIZE_16K) > -#define PAGE_SHIFT 14 > -#elif defined(CONFIG_ARC_PAGE_SIZE_4K) > -#define PAGE_SHIFT 12 > +#ifdef __KERNEL__ > +#define PAGE_SHIFT CONFIG_PAGE_SHIFT > #else > /* >* Default 8k > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > index aa7c1d435139..29290b8cb36d 100644 > --- a/arch/arm64/Kconfig > 
+++ b/arch/arm64/Kconfig > @@ -277,27 +277,21 @@ config 64BIT > config MMU > def_bool y > > -config ARM64_PAGE_SHIFT > - int > - default 16 if ARM64_64K_PAGES > - default 14 if ARM64_16K_PAGES > - default 12 > - > config ARM64_CONT_PTE_SHIFT > int > - default 5 if ARM64_64K_PAGES > - default 7 if ARM64_16K_PAGES > + default 5 if PAGE_SIZE_64KB > + default 7 if PAGE_SIZE_16KB > default 4 > > config ARM64_CONT_PMD_SHIFT > int > - default 5 if ARM64_64K_PAGES > - default 5 if ARM64_16K_PAGES > + default 5 if PAGE_SIZE_64KB > + default 5 if PAGE_SIZE_16KB > default 4 > > config ARCH_MMAP_RND_BITS_MIN > - default 14 if ARM64_64K_PAGES > - default 16 if ARM64_16K_PAGES > + default 14 if PAGE_SIZE_64KB > + default 16 if PAGE_SIZE_16KB > default 18 > > # max bits determined by the following formula: > @@ -1259,11 +1253,13 @@ choice > > config ARM64_4K_PAGES > bool "4KB" > + select HAVE_PAGE_SIZE_4KB > help > This feature enables 4KB pages support. > > config ARM64_16K_PAGES > bool "16KB" > + select HAVE_PAGE_SIZE_16KB > help > The system will use 16KB pages support. 
AArch32 emulation > requires applications compiled with 16K (or a multiple of 16K) > @@ -1271,6 +1267,7 @@ config ARM64_16K_PAGES > > config ARM64_64K_PAGES > bool "64KB" > + select HAVE_PAGE_SIZE_64KB > help > This feature enables 64KB pages support (4KB by default) > allowing only two levels of page tables and faster TLB > @@ -1291,19 +1288,19 @@ choice > > config ARM64_VA_BITS_36 > bool "36-bit" if EXPERT > - depends on ARM64_16K_PAGES > + depends on PAGE_SIZE_16KB > > config ARM64_VA_BITS_39 > bool "39-bit" > - depends on ARM64_4K_PAGES > + depends on PAGE_SIZE_4KB > > config ARM64_VA_BITS_42 > bool "42-bit" > - depends on ARM64_64K_PAGES > + depends on PAGE_SIZE_64KB > > config ARM64_VA_BITS_47 > bool "47-bit" > - depends on ARM64_16K_PAGES > + depends on PAGE_SIZE_16KB > > config ARM64_VA_BITS_48 > bool "48-bit" > diff --git a/arch/arm64/include/asm/page-def.h > b/arch/arm64/include/asm/page-def.h > index 2403f7b4cdbf..792e9fe881dc 100644 > --- a/arch/arm64/include/asm/page-def.h > +++ b/arch/arm64/include/asm/page-def.h > @@ -11,7 +11,7 @@ > #include > > /* PAGE_SHIFT determines the page size */ > -#define PAGE_SHIFT CONFIG_ARM64_PAGE_SHIFT > +#define PAGE_SHIFT CONFIG_PAGE_SHIFT > #define PAGE_SIZE (_AC(1, UL) << PAGE_SHIFT) > #define PAGE_MASK
Re: [PATCH 1/4] arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions
On 26/02/2024 at 17:14, Arnd Bergmann wrote: > From: Arnd Bergmann > > These four architectures define the same Kconfig symbols for configuring > the page size. Move the logic into a common place where it can be shared > with all other architectures. > > Signed-off-by: Arnd Bergmann > --- > arch/Kconfig | 58 +-- > arch/hexagon/Kconfig | 25 +++-- > arch/hexagon/include/asm/page.h | 6 +--- > arch/loongarch/Kconfig| 21 --- > arch/loongarch/include/asm/page.h | 10 +- > arch/mips/Kconfig | 58 +++ > arch/mips/include/asm/page.h | 16 + > arch/sh/include/asm/page.h| 13 +-- > arch/sh/mm/Kconfig| 42 +++--- > 9 files changed, 88 insertions(+), 161 deletions(-) > > diff --git a/arch/Kconfig b/arch/Kconfig > index a5af0edd3eb8..237cea01ed9b 100644 > --- a/arch/Kconfig > +++ b/arch/Kconfig > @@ -1078,17 +1078,71 @@ config HAVE_ARCH_COMPAT_MMAP_BASES > and vice-versa 32-bit applications to call 64-bit mmap(). > Required for applications doing different bitness syscalls. > > +config HAVE_PAGE_SIZE_4KB > + bool > + > +config HAVE_PAGE_SIZE_8KB > + bool > + > +config HAVE_PAGE_SIZE_16KB > + bool > + > +config HAVE_PAGE_SIZE_32KB > + bool > + > +config HAVE_PAGE_SIZE_64KB > + bool > + > +config HAVE_PAGE_SIZE_256KB > + bool > + > +choice > + prompt "MMU page size" > + That's a nice refactor. The only drawback I see is that we are losing several interesting arch-specific comments/help text. I don't know if there could be an easy way to keep them. 
> +config PAGE_SIZE_4KB > + bool "4KB pages" > + depends on HAVE_PAGE_SIZE_4KB > + > +config PAGE_SIZE_8KB > + bool "8KB pages" > + depends on HAVE_PAGE_SIZE_8KB > + > +config PAGE_SIZE_16KB > + bool "16KB pages" > + depends on HAVE_PAGE_SIZE_16KB > + > +config PAGE_SIZE_32KB > + bool "32KB pages" > + depends on HAVE_PAGE_SIZE_32KB > + > +config PAGE_SIZE_64KB > + bool "64KB pages" > + depends on HAVE_PAGE_SIZE_64KB > + > +config PAGE_SIZE_256KB > + bool "256KB pages" > + depends on HAVE_PAGE_SIZE_256KB Hexagon seem to also use CONFIG_PAGE_SIZE_1MB ? > + > +endchoice > + > config PAGE_SIZE_LESS_THAN_64KB > def_bool y > - depends on !ARM64_64K_PAGES > depends on !PAGE_SIZE_64KB > - depends on !PARISC_PAGE_SIZE_64KB > depends on PAGE_SIZE_LESS_THAN_256KB > > config PAGE_SIZE_LESS_THAN_256KB > def_bool y > depends on !PAGE_SIZE_256KB > > +config PAGE_SHIFT > + int > + default 12 if PAGE_SIZE_4KB > + default 13 if PAGE_SIZE_8KB > + default 14 if PAGE_SIZE_16KB > + default 15 if PAGE_SIZE_32KB > + default 16 if PAGE_SIZE_64KB > + default 18 if PAGE_SIZE_256KB > + > # This allows to use a set of generic functions to determine mmap base > # address by giving priority to top-down scheme only if the process > # is not in legacy mode (compat task, unlimited stack size or > diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig > index a880ee067d2e..aac46ee1a000 100644 > --- a/arch/hexagon/Kconfig > +++ b/arch/hexagon/Kconfig > @@ -8,6 +8,11 @@ config HEXAGON > select ARCH_HAS_SYNC_DMA_FOR_DEVICE > select ARCH_NO_PREEMPT > select DMA_GLOBAL_POOL > + select FRAME_POINTER > + select HAVE_PAGE_SIZE_4KB > + select HAVE_PAGE_SIZE_16KB > + select HAVE_PAGE_SIZE_64KB > + select HAVE_PAGE_SIZE_256KB > # Other pending projects/to-do items. > # select HAVE_REGS_AND_STACK_ACCESS_API > # select HAVE_HW_BREAKPOINT if PERF_EVENTS > @@ -120,26 +125,6 @@ config NR_CPUS > This is purely to save memory - each supported CPU adds > approximately eight kilobytes to the kernel image. 
> > -choice > - prompt "Kernel page size" > - default PAGE_SIZE_4KB > - help > - Changes the default page size; use with caution. > - > -config PAGE_SIZE_4KB > - bool "4KB" > - > -config PAGE_SIZE_16KB > - bool "16KB" > - > -config PAGE_SIZE_64KB > - bool "64KB" > - > -config PAGE_SIZE_256KB > - bool "256KB" > - > -endchoice > - > source "kernel/Kconfig.hz" > > endmenu > diff --git a/arch/hexagon/include/asm/page.h b/arch/hexagon/include/asm/page.h > index 10f1bc07423c..65c9bac639fa 100644 > --- a/arch/hexagon/include/asm/page.h > +++ b/arch/hexagon/include/asm/page.h > @@ -13,27 +13,22 @@ > /* This is probably not the most graceful way to handle this. */ > > #ifdef CONFIG_PAGE_SIZE_4KB > -#define PAGE_SHIFT 12 > #define HEXAGON_L1_PTE_SIZE __HVM_PDE_S_4KB > #endif > > #ifdef CONFIG_PAGE_SIZE_16KB > -#define PAGE_SHIFT 14 > #define HEXAGON_L1_PTE_SIZE __HVM_PDE_S_16KB > #endif > > #ifdef CONFIG_PAGE_SIZE_64KB > -#define PAGE_SHIFT 16 > #define HEXAGON_L1_PTE_SIZE __HVM_PDE_S_64KB >
Re: [PATCH 4/4] vdso: avoid including asm/page.h
On 26/02/2024 at 17:14, Arnd Bergmann wrote: > From: Arnd Bergmann > > The recent change to the vdso_data_store broke building compat VDSO > on at least arm64 because it includes headers outside of the include/vdso/ > namespace: I understand that powerpc64 also has an issue, see https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20231221120410.2226678-1-...@ellerman.id.au/ > > In file included from arch/arm64/include/asm/lse.h:5, > from arch/arm64/include/asm/cmpxchg.h:14, > from arch/arm64/include/asm/atomic.h:16, > from include/linux/atomic.h:7, > from include/asm-generic/bitops/atomic.h:5, > from arch/arm64/include/asm/bitops.h:25, > from include/linux/bitops.h:68, > from arch/arm64/include/asm/memory.h:209, > from arch/arm64/include/asm/page.h:46, > from include/vdso/datapage.h:22, > from lib/vdso/gettimeofday.c:5, > from : > arch/arm64/include/asm/atomic_ll_sc.h:298:9: error: unknown type name 'u128' >298 | u128 full; > > Use an open-coded page size calculation based on the new CONFIG_PAGE_SHIFT > Kconfig symbol instead. 
> > Reported-by: Linux Kernel Functional Testing > Fixes: a0d2fcd62ac2 ("vdso/ARM: Make union vdso_data_store available for all > architectures") > Link: > https://lore.kernel.org/lkml/ca+g9fytrxxm_ko9fnpz3xarxhv7ud_yqp-teupqrnrhu+_0...@mail.gmail.com/ > Signed-off-by: Arnd Bergmann > --- > include/vdso/datapage.h | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/include/vdso/datapage.h b/include/vdso/datapage.h > index 7ba44379a095..2c39a67d7e23 100644 > --- a/include/vdso/datapage.h > +++ b/include/vdso/datapage.h > @@ -19,8 +19,6 @@ > #include > #include > > -#include > - > #ifdef CONFIG_ARCH_HAS_VDSO_DATA > #include > #else > @@ -128,7 +126,7 @@ extern struct vdso_data _timens_data[CS_BASES] > __attribute__((visibility("hidden >*/ > union vdso_data_store { > struct vdso_datadata[CS_BASES]; > - u8 page[PAGE_SIZE]; > + u8 page[1ul << CONFIG_PAGE_SHIFT]; Usually 1UL is used (capital letters). Maybe better to (re)define PAGE_SIZE instead, something like: #define PAGE_SIZE (1UL << CONFIG_PAGE_SHIFT) > }; > /*
Re: [PATCH 1/4] arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions
On 2024-02-26 10:14 AM, Arnd Bergmann wrote: > From: Arnd Bergmann > > These four architectures define the same Kconfig symbols for configuring > the page size. Move the logic into a common place where it can be shared > with all other architectures. > > Signed-off-by: Arnd Bergmann > --- > arch/Kconfig | 58 +-- > arch/hexagon/Kconfig | 25 +++-- > arch/hexagon/include/asm/page.h | 6 +--- > arch/loongarch/Kconfig| 21 --- > arch/loongarch/include/asm/page.h | 10 +- > arch/mips/Kconfig | 58 +++ > arch/mips/include/asm/page.h | 16 + > arch/sh/include/asm/page.h| 13 +-- > arch/sh/mm/Kconfig| 42 +++--- > 9 files changed, 88 insertions(+), 161 deletions(-) > > diff --git a/arch/Kconfig b/arch/Kconfig > index a5af0edd3eb8..237cea01ed9b 100644 > --- a/arch/Kconfig > +++ b/arch/Kconfig > @@ -1078,17 +1078,71 @@ config HAVE_ARCH_COMPAT_MMAP_BASES > and vice-versa 32-bit applications to call 64-bit mmap(). > Required for applications doing different bitness syscalls. > > +config HAVE_PAGE_SIZE_4KB > + bool > + > +config HAVE_PAGE_SIZE_8KB > + bool > + > +config HAVE_PAGE_SIZE_16KB > + bool > + > +config HAVE_PAGE_SIZE_32KB > + bool > + > +config HAVE_PAGE_SIZE_64KB > + bool > + > +config HAVE_PAGE_SIZE_256KB > + bool > + > +choice > + prompt "MMU page size" Should this have some generic help text (at least a warning about compatibility)? 
> + > +config PAGE_SIZE_4KB > + bool "4KB pages" > + depends on HAVE_PAGE_SIZE_4KB > + > +config PAGE_SIZE_8KB > + bool "8KB pages" > + depends on HAVE_PAGE_SIZE_8KB > + > +config PAGE_SIZE_16KB > + bool "16KB pages" > + depends on HAVE_PAGE_SIZE_16KB > + > +config PAGE_SIZE_32KB > + bool "32KB pages" > + depends on HAVE_PAGE_SIZE_32KB > + > +config PAGE_SIZE_64KB > + bool "64KB pages" > + depends on HAVE_PAGE_SIZE_64KB > + > +config PAGE_SIZE_256KB > + bool "256KB pages" > + depends on HAVE_PAGE_SIZE_256KB > + > +endchoice > + > config PAGE_SIZE_LESS_THAN_64KB > def_bool y > - depends on !ARM64_64K_PAGES > depends on !PAGE_SIZE_64KB > - depends on !PARISC_PAGE_SIZE_64KB > depends on PAGE_SIZE_LESS_THAN_256KB > > config PAGE_SIZE_LESS_THAN_256KB > def_bool y > depends on !PAGE_SIZE_256KB > > +config PAGE_SHIFT > + int > + default 12 if PAGE_SIZE_4KB > + default 13 if PAGE_SIZE_8KB > + default 14 if PAGE_SIZE_16KB > + default 15 if PAGE_SIZE_32KB > + default 16 if PAGE_SIZE_64KB > + default 18 if PAGE_SIZE_256KB > + > # This allows to use a set of generic functions to determine mmap base > # address by giving priority to top-down scheme only if the process > # is not in legacy mode (compat task, unlimited stack size or > diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig > index a880ee067d2e..aac46ee1a000 100644 > --- a/arch/hexagon/Kconfig > +++ b/arch/hexagon/Kconfig > @@ -8,6 +8,11 @@ config HEXAGON > select ARCH_HAS_SYNC_DMA_FOR_DEVICE > select ARCH_NO_PREEMPT > select DMA_GLOBAL_POOL > + select FRAME_POINTER Looks like a paste error. > + select HAVE_PAGE_SIZE_4KB > + select HAVE_PAGE_SIZE_16KB > + select HAVE_PAGE_SIZE_64KB > + select HAVE_PAGE_SIZE_256KB > # Other pending projects/to-do items. > # select HAVE_REGS_AND_STACK_ACCESS_API > # select HAVE_HW_BREAKPOINT if PERF_EVENTS > @@ -120,26 +125,6 @@ config NR_CPUS > This is purely to save memory - each supported CPU adds > approximately eight kilobytes to the kernel image. 
> > -choice > - prompt "Kernel page size" > - default PAGE_SIZE_4KB > - help > - Changes the default page size; use with caution. > - > -config PAGE_SIZE_4KB > - bool "4KB" > - > -config PAGE_SIZE_16KB > - bool "16KB" > - > -config PAGE_SIZE_64KB > - bool "64KB" > - > -config PAGE_SIZE_256KB > - bool "256KB" > - > -endchoice > - > source "kernel/Kconfig.hz" > > endmenu > diff --git a/arch/hexagon/include/asm/page.h b/arch/hexagon/include/asm/page.h > index 10f1bc07423c..65c9bac639fa 100644 > --- a/arch/hexagon/include/asm/page.h > +++ b/arch/hexagon/include/asm/page.h > @@ -13,27 +13,22 @@ > /* This is probably not the most graceful way to handle this. */ > > #ifdef CONFIG_PAGE_SIZE_4KB > -#define PAGE_SHIFT 12 > #define HEXAGON_L1_PTE_SIZE __HVM_PDE_S_4KB > #endif > > #ifdef CONFIG_PAGE_SIZE_16KB > -#define PAGE_SHIFT 14 > #define HEXAGON_L1_PTE_SIZE __HVM_PDE_S_16KB > #endif > > #ifdef CONFIG_PAGE_SIZE_64KB > -#define PAGE_SHIFT 16 > #define HEXAGON_L1_PTE_SIZE __HVM_PDE_S_64KB > #endif > > #ifdef CONFIG_PAGE_SIZE_256KB > -#define PAGE_SHIFT 18 > #define HEXAGON_L1_PTE_SIZE __HVM_PDE_S_256KB > #endif > > #ifdef CONFIG_PAGE_SIZE_1MB >
[PATCH 4/4] vdso: avoid including asm/page.h
From: Arnd Bergmann

The recent change to the vdso_data_store broke building compat VDSO on at least arm64 because it includes headers outside of the include/vdso/ namespace:

In file included from arch/arm64/include/asm/lse.h:5,
                 from arch/arm64/include/asm/cmpxchg.h:14,
                 from arch/arm64/include/asm/atomic.h:16,
                 from include/linux/atomic.h:7,
                 from include/asm-generic/bitops/atomic.h:5,
                 from arch/arm64/include/asm/bitops.h:25,
                 from include/linux/bitops.h:68,
                 from arch/arm64/include/asm/memory.h:209,
                 from arch/arm64/include/asm/page.h:46,
                 from include/vdso/datapage.h:22,
                 from lib/vdso/gettimeofday.c:5,
                 from <command-line>:
arch/arm64/include/asm/atomic_ll_sc.h:298:9: error: unknown type name 'u128'
  298 |         u128 full;

Use an open-coded page size calculation based on the new CONFIG_PAGE_SHIFT Kconfig symbol instead.

Reported-by: Linux Kernel Functional Testing
Fixes: a0d2fcd62ac2 ("vdso/ARM: Make union vdso_data_store available for all architectures")
Link: https://lore.kernel.org/lkml/ca+g9fytrxxm_ko9fnpz3xarxhv7ud_yqp-teupqrnrhu+_0...@mail.gmail.com/
Signed-off-by: Arnd Bergmann
---
 include/vdso/datapage.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/include/vdso/datapage.h b/include/vdso/datapage.h
index 7ba44379a095..2c39a67d7e23 100644
--- a/include/vdso/datapage.h
+++ b/include/vdso/datapage.h
@@ -19,8 +19,6 @@
 #include
 #include
-#include <asm/page.h>
-
 #ifdef CONFIG_ARCH_HAS_VDSO_DATA
 #include
 #else
@@ -128,7 +126,7 @@ extern struct vdso_data _timens_data[CS_BASES] __attribute__((visibility("hidden
  */
 union vdso_data_store {
	struct vdso_data	data[CS_BASES];
-	u8			page[PAGE_SIZE];
+	u8			page[1ul << CONFIG_PAGE_SHIFT];
 };

 /*
-- 
2.39.2
[PATCH 3/4] arch: define CONFIG_PAGE_SIZE_*KB on all architectures
From: Arnd Bergmann Most architectures only support a single hardcoded page size. In order to ensure that each one of these sets the corresponding Kconfig symbols, change over the PAGE_SHIFT definition to the common one and allow only the hardware page size to be selected. Signed-off-by: Arnd Bergmann --- arch/alpha/Kconfig | 1 + arch/alpha/include/asm/page.h | 2 +- arch/arm/Kconfig | 1 + arch/arm/include/asm/page.h| 2 +- arch/csky/Kconfig | 1 + arch/csky/include/asm/page.h | 2 +- arch/m68k/Kconfig | 3 +++ arch/m68k/Kconfig.cpu | 2 ++ arch/m68k/include/asm/page.h | 6 +- arch/microblaze/Kconfig| 1 + arch/microblaze/include/asm/page.h | 2 +- arch/nios2/Kconfig | 1 + arch/nios2/include/asm/page.h | 2 +- arch/openrisc/Kconfig | 1 + arch/openrisc/include/asm/page.h | 2 +- arch/riscv/Kconfig | 1 + arch/riscv/include/asm/page.h | 2 +- arch/s390/Kconfig | 1 + arch/s390/include/asm/page.h | 2 +- arch/sparc/Kconfig | 2 ++ arch/sparc/include/asm/page_32.h | 2 +- arch/sparc/include/asm/page_64.h | 3 +-- arch/um/Kconfig| 1 + arch/um/include/asm/page.h | 2 +- arch/x86/Kconfig | 1 + arch/x86/include/asm/page_types.h | 2 +- arch/xtensa/Kconfig| 1 + arch/xtensa/include/asm/page.h | 2 +- 28 files changed, 32 insertions(+), 19 deletions(-) diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig index d6968d090d49..4f490250d323 100644 --- a/arch/alpha/Kconfig +++ b/arch/alpha/Kconfig @@ -14,6 +14,7 @@ config ALPHA select PCI_DOMAINS if PCI select PCI_SYSCALL if PCI select HAVE_ASM_MODVERSIONS + select HAVE_PAGE_SIZE_8KB select HAVE_PCSPKR_PLATFORM select HAVE_PERF_EVENTS select NEED_DMA_MAP_STATE diff --git a/arch/alpha/include/asm/page.h b/arch/alpha/include/asm/page.h index 4db1ebc0ed99..70419e6be1a3 100644 --- a/arch/alpha/include/asm/page.h +++ b/arch/alpha/include/asm/page.h @@ -6,7 +6,7 @@ #include /* PAGE_SHIFT determines the page size */ -#define PAGE_SHIFT 13 +#define PAGE_SHIFT CONFIG_PAGE_SHIFT #define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT) #define PAGE_MASK (~(PAGE_SIZE-1)) diff 
--git a/arch/arm/Kconfig b/arch/arm/Kconfig index 0af6709570d1..9d52ba3a8ad1 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -116,6 +116,7 @@ config ARM select HAVE_MOD_ARCH_SPECIFIC select HAVE_NMI select HAVE_OPTPROBES if !THUMB2_KERNEL + select HAVE_PAGE_SIZE_4KB select HAVE_PCI if MMU select HAVE_PERF_EVENTS select HAVE_PERF_REGS diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h index 119aa85d1feb..62af9f7f9e96 100644 --- a/arch/arm/include/asm/page.h +++ b/arch/arm/include/asm/page.h @@ -8,7 +8,7 @@ #define _ASMARM_PAGE_H /* PAGE_SHIFT determines the page size */ -#define PAGE_SHIFT 12 +#define PAGE_SHIFT CONFIG_PAGE_SHIFT #define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT) #define PAGE_MASK (~((1 << PAGE_SHIFT) - 1)) diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig index cf2a6fd7dff8..9c2723ab1c94 100644 --- a/arch/csky/Kconfig +++ b/arch/csky/Kconfig @@ -89,6 +89,7 @@ config CSKY select HAVE_KPROBES if !CPU_CK610 select HAVE_KPROBES_ON_FTRACE if !CPU_CK610 select HAVE_KRETPROBES if !CPU_CK610 + select HAVE_PAGE_SIZE_4KB select HAVE_PERF_EVENTS select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP diff --git a/arch/csky/include/asm/page.h b/arch/csky/include/asm/page.h index 4a0502e324a6..f70f37402d75 100644 --- a/arch/csky/include/asm/page.h +++ b/arch/csky/include/asm/page.h @@ -10,7 +10,7 @@ /* * PAGE_SHIFT determines the page size: 4KB */ -#define PAGE_SHIFT 12 +#define PAGE_SHIFT CONFIG_PAGE_SHIFT #define PAGE_SIZE (_AC(1, UL) << PAGE_SHIFT) #define PAGE_MASK (~(PAGE_SIZE - 1)) #define THREAD_SIZE(PAGE_SIZE * 2) diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig index 4b3e93cac723..7b709453d5e7 100644 --- a/arch/m68k/Kconfig +++ b/arch/m68k/Kconfig @@ -84,12 +84,15 @@ config MMU config MMU_MOTOROLA bool + select HAVE_PAGE_SIZE_4KB config MMU_COLDFIRE + select HAVE_PAGE_SIZE_8KB bool config MMU_SUN3 bool + select HAVE_PAGE_SIZE_8KB depends on MMU && !MMU_MOTOROLA && !MMU_COLDFIRE config ARCH_SUPPORTS_KEXEC diff --git 
a/arch/m68k/Kconfig.cpu b/arch/m68k/Kconfig.cpu index 9dcf245c9cbf..c777a129768a 100644 --- a/arch/m68k/Kconfig.cpu +++ b/arch/m68k/Kconfig.cpu @@ -30,6 +30,7 @@ config COLDFIRE select GENERIC_CSUM select GPIOLIB select HAVE_LEGACY_CLK + select HAVE_PAGE_SIZE_8KB if !MMU endchoice @@ -45,6 +46,7 @@ config M68000
[PATCH 2/4] arch: simplify architecture specific page size configuration
From: Arnd Bergmann

arc, arm64, parisc and powerpc all have their own Kconfig symbols in place of the common CONFIG_PAGE_SIZE_4KB symbols. Change these so the common symbols are the ones that are actually used, while leaving the architecture specific ones as the user visible place for configuring it, to avoid breaking user configs.

Signed-off-by: Arnd Bergmann
---
 arch/arc/Kconfig                  | 3 +++
 arch/arc/include/uapi/asm/page.h  | 6 ++
 arch/arm64/Kconfig                | 29 +
 arch/arm64/include/asm/page-def.h | 2 +-
 arch/parisc/Kconfig               | 3 +++
 arch/parisc/include/asm/page.h    | 10 +-
 arch/powerpc/Kconfig              | 31 ++-
 arch/powerpc/include/asm/page.h   | 2 +-
 scripts/gdb/linux/constants.py.in | 2 +-
 scripts/gdb/linux/mm.py           | 2 +-
 10 files changed, 32 insertions(+), 58 deletions(-)

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index 1b0483c51cc1..4092bec198be 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -284,14 +284,17 @@ choice

 config ARC_PAGE_SIZE_8K
 	bool "8KB"
+	select HAVE_PAGE_SIZE_8KB
 	help
 	  Choose between 8k vs 16k

 config ARC_PAGE_SIZE_16K
+	select HAVE_PAGE_SIZE_16KB
 	bool "16KB"

 config ARC_PAGE_SIZE_4K
 	bool "4KB"
+	select HAVE_PAGE_SIZE_4KB
 	depends on ARC_MMU_V3 || ARC_MMU_V4

 endchoice

diff --git a/arch/arc/include/uapi/asm/page.h b/arch/arc/include/uapi/asm/page.h
index 2a4ad619abfb..7fd9e741b527 100644
--- a/arch/arc/include/uapi/asm/page.h
+++ b/arch/arc/include/uapi/asm/page.h
@@ -13,10 +13,8 @@
 #include

 /* PAGE_SHIFT determines the page size */
-#if defined(CONFIG_ARC_PAGE_SIZE_16K)
-#define PAGE_SHIFT 14
-#elif defined(CONFIG_ARC_PAGE_SIZE_4K)
-#define PAGE_SHIFT 12
+#ifdef __KERNEL__
+#define PAGE_SHIFT CONFIG_PAGE_SHIFT
 #else
 /*
  * Default 8k

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index aa7c1d435139..29290b8cb36d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -277,27 +277,21 @@ config 64BIT
 config MMU
 	def_bool y

-config ARM64_PAGE_SHIFT
-	int
-	default 16 if ARM64_64K_PAGES
-	default 14 if ARM64_16K_PAGES
-	default 12
-
 config ARM64_CONT_PTE_SHIFT
int - default 5 if ARM64_64K_PAGES - default 7 if ARM64_16K_PAGES + default 5 if PAGE_SIZE_64KB + default 7 if PAGE_SIZE_16KB default 4 config ARM64_CONT_PMD_SHIFT int - default 5 if ARM64_64K_PAGES - default 5 if ARM64_16K_PAGES + default 5 if PAGE_SIZE_64KB + default 5 if PAGE_SIZE_16KB default 4 config ARCH_MMAP_RND_BITS_MIN - default 14 if ARM64_64K_PAGES - default 16 if ARM64_16K_PAGES + default 14 if PAGE_SIZE_64KB + default 16 if PAGE_SIZE_16KB default 18 # max bits determined by the following formula: @@ -1259,11 +1253,13 @@ choice config ARM64_4K_PAGES bool "4KB" + select HAVE_PAGE_SIZE_4KB help This feature enables 4KB pages support. config ARM64_16K_PAGES bool "16KB" + select HAVE_PAGE_SIZE_16KB help The system will use 16KB pages support. AArch32 emulation requires applications compiled with 16K (or a multiple of 16K) @@ -1271,6 +1267,7 @@ config ARM64_16K_PAGES config ARM64_64K_PAGES bool "64KB" + select HAVE_PAGE_SIZE_64KB help This feature enables 64KB pages support (4KB by default) allowing only two levels of page tables and faster TLB @@ -1291,19 +1288,19 @@ choice config ARM64_VA_BITS_36 bool "36-bit" if EXPERT - depends on ARM64_16K_PAGES + depends on PAGE_SIZE_16KB config ARM64_VA_BITS_39 bool "39-bit" - depends on ARM64_4K_PAGES + depends on PAGE_SIZE_4KB config ARM64_VA_BITS_42 bool "42-bit" - depends on ARM64_64K_PAGES + depends on PAGE_SIZE_64KB config ARM64_VA_BITS_47 bool "47-bit" - depends on ARM64_16K_PAGES + depends on PAGE_SIZE_16KB config ARM64_VA_BITS_48 bool "48-bit" diff --git a/arch/arm64/include/asm/page-def.h b/arch/arm64/include/asm/page-def.h index 2403f7b4cdbf..792e9fe881dc 100644 --- a/arch/arm64/include/asm/page-def.h +++ b/arch/arm64/include/asm/page-def.h @@ -11,7 +11,7 @@ #include /* PAGE_SHIFT determines the page size */ -#define PAGE_SHIFT CONFIG_ARM64_PAGE_SHIFT +#define PAGE_SHIFT CONFIG_PAGE_SHIFT #define PAGE_SIZE (_AC(1, UL) << PAGE_SHIFT) #define PAGE_MASK (~(PAGE_SIZE-1)) diff --git a/arch/parisc/Kconfig 
b/arch/parisc/Kconfig index 5c845e8d59d9..b180e684fa0d 100644 --- a/arch/parisc/Kconfig +++ b/arch/parisc/Kconfig @@ -273,6 +273,7 @@ choice config PARISC_PAGE_SIZE_4KB bool "4KB" + select HAVE_PAGE_SIZE_4KB help This lets you select the page size of the kernel. For
[PATCH 1/4] arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions
From: Arnd Bergmann These four architectures define the same Kconfig symbols for configuring the page size. Move the logic into a common place where it can be shared with all other architectures. Signed-off-by: Arnd Bergmann --- arch/Kconfig | 58 +-- arch/hexagon/Kconfig | 25 +++-- arch/hexagon/include/asm/page.h | 6 +--- arch/loongarch/Kconfig| 21 --- arch/loongarch/include/asm/page.h | 10 +- arch/mips/Kconfig | 58 +++ arch/mips/include/asm/page.h | 16 + arch/sh/include/asm/page.h| 13 +-- arch/sh/mm/Kconfig| 42 +++--- 9 files changed, 88 insertions(+), 161 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index a5af0edd3eb8..237cea01ed9b 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -1078,17 +1078,71 @@ config HAVE_ARCH_COMPAT_MMAP_BASES and vice-versa 32-bit applications to call 64-bit mmap(). Required for applications doing different bitness syscalls. +config HAVE_PAGE_SIZE_4KB + bool + +config HAVE_PAGE_SIZE_8KB + bool + +config HAVE_PAGE_SIZE_16KB + bool + +config HAVE_PAGE_SIZE_32KB + bool + +config HAVE_PAGE_SIZE_64KB + bool + +config HAVE_PAGE_SIZE_256KB + bool + +choice + prompt "MMU page size" + +config PAGE_SIZE_4KB + bool "4KB pages" + depends on HAVE_PAGE_SIZE_4KB + +config PAGE_SIZE_8KB + bool "8KB pages" + depends on HAVE_PAGE_SIZE_8KB + +config PAGE_SIZE_16KB + bool "16KB pages" + depends on HAVE_PAGE_SIZE_16KB + +config PAGE_SIZE_32KB + bool "32KB pages" + depends on HAVE_PAGE_SIZE_32KB + +config PAGE_SIZE_64KB + bool "64KB pages" + depends on HAVE_PAGE_SIZE_64KB + +config PAGE_SIZE_256KB + bool "256KB pages" + depends on HAVE_PAGE_SIZE_256KB + +endchoice + config PAGE_SIZE_LESS_THAN_64KB def_bool y - depends on !ARM64_64K_PAGES depends on !PAGE_SIZE_64KB - depends on !PARISC_PAGE_SIZE_64KB depends on PAGE_SIZE_LESS_THAN_256KB config PAGE_SIZE_LESS_THAN_256KB def_bool y depends on !PAGE_SIZE_256KB +config PAGE_SHIFT + int + default 12 if PAGE_SIZE_4KB + default 13 if PAGE_SIZE_8KB + default 14 if PAGE_SIZE_16KB + default 15 if PAGE_SIZE_32KB 
+ default 16 if PAGE_SIZE_64KB + default 18 if PAGE_SIZE_256KB + # This allows to use a set of generic functions to determine mmap base # address by giving priority to top-down scheme only if the process # is not in legacy mode (compat task, unlimited stack size or diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig index a880ee067d2e..aac46ee1a000 100644 --- a/arch/hexagon/Kconfig +++ b/arch/hexagon/Kconfig @@ -8,6 +8,11 @@ config HEXAGON select ARCH_HAS_SYNC_DMA_FOR_DEVICE select ARCH_NO_PREEMPT select DMA_GLOBAL_POOL + select FRAME_POINTER + select HAVE_PAGE_SIZE_4KB + select HAVE_PAGE_SIZE_16KB + select HAVE_PAGE_SIZE_64KB + select HAVE_PAGE_SIZE_256KB # Other pending projects/to-do items. # select HAVE_REGS_AND_STACK_ACCESS_API # select HAVE_HW_BREAKPOINT if PERF_EVENTS @@ -120,26 +125,6 @@ config NR_CPUS This is purely to save memory - each supported CPU adds approximately eight kilobytes to the kernel image. -choice - prompt "Kernel page size" - default PAGE_SIZE_4KB - help - Changes the default page size; use with caution. - -config PAGE_SIZE_4KB - bool "4KB" - -config PAGE_SIZE_16KB - bool "16KB" - -config PAGE_SIZE_64KB - bool "64KB" - -config PAGE_SIZE_256KB - bool "256KB" - -endchoice - source "kernel/Kconfig.hz" endmenu diff --git a/arch/hexagon/include/asm/page.h b/arch/hexagon/include/asm/page.h index 10f1bc07423c..65c9bac639fa 100644 --- a/arch/hexagon/include/asm/page.h +++ b/arch/hexagon/include/asm/page.h @@ -13,27 +13,22 @@ /* This is probably not the most graceful way to handle this. 
*/ #ifdef CONFIG_PAGE_SIZE_4KB -#define PAGE_SHIFT 12 #define HEXAGON_L1_PTE_SIZE __HVM_PDE_S_4KB #endif #ifdef CONFIG_PAGE_SIZE_16KB -#define PAGE_SHIFT 14 #define HEXAGON_L1_PTE_SIZE __HVM_PDE_S_16KB #endif #ifdef CONFIG_PAGE_SIZE_64KB -#define PAGE_SHIFT 16 #define HEXAGON_L1_PTE_SIZE __HVM_PDE_S_64KB #endif #ifdef CONFIG_PAGE_SIZE_256KB -#define PAGE_SHIFT 18 #define HEXAGON_L1_PTE_SIZE __HVM_PDE_S_256KB #endif #ifdef CONFIG_PAGE_SIZE_1MB -#define PAGE_SHIFT 20 #define HEXAGON_L1_PTE_SIZE __HVM_PDE_S_1MB #endif @@ -50,6 +45,7 @@ #define HVM_HUGEPAGE_SIZE 0x5 #endif +#define PAGE_SHIFT CONFIG_PAGE_SHIFT #define PAGE_SIZE (1UL << PAGE_SHIFT) #define PAGE_MASK (~((1 << PAGE_SHIFT) - 1)) diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index 929f68926b34..b274784c2e26 100644 ---
[PATCH 0/4] arch: mm, vdso: consolidate PAGE_SIZE definition
From: Arnd Bergmann Naresh noticed that the newly added usage of the PAGE_SIZE macro in include/vdso/datapage.h introduced a build regression. I had an older patch that I revived to have this defined through Kconfig rather than through including asm/page.h, which is not allowed in vdso code. I rebased and tested on top of the tip/timers/core branch that introduced the regression. If these patches get added, the compat VDSOs all build again, but the changes are a bit invasive. Arnd Link: https://lore.kernel.org/lkml/ca+g9fytrxxm_ko9fnpz3xarxhv7ud_yqp-teupqrnrhu+_0...@mail.gmail.com/ Link: https://lore.kernel.org/all/65dc6c14.170a0220.f4a3f.9...@mx.google.com/ Arnd Bergmann (4): arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions arch: simplify architecture specific page size configuration arch: define CONFIG_PAGE_SIZE_*KB on all architectures vdso: avoid including asm/page.h arch/Kconfig | 58 -- arch/alpha/Kconfig | 1 + arch/alpha/include/asm/page.h | 2 +- arch/arc/Kconfig | 3 ++ arch/arc/include/uapi/asm/page.h | 6 ++-- arch/arm/Kconfig | 1 + arch/arm/include/asm/page.h| 2 +- arch/arm64/Kconfig | 29 +++ arch/arm64/include/asm/page-def.h | 2 +- arch/csky/Kconfig | 1 + arch/csky/include/asm/page.h | 2 +- arch/hexagon/Kconfig | 25 +++-- arch/hexagon/include/asm/page.h| 6 +--- arch/loongarch/Kconfig | 21 --- arch/loongarch/include/asm/page.h | 10 +- arch/m68k/Kconfig | 3 ++ arch/m68k/Kconfig.cpu | 2 ++ arch/m68k/include/asm/page.h | 6 +--- arch/microblaze/Kconfig| 1 + arch/microblaze/include/asm/page.h | 2 +- arch/mips/Kconfig | 58 +++--- arch/mips/include/asm/page.h | 16 + arch/nios2/Kconfig | 1 + arch/nios2/include/asm/page.h | 2 +- arch/openrisc/Kconfig | 1 + arch/openrisc/include/asm/page.h | 2 +- arch/parisc/Kconfig| 3 ++ arch/parisc/include/asm/page.h | 10 +- arch/powerpc/Kconfig | 31 arch/powerpc/include/asm/page.h| 2 +- arch/riscv/Kconfig | 1 + arch/riscv/include/asm/page.h | 2 +- arch/s390/Kconfig | 1 + arch/s390/include/asm/page.h | 2 +- 
arch/sh/include/asm/page.h | 13 +-- arch/sh/mm/Kconfig | 42 +++--- arch/sparc/Kconfig | 2 ++ arch/sparc/include/asm/page_32.h | 2 +- arch/sparc/include/asm/page_64.h | 3 +- arch/um/Kconfig| 1 + arch/um/include/asm/page.h | 2 +- arch/x86/Kconfig | 1 + arch/x86/include/asm/page_types.h | 2 +- arch/xtensa/Kconfig| 1 + arch/xtensa/include/asm/page.h | 2 +- include/vdso/datapage.h| 4 +-- scripts/gdb/linux/constants.py.in | 2 +- scripts/gdb/linux/mm.py| 2 +- 48 files changed, 153 insertions(+), 241 deletions(-) -- 2.39.2 To: Thomas Gleixner To: Vincenzo Frascino To: Kees Cook To: Anna-Maria Behnsen Cc: Matt Turner Cc: Vineet Gupta Cc: Russell King Cc: Catalin Marinas Cc: Guo Ren Cc: Brian Cain Cc: Huacai Chen Cc: Geert Uytterhoeven Cc: Michal Simek Cc: Thomas Bogendoerfer Cc: Helge Deller Cc: Michael Ellerman Cc: Christophe Leroy Cc: Palmer Dabbelt Cc: John Paul Adrian Glaubitz Cc: Andreas Larsson Cc: Richard Weinberger Cc: x...@kernel.org Cc: Max Filippov Cc: Andy Lutomirski Cc: Vincenzo Frascino Cc: Jan Kiszka Cc: Kieran Bingham Cc: Andrew Morton Cc: Arnd Bergmann Cc: linux-ker...@vger.kernel.org Cc: linux-alpha@vger.kernel.org Cc: linux-snps-...@lists.infradead.org Cc: linux-arm-ker...@lists.infradead.org Cc: linux-c...@vger.kernel.org Cc: linux-hexa...@vger.kernel.org Cc: loonga...@lists.linux.dev Cc: linux-m...@lists.linux-m68k.org Cc: linux-m...@vger.kernel.org Cc: linux-openr...@vger.kernel.org Cc: linux-par...@vger.kernel.org Cc: linuxppc-...@lists.ozlabs.org Cc: linux-ri...@lists.infradead.org Cc: linux-s...@vger.kernel.org Cc: linux...@vger.kernel.org Cc: sparcli...@vger.kernel.org Cc: linux...@lists.infradead.org
[RFC PATCH 03/14] sched/core: Use TIF_NOTIFY_IPI to notify an idle CPU in TIF_POLLING mode of pending IPI
From: "Gautham R. Shenoy"

Problem statement
=================

When measuring IPI throughput using a modified version of Anton Blanchard's ipistorm benchmark [1], configured to measure time taken to perform a fixed number of smp_call_function_single() (with wait set to 1), an increase in benchmark time was observed between v5.7 and the upstream kernel (v6.7-rc6).

Bisection pointed to commit b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()") as the reason behind this increase in runtime. Reverting the optimization introduced by the above commit fixed the regression in ipistorm, however benchmarks like tbench and netperf regressed with the revert, supporting the validity of the optimization.

Following are the benchmark results on top of tip:sched/core with the optimization reverted on a dual socket 3rd Generation AMD EPYC system (2 x 64C/128T) running with boost enabled and C2 disabled:

(tip:sched/core at tag "sched-core-2024-01-08" for all the testing done below)

==================================================================
Test          : ipistorm (modified)
Units         : Normalized runtime
Interpretation: Lower is better
Statistic     : AMean
cmdline       : insmod ipistorm.ko numipi=10 single=1 offset=8 cpulist=8 wait=1
==================================================================
kernel:                     time [pct imp]
tip:sched/core               1.00 [0.00]
tip:sched/core + revert      0.81 [19.36]

==================================================================
Test          : tbench
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : AMean
==================================================================
Clients:    tip[pct imp](CV)       revert[pct imp](CV)
    1     1.00 [ 0.00]( 0.24)     0.91 [ -8.96]( 0.30)
    2     1.00 [ 0.00]( 0.25)     0.92 [ -8.20]( 0.97)
    4     1.00 [ 0.00]( 0.23)     0.91 [ -9.20]( 1.75)
    8     1.00 [ 0.00]( 0.69)     0.91 [ -9.48]( 1.56)
   16     1.00 [ 0.00]( 0.66)     0.92 [ -8.49]( 2.43)
   32     1.00 [ 0.00]( 0.96)     0.89 [-11.13]( 0.96)
   64     1.00 [ 0.00]( 1.06)     0.90 [ -9.72]( 2.49)
  128     1.00 [ 0.00]( 0.70)     0.92 [ -8.36]( 1.26)
  256     1.00 [ 0.00]( 0.72)     0.97 [ -3.30]( 1.10)
  512     1.00 [ 0.00]( 0.42)     0.98 [ -1.73]( 0.37)
 1024     1.00 [ 0.00]( 0.28)     0.99 [ -1.39]( 0.43)

==================================================================
Test          : netperf
Units         : Normalized Throughput
Interpretation: Higher is better
Statistic     : AMean
==================================================================
Clients:       tip[pct imp](CV)       revert[pct imp](CV)
 1-clients    1.00 [ 0.00]( 0.50)     0.89 [-10.51]( 0.20)
 2-clients    1.00 [ 0.00]( 1.16)     0.89 [-11.10]( 0.59)
 4-clients    1.00 [ 0.00]( 1.03)     0.89 [-10.68]( 0.38)
 8-clients    1.00 [ 0.00]( 0.99)     0.89 [-10.54]( 0.50)
16-clients    1.00 [ 0.00]( 0.87)     0.89 [-10.92]( 0.95)
32-clients    1.00 [ 0.00]( 1.24)     0.89 [-10.85]( 0.63)
64-clients    1.00 [ 0.00]( 1.58)     0.90 [-10.11]( 1.18)
128-clients   1.00 [ 0.00]( 0.87)     0.89 [-10.94]( 1.11)
256-clients   1.00 [ 0.00]( 4.77)     1.00 [ -0.16]( 3.45)
512-clients   1.00 [ 0.00](56.16)     1.02 [  2.10](56.05)

Since a simple revert is not a viable solution, the changes in the code path of call_function_single_prep_ipi(), with and without the optimization, were audited to better understand the effect of the commit.

Effects of call_function_single_prep_ipi()
==========================================

To pull a TIF_POLLING thread out of idle to process an IPI, the sender sets the TIF_NEED_RESCHED bit in the idle task's thread info in call_function_single_prep_ipi() and avoids sending an actual IPI to the target. As a result, the scheduler expects a task to be enqueued when exiting the idle path. This is not the case with non-polling idle states where the idle CPU exits the non-polling idle state to process the interrupt, and since need_resched() returns false, soon goes back to idle again.

When the TIF_NEED_RESCHED flag is set, do_idle() will call schedule_idle(), a large part of which runs with local IRQ disabled. In case of ipistorm, when measuring IPI throughput, this large IRQ-disabled section delays processing of IPIs. Further auditing revealed that in absence of any runnable tasks, pick_next_task_fair(), which is called from the pick_next_task() fast path, will always call newidle_balance() in this scenario, further increasing the time spent in the IRQ-disabled section.

Following is the crude visualization of the problem with relevant functions expanded:
--
CPU0					CPU1
do_idle() {
[RFC PATCH 02/14] sched: Define a need_resched_or_ipi() helper and use it treewide
From: "Gautham R. Shenoy"

Currently TIF_NEED_RESCHED is being overloaded to wake up an idle CPU in TIF_POLLING mode to service an IPI even if there are no new tasks being woken up on the said CPU.

In preparation for a proper fix, introduce a new helper "need_resched_or_ipi()" which is intended to return true if either the TIF_NEED_RESCHED flag or the TIF_NOTIFY_IPI flag is set. Use this helper function in place of need_resched() in idle loops where TIF_POLLING_NRFLAG is set.

To preserve bisectability and avoid unbreakable idle loops, all the need_resched() checks within TIF_POLLING_NRFLAG sections have been replaced tree-wide with the need_resched_or_ipi() check.

[ prateek: Replaced some of the missed out occurrences of need_resched() within TIF_POLLING sections with need_resched_or_ipi() ]

Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Matt Turner
Cc: Russell King
Cc: Guo Ren
Cc: Michal Simek
Cc: Dinh Nguyen
Cc: Jonas Bonn
Cc: Stefan Kristiansson
Cc: Stafford Horne
Cc: "James E.J. Bottomley"
Cc: Helge Deller
Cc: Michael Ellerman
Cc: Nicholas Piggin
Cc: Christophe Leroy
Cc: "Aneesh Kumar K.V"
Cc: "Naveen N. Rao"
Cc: Yoshinori Sato
Cc: Rich Felker
Cc: John Paul Adrian Glaubitz
Cc: "David S. Miller"
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: "H. Peter Anvin"
Cc: "Rafael J. Wysocki"
Cc: Daniel Lezcano
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Mel Gorman
Cc: Daniel Bristot de Oliveira
Cc: Valentin Schneider
Cc: Al Viro
Cc: Linus Walleij
Cc: Ard Biesheuvel
Cc: Andrew Donnellan
Cc: Nicholas Miehlbradt
Cc: Andrew Morton
Cc: Arnd Bergmann
Cc: Josh Poimboeuf
Cc: "Kirill A. Shutemov"
Cc: Rick Edgecombe
Cc: Tony Battersby
Cc: Brian Gerst
Cc: Tim Chen
Cc: David Vernet
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-alpha@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-c...@vger.kernel.org
Cc: linux-openr...@vger.kernel.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linux...@vger.kernel.org
Signed-off-by: Gautham R. Shenoy
Co-developed-by: K Prateek Nayak
Signed-off-by: K Prateek Nayak
---
 arch/x86/include/asm/mwait.h      | 2 +-
 arch/x86/kernel/process.c         | 2 +-
 drivers/cpuidle/cpuidle-powernv.c | 2 +-
 drivers/cpuidle/cpuidle-pseries.c | 2 +-
 drivers/cpuidle/poll_state.c      | 2 +-
 include/linux/sched.h             | 5 +
 include/linux/sched/idle.h        | 4 ++--
 kernel/sched/idle.c               | 7 ---
 8 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 778df05f8539..ac1370143407 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -115,7 +115,7 @@ static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned lo
 		}

 		__monitor((void *)&current_thread_info()->flags, 0, 0);
-		if (!need_resched())
+		if (!need_resched_or_ipi())
 			__mwait(eax, ecx);
 	}
 	current_clr_polling();

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index b6f4e8399fca..ca6cb7e28cba 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -925,7 +925,7 @@ static __cpuidle void mwait_idle(void)
 		}

 		__monitor((void *)&current_thread_info()->flags, 0, 0);
-		if (!need_resched()) {
+		if (!need_resched_or_ipi()) {
 			__sti_mwait(0, 0);
 			raw_local_irq_disable();
 		}

diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 9ebedd972df0..77c3bb371f56 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -79,7 +79,7 @@ static int snooze_loop(struct cpuidle_device *dev,
 	dev->poll_time_limit = false;
 	ppc64_runlatch_off();
 	HMT_very_low();
-	while (!need_resched()) {
+	while (!need_resched_or_ipi()) {
 		if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {
 			/*
 			 * Task has not woken up but we are exiting the polling

diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 14db9b7d985d..4f2b490f8b73 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -46,7 +46,7 @@ int snooze_loop(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 	snooze_exit_time = get_tb() + snooze_timeout;
 	dev->poll_time_limit = false;
-	while (!need_resched()) {
+	while (!need_resched_or_ipi()) {
 		HMT_low();
 		HMT_very_low();
 		if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {

diff --git
[RFC PATCH 01/14] thread_info: Add helpers to test and clear TIF_NOTIFY_IPI
From: "Gautham R. Shenoy" Introduce the notion of TIF_NOTIFY_IPI flag. When a processor in TIF_POLLING mode needs to process an IPI, the sender sets NEED_RESCHED bit in idle task's thread_info to pull the target out of idle and avoids sending an interrupt to the idle CPU. When NEED_RESCHED is set, the scheduler assumes that a new task has been queued on the idle CPU and calls schedule_idle(), however, it is not necessary that an IPI on an idle CPU will necessarily end up waking a task on the said CPU. To avoid spurious calls to schedule_idle() assuming an IPI on an idle CPU will always wake a task on the said CPU, TIF_NOTIFY_IPI will be used to pull a TIF_POLLING CPU out of idle. Since the IPI handlers are processed before the call to schedule_idle(), schedule_idle() will be called only if one of the handlers have woken up a new task on the CPU and has set NEED_RESCHED. Add tif_notify_ipi() and current_clr_notify_ipi() helpers to test if TIF_NOTIFY_IPI is set in the current task's thread_info, and to clear it respectively. These interfaces will be used in subsequent patches as TIF_NOTIFY_IPI notion is integrated in the scheduler and in the idle path. [ prateek: Split the changes into a separate patch, add commit log ] Cc: Richard Henderson Cc: Ivan Kokshaysky Cc: Matt Turner Cc: Russell King Cc: Guo Ren Cc: Michal Simek Cc: Dinh Nguyen Cc: Jonas Bonn Cc: Stefan Kristiansson Cc: Stafford Horne Cc: "James E.J. Bottomley" Cc: Helge Deller Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Christophe Leroy Cc: "Aneesh Kumar K.V" Cc: "Naveen N. Rao" Cc: Yoshinori Sato Cc: Rich Felker Cc: John Paul Adrian Glaubitz Cc: "David S. Miller" Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: Dave Hansen Cc: "H. Peter Anvin" Cc: "Rafael J. 
Wysocki" Cc: Daniel Lezcano Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Steven Rostedt Cc: Ben Segall Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Valentin Schneider Cc: Al Viro Cc: Linus Walleij Cc: Ard Biesheuvel Cc: Andrew Donnellan Cc: Nicholas Miehlbradt Cc: Andrew Morton Cc: Arnd Bergmann Cc: Josh Poimboeuf Cc: "Kirill A. Shutemov" Cc: Rick Edgecombe Cc: Tony Battersby Cc: Brian Gerst Cc: Tim Chen Cc: David Vernet Cc: x...@kernel.org Cc: linux-ker...@vger.kernel.org Cc: linux-alpha@vger.kernel.org Cc: linux-arm-ker...@lists.infradead.org Cc: linux-c...@vger.kernel.org Cc: linux-openr...@vger.kernel.org Cc: linux-par...@vger.kernel.org Cc: linuxppc-...@lists.ozlabs.org Cc: linux...@vger.kernel.org Cc: sparcli...@vger.kernel.org Cc: linux...@vger.kernel.org
Signed-off-by: Gautham R. Shenoy
Co-developed-by: K Prateek Nayak
Signed-off-by: K Prateek Nayak
---
 include/linux/thread_info.h | 43 +
 1 file changed, 43 insertions(+)

diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index 9ea0b28068f4..1e10dd8c0227 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -195,6 +195,49 @@ static __always_inline bool tif_need_resched(void)
 
 #endif	/* _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H */
 
+#ifdef TIF_NOTIFY_IPI
+
+#ifdef _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H
+
+static __always_inline bool tif_notify_ipi(void)
+{
+	return arch_test_bit(TIF_NOTIFY_IPI,
+			     (unsigned long *)(&current_thread_info()->flags));
+}
+
+static __always_inline void current_clr_notify_ipi(void)
+{
+	arch_clear_bit(TIF_NOTIFY_IPI,
+		       (unsigned long *)(&current_thread_info()->flags));
+}
+
+#else
+
+static __always_inline bool tif_notify_ipi(void)
+{
+	return test_bit(TIF_NOTIFY_IPI,
+			(unsigned long *)(&current_thread_info()->flags));
+}
+
+static __always_inline void current_clr_notify_ipi(void)
+{
+	clear_bit(TIF_NOTIFY_IPI,
+		  (unsigned long *)(&current_thread_info()->flags));
+}
+
+#endif /* _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H */
+
+#else /* !TIF_NOTIFY_IPI */
+
+static __always_inline bool tif_notify_ipi(void)
+{
+	return false;
+}
+
+static __always_inline void current_clr_notify_ipi(void) { }
+
+#endif /* TIF_NOTIFY_IPI */
+
 #ifndef CONFIG_HAVE_ARCH_WITHIN_STACK_FRAMES
 static inline int arch_within_stack_frames(const void * const stack,
 					   const void * const stackend,
-- 
2.34.1
[RFC PATCH 00/14] Introducing TIF_NOTIFY_IPI flag
Hello everyone,

Before jumping into the issue, let me clarify the Cc list. Everyone has been cc'ed on Patch 0 through Patch 3. Respective arch maintainers, reviewers, and committers returned by scripts/get_maintainer.pl have been cc'ed on the respective arch side changes. Scheduler and CPU Idle maintainers and reviewers have been included for the entire series. If I have missed anyone, please do add them. If you would like to be dropped from the cc list, wholly or partially, for future iterations, please do let me know.

With that out of the way ...

Problem statement
=

When measuring IPI throughput using a modified version of Anton Blanchard's ipistorm benchmark [1], configured to measure the time taken to perform a fixed number of smp_call_function_single() calls (with wait set to 1), an increase in benchmark time was observed between v5.7 and the current upstream release (v6.7-rc6 at the time of encounter). Bisection pointed to commit b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()") as the reason behind this increase in runtime.

Experiments
===

Since the commit cannot be cleanly reverted on top of the current tip:sched/core, the effects of the optimizations were reverted by:

1. Removing the check for call_function_single_prep_ipi() in send_call_function_single_ipi(). With this change, send_call_function_single_ipi() always calls arch_send_call_function_single_ipi().

2. Removing the call to flush_smp_call_function_queue() in do_idle(), since, with (1.), every smp_call_function would unconditionally send an IPI to an idle CPU in TIF_POLLING mode.

Following is the diff of the above described changes, which will henceforth be referred to as the "revert":

diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 31231925f1ec..735184d98c0f 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -332,11 +332,6 @@ static void do_idle(void)
 	 */
 	smp_mb__after_atomic();
 
-	/*
-	 * RCU relies on this call to be done outside of an RCU read-side
-	 * critical section.
-	 */
-	flush_smp_call_function_queue();
 	schedule_idle();
 
 	if (unlikely(klp_patch_pending(current)))

diff --git a/kernel/smp.c b/kernel/smp.c
index f085ebcdf9e7..2ff100c41885 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -111,11 +111,9 @@ void __init call_function_init(void)
 
 static __always_inline void send_call_function_single_ipi(int cpu)
 {
-	if (call_function_single_prep_ipi(cpu)) {
-		trace_ipi_send_cpu(cpu, _RET_IP_,
-				   generic_smp_call_function_single_interrupt);
-		arch_send_call_function_single_ipi(cpu);
-	}
+	trace_ipi_send_cpu(cpu, _RET_IP_,
+			   generic_smp_call_function_single_interrupt);
+	arch_send_call_function_single_ipi(cpu);
 }
 
 static __always_inline void
--

With the revert, the time taken to complete a fixed set of IPIs using ipistorm improves significantly. Following are the numbers from a dual socket 3rd Generation EPYC system (2 x 64C/128T) (boost on, C2 disabled) running ipistorm between CPU8 and CPU16:

cmdline: insmod ipistorm.ko numipi=10 single=1 offset=8 cpulist=8 wait=1

(tip:sched/core at tag "sched-core-2024-01-08" for all the testing done below)

==
Test          : ipistorm (modified)
Units         : Normalized runtime
Interpretation: Lower is better
Statistic     : AMean
==
kernel:                      time [pct imp]
tip:sched/core               1.00 [0.00]
tip:sched/core + revert      0.81 [19.36]

Although the revert improves ipistorm performance, it also regresses tbench and netperf, supporting the validity of the optimization.
Following are netperf and tbench numbers from the same machine comparing vanilla tip:sched/core and the revert applied on top:

==
Test          : tbench
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : AMean
==
Clients:    tip[pct imp](CV)       revert[pct imp](CV)
    1     1.00 [ 0.00]( 0.24)     0.91 [ -8.96]( 0.30)
    2     1.00 [ 0.00]( 0.25)     0.92 [ -8.20]( 0.97)
    4     1.00 [ 0.00]( 0.23)     0.91 [ -9.20]( 1.75)
    8     1.00 [ 0.00]( 0.69)     0.91 [ -9.48]( 1.56)
   16     1.00 [ 0.00]( 0.66)     0.92 [ -8.49]( 2.43)
   32     1.00 [ 0.00]( 0.96)     0.89 [-11.13]( 0.96)
   64     1.00 [ 0.00]( 1.06)     0.90 [ -9.72]( 2.49)
  128     1.00 [ 0.00]( 0.70)     0.92 [ -8.36]( 1.26)
  256     1.00 [ 0.00]( 0.72)     0.97 [ -3.30]( 1.10)
  512     1.00 [ 0.00]( 0.42)     0.98 [ -1.73]( 0.37)
 1024     1.00 [ 0.00]( 0.28)     0.99 [ -1.39]( 0.43)
[linux-next:master] BUILD REGRESSION abb240f7a2bd14567ab53e602db562bb683391e6
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: abb240f7a2bd14567ab53e602db562bb683391e6  Add linux-next specific files for 20231212

Error/Warning reports:

https://lore.kernel.org/oe-kbuild-all/202312121926.gc7oytbz-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202312130153.ebbunfqa-...@intel.com

Error/Warning: (recently discovered and may have been fixed)

Warning: MAINTAINERS references a file that doesn't exist: Documentation/devicetree/bindings/display/panel/synaptics,r63353.yaml

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- alpha-allyesconfig
|   |-- fs-bcachefs-chardev.c:warning:function-run_thread_with_file-might-be-a-candidate-for-gnu_printf-format-attribute
|   `-- fs-bcachefs-super.c:warning:function-__bch2_print-might-be-a-candidate-for-gnu_printf-format-attribute
|-- alpha-randconfig-r113-20231212
|   |-- arch-alpha-mm-fault.c:sparse:sparse:Using-plain-integer-as-NULL-pointer
|   |-- fs-bcachefs-chardev.c:warning:function-run_thread_with_file-might-be-a-candidate-for-gnu_printf-format-attribute
|   |-- fs-bcachefs-super.c:warning:function-__bch2_print-might-be-a-candidate-for-gnu_printf-format-attribute
|   |-- lib-zstd-compress-zstd_fast.c:sparse:sparse:Using-plain-integer-as-NULL-pointer
|   |-- sound-soc-codecs-cs42l43.c:sparse:sparse:symbol-cs42l43_hp_ilimit_clear_work-was-not-declared.-Should-it-be-static
|   `-- sound-soc-codecs-cs42l43.c:sparse:sparse:symbol-cs42l43_hp_ilimit_work-was-not-declared.-Should-it-be-static
|-- arc-allmodconfig
|   |-- fs-bcachefs-chardev.c:warning:function-run_thread_with_file-might-be-a-candidate-for-gnu_printf-format-attribute
|   `-- fs-bcachefs-super.c:warning:function-__bch2_print-might-be-a-candidate-for-gnu_printf-format-attribute
|-- arc-allyesconfig
|   |-- fs-bcachefs-chardev.c:warning:function-run_thread_with_file-might-be-a-candidate-for-gnu_printf-format-attribute
|   `-- fs-bcachefs-super.c:warning:function-__bch2_print-might-be-a-candidate-for-gnu_printf-format-attribute
|-- arc-randconfig-002-20231212
|   |-- fs-bcachefs-chardev.c:warning:function-run_thread_with_file-might-be-a-candidate-for-gnu_printf-format-attribute
|   `-- fs-bcachefs-super.c:warning:function-__bch2_print-might-be-a-candidate-for-gnu_printf-format-attribute
|-- arm-allmodconfig
|   |-- fs-bcachefs-chardev.c:warning:function-run_thread_with_file-might-be-a-candidate-for-gnu_printf-format-attribute
|   `-- fs-bcachefs-super.c:warning:function-__bch2_print-might-be-a-candidate-for-gnu_printf-format-attribute
|-- arm-allyesconfig
|   |-- fs-bcachefs-chardev.c:warning:function-run_thread_with_file-might-be-a-candidate-for-gnu_printf-format-attribute
|   `-- fs-bcachefs-super.c:warning:function-__bch2_print-might-be-a-candidate-for-gnu_printf-format-attribute
|-- arm-randconfig-001-20231212
|   |-- fs-bcachefs-chardev.c:warning:function-run_thread_with_file-might-be-a-candidate-for-gnu_printf-format-attribute
|   `-- fs-bcachefs-super.c:warning:function-__bch2_print-might-be-a-candidate-for-gnu_printf-format-attribute
|-- arm-randconfig-002-20231212
|   |-- fs-bcachefs-chardev.c:warning:function-run_thread_with_file-might-be-a-candidate-for-gnu_printf-format-attribute
|   `-- fs-bcachefs-super.c:warning:function-__bch2_print-might-be-a-candidate-for-gnu_printf-format-attribute
|-- arm-randconfig-003-20231212
|   |-- fs-bcachefs-chardev.c:warning:function-run_thread_with_file-might-be-a-candidate-for-gnu_printf-format-attribute
|   `-- fs-bcachefs-super.c:warning:function-__bch2_print-might-be-a-candidate-for-gnu_printf-format-attribute
|-- arm-randconfig-004-20231212
|   |-- fs-bcachefs-chardev.c:warning:function-run_thread_with_file-might-be-a-candidate-for-gnu_printf-format-attribute
|   `-- fs-bcachefs-super.c:warning:function-__bch2_print-might-be-a-candidate-for-gnu_printf-format-attribute
|-- arm-randconfig-r133-20231212
|   |-- fs-ntfs3-ntfs.h:sparse:sparse:static-assertion-failed:sizeof(struct-ATTR_LIST_ENTRY)
|   `-- lib-zstd-compress-zstd_fast.c:sparse:sparse:Using-plain-integer-as-NULL-pointer
|-- arm64-randconfig-002-20231212
|   `-- WARNING:modpost:missing-MODULE_DESCRIPTION()-in-lib-zlib_inflate-zlib_inflate.o
|-- arm64-randconfig-003-20231212
|   `-- WARNING:modpost:missing-MODULE_DESCRIPTION()-in-lib-zlib_inflate-zlib_inflate.o
|-- arm64-randconfig-004-20231212
|   |-- fs-bcachefs-chardev.c:warning:function-run_thread_with_file-might-be-a-candidate-for-gnu_printf-format-attribute
|   `-- fs-bcachefs-super.c:warning:function-__bch2_print-might-be-a-candidate-for-gnu_printf-format-attribute
|-- csky-allmodconfig
|   |-- fs-bcachefs-chardev.c:warning:function-run_thread_with_file-might-be-a-candidate-for-gnu_printf-format-attribute
|   `-- fs-bcachefs-super.c:warning:function-__bch2_print-might-be-a-candidate-for-gnu_printf-format-attribute
|-- csky-allyesconfig
|   |--
Re: [PATCH] tty: virtio: drop virtio_cons_early_init()
On Thu, Nov 30, 2023 at 7:31 PM Jiri Slaby (SUSE) wrote: > > The last user of virtio_cons_early_init() was dropped in commit > 7fb2b2d51244 ("s390/virtio: remove the old KVM virtio transport"). > > So now, drop virtio_cons_early_init() and the logic and headers behind > too. > > Signed-off-by: Jiri Slaby (SUSE) > Cc: Richard Henderson > Cc: Ivan Kokshaysky > Cc: Matt Turner > Cc: Amit Shah > Cc: Arnd Bergmann > Cc: "Michael S. Tsirkin" > Cc: Jason Wang > Cc: Xuan Zhuo > Cc: linux-alpha@vger.kernel.org > Cc: virtualizat...@lists.linux.dev > --- Acked-by: Jason Wang Thanks
PSA: this list has been migrated (no action required)
Hello: This list has been migrated to the new vger infrastructure. You shouldn't need to change anything about how you participate with the list or how you receive mail. If something isn't working right, please reach out to helpd...@kernel.org. Best regards, Konstantin
Re: [PATCH 2/2] rtc/alpha: remove legacy rtc driver
[[PATCH 2/2] rtc/alpha: remove legacy rtc driver] On 23/10/2019 (Wed 17:01) Arnd Bergmann wrote: > The old drivers/char/rtc.c driver was originally the implementation > for x86 PCs but got subsequently replaced by the rtc class driver > on all architectures except alpha. > > Move alpha over to the portable driver and remove the old one > for good. Git history will show I'm in favour of kicking old code and old drivers to the curb - even if it is stuff that I wrote myself 20+ years ago! So if all users are now on the formalized rtc framework, then this relic should go away, and you can add my ack for the commit. Thanks, Paul. -- > > The CONFIG_JS_RTC option was only ever used on SPARC32 but > has not been available for many years, this was used to build > the same rtc driver with a different module name. > > Cc: Richard Henderson > Cc: Ivan Kokshaysky > Cc: Matt Turner > Cc: linux-alpha@vger.kernel.org > Cc: Paul Gortmaker > Signed-off-by: Arnd Bergmann > --- > This was last discussed in early 2018 in > https://lore.kernel.org/lkml/CAK8P3a0QZNY+K+V1HG056xCerz=_l2jh5ufz+2lwkdqkw5z...@mail.gmail.com/ > > Nobody ever replied there, so let's try this instead. > If there is any reason to keep the driver after all, > please let us know. 
> --- > arch/alpha/configs/defconfig |3 +- > drivers/char/Kconfig | 56 -- > drivers/char/Makefile|4 - > drivers/char/rtc.c | 1311 -- > 4 files changed, 2 insertions(+), 1372 deletions(-) > delete mode 100644 drivers/char/rtc.c > > diff --git a/arch/alpha/configs/defconfig b/arch/alpha/configs/defconfig > index f4ec420d7f2d..e10c1be3c0d1 100644 > --- a/arch/alpha/configs/defconfig > +++ b/arch/alpha/configs/defconfig > @@ -53,7 +53,8 @@ CONFIG_NET_PCI=y > CONFIG_YELLOWFIN=y > CONFIG_SERIAL_8250=y > CONFIG_SERIAL_8250_CONSOLE=y > -CONFIG_RTC=y > +CONFIG_RTC_CLASS=y > +CONFIG_RTC_DRV_CMOS=y > CONFIG_EXT2_FS=y > CONFIG_REISERFS_FS=m > CONFIG_ISO9660_FS=y > diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig > index dabbf3f519c6..c2ac4f257c82 100644 > --- a/drivers/char/Kconfig > +++ b/drivers/char/Kconfig > @@ -243,62 +243,6 @@ config NVRAM > To compile this driver as a module, choose M here: the > module will be called nvram. > > -# > -# These legacy RTC drivers just cause too many conflicts with the generic > -# RTC framework ... let's not even try to coexist any more. > -# > -if RTC_LIB=n > - > -config RTC > - tristate "Enhanced Real Time Clock Support (legacy PC RTC driver)" > - depends on ALPHA > - ---help--- > - If you say Y here and create a character special file /dev/rtc with > - major number 10 and minor number 135 using mknod ("man mknod"), you > - will get access to the real time clock (or hardware clock) built > - into your computer. > - > - Every PC has such a clock built in. It can be used to generate > - signals from as low as 1Hz up to 8192Hz, and can also be used > - as a 24 hour alarm. It reports status information via the file > - /proc/driver/rtc and its behaviour is set by various ioctls on > - /dev/rtc. > - > - If you run Linux on a multiprocessor machine and said Y to > - "Symmetric Multi Processing" above, you should say Y here to read > - and set the RTC in an SMP compatible fashion. 
> - > - If you think you have a use for such a device (such as periodic data > - sampling), then say Y here, and read > > - for details. > - > - To compile this driver as a module, choose M here: the > - module will be called rtc. > - > -config JS_RTC > - tristate "Enhanced Real Time Clock Support" > - depends on SPARC32 && PCI > - ---help--- > - If you say Y here and create a character special file /dev/rtc with > - major number 10 and minor number 135 using mknod ("man mknod"), you > - will get access to the real time clock (or hardware clock) built > - into your computer. > - > - Every PC has such a clock built in. It can be used to generate > - signals from as low as 1Hz up to 8192Hz, and can also be used > - as a 24 hour alarm. It reports status information via the file > - /proc/driver/rtc and its behaviour is set by various ioctls on > - /dev/rtc. > - > - If you think you have a use for such a device (such as periodic data > - sampling), then say Y here, and read > > - for details. > - > - To compile this driver as a module, choose M here: the > - module will be called js-rtc. > - > -endif # RTC_LIB > - > config DTLK > tristate "Double Talk PC internal speech card support" > depends on ISA > diff --git a/drivers/char/Makefile b/drivers/char/Makefile > index abe3138b1f5a..ffce287ef415 100644 > --- a/drivers/char/Makefile > +++ b/drivers/char/Makefile > @@ -20,7 +20,6 @@ obj-$(CONFIG_APM_EMULATION) += apm-emulation.o >
Re: [PATCH 2/2] rtc/alpha: remove legacy rtc driver
On 23/10/2019 17:01:59+0200, Arnd Bergmann wrote: > The old drivers/char/rtc.c driver was originally the implementation > for x86 PCs but got subsequently replaced by the rtc class driver > on all architectures except alpha. > > Move alpha over to the portable driver and remove the old one > for good. > > The CONFIG_JS_RTC option was only ever used on SPARC32 but > has not been available for many years, this was used to build > the same rtc driver with a different module name. > > Cc: Richard Henderson > Cc: Ivan Kokshaysky > Cc: Matt Turner > Cc: linux-alpha@vger.kernel.org > Cc: Paul Gortmaker > Signed-off-by: Arnd Bergmann Acked-by: Alexandre Belloni > --- > This was last discussed in early 2018 in > https://lore.kernel.org/lkml/CAK8P3a0QZNY+K+V1HG056xCerz=_l2jh5ufz+2lwkdqkw5z...@mail.gmail.com/ > > Nobody ever replied there, so let's try this instead. > If there is any reason to keep the driver after all, > please let us know. > --- > arch/alpha/configs/defconfig |3 +- > drivers/char/Kconfig | 56 -- > drivers/char/Makefile|4 - > drivers/char/rtc.c | 1311 -- > 4 files changed, 2 insertions(+), 1372 deletions(-) > delete mode 100644 drivers/char/rtc.c > > diff --git a/arch/alpha/configs/defconfig b/arch/alpha/configs/defconfig > index f4ec420d7f2d..e10c1be3c0d1 100644 > --- a/arch/alpha/configs/defconfig > +++ b/arch/alpha/configs/defconfig > @@ -53,7 +53,8 @@ CONFIG_NET_PCI=y > CONFIG_YELLOWFIN=y > CONFIG_SERIAL_8250=y > CONFIG_SERIAL_8250_CONSOLE=y > -CONFIG_RTC=y > +CONFIG_RTC_CLASS=y > +CONFIG_RTC_DRV_CMOS=y > CONFIG_EXT2_FS=y > CONFIG_REISERFS_FS=m > CONFIG_ISO9660_FS=y > diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig > index dabbf3f519c6..c2ac4f257c82 100644 > --- a/drivers/char/Kconfig > +++ b/drivers/char/Kconfig > @@ -243,62 +243,6 @@ config NVRAM > To compile this driver as a module, choose M here: the > module will be called nvram. > > -# > -# These legacy RTC drivers just cause too many conflicts with the generic > -# RTC framework ... 
let's not even try to coexist any more. > -# > -if RTC_LIB=n > - > -config RTC > - tristate "Enhanced Real Time Clock Support (legacy PC RTC driver)" > - depends on ALPHA > - ---help--- > - If you say Y here and create a character special file /dev/rtc with > - major number 10 and minor number 135 using mknod ("man mknod"), you > - will get access to the real time clock (or hardware clock) built > - into your computer. > - > - Every PC has such a clock built in. It can be used to generate > - signals from as low as 1Hz up to 8192Hz, and can also be used > - as a 24 hour alarm. It reports status information via the file > - /proc/driver/rtc and its behaviour is set by various ioctls on > - /dev/rtc. > - > - If you run Linux on a multiprocessor machine and said Y to > - "Symmetric Multi Processing" above, you should say Y here to read > - and set the RTC in an SMP compatible fashion. > - > - If you think you have a use for such a device (such as periodic data > - sampling), then say Y here, and read > > - for details. > - > - To compile this driver as a module, choose M here: the > - module will be called rtc. > - > -config JS_RTC > - tristate "Enhanced Real Time Clock Support" > - depends on SPARC32 && PCI > - ---help--- > - If you say Y here and create a character special file /dev/rtc with > - major number 10 and minor number 135 using mknod ("man mknod"), you > - will get access to the real time clock (or hardware clock) built > - into your computer. > - > - Every PC has such a clock built in. It can be used to generate > - signals from as low as 1Hz up to 8192Hz, and can also be used > - as a 24 hour alarm. It reports status information via the file > - /proc/driver/rtc and its behaviour is set by various ioctls on > - /dev/rtc. > - > - If you think you have a use for such a device (such as periodic data > - sampling), then say Y here, and read > > - for details. > - > - To compile this driver as a module, choose M here: the > - module will be called js-rtc. 
> - > -endif # RTC_LIB > - > config DTLK > tristate "Double Talk PC internal speech card support" > depends on ISA > diff --git a/drivers/char/Makefile b/drivers/char/Makefile > index abe3138b1f5a..ffce287ef415 100644 > --- a/drivers/char/Makefile > +++ b/drivers/char/Makefile > @@ -20,7 +20,6 @@ obj-$(CONFIG_APM_EMULATION) += apm-emulation.o > obj-$(CONFIG_DTLK) += dtlk.o > obj-$(CONFIG_APPLICOM) += applicom.o > obj-$(CONFIG_SONYPI) += sonypi.o > -obj-$(CONFIG_RTC)+= rtc.o > obj-$(CONFIG_HPET) += hpet.o > obj-$(CONFIG_XILINX_HWICAP) += xilinx_hwicap/ > obj-$(CONFIG_NVRAM) += nvram.o > @@ -45,9
[PATCH 2/2] rtc/alpha: remove legacy rtc driver
The old drivers/char/rtc.c driver was originally the implementation for x86 PCs but got subsequently replaced by the rtc class driver on all architectures except alpha. Move alpha over to the portable driver and remove the old one for good. The CONFIG_JS_RTC option was only ever used on SPARC32 but has not been available for many years, this was used to build the same rtc driver with a different module name. Cc: Richard Henderson Cc: Ivan Kokshaysky Cc: Matt Turner Cc: linux-alpha@vger.kernel.org Cc: Paul Gortmaker Signed-off-by: Arnd Bergmann --- This was last discussed in early 2018 in https://lore.kernel.org/lkml/CAK8P3a0QZNY+K+V1HG056xCerz=_l2jh5ufz+2lwkdqkw5z...@mail.gmail.com/ Nobody ever replied there, so let's try this instead. If there is any reason to keep the driver after all, please let us know. --- arch/alpha/configs/defconfig |3 +- drivers/char/Kconfig | 56 -- drivers/char/Makefile|4 - drivers/char/rtc.c | 1311 -- 4 files changed, 2 insertions(+), 1372 deletions(-) delete mode 100644 drivers/char/rtc.c diff --git a/arch/alpha/configs/defconfig b/arch/alpha/configs/defconfig index f4ec420d7f2d..e10c1be3c0d1 100644 --- a/arch/alpha/configs/defconfig +++ b/arch/alpha/configs/defconfig @@ -53,7 +53,8 @@ CONFIG_NET_PCI=y CONFIG_YELLOWFIN=y CONFIG_SERIAL_8250=y CONFIG_SERIAL_8250_CONSOLE=y -CONFIG_RTC=y +CONFIG_RTC_CLASS=y +CONFIG_RTC_DRV_CMOS=y CONFIG_EXT2_FS=y CONFIG_REISERFS_FS=m CONFIG_ISO9660_FS=y diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig index dabbf3f519c6..c2ac4f257c82 100644 --- a/drivers/char/Kconfig +++ b/drivers/char/Kconfig @@ -243,62 +243,6 @@ config NVRAM To compile this driver as a module, choose M here: the module will be called nvram. -# -# These legacy RTC drivers just cause too many conflicts with the generic -# RTC framework ... let's not even try to coexist any more. 
-# -if RTC_LIB=n - -config RTC - tristate "Enhanced Real Time Clock Support (legacy PC RTC driver)" - depends on ALPHA - ---help--- - If you say Y here and create a character special file /dev/rtc with - major number 10 and minor number 135 using mknod ("man mknod"), you - will get access to the real time clock (or hardware clock) built - into your computer. - - Every PC has such a clock built in. It can be used to generate - signals from as low as 1Hz up to 8192Hz, and can also be used - as a 24 hour alarm. It reports status information via the file - /proc/driver/rtc and its behaviour is set by various ioctls on - /dev/rtc. - - If you run Linux on a multiprocessor machine and said Y to - "Symmetric Multi Processing" above, you should say Y here to read - and set the RTC in an SMP compatible fashion. - - If you think you have a use for such a device (such as periodic data - sampling), then say Y here, and read - for details. - - To compile this driver as a module, choose M here: the - module will be called rtc. - -config JS_RTC - tristate "Enhanced Real Time Clock Support" - depends on SPARC32 && PCI - ---help--- - If you say Y here and create a character special file /dev/rtc with - major number 10 and minor number 135 using mknod ("man mknod"), you - will get access to the real time clock (or hardware clock) built - into your computer. - - Every PC has such a clock built in. It can be used to generate - signals from as low as 1Hz up to 8192Hz, and can also be used - as a 24 hour alarm. It reports status information via the file - /proc/driver/rtc and its behaviour is set by various ioctls on - /dev/rtc. - - If you think you have a use for such a device (such as periodic data - sampling), then say Y here, and read - for details. - - To compile this driver as a module, choose M here: the - module will be called js-rtc. 
- -endif # RTC_LIB - config DTLK tristate "Double Talk PC internal speech card support" depends on ISA diff --git a/drivers/char/Makefile b/drivers/char/Makefile index abe3138b1f5a..ffce287ef415 100644 --- a/drivers/char/Makefile +++ b/drivers/char/Makefile @@ -20,7 +20,6 @@ obj-$(CONFIG_APM_EMULATION) += apm-emulation.o obj-$(CONFIG_DTLK) += dtlk.o obj-$(CONFIG_APPLICOM) += applicom.o obj-$(CONFIG_SONYPI) += sonypi.o -obj-$(CONFIG_RTC) += rtc.o obj-$(CONFIG_HPET) += hpet.o obj-$(CONFIG_XILINX_HWICAP)+= xilinx_hwicap/ obj-$(CONFIG_NVRAM)+= nvram.o @@ -45,9 +44,6 @@ obj-$(CONFIG_TCG_TPM) += tpm/ obj-$(CONFIG_PS3_FLASH)+= ps3flash.o -obj-$(CONFIG_JS_RTC) += js-rtc.o -js-rtc-y = rtc.o - obj-$(CONFIG_XILLYBUS) += xillybus/ obj-$(CONFIG_POWERNV_OP_PANEL)
Re: [PATCH 00/12] mm: remove __ARCH_HAS_4LEVEL_HACK
On Wed, Oct 23, 2019 at 5:29 AM Mike Rapoport wrote: > > These patches convert several architectures to use page table folding and > remove __ARCH_HAS_4LEVEL_HACK along with include/asm-generic/4level-fixup.h. Thanks for doing this. The patches look sane from a quick scan, and it's definitely the right thing to do. So ack on my part, but obviously testing the different architectures would be a really good thing... Linus
Re: [PATCH 08/12] parisc: use pgtable-nopXd instead of 4level-fixup
diff --git a/arch/parisc/include/asm/page.h b/arch/parisc/include/asm/page.h index 93caf17..1d339ee 100644 --- a/arch/parisc/include/asm/page.h +++ b/arch/parisc/include/asm/page.h @@ -42,48 +42,54 @@ typedef struct { unsigned long pte; } pte_t; /* either 32 or 64bit */ /* NOTE: even on 64 bits, these entries are __u32 because we allocate * the pmd and pgd in ZONE_DMA (i.e. under 4GB) */ -typedef struct { __u32 pmd; } pmd_t; typedef struct { __u32 pgd; } pgd_t; typedef struct { unsigned long pgprot; } pgprot_t; -#define pte_val(x) ((x).pte) -/* These do not work lvalues, so make sure we don't use them as such. */ +#if CONFIG_PGTABLE_LEVELS == 3 +typedef struct { __u32 pmd; } pmd_t; +#define __pmd(x) ((pmd_t) { (x) } ) +/* pXd_val() do not work lvalues, so make sure we don't use them as such. */ For me it sounds like there is something missing, maybe an "as" before lvalues? And it was "These", so plural, and now it is singular, so do -> does? Eike
Re: [PATCH 02/12] arm: nommu: use pgtable-nopud instead of 4level-fixup
On Wed, Oct 23, 2019 at 12:28:51PM +0300, Mike Rapoport wrote: > From: Mike Rapoport > > The generic nommu implementation of page table manipulation takes care of > folding of the upper levels and does not require fixups. > > Simply replace of include/asm-generic/4level-fixup.h with > include/asm-generic/pgtable-nopud.h. > > Signed-off-by: Mike Rapoport Acked-by: Russell King Thanks. > --- > arch/arm/include/asm/pgtable.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h > index 3ae120c..eabcb48 100644 > --- a/arch/arm/include/asm/pgtable.h > +++ b/arch/arm/include/asm/pgtable.h > @@ -12,7 +12,7 @@ > > #ifndef CONFIG_MMU > > -#include > +#include > #include > > #else > -- > 2.7.4 > > -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up
[PATCH 08/12] parisc: use pgtable-nopXd instead of 4level-fixup
From: Mike Rapoport parisc has two or three levels of page tables and can use appropriate pgtable-nopXd and folding of the upper layers. Replace usage of include/asm-generic/4level-fixup.h and explicit definitions of __PAGETABLE_PxD_FOLDED in parisc with include/asm-generic/pgtable-nopmd.h for two-level configurations and with include/asm-generic/pgtable-nopud.h for three-level configurations and adjust page table manipulation macros and functions accordingly. Signed-off-by: Mike Rapoport --- arch/parisc/include/asm/page.h| 30 +- arch/parisc/include/asm/pgalloc.h | 41 +++--- arch/parisc/include/asm/pgtable.h | 52 +++ arch/parisc/include/asm/tlb.h | 2 ++ arch/parisc/kernel/cache.c| 13 ++ arch/parisc/kernel/pci-dma.c | 9 +-- arch/parisc/mm/fixmap.c | 10 +--- 7 files changed, 81 insertions(+), 76 deletions(-) diff --git a/arch/parisc/include/asm/page.h b/arch/parisc/include/asm/page.h index 93caf17..1d339ee 100644 --- a/arch/parisc/include/asm/page.h +++ b/arch/parisc/include/asm/page.h @@ -42,48 +42,54 @@ typedef struct { unsigned long pte; } pte_t; /* either 32 or 64bit */ /* NOTE: even on 64 bits, these entries are __u32 because we allocate * the pmd and pgd in ZONE_DMA (i.e. under 4GB) */ -typedef struct { __u32 pmd; } pmd_t; typedef struct { __u32 pgd; } pgd_t; typedef struct { unsigned long pgprot; } pgprot_t; -#define pte_val(x) ((x).pte) -/* These do not work lvalues, so make sure we don't use them as such. */ +#if CONFIG_PGTABLE_LEVELS == 3 +typedef struct { __u32 pmd; } pmd_t; +#define __pmd(x) ((pmd_t) { (x) } ) +/* pXd_val() do not work lvalues, so make sure we don't use them as such. 
*/ #define pmd_val(x) ((x).pmd + 0) +#endif + +#define pte_val(x) ((x).pte) #define pgd_val(x) ((x).pgd + 0) #define pgprot_val(x) ((x).pgprot) #define __pte(x) ((pte_t) { (x) } ) -#define __pmd(x) ((pmd_t) { (x) } ) #define __pgd(x) ((pgd_t) { (x) } ) #define __pgprot(x)((pgprot_t) { (x) } ) -#define __pmd_val_set(x,n) (x).pmd = (n) -#define __pgd_val_set(x,n) (x).pgd = (n) - #else /* * .. while these make it easier on the compiler */ typedef unsigned long pte_t; + +#if CONFIG_PGTABLE_LEVELS == 3 typedef __u32 pmd_t; +#define pmd_val(x) (x) +#define __pmd(x) (x) +#endif + typedef __u32 pgd_t; typedef unsigned long pgprot_t; #define pte_val(x) (x) -#define pmd_val(x) (x) #define pgd_val(x) (x) #define pgprot_val(x) (x) #define __pte(x)(x) -#define __pmd(x) (x) #define __pgd(x)(x) #define __pgprot(x) (x) -#define __pmd_val_set(x,n) (x) = (n) -#define __pgd_val_set(x,n) (x) = (n) - #endif /* STRICT_MM_TYPECHECKS */ +#define set_pmd(pmdptr, pmdval) (*(pmdptr) = (pmdval)) +#if CONFIG_PGTABLE_LEVELS == 3 +#define set_pud(pudptr, pudval) (*(pudptr) = (pudval)) +#endif + typedef struct page *pgtable_t; typedef struct __physmem_range { diff --git a/arch/parisc/include/asm/pgalloc.h b/arch/parisc/include/asm/pgalloc.h index d98647c..9ac74da 100644 --- a/arch/parisc/include/asm/pgalloc.h +++ b/arch/parisc/include/asm/pgalloc.h @@ -34,13 +34,13 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm) /* Populate first pmd with allocated memory. We mark it * with PxD_FLAG_ATTACHED as a signal to the system that this * pmd entry may not be cleared. 
*/ - __pgd_val_set(*actual_pgd, (PxD_FLAG_PRESENT | - PxD_FLAG_VALID | - PxD_FLAG_ATTACHED) - + (__u32)(__pa((unsigned long)pgd) >> PxD_VALUE_SHIFT)); + set_pgd(actual_pgd, __pgd((PxD_FLAG_PRESENT | + PxD_FLAG_VALID | + PxD_FLAG_ATTACHED) + + (__u32)(__pa((unsigned long)pgd) >> PxD_VALUE_SHIFT))); /* The first pmd entry also is marked with PxD_FLAG_ATTACHED as * a signal that this pmd may not be freed */ - __pgd_val_set(*pgd, PxD_FLAG_ATTACHED); + set_pgd(pgd, __pgd(PxD_FLAG_ATTACHED)); #endif } spin_lock_init(pgd_spinlock(actual_pgd)); @@ -59,10 +59,10 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd) /* Three Level Page Table Support for pmd's */ -static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, pmd_t *pmd) +static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd) { - __pgd_val_set(*pgd, (PxD_FLAG_PRESENT | PxD_FLAG_VALID) + - (__u32)(__pa((unsigned long)pmd) >> PxD_VALUE_SHIFT)); + set_pud(pud, __pud((PxD_FLAG_PRESENT | PxD_FLAG_VALID) + + (__u32)(__pa((unsigned long)pmd) >> PxD_VALUE_SHIFT))); } static inline
[PATCH 09/12] sparc32: use pgtable-nopud instead of 4level-fixup
From: Mike Rapoport 32-bit version of sparc has three-level page tables and can use pgtable-nopud and folding of the upper layers. Replace usage of include/asm-generic/4level-fixup.h with include/asm-generic/pgtable-nopud.h and adjust page table manipulation macros and functions accordingly. Signed-off-by: Mike Rapoport --- arch/sparc/include/asm/pgalloc_32.h | 6 ++--- arch/sparc/include/asm/pgtable_32.h | 28 ++-- arch/sparc/mm/fault_32.c| 11 ++-- arch/sparc/mm/highmem.c | 6 - arch/sparc/mm/io-unit.c | 6 - arch/sparc/mm/iommu.c | 6 - arch/sparc/mm/srmmu.c | 51 + 7 files changed, 81 insertions(+), 33 deletions(-) diff --git a/arch/sparc/include/asm/pgalloc_32.h b/arch/sparc/include/asm/pgalloc_32.h index 10538a4..eae0c92 100644 --- a/arch/sparc/include/asm/pgalloc_32.h +++ b/arch/sparc/include/asm/pgalloc_32.h @@ -26,14 +26,14 @@ static inline void free_pgd_fast(pgd_t *pgd) #define pgd_free(mm, pgd) free_pgd_fast(pgd) #define pgd_alloc(mm) get_pgd_fast() -static inline void pgd_set(pgd_t * pgdp, pmd_t * pmdp) +static inline void pud_set(pud_t * pudp, pmd_t * pmdp) { unsigned long pa = __nocache_pa(pmdp); - set_pte((pte_t *)pgdp, __pte((SRMMU_ET_PTD | (pa >> 4; + set_pte((pte_t *)pudp, __pte((SRMMU_ET_PTD | (pa >> 4; } -#define pgd_populate(MM, PGD, PMD) pgd_set(PGD, PMD) +#define pud_populate(MM, PGD, PMD) pud_set(PGD, PMD) static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address) diff --git a/arch/sparc/include/asm/pgtable_32.h b/arch/sparc/include/asm/pgtable_32.h index 31da448..6d6f44c 100644 --- a/arch/sparc/include/asm/pgtable_32.h +++ b/arch/sparc/include/asm/pgtable_32.h @@ -12,7 +12,7 @@ #include #ifndef __ASSEMBLY__ -#include +#include #include #include @@ -132,12 +132,12 @@ static inline struct page *pmd_page(pmd_t pmd) return pfn_to_page((pmd_val(pmd) & SRMMU_PTD_PMASK) >> (PAGE_SHIFT-4)); } -static inline unsigned long pgd_page_vaddr(pgd_t pgd) +static inline unsigned long pud_page_vaddr(pud_t pud) { - if 
(srmmu_device_memory(pgd_val(pgd))) { + if (srmmu_device_memory(pud_val(pud))) { return ~0; } else { - unsigned long v = pgd_val(pgd) & SRMMU_PTD_PMASK; + unsigned long v = pud_val(pud) & SRMMU_PTD_PMASK; return (unsigned long)__nocache_va(v << 4); } } @@ -184,24 +184,24 @@ static inline void pmd_clear(pmd_t *pmdp) set_pte((pte_t *)&pmdp->pmdv[i], __pte(0)); } -static inline int pgd_none(pgd_t pgd) +static inline int pud_none(pud_t pud) { - return !(pgd_val(pgd) & 0xFFF); + return !(pud_val(pud) & 0xFFF); } -static inline int pgd_bad(pgd_t pgd) +static inline int pud_bad(pud_t pud) { - return (pgd_val(pgd) & SRMMU_ET_MASK) != SRMMU_ET_PTD; + return (pud_val(pud) & SRMMU_ET_MASK) != SRMMU_ET_PTD; } -static inline int pgd_present(pgd_t pgd) +static inline int pud_present(pud_t pud) { - return ((pgd_val(pgd) & SRMMU_ET_MASK) == SRMMU_ET_PTD); + return ((pud_val(pud) & SRMMU_ET_MASK) == SRMMU_ET_PTD); } -static inline void pgd_clear(pgd_t *pgdp) +static inline void pud_clear(pud_t *pudp) { - set_pte((pte_t *)pgdp, __pte(0)); + set_pte((pte_t *)pudp, __pte(0)); } /* @@ -319,9 +319,9 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) #define pgd_offset_k(address) pgd_offset(&init_mm, address) /* Find an entry in the second-level page table..
*/ -static inline pmd_t *pmd_offset(pgd_t * dir, unsigned long address) +static inline pmd_t *pmd_offset(pud_t * dir, unsigned long address) { - return (pmd_t *) pgd_page_vaddr(*dir) + + return (pmd_t *) pud_page_vaddr(*dir) + ((address >> PMD_SHIFT) & (PTRS_PER_PMD - 1)); } diff --git a/arch/sparc/mm/fault_32.c b/arch/sparc/mm/fault_32.c index 8d69de1..89976c9 100644 --- a/arch/sparc/mm/fault_32.c +++ b/arch/sparc/mm/fault_32.c @@ -351,6 +351,8 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write, */ int offset = pgd_index(address); pgd_t *pgd, *pgd_k; + p4d_t *p4d, *p4d_k; + pud_t *pud, *pud_k; pmd_t *pmd, *pmd_k; pgd = tsk->active_mm->pgd + offset; @@ -363,8 +365,13 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write, return; } - pmd = pmd_offset(pgd, address); - pmd_k = pmd_offset(pgd_k, address); + p4d = p4d_offset(pgd, address); + pud = pud_offset(p4d, address); + pmd = pmd_offset(pud, address); + + p4d_k = p4d_offset(pgd_k, address); +
[PATCH 03/12] c6x: use pgtable-nopud instead of 4level-fixup
From: Mike Rapoport c6x is a nommu architecture and does not require fixup for upper layers of the page tables because it is already handled by the generic nommu implementation. Replace usage of include/asm-generic/4level-fixup.h with include/asm-generic/pgtable-nopud.h. Signed-off-by: Mike Rapoport --- arch/c6x/include/asm/pgtable.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/c6x/include/asm/pgtable.h b/arch/c6x/include/asm/pgtable.h index 0b6919c..197c473 100644 --- a/arch/c6x/include/asm/pgtable.h +++ b/arch/c6x/include/asm/pgtable.h @@ -8,7 +8,7 @@ #ifndef _ASM_C6X_PGTABLE_H #define _ASM_C6X_PGTABLE_H -#include <asm-generic/4level-fixup.h> +#include <asm-generic/pgtable-nopud.h> #include #include -- 2.7.4
Re: [PATCH 03/21] ia64: rename ioremap_nocache to ioremap_uc
Hello!

On 17.10.2019 20:45, Christoph Hellwig wrote:
> On ia64 ioremap_nocache fails if attributs don't match. Not other

   Attributes?

> architectures does this, and we plan to get rid of ioremap_nocache. So get rid of the special semantics and define ioremap_nocache in terms of ioremap as no portable driver could rely on the behavior anyway. However x86 implements ioremap_uc with a in a similar way as the ia64

   With a what?

> version of ioremap_nocache, so implement that instead. Signed-off-by: Christoph Hellwig --- arch/ia64/include/asm/io.h | 6 +++--- arch/ia64/mm/ioremap.c | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/ia64/include/asm/io.h b/arch/ia64/include/asm/io.h index 54e70c21352a..fec9df9609ed 100644 --- a/arch/ia64/include/asm/io.h +++ b/arch/ia64/include/asm/io.h [...]

MBR, Sergei
Re: [PATCH 20/21] csky: remove ioremap_cache
Acked-by: Guo Ren On Fri, Oct 18, 2019 at 1:47 AM Christoph Hellwig wrote: > > No driver that can be used on csky uses ioremap_cache, and this > interface has been deprecated in favor of memremap. > > Signed-off-by: Christoph Hellwig > --- > arch/csky/include/asm/io.h | 2 -- > arch/csky/mm/ioremap.c | 7 --- > 2 files changed, 9 deletions(-) > > diff --git a/arch/csky/include/asm/io.h b/arch/csky/include/asm/io.h > index a4b9fb616faa..f572605d5ad5 100644 > --- a/arch/csky/include/asm/io.h > +++ b/arch/csky/include/asm/io.h > @@ -36,13 +36,11 @@ > /* > * I/O memory mapping functions. > */ > -extern void __iomem *ioremap_cache(phys_addr_t addr, size_t size); > extern void __iomem *__ioremap(phys_addr_t addr, size_t size, pgprot_t prot); > extern void iounmap(void *addr); > > #define ioremap(addr, size)__ioremap((addr), (size), > pgprot_noncached(PAGE_KERNEL)) > #define ioremap_wc(addr, size) __ioremap((addr), (size), > pgprot_writecombine(PAGE_KERNEL)) > -#define ioremap_cache ioremap_cache > > #include > > diff --git a/arch/csky/mm/ioremap.c b/arch/csky/mm/ioremap.c > index e13cd3497628..ae78256a56fd 100644 > --- a/arch/csky/mm/ioremap.c > +++ b/arch/csky/mm/ioremap.c > @@ -44,13 +44,6 @@ void __iomem *__ioremap(phys_addr_t phys_addr, size_t > size, pgprot_t prot) > } > EXPORT_SYMBOL(__ioremap); > > -void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size) > -{ > - return __ioremap_caller(phys_addr, size, PAGE_KERNEL, > - __builtin_return_address(0)); > -} > -EXPORT_SYMBOL(ioremap_cache); > - > void iounmap(void __iomem *addr) > { > vunmap((void *)((unsigned long)addr & PAGE_MASK)); > -- > 2.20.1 > -- Best Regards Guo Ren ML: https://lore.kernel.org/linux-csky/
Re: [PATCH 13/21] m68k: rename __iounmap and mark it static
Hi Christoph, On Thu, Oct 17, 2019 at 7:53 PM Christoph Hellwig wrote: > m68k uses __iounmap as the name for an internal helper that is only > used for some CPU types. Mark it static and give it a better name. > > Signed-off-by: Christoph Hellwig Thanks for your patch! > --- a/arch/m68k/mm/kmap.c > +++ b/arch/m68k/mm/kmap.c > @@ -52,6 +52,7 @@ static inline void free_io_area(void *addr) > > #define IO_SIZE(256*1024) > > +static void __free_io_area(void *addr, unsigned long size); > static struct vm_struct *iolist; > > static struct vm_struct *get_io_area(unsigned long size) > @@ -90,7 +91,7 @@ static inline void free_io_area(void *addr) > if (tmp->addr == addr) { > *p = tmp->next; > /* remove gap added in get_io_area() */ > - __iounmap(tmp->addr, tmp->size - IO_SIZE); > + __free_io_area(tmp->addr, tmp->size - IO_SIZE); > kfree(tmp); > return; > } > @@ -249,12 +250,13 @@ void iounmap(void __iomem *addr) > } > EXPORT_SYMBOL(iounmap); > > +#ifndef CPU_M68040_OR_M68060_ONLY Can you please move this block up, instead of adding more #ifdef cluttery? That would also remove the need for a forward declaration. > /* > - * __iounmap unmaps nearly everything, so be careful > + * __free_io_area unmaps nearly everything, so be careful > * Currently it doesn't free pointer/page tables anymore but this > * wasn't used anyway and might be added later. > */ > -void __iounmap(void *addr, unsigned long size) > +static void __free_io_area(void *addr, unsigned long size) > { > unsigned long virtaddr = (unsigned long)addr; > pgd_t *pgd_dir; > @@ -297,6 +299,7 @@ void __iounmap(void *addr, unsigned long size) > > flush_tlb_all(); > } > +#endif /* CPU_M68040_OR_M68060_ONLY */ > > /* > * Set new cache mode for some kernel address space. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. 
But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: Some Alphas broken by f75b99d5a77d (PCI: Enforce bus address limits in resource allocation)
On Mon, Apr 23, 2018 at 10:34 AM Ivan Kokshaysky wrote: > > On Sun, Apr 22, 2018 at 01:07:38PM -0700, Matt Turner wrote: > > On Wed, Apr 18, 2018 at 1:48 PM, Ivan Kokshaysky > > wrote: > > > On Tue, Apr 17, 2018 at 02:43:44PM -0500, Bjorn Helgaas wrote: > > >> On Mon, Apr 16, 2018 at 09:43:42PM -0700, Matt Turner wrote: > > >> > On Mon, Apr 16, 2018 at 2:50 PM, Bjorn Helgaas > > >> > wrote: > > >> > > Hi Matt, > > >> > > > > >> > > First of all, sorry about breaking Nautilus, and thanks very much for > > >> > > tracking it down to this commit. > > >> > > > >> > It's a particularly weird case, as far as I've been able to discern :) > > >> > > > >> > > On Mon, Apr 16, 2018 at 07:33:57AM -0700, Matt Turner wrote: > > >> > >> Commit f75b99d5a77d63f20e07bd276d5a427808ac8ef6 (PCI: Enforce bus > > >> > >> address limits in resource allocation) broke Alpha systems using > > >> > >> CONFIG_ALPHA_NAUTILUS. Alpha is 64-bit, but Nautilus systems use a > > >> > >> 32-bit AMD 751/761 chipset. arch/alpha/kernel/sys_nautilus.c maps > > >> > >> PCI > > >> > >> into the upper addresses just below 4GB. > > >> > >> > > >> > >> I can get a working kernel by ifdef'ing out the code in > > >> > >> drivers/pci/bus.c:pci_bus_alloc_resource. We can't tie > > >> > >> PCI_BUS_ADDR_T_64BIT to ALPHA_NAUTILUS without breaking generic > > >> > >> kernels. > > >> > >> > > >> > >> How can we get Nautilus working again? > > >> > > > > >> > > Can you collect a complete dmesg log, ideally both before and after > > >> > > f75b99d5a77d? I assume the problem is that after f75b99d5a77d? we > > >> > > erroneously assign space for something above 4GB. But if we know the > > >> > > correct host bridge apertures, we shouldn't assign space outside > > >> > > them, > > >> > > regardless of the PCI bus address size. > > >> > > > >> > I made a mistake in my initial report. Commit f75b99d5a77d is actually > > >> > the last *working* commit. My apologies. 
The next commit is > > >> > d56dbf5bab8c (PCI: Allocate 64-bit BARs above 4G when possible) and it > > >> > breaks Nautilus I've confirmed. > > >> > > > >> > Please find attached dmesgs from those two commits, from the commit > > >> > immediately before them, and another from 4.17-rc1 with my hack of #if > > >> > 0'ing out the pci_bus_alloc_from_region(..., &pci_high) code. > > >> > > > >> > Thanks for having a look! > > >> > > >> We're telling the PCI core that the host bridge MMIO aperture is the > > >> entire 64-bit address space, so when we assign BARs, some of them end > > >> up above 4GB: > > >> > > >> pci_bus :00: root bus resource [mem 0x-0x] > > >> pci :00:09.0: BAR 0: assigned [mem 0x1-0x1 64bit] > > >> > > >> But it sounds like the MMIO aperture really ends at 0x, so > > >> that's not going to work. > > > > > > Correct... This would do as a quick fix, I think: > > > > > > diff --git a/arch/alpha/kernel/sys_nautilus.c > > > b/arch/alpha/kernel/sys_nautilus.c > > > index ff4f54b..477ba65 100644 --- a/arch/alpha/kernel/sys_nautilus.c > > > +++ b/arch/alpha/kernel/sys_nautilus.c > > > @@ -193,6 +193,8 @@ static struct resource irongate_io = { > > > }; > > > static struct resource irongate_mem = { > > > .name = "Irongate PCI MEM", > > > + .start = 0, > > > + .end = 0xffffffff, > > > .flags = IORESOURCE_MEM, > > > }; > > > static struct resource busn_resource = { > > > @@ -218,7 +220,7 @@ nautilus_init_pci(void) > > > return; > > > > > > pci_add_resource(&bridge->windows, &ioport_resource); > > > - pci_add_resource(&bridge->windows, &iomem_resource); > > > + pci_add_resource(&bridge->windows, &irongate_mem); > > > pci_add_resource(&bridge->windows, &busn_resource); > > > bridge->dev.parent = NULL; > > > bridge->sysdata = hose; > > > > Thanks.
But with that I get > > > > PCI host bridge to bus :00 > > pci_bus :00: root bus resource [io 0x-0x] > > pci_bus :00: root bus resource [mem 0x-0x] > > pci_bus :00: root bus resource [bus 00-ff] > > pci :00:10.0: [Firmware Bug]: reg 0x10: invalid BAR (can't size) > > pci :00:10.0: [Firmware Bug]: reg 0x14: invalid BAR (can't size) > > pci :00:10.0: [Firmware Bug]: reg 0x18: invalid BAR (can't size) > > pci :00:10.0: [Firmware Bug]: reg 0x1c: invalid BAR (can't size) > > pci :00:10.0: legacy IDE quirk: reg 0x10: [io 0x01f0-0x01f7] > > pci :00:10.0: legacy IDE quirk: reg 0x14: [io 0x03f6] > > pci :00:10.0: legacy IDE quirk: reg 0x18: [io 0x0170-0x0177] > > pci :00:10.0: legacy IDE quirk: reg 0x1c: [io 0x0376] > > pci :00:11.0: quirk: [io 0x4000-0x403f] claimed by ali7101 ACPI > > pci :00:11.0: quirk: [io 0x5000-0x501f] claimed by ali7101 SMB > > pci :00:01.0: BAR 9: assigned [mem 0xc000-0xc2ff pref] > > pci :00:01.0: BAR 8: assigned [mem 0xc300-0xc3bf] > > pci :00:0b.0: BAR 6: assigned [mem 0xc3c0-0xc3c3 pref] > > pci :00:08.0:
Re: [PATCH 18/21] riscv: use the generic ioremap code
On Thu, 17 Oct 2019, Christoph Hellwig wrote: > Use the generic ioremap code instead of providing a local version. > Note that this relies on the asm-generic no-op definition of > pgprot_noncached. > > Signed-off-by: Christoph Hellwig According to the series introduction E-mail: https://lore.kernel.org/linux-riscv/20191017174554.29840-1-...@lst.de/T/#m9ac4010fd725c8c84179fa99aa391a6f701a32de nothing substantive related to RISC-V or the common code has changed since the first version of this series, and this RISC-V-specific patch appears to be quite close (if not identical) to the first version of the patch: https://lore.kernel.org/linux-riscv/alpine.deb.2.21..1908171421560.4...@viisi.sifive.com/ Thus the Tested-by, Reviewed-by, and Acked-by for RISC-V should all still apply: https://lore.kernel.org/linux-riscv/alpine.deb.2.21..1908171421560.4...@viisi.sifive.com/ - Paul
Re: [PATCH 07/21] parisc: remove __ioremap
Christoph Hellwig wrote:
> __ioremap is always called with the _PAGE_NO_CACHE, so fold the whole
> thing and rename it to ioremap. This allows allows to remove the
                                       ^
> special EISA quirk to force _PAGE_NO_CACHE.

Eike
[PATCH 07/21] parisc: remove __ioremap
__ioremap is always called with the _PAGE_NO_CACHE, so fold the whole thing and rename it to ioremap. This allows allows to remove the special EISA quirk to force _PAGE_NO_CACHE. Signed-off-by: Christoph Hellwig --- arch/parisc/include/asm/io.h | 11 +-- arch/parisc/mm/ioremap.c | 10 -- 2 files changed, 5 insertions(+), 16 deletions(-) diff --git a/arch/parisc/include/asm/io.h b/arch/parisc/include/asm/io.h index 93d37010b375..46212b52c23e 100644 --- a/arch/parisc/include/asm/io.h +++ b/arch/parisc/include/asm/io.h @@ -127,16 +127,7 @@ static inline void gsc_writeq(unsigned long long val, unsigned long addr) /* * The standard PCI ioremap interfaces */ - -extern void __iomem * __ioremap(unsigned long offset, unsigned long size, unsigned long flags); - -/* Most machines react poorly to I/O-space being cacheable... Instead let's - * define ioremap() in terms of ioremap_nocache(). - */ -static inline void __iomem * ioremap(unsigned long offset, unsigned long size) -{ - return __ioremap(offset, size, _PAGE_NO_CACHE); -} +void __iomem *ioremap(unsigned long offset, unsigned long size); #define ioremap_nocache(off, sz) ioremap((off), (sz)) #define ioremap_wc ioremap_nocache #define ioremap_uc ioremap_nocache diff --git a/arch/parisc/mm/ioremap.c b/arch/parisc/mm/ioremap.c index f29f682352f0..6e7c005aa09b 100644 --- a/arch/parisc/mm/ioremap.c +++ b/arch/parisc/mm/ioremap.c @@ -25,7 +25,7 @@ * have to convert them into an offset in a page-aligned mapping, but the * caller shouldn't need to know that small detail. 
*/ -void __iomem * __ioremap(unsigned long phys_addr, unsigned long size, unsigned long flags) +void __iomem *ioremap(unsigned long phys_addr, unsigned long size) { void __iomem *addr; struct vm_struct *area; @@ -36,10 +36,8 @@ void __iomem * __ioremap(unsigned long phys_addr, unsigned long size, unsigned l unsigned long end = phys_addr + size - 1; /* Support EISA addresses */ if ((phys_addr >= 0x0008 && end < 0x000f) || - (phys_addr >= 0x0050 && end < 0x03bf)) { + (phys_addr >= 0x0050 && end < 0x03bf)) phys_addr |= F_EXTEND(0xfc00); - flags |= _PAGE_NO_CACHE; - } #endif /* Don't allow wraparound or zero size */ @@ -65,7 +63,7 @@ void __iomem * __ioremap(unsigned long phys_addr, unsigned long size, unsigned l } pgprot = __pgprot(_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | - _PAGE_ACCESSED | flags); + _PAGE_ACCESSED | _PAGE_NO_CACHE); /* * Mappings have to be page-aligned @@ -90,7 +88,7 @@ void __iomem * __ioremap(unsigned long phys_addr, unsigned long size, unsigned l return (void __iomem *) (offset + (char __iomem *)addr); } -EXPORT_SYMBOL(__ioremap); +EXPORT_SYMBOL(ioremap); void iounmap(const volatile void __iomem *io_addr) { -- 2.20.1
[PATCH 17/21] lib: provide a simple generic ioremap implementation
A lot of architectures reuse the same simple ioremap implementation, so start lifting the most simple variant to lib/ioremap.c. It provides ioremap_prot and iounmap, plus a default ioremap that uses prot_noncached, although that can be overridden by asm/io.h. Signed-off-by: Christoph Hellwig --- include/asm-generic/io.h | 20 lib/Kconfig | 3 +++ lib/ioremap.c| 39 +++ 3 files changed, 58 insertions(+), 4 deletions(-) diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h index 4e45e1cb6560..4a661fdd1937 100644 --- a/include/asm-generic/io.h +++ b/include/asm-generic/io.h @@ -923,9 +923,10 @@ static inline void *phys_to_virt(unsigned long address) * DOC: ioremap() and ioremap_*() variants * * Architectures with an MMU are expected to provide ioremap() and iounmap() - * themselves. For NOMMU architectures we provide a default nop-op - * implementation that expect that the physical address used for MMIO are - * already marked as uncached, and can be used as kernel virtual addresses. + * themselves or rely on GENERIC_IOREMAP. For NOMMU architectures we provide + * a default nop-op implementation that expect that the physical address used + * for MMIO are already marked as uncached, and can be used as kernel virtual + * addresses. * * ioremap_wc() and ioremap_wt() can provide more relaxed caching attributes * for specific drivers if the architecture choses to implement them. 
If they @@ -946,7 +947,18 @@ static inline void iounmap(void __iomem *addr) { } #endif -#endif /* CONFIG_MMU */ +#elif defined(CONFIG_GENERIC_IOREMAP) +#include + +void __iomem *ioremap_prot(phys_addr_t addr, size_t size, unsigned long prot); +void iounmap(volatile void __iomem *addr); + +static inline void __iomem *ioremap(phys_addr_t addr, size_t size) +{ + /* _PAGE_IOREMAP needs to be supplied by the architecture */ + return ioremap_prot(addr, size, _PAGE_IOREMAP); +} +#endif /* !CONFIG_MMU || CONFIG_GENERIC_IOREMAP */ #ifndef ioremap_nocache #define ioremap_nocache ioremap diff --git a/lib/Kconfig b/lib/Kconfig index 183f92a297ca..afc78aaf2b25 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -638,6 +638,9 @@ config STRING_SELFTEST endmenu +config GENERIC_IOREMAP + bool + config GENERIC_LIB_ASHLDI3 bool diff --git a/lib/ioremap.c b/lib/ioremap.c index 0a2ffadc6d71..3f0e18543de8 100644 --- a/lib/ioremap.c +++ b/lib/ioremap.c @@ -231,3 +231,42 @@ int ioremap_page_range(unsigned long addr, return err; } + +#ifdef CONFIG_GENERIC_IOREMAP +void __iomem *ioremap_prot(phys_addr_t addr, size_t size, unsigned long prot) +{ + unsigned long offset, vaddr; + phys_addr_t last_addr; + struct vm_struct *area; + + /* Disallow wrap-around or zero size */ + last_addr = addr + size - 1; + if (!size || last_addr < addr) + return NULL; + + /* Page-align mappings */ + offset = addr & (~PAGE_MASK); + addr -= offset; + size = PAGE_ALIGN(size + offset); + + area = get_vm_area_caller(size, VM_IOREMAP, + __builtin_return_address(0)); + if (!area) + return NULL; + vaddr = (unsigned long)area->addr; + + if (ioremap_page_range(vaddr, vaddr + size, addr, __pgprot(prot))) { + free_vm_area(area); + return NULL; + } + + return (void __iomem *)(vaddr + offset); +} +EXPORT_SYMBOL(ioremap_prot); + +void iounmap(volatile void __iomem *addr) +{ + vunmap((void *)((unsigned long)addr & PAGE_MASK)); +} +EXPORT_SYMBOL(iounmap); +#endif /* CONFIG_GENERIC_IOREMAP */ -- 2.20.1
[PATCH 12/21] arch: rely on asm-generic/io.h for default ioremap_* definitions
Various architectures that use asm-generic/io.h still defined their own default versions of ioremap_nocache, ioremap_wt and ioremap_wc that point back to plain ioremap directly or indirectly. Remove these definitions and rely on asm-generic/io.h instead. For this to work the backup ioremap_* definitions need to be changed to pure cpp macros instead of inlines to cover for architectures like openrisc that only define ioremap after including <asm-generic/io.h>. Signed-off-by: Christoph Hellwig --- arch/arc/include/asm/io.h| 4 arch/arm/include/asm/io.h| 1 - arch/arm64/include/asm/io.h | 2 -- arch/csky/include/asm/io.h | 1 - arch/ia64/include/asm/io.h | 1 - arch/microblaze/include/asm/io.h | 3 --- arch/nios2/include/asm/io.h | 4 arch/openrisc/include/asm/io.h | 1 - arch/riscv/include/asm/io.h | 10 -- arch/s390/include/asm/io.h | 4 arch/x86/include/asm/io.h| 1 - arch/xtensa/include/asm/io.h | 4 include/asm-generic/io.h | 18 +++--- 13 files changed, 3 insertions(+), 51 deletions(-) diff --git a/arch/arc/include/asm/io.h b/arch/arc/include/asm/io.h index 72f7929736f8..8f777d6441a5 100644 --- a/arch/arc/include/asm/io.h +++ b/arch/arc/include/asm/io.h @@ -34,10 +34,6 @@ static inline void ioport_unmap(void __iomem *addr) extern void iounmap(const void __iomem *addr); -#define ioremap_nocache(phy, sz) ioremap(phy, sz) -#define ioremap_wc(phy, sz)ioremap(phy, sz) -#define ioremap_wt(phy, sz)ioremap(phy, sz) - /* * io{read,write}{16,32}be() macros */ diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h index 924f9dd502ed..aefdabdbeb84 100644 --- a/arch/arm/include/asm/io.h +++ b/arch/arm/include/asm/io.h @@ -392,7 +392,6 @@ static inline void memcpy_toio(volatile void __iomem *to, const void *from, */ void __iomem *ioremap(resource_size_t res_cookie, size_t size); #define ioremap ioremap -#define ioremap_nocache ioremap /* * Do not use ioremap_cache for mapping memory. Use memremap instead.
diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h index 323cb306bd28..4e531f57147d 100644 --- a/arch/arm64/include/asm/io.h +++ b/arch/arm64/include/asm/io.h @@ -167,9 +167,7 @@ extern void iounmap(volatile void __iomem *addr); extern void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size); #define ioremap(addr, size)__ioremap((addr), (size), __pgprot(PROT_DEVICE_nGnRE)) -#define ioremap_nocache(addr, size)__ioremap((addr), (size), __pgprot(PROT_DEVICE_nGnRE)) #define ioremap_wc(addr, size) __ioremap((addr), (size), __pgprot(PROT_NORMAL_NC)) -#define ioremap_wt(addr, size) __ioremap((addr), (size), __pgprot(PROT_DEVICE_nGnRE)) /* * PCI configuration space mapping function. diff --git a/arch/csky/include/asm/io.h b/arch/csky/include/asm/io.h index 80d071e2567f..a4b9fb616faa 100644 --- a/arch/csky/include/asm/io.h +++ b/arch/csky/include/asm/io.h @@ -42,7 +42,6 @@ extern void iounmap(void *addr); #define ioremap(addr, size)__ioremap((addr), (size), pgprot_noncached(PAGE_KERNEL)) #define ioremap_wc(addr, size) __ioremap((addr), (size), pgprot_writecombine(PAGE_KERNEL)) -#define ioremap_nocache(addr, size)ioremap((addr), (size)) #define ioremap_cache ioremap_cache #include diff --git a/arch/ia64/include/asm/io.h b/arch/ia64/include/asm/io.h index fec9df9609ed..3d666a11a2de 100644 --- a/arch/ia64/include/asm/io.h +++ b/arch/ia64/include/asm/io.h @@ -263,7 +263,6 @@ static inline void __iomem * ioremap_cache (unsigned long phys_addr, unsigned lo return ioremap(phys_addr, size); } #define ioremap ioremap -#define ioremap_nocache ioremap #define ioremap_cache ioremap_cache #define ioremap_uc ioremap_uc #define iounmap iounmap diff --git a/arch/microblaze/include/asm/io.h b/arch/microblaze/include/asm/io.h index 86c95b2a1ce1..d33c61737b8b 100644 --- a/arch/microblaze/include/asm/io.h +++ b/arch/microblaze/include/asm/io.h @@ -39,9 +39,6 @@ extern resource_size_t isa_mem_base; extern void iounmap(volatile void __iomem *addr); extern void __iomem 
*ioremap(phys_addr_t address, unsigned long size); -#define ioremap_nocache(addr, size)ioremap((addr), (size)) -#define ioremap_wc(addr, size) ioremap((addr), (size)) -#define ioremap_wt(addr, size) ioremap((addr), (size)) #endif /* CONFIG_MMU */ diff --git a/arch/nios2/include/asm/io.h b/arch/nios2/include/asm/io.h index 74ab34aa6731..d108937c321e 100644 --- a/arch/nios2/include/asm/io.h +++ b/arch/nios2/include/asm/io.h @@ -33,10 +33,6 @@ static inline void iounmap(void __iomem *addr) __iounmap(addr); } -#define ioremap_nocache ioremap -#define ioremap_wc ioremap -#define ioremap_wt ioremap - /* Pages to physical address... */ #define page_to_phys(page) virt_to_phys(page_to_virt(page)) diff --git
[PATCH 15/21] nios2: remove __iounmap
No need to indirect iounmap for nios2. Signed-off-by: Christoph Hellwig --- arch/nios2/include/asm/io.h | 7 +-- arch/nios2/mm/ioremap.c | 6 +++--- 2 files changed, 4 insertions(+), 9 deletions(-) diff --git a/arch/nios2/include/asm/io.h b/arch/nios2/include/asm/io.h index d108937c321e..746853ac7d8d 100644 --- a/arch/nios2/include/asm/io.h +++ b/arch/nios2/include/asm/io.h @@ -26,12 +26,7 @@ #define writel_relaxed(x, addr)writel(x, addr) void __iomem *ioremap(unsigned long physaddr, unsigned long size); -extern void __iounmap(void __iomem *addr); - -static inline void iounmap(void __iomem *addr) -{ - __iounmap(addr); -} +void iounmap(void __iomem *addr); /* Pages to physical address... */ #define page_to_phys(page) virt_to_phys(page_to_virt(page)) diff --git a/arch/nios2/mm/ioremap.c b/arch/nios2/mm/ioremap.c index 7a1a27f3daa3..b56af759dcdf 100644 --- a/arch/nios2/mm/ioremap.c +++ b/arch/nios2/mm/ioremap.c @@ -157,11 +157,11 @@ void __iomem *ioremap(unsigned long phys_addr, unsigned long size) EXPORT_SYMBOL(ioremap); /* - * __iounmap unmaps nearly everything, so be careful + * iounmap unmaps nearly everything, so be careful * it doesn't free currently pointer/page tables anymore but it * wasn't used anyway and might be added later. */ -void __iounmap(void __iomem *addr) +void iounmap(void __iomem *addr) { struct vm_struct *p; @@ -173,4 +173,4 @@ void __iounmap(void __iomem *addr) pr_err("iounmap: bad address %p\n", addr); kfree(p); } -EXPORT_SYMBOL(__iounmap); +EXPORT_SYMBOL(iounmap); -- 2.20.1
[PATCH 13/21] m68k: rename __iounmap and mark it static
m68k uses __iounmap as the name for an internal helper that is only used for some CPU types. Mark it static and give it a better name. Signed-off-by: Christoph Hellwig --- arch/m68k/include/asm/kmap.h | 1 - arch/m68k/mm/kmap.c | 9 ++--- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/arch/m68k/include/asm/kmap.h b/arch/m68k/include/asm/kmap.h index 421b6c9c769d..559cb91bede1 100644 --- a/arch/m68k/include/asm/kmap.h +++ b/arch/m68k/include/asm/kmap.h @@ -20,7 +20,6 @@ extern void __iomem *__ioremap(unsigned long physaddr, unsigned long size, int cacheflag); #define iounmap iounmap extern void iounmap(void __iomem *addr); -extern void __iounmap(void *addr, unsigned long size); #define ioremap ioremap static inline void __iomem *ioremap(unsigned long physaddr, unsigned long size) diff --git a/arch/m68k/mm/kmap.c b/arch/m68k/mm/kmap.c index 40a3b327da07..4c279cf0bcc8 100644 --- a/arch/m68k/mm/kmap.c +++ b/arch/m68k/mm/kmap.c @@ -52,6 +52,7 @@ static inline void free_io_area(void *addr) #define IO_SIZE(256*1024) +static void __free_io_area(void *addr, unsigned long size); static struct vm_struct *iolist; static struct vm_struct *get_io_area(unsigned long size) @@ -90,7 +91,7 @@ static inline void free_io_area(void *addr) if (tmp->addr == addr) { *p = tmp->next; /* remove gap added in get_io_area() */ - __iounmap(tmp->addr, tmp->size - IO_SIZE); + __free_io_area(tmp->addr, tmp->size - IO_SIZE); kfree(tmp); return; } @@ -249,12 +250,13 @@ void iounmap(void __iomem *addr) } EXPORT_SYMBOL(iounmap); +#ifndef CPU_M68040_OR_M68060_ONLY /* - * __iounmap unmaps nearly everything, so be careful + * __free_io_area unmaps nearly everything, so be careful * Currently it doesn't free pointer/page tables anymore but this * wasn't used anyway and might be added later. 
*/ -void __iounmap(void *addr, unsigned long size) +static void __free_io_area(void *addr, unsigned long size) { unsigned long virtaddr = (unsigned long)addr; pgd_t *pgd_dir; @@ -297,6 +299,7 @@ void __iounmap(void *addr, unsigned long size) flush_tlb_all(); } +#endif /* CPU_M68040_OR_M68060_ONLY */ /* * Set new cache mode for some kernel address space. -- 2.20.1
[PATCH 18/21] riscv: use the generic ioremap code
Use the generic ioremap code instead of providing a local version. Note that this relies on the asm-generic no-op definition of pgprot_noncached. Signed-off-by: Christoph Hellwig --- arch/riscv/Kconfig | 1 + arch/riscv/include/asm/io.h | 3 -- arch/riscv/include/asm/pgtable.h | 6 +++ arch/riscv/mm/Makefile | 1 - arch/riscv/mm/ioremap.c | 84 5 files changed, 7 insertions(+), 88 deletions(-) delete mode 100644 arch/riscv/mm/ioremap.c diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index 8eebbc8860bb..a02e91ed747a 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -30,6 +30,7 @@ config RISCV select GENERIC_STRNLEN_USER select GENERIC_SMP_IDLE_THREAD select GENERIC_ATOMIC64 if !64BIT + select GENERIC_IOREMAP select HAVE_ARCH_AUDITSYSCALL select HAVE_ASM_MODVERSIONS select HAVE_MEMBLOCK_NODE_MAP diff --git a/arch/riscv/include/asm/io.h b/arch/riscv/include/asm/io.h index c1de6875cc77..df4c8812ff64 100644 --- a/arch/riscv/include/asm/io.h +++ b/arch/riscv/include/asm/io.h @@ -14,9 +14,6 @@ #include #include -extern void __iomem *ioremap(phys_addr_t offset, unsigned long size); -extern void iounmap(volatile void __iomem *addr); - /* Generic IO read/write. These perform native-endian accesses. */ #define __raw_writeb __raw_writeb static inline void __raw_writeb(u8 val, volatile void __iomem *addr) diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h index 7255f2d8395b..65a216e91df2 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -61,6 +61,12 @@ #define PAGE_TABLE __pgprot(_PAGE_TABLE) +/* + * The RISC-V ISA doesn't yet specify how to query or modify PMAs, so we can't + * change the properties of memory regions. 
+ */ +#define _PAGE_IOREMAP _PAGE_KERNEL + extern pgd_t swapper_pg_dir[]; /* MAP_PRIVATE permissions: xwr (copy-on-write) */ diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile index 9d9a17335686..b3a356c80c1f 100644 --- a/arch/riscv/mm/Makefile +++ b/arch/riscv/mm/Makefile @@ -8,7 +8,6 @@ endif obj-y += init.o obj-y += fault.o obj-y += extable.o -obj-y += ioremap.o obj-y += cacheflush.o obj-y += context.o obj-y += sifive_l2_cache.o diff --git a/arch/riscv/mm/ioremap.c b/arch/riscv/mm/ioremap.c deleted file mode 100644 index ac621ddb45c0.. --- a/arch/riscv/mm/ioremap.c +++ /dev/null @@ -1,84 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-only -/* - * (C) Copyright 1995 1996 Linus Torvalds - * (C) Copyright 2012 Regents of the University of California - */ - -#include -#include -#include -#include - -#include - -/* - * Remap an arbitrary physical address space into the kernel virtual - * address space. Needed when the kernel wants to access high addresses - * directly. - * - * NOTE! We need to allow non-page-aligned mappings too: we will obviously - * have to convert them into an offset in a page-aligned mapping, but the - * caller shouldn't need to know that small detail. 
- */ -static void __iomem *__ioremap_caller(phys_addr_t addr, size_t size, - pgprot_t prot, void *caller) -{ - phys_addr_t last_addr; - unsigned long offset, vaddr; - struct vm_struct *area; - - /* Disallow wrap-around or zero size */ - last_addr = addr + size - 1; - if (!size || last_addr < addr) - return NULL; - - /* Page-align mappings */ - offset = addr & (~PAGE_MASK); - addr -= offset; - size = PAGE_ALIGN(size + offset); - - area = get_vm_area_caller(size, VM_IOREMAP, caller); - if (!area) - return NULL; - vaddr = (unsigned long)area->addr; - - if (ioremap_page_range(vaddr, vaddr + size, addr, prot)) { - free_vm_area(area); - return NULL; - } - - return (void __iomem *)(vaddr + offset); -} - -/* - * ioremap - map bus memory into CPU space - * @offset:bus address of the memory - * @size: size of the resource to map - * - * ioremap performs a platform specific sequence of operations to - * make bus memory CPU accessible via the readb/readw/readl/writeb/ - * writew/writel functions and the other mmio helpers. The returned - * address is not guaranteed to be usable directly as a virtual - * address. - * - * Must be freed with iounmap. - */ -void __iomem *ioremap(phys_addr_t offset, unsigned long size) -{ - return __ioremap_caller(offset, size, PAGE_KERNEL, - __builtin_return_address(0)); -} -EXPORT_SYMBOL(ioremap); - - -/** - * iounmap - Free a IO remapping - * @addr: virtual address from ioremap_* - * - * Caller must ensure there is only one unmapping for the same pointer. - */ -void iounmap(volatile void __iomem *addr) -{ - vunmap((void *)((unsigned long)addr & PAGE_MASK)); -} -EXPORT_SYMBOL(iounmap); -- 2.20.1
[PATCH 14/21] hexagon: remove __iounmap
No need to indirect iounmap for hexagon. Signed-off-by: Christoph Hellwig --- arch/hexagon/include/asm/io.h | 7 +-- arch/hexagon/kernel/hexagon_ksyms.c | 2 +- arch/hexagon/mm/ioremap.c | 2 +- 3 files changed, 3 insertions(+), 8 deletions(-) diff --git a/arch/hexagon/include/asm/io.h b/arch/hexagon/include/asm/io.h index 89537dc1cf97..539e3efcf39c 100644 --- a/arch/hexagon/include/asm/io.h +++ b/arch/hexagon/include/asm/io.h @@ -27,7 +27,7 @@ extern int remap_area_pages(unsigned long start, unsigned long phys_addr, unsigned long end, unsigned long flags); -extern void __iounmap(const volatile void __iomem *addr); +extern void iounmap(const volatile void __iomem *addr); /* Defined in lib/io.c, needed for smc91x driver. */ extern void __raw_readsw(const void __iomem *addr, void *data, int wordlen); @@ -175,11 +175,6 @@ void __iomem *ioremap(unsigned long phys_addr, unsigned long size); #define ioremap_nocache ioremap -static inline void iounmap(volatile void __iomem *addr) -{ - __iounmap(addr); -} - #define __raw_writel writel static inline void memcpy_fromio(void *dst, const volatile void __iomem *src, diff --git a/arch/hexagon/kernel/hexagon_ksyms.c b/arch/hexagon/kernel/hexagon_ksyms.c index b3dbb472572e..6fb1aaab1c29 100644 --- a/arch/hexagon/kernel/hexagon_ksyms.c +++ b/arch/hexagon/kernel/hexagon_ksyms.c @@ -14,7 +14,7 @@ EXPORT_SYMBOL(__clear_user_hexagon); EXPORT_SYMBOL(raw_copy_from_user); EXPORT_SYMBOL(raw_copy_to_user); -EXPORT_SYMBOL(__iounmap); +EXPORT_SYMBOL(iounmap); EXPORT_SYMBOL(__strnlen_user); EXPORT_SYMBOL(__vmgetie); EXPORT_SYMBOL(__vmsetie); diff --git a/arch/hexagon/mm/ioremap.c b/arch/hexagon/mm/ioremap.c index b103d83b5fbb..255c5b1ee1a7 100644 --- a/arch/hexagon/mm/ioremap.c +++ b/arch/hexagon/mm/ioremap.c @@ -38,7 +38,7 @@ void __iomem *ioremap(unsigned long phys_addr, unsigned long size) return (void __iomem *) (offset + addr); } -void __iounmap(const volatile void __iomem *addr) +void iounmap(const volatile void __iomem *addr) { 
vunmap((void *) ((unsigned long) addr & PAGE_MASK)); } -- 2.20.1
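The PAGE_MASK in the hexagon iounmap() above is needed because ioremap() returns the mapping base plus the sub-page offset of the physical address, while vunmap() only knows the page-aligned base. A user-space sketch of that round trip (helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE	4096UL
#define PAGE_MASK	(~(PAGE_SIZE - 1))

/* What ioremap() hands out: mapping base plus sub-page offset */
static uintptr_t cookie_make(uintptr_t base, uintptr_t phys)
{
	return base + (phys & ~PAGE_MASK);
}

/* What iounmap() must recover before calling vunmap() */
static uintptr_t cookie_base(uintptr_t cookie)
{
	return cookie & PAGE_MASK;
}
```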
[PATCH 11/21] asm-generic: don't provide ioremap for CONFIG_MMU
All MMU-enabled ports have a non-trivial ioremap and should thus provide the prototype for their implementation instead of providing a generic one unless a different symbol is defined. Note that this only affects sparc32 and nds32, as all others do provide their own version. Also update the kerneldoc comments in asm-generic/io.h to explain the situation around the default ioremap* implementations correctly. Signed-off-by: Christoph Hellwig --- arch/nds32/include/asm/io.h| 2 ++ arch/sparc/include/asm/io_32.h | 1 + include/asm-generic/io.h | 29 - 3 files changed, 11 insertions(+), 21 deletions(-) diff --git a/arch/nds32/include/asm/io.h b/arch/nds32/include/asm/io.h index 16f262322b8f..fb0e8a24c7af 100644 --- a/arch/nds32/include/asm/io.h +++ b/arch/nds32/include/asm/io.h @@ -6,6 +6,7 @@ #include +void __iomem *ioremap(phys_addr_t phys_addr, size_t size); extern void iounmap(volatile void __iomem *addr); #define __raw_writeb __raw_writeb static inline void __raw_writeb(u8 val, volatile void __iomem *addr) @@ -80,4 +81,5 @@ static inline u32 __raw_readl(const volatile void __iomem *addr) #define writew(v,c)({ __iowmb(); writew_relaxed((v),(c)); }) #define writel(v,c)({ __iowmb(); writel_relaxed((v),(c)); }) #include + #endif /* __ASM_NDS32_IO_H */ diff --git a/arch/sparc/include/asm/io_32.h b/arch/sparc/include/asm/io_32.h index df2dc1784673..9a52d9506f80 100644 --- a/arch/sparc/include/asm/io_32.h +++ b/arch/sparc/include/asm/io_32.h @@ -127,6 +127,7 @@ static inline void sbus_memcpy_toio(volatile void __iomem *dst, * Bus number may be embedded in the higher bits of the physical address. * This is why we have no bus number argument to ioremap().
*/ +void __iomem *ioremap(phys_addr_t offset, size_t size); void iounmap(volatile void __iomem *addr); /* Create a virtual mapping cookie for an IO port range */ void __iomem *ioport_map(unsigned long port, unsigned int nr); diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h index a98ed6325727..6a5edc23afe2 100644 --- a/include/asm-generic/io.h +++ b/include/asm-generic/io.h @@ -922,28 +922,16 @@ static inline void *phys_to_virt(unsigned long address) /** * DOC: ioremap() and ioremap_*() variants * - * If you have an IOMMU your architecture is expected to have both ioremap() - * and iounmap() implemented otherwise the asm-generic helpers will provide a - * direct mapping. + * Architectures with an MMU are expected to provide ioremap() and iounmap() + * themselves. For NOMMU architectures we provide a default no-op + * implementation that expects that the physical addresses used for MMIO are + * already marked as uncached, and can be used as kernel virtual addresses. * - * There are ioremap_*() call variants, if you have no IOMMU we naturally will - * default to direct mapping for all of them, you can override these defaults. - * If you have an IOMMU you are highly encouraged to provide your own - * ioremap variant implementation as there currently is no safe architecture - * agnostic default. To avoid possible improper behaviour default asm-generic - * ioremap_*() variants all return NULL when an IOMMU is available. If you've - * defined your own ioremap_*() variant you must then declare your own - * ioremap_*() variant as defined to itself to avoid the default NULL return. + * ioremap_wc() and ioremap_wt() can provide more relaxed caching attributes + * for specific drivers if the architecture chooses to implement them. If they + * are not implemented we fall back to plain ioremap. */ #ifndef CONFIG_MMU - -/* - * Change "struct page" to physical address. - * - * This implementation is for the no-MMU case only...
if you have an MMU - * you'll need to provide your own definitions. - */ - #ifndef ioremap #define ioremap ioremap static inline void __iomem *ioremap(phys_addr_t offset, size_t size) @@ -954,14 +942,13 @@ static inline void __iomem *ioremap(phys_addr_t offset, size_t size) #ifndef iounmap #define iounmap iounmap - static inline void iounmap(void __iomem *addr) { } #endif #endif /* CONFIG_MMU */ + #ifndef ioremap_nocache -void __iomem *ioremap(phys_addr_t phys_addr, size_t size); #define ioremap_nocache ioremap_nocache static inline void __iomem *ioremap_nocache(phys_addr_t offset, size_t size) { -- 2.20.1
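The NOMMU defaults that asm-generic/io.h keeps can be modeled in a few lines: with no MMU there is nothing to map, so "remapping" is just a cast of the (already uncached) physical address, and iounmap() has nothing to undo. A user-space sketch under that assumption (not the kernel's actual __iomem-annotated signatures):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the asm-generic NOMMU ioremap(): a plain cast */
static void *nommu_ioremap(uintptr_t offset, size_t size)
{
	(void)size;		/* no mapping is created, size is unused */
	return (void *)offset;
}

/* Model of the matching iounmap(): nothing to tear down */
static void nommu_iounmap(void *addr)
{
	(void)addr;
}
```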
[PATCH 16/21] sh: remove __iounmap
No need to indirect iounmap for sh. Signed-off-by: Christoph Hellwig --- arch/sh/include/asm/io.h | 9 ++--- arch/sh/mm/ioremap.c | 4 ++-- 2 files changed, 4 insertions(+), 9 deletions(-) diff --git a/arch/sh/include/asm/io.h b/arch/sh/include/asm/io.h index ac0561960c52..1495489225ac 100644 --- a/arch/sh/include/asm/io.h +++ b/arch/sh/include/asm/io.h @@ -267,7 +267,7 @@ unsigned long long poke_real_address_q(unsigned long long addr, #ifdef CONFIG_MMU void __iomem *__ioremap_caller(phys_addr_t offset, unsigned long size, pgprot_t prot, void *caller); -void __iounmap(void __iomem *addr); +void iounmap(void __iomem *addr); static inline void __iomem * __ioremap(phys_addr_t offset, unsigned long size, pgprot_t prot) @@ -328,7 +328,7 @@ __ioremap_mode(phys_addr_t offset, unsigned long size, pgprot_t prot) #else #define __ioremap(offset, size, prot) ((void __iomem *)(offset)) #define __ioremap_mode(offset, size, prot) ((void __iomem *)(offset)) -#define __iounmap(addr)do { } while (0) +#define iounmap(addr) do { } while (0) #endif /* CONFIG_MMU */ static inline void __iomem *ioremap(phys_addr_t offset, unsigned long size) @@ -370,11 +370,6 @@ static inline int iounmap_fixed(void __iomem *addr) { return -EINVAL; } #define ioremap_nocacheioremap #define ioremap_uc ioremap -static inline void iounmap(void __iomem *addr) -{ - __iounmap(addr); -} - /* * Convert a physical pointer to a virtual kernel pointer for /dev/mem * access diff --git a/arch/sh/mm/ioremap.c b/arch/sh/mm/ioremap.c index d09ddfe58fd8..f6d02246d665 100644 --- a/arch/sh/mm/ioremap.c +++ b/arch/sh/mm/ioremap.c @@ -103,7 +103,7 @@ static inline int iomapping_nontranslatable(unsigned long offset) return 0; } -void __iounmap(void __iomem *addr) +void iounmap(void __iomem *addr) { unsigned long vaddr = (unsigned long __force)addr; struct vm_struct *p; @@ -134,4 +134,4 @@ void __iounmap(void __iomem *addr) kfree(p); } -EXPORT_SYMBOL(__iounmap); +EXPORT_SYMBOL(iounmap); -- 2.20.1
[PATCH 21/21] csky: use generic ioremap
Use the generic ioremap_prot and iounmap helpers. Signed-off-by: Christoph Hellwig --- arch/csky/Kconfig | 1 + arch/csky/include/asm/io.h | 8 +++--- arch/csky/include/asm/pgtable.h | 4 +++ arch/csky/mm/ioremap.c | 45 - 4 files changed, 8 insertions(+), 50 deletions(-) diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig index 3973847b5f42..da09c884cc30 100644 --- a/arch/csky/Kconfig +++ b/arch/csky/Kconfig @@ -17,6 +17,7 @@ config CSKY select IRQ_DOMAIN select HANDLE_DOMAIN_IRQ select DW_APB_TIMER_OF + select GENERIC_IOREMAP select GENERIC_LIB_ASHLDI3 select GENERIC_LIB_ASHRDI3 select GENERIC_LIB_LSHRDI3 diff --git a/arch/csky/include/asm/io.h b/arch/csky/include/asm/io.h index f572605d5ad5..332f51bc68fb 100644 --- a/arch/csky/include/asm/io.h +++ b/arch/csky/include/asm/io.h @@ -36,11 +36,9 @@ /* * I/O memory mapping functions. */ -extern void __iomem *__ioremap(phys_addr_t addr, size_t size, pgprot_t prot); -extern void iounmap(void *addr); - -#define ioremap(addr, size)__ioremap((addr), (size), pgprot_noncached(PAGE_KERNEL)) -#define ioremap_wc(addr, size) __ioremap((addr), (size), pgprot_writecombine(PAGE_KERNEL)) +#define ioremap_wc(addr, size) \ + ioremap_prot((addr), (size), \ + (_PAGE_IOREMAP & ~_CACHE_MASK) | _CACHE_UNCACHED) #include diff --git a/arch/csky/include/asm/pgtable.h b/arch/csky/include/asm/pgtable.h index 7c21985c60dc..4b2a41e15f2e 100644 --- a/arch/csky/include/asm/pgtable.h +++ b/arch/csky/include/asm/pgtable.h @@ -86,6 +86,10 @@ #define PAGE_USERIO__pgprot(_PAGE_PRESENT | _PAGE_READ | _PAGE_WRITE | \ _CACHE_CACHED) +#define _PAGE_IOREMAP \ + (_PAGE_PRESENT | __READABLE | __WRITEABLE | _PAGE_GLOBAL | \ +_CACHE_UNCACHED | _PAGE_SO) + #define __P000 PAGE_NONE #define __P001 PAGE_READONLY #define __P010 PAGE_COPY diff --git a/arch/csky/mm/ioremap.c b/arch/csky/mm/ioremap.c index ae78256a56fd..70c8268d3b2b 100644 --- a/arch/csky/mm/ioremap.c +++ b/arch/csky/mm/ioremap.c @@ -3,53 +3,8 @@ #include #include -#include #include -#include - -static void 
__iomem *__ioremap_caller(phys_addr_t addr, size_t size, - pgprot_t prot, void *caller) -{ - phys_addr_t last_addr; - unsigned long offset, vaddr; - struct vm_struct *area; - - last_addr = addr + size - 1; - if (!size || last_addr < addr) - return NULL; - - offset = addr & (~PAGE_MASK); - addr &= PAGE_MASK; - size = PAGE_ALIGN(size + offset); - - area = get_vm_area_caller(size, VM_IOREMAP, caller); - if (!area) - return NULL; - - vaddr = (unsigned long)area->addr; - - if (ioremap_page_range(vaddr, vaddr + size, addr, prot)) { - free_vm_area(area); - return NULL; - } - - return (void __iomem *)(vaddr + offset); -} - -void __iomem *__ioremap(phys_addr_t phys_addr, size_t size, pgprot_t prot) -{ - return __ioremap_caller(phys_addr, size, prot, - __builtin_return_address(0)); -} -EXPORT_SYMBOL(__ioremap); - -void iounmap(void __iomem *addr) -{ - vunmap((void *)((unsigned long)addr & PAGE_MASK)); -} -EXPORT_SYMBOL(iounmap); - pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn, unsigned long size, pgprot_t vma_prot) { -- 2.20.1
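The new csky ioremap_wc() builds its protection value by clearing the cache-mode field of _PAGE_IOREMAP and OR-ing in the wanted mode. The bit positions below are illustrative only (not csky's real pte encoding); they just show the mask-and-replace pattern:

```c
#include <assert.h>

/* Illustrative bit layout, not csky's actual pte bits */
#define _PAGE_PRESENT	0x001u
#define _CACHE_SHIFT	4
#define _CACHE_MASK	(0x7u << _CACHE_SHIFT)
#define _CACHE_CACHED	(0x1u << _CACHE_SHIFT)
#define _CACHE_UNCACHED	(0x2u << _CACHE_SHIFT)
#define _PAGE_IOREMAP	(_PAGE_PRESENT | _CACHE_UNCACHED)

/* Keep every bit of the base protection except the cache-mode
 * field, then set the mode the caller asked for. */
static unsigned int prot_with_cache(unsigned int prot, unsigned int mode)
{
	return (prot & ~_CACHE_MASK) | mode;
}
```

This is why defining a single _PAGE_IOREMAP per arch is enough for the generic code: variants like ioremap_wc() are derived from it by adjusting one field.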
[PATCH 20/21] csky: remove ioremap_cache
No driver that can be used on csky uses ioremap_cache, and this interface has been deprecated in favor of memremap. Signed-off-by: Christoph Hellwig --- arch/csky/include/asm/io.h | 2 -- arch/csky/mm/ioremap.c | 7 --- 2 files changed, 9 deletions(-) diff --git a/arch/csky/include/asm/io.h b/arch/csky/include/asm/io.h index a4b9fb616faa..f572605d5ad5 100644 --- a/arch/csky/include/asm/io.h +++ b/arch/csky/include/asm/io.h @@ -36,13 +36,11 @@ /* * I/O memory mapping functions. */ -extern void __iomem *ioremap_cache(phys_addr_t addr, size_t size); extern void __iomem *__ioremap(phys_addr_t addr, size_t size, pgprot_t prot); extern void iounmap(void *addr); #define ioremap(addr, size)__ioremap((addr), (size), pgprot_noncached(PAGE_KERNEL)) #define ioremap_wc(addr, size) __ioremap((addr), (size), pgprot_writecombine(PAGE_KERNEL)) -#define ioremap_cache ioremap_cache #include diff --git a/arch/csky/mm/ioremap.c b/arch/csky/mm/ioremap.c index e13cd3497628..ae78256a56fd 100644 --- a/arch/csky/mm/ioremap.c +++ b/arch/csky/mm/ioremap.c @@ -44,13 +44,6 @@ void __iomem *__ioremap(phys_addr_t phys_addr, size_t size, pgprot_t prot) } EXPORT_SYMBOL(__ioremap); -void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size) -{ - return __ioremap_caller(phys_addr, size, PAGE_KERNEL, - __builtin_return_address(0)); -} -EXPORT_SYMBOL(ioremap_cache); - void iounmap(void __iomem *addr) { vunmap((void *)((unsigned long)addr & PAGE_MASK)); -- 2.20.1
[PATCH 19/21] nds32: use generic ioremap
Use the generic ioremap_prot and iounmap helpers. Note that the io.h include in pgtable.h had to be removed to not create an include loop. As far as I can tell there was no need for it to start with. Signed-off-by: Christoph Hellwig --- arch/nds32/Kconfig | 1 + arch/nds32/include/asm/io.h | 3 +- arch/nds32/include/asm/pgtable.h | 4 ++- arch/nds32/mm/Makefile | 3 +- arch/nds32/mm/ioremap.c | 62 5 files changed, 6 insertions(+), 67 deletions(-) delete mode 100644 arch/nds32/mm/ioremap.c diff --git a/arch/nds32/Kconfig b/arch/nds32/Kconfig index fbd68329737f..12c06a833b7c 100644 --- a/arch/nds32/Kconfig +++ b/arch/nds32/Kconfig @@ -20,6 +20,7 @@ config NDS32 select GENERIC_CLOCKEVENTS select GENERIC_IRQ_CHIP select GENERIC_IRQ_SHOW + select GENERIC_IOREMAP select GENERIC_LIB_ASHLDI3 select GENERIC_LIB_ASHRDI3 select GENERIC_LIB_CMPDI2 diff --git a/arch/nds32/include/asm/io.h b/arch/nds32/include/asm/io.h index fb0e8a24c7af..e57378d04006 100644 --- a/arch/nds32/include/asm/io.h +++ b/arch/nds32/include/asm/io.h @@ -6,8 +6,6 @@ #include -void __iomem *ioremap(phys_addr_t phys_addr, size_t size); -extern void iounmap(volatile void __iomem *addr); #define __raw_writeb __raw_writeb static inline void __raw_writeb(u8 val, volatile void __iomem *addr) { @@ -80,6 +78,7 @@ static inline u32 __raw_readl(const volatile void __iomem *addr) #define writeb(v,c)({ __iowmb(); writeb_relaxed((v),(c)); }) #define writew(v,c)({ __iowmb(); writew_relaxed((v),(c)); }) #define writel(v,c)({ __iowmb(); writel_relaxed((v),(c)); }) + #include #endif /* __ASM_NDS32_IO_H */ diff --git a/arch/nds32/include/asm/pgtable.h b/arch/nds32/include/asm/pgtable.h index 0588ec99725c..6fbf251cfc26 100644 --- a/arch/nds32/include/asm/pgtable.h +++ b/arch/nds32/include/asm/pgtable.h @@ -12,7 +12,6 @@ #include #ifndef __ASSEMBLY__ #include -#include #include #endif @@ -130,6 +129,9 @@ extern void __pgd_error(const char *file, int line, unsigned long val); #define _PAGE_CACHE_PAGE_C_MEM_WB #endif +#define 
_PAGE_IOREMAP \ + (_PAGE_V | _PAGE_M_KRW | _PAGE_D | _PAGE_G | _PAGE_C_DEV) + /* * + Level 1 descriptor (PMD) */ diff --git a/arch/nds32/mm/Makefile b/arch/nds32/mm/Makefile index bd360e4583b5..897ecaf5cf54 100644 --- a/arch/nds32/mm/Makefile +++ b/arch/nds32/mm/Makefile @@ -1,6 +1,5 @@ # SPDX-License-Identifier: GPL-2.0-only -obj-y := extable.o tlb.o \ - fault.o init.o ioremap.o mmap.o \ +obj-y := extable.o tlb.o fault.o init.o mmap.o \ mm-nds32.o cacheflush.o proc.o obj-$(CONFIG_ALIGNMENT_TRAP) += alignment.o diff --git a/arch/nds32/mm/ioremap.c b/arch/nds32/mm/ioremap.c deleted file mode 100644 index 690140bb23a2.. --- a/arch/nds32/mm/ioremap.c +++ /dev/null @@ -1,62 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -// Copyright (C) 2005-2017 Andes Technology Corporation - -#include -#include -#include -#include - -void __iomem *ioremap(phys_addr_t phys_addr, size_t size); - -static void __iomem *__ioremap_caller(phys_addr_t phys_addr, size_t size, - void *caller) -{ - struct vm_struct *area; - unsigned long addr, offset, last_addr; - pgprot_t prot; - - /* Don't allow wraparound or zero size */ - last_addr = phys_addr + size - 1; - if (!size || last_addr < phys_addr) - return NULL; - - /* -* Mappings have to be page-aligned -*/ - offset = phys_addr & ~PAGE_MASK; - phys_addr &= PAGE_MASK; - size = PAGE_ALIGN(last_addr + 1) - phys_addr; - - /* -* Ok, go for it.. 
-*/ - area = get_vm_area_caller(size, VM_IOREMAP, caller); - if (!area) - return NULL; - - area->phys_addr = phys_addr; - addr = (unsigned long)area->addr; - prot = __pgprot(_PAGE_V | _PAGE_M_KRW | _PAGE_D | - _PAGE_G | _PAGE_C_DEV); - if (ioremap_page_range(addr, addr + size, phys_addr, prot)) { - vunmap((void *)addr); - return NULL; - } - return (__force void __iomem *)(offset + (char *)addr); - -} - -void __iomem *ioremap(phys_addr_t phys_addr, size_t size) -{ - return __ioremap_caller(phys_addr, size, - __builtin_return_address(0)); -} - -EXPORT_SYMBOL(ioremap); - -void iounmap(volatile void __iomem * addr) -{ - vunmap((void *)(PAGE_MASK & (unsigned long)addr)); -} - -EXPORT_SYMBOL(iounmap); -- 2.20.1
[PATCH 08/21] x86: clean up ioremap
Use ioremap as the main implemented function, and define ioremap_nocache as a deprecated alias for it. Signed-off-by: Christoph Hellwig --- arch/x86/include/asm/io.h | 8 ++-- arch/x86/mm/ioremap.c | 8 arch/x86/mm/pageattr.c| 4 ++-- 3 files changed, 8 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h index 6bed97ff6db2..6b5cc41319a7 100644 --- a/arch/x86/include/asm/io.h +++ b/arch/x86/include/asm/io.h @@ -180,8 +180,6 @@ static inline unsigned int isa_virt_to_bus(volatile void *address) * The default ioremap() behavior is non-cached; if you need something * else, you probably want one of the following. */ -extern void __iomem *ioremap_nocache(resource_size_t offset, unsigned long size); -#define ioremap_nocache ioremap_nocache extern void __iomem *ioremap_uc(resource_size_t offset, unsigned long size); #define ioremap_uc ioremap_uc extern void __iomem *ioremap_cache(resource_size_t offset, unsigned long size); @@ -205,11 +203,9 @@ extern void __iomem *ioremap_encrypted(resource_size_t phys_addr, unsigned long * If the area you are trying to map is a PCI BAR you should have a * look at pci_iomap().
*/ -static inline void __iomem *ioremap(resource_size_t offset, unsigned long size) -{ - return ioremap_nocache(offset, size); -} +void __iomem *ioremap(resource_size_t offset, unsigned long size); #define ioremap ioremap +#define ioremap_nocache ioremap extern void iounmap(volatile void __iomem *addr); #define iounmap iounmap diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index a39dcdb5ae34..7985233dfb8d 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -280,11 +280,11 @@ __ioremap_caller(resource_size_t phys_addr, unsigned long size, } /** - * ioremap_nocache - map bus memory into CPU space + * ioremap - map bus memory into CPU space * @phys_addr:bus address of the memory * @size: size of the resource to map * - * ioremap_nocache performs a platform specific sequence of operations to + * ioremap performs a platform specific sequence of operations to * make bus memory CPU accessible via the readb/readw/readl/writeb/ * writew/writel functions and the other mmio helpers. The returned * address is not guaranteed to be usable directly as a virtual @@ -300,7 +300,7 @@ __ioremap_caller(resource_size_t phys_addr, unsigned long size, * * Must be freed with iounmap. 
*/ -void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size) +void __iomem *ioremap(resource_size_t phys_addr, unsigned long size) { /* * Ideally, this should be: @@ -315,7 +315,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size) return __ioremap_caller(phys_addr, size, pcm, __builtin_return_address(0), false); } -EXPORT_SYMBOL(ioremap_nocache); +EXPORT_SYMBOL(ioremap); /** * ioremap_uc - map bus memory into CPU space as strongly uncachable diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c index 0d09cc5aad61..1b99ad05b117 100644 --- a/arch/x86/mm/pageattr.c +++ b/arch/x86/mm/pageattr.c @@ -1784,7 +1784,7 @@ static inline int cpa_clear_pages_array(struct page **pages, int numpages, int _set_memory_uc(unsigned long addr, int numpages) { /* -* for now UC MINUS. see comments in ioremap_nocache() +* for now UC MINUS. see comments in ioremap() * If you really need strong UC use ioremap_uc(), but note * that you cannot override IO areas with set_memory_*() as * these helpers cannot work with IO memory. @@ -1799,7 +1799,7 @@ int set_memory_uc(unsigned long addr, int numpages) int ret; /* -* for now UC MINUS. see comments in ioremap_nocache() +* for now UC MINUS. see comments in ioremap() */ ret = reserve_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE, _PAGE_CACHE_MODE_UC_MINUS, NULL); -- 2.20.1
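The alias technique used in this x86 patch (and the hexagon one below) is worth calling out: the implementation carries the preferred name, and the deprecated spelling becomes a bare macro, so both names resolve to the same symbol with no wrapper call. A trivial user-space sketch with stand-in names:

```c
#include <assert.h>

/* Stand-in for the renamed implementation (ioremap in the patch) */
static int remap(int v)
{
	return v + 1;
}

/* Deprecated spelling: a plain macro alias, not a wrapper function,
 * mirroring "#define ioremap_nocache ioremap" in the patch. */
#define remap_nocache remap
```

Both spellings compile to a call to the same function, so callers of the old name keep working until they are converted.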
[PATCH 04/21] hexagon: clean up ioremap
Use ioremap as the main implemented function, and define ioremap_nocache as a deprecated alias for it. Signed-off-by: Christoph Hellwig --- arch/hexagon/include/asm/io.h | 11 ++- arch/hexagon/kernel/hexagon_ksyms.c | 2 +- arch/hexagon/mm/ioremap.c | 2 +- 3 files changed, 4 insertions(+), 11 deletions(-) diff --git a/arch/hexagon/include/asm/io.h b/arch/hexagon/include/asm/io.h index ba1a444d55b3..89537dc1cf97 100644 --- a/arch/hexagon/include/asm/io.h +++ b/arch/hexagon/include/asm/io.h @@ -171,16 +171,9 @@ static inline void writel(u32 data, volatile void __iomem *addr) #define writew_relaxed __raw_writew #define writel_relaxed __raw_writel -/* - * Need an mtype somewhere in here, for cache type deals? - * This is probably too long for an inline. - */ -void __iomem *ioremap_nocache(unsigned long phys_addr, unsigned long size); +void __iomem *ioremap(unsigned long phys_addr, unsigned long size); +#define ioremap_nocache ioremap -static inline void __iomem *ioremap(unsigned long phys_addr, unsigned long size) -{ - return ioremap_nocache(phys_addr, size); -} static inline void iounmap(volatile void __iomem *addr) { diff --git a/arch/hexagon/kernel/hexagon_ksyms.c b/arch/hexagon/kernel/hexagon_ksyms.c index cf8974beb500..b3dbb472572e 100644 --- a/arch/hexagon/kernel/hexagon_ksyms.c +++ b/arch/hexagon/kernel/hexagon_ksyms.c @@ -20,7 +20,7 @@ EXPORT_SYMBOL(__vmgetie); EXPORT_SYMBOL(__vmsetie); EXPORT_SYMBOL(__vmyield); EXPORT_SYMBOL(empty_zero_page); -EXPORT_SYMBOL(ioremap_nocache); +EXPORT_SYMBOL(ioremap); EXPORT_SYMBOL(memcpy); EXPORT_SYMBOL(memset); diff --git a/arch/hexagon/mm/ioremap.c b/arch/hexagon/mm/ioremap.c index 77d8e1e69e9b..b103d83b5fbb 100644 --- a/arch/hexagon/mm/ioremap.c +++ b/arch/hexagon/mm/ioremap.c @@ -9,7 +9,7 @@ #include #include -void __iomem *ioremap_nocache(unsigned long phys_addr, unsigned long size) +void __iomem *ioremap(unsigned long phys_addr, unsigned long size) { unsigned long last_addr, addr; unsigned long offset = phys_addr &
~PAGE_MASK; -- 2.20.1