Re: [PATCH v2 5/9] mm: Initialize struct vm_unmapped_area_info

2024-03-04 Thread Christophe Leroy


On 02/03/2024 at 02:51, Kees Cook wrote:
> On Sat, Mar 02, 2024 at 12:47:08AM +0000, Edgecombe, Rick P wrote:
>> On Wed, 2024-02-28 at 09:21 -0800, Kees Cook wrote:
>>> I totally understand. If the "uninitialized" warnings were actually
>>> reliable, I would agree. I look at it this way:
>>>
>>> - initializations can be missed either in static initializers or via
>>>    run time initializers. (So the risk of mistake here is matched --
>>>    though I'd argue it's easier to *find* static initializers when
>>> adding
>>>    new struct members.)
>>> - uninitialized warnings are inconsistent (this becomes an unknown
>>> risk)
>>> - when a run time initializer is missed, the contents are whatever
>>> was
>>>    on the stack (high risk)
>>> - when a static initializer is missed, the content is 0 (low risk)
>>>
>>> I think unambiguous state (always 0) is significantly more important
>>> for
>>> the safety of the system as a whole. Yes, individual cases maybe bad
>>> ("what uid should this be? root?!") but from a general memory safety
>>> perspective the value doesn't become potentially influenced by order
>>> of
>>> operations, leftover stack memory, etc.
>>>
>>> I'd agree, lifting everything into a static initializer does seem
>>> cleanest of all the choices.
>>
>> Hi Kees,
>>
>> Well, I just gave this a try. It is giving me flashbacks of when I last
>> had to do a tree wide change that I couldn't fully test and the
>> breakage was caught by Linus.
> 
> Yeah, testing isn't fun for these kinds of things. This is traditionally
> why the "obviously correct" changes tend to have an easier time landing
> (i.e. adding "= {}" to all of them).
> 
>> Could you let me know if you think this is additionally worthwhile
>> cleanup outside of the guard gap improvements of this series? Because I
>> was thinking a more cowardly approach could be a new vm_unmapped_area()
>> variant that takes the new start gap member as a separate argument
>> outside of struct vm_unmapped_area_info. It would be kind of strange to
>> keep them separate, but it would be less likely to bump something.
> 
> I think you want a new member -- AIUI, that's what that struct is for.
> 
> Looking at this resulting set of patches, I do kinda think just adding
> the "= {}" in a single patch is more sensible. Having to split things
> that are known at the top of the function from the stuff known at the
> existing initialization time is rather awkward.
> 
> Personally, I think a single patch that sets "= {}" for all of them and
> drops all the "= 0" or "= NULL" assignments would be the cleanest way
> to go.

I agree with Kees: set = {} and drop all the "something = 0;" stuff.
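As a minimal sketch of the trade-off (using the thread's placeholder name
"new_field" for the member the series adds, and "len" for whatever length
is at hand):

	/* Run-time init: if a call site is missed when a member is
	 * added, the new member carries whatever was on the stack.
	 */
	struct vm_unmapped_area_info info;
	info.flags = 0;
	info.length = len;
	/* info.new_field never assigned: leftover stack memory */

	/* Static init: members not explicitly assigned are zeroed, so
	 * a missed call site sees an unambiguous 0 instead of garbage.
	 */
	struct vm_unmapped_area_info info2 = {};
	info2.flags = 0;
	info2.length = len;
	/* info2.new_field is 0 even though nothing assigns it */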

Christophe


Re: [PATCH v2 5/9] mm: Initialize struct vm_unmapped_area_info

2024-02-28 Thread Christophe Leroy


On 28/02/2024 at 18:01, Edgecombe, Rick P wrote:
> On Wed, 2024-02-28 at 13:22 +0000, Christophe Leroy wrote:
>>> Any preference? Or maybe am I missing your point and talking
>>> nonsense?
>>>
>>
>> So my preference would go to the addition of:
>>
>>  info.new_field = 0;
>>
>> But that's very minor and if you think it is easier to manage and
>> maintain by performing {} initialisation at declaration, let's go for
>> that.
> 
> Appreciate the clarification and help getting this right. I'm thinking
> Kees' and now Kirill's point about this patch resulting in unnecessary
> manual zero initialization of the structs is probably something that
> needs to be addressed.
> 
> If I created a bunch of patches to change each call site, I think the
> best is probably to do the designated field zero initialization
> way.
> 
> But I can do something for powerpc special if you want. I'll first try
> with powerpc matching the others, and if it seems objectionable, please
> let me know.
> 

My comments were generic; they were not powerpc oriented. Please keep 
powerpc as similar as possible to the others.

Christophe


Re: [PATCH v2 5/9] mm: Initialize struct vm_unmapped_area_info

2024-02-28 Thread Christophe Leroy


On 27/02/2024 at 21:25, Edgecombe, Rick P wrote:
> On Tue, 2024-02-27 at 18:16 +0000, Christophe Leroy wrote:
>>>> Why do a full init of the struct when all fields are rewritten
>>>> a few lines later?
>>>
>>> It's a nice change for robustness and makes future changes easier.
>>> It's
>>> not actually wasteful since the compiler will throw away all
>>> redundant
>>> stores.
>>
>> Well, I tend to dislike default init at declaration because it often
>> hides a missed real init. When a field is not initialized, shouldn't
>> GCC emit a warning, at least when built with W=2, which sets
>> -Wmissing-field-initializers?
> 
> Sorry, I'm not following where you are going with this. There aren't
> any struct vm_unmapped_area_info users that use initializers today, so
> that warning won't apply in this case. Meanwhile, designated-style
> struct initialization (which would zero new members) is very common,
> and it does not get anything checked by that warning either. Anything
> with this many members is probably going to use the designated style.
> 
> If we are optimizing to avoid bugs, the way this struct is used today
> is not great. It is essentially being used as an argument passer.
> Normally when a function signature changes, but a caller is missed, of
> course the compiler will notice loudly. But not here. So I think
> probably zero initializing it is safer than being setup to pass
> garbage.

No worry, if everybody thinks that init at declaration is worth it in 
that case it is OK for me and I'm not going to ask for something special 
on powerpc; my comment was more general, although I used powerpc as an 
example.

My worry with initialisation at declaration is that it often hides 
missing assignments. Let's take the following simple example:

char *colour(int num)
{
	char *name;

	if (num == 0) {
		name = "black";
	} else if (num == 1) {
		name = "white";
	} else if (num == 2) {
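		/* assignment to 'name' intentionally missing */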
	} else {
		name = "no colour";
	}

	return name;
}


Here, GCC warns about a missing initialisation of variable 'name'.

But if I declare it as

char *name = "no colour";

Then GCC won't warn anymore that we are missing a value when num is 2.

Over the years I have spent a huge amount of time investigating issues 
and bugs caused by missing assignments that went undetected because of 
default initialisation at declaration.

> 
> I'm trying to figure out what to do here. If I changed it so that just
> powerpc set the new field manually, then the convention across the
> kernel would be for everything to be default zero, and future other new
> parameters could have a greater chance of turning into garbage on
> powerpc. Since it could be easy to miss that powerpc was special. Would
> you prefer it?
> 
> Or maybe I could try a new vm_unmapped_area() that takes the extra
> argument separately? The old callers could call the old function and
> not need any arch updates. It all seems strange though, because
> automatic zero initializing struct members is so common in the kernel.
> But it also wouldn't add the cleanup Kees was pointing out. Hmm.
> 
> Any preference? Or maybe am I missing your point and talking nonsense?
> 

So my preference would go to the addition of:

info.new_field = 0;

But that's very minor and if you think it is easier to manage and 
maintain by performing {} initialisation at declaration, let's go for that.

Christophe


Re: [PATCH v2 5/9] mm: Initialize struct vm_unmapped_area_info

2024-02-27 Thread Christophe Leroy


On 27/02/2024 at 19:07, Kees Cook wrote:
> On Tue, Feb 27, 2024 at 07:02:59AM +0000, Christophe Leroy wrote:
>>
>>
>> On 26/02/2024 at 20:09, Rick Edgecombe wrote:
>>> Future changes will need to add a field to struct vm_unmapped_area_info.
>>> This would cause trouble for any archs that don't initialize the
>>> struct. Currently every user sets each field, so if new fields are
>>> added, the core code parsing the struct will see garbage in the new
>>> field.
>>>
>>> It could be possible to initialize the new field for each arch to 0, but
>>> instead simply initialize the field with C99 struct initializer syntax.
>>
>> Why do a full init of the struct when all fields are rewritten a few
>> lines later?
> 
> It's a nice change for robustness and makes future changes easier. It's
> not actually wasteful since the compiler will throw away all redundant
> stores.

Well, I tend to dislike default init at declaration because it often 
hides a missed real init. When a field is not initialized, shouldn't GCC 
emit a warning, at least when built with W=2, which sets 
-Wmissing-field-initializers?
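For what it's worth, GCC's documented behaviour suggests that warning
would not catch much here anyway: it stays silent for the empty
initializer and for designated initializers, and only the partial
positional form warns. Worth double-checking against your GCC version
("len" below is just a stand-in):

	struct vm_unmapped_area_info a = {};               /* no warning */
	struct vm_unmapped_area_info b = { .length = len };/* designated: no warning */
	struct vm_unmapped_area_info c = { 0, len };       /* positional, partial: warns */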

> 
>> If I take the example of powerpc function slice_find_area_bottomup():
>>
>>  struct vm_unmapped_area_info info;
>>
>>  info.flags = 0;
>>  info.length = len;
>>  info.align_mask = PAGE_MASK & ((1ul << pshift) - 1);
>>  info.align_offset = 0;
> 
> But one cleanup that is possible from explicitly zero-initializing the
> whole structure would be dropping all the individual "= 0" assignments.
> :)
> 

Sure, if we decide to go that direction, all those 0 assignments become redundant and can go away.
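For the slice_find_area_bottomup() example quoted above, that cleanup
would look something like this (illustrative only, not the posted patch):

	struct vm_unmapped_area_info info = {
		.length = len,
		.align_mask = PAGE_MASK & ((1ul << pshift) - 1),
	};
	/* .flags and .align_offset are implicitly zero, so their
	 * "= 0" assignments simply go away */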


Re: [PATCH 1/4] arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions

2024-02-27 Thread Christophe Leroy


On 27/02/2024 at 16:40, Arnd Bergmann wrote:
> On Mon, Feb 26, 2024, at 17:55, Samuel Holland wrote:
>> On 2024-02-26 10:14 AM, Arnd Bergmann wrote:
>>>   
>>> +config HAVE_PAGE_SIZE_4KB
>>> +   bool
>>> +
>>> +config HAVE_PAGE_SIZE_8KB
>>> +   bool
>>> +
>>> +config HAVE_PAGE_SIZE_16KB
>>> +   bool
>>> +
>>> +config HAVE_PAGE_SIZE_32KB
>>> +   bool
>>> +
>>> +config HAVE_PAGE_SIZE_64KB
>>> +   bool
>>> +
>>> +config HAVE_PAGE_SIZE_256KB
>>> +   bool
>>> +
>>> +choice
>>> +   prompt "MMU page size"
>>
>> Should this have some generic help text (at least a warning about
>> compatibility)?
> 
> Good point. I've added some of this now, based on the mips
> text with some generalizations for other architectures:
> 
> config PAGE_SIZE_4KB
>  bool "4KiB pages"
>  depends on HAVE_PAGE_SIZE_4KB
>  help
>This option selects the standard 4KiB Linux page size, the only
>available option on many architectures. Using 4KiB page size will
>minimize memory consumption and is therefore recommended for low
>memory systems.
>Some software that is written for x86 systems makes incorrect
>assumptions about the page size and only runs on 4KiB pages.
> 
> config PAGE_SIZE_8KB
>  bool "8KiB pages"
>  depends on HAVE_PAGE_SIZE_8KB
>  help
>This option is the only supported page size on a few older
>processors, and can be slightly faster than 4KiB pages.
> 
> config PAGE_SIZE_16KB
>  bool "16KiB pages"
>  depends on HAVE_PAGE_SIZE_16KB
>  help
>This option is usually a good compromise between memory
>consumption and performance for typical desktop and server
>workloads, often saving a level of page table lookups compared
>to 4KB pages as well as reducing TLB pressure and overhead of
>per-page operations in the kernel at the expense of a larger
>page cache.
> 
> config PAGE_SIZE_32KB
>  bool "32KiB pages"
>  depends on HAVE_PAGE_SIZE_32KB
>  help
>Using 32KiB page size will result in slightly higher performance
>kernel at the price of higher memory consumption compared to
>16KiB pages.  This option is available only on cnMIPS cores.
>Note that you will need a suitable Linux distribution to
>support this.
> 
> config PAGE_SIZE_64KB
>  bool "64KiB pages"
>  depends on HAVE_PAGE_SIZE_64KB
>  help
>Using 64KiB page size will result in slightly higher performance
>kernel at the price of much higher memory consumption compared to
>4KiB or 16KiB pages.
>This is not suitable for general-purpose workloads but the
>better performance may be worth the cost for certain types of
>supercomputing or database applications that work mostly with
>large in-memory data rather than small files.
> 
> config PAGE_SIZE_256KB
>  bool "256KiB pages"
>  depends on HAVE_PAGE_SIZE_256KB
>  help
>256KiB pages have little practical value due to their extreme
>memory usage.


For 256K pages, powerpc has the following help. I think you should have 
it too:

  The kernel will only be able to run applications that have been
  compiled with '-zmax-page-size' set to 256K (the default is 64K) using
  binutils later than 2.17.50.0.3, or by patching the ELF_MAXPAGESIZE
  definition from 0x10000 to 0x40000 in older versions.
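Merged into the generic entry, that could look something like this (a
sketch combining the two help texts, to be adjusted for non-powerpc
architectures):

config PAGE_SIZE_256KB
	bool "256KiB pages"
	depends on HAVE_PAGE_SIZE_256KB
	help
	  256KiB pages have little practical value due to their extreme
	  memory usage.  The kernel will only be able to run applications
	  that have been compiled with '-zmax-page-size' set to 256K (the
	  default is 64K) using binutils later than 2.17.50.0.3, or by
	  patching the ELF_MAXPAGESIZE definition from 0x10000 to 0x40000
	  in older versions.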


Re: [PATCH v2 5/9] mm: Initialize struct vm_unmapped_area_info

2024-02-26 Thread Christophe Leroy


On 26/02/2024 at 20:09, Rick Edgecombe wrote:
> Future changes will need to add a field to struct vm_unmapped_area_info.
> This would cause trouble for any archs that don't initialize the
> struct. Currently every user sets each field, so if new fields are
> added, the core code parsing the struct will see garbage in the new
> field.
> 
> It could be possible to initialize the new field for each arch to 0, but
> instead simply initialize the field with C99 struct initializer syntax.

Why do a full init of the struct when all fields are rewritten a few 
lines later?

If I take the example of powerpc function slice_find_area_bottomup():

struct vm_unmapped_area_info info;

info.flags = 0;
info.length = len;
info.align_mask = PAGE_MASK & ((1ul << pshift) - 1);
info.align_offset = 0;

For me it looks better to just add:

info.new_field = 0; /* or whatever value it needs to have */

Christophe


> 
> Cc: linux...@kvack.org
> Cc: linux-alpha@vger.kernel.org
> Cc: linux-snps-...@lists.infradead.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-c...@vger.kernel.org
> Cc: loonga...@lists.linux.dev
> Cc: linux-m...@vger.kernel.org
> Cc: linux-par...@vger.kernel.org
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: linux-s...@vger.kernel.org
> Cc: linux...@vger.kernel.org
> Cc: sparcli...@vger.kernel.org
> Cc: x...@kernel.org
> Suggested-by: Kirill A. Shutemov 
> Signed-off-by: Rick Edgecombe 
> Link: 
> https://lore.kernel.org/lkml/3ynogxcgokc6i6xojbxzzwqectg472laes24u7jmtktlxcch5e@dfytra3ia3zc/#t
> ---
> Hi archs,
> 
> For some context, this is part of a larger series to improve shadow stack
> guard gaps. It involves plumbing a new field via
> struct vm_unmapped_area_info. The first user is x86, but arm and riscv may
> likely use it as well. The change is compile tested only for non-x86 but
> seems like a relatively safe one.
> 
> Thanks,
> 
> Rick
> 
> v2:
>   - New patch
> ---
>   arch/alpha/kernel/osf_sys.c  | 2 +-
>   arch/arc/mm/mmap.c   | 2 +-
>   arch/arm/mm/mmap.c   | 4 ++--
>   arch/csky/abiv1/mmap.c   | 2 +-
>   arch/loongarch/mm/mmap.c | 2 +-
>   arch/mips/mm/mmap.c  | 2 +-
>   arch/parisc/kernel/sys_parisc.c  | 2 +-
>   arch/powerpc/mm/book3s64/slice.c | 4 ++--
>   arch/s390/mm/hugetlbpage.c   | 4 ++--
>   arch/s390/mm/mmap.c  | 4 ++--
>   arch/sh/mm/mmap.c| 4 ++--
>   arch/sparc/kernel/sys_sparc_32.c | 2 +-
>   arch/sparc/kernel/sys_sparc_64.c | 4 ++--
>   arch/sparc/mm/hugetlbpage.c  | 4 ++--
>   arch/x86/kernel/sys_x86_64.c | 4 ++--
>   arch/x86/mm/hugetlbpage.c| 4 ++--
>   fs/hugetlbfs/inode.c | 4 ++--
>   mm/mmap.c| 4 ++--
>   18 files changed, 29 insertions(+), 29 deletions(-)
> 
> diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
> index 5db88b627439..dd6801bb9240 100644
> --- a/arch/alpha/kernel/osf_sys.c
> +++ b/arch/alpha/kernel/osf_sys.c
> @@ -1218,7 +1218,7 @@ static unsigned long
>   arch_get_unmapped_area_1(unsigned long addr, unsigned long len,
>unsigned long limit)
>   {
> - struct vm_unmapped_area_info info;
> + struct vm_unmapped_area_info info = {};
>   
>   info.flags = 0;
>   info.length = len;
> diff --git a/arch/arc/mm/mmap.c b/arch/arc/mm/mmap.c
> index 3c1c7ae73292..6549b3375f54 100644
> --- a/arch/arc/mm/mmap.c
> +++ b/arch/arc/mm/mmap.c
> @@ -27,7 +27,7 @@ arch_get_unmapped_area(struct file *filp, unsigned long 
> addr,
>   {
>   struct mm_struct *mm = current->mm;
>   struct vm_area_struct *vma;
> - struct vm_unmapped_area_info info;
> + struct vm_unmapped_area_info info = {};
>   
>   /*
>* We enforce the MAP_FIXED case.
> diff --git a/arch/arm/mm/mmap.c b/arch/arm/mm/mmap.c
> index a0f8a0ca0788..525795578c29 100644
> --- a/arch/arm/mm/mmap.c
> +++ b/arch/arm/mm/mmap.c
> @@ -34,7 +34,7 @@ arch_get_unmapped_area(struct file *filp, unsigned long 
> addr,
>   struct vm_area_struct *vma;
>   int do_align = 0;
>   int aliasing = cache_is_vipt_aliasing();
> - struct vm_unmapped_area_info info;
> + struct vm_unmapped_area_info info = {};
>   
>   /*
>* We only need to do colour alignment if either the I or D
> @@ -87,7 +87,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const 
> unsigned long addr0,
>   unsigned long addr = addr0;
>   int do_align = 0;
>   int aliasing = cache_is_vipt_aliasing();
> - struct vm_unmapped_area_info info;
> + struct vm_unmapped_area_info info = {};
>   
>   /*
>* We only need to do colour alignment if either the I or D
> diff --git a/arch/csky/abiv1/mmap.c b/arch/csky/abiv1/mmap.c
> index 6792aca4..726659d41fa9 100644
> --- a/arch/csky/abiv1/mmap.c
> +++ b/arch/csky/abiv1/mmap.c
> @@ -28,7 +28,7 @@ arch_get_unmapped_area(struct file *filp, unsigned long 
> addr,
>   struct mm_struct *mm = 

Re: [PATCH 2/4] arch: simplify architecture specific page size configuration

2024-02-26 Thread Christophe Leroy


On 26/02/2024 at 17:14, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> arc, arm64, parisc and powerpc all have their own Kconfig symbols
> in place of the common CONFIG_PAGE_SIZE_4KB symbols. Change these
> so the common symbols are the ones that are actually used, while
> leaving the architecture specific ones as the user visible
> place for configuring it, to avoid breaking user configs.
> 
> Signed-off-by: Arnd Bergmann 

Reviewed-by: Christophe Leroy  (powerpc32)

> ---
>   arch/arc/Kconfig  |  3 +++
>   arch/arc/include/uapi/asm/page.h  |  6 ++
>   arch/arm64/Kconfig| 29 +
>   arch/arm64/include/asm/page-def.h |  2 +-
>   arch/parisc/Kconfig   |  3 +++
>   arch/parisc/include/asm/page.h| 10 +-
>   arch/powerpc/Kconfig  | 31 ++-
>   arch/powerpc/include/asm/page.h   |  2 +-
>   scripts/gdb/linux/constants.py.in |  2 +-
>   scripts/gdb/linux/mm.py   |  2 +-
>   10 files changed, 32 insertions(+), 58 deletions(-)
> 
> diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
> index 1b0483c51cc1..4092bec198be 100644
> --- a/arch/arc/Kconfig
> +++ b/arch/arc/Kconfig
> @@ -284,14 +284,17 @@ choice
>   
>   config ARC_PAGE_SIZE_8K
>   bool "8KB"
> + select HAVE_PAGE_SIZE_8KB
>   help
> Choose between 8k vs 16k
>   
>   config ARC_PAGE_SIZE_16K
> + select HAVE_PAGE_SIZE_16KB
>   bool "16KB"
>   
>   config ARC_PAGE_SIZE_4K
>   bool "4KB"
> + select HAVE_PAGE_SIZE_4KB
>   depends on ARC_MMU_V3 || ARC_MMU_V4
>   
>   endchoice
> diff --git a/arch/arc/include/uapi/asm/page.h 
> b/arch/arc/include/uapi/asm/page.h
> index 2a4ad619abfb..7fd9e741b527 100644
> --- a/arch/arc/include/uapi/asm/page.h
> +++ b/arch/arc/include/uapi/asm/page.h
> @@ -13,10 +13,8 @@
>   #include 
>   
>   /* PAGE_SHIFT determines the page size */
> -#if defined(CONFIG_ARC_PAGE_SIZE_16K)
> -#define PAGE_SHIFT 14
> -#elif defined(CONFIG_ARC_PAGE_SIZE_4K)
> -#define PAGE_SHIFT 12
> +#ifdef __KERNEL__
> +#define PAGE_SHIFT CONFIG_PAGE_SHIFT
>   #else
>   /*
>* Default 8k
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index aa7c1d435139..29290b8cb36d 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -277,27 +277,21 @@ config 64BIT
>   config MMU
>   def_bool y
>   
> -config ARM64_PAGE_SHIFT
> - int
> - default 16 if ARM64_64K_PAGES
> - default 14 if ARM64_16K_PAGES
> - default 12
> -
>   config ARM64_CONT_PTE_SHIFT
>   int
> - default 5 if ARM64_64K_PAGES
> - default 7 if ARM64_16K_PAGES
> + default 5 if PAGE_SIZE_64KB
> + default 7 if PAGE_SIZE_16KB
>   default 4
>   
>   config ARM64_CONT_PMD_SHIFT
>   int
> - default 5 if ARM64_64K_PAGES
> - default 5 if ARM64_16K_PAGES
> + default 5 if PAGE_SIZE_64KB
> + default 5 if PAGE_SIZE_16KB
>   default 4
>   
>   config ARCH_MMAP_RND_BITS_MIN
> - default 14 if ARM64_64K_PAGES
> - default 16 if ARM64_16K_PAGES
> + default 14 if PAGE_SIZE_64KB
> + default 16 if PAGE_SIZE_16KB
>   default 18
>   
>   # max bits determined by the following formula:
> @@ -1259,11 +1253,13 @@ choice
>   
>   config ARM64_4K_PAGES
>   bool "4KB"
> + select HAVE_PAGE_SIZE_4KB
>   help
> This feature enables 4KB pages support.
>   
>   config ARM64_16K_PAGES
>   bool "16KB"
> + select HAVE_PAGE_SIZE_16KB
>   help
> The system will use 16KB pages support. AArch32 emulation
> requires applications compiled with 16K (or a multiple of 16K)
> @@ -1271,6 +1267,7 @@ config ARM64_16K_PAGES
>   
>   config ARM64_64K_PAGES
>   bool "64KB"
> + select HAVE_PAGE_SIZE_64KB
>   help
> This feature enables 64KB pages support (4KB by default)
> allowing only two levels of page tables and faster TLB
> @@ -1291,19 +1288,19 @@ choice
>   
>   config ARM64_VA_BITS_36
>   bool "36-bit" if EXPERT
> - depends on ARM64_16K_PAGES
> + depends on PAGE_SIZE_16KB
>   
>   config ARM64_VA_BITS_39
>   bool "39-bit"
> - depends on ARM64_4K_PAGES
> + depends on PAGE_SIZE_4KB
>   
>   config ARM64_VA_BITS_42
>   bool "42-bit"
> - depends on ARM64_64K_PAGES
> + depends on PAGE_SIZE_64KB
>   
>   config ARM64_VA_BITS_47
>   bool "47-bit"
> - depends on ARM64_16K_PAGES
> + depends on PAGE_SIZE_16KB
>   
>   config ARM64_VA_BITS_

Re: [PATCH 1/4] arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions

2024-02-26 Thread Christophe Leroy


On 26/02/2024 at 17:14, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> These four architectures define the same Kconfig symbols for configuring
> the page size. Move the logic into a common place where it can be shared
> with all other architectures.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>   arch/Kconfig  | 58 +--
>   arch/hexagon/Kconfig  | 25 +++--
>   arch/hexagon/include/asm/page.h   |  6 +---
>   arch/loongarch/Kconfig| 21 ---
>   arch/loongarch/include/asm/page.h | 10 +-
>   arch/mips/Kconfig | 58 +++
>   arch/mips/include/asm/page.h  | 16 +
>   arch/sh/include/asm/page.h| 13 +--
>   arch/sh/mm/Kconfig| 42 +++---
>   9 files changed, 88 insertions(+), 161 deletions(-)
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index a5af0edd3eb8..237cea01ed9b 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1078,17 +1078,71 @@ config HAVE_ARCH_COMPAT_MMAP_BASES
> and vice-versa 32-bit applications to call 64-bit mmap().
> Required for applications doing different bitness syscalls.
>   
> +config HAVE_PAGE_SIZE_4KB
> + bool
> +
> +config HAVE_PAGE_SIZE_8KB
> + bool
> +
> +config HAVE_PAGE_SIZE_16KB
> + bool
> +
> +config HAVE_PAGE_SIZE_32KB
> + bool
> +
> +config HAVE_PAGE_SIZE_64KB
> + bool
> +
> +config HAVE_PAGE_SIZE_256KB
> + bool
> +
> +choice
> + prompt "MMU page size"
> +

That's a nice refactor.

The only drawback I see is that we are losing several interesting 
arch-specific comments/help texts. I don't know if there could be an easy 
way to keep them.


> +config PAGE_SIZE_4KB
> + bool "4KB pages"
> + depends on HAVE_PAGE_SIZE_4KB
> +
> +config PAGE_SIZE_8KB
> + bool "8KB pages"
> + depends on HAVE_PAGE_SIZE_8KB
> +
> +config PAGE_SIZE_16KB
> + bool "16KB pages"
> + depends on HAVE_PAGE_SIZE_16KB
> +
> +config PAGE_SIZE_32KB
> + bool "32KB pages"
> + depends on HAVE_PAGE_SIZE_32KB
> +
> +config PAGE_SIZE_64KB
> + bool "64KB pages"
> + depends on HAVE_PAGE_SIZE_64KB
> +
> +config PAGE_SIZE_256KB
> + bool "256KB pages"
> + depends on HAVE_PAGE_SIZE_256KB

Hexagon seems to also use CONFIG_PAGE_SIZE_1MB?
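If so, the common choice would need an extra pair of symbols along these
lines (a sketch, assuming hexagon keeps a 1MB option), plus a
"default 20 if PAGE_SIZE_1MB" line in the PAGE_SHIFT mapping:

config HAVE_PAGE_SIZE_1MB
	bool

config PAGE_SIZE_1MB
	bool "1MB pages"
	depends on HAVE_PAGE_SIZE_1MB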

> +
> +endchoice
> +
>   config PAGE_SIZE_LESS_THAN_64KB
>   def_bool y
> - depends on !ARM64_64K_PAGES
>   depends on !PAGE_SIZE_64KB
> - depends on !PARISC_PAGE_SIZE_64KB
>   depends on PAGE_SIZE_LESS_THAN_256KB
>   
>   config PAGE_SIZE_LESS_THAN_256KB
>   def_bool y
>   depends on !PAGE_SIZE_256KB
>   
> +config PAGE_SHIFT
> + int
> + default 12 if PAGE_SIZE_4KB
> + default 13 if PAGE_SIZE_8KB
> + default 14 if PAGE_SIZE_16KB
> + default 15 if PAGE_SIZE_32KB
> + default 16 if PAGE_SIZE_64KB
> + default 18 if PAGE_SIZE_256KB
> +
>   # This allows to use a set of generic functions to determine mmap base
>   # address by giving priority to top-down scheme only if the process
>   # is not in legacy mode (compat task, unlimited stack size or
> diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig
> index a880ee067d2e..aac46ee1a000 100644
> --- a/arch/hexagon/Kconfig
> +++ b/arch/hexagon/Kconfig
> @@ -8,6 +8,11 @@ config HEXAGON
>   select ARCH_HAS_SYNC_DMA_FOR_DEVICE
>   select ARCH_NO_PREEMPT
>   select DMA_GLOBAL_POOL
> + select FRAME_POINTER
> + select HAVE_PAGE_SIZE_4KB
> + select HAVE_PAGE_SIZE_16KB
> + select HAVE_PAGE_SIZE_64KB
> + select HAVE_PAGE_SIZE_256KB
>   # Other pending projects/to-do items.
>   # select HAVE_REGS_AND_STACK_ACCESS_API
>   # select HAVE_HW_BREAKPOINT if PERF_EVENTS
> @@ -120,26 +125,6 @@ config NR_CPUS
> This is purely to save memory - each supported CPU adds
> approximately eight kilobytes to the kernel image.
>   
> -choice
> - prompt "Kernel page size"
> - default PAGE_SIZE_4KB
> - help
> -   Changes the default page size; use with caution.
> -
> -config PAGE_SIZE_4KB
> - bool "4KB"
> -
> -config PAGE_SIZE_16KB
> - bool "16KB"
> -
> -config PAGE_SIZE_64KB
> - bool "64KB"
> -
> -config PAGE_SIZE_256KB
> - bool "256KB"
> -
> -endchoice
> -
>   source "kernel/Kconfig.hz"
>   
>   endmenu
> diff --git a/arch/hexagon/include/asm/page.h b/arch/hexagon/include/asm/page.h
> index 10f1bc07423c..65c9bac639fa 100644
> --- a/arch/hexagon/include/asm/page.h
> +++ b/arch/hexagon/include/asm/page.h
> @@ -13,27 +13,22 @@
>   /*  This is probably not the most graceful way to handle this.  */
>   
>   #ifdef CONFIG_PAGE_SIZE_4KB
> -#define PAGE_SHIFT 12
>   #define HEXAGON_L1_PTE_SIZE __HVM_PDE_S_4KB
>   #endif
>   
>   #ifdef CONFIG_PAGE_SIZE_16KB
> -#define PAGE_SHIFT 14
>   #define HEXAGON_L1_PTE_SIZE __HVM_PDE_S_16KB
>   #endif
>   
>   #ifdef CONFIG_PAGE_SIZE_64KB
> -#define PAGE_SHIFT 16
>   #define HEXAGON_L1_PTE_SIZE __HVM_PDE_S_64KB
>   

Re: [PATCH 4/4] vdso: avoid including asm/page.h

2024-02-26 Thread Christophe Leroy


On 26/02/2024 at 17:14, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> The recent change to the vdso_data_store broke building compat VDSO
> on at least arm64 because it includes headers outside of the include/vdso/
> namespace:

I understand that powerpc64 also has an issue, see 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20231221120410.2226678-1-...@ellerman.id.au/

> 
> In file included from arch/arm64/include/asm/lse.h:5,
>   from arch/arm64/include/asm/cmpxchg.h:14,
>   from arch/arm64/include/asm/atomic.h:16,
>   from include/linux/atomic.h:7,
>   from include/asm-generic/bitops/atomic.h:5,
>   from arch/arm64/include/asm/bitops.h:25,
>   from include/linux/bitops.h:68,
>   from arch/arm64/include/asm/memory.h:209,
>   from arch/arm64/include/asm/page.h:46,
>   from include/vdso/datapage.h:22,
>   from lib/vdso/gettimeofday.c:5,
>   from :
> arch/arm64/include/asm/atomic_ll_sc.h:298:9: error: unknown type name 'u128'
>298 | u128 full;
> 
> Use an open-coded page size calculation based on the new CONFIG_PAGE_SHIFT
> Kconfig symbol instead.
> 
> Reported-by: Linux Kernel Functional Testing 
> Fixes: a0d2fcd62ac2 ("vdso/ARM: Make union vdso_data_store available for all 
> architectures")
> Link: 
> https://lore.kernel.org/lkml/ca+g9fytrxxm_ko9fnpz3xarxhv7ud_yqp-teupqrnrhu+_0...@mail.gmail.com/
> Signed-off-by: Arnd Bergmann 
> ---
>   include/vdso/datapage.h | 4 +---
>   1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/include/vdso/datapage.h b/include/vdso/datapage.h
> index 7ba44379a095..2c39a67d7e23 100644
> --- a/include/vdso/datapage.h
> +++ b/include/vdso/datapage.h
> @@ -19,8 +19,6 @@
>   #include 
>   #include 
>   
> -#include 
> -
>   #ifdef CONFIG_ARCH_HAS_VDSO_DATA
>   #include 
>   #else
> @@ -128,7 +126,7 @@ extern struct vdso_data _timens_data[CS_BASES] 
> __attribute__((visibility("hidden
>*/
>   union vdso_data_store {
>   struct vdso_data	data[CS_BASES];
> - u8  page[PAGE_SIZE];
> + u8  page[1ul << CONFIG_PAGE_SHIFT];

Usually 1UL is used (capital letters).

Maybe better to (re)define PAGE_SIZE instead, something like:

#define PAGE_SIZE (1UL << CONFIG_PAGE_SHIFT)
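Applied to the quoted hunk, that would give (a sketch, not the committed
fix):

#define PAGE_SIZE	(1UL << CONFIG_PAGE_SHIFT)

union vdso_data_store {
	struct vdso_data	data[CS_BASES];
	u8			page[PAGE_SIZE];
};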


>   };
>   
>   /*


Re: [PATCH 12/15] powerpc/nohash/64: switch to generic version of pte allocation

2019-05-02 Thread Christophe Leroy




On 02/05/2019 at 17:28, Mike Rapoport wrote:

The 64-bit book-E powerpc implements pte_alloc_one(),
pte_alloc_one_kernel(), pte_free_kernel() and pte_free() the same way as
the generic version.


Will soon be converted to the same as the 3 other PPC subarches, see
https://patchwork.ozlabs.org/patch/1091590/

Christophe



Switch it to the generic version that does exactly the same thing.

Signed-off-by: Mike Rapoport 
---
  arch/powerpc/include/asm/nohash/64/pgalloc.h | 35 ++--
  1 file changed, 2 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/64/pgalloc.h 
b/arch/powerpc/include/asm/nohash/64/pgalloc.h
index 66d086f..bfb53a0 100644
--- a/arch/powerpc/include/asm/nohash/64/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/64/pgalloc.h
@@ -11,6 +11,8 @@
  #include 
  #include 
  
+#include <asm-generic/pgalloc.h>	/* for pte_{alloc,free}_one */

+
  struct vmemmap_backing {
struct vmemmap_backing *list;
unsigned long phys;
@@ -92,39 +94,6 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
kmem_cache_free(PGT_CACHE(PMD_CACHE_INDEX), pmd);
  }
  
-

-static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
-{
-   return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
-}
-
-static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
-{
-   struct page *page;
-   pte_t *pte;
-
-   pte = (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT);
-   if (!pte)
-   return NULL;
-   page = virt_to_page(pte);
-   if (!pgtable_page_ctor(page)) {
-   __free_page(page);
-   return NULL;
-   }
-   return page;
-}
-
-static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
-{
-   free_page((unsigned long)pte);
-}
-
-static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
-{
-   pgtable_page_dtor(ptepage);
-   __free_page(ptepage);
-}
-
  static inline void pgtable_free(void *table, int shift)
  {
if (!shift) {
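For context, the generic helpers this switches to live in
include/asm-generic/pgalloc.h; per the commit message they do exactly
what the removed code does, along these lines (a sketch from memory of
that era's header, worth verifying against the tree):

static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
{
	return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
}

static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
{
	free_page((unsigned long)pte);
}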



Re: [PATCH v5 3/3] locking/rwsem: Optimize down_read_trylock()

2019-03-25 Thread Christophe Leroy

Hi,

Could you share the microbenchmark you are using?

I'd like to test the series on powerpc.

Thanks
Christophe

On 22/03/2019 at 15:30, Waiman Long wrote:

Modify __down_read_trylock() to optimize for an unlocked rwsem and make
it generate slightly better code.

Before this patch, down_read_trylock:

0x <+0>: callq  0x5 
0x0005 <+5>: jmp0x18 
0x0007 <+7>: lea0x1(%rdx),%rcx
0x000b <+11>:mov%rdx,%rax
0x000e <+14>:lock cmpxchg %rcx,(%rdi)
0x0013 <+19>:cmp%rax,%rdx
0x0016 <+22>:je 0x23 
0x0018 <+24>:mov(%rdi),%rdx
0x001b <+27>:test   %rdx,%rdx
0x001e <+30>:jns0x7 
0x0020 <+32>:xor%eax,%eax
0x0022 <+34>:retq
0x0023 <+35>:mov%gs:0x0,%rax
0x002c <+44>:or $0x3,%rax
0x0030 <+48>:mov%rax,0x20(%rdi)
0x0034 <+52>:mov$0x1,%eax
0x0039 <+57>:retq

After patch, down_read_trylock:

0x <+0>:  callq  0x5 
0x0005 <+5>:  xor%eax,%eax
0x0007 <+7>:  lea0x1(%rax),%rdx
0x000b <+11>: lock cmpxchg %rdx,(%rdi)
0x0010 <+16>: jne0x29 
0x0012 <+18>: mov%gs:0x0,%rax
0x001b <+27>: or $0x3,%rax
0x001f <+31>: mov%rax,0x20(%rdi)
0x0023 <+35>: mov$0x1,%eax
0x0028 <+40>: retq
0x0029 <+41>: test   %rax,%rax
0x002c <+44>: jns0x7 
0x002e <+46>: xor%eax,%eax
0x0030 <+48>: retq

By using a rwsem microbenchmark, the down_read_trylock() rate (with a
load of 10 to lengthen the lock critical section) on a x86-64 system
before and after the patch were:

                 Before Patch    After Patch
   # of Threads     rlock           rlock
   ------------     -----           -----
        1           14,496          14,716
        2            8,644           8,453
        4            6,799           6,983
        8            5,664           7,190

On a ARM64 system, the performance results were:

                 Before Patch    After Patch
   # of Threads     rlock           rlock
   ------------     -----           -----
        1           23,676          24,488
        2            7,697           9,502
        4            4,945           3,440
        8            2,641           1,603

For the uncontended case (1 thread), the new down_read_trylock() is a
little bit faster. For the contended cases, the new down_read_trylock()
performs pretty well on x86-64, but performance degrades at high
contention level on ARM64.

Suggested-by: Linus Torvalds 
Signed-off-by: Waiman Long 
---
  kernel/locking/rwsem.h | 13 -
  1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index 45ee00236e03..1f5775aa6a1d 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -174,14 +174,17 @@ static inline int __down_read_killable(struct rw_semaphore *sem)
  
 static inline int __down_read_trylock(struct rw_semaphore *sem)
 {
-	long tmp;
+	/*
+	 * Optimize for the case when the rwsem is not locked at all.
+	 */
+	long tmp = RWSEM_UNLOCKED_VALUE;
 
-	while ((tmp = atomic_long_read(&sem->count)) >= 0) {
-		if (tmp == atomic_long_cmpxchg_acquire(&sem->count, tmp,
-					tmp + RWSEM_ACTIVE_READ_BIAS)) {
+	do {
+		if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
+					tmp + RWSEM_ACTIVE_READ_BIAS)) {
 			return 1;
 		}
-	}
+	} while (tmp >= 0);
 	return 0;
 }
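To see why the new form helps when uncontended, here is a stand-alone
user-space analogue using C11 atomics (an illustration only; the
function name and the bias value are simplified assumptions, not the
kernel's types):

#include <stdatomic.h>
#include <stdbool.h>

#define RWSEM_ACTIVE_READ_BIAS	1L

static bool down_read_trylock_opt(atomic_long *count)
{
	long tmp = 0;	/* optimistic: assume the rwsem starts unlocked */

	do {
		/* on failure, compare_exchange reloads the current value
		 * into tmp, so no separate atomic read is needed */
		if (atomic_compare_exchange_weak_explicit(count, &tmp,
				tmp + RWSEM_ACTIVE_READ_BIAS,
				memory_order_acquire, memory_order_relaxed))
			return true;
	} while (tmp >= 0);	/* negative count: a writer holds it */

	return false;
}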
  



Re: [PATCH v2 19/21] treewide: add checks for the return value of memblock_alloc*()

2019-01-30 Thread Christophe Leroy




On 31/01/2019 at 07:44, Christophe Leroy wrote:



On 31/01/2019 at 07:41, Mike Rapoport wrote:

On Thu, Jan 31, 2019 at 07:07:46AM +0100, Christophe Leroy wrote:



On 21/01/2019 at 09:04, Mike Rapoport wrote:

Add check for the return value of memblock_alloc*() functions and call
panic() in case of error.
The panic message repeats the one used by panicking memblock allocators
with adjustment of parameters to include only relevant ones.

The replacement was mostly automated with semantic patches like the one
below with manual massaging of format strings.

@@
expression ptr, size, align;
@@
ptr = memblock_alloc(size, align);
+ if (!ptr)
+ panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__,
size, align);

Signed-off-by: Mike Rapoport 
Reviewed-by: Guo Ren  # c-sky
Acked-by: Paul Burton  # MIPS
Acked-by: Heiko Carstens  # s390
Reviewed-by: Juergen Gross  # Xen
---


[...]


diff --git a/mm/sparse.c b/mm/sparse.c
index 7ea5dc6..ad94242 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c


[...]

@@ -425,6 +436,10 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
 	memblock_alloc_try_nid_raw(size, PAGE_SIZE,
 				   __pa(MAX_DMA_ADDRESS),
 				   MEMBLOCK_ALLOC_ACCESSIBLE, nid);
+	if (!sparsemap_buf)
+		panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=%lx\n",
+		      __func__, size, PAGE_SIZE, nid, __pa(MAX_DMA_ADDRESS));
+


memblock_alloc_try_nid_raw() does not panic (help explicitly says: Does not
zero allocated memory, does not panic if request cannot be satisfied.).


"Does not panic" does not mean it always succeeds.


I agree, but at least here you are changing the behaviour by making it 
panic explicitly. Are we sure there are no cases where the system could 
just continue functioning? Maybe a WARN_ON() would be enough there?


Looking in more detail, it looks like everything is done to live with a 
NULL sparsemap_buf: all functions using it check it, so having it NULL 
shouldn't imply a panic, I believe; see the code below.


static void *sparsemap_buf __meminitdata;
static void *sparsemap_buf_end __meminitdata;

static void __init sparse_buffer_init(unsigned long size, int nid)
{
WARN_ON(sparsemap_buf); /* forgot to call sparse_buffer_fini()? */
sparsemap_buf =
memblock_alloc_try_nid_raw(size, PAGE_SIZE,
__pa(MAX_DMA_ADDRESS),
MEMBLOCK_ALLOC_ACCESSIBLE, nid);
sparsemap_buf_end = sparsemap_buf + size;
}

static void __init sparse_buffer_fini(void)
{
unsigned long size = sparsemap_buf_end - sparsemap_buf;

if (sparsemap_buf && size > 0)
memblock_free_early(__pa(sparsemap_buf), size);
sparsemap_buf = NULL;
}

void * __meminit sparse_buffer_alloc(unsigned long size)
{
void *ptr = NULL;

if (sparsemap_buf) {
ptr = PTR_ALIGN(sparsemap_buf, size);
if (ptr + size > sparsemap_buf_end)
ptr = NULL;
else
sparsemap_buf = ptr + size;
}
return ptr;
}
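Given that, the WARN-based alternative could be as simple as (a sketch,
not a posted patch):

	sparsemap_buf =
		memblock_alloc_try_nid_raw(size, PAGE_SIZE,
					   __pa(MAX_DMA_ADDRESS),
					   MEMBLOCK_ALLOC_ACCESSIBLE, nid);
	/* all users of sparsemap_buf already tolerate NULL (see above) */
	WARN(!sparsemap_buf, "%s: Failed to allocate %lu bytes\n",
	     __func__, size);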


Christophe


Re: [PATCH v2 19/21] treewide: add checks for the return value of memblock_alloc*()

2019-01-30 Thread Christophe Leroy




On 31/01/2019 at 07:41, Mike Rapoport wrote:

On Thu, Jan 31, 2019 at 07:07:46AM +0100, Christophe Leroy wrote:



On 21/01/2019 at 09:04, Mike Rapoport wrote:

Add check for the return value of memblock_alloc*() functions and call
panic() in case of error.
The panic message repeats the one used by panicking memblock allocators with
adjustment of parameters to include only relevant ones.

The replacement was mostly automated with semantic patches like the one
below with manual massaging of format strings.

@@
expression ptr, size, align;
@@
ptr = memblock_alloc(size, align);
+ if (!ptr)
+   panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__,
size, align);

Signed-off-by: Mike Rapoport 
Reviewed-by: Guo Ren  # c-sky
Acked-by: Paul Burton  # MIPS
Acked-by: Heiko Carstens  # s390
Reviewed-by: Juergen Gross  # Xen
---


[...]


diff --git a/mm/sparse.c b/mm/sparse.c
index 7ea5dc6..ad94242 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c


[...]


@@ -425,6 +436,10 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
 	memblock_alloc_try_nid_raw(size, PAGE_SIZE,
 				   __pa(MAX_DMA_ADDRESS),
 				   MEMBLOCK_ALLOC_ACCESSIBLE, nid);
+	if (!sparsemap_buf)
+		panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=%lx\n",
+		      __func__, size, PAGE_SIZE, nid, __pa(MAX_DMA_ADDRESS));
+


memblock_alloc_try_nid_raw() does not panic (help explicitly says: Does not
zero allocated memory, does not panic if request cannot be satisfied.).


"Does not panic" does not mean it always succeeds.


I agree, but at least here you are changing the behaviour by making it 
panic explicitly. Are we sure there are no cases where the system could 
just continue functioning? Maybe a WARN_ON() would be enough there?


Christophe

  

Stephen Rothwell reports a boot failure due to this change.


Please see my reply on that thread.


Christophe


sparsemap_buf_end = sparsemap_buf + size;
  }







Re: [PATCH v2 19/21] treewide: add checks for the return value of memblock_alloc*()

2019-01-30 Thread Christophe Leroy




On 21/01/2019 at 09:04, Mike Rapoport wrote:

Add check for the return value of memblock_alloc*() functions and call
panic() in case of error.
The panic message repeats the one used by panicking memblock allocators with
adjustment of parameters to include only relevant ones.

The replacement was mostly automated with semantic patches like the one
below with manual massaging of format strings.

@@
expression ptr, size, align;
@@
ptr = memblock_alloc(size, align);
+ if (!ptr)
+   panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__,
size, align);

Signed-off-by: Mike Rapoport 
Reviewed-by: Guo Ren  # c-sky
Acked-by: Paul Burton  # MIPS
Acked-by: Heiko Carstens  # s390
Reviewed-by: Juergen Gross  # Xen
---


[...]


diff --git a/mm/sparse.c b/mm/sparse.c
index 7ea5dc6..ad94242 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c


[...]


@@ -425,6 +436,10 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
 	memblock_alloc_try_nid_raw(size, PAGE_SIZE,
 				   __pa(MAX_DMA_ADDRESS),
 				   MEMBLOCK_ALLOC_ACCESSIBLE, nid);
+	if (!sparsemap_buf)
+		panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=%lx\n",
+		      __func__, size, PAGE_SIZE, nid, __pa(MAX_DMA_ADDRESS));
+


memblock_alloc_try_nid_raw() does not panic (help explicitly says: Does 
not zero allocated memory, does not panic if request cannot be satisfied.).


Stephen Rothwell reports a boot failure due to this change.

Christophe


sparsemap_buf_end = sparsemap_buf + size;
  }