Re: [PATCH 1/4] mm/vmalloc: allow arch-specific vmalloc_node overrides

2024-02-21 Thread Maxwell Bland
> On Wednesday, February 21, 2024 12:59 AM, Christophe Leroy wrote:
> 
> In the code you add __weak for that. But you also add the flags to the
> parameters and I can't understand why when reading the above description.

This  change was made to allow most kernel interfaces use vmalloc_node and
enable the overrides to work. It also reduces the number of kernel locations
which would need to be change if there was ever a change to the
vmalloc_node_range interface.

However, there is a pushback to overriding the vmalloc interface, so this change
will likely not show up in my final patch.

Regards,
Maxwell



Re: [PATCH 1/4] mm/vmalloc: allow arch-specific vmalloc_node overrides

2024-02-21 Thread Christophe Leroy


Le 21/02/2024 à 06:43, Christoph Hellwig a écrit :
> On Tue, Feb 20, 2024 at 02:32:53PM -0600, Maxwell Bland wrote:
>> Present non-uniform use of __vmalloc_node and __vmalloc_node_range makes
>> enforcing appropriate code and data seperation untenable on certain
>> microarchitectures, as VMALLOC_START and VMALLOC_END are monolithic
>> while the use of the vmalloc interface is non-monolithic: in particular,
>> appropriate randomness in ASLR makes it such that code regions must fall
>> in some region between VMALLOC_START and VMALLOC_end, but this
>> necessitates that code pages are intermingled with data pages, meaning
>> code-specific protections, such as arm64's PXNTable, cannot be
>> performantly runtime enforced.
> 
> That's not actually true.  We have MODULE_START/END to separate them,
> which is used by mips only for now.

We have MODULES_VADDR and MODULES_END that are used by arm, arm64, 
loongarcg, powerpc, riscv, s390, sparc, x86_64

is_vmalloc_or_module_addr() is using MODULES_VADDR so I guess this 
function fails on mips ?

> 
>>
>> The solution to this problem allows architectures to override the
>> vmalloc wrapper functions by enforcing that the rest of the kernel does
>> not reimplement __vmalloc_node by using __vmalloc_node_range with the
>> same parameters as __vmalloc_node or provides a __weak tag to those
>> functions using __vmalloc_node_range with parameters repeating those of
>> __vmalloc_node.
> 
> I'm really not too happy about overriding the functions.  Especially
> as the separation is a generally good idea and it would be good to
> move everyone (or at least all modern architectures) over to a scheme
> like this.


Re: [PATCH 1/4] mm/vmalloc: allow arch-specific vmalloc_node overrides

2024-02-21 Thread Christophe Leroy


Le 20/02/2024 à 21:32, Maxwell Bland a écrit :
> [Vous ne recevez pas souvent de courriers de mbl...@motorola.com. Découvrez 
> pourquoi ceci est important à https://aka.ms/LearnAboutSenderIdentification ]
> 
> Present non-uniform use of __vmalloc_node and __vmalloc_node_range makes
> enforcing appropriate code and data seperation untenable on certain
> microarchitectures, as VMALLOC_START and VMALLOC_END are monolithic
> while the use of the vmalloc interface is non-monolithic: in particular,
> appropriate randomness in ASLR makes it such that code regions must fall
> in some region between VMALLOC_START and VMALLOC_end, but this
> necessitates that code pages are intermingled with data pages, meaning
> code-specific protections, such as arm64's PXNTable, cannot be
> performantly runtime enforced.
> 
> The solution to this problem allows architectures to override the
> vmalloc wrapper functions by enforcing that the rest of the kernel does
> not reimplement __vmalloc_node by using __vmalloc_node_range with the
> same parameters as __vmalloc_node or provides a __weak tag to those
> functions using __vmalloc_node_range with parameters repeating those of
> __vmalloc_node.
> 
> Two benefits of this approach are (1) greater flexibility to each
> architecture for handling of virtual memory while not compromising the
> kernel's vmalloc logic and (2) more uniform use of the __vmalloc_node
> interface, reserving the more specialized __vmalloc_node_range for more
> specialized cases, such as kasan's shadow memory.

I'm not sure I understand the message. What I understand is that you 
allow architectures to override vmalloc_node().

In the code you add __weak for that. But you also add the flags to the 
parameters and I can't understand why when reading the above description.

Christophe


Re: [PATCH 1/4] mm/vmalloc: allow arch-specific vmalloc_node overrides

2024-02-20 Thread Christoph Hellwig
On Tue, Feb 20, 2024 at 02:32:53PM -0600, Maxwell Bland wrote:
> Present non-uniform use of __vmalloc_node and __vmalloc_node_range makes
> enforcing appropriate code and data seperation untenable on certain
> microarchitectures, as VMALLOC_START and VMALLOC_END are monolithic
> while the use of the vmalloc interface is non-monolithic: in particular,
> appropriate randomness in ASLR makes it such that code regions must fall
> in some region between VMALLOC_START and VMALLOC_end, but this
> necessitates that code pages are intermingled with data pages, meaning
> code-specific protections, such as arm64's PXNTable, cannot be
> performantly runtime enforced.

That's not actually true.  We have MODULE_START/END to separate them,
which is used by mips only for now.

> 
> The solution to this problem allows architectures to override the
> vmalloc wrapper functions by enforcing that the rest of the kernel does
> not reimplement __vmalloc_node by using __vmalloc_node_range with the
> same parameters as __vmalloc_node or provides a __weak tag to those
> functions using __vmalloc_node_range with parameters repeating those of
> __vmalloc_node.

I'm really not too happy about overriding the functions.  Especially
as the separation is a generally good idea and it would be good to
move everyone (or at least all modern architectures) over to a scheme
like this.


[PATCH 1/4] mm/vmalloc: allow arch-specific vmalloc_node overrides

2024-02-20 Thread Maxwell Bland
Present non-uniform use of __vmalloc_node and __vmalloc_node_range makes
enforcing appropriate code and data seperation untenable on certain
microarchitectures, as VMALLOC_START and VMALLOC_END are monolithic
while the use of the vmalloc interface is non-monolithic: in particular,
appropriate randomness in ASLR makes it such that code regions must fall
in some region between VMALLOC_START and VMALLOC_end, but this
necessitates that code pages are intermingled with data pages, meaning
code-specific protections, such as arm64's PXNTable, cannot be
performantly runtime enforced.

The solution to this problem allows architectures to override the
vmalloc wrapper functions by enforcing that the rest of the kernel does
not reimplement __vmalloc_node by using __vmalloc_node_range with the
same parameters as __vmalloc_node or provides a __weak tag to those
functions using __vmalloc_node_range with parameters repeating those of
__vmalloc_node.

Two benefits of this approach are (1) greater flexibility to each
architecture for handling of virtual memory while not compromising the
kernel's vmalloc logic and (2) more uniform use of the __vmalloc_node
interface, reserving the more specialized __vmalloc_node_range for more
specialized cases, such as kasan's shadow memory.

Signed-off-by: Maxwell Bland 
---
 arch/arm/kernel/irq.c   |  2 +-
 arch/arm64/include/asm/vmap_stack.h |  2 +-
 arch/arm64/kernel/efi.c |  2 +-
 arch/powerpc/kernel/irq.c   |  2 +-
 arch/riscv/include/asm/irq_stack.h  |  2 +-
 arch/s390/hypfs/hypfs_diag.c|  2 +-
 arch/s390/kernel/setup.c|  6 ++---
 arch/s390/kernel/sthyi.c|  2 +-
 include/linux/vmalloc.h | 15 ++-
 kernel/bpf/syscall.c|  4 +--
 kernel/fork.c   |  4 +--
 kernel/scs.c|  3 +--
 lib/objpool.c   |  2 +-
 lib/test_vmalloc.c  |  6 ++---
 mm/util.c   |  3 +--
 mm/vmalloc.c| 39 +++--
 16 files changed, 47 insertions(+), 49 deletions(-)

diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c
index fe28fc1f759d..109f4f363621 100644
--- a/arch/arm/kernel/irq.c
+++ b/arch/arm/kernel/irq.c
@@ -61,7 +61,7 @@ static void __init init_irq_stacks(void)
   THREAD_SIZE_ORDER);
else
stack = __vmalloc_node(THREAD_SIZE, THREAD_ALIGN,
-  THREADINFO_GFP, NUMA_NO_NODE,
+  THREADINFO_GFP, 0, NUMA_NO_NODE,
   __builtin_return_address(0));
 
if (WARN_ON(!stack))
diff --git a/arch/arm64/include/asm/vmap_stack.h 
b/arch/arm64/include/asm/vmap_stack.h
index 20873099c035..57a7eaa720d5 100644
--- a/arch/arm64/include/asm/vmap_stack.h
+++ b/arch/arm64/include/asm/vmap_stack.h
@@ -21,7 +21,7 @@ static inline unsigned long *arch_alloc_vmap_stack(size_t 
stack_size, int node)
 
BUILD_BUG_ON(!IS_ENABLED(CONFIG_VMAP_STACK));
 
-   p = __vmalloc_node(stack_size, THREAD_ALIGN, THREADINFO_GFP, node,
+   p = __vmalloc_node(stack_size, THREAD_ALIGN, THREADINFO_GFP, 0, node,
__builtin_return_address(0));
return kasan_reset_tag(p);
 }
diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index 0228001347be..48efa31a9161 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -205,7 +205,7 @@ static int __init arm64_efi_rt_init(void)
return 0;
 
p = __vmalloc_node(THREAD_SIZE, THREAD_ALIGN, GFP_KERNEL,
-  NUMA_NO_NODE, &);
+  0, NUMA_NO_NODE, &);
 l: if (!p) {
pr_warn("Failed to allocate EFI runtime stack\n");
clear_bit(EFI_RUNTIME_SERVICES, );
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 6f7d4edaa0bc..ceb7ea07ca28 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -308,7 +308,7 @@ DEFINE_INTERRUPT_HANDLER_ASYNC(do_IRQ)
 static void *__init alloc_vm_stack(void)
 {
return __vmalloc_node(THREAD_SIZE, THREAD_ALIGN, THREADINFO_GFP,
- NUMA_NO_NODE, (void *)_RET_IP_);
+ 0, NUMA_NO_NODE, (void *)_RET_IP_);
 }
 
 static void __init vmap_irqstack_init(void)
diff --git a/arch/riscv/include/asm/irq_stack.h 
b/arch/riscv/include/asm/irq_stack.h
index 6441ded3b0cf..d2410735bde0 100644
--- a/arch/riscv/include/asm/irq_stack.h
+++ b/arch/riscv/include/asm/irq_stack.h
@@ -24,7 +24,7 @@ static inline unsigned long *arch_alloc_vmap_stack(size_t 
stack_size, int node)
 {
void *p;
 
-   p = __vmalloc_node(stack_size, THREAD_ALIGN, THREADINFO_GFP, node,
+   p = __vmalloc_node(stack_size, THREAD_ALIGN, THREADINFO_GFP, 0, node,