Re: [PATCH v5 1/3] riscv: Move kernel mapping outside of linear mapping

2021-04-18 Thread Alex Ghiti

Hi Palmer,

Le 4/15/21 à 2:00 PM, Alex Ghiti a écrit :

Le 4/15/21 à 12:54 AM, Alex Ghiti a écrit :

Le 4/15/21 à 12:20 AM, Palmer Dabbelt a écrit :

On Sun, 11 Apr 2021 09:41:44 PDT (-0700), a...@ghiti.fr wrote:

This is a preparatory patch for relocatable kernel and sv48 support.

The kernel used to be linked at PAGE_OFFSET address therefore we could use
the linear mapping for the kernel mapping. But the relocated kernel base
address will be different from PAGE_OFFSET and since in the linear mapping,
two different virtual addresses cannot point to the same physical address,
the kernel mapping needs to lie outside the linear mapping so that we don't
have to copy it at the same physical offset.

The kernel mapping is moved to the last 2GB of the address space, BPF
is now always after the kernel and modules use the 2GB memory range right
before the kernel, so BPF and modules regions do not overlap. KASLR
implementation will simply have to move the kernel in the last 2GB range
and just take care of leaving enough space for BPF.

In addition, by moving the kernel to the end of the address space, both
sv39 and sv48 kernels will be exactly the same without needing to be
relocated at runtime.
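The two-offset translation described above can be sketched in plain C. All addresses below are illustrative sv39-style values chosen for the example, not taken from the patch; `va_to_pa()` mirrors the shape of the patch's `__va_to_pa_nodebug()`:

```c
#include <stdint.h>

/* Illustrative values only: a hypothetical layout with the linear mapping
 * at PAGE_OFFSET, the kernel linked in the last 2GB, and RAM at 0x80000000. */
#define PAGE_OFFSET       0xffffffc000000000ULL
#define KERNEL_LINK_ADDR  0xffffffff80000000ULL
#define PHYS_RAM_BASE     0x80000000ULL

static const uint64_t va_pa_offset        = PAGE_OFFSET - PHYS_RAM_BASE;
static const uint64_t va_kernel_pa_offset = KERNEL_LINK_ADDR - PHYS_RAM_BASE;
static const uint64_t kernel_virt_addr    = KERNEL_LINK_ADDR;

/* Addresses below kernel_virt_addr are translated with the linear-mapping
 * offset; addresses at or above it with the kernel-mapping offset. */
static uint64_t va_to_pa(uint64_t va)
{
    return (va < kernel_virt_addr) ? va - va_pa_offset
                                   : va - va_kernel_pa_offset;
}
```

Both aliases of the first byte of RAM resolve to the same physical address: `va_to_pa(PAGE_OFFSET)` and `va_to_pa(KERNEL_LINK_ADDR)` both yield `0x80000000` under these assumed constants.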

Suggested-by: Arnd Bergmann 
Signed-off-by: Alexandre Ghiti 
---
 arch/riscv/boot/loader.lds.S    |  3 +-
 arch/riscv/include/asm/page.h  | 17 +-
 arch/riscv/include/asm/pgtable.h    | 37 
 arch/riscv/include/asm/set_memory.h |  1 +
 arch/riscv/kernel/head.S    |  3 +-
 arch/riscv/kernel/module.c  |  6 +-
 arch/riscv/kernel/setup.c   |  5 ++
 arch/riscv/kernel/vmlinux.lds.S | 3 +-
 arch/riscv/mm/fault.c   | 13 +
 arch/riscv/mm/init.c    | 87 ++---
 arch/riscv/mm/kasan_init.c  |  9 +++
 arch/riscv/mm/physaddr.c    |  2 +-
 12 files changed, 146 insertions(+), 40 deletions(-)

diff --git a/arch/riscv/boot/loader.lds.S b/arch/riscv/boot/loader.lds.S
index 47a5003c2e28..62d94696a19c 100644
--- a/arch/riscv/boot/loader.lds.S
+++ b/arch/riscv/boot/loader.lds.S
@@ -1,13 +1,14 @@
 /* SPDX-License-Identifier: GPL-2.0 */

 #include 
+#include 

 OUTPUT_ARCH(riscv)
 ENTRY(_start)

 SECTIONS
 {
-    . = PAGE_OFFSET;
+    . = KERNEL_LINK_ADDR;

 .payload : {
 *(.payload)
diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index adc9d26f3d75..22cfb2be60dc 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -90,15 +90,28 @@ typedef struct page *pgtable_t;

 #ifdef CONFIG_MMU
 extern unsigned long va_pa_offset;
+extern unsigned long va_kernel_pa_offset;
 extern unsigned long pfn_base;
 #define ARCH_PFN_OFFSET   (pfn_base)
 #else
 #define va_pa_offset    0
+#define va_kernel_pa_offset    0
 #define ARCH_PFN_OFFSET   (PAGE_OFFSET >> PAGE_SHIFT)
 #endif /* CONFIG_MMU */

-#define __pa_to_va_nodebug(x)    ((void *)((unsigned long) (x) + va_pa_offset))
-#define __va_to_pa_nodebug(x)    ((unsigned long)(x) - va_pa_offset)
+extern unsigned long kernel_virt_addr;
+
+#define linear_mapping_pa_to_va(x)    ((void *)((unsigned long)(x) + va_pa_offset))
+#define kernel_mapping_pa_to_va(x)    ((void *)((unsigned long)(x) + va_kernel_pa_offset))
+#define __pa_to_va_nodebug(x)    linear_mapping_pa_to_va(x)
+
+#define linear_mapping_va_to_pa(x)    ((unsigned long)(x) - va_pa_offset)
+#define kernel_mapping_va_to_pa(x)    ((unsigned long)(x) - va_kernel_pa_offset)
+#define __va_to_pa_nodebug(x)    ({    \
+    unsigned long _x = x;    \
+    (_x < kernel_virt_addr) ?    \
+        linear_mapping_va_to_pa(_x) : kernel_mapping_va_to_pa(_x);    \
+    })

 #ifdef CONFIG_DEBUG_VIRTUAL
 extern phys_addr_t __virt_to_phys(unsigned long x);
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index ebf817c1bdf4..80e63a93e903 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -11,23 +11,30 @@

 #include 

-#ifndef __ASSEMBLY__
-
-/* Page Upper Directory not used in RISC-V */
-#include 
-#include 
-#include 
-#include 
+#ifndef CONFIG_MMU
+#define KERNEL_LINK_ADDR    PAGE_OFFSET
+#else

-#ifdef CONFIG_MMU
+#define ADDRESS_SPACE_END    (UL(-1))
+/*
+ * Leave 2GB for kernel and BPF at the end of the address space
+ */
+#define KERNEL_LINK_ADDR    (ADDRESS_SPACE_END - SZ_2G + 1)

 #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
 #define VMALLOC_END  (PAGE_OFFSET - 1)
 #define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)

+/* KASLR should leave at least 128MB for BPF after the kernel */
 #define BPF_JIT_REGION_SIZE    (SZ_128M)
-#define BPF_JIT_REGION_START    (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
-#define BPF_JIT_REGION_END    (VMALLOC_END)
+#define BPF_JIT_REGION_START    PFN_ALIGN((unsigned long)&_end)
+#define BPF_JIT_REGION_END    (BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE)

+
+/* Modules

Re: [PATCH] riscv: Protect kernel linear mapping only if CONFIG_STRICT_KERNEL_RWX is set

2021-04-17 Thread Alex Ghiti

Le 4/16/21 à 12:33 PM, Palmer Dabbelt a écrit :

On Fri, 16 Apr 2021 03:47:19 PDT (-0700), a...@ghiti.fr wrote:

Hi Anup,

Le 4/16/21 à 6:41 AM, Anup Patel a écrit :

On Thu, Apr 15, 2021 at 4:34 PM Alexandre Ghiti  wrote:


If CONFIG_STRICT_KERNEL_RWX is not set, we cannot set different permissions
to the kernel data and text sections, so make sure it is defined before
trying to protect the kernel linear mapping.

Signed-off-by: Alexandre Ghiti 


Maybe you should add a "Fixes:" tag to the commit?


Yes you're right I should have done that. Maybe Palmer will squash it as
it just entered for-next?


Ya, I'll do it.  My testing box was just tied up last night for the rc8 
PR, so I threw this on for-next to get the buildbots to take a look. 
It's a bit too late to take something for this week, as I try to be 
pretty conservative this late in the cycle.  There's another kprobes fix 
on the list so if we end up with an rc8 I might send this along with 
that, otherwise this'll just go onto for-next before the linear map 
changes that exercise the bug.


You're more than welcome to just dig up the fixes tag and reply; my 
scripts pull all tags from replies (just like Reviewed-by).  Otherwise 
I'll do it myself; most people don't really post Fixes tags that 
accurately so I go through it for pretty much everything anyway.


Here it is:

Fixes: 4b67f48da707 ("riscv: Move kernel mapping outside of linear mapping")

Thanks,



Thanks for sorting this out so quickly!





Otherwise it looks good.

Reviewed-by: Anup Patel 


Thank you!

Alex



Regards,
Anup


---
  arch/riscv/kernel/setup.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index 626003bb5fca..ab394d173cd4 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -264,12 +264,12 @@ void __init setup_arch(char **cmdline_p)

 sbi_init();

-   if (IS_ENABLED(CONFIG_STRICT_KERNEL_RWX))
+   if (IS_ENABLED(CONFIG_STRICT_KERNEL_RWX)) {
 protect_kernel_text_data();
-
-#if defined(CONFIG_64BIT) && defined(CONFIG_MMU)
-   protect_kernel_linear_mapping_text_rodata();
+#ifdef CONFIG_64BIT
+   protect_kernel_linear_mapping_text_rodata();
  #endif
+   }

  #ifdef CONFIG_SWIOTLB
 swiotlb_init(1);
--
2.20.1



___
linux-riscv mailing list
linux-ri...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv





Re: [PATCH v4 1/3] riscv: Move kernel mapping outside of linear mapping

2021-04-17 Thread Alex Ghiti

Hi Guenter,

Le 4/16/21 à 2:51 PM, Guenter Roeck a écrit :

On Fri, Apr 09, 2021 at 02:14:58AM -0400, Alexandre Ghiti wrote:

This is a preparatory patch for relocatable kernel and sv48 support.

The kernel used to be linked at PAGE_OFFSET address therefore we could use
the linear mapping for the kernel mapping. But the relocated kernel base
address will be different from PAGE_OFFSET and since in the linear mapping,
two different virtual addresses cannot point to the same physical address,
the kernel mapping needs to lie outside the linear mapping so that we don't
have to copy it at the same physical offset.

The kernel mapping is moved to the last 2GB of the address space, BPF
is now always after the kernel and modules use the 2GB memory range right
before the kernel, so BPF and modules regions do not overlap. KASLR
implementation will simply have to move the kernel in the last 2GB range
and just take care of leaving enough space for BPF.

In addition, by moving the kernel to the end of the address space, both
sv39 and sv48 kernels will be exactly the same without needing to be
relocated at runtime.
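The region layout this commit message describes can be checked numerically. The sketch below uses assumed sizes (a 16MB image standing in for the linker symbol `_end`) rather than the real kernel's values:

```c
#include <stdint.h>

/* Sketch of the last-2GB layout described above. KERNEL_END stands in for
 * the page-aligned linker symbol _end; the 16MB image size is an assumption. */
#define SZ_2G              ((uint64_t)0x80000000)
#define SZ_128M            ((uint64_t)0x08000000)
#define ADDRESS_SPACE_END  (~(uint64_t)0)
#define KERNEL_LINK_ADDR   (ADDRESS_SPACE_END - SZ_2G + 1)
#define KERNEL_END         (KERNEL_LINK_ADDR + 0x01000000)

/* Modules get the 2GB right before the kernel; BPF follows the image. */
#define MODULES_VADDR         (KERNEL_LINK_ADDR - SZ_2G)
#define MODULES_END           KERNEL_LINK_ADDR
#define BPF_JIT_REGION_START  KERNEL_END
#define BPF_JIT_REGION_END    (BPF_JIT_REGION_START + SZ_128M)

/* Returns 1 when the modules, kernel and BPF regions are disjoint
 * and ordered as the commit message describes. */
static int layout_ok(void)
{
    return MODULES_VADDR < MODULES_END &&
           MODULES_END <= KERNEL_LINK_ADDR &&
           KERNEL_LINK_ADDR < BPF_JIT_REGION_START &&
           BPF_JIT_REGION_START < BPF_JIT_REGION_END; /* no wraparound */
}
```

With these assumptions the kernel link address computes to `0xffffffff80000000`, which matches the `KERNEL_LINK_ADDR` definition in the quoted pgtable.h hunk.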

Suggested-by: Arnd Bergmann 
Signed-off-by: Alexandre Ghiti 


In next-20210416, when booting a riscv32 image in qemu, this patch results in:

[0.00] Linux version 5.12.0-rc7-next-20210416 (groeck@desktop) (riscv32-linux-gcc (GCC) 10.3.0, GNU ld (GNU Binutils) 2.36.1) #1 SMP Fri Apr 16 10:38:09 PDT 2021
[0.00] OF: fdt: Ignoring memory block 0x8000 - 0xa000
[0.00] Machine model: riscv-virtio,qemu
[0.00] earlycon: uart8250 at MMIO 0x1000 (options '115200')
[0.00] printk: bootconsole [uart8250] enabled
[0.00] efi: UEFI not found.
[0.00] Kernel panic - not syncing: init_resources: Failed to allocate 160 bytes
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 5.12.0-rc7-next-20210416 #1
[0.00] Hardware name: riscv-virtio,qemu (DT)
[0.00] Call Trace:
[0.00] [<80005292>] walk_stackframe+0x0/0xce
[0.00] [<809f4db8>] dump_backtrace+0x38/0x46
[0.00] [<809f4dd4>] show_stack+0xe/0x16
[0.00] [<809ff1d0>] dump_stack+0x92/0xc6
[0.00] [<809f4fee>] panic+0x10a/0x2d8
[0.00] [<80c02b24>] setup_arch+0x2a0/0x4ea
[0.00] [<80c006b0>] start_kernel+0x90/0x628
[0.00] ---[ end Kernel panic - not syncing: init_resources: Failed to allocate 160 bytes ]---

Reverting it fixes the problem. I understand that the version in -next is
different to this version of the patch, but I also tried v4 and it still
crashes with the same error message.



I completely neglected the 32-bit kernel in this series; I fixed that here:

Thank you for testing and reporting,

Alex


Guenter




Re: [PATCH] riscv: Protect kernel linear mapping only if CONFIG_STRICT_KERNEL_RWX is set

2021-04-16 Thread Alex Ghiti

Hi Anup,

Le 4/16/21 à 6:41 AM, Anup Patel a écrit :

On Thu, Apr 15, 2021 at 4:34 PM Alexandre Ghiti  wrote:


If CONFIG_STRICT_KERNEL_RWX is not set, we cannot set different permissions
to the kernel data and text sections, so make sure it is defined before
trying to protect the kernel linear mapping.

Signed-off-by: Alexandre Ghiti 


Maybe you should add a "Fixes:" tag to the commit?


Yes you're right I should have done that. Maybe Palmer will squash it as 
it just entered for-next?




Otherwise it looks good.

Reviewed-by: Anup Patel 


Thank you!

Alex



Regards,
Anup


---
  arch/riscv/kernel/setup.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index 626003bb5fca..ab394d173cd4 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -264,12 +264,12 @@ void __init setup_arch(char **cmdline_p)

 sbi_init();

-   if (IS_ENABLED(CONFIG_STRICT_KERNEL_RWX))
+   if (IS_ENABLED(CONFIG_STRICT_KERNEL_RWX)) {
 protect_kernel_text_data();
-
-#if defined(CONFIG_64BIT) && defined(CONFIG_MMU)
-   protect_kernel_linear_mapping_text_rodata();
+#ifdef CONFIG_64BIT
+   protect_kernel_linear_mapping_text_rodata();
  #endif
+   }

  #ifdef CONFIG_SWIOTLB
 swiotlb_init(1);
--
2.20.1






Re: [PATCH v5 1/3] riscv: Move kernel mapping outside of linear mapping

2021-04-15 Thread Alex Ghiti

Le 4/15/21 à 12:54 AM, Alex Ghiti a écrit :

Le 4/15/21 à 12:20 AM, Palmer Dabbelt a écrit :

On Sun, 11 Apr 2021 09:41:44 PDT (-0700), a...@ghiti.fr wrote:

This is a preparatory patch for relocatable kernel and sv48 support.

The kernel used to be linked at PAGE_OFFSET address therefore we could use
the linear mapping for the kernel mapping. But the relocated kernel base
address will be different from PAGE_OFFSET and since in the linear mapping,
two different virtual addresses cannot point to the same physical address,
the kernel mapping needs to lie outside the linear mapping so that we don't
have to copy it at the same physical offset.

The kernel mapping is moved to the last 2GB of the address space, BPF
is now always after the kernel and modules use the 2GB memory range right
before the kernel, so BPF and modules regions do not overlap. KASLR
implementation will simply have to move the kernel in the last 2GB range
and just take care of leaving enough space for BPF.

In addition, by moving the kernel to the end of the address space, both
sv39 and sv48 kernels will be exactly the same without needing to be
relocated at runtime.

Suggested-by: Arnd Bergmann 
Signed-off-by: Alexandre Ghiti 
---
 arch/riscv/boot/loader.lds.S    |  3 +-
 arch/riscv/include/asm/page.h   | 17 +-
 arch/riscv/include/asm/pgtable.h    | 37 
 arch/riscv/include/asm/set_memory.h |  1 +
 arch/riscv/kernel/head.S    |  3 +-
 arch/riscv/kernel/module.c  |  6 +-
 arch/riscv/kernel/setup.c   |  5 ++
 arch/riscv/kernel/vmlinux.lds.S |  3 +-
 arch/riscv/mm/fault.c   | 13 +
 arch/riscv/mm/init.c    | 87 ++---
 arch/riscv/mm/kasan_init.c  |  9 +++
 arch/riscv/mm/physaddr.c    |  2 +-
 12 files changed, 146 insertions(+), 40 deletions(-)

diff --git a/arch/riscv/boot/loader.lds.S b/arch/riscv/boot/loader.lds.S
index 47a5003c2e28..62d94696a19c 100644
--- a/arch/riscv/boot/loader.lds.S
+++ b/arch/riscv/boot/loader.lds.S
@@ -1,13 +1,14 @@
 /* SPDX-License-Identifier: GPL-2.0 */

 #include 
+#include 

 OUTPUT_ARCH(riscv)
 ENTRY(_start)

 SECTIONS
 {
-    . = PAGE_OFFSET;
+    . = KERNEL_LINK_ADDR;

 .payload : {
 *(.payload)
diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index adc9d26f3d75..22cfb2be60dc 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -90,15 +90,28 @@ typedef struct page *pgtable_t;

 #ifdef CONFIG_MMU
 extern unsigned long va_pa_offset;
+extern unsigned long va_kernel_pa_offset;
 extern unsigned long pfn_base;
 #define ARCH_PFN_OFFSET    (pfn_base)
 #else
 #define va_pa_offset    0
+#define va_kernel_pa_offset    0
 #define ARCH_PFN_OFFSET    (PAGE_OFFSET >> PAGE_SHIFT)
 #endif /* CONFIG_MMU */

-#define __pa_to_va_nodebug(x)    ((void *)((unsigned long) (x) + va_pa_offset))
-#define __va_to_pa_nodebug(x)    ((unsigned long)(x) - va_pa_offset)
+extern unsigned long kernel_virt_addr;
+
+#define linear_mapping_pa_to_va(x)    ((void *)((unsigned long)(x) + va_pa_offset))
+#define kernel_mapping_pa_to_va(x)    ((void *)((unsigned long)(x) + va_kernel_pa_offset))
+#define __pa_to_va_nodebug(x)    linear_mapping_pa_to_va(x)
+
+#define linear_mapping_va_to_pa(x)    ((unsigned long)(x) - va_pa_offset)
+#define kernel_mapping_va_to_pa(x)    ((unsigned long)(x) - va_kernel_pa_offset)
+#define __va_to_pa_nodebug(x)    ({    \
+    unsigned long _x = x;    \
+    (_x < kernel_virt_addr) ?    \
+        linear_mapping_va_to_pa(_x) : kernel_mapping_va_to_pa(_x);    \
+    })

 #ifdef CONFIG_DEBUG_VIRTUAL
 extern phys_addr_t __virt_to_phys(unsigned long x);
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index ebf817c1bdf4..80e63a93e903 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -11,23 +11,30 @@

 #include 

-#ifndef __ASSEMBLY__
-
-/* Page Upper Directory not used in RISC-V */
-#include 
-#include 
-#include 
-#include 
+#ifndef CONFIG_MMU
+#define KERNEL_LINK_ADDR    PAGE_OFFSET
+#else

-#ifdef CONFIG_MMU
+#define ADDRESS_SPACE_END    (UL(-1))
+/*
+ * Leave 2GB for kernel and BPF at the end of the address space
+ */
+#define KERNEL_LINK_ADDR    (ADDRESS_SPACE_END - SZ_2G + 1)

 #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
 #define VMALLOC_END  (PAGE_OFFSET - 1)
 #define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)

+/* KASLR should leave at least 128MB for BPF after the kernel */
 #define BPF_JIT_REGION_SIZE    (SZ_128M)
-#define BPF_JIT_REGION_START    (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
-#define BPF_JIT_REGION_END    (VMALLOC_END)
+#define BPF_JIT_REGION_START    PFN_ALIGN((unsigned long)&_end)
+#define BPF_JIT_REGION_END    (BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE)

+
+/* Modules always live before the kernel */
+#ifdef CONFIG_64BIT
+#d

Re: [PATCH v5 1/3] riscv: Move kernel mapping outside of linear mapping

2021-04-14 Thread Alex Ghiti

Le 4/15/21 à 12:20 AM, Palmer Dabbelt a écrit :

On Sun, 11 Apr 2021 09:41:44 PDT (-0700), a...@ghiti.fr wrote:

This is a preparatory patch for relocatable kernel and sv48 support.

The kernel used to be linked at PAGE_OFFSET address therefore we could use
the linear mapping for the kernel mapping. But the relocated kernel base
address will be different from PAGE_OFFSET and since in the linear mapping,
two different virtual addresses cannot point to the same physical address,
the kernel mapping needs to lie outside the linear mapping so that we don't
have to copy it at the same physical offset.

The kernel mapping is moved to the last 2GB of the address space, BPF
is now always after the kernel and modules use the 2GB memory range right
before the kernel, so BPF and modules regions do not overlap. KASLR
implementation will simply have to move the kernel in the last 2GB range
and just take care of leaving enough space for BPF.

In addition, by moving the kernel to the end of the address space, both
sv39 and sv48 kernels will be exactly the same without needing to be
relocated at runtime.

Suggested-by: Arnd Bergmann 
Signed-off-by: Alexandre Ghiti 
---
 arch/riscv/boot/loader.lds.S    |  3 +-
 arch/riscv/include/asm/page.h   | 17 +-
 arch/riscv/include/asm/pgtable.h    | 37 
 arch/riscv/include/asm/set_memory.h |  1 +
 arch/riscv/kernel/head.S    |  3 +-
 arch/riscv/kernel/module.c  |  6 +-
 arch/riscv/kernel/setup.c   |  5 ++
 arch/riscv/kernel/vmlinux.lds.S |  3 +-
 arch/riscv/mm/fault.c   | 13 +
 arch/riscv/mm/init.c    | 87 ++---
 arch/riscv/mm/kasan_init.c  |  9 +++
 arch/riscv/mm/physaddr.c    |  2 +-
 12 files changed, 146 insertions(+), 40 deletions(-)

diff --git a/arch/riscv/boot/loader.lds.S b/arch/riscv/boot/loader.lds.S
index 47a5003c2e28..62d94696a19c 100644
--- a/arch/riscv/boot/loader.lds.S
+++ b/arch/riscv/boot/loader.lds.S
@@ -1,13 +1,14 @@
 /* SPDX-License-Identifier: GPL-2.0 */

 #include 
+#include 

 OUTPUT_ARCH(riscv)
 ENTRY(_start)

 SECTIONS
 {
-    . = PAGE_OFFSET;
+    . = KERNEL_LINK_ADDR;

 .payload : {
 *(.payload)
diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index adc9d26f3d75..22cfb2be60dc 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -90,15 +90,28 @@ typedef struct page *pgtable_t;

 #ifdef CONFIG_MMU
 extern unsigned long va_pa_offset;
+extern unsigned long va_kernel_pa_offset;
 extern unsigned long pfn_base;
 #define ARCH_PFN_OFFSET    (pfn_base)
 #else
 #define va_pa_offset    0
+#define va_kernel_pa_offset    0
 #define ARCH_PFN_OFFSET    (PAGE_OFFSET >> PAGE_SHIFT)
 #endif /* CONFIG_MMU */

-#define __pa_to_va_nodebug(x)    ((void *)((unsigned long) (x) + va_pa_offset))
-#define __va_to_pa_nodebug(x)    ((unsigned long)(x) - va_pa_offset)
+extern unsigned long kernel_virt_addr;
+
+#define linear_mapping_pa_to_va(x)    ((void *)((unsigned long)(x) + va_pa_offset))
+#define kernel_mapping_pa_to_va(x)    ((void *)((unsigned long)(x) + va_kernel_pa_offset))
+#define __pa_to_va_nodebug(x)    linear_mapping_pa_to_va(x)
+
+#define linear_mapping_va_to_pa(x)    ((unsigned long)(x) - va_pa_offset)
+#define kernel_mapping_va_to_pa(x)    ((unsigned long)(x) - va_kernel_pa_offset)
+#define __va_to_pa_nodebug(x)    ({    \
+    unsigned long _x = x;    \
+    (_x < kernel_virt_addr) ?    \
+        linear_mapping_va_to_pa(_x) : kernel_mapping_va_to_pa(_x);    \
+    })

 #ifdef CONFIG_DEBUG_VIRTUAL
 extern phys_addr_t __virt_to_phys(unsigned long x);
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index ebf817c1bdf4..80e63a93e903 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -11,23 +11,30 @@

 #include 

-#ifndef __ASSEMBLY__
-
-/* Page Upper Directory not used in RISC-V */
-#include 
-#include 
-#include 
-#include 
+#ifndef CONFIG_MMU
+#define KERNEL_LINK_ADDR    PAGE_OFFSET
+#else

-#ifdef CONFIG_MMU
+#define ADDRESS_SPACE_END    (UL(-1))
+/*
+ * Leave 2GB for kernel and BPF at the end of the address space
+ */
+#define KERNEL_LINK_ADDR    (ADDRESS_SPACE_END - SZ_2G + 1)

 #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
 #define VMALLOC_END  (PAGE_OFFSET - 1)
 #define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)

+/* KASLR should leave at least 128MB for BPF after the kernel */
 #define BPF_JIT_REGION_SIZE    (SZ_128M)
-#define BPF_JIT_REGION_START    (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
-#define BPF_JIT_REGION_END    (VMALLOC_END)
+#define BPF_JIT_REGION_START    PFN_ALIGN((unsigned long)&_end)
+#define BPF_JIT_REGION_END    (BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE)

+
+/* Modules always live before the kernel */
+#ifdef CONFIG_64BIT
+#define MODULES_VADDR    (PFN_ALIGN((unsigned long)&_end) - SZ_2G)

Re: [PATCH] implement flush_cache_vmap for RISC-V

2021-04-14 Thread Alex Ghiti

Hi,

Le 4/12/21 à 3:08 AM, Jisheng Zhang a écrit :

Hi Jiuyang,

On Mon, 12 Apr 2021 00:05:30 + Jiuyang Liu  wrote:




This patch implements flush_cache_vmap for RISC-V, since it modifies PTEs.
Without this patch, SFENCE.VMA won't be added to the related code paths,
which might introduce a bug on out-of-order micro-architecture
implementations.

Signed-off-by: Jiuyang Liu 
Reviewed-by: Alexandre Ghiti 
Reviewed-by: Palmer Dabbelt 


IIRC, Palmer hasn't given this Reviewed-by tag.


---


Could you please add a version and a changelog? IIRC, this is v3.


  arch/riscv/include/asm/cacheflush.h | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/arch/riscv/include/asm/cacheflush.h 
b/arch/riscv/include/asm/cacheflush.h
index 23ff70350992..3fd528badc35 100644
--- a/arch/riscv/include/asm/cacheflush.h
+++ b/arch/riscv/include/asm/cacheflush.h
@@ -30,6 +30,12 @@ static inline void flush_dcache_page(struct page *page)
  #define flush_icache_user_page(vma, pg, addr, len) \
 flush_icache_mm(vma->vm_mm, 0)

+/*
+ * flush_cache_vmap is invoked after map_kernel_range() has installed the page
+ * table entries, which modifies PTEs, so an SFENCE.VMA should be inserted.


Just my humble opinion, flush_cache_vmap() may not be necessary: vmalloc_fault
can take care of this, and an sfence.vma is eventually inserted along that path.




I believe Palmer and Jisheng are right: my initial proposal to implement 
flush_cache_vmap was wrong.


But then, Jiuyang should not have noticed any problem here, so what's 
wrong? @Jiuyang: Does implementing flush_cache_vmap fix your issue?


And regarding flush_cache_vunmap, from Jisheng's call stack it also seems 
unnecessary.


@Jiuyang: Can you tell us more about what you noticed?



Regards


+ */
+#define flush_cache_vmap(start, end) flush_tlb_all()
+
  #ifndef CONFIG_SMP

  #define flush_icache_all() local_flush_icache_all()
--
2.31.1





Re: [PATCH v8] RISC-V: enable XIP

2021-04-13 Thread Alex Ghiti

Le 4/13/21 à 2:35 AM, Alexandre Ghiti a écrit :

From: Vitaly Wool 

Introduce XIP (eXecute In Place) support for RISC-V platforms.
It allows code to be executed directly from non-volatile storage
directly addressable by the CPU, such as QSPI NOR flash which can
be found on many RISC-V platforms. This makes way for significant
optimization of RAM footprint. The XIP kernel is not compressed
since it has to run directly from flash, so it will occupy more
space on the non-volatile storage. The physical flash address used
to link the kernel object files and for storing it has to be known
at compile time and is represented by a Kconfig option.

XIP on RISC-V will for the time being only work on MMU-enabled
kernels.

Signed-off-by: Alexandre Ghiti  [ Rebase on top of "Move
kernel mapping outside the linear mapping" ]
Signed-off-by: Vitaly Wool 
---


I forgot the changes history:

Changes in v2:
- dedicated macro for XIP address fixup when MMU is not enabled yet
  o both for 32-bit and 64-bit RISC-V
- SP is explicitly set to a safe place in RAM before __copy_data call
- removed redundant alignment requirements in vmlinux-xip.lds.S
- changed long -> uintptr_t typecast in __XIP_FIXUP macro.
Changes in v3:
- rebased against latest for-next
- XIP address fixup macro now takes an argument
- SMP related fixes
Changes in v4:
- rebased against the current for-next
- less #ifdef's in C/ASM code
- dedicated XIP_FIXUP_OFFSET assembler macro in head.S
- C-specific definitions moved into #ifndef __ASSEMBLY__
- Fixed multi-core boot
Changes in v5:
- fixed build error for non-XIP kernels
Changes in v6:
- XIP_PHYS_RAM_BASE config option renamed to PHYS_RAM_BASE
- added PHYS_RAM_BASE_FIXED config flag to allow usage of
  PHYS_RAM_BASE in non-XIP configurations if needed
- XIP_FIXUP macro rewritten with a temporary variable to avoid side
  effects
- fixed crash for non-XIP kernels that don't use built-in DTB
Changes in v7:
- Fix pfn_base that required FIXUP
- Fix copy_data which lacked + 1 in size to copy
- Fix pfn_valid for FLATMEM
- Rebased on top of "Move kernel mapping outside the linear mapping":
  this is the biggest change and affected mm/init.c,
  kernel/vmlinux-xip.lds.S and include/asm/pgtable.h: XIP kernel is now
  mapped like 'normal' kernel at the end of the address space.
Changes in v8:
- XIP_KERNEL now depends on SPARSEMEM
- FLATMEM related: pfn_valid and pfn_base removal
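The v6 note about rewriting XIP_FIXUP with a temporary variable concerns macro hygiene: a naive macro expands its argument more than once, so an argument with side effects is evaluated twice. A minimal sketch using a GNU C statement expression (as kernel code does); the addresses and helper names here are illustrative assumptions, not the patch's actual values:

```c
#include <stdint.h>

#define XIP_VIRT_ADDR  ((uint64_t)0xffffffff80000000)  /* assumed link address */
#define XIP_PHYS_ADDR  ((uint64_t)0x20000000)          /* assumed flash address */

static int eval_count;

/* Stands in for an argument expression with a side effect. */
static uint64_t symbol_addr(void)
{
    eval_count++;
    return XIP_VIRT_ADDR + 0x100;
}

/* Naive version: the argument appears (and is evaluated) twice. */
#define XIP_FIXUP_NAIVE(addr) \
    (((uint64_t)(addr) >= XIP_VIRT_ADDR) ? \
        (uint64_t)(addr) - XIP_VIRT_ADDR + XIP_PHYS_ADDR : (uint64_t)(addr))

/* v6-style rewrite: a temporary captures the argument exactly once. */
#define XIP_FIXUP(addr) ({ \
    uint64_t __a = (uint64_t)(addr); \
    (__a >= XIP_VIRT_ADDR) ? __a - XIP_VIRT_ADDR + XIP_PHYS_ADDR : __a; })

/* Helpers that count how often the argument's side effect runs. */
static uint64_t fixup_naive_once(void)
{
    eval_count = 0;
    return XIP_FIXUP_NAIVE(symbol_addr());
}

static uint64_t fixup_safe_once(void)
{
    eval_count = 0;
    return XIP_FIXUP(symbol_addr());
}
```

Both versions compute the same fixed-up address, but the naive macro runs the side effect twice while the temporary-based one runs it once.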


  arch/riscv/Kconfig  |  55 +++-
  arch/riscv/Makefile |   8 +-
  arch/riscv/boot/Makefile|  13 +++
  arch/riscv/include/asm/page.h   |  21 +
  arch/riscv/include/asm/pgtable.h|  25 +-
  arch/riscv/kernel/head.S|  46 +-
  arch/riscv/kernel/head.h|   3 +
  arch/riscv/kernel/setup.c   |  10 ++-
  arch/riscv/kernel/vmlinux-xip.lds.S | 133 
  arch/riscv/kernel/vmlinux.lds.S |   6 ++
  arch/riscv/mm/init.c| 115 ++--
  11 files changed, 418 insertions(+), 17 deletions(-)
  create mode 100644 arch/riscv/kernel/vmlinux-xip.lds.S

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 8ea60a0a19ae..7c7efdd67a10 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -28,7 +28,7 @@ config RISCV
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_SET_DIRECT_MAP
select ARCH_HAS_SET_MEMORY
-   select ARCH_HAS_STRICT_KERNEL_RWX if MMU
+   select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT
@@ -441,7 +441,7 @@ config EFI_STUB
  
  config EFI

bool "UEFI runtime support"
-   depends on OF
+   depends on OF && !XIP_KERNEL
select LIBFDT
select UCS2_STRING
select EFI_PARAMS_FROM_FDT
@@ -465,11 +465,60 @@ config STACKPROTECTOR_PER_TASK
def_bool y
depends on STACKPROTECTOR && CC_HAVE_STACKPROTECTOR_TLS
  
+config PHYS_RAM_BASE_FIXED

+   bool "Explicitly specified physical RAM address"
+   default n
+
+config PHYS_RAM_BASE
+   hex "Platform Physical RAM address"
+   depends on PHYS_RAM_BASE_FIXED
+   default "0x8000"
+   help
+ This is the physical address of RAM in the system. It has to be
+ explicitly specified to run early relocations of read-write data
+ from flash to RAM.
+
+config XIP_KERNEL
+   bool "Kernel Execute-In-Place from ROM"
+   depends on MMU && SPARSEMEM
+   select PHYS_RAM_BASE_FIXED
+   help
+ Execute-In-Place allows the kernel to run from non-volatile storage
+ directly addressable by the CPU, such as NOR flash. This saves RAM
+ space since the text section of the kernel is not loaded from flash
+ to RAM.  Read-write sections, such as the data section and stack,
+ are still copied to 

Re: [PATCH v7] RISC-V: enable XIP

2021-04-11 Thread Alex Ghiti

Le 4/9/21 à 10:42 AM, Vitaly Wool a écrit :

On Fri, Apr 9, 2021 at 3:59 PM Mike Rapoport  wrote:


On Fri, Apr 09, 2021 at 02:46:17PM +0200, David Hildenbrand wrote:

Also, will that memory properly be exposed in the resource tree as
System RAM (e.g., /proc/iomem) ? Otherwise some things (/proc/kcore)
won't work as expected - the kernel won't be included in a dump.

Do we really need an XIP kernel to be included in kdump?
And does not it sound weird to expose flash as System RAM in /proc/iomem? ;-)


See my other mail, maybe we actually want something different.




I have just checked and it does not appear in /proc/iomem.

OK, your conclusion would be to have struct pages; I'm going to implement this
version then, using memblock as you described.


I'm not sure this is required. With XIP, the kernel text never gets into RAM,
so it does not seem to require struct pages.

XIP by definition has some limitations relatively to "normal" operation,
so lack of kdump could be one of them.


I agree.



I might be wrong, but IMHO, artificially creating a memory map for part of
flash would cause more problems in the long run.


Can you elaborate?


Nothing particular, just a gut feeling. Usually, when you force something
it comes out the wrong way later.


It's possible still that MTD_XIP is implemented allowing to write to
the flash used for XIP. While flash is being written, memory map
doesn't make sense at all. I can't come up with a real life example
when it can actually lead to problems but it is indeed weird when
System RAM suddenly becomes unreadable. I really don't think exposing
it in /proc/iomem is a good idea.


BTW, how does XIP account for the kernel text on other architectures that
implement it?


Interesting point, I thought XIP would be something new on RISC-V (well, at
least to me :) ). If that concept exists already, we better mimic what
existing implementations do.


I had a quick glance at ARM; it seems that kernel text does not have a memory
map and does not show up in System RAM.


Exactly, and I believe ARM64 won't do that too when it gets its own
XIP support (which is underway).




memmap does not seem necessary and ARM/ARM64 do not use it.

But if someone tries to get a struct page from a physical address that 
lies in flash, as mentioned by David, that could lead to silent 
corruptions if something exists at the address where the struct page 
should be. And it is hard to know which features in the kernel depend 
on that.


Regarding SPARSEMEM, the vmemmap lies in its own region, so that's 
unlikely to happen and we will catch those invalid accesses (and that's 
what I observed on riscv).


But for FLATMEM, memmap is in the linear mapping, then that could very 
likely happen silently.


Could a simple solution be to force SPARSEMEM for those XIP kernels? 
Then wrong things could happen, but we would see those and avoid 
spending hours debugging :)


I will at least send a v8 to remove the pfn_valid modifications for 
FLATMEM that currently return true for pfns in flash.
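The FLATMEM concern can be illustrated with a toy pfn_valid(): FLATMEM validates any pfn inside a single contiguous window starting at pfn_base, so widening that window to cover flash makes pfns with no struct page behind them look valid. All addresses below are assumptions for illustration, not the real platform's:

```c
#include <stdint.h>

#define PAGE_SHIFT  12
#define FLASH_PFN   ((uint64_t)0x20000000 >> PAGE_SHIFT)  /* assumed XIP flash */
#define RAM_PFN     ((uint64_t)0x80000000 >> PAGE_SHIFT)  /* assumed RAM base */
#define RAM_PAGES   ((uint64_t)(128 * 1024 * 1024) >> PAGE_SHIFT)  /* 128MB RAM */

/* FLATMEM-style check: a single [pfn_base, pfn_base + pages) window is
 * considered valid, i.e. backed by a struct page in the flat memmap. */
static int pfn_valid_flatmem(uint64_t pfn, uint64_t pfn_base, uint64_t pages)
{
    return pfn >= pfn_base && pfn < pfn_base + pages;
}
```

With pfn_base at RAM, flash pfns are correctly rejected; a pfn_valid modified so that the window effectively starts at the flash address would accept them even though no memmap backs them, which is the silent-corruption risk discussed above.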


Thanks,




Best regards,
Vitaly




Re: [PATCH v7] RISC-V: enable XIP

2021-04-09 Thread Alex Ghiti

Le 4/9/21 à 8:07 AM, David Hildenbrand a écrit :

On 09.04.21 13:39, Alex Ghiti wrote:

Hi David,

Le 4/9/21 à 4:23 AM, David Hildenbrand a écrit :

On 09.04.21 09:14, Alex Ghiti wrote:

Le 4/9/21 à 2:51 AM, Alexandre Ghiti a écrit :

From: Vitaly Wool 

Introduce XIP (eXecute In Place) support for RISC-V platforms.
It allows code to be executed directly from non-volatile storage
directly addressable by the CPU, such as QSPI NOR flash which can
be found on many RISC-V platforms. This makes way for significant
optimization of RAM footprint. The XIP kernel is not compressed
since it has to run directly from flash, so it will occupy more
space on the non-volatile storage. The physical flash address used
to link the kernel object files and for storing it has to be known
at compile time and is represented by a Kconfig option.

XIP on RISC-V will for the time being only work on MMU-enabled
kernels.


I added linux-mm and linux-arch to get feedback because I noticed that
DEBUG_VM_PGTABLE fails for SPARSEMEM (it works for FLATMEM, but I think
it does not do what is expected): the fact that we don't have any struct
page to back the text and rodata in flash is the problem, but to what
extent?


Just wondering, why can't we create a memmap for that memory -- or is it
even desirable to not do that explicitly? There might be some nasty side
effects when not having a memmap for text and rodata.



Do you have examples of such effects ? Any feature that will not work
without that ?



At least if it's not part of /proc/iomem in any way (maybe "System RAM" 
is not what we want without a memmap, TBD), kexec-tools won't be able to 
handle it properly e.g., for kdump. But not sure if that is really 
relevant in your setup.


Regarding other features, anything that does a pfn_valid(), 
pfn_to_page() or pfn_to_online_page() would behave differently now -- 
assuming the kernel doesn't fall into a section with other System RAM 
(whereby we would still allocate the memmap for the whole section).


I guess you might stumble over some surprises in some code paths, but 
nothing really comes to mind. Not sure if your zeropage is part of the 
kernel image on RISC-V (I remember that we sometimes need a memmap 
there, but I might be wrong)?



It is in the kernel image, located in .bss, which will be in RAM and 
thus backed by a memmap.





I assume you still somehow create the direct mapping for the kernel, 
right? So it's really some memory region with a direct mapping but 
without a memmap (and right now, without a resource), correct?





No I don't create any direct mapping for the text and the rodata.



[...]



Also, will that memory properly be exposed in the resource tree as
System RAM (e.g., /proc/iomem) ? Otherwise some things (/proc/kcore)
won't work as expected - the kernel won't be included in a dump.



I have just checked and it does not appear in /proc/iomem.

OK, your conclusion would be to have struct pages; I'm going to implement
that version then, using memblock as you described.


Let's first evaluate what the harm could be. You could (and should?) 
create the kernel resource manually - IIRC, that's independent of the 
memmap/memblock thing.


@Mike, what's your take on not having a memmap for kernel text and ro data?



Re: [PATCH v7] RISC-V: enable XIP

2021-04-09 Thread Alex Ghiti

Hi David,

Le 4/9/21 à 4:23 AM, David Hildenbrand a écrit :

On 09.04.21 09:14, Alex Ghiti wrote:

Le 4/9/21 à 2:51 AM, Alexandre Ghiti a écrit :

From: Vitaly Wool 

Introduce XIP (eXecute In Place) support for RISC-V platforms.
It allows code to be executed directly from non-volatile storage
directly addressable by the CPU, such as QSPI NOR flash which can
be found on many RISC-V platforms. This makes way for significant
optimization of RAM footprint. The XIP kernel is not compressed
since it has to run directly from flash, so it will occupy more
space on the non-volatile storage. The physical flash address used
to link the kernel object files and for storing it has to be known
at compile time and is represented by a Kconfig option.

XIP on RISC-V will for the time being only work on MMU-enabled
kernels.


I added linux-mm and linux-arch to get feedback because I noticed that
DEBUG_VM_PGTABLE fails for SPARSEMEM (it works for FLATMEM, but I think
it does not do what is expected): the fact that we don't have any struct
page to back the text and rodata in flash is the problem, but to what
extent?


Just wondering, why can't we create a memmap for that memory -- or is it 
even desirable to not do that explicitly? There might be some nasty side 
effects when not having a memmap for text and rodata.



Do you have examples of such effects ? Any feature that will not work 
without that ?





I would assume simply exposing the physical memory range to memblock as 
RAM and marking it reserved would create a memmap that's fully 
initialized like any bootmem (PG_reserved).


Or is there a reason why we cannot do that?



I did not want to do that if it was not needed, as the overall goal of 
an XIP kernel is to save RAM (I may be cheap, but 16MB backed by struct 
pages represents ~220KB).






Also, will that memory properly be exposed in the resource tree as 
System RAM (e.g., /proc/iomem) ? Otherwise some things (/proc/kcore) 
won't work as expected - the kernel won't be included in a dump.



I have just checked and it does not appear in /proc/iomem.

Ok your conclusion would be to have struct page, I'm going to implement 
this version then using memblock as you described.


Thanks David,

Alex






Thanks,

Alex


Signed-off-by: Alexandre Ghiti  [ Rebase on top of "Move
kernel mapping outside the linear mapping" ]
Signed-off-by: Vitaly Wool 
---

Changes in v2:
- dedicated macro for XIP address fixup when MMU is not enabled yet
    o both for 32-bit and 64-bit RISC-V
- SP is explicitly set to a safe place in RAM before __copy_data call
- removed redundant alignment requirements in vmlinux-xip.lds.S
- changed long -> uintptr_t typecast in __XIP_FIXUP macro.
Changes in v3:
- rebased against latest for-next
- XIP address fixup macro now takes an argument
- SMP related fixes
Changes in v4:
- rebased against the current for-next
- less #ifdef's in C/ASM code
- dedicated XIP_FIXUP_OFFSET assembler macro in head.S
- C-specific definitions moved into #ifndef __ASSEMBLY__
- Fixed multi-core boot
Changes in v5:
- fixed build error for non-XIP kernels
Changes in v6:
- XIP_PHYS_RAM_BASE config option renamed to PHYS_RAM_BASE
- added PHYS_RAM_BASE_FIXED config flag to allow usage of
    PHYS_RAM_BASE in non-XIP configurations if needed
- XIP_FIXUP macro rewritten with a temporary variable to avoid side
    effects
- fixed crash for non-XIP kernels that don't use built-in DTB
Changes in v7:
- Fix pfn_base that required FIXUP
- Fix copy_data which lacked + 1 in size to copy
- Fix pfn_valid for FLATMEM
- Rebased on top of "Move kernel mapping outside the linear mapping":
    this is the biggest change and affected mm/init.c,
    kernel/vmlinux-xip.lds.S and include/asm/pgtable.h: XIP kernel is 
now

    mapped like 'normal' kernel at the end of the address space.

   arch/riscv/Kconfig  |  51 ++-
   arch/riscv/Makefile |   8 +-
   arch/riscv/boot/Makefile    |  13 +++
   arch/riscv/include/asm/page.h   |  28 ++
   arch/riscv/include/asm/pgtable.h    |  25 +-
   arch/riscv/kernel/head.S    |  46 +-
   arch/riscv/kernel/head.h    |   3 +
   arch/riscv/kernel/setup.c   |  10 ++-
   arch/riscv/kernel/vmlinux-xip.lds.S | 133 


   arch/riscv/kernel/vmlinux.lds.S |   6 ++
   arch/riscv/mm/init.c    | 118 ++--
   11 files changed, 424 insertions(+), 17 deletions(-)
   create mode 100644 arch/riscv/kernel/vmlinux-xip.lds.S

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 8ea60a0a19ae..4d0153805927 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -28,7 +28,7 @@ config RISCV
   select ARCH_HAS_PTE_SPECIAL
   select ARCH_HAS_SET_DIRECT_MAP
   select ARCH_HAS_SET_MEMORY
-    select ARCH_HAS_STRICT_KERNEL_RWX if MMU
+    select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
   select ARCH_HAS_TICK_BROADCAST if GEN

Re: [PATCH v7] RISC-V: enable XIP

2021-04-09 Thread Alex Ghiti

Le 4/9/21 à 2:51 AM, Alexandre Ghiti a écrit :

From: Vitaly Wool 

Introduce XIP (eXecute In Place) support for RISC-V platforms.
It allows code to be executed directly from non-volatile storage
directly addressable by the CPU, such as QSPI NOR flash which can
be found on many RISC-V platforms. This makes way for significant
optimization of RAM footprint. The XIP kernel is not compressed
since it has to run directly from flash, so it will occupy more
space on the non-volatile storage. The physical flash address used
to link the kernel object files and for storing it has to be known
at compile time and is represented by a Kconfig option.

XIP on RISC-V will for the time being only work on MMU-enabled
kernels.

I added linux-mm and linux-arch to get feedback because I noticed that 
DEBUG_VM_PGTABLE fails for SPARSEMEM (it works for FLATMEM, but I think 
it does not do what is expected): the fact that we don't have any struct 
page to back the text and rodata in flash is the problem, but to what 
extent?


Thanks,

Alex


Signed-off-by: Alexandre Ghiti  [ Rebase on top of "Move
kernel mapping outside the linear mapping" ]
Signed-off-by: Vitaly Wool 
---

Changes in v2:
- dedicated macro for XIP address fixup when MMU is not enabled yet
   o both for 32-bit and 64-bit RISC-V
- SP is explicitly set to a safe place in RAM before __copy_data call
- removed redundant alignment requirements in vmlinux-xip.lds.S
- changed long -> uintptr_t typecast in __XIP_FIXUP macro.
Changes in v3:
- rebased against latest for-next
- XIP address fixup macro now takes an argument
- SMP related fixes
Changes in v4:
- rebased against the current for-next
- less #ifdef's in C/ASM code
- dedicated XIP_FIXUP_OFFSET assembler macro in head.S
- C-specific definitions moved into #ifndef __ASSEMBLY__
- Fixed multi-core boot
Changes in v5:
- fixed build error for non-XIP kernels
Changes in v6:
- XIP_PHYS_RAM_BASE config option renamed to PHYS_RAM_BASE
- added PHYS_RAM_BASE_FIXED config flag to allow usage of
   PHYS_RAM_BASE in non-XIP configurations if needed
- XIP_FIXUP macro rewritten with a temporary variable to avoid side
   effects
- fixed crash for non-XIP kernels that don't use built-in DTB
Changes in v7:
- Fix pfn_base that required FIXUP
- Fix copy_data which lacked + 1 in size to copy
- Fix pfn_valid for FLATMEM
- Rebased on top of "Move kernel mapping outside the linear mapping":
   this is the biggest change and affected mm/init.c,
   kernel/vmlinux-xip.lds.S and include/asm/pgtable.h: XIP kernel is now
   mapped like 'normal' kernel at the end of the address space.

  arch/riscv/Kconfig  |  51 ++-
  arch/riscv/Makefile |   8 +-
  arch/riscv/boot/Makefile|  13 +++
  arch/riscv/include/asm/page.h   |  28 ++
  arch/riscv/include/asm/pgtable.h|  25 +-
  arch/riscv/kernel/head.S|  46 +-
  arch/riscv/kernel/head.h|   3 +
  arch/riscv/kernel/setup.c   |  10 ++-
  arch/riscv/kernel/vmlinux-xip.lds.S | 133 
  arch/riscv/kernel/vmlinux.lds.S |   6 ++
  arch/riscv/mm/init.c| 118 ++--
  11 files changed, 424 insertions(+), 17 deletions(-)
  create mode 100644 arch/riscv/kernel/vmlinux-xip.lds.S

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 8ea60a0a19ae..4d0153805927 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -28,7 +28,7 @@ config RISCV
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_SET_DIRECT_MAP
select ARCH_HAS_SET_MEMORY
-   select ARCH_HAS_STRICT_KERNEL_RWX if MMU
+   select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT
@@ -441,7 +441,7 @@ config EFI_STUB
  
  config EFI

bool "UEFI runtime support"
-   depends on OF
+   depends on OF && !XIP_KERNEL
select LIBFDT
select UCS2_STRING
select EFI_PARAMS_FROM_FDT
@@ -465,11 +465,56 @@ config STACKPROTECTOR_PER_TASK
def_bool y
depends on STACKPROTECTOR && CC_HAVE_STACKPROTECTOR_TLS
  
+config PHYS_RAM_BASE_FIXED

+   bool "Explicitly specified physical RAM address"
+   default n
+
+config PHYS_RAM_BASE
+   hex "Platform Physical RAM address"
+   depends on PHYS_RAM_BASE_FIXED
+   default "0x8000"
+   help
+ This is the physical address of RAM in the system. It has to be
+ explicitly specified to run early relocations of read-write data
+ from flash to RAM.
+
+config XIP_KERNEL
+   bool "Kernel Execute-In-Place from ROM"
+   depends on MMU
+   select PHYS_RAM_BASE_FIXED
+   help
+ Execute-In-Place allows the kernel to run from non-volatile storage
+ directly addressable by the CPU, such as NOR flash. This saves RAM
+ space 

Re: [PATCH] driver: of: Properly truncate command line if too long

2021-04-07 Thread Alex Ghiti

Hi Andy,

Le 4/6/21 à 6:56 PM, Andy Shevchenko a écrit :



On Tuesday, March 16, 2021, Alexandre Ghiti wrote:

In case the command line given by the user is too long, warn about it
and truncate it to the last full argument.

This is what efi already does in commit 80b1bfe1cb2f ("efi/libstub:
Don't parse overlong command lines").

Reported-by: Dmitry Vyukov <dvyu...@google.com>
Signed-off-by: Alexandre Ghiti <a...@ghiti.fr>
---
  drivers/of/fdt.c | 21 -
  1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index dcc1dd96911a..de4c6f9bac39 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -25,6 +25,7 @@
  #include 
  #include 
  #include 
+#include 

  #include   /* for COMMAND_LINE_SIZE */
  #include 
@@ -1050,9 +1051,27 @@ int __init early_init_dt_scan_chosen(unsigned
long node, const char *uname,

         /* Retrieve command line */
         p = of_get_flat_dt_prop(node, "bootargs", &l);
-       if (p != NULL && l > 0)
+       if (p != NULL && l > 0) {
                 strlcpy(data, p, min(l, COMMAND_LINE_SIZE));

+               /*
+                * If the given command line size is larger than
+                * COMMAND_LINE_SIZE, truncate it to the last complete
+                * parameter.
+                */
+               if (l > COMMAND_LINE_SIZE) {
+                       char *cmd_p = (char *)data + COMMAND_LINE_SIZE - 1;
+
+                       while (!isspace(*cmd_p))
+                               cmd_p--;


Shouldn’t you check for cmd_p being always bigger than or equal to data?


Yes you're right.



+
+                       *cmd_p = '\0';
+
+                       pr_err("Command line is too long: truncated to %d bytes\n",
+                              (int)(cmd_p - (char *)data + 1));


Do you really need that casting?


No, I can use %td to print a pointer difference.

I'll send a v2.

Thanks,

Alex



+               }
+       }
+
         /*
          * CONFIG_CMDLINE is meant to be a default in case nothing else
          * managed to set the command line, unless CONFIG_CMDLINE_FORCE
-- 
2.20.1




--
With Best Regards,
Andy Shevchenko




Re: [PATCH v3 2/5] RISC-V: Add kexec support

2021-04-06 Thread Alex Ghiti



Le 4/5/21 à 4:57 AM, Nick Kossifidis a écrit :

This patch adds support for kexec on RISC-V. On SMP systems it depends
on HOTPLUG_CPU in order to be able to bring up all harts after kexec.
It also needs a recent OpenSBI version that supports the HSM extension.
I tested it on riscv64 QEMU on both an smp and a non-smp system.

v5:
  * For now depend on MMU, further changes needed for NOMMU support
  * Make sure stvec is aligned
  * Cleanup some unneeded fences
  * Verify control code's buffer size
  * Compile kexec_relocate.S with medany and norelax

v4:
  * No functional changes, just re-based

v3:
  * Use the new smp_shutdown_nonboot_cpus() call.
  * Move riscv_kexec_relocate to .rodata

v2:
  * Pass needed parameters as arguments to riscv_kexec_relocate
instead of using global variables.
  * Use kimage_arch to hold the fdt address of the included fdt.
  * Use SYM_* macros on kexec_relocate.S.
  * Compatibility with STRICT_KERNEL_RWX.
  * Compatibility with HOTPLUG_CPU for SMP
  * Small cleanups

Signed-off-by: Nick Kossifidis 
---
  arch/riscv/Kconfig |  15 +++
  arch/riscv/include/asm/kexec.h |  47 
  arch/riscv/kernel/Makefile |   5 +
  arch/riscv/kernel/kexec_relocate.S | 156 
  arch/riscv/kernel/machine_kexec.c  | 186 +
  5 files changed, 409 insertions(+)
  create mode 100644 arch/riscv/include/asm/kexec.h
  create mode 100644 arch/riscv/kernel/kexec_relocate.S
  create mode 100644 arch/riscv/kernel/machine_kexec.c

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 8ea60a0a1..3716262ef 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -389,6 +389,21 @@ config RISCV_SBI_V01
help
  This config allows kernel to use SBI v0.1 APIs. This will be
  deprecated in future once legacy M-mode software are no longer in use.
+
+config KEXEC
+   bool "Kexec system call"
+   select KEXEC_CORE
+   select HOTPLUG_CPU if SMP
+   depends on MMU
+   help
+ kexec is a system call that implements the ability to shutdown your
+ current kernel, and to start another kernel. It is like a reboot
+ but it is independent of the system firmware. And like a reboot
+ you can start any kernel with it, not just Linux.
+
+ The name comes from the similarity to the exec system call.
+
+
  endmenu
  
  menu "Boot options"

diff --git a/arch/riscv/include/asm/kexec.h b/arch/riscv/include/asm/kexec.h
new file mode 100644
index 0..efc69feb4
--- /dev/null
+++ b/arch/riscv/include/asm/kexec.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 FORTH-ICS/CARV
+ *  Nick Kossifidis 
+ */
+
+#ifndef _RISCV_KEXEC_H
+#define _RISCV_KEXEC_H
+
+/* Maximum physical address we can use pages from */
+#define KEXEC_SOURCE_MEMORY_LIMIT (-1UL)
+
+/* Maximum address we can reach in physical address mode */
+#define KEXEC_DESTINATION_MEMORY_LIMIT (-1UL)
+
+/* Maximum address we can use for the control code buffer */
+#define KEXEC_CONTROL_MEMORY_LIMIT (-1UL)
+
+/* Reserve a page for the control code buffer */
+#define KEXEC_CONTROL_PAGE_SIZE 4096


PAGE_SIZE instead ?


+
+#define KEXEC_ARCH KEXEC_ARCH_RISCV
+
+static inline void
+crash_setup_regs(struct pt_regs *newregs,
+struct pt_regs *oldregs)
+{
+   /* Dummy implementation for now */
+}
+
+
+#define ARCH_HAS_KIMAGE_ARCH
+
+struct kimage_arch {
+   unsigned long fdt_addr;
+};
+
+const extern unsigned char riscv_kexec_relocate[];
+const extern unsigned int riscv_kexec_relocate_size;
+
+typedef void (*riscv_kexec_do_relocate)(unsigned long first_ind_entry,
+   unsigned long jump_addr,
+   unsigned long fdt_addr,
+   unsigned long hartid,
+   unsigned long va_pa_off);
+
+#endif
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 3dc0abde9..c2594018c 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -9,6 +9,10 @@ CFLAGS_REMOVE_patch.o  = $(CC_FLAGS_FTRACE)
  CFLAGS_REMOVE_sbi.o   = $(CC_FLAGS_FTRACE)
  endif
  
+ifdef CONFIG_KEXEC

+AFLAGS_kexec_relocate.o := -mcmodel=medany -mno-relax
+endif
+
  extra-y += head.o
  extra-y += vmlinux.lds
  
@@ -54,6 +58,7 @@ obj-$(CONFIG_SMP) += cpu_ops_sbi.o

  endif
  obj-$(CONFIG_HOTPLUG_CPU) += cpu-hotplug.o
  obj-$(CONFIG_KGDB)+= kgdb.o
+obj-${CONFIG_KEXEC}+= kexec_relocate.o machine_kexec.o


The other obj-$() entries use parentheses.

  
  obj-$(CONFIG_JUMP_LABEL)	+= jump_label.o
  
diff --git a/arch/riscv/kernel/kexec_relocate.S b/arch/riscv/kernel/kexec_relocate.S

new file mode 100644
index 0..616c20771
--- /dev/null
+++ b/arch/riscv/kernel/kexec_relocate.S
@@ -0,0 +1,156 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 FORTH-ICS/CARV
+ *  Nick Kossifidis 
+ */
+
+#include 

Re: [PATCH v3 4/5] RISC-V: Add kdump support

2021-04-06 Thread Alex Ghiti

Hi Nick,

Le 4/5/21 à 4:57 AM, Nick Kossifidis a écrit :

This patch adds support for kdump, the kernel will reserve a
region for the crash kernel and jump there on panic. In order
for userspace tools (kexec-tools) to prepare the crash kernel
kexec image, we also need to expose some information on
/proc/iomem for the memory regions used by the kernel and for
the region reserved for crash kernel. Note that on userspace
the device tree is used to determine the system's memory
layout so the "System RAM" on /proc/iomem is ignored.

I tested this on riscv64 QEMU and it works as expected; you may
test it by triggering a crash through /proc/sysrq-trigger:

echo c > /proc/sysrq-trigger

v3:
  * Move ELF_CORE_COPY_REGS to asm/elf.h instead of uapi/asm/elf.h
  * Set stvec when disabling MMU
  * Minor cleanups and re-base

v2:
  * Properly populate the ioresources tree, so that it can be
used later on for implementing strict /dev/mem.
  * Minor cleanups and re-base

Signed-off-by: Nick Kossifidis 
---
  arch/riscv/include/asm/elf.h|  6 +++
  arch/riscv/include/asm/kexec.h  | 19 ---
  arch/riscv/kernel/Makefile  |  2 +-
  arch/riscv/kernel/crash_save_regs.S | 56 +
  arch/riscv/kernel/kexec_relocate.S  | 68 -
  arch/riscv/kernel/machine_kexec.c   | 43 +---
  arch/riscv/kernel/setup.c   | 11 -
  arch/riscv/mm/init.c| 77 +
  8 files changed, 255 insertions(+), 27 deletions(-)
  create mode 100644 arch/riscv/kernel/crash_save_regs.S

diff --git a/arch/riscv/include/asm/elf.h b/arch/riscv/include/asm/elf.h
index 5c725e1df..f4b490cd0 100644
--- a/arch/riscv/include/asm/elf.h
+++ b/arch/riscv/include/asm/elf.h
@@ -81,4 +81,10 @@ extern int arch_setup_additional_pages(struct linux_binprm 
*bprm,
int uses_interp);
  #endif /* CONFIG_MMU */
  
+#define ELF_CORE_COPY_REGS(dest, regs)			\

+do {   \
+   *(struct user_regs_struct *)&(dest) =   \
+   *(struct user_regs_struct *)regs;   \
+} while (0);
+
  #endif /* _ASM_RISCV_ELF_H */
diff --git a/arch/riscv/include/asm/kexec.h b/arch/riscv/include/asm/kexec.h
index efc69feb4..4fd583acc 100644
--- a/arch/riscv/include/asm/kexec.h
+++ b/arch/riscv/include/asm/kexec.h
@@ -21,11 +21,16 @@
  
  #define KEXEC_ARCH KEXEC_ARCH_RISCV
  
+extern void riscv_crash_save_regs(struct pt_regs *newregs);

+
  static inline void
  crash_setup_regs(struct pt_regs *newregs,
 struct pt_regs *oldregs)
  {
-   /* Dummy implementation for now */
+   if (oldregs)
+   memcpy(newregs, oldregs, sizeof(struct pt_regs));
+   else
+   riscv_crash_save_regs(newregs);
  }
  
  
@@ -38,10 +43,12 @@ struct kimage_arch {

  const extern unsigned char riscv_kexec_relocate[];
  const extern unsigned int riscv_kexec_relocate_size;
  
-typedef void (*riscv_kexec_do_relocate)(unsigned long first_ind_entry,

-   unsigned long jump_addr,
-   unsigned long fdt_addr,
-   unsigned long hartid,
-   unsigned long va_pa_off);
+typedef void (*riscv_kexec_method)(unsigned long first_ind_entry,
+  unsigned long jump_addr,
+  unsigned long fdt_addr,
+  unsigned long hartid,
+  unsigned long va_pa_off);
+
+extern riscv_kexec_method riscv_kexec_norelocate;
  
  #endif

diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index c2594018c..07f676ad3 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -58,7 +58,7 @@ obj-$(CONFIG_SMP) += cpu_ops_sbi.o
  endif
  obj-$(CONFIG_HOTPLUG_CPU) += cpu-hotplug.o
  obj-$(CONFIG_KGDB)+= kgdb.o
-obj-${CONFIG_KEXEC}+= kexec_relocate.o machine_kexec.o
+obj-${CONFIG_KEXEC}+= kexec_relocate.o crash_save_regs.o 
machine_kexec.o
  
  obj-$(CONFIG_JUMP_LABEL)	+= jump_label.o
  
diff --git a/arch/riscv/kernel/crash_save_regs.S b/arch/riscv/kernel/crash_save_regs.S

new file mode 100644
index 0..7832fb763
--- /dev/null
+++ b/arch/riscv/kernel/crash_save_regs.S
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2020 FORTH-ICS/CARV
+ *  Nick Kossifidis 
+ */
+
+#include  /* For RISCV_* and REG_* macros */
+#include  /* For CSR_* macros */
+#include  /* For offsets on pt_regs */
+#include/* For SYM_* macros */
+
+.section ".text"
+SYM_CODE_START(riscv_crash_save_regs)
+   REG_S ra,  PT_RA(a0)/* x1 */
+   REG_S sp,  PT_SP(a0)/* x2 */
+   REG_S gp,  PT_GP(a0)/* x3 */
+   REG_S tp,  PT_TP(a0)/* x4 */
+   REG_S t0,  PT_T0(a0)/* x5 */
+   REG_S t1,  PT_T1(a0)/* x6 */
+   REG_S t2,  

Re: [PATCH v6] RISC-V: enable XIP

2021-04-06 Thread Alex Ghiti



Le 4/6/21 à 3:54 AM, Vitaly Wool a écrit :

On Tue, Apr 6, 2021 at 8:47 AM Alex Ghiti  wrote:


Hi Vitaly,

Le 4/5/21 à 4:34 AM, Vitaly Wool a écrit :

On Sun, Apr 4, 2021 at 10:39 AM Vitaly Wool  wrote:


On Sat, Apr 3, 2021 at 12:00 PM Alex Ghiti  wrote:


Hi Vitaly,

Le 4/1/21 à 7:10 AM, Alex Ghiti a écrit :

Le 4/1/21 à 4:52 AM, Vitaly Wool a écrit :

Hi Alex,

On Thu, Apr 1, 2021 at 10:11 AM Alex Ghiti  wrote:


Hi,

Le 3/30/21 à 4:04 PM, Alex Ghiti a écrit :

Le 3/30/21 à 3:33 PM, Palmer Dabbelt a écrit :

On Tue, 30 Mar 2021 11:39:10 PDT (-0700), a...@ghiti.fr wrote:



Le 3/30/21 à 2:26 AM, Vitaly Wool a écrit :

On Tue, Mar 30, 2021 at 8:23 AM Palmer Dabbelt
 wrote:


On Sun, 21 Mar 2021 17:12:15 PDT (-0700), vitaly.w...@konsulko.com
wrote:

Introduce XIP (eXecute In Place) support for RISC-V platforms.
It allows code to be executed directly from non-volatile storage
directly addressable by the CPU, such as QSPI NOR flash which can
be found on many RISC-V platforms. This makes way for significant
optimization of RAM footprint. The XIP kernel is not compressed
since it has to run directly from flash, so it will occupy more
space on the non-volatile storage. The physical flash address used
to link the kernel object files and for storing it has to be known
at compile time and is represented by a Kconfig option.

XIP on RISC-V will for the time being only work on MMU-enabled
kernels.

Signed-off-by: Vitaly Wool 

---

Changes in v2:
- dedicated macro for XIP address fixup when MMU is not enabled
yet
  o both for 32-bit and 64-bit RISC-V
- SP is explicitly set to a safe place in RAM before
__copy_data call
- removed redundant alignment requirements in vmlinux-xip.lds.S
- changed long -> uintptr_t typecast in __XIP_FIXUP macro.
Changes in v3:
- rebased against latest for-next
- XIP address fixup macro now takes an argument
- SMP related fixes
Changes in v4:
- rebased against the current for-next
- less #ifdef's in C/ASM code
- dedicated XIP_FIXUP_OFFSET assembler macro in head.S
- C-specific definitions moved into #ifndef __ASSEMBLY__
- Fixed multi-core boot
Changes in v5:
- fixed build error for non-XIP kernels
Changes in v6:
- XIP_PHYS_RAM_BASE config option renamed to PHYS_RAM_BASE
- added PHYS_RAM_BASE_FIXED config flag to allow usage of
  PHYS_RAM_BASE in non-XIP configurations if needed
- XIP_FIXUP macro rewritten with a temporary variable to avoid
side
  effects
- fixed crash for non-XIP kernels that don't use built-in DTB


So v5 landed on for-next, which generally means it's best to avoid
re-spinning the patch and instead send along fixups.  That said,
the v5
is causing some testing failures for me.

I'm going to drop the v5 for now as I don't have time to test this
tonight.  I'll try and take a look soon, as it will conflict with
Alex's
patches.


I can come up with the incremental patch instead pretty much
straight
away if that works better.

~Vitaly


 arch/riscv/Kconfig  |  49 ++-
 arch/riscv/Makefile |   8 +-
 arch/riscv/boot/Makefile|  13 +++
 arch/riscv/include/asm/pgtable.h|  65 --
 arch/riscv/kernel/cpu_ops_sbi.c |  11 ++-
 arch/riscv/kernel/head.S|  49 ++-
 arch/riscv/kernel/head.h|   3 +
 arch/riscv/kernel/setup.c   |   8 +-
 arch/riscv/kernel/vmlinux-xip.lds.S | 132

 arch/riscv/kernel/vmlinux.lds.S |   6 ++
 arch/riscv/mm/init.c| 100 +++--
 11 files changed, 426 insertions(+), 18 deletions(-)
 create mode 100644 arch/riscv/kernel/vmlinux-xip.lds.S

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 8ea60a0a19ae..bd6f82240c34 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -441,7 +441,7 @@ config EFI_STUB

 config EFI
  bool "UEFI runtime support"
- depends on OF
+ depends on OF && !XIP_KERNEL
  select LIBFDT
  select UCS2_STRING
  select EFI_PARAMS_FROM_FDT
@@ -465,11 +465,56 @@ config STACKPROTECTOR_PER_TASK
  def_bool y
  depends on STACKPROTECTOR && CC_HAVE_STACKPROTECTOR_TLS

+config PHYS_RAM_BASE_FIXED
+ bool "Explicitly specified physical RAM address"
+ default n
+
+config PHYS_RAM_BASE
+ hex "Platform Physical RAM address"
+ depends on PHYS_RAM_BASE_FIXED
+ default "0x8000"
+ help
+   This is the physical address of RAM in the system. It has
to be
+   explicitly specified to run early relocations of
read-write data
+   from flash to RAM.
+
+config XIP_KERNEL
+ bool "Kernel Execute-In-Place from ROM"
+ depends on MMU
+ select PHYS_RAM_BASE_FIXED
+ help
+   Execute-In-Place allows the kernel to run from
non-volatile storage
+   directly addressable by the CPU, such as NOR flash. This
saves RAM
+   space since the text secti

Re: [PATCH] driver: of: Properly truncate command line if too long

2021-04-06 Thread Alex Ghiti

Le 4/6/21 à 9:40 AM, Rob Herring a écrit :

On Sat, Apr 3, 2021 at 7:09 AM Alex Ghiti  wrote:


Hi,

Le 3/16/21 à 3:38 PM, Alexandre Ghiti a écrit :

In case the command line given by the user is too long, warn about it
and truncate it to the last full argument.

This is what efi already does in commit 80b1bfe1cb2f ("efi/libstub:
Don't parse overlong command lines").

Reported-by: Dmitry Vyukov 
Signed-off-by: Alexandre Ghiti 
---
   drivers/of/fdt.c | 21 -
   1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index dcc1dd96911a..de4c6f9bac39 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -25,6 +25,7 @@
   #include 
   #include 
   #include 
+#include 

   #include   /* for COMMAND_LINE_SIZE */
   #include 
@@ -1050,9 +1051,27 @@ int __init early_init_dt_scan_chosen(unsigned long node, 
const char *uname,

   /* Retrieve command line */
   p = of_get_flat_dt_prop(node, "bootargs", &l);
- if (p != NULL && l > 0)
+ if (p != NULL && l > 0) {
   strlcpy(data, p, min(l, COMMAND_LINE_SIZE));

+ /*
+  * If the given command line size is larger than
+  * COMMAND_LINE_SIZE, truncate it to the last complete
+  * parameter.
+  */
+ if (l > COMMAND_LINE_SIZE) {
+ char *cmd_p = (char *)data + COMMAND_LINE_SIZE - 1;
+
+ while (!isspace(*cmd_p))
+ cmd_p--;
+
+ *cmd_p = '\0';
+
+ pr_err("Command line is too long: truncated to %d bytes\n",
+(int)(cmd_p - (char *)data + 1));
+ }
+ }
+
   /*
* CONFIG_CMDLINE is meant to be a default in case nothing else
* managed to set the command line, unless CONFIG_CMDLINE_FORCE



Any thoughts about that?


It looks fine to me, but this will need to be adapted to the generic
command line support[1][2] when that is merged. So I've been waiting
to see if that's going to happen this cycle.


Ok I'll take a look then, thanks.

Alex



Rob

[1] 
https://lore.kernel.org/lkml/cover.1616765869.git.christophe.le...@csgroup.eu/
[2] 
https://lore.kernel.org/lkml/41021d66db2ab427c14255d2a24bb4517c8b58fd.1617126961.git.danie...@cisco.com/



Re: [PATCH v6] RISC-V: enable XIP

2021-04-06 Thread Alex Ghiti

Hi Vitaly,

Le 4/5/21 à 4:34 AM, Vitaly Wool a écrit :

On Sun, Apr 4, 2021 at 10:39 AM Vitaly Wool  wrote:


On Sat, Apr 3, 2021 at 12:00 PM Alex Ghiti  wrote:


Hi Vitaly,

Le 4/1/21 à 7:10 AM, Alex Ghiti a écrit :

Le 4/1/21 à 4:52 AM, Vitaly Wool a écrit :

Hi Alex,

On Thu, Apr 1, 2021 at 10:11 AM Alex Ghiti  wrote:


Hi,

Le 3/30/21 à 4:04 PM, Alex Ghiti a écrit :

Le 3/30/21 à 3:33 PM, Palmer Dabbelt a écrit :

On Tue, 30 Mar 2021 11:39:10 PDT (-0700), a...@ghiti.fr wrote:



Le 3/30/21 à 2:26 AM, Vitaly Wool a écrit :

On Tue, Mar 30, 2021 at 8:23 AM Palmer Dabbelt
 wrote:


On Sun, 21 Mar 2021 17:12:15 PDT (-0700), vitaly.w...@konsulko.com
wrote:

Introduce XIP (eXecute In Place) support for RISC-V platforms.
It allows code to be executed directly from non-volatile storage
directly addressable by the CPU, such as QSPI NOR flash which can
be found on many RISC-V platforms. This makes way for significant
optimization of RAM footprint. The XIP kernel is not compressed
since it has to run directly from flash, so it will occupy more
space on the non-volatile storage. The physical flash address used
to link the kernel object files and for storing it has to be known
at compile time and is represented by a Kconfig option.

XIP on RISC-V will for the time being only work on MMU-enabled
kernels.

Signed-off-by: Vitaly Wool 

---

Changes in v2:
- dedicated macro for XIP address fixup when MMU is not enabled
yet
 o both for 32-bit and 64-bit RISC-V
- SP is explicitly set to a safe place in RAM before
__copy_data call
- removed redundant alignment requirements in vmlinux-xip.lds.S
- changed long -> uintptr_t typecast in __XIP_FIXUP macro.
Changes in v3:
- rebased against latest for-next
- XIP address fixup macro now takes an argument
- SMP related fixes
Changes in v4:
- rebased against the current for-next
- less #ifdef's in C/ASM code
- dedicated XIP_FIXUP_OFFSET assembler macro in head.S
- C-specific definitions moved into #ifndef __ASSEMBLY__
- Fixed multi-core boot
Changes in v5:
- fixed build error for non-XIP kernels
Changes in v6:
- XIP_PHYS_RAM_BASE config option renamed to PHYS_RAM_BASE
- added PHYS_RAM_BASE_FIXED config flag to allow usage of
 PHYS_RAM_BASE in non-XIP configurations if needed
- XIP_FIXUP macro rewritten with a temporary variable to avoid side
  effects
- fixed crash for non-XIP kernels that don't use built-in DTB


So v5 landed on for-next, which generally means it's best to avoid
re-spinning the patch and instead send along fixups.  That said,
the v5
is causing some testing failures for me.

I'm going to drop the v5 for now as I don't have time to test this
tonight.  I'll try and take a look soon, as it will conflict with
Alex's
patches.


I can come up with the incremental patch instead pretty much
straight
away if that works better.

~Vitaly


arch/riscv/Kconfig  |  49 ++-
arch/riscv/Makefile |   8 +-
arch/riscv/boot/Makefile|  13 +++
arch/riscv/include/asm/pgtable.h|  65 --
arch/riscv/kernel/cpu_ops_sbi.c |  11 ++-
arch/riscv/kernel/head.S|  49 ++-
arch/riscv/kernel/head.h|   3 +
arch/riscv/kernel/setup.c   |   8 +-
arch/riscv/kernel/vmlinux-xip.lds.S | 132

arch/riscv/kernel/vmlinux.lds.S |   6 ++
arch/riscv/mm/init.c| 100 +++--
11 files changed, 426 insertions(+), 18 deletions(-)
create mode 100644 arch/riscv/kernel/vmlinux-xip.lds.S

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 8ea60a0a19ae..bd6f82240c34 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -441,7 +441,7 @@ config EFI_STUB

config EFI
 bool "UEFI runtime support"
- depends on OF
+ depends on OF && !XIP_KERNEL
 select LIBFDT
 select UCS2_STRING
 select EFI_PARAMS_FROM_FDT
@@ -465,11 +465,56 @@ config STACKPROTECTOR_PER_TASK
 def_bool y
 depends on STACKPROTECTOR && CC_HAVE_STACKPROTECTOR_TLS

+config PHYS_RAM_BASE_FIXED
+ bool "Explicitly specified physical RAM address"
+ default n
+
+config PHYS_RAM_BASE
+ hex "Platform Physical RAM address"
+ depends on PHYS_RAM_BASE_FIXED
+ default "0x8000"
+ help
+   This is the physical address of RAM in the system. It has
to be
+   explicitly specified to run early relocations of
read-write data
+   from flash to RAM.
+
+config XIP_KERNEL
+ bool "Kernel Execute-In-Place from ROM"
+ depends on MMU
+ select PHYS_RAM_BASE_FIXED
+ help
+   Execute-In-Place allows the kernel to run from
non-volatile storage
+   directly addressable by the CPU, such as NOR flash. This
saves RAM
+   space since the text section of the kernel is not loaded
from flash
+   to RAM.  Read-write sections, such as the data section and
stack,
+  

Re: [PATCH] driver: of: Properly truncate command line if too long

2021-04-03 Thread Alex Ghiti

Hi,

On 3/16/21 at 3:38 PM, Alexandre Ghiti wrote:

In case the command line given by the user is too long, warn about it
and truncate it to the last full argument.

This is what efi already does in commit 80b1bfe1cb2f ("efi/libstub:
Don't parse overlong command lines").

Reported-by: Dmitry Vyukov 
Signed-off-by: Alexandre Ghiti 
---
  drivers/of/fdt.c | 21 -
  1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index dcc1dd96911a..de4c6f9bac39 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -25,6 +25,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include   /* for COMMAND_LINE_SIZE */

  #include 
@@ -1050,9 +1051,27 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,
  
  	/* Retrieve command line */

p = of_get_flat_dt_prop(node, "bootargs", &l);
-   if (p != NULL && l > 0)
+   if (p != NULL && l > 0) {
strlcpy(data, p, min(l, COMMAND_LINE_SIZE));
  
+		/*
+		 * If the given command line size is larger than
+		 * COMMAND_LINE_SIZE, truncate it to the last complete
+		 * parameter.
+		 */
+		if (l > COMMAND_LINE_SIZE) {
+			char *cmd_p = (char *)data + COMMAND_LINE_SIZE - 1;
+
+			while (!isspace(*cmd_p))
+				cmd_p--;
+
+			*cmd_p = '\0';
+
+			pr_err("Command line is too long: truncated to %d bytes\n",
+			       (int)(cmd_p - (char *)data + 1));
+		}
+	}
+
/*
 * CONFIG_CMDLINE is meant to be a default in case nothing else
 * managed to set the command line, unless CONFIG_CMDLINE_FORCE



Any thoughts about this?

Thanks,

Alex


Re: [PATCH v6] RISC-V: enable XIP

2021-04-03 Thread Alex Ghiti

Hi Vitaly,

On 4/1/21 at 7:10 AM, Alex Ghiti wrote:

On 4/1/21 at 4:52 AM, Vitaly Wool wrote:

Hi Alex,

On Thu, Apr 1, 2021 at 10:11 AM Alex Ghiti  wrote:


Hi,

On 3/30/21 at 4:04 PM, Alex Ghiti wrote:

On 3/30/21 at 3:33 PM, Palmer Dabbelt wrote:

On Tue, 30 Mar 2021 11:39:10 PDT (-0700), a...@ghiti.fr wrote:



On 3/30/21 at 2:26 AM, Vitaly Wool wrote:

On Tue, Mar 30, 2021 at 8:23 AM Palmer Dabbelt
 wrote:


On Sun, 21 Mar 2021 17:12:15 PDT (-0700), vitaly.w...@konsulko.com
wrote:

Introduce XIP (eXecute In Place) support for RISC-V platforms.
It allows code to be executed directly from non-volatile storage
directly addressable by the CPU, such as QSPI NOR flash which can
be found on many RISC-V platforms. This makes way for significant
optimization of RAM footprint. The XIP kernel is not compressed
since it has to run directly from flash, so it will occupy more
space on the non-volatile storage. The physical flash address used
to link the kernel object files and for storing it has to be known
at compile time and is represented by a Kconfig option.

XIP on RISC-V will for the time being only work on MMU-enabled
kernels.

Signed-off-by: Vitaly Wool 

---

Changes in v2:
- dedicated macro for XIP address fixup when MMU is not enabled 
yet

    o both for 32-bit and 64-bit RISC-V
- SP is explicitly set to a safe place in RAM before 
__copy_data call

- removed redundant alignment requirements in vmlinux-xip.lds.S
- changed long -> uintptr_t typecast in __XIP_FIXUP macro.
Changes in v3:
- rebased against latest for-next
- XIP address fixup macro now takes an argument
- SMP related fixes
Changes in v4:
- rebased against the current for-next
- less #ifdef's in C/ASM code
- dedicated XIP_FIXUP_OFFSET assembler macro in head.S
- C-specific definitions moved into #ifndef __ASSEMBLY__
- Fixed multi-core boot
Changes in v5:
- fixed build error for non-XIP kernels
Changes in v6:
- XIP_PHYS_RAM_BASE config option renamed to PHYS_RAM_BASE
- added PHYS_RAM_BASE_FIXED config flag to allow usage of
    PHYS_RAM_BASE in non-XIP configurations if needed
- XIP_FIXUP macro rewritten with a temporary variable to avoid side
    effects
- fixed crash for non-XIP kernels that don't use built-in DTB


So v5 landed on for-next, which generally means it's best to avoid
re-spinning the patch and instead send along fixups.  That said,
the v5
is causing some testing failures for me.

I'm going to drop the v5 for now as I don't have time to test this
tonight.  I'll try and take a look soon, as it will conflict with
Alex's
patches.


I can come up with the incremental patch instead pretty much 
straight

away if that works better.

~Vitaly


   arch/riscv/Kconfig  |  49 ++-
   arch/riscv/Makefile |   8 +-
   arch/riscv/boot/Makefile    |  13 +++
   arch/riscv/include/asm/pgtable.h    |  65 --
   arch/riscv/kernel/cpu_ops_sbi.c |  11 ++-
   arch/riscv/kernel/head.S    |  49 ++-
   arch/riscv/kernel/head.h    |   3 +
   arch/riscv/kernel/setup.c   |   8 +-
   arch/riscv/kernel/vmlinux-xip.lds.S | 132

   arch/riscv/kernel/vmlinux.lds.S |   6 ++
   arch/riscv/mm/init.c    | 100 +++--
   11 files changed, 426 insertions(+), 18 deletions(-)
   create mode 100644 arch/riscv/kernel/vmlinux-xip.lds.S

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 8ea60a0a19ae..bd6f82240c34 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -441,7 +441,7 @@ config EFI_STUB

   config EFI
    bool "UEFI runtime support"
- depends on OF
+ depends on OF && !XIP_KERNEL
    select LIBFDT
    select UCS2_STRING
    select EFI_PARAMS_FROM_FDT
@@ -465,11 +465,56 @@ config STACKPROTECTOR_PER_TASK
    def_bool y
    depends on STACKPROTECTOR && CC_HAVE_STACKPROTECTOR_TLS

+config PHYS_RAM_BASE_FIXED
+ bool "Explicitly specified physical RAM address"
+ default n
+
+config PHYS_RAM_BASE
+ hex "Platform Physical RAM address"
+ depends on PHYS_RAM_BASE_FIXED
+ default "0x8000"
+ help
+   This is the physical address of RAM in the system. It has
to be
+   explicitly specified to run early relocations of
read-write data
+   from flash to RAM.
+
+config XIP_KERNEL
+ bool "Kernel Execute-In-Place from ROM"
+ depends on MMU
+ select PHYS_RAM_BASE_FIXED
+ help
+   Execute-In-Place allows the kernel to run from
non-volatile storage
+   directly addressable by the CPU, such as NOR flash. This
saves RAM
+   space since the text section of the kernel is not loaded
from flash
+   to RAM.  Read-write sections, such as the data section and
stack,
+   are still copied to RAM.  The XIP kernel is not compressed
since
+   it has to run directly from flash, so it will take more
space to
+   store it.  The flash 

Re: [PATCH v6] RISC-V: enable XIP

2021-04-01 Thread Alex Ghiti

On 4/1/21 at 4:52 AM, Vitaly Wool wrote:

Hi Alex,

On Thu, Apr 1, 2021 at 10:11 AM Alex Ghiti  wrote:


Hi,

On 3/30/21 at 4:04 PM, Alex Ghiti wrote:

On 3/30/21 at 3:33 PM, Palmer Dabbelt wrote:

On Tue, 30 Mar 2021 11:39:10 PDT (-0700), a...@ghiti.fr wrote:



On 3/30/21 at 2:26 AM, Vitaly Wool wrote:

On Tue, Mar 30, 2021 at 8:23 AM Palmer Dabbelt
 wrote:


On Sun, 21 Mar 2021 17:12:15 PDT (-0700), vitaly.w...@konsulko.com
wrote:

Introduce XIP (eXecute In Place) support for RISC-V platforms.
It allows code to be executed directly from non-volatile storage
directly addressable by the CPU, such as QSPI NOR flash which can
be found on many RISC-V platforms. This makes way for significant
optimization of RAM footprint. The XIP kernel is not compressed
since it has to run directly from flash, so it will occupy more
space on the non-volatile storage. The physical flash address used
to link the kernel object files and for storing it has to be known
at compile time and is represented by a Kconfig option.

XIP on RISC-V will for the time being only work on MMU-enabled
kernels.

Signed-off-by: Vitaly Wool 

---

Changes in v2:
- dedicated macro for XIP address fixup when MMU is not enabled yet
o both for 32-bit and 64-bit RISC-V
- SP is explicitly set to a safe place in RAM before __copy_data call
- removed redundant alignment requirements in vmlinux-xip.lds.S
- changed long -> uintptr_t typecast in __XIP_FIXUP macro.
Changes in v3:
- rebased against latest for-next
- XIP address fixup macro now takes an argument
- SMP related fixes
Changes in v4:
- rebased against the current for-next
- less #ifdef's in C/ASM code
- dedicated XIP_FIXUP_OFFSET assembler macro in head.S
- C-specific definitions moved into #ifndef __ASSEMBLY__
- Fixed multi-core boot
Changes in v5:
- fixed build error for non-XIP kernels
Changes in v6:
- XIP_PHYS_RAM_BASE config option renamed to PHYS_RAM_BASE
- added PHYS_RAM_BASE_FIXED config flag to allow usage of
PHYS_RAM_BASE in non-XIP configurations if needed
- XIP_FIXUP macro rewritten with a temporary variable to avoid side
effects
- fixed crash for non-XIP kernels that don't use built-in DTB


So v5 landed on for-next, which generally means it's best to avoid
re-spinning the patch and instead send along fixups.  That said,
the v5
is causing some testing failures for me.

I'm going to drop the v5 for now as I don't have time to test this
tonight.  I'll try and take a look soon, as it will conflict with
Alex's
patches.


I can come up with the incremental patch instead pretty much straight
away if that works better.

~Vitaly


   arch/riscv/Kconfig  |  49 ++-
   arch/riscv/Makefile |   8 +-
   arch/riscv/boot/Makefile|  13 +++
   arch/riscv/include/asm/pgtable.h|  65 --
   arch/riscv/kernel/cpu_ops_sbi.c |  11 ++-
   arch/riscv/kernel/head.S|  49 ++-
   arch/riscv/kernel/head.h|   3 +
   arch/riscv/kernel/setup.c   |   8 +-
   arch/riscv/kernel/vmlinux-xip.lds.S | 132

   arch/riscv/kernel/vmlinux.lds.S |   6 ++
   arch/riscv/mm/init.c| 100 +++--
   11 files changed, 426 insertions(+), 18 deletions(-)
   create mode 100644 arch/riscv/kernel/vmlinux-xip.lds.S

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 8ea60a0a19ae..bd6f82240c34 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -441,7 +441,7 @@ config EFI_STUB

   config EFI
bool "UEFI runtime support"
- depends on OF
+ depends on OF && !XIP_KERNEL
select LIBFDT
select UCS2_STRING
select EFI_PARAMS_FROM_FDT
@@ -465,11 +465,56 @@ config STACKPROTECTOR_PER_TASK
def_bool y
depends on STACKPROTECTOR && CC_HAVE_STACKPROTECTOR_TLS

+config PHYS_RAM_BASE_FIXED
+ bool "Explicitly specified physical RAM address"
+ default n
+
+config PHYS_RAM_BASE
+ hex "Platform Physical RAM address"
+ depends on PHYS_RAM_BASE_FIXED
+ default "0x8000"
+ help
+   This is the physical address of RAM in the system. It has
to be
+   explicitly specified to run early relocations of
read-write data
+   from flash to RAM.
+
+config XIP_KERNEL
+ bool "Kernel Execute-In-Place from ROM"
+ depends on MMU
+ select PHYS_RAM_BASE_FIXED
+ help
+   Execute-In-Place allows the kernel to run from
non-volatile storage
+   directly addressable by the CPU, such as NOR flash. This
saves RAM
+   space since the text section of the kernel is not loaded
from flash
+   to RAM.  Read-write sections, such as the data section and
stack,
+   are still copied to RAM.  The XIP kernel is not compressed
since
+   it has to run directly from flash, so it will take more
space to
+   store it.  The flash address used to link the kernel
object files,
+   and for storing

Re: [PATCH v6] RISC-V: enable XIP

2021-04-01 Thread Alex Ghiti

Hi,

On 3/30/21 at 4:04 PM, Alex Ghiti wrote:

On 3/30/21 at 3:33 PM, Palmer Dabbelt wrote:

On Tue, 30 Mar 2021 11:39:10 PDT (-0700), a...@ghiti.fr wrote:



On 3/30/21 at 2:26 AM, Vitaly Wool wrote:
On Tue, Mar 30, 2021 at 8:23 AM Palmer Dabbelt 
 wrote:


On Sun, 21 Mar 2021 17:12:15 PDT (-0700), vitaly.w...@konsulko.com 
wrote:

Introduce XIP (eXecute In Place) support for RISC-V platforms.
It allows code to be executed directly from non-volatile storage
directly addressable by the CPU, such as QSPI NOR flash which can
be found on many RISC-V platforms. This makes way for significant
optimization of RAM footprint. The XIP kernel is not compressed
since it has to run directly from flash, so it will occupy more
space on the non-volatile storage. The physical flash address used
to link the kernel object files and for storing it has to be known
at compile time and is represented by a Kconfig option.

XIP on RISC-V will for the time being only work on MMU-enabled
kernels.

Signed-off-by: Vitaly Wool 

---

Changes in v2:
- dedicated macro for XIP address fixup when MMU is not enabled yet
   o both for 32-bit and 64-bit RISC-V
- SP is explicitly set to a safe place in RAM before __copy_data call
- removed redundant alignment requirements in vmlinux-xip.lds.S
- changed long -> uintptr_t typecast in __XIP_FIXUP macro.
Changes in v3:
- rebased against latest for-next
- XIP address fixup macro now takes an argument
- SMP related fixes
Changes in v4:
- rebased against the current for-next
- less #ifdef's in C/ASM code
- dedicated XIP_FIXUP_OFFSET assembler macro in head.S
- C-specific definitions moved into #ifndef __ASSEMBLY__
- Fixed multi-core boot
Changes in v5:
- fixed build error for non-XIP kernels
Changes in v6:
- XIP_PHYS_RAM_BASE config option renamed to PHYS_RAM_BASE
- added PHYS_RAM_BASE_FIXED config flag to allow usage of
   PHYS_RAM_BASE in non-XIP configurations if needed
- XIP_FIXUP macro rewritten with a temporary variable to avoid side
   effects
- fixed crash for non-XIP kernels that don't use built-in DTB


So v5 landed on for-next, which generally means it's best to avoid
re-spinning the patch and instead send along fixups.  That said, 
the v5

is causing some testing failures for me.

I'm going to drop the v5 for now as I don't have time to test this
tonight.  I'll try and take a look soon, as it will conflict with 
Alex's

patches.


I can come up with the incremental patch instead pretty much straight
away if that works better.

~Vitaly


  arch/riscv/Kconfig  |  49 ++-
  arch/riscv/Makefile |   8 +-
  arch/riscv/boot/Makefile    |  13 +++
  arch/riscv/include/asm/pgtable.h    |  65 --
  arch/riscv/kernel/cpu_ops_sbi.c |  11 ++-
  arch/riscv/kernel/head.S    |  49 ++-
  arch/riscv/kernel/head.h    |   3 +
  arch/riscv/kernel/setup.c   |   8 +-
  arch/riscv/kernel/vmlinux-xip.lds.S | 132 


  arch/riscv/kernel/vmlinux.lds.S |   6 ++
  arch/riscv/mm/init.c    | 100 +++--
  11 files changed, 426 insertions(+), 18 deletions(-)
  create mode 100644 arch/riscv/kernel/vmlinux-xip.lds.S

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 8ea60a0a19ae..bd6f82240c34 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -441,7 +441,7 @@ config EFI_STUB

  config EFI
   bool "UEFI runtime support"
- depends on OF
+ depends on OF && !XIP_KERNEL
   select LIBFDT
   select UCS2_STRING
   select EFI_PARAMS_FROM_FDT
@@ -465,11 +465,56 @@ config STACKPROTECTOR_PER_TASK
   def_bool y
   depends on STACKPROTECTOR && CC_HAVE_STACKPROTECTOR_TLS

+config PHYS_RAM_BASE_FIXED
+ bool "Explicitly specified physical RAM address"
+ default n
+
+config PHYS_RAM_BASE
+ hex "Platform Physical RAM address"
+ depends on PHYS_RAM_BASE_FIXED
+ default "0x8000"
+ help
+   This is the physical address of RAM in the system. It has 
to be
+   explicitly specified to run early relocations of 
read-write data

+   from flash to RAM.
+
+config XIP_KERNEL
+ bool "Kernel Execute-In-Place from ROM"
+ depends on MMU
+ select PHYS_RAM_BASE_FIXED
+ help
+   Execute-In-Place allows the kernel to run from 
non-volatile storage
+   directly addressable by the CPU, such as NOR flash. This 
saves RAM
+   space since the text section of the kernel is not loaded 
from flash
+   to RAM.  Read-write sections, such as the data section and 
stack,
+   are still copied to RAM.  The XIP kernel is not compressed 
since
+   it has to run directly from flash, so it will take more 
space to
+   store it.  The flash address used to link the kernel 
object files,
+   and for storing it, is configuration dependent. Therefore, 
if you
+   say Y here, you must know the proper physical address

Re: [PATCH v6] RISC-V: enable XIP

2021-03-30 Thread Alex Ghiti

On 3/30/21 at 3:33 PM, Palmer Dabbelt wrote:

On Tue, 30 Mar 2021 11:39:10 PDT (-0700), a...@ghiti.fr wrote:



On 3/30/21 at 2:26 AM, Vitaly Wool wrote:
On Tue, Mar 30, 2021 at 8:23 AM Palmer Dabbelt 
 wrote:


On Sun, 21 Mar 2021 17:12:15 PDT (-0700), vitaly.w...@konsulko.com 
wrote:

Introduce XIP (eXecute In Place) support for RISC-V platforms.
It allows code to be executed directly from non-volatile storage
directly addressable by the CPU, such as QSPI NOR flash which can
be found on many RISC-V platforms. This makes way for significant
optimization of RAM footprint. The XIP kernel is not compressed
since it has to run directly from flash, so it will occupy more
space on the non-volatile storage. The physical flash address used
to link the kernel object files and for storing it has to be known
at compile time and is represented by a Kconfig option.

XIP on RISC-V will for the time being only work on MMU-enabled
kernels.

Signed-off-by: Vitaly Wool 

---

Changes in v2:
- dedicated macro for XIP address fixup when MMU is not enabled yet
   o both for 32-bit and 64-bit RISC-V
- SP is explicitly set to a safe place in RAM before __copy_data call
- removed redundant alignment requirements in vmlinux-xip.lds.S
- changed long -> uintptr_t typecast in __XIP_FIXUP macro.
Changes in v3:
- rebased against latest for-next
- XIP address fixup macro now takes an argument
- SMP related fixes
Changes in v4:
- rebased against the current for-next
- less #ifdef's in C/ASM code
- dedicated XIP_FIXUP_OFFSET assembler macro in head.S
- C-specific definitions moved into #ifndef __ASSEMBLY__
- Fixed multi-core boot
Changes in v5:
- fixed build error for non-XIP kernels
Changes in v6:
- XIP_PHYS_RAM_BASE config option renamed to PHYS_RAM_BASE
- added PHYS_RAM_BASE_FIXED config flag to allow usage of
   PHYS_RAM_BASE in non-XIP configurations if needed
- XIP_FIXUP macro rewritten with a temporary variable to avoid side
   effects
- fixed crash for non-XIP kernels that don't use built-in DTB


So v5 landed on for-next, which generally means it's best to avoid
re-spinning the patch and instead send along fixups.  That said, the v5
is causing some testing failures for me.

I'm going to drop the v5 for now as I don't have time to test this
tonight.  I'll try and take a look soon, as it will conflict with 
Alex's

patches.


I can come up with the incremental patch instead pretty much straight
away if that works better.

~Vitaly


  arch/riscv/Kconfig  |  49 ++-
  arch/riscv/Makefile |   8 +-
  arch/riscv/boot/Makefile    |  13 +++
  arch/riscv/include/asm/pgtable.h    |  65 --
  arch/riscv/kernel/cpu_ops_sbi.c |  11 ++-
  arch/riscv/kernel/head.S    |  49 ++-
  arch/riscv/kernel/head.h    |   3 +
  arch/riscv/kernel/setup.c   |   8 +-
  arch/riscv/kernel/vmlinux-xip.lds.S | 132 


  arch/riscv/kernel/vmlinux.lds.S |   6 ++
  arch/riscv/mm/init.c    | 100 +++--
  11 files changed, 426 insertions(+), 18 deletions(-)
  create mode 100644 arch/riscv/kernel/vmlinux-xip.lds.S

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 8ea60a0a19ae..bd6f82240c34 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -441,7 +441,7 @@ config EFI_STUB

  config EFI
   bool "UEFI runtime support"
- depends on OF
+ depends on OF && !XIP_KERNEL
   select LIBFDT
   select UCS2_STRING
   select EFI_PARAMS_FROM_FDT
@@ -465,11 +465,56 @@ config STACKPROTECTOR_PER_TASK
   def_bool y
   depends on STACKPROTECTOR && CC_HAVE_STACKPROTECTOR_TLS

+config PHYS_RAM_BASE_FIXED
+ bool "Explicitly specified physical RAM address"
+ default n
+
+config PHYS_RAM_BASE
+ hex "Platform Physical RAM address"
+ depends on PHYS_RAM_BASE_FIXED
+ default "0x8000"
+ help
+   This is the physical address of RAM in the system. It has 
to be
+   explicitly specified to run early relocations of read-write 
data

+   from flash to RAM.
+
+config XIP_KERNEL
+ bool "Kernel Execute-In-Place from ROM"
+ depends on MMU
+ select PHYS_RAM_BASE_FIXED
+ help
+   Execute-In-Place allows the kernel to run from non-volatile 
storage
+   directly addressable by the CPU, such as NOR flash. This 
saves RAM
+   space since the text section of the kernel is not loaded 
from flash
+   to RAM.  Read-write sections, such as the data section and 
stack,
+   are still copied to RAM.  The XIP kernel is not compressed 
since
+   it has to run directly from flash, so it will take more 
space to
+   store it.  The flash address used to link the kernel object 
files,
+   and for storing it, is configuration dependent. Therefore, 
if you

+   say Y here, you must know the proper physical address where to
+   store the kernel image depending on your own flash memory 
usage.

+
+   Also note that the make 

Re: [PATCH v6] RISC-V: enable XIP

2021-03-30 Thread Alex Ghiti




On 3/30/21 at 2:26 AM, Vitaly Wool wrote:

On Tue, Mar 30, 2021 at 8:23 AM Palmer Dabbelt  wrote:


On Sun, 21 Mar 2021 17:12:15 PDT (-0700), vitaly.w...@konsulko.com wrote:

Introduce XIP (eXecute In Place) support for RISC-V platforms.
It allows code to be executed directly from non-volatile storage
directly addressable by the CPU, such as QSPI NOR flash which can
be found on many RISC-V platforms. This makes way for significant
optimization of RAM footprint. The XIP kernel is not compressed
since it has to run directly from flash, so it will occupy more
space on the non-volatile storage. The physical flash address used
to link the kernel object files and for storing it has to be known
at compile time and is represented by a Kconfig option.

XIP on RISC-V will for the time being only work on MMU-enabled
kernels.

Signed-off-by: Vitaly Wool 

---

Changes in v2:
- dedicated macro for XIP address fixup when MMU is not enabled yet
   o both for 32-bit and 64-bit RISC-V
- SP is explicitly set to a safe place in RAM before __copy_data call
- removed redundant alignment requirements in vmlinux-xip.lds.S
- changed long -> uintptr_t typecast in __XIP_FIXUP macro.
Changes in v3:
- rebased against latest for-next
- XIP address fixup macro now takes an argument
- SMP related fixes
Changes in v4:
- rebased against the current for-next
- less #ifdef's in C/ASM code
- dedicated XIP_FIXUP_OFFSET assembler macro in head.S
- C-specific definitions moved into #ifndef __ASSEMBLY__
- Fixed multi-core boot
Changes in v5:
- fixed build error for non-XIP kernels
Changes in v6:
- XIP_PHYS_RAM_BASE config option renamed to PHYS_RAM_BASE
- added PHYS_RAM_BASE_FIXED config flag to allow usage of
   PHYS_RAM_BASE in non-XIP configurations if needed
- XIP_FIXUP macro rewritten with a temporary variable to avoid side
   effects
- fixed crash for non-XIP kernels that don't use built-in DTB


So v5 landed on for-next, which generally means it's best to avoid
re-spinning the patch and instead send along fixups.  That said, the v5
is causing some testing failures for me.

I'm going to drop the v5 for now as I don't have time to test this
tonight.  I'll try and take a look soon, as it will conflict with Alex's
patches.


I can come up with the incremental patch instead pretty much straight
away if that works better.

~Vitaly


  arch/riscv/Kconfig  |  49 ++-
  arch/riscv/Makefile |   8 +-
  arch/riscv/boot/Makefile|  13 +++
  arch/riscv/include/asm/pgtable.h|  65 --
  arch/riscv/kernel/cpu_ops_sbi.c |  11 ++-
  arch/riscv/kernel/head.S|  49 ++-
  arch/riscv/kernel/head.h|   3 +
  arch/riscv/kernel/setup.c   |   8 +-
  arch/riscv/kernel/vmlinux-xip.lds.S | 132 
  arch/riscv/kernel/vmlinux.lds.S |   6 ++
  arch/riscv/mm/init.c| 100 +++--
  11 files changed, 426 insertions(+), 18 deletions(-)
  create mode 100644 arch/riscv/kernel/vmlinux-xip.lds.S

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 8ea60a0a19ae..bd6f82240c34 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -441,7 +441,7 @@ config EFI_STUB

  config EFI
   bool "UEFI runtime support"
- depends on OF
+ depends on OF && !XIP_KERNEL
   select LIBFDT
   select UCS2_STRING
   select EFI_PARAMS_FROM_FDT
@@ -465,11 +465,56 @@ config STACKPROTECTOR_PER_TASK
   def_bool y
   depends on STACKPROTECTOR && CC_HAVE_STACKPROTECTOR_TLS

+config PHYS_RAM_BASE_FIXED
+ bool "Explicitly specified physical RAM address"
+ default n
+
+config PHYS_RAM_BASE
+ hex "Platform Physical RAM address"
+ depends on PHYS_RAM_BASE_FIXED
+ default "0x8000"
+ help
+   This is the physical address of RAM in the system. It has to be
+   explicitly specified to run early relocations of read-write data
+   from flash to RAM.
+
+config XIP_KERNEL
+ bool "Kernel Execute-In-Place from ROM"
+ depends on MMU
+ select PHYS_RAM_BASE_FIXED
+ help
+   Execute-In-Place allows the kernel to run from non-volatile storage
+   directly addressable by the CPU, such as NOR flash. This saves RAM
+   space since the text section of the kernel is not loaded from flash
+   to RAM.  Read-write sections, such as the data section and stack,
+   are still copied to RAM.  The XIP kernel is not compressed since
+   it has to run directly from flash, so it will take more space to
+   store it.  The flash address used to link the kernel object files,
+   and for storing it, is configuration dependent. Therefore, if you
+   say Y here, you must know the proper physical address where to
+   store the kernel image depending on your own flash memory usage.
+
+   Also note that the make target becomes "make xipImage" rather than
+   "make zImage" or "make Image".  The final kernel binary to put in
+   ROM 
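To make the two Kconfig options above concrete, a hypothetical config fragment for an XIP build might look like the following. The RAM base address is an example only (the default shown in the patch is truncated in this archive); use your platform's actual DRAM base:

```
CONFIG_XIP_KERNEL=y
CONFIG_PHYS_RAM_BASE_FIXED=y
CONFIG_PHYS_RAM_BASE=0x80000000
```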

Re: [PATCH] implement flush_cache_vmap and flush_cache_vunmap for RISC-V

2021-03-30 Thread Alex Ghiti

Hi Jiuyang,

On 3/28/21 at 9:55 PM, Jiuyang Liu wrote:

This patch implements flush_cache_vmap and flush_cache_vunmap for
RISC-V, since these functions might modify PTE. Without this patch,
SFENCE.VMA won't be added to related codes, which might introduce a bug
in some out-of-order micro-architecture implementations.

Signed-off-by: Jiuyang Liu 
---
  arch/riscv/include/asm/cacheflush.h | 8 
  1 file changed, 8 insertions(+)

diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
index 23ff70350992..4adf25248c43 100644
--- a/arch/riscv/include/asm/cacheflush.h
+++ b/arch/riscv/include/asm/cacheflush.h
@@ -8,6 +8,14 @@
  
  #include 
  
+/*

+ * flush_cache_vmap and flush_cache_vunmap might modify PTE, needs SFENCE.VMA.


"might modify PTE" is not entirely true, I think: it is what happens before these functions are called that might modify the PTEs; these functions ensure those modifications are made visible.



+ * - flush_cache_vmap is invoked after map_kernel_range() has installed the page table entries.
+ * - flush_cache_vunmap is invoked before unmap_kernel_range() deletes the page table entries
+ */
+#define flush_cache_vmap(start, end) flush_tlb_all()
+#define flush_cache_vunmap(start, end) flush_tlb_all()
+
  static inline void local_flush_icache_all(void)
  {
asm volatile ("fence.i" ::: "memory");



FWIW, you can add:

Reviewed-by: Alexandre Ghiti 

Thanks,

Alex


Re: [PATCH v4 3/5] RISC-V: Initial DTS for Microchip ICICLE board

2021-03-27 Thread Alex Ghiti

Hi Atish,

On 3/3/21 at 3:02 PM, Atish Patra wrote:

Add initial DTS for Microchip ICICLE board having only
essential devices (clocks, sdhci, ethernet, serial, etc).
The device tree is based on the U-Boot patch.

https://patchwork.ozlabs.org/project/uboot/patch/20201110103414.10142-6-padmarao.beg...@microchip.com/

Signed-off-by: Atish Patra 
---
  arch/riscv/boot/dts/Makefile  |   1 +
  arch/riscv/boot/dts/microchip/Makefile|   2 +
  .../microchip/microchip-mpfs-icicle-kit.dts   |  72 
  .../boot/dts/microchip/microchip-mpfs.dtsi| 329 ++
  4 files changed, 404 insertions(+)
  create mode 100644 arch/riscv/boot/dts/microchip/Makefile
  create mode 100644 arch/riscv/boot/dts/microchip/microchip-mpfs-icicle-kit.dts
  create mode 100644 arch/riscv/boot/dts/microchip/microchip-mpfs.dtsi

diff --git a/arch/riscv/boot/dts/Makefile b/arch/riscv/boot/dts/Makefile
index 7ffd502e3e7b..fe996b88319e 100644
--- a/arch/riscv/boot/dts/Makefile
+++ b/arch/riscv/boot/dts/Makefile
@@ -1,5 +1,6 @@
  # SPDX-License-Identifier: GPL-2.0
  subdir-y += sifive
  subdir-$(CONFIG_SOC_CANAAN_K210_DTB_BUILTIN) += canaan
+subdir-y += microchip
  
  obj-$(CONFIG_BUILTIN_DTB) := $(addsuffix /, $(subdir-y))

diff --git a/arch/riscv/boot/dts/microchip/Makefile b/arch/riscv/boot/dts/microchip/Makefile
new file mode 100644
index ..622b12771fd3
--- /dev/null
+++ b/arch/riscv/boot/dts/microchip/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0
+dtb-$(CONFIG_SOC_MICROCHIP_POLARFIRE) += microchip-mpfs-icicle-kit.dtb


I'm playing (or trying to...) with XIP_KERNEL and I had to add the following to have the device tree actually built into the kernel:


diff --git a/arch/riscv/boot/dts/microchip/Makefile b/arch/riscv/boot/dts/microchip/Makefile

index 622b12771fd3..855c1502d912 100644
--- a/arch/riscv/boot/dts/microchip/Makefile
+++ b/arch/riscv/boot/dts/microchip/Makefile
@@ -1,2 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0
 dtb-$(CONFIG_SOC_MICROCHIP_POLARFIRE) += microchip-mpfs-icicle-kit.dtb
+obj-$(CONFIG_BUILTIN_DTB) += $(addsuffix .o, $(dtb-y))

Alex


diff --git a/arch/riscv/boot/dts/microchip/microchip-mpfs-icicle-kit.dts 
b/arch/riscv/boot/dts/microchip/microchip-mpfs-icicle-kit.dts
new file mode 100644
index ..ec79944065c9
--- /dev/null
+++ b/arch/riscv/boot/dts/microchip/microchip-mpfs-icicle-kit.dts
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+/* Copyright (c) 2020 Microchip Technology Inc */
+
+/dts-v1/;
+
+#include "microchip-mpfs.dtsi"
+
+/* Clock frequency (in Hz) of the rtcclk */
+#define RTCCLK_FREQ100
+
+/ {
+   #address-cells = <2>;
+   #size-cells = <2>;
+   model = "Microchip PolarFire-SoC Icicle Kit";
+   compatible = "microchip,mpfs-icicle-kit";
+
+   chosen {
+   stdout-path = 
+   };
+
+   cpus {
+   timebase-frequency = ;
+   };
+
+   memory@8000 {
+   device_type = "memory";
+   reg = <0x0 0x8000 0x0 0x4000>;
+   clocks = < 26>;
+   };
+
+   soc {
+   };
+};
+
+ {
+   status = "okay";
+};
+
+ {
+   status = "okay";
+};
+
+ {
+   status = "okay";
+};
+
+ {
+   status = "okay";
+};
+
+ {
+   status = "okay";
+};
+
+ {
+   phy-mode = "sgmii";
+   phy-handle = <>;
+   phy0: ethernet-phy@8 {
+   reg = <8>;
+   ti,fifo-depth = <0x01>;
+   };
+};
+
+ {
+   status = "okay";
+   phy-mode = "sgmii";
+   phy-handle = <>;
+   phy1: ethernet-phy@9 {
+   reg = <9>;
+   ti,fifo-depth = <0x01>;
+   };
+};
diff --git a/arch/riscv/boot/dts/microchip/microchip-mpfs.dtsi 
b/arch/riscv/boot/dts/microchip/microchip-mpfs.dtsi
new file mode 100644
index ..b9819570a7d1
--- /dev/null
+++ b/arch/riscv/boot/dts/microchip/microchip-mpfs.dtsi
@@ -0,0 +1,329 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+/* Copyright (c) 2020 Microchip Technology Inc */
+
+/dts-v1/;
+
+/ {
+   #address-cells = <2>;
+   #size-cells = <2>;
+   model = "Microchip MPFS Icicle Kit";
+   compatible = "microchip,mpfs-icicle-kit";
+
+   chosen {
+   };
+
+   cpus {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   cpu@0 {
+   clock-frequency = <0>;
+   compatible = "sifive,e51", "sifive,rocket0", "riscv";
+   device_type = "cpu";
+   i-cache-block-size = <64>;
+   i-cache-sets = <128>;
+   i-cache-size = <16384>;
+   reg = <0>;
+   riscv,isa = "rv64imac";
+   status = "disabled";
+
+   cpu0_intc: interrupt-controller {
+   #interrupt-cells = <1>;
+   compatible = "riscv,cpu-intc";
+   

Re: [PATCH v5] RISC-V: enable XIP

2021-03-22 Thread Alex Ghiti

Le 3/21/21 à 2:06 PM, Vitaly Wool a écrit :

Hey Alex,

On Sun, Mar 21, 2021 at 4:11 PM Alex Ghiti  wrote:


Hi Vitaly,

Le 3/10/21 à 4:22 AM, Vitaly Wool a écrit :

Introduce XIP (eXecute In Place) support for RISC-V platforms.
It allows code to be executed directly from non-volatile storage
directly addressable by the CPU, such as QSPI NOR flash which can
be found on many RISC-V platforms. This makes way for significant
optimization of RAM footprint. The XIP kernel is not compressed
since it has to run directly from flash, so it will occupy more
space on the non-volatile storage to The physical flash address


There seems to be something missing here.



Hmmm... strange indeed. I'll come up with a respin shortly and will
double check.




used to link the kernel object files and for storing it has to
be known at compile time and is represented by a Kconfig option.

XIP on RISC-V will currently only work on MMU-enabled kernels.

Signed-off-by: Vitaly Wool 



This fails to boot on current for-next with the following panic.
This is because dtb_early_va points to an address that is not mapped in
swapper_pg_dir: using __va(dtb_early_pa) instead works fine.


Is it with CONFIG_BUILTIN_DTB or without?


It is without CONFIG_BUILTIN_DTB enabled.

And I noticed I can't link a XIP_KERNEL either:

/home/alex/wip/lpc/buildroot/build_rv64/host/bin/riscv64-buildroot-linux-gnu-ld: 
section .data LMA [0080,008cd77f] overlaps section 
.rodata LMA [006f61c0,008499a7]
/home/alex/wip/lpc/buildroot/build_rv64/host/bin/riscv64-buildroot-linux-gnu-ld: 
section .pci_fixup LMA [008499a8,0084cfd7] overlaps 
section .data LMA [0080,008cd77f]
/home/alex/wip/lpc/buildroot/build_rv64/host/bin/riscv64-buildroot-linux-gnu-ld: 
arch/riscv/mm/init.o: in function `.L138':

init.c:(.text+0x232): undefined reference to `__init_text_begin'
/home/alex/wip/lpc/buildroot/build_rv64/host/bin/riscv64-buildroot-linux-gnu-ld: 
arch/riscv/mm/init.o: in function `protect_kernel_text_data':

init.c:(.text+0x23a): undefined reference to `__init_data_begin'
/home/alex/wip/lpc/buildroot/build_rv64/host/bin/riscv64-buildroot-linux-gnu-ld: 
init.c:(.text+0x28c): undefined reference to `__init_text_begin'
/home/alex/wip/lpc/buildroot/build_rv64/host/bin/riscv64-buildroot-linux-gnu-ld: 
init.c:(.text+0x2a0): undefined reference to `__init_data_begin'

make[2]: *** [Makefile:1197: vmlinux] Error 1
make[1]: *** [package/pkg-generic.mk:250: 
/home/alex/wip/lpc/buildroot/build_rv64/build/linux-custom/.stamp_built] 
Error 2

make: *** [Makefile:23: _all] Error 2

The 2 missing symbols are not defined in vmlinux-xip.lds.S but are 
required for CONFIG_STRICT_KERNEL_RWX; I don't think both configs are 
mutually exclusive, are they? I added them to the linker script and that works.
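
For reference, a minimal sketch of the kind of addition that could define those symbols in vmlinux-xip.lds.S. Only the symbol names come from the link errors above; the section placement and alignment are assumptions for illustration, not the actual fix that was merged:

```ld
/* Hypothetical fragment: define the markers CONFIG_STRICT_KERNEL_RWX
 * expects, bracketing the init text and init data output sections. */
. = ALIGN(PAGE_SIZE);
__init_text_begin = .;
.init.text : {
	*(.init.text)
}
. = ALIGN(PAGE_SIZE);
__init_data_begin = .;
.init.data : {
	*(.init.data)
}
```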


But then I'm still blocked by the overlaps, any idea?





And as this likely needs another version, I'm going to add my comments
below.

[0.00] OF: fdt: Ignoring memory range 0x8000 - 0x8020
[0.00] Machine model: riscv-virtio,qemu
[0.00] earlycon: sbi0 at I/O port 0x0 (options '')
[0.00] printk: bootconsole [sbi0] enabled
[0.00] efi: UEFI not found.
[0.00] Unable to handle kernel paging request at virtual address
4001
[0.00] Oops [#1]
[0.00] Modules linked in:
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 5.12.0-rc2 #155
[0.00] Hardware name: riscv-virtio,qemu (DT)
[0.00] epc : fdt_check_header+0x0/0x1fc
[0.00]  ra : early_init_dt_verify+0x16/0x6e
[0.00] epc : ffe0002b955e ra : ffe00082100c sp :
ffe001203f10
[0.00]  gp : ffe0012e40b8 tp : ffe00120bd80 t0 :
ffe23fdf7000
[0.00]  t1 :  t2 :  s0 :
ffe001203f30
[0.00]  s1 : 4000 a0 : 4000 a1 :
0002bfff
[0.00]  a2 : ffe23fdf6f00 a3 : 0001 a4 :
0018
[0.00]  a5 : ffe000a0b5e8 a6 : ffe23fdf6ef0 a7 :
0018
[0.00]  s2 : 8200 s3 : 0fff s4 :
ffe000a0a958
[0.00]  s5 : 0005 s6 : 0140 s7 :
ffe23fdf6ec0
[0.00]  s8 : 81000200 s9 : 8200 s10:
ffe000a01000
[0.00]  s11: 0fff t3 : bfff7000 t4 :

[0.00]  t5 : 80e0 t6 : 80202000
[0.00] status: 0100 badaddr: 4001 cause:
000d
[0.00] Call Trace:
[0.00] [] fdt_check_header+0x0/0x1fc
[0.00] [] setup_arch+0x3a6/0x412
[0.00] [] start_kernel+0x7e/0x580
[0.00] random: get_random_bytes called from
print_oops_end_marker+0x22/0x44 with crng_init=0
[0.00] ---[ end trace  ]---
[0.00] Kernel panic - not syncing: Fatal exception
[0.00] ---[ end Kernel panic - not syncing: Fatal

Re: [PATCH v5] RISC-V: enable XIP

2021-03-21 Thread Alex Ghiti

Hi Vitaly,

Le 3/10/21 à 4:22 AM, Vitaly Wool a écrit :

Introduce XIP (eXecute In Place) support for RISC-V platforms.
It allows code to be executed directly from non-volatile storage
directly addressable by the CPU, such as QSPI NOR flash which can
be found on many RISC-V platforms. This makes way for significant
optimization of RAM footprint. The XIP kernel is not compressed
since it has to run directly from flash, so it will occupy more
space on the non-volatile storage to The physical flash address


There seems to be something missing here.


used to link the kernel object files and for storing it has to
be known at compile time and is represented by a Kconfig option.

XIP on RISC-V will currently only work on MMU-enabled kernels.

Signed-off-by: Vitaly Wool 



This fails to boot on current for-next with the following panic.
This is because dtb_early_va points to an address that is not mapped in 
swapper_pg_dir: using __va(dtb_early_pa) instead works fine.


And as this likely needs another version, I'm going to add my comments 
below.


[0.00] OF: fdt: Ignoring memory range 0x8000 - 0x8020
[0.00] Machine model: riscv-virtio,qemu
[0.00] earlycon: sbi0 at I/O port 0x0 (options '')
[0.00] printk: bootconsole [sbi0] enabled
[0.00] efi: UEFI not found.
[0.00] Unable to handle kernel paging request at virtual address 
4001

[0.00] Oops [#1]
[0.00] Modules linked in:
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 5.12.0-rc2 #155
[0.00] Hardware name: riscv-virtio,qemu (DT)
[0.00] epc : fdt_check_header+0x0/0x1fc
[0.00]  ra : early_init_dt_verify+0x16/0x6e
[0.00] epc : ffe0002b955e ra : ffe00082100c sp : 
ffe001203f10
[0.00]  gp : ffe0012e40b8 tp : ffe00120bd80 t0 : 
ffe23fdf7000
[0.00]  t1 :  t2 :  s0 : 
ffe001203f30
[0.00]  s1 : 4000 a0 : 4000 a1 : 
0002bfff
[0.00]  a2 : ffe23fdf6f00 a3 : 0001 a4 : 
0018
[0.00]  a5 : ffe000a0b5e8 a6 : ffe23fdf6ef0 a7 : 
0018
[0.00]  s2 : 8200 s3 : 0fff s4 : 
ffe000a0a958
[0.00]  s5 : 0005 s6 : 0140 s7 : 
ffe23fdf6ec0
[0.00]  s8 : 81000200 s9 : 8200 s10: 
ffe000a01000
[0.00]  s11: 0fff t3 : bfff7000 t4 : 


[0.00]  t5 : 80e0 t6 : 80202000
[0.00] status: 0100 badaddr: 4001 cause: 
000d

[0.00] Call Trace:
[0.00] [] fdt_check_header+0x0/0x1fc
[0.00] [] setup_arch+0x3a6/0x412
[0.00] [] start_kernel+0x7e/0x580
[0.00] random: get_random_bytes called from 
print_oops_end_marker+0x22/0x44 with crng_init=0

[0.00] ---[ end trace  ]---
[0.00] Kernel panic - not syncing: Fatal exception
[0.00] ---[ end Kernel panic - not syncing: Fatal exception ]---



---
Changes in v2:
- dedicated macro for XIP address fixup when MMU is not enabled yet
   o both for 32-bit and 64-bit RISC-V
- SP is explicitly set to a safe place in RAM before __copy_data call
- removed redundant alignment requirements in vmlinux-xip.lds.S
- changed long -> uintptr_t typecast in __XIP_FIXUP macro.
Changes in v3:
- rebased against latest for-next
- XIP address fixup macro now takes an argument
- SMP related fixes
Changes in v4:
- rebased against the current for-next
- less #ifdef's in C/ASM code
- dedicated XIP_FIXUP_OFFSET assembler macro in head.S
- C-specific definitions moved into #ifndef __ASSEMBLY__
- Fixed multi-core boot
Changes in v5:
- fixed build error for non-XIP kernels

  arch/riscv/Kconfig  |  44 +-
  arch/riscv/Makefile |   8 +-
  arch/riscv/boot/Makefile|  13 +++
  arch/riscv/include/asm/pgtable.h|  65 --
  arch/riscv/kernel/cpu_ops_sbi.c |  12 ++-
  arch/riscv/kernel/head.S|  59 -
  arch/riscv/kernel/head.h|   3 +
  arch/riscv/kernel/setup.c   |   8 +-
  arch/riscv/kernel/vmlinux-xip.lds.S | 132 
  arch/riscv/kernel/vmlinux.lds.S |   6 ++
  arch/riscv/mm/init.c| 100 +++--
  11 files changed, 432 insertions(+), 18 deletions(-)
  create mode 100644 arch/riscv/kernel/vmlinux-xip.lds.S

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 85d626b8ce5e..59fb945a900e 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -438,7 +438,7 @@ config EFI_STUB
  
  config EFI

bool "UEFI runtime support"
-   depends on OF
+   depends on OF && !XIP_KERNEL
select LIBFDT
select UCS2_STRING
select EFI_PARAMS_FROM_FDT
@@ -462,11 +462,51 @@ config STACKPROTECTOR_PER_TASK
def_bool y
 

Re: [PATCH 0/3] Move kernel mapping outside the linear mapping

2021-03-20 Thread Alex Ghiti

Le 3/9/21 à 9:54 PM, Palmer Dabbelt a écrit :

On Thu, 25 Feb 2021 00:04:50 PST (-0800), a...@ghiti.fr wrote:

I decided to split sv48 support in small series to ease the review.

This patchset pushes the kernel mapping (modules and BPF too) to the last
4GB of the 64bit address space, which allows us to:
- implement a relocatable kernel (that will come later in another
  patchset), which requires moving the kernel mapping out of the linear
  mapping to avoid copying the kernel to a different physical address.
- have a single kernel that is not relocatable (and thus avoids the
  performance penalty imposed by a PIC kernel) for both sv39 and sv48.

The first patch implements this behaviour, the second patch introduces a
documentation that describes the virtual address space layout of the 
64bit
kernel and the last patch is taken from my sv48 series where I simply 
added

the dump of the modules/kernel/BPF mapping.

I removed the Reviewed-by on the first patch since it changed enough from
last time and deserves a second look.

Alexandre Ghiti (3):
  riscv: Move kernel mapping outside of linear mapping
  Documentation: riscv: Add documentation that describes the VM layout
  riscv: Prepare ptdump for vm layout dynamic addresses

 Documentation/riscv/index.rst   |  1 +
 Documentation/riscv/vm-layout.rst   | 61 ++
 arch/riscv/boot/loader.lds.S    |  3 +-
 arch/riscv/include/asm/page.h   | 18 ++-
 arch/riscv/include/asm/pgtable.h    | 37 +
 arch/riscv/include/asm/set_memory.h |  1 +
 arch/riscv/kernel/head.S    |  3 +-
 arch/riscv/kernel/module.c  |  6 +--
 arch/riscv/kernel/setup.c   |  3 ++
 arch/riscv/kernel/vmlinux.lds.S |  3 +-
 arch/riscv/mm/fault.c   | 13 +
 arch/riscv/mm/init.c    | 81 +++--
 arch/riscv/mm/kasan_init.c  |  9 
 arch/riscv/mm/physaddr.c    |  2 +-
 arch/riscv/mm/ptdump.c  | 67 +++-
 15 files changed, 258 insertions(+), 50 deletions(-)
 create mode 100644 Documentation/riscv/vm-layout.rst


This generally looks good, but I'm getting a bunch of checkpatch 
warnings and some conflicts, do you mind fixing those up (and including 
your other kasan patch, as that's likely to conflict)?


I have just tried to rebase this on for-next, and it quite conflicts 
with Vitaly's XIP patch; I'm fixing this and will post a v3.


Alex


Re: [PATCH] Insert SFENCE.VMA in function set_pte_at for RISCV

2021-03-19 Thread Alex Ghiti

Le 3/17/21 à 10:10 PM, Jiuyang Liu a écrit :

Thanks for the review!

I see. After skimming the related code and the implementations of other
architectures, I also agree this method is too heavy. And there is a
potential bug: my patch may introduce two SFENCE.VMA instructions in the
related code, one flush at set_pte_at and another flush higher up the
calling stack.

My two cents is that the original description in the spec is a little
misleading to the software side: the spec requires an SFENCE.VMA to
accompany each set_pte, while the kernel chooses to maintain set_pte and
flush_tlb separately.

So I think I should add a patch to fix my bug specifically, and provide
this chunk as an inline function to flush the TLB after a modification
to a PTE.


if (pte_present(pteval)) {
	if (pte_leaf(pteval)) {
		local_flush_tlb_page(addr);
	} else {
		if (pte_global(pteval))
			local_flush_tlb_all();
		else
			local_flush_tlb_asid();
	}
}
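
The selection logic above can be modeled on the host as follows. The flag bit positions are hypothetical stand-ins for illustration only; the real RISC-V PTE layout lives in arch/riscv/include/asm/pgtable-bits.h:

```c
#include <assert.h>

/* Hypothetical PTE flag bits, for illustration only. */
#define PTE_PRESENT (1u << 0)
#define PTE_LEAF    (1u << 1)
#define PTE_GLOBAL  (1u << 5)

enum flush_kind { FLUSH_NONE, FLUSH_PAGE, FLUSH_ALL, FLUSH_ASID };

/* Mirrors the if/else cascade above: leaf mappings get a per-page
 * flush, non-leaf global entries a full flush, and the rest an
 * ASID-scoped flush; non-present entries need no flush at all. */
static enum flush_kind pick_flush(unsigned int pteval)
{
	if (!(pteval & PTE_PRESENT))
		return FLUSH_NONE;
	if (pteval & PTE_LEAF)
		return FLUSH_PAGE;
	return (pteval & PTE_GLOBAL) ? FLUSH_ALL : FLUSH_ASID;
}
```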


My next patch will become two patches:
1. add flush_tlb related code according to the spec (also flush the
global TLB via an SBI call if the G bit is on)
2. add a bug fix for my stack by adding a flush in flush_cache_vmap.

Does this approach sound reasonable?


Ok for me, please take a look at flush_cache_vunmap too as I think we 
need to do the same thing there.


Thanks,

Alex



Regards,
Jiuyang

On Tue, 16 Mar 2021 at 09:17 PM Palmer Dabbelt  wrote:

We're trying to avoid this sort of thing, instead relying on the generic kernel
functionality to batch up page table modifications before we issue the fences.
If you're seeing some specific issue then I'd be happy to try and sort out a
fix for it, but this is a bit heavy-handed to use as anything but a last
resort.

On Tue, Mar 16, 2021 at 10:03 PM Andrew Waterman
 wrote:


On Tue, Mar 16, 2021 at 5:05 AM Alex Ghiti  wrote:


Le 3/16/21 à 4:40 AM, Anup Patel a écrit :

On Tue, Mar 16, 2021 at 1:59 PM Andrew Waterman
 wrote:


On Tue, Mar 16, 2021 at 12:32 AM Anup Patel  wrote:


On Tue, Mar 16, 2021 at 12:27 PM Jiuyang Liu  wrote:



As per my understanding, we don't need to explicitly invalidate the local TLB
in set_pte() or set_pte_at() because the generic Linux page table management
(mm/*) will call the appropriate flush_tlb_xyz() function after page
table updates.


I witnessed this bug in our micro-architecture: the set_pte store is
still in the store buffer, and no function in the call stack below
inserts an SFENCE.VMA, so the TLB cannot observe this modification.
Here is my call stack:
set_pte
set_pte_at
map_vm_area
__vmalloc_area_node
__vmalloc_node_range
__vmalloc_node
__vmalloc_node_flags
vzalloc
n_tty_open



I don't find this call stack, what I find is (the other way around):

n_tty_open
vzalloc
__vmalloc_node
__vmalloc_node_range
__vmalloc_area_node
map_kernel_range
-> map_kernel_range_noflush
 flush_cache_vmap

Which leads to the fact that we don't have the flush_cache_vmap callback
implemented: shouldn't we add the sfence.vma here? Powerpc does
something similar with the "ptesync" instruction (see below), which seems
to do the same as sfence.vma.


I was thinking the same thing, but I hadn't yet wrapped my head around
the fact that most architectures don't have something similar.  I'm OK
with following PPC's lead if it appears to be a correct bug fix :)




ptesync: "The ptesync instruction after the Store instruction ensures
that all searches of the Page Table that are performed after the ptesync
instruction completes will use the value stored"


I think this is architecture-specific code, so mm/* should
not be modified.
And the spec requires an SFENCE.VMA to be inserted on each modification
to the TLB. So I added the code here.


The generic linux/mm/* already calls the appropriate tlb_flush_xyz()
function defined in arch/riscv/include/asm/tlbflush.h

Better to have a write-barrier in set_pte().




Also, just local TLB flush is generally not sufficient because
a lot of page tables will be used across multiple HARTs.


Yes, this is the biggest issue, in RISC-V Volume 2, Privileged Spec v.
20190608 page 67 gave a solution:


This is not an issue with the RISC-V privileged spec; rather it is about
placing RISC-V fences at the right locations.


Consequently, other harts must be notified separately when the
memory-management data structures have been modified. One approach is
to use
1) a local data fence to ensure local writes are visible globally,
then 2) an interprocessor interrupt to the other thread,
then 3) a local SFENCE.VMA in the interrupt handler of the remote thread,
and finally 4) signal back to originating thread that operation is
complete. This is, of course, the RISC-V analog to a TLB shootdown.
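
The four-step handshake quoted above can be sketched on the host with C11 threads and atomics. This is a model only: the atomics stand in for the local data fence, the IPI, and the remote SFENCE.VMA; nothing here is actual RISC-V kernel code:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int pte_value;   /* the "page-table entry" being published */
static atomic_int ipi_pending; /* step 2: the interprocessor interrupt */
static atomic_int ack;         /* step 4: completion signal back */

static void *remote_hart(void *out)
{
	/* Spin until the IPI arrives (step 2). */
	while (!atomic_load_explicit(&ipi_pending, memory_order_acquire))
		;
	/* Step 3: the acquire above plays the role of the remote
	 * SFENCE.VMA; the PTE store is now guaranteed visible. */
	*(int *)out = atomic_load_explicit(&pte_value, memory_order_relaxed);
	/* Step 4: signal completion to the originating hart. */
	atomic_store_explicit(&ack, 1, memory_order_release);
	return 0;
}

static int run_shootdown(void)
{
	int seen = 0;
	pthread_t t;

	atomic_store(&pte_value, 0);
	atomic_store(&ipi_pending, 0);
	atomic_store(&ack, 0);

	pthread_create(&t, 0, remote_hart, &seen);
	/* Step 1: publish the page-table write (local data fence). */
	atomic_store_explicit(&pte_value, 42, memory_order_release);
	/* Step 2: raise the IPI. */
	atomic_store_explicit(&ipi_pending, 1, memory_order_release);
	/* Wait for the remote acknowledgement (step 4). */
	while (!atomic_load_explicit(&ack, memory_order_acquire))
		;
	pthread_join(t, 0);
	return seen;
}
```

The release/acquire pairing is what guarantees the remote side observes the new PTE, which is exactly the ordering the data fence and SFENCE.VMA provide on real hardware.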


I would suggest trying approach#1.

You can include "asm/barrier.h" here and use wmb() or __smp_wmb()
in-place of local TLB flush.


wmb() doesn't suffice to order older stores before younger page-table
walks, so that might hide the problem without actually fixing it.

Re: [PATCH] Insert SFENCE.VMA in function set_pte_at for RISCV

2021-03-16 Thread Alex Ghiti

Le 3/16/21 à 4:40 AM, Anup Patel a écrit :

On Tue, Mar 16, 2021 at 1:59 PM Andrew Waterman
 wrote:


On Tue, Mar 16, 2021 at 12:32 AM Anup Patel  wrote:


On Tue, Mar 16, 2021 at 12:27 PM Jiuyang Liu  wrote:



As per my understanding, we don't need to explicitly invalidate the local TLB
in set_pte() or set_pte_at() because the generic Linux page table management
(mm/*) will call the appropriate flush_tlb_xyz() function after page
table updates.


I witnessed this bug in our micro-architecture: the set_pte store is
still in the store buffer, and no function in the call stack below
inserts an SFENCE.VMA, so the TLB cannot observe this modification.
Here is my call stack:
set_pte
set_pte_at
map_vm_area
__vmalloc_area_node
__vmalloc_node_range
__vmalloc_node
__vmalloc_node_flags
vzalloc
n_tty_open



I don't find this call stack, what I find is (the other way around):

n_tty_open
vzalloc
__vmalloc_node
__vmalloc_node_range
__vmalloc_area_node
map_kernel_range
-> map_kernel_range_noflush
   flush_cache_vmap

Which leads to the fact that we don't have the flush_cache_vmap callback 
implemented: shouldn't we add the sfence.vma here? Powerpc does 
something similar with the "ptesync" instruction (see below), which seems 
to do the same as sfence.vma.


ptesync: "The ptesync instruction after the Store instruction ensures 
that all searches of the Page Table that are performed after the ptesync 
instruction completes will use the value stored"



I think this is architecture-specific code, so mm/* should
not be modified.
And the spec requires an SFENCE.VMA to be inserted on each modification
to the TLB. So I added the code here.


The generic linux/mm/* already calls the appropriate tlb_flush_xyz()
function defined in arch/riscv/include/asm/tlbflush.h

Better to have a write-barrier in set_pte().




Also, just local TLB flush is generally not sufficient because
a lot of page tables will be used across multiple HARTs.


Yes, this is the biggest issue, in RISC-V Volume 2, Privileged Spec v.
20190608 page 67 gave a solution:


This is not an issue with the RISC-V privileged spec; rather it is about
placing RISC-V fences at the right locations.


Consequently, other harts must be notified separately when the
memory-management data structures have been modified. One approach is
to use
1) a local data fence to ensure local writes are visible globally,
then 2) an interprocessor interrupt to the other thread,
then 3) a local SFENCE.VMA in the interrupt handler of the remote thread,
and finally 4) signal back to originating thread that operation is
complete. This is, of course, the RISC-V analog to a TLB shootdown.


I would suggest trying approach#1.

You can include "asm/barrier.h" here and use wmb() or __smp_wmb()
in-place of local TLB flush.


wmb() doesn't suffice to order older stores before younger page-table
walks, so that might hide the problem without actually fixing it.


If we treat page-table walks as reads, then mb() might be more
suitable in this case?

ARM64 also has an explicit barrier in set_pte() implementation. They are
doing "dsb(ishst); isb()" which is an inner-shareable store barrier followed
by an instruction barrier.



Based upon Jiuyang's description, it does sound plausible that we are
missing an SFENCE.VMA (or TLB shootdown) somewhere.  But I don't
understand the situation well enough to know where that might be, or
what the best fix is.


Yes, I agree but set_pte() doesn't seem to be the right place for TLB
shootdown based on set_pte() implementations of other architectures.


I agree: as "flushing" the TLB after every set_pte() would be very 
costly, it's better to do it once at the end of all the updates, 
like in flush_cache_vmap :)


Alex



Regards,
Anup








In general, this patch didn't handle the G bit in the PTE; the kernel
traps it to sbi_remote_sfence_vma. Do you think I should use flush_tlb_all?

Jiuyang




arch/arm/mm/mmu.c
void set_pte_at(struct mm_struct *mm, unsigned long addr,
   pte_t *ptep, pte_t pteval)
{
 unsigned long ext = 0;

 if (addr < TASK_SIZE && pte_valid_user(pteval)) {
 if (!pte_special(pteval))
 __sync_icache_dcache(pteval);
 ext |= PTE_EXT_NG;
 }

 set_pte_ext(ptep, pteval, ext);
}

arch/mips/include/asm/pgtable.h
static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
   pte_t *ptep, pte_t pteval)
{

 if (!pte_present(pteval))
 goto cache_sync_done;

 if (pte_present(*ptep) && (pte_pfn(*ptep) == pte_pfn(pteval)))
 goto cache_sync_done;

 __update_cache(addr, pteval);
cache_sync_done:
 set_pte(ptep, pteval);
}


Also, just local TLB flush is generally not sufficient because

a lot of page tables will be used across multiple HARTs.



On Tue, Mar 16, 2021 at 5:05 AM Anup Patel  wrote:


+Alex

On Tue, Mar 16, 2021 at 9:20 AM Jiuyang Liu  wrote:


This patch inserts 

Re: [PATCH 0/3] Move kernel mapping outside the linear mapping

2021-03-13 Thread Alex Ghiti

Hi Palmer,

Le 3/9/21 à 9:54 PM, Palmer Dabbelt a écrit :

On Thu, 25 Feb 2021 00:04:50 PST (-0800), a...@ghiti.fr wrote:

I decided to split sv48 support in small series to ease the review.

This patchset pushes the kernel mapping (modules and BPF too) to the last
4GB of the 64bit address space, which allows us to:
- implement a relocatable kernel (that will come later in another
  patchset), which requires moving the kernel mapping out of the linear
  mapping to avoid copying the kernel to a different physical address.
- have a single kernel that is not relocatable (and thus avoids the
  performance penalty imposed by a PIC kernel) for both sv39 and sv48.

The first patch implements this behaviour, the second patch introduces a
documentation that describes the virtual address space layout of the 
64bit
kernel and the last patch is taken from my sv48 series where I simply 
added

the dump of the modules/kernel/BPF mapping.

I removed the Reviewed-by on the first patch since it changed enough from
last time and deserves a second look.

Alexandre Ghiti (3):
  riscv: Move kernel mapping outside of linear mapping
  Documentation: riscv: Add documentation that describes the VM layout
  riscv: Prepare ptdump for vm layout dynamic addresses

 Documentation/riscv/index.rst   |  1 +
 Documentation/riscv/vm-layout.rst   | 61 ++
 arch/riscv/boot/loader.lds.S    |  3 +-
 arch/riscv/include/asm/page.h   | 18 ++-
 arch/riscv/include/asm/pgtable.h    | 37 +
 arch/riscv/include/asm/set_memory.h |  1 +
 arch/riscv/kernel/head.S    |  3 +-
 arch/riscv/kernel/module.c  |  6 +--
 arch/riscv/kernel/setup.c   |  3 ++
 arch/riscv/kernel/vmlinux.lds.S |  3 +-
 arch/riscv/mm/fault.c   | 13 +
 arch/riscv/mm/init.c    | 81 +++--
 arch/riscv/mm/kasan_init.c  |  9 
 arch/riscv/mm/physaddr.c    |  2 +-
 arch/riscv/mm/ptdump.c  | 67 +++-
 15 files changed, 258 insertions(+), 50 deletions(-)
 create mode 100644 Documentation/riscv/vm-layout.rst


This generally looks good, but I'm getting a bunch of checkpatch 
warnings and some conflicts, do you mind fixing those up (and including 
your other kasan patch, as that's likely to conflict)?



I fixed a few checkpatch warnings and rebased on top of for-next but had 
no conflicts.


I have just sent the v2.

Thanks,

Alex


Re: [PATCH 2/3] Documentation: riscv: Add documentation that describes the VM layout

2021-03-13 Thread Alex Ghiti

Hi Arnd,

Le 3/11/21 à 3:42 AM, Arnd Bergmann a écrit :

On Wed, Mar 10, 2021 at 8:12 PM Alex Ghiti  wrote:

Le 3/10/21 à 6:42 AM, Arnd Bergmann a écrit :

On Thu, Feb 25, 2021 at 12:56 PM Alex Ghiti  wrote:


Le 2/25/21 à 5:34 AM, David Hildenbrand a écrit :

ffc0 | -256GB | ffc7 |   32 GB | kasan

+   ffcefee0 | -196GB | ffcefeff |2 MB | fixmap
+   ffceff00 | -196GB | ffce |   16 MB | PCI io
+   ffcf | -196GB | ffcf |4 GB | vmemmap
+   ffd0 | -192GB | ffdf |   64 GB |
vmalloc/ioremap space
+   ffe0 | -128GB | 7fff |  126 GB |
direct mapping of all physical memory


^ So you could never ever have more than 126 GB, correct?

I assume that's nothing new.



Before this patch, the limit was 128GB, so in my sense, there is nothing
new. If ever we want to increase that limit, we'll just have to lower
PAGE_OFFSET; there are still some unused virtual addresses after kasan,
for example.


Linus Walleij is looking into changing the arm32 code to have the kernel
direct map inside of the vmalloc area, which would be another place
that you could use here. It would be nice to not have too many different
ways of doing this, but I'm not sure how hard it would be to rework your
code, or if there are any downsides of doing this.


This was what my previous version did: https://lkml.org/lkml/2020/6/7/28.

This approach was not welcomed very well and it fixed only the problem
of the implementation of relocatable kernel. The second issue I'm trying
to resolve here is to support both 3 and 4 level page tables using the
same kernel without being relocatable (which would introduce performance
penalty). I can't do it when the kernel mapping is in the vmalloc region
since vmalloc region relies on PAGE_OFFSET which is different on both 3
and 4 level page table and that would then require the kernel to be
relocatable.


Ok, I see.

I suppose it might work if you moved the direct-map to the lowest
address and the vmalloc area (incorporating the kernel mapping,
modules, pio, and fixmap at fixed addresses) to the very top of the
address space, but you probably already considered and rejected
that for other reasons.



Yes, I considered it... when you re-proposed it :) I'm not opposed to your 
solution in the vmalloc region but I can't find any advantage over the 
current solution, is there? That would harmonize with Linus's work, 
but then we'd be quite different from the x86 address space.


And by the way, thanks for having suggested the current solution in a 
previous conversation :)


Thanks again,

Alex


  Arnd



Re: [syzbot] BUG: unable to handle kernel access to user memory in schedule_tail

2021-03-12 Thread Alex Ghiti




Le 3/12/21 à 10:12 AM, Dmitry Vyukov a écrit :

On Fri, Mar 12, 2021 at 2:50 PM Ben Dooks  wrote:


On 10/03/2021 17:16, Dmitry Vyukov wrote:

On Wed, Mar 10, 2021 at 5:46 PM syzbot
 wrote:


Hello,

syzbot found the following issue on:

HEAD commit:0d7588ab riscv: process: Fix no prototype for arch_dup_tas..
git tree:   git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git 
fixes
console output: https://syzkaller.appspot.com/x/log.txt?x=1212c6e6d0
kernel config:  https://syzkaller.appspot.com/x/.config?x=e3c595255fb2d136
dashboard link: https://syzkaller.appspot.com/bug?extid=e74b94fe601ab9552d69
userspace arch: riscv64

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+e74b94fe601ab9552...@syzkaller.appspotmail.com


+riscv maintainers

This is riscv64-specific.
I've seen similar crashes in put_user in other places. It looks like
put_user crashes if the user address is not mapped/protected (?).


I've been having a look, and this seems to be down to an access of the
tsk->set_child_tid variable. I assume the fuzzing here passes a
bad address to clone?

From looking at the code, the put_user() code should have set the
relevant SR_SUM bit (the value for this, which is 1<<18, is in the
s2 register in the crash report) and, from looking at the compiler
output from my gcc-10, the code looks to be doing the relevant csrs
and then csrc around the put_user.

So currently I do not understand how the above could have happened,
other than something retried the code sequence and ended up retrying
the faulting instruction without the SR_SUM bit set.


I would maybe blame qemu for randomly resetting SR_SUM, but it's
strange that 99% of these crashes are in schedule_tail. If it would be
qemu, then they would be more evenly distributed...

Another observation: looking at a dozen of the crash logs, in none of
these cases was the fuzzer actually trying to fuzz clone with some insane
arguments. So it looks like completely normal clones (e.g. coming
from pthread_create) result in this crash.

I also wonder why there is ret_from_exception, is it normal? I see
handle_exception disables SR_SUM:


csrrc does the right thing: it clears the SR_SUM bit in status but saves the 
previous value, which will get correctly restored.


("The CSRRC (Atomic Read and Clear Bits in CSR) instruction reads the 
value of the CSR, zero-extends the value to XLEN bits, and writes it to 
integer register rd. The initial value in integer register rs1 is treated 
as a bit mask that specifies bit positions to be cleared in the CSR. Any 
bit that is high in rs1 will cause the corresponding bit to be cleared in 
the CSR, if that CSR bit is writable. Other bits in the CSR are 
unaffected.")
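
A toy model of those CSRRC semantics, with SR_SUM as the mask (1<<18, the value noted in s2 in the trace) — just an illustration of why the saved status restores correctly, not kernel code:

```c
#include <assert.h>

/* Models CSRRC as quoted above: rd receives the old CSR value, and
 * every bit set in rs1 is cleared in the CSR; other bits keep their
 * value. */
static unsigned long csrrc_model(unsigned long *csr, unsigned long rs1)
{
	unsigned long rd = *csr; /* old value (zero-extended in hardware) */

	*csr &= ~rs1;            /* clear the masked bits */
	return rd;
}
```

So on exception entry csrrc clears SUM in the live status register, while the copy saved in rd still has it set; restoring that copy on exception return re-enables user memory access, as described above.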



https://elixir.bootlin.com/linux/v5.12-rc2/source/arch/riscv/kernel/entry.S#L73


Still no luck for the moment, I can't reproduce it locally; my test is 
maybe not that good (I created threads all day long in order to trigger 
the put_user of schedule_tail).


Given that the path you mention works most of the time, and that the 
status register in the stack trace shows the SUM bit is not set whereas 
it is set in put_user, I'm leaning toward some race condition (maybe an 
interrupt that arrives at the "wrong" time) or a qemu issue as you 
mentioned.


To eliminate qemu issues, do you have access to some HW ? Or to 
different qemu versions ?





___
linux-riscv mailing list
linux-ri...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv



Re: [syzbot] BUG: unable to handle kernel access to user memory in schedule_tail

2021-03-10 Thread Alex Ghiti

Hi Ben,

Le 3/10/21 à 5:24 PM, Ben Dooks a écrit :

On 10/03/2021 17:16, Dmitry Vyukov wrote:

On Wed, Mar 10, 2021 at 5:46 PM syzbot
 wrote:


Hello,

syzbot found the following issue on:

HEAD commit:    0d7588ab riscv: process: Fix no prototype for 
arch_dup_tas..
git tree:   
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git fixes

console output: https://syzkaller.appspot.com/x/log.txt?x=1212c6e6d0
kernel config:  
https://syzkaller.appspot.com/x/.config?x=e3c595255fb2d136
dashboard link: 
https://syzkaller.appspot.com/bug?extid=e74b94fe601ab9552d69

userspace arch: riscv64

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the 
commit:

Reported-by: syzbot+e74b94fe601ab9552...@syzkaller.appspotmail.com


+riscv maintainers

This is riscv64-specific.
I've seen similar crashes in put_user in other places. It looks like
put_user crashes when the user address is not mapped/protected (?).


The unmapped case should have been handled.

I think this issue is the check for user-mode access that was added. From
what I read, the code may be wrong in

+    if (!user_mode(regs) && addr < TASK_SIZE &&
+    unlikely(!(regs->status & SR_SUM)))
+    die_kernel_fault("access to user memory without uaccess routines",
+    addr, regs);

I think the SR_SUM check might be wrong: as I read the standard, 
SR_SUM should be set to disable user-space access. So the check
should be unlikely(regs->status & SR_SUM), to flag an access made
without having disabled the protection.


The check that is done seems correct to me: "The SUM (permit Supervisor 
User Memory access) bit modifies the privilege with which S-mode loads 
and stores access virtual memory. *When SUM=0, S-mode memory accesses 
to pages that are accessible by U-mode (U=1 in Figure 4.15) will fault*. 
When SUM=1, these accesses are permitted. SUM has no effect when 
page-based virtual memory is not in effect".


I will try to reproduce the problem locally.

Thanks,

Alex



Without this, you can end up with an infinite loop in the fault handler.



Unable to handle kernel access to user memory without uaccess 
routines at virtual address 2749f0d0

Oops [#1]
Modules linked in:
CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted 
5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0

Hardware name: riscv-virtio,qemu (DT)
epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
  ra : task_pid_vnr include/linux/sched.h:1421 [inline]
  ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
epc : ffe8c8b0 ra : ffe8c8ae sp : ffe025d17ec0
  gp : ffe005d25378 tp : ffe00f0d t0 : 
  t1 : 0001 t2 : 000f4240 s0 : ffe025d17ee0
  s1 : 2749f0d0 a0 : 002a a1 : 0003
  a2 : 1ffc0cfac500 a3 : ffec80cc a4 : 5ae9db91c19bbe00
  a5 :  a6 : 00f0 a7 : ffe82eba
  s2 : 0004 s3 : ffe00eef96c0 s4 : ffe022c77fe0
  s5 : 4000 s6 : ffe067d74e00 s7 : ffe067d74850
  s8 : ffe067d73e18 s9 : ffe067d74e00 s10: ffe00eef96e8
  s11: 00ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffc4043cafb2
  t5 : ffc4043cafba t6 : 0004
status: 0120 badaddr: 2749f0d0 cause: 
000f

Call Trace:
[] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
[] ret_from_exception+0x0/0x14
Dumping ftrace buffer:
    (ftrace buffer empty)
---[ end trace b5f8f9231dc87dda ]---


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

--
You received this message because you are subscribed to the Google 
Groups "syzkaller-bugs" group.
To unsubscribe from this group and stop receiving emails from it, 
send an email to syzkaller-bugs+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/syzkaller-bugs/b74f1b05bd316729%40google.com. 









Re: [PATCH v2] riscv: Improve KASAN_VMALLOC support

2021-03-10 Thread Alex Ghiti

On 3/9/21 9:37 PM, Palmer Dabbelt wrote:

On Fri, 26 Feb 2021 10:01:54 PST (-0800), a...@ghiti.fr wrote:

When KASAN vmalloc region is populated, there is no userspace process and
the page table in use is swapper_pg_dir, so there is no need to read
SATP. Then we can use the same scheme used by kasan_populate_p*d
functions to go through the page table, which harmonizes the code.

In addition, make use of set_pgd that goes through all unused page table
levels, contrary to p*d_populate functions, which makes this function
work whatever the number of page table levels.

And finally, make sure the writes to swapper_pg_dir are visible using
an sfence.vma.


So I think this is actually a bug: without the fence we could get a 
kasan-related fault at any point (as the mappings might not be visible 
yet), and if we get one when inside do_page_fault() (or while holding a 
lock it wants) we'll end up deadlocking against ourselves.  That'll 
probably never happen in practice, but it'd still be good to get the 
fence onto fixes. The rest are cleanups, they're for for-next (and 
should probably be part of your sv48 series, if you need to re-spin it 
-- I'll look at that next).


I only talked about sv48 support in the changelog as it explains why I 
replaced the p*d_populate functions with set_p*d; this is not directly linked 
to the sv48 patchset, it's just a bonus that it works for both :)




LMK if you want to split this up, or if you want me to do it.  Either way,


I'll split it up: one patch for the cleanup and one patch for the fix.



Reviewed-by: Palmer Dabbelt 


Thanks,

Alex



Thanks!


Signed-off-by: Alexandre Ghiti 
---

Changes in v2:
- Quiet kernel test robot warnings about missing prototypes by declaring
  the introduced functions as static.

 arch/riscv/mm/kasan_init.c | 61 +-
 1 file changed, 20 insertions(+), 41 deletions(-)

diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
index e3d91f334b57..aaa3bdc0ffc0 100644
--- a/arch/riscv/mm/kasan_init.c
+++ b/arch/riscv/mm/kasan_init.c
@@ -11,18 +11,6 @@
 #include 
 #include 

-static __init void *early_alloc(size_t size, int node)
-{
-    void *ptr = memblock_alloc_try_nid(size, size,
-    __pa(MAX_DMA_ADDRESS), MEMBLOCK_ALLOC_ACCESSIBLE, node);
-
-    if (!ptr)
-    panic("%pS: Failed to allocate %zu bytes align=%zx nid=%d 
from=%llx\n",

-    __func__, size, size, node, (u64)__pa(MAX_DMA_ADDRESS));
-
-    return ptr;
-}
-
 extern pgd_t early_pg_dir[PTRS_PER_PGD];
 asmlinkage void __init kasan_early_init(void)
 {
@@ -155,38 +143,29 @@ static void __init kasan_populate(void *start, 
void *end)

 memset(start, KASAN_SHADOW_INIT, end - start);
 }

-void __init kasan_shallow_populate(void *start, void *end)
+static void __init kasan_shallow_populate_pgd(unsigned long vaddr, 
unsigned long end)

 {
-    unsigned long vaddr = (unsigned long)start & PAGE_MASK;
-    unsigned long vend = PAGE_ALIGN((unsigned long)end);
-    unsigned long pfn;
-    int index;
+    unsigned long next;
 void *p;
-    pud_t *pud_dir, *pud_k;
-    pgd_t *pgd_dir, *pgd_k;
-    p4d_t *p4d_dir, *p4d_k;
-
-    while (vaddr < vend) {
-    index = pgd_index(vaddr);
-    pfn = csr_read(CSR_SATP) & SATP_PPN;
-    pgd_dir = (pgd_t *)pfn_to_virt(pfn) + index;
-    pgd_k = init_mm.pgd + index;
-    pgd_dir = pgd_offset_k(vaddr);
-    set_pgd(pgd_dir, *pgd_k);
-
-    p4d_dir = p4d_offset(pgd_dir, vaddr);
-    p4d_k  = p4d_offset(pgd_k, vaddr);
-
-    vaddr = (vaddr + PUD_SIZE) & PUD_MASK;
-    pud_dir = pud_offset(p4d_dir, vaddr);
-    pud_k = pud_offset(p4d_k, vaddr);
-
-    if (pud_present(*pud_dir)) {
-    p = early_alloc(PAGE_SIZE, NUMA_NO_NODE);
-    pud_populate(&init_mm, pud_dir, p);
+    pgd_t *pgd_k = pgd_offset_k(vaddr);
+
+    do {
+    next = pgd_addr_end(vaddr, end);
+    if (pgd_page_vaddr(*pgd_k) == (unsigned 
long)lm_alias(kasan_early_shadow_pmd)) {

+    p = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
+    set_pgd(pgd_k, pfn_pgd(PFN_DOWN(__pa(p)), PAGE_TABLE));
 }
-    vaddr += PAGE_SIZE;
-    }
+    } while (pgd_k++, vaddr = next, vaddr != end);
+}
+
+static void __init kasan_shallow_populate(void *start, void *end)
+{
+    unsigned long vaddr = (unsigned long)start & PAGE_MASK;
+    unsigned long vend = PAGE_ALIGN((unsigned long)end);
+
+    kasan_shallow_populate_pgd(vaddr, vend);
+
+    local_flush_tlb_all();
 }

 void __init kasan_init(void)




Re: [PATCH 2/3] Documentation: riscv: Add documentation that describes the VM layout

2021-03-10 Thread Alex Ghiti

Hi Arnd,

On 3/10/21 6:42 AM, Arnd Bergmann wrote:

On Thu, Feb 25, 2021 at 12:56 PM Alex Ghiti  wrote:


On 2/25/21 5:34 AM, David Hildenbrand wrote:

   ||  | |> +
ffc0 | -256GB | ffc7 |   32 GB | kasan

+   ffcefee0 | -196GB | ffcefeff |2 MB | fixmap
+   ffceff00 | -196GB | ffce |   16 MB | PCI io
+   ffcf | -196GB | ffcf |4 GB | vmemmap
+   ffd0 | -192GB | ffdf |   64 GB |
vmalloc/ioremap space
+   ffe0 | -128GB | 7fff |  126 GB |
direct mapping of all physical memory


^ So you could never ever have more than 126 GB, correct?

I assume that's nothing new.



Before this patch, the limit was 128GB, so as I see it, there is nothing
new. If ever we want to increase that limit, we'll just have to lower
PAGE_OFFSET; there are still some unused virtual addresses after kasan,
for example.


Linus Walleij is looking into changing the arm32 code to have the kernel
direct map inside of the vmalloc area, which would be another place
that you could use here. It would be nice to not have too many different
ways of doing this, but I'm not sure how hard it would be to rework your
code, or if there are any downsides of doing this.


This was what my previous version did: https://lkml.org/lkml/2020/6/7/28.

This approach was not welcomed very well and it only fixed the problem 
of implementing a relocatable kernel. The second issue I'm trying 
to resolve here is to support both 3- and 4-level page tables using the 
same kernel without being relocatable (which would introduce a performance 
penalty). I can't do that when the kernel mapping is in the vmalloc region, 
since the vmalloc region relies on PAGE_OFFSET, which is different for 3- 
and 4-level page tables, and that would then require the kernel to be 
relocatable.


Alex



 Arnd




Re: riscv+KASAN does not boot

2021-03-09 Thread Alex Ghiti

On 3/9/21 12:11 PM, Dmitry Vyukov wrote:

On Fri, Feb 19, 2021 at 11:26 PM 'Palmer Dabbelt' via syzkaller
 wrote:


On Fri, 19 Feb 2021 10:53:43 PST (-0800), dvyu...@google.com wrote:

On Fri, Feb 19, 2021 at 6:01 PM Alex Ghiti  wrote:


Hi Dmitry,

On 2/18/21 6:36 AM, Dmitry Vyukov wrote:

On Thu, Feb 18, 2021 at 8:54 AM Alex Ghiti  wrote:


Hi Dmitry,


On Wed, Feb 17, 2021 at 5:36 PM Alex Ghiti  wrote:


On 2/16/21 11:42 PM, Dmitry Vyukov wrote:

On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti  wrote:


Hi Dmitry,

On 2/16/21 6:25 AM, Dmitry Vyukov wrote:

On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov  wrote:


On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov  wrote:

I was fixing KASAN support for my sv48 patchset so I took a look at your
issue: I built a kernel on top of the branch riscv/fixes using
https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
and Buildroot 2020.11. I have the warnings regarding the use of
__virt_to_phys on wrong addresses (but that's normal since this function
is used in virt_addr_valid) but not the segfaults you describe.


Hi Alex,

Let me try to rebuild the buildroot image. Maybe there was something wrong
with my build, though I did 'make clean' before doing so. But at the
same time it worked back in June...

Re WARNINGs, they indicate kernel bugs. I am working on setting up a
syzbot instance on riscv. If there is a WARNING during boot then the
kernel will be marked as broken. No further testing will happen.
Is it a mis-use of WARN_ON? If so, could anybody please remove it or
replace it with pr_err.



Hi,

I've localized one issue with riscv/KASAN:
KASAN breaks VDSO and that's I think the root cause of weird faults I
saw earlier. The following patch fixes it.
Could somebody please upstream this fix? I don't know how to add/run
tests for this.
Thanks

diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
index 0cfd6da784f84..cf3a383c1799d 100644
--- a/arch/riscv/kernel/vdso/Makefile
+++ b/arch/riscv/kernel/vdso/Makefile
@@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
  # Disable gcov profiling for VDSO code
  GCOV_PROFILE := n
  KCOV_INSTRUMENT := n
+KASAN_SANITIZE := n

  # Force dependency
  $(obj)/vdso.o: $(obj)/vdso.so


What's weird is that I don't have any issue without this patch with the
following config whereas it indeed seems required for KASAN. But when
looking at the segfaults you got earlier, the segfault address is 0xbb0
and the cause is an instruction page fault: this address is the PLT base
address in vdso.so and an instruction page fault would mean that someone
tried to jump to this address, which is weird. At first sight, that does
not seem related to your patch above, but clearly I may be wrong.

Tobias, did you observe the same segfaults as Dmitry?



I noticed that not all buildroot images use VDSO, it seems to be
dependent on libc settings (at least I think I changed it in the
past).


Ok, I used uClibc but then when using glibc, I have the same segfaults,
only when KASAN is enabled. And your patch fixes the problem. I will try
to take a look later to better understand the problem.


I also booted an image completely successfully including dhcpd/sshd
start, but then my executable crashed in clock_gettime. The executable
was build on linux/amd64 host with "riscv64-linux-gnu-gcc -static"
(10.2.1).



Second issue I am seeing seems to be related to text segment size.
I check out v5.11 and use this config:
https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178


This config gave my laptop a hard time! Finally I was able to boot
correctly to userspace, but I realized I used my sv48 branch... Either I
fixed your issue along the way or I can't reproduce it, I'll give it a
try tomorrow.


Where is your branch? I could also test in my setup on your branch.



You can find my branch int/alex/riscv_kernel_end_of_address_space_v2
here: https://github.com/AlexGhiti/riscv-linux.git


No, it does not work for me.

Source is on b61ab6c98de021398cd7734ea5fc3655e51e70f2 (HEAD,
int/alex/riscv_kernel_end_of_address_space_v2)
Config is 
https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt

riscv64-linux-gnu-gcc -v
gcc version 10.2.1 20210110 (Debian 10.2.1-6+build1)

qemu-system-riscv64 --version
QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)

qemu-system-riscv64 \
-machine virt -smp 2 -m 2G \
-device virtio-blk-device,drive=hd0 \
-drive file=image-riscv64,if=none,format=raw,id=hd0 \
-kernel arch/riscv/boot/Image \
-nographic \
-device virtio-rng-device,rng=rng0 -object
rng-random,filename=/dev/urandom,id=rng0 \
-netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device
virtio-net-device,netdev=net0 \
-append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic
panic_on_warn=1 panic=86400 early

Re: [RFC PATCH 1/8] RISC-V: Enable CPU_IDLE drivers

2021-02-26 Thread Alex Ghiti

Hi Anup,

On 2/21/21 4:37 AM, Anup Patel wrote:

We force select CPU_PM and provide asm/cpuidle.h so that we can
use CPU IDLE drivers for Linux RISC-V kernel.

Signed-off-by: Anup Patel 
---
  arch/riscv/Kconfig|  7 +++
  arch/riscv/configs/defconfig  |  7 +++
  arch/riscv/configs/rv32_defconfig |  4 ++--
  arch/riscv/include/asm/cpuidle.h  | 24 
  arch/riscv/kernel/process.c   |  3 ++-
  5 files changed, 38 insertions(+), 7 deletions(-)
  create mode 100644 arch/riscv/include/asm/cpuidle.h

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index fe6862b06ead..4901200b6b6c 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -37,6 +37,7 @@ config RISCV
select CLONE_BACKWARDS
select CLINT_TIMER if !MMU
select COMMON_CLK
+   select CPU_PM if CPU_IDLE
select EDAC_SUPPORT
select GENERIC_ARCH_TOPOLOGY if SMP
select GENERIC_ATOMIC64 if !64BIT
@@ -430,4 +431,10 @@ source "kernel/power/Kconfig"
  
  endmenu
  
+menu "CPU Power Management"

+
+source "drivers/cpuidle/Kconfig"
+
+endmenu
+
  source "drivers/firmware/Kconfig"
diff --git a/arch/riscv/configs/defconfig b/arch/riscv/configs/defconfig
index 6c0625aa96c7..dc4927c0e44b 100644
--- a/arch/riscv/configs/defconfig
+++ b/arch/riscv/configs/defconfig
@@ -13,11 +13,13 @@ CONFIG_USER_NS=y
  CONFIG_CHECKPOINT_RESTORE=y
  CONFIG_BLK_DEV_INITRD=y
  CONFIG_EXPERT=y
+# CONFIG_SYSFS_SYSCALL is not set
  CONFIG_BPF_SYSCALL=y
  CONFIG_SOC_SIFIVE=y
  CONFIG_SOC_VIRT=y
  CONFIG_SMP=y
  CONFIG_HOTPLUG_CPU=y
+CONFIG_CPU_IDLE=y
  CONFIG_JUMP_LABEL=y
  CONFIG_MODULES=y
  CONFIG_MODULE_UNLOAD=y
@@ -65,10 +67,9 @@ CONFIG_HW_RANDOM=y
  CONFIG_HW_RANDOM_VIRTIO=y
  CONFIG_SPI=y
  CONFIG_SPI_SIFIVE=y
+# CONFIG_PTP_1588_CLOCK is not set
  CONFIG_GPIOLIB=y
  CONFIG_GPIO_SIFIVE=y
-# CONFIG_PTP_1588_CLOCK is not set
-CONFIG_POWER_RESET=y


Why do you remove this config?


  CONFIG_DRM=y
  CONFIG_DRM_RADEON=y
  CONFIG_DRM_VIRTIO_GPU=y
@@ -132,5 +133,3 @@ CONFIG_DEBUG_BLOCK_EXT_DEVT=y
  # CONFIG_FTRACE is not set
  # CONFIG_RUNTIME_TESTING_MENU is not set
  CONFIG_MEMTEST=y
-# CONFIG_SYSFS_SYSCALL is not set
-CONFIG_EFI=y


And this is one too? If those removals are intentional, maybe you can 
add something about that in the commit description?



diff --git a/arch/riscv/configs/rv32_defconfig 
b/arch/riscv/configs/rv32_defconfig
index 8dd02b842fef..332e43a4a2c3 100644
--- a/arch/riscv/configs/rv32_defconfig
+++ b/arch/riscv/configs/rv32_defconfig
@@ -13,12 +13,14 @@ CONFIG_USER_NS=y
  CONFIG_CHECKPOINT_RESTORE=y
  CONFIG_BLK_DEV_INITRD=y
  CONFIG_EXPERT=y
+# CONFIG_SYSFS_SYSCALL is not set
  CONFIG_BPF_SYSCALL=y
  CONFIG_SOC_SIFIVE=y
  CONFIG_SOC_VIRT=y
  CONFIG_ARCH_RV32I=y
  CONFIG_SMP=y
  CONFIG_HOTPLUG_CPU=y
+CONFIG_CPU_IDLE=y
  CONFIG_JUMP_LABEL=y
  CONFIG_MODULES=y
  CONFIG_MODULE_UNLOAD=y
@@ -67,7 +69,6 @@ CONFIG_HW_RANDOM_VIRTIO=y
  CONFIG_SPI=y
  CONFIG_SPI_SIFIVE=y
  # CONFIG_PTP_1588_CLOCK is not set
-CONFIG_POWER_RESET=y
  CONFIG_DRM=y
  CONFIG_DRM_RADEON=y
  CONFIG_DRM_VIRTIO_GPU=y
@@ -131,4 +132,3 @@ CONFIG_DEBUG_BLOCK_EXT_DEVT=y
  # CONFIG_FTRACE is not set
  # CONFIG_RUNTIME_TESTING_MENU is not set
  CONFIG_MEMTEST=y
-# CONFIG_SYSFS_SYSCALL is not set
diff --git a/arch/riscv/include/asm/cpuidle.h b/arch/riscv/include/asm/cpuidle.h
new file mode 100644
index ..1042d790e446
--- /dev/null
+++ b/arch/riscv/include/asm/cpuidle.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2021 Allwinner Ltd
+ * Copyright (C) 2021 Western Digital Corporation or its affiliates.
+ */
+
+#ifndef _ASM_RISCV_CPUIDLE_H
+#define _ASM_RISCV_CPUIDLE_H
+
+#include 
+#include 
+
+static inline void cpu_do_idle(void)
+{
+   /*
+* Add mb() here to ensure that all
+* IO/MEM access are completed prior


accessES ?


+* to enter WFI.
+*/
+   mb();
+   wait_for_interrupt();
+}
+
+#endif
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index dd5f985b1f40..b5b51fd26624 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -21,6 +21,7 @@
  #include 
  #include 
  #include 
+#include 
  
  register unsigned long gp_in_global __asm__("gp");
  
@@ -35,7 +36,7 @@ extern asmlinkage void ret_from_kernel_thread(void);
  
  void arch_cpu_idle(void)

  {
-   wait_for_interrupt();
+   cpu_do_idle();
raw_local_irq_enable();
  }
  



Re: [PATCH] riscv: Add KASAN_VMALLOC support

2021-02-25 Thread Alex Ghiti

Hi Palmer,

On 2/26/21 12:32 AM, Palmer Dabbelt wrote:

On Wed, 24 Feb 2021 23:48:13 PST (-0800), a...@ghiti.fr wrote:

On 2/25/21 2:42 AM, Alexandre Ghiti wrote:
Populate the top-level of the kernel page table to implement 
KASAN_VMALLOC,

lower levels are filled dynamically upon memory allocation at runtime.

Co-developed-by: Nylon Chen 
Signed-off-by: Nylon Chen 
Co-developed-by: Nick Hu 
Signed-off-by: Nick Hu 
Signed-off-by: Alexandre Ghiti 
---
  arch/riscv/Kconfig |  1 +
  arch/riscv/mm/kasan_init.c | 35 ++-
  2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 8eadd1cbd524..3832a537c5d6 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -57,6 +57,7 @@ config RISCV
  select HAVE_ARCH_JUMP_LABEL
  select HAVE_ARCH_JUMP_LABEL_RELATIVE
  select HAVE_ARCH_KASAN if MMU && 64BIT
+    select HAVE_ARCH_KASAN_VMALLOC if MMU && 64BIT
  select HAVE_ARCH_KGDB
  select HAVE_ARCH_KGDB_QXFER_PKT
  select HAVE_ARCH_MMAP_RND_BITS if MMU
diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
index 719b6e4d6075..171569df4334 100644
--- a/arch/riscv/mm/kasan_init.c
+++ b/arch/riscv/mm/kasan_init.c
@@ -142,6 +142,31 @@ static void __init kasan_populate(void *start, 
void *end)

  memset(start, KASAN_SHADOW_INIT, end - start);
  }

+void __init kasan_shallow_populate_pgd(unsigned long vaddr, unsigned 
long end)

+{
+    unsigned long next;
+    void *p;
+    pgd_t *pgd_k = pgd_offset_k(vaddr);
+
+    do {
+    next = pgd_addr_end(vaddr, end);
+    if (pgd_page_vaddr(*pgd_k) == (unsigned 
long)lm_alias(kasan_early_shadow_pmd)) {

+    p = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
+    set_pgd(pgd_k, pfn_pgd(PFN_DOWN(__pa(p)), PAGE_TABLE));
+    }
+    } while (pgd_k++, vaddr = next, vaddr != end);
+}
+
+void __init kasan_shallow_populate(void *start, void *end)
+{
+    unsigned long vaddr = (unsigned long)start & PAGE_MASK;
+    unsigned long vend = PAGE_ALIGN((unsigned long)end);
+
+    kasan_shallow_populate_pgd(vaddr, vend);
+
+    local_flush_tlb_all();
+}
+
  void __init kasan_init(void)
  {
  phys_addr_t _start, _end;
@@ -149,7 +174,15 @@ void __init kasan_init(void)

  kasan_populate_early_shadow((void *)KASAN_SHADOW_START,
  (void *)kasan_mem_to_shadow((void *)
-    VMALLOC_END));
+    VMEMMAP_END));
+    if (IS_ENABLED(CONFIG_KASAN_VMALLOC))
+    kasan_shallow_populate(
+    (void *)kasan_mem_to_shadow((void *)VMALLOC_START),
+    (void *)kasan_mem_to_shadow((void *)VMALLOC_END));
+    else
+    kasan_populate_early_shadow(
+    (void *)kasan_mem_to_shadow((void *)VMALLOC_START),
+    (void *)kasan_mem_to_shadow((void *)VMALLOC_END));

  for_each_mem_range(i, &_start, &_end) {
  void *start = (void *)_start;



Palmer, this commit should replace (if everyone agrees) Nylon and Nick's
Commit e178d670f251 ("riscv/kasan: add KASAN_VMALLOC support") that is
already in for-next.


Sorry, but it's way too late to be rebasing things.  I get wanting to have 
the history clean, but in this case we're better off having this as an 
explicit fix patch -- changing hashes this late in the process messes 
with all the testing.

I'm not sure what the issue actually is, so it'd be great if you could 
send the fix patch.  If not then LMK and I'll try to figure out what's 
going on.  Either way, having the fix will make sure this gets tested 
properly, as whatever's going on isn't failing for me.



Nylon's patch is functional as is, but as I mentioned here 
https://patchwork.kernel.org/project/linux-riscv/patch/20210116055836.22366-2-nyl...@andestech.com/, 
it does unnecessary things (like trying to walk a user page table that 
does not exist at this point in the boot process).


Anyway, I will send another patch rebased on top of Nylon's.

Thanks,

Alex





Re: [PATCH 2/3] Documentation: riscv: Add documentation that describes the VM layout

2021-02-25 Thread Alex Ghiti

On 2/25/21 5:34 AM, David Hildenbrand wrote:
  |    |  | |> + 
ffc0 | -256    GB | ffc7 |   32 GB | kasan

+   ffcefee0 | -196    GB | ffcefeff |    2 MB | fixmap
+   ffceff00 | -196    GB | ffce |   16 MB | PCI io
+   ffcf | -196    GB | ffcf |    4 GB | vmemmap
+   ffd0 | -192    GB | ffdf |   64 GB | 
vmalloc/ioremap space
+   ffe0 | -128    GB | 7fff |  126 GB | 
direct mapping of all physical memory


^ So you could never ever have more than 126 GB, correct?

I assume that's nothing new.



Before this patch, the limit was 128GB, so as I see it, there is nothing 
new. If ever we want to increase that limit, we'll just have to lower 
PAGE_OFFSET; there are still some unused virtual addresses after kasan, 
for example.


Thanks,

Alex


Re: [PATCH] riscv: Add KASAN_VMALLOC support

2021-02-24 Thread Alex Ghiti

On 2/25/21 2:42 AM, Alexandre Ghiti wrote:

Populate the top-level of the kernel page table to implement KASAN_VMALLOC,
lower levels are filled dynamically upon memory allocation at runtime.

Co-developed-by: Nylon Chen 
Signed-off-by: Nylon Chen 
Co-developed-by: Nick Hu 
Signed-off-by: Nick Hu 
Signed-off-by: Alexandre Ghiti 
---
  arch/riscv/Kconfig |  1 +
  arch/riscv/mm/kasan_init.c | 35 ++-
  2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 8eadd1cbd524..3832a537c5d6 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -57,6 +57,7 @@ config RISCV
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_JUMP_LABEL_RELATIVE
select HAVE_ARCH_KASAN if MMU && 64BIT
+   select HAVE_ARCH_KASAN_VMALLOC if MMU && 64BIT
select HAVE_ARCH_KGDB
select HAVE_ARCH_KGDB_QXFER_PKT
select HAVE_ARCH_MMAP_RND_BITS if MMU
diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
index 719b6e4d6075..171569df4334 100644
--- a/arch/riscv/mm/kasan_init.c
+++ b/arch/riscv/mm/kasan_init.c
@@ -142,6 +142,31 @@ static void __init kasan_populate(void *start, void *end)
memset(start, KASAN_SHADOW_INIT, end - start);
  }
  
+void __init kasan_shallow_populate_pgd(unsigned long vaddr, unsigned long end)

+{
+   unsigned long next;
+   void *p;
+   pgd_t *pgd_k = pgd_offset_k(vaddr);
+
+   do {
+   next = pgd_addr_end(vaddr, end);
+   if (pgd_page_vaddr(*pgd_k) == (unsigned 
long)lm_alias(kasan_early_shadow_pmd)) {
+   p = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
+   set_pgd(pgd_k, pfn_pgd(PFN_DOWN(__pa(p)), PAGE_TABLE));
+   }
+   } while (pgd_k++, vaddr = next, vaddr != end);
+}
+
+void __init kasan_shallow_populate(void *start, void *end)
+{
+   unsigned long vaddr = (unsigned long)start & PAGE_MASK;
+   unsigned long vend = PAGE_ALIGN((unsigned long)end);
+
+   kasan_shallow_populate_pgd(vaddr, vend);
+
+   local_flush_tlb_all();
+}
+
  void __init kasan_init(void)
  {
phys_addr_t _start, _end;
@@ -149,7 +174,15 @@ void __init kasan_init(void)
  
  	kasan_populate_early_shadow((void *)KASAN_SHADOW_START,

(void *)kasan_mem_to_shadow((void *)
-   VMALLOC_END));
+   VMEMMAP_END));
+   if (IS_ENABLED(CONFIG_KASAN_VMALLOC))
+   kasan_shallow_populate(
+   (void *)kasan_mem_to_shadow((void *)VMALLOC_START),
+   (void *)kasan_mem_to_shadow((void *)VMALLOC_END));
+   else
+   kasan_populate_early_shadow(
+   (void *)kasan_mem_to_shadow((void *)VMALLOC_START),
+   (void *)kasan_mem_to_shadow((void *)VMALLOC_END));
  
  	for_each_mem_range(i, &_start, &_end) {

void *start = (void *)_start;



Palmer, this commit should replace (if everyone agrees) Nylon and Nick's 
Commit e178d670f251 ("riscv/kasan: add KASAN_VMALLOC support") that is 
already in for-next.


Thanks,

Alex


Re: [PATCH v2 1/1] riscv/kasan: add KASAN_VMALLOC support

2021-02-24 Thread Alex Ghiti

Hi Nylon,

On 2/22/21 12:13 PM, Alex Ghiti wrote:

On 2/21/21 8:37 PM, Nylon Chen wrote:

Hi Alex, Palmer

Sorry I missed this message.
On Sun, Feb 21, 2021 at 09:38:04PM +0800, Alex Ghiti wrote:

On 2/13/21 5:52 AM, Alex Ghiti wrote:

Hi Nylon, Palmer,

On 2/8/21 1:28 AM, Alex Ghiti wrote:

Hi Nylon,

On 1/22/21 10:56 PM, Palmer Dabbelt wrote:

On Fri, 15 Jan 2021 21:58:35 PST (-0800), nyl...@andestech.com wrote:

It references to x86/s390 architecture.

So, it doesn't map the early shadow page to cover VMALLOC space.


Prepopulate top level page table for the range that would 
otherwise be

empty.

lower levels are filled dynamically upon memory allocation while
booting.


I think we can improve the changelog a bit here with something like 
that:


"KASAN vmalloc space used to be mapped using kasan early shadow page.
KASAN_VMALLOC requires the top-level of the kernel page table to be
properly populated, lower levels being filled dynamically upon memory
allocation at runtime."



Signed-off-by: Nylon Chen 
Signed-off-by: Nick Hu 
---
  arch/riscv/Kconfig |  1 +
  arch/riscv/mm/kasan_init.c | 57 
+-

  2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 81b76d44725d..15a2c8088bbe 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -57,6 +57,7 @@ config RISCV
  select HAVE_ARCH_JUMP_LABEL
  select HAVE_ARCH_JUMP_LABEL_RELATIVE
  select HAVE_ARCH_KASAN if MMU && 64BIT
+    select HAVE_ARCH_KASAN_VMALLOC if MMU && 64BIT
  select HAVE_ARCH_KGDB
  select HAVE_ARCH_KGDB_QXFER_PKT
  select HAVE_ARCH_MMAP_RND_BITS if MMU
diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
index 12ddd1f6bf70..4b9149f963d3 100644
--- a/arch/riscv/mm/kasan_init.c
+++ b/arch/riscv/mm/kasan_init.c
@@ -9,6 +9,19 @@
  #include 
  #include 
  #include 
+#include 
+
+static __init void *early_alloc(size_t size, int node)
+{
+    void *ptr = memblock_alloc_try_nid(size, size,
+    __pa(MAX_DMA_ADDRESS), MEMBLOCK_ALLOC_ACCESSIBLE, node);
+
+    if (!ptr)
+    panic("%pS: Failed to allocate %zu bytes align=%zx nid=%d
from=%llx\n",
+    __func__, size, size, node, 
(u64)__pa(MAX_DMA_ADDRESS));

+
+    return ptr;
+}

  extern pgd_t early_pg_dir[PTRS_PER_PGD];
  asmlinkage void __init kasan_early_init(void)
@@ -83,6 +96,40 @@ static void __init populate(void *start, void 
*end)

  memset(start, 0, end - start);
  }

+void __init kasan_shallow_populate(void *start, void *end)
+{
+    unsigned long vaddr = (unsigned long)start & PAGE_MASK;
+    unsigned long vend = PAGE_ALIGN((unsigned long)end);
+    unsigned long pfn;
+    int index;
+    void *p;
+    pud_t *pud_dir, *pud_k;
+    pgd_t *pgd_dir, *pgd_k;
+    p4d_t *p4d_dir, *p4d_k;
+
+    while (vaddr < vend) {
+    index = pgd_index(vaddr);
+    pfn = csr_read(CSR_SATP) & SATP_PPN;


At this point in the boot process, we know that we use swapper_pg_dir
so no need to read SATP.


+    pgd_dir = (pgd_t *)pfn_to_virt(pfn) + index;


Here, this pgd_dir assignment is overwritten 2 lines below, so no need
for it.


+    pgd_k = init_mm.pgd + index;
+    pgd_dir = pgd_offset_k(vaddr);


pgd_offset_k(vaddr) = init_mm.pgd + pgd_index(vaddr) so pgd_k == 
pgd_dir.



+    set_pgd(pgd_dir, *pgd_k);
+
+    p4d_dir = p4d_offset(pgd_dir, vaddr);
+    p4d_k  = p4d_offset(pgd_k, vaddr);
+
+    vaddr = (vaddr + PUD_SIZE) & PUD_MASK;


Why do you increase vaddr *before* populating the first one? And
pud_addr_end does that properly: it returns the next pud address if it
does not go beyond the end address to map.


+    pud_dir = pud_offset(p4d_dir, vaddr);
+    pud_k = pud_offset(p4d_k, vaddr);
+
+    if (pud_present(*pud_dir)) {
+    p = early_alloc(PAGE_SIZE, NUMA_NO_NODE);
+    pud_populate(&init_mm, pud_dir, p);


init_mm is not needed here.


+    }
+    vaddr += PAGE_SIZE;


Why do you need to add PAGE_SIZE? vaddr already points to the next pud.


It seems like this patch tries to populate the userspace page table
whereas at this point in the boot process, only swapper_pg_dir is used,
or am I missing something?

Thanks,

Alex


This morning I implemented a version that fixes all the comments I made
earlier. I was able to insert test_kasan_module on both sv39 and sv48
without any modification: set_pgd "goes through" all the unused page
table levels, whereas the p*d_populate functions are no-ops for unused levels.

If you have any comment, do not hesitate.

diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
index adbf94b7e68a..d643b222167c 100644
--- a/arch/riscv/mm/kasan_init.c
+++ b/arch/riscv/mm/kasan_init.c
@@ -195,6 +195,31 @@ static void __init kasan_populate(void *start, 
void

*end)
      memset(start, KASAN_SHADOW_INIT, end - start);
   }


+void __init kasan_shallow_populate_pg

Re: [PATCH] riscv: mm: Remove the copy operation of pmd

2021-02-24 Thread Alex Ghiti

On 3/30/20 at 7:53 AM, Chuanhua Han wrote:

Since all processes share the kernel address space,
we only need to copy pgd in case of a vmalloc page
fault exception, the other levels of page tables are
shared, so the operation of copying pmd is unnecessary.

Signed-off-by: Chuanhua Han 
---
  arch/riscv/mm/fault.c | 10 +++---
  1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index be84e32adc4c..24f4ebfd2df8 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -208,9 +208,9 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
  vmalloc_fault:
{
pgd_t *pgd, *pgd_k;
-   pud_t *pud, *pud_k;
-   p4d_t *p4d, *p4d_k;
-   pmd_t *pmd, *pmd_k;
+   pud_t *pud_k;
+   p4d_t *p4d_k;
+   pmd_t *pmd_k;
pte_t *pte_k;
int index;
  
@@ -234,12 +234,10 @@ asmlinkage void do_page_fault(struct pt_regs *regs)

goto no_context;
set_pgd(pgd, *pgd_k);
  
-		p4d = p4d_offset(pgd, addr);

p4d_k = p4d_offset(pgd_k, addr);
if (!p4d_present(*p4d_k))
goto no_context;
  
-		pud = pud_offset(p4d, addr);

pud_k = pud_offset(p4d_k, addr);
if (!pud_present(*pud_k))
goto no_context;
@@ -248,11 +246,9 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
 * Since the vmalloc area is global, it is unnecessary
 * to copy individual PTEs
 */
-   pmd = pmd_offset(pud, addr);
pmd_k = pmd_offset(pud_k, addr);
if (!pmd_present(*pmd_k))
goto no_context;
-   set_pmd(pmd, *pmd_k);
  
  		/*

 * Make sure the actual PTE exists as well to



Better late than never: I do agree with this patch. Once the PGD entry is 
copied into the user page table, it "comes with all its mappings", so 
there is no need to copy the PMD; the only thing left to do is to make sure 
the mapping does exist in the kernel page table.


So feel free to add:

Reviewed-by: Alexandre Ghiti 
Tested-by: Alexandre Ghiti 

Thanks,

Alex


Re: [PATCH] riscv: Pass virtual addresses to kasan_mem_to_shadow

2021-02-22 Thread Alex Ghiti

Hi Palmer,

On 2/22/21 at 9:58 PM, Palmer Dabbelt wrote:

On Mon, 22 Feb 2021 00:07:34 PST (-0800), a...@ghiti.fr wrote:

kasan_mem_to_shadow translates virtual addresses to kasan shadow
addresses whereas for_each_mem_range returns physical addresses: it is
then required to use __va on those addresses before passing them to
kasan_mem_to_shadow.

Fixes: b10d6bca8720 ("arch, drivers: replace for_each_membock() with 
for_each_mem_range()")

Signed-off-by: Alexandre Ghiti 
---
 arch/riscv/mm/kasan_init.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
index 4b9149f963d3..6d3b88f2c566 100644
--- a/arch/riscv/mm/kasan_init.c
+++ b/arch/riscv/mm/kasan_init.c
@@ -148,8 +148,8 @@ void __init kasan_init(void)
 (void *)kasan_mem_to_shadow((void *)VMALLOC_END));

 for_each_mem_range(i, &_start, &_end) {
-    void *start = (void *)_start;
-    void *end = (void *)_end;
+    void *start = (void *)__va(_start);
+    void *end = (void *)__va(_end);

 if (start >= end)
 break;


Thanks, but unless I'm missing something this is already in Linus' tree as
c25a053e1577 ("riscv: Fix KASAN memory mapping.").


You're right, I missed this one. But for some reason, this patch does 
not appear in for-next.


Thanks,

Alex



___
linux-riscv mailing list
linux-ri...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv


Re: [PATCH v2 1/1] riscv/kasan: add KASAN_VMALLOC support

2021-02-22 Thread Alex Ghiti

On 2/21/21 at 8:37 PM, Nylon Chen wrote:

Hi Alex, Palmer

Sorry I missed this message.
On Sun, Feb 21, 2021 at 09:38:04PM +0800, Alex Ghiti wrote:

On 2/13/21 at 5:52 AM, Alex Ghiti wrote:

Hi Nylon, Palmer,

On 2/8/21 at 1:28 AM, Alex Ghiti wrote:

Hi Nylon,

On 1/22/21 at 10:56 PM, Palmer Dabbelt wrote:

On Fri, 15 Jan 2021 21:58:35 PST (-0800), nyl...@andestech.com wrote:

It references to x86/s390 architecture.

So, it doesn't map the early shadow page to cover VMALLOC space.


Prepopulate top level page table for the range that would otherwise be
empty.

lower levels are filled dynamically upon memory allocation while
booting.


I think we can improve the changelog a bit here with something like that:

"KASAN vmalloc space used to be mapped using kasan early shadow page.
KASAN_VMALLOC requires the top-level of the kernel page table to be
properly populated, lower levels being filled dynamically upon memory
allocation at runtime."



Signed-off-by: Nylon Chen 
Signed-off-by: Nick Hu 
---
  arch/riscv/Kconfig |  1 +
  arch/riscv/mm/kasan_init.c | 57 +-
  2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 81b76d44725d..15a2c8088bbe 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -57,6 +57,7 @@ config RISCV
  select HAVE_ARCH_JUMP_LABEL
  select HAVE_ARCH_JUMP_LABEL_RELATIVE
  select HAVE_ARCH_KASAN if MMU && 64BIT
+    select HAVE_ARCH_KASAN_VMALLOC if MMU && 64BIT
  select HAVE_ARCH_KGDB
  select HAVE_ARCH_KGDB_QXFER_PKT
  select HAVE_ARCH_MMAP_RND_BITS if MMU
diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
index 12ddd1f6bf70..4b9149f963d3 100644
--- a/arch/riscv/mm/kasan_init.c
+++ b/arch/riscv/mm/kasan_init.c
@@ -9,6 +9,19 @@
  #include 
  #include 
  #include 
+#include 
+
+static __init void *early_alloc(size_t size, int node)
+{
+    void *ptr = memblock_alloc_try_nid(size, size,
+    __pa(MAX_DMA_ADDRESS), MEMBLOCK_ALLOC_ACCESSIBLE, node);
+
+    if (!ptr)
+    panic("%pS: Failed to allocate %zu bytes align=%zx nid=%d from=%llx\n",
+    __func__, size, size, node, (u64)__pa(MAX_DMA_ADDRESS));
+
+    return ptr;
+}

  extern pgd_t early_pg_dir[PTRS_PER_PGD];
  asmlinkage void __init kasan_early_init(void)
@@ -83,6 +96,40 @@ static void __init populate(void *start, void *end)
  memset(start, 0, end - start);
  }

+void __init kasan_shallow_populate(void *start, void *end)
+{
+    unsigned long vaddr = (unsigned long)start & PAGE_MASK;
+    unsigned long vend = PAGE_ALIGN((unsigned long)end);
+    unsigned long pfn;
+    int index;
+    void *p;
+    pud_t *pud_dir, *pud_k;
+    pgd_t *pgd_dir, *pgd_k;
+    p4d_t *p4d_dir, *p4d_k;
+
+    while (vaddr < vend) {
+    index = pgd_index(vaddr);
+    pfn = csr_read(CSR_SATP) & SATP_PPN;


At this point in the boot process, we know that we use swapper_pg_dir
so no need to read SATP.


+    pgd_dir = (pgd_t *)pfn_to_virt(pfn) + index;


Here, this pgd_dir assignment is overwritten 2 lines below, so no need
for it.


+    pgd_k = init_mm.pgd + index;
+    pgd_dir = pgd_offset_k(vaddr);


pgd_offset_k(vaddr) = init_mm.pgd + pgd_index(vaddr) so pgd_k == pgd_dir.


+    set_pgd(pgd_dir, *pgd_k);
+
+    p4d_dir = p4d_offset(pgd_dir, vaddr);
+    p4d_k  = p4d_offset(pgd_k, vaddr);
+
+    vaddr = (vaddr + PUD_SIZE) & PUD_MASK;


Why do you increase vaddr *before* populating the first one ? And
pud_addr_end does that properly: it returns the next pud address if it
does not go beyond end address to map.


+    pud_dir = pud_offset(p4d_dir, vaddr);
+    pud_k = pud_offset(p4d_k, vaddr);
+
+    if (pud_present(*pud_dir)) {
+    p = early_alloc(PAGE_SIZE, NUMA_NO_NODE);
+    pud_populate(&init_mm, pud_dir, p);


init_mm is not needed here.


+    }
+    vaddr += PAGE_SIZE;


Why do you need to add PAGE_SIZE ? vaddr already points to the next pud.

It seems like this patch tries to populate userspace page table
whereas at this point in the boot process, only swapper_pg_dir is used
or am I missing something ?

Thanks,

Alex


I implemented this morning a version that fixes all the comments I made
earlier. I was able to insert test_kasan_module on both sv39 and sv48
without any modification: set_pgd "goes through" all the unused page
table levels, whereas p*d_populate are noop for unused levels.

If you have any comment, do not hesitate.

diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
index adbf94b7e68a..d643b222167c 100644
--- a/arch/riscv/mm/kasan_init.c
+++ b/arch/riscv/mm/kasan_init.c
@@ -195,6 +195,31 @@ static void __init kasan_populate(void *start, void *end)
      memset(start, KASAN_SHADOW_INIT, end - start);
   }


+void __init kasan_shallow_populate_pgd(unsigned long vaddr, unsigned long end)
+{
+   unsigned long nex


Re: [PATCH] riscv: Get rid of MAX_EARLY_MAPPING_SIZE

2021-02-21 Thread Alex Ghiti

Hi Dmitry,

On 2/21/21 at 10:38 AM, Dmitry Vyukov wrote:

On Sun, Feb 21, 2021 at 3:22 PM Alexandre Ghiti  wrote:


At early boot stage, we have a whole PGDIR to map the kernel, so there
is no need to restrict the early mapping size to 128MB. Removing this
define also allows us to simplify some compile time logic.

This fixes large kernel mappings with a size greater than 128MB, as is
the case for syzbot kernels, whose size was just ~130MB.

Note that on rv64, for now, we are then limited to PGDIR size for the early
mapping as we can't use PGD mappings (see [1]). That should be enough
given the relatively small size of syzbot kernels compared to PGDIR_SIZE,
which is 1GB.

[1] https://lore.kernel.org/lkml/20200603153608.30056-1-a...@ghiti.fr/


I've applied this patch to (as it contains the HEAD fix):

commit f49815047c1a3e3644a0ba38f3825c5cde8a0922 (HEAD, riscv/for-next)
Author: Tobias Klauser 
Date:   Tue Feb 16 18:33:05 2021 +0100
 riscv: Disable KSAN_SANITIZE for vDSO

and the kernel started booting with my large config.
It quickly crashed (see below), but at least it started booting, so
it's an improvement.

Tested-by: Dmitry Vyukov 


Thanks for that.



Linux version 5.11.0-rc2-00069-gf49815047c1a-dirty
(dvyu...@dvyukov-desk.muc.corp.google.com) (riscv64-linux-gnu-gcc
(Debian 10.2.1-6+build1) 10.2.1 20210110, GNU ld (GNU Binutils for
Debian) 2.35.1) #34 SMP PREEMPT Sun Feb 21 15:51:40 CET 2021
OF: fdt: Ignoring memory range 0x8000 - 0x8020
Machine model: riscv-virtio,qemu
earlycon: ns16550a0 at MMIO 0x1000 (options '')
printk: bootconsole [ns16550a0] enabled
efi: UEFI not found.
cma: Reserved 16 MiB at 0xfec0
Zone ranges:
   DMA32[mem 0x8020-0x]
   Normal   empty
Movable zone start for each node
Early memory node ranges
   node   0: [mem 0x8020-0x]
Zeroed struct page in unavailable ranges: 512 pages
Initmem setup node 0 [mem 0x8020-0x]
SBI specification v0.2 detected
SBI implementation ID=0x1 Version=0x8
SBI v0.2 TIME extension detected
SBI v0.2 IPI extension detected
SBI v0.2 RFENCE extension detected
software IO TLB: mapped [mem 0xf7c0-0xfbc0] (64MB)
[ cut here ]
DEBUG_LOCKS_WARN_ON(early_boot_irqs_disabled)
WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:4085
lockdep_hardirqs_on_prepare+0x384/0x388 kernel/locking/lockdep.c:4085
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.11.0-rc2-00069-gf49815047c1a-dirty #34
Hardware name: riscv-virtio,qemu (DT)
epc : lockdep_hardirqs_on_prepare+0x384/0x388 kernel/locking/lockdep.c:4085
  ra : lockdep_hardirqs_on_prepare+0x384/0x388 kernel/locking/lockdep.c:4085
epc : ffec125a ra : ffec125a sp : ffe006603ce0
  gp : ffe006c338f0 tp : ffe006689e00 t0 : ffe00669a9a8
  t1 : ffc400cc0738 t2 :  s0 : ffe006603d20
  s1 : ffe006689e00 a0 : 002d a1 : 000f
  a2 : 0002 a3 : ffed2718 a4 : 
  a5 :  a6 : 00f0 a7 : ffe0066039c7
  s2 : ffe004a337c0 s3 : ffe0076fa1b8 s4 : 
  s5 : ffe006689e00 s6 : 0001 s7 : ffe07fcfc000
  s8 : ffe07fcfd000 s9 : ffe006c3c0d0 s10: f000
  s11: ffe004a1fbb8 t3 : 2d2d2d2d t4 : ffc400cc0737
  t5 : ffc400cc0739 t6 : ffe0066039c8
status: 0100 badaddr:  cause: 0003
Call Trace:
[] lockdep_hardirqs_on_prepare+0x384/0x388
kernel/locking/lockdep.c:4085
[] trace_hardirqs_on+0x116/0x174
kernel/trace/trace_preemptirq.c:49
[] _save_context+0xa2/0xe2
[] local_flush_tlb_all
arch/riscv/include/asm/tlbflush.h:16 [inline]
[] populate arch/riscv/mm/kasan_init.c:95 [inline]
[] kasan_init+0x23e/0x31a arch/riscv/mm/kasan_init.c:157
irq event stamp: 0
hardirqs last  enabled at (0): [<>] 0x0
hardirqs last disabled at (0): [<>] 0x0
softirqs last  enabled at (0): [<>] 0x0
softirqs last disabled at (0): [<>] 0x0
random: get_random_bytes called from init_oops_id kernel/panic.c:546
[inline] with crng_init=0
random: get_random_bytes called from init_oops_id kernel/panic.c:543
[inline] with crng_init=0
random: get_random_bytes called from print_oops_end_marker
kernel/panic.c:556 [inline] with crng_init=0
random: get_random_bytes called from __warn+0x1be/0x20a
kernel/panic.c:613 with crng_init=0
---[ end trace  ]---
Unable to handle kernel paging request at virtual address dfc81004
Oops [#1]
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Tainted: GW
5.11.0-rc2-00069-gf49815047c1a-dirty #34
Hardware name: riscv-virtio,qemu (DT)
epc : __memset+0x60/0xfc arch/riscv/lib/memset.S:67
  ra : populate arch/riscv/mm/kasan_init.c:96 [inline]
  ra : kasan_init+0x256/0x31a arch/riscv/mm/kasan_init.c:157
epc : ffe001791cf0 ra : 

Re: [PATCH 0/4] Kasan improvements and fixes

2021-02-21 Thread Alex Ghiti

Hi,

On 2/8/21 at 2:30 PM, Alexandre Ghiti wrote:

This small series contains some improvements for the riscv KASAN code:

- it improves the readability of the code (patches 1 and 2)
- it fixes an oversight regarding page table population which I uncovered
   while working on my sv48 patchset (patch 3)
- it helps to get better performance by using hugepages when possible
   (patch 4)

Alexandre Ghiti (4):
   riscv: Improve kasan definitions
   riscv: Use KASAN_SHADOW_INIT define for kasan memory initialization
   riscv: Improve kasan population function
   riscv: Improve kasan population by using hugepages when possible

  arch/riscv/include/asm/kasan.h |  22 +-
  arch/riscv/mm/kasan_init.c | 119 -
  2 files changed, 108 insertions(+), 33 deletions(-)



I'm cc-ing linux-arch and linux-mm to have a better chance of getting 
reviewers for this series.


Thanks,

Alex


Re: [PATCH v2 1/1] riscv/kasan: add KASAN_VMALLOC support

2021-02-21 Thread Alex Ghiti

On 2/13/21 at 5:52 AM, Alex Ghiti wrote:

Hi Nylon, Palmer,

On 2/8/21 at 1:28 AM, Alex Ghiti wrote:

Hi Nylon,

On 1/22/21 at 10:56 PM, Palmer Dabbelt wrote:

On Fri, 15 Jan 2021 21:58:35 PST (-0800), nyl...@andestech.com wrote:

It references to x86/s390 architecture.
So, it doesn't map the early shadow page to cover VMALLOC space.

Prepopulate top level page table for the range that would otherwise be
empty.

lower levels are filled dynamically upon memory allocation while
booting.


I think we can improve the changelog a bit here with something like that:

"KASAN vmalloc space used to be mapped using kasan early shadow page. 
KASAN_VMALLOC requires the top-level of the kernel page table to be 
properly populated, lower levels being filled dynamically upon memory 
allocation at runtime."




Signed-off-by: Nylon Chen 
Signed-off-by: Nick Hu 
---
 arch/riscv/Kconfig |  1 +
 arch/riscv/mm/kasan_init.c | 57 +-
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 81b76d44725d..15a2c8088bbe 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -57,6 +57,7 @@ config RISCV
 select HAVE_ARCH_JUMP_LABEL
 select HAVE_ARCH_JUMP_LABEL_RELATIVE
 select HAVE_ARCH_KASAN if MMU && 64BIT
+    select HAVE_ARCH_KASAN_VMALLOC if MMU && 64BIT
 select HAVE_ARCH_KGDB
 select HAVE_ARCH_KGDB_QXFER_PKT
 select HAVE_ARCH_MMAP_RND_BITS if MMU
diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
index 12ddd1f6bf70..4b9149f963d3 100644
--- a/arch/riscv/mm/kasan_init.c
+++ b/arch/riscv/mm/kasan_init.c
@@ -9,6 +9,19 @@
 #include 
 #include 
 #include 
+#include 
+
+static __init void *early_alloc(size_t size, int node)
+{
+    void *ptr = memblock_alloc_try_nid(size, size,
+    __pa(MAX_DMA_ADDRESS), MEMBLOCK_ALLOC_ACCESSIBLE, node);
+
+    if (!ptr)
+    panic("%pS: Failed to allocate %zu bytes align=%zx nid=%d from=%llx\n",

+    __func__, size, size, node, (u64)__pa(MAX_DMA_ADDRESS));
+
+    return ptr;
+}

 extern pgd_t early_pg_dir[PTRS_PER_PGD];
 asmlinkage void __init kasan_early_init(void)
@@ -83,6 +96,40 @@ static void __init populate(void *start, void *end)
 memset(start, 0, end - start);
 }

+void __init kasan_shallow_populate(void *start, void *end)
+{
+    unsigned long vaddr = (unsigned long)start & PAGE_MASK;
+    unsigned long vend = PAGE_ALIGN((unsigned long)end);
+    unsigned long pfn;
+    int index;
+    void *p;
+    pud_t *pud_dir, *pud_k;
+    pgd_t *pgd_dir, *pgd_k;
+    p4d_t *p4d_dir, *p4d_k;
+
+    while (vaddr < vend) {
+    index = pgd_index(vaddr);
+    pfn = csr_read(CSR_SATP) & SATP_PPN;


At this point in the boot process, we know that we use swapper_pg_dir 
so no need to read SATP.



+    pgd_dir = (pgd_t *)pfn_to_virt(pfn) + index;


Here, this pgd_dir assignment is overwritten 2 lines below, so no need 
for it.



+    pgd_k = init_mm.pgd + index;
+    pgd_dir = pgd_offset_k(vaddr);


pgd_offset_k(vaddr) = init_mm.pgd + pgd_index(vaddr) so pgd_k == pgd_dir.


+    set_pgd(pgd_dir, *pgd_k);
+
+    p4d_dir = p4d_offset(pgd_dir, vaddr);
+    p4d_k  = p4d_offset(pgd_k, vaddr);
+
+    vaddr = (vaddr + PUD_SIZE) & PUD_MASK;


Why do you increase vaddr *before* populating the first one ? And 
pud_addr_end does that properly: it returns the next pud address if it 
does not go beyond end address to map.



+    pud_dir = pud_offset(p4d_dir, vaddr);
+    pud_k = pud_offset(p4d_k, vaddr);
+
+    if (pud_present(*pud_dir)) {
+    p = early_alloc(PAGE_SIZE, NUMA_NO_NODE);
+    pud_populate(&init_mm, pud_dir, p);


init_mm is not needed here.


+    }
+    vaddr += PAGE_SIZE;


Why do you need to add PAGE_SIZE ? vaddr already points to the next pud.

It seems like this patch tries to populate userspace page table 
whereas at this point in the boot process, only swapper_pg_dir is used 
or am I missing something ?


Thanks,

Alex


I implemented this morning a version that fixes all the comments I made 
earlier. I was able to insert test_kasan_module on both sv39 and sv48 
without any modification: set_pgd "goes through" all the unused page 
table levels, whereas p*d_populate are noop for unused levels.


If you have any comment, do not hesitate.

diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
index adbf94b7e68a..d643b222167c 100644
--- a/arch/riscv/mm/kasan_init.c
+++ b/arch/riscv/mm/kasan_init.c
@@ -195,6 +195,31 @@ static void __init kasan_populate(void *start, void *end)

     memset(start, KASAN_SHADOW_INIT, end - start);
  }


+void __init kasan_shallow_populate_pgd(unsigned long vaddr, unsigned long end)
+{
+   unsigned long next;
+   void *p;
+   pgd_t *pgd_k = pgd_offset_k(vaddr);
+
+   do {
+   next = pgd_addr_end(vaddr, end);
+  

Re: riscv+KASAN does not boot

2021-02-19 Thread Alex Ghiti

Hi Dmitry,

On 2/18/21 at 6:36 AM, Dmitry Vyukov wrote:

On Thu, Feb 18, 2021 at 8:54 AM Alex Ghiti  wrote:


Hi Dmitry,


On Wed, Feb 17, 2021 at 5:36 PM Alex Ghiti  wrote:


On 2/16/21 at 11:42 PM, Dmitry Vyukov wrote:

On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti  wrote:


Hi Dmitry,

On 2/16/21 at 6:25 AM, Dmitry Vyukov wrote:

On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov  wrote:


On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov  wrote:

I was fixing KASAN support for my sv48 patchset so I took a look at your
issue: I built a kernel on top of the branch riscv/fixes using
https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
and Buildroot 2020.11. I have the warnings regarding the use of
__virt_to_phys on wrong addresses (but that's normal since this function
is used in virt_addr_valid) but not the segfaults you describe.


Hi Alex,

Let me try to rebuild buildroot image. Maybe there was something wrong
with my build, though, I did 'make clean' before doing. But at the
same time it worked back in June...

Re WARNINGs, they indicate kernel bugs. I am working on setting up a
syzbot instance on riscv. If there a WARNING during boot then the
kernel will be marked as broken. No further testing will happen.
Is it a mis-use of WARN_ON? If so, could anybody please remove it or
replace it with pr_err.



Hi,

I've localized one issue with riscv/KASAN:
KASAN breaks the VDSO, and I think that's the root cause of the weird
faults I saw earlier. The following patch fixes it.
Could somebody please upstream this fix? I don't know how to add/run
tests for this.
Thanks

diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
index 0cfd6da784f84..cf3a383c1799d 100644
--- a/arch/riscv/kernel/vdso/Makefile
+++ b/arch/riscv/kernel/vdso/Makefile
@@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
 # Disable gcov profiling for VDSO code
 GCOV_PROFILE := n
 KCOV_INSTRUMENT := n
+KASAN_SANITIZE := n

 # Force dependency
 $(obj)/vdso.o: $(obj)/vdso.so


What's weird is that I don't have any issue without this patch with the
following config whereas it indeed seems required for KASAN. But when
looking at the segfaults you got earlier, the segfault address is 0xbb0
and the cause is an instruction page fault: this address is the PLT base
address in vdso.so and an instruction page fault would mean that someone
tried to jump at this address, which is weird. At first sight, that does
not seem related to your patch above, but clearly I may be wrong.

Tobias, did you observe the same segfaults as Dmitry ?



I noticed that not all buildroot images use VDSO, it seems to be
dependent on libc settings (at least I think I changed it in the
past).


Ok, I used uClibc but then when using glibc, I have the same segfaults,
only when KASAN is enabled. And your patch fixes the problem. I will try
to take a look later to better understand the problem.


I also booted an image completely successfully including dhcpd/sshd
start, but then my executable crashed in clock_gettime. The executable
was build on linux/amd64 host with "riscv64-linux-gnu-gcc -static"
(10.2.1).



Second issue I am seeing seems to be related to text segment size.
I check out v5.11 and use this config:
https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178


This config gave my laptop a hard time! Finally I was able to boot
correctly to userspace, but I realized I used my sv48 branch... Either I
fixed your issue along the way or I can't reproduce it; I'll give it a
try tomorrow.


Where is your branch? I could also test in my setup on your branch.



You can find my branch int/alex/riscv_kernel_end_of_address_space_v2
here: https://github.com/AlexGhiti/riscv-linux.git


No, it does not work for me.

Source is on b61ab6c98de021398cd7734ea5fc3655e51e70f2 (HEAD,
int/alex/riscv_kernel_end_of_address_space_v2)
Config is 
https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt

riscv64-linux-gnu-gcc -v
gcc version 10.2.1 20210110 (Debian 10.2.1-6+build1)

qemu-system-riscv64 --version
QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)

qemu-system-riscv64 \
-machine virt -smp 2 -m 2G \
-device virtio-blk-device,drive=hd0 \
-drive file=image-riscv64,if=none,format=raw,id=hd0 \
-kernel arch/riscv/boot/Image \
-nographic \
-device virtio-rng-device,rng=rng0 -object
rng-random,filename=/dev/urandom,id=rng0 \
-netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device
virtio-net-device,netdev=net0 \
-append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic
panic_on_warn=1 panic=86400 earlycon"


It still works for me but I had to disable CONFIG_DEBUG_INFO_BTF (I
don't think that changes anything at runtime). But your above command
line does not work for me as it appears you do not load any firmware, if
I add -bios images/f

Re: riscv+KASAN does not boot

2021-02-18 Thread Alex Ghiti

Hi Dmitry,


On Wed, Feb 17, 2021 at 5:36 PM Alex Ghiti  wrote:


Le 2/16/21 à 11:42 PM, Dmitry Vyukov a écrit :

On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti  wrote:


Hi Dmitry,

Le 2/16/21 à 6:25 AM, Dmitry Vyukov a écrit :

On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov  wrote:


On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov  wrote:

I was fixing KASAN support for my sv48 patchset so I took a look at your
issue: I built a kernel on top of the branch riscv/fixes using
https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
and Buildroot 2020.11. I have the warnings regarding the use of
__virt_to_phys on wrong addresses (but that's normal since this function
is used in virt_addr_valid) but not the segfaults you describe.


Hi Alex,

Let me try to rebuild buildroot image. Maybe there was something wrong
with my build, though, I did 'make clean' before doing. But at the
same time it worked back in June...

Re WARNINGs, they indicate kernel bugs. I am working on setting up a
syzbot instance on riscv. If there a WARNING during boot then the
kernel will be marked as broken. No further testing will happen.
Is it a mis-use of WARN_ON? If so, could anybody please remove it or
replace it with pr_err.



Hi,

I've localized one issue with riscv/KASAN:
KASAN breaks VDSO and that's I think the root cause of weird faults I
saw earlier. The following patch fixes it.
Could somebody please upstream this fix? I don't know how to add/run
tests for this.
Thanks

diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
index 0cfd6da784f84..cf3a383c1799d 100644
--- a/arch/riscv/kernel/vdso/Makefile
+++ b/arch/riscv/kernel/vdso/Makefile
@@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
# Disable gcov profiling for VDSO code
GCOV_PROFILE := n
KCOV_INSTRUMENT := n
+KASAN_SANITIZE := n

# Force dependency
$(obj)/vdso.o: $(obj)/vdso.so


What's weird is that I don't have any issue without this patch with the
following config whereas it indeed seems required for KASAN. But when
looking at the segfaults you got earlier, the segfault address is 0xbb0
and the cause is an instruction page fault: this address is the PLT base
address in vdso.so and an instruction page fault would mean that someone
tried to jump at this address, which is weird. At first sight, that does
not seem related to your patch above, but clearly I may be wrong.

Tobias, did you observe the same segfaults as Dmitry ?



I noticed that not all buildroot images use VDSO, it seems to be
dependent on libc settings (at least I think I changed it in the
past).


Ok, I used uClibc but then when using glibc, I have the same segfaults,
only when KASAN is enabled. And your patch fixes the problem. I will try
to take a look later to better understand the problem.


I also booted an image completely successfully including dhcpd/sshd
start, but then my executable crashed in clock_gettime. The executable
was build on linux/amd64 host with "riscv64-linux-gnu-gcc -static"
(10.2.1).



Second issue I am seeing seems to be related to text segment size.
I check out v5.11 and use this config:
https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178


This config gave my laptop a hard time! Finally I was able to boot
correctly to userspace, but I realized I used my sv48 branch... Either I
fixed your issue along the way or I can't reproduce it, I'll give it a
try tomorrow.


Where is your branch? I could also test in my setup on your branch.



You can find my branch int/alex/riscv_kernel_end_of_address_space_v2
here: https://github.com/AlexGhiti/riscv-linux.git


No, it does not work for me.

Source is on b61ab6c98de021398cd7734ea5fc3655e51e70f2 (HEAD,
int/alex/riscv_kernel_end_of_address_space_v2)
Config is 
https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt

riscv64-linux-gnu-gcc -v
gcc version 10.2.1 20210110 (Debian 10.2.1-6+build1)

qemu-system-riscv64 --version
QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)

qemu-system-riscv64 \
-machine virt -smp 2 -m 2G \
-device virtio-blk-device,drive=hd0 \
-drive file=image-riscv64,if=none,format=raw,id=hd0 \
-kernel arch/riscv/boot/Image \
-nographic \
-device virtio-rng-device,rng=rng0 -object
rng-random,filename=/dev/urandom,id=rng0 \
-netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device
virtio-net-device,netdev=net0 \
-append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic
panic_on_warn=1 panic=86400 earlycon"


It still works for me, but I had to disable CONFIG_DEBUG_INFO_BTF (I 
don't think that changes anything at runtime). But your above command 
line does not work for me, as it appears you do not load any firmware; 
if I add -bios images/fw_jump.elf, it works. But then I don't know where 
your opensbi output below comes from...


And regarding you

Re: riscv+KASAN does not boot

2021-02-17 Thread Alex Ghiti

Le 2/16/21 à 11:42 PM, Dmitry Vyukov a écrit :

On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti  wrote:


Hi Dmitry,

Le 2/16/21 à 6:25 AM, Dmitry Vyukov a écrit :

On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov  wrote:


On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov  wrote:

I was fixing KASAN support for my sv48 patchset so I took a look at your
issue: I built a kernel on top of the branch riscv/fixes using
https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
and Buildroot 2020.11. I have the warnings regarding the use of
__virt_to_phys on wrong addresses (but that's normal since this function
is used in virt_addr_valid) but not the segfaults you describe.


Hi Alex,

Let me try to rebuild buildroot image. Maybe there was something wrong
with my build, though, I did 'make clean' before doing. But at the
same time it worked back in June...

Re WARNINGs, they indicate kernel bugs. I am working on setting up a
syzbot instance on riscv. If there a WARNING during boot then the
kernel will be marked as broken. No further testing will happen.
Is it a mis-use of WARN_ON? If so, could anybody please remove it or
replace it with pr_err.



Hi,

I've localized one issue with riscv/KASAN:
KASAN breaks VDSO and that's I think the root cause of weird faults I
saw earlier. The following patch fixes it.
Could somebody please upstream this fix? I don't know how to add/run
tests for this.
Thanks

diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
index 0cfd6da784f84..cf3a383c1799d 100644
--- a/arch/riscv/kernel/vdso/Makefile
+++ b/arch/riscv/kernel/vdso/Makefile
@@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
   # Disable gcov profiling for VDSO code
   GCOV_PROFILE := n
   KCOV_INSTRUMENT := n
+KASAN_SANITIZE := n

   # Force dependency
   $(obj)/vdso.o: $(obj)/vdso.so


What's weird is that I don't have any issue without this patch with the
following config whereas it indeed seems required for KASAN. But when
looking at the segfaults you got earlier, the segfault address is 0xbb0
and the cause is an instruction page fault: this address is the PLT base
address in vdso.so and an instruction page fault would mean that someone
tried to jump at this address, which is weird. At first sight, that does
not seem related to your patch above, but clearly I may be wrong.

Tobias, did you observe the same segfaults as Dmitry?



I noticed that not all buildroot images use VDSO, it seems to be
dependent on libc settings (at least I think I changed it in the
past).


Ok, I used uClibc but then when using glibc, I have the same segfaults, 
only when KASAN is enabled. And your patch fixes the problem. I will try 
to take a look later to better understand the problem.



I also booted an image completely successfully including dhcpd/sshd
start, but then my executable crashed in clock_gettime. The executable
was built on a linux/amd64 host with "riscv64-linux-gnu-gcc -static"
(10.2.1).



Second issue I am seeing seems to be related to text segment size.
I check out v5.11 and use this config:
https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178


This config gave my laptop a hard time! Finally I was able to boot
correctly to userspace, but I realized I used my sv48 branch... Either I
fixed your issue along the way or I can't reproduce it, I'll give it a
try tomorrow.


Where is your branch? I could also test in my setup on your branch.



You can find my branch int/alex/riscv_kernel_end_of_address_space_v2 
here: https://github.com/AlexGhiti/riscv-linux.git


Thanks,




Then trying to boot it using:
QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)
$ qemu-system-riscv64 -machine virt -smp 2 -m 4G ...

It shows no output from the kernel whatsoever, even though I have
earlycon and output shows very early with other configs.
Kernel boots fine with defconfig and other smaller configs.

If I enable KASAN_OUTLINE and CC_OPTIMIZE_FOR_SIZE, then this config
also boots fine. Both of these options significantly reduce kernel
size. However, I can also boot the kernel without these 2 configs, if
I disable a whole lot of subsystem configs. This makes me think that
there is an issue related to kernel size somewhere in
qemu/bootloader/kernel bootstrap code.
Does it make sense to you? Can somebody reproduce what I am seeing?


I did not bring any answer to your question, but at least you know I'm
working on it, I'll keep you posted.

Thanks for taking the time to setup syzkaller.

Alex


Thanks

___
linux-riscv mailing list
linux-ri...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv






Re: riscv+KASAN does not boot

2021-02-16 Thread Alex Ghiti

Hi Dmitry,

Le 2/16/21 à 6:25 AM, Dmitry Vyukov a écrit :

On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov  wrote:


On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov  wrote:

I was fixing KASAN support for my sv48 patchset so I took a look at your
issue: I built a kernel on top of the branch riscv/fixes using
https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
and Buildroot 2020.11. I have the warnings regarding the use of
__virt_to_phys on wrong addresses (but that's normal since this function
is used in virt_addr_valid) but not the segfaults you describe.


Hi Alex,

Let me try to rebuild buildroot image. Maybe there was something wrong
with my build, though, I did 'make clean' before doing. But at the
same time it worked back in June...

Re WARNINGs, they indicate kernel bugs. I am working on setting up a
syzbot instance on riscv. If there is a WARNING during boot then the
kernel will be marked as broken. No further testing will happen.
Is it a mis-use of WARN_ON? If so, could anybody please remove it or
replace it with pr_err?



Hi,

I've localized one issue with riscv/KASAN:
KASAN breaks VDSO and that's I think the root cause of weird faults I
saw earlier. The following patch fixes it.
Could somebody please upstream this fix? I don't know how to add/run
tests for this.
Thanks

diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
index 0cfd6da784f84..cf3a383c1799d 100644
--- a/arch/riscv/kernel/vdso/Makefile
+++ b/arch/riscv/kernel/vdso/Makefile
@@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
  # Disable gcov profiling for VDSO code
  GCOV_PROFILE := n
  KCOV_INSTRUMENT := n
+KASAN_SANITIZE := n

  # Force dependency
  $(obj)/vdso.o: $(obj)/vdso.so


What's weird is that I don't have any issue without this patch with the 
following config whereas it indeed seems required for KASAN. But when 
looking at the segfaults you got earlier, the segfault address is 0xbb0 
and the cause is an instruction page fault: this address is the PLT base 
address in vdso.so and an instruction page fault would mean that someone 
tried to jump at this address, which is weird. At first sight, that does 
not seem related to your patch above, but clearly I may be wrong.


Tobias, did you observe the same segfaults as Dmitry?





Second issue I am seeing seems to be related to text segment size.
I check out v5.11 and use this config:
https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178


This config gave my laptop a hard time! Finally I was able to boot
correctly to userspace, but I realized I used my sv48 branch... Either I
fixed your issue along the way or I can't reproduce it, I'll give it a
try tomorrow.




Then trying to boot it using:
QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)
$ qemu-system-riscv64 -machine virt -smp 2 -m 4G ...

It shows no output from the kernel whatsoever, even though I have
earlycon and output shows very early with other configs.
Kernel boots fine with defconfig and other smaller configs.

If I enable KASAN_OUTLINE and CC_OPTIMIZE_FOR_SIZE, then this config
also boots fine. Both of these options significantly reduce kernel
size. However, I can also boot the kernel without these 2 configs, if
I disable a whole lot of subsystem configs. This makes me think that
there is an issue related to kernel size somewhere in
qemu/bootloader/kernel bootstrap code.
Does it make sense to you? Can somebody reproduce what I am seeing?


I did not bring any answer to your question, but at least you know I'm 
working on it, I'll keep you posted.


Thanks for taking the time to setup syzkaller.

Alex


Thanks




Re: [PATCH v2 1/1] riscv/kasan: add KASAN_VMALLOC support

2021-02-13 Thread Alex Ghiti

Hi Nylon, Palmer,

Le 2/8/21 à 1:28 AM, Alex Ghiti a écrit :

Hi Nylon,

Le 1/22/21 à 10:56 PM, Palmer Dabbelt a écrit :

On Fri, 15 Jan 2021 21:58:35 PST (-0800), nyl...@andestech.com wrote:

It references to x86/s390 architecture.
So, it doesn't map the early shadow page to cover VMALLOC space.

Prepopulate top level page table for the range that would otherwise be
empty.

lower levels are filled dynamically upon memory allocation while
booting.


I think we can improve the changelog a bit here with something like that:

"KASAN vmalloc space used to be mapped using kasan early shadow page. 
KASAN_VMALLOC requires the top-level of the kernel page table to be 
properly populated, lower levels being filled dynamically upon memory 
allocation at runtime."




Signed-off-by: Nylon Chen 
Signed-off-by: Nick Hu 
---
 arch/riscv/Kconfig |  1 +
 arch/riscv/mm/kasan_init.c | 57 +-
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 81b76d44725d..15a2c8088bbe 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -57,6 +57,7 @@ config RISCV
 select HAVE_ARCH_JUMP_LABEL
 select HAVE_ARCH_JUMP_LABEL_RELATIVE
 select HAVE_ARCH_KASAN if MMU && 64BIT
+    select HAVE_ARCH_KASAN_VMALLOC if MMU && 64BIT
 select HAVE_ARCH_KGDB
 select HAVE_ARCH_KGDB_QXFER_PKT
 select HAVE_ARCH_MMAP_RND_BITS if MMU
diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
index 12ddd1f6bf70..4b9149f963d3 100644
--- a/arch/riscv/mm/kasan_init.c
+++ b/arch/riscv/mm/kasan_init.c
@@ -9,6 +9,19 @@
 #include 
 #include 
 #include 
+#include 
+
+static __init void *early_alloc(size_t size, int node)
+{
+    void *ptr = memblock_alloc_try_nid(size, size,
+    __pa(MAX_DMA_ADDRESS), MEMBLOCK_ALLOC_ACCESSIBLE, node);
+
+    if (!ptr)
+    panic("%pS: Failed to allocate %zu bytes align=%zx nid=%d from=%llx\n",
+    __func__, size, size, node, (u64)__pa(MAX_DMA_ADDRESS));
+
+    return ptr;
+}

 extern pgd_t early_pg_dir[PTRS_PER_PGD];
 asmlinkage void __init kasan_early_init(void)
@@ -83,6 +96,40 @@ static void __init populate(void *start, void *end)
 memset(start, 0, end - start);
 }

+void __init kasan_shallow_populate(void *start, void *end)
+{
+    unsigned long vaddr = (unsigned long)start & PAGE_MASK;
+    unsigned long vend = PAGE_ALIGN((unsigned long)end);
+    unsigned long pfn;
+    int index;
+    void *p;
+    pud_t *pud_dir, *pud_k;
+    pgd_t *pgd_dir, *pgd_k;
+    p4d_t *p4d_dir, *p4d_k;
+
+    while (vaddr < vend) {
+    index = pgd_index(vaddr);
+    pfn = csr_read(CSR_SATP) & SATP_PPN;


At this point in the boot process, we know that we use swapper_pg_dir so 
no need to read SATP.



+    pgd_dir = (pgd_t *)pfn_to_virt(pfn) + index;


Here, this pgd_dir assignment is overwritten 2 lines below, so no need 
for it.



+    pgd_k = init_mm.pgd + index;
+    pgd_dir = pgd_offset_k(vaddr);


pgd_offset_k(vaddr) = init_mm.pgd + pgd_index(vaddr) so pgd_k == pgd_dir.


+    set_pgd(pgd_dir, *pgd_k);
+
+    p4d_dir = p4d_offset(pgd_dir, vaddr);
+    p4d_k  = p4d_offset(pgd_k, vaddr);
+
+    vaddr = (vaddr + PUD_SIZE) & PUD_MASK;


Why do you increase vaddr *before* populating the first one? And 
pud_addr_end does that properly: it returns the next pud address if it 
does not go beyond the end address to map.



+    pud_dir = pud_offset(p4d_dir, vaddr);
+    pud_k = pud_offset(p4d_k, vaddr);
+
+    if (pud_present(*pud_dir)) {
+    p = early_alloc(PAGE_SIZE, NUMA_NO_NODE);
+    pud_populate(&init_mm, pud_dir, p);


init_mm is not needed here.


+    }
+    vaddr += PAGE_SIZE;


Why do you need to add PAGE_SIZE? vaddr already points to the next pud.

It seems like this patch tries to populate the userspace page table, 
whereas at this point in the boot process only swapper_pg_dir is used, 
or am I missing something?


Thanks,

Alex


I implemented this morning a version that fixes all the comments I made 
earlier. I was able to insert test_kasan_module on both sv39 and sv48 
without any modification: set_pgd "goes through" all the unused page 
table levels, whereas p*d_populate are noop for unused levels.


If you have any comment, do not hesitate.

diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c 

index adbf94b7e68a..d643b222167c 100644 

--- a/arch/riscv/mm/kasan_init.c 

+++ b/arch/riscv/mm/kasan_init.c 

@@ -195,6 +195,31 @@ static void __init kasan_populate(void *start, void 
*end)
memset(start, KASAN_SHADOW_INIT, end - start); 

 } 




+void __init kasan_shallow_populate_pgd(unsigned long vaddr, unsigned 
long end)
+{ 

+   unsigned long next; 

+   void *p; 

+   pgd_t *pgd_k = pgd_offset_k(vaddr); 

+ 

+   do { 

+   next = pgd_addr_end(vaddr, end); 

+   if (pgd_page_va

Re: [PATCH] riscv: Improve kasan population by using hugepages when possible

2021-02-08 Thread Alex Ghiti

Le 2/2/21 à 3:50 AM, Alex Ghiti a écrit :

Hi,

Le 2/1/21 à 3:00 AM, Alexandre Ghiti a écrit :

Kasan function that populates the shadow regions used to allocate them
page by page and did not take advantage of hugepages, so fix this by
trying to allocate hugepages of 1GB and fallback to 2MB hugepages or 4K
pages in case it fails.

This reduces the page table memory consumption and improves TLB usage,
as shown below:

Before this patch:

---[ Kasan shadow start ]---
0xffc0-0xffc4    0x818ef000    16G PTE . A . . . . R V
0xffc4-0xffc447fc    0x0002b7f4f000   1179392K PTE D A . . . W R V
0xffc48000-0xffc8    0x818ef000    14G PTE . A . . . . R V
---[ Kasan shadow end ]---

After this patch:

---[ Kasan shadow start ]---
0xffc0-0xffc4    0x818ef000    16G PTE . A . . . . R V
0xffc4-0xffc44000    0x00024000 1G PGD D A . . . W R V
0xffc44000-0xffc447e0    0x0002b7e0   126M PMD D A . . . W R V
0xffc447e0-0xffc447fc    0x0002b818f000  1792K PTE D A . . . W R V
0xffc48000-0xffc8    0x818ef000    14G PTE . A . . . . R V
---[ Kasan shadow end ]---

Signed-off-by: Alexandre Ghiti 
---
  arch/riscv/mm/kasan_init.c | 101 +++--
  1 file changed, 73 insertions(+), 28 deletions(-)

diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
index a8a2ffd9114a..8f11b73018b1 100644
--- a/arch/riscv/mm/kasan_init.c
+++ b/arch/riscv/mm/kasan_init.c
@@ -47,37 +47,82 @@ asmlinkage void __init kasan_early_init(void)
  local_flush_tlb_all();
  }
-static void __init populate(void *start, void *end)
+static void kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned long end)
+{
+    phys_addr_t phys_addr;
+    pte_t *ptep = memblock_alloc(PTRS_PER_PTE * sizeof(pte_t), PAGE_SIZE);
+
+    do {
+    phys_addr = memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
+    set_pte(ptep, pfn_pte(PFN_DOWN(phys_addr), PAGE_KERNEL));
+    } while (ptep++, vaddr += PAGE_SIZE, vaddr != end);
+
+    set_pmd(pmd, pfn_pmd(PFN_DOWN(__pa(ptep)), PAGE_TABLE));
+}
+
+static void kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned long end)
+{
+    phys_addr_t phys_addr;
+    pmd_t *pmdp = memblock_alloc(PTRS_PER_PMD * sizeof(pmd_t), PAGE_SIZE);
+    unsigned long next;
+
+    do {
+    next = pmd_addr_end(vaddr, end);
+
+    if (IS_ALIGNED(vaddr, PMD_SIZE) && (next - vaddr) >= PMD_SIZE) {
+    phys_addr = memblock_phys_alloc(PMD_SIZE, PMD_SIZE);
+    if (phys_addr) {
+    set_pmd(pmdp, pfn_pmd(PFN_DOWN(phys_addr), PAGE_KERNEL));
+    continue;
+    }
+    }
+
+    kasan_populate_pte(pmdp, vaddr, end);
+    } while (pmdp++, vaddr = next, vaddr != end);
+
+    /*
+ * Wait for the whole PGD to be populated before setting the PGD in
+ * the page table, otherwise, if we did set the PGD before populating
+ * it entirely, memblock could allocate a page at a physical address
+ * where KASAN is not populated yet and then we'd get a page fault.
+ */
+    set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(pmdp)), PAGE_TABLE));


In case the PMD was filled entirely, PFN_DOWN(__pa(pmdp)) will point to 
the next physical page, which is wrong. The same problem happens on the 
other levels too.


I'll fix that in a v2 later today.

Alex


+}
+
+static void kasan_populate_pgd(unsigned long vaddr, unsigned long end)
+{
+    phys_addr_t phys_addr;
+    pgd_t *pgdp = pgd_offset_k(vaddr);
+    unsigned long next;
+
+    do {
+    next = pgd_addr_end(vaddr, end);
+
+    if (IS_ALIGNED(vaddr, PGDIR_SIZE) && (next - vaddr) >= PGDIR_SIZE) {
+    phys_addr = memblock_phys_alloc(PGDIR_SIZE, PGDIR_SIZE);
+    if (phys_addr) {
+    set_pgd(pgdp, pfn_pgd(PFN_DOWN(phys_addr), PAGE_KERNEL));
+    continue;
+    }
+    }
+
+    kasan_populate_pmd(pgdp, vaddr, end);
+    } while (pgdp++, vaddr = next, vaddr != end);
+}
+
+/*
+ * This function populates KASAN shadow region focusing on hugepages in
+ * order to minimize the page table cost and TLB usage too.
+ * Note that start must be PGDIR_SIZE-aligned in SV39, which amounts to
+ * 1G alignment (that represents an 8G alignment constraint on virtual
+ * address ranges because of KASAN_SHADOW_SCALE_SHIFT).
+ * ranges because of KASAN_SHADOW_SCALE_SHIFT).
+ */
+static void __init kasan_populate(void *start, void *end)
  {
-    unsigned long i, offset;
  unsigned long vaddr = (unsigned long)start & PAGE_MASK;
  unsigned long vend = PAGE_ALIGN((unsigned long)end);
-    unsigned long n_pages = (vend - vaddr) / PAGE_SIZE;
-    unsigned long n_ptes =
-    ((n_pages + PTRS_PER_PTE) & -PTRS_PER_PTE) / PTRS_PER_PTE;
-    unsigned long n_pmds =
-    ((n_ptes +

Re: [PATCH v2 1/1] riscv/kasan: add KASAN_VMALLOC support

2021-02-07 Thread Alex Ghiti

Hi Nylon,

Le 1/22/21 à 10:56 PM, Palmer Dabbelt a écrit :

On Fri, 15 Jan 2021 21:58:35 PST (-0800), nyl...@andestech.com wrote:

It references to x86/s390 architecture.
So, it doesn't map the early shadow page to cover VMALLOC space.

Prepopulate top level page table for the range that would otherwise be
empty.

lower levels are filled dynamically upon memory allocation while
booting.


I think we can improve the changelog a bit here with something like that:

"KASAN vmalloc space used to be mapped using kasan early shadow page. 
KASAN_VMALLOC requires the top-level of the kernel page table to be 
properly populated, lower levels being filled dynamically upon memory 
allocation at runtime."




Signed-off-by: Nylon Chen 
Signed-off-by: Nick Hu 
---
 arch/riscv/Kconfig |  1 +
 arch/riscv/mm/kasan_init.c | 57 +-
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 81b76d44725d..15a2c8088bbe 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -57,6 +57,7 @@ config RISCV
 select HAVE_ARCH_JUMP_LABEL
 select HAVE_ARCH_JUMP_LABEL_RELATIVE
 select HAVE_ARCH_KASAN if MMU && 64BIT
+    select HAVE_ARCH_KASAN_VMALLOC if MMU && 64BIT
 select HAVE_ARCH_KGDB
 select HAVE_ARCH_KGDB_QXFER_PKT
 select HAVE_ARCH_MMAP_RND_BITS if MMU
diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
index 12ddd1f6bf70..4b9149f963d3 100644
--- a/arch/riscv/mm/kasan_init.c
+++ b/arch/riscv/mm/kasan_init.c
@@ -9,6 +9,19 @@
 #include 
 #include 
 #include 
+#include 
+
+static __init void *early_alloc(size_t size, int node)
+{
+    void *ptr = memblock_alloc_try_nid(size, size,
+    __pa(MAX_DMA_ADDRESS), MEMBLOCK_ALLOC_ACCESSIBLE, node);
+
+    if (!ptr)
+    panic("%pS: Failed to allocate %zu bytes align=%zx nid=%d from=%llx\n",
+    __func__, size, size, node, (u64)__pa(MAX_DMA_ADDRESS));
+
+    return ptr;
+}

 extern pgd_t early_pg_dir[PTRS_PER_PGD];
 asmlinkage void __init kasan_early_init(void)
@@ -83,6 +96,40 @@ static void __init populate(void *start, void *end)
 memset(start, 0, end - start);
 }

+void __init kasan_shallow_populate(void *start, void *end)
+{
+    unsigned long vaddr = (unsigned long)start & PAGE_MASK;
+    unsigned long vend = PAGE_ALIGN((unsigned long)end);
+    unsigned long pfn;
+    int index;
+    void *p;
+    pud_t *pud_dir, *pud_k;
+    pgd_t *pgd_dir, *pgd_k;
+    p4d_t *p4d_dir, *p4d_k;
+
+    while (vaddr < vend) {
+    index = pgd_index(vaddr);
+    pfn = csr_read(CSR_SATP) & SATP_PPN;


At this point in the boot process, we know that we use swapper_pg_dir so 
no need to read SATP.



+    pgd_dir = (pgd_t *)pfn_to_virt(pfn) + index;


Here, this pgd_dir assignment is overwritten 2 lines below, so no need 
for it.



+    pgd_k = init_mm.pgd + index;
+    pgd_dir = pgd_offset_k(vaddr);


pgd_offset_k(vaddr) = init_mm.pgd + pgd_index(vaddr) so pgd_k == pgd_dir.


+    set_pgd(pgd_dir, *pgd_k);
+
+    p4d_dir = p4d_offset(pgd_dir, vaddr);
+    p4d_k  = p4d_offset(pgd_k, vaddr);
+
+    vaddr = (vaddr + PUD_SIZE) & PUD_MASK;


Why do you increase vaddr *before* populating the first one? And 
pud_addr_end does that properly: it returns the next pud address if it 
does not go beyond the end address to map.



+    pud_dir = pud_offset(p4d_dir, vaddr);
+    pud_k = pud_offset(p4d_k, vaddr);
+
+    if (pud_present(*pud_dir)) {
+    p = early_alloc(PAGE_SIZE, NUMA_NO_NODE);
+    pud_populate(&init_mm, pud_dir, p);


init_mm is not needed here.


+    }
+    vaddr += PAGE_SIZE;


Why do you need to add PAGE_SIZE? vaddr already points to the next pud.

It seems like this patch tries to populate the userspace page table, 
whereas at this point in the boot process only swapper_pg_dir is used, 
or am I missing something?


Thanks,

Alex


+    }
+}
+
 void __init kasan_init(void)
 {
 phys_addr_t _start, _end;
@@ -90,7 +137,15 @@ void __init kasan_init(void)

 kasan_populate_early_shadow((void *)KASAN_SHADOW_START,
 (void *)kasan_mem_to_shadow((void *)
-    VMALLOC_END));
+    VMEMMAP_END));
+    if (IS_ENABLED(CONFIG_KASAN_VMALLOC))
+    kasan_shallow_populate(
+    (void *)kasan_mem_to_shadow((void *)VMALLOC_START),
+    (void *)kasan_mem_to_shadow((void *)VMALLOC_END));
+    else
+    kasan_populate_early_shadow(
+    (void *)kasan_mem_to_shadow((void *)VMALLOC_START),
+    (void *)kasan_mem_to_shadow((void *)VMALLOC_END));

 for_each_mem_range(i, &_start, &_end) {
void *start = (void *)_start;

Thanks, this is on for-next.



Re: [PATCH] riscv: Improve kasan population by using hugepages when possible

2021-02-02 Thread Alex Ghiti

Hi,

Le 2/1/21 à 3:00 AM, Alexandre Ghiti a écrit :

Kasan function that populates the shadow regions used to allocate them
page by page and did not take advantage of hugepages, so fix this by
trying to allocate hugepages of 1GB and fallback to 2MB hugepages or 4K
pages in case it fails.

This reduces the page table memory consumption and improves TLB usage,
as shown below:

Before this patch:

---[ Kasan shadow start ]---
0xffc0-0xffc4    0x818ef000    16G PTE . A . . . . R V
0xffc4-0xffc447fc    0x0002b7f4f000   1179392K PTE D A . . . W R V
0xffc48000-0xffc8    0x818ef000    14G PTE . A . . . . R V
---[ Kasan shadow end ]---

After this patch:

---[ Kasan shadow start ]---
0xffc0-0xffc4    0x818ef000    16G PTE . A . . . . R V
0xffc4-0xffc44000    0x00024000 1G PGD D A . . . W R V
0xffc44000-0xffc447e0    0x0002b7e0   126M PMD D A . . . W R V
0xffc447e0-0xffc447fc    0x0002b818f000  1792K PTE D A . . . W R V
0xffc48000-0xffc8    0x818ef000    14G PTE . A . . . . R V
---[ Kasan shadow end ]---

Signed-off-by: Alexandre Ghiti 
---
  arch/riscv/mm/kasan_init.c | 101 +++--
  1 file changed, 73 insertions(+), 28 deletions(-)

diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
index a8a2ffd9114a..8f11b73018b1 100644
--- a/arch/riscv/mm/kasan_init.c
+++ b/arch/riscv/mm/kasan_init.c
@@ -47,37 +47,82 @@ asmlinkage void __init kasan_early_init(void)
local_flush_tlb_all();
  }
  
-static void __init populate(void *start, void *end)

+static void kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned long end)
+{
+   phys_addr_t phys_addr;
+   pte_t *ptep = memblock_alloc(PTRS_PER_PTE * sizeof(pte_t), PAGE_SIZE);
+
+   do {
+   phys_addr = memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
+   set_pte(ptep, pfn_pte(PFN_DOWN(phys_addr), PAGE_KERNEL));
+   } while (ptep++, vaddr += PAGE_SIZE, vaddr != end);
+
+   set_pmd(pmd, pfn_pmd(PFN_DOWN(__pa(ptep)), PAGE_TABLE));
+}
+
+static void kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned long end)
+{
+   phys_addr_t phys_addr;
+   pmd_t *pmdp = memblock_alloc(PTRS_PER_PMD * sizeof(pmd_t), PAGE_SIZE);
+   unsigned long next;
+
+   do {
+   next = pmd_addr_end(vaddr, end);
+
+   if (IS_ALIGNED(vaddr, PMD_SIZE) && (next - vaddr) >= PMD_SIZE) {
+   phys_addr = memblock_phys_alloc(PMD_SIZE, PMD_SIZE);
+   if (phys_addr) {
+   set_pmd(pmdp, pfn_pmd(PFN_DOWN(phys_addr), PAGE_KERNEL));
+   continue;
+   }
+   }
+
+   kasan_populate_pte(pmdp, vaddr, end);
+   } while (pmdp++, vaddr = next, vaddr != end);
+
+   /*
+* Wait for the whole PGD to be populated before setting the PGD in
+* the page table, otherwise, if we did set the PGD before populating
+* it entirely, memblock could allocate a page at a physical address
+* where KASAN is not populated yet and then we'd get a page fault.
+*/
+   set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(pmdp)), PAGE_TABLE));


In case the PMD was filled entirely, PFN_DOWN(__pa(pmdp)) will point to 
the next physical page, which is wrong. The same problem happens on the 
other levels too.


I'll fix that in a v2 later today.

Alex


+}
+
+static void kasan_populate_pgd(unsigned long vaddr, unsigned long end)
+{
+   phys_addr_t phys_addr;
+   pgd_t *pgdp = pgd_offset_k(vaddr);
+   unsigned long next;
+
+   do {
+   next = pgd_addr_end(vaddr, end);
+
+   if (IS_ALIGNED(vaddr, PGDIR_SIZE) && (next - vaddr) >= PGDIR_SIZE) {
+   phys_addr = memblock_phys_alloc(PGDIR_SIZE, PGDIR_SIZE);
+   if (phys_addr) {
+   set_pgd(pgdp, pfn_pgd(PFN_DOWN(phys_addr), PAGE_KERNEL));
+   continue;
+   }
+   }
+
+   kasan_populate_pmd(pgdp, vaddr, end);
+   } while (pgdp++, vaddr = next, vaddr != end);
+}
+
+/*
+ * This function populates KASAN shadow region focusing on hugepages in
+ * order to minimize the page table cost and TLB usage too.
+ * Note that start must be PGDIR_SIZE-aligned in SV39, which amounts to
+ * 1G alignment (that represents an 8G alignment constraint on virtual address
+ * ranges because of KASAN_SHADOW_SCALE_SHIFT).
+ */
+static void __init kasan_populate(void *start, void *end)
  {
-   unsigned long i, offset;
unsigned long vaddr = (unsigned long)start & PAGE_MASK;
unsigned long vend = PAGE_ALIGN((unsigned long)end);
-   

Re: [PATCH] riscv: kasan: remove unneeded semicolon

2021-02-01 Thread Alex Ghiti

Hi Yang,

Le 2/2/21 à 12:51 AM, Yang Li a écrit :

Eliminate the following coccicheck warning:
./arch/riscv/mm/kasan_init.c:103:2-3: Unneeded semicolon

Reported-by: Abaci Robot 
Signed-off-by: Yang Li 
---
  arch/riscv/mm/kasan_init.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
index a8a2ffd..fac437a 100644
--- a/arch/riscv/mm/kasan_init.c
+++ b/arch/riscv/mm/kasan_init.c
@@ -100,7 +100,7 @@ void __init kasan_init(void)
break;
  
  		populate(kasan_mem_to_shadow(start), kasan_mem_to_shadow(end));

-   };
+   }
  
  	for (i = 0; i < PTRS_PER_PTE; i++)

set_pte(&kasan_early_shadow_pte[i],



Reviewed-by: Alexandre Ghiti 

Thanks,

Alex


Re: [RFC PATCH 00/12] Introduce sv48 support without relocable kernel

2021-01-30 Thread Alex Ghiti

Hi Palmer,

On 1/4/21 2:58 PM, Alexandre Ghiti wrote:

This patchset, contrary to the previous versions, makes it possible to have
a single kernel for sv39 and sv48 without being relocatable.
  
The idea comes from Arnd Bergmann who suggested to do the same as x86,

that is mapping the kernel to the end of the address space, which allows
the kernel to be linked at the same address for both sv39 and sv48 and
then does not require to be relocated at runtime.
  
This is an RFC because I need to at least rebase a few commits and add

documentation. The most interesting patches, where I expect feedback, are
1/12, 2/12 and 8/12. Note that moving the kernel out of the linear
mapping and sv48 support can be separate patchsets, I share them together
today to show that it works (this patchset is rebased on top of v5.10).

If we agree about the overall idea, I'll rebase my relocatable patchset
on top of that and then KASLR implementation from Zong will be greatly
simplified since moving the kernel out of the linear mapping will avoid
to copy the kernel physically.
  
This implements sv48 support at runtime. The kernel will try to

boot with 4-level page table and will fallback to 3-level if the HW does not
support it. Folding the 4th level into a 3-level page table has almost no
cost at runtime.
  
Finally, the user can now ask for sv39 explicitly by using the device-tree

which will reduce memory footprint and reduce the number of memory accesses
in case of TLB miss.

Alexandre Ghiti (12):
   riscv: Move kernel mapping outside of linear mapping
   riscv: Protect the kernel linear mapping
   riscv: Get rid of compile time logic with MAX_EARLY_MAPPING_SIZE
   riscv: Allow to dynamically define VA_BITS
   riscv: Simplify MAXPHYSMEM config
   riscv: Prepare ptdump for vm layout dynamic addresses
   asm-generic: Prepare for riscv use of pud_alloc_one and pud_free
   riscv: Implement sv48 support
   riscv: Allow user to downgrade to sv39 when hw supports sv48
   riscv: Use pgtable_l4_enabled to output mmu type in cpuinfo
   riscv: Explicit comment about user virtual address space size
   riscv: Improve virtual kernel memory layout dump

  arch/riscv/Kconfig  |  34 +--
  arch/riscv/boot/loader.lds.S|   3 +-
  arch/riscv/include/asm/csr.h|   3 +-
  arch/riscv/include/asm/fixmap.h |   3 +
  arch/riscv/include/asm/page.h   |  33 ++-
  arch/riscv/include/asm/pgalloc.h|  40 +++
  arch/riscv/include/asm/pgtable-64.h | 104 ++-
  arch/riscv/include/asm/pgtable.h|  68 +++--
  arch/riscv/include/asm/sparsemem.h  |   6 +-
  arch/riscv/kernel/cpu.c |  23 +-
  arch/riscv/kernel/head.S|   6 +-
  arch/riscv/kernel/module.c  |   4 +-
  arch/riscv/kernel/vmlinux.lds.S |   3 +-
  arch/riscv/mm/context.c |   2 +-
  arch/riscv/mm/init.c| 376 
  arch/riscv/mm/physaddr.c|   2 +-
  arch/riscv/mm/ptdump.c  |  56 +++-
  drivers/firmware/efi/libstub/efi-stub.c |   2 +-
  include/asm-generic/pgalloc.h   |  24 +-
  include/linux/sizes.h   |   3 +-
  20 files changed, 648 insertions(+), 147 deletions(-)



Any thoughts about the idea? Is it going in the right direction? I have
fixed quite a few things since I posted this, so don't bother giving
this patchset a full review.


Thanks,

Alex


Re: riscv+KASAN does not boot

2021-01-28 Thread Alex Ghiti

Hi Dmitry,

On 1/18/21 10:43 AM, Dmitry Vyukov wrote:

On Mon, Jan 18, 2021 at 4:05 PM Dmitry Vyukov  wrote:


On Mon, Jan 18, 2021 at 3:53 PM Tobias Klauser  wrote:

On Thu, Jan 14, 2021 at 5:57 AM Palmer Dabbelt  wrote:


On Fri, 25 Dec 2020 09:13:23 PST (-0800), dvyu...@google.com wrote:

On Fri, Dec 25, 2020 at 5:58 PM Andreas Schwab  wrote:


On Dez 25 2020, Dmitry Vyukov wrote:


qemu-system-riscv64 \
-machine virt -bios default -smp 1 -m 2G \
-device virtio-blk-device,drive=hd0 \
-drive file=buildroot-riscv64.ext4,if=none,format=raw,id=hd0 \
-kernel arch/riscv/boot/Image \
-nographic \
-device virtio-rng-device,rng=rng0 -object
rng-random,filename=/dev/urandom,id=rng0 \
-netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device
virtio-net-device,netdev=net0 \
-append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic
panic_on_warn=1 panic=86400"


Do you get more output with earlycon=sbi?


Hi Andreas,

For defconfig+kvm_guest.config+ scripts/config -e KASAN -e
KASAN_INLINE it actually gave me more output:


OpenSBI v0.7
   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|____/_____|
        | |
        |_|

Platform Name  : QEMU Virt Machine
Platform HART Features : RV64ACDFIMSU
Current Hart   : 0
Firmware Base  : 0x8000
Firmware Size  : 132 KB
Runtime SBI Version: 0.2

MIDELEG : 0x0222
MEDELEG : 0xb109
PMP0: 0x8000-0x8003 (A)
PMP1: 0x-0x (A,R,W,X)
[0.00] Linux version 5.10.0-01370-g71c5f03154ac
(dvyu...@dvyukov-desk.muc.corp.google.com) (riscv64-linux-gnu-gcc
(Debian 10.2.0-9) 10.2.0, GNU ld (GNU Binutils for Debian) 2.35.1) #17
SMP Fri Dec 25 18:10:12 CET 2020
[0.00] OF: fdt: Ignoring memory range 0x8000 - 0x8020
[0.00] earlycon: sbi0 at I/O port 0x0 (options '')
[0.00] printk: bootconsole [sbi0] enabled
[0.00] efi: UEFI not found.
[0.00] Zone ranges:
[0.00]   DMA32[mem 0x8020-0x]
[0.00]   Normal   empty
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00]   node   0: [mem 0x8020-0x]
[0.00] Initmem setup node 0 [mem 0x8020-0x]
[0.00] SBI specification v0.2 detected
[0.00] SBI implementation ID=0x1 Version=0x7
[0.00] SBI v0.2 TIME extension detected
[0.00] SBI v0.2 IPI extension detected
[0.00] SBI v0.2 RFENCE extension detected
[0.00] software IO TLB: mapped [mem
0xfa3f9000-0xfe3f9000] (64MB)
[0.00] Unable to handle kernel paging request at virtual
address dfc81004
[0.00] Oops [#1]
[0.00] Modules linked in:
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted
5.10.0-01370-g71c5f03154ac #17
[0.00] epc: ffe00042e3e4 ra : ffe000c0462c sp : ffe001603ea0
[0.00]  gp : ffe0016e3c60 tp : ffe00160cd40 t0 :
dfc81004
[0.00]  t1 : ffe000e0a838 t2 :  s0 :
ffe001603f50
[0.00]  s1 : ffe0016e50a8 a0 : dfc81004 a1 :

[0.00]  a2 : 0ffc a3 : dfc82000 a4 :

[0.00]  a5 : 3e8c6001 a6 : ffe000e0a820 a7 :
0900
[0.00]  s2 : dfc82000 s3 : dfc8 s4 :
0001
[0.00]  s5 : ffe0016e5108 s6 : f000 s7 :
dfc81004
[0.00]  s8 : 0080 s9 :  s10:
ffe07a119000
[0.00]  s11: ffc0 t3 : ffe0016eb908 t4 :
0001
[0.00]  t5 : ffc4001c150a t6 : ffe001603be8
[0.00] status: 0100 badaddr: dfc81004
cause: 000f
[0.00] random: get_random_bytes called from
oops_exit+0x30/0x58 with crng_init=0
[0.00] ---[ end trace  ]---
[0.00] Kernel panic - not syncing: Fatal exception
[0.00] ---[ end Kernel panic - not syncing: Fatal exception ]---


But I first tried with the kernel image I had in the dir; I think it
was this config (no KASAN):
https://gist.githubusercontent.com/dvyukov/b2b62beccf80493781ab03b41430e616/raw/62e673cff08a8a41656d2871b8a37f74b00f509f/gistfile1.txt

and earlycon=sbi did not change anything (no output after OpenSBI).
So potentially there are 2 different problems.


Thanks for reporting this.  Looks like I'd forgotten to add a kasan config to
my tests.  There's one in there now, and it's passing as of the fix that Nylon
posted.


I can boot the KASAN kernel now on riscv/fixes.

Next problem: I've got only to:

[   90.498967][T1] Run /sbin/init as 

Re: [RFC PATCH 01/12] riscv: Move kernel mapping outside of linear mapping

2021-01-06 Thread Alex Ghiti




On 1/6/21 at 1:44 AM, Anup Patel wrote:

On Wed, Jan 6, 2021 at 12:06 PM Alex Ghiti  wrote:


Hi Anup,

On 1/5/21 at 6:40 AM, Anup Patel wrote:

On Tue, Jan 5, 2021 at 1:29 AM Alexandre Ghiti  wrote:


This is a preparatory patch for relocatable kernel and sv48 support.

The kernel used to be linked at PAGE_OFFSET address therefore we could use
the linear mapping for the kernel mapping. But the relocated kernel base
address will be different from PAGE_OFFSET and since in the linear mapping,
two different virtual addresses cannot point to the same physical address,
the kernel mapping needs to lie outside the linear mapping so that we don't
have to copy it at the same physical offset.

The kernel mapping is moved to the last 2GB of the address space and then
BPF and modules are also pushed to the same range since they have to lie
close to the kernel inside a 2GB window.

Note that the KASLR implementation will simply have to move the kernel in
this 2GB range and modify the BPF/modules regions accordingly.

In addition, by moving the kernel to the end of the address space, both
sv39 and sv48 kernels will be exactly the same without needing to be
relocated at runtime.


Awesome ! This is a good approach with no performance impact.



Suggested-by: Arnd Bergmann 
Signed-off-by: Alexandre Ghiti 
---
   arch/riscv/boot/loader.lds.S |  3 +-
   arch/riscv/include/asm/page.h| 10 -
   arch/riscv/include/asm/pgtable.h | 39 +--
   arch/riscv/kernel/head.S |  3 +-
   arch/riscv/kernel/module.c   |  4 +-
   arch/riscv/kernel/vmlinux.lds.S  |  3 +-
   arch/riscv/mm/init.c | 65 
   arch/riscv/mm/physaddr.c |  2 +-
   8 files changed, 94 insertions(+), 35 deletions(-)

diff --git a/arch/riscv/boot/loader.lds.S b/arch/riscv/boot/loader.lds.S
index 47a5003c2e28..62d94696a19c 100644
--- a/arch/riscv/boot/loader.lds.S
+++ b/arch/riscv/boot/loader.lds.S
@@ -1,13 +1,14 @@
   /* SPDX-License-Identifier: GPL-2.0 */

   #include 
+#include 

   OUTPUT_ARCH(riscv)
   ENTRY(_start)

   SECTIONS
   {
-   . = PAGE_OFFSET;
+   . = KERNEL_LINK_ADDR;

  .payload : {
  *(.payload)
diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index 2d50f76efe48..98188e315e8d 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -90,18 +90,26 @@ typedef struct page *pgtable_t;

   #ifdef CONFIG_MMU
   extern unsigned long va_pa_offset;
+extern unsigned long va_kernel_pa_offset;
   extern unsigned long pfn_base;
   #define ARCH_PFN_OFFSET(pfn_base)
   #else
   #define va_pa_offset   0
+#define va_kernel_pa_offset0
   #define ARCH_PFN_OFFSET(PAGE_OFFSET >> PAGE_SHIFT)
   #endif /* CONFIG_MMU */

   extern unsigned long max_low_pfn;
   extern unsigned long min_low_pfn;
+extern unsigned long kernel_virt_addr;

   #define __pa_to_va_nodebug(x)  ((void *)((unsigned long) (x) + va_pa_offset))
-#define __va_to_pa_nodebug(x)  ((unsigned long)(x) - va_pa_offset)
+#define linear_mapping_va_to_pa(x) ((unsigned long)(x) - va_pa_offset)
+#define kernel_mapping_va_to_pa(x) \
+   ((unsigned long)(x) - va_kernel_pa_offset)
+#define __va_to_pa_nodebug(x)  \
+   (((x) < KERNEL_LINK_ADDR) ? \
+   linear_mapping_va_to_pa(x) : kernel_mapping_va_to_pa(x))

   #ifdef CONFIG_DEBUG_VIRTUAL
   extern phys_addr_t __virt_to_phys(unsigned long x);
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 183f1f4b2ae6..102b728ca146 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -11,23 +11,32 @@

   #include 

-#ifndef __ASSEMBLY__
-
-/* Page Upper Directory not used in RISC-V */
-#include 
-#include 
-#include 
-#include 
+#ifndef CONFIG_MMU
+#define KERNEL_VIRT_ADDR   PAGE_OFFSET
+#define KERNEL_LINK_ADDR   PAGE_OFFSET
+#else

-#ifdef CONFIG_MMU
+#define ADDRESS_SPACE_END  (UL(-1))
+/*
+ * Leave 2GB for kernel, modules and BPF at the end of the address space
+ */
+#define KERNEL_VIRT_ADDR   (ADDRESS_SPACE_END - SZ_2G + 1)
+#define KERNEL_LINK_ADDR   KERNEL_VIRT_ADDR

   #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
   #define VMALLOC_END  (PAGE_OFFSET - 1)
   #define VMALLOC_START(PAGE_OFFSET - VMALLOC_SIZE)

+/* KASLR should leave at least 128MB for BPF after the kernel */
   #define BPF_JIT_REGION_SIZE(SZ_128M)
-#define BPF_JIT_REGION_START   (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
-#define BPF_JIT_REGION_END (VMALLOC_END)
+#define BPF_JIT_REGION_START   PFN_ALIGN((unsigned long)&_end)
+#define BPF_JIT_REGION_END (BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE)
+
+/* Modules always live before the kernel */
+#ifdef CONFIG_64BIT
+#define VMALLOC_MODULE_START   (PFN_ALIGN((unsigned long)&_end) - SZ_2G)
+#define VMALLOC_MODULE_END (PFN_ALIGN((unsigned long)&_start))
+#endif


This 

Re: [RFC PATCH 04/12] riscv: Allow to dynamically define VA_BITS

2021-01-05 Thread Alex Ghiti




On 1/5/21 at 7:06 AM, Anup Patel wrote:

On Tue, Jan 5, 2021 at 1:33 AM Alexandre Ghiti  wrote:


With 4-level page table folding at runtime, we don't know at compile time
the size of the virtual address space so we must set VA_BITS dynamically
so that sparsemem reserves the right amount of memory for struct pages.

Signed-off-by: Alexandre Ghiti 
---
  arch/riscv/Kconfig | 10 --
  arch/riscv/include/asm/pgtable.h   | 11 +--
  arch/riscv/include/asm/sparsemem.h |  6 +-
  3 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 44377fd7860e..2979a44103be 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -122,16 +122,6 @@ config ZONE_DMA32
 bool
 default y if 64BIT

-config VA_BITS
-   int
-   default 32 if 32BIT
-   default 39 if 64BIT
-
-config PA_BITS
-   int
-   default 34 if 32BIT
-   default 56 if 64BIT
-
  config PAGE_OFFSET
 hex
 default 0xC000 if 32BIT && MAXPHYSMEM_2GB
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 102b728ca146..c7973bfd65bc 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -43,8 +43,14 @@
   * struct pages to map half the virtual address space. Then
   * position vmemmap directly below the VMALLOC region.
   */
+#ifdef CONFIG_64BIT
+#define VA_BITS39
+#else
+#define VA_BITS32
+#endif
+
  #define VMEMMAP_SHIFT \
-   (CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
+   (VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
  #define VMEMMAP_SIZE   BIT(VMEMMAP_SHIFT)
  #define VMEMMAP_END(VMALLOC_START - 1)
  #define VMEMMAP_START  (VMALLOC_START - VMEMMAP_SIZE)
@@ -83,6 +89,7 @@
  #endif /* CONFIG_64BIT */

  #ifdef CONFIG_MMU
+
  /* Number of entries in the page global directory */
  #define PTRS_PER_PGD(PAGE_SIZE / sizeof(pgd_t))
  /* Number of entries in the page table */
@@ -453,7 +460,7 @@ static inline int ptep_clear_flush_young(struct 
vm_area_struct *vma,
   * and give the kernel the other (upper) half.
   */
  #ifdef CONFIG_64BIT
-#define KERN_VIRT_START(-(BIT(CONFIG_VA_BITS)) + TASK_SIZE)
+#define KERN_VIRT_START(-(BIT(VA_BITS)) + TASK_SIZE)
  #else
  #define KERN_VIRT_STARTFIXADDR_START
  #endif
diff --git a/arch/riscv/include/asm/sparsemem.h 
b/arch/riscv/include/asm/sparsemem.h
index 45a7018a8118..63acaecc3374 100644
--- a/arch/riscv/include/asm/sparsemem.h
+++ b/arch/riscv/include/asm/sparsemem.h
@@ -4,7 +4,11 @@
  #define _ASM_RISCV_SPARSEMEM_H

  #ifdef CONFIG_SPARSEMEM
-#define MAX_PHYSMEM_BITS   CONFIG_PA_BITS
+#ifdef CONFIG_64BIT
+#define MAX_PHYSMEM_BITS   56
+#else
+#define MAX_PHYSMEM_BITS   34
+#endif /* CONFIG_64BIT */
  #define SECTION_SIZE_BITS  27
  #endif /* CONFIG_SPARSEMEM */

--
2.20.1



Looks good to me.

Reviewed-by: Anup Patel 


Thanks,



Regards,
Anup



Alex


Re: [RFC PATCH 01/12] riscv: Move kernel mapping outside of linear mapping

2021-01-05 Thread Alex Ghiti

Hi Anup,

On 1/5/21 at 6:40 AM, Anup Patel wrote:

On Tue, Jan 5, 2021 at 1:29 AM Alexandre Ghiti  wrote:


This is a preparatory patch for relocatable kernel and sv48 support.

The kernel used to be linked at PAGE_OFFSET address therefore we could use
the linear mapping for the kernel mapping. But the relocated kernel base
address will be different from PAGE_OFFSET and since in the linear mapping,
two different virtual addresses cannot point to the same physical address,
the kernel mapping needs to lie outside the linear mapping so that we don't
have to copy it at the same physical offset.

The kernel mapping is moved to the last 2GB of the address space and then
BPF and modules are also pushed to the same range since they have to lie
close to the kernel inside a 2GB window.

Note that the KASLR implementation will simply have to move the kernel in
this 2GB range and modify the BPF/modules regions accordingly.

In addition, by moving the kernel to the end of the address space, both
sv39 and sv48 kernels will be exactly the same without needing to be
relocated at runtime.


Awesome ! This is a good approach with no performance impact.



Suggested-by: Arnd Bergmann 
Signed-off-by: Alexandre Ghiti 
---
  arch/riscv/boot/loader.lds.S |  3 +-
  arch/riscv/include/asm/page.h| 10 -
  arch/riscv/include/asm/pgtable.h | 39 +--
  arch/riscv/kernel/head.S |  3 +-
  arch/riscv/kernel/module.c   |  4 +-
  arch/riscv/kernel/vmlinux.lds.S  |  3 +-
  arch/riscv/mm/init.c | 65 
  arch/riscv/mm/physaddr.c |  2 +-
  8 files changed, 94 insertions(+), 35 deletions(-)

diff --git a/arch/riscv/boot/loader.lds.S b/arch/riscv/boot/loader.lds.S
index 47a5003c2e28..62d94696a19c 100644
--- a/arch/riscv/boot/loader.lds.S
+++ b/arch/riscv/boot/loader.lds.S
@@ -1,13 +1,14 @@
  /* SPDX-License-Identifier: GPL-2.0 */

  #include 
+#include 

  OUTPUT_ARCH(riscv)
  ENTRY(_start)

  SECTIONS
  {
-   . = PAGE_OFFSET;
+   . = KERNEL_LINK_ADDR;

 .payload : {
 *(.payload)
diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index 2d50f76efe48..98188e315e8d 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -90,18 +90,26 @@ typedef struct page *pgtable_t;

  #ifdef CONFIG_MMU
  extern unsigned long va_pa_offset;
+extern unsigned long va_kernel_pa_offset;
  extern unsigned long pfn_base;
  #define ARCH_PFN_OFFSET(pfn_base)
  #else
  #define va_pa_offset   0
+#define va_kernel_pa_offset0
  #define ARCH_PFN_OFFSET(PAGE_OFFSET >> PAGE_SHIFT)
  #endif /* CONFIG_MMU */

  extern unsigned long max_low_pfn;
  extern unsigned long min_low_pfn;
+extern unsigned long kernel_virt_addr;

  #define __pa_to_va_nodebug(x)  ((void *)((unsigned long) (x) + va_pa_offset))
-#define __va_to_pa_nodebug(x)  ((unsigned long)(x) - va_pa_offset)
+#define linear_mapping_va_to_pa(x) ((unsigned long)(x) - va_pa_offset)
+#define kernel_mapping_va_to_pa(x) \
+   ((unsigned long)(x) - va_kernel_pa_offset)
+#define __va_to_pa_nodebug(x)  \
+   (((x) < KERNEL_LINK_ADDR) ? \
+   linear_mapping_va_to_pa(x) : kernel_mapping_va_to_pa(x))

  #ifdef CONFIG_DEBUG_VIRTUAL
  extern phys_addr_t __virt_to_phys(unsigned long x);
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 183f1f4b2ae6..102b728ca146 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -11,23 +11,32 @@

  #include 

-#ifndef __ASSEMBLY__
-
-/* Page Upper Directory not used in RISC-V */
-#include 
-#include 
-#include 
-#include 
+#ifndef CONFIG_MMU
+#define KERNEL_VIRT_ADDR   PAGE_OFFSET
+#define KERNEL_LINK_ADDR   PAGE_OFFSET
+#else

-#ifdef CONFIG_MMU
+#define ADDRESS_SPACE_END  (UL(-1))
+/*
+ * Leave 2GB for kernel, modules and BPF at the end of the address space
+ */
+#define KERNEL_VIRT_ADDR   (ADDRESS_SPACE_END - SZ_2G + 1)
+#define KERNEL_LINK_ADDR   KERNEL_VIRT_ADDR

  #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
  #define VMALLOC_END  (PAGE_OFFSET - 1)
  #define VMALLOC_START(PAGE_OFFSET - VMALLOC_SIZE)

+/* KASLR should leave at least 128MB for BPF after the kernel */
  #define BPF_JIT_REGION_SIZE(SZ_128M)
-#define BPF_JIT_REGION_START   (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
-#define BPF_JIT_REGION_END (VMALLOC_END)
+#define BPF_JIT_REGION_START   PFN_ALIGN((unsigned long)&_end)
+#define BPF_JIT_REGION_END (BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE)
+
+/* Modules always live before the kernel */
+#ifdef CONFIG_64BIT
+#define VMALLOC_MODULE_START   (PFN_ALIGN((unsigned long)&_end) - SZ_2G)
+#define VMALLOC_MODULE_END (PFN_ALIGN((unsigned long)&_start))
+#endif


This does not look right or I am missing something.

I think the VMALLOC_MODULE_START should be:
#define VMALLOC_MODULE_START   (PFN_ALIGN((unsigned long)&_start) - 

Re: [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone

2020-07-22 Thread Alex Ghiti




On 7/21/20 at 7:36 PM, Palmer Dabbelt wrote:

On Tue, 21 Jul 2020 16:11:02 PDT (-0700), b...@kernel.crashing.org wrote:

On Tue, 2020-07-21 at 14:36 -0400, Alex Ghiti wrote:

> > I guess I don't understand why this is necessary at all.
> > Specifically: why can't we just relocate the kernel within the
> > linear map?  That would let the bootloader put the kernel wherever
> > it wants, modulo the physical memory size we support.  We'd need to
> > handle the regions that are coupled to the kernel's execution
> > address, but we could just put them in an explicit memory region
> > which is what we should probably be doing anyway.
>
> Virtual relocation in the linear mapping requires to move the kernel
> physically too. Zong implemented this physical move in its KASLR RFC
> patchset, which is cumbersome since finding an available physical spot
> is harder than just selecting a virtual range in the vmalloc range.
>
> In addition, having the kernel mapping in the linear mapping prevents
> the use of hugepage for the linear mapping resulting in performance
> loss (at least for the GB that encompasses the kernel).
>
> Why do you find this "ugly" ? The vmalloc region is just a bunch of
> available virtual addresses to whatever purpose we want, and as noted
> by Zong, arm64 uses the same scheme.


I don't get it :-)

At least on powerpc we move the kernel in the linear mapping and it
works fine with huge pages, what is your problem there ? You rely on
punching small-page size holes in there ?


That was my original suggestion, and I'm not actually sure it's
invalid.  It would mean that both the kernel's physical and virtual
addresses are set by the bootloader, which may or may not be workable
if we want to have an sv48+sv39 kernel.  My initial approach to
sv48+sv39 kernels would be to just throw away the sv39 memory on sv48
kernels, which would preserve the linear map but mean that there is no
single physical address that's accessible for both.  That would require
some coordination between the bootloader and the kernel as to where it
should be loaded, but maybe there's a better way to design the linear
map.  Right now we have a bunch of unwritten rules about where things
need to be loaded, which is a recipe for disaster.

We could copy the kernel around, but I'm not sure I really like that
idea.  We do zero the BSS right now, so it's not like we entirely rely
on the bootloader to set up the kernel image, but with the hart race
boot scheme we have right now we'd at least need to leave a stub
sitting around.  Maybe we just throw away SBI v0.1, though, that's why
we called it all legacy in the first place.

My bigger worry is that anything that involves running the kernel at
arbitrary virtual addresses means we need a PIC kernel, which means
every global symbol needs an indirection.  That's probably not so bad
for shared libraries, but the kernel has a lot of global symbols.  PLT
references probably aren't so scary, as we have an incoherent
instruction cache so the virtual function predictor isn't that hard to
build, but making all global data accesses GOT-relative seems like a
disaster for performance.  This fixed-VA thing really just exists so we
don't have to be full-on PIC.

In theory I think we could just get away with pretending that medany is
PIC, which I believe works as long as the data and text offset stays
constant, you don't have any symbols between 2GiB and -2GiB (as those
may stay fixed, even in medany), and you deal with GP accordingly
(which should work itself out in the current startup code).  We rely on
this for some of the early boot code (and will soon for kexec), but
that's a very controlled code base and we've already had some issues.
I'd be much more comfortable adding an explicit semi-PIC code model, as
I tend to miss something when doing these sorts of things and then we
could at least add it to the GCC test runs and guarantee it actually
works.  Not really sure I want to deal with that, though.  It would,
however, be the only way to get random virtual addresses during kernel
execution.


At least in the old days, there were a number of assumptions that
the kernel text/data/bss resides in the linear mapping.


Ya, it terrified me as well.  Alex says arm64 puts the kernel in the
vmalloc region, so assuming that's the case it must be possible.  I
didn't get that from reading the arm64 port (I guess it's no secret
that pretty much all I do is copy their code)


See https://elixir.bootlin.com/linux/latest/source/arch/arm64/mm/mmu.c#L615.




If you change that you need to ensure that it's still physically
contiguous and you'll have to tweak __va and __pa, which might induce
extra overhead.


I'm operating under the assumption that we don't want to add an
additional load to virt2phys conversions.  arm64 bends over b

Re: [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone

2020-07-22 Thread Alex Ghiti

Hi Palmer,

On 7/21/20 at 3:05 PM, Palmer Dabbelt wrote:

On Tue, 21 Jul 2020 11:36:10 PDT (-0700), a...@ghiti.fr wrote:

Let's try to make progress here: I add linux-mm in CC to get feedback on
this patch as it blocks sv48 support too.


Sorry for being slow here.  I haven't replied because I hadn't really
fleshed

No problem :)

out the design yet, but just so everyone's on the same page my problems
with this are:

* We waste vmalloc space on 32-bit systems, where there isn't a lot of it.
* On 64-bit systems the VA space around the kernel is precious because
  it's the only place we can place text (modules, BPF, whatever).  If
  we start putting the kernel in the vmalloc space then we either have
  to pre-allocate a bunch of space around it (essentially making it a
  fixed mapping anyway) or it becomes likely that we won't be able to
  find space for modules as they're loaded into running systems.


Let's note that we already have this issue for BPF and modules right
now. But keeping the kernel at the end of the vmalloc region quite
mitigates this problem: if we exhaust the vmalloc region on 64-bit and
then start allocating here, I think the whole system will have other
problems.


* Relying on a relocatable kernel for sv48 support introduces a fairly
  large performance hit.


I understand the performance penalty but I struggle to call it "fairly
large": can we benchmark this somehow?




Roughly, my proposal would be to:

* Leave the 32-bit memory map alone.  On 32-bit systems we can load modules
  anywhere and we only have one VA width, so we're not really solving any
  problems with these changes.


Ok that's possible although a lot of ifdef will get involved :)

* Statically allocate a 2GiB portion of the VA space for all our text,
  as its own region.  We'd link/relocate the kernel here instead of
  around PAGE_OFFSET, which would decouple the kernel from the physical
  memory layout of the system.  This would have the side effect of
  sorting out a bunch of bootloader headaches that we currently have.


This amounts to doing the same as this patch, but instead of using the
vmalloc region, we'd use our own region, right? I believe we'd then
lose the vmalloc facilities to allocate modules around this zone.



* Sort out how to maintain a linear map as the canonical hole moves around
  between the VA widths without adding a bunch of overhead to the
  virt2phys and friends.  This is probably going to be the trickiest
  part, but I think if we just change the page table code to
  essentially lie about VAs when an sv39 system runs an sv48+sv39
  kernel we could make it work -- there'd be some logical complexity
  involved, but it would remain fast.


I have to think about that.



This doesn't solve the problem of virtually relocatable kernels, but it
does let us decouple that from the sv48 stuff.  It also lets us stop
relying on a fixed physical address the kernel is loaded into, which is
another thing I don't like.



Agreed on this one.


I know this may be a more complicated approach, but there aren't any sv48
systems around right now so I just don't see the rush to support them,
particularly when there's a cost to what already exists (for those who
haven't been watching, so far all the sv48 patch sets have imposed a
significant performance penalty on all systems).



Alex



Alex

On 7/9/20 at 7:11 AM, Alex Ghiti wrote:

Hi Palmer,

On 7/9/20 at 1:05 AM, Palmer Dabbelt wrote:

On Sun, 07 Jun 2020 00:59:46 PDT (-0700), a...@ghiti.fr wrote:

This is a preparatory patch for relocatable kernel.

The kernel used to be linked at PAGE_OFFSET address and used to be
loaded physically at the beginning of the main memory. Therefore, we
could use the linear mapping for the kernel mapping.

But the relocated kernel base address will be different from
PAGE_OFFSET and since in the linear mapping, two different virtual
addresses cannot point to the same physical address, the kernel
mapping needs to lie outside the linear mapping.


I know it's been a while, but I keep opening this up to review it and
just can't get over how ugly it is to put the kernel's linear map in
the vmalloc region.

I guess I don't understand why this is necessary at all.
Specifically: why can't we just relocate the kernel within the linear
map?  That would let the bootloader put the kernel wherever it wants,
modulo the physical memory size we support.  We'd need to handle the
regions that are coupled to the kernel's execution address, but we
could just put them in an explicit memory region which is what we
should probably be doing anyway.


Virtual relocation in the linear mapping requires moving the kernel
physically too. Zong implemented this physical move in his KASLR RFC
patchset, which is cumbersome since finding an available physical spot
is harder than just selecting a virtual range in the vmalloc range.

In addition, having the kernel mapping in the linear mapping prevents
the use o

Re: [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone

2020-07-22 Thread Alex Ghiti

Hi Benjamin,

On 7/21/20 at 7:11 PM, Benjamin Herrenschmidt wrote:

On Tue, 2020-07-21 at 14:36 -0400, Alex Ghiti wrote:

I guess I don't understand why this is necessary at all.
Specifically: why
can't we just relocate the kernel within the linear map?  That would
let the
bootloader put the kernel wherever it wants, modulo the physical
memory size we
support.  We'd need to handle the regions that are coupled to the
kernel's
execution address, but we could just put them in an explicit memory
region
which is what we should probably be doing anyway.


Virtual relocation in the linear mapping requires moving the kernel
physically too. Zong implemented this physical move in his KASLR RFC
patchset, which is cumbersome since finding an available physical spot
is harder than just selecting a virtual range in the vmalloc range.

In addition, having the kernel mapping in the linear mapping prevents
the use of hugepage for the linear mapping resulting in performance loss
(at least for the GB that encompasses the kernel).

Why do you find this "ugly" ? The vmalloc region is just a bunch of
available virtual addresses to whatever purpose we want, and as noted by
Zong, arm64 uses the same scheme.


I don't get it :-)

At least on powerpc we move the kernel in the linear mapping and it
works fine with huge pages, what is your problem there ? You rely on
punching small-page size holes in there ?



ARCH_HAS_STRICT_KERNEL_RWX prevents the use of a hugepage for the
kernel mapping in the direct mapping as it sets different permissions
on different parts of the kernel (data, text, etc.).




At least in the old days, there were a number of assumptions that
the kernel text/data/bss resides in the linear mapping.

If you change that you need to ensure that it's still physically
contiguous and you'll have to tweak __va and __pa, which might induce
extra overhead.



Yes that's done in this patch and indeed there is an overhead to those 
functions.
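The overhead in question is one extra compare and branch per conversion. Below is a minimal userspace model of the patched `__va_to_pa_nodebug()` logic from this series; the link address and the offset values used in the test are illustrative, not the kernel's actual runtime values:

```c
#include <stdint.h>

/* Kernel link address: start of the last 2 GiB of the 64-bit VA space
 * (illustrative constant for this sketch). */
#define KERNEL_LINK_ADDR 0xffffffff80000000ULL

/* Model of the patched va->pa conversion: linear-map addresses and
 * kernel-map addresses use different offsets, selected by a compare.
 * This compare is the extra overhead discussed in the thread. */
static uint64_t va_to_pa(uint64_t va, uint64_t lin_off, uint64_t kern_off)
{
    return (va < KERNEL_LINK_ADDR) ? va - lin_off : va - kern_off;
}
```

Both mappings can resolve to the same physical address, which is exactly why the kernel mapping has to live outside the linear range.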



Cheers,
Ben.
  



Thanks,

Alex


Re: [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone

2020-07-21 Thread Alex Ghiti
Let's try to make progress here: I add linux-mm in CC to get feedback on 
this patch as it blocks sv48 support too.


Alex

On 7/9/20 at 7:11 AM, Alex Ghiti wrote:

Hi Palmer,

On 7/9/20 at 1:05 AM, Palmer Dabbelt wrote:

On Sun, 07 Jun 2020 00:59:46 PDT (-0700), a...@ghiti.fr wrote:

This is a preparatory patch for relocatable kernel.

The kernel used to be linked at PAGE_OFFSET address and used to be
loaded physically at the beginning of the main memory. Therefore, we
could use the linear mapping for the kernel mapping.

But the relocated kernel base address will be different from PAGE_OFFSET
and since in the linear mapping, two different virtual addresses cannot
point to the same physical address, the kernel mapping needs to lie
outside the linear mapping.


I know it's been a while, but I keep opening this up to review it and
just can't get over how ugly it is to put the kernel's linear map in
the vmalloc region.

I guess I don't understand why this is necessary at all.
Specifically: why can't we just relocate the kernel within the linear
map?  That would let the bootloader put the kernel wherever it wants,
modulo the physical memory size we support.  We'd need to handle the
regions that are coupled to the kernel's execution address, but we
could just put them in an explicit memory region which is what we
should probably be doing anyway.


Virtual relocation in the linear mapping requires moving the kernel
physically too. Zong implemented this physical move in his KASLR RFC
patchset, which is cumbersome since finding an available physical spot
is harder than just selecting a virtual range in the vmalloc range.


In addition, having the kernel mapping in the linear mapping prevents 
the use of hugepage for the linear mapping resulting in performance loss 
(at least for the GB that encompasses the kernel).


Why do you find this "ugly" ? The vmalloc region is just a bunch of 
available virtual addresses to whatever purpose we want, and as noted by 
Zong, arm64 uses the same scheme.





In addition, because modules and BPF must be close to the kernel (inside
+-2GB window), the kernel is placed at the end of the vmalloc zone minus
2GB, which leaves room for modules and BPF. The kernel could not be
placed at the beginning of the vmalloc zone since other vmalloc
allocations from the kernel could get all the +-2GB window around the
kernel, which would prevent new modules and BPF programs from being
loaded.
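The constraint driving this layout is that modules and BPF must sit within a signed 32-bit (roughly +/-2 GiB) PC-relative offset of the kernel under RISC-V's medany code model. That reach check can be sketched mechanically; this is a standalone illustration with made-up addresses, not kernel code:

```c
#include <stdint.h>

#define SZ_2G (2ULL << 30)

/* True if 'to' is reachable from 'from' with a +/-2 GiB PC-relative
 * offset -- the window the medany code model gives modules and BPF. */
static int within_pcrel_reach(uint64_t from, uint64_t to)
{
    int64_t d = (int64_t)(to - from);
    return d >= -(int64_t)SZ_2G && d < (int64_t)SZ_2G;
}
```

Placing the kernel at the very end of the address space means the 2 GiB immediately below it is always in reach, which is what the modules/BPF regions in the patch rely on.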


Well, that's not enough to make sure this doesn't happen -- it's just 
enough to
make sure it doesn't happen very quickily.  That's the same boat we're 
already

in, though, so it's not like it's worse.


Indeed, that's not worse, I haven't found a way to reserve vmalloc area 
without actually allocating it.





Signed-off-by: Alexandre Ghiti 
Reviewed-by: Zong Li 
---
 arch/riscv/boot/loader.lds.S |  3 +-
 arch/riscv/include/asm/page.h    | 10 +-
 arch/riscv/include/asm/pgtable.h | 38 ++---
 arch/riscv/kernel/head.S |  3 +-
 arch/riscv/kernel/module.c   |  4 +--
 arch/riscv/kernel/vmlinux.lds.S  |  3 +-
 arch/riscv/mm/init.c | 58 +---
 arch/riscv/mm/physaddr.c |  2 +-
 8 files changed, 88 insertions(+), 33 deletions(-)

diff --git a/arch/riscv/boot/loader.lds.S b/arch/riscv/boot/loader.lds.S
index 47a5003c2e28..62d94696a19c 100644
--- a/arch/riscv/boot/loader.lds.S
+++ b/arch/riscv/boot/loader.lds.S
@@ -1,13 +1,14 @@
 /* SPDX-License-Identifier: GPL-2.0 */

 #include 
+#include 

 OUTPUT_ARCH(riscv)
 ENTRY(_start)

 SECTIONS
 {
-    . = PAGE_OFFSET;
+    . = KERNEL_LINK_ADDR;

 .payload : {
 *(.payload)
diff --git a/arch/riscv/include/asm/page.h 
b/arch/riscv/include/asm/page.h

index 2d50f76efe48..48bb09b6a9b7 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -90,18 +90,26 @@ typedef struct page *pgtable_t;

 #ifdef CONFIG_MMU
 extern unsigned long va_pa_offset;
+extern unsigned long va_kernel_pa_offset;
 extern unsigned long pfn_base;
 #define ARCH_PFN_OFFSET    (pfn_base)
 #else
 #define va_pa_offset    0
+#define va_kernel_pa_offset    0
 #define ARCH_PFN_OFFSET    (PAGE_OFFSET >> PAGE_SHIFT)
 #endif /* CONFIG_MMU */

 extern unsigned long max_low_pfn;
 extern unsigned long min_low_pfn;
+extern unsigned long kernel_virt_addr;

 #define __pa_to_va_nodebug(x)    ((void *)((unsigned long) (x) + 
va_pa_offset))

-#define __va_to_pa_nodebug(x)    ((unsigned long)(x) - va_pa_offset)
+#define linear_mapping_va_to_pa(x)    ((unsigned long)(x) - 
va_pa_offset)

+#define kernel_mapping_va_to_pa(x)    \
+    ((unsigned long)(x) - va_kernel_pa_offset)
+#define __va_to_pa_nodebug(x)    \
+    (((x) >= PAGE_OFFSET) ?    \
+    linear_mapping_va_to_pa(x) : kernel_mapping_va_to_pa(x))

 #ifdef CONFIG_DEBUG_VIRTUAL
 extern phys_addr_t __virt_to_phys(unsigned long x);
diff --git a/arch/riscv/include/asm/pgtable.h 
b/arch/riscv/in

Re: [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone

2020-07-09 Thread Alex Ghiti

Hi Palmer,

On 7/9/20 at 1:05 AM, Palmer Dabbelt wrote:

On Sun, 07 Jun 2020 00:59:46 PDT (-0700), a...@ghiti.fr wrote:

This is a preparatory patch for relocatable kernel.

The kernel used to be linked at PAGE_OFFSET address and used to be loaded
physically at the beginning of the main memory. Therefore, we could use
the linear mapping for the kernel mapping.

But the relocated kernel base address will be different from PAGE_OFFSET
and since in the linear mapping, two different virtual addresses cannot
point to the same physical address, the kernel mapping needs to lie 
outside

the linear mapping.


I know it's been a while, but I keep opening this up to review it and just
can't get over how ugly it is to put the kernel's linear map in the vmalloc
region.

I guess I don't understand why this is necessary at all.  Specifically: why
can't we just relocate the kernel within the linear map?  That would let 
the
bootloader put the kernel wherever it wants, modulo the physical memory 
size we

support.  We'd need to handle the regions that are coupled to the kernel's
execution address, but we could just put them in an explicit memory region
which is what we should probably be doing anyway.


Virtual relocation in the linear mapping requires moving the kernel 
physically too. Zong implemented this physical move in his KASLR RFC 
patchset, which is cumbersome since finding an available physical spot 
is harder than just selecting a virtual range in the vmalloc range.


In addition, having the kernel mapping in the linear mapping prevents 
the use of hugepages for the linear mapping, resulting in a performance 
loss (at least for the GB that encompasses the kernel).


Why do you find this "ugly"? The vmalloc region is just a bunch of 
available virtual addresses for whatever purpose we want and, as noted by 
Zong, arm64 uses the same scheme.





In addition, because modules and BPF must be close to the kernel (inside
a +-2GB window), the kernel is placed at the end of the vmalloc zone minus
2GB, which leaves room for modules and BPF. The kernel could not be
placed at the beginning of the vmalloc zone since other vmalloc
allocations from the kernel could consume the whole +-2GB window around
the kernel, which would prevent new modules and BPF programs from being
loaded.


Well, that's not enough to make sure this doesn't happen -- it's just
enough to make sure it doesn't happen very quickly.  That's the same boat
we're already in, though, so it's not like it's worse.


Indeed, that's not worse; I haven't found a way to reserve a vmalloc area 
without actually allocating it.





Signed-off-by: Alexandre Ghiti 
Reviewed-by: Zong Li 
---
 arch/riscv/boot/loader.lds.S |  3 +-
 arch/riscv/include/asm/page.h    | 10 +-
 arch/riscv/include/asm/pgtable.h | 38 ++---
 arch/riscv/kernel/head.S |  3 +-
 arch/riscv/kernel/module.c   |  4 +--
 arch/riscv/kernel/vmlinux.lds.S  |  3 +-
 arch/riscv/mm/init.c | 58 +---
 arch/riscv/mm/physaddr.c |  2 +-
 8 files changed, 88 insertions(+), 33 deletions(-)

diff --git a/arch/riscv/boot/loader.lds.S b/arch/riscv/boot/loader.lds.S
index 47a5003c2e28..62d94696a19c 100644
--- a/arch/riscv/boot/loader.lds.S
+++ b/arch/riscv/boot/loader.lds.S
@@ -1,13 +1,14 @@
 /* SPDX-License-Identifier: GPL-2.0 */

 #include 
+#include 

 OUTPUT_ARCH(riscv)
 ENTRY(_start)

 SECTIONS
 {
-    . = PAGE_OFFSET;
+    . = KERNEL_LINK_ADDR;

 .payload : {
 *(.payload)
diff --git a/arch/riscv/include/asm/page.h 
b/arch/riscv/include/asm/page.h

index 2d50f76efe48..48bb09b6a9b7 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -90,18 +90,26 @@ typedef struct page *pgtable_t;

 #ifdef CONFIG_MMU
 extern unsigned long va_pa_offset;
+extern unsigned long va_kernel_pa_offset;
 extern unsigned long pfn_base;
 #define ARCH_PFN_OFFSET    (pfn_base)
 #else
 #define va_pa_offset    0
+#define va_kernel_pa_offset    0
 #define ARCH_PFN_OFFSET    (PAGE_OFFSET >> PAGE_SHIFT)
 #endif /* CONFIG_MMU */

 extern unsigned long max_low_pfn;
 extern unsigned long min_low_pfn;
+extern unsigned long kernel_virt_addr;

 #define __pa_to_va_nodebug(x)    ((void *)((unsigned long) (x) + 
va_pa_offset))

-#define __va_to_pa_nodebug(x)    ((unsigned long)(x) - va_pa_offset)
+#define linear_mapping_va_to_pa(x)    ((unsigned long)(x) - 
va_pa_offset)

+#define kernel_mapping_va_to_pa(x)    \
+    ((unsigned long)(x) - va_kernel_pa_offset)
+#define __va_to_pa_nodebug(x)    \
+    (((x) >= PAGE_OFFSET) ?    \
+    linear_mapping_va_to_pa(x) : kernel_mapping_va_to_pa(x))

 #ifdef CONFIG_DEBUG_VIRTUAL
 extern phys_addr_t __virt_to_phys(unsigned long x);
diff --git a/arch/riscv/include/asm/pgtable.h 
b/arch/riscv/include/asm/pgtable.h

index 35b60035b6b0..94ef3b49dfb6 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -11,23 +11,29 @@

 #include 

-#ifndef __ASSEMBLY__
-

Re: [PATCH] riscv: Enable ELF-ASLR for riscv

2020-07-08 Thread Alex Ghiti

Hi Guo,

On 7/9/20 at 12:38 AM, guo...@kernel.org wrote:

From: Guo Ren 

Let riscv randomize the stack, heap and binary images of
ELF binaries. It seems fine after qemu & chip tests, and
no side effect has been found.

So just simply select ARCH_HAS_ELF_RANDOMIZE :)

Signed-off-by: Guo Ren 
Cc: Palmer Dabbelt 
Cc: Paul Walmsley 
Cc: Zong Li 
Cc: Greentime Hu 
---
  arch/riscv/Kconfig | 1 +
  1 file changed, 1 insertion(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 91bfc6c..eed6647 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -20,6 +20,7 @@ config RISCV
select ARCH_HAS_GIGANTIC_PAGE
select ARCH_HAS_MMIOWB
select ARCH_HAS_PTE_SPECIAL
+   select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_SET_DIRECT_MAP
select ARCH_HAS_SET_MEMORY
select ARCH_HAS_STRICT_KERNEL_RWX if MMU



Actually it is already the case: ARCH_HAS_ELF_RANDOMIZE is already 
selected by ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT.


Thanks,

Alex


Re: [PATCH v5 0/4] vmalloc kernel mapping and relocatable kernel

2020-07-07 Thread Alex Ghiti

Hi Palmer,

On 6/7/20 at 3:59 AM, Alexandre Ghiti wrote:

This patchset originally implemented relocatable kernel support but now
also moves the kernel mapping into the vmalloc zone.
  
The first patch explains why we need to move the kernel into the vmalloc

zone (instead of memcpying it around). That patch should ease KASLR
implementation a lot.
  
The second patch allows building relocatable kernels but is not selected

by default.
  
The third and fourth patches take advantage of an already existing powerpc

script that checks relocations at compile-time, and uses it for riscv.
  
Changes in v5:

   * Add "static __init" to create_kernel_page_table function as reported by
 Kbuild test robot
   * Add reviewed-by from Zong
   * Rebase onto v5.7

Changes in v4:
   * Fix BPF region that overlapped with kernel's as suggested by Zong
   * Fix end of module region that could be larger than 2GB as suggested by Zong
   * Fix the size of the vm area reserved for the kernel as we could lose
 PMD_SIZE if the size was already aligned on PMD_SIZE
   * Split compile time relocations check patch into 2 patches as suggested by 
Anup
   * Applied Reviewed-by from Zong and Anup
  
Changes in v3:

   * Move kernel mapping to vmalloc
  
Changes in v2:

   * Make RELOCATABLE depend on MMU as suggested by Anup
   * Rename kernel_load_addr into kernel_virt_addr as suggested by Anup
   * Use __pa_symbol instead of __pa, as suggested by Zong
   * Rebased on top of v5.6-rc3
   * Tested with sv48 patchset
   * Add Reviewed/Tested-by from Zong and Anup

Alexandre Ghiti (4):
   riscv: Move kernel mapping to vmalloc zone
   riscv: Introduce CONFIG_RELOCATABLE
   powerpc: Move script to check relocations at compile time in scripts/
   riscv: Check relocations at compile time

  arch/powerpc/tools/relocs_check.sh |  18 +
  arch/riscv/Kconfig |  12 +++
  arch/riscv/Makefile|   5 +-
  arch/riscv/Makefile.postlink   |  36 +
  arch/riscv/boot/loader.lds.S   |   3 +-
  arch/riscv/include/asm/page.h  |  10 ++-
  arch/riscv/include/asm/pgtable.h   |  38 ++---
  arch/riscv/kernel/head.S   |   3 +-
  arch/riscv/kernel/module.c |   4 +-
  arch/riscv/kernel/vmlinux.lds.S|   9 ++-
  arch/riscv/mm/Makefile |   4 +
  arch/riscv/mm/init.c   | 121 +
  arch/riscv/mm/physaddr.c   |   2 +-
  arch/riscv/tools/relocs_check.sh   |  26 +++
  scripts/relocs_check.sh|  20 +
  15 files changed, 259 insertions(+), 52 deletions(-)
  create mode 100644 arch/riscv/Makefile.postlink
  create mode 100755 arch/riscv/tools/relocs_check.sh
  create mode 100755 scripts/relocs_check.sh



Do you have any remarks regarding this series?

Thanks,

Alex


Re: [PATCH v2 0/8] Introduce sv48 support

2020-07-02 Thread Alex Ghiti

Hi Palmer,

On 7/1/20 at 2:27 PM, Palmer Dabbelt wrote:

On Wed, 03 Jun 2020 01:10:56 PDT (-0700), a...@ghiti.fr wrote:

This patchset implements sv48 support at runtime. The kernel will try to
boot with a 4-level page table and will fall back to 3 levels if the HW
does not support it.

The biggest advantage is that we only have one kernel for 64bit, which
is way easier to maintain.

Folding the 4th level into a 3-level page table has almost no cost at
runtime. But as Palmer mentioned, the generated relocatable code is less
performant.

At the moment, there is no way to build a non-relocatable 64bit kernel 
with a 3-level page table. We agreed that distributions will use this 
runtime configuration anyway, but Palmer proposed to introduce a new 
Kconfig option, which I will do later as sv48 support was asked for 5.8.


Sorry I wasn't clear last time, but this still has the same fundamental 
issue:

it forces 64-bit kernels to be relocatable, which imposes a performance
penalty.  We don't have any hardware that can actually take advantage of 
sv48,
so I don't want to take anything that penalizes what people are actually 
using

in order to add a feature people can't use.

I'd be OK taking this if sv48 support simply depended on a relocatable 
kernel,
as then users who want the faster kernel could still build one.  I don't 
want

to take something that forces all 64-bit kernels to be relocatable.


Indeed, I had not understood that this was a requirement. I will add a 
patch on top of this one introducing a new config; I have to think about it.


But even if I understand that the new level of indirection coming with 
PIE will be slower, is this new config worth it? Can we somehow benchmark 
the performance loss? IMHO this config will get broken over time by lack 
of testing because I believe distributions will go for a KASLR kernel, 
which requires the relocatability property anyway.


Alex



Finally, the user can now ask for sv39 explicitly by using the 
device-tree
which will reduce memory footprint and reduce the number of memory 
accesses

in case of TLB miss.

Changes in v2:
  * Move variable declarations to pgtable.h in patch 5/7 as suggested 
by Anup

  * Restore mmu-type properties in patch 6 as suggested by Anup
  * Fix unused variable in patch 5 that was used in patch 6
  * Fix SPARSEMEM build (patch 2 was modified so I dropped the 
Reviewed-by)

  * Applied various Reviewed-by

Alexandre Ghiti (8):
  riscv: Get rid of compile time logic with MAX_EARLY_MAPPING_SIZE
  riscv: Allow to dynamically define VA_BITS
  riscv: Simplify MAXPHYSMEM config
  riscv: Prepare ptdump for vm layout dynamic addresses
  riscv: Implement sv48 support
  riscv: Allow user to downgrade to sv39 when hw supports sv48
  riscv: Use pgtable_l4_enabled to output mmu type in cpuinfo
  riscv: Explicit comment about user virtual address space size

 arch/riscv/Kconfig  |  34 ++---
 arch/riscv/include/asm/csr.h    |   3 +-
 arch/riscv/include/asm/fixmap.h |   1 +
 arch/riscv/include/asm/page.h   |  15 +++
 arch/riscv/include/asm/pgalloc.h    |  36 ++
 arch/riscv/include/asm/pgtable-64.h |  97 +-
 arch/riscv/include/asm/pgtable.h    |  31 -
 arch/riscv/include/asm/sparsemem.h  |   6 +-
 arch/riscv/kernel/cpu.c |  23 ++--
 arch/riscv/kernel/head.S    |   3 +-
 arch/riscv/mm/context.c |   2 +-
 arch/riscv/mm/init.c    | 194 
 arch/riscv/mm/ptdump.c  |  49 +--
 13 files changed, 412 insertions(+), 82 deletions(-)


Re: [PATCH 2/2] riscv: Use PUD/PGDIR entries for linear mapping when possible

2020-06-29 Thread Alex Ghiti

Hi Atish,

On 6/22/20 at 3:11 PM, Atish Patra wrote:

On Sun, Jun 21, 2020 at 2:39 AM Alex Ghiti  wrote:


Hi Atish,

On 6/20/20 at 5:04 AM, Alex Ghiti wrote:

Hi Atish,

On 6/19/20 at 2:16 PM, Atish Patra wrote:

On Thu, Jun 18, 2020 at 9:28 PM Alex Ghiti  wrote:

Hi Atish,

On 6/18/20 at 8:47 PM, Atish Patra wrote:

On Wed, Jun 3, 2020 at 8:38 AM Alexandre Ghiti  wrote:

Improve best_map_size so that PUD or PGDIR entries are used for
linear
mapping when possible as it allows better TLB utilization.

Signed-off-by: Alexandre Ghiti 
---
arch/riscv/mm/init.c | 45
+---
1 file changed, 34 insertions(+), 11 deletions(-)

diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 9a5c97e091c1..d275f9f834cf 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -424,13 +424,29 @@ static void __init create_pgd_mapping(pgd_t
*pgdp,
   create_pgd_next_mapping(nextp, va, pa, sz, prot);
}

-static uintptr_t __init best_map_size(phys_addr_t base,
phys_addr_t size)
+static bool is_map_size_ok(uintptr_t map_size, phys_addr_t base,
+  uintptr_t base_virt, phys_addr_t size)
{
-   /* Upgrade to PMD_SIZE mappings whenever possible */
-   if ((base & (PMD_SIZE - 1)) || (size & (PMD_SIZE - 1)))
-   return PAGE_SIZE;
+   return !((base & (map_size - 1)) || (base_virt & (map_size
- 1)) ||
+   (size < map_size));
+}
+
+static uintptr_t __init best_map_size(phys_addr_t base, uintptr_t
base_virt,
+ phys_addr_t size)
+{
+#ifndef __PAGETABLE_PMD_FOLDED
+   if (is_map_size_ok(PGDIR_SIZE, base, base_virt, size))
+   return PGDIR_SIZE;
+
+   if (pgtable_l4_enabled)
+   if (is_map_size_ok(PUD_SIZE, base, base_virt, size))
+   return PUD_SIZE;
+#endif
+
+   if (is_map_size_ok(PMD_SIZE, base, base_virt, size))
+   return PMD_SIZE;

-   return PMD_SIZE;
+   return PAGE_SIZE;
}

/*
@@ -576,7 +592,7 @@ void create_kernel_page_table(pgd_t *pgdir,
uintptr_t map_size)
asmlinkage void __init setup_vm(uintptr_t dtb_pa)
{
   uintptr_t va, end_va;
-   uintptr_t map_size = best_map_size(load_pa,
MAX_EARLY_MAPPING_SIZE);
+   uintptr_t map_size;

   load_pa = (uintptr_t)(&_start);
   load_sz = (uintptr_t)(&_end) - load_pa;
@@ -587,6 +603,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)

   kernel_virt_addr = KERNEL_VIRT_ADDR;

+   map_size = best_map_size(load_pa, PAGE_OFFSET,
MAX_EARLY_MAPPING_SIZE);
   va_pa_offset = PAGE_OFFSET - load_pa;
   va_kernel_pa_offset = kernel_virt_addr - load_pa;
   pfn_base = PFN_DOWN(load_pa);
@@ -700,6 +717,8 @@ static void __init setup_vm_final(void)

   /* Map all memory banks */
   for_each_memblock(memory, reg) {
+   uintptr_t remaining_size;
+
   start = reg->base;
   end = start + reg->size;

@@ -707,15 +726,19 @@ static void __init setup_vm_final(void)
   break;
   if (memblock_is_nomap(reg))
   continue;
-   if (start <= __pa(PAGE_OFFSET) &&
-   __pa(PAGE_OFFSET) < end)
-   start = __pa(PAGE_OFFSET);

-   map_size = best_map_size(start, end - start);
-   for (pa = start; pa < end; pa += map_size) {
+   pa = start;
+   remaining_size = reg->size;
+
+   while (remaining_size) {
   va = (uintptr_t)__va(pa);
+   map_size = best_map_size(pa, va,
remaining_size);
+
create_pgd_mapping(swapper_pg_dir, va, pa,
  map_size, PAGE_KERNEL);
+
+   pa += map_size;
+   remaining_size -= map_size;
   }
   }


This may not work on RV32 with 2G of memory if the map_size is
determined to be page size for the last memblock. Both pa and
remaining_size will overflow and the loop will try to map memory
from zero again.

I'm not sure I understand: if pa starts at 0x8000_ and size is 2G,
then pa will overflow in the last iteration, but remaining_size will
then be equal to 0, right?


Not unless the remaining_size is at least page size aligned. The last
remaining size would "fff".
It will overflow as well after subtracting the map_size.



While fixing this issue, I noticed that if the size in the device tree
is not aligned on PAGE_SIZE, the size is then automatically realigned to
PAGE_SIZE: see early_init_dt_add_memory_arch, where the size is AND-ed
with PAGE_MASK to remove the unaligned part.


Yes. But the memblock size is not guaranteed to be PAGE_SIZE aligned.
The memblock size is updated in memblock_cap_size

 /* adjus

Re: [PATCH 0/2] PUD/PGDIR entries for linear mapping

2020-06-29 Thread Alex Ghiti

On 6/3/20 at 11:36 AM, Alexandre Ghiti wrote:

This small patchset intends to use PUD/PGDIR entries for linear mapping
in order to better utilize TLB.

At the moment, only PMD entries can be used since on common platforms
(qemu/unleashed), the kernel is loaded at DRAM + 2MB which dealigns virtual
and physical addresses and then prevents the use of PUD/PGDIR entries.
So the kernel must be able to get those 2MB for PAGE_OFFSET to map the
beginning of the DRAM: this is achieved in patch 1.

But furthermore, at the moment, the firmware (opensbi) explicitly asks the
kernel not to map the region it occupies, which is on those common
platforms at the very beginning of the DRAM and then it also dealigns
virtual and physical addresses. I proposed a patch here:

https://github.com/riscv/opensbi/pull/167

that removes this 'constraint' but *not* all the time as it offers some
kind of protection in case PMP is not available. So sometimes, we may
have a part of the memory below the kernel that is removed, creating a
misalignment between virtual and physical addresses.
reasons, we must at least make sure that PMD entries can be used: that
is guaranteed by patch 1 too.

Finally the second patch simply improves best_map_size so that whenever
possible, PUD/PGDIR entries are used.

Below is the kernel page table without this patch on a 6G platform:

---[ Linear mapping ]---
0xc000-0xc00176e0 0x8020 5998M PMD D A . . . W R V

And with this patchset + opensbi patch:

---[ Linear mapping ]---
0xc000-0xc0014000 0x8000 5G PUD D A . . . W R V
0xc0014000-0xc0017700 0x0001c000 880M PMD D A . . . W R V

Alexandre Ghiti (2):
   riscv: Get memory below load_pa while ensuring linear mapping is PMD
 aligned
   riscv: Use PUD/PGDIR entries for linear mapping when possible

  arch/riscv/include/asm/page.h |  8 
  arch/riscv/mm/init.c  | 69 +--
  2 files changed, 65 insertions(+), 12 deletions(-)



The way to handle the remapping of the first 2MB is incorrect: Atish has 
issues while using an initrd because the initrd_start variable is 
defined using __va between setup_vm and setup_vm_final and then its 
value is inconsistent after setup_vm_final since virtual addressing was 
modified with the remapping of the first 2MB.


I will come up with another solution to this problem since the way I 
handle it for now is not correct.


Thanks,

Alex


Re: [PATCH v2 5/8] riscv: Implement sv48 support

2020-06-27 Thread Alex Ghiti

Hi Nick,

On 6/27/20 at 8:30 AM, Nick Kossifidis wrote:

On 2020-06-03 11:11, Alexandre Ghiti wrote:

By adding a new 4th level of page table, give 64bit kernels the ability
to address 2^48 bytes of virtual address space: in practice, that roughly

offers ~160TB of virtual address space to userspace and allows up to 64TB
of physical memory.

If the underlying hardware does not support sv48, we will automatically
fall back to a standard 3-level page table by folding the new PUD level
into the PGDIR level. In order to detect HW capabilities at runtime, we
use the SATP feature that ignores writes with an unsupported mode.

Signed-off-by: Alexandre Ghiti 
Reviewed-by: Anup Patel 
---
 arch/riscv/Kconfig  |   6 +-
 arch/riscv/include/asm/csr.h    |   3 +-
 arch/riscv/include/asm/fixmap.h |   1 +
 arch/riscv/include/asm/page.h   |  15 +++
 arch/riscv/include/asm/pgalloc.h    |  36 +++
 arch/riscv/include/asm/pgtable-64.h |  97 -
 arch/riscv/include/asm/pgtable.h    |  10 +-
 arch/riscv/kernel/head.S    |   3 +-
 arch/riscv/mm/context.c |   2 +-
 arch/riscv/mm/init.c    | 158 +---
 10 files changed, 307 insertions(+), 24 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index e167f16131f4..3f73f60e9732 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -68,6 +68,7 @@ config RISCV
 select ARCH_HAS_GCOV_PROFILE_ALL
 select HAVE_COPY_THREAD_TLS
 select HAVE_ARCH_KASAN if MMU && 64BIT
+    select RELOCATABLE if 64BIT

 config ARCH_MMAP_RND_BITS_MIN
 default 18 if 64BIT
@@ -106,7 +107,7 @@ config PAGE_OFFSET
 default 0xC000 if 32BIT && MAXPHYSMEM_2GB
 default 0x8000 if 64BIT && !MMU
 default 0x8000 if 64BIT && MAXPHYSMEM_2GB
-    default 0xffe0 if 64BIT && !MAXPHYSMEM_2GB
+    default 0xc000 if 64BIT && !MAXPHYSMEM_2GB

 config ARCH_FLATMEM_ENABLE
 def_bool y
@@ -155,8 +156,11 @@ config GENERIC_HWEIGHT
 config FIX_EARLYCON_MEM
 def_bool MMU

+# On a 64BIT relocatable kernel, the 4-level page table is at runtime 
folded

+# on a 3-level page table when sv48 is not supported.
 config PGTABLE_LEVELS
 int
+    default 4 if 64BIT && RELOCATABLE
 default 3 if 64BIT
 default 2

diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index cec462e198ce..d41536c3f8d4 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -40,11 +40,10 @@
 #ifndef CONFIG_64BIT
 #define SATP_PPN    _AC(0x003F, UL)
 #define SATP_MODE_32    _AC(0x8000, UL)
-#define SATP_MODE    SATP_MODE_32
 #else
 #define SATP_PPN    _AC(0x0FFF, UL)
 #define SATP_MODE_39    _AC(0x8000, UL)
-#define SATP_MODE    SATP_MODE_39
+#define SATP_MODE_48    _AC(0x9000, UL)
 #endif

 /* Exception cause high bit - is an interrupt if set */
diff --git a/arch/riscv/include/asm/fixmap.h 
b/arch/riscv/include/asm/fixmap.h

index 2368d49eb4ef..d891cf9c73c5 100644
--- a/arch/riscv/include/asm/fixmap.h
+++ b/arch/riscv/include/asm/fixmap.h
@@ -27,6 +27,7 @@ enum fixed_addresses {
 FIX_FDT = FIX_FDT_END + FIX_FDT_SIZE / PAGE_SIZE - 1,
 FIX_PTE,
 FIX_PMD,
+    FIX_PUD,
 FIX_TEXT_POKE1,
 FIX_TEXT_POKE0,
 FIX_EARLYCON_MEM_BASE,
diff --git a/arch/riscv/include/asm/page.h 
b/arch/riscv/include/asm/page.h

index 48bb09b6a9b7..5e77fe7f0d6d 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -31,7 +31,19 @@
  * When not using MMU this corresponds to the first free page in
  * physical memory (aligned on a page boundary).
  */
+#ifdef CONFIG_RELOCATABLE
+#define PAGE_OFFSET    __page_offset
+
+#ifdef CONFIG_64BIT
+/*
+ * By default, CONFIG_PAGE_OFFSET value corresponds to SV48 address 
space so

+ * define the PAGE_OFFSET value for SV39.
+ */
+#define PAGE_OFFSET_L3    0xffe0
+#endif /* CONFIG_64BIT */
+#else
 #define PAGE_OFFSET    _AC(CONFIG_PAGE_OFFSET, UL)
+#endif /* CONFIG_RELOCATABLE */

 #define KERN_VIRT_SIZE (-PAGE_OFFSET)

@@ -102,6 +114,9 @@ extern unsigned long pfn_base;
 extern unsigned long max_low_pfn;
 extern unsigned long min_low_pfn;
 extern unsigned long kernel_virt_addr;
+#ifdef CONFIG_RELOCATABLE
+extern unsigned long __page_offset;
+#endif

 #define __pa_to_va_nodebug(x)    ((void *)((unsigned long) (x) + 
va_pa_offset))
 #define linear_mapping_va_to_pa(x)    ((unsigned long)(x) - 
va_pa_offset)
diff --git a/arch/riscv/include/asm/pgalloc.h 
b/arch/riscv/include/asm/pgalloc.h

index 3f601ee8233f..540eaa5a8658 100644
--- a/arch/riscv/include/asm/pgalloc.h
+++ b/arch/riscv/include/asm/pgalloc.h
@@ -36,6 +36,42 @@ static inline void pud_populate(struct mm_struct
*mm, pud_t *pud, pmd_t *pmd)

 set_pud(pud, __pud((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
 }
+
+static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4d, 
pud_t *pud)

+{
+    if (pgtable_l4_enabled) {
+    unsigned 

Re: [PATCH 2/2] riscv: Use PUD/PGDIR entries for linear mapping when possible

2020-06-21 Thread Alex Ghiti

Hi Atish,

On 6/20/20 at 5:04 AM, Alex Ghiti wrote:

Hi Atish,

On 6/19/20 at 2:16 PM, Atish Patra wrote:

On Thu, Jun 18, 2020 at 9:28 PM Alex Ghiti  wrote:

Hi Atish,

On 6/18/20 at 8:47 PM, Atish Patra wrote:

On Wed, Jun 3, 2020 at 8:38 AM Alexandre Ghiti  wrote:
Improve best_map_size so that PUD or PGDIR entries are used for 
linear

mapping when possible as it allows better TLB utilization.

Signed-off-by: Alexandre Ghiti 
---
   arch/riscv/mm/init.c | 45 
+---

   1 file changed, 34 insertions(+), 11 deletions(-)

diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 9a5c97e091c1..d275f9f834cf 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -424,13 +424,29 @@ static void __init create_pgd_mapping(pgd_t 
*pgdp,

  create_pgd_next_mapping(nextp, va, pa, sz, prot);
   }

-static uintptr_t __init best_map_size(phys_addr_t base, 
phys_addr_t size)

+static bool is_map_size_ok(uintptr_t map_size, phys_addr_t base,
+  uintptr_t base_virt, phys_addr_t size)
   {
-   /* Upgrade to PMD_SIZE mappings whenever possible */
-   if ((base & (PMD_SIZE - 1)) || (size & (PMD_SIZE - 1)))
-   return PAGE_SIZE;
+   return !((base & (map_size - 1)) || (base_virt & (map_size 
- 1)) ||

+   (size < map_size));
+}
+
+static uintptr_t __init best_map_size(phys_addr_t base, uintptr_t 
base_virt,

+ phys_addr_t size)
+{
+#ifndef __PAGETABLE_PMD_FOLDED
+   if (is_map_size_ok(PGDIR_SIZE, base, base_virt, size))
+   return PGDIR_SIZE;
+
+   if (pgtable_l4_enabled)
+   if (is_map_size_ok(PUD_SIZE, base, base_virt, size))
+   return PUD_SIZE;
+#endif
+
+   if (is_map_size_ok(PMD_SIZE, base, base_virt, size))
+   return PMD_SIZE;

-   return PMD_SIZE;
+   return PAGE_SIZE;
   }

   /*
@@ -576,7 +592,7 @@ void create_kernel_page_table(pgd_t *pgdir, 
uintptr_t map_size)

   asmlinkage void __init setup_vm(uintptr_t dtb_pa)
   {
  uintptr_t va, end_va;
-   uintptr_t map_size = best_map_size(load_pa, 
MAX_EARLY_MAPPING_SIZE);

+   uintptr_t map_size;

  load_pa = (uintptr_t)(&_start);
  load_sz = (uintptr_t)(&_end) - load_pa;
@@ -587,6 +603,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)

  kernel_virt_addr = KERNEL_VIRT_ADDR;

+   map_size = best_map_size(load_pa, PAGE_OFFSET, 
MAX_EARLY_MAPPING_SIZE);

  va_pa_offset = PAGE_OFFSET - load_pa;
  va_kernel_pa_offset = kernel_virt_addr - load_pa;
  pfn_base = PFN_DOWN(load_pa);
@@ -700,6 +717,8 @@ static void __init setup_vm_final(void)

  /* Map all memory banks */
  for_each_memblock(memory, reg) {
+   uintptr_t remaining_size;
+
  start = reg->base;
  end = start + reg->size;

@@ -707,15 +726,19 @@ static void __init setup_vm_final(void)
  break;
  if (memblock_is_nomap(reg))
  continue;
-   if (start <= __pa(PAGE_OFFSET) &&
-   __pa(PAGE_OFFSET) < end)
-   start = __pa(PAGE_OFFSET);

-   map_size = best_map_size(start, end - start);
-   for (pa = start; pa < end; pa += map_size) {
+   pa = start;
+   remaining_size = reg->size;
+
+   while (remaining_size) {
  va = (uintptr_t)__va(pa);
+   map_size = best_map_size(pa, va, 
remaining_size);

+
create_pgd_mapping(swapper_pg_dir, va, pa,
 map_size, PAGE_KERNEL);
+
+   pa += map_size;
+   remaining_size -= map_size;
  }
  }


This may not work on RV32 with 2G of memory if the map_size is
determined to be page size for the last memblock. Both pa and
remaining_size will overflow and the loop will try to map memory
from zero again.

I'm not sure I understand: if pa starts at 0x8000_ and size is 2G,
then pa will overflow in the last iteration, but remaining_size will
then be equal to 0, right?


Not unless the remaining_size is at least page size aligned. The last
remaining size would "fff".
It will overflow as well after subtracting the map_size.



While fixing this issue, I noticed that if the size in the device tree 
is not aligned on PAGE_SIZE, the size is then automatically realigned to 
PAGE_SIZE: see early_init_dt_add_memory_arch, where the size is AND-ed 
with PAGE_MASK to remove the unaligned part.


So the issue does not need to be fixed :)

Thanks anyway,

Alex





And by the way, I realize that this loop only handles sizes that are
aligned on map_size.


Yeah.



Thanks for noticing; I will send a v2.

Alex





Thanks,

Alex



--
2.20.1









Re: [PATCH 2/2] riscv: Use PUD/PGDIR entries for linear mapping when possible

2020-06-20 Thread Alex Ghiti

Hi Atish,

On 6/19/20 at 2:16 PM, Atish Patra wrote:

On Thu, Jun 18, 2020 at 9:28 PM Alex Ghiti  wrote:

Hi Atish,

On 6/18/20 at 8:47 PM, Atish Patra wrote:

On Wed, Jun 3, 2020 at 8:38 AM Alexandre Ghiti  wrote:

Improve best_map_size so that PUD or PGDIR entries are used for linear
mapping when possible as it allows better TLB utilization.

Signed-off-by: Alexandre Ghiti 
---
   arch/riscv/mm/init.c | 45 +---
   1 file changed, 34 insertions(+), 11 deletions(-)

diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 9a5c97e091c1..d275f9f834cf 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -424,13 +424,29 @@ static void __init create_pgd_mapping(pgd_t *pgdp,
  create_pgd_next_mapping(nextp, va, pa, sz, prot);
   }

-static uintptr_t __init best_map_size(phys_addr_t base, phys_addr_t size)
+static bool is_map_size_ok(uintptr_t map_size, phys_addr_t base,
+  uintptr_t base_virt, phys_addr_t size)
   {
-   /* Upgrade to PMD_SIZE mappings whenever possible */
-   if ((base & (PMD_SIZE - 1)) || (size & (PMD_SIZE - 1)))
-   return PAGE_SIZE;
+   return !((base & (map_size - 1)) || (base_virt & (map_size - 1)) ||
+   (size < map_size));
+}
+
+static uintptr_t __init best_map_size(phys_addr_t base, uintptr_t base_virt,
+ phys_addr_t size)
+{
+#ifndef __PAGETABLE_PMD_FOLDED
+   if (is_map_size_ok(PGDIR_SIZE, base, base_virt, size))
+   return PGDIR_SIZE;
+
+   if (pgtable_l4_enabled)
+   if (is_map_size_ok(PUD_SIZE, base, base_virt, size))
+   return PUD_SIZE;
+#endif
+
+   if (is_map_size_ok(PMD_SIZE, base, base_virt, size))
+   return PMD_SIZE;

-   return PMD_SIZE;
+   return PAGE_SIZE;
   }

   /*
@@ -576,7 +592,7 @@ void create_kernel_page_table(pgd_t *pgdir, uintptr_t 
map_size)
   asmlinkage void __init setup_vm(uintptr_t dtb_pa)
   {
  uintptr_t va, end_va;
-   uintptr_t map_size = best_map_size(load_pa, MAX_EARLY_MAPPING_SIZE);
+   uintptr_t map_size;

  load_pa = (uintptr_t)(&_start);
  load_sz = (uintptr_t)(&_end) - load_pa;
@@ -587,6 +603,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)

  kernel_virt_addr = KERNEL_VIRT_ADDR;

+   map_size = best_map_size(load_pa, PAGE_OFFSET, MAX_EARLY_MAPPING_SIZE);
  va_pa_offset = PAGE_OFFSET - load_pa;
  va_kernel_pa_offset = kernel_virt_addr - load_pa;
  pfn_base = PFN_DOWN(load_pa);
@@ -700,6 +717,8 @@ static void __init setup_vm_final(void)

  /* Map all memory banks */
  for_each_memblock(memory, reg) {
+   uintptr_t remaining_size;
+
  start = reg->base;
  end = start + reg->size;

@@ -707,15 +726,19 @@ static void __init setup_vm_final(void)
  break;
  if (memblock_is_nomap(reg))
  continue;
-   if (start <= __pa(PAGE_OFFSET) &&
-   __pa(PAGE_OFFSET) < end)
-   start = __pa(PAGE_OFFSET);

-   map_size = best_map_size(start, end - start);
-   for (pa = start; pa < end; pa += map_size) {
+   pa = start;
+   remaining_size = reg->size;
+
+   while (remaining_size) {
  va = (uintptr_t)__va(pa);
+   map_size = best_map_size(pa, va, remaining_size);
+
  create_pgd_mapping(swapper_pg_dir, va, pa,
 map_size, PAGE_KERNEL);
+
+   pa += map_size;
+   remaining_size -= map_size;
  }
  }


This may not work on RV32 with 2G of memory if the map_size is
determined to be a page size for the last memblock. Both pa &
remaining_size will overflow and the loop will try to map memory from
zero again.

I'm not sure I understand: if pa starts at 0x8000_ and size is 2G,
then pa will overflow in the last iteration, but remaining_size will
then be equal to 0, right?


Not unless the remaining_size is at least page-size aligned. The last
remaining size would be "fff".
It will overflow as well after subtracting the map_size.


And by the way, I realize that this loop only handles sizes that are
aligned on map_size.


Yeah.



Thanks for noticing, I'll send a v2.

Alex





Thanks,

Alex



--
2.20.1







Re: [PATCH 2/2] riscv: Use PUD/PGDIR entries for linear mapping when possible

2020-06-18 Thread Alex Ghiti

Hi Atish,

Le 6/18/20 à 8:47 PM, Atish Patra a écrit :

On Wed, Jun 3, 2020 at 8:38 AM Alexandre Ghiti  wrote:

Improve best_map_size so that PUD or PGDIR entries are used for linear
mapping when possible as it allows better TLB utilization.

Signed-off-by: Alexandre Ghiti 
---
  arch/riscv/mm/init.c | 45 +---
  1 file changed, 34 insertions(+), 11 deletions(-)

diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 9a5c97e091c1..d275f9f834cf 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -424,13 +424,29 @@ static void __init create_pgd_mapping(pgd_t *pgdp,
 create_pgd_next_mapping(nextp, va, pa, sz, prot);
  }

-static uintptr_t __init best_map_size(phys_addr_t base, phys_addr_t size)
+static bool is_map_size_ok(uintptr_t map_size, phys_addr_t base,
+  uintptr_t base_virt, phys_addr_t size)
  {
-   /* Upgrade to PMD_SIZE mappings whenever possible */
-   if ((base & (PMD_SIZE - 1)) || (size & (PMD_SIZE - 1)))
-   return PAGE_SIZE;
+   return !((base & (map_size - 1)) || (base_virt & (map_size - 1)) ||
+   (size < map_size));
+}
+
+static uintptr_t __init best_map_size(phys_addr_t base, uintptr_t base_virt,
+ phys_addr_t size)
+{
+#ifndef __PAGETABLE_PMD_FOLDED
+   if (is_map_size_ok(PGDIR_SIZE, base, base_virt, size))
+   return PGDIR_SIZE;
+
+   if (pgtable_l4_enabled)
+   if (is_map_size_ok(PUD_SIZE, base, base_virt, size))
+   return PUD_SIZE;
+#endif
+
+   if (is_map_size_ok(PMD_SIZE, base, base_virt, size))
+   return PMD_SIZE;

-   return PMD_SIZE;
+   return PAGE_SIZE;
  }

  /*
@@ -576,7 +592,7 @@ void create_kernel_page_table(pgd_t *pgdir, uintptr_t 
map_size)
  asmlinkage void __init setup_vm(uintptr_t dtb_pa)
  {
 uintptr_t va, end_va;
-   uintptr_t map_size = best_map_size(load_pa, MAX_EARLY_MAPPING_SIZE);
+   uintptr_t map_size;

 load_pa = (uintptr_t)(&_start);
 load_sz = (uintptr_t)(&_end) - load_pa;
@@ -587,6 +603,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)

 kernel_virt_addr = KERNEL_VIRT_ADDR;

+   map_size = best_map_size(load_pa, PAGE_OFFSET, MAX_EARLY_MAPPING_SIZE);
 va_pa_offset = PAGE_OFFSET - load_pa;
 va_kernel_pa_offset = kernel_virt_addr - load_pa;
 pfn_base = PFN_DOWN(load_pa);
@@ -700,6 +717,8 @@ static void __init setup_vm_final(void)

 /* Map all memory banks */
 for_each_memblock(memory, reg) {
+   uintptr_t remaining_size;
+
 start = reg->base;
 end = start + reg->size;

@@ -707,15 +726,19 @@ static void __init setup_vm_final(void)
 break;
 if (memblock_is_nomap(reg))
 continue;
-   if (start <= __pa(PAGE_OFFSET) &&
-   __pa(PAGE_OFFSET) < end)
-   start = __pa(PAGE_OFFSET);

-   map_size = best_map_size(start, end - start);
-   for (pa = start; pa < end; pa += map_size) {
+   pa = start;
+   remaining_size = reg->size;
+
+   while (remaining_size) {
 va = (uintptr_t)__va(pa);
+   map_size = best_map_size(pa, va, remaining_size);
+
 create_pgd_mapping(swapper_pg_dir, va, pa,
map_size, PAGE_KERNEL);
+
+   pa += map_size;
+   remaining_size -= map_size;
 }
 }


This may not work on RV32 with 2G of memory if the map_size is
determined to be a page size for the last memblock. Both pa &
remaining_size will overflow and the loop will try to map memory from
zero again.


I'm not sure I understand: if pa starts at 0x8000_ and size is 2G, 
then pa will overflow in the last iteration, but remaining_size will 
then be equal to 0, right?


And by the way, I realize that this loop only handles sizes that are 
aligned on map_size.


Thanks,

Alex





--
2.20.1






Re: [PATCH v2 0/8] Introduce sv48 support

2020-06-17 Thread Alex Ghiti

Hi Palmer,

Le 6/3/20 à 4:10 AM, Alexandre Ghiti a écrit :

This patchset implements sv48 support at runtime. The kernel will try to
boot with 4-level page table and will fallback to 3-level if the HW does not
support it.
  
The biggest advantage is that we only have one kernel for 64bit, which
is way easier to maintain.

Folding the 4th level into a 3-level page table has almost no cost at
runtime. But as Palmer mentioned, the relocatable code generated is less
performant.

At the moment, there is no way to build a non-relocatable 64bit kernel
with a 3-level page table. We agreed that distributions will use this
runtime configuration anyway, but Palmer proposed to introduce a new
Kconfig, which I will do later as sv48 support was asked for 5.8.

Finally, the user can now ask for sv39 explicitly by using the
device-tree, which will reduce memory footprint and reduce the number
of memory accesses in case of TLB miss.

Changes in v2:
   * Move variable declarations to pgtable.h in patch 5/7 as suggested by Anup
   * Restore mmu-type properties in patch 6 as suggested by Anup
   * Fix unused variable in patch 5 that was used in patch 6
   * Fix SPARSEMEM build (patch 2 was modified so I dropped the Reviewed-by)
   * Applied various Reviewed-by

Alexandre Ghiti (8):
   riscv: Get rid of compile time logic with MAX_EARLY_MAPPING_SIZE
   riscv: Allow to dynamically define VA_BITS
   riscv: Simplify MAXPHYSMEM config
   riscv: Prepare ptdump for vm layout dynamic addresses
   riscv: Implement sv48 support
   riscv: Allow user to downgrade to sv39 when hw supports sv48
   riscv: Use pgtable_l4_enabled to output mmu type in cpuinfo
   riscv: Explicit comment about user virtual address space size

  arch/riscv/Kconfig  |  34 ++---
  arch/riscv/include/asm/csr.h|   3 +-
  arch/riscv/include/asm/fixmap.h |   1 +
  arch/riscv/include/asm/page.h   |  15 +++
  arch/riscv/include/asm/pgalloc.h|  36 ++
  arch/riscv/include/asm/pgtable-64.h |  97 +-
  arch/riscv/include/asm/pgtable.h|  31 -
  arch/riscv/include/asm/sparsemem.h  |   6 +-
  arch/riscv/kernel/cpu.c |  23 ++--
  arch/riscv/kernel/head.S|   3 +-
  arch/riscv/mm/context.c |   2 +-
  arch/riscv/mm/init.c| 194 
  arch/riscv/mm/ptdump.c  |  49 +--
  13 files changed, 412 insertions(+), 82 deletions(-)



Do you have any remarks regarding this patchset and the others?

Thanks,

Alex



Re: mm lock issue while booting Linux on 5.8-rc1 for RISC-V

2020-06-16 Thread Alex Ghiti

Le 6/16/20 à 2:07 PM, Palmer Dabbelt a écrit :

On Tue, 16 Jun 2020 10:54:51 PDT (-0700), ati...@atishpatra.org wrote:
On Tue, Jun 16, 2020 at 3:45 AM Michel Lespinasse  
wrote:


I am also unable to reproduce the issue so far.

I wanted to point to a few things in case this helps:
- Commit 42fc541404f2 was bisected as the cause. This commit changes
walk_page_range_novma() to use mmap_assert_locked() instead of
lockdep_assert_held()
- mmap_assert_locked() checks lockdep_assert_held(), but also checks
that the rwsem itself is locked.

Now how could lockdep think the lock is held, but the lock itself is
not marked as locked ???

I'm not sure if it helps at all, but a few commits earlier,
0cc55a0213a0 introduces mmap_read_trylock_non_owner(), which is used
exclusively by stackmap, and does the opposite: it acquires the mmap
lock without telling lockdep about it. I can't see any smoking gun
linking this to our bug, but I thought it may be worth mentioning as
it involves the same suspects (stackmap and the difference between
owning the lock vs lockdep thinking we own the lock).

I'm sorry, that's only how far I was able to go on this bug - I'm not
sure how to investigate it further as I can not reproduce the issue...

On Tue, Jun 16, 2020 at 1:40 AM Palmer Dabbelt  
wrote:

>
> On Mon, 15 Jun 2020 21:51:08 PDT (-0700), sho...@gmail.com wrote:
> > On Tue, Jun 16, 2020 at 06:57:47AM +0900, Stafford Horne wrote:
> >> On Mon, Jun 15, 2020 at 12:28:11AM -0700, Atish Patra wrote:
> >> > Hi,
> >> > I encountered the following issue while booting 5.8-rc1 on 
Qemu for RV64.
> >> > I added additional dump_stack and observed that it's 
happening in bpf free path.

> >> > It happens always if CONFIG_DEBUG_VM is enabled. VM_BUG_ON_MM is
> >> > compiled away without that.
> >> > 
 


> >> > forked to background, child pid 113
> >> > [   10.328850] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted
> >> > 5.8.0-rc1-dirty #732
> >> > [   10.331739] Workqueue: events bpf_prog_free_deferred
> >> > [   10.334133] Call Trace:
> >> > [   10.338039] [] walk_stackframe+0x0/0xa4
> >> > [   10.339988] [] show_stack+0x2e/0x3a
> >> > [   10.340902] [] dump_stack+0x72/0x8c
> >> > [   10.341451] [] 
mmap_assert_locked.part.13+0x14/0x1c
> >> > [   10.342131] [] 
walk_page_range_novma+0x0/0x4e
> >> > [   10.342973] [] 
set_direct_map_invalid_noflush+0x66/0x6e

> >> > [   10.343917] [] __vunmap+0xe8/0x212
> >> > [   10.344680] [] __vfree+0x22/0x6e
> >> > [   10.345270] [] vfree+0x34/0x56
> >> > [   10.345834] [] __bpf_prog_free+0x2c/0x36
> >> > [   10.346529] [] 
bpf_prog_free_deferred+0x74/0x8a

> >> > [   10.347394] [] process_one_work+0x13a/0x272
> >> > [   10.348239] [] worker_thread+0x50/0x2e4
> >> > [   10.348900] [] kthread+0xfc/0x10a
> >> > [   10.349470] [] ret_from_exception+0x0/0xc
> >> > [   10.354405] mm ffe001018600 mmap  
seqnum 0 task_size 0

> >> > [   10.354405] get_unmapped_area 
> >> > [   10.354405] mmap_base 0 mmap_legacy_base 0 highest_vm_end 0
> >> > [   10.354405] pgd ffe001074000 mm_users 2 mm_count 1
> >> > pgtables_bytes 8192 map_count 0
> >> > [   10.354405] hiwater_rss 0 hiwater_vm 0 total_vm 0 locked_vm 0
> >> > [   10.354405] pinned_vm 0 data_vm 0 exec_vm 0 stack_vm 0
> >> > [   10.354405] start_code ffe00020 end_code 
ffe00084acc2

> >> > start_data 0 end_data ffe00106dfe4
> >> > [   10.354405] start_brk 0 brk ffe0010bd6d0 start_stack 0
> >> > [   10.354405] arg_start 0 arg_end 0 env_start 0 env_end 0
> >> > [   10.354405] binfmt  flags 0 core_state 


> >> > [   10.354405] ioctx_table 
> >> > [   10.354405] exe_file 
> >> > [   10.354405] tlb_flush_pending 0
> >> > [   10.354405] def_flags: 0x0()
> >> > [   10.369325] [ cut here ]
> >> > [   10.370763] kernel BUG at include/linux/mmap_lock.h:81!
> >> > [   10.375235] Kernel BUG [#1]
> >> > [   10.377198] Modules linked in:
> >> > [   10.378931] CPU: 3 PID: 51 Comm: kworker/3:1 Not tainted 
5.8.0-rc1-dirty #732

> >> > [   10.380179] Workqueue: events bpf_prog_free_deferred
> >> > [   10.381270] epc: ffe0002db4d4 ra : ffe0002db4d4 sp 
: ffe3eaea7c70

> >> > [   10.382561]  gp : ffe00106d950 tp : ffe3ef752f80 t0 :
> >> > ffe0010836e8
> >> > [   10.383996]  t1 : 0064 t2 :  s0 :
> >> > ffe3eaea7c90
> >> > [   10.385119]  s1 : ffe001018600 a0 : 0289 a1 :
> >> > 0020
> >> > [   10.386099]  a2 : 0005 a3 :  a4 :
> >> > ffe001012758
> >> > [   10.387294]  a5 :  a6 : 0102 a7 :
> >> > 0006
> >> > [   10.388265]  s2 : ffe3f00674c0 s3 : ffe00106e108 s4 :
> >> > ffe00106e100
> >> > [   10.389250]  s5 : ffe00106e908 s6 :  s7 :
> >> > 6db6db6db6db6db7
> >> > [   10.390272]  s8 : 

Re: [PATCH 0/2] PUD/PGDIR entries for linear mapping

2020-06-14 Thread Alex Ghiti

Hi Atish,

Le 6/12/20 à 1:43 PM, Atish Patra a écrit :

On Fri, Jun 12, 2020 at 6:17 AM Alex Ghiti  wrote:

Le 6/12/20 à 8:59 AM, Alex Ghiti a écrit :

Hi Atish,

Le 6/11/20 à 1:29 PM, Atish Patra a écrit :

On Wed, Jun 10, 2020 at 11:51 PM Alex Ghiti  wrote:

Hi Atish,

Le 6/10/20 à 2:32 PM, Atish Patra a écrit :

On Wed, Jun 3, 2020 at 8:36 AM Alexandre Ghiti  wrote:

This small patchset intends to use PUD/PGDIR entries for linear
mapping
in order to better utilize TLB.

At the moment, only PMD entries can be used since on common platforms
(qemu/unleashed), the kernel is loaded at DRAM + 2MB which
dealigns virtual
and physical addresses and then prevents the use of PUD/PGDIR
entries.
So the kernel must be able to get those 2MB for PAGE_OFFSET to map
the
beginning of the DRAM: this is achieved in patch 1.


I don't have in depth knowledge of how mm code works so this question
may be a completely
stupid one :). Just for my understanding,
As per my understanding, kernel will map those 2MB of memory but
never use it.
How does the kernel ensure that it doesn't allocate any memory from
those 2MB
memory if it is not marked as reserved?

Yes, a 1GB hugepage will cover those 2MB: I rely on the previous boot
stage to mark this region
as reserved if there is something there (like opensbi). Otherwise, the
kernel will indeed try to
allocate memory from there :)


In that case, this patch mandates that the firmware region has to be
mark "reserved"
the device tree so that the Linux kernel doesn't try to allocate
memory from there.
OpenSBI is already doing it from v0.7. Thus, any user using latest
OpenSBI can leverage
this patch for a better TLB utilization.


Note that *currently* OpenSBI v0.7 still adds the "no-map" property
which prevents such optimization.


Thanks for the clarification. When I said latest, I meant including
your patch in the mailing list.


However, legacy previous boot stages(BBL) do not reserve this area via
DT which may
result in an unexpected crash. I am not sure how many developers still
use BBL though.

Few general suggestions to tackle this problem:
1. This mandatory requirement should be added to the booting document
so that any other
SBI implementation is also aware of it.
2. You may have to move the patch1 to a separate config so that any
users of legacy boot stages
can disable this feature.


IMHO, the region occupied by runtime services should be marked as
reserved in the device-tree. So it seems redundant to add this as a
requirement, I would rather consider its absence as a bug.


I agree. I was just suggesting to document this bug :).


Oh ok then, we meant the same thing :)



Even if I understand that this might break some system, I don't like
the idea of a new config to support old "buggy" bootloaders: when will
we be able to remove it ? We'll never know when people will stop using
those bootloaders, so it will stay here forever...Where can I find the

Personally, I am fine with that. However, there were few concerns in the past.
I am leaving it to Palmer to decide.

@Palmer Dabbelt : Any thoughts ?


boot document you are talking about ? Can we simply state here that
this kernel version will not be compatible with those bootloaders
(we'll draw an exhaustive list here) ?

Yes.


Ok, I have just found Documentation/riscv/boot-image-header.rst: could
we imagine doing something like incrementing the version and use that as
a hint in the kernel not to map the 2MB offset ? That's still legacy,
but at least it does not require to recompile a kernel as the check
would be done at runtime.


I was suggesting to add a risc-v specific booting document and
document this "bug".
Documentation/riscv/boot-image-header.rst can be linked from that document or
the boot hader content can be included in that. No changes in code is necessary.

Eventually, this booting document will also include other additional
booting constraints for RISC-V
such as minimum extension required to boot Linux, csr state upon
entering S-mode, mmu state.



Ok I will prepare a boot document that links to the existing documents and
add all of that, I will need you for the last constraints that I don't 
know about.


Thanks Atish,

Alex


Alex



Alex



But furthermore, at the moment, the firmware (opensbi) explicitly
asks the
kernel not to map the region it occupies, which is on those common
platforms at the very beginning of the DRAM and then it also dealigns
virtual and physical addresses. I proposed a patch here:

https://github.com/riscv/opensbi/pull/167

that removes this 'constraint' but *not* all the time as it offers
some
kind of protection in case PMP is not available. So sometimes, we may
have a part of the memory below the kernel that is removed creating a
misalignment between virtual and physical addresses. So for
performance
reasons, we must at least make sure that PMD entries can be used:
that
is guaranteed by patch 1 too.

Finally the second patch simply improves best

Re: [PATCH 0/2] PUD/PGDIR entries for linear mapping

2020-06-12 Thread Alex Ghiti

Le 6/12/20 à 8:59 AM, Alex Ghiti a écrit :

Hi Atish,

Le 6/11/20 à 1:29 PM, Atish Patra a écrit :

On Wed, Jun 10, 2020 at 11:51 PM Alex Ghiti  wrote:

Hi Atish,

Le 6/10/20 à 2:32 PM, Atish Patra a écrit :

On Wed, Jun 3, 2020 at 8:36 AM Alexandre Ghiti  wrote:
This small patchset intends to use PUD/PGDIR entries for linear 
mapping

in order to better utilize TLB.

At the moment, only PMD entries can be used since on common platforms
(qemu/unleashed), the kernel is loaded at DRAM + 2MB which 
dealigns virtual
and physical addresses and then prevents the use of PUD/PGDIR 
entries.
So the kernel must be able to get those 2MB for PAGE_OFFSET to map 
the

beginning of the DRAM: this is achieved in patch 1.


I don't have in depth knowledge of how mm code works so this question
may be a completely stupid one :). Just for my understanding,
As per my understanding, kernel will map those 2MB of memory but never
use it. How does the kernel ensure that it doesn't allocate any memory
from those 2MB memory if it is not marked as reserved?

Yes, a 1GB hugepage will cover those 2MB: I rely on the previous boot
stage to mark this region
as reserved if there is something there (like opensbi). Otherwise, the
kernel will indeed try to
allocate memory from there :)


In that case, this patch mandates that the firmware region has to be
mark "reserved"
the device tree so that the Linux kernel doesn't try to allocate
memory from there.
OpenSBI is already doing it from v0.7. Thus, any user using latest
OpenSBI can leverage
this patch for a better TLB utilization.



Note that *currently* OpenSBI v0.7 still adds the "no-map" property 
which prevents such optimization.



However, legacy previous boot stages(BBL) do not reserve this area via
DT which may
result in an unexpected crash. I am not sure how many developers still
use BBL though.

Few general suggestions to tackle this problem:
1. This mandatory requirement should be added to the booting document
so that any other
SBI implementation is also aware of it.
2. You may have to move the patch1 to a separate config so that any
users of legacy boot stages
can disable this feature.



IMHO, the region occupied by runtime services should be marked as 
reserved in the device-tree. So it seems redundant to add this as a 
requirement, I would rather consider its absence as a bug.


Even if I understand that this might break some system, I don't like 
the idea of a new config to support old "buggy" bootloaders: when will 
we be able to remove it ? We'll never know when people will stop using 
those bootloaders, so it will stay here forever...Where can I find the 
boot document you are talking about ? Can we simply state here that 
this kernel version will not be compatible with those bootloaders 
(we'll draw an exhaustive list here) ?



Ok, I have just found Documentation/riscv/boot-image-header.rst: could 
we imagine doing something like incrementing the version and use that as 
a hint in the kernel not to map the 2MB offset ? That's still legacy, 
but at least it does not require to recompile a kernel as the check 
would be done at runtime.





Alex



Alex


But furthermore, at the moment, the firmware (opensbi) explicitly asks
the kernel not to map the region it occupies, which is on those common
platforms at the very beginning of the DRAM and then it also dealigns
virtual and physical addresses. I proposed a patch here:

https://github.com/riscv/opensbi/pull/167

that removes this 'constraint' but *not* all the time as it offers some
kind of protection in case PMP is not available. So sometimes, we may
have a part of the memory below the kernel that is removed creating a
misalignment between virtual and physical addresses. So for performance
reasons, we must at least make sure that PMD entries can be used: that
is guaranteed by patch 1 too.

Finally the second patch simply improves best_map_size so that whenever
possible, PUD/PGDIR entries are used.

Below is the kernel page table without this patch on a 6G platform:

---[ Linear mapping ]---
0xc000-0xc00176e0 0x8020 5998M PMD D A . . . W R V

And with this patchset + opensbi patch:

---[ Linear mapping ]---
0xc000-0xc0014000 0x8000 5G PUD D A . . . W R V
0xc0014000-0xc0017700 0x0001c000 880M PMD D A . . . W R V


Alexandre Ghiti (2):
    riscv: Get memory below load_pa while ensuring linear mapping is PMD
      aligned
    riscv: Use PUD/PGDIR entries for linear mapping when possible

   arch/riscv/include/asm/page.h |  8 
   arch/riscv/mm/init.c  | 69 +--

   2 files changed, 65 insertions(+), 12 deletions(-)

--
2.20.1








Re: [PATCH 0/2] PUD/PGDIR entries for linear mapping

2020-06-12 Thread Alex Ghiti

Hi Atish,

Le 6/11/20 à 1:29 PM, Atish Patra a écrit :

On Wed, Jun 10, 2020 at 11:51 PM Alex Ghiti  wrote:

Hi Atish,

Le 6/10/20 à 2:32 PM, Atish Patra a écrit :

On Wed, Jun 3, 2020 at 8:36 AM Alexandre Ghiti  wrote:

This small patchset intends to use PUD/PGDIR entries for linear mapping
in order to better utilize TLB.

At the moment, only PMD entries can be used since on common platforms
(qemu/unleashed), the kernel is loaded at DRAM + 2MB which dealigns virtual
and physical addresses and then prevents the use of PUD/PGDIR entries.
So the kernel must be able to get those 2MB for PAGE_OFFSET to map the
beginning of the DRAM: this is achieved in patch 1.


I don't have in depth knowledge of how mm code works so this question
may be a completely
stupid one :). Just for my understanding,
As per my understanding, kernel will map those 2MB of memory but never use it.
How does the kernel ensure that it doesn't allocate any memory from those 2MB
memory if it is not marked as reserved?

Yes, a 1GB hugepage will cover those 2MB: I rely on the previous boot
stage to mark this region
as reserved if there is something there (like opensbi). Otherwise, the
kernel will indeed try to
allocate memory from there :)


In that case, this patch mandates that the firmware region has to be
mark "reserved"
the device tree so that the Linux kernel doesn't try to allocate
memory from there.
OpenSBI is already doing it from v0.7. Thus, any user using latest
OpenSBI can leverage
this patch for a better TLB utilization.



Note that *currently* OpenSBI v0.7 still adds the "no-map" property 
which prevents such optimization.



However, legacy previous boot stages(BBL) do not reserve this area via
DT which may
result in an unexpected crash. I am not sure how many developers still
use BBL though.

Few general suggestions to tackle this problem:
1. This mandatory requirement should be added to the booting document
so that any other
SBI implementation is also aware of it.
2. You may have to move the patch1 to a separate config so that any
users of legacy boot stages
can disable this feature.



IMHO, the region occupied by runtime services should be marked as 
reserved in the device-tree. So it seems redundant to add this as a 
requirement, I would rather consider its absence as a bug.


Even if I understand that this might break some system, I don't like the 
idea of a new config to support old "buggy" bootloaders: when will we be 
able to remove it ? We'll never know when people will stop using those 
bootloaders, so it will stay here forever...Where can I find the boot 
document you are talking about ? Can we simply state here that this 
kernel version will not be compatible with those bootloaders (we'll draw 
an exhaustive list here) ?


Alex



Alex



But furthermore, at the moment, the firmware (opensbi) explicitly asks the
kernel not to map the region it occupies, which is on those common
platforms at the very beginning of the DRAM and then it also dealigns
virtual and physical addresses. I proposed a patch here:

https://github.com/riscv/opensbi/pull/167

that removes this 'constraint' but *not* all the time as it offers some
kind of protection in case PMP is not available. So sometimes, we may
have a part of the memory below the kernel that is removed creating a
misalignment between virtual and physical addresses. So for performance
reasons, we must at least make sure that PMD entries can be used: that
is guaranteed by patch 1 too.

Finally the second patch simply improves best_map_size so that whenever
possible, PUD/PGDIR entries are used.

Below is the kernel page table without this patch on a 6G platform:

---[ Linear mapping ]---
0xc000-0xc00176e0 0x8020 5998M PMD D A . . . W R V

And with this patchset + opensbi patch:

---[ Linear mapping ]---
0xc000-0xc0014000 0x8000 5G PUD D A . . . W R V
0xc0014000-0xc0017700 0x0001c000 880M PMD D A . . . W R V

Alexandre Ghiti (2):
riscv: Get memory below load_pa while ensuring linear mapping is PMD
  aligned
riscv: Use PUD/PGDIR entries for linear mapping when possible

   arch/riscv/include/asm/page.h |  8 
   arch/riscv/mm/init.c  | 69 +--
   2 files changed, 65 insertions(+), 12 deletions(-)

--
2.20.1






Re: [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone

2020-06-12 Thread Alex Ghiti

Hi Atish,

Le 6/11/20 à 5:34 PM, Atish Patra a écrit :

On Sun, Jun 7, 2020 at 1:01 AM Alexandre Ghiti  wrote:

This is a preparatory patch for relocatable kernel.

The kernel used to be linked at PAGE_OFFSET address and used to be loaded
physically at the beginning of the main memory. Therefore, we could use
the linear mapping for the kernel mapping.

But the relocated kernel base address will be different from PAGE_OFFSET
and since in the linear mapping, two different virtual addresses cannot
point to the same physical address, the kernel mapping needs to lie outside
the linear mapping.

In addition, because modules and BPF must be close to the kernel (inside
+-2GB window), the kernel is placed at the end of the vmalloc zone minus
2GB, which leaves room for modules and BPF. The kernel could not be
placed at the beginning of the vmalloc zone since other vmalloc
allocations from the kernel could get all the +-2GB window around the
kernel which would prevent new modules and BPF programs to be loaded.

Signed-off-by: Alexandre Ghiti 
Reviewed-by: Zong Li 
---
  arch/riscv/boot/loader.lds.S |  3 +-
  arch/riscv/include/asm/page.h| 10 +-
  arch/riscv/include/asm/pgtable.h | 38 ++---
  arch/riscv/kernel/head.S |  3 +-
  arch/riscv/kernel/module.c   |  4 +--
  arch/riscv/kernel/vmlinux.lds.S  |  3 +-
  arch/riscv/mm/init.c | 58 +---
  arch/riscv/mm/physaddr.c |  2 +-
  8 files changed, 88 insertions(+), 33 deletions(-)

diff --git a/arch/riscv/boot/loader.lds.S b/arch/riscv/boot/loader.lds.S
index 47a5003c2e28..62d94696a19c 100644
--- a/arch/riscv/boot/loader.lds.S
+++ b/arch/riscv/boot/loader.lds.S
@@ -1,13 +1,14 @@
  /* SPDX-License-Identifier: GPL-2.0 */

  #include 
+#include 

  OUTPUT_ARCH(riscv)
  ENTRY(_start)

  SECTIONS
  {
-   . = PAGE_OFFSET;
+   . = KERNEL_LINK_ADDR;

 .payload : {
 *(.payload)
diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index 2d50f76efe48..48bb09b6a9b7 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -90,18 +90,26 @@ typedef struct page *pgtable_t;

  #ifdef CONFIG_MMU
  extern unsigned long va_pa_offset;
+extern unsigned long va_kernel_pa_offset;
  extern unsigned long pfn_base;
  #define ARCH_PFN_OFFSET(pfn_base)
  #else
  #define va_pa_offset   0
+#define va_kernel_pa_offset0
  #define ARCH_PFN_OFFSET(PAGE_OFFSET >> PAGE_SHIFT)
  #endif /* CONFIG_MMU */

  extern unsigned long max_low_pfn;
  extern unsigned long min_low_pfn;
+extern unsigned long kernel_virt_addr;

  #define __pa_to_va_nodebug(x)  ((void *)((unsigned long) (x) + va_pa_offset))
-#define __va_to_pa_nodebug(x)  ((unsigned long)(x) - va_pa_offset)
+#define linear_mapping_va_to_pa(x) ((unsigned long)(x) - va_pa_offset)
+#define kernel_mapping_va_to_pa(x) \
+   ((unsigned long)(x) - va_kernel_pa_offset)
+#define __va_to_pa_nodebug(x)  \
+   (((x) >= PAGE_OFFSET) ? \
+   linear_mapping_va_to_pa(x) : kernel_mapping_va_to_pa(x))

  #ifdef CONFIG_DEBUG_VIRTUAL
  extern phys_addr_t __virt_to_phys(unsigned long x);
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 35b60035b6b0..94ef3b49dfb6 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -11,23 +11,29 @@

  #include 

-#ifndef __ASSEMBLY__
-
-/* Page Upper Directory not used in RISC-V */
-#include 
-#include 
-#include 
-#include 
-
-#ifdef CONFIG_MMU
+#ifndef CONFIG_MMU
+#define KERNEL_VIRT_ADDR   PAGE_OFFSET
+#define KERNEL_LINK_ADDR   PAGE_OFFSET
+#else
+/*
+ * Leave 2GB for modules and BPF that must lie within a 2GB range around
+ * the kernel.
+ */
+#define KERNEL_VIRT_ADDR   (VMALLOC_END - SZ_2G + 1)
+#define KERNEL_LINK_ADDR   KERNEL_VIRT_ADDR

  #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
  #define VMALLOC_END  (PAGE_OFFSET - 1)
  #define VMALLOC_START(PAGE_OFFSET - VMALLOC_SIZE)

  #define BPF_JIT_REGION_SIZE(SZ_128M)
-#define BPF_JIT_REGION_START   (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
-#define BPF_JIT_REGION_END (VMALLOC_END)
+#define BPF_JIT_REGION_START   PFN_ALIGN((unsigned long)&_end)
+#define BPF_JIT_REGION_END (BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE)
+

As these mappings have changed a few times in recent months including
this one, I think it would be
better to have virtual memory layout documentation in RISC-V similar
to other architectures.

If you can include the page table layout for 3/4 level page tables in
the same document, that would be really helpful.



Yes, I'll do that in a separate commit.

Thanks,

Alex



+#ifdef CONFIG_64BIT
+#define VMALLOC_MODULE_START   BPF_JIT_REGION_END
+#define VMALLOC_MODULE_END (((unsigned long)&_start & PAGE_MASK) + SZ_2G)
+#endif

  /*
   * Roughly size the vmemmap space to be large enough to fit enough
@@ -57,9 +63,16 @@
  

Re: [PATCH v5 2/4] riscv: Introduce CONFIG_RELOCATABLE

2020-06-11 Thread Alex Ghiti

Hi Jerome,

Le 6/10/20 à 10:10 AM, Jerome Forissier a écrit :

On 6/7/20 9:59 AM, Alexandre Ghiti wrote:
[...]


+config RELOCATABLE
+   bool
+   depends on MMU
+   help
+  This builds a kernel as a Position Independent Executable (PIE),
+  which retains all relocation metadata required to relocate the
+  kernel binary at runtime to a different virtual address than the
+  address it was linked at.
+  Since RISCV uses the RELA relocation format, this requires a
+  relocation pass at runtime even if the kernel is loaded at the
+  same address it was linked at.

Is this true? I thought that the GNU linker would write the "proper"
values by default, contrary to the LLVM linker (ld.lld) which would need
a special flag: --apply-dynamic-relocs (by default the relocated places
are set to zero). At least, it is my experience with Aarch64 on a
different project. So, sorry if I'm talking nonsense here -- I have not
looked at the details.




It seems that you're right, at least for aarch64, since they specifically 
specify the --no-apply-dynamic-relocs option. I retried booting without 
relocating at runtime, and it fails on riscv. Could this be arch-specific?


Thanks,

Alex



Re: [PATCH 0/2] PUD/PGDIR entries for linear mapping

2020-06-11 Thread Alex Ghiti

Hi Atish,

Le 6/10/20 à 2:32 PM, Atish Patra a écrit :

On Wed, Jun 3, 2020 at 8:36 AM Alexandre Ghiti  wrote:

This small patchset intends to use PUD/PGDIR entries for linear mapping
in order to better utilize TLB.

At the moment, only PMD entries can be used since on common platforms
(qemu/unleashed), the kernel is loaded at DRAM + 2MB which dealigns virtual
and physical addresses and then prevents the use of PUD/PGDIR entries.
So the kernel must be able to get those 2MB for PAGE_OFFSET to map the
beginning of the DRAM: this is achieved in patch 1.


I don't have in-depth knowledge of how the mm code works, so this question
may be a completely stupid one :). Just for my understanding:
as far as I understand, the kernel will map those 2MB of memory but never use them.
How does the kernel ensure that it doesn't allocate any memory from those 2MB
if they are not marked as reserved?


Yes, a 1GB hugepage will cover those 2MB: I rely on the previous boot 
stage to mark this region as reserved if there is something there 
(like opensbi). Otherwise, the kernel will indeed try to 
allocate memory from there :)

Alex



But furthermore, at the moment, the firmware (opensbi) explicitly asks the
kernel not to map the region it occupies, which is on those common
platforms at the very beginning of the DRAM and then it also dealigns
virtual and physical addresses. I proposed a patch here:

https://github.com/riscv/opensbi/pull/167

that removes this 'constraint' but *not* all the time as it offers some
kind of protection in case PMP is not available. So sometimes, we may
have a part of the memory below the kernel that is removed creating a
misalignment between virtual and physical addresses. So for performance
reasons, we must at least make sure that PMD entries can be used: that
is guaranteed by patch 1 too.

Finally the second patch simply improves best_map_size so that whenever
possible, PUD/PGDIR entries are used.

Below is the kernel page table without this patch on a 6G platform:

---[ Linear mapping ]---
0xc000-0xc00176e00x8020 5998M PMD D A . 
. . W R V

And with this patchset + opensbi patch:

---[ Linear mapping ]---
0xc000-0xc0014000 0x8000 5G PUD D A 
. . . W R V
0xc0014000-0xc00177000x0001c000 880M PMD D A . 
. . W R V

Alexandre Ghiti (2):
   riscv: Get memory below load_pa while ensuring linear mapping is PMD
 aligned
   riscv: Use PUD/PGDIR entries for linear mapping when possible

  arch/riscv/include/asm/page.h |  8 
  arch/riscv/mm/init.c  | 69 +--
  2 files changed, 65 insertions(+), 12 deletions(-)

--
2.20.1






Re: [PATCH v4 1/4] riscv: Move kernel mapping to vmalloc zone

2020-06-05 Thread Alex Ghiti

Hi Zong,

Le 6/3/20 à 10:52 PM, Zong Li a écrit :

On Wed, Jun 3, 2020 at 4:01 PM Alexandre Ghiti  wrote:

This is a preparatory patch for relocatable kernel.

The kernel used to be linked at PAGE_OFFSET address and used to be loaded
physically at the beginning of the main memory. Therefore, we could use
the linear mapping for the kernel mapping.

But the relocated kernel base address will be different from PAGE_OFFSET
and since in the linear mapping, two different virtual addresses cannot
point to the same physical address, the kernel mapping needs to lie outside
the linear mapping.

In addition, because modules and BPF must be close to the kernel (inside
+-2GB window), the kernel is placed at the end of the vmalloc zone minus
2GB, which leaves room for modules and BPF. The kernel could not be
placed at the beginning of the vmalloc zone since other vmalloc
allocations from the kernel could get all the +-2GB window around the
kernel, which would prevent new modules and BPF programs from being loaded.

Signed-off-by: Alexandre Ghiti 
---
  arch/riscv/boot/loader.lds.S |  3 +-
  arch/riscv/include/asm/page.h| 10 +-
  arch/riscv/include/asm/pgtable.h | 38 ++---
  arch/riscv/kernel/head.S |  3 +-
  arch/riscv/kernel/module.c   |  4 +--
  arch/riscv/kernel/vmlinux.lds.S  |  3 +-
  arch/riscv/mm/init.c | 58 +---
  arch/riscv/mm/physaddr.c |  2 +-
  8 files changed, 88 insertions(+), 33 deletions(-)

diff --git a/arch/riscv/boot/loader.lds.S b/arch/riscv/boot/loader.lds.S
index 47a5003c2e28..62d94696a19c 100644
--- a/arch/riscv/boot/loader.lds.S
+++ b/arch/riscv/boot/loader.lds.S
@@ -1,13 +1,14 @@
  /* SPDX-License-Identifier: GPL-2.0 */

  #include 
+#include 

  OUTPUT_ARCH(riscv)
  ENTRY(_start)

  SECTIONS
  {
-   . = PAGE_OFFSET;
+   . = KERNEL_LINK_ADDR;

 .payload : {
 *(.payload)
diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index 2d50f76efe48..48bb09b6a9b7 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -90,18 +90,26 @@ typedef struct page *pgtable_t;

  #ifdef CONFIG_MMU
  extern unsigned long va_pa_offset;
+extern unsigned long va_kernel_pa_offset;
  extern unsigned long pfn_base;
  #define ARCH_PFN_OFFSET(pfn_base)
  #else
  #define va_pa_offset   0
+#define va_kernel_pa_offset0
  #define ARCH_PFN_OFFSET(PAGE_OFFSET >> PAGE_SHIFT)
  #endif /* CONFIG_MMU */

  extern unsigned long max_low_pfn;
  extern unsigned long min_low_pfn;
+extern unsigned long kernel_virt_addr;

  #define __pa_to_va_nodebug(x)  ((void *)((unsigned long) (x) + va_pa_offset))
-#define __va_to_pa_nodebug(x)  ((unsigned long)(x) - va_pa_offset)
+#define linear_mapping_va_to_pa(x) ((unsigned long)(x) - va_pa_offset)
+#define kernel_mapping_va_to_pa(x) \
+   ((unsigned long)(x) - va_kernel_pa_offset)
+#define __va_to_pa_nodebug(x)  \
+   (((x) >= PAGE_OFFSET) ? \
+   linear_mapping_va_to_pa(x) : kernel_mapping_va_to_pa(x))

  #ifdef CONFIG_DEBUG_VIRTUAL
  extern phys_addr_t __virt_to_phys(unsigned long x);
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 35b60035b6b0..94ef3b49dfb6 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -11,23 +11,29 @@

  #include 

-#ifndef __ASSEMBLY__
-
-/* Page Upper Directory not used in RISC-V */
-#include 
-#include 
-#include 
-#include 
-
-#ifdef CONFIG_MMU
+#ifndef CONFIG_MMU
+#define KERNEL_VIRT_ADDR   PAGE_OFFSET
+#define KERNEL_LINK_ADDR   PAGE_OFFSET
+#else
+/*
+ * Leave 2GB for modules and BPF that must lie within a 2GB range around
+ * the kernel.
+ */
+#define KERNEL_VIRT_ADDR   (VMALLOC_END - SZ_2G + 1)
+#define KERNEL_LINK_ADDR   KERNEL_VIRT_ADDR

  #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
  #define VMALLOC_END  (PAGE_OFFSET - 1)
  #define VMALLOC_START(PAGE_OFFSET - VMALLOC_SIZE)

  #define BPF_JIT_REGION_SIZE(SZ_128M)
-#define BPF_JIT_REGION_START   (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
-#define BPF_JIT_REGION_END (VMALLOC_END)
+#define BPF_JIT_REGION_START   PFN_ALIGN((unsigned long)&_end)
+#define BPF_JIT_REGION_END (BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE)
+
+#ifdef CONFIG_64BIT
+#define VMALLOC_MODULE_START   BPF_JIT_REGION_END
+#define VMALLOC_MODULE_END (((unsigned long)&_start & PAGE_MASK) + SZ_2G)
+#endif

  /*
   * Roughly size the vmemmap space to be large enough to fit enough
@@ -57,9 +63,16 @@
  #define FIXADDR_SIZE PGDIR_SIZE
  #endif
  #define FIXADDR_START(FIXADDR_TOP - FIXADDR_SIZE)
-
  #endif

+#ifndef __ASSEMBLY__
+
+/* Page Upper Directory not used in RISC-V */
+#include 
+#include 
+#include 
+#include 
+
  #ifdef CONFIG_64BIT
  #include 
  #else
@@ -483,6 +496,7 @@ static inline void __kernel_map_pages(struct page *page, 
int numpages, int enabl

  #define 

Re: [PATCH v3 1/3] riscv: Move kernel mapping to vmalloc zone

2020-05-28 Thread Alex Ghiti

Hi Zong,

Le 5/27/20 à 3:29 AM, Alex Ghiti a écrit :

Le 5/27/20 à 2:05 AM, Zong Li a écrit :

On Wed, May 27, 2020 at 1:06 AM Alex Ghiti  wrote:

Hi Zong,

Le 5/26/20 à 5:43 AM, Zong Li a écrit :

On Sun, May 24, 2020 at 4:54 PM Alexandre Ghiti  wrote:

This is a preparatory patch for relocatable kernel.

The kernel used to be linked at PAGE_OFFSET address and used to be 
loaded
physically at the beginning of the main memory. Therefore, we 
could use

the linear mapping for the kernel mapping.

But the relocated kernel base address will be different from 
PAGE_OFFSET
and since in the linear mapping, two different virtual addresses 
cannot
point to the same physical address, the kernel mapping needs to 
lie outside

the linear mapping.

In addition, because modules and BPF must be close to the kernel 
(inside
+-2GB window), the kernel is placed at the end of the vmalloc zone 
minus

2GB, which leaves room for modules and BPF. The kernel could not be
placed at the beginning of the vmalloc zone since other vmalloc
allocations from the kernel could get all the +-2GB window around the
kernel, which would prevent new modules and BPF programs from being loaded.

Signed-off-by: Alexandre Ghiti 
---
   arch/riscv/boot/loader.lds.S |  3 +-
   arch/riscv/include/asm/page.h    | 10 +-
   arch/riscv/include/asm/pgtable.h | 37 +---
   arch/riscv/kernel/head.S |  3 +-
   arch/riscv/kernel/module.c   |  4 +--
   arch/riscv/kernel/vmlinux.lds.S  |  3 +-
   arch/riscv/mm/init.c | 58 
+---

   arch/riscv/mm/physaddr.c |  2 +-
   8 files changed, 87 insertions(+), 33 deletions(-)

diff --git a/arch/riscv/boot/loader.lds.S 
b/arch/riscv/boot/loader.lds.S

index 47a5003c2e28..62d94696a19c 100644
--- a/arch/riscv/boot/loader.lds.S
+++ b/arch/riscv/boot/loader.lds.S
@@ -1,13 +1,14 @@
   /* SPDX-License-Identifier: GPL-2.0 */

   #include 
+#include 

   OUTPUT_ARCH(riscv)
   ENTRY(_start)

   SECTIONS
   {
-   . = PAGE_OFFSET;
+   . = KERNEL_LINK_ADDR;

  .payload : {
  *(.payload)
diff --git a/arch/riscv/include/asm/page.h 
b/arch/riscv/include/asm/page.h

index 2d50f76efe48..48bb09b6a9b7 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -90,18 +90,26 @@ typedef struct page *pgtable_t;

   #ifdef CONFIG_MMU
   extern unsigned long va_pa_offset;
+extern unsigned long va_kernel_pa_offset;
   extern unsigned long pfn_base;
   #define ARCH_PFN_OFFSET    (pfn_base)
   #else
   #define va_pa_offset   0
+#define va_kernel_pa_offset    0
   #define ARCH_PFN_OFFSET    (PAGE_OFFSET >> PAGE_SHIFT)
   #endif /* CONFIG_MMU */

   extern unsigned long max_low_pfn;
   extern unsigned long min_low_pfn;
+extern unsigned long kernel_virt_addr;

   #define __pa_to_va_nodebug(x)  ((void *)((unsigned long) (x) + 
va_pa_offset))

-#define __va_to_pa_nodebug(x)  ((unsigned long)(x) - va_pa_offset)
+#define linear_mapping_va_to_pa(x) ((unsigned long)(x) - 
va_pa_offset)

+#define kernel_mapping_va_to_pa(x) \
+   ((unsigned long)(x) - va_kernel_pa_offset)
+#define __va_to_pa_nodebug(x)  \
+   (((x) >= PAGE_OFFSET) ? \
+   linear_mapping_va_to_pa(x) : 
kernel_mapping_va_to_pa(x))


   #ifdef CONFIG_DEBUG_VIRTUAL
   extern phys_addr_t __virt_to_phys(unsigned long x);
diff --git a/arch/riscv/include/asm/pgtable.h 
b/arch/riscv/include/asm/pgtable.h

index 35b60035b6b0..25213cfaf680 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -11,23 +11,29 @@

   #include 

-#ifndef __ASSEMBLY__
-
-/* Page Upper Directory not used in RISC-V */
-#include 
-#include 
-#include 
-#include 
-
-#ifdef CONFIG_MMU
+#ifndef CONFIG_MMU
+#define KERNEL_VIRT_ADDR   PAGE_OFFSET
+#define KERNEL_LINK_ADDR   PAGE_OFFSET
+#else
+/*
+ * Leave 2GB for modules and BPF that must lie within a 2GB range 
around

+ * the kernel.
+ */
+#define KERNEL_VIRT_ADDR   (VMALLOC_END - SZ_2G + 1)
+#define KERNEL_LINK_ADDR   KERNEL_VIRT_ADDR

   #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
   #define VMALLOC_END  (PAGE_OFFSET - 1)
   #define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)

   #define BPF_JIT_REGION_SIZE    (SZ_128M)
-#define BPF_JIT_REGION_START   (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
-#define BPF_JIT_REGION_END (VMALLOC_END)
+#define BPF_JIT_REGION_START   (kernel_virt_addr)
+#define BPF_JIT_REGION_END (kernel_virt_addr + 
BPF_JIT_REGION_SIZE)

There seems to be a potential risk here: the BPF region overlaps the
kernel mapping, so if the kernel is bigger than 128MB, the BPF region
would be overrun by the kernel mapping.

Is there the risk I mentioned?



Sorry, I forgot to answer this one: I was confident that 128MB was 
large enough for the kernel and BPF. But I see no reason to leave this 
risk, so I'll change kernel_virt_addr to _end so that 
BPF will have its 128MB reserved.

Re: [PATCH v3 1/3] riscv: Move kernel mapping to vmalloc zone

2020-05-27 Thread Alex Ghiti

Le 5/27/20 à 2:05 AM, Zong Li a écrit :

On Wed, May 27, 2020 at 1:06 AM Alex Ghiti  wrote:

Hi Zong,

Le 5/26/20 à 5:43 AM, Zong Li a écrit :

On Sun, May 24, 2020 at 4:54 PM Alexandre Ghiti  wrote:

This is a preparatory patch for relocatable kernel.

The kernel used to be linked at PAGE_OFFSET address and used to be loaded
physically at the beginning of the main memory. Therefore, we could use
the linear mapping for the kernel mapping.

But the relocated kernel base address will be different from PAGE_OFFSET
and since in the linear mapping, two different virtual addresses cannot
point to the same physical address, the kernel mapping needs to lie outside
the linear mapping.

In addition, because modules and BPF must be close to the kernel (inside
+-2GB window), the kernel is placed at the end of the vmalloc zone minus
2GB, which leaves room for modules and BPF. The kernel could not be
placed at the beginning of the vmalloc zone since other vmalloc
allocations from the kernel could get all the +-2GB window around the
kernel, which would prevent new modules and BPF programs from being loaded.

Signed-off-by: Alexandre Ghiti 
---
   arch/riscv/boot/loader.lds.S |  3 +-
   arch/riscv/include/asm/page.h| 10 +-
   arch/riscv/include/asm/pgtable.h | 37 +---
   arch/riscv/kernel/head.S |  3 +-
   arch/riscv/kernel/module.c   |  4 +--
   arch/riscv/kernel/vmlinux.lds.S  |  3 +-
   arch/riscv/mm/init.c | 58 +---
   arch/riscv/mm/physaddr.c |  2 +-
   8 files changed, 87 insertions(+), 33 deletions(-)

diff --git a/arch/riscv/boot/loader.lds.S b/arch/riscv/boot/loader.lds.S
index 47a5003c2e28..62d94696a19c 100644
--- a/arch/riscv/boot/loader.lds.S
+++ b/arch/riscv/boot/loader.lds.S
@@ -1,13 +1,14 @@
   /* SPDX-License-Identifier: GPL-2.0 */

   #include 
+#include 

   OUTPUT_ARCH(riscv)
   ENTRY(_start)

   SECTIONS
   {
-   . = PAGE_OFFSET;
+   . = KERNEL_LINK_ADDR;

  .payload : {
  *(.payload)
diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index 2d50f76efe48..48bb09b6a9b7 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -90,18 +90,26 @@ typedef struct page *pgtable_t;

   #ifdef CONFIG_MMU
   extern unsigned long va_pa_offset;
+extern unsigned long va_kernel_pa_offset;
   extern unsigned long pfn_base;
   #define ARCH_PFN_OFFSET(pfn_base)
   #else
   #define va_pa_offset   0
+#define va_kernel_pa_offset0
   #define ARCH_PFN_OFFSET(PAGE_OFFSET >> PAGE_SHIFT)
   #endif /* CONFIG_MMU */

   extern unsigned long max_low_pfn;
   extern unsigned long min_low_pfn;
+extern unsigned long kernel_virt_addr;

   #define __pa_to_va_nodebug(x)  ((void *)((unsigned long) (x) + va_pa_offset))
-#define __va_to_pa_nodebug(x)  ((unsigned long)(x) - va_pa_offset)
+#define linear_mapping_va_to_pa(x) ((unsigned long)(x) - va_pa_offset)
+#define kernel_mapping_va_to_pa(x) \
+   ((unsigned long)(x) - va_kernel_pa_offset)
+#define __va_to_pa_nodebug(x)  \
+   (((x) >= PAGE_OFFSET) ? \
+   linear_mapping_va_to_pa(x) : kernel_mapping_va_to_pa(x))

   #ifdef CONFIG_DEBUG_VIRTUAL
   extern phys_addr_t __virt_to_phys(unsigned long x);
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 35b60035b6b0..25213cfaf680 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -11,23 +11,29 @@

   #include 

-#ifndef __ASSEMBLY__
-
-/* Page Upper Directory not used in RISC-V */
-#include 
-#include 
-#include 
-#include 
-
-#ifdef CONFIG_MMU
+#ifndef CONFIG_MMU
+#define KERNEL_VIRT_ADDR   PAGE_OFFSET
+#define KERNEL_LINK_ADDR   PAGE_OFFSET
+#else
+/*
+ * Leave 2GB for modules and BPF that must lie within a 2GB range around
+ * the kernel.
+ */
+#define KERNEL_VIRT_ADDR   (VMALLOC_END - SZ_2G + 1)
+#define KERNEL_LINK_ADDR   KERNEL_VIRT_ADDR

   #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
   #define VMALLOC_END  (PAGE_OFFSET - 1)
   #define VMALLOC_START(PAGE_OFFSET - VMALLOC_SIZE)

   #define BPF_JIT_REGION_SIZE(SZ_128M)
-#define BPF_JIT_REGION_START   (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
-#define BPF_JIT_REGION_END (VMALLOC_END)
+#define BPF_JIT_REGION_START   (kernel_virt_addr)
+#define BPF_JIT_REGION_END (kernel_virt_addr + BPF_JIT_REGION_SIZE)

There seems to be a potential risk here: the BPF region overlaps the
kernel mapping, so if the kernel is bigger than 128MB, the BPF region
would be overrun by the kernel mapping.

Is there the risk I mentioned?



Sorry, I forgot to answer this one: I was confident that 128MB was large 
enough for the kernel and BPF. But I see no reason to leave this risk, 
so I'll change kernel_virt_addr to _end so that 
BPF will have its 128MB reserved.

Thanks !

Alex





+
+#ifdef CONFIG_64BIT
+#define VMALLOC_MODULE_START   BPF

Re: [PATCH v3 1/3] riscv: Move kernel mapping to vmalloc zone

2020-05-26 Thread Alex Ghiti

Hi Zong,

Le 5/26/20 à 5:43 AM, Zong Li a écrit :

On Sun, May 24, 2020 at 4:54 PM Alexandre Ghiti  wrote:

This is a preparatory patch for relocatable kernel.

The kernel used to be linked at PAGE_OFFSET address and used to be loaded
physically at the beginning of the main memory. Therefore, we could use
the linear mapping for the kernel mapping.

But the relocated kernel base address will be different from PAGE_OFFSET
and since in the linear mapping, two different virtual addresses cannot
point to the same physical address, the kernel mapping needs to lie outside
the linear mapping.

In addition, because modules and BPF must be close to the kernel (inside
+-2GB window), the kernel is placed at the end of the vmalloc zone minus
2GB, which leaves room for modules and BPF. The kernel could not be
placed at the beginning of the vmalloc zone since other vmalloc
allocations from the kernel could get all the +-2GB window around the
kernel, which would prevent new modules and BPF programs from being loaded.

Signed-off-by: Alexandre Ghiti 
---
  arch/riscv/boot/loader.lds.S |  3 +-
  arch/riscv/include/asm/page.h| 10 +-
  arch/riscv/include/asm/pgtable.h | 37 +---
  arch/riscv/kernel/head.S |  3 +-
  arch/riscv/kernel/module.c   |  4 +--
  arch/riscv/kernel/vmlinux.lds.S  |  3 +-
  arch/riscv/mm/init.c | 58 +---
  arch/riscv/mm/physaddr.c |  2 +-
  8 files changed, 87 insertions(+), 33 deletions(-)

diff --git a/arch/riscv/boot/loader.lds.S b/arch/riscv/boot/loader.lds.S
index 47a5003c2e28..62d94696a19c 100644
--- a/arch/riscv/boot/loader.lds.S
+++ b/arch/riscv/boot/loader.lds.S
@@ -1,13 +1,14 @@
  /* SPDX-License-Identifier: GPL-2.0 */

  #include 
+#include 

  OUTPUT_ARCH(riscv)
  ENTRY(_start)

  SECTIONS
  {
-   . = PAGE_OFFSET;
+   . = KERNEL_LINK_ADDR;

 .payload : {
 *(.payload)
diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index 2d50f76efe48..48bb09b6a9b7 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -90,18 +90,26 @@ typedef struct page *pgtable_t;

  #ifdef CONFIG_MMU
  extern unsigned long va_pa_offset;
+extern unsigned long va_kernel_pa_offset;
  extern unsigned long pfn_base;
  #define ARCH_PFN_OFFSET(pfn_base)
  #else
  #define va_pa_offset   0
+#define va_kernel_pa_offset0
  #define ARCH_PFN_OFFSET(PAGE_OFFSET >> PAGE_SHIFT)
  #endif /* CONFIG_MMU */

  extern unsigned long max_low_pfn;
  extern unsigned long min_low_pfn;
+extern unsigned long kernel_virt_addr;

  #define __pa_to_va_nodebug(x)  ((void *)((unsigned long) (x) + va_pa_offset))
-#define __va_to_pa_nodebug(x)  ((unsigned long)(x) - va_pa_offset)
+#define linear_mapping_va_to_pa(x) ((unsigned long)(x) - va_pa_offset)
+#define kernel_mapping_va_to_pa(x) \
+   ((unsigned long)(x) - va_kernel_pa_offset)
+#define __va_to_pa_nodebug(x)  \
+   (((x) >= PAGE_OFFSET) ? \
+   linear_mapping_va_to_pa(x) : kernel_mapping_va_to_pa(x))

  #ifdef CONFIG_DEBUG_VIRTUAL
  extern phys_addr_t __virt_to_phys(unsigned long x);
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 35b60035b6b0..25213cfaf680 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -11,23 +11,29 @@

  #include 

-#ifndef __ASSEMBLY__
-
-/* Page Upper Directory not used in RISC-V */
-#include 
-#include 
-#include 
-#include 
-
-#ifdef CONFIG_MMU
+#ifndef CONFIG_MMU
+#define KERNEL_VIRT_ADDR   PAGE_OFFSET
+#define KERNEL_LINK_ADDR   PAGE_OFFSET
+#else
+/*
+ * Leave 2GB for modules and BPF that must lie within a 2GB range around
+ * the kernel.
+ */
+#define KERNEL_VIRT_ADDR   (VMALLOC_END - SZ_2G + 1)
+#define KERNEL_LINK_ADDR   KERNEL_VIRT_ADDR

  #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
  #define VMALLOC_END  (PAGE_OFFSET - 1)
  #define VMALLOC_START(PAGE_OFFSET - VMALLOC_SIZE)

  #define BPF_JIT_REGION_SIZE(SZ_128M)
-#define BPF_JIT_REGION_START   (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
-#define BPF_JIT_REGION_END (VMALLOC_END)
+#define BPF_JIT_REGION_START   (kernel_virt_addr)
+#define BPF_JIT_REGION_END (kernel_virt_addr + BPF_JIT_REGION_SIZE)

There seems to be a potential risk here: the BPF region overlaps the
kernel mapping, so if the kernel is bigger than 128MB, the BPF region
would be overrun by the kernel mapping.


+
+#ifdef CONFIG_64BIT
+#define VMALLOC_MODULE_START   BPF_JIT_REGION_END
+#define VMALLOC_MODULE_END VMALLOC_END
+#endif


Although kernel_virt_addr is a fixed address now, I think it could be
changed for the purposes of relocation or KASLR, so if
kernel_virt_addr is moved farther than 2GB from VMALLOC_END, the
module region would be too big.



Yes, you're right: it's wrong to allow modules to lie outside
the 2GB window, thanks for noticing.



In addition, the region 

Re: [PATCH 5/8] riscv: Implement sv48 support

2020-05-26 Thread Alex Ghiti

Le 5/25/20 à 2:45 AM, Anup Patel a écrit :

On Sun, May 24, 2020 at 2:45 PM Alexandre Ghiti  wrote:

By adding a new 4th level of page table, allow a 64-bit kernel to address
2^48 bytes of virtual address space: in practice, that roughly
offers ~160TB of virtual address space to userspace and allows up to 64TB
of physical memory.

If the underlying hardware does not support sv48, we automatically
fall back to a standard 3-level page table by folding the new PUD level
into the PGDIR level. In order to detect HW capabilities at runtime, we
use the fact that SATP ignores writes with an unsupported mode.

Signed-off-by: Alexandre Ghiti 
---
  arch/riscv/Kconfig  |   6 +-
  arch/riscv/include/asm/csr.h|   3 +-
  arch/riscv/include/asm/fixmap.h |   1 +
  arch/riscv/include/asm/page.h   |  15 +++
  arch/riscv/include/asm/pgalloc.h|  36 +++
  arch/riscv/include/asm/pgtable-64.h |  97 -
  arch/riscv/include/asm/pgtable.h|   9 +-
  arch/riscv/kernel/head.S|   3 +-
  arch/riscv/mm/context.c |   4 +-
  arch/riscv/mm/init.c| 159 +---
  10 files changed, 309 insertions(+), 24 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index e167f16131f4..3f73f60e9732 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -68,6 +68,7 @@ config RISCV
 select ARCH_HAS_GCOV_PROFILE_ALL
 select HAVE_COPY_THREAD_TLS
 select HAVE_ARCH_KASAN if MMU && 64BIT
+   select RELOCATABLE if 64BIT

  config ARCH_MMAP_RND_BITS_MIN
 default 18 if 64BIT
@@ -106,7 +107,7 @@ config PAGE_OFFSET
 default 0xC000 if 32BIT && MAXPHYSMEM_2GB
 default 0x8000 if 64BIT && !MMU
 default 0x8000 if 64BIT && MAXPHYSMEM_2GB
-   default 0xffe0 if 64BIT && !MAXPHYSMEM_2GB
+   default 0xc000 if 64BIT && !MAXPHYSMEM_2GB

  config ARCH_FLATMEM_ENABLE
 def_bool y
@@ -155,8 +156,11 @@ config GENERIC_HWEIGHT
  config FIX_EARLYCON_MEM
 def_bool MMU

+# On a 64BIT relocatable kernel, the 4-level page table is at runtime folded
+# on a 3-level page table when sv48 is not supported.
  config PGTABLE_LEVELS
 int
+   default 4 if 64BIT && RELOCATABLE
 default 3 if 64BIT
 default 2

diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index cec462e198ce..d41536c3f8d4 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -40,11 +40,10 @@
  #ifndef CONFIG_64BIT
  #define SATP_PPN   _AC(0x003F, UL)
  #define SATP_MODE_32   _AC(0x8000, UL)
-#define SATP_MODE  SATP_MODE_32
  #else
  #define SATP_PPN   _AC(0x0FFF, UL)
  #define SATP_MODE_39   _AC(0x8000, UL)
-#define SATP_MODE  SATP_MODE_39
+#define SATP_MODE_48   _AC(0x9000, UL)
  #endif

  /* Exception cause high bit - is an interrupt if set */
diff --git a/arch/riscv/include/asm/fixmap.h b/arch/riscv/include/asm/fixmap.h
index 2368d49eb4ef..d891cf9c73c5 100644
--- a/arch/riscv/include/asm/fixmap.h
+++ b/arch/riscv/include/asm/fixmap.h
@@ -27,6 +27,7 @@ enum fixed_addresses {
 FIX_FDT = FIX_FDT_END + FIX_FDT_SIZE / PAGE_SIZE - 1,
 FIX_PTE,
 FIX_PMD,
+   FIX_PUD,
 FIX_TEXT_POKE1,
 FIX_TEXT_POKE0,
 FIX_EARLYCON_MEM_BASE,
diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index 48bb09b6a9b7..5e77fe7f0d6d 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -31,7 +31,19 @@
   * When not using MMU this corresponds to the first free page in
   * physical memory (aligned on a page boundary).
   */
+#ifdef CONFIG_RELOCATABLE
+#define PAGE_OFFSET__page_offset
+
+#ifdef CONFIG_64BIT
+/*
+ * By default, CONFIG_PAGE_OFFSET value corresponds to SV48 address space so
+ * define the PAGE_OFFSET value for SV39.
+ */
+#define PAGE_OFFSET_L3 0xffe0
+#endif /* CONFIG_64BIT */
+#else
  #define PAGE_OFFSET_AC(CONFIG_PAGE_OFFSET, UL)
+#endif /* CONFIG_RELOCATABLE */

  #define KERN_VIRT_SIZE (-PAGE_OFFSET)

@@ -102,6 +114,9 @@ extern unsigned long pfn_base;
  extern unsigned long max_low_pfn;
  extern unsigned long min_low_pfn;
  extern unsigned long kernel_virt_addr;
+#ifdef CONFIG_RELOCATABLE
+extern unsigned long __page_offset;
+#endif

  #define __pa_to_va_nodebug(x)  ((void *)((unsigned long) (x) + va_pa_offset))
  #define linear_mapping_va_to_pa(x) ((unsigned long)(x) - va_pa_offset)
diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
index 3f601ee8233f..540eaa5a8658 100644
--- a/arch/riscv/include/asm/pgalloc.h
+++ b/arch/riscv/include/asm/pgalloc.h
@@ -36,6 +36,42 @@ static inline void pud_populate(struct mm_struct *mm, pud_t 
*pud, pmd_t *pmd)

 set_pud(pud, __pud((pfn << _PAGE_PFN_SHIFT) | _PAGE_TABLE));
  }
+
+static inline void p4d_populate(struct 

Re: [PATCH 7/8] riscv: Use pgtable_l4_enabled to output mmu type in cpuinfo

2020-05-26 Thread Alex Ghiti

Hi Anup,

Le 5/25/20 à 2:21 AM, Anup Patel a écrit :

On Sun, May 24, 2020 at 2:47 PM Alexandre Ghiti  wrote:

Now that the mmu type is determined at runtime using the SATP
characteristic, use the global variable pgtable_l4_enabled to output
the processor's mmu type through /proc/cpuinfo instead of relying on
device tree info.

Signed-off-by: Alexandre Ghiti 
Reviewed-by: Anup Patel 
Reviewed-by: Palmer Dabbelt 
---
  arch/riscv/boot/dts/sifive/fu540-c000.dtsi |  4 
  arch/riscv/kernel/cpu.c| 24 --
  2 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/arch/riscv/boot/dts/sifive/fu540-c000.dtsi 
b/arch/riscv/boot/dts/sifive/fu540-c000.dtsi
index 7db861053483..6138590a2229 100644
--- a/arch/riscv/boot/dts/sifive/fu540-c000.dtsi
+++ b/arch/riscv/boot/dts/sifive/fu540-c000.dtsi
@@ -50,7 +50,6 @@
 i-cache-size = <32768>;
 i-tlb-sets = <1>;
 i-tlb-size = <32>;
-   mmu-type = "riscv,sv39";
 reg = <1>;
 riscv,isa = "rv64imafdc";
 tlb-split;
@@ -74,7 +73,6 @@
 i-cache-size = <32768>;
 i-tlb-sets = <1>;
 i-tlb-size = <32>;
-   mmu-type = "riscv,sv39";
 reg = <2>;
 riscv,isa = "rv64imafdc";
 tlb-split;
@@ -98,7 +96,6 @@
 i-cache-size = <32768>;
 i-tlb-sets = <1>;
 i-tlb-size = <32>;
-   mmu-type = "riscv,sv39";
 reg = <3>;
 riscv,isa = "rv64imafdc";
 tlb-split;
@@ -122,7 +119,6 @@
 i-cache-size = <32768>;
 i-tlb-sets = <1>;
 i-tlb-size = <32>;
-   mmu-type = "riscv,sv39";
 reg = <4>;
 riscv,isa = "rv64imafdc";
 tlb-split;

Your PATCH6 is already doing the right thing by skipping CPU DT
nodes that don't have "mmu-type" DT property.

The "mmu-type" DT property is very critical for RUNTIME M-mode
firmware (OpenSBI) because it tells whether a given CPU has MMU
(or not). This is also in agreement with the current DT bindings
document for RISC-V CPUs.

I suggest to drop the change in sifive/fu540-c000.dtsi and rest of
the patch is fine so my Reviewed-by still holds.



Ok I'll do that in v2, thanks.


Alex



Regards,
Anup


diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index 40a3c442ac5f..38a699b997a8 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -8,6 +8,8 @@
  #include 
  #include 

+extern bool pgtable_l4_enabled;
+
  /*
   * Returns the hart ID of the given device tree node, or -ENODEV if the node
   * isn't an enabled and valid RISC-V hart node.
@@ -54,18 +56,19 @@ static void print_isa(struct seq_file *f, const char *isa)
 seq_puts(f, "\n");
  }

-static void print_mmu(struct seq_file *f, const char *mmu_type)
+static void print_mmu(struct seq_file *f)
  {
+   char sv_type[16];
+
  #if defined(CONFIG_32BIT)
-   if (strcmp(mmu_type, "riscv,sv32") != 0)
-   return;
+   strncpy(sv_type, "sv32", 5);
  #elif defined(CONFIG_64BIT)
-   if (strcmp(mmu_type, "riscv,sv39") != 0 &&
-   strcmp(mmu_type, "riscv,sv48") != 0)
-   return;
+   if (pgtable_l4_enabled)
+   strncpy(sv_type, "sv48", 5);
+   else
+   strncpy(sv_type, "sv39", 5);
  #endif
-
-   seq_printf(f, "mmu\t\t: %s\n", mmu_type+6);
+   seq_printf(f, "mmu\t\t: %s\n", sv_type);
  }

  static void *c_start(struct seq_file *m, loff_t *pos)
@@ -90,14 +93,13 @@ static int c_show(struct seq_file *m, void *v)
  {
 unsigned long cpu_id = (unsigned long)v - 1;
 struct device_node *node = of_get_cpu_node(cpu_id, NULL);
-   const char *compat, *isa, *mmu;
+   const char *compat, *isa;

 seq_printf(m, "processor\t: %lu\n", cpu_id);
 seq_printf(m, "hart\t\t: %lu\n", cpuid_to_hartid_map(cpu_id));
 if (!of_property_read_string(node, "riscv,isa", &isa))
 print_isa(m, isa);
-   if (!of_property_read_string(node, "mmu-type", &mmu))
-   print_mmu(m, mmu);
+   print_mmu(m);
 if (!of_property_read_string(node, "compatible", &compat)
 && strcmp(compat, "riscv"))
 seq_printf(m, "uarch\t\t: %s\n", compat);
--
2.20.1



Re: [PATCH 04/10] riscv: Fix print_vm_layout build error if NOMMU

2020-05-14 Thread Alex Ghiti

Hi,

On 5/10/20 10:19 PM, Kefeng Wang wrote:

arch/riscv/mm/init.c: In function ‘print_vm_layout’:
arch/riscv/mm/init.c:68:37: error: ‘FIXADDR_START’ undeclared (first use in 
this function);
arch/riscv/mm/init.c:69:20: error: ‘FIXADDR_TOP’ undeclared
arch/riscv/mm/init.c:70:37: error: ‘PCI_IO_START’ undeclared
arch/riscv/mm/init.c:71:20: error: ‘PCI_IO_END’ undeclared
arch/riscv/mm/init.c:72:38: error: ‘VMEMMAP_START’ undeclared
arch/riscv/mm/init.c:73:20: error: ‘VMEMMAP_END’ undeclared (first use in this 
function);

Reported-by: Hulk Robot 
Signed-off-by: Kefeng Wang 
---
  arch/riscv/mm/init.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index dfcaebc3928f..58c39c44b9c9 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -49,7 +49,7 @@ static void setup_zero_page(void)
memset((void *)empty_zero_page, 0, PAGE_SIZE);
  }
  
-#ifdef CONFIG_DEBUG_VM

+#if defined(CONFIG_MMU) && defined(DEBUG_VM)



Shouldn't it be CONFIG_DEBUG_VM?



  static inline void print_mlk(char *name, unsigned long b, unsigned long t)
  {
pr_notice("%12s : 0x%08lx - 0x%08lx   (%4ld kB)\n", name, b, t,



Alex



Re: [PATCH v6 14/14] riscv: Make mmap allocation top-down by default

2019-10-09 Thread Alex Ghiti

On 10/8/19 10:07 PM, Atish Patra wrote:

On Tue, 2019-10-08 at 07:58 -0400, Alex Ghiti wrote:

On 10/7/19 8:46 PM, Atish Patra wrote:

On Mon, 2019-10-07 at 05:11 -0400, Alex Ghiti wrote:

On 10/4/19 10:12 PM, Atish Patra wrote:

On Thu, 2019-08-08 at 02:17 -0400, Alexandre Ghiti wrote:

In order to avoid wasting user address space by using bottom-
up
mmap
allocation scheme, prefer top-down scheme when possible.

Before:
root@qemuriscv64:~# cat /proc/self/maps
0001-00016000 r-xp  fe:00
6389   /bin/cat.coreutils
00016000-00017000 r--p 5000 fe:00
6389   /bin/cat.coreutils
00017000-00018000 rw-p 6000 fe:00
6389   /bin/cat.coreutils
00018000-00039000 rw-p  00:00 0  [heap]
156000-16d000 r-xp  fe:00 7193   /lib/ld-
2.28.so
16d000-16e000 r--p 00016000 fe:00 7193   /lib/ld-
2.28.so
16e000-16f000 rw-p 00017000 fe:00 7193   /lib/ld-
2.28.so
16f000-17 rw-p  00:00 0
17-172000 r-xp  00:00 0  [vdso]
174000-176000 rw-p  00:00 0
176000-1555674000 r-xp  fe:00 7187   /lib/libc-
2.28.so
1555674000-1555678000 r--p 000fd000 fe:00 7187   /lib/libc-
2.28.so
1555678000-155567a000 rw-p 00101000 fe:00 7187   /lib/libc-
2.28.so
155567a000-15556a rw-p  00:00 0
3fffb9-3fffbb1000 rw-p  00:00 0  [stack]

After:
root@qemuriscv64:~# cat /proc/self/maps
0001-00016000 r-xp  fe:00
6389   /bin/cat.coreutils
00016000-00017000 r--p 5000 fe:00
6389   /bin/cat.coreutils
00017000-00018000 rw-p 6000 fe:00
6389   /bin/cat.coreutils
2de81000-2dea2000 rw-p  00:00 0  [heap]
3ff7eb6000-3ff7ed8000 rw-p  00:00 0
3ff7ed8000-3ff7fd6000 r-xp  fe:00 7187   /lib/libc-
2.28.so
3ff7fd6000-3ff7fda000 r--p 000fd000 fe:00 7187   /lib/libc-
2.28.so
3ff7fda000-3ff7fdc000 rw-p 00101000 fe:00 7187   /lib/libc-
2.28.so
3ff7fdc000-3ff7fe2000 rw-p  00:00 0
3ff7fe4000-3ff7fe6000 r-xp  00:00 0  [vdso]
3ff7fe6000-3ff7ffd000 r-xp  fe:00 7193   /lib/ld-
2.28.so
3ff7ffd000-3ff7ffe000 r--p 00016000 fe:00 7193   /lib/ld-
2.28.so
3ff7ffe000-3ff7fff000 rw-p 00017000 fe:00 7193   /lib/ld-
2.28.so
3ff7fff000-3ff800 rw-p  00:00 0
3fff888000-3fff8a9000 rw-p  00:00 0  [stack]

Signed-off-by: Alexandre Ghiti 
Acked-by: Paul Walmsley 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Kees Cook 
Reviewed-by: Luis Chamberlain 
---
arch/riscv/Kconfig | 12 
1 file changed, 12 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 59a4727ecd6c..87dc5370becb 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -54,6 +54,18 @@ config RISCV
select EDAC_SUPPORT
select ARCH_HAS_GIGANTIC_PAGE
select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
+   select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
+   select HAVE_ARCH_MMAP_RND_BITS
+
+config ARCH_MMAP_RND_BITS_MIN
+   default 18 if 64BIT
+   default 8
+
+# max bits determined by the following formula:
+#  VA_BITS - PAGE_SHIFT - 3
+config ARCH_MMAP_RND_BITS_MAX
+   default 24 if 64BIT # SV39 based
+   default 17

config MMU

def_bool y

With this patch, I am not able to boot Fedora Linux (a Gnome desktop image) on RISC-V hardware (Unleashed + Microsemi Expansion board). The boot gets stuck right after systemd starts.

https://paste.fedoraproject.org/paste/TOrUMqqKH-pGFX7CnfajDg

Reverting just this patch allows Fedora to boot successfully on this specific RISC-V hardware. I have not root caused the issue, but it looks like it might have messed up the userspace mapping.

It might have messed up the userspace mapping, but not enough to break userspace completely, since systemd does get some things done. I would try booting with the legacy layout: if you can set the legacy_va_layout sysctl at boot time, it will map userspace as it was before (bottom-up). If that does not work, the problem could be the randomization that is now activated by default.

Randomization may not be the issue. I just removed
ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT from the config and that
seems to
work. Here is the bottom-up layout with randomization on.

Oops, sorry for my previous answer, I missed yours, which landed in another folder.

Removing ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT also removes
randomization
as this config selects ARCH_HAS_ELF_RANDOMIZE.
You could remove ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT and select ARCH_HAS_ELF_RANDOMIZE by hand, but you would have to implement arch_mmap_rnd and arch_randomize_brk (elf-randomize.h).


Ahh okay.


The simplest would be to boot with the legacy layout: I did not find a way to set this on the kernel command line, but you can by modifying it directly in the code:

https://elixir.bootlin.com/linux/v5.4-rc2/source/kernel/sysctl.c#L269


Setting this to 1 works.


[root@fedora-riscv ~]# cat /proc/self/maps
156000-17 r-xp  103:01
280098

Re: [PATCH v6 14/14] riscv: Make mmap allocation top-down by default

2019-10-08 Thread Alex Ghiti

On 10/7/19 8:46 PM, Atish Patra wrote:

On Mon, 2019-10-07 at 05:11 -0400, Alex Ghiti wrote:

On 10/4/19 10:12 PM, Atish Patra wrote:

On Thu, 2019-08-08 at 02:17 -0400, Alexandre Ghiti wrote:

In order to avoid wasting user address space by using bottom-up
mmap
allocation scheme, prefer top-down scheme when possible.

Before:
root@qemuriscv64:~# cat /proc/self/maps
0001-00016000 r-xp  fe:00
6389   /bin/cat.coreutils
00016000-00017000 r--p 5000 fe:00
6389   /bin/cat.coreutils
00017000-00018000 rw-p 6000 fe:00
6389   /bin/cat.coreutils
00018000-00039000 rw-p  00:00 0  [heap]
156000-16d000 r-xp  fe:00 7193   /lib/ld-2.28.so
16d000-16e000 r--p 00016000 fe:00 7193   /lib/ld-2.28.so
16e000-16f000 rw-p 00017000 fe:00 7193   /lib/ld-2.28.so
16f000-17 rw-p  00:00 0
17-172000 r-xp  00:00 0  [vdso]
174000-176000 rw-p  00:00 0
176000-1555674000 r-xp  fe:00 7187   /lib/libc-
2.28.so
1555674000-1555678000 r--p 000fd000 fe:00 7187   /lib/libc-
2.28.so
1555678000-155567a000 rw-p 00101000 fe:00 7187   /lib/libc-
2.28.so
155567a000-15556a rw-p  00:00 0
3fffb9-3fffbb1000 rw-p  00:00 0  [stack]

After:
root@qemuriscv64:~# cat /proc/self/maps
0001-00016000 r-xp  fe:00
6389   /bin/cat.coreutils
00016000-00017000 r--p 5000 fe:00
6389   /bin/cat.coreutils
00017000-00018000 rw-p 6000 fe:00
6389   /bin/cat.coreutils
2de81000-2dea2000 rw-p  00:00 0  [heap]
3ff7eb6000-3ff7ed8000 rw-p  00:00 0
3ff7ed8000-3ff7fd6000 r-xp  fe:00 7187   /lib/libc-
2.28.so
3ff7fd6000-3ff7fda000 r--p 000fd000 fe:00 7187   /lib/libc-
2.28.so
3ff7fda000-3ff7fdc000 rw-p 00101000 fe:00 7187   /lib/libc-
2.28.so
3ff7fdc000-3ff7fe2000 rw-p  00:00 0
3ff7fe4000-3ff7fe6000 r-xp  00:00 0  [vdso]
3ff7fe6000-3ff7ffd000 r-xp  fe:00 7193   /lib/ld-2.28.so
3ff7ffd000-3ff7ffe000 r--p 00016000 fe:00 7193   /lib/ld-2.28.so
3ff7ffe000-3ff7fff000 rw-p 00017000 fe:00 7193   /lib/ld-2.28.so
3ff7fff000-3ff800 rw-p  00:00 0
3fff888000-3fff8a9000 rw-p  00:00 0  [stack]

Signed-off-by: Alexandre Ghiti 
Acked-by: Paul Walmsley 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Kees Cook 
Reviewed-by: Luis Chamberlain 
---
   arch/riscv/Kconfig | 12 
   1 file changed, 12 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 59a4727ecd6c..87dc5370becb 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -54,6 +54,18 @@ config RISCV
select EDAC_SUPPORT
select ARCH_HAS_GIGANTIC_PAGE
select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
+   select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
+   select HAVE_ARCH_MMAP_RND_BITS
+
+config ARCH_MMAP_RND_BITS_MIN
+   default 18 if 64BIT
+   default 8
+
+# max bits determined by the following formula:
+#  VA_BITS - PAGE_SHIFT - 3
+config ARCH_MMAP_RND_BITS_MAX
+   default 24 if 64BIT # SV39 based
+   default 17
   
   config MMU

def_bool y

With this patch, I am not able to boot Fedora Linux (a Gnome desktop image) on RISC-V hardware (Unleashed + Microsemi Expansion board). The boot gets stuck right after systemd starts.

https://paste.fedoraproject.org/paste/TOrUMqqKH-pGFX7CnfajDg

Reverting just this patch allows Fedora to boot successfully on this specific RISC-V hardware. I have not root caused the issue, but it looks like it might have messed up the userspace mapping.

It might have messed up the userspace mapping, but not enough to break userspace completely, since systemd does get some things done. I would try booting with the legacy layout: if you can set the legacy_va_layout sysctl at boot time, it will map userspace as it was before (bottom-up). If that does not work, the problem could be the randomization that is now activated by default.

Randomization may not be the issue. I just removed
ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT from the config and that seems to
work. Here is the bottom-up layout with randomization on.


Oops, sorry for my previous answer, I missed yours, which landed in another folder.


Removing ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT also removes randomization
as this config selects ARCH_HAS_ELF_RANDOMIZE.
You could remove ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT and select ARCH_HAS_ELF_RANDOMIZE by hand, but you would have to implement arch_mmap_rnd and arch_randomize_brk (elf-randomize.h).

The simplest would be to boot with the legacy layout: I did not find a way to set this on the kernel command line, but you can by modifying it directly in the code:

https://elixir.bootlin.com/linux/v5.4-rc2/source/kernel/sysctl.c#L269


[root@fedora-riscv ~]# cat /proc/self/maps
156000-17 r-xp  103:01
280098/usr/lib64/ld-2.28.so
17-171000 r--p 00019000 103:01
280098/usr/lib64/ld-2.28.so
171000

Re: [PATCH v11 07/22] riscv: mm: Add p?d_leaf() definitions

2019-10-08 Thread Alex Ghiti

On 10/7/19 11:38 AM, Steven Price wrote:

walk_page_range() is going to be allowed to walk page tables other than
those of user space. For this it needs to know when it has reached a
'leaf' entry in the page tables. This information is provided by the
p?d_leaf() functions/macros.

For riscv a page is a leaf page when it has a read, write or execute bit
set on it.

CC: Palmer Dabbelt 
CC: Albert Ou 
CC: linux-ri...@lists.infradead.org
Signed-off-by: Steven Price 
---
  arch/riscv/include/asm/pgtable-64.h | 7 +++
  arch/riscv/include/asm/pgtable.h| 7 +++
  2 files changed, 14 insertions(+)

diff --git a/arch/riscv/include/asm/pgtable-64.h 
b/arch/riscv/include/asm/pgtable-64.h
index 74630989006d..e88a8e8acbdf 100644
--- a/arch/riscv/include/asm/pgtable-64.h
+++ b/arch/riscv/include/asm/pgtable-64.h
@@ -43,6 +43,13 @@ static inline int pud_bad(pud_t pud)
return !pud_present(pud);
  }
  
+#define pud_leaf	pud_leaf

+static inline int pud_leaf(pud_t pud)
+{
+   return pud_present(pud)
+   && (pud_val(pud) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC));
+}
+
  static inline void set_pud(pud_t *pudp, pud_t pud)
  {
*pudp = pud;
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 7255f2d8395b..b9a679153265 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -130,6 +130,13 @@ static inline int pmd_bad(pmd_t pmd)
return !pmd_present(pmd);
  }
  
+#define pmd_leaf	pmd_leaf

+static inline int pmd_leaf(pmd_t pmd)
+{
+   return pmd_present(pmd)
+   && (pmd_val(pmd) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC));
+}
+
  static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
  {
*pmdp = pmd;


Hi Steven,

The way you check leaf entries is correct: we do the same for hugepages. So is there a reason you did not use the pmd/pud_huge functions that are defined in arch/riscv/mm/hugetlbpage.c?

Anyway, FWIW:

Reviewed-by: Alexandre Ghiti 

Thanks,

Alex



Re: [PATCH v6 14/14] riscv: Make mmap allocation top-down by default

2019-10-08 Thread Alex Ghiti

On 10/7/19 5:11 AM, Alex Ghiti wrote:

On 10/4/19 10:12 PM, Atish Patra wrote:

On Thu, 2019-08-08 at 02:17 -0400, Alexandre Ghiti wrote:

In order to avoid wasting user address space by using bottom-up mmap
allocation scheme, prefer top-down scheme when possible.

Before:
root@qemuriscv64:~# cat /proc/self/maps
0001-00016000 r-xp  fe:00 6389 /bin/cat.coreutils
00016000-00017000 r--p 5000 fe:00 6389 /bin/cat.coreutils
00017000-00018000 rw-p 6000 fe:00 6389 /bin/cat.coreutils
00018000-00039000 rw-p  00:00 0  [heap]
156000-16d000 r-xp  fe:00 7193 /lib/ld-2.28.so
16d000-16e000 r--p 00016000 fe:00 7193 /lib/ld-2.28.so
16e000-16f000 rw-p 00017000 fe:00 7193 /lib/ld-2.28.so
16f000-17 rw-p  00:00 0
17-172000 r-xp  00:00 0  [vdso]
174000-176000 rw-p  00:00 0
176000-1555674000 r-xp  fe:00 7187 /lib/libc-2.28.so
1555674000-1555678000 r--p 000fd000 fe:00 7187 /lib/libc-2.28.so
1555678000-155567a000 rw-p 00101000 fe:00 7187 /lib/libc-2.28.so
155567a000-15556a rw-p  00:00 0
3fffb9-3fffbb1000 rw-p  00:00 0  [stack]

After:
root@qemuriscv64:~# cat /proc/self/maps
0001-00016000 r-xp  fe:00 6389 /bin/cat.coreutils
00016000-00017000 r--p 5000 fe:00 6389 /bin/cat.coreutils
00017000-00018000 rw-p 6000 fe:00 6389 /bin/cat.coreutils
2de81000-2dea2000 rw-p  00:00 0  [heap]
3ff7eb6000-3ff7ed8000 rw-p  00:00 0
3ff7ed8000-3ff7fd6000 r-xp  fe:00 7187 /lib/libc-2.28.so
3ff7fd6000-3ff7fda000 r--p 000fd000 fe:00 7187 /lib/libc-2.28.so
3ff7fda000-3ff7fdc000 rw-p 00101000 fe:00 7187 /lib/libc-2.28.so
3ff7fdc000-3ff7fe2000 rw-p  00:00 0
3ff7fe4000-3ff7fe6000 r-xp  00:00 0  [vdso]
3ff7fe6000-3ff7ffd000 r-xp  fe:00 7193 /lib/ld-2.28.so
3ff7ffd000-3ff7ffe000 r--p 00016000 fe:00 7193 /lib/ld-2.28.so
3ff7ffe000-3ff7fff000 rw-p 00017000 fe:00 7193 /lib/ld-2.28.so
3ff7fff000-3ff800 rw-p  00:00 0
3fff888000-3fff8a9000 rw-p  00:00 0  [stack]

Signed-off-by: Alexandre Ghiti 
Acked-by: Paul Walmsley 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Kees Cook 
Reviewed-by: Luis Chamberlain 
---
  arch/riscv/Kconfig | 12 
  1 file changed, 12 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 59a4727ecd6c..87dc5370becb 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -54,6 +54,18 @@ config RISCV
  select EDAC_SUPPORT
  select ARCH_HAS_GIGANTIC_PAGE
  select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
+    select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
+    select HAVE_ARCH_MMAP_RND_BITS
+
+config ARCH_MMAP_RND_BITS_MIN
+    default 18 if 64BIT
+    default 8
+
+# max bits determined by the following formula:
+#  VA_BITS - PAGE_SHIFT - 3
+config ARCH_MMAP_RND_BITS_MAX
+    default 24 if 64BIT # SV39 based
+    default 17
    config MMU
  def_bool y

With this patch, I am not able to boot Fedora Linux (a Gnome desktop image) on RISC-V hardware (Unleashed + Microsemi Expansion board). The boot gets stuck right after systemd starts.

https://paste.fedoraproject.org/paste/TOrUMqqKH-pGFX7CnfajDg

Reverting just this patch allows Fedora to boot successfully on this specific RISC-V hardware. I have not root caused the issue, but it looks like it might have messed up the userspace mapping.


It might have messed up the userspace mapping, but not enough to break userspace completely, since systemd does get some things done. I would try booting with the legacy layout: if you can set the legacy_va_layout sysctl at boot time, it will map userspace as it was before (bottom-up). If that does not work, the problem could be the randomization that is now activated by default.
Anyway, it's weird, since userspace should not depend on how the mapping is laid out.


If you can identify the program that stalls, that would be fantastic :)

As the code is common to mips and arm now and I did not hear from them, I imagine the problem comes from us.

Alex


Atish, do you have any news regarding this problem ? If you have an 
image I can execute on qemu that

reproduces the issue, I can take a look.

Alex







Re: [PATCH v6 14/14] riscv: Make mmap allocation top-down by default

2019-10-07 Thread Alex Ghiti

On 10/4/19 10:12 PM, Atish Patra wrote:

On Thu, 2019-08-08 at 02:17 -0400, Alexandre Ghiti wrote:

In order to avoid wasting user address space by using bottom-up mmap
allocation scheme, prefer top-down scheme when possible.

Before:
root@qemuriscv64:~# cat /proc/self/maps
0001-00016000 r-xp  fe:00 6389   /bin/cat.coreutils
00016000-00017000 r--p 5000 fe:00 6389   /bin/cat.coreutils
00017000-00018000 rw-p 6000 fe:00 6389   /bin/cat.coreutils
00018000-00039000 rw-p  00:00 0  [heap]
156000-16d000 r-xp  fe:00 7193   /lib/ld-2.28.so
16d000-16e000 r--p 00016000 fe:00 7193   /lib/ld-2.28.so
16e000-16f000 rw-p 00017000 fe:00 7193   /lib/ld-2.28.so
16f000-17 rw-p  00:00 0
17-172000 r-xp  00:00 0  [vdso]
174000-176000 rw-p  00:00 0
176000-1555674000 r-xp  fe:00 7187   /lib/libc-2.28.so
1555674000-1555678000 r--p 000fd000 fe:00 7187   /lib/libc-2.28.so
1555678000-155567a000 rw-p 00101000 fe:00 7187   /lib/libc-2.28.so
155567a000-15556a rw-p  00:00 0
3fffb9-3fffbb1000 rw-p  00:00 0  [stack]

After:
root@qemuriscv64:~# cat /proc/self/maps
0001-00016000 r-xp  fe:00 6389   /bin/cat.coreutils
00016000-00017000 r--p 5000 fe:00 6389   /bin/cat.coreutils
00017000-00018000 rw-p 6000 fe:00 6389   /bin/cat.coreutils
2de81000-2dea2000 rw-p  00:00 0  [heap]
3ff7eb6000-3ff7ed8000 rw-p  00:00 0
3ff7ed8000-3ff7fd6000 r-xp  fe:00 7187   /lib/libc-2.28.so
3ff7fd6000-3ff7fda000 r--p 000fd000 fe:00 7187   /lib/libc-2.28.so
3ff7fda000-3ff7fdc000 rw-p 00101000 fe:00 7187   /lib/libc-2.28.so
3ff7fdc000-3ff7fe2000 rw-p  00:00 0
3ff7fe4000-3ff7fe6000 r-xp  00:00 0  [vdso]
3ff7fe6000-3ff7ffd000 r-xp  fe:00 7193   /lib/ld-2.28.so
3ff7ffd000-3ff7ffe000 r--p 00016000 fe:00 7193   /lib/ld-2.28.so
3ff7ffe000-3ff7fff000 rw-p 00017000 fe:00 7193   /lib/ld-2.28.so
3ff7fff000-3ff800 rw-p  00:00 0
3fff888000-3fff8a9000 rw-p  00:00 0  [stack]

Signed-off-by: Alexandre Ghiti 
Acked-by: Paul Walmsley 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Kees Cook 
Reviewed-by: Luis Chamberlain 
---
  arch/riscv/Kconfig | 12 
  1 file changed, 12 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 59a4727ecd6c..87dc5370becb 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -54,6 +54,18 @@ config RISCV
select EDAC_SUPPORT
select ARCH_HAS_GIGANTIC_PAGE
select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
+   select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
+   select HAVE_ARCH_MMAP_RND_BITS
+
+config ARCH_MMAP_RND_BITS_MIN
+   default 18 if 64BIT
+   default 8
+
+# max bits determined by the following formula:
+#  VA_BITS - PAGE_SHIFT - 3
+config ARCH_MMAP_RND_BITS_MAX
+   default 24 if 64BIT # SV39 based
+   default 17
  
  config MMU

def_bool y

With this patch, I am not able to boot Fedora Linux (a Gnome desktop image) on RISC-V hardware (Unleashed + Microsemi Expansion board). The boot gets stuck right after systemd starts.

https://paste.fedoraproject.org/paste/TOrUMqqKH-pGFX7CnfajDg

Reverting just this patch allows Fedora to boot successfully on this specific RISC-V hardware. I have not root caused the issue, but it looks like it might have messed up the userspace mapping.


It might have messed up the userspace mapping, but not enough to break userspace completely, since systemd does get some things done. I would try booting with the legacy layout: if you can set the legacy_va_layout sysctl at boot time, it will map userspace as it was before (bottom-up). If that does not work, the problem could be the randomization that is now activated by default.
Anyway, it's weird, since userspace should not depend on how the mapping is laid out.

If you can identify the program that stalls, that would be fantastic :)

As the code is common to mips and arm now and I did not hear from them, I imagine the problem comes from us.

Alex




Re: [PATCH RESEND 0/8] Fix mmap base in bottom-up mmap

2019-08-27 Thread Alex Ghiti

On 8/26/19 6:37 PM, Helge Deller wrote:

On 26.08.19 09:34, Alexandre Ghiti wrote:

On 6/20/19 7:03 AM, Alexandre Ghiti wrote:

This series fixes the fallback of the top-down mmap: in case of
failure, a bottom-up scheme can be tried as a last resort between
the top-down mmap base and the stack, hoping for a large unused stack
limit.

Lots of architectures and even mm code start this fallback
at TASK_UNMAPPED_BASE, which is useless since the top-down scheme
already failed on the whole address space: instead, simply use
mmap_base.

Along the way, it allows to get rid of of mmap_legacy_base and
mmap_compat_legacy_base from mm_struct.

Note that arm and mips already implement this behaviour.

Alexandre Ghiti (8):
   s390: Start fallback of top-down mmap at mm->mmap_base
   sh: Start fallback of top-down mmap at mm->mmap_base
   sparc: Start fallback of top-down mmap at mm->mmap_base
   x86, hugetlbpage: Start fallback of top-down mmap at mm->mmap_base
   mm: Start fallback top-down mmap at mm->mmap_base
   parisc: Use mmap_base, not mmap_legacy_base, as low_limit for
 bottom-up mmap
   x86: Use mmap_*base, not mmap_*legacy_base, as low_limit for 
bottom-up

 mmap
   mm: Remove mmap_legacy_base and mmap_compat_legacy_code fields from
 mm_struct

  arch/parisc/kernel/sys_parisc.c  |  8 +++-
  arch/s390/mm/mmap.c  |  2 +-
  arch/sh/mm/mmap.c    |  2 +-
  arch/sparc/kernel/sys_sparc_64.c |  2 +-
  arch/sparc/mm/hugetlbpage.c  |  2 +-
  arch/x86/include/asm/elf.h   |  2 +-
  arch/x86/kernel/sys_x86_64.c |  4 ++--
  arch/x86/mm/hugetlbpage.c    |  7 ---
  arch/x86/mm/mmap.c   | 20 +---
  include/linux/mm_types.h |  2 --
  mm/debug.c   |  4 ++--
  mm/mmap.c    |  2 +-
  12 files changed, 26 insertions(+), 31 deletions(-)



Any thoughts about this series? As said before, this is just a preparatory patchset intended to merge the x86 top-down mmap code with the generic version.


I just tested your patch series successfully on the parisc architecture. You may add:

Tested-by: Helge Deller  # parisc


Thanks again Helge !

Alex




Thanks!
Helge


Re: [PATCH v5 14/14] riscv: Make mmap allocation top-down by default

2019-07-31 Thread Alex Ghiti

On 7/30/19 1:51 AM, Alexandre Ghiti wrote:

In order to avoid wasting user address space by using bottom-up mmap
allocation scheme, prefer top-down scheme when possible.

Before:
root@qemuriscv64:~# cat /proc/self/maps
0001-00016000 r-xp  fe:00 6389   /bin/cat.coreutils
00016000-00017000 r--p 5000 fe:00 6389   /bin/cat.coreutils
00017000-00018000 rw-p 6000 fe:00 6389   /bin/cat.coreutils
00018000-00039000 rw-p  00:00 0  [heap]
156000-16d000 r-xp  fe:00 7193   /lib/ld-2.28.so
16d000-16e000 r--p 00016000 fe:00 7193   /lib/ld-2.28.so
16e000-16f000 rw-p 00017000 fe:00 7193   /lib/ld-2.28.so
16f000-17 rw-p  00:00 0
17-172000 r-xp  00:00 0  [vdso]
174000-176000 rw-p  00:00 0
176000-1555674000 r-xp  fe:00 7187   /lib/libc-2.28.so
1555674000-1555678000 r--p 000fd000 fe:00 7187   /lib/libc-2.28.so
1555678000-155567a000 rw-p 00101000 fe:00 7187   /lib/libc-2.28.so
155567a000-15556a rw-p  00:00 0
3fffb9-3fffbb1000 rw-p  00:00 0  [stack]

After:
root@qemuriscv64:~# cat /proc/self/maps
0001-00016000 r-xp  fe:00 6389   /bin/cat.coreutils
00016000-00017000 r--p 5000 fe:00 6389   /bin/cat.coreutils
00017000-00018000 rw-p 6000 fe:00 6389   /bin/cat.coreutils
2de81000-2dea2000 rw-p  00:00 0  [heap]
3ff7eb6000-3ff7ed8000 rw-p  00:00 0
3ff7ed8000-3ff7fd6000 r-xp  fe:00 7187   /lib/libc-2.28.so
3ff7fd6000-3ff7fda000 r--p 000fd000 fe:00 7187   /lib/libc-2.28.so
3ff7fda000-3ff7fdc000 rw-p 00101000 fe:00 7187   /lib/libc-2.28.so
3ff7fdc000-3ff7fe2000 rw-p  00:00 0
3ff7fe4000-3ff7fe6000 r-xp  00:00 0  [vdso]
3ff7fe6000-3ff7ffd000 r-xp  fe:00 7193   /lib/ld-2.28.so
3ff7ffd000-3ff7ffe000 r--p 00016000 fe:00 7193   /lib/ld-2.28.so
3ff7ffe000-3ff7fff000 rw-p 00017000 fe:00 7193   /lib/ld-2.28.so
3ff7fff000-3ff800 rw-p  00:00 0
3fff888000-3fff8a9000 rw-p  00:00 0  [stack]

Signed-off-by: Alexandre Ghiti 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Kees Cook 
Reviewed-by: Luis Chamberlain 
---
  arch/riscv/Kconfig | 13 +
  1 file changed, 13 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 8ef64fe2c2b3..8d0d8af1a744 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -54,6 +54,19 @@ config RISCV
select EDAC_SUPPORT
select ARCH_HAS_GIGANTIC_PAGE
select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
+   select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
+   select HAVE_ARCH_MMAP_RND_BITS
+
+config ARCH_MMAP_RND_BITS_MIN
+   default 18 if 64BIT
+   default 8
+
+# max bits determined by the following formula:
+#  VA_BITS - PAGE_SHIFT - 3
+config ARCH_MMAP_RND_BITS_MAX
+   default 33 if RISCV_VM_SV48
+   default 24 if RISCV_VM_SV39
+   default 17 if RISCV_VM_SV32
  
  config MMU

def_bool y



Hi Andrew,

I have just seen that you took this series into mmotm, but without Paul's patch ("riscv: kbuild: add virtual memory system selection") on which this commit relies. I'm not sure it can compile without it, as there is no default for ARCH_MMAP_RND_BITS_MAX.

Thanks,

Alex


