Re: [PATCH] riscv: Get rid of MAX_EARLY_MAPPING_SIZE
On 2/22/21 at 12:40 AM, Alex Ghiti wrote:
> Hi Dmitry,
>
> On 2/21/21 at 10:38 AM, Dmitry Vyukov wrote:
> > On Sun, Feb 21, 2021 at 3:22 PM Alexandre Ghiti wrote:
> > >
> > > At early boot stage, we have a whole PGDIR to map the kernel, so there
> > > is no need to restrict the early mapping size to 128MB. Removing this
> > > define also allows us to simplify some compile time logic.
> > >
> > > This fixes large kernel mappings with a size greater than 128MB, as it
> > > is the case for syzbot kernels whose size was just ~130MB.
> > >
> > > Note that on rv64, for now, we are then limited to PGDIR size for early
> > > mapping as we can't use PGD mappings (see [1]). That should be enough
> > > given the relative small size of syzbot kernels compared to PGDIR_SIZE
> > > which is 1GB.
> > >
> > > [1] https://lore.kernel.org/lkml/20200603153608.30056-1-a...@ghiti.fr/
> >
> > I've applied this patch to (as it contains the HEAD fix):
> >
> > commit f49815047c1a3e3644a0ba38f3825c5cde8a0922 (HEAD, riscv/for-next)
> > Author: Tobias Klauser
> > Date:   Tue Feb 16 18:33:05 2021 +0100
> >
> >     riscv: Disable KSAN_SANITIZE for vDSO
> >
> > and the kernel started booting with my large config.
> > It quickly crashed (see below), but at least it started booting, so
> > it's an improvement.
> >
> > Tested-by: Dmitry Vyukov
>
> Thanks for that.
>
> > Linux version 5.11.0-rc2-00069-gf49815047c1a-dirty (dvyu...@dvyukov-desk.muc.corp.google.com) (riscv64-linux-gnu-gcc (Debian 10.2.1-6+build1) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1) #34 SMP PREEMPT Sun Feb 21 15:51:40 CET 2021
> > OF: fdt: Ignoring memory range 0x8000 - 0x8020
> > Machine model: riscv-virtio,qemu
> > earlycon: ns16550a0 at MMIO 0x1000 (options '')
> > printk: bootconsole [ns16550a0] enabled
> > efi: UEFI not found.
> > cma: Reserved 16 MiB at 0xfec0
> > Zone ranges:
> >   DMA32 [mem 0x8020-0x]
> >   Normal empty
> > Movable zone start for each node
> > Early memory node ranges
> >   node 0: [mem 0x8020-0x]
> > Zeroed struct page in unavailable ranges: 512 pages
> > Initmem setup node 0 [mem 0x8020-0x]
> > SBI specification v0.2 detected
> > SBI implementation ID=0x1 Version=0x8
> > SBI v0.2 TIME extension detected
> > SBI v0.2 IPI extension detected
> > SBI v0.2 RFENCE extension detected
> > software IO TLB: mapped [mem 0xf7c0-0xfbc0] (64MB)
> > [ cut here ]
> > DEBUG_LOCKS_WARN_ON(early_boot_irqs_disabled)
> > WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:4085 lockdep_hardirqs_on_prepare+0x384/0x388 kernel/locking/lockdep.c:4085
> > Modules linked in:
> > CPU: 0 PID: 0 Comm: swapper Not tainted 5.11.0-rc2-00069-gf49815047c1a-dirty #34
> > Hardware name: riscv-virtio,qemu (DT)
> > epc : lockdep_hardirqs_on_prepare+0x384/0x388 kernel/locking/lockdep.c:4085
> >  ra : lockdep_hardirqs_on_prepare+0x384/0x388 kernel/locking/lockdep.c:4085
> > epc : ffec125a ra : ffec125a sp : ffe006603ce0
> >  gp : ffe006c338f0 tp : ffe006689e00 t0 : ffe00669a9a8
> >  t1 : ffc400cc0738 t2 : s0 : ffe006603d20
> >  s1 : ffe006689e00 a0 : 002d a1 : 000f
> >  a2 : 0002 a3 : ffed2718 a4 :
> >  a5 : a6 : 00f0 a7 : ffe0066039c7
> >  s2 : ffe004a337c0 s3 : ffe0076fa1b8 s4 :
> >  s5 : ffe006689e00 s6 : 0001 s7 : ffe07fcfc000
> >  s8 : ffe07fcfd000 s9 : ffe006c3c0d0 s10: f000
> >  s11: ffe004a1fbb8 t3 : 2d2d2d2d t4 : ffc400cc0737
> >  t5 : ffc400cc0739 t6 : ffe0066039c8
> > status: 0100 badaddr: cause: 0003
> > Call Trace:
> > [] lockdep_hardirqs_on_prepare+0x384/0x388 kernel/locking/lockdep.c:4085
> > [] trace_hardirqs_on+0x116/0x174 kernel/trace/trace_preemptirq.c:49
> > [] _save_context+0xa2/0xe2
> > [] local_flush_tlb_all arch/riscv/include/asm/tlbflush.h:16 [inline]
> > [] populate arch/riscv/mm/kasan_init.c:95 [inline]
> > [] kasan_init+0x23e/0x31a arch/riscv/mm/kasan_init.c:157
> > irq event stamp: 0
> > hardirqs last enabled at (0): [<>] 0x0
> > hardirqs last disabled at (0): [<>] 0x0
> > softirqs last enabled at (0): [<>] 0x0
> > softirqs last disabled at (0): [<>] 0x0
> > random: get_random_bytes called from init_oops_id kernel/panic.c:546 [inline] with crng_init=0
> > random: get_random_bytes called from init_oops_id kernel/panic.c:543 [inline] with crng_init=0
> > random: get_random_bytes called from print_oops_end_marker kernel/panic.c:556 [inline] with crng_init=0
> > random: get_random_bytes called from __warn+0x1be/0x20a kernel/panic.c:613 with crng_init=0
> > ---[ end trace ]---
> > Unable to handle kernel paging request at virtual address dfc81004
> > Oops [#1]
> > Modules linked in:
> > CPU: 0 PID: 0 Comm: swapper Tainted: G W 5.11.0-rc2-00069-gf49815047c1a-dirty #34
> > Hardware name: riscv-virtio,qemu (DT)
> > epc : __memset+0x60/0xfc arch/riscv/lib/memset.S:67
> >  ra : populate arch/riscv/mm/kasan_init.c:96 [inline]
> >  ra : kasan_init+0x256/0x31a
Re: [PATCH] riscv: Get rid of MAX_EARLY_MAPPING_SIZE
Hi Dmitry,

On 2/21/21 at 10:38 AM, Dmitry Vyukov wrote:
> On Sun, Feb 21, 2021 at 3:22 PM Alexandre Ghiti wrote:
> >
> > At early boot stage, we have a whole PGDIR to map the kernel, so there
> > is no need to restrict the early mapping size to 128MB. Removing this
> > define also allows us to simplify some compile time logic.
> >
> > This fixes large kernel mappings with a size greater than 128MB, as it
> > is the case for syzbot kernels whose size was just ~130MB.
> >
> > Note that on rv64, for now, we are then limited to PGDIR size for early
> > mapping as we can't use PGD mappings (see [1]). That should be enough
> > given the relative small size of syzbot kernels compared to PGDIR_SIZE
> > which is 1GB.
> >
> > [1] https://lore.kernel.org/lkml/20200603153608.30056-1-a...@ghiti.fr/
>
> I've applied this patch to (as it contains the HEAD fix):
>
> commit f49815047c1a3e3644a0ba38f3825c5cde8a0922 (HEAD, riscv/for-next)
> Author: Tobias Klauser
> Date:   Tue Feb 16 18:33:05 2021 +0100
>
>     riscv: Disable KSAN_SANITIZE for vDSO
>
> and the kernel started booting with my large config.
> It quickly crashed (see below), but at least it started booting, so
> it's an improvement.
>
> Tested-by: Dmitry Vyukov

Thanks for that.

> Linux version 5.11.0-rc2-00069-gf49815047c1a-dirty (dvyu...@dvyukov-desk.muc.corp.google.com) (riscv64-linux-gnu-gcc (Debian 10.2.1-6+build1) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1) #34 SMP PREEMPT Sun Feb 21 15:51:40 CET 2021
> OF: fdt: Ignoring memory range 0x8000 - 0x8020
> Machine model: riscv-virtio,qemu
> earlycon: ns16550a0 at MMIO 0x1000 (options '')
> printk: bootconsole [ns16550a0] enabled
> efi: UEFI not found.
> cma: Reserved 16 MiB at 0xfec0
> Zone ranges:
>   DMA32 [mem 0x8020-0x]
>   Normal empty
> Movable zone start for each node
> Early memory node ranges
>   node 0: [mem 0x8020-0x]
> Zeroed struct page in unavailable ranges: 512 pages
> Initmem setup node 0 [mem 0x8020-0x]
> SBI specification v0.2 detected
> SBI implementation ID=0x1 Version=0x8
> SBI v0.2 TIME extension detected
> SBI v0.2 IPI extension detected
> SBI v0.2 RFENCE extension detected
> software IO TLB: mapped [mem 0xf7c0-0xfbc0] (64MB)
> [ cut here ]
> DEBUG_LOCKS_WARN_ON(early_boot_irqs_disabled)
> WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:4085 lockdep_hardirqs_on_prepare+0x384/0x388 kernel/locking/lockdep.c:4085
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted 5.11.0-rc2-00069-gf49815047c1a-dirty #34
> Hardware name: riscv-virtio,qemu (DT)
> epc : lockdep_hardirqs_on_prepare+0x384/0x388 kernel/locking/lockdep.c:4085
>  ra : lockdep_hardirqs_on_prepare+0x384/0x388 kernel/locking/lockdep.c:4085
> epc : ffec125a ra : ffec125a sp : ffe006603ce0
>  gp : ffe006c338f0 tp : ffe006689e00 t0 : ffe00669a9a8
>  t1 : ffc400cc0738 t2 : s0 : ffe006603d20
>  s1 : ffe006689e00 a0 : 002d a1 : 000f
>  a2 : 0002 a3 : ffed2718 a4 :
>  a5 : a6 : 00f0 a7 : ffe0066039c7
>  s2 : ffe004a337c0 s3 : ffe0076fa1b8 s4 :
>  s5 : ffe006689e00 s6 : 0001 s7 : ffe07fcfc000
>  s8 : ffe07fcfd000 s9 : ffe006c3c0d0 s10: f000
>  s11: ffe004a1fbb8 t3 : 2d2d2d2d t4 : ffc400cc0737
>  t5 : ffc400cc0739 t6 : ffe0066039c8
> status: 0100 badaddr: cause: 0003
> Call Trace:
> [] lockdep_hardirqs_on_prepare+0x384/0x388 kernel/locking/lockdep.c:4085
> [] trace_hardirqs_on+0x116/0x174 kernel/trace/trace_preemptirq.c:49
> [] _save_context+0xa2/0xe2
> [] local_flush_tlb_all arch/riscv/include/asm/tlbflush.h:16 [inline]
> [] populate arch/riscv/mm/kasan_init.c:95 [inline]
> [] kasan_init+0x23e/0x31a arch/riscv/mm/kasan_init.c:157
> irq event stamp: 0
> hardirqs last enabled at (0): [<>] 0x0
> hardirqs last disabled at (0): [<>] 0x0
> softirqs last enabled at (0): [<>] 0x0
> softirqs last disabled at (0): [<>] 0x0
> random: get_random_bytes called from init_oops_id kernel/panic.c:546 [inline] with crng_init=0
> random: get_random_bytes called from init_oops_id kernel/panic.c:543 [inline] with crng_init=0
> random: get_random_bytes called from print_oops_end_marker kernel/panic.c:556 [inline] with crng_init=0
> random: get_random_bytes called from __warn+0x1be/0x20a kernel/panic.c:613 with crng_init=0
> ---[ end trace ]---
> Unable to handle kernel paging request at virtual address dfc81004
> Oops [#1]
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Tainted: G W 5.11.0-rc2-00069-gf49815047c1a-dirty #34
> Hardware name: riscv-virtio,qemu (DT)
> epc : __memset+0x60/0xfc arch/riscv/lib/memset.S:67
>  ra : populate arch/riscv/mm/kasan_init.c:96 [inline]
>  ra : kasan_init+0x256/0x31a arch/riscv/mm/kasan_init.c:157
> epc : ffe001791cf0 ra :
Re: [PATCH] riscv: Get rid of MAX_EARLY_MAPPING_SIZE
On Sun, Feb 21, 2021 at 3:22 PM Alexandre Ghiti wrote:
>
> At early boot stage, we have a whole PGDIR to map the kernel, so there
> is no need to restrict the early mapping size to 128MB. Removing this
> define also allows us to simplify some compile time logic.
>
> This fixes large kernel mappings with a size greater than 128MB, as it
> is the case for syzbot kernels whose size was just ~130MB.
>
> Note that on rv64, for now, we are then limited to PGDIR size for early
> mapping as we can't use PGD mappings (see [1]). That should be enough
> given the relative small size of syzbot kernels compared to PGDIR_SIZE
> which is 1GB.
>
> [1] https://lore.kernel.org/lkml/20200603153608.30056-1-a...@ghiti.fr/

I've applied this patch to (as it contains the HEAD fix):

commit f49815047c1a3e3644a0ba38f3825c5cde8a0922 (HEAD, riscv/for-next)
Author: Tobias Klauser
Date:   Tue Feb 16 18:33:05 2021 +0100

    riscv: Disable KSAN_SANITIZE for vDSO

and the kernel started booting with my large config.
It quickly crashed (see below), but at least it started booting, so
it's an improvement.

Tested-by: Dmitry Vyukov

Linux version 5.11.0-rc2-00069-gf49815047c1a-dirty (dvyu...@dvyukov-desk.muc.corp.google.com) (riscv64-linux-gnu-gcc (Debian 10.2.1-6+build1) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1) #34 SMP PREEMPT Sun Feb 21 15:51:40 CET 2021
OF: fdt: Ignoring memory range 0x8000 - 0x8020
Machine model: riscv-virtio,qemu
earlycon: ns16550a0 at MMIO 0x1000 (options '')
printk: bootconsole [ns16550a0] enabled
efi: UEFI not found.
cma: Reserved 16 MiB at 0xfec0
Zone ranges:
  DMA32 [mem 0x8020-0x]
  Normal empty
Movable zone start for each node
Early memory node ranges
  node 0: [mem 0x8020-0x]
Zeroed struct page in unavailable ranges: 512 pages
Initmem setup node 0 [mem 0x8020-0x]
SBI specification v0.2 detected
SBI implementation ID=0x1 Version=0x8
SBI v0.2 TIME extension detected
SBI v0.2 IPI extension detected
SBI v0.2 RFENCE extension detected
software IO TLB: mapped [mem 0xf7c0-0xfbc0] (64MB)
[ cut here ]
DEBUG_LOCKS_WARN_ON(early_boot_irqs_disabled)
WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:4085 lockdep_hardirqs_on_prepare+0x384/0x388 kernel/locking/lockdep.c:4085
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.11.0-rc2-00069-gf49815047c1a-dirty #34
Hardware name: riscv-virtio,qemu (DT)
epc : lockdep_hardirqs_on_prepare+0x384/0x388 kernel/locking/lockdep.c:4085
 ra : lockdep_hardirqs_on_prepare+0x384/0x388 kernel/locking/lockdep.c:4085
epc : ffec125a ra : ffec125a sp : ffe006603ce0
 gp : ffe006c338f0 tp : ffe006689e00 t0 : ffe00669a9a8
 t1 : ffc400cc0738 t2 : s0 : ffe006603d20
 s1 : ffe006689e00 a0 : 002d a1 : 000f
 a2 : 0002 a3 : ffed2718 a4 :
 a5 : a6 : 00f0 a7 : ffe0066039c7
 s2 : ffe004a337c0 s3 : ffe0076fa1b8 s4 :
 s5 : ffe006689e00 s6 : 0001 s7 : ffe07fcfc000
 s8 : ffe07fcfd000 s9 : ffe006c3c0d0 s10: f000
 s11: ffe004a1fbb8 t3 : 2d2d2d2d t4 : ffc400cc0737
 t5 : ffc400cc0739 t6 : ffe0066039c8
status: 0100 badaddr: cause: 0003
Call Trace:
[] lockdep_hardirqs_on_prepare+0x384/0x388 kernel/locking/lockdep.c:4085
[] trace_hardirqs_on+0x116/0x174 kernel/trace/trace_preemptirq.c:49
[] _save_context+0xa2/0xe2
[] local_flush_tlb_all arch/riscv/include/asm/tlbflush.h:16 [inline]
[] populate arch/riscv/mm/kasan_init.c:95 [inline]
[] kasan_init+0x23e/0x31a arch/riscv/mm/kasan_init.c:157
irq event stamp: 0
hardirqs last enabled at (0): [<>] 0x0
hardirqs last disabled at (0): [<>] 0x0
softirqs last enabled at (0): [<>] 0x0
softirqs last disabled at (0): [<>] 0x0
random: get_random_bytes called from init_oops_id kernel/panic.c:546 [inline] with crng_init=0
random: get_random_bytes called from init_oops_id kernel/panic.c:543 [inline] with crng_init=0
random: get_random_bytes called from print_oops_end_marker kernel/panic.c:556 [inline] with crng_init=0
random: get_random_bytes called from __warn+0x1be/0x20a kernel/panic.c:613 with crng_init=0
---[ end trace ]---
Unable to handle kernel paging request at virtual address dfc81004
Oops [#1]
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Tainted: G W 5.11.0-rc2-00069-gf49815047c1a-dirty #34
Hardware name: riscv-virtio,qemu (DT)
epc : __memset+0x60/0xfc arch/riscv/lib/memset.S:67
 ra : populate arch/riscv/mm/kasan_init.c:96 [inline]
 ra : kasan_init+0x256/0x31a arch/riscv/mm/kasan_init.c:157
epc : ffe001791cf0 ra : ffe004807920 sp : ffe006603e80
 gp : ffe006c338f0 tp : ffe006689e00
[PATCH] riscv: Get rid of MAX_EARLY_MAPPING_SIZE
At early boot stage, we have a whole PGDIR to map the kernel, so there
is no need to restrict the early mapping size to 128MB. Removing this
define also allows us to simplify some compile time logic.

This fixes large kernel mappings with a size greater than 128MB, as it
is the case for syzbot kernels whose size was just ~130MB.

Note that on rv64, for now, we are then limited to PGDIR size for early
mapping as we can't use PGD mappings (see [1]). That should be enough
given the relative small size of syzbot kernels compared to PGDIR_SIZE
which is 1GB.

[1] https://lore.kernel.org/lkml/20200603153608.30056-1-a...@ghiti.fr/

Reported-by: Dmitry Vyukov
Signed-off-by: Alexandre Ghiti
---
 arch/riscv/mm/init.c | 21 +++++----------------
 1 file changed, 5 insertions(+), 16 deletions(-)

diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index f9f9568d689e..f81f813b9603 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -226,8 +226,6 @@ pgd_t swapper_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
 pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
 pte_t fixmap_pte[PTRS_PER_PTE] __page_aligned_bss;
 
-#define MAX_EARLY_MAPPING_SIZE	SZ_128M
-
 pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
 
 void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot)
@@ -302,13 +300,7 @@ static void __init create_pte_mapping(pte_t *ptep,
 
 pmd_t trampoline_pmd[PTRS_PER_PMD] __page_aligned_bss;
 pmd_t fixmap_pmd[PTRS_PER_PMD] __page_aligned_bss;
-
-#if MAX_EARLY_MAPPING_SIZE < PGDIR_SIZE
-#define NUM_EARLY_PMDS		1UL
-#else
-#define NUM_EARLY_PMDS		(1UL + MAX_EARLY_MAPPING_SIZE / PGDIR_SIZE)
-#endif
-pmd_t early_pmd[PTRS_PER_PMD * NUM_EARLY_PMDS] __initdata __aligned(PAGE_SIZE);
+pmd_t early_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE);
 pmd_t early_dtb_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE);
 
 static pmd_t *__init get_pmd_virt_early(phys_addr_t pa)
@@ -330,11 +322,9 @@ static pmd_t *get_pmd_virt_late(phys_addr_t pa)
 
 static phys_addr_t __init alloc_pmd_early(uintptr_t va)
 {
-	uintptr_t pmd_num;
+	BUG_ON((va - PAGE_OFFSET) >> PGDIR_SHIFT);
 
-	pmd_num = (va - PAGE_OFFSET) >> PGDIR_SHIFT;
-	BUG_ON(pmd_num >= NUM_EARLY_PMDS);
-	return (uintptr_t)&early_pmd[pmd_num * PTRS_PER_PMD];
+	return (uintptr_t)early_pmd;
 }
 
 static phys_addr_t __init alloc_pmd_fixmap(uintptr_t va)
@@ -452,7 +442,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 	uintptr_t va, pa, end_va;
 	uintptr_t load_pa = (uintptr_t)(&_start);
 	uintptr_t load_sz = (uintptr_t)(&_end) - load_pa;
-	uintptr_t map_size = best_map_size(load_pa, MAX_EARLY_MAPPING_SIZE);
+	uintptr_t map_size;
 #ifndef __PAGETABLE_PMD_FOLDED
 	pmd_t fix_bmap_spmd, fix_bmap_epmd;
 #endif
@@ -464,12 +454,11 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 	 * Enforce boot alignment requirements of RV32 and
 	 * RV64 by only allowing PMD or PGD mappings.
 	 */
-	BUG_ON(map_size == PAGE_SIZE);
+	map_size = PMD_SIZE;
 
 	/* Sanity check alignment and size */
 	BUG_ON((PAGE_OFFSET % PGDIR_SIZE) != 0);
 	BUG_ON((load_pa % map_size) != 0);
-	BUG_ON(load_sz > MAX_EARLY_MAPPING_SIZE);
 
 	pt_ops.alloc_pte = alloc_pte_early;
 	pt_ops.get_pte_virt = get_pte_virt_early;
-- 
2.20.1
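
The size argument in the changelog can be checked with a little arithmetic. The following is only a user-space sketch, not kernel code: it assumes the rv64 Sv39 layout (2MB PMD mappings, the 1GB PGDIR_SIZE mentioned above), and the helper names pmd_entries_needed() and outside_first_pgdir() are made up for illustration. The second helper mirrors the condition that the patched alloc_pmd_early() passes to BUG_ON().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed Sv39 (rv64) geometry; these values are not taken from the patch. */
#define PMD_SHIFT    21                       /* one PMD entry maps 2MB */
#define PGDIR_SHIFT  30                       /* one PGD entry maps 1GB */
#define PMD_SIZE     (1ULL << PMD_SHIFT)
#define PGDIR_SIZE   (1ULL << PGDIR_SHIFT)
#define PTRS_PER_PMD (1ULL << (PGDIR_SHIFT - PMD_SHIFT))

/*
 * Hypothetical helper: number of PMD entries needed to map load_sz
 * bytes using PMD_SIZE mappings (rounded up).
 */
uint64_t pmd_entries_needed(uint64_t load_sz)
{
	return (load_sz + PMD_SIZE - 1) / PMD_SIZE;
}

/*
 * Hypothetical helper mirroring the BUG_ON() condition in the patched
 * alloc_pmd_early(): non-zero when va lies beyond the first PGDIR_SIZE
 * bytes above page_offset, i.e. outside what the single early_pmd
 * table can describe.
 */
int outside_first_pgdir(uint64_t va, uint64_t page_offset)
{
	return ((va - page_offset) >> PGDIR_SHIFT) != 0;
}
```

Under these assumptions a ~130MB image needs 65 PMD entries, while a single PMD table holds 512, so one early_pmd table is enough for any kernel up to PGDIR_SIZE.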