Re: [PATCHv7 4/4] powerpc/setup: alloc extra paca_ptrs to hold boot_cpuid
On Wed, Oct 4, 2023 at 2:07 AM Mahesh J Salgaonkar wrote: > > On 2023-09-25 15:53:48 Mon, Pingfan Liu wrote: > > paca_ptrs should be large enough to hold the boot_cpuid, hence, its > > lower boundary is set to the bigger one between boot_cpuid+1 and > > nr_cpus. > > > > On the other hand, some kernel component: -1. the timer assumes cpu0 > > online since the timer_list->flags subfield 'TIMER_CPUMASK' is zero if > > not initialized to a proper present cpu. -2. power9_idle_stop() assumes > > the primary thread's paca is allocated. > > > > Hence lift nr_cpu_ids from one to two to ensure cpu0 is onlined, if the > > boot cpu is not cpu0. > > > > Result: > > When nr_cpus=1, taskset -c 14 bash -c 'echo c > /proc/sysrq-trigger' > > the kdump kernel brings up two cpus. > > While when taskset -c 4 bash -c 'echo c > /proc/sysrq-trigger', > > the kdump kernel brings up one cpu. > > I tried your changes on power9 and power10 systems. However, on power10 lpar I > see bellow backtrace in kdump kernel bootup with nr_cpus=1. > Thanks for the testing. I have only tried this series on Power9 bare metal. I think the bug is related with the code snippet in update_mask_from_threadgroup() for (i = first_thread; i < first_thread + threads_per_core; i++) { int i_group_start = get_cpu_thread_group_start(i, tg); ^^^ Here it iterates over each thread in the core, but some of them are not online. I will try to bring up a remedy. Thanks, Pingfan > $ taskset -c 4 bash -c 'echo c > /proc/sysrq-trigger' > [...] > [0.00] Hardware name: IBM,9105-22A POWER10 (raw) 0x800200 0xf06 > of:IBM,FW1040.00 (NL1040_005) hv:phyp pSeries > [0.00] printk: bootconsole [udbg0] enabled > [0.00] the round shift between dt seq and the cpu logic number: 8 > [0.00] Partition configured for 16 cpus, operating system maximum is > 2. > [0.00] CPU maps initialized for 8 threads per core > [...] > [0.002249] BUG: Unable to handle kernel data access at 0x88c0 > [0.002260] Faulting instruction address: 0xc0001201226c > [0.002268] Oops: Kernel access of bad area, sig: 11 [#1] > [0.002274] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries > [0.002282] Modules linked in: > [0.002288] CPU: 4 PID: 1 Comm: swapper/4 Not tainted 6.6.0-rc4 #1 > [0.002296] Hardware name: IBM,9105-22A POWER10 (raw) 0x800200 0xf06 > of:IBM,FW1040.00 (NL1040_005) hv:phyp pSeries > [0.002305] NIP: c0001201226c LR: c00012012234 CTR: > 0004 > [0.002312] REGS: c000167ff8f0 TRAP: 0380 Not tainted (6.6.0-rc4) > [0.002321] MSR: 82009033 CR: > 24000844 XER: 000a > [0.002346] CFAR: c0001201231c IRQMASK: 0 > [0.002346] GPR00: c00012012234 c000167ffb90 c00011b61900 > 0002 > [0.002346] GPR04: 0001 0001 > c0004ffeff80 > [0.002346] GPR08: 0002 > > [0.002346] GPR12: c00013141000 c00010011058 > > [0.002346] GPR16: > > [0.002346] GPR20: 0028 c00012170968 c000120a3e80 > 0016 > [0.002346] GPR24: c0004ffdcfd0 c00012b82058 > > [0.002346] GPR28: c0004fc80a68 c00012bf0350 c000120a3e2c > > [0.002426] NIP [c0001201226c] update_mask_from_threadgroup+0x98/0x174 > [0.002437] LR [c00012012234] update_mask_from_threadgroup+0x60/0x174 > [0.002444] Call Trace: > [0.002451] [c000167ffb90] [c00012012234] > update_mask_from_threadgroup+0x60/0x174 (unreliable) > [0.002464] [c000167ffbe0] [c000120125f8] > init_thread_group_cache_map+0x2b0/0x328 > [0.002477] [c000167ffc50] [c0001201296c] > smp_prepare_cpus+0x2fc/0x4f0 > [0.002497] [c000167ffd10] [c00012004e40] > kernel_init_freeable+0x198/0x3cc > [0.002509] [c000167ffde0] [c00010011084] kernel_init+0x34/0x1b0 > [0.002531] [c000167ffe50] [c0001000dd3c] > ret_from_kernel_user_thread+0x14/0x1c > [0.002547] --- interrupt: 0 at 0x0 > [0.002553] NIP: LR: CTR: > > [0.002563] REGS: c000167ffe80 TRAP: Not tainted (6.6.0-rc4) > [0.002569] MSR: <> CR: XER: > [0.002576] CFAR: IRQMASK: 0 > [0.002576] GPR00: > > [0.002576] GPR04: > > [0.002576] GPR08: > > [0.002576] GPR12: > > [0.002
Re: [PATCHv7 4/4] powerpc/setup: alloc extra paca_ptrs to hold boot_cpuid
On 2023-09-25 15:53:48 Mon, Pingfan Liu wrote: > paca_ptrs should be large enough to hold the boot_cpuid, hence, its > lower boundary is set to the bigger one between boot_cpuid+1 and > nr_cpus. > > On the other hand, some kernel component: -1. the timer assumes cpu0 > online since the timer_list->flags subfield 'TIMER_CPUMASK' is zero if > not initialized to a proper present cpu. -2. power9_idle_stop() assumes > the primary thread's paca is allocated. > > Hence lift nr_cpu_ids from one to two to ensure cpu0 is onlined, if the > boot cpu is not cpu0. > > Result: > When nr_cpus=1, taskset -c 14 bash -c 'echo c > /proc/sysrq-trigger' > the kdump kernel brings up two cpus. > While when taskset -c 4 bash -c 'echo c > /proc/sysrq-trigger', > the kdump kernel brings up one cpu. I tried your changes on power9 and power10 systems. However, on power10 lpar I see bellow backtrace in kdump kernel bootup with nr_cpus=1. $ taskset -c 4 bash -c 'echo c > /proc/sysrq-trigger' [...] [0.00] Hardware name: IBM,9105-22A POWER10 (raw) 0x800200 0xf06 of:IBM,FW1040.00 (NL1040_005) hv:phyp pSeries [0.00] printk: bootconsole [udbg0] enabled [0.00] the round shift between dt seq and the cpu logic number: 8 [0.00] Partition configured for 16 cpus, operating system maximum is 2. [0.00] CPU maps initialized for 8 threads per core [...] [0.002249] BUG: Unable to handle kernel data access at 0x88c0 [0.002260] Faulting instruction address: 0xc0001201226c [0.002268] Oops: Kernel access of bad area, sig: 11 [#1] [0.002274] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries [0.002282] Modules linked in: [0.002288] CPU: 4 PID: 1 Comm: swapper/4 Not tainted 6.6.0-rc4 #1 [0.002296] Hardware name: IBM,9105-22A POWER10 (raw) 0x800200 0xf06 of:IBM,FW1040.00 (NL1040_005) hv:phyp pSeries [0.002305] NIP: c0001201226c LR: c00012012234 CTR: 0004 [0.002312] REGS: c000167ff8f0 TRAP: 0380 Not tainted (6.6.0-rc4) [0.002321] MSR: 82009033 CR: 24000844 XER: 000a [0.002346] CFAR: c0001201231c IRQMASK: 0 [0.002346] GPR00: c00012012234 c000167ffb90 c00011b61900 0002 [0.002346] GPR04: 0001 0001 c0004ffeff80 [0.002346] GPR08: 0002 [0.002346] GPR12: c00013141000 c00010011058 [0.002346] GPR16: [0.002346] GPR20: 0028 c00012170968 c000120a3e80 0016 [0.002346] GPR24: c0004ffdcfd0 c00012b82058 [0.002346] GPR28: c0004fc80a68 c00012bf0350 c000120a3e2c [0.002426] NIP [c0001201226c] update_mask_from_threadgroup+0x98/0x174 [0.002437] LR [c00012012234] update_mask_from_threadgroup+0x60/0x174 [0.002444] Call Trace: [0.002451] [c000167ffb90] [c00012012234] update_mask_from_threadgroup+0x60/0x174 (unreliable) [0.002464] [c000167ffbe0] [c000120125f8] init_thread_group_cache_map+0x2b0/0x328 [0.002477] [c000167ffc50] [c0001201296c] smp_prepare_cpus+0x2fc/0x4f0 [0.002497] [c000167ffd10] [c00012004e40] kernel_init_freeable+0x198/0x3cc [0.002509] [c000167ffde0] [c00010011084] kernel_init+0x34/0x1b0 [0.002531] [c000167ffe50] [c0001000dd3c] ret_from_kernel_user_thread+0x14/0x1c [0.002547] --- interrupt: 0 at 0x0 [0.002553] NIP: LR: CTR: [0.002563] REGS: c000167ffe80 TRAP: Not tainted (6.6.0-rc4) [0.002569] MSR: <> CR: XER: [0.002576] CFAR: IRQMASK: 0 [0.002576] GPR00: [0.002576] GPR04: [0.002576] GPR08: [0.002576] GPR12: [0.002576] GPR16: [0.002576] GPR20: [0.002576] GPR24: [0.002576] GPR28: [0.002671] NIP [] 0x0 [0.002680] LR [] 0x0 [0.002689] --- interrupt: 0 [0.002697] Code: 7feafb78 813d 7d29fa14 7f895000 409d00d4 3ce20102 38e74758 79491f24 e87e0006 3900 e8e7 7d27482a 7f834000 79090020 419e005c [0.002727] ---[ end trace ]--- [0.002739] [1.0
[PATCHv7 4/4] powerpc/setup: alloc extra paca_ptrs to hold boot_cpuid
paca_ptrs should be large enough to hold the boot_cpuid, hence, its lower boundary is set to the bigger one between boot_cpuid+1 and nr_cpus. On the other hand, some kernel component: -1. the timer assumes cpu0 online since the timer_list->flags subfield 'TIMER_CPUMASK' is zero if not initialized to a proper present cpu. -2. power9_idle_stop() assumes the primary thread's paca is allocated. Hence lift nr_cpu_ids from one to two to ensure cpu0 is onlined, if the boot cpu is not cpu0. Result: When nr_cpus=1, taskset -c 14 bash -c 'echo c > /proc/sysrq-trigger' the kdump kernel brings up two cpus. While when taskset -c 4 bash -c 'echo c > /proc/sysrq-trigger', the kdump kernel brings up one cpu. Signed-off-by: Pingfan Liu Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Christophe Leroy Cc: Mahesh Salgaonkar Cc: Wen Xiong Cc: Baoquan He Cc: Ming Lei Cc: ke...@lists.infradead.org To: linuxppc-dev@lists.ozlabs.org --- arch/powerpc/kernel/paca.c | 10 ++ arch/powerpc/kernel/prom.c | 9 ++--- 2 files changed, 12 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c index cda4e00b67c1..91e2401de1bd 100644 --- a/arch/powerpc/kernel/paca.c +++ b/arch/powerpc/kernel/paca.c @@ -242,9 +242,10 @@ static int __initdata paca_struct_size; void __init allocate_paca_ptrs(void) { - paca_nr_cpu_ids = nr_cpu_ids; + int n = (boot_cpuid + 1) > nr_cpu_ids ? (boot_cpuid + 1) : nr_cpu_ids; - paca_ptrs_size = sizeof(struct paca_struct *) * nr_cpu_ids; + paca_nr_cpu_ids = n; + paca_ptrs_size = sizeof(struct paca_struct *) * n; paca_ptrs = memblock_alloc_raw(paca_ptrs_size, SMP_CACHE_BYTES); if (!paca_ptrs) panic("Failed to allocate %d bytes for paca pointers\n", @@ -287,13 +288,14 @@ void __init allocate_paca(int cpu) void __init free_unused_pacas(void) { int new_ptrs_size; + int n = (boot_cpuid + 1) > nr_cpu_ids ? (boot_cpuid + 1) : nr_cpu_ids; - new_ptrs_size = sizeof(struct paca_struct *) * nr_cpu_ids; + new_ptrs_size = sizeof(struct paca_struct *) * n; if (new_ptrs_size < paca_ptrs_size) memblock_phys_free(__pa(paca_ptrs) + new_ptrs_size, paca_ptrs_size - new_ptrs_size); - paca_nr_cpu_ids = nr_cpu_ids; + paca_nr_cpu_ids = n; paca_ptrs_size = new_ptrs_size; #ifdef CONFIG_PPC_64S_HASH_MMU diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c index 87272a2d8c10..15c994f54bf9 100644 --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -362,9 +362,12 @@ static int __init early_init_dt_scan_cpus(unsigned long node, */ boot_cpuid = i; found = true; - /* This works around the hole in paca_ptrs[]. */ - if (nr_cpu_ids < nthreads) - set_nr_cpu_ids(nthreads); + /* +* Ideally, nr_cpus=1 can be achieved if each kernel +* component does not assume cpu0 is onlined. +*/ + if (boot_cpuid != 0 && nr_cpu_ids < 2) + set_nr_cpu_ids(2); } #ifdef CONFIG_SMP /* logical cpu id is always 0 on UP kernels */ -- 2.31.1