Re: [PATCHv7 4/4] powerpc/setup: alloc extra paca_ptrs to hold boot_cpuid

2023-10-06 Thread Pingfan Liu
On Wed, Oct 4, 2023 at 2:07 AM Mahesh J Salgaonkar  wrote:
>
> On 2023-09-25 15:53:48 Mon, Pingfan Liu wrote:
> > paca_ptrs should be large enough to hold the boot_cpuid, hence, its
> > lower boundary is set to the bigger one between boot_cpuid+1 and
> > nr_cpus.
> >
> > On the other hand, some kernel component: -1. the timer assumes cpu0
> > online since the timer_list->flags subfield 'TIMER_CPUMASK' is zero if
> > not initialized to a proper present cpu.  -2. power9_idle_stop() assumes
> > the primary thread's paca is allocated.
> >
> > Hence lift nr_cpu_ids from one to two to ensure cpu0 is onlined, if the
> > boot cpu is not cpu0.
> >
> > Result:
> > When nr_cpus=1, taskset -c 14 bash -c 'echo c > /proc/sysrq-trigger'
> > the kdump kernel brings up two cpus.
> > While when taskset -c 4 bash -c 'echo c > /proc/sysrq-trigger',
> > the kdump kernel brings up one cpu.
>
> I tried your changes on power9 and power10 systems. However, on power10 lpar I
> see bellow backtrace in kdump kernel bootup with nr_cpus=1.
>

Thanks for the testing. I have only tried this series on Power9 bare
metal.  I think the bug is related with the code snippet in
update_mask_from_threadgroup()
  for (i = first_thread; i < first_thread + threads_per_core; i++) {
int i_group_start = get_cpu_thread_group_start(i, tg);
  ^^^

Here it iterates over each thread in the core, but some of them are not online.

I will try to bring up a remedy.

Thanks,

Pingfan


> $ taskset -c 4 bash -c 'echo c > /proc/sysrq-trigger'
> [...]
> [0.00] Hardware name: IBM,9105-22A POWER10 (raw) 0x800200 0xf06 
> of:IBM,FW1040.00 (NL1040_005) hv:phyp pSeries
> [0.00] printk: bootconsole [udbg0] enabled
> [0.00] the round shift between dt seq and the cpu logic number: 8
> [0.00] Partition configured for 16 cpus, operating system maximum is 
> 2.
> [0.00] CPU maps initialized for 8 threads per core
> [...]
> [0.002249] BUG: Unable to handle kernel data access at 0x88c0
> [0.002260] Faulting instruction address: 0xc0001201226c
> [0.002268] Oops: Kernel access of bad area, sig: 11 [#1]
> [0.002274] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> [0.002282] Modules linked in:
> [0.002288] CPU: 4 PID: 1 Comm: swapper/4 Not tainted 6.6.0-rc4 #1
> [0.002296] Hardware name: IBM,9105-22A POWER10 (raw) 0x800200 0xf06 
> of:IBM,FW1040.00 (NL1040_005) hv:phyp pSeries
> [0.002305] NIP:  c0001201226c LR: c00012012234 CTR: 
> 0004
> [0.002312] REGS: c000167ff8f0 TRAP: 0380   Not tainted  (6.6.0-rc4)
> [0.002321] MSR:  82009033   CR: 
> 24000844  XER: 000a
> [0.002346] CFAR: c0001201231c IRQMASK: 0
> [0.002346] GPR00: c00012012234 c000167ffb90 c00011b61900 
> 0002
> [0.002346] GPR04:  0001 0001 
> c0004ffeff80
> [0.002346] GPR08:   0002 
> 
> [0.002346] GPR12:  c00013141000 c00010011058 
> 
> [0.002346] GPR16:    
> 
> [0.002346] GPR20: 0028 c00012170968 c000120a3e80 
> 0016
> [0.002346] GPR24: c0004ffdcfd0  c00012b82058 
> 
> [0.002346] GPR28: c0004fc80a68 c00012bf0350 c000120a3e2c 
> 
> [0.002426] NIP [c0001201226c] update_mask_from_threadgroup+0x98/0x174
> [0.002437] LR [c00012012234] update_mask_from_threadgroup+0x60/0x174
> [0.002444] Call Trace:
> [0.002451] [c000167ffb90] [c00012012234] 
> update_mask_from_threadgroup+0x60/0x174 (unreliable)
> [0.002464] [c000167ffbe0] [c000120125f8] 
> init_thread_group_cache_map+0x2b0/0x328
> [0.002477] [c000167ffc50] [c0001201296c] 
> smp_prepare_cpus+0x2fc/0x4f0
> [0.002497] [c000167ffd10] [c00012004e40] 
> kernel_init_freeable+0x198/0x3cc
> [0.002509] [c000167ffde0] [c00010011084] kernel_init+0x34/0x1b0
> [0.002531] [c000167ffe50] [c0001000dd3c] 
> ret_from_kernel_user_thread+0x14/0x1c
> [0.002547] --- interrupt: 0 at 0x0
> [0.002553] NIP:   LR:  CTR: 
> 
> [0.002563] REGS: c000167ffe80 TRAP:    Not tainted  (6.6.0-rc4)
> [0.002569] MSR:   <>  CR:   XER: 
> [0.002576] CFAR:  IRQMASK: 0
> [0.002576] GPR00:    
> 
> [0.002576] GPR04:    
> 
> [0.002576] GPR08:    
> 
> [0.002576] GPR12:    
> 
> [0.002

Re: [PATCHv7 4/4] powerpc/setup: alloc extra paca_ptrs to hold boot_cpuid

2023-10-03 Thread Mahesh J Salgaonkar
On 2023-09-25 15:53:48 Mon, Pingfan Liu wrote:
> paca_ptrs should be large enough to hold the boot_cpuid, hence, its
> lower boundary is set to the bigger one between boot_cpuid+1 and
> nr_cpus.
> 
> On the other hand, some kernel component: -1. the timer assumes cpu0
> online since the timer_list->flags subfield 'TIMER_CPUMASK' is zero if
> not initialized to a proper present cpu.  -2. power9_idle_stop() assumes
> the primary thread's paca is allocated.
> 
> Hence lift nr_cpu_ids from one to two to ensure cpu0 is onlined, if the
> boot cpu is not cpu0.
> 
> Result:
> When nr_cpus=1, taskset -c 14 bash -c 'echo c > /proc/sysrq-trigger'
> the kdump kernel brings up two cpus.
> While when taskset -c 4 bash -c 'echo c > /proc/sysrq-trigger',
> the kdump kernel brings up one cpu.

I tried your changes on power9 and power10 systems. However, on power10 lpar I
see bellow backtrace in kdump kernel bootup with nr_cpus=1.

$ taskset -c 4 bash -c 'echo c > /proc/sysrq-trigger'
[...]
[0.00] Hardware name: IBM,9105-22A POWER10 (raw) 0x800200 0xf06 
of:IBM,FW1040.00 (NL1040_005) hv:phyp pSeries
[0.00] printk: bootconsole [udbg0] enabled
[0.00] the round shift between dt seq and the cpu logic number: 8
[0.00] Partition configured for 16 cpus, operating system maximum is 2.
[0.00] CPU maps initialized for 8 threads per core
[...]
[0.002249] BUG: Unable to handle kernel data access at 0x88c0
[0.002260] Faulting instruction address: 0xc0001201226c
[0.002268] Oops: Kernel access of bad area, sig: 11 [#1]
[0.002274] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[0.002282] Modules linked in:
[0.002288] CPU: 4 PID: 1 Comm: swapper/4 Not tainted 6.6.0-rc4 #1
[0.002296] Hardware name: IBM,9105-22A POWER10 (raw) 0x800200 0xf06 
of:IBM,FW1040.00 (NL1040_005) hv:phyp pSeries
[0.002305] NIP:  c0001201226c LR: c00012012234 CTR: 0004
[0.002312] REGS: c000167ff8f0 TRAP: 0380   Not tainted  (6.6.0-rc4)
[0.002321] MSR:  82009033   CR: 24000844  
XER: 000a
[0.002346] CFAR: c0001201231c IRQMASK: 0
[0.002346] GPR00: c00012012234 c000167ffb90 c00011b61900 
0002
[0.002346] GPR04:  0001 0001 
c0004ffeff80
[0.002346] GPR08:   0002 

[0.002346] GPR12:  c00013141000 c00010011058 

[0.002346] GPR16:    

[0.002346] GPR20: 0028 c00012170968 c000120a3e80 
0016
[0.002346] GPR24: c0004ffdcfd0  c00012b82058 

[0.002346] GPR28: c0004fc80a68 c00012bf0350 c000120a3e2c 

[0.002426] NIP [c0001201226c] update_mask_from_threadgroup+0x98/0x174
[0.002437] LR [c00012012234] update_mask_from_threadgroup+0x60/0x174
[0.002444] Call Trace:
[0.002451] [c000167ffb90] [c00012012234] 
update_mask_from_threadgroup+0x60/0x174 (unreliable)
[0.002464] [c000167ffbe0] [c000120125f8] 
init_thread_group_cache_map+0x2b0/0x328
[0.002477] [c000167ffc50] [c0001201296c] 
smp_prepare_cpus+0x2fc/0x4f0
[0.002497] [c000167ffd10] [c00012004e40] 
kernel_init_freeable+0x198/0x3cc
[0.002509] [c000167ffde0] [c00010011084] kernel_init+0x34/0x1b0
[0.002531] [c000167ffe50] [c0001000dd3c] 
ret_from_kernel_user_thread+0x14/0x1c
[0.002547] --- interrupt: 0 at 0x0
[0.002553] NIP:   LR:  CTR: 
[0.002563] REGS: c000167ffe80 TRAP:    Not tainted  (6.6.0-rc4)
[0.002569] MSR:   <>  CR:   XER: 
[0.002576] CFAR:  IRQMASK: 0
[0.002576] GPR00:    

[0.002576] GPR04:    

[0.002576] GPR08:    

[0.002576] GPR12:    

[0.002576] GPR16:    

[0.002576] GPR20:    

[0.002576] GPR24:    

[0.002576] GPR28:    

[0.002671] NIP [] 0x0
[0.002680] LR [] 0x0
[0.002689] --- interrupt: 0
[0.002697] Code: 7feafb78 813d 7d29fa14 7f895000 409d00d4 3ce20102 
38e74758 79491f24 e87e0006 3900 e8e7 7d27482a  7f834000 
79090020 419e005c
[0.002727] ---[ end trace  ]---
[0.002739]
[1.0

[PATCHv7 4/4] powerpc/setup: alloc extra paca_ptrs to hold boot_cpuid

2023-09-25 Thread Pingfan Liu
paca_ptrs should be large enough to hold the boot_cpuid, hence, its
lower boundary is set to the bigger one between boot_cpuid+1 and
nr_cpus.

On the other hand, some kernel component: -1. the timer assumes cpu0
online since the timer_list->flags subfield 'TIMER_CPUMASK' is zero if
not initialized to a proper present cpu.  -2. power9_idle_stop() assumes
the primary thread's paca is allocated.

Hence lift nr_cpu_ids from one to two to ensure cpu0 is onlined, if the
boot cpu is not cpu0.

Result:
When nr_cpus=1, taskset -c 14 bash -c 'echo c > /proc/sysrq-trigger'
the kdump kernel brings up two cpus.
While when taskset -c 4 bash -c 'echo c > /proc/sysrq-trigger',
the kdump kernel brings up one cpu.

Signed-off-by: Pingfan Liu 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: Mahesh Salgaonkar 
Cc: Wen Xiong 
Cc: Baoquan He 
Cc: Ming Lei 
Cc: ke...@lists.infradead.org
To: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/kernel/paca.c | 10 ++
 arch/powerpc/kernel/prom.c |  9 ++---
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index cda4e00b67c1..91e2401de1bd 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -242,9 +242,10 @@ static int __initdata paca_struct_size;
 
 void __init allocate_paca_ptrs(void)
 {
-   paca_nr_cpu_ids = nr_cpu_ids;
+   int n = (boot_cpuid + 1) > nr_cpu_ids ? (boot_cpuid + 1) : nr_cpu_ids;
 
-   paca_ptrs_size = sizeof(struct paca_struct *) * nr_cpu_ids;
+   paca_nr_cpu_ids = n;
+   paca_ptrs_size = sizeof(struct paca_struct *) * n;
paca_ptrs = memblock_alloc_raw(paca_ptrs_size, SMP_CACHE_BYTES);
if (!paca_ptrs)
panic("Failed to allocate %d bytes for paca pointers\n",
@@ -287,13 +288,14 @@ void __init allocate_paca(int cpu)
 void __init free_unused_pacas(void)
 {
int new_ptrs_size;
+   int n = (boot_cpuid + 1) > nr_cpu_ids ? (boot_cpuid + 1) : nr_cpu_ids;
 
-   new_ptrs_size = sizeof(struct paca_struct *) * nr_cpu_ids;
+   new_ptrs_size = sizeof(struct paca_struct *) * n;
if (new_ptrs_size < paca_ptrs_size)
memblock_phys_free(__pa(paca_ptrs) + new_ptrs_size,
   paca_ptrs_size - new_ptrs_size);
 
-   paca_nr_cpu_ids = nr_cpu_ids;
+   paca_nr_cpu_ids = n;
paca_ptrs_size = new_ptrs_size;
 
 #ifdef CONFIG_PPC_64S_HASH_MMU
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 87272a2d8c10..15c994f54bf9 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -362,9 +362,12 @@ static int __init early_init_dt_scan_cpus(unsigned long 
node,
 */
boot_cpuid = i;
found = true;
-   /* This works around the hole in paca_ptrs[]. */
-   if (nr_cpu_ids < nthreads)
-   set_nr_cpu_ids(nthreads);
+   /*
+* Ideally, nr_cpus=1 can be achieved if each kernel
+* component does not assume cpu0 is onlined.
+*/
+   if (boot_cpuid != 0 && nr_cpu_ids < 2)
+   set_nr_cpu_ids(2);
}
 #ifdef CONFIG_SMP
/* logical cpu id is always 0 on UP kernels */
-- 
2.31.1