Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.
On Mon, 2014-12-22 at 14:38 +0800, Dongsheng Wang wrote: From: Wang Dongsheng dongsheng.w...@freescale.com Kernel cannot bring up Non-boot cpus always get Processor xx is stuck. this issue bring by http://patchwork.ozlabs.org/patch/418912/ (powerpc: Secondary CPUs must set cpu_callin_map after setting active and online) We've decided we're just going to revert that patch for this cycle. We'll do a better fix for next, and hopefully people will test it before it gets to mainline :D cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.
Hi, On Mon, Dec 22, 2014 at 02:38:40PM +0800, Dongsheng Wang wrote: From: Wang Dongsheng dongsheng.w...@freescale.com Kernel cannot bring up Non-boot cpus always get Processor xx is stuck. this issue bring by http://patchwork.ozlabs.org/patch/418912/ (powerpc: Secondary CPUs must set cpu_callin_map after setting active and online) We need to take timebase after bootup cpu give the timebase firstly. When start_secondary, non-boot cpus set cpu_callin_map for boot cpu after that boot cpu will give the timebase for non-boot cpu. Otherwise non-boot cpus will fall in dead loop to waiting bootup cpu to give imebase. Signed-off-by: Wang Dongsheng dongsheng.w...@freescale.com This fixes v3.19-rc1 boot on G5 Xserve. Tested-by: Aaro Koskinen aaro.koski...@iki.fi A. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.
On Tue, 2014-12-23 at 12:00 +1100, Michael Ellerman wrote: On Mon, 2014-12-22 at 14:38 +0800, Dongsheng Wang wrote: From: Wang Dongsheng dongsheng.w...@freescale.com Kernel cannot bring up Non-boot cpus always get Processor xx is stuck. this issue bring by http://patchwork.ozlabs.org/patch/418912/ (powerpc: Secondary CPUs must set cpu_callin_map after setting active and online) We need to take timebase after bootup cpu give the timebase firstly. When start_secondary, non-boot cpus set cpu_callin_map for boot cpu after that boot cpu will give the timebase for non-boot cpu. Otherwise non-boot cpus will fall in dead loop to waiting bootup cpu to give imebase. Right. However, doesn't this introduce the possibility that the secondary cpu is up and marked online but has an unsynchronised clock? As long as it doesn't execute anything (and at that point it shouldn't have interrupts enabled) it should be ok. The TB cannot be observed outside of that CPU. Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.
On Mon, 2014-12-22 at 14:38 +0800, Dongsheng Wang wrote: From: Wang Dongsheng dongsheng.w...@freescale.com Kernel cannot bring up Non-boot cpus always get Processor xx is stuck. this issue bring by http://patchwork.ozlabs.org/patch/418912/ (powerpc: Secondary CPUs must set cpu_callin_map after setting active and online) We need to take timebase after bootup cpu give the timebase firstly. When start_secondary, non-boot cpus set cpu_callin_map for boot cpu after that boot cpu will give the timebase for non-boot cpu. Otherwise non-boot cpus will fall in dead loop to waiting bootup cpu to give imebase. Right. However, doesn't this introduce the possibility that the secondary cpu is up and marked online but has an unsynchronised clock? cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
RE: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.
-Original Message- From: Michael Ellerman [mailto:m...@ellerman.id.au] Sent: Tuesday, December 23, 2014 9:01 AM To: Wang Dongsheng-B40534 Cc: b...@kernel.crashing.org; Wood Scott-B07421; an...@samba.org; linuxppc- d...@lists.ozlabs.org Subject: Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up. On Mon, 2014-12-22 at 14:38 +0800, Dongsheng Wang wrote: From: Wang Dongsheng dongsheng.w...@freescale.com Kernel cannot bring up Non-boot cpus always get Processor xx is stuck. this issue bring by http://patchwork.ozlabs.org/patch/418912/ (powerpc: Secondary CPUs must set cpu_callin_map after setting active and online) We need to take timebase after bootup cpu give the timebase firstly. When start_secondary, non-boot cpus set cpu_callin_map for boot cpu after that boot cpu will give the timebase for non-boot cpu. Otherwise non-boot cpus will fall in dead loop to waiting bootup cpu to give imebase. Right. However, doesn't this introduce the possibility that the secondary cpu is up and marked online but has an unsynchronised clock? Yes, right. But Freescale platform boot-cpu will freeze the TB until secondary cpu take the time base, so the clock is synchronized. For generic PowerPC maybe has this issue. So for safe I think we need to set cpu online after synchronized clock. I will update my patch if you agree this way. + if (smp_ops-take_timebase) + smp_ops-take_timebase(); + secondary_cpu_time_init(); + Move set_cpu_online to here. + set_cpu_online(cpu, true); Regards, -Dongsheng ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.
On Tue, 2014-12-23 at 02:41 +, dongsheng.w...@freescale.com wrote: -Original Message- From: Michael Ellerman [mailto:m...@ellerman.id.au] Sent: Tuesday, December 23, 2014 9:01 AM To: Wang Dongsheng-B40534 Cc: b...@kernel.crashing.org; Wood Scott-B07421; an...@samba.org; linuxppc- d...@lists.ozlabs.org Subject: Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up. On Mon, 2014-12-22 at 14:38 +0800, Dongsheng Wang wrote: From: Wang Dongsheng dongsheng.w...@freescale.com Kernel cannot bring up Non-boot cpus always get Processor xx is stuck. this issue bring by http://patchwork.ozlabs.org/patch/418912/ (powerpc: Secondary CPUs must set cpu_callin_map after setting active and online) We need to take timebase after bootup cpu give the timebase firstly. When start_secondary, non-boot cpus set cpu_callin_map for boot cpu after that boot cpu will give the timebase for non-boot cpu. Otherwise non-boot cpus will fall in dead loop to waiting bootup cpu to give imebase. Right. However, doesn't this introduce the possibility that the secondary cpu is up and marked online but has an unsynchronised clock? Yes, right. But Freescale platform boot-cpu will freeze the TB until secondary cpu take the time base, so the clock is synchronized. It does the freeze in give_timebase() doesn't it? So there's still a window there where the secondary is up online but hasn't had it's timebase synchronised, and the primary hasn't frozen the timebase yet. So that makes me nervous. For generic PowerPC maybe has this issue. So for safe I think we need to set cpu online after synchronized clock. I will update my patch if you agree this way. + if (smp_ops-take_timebase) + smp_ops-take_timebase(); + secondary_cpu_time_init(); + Move set_cpu_online to here. + set_cpu_online(cpu, true); But that reverses the effect of the original patch, which was that we have to set online *before* we set the callin map. Looking harder at Anton's patch I'm not sure it's right anyway. The issue he was trying to fix was that the cpu was online but not active, which confused the scheduler. I think Anton missed that we have a loop that waits for online at the bottom of __cpu_up(): /* Wait until cpu puts itself in the online map */ while (!cpu_online(cpu)) cpu_relax(); He must have seen a case where that popped due to the cpu being online, but the cpu wasn't yet active. His patch fixed the problem by ensuring the previous loop that waits for cpu_callin_map doesn't finish until active online are set, making the while loop above a nop. So I think we should probably revert Anton's patch and instead change that while loop to: /* Wait until cpu is online AND active */ while (!cpu_online(cpu) || !cpu_active(cpu)) cpu_relax(); cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
RE: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.
-Original Message- From: Michael Ellerman [mailto:m...@ellerman.id.au] Sent: Tuesday, December 23, 2014 1:46 PM To: Wang Dongsheng-B40534 Cc: b...@kernel.crashing.org; Wood Scott-B07421; an...@samba.org; linuxppc- d...@lists.ozlabs.org Subject: Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up. On Tue, 2014-12-23 at 02:41 +, dongsheng.w...@freescale.com wrote: -Original Message- From: Michael Ellerman [mailto:m...@ellerman.id.au] Sent: Tuesday, December 23, 2014 9:01 AM To: Wang Dongsheng-B40534 Cc: b...@kernel.crashing.org; Wood Scott-B07421; an...@samba.org; linuxppc- d...@lists.ozlabs.org Subject: Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up. On Mon, 2014-12-22 at 14:38 +0800, Dongsheng Wang wrote: From: Wang Dongsheng dongsheng.w...@freescale.com Kernel cannot bring up Non-boot cpus always get Processor xx is stuck. this issue bring by http://patchwork.ozlabs.org/patch/418912/ (powerpc: Secondary CPUs must set cpu_callin_map after setting active and online) We need to take timebase after bootup cpu give the timebase firstly. When start_secondary, non-boot cpus set cpu_callin_map for boot cpu after that boot cpu will give the timebase for non-boot cpu. Otherwise non-boot cpus will fall in dead loop to waiting bootup cpu to give imebase. Right. However, doesn't this introduce the possibility that the secondary cpu is up and marked online but has an unsynchronised clock? Yes, right. But Freescale platform boot-cpu will freeze the TB until secondary cpu take the time base, so the clock is synchronized. It does the freeze in give_timebase() doesn't it? So there's still a window there where the secondary is up online but hasn't had it's timebase synchronised, and the primary hasn't frozen the timebase yet. So that makes me nervous. For generic PowerPC maybe has this issue. So for safe I think we need to set cpu online after synchronized clock. I will update my patch if you agree this way. + if (smp_ops-take_timebase) + smp_ops-take_timebase(); + secondary_cpu_time_init(); + Move set_cpu_online to here. + set_cpu_online(cpu, true); But that reverses the effect of the original patch, which was that we have to set online *before* we set the callin map. Looking harder at Anton's patch I'm not sure it's right anyway. The issue he was trying to fix was that the cpu was online but not active, which confused the scheduler. I think Anton missed that we have a loop that waits for online at the bottom of __cpu_up(): /* Wait until cpu puts itself in the online map */ while (!cpu_online(cpu)) cpu_relax(); He must have seen a case where that popped due to the cpu being online, but the cpu wasn't yet active. His patch fixed the problem by ensuring the previous loop that waits for cpu_callin_map doesn't finish until active online are set, making the while loop above a nop. So I think we should probably revert Anton's patch and instead change that while loop to: /* Wait until cpu is online AND active */ while (!cpu_online(cpu) || !cpu_active(cpu)) cpu_relax(); Umm.. Sorry about that...I forgot Anton's patch. But set_cpu_online also set cpu_active_bits, I think this judgment cannot fix Anton's issue. It is actually the effects of the original is the same. Regrads, -Dongsheng ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
RE: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.
-Original Message- From: Wang Dongsheng-B40534 Sent: Tuesday, December 23, 2014 2:53 PM To: 'Michael Ellerman' Cc: b...@kernel.crashing.org; Wood Scott-B07421; an...@samba.org; linuxppc- d...@lists.ozlabs.org Subject: RE: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up. -Original Message- From: Michael Ellerman [mailto:m...@ellerman.id.au] Sent: Tuesday, December 23, 2014 1:46 PM To: Wang Dongsheng-B40534 Cc: b...@kernel.crashing.org; Wood Scott-B07421; an...@samba.org; linuxppc- d...@lists.ozlabs.org Subject: Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up. On Tue, 2014-12-23 at 02:41 +, dongsheng.w...@freescale.com wrote: -Original Message- From: Michael Ellerman [mailto:m...@ellerman.id.au] Sent: Tuesday, December 23, 2014 9:01 AM To: Wang Dongsheng-B40534 Cc: b...@kernel.crashing.org; Wood Scott-B07421; an...@samba.org; linuxppc- d...@lists.ozlabs.org Subject: Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up. On Mon, 2014-12-22 at 14:38 +0800, Dongsheng Wang wrote: From: Wang Dongsheng dongsheng.w...@freescale.com Kernel cannot bring up Non-boot cpus always get Processor xx is stuck. this issue bring by http://patchwork.ozlabs.org/patch/418912/ (powerpc: Secondary CPUs must set cpu_callin_map after setting active and online) We need to take timebase after bootup cpu give the timebase firstly. When start_secondary, non-boot cpus set cpu_callin_map for boot cpu after that boot cpu will give the timebase for non-boot cpu. Otherwise non-boot cpus will fall in dead loop to waiting bootup cpu to give imebase. Right. However, doesn't this introduce the possibility that the secondary cpu is up and marked online but has an unsynchronised clock? Yes, right. But Freescale platform boot-cpu will freeze the TB until secondary cpu take the time base, so the clock is synchronized. It does the freeze in give_timebase() doesn't it? So there's still a window there where the secondary is up online but hasn't had it's timebase synchronised, and the primary hasn't frozen the timebase yet. So that makes me nervous. For generic PowerPC maybe has this issue. So for safe I think we need to set cpu online after synchronized clock. I will update my patch if you agree this way. + if (smp_ops-take_timebase) + smp_ops-take_timebase(); + secondary_cpu_time_init(); + Move set_cpu_online to here. + set_cpu_online(cpu, true); But that reverses the effect of the original patch, which was that we have to set online *before* we set the callin map. Looking harder at Anton's patch I'm not sure it's right anyway. The issue he was trying to fix was that the cpu was online but not active, which confused the scheduler. I think Anton missed that we have a loop that waits for online at the bottom of __cpu_up(): /* Wait until cpu puts itself in the online map */ while (!cpu_online(cpu)) cpu_relax(); He must have seen a case where that popped due to the cpu being online, but the cpu wasn't yet active. His patch fixed the problem by ensuring the previous loop that waits for cpu_callin_map doesn't finish until active online are set, making the while loop above a nop. So I think we should probably revert Anton's patch and instead change that while loop to: /* Wait until cpu is online AND active */ while (!cpu_online(cpu) || !cpu_active(cpu)) cpu_relax(); Base on Anton's patch, we should probably change __cup_up. Please comment the changes. --- a/arch/powerpc/kernel/smp.c +++ b/arch/powerpc/kernel/smp.c @@ -528,12 +528,10 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle) } /* -* wait to see if the cpu made a callin (is actually up). -* use this value that I found through experimentation. -* -- Cort +* Wait until cpu puts itself in the online map */ if (system_state SYSTEM_RUNNING) - for (c = 5; c !cpu_callin_map[cpu]; c--) + for (c = 5; c !cpu_online(cpu); c--) udelay(100); #ifdef CONFIG_HOTPLUG_CPU else @@ -541,11 +539,10 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle) * CPUs can take much longer to come up in the * hotplug case. Wait five seconds. */ - for (c = 5000; c !cpu_callin_map[cpu]; c--) + for (c = 5000; c !cpu_online(cpu); c--) msleep(1); #endif - - if (!cpu_callin_map[cpu]) { + if (!cpu_online(cpu)) { printk(KERN_ERR Processor %u is stuck.\n, cpu); return -ENOENT; } @@ -555,8 +552,8 @@ int __cpu_up(unsigned int