Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.

2014-12-29 Thread Michael Ellerman
On Mon, 2014-12-22 at 14:38 +0800, Dongsheng Wang wrote:
 From: Wang Dongsheng dongsheng.w...@freescale.com
 
 Kernel cannot bring up Non-boot cpus always get Processor xx is stuck.
 this issue bring by http://patchwork.ozlabs.org/patch/418912/ (powerpc:
 Secondary CPUs must set cpu_callin_map after setting active and online)

We've decided we're just going to revert that patch for this cycle.

We'll do a better fix for next, and hopefully people will test it before it
gets to mainline :D

cheers


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.

2014-12-26 Thread Aaro Koskinen
Hi,

On Mon, Dec 22, 2014 at 02:38:40PM +0800, Dongsheng Wang wrote:
 From: Wang Dongsheng dongsheng.w...@freescale.com
 
 Kernel cannot bring up Non-boot cpus always get Processor xx is stuck.
 this issue bring by http://patchwork.ozlabs.org/patch/418912/ (powerpc:
 Secondary CPUs must set cpu_callin_map after setting active and online)
 We need to take timebase after bootup cpu give the timebase firstly.
 
 When start_secondary, non-boot cpus set cpu_callin_map for boot cpu
 after that boot cpu will give the timebase for non-boot cpu. Otherwise
 non-boot cpus will fall in dead loop to waiting bootup cpu to give
 imebase.
 
 Signed-off-by: Wang Dongsheng dongsheng.w...@freescale.com

This fixes v3.19-rc1 boot on G5 Xserve.

Tested-by: Aaro Koskinen aaro.koski...@iki.fi

A.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.

2014-12-23 Thread Benjamin Herrenschmidt
On Tue, 2014-12-23 at 12:00 +1100, Michael Ellerman wrote:
 On Mon, 2014-12-22 at 14:38 +0800, Dongsheng Wang wrote:
  From: Wang Dongsheng dongsheng.w...@freescale.com
  
  Kernel cannot bring up Non-boot cpus always get Processor xx is stuck.
  this issue bring by http://patchwork.ozlabs.org/patch/418912/ (powerpc:
  Secondary CPUs must set cpu_callin_map after setting active and online)
  We need to take timebase after bootup cpu give the timebase firstly.
  
  When start_secondary, non-boot cpus set cpu_callin_map for boot cpu
  after that boot cpu will give the timebase for non-boot cpu. Otherwise
  non-boot cpus will fall in dead loop to waiting bootup cpu to give
  imebase.
 
 Right.
 
 However, doesn't this introduce the possibility that the secondary cpu is up
 and marked online but has an unsynchronised clock?

As long as it doesn't execute anything (and at that point it shouldn't
have interrupts enabled) it should be ok. The TB cannot be observed
outside of that CPU.

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.

2014-12-22 Thread Michael Ellerman
On Mon, 2014-12-22 at 14:38 +0800, Dongsheng Wang wrote:
 From: Wang Dongsheng dongsheng.w...@freescale.com
 
 Kernel cannot bring up Non-boot cpus always get Processor xx is stuck.
 this issue bring by http://patchwork.ozlabs.org/patch/418912/ (powerpc:
 Secondary CPUs must set cpu_callin_map after setting active and online)
 We need to take timebase after bootup cpu give the timebase firstly.
 
 When start_secondary, non-boot cpus set cpu_callin_map for boot cpu
 after that boot cpu will give the timebase for non-boot cpu. Otherwise
 non-boot cpus will fall in dead loop to waiting bootup cpu to give
 imebase.

Right.

However, doesn't this introduce the possibility that the secondary cpu is up
and marked online but has an unsynchronised clock?

cheers


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

RE: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.

2014-12-22 Thread dongsheng.w...@freescale.com


 -Original Message-
 From: Michael Ellerman [mailto:m...@ellerman.id.au]
 Sent: Tuesday, December 23, 2014 9:01 AM
 To: Wang Dongsheng-B40534
 Cc: b...@kernel.crashing.org; Wood Scott-B07421; an...@samba.org; linuxppc-
 d...@lists.ozlabs.org
 Subject: Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.
 
 On Mon, 2014-12-22 at 14:38 +0800, Dongsheng Wang wrote:
  From: Wang Dongsheng dongsheng.w...@freescale.com
 
  Kernel cannot bring up Non-boot cpus always get Processor xx is stuck.
  this issue bring by http://patchwork.ozlabs.org/patch/418912/ (powerpc:
  Secondary CPUs must set cpu_callin_map after setting active and
  online) We need to take timebase after bootup cpu give the timebase firstly.
 
  When start_secondary, non-boot cpus set cpu_callin_map for boot cpu
  after that boot cpu will give the timebase for non-boot cpu. Otherwise
  non-boot cpus will fall in dead loop to waiting bootup cpu to give
  imebase.
 
 Right.
 
 However, doesn't this introduce the possibility that the secondary cpu is up 
 and
 marked online but has an unsynchronised clock?
 
Yes, right. But Freescale platform boot-cpu will freeze the TB until secondary 
cpu
take the time base, so the clock is synchronized.

For generic PowerPC maybe has this issue. So for safe I think we need to set 
cpu online
after synchronized clock.

I will update my patch if you agree this way.
+   if (smp_ops-take_timebase)
+   smp_ops-take_timebase();
+   secondary_cpu_time_init();
+
Move set_cpu_online to here.
+   set_cpu_online(cpu, true);

Regards,
-Dongsheng
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.

2014-12-22 Thread Michael Ellerman
On Tue, 2014-12-23 at 02:41 +, dongsheng.w...@freescale.com wrote:
  -Original Message-
  From: Michael Ellerman [mailto:m...@ellerman.id.au]
  Sent: Tuesday, December 23, 2014 9:01 AM
  To: Wang Dongsheng-B40534
  Cc: b...@kernel.crashing.org; Wood Scott-B07421; an...@samba.org; linuxppc-
  d...@lists.ozlabs.org
  Subject: Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.
  
  On Mon, 2014-12-22 at 14:38 +0800, Dongsheng Wang wrote:
   From: Wang Dongsheng dongsheng.w...@freescale.com
  
   Kernel cannot bring up Non-boot cpus always get Processor xx is stuck.
   this issue bring by http://patchwork.ozlabs.org/patch/418912/ (powerpc:
   Secondary CPUs must set cpu_callin_map after setting active and
   online) We need to take timebase after bootup cpu give the timebase 
   firstly.
  
   When start_secondary, non-boot cpus set cpu_callin_map for boot cpu
   after that boot cpu will give the timebase for non-boot cpu. Otherwise
   non-boot cpus will fall in dead loop to waiting bootup cpu to give
   imebase.
  
  Right.
  
  However, doesn't this introduce the possibility that the secondary cpu is 
  up and
  marked online but has an unsynchronised clock?
 
 Yes, right. But Freescale platform boot-cpu will freeze the TB until 
 secondary cpu
 take the time base, so the clock is synchronized.

It does the freeze in give_timebase() doesn't it?

So there's still a window there where the secondary is up  online but hasn't
had it's timebase synchronised, and the primary hasn't frozen the timebase yet.
So that makes me nervous.

 For generic PowerPC maybe has this issue. So for safe I think we need to set 
 cpu online
 after synchronized clock.
 
 I will update my patch if you agree this way.
 +   if (smp_ops-take_timebase)
 +   smp_ops-take_timebase();
 +   secondary_cpu_time_init();
 +
 Move set_cpu_online to here.
 +   set_cpu_online(cpu, true);

But that reverses the effect of the original patch, which was that we have to
set online *before* we set the callin map.


Looking harder at Anton's patch I'm not sure it's right anyway.

The issue he was trying to fix was that the cpu was online but not active,
which confused the scheduler.

I think Anton missed that we have a loop that waits for online at the bottom of
__cpu_up():

/* Wait until cpu puts itself in the online map */
while (!cpu_online(cpu))
cpu_relax();


He must have seen a case where that popped due to the cpu being online, but the
cpu wasn't yet active.

His patch fixed the problem by ensuring the previous loop that waits for
cpu_callin_map doesn't finish until active  online are set, making the while
loop above a nop.

So I think we should probably revert Anton's patch and instead change that
while loop to:

/* Wait until cpu is online AND active */
while (!cpu_online(cpu) || !cpu_active(cpu))
cpu_relax();


cheers


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

RE: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.

2014-12-22 Thread dongsheng.w...@freescale.com


 -Original Message-
 From: Michael Ellerman [mailto:m...@ellerman.id.au]
 Sent: Tuesday, December 23, 2014 1:46 PM
 To: Wang Dongsheng-B40534
 Cc: b...@kernel.crashing.org; Wood Scott-B07421; an...@samba.org; linuxppc-
 d...@lists.ozlabs.org
 Subject: Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.
 
 On Tue, 2014-12-23 at 02:41 +, dongsheng.w...@freescale.com wrote:
   -Original Message-
   From: Michael Ellerman [mailto:m...@ellerman.id.au]
   Sent: Tuesday, December 23, 2014 9:01 AM
   To: Wang Dongsheng-B40534
   Cc: b...@kernel.crashing.org; Wood Scott-B07421; an...@samba.org;
   linuxppc- d...@lists.ozlabs.org
   Subject: Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.
  
   On Mon, 2014-12-22 at 14:38 +0800, Dongsheng Wang wrote:
From: Wang Dongsheng dongsheng.w...@freescale.com
   
Kernel cannot bring up Non-boot cpus always get Processor xx is stuck.
this issue bring by http://patchwork.ozlabs.org/patch/418912/ (powerpc:
Secondary CPUs must set cpu_callin_map after setting active and
online) We need to take timebase after bootup cpu give the timebase
 firstly.
   
When start_secondary, non-boot cpus set cpu_callin_map for boot
cpu after that boot cpu will give the timebase for non-boot cpu.
Otherwise non-boot cpus will fall in dead loop to waiting bootup
cpu to give imebase.
  
   Right.
  
   However, doesn't this introduce the possibility that the secondary
   cpu is up and marked online but has an unsynchronised clock?
 
  Yes, right. But Freescale platform boot-cpu will freeze the TB until
  secondary cpu take the time base, so the clock is synchronized.
 
 It does the freeze in give_timebase() doesn't it?
 
 So there's still a window there where the secondary is up  online but hasn't
 had it's timebase synchronised, and the primary hasn't frozen the timebase 
 yet.
 So that makes me nervous.
 
  For generic PowerPC maybe has this issue. So for safe I think we need
  to set cpu online after synchronized clock.
 
  I will update my patch if you agree this way.
  +   if (smp_ops-take_timebase)
  +   smp_ops-take_timebase();
  +   secondary_cpu_time_init();
  +
  Move set_cpu_online to here.
  +   set_cpu_online(cpu, true);
 
 But that reverses the effect of the original patch, which was that we have to
 set online *before* we set the callin map.
 
 
 Looking harder at Anton's patch I'm not sure it's right anyway.
 
 The issue he was trying to fix was that the cpu was online but not active, 
 which
 confused the scheduler.
 
 I think Anton missed that we have a loop that waits for online at the bottom 
 of
 __cpu_up():
 
   /* Wait until cpu puts itself in the online map */
   while (!cpu_online(cpu))
   cpu_relax();
 
 
 He must have seen a case where that popped due to the cpu being online, but 
 the
 cpu wasn't yet active.
 
 His patch fixed the problem by ensuring the previous loop that waits for
 cpu_callin_map doesn't finish until active  online are set, making the while
 loop above a nop.
 
 So I think we should probably revert Anton's patch and instead change that 
 while
 loop to:
 
   /* Wait until cpu is online AND active */
   while (!cpu_online(cpu) || !cpu_active(cpu))
   cpu_relax();
 
Umm.. Sorry about that...I forgot Anton's patch.

But set_cpu_online also set cpu_active_bits, I think this judgment cannot fix 
Anton's issue.
It is actually the effects of the original is the same.

Regrads,
-Dongsheng

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

RE: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.

2014-12-22 Thread dongsheng.w...@freescale.com


 -Original Message-
 From: Wang Dongsheng-B40534
 Sent: Tuesday, December 23, 2014 2:53 PM
 To: 'Michael Ellerman'
 Cc: b...@kernel.crashing.org; Wood Scott-B07421; an...@samba.org; linuxppc-
 d...@lists.ozlabs.org
 Subject: RE: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.
 
 
 
  -Original Message-
  From: Michael Ellerman [mailto:m...@ellerman.id.au]
  Sent: Tuesday, December 23, 2014 1:46 PM
  To: Wang Dongsheng-B40534
  Cc: b...@kernel.crashing.org; Wood Scott-B07421; an...@samba.org;
  linuxppc- d...@lists.ozlabs.org
  Subject: Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.
 
  On Tue, 2014-12-23 at 02:41 +, dongsheng.w...@freescale.com wrote:
-Original Message-
From: Michael Ellerman [mailto:m...@ellerman.id.au]
Sent: Tuesday, December 23, 2014 9:01 AM
To: Wang Dongsheng-B40534
Cc: b...@kernel.crashing.org; Wood Scott-B07421; an...@samba.org;
linuxppc- d...@lists.ozlabs.org
Subject: Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.
   
On Mon, 2014-12-22 at 14:38 +0800, Dongsheng Wang wrote:
 From: Wang Dongsheng dongsheng.w...@freescale.com

 Kernel cannot bring up Non-boot cpus always get Processor xx is 
 stuck.
 this issue bring by http://patchwork.ozlabs.org/patch/418912/ 
 (powerpc:
 Secondary CPUs must set cpu_callin_map after setting active and
 online) We need to take timebase after bootup cpu give the
 timebase
  firstly.

 When start_secondary, non-boot cpus set cpu_callin_map for boot
 cpu after that boot cpu will give the timebase for non-boot cpu.
 Otherwise non-boot cpus will fall in dead loop to waiting bootup
 cpu to give imebase.
   
Right.
   
However, doesn't this introduce the possibility that the secondary
cpu is up and marked online but has an unsynchronised clock?
 
   Yes, right. But Freescale platform boot-cpu will freeze the TB until
   secondary cpu take the time base, so the clock is synchronized.
 
  It does the freeze in give_timebase() doesn't it?
 
  So there's still a window there where the secondary is up  online but
  hasn't had it's timebase synchronised, and the primary hasn't frozen the
 timebase yet.
  So that makes me nervous.
 
   For generic PowerPC maybe has this issue. So for safe I think we
   need to set cpu online after synchronized clock.
  
   I will update my patch if you agree this way.
   +   if (smp_ops-take_timebase)
   +   smp_ops-take_timebase();
   +   secondary_cpu_time_init();
   +
   Move set_cpu_online to here.
   +   set_cpu_online(cpu, true);
 
  But that reverses the effect of the original patch, which was that we
  have to set online *before* we set the callin map.
 
 
  Looking harder at Anton's patch I'm not sure it's right anyway.
 
  The issue he was trying to fix was that the cpu was online but not
  active, which confused the scheduler.
 
  I think Anton missed that we have a loop that waits for online at the
  bottom of
  __cpu_up():
 
  /* Wait until cpu puts itself in the online map */
  while (!cpu_online(cpu))
  cpu_relax();
 
 
  He must have seen a case where that popped due to the cpu being
  online, but the cpu wasn't yet active.
 
  His patch fixed the problem by ensuring the previous loop that waits
  for cpu_callin_map doesn't finish until active  online are set,
  making the while loop above a nop.
 
  So I think we should probably revert Anton's patch and instead change
  that while loop to:
 
  /* Wait until cpu is online AND active */
  while (!cpu_online(cpu) || !cpu_active(cpu))
  cpu_relax();
 

Base on Anton's patch, we should probably change __cup_up.
Please comment the changes.
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -528,12 +528,10 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
}

/*
-* wait to see if the cpu made a callin (is actually up).
-* use this value that I found through experimentation.
-* -- Cort
+* Wait until cpu puts itself in the online map
 */
if (system_state  SYSTEM_RUNNING)
-   for (c = 5; c  !cpu_callin_map[cpu]; c--)
+   for (c = 5; c  !cpu_online(cpu); c--)
udelay(100);
 #ifdef CONFIG_HOTPLUG_CPU
else
@@ -541,11 +539,10 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
 * CPUs can take much longer to come up in the
 * hotplug case.  Wait five seconds.
 */
-   for (c = 5000; c  !cpu_callin_map[cpu]; c--)
+   for (c = 5000; c  !cpu_online(cpu); c--)
msleep(1);
 #endif
-
-   if (!cpu_callin_map[cpu]) {
+   if (!cpu_online(cpu)) {
printk(KERN_ERR Processor %u is stuck.\n, cpu);
return -ENOENT;
}
@@ -555,8 +552,8 @@ int __cpu_up(unsigned int