Re: [RFC,1/2] powerpc/numa: fix cpu_to_node() usage during boot

2015-07-15 Thread Tejun Heo
Hello,

On Fri, Jul 10, 2015 at 09:15:47AM -0700, Nishanth Aravamudan wrote:
 On 08.07.2015 [16:16:23 -0700], Nishanth Aravamudan wrote:
  On 08.07.2015 [14:00:56 +1000], Michael Ellerman wrote:
   On Thu, 2015-02-07 at 23:02:02 UTC, Nishanth Aravamudan wrote:
Much like on x86, now that powerpc is using USE_PERCPU_NUMA_NODE_ID, we
have an ordering issue during boot with early calls to cpu_to_node().
   
   now that .. implies we changed something and broke this. What commit was
   it that changed the behaviour?
  
  Well, that's something I'm still trying to unearth. In the commits
  before and after adding USE_PERCPU_NUMA_NODE_ID (8c272261194d
  powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID), the dmesg reports:
  
  pcpu-alloc: [0] 0 1 2 3 4 5 6 7
 
 Ok, I did a bisection, and it seems like prior to commit
 1a4d76076cda69b0abf15463a8cebc172406da25 (percpu: implement
 asynchronous chunk population), we emitted the above, e.g.:
 
 pcpu-alloc: [0] 0 1 2 3 4 5 6 7
 
 And after that commit, we emitted:
 
 pcpu-alloc: [0] 0 1 2 3 [0] 4 5 6 7
 
 I'm not exactly sure why that changed; I'm still reading through the
 commit to understand it. Tejun might be able to explain.
 
 Tejun, for reference: I noticed that on Power systems, since the
 above-mentioned commit, pcpu-alloc has not reflected the topology of the
 system correctly -- that is, the pcpu areas are all on node 0
 unconditionally (based upon pcpu-alloc's output). Prior to that, it
 seems there was just one group, which completely ignored the NUMA
 topology.
 
 Is this just an ordering thing that changed with the introduction of the
 async code?

It's just each unit growing and the percpu allocator deciding to split
them into separate allocation units.  Before, it served all CPUs from a
single alloc unit because they looked like they belonged to the same
NUMA node and were small enough to fit into one alloc unit.  After the
async commit, the extra reserve space it added made the allocator split
them into two alloc units, while still assigning them to the same group
because the NUMA info wasn't there yet.
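
For reference, the NUMA grouping comes from the cpu_distance_fn the arch
passes to pcpu_embed_first_chunk(). A minimal sketch of the powerpc
callback, modeled on arch/powerpc/kernel/setup_64.c (details may differ
by kernel version):

static int pcpu_cpu_distance(unsigned int from, unsigned int to)
{
        /* CPUs on the same node are treated as local to each other. */
        if (cpu_to_node(from) == cpu_to_node(to))
                return LOCAL_DISTANCE;
        else
                return REMOTE_DISTANCE;
}

If cpu_to_node() still answers 0 for every CPU at this point in boot,
every pair looks local, so all CPUs collapse into one group -- matching
the "[0] ... [0]" output above once the units are split.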

Thanks.

-- 
tejun

Re: [RFC,1/2] powerpc/numa: fix cpu_to_node() usage during boot

2015-07-14 Thread Michael Ellerman
On Wed, 2015-07-08 at 16:16 -0700, Nishanth Aravamudan wrote:
 On 08.07.2015 [14:00:56 +1000], Michael Ellerman wrote:
  On Thu, 2015-02-07 at 23:02:02 UTC, Nishanth Aravamudan wrote:
   
   we currently emit at boot:
   
   [0.00] pcpu-alloc: [0] 0 1 2 3 [0] 4 5 6 7 
   
   After this commit, we correctly emit:
   
   [0.00] pcpu-alloc: [0] 0 1 2 3 [1] 4 5 6 7 
  
  
  So it looks fairly sane, and I guess it's a bug fix.
  
  But I'm a bit reluctant to put it in straight away without some time in next.
 
 I'm fine with that -- admittedly, it could use some more extensive
 testing (so far I have only been able to verify that the pcpu areas are
 being allocated on the correct node).
 
 I still need to test with hotplug and things like that. Hence the RFC.
 
  It looks like the symptom is that the per-cpu areas are all allocated on
  node 0, is that all that goes wrong?
 
 Yes, that's the symptom. I cc'd a few folks to see if they could help
 indicate the performance implications of such a setup -- sorry, I should
 have been more explicit about that.

OK cool. I'm happy to put it in next if you send a non-RFC version.

cheers



Re: [RFC,1/2] powerpc/numa: fix cpu_to_node() usage during boot

2015-07-10 Thread Nishanth Aravamudan
On 08.07.2015 [16:16:23 -0700], Nishanth Aravamudan wrote:
 On 08.07.2015 [14:00:56 +1000], Michael Ellerman wrote:
  On Thu, 2015-02-07 at 23:02:02 UTC, Nishanth Aravamudan wrote:
   Much like on x86, now that powerpc is using USE_PERCPU_NUMA_NODE_ID, we
   have an ordering issue during boot with early calls to cpu_to_node().
  
  now that .. implies we changed something and broke this. What commit was
  it that changed the behaviour?
 
 Well, that's something I'm still trying to unearth. In the commits
 before and after adding USE_PERCPU_NUMA_NODE_ID (8c272261194d
 powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID), the dmesg reports:
 
 pcpu-alloc: [0] 0 1 2 3 4 5 6 7

Ok, I did a bisection, and it seems like prior to commit
1a4d76076cda69b0abf15463a8cebc172406da25 (percpu: implement
asynchronous chunk population), we emitted the above, e.g.:

pcpu-alloc: [0] 0 1 2 3 4 5 6 7

And after that commit, we emitted:

pcpu-alloc: [0] 0 1 2 3 [0] 4 5 6 7

I'm not exactly sure why that changed; I'm still reading through the
commit to understand it. Tejun might be able to explain.

Tejun, for reference: I noticed that on Power systems, since the
above-mentioned commit, pcpu-alloc has not reflected the topology of the
system correctly -- that is, the pcpu areas are all on node 0
unconditionally (based upon pcpu-alloc's output). Prior to that, it
seems there was just one group, which completely ignored the NUMA
topology.

Is this just an ordering thing that changed with the introduction of the
async code?

Thanks,
Nish


Re: [RFC,1/2] powerpc/numa: fix cpu_to_node() usage during boot

2015-07-08 Thread Nishanth Aravamudan
On 08.07.2015 [14:00:56 +1000], Michael Ellerman wrote:
 On Thu, 2015-02-07 at 23:02:02 UTC, Nishanth Aravamudan wrote:
  Much like on x86, now that powerpc is using USE_PERCPU_NUMA_NODE_ID, we
  have an ordering issue during boot with early calls to cpu_to_node().
 
 now that .. implies we changed something and broke this. What commit was
 it that changed the behaviour?

Well, that's something I'm still trying to unearth. In the commits
before and after adding USE_PERCPU_NUMA_NODE_ID (8c272261194d
powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID), the dmesg reports:

pcpu-alloc: [0] 0 1 2 3 4 5 6 7

At least prior to 8c272261194d, this might have been due to the old
powerpc-specific cpu_to_node():

static inline int cpu_to_node(int cpu)
{
        int nid;

        nid = numa_cpu_lookup_table[cpu];

        /*
         * During early boot, the numa-cpu lookup table might not have
         * been setup for all CPUs yet. In such cases, default to node 0.
         */
        return (nid < 0) ? 0 : nid;
}

which might imply that no one cares, or simply that no one noticed.
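
A minimal sketch of what such an early_cpu_to_node() can look like --
assuming it reads the firmware-populated numa_cpu_lookup_table[] directly
rather than the per-cpu numa_node variable (the actual patch may differ
in detail):

static inline int early_cpu_to_node(int cpu)
{
        int nid = numa_cpu_lookup_table[cpu];

        /* The lookup table may not be filled in for this CPU yet. */
        return (nid < 0) ? 0 : nid;
}

Since it never touches the per-cpu area, it is safe to call from the
spots that run before setup_per_cpu_areas().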

  The value returned by those calls now depends on the per-cpu area being
  set up, but that is not guaranteed to be the case during boot. Instead,
  we need to add an early_cpu_to_node() which doesn't use the per-CPU
  area, and call that from certain spots that are known to invoke
  cpu_to_node() before the per-CPU areas are configured.
  
  On an example 2-node NUMA system with the following topology:
  
  available: 2 nodes (0-1)
  node 0 cpus: 0 1 2 3
  node 0 size: 2029 MB
  node 0 free: 1753 MB
  node 1 cpus: 4 5 6 7
  node 1 size: 2045 MB
  node 1 free: 1945 MB
  node distances:
  node   0   1 
0:  10  40 
1:  40  10 
  
  we currently emit at boot:
  
  [0.00] pcpu-alloc: [0] 0 1 2 3 [0] 4 5 6 7 
  
  After this commit, we correctly emit:
  
  [0.00] pcpu-alloc: [0] 0 1 2 3 [1] 4 5 6 7 
 
 
 So it looks fairly sane, and I guess it's a bug fix.
 
 But I'm a bit reluctant to put it in straight away without some time in next.

I'm fine with that -- admittedly, it could use some more extensive
testing (so far I have only been able to verify that the pcpu areas are
being allocated on the correct node).

I still need to test with hotplug and things like that. Hence the RFC.

 It looks like the symptom is that the per-cpu areas are all allocated on node
 0, is that all that goes wrong?

Yes, that's the symptom. I cc'd a few folks to see if they could help
indicate the performance implications of such a setup -- sorry, I should
have been more explicit about that.

Thanks,
Nish


Re: [RFC,1/2] powerpc/numa: fix cpu_to_node() usage during boot

2015-07-08 Thread David Rientjes
On Wed, 8 Jul 2015, Nishanth Aravamudan wrote:

  It looks like the symptom is that the per-cpu areas are all allocated on
  node 0, is that all that goes wrong?
 
 Yes, that's the symptom. I cc'd a few folks to see if they could help
 indicate the performance implications of such a setup -- sorry, I should
 have been more explicit about that.
 

Yeah, I'm not sure it's really a bugfix so much as a performance
optimization, since cpu_to_node() with CONFIG_USE_PERCPU_NUMA_NODE_ID is
only about performance.
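
For context, the generic CONFIG_USE_PERCPU_NUMA_NODE_ID implementation
in include/linux/topology.h is just a per-cpu variable read -- which is
why it is fast, and why it returns a stale answer until
set_cpu_numa_node() has run for that CPU:

static inline int cpu_to_node(int cpu)
{
        return per_cpu(numa_node, cpu);
}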

Re: [RFC,1/2] powerpc/numa: fix cpu_to_node() usage during boot

2015-07-07 Thread Michael Ellerman
On Thu, 2015-02-07 at 23:02:02 UTC, Nishanth Aravamudan wrote:
 Much like on x86, now that powerpc is using USE_PERCPU_NUMA_NODE_ID, we
 have an ordering issue during boot with early calls to cpu_to_node().

now that .. implies we changed something and broke this. What commit was
it that changed the behaviour?

 The value returned by those calls now depends on the per-cpu area being
 set up, but that is not guaranteed to be the case during boot. Instead,
 we need to add an early_cpu_to_node() which doesn't use the per-CPU
 area, and call that from certain spots that are known to invoke
 cpu_to_node() before the per-CPU areas are configured.
 
 On an example 2-node NUMA system with the following topology:
 
 available: 2 nodes (0-1)
 node 0 cpus: 0 1 2 3
 node 0 size: 2029 MB
 node 0 free: 1753 MB
 node 1 cpus: 4 5 6 7
 node 1 size: 2045 MB
 node 1 free: 1945 MB
 node distances:
 node   0   1 
   0:  10  40 
   1:  40  10 
 
 we currently emit at boot:
 
 [0.00] pcpu-alloc: [0] 0 1 2 3 [0] 4 5 6 7 
 
 After this commit, we correctly emit:
 
 [0.00] pcpu-alloc: [0] 0 1 2 3 [1] 4 5 6 7 


So it looks fairly sane, and I guess it's a bug fix.

But I'm a bit reluctant to put it in straight away without some time in next.

It looks like the symptom is that the per-cpu areas are all allocated on
node 0, is that all that goes wrong?

cheers