Re: [PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7

2010-01-25 Thread Joel Schopp

Benjamin Herrenschmidt wrote:

On Wed, 2010-01-20 at 16:09 -0600, Joel Schopp wrote:
  
I can turn that into a conditional branch (case statement) with a shift 
for the common 1,2,4 cases which should cover all procs available today 
falling back to a divide for any theoretical future processors that do 
other numbers of threads. 



Look at the cputhreads.h implementation ... Today we only support
power-of-two numbers of threads.
  
I've run 3 threads using cpu hotplug to offline 1 of the 4.  It's 
certainly a stupid idea, but there you go.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7

2010-01-25 Thread Benjamin Herrenschmidt
On Mon, 2010-01-25 at 11:50 -0600, Joel Schopp wrote:
  Look at the cputhreads.h implementation ... Today we only support
  power-of-two numbers of threads.

 I've run 3 threads using cpu hotplug to offline 1 of the 4.  It's 
 certainly a stupid idea, but there you go. 

Oh, you mean you need to use the actual online count ? In which case,
yes, you do indeed need to be a bit more careful... In this case though,
you're probably better off special casing 3 :-)

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7

2010-01-23 Thread Benjamin Herrenschmidt
On Wed, 2010-01-20 at 16:09 -0600, Joel Schopp wrote:
 I can turn that into a conditional branch (case statement) with a shift 
 for the common 1,2,4 cases which should cover all procs available today 
 falling back to a divide for any theoretical future processors that do 
 other numbers of threads. 

Look at the cputhreads.h implementation ... Today we only support
power-of-two numbers of threads.

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7

2010-01-21 Thread Peter Zijlstra
On Wed, 2010-01-20 at 16:44 -0600, Joel Schopp wrote:
 
 Care to take Gautham's bugfix patch (patch 1/2) now, since it just fixes 
 a bug?  You'll need it if you ever try to make the x86 broken version work.

Sure, I'll take that, thanks!

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7

2010-01-20 Thread Joel Schopp
On Power7 processors running in SMT4 mode with 2, 3, or 4 idle threads 
there is performance benefit to idling the higher numbered threads in
the core.  

This patch implements arch_scale_smt_power to dynamically update smt
thread power in these idle cases in order to prefer threads 0,1 over
threads 2,3 within a core.

Signed-off-by: Joel Schopp jsch...@austin.ibm.com
---
Index: linux-2.6.git/arch/powerpc/kernel/smp.c
===
--- linux-2.6.git.orig/arch/powerpc/kernel/smp.c
+++ linux-2.6.git/arch/powerpc/kernel/smp.c
@@ -617,3 +617,44 @@ void __cpu_die(unsigned int cpu)
smp_ops-cpu_die(cpu);
 }
 #endif
+
+static inline int thread_in_smt4core(int x)
+{
+  return x % 4;
+}
+unsigned long arch_scale_smt_power(struct sched_domain *sd, int cpu)
+{
+  int cpu2;
+  int idle_count = 0;
+
+  struct cpumask *cpu_map = sched_domain_span(sd);
+
+   unsigned long weight = cpumask_weight(cpu_map);
+   unsigned long smt_gain = sd-smt_gain;
+
+   if (cpu_has_feature(CPU_FTRS_POWER7)  weight == 4) {
+   for_each_cpu(cpu2, cpu_map) {
+   if (idle_cpu(cpu2))
+   idle_count++;
+   }
+
+   /* the following section attempts to tweak cpu power based
+* on current idleness of the threads dynamically at runtime
+*/
+   if (idle_count == 2 || idle_count == 3 || idle_count == 4) {
+   if (thread_in_smt4core(cpu) == 0 ||
+   thread_in_smt4core(cpu) == 1) {
+   /* add 75 % to thread power */
+   smt_gain += (smt_gain  1) + (smt_gain  2);
+   } else {
+/* subtract 75 % to thread power */
+   smt_gain = smt_gain  2;
+   }
+   }
+   }
+   /* default smt gain is 1178, weight is # of SMT threads */
+   smt_gain /= weight;
+
+   return smt_gain;
+
+}
Index: linux-2.6.git/kernel/sched_features.h
===
--- linux-2.6.git.orig/kernel/sched_features.h
+++ linux-2.6.git/kernel/sched_features.h
@@ -107,7 +107,7 @@ SCHED_FEAT(CACHE_HOT_BUDDY, 1)
 /*
  * Use arch dependent cpu power functions
  */
-SCHED_FEAT(ARCH_POWER, 0)
+SCHED_FEAT(ARCH_POWER, 1)
 
 SCHED_FEAT(HRTICK, 0)
 SCHED_FEAT(DOUBLE_TICK, 0)


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7

2010-01-20 Thread Peter Zijlstra
On Wed, 2010-01-20 at 14:04 -0600, Joel Schopp wrote:
 On Power7 processors running in SMT4 mode with 2, 3, or 4 idle threads 
 there is performance benefit to idling the higher numbered threads in
 the core.  

So this is an actual performance improvement, not only power savings?

 This patch implements arch_scale_smt_power to dynamically update smt
 thread power in these idle cases in order to prefer threads 0,1 over
 threads 2,3 within a core.
 
 Signed-off-by: Joel Schopp jsch...@austin.ibm.com
 ---
 Index: linux-2.6.git/arch/powerpc/kernel/smp.c
 ===
 --- linux-2.6.git.orig/arch/powerpc/kernel/smp.c
 +++ linux-2.6.git/arch/powerpc/kernel/smp.c
 @@ -617,3 +617,44 @@ void __cpu_die(unsigned int cpu)
   smp_ops-cpu_die(cpu);
  }
  #endif
 +
 +static inline int thread_in_smt4core(int x)
 +{
 +  return x % 4;
 +}
 +unsigned long arch_scale_smt_power(struct sched_domain *sd, int cpu)
 +{
 +  int cpu2;
 +  int idle_count = 0;
 +
 +  struct cpumask *cpu_map = sched_domain_span(sd);
 +
 + unsigned long weight = cpumask_weight(cpu_map);
 + unsigned long smt_gain = sd-smt_gain;
 +
 + if (cpu_has_feature(CPU_FTRS_POWER7)  weight == 4) {
 + for_each_cpu(cpu2, cpu_map) {
 + if (idle_cpu(cpu2))
 + idle_count++;
 + }
 +
 + /* the following section attempts to tweak cpu power based
 +  * on current idleness of the threads dynamically at runtime
 +  */
 + if (idle_count == 2 || idle_count == 3 || idle_count == 4) {
 + if (thread_in_smt4core(cpu) == 0 ||
 + thread_in_smt4core(cpu) == 1) {
 + /* add 75 % to thread power */
 + smt_gain += (smt_gain  1) + (smt_gain  2);
 + } else {
 +  /* subtract 75 % to thread power */
 + smt_gain = smt_gain  2;
 + }
 + }
 + }
 + /* default smt gain is 1178, weight is # of SMT threads */
 + smt_gain /= weight;
 +
 + return smt_gain;
 +
 +}

This looks to suffer significant whitespace damage.

The design goal for smt_power was to be able to actually measure the
processing gains from smt and feed that into the scheduler, not really
placement tricks like this.

Now I also heard AMD might want to have something similar to this,
something to do with powerlines and die layout.

I'm not sure playing games with cpu_power is the best or if simply
moving tasks to lower numbered cpus using an SD_flag is the best
solution for these kinds of things.

 Index: linux-2.6.git/kernel/sched_features.h
 ===
 --- linux-2.6.git.orig/kernel/sched_features.h
 +++ linux-2.6.git/kernel/sched_features.h
 @@ -107,7 +107,7 @@ SCHED_FEAT(CACHE_HOT_BUDDY, 1)
  /*
   * Use arch dependent cpu power functions
   */
 -SCHED_FEAT(ARCH_POWER, 0)
 +SCHED_FEAT(ARCH_POWER, 1)
  
  SCHED_FEAT(HRTICK, 0)
  SCHED_FEAT(DOUBLE_TICK, 0)

And you just wrecked x86 ;-)

It has an smt_power implementation that tries to measure smt gains using
aperf/mperf, trouble is that this represents the actual performance not
the capacity. This has the problem that when idle it represents 0
capacity and will not attract work.

Coming up with something that actually works there is on the todo list,
I was thinking perhaps temporal maximums from !idle.

So if you want to go with this, you'll need to stub out
arch/x86/kernel/cpu/sched.c



___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7

2010-01-20 Thread Michael Neuling
 On Power7 processors running in SMT4 mode with 2, 3, or 4 idle threads 
 there is performance benefit to idling the higher numbered threads in
 the core.  
 
 This patch implements arch_scale_smt_power to dynamically update smt
 thread power in these idle cases in order to prefer threads 0,1 over
 threads 2,3 within a core.
 
 Signed-off-by: Joel Schopp jsch...@austin.ibm.com
 ---
 Index: linux-2.6.git/arch/powerpc/kernel/smp.c
 ===
 --- linux-2.6.git.orig/arch/powerpc/kernel/smp.c
 +++ linux-2.6.git/arch/powerpc/kernel/smp.c
 @@ -617,3 +617,44 @@ void __cpu_die(unsigned int cpu)
   smp_ops-cpu_die(cpu);
  }
  #endif
 +
 +static inline int thread_in_smt4core(int x)
 +{
 +  return x % 4;
 +}
 +unsigned long arch_scale_smt_power(struct sched_domain *sd, int cpu)
 +{
 +  int cpu2;
 +  int idle_count = 0;
 +
 +  struct cpumask *cpu_map = sched_domain_span(sd);
 +
 + unsigned long weight = cpumask_weight(cpu_map);
 + unsigned long smt_gain = sd-smt_gain;
 +
 + if (cpu_has_feature(CPU_FTRS_POWER7)  weight == 4) {

I think we should avoid using cpu_has_feature like this.  It's better to
create a new feature and add it to POWER7 in the cputable, then check
for that here.

The way that it is now, I think any CPU that has superset of the POWER7
features, will be true here.  This is not what we want.

 + for_each_cpu(cpu2, cpu_map) {
 + if (idle_cpu(cpu2))
 + idle_count++;
 + }
 +
 + /* the following section attempts to tweak cpu power based
 +  * on current idleness of the threads dynamically at runtime
 +  */
 + if (idle_count == 2 || idle_count == 3 || idle_count == 4) {
 + if (thread_in_smt4core(cpu) == 0 ||
 + thread_in_smt4core(cpu) == 1) {
 + /* add 75 % to thread power */
 + smt_gain += (smt_gain  1) + (smt_gain  2);
 + } else {
 +  /* subtract 75 % to thread power */
 + smt_gain = smt_gain  2;
 + }
 + }
 + }
 + /* default smt gain is 1178, weight is # of SMT threads */
 + smt_gain /= weight;

This results in a PPC div, when most of the time it's going to be a
power of two divide.  You've optimised the divides a few lines above
this, but not this one.  Some consistency would be good.

Mikey

 +
 + return smt_gain;
 +
 +}
 Index: linux-2.6.git/kernel/sched_features.h
 ===
 --- linux-2.6.git.orig/kernel/sched_features.h
 +++ linux-2.6.git/kernel/sched_features.h
 @@ -107,7 +107,7 @@ SCHED_FEAT(CACHE_HOT_BUDDY, 1)
  /*
   * Use arch dependent cpu power functions
   */
 -SCHED_FEAT(ARCH_POWER, 0)
 +SCHED_FEAT(ARCH_POWER, 1)
  
  SCHED_FEAT(HRTICK, 0)
  SCHED_FEAT(DOUBLE_TICK, 0)
 
 
 ___
 Linuxppc-dev mailing list
 Linuxppc-dev@lists.ozlabs.org
 https://lists.ozlabs.org/listinfo/linuxppc-dev
 
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7

2010-01-20 Thread Benjamin Herrenschmidt
On Wed, 2010-01-20 at 14:04 -0600, Joel Schopp wrote:
 On Power7 processors running in SMT4 mode with 2, 3, or 4 idle threads 
 there is performance benefit to idling the higher numbered threads in
 the core.  
 
 This patch implements arch_scale_smt_power to dynamically update smt
 thread power in these idle cases in order to prefer threads 0,1 over
 threads 2,3 within a core.
 
 Signed-off-by: Joel Schopp jsch...@austin.ibm.com

So I'll leave Peter deal with the scheduler aspects and will focus on
details :-)

 ---
 Index: linux-2.6.git/arch/powerpc/kernel/smp.c
 ===
 --- linux-2.6.git.orig/arch/powerpc/kernel/smp.c
 +++ linux-2.6.git/arch/powerpc/kernel/smp.c
 @@ -617,3 +617,44 @@ void __cpu_die(unsigned int cpu)
   smp_ops-cpu_die(cpu);
  }
  #endif
 +
 +static inline int thread_in_smt4core(int x)
 +{
 +  return x % 4;
 +}

Needs a whitespace here though I don't really like the above. Any reason
why you can't use the existing cpu_thread_in_core() ?

 +unsigned long arch_scale_smt_power(struct sched_domain *sd, int cpu)
 +{
 +  int cpu2;
 +  int idle_count = 0;
 +
 +  struct cpumask *cpu_map = sched_domain_span(sd);
 +
 + unsigned long weight = cpumask_weight(cpu_map);
 + unsigned long smt_gain = sd-smt_gain;

More whitespace damage above.

 + if (cpu_has_feature(CPU_FTRS_POWER7)  weight == 4) {
 + for_each_cpu(cpu2, cpu_map) {
 + if (idle_cpu(cpu2))
 + idle_count++;
 + }

I'm not 100% sure about the use of the CPU feature above. First I wonder
if the right approach is to instead do something like

if (!cpu_has_feature(...) !! weigth  4)
return default_scale_smt_power(sd, cpu);

Though we may be better off using a ppc_md. hook here to avoid
calculating the weight etc... on processors that don't need any
of that.

I also dislike your naming. I would suggest you change cpu_map to
sibling_map() and cpu2 to sibling (or just c). One thing I wonder is how
sure we are that sched_domain_span() is always going to give us the
threads btw ? If we introduce another sched domain level for NUMA
purposes can't we get confused ?

Also, how hot is this code path ?

 + /* the following section attempts to tweak cpu power based
 +  * on current idleness of the threads dynamically at runtime
 +  */
 + if (idle_count == 2 || idle_count == 3 || idle_count == 4) {

if (idle_count  1) ? :-)

 + if (thread_in_smt4core(cpu) == 0 ||
 + thread_in_smt4core(cpu) == 1) {

int thread = cpu_thread_in_core(cpu);
if (thread  2)
...

 + /* add 75 % to thread power */
 + smt_gain += (smt_gain  1) + (smt_gain  2);
 + } else {
 +  /* subtract 75 % to thread power */
 + smt_gain = smt_gain  2;
 + }
 + }
 + }
 + /* default smt gain is 1178, weight is # of SMT threads */
 + smt_gain /= weight;
 +
 + return smt_gain;

Cheers,
Ben.

 +}
 Index: linux-2.6.git/kernel/sched_features.h
 ===
 --- linux-2.6.git.orig/kernel/sched_features.h
 +++ linux-2.6.git/kernel/sched_features.h
 @@ -107,7 +107,7 @@ SCHED_FEAT(CACHE_HOT_BUDDY, 1)
  /*
   * Use arch dependent cpu power functions
   */
 -SCHED_FEAT(ARCH_POWER, 0)
 +SCHED_FEAT(ARCH_POWER, 1)
  
  SCHED_FEAT(HRTICK, 0)
  SCHED_FEAT(DOUBLE_TICK, 0)
 


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7

2010-01-20 Thread Michael Neuling
  On Power7 processors running in SMT4 mode with 2, 3, or 4 idle threads 
  there is performance benefit to idling the higher numbered threads in
  the core.  
 
 So this is an actual performance improvement, not only power savings?

It's primarily a performance improvement.  Any power/energy savings
would be a bonus.  

Mikey
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7

2010-01-20 Thread Joel Schopp



+   if (cpu_has_feature(CPU_FTRS_POWER7)  weight == 4) {



I think we should avoid using cpu_has_feature like this.  It's better to
create a new feature and add it to POWER7 in the cputable, then check
for that here.

The way that it is now, I think any CPU that has superset of the POWER7
features, will be true here.  This is not what we want.
  

Any ideas for what to call this feature?  ASYM_SMT4 ?



+   smt_gain /= weight;



This results in a PPC div, when most of the time it's going to be a
power of two divide.  You've optimised the divides a few lines above
this, but not this one.  Some consistency would be good.

  
I can turn that into a conditional branch (case statement) with a shift 
for the common 1,2,4 cases which should cover all procs available today 
falling back to a divide for any theoretical future processors that do 
other numbers of threads.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7

2010-01-20 Thread Joel Schopp



+
+static inline int thread_in_smt4core(int x)
+{
+  return x % 4;
+}



Needs a whitespace here though I don't really like the above. Any reason
why you can't use the existing cpu_thread_in_core() ?
  

I will change it to cpu_thread_in_core()
  

+unsigned long arch_scale_smt_power(struct sched_domain *sd, int cpu)
+{
+  int cpu2;
+  int idle_count = 0;
+
+  struct cpumask *cpu_map = sched_domain_span(sd);
+
+   unsigned long weight = cpumask_weight(cpu_map);
+   unsigned long smt_gain = sd-smt_gain;



More whitespace damage above.
  

You are better than checkpatch.pl, will fix.
  

+   if (cpu_has_feature(CPU_FTRS_POWER7)  weight == 4) {
+   for_each_cpu(cpu2, cpu_map) {
+   if (idle_cpu(cpu2))
+   idle_count++;
+   }



I'm not 100% sure about the use of the CPU feature above. First I wonder
if the right approach is to instead do something like

if (!cpu_has_feature(...) !! weigth  4)
return default_scale_smt_power(sd, cpu);

Though we may be better off using a ppc_md. hook here to avoid
calculating the weight etc... on processors that don't need any
of that.

I also dislike your naming. I would suggest you change cpu_map to
sibling_map() and cpu2 to sibling (or just c). One thing I wonder is how
sure we are that sched_domain_span() is always going to give us the
threads btw ? If we introduce another sched domain level for NUMA
purposes can't we get confused ?
  
Right now it's 100% always giving us threads.  My development version of 
the patch had a BUG_ON() to check this.  I expect this to stay the case 
in the future as the name of the function is arch_scale_smt_power(), 
which clearly denotes threads are expected.


I am not stuck on the names, I'll change it to sibling instead of cpu2 
and sibling_map instead of cpu_map.  It seems clear to me either way.


As for testing the ! case it seems funcationally equivalent, and mine 
seems less confusing.


Having a ppc.md hook with exactly 1 user is pointless, especially since 
you'll still have to calculate the weight with the ability to 
dynamically disable smt.



Also, how hot is this code path ?
  
It's every load balance, which is to say not hot, but fairly frequent.  
I haven't been able to measure an impact from doing very hairy 
calculations (without actually changing the weights) here vs not having 
it at all in actual end workloads.
  

+   /* the following section attempts to tweak cpu power based
+* on current idleness of the threads dynamically at runtime
+*/
+   if (idle_count == 2 || idle_count == 3 || idle_count == 4) {



if (idle_count  1) ? :-)
  
Yes :)  Originally I had done different weightings for each of the 3 
cases, which gained in some workloads but regressed some others.   But 
since I'm not doing that anymore I'll fold it down to  1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7

2010-01-20 Thread Joel Schopp

Peter Zijlstra wrote:

On Wed, 2010-01-20 at 14:04 -0600, Joel Schopp wrote:
  
On Power7 processors running in SMT4 mode with 2, 3, or 4 idle threads 
there is performance benefit to idling the higher numbered threads in
the core.  



So this is an actual performance improvement, not only power savings?
  

Yes.




And you just wrecked x86 ;-)

It has an smt_power implementation that tries to measure smt gains using
aperf/mperf, trouble is that this represents the actual performance not
the capacity. This has the problem that when idle it represents 0
capacity and will not attract work.

Coming up with something that actually works there is on the todo list,
I was thinking perhaps temporal maximums from !idle.

So if you want to go with this, you'll need to stub out
arch/x86/kernel/cpu/sched.c
  


OK.  Guess I now will have a 3 patch series, with a patch to stub out 
the x86 broken version.


Care to take Gautham's bugfix patch (patch 1/2) now, since it just fixes 
a bug?  You'll need it if you ever try to make the x86 broken version work.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev