Re: [PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7
Benjamin Herrenschmidt wrote:
> On Wed, 2010-01-20 at 16:09 -0600, Joel Schopp wrote:
>> I can turn that into a conditional branch (case statement) with a shift
>> for the common 1,2,4 cases, which should cover all procs available
>> today, falling back to a divide for any theoretical future processors
>> that do other numbers of threads.
>
> Look at the cputhreads.h implementation ... Today we only support
> power-of-two numbers of threads.

I've run 3 threads using cpu hotplug to offline 1 of the 4. It's
certainly a stupid idea, but there you go.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7
On Mon, 2010-01-25 at 11:50 -0600, Joel Schopp wrote:
>> Look at the cputhreads.h implementation ... Today we only support
>> power-of-two numbers of threads.
>
> I've run 3 threads using cpu hotplug to offline 1 of the 4. It's
> certainly a stupid idea, but there you go.

Oh, you mean you need to use the actual online count? In which case,
yes, you do indeed need to be a bit more careful... In this case though,
you're probably better off special casing 3 :-)

Cheers,
Ben.
Re: [PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7
On Wed, 2010-01-20 at 16:09 -0600, Joel Schopp wrote:
> I can turn that into a conditional branch (case statement) with a shift
> for the common 1,2,4 cases, which should cover all procs available
> today, falling back to a divide for any theoretical future processors
> that do other numbers of threads.

Look at the cputhreads.h implementation ... Today we only support
power-of-two numbers of threads.

Cheers,
Ben.
Re: [PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7
On Wed, 2010-01-20 at 16:44 -0600, Joel Schopp wrote:
> Care to take Gautham's bugfix patch (patch 1/2) now, since it just
> fixes a bug? You'll need it if you ever try to make the x86 broken
> version work.

Sure, I'll take that, thanks!
[PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7
On Power7 processors running in SMT4 mode with 2, 3, or 4 idle threads
there is a performance benefit to idling the higher-numbered threads in
the core. This patch implements arch_scale_smt_power to dynamically
update SMT thread power in these idle cases in order to prefer threads
0,1 over threads 2,3 within a core.

Signed-off-by: Joel Schopp <jsch...@austin.ibm.com>
---
Index: linux-2.6.git/arch/powerpc/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/powerpc/kernel/smp.c
+++ linux-2.6.git/arch/powerpc/kernel/smp.c
@@ -617,3 +617,44 @@ void __cpu_die(unsigned int cpu)
 		smp_ops->cpu_die(cpu);
 }
 #endif
+
+static inline int thread_in_smt4core(int x)
+{
+	return x % 4;
+}
+unsigned long arch_scale_smt_power(struct sched_domain *sd, int cpu)
+{
+	int cpu2;
+	int idle_count = 0;
+
+	struct cpumask *cpu_map = sched_domain_span(sd);
+
+	unsigned long weight = cpumask_weight(cpu_map);
+	unsigned long smt_gain = sd->smt_gain;
+
+	if (cpu_has_feature(CPU_FTRS_POWER7) && weight == 4) {
+		for_each_cpu(cpu2, cpu_map) {
+			if (idle_cpu(cpu2))
+				idle_count++;
+		}
+
+		/* the following section attempts to tweak cpu power based
+		 * on current idleness of the threads dynamically at runtime
+		 */
+		if (idle_count == 2 || idle_count == 3 || idle_count == 4) {
+			if (thread_in_smt4core(cpu) == 0 ||
+			    thread_in_smt4core(cpu) == 1) {
+				/* add 75% to thread power */
+				smt_gain += (smt_gain >> 1) + (smt_gain >> 2);
+			} else {
+				/* subtract 75% from thread power */
+				smt_gain = smt_gain >> 2;
+			}
+		}
+	}
+	/* default smt gain is 1178, weight is # of SMT threads */
+	smt_gain /= weight;
+
+	return smt_gain;
+
+}
Index: linux-2.6.git/kernel/sched_features.h
===================================================================
--- linux-2.6.git.orig/kernel/sched_features.h
+++ linux-2.6.git/kernel/sched_features.h
@@ -107,7 +107,7 @@ SCHED_FEAT(CACHE_HOT_BUDDY, 1)
 /*
  * Use arch dependent cpu power functions
  */
-SCHED_FEAT(ARCH_POWER, 0)
+SCHED_FEAT(ARCH_POWER, 1)
 SCHED_FEAT(HRTICK, 0)
 SCHED_FEAT(DOUBLE_TICK, 0)
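The arithmetic in the patch can be checked outside the kernel. Below is a minimal userspace sketch of the same calculation (the standalone function name and parameters are assumptions for illustration; the kernel version operates on sd->smt_gain and cpumasks): with the default smt_gain of 1178 on an SMT4 core with 2+ idle threads, threads 0-1 advertise (1178 + 589 + 294) / 4 = 515 while threads 2-3 advertise (1178 >> 2) / 4 = 73, so the load balancer prefers threads 0-1.

```c
#include <assert.h>

/* Userspace model of the patch's per-thread power calculation for one
 * SMT4 core. "thread" is the thread number within the core (0..3) and
 * "idle_count" is the number of idle siblings in the core. */
static unsigned long scaled_smt_power(unsigned long smt_gain,
				      unsigned long weight,
				      int thread, int idle_count)
{
	if (weight == 4 && idle_count >= 2) {
		if (thread == 0 || thread == 1)
			/* add 75% to thread power: x + x/2 + x/4 */
			smt_gain += (smt_gain >> 1) + (smt_gain >> 2);
		else
			/* subtract 75% from thread power: x/4 */
			smt_gain = smt_gain >> 2;
	}
	/* weight is the number of SMT threads in the domain */
	return smt_gain / weight;
}
```

With fewer than 2 idle threads the gain is left untouched and every thread reports the default 1178 / 4 = 294.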
Re: [PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7
On Wed, 2010-01-20 at 14:04 -0600, Joel Schopp wrote:
> On Power7 processors running in SMT4 mode with 2, 3, or 4 idle threads
> there is a performance benefit to idling the higher-numbered threads in
> the core.

So this is an actual performance improvement, not only power savings?

> This patch implements arch_scale_smt_power to dynamically update SMT
> thread power in these idle cases in order to prefer threads 0,1 over
> threads 2,3 within a core.
>
> Signed-off-by: Joel Schopp <jsch...@austin.ibm.com>
> ---
> Index: linux-2.6.git/arch/powerpc/kernel/smp.c
> ===================================================================
> --- linux-2.6.git.orig/arch/powerpc/kernel/smp.c
> +++ linux-2.6.git/arch/powerpc/kernel/smp.c
> @@ -617,3 +617,44 @@ void __cpu_die(unsigned int cpu)
>  		smp_ops->cpu_die(cpu);
>  }
>  #endif
> +
> +static inline int thread_in_smt4core(int x)
> +{
> +	return x % 4;
> +}
> +unsigned long arch_scale_smt_power(struct sched_domain *sd, int cpu)
> +{
> +	int cpu2;
> +	int idle_count = 0;
> +
> +	struct cpumask *cpu_map = sched_domain_span(sd);
> +
> +	unsigned long weight = cpumask_weight(cpu_map);
> +	unsigned long smt_gain = sd->smt_gain;
> +
> +	if (cpu_has_feature(CPU_FTRS_POWER7) && weight == 4) {
> +		for_each_cpu(cpu2, cpu_map) {
> +			if (idle_cpu(cpu2))
> +				idle_count++;
> +		}
> +
> +		/* the following section attempts to tweak cpu power based
> +		 * on current idleness of the threads dynamically at runtime
> +		 */
> +		if (idle_count == 2 || idle_count == 3 || idle_count == 4) {
> +			if (thread_in_smt4core(cpu) == 0 ||
> +			    thread_in_smt4core(cpu) == 1) {
> +				/* add 75% to thread power */
> +				smt_gain += (smt_gain >> 1) + (smt_gain >> 2);
> +			} else {
> +				/* subtract 75% from thread power */
> +				smt_gain = smt_gain >> 2;
> +			}
> +		}
> +	}
> +	/* default smt gain is 1178, weight is # of SMT threads */
> +	smt_gain /= weight;
> +
> +	return smt_gain;
> +
> +}

This looks to suffer significant whitespace damage.

The design goal for smt_power was to be able to actually measure the
processing gains from SMT and feed that into the scheduler, not really
placement tricks like this.

Now I also heard AMD might want to have something similar to this,
something to do with powerlines and die layout. I'm not sure playing
games with cpu_power is the best solution, or if simply moving tasks to
lower numbered cpus using an SD_flag is the best solution for these
kinds of things.

> Index: linux-2.6.git/kernel/sched_features.h
> ===================================================================
> --- linux-2.6.git.orig/kernel/sched_features.h
> +++ linux-2.6.git/kernel/sched_features.h
> @@ -107,7 +107,7 @@ SCHED_FEAT(CACHE_HOT_BUDDY, 1)
>  /*
>   * Use arch dependent cpu power functions
>   */
> -SCHED_FEAT(ARCH_POWER, 0)
> +SCHED_FEAT(ARCH_POWER, 1)
>  SCHED_FEAT(HRTICK, 0)
>  SCHED_FEAT(DOUBLE_TICK, 0)

And you just wrecked x86 ;-) It has an smt_power implementation that
tries to measure SMT gains using aperf/mperf; trouble is that this
represents the actual performance, not the capacity. This has the
problem that when idle it represents 0 capacity and will not attract
work. Coming up with something that actually works there is on the todo
list; I was thinking perhaps temporal maximums from !idle.

So if you want to go with this, you'll need to stub out
arch/x86/kernel/cpu/sched.c
Re: [PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7
> On Power7 processors running in SMT4 mode with 2, 3, or 4 idle threads
> there is a performance benefit to idling the higher-numbered threads in
> the core. This patch implements arch_scale_smt_power to dynamically
> update SMT thread power in these idle cases in order to prefer threads
> 0,1 over threads 2,3 within a core.
>
> Signed-off-by: Joel Schopp <jsch...@austin.ibm.com>
> ---
> Index: linux-2.6.git/arch/powerpc/kernel/smp.c
> ===================================================================
> --- linux-2.6.git.orig/arch/powerpc/kernel/smp.c
> +++ linux-2.6.git/arch/powerpc/kernel/smp.c
> @@ -617,3 +617,44 @@ void __cpu_die(unsigned int cpu)
>  		smp_ops->cpu_die(cpu);
>  }
>  #endif
> +
> +static inline int thread_in_smt4core(int x)
> +{
> +	return x % 4;
> +}
> +unsigned long arch_scale_smt_power(struct sched_domain *sd, int cpu)
> +{
> +	int cpu2;
> +	int idle_count = 0;
> +
> +	struct cpumask *cpu_map = sched_domain_span(sd);
> +
> +	unsigned long weight = cpumask_weight(cpu_map);
> +	unsigned long smt_gain = sd->smt_gain;
> +
> +	if (cpu_has_feature(CPU_FTRS_POWER7) && weight == 4) {

I think we should avoid using cpu_has_feature like this. It's better to
create a new feature and add it to POWER7 in the cputable, then check
for that here. The way that it is now, I think any CPU that has a
superset of the POWER7 features will be true here. This is not what we
want.

> +		for_each_cpu(cpu2, cpu_map) {
> +			if (idle_cpu(cpu2))
> +				idle_count++;
> +		}
> +
> +		/* the following section attempts to tweak cpu power based
> +		 * on current idleness of the threads dynamically at runtime
> +		 */
> +		if (idle_count == 2 || idle_count == 3 || idle_count == 4) {
> +			if (thread_in_smt4core(cpu) == 0 ||
> +			    thread_in_smt4core(cpu) == 1) {
> +				/* add 75% to thread power */
> +				smt_gain += (smt_gain >> 1) + (smt_gain >> 2);
> +			} else {
> +				/* subtract 75% from thread power */
> +				smt_gain = smt_gain >> 2;
> +			}
> +		}
> +	}
> +	/* default smt gain is 1178, weight is # of SMT threads */
> +	smt_gain /= weight;

This results in a PPC div, when most of the time it's going to be a
power of two divide. You've optimised the divides a few lines above
this, but not this one. Some consistency would be good.

Mikey

> +
> +	return smt_gain;
> +
> +}
> Index: linux-2.6.git/kernel/sched_features.h
> ===================================================================
> --- linux-2.6.git.orig/kernel/sched_features.h
> +++ linux-2.6.git/kernel/sched_features.h
> @@ -107,7 +107,7 @@ SCHED_FEAT(CACHE_HOT_BUDDY, 1)
>  /*
>   * Use arch dependent cpu power functions
>   */
> -SCHED_FEAT(ARCH_POWER, 0)
> +SCHED_FEAT(ARCH_POWER, 1)
>  SCHED_FEAT(HRTICK, 0)
>  SCHED_FEAT(DOUBLE_TICK, 0)
Re: [PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7
On Wed, 2010-01-20 at 14:04 -0600, Joel Schopp wrote:
> On Power7 processors running in SMT4 mode with 2, 3, or 4 idle threads
> there is a performance benefit to idling the higher-numbered threads in
> the core. This patch implements arch_scale_smt_power to dynamically
> update SMT thread power in these idle cases in order to prefer threads
> 0,1 over threads 2,3 within a core.
>
> Signed-off-by: Joel Schopp <jsch...@austin.ibm.com>

So I'll leave Peter to deal with the scheduler aspects and will focus on
details :-)

> ---
> Index: linux-2.6.git/arch/powerpc/kernel/smp.c
> ===================================================================
> --- linux-2.6.git.orig/arch/powerpc/kernel/smp.c
> +++ linux-2.6.git/arch/powerpc/kernel/smp.c
> @@ -617,3 +617,44 @@ void __cpu_die(unsigned int cpu)
>  		smp_ops->cpu_die(cpu);
>  }
>  #endif
> +
> +static inline int thread_in_smt4core(int x)
> +{
> +	return x % 4;
> +}

Needs a whitespace here, though I don't really like the above. Any
reason why you can't use the existing cpu_thread_in_core()?

> +unsigned long arch_scale_smt_power(struct sched_domain *sd, int cpu)
> +{
> +	int cpu2;
> +	int idle_count = 0;
> +
> +	struct cpumask *cpu_map = sched_domain_span(sd);
> +
> +	unsigned long weight = cpumask_weight(cpu_map);
> +	unsigned long smt_gain = sd->smt_gain;

More whitespace damage above.

> +	if (cpu_has_feature(CPU_FTRS_POWER7) && weight == 4) {
> +		for_each_cpu(cpu2, cpu_map) {
> +			if (idle_cpu(cpu2))
> +				idle_count++;
> +		}

I'm not 100% sure about the use of the CPU feature above. First I wonder
if the right approach is to instead do something like

	if (!cpu_has_feature(...) || weight != 4)
		return default_scale_smt_power(sd, cpu);

Though we may be better off using a ppc_md. hook here to avoid
calculating the weight etc... on processors that don't need any of that.

I also dislike your naming. I would suggest you change cpu_map to
sibling_map and cpu2 to sibling (or just c).

One thing I wonder is how sure we are that sched_domain_span() is always
going to give us the threads btw? If we introduce another sched domain
level for NUMA purposes, can't we get confused?

Also, how hot is this code path?

> +		/* the following section attempts to tweak cpu power based
> +		 * on current idleness of the threads dynamically at runtime
> +		 */
> +		if (idle_count == 2 || idle_count == 3 || idle_count == 4) {

if (idle_count > 1) ? :-)

> +			if (thread_in_smt4core(cpu) == 0 ||
> +			    thread_in_smt4core(cpu) == 1) {

	int thread = cpu_thread_in_core(cpu);
	if (thread < 2)
		...

> +				/* add 75% to thread power */
> +				smt_gain += (smt_gain >> 1) + (smt_gain >> 2);
> +			} else {
> +				/* subtract 75% from thread power */
> +				smt_gain = smt_gain >> 2;
> +			}
> +		}
> +	}
> +	/* default smt gain is 1178, weight is # of SMT threads */
> +	smt_gain /= weight;
> +
> +	return smt_gain;

Cheers,
Ben.

> +}
> Index: linux-2.6.git/kernel/sched_features.h
> ===================================================================
> --- linux-2.6.git.orig/kernel/sched_features.h
> +++ linux-2.6.git/kernel/sched_features.h
> @@ -107,7 +107,7 @@ SCHED_FEAT(CACHE_HOT_BUDDY, 1)
>  /*
>   * Use arch dependent cpu power functions
>   */
> -SCHED_FEAT(ARCH_POWER, 0)
> +SCHED_FEAT(ARCH_POWER, 1)
>  SCHED_FEAT(HRTICK, 0)
>  SCHED_FEAT(DOUBLE_TICK, 0)
Re: [PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7
> > On Power7 processors running in SMT4 mode with 2, 3, or 4 idle
> > threads there is a performance benefit to idling the higher-numbered
> > threads in the core.
>
> So this is an actual performance improvement, not only power savings?

It's primarily a performance improvement. Any power/energy savings
would be a bonus.

Mikey
Re: [PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7
> > +	if (cpu_has_feature(CPU_FTRS_POWER7) && weight == 4) {
>
> I think we should avoid using cpu_has_feature like this. It's better
> to create a new feature and add it to POWER7 in the cputable, then
> check for that here. The way that it is now, I think any CPU that has
> a superset of the POWER7 features will be true here. This is not what
> we want.

Any ideas for what to call this feature? ASYM_SMT4?

> > +	smt_gain /= weight;
>
> This results in a PPC div, when most of the time it's going to be a
> power of two divide. You've optimised the divides a few lines above
> this, but not this one. Some consistency would be good.

I can turn that into a conditional branch (case statement) with a shift
for the common 1,2,4 cases, which should cover all procs available
today, falling back to a divide for any theoretical future processors
that do other numbers of threads.
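Joel's proposed fallback could look roughly like the following userspace sketch (the standalone function name is an assumption; the real change would live inside arch_scale_smt_power): shifts for the common power-of-two thread counts, with a true divide only for other counts, such as 3 threads after one of 4 is offlined via cpu hotplug.

```c
#include <assert.h>

/* Sketch of the proposed conditional branch: replace smt_gain / weight
 * with shifts for the common 1, 2, and 4 thread cases, falling back to
 * a divide for any other thread count. */
static unsigned long scale_by_weight(unsigned long smt_gain,
				     unsigned long weight)
{
	switch (weight) {
	case 1:
		return smt_gain;
	case 2:
		return smt_gain >> 1;
	case 4:
		return smt_gain >> 2;
	default:
		/* e.g. 3 threads after offlining one of 4 via hotplug */
		return smt_gain / weight;
	}
}
```

On PowerPC this avoids a hardware divide on the fast paths while still handling non-power-of-two online counts correctly.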
Re: [PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7
> > +static inline int thread_in_smt4core(int x)
> > +{
> > +	return x % 4;
> > +}
>
> Needs a whitespace here, though I don't really like the above. Any
> reason why you can't use the existing cpu_thread_in_core()?

I will change it to cpu_thread_in_core().

> > +unsigned long arch_scale_smt_power(struct sched_domain *sd, int cpu)
> > +{
> > +	int cpu2;
> > +	int idle_count = 0;
> > +
> > +	struct cpumask *cpu_map = sched_domain_span(sd);
> > +
> > +	unsigned long weight = cpumask_weight(cpu_map);
> > +	unsigned long smt_gain = sd->smt_gain;
>
> More whitespace damage above.

You are better than checkpatch.pl, will fix.

> > +	if (cpu_has_feature(CPU_FTRS_POWER7) && weight == 4) {
> > +		for_each_cpu(cpu2, cpu_map) {
> > +			if (idle_cpu(cpu2))
> > +				idle_count++;
> > +		}
>
> I'm not 100% sure about the use of the CPU feature above. First I
> wonder if the right approach is to instead do something like
>
> 	if (!cpu_has_feature(...) || weight != 4)
> 		return default_scale_smt_power(sd, cpu);
>
> Though we may be better off using a ppc_md. hook here to avoid
> calculating the weight etc... on processors that don't need any of
> that.
>
> I also dislike your naming. I would suggest you change cpu_map to
> sibling_map and cpu2 to sibling (or just c).
>
> One thing I wonder is how sure we are that sched_domain_span() is
> always going to give us the threads btw? If we introduce another
> sched domain level for NUMA purposes, can't we get confused?

Right now it's 100% always giving us threads. My development version of
the patch had a BUG_ON() to check this. I expect this to stay the case
in the future, as the name of the function is arch_scale_smt_power(),
which clearly denotes that threads are expected.

I am not stuck on the names; I'll change it to sibling instead of cpu2
and sibling_map instead of cpu_map. It seems clear to me either way.

As for testing the ! case, it seems functionally equivalent, and mine
seems less confusing. Having a ppc_md. hook with exactly 1 user is
pointless, especially since you'll still have to calculate the weight
with the ability to dynamically disable smt.

> Also, how hot is this code path?

It's every load balance, which is to say not hot, but fairly frequent.
I haven't been able to measure an impact from doing very hairy
calculations (without actually changing the weights) here vs not having
it at all in actual end workloads.

> > +		/* the following section attempts to tweak cpu power based
> > +		 * on current idleness of the threads dynamically at runtime
> > +		 */
> > +		if (idle_count == 2 || idle_count == 3 || idle_count == 4) {
>
> if (idle_count > 1) ? :-)

Yes :) Originally I had done different weightings for each of the 3
cases, which gained in some workloads but regressed some others. But
since I'm not doing that anymore I'll fold it down to 1.
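Folding the review suggestions together (a single idle_count > 1 test, an early return, and cpu_thread_in_core()-style thread numbering), the reworked logic might look roughly like this userspace sketch. This is an illustration under stated assumptions, not the actual follow-up patch: the function and helper names are invented, and the modulo-4 helper merely stands in for the kernel's cpu_thread_in_core().

```c
#include <assert.h>

/* Stand-in for the kernel's cpu_thread_in_core() on an SMT4 core. */
static int thread_in_core(int cpu)
{
	return cpu % 4;
}

/* Userspace model of the reworked calculation after review: early
 * return for the non-SMT4 case, one idle_count > 1 test, thread < 2
 * to pick out threads 0 and 1. */
static unsigned long reworked_smt_power(unsigned long smt_gain,
					unsigned long weight,
					int cpu, int idle_count)
{
	if (weight != 4)			/* early return, per review */
		return smt_gain / weight;

	if (idle_count > 1) {			/* folds the 2/3/4 cases */
		if (thread_in_core(cpu) < 2)
			smt_gain += (smt_gain >> 1) + (smt_gain >> 2);
		else
			smt_gain >>= 2;
	}
	return smt_gain >> 2;			/* weight == 4 */
}
```

The results match the original patch's arithmetic while shortening the control flow.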
Re: [PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7
Peter Zijlstra wrote:
> On Wed, 2010-01-20 at 14:04 -0600, Joel Schopp wrote:
> > On Power7 processors running in SMT4 mode with 2, 3, or 4 idle
> > threads there is a performance benefit to idling the higher-numbered
> > threads in the core.
>
> So this is an actual performance improvement, not only power savings?

Yes.

> And you just wrecked x86 ;-) It has an smt_power implementation that
> tries to measure smt gains using aperf/mperf; trouble is that this
> represents the actual performance, not the capacity. This has the
> problem that when idle it represents 0 capacity and will not attract
> work. Coming up with something that actually works there is on the
> todo list; I was thinking perhaps temporal maximums from !idle.
>
> So if you want to go with this, you'll need to stub out
> arch/x86/kernel/cpu/sched.c

OK. Guess I now will have a 3 patch series, with a patch to stub out
the x86 broken version. Care to take Gautham's bugfix patch (patch 1/2)
now, since it just fixes a bug? You'll need it if you ever try to make
the x86 broken version work.