Re: [PATCH 0/5] sched/debug: decouple sched_stat tracepoints from CONFIG_SCHEDSTATS

2016-07-08 Thread Josh Poimboeuf
On Wed, Jun 29, 2016 at 12:29:58PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 28, 2016 at 02:43:36PM +0200, Peter Zijlstra wrote:
> 
> > Yeah, it's a bit of a pain in general...
> > 
> > A) perf stat --null --repeat 50 -- perf bench sched messaging -g 50 -l 5000 
> > | grep "seconds time elapsed"
> > B) perf stat --null --repeat 50 -- taskset 1 perf bench sched pipe | grep 
> > "seconds time elapsed"
> > 
> > 1) tip/master + 1-4
> > 2) tip/master + 1-5
> > 3) tip/master + 1-5 + below
> > 
> >     1           2           3
> > 
> > A)  4.627767855 4.650429917 4.646208062
> >     4.633921933 4.641424424 4.612021058
> >     4.649536375 4.663144144 4.636815948
> >     4.630165619 4.649053552 4.613022902
> > 
> > B)  1.770732957 1.789534273 1.773334291
> >     1.761740716 1.795618428 1.773338681
> >     1.763761666 1.822316496 1.774385589
> > 
> > 
> > From this it looks like patch 5 does hurt a wee bit, but we can get most
> > of that back by reordering the structure a bit. The results seem
> > 'stable' across rebuilds and reboots (I've popped all patches and
> > rebuilt, rebooted and re-benched 1 at the end and obtained similar
> > results).
> 
> Ha! So those numbers were with CONFIG_SCHEDSTATS=n :-/
> 
> 1) above 1 (4 patches, CONFIG_SCHEDSTATS=n, sysctl=0)
> 2) 1 + CONFIG_SCHEDSTATS=y (sysctl=0)
> 3) 2 + sysctl=1
> 4) above 3 (6 patches) + CONFIG_SCHEDSTATS=y (sysctl=0)
> 
> 
>     1           2           3           4
> 
> A)  4.620495664 4.788352823 4.862036428 4.623480512
>     4.628800053 4.792622881 4.855325525 4.613553872
>     4.611909507 4.794282178 4.850959761 4.613323142
>     4.608379522 4.787300153 4.822439864 4.597903070
> 
> B)  1.765668026 1.788374847 1.877803100 1.827213170
>     1.769379968 1.779881911 1.870091005 1.825335322
>     1.765822150 1.786251610 1.885874745 1.828218761
> 
> 
> Which looks good for hackbench, but still stinks for pipetest :/

I tried again on another system (Broadwell 2*10*2) and seemingly got
more consistent results, but the conclusions are a bit different from
yours.

I tested only with CONFIG_SCHEDSTATS=y, sysctl=0, because I think that
should be the most common configuration by far.

1) linus/master
2) linus/master + 1-4
3) linus/master + 1-5
4) linus/master + 1-5 + smp cacheline patch

A) perf stat --null --repeat 50 -- perf bench sched messaging -g 50 -l 5000
B) perf stat --null --repeat 50 -- taskset 1 perf bench sched pipe

    1           2           3           4

A)  6.335625627 6.299825679 6.317633969 6.305548464
    6.334188492 6.331391159 6.345195048 6.334608006
    6.345243359 6.329650737 6.328263309 6.304355127
    6.333154970 6.313859694 6.336338820 6.342374680

B)  2.310476138 2.324716175 2.355990033 2.350834083
    2.307231831 2.327946052 2.349816680 2.335581939
    2.303859470 2.317300965 2.347193526 2.333758084
    2.317224538 2.331390610 2.326164933 2.334235895

With patches 1-4, hackbench was slightly better and pipetest was
slightly worse.

With patches 1-5, hackbench was about the same or even slightly better
than baseline, and pipetest was 1-2% worse than baseline.

With your smp cacheline patch added, I didn't see a clear improvement.
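For concreteness, the "1-2% worse" pipetest figure can be recomputed from the B) column means above; a quick sketch (numbers copied from the table, comparing baseline against patches 1-5):

```python
# Mean pipetest runtimes from the B) table: config 1 (baseline) vs.
# config 3 (patches 1-5).
baseline = [2.310476138, 2.307231831, 2.303859470, 2.317224538]
patched  = [2.355990033, 2.349816680, 2.347193526, 2.326164933]

def mean(xs):
    return sum(xs) / len(xs)

# Percent increase in mean runtime relative to baseline.
regression_pct = (mean(patched) / mean(baseline) - 1) * 100
print(f"pipetest regression: {regression_pct:.2f}%")  # ~1.5%, i.e. within "1-2% worse"
```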

It would be nice to have the schedstat tracepoints always be functional,
but I suppose it's up to you and Ingo as to whether it's worth the
performance tradeoff.

Another option would be to only merge patches 1-4.

-- 
Josh


Re: [PATCH 0/5] sched/debug: decouple sched_stat tracepoints from CONFIG_SCHEDSTATS

2016-06-29 Thread Peter Zijlstra
On Tue, Jun 28, 2016 at 02:43:36PM +0200, Peter Zijlstra wrote:

> Yeah, it's a bit of a pain in general...
> 
> A) perf stat --null --repeat 50 -- perf bench sched messaging -g 50 -l 5000 | 
> grep "seconds time elapsed"
> B) perf stat --null --repeat 50 -- taskset 1 perf bench sched pipe | grep 
> "seconds time elapsed"
> 
> 1) tip/master + 1-4
> 2) tip/master + 1-5
> 3) tip/master + 1-5 + below
> 
>     1           2           3
> 
> A)  4.627767855 4.650429917 4.646208062
>     4.633921933 4.641424424 4.612021058
>     4.649536375 4.663144144 4.636815948
>     4.630165619 4.649053552 4.613022902
> 
> B)  1.770732957 1.789534273 1.773334291
>     1.761740716 1.795618428 1.773338681
>     1.763761666 1.822316496 1.774385589
> 
> 
> From this it looks like patch 5 does hurt a wee bit, but we can get most
> of that back by reordering the structure a bit. The results seem
> 'stable' across rebuilds and reboots (I've popped all patches and
> rebuilt, rebooted and re-benched 1 at the end and obtained similar
> results).

Ha! So those numbers were with CONFIG_SCHEDSTATS=n :-/

1) above 1 (4 patches, CONFIG_SCHEDSTATS=n, sysctl=0)
2) 1 + CONFIG_SCHEDSTATS=y (sysctl=0)
3) 2 + sysctl=1
4) above 3 (6 patches) + CONFIG_SCHEDSTATS=y (sysctl=0)


    1           2           3           4

A)  4.620495664 4.788352823 4.862036428 4.623480512
    4.628800053 4.792622881 4.855325525 4.613553872
    4.611909507 4.794282178 4.850959761 4.613323142
    4.608379522 4.787300153 4.822439864 4.597903070

B)  1.765668026 1.788374847 1.877803100 1.827213170
    1.769379968 1.779881911 1.870091005 1.825335322
    1.765822150 1.786251610 1.885874745 1.828218761


Which looks good for hackbench, but still stinks for pipetest :/
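Putting rough numbers on that conclusion: comparing the column 1 baseline against column 4 (all patches, stats compiled in, sysctl off) from the means of the table above; a quick sketch:

```python
# Column means from the table above: config 1 (CONFIG_SCHEDSTATS=n
# baseline) vs. config 4 (all six patches, CONFIG_SCHEDSTATS=y, sysctl=0).
hackbench_1 = [4.620495664, 4.628800053, 4.611909507, 4.608379522]
hackbench_4 = [4.623480512, 4.613553872, 4.613323142, 4.597903070]
pipetest_1  = [1.765668026, 1.769379968, 1.765822150]
pipetest_4  = [1.827213170, 1.825335322, 1.828218761]

def delta_pct(base, new):
    """Percent change of mean runtime relative to baseline (positive = slower)."""
    return (sum(new) / len(new)) / (sum(base) / len(base)) * 100 - 100

print(f"hackbench: {delta_pct(hackbench_1, hackbench_4):+.1f}%")  # ~ -0.1%
print(f"pipetest:  {delta_pct(pipetest_1, pipetest_4):+.1f}%")    # ~ +3.4%
```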



Re: [PATCH 0/5] sched/debug: decouple sched_stat tracepoints from CONFIG_SCHEDSTATS

2016-06-28 Thread Josh Poimboeuf
On Tue, Jun 28, 2016 at 02:43:36PM +0200, Peter Zijlstra wrote:
> On Fri, Jun 17, 2016 at 12:43:22PM -0500, Josh Poimboeuf wrote:
> > NOTE: I didn't include any performance numbers because I wasn't able to
> > get consistent results.  I tried the following on a Xeon E5-2420 v2 CPU:
> > 
> >   $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo 
> > -n performance > $i; done
> >   $ echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
> >   $ echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
> >   $ echo 0 > /proc/sys/kernel/nmi_watchdog
> >   $ taskset 0x10 perf stat -n -r10 perf bench sched pipe -l 100
> > 
> > I was going to post the numbers from that, both with and without
> > SCHEDSTATS, but then when I tried to repeat the test on a different day,
> > the results were surprisingly different, with different conclusions.
> > 
> > So any advice on measuring scheduler performance would be appreciated...
> 
> Yeah, it's a bit of a pain in general...
> 
> A) perf stat --null --repeat 50 -- perf bench sched messaging -g 50 -l 5000 | 
> grep "seconds time elapsed"
> B) perf stat --null --repeat 50 -- taskset 1 perf bench sched pipe | grep 
> "seconds time elapsed"
> 
> 1) tip/master + 1-4
> 2) tip/master + 1-5
> 3) tip/master + 1-5 + below
> 
>     1           2           3
> 
> A)  4.627767855 4.650429917 4.646208062
>     4.633921933 4.641424424 4.612021058
>     4.649536375 4.663144144 4.636815948
>     4.630165619 4.649053552 4.613022902
> 
> B)  1.770732957 1.789534273 1.773334291
>     1.761740716 1.795618428 1.773338681
>     1.763761666 1.822316496 1.774385589
> 
> 
> From this it looks like patch 5 does hurt a wee bit, but we can get most
> of that back by reordering the structure a bit. The results seem
> 'stable' across rebuilds and reboots (I've popped all patches and
> rebuilt, rebooted and re-benched 1 at the end and obtained similar
> results).
> 
> Although it's possible that if we reorder first and then do 5, we'll just
> see a bigger regression. I've not bothered.

Thanks a lot for benchmarking this!  And also for improving the cache
alignments.  Your changes look good to me.

-- 
Josh


Re: [PATCH 0/5] sched/debug: decouple sched_stat tracepoints from CONFIG_SCHEDSTATS

2016-06-28 Thread Peter Zijlstra
On Fri, Jun 17, 2016 at 12:43:22PM -0500, Josh Poimboeuf wrote:
> NOTE: I didn't include any performance numbers because I wasn't able to
> get consistent results.  I tried the following on a Xeon E5-2420 v2 CPU:
> 
>   $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo 
> -n performance > $i; done
>   $ echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
>   $ echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
>   $ echo 0 > /proc/sys/kernel/nmi_watchdog
>   $ taskset 0x10 perf stat -n -r10 perf bench sched pipe -l 100
> 
> I was going to post the numbers from that, both with and without
> SCHEDSTATS, but then when I tried to repeat the test on a different day,
> the results were surprisingly different, with different conclusions.
> 
> So any advice on measuring scheduler performance would be appreciated...

Yeah, it's a bit of a pain in general...

A) perf stat --null --repeat 50 -- perf bench sched messaging -g 50 -l 5000 | 
grep "seconds time elapsed"
B) perf stat --null --repeat 50 -- taskset 1 perf bench sched pipe | grep 
"seconds time elapsed"

1) tip/master + 1-4
2) tip/master + 1-5
3) tip/master + 1-5 + below

    1           2           3

A)  4.627767855 4.650429917 4.646208062
    4.633921933 4.641424424 4.612021058
    4.649536375 4.663144144 4.636815948
    4.630165619 4.649053552 4.613022902

B)  1.770732957 1.789534273 1.773334291
    1.761740716 1.795618428 1.773338681
    1.763761666 1.822316496 1.774385589


From this it looks like patch 5 does hurt a wee bit, but we can get most
of that back by reordering the structure a bit. The results seem
'stable' across rebuilds and reboots (I've popped all patches and
rebuilt, rebooted and re-benched 1 at the end and obtained similar
results).

Although it's possible that if we reorder first and then do 5, we'll just
see a bigger regression. I've not bothered.


---
 include/linux/sched.h |   33 +++--
 kernel/sched/core.c   |4 ++--
 kernel/sched/debug.c  |6 +++---
 3 files changed, 20 insertions(+), 23 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1220,7 +1220,7 @@ struct uts_namespace;
 struct load_weight {
unsigned long weight;
u32 inv_weight;
-};
+} __packed;
 
 /*
  * The load_avg/util_avg accumulates an infinite geometric series
@@ -1315,44 +1315,40 @@ struct sched_statistics {
 
 struct sched_entity {
struct load_weight  load;   /* for load-balancing */
+   unsigned int        on_rq;
struct rb_node  run_node;
struct list_head    group_node;
-   unsigned int        on_rq;
 
-   u64 exec_start;
+   u64 exec_start cacheline_aligned_in_smp;
u64 sum_exec_runtime;
u64 vruntime;
u64 prev_sum_exec_runtime;
-
-   u64 nr_migrations;
-
u64 wait_start;
u64 sleep_start;
u64 block_start;
 
+#ifdef CONFIG_SMP
+   /*
+* Per entity load average tracking.
+*/
+   struct sched_avg    avg cacheline_aligned_in_smp;
+#endif
 #ifdef CONFIG_SCHEDSTATS
struct sched_statistics statistics;
 #endif
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
-   int depth;
+   /*
+* mostly constant values, separate from modifications above
+*/
+   int depth cacheline_aligned_in_smp;
struct sched_entity *parent;
/* rq on which this entity is (to be) queued: */
struct cfs_rq   *cfs_rq;
/* rq "owned" by this entity/group: */
struct cfs_rq   *my_q;
 #endif
-
-#ifdef CONFIG_SMP
-   /*
-* Per entity load average tracking.
-*
-* Put into separate cache line so it does not
-* collide with read-mostly values above.
-*/
-   struct sched_avg    avg cacheline_aligned_in_smp;
-#endif
-};
+} cacheline_aligned_in_smp;
 
 struct sched_rt_entity {
struct list_head run_list;
@@ -1475,6 +1471,7 @@ struct task_struct {
int prio, static_prio, normal_prio;
unsigned int rt_priority;
const struct sched_class *sched_class;
+   u64 nr_migrations;
struct sched_entity se;
struct sched_rt_entity rt;
 #ifdef CONFIG_CGROUP_SCHED
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1239,7 +1239,7 @@ void set_task_cpu(struct task_struct *p,
if (task_cpu(p) != new_cpu) {
if (p->sched_class->migrate_task_rq)
p->sched_class->migrate_task_rq(p);
-   p->se.nr_migrations++;
+   p->nr_migrations++;
perf_event_task_migrate(p);
}
 
@@ -2167,7 

Re: [PATCH 0/5] sched/debug: decouple sched_stat tracepoints from CONFIG_SCHEDSTATS

2016-06-21 Thread Srikar Dronamraju
* Josh Poimboeuf  [2016-06-17 12:43:22]:

> NOTE: I didn't include any performance numbers because I wasn't able to
> get consistent results.  I tried the following on a Xeon E5-2420 v2 CPU:
> 
>   $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo 
> -n performance > $i; done
>   $ echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
>   $ echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
>   $ echo 0 > /proc/sys/kernel/nmi_watchdog
>   $ taskset 0x10 perf stat -n -r10 perf bench sched pipe -l 100
> 
> I was going to post the numbers from that, both with and without
> SCHEDSTATS, but then when I tried to repeat the test on a different day,
> the results were surprisingly different, with different conclusions.
> 
> So any advice on measuring scheduler performance would be appreciated...
> 
> Josh Poimboeuf (5):
>   sched/debug: rename and move enqueue_sleeper()
>   sched/debug: schedstat macro cleanup
>   sched/debug: 'schedstat_val()' -> 'schedstat_val_or_zero()'
>   sched/debug: remove several CONFIG_SCHEDSTATS guards
>   sched/debug: decouple 'sched_stat_*' tracepoints from
> CONFIG_SCHEDSTATS
> 

This patchset looks good to me.

Acked-by: Srikar Dronamraju 

-- 
Thanks and Regards
Srikar Dronamraju



[PATCH 0/5] sched/debug: decouple sched_stat tracepoints from CONFIG_SCHEDSTATS

2016-06-17 Thread Josh Poimboeuf
NOTE: I didn't include any performance numbers because I wasn't able to
get consistent results.  I tried the following on a Xeon E5-2420 v2 CPU:

  $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo -n 
performance > $i; done
  $ echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
  $ echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
  $ echo 0 > /proc/sys/kernel/nmi_watchdog
  $ taskset 0x10 perf stat -n -r10 perf bench sched pipe -l 100

I was going to post the numbers from that, both with and without
SCHEDSTATS, but then when I tried to repeat the test on a different day,
the results were surprisingly different, with different conclusions.

So any advice on measuring scheduler performance would be appreciated...
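For what it's worth, the knobs above plus the longer --repeat runs suggested upthread combine into one setup script. This is a sketch only: paths assume the intel_pstate driver as in the commands above, and it needs root:

```shell
#!/bin/sh
# Quiet the machine, pin frequency, then run a repeated timed benchmark.
# Assumes intel_pstate; other cpufreq drivers expose different knobs.
set -e

for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
	printf '%s' performance > "$g"
done
echo 1   > /sys/devices/system/cpu/intel_pstate/no_turbo
echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
echo 0   > /proc/sys/kernel/nmi_watchdog

# More repeats give perf stat a meaningful stddev across runs.
taskset 0x10 perf stat --null --repeat 50 -- \
	perf bench sched pipe -l 100 2>&1 | grep "seconds time elapsed"
```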

Josh Poimboeuf (5):
  sched/debug: rename and move enqueue_sleeper()
  sched/debug: schedstat macro cleanup
  sched/debug: 'schedstat_val()' -> 'schedstat_val_or_zero()'
  sched/debug: remove several CONFIG_SCHEDSTATS guards
  sched/debug: decouple 'sched_stat_*' tracepoints from
CONFIG_SCHEDSTATS

 include/linux/sched.h|  11 +-
 kernel/latencytop.c  |   2 -
 kernel/profile.c |   5 -
 kernel/sched/core.c  |  59 --
 kernel/sched/debug.c | 104 +
 kernel/sched/fair.c  | 290 ---
 kernel/sched/idle_task.c |   2 +-
 kernel/sched/stats.h |  24 ++--
 lib/Kconfig.debug|   1 -
 9 files changed, 220 insertions(+), 278 deletions(-)

-- 
2.4.11


