Re: 4.3 group scheduling regression

2015-10-13 Thread Yuyang Du
On Tue, Oct 13, 2015 at 10:10:23AM +0200, Peter Zijlstra wrote:
> On Tue, Oct 13, 2015 at 10:06:48AM +0200, Peter Zijlstra wrote:
> > On Tue, Oct 13, 2015 at 03:55:17AM +0800, Yuyang Du wrote:
> > 
> > > I think maybe the real disease is that tg->load_avg is not updated in
> > > time, i.e., after a migrate, the source cfs_rq does not decrease its
> > > contribution to the parent's tg->load_avg fast enough.
> > 
> > No, using the load_avg for shares calculation seems wrong; that would
> > mean we'd first have to ramp up the avg before you react.
> > 
> > You want to react quickly to actual load changes, esp. going up.
> > 
> > We use the avg to guess the global group load, since that's the best
> > compromise we have, but locally it doesn't make sense to use the avg if
> > we have the actual values.
> 
> That is, can you send the original patch with a Changelog etc.. so that
> I can press 'A' :-)

Sure, in minutes, :)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 4.3 group scheduling regression

2015-10-13 Thread Yuyang Du
On Tue, Oct 13, 2015 at 10:06:48AM +0200, Peter Zijlstra wrote:
> On Tue, Oct 13, 2015 at 03:55:17AM +0800, Yuyang Du wrote:
> 
> > I think maybe the real disease is that tg->load_avg is not updated in
> > time, i.e., after a migrate, the source cfs_rq does not decrease its
> > contribution to the parent's tg->load_avg fast enough.
> 
> No, using the load_avg for shares calculation seems wrong; that would
> mean we'd first have to ramp up the avg before you react.
> 
> You want to react quickly to actual load changes, esp. going up.
> 
> We use the avg to guess the global group load, since that's the best
> compromise we have, but locally it doesn't make sense to use the avg if
> we have the actual values.

In Mike's case, since the mplayer group has only one active task, after
the task migrates, the source cfs_rq should have zero contrib to the
tg, so at the destination, the group entity should have the entire tg's
share. The question is just whether the zeroing can happen as fast as we need.

But yes, in the general case the load_avg (which includes the blocked load)
is likely to lag behind. Using the actual load.weight to accelerate the
process makes sense. It is especially helpful to the less hungry tasks.


Re: 4.3 group scheduling regression

2015-10-13 Thread Peter Zijlstra
On Tue, Oct 13, 2015 at 10:06:48AM +0200, Peter Zijlstra wrote:
> On Tue, Oct 13, 2015 at 03:55:17AM +0800, Yuyang Du wrote:
> 
> > I think maybe the real disease is that tg->load_avg is not updated in
> > time, i.e., after a migrate, the source cfs_rq does not decrease its
> > contribution to the parent's tg->load_avg fast enough.
> 
> No, using the load_avg for shares calculation seems wrong; that would
> mean we'd first have to ramp up the avg before you react.
> 
> You want to react quickly to actual load changes, esp. going up.
> 
> We use the avg to guess the global group load, since that's the best
> compromise we have, but locally it doesn't make sense to use the avg if
> we have the actual values.

That is, can you send the original patch with a Changelog etc.. so that
I can press 'A' :-)


Re: 4.3 group scheduling regression

2015-10-13 Thread Peter Zijlstra
On Tue, Oct 13, 2015 at 03:32:47AM +0800, Yuyang Du wrote:
> On Mon, Oct 12, 2015 at 01:47:23PM +0200, Peter Zijlstra wrote:
> > 
> > Also, should we do the below? At this point se->on_rq is still 0 so
> > reweight_entity() will not update (dequeue/enqueue) the accounting, but
> > we'll have just accounted the 'old' load.weight.
> > 
> > Doing it this way around we'll first update the weight and then account
> > it, which seems more accurate.
>  
> I think the original looks ok.
> 
> The account_entity_enqueue() adds child entity's load.weight to parent's load:
> 
> update_load_add(&cfs_rq->load, se->load.weight)
> 
> Then recalculate the shares.
> 
> Then reweight_entity() resets the parent entity's load.weight.

Yes, some days I should just not be allowed near a keyboard :)


Re: 4.3 group scheduling regression

2015-10-13 Thread Peter Zijlstra
On Tue, Oct 13, 2015 at 03:55:17AM +0800, Yuyang Du wrote:

> I think maybe the real disease is that tg->load_avg is not updated in
> time, i.e., after a migrate, the source cfs_rq does not decrease its
> contribution to the parent's tg->load_avg fast enough.

No, using the load_avg for shares calculation seems wrong; that would
mean we'd first have to ramp up the avg before you react.

You want to react quickly to actual load changes, esp. going up.

We use the avg to guess the global group load, since that's the best
compromise we have, but locally it doesn't make sense to use the avg if
we have the actual values.


Re: 4.3 group scheduling regression

2015-10-12 Thread Yuyang Du
On Tue, Oct 13, 2015 at 06:08:34AM +0200, Mike Galbraith wrote:
> It sounded like you wanted me to run the below alone.  If so, it's a nogo.
  
Yes, thanks.

Then it is the sad fact that after the migrate, when removed_load_avg is
added in migrate_task_rq_fair(), we don't get a chance to update the tg
fast enough for mplayer to be weighted with the group's share at the
destination.

> ---------------------------------------------------------------------------------------------------------------
>   Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at     |
> ---------------------------------------------------------------------------------------------------------------
>   oink:(8)              | 787001.236 ms |    21641 | avg:    0.377 ms | max:   21.991 ms | max at:  51.504005 s
>   mplayer:(25)          |   4256.224 ms |     7264 | avg:   19.698 ms | max: 2087.489 ms | max at: 115.294922 s
>   Xorg:1011             |   1507.958 ms |     4081 | avg:    8.349 ms | max: 1652.200 ms | max at: 126.908021 s
>   konsole:1752          |    697.806 ms |     1186 | avg:    5.749 ms | max:  160.189 ms | max at:  53.037952 s
>   testo:(9)             |    438.164 ms |     2551 | avg:    6.616 ms | max:  215.527 ms | max at: 117.302455 s
>   plasma-desktop:1716   |    280.418 ms |     1624 | avg:    3.701 ms | max:  574.806 ms | max at:  53.582261 s
>   kwin:1708             |    144.986 ms |     2422 | avg:    3.301 ms | max:  315.707 ms | max at: 116.555721 s
> 
> > --
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 4df37a4..3dba883 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -2686,12 +2686,13 @@ static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
> >  static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
> >  {
> >  	struct sched_avg *sa = &cfs_rq->avg;
> > -	int decayed;
> > +	int decayed, updated = 0;
> >  
> >  	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
> >  		long r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
> >  		sa->load_avg = max_t(long, sa->load_avg - r, 0);
> >  		sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
> > +		updated = 1;
> >  	}
> >  
> >  	if (atomic_long_read(&cfs_rq->removed_util_avg)) {
> > @@ -2708,7 +2709,7 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
> >  	cfs_rq->load_last_update_time_copy = sa->last_update_time;
> >  #endif
> >  
> > -	return decayed;
> > +	return decayed | updated;

A typo: decayed || updated, but shouldn't make any difference.

> >  }
> >  
> >  /* Update task and its cfs_rq load average */
> 


Re: 4.3 group scheduling regression

2015-10-12 Thread Mike Galbraith
On Tue, 2015-10-13 at 03:55 +0800, Yuyang Du wrote:
> On Mon, Oct 12, 2015 at 12:23:31PM +0200, Mike Galbraith wrote:
> > On Mon, 2015-10-12 at 10:12 +0800, Yuyang Du wrote:
> > 
> > > I am guessing it is in calc_tg_weight(), and naughty boys do make them 
> > > more
> > > favored, what a reality...
> > > 
> > > Mike, beg you test the following?
> > 
> > Wow, that was quick.  Dinky patch made it all better.
> > 
> > ---------------------------------------------------------------------------------------------------------------
> >   Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at     |
> > ---------------------------------------------------------------------------------------------------------------
> >   oink:(8)              | 739056.970 ms |    27270 | avg:    2.043 ms | max:   29.105 ms | max at: 339.988310 s
> >   mplayer:(25)          |  36448.997 ms |    44670 | avg:    1.886 ms | max:   72.808 ms | max at: 302.153121 s
> >   Xorg:988              |  13334.908 ms |    22210 | avg:    0.081 ms | max:   25.005 ms | max at: 269.068666 s
> >   testo:(9)             |   2558.540 ms |    13703 | avg:    0.124 ms | max:    6.412 ms | max at: 279.235272 s
> >   konsole:1781          |   1084.316 ms |     1457 | avg:    0.006 ms | max:    1.039 ms | max at: 268.863379 s
> >   kwin:1734             |    879.645 ms |    17855 | avg:    0.458 ms | max:   15.788 ms | max at: 268.854992 s
> >   pulseaudio:1808       |    356.334 ms |    15023 | avg:    0.028 ms | max:    6.134 ms | max at: 324.479766 s
> >   threaded-ml:3483      |    292.782 ms |    25769 | avg:    0.364 ms | max:   40.387 ms | max at: 294.550515 s
> >   plasma-desktop:1745   |    265.055 ms |     1470 | avg:    0.102 ms | max:   21.886 ms | max at: 267.724902 s
> >   perf:3439             |     61.677 ms |        2 | avg:    0.117 ms | max:    0.232 ms | max at: 367.043889 s
> 
> Phew...
> 
> I think maybe the real disease is that tg->load_avg is not updated in
> time, i.e., after a migrate, the source cfs_rq does not decrease its
> contribution to the parent's tg->load_avg fast enough.

It sounded like you wanted me to run the below alone.  If so, it's a nogo.
 
 
 ---------------------------------------------------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at     |
 ---------------------------------------------------------------------------------------------------------------
  oink:(8)              | 787001.236 ms |    21641 | avg:    0.377 ms | max:   21.991 ms | max at:  51.504005 s
  mplayer:(25)          |   4256.224 ms |     7264 | avg:   19.698 ms | max: 2087.489 ms | max at: 115.294922 s
  Xorg:1011             |   1507.958 ms |     4081 | avg:    8.349 ms | max: 1652.200 ms | max at: 126.908021 s
  konsole:1752          |    697.806 ms |     1186 | avg:    5.749 ms | max:  160.189 ms | max at:  53.037952 s
  testo:(9)             |    438.164 ms |     2551 | avg:    6.616 ms | max:  215.527 ms | max at: 117.302455 s
  plasma-desktop:1716   |    280.418 ms |     1624 | avg:    3.701 ms | max:  574.806 ms | max at:  53.582261 s
  kwin:1708             |    144.986 ms |     2422 | avg:    3.301 ms | max:  315.707 ms | max at: 116.555721 s

> --
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4df37a4..3dba883 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2686,12 +2686,13 @@ static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
>  static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
>  {
>  	struct sched_avg *sa = &cfs_rq->avg;
> -	int decayed;
> +	int decayed, updated = 0;
>  
>  	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
>  		long r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
>  		sa->load_avg = max_t(long, sa->load_avg - r, 0);
>  		sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
> +		updated = 1;
>  	}
>  
>  	if (atomic_long_read(&cfs_rq->removed_util_avg)) {
> @@ -2708,7 +2709,7 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
>  	cfs_rq->load_last_update_time_copy = sa->last_update_time;
>  #endif
>  
> -	return decayed;
> +	return decayed | updated;
>  }
>  
>  /* Update task and its cfs_rq load average */




Re: 4.3 group scheduling regression

2015-10-12 Thread Yuyang Du
On Mon, Oct 12, 2015 at 12:23:31PM +0200, Mike Galbraith wrote:
> On Mon, 2015-10-12 at 10:12 +0800, Yuyang Du wrote:
> 
> > I am guessing it is in calc_tg_weight(), and naughty boys do make them more
> > favored, what a reality...
> > 
> > Mike, beg you test the following?
> 
> Wow, that was quick.  Dinky patch made it all better.
> 
> ---------------------------------------------------------------------------------------------------------------
>   Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at     |
> ---------------------------------------------------------------------------------------------------------------
>   oink:(8)              | 739056.970 ms |    27270 | avg:    2.043 ms | max:   29.105 ms | max at: 339.988310 s
>   mplayer:(25)          |  36448.997 ms |    44670 | avg:    1.886 ms | max:   72.808 ms | max at: 302.153121 s
>   Xorg:988              |  13334.908 ms |    22210 | avg:    0.081 ms | max:   25.005 ms | max at: 269.068666 s
>   testo:(9)             |   2558.540 ms |    13703 | avg:    0.124 ms | max:    6.412 ms | max at: 279.235272 s
>   konsole:1781          |   1084.316 ms |     1457 | avg:    0.006 ms | max:    1.039 ms | max at: 268.863379 s
>   kwin:1734             |    879.645 ms |    17855 | avg:    0.458 ms | max:   15.788 ms | max at: 268.854992 s
>   pulseaudio:1808       |    356.334 ms |    15023 | avg:    0.028 ms | max:    6.134 ms | max at: 324.479766 s
>   threaded-ml:3483      |    292.782 ms |    25769 | avg:    0.364 ms | max:   40.387 ms | max at: 294.550515 s
>   plasma-desktop:1745   |    265.055 ms |     1470 | avg:    0.102 ms | max:   21.886 ms | max at: 267.724902 s
>   perf:3439             |     61.677 ms |        2 | avg:    0.117 ms | max:    0.232 ms | max at: 367.043889 s

Phew...

I think maybe the real disease is that tg->load_avg is not updated in time,
i.e., after a migrate, the source cfs_rq does not decrease its contribution
to the parent's tg->load_avg fast enough.

--

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4df37a4..3dba883 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2686,12 +2686,13 @@ static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
 static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 {
 	struct sched_avg *sa = &cfs_rq->avg;
-	int decayed;
+	int decayed, updated = 0;
 
 	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
 		long r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
 		sa->load_avg = max_t(long, sa->load_avg - r, 0);
 		sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
+		updated = 1;
 	}
 
 	if (atomic_long_read(&cfs_rq->removed_util_avg)) {
@@ -2708,7 +2709,7 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 	cfs_rq->load_last_update_time_copy = sa->last_update_time;
 #endif
 
-	return decayed;
+	return decayed | updated;
 }
 
 /* Update task and its cfs_rq load average */


Re: 4.3 group scheduling regression

2015-10-12 Thread Yuyang Du
On Mon, Oct 12, 2015 at 01:47:23PM +0200, Peter Zijlstra wrote:
> 
> Also, should we do the below? At this point se->on_rq is still 0 so
> reweight_entity() will not update (dequeue/enqueue) the accounting, but
> we'll have just accounted the 'old' load.weight.
> 
> Doing it this way around we'll first update the weight and then account
> it, which seems more accurate.
 
I think the original looks ok.

The account_entity_enqueue() adds child entity's load.weight to parent's load:

update_load_add(&cfs_rq->load, se->load.weight)

Then recalculate the shares.

Then reweight_entity() resets the parent entity's load.weight.

> ---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 700eb548315f..d2efef565aed 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3009,8 +3009,8 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>  	 */
>  	update_curr(cfs_rq);
>  	enqueue_entity_load_avg(cfs_rq, se);
> -	account_entity_enqueue(cfs_rq, se);
>  	update_cfs_shares(cfs_rq);
> +	account_entity_enqueue(cfs_rq, se);
>  
>  	if (flags & ENQUEUE_WAKEUP) {
>  		place_entity(cfs_rq, se, 0);


Re: 4.3 group scheduling regression

2015-10-12 Thread Mike Galbraith
On Mon, 2015-10-12 at 13:47 +0200, Peter Zijlstra wrote:

> Also, should we do the below?

Ew.  Box said "Either you quilt pop/burn, or I boot windows." ;-)

-Mike



Re: 4.3 group scheduling regression

2015-10-12 Thread Peter Zijlstra
On Mon, Oct 12, 2015 at 10:12:31AM +0800, Yuyang Du wrote:
> On Mon, Oct 12, 2015 at 11:12:06AM +0200, Peter Zijlstra wrote:

> > So in the old code we had 'magic' to deal with the case where a cgroup
> > was consuming less than 1 cpu's worth of runtime. For example, a single
> > task running in the group.
> > 
> > In that scenario it might be possible that the group entity weight:
> > 
> > se->weight = (tg->shares * cfs_rq->weight) / tg->weight;
> > 
> > Strongly deviates from the tg->shares; you want the single task reflect
> > the full group shares to the next level; due to the whole distributed
> > approximation stuff.
> 
> Yeah, I thought so.
>  
> > I see you've deleted all that code; see the former
> > __update_group_entity_contrib().
>  
> Probably not there, it actually was an icky way to adjust things.

Yeah, no argument there.

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4df37a4..b184da0 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2370,7 +2370,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
>  	 */
>  	tg_weight = atomic_long_read(&tg->load_avg);
>  	tg_weight -= cfs_rq->tg_load_avg_contrib;
> -	tg_weight += cfs_rq_load_avg(cfs_rq);
> +	tg_weight += cfs_rq->load.weight;
>  
>  	return tg_weight;
>  }
> @@ -2380,7 +2380,7 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
>  	long tg_weight, load, shares;
>  
>  	tg_weight = calc_tg_weight(tg, cfs_rq);
> -	load = cfs_rq_load_avg(cfs_rq);
> +	load = cfs_rq->load.weight;
>  
>  	shares = (tg->shares * load);
>  	if (tg_weight)

Aah, yes very much so. I completely overlooked that :-(

When calculating shares we very much want the current load, not the load
average.

Also, should we do the below? At this point se->on_rq is still 0 so
reweight_entity() will not update (dequeue/enqueue) the accounting, but
we'll have just accounted the 'old' load.weight.

Doing it this way around we'll first update the weight and then account
it, which seems more accurate.

---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 700eb548315f..d2efef565aed 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3009,8 +3009,8 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 */
 	update_curr(cfs_rq);
 	enqueue_entity_load_avg(cfs_rq, se);
-	account_entity_enqueue(cfs_rq, se);
 	update_cfs_shares(cfs_rq);
+	account_entity_enqueue(cfs_rq, se);
 
 	if (flags & ENQUEUE_WAKEUP) {
 		place_entity(cfs_rq, se, 0);


Re: 4.3 group scheduling regression

2015-10-12 Thread Mike Galbraith
On Mon, 2015-10-12 at 10:12 +0800, Yuyang Du wrote:

> I am guessing it is in calc_tg_weight(), and naughty boys do make them more
> favored, what a reality...
> 
> Mike, beg you test the following?

Wow, that was quick.  Dinky patch made it all better.

 
 ---------------------------------------------------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at     |
 ---------------------------------------------------------------------------------------------------------------
  oink:(8)              | 739056.970 ms |    27270 | avg:    2.043 ms | max:   29.105 ms | max at: 339.988310 s
  mplayer:(25)          |  36448.997 ms |    44670 | avg:    1.886 ms | max:   72.808 ms | max at: 302.153121 s
  Xorg:988              |  13334.908 ms |    22210 | avg:    0.081 ms | max:   25.005 ms | max at: 269.068666 s
  testo:(9)             |   2558.540 ms |    13703 | avg:    0.124 ms | max:    6.412 ms | max at: 279.235272 s
  konsole:1781          |   1084.316 ms |     1457 | avg:    0.006 ms | max:    1.039 ms | max at: 268.863379 s
  kwin:1734             |    879.645 ms |    17855 | avg:    0.458 ms | max:   15.788 ms | max at: 268.854992 s
  pulseaudio:1808       |    356.334 ms |    15023 | avg:    0.028 ms | max:    6.134 ms | max at: 324.479766 s
  threaded-ml:3483      |    292.782 ms |    25769 | avg:    0.364 ms | max:   40.387 ms | max at: 294.550515 s
  plasma-desktop:1745   |    265.055 ms |     1470 | avg:    0.102 ms | max:   21.886 ms | max at: 267.724902 s
  perf:3439             |     61.677 ms |        2 | avg:    0.117 ms | max:    0.232 ms | max at: 367.043889 s


> --
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4df37a4..b184da0 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2370,7 +2370,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
>  	 */
>  	tg_weight = atomic_long_read(&tg->load_avg);
>  	tg_weight -= cfs_rq->tg_load_avg_contrib;
> -	tg_weight += cfs_rq_load_avg(cfs_rq);
> +	tg_weight += cfs_rq->load.weight;
>  
>  	return tg_weight;
>  }
> @@ -2380,7 +2380,7 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
>  	long tg_weight, load, shares;
>  
>  	tg_weight = calc_tg_weight(tg, cfs_rq);
> -	load = cfs_rq_load_avg(cfs_rq);
> +	load = cfs_rq->load.weight;
>  
>  	shares = (tg->shares * load);
>  	if (tg_weight)




Re: 4.3 group scheduling regression

2015-10-12 Thread Yuyang Du
On Mon, Oct 12, 2015 at 11:12:06AM +0200, Peter Zijlstra wrote:
> On Mon, Oct 12, 2015 at 08:53:51AM +0800, Yuyang Du wrote:
> > Good morning, Peter.
> > 
> > On Mon, Oct 12, 2015 at 10:04:07AM +0200, Peter Zijlstra wrote:
> > > On Mon, Oct 12, 2015 at 09:44:57AM +0200, Mike Galbraith wrote:
> > > 
> > > > It's odd to me that things look pretty much the same good/bad tree with
> > > > hogs vs hogs or hogs vs tbench (with top anyway, just adding up times).
> > > > Seems Xorg+mplayer more or less playing cross group ping-pong must be
> > > > the BadThing trigger.
> > >
> > > Ohh, wait, Xorg and mplayer are _not_ in the same group? I was assuming
> > > you had your entire user session in 1 (auto) group and was competing
> > > against 8 manual cgroups.
> > > 
> > > So how exactly are things configured?
> >  
> > Hmm... my impression is the naughty boy mplayer (+Xorg) isn't favored, due 
> > to the per CPU group entity share distribution. Let me dig more.
> 
> So in the old code we had 'magic' to deal with the case where a cgroup
> was consuming less than 1 cpu's worth of runtime. For example, a single
> task running in the group.
> 
> In that scenario it might be possible that the group entity weight:
> 
>   se->weight = (tg->shares * cfs_rq->weight) / tg->weight;
> 
> Strongly deviates from the tg->shares; you want the single task reflect
> the full group shares to the next level; due to the whole distributed
> approximation stuff.

Yeah, I thought so.
 
> I see you've deleted all that code; see the former
> __update_group_entity_contrib().
 
Probably not there, it actually was an icky way to adjust things.

> It could be that we need to bring that back. But let me think a little
> bit more on this.. I'm having a hard time waking :/

I am guessing it is in calc_tg_weight(), and naughty boys do make them more
favored, what a reality...

Mike, beg you test the following?

--

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4df37a4..b184da0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2370,7 +2370,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
 	 */
 	tg_weight = atomic_long_read(&tg->load_avg);
 	tg_weight -= cfs_rq->tg_load_avg_contrib;
-	tg_weight += cfs_rq_load_avg(cfs_rq);
+	tg_weight += cfs_rq->load.weight;
 
 	return tg_weight;
 }
@@ -2380,7 +2380,7 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
 	long tg_weight, load, shares;
 
 	tg_weight = calc_tg_weight(tg, cfs_rq);
-	load = cfs_rq_load_avg(cfs_rq);
+	load = cfs_rq->load.weight;
 
 	shares = (tg->shares * load);
 	if (tg_weight)


Re: 4.3 group scheduling regression

2015-10-12 Thread Peter Zijlstra
On Mon, Oct 12, 2015 at 08:53:51AM +0800, Yuyang Du wrote:
> Good morning, Peter.
> 
> On Mon, Oct 12, 2015 at 10:04:07AM +0200, Peter Zijlstra wrote:
> > On Mon, Oct 12, 2015 at 09:44:57AM +0200, Mike Galbraith wrote:
> > 
> > > It's odd to me that things look pretty much the same good/bad tree with
> > > hogs vs hogs or hogs vs tbench (with top anyway, just adding up times).
> > > Seems Xorg+mplayer more or less playing cross group ping-pong must be
> > > the BadThing trigger.
> >
> > Ohh, wait, Xorg and mplayer are _not_ in the same group? I was assuming
> > you had your entire user session in 1 (auto) group and was competing
> > against 8 manual cgroups.
> > 
> > So how exactly are things configured?
>  
> Hmm... my impression is the naughty boy mplayer (+Xorg) isn't favored, due 
> to the per CPU group entity share distribution. Let me dig more.

So in the old code we had 'magic' to deal with the case where a cgroup
was consuming less than 1 cpu's worth of runtime. For example, a single
task running in the group.

In that scenario it might be possible that the group entity weight:

se->weight = (tg->shares * cfs_rq->weight) / tg->weight;

Strongly deviates from the tg->shares; you want the single task reflect
the full group shares to the next level; due to the whole distributed
approximation stuff.

I see you've deleted all that code; see the former
__update_group_entity_contrib().

It could be that we need to bring that back. But let me think a little
bit more on this.. I'm having a hard time waking :/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 4.3 group scheduling regression

2015-10-12 Thread Mike Galbraith
On Mon, 2015-10-12 at 10:04 +0200, Peter Zijlstra wrote:
> On Mon, Oct 12, 2015 at 09:44:57AM +0200, Mike Galbraith wrote:
> 
> > It's odd to me that things look pretty much the same good/bad tree with
> > hogs vs hogs or hogs vs tbench (with top anyway, just adding up times).
> > Seems Xorg+mplayer more or less playing cross group ping-pong must be
> > the BadThing trigger.
> 
> Ohh, wait, Xorg and mplayer are _not_ in the same group? I was assuming
> you had your entire user session in 1 (auto) group and was competing
> against 8 manual cgroups.
> 
> So how exactly are things configured?

I turned autogroup on so as not to have to muck about creating groups, so
Xorg is in its per-session group, and each konsole instance in its own.  I
launched groups via testo (aka konsole) -e  in a little script
to turn them all loose at once to run for 100 seconds and kill themselves, but
that's not necessary 'course.  Start 1 hog in each of 8 konsole tabs, and
mplayer in the 9th, and ickiness follows.

-Mike



Re: 4.3 group scheduling regression

2015-10-12 Thread Yuyang Du
Good morning, Peter.

On Mon, Oct 12, 2015 at 10:04:07AM +0200, Peter Zijlstra wrote:
> On Mon, Oct 12, 2015 at 09:44:57AM +0200, Mike Galbraith wrote:
> 
> > It's odd to me that things look pretty much the same good/bad tree with
> > hogs vs hogs or hogs vs tbench (with top anyway, just adding up times).
> > Seems Xorg+mplayer more or less playing cross group ping-pong must be
> > the BadThing trigger.
>
> Ohh, wait, Xorg and mplayer are _not_ in the same group? I was assuming
> you had your entire user session in 1 (auto) group and was competing
> against 8 manual cgroups.
> 
> So how exactly are things configured?
 
Hmm... my impression is the naughty boy mplayer (+Xorg) isn't favored, due 
to the per CPU group entity share distribution. Let me dig more.

Sorry.


Re: 4.3 group scheduling regression

2015-10-12 Thread Peter Zijlstra
On Mon, Oct 12, 2015 at 09:44:57AM +0200, Mike Galbraith wrote:

> It's odd to me that things look pretty much the same good/bad tree with
> hogs vs hogs or hogs vs tbench (with top anyway, just adding up times).
> Seems Xorg+mplayer more or less playing cross group ping-pong must be
> the BadThing trigger.

Ohh, wait, Xorg and mplayer are _not_ in the same group? I was assuming
you had your entire user session in 1 (auto) group and was competing
against 8 manual cgroups.

So how exactly are things configured?


Re: 4.3 group scheduling regression

2015-10-12 Thread Mike Galbraith
On Mon, 2015-10-12 at 09:23 +0200, Peter Zijlstra wrote:
> On Sun, Oct 11, 2015 at 07:42:01PM +0200, Mike Galbraith wrote:
> > (change subject, CCs)
> > 
> > On Sun, 2015-10-11 at 04:25 +0200, Mike Galbraith wrote:
> > 
> > > > Is the interactivity the same (horrible) at fe32d3cd5e8e (ie, before the
> > > > load tracking rewrite from Yuyang)?
> > 
> > It is the rewrite, 9d89c257dfb9c51a532d69397f6eed75e5168c35.
> 
> Just to be sure, so 9d89c257dfb9^1 is good, while 9d89c257dfb9 is bad?

Yeah, I went ahead and bisected.
 
> And *groan*, _just_ the thing I need on a monday morning ;-)

Sorry 'bout that.

It's odd to me that things look pretty much the same good/bad tree with
hogs vs hogs or hogs vs tbench (with top anyway, just adding up times).
Seems Xorg+mplayer more or less playing cross group ping-pong must be
the BadThing trigger.

-Mike




Re: 4.3 group scheduling regression

2015-10-12 Thread Peter Zijlstra
On Sun, Oct 11, 2015 at 07:42:01PM +0200, Mike Galbraith wrote:
> (change subject, CCs)
> 
> On Sun, 2015-10-11 at 04:25 +0200, Mike Galbraith wrote:
> 
> > > Is the interactivity the same (horrible) at fe32d3cd5e8e (ie, before the
> > > load tracking rewrite from Yuyang)?
> 
> It is the rewrite, 9d89c257dfb9c51a532d69397f6eed75e5168c35.

Just to be sure, so 9d89c257dfb9^1 is good, while 9d89c257dfb9 is bad?

And *groan*, _just_ the thing I need on a monday morning ;-)








Re: 4.3 group scheduling regression

2015-10-12 Thread Mike Galbraith
On Mon, 2015-10-12 at 10:12 +0800, Yuyang Du wrote:

> I am guessing it is in calc_tg_weight(), and naughty boys do make them more
> favored, what a reality...
> 
> Mike, beg you test the following?

Wow, that was quick.  Dinky patch made it all better.

 
 -----------------------------------------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at     |
 -----------------------------------------------------------------------------------------------------
  oink:(8)              | 739056.970 ms |    27270 | avg:    2.043 ms | max:   29.105 ms | max at:    339.988310 s
  mplayer:(25)          |  36448.997 ms |    44670 | avg:    1.886 ms | max:   72.808 ms | max at:    302.153121 s
  Xorg:988              |  13334.908 ms |    22210 | avg:    0.081 ms | max:   25.005 ms | max at:    269.068666 s
  testo:(9)             |   2558.540 ms |    13703 | avg:    0.124 ms | max:    6.412 ms | max at:    279.235272 s
  konsole:1781          |   1084.316 ms |     1457 | avg:    0.006 ms | max:    1.039 ms | max at:    268.863379 s
  kwin:1734             |    879.645 ms |    17855 | avg:    0.458 ms | max:   15.788 ms | max at:    268.854992 s
  pulseaudio:1808       |    356.334 ms |    15023 | avg:    0.028 ms | max:    6.134 ms | max at:    324.479766 s
  threaded-ml:3483      |    292.782 ms |    25769 | avg:    0.364 ms | max:   40.387 ms | max at:    294.550515 s
  plasma-desktop:1745   |    265.055 ms |     1470 | avg:    0.102 ms | max:   21.886 ms | max at:    267.724902 s
  perf:3439             |     61.677 ms |        2 | avg:    0.117 ms | max:    0.232 ms | max at:    367.043889 s


> --
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4df37a4..b184da0 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2370,7 +2370,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
>  	 */
>  	tg_weight = atomic_long_read(&tg->load_avg);
>  	tg_weight -= cfs_rq->tg_load_avg_contrib;
> -	tg_weight += cfs_rq_load_avg(cfs_rq);
> +	tg_weight += cfs_rq->load.weight;
>  
>  	return tg_weight;
>  }
> @@ -2380,7 +2380,7 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
>  	long tg_weight, load, shares;
>  
>  	tg_weight = calc_tg_weight(tg, cfs_rq);
> -	load = cfs_rq_load_avg(cfs_rq);
> +	load = cfs_rq->load.weight;
>  
>  	shares = (tg->shares * load);
>  	if (tg_weight)








Re: 4.3 group scheduling regression

2015-10-12 Thread Yuyang Du
On Mon, Oct 12, 2015 at 11:12:06AM +0200, Peter Zijlstra wrote:
> On Mon, Oct 12, 2015 at 08:53:51AM +0800, Yuyang Du wrote:
> > Good morning, Peter.
> > 
> > On Mon, Oct 12, 2015 at 10:04:07AM +0200, Peter Zijlstra wrote:
> > > On Mon, Oct 12, 2015 at 09:44:57AM +0200, Mike Galbraith wrote:
> > > 
> > > > It's odd to me that things look pretty much the same good/bad tree with
> > > > hogs vs hogs or hogs vs tbench (with top anyway, just adding up times).
> > > > Seems Xorg+mplayer more or less playing cross group ping-pong must be
> > > > the BadThing trigger.
> > >
> > > Ohh, wait, Xorg and mplayer are _not_ in the same group? I was assuming
> > > you had your entire user session in 1 (auto) group and was competing
> > > against 8 manual cgroups.
> > > 
> > > So how exactly are things configured?
> >  
> > Hmm... my impression is the naughty boy mplayer (+Xorg) isn't favored, due 
> > to the per CPU group entity share distribution. Let me dig more.
> 
> So in the old code we had 'magic' to deal with the case where a cgroup
> was consuming less than 1 cpu's worth of runtime. For example, a single
> task running in the group.
> 
> In that scenario it might be possible that the group entity weight:
> 
>   se->weight = (tg->shares * cfs_rq->weight) / tg->weight;
> 
> Strongly deviates from the tg->shares; you want the single task reflect
> the full group shares to the next level; due to the whole distributed
> approximation stuff.

Yeah, I thought so.
 
> I see you've deleted all that code; see the former
> __update_group_entity_contrib().
 
Probably not there, it actually was an icky way to adjust things.

> It could be that we need to bring that back. But let me think a little
> bit more on this.. I'm having a hard time waking :/

I am guessing it is in calc_tg_weight(), and naughty boys do make them more
favored, what a reality...

Mike, beg you test the following?

--

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4df37a4..b184da0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2370,7 +2370,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
 	 */
 	tg_weight = atomic_long_read(&tg->load_avg);
 	tg_weight -= cfs_rq->tg_load_avg_contrib;
-	tg_weight += cfs_rq_load_avg(cfs_rq);
+	tg_weight += cfs_rq->load.weight;
 
 	return tg_weight;
 }
@@ -2380,7 +2380,7 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
 	long tg_weight, load, shares;
 
 	tg_weight = calc_tg_weight(tg, cfs_rq);
-	load = cfs_rq_load_avg(cfs_rq);
+	load = cfs_rq->load.weight;
 
 	shares = (tg->shares * load);
 	if (tg_weight)




Re: 4.3 group scheduling regression

2015-10-12 Thread Peter Zijlstra
On Mon, Oct 12, 2015 at 10:12:31AM +0800, Yuyang Du wrote:
> On Mon, Oct 12, 2015 at 11:12:06AM +0200, Peter Zijlstra wrote:

> > So in the old code we had 'magic' to deal with the case where a cgroup
> > was consuming less than 1 cpu's worth of runtime. For example, a single
> > task running in the group.
> > 
> > In that scenario it might be possible that the group entity weight:
> > 
> > se->weight = (tg->shares * cfs_rq->weight) / tg->weight;
> > 
> > Strongly deviates from the tg->shares; you want the single task reflect
> > the full group shares to the next level; due to the whole distributed
> > approximation stuff.
> 
> Yeah, I thought so.
>  
> > I see you've deleted all that code; see the former
> > __update_group_entity_contrib().
>  
> Probably not there, it actually was an icky way to adjust things.

Yeah, no argument there.

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4df37a4..b184da0 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2370,7 +2370,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
>  	 */
>  	tg_weight = atomic_long_read(&tg->load_avg);
>  	tg_weight -= cfs_rq->tg_load_avg_contrib;
> -	tg_weight += cfs_rq_load_avg(cfs_rq);
> +	tg_weight += cfs_rq->load.weight;
>  
>  	return tg_weight;
>  }
> @@ -2380,7 +2380,7 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
>  	long tg_weight, load, shares;
>  
>  	tg_weight = calc_tg_weight(tg, cfs_rq);
> -	load = cfs_rq_load_avg(cfs_rq);
> +	load = cfs_rq->load.weight;
>  
>  	shares = (tg->shares * load);
>  	if (tg_weight)

Aah, yes very much so. I completely overlooked that :-(

When calculating shares we very much want the current load, not the load
average.

Also, should we do the below? At this point se->on_rq is still 0 so
reweight_entity() will not update (dequeue/enqueue) the accounting, but
we'll have just accounted the 'old' load.weight.

Doing it this way around we'll first update the weight and then account
it, which seems more accurate.

---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 700eb548315f..d2efef565aed 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3009,8 +3009,8 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 */
 	update_curr(cfs_rq);
 	enqueue_entity_load_avg(cfs_rq, se);
-	account_entity_enqueue(cfs_rq, se);
 	update_cfs_shares(cfs_rq);
+	account_entity_enqueue(cfs_rq, se);
 
 	if (flags & ENQUEUE_WAKEUP) {
 		place_entity(cfs_rq, se, 0);


Re: 4.3 group scheduling regression

2015-10-12 Thread Yuyang Du
On Mon, Oct 12, 2015 at 01:47:23PM +0200, Peter Zijlstra wrote:
> 
> Also, should we do the below? At this point se->on_rq is still 0 so
> reweight_entity() will not update (dequeue/enqueue) the accounting, but
> we'll have just accounted the 'old' load.weight.
> 
> Doing it this way around we'll first update the weight and then account
> it, which seems more accurate.
 
I think the original looks ok.

The account_entity_enqueue() adds child entity's load.weight to parent's load:

update_load_add(&cfs_rq->load, se->load.weight)

Then recalculate the shares.

Then reweight_entity() resets the parent entity's load.weight.

> ---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 700eb548315f..d2efef565aed 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3009,8 +3009,8 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>  	 */
>  	update_curr(cfs_rq);
>  	enqueue_entity_load_avg(cfs_rq, se);
> -	account_entity_enqueue(cfs_rq, se);
>  	update_cfs_shares(cfs_rq);
> +	account_entity_enqueue(cfs_rq, se);
> 
>  	if (flags & ENQUEUE_WAKEUP) {
>  		place_entity(cfs_rq, se, 0);


Re: 4.3 group scheduling regression

2015-10-12 Thread Yuyang Du
On Mon, Oct 12, 2015 at 12:23:31PM +0200, Mike Galbraith wrote:
> On Mon, 2015-10-12 at 10:12 +0800, Yuyang Du wrote:
> 
> > I am guessing it is in calc_tg_weight(), and naughty boys do make them more
> > favored, what a reality...
> > 
> > Mike, beg you test the following?
> 
> Wow, that was quick.  Dinky patch made it all better.
> 
>  
> -----------------------------------------------------------------------------------------------------
>   Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at     |
> -----------------------------------------------------------------------------------------------------
>   oink:(8)              | 739056.970 ms |    27270 | avg:    2.043 ms | max:   29.105 ms | max at:    339.988310 s
>   mplayer:(25)          |  36448.997 ms |    44670 | avg:    1.886 ms | max:   72.808 ms | max at:    302.153121 s
>   Xorg:988              |  13334.908 ms |    22210 | avg:    0.081 ms | max:   25.005 ms | max at:    269.068666 s
>   testo:(9)             |   2558.540 ms |    13703 | avg:    0.124 ms | max:    6.412 ms | max at:    279.235272 s
>   konsole:1781          |   1084.316 ms |     1457 | avg:    0.006 ms | max:    1.039 ms | max at:    268.863379 s
>   kwin:1734             |    879.645 ms |    17855 | avg:    0.458 ms | max:   15.788 ms | max at:    268.854992 s
>   pulseaudio:1808       |    356.334 ms |    15023 | avg:    0.028 ms | max:    6.134 ms | max at:    324.479766 s
>   threaded-ml:3483      |    292.782 ms |    25769 | avg:    0.364 ms | max:   40.387 ms | max at:    294.550515 s
>   plasma-desktop:1745   |    265.055 ms |     1470 | avg:    0.102 ms | max:   21.886 ms | max at:    267.724902 s
>   perf:3439             |     61.677 ms |        2 | avg:    0.117 ms | max:    0.232 ms | max at:    367.043889 s

Phew...

I think maybe the real disease is that tg->load_avg is not updated in time.
I.e., after a migrate, the source cfs_rq does not decrease its contribution
to the parent's tg->load_avg fast enough.

--

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4df37a4..3dba883 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2686,12 +2686,13 @@ static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
 static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 {
 	struct sched_avg *sa = &cfs_rq->avg;
-	int decayed;
+	int decayed, updated = 0;
 
 	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
 		long r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
 		sa->load_avg = max_t(long, sa->load_avg - r, 0);
 		sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
+		updated = 1;
 	}
 
 	if (atomic_long_read(&cfs_rq->removed_util_avg)) {
@@ -2708,7 +2709,7 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 	cfs_rq->load_last_update_time_copy = sa->last_update_time;
 #endif
 
-	return decayed;
+	return decayed | updated;
 }
 
 /* Update task and its cfs_rq load average */


Re: 4.3 group scheduling regression

2015-10-12 Thread Yuyang Du
On Tue, Oct 13, 2015 at 06:08:34AM +0200, Mike Galbraith wrote:
> It sounded like you wanted me to run the below alone.  If so, it's a nogo.
  
Yes, thanks.

Then it is the sad fact that after a migrate, although removed_load_avg is
added in migrate_task_rq_fair(), we don't get a chance to update the tg fast
enough, so at the destination mplayer is not weighted up to the group's share.

>  
> -----------------------------------------------------------------------------------------------------
>   Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at     |
> -----------------------------------------------------------------------------------------------------
>   oink:(8)              | 787001.236 ms |    21641 | avg:    0.377 ms | max:   21.991 ms | max at:     51.504005 s
>   mplayer:(25)          |   4256.224 ms |     7264 | avg:   19.698 ms | max: 2087.489 ms | max at:    115.294922 s
>   Xorg:1011             |   1507.958 ms |     4081 | avg:    8.349 ms | max: 1652.200 ms | max at:    126.908021 s
>   konsole:1752          |    697.806 ms |     1186 | avg:    5.749 ms | max:  160.189 ms | max at:     53.037952 s
>   testo:(9)             |    438.164 ms |     2551 | avg:    6.616 ms | max:  215.527 ms | max at:    117.302455 s
>   plasma-desktop:1716   |    280.418 ms |     1624 | avg:    3.701 ms | max:  574.806 ms | max at:     53.582261 s
>   kwin:1708             |    144.986 ms |     2422 | avg:    3.301 ms | max:  315.707 ms | max at:    116.555721 s
> 
> > --
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 4df37a4..3dba883 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -2686,12 +2686,13 @@ static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
> >  static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
> >  {
> >  	struct sched_avg *sa = &cfs_rq->avg;
> > -	int decayed;
> > +	int decayed, updated = 0;
> > 
> >  	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
> >  		long r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
> >  		sa->load_avg = max_t(long, sa->load_avg - r, 0);
> >  		sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
> > +		updated = 1;
> >  	}
> > 
> >  	if (atomic_long_read(&cfs_rq->removed_util_avg)) {
> > @@ -2708,7 +2709,7 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
> >  	cfs_rq->load_last_update_time_copy = sa->last_update_time;
> >  #endif
> > 
> > -	return decayed;
> > +	return decayed | updated;

A typo: it should be decayed || updated, but that shouldn't make any difference.

> >  }
> >  
> >  /* Update task and its cfs_rq load average */
> 


Re: 4.3 group scheduling regression

2015-10-12 Thread Mike Galbraith
On Tue, 2015-10-13 at 03:55 +0800, Yuyang Du wrote:
> On Mon, Oct 12, 2015 at 12:23:31PM +0200, Mike Galbraith wrote:
> > On Mon, 2015-10-12 at 10:12 +0800, Yuyang Du wrote:
> > 
> > > I am guessing it is in calc_tg_weight(), and naughty boys do make them 
> > > more
> > > favored, what a reality...
> > > 
> > > Mike, beg you test the following?
> > 
> > Wow, that was quick.  Dinky patch made it all better.
> > 
> >  
> > -----------------------------------------------------------------------------------------------------
> >   Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at     |
> > -----------------------------------------------------------------------------------------------------
> >   oink:(8)              | 739056.970 ms |    27270 | avg:    2.043 ms | max:   29.105 ms | max at:    339.988310 s
> >   mplayer:(25)          |  36448.997 ms |    44670 | avg:    1.886 ms | max:   72.808 ms | max at:    302.153121 s
> >   Xorg:988              |  13334.908 ms |    22210 | avg:    0.081 ms | max:   25.005 ms | max at:    269.068666 s
> >   testo:(9)             |   2558.540 ms |    13703 | avg:    0.124 ms | max:    6.412 ms | max at:    279.235272 s
> >   konsole:1781          |   1084.316 ms |     1457 | avg:    0.006 ms | max:    1.039 ms | max at:    268.863379 s
> >   kwin:1734             |    879.645 ms |    17855 | avg:    0.458 ms | max:   15.788 ms | max at:    268.854992 s
> >   pulseaudio:1808       |    356.334 ms |    15023 | avg:    0.028 ms | max:    6.134 ms | max at:    324.479766 s
> >   threaded-ml:3483      |    292.782 ms |    25769 | avg:    0.364 ms | max:   40.387 ms | max at:    294.550515 s
> >   plasma-desktop:1745   |    265.055 ms |     1470 | avg:    0.102 ms | max:   21.886 ms | max at:    267.724902 s
> >   perf:3439             |     61.677 ms |        2 | avg:    0.117 ms | max:    0.232 ms | max at:    367.043889 s
> 
> Phew...
> 
> I think maybe the real disease is the tg->load_avg is not updated in time.
> I.e., it is after migrate, the source cfs_rq does not decrease its 
> contribution
> to the parent's tg->load_avg fast enough.

It sounded like you wanted me to run the below alone.  If so, it's a nogo.
 
 
 -----------------------------------------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at     |
 -----------------------------------------------------------------------------------------------------
  oink:(8)              | 787001.236 ms |    21641 | avg:    0.377 ms | max:   21.991 ms | max at:     51.504005 s
  mplayer:(25)          |   4256.224 ms |     7264 | avg:   19.698 ms | max: 2087.489 ms | max at:    115.294922 s
  Xorg:1011             |   1507.958 ms |     4081 | avg:    8.349 ms | max: 1652.200 ms | max at:    126.908021 s
  konsole:1752          |    697.806 ms |     1186 | avg:    5.749 ms | max:  160.189 ms | max at:     53.037952 s
  testo:(9)             |    438.164 ms |     2551 | avg:    6.616 ms | max:  215.527 ms | max at:    117.302455 s
  plasma-desktop:1716   |    280.418 ms |     1624 | avg:    3.701 ms | max:  574.806 ms | max at:     53.582261 s
  kwin:1708             |    144.986 ms |     2422 | avg:    3.301 ms | max:  315.707 ms | max at:    116.555721 s

> --
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4df37a4..3dba883 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2686,12 +2686,13 @@ static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
>  static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
>  {
>  	struct sched_avg *sa = &cfs_rq->avg;
> -	int decayed;
> +	int decayed, updated = 0;
> 
>  	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
>  		long r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
>  		sa->load_avg = max_t(long, sa->load_avg - r, 0);
>  		sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
> +		updated = 1;
>  	}
> 
>  	if (atomic_long_read(&cfs_rq->removed_util_avg)) {
> @@ -2708,7 +2709,7 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
>  	cfs_rq->load_last_update_time_copy = sa->last_update_time;
>  #endif
> 
> -	return decayed;
> +	return decayed | updated;
>  }
> 
>  /* Update task and its cfs_rq load average */




Re: 4.3 group scheduling regression

2015-10-12 Thread Mike Galbraith
On Mon, 2015-10-12 at 13:47 +0200, Peter Zijlstra wrote:

> Also, should we do the below?

Ew.  Box said "Either you quilt pop/burn, or I boot windows." ;-)

-Mike



4.3 group scheduling regression

2015-10-11 Thread Mike Galbraith
(change subject, CCs)

On Sun, 2015-10-11 at 04:25 +0200, Mike Galbraith wrote:

> > Is the interactivity the same (horrible) at fe32d3cd5e8e (ie, before the
> > load tracking rewrite from Yuyang)?

It is the rewrite, 9d89c257dfb9c51a532d69397f6eed75e5168c35.

Watching 8 single hog groups vs 1 tbench group, master vs 4.2.3, I saw
no big hairy difference, just as 1 group of 8 hogs vs 8 groups of 1.

8 single hog groups vs the less hungry mplayer otoh is quite different.
100 second scripted recordings:
(note: "testo" is kde konsole acting as task group launch vehicle)

master
 
 -----------------------------------------------------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at      |
 -----------------------------------------------------------------------------------------------------------------
  oink:(8)              | 787637.964 ms |    16242 | avg:    0.557 ms | max:   68.993 ms | max at:    239.126118 s
  mplayer:(25)          |   5477.234 ms |     8504 | avg:   16.395 ms | max: 2100.233 ms | max at:    282.850734 s
  Xorg:997              |   1773.218 ms |     4680 | avg:    4.857 ms | max: 1640.194 ms | max at:    285.660210 s
  konsole:1789          |    649.323 ms |     1261 | avg:    6.747 ms | max:  156.282 ms | max at:    265.548523 s
  testo:(9)             |    454.046 ms |     2867 | avg:    5.961 ms | max:  276.371 ms | max at:    245.511282 s
  plasma-desktop:1753   |    223.251 ms |     1582 | avg:    4.220 ms | max:  299.354 ms | max at:    337.242542 s
  kwin:1745             |    156.746 ms |     2879 | avg:    2.398 ms | max:  355.765 ms | max at:    337.242490 s
  pulseaudio:1797       |     60.268 ms |     2573 | avg:    0.695 ms | max:   36.069 ms | max at:    292.318120 s
  threaded-ml:3477      |     47.076 ms |     3878 | avg:    7.083 ms | max: 1898.940 ms | max at:    254.919367 s
  perf:3437             |     28.525 ms |        4 | avg:  129.042 ms | max:  498.816 ms | max at:    336.102154 s

4.2.3
 
 -----------------------------------------------------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at      |
 -----------------------------------------------------------------------------------------------------------------
  oink:(8)              | 741307.292 ms |    42325 | avg:    1.276 ms | max:   23.598 ms | max at:    192.459790 s
  mplayer:(25)          |  35296.804 ms |    35423 | avg:    1.715 ms | max:   71.972 ms | max at:    128.737783 s
  Xorg:929              |  13257.917 ms |    21583 | avg:    0.091 ms | max:   27.983 ms | max at:    102.272376 s
  testo:(9)             |   2315.080 ms |    13213 | avg:    0.133 ms | max:    6.632 ms | max at:    201.422570 s
  konsole:1747          |    938.939 ms |     1458 | avg:    0.096 ms | max:   15.006 ms | max at:    102.260294 s
  kwin:1703             |    815.384 ms |    17376 | avg:    0.464 ms | max:    9.311 ms | max at:    119.026179 s
  pulseaudio:1762       |    396.168 ms |    14338 | avg:    0.020 ms | max:    6.514 ms | max at:    115.928179 s
  threaded-ml:3477      |    310.132 ms |    23966 | avg:    0.428 ms | max:   27.974 ms | max at:    134.100588 s
  plasma-desktop:1711   |    239.232 ms |     1577 | avg:    0.048 ms | max:    7.072 ms | max at:    102.060279 s
  perf:3434             |     65.705 ms |        2 | avg:    0.054 ms | max:    0.105 ms | max at:    102.011221 s

master, mplayer solo reference
 
 -----------------------------------------------------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at      |
 -----------------------------------------------------------------------------------------------------------------
  mplayer:(25)          |  32171.732 ms |    18416 | avg:    0.012 ms | max:    4.405 ms | max at:   4911.226038 s
  Xorg:948              |  14271.286 ms |    17396 | avg:    0.016 ms | max:    0.082 ms | max at:   4911.243020 s
  testo:4121            |   3594.784 ms |    11607 | avg:    0.015 ms | max:    0.078 ms | max at:   4981.705240 s
  kwin:1650             |   1209.387 ms |    17562 | avg:    0.012 ms | max:    1.612 ms | max at:   4911.245523 s
  konsole:1728          |    967.914 ms |     1498 | avg:    0.007 ms | max:    0.048 ms | max at:   4997.903759 s
  pulseaudio:1750       |    684.342 ms |    14460 | avg:    0.013 ms | max:    0.552 ms | max at:   4957.743502 s
  threaded-ml:4153      |    641.893 ms |    15748 | avg:    0.016 ms | max:    2.201 ms | max at:   4923.928810 s
  plasma-desktop:1658   |    150.068 ms |      569 | avg:    0.011 ms | max:    0.390 ms | max at:   4911.258650 s
  perf:4126             |     43.854 ms |        3 | avg:    0.022 ms | max: