Re: [PATCH 5/7 v4] sched: propagate asynchronous detach
On 12/10/16 16:45, Vincent Guittot wrote:
> On 12 October 2016 at 17:03, Dietmar Eggemann wrote:
>> On 26/09/16 13:19, Vincent Guittot wrote:

[...]

>>> @@ -6607,6 +6609,10 @@ static void update_blocked_averages(int cpu)
>>>
>>>  		if (update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, true))
>>>  			update_tg_load_avg(cfs_rq, 0);
>>> +
>>> +		/* Propagate pending load changes to the parent */
>>> +		if (cfs_rq->tg->se[cpu])
>>> +			update_load_avg(cfs_rq->tg->se[cpu], 0);
>>
>> In my test (1 task (run/period: 8ms/16ms) in tg_root->tg_x->tg_y->*tg_z*
>> and oscillating between cpu1 and cpu2) the cfs_rq related signals are
>> nicely going down to 0 after the task has left the cpu, but it doesn't
>> seem to be the case for the corresponding se (cfs_rq->tg->se[cpu])?
>
> Strange, because such a use case is part of the functional tests that I
> run, and it was working fine according to the last test that I did.
>
>> It should actually work correctly because of the
>> update_tg_cfs_util/load() calls in update_load_avg(cfs_rq->tg->se[cpu],
>> 0)->propagate_entity_load_avg().
>
> Furthermore, the update of the parent cfs_rq tg_x->cfs_rq[cpu] uses the
> delta between the previous and new value of the child tg_y->se[cpu]. So
> if tg_x->cfs_rq[cpu]->avg.load_avg goes down to 0, then
> tg_y->se[cpu]->avg.load_avg has at least changed and most probably goes
> down to 0 too.

Makes sense.

> Can't it be a misplaced trace point?

Yes, you're right, it was a missing tracepoint. I only had se and cfs_rq
PELT tracepoints in __update_load_avg() and attach/detach_entity_load_avg().
I've added them to propagate_entity_load_avg() as well, after the
update_tg_cfs_load() call, and now it makes sense. Thanks!

[...]
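Vincent's delta argument can be sketched in plain C. The structures and
names below (toy_avg, toy_propagate) are made up for illustration and are
not the kernel's: the point is only that the parent level is adjusted by
the *difference* between the child entity's previous and new contribution,
so a parent that drains to 0 implies the child's value moved as well.

```c
#include <assert.h>

/* Toy stand-in for struct sched_avg; not the kernel's definition. */
struct toy_avg {
	long load_avg;
};

/*
 * Sketch of the delta-based propagation idea behind update_tg_cfs_load():
 * the parent cfs_rq is adjusted by the change in the child group entity's
 * load, never overwritten with an absolute value.
 */
static void toy_propagate(struct toy_avg *parent_cfs_rq,
			  struct toy_avg *child_se, long new_child_load)
{
	long delta = new_child_load - child_se->load_avg;

	child_se->load_avg = new_child_load;
	parent_cfs_rq->load_avg += delta;
	if (parent_cfs_rq->load_avg < 0)
		parent_cfs_rq->load_avg = 0;	/* clamp-at-zero, sub_positive() style */
}
```

With parent and child both at 100, propagating a new child value of 0
drives both to 0 in one step, which is why a parent stuck at a stale value
pointed at a tracing gap rather than a propagation bug.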
Re: [PATCH 5/7 v4] sched: propagate asynchronous detach
On 12 October 2016 at 17:03, Dietmar Eggemann wrote:
> On 26/09/16 13:19, Vincent Guittot wrote:
>> A task can be asynchronously detached from a cfs_rq when migrating
>> between CPUs. The load of the migrated task is then removed from the
>> source cfs_rq during its next update. We use this event to set the
>> propagation flag.
>>
>> During load balance, we take advantage of the update of blocked load
>> to propagate any pending changes.
>
> IMHO, it would be a good idea to mention that '2/7 sched: fix
> hierarchical order in rq->leaf_cfs_rq_list' is a hard requirement for
> this to work. The functionality relies on the order of cfs_rq's (top to
> root) in the rq->leaf_cfs_rq_list list.

Yes, I will add a comment.

>> Signed-off-by: Vincent Guittot
>> ---
>>  kernel/sched/fair.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 8ba500f..bd3b6b9 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -3221,6 +3221,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
>>  		sub_positive(&sa->load_avg, r);
>>  		sub_positive(&sa->load_sum, r * LOAD_AVG_MAX);
>>  		removed_load = 1;
>> +		set_tg_cfs_propagate(cfs_rq);
>>  	}
>>
>>  	if (atomic_long_read(&cfs_rq->removed_util_avg)) {
>> @@ -3228,6 +3229,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
>>  		sub_positive(&sa->util_avg, r);
>>  		sub_positive(&sa->util_sum, r * LOAD_AVG_MAX);
>>  		removed_util = 1;
>> +		set_tg_cfs_propagate(cfs_rq);
>>  	}
>>
>>  	decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
>> @@ -6607,6 +6609,10 @@ static void update_blocked_averages(int cpu)
>>
>>  		if (update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, true))
>>  			update_tg_load_avg(cfs_rq, 0);
>> +
>> +		/* Propagate pending load changes to the parent */
>> +		if (cfs_rq->tg->se[cpu])
>> +			update_load_avg(cfs_rq->tg->se[cpu], 0);
>
> In my test (1 task (run/period: 8ms/16ms) in tg_root->tg_x->tg_y->*tg_z*
> and oscillating between cpu1 and cpu2) the cfs_rq related signals are
> nicely going down to 0 after the task has left the cpu, but it doesn't
> seem to be the case for the corresponding se (cfs_rq->tg->se[cpu])?

Strange, because such a use case is part of the functional tests that I
run, and it was working fine according to the last test that I did.

> It should actually work correctly because of the
> update_tg_cfs_util/load() calls in update_load_avg(cfs_rq->tg->se[cpu],
> 0)->propagate_entity_load_avg().

Furthermore, the update of the parent cfs_rq tg_x->cfs_rq[cpu] uses the
delta between the previous and new value of the child tg_y->se[cpu]. So
if tg_x->cfs_rq[cpu]->avg.load_avg goes down to 0, then
tg_y->se[cpu]->avg.load_avg has at least changed and most probably goes
down to 0 too.

Can't it be a misplaced trace point?

[...]
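The hard requirement on patch 2/7 that Dietmar raises (child cfs_rq's
listed before their parents in rq->leaf_cfs_rq_list) can be illustrated
with a toy userspace walk. All names here (toy_cfs_rq, toy_update_blocked)
are invented for the sketch and are not kernel code:

```c
#include <assert.h>
#include <stddef.h>

/* Toy group: its load, a pending (removed) delta, and its parent. */
struct toy_cfs_rq {
	long load_avg;
	long pending;			/* asynchronously removed load */
	struct toy_cfs_rq *parent;	/* NULL for the root group */
};

/*
 * One pass over a list ordered child-before-parent, mimicking what
 * update_blocked_averages() does in the patch: apply each group's
 * pending change, then push the same delta one level up (the role of
 * the update_load_avg(cfs_rq->tg->se[cpu], 0) call).  The ordering is
 * what lets a removal at the deepest level reach the root in a single
 * pass.
 */
static void toy_update_blocked(struct toy_cfs_rq **list, int n)
{
	for (int i = 0; i < n; i++) {
		struct toy_cfs_rq *cfs_rq = list[i];
		long delta = cfs_rq->pending;

		cfs_rq->pending = 0;
		cfs_rq->load_avg += delta;
		if (cfs_rq->load_avg < 0)
			cfs_rq->load_avg = 0;
		if (cfs_rq->parent)
			cfs_rq->parent->pending += delta;
	}
}
```

With the list ordered leaf-first (tg_z before tg_y before tg_x), removing
the task's load at the deepest level drains every level to 0 in one pass;
with the order reversed, the upper levels would only catch up on later
passes.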
Re: [PATCH 5/7 v4] sched: propagate asynchronous detach
On 26/09/16 13:19, Vincent Guittot wrote:
> A task can be asynchronously detached from a cfs_rq when migrating
> between CPUs. The load of the migrated task is then removed from the
> source cfs_rq during its next update. We use this event to set the
> propagation flag.
>
> During load balance, we take advantage of the update of blocked load
> to propagate any pending changes.

IMHO, it would be a good idea to mention that '2/7 sched: fix
hierarchical order in rq->leaf_cfs_rq_list' is a hard requirement for
this to work. The functionality relies on the order of cfs_rq's (top to
root) in the rq->leaf_cfs_rq_list list.

> Signed-off-by: Vincent Guittot
> ---
>  kernel/sched/fair.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8ba500f..bd3b6b9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3221,6 +3221,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
>  		sub_positive(&sa->load_avg, r);
>  		sub_positive(&sa->load_sum, r * LOAD_AVG_MAX);
>  		removed_load = 1;
> +		set_tg_cfs_propagate(cfs_rq);
>  	}
>
>  	if (atomic_long_read(&cfs_rq->removed_util_avg)) {
> @@ -3228,6 +3229,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
>  		sub_positive(&sa->util_avg, r);
>  		sub_positive(&sa->util_sum, r * LOAD_AVG_MAX);
>  		removed_util = 1;
> +		set_tg_cfs_propagate(cfs_rq);
>  	}
>
>  	decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
> @@ -6607,6 +6609,10 @@ static void update_blocked_averages(int cpu)
>
>  		if (update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, true))
>  			update_tg_load_avg(cfs_rq, 0);
> +
> +		/* Propagate pending load changes to the parent */
> +		if (cfs_rq->tg->se[cpu])
> +			update_load_avg(cfs_rq->tg->se[cpu], 0);

In my test (1 task (run/period: 8ms/16ms) in tg_root->tg_x->tg_y->*tg_z*
and oscillating between cpu1 and cpu2) the cfs_rq related signals are
nicely going down to 0 after the task has left the cpu, but it doesn't
seem to be the case for the corresponding se (cfs_rq->tg->se[cpu])?

It should actually work correctly because of the
update_tg_cfs_util/load() calls in update_load_avg(cfs_rq->tg->se[cpu],
0)->propagate_entity_load_avg().

[...]
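The sub_positive() calls in the quoted hunk subtract the removed load but
clamp at zero, so a slightly oversized removal cannot wrap an unsigned
average around to a huge value. A simplified userspace rendition of that
pattern follows; the real kernel macro is type-generic via typeof() and
uses READ_ONCE()/WRITE_ONCE(), both omitted here for clarity:

```c
#include <assert.h>

/*
 * Simplified take on the kernel's sub_positive() pattern: subtract
 * val from *ptr, but clamp the result at 0 instead of letting an
 * unsigned subtraction underflow.
 */
#define sub_positive(_ptr, _val) do {				\
	unsigned long __old = *(_ptr);				\
	unsigned long __res = __old - (unsigned long)(_val);	\
	if (__res > __old)	/* underflow happened */	\
		__res = 0;					\
	*(_ptr) = __res;					\
} while (0)
```

The underflow check works because an unsigned subtraction that goes below
zero wraps to a value larger than the original operand.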