Re: [PATCH 5/7 v4] sched: propagate asynchronous detach

2016-10-12 Thread Dietmar Eggemann
On 12/10/16 16:45, Vincent Guittot wrote:
> On 12 October 2016 at 17:03, Dietmar Eggemann wrote:
>> On 26/09/16 13:19, Vincent Guittot wrote:

[...]

>>> @@ -6607,6 +6609,10 @@ static void update_blocked_averages(int cpu)
>>>
>>>   if (update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, true))
>>>   update_tg_load_avg(cfs_rq, 0);
>>> +
>>> + /* Propagate pending load changes to the parent */
>>> + if (cfs_rq->tg->se[cpu])
>>> + update_load_avg(cfs_rq->tg->se[cpu], 0);
>>
>> In my test (1 task (run/period: 8ms/16ms) in tg_root->tg_x->tg_y->*tg_z*
>> and oscillating between cpu1 and cpu2) the cfs_rq related signals are
>> nicely going down to 0 after the task has left the cpu but it doesn't
>> seem to be the case for the corresponding se (cfs_rq->tg->se[cpu])?
> 
> Strange, because such a use case is part of the functional tests that
> I run, and it was working fine according to the last test that I did.
> 
>>
>> It should actually work correctly because of the
>> update_tg_cfs_util/load() calls in update_load_avg(cfs_rq->tg->se[cpu],
>> 0)->propagate_entity_load_avg()
> 
> Furthermore, the update of the parent cfs_rq tg_x->cfs_rq[cpu] uses
> the delta between the previous and new value of the child tg_y->se[cpu].
> So if tg_x->cfs_rq[cpu]->avg.load_avg goes down to 0,
> tg_y->se[cpu]->avg.load_avg has changed as well and most probably
> goes down to 0 too.

Makes sense.

> 
> Can't it be a misplaced tracepoint?

Yes, you're right, it was a missing tracepoint. I only had se and cfs_rq
pelt tracepoints in __update_load_avg() and
attach/detach_entity_load_avg(). I've added them as well to
propagate_entity_load_avg() after the update_tg_cfs_load() call and now
it makes sense. Thanks!

[...]


Re: [PATCH 5/7 v4] sched: propagate asynchronous detach

2016-10-12 Thread Vincent Guittot
On 12 October 2016 at 17:03, Dietmar Eggemann  wrote:
> On 26/09/16 13:19, Vincent Guittot wrote:
>> A task can be asynchronously detached from cfs_rq when migrating
>> between CPUs. The load of the migrated task is then removed from
>> source cfs_rq during its next update. We use this event to set the
>> propagation flag.
>>
>> During load balance, we take advantage of the update of blocked load
>> to propagate any pending changes.
>
> IMHO, it would be a good idea to mention that '2/7 sched: fix
> hierarchical order in rq->leaf_cfs_rq_list' is a hard requirement for
> this to work. The functionality relies on the order of cfs_rq's (top to
> root) in the rq->leaf_cfs_rq_list list.

Yes, I will add a comment.

>
>> Signed-off-by: Vincent Guittot 
>> ---
>>  kernel/sched/fair.c | 6 ++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 8ba500f..bd3b6b9 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -3221,6 +3221,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
>>   sub_positive(&sa->load_avg, r);
>>   sub_positive(&sa->load_sum, r * LOAD_AVG_MAX);
>>   removed_load = 1;
>> + set_tg_cfs_propagate(cfs_rq);
>>   }
>>
>>   if (atomic_long_read(&cfs_rq->removed_util_avg)) {
>> @@ -3228,6 +3229,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
>>   sub_positive(&sa->util_avg, r);
>>   sub_positive(&sa->util_sum, r * LOAD_AVG_MAX);
>>   removed_util = 1;
>> + set_tg_cfs_propagate(cfs_rq);
>>   }
>>
>>   decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
>> @@ -6607,6 +6609,10 @@ static void update_blocked_averages(int cpu)
>>
>>   if (update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, true))
>>   update_tg_load_avg(cfs_rq, 0);
>> +
>> + /* Propagate pending load changes to the parent */
>> + if (cfs_rq->tg->se[cpu])
>> + update_load_avg(cfs_rq->tg->se[cpu], 0);
>
> In my test (1 task (run/period: 8ms/16ms) in tg_root->tg_x->tg_y->*tg_z*
> and oscillating between cpu1 and cpu2) the cfs_rq related signals are
> nicely going down to 0 after the task has left the cpu but it doesn't
> seem to be the case for the corresponding se (cfs_rq->tg->se[cpu])?

Strange, because such a use case is part of the functional tests that
I run, and it was working fine according to the last test that I did.

>
> It should actually work correctly because of the
> update_tg_cfs_util/load() calls in update_load_avg(cfs_rq->tg->se[cpu],
> 0)->propagate_entity_load_avg()

Furthermore, the update of the parent cfs_rq tg_x->cfs_rq[cpu] uses
the delta between the previous and new value of the child tg_y->se[cpu].
So if tg_x->cfs_rq[cpu]->avg.load_avg goes down to 0,
tg_y->se[cpu]->avg.load_avg has changed as well and most probably
goes down to 0 too.
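The delta argument above can be sketched with a toy model (all names here, such as toy_se, toy_cfs_rq, and propagate_load_delta, are illustrative, not the kernel's): the parent is not recomputed from scratch, it only consumes the difference between the child se's previous and new load_avg, with a clamp at zero in the spirit of sub_positive().

```c
#include <assert.h>

/* Illustrative stand-ins for a child sched entity and its parent cfs_rq. */
struct toy_se { long load_avg; };
struct toy_cfs_rq { long load_avg; };

/*
 * Apply the child's change of load to the parent as a delta:
 * if the child's load goes to 0, the parent loses exactly the
 * amount the child previously contributed.
 */
static void propagate_load_delta(struct toy_cfs_rq *parent,
                                 struct toy_se *child_se,
                                 long new_child_load)
{
    long delta = new_child_load - child_se->load_avg;

    child_se->load_avg = new_child_load;
    parent->load_avg += delta;
    if (parent->load_avg < 0)   /* mirror sub_positive(): never below 0 */
        parent->load_avg = 0;
}
```

With this model, a child dropping from 512 to 0 drags the parent's contribution down by the same 512, which is why a parent cfs_rq reaching 0 implies the child se's load changed too.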

Can't it be a misplaced tracepoint?

>
> [...]


Re: [PATCH 5/7 v4] sched: propagate asynchronous detach

2016-10-12 Thread Dietmar Eggemann
On 26/09/16 13:19, Vincent Guittot wrote:
> A task can be asynchronously detached from cfs_rq when migrating
> between CPUs. The load of the migrated task is then removed from
> source cfs_rq during its next update. We use this event to set the
> propagation flag.
> 
> During load balance, we take advantage of the update of blocked load
> to propagate any pending changes.

IMHO, it would be a good idea to mention that '2/7 sched: fix
hierarchical order in rq->leaf_cfs_rq_list' is a hard requirement for
this to work. The functionality relies on the order of cfs_rq's (top to
root) in the rq->leaf_cfs_rq_list list.
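The ordering requirement can be illustrated with a toy model (toy_group, pending_delta, and update_blocked are made-up names, not kernel code): if children precede their parents in the list, a single pass is enough to carry a pending detach delta from the deepest group all the way to the root, each group first absorbing what its children pushed up and then pushing its own delta further.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative task group with a delta left behind by an async detach. */
struct toy_group {
    long load_avg;
    long pending_delta;          /* set when a task detached asynchronously */
    struct toy_group *parent;
};

/*
 * One pass over the list, analogous to update_blocked_averages():
 * correct in a single pass only if children precede parents.
 */
static void update_blocked(struct toy_group **list, int n)
{
    for (int i = 0; i < n; i++) {
        struct toy_group *g = list[i];

        g->load_avg += g->pending_delta;
        if (g->parent)
            g->parent->pending_delta += g->pending_delta;
        g->pending_delta = 0;
    }
}
```

If the list were ordered parent-before-child instead, the root would be visited before its children's deltas reached it, and the propagation would lag by one pass per hierarchy level.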

> Signed-off-by: Vincent Guittot 
> ---
>  kernel/sched/fair.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8ba500f..bd3b6b9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3221,6 +3221,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
>   sub_positive(&sa->load_avg, r);
>   sub_positive(&sa->load_sum, r * LOAD_AVG_MAX);
>   removed_load = 1;
> + set_tg_cfs_propagate(cfs_rq);
>   }
>  
>   if (atomic_long_read(&cfs_rq->removed_util_avg)) {
> @@ -3228,6 +3229,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
>   sub_positive(&sa->util_avg, r);
>   sub_positive(&sa->util_sum, r * LOAD_AVG_MAX);
>   removed_util = 1;
> + set_tg_cfs_propagate(cfs_rq);
>   }
>  
>   decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
> @@ -6607,6 +6609,10 @@ static void update_blocked_averages(int cpu)
>  
>   if (update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, true))
>   update_tg_load_avg(cfs_rq, 0);
> +
> + /* Propagate pending load changes to the parent */
> + if (cfs_rq->tg->se[cpu])
> + update_load_avg(cfs_rq->tg->se[cpu], 0);

In my test (1 task (run/period: 8ms/16ms) in tg_root->tg_x->tg_y->*tg_z*
and oscillating between cpu1 and cpu2) the cfs_rq related signals are
nicely going down to 0 after the task has left the cpu but it doesn't
seem to be the case for the corresponding se (cfs_rq->tg->se[cpu])?

It should actually work correctly because of the
update_tg_cfs_util/load() calls in update_load_avg(cfs_rq->tg->se[cpu],
0)->propagate_entity_load_avg()

[...]