Re: [PATCH 2/2] sched/fair: Relax task_hot() for misfit tasks

2021-04-20 Thread Vincent Guittot
On Mon, 19 Apr 2021 at 19:13, Valentin Schneider
 wrote:
>
> On 16/04/21 15:51, Vincent Guittot wrote:
> > Le jeudi 15 avril 2021 � 18:58:46 (+0100), Valentin Schneider a �crit :
> >> +
> >> +/*
> >> + * What does migrating this task do to our capacity-aware scheduling 
> >> criterion?
> >> + *
> >> + * Returns 1, if the task needs more capacity than the dst CPU can 
> >> provide.
> >> + * Returns 0, if the task needs the extra capacity provided by the dst CPU
> >> + * Returns -1, if the task isn't impacted by the migration wrt capacity.
> >> + */
> >> +static int migrate_degrades_capacity(struct task_struct *p, struct lb_env 
> >> *env)
> >> +{
> >> +if (!(env->sd->flags & SD_ASYM_CPUCAPACITY))
> >> +return -1;
> >> +
> >> +if (!task_fits_capacity(p, capacity_of(env->src_cpu))) {
> >> +if (cpu_capacity_greater(env->dst_cpu, env->src_cpu))
> >> +return 0;
> >> +else if (cpu_capacity_greater(env->src_cpu, env->dst_cpu))
> >> +return 1;
> >> +else
> >> +return -1;
> >> +}
> >
> > Being there means that task fits src_cpu capacity so why testing p against 
> > dst_cpu ?
> >
>
> Because if p fits on src_cpu, we don't want to move it to a dst_cpu on
> which it *doesn't* fit.

OK. I was confused because I thought that this was only to force
migration in case of group_misfit_task but you tried to extend to
other cases... I'm not convinced that you succeeded to cover all cases

Also I found this function which returns 3 values a bit disturbing.
IIUC you tried to align to migrate_degrades_capacity but you should
have better aligned to task_hot and return only 0 or 1. -1 is not used

>
> >> +
> >> +return task_fits_capacity(p, capacity_of(env->dst_cpu)) ? -1 : 1;
> >> +}
> >
> > I prefer the below which easier to read because the same var is use 
> > everywhere and you can remove cpu_capacity_greater.
> >
> > static int migrate_degrades_capacity(struct task_struct *p, struct lb_env 
> > *env)
> > {
> > unsigned long src_capacity, dst_capacity;
> >
> > if (!(env->sd->flags & SD_ASYM_CPUCAPACITY))
> > return -1;
> >
> > src_capacity = capacity_of(env->src_cpu);
> > dst_capacity = capacity_of(env->dst_cpu);
> >
> > if (!task_fits_capacity(p, src_capacity)) {
> > if (capacity_greater(dst_capacity, src_capacity))
> > return 0;
> > else if (capacity_greater(src_capacity, dst_capacity))
> > return 1;
> > else
> > return -1;
> > }
> >
> > return task_fits_capacity(p, dst_capacity) ? -1 : 1;
> > }
> >
>
> I'll take it, thanks!
>
> >
> >> +
> >>  #ifdef CONFIG_NUMA_BALANCING
> >>  /*
> >>   * Returns 1, if task migration degrades locality
> >> @@ -7672,6 +7698,15 @@ int can_migrate_task(struct task_struct *p, struct 
> >> lb_env *env)
> >>  if (tsk_cache_hot == -1)
> >>  tsk_cache_hot = task_hot(p, env);
> >>
> >> +/*
> >> + * On a (sane) asymmetric CPU capacity system, the increase in compute
> >> + * capacity should offset any potential performance hit caused by a
> >> + * migration.
> >> + */
> >> +if ((env->dst_grp_type == group_has_spare) &&
> >
> > Shouldn't it be env->src_grp_type == group_misfit_task to only care of 
> > misfit task case as
> > stated in $subject
> >
>
> Previously this was env->idle != CPU_NOT_IDLE, but I figured dst_grp_type
> could give us a better picture. Staring at this some more, this isn't so
> true when the group size goes up - there's no guarantees the dst_cpu is the
> one that has spare cycles, and the other CPUs might not be able to grant
> the capacity uplift dst_cpu can.

yeah you have to keep checking for env->idle != CPU_NOT_IDLE

>
> As for not using src_grp_type == group_misfit_task, this is pretty much the
> same as [1]. CPU-bound (misfit) task + some other task on the same rq
> implies group_overloaded classification when balancing at MC level (no SMT,
> so one group per CPU).

Is it something that happens often or just a sporadic/transient state
? I mean does it really worth the extra complexity and do you see
performance improvement ?

You should better focus on fixing the simple case of group_misfit_task
task. This other cases looks far more complex with lot of corner cases

>
> [1]: http://lore.kernel.org/r/jhjblcuv2mo.mog...@arm.com


Re: [PATCH 2/2] sched/fair: Relax task_hot() for misfit tasks

2021-04-19 Thread Phil Auld
On Mon, Apr 19, 2021 at 06:17:47PM +0100 Valentin Schneider wrote:
> On 19/04/21 08:59, Phil Auld wrote:
> > On Fri, Apr 16, 2021 at 10:43:38AM +0100 Valentin Schneider wrote:
> >> On 15/04/21 16:39, Rik van Riel wrote:
> >> > On Thu, 2021-04-15 at 18:58 +0100, Valentin Schneider wrote:
> >> >> @@ -7672,6 +7698,15 @@ int can_migrate_task(struct task_struct *p,
> >> >> struct lb_env *env)
> >> >>  if (tsk_cache_hot == -1)
> >> >>  tsk_cache_hot = task_hot(p, env);
> >> >>
> >> >> +   /*
> >> >> +* On a (sane) asymmetric CPU capacity system, the increase in
> >> >> compute
> >> >> +* capacity should offset any potential performance hit caused
> >> >> by a
> >> >> +* migration.
> >> >> +*/
> >> >> +   if ((env->dst_grp_type == group_has_spare) &&
> >> >> +   !migrate_degrades_capacity(p, env))
> >> >> +   tsk_cache_hot = 0;
> >> >
> >> > ... I'm starting to wonder if we should not rename the
> >> > tsk_cache_hot variable to something else to make this
> >> > code more readable. Probably in another patch :)
> >> >
> >>
> >> I'd tend to agree, but naming is hard. "migration_harmful" ?
> >
> > I thought Rik meant tsk_cache_hot, for which I'd suggest at least
> > buying a vowel and putting an 'a' in there :)
> >
> 
> That's the one I was eyeing: s/tsk_cache_hot/migration_harmful/ or
> somesuch. Right now we're feeding it:
>

Fair enough, my bad, the migration part immediately drew me to
migrate_degrades_capacity().

> o migrate_degrades_locality()
> o task_hot()
> 
> and I'm adding another one, so that's 2/3 which don't actually care about
> cache hotness, but rather "does doing this migration degrade/improve
> $criterion?"
> 

prefer_no_migrate? 


Cheers,
Phil
-- 



Re: [PATCH 2/2] sched/fair: Relax task_hot() for misfit tasks

2021-04-19 Thread Valentin Schneider
On 19/04/21 08:59, Phil Auld wrote:
> On Fri, Apr 16, 2021 at 10:43:38AM +0100 Valentin Schneider wrote:
>> On 15/04/21 16:39, Rik van Riel wrote:
>> > On Thu, 2021-04-15 at 18:58 +0100, Valentin Schneider wrote:
>> >> @@ -7672,6 +7698,15 @@ int can_migrate_task(struct task_struct *p,
>> >> struct lb_env *env)
>> >>  if (tsk_cache_hot == -1)
>> >>  tsk_cache_hot = task_hot(p, env);
>> >>
>> >> + /*
>> >> +  * On a (sane) asymmetric CPU capacity system, the increase in
>> >> compute
>> >> +  * capacity should offset any potential performance hit caused
>> >> by a
>> >> +  * migration.
>> >> +  */
>> >> + if ((env->dst_grp_type == group_has_spare) &&
>> >> + !migrate_degrades_capacity(p, env))
>> >> + tsk_cache_hot = 0;
>> >
>> > ... I'm starting to wonder if we should not rename the
>> > tsk_cache_hot variable to something else to make this
>> > code more readable. Probably in another patch :)
>> >
>>
>> I'd tend to agree, but naming is hard. "migration_harmful" ?
>
> I thought Rik meant tsk_cache_hot, for which I'd suggest at least
> buying a vowel and putting an 'a' in there :)
>

That's the one I was eyeing: s/tsk_cache_hot/migration_harmful/ or
somesuch. Right now we're feeding it:

o migrate_degrades_locality()
o task_hot()

and I'm adding another one, so that's 2/3 which don't actually care about
cache hotness, but rather "does doing this migration degrade/improve
$criterion?"


Re: [PATCH 2/2] sched/fair: Relax task_hot() for misfit tasks

2021-04-19 Thread Valentin Schneider
On 16/04/21 15:51, Vincent Guittot wrote:
> Le jeudi 15 avril 2021 � 18:58:46 (+0100), Valentin Schneider a �crit :
>> +
>> +/*
>> + * What does migrating this task do to our capacity-aware scheduling 
>> criterion?
>> + *
>> + * Returns 1, if the task needs more capacity than the dst CPU can provide.
>> + * Returns 0, if the task needs the extra capacity provided by the dst CPU
>> + * Returns -1, if the task isn't impacted by the migration wrt capacity.
>> + */
>> +static int migrate_degrades_capacity(struct task_struct *p, struct lb_env 
>> *env)
>> +{
>> +if (!(env->sd->flags & SD_ASYM_CPUCAPACITY))
>> +return -1;
>> +
>> +if (!task_fits_capacity(p, capacity_of(env->src_cpu))) {
>> +if (cpu_capacity_greater(env->dst_cpu, env->src_cpu))
>> +return 0;
>> +else if (cpu_capacity_greater(env->src_cpu, env->dst_cpu))
>> +return 1;
>> +else
>> +return -1;
>> +}
>
> Being there means that task fits src_cpu capacity so why testing p against 
> dst_cpu ?
>

Because if p fits on src_cpu, we don't want to move it to a dst_cpu on
which it *doesn't* fit.

>> +
>> +return task_fits_capacity(p, capacity_of(env->dst_cpu)) ? -1 : 1;
>> +}
>
> I prefer the below which easier to read because the same var is use 
> everywhere and you can remove cpu_capacity_greater.
>
> static int migrate_degrades_capacity(struct task_struct *p, struct lb_env 
> *env)
> {
> unsigned long src_capacity, dst_capacity;
>
> if (!(env->sd->flags & SD_ASYM_CPUCAPACITY))
> return -1;
>
> src_capacity = capacity_of(env->src_cpu);
> dst_capacity = capacity_of(env->dst_cpu);
>
> if (!task_fits_capacity(p, src_capacity)) {
> if (capacity_greater(dst_capacity, src_capacity))
> return 0;
> else if (capacity_greater(src_capacity, dst_capacity))
> return 1;
> else
> return -1;
> }
>
> return task_fits_capacity(p, dst_capacity) ? -1 : 1;
> }
>

I'll take it, thanks!

>
>> +
>>  #ifdef CONFIG_NUMA_BALANCING
>>  /*
>>   * Returns 1, if task migration degrades locality
>> @@ -7672,6 +7698,15 @@ int can_migrate_task(struct task_struct *p, struct 
>> lb_env *env)
>>  if (tsk_cache_hot == -1)
>>  tsk_cache_hot = task_hot(p, env);
>>
>> +/*
>> + * On a (sane) asymmetric CPU capacity system, the increase in compute
>> + * capacity should offset any potential performance hit caused by a
>> + * migration.
>> + */
>> +if ((env->dst_grp_type == group_has_spare) &&
>
> Shouldn't it be env->src_grp_type == group_misfit_task to only care of misfit 
> task case as
> stated in $subject
>

Previously this was env->idle != CPU_NOT_IDLE, but I figured dst_grp_type
could give us a better picture. Staring at this some more, this isn't so
true when the group size goes up - there's no guarantees the dst_cpu is the
one that has spare cycles, and the other CPUs might not be able to grant
the capacity uplift dst_cpu can.

As for not using src_grp_type == group_misfit_task, this is pretty much the
same as [1]. CPU-bound (misfit) task + some other task on the same rq
implies group_overloaded classification when balancing at MC level (no SMT,
so one group per CPU).

[1]: http://lore.kernel.org/r/jhjblcuv2mo.mog...@arm.com


Re: [PATCH 2/2] sched/fair: Relax task_hot() for misfit tasks

2021-04-19 Thread Phil Auld
On Fri, Apr 16, 2021 at 10:43:38AM +0100 Valentin Schneider wrote:
> On 15/04/21 16:39, Rik van Riel wrote:
> > On Thu, 2021-04-15 at 18:58 +0100, Valentin Schneider wrote:
> >> Consider the following topology:
> >>
> >> Long story short, preempted misfit tasks are affected by task_hot(),
> >> while
> >> currently running misfit tasks are intentionally preempted by the
> >> stopper
> >> task to migrate them over to a higher-capacity CPU.
> >>
> >> Align detach_tasks() with the active-balance logic and let it pick a
> >> cache-hot misfit task when the destination CPU can provide a capacity
> >> uplift.
> >>
> >> Signed-off-by: Valentin Schneider 
> >
> > Reviewed-by: Rik van Riel 
> >
> 
> Thanks!
> 
> >
> > This patch looks good, but...
> >
> >> @@ -7672,6 +7698,15 @@ int can_migrate_task(struct task_struct *p,
> >> struct lb_env *env)
> >>  if (tsk_cache_hot == -1)
> >>  tsk_cache_hot = task_hot(p, env);
> >>
> >> +  /*
> >> +   * On a (sane) asymmetric CPU capacity system, the increase in
> >> compute
> >> +   * capacity should offset any potential performance hit caused
> >> by a
> >> +   * migration.
> >> +   */
> >> +  if ((env->dst_grp_type == group_has_spare) &&
> >> +  !migrate_degrades_capacity(p, env))
> >> +  tsk_cache_hot = 0;
> >
> > ... I'm starting to wonder if we should not rename the
> > tsk_cache_hot variable to something else to make this
> > code more readable. Probably in another patch :)
> >
> 
> I'd tend to agree, but naming is hard. "migration_harmful" ?

I thought Rik meant tsk_cache_hot, for which I'd suggest at least
buying a vowel and putting an 'a' in there :) 


Cheers,
Phil

> 
> > --
> > All Rights Reversed.
> 

-- 



Re: [PATCH 2/2] sched/fair: Relax task_hot() for misfit tasks

2021-04-16 Thread Vincent Guittot
Le jeudi 15 avril 2021 à 18:58:46 (+0100), Valentin Schneider a écrit :
> Consider the following topology:
> 
>   DIE [  ]
>   MC  [][]
>0  1  2  3
> 
>   capacity_orig_of(x \in {0-1}) < capacity_orig_of(x \in {2-3})
> 
> w/ CPUs 2-3 idle and CPUs 0-1 running CPU hogs (util_avg=1024).
> 
> When CPU2 goes through load_balance() (via periodic / NOHZ balance), it
> should pull one CPU hog from either CPU0 or CPU1 (this is misfit task
> upmigration). However, should a e.g. pcpu kworker awake on CPU0 just before
> this load_balance() happens and preempt the CPU hog running there, we would
> have, for the [0-1] group at CPU2's DIE level:
> 
> o sgs->sum_nr_running > sgs->group_weight
> o sgs->group_capacity * 100 < sgs->group_util * imbalance_pct
> 
> IOW, this group is group_overloaded.
> 
> Considering CPU0 is picked by find_busiest_queue(), we would then visit the
> preempted CPU hog in detach_tasks(). However, given it has just been
> preempted by this pcpu kworker, task_hot() will prevent it from being
> detached. We then leave load_balance() without having done anything.
> 
> Long story short, preempted misfit tasks are affected by task_hot(), while
> currently running misfit tasks are intentionally preempted by the stopper
> task to migrate them over to a higher-capacity CPU.
> 
> Align detach_tasks() with the active-balance logic and let it pick a
> cache-hot misfit task when the destination CPU can provide a capacity
> uplift.
> 
> Signed-off-by: Valentin Schneider 
> ---
>  kernel/sched/fair.c | 36 
>  1 file changed, 36 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index d2d1a69d7aa7..43fc98d34276 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7493,6 +7493,7 @@ struct lb_env {
>   enum fbq_type   fbq_type;
>   enum migration_type migration_type;
>   enum group_type src_grp_type;
> + enum group_type dst_grp_type;
>   struct list_headtasks;
>  };
>  
> @@ -7533,6 +7534,31 @@ static int task_hot(struct task_struct *p, struct 
> lb_env *env)
>   return delta < (s64)sysctl_sched_migration_cost;
>  }
>  
> +
> +/*
> + * What does migrating this task do to our capacity-aware scheduling 
> criterion?
> + *
> + * Returns 1, if the task needs more capacity than the dst CPU can provide.
> + * Returns 0, if the task needs the extra capacity provided by the dst CPU
> + * Returns -1, if the task isn't impacted by the migration wrt capacity.
> + */
> +static int migrate_degrades_capacity(struct task_struct *p, struct lb_env 
> *env)
> +{
> + if (!(env->sd->flags & SD_ASYM_CPUCAPACITY))
> + return -1;
> +
> + if (!task_fits_capacity(p, capacity_of(env->src_cpu))) {
> + if (cpu_capacity_greater(env->dst_cpu, env->src_cpu))
> + return 0;
> + else if (cpu_capacity_greater(env->src_cpu, env->dst_cpu))
> + return 1;
> + else
> + return -1;
> + }

Being there means that task fits src_cpu capacity so why testing p against 
dst_cpu ?

> +
> + return task_fits_capacity(p, capacity_of(env->dst_cpu)) ? -1 : 1;
> +}

I prefer the below which easier to read because the same var is use everywhere 
and you can remove cpu_capacity_greater.

static int migrate_degrades_capacity(struct task_struct *p, struct lb_env *env)
{
unsigned long src_capacity, dst_capacity;

if (!(env->sd->flags & SD_ASYM_CPUCAPACITY))
return -1;

src_capacity = capacity_of(env->src_cpu);
dst_capacity = capacity_of(env->dst_cpu);

if (!task_fits_capacity(p, src_capacity)) {
if (capacity_greater(dst_capacity, src_capacity))
return 0;
else if (capacity_greater(src_capacity, dst_capacity))
return 1;
else
return -1;
}

return task_fits_capacity(p, dst_capacity) ? -1 : 1;
}


> +
>  #ifdef CONFIG_NUMA_BALANCING
>  /*
>   * Returns 1, if task migration degrades locality
> @@ -7672,6 +7698,15 @@ int can_migrate_task(struct task_struct *p, struct 
> lb_env *env)
>   if (tsk_cache_hot == -1)
>   tsk_cache_hot = task_hot(p, env);
>  
> + /*
> +  * On a (sane) asymmetric CPU capacity system, the increase in compute
> +  * capacity should offset any potential performance hit caused by a
> +  * migration.
> +  */
> + if ((env->dst_grp_type == group_has_spare) &&

Shouldn't it be env->src_grp_type == group_misfit_task to only care of misfit 
task case as
stated in $subject


> + !migrate_degrades_capacity(p, env))
> + tsk_cache_hot = 0;
> +
>   if (tsk_cache_hot <= 0 ||
>   env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
>   if (tsk_cache_hot == 1) {
> @@ -9310,6 +9345,7 @@ static struct sched_group *find_busiest_group(struct 
> lb_env *env)
>   if (!sds.busiest)
>   goto 

Re: [PATCH 2/2] sched/fair: Relax task_hot() for misfit tasks

2021-04-16 Thread Valentin Schneider
On 15/04/21 16:39, Rik van Riel wrote:
> On Thu, 2021-04-15 at 18:58 +0100, Valentin Schneider wrote:
>> Consider the following topology:
>>
>> Long story short, preempted misfit tasks are affected by task_hot(),
>> while
>> currently running misfit tasks are intentionally preempted by the
>> stopper
>> task to migrate them over to a higher-capacity CPU.
>>
>> Align detach_tasks() with the active-balance logic and let it pick a
>> cache-hot misfit task when the destination CPU can provide a capacity
>> uplift.
>>
>> Signed-off-by: Valentin Schneider 
>
> Reviewed-by: Rik van Riel 
>

Thanks!

>
> This patch looks good, but...
>
>> @@ -7672,6 +7698,15 @@ int can_migrate_task(struct task_struct *p,
>> struct lb_env *env)
>>  if (tsk_cache_hot == -1)
>>  tsk_cache_hot = task_hot(p, env);
>>
>> +/*
>> + * On a (sane) asymmetric CPU capacity system, the increase in
>> compute
>> + * capacity should offset any potential performance hit caused
>> by a
>> + * migration.
>> + */
>> +if ((env->dst_grp_type == group_has_spare) &&
>> +!migrate_degrades_capacity(p, env))
>> +tsk_cache_hot = 0;
>
> ... I'm starting to wonder if we should not rename the
> tsk_cache_hot variable to something else to make this
> code more readable. Probably in another patch :)
>

I'd tend to agree, but naming is hard. "migration_harmful" ?

> --
> All Rights Reversed.


Re: [PATCH 2/2] sched/fair: Relax task_hot() for misfit tasks

2021-04-15 Thread Rik van Riel
On Thu, 2021-04-15 at 18:58 +0100, Valentin Schneider wrote:
> Consider the following topology:
> 
> Long story short, preempted misfit tasks are affected by task_hot(),
> while
> currently running misfit tasks are intentionally preempted by the
> stopper
> task to migrate them over to a higher-capacity CPU.
> 
> Align detach_tasks() with the active-balance logic and let it pick a
> cache-hot misfit task when the destination CPU can provide a capacity
> uplift.
> 
> Signed-off-by: Valentin Schneider 

Reviewed-by: Rik van Riel 


This patch looks good, but...

> @@ -7672,6 +7698,15 @@ int can_migrate_task(struct task_struct *p,
> struct lb_env *env)
>   if (tsk_cache_hot == -1)
>   tsk_cache_hot = task_hot(p, env);
>  
> + /*
> +  * On a (sane) asymmetric CPU capacity system, the increase in
> compute
> +  * capacity should offset any potential performance hit caused
> by a
> +  * migration.
> +  */
> + if ((env->dst_grp_type == group_has_spare) &&
> + !migrate_degrades_capacity(p, env))
> + tsk_cache_hot = 0;

... I'm starting to wonder if we should not rename the
tsk_cache_hot variable to something else to make this
code more readable. Probably in another patch :)

-- 
All Rights Reversed.


signature.asc
Description: This is a digitally signed message part