Re: [PATCH 2/2] sched/fair: Relax task_hot() for misfit tasks
On Mon, 19 Apr 2021 at 19:13, Valentin Schneider wrote: > > On 16/04/21 15:51, Vincent Guittot wrote: > > Le jeudi 15 avril 2021 � 18:58:46 (+0100), Valentin Schneider a �crit : > >> + > >> +/* > >> + * What does migrating this task do to our capacity-aware scheduling > >> criterion? > >> + * > >> + * Returns 1, if the task needs more capacity than the dst CPU can > >> provide. > >> + * Returns 0, if the task needs the extra capacity provided by the dst CPU > >> + * Returns -1, if the task isn't impacted by the migration wrt capacity. > >> + */ > >> +static int migrate_degrades_capacity(struct task_struct *p, struct lb_env > >> *env) > >> +{ > >> +if (!(env->sd->flags & SD_ASYM_CPUCAPACITY)) > >> +return -1; > >> + > >> +if (!task_fits_capacity(p, capacity_of(env->src_cpu))) { > >> +if (cpu_capacity_greater(env->dst_cpu, env->src_cpu)) > >> +return 0; > >> +else if (cpu_capacity_greater(env->src_cpu, env->dst_cpu)) > >> +return 1; > >> +else > >> +return -1; > >> +} > > > > Being there means that task fits src_cpu capacity so why testing p against > > dst_cpu ? > > > > Because if p fits on src_cpu, we don't want to move it to a dst_cpu on > which it *doesn't* fit. OK. I was confused because I thought that this was only to force migration in case of group_misfit_task but you tried to extend to other cases... I'm not convinced that you succeeded to cover all cases Also I found this function which returns 3 values a bit disturbing. IIUC you tried to align to migrate_degrades_capacity but you should have better aligned to task_hot and return only 0 or 1. -1 is not used > > >> + > >> +return task_fits_capacity(p, capacity_of(env->dst_cpu)) ? -1 : 1; > >> +} > > > > I prefer the below which easier to read because the same var is use > > everywhere and you can remove cpu_capacity_greater. > > > > static int migrate_degrades_capacity(struct task_struct *p, struct lb_env > > *env) > > { > > unsigned long src_capacity, dst_capacity; > > > > if (!(env->sd->flags & SD_ASYM_CPUCAPACITY)) > > return -1; > > > > src_capacity = capacity_of(env->src_cpu); > > dst_capacity = capacity_of(env->dst_cpu); > > > > if (!task_fits_capacity(p, src_capacity)) { > > if (capacity_greater(dst_capacity, src_capacity)) > > return 0; > > else if (capacity_greater(src_capacity, dst_capacity)) > > return 1; > > else > > return -1; > > } > > > > return task_fits_capacity(p, dst_capacity) ? -1 : 1; > > } > > > > I'll take it, thanks! > > > > >> + > >> #ifdef CONFIG_NUMA_BALANCING > >> /* > >> * Returns 1, if task migration degrades locality > >> @@ -7672,6 +7698,15 @@ int can_migrate_task(struct task_struct *p, struct > >> lb_env *env) > >> if (tsk_cache_hot == -1) > >> tsk_cache_hot = task_hot(p, env); > >> > >> +/* > >> + * On a (sane) asymmetric CPU capacity system, the increase in compute > >> + * capacity should offset any potential performance hit caused by a > >> + * migration. > >> + */ > >> +if ((env->dst_grp_type == group_has_spare) && > > > > Shouldn't it be env->src_grp_type == group_misfit_task to only care of > > misfit task case as > > stated in $subject > > > > Previously this was env->idle != CPU_NOT_IDLE, but I figured dst_grp_type > could give us a better picture. Staring at this some more, this isn't so > true when the group size goes up - there's no guarantees the dst_cpu is the > one that has spare cycles, and the other CPUs might not be able to grant > the capacity uplift dst_cpu can. yeah you have to keep checking for env->idle != CPU_NOT_IDLE > > As for not using src_grp_type == group_misfit_task, this is pretty much the > same as [1]. CPU-bound (misfit) task + some other task on the same rq > implies group_overloaded classification when balancing at MC level (no SMT, > so one group per CPU). Is it something that happens often or just a sporadic/transient state ? I mean does it really worth the extra complexity and do you see performance improvement ? You should better focus on fixing the simple case of group_misfit_task task. This other cases looks far more complex with lot of corner cases > > [1]: http://lore.kernel.org/r/jhjblcuv2mo.mog...@arm.com
Re: [PATCH 2/2] sched/fair: Relax task_hot() for misfit tasks
On Mon, Apr 19, 2021 at 06:17:47PM +0100 Valentin Schneider wrote: > On 19/04/21 08:59, Phil Auld wrote: > > On Fri, Apr 16, 2021 at 10:43:38AM +0100 Valentin Schneider wrote: > >> On 15/04/21 16:39, Rik van Riel wrote: > >> > On Thu, 2021-04-15 at 18:58 +0100, Valentin Schneider wrote: > >> >> @@ -7672,6 +7698,15 @@ int can_migrate_task(struct task_struct *p, > >> >> struct lb_env *env) > >> >> if (tsk_cache_hot == -1) > >> >> tsk_cache_hot = task_hot(p, env); > >> >> > >> >> + /* > >> >> +* On a (sane) asymmetric CPU capacity system, the increase in > >> >> compute > >> >> +* capacity should offset any potential performance hit caused > >> >> by a > >> >> +* migration. > >> >> +*/ > >> >> + if ((env->dst_grp_type == group_has_spare) && > >> >> + !migrate_degrades_capacity(p, env)) > >> >> + tsk_cache_hot = 0; > >> > > >> > ... I'm starting to wonder if we should not rename the > >> > tsk_cache_hot variable to something else to make this > >> > code more readable. Probably in another patch :) > >> > > >> > >> I'd tend to agree, but naming is hard. "migration_harmful" ? > > > > I thought Rik meant tsk_cache_hot, for which I'd suggest at least > > buying a vowel and putting an 'a' in there :) > > > > That's the one I was eyeing: s/tsk_cache_hot/migration_harmful/ or > somesuch. Right now we're feeding it: > Fair enough, my bad, the migration part immediately drew me to migrate_degrades_capacity(). > o migrate_degrades_locality() > o task_hot() > > and I'm adding another one, so that's 2/3 which don't actually care about > cache hotness, but rather "does doing this migration degrade/improve > $criterion?" > prefer_no_migrate? Cheers, Phil --
Re: [PATCH 2/2] sched/fair: Relax task_hot() for misfit tasks
On 19/04/21 08:59, Phil Auld wrote: > On Fri, Apr 16, 2021 at 10:43:38AM +0100 Valentin Schneider wrote: >> On 15/04/21 16:39, Rik van Riel wrote: >> > On Thu, 2021-04-15 at 18:58 +0100, Valentin Schneider wrote: >> >> @@ -7672,6 +7698,15 @@ int can_migrate_task(struct task_struct *p, >> >> struct lb_env *env) >> >> if (tsk_cache_hot == -1) >> >> tsk_cache_hot = task_hot(p, env); >> >> >> >> + /* >> >> + * On a (sane) asymmetric CPU capacity system, the increase in >> >> compute >> >> + * capacity should offset any potential performance hit caused >> >> by a >> >> + * migration. >> >> + */ >> >> + if ((env->dst_grp_type == group_has_spare) && >> >> + !migrate_degrades_capacity(p, env)) >> >> + tsk_cache_hot = 0; >> > >> > ... I'm starting to wonder if we should not rename the >> > tsk_cache_hot variable to something else to make this >> > code more readable. Probably in another patch :) >> > >> >> I'd tend to agree, but naming is hard. "migration_harmful" ? > > I thought Rik meant tsk_cache_hot, for which I'd suggest at least > buying a vowel and putting an 'a' in there :) > That's the one I was eyeing: s/tsk_cache_hot/migration_harmful/ or somesuch. Right now we're feeding it: o migrate_degrades_locality() o task_hot() and I'm adding another one, so that's 2/3 which don't actually care about cache hotness, but rather "does doing this migration degrade/improve $criterion?"
Re: [PATCH 2/2] sched/fair: Relax task_hot() for misfit tasks
On 16/04/21 15:51, Vincent Guittot wrote: > Le jeudi 15 avril 2021 � 18:58:46 (+0100), Valentin Schneider a �crit : >> + >> +/* >> + * What does migrating this task do to our capacity-aware scheduling >> criterion? >> + * >> + * Returns 1, if the task needs more capacity than the dst CPU can provide. >> + * Returns 0, if the task needs the extra capacity provided by the dst CPU >> + * Returns -1, if the task isn't impacted by the migration wrt capacity. >> + */ >> +static int migrate_degrades_capacity(struct task_struct *p, struct lb_env >> *env) >> +{ >> +if (!(env->sd->flags & SD_ASYM_CPUCAPACITY)) >> +return -1; >> + >> +if (!task_fits_capacity(p, capacity_of(env->src_cpu))) { >> +if (cpu_capacity_greater(env->dst_cpu, env->src_cpu)) >> +return 0; >> +else if (cpu_capacity_greater(env->src_cpu, env->dst_cpu)) >> +return 1; >> +else >> +return -1; >> +} > > Being there means that task fits src_cpu capacity so why testing p against > dst_cpu ? > Because if p fits on src_cpu, we don't want to move it to a dst_cpu on which it *doesn't* fit. >> + >> +return task_fits_capacity(p, capacity_of(env->dst_cpu)) ? -1 : 1; >> +} > > I prefer the below which easier to read because the same var is use > everywhere and you can remove cpu_capacity_greater. > > static int migrate_degrades_capacity(struct task_struct *p, struct lb_env > *env) > { > unsigned long src_capacity, dst_capacity; > > if (!(env->sd->flags & SD_ASYM_CPUCAPACITY)) > return -1; > > src_capacity = capacity_of(env->src_cpu); > dst_capacity = capacity_of(env->dst_cpu); > > if (!task_fits_capacity(p, src_capacity)) { > if (capacity_greater(dst_capacity, src_capacity)) > return 0; > else if (capacity_greater(src_capacity, dst_capacity)) > return 1; > else > return -1; > } > > return task_fits_capacity(p, dst_capacity) ? -1 : 1; > } > I'll take it, thanks! > >> + >> #ifdef CONFIG_NUMA_BALANCING >> /* >> * Returns 1, if task migration degrades locality >> @@ -7672,6 +7698,15 @@ int can_migrate_task(struct task_struct *p, struct >> lb_env *env) >> if (tsk_cache_hot == -1) >> tsk_cache_hot = task_hot(p, env); >> >> +/* >> + * On a (sane) asymmetric CPU capacity system, the increase in compute >> + * capacity should offset any potential performance hit caused by a >> + * migration. >> + */ >> +if ((env->dst_grp_type == group_has_spare) && > > Shouldn't it be env->src_grp_type == group_misfit_task to only care of misfit > task case as > stated in $subject > Previously this was env->idle != CPU_NOT_IDLE, but I figured dst_grp_type could give us a better picture. Staring at this some more, this isn't so true when the group size goes up - there's no guarantees the dst_cpu is the one that has spare cycles, and the other CPUs might not be able to grant the capacity uplift dst_cpu can. As for not using src_grp_type == group_misfit_task, this is pretty much the same as [1]. CPU-bound (misfit) task + some other task on the same rq implies group_overloaded classification when balancing at MC level (no SMT, so one group per CPU). [1]: http://lore.kernel.org/r/jhjblcuv2mo.mog...@arm.com
Re: [PATCH 2/2] sched/fair: Relax task_hot() for misfit tasks
On Fri, Apr 16, 2021 at 10:43:38AM +0100 Valentin Schneider wrote: > On 15/04/21 16:39, Rik van Riel wrote: > > On Thu, 2021-04-15 at 18:58 +0100, Valentin Schneider wrote: > >> Consider the following topology: > >> > >> Long story short, preempted misfit tasks are affected by task_hot(), > >> while > >> currently running misfit tasks are intentionally preempted by the > >> stopper > >> task to migrate them over to a higher-capacity CPU. > >> > >> Align detach_tasks() with the active-balance logic and let it pick a > >> cache-hot misfit task when the destination CPU can provide a capacity > >> uplift. > >> > >> Signed-off-by: Valentin Schneider > > > > Reviewed-by: Rik van Riel > > > > Thanks! > > > > > This patch looks good, but... > > > >> @@ -7672,6 +7698,15 @@ int can_migrate_task(struct task_struct *p, > >> struct lb_env *env) > >> if (tsk_cache_hot == -1) > >> tsk_cache_hot = task_hot(p, env); > >> > >> + /* > >> + * On a (sane) asymmetric CPU capacity system, the increase in > >> compute > >> + * capacity should offset any potential performance hit caused > >> by a > >> + * migration. > >> + */ > >> + if ((env->dst_grp_type == group_has_spare) && > >> + !migrate_degrades_capacity(p, env)) > >> + tsk_cache_hot = 0; > > > > ... I'm starting to wonder if we should not rename the > > tsk_cache_hot variable to something else to make this > > code more readable. Probably in another patch :) > > > > I'd tend to agree, but naming is hard. "migration_harmful" ? I thought Rik meant tsk_cache_hot, for which I'd suggest at least buying a vowel and putting an 'a' in there :) Cheers, Phil > > > -- > > All Rights Reversed. > --
Re: [PATCH 2/2] sched/fair: Relax task_hot() for misfit tasks
Le jeudi 15 avril 2021 à 18:58:46 (+0100), Valentin Schneider a écrit : > Consider the following topology: > > DIE [ ] > MC [][] >0 1 2 3 > > capacity_orig_of(x \in {0-1}) < capacity_orig_of(x \in {2-3}) > > w/ CPUs 2-3 idle and CPUs 0-1 running CPU hogs (util_avg=1024). > > When CPU2 goes through load_balance() (via periodic / NOHZ balance), it > should pull one CPU hog from either CPU0 or CPU1 (this is misfit task > upmigration). However, should a e.g. pcpu kworker awake on CPU0 just before > this load_balance() happens and preempt the CPU hog running there, we would > have, for the [0-1] group at CPU2's DIE level: > > o sgs->sum_nr_running > sgs->group_weight > o sgs->group_capacity * 100 < sgs->group_util * imbalance_pct > > IOW, this group is group_overloaded. > > Considering CPU0 is picked by find_busiest_queue(), we would then visit the > preempted CPU hog in detach_tasks(). However, given it has just been > preempted by this pcpu kworker, task_hot() will prevent it from being > detached. We then leave load_balance() without having done anything. > > Long story short, preempted misfit tasks are affected by task_hot(), while > currently running misfit tasks are intentionally preempted by the stopper > task to migrate them over to a higher-capacity CPU. > > Align detach_tasks() with the active-balance logic and let it pick a > cache-hot misfit task when the destination CPU can provide a capacity > uplift. > > Signed-off-by: Valentin Schneider > --- > kernel/sched/fair.c | 36 > 1 file changed, 36 insertions(+) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index d2d1a69d7aa7..43fc98d34276 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -7493,6 +7493,7 @@ struct lb_env { > enum fbq_type fbq_type; > enum migration_type migration_type; > enum group_type src_grp_type; > + enum group_type dst_grp_type; > struct list_headtasks; > }; > > @@ -7533,6 +7534,31 @@ static int task_hot(struct task_struct *p, struct > lb_env *env) > return delta < (s64)sysctl_sched_migration_cost; > } > > + > +/* > + * What does migrating this task do to our capacity-aware scheduling > criterion? > + * > + * Returns 1, if the task needs more capacity than the dst CPU can provide. > + * Returns 0, if the task needs the extra capacity provided by the dst CPU > + * Returns -1, if the task isn't impacted by the migration wrt capacity. > + */ > +static int migrate_degrades_capacity(struct task_struct *p, struct lb_env > *env) > +{ > + if (!(env->sd->flags & SD_ASYM_CPUCAPACITY)) > + return -1; > + > + if (!task_fits_capacity(p, capacity_of(env->src_cpu))) { > + if (cpu_capacity_greater(env->dst_cpu, env->src_cpu)) > + return 0; > + else if (cpu_capacity_greater(env->src_cpu, env->dst_cpu)) > + return 1; > + else > + return -1; > + } Being there means that task fits src_cpu capacity so why testing p against dst_cpu ? > + > + return task_fits_capacity(p, capacity_of(env->dst_cpu)) ? -1 : 1; > +} I prefer the below which easier to read because the same var is use everywhere and you can remove cpu_capacity_greater. static int migrate_degrades_capacity(struct task_struct *p, struct lb_env *env) { unsigned long src_capacity, dst_capacity; if (!(env->sd->flags & SD_ASYM_CPUCAPACITY)) return -1; src_capacity = capacity_of(env->src_cpu); dst_capacity = capacity_of(env->dst_cpu); if (!task_fits_capacity(p, src_capacity)) { if (capacity_greater(dst_capacity, src_capacity)) return 0; else if (capacity_greater(src_capacity, dst_capacity)) return 1; else return -1; } return task_fits_capacity(p, dst_capacity) ? -1 : 1; } > + > #ifdef CONFIG_NUMA_BALANCING > /* > * Returns 1, if task migration degrades locality > @@ -7672,6 +7698,15 @@ int can_migrate_task(struct task_struct *p, struct > lb_env *env) > if (tsk_cache_hot == -1) > tsk_cache_hot = task_hot(p, env); > > + /* > + * On a (sane) asymmetric CPU capacity system, the increase in compute > + * capacity should offset any potential performance hit caused by a > + * migration. > + */ > + if ((env->dst_grp_type == group_has_spare) && Shouldn't it be env->src_grp_type == group_misfit_task to only care of misfit task case as stated in $subject > + !migrate_degrades_capacity(p, env)) > + tsk_cache_hot = 0; > + > if (tsk_cache_hot <= 0 || > env->sd->nr_balance_failed > env->sd->cache_nice_tries) { > if (tsk_cache_hot == 1) { > @@ -9310,6 +9345,7 @@ static struct sched_group *find_busiest_group(struct > lb_env *env) > if (!sds.busiest) > goto
Re: [PATCH 2/2] sched/fair: Relax task_hot() for misfit tasks
On 15/04/21 16:39, Rik van Riel wrote: > On Thu, 2021-04-15 at 18:58 +0100, Valentin Schneider wrote: >> Consider the following topology: >> >> Long story short, preempted misfit tasks are affected by task_hot(), >> while >> currently running misfit tasks are intentionally preempted by the >> stopper >> task to migrate them over to a higher-capacity CPU. >> >> Align detach_tasks() with the active-balance logic and let it pick a >> cache-hot misfit task when the destination CPU can provide a capacity >> uplift. >> >> Signed-off-by: Valentin Schneider > > Reviewed-by: Rik van Riel > Thanks! > > This patch looks good, but... > >> @@ -7672,6 +7698,15 @@ int can_migrate_task(struct task_struct *p, >> struct lb_env *env) >> if (tsk_cache_hot == -1) >> tsk_cache_hot = task_hot(p, env); >> >> +/* >> + * On a (sane) asymmetric CPU capacity system, the increase in >> compute >> + * capacity should offset any potential performance hit caused >> by a >> + * migration. >> + */ >> +if ((env->dst_grp_type == group_has_spare) && >> +!migrate_degrades_capacity(p, env)) >> +tsk_cache_hot = 0; > > ... I'm starting to wonder if we should not rename the > tsk_cache_hot variable to something else to make this > code more readable. Probably in another patch :) > I'd tend to agree, but naming is hard. "migration_harmful" ? > -- > All Rights Reversed.
Re: [PATCH 2/2] sched/fair: Relax task_hot() for misfit tasks
On Thu, 2021-04-15 at 18:58 +0100, Valentin Schneider wrote: > Consider the following topology: > > Long story short, preempted misfit tasks are affected by task_hot(), > while > currently running misfit tasks are intentionally preempted by the > stopper > task to migrate them over to a higher-capacity CPU. > > Align detach_tasks() with the active-balance logic and let it pick a > cache-hot misfit task when the destination CPU can provide a capacity > uplift. > > Signed-off-by: Valentin Schneider Reviewed-by: Rik van Riel This patch looks good, but... > @@ -7672,6 +7698,15 @@ int can_migrate_task(struct task_struct *p, > struct lb_env *env) > if (tsk_cache_hot == -1) > tsk_cache_hot = task_hot(p, env); > > + /* > + * On a (sane) asymmetric CPU capacity system, the increase in > compute > + * capacity should offset any potential performance hit caused > by a > + * migration. > + */ > + if ((env->dst_grp_type == group_has_spare) && > + !migrate_degrades_capacity(p, env)) > + tsk_cache_hot = 0; ... I'm starting to wonder if we should not rename the tsk_cache_hot variable to something else to make this code more readable. Probably in another patch :) -- All Rights Reversed. signature.asc Description: This is a digitally signed message part