Re: [PATCH 7/8] sched/fair: Attempt misfit active balance when migration_type != migrate_misfit
On 09/02/21 09:58, Vincent Guittot wrote:
> On Thu, 28 Jan 2021 at 19:32, Valentin Schneider wrote:
>> Giving group_misfit_task a higher group_classify() priority than
>> group_imbalance doesn't seem like the right thing to do. Instead, make
>
> Could you explain why ?
>

Morten had intentionally placed it above (then) group_other but below
group_imbalanced:

  3b1baa6496e6 ("sched/fair: Add 'group_misfit_task' load-balance type")

The reasoning being that misfit balance shouldn't take higher priority than
more jarring imbalance issues. group_imbalanced is a mixed bag and
difficult to classify, but group_overloaded for sure takes priority, as it
ought to imply you can move tasks around without doing an active balance
(there are more tasks than CPUs).

Then again, we do have issues where the busiest group is group_overloaded,
but we'd "want" this to be classified as misfit. This ties in with patch 8.
Take the CPU hog vs pcpu kworker example on a big.LITTLE platform:

  a, b, c, d are our CPU-hogging tasks
  k is a per-CPU kworker

        {CPU0 | a a a a k
  LITTLE{CPU1 | b b b b a
      ---------------------
        {CPU2 | c c c c .
  bigs  {CPU3 | d d d d
                        ^
                        |
               newidle pull

CPU2 finished its work and goes through a newidle balance. Ideally here it
would pull either 'a' or 'b', which are CPU-bound tasks running on LITTLE
CPUs. Unfortunately, a per-CPU kworker woke up on CPU0, so since we have:

  DIE [        ]
  MC  [   ][   ]
       0 1  2 3

the DIE (0-1) sched_group has 3 runnable tasks, two of which are CPU hogs:
it gets classified as group_overloaded. Only task 'a' can be pulled, and it
requires patch 8 to be migrated in this scenario.

I'm not sure how we could better classify this, even if we started tracking
preempted misfit tasks. Perhaps not via the group classification itself,
but via the migration_type? I.e. something like:

  if (nr_running - sgs->group_weight <= nr_misfits)
    => all preempted tasks are misfit
    => migration_type = migrate_misfit
Re: [PATCH 7/8] sched/fair: Attempt misfit active balance when migration_type != migrate_misfit
On Thu, 28 Jan 2021 at 19:32, Valentin Schneider wrote:
>
> Consider a 4-CPU big.LITTLE system with CPUs 0-1 as LITTLEs and CPUs 2-3 as
> bigs. The resulting sched_domain hierarchy is:
>
>   DIE [        ]
>   MC  [   ][   ]
>        0 1  2 3
>
> When running a multithreaded CPU-bound workload (i.e. 1 hog per CPU), the
> expected behaviour is to have the about-to-idle big CPUs pull a hog from
> the LITTLEs, since bigs will complete their work sooner than LITTLEs.
>
> Further, consider a scenario where:
> - CPU0 is idle (e.g. its hog got migrated to one of the big CPUs)
> - CPU1 is currently executing a per-CPU kworker, preempting the CPU hog
> - CPU2 and CPU3 are executing CPU-hogs
>
> CPU0 goes through load_balance() at MC level, and tries to pick stuff from
> CPU1, but:
> - the hog can't be pulled, because it's task_hot()
> - the kworker can't be pulled, because it's pinned to CPU1, which sets
>   LBF_SOME_PINNED
>
> This load balance attempt ends with no load pulled, LBF_SOME_PINNED set,
> and as a consequence we set the imbalance flag of DIE's [0, 1]
> sched_group_capacity.
>
> Shortly after, CPU2 completes its work and is about to go idle. It goes
> through the newidle_balance(), and we would really like it to active
> balance the hog running on CPU1 (which is a misfit task). However,
> sgc->imbalance is set for the LITTLE group at DIE level, so the group gets
> classified as group_imbalanced rather than group_misfit_task.
>
> Unlike group_misfit_task (via migrate_misfit), the active balance logic
> doesn't have any specific case for group_imbalanced, so CPU2 ends up going
> idle. We'll have to wait for a load balance on CPU0 or CPU1 to happen and
> clear the imbalance flag, and then for another DIE-level load-balance on
> CPU2 to happen to pull the task off of CPU1. That's several precious
> milliseconds wasted down the drain.
>
> Giving group_misfit_task a higher group_classify() priority than
> group_imbalance doesn't seem like the right thing to do.
> Instead, make

Could you explain why ?

> need_active_balance() return true for any migration_type when the
> destination CPU is idle and the source CPU has a misfit task.
>
> While at it, add an sd_has_asym_cpucapacity() guard in
> need_active_balance().
>
> Signed-off-by: Valentin Schneider
> ---
>  kernel/sched/fair.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0ac2f876b86f..cba9f97d9beb 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9557,9 +9557,22 @@ static int need_active_balance(struct lb_env *env)
>  			return 1;
>  	}
>
> +	if (!sd_has_asym_cpucapacity(sd))
> +		return 0;
> +
>  	if (env->migration_type == migrate_misfit)
>  		return 1;
>
> +	/*
> +	 * If we failed to pull anything and the src_rq has a misfit task, but
> +	 * the busiest group_type was higher than group_misfit_task, try to
> +	 * go for a misfit active balance anyway.
> +	 */
> +	if ((env->idle != CPU_NOT_IDLE) &&
> +	    env->src_rq->misfit_task_load &&
> +	    cpu_capacity_greater(env->dst_cpu, env->src_cpu))
> +		return 1;
> +
>  	return 0;
>  }
>
> --
> 2.27.0
>
Re: [PATCH 7/8] sched/fair: Attempt misfit active balance when migration_type != migrate_misfit
On 04/02/21 12:44, Dietmar Eggemann wrote:
> On 03/02/2021 19:43, Valentin Schneider wrote:
>> On 03/02/21 15:16, Qais Yousef wrote:
>>> On 01/28/21 18:31, Valentin Schneider wrote:
>>>> Giving group_misfit_task a higher group_classify() priority than
>>>> group_imbalance doesn't seem like the right thing to do. Instead, make
>>>> need_active_balance() return true for any migration_type when the
>>>
>>> s/need_active_balance()/voluntary_active_balance()/?
>>>
>>>> destination CPU is idle and the source CPU has a misfit task.
>>>>
>>>> While at it, add an sd_has_asym_cpucapacity() guard in
>>>> need_active_balance().
>>>
>>> ditto.
>>>
>>
>> Myes, clearly this has been left to ferment for too long!
>
> Wasn't the migrate_misfit condition moved from
> voluntary_active_balance() into need_active_balance() by commit
> e9b9734b7465 ("sched/fair: Reduce cases for active balance")?

Bah, you're right, I got confused with when I first wrote that vs when I
last updated the changelog. As of e9b9734b7465 ("sched/fair: Reduce cases
for active balance"), the above changelog is actually correct.
Re: [PATCH 7/8] sched/fair: Attempt misfit active balance when migration_type != migrate_misfit
On 03/02/2021 19:43, Valentin Schneider wrote:
> On 03/02/21 15:16, Qais Yousef wrote:
>> On 01/28/21 18:31, Valentin Schneider wrote:
>>> Giving group_misfit_task a higher group_classify() priority than
>>> group_imbalance doesn't seem like the right thing to do. Instead, make
>>> need_active_balance() return true for any migration_type when the
>>
>> s/need_active_balance()/voluntary_active_balance()/?
>>
>>> destination CPU is idle and the source CPU has a misfit task.
>>>
>>> While at it, add an sd_has_asym_cpucapacity() guard in
>>> need_active_balance().
>>
>> ditto.
>>
>
> Myes, clearly this has been left to ferment for too long!

Wasn't the migrate_misfit condition moved from
voluntary_active_balance() into need_active_balance() by commit
e9b9734b7465 ("sched/fair: Reduce cases for active balance")?
Re: [PATCH 7/8] sched/fair: Attempt misfit active balance when migration_type != migrate_misfit
On 03/02/21 15:16, Qais Yousef wrote:
> On 01/28/21 18:31, Valentin Schneider wrote:
>> Giving group_misfit_task a higher group_classify() priority than
>> group_imbalance doesn't seem like the right thing to do. Instead, make
>> need_active_balance() return true for any migration_type when the
>
> s/need_active_balance()/voluntary_active_balance()/?
>
>> destination CPU is idle and the source CPU has a misfit task.
>>
>> While at it, add an sd_has_asym_cpucapacity() guard in
>> need_active_balance().
>
> ditto.
>

Myes, clearly this has been left to ferment for too long!
Re: [PATCH 7/8] sched/fair: Attempt misfit active balance when migration_type != migrate_misfit
On 01/28/21 18:31, Valentin Schneider wrote:
> Consider a 4-CPU big.LITTLE system with CPUs 0-1 as LITTLEs and CPUs 2-3 as
> bigs. The resulting sched_domain hierarchy is:
>
>   DIE [        ]
>   MC  [   ][   ]
>        0 1  2 3
>
> When running a multithreaded CPU-bound workload (i.e. 1 hog per CPU), the
> expected behaviour is to have the about-to-idle big CPUs pull a hog from
> the LITTLEs, since bigs will complete their work sooner than LITTLEs.
>
> Further, consider a scenario where:
> - CPU0 is idle (e.g. its hog got migrated to one of the big CPUs)
> - CPU1 is currently executing a per-CPU kworker, preempting the CPU hog
> - CPU2 and CPU3 are executing CPU-hogs
>
> CPU0 goes through load_balance() at MC level, and tries to pick stuff from
> CPU1, but:
> - the hog can't be pulled, because it's task_hot()
> - the kworker can't be pulled, because it's pinned to CPU1, which sets
>   LBF_SOME_PINNED
>
> This load balance attempt ends with no load pulled, LBF_SOME_PINNED set,
> and as a consequence we set the imbalance flag of DIE's [0, 1]
> sched_group_capacity.
>
> Shortly after, CPU2 completes its work and is about to go idle. It goes
> through the newidle_balance(), and we would really like it to active
> balance the hog running on CPU1 (which is a misfit task). However,
> sgc->imbalance is set for the LITTLE group at DIE level, so the group gets
> classified as group_imbalanced rather than group_misfit_task.
>
> Unlike group_misfit_task (via migrate_misfit), the active balance logic
> doesn't have any specific case for group_imbalanced, so CPU2 ends up going
> idle. We'll have to wait for a load balance on CPU0 or CPU1 to happen and
> clear the imbalance flag, and then for another DIE-level load-balance on
> CPU2 to happen to pull the task off of CPU1. That's several precious
> milliseconds wasted down the drain.

I think that might help in games, where some worker threads could end up
on little cores and this delay in up-migration could cause frames to drop.
>
> Giving group_misfit_task a higher group_classify() priority than
> group_imbalance doesn't seem like the right thing to do. Instead, make
> need_active_balance() return true for any migration_type when the

s/need_active_balance()/voluntary_active_balance()/?

> destination CPU is idle and the source CPU has a misfit task.
>
> While at it, add an sd_has_asym_cpucapacity() guard in
> need_active_balance().

ditto.

>
> Signed-off-by: Valentin Schneider
> ---
>  kernel/sched/fair.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0ac2f876b86f..cba9f97d9beb 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9557,9 +9557,22 @@ static int need_active_balance(struct lb_env *env)
>  			return 1;
>  	}
>
> +	if (!sd_has_asym_cpucapacity(sd))
> +		return 0;
> +
>  	if (env->migration_type == migrate_misfit)
>  		return 1;
>
> +	/*
> +	 * If we failed to pull anything and the src_rq has a misfit task, but
> +	 * the busiest group_type was higher than group_misfit_task, try to
> +	 * go for a misfit active balance anyway.
> +	 */
> +	if ((env->idle != CPU_NOT_IDLE) &&
> +	    env->src_rq->misfit_task_load &&
> +	    cpu_capacity_greater(env->dst_cpu, env->src_cpu))
> +		return 1;
> +

Reviewed-by: Qais Yousef

Thanks

--
Qais Yousef

>  	return 0;
>  }
>
> --
> 2.27.0
>
[PATCH 7/8] sched/fair: Attempt misfit active balance when migration_type != migrate_misfit
Consider a 4-CPU big.LITTLE system with CPUs 0-1 as LITTLEs and CPUs 2-3 as
bigs. The resulting sched_domain hierarchy is:

  DIE [        ]
  MC  [   ][   ]
       0 1  2 3

When running a multithreaded CPU-bound workload (i.e. 1 hog per CPU), the
expected behaviour is to have the about-to-idle big CPUs pull a hog from
the LITTLEs, since bigs will complete their work sooner than LITTLEs.

Further, consider a scenario where:
- CPU0 is idle (e.g. its hog got migrated to one of the big CPUs)
- CPU1 is currently executing a per-CPU kworker, preempting the CPU hog
- CPU2 and CPU3 are executing CPU-hogs

CPU0 goes through load_balance() at MC level, and tries to pick stuff from
CPU1, but:
- the hog can't be pulled, because it's task_hot()
- the kworker can't be pulled, because it's pinned to CPU1, which sets
  LBF_SOME_PINNED

This load balance attempt ends with no load pulled, LBF_SOME_PINNED set,
and as a consequence we set the imbalance flag of DIE's [0, 1]
sched_group_capacity.

Shortly after, CPU2 completes its work and is about to go idle. It goes
through the newidle_balance(), and we would really like it to active
balance the hog running on CPU1 (which is a misfit task). However,
sgc->imbalance is set for the LITTLE group at DIE level, so the group gets
classified as group_imbalanced rather than group_misfit_task.

Unlike group_misfit_task (via migrate_misfit), the active balance logic
doesn't have any specific case for group_imbalanced, so CPU2 ends up going
idle. We'll have to wait for a load balance on CPU0 or CPU1 to happen and
clear the imbalance flag, and then for another DIE-level load-balance on
CPU2 to happen to pull the task off of CPU1. That's several precious
milliseconds wasted down the drain.

Giving group_misfit_task a higher group_classify() priority than
group_imbalance doesn't seem like the right thing to do. Instead, make
need_active_balance() return true for any migration_type when the
destination CPU is idle and the source CPU has a misfit task.
While at it, add an sd_has_asym_cpucapacity() guard in
need_active_balance().

Signed-off-by: Valentin Schneider
---
 kernel/sched/fair.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0ac2f876b86f..cba9f97d9beb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9557,9 +9557,22 @@ static int need_active_balance(struct lb_env *env)
 			return 1;
 	}
 
+	if (!sd_has_asym_cpucapacity(sd))
+		return 0;
+
 	if (env->migration_type == migrate_misfit)
 		return 1;
 
+	/*
+	 * If we failed to pull anything and the src_rq has a misfit task, but
+	 * the busiest group_type was higher than group_misfit_task, try to
+	 * go for a misfit active balance anyway.
+	 */
+	if ((env->idle != CPU_NOT_IDLE) &&
+	    env->src_rq->misfit_task_load &&
+	    cpu_capacity_greater(env->dst_cpu, env->src_cpu))
+		return 1;
+
 	return 0;
 }
 
-- 
2.27.0