Re: [PATCH 7/8] sched/fair: Attempt misfit active balance when migration_type != migrate_misfit

2021-02-09 Thread Valentin Schneider
On 09/02/21 09:58, Vincent Guittot wrote:
> On Thu, 28 Jan 2021 at 19:32, Valentin Schneider
>> Giving group_misfit_task a higher group_classify() priority than
>> group_imbalance doesn't seem like the right thing to do. Instead, make
>
> Could you explain why ?
>

Morten had intentionally placed it above (then) group_other but below
group_imbalanced:

  3b1baa6496e6 ("sched/fair: Add 'group_misfit_task' load-balance type")

The reasoning being misfit balance shouldn't take higher priority than
jarring imbalance issues. group_imbalanced is a mixed bag and difficult to
classify, but for sure group_overloaded takes priority as it ought to imply
you can move tasks around without doing an active balance (there's more
tasks than CPUs).

Then again, we do have issues where the busiest group is group_overloaded,
but we'd "want" this to be classified as misfit. This ties in with patch 8.

Take the CPU hog vs pcpu kworker example on a big.LITTLE platform:

  a,b,c,d are our CPU-hogging tasks
  k is a per-CPU kworker

        {CPU0 | a a a a k
  LITTLE{CPU1 | b b b b a
  ------|------------------
        {CPU2 | c c c c .
  bigs  {CPU3 | d d d d ^
                        |
                        |
                  newidle pull

CPU2 finished its work and goes through a newidle balance. Ideally here it
would pull either 'a' or 'b' which are CPU-bound tasks running on LITTLE
CPUs. Unfortunately, a per-CPU kworker woke up on CPU0, so since we have:

  DIE [          ]
  MC  [    ][    ]
       0  1  2  3

the DIE (0-1) sched_group has 3 runnable tasks, two of which are CPU hogs:
it gets classified as group_overloaded. Only task 'a' can be pulled, and it
requires patch 8 to be migrated in this scenario.


I'm not sure how we could better classify this, even if admitting we
started tracking preempted misfit tasks. Perhaps not group classification
itself, but the migration_type? i.e. something like

  if (nr_running - sgs->group_weight <= nr_misfits)
  => all preempted tasks are misfit
  => migration_type = migrate_misfit


Re: [PATCH 7/8] sched/fair: Attempt misfit active balance when migration_type != migrate_misfit

2021-02-09 Thread Vincent Guittot
On Thu, 28 Jan 2021 at 19:32, Valentin Schneider
 wrote:
>
> Consider a 4-CPU big.LITTLE system with CPUs 0-1 as LITTLEs and CPUs 2-3 as
> bigs. The resulting sched_domain hierarchy is:
>
>   DIE [          ]
>   MC  [    ][    ]
>        0  1  2  3
>
> When running a multithreaded CPU-bound workload (i.e. 1 hog per CPU), the
> expected behaviour is to have the about-to-idle big CPUs pull a hog from
> the LITTLEs, since bigs will complete their work sooner than LITTLEs.
>
> Further, consider a scenario where:
> - CPU0 is idle (e.g. its hog got migrated to one of the big CPUs)
> - CPU1 is currently executing a per-CPU kworker, preempting the CPU hog
> - CPU2 and CPU3 are executing CPU-hogs
>
> CPU0 goes through load_balance() at MC level, and tries to pick stuff from
> CPU1, but:
> - the hog can't be pulled, because it's task_hot()
> - the kworker can't be pulled, because it's pinned to CPU1, which sets
>   LBF_SOME_PINNED
>
> This load balance attempt ends with no load pulled, LBF_SOME_PINNED set,
> and as a consequence we set the imbalance flag of DIE's [0, 1]
> sched_group_capacity.
>
> Shortly after, CPU2 completes its work and is about to go idle. It goes
> through the newidle_balance(), and we would really like it to active
> balance the hog running on CPU1 (which is a misfit task). However,
> sgc->imbalance is set for the LITTLE group at DIE level, so the group gets
> classified as group_imbalanced rather than group_misfit_task.
>
> Unlike group_misfit_task (via migrate_misfit), the active balance logic
> doesn't have any specific case for group_imbalanced, so CPU2 ends up going
> idle. We'll have to wait for a load balance on CPU0 or CPU1 to happen and
> clear the imbalance flag, and then for another DIE-level load-balance on
> CPU2 to happen to pull the task off of CPU1. That's several precious
> milliseconds wasted down the drain.
>
> Giving group_misfit_task a higher group_classify() priority than
> group_imbalance doesn't seem like the right thing to do. Instead, make

Could you explain why ?

> need_active_balance() return true for any migration_type when the
> destination CPU is idle and the source CPU has a misfit task.
>
> While at it, add an sd_has_asym_cpucapacity() guard in
> need_active_balance().
>
> Signed-off-by: Valentin Schneider 
> ---
>  kernel/sched/fair.c | 13 +
>  1 file changed, 13 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0ac2f876b86f..cba9f97d9beb 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9557,9 +9557,22 @@ static int need_active_balance(struct lb_env *env)
>  		return 1;
>  	}
> 
> +	if (!sd_has_asym_cpucapacity(sd))
> +		return 0;
> +
>  	if (env->migration_type == migrate_misfit)
>  		return 1;
> 
> +	/*
> +	 * If we failed to pull anything and the src_rq has a misfit task, but
> +	 * the busiest group_type was higher than group_misfit_task, try to
> +	 * go for a misfit active balance anyway.
> +	 */
> +	if ((env->idle != CPU_NOT_IDLE) &&
> +	    env->src_rq->misfit_task_load &&
> +	    cpu_capacity_greater(env->dst_cpu, env->src_cpu))
> +		return 1;
> +
>  	return 0;
>  }
>
> --
> 2.27.0
>


Re: [PATCH 7/8] sched/fair: Attempt misfit active balance when migration_type != migrate_misfit

2021-02-04 Thread Valentin Schneider
On 04/02/21 12:44, Dietmar Eggemann wrote:
> On 03/02/2021 19:43, Valentin Schneider wrote:
>> On 03/02/21 15:16, Qais Yousef wrote:
>>> On 01/28/21 18:31, Valentin Schneider wrote:
 Giving group_misfit_task a higher group_classify() priority than
 group_imbalance doesn't seem like the right thing to do. Instead, make
 need_active_balance() return true for any migration_type when the
>>>
>>> s/need_active_balance()/voluntary_active_balance()/?
>>>
 destination CPU is idle and the source CPU has a misfit task.

 While at it, add an sd_has_asym_cpucapacity() guard in
 need_active_balance().
>>>
>>> ditto.
>>>
>> 
>> Myes, clearly this has been left to ferment for too long!
>
> Wasn't the migrate_misfit condition moved from
> voluntary_active_balance() into need_active_balance() by commit
> ("sched/fair: Reduce cases for active balance")?

Bah, you're right, I got confused with when I first wrote that vs when I
last updated the changelog.

As of

  e9b9734b7465 ("sched/fair: Reduce cases for active balance")

The above changelog is actually correct.


Re: [PATCH 7/8] sched/fair: Attempt misfit active balance when migration_type != migrate_misfit

2021-02-04 Thread Dietmar Eggemann
On 03/02/2021 19:43, Valentin Schneider wrote:
> On 03/02/21 15:16, Qais Yousef wrote:
>> On 01/28/21 18:31, Valentin Schneider wrote:
>>> Giving group_misfit_task a higher group_classify() priority than
>>> group_imbalance doesn't seem like the right thing to do. Instead, make
>>> need_active_balance() return true for any migration_type when the
>>
>> s/need_active_balance()/voluntary_active_balance()/?
>>
>>> destination CPU is idle and the source CPU has a misfit task.
>>>
>>> While at it, add an sd_has_asym_cpucapacity() guard in
>>> need_active_balance().
>>
>> ditto.
>>
> 
> Myes, clearly this has been left to ferment for too long!

Wasn't the migrate_misfit condition moved from
voluntary_active_balance() into need_active_balance() by commit
("sched/fair: Reduce cases for active balance")?


Re: [PATCH 7/8] sched/fair: Attempt misfit active balance when migration_type != migrate_misfit

2021-02-03 Thread Valentin Schneider
On 03/02/21 15:16, Qais Yousef wrote:
> On 01/28/21 18:31, Valentin Schneider wrote:
>> Giving group_misfit_task a higher group_classify() priority than
>> group_imbalance doesn't seem like the right thing to do. Instead, make
>> need_active_balance() return true for any migration_type when the
>
> s/need_active_balance()/voluntary_active_balance()/?
>
>> destination CPU is idle and the source CPU has a misfit task.
>> 
>> While at it, add an sd_has_asym_cpucapacity() guard in
>> need_active_balance().
>
> ditto.
>

Myes, clearly this has been left to ferment for too long!


Re: [PATCH 7/8] sched/fair: Attempt misfit active balance when migration_type != migrate_misfit

2021-02-03 Thread Qais Yousef
On 01/28/21 18:31, Valentin Schneider wrote:
> Consider a 4-CPU big.LITTLE system with CPUs 0-1 as LITTLEs and CPUs 2-3 as
> bigs. The resulting sched_domain hierarchy is:
> 
>   DIE [          ]
>   MC  [    ][    ]
>        0  1  2  3
> 
> When running a multithreaded CPU-bound workload (i.e. 1 hog per CPU), the
> expected behaviour is to have the about-to-idle big CPUs pull a hog from
> the LITTLEs, since bigs will complete their work sooner than LITTLEs.
> 
> Further, consider a scenario where:
> - CPU0 is idle (e.g. its hog got migrated to one of the big CPUs)
> - CPU1 is currently executing a per-CPU kworker, preempting the CPU hog
> - CPU2 and CPU3 are executing CPU-hogs
> 
> CPU0 goes through load_balance() at MC level, and tries to pick stuff from
> CPU1, but:
> - the hog can't be pulled, because it's task_hot()
> - the kworker can't be pulled, because it's pinned to CPU1, which sets
>   LBF_SOME_PINNED
> 
> This load balance attempt ends with no load pulled, LBF_SOME_PINNED set,
> and as a consequence we set the imbalance flag of DIE's [0, 1]
> sched_group_capacity.
> 
> Shortly after, CPU2 completes its work and is about to go idle. It goes
> through the newidle_balance(), and we would really like it to active
> balance the hog running on CPU1 (which is a misfit task). However,
> sgc->imbalance is set for the LITTLE group at DIE level, so the group gets
> classified as group_imbalanced rather than group_misfit_task.
> 
> Unlike group_misfit_task (via migrate_misfit), the active balance logic
> doesn't have any specific case for group_imbalanced, so CPU2 ends up going
> idle. We'll have to wait for a load balance on CPU0 or CPU1 to happen and
> clear the imbalance flag, and then for another DIE-level load-balance on
> CPU2 to happen to pull the task off of CPU1. That's several precious
> milliseconds wasted down the drain.

I think that might help in games where some worker threads could end up
on little cores, and this delay in up-migration could cause frames to drop.

> 
> Giving group_misfit_task a higher group_classify() priority than
> group_imbalance doesn't seem like the right thing to do. Instead, make
> need_active_balance() return true for any migration_type when the

s/need_active_balance()/voluntary_active_balance()/?

> destination CPU is idle and the source CPU has a misfit task.
> 
> While at it, add an sd_has_asym_cpucapacity() guard in
> need_active_balance().

ditto.

> 
> Signed-off-by: Valentin Schneider 
> ---
>  kernel/sched/fair.c | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0ac2f876b86f..cba9f97d9beb 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9557,9 +9557,22 @@ static int need_active_balance(struct lb_env *env)
>  		return 1;
>  	}
>  
> +	if (!sd_has_asym_cpucapacity(sd))
> +		return 0;
> +
>  	if (env->migration_type == migrate_misfit)
>  		return 1;
>  
> +	/*
> +	 * If we failed to pull anything and the src_rq has a misfit task, but
> +	 * the busiest group_type was higher than group_misfit_task, try to
> +	 * go for a misfit active balance anyway.
> +	 */
> +	if ((env->idle != CPU_NOT_IDLE) &&
> +	    env->src_rq->misfit_task_load &&
> +	    cpu_capacity_greater(env->dst_cpu, env->src_cpu))
> +		return 1;
> +

Reviewed-by: Qais Yousef 

Thanks

--
Qais Yousef

>  	return 0;
>  }
>  
> -- 
> 2.27.0
> 


[PATCH 7/8] sched/fair: Attempt misfit active balance when migration_type != migrate_misfit

2021-01-28 Thread Valentin Schneider
Consider a 4-CPU big.LITTLE system with CPUs 0-1 as LITTLEs and CPUs 2-3 as
bigs. The resulting sched_domain hierarchy is:

  DIE [          ]
  MC  [    ][    ]
       0  1  2  3

When running a multithreaded CPU-bound workload (i.e. 1 hog per CPU), the
expected behaviour is to have the about-to-idle big CPUs pull a hog from
the LITTLEs, since bigs will complete their work sooner than LITTLEs.

Further, consider a scenario where:
- CPU0 is idle (e.g. its hog got migrated to one of the big CPUs)
- CPU1 is currently executing a per-CPU kworker, preempting the CPU hog
- CPU2 and CPU3 are executing CPU-hogs

CPU0 goes through load_balance() at MC level, and tries to pick stuff from
CPU1, but:
- the hog can't be pulled, because it's task_hot()
- the kworker can't be pulled, because it's pinned to CPU1, which sets
  LBF_SOME_PINNED

This load balance attempt ends with no load pulled, LBF_SOME_PINNED set,
and as a consequence we set the imbalance flag of DIE's [0, 1]
sched_group_capacity.

Shortly after, CPU2 completes its work and is about to go idle. It goes
through the newidle_balance(), and we would really like it to active
balance the hog running on CPU1 (which is a misfit task). However,
sgc->imbalance is set for the LITTLE group at DIE level, so the group gets
classified as group_imbalanced rather than group_misfit_task.

Unlike group_misfit_task (via migrate_misfit), the active balance logic
doesn't have any specific case for group_imbalanced, so CPU2 ends up going
idle. We'll have to wait for a load balance on CPU0 or CPU1 to happen and
clear the imbalance flag, and then for another DIE-level load-balance on
CPU2 to happen to pull the task off of CPU1. That's several precious
milliseconds wasted down the drain.

Giving group_misfit_task a higher group_classify() priority than
group_imbalance doesn't seem like the right thing to do. Instead, make
need_active_balance() return true for any migration_type when the
destination CPU is idle and the source CPU has a misfit task.

While at it, add an sd_has_asym_cpucapacity() guard in
need_active_balance().

Signed-off-by: Valentin Schneider 
---
 kernel/sched/fair.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0ac2f876b86f..cba9f97d9beb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9557,9 +9557,22 @@ static int need_active_balance(struct lb_env *env)
 		return 1;
 	}
 
+	if (!sd_has_asym_cpucapacity(sd))
+		return 0;
+
 	if (env->migration_type == migrate_misfit)
 		return 1;
 
+	/*
+	 * If we failed to pull anything and the src_rq has a misfit task, but
+	 * the busiest group_type was higher than group_misfit_task, try to
+	 * go for a misfit active balance anyway.
+	 */
+	if ((env->idle != CPU_NOT_IDLE) &&
+	    env->src_rq->misfit_task_load &&
+	    cpu_capacity_greater(env->dst_cpu, env->src_cpu))
+		return 1;
+
 	return 0;
 }
 
-- 
2.27.0