Re: [PATCHv4 12/12] sched/core: Disable SD_PREFER_SIBLING on asymmetric cpu capacity domains

2018-08-10 Thread Vincent Guittot
On Mon, 6 Aug 2018 at 12:53, Valentin Schneider
 wrote:
>
> Hi,
>
> On 06/08/18 11:20, Vincent Guittot wrote:
> > Hi Valentin,
> >
> > On Tue, 31 Jul 2018 at 14:33, Valentin Schneider
> >  wrote:
> >>
> >> Hi,
> >>
> >> On 31/07/18 13:17, Vincent Guittot wrote:
> >> [...]
> 
>  This can easily happen with SD_PREFER_SIBLING enabled too so I wouldn't
>  say that this patch breaks anything that isn't broken already. In fact
 we see this happening with and without this patch applied.
> >>>
> >>> At least for the use case above, this doesn't happen when
> >>> SD_PREFER_SIBLING is set
> >>>
> >>
> >> On my HiKey960 I can see coscheduling on a big CPU while a LITTLE is free
> >> with **and** without SD_PREFER_SIBLING. Having it set only means that in
> >> some cases the imbalance will be re-classified as group_overloaded instead
> >> of group_misfit_task, so we'll skip the misfit logic when we shouldn't
> >> (this happens on Juno for instance).
> >
> > Can you give more details about your test case?
> >
>
> I've been running the same test case as presented in the cover letter on
> my HiKey960 but with sched_switch tracing and with no tasksets. I've just
> re-run the test case with tasksets and I get similar results (i.e.
> coscheduling on a big while a LITTLE is free) with or without the flag.
>
> >>
> >> It does nothing for the "1 task per CPU" problem that Morten described 
> >> above.
> >> When you have so few tasks, load isn't very relevant, but it
> >> skews the load-balancer into thinking the LITTLE CPUs are more busy than
> >> the bigs even though there's an idle one in the lot.

OK. I meant that without the misfit patches, SD_PREFER_SIBLING ensures 1
task per CPU and no co-scheduling, but co-scheduling appears with the
misfit task patchset.
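
For reference, the enforcement mentioned above comes from the load balancer
treating a sibling group that runs more than one task as overloaded whenever
the child domain has SD_PREFER_SIBLING set. Below is a rough sketch of that
effect, loosely modelled on the update_sd_lb_stats() logic of this era; the
type and function names are illustrative, not the exact kernel code.

#include <stdbool.h>

struct sg_stats {
	unsigned int sum_nr_running;	/* tasks running in the group */
	bool group_no_capacity;		/* group considered overloaded */
};

/*
 * Sketch: with SD_PREFER_SIBLING on the child domain, a sibling group
 * running more than one task is marked as having no spare capacity
 * (provided the local group can still take a task). That classifies it
 * as overloaded and makes the balancer spread tasks one per CPU.
 */
void apply_prefer_sibling(bool prefer_sibling, bool local_has_capacity,
			  struct sg_stats *sgs)
{
	if (prefer_sibling && local_has_capacity && sgs->sum_nr_running > 1)
		sgs->group_no_capacity = true;
}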

Vincent
> >>
> 
>  Morten


Re: [PATCHv4 12/12] sched/core: Disable SD_PREFER_SIBLING on asymmetric cpu capacity domains

2018-08-06 Thread Valentin Schneider
Hi,

On 06/08/18 11:20, Vincent Guittot wrote:
> Hi Valentin,
> 
> On Tue, 31 Jul 2018 at 14:33, Valentin Schneider
>  wrote:
>>
>> Hi,
>>
>> On 31/07/18 13:17, Vincent Guittot wrote:
>> [...]

 This can easily happen with SD_PREFER_SIBLING enabled too so I wouldn't
 say that this patch breaks anything that isn't broken already. In fact
 we see this happening with and without this patch applied.
>>>
>>> At least for the use case above, this doesn't happen when
>>> SD_PREFER_SIBLING is set
>>>
>>
>> On my HiKey960 I can see coscheduling on a big CPU while a LITTLE is free
>> with **and** without SD_PREFER_SIBLING. Having it set only means that in
>> some cases the imbalance will be re-classified as group_overloaded instead
>> of group_misfit_task, so we'll skip the misfit logic when we shouldn't (this
>> happens on Juno for instance).
> 
> Can you give more details about your test case?
> 

I've been running the same test case as presented in the cover letter on
my HiKey960 but with sched_switch tracing and with no tasksets. I've just
re-run the test case with tasksets and I get similar results (i.e.
coscheduling on a big while a LITTLE is free) with or without the flag.

>>
>> It does nothing for the "1 task per CPU" problem that Morten described above.
>> When you have so few tasks, load isn't very relevant, but it
>> skews the load-balancer into thinking the LITTLE CPUs are more busy than
>> the bigs even though there's an idle one in the lot.
>>

 Morten


Re: [PATCHv4 12/12] sched/core: Disable SD_PREFER_SIBLING on asymmetric cpu capacity domains

2018-08-06 Thread Vincent Guittot
Hi Valentin,

On Tue, 31 Jul 2018 at 14:33, Valentin Schneider
 wrote:
>
> Hi,
>
> On 31/07/18 13:17, Vincent Guittot wrote:
> > On Fri, 6 Jul 2018 at 16:31, Morten Rasmussen  
> > wrote:
> >>
> >> On Fri, Jul 06, 2018 at 12:18:17PM +0200, Vincent Guittot wrote:
> >>> [...]
> >>
> >> Scheduling one task per cpu when n_task == n_cpus on asymmetric
> >> topologies is generally broken already and this patch set doesn't fix
> >> that problem.
> >>
> >> SD_PREFER_SIBLING might seem to help in very specific cases:
> >> n_little_cpus == n_big_cpus. In that case the little group might be
> >> classified as overloaded. It doesn't guarantee that anything gets pulled
> >> as the grp_load/grp_capacity in the imbalance calculation on some systems
> >> still says the little cpus are more loaded than the bigs despite one of
> >> them being idle. That depends on the little cpu capacities.
> >>
> >> On systems where n_little_cpus != n_big_cpus SD_PREFER_SIBLING is broken
> >> as it assumes the group_weight to be the same. This is the case on Juno
> >> and several other platforms.
> >>
> >> IMHO, SD_PREFER_SIBLING isn't the solution to this problem. It might
> >
> > I agree but this patchset creates a regression in the scheduling behavior
> >
> >> help for a limited subset of topologies/capacities but the right
> >> solution is to change the imbalance calculation. As the name says, it is
> >
> > Yes, that's what the prototype I came up with does.
> >
> >> meant to spread tasks and does so unconditionally. For asymmetric
> >> systems we would like to consider cpu capacity before migrating tasks.
> >>
> >>> When running the tests of your cover letter, 1 long
> >>> running task is often co-scheduled on a big core whereas short pinned
> >>> tasks are still running and a little core is idle, which is not an
> >>> optimal scheduling decision.
> >>
> >> This can easily happen with SD_PREFER_SIBLING enabled too so I wouldn't
> >> say that this patch breaks anything that isn't broken already. In fact
> >> we see this happening with and without this patch applied.
> >
> > At least for the use case above, this doesn't happen when
> > SD_PREFER_SIBLING is set
> >
>
> On my HiKey960 I can see coscheduling on a big CPU while a LITTLE is free
> with **and** without SD_PREFER_SIBLING. Having it set only means that in
> some cases the imbalance will be re-classified as group_overloaded instead
> of group_misfit_task, so we'll skip the misfit logic when we shouldn't (this
> happens on Juno for instance).

Can you give more details about your test case?

>
> It does nothing for the "1 task per CPU" problem that Morten described above.
> When you have so few tasks, load isn't very relevant, but it
> skews the load-balancer into thinking the LITTLE CPUs are more busy than
> the bigs even though there's an idle one in the lot.
>
> >>
> >> Morten


Re: [PATCHv4 12/12] sched/core: Disable SD_PREFER_SIBLING on asymmetric cpu capacity domains

2018-07-31 Thread Valentin Schneider
Hi,

On 31/07/18 13:17, Vincent Guittot wrote:
> On Fri, 6 Jul 2018 at 16:31, Morten Rasmussen  
> wrote:
>>
>> On Fri, Jul 06, 2018 at 12:18:17PM +0200, Vincent Guittot wrote:
>>> [...]
>>
>> Scheduling one task per cpu when n_task == n_cpus on asymmetric
>> topologies is generally broken already and this patch set doesn't fix
>> that problem.
>>
>> SD_PREFER_SIBLING might seem to help in very specific cases:
>> n_little_cpus == n_big_cpus. In that case the little group might be
>> classified as overloaded. It doesn't guarantee that anything gets pulled
>> as the grp_load/grp_capacity in the imbalance calculation on some systems
>> still says the little cpus are more loaded than the bigs despite one of
>> them being idle. That depends on the little cpu capacities.
>>
>> On systems where n_little_cpus != n_big_cpus SD_PREFER_SIBLING is broken
>> as it assumes the group_weight to be the same. This is the case on Juno
>> and several other platforms.
>>
>> IMHO, SD_PREFER_SIBLING isn't the solution to this problem. It might
> 
> I agree but this patchset creates a regression in the scheduling behavior
> 
>> help for a limited subset of topologies/capacities but the right
>> solution is to change the imbalance calculation. As the name says, it is
> 
> Yes, that's what the prototype I came up with does.
> 
>> meant to spread tasks and does so unconditionally. For asymmetric
>> systems we would like to consider cpu capacity before migrating tasks.
>>
>>> When running the tests of your cover letter, 1 long
>>> running task is often co-scheduled on a big core whereas short pinned
>>> tasks are still running and a little core is idle, which is not an
>>> optimal scheduling decision.
>>
>> This can easily happen with SD_PREFER_SIBLING enabled too so I wouldn't
>> say that this patch breaks anything that isn't broken already. In fact
>> we see this happening with and without this patch applied.
> 
> At least for the use case above, this doesn't happen when
> SD_PREFER_SIBLING is set
> 

On my HiKey960 I can see coscheduling on a big CPU while a LITTLE is free
with **and** without SD_PREFER_SIBLING. Having it set only means that in
some cases the imbalance will be re-classified as group_overloaded instead
of group_misfit_task, so we'll skip the misfit logic when we shouldn't (this
happens on Juno for instance).
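
For readers without the series at hand: the misfit patches classify groups
with an ordered group_type, and a higher type wins when picking the busiest
group, so group_overloaded shadows group_misfit_task. A simplified sketch,
assuming the enum layout from the patchset; the overload test here is
deliberately crude (the kernel's version uses capacity margins).

enum group_type {
	group_other = 0,
	group_misfit_task,	/* a task too heavy for its CPU's capacity */
	group_imbalanced,
	group_overloaded,
};

/*
 * Simplified classification: overloaded takes precedence, which is why
 * re-classifying the imbalance as group_overloaded skips the misfit
 * logic described above.
 */
enum group_type classify_sketch(unsigned int sum_nr_running,
				unsigned int group_weight,
				unsigned long misfit_task_load)
{
	if (sum_nr_running > group_weight)	/* crude overload test */
		return group_overloaded;
	if (misfit_task_load)
		return group_misfit_task;
	return group_other;
}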

It does nothing for the "1 task per CPU" problem that Morten described above.
When you have so few tasks, load isn't very relevant, but it
skews the load-balancer into thinking the LITTLE CPUs are more busy than
the bigs even though there's an idle one in the lot.

>>
>> Morten


Re: [PATCHv4 12/12] sched/core: Disable SD_PREFER_SIBLING on asymmetric cpu capacity domains

2018-07-31 Thread Vincent Guittot
On Fri, 6 Jul 2018 at 16:31, Morten Rasmussen  wrote:
>
> On Fri, Jul 06, 2018 at 12:18:17PM +0200, Vincent Guittot wrote:
> > On Wed, 4 Jul 2018 at 12:18, Morten Rasmussen  
> > wrote:
> > >
> > > The 'prefer sibling' sched_domain flag is intended to encourage
> > > spreading tasks to sibling sched_domains to take advantage of more caches
> > > and cores for SMT systems. It has recently been changed to be on all
> > > non-NUMA topology levels. However, spreading across domains with cpu
> > > capacity asymmetry isn't desirable, e.g. spreading from high capacity to
> > > low capacity cpus even if high capacity cpus aren't overutilized might
> > > give access to more cache but the cpu will be slower and possibly lead
> > > to worse overall throughput.
> > >
> > > To prevent this, we need to remove SD_PREFER_SIBLING on the sched_domain
> > > level immediately below SD_ASYM_CPUCAPACITY.
> >
> > This makes sense. Nevertheless, this patch also raises a scheduling
> > problem and breaks the 1 task per CPU policy that is enforced by
> > SD_PREFER_SIBLING.
>
> Scheduling one task per cpu when n_task == n_cpus on asymmetric
> topologies is generally broken already and this patch set doesn't fix
> that problem.
>
> SD_PREFER_SIBLING might seem to help in very specific cases:
> n_little_cpus == n_big_cpus. In that case the little group might be
> classified as overloaded. It doesn't guarantee that anything gets pulled
> as the grp_load/grp_capacity in the imbalance calculation on some systems
> still says the little cpus are more loaded than the bigs despite one of
> them being idle. That depends on the little cpu capacities.
>
> On systems where n_little_cpus != n_big_cpus SD_PREFER_SIBLING is broken
> as it assumes the group_weight to be the same. This is the case on Juno
> and several other platforms.
>
> IMHO, SD_PREFER_SIBLING isn't the solution to this problem. It might

I agree but this patchset creates a regression in the scheduling behavior

> help for a limited subset of topologies/capacities but the right
> solution is to change the imbalance calculation. As the name says, it is

Yes, that's what the prototype I came up with does.

> meant to spread tasks and does so unconditionally. For asymmetric
> systems we would like to consider cpu capacity before migrating tasks.
>
> > When running the tests of your cover letter, 1 long
> > running task is often co-scheduled on a big core whereas short pinned
> > tasks are still running and a little core is idle, which is not an
> > optimal scheduling decision.
>
> This can easily happen with SD_PREFER_SIBLING enabled too so I wouldn't
> say that this patch breaks anything that isn't broken already. In fact
> we see this happening with and without this patch applied.

At least for the use case above, this doesn't happen when
SD_PREFER_SIBLING is set

>
> Morten


Re: [PATCHv4 12/12] sched/core: Disable SD_PREFER_SIBLING on asymmetric cpu capacity domains

2018-07-06 Thread Morten Rasmussen
On Fri, Jul 06, 2018 at 12:18:17PM +0200, Vincent Guittot wrote:
> On Wed, 4 Jul 2018 at 12:18, Morten Rasmussen  
> wrote:
> >
> > The 'prefer sibling' sched_domain flag is intended to encourage
> > spreading tasks to sibling sched_domains to take advantage of more caches
> > and cores for SMT systems. It has recently been changed to be on all
> > non-NUMA topology levels. However, spreading across domains with cpu
> > capacity asymmetry isn't desirable, e.g. spreading from high capacity to
> > low capacity cpus even if high capacity cpus aren't overutilized might
> > give access to more cache but the cpu will be slower and possibly lead
> > to worse overall throughput.
> >
> > To prevent this, we need to remove SD_PREFER_SIBLING on the sched_domain
> > level immediately below SD_ASYM_CPUCAPACITY.
> 
> This makes sense. Nevertheless, this patch also raises a scheduling
> problem and breaks the 1 task per CPU policy that is enforced by
> SD_PREFER_SIBLING.

Scheduling one task per cpu when n_task == n_cpus on asymmetric
topologies is generally broken already and this patch set doesn't fix
that problem.

SD_PREFER_SIBLING might seem to help in very specific cases:
n_little_cpus == n_big_cpus. In that case the little group might be
classified as overloaded. It doesn't guarantee that anything gets pulled
as the grp_load/grp_capacity in the imbalance calculation on some systems
still says the little cpus are more loaded than the bigs despite one of
them being idle. That depends on the little cpu capacities.
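
To make the grp_load/grp_capacity point concrete, here is a small standalone
sketch with hypothetical Juno-like numbers (4 LITTLEs of capacity 446, 2 bigs
of capacity 1024); all values are illustrative, not measurements from this
thread.

#include <stdio.h>

#define SCHED_CAPACITY_SCALE	1024UL

/* avg_load as the balancer compares groups: group load scaled by the
 * group's total capacity. */
unsigned long avg_load(unsigned long group_load, unsigned long group_capacity)
{
	return group_load * SCHED_CAPACITY_SCALE / group_capacity;
}

int main(void)
{
	/* 3 busy LITTLEs (~1024 load each), 1 idle; capacity 4 * 446 */
	unsigned long littles = avg_load(3 * 1024, 4 * 446);	/* ~1763 */
	/* 2 tasks co-scheduled on one big, 1 on the other; capacity 2 * 1024 */
	unsigned long bigs = avg_load(3 * 1024, 2 * 1024);	/* 1536 */

	/* The LITTLE group still reports the higher avg_load even though
	 * one LITTLE is idle, so nothing is pulled off the co-scheduled
	 * big. */
	printf("littles=%lu bigs=%lu\n", littles, bigs);
	return 0;
}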

On systems where n_little_cpus != n_big_cpus SD_PREFER_SIBLING is broken
as it assumes the group_weight to be the same. This is the case on Juno
and several other platforms.

IMHO, SD_PREFER_SIBLING isn't the solution to this problem. It might
help for a limited subset of topologies/capacities but the right
solution is to change the imbalance calculation. As the name says, it is
meant to spread tasks and does so unconditionally. For asymmetric
systems we would like to consider cpu capacity before migrating tasks.

> When running the tests of your cover letter, 1 long
> running task is often co-scheduled on a big core whereas short pinned
> tasks are still running and a little core is idle, which is not an
> optimal scheduling decision.

This can easily happen with SD_PREFER_SIBLING enabled too so I wouldn't
say that this patch breaks anything that isn't broken already. In fact
we see this happening with and without this patch applied.

Morten


Re: [PATCHv4 12/12] sched/core: Disable SD_PREFER_SIBLING on asymmetric cpu capacity domains

2018-07-06 Thread Vincent Guittot
On Wed, 4 Jul 2018 at 12:18, Morten Rasmussen  wrote:
>
> The 'prefer sibling' sched_domain flag is intended to encourage
> spreading tasks to sibling sched_domains to take advantage of more caches
> and cores for SMT systems. It has recently been changed to be on all
> non-NUMA topology levels. However, spreading across domains with cpu
> capacity asymmetry isn't desirable, e.g. spreading from high capacity to
> low capacity cpus even if high capacity cpus aren't overutilized might
> give access to more cache but the cpu will be slower and possibly lead
> to worse overall throughput.
>
> To prevent this, we need to remove SD_PREFER_SIBLING on the sched_domain
> level immediately below SD_ASYM_CPUCAPACITY.

This makes sense. Nevertheless, this patch also raises a scheduling
problem and breaks the 1 task per CPU policy that is enforced by
SD_PREFER_SIBLING. When running the tests of your cover letter, 1 long
running task is often co-scheduled on a big core whereas short pinned
tasks are still running and a little core is idle, which is not an
optimal scheduling decision.

>
> cc: Ingo Molnar 
> cc: Peter Zijlstra 
>
> Signed-off-by: Morten Rasmussen 
> ---
>  kernel/sched/topology.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 29c186961345..00c7a08c7f77 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1140,7 +1140,7 @@ sd_init(struct sched_domain_topology_level *tl,
> | 0*SD_SHARE_CPUCAPACITY
> | 0*SD_SHARE_PKG_RESOURCES
> | 0*SD_SERIALIZE
> -   | 0*SD_PREFER_SIBLING
> +   | 1*SD_PREFER_SIBLING
> | 0*SD_NUMA
> | sd_flags
> ,
> @@ -1186,17 +1186,21 @@ sd_init(struct sched_domain_topology_level *tl,
> if (sd->flags & SD_ASYM_CPUCAPACITY) {
> struct sched_domain *t = sd;
>
> +   /*
> +* Don't attempt to spread across cpus of different capacities.
> +*/
> +   if (sd->child)
> +   sd->child->flags &= ~SD_PREFER_SIBLING;
> +
> for_each_lower_domain(t)
> t->flags |= SD_BALANCE_WAKE;
> }
>
> if (sd->flags & SD_SHARE_CPUCAPACITY) {
> -   sd->flags |= SD_PREFER_SIBLING;
> sd->imbalance_pct = 110;
> sd->smt_gain = 1178; /* ~15% */
>
> } else if (sd->flags & SD_SHARE_PKG_RESOURCES) {
> -   sd->flags |= SD_PREFER_SIBLING;
> sd->imbalance_pct = 117;
> sd->cache_nice_tries = 1;
> sd->busy_idx = 2;
> @@ -1207,6 +1211,7 @@ sd_init(struct sched_domain_topology_level *tl,
> sd->busy_idx = 3;
> sd->idle_idx = 2;
>
> +   sd->flags &= ~SD_PREFER_SIBLING;
> sd->flags |= SD_SERIALIZE;
> if (sched_domains_numa_distance[tl->numa_level] > RECLAIM_DISTANCE) {
> sd->flags &= ~(SD_BALANCE_EXEC |
> @@ -1216,7 +1221,6 @@ sd_init(struct sched_domain_topology_level *tl,
>
>  #endif
> } else {
> -   sd->flags |= SD_PREFER_SIBLING;
> sd->cache_nice_tries = 1;
> sd->busy_idx = 2;
> sd->idle_idx = 1;
> --
> 2.7.4
>
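
A quick way to check the result on a running system is to read the per-domain
flags that CONFIG_SCHED_DEBUG exports under procfs. This is a hedged sketch:
the path and the 0x1000 value for SD_PREFER_SIBLING are assumptions based on
kernels of this vintage and should be verified against the tree in use.

#include <stdio.h>

#define SD_PREFER_SIBLING	0x1000	/* assumed value; check your tree */

int main(void)
{
	/* domain0 is the lowest sched_domain level for cpu0 */
	const char *path = "/proc/sys/kernel/sched_domain/cpu0/domain0/flags";
	unsigned long flags;
	FILE *f = fopen(path, "r");

	if (!f) {
		perror(path);
		return 1;
	}
	if (fscanf(f, "%lu", &flags) != 1) {
		fclose(f);
		fprintf(stderr, "failed to parse %s\n", path);
		return 1;
	}
	fclose(f);

	printf("domain0 flags=%#lx prefer_sibling=%s\n", flags,
	       (flags & SD_PREFER_SIBLING) ? "yes" : "no");
	return 0;
}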

