Re: [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups

2018-02-26 Thread Morten Rasmussen
On Fri, Feb 23, 2018 at 05:47:52PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 23, 2018 at 04:38:06PM +0000, Morten Rasmussen wrote:
> > > Or am I now terminally confused again?
> > 
> > No, I think you are right, or I'm equally confused.
> 
> :-)
> 
> Would it make sense to also track max_capacity, but then based on the
> value before RT scaling ?
> 
> That should readily be able to distinguish between big and little
> clusters, although then DynamiQ would still completely ruin things.

IIRC, I did actually track max_capacity to begin with for the wake_cap()
stuff, but someone suggested to use min_capacity instead to factor in
the RT scaling as it could potentially help some use-cases.

I can add unscaled max_capacity tracking and use that instead, as this
is primarily a solution for asymmetric cpu capacity systems.
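
A minimal sketch of what that could look like, assuming a new
max_capacity field in struct sched_group_capacity (the field is an
assumption, not existing code); update_group_capacity() would then take
the max across the group the same way it takes the min today:

/*
 * Sketch only: record the original (unscaled, max-frequency) capacity
 * alongside the RT-scaled min_capacity.
 */
static void update_cpu_capacity(struct sched_domain *sd, int cpu)
{
	unsigned long capacity = arch_scale_cpu_capacity(sd, cpu);
	struct sched_group *sdg = sd->groups;

	cpu_rq(cpu)->cpu_capacity_orig = capacity;

	/* Unscaled original capacity, untouched by RT/IRQ pressure. */
	sdg->sgc->max_capacity = capacity;

	capacity *= scale_rt_capacity(cpu);
	capacity >>= SCHED_CAPACITY_SHIFT;

	if (!capacity)
		capacity = 1;

	cpu_rq(cpu)->cpu_capacity = capacity;
	sdg->sgc->capacity = capacity;
	sdg->sgc->min_capacity = capacity;	/* RT-scaled */
}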

Whether we track max or min shouldn't really matter if it is based on
original capacity, unless you have a DynamiQ system. For a DynamiQ
system it depends on how it is configured. If it is a single DynamiQ
cluster, we will have just one sched_domain with per-cpu sched_groups,
so we don't
have balancing between mixed groups. If we have multiple DynamiQ
clusters, we can have mixed groups, homogeneous groups, or both
depending on the system configuration. Homogeneous groups should be
okay; mixed groups could work okay, I think, as long as all groups have
the same mix, but a mix of different mixed groups is going to be a
challenge. Most
likely we would have to treat all these groups as one and ignore cache
boundaries for scheduling decisions.

We are slowly getting into the mess of which capacity should be used for
various conditions. Is it the original capacity (unscaled and at the
highest frequency), do we subtract the RT utilization, and what if the
thermal framework has disabled some of the higher frequencies?

In particular, the constraints imposed by RT and thermal are not
permanent and might change depending on the use-case and how we place
the tasks. But that is a longer discussion to be had.


Re: [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups

2018-02-23 Thread Peter Zijlstra
On Fri, Feb 23, 2018 at 04:38:06PM +0000, Morten Rasmussen wrote:
> > Or am I now terminally confused again?
> 
> No, I think you are right, or I'm equally confused.

:-)

Would it make sense to also track max_capacity, but then based on the
value before RT scaling ?

That should readily be able to distinguish between big and little
clusters, although then DynamiQ would still completely ruin things.


Re: [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups

2018-02-23 Thread Morten Rasmussen
On Tue, Feb 20, 2018 at 07:26:05PM +0100, Peter Zijlstra wrote:
> On Tue, Feb 20, 2018 at 04:33:52PM +0000, Morten Rasmussen wrote:
> > On Mon, Feb 19, 2018 at 04:10:11PM +0100, Peter Zijlstra wrote:
> > > On Thu, Feb 15, 2018 at 04:20:51PM +0000, Morten Rasmussen wrote:
> > > > +/*
> > > > + * group_similar_cpu_capacity: Returns true if the minimum capacity of the
> > > > + * compared groups differ by less than 12.5%.
> > > > + */
> > > > +static inline bool
> > > > +group_similar_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
> > > > +{
> > > > +   long diff = sg->sgc->min_capacity - ref->sgc->min_capacity;
> > > > +   long max = max(sg->sgc->min_capacity, ref->sgc->min_capacity);
> > > > +
> > > > +   return abs(diff) < max >> 3;
> > > > +}
> > > 
> > > This seems a fairly random and dodgy heuristic.
> > 
> > I can't deny that :-)
> > 
> > We need to somehow figure out if we are doing asymmetric cpu capacity
> > balancing or normal SMP balancing. We probably don't care about
> > migrating tasks if the capacities are nearly identical. But how much is
> > 'nearly'?
> > 
> > We could make it strictly equal as long as sgc->min_capacity is based on
> > capacity_orig. If we let things like rt-pressure influence
> > sgc->min_capacity, it might become a mess.
> 
> See, that is the problem, I think this min_capacity thing is
> influenced by rt-pressure and the like.
> 
> See update_cpu_capacity(), min_capacity is set after we add the RT scale
> factor thingy, and then update_group_capacity() filters the min of the
> whole group. The thing only ever goes down.
> 
> But this means that if a big CPU has a very high IRQ/RT load, its
> capacity will dip below that of a little core and min_capacity for the
> big group as a whole will appear smaller than that of the little group.
> 
> Or am I now terminally confused again?

No, I think you are right, or I'm equally confused.

I don't think we can avoid having some sort of margin to decide when
capacities are significantly different if we want to keep this patch.

Looking more into this, I realized that we do already have a similar
comparison and margin in group_smaller_cpu_capacity(). So I'm going to
look at using that instead if possible.
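
For reference, that existing helper looks roughly like this in mainline
(quoted from memory, so treat it as approximate; capacity_margin is
1280, so sg qualifies as smaller when its min_capacity is below ~80% of
ref's):

static inline bool
group_smaller_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
{
	return sg->sgc->min_capacity * capacity_margin <
					ref->sgc->min_capacity * 1024;
}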

The commit message could use some clarification too, as we do in fact
already use group_smaller_cpu_capacity() to bail out of pulling big
tasks from big cpus to little cpus. However, there are cases where it is
fine to have sum_nr_running > group_weight on the big side, and cases
where it is fine to have the bigs idling while the littles are lightly
utilized; those cases are what this patch should address.


Re: [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups

2018-02-20 Thread Peter Zijlstra
On Tue, Feb 20, 2018 at 04:33:52PM +0000, Morten Rasmussen wrote:
> On Mon, Feb 19, 2018 at 04:10:11PM +0100, Peter Zijlstra wrote:
> > On Thu, Feb 15, 2018 at 04:20:51PM +0000, Morten Rasmussen wrote:
> > > +/*
> > > + * group_similar_cpu_capacity: Returns true if the minimum capacity of the
> > > + * compared groups differ by less than 12.5%.
> > > + */
> > > +static inline bool
> > > +group_similar_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
> > > +{
> > > + long diff = sg->sgc->min_capacity - ref->sgc->min_capacity;
> > > + long max = max(sg->sgc->min_capacity, ref->sgc->min_capacity);
> > > +
> > > + return abs(diff) < max >> 3;
> > > +}
> > 
> > This seems a fairly random and dodgy heuristic.
> 
> I can't deny that :-)
> 
> We need to somehow figure out if we are doing asymmetric cpu capacity
> balancing or normal SMP balancing. We probably don't care about
> migrating tasks if the capacities are nearly identical. But how much is
> 'nearly'?
> 
> We could make it strictly equal as long as sgc->min_capacity is based on
> capacity_orig. If we let things like rt-pressure influence
> sgc->min_capacity, it might become a mess.

See, that is the problem, I think this min_capacity thing is
influenced by rt-pressure and the like.

See update_cpu_capacity(), min_capacity is set after we add the RT scale
factor thingy, and then update_group_capacity() filters the min of the
whole group. The thing only ever goes down.

But this means that if a big CPU has a very high IRQ/RT load, its
capacity will dip below that of a little core and min_capacity for the
big group as a whole will appear smaller than that of the little group.

Or am I now terminally confused again?


Re: [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups

2018-02-20 Thread Morten Rasmussen
On Mon, Feb 19, 2018 at 04:10:11PM +0100, Peter Zijlstra wrote:
> On Thu, Feb 15, 2018 at 04:20:51PM +0000, Morten Rasmussen wrote:
> > +/*
> > + * group_similar_cpu_capacity: Returns true if the minimum capacity of the
> > + * compared groups differ by less than 12.5%.
> > + */
> > +static inline bool
> > +group_similar_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
> > +{
> > +   long diff = sg->sgc->min_capacity - ref->sgc->min_capacity;
> > +   long max = max(sg->sgc->min_capacity, ref->sgc->min_capacity);
> > +
> > +   return abs(diff) < max >> 3;
> > +}
> 
> This seems a fairly random and dodgy heuristic.

I can't deny that :-)

We need to somehow figure out if we are doing asymmetric cpu capacity
balancing or normal SMP balancing. We probably don't care about
migrating tasks if the capacities are nearly identical. But how much is
'nearly'?
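
For concreteness, the proposed 12.5% check works out like this (capacity
values assumed for illustration):

	min_capacity 1024 vs 920:  |diff| = 104 <  (1024 >> 3) = 128  -> similar
	min_capacity 1024 vs 462:  |diff| = 562 >= 128                -> different

So two big clusters under slightly different rt-pressure still compare
as similar, while big vs little stays clearly apart.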

We could make it strictly equal as long as sgc->min_capacity is based on
capacity_orig. If we let things like rt-pressure influence
sgc->min_capacity, it might become a mess.

We could tie it to sd->imbalance_pct to make it slightly less arbitrary,
or we can try to drop the margin.
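
As a sketch (untested, and it assumes the caller has the sched_domain at
hand), tying it to sd->imbalance_pct could look like:

/*
 * Consider two groups to have similar capacity if the smaller
 * min_capacity scaled up by sd->imbalance_pct (a percentage,
 * e.g. 117 or 125) still reaches the larger one.
 */
static inline bool
group_similar_cpu_capacity(struct sched_domain *sd, struct sched_group *sg,
			   struct sched_group *ref)
{
	long min_cap = min(sg->sgc->min_capacity, ref->sgc->min_capacity);
	long max_cap = max(sg->sgc->min_capacity, ref->sgc->min_capacity);

	return min_cap * sd->imbalance_pct >= max_cap * 100;
}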

Alternative solutions and preferences are welcome...


Re: [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups

2018-02-20 Thread Morten Rasmussen
On Mon, Feb 19, 2018 at 03:53:35PM +0100, Peter Zijlstra wrote:
> On Mon, Feb 19, 2018 at 03:50:12PM +0100, Peter Zijlstra wrote:
> > On Thu, Feb 15, 2018 at 04:20:51PM +0000, Morten Rasmussen wrote:
> > > On systems with asymmetric cpu capacities, a skewed load distribution
> > > might yield better throughput than balancing load per group capacity.
> > > For example, preferring high capacity cpus for compute intensive tasks
> > > leaving low capacity cpus idle rather than balancing the number of idle
> > > cpus across different cpu types. Instead, let load-balance back off if
> > > the busiest group isn't really overloaded.
> > 
> > I'm sorry. I just can't seem to make sense of that today. What?
> 
> Aah, you're saying that if we have 4+4 big.LITTLE and 4 runnable tasks,
> we would like them running on our big cluster and leave the small
> cluster entirely idle, instead of running 2+2.

Yes. Ideally, when we are busy, we want to fill up the big cpus first, one
task on each, and if we have more tasks we start using little cpus
before putting additional tasks on any of the bigs. The load per group
capacity metric is sort of working against that up until roughly the
point where we have enough tasks for all the cpus.
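
To illustrate with assumed numbers (1024 per big cpu, 462 per little,
four always-running tasks packed one per big):

	big group:    group_load = 4096, group_capacity = 4096
	              avg_load = 4096 * 1024 / 4096 = 1024
	little group: group_load = 0,    avg_load = 0

The big group looks busiest by load per capacity, so the balancer keeps
trying to even things out towards the idle littles even though every big
cpu has exactly one task and nothing would gain from moving.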

> So what about this DynamiQ nonsense? Or is that so unstructured we
> can't really do anything sensible with it?

With DynamiQ we don't have any grouping of the cpu types provided by the
architecture, so we are going to end up balancing between individual
cpus. I hope these tweaks should work for DynamiQ too. Always-running
tasks on little cpus should be flagged as misfits and be picked up by
big cpus. It is still to be verified though.
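
The misfit flagging boils down to a fits-check along these lines (as in
the earlier patches of this series, if I remember the helper right;
capacity_margin is the usual ~25% fudge factor):

/*
 * A task no longer fits a cpu once its utilization, inflated by
 * capacity_margin, exceeds the cpu's capacity.
 */
static inline int task_fits_capacity(struct task_struct *p, long capacity)
{
	return capacity * 1024 > task_util(p) * capacity_margin;
}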


Re: [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups

2018-02-20 Thread Peter Zijlstra
On Thu, Feb 15, 2018 at 04:20:51PM +0000, Morten Rasmussen wrote:
> On systems with asymmetric cpu capacities, a skewed load distribution
> might yield better throughput than balancing load per group capacity.
> For example, preferring high capacity cpus for compute intensive tasks
> leaving low capacity cpus idle rather than balancing the number of idle
> cpus across different cpu types. Instead, let load-balance back off if
> the busiest group isn't really overloaded.

I'm sorry. I just can't seem to make sense of that today. What?



Re: [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups

2018-02-20 Thread Peter Zijlstra
On Mon, Feb 19, 2018 at 03:50:12PM +0100, Peter Zijlstra wrote:
> On Thu, Feb 15, 2018 at 04:20:51PM +0000, Morten Rasmussen wrote:
> > On systems with asymmetric cpu capacities, a skewed load distribution
> > might yield better throughput than balancing load per group capacity.
> > For example, preferring high capacity cpus for compute intensive tasks
> > leaving low capacity cpus idle rather than balancing the number of idle
> > cpus across different cpu types. Instead, let load-balance back off if
> > the busiest group isn't really overloaded.
> 
> I'm sorry. I just can't seem to make sense of that today. What?

Aah, you're saying that if we have 4+4 big.LITTLE and 4 runnable tasks,
we would like them running on our big cluster and leave the small
cluster entirely idle, instead of running 2+2.

So what about this DynamiQ nonsense? Or is that so unstructured we
can't really do anything sensible with it?



Re: [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups

2018-02-20 Thread Peter Zijlstra
On Thu, Feb 15, 2018 at 04:20:51PM +0000, Morten Rasmussen wrote:
> +/*
> + * group_similar_cpu_capacity: Returns true if the minimum capacity of the
> + * compared groups differ by less than 12.5%.
> + */
> +static inline bool
> +group_similar_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
> +{
> + long diff = sg->sgc->min_capacity - ref->sgc->min_capacity;
> + long max = max(sg->sgc->min_capacity, ref->sgc->min_capacity);
> +
> + return abs(diff) < max >> 3;
> +}

This seems a fairly random and dodgy heuristic.