Re: [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups
On Fri, Feb 23, 2018 at 05:47:52PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 23, 2018 at 04:38:06PM +, Morten Rasmussen wrote:
> > > Or am I now terminally confused again?
> >
> > No, I think you are right, or I'm equally confused.
>
> :-)
>
> Would it make sense to also track max_capacity, but then based on the
> value before RT scaling ?
>
> That should readily be able to distinguish between big and little
> clusters, although then DynamiQ would still completely ruin things.

IIRC, I did actually track max_capacity to begin with for the wake_cap()
stuff, but someone suggested to use min_capacity instead to factor in
the RT scaling as it could potentially help some use-cases.

I can add unscaled max_capacity tracking and use that, as this is
primarily a solution for asymmetric cpu capacity systems. Whether we
track max or min shouldn't really matter if it is based on original
capacity, unless you have a DynamiQ system.

For a DynamiQ system it depends on how it is configured. If it is a
single DynamiQ cluster we will have just one sched_domain with per-cpu
sched_groups, so we don't have balancing between mixed groups. If we
have multiple DynamiQ clusters, we can have mixed groups, homogeneous
groups, or both depending on the system configuration. Homogeneous
groups should be okay, and mixed groups could work okay, I think, as
long as all groups have the same mix. A mix of mixed groups is going to
be a challenge; most likely we would have to treat all these groups as
one and ignore cache boundaries for scheduling decisions.

We are slowly getting into the mess of which capacity should be used
for various conditions. Is it the original capacity (unscaled and at
the highest frequency), do we subtract the RT utilization, and what if
the thermal framework has disabled some of the higher frequencies?
Particularly, the fact that constraints imposed by RT and thermal are
not permanent and might change depending on the use-case and how we
place the tasks.
But that is a longer discussion to be had.
Re: [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups
On Fri, Feb 23, 2018 at 04:38:06PM +, Morten Rasmussen wrote:
> > Or am I now terminally confused again?
>
> No, I think you are right, or I'm equally confused.

:-)

Would it make sense to also track max_capacity, but then based on the
value before RT scaling ?

That should readily be able to distinguish between big and little
clusters, although then DynamiQ would still completely ruin things.
Re: [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups
On Tue, Feb 20, 2018 at 07:26:05PM +0100, Peter Zijlstra wrote:
> On Tue, Feb 20, 2018 at 04:33:52PM +, Morten Rasmussen wrote:
> > On Mon, Feb 19, 2018 at 04:10:11PM +0100, Peter Zijlstra wrote:
> > > On Thu, Feb 15, 2018 at 04:20:51PM +, Morten Rasmussen wrote:
> > > > +/*
> > > > + * group_similar_cpu_capacity: Returns true if the minimum capacity of the
> > > > + * compared groups differ by less than 12.5%.
> > > > + */
> > > > +static inline bool
> > > > +group_similar_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
> > > > +{
> > > > +	long diff = sg->sgc->min_capacity - ref->sgc->min_capacity;
> > > > +	long max = max(sg->sgc->min_capacity, ref->sgc->min_capacity);
> > > > +
> > > > +	return abs(diff) < max >> 3;
> > > > +}
> > >
> > > This seems a fairly random and dodgy heuristic.
> >
> > I can't deny that :-)
> >
> > We need to somehow figure out if we are doing asymmetric cpu capacity
> > balancing or normal SMP balancing. We probably don't care about
> > migrating tasks if the capacities are nearly identical. But how much is
> > 'nearly'?
> >
> > We could make it strictly equal as long as sgc->min_capacity is based on
> > capacity_orig. If we let things like rt-pressure influence
> > sgc->min_capacity, it might become a mess.
>
> See, that is the problem: I think this min_capacity thing is
> influenced by rt-pressure and the like.
>
> See update_cpu_capacity(), min_capacity is set after we add the RT scale
> factor thingy, and then update_group_capacity() filters the min of the
> whole group. The thing only ever goes down.
>
> But this means that if a big CPU has a very high IRQ/RT load, its
> capacity will dip below that of a little core and min_capacity for the
> big group as a whole will appear smaller than that of the little group.
>
> Or am I now terminally confused again?

No, I think you are right, or I'm equally confused.

I don't think we can avoid having some sort of margin to decide when
capacities are significantly different if we want to keep this patch.
Looking more into this, I realized that we already have a similar
comparison and margin in group_smaller_cpu_capacity(), so I'm going to
look into using that instead if possible.

The commit message could use some clarification too, as we do in fact
already use group_smaller_cpu_capacity() to bail out of pulling big
tasks from big cpus to little. However, there are cases where it is
fine to have sum_nr_running > group_weight on the big side, and cases
where it is fine to have the bigs idling while the littles are lightly
utilized, which should be addressed by this patch.
Re: [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups
On Tue, Feb 20, 2018 at 04:33:52PM +, Morten Rasmussen wrote:
> On Mon, Feb 19, 2018 at 04:10:11PM +0100, Peter Zijlstra wrote:
> > On Thu, Feb 15, 2018 at 04:20:51PM +, Morten Rasmussen wrote:
> > > +/*
> > > + * group_similar_cpu_capacity: Returns true if the minimum capacity of the
> > > + * compared groups differ by less than 12.5%.
> > > + */
> > > +static inline bool
> > > +group_similar_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
> > > +{
> > > +	long diff = sg->sgc->min_capacity - ref->sgc->min_capacity;
> > > +	long max = max(sg->sgc->min_capacity, ref->sgc->min_capacity);
> > > +
> > > +	return abs(diff) < max >> 3;
> > > +}
> >
> > This seems a fairly random and dodgy heuristic.
>
> I can't deny that :-)
>
> We need to somehow figure out if we are doing asymmetric cpu capacity
> balancing or normal SMP balancing. We probably don't care about
> migrating tasks if the capacities are nearly identical. But how much is
> 'nearly'?
>
> We could make it strictly equal as long as sgc->min_capacity is based on
> capacity_orig. If we let things like rt-pressure influence
> sgc->min_capacity, it might become a mess.

See, that is the problem: I think this min_capacity thing is
influenced by rt-pressure and the like.

See update_cpu_capacity(), min_capacity is set after we add the RT scale
factor thingy, and then update_group_capacity() filters the min of the
whole group. The thing only ever goes down.

But this means that if a big CPU has a very high IRQ/RT load, its
capacity will dip below that of a little core and min_capacity for the
big group as a whole will appear smaller than that of the little group.

Or am I now terminally confused again?
Re: [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups
On Mon, Feb 19, 2018 at 04:10:11PM +0100, Peter Zijlstra wrote:
> On Thu, Feb 15, 2018 at 04:20:51PM +, Morten Rasmussen wrote:
> > +/*
> > + * group_similar_cpu_capacity: Returns true if the minimum capacity of the
> > + * compared groups differ by less than 12.5%.
> > + */
> > +static inline bool
> > +group_similar_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
> > +{
> > +	long diff = sg->sgc->min_capacity - ref->sgc->min_capacity;
> > +	long max = max(sg->sgc->min_capacity, ref->sgc->min_capacity);
> > +
> > +	return abs(diff) < max >> 3;
> > +}
>
> This seems a fairly random and dodgy heuristic.

I can't deny that :-)

We need to somehow figure out if we are doing asymmetric cpu capacity
balancing or normal SMP balancing. We probably don't care about
migrating tasks if the capacities are nearly identical. But how much is
'nearly'?

We could make it strictly equal as long as sgc->min_capacity is based on
capacity_orig. If we let things like rt-pressure influence
sgc->min_capacity, it might become a mess.

We could tie it to sd->imbalance_pct to make it slightly less arbitrary,
or we can try to drop the margin. Alternative solutions and preferences
are welcome...
Re: [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups
On Mon, Feb 19, 2018 at 03:53:35PM +0100, Peter Zijlstra wrote:
> On Mon, Feb 19, 2018 at 03:50:12PM +0100, Peter Zijlstra wrote:
> > On Thu, Feb 15, 2018 at 04:20:51PM +, Morten Rasmussen wrote:
> > > On systems with asymmetric cpu capacities, a skewed load distribution
> > > might yield better throughput than balancing load per group capacity.
> > > For example, preferring high capacity cpus for compute intensive tasks
> > > leaving low capacity cpus idle rather than balancing the number of idle
> > > cpus across different cpu types. Instead, let load-balance back off if
> > > the busiest group isn't really overloaded.
> >
> > I'm sorry. I just can't seem to make sense of that today. What?
>
> Aah, you're saying that if we have 4+4 big.LITTLE and 4 runnable tasks,
> we would like them running on our big cluster and leave the small
> cluster entirely idle, instead of running 2+2.

Yes. Ideally, when we are busy, we want to fill up the big cpus first,
one task on each, and if we have more tasks we start using little cpus
before putting additional tasks on any of the bigs. The load per group
capacity metric is sort of working against that up until roughly the
point where we have enough tasks for all the cpus.

> So what about this DynamiQ nonsense? Or is that so unstructured we
> can't really do anything sensible with it?

With DynamiQ we don't have any grouping of the cpu types provided by the
architecture, so we are going to end up balancing between individual
cpus. I hope these tweaks should work for DynamiQ too. Always-running
tasks on little cpus should be flagged as misfits and be picked up by
big cpus. It is still to be verified though.
Re: [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups
On Thu, Feb 15, 2018 at 04:20:51PM +, Morten Rasmussen wrote:
> On systems with asymmetric cpu capacities, a skewed load distribution
> might yield better throughput than balancing load per group capacity.
> For example, preferring high capacity cpus for compute intensive tasks
> leaving low capacity cpus idle rather than balancing the number of idle
> cpus across different cpu types. Instead, let load-balance back off if
> the busiest group isn't really overloaded.

I'm sorry. I just can't seem to make sense of that today. What?
Re: [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups
On Mon, Feb 19, 2018 at 03:50:12PM +0100, Peter Zijlstra wrote:
> On Thu, Feb 15, 2018 at 04:20:51PM +, Morten Rasmussen wrote:
> > On systems with asymmetric cpu capacities, a skewed load distribution
> > might yield better throughput than balancing load per group capacity.
> > For example, preferring high capacity cpus for compute intensive tasks
> > leaving low capacity cpus idle rather than balancing the number of idle
> > cpus across different cpu types. Instead, let load-balance back off if
> > the busiest group isn't really overloaded.
>
> I'm sorry. I just can't seem to make sense of that today. What?

Aah, you're saying that if we have 4+4 big.LITTLE and 4 runnable tasks,
we would like them running on our big cluster and leave the small
cluster entirely idle, instead of running 2+2.

So what about this DynamiQ nonsense? Or is that so unstructured we
can't really do anything sensible with it?
Re: [PATCH 4/7] sched/fair: Avoid unnecessary balancing of asymmetric capacity groups
On Thu, Feb 15, 2018 at 04:20:51PM +, Morten Rasmussen wrote:
> +/*
> + * group_similar_cpu_capacity: Returns true if the minimum capacity of the
> + * compared groups differ by less than 12.5%.
> + */
> +static inline bool
> +group_similar_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
> +{
> +	long diff = sg->sgc->min_capacity - ref->sgc->min_capacity;
> +	long max = max(sg->sgc->min_capacity, ref->sgc->min_capacity);
> +
> +	return abs(diff) < max >> 3;
> +}

This seems a fairly random and dodgy heuristic.