Re: [Xen-devel] [PATCH 19/24] xen: credit2: soft-affinity awareness in load balancing
On Fri, 2016-09-02 at 12:46 +0100, anshul makkar wrote:

Hey, Anshul,

Thanks for having a look at the patch!

> On 17/08/16 18:19, Dario Faggioli wrote:
> >
> > --- a/xen/common/sched_credit2.c
> > +++ b/xen/common/sched_credit2.c
> >
> > + * Basically, if a soft-affinity is defined, the work done by a vcpu on a
> > + * runq to which it has a higher degree of soft-affinity is considered
> > + * 'lighter' than the same work done by the same vcpu on a runq to which
> > + * it has a smaller degree of soft-affinity (the degree of soft-affinity
> > + * is <= 1). In fact, if soft-affinity is used to achieve NUMA-aware
> > + * scheduling, the higher the degree of soft-affinity of the vcpu to a
> > + * runq, the greater the probability of accessing local memory, when
> > + * running on such runq. And that is certainly 'lighter' than having to
> > + * fetch memory from remote NUMA nodes.
>
> Do we ensure that, while defining soft-affinity for a vcpu, the NUMA
> architecture is considered? If not, then this whole calculation can go
> wrong and have a negative impact on performance.
>
Defining soft-affinity after the topology is what we do by default, just not here in Xen: we do it in the toolstack (in libxl, to be precise). NUMA-aware scheduling is indeed the most obvious use case for all this --and, in fact, that's why we configure things in such a way in the higher layers-- but the mechanism is, at the Xen level, flexible enough to be used for any purpose the user may find interesting.

> Degree of affinity to a runq will give good results if the affinity to
> pcpus has been chosen after due consideration.
>
At this level, 'good result' means 'making sure that a vcpu runs for as much time as possible on a pcpu to which it has soft-affinity'. Whether that is good or not for performance (or for any other aspect or metric) is not this algorithm's job to determine. Note that things are exactly the same for hard-affinity/pinning, or for weights.
In fact, Xen won't stop one from, say, pinning 128 vcpus all to pcpu 3. This will deeply suck, but it's the higher layers' will (fault?), and Xen should just comply with that.

> > + * If there is no soft-affinity, load_balance() (actually, consider())
> > + * acts as follows:
> > + *
> > + *  - D = abs(Li - Lj)
>
> If we consider the absolute value of Li - Lj, how will we know which runq
> has less workload, which, I think, is an essential parameter for load
> balancing? Am I missing something here?
>
What we are aiming for is making the queues more balanced, which means we want the difference between their loads to be smaller than it is when the balancing starts. As long as that happens, we don't care which load goes down and which one goes up, as long as the final result is a smaller load delta.

> > + *  - consider pushing v from I to J:
> > + *    - D' = abs(Li - lv - (Lj + lv))  (from now, abs(x) == |x|)
> > + *    - if (D' < D) { push }
> > + *  - consider pulling k from J to I:
> > + *    - D' = |Li + lk - (Lj - lk)|
> > + *    - if (D' < D) { pull }
>
> For both push and pull we are checking (D' < D)?
>
Indeed. And that's because of the abs(). :-)

Regards,
Dario

--
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 19/24] xen: credit2: soft-affinity awareness in load balancing
On 17/08/16 18:19, Dario Faggioli wrote:
> We want soft-affinity to play a role in load balancing, i.e., when
> deciding whether or not to push this vcpu from here to there, and/or
> pull that other vcpu from there to here. [...] it may well be the case
> that we will want to introduce something like that at some point.
>
> (Oh, and while there, just a couple of style fixes are also done.)
>
> Signed-off-by: Dario Faggioli
> ---
> Cc: George Dunlap
> Cc: Anshul Makkar
> ---
>  xen/common/sched_credit2.c | 359
>  1 file changed, 326 insertions(+), 33 deletions(-)
>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 2d7228a..3722f46 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -1786,19 +1786,21 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
>      return new_cpu;
>  }
>
> -/* Working state of the load-balancing algorithm */
> +/* Working state of the load-balancing algorithm. */
>  typedef struct {
> -    /* NB: Modified by consider() */
> +    /* NB: Modified by consider(). */
>      s_time_t load_delta;
>      struct csched2_vcpu * best_push_svc, *best_pull_svc;
> -    /* NB: Read by consider() */
> +    /* NB: Read by consider() (and the various consider_foo() functions). */
>      struct csched2_runqueue_data *lrqd;
> -    struct csched2_runqueue_data *orqd;
> +    struct csched2_runqueue_data *orqd;
> +    bool_t push_has_soft_aff, pull_has_soft_aff;
> +    s_time_t push_soft_aff_load, pull_soft_aff_load;
>  } balance_state_t;
>
> -static void consider(balance_state_t *st,
> -                     struct csched2_vcpu *push_svc,
> -                     struct csched2_vcpu *pull_svc)
> +static inline s_time_t consider_load(balance_state_t *st,
> +                                     struct csched2_vcpu *push_svc,
> +                                     struct csched2_vcpu *pull_svc)
>  {
>      s_time_t l_load, o_load, delta;
>
> @@ -1821,11 +1823,166 @@ static void consider(balance_state_t *st,
>      if ( delta < 0 )
>          delta = -delta;
>
> +    return delta;
> +}
> +
> +/*
> + * Load balancing and soft-affinity.
> + *
> + * When trying to figure out whether or not it's best to move a vcpu from
> + * one runqueue to another, we must keep soft-affinity in mind. Intuitively
> + * we would want to know the following:
> + *  - 'how much' affinity does the vcpu have with its current runq?
> + *  - 'how much' affinity will it have with its new runq?
> + *
> + * But we certainly need to be more precise about how much it is that 'how
> + * much'! Let's start with some definitions:
> + *
> + *  - let v be a vcpu, running in runq I, with soft-affinity to vi
> + *    pcpus of runq I, and soft-affinity with vj pcpus of runq J;
> + *  - let k be another vcpu, running in runq J, with soft-affinity to kj
> + *    pcpus of runq J, and with ki pcpus of runq I;
> + *  - let runq I have Ci pcpus, and runq J Cj pcpus;
> + *  - let vcpu v have an average load of lv, and k an average load of lk;
> + *  - let runq I have an average load of Li, and J an average load of Lj.
> + *
> + * We also define the following:
> + *
> + *  - lvi = lv * (vi / Ci) as the 'perceived load' of v, when running
> + *    in runq I;
> + *  - lvj = lv * (vj / Cj) as the 'perceived load' of v, when running
> + *    in runq J;
> + *  - the same for k, mutatis mutandis.
> + *
> + * The idea is that vi/Ci (i.e., the ratio of the number of cpus of a runq
> + * that a vcpu has soft-affinity with, over the total number of cpus of the
> + * runq itself) can be seen as the 'degree of soft-affinity' of v to runq I
> + * (and vj/Cj the one of v to J). In other words, we define the degree of
> + * soft-affinity of a vcpu to a runq as the fraction of pcpus of the runq
> + * itself that the vcpu has soft-affinity with. Then, we multiply this
> + * 'degree of soft-affinity' by the vcpu load, and call the result the
> + * 'perceived load'.
> + *
> + * Basically, if a soft-affinity is defined, the work done by a vcpu on a
> + * runq to which it has a higher degree of soft-affinity is considered
> + * 'lighter' than the same work done by the same vcpu on a runq to which it
> + * has a smaller degree of soft-affinity (the degree of soft-affinity is
> + * <= 1). In fact, if soft-affinity is used to achieve NUMA-aware
> + * scheduling, the higher the degree of soft-affinity of the vcpu to a
> + * runq, the greater the probability of accessing local memory, when
> + * running on such runq.
> + * And that is certainly 'lighter' than having to fetch memory from
> + * remote NUMA nodes.

Do we ensure that, while defining soft-affinity for a vcpu, the NUMA
architecture is considered? If not, then this whole calculation can go
wrong and have a negative impact on performance.

Degree of affinity to a runq will give good results if the affinity to
pcpus has been chosen after due consideration.

> + *
> + * So, evaluating pushing v from I to J would mean removing (from I) a
> + * perceived load of lv*(vi/Ci) and adding (to J) a perceived load of
> + * lv*(vj/Cj), which we (looking at things from the point of view of I,
> + * which is what balance_load() does) can call
[Xen-devel] [PATCH 19/24] xen: credit2: soft-affinity awareness in load balancing
We want soft-affinity to play a role in load balancing, i.e., when deciding whether or not to push this vcpu from here to there, and/or pull that other vcpu from there to here.

A way of doing that is considering the following (for both pushes and pulls, just with the roles of current and new runqueues inverted):
 - how much affinity does the vcpu have with its current runq?
 - how much affinity will it have with its new runq?

We call this the 'degree of soft-affinity' of a vcpu to a runq, and define it as, informally speaking, a quantity that is proportional to the fraction of pcpus of a runq that a vcpu has soft-affinity with.

Then, we use this 'degree of soft-affinity' to compute a value that is used as a modifier of the baseline --purely load based-- results of the load balancer: we apply it (potentially, with a scaling factor), and use the modified result for the actual load balancing decision. This modifier based approach is chosen because it integrates well into the existing load balancing framework, is modular, and can easily accommodate further extensions.

A note on performance and optimization: since we (potentially) call consider() O(nr_vcpus^2) times, we absolutely need it to be lean and quick. Therefore, a bunch of things are pre-calculated outside of it. This makes things look less encapsulated and clean but, at the same time, makes the code faster (and this is a critical path, so we want it fast!).

Finally, this patch does not interfere with the load balancing triggering logic. This is to say that vcpus running outside of their soft-affinity _don't_ trigger additional load balancing. Early numbers show that this is ok, but it may well be the case that we will want to introduce something like that at some point.

(Oh, and while there, just a couple of style fixes are also done.)
Signed-off-by: Dario Faggioli
---
Cc: George Dunlap
Cc: Anshul Makkar
---
 xen/common/sched_credit2.c | 359
 1 file changed, 326 insertions(+), 33 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 2d7228a..3722f46 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1786,19 +1786,21 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
     return new_cpu;
 }

-/* Working state of the load-balancing algorithm */
+/* Working state of the load-balancing algorithm. */
 typedef struct {
-    /* NB: Modified by consider() */
+    /* NB: Modified by consider(). */
     s_time_t load_delta;
     struct csched2_vcpu * best_push_svc, *best_pull_svc;
-    /* NB: Read by consider() */
+    /* NB: Read by consider() (and the various consider_foo() functions). */
     struct csched2_runqueue_data *lrqd;
-    struct csched2_runqueue_data *orqd;
+    struct csched2_runqueue_data *orqd;
+    bool_t push_has_soft_aff, pull_has_soft_aff;
+    s_time_t push_soft_aff_load, pull_soft_aff_load;
 } balance_state_t;

-static void consider(balance_state_t *st,
-                     struct csched2_vcpu *push_svc,
-                     struct csched2_vcpu *pull_svc)
+static inline s_time_t consider_load(balance_state_t *st,
+                                     struct csched2_vcpu *push_svc,
+                                     struct csched2_vcpu *pull_svc)
 {
     s_time_t l_load, o_load, delta;

@@ -1821,11 +1823,166 @@ static void consider(balance_state_t *st,
     if ( delta < 0 )
         delta = -delta;

+    return delta;
+}
+
+/*
+ * Load balancing and soft-affinity.
+ *
+ * When trying to figure out whether or not it's best to move a vcpu from
+ * one runqueue to another, we must keep soft-affinity in mind. Intuitively
+ * we would want to know the following:
+ *  - 'how much' affinity does the vcpu have with its current runq?
+ *  - 'how much' affinity will it have with its new runq?
+ *
+ * But we certainly need to be more precise about how much it is that 'how
+ * much'!
+ * Let's start with some definitions:
+ *
+ *  - let v be a vcpu, running in runq I, with soft-affinity to vi
+ *    pcpus of runq I, and soft-affinity with vj pcpus of runq J;
+ *  - let k be another vcpu, running in runq J, with soft-affinity to kj
+ *    pcpus of runq J, and with ki pcpus of runq I;
+ *  - let runq I have Ci pcpus, and runq J Cj pcpus;
+ *  - let vcpu v have an average load of lv, and k an average load of lk;
+ *  - let runq I have an average load of Li, and J an average load of Lj.
+ *
+ * We also define the following:
+ *
+ *  - lvi = lv * (vi / Ci) as the 'perceived load' of v, when running
+ *    in runq I;
+ *  - lvj = lv * (vj / Cj) as the 'perceived load' of v, when running
+ *    in runq J;
+ *  - the same for k, mutatis mutandis.
+ *
+ * The idea is that vi/Ci (i.e., the ratio of the number of cpus of a runq
+ * that a vcpu has