Re: [Xen-devel] [PATCH 19/24] xen: credit2: soft-affinity awareness in load balancing

2016-09-05 Thread Dario Faggioli
On Fri, 2016-09-02 at 12:46 +0100, anshul makkar wrote:

Hey, Anshul,

Thanks for having a look at the patch!

> On 17/08/16 18:19, Dario Faggioli wrote:
> > 
> > --- a/xen/common/sched_credit2.c
> > +++ b/xen/common/sched_credit2.c
> > 
> > + * Basically, if a soft-affinity is defined, the work done by a vcpu on a
> > + * runq to which it has higher degree of soft-affinity, is considered
> > + * 'lighter' than the same work done by the same vcpu on a runq to which it
> > + * has smaller degree of soft-affinity (degree of soft affinity is <= 1). In
> > + * fact, if soft-affinity is used to achieve NUMA-aware scheduling, the
> > + * higher the degree of soft-affinity of the vcpu to a runq, the greater the
> > + * probability of accessing local memory, when running on such runq. And
> > + * that is certainly 'lighter' than having to fetch memory from remote NUMA
> > + * nodes.
> Do we ensure that, while defining soft-affinity for a vcpu, the NUMA
> architecture is considered? If not, then this whole calculation can go
> wrong and have a negative impact on performance.
> 
Defining soft-affinity to match the NUMA topology is what we do by
default, just not here in Xen: we do it in the toolstack (in libxl, to
be precise).

NUMA-aware scheduling is indeed the most obvious use case for all this
--and, in fact, that's why we configure things that way in the higher
layers-- but the mechanism is, at the Xen level, flexible enough to be
used for any purpose that the user may find interesting.
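
FWIW, just to make the arithmetic in the quoted comment concrete, here
is a minimal standalone sketch of the 'degree of soft-affinity' and
'perceived load' computations (names like lv, vi, Ci follow the
comment; this is an illustration only, not the actual patch code):

#include <stdio.h>

/*
 * Perceived load of a vcpu on a runq: its average load lv, scaled by
 * the fraction (vi / Ci) of the runq's pcpus that the vcpu has
 * soft-affinity with. The fraction is <= 1, so the perceived load
 * never exceeds the raw load. Integer math is a simplification here.
 */
static long perceived_load(long lv, unsigned int vi, unsigned int Ci)
{
    return lv * vi / Ci;
}

int main(void)
{
    long lv = 1000;              /* v's average load                    */
    unsigned int Ci = 4, vi = 4; /* soft-affinity with all 4 pcpus of I */
    unsigned int Cj = 4, vj = 1; /* soft-affinity with 1 of J's 4 pcpus */

    printf("lvi = %ld\n", perceived_load(lv, vi, Ci)); /* 1000 * 4/4 = 1000 */
    printf("lvj = %ld\n", perceived_load(lv, vj, Cj)); /* 1000 * 1/4 = 250  */
    return 0;
}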

> The degree of affinity to a runq will give good results if the affinity to
> pcpus has been chosen after due consideration ..
>
At this level, 'good result' means 'making sure that a vcpu runs for as
much time as possible on a pcpu to which it has soft-affinity'. Whether
that is good or not for performance (or for any other aspect or
metric) is not this algorithm's job to determine.

Note that things are exactly the same for hard-affinity/pinning, or for
weights. In fact, Xen won't stop one from, say, pinning 128 vcpus all
to pcpu 3. That will deeply suck, but it's the higher layers' will
(fault?), and Xen should just comply with it.

> > + * If there is no soft-affinity, load_balance() (actually, consider())
> > + * acts as follows:
> > + *
> > + *  - D = abs(Li - Lj)
> If we consider the absolute value of Li - Lj, how will we know which
> runq has less workload, which, I think, is an essential parameter for
> load balancing? Am I missing something here?
>
What we are aiming for is making the queues more balanced, which means
we want the difference between their loads to be smaller than it is
when the balancing starts. As long as that happens, we don't care which
load goes down and which one goes up, provided the final result is a
smaller load delta.

> > + *  - consider pushing v from I to J:
> > + * - D' = abs(Li - lv - (Lj + lv))   (from now, abs(x) == |x|)
> > + * - if (D' < D) { push }
> > + *  - consider pulling k from J to I:
> > + * - D' = |Li + lk - (Lj - lk)|
> > + * - if (D' < D) { pull }
> For both push and pull we are checking (D' < D) ?
>
Indeed. And that's because of the abs(). :-)
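
To see it with made-up numbers, here's a tiny standalone sketch (not
the actual Xen code) showing that, thanks to the abs(), the very same
(D' < D) check works for both directions:

#include <stdio.h>
#include <stdlib.h>

static long delta(long Li, long Lj)
{
    return labs(Li - Lj);   /* D = |Li - Lj| */
}

int main(void)
{
    long Li = 10, Lj = 4, lv = 2, lk = 4;
    long D = delta(Li, Lj);                 /* |10 - 4| = 6           */
    long Dpush = delta(Li - lv, Lj + lv);   /* |8 - 6|  = 2  -> push  */
    long Dpull = delta(Li + lk, Lj - lk);   /* |14 - 0| = 14 -> don't */

    printf("D=%ld, push->%ld (%s), pull->%ld (%s)\n",
           D, Dpush, Dpush < D ? "yes" : "no",
           Dpull, Dpull < D ? "yes" : "no");
    return 0;
}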


Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)





Re: [Xen-devel] [PATCH 19/24] xen: credit2: soft-affinity awareness in load balancing

2016-09-02 Thread anshul makkar

On 17/08/16 18:19, Dario Faggioli wrote:

We want soft-affinity to play a role in load
balancing, i.e., when deciding whether or not to
[...]
something like that at some point.

(Oh, and while there, just a couple of style fixes
are also done.)

Signed-off-by: Dario Faggioli 
---
Cc: George Dunlap 
Cc: Anshul Makkar 
---
  xen/common/sched_credit2.c |  359 
  1 file changed, 326 insertions(+), 33 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 2d7228a..3722f46 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1786,19 +1786,21 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
  return new_cpu;
  }

-/* Working state of the load-balancing algorithm */
+/* Working state of the load-balancing algorithm. */
  typedef struct {
-/* NB: Modified by consider() */
+/* NB: Modified by consider(). */
  s_time_t load_delta;
  struct csched2_vcpu * best_push_svc, *best_pull_svc;
-/* NB: Read by consider() */
+/* NB: Read by consider() (and the various consider_foo() functions). */
  struct csched2_runqueue_data *lrqd;
-struct csched2_runqueue_data *orqd;
+struct csched2_runqueue_data *orqd;
+bool_t push_has_soft_aff, pull_has_soft_aff;
+s_time_t push_soft_aff_load, pull_soft_aff_load;
  } balance_state_t;

-static void consider(balance_state_t *st,
- struct csched2_vcpu *push_svc,
- struct csched2_vcpu *pull_svc)
+static inline s_time_t consider_load(balance_state_t *st,
+ struct csched2_vcpu *push_svc,
+ struct csched2_vcpu *pull_svc)
  {
  s_time_t l_load, o_load, delta;

@@ -1821,11 +1823,166 @@ static void consider(balance_state_t *st,
  if ( delta < 0 )
  delta = -delta;

+return delta;
+}
+
+/*
+ * Load balancing and soft-affinity.
+ *
+ * When trying to figure out whether or not it's best to move a vcpu from
+ * one runqueue to another, we must keep soft-affinity in mind. Intuitively
+ * we would want to know the following:
+ *  - 'how much' affinity does the vcpu have with its current runq?
+ *  - 'how much' affinity will it have with its new runq?
+ *
+ * But we certainly need to be more precise about how much it is that 'how
+ * much'! Let's start with some definitions:
+ *
+ *  - let v be a vcpu, running in runq I, with soft-affinity to vi
+ *pcpus of runq I, and soft affinity with vj pcpus of runq J;
+ *  - let k be another vcpu, running in runq J, with soft-affinity to kj
+ *pcpus of runq J, and with ki pcpus of runq I;
+ *  - let runq I have Ci pcpus, and runq J Cj pcpus;
+ *  - let vcpu v have an average load of lv, and k an average load of lk;
+ *  - let runq I have an average load of Li, and J an average load of Lj.
+ *
+ * We also define the following:
+ *
+ *  - lvi = lv * (vi / Ci)  as the 'perceived load' of v, when running
+ *  in runq I;
+ *  - lvj = lv * (vj / Cj)  as the 'perceived load' of v, when running
+ *  in runq J;
+ *  - the same for k, mutatis mutandis.
+ *
+ * The idea is that vi/Ci (i.e., the ratio of the number of cpus of a runq that
+ * a vcpu has soft-affinity with, over the total number of cpus of the runq
+ * itself) can be seen as the 'degree of soft-affinity' of v to runq I (and
+ * vj/Cj the one of v to J). In other words, we define the degree of soft
+ * affinity of a vcpu to a runq as what fraction of pcpus of the runq itself
+ * the vcpu has soft-affinity with. Then, we multiply this 'degree of
+ * soft-affinity' by the vcpu load, and call the result the 'perceived load'.
+ *
+ * Basically, if a soft-affinity is defined, the work done by a vcpu on a
+ * runq to which it has higher degree of soft-affinity, is considered
+ * 'lighter' than the same work done by the same vcpu on a runq to which it
+ * has smaller degree of soft-affinity (degree of soft affinity is <= 1). In
+ * fact, if soft-affinity is used to achieve NUMA-aware scheduling, the
+ * higher the degree of soft-affinity of the vcpu to a runq, the greater the
+ * probability of accessing local memory, when running on such runq. And
+ * that is certainly 'lighter' than having to fetch memory from remote NUMA
+ * nodes.
Do we ensure that, while defining soft-affinity for a vcpu, the NUMA
architecture is considered? If not, then this whole calculation can go
wrong and have a negative impact on performance.


The degree of affinity to a runq will give good results if the affinity to
pcpus has been chosen after due consideration ..

+ *
+ * So, evaluating pushing v from I to J would mean removing (from I) a
+ * perceived load of lv*(vi/Ci) and adding (to J) a perceived load of
+ * lv*(vj/Cj), which we (looking at things from the point of view of I,
+ * which is what balance_load() does) can call 

[Xen-devel] [PATCH 19/24] xen: credit2: soft-affinity awareness in load balancing

2016-08-17 Thread Dario Faggioli
We want soft-affinity to play a role in load
balancing, i.e., when deciding whether or not to
push this vcpu from here to there, and/or pull that
other vcpu from there to here.

A way of doing that is considering the following
(for both pushes and pulls, just with the roles of
current and new runqueues inverted):
 - how much affinity does the vcpu have with
   its current runq?
 - how much affinity will the vcpu have with
   its new runq?

We call this 'degree of soft-affinity' of a vcpu
to a runq, and define it as, informally speaking,
a quantity that is proportional to the fraction of
pcpus of a runq that a vcpu has soft-affinity with.

Then, we use this 'degree of soft-affinity' to
compute a value that is used as a modifier of the
baseline --purely load based-- results of the load
balancer. We apply it (potentially, with a scaling
factor), and use the modified result for the actual
load balancing decision.
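
(Purely as an illustration of the above --this is not the code in the
patch, and the name, sign and scaling below are placeholders-- the
shape of such a modifier application is something like:)

/*
 * Hypothetical sketch: the baseline, purely load-based delta is
 * adjusted by a soft-affinity derived term, possibly scaled, and the
 * adjusted value is what the usual (D' < D) comparison then uses.
 */
long apply_soft_aff_modifier(long base_delta, long soft_aff_term,
                             unsigned int scale_pct)
{
    /* sign and scaling are illustrative placeholders */
    return base_delta - (soft_aff_term * (long)scale_pct) / 100;
}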

This modifier based approach is chosen because it
integrates well into the existing load balancing
framework, it is modular and can easily accommodate
further extensions.

A note on performance and optimization: since we
(potentially) call consider() O(nr_vcpus^2)
times, we absolutely need it to be lean and
quick. Therefore, a bunch of things are
pre-calculated outside of it. This makes things
look less encapsulated and clean, but at the same
time, makes the code faster (and this is a critical
path, so we want it fast!).
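
(Schematically, and with stand-in types rather than the real Xen ones,
the shape of that hot path is something like this:)

#include <stddef.h>

struct vcpu_info { long perceived_load; };  /* precomputed per vcpu */
struct bal_state { long load_delta; };

/* Stand-in for the real consider(): it runs once per (push, pull)
 * pair, i.e. potentially O(nr_vcpus^2) times, so it must stay cheap. */
void consider(struct bal_state *st, struct vcpu_info *push,
              struct vcpu_info *pull)
{
    (void)st; (void)push; (void)pull;
}

void balance_scan(struct bal_state *st,
                  struct vcpu_info *push, size_t np,
                  struct vcpu_info *pull, size_t nq)
{
    /* Quantities that do not depend on the (push, pull) pair --
     * e.g. runq cpu counts and each vcpu's soft-affinity data --
     * are computed before these loops, never inside them. */
    for (size_t i = 0; i < np; i++)
        for (size_t j = 0; j < nq; j++)
            consider(st, &push[i], &pull[j]);
}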

Finally, this patch does not interfere with the
load balancing triggering logic. This is to say
that vcpus running outside of their soft-affinity
_don't_ trigger additional load balancing points.
Early numbers show that this is ok, but it may
well be the case that we will want to introduce
something like that at some point.

(Oh, and while there, just a couple of style fixes
are also done.)

Signed-off-by: Dario Faggioli 
---
Cc: George Dunlap 
Cc: Anshul Makkar 
---
 xen/common/sched_credit2.c |  359 
 1 file changed, 326 insertions(+), 33 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 2d7228a..3722f46 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1786,19 +1786,21 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
 return new_cpu;
 }
 
-/* Working state of the load-balancing algorithm */
+/* Working state of the load-balancing algorithm. */
 typedef struct {
-/* NB: Modified by consider() */
+/* NB: Modified by consider(). */
 s_time_t load_delta;
 struct csched2_vcpu * best_push_svc, *best_pull_svc;
-/* NB: Read by consider() */
+/* NB: Read by consider() (and the various consider_foo() functions). */
 struct csched2_runqueue_data *lrqd;
-struct csched2_runqueue_data *orqd;  
+struct csched2_runqueue_data *orqd;
+bool_t push_has_soft_aff, pull_has_soft_aff;
+s_time_t push_soft_aff_load, pull_soft_aff_load;
 } balance_state_t;
 
-static void consider(balance_state_t *st, 
- struct csched2_vcpu *push_svc,
- struct csched2_vcpu *pull_svc)
+static inline s_time_t consider_load(balance_state_t *st,
+ struct csched2_vcpu *push_svc,
+ struct csched2_vcpu *pull_svc)
 {
 s_time_t l_load, o_load, delta;
 
@@ -1821,11 +1823,166 @@ static void consider(balance_state_t *st,
 if ( delta < 0 )
 delta = -delta;
 
+return delta;
+}
+
+/*
+ * Load balancing and soft-affinity.
+ *
+ * When trying to figure out whether or not it's best to move a vcpu from
+ * one runqueue to another, we must keep soft-affinity in mind. Intuitively
+ * we would want to know the following:
+ *  - 'how much' affinity does the vcpu have with its current runq?
+ *  - 'how much' affinity will it have with its new runq?
+ *
+ * But we certainly need to be more precise about how much it is that 'how
+ * much'! Let's start with some definitions:
+ *
+ *  - let v be a vcpu, running in runq I, with soft-affinity to vi
+ *pcpus of runq I, and soft affinity with vj pcpus of runq J;
+ *  - let k be another vcpu, running in runq J, with soft-affinity to kj
+ *pcpus of runq J, and with ki pcpus of runq I;
+ *  - let runq I have Ci pcpus, and runq J Cj pcpus;
+ *  - let vcpu v have an average load of lv, and k an average load of lk;
+ *  - let runq I have an average load of Li, and J an average load of Lj.
+ *
+ * We also define the following:
+ *
+ *  - lvi = lv * (vi / Ci)  as the 'perceived load' of v, when running
+ *  in runq I;
+ *  - lvj = lv * (vj / Cj)  as the 'perceived load' of v, when running
+ *  in runq J;
+ *  - the same for k, mutatis mutandis.
+ *
+ * The idea is that vi/Ci (i.e., the ratio of the number of cpus of a runq that
+ * a vcpu has