Re: [PATCH] sched,numa: move processes with load difference

2014-05-14 Thread Peter Zijlstra
On Tue, May 13, 2014 at 07:55:50PM -0400, Rik van Riel wrote:
> Currently the numa balancing code refuses to move a task from a
> heavily loaded node to a much less heavily loaded node, if the
> difference in load between them is large enough.
> 
> If the source load is larger than the destination load after the
> swap, moving the task is fine. Chances are the load balancer would
> move tasks in the same direction, anyway.

So the intent of that code was that the swap (remember, the two tasks
need not have the same weight) wouldn't push us over the imbalance
threshold.

Now we want to test that threshold both ways, dst being small and src
being large, and vice versa.

However, if we've already crossed that imbalance threshold, whether
the swap crosses it isn't the right test.

In that case I think we want to limit the swap such that the imbalance
improves (i.e. gets smaller).
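
To illustrate the proposed rule in isolation, here is a minimal
userspace sketch (hypothetical helper and parameter names, with
open-coded swaps where the kernel would use its swap() macro):

static int swap_improves_balance(long orig_src, long orig_dst,
				 long new_src, long new_dst,
				 long imbalance_pct)
{
	long imb, orig_imb, tmp;

	/* Compare magnitudes only: make the src side the smaller load. */
	if (new_dst < new_src) {
		tmp = new_dst; new_dst = new_src; new_src = tmp;
	}

	imb = new_dst * 100 - new_src * imbalance_pct;
	if (imb < 0)
		return 1;	/* under the threshold: always OK */

	if (orig_dst < orig_src) {
		tmp = orig_dst; orig_dst = orig_src; orig_src = tmp;
	}

	orig_imb = orig_dst * 100 - orig_src * imbalance_pct;

	/* Already over the threshold: only OK if the gap shrinks. */
	return imb <= orig_imb;
}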

Something like the below perhaps (entirely untested).

Or am I missing something?

---
 kernel/sched/fair.c | 33 ++++++++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 28ccf502c63c..87f88568ecb3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1107,7 +1107,9 @@ static void task_numa_compare(struct task_numa_env *env,
struct rq *src_rq = cpu_rq(env->src_cpu);
struct rq *dst_rq = cpu_rq(env->dst_cpu);
struct task_struct *cur;
+   long orig_dst_load, orig_src_load;
long dst_load, src_load;
+   long imb, orig_imb;
long load;
long imp = (groupimp > 0) ? groupimp : taskimp;
 
@@ -1181,8 +1183,8 @@ static void task_numa_compare(struct task_numa_env *env,
 * In the overloaded case, try and keep the load balanced.
 */
 balance:
-   dst_load = env->dst_stats.load;
-   src_load = env->src_stats.load;
+   orig_dst_load = dst_load = env->dst_stats.load;
+   orig_src_load = src_load = env->src_stats.load;
 
/* XXX missing power terms */
load = task_h_load(env->p);
@@ -1195,12 +1197,33 @@ static void task_numa_compare(struct task_numa_env *env,
src_load += load;
}
 
-   /* make src_load the smaller */
+   /*
+* We want to compute the magnitude ('slope') of the imbalance
+* between src and dst; we're not interested in its direction.
+*
+* So make src_load the smaller.
+*/
if (dst_load < src_load)
swap(dst_load, src_load);
 
-   if (src_load * env->imbalance_pct < dst_load * 100)
-   goto unlock;
+   /*
+* Test if the slope is over or under the imb_pct
+*/
+   imb = dst_load * 100 - src_load * env->imbalance_pct;
+   if (imb >= 0) {
+   /*
+* If the slope is over the imb_pct, check the original state;
+* if we started out already being imbalanced, see if this swap
+* improves the situation by reducing the slope, even though
+* it's still over the threshold.
+*/
+   if (orig_dst_load < orig_src_load)
+   swap(orig_dst_load, orig_src_load);
+
+   orig_imb = orig_dst_load * 100 - orig_src_load * env->imbalance_pct;
+   if (imb > orig_imb)
+   goto unlock;
+   }
 
 assign:
task_numa_assign(env, cur, imp);

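Plugging made-up numbers through the sketch above (equivalently, the
patched test), with imbalance_pct = 125 and original loads src = 1000,
dst = 1400, i.e. orig_imb = 1400 * 100 - 1000 * 125 = 15000, already
over the threshold:

	swap_improves_balance(1000, 1400, 1100, 1300, 125);
	/* imb = 130000 - 137500 = -7500 < 0: balanced now, assign */

	swap_improves_balance(1000, 1400, 1050, 1350, 125);
	/* imb = 3750: still over, but 3750 <= 15000, improves, assign */

	swap_improves_balance(1000, 1400, 900, 1500, 125);
	/* imb = 37500 > 15000: worsens the imbalance, unlock */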



[PATCH] sched,numa: move processes with load difference

2014-05-13 Thread Rik van Riel
Currently the numa balancing code refuses to move a task from a
heavily loaded node to a much less heavily loaded node, if the
difference in load between them is large enough.

If the source load is larger than the destination load after the
swap, moving the task is fine. Chances are the load balancer would
move tasks in the same direction, anyway.

Signed-off-by: Rik van Riel 
---
 kernel/sched/fair.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6d9474c..98a018f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1224,10 +1224,6 @@ static void task_numa_compare(struct task_numa_env *env,
src_load += load;
}
 
-   /* make src_load the smaller */
-   if (dst_load < src_load)
-   swap(dst_load, src_load);
-
if (src_load * env->imbalance_pct < dst_load * 100)
goto unlock;
 
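For intuition, a quick userspace sanity check of the old test against
the patched one (hypothetical helpers, imbalance_pct hardcoded to 125;
the loads passed in are the values after the candidate move):

#include <assert.h>

static int old_refuses(long src, long dst)
{
	if (dst < src) {	/* the swap this patch removes */
		long tmp = dst; dst = src; src = tmp;
	}
	return src * 125 < dst * 100;
}

static int new_refuses(long src, long dst)
{
	return src * 125 < dst * 100;	/* direction now matters */
}

int main(void)
{
	/* The source stays heavier: the old test refused this move... */
	assert(old_refuses(1200, 800));
	/* ...while the patched test allows it. */
	assert(!new_refuses(1200, 800));
	/* A move that would leave dst much heavier is still refused. */
	assert(new_refuses(700, 1300));
	return 0;
}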
