Re: [ck] Re: [ANNOUNCE] Staircase Deadline cpu scheduler version 0.45

2007-04-23 Thread Con Kolivas
On Monday 23 April 2007 00:35, Con Kolivas wrote:
> On Monday 23 April 2007 00:22, Willy Tarreau wrote:
> > X is still somewhat jerky, even
> > at nice -19. I'm sure it happens when it's waiting in the other array. We
> > should definitely manage to get rid of this if we want to ensure low
> > latency.
>
> Yeah, that would be correct. It's clearly possible to keep the whole design
> philosophy and priority system of SD intact and do away with the arrays,
> making the queue a continuous stream instead of two arrays, but that
> requires some architectural changes. I've been concentrating on nailing all
> the remaining issues (and they kept cropping up, as you've seen *blush*).
> However... I haven't quite figured out how to do that architectural change
> just yet either, so let's just iron all the bugs out of this for now.

By the way, Ingo et al, this is yet again an open invitation to suggest ideas, 
or better yet, to provide code for this, now that the core of SD is finally 
looking to be doing everything as expected within its constraints. I'm low on 
cycles and would appreciate the help. I'd prefer to leave everything that's 
queued in -mm as is for the moment, rather than have someone take this off in 
another wild direction.

Thanks!

-- 
-ck


Re: [ck] Re: [ANNOUNCE] Staircase Deadline cpu scheduler version 0.45

2007-04-22 Thread Con Kolivas
On Monday 23 April 2007 00:27, Michael Gerdau wrote:
> > Anyway the more important part is... Can you test this patch please? Dump
> > all the other patches I sent you post 045. Michael, if you could test too
> > please?
>
> Have had it up and running for 40 minutes now, and my perl jobs show a
> constant cpu utilization of 100/50/50 in top most of the time. When the 100%
> job goes down to e.g. 70%, those 30% are immediately reclaimed by the other
> two, i.e. the total sum of all three stays within 2 percentage points of
> 200%.
>
> From here it seems as if your latest patch did what it was supposed to :-)

Excellent, thanks for testing. v0.46 with something close to this patch coming 
shortly.

> Best,
> Michael
>
> PS: While these numbercrunching jobs were running I started another
> kde session and had my children play supertux for 20 minutes. While
> the system was occasionally not as responsive as it is under light
> load, supertux remained very playable.

-- 
-ck


Re: [ck] Re: [ANNOUNCE] Staircase Deadline cpu scheduler version 0.45

2007-04-22 Thread Con Kolivas
On Sunday 22 April 2007 23:07, Willy Tarreau wrote:
> On Sun, Apr 22, 2007 at 10:18:32PM +1000, Con Kolivas wrote:
> > On Sunday 22 April 2007 21:42, Con Kolivas wrote:
> >
> > Willy I'm still investigating the idle time and fluctuating load as a
> > separate issue.
>
> OK.
>
> > Is it possible the multiple ocbench processes are naturally
> > synchronising and desynchronising and choosing to sleep and/or run at the
> > same time?
>
I don't think so. They're independent processes, and I insist on reducing
their X work in order to ensure they don't get perturbed by external
factors. Their work consists of looping for 250 ms and waiting 750 ms, then
displaying a new progress line.

Well, if they always wait 750ms and always do 250ms of work, they will never 
actually get their 250ms in one continuous stream; while doing their work they 
may also be waiting on a runqueue. What I mean is that scheduling could cause 
that synchronising and desynchronising unwittingly, by fluctuating the 
absolute time over which they get their 250ms. The sleep always takes 750ms, 
but the physical time over which they get their 250ms of work fluctuates, and 
that fluctuation produces the aliasing. If instead the code said "500ms has 
passed while I only did 250ms of work, so I should sleep for 250ms less", this 
aliasing would go away. Of course that can't be satisfied in general, since on 
a fully loaded machine each process would never sleep. I'm not arguing it's 
correct behaviour for the scheduler to cause this, mind you, nor am I saying 
it's wrong behaviour; I'm just trying to understand better how it happens and 
what (if anything) should be done about it. Overall their progress and cpu 
distribution appear identical, as you said. The difference is that the CFS 
design intrinsically manages this exact scenario with its sleep/run timing 
mechanism.
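
To make the aliasing concrete, here is a minimal user-space sketch of the two
loop styles. The constants and helper names are illustrative only, not
ocbench's actual code; it only assumes the spin-then-sleep structure described
above:

#include <stdio.h>
#include <time.h>

#define WORK_MS   250
#define PERIOD_MS 1000

/* wall-clock milliseconds */
static long long wall_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000LL + ts.tv_nsec / 1000000;
}

/* burn 'ms' of this thread's cpu time; under contention the
 * wall-clock time this takes stretches well beyond 'ms' */
static void burn_cpu_ms(int ms)
{
	struct timespec ts;
	long long end;

	clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
	end = ts.tv_sec * 1000LL + ts.tv_nsec / 1000000 + ms;
	do
		clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
	while (ts.tv_sec * 1000LL + ts.tv_nsec / 1000000 < end);
}

static void sleep_ms(long long ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000 };

	if (ms > 0)
		nanosleep(&ts, NULL);
}

int main(void)
{
	for (;;) {
		long long t0 = wall_ms();

		burn_cpu_ms(WORK_MS);	/* 250ms of cpu, maybe 500ms of wall time */

		/* Fixed sleep, as ocbench is described above: the period
		 * becomes (stretched work time) + 750ms, so each process
		 * drifts in phase against the others -> aliasing. */
		sleep_ms(PERIOD_MS - WORK_MS);

		/* The deadline-compensated alternative from the paragraph
		 * above would instead sleep only for what is left of the
		 * 1000ms period (dropping to zero on a fully loaded box):
		 *
		 *	sleep_ms(PERIOD_MS - (wall_ms() - t0));
		 */
		printf("period took %lld ms\n", wall_ms() - t0);
	}
	return 0;
}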

> > I can remove the idle time entirely by running ocbench at nice 19
> > which means they are all forced to run at basically the same time by the
> > scheduler.
>
> It may indicate some special handling of nice?

By running them at nice 19, the scheduler effectively just schedules them 
sequentially, and there is no aliasing.

-- 
-ck


Re: [ck] Re: [ANNOUNCE] Staircase Deadline cpu scheduler version 0.45

2007-04-22 Thread Willy Tarreau
On Sun, Apr 22, 2007 at 10:18:32PM +1000, Con Kolivas wrote:
> On Sunday 22 April 2007 21:42, Con Kolivas wrote:
> 
> Willy I'm still investigating the idle time and fluctuating load as a
> separate issue.

OK.

> Is it possible the multiple ocbench processes are naturally 
> synchronising and desynchronising and choosing to sleep and/or run at the 
> same time?

I don't think so. They're independent processes, and I insist on reducing
their X work in order to ensure they don't get perturbed by external
factors. Their work consists of looping for 250 ms and waiting 750 ms, then
displaying a new progress line.

> I can remove the idle time entirely by running ocbench at nice 19 
> which means they are all forced to run at basically the same time by the 
> scheduler.

It may indicate some special handling of nice?

> Anyway the more important part is... Can you test this patch please? Dump
> all the other patches I sent you post 045. Michael, if you could test too
> please?

OK, I will restart from a fresh 0.45 and try again.

Regards,
Willy



Re: [ck] Re: [ANNOUNCE] Staircase Deadline cpu scheduler version 0.45

2007-04-22 Thread Con Kolivas
On Monday 23 April 2007 00:22, Willy Tarreau wrote:
> On Sun, Apr 22, 2007 at 10:18:32PM +1000, Con Kolivas wrote:
> > On Sunday 22 April 2007 21:42, Con Kolivas wrote:
> >
> > Willy I'm still investigating the idle time and fluctuating load as a
> > separate issue. Is it possible the multiple ocbench processes are
> > naturally synchronising and desynchronising and choosing to sleep and/or
> > run at the same time? I can remove the idle time entirely by running
> > ocbench at nice 19 which means they are all forced to run at basically
> > the same time by the scheduler.
> >
> > Anyway the more important part is... Can you test this patch please? Dump
> > all the other patches I sent you post 045. Michael, if you could test too
> > please?
>
> OK, it's better now. All tasks run equally.

Excellent, thank you very much (again!)

> X is still somewhat jerky, even 
> at nice -19. I'm sure it happens when it's waiting in the other array. We
> should definitely manage to get rid of this if we want to ensure low
> latency.

Yeah, that would be correct. It's clearly possible to keep the whole design 
philosophy and priority system of SD intact and do away with the arrays, 
making the queue a continuous stream instead of two arrays, but that requires 
some architectural changes. I've been concentrating on nailing all the 
remaining issues (and they kept cropping up, as you've seen *blush*). 
However... I haven't quite figured out how to do that architectural change 
just yet either, so let's just iron all the bugs out of this for now.

> Just FYI, the idle is often close to zero and the load is often close to
> 30, even if still fluctuating:

> Hoping this helps !

I can say without a shadow of a doubt it has helped :) I'll respin the patch 
slightly differently, post it, and release it as v0.46.

> Willy

-- 
-ck


Re: [ck] Re: [ANNOUNCE] Staircase Deadline cpu scheduler version 0.45

2007-04-22 Thread Michael Gerdau
> Anyway the more important part is... Can you test this patch please? Dump
> all the other patches I sent you post 045. Michael, if you could test too
> please?

Have had it up and running for 40 minutes now, and my perl jobs show a
constant cpu utilization of 100/50/50 in top most of the time. When the 100%
job goes down to e.g. 70%, those 30% are immediately reclaimed by the other
two, i.e. the total sum of all three stays within 2 percentage points of 200%.

From here it seems as if your latest patch did what it was supposed to :-)

Best,
Michael

PS: While these numbercrunching jobs were running I started another
kde session and had my children play supertux for 20 minutes. While
the system was occasionally not as responsive as it is under light
load, supertux remained very playable.
-- 
 Technosis GmbH, Geschäftsführer: Michael Gerdau, Tobias Dittmar
 Sitz Hamburg; HRB 89145 Amtsgericht Hamburg
 Vote against SPAM - see http://www.politik-digital.de/spam/
 Michael Gerdau   email: [EMAIL PROTECTED]
 GPG-keys available on request or at public keyserver




Re: [ck] Re: [ANNOUNCE] Staircase Deadline cpu scheduler version 0.45

2007-04-22 Thread Willy Tarreau
On Sun, Apr 22, 2007 at 10:18:32PM +1000, Con Kolivas wrote:
> On Sunday 22 April 2007 21:42, Con Kolivas wrote:
> 
> Willy I'm still investigating the idle time and fluctuating load as a
> separate issue. Is it possible the multiple ocbench processes are naturally
> synchronising and desynchronising and choosing to sleep and/or run at the 
> same time? I can remove the idle time entirely by running ocbench at nice 19 
> which means they are all forced to run at basically the same time by the 
> scheduler.
> 
> Anyway the more important part is... Can you test this patch please? Dump
> all the other patches I sent you post 045. Michael, if you could test too
> please?

OK, it's better now. All tasks run equally. X is still somewhat jerky, even
at nice -19. I'm sure it happens when it's waiting in the other array. We
should definitely manage to get rid of this if we want to ensure low latency.

Just FYI, the idle is often close to zero and the load is often close to 30,
even if still fluctuating:

   procs                      memory      swap          io     system         cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo    in    cs  us sy id
22  0  0      0 922876   6472  57848    0    0     0     0   164  3275  34 58  8
14  0  0      0 922648   6472  57864    0    0     0     0     8   383  37 63  0
24  0  0      0 922580   6472  57864    0    0     0   128    32   219  34 66  0
13  0  1      0 922524   6472  57864    0    0     0     0     0   393  35 64  0
31  0  0      0 922524   6472  57864    0    0     0     0     3   338  37 63  0
56  0  0      0 922556   6472  57864    0    0     0     0     0   290  35 65  0
57  0  1      0 922556   6472  57864    0    0     0     0     1   288  33 55 11
45  0  0      0 922556   6472  57864    0    0     0     0     1   255  27 52 21
38  0  0      0 922564   6472  57864    0    0     0     0     0   161  24 49 27
 0  0  0      0 922564   6472  57864    0    0     0     0     1   142  23 40 38
 0  0  0      0 922564   6472  57864    0    0     0     0     2   182  29 55 16
22  0  0      0 922564   6472  57864    0    0     0     0     1   253  28 48 24
26  0  0      0 922564   6472  57864    0    0     0     0     1   212  31 60  9
27  0  0      0 922564   6472  57864    0    0     0     0     1   314  31 70  0
44  0  0      0 922564   6472  57864    0    0     0     0     2   282  32 62  6
54  0  1      0 922564   6472  57864    0    0     0     0    26   213  32 67  1
42  0  0      0 922564   6472  57864    0    0     0     0   142   278  34 61  4
35  0  0      0 922564   6472  57864    0    0     0     0    58   226  39 61  0
 6  0  0      0 922564   6472  57864    0    0     0     0    79   228  35 65  0
 5  0  0      0 922564   6472  57864    0    0     0     0    98   225  36 61  3
35  0  1      0 922564   6472  57864    0    0     0     0    71   205  22 41 36

Hoping this helps !
Willy



Re: [ck] Re: [ANNOUNCE] Staircase Deadline cpu scheduler version 0.45

2007-04-22 Thread Con Kolivas
On Sunday 22 April 2007 21:42, Con Kolivas wrote:

Willy I'm still investigating the idle time and fluctuating load as a separate 
issue. Is it possible the multiple ocbench processes are naturally 
synchronising and desynchronising and choosing to sleep and/or run at the 
same time? I can remove the idle time entirely by running ocbench at nice 19 
which means they are all forced to run at basically the same time by the 
scheduler.

Anyway the more important part is... Can you test this patch please? Dump
all the other patches I sent you post 045. Michael, if you could test too
please?

Thanks!

---

It appears load_weight still wasn't being set in enough places, and changing
where rr_interval was set a few iterations of SD ago might have revealed that
bug. Ensure load_weight is set whenever p->quota is set, and dramatically
simplify the load_weight calculation.

Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>

---
 kernel/sched.c |   35 ++++++++++++++++-------------------
 1 file changed, 16 insertions(+), 19 deletions(-)

Index: linux-2.6.21-rc7-sd/kernel/sched.c
===================================================================
--- linux-2.6.21-rc7-sd.orig/kernel/sched.c	2007-04-22 21:37:25.000000000 +1000
+++ linux-2.6.21-rc7-sd/kernel/sched.c	2007-04-22 22:01:48.000000000 +1000
@@ -102,8 +102,6 @@ unsigned long long __attribute__((weak))
  */
 int rr_interval __read_mostly = 8;
 
-#define DEF_TIMESLICE  (rr_interval * 20)
-
 /*
  * This contains a bitmap for each dynamic priority level with empty slots
  * for the valid priorities each different nice level can have. It allows
@@ -886,16 +884,10 @@ static int task_timeslice(struct task_st
 }
 
 /*
- * Assume: static_prio_timeslice(NICE_TO_PRIO(0)) == DEF_TIMESLICE
- * If static_prio_timeslice() is ever changed to break this assumption then
- * this code will need modification. Scaled as multiples of milliseconds.
- */
-#define TIME_SLICE_NICE_ZERO DEF_TIMESLICE
-#define LOAD_WEIGHT(lp) \
-   (((lp) * SCHED_LOAD_SCALE) / TIME_SLICE_NICE_ZERO)
-#define TASK_LOAD_WEIGHT(p)        LOAD_WEIGHT(task_timeslice(p))
-#define RTPRIO_TO_LOAD_WEIGHT(rp)  \
-       (LOAD_WEIGHT((rr_interval + 20 + (rp))))
+ * The load weight is basically the task_timeslice in ms. Realtime tasks are
+ * special cased to be proportionately larger by their rt_priority.
+ */
+#define RTPRIO_TO_LOAD_WEIGHT(rp)  ((rr_interval + 20 + (rp)))
 
 static void set_load_weight(struct task_struct *p)
 {
@@ -912,7 +904,7 @@ static void set_load_weight(struct task_
 #endif
p->load_weight = RTPRIO_TO_LOAD_WEIGHT(p->rt_priority);
} else
-   p->load_weight = TASK_LOAD_WEIGHT(p);
+   p->load_weight = task_timeslice(p);
 }
 
 static inline void
@@ -995,7 +987,7 @@ static int effective_prio(struct task_st
  * nice -20 = 10 * rr_interval. nice 1-19 = rr_interval / 2.
  * Value returned is in microseconds.
  */
-static unsigned int rr_quota(struct task_struct *p)
+static inline unsigned int rr_quota(struct task_struct *p)
 {
int nice = TASK_NICE(p), rr = rr_interval;
 
@@ -1009,6 +1001,13 @@ static unsigned int rr_quota(struct task
return MS_TO_US(rr);
 }
 
+/* Every time we set the quota we need to set the load weight */
+static void set_quota(struct task_struct *p)
+{
+   p->quota = rr_quota(p);
+   set_load_weight(p);
+}
+
 /*
  * activate_task - move a task to the runqueue and do priority recalculation
  */
@@ -1036,7 +1035,7 @@ static void activate_task(struct task_st
 (now - p->timestamp) >> 20);
}
 
-   p->quota = rr_quota(p);
+   set_quota(p);
p->prio = effective_prio(p);
p->timestamp = now;
__activate_task(p, rq);
@@ -3885,8 +3884,7 @@ void set_user_nice(struct task_struct *p
p->static_prio = NICE_TO_PRIO(nice);
old_prio = p->prio;
p->prio = effective_prio(p);
-   p->quota = rr_quota(p);
-   set_load_weight(p);
+   set_quota(p);
delta = p->prio - old_prio;
 
if (queued) {
@@ -4020,8 +4018,7 @@ static void __setscheduler(struct task_s
p->normal_prio = normal_prio(p);
/* we are holding p->pi_lock already */
p->prio = rt_mutex_getprio(p);
-   p->quota = rr_quota(p);
-   set_load_weight(p);
+   set_quota(p);
 }
 
 /**
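
For reference, the nice-to-quota mapping rr_quota() implements (per the
comment in the hunk above: nice -20 = 10 * rr_interval, nice 1-19 =
rr_interval / 2, value returned in microseconds) can be played with in user
space. This is a sketch of the arithmetic, not the kernel code: the quadratic
curve used for nice below -6 is an assumption chosen to hit the documented
nice -20 endpoint (20 * 20 / 40 = 10):

#include <stdio.h>

#define MS_TO_US(x)	((x) * 1000)

static int rr_interval = 8;	/* ms, as set in the patch above */

static unsigned int quota_us(int nice)
{
	int rr = rr_interval;

	if (nice < -6) {
		rr *= nice * nice;	/* nice -20 -> 400 / 40 = 10 * rr_interval */
		rr /= 40;
	} else if (nice > 0)
		rr = rr / 2 ? rr / 2 : 1;	/* nice 1..19 -> rr / 2, min 1ms */

	return MS_TO_US(rr);
}

int main(void)
{
	int nice;

	for (nice = -20; nice <= 19; nice++)
		printf("nice %3d -> quota %u us\n", nice, quota_us(nice));
	return 0;
}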

-- 
-ck