Re: [ck] Re: [ANNOUNCE] Staircase Deadline cpu scheduler version 0.45
On Monday 23 April 2007 00:35, Con Kolivas wrote:
> On Monday 23 April 2007 00:22, Willy Tarreau wrote:
> > X is still somewhat jerky, even at nice -19. I'm sure it happens when
> > it's waiting in the other array. We should definitely manage to get rid
> > of this if we want to ensure low latency.
>
> Yeah that would be correct. It's clearly possible to keep the whole design
> philosophy and priority system intact with SD and do away with the arrays
> if it becomes a continuous stream instead of two arrays, but that requires
> some architectural changes. I've been concentrating on nailing all the
> remaining issues (and they kept cropping up as you've seen *blush*).
> However... I haven't quite figured out how to do that architectural change
> just yet either, so let's just iron all the bugs out of this now.

By the way, Ingo et al, this is yet again an open invitation to suggest
ideas, or better yet, provide code to do this, now that the core of SD is
finally looking to be doing everything as expected within its constraints.
I'm low on cycles and would appreciate the help. I'd prefer to leave
everything that's queued in -mm as is for the moment, before someone wants
to take this in another wild direction.

Thanks!

-- 
-ck
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [ck] Re: [ANNOUNCE] Staircase Deadline cpu scheduler version 0.45
On Monday 23 April 2007 00:27, Michael Gerdau wrote:
> > Anyway the more important part is... Can you test this patch please?
> > Dump all the other patches I sent you post 045. Michael, if you could
> > test too please?
>
> Have it up and running for 40 minutes now, and my perl jobs show a
> constant cpu utilization of 100/50/50 in top most of the time. When the
> 100% job goes down to e.g. 70%, those 30% are immediately reclaimed by
> the other two, i.e. the total sum of all three stays within 2 percentage
> points of 200%.
>
> From here it seems as if your latest patch did what it was supposed to :-)

Excellent, thanks for testing. v0.46 with something close to this patch
coming shortly.

> Best,
> Michael
>
> PS: While these numbercrunching jobs were running I started another
> kde session and had my children play supertux for 20 minutes. While
> the system occasionally was not as responsive as it is when there
> is little load, supertux remained very playable.

-- 
-ck
Re: [ck] Re: [ANNOUNCE] Staircase Deadline cpu scheduler version 0.45
On Sunday 22 April 2007 23:07, Willy Tarreau wrote:
> On Sun, Apr 22, 2007 at 10:18:32PM +1000, Con Kolivas wrote:
> > On Sunday 22 April 2007 21:42, Con Kolivas wrote:
> >
> > Willy I'm still investigating the idle time and fluctuating load as a
> > separate issue.
>
> OK.
>
> > Is it possible the multiple ocbench processes are naturally
> > synchronising and desynchronising and choosing to sleep and/or run at
> > the same time?
>
> I don't think so. They're independent processes, and I insist on reducing
> their X work in order to ensure they don't get perturbed by external
> factors. Their work consists of looping for 250 ms and waiting 750 ms,
> then displaying a new progress line.

Well, if they always wait 750 ms and they always do 250 ms of work, they
will never actually get their 250 ms in a continuous stream, and may be
waiting on a runqueue while "working". What I mean is that scheduling could
cause that synchronising and desynchronising unwittingly, by fluctuating
the absolute time over which they get their 250 ms. The sleep always takes
750 ms, but the physical time over which they get their 250 ms of cpu
fluctuates due to scheduling aliasing. If instead the code said "500 ms has
passed while I only did 250 ms of work, so I should sleep for 250 ms less",
this aliasing would go away. Of course this is impossible, since on a fully
loaded machine it would mean each process should never sleep.

I'm not arguing that it is correct behaviour for the scheduler to cause
this, mind you, nor am I saying it's wrong behaviour. I'm just trying to
understand better how it happens and what (if anything) should be done
about it. Overall their progress and cpu distribution appear identical, as
you said. The difference is that the CFS design intrinsically manages this
exact scenario with its sleep/run timing mechanism.

> > I can remove the idle time entirely by running ocbench at nice 19,
> > which means they are all forced to run at basically the same time by
> > the scheduler.
>
> It may indicate some special handling of nice ?

By running them at nice 19 the scheduler effectively just schedules them
sequentially, and there is no aliasing.

-- 
-ck
Re: [ck] Re: [ANNOUNCE] Staircase Deadline cpu scheduler version 0.45
On Sun, Apr 22, 2007 at 10:18:32PM +1000, Con Kolivas wrote:
> On Sunday 22 April 2007 21:42, Con Kolivas wrote:
>
> Willy I'm still investigating the idle time and fluctuating load as a
> separate issue.

OK.

> Is it possible the multiple ocbench processes are naturally
> synchronising and desynchronising and choosing to sleep and/or run at the
> same time?

I don't think so. They're independent processes, and I insist on reducing
their X work in order to ensure they don't get perturbed by external
factors. Their work consists of looping for 250 ms and waiting 750 ms,
then displaying a new progress line.

> I can remove the idle time entirely by running ocbench at nice 19
> which means they are all forced to run at basically the same time by the
> scheduler.

It may indicate some special handling of nice ?

> Anyway the more important part is... Can you test this patch please? Dump
> all the other patches I sent you post 045. Michael, if you could test too
> please?

OK, I will restart from fresh 0.45 and try again.

Regards,
Willy
Re: [ck] Re: [ANNOUNCE] Staircase Deadline cpu scheduler version 0.45
On Monday 23 April 2007 00:22, Willy Tarreau wrote:
> On Sun, Apr 22, 2007 at 10:18:32PM +1000, Con Kolivas wrote:
> > On Sunday 22 April 2007 21:42, Con Kolivas wrote:
> >
> > Willy I'm still investigating the idle time and fluctuating load as a
> > separate issue. Is it possible the multiple ocbench processes are
> > naturally synchronising and desynchronising and choosing to sleep and/or
> > run at the same time? I can remove the idle time entirely by running
> > ocbench at nice 19, which means they are all forced to run at basically
> > the same time by the scheduler.
> >
> > Anyway the more important part is... Can you test this patch please?
> > Dump all the other patches I sent you post 045. Michael, if you could
> > test too please?
>
> OK, it's better now. All tasks run equally.

Excellent, thank you very much (again!)

> X is still somewhat jerky, even at nice -19. I'm sure it happens when
> it's waiting in the other array. We should definitely manage to get rid
> of this if we want to ensure low latency.

Yeah that would be correct. It's clearly possible to keep the whole design
philosophy and priority system intact with SD and do away with the arrays
if it becomes a continuous stream instead of two arrays, but that requires
some architectural changes. I've been concentrating on nailing all the
remaining issues (and they kept cropping up as you've seen *blush*).
However... I haven't quite figured out how to do that architectural change
just yet either, so let's just iron all the bugs out of this now.

> Just FYI, the idle is often close to zero and the load is often close to
> 30, even if still fluctuating.
>
> Hoping this helps !

I can say without a shadow of a doubt it has helped :) I'll respin the
patch slightly differently, post it, and release it as v0.46.

> Willy

-- 
-ck
Re: [ck] Re: [ANNOUNCE] Staircase Deadline cpu scheduler version 0.45
> Anyway the more important part is... Can you test this patch please? Dump
> all the other patches I sent you post 045. Michael, if you could test too
> please?

Have it up and running for 40 minutes now, and my perl jobs show a constant
cpu utilization of 100/50/50 in top most of the time. When the 100% job
goes down to e.g. 70%, those 30% are immediately reclaimed by the other
two, i.e. the total sum of all three stays within 2 percentage points of
200%.

From here it seems as if your latest patch did what it was supposed to :-)

Best,
Michael

PS: While these numbercrunching jobs were running I started another
kde session and had my children play supertux for 20 minutes. While
the system occasionally was not as responsive as it is when there
is little load, supertux remained very playable.

-- 
Technosis GmbH, Geschäftsführer: Michael Gerdau, Tobias Dittmar
Sitz Hamburg; HRB 89145 Amtsgericht Hamburg
Vote against SPAM - see http://www.politik-digital.de/spam/
Michael Gerdau        email: [EMAIL PROTECTED]
GPG-keys available on request or at public keyserver
Re: [ck] Re: [ANNOUNCE] Staircase Deadline cpu scheduler version 0.45
On Sun, Apr 22, 2007 at 10:18:32PM +1000, Con Kolivas wrote:
> On Sunday 22 April 2007 21:42, Con Kolivas wrote:
>
> Willy I'm still investigating the idle time and fluctuating load as a
> separate issue. Is it possible the multiple ocbench processes are
> naturally synchronising and desynchronising and choosing to sleep and/or
> run at the same time? I can remove the idle time entirely by running
> ocbench at nice 19, which means they are all forced to run at basically
> the same time by the scheduler.
>
> Anyway the more important part is... Can you test this patch please? Dump
> all the other patches I sent you post 045. Michael, if you could test too
> please?

OK, it's better now. All tasks run equally. X is still somewhat jerky, even
at nice -19. I'm sure it happens when it's waiting in the other array. We
should definitely manage to get rid of this if we want to ensure low
latency.

Just FYI, the idle is often close to zero and the load is often close to
30, even if still fluctuating :

procs                  memory      swap          io     system       cpu
 r  b  w swpd   free  buff  cache  si  so  bi   bo   in   cs  us  sy  id
22  0  0    0 922876  6472  57848   0   0   0    0  164 3275  34  58   8
14  0  0    0 922648  6472  57864   0   0   0    0    8  383  37  63   0
24  0  0    0 922580  6472  57864   0   0   0  128   32  219  34  66   0
13  0  1    0 922524  6472  57864   0   0   0    0    0  393  35  64   0
31  0  0    0 922524  6472  57864   0   0   0    0    3  338  37  63   0
56  0  0    0 922556  6472  57864   0   0   0    0    0  290  35  65   0
57  0  1    0 922556  6472  57864   0   0   0    0    1  288  33  55  11
45  0  0    0 922556  6472  57864   0   0   0    0    1  255  27  52  21
38  0  0    0 922564  6472  57864   0   0   0    0    0  161  24  49  27
 0  0  0    0 922564  6472  57864   0   0   0    0    1  142  23  40  38
 0  0  0    0 922564  6472  57864   0   0   0    0    2  182  29  55  16
22  0  0    0 922564  6472  57864   0   0   0    0    1  253  28  48  24
26  0  0    0 922564  6472  57864   0   0   0    0    1  212  31  60   9
27  0  0    0 922564  6472  57864   0   0   0    0    1  314  31  70   0
44  0  0    0 922564  6472  57864   0   0   0    0    2  282  32  62   6
54  0  1    0 922564  6472  57864   0   0   0    0   26  213  32  67   1
42  0  0    0 922564  6472  57864   0   0   0    0  142  278  34  61   4
35  0  0    0 922564  6472  57864   0   0   0    0   58  226  39  61   0
 6  0  0    0 922564  6472  57864   0   0   0    0   79  228  35  65   0
 5  0  0    0 922564  6472  57864   0   0   0    0   98  225  36  61   3
35  0  1    0 922564  6472  57864   0   0   0    0   71  205  22  41  36

Hoping this helps !

Willy
Re: [ck] Re: [ANNOUNCE] Staircase Deadline cpu scheduler version 0.45
On Sunday 22 April 2007 21:42, Con Kolivas wrote:

Willy I'm still investigating the idle time and fluctuating load as a
separate issue. Is it possible the multiple ocbench processes are naturally
synchronising and desynchronising and choosing to sleep and/or run at the
same time? I can remove the idle time entirely by running ocbench at nice
19, which means they are all forced to run at basically the same time by
the scheduler.

Anyway the more important part is... Can you test this patch please? Dump
all the other patches I sent you post 045. Michael, if you could test too
please?

Thanks!

---
It appears load_weight still wasn't being set in enough places, and
changing where rr_interval was being set a few iterations of SD ago might
have revealed that bug. Ensure load_weight is set whenever p->quota is set,
and dramatically simplify the load_weight.

Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>

---
 kernel/sched.c |   35 ++++++++++++++++-------------------
 1 file changed, 16 insertions(+), 19 deletions(-)

Index: linux-2.6.21-rc7-sd/kernel/sched.c
===================================================================
--- linux-2.6.21-rc7-sd.orig/kernel/sched.c	2007-04-22 21:37:25.000000000 +1000
+++ linux-2.6.21-rc7-sd/kernel/sched.c	2007-04-22 22:01:48.000000000 +1000
@@ -102,8 +102,6 @@ unsigned long long __attribute__((weak))
  */
 int rr_interval __read_mostly = 8;
 
-#define DEF_TIMESLICE (rr_interval * 20)
-
 /*
  * This contains a bitmap for each dynamic priority level with empty slots
  * for the valid priorities each different nice level can have. It allows
@@ -886,16 +884,10 @@ static int task_timeslice(struct task_st
 }
 
 /*
- * Assume: static_prio_timeslice(NICE_TO_PRIO(0)) == DEF_TIMESLICE
- * If static_prio_timeslice() is ever changed to break this assumption then
- * this code will need modification. Scaled as multiples of milliseconds.
- */
-#define TIME_SLICE_NICE_ZERO	DEF_TIMESLICE
-#define LOAD_WEIGHT(lp) \
-	(((lp) * SCHED_LOAD_SCALE) / TIME_SLICE_NICE_ZERO)
-#define TASK_LOAD_WEIGHT(p)	LOAD_WEIGHT(task_timeslice(p))
-#define RTPRIO_TO_LOAD_WEIGHT(rp) \
-	(LOAD_WEIGHT((rr_interval + 20 + (rp))))
+ * The load weight is basically the task_timeslice in ms. Realtime tasks are
+ * special cased to be proportionately larger by their rt_priority.
+ */
+#define RTPRIO_TO_LOAD_WEIGHT(rp)	((rr_interval + 20 + (rp)))
 
 static void set_load_weight(struct task_struct *p)
 {
@@ -912,7 +904,7 @@ static void set_load_weight(struct task_
 #endif
 		p->load_weight = RTPRIO_TO_LOAD_WEIGHT(p->rt_priority);
 	} else
-		p->load_weight = TASK_LOAD_WEIGHT(p);
+		p->load_weight = task_timeslice(p);
 }
 
 static inline void
@@ -995,7 +987,7 @@ static int effective_prio(struct task_st
  * nice -20 = 10 * rr_interval. nice 1-19 = rr_interval / 2.
  * Value returned is in microseconds.
  */
-static unsigned int rr_quota(struct task_struct *p)
+static inline unsigned int rr_quota(struct task_struct *p)
 {
 	int nice = TASK_NICE(p), rr = rr_interval;
 
@@ -1009,6 +1001,13 @@ static unsigned int rr_quota(struct task
 	return MS_TO_US(rr);
 }
 
+/* Every time we set the quota we need to set the load weight */
+static void set_quota(struct task_struct *p)
+{
+	p->quota = rr_quota(p);
+	set_load_weight(p);
+}
+
 /*
  * activate_task - move a task to the runqueue and do priority recalculation
  */
@@ -1036,7 +1035,7 @@ static void activate_task(struct task_st
 			     (now - p->timestamp) >> 20);
 	}
 
-	p->quota = rr_quota(p);
+	set_quota(p);
 	p->prio = effective_prio(p);
 	p->timestamp = now;
 	__activate_task(p, rq);
@@ -3885,8 +3884,7 @@ void set_user_nice(struct task_struct *p
 	p->static_prio = NICE_TO_PRIO(nice);
 	old_prio = p->prio;
 	p->prio = effective_prio(p);
-	p->quota = rr_quota(p);
-	set_load_weight(p);
+	set_quota(p);
 
 	delta = p->prio - old_prio;
 	if (queued) {
@@ -4020,8 +4018,7 @@ static void __setscheduler(struct task_s
 	p->normal_prio = normal_prio(p);
 	/* we are holding p->pi_lock already */
 	p->prio = rt_mutex_getprio(p);
-	p->quota = rr_quota(p);
-	set_load_weight(p);
+	set_quota(p);
 }
 
 /**

-- 
-ck