Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Ingo Molnar wrote:
> * Davide Libenzi <[EMAIL PROTECTED]> wrote:
>
> > The same user nicing two different multi-threaded processes would
> > expect a predictable CPU distribution too. [...]
>
> i disagree that the user 'would expect' this. Some users might. Others
> would say: 'my 10-thread rendering engine is more important than a
> 1-thread job because it's using 10 threads for a reason'. And the CFS
> feedback so far strengthens this point: the default behavior of
> treating the thread as a single scheduling (and CPU time accounting)
> unit works pretty well on the desktop.

If by desktop you mean "one and only one interactive user," that's true.
On a shared machine it's very hard to preserve any semblance of fairness
when one user gets far more CPU than another, based not on the value of
what they're doing but on the tools they use to do it.

> think about it in another, 'kernel policy' way as well: we'd like to
> _encourage_ more parallel user applications. Hurting them by
> accounting all threads together sends the exact opposite message.

Why is that? There are lots of things which are intrinsically single
threaded. How are we hurting multi-threaded applications by refusing to
give them more CPU than an application running on behalf of another
user? By accounting all threads together we encourage writing an
application in the most logical way. Threads are a solution, not a goal
in themselves.

> > [...] Doing that efficiently (the old per-cpu run-queue is pretty
> > nice from many POVs) is the real challenge.
>
> yeah.
>
> 	Ingo

-- 
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
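The disagreement above is ultimately about which unit receives one share
of the machine. A toy calculation makes the stakes concrete; this is
illustrative user-space C, not kernel code, and the 10-vs-1 job mix is
taken from Ingo's example:

#include <stdio.h>

int main(void)
{
	int threads_a = 10;	/* the 10-thread rendering engine */
	int threads_b = 1;	/* a 1-thread job at the same nice level */

	/* Per-thread accounting: every runnable thread is one unit,
	 * so the threaded job takes 10 of 11 shares. */
	printf("per-thread:  A=%5.1f%%  B=%5.1f%%\n",
	       100.0 * threads_a / (threads_a + threads_b),
	       100.0 * threads_b / (threads_a + threads_b));

	/* Per-process accounting: each thread group is one unit, so
	 * both jobs get 50%, and A's threads split its half. */
	printf("per-process: A=%5.1f%% (%.1f%% per thread)  B=%5.1f%%\n",
	       50.0, 50.0 / threads_a, 50.0);
	return 0;
}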
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Linus Torvalds wrote:
> On Wed, 18 Apr 2007, Matt Mackall wrote:
>
> > Why is X special? Because it does work on behalf of other processes?
> > Lots of things do this. Perhaps a scheduler should focus entirely on
> > the implicit and directed wakeup matrix and optimizing that
> > instead[1].
>
> I 100% agree - the perfect scheduler would indeed take into account
> where the wakeups come from, and try to "weigh" processes that help
> other processes make progress more. That would naturally give server
> processes more CPU power, because they help others.
>
> I don't believe for a second that "fairness" means "give everybody the
> same amount of CPU". That's a totally illogical measure of fairness.
> All processes are _not_ created equal.
>
> That said, even trying to do "fairness by effective user ID" would
> probably already do a lot. In a desktop environment, X would get as
> much CPU time as the user processes, simply because it's in a
> different protection domain (and that's really what "effective user
> ID" means: it's not about "users", it's really about "protection
> domains").
>
> And "fairness by euid" is probably a hell of a lot easier to do than
> trying to figure out the wakeup matrix.

You probably want to consider the controlling terminal as well... do you
want to have people starting 'at' jobs competing on equal footing with
people typing at a terminal? I'm not offering an answer, just raising
the question.

And for some database applications, everyone in a group may connect with
the same login-id, then do sub-authorization within the database
application. euid may be an issue there as well.

-- 
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
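"Fairness by euid" can be sketched in a few lines: bucket the runnable
tasks by effective UID, give each bucket an equal slice, then split
equally within the bucket. The task list below is hypothetical and this
is plain user-space arithmetic, not scheduler code:

#include <stdio.h>

struct task { const char *comm; unsigned euid; };

int main(void)
{
	/* Hypothetical runnable set: X in its own protection domain,
	 * three tasks belonging to one ordinary user. */
	struct task tasks[] = {
		{ "Xorg", 0    },
		{ "make", 1000 },
		{ "cc1",  1000 },
		{ "gzip", 1000 },
	};
	int n = sizeof(tasks) / sizeof(tasks[0]);

	for (int i = 0; i < n; i++) {
		int domains = 0, peers = 0;

		/* count distinct euids, and tasks sharing this euid */
		for (int j = 0; j < n; j++) {
			int seen = 0;
			for (int k = 0; k < j; k++)
				if (tasks[k].euid == tasks[j].euid)
					seen = 1;
			if (!seen)
				domains++;
			if (tasks[j].euid == tasks[i].euid)
				peers++;
		}
		/* equal share per domain, split among its members */
		printf("%-5s euid=%-4u share=%5.1f%%\n", tasks[i].comm,
		       tasks[i].euid, 100.0 / domains / peers);
	}
	return 0;
}

With this mix, Xorg gets 50% while make, cc1 and gzip get about 16.7%
each, which is exactly the "X gets as much as the user processes"
behavior Linus describes.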
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Matt Mackall wrote:
> On Wed, Apr 18, 2007 at 08:37:11AM +0200, Nick Piggin wrote:
>
> > > [2] It's trivial to construct two or more perfectly reasonable
> > > and desirable definitions of fairness that are mutually
> > > incompatible.
> >
> > Probably not if you use common sense, and in the context of a
> > replacement for the 2.6 scheduler.
>
> Ok, trivial example. You cannot allocate equal CPU time to
> processes/tasks and simultaneously allocate equal time to thread
> groups. Is it common sense that a heavily-threaded app should be able
> to get hugely more CPU than a well-written app? No. I don't want
> Joe's stupid Java app to make my compile crawl.
>
> On the other hand, if my heavily threaded app is, say, a voicemail
> server serving 30 customers, I probably want it to get 30x the CPU of
> my gzip job.

Matt, you tickled a thought... on one hand we have a single user running
a threaded application, and it ideally should get the same total CPU as
a user running a single-thread process. On the other hand we have a
threaded application, call it sendmail, nnrpd, httpd, bind, whatever. In
that case each thread is really providing service for an independent
user, and should get an appropriate share of the CPU.

Perhaps the solution is to add a means of identifying server processes:
by capability, by membership in a "server" group, or by having the
initiating process set some flag at exec() time. That doesn't
necessarily solve the problem, but it may provide enough information to
make it soluble.

-- 
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Hi Björn,

On Sat, Apr 21, 2007 at 01:29:41PM +0200, Björn Steinbrink wrote:
> Hi,
>
> On 2007.04.21 13:07:48 +0200, Willy Tarreau wrote:
> > > another thing i noticed: when using a -y larger than 1, then the
> > > window title (at least on Metacity) overlaps and thus the ocbench
> > > tasks have different X overhead and get scheduled a bit
> > > asymmetrically as well. Is there any way to start them up
> > > title-less perhaps?
> >
> > It has annoyed me a bit too, but I'm no X developer at all, so I
> > don't know at all if it's possible nor how to do this. I know that
> > my window manager even adds title bars to xeyes, so I'm not sure we
> > can do this.
> >
> > Right now, I've added a "-B <border size>" argument so that you can
> > skip the size of your title bar. It's dirty but it's not my main
> > job :-)
>
> Here's a small patch that makes the windows unmanaged, which also
> causes ocbench to start up quite a bit faster on my box with larger
> numbers of windows, so it probably avoids some window manager
> overhead, which is a nice side-effect.

Excellent! I've just merged it, but made it conditional on a "-u"
argument so that we can keep the previous behaviour (moving the windows
is useful, especially when there are few of them).

So the new version 0.5 is available here:

    http://linux.1wt.eu/sched/

I believe it's the last one for today as I'm late on some work.

Thanks!
Willy
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Hi,

On 2007.04.21 13:07:48 +0200, Willy Tarreau wrote:
> > another thing i noticed: when using a -y larger than 1, then the
> > window title (at least on Metacity) overlaps and thus the ocbench
> > tasks have different X overhead and get scheduled a bit
> > asymmetrically as well. Is there any way to start them up title-less
> > perhaps?
>
> It has annoyed me a bit too, but I'm no X developer at all, so I don't
> know at all if it's possible nor how to do this. I know that my window
> manager even adds title bars to xeyes, so I'm not sure we can do this.
>
> Right now, I've added a "-B <border size>" argument so that you can
> skip the size of your title bar. It's dirty but it's not my main
> job :-)

Here's a small patch that makes the windows unmanaged, which also causes
ocbench to start up quite a bit faster on my box with larger numbers of
windows, so it probably avoids some window manager overhead, which is a
nice side-effect.

Björn

--
diff -u ocbench-0.4/ocbench.c ocbench-0.4.1/ocbench.c
--- ocbench-0.4/ocbench.c	2007-04-21 13:05:55.000000000 +0200
+++ ocbench-0.4.1/ocbench.c	2007-04-21 13:24:01.000000000 +0200
@@ -213,6 +213,7 @@ int main(int argc, char *argv[])
 {
 	Window root;
 	XGCValues gc_setup;
+	XSetWindowAttributes swa;
 	int c, index, proc_x, proc_y, pid;
 	int *pcount[] = {&HOUR, &MIN, &SEC};
 	char *p, *q;
@@ -342,8 +343,11 @@
 	alloc_color(fg, &orange);
 	alloc_color(fg2, &blue);
 
-	win = XCreateSimpleWindow(dpy, root, X, Y, width, height, 0,
-				  black.pixel, black.pixel);
+	swa.override_redirect = 1;
+
+	win = XCreateWindow(dpy, root, X, Y, width, height, 0,
+			    CopyFromParent, InputOutput, CopyFromParent,
+			    CWOverrideRedirect, &swa);
 
 	XStoreName(dpy, win, "ocbench");
 	XSelectInput(dpy, win, ExposureMask | StructureNotifyMask);
Only in ocbench-0.4.1/: .README.swp
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Hi Ingo,

I'm replying to your 3 mails at once.

On Sat, Apr 21, 2007 at 12:45:22PM +0200, Ingo Molnar wrote:
>
> * Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> > > It could become a useful scheduler benchmark !
> >
> > i just tried ocbench-0.3, and it is indeed very nice!

So as you've noticed just one minute after I put it there, I've updated
the tool and renamed it ocbench. For others, it's here:

    http://linux.1wt.eu/sched/

The useful news is proper positioning, automatic forking, and more
visible progress with smaller windows, which eat fewer X resources.

Now about your idea of making it report information on stdout, I don't
know if it would be that useful. There are many other command line tools
for this purpose. This one's goal is to eat CPU with a visual control of
CPU distribution only.

Concerning your idea of using a signal to resync every process, I agree
with you. Running at 8x8 shows a noticeable offset. I've just uploaded
v0.4 which supports your idea of sending USR1.

> another thing i noticed: when using a -y larger than 1, then the
> window title (at least on Metacity) overlaps and thus the ocbench
> tasks have different X overhead and get scheduled a bit asymmetrically
> as well. Is there any way to start them up title-less perhaps?

It has annoyed me a bit too, but I'm no X developer at all, so I don't
know at all if it's possible nor how to do this. I know that my window
manager even adds title bars to xeyes, so I'm not sure we can do this.

Right now, I've added a "-B <border size>" argument so that you can skip
the size of your title bar. It's dirty but it's not my main job :-)

Thanks for your feedback
Willy
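The thread doesn't include the v0.4 resync code itself; the usual shape
of a "killall -USR1"-driven resync is an async-signal-safe handler that
only sets a flag, which the animation loop polls. A minimal sketch under
that assumption (the tick counter and loop body are invented for
illustration, not ocbench's actual source):

#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t resync;	/* set from the signal handler */

static void on_usr1(int sig)
{
	(void)sig;
	resync = 1;		/* async-signal-safe: just set a flag */
}

int main(void)
{
	unsigned long tick = 0;	/* hypothetical animation phase */

	signal(SIGUSR1, on_usr1);
	for (;;) {
		if (resync) {	/* "killall -USR1 ocbench" lands here */
			resync = 0;
			tick = 0;	/* restart all clocks in phase */
		}
		/* ... burn CPU, redraw the hands for `tick` ... */
		tick++;
		usleep(10000);
	}
}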
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> > It could become a useful scheduler benchmark !
>
> i just tried ocbench-0.3, and it is indeed very nice!

another thing i noticed: when using a -y larger than 1, then the window
title (at least on Metacity) overlaps and thus the ocbench tasks have
different X overhead and get scheduled a bit asymmetrically as well. Is
there any way to start them up title-less perhaps?

	Ingo
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> > The modified code is here :
> >
> >   http://linux.1wt.eu/sched/orbitclock-0.2bench.tgz
> >
> > What is interesting to note is that it's easy to make X work a lot
> > (99%) by using 0 as the sleeping time, and it's easy to make the
> > process work a lot by using large values for the running time
> > associated with very low values (or 0) for the sleep time.
> >
> > Ah, and it supports -geometry ;-)
> >
> > It could become a useful scheduler benchmark !
>
> i just tried ocbench-0.3, and it is indeed very nice!

another thing i just noticed: when starting up lots of ocbench tasks
(say -x 6 -y 6) then they (naturally) get started up with an already
visible offset. It's nice to observe the startup behavior, but after
that it would be useful if it were possible to 'resync' all those
ocbench tasks so that they start at the same offset. [ Maybe a "killall
-SIGUSR1 ocbench" could serve this purpose, without having to
synchronize the tasks explicitly? ]

	Ingo
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
* Willy Tarreau <[EMAIL PROTECTED]> wrote:

> I hacked it a bit to make it accept two parameters :
>   -R <run_time_in_microsecond>   : time spent burning CPU cycles at
>                                    each round
>   -S <sleep_time_in_microsecond> : time spent getting a rest
>
> It now advances what it thinks is a second at each iteration, so that
> it makes it easy to compare its progress with other instances (there
> are seconds, minutes and hours, so it's easy to visually count up to
> around 43200).
>
> The modified code is here :
>
>   http://linux.1wt.eu/sched/orbitclock-0.2bench.tgz
>
> What is interesting to note is that it's easy to make X work a lot
> (99%) by using 0 as the sleeping time, and it's easy to make the
> process work a lot by using large values for the running time
> associated with very low values (or 0) for the sleep time.
>
> Ah, and it supports -geometry ;-)
>
> It could become a useful scheduler benchmark !

i just tried ocbench-0.3, and it is indeed very nice! Would it make
sense perhaps to (optionally?) also log some sort of periodic text
feedback to stdout, about the quality of scheduling? Maybe even a 'run
this many seconds' option plus a summary text output at the end (which
would output measured runtime, observed longest/smallest latency and
standard deviation of latencies maybe)? That would make it directly
usable both as a 'consistency of X app scheduling' visual test and as an
easily shareable benchmark with an objective numeric result as well.

	Ingo
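The statistics Ingo asks for are cheap to collect in a run/sleep loop. A
self-contained sketch of the measurement half, assuming we define
latency as how late each intended sleep actually wakes up (the 10 ms
period and iteration count are arbitrary; compile with -lm):

#include <math.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static double now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

int main(void)
{
	const double period = 10.0;	/* intended 10 ms sleep */
	double sum = 0, sumsq = 0, min = 1e9, max = 0;
	int n = 500;

	for (int i = 0; i < n; i++) {
		double t0 = now_ms();

		usleep((useconds_t)(period * 1000));
		/* wakeup lateness beyond the requested sleep */
		double lat = now_ms() - t0 - period;

		sum += lat;
		sumsq += lat * lat;
		if (lat < min) min = lat;
		if (lat > max) max = lat;
	}

	double mean = sum / n;
	double var = sumsq / n - mean * mean;

	if (var < 0)		/* guard against FP rounding */
		var = 0;
	printf("latency ms: min=%.3f max=%.3f avg=%.3f stddev=%.3f\n",
	       min, max, mean, sqrt(var));
	return 0;
}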
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
* Bill Davidsen <[EMAIL PROTECTED]> wrote:

> All of my testing has been on desktop machines, although in most cases
> they were really loaded desktops which had load avg 10..100 from time
> to time, and none were low memory machines. Up to CFS v3 I thought
> nicksched was my winner, now CFSv3 looks better, by not having
> stumbles under stupid loads.

nice! I hope CFSv4 kept that good tradition too ;)

> I have not tested:
> 1 - server loads, nntp, smtp, etc
> 2 - low memory machines
> 3 - uniprocessor systems
>
> I think this should be done before drawing conclusions. Or if someone
> has tried this, perhaps they would report what they saw. People are
> talking about smoothness, but not how many pages per second come out
> of their overloaded web server.

i tested heavily swapping systems. (make -j50 workloads easily trigger
that.) I also tested UP systems and a handful of SMP systems. I have
also tested massive_intr.c, which i believe is an indicator of how
fairly CPU time is distributed between partly sleeping, partly running
server threads.

But i very much agree that diverse feedback is sought and welcome, both
from those who are happy with the current scheduler and those who are
unhappy about it.

	Ingo
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Fri, Apr 20, 2007 at 04:47:27PM -0400, Bill Davidsen wrote:
> Ingo Molnar wrote:
>
> > ( Lets be cautious though: the jury is still out whether people
> >   actually like this more than the current approach. While CFS
> >   feedback looks promising after a whopping 3 days of it being
> >   released [ ;-) ], the test coverage of all 'fairness centric'
> >   schedulers, even considering years of availability is less than 1%
> >   i'm afraid, and that < 1% was mostly self-selecting. )
>
> All of my testing has been on desktop machines, although in most cases
> they were really loaded desktops which had load avg 10..100 from time
> to time, and none were low memory machines. Up to CFS v3 I thought
> nicksched was my winner, now CFSv3 looks better, by not having
> stumbles under stupid loads.

What base_timeslice were you using for nicksched, and what HZ?
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Ingo Molnar wrote:
> ( Lets be cautious though: the jury is still out whether people
>   actually like this more than the current approach. While CFS
>   feedback looks promising after a whopping 3 days of it being
>   released [ ;-) ], the test coverage of all 'fairness centric'
>   schedulers, even considering years of availability is less than 1%
>   i'm afraid, and that < 1% was mostly self-selecting. )

All of my testing has been on desktop machines, although in most cases
they were really loaded desktops which had load avg 10..100 from time to
time, and none were low memory machines. Up to CFS v3 I thought
nicksched was my winner; now CFSv3 looks better, by not having stumbles
under stupid loads.

I have not tested:
1 - server loads, nntp, smtp, etc
2 - low memory machines
3 - uniprocessor systems

I think this should be done before drawing conclusions. Or if someone
has tried this, perhaps they would report what they saw. People are
talking about smoothness, but not how many pages per second come out of
their overloaded web server.

-- 
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Mike Galbraith wrote:
> On Tue, 2007-04-17 at 05:40 +0200, Nick Piggin wrote:
>
> > On Tue, Apr 17, 2007 at 04:29:01AM +0200, Mike Galbraith wrote:
> > > Yup, and progress _is_ happening now, quite rapidly.
> >
> > Progress as in progress on Ingo's scheduler. I still don't know how
> > we'd decide when to replace the mainline scheduler or with what. I
> > don't think we can say Ingo's is better than the alternatives, can
> > we? No, that would require massive performance testing of all
> > alternatives. If there is some kind of bakeoff, then I'd like one of
> > Con's designs to be involved, and mine, and Peter's...
>
> The trouble with a bakeoff is that it's pretty darn hard to get people
> to test in the first place, and then comes weighting the subjective
> and hard performance numbers. If they're close in numbers, do you go
> with the one which starts the least flamewars or what?

Here we disagree... I picked a scheduler not by running benchmarks, but
by running loads which piss me off with the mainline scheduler. And then
I ran the other schedulers for a while to find the things, normal things
I do, which resulted in bad behavior. And when I found one which had (so
far) no such cases I called it my winner, but I haven't tested it under
server load, so I can't begin to say it's "the best."

What we need is for lots of people to run every scheduler in real life,
and do "worst case analysis" by finding the cases which cause bad
behavior. And if there were a way to easily choose another scheduler,
call it pluggable, modular, or Russian Roulette, people who found a
worst case would report it (aka bitch about it) and try another. But the
average user is better able to boot with an option like "sched=cfs" (or
sc, or nick, or ...) than to patch and build a kernel. So if we don't
get easily switched schedulers, people will not test nearly as well.

The best scheduler isn't the one 2% faster than the rest, it's the one
with the fewest jackpot cases where it sucks. And if the mainline had
multiple schedulers this testing would get done, authors would get more
reports, and have a better chance of fixing corner cases.

Note that we really need multiple schedulers to make people happy,
because fairness is not the most desirable behavior on all machines, and
adding knobs probably isn't the answer. I want a server to degrade
gently, I want my desktop to show my movie and echo my typing, and if
that's hard on compiles or the file transfer, so be it. Con doesn't want
to compromise his goals; I agree, but want to have an option if I don't
share them.

-- 
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
William Lee Irwin III wrote:
> William Lee Irwin III wrote:
> > I'd further recommend making priority levels accessible to kernel
> > threads that are not otherwise accessible to processes, both above
> > and below user-available priority levels. Basically, if you can get
> > SCHED_RR and SCHED_FIFO to coexist as "intimate scheduler classes,"
> > then a SCHED_KERN scheduler class can coexist with SCHED_OTHER in
> > like fashion, but with availability of higher and lower priorities
> > than any userspace process is allowed, and potentially some
> > differing scheduling semantics. In such a manner nonessential
> > background processing intended not to ever disturb userspace can be
> > given priorities appropriate to it (perhaps even con's
> > SCHED_IDLEPRIO would make sense), and other, urgent processing can
> > be given priority over userspace altogether.
>
> On Thu, Apr 19, 2007 at 09:50:19PM +1000, Peter Williams wrote:
> > This is sounding very much like System V Release 4 (and descendants)
> > except that they call it SCHED_SYS and also give SCHED_NORMAL tasks
> > that are in system mode dynamic priorities in the SCHED_SYS range
> > (to avoid priority inversion, I believe).
>
> Descriptions of that are probably where I got the idea (hurrah for OS
> textbooks).

And long term background memory. :-)

> It makes a fair amount of sense.

Yes. You could also add a SCHED_IA in between SCHED_SYS and SCHED_OTHER
(a la Solaris) for interactive tasks. The only problem is how to get a
task into SCHED_IA without root privileges.

> Not sure what the take on the specific precedent is. The only content
> here is expanding the priority range with ranges above and below for
> the exclusive use of ultra-privileged tasks, so it's really trivial.
> Actually it might be so trivial it should just be some permission
> checks in the SCHED_OTHER renicing code.

Perhaps.

Peter
-- 
Peter Williams                                   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
William Lee Irwin III wrote:
> > I'd further recommend making priority levels accessible to kernel
> > threads that are not otherwise accessible to processes, both above
> > and below user-available priority levels. Basically, if you can get
> > SCHED_RR and SCHED_FIFO to coexist as "intimate scheduler classes,"
> > then a SCHED_KERN scheduler class can coexist with SCHED_OTHER in
> > like fashion, but with availability of higher and lower priorities
> > than any userspace process is allowed, and potentially some
> > differing scheduling semantics. In such a manner nonessential
> > background processing intended not to ever disturb userspace can be
> > given priorities appropriate to it (perhaps even con's
> > SCHED_IDLEPRIO would make sense), and other, urgent processing can
> > be given priority over userspace altogether.

On Thu, Apr 19, 2007 at 09:50:19PM +1000, Peter Williams wrote:
> This is sounding very much like System V Release 4 (and descendants)
> except that they call it SCHED_SYS and also give SCHED_NORMAL tasks
> that are in system mode dynamic priorities in the SCHED_SYS range (to
> avoid priority inversion, I believe).

Descriptions of that are probably where I got the idea (hurrah for OS
textbooks). It makes a fair amount of sense.

Not sure what the take on the specific precedent is. The only content
here is expanding the priority range with ranges above and below for the
exclusive use of ultra-privileged tasks, so it's really trivial.
Actually it might be so trivial it should just be some permission checks
in the SCHED_OTHER renicing code.

-- wli
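SCHED_KERN, SCHED_SYS and SCHED_IA are proposals in this exchange, not
existing Linux policies; the interface they would extend is
sched_setscheduler(2), which is how the two existing "intimate" realtime
classes are selected today. A short refresher (the priority value is
just an example):

#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	struct sched_param sp;

	memset(&sp, 0, sizeof(sp));
	sp.sched_priority = 10;	/* 1..99 for the realtime classes */

	/* SCHED_FIFO and SCHED_RR already coexist, sharing one
	 * realtime priority range; a SCHED_SYS/SCHED_KERN class would
	 * slot in similarly, with its own reserved range. */
	if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
		/* needs CAP_SYS_NICE: exactly the kind of permission
		 * check wli suggests reusing for new classes */
		fprintf(stderr, "sched_setscheduler: %s\n",
			strerror(errno));
		return 1;
	}

	printf("now SCHED_FIFO at priority %d\n", sp.sched_priority);
	return 0;
}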
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Thu, 2007-04-19 at 09:55 -0700, Davide Libenzi wrote:
> On Thu, 19 Apr 2007, Mike Galbraith wrote:
>
> > On Thu, 2007-04-19 at 09:09 +0200, Ingo Molnar wrote:
> > > * Mike Galbraith <[EMAIL PROTECTED]> wrote:
> > >
> > > > With a heavily reniced X (perfectly fine), that should indeed
> > > > solve my daily usage pattern nicely (always need godmode for
> > > > shells, but not for mozilla and ilk. 50/50 split automatic
> > > > without renice of entire gui)
> > >
> > > how about the first-approximation solution i suggested in the
> > > previous mail: to add a per UID default nice level? (With this
> > > default defaulting to '-10' for all root-owned processes, and
> > > defaulting to '0' for everything else.) That would solve most of
> > > the current CFS regressions at hand.
> >
> > That would make my kernel builds etc interfere with my other self's
> > surfing and whatnot. With it by EUID, when I'm surfing or whatnot,
> > the X portion of my Joe-User activity pushes the compile portion of
> > root down in bandwidth utilization automagically, which is exactly
> > the right thing, because the root me is not as important as the
> > Joe-User me using the GUI at that time. If the idea of X disturbing
> > root upsets some, they can move X to another UID. Generally, it
> > seems perfect for here.
>
> Now guys, I did not follow the whole lengthy and feisty thread, but
> IIRC Con's scheduler has been attacked because, among other arguments,
> it required X to be reniced. This happened like a month ago IINM.

I don't object to renicing X if you want it to receive _more_ than its
fair share. I do object to having to renice X in order for it to _get_
its fair share. That's what I attacked.

> I did not have time to look at Con's scheduler, and I only had a brief
> look at Ingo's one (looks very promising IMO, but so was the initial
> O(1) post before all the corner-case fixes went in).
> But this is not about technical merit, this is about applying the same
> rules of judgement to others as well as to ourselves.

I'm running the same tests with CFS that I ran for RSDL/SD. It falls
short in one key area (to me) in that X+client cannot yet split my box
50/50 with two concurrent tasks. In the CFS case, renicing both X and
client does work, but it should not be necessary IMHO. With RSDL/SD,
renicing didn't help.

> We went from "renicing X to -10 is bad because the scheduler should be
> able to correctly handle the problem w/out additional external plugs"
> to a totally opposite "let's renice X to -10, the whole SCHED_NORMAL
> kthreads class, on top of all the tasks owned by root" [1].
> From a spectator POV like myself in this case, this looks rather
> "unfair".

Well, for me, the renicing I mentioned above is only interesting as a
way to improve long term fairness with schedulers with no history. I
found Linus' EUID idea intriguing in that by putting the server together
with a steady load in one 'fair' domain, and clients in another, X can,
if prioritized to empower it to do so, modulate the steady load in its
domain (but can't starve it!), the clients modulate X, and the steady
load gets it all when X and clients are idle. The nice level of X
determines to what _extent_ X can modulate the constant load, rather
like a mixer slider. The synchronous (I'm told) nature of X/client then
becomes kind of an asset to the desktop instead of a liability.

The specific case I was thinking about is the X+Gforce test where both
RSDL and CFS fail to provide fairness (as defined by me ;). X and Gforce
are mostly not concurrent. The make -j2 I put them up against are mostly
concurrent. I don't call giving 1/3 of my CPU to X+client fair at _all_,
but that's what you'll get if your fairstick of the instant generally
can't see the fourth competing task.

Seemed pretty cool to me because it creates the missing connection
between client and server, though also likely complicated (and maybe
full of perils, who knows).

	-Mike
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Fri, Apr 20, 2007 at 02:52:38AM +0300, Jan Knutar wrote:
> On Thursday 19 April 2007 18:18, Ingo Molnar wrote:
> > * Willy Tarreau <[EMAIL PROTECTED]> wrote:
> > > You can certainly script it with -geometry. But it is the wrong
> > > application for this matter, because you benchmark X more than
> > > glxgears itself. What would be better is something like a line
> > > rotating 360 degrees and doing some short stuff between each
> > > degree, so that X is not much solicited, but the CPU would be
> > > spent more on the processes themselves.
> >
> > at least on my setup glxgears goes via DRI/DRM so there's no X
> > scheduling inbetween at all, and the visual appearance of glxgears
> > is a direct function of its scheduling.
>
> How much of the subjective interactiveness-feel of the desktop is at
> the mercy of the X server's scheduling and not the cpu scheduler?

Probably a lot. Hence the reason I wanted something visually noticeable
but using far fewer X resources than glxgears. The modified orbitclock
is perfect IMHO.

Regards,
Willy
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Thursday 19 April 2007 18:18, Ingo Molnar wrote:
> * Willy Tarreau <[EMAIL PROTECTED]> wrote:
> > You can certainly script it with -geometry. But it is the wrong
> > application for this matter, because you benchmark X more than
> > glxgears itself. What would be better is something like a line
> > rotating 360 degrees and doing some short stuff between each
> > degree, so that X is not much solicited, but the CPU would be
> > spent more on the processes themselves.
>
> at least on my setup glxgears goes via DRI/DRM so there's no X
> scheduling inbetween at all, and the visual appearance of glxgears is
> a direct function of its scheduling.

How much of the subjective interactiveness-feel of the desktop is at
the mercy of the X server's scheduling and not the cpu scheduler?

I've noticed that video playback is significantly smoother and more
resistant to other load when using MPlayer's opengl output, especially
if "heavy" programs are running at the same time. Firefox and ksysguard
in particular seem to have found a way to make video through Xv look
annoyingly jittery.
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Thu, Apr 19, 2007 at 05:18:03PM +0200, Ingo Molnar wrote:
>
> * Willy Tarreau <[EMAIL PROTECTED]> wrote:
>
> > You can certainly script it with -geometry. But it is the wrong
> > application for this matter, because you benchmark X more than
> > glxgears itself. What would be better is something like a line
> > rotating 360 degrees and doing some short stuff between each degree,
> > so that X is not much solicited, but the CPU would be spent more on
> > the processes themselves.
>
> at least on my setup glxgears goes via DRI/DRM so there's no X
> scheduling inbetween at all, and the visual appearance of glxgears is a
> direct function of its scheduling.

OK, I thought that something looking like a clock would be useful,
especially if we could tune the amount of CPU spent per task instead of
being limited by graphics drivers. I searched Freshmeat for a clock and
found "orbitclock" by Jeremy Weatherford, which was exactly what I was
looking for:
  - small
  - C only
  - X11 only
  - needed less than 5 minutes and no knowledge of X11 for the
    complete hack!

=> Kudos to its author, sincerely!

I hacked it a bit to make it accept two parameters:
  -R : time spent burning CPU cycles at each round
  -S : time spent getting a rest

It now advances what it thinks is a second at each iteration, which
makes it easy to compare its progress with other instances (there are
seconds, minutes and hours, so it's easy to visually count up to around
43200).

The modified code is here:
  http://linux.1wt.eu/sched/orbitclock-0.2bench.tgz

What is interesting to note is that it's easy to make X work a lot
(99%) by using 0 as the sleeping time, and it's easy to make the
process work a lot by using large values for the running time
associated with very low values (or 0) for the sleep time.

Ah, and it supports -geometry ;-)

It could become a useful scheduler benchmark!

Have fun!
Willy
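[Editor's note: for anyone who wants the same burn/sleep load pattern
without any X at all, the loop below is a minimal sketch of what the
-R/-S style tuning boils down to. It is not orbitclock's actual code;
the argument handling, the millisecond granularity and the
gettimeofday()-based spinning are all assumptions made for
illustration.]

/* busyloop.c - sketch of a tunable burn/sleep CPU load generator.
 * Usage: ./busyloop <run_ms> <sleep_ms>
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>

static long long now_us(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	return (long long)tv.tv_sec * 1000000 + tv.tv_usec;
}

int main(int argc, char **argv)
{
	long run_ms = argc > 1 ? atol(argv[1]) : 10;
	long sleep_ms = argc > 2 ? atol(argv[2]) : 10;

	for (;;) {
		long long deadline = now_us() + run_ms * 1000;

		while (now_us() < deadline)
			;			/* burn CPU for run_ms */
		if (sleep_ms)
			usleep(sleep_ms * 1000);	/* then take a rest */
	}
	return 0;
}

[Running a few instances with different ratios (e.g. "./busyloop 90 10"
vs "./busyloop 10 90") gives workloads whose expected CPU shares are
easy to reason about, which is what makes this pattern handy for
scheduler comparisons.]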
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
In article <[EMAIL PROTECTED]> you wrote:
> Top (VCPU maybe?)
>    User
>       Process
>          Thread

The problem with that is that not all schedulers might work on the user
level. You can think of batch/job, parent, group, session or namespace
levels. That would IMHO be a generic Top, with no need for a level
above.

Greetings
Bernd
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Thursday 19 April 2007, Ingo Molnar wrote:
>* Willy Tarreau <[EMAIL PROTECTED]> wrote:
>> You can certainly script it with -geometry. But it is the wrong
>> application for this matter, because you benchmark X more than
>> glxgears itself. What would be better is something like a line
>> rotating 360 degrees and doing some short stuff between each degree,
>> so that X is not much solicited, but the CPU would be spent more on
>> the processes themselves.
>
>at least on my setup glxgears goes via DRI/DRM so there's no X
>scheduling inbetween at all, and the visual appearance of glxgears is a
>direct function of its scheduling.
>
>	Ingo

That doesn't appear to be the case here, Ingo. Even when I know the
rest of the system is lagged, glxgears continues to show very smooth
and steady movement.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Yow! I just went below the poverty line!
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Thursday 19 April 2007, Ingo Molnar wrote:
>* Willy Tarreau <[EMAIL PROTECTED]> wrote:
>> Good idea. The machine I'm typing from now has 1000 scheddos running
>> at +19, and 12 gears at nice 0. [...]
>>
>> From time to time, one of the 12 aligned gears will quickly perform a
>> full quarter turn while others slowly turn by a few degrees. In fact,
>> while I don't know this process's CPU usage pattern, there's
>> something useful in it: it allows me to visually see when processes
>> accelerate/decelerate. [...]
>
>cool idea - i have just tried this and it rocks - you can easily see the
>'nature' of CPU time distribution just via visual feedback. (Is there
>any easy way to start up 12 glxgears fully aligned, or does one always
>have to mouse around to get them into proper position?)
>
>btw., i am using another method to quickly judge X's behavior: i started
>the 'snowflakes' plugin in Beryl on Fedora 7, which puts a nice smooth
>opengl-rendered snow fall on the desktop background. That gives me an
>idea about how well X is scheduling under various workloads, without
>having to instrument it explicitly.
>
>	Ingo

Yes, it's a cute idea, till you switch away from that screen to check
progress on something else, like to compose this message.

===
 5913 frames in 5.0 seconds = 1182.499 FPS
 6238 frames in 5.0 seconds = 1247.556 FPS
11380 frames in 5.0 seconds = 2275.905 FPS
10691 frames in 5.0 seconds = 2138.173 FPS
 8707 frames in 5.0 seconds = 1741.305 FPS
10669 frames in 5.0 seconds = 2133.708 FPS
11392 frames in 5.0 seconds = 2278.037 FPS
11379 frames in 5.0 seconds = 2275.711 FPS
11310 frames in 5.0 seconds = 2261.861 FPS
11386 frames in 5.0 seconds = 2277.081 FPS
11292 frames in 5.0 seconds = 2258.353 FPS
11352 frames in 5.0 seconds = 2270.297 FPS
11415 frames in 5.0 seconds = 2282.886 FPS
11406 frames in 5.0 seconds = 2281.037 FPS
11483 frames in 5.0 seconds = 2296.533 FPS
11510 frames in 5.0 seconds = 2301.883 FPS
11123 frames in 5.0 seconds = 2224.266 FPS
 8980 frames in 5.0 seconds = 1795.861 FPS
===

The over-2000 FPS reports came while I was either looking at htop or
starting this message, both on different screens. htop said it was
using 95+% of the CPU even when its display was going to /dev/null. So,
'kewl' as it is, it doesn't get us apples-to-apples numbers we can take
to the window and bet win-place-show on. FWIW, running the nvidia-9755
drivers here.

So if we are going to use that as a judgement operator, it obviously
needs some intelligently applied scaling before the numbers are worth
more than a subjective feel.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
The confusion of a staff member is measured by the length of his memos.
-- New York Times, Jan. 20, 1981
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Thu, 19 Apr 2007, Mike Galbraith wrote:
> On Thu, 2007-04-19 at 09:09 +0200, Ingo Molnar wrote:
> > * Mike Galbraith <[EMAIL PROTECTED]> wrote:
> >
> > > With a heavily reniced X (perfectly fine), that should indeed solve my
> > > daily usage pattern nicely (always need godmode for shells, but not
> > > for mozilla and ilk. 50/50 split automatic without renice of entire
> > > gui)
> >
> > how about the first-approximation solution i suggested in the previous
> > mail: to add a per UID default nice level? (With this default defaulting
> > to '-10' for all root-owned processes, and defaulting to '0' for
> > everything else.) That would solve most of the current CFS regressions
> > at hand.
>
> That would make my kernel builds etc interfere with my other self's
> surfing and whatnot. With it by EUID, when I'm surfing or whatnot, the
> X portion of my Joe-User activity pushes the compile portion of root
> down in bandwidth utilization automagically, which is exactly the right
> thing, because the root me is not as important as the Joe-User me using
> the GUI at that time. If the idea of X disturbing root upsets some,
> they can move X to another UID. Generally, it seems perfect for here.

Now guys, I did not follow the whole lengthy and feisty thread, but IIRC
Con's scheduler has been attacked because, among other arguments, it was
requiring X to be reniced. This happened like a month ago IINM.

I did not have time to look at Con's scheduler, and I only had a brief
look at Ingo's one (looks very promising IMO, but so was the initial
O(1) post before all the corner-cases fixes went in).

But this is not about technical merit, this is about applying the same
rules of judgement to others as well as to ourselves. We went from a
"renicing X to -10 is bad because the scheduler should be able to
correctly handle the problem w/out additional external plugs" to a
totally opposite "let's renice -10 X, the whole SCHED_NORMAL kthreads
class, on top of all the tasks owned by root" [1].

From a spectator POV like myself in this case, this looks rather
"unfair".

[1] I think, before and now, that that's more a duct tape patch than a
real solution. OTOH if the "solution" is gonna be another maze of
macros and heuristics filled with pretty bad corner cases, I may prefer
the former.

- Davide
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Thu, 19 Apr 2007, Ingo Molnar wrote:
> i disagree that the user 'would expect' this. Some users might. Others
> would say: 'my 10-thread rendering engine is more important than a
> 1-thread job because it's using 10 threads for a reason'. And the CFS
> feedback so far strengthens this point: the default behavior of treating
> the thread as a single scheduling (and CPU time accounting) unit works
> pretty well on the desktop.
>
> think about it in another, 'kernel policy' way as well: we'd like to
> _encourage_ more parallel user applications. Hurting them by accounting
> all threads together sends the exact opposite message.

There are counter-arguments too. Like, not every user knows if a
certain process is MT or not. I agree though that doing accounting and
fairness at a depth lower than USER is messy, and not only for
performance.

- Davide
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
* Willy Tarreau <[EMAIL PROTECTED]> wrote:

> You can certainly script it with -geometry. But it is the wrong
> application for this matter, because you benchmark X more than
> glxgears itself. What would be better is something like a line
> rotating 360 degrees and doing some short stuff between each degree,
> so that X is not much solicited, but the CPU would be spent more on
> the processes themselves.

at least on my setup glxgears goes via DRI/DRM so there's no X
scheduling inbetween at all, and the visual appearance of glxgears is a
direct function of its scheduling.

	Ingo
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Hi Ingo,

On Thu, Apr 19, 2007 at 11:01:44AM +0200, Ingo Molnar wrote:
>
> * Willy Tarreau <[EMAIL PROTECTED]> wrote:
>
> > Good idea. The machine I'm typing from now has 1000 scheddos running
> > at +19, and 12 gears at nice 0. [...]
>
> > From time to time, one of the 12 aligned gears will quickly perform a
> > full quarter turn while others slowly turn by a few degrees. In fact,
> > while I don't know this process's CPU usage pattern, there's
> > something useful in it: it allows me to visually see when processes
> > accelerate/decelerate. [...]
>
> cool idea - i have just tried this and it rocks - you can easily see the
> 'nature' of CPU time distribution just via visual feedback. (Is there
> any easy way to start up 12 glxgears fully aligned, or does one always
> have to mouse around to get them into proper position?)

-- Replying quickly, I'm short in time --

You can certainly script it with -geometry. But it is the wrong
application for this matter, because you benchmark X more than glxgears
itself. What would be better is something like a line rotating 360
degrees and doing some short stuff between each degree, so that X is
not much solicited, but the CPU would be spent more on the processes
themselves.

Benchmarking interactions between X and multiple clients is a
completely different test IMHO. Glxgears is between those two, making
it inappropriate for scheduler tuning.

Regards,
Willy
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
William Lee Irwin III wrote:
> * Andrew Morton <[EMAIL PROTECTED]> wrote:
> >> Yes, there are potential compatibility problems. Example: a machine
> >> with 100 busy httpd processes and suddenly a big gzip starts up from
> >> console or cron. [...]
>
> On Thu, Apr 19, 2007 at 08:38:10AM +0200, Ingo Molnar wrote:
> > hm. How about the following then: default to nice -10 for all
> > (SCHED_NORMAL) kernel threads and all root-owned tasks. Root _is_
> > special: root already has disk space reserved to it, root has special
> > memory allocation allowances, etc. I don't see a reason why we couldn't
> > by default make all root tasks have nice -10. This would be instantly
> > loved by sysadmins i suspect ;-)
> > (distros that go the extra mile of making Xorg run under non-root could
> > also go another extra one foot to renice that X server to -10.)
>
> I'd further recommend making priority levels accessible to kernel
> threads that are not otherwise accessible to processes, both above and
> below user-available priority levels. Basically, if you can get
> SCHED_RR and SCHED_FIFO to coexist as "intimate scheduler classes,"
> then a SCHED_KERN scheduler class can coexist with SCHED_OTHER in like
> fashion, but with availability of higher and lower priorities than any
> userspace process is allowed, and potentially some differing scheduling
> semantics. In such a manner, nonessential background processing
> intended never to disturb userspace can be given priorities appropriate
> to it (perhaps even Con's SCHED_IDLEPRIO would make sense), and other,
> urgent processing can be given priority over userspace altogether.
>
> I believe root's default priority can be adjusted in userspace as
> things now stand somewhere in /etc/ but I'm not sure of the specifics.
> Word is somewhere in /etc/security/limits.conf

This is sounding very much like System V Release 4 (and descendants),
except that they call it SCHED_SYS and also give SCHED_NORMAL tasks
that are in system mode dynamic priorities in the SCHED_SYS range (to
avoid priority inversion, I believe).

Peter
-- 
Peter Williams [EMAIL PROTECTED]
"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce
Re: CFS and suspend2: hang in atomic copy (was: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS])
* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> > I think a better approach would be to keep track of the rightmost
> > entry, set the key to the rightmost's key +1 and then simply insert
> > it there.
>
> yeah. I had that implemented at a stage but was trying to be too
> clever for my own good ;-)

i have fixed it via the patch below. (I'm using rb_last() because that
way the normal scheduling codepaths are not burdened with the
maintenance of a rightmost entry.)

	Ingo

---
 kernel/sched.c      |    3 ++-
 kernel/sched_fair.c |   24 +++++++++++++-----------
 2 files changed, 15 insertions(+), 12 deletions(-)

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -3806,7 +3806,8 @@ asmlinkage long sys_sched_yield(void)
 	schedstat_inc(rq, yld_cnt);
 	if (rq->nr_running == 1)
 		schedstat_inc(rq, yld_act_empty);
-	current->sched_class->yield_task(rq, current);
+	else
+		current->sched_class->yield_task(rq, current);
 
 	/*
 	 * Since we are going to call schedule() anyway, there's
Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -275,21 +275,23 @@ static void dequeue_task_fair(struct rq
  */
 static void yield_task_fair(struct rq *rq, struct task_struct *p)
 {
+	struct rb_node *entry;
+	struct task_struct *last;
+
 	dequeue_task_fair(rq, p);
 	p->on_rq = 0;
+
 	/*
-	 * Temporarily insert at the last position of the tree:
+	 * Temporarily insert at the last position of the tree.
+	 * The key will be updated back to (near) its old value
+	 * when the task gets scheduled.
 	 */
-	p->fair_key = LLONG_MAX;
+	entry = rb_last(&rq->tasks_timeline);
+	last = rb_entry(entry, struct task_struct, run_node);
+
+	p->fair_key = last->fair_key + 1;
 	__enqueue_task_fair(rq, p);
 	p->on_rq = 1;
-
-	/*
-	 * Update the key to the real value, so that when all other
-	 * tasks from before the rightmost position have executed,
-	 * this task is picked up again:
-	 */
-	p->fair_key = rq->fair_clock - p->wait_runtime + p->nice_offset;
 }
 
 /*
Re: CFS and suspend2: hang in atomic copy (was: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS])
* Esben Nielsen <[EMAIL PROTECTED]> wrote:

> >+	/*
> >+	 * Temporarily insert at the last position of the tree:
> >+	 */
> >+	p->fair_key = LLONG_MAX;
> >+	__enqueue_task_fair(rq, p);
> >	p->on_rq = 1;
> >+
> >+	/*
> >+	 * Update the key to the real value, so that when all other
> >+	 * tasks from before the rightmost position have executed,
> >+	 * this task is picked up again:
> >+	 */
> >+	p->fair_key = rq->fair_clock - p->wait_runtime + p->nice_offset;
>
> I don't think it's safe to change the key after inserting the element
> in the tree. You end up with an unsorted tree where new entries end up
> in wrong places "randomly".

yeah, indeed. I hoped that once this rightmost entry is removed (as
soon as it gets scheduled next time) the tree goes back to a correct
shape, but that's not the case - the left sub-tree and the right
sub-tree are merged by the rbtree code with the assumption that the
entry had a correct key.

> I think a better approach would be to keep track of the rightmost
> entry, set the key to the rightmost's key +1 and then simply insert it
> there.

yeah. I had that implemented at a stage but was trying to be too clever
for my own good ;-)

	Ingo
Re: CFS and suspend2: hang in atomic copy (was: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS])
On Wed, 18 Apr 2007, Ingo Molnar wrote:

> * Christian Hesse <[EMAIL PROTECTED]> wrote:
>
> > Hi Ingo and all,
> >
> > On Friday 13 April 2007, Ingo Molnar wrote:
> > > as usual, any sort of feedback, bugreports, fixes and suggestions
> > > are more than welcome,
> >
> > I just gave CFS a try on my system. From a user's point of view it
> > looks good so far. Thanks for your work.
>
> you are welcome!
>
> > However I found a problem: When trying to suspend a system patched
> > with suspend2 2.2.9.11 it hangs with "doing atomic copy". Pressing
> > the ESC key results in a message that it tries to abort suspend, but
> > then still hangs.
>
> i took a quick look at suspend2 and it makes some use of yield().
> There's a bug in CFS's yield code, i've attached a patch that should
> fix it, does it make any difference to the hang?
>
> 	Ingo
>
> Index: linux/kernel/sched_fair.c
> ===================================================================
> --- linux.orig/kernel/sched_fair.c
> +++ linux/kernel/sched_fair.c
> @@ -264,15 +264,26 @@ static void dequeue_task_fair(struct rq
>
>  /*
>   * sched_yield() support is very simple via the rbtree, we just
> - * dequeue and enqueue the task, which causes the task to
> - * roundrobin to the end of the tree:
> + * dequeue the task and move it to the rightmost position, which
> + * causes the task to roundrobin to the end of the tree.
>   */
>  static void requeue_task_fair(struct rq *rq, struct task_struct *p)
>  {
>  	dequeue_task_fair(rq, p);
>  	p->on_rq = 0;
> -	enqueue_task_fair(rq, p);
> +	/*
> +	 * Temporarily insert at the last position of the tree:
> +	 */
> +	p->fair_key = LLONG_MAX;
> +	__enqueue_task_fair(rq, p);
>  	p->on_rq = 1;
> +
> +	/*
> +	 * Update the key to the real value, so that when all other
> +	 * tasks from before the rightmost position have executed,
> +	 * this task is picked up again:
> +	 */
> +	p->fair_key = rq->fair_clock - p->wait_runtime + p->nice_offset;

I don't think it's safe to change the key after inserting the element
in the tree. You end up with an unsorted tree where new entries end up
in wrong places "randomly".

I think a better approach would be to keep track of the rightmost
entry, set the key to the rightmost's key +1 and then simply insert it
there.

Esben
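[Editor's note: Esben's objection is the classic ordered-container
invariant. The toy program below, a sorted array standing in for the
rbtree (nothing here is kernel code), shows why mutating a key in place
silently breaks lookups.]

#include <stdio.h>
#include <stdlib.h>

/* Ordered containers assume keys never change while an element is
 * linked in.  Mutate a key in place, and binary search can no longer
 * find elements reliably, because its ordering assumption is violated.
 */
static int cmp(const void *a, const void *b)
{
	return *(const int *)a - *(const int *)b;
}

int main(void)
{
	int keys[] = { 10, 20, 30, 40, 50 };
	int probe = 5;

	keys[4] = 5;	/* like rewriting fair_key after insertion */

	if (!bsearch(&probe, keys, 5, sizeof keys[0], cmp))
		printf("5 is in the array, yet bsearch cannot find it\n");
	return 0;
}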
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
* Willy Tarreau <[EMAIL PROTECTED]> wrote:

> Good idea. The machine I'm typing from now has 1000 scheddos running
> at +19, and 12 gears at nice 0. [...]

> From time to time, one of the 12 aligned gears will quickly perform a
> full quarter turn while others slowly turn by a few degrees. In fact,
> while I don't know this process's CPU usage pattern, there's
> something useful in it: it allows me to visually see when processes
> accelerate/decelerate. [...]

cool idea - i have just tried this and it rocks - you can easily see
the 'nature' of CPU time distribution just via visual feedback. (Is
there any easy way to start up 12 glxgears fully aligned, or does one
always have to mouse around to get them into proper position?)

btw., i am using another method to quickly judge X's behavior: i
started the 'snowflakes' plugin in Beryl on Fedora 7, which puts a nice
smooth opengl-rendered snow fall on the desktop background. That gives
me an idea about how well X is scheduling under various workloads,
without having to instrument it explicitly.

	Ingo
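[Editor's note: on the alignment question, glxgears honors -geometry
(Willy confirms this downthread), so a dozen instances can be started
pre-tiled. The launcher below is a speculative sketch rather than
anything posted in the thread; the window size, 210-pixel spacing and
4x3 layout are arbitrary choices, and window-manager decorations will
shift things a bit.]

/* gears12.c - hypothetical launcher: 12 glxgears tiled in a 4x3 grid */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
	char geom[32];
	int i;

	for (i = 0; i < 12; i++) {
		/* 200x200 windows, 210px apart: 4 columns, 3 rows */
		snprintf(geom, sizeof(geom), "200x200+%d+%d",
			 (i % 4) * 210, (i / 4) * 210);
		switch (fork()) {
		case 0:
			execlp("glxgears", "glxgears", "-geometry", geom,
			       (char *)NULL);
			perror("execlp");
			_exit(1);
		case -1:
			perror("fork");
			exit(1);
		}
	}
	while (wait(NULL) > 0)
		;	/* reap all the gears when they exit */
	return 0;
}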
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Thu, Apr 19, 2007 at 08:38:10AM +0200, Ingo Molnar wrote:
>
> * Andrew Morton <[EMAIL PROTECTED]> wrote:
>
> > > And yes, by fairly, I mean fairly among all threads as a base
> > > resource class, because that's what Linux has always done
> >
> > Yes, there are potential compatibility problems. Example: a machine
> > with 100 busy httpd processes and suddenly a big gzip starts up from
> > console or cron.
> >
> > Under current kernels, that gzip will take ages and the httpds will
> > take a 1% slowdown, which may well be exactly the behaviour which is
> > desired.
> >
> > If we were to schedule by UID then the gzip suddenly gets 50% of the
> > CPU and those httpd's all take a 50% hit, which could be quite
> > serious.
> >
> > That's simple to fix via nicing, but people have to know to do that,
> > and there will be a transition period where some disruption is
> > possible.
>
> hm. How about the following then: default to nice -10 for all
> (SCHED_NORMAL) kernel threads and all root-owned tasks. Root _is_
> special: root already has disk space reserved to it, root has special
> memory allocation allowances, etc. I don't see a reason why we couldn't
> by default make all root tasks have nice -10. This would be instantly
> loved by sysadmins i suspect ;-)

I have no problem with doing fancy new fairness classes and things. But
considering that we _need_ per-thread fairness, which is also what the
current scheduler has and what we need to do well for obvious reasons,
the best path to take is to get per-thread scheduling up to the point
where it is able to replace the current scheduler, then look at more
complex things after that.
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
* Davide Libenzi <[EMAIL PROTECTED]> wrote:

> > That's one reason why i don't think it's necessarily a good idea to
> > group-schedule threads, we don't really want to do a per thread group
> > percpu_alloc().
>
> I still do not have a clear idea of how much overhead this will bring
> to the table, but I think (like Linus was pointing out) the hierarchy
> should look like:
>
> Top (VCPU maybe?)
>     User
>         Process
>             Thread
>
> The "run_queue" concept (and data) that is now bound to a CPU needs to
> be replicated in:
>
> ROOT   <- VCPUs add themselves here
> VCPU   <- USERs add themselves here
> USER   <- PROCs add themselves here
> PROC   <- THREADs add themselves here
> THREAD (ultimate fine grained scheduling unit)
>
> So ROOT, VCPU, USER and PROC will have their own "run_queue". Picking
> up a new task would mean:
>
> VCPU = ROOT->lookup();
> USER = VCPU->lookup();
> PROC = USER->lookup();
> THREAD = PROC->lookup();
>
> Run-time statistics should propagate back the other way around.

yeah, but this looks quite bad from an overhead POV ... i think we can
do a lot simpler to solve X and kernel threads prioritization.

> > In fact for threads the _reverse_ problem exists, threaded apps tend
> > to _strive_ for more performance - hence their desperation of using
> > the threaded programming model to begin with ;) (just think of media
> > playback apps which are typically multithreaded)
>
> The same user nicing two different multi-threaded processes would
> expect a predictable CPU distribution too. [...]

i disagree that the user 'would expect' this. Some users might. Others
would say: 'my 10-thread rendering engine is more important than a
1-thread job because it's using 10 threads for a reason'. And the CFS
feedback so far strengthens this point: the default behavior of
treating the thread as a single scheduling (and CPU time accounting)
unit works pretty well on the desktop.

think about it in another, 'kernel policy' way as well: we'd like to
_encourage_ more parallel user applications. Hurting them by accounting
all threads together sends the exact opposite message.

> [...] Doing that efficiently (the old per-cpu run-queue is pretty nice
> from many POVs) is the real challenge.

yeah.

	Ingo
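[Editor's note: to make the shape of Davide's proposal concrete, here
is a compilable toy sketch of the descend-and-pick loop. Every name in
it (struct sched_group, pick_best and so on) is invented for
illustration; neither the stock scheduler nor the CFS patch contained
anything like this.]

/* Toy model of hierarchical task picking: ROOT -> VCPU -> USER ->
 * PROC -> THREAD, where each level keeps its own "run queue" and
 * knows how to pick its most deserving child.
 */
#include <stddef.h>

struct task;				/* opaque leaf payload */

struct sched_group {
	struct sched_group *(*pick_best)(struct sched_group *g);
	struct task *task;		/* non-NULL only at THREAD level */
};

static struct task *pick_next_task(struct sched_group *root)
{
	struct sched_group *g = root;

	/* one lookup per level: ROOT, VCPU, USER, PROC, then THREAD */
	while (!g->task)
		g = g->pick_best(g);

	/* run-time statistics would propagate back up on dequeue */
	return g->task;
}

[The overhead Ingo objects to is visible even in the toy: every
scheduling decision becomes four lookups, each touching its own queue
data, instead of one.]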
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
* Andrew Morton <[EMAIL PROTECTED]> wrote:
>> Yes, there are potential compatibility problems. Example: a machine
>> with 100 busy httpd processes and suddenly a big gzip starts up from
>> console or cron. [...]

On Thu, Apr 19, 2007 at 08:38:10AM +0200, Ingo Molnar wrote:
> hm. How about the following then: default to nice -10 for all
> (SCHED_NORMAL) kernel threads and all root-owned tasks. Root _is_
> special: root already has disk space reserved to it, root has special
> memory allocation allowances, etc. I don't see a reason why we couldn't
> by default make all root tasks have nice -10. This would be instantly
> loved by sysadmins i suspect ;-)
> (distros that go the extra mile of making Xorg run under non-root could
> also go another extra one foot to renice that X server to -10.)

I'd further recommend making priority levels accessible to kernel
threads that are not otherwise accessible to processes, both above and
below user-available priority levels. Basically, if you can get
SCHED_RR and SCHED_FIFO to coexist as "intimate scheduler classes,"
then a SCHED_KERN scheduler class can coexist with SCHED_OTHER in like
fashion, but with availability of higher and lower priorities than any
userspace process is allowed, and potentially some differing scheduling
semantics. In such a manner, nonessential background processing
intended never to disturb userspace can be given priorities appropriate
to it (perhaps even Con's SCHED_IDLEPRIO would make sense), and other,
urgent processing can be given priority over userspace altogether.

I believe root's default priority can be adjusted in userspace as
things now stand somewhere in /etc/ but I'm not sure of the specifics.
Word is somewhere in /etc/security/limits.conf

-- wli
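[Editor's note: the knob wli has in mind is most likely pam_limits,
which applies /etc/security/limits.conf at login and supports a
"priority" item for the starting nice level. A fragment approximating
Ingo's "root at -10" default from userspace might look like the
following; the values are illustrative only and were not posted in the
thread, so check limits.conf(5) on your system.]

# /etc/security/limits.conf fragment (illustrative)
# <domain>   <type>   <item>      <value>
root         -        priority    -10
*            -        priority    0

[Note this only affects sessions that pass through pam_limits; kernel
threads and daemons started outside PAM are untouched, which is part of
why Ingo proposed doing it in the kernel instead.]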
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Thu, 2007-04-19 at 09:09 +0200, Ingo Molnar wrote:
> * Mike Galbraith <[EMAIL PROTECTED]> wrote:
>
> > With a heavily reniced X (perfectly fine), that should indeed solve my
> > daily usage pattern nicely (always need godmode for shells, but not
> > for mozilla and ilk. 50/50 split automatic without renice of entire
> > gui)
>
> how about the first-approximation solution i suggested in the previous
> mail: to add a per UID default nice level? (With this default defaulting
> to '-10' for all root-owned processes, and defaulting to '0' for
> everything else.) That would solve most of the current CFS regressions
> at hand.

That would make my kernel builds etc interfere with my other self's
surfing and whatnot. With it by EUID, when I'm surfing or whatnot, the
X portion of my Joe-User activity pushes the compile portion of root
down in bandwidth utilization automagically, which is exactly the right
thing, because the root me is not as important as the Joe-User me using
the GUI at that time. If the idea of X disturbing root upsets some,
they can move X to another UID. Generally, it seems perfect for here.

	-Mike
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Thu, 2007-04-19 at 08:52 +0200, Mike Galbraith wrote:
> On Wed, 2007-04-18 at 23:48 +0200, Ingo Molnar wrote:
>
> > so my current impression is that we want per UID accounting to solve
> > the X problem, the kernel threads problem and the many-users problem,
> > but i'd not want to do it for threads just yet because for them
> > there's not really any apparent problem to be solved.
>
> If you really mean UID vs EUID as Linus mentioned, I suppose I could
> learn to log in as !root, and set KDE up to always give me root shells.
>
> With a heavily reniced X (perfectly fine), that should indeed solve my
> daily usage pattern nicely (always need godmode for shells, but not for
> mozilla and ilk. 50/50 split automatic without renice of entire gui)

Backward, needs to be EUID as Linus suggested. Kernel builds etc along
with reniced X in root's bucket, surfing and whatnot in Joe-User's
bucket.

	-Mike
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
* Mike Galbraith <[EMAIL PROTECTED]> wrote:

> With a heavily reniced X (perfectly fine), that should indeed solve my
> daily usage pattern nicely (always need godmode for shells, but not
> for mozilla and ilk. 50/50 split automatic without renice of entire
> gui)

how about the first-approximation solution i suggested in the previous
mail: to add a per UID default nice level? (With this default
defaulting to '-10' for all root-owned processes, and defaulting to '0'
for everything else.) That would solve most of the current CFS
regressions at hand.

	Ingo
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Wed, 2007-04-18 at 23:48 +0200, Ingo Molnar wrote:

> so my current impression is that we want per UID accounting to solve
> the X problem, the kernel threads problem and the many-users problem,
> but i'd not want to do it for threads just yet because for them
> there's not really any apparent problem to be solved.

If you really mean UID vs EUID as Linus mentioned, I suppose I could
learn to log in as !root, and set KDE up to always give me root shells.

With a heavily reniced X (perfectly fine), that should indeed solve my
daily usage pattern nicely (always need godmode for shells, but not for
mozilla and ilk. 50/50 split automatic without renice of entire gui)

	-Mike
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
* Andrew Morton <[EMAIL PROTECTED]> wrote:

> > And yes, by fairly, I mean fairly among all threads as a base
> > resource class, because that's what Linux has always done
>
> Yes, there are potential compatibility problems. Example: a machine
> with 100 busy httpd processes and suddenly a big gzip starts up from
> console or cron.
>
> Under current kernels, that gzip will take ages and the httpds will
> take a 1% slowdown, which may well be exactly the behaviour which is
> desired.
>
> If we were to schedule by UID then the gzip suddenly gets 50% of the
> CPU and those httpd's all take a 50% hit, which could be quite
> serious.
>
> That's simple to fix via nicing, but people have to know to do that,
> and there will be a transition period where some disruption is
> possible.

hm. How about the following then: default to nice -10 for all
(SCHED_NORMAL) kernel threads and all root-owned tasks. Root _is_
special: root already has disk space reserved to it, root has special
memory allocation allowances, etc. I don't see a reason why we couldn't
by default make all root tasks have nice -10. This would be instantly
loved by sysadmins i suspect ;-)

(distros that go the extra mile of making Xorg run under non-root could
also go another extra one foot to renice that X server to -10.)

	Ingo
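[Editor's note: for concreteness, the kernel side of such a default
could be as small as the sketch below. This is purely illustrative and
not a patch from the thread; the helper name and the hook point are
invented, and it leans on the fact that a 2007-era task_struct carried
its uid field directly.]

#include <linux/sched.h>

/*
 * Hypothetical: pick a default nice level from the task's owner.
 * Kernel threads run as root, so they would inherit the -10 default
 * too, matching Ingo's "(SCHED_NORMAL) kernel threads and all
 * root-owned tasks" wording.  Where exactly this would be applied
 * (fork? exec? setuid?) is left open, as it was in the thread.
 */
static int default_nice_level(struct task_struct *p)
{
	return p->uid == 0 ? -10 : 0;
}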
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
* Andrew Morton [EMAIL PROTECTED] wrote: And yes, by fairly, I mean fairly among all threads as a base resource class, because that's what Linux has always done Yes, there are potential compatibility problems. Example: a machine with 100 busy httpd processes and suddenly a big gzip starts up from console or cron. Under current kernels, that gzip will take ages and the httpds will take a 1% slowdown, which may well be exactly the behaviour which is desired. If we were to schedule by UID then the gzip suddenly gets 50% of the CPU and those httpd's all take a 50% hit, which could be quite serious. That's simple to fix via nicing, but people have to know to do that, and there will be a transition period where some disruption is possible. h. How about the following then: default to nice -10 for all (SCHED_NORMAL) kernel threads and all root-owned tasks. Root _is_ special: root already has disk space reserved to it, root has special memory allocation allowances, etc. I dont see a reason why we couldnt by default make all root tasks have nice -10. This would be instantly loved by sysadmins i suspect ;-) (distros that go the extra mile of making Xorg run under non-root could also go another extra one foot to renice that X server to -10.) Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Wed, 2007-04-18 at 23:48 +0200, Ingo Molnar wrote: so my current impression is that we want per UID accounting to solve the X problem, the kernel threads problem and the many-users problem, but i'd not want to do it for threads just yet because for them there's not really any apparent problem to be solved. If you really mean UID vs EUID as Linus mentioned, I suppose I could learn to login as !root, and set KDE up to always give me root shells. With a heavily reniced X (perfectly fine), that should indeed solve my daily usage pattern nicely (always need godmode for shells, but not for mozilla and ilk. 50/50 split automatic without renice of entire gui) -Mike - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
* Mike Galbraith [EMAIL PROTECTED] wrote: With a heavily reniced X (perfectly fine), that should indeed solve my daily usage pattern nicely (always need godmode for shells, but not for mozilla and ilk. 50/50 split automatic without renice of entire gui) how about the first-approximation solution i suggested in the previous mail: to add a per UID default nice level? (With this default defaulting to '-10' for all root-owned processes, and defaulting to '0' for everything else.) That would solve most of the current CFS regressions at hand. Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Thu, 2007-04-19 at 08:52 +0200, Mike Galbraith wrote: On Wed, 2007-04-18 at 23:48 +0200, Ingo Molnar wrote: so my current impression is that we want per UID accounting to solve the X problem, the kernel threads problem and the many-users problem, but i'd not want to do it for threads just yet because for them there's not really any apparent problem to be solved. If you really mean UID vs EUID as Linus mentioned, I suppose I could learn to login as !root, and set KDE up to always give me root shells. With a heavily reniced X (perfectly fine), that should indeed solve my daily usage pattern nicely (always need godmode for shells, but not for mozilla and ilk. 50/50 split automatic without renice of entire gui) Backward, needs to be EUID as Linus suggested. Kernel builds etc along with reniced X in root's bucket, surfing and whatnot in Joe-User's bucket. -Mike - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Thu, 2007-04-19 at 09:09 +0200, Ingo Molnar wrote: * Mike Galbraith [EMAIL PROTECTED] wrote: With a heavily reniced X (perfectly fine), that should indeed solve my daily usage pattern nicely (always need godmode for shells, but not for mozilla and ilk. 50/50 split automatic without renice of entire gui) how about the first-approximation solution i suggested in the previous mail: to add a per UID default nice level? (With this default defaulting to '-10' for all root-owned processes, and defaulting to '0' for everything else.) That would solve most of the current CFS regressions at hand. That would make my kernel builds etc interfere with my other self's surfing and whatnot. With it by EUID, when I'm surfing or whatnot, the X portion of my Joe-User activity pushes the compile portion of root down in bandwidth utilization automagically, which is exactly the right thing, because the root me in not as important as the Joe-User me using the GUI at that time. If the idea of X disturbing root upsets some, they can move X to another UID. Generally, it seems perfect for here. -Mike - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
* Andrew Morton [EMAIL PROTECTED] wrote: Yes, there are potential compatibility problems. Example: a machine with 100 busy httpd processes and suddenly a big gzip starts up from console or cron. [...] On Thu, Apr 19, 2007 at 08:38:10AM +0200, Ingo Molnar wrote: h. How about the following then: default to nice -10 for all (SCHED_NORMAL) kernel threads and all root-owned tasks. Root _is_ special: root already has disk space reserved to it, root has special memory allocation allowances, etc. I dont see a reason why we couldnt by default make all root tasks have nice -10. This would be instantly loved by sysadmins i suspect ;-) (distros that go the extra mile of making Xorg run under non-root could also go another extra one foot to renice that X server to -10.) I'd further recommend making priority levels accessible to kernel threads that are not otherwise accessible to processes, both above and below user-available priority levels. Basically, if you can get SCHED_RR and SCHED_FIFO to coexist as intimate scheduler classes, then a SCHED_KERN scheduler class can coexist with SCHED_OTHER in like fashion, but with availability of higher and lower priorities than any userspace process is allowed, and potentially some differing scheduling semantics. In such a manner nonessential background processing intended not to ever disturb userspace can be given priorities appropriate to it (perhaps even con's SCHED_IDLEPRIO would make sense), and other, urgent processing can be given priority over userspace altogether. I believe root's default priority can be adjusted in userspace as things now stand somewhere in /etc/ but I'm not sure of the specifics. Word is somewhere in /etc/security/limits.conf -- wli - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
* Davide Libenzi [EMAIL PROTECTED] wrote: That's one reason why i dont think it's necessarily a good idea to group-schedule threads, we dont really want to do a per thread group percpu_alloc(). I still do not have clear how much overhead this will bring into the table, but I think (like Linus was pointing out) the hierarchy should look like: Top (VCPU maybe?) User Process Thread The run_queue concept (and data) that now is bound to a CPU, need to be replicated in: ROOT - VCPUs add themselves here VCPU - USERs add themselves here USER - PROCs add themselves here PROC - THREADs add themselves here THREAD (ultimate fine grained scheduling unit) So ROOT, VCPU, USER and PROC will have their own run_queue. Picking up a new task would mean: VCPU = ROOT-lookup(); USER = VCPU-lookup(); PROC = USER-lookup(); THREAD = PROC-lookup(); Run-time statistics should propagate back the other way around. yeah, but this looks quite bad from an overhead POV ... i think we can do alot simpler to solve X and kernel threads prioritization. In fact for threads the _reverse_ problem exists, threaded apps tend to _strive_ for more performance - hence their desperation of using the threaded programming model to begin with ;) (just think of media playback apps which are typically multithreaded) The same user nicing two different multi-threaded processes would expect a predictable CPU distribution too. [...] i disagree that the user 'would expect' this. Some users might. Others would say: 'my 10-thread rendering engine is more important than a 1-thread job because it's using 10 threads for a reason'. And the CFS feedback so far strengthens this point: the default behavior of treating the thread as a single scheduling (and CPU time accounting) unit works pretty well on the desktop. think about it in another, 'kernel policy' way as well: we'd like to _encourage_ more parallel user applications. Hurting them by accounting all threads together sends the exact opposite message. [...] Doing that efficently (the old per-cpu run-queue is pretty nice from many POVs) is the real challenge. yeah. Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Thu, Apr 19, 2007 at 08:38:10AM +0200, Ingo Molnar wrote: * Andrew Morton [EMAIL PROTECTED] wrote: And yes, by fairly, I mean fairly among all threads as a base resource class, because that's what Linux has always done Yes, there are potential compatibility problems. Example: a machine with 100 busy httpd processes and suddenly a big gzip starts up from console or cron. Under current kernels, that gzip will take ages and the httpds will take a 1% slowdown, which may well be exactly the behaviour which is desired. If we were to schedule by UID then the gzip suddenly gets 50% of the CPU and those httpd's all take a 50% hit, which could be quite serious. That's simple to fix via nicing, but people have to know to do that, and there will be a transition period where some disruption is possible. h. How about the following then: default to nice -10 for all (SCHED_NORMAL) kernel threads and all root-owned tasks. Root _is_ special: root already has disk space reserved to it, root has special memory allocation allowances, etc. I dont see a reason why we couldnt by default make all root tasks have nice -10. This would be instantly loved by sysadmins i suspect ;-) I have no problem with doing fancy new fairness classes and things. But considering that we _need_ to have per-thread fairness and that is also what the current scheduler has and what we need to do well for obvious reasons, the best path to take is to get per-thread scheduling up to a point where it is able to replace the current scheduler, then look at more complex things after that. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
* Willy Tarreau [EMAIL PROTECTED] wrote: Good idea. The machine I'm typing from now has 1000 scheddos running at +19, and 12 gears at nice 0. [...] From time to time, one of the 12 aligned gears will quickly perform a full quarter of round while others slowly turn by a few degrees. In fact, while I don't know this process's CPU usage pattern, there's something useful in it : it allows me to visually see when process accelerate/decelerate. [...] cool idea - i have just tried this and it rocks - you can easily see the 'nature' of CPU time distribution just via visual feedback. (Is there any easy way to start up 12 glxgears fully aligned, or does one always have to mouse around to get them into proper position?) btw., i am using another method to quickly judge X's behavior: i started the 'snowflakes' plugin in Beryl on Fedora 7, which puts a nice smooth opengl-rendered snow fall on the desktop background. That gives me an idea about how well X is scheduling under various workloads, without having to instrument it explicitly. Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: CFS and suspend2: hang in atomic copy (was: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS])
On Wed, 18 Apr 2007, Ingo Molnar wrote: * Christian Hesse [EMAIL PROTECTED] wrote: Hi Ingo and all, On Friday 13 April 2007, Ingo Molnar wrote: as usual, any sort of feedback, bugreports, fixes and suggestions are more than welcome, I just gave CFS a try on my system. From a user's point of view it looks good so far. Thanks for your work. you are welcome! However I found a problem: When trying to suspend a system patched with suspend2 2.2.9.11 it hangs with doing atomic copy. Pressing the ESC key results in a message that it tries to abort suspend, but then still hangs. i took a quick look at suspend2 and it makes some use of yield(). There's a bug in CFS's yield code, i've attached a patch that should fix it, does it make any difference to the hang? Ingo Index: linux/kernel/sched_fair.c === --- linux.orig/kernel/sched_fair.c +++ linux/kernel/sched_fair.c @@ -264,15 +264,26 @@ static void dequeue_task_fair(struct rq /* * sched_yield() support is very simple via the rbtree, we just - * dequeue and enqueue the task, which causes the task to - * roundrobin to the end of the tree: + * dequeue the task and move it to the rightmost position, which + * causes the task to roundrobin to the end of the tree. */ static void requeue_task_fair(struct rq *rq, struct task_struct *p) { dequeue_task_fair(rq, p); p-on_rq = 0; - enqueue_task_fair(rq, p); + /* +* Temporarily insert at the last position of the tree: +*/ + p-fair_key = LLONG_MAX; + __enqueue_task_fair(rq, p); p-on_rq = 1; + + /* +* Update the key to the real value, so that when all other +* tasks from before the rightmost position have executed, +* this task is picked up again: +*/ + p-fair_key = rq-fair_clock - p-wait_runtime + p-nice_offset; I don't think it safe to change the key after inserting the element in the tree. You end up with an unsorted tree giving where new entries end up in wrong places randomly. I think a better approach would be to keep track of the rightmost entry, set the key to the rightmost's key +1 and then simply insert it there. Esben - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: CFS and suspend2: hang in atomic copy (was: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS])
* Esben Nielsen [EMAIL PROTECTED] wrote: +/* + * Temporarily insert at the last position of the tree: + */ +p-fair_key = LLONG_MAX; +__enqueue_task_fair(rq, p); p-on_rq = 1; + +/* + * Update the key to the real value, so that when all other + * tasks from before the rightmost position have executed, + * this task is picked up again: + */ +p-fair_key = rq-fair_clock - p-wait_runtime + p-nice_offset; I don't think it safe to change the key after inserting the element in the tree. You end up with an unsorted tree giving where new entries end up in wrong places randomly. yeah, indeed. I hoped that once this rightmost entry is removed (as soon as it gets scheduled next time) the tree goes back to a correct shape, but that's not the case - the left sub-tree and the right sub-tree is merged by the rbtree code with the assumption that the entry had a correct key. I think a better approach would be to keep track of the rightmost entry, set the key to the rightmost's key +1 and then simply insert it there. yeah. I had that implemented at a stage but was trying to be too clever for my own good ;-) Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: CFS and suspend2: hang in atomic copy (was: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS])
* Ingo Molnar [EMAIL PROTECTED] wrote: I think a better approach would be to keep track of the rightmost entry, set the key to the rightmost's key +1 and then simply insert it there. yeah. I had that implemented at a stage but was trying to be too clever for my own good ;-) i have fixed it via the patch below. (I'm using rb_last() because that way the normal scheduling codepaths are not burdened with the maintainance of a rightmost entry.) Ingo --- kernel/sched.c |3 ++- kernel/sched_fair.c | 24 +--- 2 files changed, 15 insertions(+), 12 deletions(-) Index: linux/kernel/sched.c === --- linux.orig/kernel/sched.c +++ linux/kernel/sched.c @@ -3806,7 +3806,8 @@ asmlinkage long sys_sched_yield(void) schedstat_inc(rq, yld_cnt); if (rq-nr_running == 1) schedstat_inc(rq, yld_act_empty); - current-sched_class-yield_task(rq, current); + else + current-sched_class-yield_task(rq, current); /* * Since we are going to call schedule() anyway, there's Index: linux/kernel/sched_fair.c === --- linux.orig/kernel/sched_fair.c +++ linux/kernel/sched_fair.c @@ -275,21 +275,23 @@ static void dequeue_task_fair(struct rq */ static void yield_task_fair(struct rq *rq, struct task_struct *p) { + struct rb_node *entry; + struct task_struct *last; + dequeue_task_fair(rq, p); p-on_rq = 0; + /* -* Temporarily insert at the last position of the tree: +* Temporarily insert at the last position of the tree. +* The key will be updated back to (near) its old value +* when the task gets scheduled. */ - p-fair_key = LLONG_MAX; + entry = rb_last(rq-tasks_timeline); + last = rb_entry(entry, struct task_struct, run_node); + + p-fair_key = last-fair_key + 1; __enqueue_task_fair(rq, p); p-on_rq = 1; - - /* -* Update the key to the real value, so that when all other -* tasks from before the rightmost position have executed, -* this task is picked up again: -*/ - p-fair_key = rq-fair_clock - p-wait_runtime + p-nice_offset; } /* - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
William Lee Irwin III wrote:
> * Andrew Morton <[EMAIL PROTECTED]> wrote:
>> Yes, there are potential compatibility problems. Example: a machine with 100 busy httpd processes and suddenly a big gzip starts up from console or cron. [...]
>
> On Thu, Apr 19, 2007 at 08:38:10AM +0200, Ingo Molnar wrote:
>> hm. How about the following then: default to nice -10 for all (SCHED_NORMAL) kernel threads and all root-owned tasks. Root _is_ special: root already has disk space reserved to it, root has special memory allocation allowances, etc. I don't see a reason why we couldn't by default make all root tasks have nice -10. This would be instantly loved by sysadmins, I suspect ;-) (distros that go the extra mile of making Xorg run under non-root could also go another extra foot to renice that X server to -10.)
>
> I'd further recommend making priority levels accessible to kernel threads that are not otherwise accessible to processes, both above and below user-available priority levels. Basically, if you can get SCHED_RR and SCHED_FIFO to coexist as intimate scheduler classes, then a SCHED_KERN scheduler class can coexist with SCHED_OTHER in like fashion, but with availability of higher and lower priorities than any userspace process is allowed, and potentially some differing scheduling semantics. In such a manner nonessential background processing intended not to ever disturb userspace can be given priorities appropriate to it (perhaps even Con's SCHED_IDLEPRIO would make sense), and other, urgent processing can be given priority over userspace altogether. I believe root's default priority can be adjusted in userspace as things now stand, somewhere in /etc/, but I'm not sure of the specifics. Word is it's somewhere in /etc/security/limits.conf.

This is sounding very much like System V Release 4 (and descendants), except that they call it SCHED_SYS and also give SCHED_NORMAL tasks that are in system mode dynamic priorities in the SCHED_SYS range (to avoid priority inversion, I believe).

Peter
--
Peter Williams <[EMAIL PROTECTED]>

"Learning, n. The kind of ignorance distinguishing the studious." -- Ambrose Bierce
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
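[For that last pointer: pam_limits reads /etc/security/limits.conf with `<domain> <type> <item> <value>` lines, and on many distributions a "priority" item sets the default nice level applied at session open. A sketch only; whether your pam_limits honors "priority" is an assumption worth checking first:]

	# /etc/security/limits.conf -- sketch; syntax: <domain> <type> <item> <value>
	root    -    priority   -10    # run root's sessions at nice -10
	@users  -    priority     0    # everyone else stays at the default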
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Hi Ingo,

On Thu, Apr 19, 2007 at 11:01:44AM +0200, Ingo Molnar wrote:
> * Willy Tarreau <[EMAIL PROTECTED]> wrote:
>> Good idea. The machine I'm typing from now has 1000 scheddos running at +19, and 12 gears at nice 0. [...] From time to time, one of the 12 aligned gears will quickly perform a full quarter of round while others slowly turn by a few degrees. In fact, while I don't know this process's CPU usage pattern, there's something useful in it: it allows me to visually see when processes accelerate/decelerate. [...]
>
> cool idea - I have just tried this and it rocks - you can easily see the 'nature' of CPU time distribution just via visual feedback. (Is there any easy way to start up 12 glxgears fully aligned, or does one always have to mouse around to get them into proper position?)

-- Replying quickly, I'm short on time --

You can certainly script it with -geometry. But it is the wrong application for this matter, because you benchmark X more than glxgears itself. What would be better is something like a line rotating 360 degrees and doing some short stuff between each degree, so that X is not much solicited, but the CPU would be spent more on the processes themselves. Benchmarking interactions between X and multiple clients is a completely different test IMHO. Glxgears is between those two, making it inappropriate for scheduler tuning.

Regards,
Willy
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
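[For the scripting Willy mentions, a shell sketch like the following would start twelve gears in a fixed 4x3 grid. The window size and offsets are arbitrary assumptions; it relies only on the -geometry option Willy refers to, in the usual X WxH+X+Y form:]

	#!/bin/sh
	# start 12 glxgears instances in a 4x3 grid of 300x300 windows
	for i in 0 1 2 3 4 5 6 7 8 9 10 11; do
	    x=$(( (i % 4) * 300 ))
	    y=$(( (i / 4) * 300 ))
	    glxgears -geometry 300x300+$x+$y &
	done
	wait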
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
* Willy Tarreau <[EMAIL PROTECTED]> wrote:

> You can certainly script it with -geometry. But it is the wrong application for this matter, because you benchmark X more than glxgears itself. What would be better is something like a line rotating 360 degrees and doing some short stuff between each degree, so that X is not much solicited, but the CPU would be spent more on the processes themselves.

at least on my setup glxgears goes via DRI/DRM so there's no X scheduling in between at all, and the visual appearance of glxgears is a direct function of its scheduling.

	Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Thu, 19 Apr 2007, Ingo Molnar wrote:

> I disagree that the user 'would expect' this. Some users might. Others would say: 'my 10-thread rendering engine is more important than a 1-thread job because it's using 10 threads for a reason'. And the CFS feedback so far strengthens this point: the default behavior of treating the thread as a single scheduling (and CPU time accounting) unit works pretty well on the desktop. think about it in another, 'kernel policy' way as well: we'd like to _encourage_ more parallel user applications. Hurting them by accounting all threads together sends the exact opposite message.

There are counter arguments too. Like, not every user knows if a certain process is MT or not. I agree though that doing accounting and fairness at a depth lower than USER is messy, and not only for performance.

- Davide
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Thu, 19 Apr 2007, Mike Galbraith wrote:

> On Thu, 2007-04-19 at 09:09 +0200, Ingo Molnar wrote:
>> * Mike Galbraith <[EMAIL PROTECTED]> wrote:
>>> With a heavily reniced X (perfectly fine), that should indeed solve my daily usage pattern nicely (always need godmode for shells, but not for mozilla and ilk. 50/50 split automatic without renice of entire gui)
>>
>> how about the first-approximation solution I suggested in the previous mail: to add a per UID default nice level? (With this default defaulting to '-10' for all root-owned processes, and defaulting to '0' for everything else.) That would solve most of the current CFS regressions at hand.
>
> That would make my kernel builds etc interfere with my other self's surfing and whatnot. With it by EUID, when I'm surfing or whatnot, the X portion of my Joe-User activity pushes the compile portion of root down in bandwidth utilization automagically, which is exactly the right thing, because the root me is not as important as the Joe-User me using the GUI at that time. If the idea of X disturbing root upsets some, they can move X to another UID. Generally, it seems perfect for here.

Now guys, I did not follow the whole lengthy and feisty thread, but IIRC Con's scheduler has been attacked because, among other arguments, it was requiring X to be reniced. This happened like a month ago IINM. I did not have time to look at Con's scheduler, and I only had a brief look at Ingo's one (looks very promising IMO, but so was the initial O(1) post before all the corner-case fixes went in). But this is not about technical merit, this is about applying the same rules of judgement to others as well as to ourselves. We went from "renicing X to -10 is bad because the scheduler should be able to correctly handle the problem w/out additional external plugs" to the totally opposite "let's renice to -10 X, the whole SCHED_NORMAL kthreads class, and on top of that all the tasks owned by root" [1]. From a spectator POV like myself in this case, this looks rather unfair.

[1] I think, before and now, that that's more a duct-tape patch than a real solution. OTOH if the solution is gonna be another maze of macros and heuristics filled with pretty bad corner cases, I may prefer the former.

- Davide
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Thursday 19 April 2007, Ingo Molnar wrote:

> * Willy Tarreau <[EMAIL PROTECTED]> wrote:
>> Good idea. The machine I'm typing from now has 1000 scheddos running at +19, and 12 gears at nice 0. [...] From time to time, one of the 12 aligned gears will quickly perform a full quarter of round while others slowly turn by a few degrees. In fact, while I don't know this process's CPU usage pattern, there's something useful in it: it allows me to visually see when processes accelerate/decelerate. [...]
>
> cool idea - I have just tried this and it rocks - you can easily see the 'nature' of CPU time distribution just via visual feedback. (Is there any easy way to start up 12 glxgears fully aligned, or does one always have to mouse around to get them into proper position?)
>
> btw., I am using another method to quickly judge X's behavior: I started the 'snowflakes' plugin in Beryl on Fedora 7, which puts a nice smooth opengl-rendered snow fall on the desktop background. That gives me an idea about how well X is scheduling under various workloads, without having to instrument it explicitly.

yes, it's a cute idea, till you switch away from that screen to check progress on something else, like to compose this message.

===
5913 frames in 5.0 seconds = 1182.499 FPS
6238 frames in 5.0 seconds = 1247.556 FPS
11380 frames in 5.0 seconds = 2275.905 FPS
10691 frames in 5.0 seconds = 2138.173 FPS
8707 frames in 5.0 seconds = 1741.305 FPS
10669 frames in 5.0 seconds = 2133.708 FPS
11392 frames in 5.0 seconds = 2278.037 FPS
11379 frames in 5.0 seconds = 2275.711 FPS
11310 frames in 5.0 seconds = 2261.861 FPS
11386 frames in 5.0 seconds = 2277.081 FPS
11292 frames in 5.0 seconds = 2258.353 FPS
11352 frames in 5.0 seconds = 2270.297 FPS
11415 frames in 5.0 seconds = 2282.886 FPS
11406 frames in 5.0 seconds = 2281.037 FPS
11483 frames in 5.0 seconds = 2296.533 FPS
11510 frames in 5.0 seconds = 2301.883 FPS
11123 frames in 5.0 seconds = 2224.266 FPS
8980 frames in 5.0 seconds = 1795.861 FPS
===

The over-2000fps reports were while I was either looking at htop, or starting this message, both on different screens. htop said it was using 95+ % of the cpu even when its display was going to /dev/null. So, 'kewl' as it is, it doesn't get us apples-to-apples numbers we could take to the window and bet win-place-show on. FWIW, running the nvidia-9755 drivers here. So if we are going to use that as a judgement operator, it obviously needs some intelligently applied scaling before the numbers are worth more than a subjective feel.

> 	Ingo
> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/

--
Cheers, Gene

"There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author)

"The confusion of a staff member is measured by the length of his memos." -- New York Times, Jan. 20, 1981
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Thursday 19 April 2007, Ingo Molnar wrote:

> * Willy Tarreau <[EMAIL PROTECTED]> wrote:
>> You can certainly script it with -geometry. But it is the wrong application for this matter, because you benchmark X more than glxgears itself. What would be better is something like a line rotating 360 degrees and doing some short stuff between each degree, so that X is not much solicited, but the CPU would be spent more on the processes themselves.
>
> at least on my setup glxgears goes via DRI/DRM so there's no X scheduling in between at all, and the visual appearance of glxgears is a direct function of its scheduling.
>
> 	Ingo

That doesn't appear to be the case here, Ingo. Even when I know the rest of the system is lagged, glxgears continues to show very smooth and steady movement.

--
Cheers, Gene

"There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author)

"Yow! I just went below the poverty line!"
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
In article <[EMAIL PROTECTED]> you wrote:
> Top (VCPU maybe?)
> User
> Process
> Thread

The problem with that is that not all schedulers might work at the user level. You can think of batch/job, parent, group, session or namespace levels. That would be, IMHO, a generic Top, with no need for a level above.

Greetings
Bernd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Thu, Apr 19, 2007 at 05:18:03PM +0200, Ingo Molnar wrote:
> * Willy Tarreau <[EMAIL PROTECTED]> wrote:
>> You can certainly script it with -geometry. But it is the wrong application for this matter, because you benchmark X more than glxgears itself. What would be better is something like a line rotating 360 degrees and doing some short stuff between each degree, so that X is not much solicited, but the CPU would be spent more on the processes themselves.
>
> at least on my setup glxgears goes via DRI/DRM so there's no X scheduling in between at all, and the visual appearance of glxgears is a direct function of its scheduling.

OK, I thought that something looking like a clock would be useful, especially if we could tune the amount of CPU spent per task instead of being limited by graphics drivers. I searched freshmeat for a clock and found orbitclock by Jeremy Weatherford, which was exactly what I was looking for:
- small
- C only
- X11 only
- needed less than 5 minutes and no knowledge of X11 for the complete hack!

=> Kudos to its author, sincerely!

I hacked it a bit to make it accept two parameters:
-R run_time_in_microseconds : time spent burning CPU cycles at each round
-S sleep_time_in_microseconds : time spent getting a rest

It now advances what it thinks is a second at each iteration, so that it makes it easy to compare its progress with other instances (there are seconds, minutes and hours, so it's easy to visually count up to around 43200). The modified code is here: http://linux.1wt.eu/sched/orbitclock-0.2bench.tgz

What is interesting to note is that it's easy to make X work a lot (99%) by using 0 as the sleeping time, and it's easy to make the process work a lot by using large values for the running time associated with very low values (or 0) for the sleep time.

Ah, and it supports -geometry ;-)

It could become a useful scheduler benchmark!

Have fun!
Willy
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
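[A quick illustration of how such instances might be run side by side. The -R/-S semantics and -geometry support are as Willy describes; the particular values and window placement are arbitrary assumptions:]

	# a CPU-hungry clock: burn 10ms per round, never sleep
	./orbitclock -R 10000 -S 0 -geometry 120x120+0+0 &

	# a mostly-idle clock: burn 1ms per round, then sleep 99ms
	./orbitclock -R 1000 -S 99000 -geometry 120x120+130+0 &

	# under a fair scheduler both should tick smoothly; watching which
	# hand lags under load shows visually who is getting starved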
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Thursday 19 April 2007 18:18, Ingo Molnar wrote: * Willy Tarreau [EMAIL PROTECTED] wrote: You can certainly script it with -geometry. But it is the wrong application for this matter, because you benchmark X more than glxgears itself. What would be better is something like a line rotating 360 degrees and doing some short stuff between each degree, so that X is not much sollicitated, but the CPU would be spent more on the processes themselves. at least on my setup glxgears goes via DRI/DRM so there's no X scheduling inbetween at all, and the visual appearance of glxgears is a direct function of its scheduling. How much of the subjective interactiveness-feel of the desktop is at the mercy of the X server's scheduling and not the cpu scheduler? I've noticed that video playback is significantly smoother and resistant to other load, when using MPlayer's opengl output, especially if heavy programs are running at the same time. Especially firefox and ksysguard seem to have found a way to cause video through Xv to look annoyingly jittery. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Fri, Apr 20, 2007 at 02:52:38AM +0300, Jan Knutar wrote: On Thursday 19 April 2007 18:18, Ingo Molnar wrote: * Willy Tarreau [EMAIL PROTECTED] wrote: You can certainly script it with -geometry. But it is the wrong application for this matter, because you benchmark X more than glxgears itself. What would be better is something like a line rotating 360 degrees and doing some short stuff between each degree, so that X is not much sollicitated, but the CPU would be spent more on the processes themselves. at least on my setup glxgears goes via DRI/DRM so there's no X scheduling inbetween at all, and the visual appearance of glxgears is a direct function of its scheduling. How much of the subjective interactiveness-feel of the desktop is at the mercy of the X server's scheduling and not the cpu scheduler? probably a lot. Hence the reason why I wanted something visually noticeable but using far less X resources than glxgears. The modified orbitclock is perfect IMHO. Regards, Willy - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Thu, 2007-04-19 at 09:55 -0700, Davide Libenzi wrote:

> On Thu, 19 Apr 2007, Mike Galbraith wrote:
>> On Thu, 2007-04-19 at 09:09 +0200, Ingo Molnar wrote:
>>> * Mike Galbraith <[EMAIL PROTECTED]> wrote:
>>>> With a heavily reniced X (perfectly fine), that should indeed solve my daily usage pattern nicely (always need godmode for shells, but not for mozilla and ilk. 50/50 split automatic without renice of entire gui)
>>>
>>> how about the first-approximation solution I suggested in the previous mail: to add a per UID default nice level? (With this default defaulting to '-10' for all root-owned processes, and defaulting to '0' for everything else.) That would solve most of the current CFS regressions at hand.
>>
>> That would make my kernel builds etc interfere with my other self's surfing and whatnot. With it by EUID, when I'm surfing or whatnot, the X portion of my Joe-User activity pushes the compile portion of root down in bandwidth utilization automagically, which is exactly the right thing, because the root me is not as important as the Joe-User me using the GUI at that time. If the idea of X disturbing root upsets some, they can move X to another UID. Generally, it seems perfect for here.
>
> Now guys, I did not follow the whole lengthy and feisty thread, but IIRC Con's scheduler has been attacked because, among other arguments, it was requiring X to be reniced. This happened like a month ago IINM.

I don't object to renicing X if you want it to receive _more_ than its fair share. I do object to having to renice X in order for it to _get_ its fair share. That's what I attacked.

> I did not have time to look at Con's scheduler, and I only had a brief look at Ingo's one (looks very promising IMO, but so was the initial O(1) post before all the corner-case fixes went in). But this is not about technical merit, this is about applying the same rules of judgement to others as well as to ourselves.

I'm running the same tests with CFS that I ran for RSDL/SD. It falls short in one key area (to me) in that X+client cannot yet split my box 50/50 with two concurrent tasks. In the CFS case, renicing both X and client does work, but it should not be necessary IMHO. With RSDL/SD renicing didn't help.

> We went from "renicing X to -10 is bad because the scheduler should be able to correctly handle the problem w/out additional external plugs" to the totally opposite "let's renice to -10 X, the whole SCHED_NORMAL kthreads class, and on top of that all the tasks owned by root" [1]. From a spectator POV like myself in this case, this looks rather unfair.

Well, for me, the renicing I mentioned above is only interesting as a way to improve long term fairness with schedulers with no history. I found Linus' EUID idea intriguing in that by putting the server together with a steady load in one 'fair' domain, and clients in another, X can, if prioritized to empower it to do so, modulate the steady load in its domain (but can't starve it!), the clients modulate X, and the steady load gets it all when X and clients are idle. The nice level of X determines to what _extent_ X can modulate the constant load, rather like a mixer slider. The synchronous (I'm told) nature of X/client then becomes kind of an asset to the desktop instead of a liability. The specific case I was thinking about is the X+Gforce test where both RSDL and CFS fail to provide fairness (as defined by me ;). X and Gforce are mostly not concurrent. The make -j2 I put them up against are mostly concurrent.
I don't call giving 1/3 of my CPU to X+Client fair at _all_, but that's what you'll get if your fairstick of the instant generally can't see the fourth competing task. Seemed pretty cool to me because it creates the missing connection between client and server, though also likely complicated (and maybe full of perils, who knows). -Mike - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
William Lee Irwin III wrote: I'd further recommend making priority levels accessible to kernel threads that are not otherwise accessible to processes, both above and below user-available priority levels. Basically, if you can get SCHED_RR and SCHED_FIFO to coexist as intimate scheduler classes, then a SCHED_KERN scheduler class can coexist with SCHED_OTHER in like fashion, but with availability of higher and lower priorities than any userspace process is allowed, and potentially some differing scheduling semantics. In such a manner nonessential background processing intended not to ever disturb userspace can be given priorities appropriate to it (perhaps even con's SCHED_IDLEPRIO would make sense), and other, urgent processing can be given priority over userspace altogether. On Thu, Apr 19, 2007 at 09:50:19PM +1000, Peter Williams wrote: This is sounding very much like System V Release 4 (and descendants) except that they call it SCHED_SYS and also give SCHED_NORMAL tasks that are in system mode dynamic priorities in the SCHED_SYS range (to avoid priority inversion, I believe). Descriptions of that are probably where I got the idea (hurrah for OS textbooks). It makes a fair amount of sense. Not sure what the take on the specific precedent is. The only content here is expanding the priority range with ranges above and below for the exclusive use of ultra-privileged tasks, so it's really trivial. Actually it might be so trivial it should just be some permission checks in the SCHED_OTHER renicing code. -- wli - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
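[A sketch of how trivial such a permission check could be. The extended range, the constants, and the helper are hypothetical; only can_nice() and CAP_SYS_NICE are existing kernel pieces, and kernel threads are identified here by the era's !p->mm idiom:]

	/* hypothetical extended nice range: user tasks stay in [-20, 19],
	 * kernel threads and privileged tasks may use the wider [-30, 29] */
	#define NICE_USER_MIN	-20
	#define NICE_USER_MAX	 19
	#define NICE_KERN_MIN	-30
	#define NICE_KERN_MAX	 29

	static int nice_level_allowed(struct task_struct *p, long nice)
	{
		if (nice < NICE_KERN_MIN || nice > NICE_KERN_MAX)
			return 0;
		if (nice >= NICE_USER_MIN && nice <= NICE_USER_MAX)
			return can_nice(p, nice);	/* existing RLIMIT/capability rules */
		/* outside the user band: kernel threads and the ultra-privileged only */
		return !p->mm || capable(CAP_SYS_NICE);
	}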
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Thu, 19 Apr 2007 05:18:07 +0200 Nick Piggin <[EMAIL PROTECTED]> wrote: > And yes, by fairly, I mean fairly among all threads as a base resource > class, because that's what Linux has always done Yes, there are potential compatibility problems. Example: a machine with 100 busy httpd processes and suddenly a big gzip starts up from console or cron. Under current kernels, that gzip will take ages and the httpds will take a 1% slowdown, which may well be exactly the behaviour which is desired. If we were to schedule by UID then the gzip suddenly gets 50% of the CPU and those httpd's all take a 50% hit, which could be quite serious. That's simple to fix via nicing, but people have to know to do that, and there will be a transition period where some disruption is possible. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Wed, Apr 18, 2007 at 10:49:45PM +1000, Con Kolivas wrote:
> On Wednesday 18 April 2007 22:13, Nick Piggin wrote:
> >
> > The kernel compile (make -j8 on 4 thread system) is doing 1800 total
> > context switches per second (450/s per runqueue) for cfs, and 670
> > for mainline. Going up to 20ms granularity for cfs brings the context
> > switch numbers similar, but user time is still a % or so higher. I'd
> > be more worried about compute heavy threads which naturally don't do
> > much context switching.
>
> While kernel compiles are nice and easy to do I've seen enough criticism of
> them in the past to wonder about their usefulness as a standard benchmark on
> their own.

Actually it is a real workload for most kernel developers, including you no doubt :)

The criticisms of kernbench for the kernel are probably fair in that kernel compiles don't exercise a lot of kernel functionality (page allocator and fault paths mostly, IIRC). However as far as I'm concerned, they're great for testing the CPU scheduler, because it doesn't actually matter whether you're running in userspace or kernel space for a context switch to blow your caches. The results are quite stable. You could actually make up a benchmark that hurts a whole lot more from context switching, but I figure that kernbench is a real world thing that shows it up quite well.

> > Some other numbers on the same system
> > Hackbench:		2.6.21-rc7	cfs-v2 1ms[*]	nicksched
> > 10 groups: Time:	1.332		0.743		0.607
> > 20 groups: Time:	1.197		1.100		1.241
> > 30 groups: Time:	1.754		2.376		1.834
> > 40 groups: Time:	3.451		2.227		2.503
> > 50 groups: Time:	3.726		3.399		3.220
> > 60 groups: Time:	3.548		4.567		3.668
> > 70 groups: Time:	4.206		4.905		4.314
> > 80 groups: Time:	4.551		6.324		4.879
> > 90 groups: Time:	7.904		6.962		5.335
> > 100 groups: Time:	7.293		7.799		5.857
> > 110 groups: Time:	10.595		8.728		6.517
> > 120 groups: Time:	7.543		9.304		7.082
> > 130 groups: Time:	8.269		10.639		8.007
> > 140 groups: Time:	11.867		8.250		8.302
> > 150 groups: Time:	14.852		8.656		8.662
> > 160 groups: Time:	9.648		9.313		9.541
>
> Hackbench even more so. In a prolonged discussion with Rusty Russell on this
> issue, he suggested hackbench was more a pass/fail benchmark to ensure there
> was no starvation scenario that never ended, and that very little value should
> be placed on the actual results returned from it.

Yeah, cfs seems to do a little worse than nicksched here, but I include the numbers not because I think that is significant, but to show mainline's poor characteristics.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Wed, Apr 18, 2007 at 07:48:21AM -0700, Linus Torvalds wrote: > > > On Wed, 18 Apr 2007, Matt Mackall wrote: > > > > Why is X special? Because it does work on behalf of other processes? > > Lots of things do this. Perhaps a scheduler should focus entirely on > > the implicit and directed wakeup matrix and optimizing that > > instead[1]. > > I 100% agree - the perfect scheduler would indeed take into account where > the wakeups come from, and try to "weigh" processes that help other > processes make progress more. That would naturally give server processes > more CPU power, because they help others > > I don't believe for a second that "fairness" means "give everybody the > same amount of CPU". That's a totally illogical measure of fairness. All > processes are _not_ created equal. I believe that unless the kernel is told of these inequalities, then it must schedule fairly. And yes, by fairly, I mean fairly among all threads as a base resource class, because that's what Linux has always done (and if you aggregate into higher classes, you still need that per-thread scheduling). So I'm not excluding extra scheduling classes like per-process, per-user, but among any class of equal schedulable entities, fair scheduling is the only option because the alternative of unfairness is just insane. > That said, even trying to do "fairness by effective user ID" would > probably already do a lot. In a desktop environment, X would get as much > CPU time as the user processes, simply because it's in a different > protection domain (and that's really what "effective user ID" means: it's > not about "users", it's really about "protection domains"). > > And "fairness by euid" is probably a hell of a lot easier to do than > trying to figure out the wakeup matrix. Well my X server has an euid of root, which would mean my X clients can cause X to do work and eat into root's resources. Or as Ingo said, X may not be running as root. Seems like just another hack to try to implicitly solve the X problem and probably create a lot of others along the way. All fairness issues aside, in the context of keeping a very heavily loaded desktop interactive, X is special. That you are trying to think up funny rules that would implicitly give X better priority is kind of indicative of that. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Ingo Molnar wrote:
> * Peter Williams <[EMAIL PROTECTED]> wrote:
>>> And my scheduler for example cuts down the amount of policy code and code size significantly.
>>
>> Yours is one of the smaller patches mainly because you perpetuate (or you did in the last one I looked at) the (horrible to my eyes) dual array (active/expired) mechanism. That this idea was bad should have been apparent to all as soon as the decision was made to excuse some tasks from being moved from the active array to the expired array. This essentially meant that there would be circumstances where extreme unfairness (to the extent of starvation in some cases) would occur -- the very thing that the mechanism was originally designed to prevent (as far as I can gather). Right about then in the development of the O(1) scheduler, alternative solutions should have been sought.
>
> in hindsight I'd agree.

Hindsight's a wonderful place, isn't it :-) and, of course, it's where I was making my comments from.

> But back then we were clearly not ready for fine-grained accurate statistics + trees (CPUs are a lot faster at more complex arithmetic today, plus people still believed that low-res could be done well enough), and taking out any of these two concepts from CFS would result in a similarly complex runqueue implementation.

I disagree. The single priority array with a promotion mechanism that I use in the SPA schedulers can do the job of avoiding starvation with no measurable increase in the overhead. Fairness, nice, and good interactive responsiveness can then be managed by how you determine tasks' dynamic priorities.

> Also, the array switch was just thought to be another piece of 'if the heuristics go wrong, we fall back to an array switch' logic, right in line with the other heuristics. And you have to accept it, mainline's ability to auto-renice make -j jobs (and other CPU hogs) was quite a plus for developers, so it had (and probably still has) quite some inertia.

I agree, it wasn't totally useless, especially for the average user. My main problem with it was that the effect of "nice" wasn't consistent or predictable enough for reliable resource allocation. I also agree with the aims of the various heuristics, i.e. you have to be unfair and give some tasks preferential treatment in order to give the users the type of responsiveness that they want. It's just a shame that it got broken in the process but, as you say, it's easier to see these things in hindsight than in the middle of the melee.

Peter
--
Peter Williams <[EMAIL PROTECTED]>

"Learning, n. The kind of ignorance distinguishing the studious." -- Ambrose Bierce
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
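[A minimal sketch of what such a single-array promotion mechanism could look like. This is illustrative only, not Peter's actual SPA code: it assumes an O(1)-style prio_array (bitmap plus per-priority lists), and it relies on the running task being requeued at its static priority when its slice expires -- that requeue is what makes the uniform promotion of waiters matter relatively:]

	/*
	 * Every promotion interval, move all queued (i.e. waiting) tasks one
	 * priority level up.  Any waiting task eventually reaches the top of
	 * the array and gets picked, so starvation is impossible.
	 */
	static void promote_waiters(struct prio_array *array)
	{
		int idx;

		/* never promote into the realtime priority range */
		for (idx = MAX_RT_PRIO + 1; idx < MAX_PRIO; idx++) {
			if (list_empty(array->queue + idx))
				continue;
			/* splice the whole level one slot up: O(1) per level */
			list_splice_init(array->queue + idx, array->queue + idx - 1);
			__set_bit(idx - 1, array->bitmap);
			__clear_bit(idx, array->bitmap);
		}
	}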
Re: [ck] Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Chris Friesen wrote: Mark Glines wrote: One minor question: is it even possible to be completely fair on SMP? For instance, if you have a 2-way SMP box running 3 applications, one of which has 2 threads, will the threaded app have an advantage here? (The current system seems to try to keep each thread on a specific CPU, to reduce cache thrashing, which means threads and processes alike each get 50% of the CPU.) I think the ideal in this case would be to have both threads on one cpu, with the other app on the other cpu. This gives inter-process fairness while minimizing the amount of task migration required. Solving this sort of issue was one of the reasons for the smpnice patches. More interesting is the case of three processes on a 2-cpu system. Do we constantly migrate one of them back and forth to ensure that each of them gets 66% of a cpu? Depends how keen you are on fairness. Unless the process are long term continuously active tasks that never sleep it's probably not an issue as they'll probably move around enough in the long term for them each to get 66% over the long term. Exact load balancing for real work loads (where tasks are coming and going, sleeping and waking semi randomly and over relatively brief periods) is probably unattainable because by the time you've work out the ideal placement of the currently runnable tasks on the available CPUs it's all changed and the solution is invalid. The best you can hope for that change isn't so great as to completely invalidate the solution and the changes you make as a result are an improvement on the current allocation of processes to CPUs. The above probably doesn't hold for some systems such as those large super computer jobs that run for several days but they're probably best served by explicit allocation of processes to CPUs using the process affinity mechanism. Peter -- Peter Williams [EMAIL PROTECTED] "Learning, n. The kind of ignorance distinguishing the studious." -- Ambrose Bierce - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Wed, 18 Apr 2007, Davide Libenzi wrote: > > I know, we agree there. But that did not fit my "Pirates of the Caribbean" > quote :) Ahh, I'm clearly not cultured enough, I didn't catch that reference. Linus "yes, I've seen the movie, but it apparently left more of a mark in other people" Torvalds - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Linus Torvalds wrote:
> On Wed, 18 Apr 2007, Matt Mackall wrote:
>> On Wed, Apr 18, 2007 at 07:48:21AM -0700, Linus Torvalds wrote:
>>> And "fairness by euid" is probably a hell of a lot easier to do than trying to figure out the wakeup matrix.
>>
>> For the record, you actually don't need to track a whole NxN matrix (or do the implied O(n**3) matrix inversion!) to get to the same result.
>
> I'm sure you can do things differently, but the reason I think "fairness by euid" is actually worth looking at is that it's pretty much the *identical* issue that we'll have with "fairness by virtual machine" and a number of other "container" issues.
>
> The fact is:
>
> - "fairness" is *not* about giving everybody the same amount of CPU time (scaled by some niceness level or not). Anybody who thinks that is "fair" is just being silly and hasn't thought it through.
>
> - "fairness" is multi-level. You want to be fair to threads within a thread group (where "process" may be one good approximation of what a "thread group" is, but not necessarily the only one). But you *also* want to be fair in between those "thread groups", and then you want to be fair across "containers" (where "user" may be one such container).
>
> So I claim that anything that cannot be fair by user ID is actually really REALLY unfair. I think it's absolutely humongously STUPID to call something the "Completely Fair Scheduler", and then just be fair on a thread level. That's not fair AT ALL! It's the antithesis of being fair!
>
> So if you have 2 users on a machine running CPU hogs, you should *first* try to be fair among users. If one user then runs 5 programs, and the other one runs just 1, then the *one* program should get 50% of the CPU time (the user's fair share), and the five programs should get 10% of CPU time each. And if one of them uses two threads, each thread should get 5%.
>
> So you should see one thread get 50% CPU (single thread of one user), 4 threads get 10% CPU (their fair share of that user's time), and 2 threads get 5% CPU (the fair share within that thread group!).
>
> Any scheduling argument that just considers the above to be "7 threads total" and gives each thread 14% of CPU time "fairly" is *anything* but fair. It's a joke if that kind of scheduler then calls itself CFS! And yes, that's largely what the current scheduler will do, but at least the current scheduler doesn't claim to be fair! So the current scheduler is a lot *better* if only in the sense that it doesn't make ridiculous claims that aren't true!
>
> 		Linus

Sounds a lot like the PLFS (process level fair sharing) scheduler in Aurema's ARMTech (for whom I used to work). The "fair" in the title is a bit misleading, as it's all about unfair scheduling in order to meet specific policies. But it's based on the principle that if you can allocate CPU bandwidth "fairly" (which really means in proportion to the entitlement each process is allocated) then you can allocate CPU bandwidth "fairly" between higher level entities such as process groups, user groups and so on by subdividing the entitlements downwards.

The tricky part of implementing this was the fact that not all entities at the various levels have sufficient demand for CPU bandwidth to use their entitlements, and this in turn means that the entities above them will have difficulty using their entitlements even if others of their subordinates have sufficient demand (because their entitlements will be too small).
The trick is to have a measure of each entity's demand for CPU bandwidth and use that to modify the way entitlement is divided among subordinates. As a first guess, an entity's CPU bandwidth usage is an indicator of demand, but it doesn't take into account unmet demand from tasks sitting on a run queue waiting for access to the CPU. On the other hand, usage plus time waiting on the queue isn't a good measure of demand either (although it's probably a good upper bound), as it's unlikely that the task would have used as much CPU as the waiting time if it had gone straight to the CPU.

But my main point is that it is possible to build schedulers that can achieve higher level scheduling policies. Versions of PLFS work on Windows from user space by twiddling process priorities. Part of my more recent work at Aurema had been involved in patching Linux's scheduler so that nice worked more predictably, so that we could release a user space version of PLFS for Linux. The other part was to add hard CPU bandwidth caps for processes, so that ARMTech could enforce hard CPU bandwidth caps on higher level entities (as this can't be done without the kernel being able to do it at that level).

Peter
--
Peter Williams <[EMAIL PROTECTED]>

"Learning, n. The kind of ignorance distinguishing the studious." -- Ambrose Bierce
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
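[A toy sketch of the demand-weighted subdivision Peter describes. Purely illustrative: the names, the 0..100 demand scale, and the formula are assumptions, not ARMTech's actual algorithm.]

	/* each schedulable entity: a user, a process group, a process, ... */
	struct entity {
		unsigned long entitlement;	/* configured share, arbitrary units */
		unsigned long demand;		/* recent demand estimate, 0..100 */
		unsigned long effective;	/* computed share of the parent */
	};

	/*
	 * Divide parent_share among children, weighting each child's
	 * configured entitlement by how much CPU it actually wants, so an
	 * idle child's unused share flows to its busy siblings.
	 */
	static void divide_entitlement(unsigned long parent_share,
				       struct entity *child, int nr)
	{
		unsigned long weight, sum = 0;
		int i;

		for (i = 0; i < nr; i++)
			sum += child[i].entitlement * child[i].demand;

		for (i = 0; i < nr; i++) {
			weight = child[i].entitlement * child[i].demand;
			child[i].effective = sum ? parent_share * weight / sum : 0;
		}
	}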
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Wed, 18 Apr 2007, Linus Torvalds wrote: > On Wed, 18 Apr 2007, Davide Libenzi wrote: > > > > "Perhaps on the rare occasion pursuing the right course demands an act of > > unfairness, unfairness itself can be the right course?" > > I don't think that's the right issue. > > It's just that "fairness" != "equal". > > Do you think it "fair" to pay everybody the same regardless of how good a > job they do? I don't think anybody really believes that. > > Equating "fair" and "equal" is simply a very fundamental mistake. They're > not the same thing. Never have been, and never will. I know, we agree there. But that did not fit my "Pirates of the Caribbean" quote :) - Davide - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Wed, 18 Apr 2007, Ingo Molnar wrote:

> That's one reason why I don't think it's necessarily a good idea to group-schedule threads, we don't really want to do a per thread group percpu_alloc().

It is still not clear to me how much overhead this will bring to the table, but I think (like Linus was pointing out) the hierarchy should look like:

Top (VCPU maybe?)
User
Process
Thread

The "run_queue" concept (and data) that is now bound to a CPU needs to be replicated in:

ROOT    <- VCPUs add themselves here
VCPU    <- USERs add themselves here
USER    <- PROCs add themselves here
PROC    <- THREADs add themselves here
THREAD  (ultimate fine-grained scheduling unit)

So ROOT, VCPU, USER and PROC will have their own "run_queue". Picking up a new task would mean:

VCPU = ROOT->lookup();
USER = VCPU->lookup();
PROC = USER->lookup();
THREAD = PROC->lookup();

Run-time statistics should propagate back the other way around.

> In fact for threads the _reverse_ problem exists, threaded apps tend to _strive_ for more performance - hence their desperation of using the threaded programming model to begin with ;) (just think of media playback apps which are typically multithreaded)

The same user nicing two different multi-threaded processes would expect a predictable CPU distribution too. Doing that efficiently (the old per-cpu run-queue is pretty nice from many POVs) is the real challenge.

- Davide
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
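[A rough sketch of that lookup chain in code. Illustrative only: the node layout and names are assumptions layered on the CFS rbtree approach, not anything from the posted patch.]

	/* one node per scheduling level: ROOT, VCPU, USER, PROC, THREAD */
	struct sched_node {
		struct rb_root queue;		/* runnable children, sorted by key */
		struct rb_node run_node;	/* our link in the parent's queue */
		u64 fair_key;			/* same key rule applied at every level */
		struct task_struct *task;	/* non-NULL only at the THREAD level */
	};

	static struct task_struct *pick_next(struct sched_node *node)
	{
		/* descend ROOT -> VCPU -> USER -> PROC -> THREAD, taking the
		 * leftmost (smallest-key) child at each level */
		while (!node->task) {
			struct rb_node *left = rb_first(&node->queue);

			if (!left)
				return NULL;	/* nothing runnable below us */
			node = rb_entry(left, struct sched_node, run_node);
		}
		return node->task;
	}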
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Wednesday 18 April 2007 22:33, Con Kolivas wrote: > On Wednesday 18 April 2007 22:14, Nick Piggin wrote: > > On Wed, Apr 18, 2007 at 07:33:56PM +1000, Con Kolivas wrote: > > > On Wednesday 18 April 2007 18:55, Nick Piggin wrote: > > > > Again, for comparison 2.6.21-rc7 mainline: > > > > > > > > 508.87user 32.47system 2:17.82elapsed 392%CPU > > > > 509.05user 32.25system 2:17.84elapsed 392%CPU > > > > 508.75user 32.26system 2:17.83elapsed 392%CPU > > > > 508.63user 32.17system 2:17.88elapsed 392%CPU > > > > 509.01user 32.26system 2:17.90elapsed 392%CPU > > > > 509.08user 32.20system 2:17.95elapsed 392%CPU > > > > > > > > So looking at elapsed time, a granularity of 100ms is just behind the > > > > mainline score. However it is using slightly less user time and > > > > slightly more idle time, which indicates that balancing might have > > > > got a bit less aggressive. > > > > > > > > But anyway, it conclusively shows the efficiency impact of such tiny > > > > timeslices. > > > > > > See test.kernel.org for how (the now defunct) SD was performing on > > > kernbench. It had low latency _and_ equivalent throughput to mainline. > > > Set the standard appropriately on both counts please. > > > > I can give it a run. Got an updated patch against -rc7? > > I said I wasn't pursuing it but since you're offering, the rc6 patch should > apply ok. > > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc6-sd-0.40.patch Oh and if you go to the effort of trying you may as well try the timeslice tweak to see what effect it has on SD as well. /proc/sys/kernel/rr_interval 100 is the highest. -- -ck - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
* Davide Libenzi <[EMAIL PROTECTED]> wrote:

> I think Ingo's idea of a new sched_group to contain the generic parameters needed for the "key" calculation works better than adding more fields to existing structures (which would, of course, host pointers to it). Otherwise I can already see the struct_signal being the target for other unrelated fields :)

yeah. Another detail is that for global containers like uids, the statistics will have to be percpu_alloc()-ed, both for correctness (runqueues are per CPU) and for performance.

That's one reason why I don't think it's necessarily a good idea to group-schedule threads, we don't really want to do a per thread group percpu_alloc(). In fact for threads the _reverse_ problem exists, threaded apps tend to _strive_ for more performance - hence their desperation of using the threaded programming model to begin with ;) (just think of media playback apps which are typically multithreaded)

I don't think threads are all that different. Also, the resource-conserving act of using CLONE_VM to share the VM (and to use a different programming environment like Java) should not be 'punished' by forcing the thread group to be accounted as a single, shared entity against other 'fat' tasks.

so my current impression is that we want per UID accounting to solve the X problem, the kernel threads problem and the many-users problem, but I'd not want to do it for threads just yet because for them there's not really any apparent problem to be solved.

	Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
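[What the percpu_alloc()-ed per-UID statistics might look like, as a minimal sketch. The struct, field and function names are assumptions (including the new pointer in struct user_struct); only alloc_percpu()/per_cpu_ptr()/smp_processor_id() are the real kernel API:]

	/* hypothetical per-UID scheduling statistics, one copy per CPU so the
	 * accounting fast path never bounces cachelines between runqueues */
	struct uid_sched_stats {
		s64 wait_runtime;
		u64 sum_exec_runtime;
	};

	static int user_sched_init(struct user_struct *up)
	{
		up->sched_stats = alloc_percpu(struct uid_sched_stats);
		return up->sched_stats ? 0 : -ENOMEM;
	}

	/* accounting path: touch only the local CPU's copy */
	static void uid_charge_wait(struct user_struct *up, s64 delta)
	{
		per_cpu_ptr(up->sched_stats, smp_processor_id())->wait_runtime += delta;
	}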
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> > perhaps a more fitting term would be 'precise group-scheduling'. Within the lowest level task group entity (be that thread group or uid group, etc.) 'precise scheduling' is equivalent to 'fairness'.
>
> Yes. Absolutely. Except I think that at least if you're going to name something "complete" (or "perfect" or "precise"), you should also admit that groups can be hierarchical.

yes. Am I correct to sum up your impression as:

" Ingo, for you the hierarchy still appears to be an after-thought, while in practice it's easily the most important thing! Why are you so hung up about 'fairness', it makes no sense! "

right? and you would definitely be right if you suggested that I neglected the 'group scheduling' aspects of CFS (except for a minimalistic nice level implementation, which is a poor man's non-automatic group-scheduling), but I very much know it's important and I'll definitely fix it for -v4. But please let me explain my reasons for my different focus:

yes, group scheduling in practice is the most important first-layer thing, and without it any of the other 'CFS wins' can easily be useless.

Firstly, I have not neglected the group scheduling related CFS regressions at all, mainly because there _is_ already a quick hack to check whether group scheduling would solve these regressions: renice. And it was tried in both of the two CFS regression cases I'm aware of: Mike's X starvation problem and Willy's "kevents starvation with thousands of scheddos tasks running" problem. And in both cases, applying the renice hack [which should be properly and automatically implemented as uid group scheduling] fixed the regression for them! So I was not worried at all, group scheduling _provably solves_ these CFS regressions. I rather concentrated on the CFS regressions that were much less clear.

But PLEASE believe me: even with perfect cross-group CPU allocation but with a simple non-heuristic scheduler underlying it, you can _easily_ get a sucky desktop experience! I know it because I tried it, and others tried it too. (in fact the first version of sched_fair.c was tick based and low-res, and it sucked) Two more things were needed:

- the high precision of nsec/64-bit accounting ('reliability of scheduling')
- extremely even time-distribution of CPU power ('determinism/smoothness, human perception')

(I'm expanding on these two concepts further below.)

take out any of these and, group scheduling or not, you are easily going to have a sucky desktop! (We know that from years of experiments: many people tried to rip out the unfairness from the scheduler and there were always nasty corner cases that 'should' have worked but didn't.) Without these we'd in essence start again at square one, just at a different square, this time with another group of people being irritated!

But the biggest and hardest to achieve _wins_ of CFS are _NOT_ achieved via a simple 'get rid of the unfairness of the upstream scheduler and apply group scheduling'. (I know that because I tried it before, and because others tried it before, for many many years.) You will _easily_ get a sucky desktop experience. The other two things are very much needed too:

- the high precision of nsec/64-bit accounting, and the many corner-cases this solves. (For example on a typical desktop there are _lots_ of timing-driven workloads that are in essence 'invisible' to low-resolution, timer-tick based accounting and are heavily skewed.)

- extremely even time-distribution of CPU power.
CFS behaves pretty well even under the dreaded 'make -jN in an xterm' kernel build workload as reported by Mark Lord, because it also distributes CPU power in a _finegrained_ way. A shell prompt under CFS still behaves acceptably on a single-CPU testbox of mine with a "make -j50" workload. (yes, fifty)

Humans react a lot more negatively to sudden changes in application behavior ('lags', pauses, short hangs) than they react to fine, gradual, all-encompassing slowdowns. This is a key property of CFS. ( Otherwise renicing X to -10 would have solved most of the interactivity complaints against the vanilla scheduler, otherwise renicing X to -10 would have fixed Mike's setup under SD (it didn't) while it worked much better under CFS, otherwise Gene wouldn't have found CFS markedly better than SD, etc., etc. So getting rid of the heuristics is less than 50% of the road to the perfect desktop scheduler. )

and I claim that these were the really hard bits, and I spent most of the CFS coding only on getting _these_ details 100% right under various workloads, and it makes a night and day difference _even without any group scheduling help_.

and note another reason here: group scheduling _masks_ many other scheduling deficiencies that are possible in a scheduler. So since CFS doesn't do group scheduling, I get a _fuller_
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Wed, 18 Apr 2007, Davide Libenzi wrote:
>
> "Perhaps on the rare occasion pursuing the right course demands an act of unfairness, unfairness itself can be the right course?"

I don't think that's the right issue.

It's just that "fairness" != "equal". Do you think it "fair" to pay everybody the same regardless of how good a job they do? I don't think anybody really believes that. Equating "fair" and "equal" is simply a very fundamental mistake. They're not the same thing. Never have been, and never will.

Now, there's no question that "equal" is much easier to implement, if only because it's a lot easier to agree on what it means. "Equal parts" is something everybody can agree on. "Fair parts" automatically involves a balancing act, and people will invariably count things differently and thus disagree about what is "fair" and what is not. I don't think we can ever get a "perfect" setup for that reason, but I think we can get something that at least gets reasonably close, at least for the obvious cases.

So my suggested test-case of running one process as one user and two processes as another one has a fairly "obviously correct" solution if you have just one CPU, and you can probably be pretty fair in practice on two CPUs (there's an obvious theoretical solution; whether you can get there with a practical algorithm is another thing). On three or more CPUs, you obviously wouldn't even *want* to be fair, since you can very naturally just give a CPU to each.

		Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Wed, 18 Apr 2007, Linus Torvalds wrote:

> For example, maybe we can approximate it by spreading out the statistics: right now you have things like
>
> - last_ran, wait_runtime, sum_wait_runtime..
>
> be per-thread things. Maybe some of those can be spread out, so that you put a part of them in the "struct vm_struct" thing (to approximate processes), part of them in the "struct user" struct (to approximate the user-level thing), and part of it in a per-container thing for when/if we support that kind of thing?

I think Ingo's idea of a new sched_group to contain the generic parameters needed for the "key" calculation works better than adding more fields to existing structures (which would, of course, host pointers to it). Otherwise I can already see the struct_signal being the target for other unrelated fields :)

- Davide
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Wed, 18 Apr 2007, Linus Torvalds wrote: > I'm not arguing against fairness. I'm arguing against YOUR notion of > fairness, which is obviously bogus. It is *not* fair to try to give out > CPU time evenly! "Perhaps on the rare occasion pursuing the right course demands an act of unfairness, unfairness itself can be the right course?" - Davide - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Wed, 18 Apr 2007, William Lee Irwin III wrote:
> Thinking of the scheduler as a CPU bandwidth allocator, this means
> handing out shares of CPU bandwidth to all users on the system, which
> in turn hand out shares of bandwidth to all sessions, which in turn
> hand out shares of bandwidth to all process groups, which in turn hand
> out shares of bandwidth to all thread groups, which in turn hand out
> shares of bandwidth to threads. The event handlers for the scheduler
> need not deal with this apart from task creation and exit and various
> sorts of process ID changes (e.g. setsid(), setpgrp(), setuid(), etc.).

Yes, it really becomes a hierarchical problem once you consider users and processes. The top level sees that a "user" can be scheduled (puts itself on the virtual run queue) and passes the ball to the "process" scheduler inside the "user" container, down to maybe "threads". With all the "key" calculation parameters kept at each level (with up-propagation).

- Davide
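To make the hand-out chain concrete, here is a toy (not from the thread) that computes per-thread shares when bandwidth is split equally at each level of a simplified user -> process -> thread hierarchy; the workload mix is an invented example:

        #include <stdio.h>

        int main(void)
        {
                /* User A runs one single-threaded process; user B runs
                 * one process with 10 threads.  Bandwidth is halved at
                 * the user level, then split equally at each level below. */
                double user_share = 1.0 / 2;                    /* two users */
                double a_thread   = user_share / 1 / 1;         /* 1 process, 1 thread   */
                double b_thread   = user_share / 1 / 10;        /* 1 process, 10 threads */

                printf("user A's only thread:     %4.1f%%\n", 100 * a_thread);
                printf("each of user B's threads: %4.1f%%\n", 100 * b_thread);
                return 0;       /* prints 50.0% and 5.0% */
        }

Note how per-user fairness falls out automatically: user B's ten threads cannot crowd out user A's one, because the split happens above the thread level.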
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> For example, maybe we can approximate it by spreading out the
> statistics: right now you have things like
>
> - last_ran, wait_runtime, sum_wait_runtime..
>
> be per-thread things. [...]

yes, yes, yes! :) My thinking is "struct sched_group" embedded into _arbitrary_ other resource containers and abstractions, which sched_groups then form a simple hierarchy and are driven by the core scheduling machinery.

> [...] Maybe some of those can be spread out, so that you put a part of
> them in the "struct vm_struct" thing (to approximate processes), part
> of them in the "struct user" struct (to approximate the user-level
> thing), and part of it in a per-container thing for when/if we support
> that kind of thing?

yes.

	Ingo
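A rough sketch of that embedding variant, assuming the group carries a parent pointer so that embedded instances chain into the hierarchy the core machinery walks; as before, kernel-style types are assumed and all names are illustrative:

        /* The group embeds directly into whatever container wants to
         * be a scheduling entity; parent links form the hierarchy. */
        struct sched_group {
                struct sched_group *parent;     /* NULL at the top level */
                u64     last_ran;               /* stats formerly per-thread */
                s64     wait_runtime;
                s64     sum_wait_runtime;
        };

        struct user_struct {
                /* ... existing fields ... */
                struct sched_group sg;          /* per-uid entity */
        };

        struct mm_struct {
                /* ... existing fields ... */
                struct sched_group sg;          /* per-process entity;
                                                 * parent = owning user's sg */
        };

The contrast with the pointer variant sketched earlier is mostly about lifetime: embedding ties the group to its container, while a refcounted standalone object can be shared across containers.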
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Wed, 18 Apr 2007, Ingo Molnar wrote:
>
> perhaps a more fitting term would be 'precise group-scheduling'. Within
> the lowest level task group entity (be that thread group or uid group,
> etc.) 'precise scheduling' is equivalent to 'fairness'.

Yes. Absolutely. Except I think that at least if you're going to name something "complete" (or "perfect" or "precise"), you should also admit that groups can be hierarchical.

The "threads in a process" thing is a great example of a hierarchical group. Imagine if X was running as a collection of threads - then each server thread would no longer be more important than the clients! But if you have a mix of "bags of threads" and "single process" kinds of applications, then very arguably the single thread in a single traditional process should get as much time as the "bag of threads" process gets in total.

So it really should be a hierarchical notion, where each thread is owned by one "process", and each process is owned by one "user", and each user is in one "virtual machine" - there are at least three different levels to this, and you'd want to schedule this thing top-down: virtual machines should be given CPU time "fairly" (which doesn't need to mean "equally", of course - nice-values could very well work at that level too), and then within each virtual machine users or "scheduling groups" should be scheduled fairly, and then within each scheduling group the processes should be scheduled, and within each process threads should equally get their fair share at _that_ level.

And no, I don't think we necessarily need to do something quite that elaborate. But I think that's the kind of "obviously good goal" to keep in mind. Can we perhaps _approximate_ something like that by other means?

For example, maybe we can approximate it by spreading out the statistics: right now you have things like

 - last_ran, wait_runtime, sum_wait_runtime..

be per-thread things. Maybe some of those can be spread out, so that you put a part of them in the "struct vm_struct" thing (to approximate processes), part of them in the "struct user" struct (to approximate the user-level thing), and part of it in a per-container thing for when/if we support that kind of thing?

IOW, I don't think the scheduling "groups" have to be explicit boxes or anything like that. I suspect you can make do with just heuristics that penalize tasks sharing the same "struct user" and "struct vm_struct" for getting overly much scheduling time, and you'll get the same _effect_.

And I don't think it's wrong to look at the "one hundred processes by the same user" case as being an important case. But it should not be the *only* case or even necessarily the *main* case that matters. I think a benchmark that literally does

        #include <stdlib.h>
        #include <unistd.h>

        int main(void)
        {
                pid_t pid = fork();
                if (pid < 0) exit(1);
                if (pid) {
                        if (setuid(500) < 0) exit(2);
                        for (;;) /* Do nothing */;
                }
                if (setuid(501) < 0) exit(3);
                fork();
                for (;;) /* Do nothing in two processes */;
        }

is a really valid benchmark: if the scheduler gives 25% of the time to each of the two processes of user 501, and 50% to user 500, then THAT is a good scheduler.

If somebody wants to actually write and test the above as a test-script, and add it to a collection of scheduler tests, I think that could be a good thing.

Linus
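Following up on that invitation, a sketch of what such a test could look like, written in C rather than as a script: it spawns the three spinners, lets them compete for a fixed window, and samples each child's utime from /proc. The uids match the example above; the 10-second window and the helper names are arbitrary choices, and it has to run as root for the setuid() calls to succeed.

        #include <stdio.h>
        #include <stdlib.h>
        #include <unistd.h>
        #include <signal.h>
        #include <sys/wait.h>

        static pid_t spawn_spinner(uid_t uid)
        {
                pid_t pid = fork();

                if (pid < 0)
                        exit(1);
                if (pid == 0) {
                        if (setuid(uid) < 0)
                                exit(2);
                        for (;;)
                                ;       /* burn CPU */
                }
                return pid;
        }

        static long utime_ticks(pid_t pid)
        {
                char path[64];
                unsigned long utime;
                FILE *f;

                snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);
                f = fopen(path, "r");
                if (!f)
                        return -1;
                /* utime is field 14 of /proc/<pid>/stat; skip 13 tokens */
                if (fscanf(f, "%*s %*s %*s %*s %*s %*s %*s %*s %*s %*s "
                              "%*s %*s %*s %lu", &utime) != 1)
                        utime = 0;
                fclose(f);
                return (long)utime;
        }

        int main(void)
        {
                pid_t a  = spawn_spinner(500);  /* one task as uid 500  */
                pid_t b1 = spawn_spinner(501);  /* two tasks as uid 501 */
                pid_t b2 = spawn_spinner(501);

                sleep(10);      /* let them compete for a while */

                printf("uid 500: %ld ticks (want ~50%%)\n", utime_ticks(a));
                printf("uid 501: %ld + %ld ticks (want ~25%% each)\n",
                       utime_ticks(b1), utime_ticks(b2));

                kill(a, SIGKILL);
                kill(b1, SIGKILL);
                kill(b2, SIGKILL);
                while (wait(NULL) > 0)
                        ;       /* reap all children */
                return 0;
        }

On a one-CPU box, a scheduler that is fair per-uid should report roughly a 2:1:1 tick ratio; a scheduler that is fair per-task will report roughly 1:1:1.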
Re: [ck] Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Mark Glines wrote:
> One minor question: is it even possible to be completely fair on SMP?
> For instance, if you have a 2-way SMP box running 3 applications, one of
> which has 2 threads, will the threaded app have an advantage here? (The
> current system seems to try to keep each thread on a specific CPU, to
> reduce cache thrashing, which means threads and processes alike each get
> 50% of the CPU.)

I think the ideal in this case would be to have both threads on one CPU, with the other app on the other CPU. This gives inter-process fairness while minimizing the amount of task migration required.

More interesting is the case of three processes on a 2-CPU system. Do we constantly migrate one of them back and forth to ensure that each of them gets 66% of a CPU?

Chris
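A toy simulation (not from the thread) of exactly that rotation: in each slice two of the three tasks run, one per CPU, and the one left waiting rotates. Each task converges on two-thirds of a CPU, at the cost of a migration every slice:

        #include <stdio.h>

        int main(void)
        {
                double ran[3] = { 0, 0, 0 };
                int slice, t;

                for (slice = 0; slice < 300; slice++) {
                        int waiting = slice % 3;        /* rotate who sits out */

                        for (t = 0; t < 3; t++)
                                if (t != waiting)
                                        ran[t] += 1.0;  /* a full CPU-slice each */
                }
                for (t = 0; t < 3; t++)
                        printf("task %d: %.1f%% of a CPU\n",
                               t, 100.0 * ran[t] / 300);
                return 0;       /* prints 66.7% for each task */
        }

Whether the fairness gain is worth the constant cache-cooling migrations is precisely the trade-off being asked about.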
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Wed, 18 Apr 2007, Ingo Molnar wrote:
>
> But note that most of the reported CFS interactivity wins, as surprising
> as it might be, were due to fairness between _the same user's tasks_.

And *ALL* of the CFS interactivity *losses* and complaints have been because it did the wrong thing _between different users' tasks_. So what's your point? Your point was that when people try it out as a single user, it is indeed fair. But that's no point at all, since it totally missed _my_ point.

The problems with X scheduling are exactly that "other user" kind of thing. The problem with kernel thread starvation due to user threads getting all the CPU time is exactly the same issue. As long as you think that all threads are equal, and should be treated equally, you CANNOT make it work well. People can say "ok, you can renice X", but the whole problem stems from the fact that you're trying to be fair based on A TOTALLY INVALID NOTION of what "fair" is.

> In the typical case, 99% of the desktop CPU time is executed either as X
> (root user) or under the uid of the logged in user, and X is just one
> task.

So? You are ignoring the argument again. You're totally bringing up a red herring:

> Even with a bad hack of making X super-high-prio, interactivity as
> experienced by users still sucks without having fairness between the
> other 100-200 user tasks that a desktop system is typically using.

I didn't say that you should be *unfair* within one user group. What kind of *idiotic* argument are you trying to put forth?

OF COURSE you should be fair "within the user group". Nobody contests that the "other 100-200 user tasks" should be scheduled fairly _amongst themselves_. The only point I had was that you cannot just lump all threads together and say "these threads are equally important". The 100-200 user tasks may be equally important, and should get equal amounts of preference, but that has absolutely _zero_ bearing on the _single_ task run in another "scheduling group", ie by other users or by X.

I'm not arguing against fairness. I'm arguing against YOUR notion of fairness, which is obviously bogus. It is *not* fair to try to give out CPU time evenly!

Linus
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On 4/18/07, Matt Mackall <[EMAIL PROTECTED]> wrote:
> For the record, you actually don't need to track a whole NxN matrix (or
> do the implied O(n**3) matrix inversion!) to get to the same result. You
> can converge on the same node weightings (ie dynamic priorities) by
> applying a damped function at each transition point (directed wakeup,
> preemption, fork, exit).
>
> The trouble with any scheme like this is that it needs careful tuning of
> the damping factor to converge rapidly and not oscillate and precise
> numerical attention to the transition functions so that the sum of
> dynamic priorities is conserved.

That would be the control theory approach. And yes, you have to get both the theoretical transfer function and the numerics right. It sometimes helps to use a control-systems framework like the classic Takagi-Sugeno-Kang fuzzy logic controller; get the numerics right once and for all, and treat the heuristics as data, not logic. (I haven't worked in this area in almost twenty years, but Google -- yes, I do use Google+brain for fact-checking; what do you do? -- says that people are still doing active research on TSK models, and solid fixed-point reference implementations are readily available.)

That seems like an attractive strategy here because you could easily embed the control engine in the kernel and load rule sets dynamically. Done right, that could give most of the advantages of pluggable schedulers (different heuristic strokes for different folks) without diluting the tester pool for the actual engine code. (Of course, different scheduling strategies require different input data, and you might not want the overhead of collecting data that your chosen heuristics won't use. But that's not much different from the netfilter situation, and is obviously a solvable problem, if anyone cares to put that much work in. The people who ought to be funding this kind of work are Sun and IBM, who don't have a chance on the desktop and are in big trouble in the database tier; their future as processor vendors depends on being able to service presentation-tier and business-logic-tier loads efficiently on their massively multi-core chips. MIPS should pitch in too, on behalf of licensees like Cavium who need more predictable behavior on multi-core embedded Linux.)

Note also that you might not even want to persistently prioritize particular processes or process groups. You might want a heuristic that notices that some task (say, the X server) often responds to being awakened by doing a little work and then unblocking the task that awakened it. When it is pinged from some highly interactive task, you want it to jump the scheduler queue just long enough to unblock the interactive task, which may mean letting it flush some work out of its internal queue. But otherwise you want to batch things up until there's too much "scheduler pressure" behind it, then let it work more or less until it runs out of things to do, because its working set is so large that repeatedly scheduling it in and out is hell on caches.

(Priority inheritance is the classic solution to the blocked-high-priority-task problem _in_isolation_. It is not without its pitfalls, especially when the designer of the "server" didn't expect to lose his timeslice instantly on releasing the lock. True priority inheritance is probably not something you want to inflict on a non-real-time system, but you do need some urgency heuristic.
What a "fuzzy logic" framework does for you is to let you combine competing heuristics in a way that remains amenable to analysis using control theory techniques.) What does any of this have to do with "fairness"? Nothing whatsoever! There's work that has to be done, and choosing when to do it is almost entirely a matter of staying out of the way of more urgent work while minimizing the task's negative impact on the rest of the system. Does that mean that the X server is "special", kind of the way that latency-sensitive A/V applications are "special", and belongs in a separate scheduler class? No. Nowadays, workloads where the kernel has any idea what tasks belong to what "users" are the exception, not the norm. The X server is the canary in the coal mine, and a scheduler that won't do the right thing for X without hand tweaking won't do the right thing for other eyeball-driven, multiple-tiers-on-one-box scenarios either. If you want fairness among users to the extent that their demands _compete_, you might as well partition the whole machine, and have a separate fairness-oriented scheduler (let's call it a "hypervisor") that lives outside the kernel. (Talk about two students running gcc on the same shell server, with more important people also doing things on the same system, is so 1990's!) Not that the design of scheduler heuristics shouldn't include "fairness"-like considerations; but they're probably only interesting as a fallback for when the scheduler has no idea what it ought to schedule next. So why is Ingo's
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Wed, 18 Apr 2007, Matt Mackall wrote:
> On Wed, Apr 18, 2007 at 07:48:21AM -0700, Linus Torvalds wrote:
> > And "fairness by euid" is probably a hell of a lot easier to do than
> > trying to figure out the wakeup matrix.
>
> For the record, you actually don't need to track a whole NxN matrix
> (or do the implied O(n**3) matrix inversion!) to get to the same
> result. You can converge on the same node weightings (ie dynamic
> priorities) by applying a damped function at each transition point
> (directed wakeup, preemption, fork, exit).
>
> The trouble with any scheme like this is that it needs careful tuning
> of the damping factor to converge rapidly and not oscillate and
> precise numerical attention to the transition functions so that the sum
> of dynamic priorities is conserved.

Doing that inside the boundaries of the time constraints imposed by a scheduler is the interesting part, given also that the size (and members) of the matrix are dynamic. Also, a "wakeup matrix" (if the name correctly pictures what it is for) would help with latencies and priority inheritance, but not with global fairness.

The maniacal fairness focus we're seeing now is due to the fact that mainline can have extremely unfair behaviour under certain conditions. IMO fairness, although important, should not be the main objective of a scheduler rewrite. Simplification and predictability should have higher priority, with interactivity achievements bound to decent fairness constraints.

- Davide
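For what the "damped function at each transition point" might look like in code, here is a minimal sketch, purely illustrative and with an arbitrary damping divisor: on each directed wakeup a fraction of the boost differential moves from waker to wakee, so the system-wide sum is conserved by construction, and a small enough fraction keeps the weights from oscillating.

        /* Dynamic-priority boost per scheduling entity; boosts start
         * out summing to zero and every transfer conserves the sum. */
        struct sched_boost {
                long boost;
        };

        #define DAMP_DIV 8      /* move 1/8th of the differential per event */

        static void wakeup_transfer(struct sched_boost *waker,
                                    struct sched_boost *wakee)
        {
                long delta = (waker->boost - wakee->boost) / DAMP_DIV;

                waker->boost -= delta;  /* what one side gives up ...   */
                wakee->boost += delta;  /* ... the other side receives. */
        }

Tuning DAMP_DIV is exactly the convergence-versus-oscillation trade-off described above, and doing the update in O(1) per event is what keeps it inside a scheduler's time budget.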