Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-21 Thread Bill Davidsen

Ingo Molnar wrote:

> * Davide Libenzi <[EMAIL PROTECTED]> wrote:
>
>> The same user nicing two different multi-threaded processes would 
>> expect a predictable CPU distribution too. [...]
>
> i disagree that the user 'would expect' this. Some users might. Others 
> would say: 'my 10-thread rendering engine is more important than a 
> 1-thread job because it's using 10 threads for a reason'. And the CFS 
> feedback so far strengthens this point: the default behavior of treating 
> the thread as a single scheduling (and CPU time accounting) unit works 
> pretty well on the desktop.


If by desktop you mean "one and only one interactive user," that's true. 
On a shared machine it's very hard to preserve any semblance of fairness 
when one user gets far more than another, based not on the value of what 
they're doing but on the tools they use to do it.


> think about it in another, 'kernel policy' way as well: we'd like to 
> _encourage_ more parallel user applications. Hurting them by accounting 
> all threads together sends the exact opposite message.


Why is that? There are lots of things which are intrinsically single 
threaded; how are we hurting multi-threaded applications by 
refusing to give them more CPU than an application running on behalf of 
another user? By accounting all threads together we encourage writing an 
application in the most logical way. Threads are a solution, not a goal 
in themselves.


>> [...] Doing that efficiently (the old per-cpu run-queue is pretty nice 
>> from many POVs) is the real challenge.
>
> yeah.
>
> Ingo



--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-21 Thread Bill Davidsen

Linus Torvalds wrote:


> On Wed, 18 Apr 2007, Matt Mackall wrote:
>
>> Why is X special? Because it does work on behalf of other processes?
>> Lots of things do this. Perhaps a scheduler should focus entirely on
>> the implicit and directed wakeup matrix and optimizing that
>> instead[1].
>
> I 100% agree - the perfect scheduler would indeed take into account where 
> the wakeups come from, and try to "weigh" processes that help other 
> processes make progress more. That would naturally give server processes 
> more CPU power, because they help others.
>
> I don't believe for a second that "fairness" means "give everybody the 
> same amount of CPU". That's a totally illogical measure of fairness. All 
> processes are _not_ created equal.
>
> That said, even trying to do "fairness by effective user ID" would 
> probably already do a lot. In a desktop environment, X would get as much 
> CPU time as the user processes, simply because it's in a different 
> protection domain (and that's really what "effective user ID" means: it's 
> not about "users", it's really about "protection domains").
>
> And "fairness by euid" is probably a hell of a lot easier to do than 
> trying to figure out the wakeup matrix.


You probably want to consider the controlling terminal as well...  do 
you want to have people starting 'at' jobs competing on equal footing 
with people typing at a terminal? I'm not offering an answer, just 
raising the question.


And for some database applications, everyone in a group may connect with 
the same login-id, then do sub-authorization within the database 
application. euid may be an issue there as well.
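
As a rough illustration of what "fairness by euid" would mean in
practice, here is a minimal user-space sketch, not scheduler code: the
runnable set, the euid values and the equal-split policy are all
assumptions made up for the example. It splits one CPU equally between
protection domains and then between the tasks inside each domain:

#include <stdio.h>

/* Toy model: each runnable task belongs to one euid (protection domain). */
struct task { const char *name; unsigned euid; };

int main(void)
{
    /* Hypothetical runnable set: X (euid 0) plus three tasks of user 1000. */
    struct task rq[] = {
        { "Xorg",    0    },
        { "firefox", 1000 },
        { "make",    1000 },
        { "cc1",     1000 },
    };
    int n = sizeof(rq) / sizeof(rq[0]);
    unsigned euids[16];
    int tasks_in[16], domains = 0, i, d;

    /* Count distinct euids and how many runnable tasks each one has. */
    for (i = 0; i < n; i++) {
        for (d = 0; d < domains; d++)
            if (euids[d] == rq[i].euid)
                break;
        if (d == domains) {
            euids[domains] = rq[i].euid;
            tasks_in[domains++] = 0;
        }
        tasks_in[d]++;
    }

    /* Each domain gets an equal slice; tasks split their domain's slice. */
    for (i = 0; i < n; i++) {
        for (d = 0; d < domains; d++)
            if (euids[d] == rq[i].euid)
                break;
        printf("%-8s euid=%-4u share=%.1f%%\n", rq[i].name,
               rq[i].euid, 100.0 / domains / tasks_in[d]);
    }
    return 0;
}

With that split the lone X task gets 50% and the three user tasks get
about 16.7% each, which is the behaviour described above; per-task
fairness would give every task 25% instead.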


--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-21 Thread Bill Davidsen

Matt Mackall wrote:

> On Wed, Apr 18, 2007 at 08:37:11AM +0200, Nick Piggin wrote:
>
>>> [2] It's trivial to construct two or more perfectly reasonable and
>>> desirable definitions of fairness that are mutually incompatible.
>>
>> Probably not if you use common sense, and in the context of a replacement
>> for the 2.6 scheduler.
>
> Ok, trivial example. You cannot allocate equal CPU time to
> processes/tasks and simultaneously allocate equal time to thread
> groups. Is it common sense that a heavily-threaded app should be able
> to get hugely more CPU than a well-written app? No. I don't want Joe's
> stupid Java app to make my compile crawl.
>
> On the other hand, if my heavily threaded app is, say, a voicemail
> server serving 30 customers, I probably want it to get 30x the CPU of
> my gzip job.

Matt, you tickled a thought... on one hand we have a single user running 
a threaded application, and it ideally should get the same total CPU as 
a user running a single-threaded process. On the other hand we have a 
threaded application, call it sendmail, nnrpd, httpd, bind, whatever. In 
that case each thread is really providing service for an independent 
user, and should get an appropriate share of the CPU.


Perhaps the solution is to add a means for identifying server processes, 
by capability, or by membership in a "server" group, or by having the 
initiating process set some flag at exec() time. That doesn't 
necessarily solve problems, but it may provide more information to allow 
them to be soluble.
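
For the "membership in a server group" variant, here is a minimal sketch
of how an initiating process might tag a daemon before exec(); the group
name "server" is an assumption, the wrapper needs privilege to call
setgid(), and nothing in any current scheduler actually consumes the
marking:

#include <grp.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Run a command with its primary group switched to "server", so that a
 * (hypothetical) scheduling policy could treat its threads as serving
 * independent users rather than as one user's parallel job.
 */
int main(int argc, char *argv[])
{
    struct group *gr;

    if (argc < 2) {
        fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
        return 1;
    }
    gr = getgrnam("server");            /* assumed group name */
    if (gr == NULL || setgid(gr->gr_gid) < 0) {
        perror("server group");
        return 1;
    }
    execvp(argv[1], &argv[1]);          /* the marking survives exec() */
    perror("execvp");
    return 1;
}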


--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-21 Thread Willy Tarreau
Hi Björn,

On Sat, Apr 21, 2007 at 01:29:41PM +0200, Björn Steinbrink wrote:
> Hi,
> 
> On 2007.04.21 13:07:48 +0200, Willy Tarreau wrote:
> > > another thing i noticed: when using a -y larger than 1, then the window 
> > > title (at least on Metacity) overlaps and thus the ocbench tasks have 
> > > different X overhead and get scheduled a bit asymmetrically as well. Is 
> > > there any way to start them up title-less perhaps?
> > 
> > It has annoyed me a bit too, but I'm no X developer at all, so I don't
> > know at all if it's possible nor how to do this. I know that my window
> > manager even adds title bars to xeyes, so I'm not sure we can do this.
> > 
> > Right now, I've added a "-B <border size>" argument so that you can
> > skip the size of your title bar. It's dirty but it's not my main job :-)
> 
> Here's a small patch that makes the windows unmanaged, which also causes
> ocbench to start up quite a bit faster on my box with larger number of
> windows, so it probably avoids some window manager overhead, which is a
> nice side-effect.

Excellent ! I've just merged it but conditioned it on a "-u" argument
so that we can keep previous behaviour (moving the windows is useful
especially when there are few of them).

So the new version 0.5 is available there :

  http://linux.1wt.eu/sched/

I believe it's the last one for today as I'm late on some work.

Thanks !
Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-21 Thread Björn Steinbrink
Hi,

On 2007.04.21 13:07:48 +0200, Willy Tarreau wrote:
> > another thing i noticed: when using a -y larger than 1, then the window 
> > title (at least on Metacity) overlaps and thus the ocbench tasks have 
> > different X overhead and get scheduled a bit asymmetrically as well. Is 
> > there any way to start them up title-less perhaps?
> 
> It has annoyed me a bit too, but I'm no X developer at all, so I don't
> know at all if it's possible nor how to do this. I know that my window
> manager even adds title bars to xeyes, so I'm not sure we can do this.
> 
> Right now, I've added a "-B <border size>" argument so that you can
> skip the size of your title bar. It's dirty but it's not my main job :-)

Here's a small patch that makes the windows unmanaged, which also causes
ocbench to start up quite a bit faster on my box with larger number of
windows, so it probably avoids some window manager overhead, which is a
nice side-effect.

Björn

--

diff -u ocbench-0.4/ocbench.c ocbench-0.4.1/ocbench.c
--- ocbench-0.4/ocbench.c   2007-04-21 13:05:55.0 +0200
+++ ocbench-0.4.1/ocbench.c 2007-04-21 13:24:01.0 +0200
@@ -213,6 +213,7 @@
 int main(int argc, char *argv[]) {
   Window root;
   XGCValues gc_setup;
+  XSetWindowAttributes swa;
   int c, index, proc_x, proc_y, pid;
   int *pcount[] = {&HOUR, &MIN, &SEC};
   char *p, *q;
@@ -342,8 +343,11 @@
   alloc_color(fg, &orange);
   alloc_color(fg2, &blue);
 
-  win = XCreateSimpleWindow(dpy, root, X, Y, width, height, 0, 
-   black.pixel, black.pixel);
+  swa.override_redirect = 1;
+
+  win = XCreateWindow(dpy, root, X, Y, width, height, 0,
+   CopyFromParent, InputOutput, CopyFromParent,
+   CWOverrideRedirect, &swa);
   XStoreName(dpy, win, "ocbench");
 
   XSelectInput(dpy, win, ExposureMask | StructureNotifyMask);
Only in ocbench-0.4.1/: .README.swp
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-21 Thread Willy Tarreau
Hi Ingo,

I'm replying to your 3 mails at once.

On Sat, Apr 21, 2007 at 12:45:22PM +0200, Ingo Molnar wrote:
> 
> * Ingo Molnar <[EMAIL PROTECTED]> wrote:
> 
> > > It could become a useful scheduler benchmark !
> > 
> > i just tried ocbench-0.3, and it is indeed very nice!

So as you've noticed just one minute after I put it there, I've updated
the tool and renamed it ocbench. For others, it's here :

http://linux.1wt.eu/sched/

Useful news are proper positioning, automatic forking, and more visible
progress with smaller windows, which eat fewer X resources.

Now about your idea of making it report information on stdout, I don't
know if it would be that useful. There are many other command line tools
for this purpose. This one's goal is to eat CPU with a visual control of
CPU distribution only.
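
For reference, the heart of each instance is just a run/sleep duty
cycle; a simplified sketch of the -R/-S loop without any of the X
drawing (the default values below are made up):

#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

int main(void)
{
    const long run_us   = 250000;   /* -R : CPU burnt per round (example value) */
    const long sleep_us = 750000;   /* -S : rest per round (example value) */
    long rounds = 0;
    struct timeval start, now;

    for (;;) {
        gettimeofday(&start, NULL);
        do {                        /* burn CPU for run_us microseconds */
            gettimeofday(&now, NULL);
        } while ((now.tv_sec - start.tv_sec) * 1000000L +
                 (now.tv_usec - start.tv_usec) < run_us);

        usleep(sleep_us);           /* then give the CPU back for sleep_us */

        rounds++;                   /* one round == one "virtual second" drawn */
        if (rounds % 3600 == 0)
            printf("%ld virtual hours\n", rounds / 3600);
    }
}

When every instance advances its clock hand once per round like this,
any instance that is being short-changed by the scheduler visibly falls
behind the others.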

Concerning your idea of using a signal to resync every process, I agree
with you. Running at 8x8 shows a noticeable offset. I've just uploaded
v0.4 which supports your idea of sending USR1.

> another thing i noticed: when using a -y larger than 1, then the window 
> title (at least on Metacity) overlaps and thus the ocbench tasks have 
> different X overhead and get scheduled a bit asymmetrically as well. Is 
> there any way to start them up title-less perhaps?

It has annoyed me a bit too, but I'm no X developer at all, so I don't
know at all if it's possible nor how to do this. I know that my window
manager even adds title bars to xeyes, so I'm not sure we can do this.

Right now, I've added a "-B <border size>" argument so that you can
skip the size of your title bar. It's dirty but it's not my main job :-)

Thanks for your feedback
Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-21 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> > It could become a useful scheduler benchmark !
> 
> i just tried ocbench-0.3, and it is indeed very nice!

another thing i noticed: when using a -y larger than 1, then the window 
title (at least on Metacity) overlaps and thus the ocbench tasks have 
different X overhead and get scheduled a bit asymmetrically as well. Is 
there any way to start them up title-less perhaps?

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-21 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> > The modified code is here :
> > 
> >   http://linux.1wt.eu/sched/orbitclock-0.2bench.tgz
> > 
> > What is interesting to note is that it's easy to make X work a lot 
> > (99%) by using 0 as the sleeping time, and it's easy to make the 
> > process work a lot by using large values for the running time 
> > associated with very low values (or 0) for the sleep time.
> > 
> > Ah, and it supports -geometry ;-)
> > 
> > It could become a useful scheduler benchmark !
> 
> i just tried ocbench-0.3, and it is indeed very nice!

another thing i just noticed: when starting up lots of ocbench tasks 
(say -x 6 -y 6) then they (naturally) get started up with an already 
visible offset. It's nice to observe the startup behavior, but after 
that it would be useful if it were possible to 'resync' all those 
ocbench tasks so that they start at the same offset. [ Maybe a "killall 
-SIGUSR1 ocbench" could serve this purpose, without having to 
synchronize the tasks explicitly? ]
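
A minimal sketch of the signal side of that idea; the handler only sets
a flag and the drawing loop resets its phase when it notices it (the
loop body is a stand-in, not ocbench's actual code):

#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t resync;

static void on_usr1(int sig)
{
    (void)sig;
    resync = 1;                 /* async-signal-safe: only set a flag */
}

int main(void)
{
    struct sigaction sa;
    unsigned phase = 0;         /* position of the animated "clock hand" */

    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGUSR1, &sa, NULL);

    for (;;) {
        if (resync) {           /* "killall -USR1 ocbench" lands here */
            resync = 0;
            phase = 0;          /* every instance restarts at the same offset */
        }
        /* ... burn CPU for -R microseconds, redraw at 'phase', sleep -S ... */
        phase++;
        usleep(100000);
    }
}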

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-21 Thread Ingo Molnar

* Willy Tarreau <[EMAIL PROTECTED]> wrote:

> I hacked it a bit to make it accept two parameters :
>   -R <run_time_in_microsecond> : time spent burning CPU cycles at each round
>   -S <sleep_time_in_microsecond> : time spent getting a rest
> 
> It now advances what it thinks is a second at each iteration, so that 
> it makes it easy to compare its progress with other instances (there 
> are seconds, minutes and hours, so it's easy to visually count up to 
> around 43200).
> 
> The modified code is here :
> 
>   http://linux.1wt.eu/sched/orbitclock-0.2bench.tgz
> 
> What is interesting to note is that it's easy to make X work a lot 
> (99%) by using 0 as the sleeping time, and it's easy to make the 
> process work a lot by using large values for the running time 
> associated with very low values (or 0) for the sleep time.
> 
> Ah, and it supports -geometry ;-)
> 
> It could become a useful scheduler benchmark !

i just tried ocbench-0.3, and it is indeed very nice!

Would it make sense perhaps to (optionally?) also log some sort of 
periodic text feedback to stdout, about the quality of scheduling? Maybe 
even a 'run this many seconds' option plus a summary text output at the 
end (which would output measured runtime, observed longest/smallest 
latency and standard deviation of latencies maybe)? That would make it 
directly usable both as a 'consistency of X app scheduling' visual test 
and as an easily shareable benchmark with an objective numeric result as 
well.
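
For the summary part, a small sketch of the per-round bookkeeping such
an option could do: measure how late each round completes relative to
the requested period and keep min/max plus a running sum and sum of
squares for the standard deviation (the period and duration below are
made up, and usleep() stands in for the real run/sleep round):

#include <math.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

static double now_us(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1e6 + tv.tv_usec;
}

int main(void)
{
    const double period_us = 100000.0;  /* requested run+sleep period */
    double min = 1e18, max = 0.0, sum = 0.0, sumsq = 0.0;
    double start = now_us(), prev = start, mean, stddev;
    long n = 0;

    while (now_us() - start < 10e6) {   /* sample for about 10 seconds */
        double t, lat;

        usleep((useconds_t)period_us);  /* stand-in for one ocbench round */
        t = now_us();
        lat = (t - prev) - period_us;   /* extra delay imposed on this round */
        if (lat < 0.0)
            lat = 0.0;
        if (lat < min) min = lat;
        if (lat > max) max = lat;
        sum += lat;
        sumsq += lat * lat;
        n++;
        prev = t;
    }
    mean = sum / n;
    stddev = sqrt(sumsq / n - mean * mean);
    printf("rounds=%ld latency min=%.0fus max=%.0fus avg=%.0fus stddev=%.0fus\n",
           n, min, max, mean, stddev);
    return 0;
}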

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-21 Thread Ingo Molnar

* Bill Davidsen <[EMAIL PROTECTED]> wrote:

> All of my testing has been on desktop machines, although in most cases 
> they were really loaded desktops which had load avg 10..100 from time 
> to time, and none were low memory machines. Up to CFS v3 I thought 
> nicksched was my winner, now CFSv3 looks better, by not having 
> stumbles under stupid loads.

nice! I hope CFSv4 kept that good tradition too ;)

> I have not tested:
>   1 - server loads, nntp, smtp, etc
>   2 - low memory machines
>   3 - uniprocessor systems
> 
> I think this should be done before drawing conclusions. Or if someone 
> has tried this, perhaps they would report what they saw. People are 
> talking about smoothness, but not how many pages per second come out 
> of their overloaded web server.

i tested heavily swapping systems. (make -j50 workloads easily trigger 
that) I also tested UP systems and a handful of SMP systems. I have also 
tested massive_intr.c which i believe is an indicator of how fairly CPU 
time is distributed between partly sleeping partly running server 
threads. But i very much agree that diverse feedback is sought and 
welcome, both from those who are happy with the current scheduler and 
those who are unhappy about it.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-21 Thread Nick Piggin
On Fri, Apr 20, 2007 at 04:47:27PM -0400, Bill Davidsen wrote:
> Ingo Molnar wrote:
> 
> >( Lets be cautious though: the jury is still out whether people actually 
> >  like this more than the current approach. While CFS feedback looks 
> >  promising after a whopping 3 days of it being released [ ;-) ], the 
> >  test coverage of all 'fairness centric' schedulers, even considering 
> >  years of availability is less than 1% i'm afraid, and that < 1% was 
> >  mostly self-selecting. )
> >
> All of my testing has been on desktop machines, although in most cases 
> they were really loaded desktops which had load avg 10..100 from time to 
> time, and none were low memory machines. Up to CFS v3 I thought 
> nicksched was my winner, now CFSv3 looks better, by not having stumbles 
> under stupid loads.

What base_timeslice were you using for nicksched, and what HZ?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-20 Thread Bill Davidsen

Ingo Molnar wrote:

> ( Lets be cautious though: the jury is still out whether people actually 
>   like this more than the current approach. While CFS feedback looks 
>   promising after a whopping 3 days of it being released [ ;-) ], the 
>   test coverage of all 'fairness centric' schedulers, even considering 
>   years of availability is less than 1% i'm afraid, and that < 1% was 
>   mostly self-selecting. )


All of my testing has been on desktop machines, although in most cases 
they were really loaded desktops which had load avg 10..100 from time to 
time, and none were low memory machines. Up to CFS v3 I thought 
nicksched was my winner, now CFSv3 looks better, by not having stumbles 
under stupid loads.


I have not tested:
  1 - server loads, nntp, smtp, etc
  2 - low memory machines
  3 - uniprocessor systems

I think this should be done before drawing conclusions. Or if someone 
has tried this, perhaps they would report what they saw. People are 
talking about smoothness, but not how many pages per second come out of 
their overloaded web server.


--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-20 Thread Bill Davidsen

Mike Galbraith wrote:

> On Tue, 2007-04-17 at 05:40 +0200, Nick Piggin wrote:
>> On Tue, Apr 17, 2007 at 04:29:01AM +0200, Mike Galbraith wrote:
>>> Yup, and progress _is_ happening now, quite rapidly.
>>
>> Progress as in progress on Ingo's scheduler. I still don't know how we'd
>> decide when to replace the mainline scheduler or with what.
>>
>> I don't think we can say Ingo's is better than the alternatives, can we?
>
> No, that would require massive performance testing of all alternatives.
>
>> If there is some kind of bakeoff, then I'd like one of Con's designs to
>> be involved, and mine, and Peter's...
>
> The trouble with a bakeoff is that it's pretty darn hard to get people
> to test in the first place, and then comes weighting the subjective and
> hard performance numbers.  If they're close in numbers, do you go with
> the one which starts the least flamewars or what?

Here we disagree... I picked a scheduler not by running benchmarks, but 
by running loads which piss me off with the mainline scheduler. And then 
I ran the other schedulers for a while to find the things, normal things 
I do, which resulted in bad behavior. And when I found one which had (so 
far) no such cases I called it my winner, but I haven't tested it under 
server load, so I can't begin to say it's "the best."


What we need is for lots of people to run every scheduler in real life, 
and do "worst case analysis" by finding the cases which cause bad 
behavior. And if there were a way to easily choose another scheduler, 
call it pluggable, modular, or Russian Roulette, people who found a worst 
case would report it (aka bitch about it) and try another. But the 
average user is better able to boot with an option like "sched=cfs" (or 
sc, or nick, or ...) than to patch and build a kernel. So if we don't 
get easily switched schedulers people will not test nearly as well.


The best scheduler isn't the one 2% faster than the rest, it's the one 
with the fewest jackpot cases where it sucks. And if the mainline had 
multiple schedulers this testing would get done, authors would get more 
reports and have a better chance of fixing corner cases.


Note that we really need multiple schedulers to make people happy, 
because fairness is not the most desirable behavior on all machines, and 
adding knobs probably isn't the answer. I want a server to degrade 
gently, I want my desktop to show my movie and echo my typing, and if 
that's hard on compiles or the file transfer, so be it. Con doesn't want 
to compromise his goals, I agree but want to have an option if I don't 
share them.


--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-20 Thread Peter Williams

William Lee Irwin III wrote:

> William Lee Irwin III wrote:
>> I'd further recommend making priority levels accessible to kernel threads
>> that are not otherwise accessible to processes, both above and below
>> user-available priority levels. Basically, if you can get SCHED_RR and
>> SCHED_FIFO to coexist as "intimate scheduler classes," then a SCHED_KERN
>> scheduler class can coexist with SCHED_OTHER in like fashion, but with
>> availability of higher and lower priorities than any userspace process
>> is allowed, and potentially some differing scheduling semantics. In such
>> a manner nonessential background processing intended not to ever disturb
>> userspace can be given priorities appropriate to it (perhaps even con's
>> SCHED_IDLEPRIO would make sense), and other, urgent processing can be
>> given priority over userspace altogether.
>
> On Thu, Apr 19, 2007 at 09:50:19PM +1000, Peter Williams wrote:
>> This is sounding very much like System V Release 4 (and descendants) 
>> except that they call it SCHED_SYS and also give SCHED_NORMAL tasks that 
>> are in system mode dynamic priorities in the SCHED_SYS range (to avoid 
>> priority inversion, I believe).
>
> Descriptions of that are probably where I got the idea (hurrah for OS
> textbooks).


And long term background memory.  :-)


> It makes a fair amount of sense.


Yes.  You could also add a SCHED_IA in between SCHED_SYS and SCHED_OTHER 
(a la Solaris) for interactive tasks.  The only problem is how to get a 
task into SCHED_IA without root privileges.



> Not sure what the take on
> the specific precedent is. The only content here is expanding the
> priority range with ranges above and below for the exclusive use of
> ultra-privileged tasks, so it's really trivial. Actually it might be so
> trivial it should just be some permission checks in the SCHED_OTHER
> renicing code.


Perhaps.

Peter
--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread William Lee Irwin III
William Lee Irwin III wrote:
>> I'd further recommend making priority levels accessible to kernel threads
>> that are not otherwise accessible to processes, both above and below
>> user-available priority levels. Basically, if you can get SCHED_RR and
>> SCHED_FIFO to coexist as "intimate scheduler classes," then a SCHED_KERN
>> scheduler class can coexist with SCHED_OTHER in like fashion, but with
>> availability of higher and lower priorities than any userspace process
>> is allowed, and potentially some differing scheduling semantics. In such
>> a manner nonessential background processing intended not to ever disturb
>> userspace can be given priorities appropriate to it (perhaps even con's
>> SCHED_IDLEPRIO would make sense), and other, urgent processing can be
>> given priority over userspace altogether.

On Thu, Apr 19, 2007 at 09:50:19PM +1000, Peter Williams wrote:
> This is sounding very much like System V Release 4 (and descendants) 
> except that they call it SCHED_SYS and also give SCHED_NORMAL tasks that 
> are in system mode dynamic priorities in the SCHED_SYS range (to avoid 
> priority inversion, I believe).

Descriptions of that are probably where I got the idea (hurrah for OS
textbooks). It makes a fair amount of sense. Not sure what the take on
the specific precedent is. The only content here is expanding the
priority range with ranges above and below for the exclusive use of
ultra-privileged tasks, so it's really trivial. Actually it might be so
trivial it should just be some permission checks in the SCHED_OTHER
renicing code.
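
A free-standing sketch of what such a permission check might look like;
this is illustrative code, not the kernel's actual renice path, and the
extended range and field names are made up:

#include <stdio.h>

#define NICE_MIN    (-20)   /* normal userspace nice range */
#define NICE_MAX    19
#define KNICE_MIN   (-30)   /* hypothetical extended range for kernel threads */
#define KNICE_MAX   29

struct task_info {
    int kernel_thread;      /* stand-in for "task has no user mm" */
    int cap_sys_nice;       /* stand-in for capable(CAP_SYS_NICE) */
};

/* Return nonzero if the task may be reniced to the requested value. */
static int renice_allowed(const struct task_info *t, int nice)
{
    if (t->kernel_thread)
        return nice >= KNICE_MIN && nice <= KNICE_MAX;
    if (nice < 0 && !t->cap_sys_nice)
        return 0;           /* raising priority needs privilege */
    return nice >= NICE_MIN && nice <= NICE_MAX;
}

int main(void)
{
    struct task_info kthread = { 1, 0 };
    struct task_info user    = { 0, 0 };

    printf("kthread to -25: %d\n", renice_allowed(&kthread, -25)); /* allowed */
    printf("user    to -25: %d\n", renice_allowed(&user, -25));    /* denied */
    return 0;
}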


-- wli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Mike Galbraith
On Thu, 2007-04-19 at 09:55 -0700, Davide Libenzi wrote:
> On Thu, 19 Apr 2007, Mike Galbraith wrote:
> 
> > On Thu, 2007-04-19 at 09:09 +0200, Ingo Molnar wrote:
> > > * Mike Galbraith <[EMAIL PROTECTED]> wrote:
> > > 
> > > > With a heavily reniced X (perfectly fine), that should indeed solve my 
> > > > daily usage pattern nicely (always need godmode for shells, but not 
> > > > for mozilla and ilk. 50/50 split automatic without renice of entire 
> > > > gui)
> > > 
> > > how about the first-approximation solution i suggested in the previous 
> > > mail: to add a per UID default nice level? (With this default defaulting 
> > > to '-10' for all root-owned processes, and defaulting to '0' for 
> > > everything else.) That would solve most of the current CFS regressions 
> > > at hand.
> > 
> > That would make my kernel builds etc interfere with my other self's
> > surfing and whatnot.  With it by EUID, when I'm surfing or whatnot, the
> > X portion of my Joe-User activity pushes the compile portion of root
> > down in bandwidth utilization automagically, which is exactly the right
> > thing, because the root me in not as important as the Joe-User me using
> > the GUI at that time.  If the idea of X disturbing root upsets some,
> > they can move X to another UID.  Generally, it seems perfect for here.
> 
> Now guys, I did not follow the whole lengthy and feisty thread, but IIRC 
> Con's scheduler has been attacked because, among other argouments, was 
> requiring X to be reniced. This happened like a month ago IINM.

I don't object to renicing X if you want it to receive _more_ than its
fair share. I do object to having to renice X in order for it to _get_
its fair share.  That's what I attacked.

> I did not have time to look at Con's scheduler, and I only had a brief 
> look at Ingo's one (looks very promising IMO, but so was the initial O(1) 
> post before all the corner-cases fixes went in).
> But this is not a about technical merit, this is about applying the same 
> rules of judgement to others as well to ourselves.

I'm running the same tests with CFS that I ran for RSDL/SD.  It falls
short in one key area (to me) in that X+client cannot yet split my box
50/50 with two concurrent tasks.  In the CFS case, renicing both X and
client does work, but it should not be necessary IMHO.  With RSDL/SD
renicing didn't help.

> We went from a "renicing X to -10 is bad because the scheduler should 
> be able to correctly handle the problem w/out additional external plugs" 
> to a totally opposite "let's renice -10 X, the whole SCHED_NORMAL kthreads 
> class, on top of all the tasks owned by root" [1].
> From a spectator POV like myself in this case, this looks rather "unfair".

Well, for me, the renicing I mentioned above is only interesting as a
way to improve long term fairness with schedulers with no history.
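
As a rough approximation of the per-UID default nice level suggested
above, existing syscalls can already renice every running process of a
user in one call; a sketch (it only touches processes that already
exist, so it is not a true default):

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/resource.h>

/* Renice all current processes of one user, e.g. "./renice-user 0 -10". */
int main(int argc, char *argv[])
{
    id_t uid;
    int prio;

    if (argc != 3) {
        fprintf(stderr, "usage: %s uid nice\n", argv[0]);
        return 1;
    }
    uid  = (id_t)atoi(argv[1]);
    prio = atoi(argv[2]);

    /* PRIO_USER applies the value to every process owned by uid. */
    if (setpriority(PRIO_USER, uid, prio) < 0) {
        perror("setpriority");
        return 1;
    }
    return 0;
}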

I found Linus' EUID idea intriguing in that by putting the server
together with a steady load in one 'fair' domain, and clients in
another, X can, if prioritized to empower it to do so, modulate the
steady load in it's domain (but can't starve it!), the clients modulate
X, and the steady load gets it all when X and clients are idle.  The
nice level of X determines to what _extent_ X can modulate the constant
load rather like a mixer slider.  The synchronous (I'm told) nature of
X/client then becomes kind of an asset to the desktop instead of a
liability.

The specific case I was thinking about is the X+Gforce test where both
RSDL and CFS fail to provide fairness (as defined by me;).  X and Gforce
are mostly not concurrent.  The make -j2 I put them up against are
mostly concurrent.  I don't call giving 1/3 of my CPU to X+Client fair
at _all_, but that's what you'll get if your fairstick of the instant
generally can't see the fourth competing task.  Seemed pretty cool to me
because it creates the missing connection between client and server,
though also likely complicated (and maybe full of perils, who knows).

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Willy Tarreau
On Fri, Apr 20, 2007 at 02:52:38AM +0300, Jan Knutar wrote:
> On Thursday 19 April 2007 18:18, Ingo Molnar wrote:
> > * Willy Tarreau <[EMAIL PROTECTED]> wrote:
> > > You can certainly script it with -geometry. But it is the wrong
> > > application for this matter, because you benchmark X more than
> > > glxgears itself. What would be better is something like a line
> > > rotating 360 degrees and doing some short stuff between each
> > > degree, so that X is not much solicited, but the CPU would be
> > > spent more on the processes themselves.
> >
> > at least on my setup glxgears goes via DRI/DRM so there's no X
> > scheduling inbetween at all, and the visual appearance of glxgears is
> > a direct function of its scheduling.
> 
> How much of the subjective interactiveness-feel of the desktop is at the 
> mercy of the X server's scheduling and not the cpu scheduler?

probably a lot. Hence the reason why I wanted something visually noticeable
but using far less X resources than glxgears. The modified orbitclock is
perfect IMHO.

Regards,
Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Jan Knutar
On Thursday 19 April 2007 18:18, Ingo Molnar wrote:
> * Willy Tarreau <[EMAIL PROTECTED]> wrote:
> > You can certainly script it with -geometry. But it is the wrong
> > application for this matter, because you benchmark X more than
> > glxgears itself. What would be better is something like a line
> > rotating 360 degrees and doing some short stuff between each
> > degree, so that X is not much solicited, but the CPU would be
> > spent more on the processes themselves.
>
> at least on my setup glxgears goes via DRI/DRM so there's no X
> scheduling inbetween at all, and the visual appearance of glxgears is
> a direct function of its scheduling.

How much of the subjective interactiveness-feel of the desktop is at the 
mercy of the X server's scheduling and not the cpu scheduler?

I've noticed that video playback is significantly smoother and more resistant 
to other load when using MPlayer's OpenGL output, especially if 
"heavy" programs are running at the same time. Firefox and ksysguard in 
particular seem to have found a way to make video through Xv look 
annoyingly jittery.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Willy Tarreau
On Thu, Apr 19, 2007 at 05:18:03PM +0200, Ingo Molnar wrote:
> 
> * Willy Tarreau <[EMAIL PROTECTED]> wrote:
> 
> > You can certainly script it with -geometry. But it is the wrong 
> > application for this matter, because you benchmark X more than 
> > glxgears itself. What would be better is something like a line 
> > rotating 360 degrees and doing some short stuff between each degree, 
> > so that X is not much solicited, but the CPU would be spent more on 
> > the processes themselves.
> 
> at least on my setup glxgears goes via DRI/DRM so there's no X 
> scheduling inbetween at all, and the visual appearance of glxgears is a 
> direct function of its scheduling.

OK, I thought that something looking like a clock would be useful, especially
if we could tune the amount of CPU spent per task instead of being limited by
graphics drivers.

I searched freshmeat for a clock and found "orbitclock" by Jeremy Weatherford,
which was exactly what I was looking for:
  - small
  - C only
  - X11 only
  - needed less than 5 minutes and no knowledge of X11 for the complete hack!
  => Kudos to its author, sincerely!

I hacked it a bit to make it accept two parameters:
  -R  : time spent burning CPU cycles at each round
  -S  : time spent getting a rest

It now advances what it thinks is a second at each iteration, which makes
it easy to compare its progress with other instances (there are seconds,
minutes and hours, so it's easy to visually count up to around 43200).
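
A minimal sketch of that run/sleep duty cycle, for illustration only -- this is
not the orbitclock source (which is at the URL below), and the plain argv-based
option handling and default timings here are made up:

/* fakeclock.c (hypothetical name): burn the CPU for run_ms, rest for
 * sleep_ms, and count one "tick" per iteration, roughly the cycle the
 * modified orbitclock is described as using. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

static long long now_us(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	return (long long)tv.tv_sec * 1000000 + tv.tv_usec;
}

int main(int argc, char **argv)
{
	long run_ms = argc > 1 ? atol(argv[1]) : 100;	/* stands in for -R */
	long sleep_ms = argc > 2 ? atol(argv[2]) : 100;	/* stands in for -S */
	unsigned long ticks = 0;

	for (;;) {
		long long end = now_us() + run_ms * 1000;

		while (now_us() < end)
			;				/* burn CPU cycles */
		if (sleep_ms)
			usleep(sleep_ms * 1000);	/* take a rest */
		printf("tick %lu\n", ++ticks);		/* one fake second */
	}
	return 0;
}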

The modified code is here:

  http://linux.1wt.eu/sched/orbitclock-0.2bench.tgz

What is interesting to note is that it's easy to make X work a lot (99%) by
using 0 as the sleeping time, and it's easy to make the process work a lot
by using large values for the running time associated with very low values
(or 0) for the sleep time.

Ah, and it supports -geometry ;-)
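
For instance, hypothetical invocations along those lines (the units of -R/-S
are not spelled out above and the binary name is assumed, so these numbers are
only illustrative):

  ./orbitclock -R 10 -S 0 -geometry 120x120+0+0      # mostly makes X work
  ./orbitclock -R 1000 -S 1 -geometry 120x120+130+0  # mostly burns its own CPU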

It could become a useful scheduler benchmark!

Have fun!
Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Bernd Eckenfels
In article <[EMAIL PROTECTED]> you wrote:
> Top (VCPU maybe?)
>User
>Process
>Thread

The problem with that is that not all schedulers might work on the User
level. You can think of Batch/Job, Parent, Group, Session or namespace
level. That would IMHO be a generic Top, with no need for a level above.

Greetings
Bernd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Gene Heskett
On Thursday 19 April 2007, Ingo Molnar wrote:
>* Willy Tarreau <[EMAIL PROTECTED]> wrote:
>> You can certainly script it with -geometry. But it is the wrong
>> application for this matter, because you benchmark X more than
>> glxgears itself. What would be better is something like a line
>> rotating 360 degrees and doing some short stuff between each degree,
>> so that X is not much solicited, but the CPU would be spent more on
>> the processes themselves.
>
>at least on my setup glxgears goes via DRI/DRM so there's no X
>scheduling inbetween at all, and the visual appearance of glxgears is a
>direct function of its scheduling.
>
>   Ingo

That doesn't appear to be the case here, Ingo. Even when I know the rest of the 
system is lagged, glxgears continues to show very smooth and steady movement.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Yow!  I just went below the poverty line!
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Gene Heskett
On Thursday 19 April 2007, Ingo Molnar wrote:
>* Willy Tarreau <[EMAIL PROTECTED]> wrote:
>> Good idea. The machine I'm typing from now has 1000 scheddos running
>> at +19, and 12 gears at nice 0. [...]
>>
>> From time to time, one of the 12 aligned gears will quickly perform a
>> full quarter of round while others slowly turn by a few degrees. In
>> fact, while I don't know this process's CPU usage pattern, there's
>> something useful in it : it allows me to visually see when process
>> accelerate/decelerate. [...]
>
>cool idea - i have just tried this and it rocks - you can easily see the
>'nature' of CPU time distribution just via visual feedback. (Is there
>any easy way to start up 12 glxgears fully aligned, or does one always
>have to mouse around to get them into proper position?)
>
>btw., i am using another method to quickly judge X's behavior: i started
>the 'snowflakes' plugin in Beryl on Fedora 7, which puts a nice smooth
>opengl-rendered snow fall on the desktop background. That gives me an
>idea about how well X is scheduling under various workloads, without
>having to instrument it explicitly.
>
yes, it's a cute idea, till you switch away from that screen to check progress 
on something else, like to compose this message.

===
5913 frames in 5.0 seconds = 1182.499 FPS
6238 frames in 5.0 seconds = 1247.556 FPS
11380 frames in 5.0 seconds = 2275.905 FPS
10691 frames in 5.0 seconds = 2138.173 FPS
8707 frames in 5.0 seconds = 1741.305 FPS
10669 frames in 5.0 seconds = 2133.708 FPS
11392 frames in 5.0 seconds = 2278.037 FPS
11379 frames in 5.0 seconds = 2275.711 FPS
11310 frames in 5.0 seconds = 2261.861 FPS
11386 frames in 5.0 seconds = 2277.081 FPS
11292 frames in 5.0 seconds = 2258.353 FPS
11352 frames in 5.0 seconds = 2270.297 FPS
11415 frames in 5.0 seconds = 2282.886 FPS
11406 frames in 5.0 seconds = 2281.037 FPS
11483 frames in 5.0 seconds = 2296.533 FPS
11510 frames in 5.0 seconds = 2301.883 FPS
11123 frames in 5.0 seconds = 2224.266 FPS
8980 frames in 5.0 seconds = 1795.861 FPS
===
The over-2000 FPS reports were while I was either looking at htop or starting 
this message, both on different screens.  htop said it was using 95+% of the 
CPU even when its display was going to /dev/null.  So 'Kewl' doesn't seem to 
get us apples-to-apples numbers we can take to the window and bet 
win-place-show on.

FWIW, running the nvidia-9755 drivers here.

So if we are going to use that as a judgement operator, it obviously needs 
some intelligently applied scaling before the numbers are worth more than a 
subjective feel.

>   Ingo
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to [EMAIL PROTECTED]
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at  http://www.tux.org/lkml/



-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
The confusion of a staff member is measured by the length of his memos.
-- New York Times, Jan. 20, 1981
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Davide Libenzi
On Thu, 19 Apr 2007, Mike Galbraith wrote:

> On Thu, 2007-04-19 at 09:09 +0200, Ingo Molnar wrote:
> > * Mike Galbraith <[EMAIL PROTECTED]> wrote:
> > 
> > > With a heavily reniced X (perfectly fine), that should indeed solve my 
> > > daily usage pattern nicely (always need godmode for shells, but not 
> > > for mozilla and ilk. 50/50 split automatic without renice of entire 
> > > gui)
> > 
> > how about the first-approximation solution i suggested in the previous 
> > mail: to add a per UID default nice level? (With this default defaulting 
> > to '-10' for all root-owned processes, and defaulting to '0' for 
> > everything else.) That would solve most of the current CFS regressions 
> > at hand.
> 
> That would make my kernel builds etc interfere with my other self's
> surfing and whatnot.  With it by EUID, when I'm surfing or whatnot, the
> X portion of my Joe-User activity pushes the compile portion of root
> down in bandwidth utilization automagically, which is exactly the right
> thing, because the root me is not as important as the Joe-User me using
> the GUI at that time.  If the idea of X disturbing root upsets some,
> they can move X to another UID.  Generally, it seems perfect for here.

Now guys, I did not follow the whole lengthy and feisty thread, but IIRC 
Con's scheduler has been attacked because, among other arguments, it was 
requiring X to be reniced. This happened like a month ago IINM.
I did not have time to look at Con's scheduler, and I only had a brief 
look at Ingo's one (looks very promising IMO, but so was the initial O(1) 
post before all the corner-cases fixes went in).
But this is not about technical merit, this is about applying the same 
rules of judgement to others as well to ourselves.
We went from a "renicing X to -10 is bad because the scheduler should 
be able to correctly handle the problem w/out additional external plugs" 
to a totally opposite "let's renice -10 X, the whole SCHED_NORMAL kthreads 
class, on top of all the tasks owned by root" [1].
From a spectator POV like myself in this case, this looks rather "unfair".



[1] I think, before and now, that that's more a duct tape patch than a 
real solution. OTOH if the "solution" is gonna be another maze of 
macros and heuristics filled with pretty bad corner cases, I may 
prefer the former.


- Davide


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Davide Libenzi
On Thu, 19 Apr 2007, Ingo Molnar wrote:

> i disagree that the user 'would expect' this. Some users might. Others 
> would say: 'my 10-thread rendering engine is more important than a 
> 1-thread job because it's using 10 threads for a reason'. And the CFS 
> feedback so far strengthens this point: the default behavior of treating 
> the thread as a single scheduling (and CPU time accounting) unit works 
> pretty well on the desktop.
> 
> think about it in another, 'kernel policy' way as well: we'd like to 
> _encourage_ more parallel user applications. Hurting them by accounting 
> all threads together sends the exact opposite message.

There are counter arguments too. Like, not every user knows if a certain 
process is MT or not. I agree though that doing accounting and fairness at 
a depth lower than USER is messy, and not only for performance.


- Davide


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Ingo Molnar

* Willy Tarreau <[EMAIL PROTECTED]> wrote:

> You can certainly script it with -geometry. But it is the wrong 
> application for this matter, because you benchmark X more than 
> glxgears itself. What would be better is something like a line 
> rotating 360 degrees and doing some short stuff between each degree, 
> so that X is not much solicited, but the CPU would be spent more on 
> the processes themselves.

at least on my setup glxgears goes via DRI/DRM so there's no X 
scheduling inbetween at all, and the visual appearance of glxgears is a 
direct function of its scheduling.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Willy Tarreau
Hi Ingo,

On Thu, Apr 19, 2007 at 11:01:44AM +0200, Ingo Molnar wrote:
> 
> * Willy Tarreau <[EMAIL PROTECTED]> wrote:
> 
> > Good idea. The machine I'm typing from now has 1000 scheddos running 
> > at +19, and 12 gears at nice 0. [...]
> 
> > From time to time, one of the 12 aligned gears will quickly perform a 
> > full quarter of round while others slowly turn by a few degrees. In 
> > fact, while I don't know this process's CPU usage pattern, there's 
> > something useful in it : it allows me to visually see when process 
> > accelerate/decelerate. [...]
> 
> cool idea - i have just tried this and it rocks - you can easily see the 
> 'nature' of CPU time distribution just via visual feedback. (Is there 
> any easy way to start up 12 glxgears fully aligned, or does one always 
> have to mouse around to get them into proper position?)

-- Replying quickly, I'm short in time --

You can certainly script it with -geometry. But it is the wrong application
for this matter, because you benchmark X more than glxgears itself. What would
be better is something like a line rotating 360 degrees and doing some short
stuff between each degree, so that X is not much solicited, but the CPU
would be spent more on the processes themselves.

Benchmarking interactions between X and multiple clients is a completely
different test IMHO. Glxgears is between those two, making it inappropriate
for scheduler tuning.

Regards,
Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Peter Williams

William Lee Irwin III wrote:
> * Andrew Morton <[EMAIL PROTECTED]> wrote:
> >> Yes, there are potential compatibility problems.  Example: a machine 
> >> with 100 busy httpd processes and suddenly a big gzip starts up from 
> >> console or cron.
> [...]
> 
> On Thu, Apr 19, 2007 at 08:38:10AM +0200, Ingo Molnar wrote:
> > h. How about the following then: default to nice -10 for all 
> > (SCHED_NORMAL) kernel threads and all root-owned tasks. Root _is_ 
> > special: root already has disk space reserved to it, root has special 
> > memory allocation allowances, etc. I dont see a reason why we couldnt by 
> > default make all root tasks have nice -10. This would be instantly loved 
> > by sysadmins i suspect ;-)
> > (distros that go the extra mile of making Xorg run under non-root could 
> > also go another extra one foot to renice that X server to -10.)
> 
> I'd further recommend making priority levels accessible to kernel threads
> that are not otherwise accessible to processes, both above and below
> user-available priority levels. Basically, if you can get SCHED_RR and
> SCHED_FIFO to coexist as "intimate scheduler classes," then a SCHED_KERN
> scheduler class can coexist with SCHED_OTHER in like fashion, but with
> availability of higher and lower priorities than any userspace process
> is allowed, and potentially some differing scheduling semantics. In such
> a manner nonessential background processing intended not to ever disturb
> userspace can be given priorities appropriate to it (perhaps even con's
> SCHED_IDLEPRIO would make sense), and other, urgent processing can be
> given priority over userspace altogether.
> 
> I believe root's default priority can be adjusted in userspace as
> things now stand somewhere in /etc/ but I'm not sure of the specifics.
> Word is somewhere in /etc/security/limits.conf


This is sounding very much like System V Release 4 (and descendants) 
except that they call it SCHED_SYS and also give SCHED_NORMAL tasks that 
are in system mode dynamic priorities in the SCHED_SYS range (to avoid 
priority inversion, I believe).


Peter
--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: CFS and suspend2: hang in atomic copy (was: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS])

2007-04-19 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> > I think a better approach would be to keep track of the rightmost 
> > entry, set the key to the rightmost's key +1 and then simply insert 
> > it there.
> 
> yeah. I had that implemented at a stage but was trying to be too 
> clever for my own good ;-)

i have fixed it via the patch below. (I'm using rb_last() because that 
way the normal scheduling codepaths are not burdened with the 
maintenance of a rightmost entry.)

Ingo

---
 kernel/sched.c  |3 ++-
 kernel/sched_fair.c |   24 +---
 2 files changed, 15 insertions(+), 12 deletions(-)

Index: linux/kernel/sched.c
===
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -3806,7 +3806,8 @@ asmlinkage long sys_sched_yield(void)
schedstat_inc(rq, yld_cnt);
if (rq->nr_running == 1)
schedstat_inc(rq, yld_act_empty);
-   current->sched_class->yield_task(rq, current);
+   else
+   current->sched_class->yield_task(rq, current);
 
/*
 * Since we are going to call schedule() anyway, there's
Index: linux/kernel/sched_fair.c
===
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -275,21 +275,23 @@ static void dequeue_task_fair(struct rq 
  */
 static void yield_task_fair(struct rq *rq, struct task_struct *p)
 {
+   struct rb_node *entry;
+   struct task_struct *last;
+
dequeue_task_fair(rq, p);
p->on_rq = 0;
+
/*
-* Temporarily insert at the last position of the tree:
+* Temporarily insert at the last position of the tree.
+* The key will be updated back to (near) its old value
+* when the task gets scheduled.
 */
-   p->fair_key = LLONG_MAX;
+   entry = rb_last(&rq->tasks_timeline);
+   last = rb_entry(entry, struct task_struct, run_node);
+
+   p->fair_key = last->fair_key + 1;
__enqueue_task_fair(rq, p);
p->on_rq = 1;
-
-   /*
-* Update the key to the real value, so that when all other
-* tasks from before the rightmost position have executed,
-* this task is picked up again:
-*/
-   p->fair_key = rq->fair_clock - p->wait_runtime + p->nice_offset;
 }
 
 /*
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: CFS and suspend2: hang in atomic copy (was: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS])

2007-04-19 Thread Ingo Molnar

* Esben Nielsen <[EMAIL PROTECTED]> wrote:

> >+/*
> >+ * Temporarily insert at the last position of the tree:
> >+ */
> >+p->fair_key = LLONG_MAX;
> >+__enqueue_task_fair(rq, p);
> > p->on_rq = 1;
> >+
> >+/*
> >+ * Update the key to the real value, so that when all other
> >+ * tasks from before the rightmost position have executed,
> >+ * this task is picked up again:
> >+ */
> >+p->fair_key = rq->fair_clock - p->wait_runtime + p->nice_offset;
> 
> I don't think it's safe to change the key after inserting the element in 
> the tree. You end up with an unsorted tree where new entries 
> end up in wrong places "randomly".

yeah, indeed. I hoped that once this rightmost entry is removed (as soon 
as it gets scheduled next time) the tree goes back to a correct shape, 
but that's not the case - the left sub-tree and the right sub-tree are 
merged by the rbtree code with the assumption that the entry had a 
correct key.

> I think a better approach would be to keep track of the rightmost 
> entry, set the key to the rightmost's key +1 and then simply insert it 
> there.

yeah. I had that implemented at a stage but was trying to be too clever 
for my own good ;-)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: CFS and suspend2: hang in atomic copy (was: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS])

2007-04-19 Thread Esben Nielsen



On Wed, 18 Apr 2007, Ingo Molnar wrote:



* Christian Hesse <[EMAIL PROTECTED]> wrote:


Hi Ingo and all,

On Friday 13 April 2007, Ingo Molnar wrote:

as usual, any sort of feedback, bugreports, fixes and suggestions are
more than welcome,


I just gave CFS a try on my system. From a user's point of view it
looks good so far. Thanks for your work.


you are welcome!


However I found a problem: When trying to suspend a system patched
with suspend2 2.2.9.11 it hangs with "doing atomic copy". Pressing the
ESC key results in a message that it tries to abort suspend, but then
still hangs.


i took a quick look at suspend2 and it makes some use of yield().
There's a bug in CFS's yield code, i've attached a patch that should fix
it, does it make any difference to the hang?

Ingo

Index: linux/kernel/sched_fair.c
===
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -264,15 +264,26 @@ static void dequeue_task_fair(struct rq

/*
 * sched_yield() support is very simple via the rbtree, we just
- * dequeue and enqueue the task, which causes the task to
- * roundrobin to the end of the tree:
+ * dequeue the task and move it to the rightmost position, which
+ * causes the task to roundrobin to the end of the tree.
 */
static void requeue_task_fair(struct rq *rq, struct task_struct *p)
{
dequeue_task_fair(rq, p);
p->on_rq = 0;
-   enqueue_task_fair(rq, p);
+   /*
+* Temporarily insert at the last position of the tree:
+*/
+   p->fair_key = LLONG_MAX;
+   __enqueue_task_fair(rq, p);
p->on_rq = 1;
+
+   /*
+* Update the key to the real value, so that when all other
+* tasks from before the rightmost position have executed,
+* this task is picked up again:
+*/
+   p->fair_key = rq->fair_clock - p->wait_runtime + p->nice_offset;


I don't think it's safe to change the key after inserting the element in the 
tree. You end up with an unsorted tree where new entries end up in 
wrong places "randomly".
I think a better approach would be to keep track of the rightmost entry, 
set the key to the rightmost's key +1 and then simply insert it there.


Esben




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Ingo Molnar

* Willy Tarreau <[EMAIL PROTECTED]> wrote:

> Good idea. The machine I'm typing from now has 1000 scheddos running 
> at +19, and 12 gears at nice 0. [...]

> From time to time, one of the 12 aligned gears will quickly perform a 
> full quarter of round while others slowly turn by a few degrees. In 
> fact, while I don't know this process's CPU usage pattern, there's 
> something useful in it : it allows me to visually see when process 
> accelerate/decelerate. [...]

cool idea - i have just tried this and it rocks - you can easily see the 
'nature' of CPU time distribution just via visual feedback. (Is there 
any easy way to start up 12 glxgears fully aligned, or does one always 
have to mouse around to get them into proper position?)
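
One hedged way to script that, relying on Willy's note elsewhere in the thread
that glxgears can be scripted with -geometry (the window size and grid offsets
below are made-up values, and this little spawner is hypothetical, not
something posted in the thread):

/* gears_grid.c: fork a cols x rows grid of glxgears windows */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	const int cols = 4, rows = 3, w = 200, h = 200;
	char geom[64];
	int i;

	for (i = 0; i < cols * rows; i++) {
		snprintf(geom, sizeof(geom), "%dx%d+%d+%d",
			 w, h, (i % cols) * w, (i / cols) * h);
		if (fork() == 0) {
			execlp("glxgears", "glxgears", "-geometry", geom,
			       (char *)NULL);
			perror("execlp glxgears");
			_exit(1);
		}
	}
	return 0;	/* parent exits; the 12 children keep spinning */
}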

btw., i am using another method to quickly judge X's behavior: i started 
the 'snowflakes' plugin in Beryl on Fedora 7, which puts a nice smooth 
opengl-rendered snow fall on the desktop background. That gives me an 
idea about how well X is scheduling under various workloads, without 
having to instrument it explicitly.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Nick Piggin
On Thu, Apr 19, 2007 at 08:38:10AM +0200, Ingo Molnar wrote:
> 
> * Andrew Morton <[EMAIL PROTECTED]> wrote:
> 
> > > And yes, by fairly, I mean fairly among all threads as a base 
> > > resource class, because that's what Linux has always done
> > 
> > Yes, there are potential compatibility problems.  Example: a machine 
> > with 100 busy httpd processes and suddenly a big gzip starts up from 
> > console or cron.
> > 
> > Under current kernels, that gzip will take ages and the httpds will 
> > take a 1% slowdown, which may well be exactly the behaviour which is 
> > desired.
> > 
> > If we were to schedule by UID then the gzip suddenly gets 50% of the 
> > CPU and those httpd's all take a 50% hit, which could be quite 
> > serious.
> > 
> > That's simple to fix via nicing, but people have to know to do that, 
> > and there will be a transition period where some disruption is 
> > possible.
> 
> h. How about the following then: default to nice -10 for all 
> (SCHED_NORMAL) kernel threads and all root-owned tasks. Root _is_ 
> special: root already has disk space reserved to it, root has special 
> memory allocation allowances, etc. I dont see a reason why we couldnt by 
> default make all root tasks have nice -10. This would be instantly loved 
> by sysadmins i suspect ;-)

I have no problem with doing fancy new fairness classes and things.

But considering that we _need_ to have per-thread fairness and that
is also what the current scheduler has and what we need to do well for
obvious reasons, the best path to take is to get per-thread scheduling
up to a point where it is able to replace the current scheduler, then
look at more complex things after that.
 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Ingo Molnar

* Davide Libenzi <[EMAIL PROTECTED]> wrote:

> > That's one reason why i dont think it's necessarily a good idea to 
> > group-schedule threads, we dont really want to do a per thread group 
> > percpu_alloc().
> 
> I still do not have clear how much overhead this will bring into the 
> table, but I think (like Linus was pointing out) the hierarchy should 
> look like:
> 
> Top (VCPU maybe?)
> User
> Process
> Thread
> 
> The "run_queue" concept (and data) that now is bound to a CPU, need to be 
> replicated in:
> 
> ROOT <- VCPUs add themselves here
> VCPU <- USERs add themselves here
> USER <- PROCs add themselves here
> PROC <- THREADs add themselves here
> THREAD (ultimate fine grained scheduling unit)
> 
> So ROOT, VCPU, USER and PROC will have their own "run_queue". Picking 
> up a new task would mean:
> 
> VCPU = ROOT->lookup();
> USER = VCPU->lookup();
> PROC = USER->lookup();
> THREAD = PROC->lookup();
> 
> Run-time statistics should propagate back the other way around.

yeah, but this looks quite bad from an overhead POV ... i think we can 
do alot simpler to solve X and kernel threads prioritization.
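
To make the quoted outline concrete, a purely illustrative sketch of such a
nested lookup (hypothetical names, not from CFS or from any posted patch; this
multi-level walk per pick is exactly the overhead being objected to here):

#include <stddef.h>

struct group_rq;

struct sched_unit {
	struct sched_unit *next;	/* sibling link in the parent's queue */
	struct group_rq *my_rq;		/* non-NULL for groups, NULL for threads */
};

struct group_rq {
	struct sched_unit *queued;	/* children runnable at this level */
};

/* placeholder policy: just take the first runnable child at this level */
struct sched_unit *pick_unit(struct group_rq *rq)
{
	return rq ? rq->queued : NULL;
}

struct sched_unit *pick_next_thread(struct group_rq *root)
{
	struct sched_unit *u = pick_unit(root);		/* ROOT -> VCPU */

	while (u && u->my_rq)				/* VCPU -> USER -> PROC */
		u = pick_unit(u->my_rq);
	return u;					/* leaf thread, or NULL */
}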

> > In fact for threads the _reverse_ problem exists, threaded apps tend 
> > to _strive_ for more performance - hence their desperation of using 
> > the threaded programming model to begin with ;) (just think of media 
> > playback apps which are typically multithreaded)
> 
> The same user nicing two different multi-threaded processes would 
> expect a predictable CPU distribution too. [...]

i disagree that the user 'would expect' this. Some users might. Others 
would say: 'my 10-thread rendering engine is more important than a 
1-thread job because it's using 10 threads for a reason'. And the CFS 
feedback so far strengthens this point: the default behavior of treating 
the thread as a single scheduling (and CPU time accounting) unit works 
pretty well on the desktop.

think about it in another, 'kernel policy' way as well: we'd like to 
_encourage_ more parallel user applications. Hurting them by accounting 
all threads together sends the exact opposite message.

> [...] Doing that efficently (the old per-cpu run-queue is pretty nice 
> from many POVs) is the real challenge.

yeah.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread William Lee Irwin III
* Andrew Morton <[EMAIL PROTECTED]> wrote:
>> Yes, there are potential compatibility problems.  Example: a machine 
>> with 100 busy httpd processes and suddenly a big gzip starts up from 
>> console or cron.
[...]

On Thu, Apr 19, 2007 at 08:38:10AM +0200, Ingo Molnar wrote:
> h. How about the following then: default to nice -10 for all 
> (SCHED_NORMAL) kernel threads and all root-owned tasks. Root _is_ 
> special: root already has disk space reserved to it, root has special 
> memory allocation allowances, etc. I dont see a reason why we couldnt by 
> default make all root tasks have nice -10. This would be instantly loved 
> by sysadmins i suspect ;-)
> (distros that go the extra mile of making Xorg run under non-root could 
> also go another extra one foot to renice that X server to -10.)

I'd further recommend making priority levels accessible to kernel threads
that are not otherwise accessible to processes, both above and below
user-available priority levels. Basically, if you can get SCHED_RR and
SCHED_FIFO to coexist as "intimate scheduler classes," then a SCHED_KERN
scheduler class can coexist with SCHED_OTHER in like fashion, but with
availability of higher and lower priorities than any userspace process
is allowed, and potentially some differing scheduling semantics. In such
a manner nonessential background processing intended not to ever disturb
userspace can be given priorities appropriate to it (perhaps even con's
SCHED_IDLEPRIO would make sense), and other, urgent processing can be
given priority over userspace altogether.

I believe root's default priority can be adjusted in userspace as
things now stand somewhere in /etc/ but I'm not sure of the specifics.
Word is somewhere in /etc/security/limits.conf
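
For reference, a guessed-at example of that knob, assuming pam_limits is in use
and quoting limits.conf syntax from memory (so check limits.conf(5) before
relying on it; the values are only illustrative):

# /etc/security/limits.conf
# <domain>   <type>   <item>      <value>
root         -        priority    -10    # default nice for root's sessions
@users       -        priority    0      # everyone else stays at nice 0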


-- wli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Mike Galbraith
On Thu, 2007-04-19 at 09:09 +0200, Ingo Molnar wrote:
> * Mike Galbraith <[EMAIL PROTECTED]> wrote:
> 
> > With a heavily reniced X (perfectly fine), that should indeed solve my 
> > daily usage pattern nicely (always need godmode for shells, but not 
> > for mozilla and ilk. 50/50 split automatic without renice of entire 
> > gui)
> 
> how about the first-approximation solution i suggested in the previous 
> mail: to add a per UID default nice level? (With this default defaulting 
> to '-10' for all root-owned processes, and defaulting to '0' for 
> everything else.) That would solve most of the current CFS regressions 
> at hand.

That would make my kernel builds etc interfere with my other self's
surfing and whatnot.  With it by EUID, when I'm surfing or whatnot, the
X portion of my Joe-User activity pushes the compile portion of root
down in bandwidth utilization automagically, which is exactly the right
thing, because the root me is not as important as the Joe-User me using
the GUI at that time.  If the idea of X disturbing root upsets some,
they can move X to another UID.  Generally, it seems perfect for here.

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Mike Galbraith
On Thu, 2007-04-19 at 08:52 +0200, Mike Galbraith wrote:
> On Wed, 2007-04-18 at 23:48 +0200, Ingo Molnar wrote:
> 
> > so my current impression is that we want per UID accounting to solve the 
> > X problem, the kernel threads problem and the many-users problem, but 
> > i'd not want to do it for threads just yet because for them there's not 
> > really any apparent problem to be solved.
> 
> If you really mean UID vs EUID as Linus mentioned, I suppose I could
> learn to login as !root, and set KDE up to always give me root shells.
> 
> With a heavily reniced X (perfectly fine), that should indeed solve my
> daily usage pattern nicely (always need godmode for shells, but not for
> mozilla and ilk. 50/50 split automatic without renice of entire gui)

Backward, needs to be EUID as Linus suggested.  Kernel builds etc along
with reniced X in root's bucket, surfing and whatnot in Joe-User's
bucket.

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Ingo Molnar

* Mike Galbraith <[EMAIL PROTECTED]> wrote:

> With a heavily reniced X (perfectly fine), that should indeed solve my 
> daily usage pattern nicely (always need godmode for shells, but not 
> for mozilla and ilk. 50/50 split automatic without renice of entire 
> gui)

how about the first-approximation solution i suggested in the previous 
mail: to add a per UID default nice level? (With this default defaulting 
to '-10' for all root-owned processes, and defaulting to '0' for 
everything else.) That would solve most of the current CFS regressions 
at hand.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Mike Galbraith
On Wed, 2007-04-18 at 23:48 +0200, Ingo Molnar wrote:

> so my current impression is that we want per UID accounting to solve the 
> X problem, the kernel threads problem and the many-users problem, but 
> i'd not want to do it for threads just yet because for them there's not 
> really any apparent problem to be solved.

If you really mean UID vs EUID as Linus mentioned, I suppose I could
learn to login as !root, and set KDE up to always give me root shells.

With a heavily reniced X (perfectly fine), that should indeed solve my
daily usage pattern nicely (always need godmode for shells, but not for
mozilla and ilk. 50/50 split automatic without renice of entire gui)

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Ingo Molnar

* Andrew Morton <[EMAIL PROTECTED]> wrote:

> > And yes, by fairly, I mean fairly among all threads as a base 
> > resource class, because that's what Linux has always done
> 
> Yes, there are potential compatibility problems.  Example: a machine 
> with 100 busy httpd processes and suddenly a big gzip starts up from 
> console or cron.
> 
> Under current kernels, that gzip will take ages and the httpds will 
> take a 1% slowdown, which may well be exactly the behaviour which is 
> desired.
> 
> If we were to schedule by UID then the gzip suddenly gets 50% of the 
> CPU and those httpd's all take a 50% hit, which could be quite 
> serious.
> 
> That's simple to fix via nicing, but people have to know to do that, 
> and there will be a transition period where some disruption is 
> possible.

h. How about the following then: default to nice -10 for all 
(SCHED_NORMAL) kernel threads and all root-owned tasks. Root _is_ 
special: root already has disk space reserved to it, root has special 
memory allocation allowances, etc. I dont see a reason why we couldnt by 
default make all root tasks have nice -10. This would be instantly loved 
by sysadmins i suspect ;-)

(distros that go the extra mile of making Xorg run under non-root could 
also go another extra one foot to renice that X server to -10.)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Ingo Molnar

* Andrew Morton [EMAIL PROTECTED] wrote:

  And yes, by fairly, I mean fairly among all threads as a base 
  resource class, because that's what Linux has always done
 
 Yes, there are potential compatibility problems.  Example: a machine 
 with 100 busy httpd processes and suddenly a big gzip starts up from 
 console or cron.
 
 Under current kernels, that gzip will take ages and the httpds will 
 take a 1% slowdown, which may well be exactly the behaviour which is 
 desired.
 
 If we were to schedule by UID then the gzip suddenly gets 50% of the 
 CPU and those httpd's all take a 50% hit, which could be quite 
 serious.
 
 That's simple to fix via nicing, but people have to know to do that, 
 and there will be a transition period where some disruption is 
 possible.

h. How about the following then: default to nice -10 for all 
(SCHED_NORMAL) kernel threads and all root-owned tasks. Root _is_ 
special: root already has disk space reserved to it, root has special 
memory allocation allowances, etc. I dont see a reason why we couldnt by 
default make all root tasks have nice -10. This would be instantly loved 
by sysadmins i suspect ;-)

(distros that go the extra mile of making Xorg run under non-root could 
also go another extra one foot to renice that X server to -10.)

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Mike Galbraith
On Wed, 2007-04-18 at 23:48 +0200, Ingo Molnar wrote:

 so my current impression is that we want per UID accounting to solve the 
 X problem, the kernel threads problem and the many-users problem, but 
 i'd not want to do it for threads just yet because for them there's not 
 really any apparent problem to be solved.

If you really mean UID vs EUID as Linus mentioned, I suppose I could
learn to login as !root, and set KDE up to always give me root shells.

With a heavily reniced X (perfectly fine), that should indeed solve my
daily usage pattern nicely (always need godmode for shells, but not for
mozilla and ilk. 50/50 split automatic without renice of entire gui)

-Mike

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Ingo Molnar

* Mike Galbraith [EMAIL PROTECTED] wrote:

 With a heavily reniced X (perfectly fine), that should indeed solve my 
 daily usage pattern nicely (always need godmode for shells, but not 
 for mozilla and ilk. 50/50 split automatic without renice of entire 
 gui)

how about the first-approximation solution i suggested in the previous 
mail: to add a per UID default nice level? (With this default defaulting 
to '-10' for all root-owned processes, and defaulting to '0' for 
everything else.) That would solve most of the current CFS regressions 
at hand.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Mike Galbraith
On Thu, 2007-04-19 at 08:52 +0200, Mike Galbraith wrote:
 On Wed, 2007-04-18 at 23:48 +0200, Ingo Molnar wrote:
 
  so my current impression is that we want per UID accounting to solve the 
  X problem, the kernel threads problem and the many-users problem, but 
  i'd not want to do it for threads just yet because for them there's not 
  really any apparent problem to be solved.
 
 If you really mean UID vs EUID as Linus mentioned, I suppose I could
 learn to login as !root, and set KDE up to always give me root shells.
 
 With a heavily reniced X (perfectly fine), that should indeed solve my
 daily usage pattern nicely (always need godmode for shells, but not for
 mozilla and ilk. 50/50 split automatic without renice of entire gui)

Backward, needs to be EUID as Linus suggested.  Kernel builds etc along
with reniced X in root's bucket, surfing and whatnot in Joe-User's
bucket.

-Mike

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Mike Galbraith
On Thu, 2007-04-19 at 09:09 +0200, Ingo Molnar wrote:
 * Mike Galbraith [EMAIL PROTECTED] wrote:
 
  With a heavily reniced X (perfectly fine), that should indeed solve my 
  daily usage pattern nicely (always need godmode for shells, but not 
  for mozilla and ilk. 50/50 split automatic without renice of entire 
  gui)
 
 how about the first-approximation solution i suggested in the previous 
 mail: to add a per UID default nice level? (With this default defaulting 
 to '-10' for all root-owned processes, and defaulting to '0' for 
 everything else.) That would solve most of the current CFS regressions 
 at hand.

That would make my kernel builds etc interfere with my other self's
surfing and whatnot.  With it by EUID, when I'm surfing or whatnot, the
X portion of my Joe-User activity pushes the compile portion of root
down in bandwidth utilization automagically, which is exactly the right
thing, because the root me in not as important as the Joe-User me using
the GUI at that time.  If the idea of X disturbing root upsets some,
they can move X to another UID.  Generally, it seems perfect for here.

-Mike

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread William Lee Irwin III
* Andrew Morton [EMAIL PROTECTED] wrote:
 Yes, there are potential compatibility problems.  Example: a machine 
 with 100 busy httpd processes and suddenly a big gzip starts up from 
 console or cron.
[...]

On Thu, Apr 19, 2007 at 08:38:10AM +0200, Ingo Molnar wrote:
 h. How about the following then: default to nice -10 for all 
 (SCHED_NORMAL) kernel threads and all root-owned tasks. Root _is_ 
 special: root already has disk space reserved to it, root has special 
 memory allocation allowances, etc. I dont see a reason why we couldnt by 
 default make all root tasks have nice -10. This would be instantly loved 
 by sysadmins i suspect ;-)
 (distros that go the extra mile of making Xorg run under non-root could 
 also go another extra one foot to renice that X server to -10.)

I'd further recommend making priority levels accessible to kernel threads
that are not otherwise accessible to processes, both above and below
user-available priority levels. Basically, if you can get SCHED_RR and
SCHED_FIFO to coexist as intimate scheduler classes, then a SCHED_KERN
scheduler class can coexist with SCHED_OTHER in like fashion, but with
availability of higher and lower priorities than any userspace process
is allowed, and potentially some differing scheduling semantics. In such
a manner nonessential background processing intended not to ever disturb
userspace can be given priorities appropriate to it (perhaps even con's
SCHED_IDLEPRIO would make sense), and other, urgent processing can be
given priority over userspace altogether.

I believe root's default priority can be adjusted in userspace as
things now stand somewhere in /etc/ but I'm not sure of the specifics.
Word is somewhere in /etc/security/limits.conf


-- wli
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Ingo Molnar

* Davide Libenzi [EMAIL PROTECTED] wrote:

  That's one reason why i dont think it's necessarily a good idea to 
  group-schedule threads, we dont really want to do a per thread group 
  percpu_alloc().
 
 I still do not have clear how much overhead this will bring into the 
 table, but I think (like Linus was pointing out) the hierarchy should 
 look like:
 
 Top (VCPU maybe?)
 User
 Process
 Thread
 
 The run_queue concept (and data) that now is bound to a CPU, need to be 
 replicated in:
 
 ROOT - VCPUs add themselves here
 VCPU - USERs add themselves here
 USER - PROCs add themselves here
 PROC - THREADs add themselves here
 THREAD (ultimate fine grained scheduling unit)
 
 So ROOT, VCPU, USER and PROC will have their own run_queue. Picking 
 up a new task would mean:
 
 VCPU = ROOT-lookup();
 USER = VCPU-lookup();
 PROC = USER-lookup();
 THREAD = PROC-lookup();
 
 Run-time statistics should propagate back the other way around.

yeah, but this looks quite bad from an overhead POV ... i think we can 
do alot simpler to solve X and kernel threads prioritization.

  In fact for threads the _reverse_ problem exists, threaded apps tend 
  to _strive_ for more performance - hence their desperation of using 
  the threaded programming model to begin with ;) (just think of media 
  playback apps which are typically multithreaded)
 
 The same user nicing two different multi-threaded processes would 
 expect a predictable CPU distribution too. [...]

i disagree that the user 'would expect' this. Some users might. Others 
would say: 'my 10-thread rendering engine is more important than a 
1-thread job because it's using 10 threads for a reason'. And the CFS 
feedback so far strengthens this point: the default behavior of treating 
the thread as a single scheduling (and CPU time accounting) unit works 
pretty well on the desktop.

think about it in another, 'kernel policy' way as well: we'd like to 
_encourage_ more parallel user applications. Hurting them by accounting 
all threads together sends the exact opposite message.

 [...] Doing that efficently (the old per-cpu run-queue is pretty nice 
 from many POVs) is the real challenge.

yeah.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Nick Piggin
On Thu, Apr 19, 2007 at 08:38:10AM +0200, Ingo Molnar wrote:
 
 * Andrew Morton [EMAIL PROTECTED] wrote:
 
   And yes, by fairly, I mean fairly among all threads as a base 
   resource class, because that's what Linux has always done
  
  Yes, there are potential compatibility problems.  Example: a machine 
  with 100 busy httpd processes and suddenly a big gzip starts up from 
  console or cron.
  
  Under current kernels, that gzip will take ages and the httpds will 
  take a 1% slowdown, which may well be exactly the behaviour which is 
  desired.
  
  If we were to schedule by UID then the gzip suddenly gets 50% of the 
  CPU and those httpd's all take a 50% hit, which could be quite 
  serious.
  
  That's simple to fix via nicing, but people have to know to do that, 
  and there will be a transition period where some disruption is 
  possible.
 
 h. How about the following then: default to nice -10 for all 
 (SCHED_NORMAL) kernel threads and all root-owned tasks. Root _is_ 
 special: root already has disk space reserved to it, root has special 
 memory allocation allowances, etc. I dont see a reason why we couldnt by 
 default make all root tasks have nice -10. This would be instantly loved 
 by sysadmins i suspect ;-)

I have no problem with doing fancy new fairness classes and things.

But considering that we _need_ to have per-thread fairness and that
is also what the current scheduler has and what we need to do well for
obvious reasons, the best path to take is to get per-thread scheduling
up to a point where it is able to replace the current scheduler, then
look at more complex things after that.
 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Ingo Molnar

* Willy Tarreau [EMAIL PROTECTED] wrote:

 Good idea. The machine I'm typing from now has 1000 scheddos running 
 at +19, and 12 gears at nice 0. [...]

 From time to time, one of the 12 aligned gears will quickly perform a 
 full quarter of round while others slowly turn by a few degrees. In 
 fact, while I don't know this process's CPU usage pattern, there's 
 something useful in it : it allows me to visually see when process 
 accelerate/decelerate. [...]

cool idea - i have just tried this and it rocks - you can easily see the 
'nature' of CPU time distribution just via visual feedback. (Is there 
any easy way to start up 12 glxgears fully aligned, or does one always 
have to mouse around to get them into proper position?)

btw., i am using another method to quickly judge X's behavior: i started 
the 'snowflakes' plugin in Beryl on Fedora 7, which puts a nice smooth 
opengl-rendered snow fall on the desktop background. That gives me an 
idea about how well X is scheduling under various workloads, without 
having to instrument it explicitly.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: CFS and suspend2: hang in atomic copy (was: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS])

2007-04-19 Thread Esben Nielsen



On Wed, 18 Apr 2007, Ingo Molnar wrote:



* Christian Hesse [EMAIL PROTECTED] wrote:


Hi Ingo and all,

On Friday 13 April 2007, Ingo Molnar wrote:

as usual, any sort of feedback, bugreports, fixes and suggestions are
more than welcome,


I just gave CFS a try on my system. From a user's point of view it
looks good so far. Thanks for your work.


you are welcome!


However I found a problem: When trying to suspend a system patched
with suspend2 2.2.9.11 it hangs with doing atomic copy. Pressing the
ESC key results in a message that it tries to abort suspend, but then
still hangs.


i took a quick look at suspend2 and it makes some use of yield().
There's a bug in CFS's yield code, i've attached a patch that should fix
it, does it make any difference to the hang?

Ingo

Index: linux/kernel/sched_fair.c
===
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -264,15 +264,26 @@ static void dequeue_task_fair(struct rq

/*
 * sched_yield() support is very simple via the rbtree, we just
- * dequeue and enqueue the task, which causes the task to
- * roundrobin to the end of the tree:
+ * dequeue the task and move it to the rightmost position, which
+ * causes the task to roundrobin to the end of the tree.
 */
static void requeue_task_fair(struct rq *rq, struct task_struct *p)
{
dequeue_task_fair(rq, p);
 	p->on_rq = 0;
-	enqueue_task_fair(rq, p);
+	/*
+	 * Temporarily insert at the last position of the tree:
+	 */
+	p->fair_key = LLONG_MAX;
+	__enqueue_task_fair(rq, p);
 	p->on_rq = 1;
+
+	/*
+	 * Update the key to the real value, so that when all other
+	 * tasks from before the rightmost position have executed,
+	 * this task is picked up again:
+	 */
+	p->fair_key = rq->fair_clock - p->wait_runtime + p->nice_offset;


I don't think it is safe to change the key after inserting the element in the 
tree. You end up with an unsorted tree where new entries end up in 
wrong places randomly.
I think a better approach would be to keep track of the rightmost entry, 
set the key to the rightmost's key +1 and then simply insert it there.
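
(As an aside, the rule being described here fits in a few lines. The sketch
below is purely illustrative, reuses the field names from the patch above,
and the helper name is made up; it is not code from any posted patch:)

	/*
	 * Illustrative only: the generic "erase, update key, re-insert"
	 * pattern. Changing p->fair_key while p->run_node is still linked
	 * into the tree leaves the rbtree unsorted, which is exactly the
	 * problem described above.
	 */
	static void requeue_with_new_key(struct rq *rq, struct task_struct *p,
					 s64 new_key)
	{
		rb_erase(&p->run_node, &rq->tasks_timeline); /* unlink first     */
		p->fair_key = new_key;                       /* now safe to edit */
		__enqueue_task_fair(rq, p);                  /* re-insert sorted */
	}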


Esben






Re: CFS and suspend2: hang in atomic copy (was: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS])

2007-04-19 Thread Ingo Molnar

* Esben Nielsen [EMAIL PROTECTED] wrote:

 +/*
 + * Temporarily insert at the last position of the tree:
 + */
 +p->fair_key = LLONG_MAX;
 +__enqueue_task_fair(rq, p);
  p->on_rq = 1;
 +
 +/*
 + * Update the key to the real value, so that when all other
 + * tasks from before the rightmost position have executed,
 + * this task is picked up again:
 + */
 +p->fair_key = rq->fair_clock - p->wait_runtime + p->nice_offset;
 
 I don't think it is safe to change the key after inserting the element in 
 the tree. You end up with an unsorted tree where new entries 
 end up in wrong places randomly.

yeah, indeed. I hoped that once this rightmost entry is removed (as soon 
as it gets scheduled next time) the tree goes back to a correct shape, 
but that's not the case - the left sub-tree and the right sub-tree are 
merged by the rbtree code with the assumption that the entry had a 
correct key.

 I think a better approach would be to keep track of the rightmost 
 entry, set the key to the rightmost's key +1 and then simply insert it 
 there.

yeah. I had that implemented at a stage but was trying to be too clever 
for my own good ;-)

Ingo


Re: CFS and suspend2: hang in atomic copy (was: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS])

2007-04-19 Thread Ingo Molnar

* Ingo Molnar [EMAIL PROTECTED] wrote:

  I think a better approach would be to keep track of the rightmost 
  entry, set the key to the rightmost's key +1 and then simply insert 
  it there.
 
 yeah. I had that implemented at a stage but was trying to be too 
 clever for my own good ;-)

i have fixed it via the patch below. (I'm using rb_last() because that 
way the normal scheduling codepaths are not burdened with the 
maintenance of a rightmost entry.)

Ingo

---
 kernel/sched.c  |3 ++-
 kernel/sched_fair.c |   24 +---
 2 files changed, 15 insertions(+), 12 deletions(-)

Index: linux/kernel/sched.c
===
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -3806,7 +3806,8 @@ asmlinkage long sys_sched_yield(void)
schedstat_inc(rq, yld_cnt);
 	if (rq->nr_running == 1)
 		schedstat_inc(rq, yld_act_empty);
-	current->sched_class->yield_task(rq, current);
+	else
+		current->sched_class->yield_task(rq, current);
 
/*
 * Since we are going to call schedule() anyway, there's
Index: linux/kernel/sched_fair.c
===
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -275,21 +275,23 @@ static void dequeue_task_fair(struct rq 
  */
 static void yield_task_fair(struct rq *rq, struct task_struct *p)
 {
+   struct rb_node *entry;
+   struct task_struct *last;
+
dequeue_task_fair(rq, p);
 	p->on_rq = 0;
+
 	/*
-	 * Temporarily insert at the last position of the tree:
+	 * Temporarily insert at the last position of the tree.
+	 * The key will be updated back to (near) its old value
+	 * when the task gets scheduled.
 	 */
-	p->fair_key = LLONG_MAX;
+	entry = rb_last(&rq->tasks_timeline);
+	last = rb_entry(entry, struct task_struct, run_node);
+
+	p->fair_key = last->fair_key + 1;
 	__enqueue_task_fair(rq, p);
 	p->on_rq = 1;
-
-	/*
-	 * Update the key to the real value, so that when all other
-	 * tasks from before the rightmost position have executed,
-	 * this task is picked up again:
-	 */
-	p->fair_key = rq->fair_clock - p->wait_runtime + p->nice_offset;
 }
 
 /*


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Peter Williams

William Lee Irwin III wrote:

* Andrew Morton [EMAIL PROTECTED] wrote:
Yes, there are potential compatibility problems.  Example: a machine 
with 100 busy httpd processes and suddenly a big gzip starts up from 
console or cron.

[...]

On Thu, Apr 19, 2007 at 08:38:10AM +0200, Ingo Molnar wrote:
h. How about the following then: default to nice -10 for all 
(SCHED_NORMAL) kernel threads and all root-owned tasks. Root _is_ 
special: root already has disk space reserved to it, root has special 
memory allocation allowances, etc. I dont see a reason why we couldnt by 
default make all root tasks have nice -10. This would be instantly loved 
by sysadmins i suspect ;-)
(distros that go the extra mile of making Xorg run under non-root could 
also go another extra foot to renice that X server to -10.)


I'd further recommend making priority levels accessible to kernel threads
that are not otherwise accessible to processes, both above and below
user-available priority levels. Basically, if you can get SCHED_RR and
SCHED_FIFO to coexist as intimate scheduler classes, then a SCHED_KERN
scheduler class can coexist with SCHED_OTHER in like fashion, but with
availability of higher and lower priorities than any userspace process
is allowed, and potentially some differing scheduling semantics. In such
a manner nonessential background processing intended not to ever disturb
userspace can be given priorities appropriate to it (perhaps even con's
SCHED_IDLEPRIO would make sense), and other, urgent processing can be
given priority over userspace altogether.

I believe root's default priority can be adjusted in userspace as
things now stand somewhere in /etc/ but I'm not sure of the specifics.
Word is somewhere in /etc/security/limits.conf


This is sounding very much like System V Release 4 (and descendants) 
except that they call it SCHED_SYS and also give SCHED_NORMAL tasks that 
are in system mode dynamic priorities in the SCHED_SYS range (to avoid 
priority inversion, I believe).


Peter
--
Peter Williams   [EMAIL PROTECTED]

Learning, n. The kind of ignorance distinguishing the studious.
 -- Ambrose Bierce


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Willy Tarreau
Hi Ingo,

On Thu, Apr 19, 2007 at 11:01:44AM +0200, Ingo Molnar wrote:
 
 * Willy Tarreau [EMAIL PROTECTED] wrote:
 
  Good idea. The machine I'm typing from now has 1000 scheddos running 
  at +19, and 12 gears at nice 0. [...]
 
  From time to time, one of the 12 aligned gears will quickly perform a 
  full quarter of round while others slowly turn by a few degrees. In 
  fact, while I don't know this process's CPU usage pattern, there's 
  something useful in it : it allows me to visually see when process 
  accelerate/decelerate. [...]
 
 cool idea - i have just tried this and it rocks - you can easily see the 
 'nature' of CPU time distribution just via visual feedback. (Is there 
 any easy way to start up 12 glxgears fully aligned, or does one always 
 have to mouse around to get them into proper position?)

-- Replying quickly, I'm short in time --

You can certainly script it with -geometry. But it is the wrong application
for this matter, because you benchmark X more than glxgears itself. What would
be better is something like a line rotating 360 degrees and doing some short
stuff between each degree, so that X is not much solicited, but the CPU
would be spent more on the processes themselves.

Benchmarking interactions between X and multiple clients is a completely
different test IMHO. Glxgears is between those two, making it inappropriate
for scheduler tuning.

Regards,
Willy



Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Ingo Molnar

* Willy Tarreau [EMAIL PROTECTED] wrote:

 You can certainly script it with -geometry. But it is the wrong 
 application for this matter, because you benchmark X more than 
 glxgears itself. What would be better is something like a line 
 rotating 360 degrees and doing some short stuff between each degree, 
 so that X is not much solicited, but the CPU would be spent more on 
 the processes themselves.

at least on my setup glxgears goes via DRI/DRM so there's no X 
scheduling inbetween at all, and the visual appearance of glxgears is a 
direct function of its scheduling.

Ingo


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Davide Libenzi
On Thu, 19 Apr 2007, Ingo Molnar wrote:

 i disagree that the user 'would expect' this. Some users might. Others 
 would say: 'my 10-thread rendering engine is more important than a 
 1-thread job because it's using 10 threads for a reason'. And the CFS 
 feedback so far strengthens this point: the default behavior of treating 
 the thread as a single scheduling (and CPU time accounting) unit works 
 pretty well on the desktop.
 
 think about it in another, 'kernel policy' way as well: we'd like to 
 _encourage_ more parallel user applications. Hurting them by accounting 
 all threads together sends the exact opposite message.

There are counter-arguments too. Like, not every user knows if a certain 
process is MT or not. I agree though that doing accounting and fairness at 
a depth lower than USER is messy, and not only for performance.


- Davide




Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Davide Libenzi
On Thu, 19 Apr 2007, Mike Galbraith wrote:

 On Thu, 2007-04-19 at 09:09 +0200, Ingo Molnar wrote:
  * Mike Galbraith [EMAIL PROTECTED] wrote:
  
   With a heavily reniced X (perfectly fine), that should indeed solve my 
   daily usage pattern nicely (always need godmode for shells, but not 
   for mozilla and ilk. 50/50 split automatic without renice of entire 
   gui)
  
  how about the first-approximation solution i suggested in the previous 
  mail: to add a per UID default nice level? (With this default defaulting 
  to '-10' for all root-owned processes, and defaulting to '0' for 
  everything else.) That would solve most of the current CFS regressions 
  at hand.
 
 That would make my kernel builds etc interfere with my other self's
 surfing and whatnot.  With it by EUID, when I'm surfing or whatnot, the
 X portion of my Joe-User activity pushes the compile portion of root
 down in bandwidth utilization automagically, which is exactly the right
 thing, because the root me is not as important as the Joe-User me using
 the GUI at that time.  If the idea of X disturbing root upsets some,
 they can move X to another UID.  Generally, it seems perfect for here.

Now guys, I did not follow the whole lengthy and feisty thread, but IIRC 
Con's scheduler has been attacked because, among other arguments, it was 
requiring X to be reniced. This happened like a month ago IINM.
I did not have time to look at Con's scheduler, and I only had a brief 
look at Ingo's one (looks very promising IMO, but so was the initial O(1) 
post before all the corner-cases fixes went in).
But this is not a about technical merit, this is about applying the same 
rules of judgement to others as well to ourselves.
We went from a "renicing X to -10 is bad because the scheduler should 
be able to correctly handle the problem w/out additional external plugs" 
to a totally opposite "let's renice -10 X, the whole SCHED_NORMAL kthreads 
class, on top of all the tasks owned by root" [1].
From a spectator POV like myself in this case, this looks rather unfair.



[1] I think, before and now, that that's more a duct tape patch than a 
real solution. OTOH if the solution is gonna be another maze of 
macros and heuristics filled with pretty bad corner cases, I may 
prefer the former.


- Davide




Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Gene Heskett
On Thursday 19 April 2007, Ingo Molnar wrote:
* Willy Tarreau [EMAIL PROTECTED] wrote:
 Good idea. The machine I'm typing from now has 1000 scheddos running
 at +19, and 12 gears at nice 0. [...]

 From time to time, one of the 12 aligned gears will quickly perform a
 full quarter of round while others slowly turn by a few degrees. In
 fact, while I don't know this process's CPU usage pattern, there's
 something useful in it : it allows me to visually see when process
 accelerate/decelerate. [...]

cool idea - i have just tried this and it rocks - you can easily see the
'nature' of CPU time distribution just via visual feedback. (Is there
any easy way to start up 12 glxgears fully aligned, or does one always
have to mouse around to get them into proper position?)

btw., i am using another method to quickly judge X's behavior: i started
the 'snowflakes' plugin in Beryl on Fedora 7, which puts a nice smooth
opengl-rendered snow fall on the desktop background. That gives me an
idea about how well X is scheduling under various workloads, without
having to instrument it explicitly.

yes, it's a cute idea, till you switch away from that screen to check progress 
on something else, like to compose this message.

===
5913 frames in 5.0 seconds = 1182.499 FPS
6238 frames in 5.0 seconds = 1247.556 FPS
11380 frames in 5.0 seconds = 2275.905 FPS
10691 frames in 5.0 seconds = 2138.173 FPS
8707 frames in 5.0 seconds = 1741.305 FPS
10669 frames in 5.0 seconds = 2133.708 FPS
11392 frames in 5.0 seconds = 2278.037 FPS
11379 frames in 5.0 seconds = 2275.711 FPS
11310 frames in 5.0 seconds = 2261.861 FPS
11386 frames in 5.0 seconds = 2277.081 FPS
11292 frames in 5.0 seconds = 2258.353 FPS
11352 frames in 5.0 seconds = 2270.297 FPS
11415 frames in 5.0 seconds = 2282.886 FPS
11406 frames in 5.0 seconds = 2281.037 FPS
11483 frames in 5.0 seconds = 2296.533 FPS
11510 frames in 5.0 seconds = 2301.883 FPS
11123 frames in 5.0 seconds = 2224.266 FPS
8980 frames in 5.0 seconds = 1795.861 FPS
===
The over 2000fps reports were while I was either looking at htop, or starting 
this message, both on different screens.  htop said it was using 95+ % of the 
cpu even when its display was going to /dev/null.  So while it's 'Kewl', it 
doesn't give us apples-to-apples numbers we could take to the window and bet 
win-place-show on by themselves.

FWIW, running the nvidia-9755 drivers here.

So if we are going to use that as a judgement operator, it obviously needs 
some intelligently applied scaling before the numbers are worth more than a 
subjective feel.

   Ingo



-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
The confusion of a staff member is measured by the length of his memos.
-- New York Times, Jan. 20, 1981


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Gene Heskett
On Thursday 19 April 2007, Ingo Molnar wrote:
* Willy Tarreau [EMAIL PROTECTED] wrote:
 You can certainly script it with -geometry. But it is the wrong
 application for this matter, because you benchmark X more than
 glxgears itself. What would be better is something like a line
 rotating 360 degrees and doing some short stuff between each degree,
 so that X is not much solicited, but the CPU would be spent more on
 the processes themselves.

at least on my setup glxgears goes via DRI/DRM so there's no X
scheduling inbetween at all, and the visual appearance of glxgears is a
direct function of its scheduling.

   Ingo

That doesn't appear to be the case here Ingo. Even when I know the rest of the 
system is lagged, glxgears continues to show very smooth and steady movement.

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
Yow!  I just went below the poverty line!


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Bernd Eckenfels
In article [EMAIL PROTECTED] you wrote:
 Top (VCPU maybe?)
User
Process
Thread

The problem with that is that not all schedulers might work at the User
level. You can think of Batch/Job, Parent, Group, Session or namespace
levels. That would IMHO be a generic Top, with no need for a level above.

Greetings
Bernd


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Willy Tarreau
On Thu, Apr 19, 2007 at 05:18:03PM +0200, Ingo Molnar wrote:
 
 * Willy Tarreau [EMAIL PROTECTED] wrote:
 
  You can certainly script it with -geometry. But it is the wrong 
  application for this matter, because you benchmark X more than 
  glxgears itself. What would be better is something like a line 
  rotating 360 degrees and doing some short stuff between each degree, 
  so that X is not much solicited, but the CPU would be spent more on 
  the processes themselves.
 
 at least on my setup glxgears goes via DRI/DRM so there's no X 
 scheduling inbetween at all, and the visual appearance of glxgears is a 
 direct function of its scheduling.

OK, I thought that something looking like a clock would be useful, especially
if we could tune the amount of CPU spent per task instead of being limited by
graphics drivers.

I searched freshmeat for a clock and found orbitclock by Jeremy Weatherford,
which was exactly what I was looking for :
  - small
  - C only
  - X11 only
  - needed less than 5 minutes and no knowledge of X11 for the complete hack !
  = Kudos to its author, sincerely !

I hacked it a bit to make it accept two parameters :
  -R run_time_in_microsecond : time spent burning CPU cycles at each round
  -S sleep_time_in_microsecond : time spent getting a rest

It now advances what it thinks is a second at each iteration, which makes
it easy to compare its progress with other instances (there are seconds,
minutes and hours, so it's easy to visually count up to around 43200).

The modified code is here :

  http://linux.1wt.eu/sched/orbitclock-0.2bench.tgz

What is interesting to note is that it's easy to make X work a lot (99%) by
using 0 as the sleeping time, and it's easy to make the process work a lot
by using large values for the running time associated with very low values
(or 0) for the sleep time.
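
(For the curious, the -R/-S duty cycle boils down to a loop like the one
below. This is only a rough sketch of the idea, not the actual orbitclock
code, and the helper names are invented:)

	#include <time.h>
	#include <unistd.h>

	/* One -R/-S iteration: burn the CPU for run_us microseconds,
	 * then sleep for sleep_us, then advance the clock by one "second". */
	static long long now_us(void)
	{
		struct timespec ts;

		clock_gettime(CLOCK_MONOTONIC, &ts);
		return ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;
	}

	static void one_round(long run_us, long sleep_us)
	{
		long long start = now_us();

		while (now_us() - start < run_us)
			;			/* busy loop: eat CPU cycles */
		if (sleep_us)
			usleep(sleep_us);	/* give the CPU back         */
		/* ... redraw here: advance the hands by one step ... */
	}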

Ah, and it supports -geometry ;-)

It could become a useful scheduler benchmark !

Have fun !
Willy



Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Jan Knutar
On Thursday 19 April 2007 18:18, Ingo Molnar wrote:
 * Willy Tarreau [EMAIL PROTECTED] wrote:
  You can certainly script it with -geometry. But it is the wrong
  application for this matter, because you benchmark X more than
  glxgears itself. What would be better is something like a line
  rotating 360 degrees and doing some short stuff between each
  degree, so that X is not much solicited, but the CPU would be
  spent more on the processes themselves.

 at least on my setup glxgears goes via DRI/DRM so there's no X
 scheduling inbetween at all, and the visual appearance of glxgears is
 a direct function of its scheduling.

How much of the subjective interactiveness-feel of the desktop is at the 
mercy of the X server's scheduling and not the cpu scheduler?

I've noticed that video playback is significantly smoother and more resistant 
to other load when using MPlayer's opengl output, especially if 
heavy programs are running at the same time. Especially firefox and 
ksysguard seem to have found a way to cause video through Xv to look 
annoyingly jittery.


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Willy Tarreau
On Fri, Apr 20, 2007 at 02:52:38AM +0300, Jan Knutar wrote:
 On Thursday 19 April 2007 18:18, Ingo Molnar wrote:
  * Willy Tarreau [EMAIL PROTECTED] wrote:
   You can certainly script it with -geometry. But it is the wrong
   application for this matter, because you benchmark X more than
   glxgears itself. What would be better is something like a line
   rotating 360 degrees and doing some short stuff between each
   degree, so that X is not much solicited, but the CPU would be
   spent more on the processes themselves.
 
  at least on my setup glxgears goes via DRI/DRM so there's no X
  scheduling inbetween at all, and the visual appearance of glxgears is
  a direct function of its scheduling.
 
 How much of the subjective interactiveness-feel of the desktop is at the 
 mercy of the X server's scheduling and not the cpu scheduler?

probably a lot. Hence the reason why I wanted something visually noticeable
but using far less X resources than glxgears. The modified orbitclock is
perfect IMHO.

Regards,
Willy



Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread Mike Galbraith
On Thu, 2007-04-19 at 09:55 -0700, Davide Libenzi wrote:
 On Thu, 19 Apr 2007, Mike Galbraith wrote:
 
  On Thu, 2007-04-19 at 09:09 +0200, Ingo Molnar wrote:
   * Mike Galbraith [EMAIL PROTECTED] wrote:
   
With a heavily reniced X (perfectly fine), that should indeed solve my 
daily usage pattern nicely (always need godmode for shells, but not 
for mozilla and ilk. 50/50 split automatic without renice of entire 
gui)
   
   how about the first-approximation solution i suggested in the previous 
   mail: to add a per UID default nice level? (With this default defaulting 
   to '-10' for all root-owned processes, and defaulting to '0' for 
   everything else.) That would solve most of the current CFS regressions 
   at hand.
  
  That would make my kernel builds etc interfere with my other self's
  surfing and whatnot.  With it by EUID, when I'm surfing or whatnot, the
  X portion of my Joe-User activity pushes the compile portion of root
  down in bandwidth utilization automagically, which is exactly the right
  thing, because the root me is not as important as the Joe-User me using
  the GUI at that time.  If the idea of X disturbing root upsets some,
  they can move X to another UID.  Generally, it seems perfect for here.
 
 Now guys, I did not follow the whole lengthy and feisty thread, but IIRC 
 Con's scheduler has been attacked because, among other arguments, it was 
 requiring X to be reniced. This happened like a month ago IINM.

I don't object to renicing X if you want it to receive _more_ than its
fair share. I do object to having to renice X in order for it to _get_
its fair share.  That's what I attacked.

 I did not have time to look at Con's scheduler, and I only had a brief 
 look at Ingo's one (looks very promising IMO, but so was the initial O(1) 
 post before all the corner-cases fixes went in).
 But this is not a about technical merit, this is about applying the same 
 rules of judgement to others as well to ourselves.

I'm running the same tests with CFS that I ran for RSDL/SD.  It falls
short in one key area (to me) in that X+client cannot yet split my box
50/50 with two concurrent tasks.  In the CFS case, renicing both X and
client does work, but it should not be necessary IMHO.  With RSDL/SD
renicing didn't help.

 We went from a "renicing X to -10 is bad because the scheduler should 
 be able to correctly handle the problem w/out additional external plugs" 
 to a totally opposite "let's renice -10 X, the whole SCHED_NORMAL kthreads 
 class, on top of all the tasks owned by root" [1].
 From a spectator POV like myself in this case, this looks rather unfair.

Well, for me, the renicing I mentioned above is only interesting as a
way to improve long term fairness with schedulers with no history.

I found Linus' EUID idea intriguing in that by putting the server
together with a steady load in one 'fair' domain, and clients in
another, X can, if prioritized to empower it to do so, modulate the
steady load in its domain (but can't starve it!), the clients modulate
X, and the steady load gets it all when X and clients are idle.  The
nice level of X determines to what _extent_ X can modulate the constant
load rather like a mixer slider.  The synchronous (I'm told) nature of
X/client then becomes kind of an asset to the desktop instead of a
liability.

The specific case I was thinking about is the X+Gforce test where both
RSDL and CFS fail to provide fairness (as defined by me;).  X and Gforce
are mostly not concurrent.  The make -j2 I put them up against are
mostly concurrent.  I don't call giving 1/3 of my CPU to X+Client fair
at _all_, but that's what you'll get if your fairstick of the instant
generally can't see the fourth competing task.  Seemed pretty cool to me
because it creates the missing connection between client and server,
though also likely complicated (and maybe full of perils, who knows).

-Mike



Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-19 Thread William Lee Irwin III
William Lee Irwin III wrote:
 I'd further recommend making priority levels accessible to kernel threads
 that are not otherwise accessible to processes, both above and below
 user-available priority levels. Basically, if you can get SCHED_RR and
 SCHED_FIFO to coexist as intimate scheduler classes, then a SCHED_KERN
 scheduler class can coexist with SCHED_OTHER in like fashion, but with
 availability of higher and lower priorities than any userspace process
 is allowed, and potentially some differing scheduling semantics. In such
 a manner nonessential background processing intended not to ever disturb
 userspace can be given priorities appropriate to it (perhaps even con's
 SCHED_IDLEPRIO would make sense), and other, urgent processing can be
 given priority over userspace altogether.

On Thu, Apr 19, 2007 at 09:50:19PM +1000, Peter Williams wrote:
 This is sounding very much like System V Release 4 (and descendants) 
 except that they call it SCHED_SYS and also give SCHED_NORMAL tasks that 
 are in system mode dynamic priorities in the SCHED_SYS range (to avoid 
 priority inversion, I believe).

Descriptions of that are probably where I got the idea (hurrah for OS
textbooks). It makes a fair amount of sense. Not sure what the take on
the specific precedent is. The only content here is expanding the
priority range with ranges above and below for the exclusive use of
ultra-privileged tasks, so it's really trivial. Actually it might be so
trivial it should just be some permission checks in the SCHED_OTHER
renicing code.


-- wli


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Andrew Morton
On Thu, 19 Apr 2007 05:18:07 +0200 Nick Piggin <[EMAIL PROTECTED]> wrote:

> And yes, by fairly, I mean fairly among all threads as a base resource
> class, because that's what Linux has always done

Yes, there are potential compatibility problems.  Example: a machine with
100 busy httpd processes and suddenly a big gzip starts up from console or
cron.

Under current kernels, that gzip will take ages and the httpds will take a
1% slowdown, which may well be exactly the behaviour which is desired.

If we were to schedule by UID then the gzip suddenly gets 50% of the CPU
and those httpd's all take a 50% hit, which could be quite serious.

That's simple to fix via nicing, but people have to know to do that, and
there will be a transition period where some disruption is possible.
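
(A back-of-the-envelope illustration of those numbers; a throwaway sketch,
not anything proposed in the thread:)

	#include <stdio.h>

	int main(void)
	{
		int httpds = 100;

		/* per-thread fairness: 101 equal slices, so the gzip barely
		 * dents the httpds (~1% each before and after) */
		printf("per-thread: httpd %.2f%% each, gzip %.2f%%\n",
		       100.0 / (httpds + 1), 100.0 / (httpds + 1));

		/* per-UID fairness: each user gets 50%, split internally,
		 * so every httpd drops from ~1% to 0.5% */
		printf("per-UID:    httpd %.2f%% each, gzip %.2f%%\n",
		       50.0 / httpds, 50.0);
		return 0;
	}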



Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Nick Piggin
On Wed, Apr 18, 2007 at 10:49:45PM +1000, Con Kolivas wrote:
> On Wednesday 18 April 2007 22:13, Nick Piggin wrote:
> >
> > The kernel compile (make -j8 on 4 thread system) is doing 1800 total
> > context switches per second (450/s per runqueue) for cfs, and 670
> > for mainline. Going up to 20ms granularity for cfs brings the context
> > switch numbers similar, but user time is still a % or so higher. I'd
> > be more worried about compute heavy threads which naturally don't do
> > much context switching.
> 
> While kernel compiles are nice and easy to do I've seen enough criticism of 
> them in the past to wonder about their usefulness as a standard benchmark on 
> their own.

Actually it is a real workload for most kernel developers including you
no doubt :)

The criticism's of kernbench for the kernel are probably fair in that
kernel compiles don't exercise a lot of kernel functionality (page
allocator and fault paths mostly, IIRC). However as far as I'm concerned,
they're great for testing the CPU scheduler, because it doesn't actually
matter whether you're running in userspace or kernel space for a context
switch to blow your caches. The results are quite stable.

You could actually make up a benchmark that hurts a whole lot more from
context switching, but I figure that kernbench is a real world thing
that shows it up quite well.


> > Some other numbers on the same system
> > Hackbench:          2.6.21-rc7  cfs-v2 1ms[*]  nicksched
> > 10 groups: Time:    1.332       0.743          0.607
> > 20 groups: Time:    1.197       1.100          1.241
> > 30 groups: Time:    1.754       2.376          1.834
> > 40 groups: Time:    3.451       2.227          2.503
> > 50 groups: Time:    3.726       3.399          3.220
> > 60 groups: Time:    3.548       4.567          3.668
> > 70 groups: Time:    4.206       4.905          4.314
> > 80 groups: Time:    4.551       6.324          4.879
> > 90 groups: Time:    7.904       6.962          5.335
> > 100 groups: Time:   7.293       7.799          5.857
> > 110 groups: Time:   10.595      8.728          6.517
> > 120 groups: Time:   7.543       9.304          7.082
> > 130 groups: Time:   8.269       10.639         8.007
> > 140 groups: Time:   11.867      8.250          8.302
> > 150 groups: Time:   14.852      8.656          8.662
> > 160 groups: Time:   9.648       9.313          9.541
> 
> Hackbench even more so. A prolonged discussion with Rusty Russell on this 
> issue he suggested hackbench was more a pass/fail benchmark to ensure there 
> was no starvation scenario that never ended, and very little value should be 
> placed on the actual results returned from it.

Yeah, cfs seems to do a little worse than nicksched here, but I
include the numbers not because I think that is significant, but to
show mainline's poor characteristics.


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Nick Piggin
On Wed, Apr 18, 2007 at 07:48:21AM -0700, Linus Torvalds wrote:
> 
> 
> On Wed, 18 Apr 2007, Matt Mackall wrote:
> > 
> > Why is X special? Because it does work on behalf of other processes?
> > Lots of things do this. Perhaps a scheduler should focus entirely on
> > the implicit and directed wakeup matrix and optimizing that
> > instead[1].
> 
> I 100% agree - the perfect scheduler would indeed take into account where 
> the wakeups come from, and try to "weigh" processes that help other 
> processes make progress more. That would naturally give server processes 
> more CPU power, because they help others
> 
> I don't believe for a second that "fairness" means "give everybody the 
> same amount of CPU". That's a totally illogical measure of fairness. All 
> processes are _not_ created equal.

I believe that unless the kernel is told of these inequalities, it
must schedule fairly.

And yes, by fairly, I mean fairly among all threads as a base resource
class, because that's what Linux has always done (and if you aggregate
into higher classes, you still need that per-thread scheduling).

So I'm not excluding extra scheduling classes like per-process, per-user,
but among any class of equal schedulable entities, fair scheduling is the
only option because the alternative of unfairness is just insane.


> That said, even trying to do "fairness by effective user ID" would 
> probably already do a lot. In a desktop environment, X would get as much 
> CPU time as the user processes, simply because it's in a different 
> protection domain (and that's really what "effective user ID" means: it's 
> not about "users", it's really about "protection domains").
> 
> And "fairness by euid" is probably a hell of a lot easier to do than 
> trying to figure out the wakeup matrix.

Well my X server has an euid of root, which would mean my X clients can
cause X to do work and eat into root's resources. Or as Ingo said, X
may not be running as root. Seems like just another hack to try to
implicitly solve the X problem and probably create a lot of others
along the way.

All fairness issues aside, in the context of keeping a very heavily
loaded desktop interactive, X is special. That you are trying to think
up funny rules that would implicitly give X better priority is kind of
indicative of that.



Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Peter Williams

Ingo Molnar wrote:

* Peter Williams <[EMAIL PROTECTED]> wrote:

And my scheduler for example cuts down the amount of policy code and 
code size significantly.
Yours is one of the smaller patches mainly because you perpetuate (or 
you did in the last one I looked at) the (horrible to my eyes) dual 
array (active/expired) mechanism.  That this idea was bad should have 
been apparent to all as soon as the decision was made to excuse some 
tasks from being moved from the active array to the expired array.  
This essentially meant that there would be circumstances where extreme 
unfairness (to the extent of starvation in some cases) could occur -- the 
very thing that the mechanism was originally designed to ensure didn't 
happen (as far as I can gather).  Right about then in the development of the O(1) 
scheduler alternative solutions should have been sought.


in hindsight i'd agree.


Hindsight's a wonderful place isn't it :-) and, of course, it's where I 
was making my comments from.


But back then we were clearly not ready for 
fine-grained accurate statistics + trees (cpus are a lot faster at more 
complex arithmetic today, plus people still believed that low-res can 
be done well enough), and taking out either of these two concepts from CFS

would result in a similarly complex runqueue implementation.


I disagree.  The single priority array with a promotion mechanism that I 
use in the SPA schedulers can do the job of avoiding starvation with no 
measurable increase in the overhead.  Fairness, nice, good interactive 
responsiveness can then be managed by how you determine tasks' dynamic 
priorities.


Also, the 
array switch was just thought to be another piece of 'if the 
heuristics go wrong, we fall back to an array switch' logic, right in 
line with the other heuristics. And you have to accept it, mainline's 
ability to auto-renice make -j jobs (and other CPU hogs) was quite a 
plus for developers, so it had (and probably still has) quite some 
inertia.


I agree, it wasn't totally useless, especially for the average user.  My 
main problem with it was that the effect of "nice" wasn't consistent or 
predictable enough for reliable resource allocation.


I also agree with the aims of the various heuristics, i.e. you have to be 
unfair and give some tasks preferential treatment in order to give the 
users the type of responsiveness that they want.  It's just a shame that 
it got broken in the process but as you say it's easier to see these 
things in hindsight than in the middle of the melee.


Peter
--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce


Re: [ck] Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Peter Williams

Chris Friesen wrote:

Mark Glines wrote:


One minor question: is it even possible to be completely fair on SMP?
For instance, if you have a 2-way SMP box running 3 applications, one of
which has 2 threads, will the threaded app have an advantage here?  (The
current system seems to try to keep each thread on a specific CPU, to
reduce cache thrashing, which means threads and processes alike each
get 50% of the CPU.)


I think the ideal in this case would be to have both threads on one cpu, 
with the other app on the other cpu.  This gives inter-process fairness 
while minimizing the amount of task migration required.


Solving this sort of issue was one of the reasons for the smpnice patches.



More interesting is the case of three processes on a 2-cpu system.  Do 
we constantly migrate one of them back and forth to ensure that each of 
them gets 66% of a cpu?


Depends how keen you are on fairness.  Unless the processes are long term 
continuously active tasks that never sleep, it's probably not an issue, as 
they'll move around enough for each of them to 
get 66% over the long term.


Exact load balancing for real work loads (where tasks are coming and 
going, sleeping and waking semi randomly and over relatively brief 
periods) is probably unattainable because by the time you've worked out 
the ideal placement of the currently runnable tasks on the available 
CPUs it's all changed and the solution is invalid.  The best you can 
hope for is that the change isn't so great as to completely invalidate the 
solution and the changes you make as a result are an improvement on the 
current allocation of processes to CPUs.


The above probably doesn't hold for some systems such as those large 
super computer jobs that run for several days but they're probably best 
served by explicit allocation of processes to CPUs using the process 
affinity mechanism.


Peter
--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Linus Torvalds


On Wed, 18 Apr 2007, Davide Libenzi wrote:
> 
> I know, we agree there. But that did not fit my "Pirates of the Caribbean" 
> quote :)

Ahh, I'm clearly not cultured enough, I didn't catch that reference.

Linus "yes, I've seen the movie, but it
 apparently left more of a mark in other people" Torvalds


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Peter Williams

Linus Torvalds wrote:


On Wed, 18 Apr 2007, Matt Mackall wrote:

On Wed, Apr 18, 2007 at 07:48:21AM -0700, Linus Torvalds wrote:
And "fairness by euid" is probably a hell of a lot easier to do than 
trying to figure out the wakeup matrix.

For the record, you actually don't need to track a whole NxN matrix
(or do the implied O(n**3) matrix inversion!) to get to the same
result.


I'm sure you can do things differently, but the reason I think "fairness 
by euid" is actually worth looking at is that it's pretty much the 
*identical* issue that we'll have with "fairness by virtual machine" and a 
number of other "container" issues.


The fact is:

 - "fairness" is *not* about giving everybody the same amount of CPU time 
   (scaled by some niceness level or not). Anybody who thinks that is 
   "fair" is just being silly and hasn't thought it through.


 - "fairness" is multi-level. You want to be fair to threads within a 
   thread group (where "process" may be one good approximation of what a 
   "thread group" is, but not necessarily the only one).


   But you *also* want to be fair in between those "thread groups", and 
   then you want to be fair across "containers" (where "user" may be one 
   such container).


So I claim that anything that cannot be fair by user ID is actually really 
REALLY unfair. I think it's absolutely humongously STUPID to call 
something the "Completely Fair Scheduler", and then just be fair on a 
thread level. That's not fair AT ALL! It's the antithesis of being fair!


So if you have 2 users on a machine running CPU hogs, you should *first* 
try to be fair among users. If one user then runs 5 programs, and the 
other one runs just 1, then the *one* program should get 50% of the CPU 
time (the user's fair share), and the five programs should get 10% of CPU 
time each. And if one of them uses two threads, each thread should get 5%.


So you should see one thread get 50% CPU (single thread of one user), 4 
threads get 10% CPU (their fair share of that user's time), and 2 threads 
get 5% CPU (the fair share within that thread group!).
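
(Spelled out, the arithmetic is just an equal split at every level of the
hierarchy; a throwaway sketch, not part of any scheduler code:)

	/*
	 * share of one thread = 1 / users / (processes of its user)
	 *                         / (threads of its process)
	 *
	 * user A: 1 process, 1 thread            -> 1/2     = 50%
	 * user B: 4 single-threaded processes    -> 1/2/5   = 10% each
	 *         1 process with 2 threads       -> 1/2/5/2 =  5% each
	 */
	static double thread_share(int users, int procs, int threads)
	{
		return 1.0 / users / procs / threads;
	}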


Any scheduling argument that just considers the above to be "7 threads 
total" and gives each thread 14% of CPU time "fairly" is *anything* but 
fair. It's a joke if that kind of scheduler then calls itself CFS!


And yes, that's largely what the current scheduler will do, but at least 
the current scheduler doesn't claim to be fair! So the current scheduler 
is a lot *better* if only in the sense that it doesn't make ridiculous 
claims that aren't true!


Linus


Sounds a lot like the PLFS (process level fair sharing) scheduler in 
Aurema's ARMTech (for whom I used to work).  The "fair" in the title is 
a bit misleading as it's all about unfair scheduling in order to meet 
specific policies.  But it's based on the principle that if you can 
allocate CPU bandwidth "fairly" (which really means in proportion to 
the entitlement each process is allocated) then you can allocate CPU 
bandwidth "fairly" between higher level entities such as process 
groups, user groups and so on by subdividing the entitlements downwards.


The tricky part of implementing this was the fact that not all entities 
at the various levels have sufficient demand for CPU bandwidth to use 
their entitlements and this in turn means that the entities above them 
will have difficulty using their entitlements even if others of their 
subordinates have sufficient demand (because their entitlements will be 
too small).  The trick is to have a measure of each entity's demand for 
CPU bandwidth and use that to modify the way entitlement is divided 
among subordinates.
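
(Sketching that subdivision in code, with the demand-weighted split just
described; purely illustrative, not ARMTech/PLFS code, and the function
name is made up:)

	/*
	 * Illustrative only: split a parent's entitlement among its children
	 * in proportion to each child's estimated demand, so an idle child's
	 * unused share flows to its busy siblings.
	 */
	static void divide_entitlement(double parent, const double *demand,
				       int n, double *out)
	{
		double total = 0.0;
		int i;

		for (i = 0; i < n; i++)
			total += demand[i];
		for (i = 0; i < n; i++)
			out[i] = total > 0.0 ? parent * demand[i] / total
					     : parent / n;
	}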


As a first guess, an entity's CPU bandwidth usage is an indicator of 
demand but doesn't take into account unmet demand due to tasks sitting 
on a run queue waiting for access to the CPU.  On the other hand, usage 
plus time waiting on the queue isn't a good measure of demand either 
(although it's probably a good upper bound) as it's unlikely that the 
task would have used the same amount of CPU as the waiting time if it 
had gone straight to the CPU.


But my main point is that it is possible to build schedulers that can 
achieve higher level scheduling policies.  Versions of PLFS work on 
Windows from user space by twiddling process priorities.  Part of my 
more recent work at Aurema had been involved in patching Linux's 
scheduler so that nice worked more predictably so that we could release 
a user space version of PLFS for Linux.  The other part was to add hard 
CPU band width caps for processes so that ARMTech could enforce hard CPU 
bandwidth caps on higher level entities (as this can't be done without 
the kernel being able to do it at that level.


Peter
--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Davide Libenzi
On Wed, 18 Apr 2007, Linus Torvalds wrote:

> On Wed, 18 Apr 2007, Davide Libenzi wrote:
> > 
> > "Perhaps on the rare occasion pursuing the right course demands an act of 
> >  unfairness, unfairness itself can be the right course?"
> 
> I don't think that's the right issue.
> 
> It's just that "fairness" != "equal".
> 
> Do you think it "fair" to pay everybody the same regardless of how good a 
> job they do? I don't think anybody really believes that. 
> 
> Equating "fair" and "equal" is simply a very fundamental mistake. They're 
> not the same thing. Never have been, and never will.

I know, we agree there. But that did not fit my "Pirates of the Caribbean" 
quote :)



- Davide




Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Davide Libenzi
On Wed, 18 Apr 2007, Ingo Molnar wrote:

> That's one reason why i dont think it's necessarily a good idea to 
> group-schedule threads, we dont really want to do a per thread group 
> percpu_alloc().

I still do not have a clear idea of how much overhead this will bring to the 
table, but I think (like Linus was pointing out) the hierarchy should look 
like:

Top (VCPU maybe?)
User
Process
Thread

The "run_queue" concept (and data) that now is bound to a CPU, need to be 
replicated in:

ROOT <- VCPUs add themselves here
VCPU <- USERs add themselves here
USER <- PROCs add themselves here
PROC <- THREADs add themselves here
THREAD (ultimate fine grained scheduling unit)

So ROOT, VCPU, USER and PROC will have their own "run_queue". Picking up a 
new task would mean:

VCPU = ROOT->lookup();
USER = VCPU->lookup();
PROC = USER->lookup();
THREAD = PROC->lookup();

Run-time statistics should propagate back the other way around.
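
(In code the pick would descend one level at a time; a hand-wavy sketch with
invented type and helper names (level_rq, rq_lookup), only to make the shape
concrete:)

	/* Each level keeps its own "run_queue" of entities; a leaf is a THREAD. */
	struct sched_level_entity {
		struct level_rq *children;	/* NULL for a THREAD            */
		/* per-level statistics would live here                         */
	};

	static struct sched_level_entity *pick_next(struct level_rq *root)
	{
		struct sched_level_entity *e = rq_lookup(root);	/* best at ROOT */

		while (e && e->children)	/* ROOT -> VCPU -> USER -> PROC */
			e = rq_lookup(e->children);
		return e;			/* the THREAD to run            */
	}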


> In fact for threads the _reverse_ problem exists, threaded apps tend to 
> _strive_ for more performance - hence their desperation of using the 
> threaded programming model to begin with ;) (just think of media 
> playback apps which are typically multithreaded)

The same user nicing two different multi-threaded processes would expect a 
predictable CPU distribution too. Doing that efficiently (the old per-cpu 
run-queue is pretty nice from many POVs) is the real challenge.



- Davide




Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Con Kolivas
On Wednesday 18 April 2007 22:33, Con Kolivas wrote:
> On Wednesday 18 April 2007 22:14, Nick Piggin wrote:
> > On Wed, Apr 18, 2007 at 07:33:56PM +1000, Con Kolivas wrote:
> > > On Wednesday 18 April 2007 18:55, Nick Piggin wrote:
> > > > Again, for comparison 2.6.21-rc7 mainline:
> > > >
> > > > 508.87user 32.47system 2:17.82elapsed 392%CPU
> > > > 509.05user 32.25system 2:17.84elapsed 392%CPU
> > > > 508.75user 32.26system 2:17.83elapsed 392%CPU
> > > > 508.63user 32.17system 2:17.88elapsed 392%CPU
> > > > 509.01user 32.26system 2:17.90elapsed 392%CPU
> > > > 509.08user 32.20system 2:17.95elapsed 392%CPU
> > > >
> > > > So looking at elapsed time, a granularity of 100ms is just behind the
> > > > mainline score. However it is using slightly less user time and
> > > > slightly more idle time, which indicates that balancing might have
> > > > got a bit less aggressive.
> > > >
> > > > But anyway, it conclusively shows the efficiency impact of such tiny
> > > > timeslices.
> > >
> > > See test.kernel.org for how (the now defunct) SD was performing on
> > > kernbench. It had low latency _and_ equivalent throughput to mainline.
> > > Set the standard appropriately on both counts please.
> >
> > I can give it a run. Got an updated patch against -rc7?
>
> I said I wasn't pursuing it but since you're offering, the rc6 patch should
> apply ok.
>
> http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc6-sd-0.40.patch

Oh and if you go to the effort of trying you may as well try the timeslice 
tweak to see what effect it has on SD as well.

/proc/sys/kernel/rr_interval

100 is the highest.

-- 
-ck


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Ingo Molnar

* Davide Libenzi <[EMAIL PROTECTED]> wrote:

> I think Ingo's idea of a new sched_group to contain the generic 
> parameters needed for the "key" calculation, works better than adding 
> more fields to existing structures (that would, of course, host 
> pointers to it). Otherwise I can already see the struct_signal being 
> the target for other unrelated fields :)

yeah. Another detail is that for global containers like uids, the 
statistics will have to be percpu_alloc()-ed, both for correctness 
(runqueues are per CPU) and for performance.
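
(Roughly, the uid-level container would carry per-CPU statistics along these
lines; a sketch with made-up field names, only to show why each group needs
its own percpu_alloc():)

	/* Illustrative only: one accounting block per CPU for each uid group,
	 * so runqueue-local updates touch only the local CPU's copy. */
	struct uid_group_stats {
		u64	fair_clock;
		u64	wait_runtime;
	};

	struct uid_group {
		uid_t			 uid;
		struct uid_group_stats	*stats;	/* one per CPU, from percpu_alloc() */
	};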

That's one reason why i dont think it's necessarily a good idea to 
group-schedule threads, we dont really want to do a per thread group 
percpu_alloc().

In fact for threads the _reverse_ problem exists, threaded apps tend to 
_strive_ for more performance - hence their desperation of using the 
threaded programming model to begin with ;) (just think of media 
playback apps which are typically multithreaded)

I dont think threads are all that different. Also, the 
resource-conserving act of using CLONE_VM to share the VM (and to use a 
different programming environment like Java) should not be 'punished' by 
forcing the thread group to be accounted as a single, shared entity 
against other 'fat' tasks.

so my current impression is that we want per UID accounting to solve the 
X problem, the kernel threads problem and the many-users problem, but 
i'd not want to do it for threads just yet because for them there's not 
really any apparent problem to be solved.

Ingo


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> > perhaps a more fitting term would be 'precise group-scheduling'. 
> > Within the lowest level task group entity (be that thread group or 
> > uid group, etc.) 'precise scheduling' is equivalent to 'fairness'.
> 
> Yes. Absolutely. Except I think that at least if you're going to name 
> somethign "complete" (or "perfect" or "precise"), you should also 
> admit that groups can be hierarchical.

yes. Am i correct to sum up your impression as:

 " Ingo, for you the hierarchy still appears to be an after-thought,
   while in practice it's easily the most important thing! Why are you
   so hung up about 'fairness', it makes no sense!"

right?

and you would definitely be right if you suggested that i neglected the 
'group scheduling' aspects of CFS (except for a minimalistic nice level 
implementation, which is a poor-man's-non-automatic-group-scheduling), 
but i very much know it's important and i'll definitely fix it for -v4.

But please let me explain my reasons for my different focus:

Yes, group scheduling in practice is the most important first-layer 
thing, and without it any of the other 'CFS wins' can easily be useless.

Firstly, I have not neglected the group-scheduling-related CFS 
regressions at all, mainly because there _is_ already a quick hack to 
check whether group scheduling would solve these regressions: renice. 
It was tried in both of the CFS regression cases I'm aware of: Mike's X 
starvation problem and Willy's "kevents starvation with thousands of 
scheddos tasks running" problem. And in both cases, applying the renice 
hack [which should be properly and automatically implemented as uid 
group scheduling] fixed the regression! So I was not worried at all: 
group scheduling _provably solves_ these CFS regressions. I rather 
concentrated on the CFS regressions that were much less clear.

But PLEASE believe me: even with perfect cross-group CPU allocation but 
with a simple non-heuristic scheduler underlying it, you can _easily_ 
get a sucky desktop experience! I know it because I tried it and others 
tried it too. (In fact the first version of sched_fair.c was tick-based 
and low-res, and it sucked.)

Two more things were needed:

  - the high precision of nsec/64-bit accounting
    ('reliability of scheduling')

  - extremely even time-distribution of CPU power
    ('determinism/smoothness, human perception')

(I'm expanding on these two concepts further below)

Take out either of these and, group scheduling or not, you are easily 
going to have a sucky desktop! (We know that from years of experiments: 
many people tried to rip out the unfairness from the scheduler, and there 
were always nasty corner cases that 'should' have worked but didn't.)

Without these we'd in essence start again at square one, just at a 
different square, this time with another group of people being 
irritated!

But the biggest and hardest-to-achieve _wins_ of CFS are _NOT_ achieved 
via a simple 'get rid of the unfairness of the upstream scheduler and 
apply group scheduling'. (I know that because I tried it before, and 
because others tried it before, for many many years.) You will _easily_ 
get a sucky desktop experience. The other two things are very much 
needed too:

 - the high precision of nsec/64-bit accounting, and the many
   corner-cases this solves. (For example, on a typical desktop there are
   _lots_ of timing-driven workloads that are in essence 'invisible' to
   low-resolution, timer-tick based accounting and are heavily skewed;
   a small sketch of the difference follows this list.)

 - extremely even time-distribution of CPU power. CFS behaves pretty
   well even under the dreaded 'make -jN in an xterm' kernel build
   workload as reported by Mark Lord, because it also distributes CPU
   power in a _finegrained_ way. A shell prompt under CFS still behaves
   acceptably on a single-CPU testbox of mine with a "make -j50"
   workload. (yes, fifty) Humans react a lot more negatively to sudden
   changes in application behavior ('lags', pauses, short hangs) than
   they react to fine, gradual, all-encompassing slowdowns. This is a
   key property of CFS.
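
  ( The sketch promised above, illustrating the accounting difference
    behind the first point. sched_clock() is the existing
    nanosecond-resolution clock; the struct and its fields are assumed
    names, so this is illustrative rather than actual CFS code: )

#include <linux/types.h>
#include <linux/sched.h>

/* hypothetical accounting fields; in reality these would live in the
 * task structure */
struct sched_accounting {
        unsigned int time_slice;        /* tick-based: whole ticks */
        u64 exec_start;                 /* nsec: when this run started */
        u64 sum_exec_runtime;           /* nsec: total time actually run */
};

/* tick-based accounting: whoever happens to be running when the timer
 * tick fires gets charged a whole tick (1-4 msecs), so short,
 * timing-driven runs are either invisible or over-charged */
static void account_tick(struct sched_accounting *curr)
{
        curr->time_slice--;
}

/* nsec-based accounting: charge the interval that was actually
 * executed, measured with the nanosecond-resolution clock */
static void account_nsec(struct sched_accounting *curr)
{
        u64 now = sched_clock();

        curr->sum_exec_runtime += now - curr->exec_start;
        curr->exec_start = now;
}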

  ( Otherwise renicing X to -10 would have solved most of the
    interactivity complaints against the vanilla scheduler, otherwise
    renicing X to -10 would have fixed Mike's setup under SD (it didn't)
    while it worked much better under CFS, otherwise Gene wouldn't have
    found CFS markedly better than SD, etc., etc. So getting rid of the
    heuristics is less than 50% of the road to the perfect desktop
    scheduler. )

And I claim that these were the really hard bits: I spent most of the 
CFS coding on getting _these_ details 100% right under various 
workloads, and it makes a night-and-day difference _even without any 
group scheduling help_.

And note another reason here: group scheduling _masks_ many other 
scheduling deficiencies that are possible in a scheduler. So since CFS 
doesn't do group scheduling, I get a _fuller_ 

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Linus Torvalds


On Wed, 18 Apr 2007, Davide Libenzi wrote:
> 
> "Perhaps on the rare occasion pursuing the right course demands an act of 
>  unfairness, unfairness itself can be the right course?"

I don't think that's the right issue.

It's just that "fairness" != "equal".

Do you think it "fair" to pay everybody the same regardless of how good a 
job they do? I don't think anybody really believes that. 

Equating "fair" and "equal" is simply a very fundamental mistake. They're 
not the same thing. Never have been, and never will.

Now, there's no question that "equal" is much easier to implement, if only 
because it's a lot easier to agree on what it means. "Equal parts" is 
something everybody can agree on. "Fair parts" automatically involves a 
balancing act, and people will invariably count things differently and 
thus disagree about what is "fair" and what is not.

I don't think we can ever get a "perfect" setup for that reason, but I 
think we can get something that at least gets reasonably close, at least 
for the obvious cases.

So my suggested test-case of running one process as one user and two 
processes as another one has a fairly "obviously correct" solution if you 
have just one CPU, and you can probably be pretty fair in practice on 
two CPUs (there's an obvious theoretical solution; whether you can get 
there with a practical algorithm is another thing). On three or more 
CPUs, you obviously wouldn't even *want* to be fair, since you can very 
naturally just give a CPU to each.

Linus


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Davide Libenzi
On Wed, 18 Apr 2007, Linus Torvalds wrote:

> For example, maybe we can approximate it by spreading out the statistics: 
> right now you have things like
> 
>  - last_ran, wait_runtime, sum_wait_runtime..
> 
> be per-thread things. Maybe some of those can be spread out, so that you 
> put a part of them in the "struct vm_struct" thing (to approximate 
> processes), part of them in the "struct user" struct (to approximate the 
> user-level thing), and part of it in a per-container thing for when/if we 
> support that kind of thing?

I think Ingo's idea of a new sched_group to contain the generic 
parameters needed for the "key" calculation works better than adding more 
fields to existing structures (that would, of course, host pointers to it). 
Otherwise I can already see the struct_signal being the target for other 
unrelated fields :)



- Davide




Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Davide Libenzi
On Wed, 18 Apr 2007, Linus Torvalds wrote:

> I'm not arguing against fairness. I'm arguing against YOUR notion of 
> fairness, which is obviously bogus. It is *not* fair to try to give out 
> CPU time evenly!

"Perhaps on the rare occasion pursuing the right course demands an act of 
 unfairness, unfairness itself can be the right course?"



- Davide




Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Davide Libenzi
On Wed, 18 Apr 2007, William Lee Irwin III wrote:

> Thinking of the scheduler as a CPU bandwidth allocator, this means
> handing out shares of CPU bandwidth to all users on the system, which
> in turn hand out shares of bandwidth to all sessions, which in turn
> hand out shares of bandwidth to all process groups, which in turn hand
> out shares of bandwidth to all thread groups, which in turn hand out
> shares of bandwidth to threads. The event handlers for the scheduler
> need not deal with this apart from task creation and exit and various
> sorts of process ID changes (e.g. setsid(), setpgrp(), setuid(), etc.).

Yes, it really becomes a hierarchical problem once you consider users and 
processes. The top level sees that a "user" can be scheduled (it puts 
itself on the virtual run queue) and passes the ball to the "process" 
scheduler inside the "user" container, down to maybe "threads", with all 
the "key" calculation parameters kept at each level (with up-propagation).
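
A rough sketch of the kind of top-down pick such a hierarchy implies 
(the structure and field names here are purely hypothetical):

#include <linux/types.h>
#include <linux/rbtree.h>

struct task_struct;

/* each level (user, process, thread) is a group whose runnable
 * children are kept ordered by their up-propagated "key" */
struct sched_group {
        struct rb_node run_node;        /* our slot in the parent's tree */
        struct rb_root runnable;        /* runnable children, ordered by key */
        u64 key;
        struct task_struct *task;       /* set only at the leaf (thread) level */
};

static struct task_struct *pick_next(struct sched_group *root)
{
        struct sched_group *g = root;

        /* descend, at each level picking the child with the smallest key */
        while (!g->task) {
                struct rb_node *left = rb_first(&g->runnable);

                if (!left)
                        return NULL;    /* nothing runnable below this level */
                g = rb_entry(left, struct sched_group, run_node);
        }
        return g->task;
}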



- Davide




Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> For example, maybe we can approximate it by spreading out the 
> statistics: right now you have things like
> 
>  - last_ran, wait_runtime, sum_wait_runtime..
> 
> be per-thread things. [...]

Yes, yes, yes! :) My thinking is "struct sched_group" embedded into 
_arbitrary_ other resource containers and abstractions; these 
sched_groups are then arranged in a simple hierarchy and driven by the 
core scheduling machinery.

> [...] Maybe some of those can be spread out, so that you put a part of 
> them in the "struct vm_struct" thing (to approximate processes), part 
> of them in the "struct user" struct (to approximate the user-level 
> thing), and part of it in a per-container thing for when/if we support 
> that kind of thing?

yes.

Ingo


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Linus Torvalds


On Wed, 18 Apr 2007, Ingo Molnar wrote:
> 
> perhaps a more fitting term would be 'precise group-scheduling'. Within 
> the lowest level task group entity (be that thread group or uid group, 
> etc.) 'precise scheduling' is equivalent to 'fairness'.

Yes. Absolutely. Except I think that at least if you're going to name 
something "complete" (or "perfect" or "precise"), you should also admit 
that groups can be hierarchical.

The "threads in a process" thing is a great example of a hierarchical 
group. Imagine if X was running as a collection of threads - then each 
server thread would no longer be more important than the clients! But if 
you have a mix of "bags of threads" and "single process" kind 
applications, then very arguably the single thread in a single traditional 
process should get as much time as the "bag of threads" process gets 
total.

So it really should be a hierarchical notion, where each thread is owned 
by one "process", and each process is owned by one "user", and each user 
is in one "virtual machine" - there's at least three different levels to 
this, and you'd want to schedule this thing top-down: virtual machines 
should be given CPU time "fairly" (which doesn't need to mean "equally", 
of course - nice-values could very well work at that level too), and then 
within each virtual machine users or "scheduling groups" should be 
scheduled fairly, and then within each scheduling group the processes 
should be scheduled, and within each process threads should equally get 
their fair share at _that_ level.

And no, I don't think we necessarily need to do something quite that 
elaborate. But I think that's the kind of "obviously good goal" to keep in 
mind. Can we perhaps _approximate_ something like that by other means? 

For example, maybe we can approximate it by spreading out the statistics: 
right now you have things like

 - last_ran, wait_runtime, sum_wait_runtime..

be per-thread things. Maybe some of those can be spread out, so that you 
put a part of them in the "struct vm_struct" thing (to approximate 
processes), part of them in the "struct user" struct (to approximate the 
user-level thing), and part of it in a per-container thing for when/if we 
support that kind of thing?

IOW, I don't think the scheduling "groups" have to be explicit boxes or 
anything like that. I suspect you can make do with just heuristics that 
penalize the same "struct user" and "struct vm_struct" for getting overly 
much scheduling time, and you'll get the same _effect_. 
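
One way to picture that "spread out the statistics" idea (all names 
below are hypothetical) is that a single charge simply lands at every 
level the task belongs to:

#include <linux/types.h>

/* hypothetical per-level scheduling statistics; imagine one copy
 * embedded in the task, one in the mm/"struct vm_struct" (process
 * level) and one in "struct user" (user level) */
struct sched_stats {
        s64 wait_runtime;
        u64 sum_wait_runtime;
        u64 last_ran;
};

/* charging one executed interval against all three levels at once */
static void charge_all_levels(struct sched_stats *thread_stats,
                              struct sched_stats *process_stats,
                              struct sched_stats *user_stats,
                              u64 delta_exec)
{
        thread_stats->wait_runtime  -= delta_exec;
        process_stats->wait_runtime -= delta_exec;
        user_stats->wait_runtime    -= delta_exec;
}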

And I don't think it's wrong to look at the "one hundred processes by the 
same user" case as being an important case. But it should not be the 
*only* case or even necessarily the *main* case that matters. I think a 
benchmark that literally does

#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>

int main(void)
{
        pid_t pid = fork();

        if (pid < 0)
                exit(1);
        if (pid) {
                /* parent: a single CPU hog running as uid 500 */
                if (setuid(500) < 0)
                        exit(2);
                for (;;)
                        /* Do nothing */;
        }
        /* child: forks once more, two CPU hogs running as uid 501 */
        if (setuid(501) < 0)
                exit(3);
        fork();
        for (;;)
                /* Do nothing in two processes */;
}

and I think that it's a really valid benchmark: if the scheduler gives 25% 
of time to each of the two processes of user 501, and 50% to user 500, 
then THAT is a good scheduler.

If somebody wants to actually write and test the above as a test-script, 
and add it to a collection of scheduler tests, I think that could be a 
good thing.

Linus


Re: [ck] Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Chris Friesen

Mark Glines wrote:


> One minor question: is it even possible to be completely fair on SMP?
> For instance, if you have a 2-way SMP box running 3 applications, one of
> which has 2 threads, will the threaded app have an advantage here?  (The
> current system seems to try to keep each thread on a specific CPU, to
> reduce cache thrashing, which means threads and processes alike each
> get 50% of the CPU.)


I think the ideal in this case would be to have both threads on one cpu, 
with the other app on the other cpu.  This gives inter-process fairness 
while minimizing the amount of task migration required.


More interesting is the case of three processes on a 2-CPU system.  Do 
we constantly migrate one of them back and forth to ensure that each of 
them gets 66% of a CPU?


Chris


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Linus Torvalds


On Wed, 18 Apr 2007, Ingo Molnar wrote:
> 
> But note that most of the reported CFS interactivity wins, as surprising 
> as it might be, were due to fairness between _the same user's tasks_.

And *ALL* of the CFS interactivity *losses* and complaints have been 
because it did the wrong thing _between different users' tasks_.

So what's your point? Your point was that when people try it out as a 
single user, it is indeed fair. But that's no point at all, since it 
totally missed _my_ point.

The problems with X scheduling is exactly that "other user" kind of thing.

The problem with kernel thread starvation due to user threads getting all 
the CPU time is exactly the same issue.

As long as you think that all threads are equal, and should be treated 
equally, you CANNOT make it work well. People can say "ok, you can renice 
X", but the whole problem stems from the fact that you're trying to be 
fair based on A TOTALLY INVALID NOTION of what "fair" is.

> In the typical case, 99% of the desktop CPU time is executed either as X 
> (root user) or under the uid of the logged in user, and X is just one 
> task.

So? You are ignoring the argument again. You're totally bringing up a red 
herring:

> Even with a bad hack of making X super-high-prio, interactivity as 
> experienced by users still sucks without having fairness between the 
> other 100-200 user tasks that a desktop system is typically using.

I didn't say that you should be *unfair* within one user group. What kind 
of *idiotic* argument are you trying to put forth?

OF COURSE you should be fair "within the user group". Nobody contests that 
the "other 100-200 user tasks" should be scheduled fairly _amongst 
themselves_. 

The only point I had was that you cannot just lump all threads together 
and say "these threads are equally important". The 100-200 user tasks may 
be equally important, and should get equal amounts of preference, but that 
has absolutely _zero_ bearing on the _single_ task run in another 
"scheduling group", ie by other users or by X.

I'm not arguing against fairness. I'm arguing against YOUR notion of 
fairness, which is obviously bogus. It is *not* fair to try to give out 
CPU time evenly!

Linus


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Michael K. Edwards

On 4/18/07, Matt Mackall <[EMAIL PROTECTED]> wrote:

> For the record, you actually don't need to track a whole NxN matrix
> (or do the implied O(n**3) matrix inversion!) to get to the same
> result. You can converge on the same node weightings (ie dynamic
> priorities) by applying a damped function at each transition point
> (directed wakeup, preemption, fork, exit).
>
> The trouble with any scheme like this is that it needs careful tuning
> of the damping factor to converge rapidly and not oscillate and
> precise numerical attention to the transition functions so that the sum of
> dynamic priorities is conserved.


That would be the control theory approach.  And yes, you have to get
both the theoretical transfer function and the numerics right.  It
sometimes helps to use a control-systems framework like the classic
Takagi-Sugeno-Kang fuzzy logic controller; get the numerics right once
and for all, and treat the heuristics as data, not logic.  (I haven't
worked in this area in almost twenty years, but Google -- yes, I do
use Google+brain for fact-checking; what do you do? -- says that
people are still doing active research on TSK models, and solid
fixed-point reference implementations are readily available.)  That
seems like an attractive strategy here because you could easily embed
the control engine in the kernel and load rule sets dynamically.  Done
right, that could give most of the advantages of pluggable schedulers
(different heuristic strokes for different folks) without diluting the
tester pool for the actual engine code.

(Of course, different scheduling strategies require different input
data, and you might not want the overhead of collecting data that your
chosen heuristics won't use.  But that's not much different from the
netfilter situation, and is obviously a solvable problem, if anyone
cares to put that much work in.  The people who ought to be funding
this kind of work are Sun and IBM, who don't have a chance on the
desktop and are in big trouble in the database tier; their future as
processor vendors depends on being able to service presentation-tier
and business-logic-tier loads efficiently on their massively
multi-core chips.  MIPS should pitch in too, on behalf of licensees
like Cavium who need more predictable behavior on multi-core embedded
Linux.)

Note also that you might not even want to persistently prioritize
particular processes or process groups.  You might want a heuristic
that notices that some task (say, the X server) often responds to
being awakened by doing a little work and then unblocking the task
that awakened it.  When it is pinged from some highly interactive
task, you want it to jump the scheduler queue just long enough to
unblock the interactive task, which may mean letting it flush some
work out of its internal queue.  But otherwise you want to batch
things up until there's too much "scheduler pressure" behind it, then
let it work more or less until it runs out of things to do, because
its working set is so large that repeatedly scheduling it in and out
is hell on caches.

(Priority inheritance is the classic solution to the
blocked-high-priority-task problem _in_isolation_.  It is not without
its pitfalls, especially when the designer of the "server" didn't
expect to lose his timeslice instantly on releasing the lock.  True
priority inheritance is probably not something you want to inflict on
a non-real-time system, but you do need some urgency heuristic.  What
a "fuzzy logic" framework does for you is to let you combine competing
heuristics in a way that remains amenable to analysis using control
theory techniques.)

What does any of this have to do with "fairness"?  Nothing whatsoever!
There's work that has to be done, and choosing when to do it is
almost entirely a matter of staying out of the way of more urgent work
while minimizing the task's negative impact on the rest of the system.
Does that mean that the X server is "special", kind of the way that
latency-sensitive A/V applications are "special", and belongs in a
separate scheduler class?  No.  Nowadays, workloads where the kernel
has any idea what tasks belong to what "users" are the exception, not
the norm.  The X server is the canary in the coal mine, and a
scheduler that won't do the right thing for X without hand tweaking
won't do the right thing for other eyeball-driven,
multiple-tiers-on-one-box scenarios either.

If you want fairness among users to the extent that their demands
_compete_, you might as well partition the whole machine, and have a
separate fairness-oriented scheduler (let's call it a "hypervisor")
that lives outside the kernel.  (Talk about two students running gcc
on the same shell server, with more important people also doing things
on the same system, is so 1990's!)  Not that the design of scheduler
heuristics shouldn't include "fairness"-like considerations; but
they're probably only interesting as a fallback for when the scheduler
has no idea what it ought to schedule next.

So why is Ingo's 

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-18 Thread Davide Libenzi
On Wed, 18 Apr 2007, Matt Mackall wrote:

> On Wed, Apr 18, 2007 at 07:48:21AM -0700, Linus Torvalds wrote:
> > And "fairness by euid" is probably a hell of a lot easier to do than 
> > trying to figure out the wakeup matrix.
> 
> For the record, you actually don't need to track a whole NxN matrix
> (or do the implied O(n**3) matrix inversion!) to get to the same
> result. You can converge on the same node weightings (ie dynamic
> priorities) by applying a damped function at each transition point
> (directed wakeup, preemption, fork, exit).
> 
> The trouble with any scheme like this is that it needs careful tuning
> of the damping factor to converge rapidly and not oscillate and
> precise numerical attention to the transition functions so that the sum of
> dynamic priorities is conserved.

Doing that inside the boundaries of the time constraints imposed by a 
scheduler is the interesting part. Given also that the size (and members) 
of the matrix is dynamic.
Also, a "wakeup matrix" (if the name correctly pictures what it is for) 
would help with latencies and priority inheritance, but not with 
global fairness.
The maniacal fairness focus we're seeing now is due to the fact that the 
mainline scheduler can have extremely unfair behaviour under certain 
conditions. IMO fairness, although important, should not be the main 
objective of the scheduler rewrite. Simplification and predictability 
should have higher priority, with interactivity achievements bound to 
decent fairness constraints.
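
A tiny sketch of the kind of damped, sum-conserving update the quoted 
scheme describes for a directed wakeup (the wakeup_credit field and the 
damping value are purely hypothetical):

/* hypothetical per-task state; in reality this would live in the
 * task's scheduling data */
struct task_credit {
        long wakeup_credit;
};

/* damping factor 1/8: needs careful tuning so the weightings converge
 * quickly without oscillating */
#define DAMP_SHIFT      3

static void directed_wakeup_credit(struct task_credit *waker,
                                   struct task_credit *wakee)
{
        long transfer = waker->wakeup_credit >> DAMP_SHIFT;

        /* whatever the wakee gains, the waker loses, so the sum of
         * dynamic priorities stays conserved */
        waker->wakeup_credit -= transfer;
        wakee->wakeup_credit += transfer;
}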




- Davide




  1   2   3   4   5   6   7   >