Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-28 Thread Bernd Eckenfels
In article <[EMAIL PROTECTED]> you wrote:
>   a) it may do so for a short and bounded time, typically less than the
>      maximum acceptable latency for other tasks

if you have n threads in runq and each of them can have m < d (d = max latency
deadline) overhead, you will have to account for d/n slices. This is
typically not possible for larger numbers of ready threads.

Therefore another approach would be to make sure the next thread gets a
smaller slice, but then you will have to move that debit around and
distribute it fairly, which is the whole problem we face here.
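
As a rough sketch of that debit-carrying idea (purely illustrative, with
hypothetical names; not code from any posted scheduler), the excess CPU
handed out in one pass can be remembered and repaid in equal shares out of
the following grants:

/* Hypothetical sketch: carry an overrun debit forward and spread it over
 * the runnable threads, rather than dumping it on whoever runs next. */
struct runq_acct {
        long debit_ns;          /* CPU time handed out beyond the nominal plan */
        int nr_running;
};

/* Nominal slice when a latency target d is shared by n runnable threads. */
static long nominal_slice(long d_ns, int nr_running)
{
        return nr_running ? d_ns / nr_running : d_ns;
}

/* Grant for the next thread: the nominal slice minus an equal share of
 * the outstanding debit. */
static long next_grant(struct runq_acct *rq, long d_ns)
{
        long slice = nominal_slice(d_ns, rq->nr_running);
        long repay = rq->nr_running ? rq->debit_ns / rq->nr_running : 0;

        rq->debit_ns -= repay;
        slice -= repay;
        return slice > 0 ? slice : 0;
}

/* Called when a thread is switched out after actually running ran_ns. */
static void account_run(struct runq_acct *rq, long granted_ns, long ran_ns)
{
        rq->debit_ns += ran_ns - granted_ns;    /* an overrun adds to the debit */
}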

(Besides, it is not clear to me whether fair scheduling gives the best
results; see the X problem, or compare threads vs. processes vs. subsystems.)

Gruss
Bernd

PS: sorry for the Cc trimming, I need to get rid of my mail2news gateway;
however, I will make sure to copy important info to all concerned parties -
I don't think that's needed for my ramblings. :)




Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-26 Thread William Lee Irwin III
On Wed, Apr 25, 2007 at 04:58:40AM -0700, William Lee Irwin III wrote:
>>> Adjustments to the lag computation for arrivals and departures
>>> during execution are among the missing pieces. Some algorithmic devices
>>> are also needed to account for the varying growth rates of lags of tasks
>>> waiting to run, which arise from differing priorities/weights.

On Wed, 2007-04-25 at 22:13 +0200, Willy Tarreau wrote:
>> that was the principle of my proposal of sorting tasks by expected completion
>> time and using +/- credit to compensate for too large/too short slice used.

On Thu, Apr 26, 2007 at 10:57:48AM -0700, Li, Tong N wrote:
> Yeah, it's a good algorithm. It's a variant of earliest deadline first
> (EDF). There are also similar ones in the literature such as earliest
> eligible virtual deadline first (EEVDF) and biased virtual finishing
> time (BVFT). Based on wli's explanation, I think Ingo's approach would
> also fall into this category. With careful design, all such algorithms
> that order tasks based on some notion of time can achieve good fairness.
> There are some subtle differences. Some algorithms of this type can
> achieve a constant lag bound, but some only have a constant positive lag
> bound and an O(N) negative lag bound, meaning some tasks could receive
> much more CPU time than they would under ideal fairness when the number of
> tasks is high.

The algorithm is in a bit of flux, but the virtual deadline computation
is rather readable. You may be able to tell whether cfs is affected by
the negative lag issue better than I. For the most part all I can smoke
out is that it's not apparent to me whether load balancing is done the
way it needs to be.


On Thu, Apr 26, 2007 at 10:57:48AM -0700, Li, Tong N wrote:
> On the other hand, the log(N) complexity of this type of algorithms has
> been a concern in the research community. This motivated O(1)
> round-robin based algorithms such as deficit round-robin (DRR) and
> smoothed round-robin (SRR) in networking, and virtual-time round-robin
> (VTRR), group ratio round-robin (GR3) and grouped distributed queues
> (GDQ) in OS scheduling, as well as the distributed weighted round-robin
> (DWRR) one I posted earlier.

I'm going to make a bold statement: I don't think O(lg(n)) is bad at
all. In real systems there are constraints related to per-task memory
footprints that severely restrict the domain of the performance metric,
rendering O(lg(n)) bounded by a rather reasonable constant.

A larger concern to me is whether this affair actually achieves its
design goals and, to a lesser extent, in what contexts those design
goals are truly crucial or dominant as opposed to others, such as,
say, interactivity. It is clear, regardless of general applicability,
that the predictability of behavior with regard to strict fairness
is going to be useful in certain contexts.

Another concern which is in favor of the virtual deadline design is
that virtual deadlines can very effectively emulate a broad spectrum
of algorithms. For instance, the mainline "O(1) scheduler" can be
emulated using such a queueing mechanism. Even if the particular
policy cfs now implements is dumped, radically different policies
can be expressed with its queueing mechanism. This has maintenance
implications which are quite beneficial. That said, it's far from
an unqualified endorsement. I'd still like to see much done differently.
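
As a rough illustration of that flexibility (a sketch with hypothetical
names, not CFS code): a weighted round-robin-style policy can be expressed
with the same queueing mechanism simply by choosing how virtual deadlines
advance, e.g. pushing each task's deadline out by a slice scaled down by
its weight and always running the earliest deadline.

/* Hypothetical virtual-deadline queueing used to express a weighted
 * round-robin-like policy; illustrative only. */
struct vd_task {
        unsigned long long vdeadline;   /* when this task "should" run next */
        unsigned int weight;            /* larger weight => more frequent turns */
};

#define BASE_SLICE_NS 10000000ULL       /* 10 ms, arbitrary for the example */

/* Requeue after running: a deadline increment inversely proportional to
 * weight yields a selection frequency proportional to weight. */
static void requeue(struct vd_task *t, unsigned long long now_vt)
{
        t->vdeadline = now_vt + BASE_SLICE_NS / (t->weight ? t->weight : 1);
}

/* Always pick the runnable task with the earliest virtual deadline. */
static struct vd_task *pick_next(struct vd_task *rq, int nr)
{
        struct vd_task *best = NULL;
        int i;

        for (i = 0; i < nr; i++)
                if (!best || rq[i].vdeadline < best->vdeadline)
                        best = &rq[i];
        return best;
}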


-- wli


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-26 Thread Willy Tarreau
On Thu, Apr 26, 2007 at 10:57:48AM -0700, Li, Tong N wrote:
> On Wed, 2007-04-25 at 22:13 +0200, Willy Tarreau wrote:
> > On Wed, Apr 25, 2007 at 04:58:40AM -0700, William Lee Irwin III wrote:
> > 
> > > Adjustments to the lag computation for arrivals and departures
> > > during execution are among the missing pieces. Some algorithmic devices
> > > are also needed to account for the varying growth rates of lags of tasks
> > > waiting to run, which arise from differing priorities/weights.
> > 
> > that was the principle of my proposal of sorting tasks by expected 
> > completion
> > time and using +/- credit to compensate for too large/too short slice used.
> > 
> > Willy
> 
> Yeah, it's a good algorithm. It's a variant of earliest deadline first
> (EDF). There are also similar ones in the literature such as earliest
> eligible virtual deadline first (EEVDF) and biased virtual finishing
> time (BVFT). Based on wli's explanation, I think Ingo's approach would
> also fall into this category. With careful design, all such algorithms
> that order tasks based on some notion of time can achieve good fairness.
> There are some subtle differences. Some algorithms of this type can
> achieve a constant lag bound, but some only have a constant positive lag
> bound and an O(N) negative lag bound,

Anyway, we're working in discrete time, not continuous time. Lag is
unavoidable. At best it can be bounded and compensated for. The first time
I thought about this algorithm, I was looking at a line drawn using the
Bresenham algorithm, which is all about compensating error against an ideal
expectation: too high, too far, too high, too far... I thought that the
line could represent a task's progress as a function of time, and the pixels
the periods the task spends on the CPU. On short intervals, you lose. On
large ones, you're very close to the ideal case.
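
To make the analogy concrete, here is a small sketch (hypothetical names,
assuming the ideal per-task share is known): like Bresenham's error term,
the scheduler tracks how far actual CPU time has drifted from the ideal
line, and that error can steer who runs next.

/* Bresenham-like drift tracking between ideal and actual CPU time.
 * Purely illustrative; not code from CFS or SD. */
struct progress {
        long long ideal_ns;     /* CPU time the task should have had by now */
        long long actual_ns;    /* CPU time it really got */
};

/* Advance the ideal line: over wall_ns of wall-clock time, a task holding
 * 'share' out of 'total' weight ideally receives wall_ns * share / total. */
static void advance_ideal(struct progress *p, long long wall_ns,
                          int share, int total)
{
        p->ideal_ns += wall_ns * share / total;
}

/* The error term: positive means the task is behind its ideal line and
 * deserves the CPU sooner, much as Bresenham steps when the error flips. */
static long long drift(const struct progress *p)
{
        return p->ideal_ns - p->actual_ns;
}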

> meaning some tasks could receive
> much more CPU time than they would under ideal fairness when the number of
> tasks is high.

It's not a problem that a task receives much more CPU than it should,
provided that:

  a) it may do so for a short and bounded time, typically less than the
     maximum acceptable latency for other tasks

  b) the excess CPU it received is accounted for, so that it is deducted
     from subsequent passes.


> On the other hand, the log(N) complexity of this type of algorithms has
> been a concern in the research community. This motivated O(1)

I was in favour of O(1) algorithms for a long time, but seeing how dumb the
things you can do in O(1) are compared to O(log N), I definitely changed my
mind. Also, with O(log N) structures you often still have a lot of common
operations in O(1). E.g. you insert into a time-ordered tree in O(log N),
but you read the next entry from it in O(1), and you still have the ability
to change its contents in O(log N) if you need to.
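
For what it's worth, a sketch of that cost profile using the kernel's
rbtree API (illustrative, not actual CFS code): insertion into a
time-ordered tree costs O(log N), while reading the next entry is O(1)
once the leftmost node is cached (removal would have to refresh the cache,
e.g. via rb_next()).

#include <linux/rbtree.h>
#include <linux/types.h>

struct vd_node {
        struct rb_node node;
        u64 vdeadline;
};

struct vd_queue {
        struct rb_root root;
        struct rb_node *leftmost;       /* cached: earliest deadline */
};

/* O(log n): walk down to the insertion point, then rebalance. */
static void vd_enqueue(struct vd_queue *q, struct vd_node *vn)
{
        struct rb_node **link = &q->root.rb_node, *parent = NULL;
        int leftmost = 1;

        while (*link) {
                struct vd_node *entry = rb_entry(*link, struct vd_node, node);

                parent = *link;
                if (vn->vdeadline < entry->vdeadline) {
                        link = &parent->rb_left;
                } else {
                        link = &parent->rb_right;
                        leftmost = 0;
                }
        }
        if (leftmost)
                q->leftmost = &vn->node;

        rb_link_node(&vn->node, parent, link);
        rb_insert_color(&vn->node, &q->root);
}

/* O(1): the next entry is always the cached leftmost node. */
static struct vd_node *vd_peek(struct vd_queue *q)
{
        return q->leftmost ? rb_entry(q->leftmost, struct vd_node, node) : NULL;
}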

Last but not least, spending one hundred cycles a few thousand times
a second is nothing compared to electing the wrong task for a full
time-slice.

Willy



Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-26 Thread Li, Tong N
On Wed, 2007-04-25 at 22:13 +0200, Willy Tarreau wrote:
> On Wed, Apr 25, 2007 at 04:58:40AM -0700, William Lee Irwin III wrote:
> 
> > Adjustments to the lag computation for arrivals and departures
> > during execution are among the missing pieces. Some algorithmic devices
> > are also needed to account for the varying growth rates of lags of tasks
> > waiting to run, which arise from differing priorities/weights.
> 
> that was the principle of my proposal of sorting tasks by expected completion
> time and using +/- credit to compensate for too large/too short slice used.
> 
> Willy

Yeah, it's a good algorithm. It's a variant of earliest deadline first
(EDF). There are also similar ones in the literature such as earliest
eligible virtual deadline first (EEVDF) and biased virtual finishing
time (BVFT). Based on wli's explanation, I think Ingo's approach would
also fall into this category. With careful design, all such algorithms
that order tasks based on some notion of time can achieve good fairness.
There are some subtle differences. Some algorithms of this type can
achieve a constant lag bound, but some only have a constant positive lag
bound and an O(N) negative lag bound, meaning some tasks could receive
much more CPU time than they would under ideal fairness when the number of
tasks is high.

On the other hand, the log(N) complexity of this type of algorithms has
been a concern in the research community. This motivated O(1)
round-robin based algorithms such as deficit round-robin (DRR) and
smoothed round-robin (SRR) in networking, and virtual-time round-robin
(VTRR), group ratio round-robin (GR3) and grouped distributed queues
(GDQ) in OS scheduling, as well as the distributed weighted round-robin
(DWRR) one I posted earlier.

  tong


SD renice recommendation was: Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-25 Thread Con Kolivas
On Tuesday 24 April 2007 16:36, Ingo Molnar wrote:

> So, my point is, the nice level of X for desktop users should not be set
> lower than a low limit suggested by that particular scheduler's author.
> That limit is scheduler-specific. Con i think recommends a nice level of
> -1 for X when using SD [Con, can you confirm?], while my tests show that
> if you want you can go as low as -10 under CFS, without any bad
> side-effects. (-19 was a bit too much)

Nice 0 as a default for X, but if renicing, nice -10 as the lower limit for X 
on SD. The reason for that on SD is that the priority of freshly woken up 
tasks (ie not fully cpu bound) for both nice 0 and nice -10 will still be the 
same at PRIO 1 (see the prio_matrix). Therefore, there will _not_ be 
preemption of the nice 0 task and a context switch _unless_ it is already cpu 
bound and has consumed a certain number of cycles and has been demoted. 
Contrary to popular belief, it is not universal that a less niced task will
preempt its more niced counterpart; it depends entirely on the implementation
of nice. Yes, it is true that the context switch rate will go up with a reniced X
because the conditions that lead to preemption are more likely to be met, but 
it is definitely not every single wakeup of the reniced X.
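
A rough sketch of the kind of check being described (a simplified,
hypothetical illustration, not SD's actual code): on wakeup, preemption
only happens if the woken task's effective priority, after the
nice-to-priority mapping for its current state, is strictly better than
the running task's, and a freshly woken nice 0 task and nice -10 task can
map to the same priority.

/* Toy wakeup preemption check, loosely in the spirit of the description
 * above; not taken from SD or any real scheduler. Lower value = better. */
static int effective_prio(int nice, int cpu_bound_level)
{
        if (cpu_bound_level == 0)       /* freshly woken, not yet cpu bound */
                return nice <= 0 ? 1 : 1 + nice / 10;
        return 20 + nice + cpu_bound_level;     /* cpu-bound tasks get demoted */
}

static int should_preempt(int waking_nice, int waking_bound,
                          int curr_nice, int curr_bound)
{
        /* In this toy mapping, a freshly woken nice -10 task does not
         * preempt a nice 0 task that has not yet been demoted. */
        return effective_prio(waking_nice, waking_bound) <
               effective_prio(curr_nice, curr_bound);
}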

Alas, again, I am forced to spend as little time as possible at the pc for my 
health, so expect _very few_ responses via email from me. Luckily SD is in 
pretty fine shape with version 0.46.

-- 
-ck


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-25 Thread Willy Tarreau
On Wed, Apr 25, 2007 at 04:58:40AM -0700, William Lee Irwin III wrote:

> Adjustments to the lag computation for arrivals and departures
> during execution are among the missing pieces. Some algorithmic devices
> are also needed to account for the varying growth rates of lags of tasks
> waiting to run, which arise from differing priorities/weights.

that was the principle of my proposal of sorting tasks by expected completion
time and using +/- credit to compensate for too large/too short slice used.

Willy



Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-25 Thread William Lee Irwin III
* Li, Tong N <[EMAIL PROTECTED]> wrote:
>> [...] A corollary of this is that if both threads i and j are 
>> continuously runnable with fixed weights in the time interval, then 
>> the ratio of their CPU time should be equal to the ratio of their 
>> weights. This definition is pretty restrictive since it requires the 
>> properties to hold for any thread in any interval, which is not 
>> feasible. [...]

On Wed, Apr 25, 2007 at 11:44:03AM +0200, Ingo Molnar wrote:
> yes, it's a pretty strong definition, but also note that while it is 
> definitely not easy to implement, the solution is nevertheless feasible 
> in my opinion and there exists a scheduler that implements it: CFS.

The feasibility comment refers to the unimplementability of schedulers
with infinitesimal timeslices/quanta/sched_granularity_ns. It's no
failing of cfs (or any other scheduler) if, say, the ratios are not
exact within a time interval of one nanosecond or one picosecond.

One of the reasons you get the results you do is that what you use for
->fair_key is very close to the definition of lag, which is used as a
metric of fairness. It differs in a couple of ways, but how it's
computed and used for queueing can be altered to more precisely match.

The basic concept you appear to be trying to implement is a greedy
algorithm: run the task with the largest lag first. As far as I can
tell, this is sound enough, though I have no formal proof. So with the
lag computation and queueing adjusted appropriately, it should work out.
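
A sketch of that greedy rule (illustrative only, hypothetical names): each
runnable task carries a lag value, the CPU time it is owed under the ideal
schedule minus what it actually received, and the scheduler simply runs the
task whose lag is currently largest.

/* Illustrative greedy "largest lag first" pick; not actual cfs code. */
struct lag_task {
        long long lag_ns;       /* ideal service minus actual service so far */
        int runnable;
};

static struct lag_task *pick_largest_lag(struct lag_task *tasks, int n)
{
        struct lag_task *best = NULL;
        int i;

        for (i = 0; i < n; i++) {
                if (!tasks[i].runnable)
                        continue;
                if (!best || tasks[i].lag_ns > best->lag_ns)
                        best = &tasks[i];
        }
        return best;            /* the most short-changed task runs next */
}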

Adjustments to the lag computation for arrivals and departures
during execution are among the missing pieces. Some algorithmic devices
are also needed to account for the varying growth rates of lags of tasks
waiting to run, which arise from differing priorities/weights.

There are no mysteries.


-- wli


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-25 Thread Alan Cox
> > it into some xorg.conf field. (It also makes sure that X isnt preempted 
> > by other userspace stuff while it does timing-sensitive operations like 
> > setting the video modes up or switching video modes, etc.)
> 
> X is privileged. It can just cli around the critical section.

Not really. X can use iopl3 but if it disables interrupts you get
priority inversions and hangs, so in practice it can't do that. 

Alan


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-25 Thread Ingo Molnar

* Li, Tong N <[EMAIL PROTECTED]> wrote:

> [...] A corollary of this is that if both threads i and j are 
> continuously runnable with fixed weights in the time interval, then 
> the ratio of their CPU time should be equal to the ratio of their 
> weights. This definition is pretty restrictive since it requires the 
> properties to hold for any thread in any interval, which is not 
> feasible. [...]

yes, it's a pretty strong definition, but also note that while it is 
definitely not easy to implement, the solution is nevertheless feasible 
in my opinion and there exists a scheduler that implements it: CFS.

Ingo


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-25 Thread Ingo Molnar

* Ray Lee <[EMAIL PROTECTED]> wrote:

> It would seem like there should be a penalty associated with sending 
> those points as well, so that two processes communicating quickly with 
> each other won't get into a mutual love-fest that'll capture the 
> scheduler's attention.

it's not really "points", but "nanoseconds you are allowed to execute on 
the CPU". And thus two processes communicating with each other quickly 
and sending around this resource does get the attention of CFS: the 
resource is gradually consumed because the two processes are running on 
the CPU while they are communicating with each other. So it all works 
out fine.

Ingo


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-25 Thread Ingo Molnar

* Rogan Dawes <[EMAIL PROTECTED]> wrote:

> My concern was that since Ingo said that this is a closed economy, 
> with a fixed sum/total, if we lose a nanosecond here and there, 
> eventually we'll lose them all.

it's not a closed economy - the CPU constantly produces a resource: "CPU 
cycles to be spent", and tasks constantly consume that resource. So in 
that sense small inaccuracies are not a huge issue. But you are correct 
that each and every such inaccuracy has to be justified. For example 
larger inaccuracies on the order of SCHED_LOAD_SCALE are a problem 
because they can indeed sum up, and i fixed up a couple of such 
inaccuracies in -v6-to-be.

Ingo


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-25 Thread Ingo Molnar

* Pavel Machek <[EMAIL PROTECTED]> wrote:

> > it into some xorg.conf field. (It also makes sure that X isnt 
> > preempted by other userspace stuff while it does timing-sensitive 
> > operations like setting the video modes up or switching video modes, 
> > etc.)
> 
> X is privileged. It can just cli around the critical section.

yes, that is a tool that can be used too (and is used by most drivers) - 
my point was rather that besides the disadvantages, not preempting X can 
be an advantage too - not that there are no other (and often more 
suitable) tools to do the same.

Ingo


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-25 Thread Pavel Machek
Hi!


> it into some xorg.conf field. (It also makes sure that X isnt preempted 
> by other userspace stuff while it does timing-sensitive operations like 
> setting the video modes up or switching video modes, etc.)

X is privileged. It can just cli around the critical section.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-25 Thread William Lee Irwin III
On Tue, Apr 24, 2007 at 06:22:53PM -0700, Li, Tong N wrote:
> The goal of a proportional-share scheduling algorithm is to minimize the
> above metrics. If the lag function is bounded by a constant for any
> thread in any time interval, then the algorithm is considered to be
> fair. You may notice that the second metric is actually weaker than
> first. In fact, if an algorithm achieves a constant lag bound, it must
> also achieve a constant bound for the second metric, but the reverse is
> not necessarily true. But in some settings, people have focused on the
> second metric and still consider an algorithm to be fair as long as the
> second metric is bounded by a constant.

Using these metrics it is possible to write benchmarks quantifying
fairness as a performance metric, provided weights for nice numbers.

Not so coincidentally, this also entails a test of whether nice numbers
are working as intended.


-- wli

P.S. Divide by the length of the time interval to rephrase in terms of
CPU bandwidth.
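
A sketch of what part of such a benchmark could look like (userspace,
hypothetical helpers, with the weights supplied by the tester rather than
derived from the scheduler's actual nice-to-weight mapping): measure each
thread's CPU time over an interval via its per-thread CPU clock and report
the worst pairwise difference in weighted CPU time, S_i/w_i - S_j/w_j;
dividing by the interval length expresses it as a CPU bandwidth difference.

#include <pthread.h>
#include <time.h>

/* CPU seconds consumed so far by a given thread. */
static double thread_cpu_seconds(pthread_t tid)
{
        clockid_t cid;
        struct timespec ts;

        if (pthread_getcpuclockid(tid, &cid) || clock_gettime(cid, &ts))
                return 0.0;
        return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Worst pairwise |S_i/w_i - S_j/w_j| over one measurement interval,
 * normalized by the interval length to yield a bandwidth difference. */
static double max_unfairness(const double *cpu_start, const double *cpu_end,
                             const double *w, int n, double interval_sec)
{
        double worst = 0.0;
        int i, j;

        for (i = 0; i < n; i++)
                for (j = i + 1; j < n; j++) {
                        double si = (cpu_end[i] - cpu_start[i]) / w[i];
                        double sj = (cpu_end[j] - cpu_start[j]) / w[j];
                        double d = si > sj ? si - sj : sj - si;

                        if (d > worst)
                                worst = d;
                }
        return worst / interval_sec;
}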

RE: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Li, Tong N
> Could you explain for the audience the technical definition of fairness
> and what sorts of error metrics are commonly used? There seems to be
> some disagreement, and you're neutral enough of an observer that your
> statement would help.

The definition for proportional fairness assumes that each thread has a
weight, which, for example, can be specified by the user or mapped
from thread priorities, nice values, etc. A scheduler achieves ideal
proportional fairness if (1) it is work-conserving, i.e., it never
leaves a processor idle if there are runnable threads, and (2) for any
two threads, i and j, in any time interval, the ratio of their CPU time
is greater than or equal to the ratio of their weights, assuming that
thread i is continuously runnable in the entire interval and both
threads have fixed weights throughout the interval. A corollary of this
is that if both threads i and j are continuously runnable with fixed
weights in the time interval, then the ratio of their CPU time should be
equal to the ratio of their weights. This definition is pretty
restrictive since it requires the properties to hold for any thread in
any interval, which is not feasible. In practice, all algorithms try to
approximate this ideal scheduler (often referred to as Generalized
Processor Scheduling or GPS). Two error metrics are often used: 

(1) lag(t): for any interval [t1, t2], the lag of a thread at time t \in
[t1, t2] is S'(t1, t) - S(t1, t), where S' is the CPU time the thread
would receive in the interval [t1, t] under the ideal scheduler and S is
the actual CPU time it receives under the scheduler being evaluated.

(2) The second metric doesn't really have an agreed-upon name. Some call
it a fairness measure and some call it something else. Anyway, different from
lag, which is kind of an absolute measure for one thread, this metric
(call it F) defines a relative measure between two threads over any time
interval:

F(t1, t2) = S_i(t1, t2) / w_i - S_j(t1, t2) / w_j,

where S_i and S_j are the CPU time the two threads receive in the
interval [t1, t2] and w_i and w_j are their weights, assuming both
weights don't change throughout the interval.

The goal of a proportional-share scheduling algorithm is to minimize the
above metrics. If the lag function is bounded by a constant for any
thread in any time interval, then the algorithm is considered to be
fair. You may notice that the second metric is actually weaker than the
first. In fact, if an algorithm achieves a constant lag bound, it must
also achieve a constant bound for the second metric, but the reverse is
not necessarily true. But in some settings, people have focused on the
second metric and still consider an algorithm to be fair as long as the
second metric is bounded by a constant.
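
For concreteness, a small sketch of how these two metrics could be computed
for continuously runnable, fixed-weight threads, where the ideal GPS service
over an interval is the elapsed time scaled by the thread's share of total
weight (hypothetical helper code, not from any scheduler):

/* Ideal GPS service for thread i over [t1, t]: its weight share of the
 * elapsed time, assuming it is continuously runnable in the interval. */
static double gps_service(double t1, double t, double w_i, double w_total)
{
        return (t - t1) * (w_i / w_total);
}

/* lag_i(t) = S'_i(t1, t) - S_i(t1, t): positive means thread i has so far
 * received less than its ideal share, negative means it ran ahead of it. */
static double lag(double t1, double t, double s_actual,
                  double w_i, double w_total)
{
        return gps_service(t1, t, w_i, w_total) - s_actual;
}

/* F(t1, t2) = S_i(t1, t2) / w_i - S_j(t1, t2) / w_j, the relative metric. */
static double fairness_f(double s_i, double w_i, double s_j, double w_j)
{
        return s_i / w_i - s_j / w_j;
}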

> 
> On Mon, Apr 23, 2007 at 05:59:06PM -0700, Li, Tong N wrote:
> > I understand that via experiments we can show a design is reasonably
> > fair in the common case, but IMHO, to claim that a design is fair, there
> > needs to be some kind of formal analysis on the fairness bound, and this
> > bound should be proven to be constant. Even if the bound is not
> > constant, at least this analysis can help us better understand and
> > predict the degree of fairness that users would experience (e.g., would
> > the system be less fair if the number of threads increases? What happens
> > if a large number of threads dynamically join and leave the system?).
> 
> Carrying out this sort of analysis on various policies would help, but
> I'd expect most of them to be difficult to analyze. cfs' current
> ->fair_key computation should be simple enough to analyze, at least
> ignoring nice numbers, though I've done nothing rigorous in this area.
> 

If we can derive some invariants from the algorithm, it'd help the
analysis. An example is the deficit round-robin (DRR) algorithm in
networking. Its analysis utilizes the fact that the number of rounds each
flow (in this case, it'd be a thread) goes through in any time interval
differs by at most one.
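
For readers unfamiliar with DRR, a compressed sketch of the idea
transplanted into CPU-scheduling terms (illustrative, hypothetical names):
each round, every runnable task's deficit grows by a quantum proportional
to its weight, the task may consume CPU up to that deficit, and any unused
part carries over to the next round; the bounded difference in rounds
mentioned above is what makes the lag analysis tractable.

/* Deficit round-robin transplanted to CPU time; purely illustrative. */
struct drr_task {
        long deficit_ns;        /* accumulated right-to-run */
        int weight;
        int runnable;
};

#define DRR_QUANTUM_NS 1000000L /* 1 ms per unit of weight, arbitrary */

/* One DRR round. run_for() is a hypothetical helper that runs the task
 * for at most budget_ns and returns how long it actually ran. */
static void drr_round(struct drr_task *tasks, int n,
                      long (*run_for)(struct drr_task *, long budget_ns))
{
        int i;

        for (i = 0; i < n; i++) {
                struct drr_task *t = &tasks[i];

                if (!t->runnable)
                        continue;
                t->deficit_ns += (long)t->weight * DRR_QUANTUM_NS;
                t->deficit_ns -= run_for(t, t->deficit_ns);
        }
}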

Hope you didn't get bored by all of this. :)

  tong


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Gene Heskett
On Tuesday 24 April 2007, Willy Tarreau wrote:
>On Tue, Apr 24, 2007 at 10:38:32AM -0400, Gene Heskett wrote:
>> On Tuesday 24 April 2007, Ingo Molnar wrote:
>> >* David Lang <[EMAIL PROTECTED]> wrote:
>> >> > (Btw., to protect against such mishaps in the future i have changed
>> >> > the SysRq-N [SysRq-Nice] implementation in my tree to not only
>> >> > change real-time tasks to SCHED_OTHER, but to also renice negative
>> >> > nice levels back to 0 - this will show up in -v6. That way you'd
>> >> > only have had to hit SysRq-N to get the system out of the wedge.)
>> >>
>> >> if you are trying to unwedge a system it may be a good idea to renice
>> >> all tasks to 0, it could be that a task at +19 is holding a lock that
>> >> something else is waiting for.
>> >
>> >Yeah, that's possible too, but +19 tasks are getting a small but
>> >guaranteed share of the CPU so eventually it ought to release it. It's
>> >still a possibility, but i think i'll wait for a specific incident to
>> >happen first, and then react to that incident :-)
>> >
>> >Ingo
>>
>> In the instance I created, even the SysRq+b was ignored, and ISTR thats
>> supposed to initiate a reboot is it not?  So it was well and truly wedged.
>
>On many machines I use this on, I have to release Alt while still holding B.
>Don't know why, but it works like this.
>
>Willy

Yeah, Willy, and pardon a slight bit of sarcasm here but that's how we get the 
reputation for needing virgins to sacrifice, regular experienced girls just 
wouldn't do.

This isn't APL running on an IBM 5120, so it should Just Work(TM) and not need 
a seance or something to conjure up the right spell.  Besides, the reset
button is only about 6 feet away...  I get some exercise that way by getting
up to push it. :)

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
It is so soon that I am done for, I wonder what I was begun for.
-- Epitaph, Cheltenham Churchyard


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Peter Williams

Rogan Dawes wrote:
> Chris Friesen wrote:
>> Rogan Dawes wrote:
>>> I guess my point was if we somehow get to an odd number of
>>> nanoseconds, we'd end up with rounding errors. I'm not sure if your
>>> algorithm will ever allow that.
>>
>> And Ingo's point was that when it takes thousands of nanoseconds for a
>> single context switch, an error of half a nanosecond is down in the
>> noise.
>>
>> Chris
>
> My concern was that since Ingo said that this is a closed economy, with
> a fixed sum/total, if we lose a nanosecond here and there, eventually
> we'll lose them all.
>
> Some folks have uptimes of multiple years.
>
> Of course, I could (very likely!) be full of it! ;-)

And they won't be using any new scheduler on these computers anyhow, as
that would involve bringing the system down to install the new kernel. :-)


Peter
--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Bernd Eckenfels
In article <[EMAIL PROTECTED]> you wrote:
> Could you explain for the audience the technical definition of fairness
> and what sorts of error metrics are commonly used? There seems to be
> some disagreement, and you're neutral enough of an observer that your
> statement would help.

And while we are at it, why it is a good thing. I could understand that fair
means no misbehaving (intentionally or unintentionally) application can
harm the rest of the system. However, a responsive desktop might not
necessarily be very fair to compute jobs.

Even a simple thing such as "who gets accounted" can be quite different in
different workloads. (Larger multi-user systems tend to be fair per user;
on servers you balance more by thread or job; and single-user systems
should be as unfair as the user wants them, as long as no process can "run
away".)

Gruss
Bernd


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread William Lee Irwin III
On Mon, Apr 23, 2007 at 05:59:06PM -0700, Li, Tong N wrote:
> I don't know if we've discussed this or not. Since both CFS and SD claim
> to be fair, I'd like to hear more opinions on the fairness aspect of
> these designs. In areas such as OS, networking, and real-time, fairness,
> and its more general form, proportional fairness, are well-defined
> terms. In fact, perfect fairness is not feasible since it requires all
> runnable threads to be running simultaneously and scheduled with
> infinitesimally small quanta (like a fluid system). So to evaluate if a
> new scheduling algorithm is fair, the common approach is to take the
> ideal fair algorithm (often referred to as Generalized Processor
> Scheduling or GPS) as a reference model and analyze if the new algorithm
> can achieve a constant error bound (different error metrics also exist).

Could you explain for the audience the technical definition of fairness
and what sorts of error metrics are commonly used? There seems to be
some disagreement, and you're neutral enough of an observer that your
statement would help.


On Mon, Apr 23, 2007 at 05:59:06PM -0700, Li, Tong N wrote:
> I understand that via experiments we can show a design is reasonably
> fair in the common case, but IMHO, to claim that a design is fair, there
> needs to be some kind of formal analysis on the fairness bound, and this
> bound should be proven to be constant. Even if the bound is not
> constant, at least this analysis can help us better understand and
> predict the degree of fairness that users would experience (e.g., would
> the system be less fair if the number of threads increases? What happens
> if a large number of threads dynamically join and leave the system?).

Carrying out this sort of analysis on various policies would help, but
I'd expect most of them to be difficult to analyze. cfs' current
->fair_key computation should be simple enough to analyze, at least
ignoring nice numbers, though I've done nothing rigorous in this area.


-- wli


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Li, Tong N
On Mon, 2007-04-23 at 18:57 -0700, Bill Huey wrote: 
> On Mon, Apr 23, 2007 at 05:59:06PM -0700, Li, Tong N wrote:
> > I don't know if we've discussed this or not. Since both CFS and SD claim
> > to be fair, I'd like to hear more opinions on the fairness aspect of
> > these designs. In areas such as OS, networking, and real-time, fairness,
> > and its more general form, proportional fairness, are well-defined
> > terms. In fact, perfect fairness is not feasible since it requires all
> > runnable threads to be running simultaneously and scheduled with
> > infinitesimally small quanta (like a fluid system). So to evaluate if a
> 
> Unfortunately, fairness is rather non-formal in this context and probably
> isn't strictly desirable given how hackish much of Linux userspace is. Until
> there's a method of doing directed yields, like what Will has prescribed
> (a kind of allotment to a thread doing work for another), a completely strict
> mechanism is probably problematic with regards to corner cases.
> 
> X, for example, is largely non-thread-safe. Until they can get their xcb
> framework in place and additional thread infrastructure to do hand-off
> properly, it's going to be difficult to schedule for it. It's well known to
> be problematic.

I agree. I just think calling the designs "perfectly" or "completely"
fair is too strong. It might cause unnecessary confusion that
overshadows the actual merits of these designs. If we were to evaluate
specifically the fairness aspect of a design, then I'd suggest defining
it more formally.

> You announced your scheduler without CCing any of the relevant people here
> (and risk being completely ignored in lkml traffic):
> 
>   http://lkml.org/lkml/2007/4/20/286
> 
> What is your opinion of both CFS and SD? How can your work be useful
> to either scheduler mentioned, or to the Linux kernel on its own?

I like SD for its simplicity. My concern with CFS is the RB tree
structure. Log(n) seems high to me given the fact that we had an O(1)
scheduler. Many algorithms achieve strong fairness guarantees at the
cost of log(n) time. Thus, I tend to think that if log(n) is acceptable, we
might also want to look at other algorithms (e.g., start-time first)
with better fairness properties and see if they could be extended to be
general purpose.
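
For illustration only, a toy of the virtual-time ordering that start-time-first
and related algorithms refine (this is not CFS or SD code; the weights, slice
length and task names are made up):

#include <stdio.h>

/* Each task's virtual clock advances by (time run)/weight; the scheduler
 * always picks the runnable task with the smallest virtual time, which is
 * where the log(n) of a tree-keyed implementation comes from. */
struct toy_task {
        unsigned long long vtime;   /* weighted virtual time consumed so far */
        unsigned int weight;        /* larger weight => slower virtual clock */
};

static void toy_charge(struct toy_task *t, unsigned long long ran_ns)
{
        t->vtime += ran_ns / t->weight;
}

int main(void)
{
        struct toy_task a = { 0, 1 }, b = { 0, 3 };
        int i;

        for (i = 0; i < 8; i++) {
                struct toy_task *next = (a.vtime <= b.vtime) ? &a : &b;
                toy_charge(next, 1000000);      /* pretend it ran 1 ms */
                printf("slice %d -> %s\n", i, next == &a ? "a" : "b");
        }
        return 0;       /* b (weight 3) ends up running 3x as often as a */
}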

> > I understand that via experiments we can show a design is reasonably
> > fair in the common case, but IMHO, to claim that a design is fair, there
> > needs to be some kind of formal analysis on the fairness bound, and this
> > bound should be proven to be constant. Even if the bound is not
> > constant, at least this analysis can help us better understand and
> > predict the degree of fairness that users would experience (e.g., would
> > the system be less fair if the number of threads increases? What happens
> > if a large number of threads dynamically join and leave the system?).
> 
> Will has been thinking about this, but you have to also consider the
> practicalities of your approach versus Con's and Ingo's.

I consider my work an approach to extending an existing scheduler to
support proportional fairness. I see that many proportional-share designs
lack things, such as good interactive support, that Linux does well.
This is why I designed it on top of the existing scheduler so that it
can leverage things such as dynamic priorities. Regardless of the
underlying scheduler, SD or CFS, I think the algorithm I used would
still apply and thus we can extend the scheduler similarly.

> I'm all for things like proportional scheduling and the extensions
> needed to do it properly. It would be highly relevant to some version
> of the -rt patch if not that patch directly.

I'd love it to be considered for part of the -rt patch. I'm new to 
this, so would you please let me know what to do?

Thanks,

  tong
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Willy Tarreau
On Tue, Apr 24, 2007 at 10:38:32AM -0400, Gene Heskett wrote:
> On Tuesday 24 April 2007, Ingo Molnar wrote:
> >* David Lang <[EMAIL PROTECTED]> wrote:
> >> > (Btw., to protect against such mishaps in the future i have changed
> >> > the SysRq-N [SysRq-Nice] implementation in my tree to not only
> >> > change real-time tasks to SCHED_OTHER, but to also renice negative
> >> > nice levels back to 0 - this will show up in -v6. That way you'd
> >> > only have had to hit SysRq-N to get the system out of the wedge.)
> >>
> >> if you are trying to unwedge a system it may be a good idea to renice
> >> all tasks to 0, it could be that a task at +19 is holding a lock that
> >> something else is waiting for.
> >
> >Yeah, that's possible too, but +19 tasks are getting a small but
> >guaranteed share of the CPU so eventually it ought to release it. It's
> >still a possibility, but i think i'll wait for a specific incident to
> >happen first, and then react to that incident :-)
> >
> > Ingo
> 
> In the instance I created, even the SysRq+b was ignored, and ISTR that's
> supposed to initiate a reboot, is it not?  So it was well and truly wedged.

On many machines I use this on, I have to release Alt while still holding B.
Don't know why, but it works like this.

Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Chris Friesen

Rogan Dawes wrote:

My concern was that since Ingo said that this is a closed economy, with 
a fixed sum/total, if we lose a nanosecond here and there, eventually 
we'll lose them all.


I assume Ingo has set it up so that the system doesn't "lose" partial 
nanoseconds, but rather they'd just be accounted to the wrong task.


Chris
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Ray Lee

On 4/23/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:

On Mon, 23 Apr 2007, Ingo Molnar wrote:
>
> The "give scheduler money" transaction can be both an "implicit
> transaction" (for example when writing to UNIX domain sockets or
> blocking on a pipe, etc.), or it could be an "explicit transaction":
> sched_yield_to(). This latter i've already implemented for CFS, but it's
> much less useful than the really significant implicit ones, the ones
> which will help X.

Yes. It would be wonderful to get it working automatically, so please say
something about the implementation..

The "perfect" situation would be that when somebody goes to sleep, any
extra points it had could be given to whoever it woke up last. Note that
for something like X, it means that the points are 100% ephemeral: it gets
points when a client sends it a request, but it would *lose* the points
again when it sends the reply!


It would seem like there should be a penalty associated with sending
those points as well, so that two processes communicating quickly with
each other won't get into a mutual love-fest that'll capture the
scheduler's attention.
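
A toy model of that damping (illustrative only; the fractions and names are
made up, and none of this is in the CFS patch as posted): each hand-off
forwards only part of the donor's surplus and drops part of the gift, so a
pair of tasks bouncing wakeups back and forth converges instead of hoarding
credit.

#include <stdio.h>

/* Each donation forwards half of the donor's surplus and then "taxes" the
 * gift, so two tasks ping-ponging wakeups see their combined credit shrink
 * geometrically rather than capturing the scheduler. */
static void toy_donate(long *donor, long *wakee)
{
        long gift;

        if (*donor <= 0)
                return;
        gift = *donor / 2;
        *donor -= gift;          /* donor keeps the other half */
        *wakee += gift / 2;      /* half of the gift is dropped: the penalty */
}

int main(void)
{
        long x = 1000, y = 0;
        int i;

        for (i = 0; i < 6; i++) {
                toy_donate(&x, &y);     /* x wakes y ... */
                toy_donate(&y, &x);     /* ... and y immediately wakes x back */
                printf("round %d: x=%ld y=%ld total=%ld\n", i, x, y, x + y);
        }
        return 0;                       /* combined credit decays every round */
}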

Ray
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Rogan Dawes

Chris Friesen wrote:

Rogan Dawes wrote:

I guess my point was if we somehow get to an odd number of 
nanoseconds, we'd end up with rounding errors. I'm not sure if your 
algorithm will ever allow that.


And Ingo's point was that when it takes thousands of nanoseconds for a 
single context switch, an error of half a nanosecond is down in the noise.


Chris


My concern was that since Ingo said that this is a closed economy, with 
a fixed sum/total, if we lose a nanosecond here and there, eventually 
we'll lose them all.


Some folks have uptimes of multiple years.

Of course, I could (very likely!) be full of it! ;-)

Rogan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Chris Friesen

Rogan Dawes wrote:

I guess my point was if we somehow get to an odd number of nanoseconds, 
we'd end up with rounding errors. I'm not sure if your algorithm will 
ever allow that.


And Ingo's point was that when it takes thousands of nanoseconds for a 
single context switch, an error of half a nanosecond is down in the noise.


Chris
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Gene Heskett
On Tuesday 24 April 2007, Ingo Molnar wrote:
>* Ingo Molnar <[EMAIL PROTECTED]> wrote:
>> yeah, i guess this has little to do with X. I think in your scenario
>> it might have been smarter to either stop, or to renice the workloads
>> that took away CPU power from others to _positive_ nice levels.
>> Negative nice levels can indeed be dangerous.
>
>btw., was X itself at nice 0 or nice -10 when the lockup happened?
>
>   Ingo

Memory could be fuzzy Ingo, but I think it was at 0 at the time.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
I know it all.  I just can't remember it all at once.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Gene Heskett
On Tuesday 24 April 2007, Ingo Molnar wrote:
>* Ingo Molnar <[EMAIL PROTECTED]> wrote:
>> yeah, i guess this has little to do with X. I think in your scenario
>> it might have been smarter to either stop, or to renice the workloads
>> that took away CPU power from others to _positive_ nice levels.
>> Negative nice levels can indeed be dangerous.
>
>btw., was X itself at nice 0 or nice -10 when the lockup happened?
>
>   Ingo

Memory could be hazy Ingo, but I think X was at 0 when that occurred.


-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
I use technology in order to hate it more properly.
-- Nam June Paik
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Gene Heskett
On Tuesday 24 April 2007, Ingo Molnar wrote:
>* David Lang <[EMAIL PROTECTED]> wrote:
>> > (Btw., to protect against such mishaps in the future i have changed
>> > the SysRq-N [SysRq-Nice] implementation in my tree to not only
>> > change real-time tasks to SCHED_OTHER, but to also renice negative
>> > nice levels back to 0 - this will show up in -v6. That way you'd
>> > only have had to hit SysRq-N to get the system out of the wedge.)
>>
>> if you are trying to unwedge a system it may be a good idea to renice
>> all tasks to 0, it could be that a task at +19 is holding a lock that
>> something else is waiting for.
>
>Yeah, that's possible too, but +19 tasks are getting a small but
>guaranteed share of the CPU so eventually it ought to release it. It's
>still a possibility, but i think i'll wait for a specific incident to
>happen first, and then react to that incident :-)
>
>   Ingo

In the instance I created, even the SysRq+b was ignored, and ISTR that's 
supposed to initiate a reboot, is it not?  So it was well and truly wedged.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
I use technology in order to hate it more properly.
-- Nam June Paik
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Gene Heskett
On Tuesday 24 April 2007, Ingo Molnar wrote:
>* Gene Heskett <[EMAIL PROTECTED]> wrote:
>> > (Btw., to protect against such mishaps in the future i have changed
>> > the SysRq-N [SysRq-Nice] implementation in my tree to not only
>> > change real-time tasks to SCHED_OTHER, but to also renice negative
>> > nice levels back to 0 - this will show up in -v6. That way you'd
>> > only have had to hit SysRq-N to get the system out of the wedge.)
>>
>> That sounds handy, particularly with idiots like me at the wheel...
>
>by that standard i guess we tinkerers are all idiots ;)
>
>   Ingo

Eiyyyup!

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Man's horizons are bounded by his vision.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Rogan Dawes

Ingo Molnar wrote:

* Rogan Dawes <[EMAIL PROTECTED]> wrote:


   if (p_to && p->wait_runtime > 0) {
   p->wait_runtime >>= 1;
   p_to->wait_runtime += p->wait_runtime;
   }

the above is the basic expression of: "charge a positive bank balance". 


[..]

[note, due to the nanoseconds unit there's no rounding loss to worry 
about.]

Surely if you divide 5 nanoseconds by 2, you'll get a rounding loss?


yes. But note that we'll only truly have to worry about that when we have 
context-switching performance in that range - currently it's at 
least 2-3 orders of magnitude above that. Microseconds seemed to me to 
be too coarse already, that's why i picked nanoseconds and 64-bit 
arithmetic for CFS.


Ingo


I guess my point was if we somehow get to an odd number of nanoseconds, 
we'd end up with rounding errors. I'm not sure if your algorithm will 
ever allow that.


Rogan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> [...] That way you'd only have had to hit SysRq-N to get the system 
> out of the wedge.)

small correction: Alt-SysRq-N.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Ingo Molnar

* Rogan Dawes <[EMAIL PROTECTED]> wrote:

> >if (p_to && p->wait_runtime > 0) {
> >p->wait_runtime >>= 1;
> >p_to->wait_runtime += p->wait_runtime;
> >}
> >
> >the above is the basic expression of: "charge a positive bank balance". 
> >
> 
> [..]
> 
> > [note, due to the nanoseconds unit there's no rounding loss to worry 
> > about.]
> 
> Surely if you divide 5 nanoseconds by 2, you'll get a rounding loss?

yes. But note that we'll only truly have to worry about that when we have 
context-switching performance in that range - currently it's at 
least 2-3 orders of magnitude above that. Microseconds seemed to me to 
be too coarse already, that's why i picked nanoseconds and 64-bit 
arithmetic for CFS.
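
To put a number on that loss (illustrative; only the ">>= 1" split is taken
from the patch):

#include <stdio.h>

/* What the ">>= 1" hand-off does to an odd balance of 5 ns. */
int main(void)
{
        long long p_wait_runtime = 5;           /* donor's positive balance, ns */
        long long p_to_wait_runtime = 0;        /* recipient */

        p_wait_runtime >>= 1;                   /* 5 >> 1 == 2: donor keeps 2 ns */
        p_to_wait_runtime += p_wait_runtime;    /* recipient gains 2 ns */

        /* 1 ns of the original 5 ns is dropped per yield - several orders of
         * magnitude below the cost of the context switch that accompanies it. */
        printf("donor=%lld recipient=%lld lost=%lld\n",
               p_wait_runtime, p_to_wait_runtime,
               5 - p_wait_runtime - p_to_wait_runtime);
        return 0;
}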

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Ingo Molnar

* David Lang <[EMAIL PROTECTED]> wrote:

> > (Btw., to protect against such mishaps in the future i have changed 
> > the SysRq-N [SysRq-Nice] implementation in my tree to not only 
> > change real-time tasks to SCHED_OTHER, but to also renice negative 
> > nice levels back to 0 - this will show up in -v6. That way you'd 
> > only have had to hit SysRq-N to get the system out of the wedge.)
> 
> if you are trying to unwedge a system it may be a good idea to renice 
> all tasks to 0, it could be that a task at +19 is holding a lock that 
> something else is waiting for.

Yeah, that's possible too, but +19 tasks are getting a small but 
guaranteed share of the CPU so eventually it ought to release it. It's 
still a possibility, but i think i'll wait for a specific incident to 
happen first, and then react to that incident :-)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> yeah, i guess this has little to do with X. I think in your scenario 
> it might have been smarter to either stop, or to renice the workloads 
> that took away CPU power from others to _positive_ nice levels. 
> Negative nice levels can indeed be dangerous.

btw., was X itself at nice 0 or nice -10 when the lockup happened?

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread David Lang

On Tue, 24 Apr 2007, Ingo Molnar wrote:


* Gene Heskett <[EMAIL PROTECTED]> wrote:


Gene has done some testing under CFS with X reniced to +10 and the
desktop still worked smoothly for him.


As a data point here, and probably nothing to do with X, but I did
manage to lock it up, solid, reset button time tonight, by wanting
'smart' to get done with an update session after amanda had started.
I took both smart processes I could see in htop all the way to -19,
but when it was about done about 3 minutes later, everything came to
an instant, frozen, reset button required lockup.  I should have
stopped at -17 I guess. :(


yeah, i guess this has little to do with X. I think in your scenario it
might have been smarter to either stop, or to renice the workloads that
took away CPU power from others to _positive_ nice levels. Negative nice
levels can indeed be dangerous.

(Btw., to protect against such mishaps in the future i have changed the
SysRq-N [SysRq-Nice] implementation in my tree to not only change
real-time tasks to SCHED_OTHER, but to also renice negative nice levels
back to 0 - this will show up in -v6. That way you'd only have had to
hit SysRq-N to get the system out of the wedge.)


if you are trying to unwedge a system it may be a good idea to renice all tasks 
to 0, it could be that a task at +19 is holding a lock that something else is 
waiting for.


David Lang
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Ingo Molnar

* Gene Heskett <[EMAIL PROTECTED]> wrote:

> > (Btw., to protect against such mishaps in the future i have changed 
> > the SysRq-N [SysRq-Nice] implementation in my tree to not only 
> > change real-time tasks to SCHED_OTHER, but to also renice negative 
> > nice levels back to 0 - this will show up in -v6. That way you'd 
> > only have had to hit SysRq-N to get the system out of the wedge.)
> 
> That sounds handy, particularly with idiots like me at the wheel...

by that standard i guess we tinkerers are all idiots ;)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Gene Heskett
On Tuesday 24 April 2007, Ingo Molnar wrote:
>* Gene Heskett <[EMAIL PROTECTED]> wrote:
>> > Gene has done some testing under CFS with X reniced to +10 and the
>> > desktop still worked smoothly for him.
>>
>> As a data point here, and probably nothing to do with X, but I did
>> manage to lock it up, solid, reset button time tonight, by wanting
>> 'smart' to get done with an update session after amanda had started.
>> I took both smart processes I could see in htop all the way to -19,
>> but when it was about done about 3 minutes later, everything came to
>> an instant, frozen, reset button required lockup.  I should have
>> stopped at -17 I guess. :(
>
>yeah, i guess this has little to do with X. I think in your scenario it
>might have been smarter to either stop, or to renice the workloads that
>took away CPU power from others to _positive_ nice levels. Negative nice
>levels can indeed be dangerous.
>
>(Btw., to protect against such mishaps in the future i have changed the
>SysRq-N [SysRq-Nice] implementation in my tree to not only change
>real-time tasks to SCHED_OTHER, but to also renice negative nice levels
>back to 0 - this will show up in -v6. That way you'd only have had to
>hit SysRq-N to get the system out of the wedge.)
>
>   Ingo

That sounds handy, particularly with idiots like me at the wheel...


-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
When a Banker jumps out of a window, jump after him--that's where the money 
is.
-- Robespierre
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Ingo Molnar

* Gene Heskett <[EMAIL PROTECTED]> wrote:

> > Gene has done some testing under CFS with X reniced to +10 and the 
> > desktop still worked smoothly for him.
> 
> As a data point here, and probably nothing to do with X, but I did 
> manage to lock it up, solid, reset button time tonight, by wanting 
> 'smart' to get done with an update session after amanda had started.  
> I took both smart processes I could see in htop all the way to -19, 
> but when it was about done about 3 minutes later, everything came to 
> an instant, frozen, reset button required lockup.  I should have 
> stopped at -17 I guess. :(

yeah, i guess this has little to do with X. I think in your scenario it 
might have been smarter to either stop, or to renice the workloads that 
took away CPU power from others to _positive_ nice levels. Negative nice 
levels can indeed be dangerous.

(Btw., to protect against such mishaps in the future i have changed the 
SysRq-N [SysRq-Nice] implementation in my tree to not only change 
real-time tasks to SCHED_OTHER, but to also renice negative nice levels 
back to 0 - this will show up in -v6. That way you'd only have had to 
hit SysRq-N to get the system out of the wedge.)
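
(A rough sketch of the kind of sweep described above; this is not the actual
-v6 change, and the exact helpers and locking shown are assumptions for
illustration.)

#include <linux/sched.h>

static void sketch_sysrq_nice_sweep(void)
{
        struct task_struct *p;
        struct sched_param param = { .sched_priority = 0 };

        read_lock(&tasklist_lock);
        for_each_process(p) {
                if (rt_task(p))
                        sched_setscheduler(p, SCHED_NORMAL, &param); /* RT -> SCHED_OTHER */
                if (task_nice(p) < 0)
                        set_user_nice(p, 0);    /* undo negative nice levels */
        }
        read_unlock(&tasklist_lock);
}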

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Rogan Dawes

Ingo Molnar wrote:


static void
yield_task_fair(struct rq *rq, struct task_struct *p, struct task_struct *p_to)
{
struct rb_node *curr, *next, *first;
struct task_struct *p_next;

/*
 * yield-to support: if we are on the same runqueue then
 * give half of our wait_runtime (if it's positive) to the other task:
 */
if (p_to && p->wait_runtime > 0) {
p->wait_runtime >>= 1;
p_to->wait_runtime += p->wait_runtime;
}

the above is the basic expression of: "charge a positive bank balance". 



[..]

[note, due to the nanoseconds unit there's no rounding loss to worry 
about.]


Surely if you divide 5 nanoseconds by 2, you'll get a rounding loss?


Ingo


Rogan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Gene Heskett
On Tuesday 24 April 2007, Ingo Molnar wrote:
>* Peter Williams <[EMAIL PROTECTED]> wrote:
>> > The cases are fundamentally different in behavior, because in the
>> > first case, X hardly consumes the time it would get in any scheme,
>> > while in the second case X really is CPU bound and will happily
>> > consume any CPU time it can get.
>>
>> Which still doesn't justify an elaborate "points" sharing scheme.
>> Whichever way you look at that that's just another way of giving X
>> more CPU bandwidth and there are simpler ways to give X more CPU if it
>> needs it.  However, I think there's something seriously wrong if it
>> needs the -19 nice that I've heard mentioned.
>
>Gene has done some testing under CFS with X reniced to +10 and the
>desktop still worked smoothly for him.

As a data point here, and probably nothing to do with X, but I did manage to 
lock it up, solid, reset button time tonight, by wanting 'smart' to get done 
with an update session after amanda had started.  I took both smart processes 
I could see in htop all the way to -19, but when it was about done about 3 
minutes later, everything came to an instant, frozen, reset button required 
lockup.  I should have stopped at -17 I guess. :(

>So CFS does not 'need' a reniced 
>X. There are simply advantages to negative nice levels: for example
>screen refreshes are smoother on any scheduler i tried. BUT, there is a
>caveat: on non-CFS schedulers i tried X is much more prone to get into
>'overscheduling' scenarios that visibly hurt X's performance, while on
>CFS there's a max of 1000-1500 context switches a second at nice -10.
>(which, considering the cost of a context switch is well under 1%
>overhead.)
>
>So, my point is, the nice level of X for desktop users should not be set
>lower than a low limit suggested by that particular scheduler's author.
>That limit is scheduler-specific. Con i think recommends a nice level of
>-1 for X when using SD [Con, can you confirm?], while my tests show that
>if you want you can go as low as -10 under CFS, without any bad
>side-effects. (-19 was a bit too much)
>
>> [...]  You might as well just run it as a real time process.
>
>hm, that would be a bad idea under any scheduler (including CFS),
>because real time processes can starve other processes indefinitely.
>
>   Ingo



-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
I have discovered that all human evil comes from this, man's being unable
to sit still in a room.
-- Blaise Pascal
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Ingo Molnar

* Peter Williams <[EMAIL PROTECTED]> wrote:

> > The cases are fundamentally different in behavior, because in the 
> > first case, X hardly consumes the time it would get in any scheme, 
> > while in the second case X really is CPU bound and will happily 
> > consume any CPU time it can get.
> 
> Which still doesn't justify an elaborate "points" sharing scheme. 
> Whichever way you look at that that's just another way of giving X 
> more CPU bandwidth and there are simpler ways to give X more CPU if it 
> needs it.  However, I think there's something seriously wrong if it 
> needs the -19 nice that I've heard mentioned.

Gene has done some testing under CFS with X reniced to +10 and the 
desktop still worked smoothly for him. So CFS does not 'need' a reniced 
X. There are simply advantages to negative nice levels: for example 
screen refreshes are smoother on any scheduler i tried. BUT, there is a 
caveat: on non-CFS schedulers i tried X is much more prone to get into 
'overscheduling' scenarios that visibly hurt X's performance, while on 
CFS there's a max of 1000-1500 context switches a second at nice -10. 
(which, considering the cost of a context switch is well under 1% 
overhead.)
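
(For scale, assuming a context switch costs on the order of a microsecond:
1500 switches/sec * ~1 usec is roughly 1.5 ms of CPU per second of wall clock,
i.e. about 0.15%.)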

So, my point is, the nice level of X for desktop users should not be set 
lower than a low limit suggested by that particular scheduler's author. 
That limit is scheduler-specific. Con i think recommends a nice level of 
-1 for X when using SD [Con, can you confirm?], while my tests show that 
if you want you can go as low as -10 under CFS, without any bad 
side-effects. (-19 was a bit too much)

> [...]  You might as well just run it as a real time process.

hm, that would be a bad idea under any scheduler (including CFS), 
because real time processes can starve other processes indefinitely.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Peter Williams

Arjan van de Ven wrote:
Within reason, it's not the number of clients that X has that causes its 
CPU bandwidth use to skyrocket and cause problems.  It's more to do 
with what type of clients they are.  Most GUIs (even ones that are 
constantly updating visual data (e.g. gkrellm -- I can open quite a 
large number of these without increasing X's CPU usage very much)) cause 
very little load on the X server.  The exceptions to this are the 



there are actually 2 and not just 1 "X server", and they are VERY VERY
different in behavior.

Case 1: Accelerated driver

If X talks to a decent enough card that it supports well with acceleration,
it will be very rare for X itself to spend any kind of significant
amount of CPU time, all the really heavy stuff is done in hardware, and
asynchronously at that. A bit of batching will greatly improve system
performance in this case.

Case 2: Unaccelerated VESA

Some drivers in X, especially the VESA and NV drivers (which are quite
common; vesa is used on all hardware without a special driver nowadays),
have no or not enough acceleration to matter for modern desktops. This
means the CPU is doing all the heavy lifting, in the X program. In this
case even a simple "move the window a bit" becomes quite a bit of a CPU
hog already.


Mine's a:

SiS 661/741/760 PCI/AGP or 662/761Gx PCIE VGA Display adapter according 
to X's display settings tool.  Which category does that fall into?


It's not a special adapter and is just the one that came with the 
motherboard. It doesn't use much CPU unless I grab a window and wiggle 
it all over the screen or do something like "ls -lR /" in an xterm.




The cases are fundamentally different in behavior, because in the first
case, X hardly consumes the time it would get in any scheme, while in
the second case X really is CPU bound and will happily consume any CPU
time it can get.


Which still doesn't justify an elaborate "points" sharing scheme. 
Whichever way you look at it, that's just another way of giving X more 
CPU bandwidth, and there are simpler ways to give X more CPU if it needs 
it.  However, I think there's something seriously wrong if it needs the 
-19 nice that I've heard mentioned.  You might as well just run it as a 
real time process.


Peter
--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Peter Williams

Arjan van de Ven wrote:
Within reason, it's not the number of clients that X has that causes its 
CPU bandwidth use to sky rocket and cause problems.  It's more to to 
with what type of clients they are.  Most GUIs (even ones that are 
constantly updating visual data (e.g. gkrellm -- I can open quite a 
large number of these without increasing X's CPU usage very much)) cause 
very little load on the X server.  The exceptions to this are the 



there is actually 2 and not just 1 X server, and they are VERY VERY
different in behavior.

Case 1: Accelerated driver

If X talks to a decent enough card it supports will with acceleration,
it will be very rare for X itself to spend any kind of significant
amount of CPU time, all the really heavy stuff is done in hardware, and
asynchronously at that. A bit of batching will greatly improve system
performance in this case.

Case 2: Unaccelerated VESA

Some drivers in X, especially the VESA and NV drivers (which are quite
common, vesa is used on all hardware without a special driver nowadays),
have no or not enough acceleration to matter for modern desktops. This
means the CPU is doing all the heavy lifting, in the X program. In this
case even a simple move the window a bit becomes quite a bit of a CPU
hog already.


Mine's a:

SiS 661/741/760 PCI/AGP or 662/761Gx PCIE VGA Display adapter according 
to X's display settings tool.  Which category does that fall into?


It's not a special adapter and is just the one that came with the 
motherboard. It doesn't use much CPU unless I grab a window and wiggle 
it all over the screen or do something like ls -lR / in an xterm.




The cases are fundamentally different in behavior, because in the first
case, X hardly consumes the time it would get in any scheme, while in
the second case X really is CPU bound and will happily consume any CPU
time it can get.


Which still doesn't justify an elaborate points sharing scheme. 
Whichever way you look at that that's just another way of giving X more 
CPU bandwidth and there are simpler ways to give X more CPU if it needs 
it.  However, I think there's something seriously wrong if it needs the 
-19 nice that I've heard mentioned.  You might as well just run it as a 
real time process.


Peter
--
Peter Williams   [EMAIL PROTECTED]

Learning, n. The kind of ignorance distinguishing the studious.
 -- Ambrose Bierce
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Ingo Molnar

* Peter Williams [EMAIL PROTECTED] wrote:

  The cases are fundamentally different in behavior, because in the 
  first case, X hardly consumes the time it would get in any scheme, 
  while in the second case X really is CPU bound and will happily 
  consume any CPU time it can get.
 
 Which still doesn't justify an elaborate points sharing scheme. 
 Whichever way you look at that that's just another way of giving X 
 more CPU bandwidth and there are simpler ways to give X more CPU if it 
 needs it.  However, I think there's something seriously wrong if it 
 needs the -19 nice that I've heard mentioned.

Gene has done some testing under CFS with X reniced to +10 and the 
desktop still worked smoothly for him. So CFS does not 'need' a reniced 
X. There are simply advantages to negative nice levels: for example 
screen refreshes are smoother on any scheduler i tried. BUT, there is a 
caveat: on non-CFS schedulers i tried X is much more prone to get into 
'overscheduling' scenarios that visibly hurt X's performance, while on 
CFS there's a max of 1000-1500 context switches a second at nice -10. 
(which, considering the cost of a context switch is well under 1% 
overhead.)

So, my point is, the nice level of X for desktop users should not be set 
lower than a low limit suggested by that particular scheduler's author. 
That limit is scheduler-specific. Con i think recommends a nice level of 
-1 for X when using SD [Con, can you confirm?], while my tests show that 
if you want you can go as low as -10 under CFS, without any bad 
side-effects. (-19 was a bit too much)

 [...]  You might as well just run it as a real time process.

hm, that would be a bad idea under any scheduler (including CFS), 
because real time processes can starve other processes indefinitely.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Gene Heskett
On Tuesday 24 April 2007, Ingo Molnar wrote:
* Peter Williams [EMAIL PROTECTED] wrote:
  The cases are fundamentally different in behavior, because in the
  first case, X hardly consumes the time it would get in any scheme,
  while in the second case X really is CPU bound and will happily
  consume any CPU time it can get.

 Which still doesn't justify an elaborate points sharing scheme.
 Whichever way you look at that that's just another way of giving X
 more CPU bandwidth and there are simpler ways to give X more CPU if it
 needs it.  However, I think there's something seriously wrong if it
 needs the -19 nice that I've heard mentioned.

Gene has done some testing under CFS with X reniced to +10 and the
desktop still worked smoothly for him.

As a data point here, and probably nothing to do with X, but I did manage to 
lock it up, solid, reset button time tonight, by wanting 'smart' to get done 
with an update session after amanda had started.  I took both smart processes 
I could see in htop all the way to -19, but when it was about done about 3 
minutes later, everything came to an instant, frozen, reset button required 
lockup.  I should have stopped at -17 I guess. :(

So CFS does not 'need' a reniced 
X. There are simply advantages to negative nice levels: for example
screen refreshes are smoother on any scheduler i tried. BUT, there is a
caveat: on non-CFS schedulers i tried X is much more prone to get into
'overscheduling' scenarios that visibly hurt X's performance, while on
CFS there's a max of 1000-1500 context switches a second at nice -10.
(which, considering the cost of a context switch is well under 1%
overhead.)

So, my point is, the nice level of X for desktop users should not be set
lower than a low limit suggested by that particular scheduler's author.
That limit is scheduler-specific. Con i think recommends a nice level of
-1 for X when using SD [Con, can you confirm?], while my tests show that
if you want you can go as low as -10 under CFS, without any bad
side-effects. (-19 was a bit too much)

 [...]  You might as well just run it as a real time process.

hm, that would be a bad idea under any scheduler (including CFS),
because real time processes can starve other processes indefinitely.

   Ingo



-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
I have discovered that all human evil comes from this, man's being unable
to sit still in a room.
-- Blaise Pascal
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Rogan Dawes

Ingo Molnar wrote:


static void
yield_task_fair(struct rq *rq, struct task_struct *p, struct task_struct *p_to)
{
struct rb_node *curr, *next, *first;
struct task_struct *p_next;

/*
 * yield-to support: if we are on the same runqueue then
 * give half of our wait_runtime (if it's positive) to the other task:
 */
if (p_to  p-wait_runtime  0) {
p-wait_runtime = 1;
p_to-wait_runtime += p-wait_runtime;
}

the above is the basic expression of: charge a positive bank balance. 



[..]

[note, due to the nanoseconds unit there's no rounding loss to worry 
about.]


Surely if you divide 5 nanoseconds by 2, you'll get a rounding loss?


Ingo


Rogan
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Ingo Molnar

* Gene Heskett [EMAIL PROTECTED] wrote:

  Gene has done some testing under CFS with X reniced to +10 and the 
  desktop still worked smoothly for him.
 
 As a data point here, and probably nothing to do with X, but I did 
 manage to lock it up, solid, reset button time tonight, by wanting 
 'smart' to get done with an update session after amanda had started.  
 I took both smart processes I could see in htop all the way to -19, 
 but when it was about done about 3 minutes later, everything came to 
 an instant, frozen, reset button required lockup.  I should have 
 stopped at -17 I guess. :(

yeah, i guess this has little to do with X. I think in your scenario it 
might have been smarter to either stop, or to renice the workloads that 
took away CPU power from others to _positive_ nice levels. Negative nice 
levels can indeed be dangerous.

(Btw., to protect against such mishaps in the future i have changed the 
SysRq-N [SysRq-Nice] implementation in my tree to not only change 
real-time tasks to SCHED_OTHER, but to also renice negative nice levels 
back to 0 - this will show up in -v6. That way you'd only have had to 
hit SysRq-N to get the system out of the wedge.)

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Gene Heskett
On Tuesday 24 April 2007, Ingo Molnar wrote:
* Gene Heskett [EMAIL PROTECTED] wrote:
  Gene has done some testing under CFS with X reniced to +10 and the
  desktop still worked smoothly for him.

 As a data point here, and probably nothing to do with X, but I did
 manage to lock it up, solid, reset button time tonight, by wanting
 'smart' to get done with an update session after amanda had started.
 I took both smart processes I could see in htop all the way to -19,
 but when it was about done about 3 minutes later, everything came to
 an instant, frozen, reset button required lockup.  I should have
 stopped at -17 I guess. :(

yeah, i guess this has little to do with X. I think in your scenario it
might have been smarter to either stop, or to renice the workloads that
took away CPU power from others to _positive_ nice levels. Negative nice
levels can indeed be dangerous.

(Btw., to protect against such mishaps in the future i have changed the
SysRq-N [SysRq-Nice] implementation in my tree to not only change
real-time tasks to SCHED_OTHER, but to also renice negative nice levels
back to 0 - this will show up in -v6. That way you'd only have had to
hit SysRq-N to get the system out of the wedge.)

   Ingo

That sounds handy, particularly with idiots like me at the wheel...


-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
When a Banker jumps out of a window, jump after him--that's where the money 
is.
-- Robespierre
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Ingo Molnar

* Gene Heskett [EMAIL PROTECTED] wrote:

  (Btw., to protect against such mishaps in the future i have changed 
  the SysRq-N [SysRq-Nice] implementation in my tree to not only 
  change real-time tasks to SCHED_OTHER, but to also renice negative 
  nice levels back to 0 - this will show up in -v6. That way you'd 
  only have had to hit SysRq-N to get the system out of the wedge.)
 
 That sounds handy, particularly with idiots like me at the wheel...

by that standard i guess we tinkerers are all idiots ;)

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread David Lang

On Tue, 24 Apr 2007, Ingo Molnar wrote:


* Gene Heskett [EMAIL PROTECTED] wrote:


Gene has done some testing under CFS with X reniced to +10 and the
desktop still worked smoothly for him.


As a data point here, and probably nothing to do with X, but I did
manage to lock it up, solid, reset button time tonight, by wanting
'smart' to get done with an update session after amanda had started.
I took both smart processes I could see in htop all the way to -19,
but when it was about done about 3 minutes later, everything came to
an instant, frozen, reset button required lockup.  I should have
stopped at -17 I guess. :(


yeah, i guess this has little to do with X. I think in your scenario it
might have been smarter to either stop, or to renice the workloads that
took away CPU power from others to _positive_ nice levels. Negative nice
levels can indeed be dangerous.

(Btw., to protect against such mishaps in the future i have changed the
SysRq-N [SysRq-Nice] implementation in my tree to not only change
real-time tasks to SCHED_OTHER, but to also renice negative nice levels
back to 0 - this will show up in -v6. That way you'd only have had to
hit SysRq-N to get the system out of the wedge.)


if you are trying to unwedge a system it may be a good idea to renice all tasks 
to 0, it could be that a task at +19 is holding a lock that something else is 
waiting for.


David Lang
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Ingo Molnar

* David Lang [EMAIL PROTECTED] wrote:

  (Btw., to protect against such mishaps in the future i have changed 
  the SysRq-N [SysRq-Nice] implementation in my tree to not only 
  change real-time tasks to SCHED_OTHER, but to also renice negative 
  nice levels back to 0 - this will show up in -v6. That way you'd 
  only have had to hit SysRq-N to get the system out of the wedge.)
 
 if you are trying to unwedge a system it may be a good idea to renice 
 all tasks to 0, it could be that a task at +19 is holding a lock that 
 something else is waiting for.

Yeah, that's possible too, but +19 tasks are getting a small but 
guaranteed share of the CPU so eventually it ought to release it. It's 
still a possibility, but i think i'll wait for a specific incident to 
happen first, and then react to that incident :-)

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Ingo Molnar

* Ingo Molnar [EMAIL PROTECTED] wrote:

 yeah, i guess this has little to do with X. I think in your scenario 
 it might have been smarter to either stop, or to renice the workloads 
 that took away CPU power from others to _positive_ nice levels. 
 Negative nice levels can indeed be dangerous.

btw., was X itself at nice 0 or nice -10 when the lockup happened?

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Ingo Molnar

* Rogan Dawes [EMAIL PROTECTED] wrote:

 if (p_to  p-wait_runtime  0) {
 p-wait_runtime = 1;
 p_to-wait_runtime += p-wait_runtime;
 }
 
 the above is the basic expression of: charge a positive bank balance. 
 
 
 [..]
 
  [note, due to the nanoseconds unit there's no rounding loss to worry 
  about.]
 
 Surely if you divide 5 nanoseconds by 2, you'll get a rounding loss?

yes. But not that we'll only truly have to worry about that when we'll 
have context-switching performance in that range - currently it's at 
least 2-3 orders of magnitude above that. Microseconds seemed to me to 
be too coarse already, that's why i picked nanoseconds and 64-bit 
arithmetics for CFS.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Ingo Molnar

* Ingo Molnar [EMAIL PROTECTED] wrote:

 [...] That way you'd only have had to hit SysRq-N to get the system 
 out of the wedge.)

small correction: Alt-SysRq-N.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Rogan Dawes

Ingo Molnar wrote:

* Rogan Dawes [EMAIL PROTECTED] wrote:


   if (p_to  p-wait_runtime  0) {
   p-wait_runtime = 1;
   p_to-wait_runtime += p-wait_runtime;
   }

the above is the basic expression of: charge a positive bank balance. 


[..]

[note, due to the nanoseconds unit there's no rounding loss to worry 
about.]

Surely if you divide 5 nanoseconds by 2, you'll get a rounding loss?


yes. But not that we'll only truly have to worry about that when we'll 
have context-switching performance in that range - currently it's at 
least 2-3 orders of magnitude above that. Microseconds seemed to me to 
be too coarse already, that's why i picked nanoseconds and 64-bit 
arithmetics for CFS.


Ingo


I guess my point was if we somehow get to an odd number of nanoseconds, 
we'd end up with rounding errors. I'm not sure if your algorithm will 
ever allow that.


Rogan
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Gene Heskett
On Tuesday 24 April 2007, Ingo Molnar wrote:
* Gene Heskett [EMAIL PROTECTED] wrote:
  (Btw., to protect against such mishaps in the future i have changed
  the SysRq-N [SysRq-Nice] implementation in my tree to not only
  change real-time tasks to SCHED_OTHER, but to also renice negative
  nice levels back to 0 - this will show up in -v6. That way you'd
  only have had to hit SysRq-N to get the system out of the wedge.)

 That sounds handy, particularly with idiots like me at the wheel...

by that standard i guess we tinkerers are all idiots ;)

   Ingo

Eiyyyup!

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
Man's horizons are bounded by his vision.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Gene Heskett
On Tuesday 24 April 2007, Ingo Molnar wrote:
* David Lang [EMAIL PROTECTED] wrote:
  (Btw., to protect against such mishaps in the future i have changed
  the SysRq-N [SysRq-Nice] implementation in my tree to not only
  change real-time tasks to SCHED_OTHER, but to also renice negative
  nice levels back to 0 - this will show up in -v6. That way you'd
  only have had to hit SysRq-N to get the system out of the wedge.)

 if you are trying to unwedge a system it may be a good idea to renice
 all tasks to 0, it could be that a task at +19 is holding a lock that
 something else is waiting for.

Yeah, that's possible too, but +19 tasks are getting a small but
guaranteed share of the CPU so eventually it ought to release it. It's
still a possibility, but i think i'll wait for a specific incident to
happen first, and then react to that incident :-)

   Ingo

In the instance I created, even the SysRq+b was ignored, and ISTR thats 
supposed to initiate a reboot is it not?  So it was well and truly wedged.

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
I use technology in order to hate it more properly.
-- Nam June Paik
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Gene Heskett
On Tuesday 24 April 2007, Ingo Molnar wrote:
* Ingo Molnar [EMAIL PROTECTED] wrote:
 yeah, i guess this has little to do with X. I think in your scenario
 it might have been smarter to either stop, or to renice the workloads
 that took away CPU power from others to _positive_ nice levels.
 Negative nice levels can indeed be dangerous.

btw., was X itself at nice 0 or nice -10 when the lockup happened?

   Ingo

Memory could be hazy Ingo, but I think X was at 0 when that occurred.


-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
I use technology in order to hate it more properly.
-- Nam June Paik
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Gene Heskett
On Tuesday 24 April 2007, Ingo Molnar wrote:
* Ingo Molnar [EMAIL PROTECTED] wrote:
 yeah, i guess this has little to do with X. I think in your scenario
 it might have been smarter to either stop, or to renice the workloads
 that took away CPU power from others to _positive_ nice levels.
 Negative nice levels can indeed be dangerous.

btw., was X itself at nice 0 or nice -10 when the lockup happened?

   Ingo

Memory could be fuzzy Ingo, but I think it was at 0 at the time.

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
I know it all.  I just can't remember it all at once.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Chris Friesen

Rogan Dawes wrote:

I guess my point was if we somehow get to an odd number of nanoseconds, 
we'd end up with rounding errors. I'm not sure if your algorithm will 
ever allow that.


And Ingo's point was that when it takes thousands of nanoseconds for a 
single context switch, an error of half a nanosecond is down in the noise.


Chris
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Rogan Dawes

Chris Friesen wrote:

Rogan Dawes wrote:

I guess my point was if we somehow get to an odd number of 
nanoseconds, we'd end up with rounding errors. I'm not sure if your 
algorithm will ever allow that.


And Ingo's point was that when it takes thousands of nanoseconds for a 
single context switch, an error of half a nanosecond is down in the noise.


Chris


My concern was that since Ingo said that this is a closed economy, with 
a fixed sum/total, if we lose a nanosecond here and there, eventually 
we'll lose them all.


Some folks have uptimes of multiple years.

Of course, I could (very likely!) be full of it! ;-)

Rogan
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Ray Lee

On 4/23/07, Linus Torvalds [EMAIL PROTECTED] wrote:

On Mon, 23 Apr 2007, Ingo Molnar wrote:

 The give scheduler money transaction can be both an implicit
 transaction (for example when writing to UNIX domain sockets or
 blocking on a pipe, etc.), or it could be an explicit transaction:
 sched_yield_to(). This latter i've already implemented for CFS, but it's
 much less useful than the really significant implicit ones, the ones
 which will help X.

Yes. It would be wonderful to get it working automatically, so please say
something about the implementation..

The perfect situation would be that when somebody goes to sleep, any
extra points it had could be given to whoever it woke up last. Note that
for something like X, it means that the points are 100% ephemeral: it gets
points when a client sends it a request, but it would *lose* the points
again when it sends the reply!


It would seem like there should be a penalty associated with sending
those points as well, so that two processes communicating quickly with
each other won't get into a mutual love-fest that'll capture the
scheduler's attention.

Ray
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Chris Friesen

Rogan Dawes wrote:

My concern was that since Ingo said that this is a closed economy, with 
a fixed sum/total, if we lose a nanosecond here and there, eventually 
we'll lose them all.


I assume Ingo has set it up so that the system doesn't lose partial 
nanoseconds, but rather they'd just be accounted to the wrong task.


Chris
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Willy Tarreau
On Tue, Apr 24, 2007 at 10:38:32AM -0400, Gene Heskett wrote:
 On Tuesday 24 April 2007, Ingo Molnar wrote:
 * David Lang [EMAIL PROTECTED] wrote:
   (Btw., to protect against such mishaps in the future i have changed
   the SysRq-N [SysRq-Nice] implementation in my tree to not only
   change real-time tasks to SCHED_OTHER, but to also renice negative
   nice levels back to 0 - this will show up in -v6. That way you'd
   only have had to hit SysRq-N to get the system out of the wedge.)
 
  if you are trying to unwedge a system it may be a good idea to renice
  all tasks to 0, it could be that a task at +19 is holding a lock that
  something else is waiting for.
 
 Yeah, that's possible too, but +19 tasks are getting a small but
 guaranteed share of the CPU so eventually it ought to release it. It's
 still a possibility, but i think i'll wait for a specific incident to
 happen first, and then react to that incident :-)
 
  Ingo
 
 In the instance I created, even the SysRq+b was ignored, and ISTR that's 
 supposed to initiate a reboot is it not?  So it was well and truly wedged.

On many machines I use this on, I have to release Alt while still holding B.
Don't know why, but it works like this.

Willy

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Li, Tong N
On Mon, 2007-04-23 at 18:57 -0700, Bill Huey wrote: 
 On Mon, Apr 23, 2007 at 05:59:06PM -0700, Li, Tong N wrote:
  I don't know if we've discussed this or not. Since both CFS and SD claim
  to be fair, I'd like to hear more opinions on the fairness aspect of
  these designs. In areas such as OS, networking, and real-time, fairness,
  and its more general form, proportional fairness, are well-defined
  terms. In fact, perfect fairness is not feasible since it requires all
  runnable threads to be running simultaneously and scheduled with
  infinitesimally small quanta (like a fluid system). So to evaluate if a
 
 Unfortunately, fairness is rather non-formal in this context and probably
 isn't strictly desirable given how hackish much of Linux userspace is. Until
 there's a method of doing directed yields (like what Will has prescribed, a
 kind of allotment to a thread doing work for another), a completely strict
 mechanism is probably problematic with regards to corner cases.
 
 X for example is largely non-thread safe. Until they can get their xcb
 framework in place and additional thread infrastructure to do hand-off
 properly, it's going to be difficult to schedule for it. It's well known to
 be problematic.

I agree. I just think calling the designs perfectly or completely
fair is too strong. It might cause unnecessary confusion that
overshadows the actual merits of these designs. If we were to evaluate
specifically the fairness aspect of a design, then I'd suggest defining
it more formally.

 You announced your scheduler without CCing any of the relevant people here
 (and risk being completely ignored in lkml traffic):
 
   http://lkml.org/lkml/2007/4/20/286
 
 What is your opinion of both CFS and SDL ? How can your work be useful
 to either scheduler mentioned or to the Linux kernel on its own ?

I like SD for its simplicity. My concern with CFS is the RB tree
structure. Log(n) seems high to me given the fact that we had an O(1)
scheduler. Many algorithms achieve strong fairness guarantees at the
cost of log(n) time. Thus, I tend to think, if log(n) is acceptable, we
might also want to look at other algorithms (e.g., start-time first)
with better fairness properties and see if they could be extended to be
general purpose.

  I understand that via experiments we can show a design is reasonably
  fair in the common case, but IMHO, to claim that a design is fair, there
  needs to be some kind of formal analysis on the fairness bound, and this
  bound should be proven to be constant. Even if the bound is not
  constant, at least this analysis can help us better understand and
  predict the degree of fairness that users would experience (e.g., would
  the system be less fair if the number of threads increases? What happens
  if a large number of threads dynamically join and leave the system?).
 
 Will has been thinking about this, but you have to also consider the
 practicalities of your approach versus Con's and Ingo's.

I consider my work an approach to extend an existing scheduler to
support proportional fairness. I see many proportional-share designs are
lacking things such as good interactive support that Linux does well.
This is why I designed it on top of the existing scheduler so that it
can leverage things such as dynamic priorities. Regardless of the
underlying scheduler, SD or CFS, I think the algorithm I used would
still apply and thus we can extend the scheduler similarly.

 I'm all for things like proportional scheduling and the extensions
 needed to do it properly. It would be highly relevant to some version
 of the -rt patch if not that patch directly.

I'd love it to be considered for part of the -rt patch. I'm new to 
this, so would you please let me know what to do?

Thanks,

  tong
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread William Lee Irwin III
On Mon, Apr 23, 2007 at 05:59:06PM -0700, Li, Tong N wrote:
 I don't know if we've discussed this or not. Since both CFS and SD claim
 to be fair, I'd like to hear more opinions on the fairness aspect of
 these designs. In areas such as OS, networking, and real-time, fairness,
 and its more general form, proportional fairness, are well-defined
 terms. In fact, perfect fairness is not feasible since it requires all
 runnable threads to be running simultaneously and scheduled with
 infinitesimally small quanta (like a fluid system). So to evaluate if a
 new scheduling algorithm is fair, the common approach is to take the
 ideal fair algorithm (often referred to as Generalized Processor
 Scheduling or GPS) as a reference model and analyze if the new algorithm
 can achieve a constant error bound (different error metrics also exist).

Could you explain for the audience the technical definition of fairness
and what sorts of error metrics are commonly used? There seems to be
some disagreement, and you're neutral enough of an observer that your
statement would help.


On Mon, Apr 23, 2007 at 05:59:06PM -0700, Li, Tong N wrote:
 I understand that via experiments we can show a design is reasonably
 fair in the common case, but IMHO, to claim that a design is fair, there
 needs to be some kind of formal analysis on the fairness bound, and this
 bound should be proven to be constant. Even if the bound is not
 constant, at least this analysis can help us better understand and
 predict the degree of fairness that users would experience (e.g., would
 the system be less fair if the number of threads increases? What happens
 if a large number of threads dynamically join and leave the system?).

Carrying out this sort of analysis on various policies would help, but
I'd expect most of them to be difficult to analyze. cfs' current
-fair_key computation should be simple enough to analyze, at least
ignoring nice numbers, though I've done nothing rigorous in this area.


-- wli
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Bernd Eckenfels
In article [EMAIL PROTECTED] you wrote:
 Could you explain for the audience the technical definition of fairness
 and what sorts of error metrics are commonly used? There seems to be
 some disagreement, and you're neutral enough of an observer that your
 statement would help.

And while we are at it, why it is a good thing. I could understand that fair
means no misbehaving (intentionally or unintentionally) application can
harm the rest of the system. However a responsive desktop might not
necessarily be very fair to compute jobs.

Even such a simple thing as who gets accounted can be quite different in
different workloads. (Larger multi-user systems tend to be fair based on the
user, on servers you balance more by thread or job, and single-user systems
should be as unfair as the user wants them, as long as no process can run
away.)

Gruss
Bernd
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Peter Williams

Rogan Dawes wrote:

Chris Friesen wrote:

Rogan Dawes wrote:

I guess my point was if we somehow get to an odd number of 
nanoseconds, we'd end up with rounding errors. I'm not sure if your 
algorithm will ever allow that.


And Ingo's point was that when it takes thousands of nanoseconds for a 
single context switch, an error of half a nanosecond is down in the 
noise.


Chris


My concern was that since Ingo said that this is a closed economy, with 
a fixed sum/total, if we lose a nanosecond here and there, eventually 
we'll lose them all.


Some folks have uptimes of multiple years.

Of course, I could (very likely!) be full of it! ;-)


And won't be using any new scheduler on these computers anyhow as 
that would involve bringing the system down to install the new kernel. :-)


Peter
--
Peter Williams   [EMAIL PROTECTED]

Learning, n. The kind of ignorance distinguishing the studious.
 -- Ambrose Bierce
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Gene Heskett
On Tuesday 24 April 2007, Willy Tarreau wrote:
On Tue, Apr 24, 2007 at 10:38:32AM -0400, Gene Heskett wrote:
 On Tuesday 24 April 2007, Ingo Molnar wrote:
 * David Lang [EMAIL PROTECTED] wrote:
   (Btw., to protect against such mishaps in the future i have changed
   the SysRq-N [SysRq-Nice] implementation in my tree to not only
   change real-time tasks to SCHED_OTHER, but to also renice negative
   nice levels back to 0 - this will show up in -v6. That way you'd
   only have had to hit SysRq-N to get the system out of the wedge.)
 
  if you are trying to unwedge a system it may be a good idea to renice
  all tasks to 0, it could be that a task at +19 is holding a lock that
  something else is waiting for.
 
 Yeah, that's possible too, but +19 tasks are getting a small but
 guaranteed share of the CPU so eventually it ought to release it. It's
 still a possibility, but i think i'll wait for a specific incident to
 happen first, and then react to that incident :-)
 
 Ingo

 In the instance I created, even the SysRq+b was ignored, and ISTR that's
 supposed to initiate a reboot is it not?  So it was well and truly wedged.

On many machines I use this on, I have to release Alt while still holding B.
Don't know why, but it works like this.

Willy

Yeah, Willy, and pardon a slight bit of sarcasm here but that's how we get the 
reputation for needing virgins to sacrifice, regular experienced girls just 
wouldn't do.

This isn't APL running on an IBM 5120, so it should Just Work(TM) and not need 
a seance or something to conjure up the right spell.  Besides, the reset 
button is only about 6 feet away...  I get some exercise that way by getting 
up to push it. :)

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
It is so soon that I am done for, I wonder what I was begun for.
-- Epitaph, Cheltenham Churchyard
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Li, Tong N
 Could you explain for the audience the technical definition of fairness
 and what sorts of error metrics are commonly used? There seems to be
 some disagreement, and you're neutral enough of an observer that your
 statement would help.

The definition for proportional fairness assumes that each thread has a
weight, which, for example, can be specified by the user, or something
mapped from thread priorities, nice values, etc. A scheduler achieves ideal
proportional fairness if (1) it is work-conserving, i.e., it never
leaves a processor idle if there are runnable threads, and (2) for any
two threads, i and j, in any time interval, the ratio of their CPU time
is greater than or equal to the ratio of their weights, assuming that
thread i is continuously runnable in the entire interval and both
threads have fixed weights throughout the interval. A corollary of this
is that if both threads i and j are continuously runnable with fixed
weights in the time interval, then the ratio of their CPU time should be
equal to the ratio of their weights. This definition is pretty
restrictive since it requires the properties to hold for any thread in
any interval, which is not feasible. In practice, all algorithms try to
approximate this ideal scheduler (often referred to as Generalized
Processor Scheduling or GPS). Two error metrics are often used: 

(1) lag(t): for any interval [t1, t2], the lag of a thread at time t \in
[t1, t2] is S'(t1, t) - S(t1, t), where S' is the CPU time the thread
would receive in the interval [t1, t] under the ideal scheduler and S is
the actual CPU time it receives under the scheduler being evaluated.

(2) The second metric doesn't really have an agreed-upon name. Some call
it a fairness measure and some call it something else. Anyway, different
from lag, which is kind of an absolute measure for one thread, this metric
(call it F) defines a relative measure between two threads over any time
interval:

F(t1, t2) = S_i(t1, t2) / w_i - S_j(t1, t2) / w_j,

where S_i and S_j are the CPU time the two threads receive in the
interval [t1, t2] and w_i and w_j are their weights, assuming both
weights don't change throughout the interval.

The goal of a proportional-share scheduling algorithm is to minimize the
above metrics. If the lag function is bounded by a constant for any
thread in any time interval, then the algorithm is considered to be
fair. You may notice that the second metric is actually weaker than the
first. In fact, if an algorithm achieves a constant lag bound, it must
also achieve a constant bound for the second metric, but the reverse is
not necessarily true. But in some settings, people have focused on the
second metric and still consider an algorithm to be fair as long as the
second metric is bounded by a constant.
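
For concreteness, here's a small standalone sketch (nothing from the thread;
all names invented) of how these two metrics could be computed offline from
per-thread CPU-time samples over an interval of length T, assuming every
thread stays runnable with a fixed weight for the whole interval:

#include <stdio.h>

struct sample {
	double cpu_time;	/* S_i(t1, t2): CPU time thread i actually received */
	double weight;		/* w_i: the thread's weight */
};

/* S'_i(t1, t2): what an ideal GPS (fluid) scheduler would have given
 * thread i over an interval of length T, i.e. T * w_i / sum of weights. */
static double gps_share(const struct sample *s, const struct sample *all,
			int n, double T)
{
	double wsum = 0.0;
	int i;

	for (i = 0; i < n; i++)
		wsum += all[i].weight;
	return T * s->weight / wsum;
}

/* lag_i = S'_i - S_i: positive means thread i got less than its ideal share. */
static double lag(const struct sample *s, const struct sample *all,
		  int n, double T)
{
	return gps_share(s, all, n, T) - s->cpu_time;
}

/* F(i, j) = S_i/w_i - S_j/w_j: weighted service difference between threads. */
static double fairness(const struct sample *a, const struct sample *b)
{
	return a->cpu_time / a->weight - b->cpu_time / b->weight;
}

int main(void)
{
	/* two equal-weight threads over a 100ms interval: one got 60ms, one 40ms */
	struct sample t[2] = { { 60.0, 1.0 }, { 40.0, 1.0 } };
	double T = 100.0;

	printf("lag(0) = %.1f  F(0,1) = %.1f\n",
	       lag(&t[0], t, 2, T), fairness(&t[0], &t[1]));
	return 0;
}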

 
 On Mon, Apr 23, 2007 at 05:59:06PM -0700, Li, Tong N wrote:
  I understand that via experiments we can show a design is reasonably
  fair in the common case, but IMHO, to claim that a design is fair, there
  needs to be some kind of formal analysis on the fairness bound, and this
  bound should be proven to be constant. Even if the bound is not
  constant, at least this analysis can help us better understand and
  predict the degree of fairness that users would experience (e.g., would
  the system be less fair if the number of threads increases? What happens
  if a large number of threads dynamically join and leave the system?).
 
 Carrying out this sort of analysis on various policies would help, but
 I'd expect most of them to be difficult to analyze. cfs' current
 -fair_key computation should be simple enough to analyze, at least
 ignoring nice numbers, though I've done nothing rigorous in this area.
 

If we can derive some invariants from the algorithm, it'd help the
analysis. An example is the deficit round-robin (DRR) algorithm in
networking. Its analysis utilizes the fact that the number of rounds each
flow (in this case, it'd be a thread) goes through in any time interval
differs by at most one.
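
As a toy reference for that invariant (invented names, unrelated to any
kernel or networking source), the DRR idea in a few lines; the bounded
deficit carried between rounds is what limits the per-flow service error:

#include <stdio.h>

#define NFLOWS 2

struct flow {
	int quantum;		/* credit (bytes) added to the deficit each round */
	int deficit;		/* unused credit carried over to the next round */
	int backlog[8];		/* pending packet sizes; 0 marks the end */
	int head;
};

static void drr_round(struct flow *f)
{
	f->deficit += f->quantum;
	while (f->backlog[f->head] && f->backlog[f->head] <= f->deficit) {
		f->deficit -= f->backlog[f->head];
		printf("sent a %d-byte packet\n", f->backlog[f->head]);
		f->backlog[f->head++] = 0;
	}
	if (!f->backlog[f->head])
		f->deficit = 0;	/* queue drained: leftover credit is dropped */
}

int main(void)
{
	struct flow flows[NFLOWS] = {
		{ .quantum = 500, .backlog = { 300, 300, 300 } },
		{ .quantum = 500, .backlog = { 700, 200 } },
	};
	int round, i;

	for (round = 0; round < 3; round++)
		for (i = 0; i < NFLOWS; i++)
			drr_round(&flows[i]);
	return 0;
}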

Hope you didn't get bored by all of this. :)

  tong
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-23 Thread Arjan van de Ven

> Within reason, it's not the number of clients that X has that causes its 
> CPU bandwidth use to sky rocket and cause problems.  It's more to do 
> with what type of clients they are.  Most GUIs (even ones that are 
> constantly updating visual data (e.g. gkrellm -- I can open quite a 
> large number of these without increasing X's CPU usage very much)) cause 
> very little load on the X server.  The exceptions to this are the 


there are actually 2 and not just 1 "X server", and they are VERY VERY
different in behavior.

Case 1: Accelerated driver

If X talks to a decent enough card that it supports well with acceleration,
it will be very rare for X itself to spend any kind of significant
amount of CPU time, all the really heavy stuff is done in hardware, and
asynchronously at that. A bit of batching will greatly improve system
performance in this case.

Case 2: Unaccelerated VESA

Some drivers in X, especially the VESA and NV drivers (which are quite
common, vesa is used on all hardware without a special driver nowadays),
have no or not enough acceleration to matter for modern desktops. This
means the CPU is doing all the heavy lifting, in the X program. In this
case even a simple "move the window a bit" becomes quite a bit of a CPU
hog already.

The cases are fundamentally different in behavior, because in the first
case, X hardly consumes the time it would get in any scheme, while in
the second case X really is CPU bound and will happily consume any CPU
time it can get.



-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-23 Thread Peter Williams

Linus Torvalds wrote:


On Mon, 23 Apr 2007, Ingo Molnar wrote:
The "give scheduler money" transaction can be both an "implicit 
transaction" (for example when writing to UNIX domain sockets or 
blocking on a pipe, etc.), or it could be an "explicit transaction": 
sched_yield_to(). This latter i've already implemented for CFS, but it's 
much less useful than the really significant implicit ones, the ones 
which will help X.


Yes. It would be wonderful to get it working automatically, so please say 
something about the implementation..


The "perfect" situation would be that when somebody goes to sleep, any 
extra points it had could be given to whoever it woke up last. Note that 
for something like X, it means that the points are 100% ephemeral: it gets 
points when a client sends it a request, but it would *lose* the points 
again when it sends the reply!


So it would only accumulate "scheduling points" while multiple clients 
are actively waiting for it, which actually sounds like exactly the right 
thing. However, I don't really see how to do it well, especially since the 
kernel cannot actually match up the client that gave some scheduling 
points to the reply that X sends back.


There are subtle semantics with these kinds of things: especially if the 
scheduling points are only awarded when a process goes to sleep, if X is 
busy and continues to use the CPU (for another client), it wouldn't give 
any scheduling points back to clients and they really do accumulate with 
the server. Which again sounds like it would be exactly the right thing 
(both in the sense that the server that runs more gets more points, but 
also in the sense that we *only* give points at actual scheduling events).


But how do you actually *give/track* points? A simple "last woken up by 
this process" thing that triggers when it goes to sleep? It might work, 
but on the other hand, especially with more complex things (and networking 
tends to be pretty complex) the actual wakeup may be done by a software 
irq. Do we just say "it ran within the context of X, so we assume X was 
the one that caused it?" It probably would work, but we've generally tried 
very hard to avoid accessing "current" from interrupt context, including 
bh's.


Within reason, it's not the number of clients that X has that causes its 
CPU bandwidth use to sky rocket and cause problems.  It's more to do 
with what type of clients they are.  Most GUIs (even ones that are 
constantly updating visual data (e.g. gkrellm -- I can open quite a 
large number of these without increasing X's CPU usage very much)) cause 
very little load on the X server.  The exceptions to this are the 
various terminal emulators (e.g. xterm, gnome-terminal, etc.) when being 
used to run output intensive command line programs e.g. try "ls -lR /" 
in an xterm.  The other way (that I've noticed) that X's CPU bandwidth 
usage sky rockets is when you grab a large window and wiggle it about a lot, and 
hopefully this doesn't happen a lot so the problem that needs to be 
addressed is the one caused by text output on xterm and its ilk.


So I think that an elaborate scheme for distributing "points" between X 
and its clients would be overkill.  A good scheduler will make sure 
other tasks such as audio streamers get CPU when they need it with good 
responsiveness even when X takes off by giving them higher priority 
because their CPU bandwidth use is low.


The one problem that might still be apparent in these cases is the mouse 
becoming jerky while X is working like crazy to spew out text too fast 
for anyone to read.  But the only way to fix that is to give X more 
bandwidth but if it's already running at about 95% of a CPU that's 
unlikely to help.  To fix this you would probably need to modify X so 
that it knows re-rendering the cursor is more important than rendering 
text in an xterm.


In normal circumstances, the re-rendering of the mouse happens quickly 
enough for the user to experience good responsiveness because X's normal 
CPU use is low enough for it to be given high priority.


Just because the O(1) scheduler tried this model and failed doesn't mean 
that the model is bad.  O(1) was a flawed implementation of a good model.


Peter
PS Doing a kernel build in an xterm isn't an example of high enough 
output to cause a problem as (on my system) it only raises X's 
consumption from 0 to 2% to 2 to 5%.  The type of output that causes the 
problem is usually flying past too fast to read.

--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-23 Thread hui
On Mon, Apr 23, 2007 at 05:59:06PM -0700, Li, Tong N wrote:
> I don't know if we've discussed this or not. Since both CFS and SD claim
> to be fair, I'd like to hear more opinions on the fairness aspect of
> these designs. In areas such as OS, networking, and real-time, fairness,
> and its more general form, proportional fairness, are well-defined
> terms. In fact, perfect fairness is not feasible since it requires all
> runnable threads to be running simultaneously and scheduled with
> infinitesimally small quanta (like a fluid system). So to evaluate if a

Unfortunately, fairness is rather non-formal in this context and probably
isn't strictly desirable given how hackish much of Linux userspace is. Until
there's a method of doing directed yields (like what Will has prescribed, a
kind of allotment to a thread doing work for another), a completely strict
mechanism is probably problematic with regards to corner cases.

X for example is largely non-thread safe. Until they can get their xcb
framework in place and additional thread infrastructure to do hand-off
properly, it's going to be difficult to schedule for it. It's well known to
be problematic.

You announced your scheduler without CCing any of the relevant people here
(and risk being completely ignored in lkml traffic):

http://lkml.org/lkml/2007/4/20/286

What is your opinion of both CFS and SDL ? How can your work be useful
to either scheduler mentioned or to the Linux kernel on its own ?

> I understand that via experiments we can show a design is reasonably
> fair in the common case, but IMHO, to claim that a design is fair, there
> needs to be some kind of formal analysis on the fairness bound, and this
> bound should be proven to be constant. Even if the bound is not
> constant, at least this analysis can help us better understand and
> predict the degree of fairness that users would experience (e.g., would
> the system be less fair if the number of threads increases? What happens
> if a large number of threads dynamically join and leave the system?).

Will has been thinking about this, but you have to also consider the
practicalities of your approach versus Con's and Ingo's.

I'm all for things like proportional scheduling and the extensions
needed to do it properly. It would be highly relevant to some version
of the -rt patch if not that patch directly.

bill

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [REPORT] cfs-v4 vs sd-0.44

2007-04-23 Thread Li, Tong N
I don't know if we've discussed this or not. Since both CFS and SD claim
to be fair, I'd like to hear more opinions on the fairness aspect of
these designs. In areas such as OS, networking, and real-time, fairness,
and its more general form, proportional fairness, are well-defined
terms. In fact, perfect fairness is not feasible since it requires all
runnable threads to be running simultaneously and scheduled with
infinitesimally small quanta (like a fluid system). So to evaluate if a
new scheduling algorithm is fair, the common approach is to take the
ideal fair algorithm (often referred to as Generalized Processor
Scheduling or GPS) as a reference model and analyze if the new algorithm
can achieve a constant error bound (different error metrics also exist).
I understand that via experiments we can show a design is reasonably
fair in the common case, but IMHO, to claim that a design is fair, there
needs to be some kind of formal analysis on the fairness bound, and this
bound should be proven to be constant. Even if the bound is not
constant, at least this analysis can help us better understand and
predict the degree of fairness that users would experience (e.g., would
the system be less fair if the number of threads increases? What happens
if a large number of threads dynamically join and leave the system?).

  tong
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-23 Thread Jeremy Fitzhardinge
Linus Torvalds wrote:
> The "perfect" situation would be that when somebody goes to sleep, any 
> extra points it had could be given to whoever it woke up last. Note that 
> for something like X, it means that the points are 100% ephemeral: it gets 
> points when a client sends it a request, but it would *lose* the points 
> again when it sends the reply!
>
> So it would only accumulate "scheduling points" while multiple clients 
> are actively waiting for it, which actually sounds like exactly the right 
> thing. However, I don't really see how to do it well, especially since the 
> kernel cannot actually match up the client that gave some scheduling 
> points to the reply that X sends back.
>   

This works out in quite an interesting way.  If the economy is closed -
all clients and servers are managed by the same scheduler - then the
server could get no inherent CPU priority and live entirely on donated
shares.  If that were the case, you'd have to make sure that the server
used the donation from client A on client A's work, otherwise you'd get
freeloaders - but maybe it will all work out.

It gets more interesting when you have a non-closed system - the X
server is working on behalf of external clients over TCP.  Presumably
wakeups from incoming TCP connections wouldn't have any scheduler shares
associated with them, so the X server would have to use its inherent CPU
allocation to service those requests.  Or the external client could
effectively end up freeloading off portions of the local clients' donations.

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-23 Thread Guillaume Chazarain

2007/4/23, Ingo Molnar <[EMAIL PROTECTED]>:

p->wait_runtime >>= 1;
p_to->wait_runtime += p->wait_runtime;


I have no problem with clients giving some credit to X,
I am more concerned with X giving half of its credit to
a single client, a quarter of its credit to another client, etc...

For example, a client could set up a periodic wake up
from X, and then periodically get some credit for free.
Would that be possible?

Thanks.

--
Guillaume
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-23 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> sorry, i was a bit imprecise here. There is a case where CFS can give 
> out a 'loan' to tasks. The scheduler tick has a low resolution, so it 
> is fundamentally inevitable [*] that tasks will run a bit more than 
> they should, and at a heavy context-switching rates these errors can 
> add up significantly. Furthermore, we want to batch up workloads.
> 
> So CFS has a "no loans larger than sched_granularity_ns" policy (which 
> defaults to 5msec), and it captures these sub-granularity 'loans' with 
> nanosec accounting. This too is a very sane economic policy and is 
> anti-inflationary :-)

at which point i guess i should rename CFS to 'EFS' (the Economic Fair 
Scheduler)? =B-)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-23 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> (we obviously dont want to allow people to 'share' their loans with 
> others ;), nor do we want to allow a net negative balance. CFS is 
> really brutally cold-hearted, it has a strict 'no loans' policy - the 
> easiest economic way to manage 'inflation', besides the basic act of 
> not printing new money, ever.)

sorry, i was a bit imprecise here. There is a case where CFS can give 
out a 'loan' to tasks. The scheduler tick has a low resolution, so it is 
fundamentally inevitable [*] that tasks will run a bit more than they 
should, and at a heavy context-switching rates these errors can add up 
significantly. Furthermore, we want to batch up workloads.

So CFS has a "no loans larger than sched_granularity_ns" policy (which 
defaults to 5msec), and it captures these sub-granularity 'loans' with 
nanosec accounting. This too is a very sane economic policy and is 
anti-inflationary :-)

Ingo

[*] i fundamentally hate 'fundamentally inevitable' conditions so i 
have plans to make the scheduler tick be fed from the rbtree and 
thus become a true high-resolution timer. This not only increases 
fairness (=='precision of scheduling') more, but it also decreases 
the number of timer interrupts on a running system - extending 
dynticks to sched-ticks too. Thomas and me shaped dynticks to enable 
that in an easy way: the scheduler tick is today already a high-res 
timer (but which is currently still driven via the jiffy mechanism).
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-23 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> > The "give scheduler money" transaction can be both an "implicit 
> > transaction" (for example when writing to UNIX domain sockets or 
> > blocking on a pipe, etc.), or it could be an "explicit transaction": 
> > sched_yield_to(). This latter i've already implemented for CFS, but 
> > it's much less useful than the really significant implicit ones, the 
> > ones which will help X.
> 
> Yes. It would be wonderful to get it working automatically, so please 
> say something about the implementation..

i agree that the devil will be in the details, but so far it's really 
simple. I'll put all this into separate helper functions so that places 
can just use it in a natural way. The existing yield-to bit is this:

static void
yield_task_fair(struct rq *rq, struct task_struct *p, struct task_struct *p_to)
{
struct rb_node *curr, *next, *first;
struct task_struct *p_next;

/*
 * yield-to support: if we are on the same runqueue then
 * give half of our wait_runtime (if it's positive) to the other task:
 */
if (p_to && p->wait_runtime > 0) {
p->wait_runtime >>= 1;
p_to->wait_runtime += p->wait_runtime;
}

the above is the basic expression of: "charge a positive bank balance". 

(we obviously dont want to allow people to 'share' their loans with 
others ;), nor do we want to allow a net negative balance. CFS is really 
brutally cold-hearted, it has a strict 'no loans' policy - the easiest 
economic way to manage 'inflation', besides the basic act of not 
printing new money, ever.)

[note, due to the nanoseconds unit there's no rounding loss to worry 
about.]

that's all. No runqueue locking, no wakeup decisions even! [Note: see 
detail #1 below for cases where we need to touch the tree]. Really 
low-overhead. Accumulated 'new money' will be acted upon in the next 
schedule() call or in the next scheduler tick, whichever comes sooner. 
Note that in most cases when tasks communicate there will be a natural 
schedule() anyway, which drives this.

p->wait_runtime is also very finegrained: it is in nanoseconds, so a 
task can 'pay' at arbitrary granularity in essence, and there is in 
essence zero 'small coin overhead' and 'conversion loss' in this money 
system. (as you might remember, sharing p->timeslice had inherent 
rounding and sharing problems due to its low jiffy resolution)

detail #1: for decoupled workloads where there is no direct sleep/wake 
coupling between worker and producer, there should also be a way to 
update a task's position in the fairness tree, if it accumulates 
significant amount of new p->wait_runtime. I think this can be done by 
making this an extra field: p->new_wait_runtime, which gets picked up by 
the task if it runs, or which gets propagated into the task's tree 
position if the p->new_wait_runtime value goes above the 
sched_granularity_ns value. But it would work pretty well even without 
this, the server will take advantage of the p->new_wait_runtime 
immediately when it runs, so as long as enough clients 'feed' it with 
money, it will always have enough to keep going.

detail #2: changes to p->wait_runtime are totally lockless, as long as 
they are 64-bit atomic. So the above code is a bit naive on 32-bit 
systems, but no locking is needed otherwise, other than having a stable 
reference to a task structure. (i designed CFS for 64-bit systems)

detail #3: i suspect i should rename p->wait_runtime to a more intuitive 
name - perhaps p->right_to_run? I want to avoid calling it p->timeslice 
because it's not really a timeslice, it's the thing you earned, the 
'timeslice' is a totally locally decided property that has no direct 
connection to this physical resource. I also dont want to call it 
p->cpu_credit, because it is _not_ a credit system: every positive value 
there has been earned the hard way: by 'working' for the system via 
waiting on the runqueue - scaled down to the 'expected fair runtime' - 
i.e. roughly scaled down by 1/rq->nr_running.

detail #4: the scheduler is also a charity: when it has no other work 
left it will let tasks execute "for free" ;-) But otherwise, in any sort 
of saturated market situation CFS is very much a cold hearted 
capitalist.

about the 50% rule: it was a totally arbitrary case for yield_to(), and 
in other cases it should rather be: "give me _all_ the money you have, 
i'll make it work for you as much as i can". And the receiver should 
also perhaps record the amount of 'money' it got from the client, and 
_give back_ any unused proportion of it. (only where easily doable, in 
1:1 task relationships) I.e.:

p_to->wait_runtime = p->wait_runtime;
p->wait_runtime = 0;

schedule();

the former two lines put into a sched_pay(p) API perhaps?
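
purely as a guess at what such a helper could look like (it does not exist
in the tree), wrapping the two lines above but accumulating into the
receiver's balance so an existing positive balance is not overwritten:

static inline void sched_pay(struct task_struct *p, struct task_struct *p_to)
{
	/* hand over all of the caller's accumulated right-to-run ... */
	if (p->wait_runtime > 0) {
		p_to->wait_runtime += p->wait_runtime;
		p->wait_runtime = 0;
	}
	/* ... the next schedule() or scheduler tick acts on the new balance */
}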

> The "perfect" situation would be that when somebody goes to sleep, any 
> extra points it had could be given to whoever it woke up last. Note 

Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-23 Thread Linus Torvalds


On Mon, 23 Apr 2007, Ingo Molnar wrote:
> 
> The "give scheduler money" transaction can be both an "implicit 
> transaction" (for example when writing to UNIX domain sockets or 
> blocking on a pipe, etc.), or it could be an "explicit transaction": 
> sched_yield_to(). This latter i've already implemented for CFS, but it's 
> much less useful than the really significant implicit ones, the ones 
> which will help X.

Yes. It would be wonderful to get it working automatically, so please say 
something about the implementation..

The "perfect" situation would be that when somebody goes to sleep, any 
extra points it had could be given to whoever it woke up last. Note that 
for something like X, it means that the points are 100% ephemeral: it gets 
points when a client sends it a request, but it would *lose* the points 
again when it sends the reply!

So it would only accumulate "scheduling points" while multiple clients 
are actively waiting for it, which actually sounds like exactly the right 
thing. However, I don't really see how to do it well, especially since the 
kernel cannot actually match up the client that gave some scheduling 
points to the reply that X sends back.

There are subtle semantics with these kinds of things: especially if the 
scheduling points are only awarded when a process goes to sleep, if X is 
busy and continues to use the CPU (for another client), it wouldn't give 
any scheduling points back to clients and they really do accumulate with 
the server. Which again sounds like it would be exactly the right thing 
(both in the sense that the server that runs more gets more points, but 
also in the sense that we *only* give points at actual scheduling events).

But how do you actually *give/track* points? A simple "last woken up by 
this process" thing that triggers when it goes to sleep? It might work, 
but on the other hand, especially with more complex things (and networking 
tends to be pretty complex) the actual wakeup may be done by a software 
irq. Do we just say "it ran within the context of X, so we assume X was 
the one that caused it?" It probably would work, but we've generally tried 
very hard to avoid accessing "current" from interrupt context, including 
bh's..
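
A rough sketch of that "last woken up by this process" bookkeeping (the
field and helper names here are invented, and task lifetime and locking
are ignored entirely):

static inline void note_last_woken(struct task_struct *waker,
				   struct task_struct *wakee)
{
	waker->last_woken = wakee;		/* hypothetical field */
}

static inline void donate_on_sleep(struct task_struct *p)
{
	struct task_struct *to = p->last_woken;	/* hypothetical field */

	/* hand any surplus right-to-run to whoever we woke up last */
	if (to && p->wait_runtime > 0) {
		to->wait_runtime += p->wait_runtime;
		p->wait_runtime = 0;
	}
	p->last_woken = NULL;
}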

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-23 Thread Willy Tarreau
Hi !

On Mon, Apr 23, 2007 at 09:11:43PM +0200, Ingo Molnar wrote:
> 
> * Linus Torvalds <[EMAIL PROTECTED]> wrote:
> 
> > but the point I'm trying to make is that X shouldn't get more CPU-time 
> > because it's "more important" (it's not: and as noted earlier, 
> > thinking that it's more important skews the problem and makes for too 
> > *much* scheduling). X should get more CPU time simply because it 
> > should get it's "fair CPU share" relative to the *sum* of the clients, 
> > not relative to any client individually.
> 
> yeah. And this is not a pipe dream and i think it does not need a 
> 'wakeup matrix' or other complexities.
> 
> I am --->.< this close to being able to do this very robustly under 
> CFS via simple rules of economy and trade: there the p->wait_runtime 
> metric is intentionally a "physical resource" of "hard-earned right to 
> execute on the CPU, by having waited on it" the sum of which is bound 
> for the whole system.
>
> So while with other, heuristic approaches we always had the problem of 
> creating a "hyper-inflation" of an uneconomic virtual currency that 
> could be freely printed by certain tasks, in CFS the economy of this is 
> strict and the finegrained plus/minus balance is strictly managed by a 
> conservative and independent central bank.
> 
> So we can actually let tasks "trade" in these very physical units of 
> "right to execute on the CPU". A task giving it to another task means 
> that this task _already gave up CPU time in the past_. So it's the 
> robust equivalent of an economy's "money earned" concept, and this 
> "money"'s distribution (and redistribution) is totally fair and totally 
> balanced and is not prone to "inflation".
> 
> The "give scheduler money" transaction can be both an "implicit 
> transaction" (for example when writing to UNIX domain sockets or 
> blocking on a pipe, etc.), or it could be an "explicit transaction": 
> sched_yield_to(). This latter i've already implemented for CFS, but it's 
> much less useful than the really significant implicit ones, the ones 
> which will help X.

I don't think that a task should _give_ its slice to the task it's waiting
on, but it should _lend_ it : if the second task (the server, eg: X) does
not eat everything, maybe the first one will need to use the remains.

We had a good example with glxgears. Glxgears may need more CPU than X on
some machines, less on others. But it needs CPU. So it must not give all
it has to X otherwise it will stop. But it can tell X "hey, if you need
some CPU, I have some here, help yourself". When X has exhausted its slice,
it can then use some from the client. Hmmm no, better, X first serves itself
in the client's share, and may then use (parts of) its own if it needs more.

This would be seen as some kind of CPU resource buckets. Indeed, it's even a
problem of economy as you said. If you want someone to do something for you,
either it's very quick and simple and he can do it for free once in a while,
or you take all of his time and you have to pay him for this time.

Of course, you don't always know whom X is working for, and this will cause
X to sometimes run for one task on another one's resources. But as long as
the work is done, it's OK. Hey, after all, many of us sometimes work for
customers in real life and take some of their time to work on the kernel
and everyone is happy with it.

I think that if we could have a (small) list of CPU buckets per task, it
would permit us to do such a thing. We would then have to ensure that
pipes or unix sockets correctly present their buckets to their servers.
If we consider that each task only has its own bucket and can lend it to
one and only one server at a time, it should not look too much horrible.

Basically, just something like this (thinking while typing) :

struct task_struct {
  ...
  struct {
     struct list_head list;
     int money_left;
  } cpu_bucket;
  ...
};

Then, waking up another process would consist of linking our bucket into
its own bucket list. The server can identify the task it's borrowing
from by looking at which task_struct the list belongs to.
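
A rough sketch of that linking, with invented helper names; it assumes the
server keeps a borrowed_buckets list head of its own, which the struct
above does not show:

static inline void lend_bucket(struct task_struct *client,
			       struct task_struct *server)
{
	/* link the client's bucket into the server's borrowed list */
	list_add_tail(&client->cpu_bucket.list, &server->borrowed_buckets);
}

static inline void take_back_bucket(struct task_struct *client)
{
	/* reclaim whatever money_left the server did not spend */
	list_del_init(&client->cpu_bucket.list);
}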

Also, it creates some inheritance between processes. When doing such a
thing :


 $ fgrep DST=1.2.3.4 fw.log | sed -e 's/1.2/A.B/' | gzip -c3 >fw-anon.gz

Then fgrep would lend some CPU to sed which in turn would present them both
to gzip. Maybe we need two lists in order of the structures to be unstacked
upon gzip's sleep() :-/

I don't know if I'm clear enough.

Cheers,
Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-23 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> but the point I'm trying to make is that X shouldn't get more CPU-time 
> because it's "more important" (it's not: and as noted earlier, 
> thinking that it's more important skews the problem and makes for too 
> *much* scheduling). X should get more CPU time simply because it 
> should get it's "fair CPU share" relative to the *sum* of the clients, 
> not relative to any client individually.

yeah. And this is not a pipe dream and i think it does not need a 
'wakeup matrix' or other complexities.

I am --->.< this close to being able to do this very robustly under 
CFS via simple rules of economy and trade: there the p->wait_runtime 
metric is intentionally a "physical resource" of "hard-earned right to 
execute on the CPU, by having waited on it" the sum of which is bound 
for the whole system.

So while with other, heuristic approaches we always had the problem of 
creating a "hyper-inflation" of an uneconomic virtual currency that 
could be freely printed by certain tasks, in CFS the economy of this is 
strict and the finegrained plus/minus balance is strictly managed by a 
conservative and independent central bank.

So we can actually let tasks "trade" in these very physical units of 
"right to execute on the CPU". A task giving it to another task means 
that this task _already gave up CPU time in the past_. So it's the 
robust equivalent of an economy's "money earned" concept, and this 
"money"'s distribution (and redistribution) is totally fair and totally 
balanced and is not prone to "inflation".

The "give scheduler money" transaction can be both an "implicit 
transaction" (for example when writing to UNIX domain sockets or 
blocking on a pipe, etc.), or it could be an "explicit transaction": 
sched_yield_to(). This latter i've already implemented for CFS, but it's 
much less useful than the really significant implicit ones, the ones 
which will help X.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-23 Thread Linus Torvalds


On Mon, 23 Apr 2007, Nick Piggin wrote:
> > If you have a single client, the X server is *not* more important than the 
> > client, and indeed, renicing the X server causes bad patterns: just 
> > because the client sends a request does not mean that the X server should 
> > immediately be given the CPU as being "more important". 
> 
> If the client is doing some processing, and the user moves the mouse, it
> feels much more interactive if the pointer moves rather than waits for
> the client to finish processing.

.. yes. However, that should be automatically true if the X process just 
has "enough CPU time" to merit being scheduled to.

Which it normally should always have, exactly because it's an 
"interactive" process (regardless of how the scheduler is done - any 
scheduler should always give sleepers good latency. The current one 
obviously does it by giving interactivity-bonuses, CFS does it by trying 
to be fair in giving out CPU time).

The problem tends to be the following scenario:

 - the X server is very CPU-busy, because it has lots of clients 
   connecting to it, and it's not getting any "bonus" for doing work for 
   those clients (ie it uses up its time-slice and thus becomes "less 
   important" than other processes, since it's already gotten its "fair" 
   slice of CPU - never mind that it was really unfair to not give it 
   more)

 - there is some process that is *not* dependent on X, that can (and does) 
   run, because X has spent its CPU time serving others.

but the point I'm trying to make is that X shouldn't get more CPU-time 
because it's "more important" (it's not: and as noted earlier, thinking 
that it's more important skews the problem and makes for too *much* 
scheduling). X should get more CPU time simply because it should get it's 
"fair CPU share" relative to the *sum* of the clients, not relative to any 
client individually.

Once you actually do give the X server "fair share" of the CPU, I'm sure
that you can still get into bad situations (trivial example: make clients 
that on purpose do X requests that are expensive for the server, but are 
cheap to generate). But it's likely not going to be an issue in practice 
any more.

Scheduling is not something you can do "perfectly". There's no point in 
even trying. To do "perfect" scheduling, you'd have to have ESP and know 
exactly what the user expects and know the future too. What you should aim 
for is the "obvious cases".

And I don't think anybody really disputes the fact that a process that 
does work for other processes "obviously" should get the CPU time skewed 
towards it (and away from the clients - not from non-clients!). I think 
the only real issue is that nobody really knows how to do it well (or at 
all).

I think the "schedule by user" would be reasonable in practice - not 
perfect by any means, but it *does* fall into the same class of issues: 
users are not in general "more important" than other users, but they 
should be treated fairly across the user, not on a per-process basis.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

