Re: RSDL v0.31
David Schwartz wrote: Bill Davidsen wrote: I agree for giving a process more than a fair share, but I don't think "latency" is the best term for what you describe later. If you think of latency as the time between a process unblocking and the time when it gets CPU, that is a more traditional interpretation. I'm not really sure latency and CPU-starved are compatible. For CPU-starvation, I think 'nice' is always going to be the fix. If you want a process to get more than its 'fair share' of the CPU, you have to ask for that. I think the scheduler should be fair by default. However, cleverness in the scheduler with latency can make things better without being unfair to anyone. It's perfectly fair for a task that has been blocked for a while to pre-empt a CPU-limited task when it unblocks. What I'm arguing is that if your task is CPU-limited and the scheduler is fair, that's your fault -- nice it. If your task is suffering from poor latency, and it's using less than its fair share of the CPU (because it is not CPU-limited), that is something the scheduler can be smarter about. Agreed. That's what I've been saying for years (since early 2.6 when we had all those scheduler troubles and I started nicksched). Honestly, I have always been against aggressive pre-emption. I think as CPUs get faster and timeslices get shorter, it makes less and less sense. I think scheduler timeslices actually shouldn't really be getting shorter. While I found it is quite easy to get good interactivity with a pretty dumb scheduler and tiny timeslices (at least until load ramps up enough that the "off-time" for your critical processes builds up too much), I think we want to aim for large timeslices. CPU caches are still getting bigger, and I don't think misses are getting cheaper (especially if you consider multi core). Also, the energy cost of a memory access is much higher even if hardware or software is able to hide the latency. -- SUSE Labs, Novell Inc. 
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: RSDL v0.31
Bill Davidsen wrote: > I agree for giving a process more than a fair share, but I don't think > "latency" is the best term for what you describe later. If you think of > latency as the time between a process unblocking and the time when it > gets CPU, that is a more traditional interpretation. I'm not really sure > latency and CPU-starved are compatible. For CPU-starvation, I think 'nice' is always going to be the fix. If you want a process to get more than its 'fair share' of the CPU, you have to ask for that. I think the scheduler should be fair by default. However, cleverness in the scheduler with latency can make things better without being unfair to anyone. It's perfectly fair for a task that has been blocked for a while to pre-empt a CPU-limited task when it unblocks. What I'm arguing is that if your task is CPU-limited and the scheduler is fair, that's your fault -- nice it. If your task is suffering from poor latency, and it's using less than its fair share of the CPU (because it is not CPU-limited), that is something the scheduler can be smarter about. Two things that I think can help improve interactivity without breaking fairness are: 1) Keep a longer-term history of tasks that have yielded the CPU so that they can be more likely to pre-empt when they are unblocked by I/O. (The improved accounting accuracy may go a long way towards doing this. I personally like exponential decay measurements of CPU usage.) 2) Be smart about things like pipes. When one process unblocks another through a pipe, socket, or the like, do not pre-empt (this defeats batching and blows out caches needlessly), but do try to schedule the unblocked process soon. Don't penalize one process for unblocking another, that's a good thing for it to do. I believe that the process of making schedulers smarter and fairer (and fixing bugs in them) will get us to a place where interactivity is superb without sacrificing fairness among tasks at equal static priority. 
Honestly, I have always been against aggressive pre-emption. I think as CPUs get faster and timeslices get shorter, it makes less and less sense. In many cases you are better off just making the task ready-to-run and allowing its higher dynamic priority to make it next. I strongly believe this for cases where the running task unblocked the other task. (I think in too many cases, you blow out the caches and force a context switch on a task that was just a few hundred instructions short of being finished with what it was doing as you punish it for getting useful work done.) DS
Re: RSDL v0.31
Linus Torvalds wrote: On Tue, 20 Mar 2007, Willy Tarreau wrote: Linus, you're unfair with Con. He initially was on this position, and lately worked with Mike by proposing changes to try to improve his X responsiveness. I was not actually so much speaking about Con, as about a lot of the tone in general here. And yes, it's not been entirely black and white. I was very happy to see the "try this patch" email from Al Boldi - not because I think that patch per se was necessarily the right fix (I have no idea), but simply because I think that's the kind of mindset we need to have. Not a lot of people really *like* the old scheduler, but it's been tweaked over the years to try to avoid some nasty behaviour. I'm really hoping that RSDL would be a lot better (and by all accounts it has the potential for that), but I think it's totally naïve to expect that it won't need some tweaking too. So I'll happily still merge RSDL right after 2.6.21 (and it won't even be a config option - if we want to make it good, we need to make sure *everybody* tests it), but what I want to see is that "can do" spirit wrt tweaking for issues that come up. May I suggest that if you want proper testing, it should be not only a config option but a boot-time option as well? Otherwise people will be comparing an old scheduler with an RSDL kernel, and they will diverge as time goes on. More people would be willing to reboot and test on a similar load than will keep two versions of the kernel around. And if you get people testing RSDL against a vendor kernel which might be hacked, it will be even less meaningful. Please consider the benefits of making RSDL the default scheduler, and leaving people with the old scheduler with an otherwise identical kernel as a fair and meaningful comparison. There, that's a technical argument ;-) Because let's face it - nothing is ever perfect. 
Even a really nice conceptual idea always ends up hitting the "but in real life, things are ugly and complex, and we've depended on behaviour X in the past and can't change it, so we need some tweaking for problem Y". And everything is totally fixable - at least as long as people are willing to! Linus -- Bill Davidsen <[EMAIL PROTECTED]> "We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
Re: RSDL v0.31
David Schwartz wrote: there were multiple attempts with renicing X under the vanilla scheduler, and they were utter failures most of the time. _More_ people complained about interactivity issues _after_ X has been reniced to -5 (or -10) than people complained about "nice 0" interactivity issues to begin with. Unfortunately, nicing X is not going to work. It causes X to pre-empt any local process that tries to batch requests to it, defeating the batching. What you really want is X to get scheduled after the client pauses in sending data to it or has sent more than a certain amount. It seems kind of crazy to put such logic in a scheduler. Perhaps when one process unblocks another, you put that other process at the head of the run queue but don't pre-empt the currently running process. That way, the process can continue to batch requests, but X's maximum latency delay will be the quantum of the client program. In general I think that's the right idea. See below for more... The vanilla scheduler's auto-nice feature rewards _behavior_, so it gets X right most of the time. The fundamental issue is that sometimes X is very interactive - we boost it then, there's lots of scheduling but nice low latencies. Sometimes it's a hog - we penalize it then and things start to batch up more and we get out of the overload situation faster. That's the case even if all you care about is desktop performance. No doubt it's hard to get the auto-nice thing right, but one thing is clear: currently RSDL causes problems in areas that worked well in the vanilla scheduler for a long time, so RSDL needs to improve. RSDL should not lure itself into the false promise of 'just renice X statically'. It won't work. (You might want to rewrite X's request scheduling - but if so then I'd like to see that being done _first_, because I just don't trust such 10-mile-distance problem analysis.) I am hopeful that there exists a heuristic that both improves this problem and is also inherently fair. 
If that's true, then such a heuristic can be added to RSDL without damaging its properties and without requiring any special settings. Perhaps longer-term latency benefits to processes that have yielded in the past? I think there are certain circumstances, however, where it is inherently reasonable to insist that 'nice' be used. If you want a CPU-starved task to get more than 1/X of the CPU, where X is the number of CPU-starved tasks, you should have to ask for that. If you want one CPU-starved task to get better latency than other CPU-starved tasks, you should have to ask for that. I agree for giving a process more than a fair share, but I don't think "latency" is the best term for what you describe later. If you think of latency as the time between a process unblocking and the time when it gets CPU, that is a more traditional interpretation. I'm not really sure latency and CPU-starved are compatible. I would like to see processes at the head of the queue (for latency) which were blocked for long term events, keyboard input, network input, mouse input, etc. Then processes blocked for short term events like disk, then processes which exhausted their time slice. This helps latency and responsiveness, while keeping all processes running. A variation is to give those processes at the head of the queue short time slices. Fundamentally, the scheduler cannot do it by itself. You can create cases where the load is precisely identical and one person wants X and another person wants Y. The scheduler cannot know what's important to you. DS -- Bill Davidsen <[EMAIL PROTECTED]> "We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
Re: RSDL v0.31
On Fri, 2007-03-23 at 16:59 +1100, Con Kolivas wrote: > > The deadline mechanism is easy to hit and works. Try printk'ing it. I tried rc4-rsdl.33, and in a log that's 782kb, there is only one instance of an overrun, which I created. On my box, it's dead code. -Mike
Re: RSDL v0.31
On Fri, 2007-03-23 at 16:59 +1100, Con Kolivas wrote: > The deadline mechanism is easy to hit and works. Try printk'ing it. Hm. I did (.30), and it didn't in an hour's time doing this and that. After I did the "take your quota with you" change, it did kick in. Lots. -Mike
Re: RSDL v0.31
On Friday 23 March 2007 15:39, Mike Galbraith wrote: > On Fri, 2007-03-23 at 09:50 +1100, Con Kolivas wrote: > > Now to figure out some meaningful cheap way of improving this accounting. > > The accounting is easy iff tick resolution is good enough, the deadline > mechanism is harder. I did the "quota follows task" thing, but nothing > good happens. That just ensured that the deadline mechanism kicks in > constantly because tick theft is a fact of tick-based life. A > reasonable fudge factor would help, but... > > I see problems with trying to implement the deadline mechanism. > > As implemented, it can't identify who is doing the stealing (which > happens constantly, even if userland is a 100% hog) because of tick > resolution accounting. If you can't identify the culprit, you can't > enforce the quota, and quotas which are not enforced are, strictly > speaking, not quotas. At tick time, you can only close the barn door > after the cow has been stolen, and the thief can theoretically visit > your barn an infinite number of times while you aren't watching the > door. ("don't blink" scenarios, and tick is backward-assward blink) > > You can count nanoseconds in schedule, and store the actual usage, but > then you still have the problem of inaccuracies in sched_clock() from > cross-cpu wakeup and migration. Cross-cpu wakeups happen quite a lot. > If sched_clock() _were_ absolutely accurate, you wouldn't need the > runqueue deadline mechanism, because at slice tick time you can see > everything you will ever see without moving enforcement directly into > the most critical of paths. > > IMHO, unless it can be demonstrated that timeslice theft is a problem > with a real-life scenario, you'd be better off dropping the queue > ticking. Time slices are a deadline mechanism, and in practice the god > of randomness ensures that even fast movers do get caught often enough > to make ticking tasks sufficient. 
> > (that was a very long-winded reply to one sentence because I spent a lot > of time looking into this very subject and came to the conclusion that > you can't get there from here. fwiw, ymmv and all that of course;) > > > Thanks again! > > You're welcome. The deadline mechanism is easy to hit and works. Try printk'ing it. There is some leeway to take tick accounting into the equation and I don't believe nanosecond resolution is required at all for this (how much leeway would you give then ;)). Eventually there is nothing to stop us using highres timers (blessed if they work as planned everywhere eventually) to do the events and do away with scheduler_tick entirely. For now ticks work fine; a reasonable estimate for smp migration will suffice (patch forthcoming). -- -ck
Re: RSDL v0.31
On Fri, 2007-03-23 at 09:50 +1100, Con Kolivas wrote: > Now to figure out some meaningful cheap way of improving this accounting. The accounting is easy iff tick resolution is good enough, the deadline mechanism is harder. I did the "quota follows task" thing, but nothing good happens. That just ensured that the deadline mechanism kicks in constantly because tick theft is a fact of tick-based life. A reasonable fudge factor would help, but... I see problems with trying to implement the deadline mechanism. As implemented, it can't identify who is doing the stealing (which happens constantly, even if userland is a 100% hog) because of tick resolution accounting. If you can't identify the culprit, you can't enforce the quota, and quotas which are not enforced are, strictly speaking, not quotas. At tick time, you can only close the barn door after the cow has been stolen, and the thief can theoretically visit your barn an infinite number of times while you aren't watching the door. ("don't blink" scenarios, and tick is backward-assward blink) You can count nanoseconds in schedule, and store the actual usage, but then you still have the problem of inaccuracies in sched_clock() from cross-cpu wakeup and migration. Cross-cpu wakeups happen quite a lot. If sched_clock() _were_ absolutely accurate, you wouldn't need the runqueue deadline mechanism, because at slice tick time you can see everything you will ever see without moving enforcement directly into the most critical of paths. IMHO, unless it can be demonstrated that timeslice theft is a problem with a real-life scenario, you'd be better off dropping the queue ticking. Time slices are a deadline mechanism, and in practice the god of randomness ensures that even fast movers do get caught often enough to make ticking tasks sufficient. (that was a very long-winded reply to one sentence because I spent a lot of time looking into this very subject and came to the conclusion that you can't get there from here. 
fwiw, ymmv and all that of course;) > Thanks again! You're welcome. -Mike
Re: RSDL v0.31
Thanks for taking the time to actually look at the code. All audits are most welcome! On Thursday 22 March 2007 18:07, Mike Galbraith wrote: > This is a rather long message, and isn't directed at anyone in > particular, it's for others who may be digging into their own problems > with RSDL, and for others (if any other than Con exist) who understand > RSDL well enough to tell me if I'm missing something. Anyone who's not > interested in RSDL's gizzard hit 'D' now. > > On Wed, 2007-03-21 at 17:02 +0100, Peter Zijlstra wrote: > > On Wed, 2007-03-21 at 15:57 +0100, Mike Galbraith wrote: > > > 'f' is a progglet which sleeps a bit and burns a bit, duration > > > depending on argument given. 'sh' is a shell 100% hog. In this > > > scenario, the argument was set such that 'f' used right at 50% cpu. > > > All are started at the same time, and I froze top when the first 'f' > > > reached 1:00. > > > > May one enquire how much CPU the mythical 'f' uses when ran alone? Just > > to get a gauge for the numbers? > > Actually, the numbers are an interesting curiosity point, but not as > interesting as the fact that the deadline mechanism isn't kicking in. > > From task_running_tick(): > > /* >* Accounting is performed by both the task and the runqueue. This >* allows frequently sleeping tasks to get their proper quota of >* cpu as the runqueue will have their quota still available at >* the appropriate priority level. It also means frequently waking >* tasks that might miss the scheduler_tick() will get forced down >* priority regardless. 
>*/ > if (!rt_task(p) && --rq_quota(rq, rq->prio_level) < 0) { > if (unlikely(p->first_time_slice)) > p->first_time_slice = 0; > rotate_runqueue_priority(rq); > set_tsk_need_resched(p); > } > > The reason for ticking both runqueue and task is that you can't sample a > say 100KHz information stream at 1KHz and reproduce that information > accurately. IOW, task time slices "blur" at high switch frequency, you > can't always hit tasks, so you hit what you _can_ hit every sample, the > runqueue, to minimize the theoretical effects of time slice theft. > (I've instrumented this before, and caught fast movers stealing 10s of > milliseconds in extreme cases.) Generally speaking, statistics even > things out very much, the fast mover eventually gets hit, and pays a > full tick for his sub-tick dip in the pool, so in practice it's not a > great big hairy deal. > > If you can accept that tasks can and do dodge the tick, an imbalance > between runqueue quota and task quota must occur. It isn't happening > here, and the reason appears to be bean counting error, tasks migrate > but their quotas don't follow. The first time a task is queued at any > priority, quota is allocated, task goes to sleep, quota on departed > runqueue stays behind, task awakens on a different runqueue, allocate > more quota, repeat. For migration, there's a twist: if you pull an > expired task, expired tasks don't have a quota yet, so they shouldn't > screw up bean counting. I had considered the quota not migrating to the new runqueue but basically it screws up the "set quota once and deadline only kicks in if absolutely necessary" policy. Migration means some extra quota is left behind on the runqueue it left from. It is never a huge extra quota and is reset on major rotation which occurs very frequently on rsdl. If I were to carry the quota over I would need to deduct p->time_slice from the source runqueue's quota, and add it to the target runqueue's quota. 
The problem there is that once the time_slice has been handed out to a task, it is my position that I can no longer trust the task to keep its accounting right; it may well have exhausted all its quota from the source runqueue and be pulling quota away from tasks that haven't used theirs yet. See below for more on updating the prio rotation and adding quota to the new runqueue. > > From pull_task(): > > /* > * If this task has already been running on src_rq this priority > * cycle, make the new runqueue think it has been on its cycle > */ > if (p->rotation == src_rq->prio_rotation) > p->rotation = this_rq->prio_rotation; > > The intent here is clearly that this task continue on the new cpu as if > nothing has happened. However, when the task was dequeued, p->array was > left as it was, points to the last place it was queued. Stale data. > > From recalc_task_prio(), which is called by enqueue_task(): > > static void recalc_task_prio(struct task_struct *p, struct rq *rq) > { > struct prio_array *array =
Re: RSDL v0.31
All code reviews are most welcome indeed! On Thursday 22 March 2007 20:18, Ingo Molnar wrote: > * Mike Galbraith <[EMAIL PROTECTED]> wrote: > > Actually, the numbers are an interesting curiosity point, but not as > > interesting as the fact that the deadline mechanism isn't kicking in. > > it's not just the scheduling accounting being off, RSDL also seems to be I'll look at that when I have time. > accessing stale data here: > > From pull_task(): > > > > /* > > * If this task has already been running on src_rq this priority > > * cycle, make the new runqueue think it has been on its cycle > > */ > > if (p->rotation == src_rq->prio_rotation) > > p->rotation = this_rq->prio_rotation; > > > > The intent here is clearly that this task continue on the new cpu as > > if nothing has happened. However, when the task was dequeued, > > p->array was left as it was, points to the last place it was queued. > > Stale data. I don't think this is a problem, because immediately after this pull_task() calls enqueue_task(), which always updates p->array in recalc_task_prio(). Every enqueue_task() calls recalc_task_prio() on non-rt tasks, so the array should always be set no matter where scheduling is entered from, unless I have a logic error in setting p->array in recalc_task_prio() or there is another path to schedule() that I've not covered with a recalc_task_prio() call. > it might point to a hot-unplugged CPU's runqueue as well. Which might > work accidentally, but we want this fixed nevertheless. The hot-unplugged cpu's prio_rotation will be examined, and then it sets the prio_rotation from this runqueue's value. That shouldn't lead to any more problems than setting the timestamp based on the hot-unplugged cpu's timestamp lower down in pull_task(): p->timestamp = (p->timestamp - src_rq->most_recent_timestamp) + this_rq->most_recent_timestamp; Thanks for looking!
-- -ck - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.31
On Thu, 2007-03-22 at 10:34 +0100, Mike Galbraith wrote: > Erk! bzzt. singletasking brain :) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.31
On Thu, 2007-03-22 at 10:18 +0100, Ingo Molnar wrote: > * Mike Galbraith <[EMAIL PROTECTED]> wrote: > > > Actually, the numbers are an interesting curiosity point, but not as > > interesting as the fact that the deadline mechanism isn't kicking in. > > it's not just the scheduling accounting being off, RSDL also seems to be > accessing stale data here: > > > >From pull_task(): > > /* > > * If this task has already been running on src_rq this priority > > * cycle, make the new runqueue think it has been on its cycle > > */ > > if (p->rotation == src_rq->prio_rotation) > > p->rotation = this_rq->prio_rotation; > > > > The intent here is clearly that this task continue on the new cpu as > > if nothing has happened. However, when the task was dequeued, > > p->array was left as it was, points to the last place it was queued. > > Stale data. > > it might point to a hot-unplugged CPU's runqueue as well. Which might > work accidentally, but we want this fixed nevertheless. Erk! I mentioned to Con offline that I've seen RSDL bring up only one of my two (halves of a) penguins a couple three times out of a zillion boots. Maybe that's why? -Mike - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.31
* Mike Galbraith <[EMAIL PROTECTED]> wrote: > Actually, the numbers are an interesting curiosity point, but not as > interesting as the fact that the deadline mechanism isn't kicking in. it's not just the scheduling accounting being off, RSDL also seems to be accessing stale data here: > >From pull_task(): > /* >* If this task has already been running on src_rq this priority >* cycle, make the new runqueue think it has been on its cycle >*/ > if (p->rotation == src_rq->prio_rotation) > p->rotation = this_rq->prio_rotation; > > The intent here is clearly that this task continue on the new cpu as > if nothing has happened. However, when the task was dequeued, > p->array was left as it was, points to the last place it was queued. > Stale data. it might point to a hot-unplugged CPU's runqueue as well. Which might work accidentally, but we want this fixed nevertheless. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.31
On Thu, 2007-03-22 at 05:49 +0100, Willy Tarreau wrote: > Mike, if you need my old scheddos, I can resend it to you as well as to > any people working on the scheduler and asking for it. Although trivial, > I'm a bit reluctant to publish it to the whole world because I suspect > that distros based on older kernels are still vulnerable and the fixes > may not be easy. Anyway, it has absolutely no effect on non-interactive > schedulers. Sure. I'm really irked that I lost most of my collection of posted exploits. I prefer to test with the same widget the poster tested with. (in this particular case, it's not _very_ important which one i test with, i'm exploring my RSDL troubles with sleepers in general... duration matters though, need to be fairly short/short burn.) -Mike - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.31
This is a rather long message, and isn't directed at anyone in particular, it's for others who may be digging into their own problems with RSDL, and for others (if any other than Con exist) who understand RSDL well enough to tell me if I'm missing something. Anyone who's not interested in RSDL's gizzard hit 'D' now. On Wed, 2007-03-21 at 17:02 +0100, Peter Zijlstra wrote: > On Wed, 2007-03-21 at 15:57 +0100, Mike Galbraith wrote: > > > 'f' is a progglet which sleeps a bit and burns a bit, duration depending > > on argument given. 'sh' is a shell 100% hog. In this scenario, the > > argument was set such that 'f' used right at 50% cpu. All are started > > at the same time, and I froze top when the first 'f' reached 1:00. > > May one enquire how much CPU the mythical 'f' uses when ran alone? Just > to get a gauge for the numbers? Actually, the numbers are an interesting curiosity point, but not as interesting as the fact that the deadline mechanism isn't kicking in. From task_running_tick(): /* * Accounting is performed by both the task and the runqueue. This * allows frequently sleeping tasks to get their proper quota of * cpu as the runqueue will have their quota still available at * the appropriate priority level. It also means frequently waking * tasks that might miss the scheduler_tick() will get forced down * priority regardless. */ if (!--p->time_slice) task_expired_entitlement(rq, p); /* * We only employ the deadline mechanism if we run over the quota. * It allows aliasing problems around the scheduler_tick to be * less harmful. */ if (!rt_task(p) && --rq_quota(rq, rq->prio_level) < 0) { if (unlikely(p->first_time_slice)) p->first_time_slice = 0; rotate_runqueue_priority(rq); set_tsk_need_resched(p); } The reason for ticking both runqueue and task is that you can't sample a say 100KHz information stream at 1KHz and reproduce that information accurately.
IOW, task time slices "blur" at high switch frequency, you can't always hit tasks, so you hit what you _can_ hit every sample, the runqueue, to minimize the theoretical effects of time slice theft. (I've instrumented this before, and caught fast movers stealing 10s of milliseconds in extreme cases.) Generally speaking, statistics even things out very much, the fast mover eventually gets hit, and pays a full tick for his sub-tick dip in the pool, so in practice it's not a great big hairy deal. If you can accept that tasks can and do dodge the tick, an imbalance between runqueue quota and task quota must occur. It isn't happening here, and the reason appears to be a bean-counting error: tasks migrate but their quotas don't follow. The first time a task is queued at any priority, quota is allocated, task goes to sleep, quota on departed runqueue stays behind, task awakens on a different runqueue, allocate more quota, repeat. For migration, there's a twist: if you pull an expired task, expired tasks don't have a quota yet, so they shouldn't screw up bean counting. From pull_task(): /* * If this task has already been running on src_rq this priority * cycle, make the new runqueue think it has been on its cycle */ if (p->rotation == src_rq->prio_rotation) p->rotation = this_rq->prio_rotation; The intent here is clearly that this task continue on the new cpu as if nothing has happened. However, when the task was dequeued, p->array was left as it was; it points to the last place it was queued. Stale data.
From recalc_task_prio(), which is called by enqueue_task(): static void recalc_task_prio(struct task_struct *p, struct rq *rq) { struct prio_array *array = rq->active; int queue_prio, search_prio; if (p->rotation == rq->prio_rotation) { if (p->array == array) { if (p->time_slice && rq_quota(rq, p->prio)) return; } else if (p->array == rq->expired) { queue_expired(p, rq); return; } else task_new_array(p, rq); } else task_new_array(p, rq); search_prio = p->static_prio; p->rotation was set to this runqueue's prio_rotation, but p->array is stale, still points to the old cpu's runqueue, so... static inline void task_new_array(struct task_struct *p, struct rq *rq) { bitmap_zero(p->bitmap, PRIO_RANGE); p->rotation = rq->prio_rotation; } p->bitmap is the history of all priorities where this task has been allocated a quota. Here, that history is erased, so the task can't continue its staircase walk. It is instead given a new runqueue quota and time_slice (didn't it just gain ticks?). Now, what if a cross-cpu wakeup or migrating task _didn't_ have a stale array
Re: RSDL v0.31
On Fri, 2007-03-23 at 09:50 +1100, Con Kolivas wrote: > Now to figure out some meaningful cheap way of improving this accounting. The accounting is easy iff tick resolution is good enough, the deadline mechanism is harder. I did the "quota follows task" thing, but nothing good happens. That just ensured that the deadline mechanism kicks in constantly because tick theft is a fact of tick-based life. A reasonable fudge factor would help, but... I see problems with trying to implement the deadline mechanism. As implemented, it can't identify who is doing the stealing (which happens constantly, even if userland is a 100% hog) because of tick resolution accounting. If you can't identify the culprit, you can't enforce the quota, and quotas which are not enforced are, strictly speaking, not quotas. At tick time, you can only close the barn door after the cow has been stolen, and the thief can theoretically visit your barn an infinite number of times while you aren't watching the door. (don't blink scenarios, and tick is backward-assward blink) You can count nanoseconds in schedule, and store the actual usage, but then you still have the problem of inaccuracies in sched_clock() from cross-cpu wakeup and migration. Cross-cpu wakeups happen quite a lot. If sched_clock() _were_ absolutely accurate, you wouldn't need the runqueue deadline mechanism, because at slice tick time you can see everything you will ever see without moving enforcement directly into the most critical of paths. IMHO, unless it can be demonstrated that timeslice theft is a problem with a real-life scenario, you'd be better off dropping the queue ticking. Time slices are a deadline mechanism, and in practice the god of randomness ensures that even fast movers do get caught often enough to make ticking tasks sufficient. (that was a very long-winded reply to one sentence because I spent a lot of time looking into this very subject and came to the conclusion that you can't get there from here.
fwiw, ymmv and all that of course;) > Thanks again! You're welcome. -Mike - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.31
On Friday 23 March 2007 15:39, Mike Galbraith wrote: > On Fri, 2007-03-23 at 09:50 +1100, Con Kolivas wrote: > > Now to figure out some meaningful cheap way of improving this accounting. > The accounting is easy iff tick resolution is good enough, the deadline mechanism is harder. > I did the "quota follows task" thing, but nothing good happens. That just ensured that the deadline mechanism kicks in constantly because tick theft is a fact of tick-based life. A reasonable fudge factor would help, but... > I see problems with trying to implement the deadline mechanism. As implemented, it can't identify who is doing the stealing (which happens constantly, even if userland is a 100% hog) because of tick resolution accounting. If you can't identify the culprit, you can't enforce the quota, and quotas which are not enforced are, strictly speaking, not quotas. At tick time, you can only close the barn door after the cow has been stolen, and the thief can theoretically visit your barn an infinite number of times while you aren't watching the door. (don't blink scenarios, and tick is backward-assward blink) > You can count nanoseconds in schedule, and store the actual usage, but then you still have the problem of inaccuracies in sched_clock() from cross-cpu wakeup and migration. Cross-cpu wakeups happen quite a lot. If sched_clock() _were_ absolutely accurate, you wouldn't need the runqueue deadline mechanism, because at slice tick time you can see everything you will ever see without moving enforcement directly into the most critical of paths. > IMHO, unless it can be demonstrated that timeslice theft is a problem with a real-life scenario, you'd be better off dropping the queue ticking. Time slices are a deadline mechanism, and in practice the god of randomness ensures that even fast movers do get caught often enough to make ticking tasks sufficient.
> (that was a very long-winded reply to one sentence because I spent a lot of time looking into this very subject and came to the conclusion that you can't get there from here. > fwiw, ymmv and all that of course;) > > Thanks again! > You're welcome. The deadline mechanism is easy to hit and works. Try printk'ing it. There is some leeway to take tick accounting into the equation, and I don't believe nanosecond resolution is required at all for this (how much leeway would you give then ;)). Eventually there is nothing to stop us using highres timers (blessed if they work as planned everywhere eventually) to do the events and do away with scheduler_tick entirely. For now ticks work fine; a reasonable estimate for smp migration will suffice (patch forthcoming). -- -ck - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.31
On Fri, 2007-03-23 at 16:59 +1100, Con Kolivas wrote: > The deadline mechanism is easy to hit and works. Try printk'ing it. Hm. I did (.30), and it didn't in an hour's time doing this and that. After I did the "take your quota with you" thing, it did kick in. Lots. -Mike - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.31
On Wed, Mar 21, 2007 at 06:07:33PM +0100, Mike Galbraith wrote: > On Wed, 2007-03-21 at 16:11 +0100, Paolo Ornati wrote: > > On Wed, 21 Mar 2007 15:57:44 +0100 > > Mike Galbraith <[EMAIL PROTECTED]> wrote: > > > > > I was more than a bit surprised that mainline did this well, considering > > > that the proggy was one someone posted long time ago to demonstrate > > > starvation issues with the interactivity estimator. (source not > > > available unfortunately, was apparently still on my old PIII box along > > > with the one Willy posted when I installed opensuse 10.2 on it. damn. > > > trivial thing though) > > > > This one? :) > > No, but that one went to bit heaven too ;-) Mike, if you need my old scheddos, I can resend it to you as well as to any people working on the scheduler and asking for it. Although trivial, I'm a bit reluctant to publish it to the whole world because I suspect that distros based on older kernels are still vulnerable and the fixes may not be easy. Anyway, it has absolutely no effect on non-interactive schedulers. Regards, Willy - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.31
Al Boldi wrote: > Artur Skawina wrote: >> Al Boldi wrote: >>> - p->quota = rr_quota(p); >>> + /* >>> +* boost factor hardcoded to 5; adjust to your liking >>> +* higher means more likely to DoS >>> +*/ >>> + p->quota = rr_quota(p) + (((now - p->timestamp) >> 20) * 5); >> mouse cursor stalls lasting almost 1s. i/o bound tasks starving X? >> After reverting the patch everything is smooth again. > > This patch wasn't really meant for production, as any sleeping background > proc turned cpu-hog may DoS the system. > > If you like to play with this, then you probably want to at least reset the > quota in its expiration. well, the problem is that i can't reproduce the problem :) I tried the patch because i suspected it could introduce regressions, and it did. Maybe with some tuning a reasonable compromise could be found, but first we need to know what to tune for... Does anybody have a simple reproducible way to show the scheduling regressions of RSDL vs mainline? ie one that does not involve (or at least is independent of) specific X drivers, binary apps etc. Some reports mentioned MP, is UP less susceptible? I've now tried a -j2 kernel compilation on UP and in the not niced case X interactivity suffers, which i guess is to be expected when you have ~5 processes competing for one cpu (2*(cc+as)+X). "nice -5" helps a bit, but does not eliminate the effect completely. Obviously the right solution is to nice the makes, but i think the scheduler could do better, at least in the case of almost idle X (once you start moving windows etc it becomes a cpuhog just like the compiler). I'll look into this, maybe there's a way to prioritize often-sleeping tasks which can not be abused. Another thing is the nice levels; right now "nice -10" means ~35% and "nice -19" gives ~5% cpu; that's probably 2..5 times too much. 
artur - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.31
On Wed, 2007-03-21 at 16:11 +0100, Paolo Ornati wrote: > On Wed, 21 Mar 2007 15:57:44 +0100 > Mike Galbraith <[EMAIL PROTECTED]> wrote: > > > I was more than a bit surprised that mainline did this well, considering > > that the proggy was one someone posted long time ago to demonstrate > > starvation issues with the interactivity estimator. (source not > > available unfortunately, was apparently still on my old PIII box along > > with the one Willy posted when I installed opensuse 10.2 on it. damn. > > trivial thing though) > > This one? :) No, but that one went to bit heaven too ;-) -Mike - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.31
On Wed, 2007-03-21 at 17:02 +0100, Peter Zijlstra wrote: > On Wed, 2007-03-21 at 15:57 +0100, Mike Galbraith wrote: > > > 'f' is a progglet which sleeps a bit and burns a bit, duration depending > > on argument given. 'sh' is a shell 100% hog. In this scenario, the > > argument was set such that 'f' used right at 50% cpu. All are started > > at the same time, and I froze top when the first 'f' reached 1:00. > > May one enquire how much CPU the mythical 'f' uses when ran alone? Just > to get a gauge for the numbers? Right at 50% -Mike (mythical? i can send you the binary if you want) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.31
On Wed, 2007-03-21 at 15:57 +0100, Mike Galbraith wrote: > 'f' is a progglet which sleeps a bit and burns a bit, duration depending > on argument given. 'sh' is a shell 100% hog. In this scenario, the > argument was set such that 'f' used right at 50% cpu. All are started > at the same time, and I froze top when the first 'f' reached 1:00. May one enquire how much CPU the mythical 'f' uses when ran alone? Just to get a gauge for the numbers? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.31
On Tue, 2007-03-20 at 09:03 +0100, Mike Galbraith wrote: > Moving right along to the bugs part, I hope others are looking as well, > and not only talking. > > One area that looks pretty fishy to me is cross-cpu wakeups and task > migration. p->rotation appears to lose all meaning when you cross the > cpu boundary, and try_to_wake_up() is using that information in the > cross-cpu case. In pull_task() OTOH, it checks to see if the task ran > on the remote cpu (at all, hmm), and if so tags the task accordingly. Doing the same in try_to_wake_up() delivered a counterintuitive result. I expected sleeping tasks to suffer a bit, because when a task wakes up on a different cpu, the chance of it being in the same rotation is practically nil, so it would be issued a new quota when it hit recalc_task_prio() and begin a new walk down the stairs. In the case where it is told that the awakening task is running in the same rotation (as is done in pull_task, and with the patchlet below), since p->array isn't NULLed any more when the task is dequeued, there would be an array (last it was queued in), there's going to be time_slice (I see no way a 0 time_slice can happen, and nothing good would happen in task_running_tick() if it could), and since per instrumentation nobody is ever overrunning runqueue quota, it should just continue to march down the stairs, and receive less bandwidth than the full restart. What happened is below. 'f' is a progglet which sleeps a bit and burns a bit, duration depending on argument given. 'sh' is a shell 100% hog. In this scenario, the argument was set such that 'f' used right at 50% cpu. All are started at the same time, and I froze top when the first 'f' reached 1:00. 
virgin 2.6.21-rc3-rsdl-smp

top - 13:52:50 up 7 min, 12 users, load average: 3.45, 2.89, 1.51
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ P COMMAND
 6560 root      31   0  2892 1236 1032 R   82  0.1  1:50.24 1 sh
 6558 root      28   0  1428  276  228 S   42  0.0  1:00.09 1 f
 6557 root      30   0  1424  280  228 R   35  0.0  1:00.25 0 f
 6559 root      39   0  1424  276  228 R   33  0.0  0:58.36 0 f
 6420 root      23   0  2372 1068  764 R    3  0.1  0:04.68 0 top

patched as below 2.6.21-rc3-rsdl-smp

top - 14:09:28 up 6 min, 12 users, load average: 3.52, 2.70, 1.29
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ P COMMAND
 6517 root      38   0  2892 1240 1032 R   59  0.1  1:31.12 1 sh
 6515 root      24   0  1424  280  228 R   51  0.0  1:00.10 0 f
 6514 root      37   0  1428  280  228 R   42  0.0  1:00.58 1 f
 6516 root      24   0  1428  280  228 R   41  0.0  1:00.01 0 f
 6430 root      23   0  2372 1056  764 R    2  0.1  0:05.53 0 top

--- kernel/sched.c.org	2007-03-15 07:04:51.0 +0100
+++ kernel/sched.c	2007-03-21 13:55:22.0 +0100
@@ -1416,7 +1416,8 @@ static int try_to_wake_up(struct task_st
 	if (cpu == this_cpu) {
 		schedstat_inc(rq, ttwu_local);
 		goto out_set_cpu;
-	}
+	} else if (p->rotation == cpu_rq(cpu)->prio_rotation)
+		p->rotation = cpu_rq(this_cpu)->prio_rotation;
 	for_each_domain(this_cpu, sd) {
 		if (cpu_isset(cpu, sd->span)) {

Same test with virgin 2.6.20.3-smp for reference.

top - 14:46:10 up 18 min, 12 users, load average: 3.70, 1.89, 1.07
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ P COMMAND
 6529 root      15   0  1424  280  228 S   54  0.0  1:00.26 1 f
 6530 root      15   0  1428  280  228 R   50  0.0  0:59.03 0 f
 6531 root      15   0  1424  280  228 R   48  0.0  0:59.29 1 f
 6532 root      25   0  2892 1240 1032 R   40  0.1  1:00.54 0 sh
 6457 root      15   0  2380 1056  764 R    1  0.1  0:02.34 1 top

I was more than a bit surprised that mainline did this well, considering that the proggy was one someone posted long time ago to demonstrate starvation issues with the interactivity estimator. (source not available unfortunately, was apparently still on my old PIII box along with the one Willy posted when I installed opensuse 10.2 on it. damn. 
trivial thing though) -Mike - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: RSDL v0.31
> there were multiple attempts with renicing X under the vanilla > scheduler, and they were utter failures most of the time. _More_ people > complained about interactivity issues _after_ X has been reniced to -5 > (or -10) than people complained about "nice 0" interactivity issues to > begin with. Unfortunately, nicing X is not going to work. It causes X to pre-empt any local process that tries to batch requests to it, defeating the batching. What you really want is X to get scheduled after the client pauses in sending data to it or has sent more than a certain amount. It seems kind of crazy to put such logic in a scheduler. Perhaps when one process unblocks another, you put that other process at the head of the run queue but don't pre-empt the currently running process. That way, the process can continue to batch requests, but X's maximum latency will be the quantum of the client program. > The vanilla scheduler's auto-nice feature rewards _behavior_, so it gets > X right most of the time. The fundamental issue is that sometimes X is > very interactive - we boost it then, there's lots of scheduling but nice > low latencies. Sometimes it's a hog - we penalize it then and things > start to batch up more and we get out of the overload situation faster. > That's the case even if all you care about is desktop performance. > > no doubt it's hard to get the auto-nice thing right, but one thing is > clear: currently RSDL causes problems in areas that worked well in the > vanilla scheduler for a long time, so RSDL needs to improve. RSDL should > not lure itself into the false promise of 'just renice X statically'. It > wont work. (You might want to rewrite X's request scheduling - but if so > then i'd like to see that being done _first_, because i just dont trust > such 10-mile-distance problem analysis.) I am hopeful that there exists a heuristic that both improves this problem and is also inherently fair. 
If that's true, then such a heuristic can be added to RSDL without damaging its properties and without requiring any special settings. Perhaps longer-term latency benefits to processes that have yielded in the past? I think there are certain circumstances, however, where it is inherently reasonable to insist that 'nice' be used. If you want a CPU-starved task to get more than 1/X of the CPU, where X is the number of CPU-starved tasks, you should have to ask for that. If you want one CPU-starved task to get better latency than other CPU-starved tasks, you should have to ask for that. Fundamentally, the scheduler cannot do it by itself. You can create cases where the load is precisely identical and one person wants X and another person wants Y. The scheduler cannot know what's important to you. DS - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [ck] Re: RSDL v0.31
On Mon, 2007-03-19 at 16:47 -0400, Bill Davidsen wrote: > Kasper Sandberg wrote: > > On Sun, 2007-03-18 at 08:38 +0100, Mike Galbraith wrote: > >> On Sun, 2007-03-18 at 08:22 +0100, Radoslaw Szkodzinski wrote: > >> > >>> I'd reckon KDE regresses because of kioslaves waiting on a pipe > >>> (communication with the app they're doing IO for) and then expiring. > >>> That's why splitting IO from an app isn't exactly smart. It should at > >>> least be run in another thread. > >> Hm. Sounds rather a lot like the... > >> X sucks, fix X and RSDL will rock your world. RSDL is perfect. > >> ...that I've been getting. > >> > > not really, only X sucks. KDE works at least as good with rsdl as > > vanilla. i dont know who originally said kde works worse, wasnt it just > > someone that thought? > > > It was probably me, and I had the opinion that KDE is not as smooth as > GNOME with RSDL. I haven't had time to measure, but using for daily > stuff for about an hour each way hasn't changed my opinion. Every once > in a while KDE will KLUNK to a halt for 200-300ms doing mundane stuff > like redrawing a page, scrolling, etc. I don't see it with GNOME. umm, could you try to find something that always does it, so i can try to reproduce? cause i dont really hit any such thing, and i only have a 2ghz amd64 > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.31
On Tue, 2007-03-20 at 08:16 -0700, Ray Lee wrote: > On 3/20/07, Mark Lord <[EMAIL PROTECTED]> wrote: > > I've dropped it from my machine -- interactive response is much > > more important for my primary machine right now. > > Help out with a data point? Are you running KDE as well? If you are, > then it looks like the common denominator that RSDL is handling poorly > is client-server communication. (KDE's KIO slaves in this case, but X > in general.) im not experiencing any problems with KDE. if anything ktorrent seems to be going a teeny tiny bit smoother, though its nothing i can back up with data. now i havent tested ALL kioslaves yet, but stuff like sftp, fish, tar and such works just as good. > > If so, one would hope that a variation on Linus's 2.5.63 pipe wakeup > pass-the-interactivity idea could work here. The problem with that > original patch, IIRC, was that a couple of tasks could bounce their > interactivity bonus back and forth and thereby starve others. Which > might be expected given there was no 'decaying' of the interactivity > bonus, which means you can make a feedback loop. > > Anyway, looks like processes that do A -> B -> A communication chains > are getting penalized under RSDL. In which case, perhaps I can make a > test case that exhibits the problem without having to have the same > graphics card or desktop as you. An easy-to-reproduce testcase would be good. > > Ray > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.31
Another data point: I'm getting stalls in mplayer. I'm assuming the stalls occur when procmail runs messages through spamprobe, as the system is otherwise idle. The stalls continue to occur (and I'm not sure that they aren't worse) when X and/or mplayer are reniced to negative nice levels. This is on a dual core amd64 system running 2.6.20.3 with rsdl 0.31. Admittedly I'm also running the nvidia binary driver with X. -- The universe hates you, but don't worry - it's nothing personal. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.31
* Xavier Bestel <[EMAIL PROTECTED]> wrote: > On Tue, 2007-03-20 at 07:11 +0100, Willy Tarreau wrote: > > I don't agree with starting to renice X to get something usable > > [...] Why not compensate for X design by prioritizing it a bit ? there were multiple attempts with renicing X under the vanilla scheduler, and they were utter failures most of the time. _More_ people complained about interactivity issues _after_ X has been reniced to -5 (or -10) than people complained about "nice 0" interactivity issues to begin with. The vanilla scheduler's auto-nice feature rewards _behavior_, so it gets X right most of the time. The fundamental issue is that sometimes X is very interactive - we boost it then, there's lots of scheduling but nice low latencies. Sometimes it's a hog - we penalize it then and things start to batch up more and we get out of the overload situation faster. That's the case even if all you care about is desktop performance. no doubt it's hard to get the auto-nice thing right, but one thing is clear: currently RSDL causes problems in areas that worked well in the vanilla scheduler for a long time, so RSDL needs to improve. RSDL should not lure itself into the false promise of 'just renice X statically'. It wont work. (You might want to rewrite X's request scheduling - but if so then i'd like to see that being done _first_, because i just dont trust such 10-mile-distance problem analysis.) Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.31
* Xavier Bestel [EMAIL PROTECTED] wrote: On Tue, 2007-03-20 at 07:11 +0100, Willy Tarreau wrote: I don't agree with starting to renice X to get something usable [...] Why not compensate for X design by prioritizing it a bit ? there were multiple attempts with renicing X under the vanilla scheduler, and they were utter failures most of the time. _More_ people complained about interactivity issues _after_ X has been reniced to -5 (or -10) than people complained about nice 0 interactivity issues to begin with. The vanilla scheduler's auto-nice feature rewards _behavior_, so it gets X right most of the time. The fundamental issue is that sometimes X is very interactive - we boost it then, there's lots of scheduling but nice low latencies. Sometimes it's a hog - we penalize it then and things start to batch up more and we get out of the overload situation faster. That's the case even if all you care about is desktop performance. no doubt it's hard to get the auto-nice thing right, but one thing is clear: currently RSDL causes problems in areas that worked well in the vanilla scheduler for a long time, so RSDL needs to improve. RSDL should not lure itself into the false promise of 'just renice X statically'. It wont work. (You might want to rewrite X's request scheduling - but if so then i'd like to see that being done _first_, because i just dont trust such 10-mile-distance problem analysis.) Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.31
Another data point: I'm getting stalls in mplayer. I'm assuming the stalls occur when procmail runs messages through spamprobe, as the system is otherwise idle. The stalls continue to occur (and I'm not sure that they aren't worse) when X and/or mplayer are reniced to negative nice levels. This is on a dual core amd64 system running 2.6.20.3 with rsdl 0.31. Admittedly I'm also running the nvidia binary driver with X. -- The universe hates you, but don't worry - it's nothing personal. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.31
On Tue, 2007-03-20 at 08:16 -0700, Ray Lee wrote: On 3/20/07, Mark Lord [EMAIL PROTECTED] wrote: I've droppped it from my machine -- interactive response is much more important for my primary machine right now. Help out with a data point? Are you running KDE as well? If you are, then it looks like the common denominator that RSDL is handling poorly is client-server communication. (KDE's KIO slaves in this case, but X in general.) im not experiencing any problems with KDE. if anything ktorrent seems to be going a teeny tiny bit smoother, though its nothing i can back up with data. now i havent tested ALL kioslaves yet, but stuff like sftp, fish, tar and such works just as good. If so, one would hope that a variation on Linus's 2.5.63 pipe wakeup pass-the-interactivity idea could work here. The problem with that original patch, IIRC, was that a couple of tasks could bounce their interactivity bonus back and forth and thereby starve others. Which might be expected given there was no 'decaying' of the interactivity bonus, which means you can make a feedback loop. Anyway, looks like processes that do A - B - A communication chains are getting penalized under RSDL. In which case, perhaps I can make a test case that exhibits the problem without having to have the same graphics card or desktop as you. An easy-to-reproduce testcase would be good. Ray - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [ck] Re: RSDL v0.31
On Mon, 2007-03-19 at 16:47 -0400, Bill Davidsen wrote: Kasper Sandberg wrote: On Sun, 2007-03-18 at 08:38 +0100, Mike Galbraith wrote: On Sun, 2007-03-18 at 08:22 +0100, Radoslaw Szkodzinski wrote: I'd recon KDE regresses because of kioslaves waiting on a pipe (communication with the app they're doing IO for) and then expiring. That's why splitting IO from an app isn't exactly smart. It should at least be ran in an another thread. Hm. Sounds rather a lot like the... X sucks, fix X and RSDL will rock your world. RSDL is perfect. ...that I've been getting. not really, only X sucks. KDE works atleast as good with rsdl as vanilla. i dont know how originally said kde works worse, wasnt it just someone that thought? It was probably me, and I had the opinion that KDE is not as smooth as GNOME with RSDL. I haven't had time to measure, but using for daily stuff for about an hour each way hasn't changed my opinion. Every once in a while KDE will KLUNK to a halt for 200-300ms doing mundane stuff like redrawing a page, scrolling, etc. I don't see it with GNOME. umm, could you try to find something that always does it, so i can try to reproduce? cause i dont really hit any such thing, and i only have a 2ghz amd64 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: RSDL v0.31
there were multiple attempts with renicing X under the vanilla scheduler, and they were utter failures most of the time. _More_ people complained about interactivity issues _after_ X has been reniced to -5 (or -10) than people complained about nice 0 interactivity issues to begin with. Unfortunately, nicing X is not going to work. It causes X to pre-empt any local process that tries to batch requests to it, defeating the batching. What you really want is X to get scheduled after the client pauses in sending data to it or has sent more than a certain amount. It seems kind of crazy to put such login in a scheduler. Perhaps when one process unblocks another, you put that other process at the head of the run queue but don't pre-empt the currently running process. That way, the process can continue to batch requests, but X's maximum latency delay will be the quantum of the client program. The vanilla scheduler's auto-nice feature rewards _behavior_, so it gets X right most of the time. The fundamental issue is that sometimes X is very interactive - we boost it then, there's lots of scheduling but nice low latencies. Sometimes it's a hog - we penalize it then and things start to batch up more and we get out of the overload situation faster. That's the case even if all you care about is desktop performance. no doubt it's hard to get the auto-nice thing right, but one thing is clear: currently RSDL causes problems in areas that worked well in the vanilla scheduler for a long time, so RSDL needs to improve. RSDL should not lure itself into the false promise of 'just renice X statically'. It wont work. (You might want to rewrite X's request scheduling - but if so then i'd like to see that being done _first_, because i just dont trust such 10-mile-distance problem analysis.) I am hopeful that there exists a heuristic that both improves this problem and is also inherently fair. 
If that's true, then such a heuristic can be added to RSDL without damaging its properties and without requiring any special settings. Perhaps longer-term latency benefits to processes that have yielded in the past? I think there are certain circumstances, however, where it is inherently reasonable to insist that 'nice' be used. If you want a CPU-starved task to get more than 1/X of the CPU, where X is the number of CPU-starved tasks, you should have to ask for that. If you want one CPU-starved task to get better latency than other CPU-starved tasks, you should have to ask for that. Fundamentally, the scheduler cannot do it by itself. You can create cases where the load is precisely identical and one person wants X and another person wants Y. The scheduler cannot know what's important to you. DS - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.31
On Tue, 2007-03-20 at 09:03 +0100, Mike Galbraith wrote: Moving right along to the bugs part, I hope others are looking as well, and not only talking. One area that looks pretty fishy to me is cross-cpu wakeups and task migration. p-rotation appears to lose all meaning when you cross the cpu boundary, and try_to_wake_up()is using that information in the cross-cpu case. In pull_task() OTOH, it checks to see if the task ran on the remote cpu (at all, hmm), and if so tags the task accordingly. Doing the same in try_to_wake_up()delivered a counter intuitive result. I expected sleeping tasks to suffer a bit, because when a task wakes up on a different cpu, the chance of it being in the same rotation is practically nil, so it would be issued a new quota when it hit recalc_task_prio() and begin a new walk down the stairs. In the case where it's is told that the awakening task is running in the same rotation (as is done in pull_task, and with the patchlet below), since p-array isn't NULLed any more when the task is dequeued, there would be an array (last it was queued in), there's going to be time_slice (see no way 0 time_slice can happen, and nothing good would happen in task_running_tick() if it could), and since per instrumentation nobody is ever overrunning runqueue quota, it should just continue to march down the stairs, and receive less bandwidth than the full restart. What happened is below. 'f' is a progglet which sleeps a bit and burns a bit, duration depending on argument given. 'sh' is a shell 100% hog. In this scenario, the argument was set such that 'f' used right at 50% cpu. All are started at the same time, and I froze top when the first 'f' reached 1:00. 
virgin 2.6.21-rc3-rsdl-smp top - 13:52:50 up 7 min, 12 users, load average: 3.45, 2.89, 1.51 PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ P COMMAND 6560 root 31 0 2892 1236 1032 R 82 0.1 1:50.24 1 sh 6558 root 28 0 1428 276 228 S 42 0.0 1:00.09 1 f 6557 root 30 0 1424 280 228 R 35 0.0 1:00.25 0 f 6559 root 39 0 1424 276 228 R 33 0.0 0:58.36 0 f 6420 root 23 0 2372 1068 764 R3 0.1 0:04.68 0 top patched as below 2.6.21-rc3-rsdl-smp top - 14:09:28 up 6 min, 12 users, load average: 3.52, 2.70, 1.29 PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ P COMMAND 6517 root 38 0 2892 1240 1032 R 59 0.1 1:31.12 1 sh 6515 root 24 0 1424 280 228 R 51 0.0 1:00.10 0 f 6514 root 37 0 1428 280 228 R 42 0.0 1:00.58 1 f 6516 root 24 0 1428 280 228 R 41 0.0 1:00.01 0 f 6430 root 23 0 2372 1056 764 R2 0.1 0:05.53 0 top --- kernel/sched.c.org 2007-03-15 07:04:51.0 +0100 +++ kernel/sched.c 2007-03-21 13:55:22.0 +0100 @@ -1416,7 +1416,8 @@ static int try_to_wake_up(struct task_st if (cpu == this_cpu) { schedstat_inc(rq, ttwu_local); goto out_set_cpu; - } + } else if (p-rotation == cpu_rq(cpu)-prio_rotation) + p-rotation = cpu_rq(this_cpu)-prio_rotation; for_each_domain(this_cpu, sd) { if (cpu_isset(cpu, sd-span)) { Same test with virgin 2.6.20.3-smp for reference. top - 14:46:10 up 18 min, 12 users, load average: 3.70, 1.89, 1.07 PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ P COMMAND 6529 root 15 0 1424 280 228 S 54 0.0 1:00.26 1 f 6530 root 15 0 1428 280 228 R 50 0.0 0:59.03 0 f 6531 root 15 0 1424 280 228 R 48 0.0 0:59.29 1 f 6532 root 25 0 2892 1240 1032 R 40 0.1 1:00.54 0 sh 6457 root 15 0 2380 1056 764 R1 0.1 0:02.34 1 top I was more than a bit surprised that mainline did this well, considering that the proggy was one someone posted long time ago to demonstrate starvation issues with the interactivity estimator. (source not available unfortunately, was apparently still on my old PIII box along with the one Willy posted when I installed opensuse 10.2 on it. damn. 
trivial thing though) -Mike - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.31
On Wed, 2007-03-21 at 15:57 +0100, Mike Galbraith wrote: 'f' is a progglet which sleeps a bit and burns a bit, duration depending on argument given. 'sh' is a shell 100% hog. In this scenario, the argument was set such that 'f' used right at 50% cpu. All are started at the same time, and I froze top when the first 'f' reached 1:00. May one enquire how much CPU the mythical 'f' uses when ran alone? Just to get a gauge for the numbers? - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.31
On Wed, 2007-03-21 at 17:02 +0100, Peter Zijlstra wrote: On Wed, 2007-03-21 at 15:57 +0100, Mike Galbraith wrote: 'f' is a progglet which sleeps a bit and burns a bit, duration depending on argument given. 'sh' is a shell 100% hog. In this scenario, the argument was set such that 'f' used right at 50% cpu. All are started at the same time, and I froze top when the first 'f' reached 1:00. May one enquire how much CPU the mythical 'f' uses when ran alone? Just to get a gauge for the numbers? Right at 50% -Mike (mythical? i can send you the binary if you want) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.31
On Wed, 2007-03-21 at 16:11 +0100, Paolo Ornati wrote:
> On Wed, 21 Mar 2007 15:57:44 +0100 Mike Galbraith [EMAIL PROTECTED] wrote:
> > I was more than a bit surprised that mainline did this well, considering
> > that the proggy was one someone posted a long time ago to demonstrate
> > starvation issues with the interactivity estimator. (source not available
> > unfortunately, was apparently still on my old PIII box along with the one
> > Willy posted when I installed opensuse 10.2 on it. damn. trivial thing
> > though)
>
> This one? :)

No, but that one went to bit heaven too ;-)

-Mike
Re: RSDL v0.31
Al Boldi wrote:
> Artur Skawina wrote:
> > Al Boldi wrote:
> > > -	p->quota = rr_quota(p);
> > > +	/*
> > > +	 * boost factor hardcoded to 5; adjust to your liking
> > > +	 * higher means more likely to DoS
> > > +	 */
> > > +	p->quota = rr_quota(p) + (((now - p->timestamp) >> 20) * 5);
> >
> > mouse cursor stalls lasting almost 1s. i/o bound tasks starving X?
> > After reverting the patch everything is smooth again.
>
> This patch wasn't really meant for production, as any sleeping background
> proc turned cpu-hog may DoS the system. If you like to play with this,
> then you probably want to at least reset the quota in its expiration.

well, the problem is that i can't reproduce the problem :) I tried the patch because i suspected it could introduce regressions, and it did. Maybe with some tuning a reasonable compromise could be found, but first we need to know what to tune for...

Does anybody have a simple reproducible way to show the scheduling regressions of RSDL vs mainline? ie one that does not involve (or at least is independent of) specific X drivers, binary apps etc. Some reports mentioned MP; is UP less susceptible?

I've now tried a -j2 kernel compilation on UP, and in the not-niced case X interactivity suffers, which i guess is to be expected when you have ~5 processes competing for one cpu (2*(cc+as)+X). nice -5 helps a bit, but does not eliminate the effect completely. Obviously the right solution is to nice the makes, but i think the scheduler could do better, at least in the case of almost-idle X (once you start moving windows etc it becomes a cpuhog just like the compiler). I'll look into this; maybe there's a way to prioritize often-sleeping tasks which can not be abused.

Another thing is the nice levels; right now nice -10 means ~35% and nice -19 gives ~5% cpu; that's probably 2..5 times too much.
artur
Re: RSDL v0.31
On Wed, Mar 21, 2007 at 06:07:33PM +0100, Mike Galbraith wrote:
> On Wed, 2007-03-21 at 16:11 +0100, Paolo Ornati wrote:
> > On Wed, 21 Mar 2007 15:57:44 +0100 Mike Galbraith [EMAIL PROTECTED] wrote:
> > > I was more than a bit surprised that mainline did this well,
> > > considering that the proggy was one someone posted a long time ago to
> > > demonstrate starvation issues with the interactivity estimator.
> > > (source not available unfortunately, was apparently still on my old
> > > PIII box along with the one Willy posted when I installed opensuse
> > > 10.2 on it. damn. trivial thing though)
> >
> > This one? :)
>
> No, but that one went to bit heaven too ;-)

Mike, if you need my old scheddos, I can resend it to you as well as to any people working on the scheduler and asking for it. Although trivial, I'm a bit reluctant to publish it to the whole world because I suspect that distros based on older kernels are still vulnerable and the fixes may not be easy. Anyway, it has absolutely no effect on non-interactive schedulers.

Regards,
Willy
Re: RSDL v0.31
Artur Skawina wrote:
> Al Boldi wrote:
> > --- sched.bak.c 2007-03-16 23:07:23.0 +0300
> > +++ sched.c 2007-03-19 23:49:40.0 +0300
> > @@ -938,7 +938,11 @@ static void activate_task(struct task_st
> > 		(now - p->timestamp) >> 20);
> > 	}
> >
> > -	p->quota = rr_quota(p);
> > +	/*
> > +	 * boost factor hardcoded to 5; adjust to your liking
> > +	 * higher means more likely to DoS
> > +	 */
> > +	p->quota = rr_quota(p) + (((now - p->timestamp) >> 20) * 5);
> > 	p->prio = effective_prio(p);
> > 	p->timestamp = now;
> > 	__activate_task(p, rq);
>
> i've tried this and it lasted only a few minutes -- i was seeing
> mouse cursor stalls lasting almost 1s. i/o bound tasks starving X?
> After reverting the patch everything is smooth again.

This patch wasn't really meant for production, as any sleeping background proc turned cpu-hog may DoS the system. If you like to play with this, then you probably want to at least reset the quota in its expiration.

Thanks!

--
Al
Re: RSDL v0.31
Al Boldi wrote:
> --- sched.bak.c 2007-03-16 23:07:23.0 +0300
> +++ sched.c 2007-03-19 23:49:40.0 +0300
> @@ -938,7 +938,11 @@ static void activate_task(struct task_st
> 		(now - p->timestamp) >> 20);
> 	}
>
> -	p->quota = rr_quota(p);
> +	/*
> +	 * boost factor hardcoded to 5; adjust to your liking
> +	 * higher means more likely to DoS
> +	 */
> +	p->quota = rr_quota(p) + (((now - p->timestamp) >> 20) * 5);
> 	p->prio = effective_prio(p);
> 	p->timestamp = now;
> 	__activate_task(p, rq);

i've tried this and it lasted only a few minutes -- i was seeing mouse cursor stalls lasting almost 1s. i/o bound tasks starving X? After reverting the patch everything is smooth again.

artur
Re: RSDL v0.31
Linus Torvalds wrote:
> I was very happy to see the "try this patch" email from Al Boldi - not
> because I think that patch per se was necessarily the right fix (I have no
> idea),

Well, it wasn't really meant as a fix, but rather to point out that interactivity boosting is possible with RSDL. It probably needs a lot more work, but just this one-liner gives an unbelievable ia boost.

> but simply because I think that's the kind of mindset we need to have.

Thanks.

> Not a lot of people really *like* the old scheduler, but it's been tweaked
> over the years to try to avoid some nasty behaviour. I'm really hoping
> that RSDL would be a lot better (and by all accounts it has the potential
> for that), but I think it's totally naïve to expect that it won't need
> some tweaking too.

Aside from ia boosting, I think fixed latencies per nice level may be desirable, when physically possible, to allow for more deterministic scheduling.

> So I'll happily still merge RSDL right after 2.6.21 (and it won't even be
> a config option - if we want to make it good, we need to make sure
> *everybody* tests it), but what I want to see is that "can do" spirit wrt
> tweaking for issues that come up.
>
> Because let's face it - nothing is ever perfect. Even a really nice
> conceptual idea always ends up hitting the "but in real life, things are
> ugly and complex, and we've depended on behaviour X in the past and can't
> change it, so we need some tweaking for problem Y".
>
> And everything is totally fixable - at least as long as people are willing
> to!

Agreed. Thanks!

--
Al
Re: RSDL v0.31
On Tue, 20 Mar 2007, Willy Tarreau wrote: > > Linus, you're unfair with Con. He initially was on this position, and lately > worked with Mike by proposing changes to try to improve his X responsiveness. I was not actually so much speaking about Con, as about a lot of the tone in general here. And yes, it's not been entirely black and white. I was very happy to see the "try this patch" email from Al Boldi - not because I think that patch per se was necessarily the right fix (I have no idea), but simply because I think that's the kind of mindset we need to have. Not a lot of people really *like* the old scheduler, but it's been tweaked over the years to try to avoid some nasty behaviour. I'm really hoping that RSDL would be a lot better (and by all accounts it has the potential for that), but I think it's totally naïve to expect that it won't need some tweaking too. So I'll happily still merge RSDL right after 2.6.21 (and it won't even be a config option - if we want to make it good, we need to make sure *everybody* tests it), but what I want to see is that "can do" spirit wrt tweaking for issues that come up. Because let's face it - nothing is ever perfect. Even a really nice conceptual idea always ends up hitting the "but in real life, things are ugly and complex, and we've depended on behaviour X in the past and can't change it, so we need some tweaking for problem Y". And everything is totally fixable - at least as long as people are willing to! Linus
Re: RSDL v0.31
Ray Lee wrote:
> On 3/20/07, Mark Lord <[EMAIL PROTECTED]> wrote:
> > I've dropped it from my machine -- interactive response is much more
> > important for my primary machine right now.
>
> Help out with a data point? Are you running KDE as well?

Yes, KDE.
Re: RSDL v0.31
On 3/20/07, Mark Lord <[EMAIL PROTECTED]> wrote:
> I've dropped it from my machine -- interactive response is much more
> important for my primary machine right now.

Help out with a data point? Are you running KDE as well? If you are, then it looks like the common denominator that RSDL is handling poorly is client-server communication. (KDE's KIO slaves in this case, but X in general.)

If so, one would hope that a variation on Linus's 2.5.63 pipe wakeup pass-the-interactivity idea could work here. The problem with that original patch, IIRC, was that a couple of tasks could bounce their interactivity bonus back and forth and thereby starve others. Which might be expected given there was no 'decaying' of the interactivity bonus, which means you can make a feedback loop.

Anyway, it looks like processes that do A -> B -> A communication chains are getting penalized under RSDL. In which case, perhaps I can make a test case that exhibits the problem without having to have the same graphics card or desktop as you.

Ray
Re: RSDL v0.31
Linus Torvalds wrote:
> Quite frankly, I was *planning* on merging RSDL very early after 2.6.21,
> but there is one thing that has turned me completely off the whole thing:
>
>  - the people involved seem to be totally unwilling to even admit there
>    might be a problem.

Not to mention that it seems to only be tested thus far by a very vocal and supportive core. It needs much wider exposure for much longer before risking it in mainline. It likely will get there, eventually, just not yet. I've dropped it from my machine -- interactive response is much more important for my primary machine right now.

I believe Ingo's much simpler hack produces as good/bad results as this RSDL thingie, and with one important extra: it can be switched on/off at runtime.

->forwarded message:

Subject: [patch] CFS scheduler: Completely Fair Scheduler
From: Ingo Molnar <[EMAIL PROTECTED]>

add the CONFIG_SCHED_FAIR option (default: off): this turns the Linux scheduler into a completely fair scheduler for SCHED_OTHER tasks: with perfect roundrobin scheduling, fair distribution of timeslices combined with no interactivity boosting and no heuristics. a /proc/sys/kernel/sched_fair option is also available to turn this behavior on/off.

if this option establishes itself amongst leading distributions then we could in the future remove the interactivity estimator altogether.
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 include/linux/sched.h  |  1 +
 kernel/Kconfig.preempt |  9 +++++++++
 kernel/sched.c         |  8 ++++++++
 kernel/sysctl.c        | 10 ++++++++++
 4 files changed, 28 insertions(+)

Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -119,6 +119,7 @@ extern unsigned long avenrun[];	/* Load
 		load += n*(FIXED_1-exp); \
 		load >>= FSHIFT;
 
+extern unsigned int sched_fair;
 extern unsigned long total_forks;
 extern int nr_threads;
 DECLARE_PER_CPU(unsigned long, process_counts);
Index: linux/kernel/Kconfig.preempt
===================================================================
--- linux.orig/kernel/Kconfig.preempt
+++ linux/kernel/Kconfig.preempt
@@ -63,3 +63,12 @@ config PREEMPT_BKL
 	  Say Y here if you are building a kernel for a desktop system.
 	  Say N if you are unsure.
 
+config SCHED_FAIR
+	bool "Completely Fair Scheduler"
+	help
+	  This option turns the Linux scheduler into a completely fair
+	  scheduler. User-space workloads will round-robin fairly, and
+	  they have to be prioritized using nice levels.
+
+	  Say N if you are unsure.
+
Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -4040,6 +4040,10 @@ static inline struct task_struct *find_p
 	return pid ? find_task_by_pid(pid) : current;
 }
 
+#ifdef CONFIG_SCHED_FAIR
+unsigned int sched_fair = 1;
+#endif
+
 /*
  * Actually do priority change: must hold rq lock.
  */
@@ -4055,6 +4059,10 @@ static void __setscheduler(struct task_s
 	 */
 	if (policy == SCHED_BATCH)
 		p->sleep_avg = 0;
+#ifdef CONFIG_SCHED_FAIR
+	if (policy == SCHED_NORMAL && sched_fair)
+		p->sleep_avg = 0;
+#endif
 	set_load_weight(p);
 }
Index: linux/kernel/sysctl.c
===================================================================
--- linux.orig/kernel/sysctl.c
+++ linux/kernel/sysctl.c
@@ -205,6 +205,16 @@ static ctl_table root_table[] = {
 };
 
 static ctl_table kern_table[] = {
+#ifdef CONFIG_SCHED_FAIR
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "sched_fair",
+		.data		= &sched_fair,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
+#endif
 	{
 		.ctl_name	= KERN_PANIC,
 		.procname	= "panic",
Re: RSDL v0.31
Xavier Bestel wrote:
> On Tue, 2007-03-20 at 07:11 +0100, Willy Tarreau wrote:
> > I don't agree with starting to renice X to get something usable
>
> X looks very special to me: it's a big userspace driver, the primary
> task handling user interaction on the desktop, and on some OS the part
> responsible for moving the mouse pointer and interacting with windows is
> even implemented as an interrupt handler, and that for sure provides for
> smooth user experience even on very low-end hardware. Why not compensate
> for X design by prioritizing it a bit ?
> If RSDL + reniced X makes for a better desktop than stock kernel + X, on
> all kind of workloads, it's good to know.

No, running X at a different priority than its clients is not really a good idea. If it isn't immediately obvious why, try something like this:

mkdir /tmp/tempdir
cd /tmp/tempdir
for i in `seq -w 1 1` ; do touch longfilenamexx$i ; done
nice --20 xterm &
xterm &
nice -20 xterm &

then do "time ls -l ." in each xterm. This is what i get on UP 2.6.20+RSDL.31 w/ X at nice 0:

-20: 0m0.244s user 0m0.156s system 0m3.113s elapsed 12.84% CPU
  0: 0m0.216s user 0m0.168s system 0m2.801s elapsed 13.70% CPU
 19: 0m0.188s user 0m0.196s system 0m3.268s elapsed 11.75% CPU

I just made this simple example up and it doesn't show the problem too well, but you can already see the ~10% performance drop. It's actually worse in practice, because for some apps the increased amount of rendering is clearly visible; text areas scroll line-by-line, content is incrementally redrawn several times etc. This happens because an X server running at a higher priority than a client will often get scheduled immediately after some x11 traffic arrives; when the process priorities are equal usually the client gets a chance to supply some more data. IOW by renicing the server you make X almost synchronous.
This isn't specific to RSDL -- it happens w/ any cpu scheduler; and while the effects of less extreme prio differences (ie -5 instead of -20 etc) may be less visible, i also doubt they will help much. A better approach to X interactivity might be allowing the server to use (part of) the client's timeslice, but it's not trivial -- you'd only want to do that when the client is waiting for a reply, and you almost never want to preempt the client just because the server received some data.

As to RSDL -- it seems to work great for desktop use and feels better than mainline. However top output under 100% load (eg kernel compilation) looks like below -- the %CPU error seems a bit high...

Tasks:  97 total,   6 running,  91 sleeping,   0 stopped,   0 zombie
Cpu(s): 81.7% us, 18.3% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si

  PID USER  PR NI VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
 7566 root  17  0 9196 4108 1188 R  3.0  0.8 0:00.09 cc1
 7499 root  11  0 1952  924  648 S  0.3  0.2 0:00.01 make
12279 root   1  0 5556 2928 2064 S  0.3  0.6 0:00.83 xterm
31510 root   1  0 2152 1100  840 R  0.3  0.2 0:00.25 top
    1 root   1  0 1584   88   60 S  0.0  0.0 0:00.30 init

artur
Re: [ck] Re: RSDL v0.31
On Tuesday 20 March 2007, Linus Torvalds wrote:
> On Mon, 19 Mar 2007, Xavier Bestel wrote:
> > > > > Stock scheduler wins easily, no contest.
> > > >
> > > > What happens when you renice X ?
> > >
> > > Dunno -- not necessary with the stock scheduler.
> >
> > Could you try something like renice -10 $(pidof Xorg) ?
>
> Could you try something as simple and accepting that maybe this is a
> problem?
>
> Quite frankly, I was *planning* on merging RSDL very early after 2.6.21,
> but there is one thing that has turned me completely off the whole thing:
>
>  - the people involved seem to be totally unwilling to even admit there
>    might be a problem.
>
> This is like alcoholism. If you cannot admit that you might have a
> problem, you'll never get anywhere. And quite frankly, the RSDL proponents
> seem to be in denial ("we're always better", "it's your problem if the old
> scheduler works better", "just one report of old scheduler being better").
>
> And the thing is, if people aren't even _willing_ to admit that there may
> be issues, there's *no*way*in*hell* I will merge it even for testing.
> Because the whole and only point of merging RSDL was to see if it could
> replace the old scheduler, and the most important feature in that case is
> not whether it is perfect, BUT WHETHER ANYBODY IS INTERESTED IN TRYING TO
> FIX THE INEVITABLE PROBLEMS!

Con simply isn't available right now, but you're right. RSDL isn't ready yet, imho; there seem to be some regressions (and I'm bitten by them, too). But if Con's past behaviour says anything about how he's going to behave in the future (and according to my psych prof it's the most reliable predictor ;-)), I'm pretty sure he'll jump on this when he's healthy again. He's gone to great lengths to fix problems with staircase, no matter how obscure, so I see no reason why he wouldn't do the same for RSDL... Though scheduler problems can be extremely hard to reproduce on other hardware.

> See?
> Can you people not see that the way you're doing that "RSDL is perfect"
> chorus in the face of people who report problems, you're just making it
> totally unrealistic that it will *ever* get merged.
>
> So unless somebody steps up to the plate and actually *talks* about the
> problem reports, and admits that maybe RSDL will need some tweaking, I'm
> not going to merge it.
>
> Because there is just _one_ thing that is more important than code - and
> that is the willingness to fix the code...
>
> Linus

--
Disclaimer: Everything I do, think and say is based on the world view I currently hold. I am not responsible for changes to the world, or to my view of it, nor for any of my own behaviour that follows from them. Everything I say is meant kindly, unless explicitly stated otherwise.

Please avoid sending me Word or PowerPoint attachments. See http://www.gnu.org/philosophy/no-word-attachments.html
Re: [ck] Re: RSDL v0.31
On Tuesday 20 March 2007, Bill Davidsen wrote:
> Kasper Sandberg wrote:
> > On Sun, 2007-03-18 at 08:38 +0100, Mike Galbraith wrote:
> >> On Sun, 2007-03-18 at 08:22 +0100, Radoslaw Szkodzinski wrote:
> >>> I'd reckon KDE regresses because of kioslaves waiting on a pipe
> >>> (communication with the app they're doing IO for) and then expiring.
> >>> That's why splitting IO from an app isn't exactly smart. It should at
> >>> least be run in another thread.
> >>
> >> Hm. Sounds rather a lot like the...
> >> X sucks, fix X and RSDL will rock your world. RSDL is perfect.
> >> ...that I've been getting.
> >
> > not really, only X sucks. KDE works at least as good with rsdl as
> > vanilla. i dont know who originally said kde works worse, wasnt it just
> > someone that thought?
>
> It was probably me, and I had the opinion that KDE is not as smooth as
> GNOME with RSDL. I haven't had time to measure, but using for daily
> stuff for about an hour each way hasn't changed my opinion. Every once
> in a while KDE will KLUNK to a halt for 200-300ms doing mundane stuff
> like redrawing a page, scrolling, etc. I don't see it with GNOME.

yeah, here too... sometimes even longer (and I have a dual core, 3gb ram, damnit!)
Re: RSDL v0.31
On Tue, 2007-03-20 at 07:11 +0100, Willy Tarreau wrote:
> I don't agree with starting to renice X to get something usable

X looks very special to me: it's a big userspace driver, the primary task handling user interaction on the desktop, and on some OSes the part responsible for moving the mouse pointer and interacting with windows is even implemented as an interrupt handler, and that for sure provides a smooth user experience even on very low-end hardware. Why not compensate for X's design by prioritizing it a bit?

If RSDL + reniced X makes for a better desktop than stock kernel + X, on all kinds of workloads, it's good to know.
Re: RSDL v0.31
On Tue, 2007-03-20 at 07:11 +0100, Willy Tarreau wrote:
> Also, while I don't agree with starting to renice X to get something usable,
> it seems real that there's something funny on Mike's system which makes it
> behave particularly strangely when combined with RSDL, because other people
> in comparable tests (including me) have found X perfectly smooth even with
> loads in the tens or even hundreds. I really suspect that we will find a bug
> in RSDL which triggers the problem and that this fix will help discover
> another problem on Mike's hardware which was not triggered by mainline.

I don't _think_ there's anything funny in my system, and Con said it was the expected behavior with my testcase, but I won't rule it out. Moving right along to the bugs part, I hope others are looking as well, and not only talking.

One area that looks pretty fishy to me is cross-cpu wakeups and task migration. p->rotation appears to lose all meaning when you cross the cpu boundary, and try_to_wake_up() is using that information in the cross-cpu case. In pull_task() OTOH, it checks to see if the task ran on the remote cpu (at all, hmm), and if so tags the task accordingly. It is not immediately obvious to me why this would be a good thing though, because quotas of one runqueue don't appear to have any relation to quotas of some other runqueue. (i'm going to assume that this old information is meaningless)

-Mike
Re: RSDL v0.31
On Tue, 2007-03-20 at 07:11 +0100, Willy Tarreau wrote: Also, while I don't agree with starting to renice X to get something usable, it seems real that there's something funny on Mike's system which makes it behave particularly strangely when combined with RSDL, because other people in comparable tests (including me) have found X perfectly smooth even with loads in the tens or even hundreds. I really suspect that we will find a bug in RSDL which triggers the problem and that this fix will help discover another problem on Mike's hardware which was not triggered by mainline. I don't _think_ there's anything funny in my system, and Con said it was the expected behavior with my testcase, but I won't rule it out. Moving right along to the bugs part, I hope others are looking as well, and not only talking. One area that looks pretty fishy to me is cross-cpu wakeups and task migration. p-rotation appears to lose all meaning when you cross the cpu boundary, and try_to_wake_up()is using that information in the cross-cpu case. In pull_task() OTOH, it checks to see if the task ran on the remote cpu (at all, hmm), and if so tags the task accordingly. It is not immediately obvious to me why this would be a good thing though, because quotas of one runqueue don't appear to have any relation to quotas of some other runqueue. (i'm going to it that this old information is meaningless) -Mike - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.31
On Tue, 2007-03-20 at 07:11 +0100, Willy Tarreau wrote: I don't agree with starting to renice X to get something usable X looks very special to me: it's a big userspace driver, the primary task handling user interaction on the desktop, and on some OS the part responsible for moving the mouse pointer and interacting with windows is even implemented as an interrupt handler, and that for sure provides for smooth user experience even on very low-end hardware. Why not compensate for X design by prioritizing it a bit ? If RSDL + reniced X makes for a better desktop than sotck kernel + X, on all kind of workloads, it's good to know. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [ck] Re: RSDL v0.31
Op Tuesday 20 March 2007, schreef Bill Davidsen: Kasper Sandberg wrote: On Sun, 2007-03-18 at 08:38 +0100, Mike Galbraith wrote: On Sun, 2007-03-18 at 08:22 +0100, Radoslaw Szkodzinski wrote: I'd recon KDE regresses because of kioslaves waiting on a pipe (communication with the app they're doing IO for) and then expiring. That's why splitting IO from an app isn't exactly smart. It should at least be ran in an another thread. Hm. Sounds rather a lot like the... X sucks, fix X and RSDL will rock your world. RSDL is perfect. ...that I've been getting. not really, only X sucks. KDE works atleast as good with rsdl as vanilla. i dont know how originally said kde works worse, wasnt it just someone that thought? It was probably me, and I had the opinion that KDE is not as smooth as GNOME with RSDL. I haven't had time to measure, but using for daily stuff for about an hour each way hasn't changed my opinion. Every once in a while KDE will KLUNK to a halt for 200-300ms doing mundane stuff like redrawing a page, scrolling, etc. I don't see it with GNOME. yeah, here too... sometimes even longer (and I have a dualcore, 3gb ram, damnit!) -- Disclaimer: Alles wat ik doe denk en zeg is gebaseerd op het wereldbeeld wat ik nu heb. Ik ben niet verantwoordelijk voor wijzigingen van de wereld, of het beeld wat ik daarvan heb, noch voor de daaruit voortvloeiende gedragingen van mezelf. Alles wat ik zeg is aardig bedoeld, tenzij expliciet vermeld. Please avoid sending me Word or PowerPoint attachments. See http://www.gnu.org/philosophy/no-word-attachments.html pgpTTtAuMOZKH.pgp Description: PGP signature
Re: [ck] Re: RSDL v0.31
Op Tuesday 20 March 2007, schreef Linus Torvalds: On Mon, 19 Mar 2007, Xavier Bestel wrote: Stock scheduler wins easily, no contest. What happens when you renice X ? Dunno -- not necessary with the stock scheduler. Could you try something like renice -10 $(pidof Xorg) ? Could you try something as simple and accepting that maybe this is a problem? Quite frankly, I was *planning* on merging RSDL very early after 2.6.21, but there is one thing that has turned me completely off the whole thing: - the people involved seem to be totally unwilling to even admit there might be a problem. This is like alcoholism. If you cannot admit that you might have a problem, you'll never get anywhere. And quite frankly, the RSDL proponents seem to be in denial (we're always better, it's your problem if the old scheduler works better, just one report of old scheduler being better). And the thing is, if people aren't even _willing_ to admit that there may be issues, there's *no*way*in*hell* I will merge it even for testing. Because the whole and only point of merging RSDL was to see if it could replace the old scheduler, and the most important feature in that case is not whether it is perfect, BUT WHETHER ANYBODY IS INTERESTED IN TRYING TO FIX THE INEVITABLE PROBLEMS! Con simply isn't available right now, but you're right. RSDL isn't ready yet, imho, there seem to be some regressions (and I'm bitten by them, too). But if con's past behaviour says anything about how he's going to behave in the future (and according to my psych prof it's the most reliable predictor ;-)), I'm pretty sure he'll jump on this when he's healthy again. He's gone through great lengths to fix problems with staircase, no matter how obscure, so I see no reason why he wouldn't do the same for RSDL... Though scheduler problems can be extremely hard to reproduce on other hardware. See? 
Can you people not see that the way you're doing that RSDL is perfect chorus in the face of people who report problems, you're just making it totally unrealistic that it will *ever* get merged. So unless somebody steps up to the plate and actually *talks* about the problem reports, and admits that maybe RSDL will need some tweaking, I'm not going to merge it. Because there is just _one_ thing that is more important than code - and that is the willingness to fix the code... Linus ___ http://ck.kolivas.org/faqs/replying-to-mailing-list.txt ck mailing list - mailto: [EMAIL PROTECTED] http://vds.kolivas.org/mailman/listinfo/ck -- Disclaimer: Alles wat ik doe denk en zeg is gebaseerd op het wereldbeeld wat ik nu heb. Ik ben niet verantwoordelijk voor wijzigingen van de wereld, of het beeld wat ik daarvan heb, noch voor de daaruit voortvloeiende gedragingen van mezelf. Alles wat ik zeg is aardig bedoeld, tenzij expliciet vermeld. Please avoid sending me Word or PowerPoint attachments. See http://www.gnu.org/philosophy/no-word-attachments.html pgp5Lq89fFLhw.pgp Description: PGP signature
Re: RSDL v0.31
Xavier Bestel wrote:
> On Tue, 2007-03-20 at 07:11 +0100, Willy Tarreau wrote:
> > I don't agree with starting to renice X to get something usable
>
> X looks very special to me: it's a big userspace driver, the primary task handling user interaction on the desktop, and on some OSes the part responsible for moving the mouse pointer and interacting with windows is even implemented as an interrupt handler, and that for sure provides for a smooth user experience even on very low-end hardware. Why not compensate for X's design by prioritizing it a bit? If RSDL + reniced X makes for a better desktop than stock kernel + X, on all kinds of workloads, it's good to know.

No, running X at a different priority than its clients is not really a good idea. If it isn't immediately obvious why, try something like this:

mkdir /tmp/tempdir
cd /tmp/tempdir
for i in `seq -w 1 1` ; do touch longfilenamexx$i ; done
nice --20 xterm
xterm
nice -20 xterm

then do "time ls -l ." in each xterm. This is what I get on UP 2.6.20+RSDL.31 w/ X at nice 0:

-20: 0m0.244s user 0m0.156s system 0m3.113s elapsed 12.84% CPU
  0: 0m0.216s user 0m0.168s system 0m2.801s elapsed 13.70% CPU
 19: 0m0.188s user 0m0.196s system 0m3.268s elapsed 11.75% CPU

I just made this simple example up and it doesn't show the problem too well, but you can already see the ~10% performance drop. It's actually worse in practice, because for some apps the increased amount of rendering is clearly visible; text areas scroll line-by-line, content is incrementally redrawn several times, etc. This happens because an X server running at a higher priority than a client will often get scheduled immediately after some X11 traffic arrives; when the process priorities are equal, usually the client gets a chance to supply some more data. IOW, by renicing the server you make X almost synchronous.
This isn't specific to RSDL - it happens w/ any cpu scheduler; and while the effects of less extreme prio differences (i.e. -5 instead of -20 etc) may be less visible, I also doubt they will help much. A better approach to X interactivity might be allowing the server to use (part of) the client's timeslice, but it's not trivial -- you'd only want to do that when the client is waiting for a reply, and you almost never want to preempt the client just because the server received some data.

As to RSDL - it seems to work great for desktop use and feels better than mainline. However, top output under 100% load (eg kernel compilation) looks like below -- the %CPU error seems a bit high...

Tasks:  97 total,   6 running,  91 sleeping,   0 stopped,   0 zombie
Cpu(s): 81.7% us, 18.3% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
 7566 root      17   0  9196 4108 1188 R  3.0  0.8  0:00.09 cc1
 7499 root      11   0  1952  924  648 S  0.3  0.2  0:00.01 make
12279 root       1   0  5556 2928 2064 S  0.3  0.6  0:00.83 xterm
31510 root       1   0  2152 1100  840 R  0.3  0.2  0:00.25 top
    1 root       1   0  1584   88   60 S  0.0  0.0  0:00.30 init

artur
Re: RSDL v0.31
Linus Torvalds wrote:
> Quite frankly, I was *planning* on merging RSDL very early after 2.6.21, but there is one thing that has turned me completely off the whole thing:
>
>  - the people involved seem to be totally unwilling to even admit there might be a problem.

Not to mention that it seems to only be tested thus far by a very vocal and supportive core. It needs much wider exposure for much longer before risking it in mainline. It likely will get there, eventually, just not yet.

I've dropped it from my machine -- interactive response is much more important for my primary machine right now.

I believe Ingo's much simpler hack produces as good/bad results as this RSDL thingie, and with one important extra: it can be switched on/off at runtime.

-forwarded message:
Subject: [patch] CFS scheduler: Completely Fair Scheduler
From: Ingo Molnar [EMAIL PROTECTED]

add the CONFIG_SCHED_FAIR option (default: off): this turns the Linux scheduler into a completely fair scheduler for SCHED_OTHER tasks: with perfect round-robin scheduling, fair distribution of timeslices combined with no interactivity boosting and no heuristics. a /proc/sys/kernel/sched_fair option is also available to turn this behavior on/off. if this option establishes itself amongst leading distributions then we could in the future remove the interactivity estimator altogether.
Signed-off-by: Ingo Molnar [EMAIL PROTECTED]
---
 include/linux/sched.h  |    1 +
 kernel/Kconfig.preempt |    9 +++++++++
 kernel/sched.c         |    8 ++++++++
 kernel/sysctl.c        |   10 ++++++++++
 4 files changed, 28 insertions(+)

Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -119,6 +119,7 @@ extern unsigned long avenrun[];		/* Load
 	load += n*(FIXED_1-exp); \
 	load >>= FSHIFT;
 
+extern unsigned int sched_fair;
 extern unsigned long total_forks;
 extern int nr_threads;
 DECLARE_PER_CPU(unsigned long, process_counts);
Index: linux/kernel/Kconfig.preempt
===================================================================
--- linux.orig/kernel/Kconfig.preempt
+++ linux/kernel/Kconfig.preempt
@@ -63,3 +63,12 @@ config PREEMPT_BKL
 	  Say Y here if you are building a kernel for a desktop system.
 	  Say N if you are unsure.
 
+config SCHED_FAIR
+	bool "Completely Fair Scheduler"
+	help
+	  This option turns the Linux scheduler into a completely fair
+	  scheduler. User-space workloads will round-robin fairly, and
+	  they have to be prioritized using nice levels.
+
+	  Say N if you are unsure.
+
Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -4040,6 +4040,10 @@ static inline struct task_struct *find_p
 	return pid ? find_task_by_pid(pid) : current;
 }
 
+#ifdef CONFIG_SCHED_FAIR
+unsigned int sched_fair = 1;
+#endif
+
 /*
  * Actually do priority change: must hold rq lock.
  */
 static void __setscheduler(struct task_struct *p, int policy, int prio)
 {
@@ -4055,6 +4059,10 @@ static void __setscheduler(struct task_s
 	 */
 	if (policy == SCHED_BATCH)
 		p->sleep_avg = 0;
+#ifdef CONFIG_SCHED_FAIR
+	if (policy == SCHED_NORMAL && sched_fair)
+		p->sleep_avg = 0;
+#endif
 	set_load_weight(p);
 }
Index: linux/kernel/sysctl.c
===================================================================
--- linux.orig/kernel/sysctl.c
+++ linux/kernel/sysctl.c
@@ -205,6 +205,16 @@ static ctl_table root_table[] = {
 };
 
 static ctl_table kern_table[] = {
+#ifdef CONFIG_SCHED_FAIR
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "sched_fair",
+		.data		= &sched_fair,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
+#endif
 	{
 		.ctl_name	= KERN_PANIC,
 		.procname	= "panic",
Re: RSDL v0.31
On 3/20/07, Mark Lord [EMAIL PROTECTED] wrote:
> I've dropped it from my machine -- interactive response is much more important for my primary machine right now.

Help out with a data point? Are you running KDE as well? If you are, then it looks like the common denominator that RSDL is handling poorly is client-server communication. (KDE's KIO slaves in this case, but X in general.)

If so, one would hope that a variation on Linus's 2.5.63 pipe wakeup pass-the-interactivity idea could work here. The problem with that original patch, IIRC, was that a couple of tasks could bounce their interactivity bonus back and forth and thereby starve others. Which might be expected, given there was no 'decaying' of the interactivity bonus, which means you can make a feedback loop.

Anyway, it looks like processes that do A -> B -> A communication chains are getting penalized under RSDL. In which case, perhaps I can make a test case that exhibits the problem without having to have the same graphics card or desktop as you.

Ray
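[A user-space sketch of the A -> B -> A test case Ray proposes might look like the following. Everything here -- the round count, the one-byte protocol, the function name -- is invented for illustration; the idea is just that two processes bounce a byte over a pair of pipes, and under a scheduler that penalizes such chains the round-trip latencies climb once CPU hogs run alongside.]

```python
import os
import time
import statistics

def ping_pong(rounds=200):
    """Fork an echo child ("B") and measure A->B->A round trips."""
    a2b_r, a2b_w = os.pipe()
    b2a_r, b2a_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child "B": bounce every byte straight back until EOF.
        os.close(a2b_w)
        os.close(b2a_r)
        while os.read(a2b_r, 1):
            os.write(b2a_w, b"x")
        os._exit(0)
    # Parent "A": time each write/read round trip.
    os.close(a2b_r)
    os.close(b2a_w)
    lat = []
    for _ in range(rounds):
        t0 = time.perf_counter()
        os.write(a2b_w, b"x")
        os.read(b2a_r, 1)
        lat.append(time.perf_counter() - t0)
    os.close(a2b_w)          # EOF makes the child exit
    os.waitpid(pid, 0)
    return lat

if __name__ == "__main__":
    lat = ping_pong()
    print("median round trip: %.1f us" % (statistics.median(lat) * 1e6))
```

Run it once on an idle box and once next to a few busy loops; comparing the medians (and tails) under the stock scheduler and RSDL would give a hardware-independent number for the regression reports in this thread.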
Re: RSDL v0.31
Ray Lee wrote:
> On 3/20/07, Mark Lord [EMAIL PROTECTED] wrote:
> > I've dropped it from my machine -- interactive response is much more important for my primary machine right now.
>
> Help out with a data point? Are you running KDE as well?

Yes, KDE.
Re: RSDL v0.31
On Tue, 20 Mar 2007, Willy Tarreau wrote:
> Linus, you're unfair with Con. He initially was on this position, and lately worked with Mike by proposing changes to try to improve his X responsiveness.

I was not actually so much speaking about Con, as about a lot of the tone in general here. And yes, it's not been entirely black-and-white. I was very happy to see the "try this patch" email from Al Boldi - not because I think that patch per se was necessarily the right fix (I have no idea), but simply because I think that's the kind of mindset we need to have.

Not a lot of people really *like* the old scheduler, but it's been tweaked over the years to try to avoid some nasty behaviour. I'm really hoping that RSDL would be a lot better (and by all accounts it has the potential for that), but I think it's totally naïve to expect that it won't need some tweaking too.

So I'll happily still merge RSDL right after 2.6.21 (and it won't even be a config option - if we want to make it good, we need to make sure *everybody* tests it), but what I want to see is that "can do" spirit wrt tweaking for issues that come up.

Because let's face it - nothing is ever perfect. Even a really nice conceptual idea always ends up hitting the "but in real life, things are ugly and complex, and we've depended on behaviour X in the past and can't change it, so we need some tweaking for problem Y".

And everything is totally fixable - at least as long as people are willing to!

Linus
Re: RSDL v0.31
Linus Torvalds wrote:
> I was very happy to see the "try this patch" email from Al Boldi - not because I think that patch per se was necessarily the right fix (I have no idea),

Well, it wasn't really meant as a fix, but rather to point out that interactivity boosting is possible with RSDL. It probably needs a lot more work, but just this one-liner gives an unbelievable interactivity boost.

> but simply because I think that's the kind of mindset we need to have.

Thanks.

> Not a lot of people really *like* the old scheduler, but it's been tweaked over the years to try to avoid some nasty behaviour. I'm really hoping that RSDL would be a lot better (and by all accounts it has the potential for that), but I think it's totally naïve to expect that it won't need some tweaking too.

Aside from interactivity boosting, I think fixed latencies per nice level may be desirable, when physically possible, to allow for more deterministic scheduling.

> So I'll happily still merge RSDL right after 2.6.21 (and it won't even be a config option - if we want to make it good, we need to make sure *everybody* tests it), but what I want to see is that "can do" spirit wrt tweaking for issues that come up. Because let's face it - nothing is ever perfect. Even a really nice conceptual idea always ends up hitting the "but in real life, things are ugly and complex, and we've depended on behaviour X in the past and can't change it, so we need some tweaking for problem Y". And everything is totally fixable - at least as long as people are willing to!

Agreed.

Thanks!

--
Al
Re: RSDL v0.31
Al Boldi wrote:
> --- sched.bak.c	2007-03-16 23:07:23.0 +0300
> +++ sched.c	2007-03-19 23:49:40.0 +0300
> @@ -938,7 +938,11 @@ static void activate_task(struct task_st
>  			(now - p->timestamp) >> 20);
>  	}
>
> -	p->quota = rr_quota(p);
> +	/*
> +	 * boost factor hardcoded to 5; adjust to your liking
> +	 * higher means more likely to DoS
> +	 */
> +	p->quota = rr_quota(p) + (((now - p->timestamp) >> 20) * 5);
>  	p->prio = effective_prio(p);
>  	p->timestamp = now;
>  	__activate_task(p, rq);

I've tried this and it lasted only a few minutes -- I was seeing mouse cursor stalls lasting almost 1s. I/O-bound tasks starving X? After reverting the patch everything is smooth again.

artur
Re: RSDL v0.31
Artur Skawina wrote:
> Al Boldi wrote:
> > --- sched.bak.c	2007-03-16 23:07:23.0 +0300
> > +++ sched.c	2007-03-19 23:49:40.0 +0300
> > @@ -938,7 +938,11 @@ static void activate_task(struct task_st
> >  			(now - p->timestamp) >> 20);
> >  	}
> >
> > -	p->quota = rr_quota(p);
> > +	/*
> > +	 * boost factor hardcoded to 5; adjust to your liking
> > +	 * higher means more likely to DoS
> > +	 */
> > +	p->quota = rr_quota(p) + (((now - p->timestamp) >> 20) * 5);
> >  	p->prio = effective_prio(p);
> >  	p->timestamp = now;
> >  	__activate_task(p, rq);
>
> I've tried this and it lasted only a few minutes -- I was seeing mouse cursor stalls lasting almost 1s. I/O-bound tasks starving X? After reverting the patch everything is smooth again.

This patch wasn't really meant for production, as any sleeping background proc turned cpu-hog may DoS the system. If you like to play with this, then you probably want to at least reset the quota on its expiration.

Thanks!

--
Al
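[The DoS Al warns about falls straight out of the patch's arithmetic, and a toy user-space model makes it concrete. This is purely illustrative, not kernel code: the base quota value and the treatment of quota units as milliseconds are assumptions, only the `>> 20` shift and the boost factor of 5 come from the patch itself.]

```python
def boosted_quota(slept_ns, base=6):
    """Toy model of the patch's quota boost.

    `base` stands in for rr_quota(p) and is an invented value; units are
    treated as milliseconds throughout, which is an assumption.
    """
    # (now - p->timestamp) >> 20 turns nanoseconds into roughly
    # milliseconds (2**20 ns ~= 1.05 ms); the patch multiplies that by
    # the hardcoded boost factor of 5.
    return base + (slept_ns >> 20) * 5

if __name__ == "__main__":
    # A background task that slept 10 seconds and then turns CPU hog
    # wakes up holding roughly 47 seconds worth of quota:
    print(boosted_quota(10 * 10**9))   # -> 47686
```

In other words, the boost grows without bound in sleep time, which is why Artur's sleeping-then-busy tasks could stall the mouse for a second, and why resetting (or capping, or decaying) the boost at quota expiration matters.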
Re: RSDL v0.31
On Mon, Mar 19, 2007 at 08:11:55PM -0700, Linus Torvalds wrote:
> Quite frankly, I was *planning* on merging RSDL very early after 2.6.21,
> but there is one thing that has turned me completely off the whole thing:
>
>  - the people involved seem to be totally unwilling to even admit there
>    might be a problem.
>
> This is like alcoholism. If you cannot admit that you might have a
> problem, you'll never get anywhere. And quite frankly, the RSDL proponents
> seem to be in denial ("we're always better", "it's your problem if the old
> scheduler works better", "just one report of old scheduler being better").
>
> And the thing is, if people aren't even _willing_ to admit that there may
> be issues, there's *no*way*in*hell* I will merge it even for testing.
> Because the whole and only point of merging RSDL was to see if it could
> replace the old scheduler, and the most important feature in that case is
> not whether it is perfect, BUT WHETHER ANYBODY IS INTERESTED IN TRYING TO
> FIX THE INEVITABLE PROBLEMS!

Linus, you're unfair with Con. He initially was on this position, and lately worked with Mike by proposing changes to try to improve his X responsiveness. But he's ill right now and cannot touch the keyboard, so only his supporters speak for him, and as you know, speech is not code and does not fix problems. Leave him a week or so to recover, and let's see what he can propose. Hopefully a week away from the keyboard will help him think with a more general approach. Also, Mike has already modified the code a bit to get better experience.

Also, while I don't agree with starting to renice X to get something usable, it seems real that there's something funny on Mike's system which makes it behave particularly strangely when combined with RSDL, because other people in comparable tests (including me) have found X perfectly smooth even with loads in the tens or even hundreds.
I really suspect that we will find a bug in RSDL which triggers the problem, and that this fix will help discover another problem on Mike's hardware which was not triggered by mainline.

Regards,
Willy
Re: RSDL v0.31
On Mon, 19 Mar 2007, Xavier Bestel wrote:
>
> > >> Stock scheduler wins easily, no contest.
> > >
> > > What happens when you renice X ?
> >
> > Dunno -- not necessary with the stock scheduler.
>
> Could you try something like renice -10 $(pidof Xorg) ?

Could you try something as simple and accepting that maybe this is a problem?

Quite frankly, I was *planning* on merging RSDL very early after 2.6.21, but there is one thing that has turned me completely off the whole thing:

 - the people involved seem to be totally unwilling to even admit there might be a problem.

This is like alcoholism. If you cannot admit that you might have a problem, you'll never get anywhere. And quite frankly, the RSDL proponents seem to be in denial ("we're always better", "it's your problem if the old scheduler works better", "just one report of old scheduler being better").

And the thing is, if people aren't even _willing_ to admit that there may be issues, there's *no*way*in*hell* I will merge it even for testing. Because the whole and only point of merging RSDL was to see if it could replace the old scheduler, and the most important feature in that case is not whether it is perfect, BUT WHETHER ANYBODY IS INTERESTED IN TRYING TO FIX THE INEVITABLE PROBLEMS!

See? Can you people not see that the way you're doing that "RSDL is perfect" chorus in the face of people who report problems, you're just making it totally unrealistic that it will *ever* get merged.

So unless somebody steps up to the plate and actually *talks* about the problem reports, and admits that maybe RSDL will need some tweaking, I'm not going to merge it. Because there is just _one_ thing that is more important than code - and that is the willingness to fix the code...

Linus
Re: RSDL v0.31
Mark Lord wrote:
> Al Boldi wrote:
> > ..
> > Mike, I'm not saying RSDL is perfect, but v0.31 is by far better than
> > mainline. Try this easy test:
> >
> > startx with the vesa driver
> > run reflect from the mesa5.0-demos
> > load 5 cpu-hogs
> > start moving the mouse
> >
> > On my desktop, mainline completely breaks down, and no nicing may
> > rescue.
> >
> > On RSDL, even without nicing, the desktop is at least useable.
>
> I use a simpler, far more common (for lkml participants) workload:
>
> Dell notebook, single P-M-2GHz, ATI X300, open source X.org:
> (1) build a kernel in one window with "make -j$((NUMBER_OF_CPUS + 1))".
> (2) try to read email and/or surf in Firefox/Thunderbird.
>
> Stock scheduler wins easily, no contest.

Try this on RSDL:

--- sched.bak.c	2007-03-16 23:07:23.0 +0300
+++ sched.c	2007-03-19 23:49:40.0 +0300
@@ -938,7 +938,11 @@ static void activate_task(struct task_st
 			(now - p->timestamp) >> 20);
 	}
 
-	p->quota = rr_quota(p);
+	/*
+	 * boost factor hardcoded to 5; adjust to your liking
+	 * higher means more likely to DoS
+	 */
+	p->quota = rr_quota(p) + (((now - p->timestamp) >> 20) * 5);
 	p->prio = effective_prio(p);
 	p->timestamp = now;
 	__activate_task(p, rq);

Thanks!

--
Al
Re: [ck] Re: RSDL v0.31
Kasper Sandberg wrote:
> On Sun, 2007-03-18 at 08:38 +0100, Mike Galbraith wrote:
> > On Sun, 2007-03-18 at 08:22 +0100, Radoslaw Szkodzinski wrote:
> > > I'd reckon KDE regresses because of kioslaves waiting on a pipe (communication with the app they're doing IO for) and then expiring. That's why splitting IO from an app isn't exactly smart. It should at least be run in another thread.
> >
> > Hm. Sounds rather a lot like the... "X sucks, fix X and RSDL will rock your world. RSDL is perfect." ...that I've been getting.
>
> not really, only X sucks. KDE works at least as good with rsdl as vanilla. I don't know who originally said KDE works worse -- wasn't it just someone's impression?

It was probably me, and I had the opinion that KDE is not as smooth as GNOME with RSDL. I haven't had time to measure, but using it for daily stuff for about an hour each way hasn't changed my opinion. Every once in a while KDE will KLUNK to a halt for 200-300ms doing mundane stuff like redrawing a page, scrolling, etc. I don't see it with GNOME.

--
Bill Davidsen <[EMAIL PROTECTED]>
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
Re: RSDL v0.31
On Mon, 2007-03-19 at 12:36 -0400, Mark Lord wrote:
> Xavier Bestel wrote:
> > On Mon, 2007-03-19 at 12:07 -0400, Mark Lord wrote:
> >> Dell notebook, single P-M-2GHz, ATI X300, open source X.org:
> >> (1) build a kernel in one window with "make -j$((NUMBER_OF_CPUS + 1))".
> >> (2) try to read email and/or surf in Firefox/Thunderbird.
> >>
> >> Stock scheduler wins easily, no contest.
> >
> > What happens when you renice X ?
>
> Dunno -- not necessary with the stock scheduler.

Could you try something like renice -10 $(pidof Xorg) ?

Xav
Re: RSDL v0.31
Xavier Bestel wrote:
> On Mon, 2007-03-19 at 12:07 -0400, Mark Lord wrote:
> > Dell notebook, single P-M-2GHz, ATI X300, open source X.org:
> > (1) build a kernel in one window with "make -j$((NUMBER_OF_CPUS + 1))".
> > (2) try to read email and/or surf in Firefox/Thunderbird.
> >
> > Stock scheduler wins easily, no contest.
>
> What happens when you renice X ?

Dunno -- not necessary with the stock scheduler. Nicing the "make" helped with RSDL, though. But again, the stock scheduler "just works" in that regard. I agree with Ingo -- the auto-renice feature is something very useful for desktops, and is missing from RSDL.

Cheers
Re: RSDL v0.31
On Mon, 2007-03-19 at 12:07 -0400, Mark Lord wrote:
> Dell notebook, single P-M-2GHz, ATI X300, open source X.org:
> (1) build a kernel in one window with "make -j$((NUMBER_OF_CPUS + 1))".
> (2) try to read email and/or surf in Firefox/Thunderbird.
>
> Stock scheduler wins easily, no contest.

What happens when you renice X ?

Xav
Re: RSDL v0.31
Al Boldi wrote:
> ..
> Mike, I'm not saying RSDL is perfect, but v0.31 is by far better than mainline. Try this easy test:
>
> startx with the vesa driver
> run reflect from the mesa5.0-demos
> load 5 cpu-hogs
> start moving the mouse
>
> On my desktop, mainline completely breaks down, and no nicing may rescue.
>
> On RSDL, even without nicing, the desktop is at least useable.

I use a simpler, far more common (for lkml participants) workload:

Dell notebook, single P-M-2GHz, ATI X300, open source X.org:
(1) build a kernel in one window with "make -j$((NUMBER_OF_CPUS + 1))".
(2) try to read email and/or surf in Firefox/Thunderbird.

Stock scheduler wins easily, no contest.

Cheers
Re: RSDL v0.31
Mike Galbraith wrote:
> On Sat, 2007-03-17 at 20:48 +1100, Con Kolivas wrote:
> > The most frustrating part of a discussion of this nature on lkml is that earlier information in a thread seems to be long forgotten after a few days and all that is left is the one reporter having a problem.
>
> One? I'm not the only person who reported regression.

Ditto here. I'm not the only one after all! Reverted back to stock scheduler, and desktop is interactive again.

-ml
Re: RSDL v0.31
Just so you know the context, I'm coming at this from the point of view of an embedded call server designer.

Mark Hahn wrote:
> why do you think fairness is good, especially always good?

Fairness is good because it promotes predictability. See the "deterministic" section below.

> even starvation is sometimes a good thing - there's a place for processes that only use the CPU if it is otherwise idle. that is, they are deliberately starved all the rest of the time.

If nice 19 is sufficiently low priority, then the difference between "using cpu if otherwise idle" and "gets a little bit of cpu even if not totally idle" is unimportant. Starvation is a very *bad* thing when you don't want it.

> > Much lower and bounded latencies in an average sense?
>
> also, under what circumstances does this actually matter? (please don't offer something like RT audio on an overloaded machine - that's operator error, not something to design for.)

In my environment, latency *matters*. If a packet doesn't get processed in time, you drop it. With mainline it can be quite tricky to tune the latency, especially when you don't want to resort to soft realtime because you don't entirely trust the code that's running (because it came from a third-party vendor).

> > Deterministic
>
> not a bad thing, but how does this make itself apparent and of value to the user? I think everyone is extremely comfortable with non-determinism (stemming from networks, caches, interleaved workloads, etc)

Determinism is really important. It almost doesn't matter what the behaviour is, as long as we can predict it. We model the system to determine how to tweak it (niceness, sched policy, etc.), as well as what performance numbers we can advertise. If the system is non-deterministic, this modelling becomes extremely difficult -- you end up having to give up significant performance due to worst-case spikes. If the system is deterministic, it is much easier to predict its actions.
Chris
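[The predictability Chris is modelling can be put to numbers from user space. A small sketch -- the sleep interval and sample count are arbitrary illustrative choices, not anything from this thread -- measures how far timer wakeups overshoot the requested sleep; the spread between the median and the worst case is one crude determinism metric a call-server designer could track across schedulers and loads:]

```python
import time

def wakeup_jitter(samples=500, interval=0.001):
    """Sample timer-wakeup overshoot; return (median, worst case) in seconds."""
    lat = []
    for _ in range(samples):
        t0 = time.perf_counter()
        time.sleep(interval)
        # how much later than requested did we actually run?
        lat.append(time.perf_counter() - t0 - interval)
    lat.sort()
    return lat[len(lat) // 2], lat[-1]

if __name__ == "__main__":
    med, worst = wakeup_jitter()
    print("median overshoot %.0f us, worst %.0f us" % (med * 1e6, worst * 1e6))
```

A deterministic scheduler keeps the worst case close to the median even under load; a scheduler with interactivity heuristics tends to show rare but large spikes, which is exactly the worst-case headroom Chris says he has to give away.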
Re: RSDL v0.31
On Mon, Mar 19, 2007 at 07:21:47AM +0100, Mike Galbraith wrote:
> On Sun, 2007-03-18 at 19:27 -0700, David Schwartz wrote:
> > > Wrong. I call a good job giving a _preference_ to the desktop. I call
> > > rigid fairness impractical for the desktop, and a denial of reality.
> >
> > Assuming you *want* that. It's possible that the desktop may not be
> > particularly important and the machine may be doing much more important
> > server work with critical latency issues. So if you want that, you have to
> > ask for it.
>
> Amusing argument ;-) I doubt that there are many admins ripping and
> encoding CDs on their employers' critical production servers.

I've known one at least, he said he was ensuring the shiny new dual Athlons were stable enough for production ;-) But that does not make a rule.

Cheers,
Willy
Re: RSDL v0.31
On Mon, Mar 19, 2007 at 07:21:47AM +0100, Mike Galbraith wrote: On Sun, 2007-03-18 at 19:27 -0700, David Schwartz wrote: Wrong. I call a good job giving a _preference_ to the desktop. I call rigid fairness impractical for the desktop, and a denial of reality. Assuming you *want* that. It's possible that the desktop may not be particularly important and the machine may be doing much more important server work with critical latency issues. So if you want that, you have to ask for it. Amusing argument ;-) I doubt that there are many admins ripping and encoding CDs on their employers critical production servers. I've known one at least, he said he was ensuring the shiny new dual athlons were stable enough for production ;-) But that does not make a rule. Cheers, Willy - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.31
Just so you know the context, I'm coming at this from the point of view of an embedded call server designer. Mark Hahn wrote: why do you think fairness is good, especially always good? Fairness is good because it promotes predictability. See the deterministic section below. even starvation is sometimes a good thing - there's a place for processes that only use the CPU if it is otherwise idle. that is, they are deliberately starved all the rest of the time. If you have nice 19 be sufficiently low priority, then the difference between using cpu if otherwise idle and gets a little bit of cpu even if not totally idle is unimportant. Starvation is a very *bad* thing when you don't want it. Much lower and bound latencies in an average sense? also, under what circumstances does this actually matter? (please don't offer something like RT audio on an overloaded machine- that's operator error, not something to design for.) In my environment, latency *matters*. If a packet doesn't get processed in time, you drop it. With mainline it can be quite tricky to tune the latency, especially when you don't want to resort to soft realtime because you don't entirely trust the code thats running (because it came from a third party vendor). Deterministic not a bad thing, but how does this make itself apparent and of value to the user? I think everyone is extremely comfortable with non-determinism (stemming from networks, caches, interleaved workloads, etc) Determinism is really important. It almost doesn't matter what the behaviour is, as long as we can predict it. We model the system to determine how to tweak the system (niceness, sched policy, etc.), as well as what performance numbers we can advertise. If the system is non-deterministic, it makes this modelling extremely difficult--you end up having to give up significant performance due to worst-case spikes. If the system is deterministic, it makes it much easier to predict its actions. 
Chris - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.31
Mike Galbraith wrote: On Sat, 2007-03-17 at 20:48 +1100, Con Kolivas wrote: The most frustrating part of a discussion of this nature on lkml is that earlier information in a thread seems to be long forgotten after a few days and all that is left is the one reporter having a problem. One? I'm not the only person who reported regression. Ditto here. I'm not the only one after all! Reverted back to stock scheduler, and desktop is interactive again. -ml - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.31
Al Boldi wrote:
> ..
> Mike, I'm not saying RSDL is perfect, but v0.31 is by far better than
> mainline. Try this easy test:
> - startx with the vesa driver
> - run reflect from the mesa5.0-demos
> - load 5 cpu-hogs
> - start moving the mouse
> On my desktop, mainline completely breaks down, and no nicing can
> rescue it. On RSDL, even without nicing, the desktop is at least
> usable.

I use a simpler, far more common (for lkml participants) workload:

Dell notebook, single P-M-2GHz, ATI X300, open source X.org:
(1) build a kernel in one window with make -j$((NUMBER_OF_CPUS + 1)).
(2) try to read email and/or surf in Firefox/Thunderbird.

Stock scheduler wins easily, no contest.

Cheers
Re: RSDL v0.31
On Mon, 2007-03-19 at 12:07 -0400, Mark Lord wrote:
> Dell notebook, single P-M-2GHz, ATI X300, open source X.org:
> (1) build a kernel in one window with make -j$((NUMBER_OF_CPUS + 1)).
> (2) try to read email and/or surf in Firefox/Thunderbird.
> Stock scheduler wins easily, no contest.

What happens when you renice X?

Xav
Re: RSDL v0.31
Xavier Bestel wrote:
> On Mon, 2007-03-19 at 12:07 -0400, Mark Lord wrote:
>> Dell notebook, single P-M-2GHz, ATI X300, open source X.org:
>> (1) build a kernel in one window with make -j$((NUMBER_OF_CPUS + 1)).
>> (2) try to read email and/or surf in Firefox/Thunderbird.
>> Stock scheduler wins easily, no contest.
> What happens when you renice X?

Dunno -- not necessary with the stock scheduler. Nicing the make helped with RSDL, though. But again, the stock scheduler just works in that regard. I agree with Ingo -- the auto-renice feature is something very useful for desktops, and is missing from RSDL.

Cheers
Re: RSDL v0.31
On Mon, 2007-03-19 at 12:36 -0400, Mark Lord wrote:
> Xavier Bestel wrote:
>> On Mon, 2007-03-19 at 12:07 -0400, Mark Lord wrote:
>>> Dell notebook, single P-M-2GHz, ATI X300, open source X.org:
>>> (1) build a kernel in one window with make -j$((NUMBER_OF_CPUS + 1)).
>>> (2) try to read email and/or surf in Firefox/Thunderbird.
>>> Stock scheduler wins easily, no contest.
>> What happens when you renice X?
> Dunno -- not necessary with the stock scheduler.

Could you try something like renice -10 $(pidof Xorg) ?

Xav
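[Editor's note] The renice suggestion can also be done programmatically, e.g. from a supervisor process. A minimal sketch; the helper name renice_pid is mine, not a standard API. Note that *lowering* the nice value (e.g. to -10 for X, as suggested) requires root or CAP_SYS_NICE, while raising it works unprivileged:

```c
#include <assert.h>
#include <sys/resource.h>
#include <sys/types.h>

/*
 * Programmatic equivalent of "renice <nice_val> <pid>".  Passing pid 0
 * means the calling process.  Returns 0 on success, -1 on error, like
 * setpriority(2) itself.
 */
static int renice_pid(pid_t pid, int nice_val)
{
	return setpriority(PRIO_PROCESS, pid, nice_val);
}
```

For the shell-level experiment in the message above, plain renice(1) is of course simpler.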
Re: [ck] Re: RSDL v0.31
Kasper Sandberg wrote:
> On Sun, 2007-03-18 at 08:38 +0100, Mike Galbraith wrote:
>> On Sun, 2007-03-18 at 08:22 +0100, Radoslaw Szkodzinski wrote:
>>> I'd reckon KDE regresses because of kioslaves waiting on a pipe
>>> (communication with the app they're doing IO for) and then expiring.
>>> That's why splitting IO from an app isn't exactly smart. It should
>>> at least be run in another thread.
>> Hm. Sounds rather a lot like the "X sucks, fix X and RSDL will rock
>> your world. RSDL is perfect." ...that I've been getting.
> not really, only X sucks. KDE works at least as well with rsdl as with
> vanilla. i dont know who originally said kde works worse, wasnt it
> just someone that thought so?

It was probably me, and I had the opinion that KDE is not as smooth as GNOME with RSDL. I haven't had time to measure, but using each for daily stuff for about an hour hasn't changed my opinion. Every once in a while KDE will KLUNK to a halt for 200-300ms doing mundane stuff like redrawing a page, scrolling, etc. I don't see it with GNOME.

--
Bill Davidsen [EMAIL PROTECTED]
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
Re: RSDL v0.31
Mark Lord wrote:
> Al Boldi wrote:
>> ..
>> Mike, I'm not saying RSDL is perfect, but v0.31 is by far better than
>> mainline. Try this easy test:
>> - startx with the vesa driver
>> - run reflect from the mesa5.0-demos
>> - load 5 cpu-hogs
>> - start moving the mouse
>> On my desktop, mainline completely breaks down, and no nicing can
>> rescue it. On RSDL, even without nicing, the desktop is at least
>> usable.
> I use a simpler, far more common (for lkml participants) workload:
> Dell notebook, single P-M-2GHz, ATI X300, open source X.org:
> (1) build a kernel in one window with make -j$((NUMBER_OF_CPUS + 1)).
> (2) try to read email and/or surf in Firefox/Thunderbird.
> Stock scheduler wins easily, no contest.

Try this on RSDL:

--- sched.bak.c	2007-03-16 23:07:23.0 +0300
+++ sched.c	2007-03-19 23:49:40.0 +0300
@@ -938,7 +938,11 @@ static void activate_task(struct task_st
 			(now - p->timestamp) >> 20);
 	}
 
-	p->quota = rr_quota(p);
+	/*
+	 * boost factor hardcoded to 5; adjust to your liking
+	 * higher means more likely to DoS
+	 */
+	p->quota = rr_quota(p) + (((now - p->timestamp) >> 20) * 5);
 	p->prio = effective_prio(p);
 	p->timestamp = now;
 	__activate_task(p, rq);

Thanks!

--
Al
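[Editor's note] For readers skimming past the hunk: on wakeup, the patch extends a task's round-robin quota by the time it just spent sleeping, converted from nanoseconds to roughly milliseconds via >> 20, times a hardcoded factor of 5. That is also why the comment warns about DoS: a task that sleeps briefly in a loop keeps earning extra quota. A user-space sketch (the names boosted_quota and BOOST_FACTOR are illustrative, not RSDL's):

```c
#include <assert.h>

/*
 * User-space sketch of the boost in the patch above.  In the patch the
 * sleep time is (now - p->timestamp), in nanoseconds; >> 20 divides by
 * 2^20 ~= 10^6, i.e. a cheap ns-to-ms conversion.
 */
#define BOOST_FACTOR 5

static long boosted_quota(long rr_quota_ms, long slept_ns)
{
	long slept_ms = slept_ns >> 20;	/* ns -> ~ms, as in the patch */

	return rr_quota_ms + slept_ms * BOOST_FACTOR;
}
```

So a task that just slept about 8 ms (8 << 20 ns) wakes with 40 ms of extra quota on top of its base.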
Re: RSDL v0.31
On Mon, 19 Mar 2007, Xavier Bestel wrote:
>>>> Stock scheduler wins easily, no contest.
>>> What happens when you renice X?
>> Dunno -- not necessary with the stock scheduler.
> Could you try something like renice -10 $(pidof Xorg) ?

Could you try something as simple as accepting that maybe this is a problem?

Quite frankly, I was *planning* on merging RSDL very early after 2.6.21, but there is one thing that has turned me completely off the whole thing:

 - the people involved seem to be totally unwilling to even admit there
   might be a problem.

This is like alcoholism. If you cannot admit that you might have a problem, you'll never get anywhere. And quite frankly, the RSDL proponents seem to be in denial ("we're always better", "it's your problem if the old scheduler works better", "just one report of the old scheduler being better").

And the thing is, if people aren't even _willing_ to admit that there may be issues, there's *no*way*in*hell* I will merge it even for testing. Because the whole and only point of merging RSDL was to see if it could replace the old scheduler, and the most important feature in that case is not whether it is perfect, BUT WHETHER ANYBODY IS INTERESTED IN TRYING TO FIX THE INEVITABLE PROBLEMS!

See? Can you people not see that with the way you're doing that "RSDL is perfect" chorus in the face of people who report problems, you're just making it totally unrealistic that it will *ever* get merged?

So unless somebody steps up to the plate and actually *talks* about the problem reports, and admits that maybe RSDL will need some tweaking, I'm not going to merge it.

Because there is just _one_ thing that is more important than code - and that is the willingness to fix the code...

Linus
Re: RSDL v0.31
On Mon, Mar 19, 2007 at 08:11:55PM -0700, Linus Torvalds wrote:
> Quite frankly, I was *planning* on merging RSDL very early after
> 2.6.21, but there is one thing that has turned me completely off the
> whole thing:
>  - the people involved seem to be totally unwilling to even admit
>    there might be a problem.
> This is like alcoholism. If you cannot admit that you might have a
> problem, you'll never get anywhere. And quite frankly, the RSDL
> proponents seem to be in denial ("we're always better", "it's your
> problem if the old scheduler works better", "just one report of the
> old scheduler being better").
> And the thing is, if people aren't even _willing_ to admit that there
> may be issues, there's *no*way*in*hell* I will merge it even for
> testing. Because the whole and only point of merging RSDL was to see
> if it could replace the old scheduler, and the most important feature
> in that case is not whether it is perfect, BUT WHETHER ANYBODY IS
> INTERESTED IN TRYING TO FIX THE INEVITABLE PROBLEMS!

Linus, you're being unfair to Con. He initially was in that position, but lately he worked with Mike by proposing changes to try to improve his X responsiveness. He's ill right now and cannot touch the keyboard, so only his supporters speak for him, and as you know, speech is not code and does not fix problems. Give him a week or so to recover and let's see what he can propose. Hopefully a week away from the keyboard will help him think with a more general approach. Also, Mike has already modified the code a bit to get a better experience.

Also, while I don't agree with having to renice X to get something usable, it seems real that there's something funny on Mike's system which makes it behave particularly strangely when combined with RSDL, because other people in comparable tests (including me) have found X perfectly smooth even with loads in the tens or even hundreds.

I really suspect that we will find a bug in RSDL which triggers the problem, and that the fix will help discover another problem on Mike's hardware which was not triggered by mainline.

Regards,
Willy
RE: RSDL v0.31
On Sun, 2007-03-18 at 19:27 -0700, David Schwartz wrote:

> > Wrong. I call a good job giving a _preference_ to the desktop. I call
> > rigid fairness impractical for the desktop, and a denial of reality.
>
> Assuming you *want* that. It's possible that the desktop may not be
> particularly important and the machine may be doing much more important
> server work with critical latency issues. So if you want that, you have
> to ask for it.

Amusing argument ;-) I doubt that there are many admins ripping and encoding CDs on their employers' critical production servers.

> Again, your complaint is that the other scheduler gave you what you
> wanted even when you didn't ask for it. That's great for you but
> totally sucks for the majority of other people who want something else.

I don't presume to speak for the majority...

-Mike
RE: RSDL v0.31
> P.S. "utter failure" was too harsh. What sticks in my craw is that the
> world has to adjust to fit this new scheduler.
>
> -Mike

Even when it's totally clear that this scheduler is doing what you asked it to do while the old one wasn't? It still bothers you that now you have to ask for what you want rather than asking for what happens to give you what you want?

> Wrong. I call a good job giving a _preference_ to the desktop. I call
> rigid fairness impractical for the desktop, and a denial of reality.

Assuming you *want* that. It's possible that the desktop may not be particularly important and the machine may be doing much more important server work with critical latency issues. So if you want that, you have to ask for it.

Again, your complaint is that the other scheduler gave you what you wanted even when you didn't ask for it. That's great for you but totally sucks for the majority of other people who want something else.

DS