Re: More ULE bugs fixed.

2003-11-05 Thread Sheldon Hearn
On (2003/11/04 15:46), Jeff Roberson wrote:

  The thing is, I'm using 4BSD, not ULE, so I wouldn't trouble Jeff to
  look for a cause for that specific problem in ULE.
 
 How long have you been seeing this?  Are you using a usb mouse?  Can you
 try with PS/2 if you are?

Since my last update, Fri Oct 24 17:47:22.

I am using a USB mouse, but don't have a PS/2 one.  I'm also using
moused, and my WM is sawfish.

The problem with all these reports is that they're scattered.  It's hard
to pin down exactly what the common elements are.  Indeed, we may be
looking at combinations of elements.

I don't have time to be more helpful, which is why I hadn't complained.
I just wanted to include the datapoint that over-active mouse behaviour
under load exists under SCHED_4BSD as well.

Incidentally, this is under ATA disk load.  I don't really push my CPU.

Ciao,
Sheldon.
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: More ULE bugs fixed.

2003-11-05 Thread Eirik Oeverby
Sheldon Hearn wrote:
 On (2003/11/04 15:46), Jeff Roberson wrote:

   The thing is, I'm using 4BSD, not ULE, so I wouldn't trouble Jeff to
   look for a cause for that specific problem in ULE.

  How long have you been seeing this?  Are you using a usb mouse?  Can you
  try with PS/2 if you are?

 Since my last update, Fri Oct 24 17:47:22.

 I am using a USB mouse, but don't have a PS/2 one.  I'm also using
 moused, and my WM is sawfish.

 The problem with all these reports is that they're scattered.  It's hard
 to pin down exactly what the common elements are.  Indeed, we may be
 looking at combinations of elements.

 I don't have time to be more helpful, which is why I hadn't complained.
 I just wanted to include the datapoint that over-active mouse behaviour
 under load exists under SCHED_4BSD as well.

 Incidentally, this is under ATA disk load.  I don't really push my CPU.

Though I am not a hardcore C programmer, much less a FreeBSD contributor
in any way, I do have some experience in tracking down problems like
this. I used to have a lot of them on some of the more obscure platforms
I've used in the past.

My feeling is (and it might be completely wrong, of course) that we are
dealing with at least two completely separate issues here. The first has
to do with mouse jerkiness, the second with bogus mouse events.
There is a significant difference between these two, and personally I am
leaning towards concluding that the first has to do with the scheduler,
and the second with something entirely different: the interrupt
handler or something of the sort.

The first is simply that the mouse stops for a brief moment and then
continues from the point where it stopped. Perhaps this is the situation
that is remedied by bypassing moused? Is moused perhaps not getting the
CPU cycles it needs to process and pass on mouse messages?

The second is that mouse messages are actually *lost*, or bogus ones are
being generated. I guess it's the former, making moused or X misinterpret
the messages it gets. Where along the chain it fails I obviously have no
clue. The consequence is that the mouse stops (as in #1) but then
resumes from an entirely different point, be it 10 pixels away or at
the other end of the screen, possibly even generating a button push
(but not necessarily the corresponding button release) message.

These two situations could at first sight be mistaken for being the same 
symptom, but I am pretty sure they are very different. One may influence 
the other, or they may by coincidence (or for some good reason) happen 
at the same time, but I believe the errors happen in different parts of 
the kernel.

When you say you get the bogus mouse events (which I believe you are
saying at least ;) only during load, I'm immediately thinking that yes,
that might make sense. But I guess that's better left to those who are 
in the know to decide ;) I have never seen it happen with the 4BSD 
scheduler, but that might have other reasons (hardware?).

Why don't you try with the new interrupt handler? Helped me quite a lot.. :)

/Eirik

 Ciao,
 Sheldon.




Re: More ULE bugs fixed.

2003-11-05 Thread Robert Watson

On Wed, 5 Nov 2003, Sheldon Hearn wrote:

 On (2003/11/04 15:46), Jeff Roberson wrote:
 
   The thing is, I'm using 4BSD, not ULE, so I wouldn't trouble Jeff to
   look for a cause for that specific problem in ULE.
  
  How long have you been seeing this?  Are you using a usb mouse?  Can you
  try with PS/2 if you are?
 
 Since my last update, Fri Oct 24 17:47:22. 
 
 I am using a USB mouse, but don't have a PS/2 one.  I'm also using
 moused, and my WM is sawfish. 
 
 The problem with all these reports is that they're scattered.  It's hard
 to pin down exactly what the common elements are.  Indeed, we may be
 looking at combinations of elements. 
 
 I don't have time to be more helpful, which is why I hadn't complained. 
 I just wanted to include the datapoint that over-active mouse behaviour
 under load exists under SCHED_4BSD as well. 
 
 Incidentally, this is under ATA disk load.  I don't really push my CPU. 

There's been some speculation that the PS/2 mouse problem could be due to
high interrupt latency for non-fast interrupt handlers (especially ones
not MPSAFE) in 5.x.  I think it would make a lot of sense for us to push
Giant off both the PS/2 mouse and syscons interrupt handlers in the near
future.  For syscons, this would also improve the reliability of dropping
into the debugger from a non-serial console.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Network Associates Laboratories




Re: More ULE bugs fixed.

2003-11-05 Thread Matthew D. Fuller
On Wed, Nov 05, 2003 at 11:28:50AM +0100 I heard the voice of
Eirik Oeverby, and lo! it spake thus:
 
 The second is that mouse messages are actually *lost*, or bogus ones are 
 being generated. I guess it's the first, making moused or X misinterpret 
 the messages it gets. Where along the chain it fails I obviously have no 
 clue. The consequence of this is that when the mouse stops (like in #1) 
 but then resumes from an entirely different point - be it 10 pixels away 
 or at the other end of the screen - possibly even generating a button 
 push (but not necessarily the corresponding button release) message.

Note that I've had this to a greater or lesser extent for as long as I
can remember (certainly back to 3.0-CURRENT).  It corresponds with
syslog'd messages on my xconsole along the lines of:

Nov  3 12:46:13 mortis kernel: psmintr: out of sync (00c0 != ).
Nov  3 12:46:13 mortis kernel: psmintr: discard a byte (12).

It's certainly a lot more common (by orders of magnitude) on 5.x in the
past...   oh, I dunno, year-ish, than it was previously.  I lose mouse
function for maybe a second, then it squirms itself off somewhere on the
screen and sends some button press events.

I'm currently running 5.1-R, the traditional scheduler, a PS/2 mouse with
no moused.  And since I got them (much more rarely) with earlier
5-CURRENT's, and with 4-CURRENT's, etc, I can't see how it's scheduler
related.
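
To illustrate how a single lost byte could produce the symptoms described in this thread (a jump plus a stray button press), here is a minimal Python sketch of decoding the generic 3-byte PS/2 mouse packet. It is not the psm(4) driver's code: the names `decode_packet` and `decode_stream` and the sample byte streams are made up for illustration, and the sync test uses only the always-set bit 3 of the first byte, which may well differ from the mask the driver applies.

```python
def decode_packet(b0, b1, b2):
    """Decode one standard 3-byte PS/2 mouse packet.

    Returns None when byte 0 lacks the always-one bit (0x08), i.e. when
    it cannot be a packet header.
    """
    if not b0 & 0x08:
        return None
    dx = b1 - 256 if b0 & 0x10 else b1   # bit 4 of byte 0 is the X sign
    dy = b2 - 256 if b0 & 0x20 else b2   # bit 5 of byte 0 is the Y sign
    return {"left": bool(b0 & 0x01), "right": bool(b0 & 0x02),
            "middle": bool(b0 & 0x04), "dx": dx, "dy": dy}


def decode_stream(data):
    """Decode a byte stream, discarding one byte at a time when out of sync."""
    events, discarded, i = [], 0, 0
    while i + 3 <= len(data):
        event = decode_packet(data[i], data[i + 1], data[i + 2])
        if event is None:
            discarded += 1
            i += 1
        else:
            events.append(event)
            i += 3
    return events, discarded


# Two identical packets: no buttons held, 9 counts to the right.
clean = [0x08, 0x09, 0x00, 0x08, 0x09, 0x00]
# The same movement repeated, but one header byte was lost in transit.
lossy = [0x08, 0x09, 0x00, 0x09, 0x00, 0x08, 0x09, 0x00]

clean_events, clean_discarded = decode_stream(clean)
lossy_events, lossy_discarded = decode_stream(lossy)
print(clean_events)
print(lossy_events)
```

In the lossy stream the movement byte 0x09 happens to have bit 3 set, so this simple check accepts it as a header: the decoder reports a left-button press and motion on the wrong axis without ever noticing that it is out of sync. A stricter mask would catch more of these cases, at the price of discarding bytes as in the log lines above.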


 When you say you get the bogus mouse events (which I believe you are
 saying at least ;) only during load, I'm immediately thinking that yes,
 that might make sense.

I don't get it only under load; sometimes from flat idle.  However, it's
usually when I first move the mouse, after it sitting still for a while
(where 'while' can vary from a few seconds to a few days, of course); it
hardly ever happens in mid-move.



-- 
Matthew Fuller (MF4839)   |  [EMAIL PROTECTED]
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/

The only reason I'm burning my candle at both ends, is because I
  haven't figured out how to light the middle yet


Re: More ULE bugs fixed.

2003-11-05 Thread Eirik Oeverby
Matthew D. Fuller wrote:
 On Wed, Nov 05, 2003 at 11:28:50AM +0100 I heard the voice of
 Eirik Oeverby, and lo! it spake thus:

  The second is that mouse messages are actually *lost*, or bogus ones are
  being generated. I guess it's the first, making moused or X misinterpret
  the messages it gets. Where along the chain it fails I obviously have no
  clue. The consequence of this is that when the mouse stops (like in #1)
  but then resumes from an entirely different point - be it 10 pixels away
  or at the other end of the screen - possibly even generating a button
  push (but not necessarily the corresponding button release) message.

 Note that I've had this to a greater or lesser extent for as long as I
 can remember (certainly back to 3.0-CURRENT).  It corresponds with
 syslog'd messages on my xconsole along the lines of:

 Nov  3 12:46:13 mortis kernel: psmintr: out of sync (00c0 != ).
 Nov  3 12:46:13 mortis kernel: psmintr: discard a byte (12).

 It's certainly a lot more common (by orders of magnitude) on 5.x in the
 past...   oh, I dunno, year-ish, than it was previously.  I lose mouse
 function for maybe a second, then it squirms itself off somewhere on the
 screen and sends some button press events.

 I'm currently running 5.1-R, the traditional scheduler, a PS/2 mouse with
 no moused.  And since I got them (much more rarely) with earlier
 5-CURRENT's, and with 4-CURRENT's, etc, I can't see how it's scheduler
 related.

No idea, but I never got messages like the ones you mention, and it has 
absolutely never happened on 4.x or with SCHED_4BSD.
Weirdness. :)

/Eirik



  When you say you get the bogus mouse events (which I believe you are
  saying at least ;) only during load, I'm immediately thinking that yes,
  that might make sense.

 I don't get it only under load; sometimes from flat idle.  However, it's
 usually when I first move the mouse, after it sitting still for a while
 (where 'while' can vary from a few seconds to a few days, of course); it
 hardly ever happens in mid-move.






Re: More ULE bugs fixed.

2003-11-04 Thread Eirik Oeverby
Jeff Roberson wrote:
 On Mon, 3 Nov 2003, Eirik Oeverby wrote:

  Hi,

  Just recompiled yesterday, running sched_ule.c 1.75. It seems to have
  re-introduced the bogus mouse events I talked about earlier, after a
  period of having no problems with it. The change happened between 1.69
  and 1.75, and there's also the occasional glitch in keyboard input.

 How unfortunate, it seems to have fixed other problems.  Can you describe
 the mouse problem?  Is it jittery constantly or only under load?  Or are
 you having other problems?  Have you tried reverting to SCHED_4BSD?  What
 window manager do you run?

The problem has two parts. The mouse tends to 'lock up' for brief moments
when the system is under load, in particular during heavy UI operations
or when doing compile jobs and such.

The second part of the problem is related, and is manifested by the
mouse actually making movements I never asked it to make. It's almost as
if messages passed from the mouse (PS/2 type) through the kernel are
being corrupted or lost - moving the mouse slowly in one direction will
suddenly make it jump half across the screen and continue. Also it will
quite often produce bogus clicks and drags, i.e. I'll be moving the
mouse across the screen and suddenly it grabs something and doesn't let
go - as if it got a MouseRightDown event but no MouseRightRelease event
(or whatever they are called in the world you are in - I'm coming from
OS/2 and other obscure platforms ;).

The second problem usually follows the first - it's more likely to
happen when the system is under some kind of load. Heavy window
repainting/updating (Mozilla Thunderbird is a prime example, but other
apps can be just as guilty), compile jobs, VMWare going crazy with the
CPU, heavy disk/network I/O .. Anything that places load on the
system/kernel can cause this.

Running with SCHED_4BSD completely solves these problems, and the bogus
mouse event problems did NOT appear with sched_ule 1.69 (which is the
last one I tried before 1.75). It did appear with ~1.50 and thereabouts
though (as reported earlier in this thread).

I'm currently running WindowMaker as window manager, but the problem
also exists in Gnome and xfce4. Gnome is more likely to exhibit the
problem even during low system loads, given that it's more violent UI-wise.

You are right though, the later sched_ule revisions DO seem to be better
in many other respects - overall performance 'feels' better (at least in
console mode). The mouse issues make X kinda hard to use though.

Btw you might be interested in knowing that the keyboard from time to
time shows what I think is bogus input as well - I have a consistently
higher rate of failure when typing with sched_ule 1.75 than I had with
1.69, and it definitely feels as if keystrokes are getting lost or
repeated when they shouldn't be. Not often: I had two or three
'suspicious' typos while writing this message, and I am reluctant to say
it's a definite kernel issue, because I'm nowhere near a perfect typist
- but it is nevertheless worth noting and might even be worth looking
into. Might there be a connection between this and the mouse issues?

Thanks,
/Eirik

 Thanks for the report.

 Cheers,
 Jeff

  If you need me to do anything to track this down, let me know. I am, and
  have always been, running with moused, on a uniprocessor box (ThinkPad
  T21 1ghz p3).
  Best regards,
  /Eirik

Re: More ULE bugs fixed.

2003-11-04 Thread Sheldon Hearn
On (2003/11/04 09:29), Eirik Oeverby wrote:

 The problem is two parts: The mouse tends to 'lock up' for brief moments
 when the system is under load, in particular during heavy UI operations
 or when doing compile jobs and such.
 The second part of the problem is related, and is manifested by the
 mouse actually making movements I never asked it to make.

Wow, I just assumed it was a local problem.  I'm also seeing unrequested
mouse movement, as if the signals from movements are repeated or
amplified.

The thing is, I'm using 4BSD, not ULE, so I wouldn't trouble Jeff to
look for a cause for that specific problem in ULE.

Ciao,
Sheldon.


Re: More ULE bugs fixed.

2003-11-04 Thread Jeff Roberson
On Tue, 4 Nov 2003, Sheldon Hearn wrote:

 On (2003/11/04 09:29), Eirik Oeverby wrote:

  The problem is two parts: The mouse tends to 'lock up' for brief moments
  when the system is under load, in particular during heavy UI operations
  or when doing compile jobs and such.
  The second part of the problem is related, and is manifested by the
  mouse actually making movements I never asked it to make.

 Wow, I just assumed it was a local problem.  I'm also seeing unrequested
 mouse movement, as if the signals from movements are repeated or
 amplified.

 The thing is, I'm using 4BSD, not ULE, so I wouldn't trouble Jeff to
 look for a cause for that specific problem in ULE.

How long have you been seeing this?  Are you using a usb mouse?  Can you
try with PS/2 if you are?

Thanks,
Jeff


 Ciao,
 Sheldon.




Re: More ULE bugs fixed.

2003-11-04 Thread Eirik Oeverby
Sheldon Hearn wrote:
 On (2003/11/04 09:29), Eirik Oeverby wrote:

  The problem is two parts: The mouse tends to 'lock up' for brief moments
  when the system is under load, in particular during heavy UI operations
  or when doing compile jobs and such.
  The second part of the problem is related, and is manifested by the
  mouse actually making movements I never asked it to make.

 Wow, I just assumed it was a local problem.  I'm also seeing unrequested
 mouse movement, as if the signals from movements are repeated or
 amplified.

 The thing is, I'm using 4BSD, not ULE, so I wouldn't trouble Jeff to
 look for a cause for that specific problem in ULE.

That is indeed interesting. When I return to 4BSD, everything works very
nicely. Perhaps this is some interrupt issue or something? I'll
recompile tonight and try with a new kernel (new interrupt stuff for
i386 has been checked in recently) and report back.
Sorry about the (possibly) false alarm!

/Eirik


 Ciao,
 Sheldon.




Re: More ULE bugs fixed.

2003-11-03 Thread Eirik Oeverby
Hi,

Just recompiled yesterday, running sched_ule.c 1.75. It seems to have
re-introduced the bogus mouse events I talked about earlier, after a
period of having no problems with it. The change happened between 1.69
and 1.75, and there's also the occasional glitch in keyboard input.

If you need me to do anything to track this down, let me know. I am, and 
have always been, running with moused, on a uniprocessor box (ThinkPad 
T21 1ghz p3).

Best regards,
/Eirik
Jeff Roberson wrote:
On Fri, 31 Oct 2003, Bruno Van Den Bossche wrote:


Jeff Roberson [EMAIL PROTECTED] wrote:


On Wed, 29 Oct 2003, Jeff Roberson wrote:


On Thu, 30 Oct 2003, Bruce Evans wrote:


Test for scheduling buildworlds:

cd /usr/src/usr.bin
for i in obj depend all
do
    MAKEOBJDIRPREFIX=/somewhere/obj time make -s -j16 $i
done >/tmp/zqz 2>&1

(Run this with an empty /somewhere/obj.  The all stage doesn't
quite finish.)  On an ABIT BP6 system with a 400MHz and a 366MHz
CPU, with /usr (including /usr/src) nfs-mounted (with 100 Mbps
ethernet and a reasonably fast server) and /somewhere/obj
ufs1-mounted (on a fairly slow disk; no soft-updates), this
gives the following times:

SCHED_ULE-yesterday, with not so careful setup:
   40.37 real     8.26 user     6.26 sys
  278.90 real    59.35 user    41.32 sys
  341.82 real   307.38 user    69.01 sys
SCHED_ULE-today, run immediately after booting:
   41.51 real     7.97 user     6.42 sys
  306.64 real    59.66 user    40.68 sys
  346.48 real   305.54 user    69.97 sys
SCHED_4BSD-yesterday, with not so careful setup:
  [same as today except the depend step was 10 seconds
  slower (real)]
SCHED_4BSD-today, run immediately after booting:
   18.89 real     8.01 user     6.66 sys
  128.17 real    58.33 user    43.61 sys
  291.59 real   308.48 user    72.33 sys
SCHED_4BSD-yesterday, with a UP kernel (running on the 366 MHz
CPU) with many local changes and not so careful setup:
   17.39 real     8.28 user     5.49 sys
  130.51 real    60.97 user    34.63 sys
  390.68 real   310.78 user    60.55 sys

Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for
the obj and depend stages.  These stages have little
parallelism.  SCHED_ULE was only 19% slower for the all stage.
...
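
The percentages in the summary can be checked directly from the "today" wall-clock figures quoted above. The following Python sketch does only that arithmetic; the dictionary and function names are illustrative and not part of the original benchmark.

```python
# Wall-clock ("real") seconds from the two "today" runs quoted above.
ule_today = {"obj": 41.51, "depend": 306.64, "all": 346.48}
bsd_today = {"obj": 18.89, "depend": 128.17, "all": 291.59}


def slowdown(ule, bsd):
    """Return ULE real time divided by 4BSD real time for each stage."""
    return {stage: ule[stage] / bsd[stage] for stage in ule}


ratios = slowdown(ule_today, bsd_today)
for stage, ratio in ratios.items():
    print(f"{stage:7s} ULE/4BSD = {ratio:.2f}  ({(ratio - 1) * 100:+.0f}%)")
```

The obj and depend ratios come out above 2, and the all stage at roughly 1.19, which matches the "more than twice as slow" and "19% slower" statements.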
I reran this with -current (sched_ule.c 1.68, etc.).  Result: no
significant change.  However, with a UP kernel there was no
significant difference between the times for SCHED_ULE and
SCHED_4BSD.
There was a significant difference on UP until last week.  I'm
working on SMP now.  I have some patches but they aren't quite ready
yet.
I have committed my SMP fixes.  I would appreciate it if you could post
updated results.  ULE now outperforms 4BSD in a single threaded kernel
compile and performs almost identically in a 16 way make.  I still
have a few more things that I can do to improve the situation.  I
would expect ULE to pull further ahead in the months to come.
I recently had to complete a little piece of software in a course on
parallel computing.  I've put it online[1] (we only had to write the
pract2.cpp file).  It calculates the inverse of a Vandermonde matrix and
allows you to spawn multiple slave-processes who each perform a part of
the work.  Everything happens in memory so
I've used it lately to test the different changes you made to
sched_ule.c and these last fixes do improve the performance on my dual
p3 machine a lot.
Here are the results of my (very limited tests) :

sched4bsd
---
dimension   slaves  time
1000        1       90.925408
1000        2       58.897038
200         1       0.735962
200         2       0.676660

sched_ule 1.68
---
dimension   slaves  time
1000        1       90.951015
1000        2       70.402845
200         1       0.743551
200         2       1.900455

sched_ule 1.70
---
dimension   slaves  time
1000        1       90.782309
1000        2       57.207351
200         1       0.739998
200         2       0.383545

I'm not really sure if this is very relevant to you, but from the
end-user point of view (me :-)) this does mean something.
Thanks!
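
One way to read the table above is as a speedup from one slave to two on the dual-CPU machine. The Python sketch below computes that ratio, assuming the rows are read as dimension 1000 or 200 with 1 or 2 slaves; the `results` layout and the `speedup` helper are illustrative names only.

```python
# (dimension, slaves) -> seconds, copied from the table above.
results = {
    "sched4bsd": {(1000, 1): 90.925408, (1000, 2): 58.897038,
                  (200, 1): 0.735962, (200, 2): 0.676660},
    "sched_ule 1.68": {(1000, 1): 90.951015, (1000, 2): 70.402845,
                       (200, 1): 0.743551, (200, 2): 1.900455},
    "sched_ule 1.70": {(1000, 1): 90.782309, (1000, 2): 57.207351,
                       (200, 1): 0.739998, (200, 2): 0.383545},
}


def speedup(times, dimension):
    """Time with one slave divided by time with two (above 1 means faster)."""
    return times[(dimension, 1)] / times[(dimension, 2)]


for scheduler, times in results.items():
    for dimension in (1000, 200):
        print(f"{scheduler:15s} dim {dimension:4d}: "
              f"{speedup(times, dimension):.2f}x")
```

On these numbers, sched_ule 1.68 is slower with two slaves than with one for the small matrix (a ratio below 1), while 1.70 gets close to a 2x speedup there and edges past 4BSD for the large one.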


I welcome the feedback, positive or negative, as it helps me improve
things.  Thanks for the report!  Could you run this again under 4bsd and
ULE with the following in your .cshrc:
set time= ( 5 "%Uu %Ss %E %P %X+%Dk %I+%Oio %Fpf+%Ww %cc/%ww" )

And then time ./testpract 200 2, etc.  This will give me a few hints about
what's impacting your performance.
Thanks!
Jeff

[1] http://users.pandora.be/bomberboy/mptest/final.tar.bz2
It can be used by running testpract2 with two arguments, the dimension
of the matrix and the number of slaves.  example './testpract2 200 2'
will create a matrix with 

Re: More ULE bugs fixed.

2003-11-03 Thread Bruce Evans
On Sun, 2 Nov 2003, Jeff Roberson wrote:

 On Sat, 1 Nov 2003, Bruce Evans wrote:

  My simple make benchmark now takes infinitely longer with ULE under SMP,
  since make -j 16 with ULE under SMP now hangs nfs after about a minute.
  4BSD works better.  However, some networking bugs have developed in the
  last few days.  One of their manifestations is that SMP kernels always
  panic in sbdrop() on shutdown.

This was fixed by setting debug.mpsafenet to 0 (fxp is apparently not MPSAFE
yet).

The last run with sched_ule.c 1.75 shows little difference between ULE
and 4BSD:

% *** zqz.4bsd.1  Wed Oct 29 22:03:29 2003
% --- zqz.ule.3   Sun Nov  2 22:58:53 2003
% ***************
% *** 4 ****
% --- 5,6 ----
% + ===> atm
% + ===> atm/sscop

The tree compiled by 4BSD is 4 days older so ULE does these extra.

% ***************
% *** 227 ****
% !        18.49 real         8.26 user         6.38 sys
% --- 229 ----
% !        18.44 real         8.00 user         6.43 sys

Differences for make obj (all this in usr.bin tree).

% ***************
% *** 229,233 ****
% !        265  average shared memory size
% !        116  average unshared data size
% !        125  average unshared stack size
% !      23222  page reclaims
% !         26  page faults
% --- 231,235 ----
% !        274  average shared memory size
% !        118  average unshared data size
% !        128  average unshared stack size
% !      22760  page reclaims
% !         25  page faults
% ***************
% *** 236,241 ****
% !        918  block output operations
% !       9893  messages sent
% !       9893  messages received
% !        230  signals received
% !      13034  voluntary context switches
% !       1216  involuntary context switches
% --- 238,243 ----
% !        926  block output operations
% !       9973  messages sent
% !       9973  messages received
% !        232  signals received
% !      17432  voluntary context switches
% !       1583  involuntary context switches

Tiny differences in time -l output for obj stage, except ULE does more
context switches.

The signals are mostly SIGCHLD (needed to fix make(1)).

% ***************
% *** 245 ****
% --- 248,249 ----
% + ===> atm
% + ===> atm/sscop
% ***************
% *** 506 ****
% !       126.67 real        57.42 user        43.83 sys
% --- 510 ----
% !       124.43 real        58.07 user        42.17 sys
% ***************
% *** 508,512 ****
% !       1973  average shared memory size
% !        803  average unshared data size
% !        128  average unshared stack size
% !     203770  page reclaims
% !       1459  page faults
% --- 512,516 ----
% !       1920  average shared memory size
% !        784  average unshared data size
% !        127  average unshared stack size
% !     203124  page reclaims
% !       1464  page faults
% ***************
% *** 514,520 ****
% !        165  block input operations
% !       1463  block output operations
% !      83118  messages sent
% !      83117  messages received
% !        265  signals received
% !     100319  voluntary context switches
% !       8113  involuntary context switches
% --- 518,524 ----
% !        167  block input operations
% !       1469  block output operations
% !      83234  messages sent
% !      83236  messages received
% !        267  signals received
% !     125750  voluntary context switches
% !      17825  involuntary context switches

Similarly for depend stage.

% ***************
% *** 524 ****
% --- 529,530 ----
% + ===> atm
% + ===> atm/sscop
% ***************
% *** 701 ****
% !       291.30 real       307.00 user        73.77 sys
% --- 707 ----
% !       290.28 real       308.16 user        74.05 sys
% ***************
% *** 703,707 ****
% !       2073  average shared memory size
% !       2076  average unshared data size
% !        127  average unshared stack size
% !     624020  page reclaims
% !        156  page faults
% --- 709,713 ----
% !       2084  average shared memory size
% !       2056  average unshared data size
% !        128  average unshared stack size
% !     626651  page reclaims
% !        154  page faults
% ***************
% *** 709,715 ****
% !         72  block input operations
% !       2122  block output operations
% !      45315  messages sent
% !      45317  messages received
% !        691  signals received
% !     195785  voluntary context switches
% !      58130  involuntary context switches
% --- 715,721 ----
% !         83  block input operations
% !       2133  block output operations
% !      45532  messages sent
% !      45524  messages received
% !        759  signals received
% !     228998  voluntary context switches
% !     128078  involuntary context switches

Similarly for the all stage.  The benchmark was not run carefully enough
for the 1 second differences in the times to be significant.
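
To put a number on "ULE does more context switches", the following Python sketch computes the relative increase per stage from the time -l counts in the diff above, taking the zqz.4bsd.1 side as 4BSD and the zqz.ule.3 side as ULE. The table layout and the helper name are illustrative only.

```python
# (4BSD, ULE) context switch counts from the time -l diff, per stage.
switches = {
    "obj":    {"voluntary": (13034, 17432),   "involuntary": (1216, 1583)},
    "depend": {"voluntary": (100319, 125750), "involuntary": (8113, 17825)},
    "all":    {"voluntary": (195785, 228998), "involuntary": (58130, 128078)},
}


def percent_increase(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


increase = {
    stage: {kind: percent_increase(*pair) for kind, pair in kinds.items()}
    for stage, kinds in switches.items()
}
for stage, kinds in increase.items():
    print(f"{stage:7s} voluntary {kinds['voluntary']:+6.1f}%  "
          f"involuntary {kinds['involuntary']:+6.1f}%")
```

Voluntary switches rise by roughly a sixth to a third depending on the stage, while involuntary switches more than double in the depend and all stages, even though the real times are nearly identical.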

 You commented on the nice cutoff before.  What do you believe the correct
 behavior is?  In ULE I went to great lengths to be certain that I emulated
 the old behavior of denying nice +20 

Re: More ULE bugs fixed.

2003-11-03 Thread David O'Brien
On Tue, Nov 04, 2003 at 12:33:48AM +1100, Bruce Evans wrote:
 I think the existence of rtprio and a non-broken idprio makes infinite
 deprioritization using niceness unnecessary.  (idprio is still broken
 (not available to users) in -current, but it doesn't need to be if
 priority propagation is working as it should be.)  It's safer and fairer
 for all niced processes to not completely prevent each other being
 scheduled, and use the special scheduling classes for cases where this
 is not wanted.  I'd mainly like the slices for nice -20 vs nice --20
 processes to be very small and/or infrequent.

I agree.  With idprio, there is no need for a special nice value that is
handled outside the normal rules of nice.  I always thought that was a wart
after using Irix, which has a working idprio.
 
-- 
-- David  ([EMAIL PROTECTED])


How nice should behave (was Re: More ULE bugs fixed.)

2003-11-03 Thread Jeff Roberson

On Tue, 4 Nov 2003, Bruce Evans wrote:

 On Sun, 2 Nov 2003, Jeff Roberson wrote:

  You commented on the nice cutoff before.  What do you believe the correct
  behavior is?  In ULE I went to great lengths to be certain that I emulated
  the old behavior of denying nice +20 processes cpu time when anything nice
  0 or above was running.  As a result of that, nice -20 processes inhibit
  any processes with a nice below zero from receiving cpu time.  Prior to a
  commit earlier today, nice -20 would stop nice 0 processes that were
  non-interactive.  I've changed that though so nice 0 will always be able
  to run, just with a small slice.  Based on your earlier comments, you
  don't believe that this behavior is correct, why, and what would you like
  to see?

 Only RELENG_4 has that old behaviour.

 I think the existence of rtprio and a non-broken idprio makes infinite
 deprioritization using niceness unnecessary.  (idprio is still broken
 (not available to users) in -current, but it doesn't need to be if
 priority propagation is working as it should be.)  It's safer and fairer
 for all niced processes to not completely prevent each other being
 scheduled, and use the special scheduling classes for cases where this
 is not wanted.  I'd mainly like the slices for nice -20 vs nice --20
 processes to be very small and/or infrequent.

idprio should be able to function properly since we have priority
propagation and elevated priorities for m/tsleep.  I believe that many
people rely on the nice +20 behavior.  We could change this and make it a
matter of user education.

ULE's nice mechanism is very flexible in this regard.  I would only have
to change one define to force the slice assignment to scale across the
whole slice range.  Although, I only have 14 possible slice values to
hand out, so small differences would be meaningless.
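
The point about 14 slice values can be made concrete with a small Python sketch. It is purely hypothetical: it assumes the usual nice range of -20 to +20 and a simple linear mapping onto 14 abstract slice levels, which is not necessarily how sched_ule.c assigns slices. The names `SLICE_LEVELS` and `slice_level` are invented for the example.

```python
NICE_MIN, NICE_MAX = -20, 20   # conventional nice range (assumed)
SLICE_LEVELS = 14              # number of slice values mentioned above


def slice_level(nice):
    """Map a nice value linearly onto levels 1..SLICE_LEVELS.

    Level SLICE_LEVELS is the largest slice (nice -20); level 1 is the
    smallest (nice +20). Hypothetical mapping, not the ULE code.
    """
    span = NICE_MAX - NICE_MIN
    return (NICE_MAX - nice) * (SLICE_LEVELS - 1) // span + 1


levels = {nice: slice_level(nice) for nice in range(NICE_MIN, NICE_MAX + 1)}
for nice in (-20, -1, 0, 1, 2, 20):
    print(f"nice {nice:+3d} -> slice level {levels[nice]}")
```

With 41 nice values and only 14 levels, about three adjacent nice values share each level under this mapping, so for example nice 0 and nice 1 would receive the same slice.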


 Bruce




Re: More ULE bugs fixed.

2003-11-03 Thread Jeff Roberson
On Mon, 3 Nov 2003, Eirik Oeverby wrote:

 Hi,

 Just recompiled yesterday, running sched_ule.c 1.75. It seems to have
 re-introduced the bogus mouse events I talked about earlier, after a
 period of having no problems with it. The change happened between 1.69
 and 1.75, and there's also the occasional glitch in keyboard input.

How unfortunate, it seems to have fixed other problems.  Can you describe
the mouse problem?  Is it jittery constantly or only under load?  Or are
you having other problems?  Have you tried reverting to SCHED_4BSD?  What
window manager do you run?

Thanks for the report.

Cheers,
Jeff


 If you need me to do anything to track this down, let me know. I am, and
 have always been, running with moused, on a uniprocessor box (ThinkPad
 T21 1ghz p3).

 Best regards,
 /Eirik

 Jeff Roberson wrote:
  On Fri, 31 Oct 2003, Bruno Van Den Bossche wrote:
 
 
 Jeff Roberson [EMAIL PROTECTED] wrote:
 
 
 On Wed, 29 Oct 2003, Jeff Roberson wrote:
 
 
 On Thu, 30 Oct 2003, Bruce Evans wrote:
 
 
 Test for scheduling buildworlds:
 
 cd /usr/src/usr.bin
 for i in obj depend all
 do
     MAKEOBJDIRPREFIX=/somewhere/obj time make -s -j16 $i
 done >/tmp/zqz 2>&1
 
 (Run this with an empty /somewhere/obj.  The all stage doesn't
 quite finish.)  On an ABIT BP6 system with a 400MHz and a 366MHz
 CPU, with /usr (including /usr/src) nfs-mounted (with 100 Mbps
 ethernet and a reasonably fast server) and /somewhere/obj
 ufs1-mounted (on a fairly slow disk; no soft-updates), this
 gives the following times:
 
 SCHED_ULE-yesterday, with not so careful setup:
     40.37 real       8.26 user       6.26 sys
    278.90 real      59.35 user      41.32 sys
    341.82 real     307.38 user      69.01 sys
 SCHED_ULE-today, run immediately after booting:
     41.51 real       7.97 user       6.42 sys
    306.64 real      59.66 user      40.68 sys
    346.48 real     305.54 user      69.97 sys
 SCHED_4BSD-yesterday, with not so careful setup:
    [same as today except the depend step was 10 seconds slower (real)]
 SCHED_4BSD-today, run immediately after booting:
     18.89 real       8.01 user       6.66 sys
    128.17 real      58.33 user      43.61 sys
    291.59 real     308.48 user      72.33 sys
 SCHED_4BSD-yesterday, with a UP kernel (running on the 366 MHz CPU) with
 many local changes and not so careful setup:
     17.39 real       8.28 user       5.49 sys
    130.51 real      60.97 user      34.63 sys
    390.68 real     310.78 user      60.55 sys
 
 Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for
 the obj and depend stages.  These stages have little
 parallelism.  SCHED_ULE was only 19% slower for the all stage.
 ...
 
 I reran this with -current (sched_ule.c 1.68, etc.).  Result: no
 significant change.  However, with a UP kernel there was no
 significant difference between the times for SCHED_ULE and
 SCHED_4BSD.
 
 There was a significant difference on UP until last week.  I'm
 working on SMP now.  I have some patches but they aren't quite ready
 yet.
 
 I have committed my SMP fixes.  I would appreciate it if you could post
 update results.  ULE now outperforms 4BSD in a single threaded kernel
 compile and performs almost identically in a 16 way make.  I still
 have a few more things that I can do to improve the situation.  I
 would expect ULE to pull further ahead in the months to come.
 
 I recently had to complete a little piece of software in a course on
 parallel computing.  I've put it online[1] (we only had to write the
 pract2.cpp file).  It calculates the inverse of a Vandermonde matrix and
 allows you to spawn multiple slave-processes who each perform a part of
 the work.  Everything happens in memory so
 I've used it lately to test the different changes you made to
 sched_ule.c and these last fixes do improve the performance on my dual
 p3 machine a lot.
 
 Here are the results of my (very limited tests) :
 
 sched4bsd
 ---
 dimension   slaves  time
 1000        1       90.925408
 1000        2       58.897038
 
 200         1       0.735962
 200         2       0.676660
 
 sched_ule 1.68
 ---
 dimension   slaves  time
 1000        1       90.951015
 1000        2       70.402845
 
 200         1       0.743551
 200         2       1.900455
 
 sched_ule 1.70
 ---
 dimension   slaves  time
 1000        1       90.782309
 1000        2       57.207351
 
 200         1       0.739998
 200         2       0.383545
 
 
 I'm not really sure if this is very relevant to you, but from the
 end-user point of view (me :-)) this does mean something.
 Thanks!
 
 
  I welcome the feedback, positive or negative, as it helps me improve
  things.  Thanks for the report!  Could you run 

Re: More ULE bugs fixed.

2003-11-02 Thread Jeff Roberson
On Sat, 1 Nov 2003, Bruce Evans wrote:

 On Fri, 31 Oct 2003, Jeff Roberson wrote:

  I have committed my SMP fixes.  I would appreciate it if you could post
  update results.  ULE now outperforms 4BSD in a single threaded kernel
  compile and performs almost identically in a 16 way make.  I still have a
  few more things that I can do to improve the situation.  I would expect
  ULE to pull further ahead in the months to come.

 My simple make benchmark now takes infinitely longer with ULE under SMP,
 since make -j 16 with ULE under SMP now hangs nfs after about a minute.
 4BSD works better.  However, some networking bugs have developed in the
 last few days.  One of their manifestations is that SMP kernels always
 panic in sbdrop() on shutdown.

  The nice issue is still outstanding, as is the incorrect wcpu reporting.

 It may be related to nfs processes not getting any cycles even when there
 are no niced processes.


I've just run your script myself.  I was using sched_ule.c rev 1.75.  I
did not encounter any problem.  I also have not run it with 4BSD so I
don't have any performance comparisons.  Hopefully the next time you have
an opportunity to test, things will go smoothly.  I fixed a bug in
sched_prio() that may have caused this behavior.

You commented on the nice cutoff before.  What do you believe the correct
behavior is?  In ULE I went to great lengths to be certain that I emulated
the old behavior of denying nice +20 processes cpu time when anything nice
0 or above was running.  As a result of that, nice -20 processes inhibit
any processes with a nice below zero from receiving cpu time.  Prior to a
commit earlier today, nice -20 would stop nice 0 processes that were
non-interactive.  I've changed that though so nice 0 will always be able
to run, just with a small slice.  Based on your earlier comments, you
don't believe that this behavior is correct.  Why not, and what would you
like to see?
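
A toy model of the cutoff as described here may help frame the question.  It assumes a fixed window of 20 nice levels starting at the least nice runnable process, which is inferred from the +20 versus 0 example and is not taken from sched_ule.c.  It also ignores interactivity and the change that now lets nice 0 run with a small slice.  The function name is made up for this sketch.

```sh
#!/bin/sh
# Toy model of the cutoff described above, not ULE's implementation.
# WINDOW=20 is an assumption taken from the "+20 versus 0" example.
WINDOW=20

# gets_slice MIN_NICE NICE -> prints "yes" if NICE is inside the window
# that starts at the least nice value currently runnable, else "no".
gets_slice() {
    if [ $(( $2 - $1 )) -lt "$WINDOW" ]; then
        echo yes
    else
        echo no
    fi
}

gets_slice 0 19     # yes
gets_slice 0 20     # no: nice +20 is starved while nice 0 runs
gets_slice -20 0    # no: the pre-commit side effect for nice 0
gets_slice -20 -1   # yes
```

The same window that starves nice +20 under a nice 0 load also starves nice 0 under a nice -20 load, which is the trade-off the questions above are about.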

Thanks,
Jeff



 Bruce


Re: More ULE bugs fixed.

2003-11-02 Thread Bruce Evans
On Fri, 31 Oct 2003, Sam Leffler wrote:

 On Friday 31 October 2003 09:04 am, Bruce Evans wrote:

  My simple make benchmark now takes infinitely longer with ULE under SMP,
  since make -j 16 with ULE under SMP now hangs nfs after about a minute.
  4BSD works better.  However, some networking bugs have developed in the
  last few days.  One of their manifestations is that SMP kernels always
  panic in sbdrop() on shutdown.

 I'm looking at something similar now.  If you have a stack trace please send
 it to me (along with any other info).  You might also try booting
 debug.mpsafenet=0.

Turning off mpsafenet fixed all these problems.

These console messages are with it not turned off.  fxp is the only
physical network device.

%%%
WARNING: loader(8) metadata is missing!
[ preserving 869208 bytes of kernel symbol table ]
Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.1-CURRENT #1005: Sun Nov  2 20:38:42 EST 2003
[EMAIL PROTECTED]:/c/sysc/i386/compile/smp
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Pentium II/Pentium II Xeon/Celeron (400.91-MHz 686-class CPU)
  Origin = GenuineIntel  Id = 0x665  Stepping = 5
  
Features=0x183fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR>
real memory  = 268435456 (256 MB)
avail memory = 255369216 (243 MB)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 - irq 0
IOAPIC #0 intpin 17 - irq 9
IOAPIC #0 intpin 18 - irq 11
IOAPIC #0 intpin 19 - irq 5
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): apic id:  0, version: 0x00040011, at 0xfee0
 cpu1 (AP):  apic id:  1, version: 0x00040011, at 0xfee0
 io0 (APIC): apic id:  2, version: 0x00170011, at 0xfec0
Pentium Pro MTRR support enabled
npx0: math processor on motherboard
npx0: flags 0x80 npx0: INT 16 interface
pcibios: BIOS version 2.10
Using $PIR table, 8 entries at 0xc00fdef0
pcib0: Intel 82443BX (440 BX) host to PCI bridge at pcibus 0 on motherboard
pci0: PCI bus on pcib0
pcib1: PCI-PCI bridge at device 1.0 on pci0
pci1: PCI bus on pcib1
pci1: display, VGA at device 0.0 (no driver attached)
isab0: PCI-ISA bridge at device 7.0 on pci0
isa0: ISA bus on isab0
atapci0: Intel PIIX4 UDMA33 controller port 0xf000-0xf00f at device 7.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata0: [MPSAFE]
ata1: at 0x170 irq 15 on atapci0
ata1: [MPSAFE]
pci0: serial bus, USB at device 7.2 (no driver attached)
piix0: PIIX Timecounter port 0x5000-0x500f at device 7.3 on pci0
Timecounter PIIX frequency 3579545 Hz quality 0
pci0: multimedia, video at device 11.0 (no driver attached)
pci0: multimedia at device 11.1 (no driver attached)
fxp0: Intel 82559 Pro/100 Ethernet port 0xa400-0xa43f mem 
0xea00-0xea0f,0xea104000-0xea104fff irq 9 at device 13.0 on pci0
fxp0: Ethernet address 00:90:27:99:02:99
miibus0: MII bus on fxp0
inphy0: i82555 10/100 media interface on miibus0
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: [MPSAFE]
puc0: Titan VScom PCI-200HV2 port 0xb000-0xb01f,0xac00-0xac07,0xa800-0xa807 mem 
0xea103000-0xea103fff,0xea102000-0xea102fff irq 5 at device 17.0 on pci0
sio4: Titan VScom PCI-200HV2 on puc0
sio4: type 16550A
sio5: Titan VScom PCI-200HV2 on puc0
sio5: type 16550A
atapci1: HighPoint HPT366 UDMA66 controller port 
0xbc00-0xbcff,0xb800-0xb803,0xb400-0xb407 irq 11 at device 19.0 on pci0
atapci1: [MPSAFE]
ata2: at 0xb400 on atapci1
ata2: [MPSAFE]
atapci2: HighPoint HPT366 UDMA66 controller port 
0xc800-0xc8ff,0xc400-0xc403,0xc000-0xc007 irq 11 at device 19.1 on pci0
atapci2: [MPSAFE]
ata3: at 0xc000 on atapci2
ata3: [MPSAFE]
orm0: Option ROMs at iomem 0xc8000-0xcbfff,0xc-0xc7fff on isa0
fdc0: Enhanced floppy controller (i82077, NE72065 or clone) at port 
0x3f7,0x3f0-0x3f5 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1440-KB 3.5 drive on fdc0 drive 0
atkbdc0: Keyboard controller (i8042) at port 0x64,0x60 on isa0
atkbd0: AT Keyboard flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: PS/2 Mouse irq 12 on atkbdc0
psm0: model Generic PS/2 mouse, device ID 0
vga0: Generic ISA VGA at port 0x3c0-0x3df iomem 0xa-0xb on isa0
sc0: System console at flags 0x100 on isa0
sc0: VGA 16 virtual consoles, flags=0x100
sio0 at port 0x3f8-0x3ff irq 4 flags 0x90 on isa0
sio0: type 16550A, console
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
cy0 at iomem 0xd4000-0xd5fff irq 10 on isa0
cy0: driver is using old-style compatibility shims
ppc0: Parallel port at port 0x378-0x37f irq 7 on isa0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/16 bytes threshold
ppbus0: Parallel port bus on ppc0
ppbus0: IEEE1284 device found
Probing for PnP devices on ppbus0:
plip0: PLIP network interface on ppbus0
lpt0: Printer on ppbus0
lpt0: Interrupt-driven port
ppi0: Parallel I/O on ppbus0
unknown: PNP0303 can't assign resources (port)
speaker0: PC 

Re: More ULE bugs fixed.

2003-11-02 Thread Bruno Van Den Bossche
Jeff Roberson [EMAIL PROTECTED] wrote:

 On Fri, 31 Oct 2003, Bruno Van Den Bossche wrote:
[...]
  I recently had to complete a little piece of software in a course on
  parallel computing.  I've put it online[1] (we only had to write the
  pract2.cpp file).  It calculates the inverse of a Vandermonde matrix and
  allows you to spawn multiple slave-processes who each perform a part of
  the work.  Everything happens in memory so
  I've used it lately to test the different changes you made to
  sched_ule.c and these last fixes do improve the performance on my dual
  p3 machine a lot.
 
  Here are the results of my (very limited tests) :
 
  sched4bsd
  ---
  dimension   slaves  time
  1000        1       90.925408
  1000        2       58.897038
 
  200         1       0.735962
  200         2       0.676660
 
  sched_ule 1.68
  ---
  dimension   slaves  time
  1000        1       90.951015
  1000        2       70.402845
 
  200         1       0.743551
  200         2       1.900455
 
  sched_ule 1.70
  ---
  dimension   slaves  time
  1000        1       90.782309
  1000        2       57.207351
 
  200         1       0.739998
  200         2       0.383545
 
 
  I'm not really sure if this is very relevant to you, but from the
  end-user point of view (me :-)) this does mean something.
  Thanks!
 
 I welcome the feedback, positive or negative, as it helps me improve
 things.  Thanks for the report!  Could you run this again under 4bsd and
 ULE with the following in your .cshrc:
 
 set time= ( 5 "%Uu %Ss %E %P %X+%Dk %I+%Oio %Fpf+%Ww %cc/%ww" )
 
 And then time ./testpract 200 2, etc.  This will give me a few hints about
 what's impacting your performance.

The program can run as a slave or master.  So one should run one master and multiple 
slaves and they all work on a piece of shared memory.  So I've timed the individual 
processes, as the wrapper-script test_pract2 doesn't do more than launch a few
processes in the background.  I don't think the output of that is very relevant.

Here's the result:

sched_4bsd 1.26

1000 1
master: 49.172u 0.187s 2:21.54 34.8% 15+10182k 0+0io 0pf+0w 5962c/65w
slave : 90.326u 0.250s 1:30.75 99.8% 15+168k 0+0io 0pf+0w 9156c/35w

1000 2
master: 49.113u 0.226s 1:49.94 44.8% 15+10181k 0+0io 0pf+0w 5942c/63w
slave1: 55.211u 0.326s 0:59.11 93.9% 15+166k 0+0io 0pf+0w 11129c/2224w
slave2: 54.897u 0.363s 0:58.62 94.2% 15+167k 0+0io 0pf+0w 7111c/6129w

200 1
master: 0.377u 0.007s 0:02.39 15.4% 15+589k 0+0io 0pf+0w 38c/13w
slave : 0.711u 0.031s 0:00.74 100.0% 15+169k 0+0io 0pf+0w 85c/1w

200 2
master: 0.376u 0.007s 0:02.87 12.8% 16+602k 0+0io 0pf+0w 41c/11w
slave1: 0.388u 0.006s 0:01.03 36.8% 18+201k 0+0io 0pf+0w 1245c/408w
slave2: 0.345u 0.038s 0:00.68 54.4% 34+158k 0+0io 0pf+0w 432c/1215w


sched_ule 1.75

1000 1
master: 49.097u 0.163s 2:21.32 34.8% 15+10186k 0+0io 0pf+0w 6197c/163w
slave : 90.157u 0.398s 1:30.82 99.6% 15+168k 0+0io 0pf+0w 11568c/49w

1000 2
master: 49.132u 0.164s 1:48.15 45.5% 15+10155k 0+0io 0pf+0w 6517c/276w
slave1: 55.634u 0.406s 0:57.52 97.4% 15+169k 0+0io 0pf+0w 12745c/9628w
slave2: 55.416u 0.391s 0:57.13 97.6% 15+168k 0+0io 0pf+0w 12448c/10063w

200 1
master: 0.369u 0.016s 0:02.52 14.6% 15+577k 0+0io 0pf+0w 92c/35w
slave : 0.690u 0.054s 0:00.74 100.0% 15+171k 0+0io 0pf+0w 147c/13w

200 2
master: 0.376u 0.007s 0:02.47 14.9% 15+589k 0+0io 0pf+0w 87c/21w
slave1: 0.331u 0.023s 0:00.70 50.0% 15+173k 0+0io 0pf+0w 466c/2135w
slave2: 0.304u 0.040s 0:00.39 87.1% 15+166k 0+0io 0pf+0w 412c/2119w
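
For anyone reading these lines, the percentage column (%P) is the CPU share computed from the user time (%U), system time (%S) and elapsed time (%E).  The sketch below recomputes it from the other three fields as a sanity check; the helper names are made up for this example, and the result can differ from the shell's own figure in the last digit.

```sh
#!/bin/sh
# to_seconds converts a csh-style elapsed time ([h:]m:ss.ss) to seconds
to_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; printf "%.2f\n", s }'
}

# cpu_pct USER SYS ELAPSED -> (user + sys) / elapsed as a percentage
cpu_pct() {
    awk -v u="$1" -v s="$2" -v e="$(to_seconds "$3")" \
        'BEGIN { printf "%.1f\n", 100 * (u + s) / e }'
}

to_seconds 1:30.75              # 90.75
cpu_pct 55.634 0.406 0:57.52    # sched_ule slave1, 1000 2 run: 97.4
```

A slave near 100% was running almost the whole time it existed, while the master's low figure reflects time spent waiting for the slaves.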

 
  [1] http://users.pandora.be/bomberboy/mptest/final.tar.bz2
  It can be used by running testpract2 with two arguments, the dimension
  of the matrix and the number of slaves.  example './testpract2 200 2'
  will create a matrix with dimension 200 and 2 slaves.

-- 
Bruno

This fortune is inoperative.  Please try another.


Re: More ULE bugs fixed.

2003-10-31 Thread Jeff Roberson
On Wed, 29 Oct 2003, Jeff Roberson wrote:

 On Thu, 30 Oct 2003, Bruce Evans wrote:

   Test for scheduling buildworlds:
  
 cd /usr/src/usr.bin
 for i in obj depend all
 do
 MAKEOBJDIRPREFIX=/somewhere/obj time make -s -j16 $i
 done >/tmp/zqz 2>&1
  
   (Run this with an empty /somewhere/obj.  The all stage doesn't quite
   finish.)  On an ABIT BP6 system with a 400MHz and a 366MHz CPU, with
   /usr (including /usr/src) nfs-mounted (with 100 Mbps ethernet and a
   reasonably fast server) and /somewhere/obj ufs1-mounted (on a fairly
   slow disk; no soft-updates), this gives the following times:
  
   SCHED_ULE-yesterday, with not so careful setup:
     40.37 real       8.26 user       6.26 sys
    278.90 real      59.35 user      41.32 sys
    341.82 real     307.38 user      69.01 sys
   SCHED_ULE-today, run immediately after booting:
     41.51 real       7.97 user       6.42 sys
    306.64 real      59.66 user      40.68 sys
    346.48 real     305.54 user      69.97 sys
   SCHED_4BSD-yesterday, with not so careful setup:
    [same as today except the depend step was 10 seconds slower (real)]
   SCHED_4BSD-today, run immediately after booting:
     18.89 real       8.01 user       6.66 sys
    128.17 real      58.33 user      43.61 sys
    291.59 real     308.48 user      72.33 sys
   SCHED_4BSD-yesterday, with a UP kernel (running on the 366 MHz CPU) with
   many local changes and not so careful setup:
     17.39 real       8.28 user       5.49 sys
    130.51 real      60.97 user      34.63 sys
    390.68 real     310.78 user      60.55 sys
  
   Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for the
   obj and depend stages.  These stages have little parallelism.  SCHED_ULE
   was only 19% slower for the all stage.  ...
 
  I reran this with -current (sched_ule.c 1.68, etc.).  Result: no
  significant change.  However, with a UP kernel there was no significant
  difference between the times for SCHED_ULE and SCHED_4BSD.

 There was a significant difference on UP until last week.  I'm working on
 SMP now.  I have some patches but they aren't quite ready yet.

I have committed my SMP fixes.  I would appreciate it if you could post
update results.  ULE now outperforms 4BSD in a single threaded kernel
compile and performs almost identically in a 16 way make.  I still have a
few more things that I can do to improve the situation.  I would expect
ULE to pull further ahead in the months to come.

The nice issue is still outstanding, as is the incorrect wcpu reporting.

Cheers,
Jeff


 
   Test 5 for fair scheduling related to niceness:
  
 for i in -20 -16 -12 -8 -4 0 4 8 12 16 20
 do
 nice -$i sh -c "while :; do echo -n;done" &
 done
 time top -o cpu
  
   With SCHED_ULE, this now hangs the system, but it worked yesterday.  Today
   it doesn't get as far as running top and it stops the nfs server responding.
   To unhang the system and see what the above does, run a shell at rtprio 0
   and start top before the above, and use top to kill processes (I normally
   use killall sh to kill all the shells generated by tests 1-5, but killall
   doesn't work if it is on nfs when the nfs server is not responding).
 
  This shows problems much more clearly with UP kernels.  It gives the
  nice -20 and -16 processes approx. 55% and 50% of the CPU, respectively
  (the total is significantly more than 100%), and it gives approx.  0%
  of the CPU to the other sh processes (perhaps exactly 0).  It also
  apparently gives 0% of the CPU to some important nfs process (I
  couldn't see exactly which) so the nfs server stops responding.
  SCHED_4BSD errs in the opposite direction by giving too many cycles to
  highly niced processes so it is naturally immune to this problem.  With
  SMP, SCHED_ULE lets many more processes run.

 I seem to have broken something related to nice.  I only tested
 interactivity and performance after my last round of changes.  I have a
 standard test that I do that is similar to the one that you have posted
 here.  I used it to gather results for my paper
 (http://www.chesapeake.net/~jroberson/ULE.pdf).  There you can see what
 the intended nice curve is like.  Oddly enough, I ran your test again on
 my laptop and I did not see 55% of the cpu going to nice -20.  It was
 spread proportionally from -20 to 0 with positive nice values not receiving
 cpu time, as intended.  It did not, however, let interactive processes
 proceed.  This is certainly a bug and it sounds like there may be others
 which lead to the problems that you're having.

 
  The nfs server also sometimes stops responding with only non-negatively
  niced processes (0 through 20 in the above), but it takes longer.
 
  The nfs server restarts if enough of the hog processes are killed.
  Apparently nfs has some critical process running at only user 

Re: More ULE bugs fixed.

2003-10-31 Thread Bruno Van Den Bossche
Jeff Roberson [EMAIL PROTECTED] wrote:

 On Wed, 29 Oct 2003, Jeff Roberson wrote:
 
  On Thu, 30 Oct 2003, Bruce Evans wrote:
 
Test for scheduling buildworlds:
   
cd /usr/src/usr.bin
for i in obj depend all
do
MAKEOBJDIRPREFIX=/somewhere/obj time make -s -j16 $i
done >/tmp/zqz 2>&1
   
(Run this with an empty /somewhere/obj.  The all stage doesn't
quite finish.)  On an ABIT BP6 system with a 400MHz and a 366MHz
CPU, with /usr (including /usr/src) nfs-mounted (with 100 Mbps
ethernet and a reasonably fast server) and /somewhere/obj
ufs1-mounted (on a fairly slow disk; no soft-updates), this
gives the following times:
   
SCHED_ULE-yesterday, with not so careful setup:
     40.37 real       8.26 user       6.26 sys
    278.90 real      59.35 user      41.32 sys
    341.82 real     307.38 user      69.01 sys
SCHED_ULE-today, run immediately after booting:
     41.51 real       7.97 user       6.42 sys
    306.64 real      59.66 user      40.68 sys
    346.48 real     305.54 user      69.97 sys
SCHED_4BSD-yesterday, with not so careful setup:
    [same as today except the depend step was 10 seconds slower (real)]
SCHED_4BSD-today, run immediately after booting:
     18.89 real       8.01 user       6.66 sys
    128.17 real      58.33 user      43.61 sys
    291.59 real     308.48 user      72.33 sys
SCHED_4BSD-yesterday, with a UP kernel (running on the 366 MHz CPU) with
many local changes and not so careful setup:
     17.39 real       8.28 user       5.49 sys
    130.51 real      60.97 user      34.63 sys
    390.68 real     310.78 user      60.55 sys
   
Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for
the obj and depend stages.  These stages have little
parallelism.  SCHED_ULE was only 19% slower for the all stage. 
...
  
   I reran this with -current (sched_ule.c 1.68, etc.).  Result: no
   significant change.  However, with a UP kernel there was no
   significant difference between the times for SCHED_ULE and
   SCHED_4BSD.
 
  There was a significant difference on UP until last week.  I'm
  working on SMP now.  I have some patches but they aren't quite ready
  yet.
 
 I have committed my SMP fixes.  I would appreciate it if you could post
 update results.  ULE now outperforms 4BSD in a single threaded kernel
 compile and performs almost identically in a 16 way make.  I still
 have a few more things that I can do to improve the situation.  I
 would expect ULE to pull further ahead in the months to come.

I recently had to complete a little piece of software in a course on
parallel computing.  I've put it online[1] (we only had to write the
pract2.cpp file).  It calculates the inverse of a Vandermonde matrix and
allows you to spawn multiple slave-processes who each perform a part of
the work.  Everything happens in memory so 
I've used it lately to test the different changes you made to
sched_ule.c and these last fixes do improve the performance on my dual
p3 machine a lot.

Here are the results of my (very limited tests) :

sched4bsd
---
dimension   slaves  time
1000        1       90.925408
1000        2       58.897038

200         1       0.735962
200         2       0.676660

sched_ule 1.68
---
dimension   slaves  time
1000        1       90.951015
1000        2       70.402845

200         1       0.743551
200         2       1.900455

sched_ule 1.70
---
dimension   slaves  time
1000        1       90.782309
1000        2       57.207351

200         1       0.739998
200         2       0.383545
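
One way to read these numbers is as the speedup gained from the second slave, that is, the 1-slave time divided by the 2-slave time.  The sketch below only does that division on the figures above; the function name is made up for this example.

```sh
#!/bin/sh
# speedup T1 T2 -> T1/T2: how many times faster the 2-slave run is
speedup() {
    awk -v t1="$1" -v t2="$2" 'BEGIN { printf "%.2f\n", t1 / t2 }'
}

echo "dimension 1000"
echo "  sched4bsd:      $(speedup 90.925408 58.897038)"
echo "  sched_ule 1.68: $(speedup 90.951015 70.402845)"
echo "  sched_ule 1.70: $(speedup 90.782309 57.207351)"
echo "dimension 200"
echo "  sched4bsd:      $(speedup 0.735962 0.676660)"
echo "  sched_ule 1.68: $(speedup 0.743551 1.900455)"
echo "  sched_ule 1.70: $(speedup 0.739998 0.383545)"
```

A value below 1 means the second slave made the run slower, which is what sched_ule 1.68 shows for the small matrix; with 1.70 the same case comes out close to 2.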


I'm not really sure if this is very relevant to you, but from the
end-user point of view (me :-)) this does mean something.
Thanks!

[1] http://users.pandora.be/bomberboy/mptest/final.tar.bz2
It can be used by running testpract2 with two arguments, the dimension
of the matrix and the number of slaves.  example './testpract2 200 2'
will create a matrix with dimension 200 and 2 slaves.


-- 
Bruno

... And then there's the guy who bought 20,000 bras, cut them in half,
and sold 40,000 yamalchas with chin straps


Re: More ULE bugs fixed.

2003-10-31 Thread Bruce Evans
On Fri, 31 Oct 2003, Jeff Roberson wrote:

 I have committed my SMP fixes.  I would appreciate it if you could post
 update results.  ULE now outperforms 4BSD in a single threaded kernel
 compile and performs almost identically in a 16 way make.  I still have a
 few more things that I can do to improve the situation.  I would expect
 ULE to pull further ahead in the months to come.

My simple make benchmark now takes infinitely longer with ULE under SMP,
since make -j 16 with ULE under SMP now hangs nfs after about a minute.
4BSD works better.  However, some networking bugs have developed in the
last few days.  One of their manifestations is that SMP kernels always
panic in sbdrop() on shutdown.

 The nice issue is still outstanding, as is the incorrect wcpu reporting.

It may be related to nfs processes not getting any cycles even when there
are no niced processes.

Bruce


Re: More ULE bugs fixed.

2003-10-31 Thread Sam Leffler
On Friday 31 October 2003 09:04 am, Bruce Evans wrote:

 My simple make benchmark now takes infinitely longer with ULE under SMP,
 since make -j 16 with ULE under SMP now hangs nfs after about a minute.
 4BSD works better.  However, some networking bugs have developed in the
 last few days.  One of their manifestations is that SMP kernels always
 panic in sbdrop() on shutdown.

I'm looking at something similar now.  If you have a stack trace please send 
it to me (along with any other info).  You might also try booting 
debug.mpsafenet=0.

Sam



Re: More ULE bugs fixed.

2003-10-31 Thread Jeff Roberson
On Fri, 31 Oct 2003, Bruno Van Den Bossche wrote:

 Jeff Roberson [EMAIL PROTECTED] wrote:

  On Wed, 29 Oct 2003, Jeff Roberson wrote:
 
   On Thu, 30 Oct 2003, Bruce Evans wrote:
  
 Test for scheduling buildworlds:

   cd /usr/src/usr.bin
   for i in obj depend all
   do
   MAKEOBJDIRPREFIX=/somewhere/obj time make -s -j16 $i
   done >/tmp/zqz 2>&1

 (Run this with an empty /somewhere/obj.  The all stage doesn't
 quite finish.)  On an ABIT BP6 system with a 400MHz and a 366MHz
 CPU, with /usr (including /usr/src) nfs-mounted (with 100 Mbps
 ethernet and a reasonably fast server) and /somewhere/obj
 ufs1-mounted (on a fairly slow disk; no soft-updates), this
 gives the following times:

 SCHED_ULE-yesterday, with not so careful setup:
     40.37 real       8.26 user       6.26 sys
    278.90 real      59.35 user      41.32 sys
    341.82 real     307.38 user      69.01 sys
 SCHED_ULE-today, run immediately after booting:
     41.51 real       7.97 user       6.42 sys
    306.64 real      59.66 user      40.68 sys
    346.48 real     305.54 user      69.97 sys
 SCHED_4BSD-yesterday, with not so careful setup:
    [same as today except the depend step was 10 seconds slower (real)]
 SCHED_4BSD-today, run immediately after booting:
     18.89 real       8.01 user       6.66 sys
    128.17 real      58.33 user      43.61 sys
    291.59 real     308.48 user      72.33 sys
 SCHED_4BSD-yesterday, with a UP kernel (running on the 366 MHz CPU) with
 many local changes and not so careful setup:
     17.39 real       8.28 user       5.49 sys
    130.51 real      60.97 user      34.63 sys
    390.68 real     310.78 user      60.55 sys

 Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for
 the obj and depend stages.  These stages have little
 parallelism.  SCHED_ULE was only 19% slower for the all stage.
 ...
   
I reran this with -current (sched_ule.c 1.68, etc.).  Result: no
significant change.  However, with a UP kernel there was no
significant difference between the times for SCHED_ULE and
SCHED_4BSD.
  
   There was a significant difference on UP until last week.  I'm
   working on SMP now.  I have some patches but they aren't quite ready
   yet.
 
  I have committed my SMP fixes.  I would appreciate it if you could post
  update results.  ULE now outperforms 4BSD in a single threaded kernel
  compile and performs almost identically in a 16 way make.  I still
  have a few more things that I can do to improve the situation.  I
  would expect ULE to pull further ahead in the months to come.

 I recently had to complete a little piece of software in a course on
 parallel computing.  I've put it online[1] (we only had to write the
 pract2.cpp file).  It calculates the inverse of a Vandermonde matrix and
 allows you to spawn multiple slave-processes who each perform a part of
 the work.  Everything happens in memory so
 I've used it lately to test the different changes you made to
 sched_ule.c and these last fixes do improve the performance on my dual
 p3 machine a lot.

 Here are the results of my (very limited tests) :

 sched4bsd
 ---
 dimension   slaves  time
 1000        1       90.925408
 1000        2       58.897038

 200         1       0.735962
 200         2       0.676660

 sched_ule 1.68
 ---
 dimension   slaves  time
 1000        1       90.951015
 1000        2       70.402845

 200         1       0.743551
 200         2       1.900455

 sched_ule 1.70
 ---
 dimension   slaves  time
 1000        1       90.782309
 1000        2       57.207351

 200         1       0.739998
 200         2       0.383545


 I'm not really sure if this is very relevant to you, but from the
 end-user point of view (me :-)) this does mean something.
 Thanks!

I welcome the feedback, positive or negative, as it helps me improve
things.  Thanks for the report!  Could you run this again under 4bsd and
ULE with the following in your .cshrc:

set time= ( 5 "%Uu %Ss %E %P %X+%Dk %I+%Oio %Fpf+%Ww %cc/%ww" )

And then time ./testpract 200 2, etc.  This will give me a few hints about
what's impacting your performance.

Thanks!
Jeff


 [1] http://users.pandora.be/bomberboy/mptest/final.tar.bz2
 It can be used by running testpract2 with two arguments, the dimension
 of the matrix and the number of slaves.  example './testpract2 200 2'
 will create a matrix with dimension 200 and 2 slaves.


 --
 Bruno

 ... And then there's the guy who bought 20,000 bras, cut them in half,
 and sold 40,000 yamalchas with chin straps



Re: More ULE bugs fixed.

2003-10-29 Thread Bruce Evans
 Test for scheduling buildworlds:

   cd /usr/src/usr.bin
   for i in obj depend all
   do
   MAKEOBJDIRPREFIX=/somewhere/obj time make -s -j16 $i
   done >/tmp/zqz 2>&1

 (Run this with an empty /somewhere/obj.  The all stage doesn't quite
 finish.)  On an ABIT BP6 system with a 400MHz and a 366MHz CPU, with
 /usr (including /usr/src) nfs-mounted (with 100 Mbps ethernet and a
 reasonably fast server) and /somewhere/obj ufs1-mounted (on a fairly
 slow disk; no soft-updates), this gives the following times:

 SCHED_ULE-yesterday, with not so careful setup:
     40.37 real       8.26 user       6.26 sys
    278.90 real      59.35 user      41.32 sys
    341.82 real     307.38 user      69.01 sys
 SCHED_ULE-today, run immediately after booting:
     41.51 real       7.97 user       6.42 sys
    306.64 real      59.66 user      40.68 sys
    346.48 real     305.54 user      69.97 sys
 SCHED_4BSD-yesterday, with not so careful setup:
    [same as today except the depend step was 10 seconds slower (real)]
 SCHED_4BSD-today, run immediately after booting:
     18.89 real       8.01 user       6.66 sys
    128.17 real      58.33 user      43.61 sys
    291.59 real     308.48 user      72.33 sys
 SCHED_4BSD-yesterday, with a UP kernel (running on the 366 MHz CPU) with
 many local changes and not so careful setup:
     17.39 real       8.28 user       5.49 sys
    130.51 real      60.97 user      34.63 sys
    390.68 real     310.78 user      60.55 sys

 Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for the
 obj and depend stages.  These stages have little parallelism.  SCHED_ULE
 was only 19% slower for the all stage.  ...

I reran this with -current (sched_ule.c 1.68, etc.).  Result: no
significant change.  However, with a UP kernel there was no significant
difference between the times for SCHED_ULE and SCHED_4BSD.
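
The figures in the quoted summary can be rechecked from the "real" column of the two runs made immediately after booting.  The sketch below divides the SCHED_ULE time by the SCHED_4BSD time for each stage; the function name is made up for this example.

```sh
#!/bin/sh
# ratio A B -> A/B to two decimals (awk does the floating-point division)
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# "real" seconds, SCHED_ULE-today divided by SCHED_4BSD-today
echo "obj:    $(ratio 41.51 18.89)"      # 2.20
echo "depend: $(ratio 306.64 128.17)"    # 2.39
echo "all:    $(ratio 346.48 291.59)"    # 1.19
```

That is, more than twice as slow for obj and depend, and about 19% slower for all.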

 Test 5 for fair scheduling related to niceness:

   for i in -20 -16 -12 -8 -4 0 4 8 12 16 20
   do
   nice -$i sh -c "while :; do echo -n;done" &
   done
   time top -o cpu

 With SCHED_ULE, this now hangs the system, but it worked yesterday.  Today
 it doesn't get as far as running top and it stops the nfs server responding.
 To unhang the system and see what the above does, run a shell at rtprio 0
 and start top before the above, and use top to kill processes (I normally
 use killall sh to kill all the shells generated by tests 1-5, but killall
 doesn't work if it is on nfs when the nfs server is not responding).
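
A variant of the test that remembers the PIDs of the hog shells avoids depending on killall for cleanup.  This is only a sketch: the function names are made up, the busy loop uses ":" instead of "echo -n", and it uses the "nice -n" spelling, so it is not the script that produced the results in this thread.  It also does nothing about a hung system, for which the rtprio shell described above is still needed.

```sh
#!/bin/sh
# Sketch only: start one busy-loop shell per nice value, remember the PIDs,
# and kill exactly those PIDs afterwards.  Function names are hypothetical.
HOG_PIDS=""

start_hogs() {
    for n in "$@"; do
        # Negative values need root.  nice execs the shell, so $! is the
        # PID of the busy loop itself.
        nice -n "$n" sh -c 'while :; do :; done' &
        HOG_PIDS="$HOG_PIDS $!"
    done
}

stop_hogs() {
    for p in $HOG_PIDS; do
        kill "$p" 2>/dev/null || true
    done
    # Reap the children so that no zombies are left behind.
    wait $HOG_PIDS 2>/dev/null || true
    HOG_PIDS=""
}

# Example (commented out so that sourcing this file starts nothing):
# trap stop_hogs EXIT INT TERM
# start_hogs 0 4 8 12 16 20
# top -o cpu
```

Keeping the PID list in the shell that started the hogs means the cleanup needs no binary from an nfs mount.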

This shows problems much more clearly with UP kernels.  It gives the
nice -20 and -16 processes approx. 55% and 50% of the CPU, respectively
(the total is significantly more than 100%), and it gives approx.  0%
of the CPU to the other sh processes (perhaps exactly 0).  It also
apparently gives 0% of the CPU to some important nfs process (I
couldn't see exactly which) so the nfs server stops responding.
SCHED_4BSD errs in the opposite direction by giving too many cycles to
highly niced processes so it is naturally immune to this problem.  With
SMP, SCHED_ULE lets many more processes run.

The nfs server also sometimes stops responding with only non-negatively
niced processes (0 through 20 in the above), but it takes longer.

The nfs server restarts if enough of the hog processes are killed.
Apparently nfs has some critical process running at only user priority
and nice 0 and even non-negatively niced processes are enough to prevent
it from running.

Top output with loops like the above shows many anomalies in PRI, TIME,
WCPU and CPU, but no worse than the ones with SCHED_4BSD.  PRI tends to
stick at 139 (the max) with SCHED_ULE.  With SCHED_4BSD, this indicates
that the scheduler has entered an unfair scheduling region.  I don't
know how to interpret it for SCHED_ULE (at first I thought 139 was a
dummy value).

Bruce
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: More ULE bugs fixed.

2003-10-29 Thread Jeff Roberson
On Thu, 30 Oct 2003, Bruce Evans wrote:

  Test for scheduling buildworlds:
 
  cd /usr/src/usr.bin
  for i in obj depend all
  do
  MAKEOBJDIRPREFIX=/somewhere/obj time make -s -j16 $i
  done >/tmp/zqz 2>&1
 
  (Run this with an empty /somewhere/obj.  The all stage doesn't quite
  finish.)  On an ABIT BP6 system with a 400MHz and a 366MHz CPU, with
  /usr (including /usr/src) nfs-mounted (with 100 Mbps ethernet and a
  reasonably fast server) and /somewhere/obj ufs1-mounted (on a fairly
  slow disk; no soft-updates), this gives the following times:
 
  SCHED_ULE-yesterday, with not so careful setup:
      40.37 real         8.26 user         6.26 sys
     278.90 real        59.35 user        41.32 sys
     341.82 real       307.38 user        69.01 sys
  SCHED_ULE-today, run immediately after booting:
      41.51 real         7.97 user         6.42 sys
     306.64 real        59.66 user        40.68 sys
     346.48 real       305.54 user        69.97 sys
  SCHED_4BSD-yesterday, with not so careful setup:
     [same as today except the depend step was 10 seconds slower (real)]
  SCHED_4BSD-today, run immediately after booting:
      18.89 real         8.01 user         6.66 sys
     128.17 real        58.33 user        43.61 sys
     291.59 real       308.48 user        72.33 sys
  SCHED_4BSD-yesterday, with a UP kernel (running on the 366 MHz CPU) with
  many local changes and not so careful setup:
      17.39 real         8.28 user         5.49 sys
     130.51 real        60.97 user        34.63 sys
     390.68 real       310.78 user        60.55 sys
 
  Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for the
  obj and depend stages.  These stages have little parallelism.  SCHED_ULE
  was only 19% slower for the all stage.  ...

 I reran this with -current (sched_ule.c 1.68, etc.).  Result: no
 significant change.  However, with a UP kernel there was no significant
 difference between the times for SCHED_ULE and SCHED_4BSD.

There was a significant difference on UP until last week.  I'm working on
SMP now.  I have some patches but they aren't quite ready yet.


  Test 5 for fair scheduling related to niceness:
 
  for i in -20 -16 -12 -8 -4 0 4 8 12 16 20
  do
  nice -$i sh -c "while :; do echo -n;done" &
  done
  time top -o cpu
 
  With SCHED_ULE, this now hangs the system, but it worked yesterday.  Today
  it doesn't get as far as running top and it stops the nfs server responding.
  To unhang the system and see what the above does, run a shell at rtprio 0
  and start top before the above, and use top to kill processes (I normally
  use killall sh to kill all the shells generated by tests 1-5, but killall
  doesn't work if it is on nfs when the nfs server is not responding).

 This shows problems much more clearly with UP kernels.  It gives the
 nice -20 and -16 processes approx. 55% and 50% of the CPU, respectively
 (the total is significantly more than 100%), and it gives approx.  0%
 of the CPU to the other sh processes (perhaps exactly 0).  It also
 apparently gives 0% of the CPU to some important nfs process (I
 couldn't see exactly which) so the nfs server stops responding.
 SCHED_4BSD errs in the opposite direction by giving too many cycles to
 highly niced processes so it is naturally immune to this problem.  With
 SMP, SCHED_ULE lets many more processes run.

I seem to have broken something related to nice.  I only tested
interactivity and performance after my last round of changes.  I have a
standard test that I do that is similar to the one that you have posted
here.  I used it to gather results for my paper
(http://www.chesapeake.net/~jroberson/ULE.pdf).  There you can see what
the intended nice curve is like.  Oddly enough, I ran your test again on
my laptop and I did not see 55% of the cpu going to nice -20.  It was
spread proportionally from -20 to 0 with positive nice values not receiving
cpu time, as intended.  It did not, however, let interactive processes
proceed.  This is certainly a bug and it sounds like there may be others
which lead to the problems that you're having.


 The nfs server also sometimes stops responding with only non-negatively
 niced processes (0 through 20 in the above), but it takes longer.

 The nfs server restarts if enough of the hog processes are killed.
 Apparently nfs has some critical process running at only user priority
 and nice 0 and even non-negatively niced processes are enough to prevent
 it from running.

This shouldn't be the case, it sounds like my interactivity boost is
somewhat broken.


 Top output with loops like the above shows many anomalies in PRI, TIME,
 WCPU and CPU, but no worse than the ones with SCHED_4BSD.  PRI tends to
 stick at 139 (the max) with SCHED_ULE.  With SCHED_4BSD, this indicates
 that the scheduler has entered an unfair scheduling region.  I don't
 know how to interpret it for SCHED_ULE (at first I thought 139 

Re: More ULE bugs fixed.

2003-10-27 Thread Jeff Roberson
On Fri, 17 Oct 2003, Bruce Evans wrote:

 On Fri, 17 Oct 2003, Jeff Roberson wrote:

  On Fri, 17 Oct 2003, Bruce Evans wrote:
 
   How would one test if it was an improvement on the 4BSD scheduler?  It
   is not even competitive in my simple tests.
   ...
 
  At one point ULE was at least as fast as 4BSD and in most cases faster.
  This is a regression.  I'll sort it out soon.

 How much faster?


make kernel on UP seems to be within 1% of 4BSD now.  I actually had some
runs which showed lower system time.  I think I can still improve the
situation some.  Anyway, I found some bugs relating to idle prio tasks,
and also ULE had been doing almost twice as many context switches as 4BSD.
Now it's doing about 8% more.  I'm still tracking this down.

Anyhow, it should be much closer now.  I still have some plans for SMP
that should improve things quite a bit there but UP is looking good.

Cheers,
Jeff



Re: More ULE bugs fixed.

2003-10-27 Thread Jon Mini
Jeff Roberson [EMAIL PROTECTED] wrote :

 On Fri, 17 Oct 2003, Bruce Evans wrote:
 
 How would one test if it was an improvement on the 4BSD scheduler?  It
 is not even competitive in my simple tests.

What were your simple tests?

-- 
Jonathan Mini [EMAIL PROTECTED]
http://www.freebsd.org/


Re: More ULE bugs fixed.

2003-10-27 Thread Bruce Evans
On Sun, 26 Oct 2003, Jon Mini wrote:

 Jeff Roberson [EMAIL PROTECTED] wrote :

  On Fri, 17 Oct 2003, Bruce Evans wrote:
 
  How would one test if it was an improvement on the 4BSD scheduler?  It
  is not even competitive in my simple tests.

 What were your simple tests?

Er, they were in the original mail.  Just do parts of buildworld with -j16
on an SMP system.  ULE was 2.4 times slower for make depend and 2.1 times
slower for make obj.  Something must have been very wrong, since make obj,
especially, should be completely i/o bound so it shouldn't be affected
by the scheduler.  Also, run a bunch of CPU hog processes with various
nicenesses and look at top output to check that they are given reasonable
amounts of CPU.

Bruce


Re: More ULE bugs fixed.

2003-10-27 Thread Jon Mini
Bruce Evans [EMAIL PROTECTED] wrote :

 On Sun, 26 Oct 2003, Jon Mini wrote:
 
  Jeff Roberson [EMAIL PROTECTED] wrote :
 
   On Fri, 17 Oct 2003, Bruce Evans wrote:
  
   How would one test if it was an improvement on the 4BSD scheduler?  It
   is not even competitive in my simple tests.
 
  What were your simple tests?
 
 Er, they were in the original mail.  Just do parts of buildworld with -j16
 on an SMP system.  ULE was 2.4 times slower for make depend and 2.1 times
 slower for make obj.  Something must have been very wrong, since make obj,
 especially, should be completely i/o bound so it shouldn't be affected
 by the scheduler.  Also, run a bunch of CPU hog processes with various
 nicenesses and look at top output to check that they are given reasonable
 amounts of CPU.

My apologies, I just subscribed to current and only caught the tail
end of this thread.

-- 
Jonathan Mini [EMAIL PROTECTED]
http://www.freebsd.org/


Re: More ULE bugs fixed.

2003-10-21 Thread Eirik Oeverby
Thanks.
I should have known =)
/Eirik

Maxime Henrion wrote:
Eirik Oeverby wrote:

As a side note/question:
Is there any way to figure out which ULE version I'm running in a 
precompiled kernel? I just nuked my src tree by accident, and am not 
sure if i'm on 1.65 or something older..

If there is no way, is this perhaps an idea?


Try ident /boot/kernel/kernel | grep sched_ule.

Cheers,
Maxime


Re: More ULE bugs fixed.

2003-10-19 Thread Eirik Oeverby
As a side note/question:
Is there any way to figure out which ULE version I'm running in a 
precompiled kernel? I just nuked my src tree by accident, and am not 
sure if i'm on 1.65 or something older..

If there is no way, is this perhaps an idea?

Thanks,
/Eirik
Jeff Roberson wrote:
On Fri, 17 Oct 2003, Bruce Evans wrote:


How would one test if it was an improvement on the 4BSD scheduler?  It
is not even competitive in my simple tests.


[scripts results deleted]


Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for the
obj and depend stages.  These stages have little parallelism.  SCHED_ULE
was only 19% slower for the all stage.  It apparently misses many
opportunities to actually run useful processes.  This may be related
to /usr being nfs mounted.  There is lots of idling waiting for nfs
even in the SCHED_4BSD case.  The system times are smaller for SCHED_ULE,
but this might not be significant.  E.g., zeroing pages can account
for several percent of the system time in buildworld, but on unbalanced
systems that have too much idle time most page zero gets done in idle
time and doesn't show up in the system time.


At one point ULE was at least as fast as 4BSD and in most cases faster.
This is a regression.  I'll sort it out soon.


Test 1 for fair scheduling related to niceness:

for i in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
do
nice -$i sh -c "while :; do echo -n;done" &
done
top -o time
[Output deleted].  This shows only a vague correlation between niceness
and runtime for SCHED_ULE.  However, top -o cpu shows a strong correlation
between %CPU and niceness.  Apparently, %CPU is very inaccurate and/or
not enough history is kept for long-term scheduling to be fair.
Test 5 for fair scheduling related to niceness:

for i in -20 -16 -12 -8 -4 0 4 8 12 16 20
do
nice -$i sh -c "while :; do echo -n;done" &
done
time top -o cpu
With SCHED_ULE, this now hangs the system, but it worked yesterday.  Today
it doesn't get as far as running top and it stops the nfs server responding.
To unhang the system and see what the above does, run a shell at rtprio 0
and start top before the above, and use top to kill processes (I normally
use killall sh to kill all the shells generated by tests 1-5, but killall
doesn't work if it is on nfs when the nfs server is not responding).


  661 root 112  -20   900K   608K RUN  0:24 27.80% 27.64% sh
  662 root 114  -16   900K   608K RUN  0:19 12.43% 12.35% sh
  663 root 114  -12   900K   608K RUN  0:15 10.66% 10.60% sh
  664 root 114   -8   900K   608K RUN  0:11  9.38%  9.33% sh
  665 root 115   -4   900K   608K RUN  0:10  7.91%  7.86% sh
  666 root 115    0   900K   608K RUN  0:07  6.83%  6.79% sh
  667 root 115    4   900K   608K RUN  0:06  5.01%  4.98% sh
  668 root 115    8   900K   608K RUN  0:04  3.83%  3.81% sh
  669 root 115   12   900K   608K RUN  0:02  2.21%  2.20% sh
  670 root 115   16   900K   608K RUN  0:01  0.93%  0.93% sh
I think you cvsup'd at a bad time.  I fixed a bug that would have caused
the system to lock up in this case late last night.  On my system it
freezes for a few seconds and then returns.  I can stop that by turning
down the interactivity threshold.
Thanks,
Jeff

Bruce


Re: More ULE bugs fixed.

2003-10-19 Thread Maxime Henrion
Eirik Oeverby wrote:
 As a side note/question:
 Is there any way to figure out which ULE version I'm running in a 
 precompiled kernel? I just nuked my src tree by accident, and am not 
 sure if i'm on 1.65 or something older..
 
 If there is no way, is this perhaps an idea?

Try ident /boot/kernel/kernel | grep sched_ule.
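
If only the revision number is wanted, the ident output can be filtered
further.  This is a minimal sketch, assuming the kernel carries an RCS
keyword line for sched_ule.c of the usual "file,v revision date ..." shape.
The helper name ule_revision is made up for illustration.

```shell
# ule_revision: read ident(1) output on stdin and print the revision of
# sched_ule.c.  Assumes an RCS keyword line of the form
#   $FreeBSD: <path>/sched_ule.c,v <revision> <date> ... $
# (the exact keyword layout is an assumption, not taken from the thread).
ule_revision() {
    sed -n 's/.*sched_ule\.c,v \([0-9][0-9.]*\) .*/\1/p'
}

# Usage on a live system, with the kernel path suggested above:
#   ident /boot/kernel/kernel | ule_revision
```

If the filter prints nothing, the keyword line probably has a different
layout and the plain grep is the safer check.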

Cheers,
Maxime


Re: More ULE bugs fixed.

2003-10-19 Thread Bruce Evans
On Fri, 17 Oct 2003, Jeff Roberson wrote:

 On Fri, 17 Oct 2003, Bruce Evans wrote:

  How would one test if it was an improvement on the 4BSD scheduler?  It
  is not even competitive in my simple tests.
  ...

 At one point ULE was at least as fast as 4BSD and in most cases faster.
 This is a regression.  I'll sort it out soon.

How much faster?

  Test 5 for fair scheduling related to niceness:
 
  for i in -20 -16 -12 -8 -4 0 4 8 12 16 20
  do
  nice -$i sh -c "while :; do echo -n;done" &
  done
  time top -o cpu
 
  With SCHED_ULE, this now hangs the system, but it worked yesterday.  Today
  it doesn't get as far as running top and it stops the nfs server responding.

   661 root 112  -20   900K   608K RUN  0:24 27.80% 27.64% sh
   662 root 114  -16   900K   608K RUN  0:19 12.43% 12.35% sh
   663 root 114  -12   900K   608K RUN  0:15 10.66% 10.60% sh
   664 root 114   -8   900K   608K RUN  0:11  9.38%  9.33% sh
   665 root 115   -4   900K   608K RUN  0:10  7.91%  7.86% sh
   666 root 115    0   900K   608K RUN  0:07  6.83%  6.79% sh
   667 root 115    4   900K   608K RUN  0:06  5.01%  4.98% sh
   668 root 115    8   900K   608K RUN  0:04  3.83%  3.81% sh
   669 root 115   12   900K   608K RUN  0:02  2.21%  2.20% sh
   670 root 115   16   900K   608K RUN  0:01  0.93%  0.93% sh

Perhaps the bug only affects SMP.  The above is for UP (no CPU column).

I see a large difference from the above, at least under SMP: %CPU
tapers off to 0 at nice 0.

BTW, I just noticed that SCHED_4BSD never really worked for the SMP case.
sched_clock() is called for each CPU, and for N CPU's this has the same
effect as calling sched_clock() N times too often for 1 CPU.  Calling
sched_clock() too often was fixed for the UP case in kern_synch.c 1.83
by introducing a scale factor.  The scale factor is fixed so it doesn't
help for SMP.

 I think you cvsup'd at a bad time.  I fixed a bug that would have caused
 the system to lock up in this case late last night.  On my system it
 freezes for a few seconds and then returns.  I can stop that by turning
 down the interactivity threshold.

No, I tested with an up to date kernel (sched_ule.c 1.65).

Bruce


Re: More ULE bugs fixed.

2003-10-18 Thread Sean Chittenden
 The commit to src/sys/kern/kern_switch.c:1.62, would it fix the
 following crash (can't find my kernel with debugging symbols):

Hrm, nope.  This is from a kernel from tonight at 9pm PST.  -sc

#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:240
#1  0xc052f579 in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:372
#2  0xc052f958 in panic () at /usr/src/sys/kern/kern_shutdown.c:550
#3  0xc06e5546 in trap_fatal (frame=0xdc2797e4, eva=0) at 
/usr/src/sys/i386/i386/trap.c:820
#4  0xc06e4b83 in trap (frame=
  {tf_fs = -1068236776, tf_es = -1065811952, tf_ds = -880082928, tf_edi = 0, 
tf_esi = 0, tf_ebp = -601384896, tf_isp = -601384944, tf_ebx = 0, tf_edx = -872307216, 
tf_ecx = -601384756, tf_eax = 1, tf_trapno = 12, tf_err = 0, tf_eip = -1068209552, 
tf_cs = 8, tf_eflags = 66050, tf_esp = -878456736, tf_ss = -1049884900}) at 
/usr/src/sys/i386/i386/trap.c:252
#5  0xc06d53e8 in calltrap () at {standard input}:102
#6  0xc0527bd4 in fill_kinfo_thread (td=0xcc087e40, kp=0xdc2798cc) at 
/usr/src/sys/kern/kern_proc.c:766
#7  0xc052757b in fill_kinfo_proc (p=0x0, kp=0x0) at /usr/src/sys/kern/kern_proc.c:622
#8  0xc0527fbe in sysctl_out_proc (p=0xcbe6b1e4, req=0xdc279bf8, flags=4) at 
/usr/src/sys/kern/kern_proc.c:859
#9  0xc0528787 in sysctl_kern_proc (oidp=0xc0764300, arg1=0xdc279ca4, arg2=0, 
req=0xdc279bf8) at /usr/src/sys/kern/kern_proc.c:1024
#10 0xc053a36a in sysctl_root (oidp=0x0, arg1=0xdc279c98, arg2=3, req=0xdc279bf8) at 
/usr/src/sys/kern/kern_sysctl.c:1179
#11 0xc053a64d in userland_sysctl (td=0x0, name=0xdc279c98, namelen=3, old=0x3, 
oldlenp=0xdc279bf8, inkernel=0, new=0xdc279c98, newlen=0,
retval=0xdc279c90) at /usr/src/sys/kern/kern_sysctl.c:1286
#12 0xc053a474 in __sysctl (td=0x0, uap=0xdc279d10) at 
/usr/src/sys/kern/kern_sysctl.c:1216
#13 0xc06e58d0 in syscall (frame=
  {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 3, tf_esi = 0, tf_ebp = 
-1077941160, tf_isp = -601383564, tf_ebx = -1077941108, tf_edx = 0, tf_ecx = 
-1077941056, tf_eax = 202, tf_trapno = 12, tf_err = 2, tf_eip = 134768643, tf_cs = 31, 
tf_eflags = 663, tf_esp = -1077941204, tf_ss = 47}) at 
/usr/src/sys/i386/i386/trap.c:1009
#14 0xc06d543d in Xint0x80_syscall () at {standard input}:144


Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x38
fault code  = supervisor read, page not present
instruction pointer = 0x8:0xc0546a70
stack pointer   = 0x10:0xdc279824
frame pointer   = 0x10:0xdc279840
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 971 (ps)
trap number = 12
panic: page fault

-- 
Sean Chittenden


Re: More ULE bugs fixed.

2003-10-17 Thread Bruce Evans
On Wed, 15 Oct 2003, Jeff Roberson wrote:

 I fixed two bugs that were exposed due to more of the kernel running
 outside of Giant.  ULE had some issues with priority propagation that
 stopped it from working very well.

 Things should be much improved.  Feedback, as always, is welcome.  I'd
 like to look into making this the default scheduler for 5.2 if things
 start looking up.  I hope that scares you all into using it more. :-)

How would one test if it was an improvement on the 4BSD scheduler?  It
is not even competitive in my simple tests.

Test for scheduling buildworlds:

cd /usr/src/usr.bin
for i in obj depend all
do
MAKEOBJDIRPREFIX=/somewhere/obj time make -s -j16 $i
done >/tmp/zqz 2>&1

(Run this with an empty /somewhere/obj.  The all stage doesn't quite
finish.)  On an ABIT BP6 system with a 400MHz and a 366MHz CPU, with
/usr (including /usr/src) nfs-mounted (with 100 Mbps ethernet and a
reasonably fast server) and /somewhere/obj ufs1-mounted (on a fairly
slow disk; no soft-updates), this gives the following times:

SCHED_ULE-yesterday, with not so careful setup:
    40.37 real         8.26 user         6.26 sys
   278.90 real        59.35 user        41.32 sys
   341.82 real       307.38 user        69.01 sys
SCHED_ULE-today, run immediately after booting:
    41.51 real         7.97 user         6.42 sys
   306.64 real        59.66 user        40.68 sys
   346.48 real       305.54 user        69.97 sys
SCHED_4BSD-yesterday, with not so careful setup:
  [same as today except the depend step was 10 seconds slower (real)]
SCHED_4BSD-today, run immediately after booting:
    18.89 real         8.01 user         6.66 sys
   128.17 real        58.33 user        43.61 sys
   291.59 real       308.48 user        72.33 sys
SCHED_4BSD-yesterday, with a UP kernel (running on the 366 MHz CPU) with
many local changes and not so careful setup:
    17.39 real         8.28 user         5.49 sys
   130.51 real        60.97 user        34.63 sys
   390.68 real       310.78 user        60.55 sys

Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for the
obj and depend stages.  These stages have little parallelism.  SCHED_ULE
was only 19% slower for the all stage.  It apparently misses many
opportunities to actually run useful processes.  This may be related
to /usr being nfs mounted.  There is lots of idling waiting for nfs
even in the SCHED_4BSD case.  The system times are smaller for SCHED_ULE,
but this might not be significant.  E.g., zeroing pages can account
for several percent of the system time in buildworld, but on unbalanced
systems that have too much idle time most page zero gets done in idle
time and doesn't show up in the system time.
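
The ratios in the summary can be rechecked from the real times in the
"run immediately after booting" rows.  This is a small sketch that uses awk
for the division.  The helper name slowdown is made up for illustration.

```shell
# slowdown ULE_SECONDS BSD_SECONDS: print how many times longer the
# SCHED_ULE run took than the SCHED_4BSD run, to two decimals.
slowdown() {
    awk -v ule="$1" -v bsd="$2" 'BEGIN { printf "%.2f\n", ule / bsd }'
}

# Real times copied from the SCHED_ULE-today and SCHED_4BSD-today rows.
echo "obj:    $(slowdown 41.51 18.89)"
echo "depend: $(slowdown 306.64 128.17)"
echo "all:    $(slowdown 346.48 291.59)"
```

The last ratio comes out at about 1.19, which is the 19% figure quoted for
the all stage.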

Test 1 for fair scheduling related to niceness:

for i in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
do
nice -$i sh -c "while :; do echo -n;done" &
done
top -o time

[Output deleted].  This shows only a vague correlation between niceness
and runtime for SCHED_ULE.  However, top -o cpu shows a strong correlation
between %CPU and niceness.  Apparently, %CPU is very inaccurate and/or
not enough history is kept for long-term scheduling to be fair.

Test 5 for fair scheduling related to niceness:

for i in -20 -16 -12 -8 -4 0 4 8 12 16 20
do
nice -$i sh -c "while :; do echo -n;done" &
done
time top -o cpu

With SCHED_ULE, this now hangs the system, but it worked yesterday.  Today
it doesn't get as far as running top and it stops the nfs server responding.
To unhang the system and see what the above does, run a shell at rtprio 0
and start top before the above, and use top to kill processes (I normally
use killall sh to kill all the shells generated by tests 1-5, but killall
doesn't work if it is on nfs when the nfs server is not responding).

Bruce


Re: More ULE bugs fixed.

2003-10-17 Thread Jeff Roberson
On Fri, 17 Oct 2003, Bruce Evans wrote:

 How would one test if it was an improvement on the 4BSD scheduler?  It
 is not even competitive in my simple tests.

[scripts results deleted]


 Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for the
 obj and depend stages.  These stages have little parallelism.  SCHED_ULE
 was only 19% slower for the all stage.  It apparently misses many
 opportunities to actually run useful processes.  This may be related
 to /usr being nfs mounted.  There is lots of idling waiting for nfs
 even in the SCHED_4BSD case.  The system times are smaller for SCHED_ULE,
 but this might not be significant.  E.g., zeroing pages can account
 for several percent of the system time in buildworld, but on unbalanced
 systems that have too much idle time most page zero gets done in idle
 time and doesn't show up in the system time.

At one point ULE was at least as fast as 4BSD and in most cases faster.
This is a regression.  I'll sort it out soon.



 Test 1 for fair scheduling related to niceness:

   for i in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
   do
   nice -$i sh -c "while :; do echo -n;done" &
   done
   top -o time

 [Output deleted].  This shows only a vague correlation between niceness
 and runtime for SCHED_ULE.  However, top -o cpu shows a strong correlation
 between %CPU and niceness.  Apparently, %CPU is very inaccurate and/or
 not enough history is kept for long-term scheduling to be fair.

 Test 5 for fair scheduling related to niceness:

   for i in -20 -16 -12 -8 -4 0 4 8 12 16 20
   do
   nice -$i sh -c "while :; do echo -n;done" &
   done
   time top -o cpu

 With SCHED_ULE, this now hangs the system, but it worked yesterday.  Today
 it doesn't get as far as running top and it stops the nfs server responding.
 To unhang the system and see what the above does, run a shell at rtprio 0
 and start top before the above, and use top to kill processes (I normally
 use killall sh to kill all the shells generated by tests 1-5, but killall
 doesn't work if it is on nfs when the nfs server is not responding).

  661 root 112  -20   900K   608K RUN  0:24 27.80% 27.64% sh
  662 root 114  -16   900K   608K RUN  0:19 12.43% 12.35% sh
  663 root 114  -12   900K   608K RUN  0:15 10.66% 10.60% sh
  664 root 114   -8   900K   608K RUN  0:11  9.38%  9.33% sh
  665 root 115   -4   900K   608K RUN  0:10  7.91%  7.86% sh
  666 root 115    0   900K   608K RUN  0:07  6.83%  6.79% sh
  667 root 115    4   900K   608K RUN  0:06  5.01%  4.98% sh
  668 root 115    8   900K   608K RUN  0:04  3.83%  3.81% sh
  669 root 115   12   900K   608K RUN  0:02  2.21%  2.20% sh
  670 root 115   16   900K   608K RUN  0:01  0.93%  0.93% sh

I think you cvsup'd at a bad time.  I fixed a bug that would have caused
the system to lock up in this case late last night.  On my system it
freezes for a few seconds and then returns.  I can stop that by turning
down the interactivity threshold.
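
To compare accumulated runtime across nice levels, the NICE and TIME columns
of a listing like the one above can be reduced to percentage shares.  This is
a rough sketch, assuming TIME is in M:SS form as shown.  The helper name
runtime_shares is hypothetical and the input must contain at least one
non-zero TIME.

```shell
# runtime_shares: read "NICE TIME" pairs (TIME as M:SS) on stdin and print
# each nice level with its share of the total accumulated runtime.
runtime_shares() {
    awk '{
        split($2, t, ":")
        secs[NR] = t[1] * 60 + t[2]
        level[NR] = $1
        total += secs[NR]
    }
    END {
        for (i = 1; i <= NR; i++) printf "%s %.1f%%\n", level[i], 100 * secs[i] / total
    }'
}

# NICE and TIME values copied from the first three rows of the listing.
printf '%s\n' '-20 0:24' '-16 0:19' '-12 0:15' | runtime_shares
```

Shares computed this way reflect the whole lifetime of each process, so they
are closer to what top -o time sorts on than to the short-term %CPU column.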

Thanks,
Jeff


 Bruce




Re: More ULE bugs fixed.

2003-10-17 Thread Jeff Roberson

On Fri, 17 Oct 2003, Bruce Evans wrote:

 On Fri, 17 Oct 2003, Jeff Roberson wrote:

  On Fri, 17 Oct 2003, Bruce Evans wrote:
 
   How would one test if it was an improvement on the 4BSD scheduler?  It
   is not even competitive in my simple tests.
   ...
 
  At one point ULE was at least as fast as 4BSD and in most cases faster.
  This is a regression.  I'll sort it out soon.

 How much faster?

Apache benchmarked at 30% greater throughput due to the cpu affinity some
time ago.  I haven't done more recent tests with apache.  buildworld is
the most degenerate case for per cpu run queues because cpu affinity
doesn't help much and load imbalances hurt a lot.  On my machine the
compiler hardly ever wants to run for more than a few slices before doing
a msleep() so it's not bouncing around between CPUs so much with 4BSD.



   Test 5 for fair scheduling related to niceness:
  
 for i in -20 -16 -12 -8 -4 0 4 8 12 16 20
 do
 nice -$i sh -c "while :; do echo -n;done" &
 done
 time top -o cpu
  
   With SCHED_ULE, this now hangs the system, but it worked yesterday.  Today
   it doesn't get as far as running top and it stops the nfs server responding.

661 root 112  -20   900K   608K RUN  0:24 27.80% 27.64% sh
662 root 114  -16   900K   608K RUN  0:19 12.43% 12.35% sh
663 root 114  -12   900K   608K RUN  0:15 10.66% 10.60% sh
664 root 114   -8   900K   608K RUN  0:11  9.38%  9.33% sh
665 root 115   -4   900K   608K RUN  0:10  7.91%  7.86% sh
666 root 115    0   900K   608K RUN  0:07  6.83%  6.79% sh
667 root 115    4   900K   608K RUN  0:06  5.01%  4.98% sh
668 root 115    8   900K   608K RUN  0:04  3.83%  3.81% sh
669 root 115   12   900K   608K RUN  0:02  2.21%  2.20% sh
670 root 115   16   900K   608K RUN  0:01  0.93%  0.93% sh

 Perhaps the bug only affects SMP.  The above is for UP (no CPU column).


That is likely, I don't use my SMP machine much anymore.  I should set up
some automated tests.

 I see a large difference from the above, at least under SMP: %CPU
 tapers off to 0 at nice 0.

 BTW, I just noticed that SCHED_4BSD never really worked for the SMP case.
 sched_clock() is called for each CPU, and for N CPU's this has the same
 effect as calling sched_clock() N times too often for 1 CPU.  Calling
 sched_clock() too often was fixed for the UP case in kern_synch.c 1.83
 by introducing a scale factor.  The scale factor is fixed so it doesn't
 help for SMP.

Wait.. why are we calling sched_clock() too frequently on UP?


  I think you cvsup'd at a bad time.  I fixed a bug that would have caused
  the system to lock up in this case late last night.  On my system it
  freezes for a few seconds and then returns.  I can stop that by turning
  down the interactivity threshold.

 No, I tested with an up to date kernel (sched_ule.c 1.65).

Curious.  ULE seems to have suffered from bitrot.  These things were all
tested and working when I did my paper for BSDCon.  I have largely
neglected FreeBSD since.  I can't fix it this weekend, but I'm sure I'll
sort it out next weekend.

Cheers,
Jeff


 Bruce




Re: More ULE bugs fixed.

2003-10-17 Thread Sean Chittenden
 I think you cvsup'd at a bad time.  I fixed a bug that would have
 caused the system to lock up in this case late last night.  On my
 system it freezes for a few seconds and then returns.  I can stop
 that by turning down the interactivity threshold.

Hrm, I must concur that while ULE seems a tad snappier on the
responsiveness end, it seems to be lacking in terms of real world
performance compared to 4BSD.

Fresh CVSup (~midnight 2003-10-17) and build with a benchmark from
before and after.  I was benchmarking a chump calc program using
bison vs. lemon earlier today under 4BSD
(http://groups.yahoo.com/group/sqlite/message/5506) and figured I'd
throw my hat in on the subject with some relative numbers.  System
time is down for ULE, but user and real are up.


Under ULE:

Running a dry run with bison calc...done.
Running 1st run with bison calc... 52.11 real 45.63 user 0.56 sys
Running 2nd run with bison calc... 52.16 real 45.52 user 0.69 sys
Running 3rd run with bison calc... 51.80 real 45.32 user 0.87 sys

Running a dry run with lemon calc...done.
Running 1st run with lemon calc... 129.69 real 117.91 user 1.10 sys
Running 2nd run with lemon calc... 130.26 real 117.88 user 1.13 sys
Running 3rd run with lemon calc... 130.76 real 117.90 user 1.10 sys

Time spent in user mode   (CPU seconds) : 654.049s
Time spent in kernel mode (CPU seconds) : 7.047s
Total time  : 12:19.06s
CPU utilization (percentage): 89.4%
Times the process was swapped   : 0
Times of major page faults  : 34
Times of minor page faults  : 2361


And under 4BSD:

 Running a dry run with bison calc...done.
 Running 1st run with bison calc... 44.22 real 37.94 user 0.85 sys
 Running 2nd run with bison calc... 46.21 real 37.98 user 0.85 sys
 Running 3rd run with bison calc... 45.32 real 38.13 user 0.67 sys
 
 Running a dry run with lemon calc...done.
 Running 1st run with lemon calc... 116.53 real 100.10 user 1.13 sys
 Running 2nd run with lemon calc... 112.61 real 100.35 user 0.86 sys
 Running 3rd run with lemon calc... 114.16 real 100.19 user 1.04 sys
  
 Time spent in user mode (CPU seconds) : 553.392s
 Time spent in kernel mode (CPU seconds) : 6.978s
 Total time : 10:40.80s
 CPU utilization (percentage) : 87.4%
 Times the process was swapped : 223
 Times of major page faults : 50
 Times of minor page faults : 2750


Just a heads up, it does indeed look as though things have gone
backwards in terms of performance.  -sc
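
For a rough sense of scale, the relative difference can be worked out from the "real" times listed above. The following is only a sketch: the dictionary layout and the `slowdown` helper are illustrative names, not part of the benchmark script that produced the numbers.

```python
from statistics import mean

# "real" (wall-clock) seconds copied from the three timed runs listed above.
real_seconds = {
    "bison": {"ULE": [52.11, 52.16, 51.80], "4BSD": [44.22, 46.21, 45.32]},
    "lemon": {"ULE": [129.69, 130.26, 130.76], "4BSD": [116.53, 112.61, 114.16]},
}


def slowdown(ule_runs, bsd_runs):
    """Fractional increase of the mean ULE time over the mean 4BSD time."""
    return mean(ule_runs) / mean(bsd_runs) - 1.0


for calc, runs in real_seconds.items():
    pct = 100.0 * slowdown(runs["ULE"], runs["4BSD"])
    print(f"{calc}: ULE is {pct:.1f}% slower in wall-clock time")
```

With these six samples per scheduler the gap comes out at roughly 14 to 15 percent for both calculators, which is only indicative given three runs each on a single machine.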

-- 
Sean Chittenden


Re: More ULE bugs fixed.

2003-10-17 Thread Jeff Roberson

On Fri, 17 Oct 2003, Sean Chittenden wrote:

  I think you cvsup'd at a bad time.  I fixed a bug that would have
  caused the system to lock up in this case late last night.  On my
  system it freezes for a few seconds and then returns.  I can stop
  that by turning down the interactivity threshold.

 Hrm, I must concur that while ULE seems a tad snappier on the
 responsiveness end, it seems to be lacking in terms of real world
 performance compared to 4BSD.

Thanks for the stats.  Is this on SMP or UP?


 Fresh CVSup (~midnight 2003-10-17) and build with a benchmark from
 before and after.  I was benchmarking a chump calc program using
 bison vs. lemon earlier today under 4BSD
 (http://groups.yahoo.com/group/sqlite/message/5506) and figured I'd
 throw my hat in on the subject with some relative numbers.  System
 time is down for ULE, but user and real are up.


 Under ULE:

 Running a dry run with bison calc...done.
 Running 1st run with bison calc... 52.11 real 45.63 user 0.56 sys
 Running 2nd run with bison calc... 52.16 real 45.52 user 0.69 sys
 Running 3rd run with bison calc... 51.80 real 45.32 user 0.87 sys

 Running a dry run with lemon calc...done.
 Running 1st run with lemon calc... 129.69 real 117.91 user 1.10 sys
 Running 2nd run with lemon calc... 130.26 real 117.88 user 1.13 sys
 Running 3rd run with lemon calc... 130.76 real 117.90 user 1.10 sys

 Time spent in user mode   (CPU seconds) : 654.049s
 Time spent in kernel mode (CPU seconds) : 7.047s
 Total time  : 12:19.06s
 CPU utilization (percentage): 89.4%
 Times the process was swapped   : 0
 Times of major page faults  : 34
 Times of minor page faults  : 2361


 And under 4BSD:

  Running a dry run with bison calc...done.
  Running 1st run with bison calc... 44.22 real 37.94 user 0.85 sys
  Running 2nd run with bison calc... 46.21 real 37.98 user 0.85 sys
  Running 3rd run with bison calc... 45.32 real 38.13 user 0.67 sys

  Running a dry run with lemon calc...done.
  Running 1st run with lemon calc... 116.53 real 100.10 user 1.13 sys
  Running 2nd run with lemon calc... 112.61 real 100.35 user 0.86 sys
  Running 3rd run with lemon calc... 114.16 real 100.19 user 1.04 sys

  Time spent in user mode (CPU seconds) : 553.392s
  Time spent in kernel mode (CPU seconds) : 6.978s
  Total time : 10:40.80s
  CPU utilization (percentage) : 87.4%
  Times the process was swapped : 223
  Times of major page faults : 50
  Times of minor page faults : 2750


 Just a heads up, it does indeed look as though things have gone
 backwards in terms of performance.  -sc

 --
 Sean Chittenden




Re: More ULE bugs fixed.

2003-10-17 Thread Sean Chittenden
   I think you cvsup'd at a bad time.  I fixed a bug that would have
   caused the system to lock up in this case late last night.  On my
   system it freezes for a few seconds and then returns.  I can stop
   that by turning down the interactivity threshold.
 
  Hrm, I must concur that while ULE seems a tad snappier on the
  responsiveness end, it seems to be lacking in terms of real world
  performance compared to 4BSD.
 
 Thanks for the stats.  Is this on SMP or UP?

UP.  -sc

-- 
Sean Chittenden


Re: More ULE bugs fixed.

2003-10-17 Thread Matteo Riondato
On Wed, 2003-10-15 at 09:51, Jeff Roberson wrote:
 I fixed two bugs that were exposed due to more of the kernel running
 outside of Giant.  ULE had some issues with priority propagation that
 stopped it from working very well.
 
 Things should be much improved. 

On my Athlon XP 2000+ the situation is much better. No mouse jerkiness,
whatever the load on the system is. The system responds better in every situation.
I'm using libc_r, so no problems with any Gnome apps caused (probably) by KSE.
Best Regards.
-- 
Rionda aka Matteo Riondato
G.U.F.I Staff Member (http://www.gufi.org)
BSD-FAQ-it Main Developer (http://www.gufi.org/~rionda)
GPG key at: http://www.riondabsd.net/riondagpg.asc
Sent from: kaiser.sig11.org running FreeBSD-5.1-CURRENT




Re: More ULE bugs fixed.

2003-10-17 Thread Sean Chittenden
I think you cvsup'd at a bad time.  I fixed a bug that would have
caused the system to lock up in this case late last night.  On my
system it freezes for a few seconds and then returns.  I can stop
that by turning down the interactivity threshold.
  
   Hrm, I must concur that while ULE seems a tad snappier on the
   responsiveness end, it seems to be lacking in terms of real world
   performance compared to 4BSD.
  
  Thanks for the stats.  Is this on SMP or UP?
 
 UP.  -sc

Would the commit to src/sys/kern/kern_switch.c:1.62 fix the
following crash? (I can't find my kernel with debugging symbols.)

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x30
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc054699f
stack pointer           = 0x10:0xd6713b20
frame pointer           = 0x10:0xd6713b2c
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 3 (g_up)
trap number             = 12
panic: page fault

syncing disks, buffers remaining...

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x0
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0536771
stack pointer           = 0x10:0xdb7d4bb4
frame pointer           = 0x10:0xdb7d4bc0
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 45 (syncer)
trap number             = 12
panic: page fault

#0  0xc052eeeb in doadump ()
#1  0xc052f579 in boot ()
#2  0xc052f958 in panic ()
#3  0xc06e5536 in trap_fatal ()
#4  0xc06e4b73 in trap ()
#5  0xc06d53d8 in calltrap ()
#6  0xc05460bb in sched_switch ()
#7  0xc05384eb in mi_switch ()
#8  0xc0537b9f in msleep ()
#9  0xc058eca3 in sched_sync ()
#10 0xc0518321 in fork_exit ()

-sc

-- 
Sean Chittenden


Re: More ULE bugs fixed.

2003-10-16 Thread Eirik Oeverby
Jeff Roberson wrote:
On Wed, 15 Oct 2003, Eirik Oeverby wrote:


Eirik Oeverby wrote:

Jeff Roberson wrote:


I fixed two bugs that were exposed due to more of the kernel running
outside of Giant.  ULE had some issues with priority propagation that
stopped it from working very well.
Things should be much improved.  Feedback, as always, is welcome.  I'd
like to look into making this the default scheduler for 5.2 if things
start looking up.  I hope that scares you all into using it more. :-)


Hi..
Just tested, so far it seems good. System CPU load is floored (near 0),
system is very responsive, no mouse sluggishness or random
mouse/keyboard input.
Doing a make -j 20 buildworld now (on my 1ghz p3 thinkpad ;), and
running some SQLServer stuff in VMWare. We'll see how it fares.

Hi, just a followup message.
I'm now running the buildworld mentioned above, and the system is pretty
much unusable. It exhibits the same symptoms as I have mentioned before,
mouse jumpiness, bogus mouse input (movement, clicks), and the system is
generally very jerky and unresponsive. This is particularily evident
when doing things like webpage loading/browsing/rendering, but it's
noticeable all the time, no matter what I am doing. As an example, the
last sentence I wote without seeing a single character on screen before
I was finsihed writing it, and it appeared with a lot more typos than I
usually make ;)
I'm running *without* invariants and witness right now, i.e. a kernel
100% equal to the SCHED_4BSD kernel.


Can you confirm the revision of your sys/kern/sched_ule.c file?  How does
SCHED_4BSD respond in this same test?

Yes I can. From file:
__FBSDID($FreeBSD: src/sys/kern/sched_ule.c,v 1.59 2003/10/15 07:47:06 jeff Exp $);

I am running SCHED_4BSD now, with a make -j 20 buildworld running, and I
do not experience any of the problems. Keyboard and mouse input is 
smooth, and though apps run slightly slower due to the massive load on 
the system, there is none of the jerkiness I have seen before.

Anything else I can do to help?

/Eirik

Thanks,
Jeff

Best regards,
/Eirik


Re: More ULE bugs fixed.

2003-10-16 Thread Jeff Roberson
On Thu, 16 Oct 2003, Eirik Oeverby wrote:

 Jeff Roberson wrote:
  On Wed, 15 Oct 2003, Eirik Oeverby wrote:
 
 
 Eirik Oeverby wrote:
 
 Jeff Roberson wrote:
 
 
 I fixed two bugs that were exposed due to more of the kernel running
 outside of Giant.  ULE had some issues with priority propagation that
 stopped it from working very well.
 
 Things should be much improved.  Feedback, as always, is welcome.  I'd
 like to look into making this the default scheduler for 5.2 if things
 start looking up.  I hope that scares you all into using it more. :-)
 
 
 Hi..
 Just tested, so far it seems good. System CPU load is floored (near 0),
 system is very responsive, no mouse sluggishness or random
 mouse/keyboard input.
 Doing a make -j 20 buildworld now (on my 1ghz p3 thinkpad ;), and
 running some SQLServer stuff in VMWare. We'll see how it fares.
 
 Hi, just a followup message.
 I'm now running the buildworld mentioned above, and the system is pretty
 much unusable. It exhibits the same symptoms as I have mentioned before,
 mouse jumpiness, bogus mouse input (movement, clicks), and the system is
 generally very jerky and unresponsive. This is particularily evident
 when doing things like webpage loading/browsing/rendering, but it's
 noticeable all the time, no matter what I am doing. As an example, the
 last sentence I wote without seeing a single character on screen before
 I was finsihed writing it, and it appeared with a lot more typos than I
 usually make ;)
 
 I'm running *without* invariants and witness right now, i.e. a kernel
 100% equal to the SCHED_4BSD kernel.
 
 
  Can you confirm the revision of your sys/kern/sched_ule.c file?  How does
  SCHED_4BSD respond in this same test?

 Yes I can. From file:
 __FBSDID($FreeBSD: src/sys/kern/sched_ule.c,v 1.59 2003/10/15 07:47:06 jeff Exp $);
 I am running SCHED_4BSD now, with a make -j 20 buildworld running, and I
 do not experience any of the problems. Keyboard and mouse input is
 smooth, and though apps run slightly slower due to the massive load on
 the system, there is none of the jerkiness I have seen before.

 Anything else I can do to help?

Yup, try again. :-)  I found another bug and tuned some parameters of the
scheduler.  The bug was introduced after I did my paper for BSDCon and so
I never ran into it when I was doing serious stress testing.

Hopefully this will be a huge improvement.  I did a make -j16 buildworld
and used mozilla while in kde2.  It was fine unless I tried to scroll
around rapidly in a page full of several megabyte images for many minutes.


 /Eirik

  Thanks,
  Jeff
 
 
 Best regards,
 /Eirik
 
 


Re: More ULE bugs fixed.

2003-10-16 Thread Peter Kadau
Hi !

 Things should be much improved.  Feedback, as always, is welcome.

Wow ! Smoothly working under a load of approx. 4.
Running gnome2, mozilla, evolution, mplayer and kpdf.
Running portsdb -Uu and a kernel build.
No stuttering mouse, no irritating delays, fast rendering.
That's definitely better than _4BSD.
(UP machine)

Cheers
Peter

-- 
[EMAIL PROTECTED]

Campus der Max-Planck-Institute Tübingen
Netzwerk- und Systemadministration

Tel: +49 7071 601 598
Fax: +49 7071 601 616



Re: More ULE bugs fixed.

2003-10-16 Thread Eirik Oeverby
Jeff Roberson wrote:
On Thu, 16 Oct 2003, Eirik Oeverby wrote:


Jeff Roberson wrote:

On Wed, 15 Oct 2003, Eirik Oeverby wrote:



Eirik Oeverby wrote:


Jeff Roberson wrote:



I fixed two bugs that were exposed due to more of the kernel running
outside of Giant.  ULE had some issues with priority propagation that
stopped it from working very well.
Things should be much improved.  Feedback, as always, is welcome.  I'd
like to look into making this the default scheduler for 5.2 if things
start looking up.  I hope that scares you all into using it more. :-)


Hi..
Just tested, so far it seems good. System CPU load is floored (near 0),
system is very responsive, no mouse sluggishness or random
mouse/keyboard input.
Doing a make -j 20 buildworld now (on my 1ghz p3 thinkpad ;), and
running some SQLServer stuff in VMWare. We'll see how it fares.

Hi, just a followup message.
I'm now running the buildworld mentioned above, and the system is pretty
much unusable. It exhibits the same symptoms as I have mentioned before,
mouse jumpiness, bogus mouse input (movement, clicks), and the system is
generally very jerky and unresponsive. This is particularily evident
when doing things like webpage loading/browsing/rendering, but it's
noticeable all the time, no matter what I am doing. As an example, the
last sentence I wote without seeing a single character on screen before
I was finsihed writing it, and it appeared with a lot more typos than I
usually make ;)
I'm running *without* invariants and witness right now, i.e. a kernel
100% equal to the SCHED_4BSD kernel.


Can you confirm the revision of your sys/kern/sched_ule.c file?  How does
SCHED_4BSD respond in this same test?

Yes I can. From file:
__FBSDID($FreeBSD: src/sys/kern/sched_ule.c,v 1.59 2003/10/15 07:47:06 jeff Exp $);

I am running SCHED_4BSD now, with a make -j 20 buildworld running, and I
do not experience any of the problems. Keyboard and mouse input is
smooth, and though apps run slightly slower due to the massive load on
the system, there is none of the jerkiness I have seen before.
Anything else I can do to help?


Yup, try again. :-)  I found another bug and tuned some parameters of the
scheduler.  The bug was introduced after I did my paper for BSDCon and so
I never ran into it when I was doing serious stress testing.
Hopefully this will be a huge improvement.  I did a make -j16 buildworld
and used mozilla while in kde2.  It was fine unless I tried to scroll
around rapidly in a page full of several megabyte images for many minutes.

It is. Still not perfect, but now it's somewhere around the 4BSD mark I
would say. The thing about 'make buildworld' is that it doesn't get really
tough before it hits some of the larger directories, like the crypto
stuff etc., where there are many .c files in one dir - before it gets
that far, there are at most 2 or 3 cc1 processes going concurrently.
As soon as I get 10-20 of them, things start getting sluggish, but I
suppose it's hard to avoid that. What disturbs me somewhat, though, is
that I get some of this sluggishness (and other symptoms I've mentioned
before) even when I'm running 'nice -n 20 make -j 20 buildworld',
meaning the cc1 processes and all that are running (very) nice. The fact
that I still have issues even when doing that would lead me to think
the problem is somewhere other than in the scheduler.
Now I can't say I'm completely sure if this is also the case with 4BSD -
I only tested the nice stuff after the last reboot.

But all in all, things are better now than yesterday morning. Kudos!

/Eirik



Re: More ULE bugs fixed.

2003-10-15 Thread Eirik Oeverby
Jeff Roberson wrote:
I fixed two bugs that were exposed due to more of the kernel running
outside of Giant.  ULE had some issues with priority propagation that
stopped it from working very well.
Things should be much improved.  Feedback, as always, is welcome.  I'd
like to look into making this the default scheduler for 5.2 if things
start looking up.  I hope that scares you all into using it more. :-)

Hi..
Just tested, so far it seems good. System CPU load is floored (near 0), 
system is very responsive, no mouse sluggishness or random 
mouse/keyboard input.
Doing a make -j 20 buildworld now (on my 1ghz p3 thinkpad ;), and 
running some SQLServer stuff in VMWare. We'll see how it fares.

Thanks,
/Eirik


Re: More ULE bugs fixed.

2003-10-15 Thread Eirik Oeverby
Eirik Oeverby wrote:
Jeff Roberson wrote:

I fixed two bugs that were exposed due to more of the kernel running
outside of Giant.  ULE had some issues with priority propagation that
stopped it from working very well.
Things should be much improved.  Feedback, as always, is welcome.  I'd
like to look into making this the default scheduler for 5.2 if things
start looking up.  I hope that scares you all into using it more. :-)


Hi..
Just tested, so far it seems good. System CPU load is floored (near 0), 
system is very responsive, no mouse sluggishness or random 
mouse/keyboard input.
Doing a make -j 20 buildworld now (on my 1ghz p3 thinkpad ;), and 
running some SQLServer stuff in VMWare. We'll see how it fares.

Hi, just a followup message.
I'm now running the buildworld mentioned above, and the system is pretty
much unusable. It exhibits the same symptoms as I have mentioned before,
mouse jumpiness, bogus mouse input (movement, clicks), and the system is
generally very jerky and unresponsive. This is particularily evident
when doing things like webpage loading/browsing/rendering, but it's
noticeable all the time, no matter what I am doing. As an example, the
last sentence I wote without seeing a single character on screen before
I was finsihed writing it, and it appeared with a lot more typos than I
usually make ;)
I'm running *without* invariants and witness right now, i.e. a kernel
100% equal to the SCHED_4BSD kernel.

Best regards,
/Eirik


Re: More ULE bugs fixed.

2003-10-15 Thread Daniel Eischen
On Wed, 15 Oct 2003, Jeff Roberson wrote:

 I fixed two bugs that were exposed due to more of the kernel running
 outside of Giant.  ULE had some issues with priority propagation that
 stopped it from working very well.
 
 Things should be much improved.  Feedback, as always, is welcome.  I'd
 like to look into making this the default scheduler for 5.2 if things
 start looking up.  I hope that scares you all into using it more. :-)

Before you do that, can you look into changing the scheduler
interfaces to address David Xu's concern with it being
suboptimal for KSE processes?

-- 
Dan Eischen



Re: More ULE bugs fixed.

2003-10-15 Thread Jeff Roberson
On Wed, 15 Oct 2003, Eirik Oeverby wrote:

 Eirik Oeverby wrote:
  Jeff Roberson wrote:
 
  I fixed two bugs that were exposed due to more of the kernel running
  outside of Giant.  ULE had some issues with priority propagation that
  stopped it from working very well.
 
  Things should be much improved.  Feedback, as always, is welcome.  I'd
  like to look into making this the default scheduler for 5.2 if things
  start looking up.  I hope that scares you all into using it more. :-)
 
 
  Hi..
  Just tested, so far it seems good. System CPU load is floored (near 0),
  system is very responsive, no mouse sluggishness or random
  mouse/keyboard input.
  Doing a make -j 20 buildworld now (on my 1ghz p3 thinkpad ;), and
  running some SQLServer stuff in VMWare. We'll see how it fares.

 Hi, just a followup message.
 I'm now running the buildworld mentioned above, and the system is pretty
 much unusable. It exhibits the same symptoms as I have mentioned before,
 mouse jumpiness, bogus mouse input (movement, clicks), and the system is
 generally very jerky and unresponsive. This is particularily evident
 when doing things like webpage loading/browsing/rendering, but it's
 noticeable all the time, no matter what I am doing. As an example, the
 last sentence I wote without seeing a single character on screen before
 I was finsihed writing it, and it appeared with a lot more typos than I
 usually make ;)

 I'm running *without* invariants and witness right now, i.e. a kernel
 100% equal to the SCHED_4BSD kernel.

Can you confirm the revision of your sys/kern/sched_ule.c file?  How does
SCHED_4BSD respond in this same test?

Thanks,
Jeff


 Best regards,
 /Eirik




Re: More ULE bugs fixed.

2003-10-15 Thread Jeff Roberson
On Wed, 15 Oct 2003, Daniel Eischen wrote:

 On Wed, 15 Oct 2003, Jeff Roberson wrote:

  I fixed two bugs that were exposed due to more of the kernel running
  outside of Giant.  ULE had some issues with priority propagation that
  stopped it from working very well.
 
  Things should be much improved.  Feedback, as always, is welcome.  I'd
  like to look into making this the default scheduler for 5.2 if things
  start looking up.  I hope that scares you all into using it more. :-)

 Before you do that, can you look into changing the scheduler
 interfaces to address David Xu's concern with it being
 suboptimal for KSE processes?

Certainly, it may not happen if I can't find out what's making things so
jerky for gnome/kde users.  If it looks like it will, I'll investigate the
kse issues.


 --
 Dan Eischen





Re: More ULE bugs fixed.

2003-10-15 Thread Daniel Eischen
On Wed, 15 Oct 2003, Jeff Roberson wrote:

 On Wed, 15 Oct 2003, Daniel Eischen wrote:
 
  On Wed, 15 Oct 2003, Jeff Roberson wrote:
 
   I fixed two bugs that were exposed due to more of the kernel running
   outside of Giant.  ULE had some issues with priority propagation that
   stopped it from working very well.
  
   Things should be much improved.  Feedback, as always, is welcome.  I'd
   like to look into making this the default scheduler for 5.2 if things
   start looking up.  I hope that scares you all into using it more. :-)
 
  Before you do that, can you look into changing the scheduler
  interfaces to address David Xu's concern with it being
  suboptimal for KSE processes?
 
 Certainly, it may not happen if I can't find out what's making things so
 jerky for gnome/kde users.  If it looks like it will, I'll investigate the
 kse issues.

Thanks, I appreciate it.

-- 
Dan Eischen



Re: More ULE bugs fixed.

2003-10-15 Thread Julian Elischer


On Wed, 15 Oct 2003, Daniel Eischen wrote:

 On Wed, 15 Oct 2003, Jeff Roberson wrote:
 
  I fixed two bugs that were exposed due to more of the kernel running
  outside of Giant.  ULE had some issues with priority propagation that
  stopped it from working very well.
  
  Things should be much improved.  Feedback, as always, is welcome.  I'd
  like to look into making this the default scheduler for 5.2 if things
  start looking up.  I hope that scares you all into using it more. :-)
 
 Before you do that, can you look into changing the scheduler
 interfaces to address David Xu's concern with it being
 suboptimal for KSE processes?
 
There is also some work that I'd like to get done re:
cleaning up the scheduler interface a bit..

I know that Jeff and I have discussed this before but it was a long
time ago, and I've forgotten a lot and also learned a bit since then..

Here's my logic on the matter:

Any process has a number (fixed or variable) of kernel entities that
can be scheduled. In KSE (gotta get a better name) there are a variable
number of them. In libthr they are 1:1.

I would postulate that the action of scheduling these items in a fair
way is up to the scheduler. I had a very crude fairness module
added to the BSD4.4 scheduler but I think that fairness
is a property of the scheduler and not of the threading package.

If the scheduler doesn't care whether threads are scheduled fairly then it
can just schedule all threads equally. I would say that the ksegrp
in question (which represents a rough unit of 'fairness') should
make a call to the scheduler on creation specifying the required
concurrency.

At the moment KSE-M:N based ksegrps would specify N = NCPU, and
THR based ksegrps would specify N = NTHREADS.
KSE-1:1 runs with a KSEGRP with a concurrency of 1 per thread.
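
Purely as an illustration of the rule above (this is not kernel code, and the function name and the string labels are made up for the example), the value a ksegrp would request looks like this:

```python
def requested_concurrency(model, ncpu, nthreads):
    """Concurrency a ksegrp would ask the scheduler for, per threading model.

    The model labels are illustrative strings, not kernel constants.
    """
    rules = {
        "KSE-M:N": ncpu,      # M:N groups ask for one slot per CPU
        "THR": nthreads,      # libthr groups ask for one slot per thread
        "KSE-1:1": 1,         # one ksegrp per thread, each with concurrency 1
    }
    return rules[model]


# Example: a 2-CPU box running a process with 8 threads.
for model in ("KSE-M:N", "THR", "KSE-1:1"):
    print(model, requested_concurrency(model, ncpu=2, nthreads=8))
```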

(I still think that THR should allocate a KSEGRP per thread not a KSE
but it's not critical.)

Basically what I'm saying is that each scheduler should take
a concurrency setting for each KSEGRP, and how it implements it
is hidden from higher layers.  The current 4.4 scheduler would
implement it using KSEs and the existing code, but other schedulers may
choose to implement it in different manners.

I think the top layer API calls for the scheduler should be:
setrunnable(thread) 
choosethread()
sched_clocktick()
sched_set_concurrancy()
(plus all the other 'entrypoints')
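
To make the shape of that interface concrete, here is a toy model of it. It is a sketch under several assumptions: the `Thread` class, its `group` field (standing in for the ksegrp), the FIFO policy, and the default limit of 1 are all invented for the example, and only the four entry point names come from the list above (spelled as written there).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thread:
    name: str
    group: str  # stand-in for the thread's ksegrp (hypothetical field)


class ToyScheduler:
    """FIFO scheduler that caps how many threads of one group run at once."""

    def __init__(self):
        self.runq = []    # runnable threads, oldest first
        self.limit = {}   # group -> allowed concurrency
        self.on_cpu = {}  # group -> number of threads currently chosen to run

    def sched_set_concurrancy(self, group, n):
        self.limit[group] = n

    def setrunnable(self, thread):
        self.runq.append(thread)

    def choosethread(self):
        # Pick the oldest runnable thread whose group is below its limit.
        for i, t in enumerate(self.runq):
            if self.on_cpu.get(t.group, 0) < self.limit.get(t.group, 1):
                self.on_cpu[t.group] = self.on_cpu.get(t.group, 0) + 1
                return self.runq.pop(i)
        return None  # nothing eligible: the CPU would idle

    def sched_clocktick(self, thread):
        # Quantum expired: take the thread off the CPU and requeue it.
        self.on_cpu[thread.group] -= 1
        self.setrunnable(thread)


sched = ToyScheduler()
sched.sched_set_concurrancy("A", 1)
sched.sched_set_concurrancy("B", 2)
for name, grp in [("a1", "A"), ("a2", "A"), ("b1", "B"), ("b2", "B")]:
    sched.setrunnable(Thread(name, grp))
picked = [sched.choosethread().name for _ in range(3)]
print(picked)  # a2 is skipped while a1 holds group A's only slot
```

The point of the sketch is only that callers never see how the limit is enforced; a different scheduler could keep the same four calls and enforce the concurrency in some other way.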


I think that the scheduler needs to be in control of scheduling threads
because there is too much inside information needed for it to be done
properly by an outside entity. For example, if the scheduler is not a
priority-based scheduler then an outside entity cannot know
which thread should be run next when there is a choice.

This would mean that each scheduler would need its own module to
do this juggling instead of having a separate module to do it.

It makes the job of the scheduler more difficult, but in fact it has to
be so, because true POSIX process-scope threads require that the
scheduler do this work.


When a thread is made runnable (with a unix priority),
the scheduler needs to look at this thread in the context of all the
other threads from this process, the current concurrency rule for that
ksegrp and the other runnable threads, and adjust things so that:
1/ the new thread is run some time
2/ the ksegrp doesn't get TOO MUCH cpu, possibly
punishing other threads in the group to compensate..

This is all up for discussion, but it's my current thinking.

Julian


