Re: More ULE bugs fixed.
On (2003/11/04 15:46), Jeff Roberson wrote: The thing is, I'm using 4BSD, not ULE, so I wouldn't trouble Jeff to look for a cause for that specific problem in ULE. How long have you been seeing this? Are you using a usb mouse? Can you try with PS/2 if you are? Since my last update, Fri Oct 24 17:47:22. I am using a USB mouse, but don't have a PS/2 one. I'm also using moused, and my WM is sawfish. The problem with all these reports is that they're scattered. It's hard to pin down exactly what the common elements are. Indeed, we may be looking at combinations of elements. I don't have time to be more helpful, which is why I hadn't complained. I just wanted to include the datapoint that over-active mouse behaviour under load exists under SCHED_4BSD as well. Incidentally, this is under ATA disk load. I don't really push my CPU. Ciao, Sheldon. ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: More ULE bugs fixed.
Sheldon Hearn wrote: On (2003/11/04 15:46), Jeff Roberson wrote: The thing is, I'm using 4BSD, not ULE, so I wouldn't trouble Jeff to look for a cause for that specific problem in ULE. How long have you been seeing this? Are you using a usb mouse? Can you try with PS/2 if you are? Since my last update, Fri Oct 24 17:47:22. I am using a USB mouse, but don't have a PS/2 one. I'm also using moused, and my WM is sawfish. The problem with all these reports is that they're scattered. It's hard to pin down exactly what the common elements are. Indeed, we may be looking at combinations of elements. I don't have time to be more helpful, which is why I hadn't complained. I just wanted to include the datapoint that over-active mouse behaviour under load exists under SCHED_4BSD as well. Incidentally, this is under ATA disk load. I don't really push my CPU. Though I am not a hardcore C programmer, much less a FreeBSD contributor in any way, I do have some experience in tracking down problems like this; I used to have a lot of them on some of the more obscure platforms I have used in the past. My feeling is (and it might be completely wrong, of course) that we are dealing with at least two completely separate issues here: the first has to do with mouse jerkiness, the second with bogus mouse events. There is a significant difference between the two, and personally I am leaning towards concluding that the first has to do with the scheduler, and the second with something entirely different, such as the interrupt handler or something of that sort. The first is simply that the mouse stops for a brief moment and then continues from the point where it stopped. Perhaps this is the situation that is remedied by bypassing moused? Is moused perhaps not getting the CPU cycles it needs to process and pass on mouse messages? The second is that mouse messages are actually *lost*, or bogus ones are being generated. 
I guess it's the first, making moused or X misinterpret the messages it gets. Where along the chain it fails I obviously have no clue. The consequence is that the mouse stops (as in #1) but then resumes from an entirely different point, be it 10 pixels away or at the other end of the screen, possibly even generating a button-press message (but not necessarily the corresponding button-release message). These two situations could at first sight be mistaken for the same symptom, but I am pretty sure they are very different. One may influence the other, or they may by coincidence (or for some good reason) happen at the same time, but I believe the errors happen in different parts of the kernel. When you say you get the bogus mouse events (which I believe you are saying, at least ;) only during load, I'm immediately thinking that yes, that might make sense. But I guess that's better left to those who are in the know to decide ;) I have never seen it happen with the 4BSD scheduler, but that might have other reasons (hardware?). Why don't you try with the new interrupt handler? It helped me quite a lot. :) /Eirik Ciao, Sheldon.
Re: More ULE bugs fixed.
On Wed, 5 Nov 2003, Sheldon Hearn wrote: On (2003/11/04 15:46), Jeff Roberson wrote: The thing is, I'm using 4BSD, not ULE, so I wouldn't trouble Jeff to look for a cause for that specific problem in ULE. How long have you been seeing this? Are you using a usb mouse? Can you try with PS/2 if you are? Since my last update, Fri Oct 24 17:47:22. I am using a USB mouse, but don't have a PS/2 one. I'm also using moused, and my WM is sawfish. The problem with all these reports is that they're scattered. It's hard to pin down exactly what the common elements are. Indeed, we may be looking at combinations of elements. I don't have time to be more helpful, which is why I hadn't complained. I just wanted to include the datapoint that over-active mouse behaviour under load exists under SCHED_4BSD as well. Incidentally, this is under ATA disk load. I don't really push my CPU. There's been some speculation that the PS/2 mouse problem could be due to high interrupt latency for non-fast interrupt handlers (especially ones not MPSAFE) in 5.x. I think it would make a lot of sense for us to push Giant off both the PS/2 mouse and syscons interrupt handlers in the near future. For syscons, this would also improve the reliability of dropping into the debugger from a non-serial console. Robert N M Watson FreeBSD Core Team, TrustedBSD Projects [EMAIL PROTECTED] Network Associates Laboratories
Re: More ULE bugs fixed.
On Wed, Nov 05, 2003 at 11:28:50AM +0100 I heard the voice of Eirik Oeverby, and lo! it spake thus: The second is that mouse messages are actually *lost*, or bogus ones are being generated. I guess it's the first, making moused or X misinterpret the messages it gets. Where along the chain it fails I obviously have no clue. The consequence of this is that when the mouse stops (like in #1) but then resumes from an entirely different point - be it 10 pixels away or at the other end of the screen - possibly even generating a button push (but not necessarily the corresponding button release) message. Note that I've had this to a greater or lesser extent for as long as I can remember (certainly back to 3.0-CURRENT). It corresponds with syslog'd messages on my xconsole along the lines of:

    Nov 3 12:46:13 mortis kernel: psmintr: out of sync (00c0 != ).
    Nov 3 12:46:13 mortis kernel: psmintr: discard a byte (12).

It's certainly a lot more common (by orders of magnitude) on 5.x in the past... oh, I dunno, year-ish, than it was previously. I lose mouse function for maybe a second, then it squirms itself off somewhere on the screen and sends some button press events. I'm currently running 5.1-R, the traditional scheduler, a PS/2 mouse with no moused. And since I got them (much more rarely) with earlier 5-CURRENT's, and with 4-CURRENT's, etc., I can't see how it's scheduler related. When you say you get the bogus mouse events (which I believe you are saying, at least ;) only during load, I'm immediately thinking that yes, that might make sense. I don't get it only under load; sometimes from flat idle. However, it's usually when I first move the mouse after it has been sitting still for a while (where 'while' can vary from a few seconds to a few days, of course); it hardly ever happens in mid-move. 
-- Matthew Fuller (MF4839) | [EMAIL PROTECTED] Systems/Network Administrator | http://www.over-yonder.net/~fullermd/ The only reason I'm burning my candle at both ends, is because I haven't figured out how to light the middle yet
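Eirik's theory that bytes are being lost, and Matthew's `psmintr: out of sync` / `discard a byte` messages, can be made concrete with a small userland model of how a host reassembles the standard 3-byte PS/2 mouse packet. This is a hypothetical sketch, not psm(4)'s actual code: `ps2_feed()` and `ps2_decode()` are illustrative names, and the only sync check assumed is that bit 3 of the first byte is always set. It shows how one dropped byte can misframe the stream so that a data byte is taken for a header, producing a jump and a phantom button press.

```c
#include <stdint.h>

/*
 * Hypothetical, simplified decoder for the standard 3-byte PS/2 mouse
 * packet; an illustration only, not FreeBSD's psm(4) code.
 * Byte 0: bit 0 left, bit 1 right, bit 2 middle, bit 3 always 1,
 *         bit 4 X sign, bit 5 Y sign, bits 6-7 overflow.
 * Bytes 1 and 2: low 8 bits of the X and Y movement.
 */
struct ps2_event {
    int buttons;    /* bit 0 left, bit 1 right, bit 2 middle */
    int dx, dy;     /* signed movement since the previous packet */
};

struct ps2_decoder {
    uint8_t buf[3];
    int n;          /* bytes collected for the current packet */
    int discarded;  /* bytes dropped while looking for a header */
};

/* Feed one byte; return 1 and fill *ev once a full packet is assembled. */
static int ps2_feed(struct ps2_decoder *d, uint8_t c, struct ps2_event *ev)
{
    if (d->n == 0 && (c & 0x08) == 0) {
        /* Cannot be a header byte: drop it and keep looking. */
        d->discarded++;
        return 0;
    }
    d->buf[d->n++] = c;
    if (d->n < 3)
        return 0;
    d->n = 0;
    ev->buttons = d->buf[0] & 0x07;
    /* Deltas are 9-bit two's complement; the sign bits live in byte 0. */
    ev->dx = d->buf[1] - ((d->buf[0] & 0x10) ? 256 : 0);
    ev->dy = d->buf[2] - ((d->buf[0] & 0x20) ? 256 : 0);
    return 1;
}

/* Decode a whole byte stream into evs[]; return the number of events. */
static int ps2_decode(const uint8_t *bytes, int len,
                      struct ps2_event *evs, int max)
{
    struct ps2_decoder d = { {0}, 0, 0 };
    int count = 0;

    for (int i = 0; i < len && count < max; i++)
        if (ps2_feed(&d, bytes[i], &evs[count]))
            count++;
    return count;
}
```

Feeding three "move right by 10" packets (`08 0a 00` each) with the first X byte lost, i.e. `08 00 08 0a 00 08 0a 00`, yields a bogus vertical move followed by a right-button press, because `0x0a` happens to have bit 3 set and passes the header check. A real driver can reject more bad headers by checking more bits (the `out of sync (00c0 != )` line suggests psm(4) compares the first byte against a mask), but any such check can still be fooled by data bytes that happen to match it.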
Re: More ULE bugs fixed.
Matthew D. Fuller wrote: On Wed, Nov 05, 2003 at 11:28:50AM +0100 I heard the voice of Eirik Oeverby, and lo! it spake thus: The second is that mouse messages are actually *lost*, or bogus ones are being generated. I guess it's the first, making moused or X misinterpret the messages it gets. Where along the chain it fails I obviously have no clue. The consequence of this is that when the mouse stops (like in #1) but then resumes from an entirely different point - be it 10 pixels away or at the other end of the screen - possibly even generating a button push (but not necessarily the corresponding button release) message. Note that I've had this to a greater or lesser extent for as long as I can remember (certainly back to 3.0-CURRENT). It corresponds with syslog'd messages on my xconsole along the lines of:

    Nov 3 12:46:13 mortis kernel: psmintr: out of sync (00c0 != ).
    Nov 3 12:46:13 mortis kernel: psmintr: discard a byte (12).

It's certainly a lot more common (by orders of magnitude) on 5.x in the past... oh, I dunno, year-ish, than it was previously. I lose mouse function for maybe a second, then it squirms itself off somewhere on the screen and sends some button press events. I'm currently running 5.1-R, the traditional scheduler, a PS/2 mouse with no moused. And since I got them (much more rarely) with earlier 5-CURRENT's, and with 4-CURRENT's, etc, I can't see how it's scheduler related. No idea, but I never got messages like the ones you mention, and it has absolutely never happened on 4.x or with SCHED_4BSD. Weirdness. :) /Eirik When you say you get the bogus mouse events (which I believe you are saying, at least ;) only during load, I'm immediately thinking that yes, that might make sense. I don't get it only under load; sometimes from flat idle. However, it's usually when I first move the mouse after it has been sitting still for a while (where 'while' can vary from a few seconds to a few days, of course); it hardly ever happens in mid-move. 
Re: More ULE bugs fixed.
Jeff Roberson wrote: On Mon, 3 Nov 2003, Eirik Oeverby wrote: Hi, Just recompiled yesterday, running sched_ule.c 1.75. It seems to have re-introduced the bogus mouse events I talked about earlier, after a period of having no problems with it. The change happened between 1.69 and 1.75, and there's also the occasional glitch in keyboard input. How unfortunate, it seems to have fixed other problems. Can you describe the mouse problem? Is it jittery constantly or only under load? Or are you having other problems? Have you tried reverting to SCHED_4BSD? What window manager do you run? The problem has two parts. The mouse tends to 'lock up' for brief moments when the system is under load, in particular during heavy UI operations or when doing compile jobs and such. The second part of the problem is related, and is manifested by the mouse actually making movements I never asked it to make. It's almost as if messages passed from the mouse (PS/2 type) through the kernel are being corrupted or lost - moving the mouse slowly in one direction will suddenly make it jump half across the screen and continue. Also it will quite often produce bogus clicks and drags, i.e. I'll be moving the mouse across the screen and suddenly it grabs something and doesn't let go - as if it got a MouseRightDown event but no MouseRightRelease event (or whatever they are called in the world you are in - I'm coming from OS/2 and other obscure platforms ;). The second problem usually follows the first - it's more likely to happen when the system is under some kind of load: heavy window repainting/updating (Mozilla Thunderbird is a prime example, but other apps can be just as guilty), compile jobs, VMware going crazy with the CPU, heavy disk/network I/O... Anything that places load on the system/kernel can cause this. Running with SCHED_4BSD completely solves these problems, and the bogus mouse event problems did NOT appear with sched_ule 1.69 (which is the last one I tried before 1.75). 
It did appear with ~1.50 and thereabouts though (as reported earlier in this thread). I'm currently running WindowMaker as window manager, but the problem also exists in Gnome and xfce4. Gnome is more likely to exhibit the problem even during low system loads, given that it's more violent UI-wise. You are right though, the later sched_ule revisions DO seem to be better in many other respects - overall performance 'feels' better (at least in console mode). The mouse issues make X kinda hard to use though. Btw, you might be interested in knowing that the keyboard from time to time shows what I think is bogus input as well - I have a consistently higher rate of failure when typing with sched_ule 1.75 than I had with 1.69, and it definitely feels as if keystrokes are getting lost or repeated when they shouldn't be. Not often: I had two or three 'suspicious' typos while writing this message, and I am reluctant to say it's a definite kernel issue, because I'm nowhere near a perfect typist - but it is nevertheless worth noting and might even be worth looking into. Might there be a connection between this and the mouse issues? Thanks, /Eirik Thanks for the report. Cheers, Jeff If you need me to do anything to track this down, let me know. I am, and have always been, running with moused, on a uniprocessor box (ThinkPad T21, 1 GHz P3). Best regards, /Eirik Jeff Roberson wrote: On Fri, 31 Oct 2003, Bruno Van Den Bossche wrote: Jeff Roberson [EMAIL PROTECTED] wrote: On Wed, 29 Oct 2003, Jeff Roberson wrote: On Thu, 30 Oct 2003, Bruce Evans wrote: Test for scheduling buildworlds:

    cd /usr/src/usr.bin
    for i in obj depend all
    do
        MAKEOBJDIRPREFIX=/somewhere/obj time make -s -j16 $i
    done >/tmp/zqz 2>&1

(Run this with an empty /somewhere/obj. The all stage doesn't quite finish.) 
On an ABIT BP6 system with a 400MHz and a 366MHz CPU, with /usr (including /usr/src) nfs-mounted (with 100 Mbps ethernet and a reasonably fast server) and /somewhere/obj ufs1-mounted (on a fairly slow disk; no soft-updates), this gives the following times:

    SCHED_ULE-yesterday, with not so careful setup:
      40.37 real    8.26 user   6.26 sys
     278.90 real   59.35 user  41.32 sys
     341.82 real  307.38 user  69.01 sys
    SCHED_ULE-today, run immediately after booting:
      41.51 real    7.97 user   6.42 sys
     306.64 real   59.66 user  40.68 sys
     346.48 real  305.54 user  69.97 sys
    SCHED_4BSD-yesterday, with not so careful setup:
      [same as today except the depend step was 10 seconds slower (real)]
    SCHED_4BSD-today, run immediately after booting:
      18.89 real    8.01 user   6.66 sys
     128.17 real   58.33 user  43.61 sys
     291.59 real  308.48 user  72.33 sys
    SCHED_4BSD-yesterday, with a UP kernel (running on the 366 MHz CPU) with many local changes and not so careful setup:
      17.39 real    8.28 user   5.49 sys
Re: More ULE bugs fixed.
On (2003/11/04 09:29), Eirik Oeverby wrote: The problem is two parts: The mouse tends to 'lock up' for brief moments when the system is under load, in particular during heavy UI operations or when doing compile jobs and such. The second part of the problem is related, and is manifested by the mouse actually making movements I never asked it to make. Wow, I just assumed it was a local problem. I'm also seeing unrequested mouse movement, as if the signals from movements are repeated or amplified. The thing is, I'm using 4BSD, not ULE, so I wouldn't trouble Jeff to look for a cause for that specific problem in ULE. Ciao, Sheldon.
Re: More ULE bugs fixed.
On Tue, 4 Nov 2003, Sheldon Hearn wrote: On (2003/11/04 09:29), Eirik Oeverby wrote: The problem is two parts: The mouse tends to 'lock up' for brief moments when the system is under load, in particular during heavy UI operations or when doing compile jobs and such. The second part of the problem is related, and is manifested by the mouse actually making movements I never asked it to make. Wow, I just assumed it was a local problem. I'm also seeing unrequested mouse movement, as if the signals from movements are repeated or amplified. The thing is, I'm using 4BSD, not ULE, so I wouldn't trouble Jeff to look for a cause for that specific problem in ULE. How long have you been seeing this? Are you using a usb mouse? Can you try with PS/2 if you are? Thanks, Jeff Ciao, Sheldon.
Re: More ULE bugs fixed.
Sheldon Hearn wrote: On (2003/11/04 09:29), Eirik Oeverby wrote: The problem is two parts: The mouse tends to 'lock up' for brief moments when the system is under load, in particular during heavy UI operations or when doing compile jobs and such. The second part of the problem is related, and is manifested by the mouse actually making movements I never asked it to make. Wow, I just assumed it was a local problem. I'm also seeing unrequested mouse movement, as if the signals from movements are repeated or amplified. The thing is, I'm using 4BSD, not ULE, so I wouldn't trouble Jeff to look for a cause for that specific problem in ULE. That is indeed interesting. When I return to 4BSD, everything works very nicely. Perhaps this is some interrupt issue or something? I'll recompile tonight and try with a new kernel (new interrupt stuff for i386 has been checked in recently) and report back. Sorry about the (possibly) false alarm! /Eirik Ciao, Sheldon.
Re: More ULE bugs fixed.
Hi, Just recompiled yesterday, running sched_ule.c 1.75. It seems to have re-introduced the bogus mouse events I talked about earlier, after a period of having no problems with it. The change happened between 1.69 and 1.75, and there's also the occasional glitch in keyboard input. If you need me to do anything to track this down, let me know. I am, and have always been, running with moused, on a uniprocessor box (ThinkPad T21, 1 GHz P3). Best regards, /Eirik Jeff Roberson wrote: On Fri, 31 Oct 2003, Bruno Van Den Bossche wrote: Jeff Roberson [EMAIL PROTECTED] wrote: On Wed, 29 Oct 2003, Jeff Roberson wrote: On Thu, 30 Oct 2003, Bruce Evans wrote: Test for scheduling buildworlds:

    cd /usr/src/usr.bin
    for i in obj depend all
    do
        MAKEOBJDIRPREFIX=/somewhere/obj time make -s -j16 $i
    done >/tmp/zqz 2>&1

(Run this with an empty /somewhere/obj. The all stage doesn't quite finish.) On an ABIT BP6 system with a 400MHz and a 366MHz CPU, with /usr (including /usr/src) nfs-mounted (with 100 Mbps ethernet and a reasonably fast server) and /somewhere/obj ufs1-mounted (on a fairly slow disk; no soft-updates), this gives the following times:

    SCHED_ULE-yesterday, with not so careful setup:
      40.37 real    8.26 user   6.26 sys
     278.90 real   59.35 user  41.32 sys
     341.82 real  307.38 user  69.01 sys
    SCHED_ULE-today, run immediately after booting:
      41.51 real    7.97 user   6.42 sys
     306.64 real   59.66 user  40.68 sys
     346.48 real  305.54 user  69.97 sys
    SCHED_4BSD-yesterday, with not so careful setup:
      [same as today except the depend step was 10 seconds slower (real)]
    SCHED_4BSD-today, run immediately after booting:
      18.89 real    8.01 user   6.66 sys
     128.17 real   58.33 user  43.61 sys
     291.59 real  308.48 user  72.33 sys
    SCHED_4BSD-yesterday, with a UP kernel (running on the 366 MHz CPU) with many local changes and not so careful setup:
      17.39 real    8.28 user   5.49 sys
     130.51 real   60.97 user  34.63 sys
     390.68 real  310.78 user  60.55 sys

Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for the obj and depend stages. 
These stages have little parallelism. SCHED_ULE was only 19% slower for the all stage. ... I reran this with -current (sched_ule.c 1.68, etc.). Result: no significant change. However, with a UP kernel there was no significant difference between the times for SCHED_ULE and SCHED_4BSD. There was a significant difference on UP until last week. I'm working on SMP now. I have some patches but they aren't quite ready yet. I have committed my SMP fixes. I would appreciate it if you could post updated results. ULE now outperforms 4BSD in a single-threaded kernel compile and performs almost identically in a 16-way make. I still have a few more things that I can do to improve the situation. I would expect ULE to pull further ahead in the months to come. I recently had to complete a little piece of software in a course on parallel computing. I've put it online[1] (we only had to write the pract2.cpp file). It calculates the inverse of a Vandermonde matrix and allows you to spawn multiple slave processes that each perform a part of the work. Everything happens in memory, so I've used it lately to test the different changes you made to sched_ule.c, and these last fixes do improve the performance on my dual P3 machine a lot. Here are the results of my (very limited) tests:

    sched4bsd
    ---
    dimension  slaves  time
    1000       1       90.925408
    1000       2       58.897038
    200        1        0.735962
    200        2        0.676660

    sched_ule 1.68
    ---
    dimension  slaves  time
    1000       1       90.951015
    1000       2       70.402845
    200        1        0.743551
    200        2        1.900455

    sched_ule 1.70
    ---
    dimension  slaves  time
    1000       1       90.782309
    1000       2       57.207351
    200        1        0.739998
    200        2        0.383545

I'm not really sure if this is very relevant to you, but from the end-user point of view (me :-)) this does mean something. Thanks! I welcome the feedback, positive or negative, as it helps me improve things. Thanks for the report! Could you run this again under 4bsd and ULE with the following in your .cshrc: set time= ( 5 %Uu %Ss %E %P %X+%Dk %I+%Oio %Fpf+%Ww %cc/%ww ) And then time ./testpract 200 2, etc. 
This will give me a few hints about what's impacting your performance. Thanks! Jeff [1] http://users.pandora.be/bomberboy/mptest/final.tar.bz2 It can be used by running testpract2 with two arguments, the dimension of the matrix and the number of slaves. For example, './testpract2 200 2' will create a matrix with
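The "more than twice as slow" and "19% slower" summaries, and the two-slave results in Bruno's tables, are ratios of elapsed (real) times. As a small worked check, the C sketch below recomputes them from the quoted "today" runs and Bruno's figures; `slowdown_pct()` and `speedup()` are illustrative helper names, not part of any FreeBSD tool.

```c
/*
 * Illustrative helpers for reading the timings quoted in this thread.
 * Both take elapsed ("real") times in seconds.
 */

/* Percentage by which t_new is slower than the baseline t_base. */
static double slowdown_pct(double t_base, double t_new)
{
    return (t_new / t_base - 1.0) * 100.0;
}

/*
 * Parallel speedup: time with one slave divided by time with n slaves.
 * A value below 1.0 means adding slaves made the run slower.
 */
static double speedup(double t_one, double t_n)
{
    return t_one / t_n;
}
```

With the "today" runs, `slowdown_pct(291.59, 346.48)` comes to about 18.8, the 19% quoted for the all stage, while the obj and depend stages both exceed 100%, i.e. more than twice as slow. For Bruno's 200x200 case, `speedup(0.743551, 1.900455)` is about 0.39 under sched_ule 1.68 (two slaves roughly 2.6 times slower than one), versus about 1.9 under 1.70.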
Re: More ULE bugs fixed.
On Sun, 2 Nov 2003, Jeff Roberson wrote: On Sat, 1 Nov 2003, Bruce Evans wrote: My simple make benchmark now takes infinitely longer with ULE under SMP, since make -j 16 with ULE under SMP now hangs nfs after about a minute. 4BSD works better. However, some networking bugs have developed in the last few days. One of their manifestations is that SMP kernels always panic in sbdrop() on shutdown. This was fixed by setting debug.mpsafenet to 0 (fxp is apparently not MPSAFE yet). The last run with sched_ule.c 1.75 shows little difference between ULE and 4BSD:

    % *** zqz.4bsd.1 Wed Oct 29 22:03:29 2003
    % --- zqz.ule.3 Sun Nov 2 22:58:53 2003
    % ***
    % *** 4
    % --- 5,6
    % + === atm
    % + === atm/sscop

The tree compiled by 4BSD is 4 days older, so ULE builds these extra directories.

    % ***
    % *** 227
    % ! 18.49 real 8.26 user 6.38 sys
    % --- 229
    % ! 18.44 real 8.00 user 6.43 sys

Differences for make obj (all this in usr.bin tree).

    % ***
    % *** 229,233
    % ! 265 average shared memory size
    % ! 116 average unshared data size
    % ! 125 average unshared stack size
    % ! 23222 page reclaims
    % ! 26 page faults
    % --- 231,235
    % ! 274 average shared memory size
    % ! 118 average unshared data size
    % ! 128 average unshared stack size
    % ! 22760 page reclaims
    % ! 25 page faults
    % ***
    % *** 236,241
    % ! 918 block output operations
    % ! 9893 messages sent
    % ! 9893 messages received
    % ! 230 signals received
    % ! 13034 voluntary context switches
    % ! 1216 involuntary context switches
    % --- 238,243
    % ! 926 block output operations
    % ! 9973 messages sent
    % ! 9973 messages received
    % ! 232 signals received
    % ! 17432 voluntary context switches
    % ! 1583 involuntary context switches

Tiny differences in time -l output for the obj stage, except ULE does more context switches. The signals are mostly SIGCHLD (needed to fix make(1)).

    % ***
    % *** 245
    % --- 248,249
    % + === atm
    % + === atm/sscop
    % ***
    % *** 506
    % ! 126.67 real 57.42 user 43.83 sys
    % --- 510
    % ! 124.43 real 58.07 user 42.17 sys
    % ***
    % *** 508,512
    % ! 1973 average shared memory size
    % ! 803 average unshared data size
    % ! 128 average unshared stack size
    % ! 203770 page reclaims
    % ! 1459 page faults
    % --- 512,516
    % ! 1920 average shared memory size
    % ! 784 average unshared data size
    % ! 127 average unshared stack size
    % ! 203124 page reclaims
    % ! 1464 page faults
    % ***
    % *** 514,520
    % ! 165 block input operations
    % ! 1463 block output operations
    % ! 83118 messages sent
    % ! 83117 messages received
    % ! 265 signals received
    % ! 100319 voluntary context switches
    % ! 8113 involuntary context switches
    % --- 518,524
    % ! 167 block input operations
    % ! 1469 block output operations
    % ! 83234 messages sent
    % ! 83236 messages received
    % ! 267 signals received
    % ! 125750 voluntary context switches
    % ! 17825 involuntary context switches

Similarly for the depend stage.

    % ***
    % *** 524
    % --- 529,530
    % + === atm
    % + === atm/sscop
    % ***
    % *** 701
    % ! 291.30 real 307.00 user 73.77 sys
    % --- 707
    % ! 290.28 real 308.16 user 74.05 sys
    % ***
    % *** 703,707
    % ! 2073 average shared memory size
    % ! 2076 average unshared data size
    % ! 127 average unshared stack size
    % ! 624020 page reclaims
    % ! 156 page faults
    % --- 709,713
    % ! 2084 average shared memory size
    % ! 2056 average unshared data size
    % ! 128 average unshared stack size
    % ! 626651 page reclaims
    % ! 154 page faults
    % ***
    % *** 709,715
    % ! 72 block input operations
    % ! 2122 block output operations
    % ! 45315 messages sent
    % ! 45317 messages received
    % ! 691 signals received
    % ! 195785 voluntary context switches
    % ! 58130 involuntary context switches
    % --- 715,721
    % ! 83 block input operations
    % ! 2133 block output operations
    % ! 45532 messages sent
    % ! 45524 messages received
    % ! 759 signals received
    % ! 228998 voluntary context switches
    % ! 128078 involuntary context switches

Similarly for the all stage. The benchmark was not run carefully enough for the 1-second differences in the times to be significant. You commented on the nice cutoff before. What do you believe the correct behavior is? 
In ULE I went to great lengths to be certain that I emulated the old behavior of denying nice +20
Re: More ULE bugs fixed.
On Tue, Nov 04, 2003 at 12:33:48AM +1100, Bruce Evans wrote: I think the existence of rtprio and a non-broken idprio makes infinite deprioritization using niceness unnecessary. (idprio is still broken (not available to users) in -current, but it doesn't need to be if priority propagation is working as it should be.) It's safer and fairer for all niced processes to not completely prevent each other being scheduled, and use the special scheduling classes for cases where this is not wanted. I'd mainly like the slices for nice -20 vs nice --20 processes to be very small and/or infrequent. I agree. With idprio, there is no need for a special nice value that is handled outside the normal rules of nice. I always thought that was a wart, after using IRIX, which has a working idprio. -- -- David ([EMAIL PROTECTED])
How nice should behave (was Re: More ULE bugs fixed.)
On Tue, 4 Nov 2003, Bruce Evans wrote: On Sun, 2 Nov 2003, Jeff Roberson wrote: You commented on the nice cutoff before. What do you believe the correct behavior is? In ULE I went to great lengths to be certain that I emulated the old behavior of denying nice +20 processes cpu time when anything nice 0 or above was running. As a result of that, nice -20 processes inhibit any processes with a nice below zero from receiving cpu time. Prior to a commit earlier today, nice -20 would stop nice 0 processes that were non-interactive. I've changed that though so nice 0 will always be able to run, just with a small slice. Based on your earlier comments, you don't believe that this behavior is correct, why, and what would you like to see? Only RELENG_4 has that old behaviour. I think the existence of rtprio and a non-broken idprio makes infinite deprioritization using niceness unnecessary. (idprio is still broken (not available to users) in -current, but it doesn't need to be if priority propagation is working as it should be.) It's safer and fairer for all niced processes to not completely prevent each other being scheduled, and use the special scheduling classes for cases where this is not wanted. I'd mainly like the slices for nice -20 vs nice --20 processes to be very small and/or infrequent. idprio should be able to function properly since we have priority propagation and elevated priorities for m/tsleep. I believe that many people rely on the nice +20 behavior. We could change this and make it a matter of user education. ULE's nice mechanism is very flexible in this regard. I would only have to change one define to force the slice assignment to scale across the whole slice range. Although, I only have 14 possible slice values to hand out, so small differences would be meaningless. 
Bruce
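Jeff's remark that a single define would make slice assignment scale across the whole nice range, and that only 14 slice values exist, can be illustrated with a small model. This is a hypothetical sketch, not the sched_ule.c code: it assumes slices numbered 1 to 14, measures each process's nice distance from the least-nice runnable process (0 to 40), and uses an assumed threshold parameter beyond which a process gets no slice at all.

```c
/*
 * Hypothetical model of mapping nice values to time slices; this is not
 * the sched_ule.c implementation, and the constants are assumptions.
 */
#define SLICE_MIN 1     /* smallest non-zero slice */
#define SLICE_MAX 14    /* 14 distinct slice values: 1..14 */

/*
 * distance: this process's nice value minus that of the least-nice
 *           runnable process (0..40 for nice -20..+20).
 * thresh:   distance beyond which the process gets no slice at all;
 *           the single "define" this model would change.
 * Returns 0 (no slice) or a value in SLICE_MIN..SLICE_MAX.
 */
static int slice_for(int distance, int thresh)
{
    if (thresh <= 0)
        return distance == 0 ? SLICE_MAX : 0;
    if (distance > thresh)
        return 0;
    /* Scale linearly; integer division makes nearby distances collapse. */
    return SLICE_MAX - (distance * (SLICE_MAX - SLICE_MIN)) / thresh;
}
```

With `thresh` set to 40, nice distances 0 through 3 all map to the maximum slice of 14, which is why small nice differences would be meaningless with only 14 values. With a smaller threshold such as 19, a process 20 or more nice levels away from the least-nice runnable process receives a zero slice, which resembles the cutoff behaviour under discussion.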
Re: More ULE bugs fixed.
On Mon, 3 Nov 2003, Eirik Oeverby wrote: Hi, Just recompiled yesterday, running sched_ule.c 1.75. It seems to have re-introduced the bogus mouse events I talked about earlier, after a period of having no problems with it. The change happened between 1.69 and 1.75, and there's also the occasional glitch in keyboard input. How unfortunate, it seems to have fixed other problems. Can you describe the mouse problem? Is it jittery constantly or only under load? Or are you having other problems? Have you tried reverting to SCHED_4BSD? What window manager do you run? Thanks for the report. Cheers, Jeff If you need me to do anything to track this down, let me know. I am, and have always been, running with moused, on a uniprocessor box (ThinkPad T21, 1 GHz P3). Best regards, /Eirik Jeff Roberson wrote: On Fri, 31 Oct 2003, Bruno Van Den Bossche wrote: Jeff Roberson [EMAIL PROTECTED] wrote: On Wed, 29 Oct 2003, Jeff Roberson wrote: On Thu, 30 Oct 2003, Bruce Evans wrote: Test for scheduling buildworlds:

    cd /usr/src/usr.bin
    for i in obj depend all
    do
        MAKEOBJDIRPREFIX=/somewhere/obj time make -s -j16 $i
    done >/tmp/zqz 2>&1

(Run this with an empty /somewhere/obj. The all stage doesn't quite finish.) 
On an ABIT BP6 system with a 400MHz and a 366MHz CPU, with /usr (including /usr/src) nfs-mounted (with 100 Mbps ethernet and a reasonably fast server) and /somewhere/obj ufs1-mounted (on a fairly slow disk; no soft-updates), this gives the following times:

    SCHED_ULE-yesterday, with not so careful setup:
      40.37 real    8.26 user   6.26 sys
     278.90 real   59.35 user  41.32 sys
     341.82 real  307.38 user  69.01 sys
    SCHED_ULE-today, run immediately after booting:
      41.51 real    7.97 user   6.42 sys
     306.64 real   59.66 user  40.68 sys
     346.48 real  305.54 user  69.97 sys
    SCHED_4BSD-yesterday, with not so careful setup:
      [same as today except the depend step was 10 seconds slower (real)]
    SCHED_4BSD-today, run immediately after booting:
      18.89 real    8.01 user   6.66 sys
     128.17 real   58.33 user  43.61 sys
     291.59 real  308.48 user  72.33 sys
    SCHED_4BSD-yesterday, with a UP kernel (running on the 366 MHz CPU) with many local changes and not so careful setup:
      17.39 real    8.28 user   5.49 sys
     130.51 real   60.97 user  34.63 sys
     390.68 real  310.78 user  60.55 sys

Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for the obj and depend stages. These stages have little parallelism. SCHED_ULE was only 19% slower for the all stage. ... I reran this with -current (sched_ule.c 1.68, etc.). Result: no significant change. However, with a UP kernel there was no significant difference between the times for SCHED_ULE and SCHED_4BSD. There was a significant difference on UP until last week. I'm working on SMP now. I have some patches but they aren't quite ready yet. I have committed my SMP fixes. I would appreciate it if you could post updated results. ULE now outperforms 4BSD in a single-threaded kernel compile and performs almost identically in a 16-way make. I still have a few more things that I can do to improve the situation. I would expect ULE to pull further ahead in the months to come. I recently had to complete a little piece of software in a course on parallel computing. 
I've put it online[1] (we only had to write the pract2.cpp file). It calculates the inverse of a Vandermonde matrix and lets you spawn multiple slave processes, each of which performs part of the work. Everything happens in memory, so I've used it lately to test the different changes you made to sched_ule.c, and these last fixes do improve the performance on my dual P3 machine a lot. Here are the results of my (very limited) tests:

sched4bsd
---
dimension  slaves  time
1000       1       90.925408
1000       2       58.897038
200        1        0.735962
200        2        0.676660

sched_ule 1.68
---
dimension  slaves  time
1000       1       90.951015
1000       2       70.402845
200        1        0.743551
200        2        1.900455

sched_ule 1.70
---
dimension  slaves  time
1000       1       90.782309
1000       2       57.207351
200        1        0.739998
200        2        0.383545

I'm not really sure if this is very relevant to you, but from the end-user point of view (me :-)) this does mean something.

Thanks! I welcome the feedback, positive or negative, as it helps me improve things. Thanks for the report! Could you run
Re: More ULE bugs fixed.
On Sat, 1 Nov 2003, Bruce Evans wrote: On Fri, 31 Oct 2003, Jeff Roberson wrote: I have committed my SMP fixes. I would appreciate it if you could post updated results. ULE now outperforms 4BSD in a single-threaded kernel compile and performs almost identically in a 16-way make. I still have a few more things that I can do to improve the situation. I would expect ULE to pull further ahead in the months to come.

My simple make benchmark now takes infinitely longer with ULE under SMP, since make -j 16 with ULE under SMP now hangs nfs after about a minute. 4BSD works better. However, some networking bugs have developed in the last few days. One of their manifestations is that SMP kernels always panic in sbdrop() on shutdown.

The nice issue is still outstanding, as is the incorrect wcpu reporting.

It may be related to nfs processes not getting any cycles even when there are no niced processes.

I've just run your script myself. I was using sched_ule.c rev 1.75. I did not encounter any problem. I also have not run it with 4BSD, so I don't have any performance comparisons. Hopefully, the next time you have an opportunity to test, things will go smoothly. I fixed a bug in sched_prio() that may have caused this behavior.

You commented on the nice cutoff before. What do you believe the correct behavior is? In ULE I went to great lengths to be certain that I emulated the old behavior of denying nice +20 processes cpu time when anything nice 0 or above was running. As a result of that, nice -20 processes inhibit any processes with a nice above zero from receiving cpu time. Prior to a commit earlier today, nice -20 would stop nice 0 processes that were non-interactive. I've changed that, though, so nice 0 will always be able to run, just with a small slice. Based on your earlier comments, you don't believe that this behavior is correct. Why, and what would you like to see?
Thanks, Jeff Bruce
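The nice cutoff Jeff describes can be modeled in a few lines of sh. This is a minimal sketch under stated assumptions, not code from sched_ule.c: the window size `NICE_WINDOW` (20) and the helper `may_run` are hypothetical, chosen so that nice +20 is starved while a nice 0 process is runnable and, after the change Jeff mentions, nice 0 and below are always allowed to run.

```sh
#!/bin/sh
# Hypothetical model of the nice cutoff described in the thread.
# NICE_WINDOW is an assumed constant, not a value taken from sched_ule.c.
NICE_WINDOW=20

# may_run NICE MIN_NICE
# Succeeds (exit 0) if a process at NICE may receive CPU time while the
# least-nice runnable process is at MIN_NICE.
may_run() {
    proc_nice=$1
    min_nice=$2
    # After the change, nice 0 and below are always allowed to run.
    if [ "$proc_nice" -le 0 ]; then
        return 0
    fi
    # Otherwise a process is starved once it is NICE_WINDOW or more
    # steps nicer than the least-nice runnable process.
    [ $((proc_nice - min_nice)) -lt "$NICE_WINDOW" ]
}
```

Under these assumptions `may_run 20 0` fails (nice +20 is starved by nice 0), `may_run 1 -20` also fails, and `may_run -4 -20` succeeds, so a nice -20 hog leaves CPU time only to processes at nice 0 or below.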
Re: More ULE bugs fixed.
On Fri, 31 Oct 2003, Sam Leffler wrote: On Friday 31 October 2003 09:04 am, Bruce Evans wrote:

My simple make benchmark now takes infinitely longer with ULE under SMP, since make -j 16 with ULE under SMP now hangs nfs after about a minute. 4BSD works better. However, some networking bugs have developed in the last few days. One of their manifestations is that SMP kernels always panic in sbdrop() on shutdown.

I'm looking at something similar now. If you have a stack trace please send it to me (along with any other info). You might also try booting debug.mpsafenet=0.

Turning off mpsafenet fixed all these problems. These console messages are with it not turned off. fxp is the only physical network device.

%%%
WARNING: loader(8) metadata is missing!
[ preserving 869208 bytes of kernel symbol table ]
Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.1-CURRENT #1005: Sun Nov 2 20:38:42 EST 2003
[EMAIL PROTECTED]:/c/sysc/i386/compile/smp
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Pentium II/Pentium II Xeon/Celeron (400.91-MHz 686-class CPU)
Origin = GenuineIntel Id = 0x665 Stepping = 5
Features=0x183fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR>
real memory = 268435456 (256 MB)
avail memory = 255369216 (243 MB)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 - irq 0
IOAPIC #0 intpin 17 - irq 9
IOAPIC #0 intpin 18 - irq 11
IOAPIC #0 intpin 19 - irq 5
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
cpu0 (BSP): apic id: 0, version: 0x00040011, at 0xfee0
cpu1 (AP): apic id: 1, version: 0x00040011, at 0xfee0
io0 (APIC): apic id: 2, version: 0x00170011, at 0xfec0
Pentium Pro MTRR support enabled
npx0: math processor on motherboard
npx0: flags 0x80
npx0: INT 16 interface
pcibios: BIOS version 2.10
Using $PIR table, 8 entries at 0xc00fdef0
pcib0: Intel 82443BX (440 BX) host to PCI bridge at pcibus 0 on motherboard
pci0: PCI bus on pcib0
pcib1: PCI-PCI bridge at device 1.0 on pci0
pci1: PCI bus on pcib1
pci1: display, VGA at device 0.0 (no driver attached)
isab0: PCI-ISA bridge at device 7.0 on pci0
isa0: ISA bus on isab0
atapci0: Intel PIIX4 UDMA33 controller port 0xf000-0xf00f at device 7.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata0: [MPSAFE]
ata1: at 0x170 irq 15 on atapci0
ata1: [MPSAFE]
pci0: serial bus, USB at device 7.2 (no driver attached)
piix0: PIIX Timecounter port 0x5000-0x500f at device 7.3 on pci0
Timecounter PIIX frequency 3579545 Hz quality 0
pci0: multimedia, video at device 11.0 (no driver attached)
pci0: multimedia at device 11.1 (no driver attached)
fxp0: Intel 82559 Pro/100 Ethernet port 0xa400-0xa43f mem 0xea00-0xea0f,0xea104000-0xea104fff irq 9 at device 13.0 on pci0
fxp0: Ethernet address 00:90:27:99:02:99
miibus0: MII bus on fxp0
inphy0: i82555 10/100 media interface on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: [MPSAFE]
puc0: Titan VScom PCI-200HV2 port 0xb000-0xb01f,0xac00-0xac07,0xa800-0xa807 mem 0xea103000-0xea103fff,0xea102000-0xea102fff irq 5 at device 17.0 on pci0
sio4: Titan VScom PCI-200HV2 on puc0
sio4: type 16550A
sio5: Titan VScom PCI-200HV2 on puc0
sio5: type 16550A
atapci1: HighPoint HPT366 UDMA66 controller port 0xbc00-0xbcff,0xb800-0xb803,0xb400-0xb407 irq 11 at device 19.0 on pci0
atapci1: [MPSAFE]
ata2: at 0xb400 on atapci1
ata2: [MPSAFE]
atapci2: HighPoint HPT366 UDMA66 controller port 0xc800-0xc8ff,0xc400-0xc403,0xc000-0xc007 irq 11 at device 19.1 on pci0
atapci2: [MPSAFE]
ata3: at 0xc000 on atapci2
ata3: [MPSAFE]
orm0: Option ROMs at iomem 0xc8000-0xcbfff,0xc-0xc7fff on isa0
fdc0: Enhanced floppy controller (i82077, NE72065 or clone) at port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1440-KB 3.5 drive on fdc0 drive 0
atkbdc0: Keyboard controller (i8042) at port 0x64,0x60 on isa0
atkbd0: AT Keyboard flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: PS/2 Mouse irq 12 on atkbdc0
psm0: model Generic PS/2 mouse, device ID 0
vga0: Generic ISA VGA at port 0x3c0-0x3df iomem 0xa-0xb on isa0
sc0: System console at flags 0x100 on isa0
sc0: VGA 16 virtual consoles, flags=0x100
sio0 at port 0x3f8-0x3ff irq 4 flags 0x90 on isa0
sio0: type 16550A, console
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
cy0 at iomem 0xd4000-0xd5fff irq 10 on isa0
cy0: driver is using old-style compatibility shims
ppc0: Parallel port at port 0x378-0x37f irq 7 on isa0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/16 bytes threshold
ppbus0: Parallel port bus on ppc0
ppbus0: IEEE1284 device found
Probing for PnP devices on ppbus0:
plip0: PLIP network interface on ppbus0
lpt0: Printer on ppbus0
lpt0: Interrupt-driven port
ppi0: Parallel I/O on ppbus0
unknown: PNP0303 can't assign resources (port)
speaker0: PC
Re: More ULE bugs fixed.
Jeff Roberson [EMAIL PROTECTED] wrote: On Fri, 31 Oct 2003, Bruno Van Den Bossche wrote: [...]

I recently had to complete a little piece of software in a course on parallel computing. I've put it online[1] (we only had to write the pract2.cpp file). It calculates the inverse of a Vandermonde matrix and lets you spawn multiple slave processes, each of which performs part of the work. Everything happens in memory, so I've used it lately to test the different changes you made to sched_ule.c, and these last fixes do improve the performance on my dual P3 machine a lot. Here are the results of my (very limited) tests:

sched4bsd
---
dimension  slaves  time
1000       1       90.925408
1000       2       58.897038
200        1        0.735962
200        2        0.676660

sched_ule 1.68
---
dimension  slaves  time
1000       1       90.951015
1000       2       70.402845
200        1        0.743551
200        2        1.900455

sched_ule 1.70
---
dimension  slaves  time
1000       1       90.782309
1000       2       57.207351
200        1        0.739998
200        2        0.383545

I'm not really sure if this is very relevant to you, but from the end-user point of view (me :-)) this does mean something.

Thanks! I welcome the feedback, positive or negative, as it helps me improve things. Thanks for the report! Could you run this again under 4bsd and ULE with the following in your .cshrc:

    set time = ( 5 "%Uu %Ss %E %P %X+%Dk %I+%Oio %Fpf+%Ww %cc/%ww" )

And then time ./testpract 200 2, etc. This will give me a few hints about what's impacting your performance.

The program can run as a slave or as a master. So one should run one master and multiple slaves, and they all work on a piece of shared memory. So I've timed the individual processes, as the wrapper script test_pract2 doesn't do more than launch a few processes in the background. I don't think the output of that is very relevant.
Here's the result:

sched_4bsd 1.26
1000 1
master: 49.172u 0.187s 2:21.54 34.8% 15+10182k 0+0io 0pf+0w 5962c/65w
slave : 90.326u 0.250s 1:30.75 99.8% 15+168k 0+0io 0pf+0w 9156c/35w
1000 2
master: 49.113u 0.226s 1:49.94 44.8% 15+10181k 0+0io 0pf+0w 5942c/63w
slave1: 55.211u 0.326s 0:59.11 93.9% 15+166k 0+0io 0pf+0w 11129c/2224w
slave2: 54.897u 0.363s 0:58.62 94.2% 15+167k 0+0io 0pf+0w 7111c/6129w
200 1
master: 0.377u 0.007s 0:02.39 15.4% 15+589k 0+0io 0pf+0w 38c/13w
slave : 0.711u 0.031s 0:00.74 100.0% 15+169k 0+0io 0pf+0w 85c/1w
200 2
master: 0.376u 0.007s 0:02.87 12.8% 16+602k 0+0io 0pf+0w 41c/11w
slave1: 0.388u 0.006s 0:01.03 36.8% 18+201k 0+0io 0pf+0w 1245c/408w
slave2: 0.345u 0.038s 0:00.68 54.4% 34+158k 0+0io 0pf+0w 432c/1215w

sched_ule 1.75
1000 1
master: 49.097u 0.163s 2:21.32 34.8% 15+10186k 0+0io 0pf+0w 6197c/163w
slave : 90.157u 0.398s 1:30.82 99.6% 15+168k 0+0io 0pf+0w 11568c/49w
1000 2
master: 49.132u 0.164s 1:48.15 45.5% 15+10155k 0+0io 0pf+0w 6517c/276w
slave1: 55.634u 0.406s 0:57.52 97.4% 15+169k 0+0io 0pf+0w 12745c/9628w
slave2: 55.416u 0.391s 0:57.13 97.6% 15+168k 0+0io 0pf+0w 12448c/10063w
200 1
master: 0.369u 0.016s 0:02.52 14.6% 15+577k 0+0io 0pf+0w 92c/35w
slave : 0.690u 0.054s 0:00.74 100.0% 15+171k 0+0io 0pf+0w 147c/13w
200 2
master: 0.376u 0.007s 0:02.47 14.9% 15+589k 0+0io 0pf+0w 87c/21w
slave1: 0.331u 0.023s 0:00.70 50.0% 15+173k 0+0io 0pf+0w 466c/2135w
slave2: 0.304u 0.040s 0:00.39 87.1% 15+166k 0+0io 0pf+0w 412c/2119w

[1] http://users.pandora.be/bomberboy/mptest/final.tar.bz2 It can be used by running testpract2 with two arguments, the dimension of the matrix and the number of slaves. For example, './testpract2 200 2' will create a matrix with dimension 200 and 2 slaves.

-- Bruno This fortune is inoperative. Please try another.
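Jeff's `set time` format ends each line with `%cc/%ww`, which in csh/tcsh are the involuntary and voluntary context-switch counts. A small awk filter can sum those two numbers over the master and slave lines of one run, so the schedulers can be compared directly. This is a minimal sketch; the helper name `ctxsw_totals` is hypothetical, and it assumes the last field of each line has exactly the `<N>c/<M>w` shape shown above.

```sh
#!/bin/sh
# ctxsw_totals: read csh `time` lines ending in "<N>c/<M>w" on stdin and
# print the summed involuntary (%c) and voluntary (%w) context switches.
ctxsw_totals() {
    awk '{
        split($NF, parts, "/")      # "466c/2135w" -> "466c", "2135w"
        invol += parts[1] + 0       # "+ 0" keeps the leading number only
        vol   += parts[2] + 0
    } END { printf "involuntary=%d voluntary=%d\n", invol, vol }'
}

# The "200 2" run under sched_ule 1.75, copied from the results above.
printf '%s\n' \
    'master: 0.376u 0.007s 0:02.47 14.9% 15+589k 0+0io 0pf+0w 87c/21w' \
    'slave1: 0.331u 0.023s 0:00.70 50.0% 15+173k 0+0io 0pf+0w 466c/2135w' \
    'slave2: 0.304u 0.040s 0:00.39 87.1% 15+166k 0+0io 0pf+0w 412c/2119w' |
    ctxsw_totals
# prints: involuntary=965 voluntary=4275
```

Fed the corresponding sched_4bsd 1.26 lines, the same filter gives 1718 involuntary and 1634 voluntary switches, so for this small run ULE traded involuntary switches for many more voluntary ones.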
Re: More ULE bugs fixed.
On Wed, 29 Oct 2003, Jeff Roberson wrote: On Thu, 30 Oct 2003, Bruce Evans wrote:

Test for scheduling buildworlds:

    cd /usr/src/usr.bin
    for i in obj depend all
    do
        MAKEOBJDIRPREFIX=/somewhere/obj time make -s -j16 $i
    done >/tmp/zqz 2>&1

(Run this with an empty /somewhere/obj. The all stage doesn't quite finish.)

On an ABIT BP6 system with a 400MHz and a 366MHz CPU, with /usr (including /usr/src) nfs-mounted (with 100 Mbps ethernet and a reasonably fast server) and /somewhere/obj ufs1-mounted (on a fairly slow disk; no soft-updates), this gives the following times:

SCHED_ULE-yesterday, with not so careful setup:
    40.37 real     8.26 user     6.26 sys
   278.90 real    59.35 user    41.32 sys
   341.82 real   307.38 user    69.01 sys
SCHED_ULE-today, run immediately after booting:
    41.51 real     7.97 user     6.42 sys
   306.64 real    59.66 user    40.68 sys
   346.48 real   305.54 user    69.97 sys
SCHED_4BSD-yesterday, with not so careful setup:
   [same as today except the depend step was 10 seconds slower (real)]
SCHED_4BSD-today, run immediately after booting:
    18.89 real     8.01 user     6.66 sys
   128.17 real    58.33 user    43.61 sys
   291.59 real   308.48 user    72.33 sys
SCHED_4BSD-yesterday, with a UP kernel (running on the 366 MHz CPU) with many local changes and not so careful setup:
    17.39 real     8.28 user     5.49 sys
   130.51 real    60.97 user    34.63 sys
   390.68 real   310.78 user    60.55 sys

Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for the obj and depend stages. These stages have little parallelism. SCHED_ULE was only 19% slower for the all stage. ...

I reran this with -current (sched_ule.c 1.68, etc.). Result: no significant change. However, with a UP kernel there was no significant difference between the times for SCHED_ULE and SCHED_4BSD.

There was a significant difference on UP until last week. I'm working on SMP now. I have some patches but they aren't quite ready yet.

I have committed my SMP fixes. I would appreciate it if you could post updated results.
ULE now outperforms 4BSD in a single-threaded kernel compile and performs almost identically in a 16-way make. I still have a few more things that I can do to improve the situation. I would expect ULE to pull further ahead in the months to come. The nice issue is still outstanding, as is the incorrect wcpu reporting. Cheers, Jeff

Test 5 for fair scheduling related to niceness:

    for i in -20 -16 -12 -8 -4 0 4 8 12 16 20
    do
        nice -$i sh -c "while :; do echo -n;done" &
    done
    time top -o cpu

With SCHED_ULE, this now hangs the system, but it worked yesterday. Today it doesn't get as far as running top and it stops the nfs server from responding. To unhang the system and see what the above does, run a shell at rtprio 0 and start top before the above, and use top to kill processes (I normally use killall sh to kill all the shells generated by tests 1-5, but killall doesn't work if it is on nfs when the nfs server is not responding). This shows problems much more clearly with UP kernels. It gives the nice -20 and -16 processes approx. 55% and 50% of the CPU, respectively (the total is significantly more than 100%), and it gives approx. 0% of the CPU to the other sh processes (perhaps exactly 0). It also apparently gives 0% of the CPU to some important nfs process (I couldn't see exactly which), so the nfs server stops responding. SCHED_4BSD errs in the opposite direction by giving too many cycles to highly niced processes, so it is naturally immune to this problem. With SMP, SCHED_ULE lets many more processes run.

I seem to have broken something related to nice. I only tested interactivity and performance after my last round of changes. I have a standard test that I do that is similar to the one that you have posted here. I used it to gather results for my paper (http://www.chesapeake.net/~jroberson/ULE.pdf). There you can see what the intended nice curve is like. Oddly enough, I ran your test again on my laptop and I did not see 55% of the cpu going to nice -20.
It was spread proportionally from -20 to 0 with positive nice values not receiving cpu time, as intended. It did not, however, let interactive processes proceed. This is certainly a bug and it sounds like there may be others which lead to the problems that you're having.

The nfs server also sometimes stops responding with only non-negatively niced processes (0 through 20 in the above), but it takes longer. The nfs server restarts if enough of the hog processes are killed. Apparently nfs has some critical process running at only user
Re: More ULE bugs fixed.
Jeff Roberson [EMAIL PROTECTED] wrote: On Wed, 29 Oct 2003, Jeff Roberson wrote: On Thu, 30 Oct 2003, Bruce Evans wrote:

Test for scheduling buildworlds:

    cd /usr/src/usr.bin
    for i in obj depend all
    do
        MAKEOBJDIRPREFIX=/somewhere/obj time make -s -j16 $i
    done >/tmp/zqz 2>&1

(Run this with an empty /somewhere/obj. The all stage doesn't quite finish.)

On an ABIT BP6 system with a 400MHz and a 366MHz CPU, with /usr (including /usr/src) nfs-mounted (with 100 Mbps ethernet and a reasonably fast server) and /somewhere/obj ufs1-mounted (on a fairly slow disk; no soft-updates), this gives the following times:

SCHED_ULE-yesterday, with not so careful setup:
    40.37 real     8.26 user     6.26 sys
   278.90 real    59.35 user    41.32 sys
   341.82 real   307.38 user    69.01 sys
SCHED_ULE-today, run immediately after booting:
    41.51 real     7.97 user     6.42 sys
   306.64 real    59.66 user    40.68 sys
   346.48 real   305.54 user    69.97 sys
SCHED_4BSD-yesterday, with not so careful setup:
   [same as today except the depend step was 10 seconds slower (real)]
SCHED_4BSD-today, run immediately after booting:
    18.89 real     8.01 user     6.66 sys
   128.17 real    58.33 user    43.61 sys
   291.59 real   308.48 user    72.33 sys
SCHED_4BSD-yesterday, with a UP kernel (running on the 366 MHz CPU) with many local changes and not so careful setup:
    17.39 real     8.28 user     5.49 sys
   130.51 real    60.97 user    34.63 sys
   390.68 real   310.78 user    60.55 sys

Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for the obj and depend stages. These stages have little parallelism. SCHED_ULE was only 19% slower for the all stage. ...

I reran this with -current (sched_ule.c 1.68, etc.). Result: no significant change. However, with a UP kernel there was no significant difference between the times for SCHED_ULE and SCHED_4BSD.

There was a significant difference on UP until last week. I'm working on SMP now. I have some patches but they aren't quite ready yet.

I have committed my SMP fixes. I would appreciate it if you could post updated results.
ULE now outperforms 4BSD in a single-threaded kernel compile and performs almost identically in a 16-way make. I still have a few more things that I can do to improve the situation. I would expect ULE to pull further ahead in the months to come.

I recently had to complete a little piece of software in a course on parallel computing. I've put it online[1] (we only had to write the pract2.cpp file). It calculates the inverse of a Vandermonde matrix and lets you spawn multiple slave processes, each of which performs part of the work. Everything happens in memory, so I've used it lately to test the different changes you made to sched_ule.c, and these last fixes do improve the performance on my dual P3 machine a lot. Here are the results of my (very limited) tests:

sched4bsd
---
dimension  slaves  time
1000       1       90.925408
1000       2       58.897038
200        1        0.735962
200        2        0.676660

sched_ule 1.68
---
dimension  slaves  time
1000       1       90.951015
1000       2       70.402845
200        1        0.743551
200        2        1.900455

sched_ule 1.70
---
dimension  slaves  time
1000       1       90.782309
1000       2       57.207351
200        1        0.739998
200        2        0.383545

I'm not really sure if this is very relevant to you, but from the end-user point of view (me :-)) this does mean something.

Thanks!

[1] http://users.pandora.be/bomberboy/mptest/final.tar.bz2 It can be used by running testpract2 with two arguments, the dimension of the matrix and the number of slaves. For example, './testpract2 200 2' will create a matrix with dimension 200 and 2 slaves.

-- Bruno ... And then there's the guy who bought 20,000 bras, cut them in half, and sold 40,000 yamalchas with chin straps
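One way to read Bruno's table is as the speedup gained by adding a second slave on his dual P3: the 1-slave time divided by the 2-slave time. The helper `speedup` below is a hypothetical sketch that uses awk only for the floating-point division; the numbers are copied from the table above.

```sh
#!/bin/sh
# speedup T1 T2: ratio of the 1-slave time (T1) to the 2-slave time (T2),
# printed with two decimals; above 1 means the second slave helped.
speedup() {
    awk -v t1="$1" -v t2="$2" 'BEGIN { printf "%.2f\n", t1 / t2 }'
}

# Dimension 1000, times from Bruno's table.
speedup 90.925408 58.897038   # sched4bsd:      1.54
speedup 90.951015 70.402845   # sched_ule 1.68: 1.29
speedup 90.782309 57.207351   # sched_ule 1.70: 1.59
```

At dimension 200 the same helper gives 1.09 for sched4bsd, 0.39 for sched_ule 1.68 (two slaves were slower than one) and 1.93 for sched_ule 1.70, which is the improvement Bruno reports after Jeff's SMP fixes.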
Re: More ULE bugs fixed.
On Fri, 31 Oct 2003, Jeff Roberson wrote: I have committed my SMP fixes. I would appreciate it if you could post updated results. ULE now outperforms 4BSD in a single-threaded kernel compile and performs almost identically in a 16-way make. I still have a few more things that I can do to improve the situation. I would expect ULE to pull further ahead in the months to come.

My simple make benchmark now takes infinitely longer with ULE under SMP, since make -j 16 with ULE under SMP now hangs nfs after about a minute. 4BSD works better. However, some networking bugs have developed in the last few days. One of their manifestations is that SMP kernels always panic in sbdrop() on shutdown.

The nice issue is still outstanding, as is the incorrect wcpu reporting.

It may be related to nfs processes not getting any cycles even when there are no niced processes.

Bruce
Re: More ULE bugs fixed.
On Friday 31 October 2003 09:04 am, Bruce Evans wrote: My simple make benchmark now takes infinitely longer with ULE under SMP, since make -j 16 with ULE under SMP now hangs nfs after about a minute. 4BSD works better. However, some networking bugs have developed in the last few days. One of their manifestations is that SMP kernels always panic in sbdrop() on shutdown. I'm looking at something similar now. If you have a stack trace please send it to me (along with any other info). You might also try booting debug.mpsafenet=0. Sam
Re: More ULE bugs fixed.
On Fri, 31 Oct 2003, Bruno Van Den Bossche wrote: Jeff Roberson [EMAIL PROTECTED] wrote: On Wed, 29 Oct 2003, Jeff Roberson wrote: On Thu, 30 Oct 2003, Bruce Evans wrote:

Test for scheduling buildworlds:

    cd /usr/src/usr.bin
    for i in obj depend all
    do
        MAKEOBJDIRPREFIX=/somewhere/obj time make -s -j16 $i
    done >/tmp/zqz 2>&1

(Run this with an empty /somewhere/obj. The all stage doesn't quite finish.)

On an ABIT BP6 system with a 400MHz and a 366MHz CPU, with /usr (including /usr/src) nfs-mounted (with 100 Mbps ethernet and a reasonably fast server) and /somewhere/obj ufs1-mounted (on a fairly slow disk; no soft-updates), this gives the following times:

SCHED_ULE-yesterday, with not so careful setup:
    40.37 real     8.26 user     6.26 sys
   278.90 real    59.35 user    41.32 sys
   341.82 real   307.38 user    69.01 sys
SCHED_ULE-today, run immediately after booting:
    41.51 real     7.97 user     6.42 sys
   306.64 real    59.66 user    40.68 sys
   346.48 real   305.54 user    69.97 sys
SCHED_4BSD-yesterday, with not so careful setup:
   [same as today except the depend step was 10 seconds slower (real)]
SCHED_4BSD-today, run immediately after booting:
    18.89 real     8.01 user     6.66 sys
   128.17 real    58.33 user    43.61 sys
   291.59 real   308.48 user    72.33 sys
SCHED_4BSD-yesterday, with a UP kernel (running on the 366 MHz CPU) with many local changes and not so careful setup:
    17.39 real     8.28 user     5.49 sys
   130.51 real    60.97 user    34.63 sys
   390.68 real   310.78 user    60.55 sys

Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for the obj and depend stages. These stages have little parallelism. SCHED_ULE was only 19% slower for the all stage. ...

I reran this with -current (sched_ule.c 1.68, etc.). Result: no significant change. However, with a UP kernel there was no significant difference between the times for SCHED_ULE and SCHED_4BSD.

There was a significant difference on UP until last week. I'm working on SMP now. I have some patches but they aren't quite ready yet.

I have committed my SMP fixes.
I would appreciate it if you could post updated results. ULE now outperforms 4BSD in a single-threaded kernel compile and performs almost identically in a 16-way make. I still have a few more things that I can do to improve the situation. I would expect ULE to pull further ahead in the months to come.

I recently had to complete a little piece of software in a course on parallel computing. I've put it online[1] (we only had to write the pract2.cpp file). It calculates the inverse of a Vandermonde matrix and lets you spawn multiple slave processes, each of which performs part of the work. Everything happens in memory, so I've used it lately to test the different changes you made to sched_ule.c, and these last fixes do improve the performance on my dual P3 machine a lot. Here are the results of my (very limited) tests:

sched4bsd
---
dimension  slaves  time
1000       1       90.925408
1000       2       58.897038
200        1        0.735962
200        2        0.676660

sched_ule 1.68
---
dimension  slaves  time
1000       1       90.951015
1000       2       70.402845
200        1        0.743551
200        2        1.900455

sched_ule 1.70
---
dimension  slaves  time
1000       1       90.782309
1000       2       57.207351
200        1        0.739998
200        2        0.383545

I'm not really sure if this is very relevant to you, but from the end-user point of view (me :-)) this does mean something. Thanks!

I welcome the feedback, positive or negative, as it helps me improve things. Thanks for the report! Could you run this again under 4bsd and ULE with the following in your .cshrc:

    set time = ( 5 "%Uu %Ss %E %P %X+%Dk %I+%Oio %Fpf+%Ww %cc/%ww" )

And then time ./testpract 200 2, etc. This will give me a few hints about what's impacting your performance. Thanks! Jeff

[1] http://users.pandora.be/bomberboy/mptest/final.tar.bz2 It can be used by running testpract2 with two arguments, the dimension of the matrix and the number of slaves. For example, './testpract2 200 2' will create a matrix with dimension 200 and 2 slaves. -- Bruno ...
And then there's the guy who bought 20,000 bras, cut them in half, and sold 40,000 yamalchas with chin straps
Re: More ULE bugs fixed.
Test for scheduling buildworlds:

    cd /usr/src/usr.bin
    for i in obj depend all
    do
        MAKEOBJDIRPREFIX=/somewhere/obj time make -s -j16 $i
    done >/tmp/zqz 2>&1

(Run this with an empty /somewhere/obj. The all stage doesn't quite finish.)

On an ABIT BP6 system with a 400MHz and a 366MHz CPU, with /usr (including /usr/src) nfs-mounted (with 100 Mbps ethernet and a reasonably fast server) and /somewhere/obj ufs1-mounted (on a fairly slow disk; no soft-updates), this gives the following times:

SCHED_ULE-yesterday, with not so careful setup:
    40.37 real     8.26 user     6.26 sys
   278.90 real    59.35 user    41.32 sys
   341.82 real   307.38 user    69.01 sys
SCHED_ULE-today, run immediately after booting:
    41.51 real     7.97 user     6.42 sys
   306.64 real    59.66 user    40.68 sys
   346.48 real   305.54 user    69.97 sys
SCHED_4BSD-yesterday, with not so careful setup:
   [same as today except the depend step was 10 seconds slower (real)]
SCHED_4BSD-today, run immediately after booting:
    18.89 real     8.01 user     6.66 sys
   128.17 real    58.33 user    43.61 sys
   291.59 real   308.48 user    72.33 sys
SCHED_4BSD-yesterday, with a UP kernel (running on the 366 MHz CPU) with many local changes and not so careful setup:
    17.39 real     8.28 user     5.49 sys
   130.51 real    60.97 user    34.63 sys
   390.68 real   310.78 user    60.55 sys

Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for the obj and depend stages. These stages have little parallelism. SCHED_ULE was only 19% slower for the all stage. ...

I reran this with -current (sched_ule.c 1.68, etc.). Result: no significant change. However, with a UP kernel there was no significant difference between the times for SCHED_ULE and SCHED_4BSD.

Test 5 for fair scheduling related to niceness:

    for i in -20 -16 -12 -8 -4 0 4 8 12 16 20
    do
        nice -$i sh -c "while :; do echo -n;done" &
    done
    time top -o cpu

With SCHED_ULE, this now hangs the system, but it worked yesterday. Today it doesn't get as far as running top and it stops the nfs server from responding.
To unhang the system and see what the above does, run a shell at rtprio 0 and start top before the above, and use top to kill processes (I normally use killall sh to kill all the shells generated by tests 1-5, but killall doesn't work if it is on nfs when the nfs server is not responding). This shows problems much more clearly with UP kernels. It gives the nice -20 and -16 processes approx. 55% and 50% of the CPU, respectively (the total is significantly more than 100%), and it gives approx. 0% of the CPU to the other sh processes (perhaps exactly 0). It also apparently gives 0% of the CPU to some important nfs process (I couldn't see exactly which), so the nfs server stops responding. SCHED_4BSD errs in the opposite direction by giving too many cycles to highly niced processes, so it is naturally immune to this problem. With SMP, SCHED_ULE lets many more processes run.

The nfs server also sometimes stops responding with only non-negatively niced processes (0 through 20 in the above), but it takes longer. The nfs server restarts if enough of the hog processes are killed. Apparently nfs has some critical process running at only user priority and nice 0, and even non-negatively niced processes are enough to prevent it from running.

Top output with loops like the above shows many anomalies in PRI, TIME, WCPU and CPU, but no worse than the ones with SCHED_4BSD. PRI tends to stick at 139 (the max) with SCHED_ULE. With SCHED_4BSD, this indicates that the scheduler has entered an unfair scheduling region. I don't know how to interpret it for SCHED_ULE (at first I thought 139 was a dummy value).

Bruce
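Test 5 runs its hog loops forever, so on an affected kernel the only way out is the rtprio 0 shell described above. A bounded variant can sample each loop's nice value and %CPU after a fixed number of seconds and then kill the loops itself. This is a minimal sketch under assumptions: the function name `nice_sweep` and its argument order are hypothetical, it relies only on POSIX `nice -n`, `ps -o` and `kill`, and negative nice values (as in the original -20..20 sweep) need root.

```sh
#!/bin/sh
# nice_sweep SECONDS NICE...
# Start one busy loop per NICE value, let them compete for SECONDS,
# print "pid nice %cpu" for each loop, then kill them all.
nice_sweep() {
    duration=$1
    shift
    pids=
    for n in "$@"; do
        # nice(1) execs sh, so $! is the PID of the loop itself.
        nice -n "$n" sh -c 'while :; do :; done' &
        pids="$pids $!"
    done
    sleep "$duration"
    # Empty column names (pid= etc.) suppress the ps heading line.
    ps -o pid= -o nice= -o pcpu= -p "$(echo $pids | tr ' ' ',')"
    kill $pids
    wait $pids 2>/dev/null
    return 0
}

# Unprivileged example: nice_sweep 10 0 4 8 12 16 20
```

Because the loops are killed after the sample, a scheduler that starves other processes can only do so for the chosen number of seconds, which keeps nfs and other daemons recoverable during the test.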
Re: More ULE bugs fixed.
On Thu, 30 Oct 2003, Bruce Evans wrote:

Test for scheduling buildworlds:

    cd /usr/src/usr.bin
    for i in obj depend all
    do
        MAKEOBJDIRPREFIX=/somewhere/obj time make -s -j16 $i
    done >/tmp/zqz 2>&1

(Run this with an empty /somewhere/obj. The all stage doesn't quite finish.)

On an ABIT BP6 system with a 400MHz and a 366MHz CPU, with /usr (including /usr/src) nfs-mounted (with 100 Mbps ethernet and a reasonably fast server) and /somewhere/obj ufs1-mounted (on a fairly slow disk; no soft-updates), this gives the following times:

SCHED_ULE-yesterday, with not so careful setup:
    40.37 real     8.26 user     6.26 sys
   278.90 real    59.35 user    41.32 sys
   341.82 real   307.38 user    69.01 sys
SCHED_ULE-today, run immediately after booting:
    41.51 real     7.97 user     6.42 sys
   306.64 real    59.66 user    40.68 sys
   346.48 real   305.54 user    69.97 sys
SCHED_4BSD-yesterday, with not so careful setup:
   [same as today except the depend step was 10 seconds slower (real)]
SCHED_4BSD-today, run immediately after booting:
    18.89 real     8.01 user     6.66 sys
   128.17 real    58.33 user    43.61 sys
   291.59 real   308.48 user    72.33 sys
SCHED_4BSD-yesterday, with a UP kernel (running on the 366 MHz CPU) with many local changes and not so careful setup:
    17.39 real     8.28 user     5.49 sys
   130.51 real    60.97 user    34.63 sys
   390.68 real   310.78 user    60.55 sys

Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for the obj and depend stages. These stages have little parallelism. SCHED_ULE was only 19% slower for the all stage. ...

I reran this with -current (sched_ule.c 1.68, etc.). Result: no significant change. However, with a UP kernel there was no significant difference between the times for SCHED_ULE and SCHED_4BSD.

There was a significant difference on UP until last week. I'm working on SMP now. I have some patches but they aren't quite ready yet.
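Bruce's summary ("more than twice as slow" for obj and depend, "only 19% slower" for all) can be checked against the SCHED_ULE-today and SCHED_4BSD-today real times quoted above. The helper `stage_ratios` is a hypothetical sketch that divides each ULE stage time by the matching 4BSD one, using awk for the floating-point arithmetic.

```sh
#!/bin/sh
# stage_ratios ULE_OBJ ULE_DEPEND ULE_ALL BSD_OBJ BSD_DEPEND BSD_ALL
# Print, per stage, the SCHED_ULE real time divided by the SCHED_4BSD one.
stage_ratios() {
    awk -v ule="$1 $2 $3" -v bsd="$4 $5 $6" 'BEGIN {
        split(ule, u, " "); split(bsd, b, " ")
        split("obj depend all", stage, " ")
        for (i = 1; i <= 3; i++)
            printf "%s %.2f\n", stage[i], u[i] / b[i]
    }'
}

# "today" runs, made immediately after booting.
stage_ratios 41.51 306.64 346.48 18.89 128.17 291.59
# obj 2.20
# depend 2.39
# all 1.19
```

The obj and depend ratios (2.20 and 2.39) match "more than twice as slow", and 1.19 for the all stage matches "19% slower".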
Test 5 for fair scheduling related to niceness:

    for i in -20 -16 -12 -8 -4 0 4 8 12 16 20
    do
        nice -$i sh -c "while :; do echo -n;done" &
    done
    time top -o cpu

With SCHED_ULE, this now hangs the system, but it worked yesterday. Today it doesn't get as far as running top and it stops the nfs server from responding. To unhang the system and see what the above does, run a shell at rtprio 0 and start top before the above, and use top to kill processes (I normally use killall sh to kill all the shells generated by tests 1-5, but killall doesn't work if it is on nfs when the nfs server is not responding). This shows problems much more clearly with UP kernels. It gives the nice -20 and -16 processes approx. 55% and 50% of the CPU, respectively (the total is significantly more than 100%), and it gives approx. 0% of the CPU to the other sh processes (perhaps exactly 0). It also apparently gives 0% of the CPU to some important nfs process (I couldn't see exactly which), so the nfs server stops responding. SCHED_4BSD errs in the opposite direction by giving too many cycles to highly niced processes, so it is naturally immune to this problem. With SMP, SCHED_ULE lets many more processes run.

I seem to have broken something related to nice. I only tested interactivity and performance after my last round of changes. I have a standard test that I do that is similar to the one that you have posted here. I used it to gather results for my paper (http://www.chesapeake.net/~jroberson/ULE.pdf). There you can see what the intended nice curve is like. Oddly enough, I ran your test again on my laptop and I did not see 55% of the cpu going to nice -20. It was spread proportionally from -20 to 0 with positive nice values not receiving cpu time, as intended. It did not, however, let interactive processes proceed. This is certainly a bug and it sounds like there may be others which lead to the problems that you're having.
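Jeff's observation that the test "did not, however, let interactive processes proceed" concerns what he calls his interactivity boost. The sketch below only illustrates the general idea of a sleep/run heuristic and is not taken from sched_ule.c: it scores a process from its sleep and run time on a 0..100 scale, so a process that mostly sleeps (such as a daemon waiting on the network) lands near 0 and a busy loop near 100, and it treats scores under an assumed threshold of 30 as interactive. The constants and the helper names `interact_score` and `is_interactive` are hypothetical.

```sh
#!/bin/sh
# Hypothetical sleep/run interactivity score; the constants are
# assumptions for illustration, not values from sched_ule.c.
INTERACT_HALF=50     # scores range over 0 .. 2*INTERACT_HALF
INTERACT_THRESH=30   # below this a process counts as interactive

# interact_score SLEEP RUN  (integer amounts of sleep and run time)
interact_score() {
    sleep_t=$1
    run_t=$2
    if [ "$sleep_t" -gt "$run_t" ]; then
        # Mostly sleeping: 0 .. INTERACT_HALF, lower when run time is small.
        echo $((INTERACT_HALF * run_t / sleep_t))
    elif [ "$run_t" -gt "$sleep_t" ]; then
        # Mostly running: INTERACT_HALF .. 2*INTERACT_HALF.
        echo $((2 * INTERACT_HALF - INTERACT_HALF * sleep_t / run_t))
    else
        echo "$INTERACT_HALF"
    fi
}

# is_interactive SLEEP RUN: succeeds if the score is under the threshold.
is_interactive() {
    [ "$(interact_score "$1" "$2")" -lt "$INTERACT_THRESH" ]
}
```

If scoring worked as sketched, a mostly sleeping nfs process would score near 0 while each busy `sh` loop scores 100, so the loops should not be able to starve it. That is consistent with Jeff's view that the starvation Bruce reports points to a broken interactivity boost rather than intended behavior.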
The nfs server also sometimes stops responding with only non-negatively niced processes (0 through 20 in the above), but it takes longer. The nfs server restarts if enough of the hog processes are killed. Apparently nfs has some critical process running at only user priority and nice 0 and even non-negatively niced processes are enough to prevent it from running. This shouldn't be the case, it sounds like my interactivity boost is somewhat broken. Top output with loops like the above shows many anomalies in PRI, TIME, WCPU and CPU, but no worse than the ones with SCHED_4BSD. PRI tends to stick at 139 (the max) with SCHED_ULE. With SCHED_4BSD, this indicates that the scheduler has entered an unfair scheduling region. I don't know how to interpret it for SCHED_ULE (at first I thought 139
Re: More ULE bugs fixed.
On Fri, 17 Oct 2003, Bruce Evans wrote: On Fri, 17 Oct 2003, Jeff Roberson wrote: On Fri, 17 Oct 2003, Bruce Evans wrote: How would one test if it was an improvement on the 4BSD scheduler? It is not even competitive in my simple tests. ... At one point ULE was at least as fast as 4BSD and in most cases faster. This is a regression. I'll sort it out soon. How much faster? make kernel on UP seems to be within 1% of 4BSD now. I actually had some runs which showed lower system time. I think I can still improve the situation some. Anyway, I found some bugs relating to idle prio tasks, and also ULE had been doing almost twice as many context switches as 4BSD. Now it's doing about 8% more. I'm still tracking this down. Anyhow, it should be much closer now. I still have some plans for SMP that should improve things quite a bit there but UP is looking good. Cheers, Jeff
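Jeff's context-switch comparison (ULE doing almost twice as many switches as 4BSD, later about 8% more) can be approximated from userland. The sketch below is only an illustration, not how Jeff measured it: it assumes FreeBSD's vm.stats.sys.v_swtch sysctl, which is a system-wide counter, so anything else running at the same time is counted too. The helper name ctxsw_during and the COUNTER override (which lets a fake counter stand in when testing off FreeBSD) are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print how many context switches happened,
# system-wide, while the given command ran.
# Assumes FreeBSD's vm.stats.sys.v_swtch sysctl; set COUNTER to any
# command that prints a single integer to use a different counter.
ctxsw_during() {
	counter=${COUNTER:-sysctl -n vm.stats.sys.v_swtch}
	before=$($counter)      # counter value before the workload
	"$@" >/dev/null 2>&1    # run the workload, discarding its output
	after=$($counter)       # counter value after the workload
	echo $((after - before))
}

# Example (on FreeBSD): ctxsw_during make -s -j16 obj
```

Running the same workload under a SCHED_ULE kernel and a SCHED_4BSD kernel and comparing the two numbers gives a rough ratio; repeating the runs helps, since the counter also picks up unrelated activity.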
Re: More ULE bugs fixed.
Jeff Roberson [EMAIL PROTECTED] wrote : On Fri, 17 Oct 2003, Bruce Evans wrote: How would one test if it was an improvement on the 4BSD scheduler? It is not even competitive in my simple tests. What were your simple tests? -- Jonathan Mini [EMAIL PROTECTED] http://www.freebsd.org/
Re: More ULE bugs fixed.
On Sun, 26 Oct 2003, Jon Mini wrote: Jeff Roberson [EMAIL PROTECTED] wrote : On Fri, 17 Oct 2003, Bruce Evans wrote: How would one test if it was an improvement on the 4BSD scheduler? It is not even competitive in my simple tests. What were your simple tests? Er, they were in the original mail. Just do parts of buildworld with -j16 on an SMP system. ULE was 2.4 times slower for make depend and 2.1 times slower for make obj. Something must have been very wrong, since make obj, especially, should be completely i/o bound so it shouldn't be affected by the scheduler. Also, run a bunch of CPU hog processes with various nicenesses and look at top output to check that they are given reasonable amounts of CPU. Bruce
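Bruce's suggestion to look at top output and check that the hogs get reasonable amounts of CPU can be made a little more mechanical. The following is a sketch under assumptions, not part of any FreeBSD tool: the name cpu_by_nice is hypothetical, and it assumes the UP column layout of top(1) seen in this thread (PID, USERNAME, PRI, NICE, SIZE, RES, STATE, TIME, WCPU, CPU, COMMAND); SMP kernels add a per-CPU column, which would shift the fields. It prints each process's niceness and CPU share plus the total, which makes a total well over 100%, or shares that do not fall off with niceness, easy to spot.

```sh
#!/bin/sh
# Hypothetical helper: summarize process lines from UP-style top(1)
# output on stdin. Assumed columns (no SMP per-CPU column):
#   PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
cpu_by_nice() {
	awk '$1 ~ /^[0-9]+$/ && NF >= 11 {   # skip headers and short lines
		cpu = $10
		sub(/%$/, "", cpu)               # strip the trailing percent sign
		printf "nice %3d: %6.2f%%\n", $4, cpu
		total += cpu
	}
	END { printf "total:    %6.2f%%\n", total }'
}

# Example: cpu_by_nice < saved-top-output.txt   (hypothetical file name)
```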
Re: More ULE bugs fixed.
Bruce Evans [EMAIL PROTECTED] wrote : On Sun, 26 Oct 2003, Jon Mini wrote: Jeff Roberson [EMAIL PROTECTED] wrote : On Fri, 17 Oct 2003, Bruce Evans wrote: How would one test if it was an improvement on the 4BSD scheduler? It is not even competitive in my simple tests. What were your simple tests? Er, they were in the original mail. Just do parts of buildworld with -j16 on an SMP system. ULE was 2.4 times slower for make depend and 2.1 times slower for make obj. Something must have been very wrong, since make obj, especially, should be completely i/o bound so it shouldn't be affected by the scheduler. Also, run a bunch of CPU hog processes with various nicenesses and look at top output to check that they are given reasonable amounts of CPU. My apologies, I just subscribed to current and only caught the tail end of this thread. -- Jonathan Mini [EMAIL PROTECTED] http://www.freebsd.org/
Re: More ULE bugs fixed.
Thanks. I should have known =) /Eirik Maxime Henrion wrote: Eirik Oeverby wrote: As a side note/question: Is there any way to figure out which ULE version I'm running in a precompiled kernel? I just nuked my src tree by accident, and am not sure if I'm on 1.65 or something older. If there is no way, is this perhaps an idea? Try ident /boot/kernel/kernel | grep sched_ule. Cheers, Maxime
Re: More ULE bugs fixed.
As a side note/question: Is there any way to figure out which ULE version I'm running in a precompiled kernel? I just nuked my src tree by accident, and am not sure if I'm on 1.65 or something older. If there is no way, is this perhaps an idea? Thanks, /Eirik Jeff Roberson wrote: On Fri, 17 Oct 2003, Bruce Evans wrote: How would one test if it was an improvement on the 4BSD scheduler? It is not even competitive in my simple tests. [scripts results deleted] Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for the obj and depend stages. These stages have little parallelism. SCHED_ULE was only 19% slower for the all stage. It apparently misses many opportunities to actually run useful processes. This may be related to /usr being nfs mounted. There is lots of idling waiting for nfs even in the SCHED_4BSD case. The system times are smaller for SCHED_ULE, but this might not be significant. E.g., zeroing pages can account for several percent of the system time in buildworld, but on unbalanced systems that have too much idle time most page zeroing gets done in idle time and doesn't show up in the system time. At one point ULE was at least as fast as 4BSD and in most cases faster. This is a regression. I'll sort it out soon. Test 1 for fair scheduling related to niceness:

    for i in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
    do
        nice -$i sh -c "while :; do echo -n;done" &
    done
    top -o time

[Output deleted]. This shows only a vague correlation between niceness and runtime for SCHED_ULE. However, top -o cpu shows a strong correlation between %CPU and niceness. Apparently, %CPU is very inaccurate and/or not enough history is kept for long-term scheduling to be fair. Test 5 for fair scheduling related to niceness:

    for i in -20 -16 -12 -8 -4 0 4 8 12 16 20
    do
        nice -$i sh -c "while :; do echo -n;done" &
    done
    time top -o cpu

With SCHED_ULE, this now hangs the system, but it worked yesterday. Today it doesn't get as far as running top and it stops the nfs server responding.
To unhang the system and see what the above does, run a shell at rtprio 0 and start top before the above, and use top to kill processes (I normally use killall sh to kill all the shells generated by tests 1-5, but killall doesn't work if it is on nfs when the nfs server is not responding).

    661 root     112  -20  900K  608K RUN     0:24 27.80% 27.64% sh
    662 root     114  -16  900K  608K RUN     0:19 12.43% 12.35% sh
    663 root     114  -12  900K  608K RUN     0:15 10.66% 10.60% sh
    664 root     114   -8  900K  608K RUN     0:11  9.38%  9.33% sh
    665 root     115   -4  900K  608K RUN     0:10  7.91%  7.86% sh
    666 root     115    0  900K  608K RUN     0:07  6.83%  6.79% sh
    667 root     115    4  900K  608K RUN     0:06  5.01%  4.98% sh
    668 root     115    8  900K  608K RUN     0:04  3.83%  3.81% sh
    669 root     115   12  900K  608K RUN     0:02  2.21%  2.20% sh
    670 root     115   16  900K  608K RUN     0:01  0.93%  0.93% sh

I think you cvsup'd at a bad time. I fixed a bug that would have caused the system to lock up in this case late last night. On my system it freezes for a few seconds and then returns. I can stop that by turning down the interactivity threshold. Thanks, Jeff Bruce
Re: More ULE bugs fixed.
Eirik Oeverby wrote: As a side note/question: Is there any way to figure out which ULE version I'm running in a precompiled kernel? I just nuked my src tree by accident, and am not sure if I'm on 1.65 or something older. If there is no way, is this perhaps an idea? Try ident /boot/kernel/kernel | grep sched_ule. Cheers, Maxime
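Maxime's ident(1) suggestion prints the $FreeBSD$ ID strings compiled into the kernel, and the sched_ule.c one carries the file's CVS revision (Eirik quotes such a string, for revision 1.59, elsewhere in this thread). A small sketch for pulling out that revision and comparing it with the 1.65 Bruce tested; both function names are hypothetical, and the parsing assumes the ID string has the usual "sched_ule.c,v REVISION DATE ..." shape.

```sh
#!/bin/sh
# Hypothetical helper: read ident(1)-style output on stdin and print the
# first sched_ule.c CVS revision found, e.g. "1.65".
ule_revision() {
	sed -n 's/.*sched_ule\.c,v \([0-9][0-9.]*\) .*/\1/p' | head -n 1
}

# Hypothetical helper: succeed if CVS revision $1 is at least $2.
# Components are compared as integers, so 1.100 counts as newer than
# 1.99 (a plain decimal comparison would get that wrong).
rev_at_least() {
	awk -v a="$1" -v b="$2" 'BEGIN {
		split(a, x, "."); split(b, y, ".")
		if (x[1] + 0 > y[1] + 0 || (x[1] + 0 == y[1] + 0 && x[2] + 0 >= y[2] + 0))
			exit 0
		exit 1
	}'
}

# Example (on FreeBSD):
#   rev=$(ident /boot/kernel/kernel | ule_revision)
#   rev_at_least "$rev" 1.65 && echo "sched_ule.c $rev is 1.65 or newer"
```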
Re: More ULE bugs fixed.
On Fri, 17 Oct 2003, Jeff Roberson wrote: On Fri, 17 Oct 2003, Bruce Evans wrote: How would one test if it was an improvement on the 4BSD scheduler? It is not even competitive in my simple tests. ... At one point ULE was at least as fast as 4BSD and in most cases faster. This is a regression. I'll sort it out soon. How much faster? Test 5 for fair scheduling related to niceness:

    for i in -20 -16 -12 -8 -4 0 4 8 12 16 20
    do
        nice -$i sh -c "while :; do echo -n;done" &
    done
    time top -o cpu

With SCHED_ULE, this now hangs the system, but it worked yesterday. Today it doesn't get as far as running top and it stops the nfs server responding.

    661 root     112  -20  900K  608K RUN     0:24 27.80% 27.64% sh
    662 root     114  -16  900K  608K RUN     0:19 12.43% 12.35% sh
    663 root     114  -12  900K  608K RUN     0:15 10.66% 10.60% sh
    664 root     114   -8  900K  608K RUN     0:11  9.38%  9.33% sh
    665 root     115   -4  900K  608K RUN     0:10  7.91%  7.86% sh
    666 root     115    0  900K  608K RUN     0:07  6.83%  6.79% sh
    667 root     115    4  900K  608K RUN     0:06  5.01%  4.98% sh
    668 root     115    8  900K  608K RUN     0:04  3.83%  3.81% sh
    669 root     115   12  900K  608K RUN     0:02  2.21%  2.20% sh
    670 root     115   16  900K  608K RUN     0:01  0.93%  0.93% sh

Perhaps the bug only affects SMP. The above is for UP (no CPU column). I see a large difference from the above, at least under SMP: %CPU tapers off to 0 at nice 0. BTW, I just noticed that SCHED_4BSD never really worked for the SMP case. sched_clock() is called for each CPU, and for N CPUs this has the same effect as calling sched_clock() N times too often for 1 CPU. Calling sched_clock() too often was fixed for the UP case in kern_synch.c 1.83 by introducing a scale factor. The scale factor is fixed so it doesn't help for SMP. I think you cvsup'd at a bad time. I fixed a bug that would have caused the system to lock up in this case late last night. On my system it freezes for a few seconds and then returns. I can stop that by turning down the interactivity threshold. No, I tested with an up to date kernel (sched_ule.c 1.65).
Bruce
Re: More ULE bugs fixed.
The commit to src/sys/kern/kern_switch.c:1.62, would it fix the following crash (can't find my kernel with debugging symbols): Hrm, nope. This is from a kernel from tonight at 9pm PST. -sc

    #0  doadump () at /usr/src/sys/kern/kern_shutdown.c:240
    #1  0xc052f579 in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:372
    #2  0xc052f958 in panic () at /usr/src/sys/kern/kern_shutdown.c:550
    #3  0xc06e5546 in trap_fatal (frame=0xdc2797e4, eva=0) at /usr/src/sys/i386/i386/trap.c:820
    #4  0xc06e4b83 in trap (frame=
          {tf_fs = -1068236776, tf_es = -1065811952, tf_ds = -880082928, tf_edi = 0, tf_esi = 0, tf_ebp = -601384896, tf_isp = -601384944, tf_ebx = 0, tf_edx = -872307216, tf_ecx = -601384756, tf_eax = 1, tf_trapno = 12, tf_err = 0, tf_eip = -1068209552, tf_cs = 8, tf_eflags = 66050, tf_esp = -878456736, tf_ss = -1049884900}) at /usr/src/sys/i386/i386/trap.c:252
    #5  0xc06d53e8 in calltrap () at {standard input}:102
    #6  0xc0527bd4 in fill_kinfo_thread (td=0xcc087e40, kp=0xdc2798cc) at /usr/src/sys/kern/kern_proc.c:766
    #7  0xc052757b in fill_kinfo_proc (p=0x0, kp=0x0) at /usr/src/sys/kern/kern_proc.c:622
    #8  0xc0527fbe in sysctl_out_proc (p=0xcbe6b1e4, req=0xdc279bf8, flags=4) at /usr/src/sys/kern/kern_proc.c:859
    #9  0xc0528787 in sysctl_kern_proc (oidp=0xc0764300, arg1=0xdc279ca4, arg2=0, req=0xdc279bf8) at /usr/src/sys/kern/kern_proc.c:1024
    #10 0xc053a36a in sysctl_root (oidp=0x0, arg1=0xdc279c98, arg2=3, req=0xdc279bf8) at /usr/src/sys/kern/kern_sysctl.c:1179
    #11 0xc053a64d in userland_sysctl (td=0x0, name=0xdc279c98, namelen=3, old=0x3, oldlenp=0xdc279bf8, inkernel=0, new=0xdc279c98, newlen=0, retval=0xdc279c90) at /usr/src/sys/kern/kern_sysctl.c:1286
    #12 0xc053a474 in __sysctl (td=0x0, uap=0xdc279d10) at /usr/src/sys/kern/kern_sysctl.c:1216
    #13 0xc06e58d0 in syscall (frame=
          {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 3, tf_esi = 0, tf_ebp = -1077941160, tf_isp = -601383564, tf_ebx = -1077941108, tf_edx = 0, tf_ecx = -1077941056, tf_eax = 202, tf_trapno = 12, tf_err = 2, tf_eip = 134768643, tf_cs = 31, tf_eflags = 663, tf_esp = -1077941204, tf_ss = 47}) at /usr/src/sys/i386/i386/trap.c:1009
    #14 0xc06d543d in Xint0x80_syscall () at {standard input}:144

    Fatal trap 12: page fault while in kernel mode
    fault virtual address   = 0x38
    fault code              = supervisor read, page not present
    instruction pointer     = 0x8:0xc0546a70
    stack pointer           = 0x10:0xdc279824
    frame pointer           = 0x10:0xdc279840
    code segment            = base 0x0, limit 0xf, type 0x1b
                            = DPL 0, pres 1, def32 1, gran 1
    processor eflags        = interrupt enabled, resume, IOPL = 0
    current process         = 971 (ps)
    trap number             = 12
    panic: page fault

-- Sean Chittenden
Re: More ULE bugs fixed.
On Wed, 15 Oct 2003, Jeff Roberson wrote: I fixed two bugs that were exposed due to more of the kernel running outside of Giant. ULE had some issues with priority propagation that stopped it from working very well. Things should be much improved. Feedback, as always, is welcome. I'd like to look into making this the default scheduler for 5.2 if things start looking up. I hope that scares you all into using it more. :-) How would one test if it was an improvement on the 4BSD scheduler? It is not even competitive in my simple tests. Test for scheduling buildworlds:

    cd /usr/src/usr.bin
    for i in obj depend all
    do
        MAKEOBJDIRPREFIX=/somewhere/obj time make -s -j16 $i
    done >/tmp/zqz 2>&1

(Run this with an empty /somewhere/obj. The all stage doesn't quite finish.) On an ABIT BP6 system with a 400MHz and a 366MHz CPU, with /usr (including /usr/src) nfs-mounted (with 100 Mbps ethernet and a reasonably fast server) and /somewhere/obj ufs1-mounted (on a fairly slow disk; no soft-updates), this gives the following times:

    SCHED_ULE-yesterday, with not so careful setup:
           40.37 real         8.26 user         6.26 sys
          278.90 real        59.35 user        41.32 sys
          341.82 real       307.38 user        69.01 sys
    SCHED_ULE-today, run immediately after booting:
           41.51 real         7.97 user         6.42 sys
          306.64 real        59.66 user        40.68 sys
          346.48 real       305.54 user        69.97 sys
    SCHED_4BSD-yesterday, with not so careful setup:
        [same as today except the depend step was 10 seconds slower (real)]
    SCHED_4BSD-today, run immediately after booting:
           18.89 real         8.01 user         6.66 sys
          128.17 real        58.33 user        43.61 sys
          291.59 real       308.48 user        72.33 sys
    SCHED_4BSD-yesterday, with a UP kernel (running on the 366 MHz CPU) with many local changes and not so careful setup:
           17.39 real         8.28 user         5.49 sys
          130.51 real        60.97 user        34.63 sys
          390.68 real       310.78 user        60.55 sys

Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for the obj and depend stages. These stages have little parallelism. SCHED_ULE was only 19% slower for the all stage.
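The "more than twice as slow" and "19% slower" figures come from dividing the real times. A minimal sketch of that arithmetic, using the SCHED_ULE-today and SCHED_4BSD-today runs from the tables above; slowdown is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: print time A divided by time B to two decimals,
# i.e. how many times slower run A was than run B.
slowdown() {
	awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Real times (seconds) per stage: SCHED_ULE-today, then SCHED_4BSD-today.
for pair in "obj 41.51 18.89" "depend 306.64 128.17" "all 346.48 291.59"; do
	set -- $pair
	printf '%-6s %sx\n' "$1" "$(slowdown "$2" "$3")"
done
```

This prints 2.20 for obj, 2.39 for depend and 1.19 for all. Real times are used because the user and sys times are nearly the same under both schedulers; the difference shows up only in elapsed time.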
It apparently misses many opportunities to actually run useful processes. This may be related to /usr being nfs mounted. There is lots of idling waiting for nfs even in the SCHED_4BSD case. The system times are smaller for SCHED_ULE, but this might not be significant. E.g., zeroing pages can account for several percent of the system time in buildworld, but on unbalanced systems that have too much idle time most page zeroing gets done in idle time and doesn't show up in the system time. Test 1 for fair scheduling related to niceness:

    for i in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
    do
        nice -$i sh -c "while :; do echo -n;done" &
    done
    top -o time

[Output deleted]. This shows only a vague correlation between niceness and runtime for SCHED_ULE. However, top -o cpu shows a strong correlation between %CPU and niceness. Apparently, %CPU is very inaccurate and/or not enough history is kept for long-term scheduling to be fair. Test 5 for fair scheduling related to niceness:

    for i in -20 -16 -12 -8 -4 0 4 8 12 16 20
    do
        nice -$i sh -c "while :; do echo -n;done" &
    done
    time top -o cpu

With SCHED_ULE, this now hangs the system, but it worked yesterday. Today it doesn't get as far as running top and it stops the nfs server responding. To unhang the system and see what the above does, run a shell at rtprio 0 and start top before the above, and use top to kill processes (I normally use killall sh to kill all the shells generated by tests 1-5, but killall doesn't work if it is on nfs when the nfs server is not responding). Bruce
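Bruce's cleanup problem (killall living on a hung nfs mount) can be sidestepped by having the test remember the PIDs of its hogs and stop them with kill, which is built into sh. The sketch below is a variation on the niceness tests, not Bruce's script: run_nice_hogs is a hypothetical name, the loop body is ':' rather than echo -n so the hogs produce no output, the hogs are stopped after a fixed number of seconds, and the stop path assumes only sh's builtin kill and /bin/sleep are needed (both normally on the root filesystem). Negative nice values still require root.

```sh
#!/bin/sh
# Hypothetical helper: start one CPU hog per niceness given after the
# first argument, let them run for $1 seconds, then stop them using the
# shell's builtin kill (no external killall needed). Prints the PIDs.
run_nice_hogs() {
	secs=$1; shift
	pids=
	for n in "$@"; do
		# ':' keeps the hog silent; negative values need root.
		nice -n "$n" sh -c 'while :; do :; done' &
		pids="$pids $!"
	done
	sleep "$secs"                  # watch with top(1) from another terminal meanwhile
	kill $pids 2>/dev/null || :
	wait $pids 2>/dev/null || :    # reap the hogs so none are left behind
	echo $pids
}

# Example: run_nice_hogs 60 0 4 8 12 16 20
```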
Re: More ULE bugs fixed.
On Fri, 17 Oct 2003, Bruce Evans wrote: How would one test if it was an improvement on the 4BSD scheduler? It is not even competitive in my simple tests. [scripts results deleted] Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for the obj and depend stages. These stages have little parallelism. SCHED_ULE was only 19% slower for the all stage. It apparently misses many opportunities to actually run useful processes. This may be related to /usr being nfs mounted. There is lots of idling waiting for nfs even in the SCHED_4BSD case. The system times are smaller for SCHED_ULE, but this might not be significant. E.g., zeroing pages can account for several percent of the system time in buildworld, but on unbalanced systems that have too much idle time most page zeroing gets done in idle time and doesn't show up in the system time. At one point ULE was at least as fast as 4BSD and in most cases faster. This is a regression. I'll sort it out soon. Test 1 for fair scheduling related to niceness:

    for i in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
    do
        nice -$i sh -c "while :; do echo -n;done" &
    done
    top -o time

[Output deleted]. This shows only a vague correlation between niceness and runtime for SCHED_ULE. However, top -o cpu shows a strong correlation between %CPU and niceness. Apparently, %CPU is very inaccurate and/or not enough history is kept for long-term scheduling to be fair. Test 5 for fair scheduling related to niceness:

    for i in -20 -16 -12 -8 -4 0 4 8 12 16 20
    do
        nice -$i sh -c "while :; do echo -n;done" &
    done
    time top -o cpu

With SCHED_ULE, this now hangs the system, but it worked yesterday. Today it doesn't get as far as running top and it stops the nfs server responding.
To unhang the system and see what the above does, run a shell at rtprio 0 and start top before the above, and use top to kill processes (I normally use killall sh to kill all the shells generated by tests 1-5, but killall doesn't work if it is on nfs when the nfs server is not responding).

    661 root     112  -20  900K  608K RUN     0:24 27.80% 27.64% sh
    662 root     114  -16  900K  608K RUN     0:19 12.43% 12.35% sh
    663 root     114  -12  900K  608K RUN     0:15 10.66% 10.60% sh
    664 root     114   -8  900K  608K RUN     0:11  9.38%  9.33% sh
    665 root     115   -4  900K  608K RUN     0:10  7.91%  7.86% sh
    666 root     115    0  900K  608K RUN     0:07  6.83%  6.79% sh
    667 root     115    4  900K  608K RUN     0:06  5.01%  4.98% sh
    668 root     115    8  900K  608K RUN     0:04  3.83%  3.81% sh
    669 root     115   12  900K  608K RUN     0:02  2.21%  2.20% sh
    670 root     115   16  900K  608K RUN     0:01  0.93%  0.93% sh

I think you cvsup'd at a bad time. I fixed a bug that would have caused the system to lock up in this case late last night. On my system it freezes for a few seconds and then returns. I can stop that by turning down the interactivity threshold. Thanks, Jeff Bruce
Re: More ULE bugs fixed.
On Fri, 17 Oct 2003, Bruce Evans wrote: On Fri, 17 Oct 2003, Jeff Roberson wrote: On Fri, 17 Oct 2003, Bruce Evans wrote: How would one test if it was an improvement on the 4BSD scheduler? It is not even competitive in my simple tests. ... At one point ULE was at least as fast as 4BSD and in most cases faster. This is a regression. I'll sort it out soon. How much faster? Apache benchmarked at 30% greater throughput due to the cpu affinity some time ago. I haven't done more recent tests with apache. buildworld is the most degenerate case for per cpu run queues because cpu affinity doesn't help much and load imbalances hurt a lot. On my machine the compiler hardly ever wants to run for more than a few slices before doing a msleep() so it's not bouncing around between CPUs so much with 4BSD. Test 5 for fair scheduling related to niceness:

    for i in -20 -16 -12 -8 -4 0 4 8 12 16 20
    do
        nice -$i sh -c "while :; do echo -n;done" &
    done
    time top -o cpu

With SCHED_ULE, this now hangs the system, but it worked yesterday. Today it doesn't get as far as running top and it stops the nfs server responding.

    661 root     112  -20  900K  608K RUN     0:24 27.80% 27.64% sh
    662 root     114  -16  900K  608K RUN     0:19 12.43% 12.35% sh
    663 root     114  -12  900K  608K RUN     0:15 10.66% 10.60% sh
    664 root     114   -8  900K  608K RUN     0:11  9.38%  9.33% sh
    665 root     115   -4  900K  608K RUN     0:10  7.91%  7.86% sh
    666 root     115    0  900K  608K RUN     0:07  6.83%  6.79% sh
    667 root     115    4  900K  608K RUN     0:06  5.01%  4.98% sh
    668 root     115    8  900K  608K RUN     0:04  3.83%  3.81% sh
    669 root     115   12  900K  608K RUN     0:02  2.21%  2.20% sh
    670 root     115   16  900K  608K RUN     0:01  0.93%  0.93% sh

Perhaps the bug only affects SMP. The above is for UP (no CPU column). That is likely, I don't use my SMP machine much anymore. I should set up some automated tests. I see a large difference from the above, at least under SMP: %CPU tapers off to 0 at nice 0. BTW, I just noticed that SCHED_4BSD never really worked for the SMP case.
sched_clock() is called for each CPU, and for N CPUs this has the same effect as calling sched_clock() N times too often for 1 CPU. Calling sched_clock() too often was fixed for the UP case in kern_synch.c 1.83 by introducing a scale factor. The scale factor is fixed so it doesn't help for SMP. Wait.. why are we calling sched_clock() too frequently on UP? I think you cvsup'd at a bad time. I fixed a bug that would have caused the system to lock up in this case late last night. On my system it freezes for a few seconds and then returns. I can stop that by turning down the interactivity threshold. No, I tested with an up to date kernel (sched_ule.c 1.65). Curious. ULE seems to have suffered from bitrot. These things were all tested and working when I did my paper for BSDCon. I have largely neglected FreeBSD since. I can't fix it this weekend, but I'm sure I'll sort it out next weekend. Cheers, Jeff Bruce
Re: More ULE bugs fixed.
I think you cvsup'd at a bad time. I fixed a bug that would have caused the system to lock up in this case late last night. On my system it freezes for a few seconds and then returns. I can stop that by turning down the interactivity threshold. Hrm, I must concur that while ULE seems a tad snappier on the responsiveness end, it seems to be lacking in terms of real world performance compared to 4BSD. Fresh CVSup (~midnight 2003-10-17) and build with a benchmark from before and after. I was benchmarking a chump calc program using bison vs. lemon earlier today under 4BSD (http://groups.yahoo.com/group/sqlite/message/5506) and figured I'd throw my hat in on the subject with some relative numbers. System time is down for ULE, but user and real are up. Under ULE:

    Running a dry run with bison calc...done.
    Running 1st run with bison calc... 52.11 real 45.63 user 0.56 sys
    Running 2nd run with bison calc... 52.16 real 45.52 user 0.69 sys
    Running 3rd run with bison calc... 51.80 real 45.32 user 0.87 sys
    Running a dry run with lemon calc...done.
    Running 1st run with lemon calc... 129.69 real 117.91 user 1.10 sys
    Running 2nd run with lemon calc... 130.26 real 117.88 user 1.13 sys
    Running 3rd run with lemon calc... 130.76 real 117.90 user 1.10 sys
    Time spent in user mode   (CPU seconds) : 654.049s
    Time spent in kernel mode (CPU seconds) : 7.047s
    Total time                              : 12:19.06s
    CPU utilization (percentage)            : 89.4%
    Times the process was swapped           : 0
    Times of major page faults              : 34
    Times of minor page faults              : 2361

And under 4BSD:

    Running a dry run with bison calc...done.
    Running 1st run with bison calc... 44.22 real 37.94 user 0.85 sys
    Running 2nd run with bison calc... 46.21 real 37.98 user 0.85 sys
    Running 3rd run with bison calc... 45.32 real 38.13 user 0.67 sys
    Running a dry run with lemon calc...done.
    Running 1st run with lemon calc... 116.53 real 100.10 user 1.13 sys
    Running 2nd run with lemon calc... 112.61 real 100.35 user 0.86 sys
    Running 3rd run with lemon calc... 114.16 real 100.19 user 1.04 sys
    Time spent in user mode   (CPU seconds) : 553.392s
    Time spent in kernel mode (CPU seconds) : 6.978s
    Total time                              : 10:40.80s
    CPU utilization (percentage)            : 87.4%
    Times the process was swapped           : 223
    Times of major page faults              : 50
    Times of minor page faults              : 2750

Just a heads up, it does indeed look as though things have gone backwards in terms of performance. -sc -- Sean Chittenden
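Sean's numbers are easier to compare as per-calc averages of the three timed runs. A minimal sketch that averages the real and user columns of lines like the ones above; avg_time is a hypothetical name, and it expects only the timed "Running ... run" lines for one calc on stdin.

```sh
#!/bin/sh
# Hypothetical helper: average the "real" and "user" figures from
# time(1)-style lines ("... 52.11 real 45.63 user 0.56 sys") on stdin.
avg_time() {
	awk '{
		for (i = 2; i <= NF; i++) {
			if ($i == "real") { r += $(i - 1); n++ }  # value precedes its label
			if ($i == "user") u += $(i - 1)
		}
	}
	END { if (n) printf "%.2f real %.2f user\n", r / n, u / n }'
}
```

For the bison runs this gives 52.02 real / 45.49 user under ULE and 45.25 real / 38.02 user under 4BSD, so roughly 15% more elapsed time with ULE; the lemon runs (130.24 versus 114.43 real on average) come out about 14% slower.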
Re: More ULE bugs fixed.
On Fri, 17 Oct 2003, Sean Chittenden wrote: I think you cvsup'd at a bad time. I fixed a bug that would have caused the system to lock up in this case late last night. On my system it freezes for a few seconds and then returns. I can stop that by turning down the interactivity threshold. Hrm, I must concur that while ULE seems a tad snappier on the responsiveness end, it seems to be lacking in terms of real world performance compared to 4BSD. Thanks for the stats. Is this on SMP or UP? Fresh CVSup (~midnight 2003-10-17) and build with a benchmark from before and after. I was benchmarking a chump calc program using bison vs. lemon earlier today under 4BSD (http://groups.yahoo.com/group/sqlite/message/5506) and figured I'd throw my hat in on the subject with some relative numbers. System time is down for ULE, but user and real are up. Under ULE:

    Running a dry run with bison calc...done.
    Running 1st run with bison calc... 52.11 real 45.63 user 0.56 sys
    Running 2nd run with bison calc... 52.16 real 45.52 user 0.69 sys
    Running 3rd run with bison calc... 51.80 real 45.32 user 0.87 sys
    Running a dry run with lemon calc...done.
    Running 1st run with lemon calc... 129.69 real 117.91 user 1.10 sys
    Running 2nd run with lemon calc... 130.26 real 117.88 user 1.13 sys
    Running 3rd run with lemon calc... 130.76 real 117.90 user 1.10 sys
    Time spent in user mode   (CPU seconds) : 654.049s
    Time spent in kernel mode (CPU seconds) : 7.047s
    Total time                              : 12:19.06s
    CPU utilization (percentage)            : 89.4%
    Times the process was swapped           : 0
    Times of major page faults              : 34
    Times of minor page faults              : 2361

And under 4BSD:

    Running a dry run with bison calc...done.
    Running 1st run with bison calc... 44.22 real 37.94 user 0.85 sys
    Running 2nd run with bison calc... 46.21 real 37.98 user 0.85 sys
    Running 3rd run with bison calc... 45.32 real 38.13 user 0.67 sys
    Running a dry run with lemon calc...done.
    Running 1st run with lemon calc... 116.53 real 100.10 user 1.13 sys
    Running 2nd run with lemon calc... 112.61 real 100.35 user 0.86 sys
    Running 3rd run with lemon calc... 114.16 real 100.19 user 1.04 sys
    Time spent in user mode   (CPU seconds) : 553.392s
    Time spent in kernel mode (CPU seconds) : 6.978s
    Total time                              : 10:40.80s
    CPU utilization (percentage)            : 87.4%
    Times the process was swapped           : 223
    Times of major page faults              : 50
    Times of minor page faults              : 2750

Just a heads up, it does indeed look as though things have gone backwards in terms of performance. -sc -- Sean Chittenden
Re: More ULE bugs fixed.
I think you cvsup'd at a bad time. I fixed a bug that would have caused the system to lock up in this case late last night. On my system it freezes for a few seconds and then returns. I can stop that by turning down the interactivity threshold. Hrm, I must concur that while ULE seems a tad snappier on the responsiveness end, it seems to be lacking in terms of real world performance compared to 4BSD. Thanks for the stats. Is this on SMP or UP? UP. -sc -- Sean Chittenden
Re: More ULE bugs fixed.
On Wed, 2003-10-15 at 09:51, Jeff Roberson wrote: I fixed two bugs that were exposed due to more of the kernel running outside of Giant. ULE had some issues with priority propagation that stopped it from working very well. Things should be much improved. On my Athlon XP 2000+ the situation is much better. No mouse jerkiness, whatever the load on the system. The system responds better in any situation. I'm using libc_r, so no problem with any Gnome app caused (probably) by KSE. Best Regards. -- Rionda aka Matteo Riondato G.U.F.I Staff Member (http://www.gufi.org) BSD-FAQ-it Main Developer (http://www.gufi.org/~rionda) GPG key at: http://www.riondabsd.net/riondagpg.asc Sent from: kaiser.sig11.org running FreeBSD-5.1-CURRENT
Re: More ULE bugs fixed.
I think you cvsup'd at a bad time. I fixed a bug that would have caused the system to lock up in this case late last night. On my system it freezes for a few seconds and then returns. I can stop that by turning down the interactivity threshold. Hrm, I must concur that while ULE seems a tad snappier on the responsiveness end, it seems to be lacking in terms of real world performance compared to 4BSD. Thanks for the stats. Is this on SMP or UP? UP. -sc The commit to src/sys/kern/kern_switch.c:1.62, would it fix the following crash (can't find my kernel with debugging symbols):

    Fatal trap 12: page fault while in kernel mode
    fault virtual address   = 0x30
    fault code              = supervisor write, page not present
    instruction pointer     = 0x8:0xc054699f
    stack pointer           = 0x10:0xd6713b20
    frame pointer           = 0x10:0xd6713b2c
    code segment            = base 0x0, limit 0xf, type 0x1b
                            = DPL 0, pres 1, def32 1, gran 1
    processor eflags        = interrupt enabled, resume, IOPL = 0
    current process         = 3 (g_up)
    trap number             = 12
    panic: page fault
    syncing disks, buffers remaining...
    Fatal trap 12: page fault while in kernel mode
    fault virtual address   = 0x0
    fault code              = supervisor read, page not present
    instruction pointer     = 0x8:0xc0536771
    stack pointer           = 0x10:0xdb7d4bb4
    frame pointer           = 0x10:0xdb7d4bc0
    code segment            = base 0x0, limit 0xf, type 0x1b
                            = DPL 0, pres 1, def32 1, gran 1
    processor eflags        = interrupt enabled, resume, IOPL = 0
    current process         = 45 (syncer)
    trap number             = 12
    panic: page fault

    #0  0xc052eeeb in doadump ()
    #1  0xc052f579 in boot ()
    #2  0xc052f958 in panic ()
    #3  0xc06e5536 in trap_fatal ()
    #4  0xc06e4b73 in trap ()
    #5  0xc06d53d8 in calltrap ()
    #6  0xc05460bb in sched_switch ()
    #7  0xc05384eb in mi_switch ()
    #8  0xc0537b9f in msleep ()
    #9  0xc058eca3 in sched_sync ()
    #10 0xc0518321 in fork_exit ()

-sc -- Sean Chittenden
Re: More ULE bugs fixed.
Jeff Roberson wrote: On Wed, 15 Oct 2003, Eirik Oeverby wrote: Eirik Oeverby wrote: Jeff Roberson wrote: I fixed two bugs that were exposed due to more of the kernel running outside of Giant. ULE had some issues with priority propagation that stopped it from working very well. Things should be much improved. Feedback, as always, is welcome. I'd like to look into making this the default scheduler for 5.2 if things start looking up. I hope that scares you all into using it more. :-) Hi.. Just tested, so far it seems good. System CPU load is floored (near 0), system is very responsive, no mouse sluggishness or random mouse/keyboard input. Doing a make -j 20 buildworld now (on my 1ghz p3 thinkpad ;), and running some SQLServer stuff in VMware. We'll see how it fares. Hi, just a follow-up message. I'm now running the buildworld mentioned above, and the system is pretty much unusable. It exhibits the same symptoms I have mentioned before: mouse jumpiness and bogus mouse input (movement, clicks), and the system is generally very jerky and unresponsive. This is particularly evident when doing things like webpage loading/browsing/rendering, but it's noticeable all the time, no matter what I am doing. As an example, I wrote the last sentence without seeing a single character on screen before I had finished writing it, and it appeared with a lot more typos than I usually make ;) I'm running *without* invariants and witness right now, i.e. a kernel identical to the SCHED_4BSD kernel apart from the scheduler. Can you confirm the revision of your sys/kern/sched_ule.c file? How does SCHED_4BSD respond in this same test? Yes, I can. From the file: __FBSDID("$FreeBSD: src/sys/kern/sched_ule.c,v 1.59 2003/10/15 07:47:06 jeff Exp $"); I am running SCHED_4BSD now, with a make -j 20 buildworld running, and I do not experience any of the problems. Keyboard and mouse input is smooth, and though apps run slightly slower due to the massive load on the system, there is none of the jerkiness I have seen before. 
Anything else I can do to help? /Eirik Thanks, Jeff Best regards, /Eirik
Re: More ULE bugs fixed.
On Thu, 16 Oct 2003, Eirik Oeverby wrote: Jeff Roberson wrote: On Wed, 15 Oct 2003, Eirik Oeverby wrote: Eirik Oeverby wrote: Jeff Roberson wrote: I fixed two bugs that were exposed due to more of the kernel running outside of Giant. ULE had some issues with priority propagation that stopped it from working very well. Things should be much improved. Feedback, as always, is welcome. I'd like to look into making this the default scheduler for 5.2 if things start looking up. I hope that scares you all into using it more. :-) Hi.. Just tested, so far it seems good. System CPU load is floored (near 0), system is very responsive, no mouse sluggishness or random mouse/keyboard input. Doing a make -j 20 buildworld now (on my 1ghz p3 thinkpad ;), and running some SQLServer stuff in VMware. We'll see how it fares. Hi, just a follow-up message. I'm now running the buildworld mentioned above, and the system is pretty much unusable. It exhibits the same symptoms I have mentioned before: mouse jumpiness and bogus mouse input (movement, clicks), and the system is generally very jerky and unresponsive. This is particularly evident when doing things like webpage loading/browsing/rendering, but it's noticeable all the time, no matter what I am doing. As an example, I wrote the last sentence without seeing a single character on screen before I had finished writing it, and it appeared with a lot more typos than I usually make ;) I'm running *without* invariants and witness right now, i.e. a kernel identical to the SCHED_4BSD kernel apart from the scheduler. Can you confirm the revision of your sys/kern/sched_ule.c file? How does SCHED_4BSD respond in this same test? Yes, I can. From the file: __FBSDID("$FreeBSD: src/sys/kern/sched_ule.c,v 1.59 2003/10/15 07:47:06 jeff Exp $"); I am running SCHED_4BSD now, with a make -j 20 buildworld running, and I do not experience any of the problems. 
Keyboard and mouse input is smooth, and though apps run slightly slower due to the massive load on the system, there is none of the jerkiness I have seen before. Anything else I can do to help? Yup, try again. :-) I found another bug and tuned some parameters of the scheduler. The bug was introduced after I did my paper for BSDCon, so I never ran into it when I was doing serious stress testing. Hopefully this will be a huge improvement. I did a make -j16 buildworld and used mozilla while in kde2. It was fine unless I tried to scroll around rapidly in a page full of several-megabyte images for many minutes. /Eirik Thanks, Jeff Best regards, /Eirik
Re: More ULE bugs fixed.
Hi! Things should be much improved. Feedback, as always, is welcome. Wow! Smoothly working under a load of approx. 4. Running gnome2, mozilla, evolution, mplayer and kpdf. Running portsdb -Uu and a kernel build. No stuttering mouse, no irritating delays, fast rendering. That's definitely better than _4BSD. (UP machine) Cheers Peter -- [EMAIL PROTECTED] Campus der Max-Planck-Institute Tübingen Network and System Administration Tel: +49 7071 601 598 Fax: +49 7071 601 616
Re: More ULE bugs fixed.
Jeff Roberson wrote: On Thu, 16 Oct 2003, Eirik Oeverby wrote: Jeff Roberson wrote: On Wed, 15 Oct 2003, Eirik Oeverby wrote: Eirik Oeverby wrote: Jeff Roberson wrote: I fixed two bugs that were exposed due to more of the kernel running outside of Giant. ULE had some issues with priority propagation that stopped it from working very well. Things should be much improved. Feedback, as always, is welcome. I'd like to look into making this the default scheduler for 5.2 if things start looking up. I hope that scares you all into using it more. :-) Hi.. Just tested, so far it seems good. System CPU load is floored (near 0), system is very responsive, no mouse sluggishness or random mouse/keyboard input. Doing a make -j 20 buildworld now (on my 1ghz p3 thinkpad ;), and running some SQLServer stuff in VMware. We'll see how it fares. Hi, just a follow-up message. I'm now running the buildworld mentioned above, and the system is pretty much unusable. It exhibits the same symptoms I have mentioned before: mouse jumpiness and bogus mouse input (movement, clicks), and the system is generally very jerky and unresponsive. This is particularly evident when doing things like webpage loading/browsing/rendering, but it's noticeable all the time, no matter what I am doing. As an example, I wrote the last sentence without seeing a single character on screen before I had finished writing it, and it appeared with a lot more typos than I usually make ;) I'm running *without* invariants and witness right now, i.e. a kernel identical to the SCHED_4BSD kernel apart from the scheduler. Can you confirm the revision of your sys/kern/sched_ule.c file? How does SCHED_4BSD respond in this same test? Yes, I can. From the file: __FBSDID("$FreeBSD: src/sys/kern/sched_ule.c,v 1.59 2003/10/15 07:47:06 jeff Exp $"); I am running SCHED_4BSD now, with a make -j 20 buildworld running, and I do not experience any of the problems. 
Keyboard and mouse input is smooth, and though apps run slightly slower due to the massive load on the system, there is none of the jerkiness I have seen before. Anything else I can do to help? Yup, try again. :-) I found another bug and tuned some parameters of the scheduler. The bug was introduced after I did my paper for BSDCon, so I never ran into it when I was doing serious stress testing. Hopefully this will be a huge improvement. I did a make -j16 buildworld and used mozilla while in kde2. It was fine unless I tried to scroll around rapidly in a page full of several-megabyte images for many minutes. It is. Still not perfect, but now it's somewhere around the 4BSD mark, I would say. The thing about 'make buildworld' is that it doesn't get really tough until it hits some of the larger directories, like the crypto stuff, where there are many .c files in one directory; before it gets that far, there are at most 2 or 3 cc1 processes running concurrently. As soon as I get 10-20 of them, things start getting sluggish, but I suppose that's hard to avoid. What disturbs me somewhat, though, is that I get some of this sluggishness (and other symptoms I've mentioned before) even when I'm running 'nice -n 20 make -j 20 buildworld', meaning the cc1 processes and all that are running (very) nice. The fact that I still have issues even when doing that leads me to think the problem is somewhere other than in the scheduler. Now, I can't say for sure whether this is also the case with 4BSD; I only tested the nice stuff after the last reboot. But all in all, things are better now than yesterday morning. Kudos! /Eirik
Re: More ULE bugs fixed.
Jeff Roberson wrote: I fixed two bugs that were exposed due to more of the kernel running outside of Giant. ULE had some issues with priority propagation that stopped it from working very well. Things should be much improved. Feedback, as always, is welcome. I'd like to look into making this the default scheduler for 5.2 if things start looking up. I hope that scares you all into using it more. :-) Hi.. Just tested, so far it seems good. System CPU load is floored (near 0), system is very responsive, no mouse sluggishness or random mouse/keyboard input. Doing a make -j 20 buildworld now (on my 1ghz p3 thinkpad ;), and running some SQLServer stuff in VMware. We'll see how it fares. Thanks, /Eirik
Re: More ULE bugs fixed.
Eirik Oeverby wrote: Jeff Roberson wrote: I fixed two bugs that were exposed due to more of the kernel running outside of Giant. ULE had some issues with priority propagation that stopped it from working very well. Things should be much improved. Feedback, as always, is welcome. I'd like to look into making this the default scheduler for 5.2 if things start looking up. I hope that scares you all into using it more. :-) Hi.. Just tested, so far it seems good. System CPU load is floored (near 0), system is very responsive, no mouse sluggishness or random mouse/keyboard input. Doing a make -j 20 buildworld now (on my 1ghz p3 thinkpad ;), and running some SQLServer stuff in VMware. We'll see how it fares. Hi, just a follow-up message. I'm now running the buildworld mentioned above, and the system is pretty much unusable. It exhibits the same symptoms I have mentioned before: mouse jumpiness and bogus mouse input (movement, clicks), and the system is generally very jerky and unresponsive. This is particularly evident when doing things like webpage loading/browsing/rendering, but it's noticeable all the time, no matter what I am doing. As an example, I wrote the last sentence without seeing a single character on screen before I had finished writing it, and it appeared with a lot more typos than I usually make ;) I'm running *without* invariants and witness right now, i.e. a kernel identical to the SCHED_4BSD kernel apart from the scheduler. Best regards, /Eirik
Re: More ULE bugs fixed.
On Wed, 15 Oct 2003, Jeff Roberson wrote: I fixed two bugs that were exposed due to more of the kernel running outside of Giant. ULE had some issues with priority propagation that stopped it from working very well. Things should be much improved. Feedback, as always, is welcome. I'd like to look into making this the default scheduler for 5.2 if things start looking up. I hope that scares you all into using it more. :-) Before you do that, can you look into changing the scheduler interfaces to address David Xu's concern with it being suboptimal for KSE processes? -- Dan Eischen
Re: More ULE bugs fixed.
On Wed, 15 Oct 2003, Eirik Oeverby wrote: Eirik Oeverby wrote: Jeff Roberson wrote: I fixed two bugs that were exposed due to more of the kernel running outside of Giant. ULE had some issues with priority propagation that stopped it from working very well. Things should be much improved. Feedback, as always, is welcome. I'd like to look into making this the default scheduler for 5.2 if things start looking up. I hope that scares you all into using it more. :-) Hi.. Just tested, so far it seems good. System CPU load is floored (near 0), system is very responsive, no mouse sluggishness or random mouse/keyboard input. Doing a make -j 20 buildworld now (on my 1ghz p3 thinkpad ;), and running some SQLServer stuff in VMware. We'll see how it fares. Hi, just a follow-up message. I'm now running the buildworld mentioned above, and the system is pretty much unusable. It exhibits the same symptoms I have mentioned before: mouse jumpiness and bogus mouse input (movement, clicks), and the system is generally very jerky and unresponsive. This is particularly evident when doing things like webpage loading/browsing/rendering, but it's noticeable all the time, no matter what I am doing. As an example, I wrote the last sentence without seeing a single character on screen before I had finished writing it, and it appeared with a lot more typos than I usually make ;) I'm running *without* invariants and witness right now, i.e. a kernel identical to the SCHED_4BSD kernel apart from the scheduler. Can you confirm the revision of your sys/kern/sched_ule.c file? How does SCHED_4BSD respond in this same test? Thanks, Jeff Best regards, /Eirik
Re: More ULE bugs fixed.
On Wed, 15 Oct 2003, Daniel Eischen wrote: On Wed, 15 Oct 2003, Jeff Roberson wrote: I fixed two bugs that were exposed due to more of the kernel running outside of Giant. ULE had some issues with priority propagation that stopped it from working very well. Things should be much improved. Feedback, as always, is welcome. I'd like to look into making this the default scheduler for 5.2 if things start looking up. I hope that scares you all into using it more. :-) Before you do that, can you look into changing the scheduler interfaces to address David Xu's concern with it being suboptimal for KSE processes? Certainly. It may not happen if I can't find out what's making things so jerky for GNOME/KDE users, but if it looks like it will, I'll investigate the KSE issues. -- Dan Eischen
Re: More ULE bugs fixed.
On Wed, 15 Oct 2003, Jeff Roberson wrote: On Wed, 15 Oct 2003, Daniel Eischen wrote: On Wed, 15 Oct 2003, Jeff Roberson wrote: I fixed two bugs that were exposed due to more of the kernel running outside of Giant. ULE had some issues with priority propagation that stopped it from working very well. Things should be much improved. Feedback, as always, is welcome. I'd like to look into making this the default scheduler for 5.2 if things start looking up. I hope that scares you all into using it more. :-) Before you do that, can you look into changing the scheduler interfaces to address David Xu's concern with it being suboptimal for KSE processes? Certainly, it may not happen if I can't find out what's making things so jerky for gnome/kde users. If it looks like it will, I'll investigate the kse issues. Thanks, I appreciate it. -- Dan Eischen
Re: More ULE bugs fixed.
On Wed, 15 Oct 2003, Daniel Eischen wrote: On Wed, 15 Oct 2003, Jeff Roberson wrote: I fixed two bugs that were exposed due to more of the kernel running outside of Giant. ULE had some issues with priority propagation that stopped it from working very well. Things should be much improved. Feedback, as always, is welcome. I'd like to look into making this the default scheduler for 5.2 if things start looking up. I hope that scares you all into using it more. :-) Before you do that, can you look into changing the scheduler interfaces to address David Xu's concern with it being suboptimal for KSE processes? There is also some work that I'd like to get done re: cleaning up the scheduler interface a bit. I know that Jeff and I have discussed this before, but it was a long time ago, and I've forgotten a lot and also learned a bit since then. Here's my logic on the matter: Any process has a number (fixed or variable) of kernel entities that can be scheduled. In KSE (gotta get a better name) there are a variable number of them. In libthr they are 1:1. I would postulate that the action of scheduling these items in a fair way is up to the scheduler. I had a very crude fairness module added to the BSD4.4 scheduler, but I think that fairness is a property of the scheduler and not of the threading package. If the scheduler doesn't care whether threads are scheduled fairly, then it can just schedule all threads equally. I would say that the ksegrp in question (which represents a rough unit of 'fairness') should make a call to the scheduler on creation specifying the required concurrency. At the moment KSE-M:N based ksegrps would specify N = NCPU, and THR based ksegrps would specify N = NTHREADS. KSE-1:1 runs with a KSEGRP with a concurrency of 1 per thread. (I still think that THR should allocate a KSEGRP per thread, not a KSE, but it's not critical.)
Basically, what I'm saying is that each scheduler should take a concurrency setting for each KSEGRP, and how it implements it is hidden from higher layers. The current 4.4 scheduler would implement it using KSEs and the existing code, but other schedulers may choose to implement it in different ways. I think the top-layer API calls for the scheduler should be:
  setrunnable(thread)
  choosethread()
  sched_clocktick()
  sched_set_concurrency()
  (plus all the other 'entrypoints')
I think that the scheduler needs to be in control of scheduling threads because too much inside information is needed for it to be done properly by an outside entity. For example, if the scheduler is not a priority-based scheduler, then an outside entity cannot know how to juggle which thread should run next when there is a choice. This would mean that each scheduler would need its own module to do this juggling instead of having a separate module to do it. It makes the job of the scheduler more difficult, but in fact it has to be so, because true POSIX process-scope threads require that the scheduler do this work. When a thread is made runnable (with a Unix priority), the scheduler needs to look at this thread in the context of all the other threads from this process, the current concurrency rule for that ksegrp and the other runnable threads, and adjust things so that:
  1/ the new thread is run some time
  2/ the ksegrp doesn't get TOO MUCH CPU, possibly punishing other threads in the group to compensate.
This is all up for discussion, but it's my current thinking. Julian
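To make the proposal above a little more concrete, here is a minimal userland sketch in C, assuming a simple priority-based policy. The entry points setrunnable() and choosethread() and the idea of a per-ksegrp concurrency setter (spelled sched_set_concurrency() here) come from the message; everything else (the struct layouts, the flat run queue, and the sched_thread_off_cpu() hook) is hypothetical and is not the real FreeBSD kernel API. It only illustrates the core idea: higher layers declare a concurrency limit per ksegrp, and the scheduler alone decides which runnable thread gets a CPU.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userland model of the proposed interface. These are
 * NOT the real kernel structures; they only illustrate the idea that
 * each ksegrp declares a concurrency limit and the scheduler alone
 * picks which runnable thread gets a CPU.
 */
#define MAXRUNQ 64

struct ksegrp {
    int concurrency;  /* max threads of this group on CPUs at once */
    int running;      /* threads of this group currently on a CPU */
};

struct thread {
    int priority;       /* Unix-style: a lower value is more urgent */
    struct ksegrp *kg;  /* group this thread is accounted against */
};

static struct thread *runq[MAXRUNQ];
static int nrunq;

/* Called when a ksegrp is created, e.g. N = NCPU for M:N, 1 for 1:1. */
void sched_set_concurrency(struct ksegrp *kg, int n)
{
    kg->concurrency = n;
    kg->running = 0;
}

/* Make a thread eligible to run; higher layers never pick threads. */
void setrunnable(struct thread *td)
{
    assert(nrunq < MAXRUNQ);
    runq[nrunq++] = td;
}

/*
 * Pick the most urgent runnable thread whose group is still below its
 * concurrency limit. Returns NULL if no thread is eligible.
 */
struct thread *choosethread(void)
{
    int best = -1;

    for (int i = 0; i < nrunq; i++) {
        struct thread *td = runq[i];
        if (td->kg->running >= td->kg->concurrency)
            continue;  /* group already has its share of CPUs */
        if (best < 0 || td->priority < runq[best]->priority)
            best = i;
    }
    if (best < 0)
        return NULL;

    struct thread *td = runq[best];
    runq[best] = runq[--nrunq];  /* unordered removal from the run queue */
    td->kg->running++;
    return td;
}

/* Hypothetical hook: the thread left the CPU (blocked or preempted). */
void sched_thread_off_cpu(struct thread *td)
{
    assert(td->kg->running > 0);
    td->kg->running--;
}
```

Only point 1/ is modeled: the group's concurrency cap is enforced at selection time, so a lower-priority thread from another group can run while a group is at its limit. Point 2/ (charging a ksegrp that gets too much CPU) would need usage accounting, for example in something like sched_clocktick(), which this sketch leaves out. A non-priority scheduler would replace the selection loop in choosethread() without touching the interface, which is the separation being argued for.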