Re: SCHED_ULE should not be the default
On Mon, 19 Dec 2011 23:22:40 +0200 Andriy Gapon a...@freebsd.org wrote:
> on 19/12/2011 17:50 Nathan Whitehorn said the following:
> > The thing I've seen is that ULE is substantially more enthusiastic about migrating processes between cores than 4BSD.
> Hmm, this seems to be contrary to my theoretical expectations. I thought that with 4BSD all threads that were not in one of the following categories:
> - temporarily pinned
> - bound to a cpu in the kernel via sched_bind
> - belonging to a cpu set which is a strict subset of the total set
> were placed onto a common queue that was shared by all cpus, and as such I expected them to get picked up by the cpus semi-randomly. In other words, I thought that it was ULE that took into account cpu/cache affinities while 4BSD was deliberately entirely ignorant of those details.

I have a 6-core AMD CPU running FreeBSD 10.0 and SCHED_4BSD. I've noticed with large ports builds which are not MAKE_JOBS_SAFE that the compile load migrates between the cores pretty quickly, but I haven't compared it to ULE.

--
Gary Jennejohn
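[For anyone who wants to watch that migration directly rather than infer it from build behaviour, a minimal sketch follows. It assumes an SMP kernel, where top(1) prints a "C" column showing the CPU each thread last ran on; the process name cc1 and the exact flags are illustrative, so check top(1) on your release.]

#!/bin/sh
# Sample the last-CPU column for compiler threads once a second.
# Watching the "C" column flip between cores shows migration.
while true; do
    top -b -SH 40 | grep cc1    # "cc1" is just an example process name
    sleep 1
done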
Re: SCHED_ULE should not be the default
On 12/18/11 04:34, Adrian Chadd wrote:
> The trouble is that there's lots of anecdotal evidence, but no one's really gone digging deep into _their_ example of why it's broken. The developers who know this stuff don't see anything wrong. That hints to me it may be something a little more creepy - as an example, the interplay between netisr/swi/taskqueue/callbacks and such. It may be that something is being starved that isn't obviously obvious. It's just a stab in the dark, but it sounds somewhat plausible based on what I've seen ULE do in my network throughput hacking.
> I applaud reppie for trying to make it as easy as possible for people to use KTR to provide scheduler traces for him to go digging with, so please, if you have these issues and you can absolutely reproduce them, please follow his instructions and work with him to get him what he needs.

The thing I've seen is that ULE is substantially more enthusiastic about migrating processes between cores than 4BSD. Often this is a good thing, but it can increase the rate of cache misses, hurting performance for cache-bound processes (I see this particularly in HPC-type scientific workloads). It might be interesting to add some kind of tunable here.

Another more interesting and slightly longer-term possibility, if someone wants a project, would be to integrate scheduling decisions with hwpmc counters: accumulate statistics on cache hits at each context switch and preferentially keep processes with a high hit/miss ratio on the same thread/cache domain relative to processes with a low one.
-Nathan

P.S. The other thing that could be very interesting from a research and scheduling standpoint would be to integrate heterogeneous SMP support into the operating system, with a FreeBSD-4 Application Processor syscall model. We seem to be going down the road where GPGPU computing has MMUs, timer interrupts, IPIs, etc. (the next AMD Fusions, IBM Cell), as well as potential systems with both x86 and ARM cores. This is something that no operating system currently supports well, and would be a place for BSD to shine. If anyone has a free graduate student...
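[As a data point for the hwpmc idea: per-process cache behaviour can already be measured from userland with pmcstat(8), without touching the scheduler. A minimal sketch, assuming hwpmc(4) is loaded and the CPU exposes a last-level-cache miss event; the event name below is a placeholder, since real names are CPU-specific.]

#!/bin/sh
# Count cache misses for one run of a (hypothetical) cache-bound program.
kldload hwpmc 2>/dev/null            # no-op if already loaded
# "LLC_MISSES" is a placeholder event name; substitute one from
# the output of: pmccontrol -L
pmcstat -p LLC_MISSES ./my-hpc-app   # ./my-hpc-app is hypothetical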
Re: SCHED_ULE should not be the default
on 19/12/2011 17:50 Nathan Whitehorn said the following:
> The thing I've seen is that ULE is substantially more enthusiastic about migrating processes between cores than 4BSD.

Hmm, this seems to be contrary to my theoretical expectations. I thought that with 4BSD all threads that were not in one of the following categories:
- temporarily pinned
- bound to a cpu in the kernel via sched_bind
- belonging to a cpu set which is a strict subset of the total set
were placed onto a common queue that was shared by all cpus, and as such I expected them to get picked up by the cpus semi-randomly. In other words, I thought that it was ULE that took into account cpu/cache affinities while 4BSD was deliberately entirely ignorant of those details.

--
Andriy Gapon
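[For reference, the third category above is the one users can create from the command line: cpuset(1) restricts a process to a strict subset of CPUs, which takes it off the shared queue's run-anywhere path even under 4BSD. A small illustration; the pinned command and the pid are arbitrary.]

# Restrict a build to CPUs 0 and 1 only, a strict subset of the machine,
# so the scheduler may not run it anywhere else.
cpuset -l 0-1 make -j2 buildworld

# Inspect which CPUs an existing process is allowed to use.
cpuset -g -p 1234    # 1234 is a placeholder pid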
Re: SCHED_ULE should not be the default
On Mon Dec 19 11, Nathan Whitehorn wrote:
> [...]
> The thing I've seen is that ULE is substantially more enthusiastic about migrating processes between cores than 4BSD. Often, this is a good thing, but can increase the rate of cache misses, hurting performance for cache-bound processes (I see this particularly in HPC-type scientific workloads). It might be interesting to add some kind of tunable here.

does r228718 have any impact regarding this behaviour?

cheers.
alex
Re: SCHED_ULE should not be the default
On Sun Dec 18 11, Andrey Chernov wrote:
> On Sun, Dec 18, 2011 at 05:51:47PM +1100, Ian Smith wrote:
> > [...] As a reproducible data point, running 'dd if=/dev/random of=/dev/null' in one konsole, specifically to heat the CPU while testing my manual fan control script, hogs it up pretty much [...] Switching back to the first konsole (on another desktop) to kill the dd can also take a couple/few seconds.
> This issue is not about a slow machine under load: the same slow machine under exactly the same load, but with SCHED_4BSD, is very fast to respond interactively. I think we should not confuse interactivity with speed. I see no big speed (i.e. compilation time) differences when switching schedulers, but I do see a big _interactivity_ difference. ULE in general tends to underestimate interactive processes in favour of background ones. That perhaps helps compilation, but it feels like a slowpoke OS from the interactive user experience.

+1

i've also experienced issues with ULE and performed several tests to compare it to the historical 4BSD scheduler. the difference between the two does *not* seem to be speed (at least not a huge difference), but interactivity. one of the tests i performed was the following:

ttyv0: untar a *huge* (+10G) archive
ttyv1: after ~30 seconds of untarring, do 'ls -la $directory', where $directory contains a lot of files. i used $directory = /var/db/portsnap, because that directory contains 23117 files on my machine.

measuring 'ls -la $directory' via time(1) revealed that SCHED_ULE takes 15 seconds, whereas SCHED_4BSD only takes ~3-5 seconds.

i think the issue is io. io operations usually get a high priority, because statistics have shown that - unlike computational tasks - io-intensive tasks only run for a small fraction of time and then exit: read data - change data - write back data. so SCHED_ULE might take these statistics too literally and give tasks like bsdtar(1) (in my case) too many resources, so other tasks which require io are struggling to get some resources assigned to them (ls(1) in my case).

of course SCHED_4BSD isn't perfect, too. try using it and run the stress2 testsuite. your whole system will grind to a halt. mouse input drops below 1 Hz. even after killing all the stress2 tests, it takes a few minutes before the system becomes snappy again.

cheers.
alex
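[alex's test is easy to script, so others can reproduce it under both schedulers. A minimal sketch, assuming a large archive at ~/big.tar and using /var/db/portsnap/files as the file-heavy directory, per the correction in the next message.]

#!/bin/sh
# Interactivity probe: start a huge untar, let the io load build up,
# then time a directory listing that has to compete with it.
mkdir -p /tmp/untar-test
tar xf ~/big.tar -C /tmp/untar-test &    # ~/big.tar is a placeholder
sleep 30
time ls -la /var/db/portsnap/files > /dev/null
wait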
Re: SCHED_ULE should not be the default
On Sun Dec 18 11, Alexander Best wrote:
> [...] i used $directory = /var/db/portsnap, because

s/portsnap/portsnap\/files/

> that directory contains 23117 files on my machine. [...]
Re: SCHED_ULE should not be the default
On 18/12/2011 10:34, Adrian Chadd wrote:
> I applaud reppie for trying to make it as easy as possible for people to use KTR to provide scheduler traces for him to go digging with, so please, if you have these issues and you can absolutely reproduce them, please follow his instructions and work with him to get him what he needs.

Who's 'reppie'?

--
Bruce Cran
Re: SCHED_ULE should not be the default
On 12/18/11 03:37, Bruce Cran wrote:
> On 13/12/2011 09:00, Andrey Chernov wrote:
> > I observe ULE interactivity slowness even on a single core machine (Pentium 4) in very visible places, like 'ps ax' output getting stuck in the middle for ~1 second. When I switch back to SCHED_4BSD, all slowness is gone.
> I'm also seeing problems with ULE on a dual-socket quad-core Xeon machine with 16 logical CPUs. If I run tar xf somefile.tar and make -j16 buildworld then logging into another console can take several seconds. Sometimes even the Password: prompt can take a couple of seconds to appear after typing my username.

I reported several problems ages ago using SCHED_ULE on FreeBSD 8/9 when doing heavy I/O, either disk or network bound (at that time I noticed the problem on servers doing heavy disk or net I/O). It was suspected that X could be the problem, but we also have a Dell PowerEdge 1950III running FreeBSD 8.2-STABLE (by next week 9.0-RC[2/3]/STABLE) without X - the same problems, just not as prominent as with X. The box has 8 cores, 4 per socket, 16 GB RAM, a SAS 6/iR controller and two PCI-X attached Broadcom NetXtreme NICs, so the hardware shouldn't be any kind of trouble. But at that time (over the past two years now), the problem was considered a personal problem. Bah!

By the beginning of next year my working group expects new hardware. Since we use Linux for scientific work (due to OpenCL and CUDA on TESLA cards), I can't use the Blade system. One of the boxes I expect is a Dell Precision T7500, 96 GB RAM, two sockets with a Westmere XEON in each, for a total of 12 cores/24 threads. I'll start with a dual-OS installation of FreeBSD 10 and the most recent Suse (since the development for the C2075 TESLA board is mostly done by my colleagues on Suse, I need Suse Linux). I will then be capable of performing some benchmarks with both OSes on the very same hardware.

The other box will be my desk's box, a brand new Sandy Bridge-E CPU (i7-3960X) with 32 GB RAM. I'm also inclined to install a dual-boot box (I rejected this up to now since I do not like installing GRUB2 for multiboot when using GPT on FreeBSD). The box will run FreeBSD 9 and either Ubuntu or Gentoo Linux; I'm unsure on the question of which Linux, but I tend towards Gentoo, to compile everything myself. On this box I can also perform benchmarks with several setups.

I look forward to getting some help and/or tips to verify the issues we discussed here.

Oliver
Re: SCHED_ULE should not be the default
Hi,

What Attilio and others need are KTR traces of the most stripped-down example of interactivity-busting workload you can find.

Eg: if you're doing 32 concurrent buildworlds and trying to test interactivity - fine, but that's going to result in a lot of KTR stuff. If you can reproduce it using a dd via /dev/null and /dev/random (like another poster did) with nothing else running, then even better. If you can do it without X running, even better.

I honestly suggest ignoring benchmarks for now and concentrating on interactivity.

Adrian
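[For anyone unsure how to produce such a trace, the usual recipe is sketched below; it assumes a custom kernel and the stock schedgraph.py from the source tree, and the entry count is only a suggestion.]

# 1. Build and boot a kernel with scheduler tracing compiled in:
#      options KTR
#      options KTR_ENTRIES=262144
#      options KTR_COMPILE=(KTR_SCHED)
#      options KTR_MASK=(KTR_SCHED)
# 2. Reproduce the stall, then freeze and dump the trace buffer:
sysctl debug.ktr.mask=0
ktrdump -e /boot/kernel/kernel -m /dev/mem -ct > ktr.out
# 3. Visualize the trace:
python /usr/src/tools/sched/schedgraph.py ktr.out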
Re: SCHED_ULE should not be the default
On Wed, 14 Dec 2011, Ivan Klymenko wrote:
> On Wed, 14 Dec 2011 00:04:42 +0100, Jilles Tjoelker jil...@stack.nl wrote:
> > On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote:
> > > If the algorithm ULE does not contain problems - it means the problem is with the Core2Duo, or in a piece of code that uses the ULE scheduler. I already wrote in a mailing list that specifically in my case (Core2Duo) the following patch partially helps:
> > > --- sched_ule.c.orig    2011-11-24 18:11:48.000000000 +0200
> > > +++ sched_ule.c 2011-12-10 22:47:08.000000000 +0200
> > > ...
> > > @@ -2118,13 +2119,21 @@
> > >  	struct td_sched *ts;
> > >  	THREAD_LOCK_ASSERT(td, MA_OWNED);
> > > +	if (td->td_pri_class & PRI_FIFO_BIT)
> > > +		return;
> > > +
> > >  	ts = td->td_sched;
> > > +	/*
> > > +	 * We used up one time slice.
> > > +	 */
> > > +	if (--ts->ts_slice > 0)
> > > +		return;
> > This skips most of the periodic functionality (long term load balancer, saving switch count (?), insert index (?), interactivity score update for long running thread) if the thread is not going to be rescheduled right now. It looks wrong, but it is a data point if it helps your workload.
> Yes, I did it to delay for as long as possible the execution of the code in this section:

I don't understand what you are doing here, but I recently noticed that the timeslicing in SCHED_4BSD is completely broken. This bug may be a feature.

SCHED_4BSD doesn't have its own timeslice counter like ts_slice above. It uses `switchticks' instead. But switchticks hasn't been usable for this purpose since long before SCHED_4BSD started using it for this purpose. switchticks is reset on every context switch, so it is useless for almost all purposes -- any interrupt activity on a non-fast interrupt clobbers it.

Removing the check of ts_slice in the above and always returning might give a similar bug to the SCHED_4BSD one. I noticed this while looking for bugs in realtime scheduling. In the above, returning early for PRI_FIFO_BIT also skips most of the periodic functionality. In SCHED_4BSD, returning early is the usual case, so the PRI_FIFO_BIT might as well not be checked, and it is the unusual fifo scheduling case (which is supposed to only apply to realtime priority threads) which has a chance of working as intended, while the usual roundrobin case degenerates to an impure form of fifo scheduling (it is impure since priority decay still works, so it is only fifo among threads of the same priority).

> > > ...
> > > @@ -2144,9 +2153,6 @@
> > >  		if (TAILQ_EMPTY(&tdq->tdq_timeshare.rq_queues[tdq->tdq_ridx]))
> > >  			tdq->tdq_ridx = tdq->tdq_idx;
> > >  	}
> > > -	ts = td->td_sched;
> > > -	if (td->td_pri_class & PRI_FIFO_BIT)
> > > -		return;
> > >  	if (PRI_BASE(td->td_pri_class) == PRI_TIMESHARE) {
> > >  		/*
> > >  		 * We used a tick; charge it to the thread so
> > > @@ -2157,11 +2163,6 @@
> > >  		sched_priority(td);
> > >  	}
> > >  	/*
> > > -	 * We used up one time slice.
> > > -	 */
> > > -	if (--ts->ts_slice > 0)
> > > -		return;
> > > -	/*
> > >  	 * We're out of time, force a requeue at userret().
> > >  	 */
> > >  	ts->ts_slice = sched_slice;

With the ts_slice check here before you moved it, removing it might give buggy behaviour closer to SCHED_4BSD.

> and refusal to use options FULL_PREEMPTION

4-5 years ago, I found that any form of PREEMPTION was a pessimization for at least makeworld (since it caused too many context switches). PREEMPTION was needed for the !SMP case, at least partly because of the broken switchticks (switchticks, when it works, gives voluntary yielding by some CPU hogs in the kernel; PREEMPTION, if it works, should do this better). So I used PREEMPTION in the !SMP case and not for the SMP case. I didn't worry about the CPU hogs in the SMP case, since it is rare to have more than 1 of them and 1 will use at most 1/2 of a multi-CPU system.

> But no one has replied to my letter to say whether my patch helps or not in the case of Core2Duo... There is a suspicion that the problems stem from the sections of code associated with SMP... Maybe I'm wrong about something, but I want to help in solving this problem ...

The main point of SCHED_ULE is to give better affinity for multi-CPU systems. But the `multi' apparently needs to be strictly more than 2 for it to break even.

Bruce
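[For readers who want to look at the slice machinery Bruce is describing without patching anything, ULE exports its knobs as sysctls; the names below are as of 8.x/9.x and may vary between versions.]

# List every scheduler tunable the running kernel exposes.
sysctl kern.sched
# The ones most relevant to this thread:
sysctl kern.sched.slice           # length of a time slice
sysctl kern.sched.preempt_thresh  # priority threshold for preemption
sysctl kern.sched.steal_thresh    # run-queue depth before idle CPUs steal work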
Re: SCHED_ULE should not be the default
On Thu, Dec 15, 2011 at 05:26:27PM +0100, Attilio Rao wrote:
> 2011/12/13 Jeremy Chadwick free...@jdc.parodius.com:
> > [...]
> Hi Jeremy,
> thanks for the time you spent on this. However, I wanted to ask/let you note 3 things:
> 1) Did you use 2 different code bases for the test? (one updated on December 1 and another one on December 12)

No; src-all (/usr/src on this system) was not updated between December 1st and December 12th PST. I do believe I updated it today (15th PST). I can/will obviously hold off so that we have a consistent code base for comparing numbers between schedulers during buildworld and/or buildkernel.

> 2) Please note that you should have repeated this test several times (basically until you get a standard deviation which is acceptable with ministat) and report the ministat output

This is the first time I have heard of ministat(1). I'm pretty sure I see what it's for and how it applies to this situation, but boy, that man page could use some clarification (I have 3 people looking at this thing right now trying to figure out what means what in the graph :-) ). Anyway, graph or not, I see the point.

Regarding multiple tests: yup, you're absolutely right; the only way to do it would be to run a sequence of tests repeatedly (probably 10 per scheduler). Reboots and rm -fr /usr/obj/* would be required after each test too, to guarantee empty kernel caches (of all types) consistently every time. What I posted was supposed to give people just a general idea whether there was any gigantic difference between the two, and there really isn't. But, as others have stated (and you below), buildworld may not be an effective way to benchmark what we're trying to test. Hence me wondering exactly what would make for a good test. Example:

1. Run + background some program that beats on things (I really don't know what; creation/deletion of threads? CPU benchmark? bonnie++?), with output going to /dev/null.
2. Run + background "time make -j2 buildworld" with output going to /dev/null.
3. Record/save output from time.
4. rm -fr /usr/obj && shutdown -r now
5. Repeat all steps ~10 times.
6. Adjust kernel configuration file to use the other scheduler.
7. Repeat steps 1-5.

What I'm trying to figure out is what #1 and #2 should be in the above example.

> 3) The difference is less than 2%, which I suspect is really statistically unuseful/the same
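[For anyone else meeting ministat(1) for the first time, here is a minimal sketch of how it would be used for this comparison; the file names and timings are made up.]

# One wall-clock time (seconds) per line, one file per scheduler.
cat > ule.txt <<EOF
1126.20
1131.48
1128.90
EOF
cat > 4bsd.txt <<EOF
1032.02
1029.77
1035.11
EOF
# ministat plots both datasets and reports whether the difference is
# statistically significant at the requested confidence level.
ministat -c 95 ule.txt 4bsd.txt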
Re: SCHED_ULE should not be the default
On Sun, 18 Dec 2011 02:37:52 +0000, Bruce Cran wrote:
> On 13/12/2011 09:00, Andrey Chernov wrote:
> > I observe ULE interactivity slowness even on a single core machine (Pentium 4) in very visible places, like 'ps ax' output getting stuck in the middle for ~1 second. When I switch back to SCHED_4BSD, all slowness is gone.
> I'm also seeing problems with ULE on a dual-socket quad-core Xeon machine with 16 logical CPUs. If I run tar xf somefile.tar and make -j16 buildworld then logging into another console can take several seconds. Sometimes even the Password: prompt can take a couple of seconds to appear after typing my username.

I'd resigned myself to expecting this sort of behaviour as 'normal' on my single core 1133MHz PIII-M. As a reproducible data point, running 'dd if=/dev/random of=/dev/null' in one konsole, specifically to heat the CPU while testing my manual fan control script, hogs it up pretty much, while regularly running the script below in another konsole to check values - which often gets stuck half way, occasionally pausing _twice_ before finishing. Switching back to the first konsole (on another desktop) to kill the dd can also take a couple/few seconds.

t23# cat /root/bin/t23stat
#!/bin/sh
echo -n `date`
sysctl dev.cpu.0.freq dev.cpu.0.cx_usage
sysctl dev.acpi_ibm | egrep 'fan_|thermal'
sysctl hw.acpi.thermal.tz0.temperature
acpiconf -i0 | egrep 'State|Remain|Present|Volt'

Sure it's a slow machine, but it normally runs pretty smoothly. Anything with a bit of disk i/o, like buildworld, runs smooth as. This is on 8.2-R GENERIC, HZ=1000, 768MB with lots free, no swap in use. I'll definitely be trying SCHED_4BSD after updating to 8-stable unless a 'miracle cure' appears beforehand.

cheers, Ian
Re: SCHED_ULE should not be the default
The trouble is that there's lots of anecdotal evidence, but no one's really gone digging deep into _their_ example of why it's broken. The developers who know this stuff don't see anything wrong. That hints to me it may be something a little more creepy - as an example, the interplay between netisr/swi/taskqueue/callbacks and such. It may be that something is being starved that isn't obviously obvious.

It's just a stab in the dark, but it sounds somewhat plausible based on what I've seen ULE do in my network throughput hacking.

I applaud reppie for trying to make it as easy as possible for people to use KTR to provide scheduler traces for him to go digging with, so please, if you have these issues and you can absolutely reproduce them, please follow his instructions and work with him to get him what he needs.

Adrian
(wow, lots of personal pronouns packed into one sentence. It must be sleep time.)
Re: SCHED_ULE should not be the default
On 13/12/2011 09:00, Andrey Chernov wrote:
> I observe ULE interactivity slowness even on a single core machine (Pentium 4) in very visible places, like 'ps ax' output getting stuck in the middle for ~1 second. When I switch back to SCHED_4BSD, all slowness is gone.

I'm also seeing problems with ULE on a dual-socket quad-core Xeon machine with 16 logical CPUs. If I run tar xf somefile.tar and make -j16 buildworld then logging into another console can take several seconds. Sometimes even the Password: prompt can take a couple of seconds to appear after typing my username.

--
Bruce Cran
Re: SCHED_ULE should not be the default
On Sun, Dec 18, 2011 at 05:51:47PM +1100, Ian Smith wrote:
> On Sun, 18 Dec 2011 02:37:52 +0000, Bruce Cran wrote:
> > On 13/12/2011 09:00, Andrey Chernov wrote:
> > > I observe ULE interactivity slowness even on a single core machine (Pentium 4) in very visible places, like 'ps ax' output getting stuck in the middle for ~1 second. When I switch back to SCHED_4BSD, all slowness is gone.
> > I'm also seeing problems with ULE on a dual-socket quad-core Xeon machine with 16 logical CPUs. If I run tar xf somefile.tar and make -j16 buildworld then logging into another console can take several seconds. Sometimes even the Password: prompt can take a couple of seconds to appear after typing my username.
> I'd resigned myself to expecting this sort of behaviour as 'normal' on my single core 1133MHz PIII-M. As a reproducible data point, running 'dd if=/dev/random of=/dev/null' in one konsole, specifically to heat the CPU while testing my manual fan control script, hogs it up pretty much, while regularly running the script below in another konsole to check values - which often gets stuck half way, occasionally pausing _twice_ before finishing. Switching back to the first konsole (on another desktop) to kill the dd can also take a couple/few seconds.

This issue is not about a slow machine under load: the same slow machine under exactly the same load, but with SCHED_4BSD, is very fast to respond interactively. I think we should not confuse interactivity with speed. I see no big speed (i.e. compilation time) differences when switching schedulers, but I do see a big _interactivity_ difference. ULE in general tends to underestimate interactive processes in favour of background ones. That perhaps helps compilation, but it feels like a slowpoke OS from the interactive user experience.

--
http://ache.vniz.net/
Re: SCHED_ULE should not be the default
On Thu, Dec 15, 2011 at 9:58 PM, Mike Tancsa m...@sentex.net wrote:
> On 12/15/2011 11:56 AM, Attilio Rao wrote:
> > So, as very first thing, can you try the following: [...]
> Results and data at http://www.tancsa.com/ule-bsd.html
> ---Mike

I took the liberty of re-plotting this as one boxplot per test type, in the hope of getting a better overview. R script included. Beware the y-ranges. (To re-plot with a specific y range, add e.g. ylim=c(0,35) to the boxplot() calls.)

http://nebdal.net/sched/plot.html

--
Daniel Nebdal
Dep. of genetics, Oslo University Hospital
Re: SCHED_ULE should not be the default
2011/12/14 Mike Tancsa m...@sentex.net:
> On 12/13/2011 7:01 PM, m...@freebsd.org wrote:
> > Has anyone experiencing problems tried to set sysctl kern.sched.steal_thresh=1 ? [...]
> FWIW, this does impact the performance of pbzip2 on an i7. Using a 1.1G file, pbzip2 -v -c big > /dev/null with burnP6 running in the background, sysctl kern.sched.steal_thresh=1 vs sysctl kern.sched.steal_thresh=3:
> [...]
> a value of 1 is *slightly* faster.

Hi Mike,
was that just the same codebase with the switch SCHED_4BSD/SCHED_ULE?

Also, the results here should be in the 3% interval for the avg case, which is not yet at the 'alarm level' but could still be an indication. I still suspect I/O plays a big role here, however, thus it could be determined by other factors.

Could you retry the bench checking CPU usage and possible thread migration around for both cases?

Thanks,
Attilio

--
Peace can only be achieved by understanding - A. Einstein
Re: SCHED_ULE should not be the default
2011/12/13 Jeremy Chadwick free...@jdc.parodius.com:
> On Mon, Dec 12, 2011 at 02:47:57PM +0100, O. Hartmann wrote:
> > > Not fully right, boinc defaults to run on idprio 31 so this isn't an issue. And yes, there are cases where SCHED_ULE shows much better performance than SCHED_4BSD. [...]
> > Do we have any proof at hand for such cases where SCHED_ULE performs much better than SCHED_4BSD? Whenever the subject comes up, it is mentioned that SCHED_ULE has better performance on boxes with ncpu > 2. But in the end I see contradictory statements here. People complain about poor performance (especially in scientific environments), and others counter that this is not the case. Within our department, we developed a highly scalable code for planetary science purposes on imagery. It utilizes present GPUs via OpenCL if present. Otherwise it grabs as many cores as it can. By the end of this year I'll get a new desktop box based on Intel's new Sandy Bridge-E architecture with plenty of memory. If the colleague who developed the code is willing to perform some benchmarks on the same hardware platform, we'll benchmark both FreeBSD 9.0/10.0 and the most recent Suse. For FreeBSD I intend also to look at performance with both schedulers available.
> This is in no way shape or form the same kind of benchmark as what you're planning to do, but I thought I'd throw it out there for folks to take in as they see fit. I know folks were focused mainly on buildworld. I personally would find it interesting if someone with a higher-end system (e.g. 2 physical CPUs, with 6 or 8 cores per CPU) was to do the same test (changing -jX to -j{numofcores} of course).
>
> sched_ule
> =========
> - time make -j2 buildworld
>   1689.831u 229.328s 18:46.20 170.4% 6566+2051k 432+4264io 4565pf+0w
> - time make -j2 buildkernel
>   640.542u 87.737s 9:01.38 134.5% 6490+1920k 134+5968io 0pf+0w
>
> sched_4bsd
> ==========
> - time make -j2 buildworld
>   1662.793u 206.908s 17:12.02 181.1% 6578+2054k 23750+4271io 6451pf+0w
> - time make -j2 buildkernel
>   638.717u 76.146s 8:34.90 138.8% 6530+1927k 6415+5903io 0pf+0w
>
> software
> ========
> * sched_ule test:  FreeBSD 8.2-STABLE, Thu Dec  1 04:37:29 PST 2011
> * sched_4bsd test: FreeBSD 8.2-STABLE, Mon Dec 12 22:42:54 PST 2011

Hi Jeremy,
thanks for the time you spent on this. However, I wanted to ask/let you note 3 things:
1) Did you use 2 different code bases for the test? (one updated on December 1 and another one on December 12)
2) Please note that you should have repeated this test several times (basically until you get a standard deviation which is acceptable with ministat) and report the ministat output
3) The difference is less than 2%, which I suspect is really statistically unuseful/the same

I'm not really even surprised ULE is not faster than 4BSD in this case, because buildworld/buildkernel tests are usually driven for the vast majority by I/O overhead rather than scheduler capacity. It would be more interesting to analyze how buildworld does while another type of workload is going on.

Thanks,
Attilio

--
Peace can only be achieved by understanding - A. Einstein
Re: SCHED_ULE should not be the default
On 12/15/2011 11:26 AM, Attilio Rao wrote:
> Hi Mike,
> was that just the same codebase with the switch SCHED_4BSD/SCHED_ULE?

Hi Attilio,
It was the same codebase.

> Could you retry the bench checking CPU usage and possible thread migration around for both cases?

I can, but how do I do that?

	---Mike

--
---
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/
Re: SCHED_ULE should not be the default
2011/12/15 Mike Tancsa m...@sentex.net:
> On 12/15/2011 11:26 AM, Attilio Rao wrote:
> > Hi Mike,
> > was that just the same codebase with the switch SCHED_4BSD/SCHED_ULE?
> Hi Attilio,
> It was the same codebase.
> > Could you retry the bench checking CPU usage and possible thread migration around for both cases?
> I can, but how do I do that?

I'm thinking now of a better test case for this: can you try that on a tmpfs volume? Also, what filesystem were you using? How many CPUs were in place? Did you reboot before moving the steal_thresh value?

Attilio

--
Peace can only be achieved by understanding - A. Einstein
Re: SCHED_ULE should not be the default
On 12/15/2011 11:42 AM, Attilio Rao wrote:
> I'm thinking now of a better test case for this: can you try that on a tmpfs volume?

There is enough RAM in the box so that it should not touch the disk, and I was sending the output to /dev/null, so it was not writing to the disk.

> Also, what filesystem were you using?

UFS

> How many CPUs were in place?

4

> Did you reboot before moving the steal_thresh value?

No.

	---Mike

--
---
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/
Re: SCHED_ULE should not be the default
2011/12/15 Mike Tancsa m...@sentex.net:
> On 12/15/2011 11:42 AM, Attilio Rao wrote:
> > I'm thinking now of a better test case for this: can you try that on a tmpfs volume?
> There is enough RAM in the box so that it should not touch the disk, and I was sending the output to /dev/null, so it was not writing to the disk.
> > Also, what filesystem were you using?
> UFS
> > How many CPUs were in place?
> 4
> > Did you reboot before moving the steal_thresh value?
> No.

So, as very first thing, can you try the following:
- Same codebase, etc. etc.
- Make the test 4 times, discard the first and ministat the other 3
- Reboot
- Change the steal_thresh value
- Make the test 4 times, discard the first and ministat the other 3

Then report the discarded values and the ministated ones, and we will have more information, I guess (also, I don't think devfs contention should play a role here, thus nevermind about it for now).

Thanks,
Attilio

--
Peace can only be achieved by understanding - A. Einstein
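[A sketch of one phase of that protocol as a script, assuming the benchmark is the pbzip2 run from earlier in the thread; "big" is a placeholder input file, and the reboot plus the steal_thresh change between phases still happen by hand.]

#!/bin/sh
# Four runs at the current steal_thresh value; discard run 1 and
# feed the times from runs 2-4 to ministat(1).
sysctl kern.sched.steal_thresh
for run in 1 2 3 4; do
    /usr/bin/time -o run-$run.txt pbzip2 -c big > /dev/null
done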
Re: SCHED_ULE should not be the default
On 12/15/11 15:20, Steven Hartland wrote:
> With all the discussion I thought I'd give a buildworld benchmark a go here on a spare 24 core machine. ULE tested fine, but with 4BSD it won't even boot, panicking with the following:
> http://screensnapr.com/v/hwysGV.png
> This is on a clean 8.2-RELEASE-p4. Upgrading to RELENG_9 fixed this, but it's a bit concerning that just changing the scheduler would cause the machine to panic on boot.
> It's only a single run so variance could be high, but here's the result of a buildworld on this machine running the two different schedulers:
> 4BSD: 24m54.10s real 2h43m12.42s user 56m20.07s sys
> ULE:  23m54.68s real 2h34m59.04s user 50m59.91s sys
> What really sticks out is that this is over double that of an 8.2 buildworld on the same machine with the same kernel:
> ULE:  11m12.76s real 1h27m59.39s user 28m59.57s sys
> This was run on a 9.0-PRERELEASE kernel due to 4BSD panicking on boot under 8.2. So for this use ULE vs 4BSD is neither here nor there, but 9.0 buildworld being very slow (x2 slower) compared with 8.2 is the bigger question in my mind.
> Regards
> Steve

All of our 8.2-STABLE boxes with ncpu >= 4 compile the OS in half the time a compilation of FreeBSD 9/10 needs. I guess this is due to the huge LLVM contribution which is now part of the source tree. Even if you allow for building the whole LLVM suite (and not just the pieces of it that FreeBSD builds by default for CLANG purposes), it takes another 10 to 20 minutes, depending on the architecture of the underlying host. Taking the time of a kernel or world build and then presenting the inverse of that number isn't a good benchmark, in my opinion. I therefore prefer artificial benchmarks: have a set of programs that can be compiled, and take the time, if compilation time is what matters.

Well, your one-shot test shows that there is indeed a marginal advantage for SCHED_ULE, if the number of cores is big enough (said to be n > 2 in this thread). But I'm a bit disappointed by the very small advantage on that 24 core hog.

Oliver
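[Oliver's LLVM hypothesis is testable: the 9.x source tree grew in-tree clang/LLVM, and src.conf(5) has a knob to exclude it. A hedged sketch; verify the knob name against src.conf(5) for your branch.]

# Time a world build without the in-tree clang/LLVM bits to see how
# much of the 8.2 -> 9.0 buildworld slowdown they account for.
echo 'WITHOUT_CLANG=yes' >> /etc/src.conf
cd /usr/src && rm -rf /usr/obj/* && time make -j24 buildworld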
Re: SCHED_ULE should not be the default
On 12/15/2011 11:56 AM, Attilio Rao wrote:
> So, as very first thing, can you try the following:
> - Same codebase, etc. etc.
> - Make the test 4 times, discard the first and ministat the other 3
> - Reboot
> - Change the steal_thresh value
> - Make the test 4 times, discard the first and ministat the other 3
> Then report the discarded values and the ministated ones, and we will have more information, I guess (also, I don't think devfs contention should play a role here, thus nevermind about it for now).

Results and data at http://www.tancsa.com/ule-bsd.html

	---Mike

--
---
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/
Re: SCHED_ULE should not be the default
2011/12/15 Mike Tancsa m...@sentex.net:
> On 12/15/2011 11:56 AM, Attilio Rao wrote:
> > So, as very first thing, can you try the following: [...]
> Results and data at http://www.tancsa.com/ule-bsd.html

I'm not totally sure: what does burnP6 do? Is it a CPU-bound workload? Also, how many threads are spawned in your case for parallel bzip2?

Also, it would be very good if you could arrange these tests against newer -CURRENT (with userland and kerneland debugging off).

Thanks a lot for your hard work,
Attilio

--
Peace can only be achieved by understanding - A. Einstein
Re: SCHED_ULE should not be the default
On 12/13/2011 7:01 PM, m...@freebsd.org wrote:
> Has anyone experiencing problems tried to set sysctl kern.sched.steal_thresh=1 ?
> I don't remember what our specific problem at $WORK was, perhaps it was just interrupt threads not getting serviced fast enough, but we've hard-coded this to 1 and removed the code that sets it in sched_initticks(). The same effect should be had by setting the sysctl after a box is up.

FWIW, this does impact the performance of pbzip2 on an i7. Using a 1.1G file, pbzip2 -v -c big > /dev/null with burnP6 running in the background, sysctl kern.sched.steal_thresh=1 vs sysctl kern.sched.steal_thresh=3:

    N           Min           Max        Median           Avg        Stddev
x  10     38.005022      38.42238     38.194648     38.165052    0.15546188
+   9     38.695417     40.595544     39.392127     39.435384    0.59814114
Difference at 95.0% confidence
        1.27033 +/- 0.412636
        3.32852% +/- 1.08119%
        (Student's t, pooled s = 0.425627)

a value of 1 is *slightly* faster.

--
---
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/
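[A sketch of Mike's comparison for anyone wanting to repeat it; burnP6 comes from the sysutils/cpuburn port, and "big" stands in for any ~1 GB file.]

#!/bin/sh
# Compare pbzip2 wall time under two steal_thresh values, with a
# CPU hog running in the background.
burnP6 &
for thresh in 1 3; do
    sysctl kern.sched.steal_thresh=$thresh
    for run in 1 2 3 4 5 6 7 8 9 10; do
        /usr/bin/time -a -o thresh-$thresh.txt pbzip2 -c big > /dev/null
    done
done
kill %1
# Extract the real-time column from each output file and compare the
# two datasets with ministat(1).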
Re: SCHED_ULE should not be the default
On Tue, Dec 13, 2011 at 02:22:48AM -0800, Adrian Chadd wrote: On 13 December 2011 01:00, Andrey Chernov a...@freebsd.org wrote: If the ULE algorithm does not contain problems, then the problem is in the Core2Duo, or in a piece of code that uses the ULE scheduler. I observe ULE interactivity slowness even on a single-core machine (Pentium 4) in very visible places; for example, 'ps ax' output gets stuck in the middle for ~1 second. When I switch back to SCHED_4BSD, all the slowness is gone. Are you able to provide KTR traces of the scheduler results? Something that can be fed to schedgraph? Sorry, this machine is not mine anymore. I tried SCHED_ULE on a Core 2 Duo instead and don't notice this effect, but it is overall pretty fast compared to that Pentium 4. -- http://ache.vniz.net/
Re: SCHED_ULE should not be the default
On Wed, 14 Dec 2011 21:34:35 +0400 Andrey Chernov a...@freebsd.org wrote: On Tue, Dec 13, 2011 at 02:22:48AM -0800, Adrian Chadd wrote: On 13 December 2011 01:00, Andrey Chernov a...@freebsd.org wrote: If the ULE algorithm does not contain problems, then the problem is in the Core2Duo, or in a piece of code that uses the ULE scheduler. I observe ULE interactivity slowness even on a single-core machine (Pentium 4) in very visible places; for example, 'ps ax' output gets stuck in the middle for ~1 second. When I switch back to SCHED_4BSD, all the slowness is gone. Are you able to provide KTR traces of the scheduler results? Something that can be fed to schedgraph? Sorry, this machine is not mine anymore. I tried SCHED_ULE on a Core 2 Duo instead and don't notice this effect, but it is overall pretty fast compared to that Pentium 4. Please give me detailed instructions on how to do it and I'll do it ... It would be a shame if this topic once again ends in nothing but discussion ... :(
Re: SCHED_ULE should not be the default
On 12/12/2011 05:47, O. Hartmann wrote: Do we have any proof at hand for such cases where SCHED_ULE performs much better than SCHED_4BSD? I complained about poor interactive performance of ULE in a desktop environment for years. I had numerous people try to help, including Jeff, with various tunables, dtrace'ing, etc. The cause of the problem was never found. I switched to 4BSD, problem gone. This is on 2 separate systems with Core 2 Duos. hth, Doug If the ULE algorithm does not contain problems, then the problem is in the Core2Duo, or in a piece of code that uses the ULE scheduler. I already wrote to the mailing list that, specifically in my case (Core2Duo), the following patch partially helps:

--- sched_ule.c.orig	2011-11-24 18:11:48.000000000 +0200
+++ sched_ule.c	2011-12-10 22:47:08.000000000 +0200
@@ -794,7 +794,8 @@
 	 * 1.5 * balance_interval.
 	 */
 	balance_ticks = max(balance_interval / 2, 1);
-	balance_ticks += random() % balance_interval;
+//	balance_ticks += random() % balance_interval;
+	balance_ticks += ((int)random()) % balance_interval;
 	if (smp_started == 0 || rebalance == 0)
 		return;
 	tdq = TDQ_SELF();
@@ -2118,13 +2119,21 @@
 	struct td_sched *ts;
 
 	THREAD_LOCK_ASSERT(td, MA_OWNED);
+	if (td->td_pri_class & PRI_FIFO_BIT)
+		return;
+	ts = td->td_sched;
+	/*
+	 * We used up one time slice.
+	 */
+	if (--ts->ts_slice > 0)
+		return;
 	tdq = TDQ_SELF();
 #ifdef SMP
 	/*
 	 * We run the long term load balancer infrequently on the first cpu.
 	 */
-	if (balance_tdq == tdq) {
-		if (balance_ticks && --balance_ticks == 0)
+	if (balance_ticks && --balance_ticks == 0) {
+		if (balance_tdq == tdq)
 			sched_balance();
 	}
 #endif
@@ -2144,9 +2153,6 @@
 		if (TAILQ_EMPTY(&tdq->tdq_timeshare.rq_queues[tdq->tdq_ridx]))
 			tdq->tdq_ridx = tdq->tdq_idx;
 	}
-	ts = td->td_sched;
-	if (td->td_pri_class & PRI_FIFO_BIT)
-		return;
 	if (PRI_BASE(td->td_pri_class) == PRI_TIMESHARE) {
 		/*
 		 * We used a tick; charge it to the thread so
@@ -2157,11 +2163,6 @@
 		sched_priority(td);
 	}
 	/*
-	 * We used up one time slice.
-	 */
-	if (--ts->ts_slice > 0)
-		return;
-	/*
 	 * We're out of time, force a requeue at userret().
 	 */
 	ts->ts_slice = sched_slice;

and refusing to use options FULL_PREEMPTION. But no one has replied to my letter saying whether my patch helps or not in the case of a Core2Duo... There is a suspicion that the problems stem from the sections of code associated with SMP... Maybe I'm wrong about something, but I want to help in solving this problem ...
Re: SCHED_ULE should not be the default
On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote: On 12/12/2011 05:47, O. Hartmann wrote: Do we have any proof at hand for such cases where SCHED_ULE performs much better than SCHED_4BSD? I complained about poor interactive performance of ULE in a desktop environment for years. I had numerous people try to help, including Jeff, with various tunables, dtrace'ing, etc. The cause of the problem was never found. I switched to 4BSD, problem gone. This is on 2 separate systems with Core 2 Duos. hth, Doug If the ULE algorithm does not contain problems, then the problem is in the Core2Duo, or in a piece of code that uses the ULE scheduler. I observe ULE interactivity slowness even on a single-core machine (Pentium 4) in very visible places; for example, 'ps ax' output gets stuck in the middle for ~1 second. When I switch back to SCHED_4BSD, all the slowness is gone. -- http://ache.vniz.net/
Re: SCHED_ULE should not be the default
On 13 December 2011 01:00, Andrey Chernov a...@freebsd.org wrote: If the ULE algorithm does not contain problems, then the problem is in the Core2Duo, or in a piece of code that uses the ULE scheduler. I observe ULE interactivity slowness even on a single-core machine (Pentium 4) in very visible places; for example, 'ps ax' output gets stuck in the middle for ~1 second. When I switch back to SCHED_4BSD, all the slowness is gone. Are you able to provide KTR traces of the scheduler results? Something that can be fed to schedgraph? Adrian
Re: SCHED_ULE should not be the default
On Mon, Dec 12, 2011 at 02:47:57PM +0100, O. Hartmann wrote: Not fully right, boinc defaults to run on idprio 31 so this isn't an issue. And yes, there are cases where SCHED_ULE shows much better performance than SCHED_4BSD. [...] Do we have any proof at hand for such cases where SCHED_ULE performs much better than SCHED_4BSD? Whenever the subject comes up, it is mentioned that SCHED_ULE has better performance on boxes with ncpu > 2. But in the end I see contradictory statements here. People complain about poor performance (especially in scientific environments), and others counter that this is not the case. Within our department, we developed a highly scalable code for planetary science purposes on imagery. It utilizes present GPUs via OpenCL if present. Otherwise it grabs as many cores as it can. By the end of this year I'll get a new desktop box based on Intel's new Sandy Bridge-E architecture with plenty of memory. If the colleague who developed the code is willing to perform some benchmarks on the same hardware platform, we'll benchmark both FreeBSD 9.0/10.0 and the most recent Suse. For FreeBSD I intend also to look at performance with both of the different schedulers available. This is in no way, shape, or form the same kind of benchmark as what you're planning to do, but I thought I'd throw it out there for folks to take in as they see fit. I know folks were focused mainly on buildworld. I personally would find it interesting if someone with a higher-end system (e.g. 2 physical CPUs, with 6 or 8 cores per CPU) was to do the same test (changing -jX to -j{numofcores} of course). -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB |

sched_ule
===
- time make -j2 buildworld
  1689.831u 229.328s 18:46.20 170.4% 6566+2051k 432+4264io 4565pf+0w
- time make -j2 buildkernel
  640.542u 87.737s 9:01.38 134.5% 6490+1920k 134+5968io 0pf+0w

sched_4bsd
===
- time make -j2 buildworld
  1662.793u 206.908s 17:12.02 181.1% 6578+2054k 23750+4271io 6451pf+0w
- time make -j2 buildkernel
  638.717u 76.146s 8:34.90 138.8% 6530+1927k 6415+5903io 0pf+0w

software
==
* sched_ule test: FreeBSD 8.2-STABLE, Thu Dec 1 04:37:29 PST 2011
* sched_4bsd test: FreeBSD 8.2-STABLE, Mon Dec 12 22:42:54 PST 2011

hardware
==
* Intel Core 2 Duo E8400, 3GHz
* Supermicro X7SBA
* 8GB ECC RAM (4x2GB), DDR2-800
* Intel 320-series SSD, 80GB: /, swap, /var, /tmp, /usr

tuning adjustments / etc.
===
* Before each scheduler test, system was rebooted to ensure I/O cache and other whatnots were empty
* All filesystems stock UFS2 + SU (root is non-SU)
* All filesystems had tunefs -t enable applied to them
* powerd(8) in use, with two rc.conf variables (per CPU spec): performance_cx_lowest=C2 economy_cx_lowest=C2
* loader.conf: kern.maxdsiz=2560M kern.dfldsiz=2560M kern.maxssiz=256M ahci_load=yes hint.p4tcc.0.disabled=1 hint.acpi_throttle.0.disabled=1 vfs.zfs.arc_max=5120M
* make.conf: CPUTYPE?=core2
* src.conf: WITHOUT_INET6=true WITHOUT_IPFILTER=true WITHOUT_LIB32=true WITHOUT_KERBEROS=true WITHOUT_PAM_SUPPORT=true WITHOUT_PROFILE=true WITHOUT_SENDMAIL=true
* kernel configuration - note: between kernel builds, config was changed to either use SCHED_4BSD or SCHED_ULE respectively.
cpu		HAMMER
ident		GENERIC
makeoptions	DEBUG=-g		# Build kernel with gdb(1) debug symbols
options 	SCHED_4BSD		# Classic BSD scheduler
#options 	SCHED_ULE		# ULE scheduler
options 	PREEMPTION		# Enable kernel thread preemption
options 	INET			# InterNETworking
options 	FFS			# Berkeley Fast Filesystem
options 	SOFTUPDATES		# Enable FFS soft updates support
options 	UFS_ACL			# Support for access control lists
options 	UFS_DIRHASH		# Improve performance on big directories
options 	UFS_GJOURNAL		# Enable gjournal-based UFS journaling
options 	MD_ROOT			# MD is a potential root device
options 	NFSCLIENT		# Network Filesystem Client
options 	NFSSERVER		# Network Filesystem Server
options 	NFSLOCKD		# Network Lock Manager
options 	NFS_ROOT		# NFS usable as /, requires NFSCLIENT
options 	MSDOSFS			# MSDOS Filesystem
options 	CD9660			# ISO 9660 Filesystem
options 	PROCFS			# Process filesystem (requires PSEUDOFS)
options 	PSEUDOFS		# Pseudo-filesystem framework
options 	GEOM_PART_GPT		# GUID Partition Tables.
options
Re: SCHED_ULE should not be the default
On 12/12/11 16:13, Vincent Hoffman wrote: On 12/12/2011 13:47, O. Hartmann wrote: Not fully right, boinc defaults to run on idprio 31 so this isn't an issue. And yes, there are cases where SCHED_ULE shows much better performance than SCHED_4BSD. [...] Do we have any proof at hand for such cases where SCHED_ULE performs much better than SCHED_4BSD? Whenever the subject comes up, it is mentioned that SCHED_ULE has better performance on boxes with ncpu > 2. But in the end I see contradictory statements here. People complain about poor performance (especially in scientific environments), and others counter that this is not the case. It's all a little old now but some of the stuff in http://people.freebsd.org/~kris/scaling/ covers improvements that were seen. http://jeffr-tech.livejournal.com/5705.html shows a little too; reading through Jeff's blog is worth it as it has some interesting stuff on SCHED_ULE. I thought there were some more benchmarks floating around but can't find any with a quick Google. Vince Interesting, there seems to have been a much more performant scheduler in 7.0, called SCHED_SMP. I have some faint recollection of that ... where has this beast gone? Oliver
Re: SCHED_ULE should not be the default
On Tue, Dec 13, 2011 at 12:13:42PM +0100, O. Hartmann wrote: On 12/12/11 16:13, Vincent Hoffman wrote: On 12/12/2011 13:47, O. Hartmann wrote: Not fully right, boinc defaults to run on idprio 31 so this isn't an issue. And yes, there are cases where SCHED_ULE shows much better performance than SCHED_4BSD. [...] Do we have any proof at hand for such cases where SCHED_ULE performs much better than SCHED_4BSD? Whenever the subject comes up, it is mentioned that SCHED_ULE has better performance on boxes with ncpu > 2. But in the end I see contradictory statements here. People complain about poor performance (especially in scientific environments), and others counter that this is not the case. It's all a little old now but some of the stuff in http://people.freebsd.org/~kris/scaling/ covers improvements that were seen. http://jeffr-tech.livejournal.com/5705.html shows a little too; reading through Jeff's blog is worth it as it has some interesting stuff on SCHED_ULE. I thought there were some more benchmarks floating around but can't find any with a quick Google. Vince Interesting, there seems to have been a much more performant scheduler in 7.0, called SCHED_SMP. I have some faint recollection of that ... where has this beast gone? Boy, I sure hope I remember this right. I strongly urge others to correct me where I'm wrong; thanks in advance! The classic scheduler, SCHED_4BSD, was implemented back before there was oxygen. sched_4bsd(4) mentions this. No need to discuss it. Jeff Roberson began working on the first-generation ULE scheduler during the days of FreeBSD 5.x (I believe 5.1), and a paper on it was presented at USENIX circa 2003: http://www.usenix.org/event/bsdcon03/tech/full_papers/roberson/roberson.pdf Over the following years, Jeff (and others I assume -- maybe folks like George Neville-Neil and/or Kirk McKusick?) adjusted and tinkered with some of the semantics and models/methods. If I remember right, some of these quirks/fixes were committed. All of this was happening under the scheduler that was then called SCHED_ULE, but it was ULE 1.0, for lack of better terminology. This scheduler did not perform well, if I remember right, and Jeff was quite honest about that. From this point forward, Jeff began idealising and working on a scheduler which he called SCHED_SMP -- think of it as ULE 2.0, again for lack of better terminology. It was different from the existing SCHED_ULE scheduler, hence a different name. Jeff blogged about this in early 2007, using exactly that term (ULE 2.0): http://jeffr-tech.livejournal.com/3729.html In mid-2007, prior to FreeBSD 7.0-RELEASE, Jeff announced that he effectively wanted to make SCHED_ULE do what SCHED_SMP did, and provided a patch to SCHED_ULE to accomplish just that: http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-07/msg00755.html Full thread is here (beware -- many replies): http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-07/threads.html#00755 The patch mentioned above was merged into HEAD on 2007/07/19. http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/sched_ule.c#rev1.202 So in effect, as of 2007/07/19, SCHED_ULE became SCHED_SMP. FreeBSD 7.0-RELEASE was released on 2008/02/27, and the above commit/changes were available at that time as well (meaning: RELENG_7 and RELENG_7_0 at that moment in time should have included the patch from the above paragraph).
The document released by Kris Kennaway hinted at those changes and performance improvements: http://people.freebsd.org/~kris/scaling/7.0%20Preview.pdf Keep in mind, however, that at that time kernel configuration files (GENERIC, etc.) still defaulted to SCHED_4BSD. The default scheduler in kernel config files (GENERIC, etc.) for i386 and amd64 (not sure about others) was changed on 2007/10/19: http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/i386/conf/GENERIC#rev1.475 http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/amd64/conf/GENERIC#rev1.485 This was done *prior* to FreeBSD 7.1-RELEASE. So, it first became available as the default scheduler for the masses when 7.1-RELEASE came out on 2009/01/05. All of the answers, in a roundabout and non-user-friendly way, are available by examining the commit history for src/sys/kern/sched_ule.c. It's hard to follow, especially given that you have to consider all the releases/branchpoints that took place over time, but: http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/sched_ule.c Are we having fun yet? :-) -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB |
Re: SCHED_ULE should not be the default
On 12/12/11 16:51, Steve Kargl wrote: On Mon, Dec 12, 2011 at 02:47:57PM +0100, O. Hartmann wrote: Not fully right, boinc defaults to run on idprio 31 so this isn't an issue. And yes, there are cases where SCHED_ULE shows much better performance than SCHED_4BSD. [...] Do we have any proof at hand for such cases where SCHED_ULE performs much better than SCHED_4BSD? Whenever the subject comes up, it is mentioned that SCHED_ULE has better performance on boxes with ncpu > 2. But in the end I see contradictory statements here. People complain about poor performance (especially in scientific environments), and others counter that this is not the case. Within our department, we developed a highly scalable code for planetary science purposes on imagery. It utilizes present GPUs via OpenCL if present. Otherwise it grabs as many cores as it can. By the end of this year I'll get a new desktop box based on Intel's new Sandy Bridge-E architecture with plenty of memory. If the colleague who developed the code is willing to perform some benchmarks on the same hardware platform, we'll benchmark both FreeBSD 9.0/10.0 and the most recent Suse. For FreeBSD I intend also to look at performance with both of the different schedulers available. This comes up every 9 months or so, and must be approaching FAQ status. In an HPC environment, I recommend 4BSD. Depending on the workload, ULE can cause a severe increase in turnaround time when doing already long computations. If you have an MPI application, simply launching greater than ncpu+1 jobs can show the problem. Well, those recommendations should be based on WHY. As the mostly negative experiences with SCHED_ULE in highly computative workloads always get contradicted by ...but there are workloads that show the opposite ..., this should be shown by more recent benchmarks and explanations, not legacy benchmarks from years ago. And, indeed, I would highly recommend having a FAQ or a short note in tuning(7) or the Handbook in which it is mentioned to use SCHED_4BSD in HPC environments and SCHED_ULE for other workloads (which would have to be made more specific). It is not an easy task to set up an OS for a specific purpose when the tuning has to be done by crawling the mailing lists. Some notes and hints in the documentation are always valuable and highly appreciated by folks not deep into development. And by the way, I have the deep impression that most of these discussions about the poor performance of SCHED_ULE tend to always end up in a covering-up of that flaw and a consequent waste of development effort. But this is only my personal impression.
Re: SCHED_ULE should not be the default
On Tue, Dec 13, 2011 at 02:23:46PM +0100, O. Hartmann wrote: On 12/12/11 16:51, Steve Kargl wrote: [...] This comes up every 9 months or so, and must be approaching FAQ status. In an HPC environment, I recommend 4BSD. Depending on the workload, ULE can cause a severe increase in turnaround time when doing already long computations. If you have an MPI application, simply launching greater than ncpu+1 jobs can show the problem. Well, those recommendations should be based on WHY. As the mostly negative experiences with SCHED_ULE in highly computative workloads always get contradicted by ...but there are workloads that show the opposite ..., this should be shown by more recent benchmarks and explanations, not legacy benchmarks from years ago. I have given the WHY in previous discussions of ULE, based on what you call legacy benchmarks. I have not seen any commit to sched_ule.c that would lead me to believe that the performance issues with ULE and cpu-bound numerical codes have been addressed. Repeating the benchmark would be a waste of time. -- Steve
Re: SCHED_ULE should not be the default
On 12/13/2011 10:54 AM, Steve Kargl wrote: I have given the WHY in previous discussions of ULE, based on what you call legacy benchmarks. I have not seen any commit to sched_ule.c that would lead me to believe that the performance issues with ULE and cpu-bound numerical codes have been addressed. Repeating the benchmark would be a waste of time. Trying a simple pbzip2 on a large file, the results are pretty consistent through iterations. pbzip2 with 4BSD is barely faster on a file that's 322MB in size. After a reboot, I did a strings bigfile > /dev/null, then ran pbzip2 -v xaa -c > /dev/null 7 times. If I run a burnP6 in the background (from sysutils/cpuburn), they perform about the same. e.g. pbzip2 -v xaa -c > /dev/null

Parallel BZIP2 v1.1.6 - by: Jeff Gilchrist [http://compression.ca] [Oct. 30, 2011] (uses libbzip2 by Julian Seward)
Major contributions: Yavor Nikolov nikolov.javor+pbz...@gmail.com
# CPUs: 4
BWT Block Size: 900 KB
File Block Size: 900 KB
Maximum Memory: 100 MB
---
File #: 1 of 1
Input Name: xaa
Output Name: stdout
Input Size: 352404831 bytes
Compressing data...
Output Size: 50630745 bytes
---
Wall Clock: 18.139342 seconds

ULE: 18.113204 18.116896 18.123400 18.105894 18.163332 18.139342 18.082888
ULE with burnP6: 23.076085 22.003666 21.162987 21.682445 21.935568 23.595781 21.601277
4BSD: 17.983395 17.986218 18.009254 18.004312 18.001494 17.997032
4BSD with burnP6: 22.215508 21.886459 21.595179 21.361830 21.325351 21.244793

# ministat uleP6 bsdP6
x uleP6
+ bsdP6
(ministat distribution graph omitted)
    N           Min           Max        Median           Avg        Stddev
x   6     21.162987     23.595781     22.003666     22.242755    0.91175566
+   6     21.244793     22.215508     21.595179     21.604853     0.3792413
No difference proven at 95.0% confidence

x ule
+ bsd
(ministat distribution graph omitted)
    N           Min           Max        Median           Avg        Stddev
x   7     18.082888     18.163332     18.116896     18.120708   0.025468695
+   6     17.983395     18.009254     18.001494     17.996951   0.010248473
Difference at 95.0% confidence
        -0.123757 +/- 0.024538
        -0.68296% +/- 0.135414%
        (Student's t, pooled s = 0.0200388)

Hardware is an X3450 with 8G of memory, RELENG8. ---Mike -- --- Mike Tancsa, tel +1 519 651 3400 Sentex Communications, m...@sentex.net Providing Internet services since 1994 www.sentex.net Cambridge, Ontario Canada http://www.tancsa.com/
Re: SCHED_ULE should not be the default
On 12/13/2011 13:31, Malin Randstrom wrote: stop sending me spam mail ... you never stop despite me having unsubscribed several times. stop this! If you had actually unsubscribed, the mail would have stopped. :) You can see the instructions you need to follow below. ___ freebsd-sta...@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org -- [^L] Breadth of IT experience, and depth of knowledge in the DNS. Yours for the right price. :) http://SupersetSolutions.com/
Re: SCHED_ULE should not be the default
On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote: If the ULE algorithm does not contain problems, then the problem is in the Core2Duo, or in a piece of code that uses the ULE scheduler. I already wrote to the mailing list that, specifically in my case (Core2Duo), the following patch partially helps:

--- sched_ule.c.orig	2011-11-24 18:11:48.000000000 +0200
+++ sched_ule.c	2011-12-10 22:47:08.000000000 +0200
@@ -794,7 +794,8 @@
 	 * 1.5 * balance_interval.
 	 */
 	balance_ticks = max(balance_interval / 2, 1);
-	balance_ticks += random() % balance_interval;
+//	balance_ticks += random() % balance_interval;
+	balance_ticks += ((int)random()) % balance_interval;
 	if (smp_started == 0 || rebalance == 0)
 		return;
 	tdq = TDQ_SELF();

This avoids a 64-bit division on 64-bit platforms but seems to have no effect otherwise. Because this function is not called very often, the change seems unlikely to help.

@@ -2118,13 +2119,21 @@
 	struct td_sched *ts;
 
 	THREAD_LOCK_ASSERT(td, MA_OWNED);
+	if (td->td_pri_class & PRI_FIFO_BIT)
+		return;
+	ts = td->td_sched;
+	/*
+	 * We used up one time slice.
+	 */
+	if (--ts->ts_slice > 0)
+		return;

This skips most of the periodic functionality (long term load balancer, saving switch count (?), insert index (?), interactivity score update for long running thread) if the thread is not going to be rescheduled right now. It looks wrong but it is a data point if it helps your workload.

 	tdq = TDQ_SELF();
 #ifdef SMP
 	/*
 	 * We run the long term load balancer infrequently on the first cpu.
 	 */
-	if (balance_tdq == tdq) {
-		if (balance_ticks && --balance_ticks == 0)
+	if (balance_ticks && --balance_ticks == 0) {
+		if (balance_tdq == tdq)
 			sched_balance();
 	}
 #endif

The main effect of this appears to be to disable the long term load balancer completely after some time. At some point, a CPU other than the first CPU (which uses balance_tdq) will set balance_ticks = 0, and sched_balance() will never be called again. It also introduces a hypothetical race condition because the access to balance_ticks is no longer restricted to one CPU under a spinlock. If the long term load balancer may be causing trouble, try setting kern.sched.balance_interval to a higher value with unpatched code.

@@ -2144,9 +2153,6 @@
 		if (TAILQ_EMPTY(&tdq->tdq_timeshare.rq_queues[tdq->tdq_ridx]))
 			tdq->tdq_ridx = tdq->tdq_idx;
 	}
-	ts = td->td_sched;
-	if (td->td_pri_class & PRI_FIFO_BIT)
-		return;
 	if (PRI_BASE(td->td_pri_class) == PRI_TIMESHARE) {
 		/*
 		 * We used a tick; charge it to the thread so
@@ -2157,11 +2163,6 @@
 		sched_priority(td);
 	}
 	/*
-	 * We used up one time slice.
-	 */
-	if (--ts->ts_slice > 0)
-		return;
-	/*
 	 * We're out of time, force a requeue at userret().
 	 */
 	ts->ts_slice = sched_slice;

and refusing to use options FULL_PREEMPTION. But no one has replied to my letter saying whether my patch helps or not in the case of a Core2Duo... There is a suspicion that the problems stem from the sections of code associated with SMP... Maybe I'm wrong about something, but I want to help in solving this problem ... -- Jilles Tjoelker
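As a concrete form of that last suggestion, something like the following should do it on an unpatched kernel; the value 1024 is purely illustrative, not a recommendation:

    # Inspect the current long-term balancer interval, then raise it.
    sysctl kern.sched.balance_interval
    sysctl kern.sched.balance_interval=1024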
Re: SCHED_ULE should not be the default
On Mon, Dec 12, 2011 at 04:29:14PM -0800, Doug Barton wrote: On 12/12/2011 05:47, O. Hartmann wrote: Do we have any proof at hand for such cases where SCHED_ULE performs much better than SCHED_4BSD? I complained about poor interactive performance of ULE in a desktop environment for years. I had numerous people try to help, including Jeff, with various tunables, dtrace'ing, etc. The cause of the problem was never found. The issues that I've seen with ULE on the desktop seem to be caused by X taking up a steady amount of CPU, and being demoted from being an interactive process. X then becomes the bottleneck for other processes that would otherwise be interactive. Try 'renice -20 pid_of_X' and see if that makes your problems go away. Marcus
Re: SCHED_ULE should not be the default
On Wed, 14 Dec 2011 00:04:42 +0100 Jilles Tjoelker jil...@stack.nl wrote: On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote: [...] This avoids a 64-bit division on 64-bit platforms but seems to have no effect otherwise. Because this function is not called very often, the change seems unlikely to help.
Yes, this section does not apply to this problem :) I just posted the latest patch, which I am using now...
[...] This skips most of the periodic functionality (long term load balancer, saving switch count (?), insert index (?), interactivity score update for long running thread) if the thread is not going to be rescheduled right now. It looks wrong but it is a data point if it helps your workload.
Yes, I did that to delay for as long as possible the execution of the code in this section:

...
#ifdef SMP
	/*
	 * We run the long term load balancer infrequently on the first cpu.
	 */
	if (balance_tdq == tdq) {
		if (balance_ticks && --balance_ticks == 0)
			sched_balance();
	}
#endif
...

[...] The main effect of this appears to be to disable the long term load balancer completely after some time. At some point, a CPU other than the first CPU (which uses balance_tdq) will set balance_ticks = 0, and sched_balance() will never be called again.
That is, for the same reason as above in the text...
It also introduces a hypothetical race condition because the access to balance_ticks is no longer restricted to one CPU under a spinlock. If the long term load balancer may be causing trouble, try setting kern.sched.balance_interval to a higher value with unpatched code.
I checked that first of all, but it did not help fix the situation... I get the impression that the rebalancing malfunctions: it seems a thread gets handed back to the same core that is already loaded, and so on... Perhaps this is a consequence of incorrect detection of the CPU topology?
[...] and refusing to use options FULL_PREEMPTION. But no one has replied to my letter saying whether my patch helps or not in the case of a Core2Duo... There is a suspicion that the problems stem from the sections of code associated with SMP... Maybe I'm wrong about something, but I want to help in solving this problem ...
Re: SCHED_ULE should not be the default
On Tue, 13 Dec 2011 23:02:15 + Marcus Reid mar...@blazingdot.com wrote: On Mon, Dec 12, 2011 at 04:29:14PM -0800, Doug Barton wrote: [...] The issues that I've seen with ULE on the desktop seem to be caused by X taking up a steady amount of CPU, and being demoted from being an interactive process. X then becomes the bottleneck for other processes that would otherwise be interactive. Try 'renice -20 pid_of_X' and see if that makes your problems go away. Why, then, is X not a bottleneck when using 4BSD? Marcus
Re: SCHED_ULE should not be the default
On Tue, Dec 13, 2011 at 3:39 PM, Ivan Klymenko fi...@ukr.net wrote: [...] Has anyone experiencing problems tried to set sysctl kern.sched.steal_thresh=1 ? I don't remember what our specific problem at $WORK was, perhaps it was just interrupt threads not getting serviced fast enough, but we've hard-coded this to 1 and removed the code that sets it in sched_initticks(). The same effect should be had by setting the sysctl after a box is up. Thanks, matthew
Re: SCHED_ULE should not be the default
On Tue, 13 Dec 2011 16:01:56 -0800 m...@freebsd.org wrote: On Tue, Dec 13, 2011 at 3:39 PM, Ivan Klymenko fi...@ukr.net wrote: [...] Has anyone experiencing problems tried to set sysctl kern.sched.steal_thresh=1 ? In my case the kern.sched.steal_thresh variable already has the value 1. I don't remember what our specific problem at $WORK was, perhaps it was just interrupt threads not getting serviced fast enough, but we've hard-coded this to 1 and removed the code that sets it in sched_initticks(). The same effect should be had by setting the
Re: SCHED_ULE should not be the default
Not fully right, boinc defaults to run on idprio 31 so this isn't an issue. And yes, there are cases where SCHED_ULE shows much better performance than SCHED_4BSD. [...] Do we have any proof at hand for such cases where SCHED_ULE performs much better than SCHED_4BSD? Whenever the subject comes up, it is mentioned that SCHED_ULE has better performance on boxes with ncpu > 2. But in the end I see contradictory statements here. People complain about poor performance (especially in scientific environments), and others counter that this is not the case. Within our department, we developed a highly scalable code for planetary science purposes on imagery. It utilizes present GPUs via OpenCL if present. Otherwise it grabs as many cores as it can. By the end of this year I'll get a new desktop box based on Intel's new Sandy Bridge-E architecture with plenty of memory. If the colleague who developed the code is willing to perform some benchmarks on the same hardware platform, we'll benchmark both FreeBSD 9.0/10.0 and the most recent Suse. For FreeBSD I intend also to look at performance with both of the different schedulers available. O.
Re: SCHED_ULE should not be the default
On 12/12/2011 13:47, O. Hartmann wrote: Not fully right, boinc defaults to run on idprio 31 so this isn't an issue. And yes, there are cases where SCHED_ULE shows much better performance than SCHED_4BSD. [...] Do we have any proof at hand for such cases where SCHED_ULE performs much better than SCHED_4BSD? Whenever the subject comes up, it is mentioned that SCHED_ULE has better performance on boxes with ncpu > 2. But in the end I see contradictory statements here. People complain about poor performance (especially in scientific environments), and others counter that this is not the case. It's all a little old now but some of the stuff in http://people.freebsd.org/~kris/scaling/ covers improvements that were seen. http://jeffr-tech.livejournal.com/5705.html shows a little too; reading through Jeff's blog is worth it as it has some interesting stuff on SCHED_ULE. I thought there were some more benchmarks floating around but can't find any with a quick Google. Vince Within our department, we developed a highly scalable code for planetary science purposes on imagery. It utilizes present GPUs via OpenCL if present. Otherwise it grabs as many cores as it can. By the end of this year I'll get a new desktop box based on Intel's new Sandy Bridge-E architecture with plenty of memory. If the colleague who developed the code is willing to perform some benchmarks on the same hardware platform, we'll benchmark both FreeBSD 9.0/10.0 and the most recent Suse. For FreeBSD I intend also to look at performance with both of the different schedulers available. O.
Re: SCHED_ULE should not be the default
On Mon, 12 Dec 2011 15:13:00 + Vincent Hoffman vi...@unsane.co.uk wrote: On 12/12/2011 13:47, O. Hartmann wrote: Not fully right, boinc defaults to run on idprio 31 so this isn't an issue. And yes, there are cases where SCHED_ULE shows much better performance than SCHED_4BSD. [...] Do we have any proof at hand for such cases where SCHED_ULE performs much better than SCHED_4BSD? Whenever the subject comes up, it is mentioned that SCHED_ULE has better performance on boxes with ncpu > 2. But in the end I see contradictory statements here. People complain about poor performance (especially in scientific environments), and others counter that this is not the case. It's all a little old now but some of the stuff in http://people.freebsd.org/~kris/scaling/ covers improvements that were seen. http://jeffr-tech.livejournal.com/5705.html shows a little too; reading through Jeff's blog is worth it as it has some interesting stuff on SCHED_ULE. I thought there were some more benchmarks floating around but can't find any with a quick Google. Vince Within our department, we developed a highly scalable code for planetary science purposes on imagery. It utilizes present GPUs via OpenCL if present. Otherwise it grabs as many cores as it can. By the end of this year I'll get a new desktop box based on Intel's new Sandy Bridge-E architecture with plenty of memory. If the colleague who developed the code is willing to perform some benchmarks on the same hardware platform, we'll benchmark both FreeBSD 9.0/10.0 and the most recent Suse. For FreeBSD I intend also to look at performance with both of the different schedulers available. These observations are not scientific, but I have a CPU from AMD with 6 cores (AMD Phenom(tm) II X6 1090T Processor). My simple test was ``make buildkernel'' while watching the core usage with gkrellm. With SCHED_4BSD all 6 cores are loaded to 97% during the build phase. I've never seen any value above 97% with gkrellm. With SCHED_ULE I never saw all 6 cores loaded this heavily. Usually 2 or more cores were at or below 90%. Not really that significant, but still a noticeable difference in apparent scheduling behavior. Whether the observed difference is due to some change in data from the kernel to gkrellm is beyond me. -- Gary Jennejohn
Re: SCHED_ULE should not be the default
On Mon, Dec 12, 2011 at 02:47:57PM +0100, O. Hartmann wrote: Not fully right, boinc defaults to run on idprio 31 so this isn't an issue. And yes, there are cases where SCHED_ULE shows much better performance than SCHED_4BSD. [...] Do we have any proof at hand for such cases where SCHED_ULE performs much better than SCHED_4BSD? Whenever the subject comes up, it is mentioned that SCHED_ULE has better performance on boxes with ncpu > 2. But in the end I see contradictory statements here. People complain about poor performance (especially in scientific environments), and others counter that this is not the case. Within our department, we developed a highly scalable code for planetary science purposes on imagery. It utilizes present GPUs via OpenCL if present. Otherwise it grabs as many cores as it can. By the end of this year I'll get a new desktop box based on Intel's new Sandy Bridge-E architecture with plenty of memory. If the colleague who developed the code is willing to perform some benchmarks on the same hardware platform, we'll benchmark both FreeBSD 9.0/10.0 and the most recent Suse. For FreeBSD I intend also to look at performance with both of the different schedulers available. This comes up every 9 months or so, and must be approaching FAQ status. In an HPC environment, I recommend 4BSD. Depending on the workload, ULE can cause a severe increase in turnaround time when doing already long computations. If you have an MPI application, simply launching greater than ncpu+1 jobs can show the problem. PS: search the list archives for kargl and ULE. -- Steve
Re: SCHED_ULE should not be the default
On Mon, Dec 12, 2011 at 7:32 AM, Gary Jennejohn gljennj...@googlemail.com wrote: [...] These observations are not scientific, but I have a CPU from AMD with 6 cores (AMD Phenom(tm) II X6 1090T Processor). My simple test was ``make buildkernel'' while watching the core usage with gkrellm. With SCHED_4BSD all 6 cores are loaded to 97% during the build phase. I've never seen any value above 97% with gkrellm. With SCHED_ULE I never saw all 6 cores loaded this heavily. Usually 2 or more cores were at or below 90%. Not really that significant, but still a noticeable difference in apparent scheduling behavior. Whether the observed difference is due to some change in data from the kernel to gkrellm is beyond me. SCHED_ULE is much sloppier about calculating which thread used a timeslice -- unless the timeslice went 100% to a thread, the fraction it used may get attributed elsewhere. So top's reporting of thread usage is not a useful metric. Total buildworld time is, potentially. Thanks, matthew
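A sketch of that comparison; -j6 simply matches the 6-core machine described above:

    # Run once under each scheduler and compare the "real" figures.
    cd /usr/src && /usr/bin/time -p make -j6 buildworld > /dev/null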
Re: SCHED_ULE should not be the default
Did you use -jX to build the world?
From: Gary Jennejohn gljennj...@googlemail.com
Sent: Mon Dec 12 16:32:21 CET 2011
To: Vincent Hoffman vi...@unsane.co.uk
CC: O. Hartmann ohart...@mail.zedat.fu-berlin.de, Current FreeBSD freebsd-current@freebsd.org, freebsd-sta...@freebsd.org, freebsd-performa...@freebsd.org
Subject: Re: SCHED_ULE should not be the default
[...]
Re: SCHED_ULE should not be the default
Would it be possible to implement a mechanism that lets one change the scheduler on the fly? AFAIK Solaris can do that.

From: Steve Kargl s...@troutmask.apl.washington.edu
Sent: Mon Dec 12 16:51:59 MEZ 2011
To: O. Hartmann ohart...@mail.zedat.fu-berlin.de
CC: freebsd-performa...@freebsd.org, Current FreeBSD freebsd-current@freebsd.org, freebsd-sta...@freebsd.org
Subject: Re: SCHED_ULE should not be the default

On Mon, Dec 12, 2011 at 02:47:57PM +0100, O. Hartmann wrote:

[...]

This comes up every 9 months or so, and must be approaching FAQ status. In an HPC environment, I recommend 4BSD. Depending on the workload, ULE can cause a severe increase in turnaround time when doing already long computations. If you have an MPI application, simply launching greater than ncpu+1 jobs can show the problem.

PS: search the list archives for kargl and ULE.

-- Steve
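The oversubscription effect Steve describes is easy to approximate without MPI. Below is a minimal sketch, not his OpenMPI code: the worker count, loop bound, and busy-work are illustrative assumptions. It starts ncpu+1 CPU-bound threads and prints per-worker wall-clock times. Under even time-sharing all workers should finish at roughly the same moment; a scheduler that statically doubles two workers up on one core leaves those two finishing about twice as late.

/*
 * Oversubscription sketch: ncpu+1 CPU-bound workers, per-worker timing.
 * Illustrative only; the loop bound is arbitrary busy-work, not a benchmark.
 * Build: cc -O0 -o oversub oversub.c -lpthread
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

static void *worker(void *arg)
{
    int id = (int)(intptr_t)arg;
    double start = now();
    volatile double x = 0.0;
    for (long i = 0; i < 400000000L; i++)
        x += 1.0 / (i + 1.0);       /* pure CPU burn, no I/O, no locks */
    printf("worker %d finished in %.2f s\n", id, now() - start);
    return NULL;
}

int main(void)
{
    int ncpu = (int)sysconf(_SC_NPROCESSORS_ONLN);
    int n = ncpu + 1;               /* deliberately one more job than cores */
    pthread_t *t = malloc(sizeof(pthread_t) * n);

    for (int i = 0; i < n; i++)
        pthread_create(&t[i], NULL, worker, (void *)(intptr_t)i);
    for (int i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    free(t);
    return 0;
}

A bimodal spread of finish times (ncpu-1 fast, 2 slow) is the ping-pong pattern discussed later in this thread; a flat spread is what a shared-queue scheduler tends to produce.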
Re: SCHED_ULE should not be the default
On 12/12/2011 15:51, Steve Kargl wrote:

This comes up every 9 months or so, and must be approaching FAQ status. [...]

Isn't this something that can be fixed by tuning ULE? For example, for desktop applications kern.sched.preempt_thresh should be set to 224 from its default. I'm wondering if the installer should ask people what the typical use will be, and tune the scheduler appropriately.

-- Bruce Cran
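For reference, the tunable Bruce mentions can be changed at runtime with sysctl(8) (sysctl kern.sched.preempt_thresh=224), or programmatically via sysctlbyname(3). A minimal sketch follows; error handling is trimmed, and note the OID only exists on kernels built with SCHED_ULE:

/* Read and optionally set kern.sched.preempt_thresh (SCHED_ULE only). */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

int main(void)
{
    int val;
    size_t len = sizeof(val);

    if (sysctlbyname("kern.sched.preempt_thresh", &val, &len, NULL, 0) == -1) {
        perror("sysctlbyname");   /* e.g. kernel built with SCHED_4BSD */
        return 1;
    }
    printf("kern.sched.preempt_thresh = %d\n", val);

    int newval = 224;             /* the desktop value suggested above */
    if (sysctlbyname("kern.sched.preempt_thresh", NULL, NULL,
                     &newval, sizeof(newval)) == -1)
        perror("setting requires root");
    return 0;
}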
Re: SCHED_ULE should not be the default
On Mon, 12 Dec 2011 16:18:35 +0000, Bruce Cran br...@cran.org.uk wrote:

Isn't this something that can be fixed by tuning ULE? For example, for desktop applications kern.sched.preempt_thresh should be set to 224 from its default. [...]

This by and large does not help in certain situations ...
Re: SCHED_ULE should not be the default
On Monday 12 December 2011 14:47:57 O. Hartmann wrote:

Do we have any proof at hand for such cases where SCHED_ULE performs much better than SCHED_4BSD? [...]

In my spare time I do some stuff which can be considered HPC. If I recall correctly, the loudest supporters of the notion that SCHED_4BSD is faster than SCHED_ULE are using more threads than there are cores, causing CPU core contention and, more importantly, unevenly distributed runtimes among threads, resulting in suboptimal execution times for their programs. Since I've never actually seen the code in question, it's hard to say whether this unfair distribution actually results in lower throughput, or whether it simply violates an assumption in the code that each thread takes about as long as the others to finish its task.

Although I haven't actually benchmarked the two schedulers directly, I have no reason to suspect SCHED_ULE of suboptimal performance because: 1) a program model where N threads on N cores take work items from a shared queue until it is empty has almost perfect scaling on SCHED_ULE (I get 398% CPU usage on a quadcore); 2) the same program on Linux (dual boot), compiled with exactly the same compiler and flags, runs slightly slower. I think this has to do with VM differences.

What I'm trying to say is that until someone actually shows some code which has demonstrably lower performance on SCHED_ULE, and this is not caused by (IMHO improper) timing dependencies between threads, I'd say there is no cause for concern here. I actually expect performance differences between the two schedulers to show in problems which cause a lot more contention on the CPU cores and use lots of locks internally, so that threads are frequently waiting on each other -- for instance the MySQL benchmarks done a couple of years ago by Kris Kennaway. Aside from algorithmic limitations (SCHED_4BSD doesn't really scale all that well), there will always exist some problems for which SCHED_4BSD is faster because it by chance has a better execution order for them...

The good thing is people have a choice :-). I'm looking forward to the results of your benchmark.

-- Pieter de Goeje
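The shared-queue model Pieter describes can be sketched in a few lines. This is an illustration of the pattern, not his program: the "queue" is reduced to an atomic counter over identical work items, and the per-item busy-work is an arbitrary stand-in.

/*
 * N threads on N cores draining a shared work queue until it is empty.
 * The queue is simplified to an atomic counter over identical items.
 * Build: cc -o drain drain.c -lpthread
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

#define NITEMS 100000

static atomic_int next_item;

static void *drain(void *arg)
{
    (void)arg;
    for (;;) {
        int i = atomic_fetch_add(&next_item, 1);
        if (i >= NITEMS)
            break;                  /* queue empty: thread exits */
        volatile double x = 0.0;    /* stand-in for real per-item work */
        for (int k = 0; k < 100000; k++)
            x += k * 0.5;
    }
    return NULL;
}

int main(void)
{
    int n = (int)sysconf(_SC_NPROCESSORS_ONLN);
    if (n < 1)  n = 1;
    if (n > 64) n = 64;             /* static array below holds 64 threads */
    pthread_t t[64];

    for (int i = 0; i < n; i++)
        pthread_create(&t[i], NULL, drain, NULL);
    for (int i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    puts("queue drained");
    return 0;
}

Because no thread ever waits on another and work is pulled rather than pre-assigned, any reasonable scheduler keeps all cores busy until the queue drains, which is why this model scales near-perfectly regardless of migration policy.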
Re: SCHED_ULE should not be the default
On Mon, 12 Dec 2011 17:10:46 +0100 Lars Engels lars.eng...@0x20.net wrote:

Did you use -jX to build the world?

I'm top-posting since Lars did. It was buildkernel, not buildworld. Yes, -j6.

[...]

-- Gary Jennejohn
Re: SCHED_ULE should not be the default
On Mon, 12 Dec 2011 08:04:37 -0800 m...@freebsd.org wrote:

[...] SCHED_ULE is much sloppier about calculating which thread used a timeslice -- unless the timeslice went 100% to a thread, the fraction it used may get attributed elsewhere. So top's reporting of thread usage is not a useful metric. Total buildworld time is, potentially.

I suspect you're right, since the buildworld time, a much better test, was pretty much the same with 4BSD and ULE.

-- Gary Jennejohn
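Total wall-clock time of a build is what time(1) reports; for completeness, a hedged C equivalent is sketched below (the command and arguments are placeholders, and exit-status handling is trimmed).

/* Time a child command by wall clock, the metric suggested above. */
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    char *argv[] = { "make", "buildkernel", NULL };  /* placeholder */
    struct timespec a, b;

    clock_gettime(CLOCK_MONOTONIC, &a);
    pid_t pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                 /* exec failed */
    }
    waitpid(pid, NULL, 0);
    clock_gettime(CLOCK_MONOTONIC, &b);
    printf("elapsed: %.1f s\n",
           (double)(b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9);
    return 0;
}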
Re: SCHED_ULE should not be the default
On Mon, Dec 12, 2011 at 04:18:35PM +0000, Bruce Cran wrote:

Isn't this something that can be fixed by tuning ULE? For example, for desktop applications kern.sched.preempt_thresh should be set to 224 from its default. [...]

Tuning kern.sched.preempt_thresh did not seem to help for my workload. My code is a classic master-slave OpenMPI application where the master runs on one node and all cpu-bound slaves are sent to a second node. If I send ncpu+1 jobs to the 2nd node with ncpu cpus, then ncpu-1 jobs are assigned to the first ncpu-1 cpus. The last two jobs are assigned to the ncpu'th cpu, and these ping-pong on this cpu. AFAICT, it is a cpu affinity issue, where ULE is trying to keep each job associated with its initially assigned cpu.

While one might suggest that starting ncpu+1 jobs is not prudent, my example is just that: an example showing that ULE has performance issues. So, I can now either start only ncpu jobs on each node in the cluster and send emails to all other users asking them not to use those nodes, or use 4BSD and not worry about loading issues.

-- Steve
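Since the complaint is about affinity decisions, one workaround is to place threads explicitly with FreeBSD's cpuset API and take the decision away from the scheduler. A minimal sketch using cpuset_setaffinity(2) follows; the choice of pinning to CPU 0 (or worker i to CPU i % ncpu) is an illustrative policy, not what OpenMPI does.

/* Pin the calling thread to one CPU via the FreeBSD cpuset API. */
#include <sys/param.h>
#include <sys/cpuset.h>
#include <stdio.h>

static int pin_to_cpu(int cpu)
{
    cpuset_t mask;

    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    /* id == -1 means "the current thread" at the CPU_WHICH_TID level. */
    if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
                           sizeof(mask), &mask) == -1) {
        perror("cpuset_setaffinity");
        return -1;
    }
    return 0;
}

int main(void)
{
    if (pin_to_cpu(0) == 0)         /* worker i might use i % ncpu */
        puts("pinned to cpu 0");
    return 0;
}

With explicit pinning the ncpu+1 case is still oversubscribed, of course; pinning only makes the placement deterministic instead of leaving two jobs to ping-pong wherever ULE first put them.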
Re: SCHED_ULE should not be the default
On Monday, December 12, 2011 12:06:04 pm Steve Kargl wrote:

[...] AFAICT, it is a cpu affinity issue, where ULE is trying to keep each job associated with its initially assigned cpu. [...]

This is a case where 4BSD's naive algorithm will spread out the load more evenly, because all the threads are on a single, shared queue and each CPU just grabs the head of the queue when it finishes a timeslice. ULE always assigns threads to a single CPU (even if they aren't pinned to a single CPU using cpuset, etc.) and then tries to balance the load across cores later, but I believe in this case its rebalancer won't have anything to really do, as no matter what it does with the N+1 job it's going to be sharing a CPU with another job.

-- John Baldwin
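John's point can be put in numbers with a toy model (plain arithmetic, not the actual kernel logic): ncpu cores and ncpu+1 equal jobs of W CPU-seconds each. A shared queue time-shares every job at rate ncpu/(ncpu+1); static assignment runs ncpu-1 jobs alone and doubles two up on one core.

/* Toy model: finish times for ncpu+1 equal jobs of length W under
 * (a) one shared run queue and (b) static per-CPU assignment.
 * Pure arithmetic illustration, not scheduler code. */
#include <stdio.h>

int main(void)
{
    const double W = 100.0;         /* CPU-seconds each job needs */
    const int ncpu = 4;
    const int njobs = ncpu + 1;

    /* (a) Shared queue: every job progresses at rate ncpu/njobs. */
    printf("shared queue: all %d jobs finish at %.0f s\n",
           njobs, W * njobs / ncpu);

    /* (b) Static: one core runs two jobs, the others run one each. */
    printf("static: %d jobs finish at %.0f s, 2 jobs at %.0f s\n",
           ncpu - 1, W, 2.0 * W);
    return 0;
}

With W = 100 and 4 cores this prints 125 s for everyone under the shared queue versus 100 s for three jobs and 200 s for the doubled-up pair under static assignment: same total throughput, but the worst-case turnaround -- the thing an MPI job waits on -- goes from 125 s to 200 s, which matches the severe turnaround increase Steve reports.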
Re: SCHED_ULE should not be the default
On Mon, Dec 12, 2011 at 09:06:04AM -0800, Steve Kargl wrote:

Tuning kern.sched.preempt_thresh did not seem to help for my workload. [...] So, I can now either start only ncpu jobs on each node in the cluster and send emails to all other users asking them not to use those nodes, or use 4BSD and not worry about loading issues.

Does it meet your expectations if you start j jobs on a node, where (j modulo ncpu) == 0?

-- Scott Lambert  KC5MLE  Unix SysAdmin  lamb...@lambertfam.org
Re: SCHED_ULE should not be the default
On Mon, Dec 12, 2011 at 01:03:30PM -0600, Scott Lambert wrote:

Does it meet your expectations if you start j jobs on a node, where (j modulo ncpu) == 0?

I've never tried to launch more than ncpu + 1 (or + 2) jobs. I suppose that at the time I was investigating the issue, it was determined that 4BSD allowed me to get my work done in a more timely manner, so I took the path of least resistance.

-- Steve
Re: SCHED_ULE should not be the default
On 12/12/11 18:06, Steve Kargl wrote:

On Mon, Dec 12, 2011 at 04:18:35PM +0000, Bruce Cran wrote:

Isn't this something that can be fixed by tuning ULE? For example, for desktop applications kern.sched.preempt_thresh should be set to 224 from its default. I'm wondering if the installer should ask people what the typical use will be, and tune the scheduler appropriately.

Is the tuning of kern.sched.preempt_thresh, and a proper method of estimating its correct value for the intended workload, documented in the manpages, maybe tuning(7)? I find it hard to crawl through the many pros and cons on the mailing lists to evaluate a correct value for this seemingly important tunable.

Tuning kern.sched.preempt_thresh did not seem to help for my workload. [...]
Re: SCHED_ULE should not be the default
On 12/12/2011 23:48, O. Hartmann wrote:

Is the tuning of kern.sched.preempt_thresh, and a proper method of estimating its correct value for the intended workload, documented in the manpages, maybe tuning(7)? [...]

Note that I said "for example" :) I was suggesting that there may be sysctls that can be tweaked to improve performance.

-- Bruce Cran
Re: SCHED_ULE should not be the default
On 12/12/2011 05:47, O. Hartmann wrote:

Do we have any proof at hand for such cases where SCHED_ULE performs much better than SCHED_4BSD?

I complained about poor interactive performance of ULE in a desktop environment for years. I had numerous people try to help, including Jeff, with various tunables, dtrace'ing, etc. The cause of the problem was never found. I switched to 4BSD; problem gone. This is on 2 separate systems with Core 2 Duos.

hth,

Doug

-- Breadth of IT experience, and depth of knowledge in the DNS. Yours for the right price. :) http://SupersetSolutions.com/