Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior
Marc Espie wrote: > reorganizing a large part of usr.bin or usr.sbin to just be one > single variation of bsd.prog.mk with multiple progs and multiple object > files... works just fine for, say 95% of the binaries in those directories > > (considering there are lots of directories with one single C file or two > C files, this sounds like a gain for make -jN, right ?) > > End result: no measurable gain. That on 4 cpu boxes at the time. > There are SMP issues to solve before this actually yields any > useful result... I don't know. I wrote a lua script that unrolled all the subdir and ran a single process make in each directory, up to 4 at a time, and got pretty good results. It takes 30s on my newer laptop just to run unbound configure. Nothing else is compiling during that time, and it can be. Just a few of those can shave minutes off a build. (This was before clang. The savings is somewhat diminished by the total time consumed by the gnu directory.)
Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior
On Wed, Nov 27, 2019 at 06:21:01PM +0200, Lauri Tirkkonen wrote: > All that said I do understand if there is reluctance to merge the > jobserver stuff since it doesn't actually help the current situation in > most cases. Nevertheless it has been personally beneficial to me in > identifying areas of improvement, even if those are nothing new to > OpenBSD developers. Oh, I'm not opposed to merging a jobserver once it's roughly as stable as what we have. My concerns are: - it should work with all of src and xenocara (once that's done, I can do ports) - it should not interfere with other jobservers in other makes. - it does not replace the expensive jobs heuristics. - it does not render make harder to maintain and evolve. Especially since, as I've said before, there's no actual guarantee anywhere in POSIX that shells will cooperate with it. I don't have enough time to do everything I want, but it's well known that there are still issues with make -j compared to sequential make. If you look at cvs history, I've completely rewritten job.c a few times. And I had a presentation at eurobsdcon about what's still missing a few years ago. I've been swamped with changes in dpb/bsd.port.mk/pkg_add... recently, but it's still something I want to get back to eventually ! One reason that work goes slowly is because make was a seriously entwined series of things that all depend on each other... It still is, but slightly less so, and thus anything that's not squeaky clean is unlikely to get in. (hence me wanting the jobserver stuff to be reasonably self-contained and not interfering with the rest of make)
Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior
On Wed, Nov 27 2019 17:09:09 +0100, Marc Espie wrote: > I did experiment with something similar a while back: > > reorganizing a large part of usr.bin or usr.sbin to just be one > single variation of bsd.prog.mk with multiple progs and multiple object > files... works just fine for, say 95% of the binaries in those directories > > (considering there are lots of directories with one single C file or two > C files, this sounds like a gain for make -jN, right ?) yep, exactly my thinking as well. > End result: no measurable gain. That on 4 cpu boxes at the time. > There are SMP issues to solve before this actually yields any > useful result... I did see (and report) a measurable gain previously in this thread in one of the environments I was testing this on, exactly in usr.bin; just under two minutes of improvement with 4 CPUs. https://marc.info/?l=openbsd-tech&m=157374553407474&w=2 -- see the note about my test on Intel Atom C2550. Now, I don't really know why I saw that improvement there, but you did not in your tests and I did not in the Hyper-V guest that I tested my most recent diff on; that merits further investigation. Solving the actual SMP issues involved here would be great; I think the potential benefits of work in that area are pretty big. All that said I do understand if there is reluctance to merge the jobserver stuff since it doesn't actually help the current situation in most cases. Nevertheless it has been personally beneficial to me in identifying areas of improvement, even if those are nothing new to OpenBSD developers. -- Lauri Tirkkonen | lotheac @ IRCnet
Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior
On Wed, Nov 27, 2019 at 05:31:48PM +0200, Lauri Tirkkonen wrote: > I'll give you some more background that I maybe should have given > earlier already: in my hobby OS Unleashed, we use bmake and earlier did > some (slightly hacky) modifications to subdir.mk to enable paralellizing > jobs in multiple subdirs at the same time. I don't remember the exact > numbers since it's been quite some time, but I was able to cut the base > build time by *a lot* with just that change -- during the build there > were previously a lot of idle CPUs. Those results served as motivation > for me to start looking into doing similar things in OpenBSD, and having > a jobserver is sort of the first step there. I did experiment with something similar a while back: reorganizing a large part of usr.bin or usr.sbin to just be one single variation of bsd.prog.mk with multiple progs and multiple object files... works just fine for, say 95% of the binaries in those directories (considering there are lots of directories with one single C file or two C files, this sounds like a gain for make -jN, right ?) End result: no measurable gain. That on 4 cpu boxes at the time. There are SMP issues to solve before this actually yields any useful result...
Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior
On Wed, Nov 27 2019 16:14:44 +0100, Marc Espie wrote: > On Fri, Nov 15, 2019 at 03:29:29PM +0200, Lauri Tirkkonen wrote: > > On Fri, Nov 15 2019 15:24:57 +0200, Lauri Tirkkonen wrote: > > > Your points are valid and I agree with them completely. There are > > > clearly problems with lock contention, > > > > and I should mention here that I would probably not have observed the > > magnitude of these problems had I not made make start more jobs at once > > in the first place. So while the make/mk diffs need more work, we've > > already identified more areas of improvement as a result of trying them > > out. But one thing at a time. :) > > Seriously, as much as what you're doing is interesting, you've not identified > anything new in THAT specific area. We've run enough build configurations > to know about those scalability issues for years. Fair enough, I was being unfair in assuming otherwise. I guess I wanted to offer some validation for my efforts or something ;) > It's cool that you found new races that are hidden by expensive jobs, though, > that much I fully agree with. I'll give you some more background that I maybe should have given earlier already: in my hobby OS Unleashed, we use bmake and earlier did some (slightly hacky) modifications to subdir.mk to enable paralellizing jobs in multiple subdirs at the same time. I don't remember the exact numbers since it's been quite some time, but I was able to cut the base build time by *a lot* with just that change -- during the build there were previously a lot of idle CPUs. Those results served as motivation for me to start looking into doing similar things in OpenBSD, and having a jobserver is sort of the first step there. -- Lauri Tirkkonen | lotheac @ IRCnet
Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior
On Fri, Nov 15, 2019 at 03:29:29PM +0200, Lauri Tirkkonen wrote: > On Fri, Nov 15 2019 15:24:57 +0200, Lauri Tirkkonen wrote: > > Your points are valid and I agree with them completely. There are > > clearly problems with lock contention, > > and I should mention here that I would probably not have observed the > magnitude of these problems had I not made make start more jobs at once > in the first place. So while the make/mk diffs need more work, we've > already identified more areas of improvement as a result of trying them > out. But one thing at a time. :) Seriously, as much as what you're doing is interesting, you've not identified anything new in THAT specific area. We've run enough build configurations to know about those scalability issues for years. It's cool that you found new races that are hidden by expensive jobs, though, that much I fully agree with.
Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior
On Wed, Nov 27 2019 11:49:00 +0100, Marc Espie wrote: > On Wed, Nov 27, 2019 at 12:42:48PM +0200, Lauri Tirkkonen wrote: > > In a lesson to always proof-read *before* sending your message: > > > > On Wed, Nov 27 2019 12:31:51 +0200, Lauri Tirkkonen wrote: > > > - a full release build with the jobserver enabled passed after > > >disabling gnu/usr.bin/binutils > > > > this should, in fact, read "gnu/usr.bin/binutils-2.17". binutils builds > > fine. > > We would need to investigate this. > > Instead of disabling them, send logs Sure; I just considered these separate problems that happened to be exposed by the jobserver usage ('expensive' avoids these races since it does not run more than one sub-make at a time). The problem I encountered with binutils-2.17 seems to be that configure is run in 'intl' more than once at the same time. pstree output seems to support my hypothesis: | | |-+= 94071 lotheac make -f Makefile.bsd-wrapper -j16 -x | | | \-+- 26584 lotheac /bin/sh -ec SUBDIRS='opcodes bfd binutils ld gas' CONFIGURE_HOST_MODULES='configure- | | | \-+- 71033 lotheac make CC=cc CFLAGS=-O2 -pipe -DPIE_DEFAULT=1 LDFLAGS= scriptdir=/usr/libdata toold | | | |-+- 08037 lotheac /bin/sh -ec r=`${PWDCMD-pwd}`; export r; s=`cd /usr/src/gnu/usr.bin/binutils-2.1 | | | | \-+- 57596 lotheac make DESTDIR= RPATH_ENVVAR=LD_LIBRARY_PATH TARGET_SUBDIR=amd64-unknown-openbsd6 | | | | \-+- 46889 lotheac /bin/sh -ec r=`${PWDCMD-pwd}`; export r; s=`cd /usr/src/gnu/usr.bin/binutils | | | | \-+- 07550 lotheac /bin/sh /usr/src/gnu/usr.bin/binutils-2.17/intl/configure --cache-file=./co | | | | \--- 50218 lotheac (grep) | | | \-+- 47358 lotheac /bin/sh -ec r=`${PWDCMD-pwd}`; export r; s=`cd /usr/src/gnu/usr.bin/binutils-2.1 | | | \-+- 26216 lotheac make DESTDIR= RPATH_ENVVAR=LD_LIBRARY_PATH TARGET_SUBDIR=amd64-unknown-openbsd6 | | | \-+- 51511 lotheac /bin/sh -ec r=`${PWDCMD-pwd}`; export r; s=`cd /usr/src/gnu/usr.bin/binutils | | | \-+- 22785 lotheac /bin/sh /usr/src/gnu/usr.bin/binutils-2.17/intl/configure --cache-file=./co | | | \-+- 25979 lotheac /bin/sh /usr/src/gnu/usr.bin/binutils-2.17/intl/configure --cache-file=./ | | | \--- 51834 lotheac (cc) Partial log of the failure below. I have a full log of the make invocation as well but I think the relevant parts are included here. [ ... ] cd /usr/src/gnu/usr.bin/binutils-2.17/obj/ld && make ld.1 cd /usr/src/gnu/usr.bin/binutils-2.17/obj/gas/doc && make as.1 `as.1' is up to date. `ld.1' is up to date. mv ld/ld.1 ld/ld.bfd.1 SUBDIRS='opcodes bfd binutils ld gas' CONFIGURE_HOST_MODULES='configure-opcodes configure-bfd configure-binutils configure-ld configure-gas' make CC="cc" CFLAGS="-O2 -pipe -DPIE_DEFAULT=1 " LDFLAGS="" scriptdir=/usr/libdata tooldir=/usr MAKEINFO='makeinfo --no-split' MAKEINFOFLAGS='' BSDSRCDIR=/usr/src ALL_MODULES="all-opcodes all-bfd all-binutils all-ld all-gas" ALL_HOST_MODULES='all-opcodes all-bfd all-binutils all-ld all-gas' INFO_HOST_MODULES='info-opcodes info-bfd info-binutils info-ld info-gas' all info Configuring in ./intl Configuring in ./intl Doing info in opcodes Doing info in bfd Doing info in binutils Doing info in gas Doing info in ld Making info in po Making info in doc touch ld.1 Making info in doc perl /usr/src/gnu/usr.bin/binutils-2.17/ld/../etc/texi2pod.pl -I /usr/src/gnu/usr.bin/binutils-2.17/ld -I /usr/src/gnu/usr.bin/binutils-2.17/ld/../bfd/doc -I /usr/src/gnu/usr.bin/binutils-2.17/ld/../../../lib/libiberty/src -Dman < /usr/src/gnu/usr.bin/binutils-2.17/ld/ld.texinfo > ld.pod Making info in doc rm -f /usr/src/gnu/usr.bin/binutils-2.17/bfd/doc/bfd.info /usr/src/gnu/usr.bin/binutils-2.17/bfd/doc/bfd.info-[0-9] /usr/src/gnu/usr.bin/binutils-2.17/bfd/doc/bfd.info-[0-9][0-9] if test -f cxxfilt.man; then man=cxxfilt.man; else man=/usr/src/gnu/usr.bin/binutils-2.17/binutils/doc/cxxfilt.man; fi; sed -e 's/@PROGRAM@/c++filt/' -e 's/cxxfilt/c++filt/' < $man > c++filt.1 makeinfo --no-split -I /usr/src/gnu/usr.bin/binutils-2.17/bfd/doc /usr/src/gnu/usr.bin/binutils-2.17/bfd/doc/bfd.texinfo Making info in po makeinfo --no-split -I "/usr/src/gnu/usr.bin/binutils-2.17/binutils/doc" -I "/usr/src/gnu/usr.bin/binutils-2.17/binutils/../../../lib/libiberty/src" -I /usr/src/gnu/usr.bin/binutils-2.17/binutils/doc /usr/src/gnu/usr.bin/binutils-2.17/binutils/doc/binutils.texi Making info in po loading cache ./config.cache Making info in po loading cache ./config.cache checking for a BSD compatible install... /usr/bin/install -c checking how to run the C preprocessor... (pod2man --center="GNU Development Tools" --release="binutils-2.17" --section=1 ld.pod | sed -e '/^.if n .na/d' > ld.1.T$$ && mv -f ld.1.T$$ ld.1) || (rm -f ld.1.T$$ && exit 1) checking for a BSD compatible install... /usr/bin/install -c checking how to run the C pre
Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior
On Wed, Nov 27, 2019 at 12:42:48PM +0200, Lauri Tirkkonen wrote: > In a lesson to always proof-read *before* sending your message: > > On Wed, Nov 27 2019 12:31:51 +0200, Lauri Tirkkonen wrote: > > - a full release build with the jobserver enabled passed after > >disabling gnu/usr.bin/binutils > > this should, in fact, read "gnu/usr.bin/binutils-2.17". binutils builds > fine. We would need to investigate this. Instead of disabling them, send logs
Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior
In a lesson to always proof-read *before* sending your message: On Wed, Nov 27 2019 12:31:51 +0200, Lauri Tirkkonen wrote: > - a full release build with the jobserver enabled passed after >disabling gnu/usr.bin/binutils this should, in fact, read "gnu/usr.bin/binutils-2.17". binutils builds fine. -- Lauri Tirkkonen | lotheac @ IRCnet
Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior
New diff, almost entirely rewritten and slightly redesigned, based on your previous feedback as well as problems uncovered in testing. Sorry it's taken so long; I've been a bit stingy with my spare time with respect to this :) Description of the design: - the top-level make initializes jobserver, if requested, by creating a pair of sockets with socketpair(). It keeps one side ("master") for itself only (O_CLOEXEC), passing the other side to all children ("slave"). It then adds the fd number for the inheritable slave fd to MAKEFLAGS with '-J'. these sockets are set SOCK_NONBLOCK. - the master initializes the number of available job tokens to maxJobs (ie. from -j). - every Job keeps track of the kind of job token it is currently holding (enum job_token_type). when the jobserver master starts a job, it allocates a JOB_TOKEN_LOCAL type token to the job and decrements the number of available tokens. - child makes, noticing -J in MAKEFLAGS, initialize the number of available job tokens to 1 -- this is because an upper-level make has already allocated a job token to the job that ended up starting this make process. - if a slave make's jobserver.tokens is 0, it asks the master for a token via the socket. This happens by writing a '-' character to the socket, and waiting for a response. - in the master process, handle_running_jobs uses ppoll() to wait for either communication from slaves or signals. when it receives a token request ('-' character read from the socket), it checks if it has available job tokens, and if so, decrements the number of available tokens and sends back a response ('.' byte, but the value isn't actually checked by slaves, so any byte would do). if the master does not have available tokens to fulfill the request, it increments jobserver.waiting so that it can fulfill the request later when currently in-use tokens are reclaimed. - when the slave make receives a byte over the socket from master, it stops waiting, sets the Job's token type to JOB_TOKEN_REMOTE and continues executing the job. when the job stops executing (determine_job_next_step(), or when the job finishes in continue_job()), the token is released back to master by writing a '+' to the socket. - when the master sees a '+' character on the socket, it increments jobserver.tokens, and subsequently sends the newly reclaimed token(s) to the socket if there are slaves waiting (jobserver.waiting != 0) - if the communication between the master and slave has been severed (eg. if an intermediate process has closed the fd), the slave sets maxJobs to 1, disables the jobserver, prints a warning to stderr and proceeds to run one job at a time. - more severe error conditions exist; I've commented on these in the code so I won't repeat here. Testing notes: - a full release build with the jobserver enabled passed after disabling gnu/usr.bin/binutils and gnu/usr.bin/perl subdirs. Those dirs may fail to build with jobserver because there are races that are hit when parallelizing with the jobserver, but not with 'expensive' heuristics. - as described earlier, my testing setup seems to have high lock contention for reasons I don't know. as such, the results for enabling the jobserver for 'make -j16 build' are slower in real-time than without the jobserver: $ time doas make -j16 build 209m03.63s real 388m15.88s user 3057m43.77s system $ time doas make -j16 -x build 214m27.57s real 381m11.55s user 3179m29.82s system that means that for this particular setup, the jobserver implementation is only a small part of the puzzle of improving performance (and does the opposite without the other pieces :). subdir.mk is still essentially single-threaded (I sent an unfinished, somewhat broken PoC patch for that earlier), which is a big opportunity for improvement when using the jobserver, and obviously for this machine the high spin% problems should be solved if I want the build to actually go faster when it's running more processes. I did mention that on actual hardware I had better results in previous smaller-scale tests, but I have not run a 'make build' on that actual hardware since it's essentially a production machine I use. - I have not tested ports builds. - I have only tested this on amd64. - regression tests have not been written. Other general notes: - manpage parts are not yet written. - I don't particularly like the option letter I picked to enable the jobserver, '-x'. I don't know what would be better, though; one option that crossed my mind was using -J and passing the fd number to children in another environment variable instead of MAKEFLAGS -J, but I'm not sure. I would appreciate help in picking the color for this shed :) - a new debug flag J is added for jobserver debug messages. they are perhaps a little
Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior
On Fri, Nov 15 2019 15:24:57 +0200, Lauri Tirkkonen wrote: > Your points are valid and I agree with them completely. There are > clearly problems with lock contention, and I should mention here that I would probably not have observed the magnitude of these problems had I not made make start more jobs at once in the first place. So while the make/mk diffs need more work, we've already identified more areas of improvement as a result of trying them out. But one thing at a time. :) -- Lauri Tirkkonen | lotheac @ IRCnet
Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior
On Fri, Nov 15 2019 14:16:44 +0100, Martin Pieuchot wrote: > > Yes, that was my point exactly. Less jobs didn't fare any better (well, > > it had less spin time, but took around the same real time), so the > > conclusion I arrived at was that something in my setup was eventually > > contending on a small number of locks. My guess is that it's either the > > filesystem, the IDE driver, something Hyper-V specific, or a combination > > of the above. > > What does it bring to guess? Why don't you look deeper where the > contention is? > > > This change is all about utilizing CPUs better in parallelizing existing > > workloads, so I wouldn't expect a very large change in user time (but it > > should happen over a smaller amout of real time). > > Is this change about better parallelizing? Do we see that? Or is it a > guess? If we want OpenBSD to do a better job at parallelizing maybe we > should look at where the contention is and then how to get rid of it? > > > > You can also write Makefiles that expose less the limitation of the > > > system. ktrace(1) is your friend for that. > > > > The idea was to test real-world workloads, ie. actual OpenBSD builds. I > > do have enough memory on this thing to place objs in mfs; maybe I'll try > > that next time around. > > I'd suggest you to spend your time understanding where is the bottleneck > instead of randomly trying to change stuff :) Your points are valid and I agree with them completely. There are clearly problems with lock contention, and there are problems with resource utilisation due to make not starting enough jobs. While they are related, I kind of have to pick my battles one by one not to descend too far into yak shaving territory. Given infinite time, let's fix everything, but since that is not afforded to me, some educated guesses have to be made to be able to test the right thing in the meantime. ;) -- Lauri Tirkkonen | lotheac @ IRCnet
Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior
On 15/11/19(Fri) 15:03, Lauri Tirkkonen wrote: > On Thu, Nov 14 2019 17:57:20 +0100, Martin Pieuchot wrote: > > On 14/11/19(Thu) 17:30, Lauri Tirkkonen wrote: > > > [...] > > > The first test I made was on a beefy virtual machine under Hyper-V, > > > with 32 vCPUs on a Threadripper 2950X. With and without this patch (and > > > the followup for share/mk), I did make -j32 -C /usr/src/usr.bin. The > > > results were very disappointing: with my patches the build was *slower* > > > even though it used way more CPU. I think 'top' said it best: > > > > > > 32 CPUs: 0.1% user, 0.0% nice, 3.9% sys, 95.4% spin, 0.0% intr, 0.7% > > > idle > > > ^^ > > > > I'd suggest considering the %user time when working on a userland tool. > > A high %spin time indicates that syscalls and page faults are spending > > a lot of time busy waiting on the KERNEL_LOCK(). > > > > In other words you're exposing the non-scalability of OpenBSD on such > > machine. I'd suggest you use less jobs, I'd try 8 or 12 as a first step. > > Yes, that was my point exactly. Less jobs didn't fare any better (well, > it had less spin time, but took around the same real time), so the > conclusion I arrived at was that something in my setup was eventually > contending on a small number of locks. My guess is that it's either the > filesystem, the IDE driver, something Hyper-V specific, or a combination > of the above. What does it bring to guess? Why don't you look deeper where the contention is? > This change is all about utilizing CPUs better in parallelizing existing > workloads, so I wouldn't expect a very large change in user time (but it > should happen over a smaller amout of real time). Is this change about better parallelizing? Do we see that? Or is it a guess? If we want OpenBSD to do a better job at parallelizing maybe we should look at where the contention is and then how to get rid of it? > > You can also write Makefiles that expose less the limitation of the > > system. ktrace(1) is your friend for that. > > The idea was to test real-world workloads, ie. actual OpenBSD builds. I > do have enough memory on this thing to place objs in mfs; maybe I'll try > that next time around. I'd suggest you to spend your time understanding where is the bottleneck instead of randomly trying to change stuff :)
Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior
Hi Martin, On Thu, Nov 14 2019 17:57:20 +0100, Martin Pieuchot wrote: > On 14/11/19(Thu) 17:30, Lauri Tirkkonen wrote: > > [...] > > The first test I made was on a beefy virtual machine under Hyper-V, > > with 32 vCPUs on a Threadripper 2950X. With and without this patch (and > > the followup for share/mk), I did make -j32 -C /usr/src/usr.bin. The > > results were very disappointing: with my patches the build was *slower* > > even though it used way more CPU. I think 'top' said it best: > > > > 32 CPUs: 0.1% user, 0.0% nice, 3.9% sys, 95.4% spin, 0.0% intr, 0.7% > > idle > > ^^ > > I'd suggest considering the %user time when working on a userland tool. > A high %spin time indicates that syscalls and page faults are spending > a lot of time busy waiting on the KERNEL_LOCK(). > > In other words you're exposing the non-scalability of OpenBSD on such > machine. I'd suggest you use less jobs, I'd try 8 or 12 as a first step. Yes, that was my point exactly. Less jobs didn't fare any better (well, it had less spin time, but took around the same real time), so the conclusion I arrived at was that something in my setup was eventually contending on a small number of locks. My guess is that it's either the filesystem, the IDE driver, something Hyper-V specific, or a combination of the above. SATA on bare metal didn't spin nearly as badly, which is why I suspected IDE. But this is a bit of a digression :) This change is all about utilizing CPUs better in parallelizing existing workloads, so I wouldn't expect a very large change in user time (but it should happen over a smaller amout of real time). > You can also write Makefiles that expose less the limitation of the > system. ktrace(1) is your friend for that. The idea was to test real-world workloads, ie. actual OpenBSD builds. I do have enough memory on this thing to place objs in mfs; maybe I'll try that next time around. -- Lauri Tirkkonen | lotheac @ IRCnet
Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior
On 14/11/19(Thu) 17:30, Lauri Tirkkonen wrote: > [...] > The first test I made was on a beefy virtual machine under Hyper-V, > with 32 vCPUs on a Threadripper 2950X. With and without this patch (and > the followup for share/mk), I did make -j32 -C /usr/src/usr.bin. The > results were very disappointing: with my patches the build was *slower* > even though it used way more CPU. I think 'top' said it best: > > 32 CPUs: 0.1% user, 0.0% nice, 3.9% sys, 95.4% spin, 0.0% intr, 0.7% idle > ^^ I'd suggest considering the %user time when working on a userland tool. A high %spin time indicates that syscalls and page faults are spending a lot of time busy waiting on the KERNEL_LOCK(). In other words you're exposing the non-scalability of OpenBSD on such machine. I'd suggest you use less jobs, I'd try 8 or 12 as a first step. You can also write Makefiles that expose less the limitation of the system. ktrace(1) is your friend for that.
Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior
On Thu, Nov 14, 2019 at 05:30:34PM +0200, Lauri Tirkkonen wrote: > > > @@ -347,10 +360,9 @@ print_errors(void) > > > static void > > > setup_signal(int sig) > > > { > > > - if (signal(sig, SIG_IGN) != SIG_IGN) { > > > - (void)signal(sig, notice_signal); > > > - sigaddset(&sigset, sig); > > > - } > > > + sigaction(sig, &(struct sigaction) { .sa_handler = notice_signal }, > > > + NULL); > > > + sigaddset(&sigset, sig); > > > } > > Why did you change this ? this is gratuitous. > > Actually, it's not gratuitous, but it's subtle. signal() enables > SA_RESTART, but we depend on EINTR returns in a couple places, so I did > this. I should've mentioned this originally, sorry. Err, thinking some more. You can't do that. This affects ALL of make's code and not just this file. Some parts are belt-and-suspenders type and include checks for EINTR, but NOWHERE near every part. In particular, you *lose* stdio.h entirely.
Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior
On Thu, Nov 14, 2019 at 05:30:34PM +0200, Lauri Tirkkonen wrote: > Yeah, you're right, it reads like crap, I'll rewrite it. The gist of the > thing is that 'expensive' heuristic is now only used to decide whether > or not to print the failed command. I object to that. Keep the "expensive" heuristics around. The current -jn should still work *without* the jobserver mechanism. > > > static void > > > -loop_handle_running_jobs() > > > +loop_handle_running_jobs(void) > > > { > > > while (runningJobs != NULL) > > > handle_running_jobs(); > > > } > > Yet another gratuitous change. There is already a perfectly fine prototype > > that asserts (void). > > A function prototype with empty parentheses does not assert (void) in C > (but does in C++). However, you're correct about it not being related to > this change. See above. There are prototypes for ALL static functions in that file. You don't need the void in the function definition when the function declaration (a few hundred lines above) already has it. > This was also a bit concerning to me, but that's what bmake/gmake also > are doing (albeit with more fd's). Eventually the potential gains > outweighed my concerns: every SUBDIR in bsd.subdir.mk is currently built > serially, which has quite a large impact on how many jobs are running at > any one time. > > An alternative could be to use unix sockets with paths instead and pass > the path through the environment, but I'm not convinced I like that any > better. There is the huge problem of where you put your paths. NFS comes in the way AND socket names are limited to 256 characters.
Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior
Hi, thanks for your comments. More responses inline, but I'd like to mention up front that further testing I've made agrees with you; this isn't ready by any means. I observed one hang I've yet to debug (very bad), and the speedups weren't that great in some cases either (more on that later). I don't have a new version of the diff fixing the problems (at least not yet), but I'll try to give a bit more motivation/background for my changes. On Thu, Nov 14 2019 15:24:53 +0100, Marc Espie wrote: > Sorry, I missed your initial message (thanks to naddy@ for telling me about > it) > > A jobserver was considered in make before implementing expensive. > > The main reason it was not done that way is that you can't guarantee a > file descriptor will make it through the intermediate shell. > > In fact, POSIX doesn't guarantee anything, and a lot of shells actively close > file descriptors they don't know about. Yes, I'm aware of this. Other makes seem to do fine though, and they use the same mechanism of passing file descriptors: leave them open, and communicate their existence through environment variables. However, it's a good point; failure to access the fd should fall back to single-job semantics in the child at that point, and I need to consider what happens in the top-level make in such a case as well. More testing is obviously needed. > Your example is not complicated enough, as your commands mean NO > shell will be spawned (everything will pass directly through to > submakes, which is BTW a large reason for implementing -C, which cuts > down on the number of intermediate processes). True, the test needs more work. Thanks for pointing it out, I somehow forgot about the optimization about avoiding a shell while I was writing the previous mail. > I'm not sure I like added complexity in that part of make. > > > I haven't yet built the entire tree with this, just cautiously probing > > the waters first to see if there is interest :) > > I'd like some actual numbers on real world applications: what does it change > for the actual build through the full source tree. Areas like perl are where > the job explosion occurs and where the expensive heuristics got refined. As I mentioned in the preface above, a full build actually ended up hanging somewhere in gnu/ :-) However I do have some results from smaller real-world tests. Unfortunately I haven't yet gotten around to bigger ones (and besides I need to fix the outstanding issues first for the tests to make actual sense). The first test I made was on a beefy virtual machine under Hyper-V, with 32 vCPUs on a Threadripper 2950X. With and without this patch (and the followup for share/mk), I did make -j32 -C /usr/src/usr.bin. The results were very disappointing: with my patches the build was *slower* even though it used way more CPU. I think 'top' said it best: 32 CPUs: 0.1% user, 0.0% nice, 3.9% sys, 95.4% spin, 0.0% intr, 0.7% idle ^^ The Hyper-V guest uses an IDE disk, and my hypothesis is that there may be some locking problems there. I repeated the test on actual hardware, an Intel Atom C2550 board with 4 CPUs, and SATA disks. This showed an actual build time improvement (though I admit I just did one trial so far): before make -j4 817.49s user 340.95s system 268% cpu 7:11.44 total after make -j4 831.34s user 352.72s system 373% cpu 5:16.88 total which is not too shabby if you ask me. > Comments on the patch proper: > > static int aborting = 0; /* why is the make aborting? */ > > #define ABORT_ERROR1 /* Because of an error */ > > @@ -123,12 +128,15 @@ static intaborting = 0; /* why is the make > > aborting? */ > > > > static int maxJobs;/* The most children we can run at once */ > > static int nJobs; /* Number of jobs already allocated */ > > -static boolno_new_jobs;/* Mark recursive shit so we shouldn't > > start > > -* something else at the same time > > -*/ > I don't think ditching expensive entirely is a good idea anyhow. > What happens if the jobserver fails ? I see it goes to err all the time. > BTW, going to err is bad. Make handles processes. Most errors MUST do > something sane, like waiting for processes to end in many many cases. You're absolutely right, error handling is currently complete garbage. And jobserver failures should be handled gracefully, I agree. I'll see if I can come up with something. > > + > > +struct jobserver { > > + int master; /* master deposits job tokens into this fd */ > > + int slave; /* slaves consume job tokens from this fd */ > > + volatile sig_atomic_t tokens; /* currently available tokens */ > > +}; > you didn't really read the code, did you ? why volatile sig_atomic_t ? I wrote this code, I'm not sure how I could have avoided reading it :-) But I must conced
Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior
On Tue, Nov 05, 2019 at 03:22:08PM +, Lauri Tirkkonen wrote: > Hi, > > make has the concept of 'expensive' jobs: if the command line includes > the word "make" in it, the command is considered expensive and other > jobs are held until that job finishes. It does this to avoid exponential > behavior when parallelizing with -j. > > Some other makes (eg. bmake and gmake) use a jobserver, ie. a pair of > pipes between child makes and the top-level make: the top-level make > provides job tokens to children via these pipes, which are returned when > the jobs complete. This ensures that recursive makes only run a certain > number of jobs in total at any one time, while achieving greater > resource utilisation (no need to wait for an 'expensive' job to finish > before starting others). > > This diff implements a similar jobserver in make. Like bmake and gmake, > job tokens are provided by the top-level make (called "jobserver > master"), used by children ("jobserver slaves") when they run commands, > and returned when those commands finish. Like bmake, the internal, > undocumented command line option to tell (possibly indirect, via > MAKEFLAGS)) children about the file descriptor to use is -J, but unlike > bmake and gmake, we use socketpair(AF_UNIX, SOCK_STREAM) instead of > pipes, and therefore only need to pass one file descriptor to children > instead of two. Sorry, I missed your initial message (thanks to naddy@ for telling me about it) A jobserver was considered in make before implementing expensive. The main reason it was not done that way is that you can't guarantee a file descriptor will make it through the intermediate shell. In fact, POSIX doesn't guarantee anything, and a lot of shells actively close file descriptors they don't know about. Your example is not complicated enough, as your commands mean NO shell will be spawned (everything will pass directly through to submakes, which is BTW a large reason for implementing -C, which cuts down on the number of intermediate processes). I'm not sure I like added complexity in that part of make. > I haven't yet built the entire tree with this, just cautiously probing > the waters first to see if there is interest :) I'd like some actual numbers on real world applications: what does it change for the actual build through the full source tree. Areas like perl are where the job explosion occurs and where the expensive heuristics got refined. > Tested with the following Makefile structure: > > $ cat recursive/Makefile > .PHONY: a b c d > > all: a b c d > > a: > make -C a > b: > make -C a > c: > make -C a > d: > make -C a > $ cat recursive/a/Makefile > all: 1 2 3 4 > > 1: > sleep 1 > 2: > sleep 2 > 3: > sleep 3 > 4: > sleep 4 > > the speedup is as follows with -j4: > > before: > make -C recursive -j4 0.01s user 0.10s system 0% cpu 16.097 total > after: > make -C recursive -j4 0.02s user 0.15s system 1% cpu 11.082 total Comments on the patch proper: > --- > usr.bin/make/job.c | 317 +++- > usr.bin/make/job.h | 7 +- > usr.bin/make/main.c | 31 - > usr.bin/make/main.h | 3 + > 4 files changed, 290 insertions(+), 68 deletions(-) > > diff --git a/usr.bin/make/job.c b/usr.bin/make/job.c > index ebbae5306da..b36aaeacac8 100644 > --- a/usr.bin/make/job.c > +++ b/usr.bin/make/job.c > @@ -90,11 +90,15 @@ > * Job_WaitWait for all running jobs to finish. > */ > > +#include > #include > #include > +#include > #include > #include > +#include > #include > +#include > #include > #include > #include > @@ -115,6 +119,7 @@ > #include "memory.h" > #include "make.h" > #include "buf.h" > +#include "main.h" > > static int aborting = 0; /* why is the make aborting? */ > #define ABORT_ERROR 1 /* Because of an error */ > @@ -123,12 +128,15 @@ static int aborting = 0; /* why is the make > aborting? */ > > static int maxJobs;/* The most children we can run at once */ > static int nJobs; /* Number of jobs already allocated */ > -static bool no_new_jobs;/* Mark recursive shit so we shouldn't start > - * something else at the same time > - */ I don't think ditching expensive entirely is a good idea anyhow. What happens if the jobserver fails ? I see it goes to err all the time. BTW, going to err is bad. Make handles processes. Most errors MUST do something sane, like waiting for processes to end in many many cases. > + > +struct jobserver { > + int master; /* master deposits job tokens into this fd */ > + int slave; /* slaves consume job tokens from this fd */ > + volatile sig_atomic_t tokens; /* current
[PATCH] make: implement jobserver and use it to avoid exponential behavior
Hi, make has the concept of 'expensive' jobs: if the command line includes the word "make" in it, the command is considered expensive and other jobs are held until that job finishes. It does this to avoid exponential behavior when parallelizing with -j. Some other makes (eg. bmake and gmake) use a jobserver, ie. a pair of pipes between child makes and the top-level make: the top-level make provides job tokens to children via these pipes, which are returned when the jobs complete. This ensures that recursive makes only run a certain number of jobs in total at any one time, while achieving greater resource utilisation (no need to wait for an 'expensive' job to finish before starting others). This diff implements a similar jobserver in make. Like bmake and gmake, job tokens are provided by the top-level make (called "jobserver master"), used by children ("jobserver slaves") when they run commands, and returned when those commands finish. Like bmake, the internal, undocumented command line option to tell (possibly indirect, via MAKEFLAGS)) children about the file descriptor to use is -J, but unlike bmake and gmake, we use socketpair(AF_UNIX, SOCK_STREAM) instead of pipes, and therefore only need to pass one file descriptor to children instead of two. I haven't yet built the entire tree with this, just cautiously probing the waters first to see if there is interest :) Tested with the following Makefile structure: $ cat recursive/Makefile .PHONY: a b c d all: a b c d a: make -C a b: make -C a c: make -C a d: make -C a $ cat recursive/a/Makefile all: 1 2 3 4 1: sleep 1 2: sleep 2 3: sleep 3 4: sleep 4 the speedup is as follows with -j4: before: make -C recursive -j4 0.01s user 0.10s system 0% cpu 16.097 total after: make -C recursive -j4 0.02s user 0.15s system 1% cpu 11.082 total --- usr.bin/make/job.c | 317 +++- usr.bin/make/job.h | 7 +- usr.bin/make/main.c | 31 - usr.bin/make/main.h | 3 + 4 files changed, 290 insertions(+), 68 deletions(-) diff --git a/usr.bin/make/job.c b/usr.bin/make/job.c index ebbae5306da..b36aaeacac8 100644 --- a/usr.bin/make/job.c +++ b/usr.bin/make/job.c @@ -90,11 +90,15 @@ * Job_WaitWait for all running jobs to finish. */ +#include #include #include +#include #include #include +#include #include +#include #include #include #include @@ -115,6 +119,7 @@ #include "memory.h" #include "make.h" #include "buf.h" +#include "main.h" static int aborting = 0; /* why is the make aborting? */ #define ABORT_ERROR1 /* Because of an error */ @@ -123,12 +128,15 @@ static intaborting = 0; /* why is the make aborting? */ static int maxJobs;/* The most children we can run at once */ static int nJobs; /* Number of jobs already allocated */ -static boolno_new_jobs;/* Mark recursive shit so we shouldn't start -* something else at the same time -*/ + +struct jobserver { + int master; /* master deposits job tokens into this fd */ + int slave; /* slaves consume job tokens from this fd */ + volatile sig_atomic_t tokens; /* currently available tokens */ +}; +static struct jobserver jobserver; Job *runningJobs; /* Jobs currently running a process */ Job *errorJobs;/* Jobs in error at end */ -static Job *heldJobs; /* Jobs not running yet because of expensive */ static pid_t mypid;/* Used for printing debugging messages */ static volatile sig_atomic_t got_fatal; @@ -144,11 +152,16 @@ static void postprocess_job(Job *, bool); static Job *prepare_job(GNode *); static void determine_job_next_step(Job *); static void remove_job(Job *, bool); -static void may_continue_job(Job *); static void continue_job(Job *); static Job *reap_finished_job(pid_t); static bool reap_jobs(void); +static void jobserver_init(unsigned, int); +static bool jobserver_is_slave(void); +static void jobserver_master_reclaim_tokens(void); +static void jobserver_master_send_tokens(void); +static void jobserver_acquire_token(Job *job); +static void jobserver_release_token(Job *job); static void loop_handle_running_jobs(void); static bool expensive_job(Job *); static bool expensive_command(const char *); @@ -347,10 +360,9 @@ print_errors(void) static void setup_signal(int sig) { - if (signal(sig, SIG_IGN) != SIG_IGN) { - (void)signal(sig, notice_signal); - sigaddset(&sigset, sig); - } + sigaction(sig, &(struct sigaction) { .sa_handler = notice_signal }, + NULL); + siga