Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior

2019-11-27 Thread Ted Unangst
Marc Espie wrote:
> reorganizing a large part of usr.bin or usr.sbin to just be one
> single variation of bsd.prog.mk with multiple progs and multiple object
> files... works just fine for, say 95% of the binaries in those directories
> 
> (considering there are lots of directories with one single C file or two
> C files, this sounds like a gain for make -jN, right ?)
> 
> End result: no measurable gain.   That on 4 cpu boxes at the time. 
> There are SMP issues to solve before this actually yields any 
> useful result...

I don't know. I wrote a lua script that unrolled all the subdir and ran a
single process make in each directory, up to 4 at a time, and got pretty good
results. It takes 30s on my newer laptop just to run unbound configure.
Nothing else is compiling during that time, and it can be. Just a few of those
can shave minutes off a build.

(This was before clang. The savings is somewhat diminished by the
total time consumed by the gnu directory.)



Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior

2019-11-27 Thread Marc Espie
On Wed, Nov 27, 2019 at 06:21:01PM +0200, Lauri Tirkkonen wrote:
> All that said I do understand if there is reluctance to merge the
> jobserver stuff since it doesn't actually help the current situation in
> most cases. Nevertheless it has been personally beneficial to me in
> identifying areas of improvement, even if those are nothing new to
> OpenBSD developers.

Oh, I'm not opposed to merging a jobserver once it's roughly as stable
as what we have.

My concerns are:
- it should work with all of src and xenocara (once that's done, 
I can do ports)
- it should not interfere with other jobservers in other makes.
- it does not replace the expensive jobs heuristics.
- it does not render make harder to maintain and evolve.

Especially since, as I've said before, there's no actual guarantee anywhere
in POSIX that shells will cooperate with it.

I don't have enough time to do everything I want, but it's well known that
there are still issues with make -j compared to sequential make.

If you look at cvs history, I've completely rewritten job.c a few times.
And I had a presentation at eurobsdcon about what's still missing a few
years ago. I've been swamped with changes in dpb/bsd.port.mk/pkg_add...
recently, but it's still something I want to get back to eventually !

One reason that work goes slowly is because make was a seriously entwined
series of things that all depend on each other... It still is, but slightly
less so, and thus anything that's not squeaky clean is unlikely to get in.

(hence me wanting the jobserver stuff to be reasonably self-contained
and not interfering with the rest of make)



Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior

2019-11-27 Thread Lauri Tirkkonen
On Wed, Nov 27 2019 17:09:09 +0100, Marc Espie wrote:
> I did experiment with something similar a while back:
> 
> reorganizing a large part of usr.bin or usr.sbin to just be one
> single variation of bsd.prog.mk with multiple progs and multiple object
> files... works just fine for, say 95% of the binaries in those directories
> 
> (considering there are lots of directories with one single C file or two
> C files, this sounds like a gain for make -jN, right ?)

yep, exactly my thinking as well.

> End result: no measurable gain.   That on 4 cpu boxes at the time. 
> There are SMP issues to solve before this actually yields any 
> useful result...

I did see (and report) a measurable gain previously in this thread in
one of the environments I was testing this on, exactly in usr.bin; just
under two minutes of improvement with 4 CPUs.
https://marc.info/?l=openbsd-tech&m=157374553407474&w=2 -- see the note
about my test on Intel Atom C2550.

Now, I don't really know why I saw that improvement there, but you did
not in your tests and I did not in the Hyper-V guest that I tested my
most recent diff on; that merits further investigation. Solving the
actual SMP issues involved here would be great; I think the potential
benefits of work in that area are pretty big.

All that said I do understand if there is reluctance to merge the
jobserver stuff since it doesn't actually help the current situation in
most cases. Nevertheless it has been personally beneficial to me in
identifying areas of improvement, even if those are nothing new to
OpenBSD developers.

-- 
Lauri Tirkkonen | lotheac @ IRCnet



Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior

2019-11-27 Thread Marc Espie
On Wed, Nov 27, 2019 at 05:31:48PM +0200, Lauri Tirkkonen wrote:
> I'll give you some more background that I maybe should have given
> earlier already: in my hobby OS Unleashed, we use bmake and earlier did
> some (slightly hacky) modifications to subdir.mk to enable paralellizing
> jobs in multiple subdirs at the same time. I don't remember the exact
> numbers since it's been quite some time, but I was able to cut the base
> build time by *a lot* with just that change -- during the build there
> were previously a lot of idle CPUs. Those results served as motivation
> for me to start looking into doing similar things in OpenBSD, and having
> a jobserver is sort of the first step there.

I did experiment with something similar a while back:

reorganizing a large part of usr.bin or usr.sbin to just be one
single variation of bsd.prog.mk with multiple progs and multiple object
files... works just fine for, say 95% of the binaries in those directories

(considering there are lots of directories with one single C file or two
C files, this sounds like a gain for make -jN, right ?)

End result: no measurable gain.   That on 4 cpu boxes at the time. 
There are SMP issues to solve before this actually yields any 
useful result...



Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior

2019-11-27 Thread Lauri Tirkkonen
On Wed, Nov 27 2019 16:14:44 +0100, Marc Espie wrote:
> On Fri, Nov 15, 2019 at 03:29:29PM +0200, Lauri Tirkkonen wrote:
> > On Fri, Nov 15 2019 15:24:57 +0200, Lauri Tirkkonen wrote:
> > > Your points are valid and I agree with them completely. There are
> > > clearly problems with lock contention,
> > 
> > and I should mention here that I would probably not have observed the
> > magnitude of these problems had I not made make start more jobs at once
> > in the first place. So while the make/mk diffs need more work, we've
> > already identified more areas of improvement as a result of trying them
> > out. But one thing at a time. :)
> 
> Seriously, as much as what you're doing is interesting, you've not identified
> anything new in THAT specific area.  We've run enough build configurations 
> to know about those scalability issues for years.

Fair enough, I was being unfair in assuming otherwise. I guess I wanted
to offer some validation for my efforts or something ;)

> It's cool that you found new races that are hidden by expensive jobs, though,
> that much I fully agree with.

I'll give you some more background that I maybe should have given
earlier already: in my hobby OS Unleashed, we use bmake and earlier did
some (slightly hacky) modifications to subdir.mk to enable paralellizing
jobs in multiple subdirs at the same time. I don't remember the exact
numbers since it's been quite some time, but I was able to cut the base
build time by *a lot* with just that change -- during the build there
were previously a lot of idle CPUs. Those results served as motivation
for me to start looking into doing similar things in OpenBSD, and having
a jobserver is sort of the first step there.

-- 
Lauri Tirkkonen | lotheac @ IRCnet



Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior

2019-11-27 Thread Marc Espie
On Fri, Nov 15, 2019 at 03:29:29PM +0200, Lauri Tirkkonen wrote:
> On Fri, Nov 15 2019 15:24:57 +0200, Lauri Tirkkonen wrote:
> > Your points are valid and I agree with them completely. There are
> > clearly problems with lock contention,
> 
> and I should mention here that I would probably not have observed the
> magnitude of these problems had I not made make start more jobs at once
> in the first place. So while the make/mk diffs need more work, we've
> already identified more areas of improvement as a result of trying them
> out. But one thing at a time. :)

Seriously, as much as what you're doing is interesting, you've not identified
anything new in THAT specific area.  We've run enough build configurations 
to know about those scalability issues for years.

It's cool that you found new races that are hidden by expensive jobs, though,
that much I fully agree with.



Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior

2019-11-27 Thread Lauri Tirkkonen
On Wed, Nov 27 2019 11:49:00 +0100, Marc Espie wrote:
> On Wed, Nov 27, 2019 at 12:42:48PM +0200, Lauri Tirkkonen wrote:
> > In a lesson to always proof-read *before* sending your message:
> > 
> > On Wed, Nov 27 2019 12:31:51 +0200, Lauri Tirkkonen wrote:
> > >  - a full release build with the jobserver enabled passed after
> > >disabling gnu/usr.bin/binutils
> > 
> > this should, in fact, read "gnu/usr.bin/binutils-2.17". binutils builds
> > fine.
> 
> We would need to investigate this.
> 
> Instead of disabling them, send logs

Sure; I just considered these separate problems that happened to be
exposed by the jobserver usage ('expensive' avoids these races since it
does not run more than one sub-make at a time).

The problem I encountered with binutils-2.17 seems to be that configure is run
in 'intl' more than once at the same time. pstree output seems to
support my hypothesis:

 | | |-+= 94071 lotheac make -f Makefile.bsd-wrapper -j16 -x
 | | | \-+- 26584 lotheac /bin/sh -ec SUBDIRS='opcodes bfd binutils ld gas' 
 CONFIGURE_HOST_MODULES='configure-
 | | |   \-+- 71033 lotheac make CC=cc CFLAGS=-O2 -pipe  -DPIE_DEFAULT=1  
LDFLAGS= scriptdir=/usr/libdata toold
 | | | |-+- 08037 lotheac /bin/sh -ec r=`${PWDCMD-pwd}`; export r;  
s=`cd /usr/src/gnu/usr.bin/binutils-2.1
 | | | | \-+- 57596 lotheac make DESTDIR= RPATH_ENVVAR=LD_LIBRARY_PATH 
TARGET_SUBDIR=amd64-unknown-openbsd6
 | | | |   \-+- 46889 lotheac /bin/sh -ec r=`${PWDCMD-pwd}`; export r;  
s=`cd /usr/src/gnu/usr.bin/binutils
 | | | | \-+- 07550 lotheac /bin/sh 
/usr/src/gnu/usr.bin/binutils-2.17/intl/configure --cache-file=./co
 | | | |   \--- 50218 lotheac (grep)
 | | | \-+- 47358 lotheac /bin/sh -ec r=`${PWDCMD-pwd}`; export r;  
s=`cd /usr/src/gnu/usr.bin/binutils-2.1
 | | |   \-+- 26216 lotheac make DESTDIR= RPATH_ENVVAR=LD_LIBRARY_PATH 
TARGET_SUBDIR=amd64-unknown-openbsd6
 | | | \-+- 51511 lotheac /bin/sh -ec r=`${PWDCMD-pwd}`; export r;  
s=`cd /usr/src/gnu/usr.bin/binutils
 | | |   \-+- 22785 lotheac /bin/sh 
/usr/src/gnu/usr.bin/binutils-2.17/intl/configure --cache-file=./co
 | | | \-+- 25979 lotheac /bin/sh 
/usr/src/gnu/usr.bin/binutils-2.17/intl/configure --cache-file=./
 | | |   \--- 51834 lotheac (cc)

Partial log of the failure below. I have a full log of the make
invocation as well but I think the relevant parts are included here.

[ ... ]
cd /usr/src/gnu/usr.bin/binutils-2.17/obj/ld && make ld.1
cd /usr/src/gnu/usr.bin/binutils-2.17/obj/gas/doc && make as.1
`as.1' is up to date.
`ld.1' is up to date.
mv ld/ld.1 ld/ld.bfd.1
SUBDIRS='opcodes bfd binutils ld gas'  
CONFIGURE_HOST_MODULES='configure-opcodes configure-bfd configure-binutils 
configure-ld configure-gas'  make CC="cc" CFLAGS="-O2 -pipe  -DPIE_DEFAULT=1 " 
LDFLAGS=""  scriptdir=/usr/libdata  tooldir=/usr  MAKEINFO='makeinfo 
--no-split'  MAKEINFOFLAGS=''  BSDSRCDIR=/usr/src  ALL_MODULES="all-opcodes 
all-bfd all-binutils all-ld all-gas"  ALL_HOST_MODULES='all-opcodes all-bfd 
all-binutils all-ld all-gas'  INFO_HOST_MODULES='info-opcodes info-bfd 
info-binutils info-ld info-gas' all info
Configuring in ./intl
Configuring in ./intl
Doing info in opcodes
Doing info in bfd
Doing info in binutils
Doing info in gas
Doing info in ld
Making info in po
Making info in doc
touch ld.1
Making info in doc
perl /usr/src/gnu/usr.bin/binutils-2.17/ld/../etc/texi2pod.pl -I 
/usr/src/gnu/usr.bin/binutils-2.17/ld -I 
/usr/src/gnu/usr.bin/binutils-2.17/ld/../bfd/doc -I 
/usr/src/gnu/usr.bin/binutils-2.17/ld/../../../lib/libiberty/src -Dman < 
/usr/src/gnu/usr.bin/binutils-2.17/ld/ld.texinfo > ld.pod
Making info in doc
rm -f /usr/src/gnu/usr.bin/binutils-2.17/bfd/doc/bfd.info 
/usr/src/gnu/usr.bin/binutils-2.17/bfd/doc/bfd.info-[0-9] 
/usr/src/gnu/usr.bin/binutils-2.17/bfd/doc/bfd.info-[0-9][0-9]
if test -f cxxfilt.man; then  man=cxxfilt.man;  else  
man=/usr/src/gnu/usr.bin/binutils-2.17/binutils/doc/cxxfilt.man;  fi;  sed -e 
's/@PROGRAM@/c++filt/'  -e 's/cxxfilt/c++filt/' < $man  > c++filt.1
makeinfo --no-split -I /usr/src/gnu/usr.bin/binutils-2.17/bfd/doc 
/usr/src/gnu/usr.bin/binutils-2.17/bfd/doc/bfd.texinfo
Making info in po
makeinfo --no-split   -I "/usr/src/gnu/usr.bin/binutils-2.17/binutils/doc" -I 
"/usr/src/gnu/usr.bin/binutils-2.17/binutils/../../../lib/libiberty/src"  -I 
/usr/src/gnu/usr.bin/binutils-2.17/binutils/doc 
/usr/src/gnu/usr.bin/binutils-2.17/binutils/doc/binutils.texi
Making info in po
loading cache ./config.cache
Making info in po
loading cache ./config.cache
checking for a BSD compatible install... /usr/bin/install -c
checking how to run the C preprocessor... (pod2man --center="GNU Development 
Tools"  --release="binutils-2.17" --section=1 ld.pod |  sed -e '/^.if n .na/d' 
> ld.1.T$$ &&  mv -f ld.1.T$$ ld.1) ||  (rm -f ld.1.T$$ && exit 1)
checking for a BSD compatible install... /usr/bin/install -c
checking how to run the C pre

Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior

2019-11-27 Thread Marc Espie
On Wed, Nov 27, 2019 at 12:42:48PM +0200, Lauri Tirkkonen wrote:
> In a lesson to always proof-read *before* sending your message:
> 
> On Wed, Nov 27 2019 12:31:51 +0200, Lauri Tirkkonen wrote:
> >  - a full release build with the jobserver enabled passed after
> >disabling gnu/usr.bin/binutils
> 
> this should, in fact, read "gnu/usr.bin/binutils-2.17". binutils builds
> fine.

We would need to investigate this.

Instead of disabling them, send logs



Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior

2019-11-27 Thread Lauri Tirkkonen
In a lesson to always proof-read *before* sending your message:

On Wed, Nov 27 2019 12:31:51 +0200, Lauri Tirkkonen wrote:
>  - a full release build with the jobserver enabled passed after
>disabling gnu/usr.bin/binutils

this should, in fact, read "gnu/usr.bin/binutils-2.17". binutils builds
fine.

-- 
Lauri Tirkkonen | lotheac @ IRCnet



Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior

2019-11-27 Thread Lauri Tirkkonen
New diff, almost entirely rewritten and slightly redesigned, based on
your previous feedback as well as problems uncovered in testing. Sorry
it's taken so long; I've been a bit stingy with my spare time with
respect to this :)

Description of the design:

 - the top-level make initializes jobserver, if requested, by creating
   a pair of sockets with socketpair(). It keeps one side ("master") for
   itself only (O_CLOEXEC), passing the other side to all children
   ("slave"). It then adds the fd number for the inheritable slave fd to
   MAKEFLAGS with '-J'. these sockets are set SOCK_NONBLOCK.
 - the master initializes the number of available job tokens to maxJobs
   (ie. from -j).
 - every Job keeps track of the kind of job token it is currently
   holding (enum job_token_type). when the jobserver master starts a
   job, it allocates a JOB_TOKEN_LOCAL type token to the job and
   decrements the number of available tokens.
 - child makes, noticing -J in MAKEFLAGS, initialize the number of
   available job tokens to 1 -- this is because an upper-level make has
   already allocated a job token to the job that ended up starting this
   make process.
 - if a slave make's jobserver.tokens is 0, it asks the master for a
   token via the socket. This happens by writing a '-' character to the
   socket, and waiting for a response.
 - in the master process, handle_running_jobs uses ppoll() to wait for
   either communication from slaves or signals. when it receives a token
   request ('-' character read from the socket), it checks if it has
   available job tokens, and if so, decrements the number of available
   tokens and sends back a response ('.' byte, but the value isn't
   actually checked by slaves, so any byte would do). if the master does
   not have available tokens to fulfill the request, it increments
   jobserver.waiting so that it can fulfill the request later when
   currently in-use tokens are reclaimed.
 - when the slave make receives a byte over the socket from master, it
   stops waiting, sets the Job's token type to JOB_TOKEN_REMOTE and
   continues executing the job. when the job stops executing
   (determine_job_next_step(), or when the job finishes in
   continue_job()), the token is released back to master by writing a
   '+' to the socket.
 - when the master sees a '+' character on the socket, it increments
   jobserver.tokens, and subsequently sends the newly reclaimed token(s)
   to the socket if there are slaves waiting (jobserver.waiting != 0)
 - if the communication between the master and slave has been severed
   (eg. if an intermediate process has closed the fd), the slave sets
   maxJobs to 1, disables the jobserver, prints a warning to stderr and
   proceeds to run one job at a time.
 - more severe error conditions exist; I've commented on these in the
   code so I won't repeat here.

Testing notes:

 - a full release build with the jobserver enabled passed after
   disabling gnu/usr.bin/binutils and gnu/usr.bin/perl subdirs. Those
   dirs may fail to build with jobserver because there are races that
   are hit when parallelizing with the jobserver, but not with
   'expensive' heuristics.
 - as described earlier, my testing setup seems to have high lock
   contention for reasons I don't know. as such, the results for
   enabling the jobserver for 'make -j16 build' are slower in real-time
   than without the jobserver:

$ time doas make -j16 build
  209m03.63s real   388m15.88s user  3057m43.77s system

$ time doas make -j16 -x build
  214m27.57s real   381m11.55s user  3179m29.82s system
   
   that means that for this particular setup, the jobserver
   implementation is only a small part of the puzzle of improving
   performance (and does the opposite without the other pieces :).
   subdir.mk is still essentially single-threaded (I sent an unfinished,
   somewhat broken PoC patch for that earlier), which is a big
   opportunity for improvement when using the jobserver, and obviously
   for this machine the high spin% problems should be solved if I want
   the build to actually go faster when it's running more processes.

   I did mention that on actual hardware I had better results in
   previous smaller-scale tests, but I have not run a 'make build' on
   that actual hardware since it's essentially a production machine I
   use.
 - I have not tested ports builds.
 - I have only tested this on amd64.
 - regression tests have not been written.

Other general notes:

 - manpage parts are not yet written.
 - I don't particularly like the option letter I picked to enable the
   jobserver, '-x'. I don't know what would be better, though; one
   option that crossed my mind was using -J and passing the fd number to
   children in another environment variable instead of MAKEFLAGS -J, but
   I'm not sure. I would appreciate help in picking the color for this
   shed :)
 - a new debug flag J is added for jobserver debug messages. they are
   perhaps a little 

Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior

2019-11-15 Thread Lauri Tirkkonen
On Fri, Nov 15 2019 15:24:57 +0200, Lauri Tirkkonen wrote:
> Your points are valid and I agree with them completely. There are
> clearly problems with lock contention,

and I should mention here that I would probably not have observed the
magnitude of these problems had I not made make start more jobs at once
in the first place. So while the make/mk diffs need more work, we've
already identified more areas of improvement as a result of trying them
out. But one thing at a time. :)

-- 
Lauri Tirkkonen | lotheac @ IRCnet



Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior

2019-11-15 Thread Lauri Tirkkonen
On Fri, Nov 15 2019 14:16:44 +0100, Martin Pieuchot wrote:
> > Yes, that was my point exactly. Less jobs didn't fare any better (well,
> > it had less spin time, but took around the same real time), so the
> > conclusion I arrived at was that something in my setup was eventually
> > contending on a small number of locks. My guess is that it's either the
> > filesystem, the IDE driver, something Hyper-V specific, or a combination
> > of the above.
> 
> What does it bring to guess?  Why don't you look deeper where the
> contention is?
> 
> > This change is all about utilizing CPUs better in parallelizing existing
> > workloads, so I wouldn't expect a very large change in user time (but it
> > should happen over a smaller amout of real time).
> 
> Is this change about better parallelizing?  Do we see that?  Or is it a
> guess?  If we want OpenBSD to do a better job at parallelizing maybe we
> should look at where the contention is and then how to get rid of it?
> 
> > > You can also write Makefiles that expose less the limitation of the
> > > system.  ktrace(1) is your friend for that.
> > 
> > The idea was to test real-world workloads, ie. actual OpenBSD builds. I
> > do have enough memory on this thing to place objs in mfs; maybe I'll try
> > that next time around.
> 
> I'd suggest you to spend your time understanding where is the bottleneck
> instead of randomly trying to change stuff :)

Your points are valid and I agree with them completely. There are
clearly problems with lock contention, and there are problems with
resource utilisation due to make not starting enough jobs. While they
are related, I kind of have to pick my battles one by one not to descend
too far into yak shaving territory. Given infinite time, let's fix
everything, but since that is not afforded to me, some educated guesses
have to be made to be able to test the right thing in the meantime. ;)

-- 
Lauri Tirkkonen | lotheac @ IRCnet



Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior

2019-11-15 Thread Martin Pieuchot
On 15/11/19(Fri) 15:03, Lauri Tirkkonen wrote:
> On Thu, Nov 14 2019 17:57:20 +0100, Martin Pieuchot wrote:
> > On 14/11/19(Thu) 17:30, Lauri Tirkkonen wrote:
> > > [...] 
> > > The first test I made was on a beefy virtual machine under Hyper-V,
> > > with 32 vCPUs on a Threadripper 2950X. With and without this patch (and
> > > the followup for share/mk), I did make -j32 -C /usr/src/usr.bin. The
> > > results were very disappointing: with my patches the build was *slower*
> > > even though it used way more CPU. I think 'top' said it best:
> > > 
> > > 32  CPUs:  0.1% user,  0.0% nice,  3.9% sys, 95.4% spin,  0.0% intr, 0.7% 
> > > idle
> > >  ^^
> > 
> > I'd suggest considering the %user time when working on a userland tool.
> > A high %spin time indicates that syscalls and page faults are spending
> > a lot of time busy waiting on the KERNEL_LOCK().
> > 
> > In other words you're exposing the non-scalability of OpenBSD on such
> > machine.  I'd suggest you use less jobs, I'd try 8 or 12 as a first step.
> 
> Yes, that was my point exactly. Less jobs didn't fare any better (well,
> it had less spin time, but took around the same real time), so the
> conclusion I arrived at was that something in my setup was eventually
> contending on a small number of locks. My guess is that it's either the
> filesystem, the IDE driver, something Hyper-V specific, or a combination
> of the above.

What does it bring to guess?  Why don't you look deeper where the
contention is?

> This change is all about utilizing CPUs better in parallelizing existing
> workloads, so I wouldn't expect a very large change in user time (but it
> should happen over a smaller amout of real time).

Is this change about better parallelizing?  Do we see that?  Or is it a
guess?  If we want OpenBSD to do a better job at parallelizing maybe we
should look at where the contention is and then how to get rid of it?

> > You can also write Makefiles that expose less the limitation of the
> > system.  ktrace(1) is your friend for that.
> 
> The idea was to test real-world workloads, ie. actual OpenBSD builds. I
> do have enough memory on this thing to place objs in mfs; maybe I'll try
> that next time around.

I'd suggest you to spend your time understanding where is the bottleneck
instead of randomly trying to change stuff :)



Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior

2019-11-15 Thread Lauri Tirkkonen
Hi Martin,

On Thu, Nov 14 2019 17:57:20 +0100, Martin Pieuchot wrote:
> On 14/11/19(Thu) 17:30, Lauri Tirkkonen wrote:
> > [...] 
> > The first test I made was on a beefy virtual machine under Hyper-V,
> > with 32 vCPUs on a Threadripper 2950X. With and without this patch (and
> > the followup for share/mk), I did make -j32 -C /usr/src/usr.bin. The
> > results were very disappointing: with my patches the build was *slower*
> > even though it used way more CPU. I think 'top' said it best:
> > 
> > 32  CPUs:  0.1% user,  0.0% nice,  3.9% sys, 95.4% spin,  0.0% intr, 0.7% 
> > idle
> >  ^^
> 
> I'd suggest considering the %user time when working on a userland tool.
> A high %spin time indicates that syscalls and page faults are spending
> a lot of time busy waiting on the KERNEL_LOCK().
> 
> In other words you're exposing the non-scalability of OpenBSD on such
> machine.  I'd suggest you use less jobs, I'd try 8 or 12 as a first step.

Yes, that was my point exactly. Less jobs didn't fare any better (well,
it had less spin time, but took around the same real time), so the
conclusion I arrived at was that something in my setup was eventually
contending on a small number of locks. My guess is that it's either the
filesystem, the IDE driver, something Hyper-V specific, or a combination
of the above.

SATA on bare metal didn't spin nearly as badly, which is why I suspected
IDE. But this is a bit of a digression :)

This change is all about utilizing CPUs better in parallelizing existing
workloads, so I wouldn't expect a very large change in user time (but it
should happen over a smaller amout of real time).

> You can also write Makefiles that expose less the limitation of the
> system.  ktrace(1) is your friend for that.

The idea was to test real-world workloads, ie. actual OpenBSD builds. I
do have enough memory on this thing to place objs in mfs; maybe I'll try
that next time around.

-- 
Lauri Tirkkonen | lotheac @ IRCnet



Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior

2019-11-15 Thread Martin Pieuchot
On 14/11/19(Thu) 17:30, Lauri Tirkkonen wrote:
> [...] 
> The first test I made was on a beefy virtual machine under Hyper-V,
> with 32 vCPUs on a Threadripper 2950X. With and without this patch (and
> the followup for share/mk), I did make -j32 -C /usr/src/usr.bin. The
> results were very disappointing: with my patches the build was *slower*
> even though it used way more CPU. I think 'top' said it best:
> 
> 32  CPUs:  0.1% user,  0.0% nice,  3.9% sys, 95.4% spin,  0.0% intr, 0.7% idle
>  ^^

I'd suggest considering the %user time when working on a userland tool.
A high %spin time indicates that syscalls and page faults are spending
a lot of time busy waiting on the KERNEL_LOCK().

In other words you're exposing the non-scalability of OpenBSD on such
machine.  I'd suggest you use less jobs, I'd try 8 or 12 as a first step.

You can also write Makefiles that expose less the limitation of the
system.  ktrace(1) is your friend for that.



Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior

2019-11-14 Thread Marc Espie
On Thu, Nov 14, 2019 at 05:30:34PM +0200, Lauri Tirkkonen wrote:
> > > @@ -347,10 +360,9 @@ print_errors(void)
> > >  static void
> > >  setup_signal(int sig)
> > >  {
> > > - if (signal(sig, SIG_IGN) != SIG_IGN) {
> > > - (void)signal(sig, notice_signal);
> > > - sigaddset(&sigset, sig);
> > > - }
> > > + sigaction(sig, &(struct sigaction) { .sa_handler = notice_signal },
> > > + NULL);
> > > + sigaddset(&sigset, sig);
> > >  }
> > Why did you change this ? this is gratuitous.
> 
> Actually, it's not gratuitous, but it's subtle. signal() enables
> SA_RESTART, but we depend on EINTR returns in a couple places, so I did
> this. I should've mentioned this originally, sorry.

Err, thinking some more. You can't do that. This affects ALL of make's code
and not just this file.  Some parts are belt-and-suspenders type and include
checks for EINTR, but NOWHERE near every part.

In particular, you *lose* stdio.h entirely.



Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior

2019-11-14 Thread Marc Espie
On Thu, Nov 14, 2019 at 05:30:34PM +0200, Lauri Tirkkonen wrote:
> Yeah, you're right, it reads like crap, I'll rewrite it. The gist of the
> thing is that 'expensive' heuristic is now only used to decide whether
> or not to print the failed command.

I object to that. Keep the "expensive" heuristics around. 
The current -jn should still work *without* the jobserver mechanism.

> > >  static void
> > > -loop_handle_running_jobs()
> > > +loop_handle_running_jobs(void)
> > >  {
> > >   while (runningJobs != NULL)
> > >   handle_running_jobs();
> > >  }
> > Yet another gratuitous change.  There is already a perfectly fine prototype
> > that asserts (void).
> 
> A function prototype with empty parentheses does not assert (void) in C
> (but does in C++). However, you're correct about it not being related to
> this change.
See above. There are prototypes for ALL static functions in that file.
You don't need the void in the function definition when the function
declaration (a few hundred lines above) already has it.

> This was also a bit concerning to me, but that's what bmake/gmake also
> are doing (albeit with more fd's). Eventually the potential gains
> outweighed my concerns: every SUBDIR in bsd.subdir.mk is currently built
> serially, which has quite a large impact on how many jobs are running at
> any one time.
> 
> An alternative could be to use unix sockets with paths instead and pass
> the path through the environment, but I'm not convinced I like that any
> better.
There is the huge problem of where you put your paths.  NFS comes in the
way AND socket names are limited to 256 characters.



Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior

2019-11-14 Thread Lauri Tirkkonen
Hi,

thanks for your comments. More responses inline, but I'd like to mention
up front that further testing I've made agrees with you; this isn't
ready by any means. I observed one hang I've yet to debug (very bad),
and the speedups weren't that great in some cases either (more on that
later).

I don't have a new version of the diff fixing the problems (at least not
yet), but I'll try to give a bit more motivation/background for my
changes.

On Thu, Nov 14 2019 15:24:53 +0100, Marc Espie wrote:
> Sorry, I missed your initial message (thanks to naddy@ for telling me about 
> it)
> 
> A jobserver was considered in make before implementing expensive.
> 
> The main reason it was not done that way is that you can't guarantee a
> file descriptor will make it through the intermediate shell.  
> 
> In fact, POSIX doesn't guarantee anything, and a lot of shells actively close
> file descriptors they don't know about.

Yes, I'm aware of this. Other makes seem to do fine though, and they use
the same mechanism of passing file descriptors: leave them open, and
communicate their existence through environment variables.

However, it's a good point; failure to access the fd should fall back to
single-job semantics in the child at that point, and I need to consider
what happens in the top-level make in such a case as well. More testing
is obviously needed.

> Your example is not complicated enough, as your commands mean NO 
> shell will be spawned (everything will pass directly through to 
> submakes, which is BTW a large reason for implementing -C, which cuts 
> down on the number of intermediate processes).

True, the test needs more work. Thanks for pointing it out, I somehow
forgot about the optimization about avoiding a shell while I was writing
the previous mail.

> I'm not sure I like added complexity in that part of make.
> 
> > I haven't yet built the entire tree with this, just cautiously probing
> > the waters first to see if there is interest :)
> 
> I'd like some actual numbers on real world applications: what does it change
> for the actual build through the full source tree.   Areas like perl are where
> the job explosion occurs and where the expensive heuristics got refined.

As I mentioned in the preface above, a full build actually ended up
hanging somewhere in gnu/ :-)

However I do have some results from smaller real-world tests.
Unfortunately I haven't yet gotten around to bigger ones (and besides I
need to fix the outstanding issues first for the tests to make actual
sense).

The first test I made was on a beefy virtual machine under Hyper-V,
with 32 vCPUs on a Threadripper 2950X. With and without this patch (and
the followup for share/mk), I did make -j32 -C /usr/src/usr.bin. The
results were very disappointing: with my patches the build was *slower*
even though it used way more CPU. I think 'top' said it best:

32  CPUs:  0.1% user,  0.0% nice,  3.9% sys, 95.4% spin,  0.0% intr, 0.7% idle
 ^^

The Hyper-V guest uses an IDE disk, and my hypothesis is that there may
be some locking problems there.

I repeated the test on actual hardware, an Intel Atom C2550 board with 4
CPUs, and SATA disks. This showed an actual build time improvement
(though I admit I just did one trial so far):

before
make -j4  817.49s user 340.95s system 268% cpu 7:11.44 total
after
make -j4  831.34s user 352.72s system 373% cpu 5:16.88 total

which is not too shabby if you ask me.

> Comments on the patch proper:

> >  static int aborting = 0;   /* why is the make aborting? */
> >  #define ABORT_ERROR1   /* Because of an error */
> > @@ -123,12 +128,15 @@ static intaborting = 0;   /* why is the make 
> > aborting? */
> >  
> >  static int maxJobs;/* The most children we can run at once */
> >  static int nJobs;  /* Number of jobs already allocated */
> > -static boolno_new_jobs;/* Mark recursive shit so we shouldn't 
> > start
> > -* something else at the same time
> > -*/
> I don't think ditching expensive entirely is a good idea anyhow.
> What happens if the jobserver fails ?  I see it goes to err all the time.
> BTW, going to err is bad. Make handles processes. Most errors MUST do 
> something sane, like waiting for processes to end in many many cases.

You're absolutely right, error handling is currently complete garbage.
And jobserver failures should be handled gracefully, I agree. I'll see
if I can come up with something.

> > +
> > +struct jobserver {
> > +   int master; /* master deposits job tokens into this fd */
> > +   int slave;  /* slaves consume job tokens from this fd */
> > +   volatile sig_atomic_t tokens;   /* currently available tokens */
> > +};
> you didn't really read the code, did you ? why volatile sig_atomic_t ?

I wrote this code, I'm not sure how I could have avoided reading it :-)

But I must conced

Re: [PATCH] make: implement jobserver and use it to avoid exponential behavior

2019-11-14 Thread Marc Espie
On Tue, Nov 05, 2019 at 03:22:08PM +, Lauri Tirkkonen wrote:
> Hi,
> 
> make has the concept of 'expensive' jobs: if the command line includes
> the word "make" in it, the command is considered expensive and other
> jobs are held until that job finishes. It does this to avoid exponential
> behavior when parallelizing with -j.
> 
> Some other makes (eg. bmake and gmake) use a jobserver, ie. a pair of
> pipes between child makes and the top-level make: the top-level make
> provides job tokens to children via these pipes, which are returned when
> the jobs complete. This ensures that recursive makes only run a certain
> number of jobs in total at any one time, while achieving greater
> resource utilisation (no need to wait for an 'expensive' job to finish
> before starting others).
> 
> This diff implements a similar jobserver in make. Like bmake and gmake,
> job tokens are provided by the top-level make (called "jobserver
> master"), used by children ("jobserver slaves") when they run commands,
> and returned when those commands finish. Like bmake, the internal,
> undocumented command line option to tell (possibly indirect, via
> MAKEFLAGS)) children about the file descriptor to use is -J, but unlike
> bmake and gmake, we use socketpair(AF_UNIX, SOCK_STREAM) instead of
> pipes, and therefore only need to pass one file descriptor to children
> instead of two.

Sorry, I missed your initial message (thanks to naddy@ for telling me about it)

A jobserver was considered in make before implementing expensive.

The main reason it was not done that way is that you can't guarantee a
file descriptor will make it through the intermediate shell.  

In fact, POSIX doesn't guarantee anything, and a lot of shells actively close
file descriptors they don't know about.

Your example is not complicated enough, as your commands mean NO 
shell will be spawned (everything will pass directly through to 
submakes, which is BTW a large reason for implementing -C, which cuts 
down on the number of intermediate processes).

I'm not sure I like added complexity in that part of make.

> I haven't yet built the entire tree with this, just cautiously probing
> the waters first to see if there is interest :)

I'd like some actual numbers on real world applications: what does it change
for the actual build through the full source tree.   Areas like perl are where
the job explosion occurs and where the expensive heuristics got refined.


> Tested with the following Makefile structure:
> 
>   $ cat recursive/Makefile
>   .PHONY: a b c d
> 
>   all: a b c d
> 
>   a:
>   make -C a
>   b:
>   make -C a
>   c:
>   make -C a
>   d:
>   make -C a
>   $ cat recursive/a/Makefile
>   all: 1 2 3 4
> 
>   1:
>   sleep 1
>   2:
>   sleep 2
>   3:
>   sleep 3
>   4:
>   sleep 4
> 
> the speedup is as follows with -j4:
> 
> before:
>   make -C recursive -j4  0.01s user 0.10s system 0% cpu 16.097 total
> after:
>   make -C recursive -j4  0.02s user 0.15s system 1% cpu 11.082 total

Comments on the patch proper:

> ---
>  usr.bin/make/job.c  | 317 +++-
>  usr.bin/make/job.h  |   7 +-
>  usr.bin/make/main.c |  31 -
>  usr.bin/make/main.h |   3 +
>  4 files changed, 290 insertions(+), 68 deletions(-)
> 
> diff --git a/usr.bin/make/job.c b/usr.bin/make/job.c
> index ebbae5306da..b36aaeacac8 100644
> --- a/usr.bin/make/job.c
> +++ b/usr.bin/make/job.c
> @@ -90,11 +90,15 @@
>   *   Job_WaitWait for all running jobs to finish.
>   */
>  
> +#include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
> +#include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -115,6 +119,7 @@
>  #include "memory.h"
>  #include "make.h"
>  #include "buf.h"
> +#include "main.h"
>  
>  static int   aborting = 0;   /* why is the make aborting? */
>  #define ABORT_ERROR  1   /* Because of an error */
> @@ -123,12 +128,15 @@ static int  aborting = 0;   /* why is the make 
> aborting? */
>  
>  static int   maxJobs;/* The most children we can run at once */
>  static int   nJobs;  /* Number of jobs already allocated */
> -static bool  no_new_jobs;/* Mark recursive shit so we shouldn't start
> -  * something else at the same time
> -  */
I don't think ditching expensive entirely is a good idea anyhow.
What happens if the jobserver fails ?  I see it goes to err all the time.
BTW, going to err is bad. Make handles processes. Most errors MUST do 
something sane, like waiting for processes to end in many many cases.

> +
> +struct jobserver {
> + int master; /* master deposits job tokens into this fd */
> + int slave;  /* slaves consume job tokens from this fd */
> + volatile sig_atomic_t tokens;   /* current

[PATCH] make: implement jobserver and use it to avoid exponential behavior

2019-11-05 Thread Lauri Tirkkonen
Hi,

make has the concept of 'expensive' jobs: if the command line includes
the word "make" in it, the command is considered expensive and other
jobs are held until that job finishes. It does this to avoid exponential
behavior when parallelizing with -j.

Some other makes (eg. bmake and gmake) use a jobserver, ie. a pair of
pipes between child makes and the top-level make: the top-level make
provides job tokens to children via these pipes, which are returned when
the jobs complete. This ensures that recursive makes only run a certain
number of jobs in total at any one time, while achieving greater
resource utilisation (no need to wait for an 'expensive' job to finish
before starting others).

This diff implements a similar jobserver in make. Like bmake and gmake,
job tokens are provided by the top-level make (called "jobserver
master"), used by children ("jobserver slaves") when they run commands,
and returned when those commands finish. Like bmake, the internal,
undocumented command line option to tell (possibly indirect, via
MAKEFLAGS)) children about the file descriptor to use is -J, but unlike
bmake and gmake, we use socketpair(AF_UNIX, SOCK_STREAM) instead of
pipes, and therefore only need to pass one file descriptor to children
instead of two.

I haven't yet built the entire tree with this, just cautiously probing
the waters first to see if there is interest :)

Tested with the following Makefile structure:

$ cat recursive/Makefile
.PHONY: a b c d

all: a b c d

a:
make -C a
b:
make -C a
c:
make -C a
d:
make -C a
$ cat recursive/a/Makefile
all: 1 2 3 4

1:
sleep 1
2:
sleep 2
3:
sleep 3
4:
sleep 4

the speedup is as follows with -j4:

before:
make -C recursive -j4  0.01s user 0.10s system 0% cpu 16.097 total
after:
make -C recursive -j4  0.02s user 0.15s system 1% cpu 11.082 total

---
 usr.bin/make/job.c  | 317 +++-
 usr.bin/make/job.h  |   7 +-
 usr.bin/make/main.c |  31 -
 usr.bin/make/main.h |   3 +
 4 files changed, 290 insertions(+), 68 deletions(-)

diff --git a/usr.bin/make/job.c b/usr.bin/make/job.c
index ebbae5306da..b36aaeacac8 100644
--- a/usr.bin/make/job.c
+++ b/usr.bin/make/job.c
@@ -90,11 +90,15 @@
  * Job_WaitWait for all running jobs to finish.
  */
 
+#include 
 #include 
 #include 
+#include 
 #include 
 #include 
+#include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -115,6 +119,7 @@
 #include "memory.h"
 #include "make.h"
 #include "buf.h"
+#include "main.h"
 
 static int aborting = 0;   /* why is the make aborting? */
 #define ABORT_ERROR1   /* Because of an error */
@@ -123,12 +128,15 @@ static intaborting = 0;   /* why is the make 
aborting? */
 
 static int maxJobs;/* The most children we can run at once */
 static int nJobs;  /* Number of jobs already allocated */
-static boolno_new_jobs;/* Mark recursive shit so we shouldn't start
-* something else at the same time
-*/
+
+struct jobserver {
+   int master; /* master deposits job tokens into this fd */
+   int slave;  /* slaves consume job tokens from this fd */
+   volatile sig_atomic_t tokens;   /* currently available tokens */
+};
+static struct jobserver jobserver;
 Job *runningJobs;  /* Jobs currently running a process */
 Job *errorJobs;/* Jobs in error at end */
-static Job *heldJobs;  /* Jobs not running yet because of expensive */
 static pid_t mypid;/* Used for printing debugging messages */
 
 static volatile sig_atomic_t got_fatal;
@@ -144,11 +152,16 @@ static void postprocess_job(Job *, bool);
 static Job *prepare_job(GNode *);
 static void determine_job_next_step(Job *);
 static void remove_job(Job *, bool);
-static void may_continue_job(Job *);
 static void continue_job(Job *);
 static Job *reap_finished_job(pid_t);
 static bool reap_jobs(void);
 
+static void jobserver_init(unsigned, int);
+static bool jobserver_is_slave(void);
+static void jobserver_master_reclaim_tokens(void);
+static void jobserver_master_send_tokens(void);
+static void jobserver_acquire_token(Job *job);
+static void jobserver_release_token(Job *job);
 static void loop_handle_running_jobs(void);
 static bool expensive_job(Job *);
 static bool expensive_command(const char *);
@@ -347,10 +360,9 @@ print_errors(void)
 static void
 setup_signal(int sig)
 {
-   if (signal(sig, SIG_IGN) != SIG_IGN) {
-   (void)signal(sig, notice_signal);
-   sigaddset(&sigset, sig);
-   }
+   sigaction(sig, &(struct sigaction) { .sa_handler = notice_signal },
+   NULL);
+   siga