Summary: Running 'distcc' w/ zeroconf support under the attached
wrapper (as 'with-closed-fds distcc [distcc options]') prevents odd
hangs.
Long version:
[This explanation ended up a little longer than I'd've liked, but
nonetheless, might be interesting...]
Sometime between version 0.36-or-so and now, I noticed that there would
often be odd pauses when running even very simple commands with paludis.
It turns out to be a weird interaction among distcc, zeroconf, and
outputwrapper.
In /etc/distcc/hosts:
+zeroconf
In my /etc/paludis/bashrc, among a bunch of other things:
if [ "$_OK_DISTCC" = "yes" ] && distcc --version &> /dev/null ; then
    DISTCC_DIR="/var/tmp/paludis/.distcc"
    PATH="/usr/lib/distcc/bin:$PATH"
    SANDBOX_WRITE="$SANDBOX_WRITE:$DISTCC_DIR"
    : ${_make_jobs="$(distcc -j)"}
    : ${_make_jobs:=2}
    _make_jobs=$(( $_make_jobs / 2 ))
fi
#...
MAKEOPTS="$MAKEOPTS -j$_make_jobs"
The $_OK_DISTCC check allows me to disable distcc (on gcc builds, for
example, where the parallelism seems to cause out-of-order problems, or
when I just don't want the excess load on my system). And the
_make_jobs=$(distcc -j) setting looks for how many distcc hosts are
available at the time.
The problem is that 'distcc -j' with zeroconf fires off a daemon. (See
http://lists.samba.org/archive/distcc/2004q4/002774.html for the
justification. Basically, the startup cost of collecting mDNS
information is worth avoiding in a build that calls distcc many times.)
I saw in paludis/util/output_wrapper.cc that 'outputwrapper' does a wait
for its child to finish. And I saw in distcc's src/zeroconf.c that it
does a pretty standard daemonization process:
    pid = fork()
    in the child:
      1. close fd's 0, 1, 2
      2. open "/dev/null", dup it twice, making sure fd's are 0, 1, and 2
      3. chdir "/"
      4. on systems that have it, setsid()
      5. collect the info, and wait up to 20 seconds for further zeroconf queries
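
As a rough C rendering of the steps above (a sketch only, not the actual code from distcc's src/zeroconf.c; the names stdio_to_devnull and spawn_daemon_sketch and the work callback are made up for illustration):

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Steps 1 and 2: close fds 0-2, then reopen them on /dev/null.
 * open() and dup() hand back the lowest free fd, so with 0-2 just
 * closed they should come back as 0, 1 and 2, in that order. */
int stdio_to_devnull(void) {
    close(0);
    close(1);
    close(2);
    if (open("/dev/null", O_RDWR) != 0) return -1;
    if (dup(0) != 1) return -1;
    if (dup(0) != 2) return -1;
    return 0;
}

/* Hypothetical wrapper around the whole sequence. Returns the child's
 * pid in the parent (or -1 if fork failed); the child never returns. */
pid_t spawn_daemon_sketch(void (*work)(void)) {
    pid_t pid = fork();
    if (pid != 0) return pid;              /* parent, or fork failure */

    if (stdio_to_devnull() < 0) _exit(1);  /* steps 1 and 2 */
    if (chdir("/") < 0) _exit(1);          /* step 3 */
    setsid();                              /* step 4 */
    if (work) work();                      /* step 5: whatever the daemon does */
    _exit(0);
}
```

Nothing in this sequence touches fd's 3 and up, so anything else the parent had open at fork() time stays open in the daemon.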
So, given the way 'outputwrapper' works, the problem is that
'outputwrapper's fd's aren't in the set that gets closed by 'distcc'
before daemonizing, so the daemon keeps them open for as long as it
runs. The result was a pause of about 20 seconds every time 'bashrc'
got sourced, unless I happened to have run 'distcc' outside of paludis
(so that the daemon was already running outside of outputwrapper).
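
The effect can be reproduced in isolation. The sketch below assumes the wrapper captures its child's output through a pipe and reads until EOF, which has not been checked against output_wrapper.cc. The function name seconds_until_eof and both of its parameters are invented for the demonstration.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Returns how many seconds the parent waits for EOF on a pipe whose
 * write end was inherited by a lingering grandchild (-1.0 on error).
 * close_extra_fds != 0 makes the grandchild close the inherited fd. */
double seconds_until_eof(int close_extra_fds, unsigned linger_seconds) {
    int pipefd[2];
    struct timespec start, end;
    char buf[64];

    if (pipe(pipefd) < 0) return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &start);

    pid_t child = fork();
    if (child < 0) return -1.0;
    if (child == 0) {
        close(pipefd[0]);
        if (fork() == 0) {
            /* Stand-in for the daemon: drop fds 0-2 only, as in the
             * pattern described above, then linger for a while. */
            close(0);
            close(1);
            close(2);
            if (close_extra_fds) close(pipefd[1]);
            sleep(linger_seconds);
            _exit(0);
        }
        _exit(0);   /* stand-in for 'distcc -j' returning right away */
    }

    /* Stand-in for the wrapper: read the child's output until EOF. */
    close(pipefd[1]);
    while (read(pipefd[0], buf, sizeof buf) > 0) {
        /* discard */
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    close(pipefd[0]);
    waitpid(child, NULL, 0);

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling seconds_until_eof(0, 2) should block for roughly two seconds even though the direct child exits at once, while seconds_until_eof(1, 2) should return almost immediately.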
I'm going to suggest on the distcc list that the daemonization process
closes a larger set of fd's. (There is also a similar problem with some
leaked fd's in the LVM2 utilities -- I've not corresponded w/ that
community ever, but I'll try to find them, too.) But, I just wanted to
share the workaround if anyone else was having trouble (seems unlikely).
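
One way a daemonizing child could close a larger set of fd's, sketched under assumptions: ask sysconf(_SC_OPEN_MAX) for the limit instead of hard-coding one. The function names, the 1024 fallback, and the 65536 cap are choices made for this sketch, not anything taken from distcc.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* True if fd currently refers to an open descriptor. */
int fd_is_open(int fd) {
    return fcntl(fd, F_GETFD) != -1;
}

/* Upper bound for the close loop. */
long fd_upper_bound(void) {
    long n = sysconf(_SC_OPEN_MAX);
    if (n < 0) n = 1024;        /* assumption: fallback if the limit is indeterminate */
    if (n > 65536) n = 65536;   /* assumption: arbitrary cap to keep the loop cheap */
    return n;
}

/* Close every fd >= lowfd; returns how many were actually open. */
int close_fds_from(int lowfd) {
    long limit = fd_upper_bound();
    int closed = 0;
    for (long fd = lowfd; fd < limit; fd++) {
        if (close((int)fd) == 0) closed++;
    }
    return closed;
}
```

In the daemonization sequence this would be a call to close_fds_from(3) in the child, right after fork(). Because of the cap, any fd numbered above 65535 would be missed.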
[Assuming anyone reading this far is very patient...] I also wanted to
poll paludis dev's on whether they agree that the problem is in the
'distcc' code rather than in paludis. This seems to be a pretty
common daemonization pattern (fork, reopen fd's 0-2 on /dev/null, chdir
"/", and setsid). Might there be other programs affected by this?
Would it be better to waitpid on a specific child's pid in
output_wrapper.cc? Or is spawning a daemon a rare enough thing (and
maybe even 'wrong' in some sense) for a child process to do that it's
not worth the effort?
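
For reference, waiting on one specific child looks roughly like this in C. It is a generic sketch, not code from output_wrapper.cc, and run_and_wait is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] and wait for exactly that child.
 * Returns its exit status, or -1 on error or abnormal termination. */
int run_and_wait(char *const argv[]) {
    int status;
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                 /* only reached if exec failed */
    }
    /* Wait for this pid only, retrying if a signal interrupts the call. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

waitpid() on a direct child returns as soon as that child exits, whether or not a daemon it spawned is still alive. If the wrapper also reads the child's output until EOF, though, an inherited fd could still hold things up, so this may not be sufficient by itself.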
Best,
Ben

#include <stdio.h>
#include <unistd.h>
#include <linux/limits.h>
int main(int argc, char **argv) {
    int fd;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s program [args]\n", argv[0]);
        return 1;
    }

    /* linux/limits.h contains NR_OPEN == max number of open fd's */
    /* It's 1024*1024 on my system */
    /* 1024 might be a (more) reasonable number to use */
    /* Also, start at 3 to avoid closing std{in/out/err}, if needed */
    for (fd = 3; fd < NR_OPEN; fd++)
        close(fd);

    execvp(argv[1], argv + 1);

    /* execvp only returns on failure; report why and return 1 */
    perror(argv[1]);
    return 1;
}