On Mon, Nov 06, 2017 at 12:07:52AM -0800, Noah Misch wrote:
> I've been enjoying the speed of parallel check-world, but I get spurious
> failures from makefile race conditions.  Commit c66b438 fixed the simple ones.
> More tricky is this problem of multiple "make" processes entering
> src/test/regress concurrently, which causes failures like these:
>
>   gcc: error: pg_regress.o: No such file or directory
>   make[4]: *** [pg_isolation_regress] Error 1
>
>   /bin/sh: ../../../src/test/isolation/pg_isolation_regress: Permission denied
>   make -C test_extensions check
>   make[2]: *** [check] Error 126
>   make[2]: Leaving directory `/home/nm/src/pg/backbranch/10/src/test/isolation'
>
>   /bin/sh: ../../../../src/test/isolation/pg_isolation_regress: Text file busy
>   make[3]: *** [isolationcheck] Error 126
>   make[3]: Leaving directory `/home/nm/src/pg/backbranch/10/src/test/modules/snapshot_too_old'
>
> This is reproducible since commit 2038bf4 or earlier; "make -j check-world"
> had worse problems before that era.  A workaround is to issue "make -j; make
> -j -C src/test/isolation" before the check-world.
Commit de0aca6 fixed that problem, but I now see similar trouble from multiple
"make" processes running "make -C contrib/test_decoding install" concurrently.
This is a risk for any directory named in an EXTRA_INSTALL variable of more
than one makefile.  Under the right circumstances, this would affect
contrib/hstore and others in addition to contrib/test_decoding.  That brings
me back to the locking idea:

> The problem of multiple "make" processes in a directory (especially src/port)
> shows up elsewhere.  In a cleaned tree, "make -j -C src/bin" or "make -j
> installcheck-world" will do it.  For more-prominent use cases, src/Makefile
> prevents this with ".NOTPARALLEL:" and building first the directories that are
> frequent submake targets.  Perhaps we could fix the general problem with
> directory locking; targets that call "$(MAKE) -C FOO" would first sleep until
> FOO's lock is available.  That could be tricky to make robust.

If one is willing to assume that a lock-holding process never crashes, locking
in a shell script is simple: mkdir to lock, rmdir to unlock.  I don't want to
assume that.  The bakery algorithm provides convenient opportunities for
checking whether the last locker crashed; I have attached a shell script
demonstrating this approach.  Better ideas?  Otherwise, I'll look into
integrating this design into the makefiles.

Thanks,
nm
bakery.sh
Description: Bourne shell script
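For comparison, here is a minimal sketch of the simple "mkdir to lock, rmdir to unlock" scheme described above. It is not the attached bakery.sh and does not implement the bakery algorithm. It relies on mkdir failing when the directory already exists, so at most one concurrent caller can create it. The function names acquire_lock and release_lock and the lock path are hypothetical. As the message notes, this only works if the lock holder never crashes: a crashed holder leaves the directory behind and every later caller waits forever.

```shell
#!/bin/sh
# Sketch of directory-based locking (hypothetical helper names).
# ASSUMPTION: the lock holder never crashes; there is no stale-lock recovery.
# ASSUMPTION: the lock's parent directory exists and is writable, otherwise
# mkdir keeps failing and acquire_lock loops forever.

# acquire_lock DIR: block until this process creates DIR.
# mkdir fails if DIR already exists, so success means we own the lock.
acquire_lock() {
    until mkdir "$1" 2>/dev/null; do
        sleep 1
    done
}

# release_lock DIR: give up the lock by removing the directory.
release_lock() {
    rmdir "$1"
}

# Example usage with a hypothetical lock path.  A makefile rule would
# bracket its "$(MAKE) -C FOO" call with acquire_lock and release_lock.
lock="${TMPDIR:-/tmp}/example-$$.lock"
acquire_lock "$lock"
echo "lock held: $lock"
release_lock "$lock"
```

The crash weakness is visible here: nothing records who created the directory, so a waiter cannot tell a live holder from a dead one. That is the gap the message's bakery-algorithm approach is meant to close.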