Re: Automake's file locking

2021-02-03 Thread Nick Bowler
On 2021-02-03, Bob Friesenhahn  wrote:
> GNU make does have a way to declare that a target (or multiple
> targets) is not safe for parallel use.  This is done via a
> '.NOTPARALLEL: target' type declaration.

According to the manual[1], prerequisites of the .NOTPARALLEL target are
ignored, and this will simply disable parallel builds completely for
the entire Makefile.  I did a quick test and the manual seems to be
accurate about this.

Order-only prerequisites can be used to prevent GNU make from running
specific rules in parallel.  These are more difficult (but not impossible)
to declare in an interoperable way.

[1] https://www.gnu.org/software/make/manual/make.html#index-_002eNOTPARALLEL

Cheers,
  Nick



Re: Automake's file locking

2021-02-03 Thread Bob Friesenhahn

On Wed, 3 Feb 2021, Zack Weinberg wrote:

>> Therefore I like the idea of merely relying on the atomicity of
>> file creation / file rename operations.
>>
>> These files should reside inside the autom4te.cache directory. I would
>> not like to change all my scripts and Makefiles that do
>>   rm -rf autom4te.cache
>
> Agreed.  The approach I'm currently considering is: with the
> implementation of the new locking protocol, autom4te will create a
> subdirectory of autom4te.cache named after its own version number, and
> work only in that directory (thus preventing different versions of
> autom4te from tripping over each other).  Each request will be somehow
> reduced to a strong hash and given a directory named after the hash
> value.  The existence of this directory signals that an autom4te
> process is working on a request, and the presence of 'request',
> 'output', and 'traces' files in that directory signals that the cache
> for that request is valid.  If the directory for a request exists but
> the output files don't, autom4te will busy-wait for up to some
> definite timeout before stealing the lock and starting to work on that
> request itself.

This seems like a good approach to me.

There is substantially less danger from independent reconfs (on the
same or different hosts) than there is from parallel jobs in the
current build deciding that something should be done and trying to do
it at the same time.

GNU make does have a way to declare that a target (or multiple
targets) is not safe for parallel use.  This is done via a
'.NOTPARALLEL: target' type declaration.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Public Key, http://www.simplesystems.org/users/bfriesen/public-key.txt



Re: Automake's file locking

2021-02-03 Thread Zack Weinberg
On Thu, Jan 28, 2021 at 6:51 PM Bruno Haible  wrote:
> Zack Weinberg wrote:
> > There is a potential way forward here.  The *only* place in all of
> > Autoconf and Automake where XFile::lock is used, is by autom4te, to
> > take an exclusive lock on the entire contents of autom4te.cache.
> > For this, open-file locks are overkill; we could instead use the
> > battle-tested technique used by Emacs: symlink sentinels.  (See
> > https://git.savannah.gnu.org/cgit/emacs.git/tree/src/filelock.c .)
>
> I can confirm that, while flock() is the most basic/elementary locking
> facility [1], its emulation in gnulib [2] does not really work on
> Solaris. The unit test regularly fails on Solaris.
>
> Therefore I like the idea of merely relying on the atomicity of
> file creation / file rename operations.
>
> These files should reside inside the autom4te.cache directory. I would
> not like to change all my scripts and Makefiles that do
>   rm -rf autom4te.cache

Agreed.  The approach I'm currently considering is: with the
implementation of the new locking protocol, autom4te will create a
subdirectory of autom4te.cache named after its own version number, and
work only in that directory (thus preventing different versions of
autom4te from tripping over each other).  Each request will be somehow
reduced to a strong hash and given a directory named after the hash
value.  The existence of this directory signals that an autom4te
process is working on a request, and the presence of 'request',
'output', and 'traces' files in that directory signals that the cache
for that request is valid.  If the directory for a request exists but
the output files don't, autom4te will busy-wait for up to some
definite timeout before stealing the lock and starting to work on that
request itself.

This would be substantially easier to implement with access to the
Storable, Digest::SHA, and Time::HiRes modules, and that's the
principal reason I suggested bumping our minimum Perl requirement to
5.18 in .

zw



NFS file locking (was: Re: Automake's file locking (was Re: Autoconf/Automake is not using version from AC_INIT))

2021-01-28 Thread pluto--- via Discussion list for automake
Zack Weinberg  wrote:
> ...
> grumpy aside in OpenBSD's "fcntl(2)" manpage:
>
> | This interface follows the completely stupid semantics of System V
> | and IEEE Std 1003.1-1988 ("POSIX.1") that require ...
>
> As I recall, at the time, *neither* flock nor fcntl locks
> were honored *at all* over NFS, so that wouldn't have been
> a consideration.

FWIW, NFS attempted to support some form of file locking at least as
far back as SunOS 3.5, which IIRC predated both SysV and POSIX.  Old-
timers may remember occasional patches to "statd" and "lockd".

I say "attempted" because _correct_ support of network file locking is
fundamentally incomputable in the presence of (transient, recoverable)
communication failures and (non-recoverable) system crashes:  the fact
that the server has not heard from the client (or vice versa) within
some period of time might mean either that the other party has crashed
(and may -- or may not ever -- come back), or that network connectivity
has been lost (and will, presumably, be restored at some unknown future
time).



Re: Automake's file locking

2021-01-28 Thread Bruno Haible
Zack Weinberg wrote:
> There is a potential way forward here.  The *only* place in all of
> Autoconf and Automake where XFile::lock is used, is by autom4te, to
> take an exclusive lock on the entire contents of autom4te.cache.
> For this, open-file locks are overkill; we could instead use the
> battle-tested technique used by Emacs: symlink sentinels.  (See
> https://git.savannah.gnu.org/cgit/emacs.git/tree/src/filelock.c .)

I can confirm that, while flock() is the most basic/elementary locking
facility [1], its emulation in gnulib [2] does not really work on
Solaris. The unit test regularly fails on Solaris.

Therefore I like the idea of merely relying on the atomicity of
file creation / file rename operations.

These files should reside inside the autom4te.cache directory. I would
not like to change all my scripts and Makefiles that do
  rm -rf autom4te.cache
to do something like
  rm -rf autom4te.cache autom4te.tmp.*
instead.
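Both primitives are directly available from POSIX: `open` with `O_CREAT | O_EXCL` fails if the name already exists, and `rename` atomically replaces its target within one file system. A minimal Python sketch, keeping every file inside the cache directory so that `rm -rf autom4te.cache` remains sufficient; the `lock` file name and the helper names are hypothetical:

```python
import os
import tempfile

def try_create_lock(cache_dir):
    """Atomically create cache_dir/lock; return True if we now own it."""
    path = os.path.join(cache_dir, "lock")   # hypothetical file name
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(f"{os.getpid()}\n")          # record the owner for debugging
    return True

def publish(cache_dir, name, data):
    """Write data to a temporary file, then rename it into place.

    Readers see either the old file or the complete new one, never a
    partially written file.  The temporary file lives inside cache_dir,
    so removing the cache directory still cleans everything up.
    """
    fd, tmp = tempfile.mkstemp(dir=cache_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, os.path.join(cache_dir, name))
    except BaseException:
        os.unlink(tmp)
        raise
```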

Bruno

[1] https://gavv.github.io/articles/file-locks/
[2] https://www.gnu.org/software/gnulib/manual/html_node/flock.html



Re: Automake's file locking (was Re: Autoconf/Automake is not using version from AC_INIT)

2021-01-28 Thread Paul Eggert

On 1/28/21 10:34 AM, Zack Weinberg wrote:
> we could instead use the
> battle-tested technique used by Emacs: symlink sentinels.  (See
> https://git.savannah.gnu.org/cgit/emacs.git/tree/src/filelock.c .)

Although that Emacs code is battle-tested, one of the things it does is
fall back on regular files on platforms where symlinks don't work.

Might be simpler to use a directory sentinel.

> The main reason I can think of, not to do this, is that it would make
> the locking strategy incompatible with that used by older autom4te;

I would say "don't do that"; just stick with current autom4te.



Re: Automake's file locking (was Re: Autoconf/Automake is not using version from AC_INIT)

2021-01-28 Thread Bob Friesenhahn

On Thu, 28 Jan 2021, Nick Bowler wrote:

> If I understand correctly the issue at hand is multiple concurrent
> rebuild rules, from a single parallel make implementation, are each
> invoking autom4te concurrently and since file locking didn't work,
> they clobber each other and things go wrong.

That is what would happen, but what currently happens is that if file
locking does not work and a parallel build is used, Autotools reports
a hard error:

  CDPATH="${ZSH_VERSION+.}:" && cd /home/bfriesen/src/graphics/GM && /bin/sh '/home/bfriesen/src/graphics/GM/config/missing' aclocal-1.16 -I m4
  autom4te: cannot lock autom4te.cache/requests with mode 2: Invalid argument

  autom4te: forgo "make -j" or use a file system that supports locks
  aclocal-1.16: error: autom4te failed with exit status: 1
  gmake: *** [Makefile:4908: /home/bfriesen/src/graphics/GM/aclocal.m4] Error 1

In my case there is only one active developer, so there would not be
actual corruption.


Bob



Re: Automake's file locking (was Re: Autoconf/Automake is not using version from AC_INIT)

2021-01-28 Thread Bob Friesenhahn

On Thu, 28 Jan 2021, Zack Weinberg wrote:

> Do you use different versions of autoconf and/or automake on the
> different clients?

No.  That would not make sense.  If a client is not suitably prepared,
then I don't enable maintainer mode.

>> The lock appears to be taken speculatively since it is taken before
>> Autotools checks that there is something to do.
> ...
>> The most common case is that there is nothing for Autotools to do
>> since the user is most often doing a 'make' for some other purpose.
>
> It looks to me like the lock is taken at exactly the point where
> autom4te decides that it *does* have something to do. It might be

Perhaps what I experienced is a side effect of my recent problems
(regarding AC_INIT and versioning) and not the normal case.


Bob



Re: Automake's file locking (was Re: Autoconf/Automake is not using version from AC_INIT)

2021-01-28 Thread Nick Bowler
On 2021-01-28, Zack Weinberg  wrote:
> There is a potential way forward here.  The *only* place in all of
> Autoconf and Automake where XFile::lock is used, is by autom4te, to
> take an exclusive lock on the entire contents of autom4te.cache.
> For this, open-file locks are overkill; we could instead use the
> battle-tested technique used by Emacs: symlink sentinels.  (See
> https://git.savannah.gnu.org/cgit/emacs.git/tree/src/filelock.c .)
>
> The main reason I can think of, not to do this, is that it would make
> the locking strategy incompatible with that used by older autom4te;
> this could come up, for instance, if you’ve got your source directory
> on NFS and you’re building on two different clients in two different
> build directories.  On the other hand, this kind of version skew is
> going to cause problems anyway when they fight over who gets to write
> generated scripts to the source directory, so maybe it would be ok to
> declare “don’t do that” and move on.  What do others think?

I think it's reasonable to expect concurrent builds running on different
hosts to work if and only if they are in different build directories and
no rules modify anything in srcdir.  Otherwise "don't do that."

If I understand correctly the issue at hand is multiple concurrent
rebuild rules, from a single parallel make implementation, are each
invoking autom4te concurrently and since file locking didn't work,
they clobber each other and things go wrong.

I believe mkdir is the most portable mechanism to achieve "test and set"
type semantics at the filesystem level.  I believe this works everywhere,
even on old versions of NFS that don't support O_EXCL, and on filesystems
like FAT that don't support any kind of link.
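A minimal sketch of the `mkdir` test-and-set idea, in Python for brevity. `mkdir` either creates the directory or fails with `EEXIST`, so exactly one contender wins. The lock path, polling interval, and helper names are assumptions, and there is no stale-lock recovery here:

```python
import errno
import os
import time

def acquire_dir_lock(path, poll=0.1, timeout=None):
    """Spin until mkdir(path) succeeds; return False if timeout expires."""
    start = time.monotonic()
    while True:
        try:
            os.mkdir(path)          # creates the directory or fails: test-and-set
            return True
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise               # permission problems etc. are real errors
        if timeout is not None and time.monotonic() - start >= timeout:
            return False
        time.sleep(poll)

def release_dir_lock(path):
    os.rmdir(path)
```

Callers would normally pair `acquire_dir_lock` with `release_dir_lock` in a `try`/`finally` block.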

The challenge with alternate filesystem locking methods compared to
proper file locks is that you need a way to recover when your program
dies before it can clean up its lock files or directories.
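One common (and imperfect) answer to the recovery problem is to record the owner's host name and PID inside the lock directory, and to break the lock only when the owner is on the same host and no longer running. The sketch below is illustrative only: the `owner` file name is hypothetical, PIDs can be reused, two waiters that detect the same stale lock can still race while breaking it, and an owner on another NFS client can only be judged dead by a timeout.

```python
import os
import shutil
import socket

def _alive(pid):
    try:
        os.kill(pid, 0)            # signal 0: existence check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True                # exists, but belongs to another user
    return True

def try_lock(path):
    """Try once to take a mkdir lock at `path`, breaking it if stale."""
    me = f"{socket.gethostname()} {os.getpid()}"
    for _ in range(2):             # at most one stale-lock break per call
        try:
            os.mkdir(path)
        except FileExistsError:
            try:
                with open(os.path.join(path, "owner")) as f:
                    host, pid = f.read().split()
            except (OSError, ValueError):
                return False       # owner info missing or still being written
            if host == socket.gethostname() and not _alive(int(pid)):
                shutil.rmtree(path, ignore_errors=True)   # owner died: break it
                continue
            return False
        with open(os.path.join(path, "owner"), "w") as f:
            f.write(me)
        return True
    return False
```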

Could the issue be fixed by just serializing the rebuild rules within
make?  This might be way easier to do.  For example, we can easily
do it in NetBSD make:

  all: recover-rule1 recover-rule2
  clean:
  	rm -f recover-rule1 recover-rule2

  recover-rule1 recover-rule2:
  	@echo start $@; sleep 5; :>$@; echo end $@

  .ORDER: recover-rule1 recover-rule2

Heirloom make has a very similar mechanism that does not guarantee
relative order:

  .MUTEX: recover-rule1 recover-rule2

Both of these will ensure the two rules are not run concurrently by a
single parallel make invocation.

GNU make has order-only prerequisites.  Unlike the prior methods, this
is trickier to do without breaking other makes, but I have used a method
like this one with success:

  # goal here is to get rule1_seq set to empty string on non-GNU makes
  features = $(.FEATURES) # workaround problem with old FreeBSD make
  orderonly = $(findstring order-only,$(features))
  rule1_seq = $(orderonly:order-only=|recover-rule1)

  recover-rule2: $(rule1_seq)

I don't have experience with parallel builds using other makes.

Cheers,
  Nick



Re: Automake's file locking (was Re: Autoconf/Automake is not using version from AC_INIT)

2021-01-28 Thread Zack Weinberg
On Thu, Jan 28, 2021 at 2:16 PM Bob Friesenhahn wrote:
> On Thu, 28 Jan 2021, Zack Weinberg wrote:
> >
> > The main reason I can think of, not to do this, is that it would make
> > the locking strategy incompatible with that used by older autom4te;
> > this could come up, for instance, if you’ve got your source directory
> > on NFS and you’re building on two different clients in two different
> > build directories.  On the other hand, this kind of version skew is
> > going to cause problems anyway when they fight over who gets to write
> > generated scripts to the source directory, so maybe it would be ok to
> > declare “don’t do that” and move on.  What do others think?
>
> This is exactly what I do.  I keep the source files on a file server
> so that I can build on several different types of clients.  This used
> to even include Microsoft Windows clients using CIFS.

Do you use different versions of autoconf and/or automake on the
different clients?

> The lock appears to be taken speculatively since it is taken before
> Autotools checks that there is something to do.
...
> The most common case is that there is nothing for Autotools to do
> since the user is most often doing a 'make' for some other purpose.

It looks to me like the lock is taken at exactly the point where
autom4te decides that it *does* have something to do. It might be
possible to change it to take a *read* lock first and only upgrade to
a write lock if new files are to be added to the cache, but Make
shouldn't be running the autotools at all if they have nothing to do
(which I suppose takes us over to the *other* thread about your
problems with automake and configure's dependencies :-)
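The read-lock-then-upgrade idea might look like this with `flock`-style locks (Python's `fcntl.flock`, Unix only). One caveat: converting a shared `flock` lock to an exclusive one is not guaranteed to be atomic (the Linux `flock(2)` manual says the existing lock is removed first), so the condition has to be re-checked after the upgrade. The lock file and the `needs_update`/`update` callbacks are hypothetical:

```python
import fcntl
import os

def refresh_if_needed(lock_path, needs_update, update):
    """Return True if update() ran.  Callbacks are hypothetical."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_SH)       # many readers can share this
        if not needs_update():
            return False                      # common case: nothing to do
        # Upgrade.  Not atomic: another process may run update() in the gap.
        fcntl.flock(fd, fcntl.LOCK_EX)
        if not needs_update():                # so check again
            return False
        update()
        return True
    finally:
        os.close(fd)                          # closing releases the lock
```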

zw



Re: Automake's file locking (was Re: Autoconf/Automake is not using version from AC_INIT)

2021-01-28 Thread Bob Friesenhahn

On Thu, 28 Jan 2021, Zack Weinberg wrote:

> The main reason I can think of, not to do this, is that it would make
> the locking strategy incompatible with that used by older autom4te;
> this could come up, for instance, if you’ve got your source directory
> on NFS and you’re building on two different clients in two different
> build directories.  On the other hand, this kind of version skew is
> going to cause problems anyway when they fight over who gets to write
> generated scripts to the source directory, so maybe it would be ok to
> declare “don’t do that” and move on.  What do others think?

This is exactly what I do.  I keep the source files on a file server
so that I can build on several different types of clients.  This used
to even include Microsoft Windows clients using CIFS.

The lock appears to be taken speculatively since it is taken before
Autotools checks that there is something to do.  It would be nicer if
Autotools could check first if there is something to do, acquire the
lock, check if there is still something to do, and then do the work.


The most common case is that there is nothing for Autotools to do 
since the user is most often doing a 'make' for some other purpose.
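Bob's check / lock / re-check sequence can be sketched as follows, using a make-style timestamp comparison as the cheap, lock-free first check. This shows the pattern only and is not autom4te's actual validity test; the function names and the lock file are hypothetical:

```python
import fcntl
import os

def out_of_date(target, sources):
    """make-style test: target missing or older than any source."""
    try:
        t = os.stat(target).st_mtime_ns
    except FileNotFoundError:
        return True
    return any(os.stat(s).st_mtime_ns > t for s in sources)

def rebuild_if_needed(lock_path, target, sources, rebuild):
    # 1. Cheap, lock-free check: usually there is nothing to do.
    if not out_of_date(target, sources):
        return False
    # 2. Take the exclusive lock only when work seems necessary.
    with open(lock_path, "a") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)
        # 3. Re-check: another job may have rebuilt target while we waited.
        if not out_of_date(target, sources):
            return False
        rebuild()                # 4. Do the work while holding the lock.
        return True              # lock released when the file is closed
```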


Bob


Re: Automake's file locking (was Re: Autoconf/Automake is not using version from AC_INIT)

2021-01-28 Thread Zack Weinberg
On Mon, Jan 25, 2021 at 11:18 AM Bob Friesenhahn wrote:
> On Mon, 25 Jan 2021, Zack Weinberg wrote:
> > Automake "just" calls Perl's 'flock' built-in (see 'sub lock' in
> > Automake/XFile.pm) (this code is copied into Autoconf under the
> > Autom4te:: namespace).  It would be relatively straightforward to
> > teach it to try 'fcntl(F_SETLKW, ...)' if that fails.

I was wrong about this.  It is not practical to make a Perl program
directly call ‘fcntl(F_SETLKW)’ without use of CPAN modules.  This is
because, at the C level, ‘fcntl(F_SETLKW)’ takes a ‘struct flock’
argument, and we have no way of knowing what the layout of that
structure is.  (See https://metacpan.org/pod/File%3a%3aFcntlLock for
gory details.  We can’t adopt the implementation of that module,
because it relies on running a C compiler, and the locking is needed
at a point when we do not yet know whether a C compiler is available.)

> It may be that moving forward to 'fcntl(F_SETLKW, ...)' by default and
> then falling back to legacy 'flock' would be best.  Or perhaps
> discarding use of legacy 'flock' entirely.
>
> Most likely the decision as to what to do was based on what was the
> oldest primitive supported at the time.

I need to reemphasize that the decision here was made by the Perl
developers, not the Automake or Autoconf developers, and it was made a
very long time ago—judging by e.g. references to h2ph in ‘perldoc
Fcntl’, probably circa perl 5.000!  I am also going to guess that the
motivation for preferring flock was related to the motivation for this
grumpy aside in OpenBSD’s ‘fcntl(2)’ manpage:

| This interface follows the completely stupid semantics of System V
| and IEEE Std 1003.1-1988 (“POSIX.1”) that require that *all* locks
| associated with a file for a given process are removed when *any* file
| descriptor for that file is closed by that process
| ...
| The flock(2) interface has much more rational last close semantics

(Emphasis in original.)  As I recall, at the time, *neither* flock nor
fcntl locks were honored *at all* over NFS, so that wouldn’t have been
a consideration.
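The last-close behaviour the OpenBSD manual complains about is easy to observe. In this Python sketch (Unix only; Linux behaviour assumed), the parent takes an exclusive lock, then opens and closes a second, unrelated descriptor for the same file, and a forked child checks whether the lock is still held. With POSIX `fcntl` locks (via `fcntl.lockf`) the lock silently disappears; with `flock` it survives, because a `flock` lock belongs to the open file description rather than to the process. The helper names are made up for the demo:

```python
import fcntl
import os

def _lock(fd, use_flock, nonblocking=False):
    flags = fcntl.LOCK_EX | (fcntl.LOCK_NB if nonblocking else 0)
    if use_flock:
        fcntl.flock(fd, flags)     # BSD lock: owned by the open file description
    else:
        fcntl.lockf(fd, flags)     # POSIX fcntl lock: owned by the process

def child_can_lock(path, use_flock):
    """Fork a child that tries a non-blocking exclusive lock on path."""
    pid = os.fork()
    if pid == 0:
        code = 1
        try:
            _lock(os.open(path, os.O_RDWR), use_flock, nonblocking=True)
            code = 0               # the lock was free
        except OSError:
            code = 1               # someone else holds it
        finally:
            os._exit(code)         # never return into the parent's code
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0

def lock_survives_unrelated_close(path, use_flock):
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        _lock(fd, use_flock)
        if child_can_lock(path, use_flock):
            raise RuntimeError("lock was not taken")
        os.close(os.open(path, os.O_RDONLY))   # open and close a second fd
        return not child_can_lock(path, use_flock)
    finally:
        os.close(fd)
```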



There is a potential way forward here.  The *only* place in all of
Autoconf and Automake where XFile::lock is used, is by autom4te, to
take an exclusive lock on the entire contents of autom4te.cache.
For this, open-file locks are overkill; we could instead use the
battle-tested technique used by Emacs: symlink sentinels.  (See
https://git.savannah.gnu.org/cgit/emacs.git/tree/src/filelock.c .)
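For readers unfamiliar with symlink sentinels: the lock is a symbolic link whose target is not a real file but a string naming the owner, so `symlink` both claims the lock (it fails with `EEXIST` if the name is taken) and records the owner in one atomic step, and `readlink` reports the owner without opening anything. A loose Python sketch follows; the `uid@host.pid` format only resembles what Emacs stores and is an assumption, and Emacs's fallback to regular files on systems without symlinks is omitted:

```python
import os
import socket

def owner_string():
    # Hypothetical owner format, loosely modelled on Emacs lock links.
    return f"{os.getuid()}@{socket.gethostname()}.{os.getpid()}"

def try_symlink_lock(lock_path):
    """Return None if we took the lock, else the current owner's string."""
    try:
        os.symlink(owner_string(), lock_path)   # atomic create-if-absent
        return None
    except FileExistsError:
        try:
            return os.readlink(lock_path)       # who holds it?
        except FileNotFoundError:
            return ""                           # released meanwhile; retry

def release_symlink_lock(lock_path):
    os.unlink(lock_path)
```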

The main reason I can think of, not to do this, is that it would make
the locking strategy incompatible with that used by older autom4te;
this could come up, for instance, if you’ve got your source directory
on NFS and you’re building on two different clients in two different
build directories.  On the other hand, this kind of version skew is
going to cause problems anyway when they fight over who gets to write
generated scripts to the source directory, so maybe it would be ok to
declare “don’t do that” and move on.  What do others think?

zw



Re: Automake's file locking (was Re: Autoconf/Automake is not using version from AC_INIT)

2021-01-25 Thread Bob Friesenhahn

On Mon, 25 Jan 2021, Zack Weinberg wrote:

> Automake "just" calls Perl's 'flock' built-in (see 'sub lock' in
> Automake/XFile.pm) (this code is copied into Autoconf under the
> Autom4te:: namespace).  It would be relatively straightforward to
> teach it to try 'fcntl(F_SETLKW, ...)' if that fails.  Do you know
> whether that would be sufficient?  If not, we may be up a creek, since
> depending on CPAN modules is a non-starter.

I expect that it would be that "simple", except, of course, for
everything involved in making sure that things work properly for
everyone.

It may be that moving forward to 'fcntl(F_SETLKW, ...)' by default and
then falling back to legacy 'flock' would be best.  Or perhaps
discarding use of legacy 'flock' entirely.

Most likely the decision as to what to do was based on what was the
oldest primitive supported at the time.  The GNU/Linux manual page
says that "the flock() call first appeared in 4.2BSD".  It also says
"Since Linux 2.6.12, NFS clients support flock() locks by emulating
them as fcntl(2) byte-range locks on the entire file."  There are a
number of warnings in the manual page regarding the danger of mixing
locking primitives.  It was never intended that flock() work over a
network share.

It seems unlikely that Autotools development is going to be done on a
system lacking POSIX locking, since such a system would not be
considered a usable system for most purposes.  If a project does not
provide a 'maintainer mode' to stop maintainer rules from firing, then
this could impact archaic targets from the early '90s.


Bob



Automake's file locking (was Re: Autoconf/Automake is not using version from AC_INIT)

2021-01-25 Thread Zack Weinberg
On Mon, Jan 25, 2021 at 9:52 AM Bob Friesenhahn wrote:
> At the moment it is a big deal for me because the locking protocol
> that Autoconf/Automake is using does not work with NFS mounts for
> Illumos-derived systems when the client is also an Illumos-derived
> system, because Illumos failed to support the legacy locking protocol
> used when the system locking daemon was re-implemented from scratch.
...
> It is likely that a small patch to Automake Perl-based locking code
> could solve this issue by using the same fall-back to using POSIX
> locking rather than legacy locking the same way that GNU/Linux does.
> It may also be that using POSIX locking in the first place is the
> solution.

Automake "just" calls Perl's 'flock' built-in (see 'sub lock' in
Automake/XFile.pm) (this code is copied into Autoconf under the
Autom4te:: namespace).  It would be relatively straightforward to
teach it to try 'fcntl(F_SETLKW, ...)' if that fails.  Do you know
whether that would be sufficient?  If not, we may be up a creek, since
depending on CPAN modules is a non-starter.
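The flock-then-fcntl fallback is easy to express in a language whose standard library fills in `struct flock` at the C level, which is exactly what core Perl lacks (see Zack's follow-up about `File::FcntlLock`). A Python sketch, where `fcntl.lockf` issues `F_SETLKW` with a whole-file `struct flock`. Which errno values mean "flock is unsupported here" is an assumption; `EINVAL` appears to match the error Bob posted from his Illumos setup:

```python
import errno
import fcntl

# errno values taken to mean "flock is unsupported here" (an assumption).
_UNSUPPORTED = {errno.EINVAL, errno.ENOLCK, errno.EOPNOTSUPP}

def lock_exclusive(f):
    """Block until f is exclusively locked; return which primitive worked.

    f must be open for writing, which POSIX write locks require.
    """
    try:
        fcntl.flock(f, fcntl.LOCK_EX)        # BSD-style whole-file lock
        return "flock"
    except OSError as e:
        if e.errno not in _UNSUPPORTED:
            raise
    # POSIX record lock over the whole file (F_SETLKW under the hood).
    fcntl.lockf(f, fcntl.LOCK_EX)
    return "fcntl"
```

Bob's suggestion of trying `fcntl` first and keeping `flock` as the fallback amounts to swapping the two attempts. Since the Linux manual warns against mixing the two kinds of lock, every process touching the lock file should go through the same function.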

zw