Re: gmake and ccache conspiring together in creating gremlins

2021-02-08 Thread Sam Varshavchik
On Mon, Feb 8, 2021 at 2:38 PM Paul Smith  wrote:
>
> On Mon, 2021-02-08 at 10:43 +, Edward Welbourne wrote:
> > Sounds to me like that's a bug: when the descriptors are closed, the
> > part of MAKEFLAGS that claims they're make's jobserver file
> > descriptors should be removed, since that's when the claim stops
> > being true.
>
> I believe there have been other similar issues reported recently.
>
> Certainly fixing MAKEFLAGS when we run without jobserver available is
> something that could be done.
>
> There is a loss of debugging information if we make this change: today
> make can detect if it was invoked in a way that _should_ expect to
> receive a jobserver context, but _didn't_ receive that context.  That
> is, if make sees that jobserver-auth is set but it can't open the
> jobserver pipes it can warn the user that most likely there's a problem
> in their environment or with the setup of their makefiles.
>
> Without this warning there's no way to know when this situation occurs.
>  It's easy to create a situation where every sub-make will create its
> own completely unique jobserver domain.  So you start the top make with
> -j4 and run 4 sub-makes; if you do it wrong then each of 4 sub-makes
> could create a new jobserver domain, and now you're running 16 jobs in
> parallel instead of 4... there's no way for make to warn you about this
> situation.

One thought occurred to me. Specifically: when make executes what it
believes to be something other than a recursive invocation of $(MAKE),
and it closes the job server pipe file descriptors for that, it can
also:

1) Add an additional parameter to MAKEFLAGS, let's call it
"--no-jobserver", and perhaps remove the --jobserver-auth parameter
completely. It might be easier just to append something there, instead
of surgically removing this.

2) Make checks for a --no-jobserver in MAKEFLAGS when it starts. If
it's there it does NOT attempt to validate the file descriptors that
are given in --jobserver-auth (if this parameter is preserved). It's a
given that they're not there:

  if (!FD_OK (job_fds[0]) || !FD_OK (job_fds[1]) || make_job_rfd () < 0)

Don't even do that. What happens right now is that a warning message
gets printed and make runs without a job server. This change should
have the same result: print the warning but skip the FD_OK tests.

This will result in the same warning, but it should avoid triggering
the bug that I found.

However that might cause a minor regression in LTO linking. I think
that this prevents the LTO linker's internal invocation of make from
finding that it can attach to the original make process's job server.

From sifting through strace dumps, I see that a linker-invoked make
gets its own -j flag. It appears that the linker is courteous enough
to count how many CPUs it has and use it to construct its own -j flag.

How about this safer approach: once --no-jobserver is there it stays
there, and gets propagated to all recursively invoked makes. If an
invoked make finds that it has both a --no-jobserver and a -j flag,
it'll warn and refuse to create its own job server, and then proceed
executing one command at a time.
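
A minimal sketch of that startup decision, assuming MAKEFLAGS can be
scanned as a plain space-separated string. --no-jobserver is only the
flag proposed in this message, and the enum and function names are
hypothetical; none of this is make's actual code:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical outcomes of the startup check; not names from make.  */
enum js_mode
{
  JS_ATTACH,       /* try the descriptors named in --jobserver-auth */
  JS_CREATE,       /* nothing inherited: -j may create a new jobserver */
  JS_SERIAL_WARN   /* --no-jobserver seen: warn, run one job at a time */
};

/* Return true if FLAGS contains a space-separated word that starts
   with PREFIX.  */
static bool
has_word_prefix (const char *flags, const char *prefix)
{
  size_t len = strlen (prefix);
  const char *p = flags;

  while (*p != '\0')
    {
      while (*p == ' ')
        ++p;
      if (strncmp (p, prefix, len) == 0)
        return true;
      while (*p != '\0' && *p != ' ')
        ++p;
    }
  return false;
}

enum js_mode
choose_jobserver_mode (const char *makeflags)
{
  /* Proposed marker: the parent closed the pipe for this child, so do
     not probe the descriptors at all (no FD_OK tests) and do not let
     -j create a fresh jobserver.  */
  if (has_word_prefix (makeflags, "--no-jobserver"))
    return JS_SERIAL_WARN;

  if (has_word_prefix (makeflags, "--jobserver-auth="))
    return JS_ATTACH;

  return JS_CREATE;
}
```

Because the marker is checked first, it wins even when a stale
--jobserver-auth is still present, which is what allows appending
instead of surgically removing.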

This prevents an arithmetic proliferation of job worker processes if
the original job server's file descriptors get lost. Currently
recursively-invoked makes will find, and attach themselves to, an
existing job server. This is nice; but this is vulnerable to an edge
case that I think I'm hitting: a false positive involving a leaked
file descriptor. This change encourages fixing whatever's causing make
to fail to detect a recursive invocation.
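
To illustrate the false positive: a test that only asks whether a
descriptor number is open cannot tell a jobserver pipe from an
unrelated descriptor that happens to have the same number. The sketch
below assumes the validity test amounts to an fcntl (fd, F_GETFD)
probe; both helper names are hypothetical:

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Assumed shape of the validity probe: true if FD is open at all.  */
static bool
fd_is_open (int fd)
{
  return fcntl (fd, F_GETFD) != -1;
}

/* A stricter, hypothetical probe: open and actually a pipe or FIFO.  */
static bool
fd_is_pipe (int fd)
{
  struct stat st;

  return fstat (fd, &st) == 0 && S_ISFIFO (st.st_mode);
}
```

A descriptor opened on /dev/null passes fd_is_open but not fd_is_pipe.
Even the stricter probe only narrows things down: some other pipe
leaked at the same number would still pass, so it is not a fix by
itself.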



[bug #48643] Irrelevant targets can confuse make on which pattern rule to select.

2021-02-08 Thread Steven Simpson
Follow-up Comment #3, bug #48643 (project make):

With the is_target test restored in implicit.c, "make check" fails:


features/patternrules ... Error running
/home/simpsons/Works/make/tests/../make (expected 0; got 512):
'/home/simpsons/Works/make/tests/../make' '-f'
'work/features/patternrules.mk.6'
FAILED (11/12 passed)


Seems to be in conflict with bug #17752 - the is_target test was removed to
resolve it, I gather.

I adapted a makefile from features/patternrules so that the intermediate can
have a specific suffix:


STEM = xyz
BIN = $(STEM)$(SFX)
COPY = $(STEM).cp
SRC = $(STEM).c
allbroken: $(COPY) $(BIN) ; @echo ok
$(SRC): ; @echo 'main(){}'
%.cp: %$(SFX) ; @echo $@ from $<
%$(SFX) : %.c ; @echo $@ from $<


Used with no builtin rules, if $(SFX) is non-empty, this works in 3.80, 3.81
and 4.3.  If $(SFX) is empty, only 3.81 fails.

So a %.sfx:%.c rule works, but %:%.c doesn't, because the target lacks a
suffix.  It seems this test eliminates the rule from the candidates:


  /* Rules that can match any filename and are not terminal
     are ignored if we're recursing, so that they cannot be
     intermediate files.  */
  if (recursions > 0 && target[1] == '\0' && !rule->terminal)
    continue;


This suggests that the test case in bug #17752 is a 'feature', as it tries to
use a rule "that can match any filename" to build an intermediate,
specifically, a builtin rule %:%.c.  The in_use flag prevents these repeating,
as in foo.c.c.c, but with umpteen builtin rules of this sort, every
arrangement of every subset of suffixes in these rules is tried, and that
takes a long time, which can result in some checks (variables/EXTRA_PREREQS)
being aborted.

I managed to introduce a new parameter to pattern_search, unsigned anydepth,
which is incremented on the recursive call only if such a rule is used.  The
test above then becomes:


  char anyrule = 0;
  if (target[1] == '\0' && !rule->terminal)
    {
      if (anydepth > 3)
        continue;
      anyrule = 1;
    }


anyrule is then stored in the tryrules entry, so it can be tested on
recursion:


  if (pattern_search (int_file,
                      0,
                      depth + 1,
                      recursions + 1,
                      anydepth + !!tryrules[ri].anyrule))


This allows %:%.X rules to be applied at any depth, but limits the total
number on the stack.

For me, "make check" passes with the anydepth limit set to 0, 1, 2 or 3, but 4
took too long on the variables/EXTRA_PREREQS tests.
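
To give a feel for why the limit matters, here is a toy model (not
code from implicit.c) that counts how many search calls are made when
there are nrules match-anything rules. Each rule may appear once per
chain, standing in for in_use, and a rule is skipped once anydepth
exceeds the limit. The function name and the bitmask representation
are assumptions made for the illustration:

```c
/* Toy model of the search; not make's pattern_search.  USED is a
   bitmask of rules already on the stack (so NRULES must be below the
   bit width of unsigned).  Returns the number of search calls made.  */
unsigned long
count_searches (unsigned used, unsigned anydepth, unsigned limit,
                unsigned nrules)
{
  unsigned long visited = 1;    /* count this call */
  unsigned i;

  for (i = 0; i < nrules; ++i)
    {
      if (used & (1u << i))
        continue;               /* rule already in use on this chain */
      if (anydepth > limit)
        continue;               /* too many match-anything rules stacked */
      visited += count_searches (used | (1u << i), anydepth + 1,
                                 limit, nrules);
    }
  return visited;
}
```

With 5 such rules, count_searches (0, 0, 3, 5) makes 206 calls and a
limit of 0 makes 6, while an effectively unlimited search makes 326.
The gap grows quickly as rules are added, since the unlimited case
visits every arrangement of every subset.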





Re: gmake and ccache conspiring together in creating gremlins

2021-02-08 Thread Paul Smith
On Mon, 2021-02-08 at 10:43 +, Edward Welbourne wrote:
> Sounds to me like that's a bug: when the descriptors are closed, the
> part of MAKEFLAGS that claims they're make's jobserver file
> descriptors should be removed, since that's when the claim stops
> being true.

I believe there have been other similar issues reported recently.

Certainly fixing MAKEFLAGS when we run without jobserver available is
something that could be done.

There is a loss of debugging information if we make this change: today
make can detect if it was invoked in a way that _should_ expect to
receive a jobserver context, but _didn't_ receive that context.  That
is, if make sees that jobserver-auth is set but it can't open the
jobserver pipes it can warn the user that most likely there's a problem
in their environment or with the setup of their makefiles.

Without this warning there's no way to know when this situation occurs.
 It's easy to create a situation where every sub-make will create its
own completely unique jobserver domain.  So you start the top make with
-j4 and run 4 sub-makes; if you do it wrong then each of 4 sub-makes
could create a new jobserver domain, and now you're running 16 jobs in
parallel instead of 4... there's no way for make to warn you about this
situation.
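
Spelled out as a trivial sketch (the function name is made up for the
example), the worst case is simply the product:

```c
/* Worst-case number of parallel jobs when SUBMAKES sub-makes run at
   once.  If SHARED is nonzero they all draw from the parent's JOBS
   slots; otherwise each one creates its own domain of JOBS slots.  */
unsigned
worst_case_jobs (unsigned submakes, unsigned jobs, int shared)
{
  return shared ? jobs : submakes * jobs;
}
```

So worst_case_jobs (4, 4, 0) is 16, against 4 when the jobserver is
shared as intended.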


Another option that I'm considering is moving away from anonymous pipes
and switching to either named pipes or named semaphores instead (I'm
not sure if one or the other is preferred WRT portability).  If I did
that then all of this hullabaloo around open/closed file descriptors,
inherited FDs opened blocking vs. non-blocking, and "passing through"
jobserver access across non-make boundaries would go away.

I liked the original implementation for these reasons:
 * It is very generic: pretty much every system supports simple pipes.
   However, in the end only POSIX systems are using anonymous pipes
   anyway (Windows jobserver already uses Windows named semaphores).
 * It is very easy to manage: using named pipes means that make is
   creating context on the filesystem that it needs to manage and clean
   up, which it otherwise never does; we need to worry about
   permissions, etc.  Anonymous pipes just go away magically.
 * It is very safe: it's not possible for any other process to access
   the pipe and mess up the jobserver count, unless it was invoked by
   make in a "sub-make context".

But, it is difficult to use in some subtle ways as we've seen.

A change would also mean that the format of the --jobserver-auth flag
would change: if the value provided were the current 2 numbers then the
old-school anonymous pipe process would be used.  If it were a path,
then we'd assume it was a named pipe (or named semaphore).

Other tools like LTO etc. that look for jobserver-auth would,
hopefully, be able to manage this.  I tried to be clear about the
accepted formats and behaviors in the GNU make documentation; hopefully
developers are handling incorrect formats properly.
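
As a rough sketch of how a client might tell the two forms apart: this
assumes the path form would be an absolute path, which is my
assumption for the example and not a decided format, and the names
below are hypothetical:

```c
#include <stdio.h>

/* Hypothetical classification of a --jobserver-auth value.  */
enum auth_kind
{
  AUTH_INVALID,
  AUTH_FDS,     /* "R,W": two inherited descriptor numbers */
  AUTH_PATH     /* assumed form: an absolute path to a named object */
};

enum auth_kind
classify_auth (const char *value, int *rfd, int *wfd)
{
  char extra;

  if (value == NULL || value[0] == '\0')
    return AUTH_INVALID;

  /* Exactly two integers separated by a comma, with nothing after
     them: the trailing %c must fail to match for the count to be 2.  */
  if (sscanf (value, "%d,%d%c", rfd, wfd, &extra) == 2)
    return (*rfd >= 0 && *wfd >= 0) ? AUTH_FDS : AUTH_INVALID;

  if (value[0] == '/')
    return AUTH_PATH;

  return AUTH_INVALID;
}
```

Anything that is neither form is rejected instead of guessed at, which
is the behavior I'd hope existing clients already have.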




Re: gmake and ccache conspiring together in creating gremlins

2021-02-08 Thread Dmitry Goncharov via Bug reports and discussion for GNU make
On Mon, Feb 8, 2021 at 12:36 PM Edward Welbourne  wrote:
> Sounds to me like that's a bug: when the descriptors are closed, the
> part of MAKEFLAGS that claims they're make's jobserver file descriptors
> should be removed, since that's when the claim stops being true.

make uses posix_spawn by default to create children, and
posix_spawn makes it difficult to modify the environment per child.
As a workaround, the recipe itself can unset (or modify) MAKEFLAGS,
e.g.:

%.o: %.c ; unset MAKEFLAGS && $(CC)  $(CFLAGS) -o $@ -c $<

regards, Dmitry



Re: gmake and ccache conspiring together in creating gremlins

2021-02-08 Thread Dmitry Goncharov via Bug reports and discussion for GNU make
On Mon, Feb 8, 2021 at 12:51 PM Dmitry Goncharov
 wrote:
>
> On Mon, Feb 8, 2021 at 12:36 PM Edward Welbourne  
> wrote:
> > Sounds to me like that's a bug: when the descriptors are closed, the
> > part of MAKEFLAGS that claims they're make's jobserver file descriptors
> > should be removed, since that's when the claim stops being true.
>
> make uses posix_spawn by default to create children.
> posix_spawn makes it difficult to modify env per child.
> As a workaround the user can have the recipe remove (or modify) MAKEFLAGS
> E.g.
>
> %.o: %.c ; unset MAKEFLAGS && $(CC)  $(CFLAGS) -o $@ -c $<

Oops. Forgot that posix_spawn takes an envp parameter.
Yes, worth fixing.
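
Something along these lines, as a rough sketch only (the helper name is
made up and this is not a patch against make): build a copy of the
environment without MAKEFLAGS and hand it to posix_spawnp.

```c
#include <spawn.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Spawn ARGV[0] (searched on PATH) with a copy of the environment that
   omits MAKEFLAGS.  Returns the child's exit status, or -1 on failure.
   Hypothetical helper for illustration.  */
int
spawn_without_makeflags (char *const argv[])
{
  size_t n = 0, i, j = 0;
  char **envp;
  pid_t pid;
  int status, rc;

  while (environ[n] != NULL)
    ++n;

  envp = malloc ((n + 1) * sizeof *envp);
  if (envp == NULL)
    return -1;

  /* Share the strings, skip only the MAKEFLAGS entry.  */
  for (i = 0; i < n; ++i)
    if (strncmp (environ[i], "MAKEFLAGS=", 10) != 0)
      envp[j++] = environ[i];
  envp[j] = NULL;

  rc = posix_spawnp (&pid, argv[0], NULL, NULL, argv, envp);
  free (envp);
  if (rc != 0)
    return -1;

  if (waitpid (pid, &status, 0) < 0 || !WIFEXITED (status))
    return -1;
  return WEXITSTATUS (status);
}
```

A real fix would presumably rewrite MAKEFLAGS and keep the other flags;
dropping the whole variable just keeps the example short.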

regards, Dmitry



Re: gmake and ccache conspiring together in creating gremlins

2021-02-08 Thread Edward Welbourne
Hi Sam,

Thanks for a delightfully illuminating analysis.
I hope you enjoyed the sleuthing, even if it did cost you a month!

> The TLDR of the above: make reads the job server's file descriptors
> from the MAKEFLAGS environment variable, then checks here if they
> actually exist. If they don't exist, make will create the job server
> pipe. Important: by default they will be file descriptors 3 and 4.
> This becomes a key player in this mystery, a little bit later.
>
> When make spawns a child job (other than a recursive make) the job
> server file descriptors get closed (marked O_CLOEXEC before the actual
> execve), but MAKEFLAGS remains in the environment.

Sounds to me like that's a bug: when the descriptors are closed, the
part of MAKEFLAGS that claims they're make's jobserver file descriptors
should be removed, since that's when the claim stops being true.
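
For illustration, removing that part could look roughly like the
sketch below. It assumes the flags are space-separated words, it
normalizes the spacing as a side effect, and the helper name is
hypothetical; it is not taken from make's source:

```c
#include <string.h>

/* Copy FLAGS into OUT (capacity OUTSZ), dropping every word that
   starts with "--jobserver-auth=".  Returns 0 on success, or -1 if
   OUT is too small.  */
int
strip_jobserver_auth (const char *flags, char *out, size_t outsz)
{
  static const char key[] = "--jobserver-auth=";
  const size_t keylen = sizeof key - 1;
  size_t used = 0;
  const char *p = flags;

  if (outsz == 0)
    return -1;

  while (*p != '\0')
    {
      const char *end;
      size_t len;

      while (*p == ' ')
        ++p;
      end = p;
      while (*end != '\0' && *end != ' ')
        ++end;
      len = (size_t) (end - p);

      if (len > 0 && strncmp (p, key, keylen) != 0)
        {
          /* Room for a separator, the word, and the final NUL.  */
          size_t sep = used > 0 ? 1 : 0;

          if (used + sep + len + 1 > outsz)
            return -1;
          if (sep)
            out[used++] = ' ';
          memcpy (out + used, p, len);
          used += len;
        }
      p = end;
    }
  out[used] = '\0';
  return 0;
}
```

Given " -j4 --jobserver-auth=3,4 -k" this leaves "-j4 -k" in the
output buffer.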

Eddy.