Re: -j fails on DYNIX/ptx
> Date: Wed, 31 May 2000 14:22:39 -0400 (EDT)
> From: "Paul D. Smith" <[EMAIL PROTECTED]>
>
> Two solutions immediately present themselves:
>
>  1) Just wrap stat(2) in a loop checking for EINTR, even though that's
>     not possible on any standard UNIX system.

Actually, EINTR is possible on a POSIX-compliant system, since POSIX
allows (but does not require) stat to fail with EINTR.

Wrapping stat is the right fix, I think. Also, while we're on the
subject, that code should check for other failures. For example, if
the file can't be stat'ed because a parent directory is not searchable,
an error should be reported. Here's a proposed patch.

2000-05-31  Paul Eggert  <[EMAIL PROTECTED]>

	* remake.c (name_mtime): Check for stat failures. Retry if EINTR.

===================================================================
RCS file: remake.c,v
retrieving revision 3.79.0.2
retrieving revision 3.79.0.3
diff -pu -r3.79.0.2 -r3.79.0.3
--- remake.c	2000/05/22 16:45:52	3.79.0.2
+++ remake.c	2000/05/31 21:25:10	3.79.0.3
@@ -1216,8 +1216,13 @@ name_mtime (name)
 {
   struct stat st;
 
-  if (stat (name, &st) < 0)
-    return (FILE_TIMESTAMP) -1;
+  while (stat (name, &st) != 0)
+    if (errno != EINTR)
+      {
+	if (errno != ENOENT && errno != ENOTDIR)
+	  perror_with_name ("stat:", name);
+	return (FILE_TIMESTAMP) -1;
+      }
 
   return FILE_TIMESTAMP_STAT_MODTIME (st);
 }
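To try the same retry pattern outside of make, here is a minimal standalone sketch of the patched logic. The function name `mtime_or_minus_one` is hypothetical; it returns a plain `time_t` rather than make's `FILE_TIMESTAMP`, and it calls the standard `perror` where make uses its own `perror_with_name`.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical stand-in for name_mtime(): return the file's modification
   time, or (time_t) -1 if it cannot be determined.  As in the patch:
   retry while stat() fails with EINTR, stay quiet when the file simply
   does not exist (ENOENT, ENOTDIR), and report any other failure. */
static time_t
mtime_or_minus_one (const char *name)
{
  struct stat st;

  while (stat (name, &st) != 0)
    if (errno != EINTR)
      {
        if (errno != ENOENT && errno != ENOTDIR)
          perror (name);        /* make uses perror_with_name() here */
        return (time_t) -1;
      }

  return st.st_mtime;
}
```

Called on a missing file, it returns `(time_t) -1` silently with `errno` left at `ENOENT`; a failure such as `EACCES` is printed before the function returns.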
RE: -j fails on DYNIX/ptx
I can confirm that many filesystem operations can fail with EINTR when
operating over NFS. However, someone had to be causing the signals in
question - SIGALRM, maybe, or keyboard SIGINT, etc. Protecting stat() is
usually the most important remedy, since failure will cause you to think
that a file does not exist, even though it does.

The big question is what is generating the signals - presumably it's
SIGCHLD in this case, and there's probably not much you can do about that.

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc

> -----Original Message-----
> From: Paul D. Smith [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, May 31, 2000 12:22 PM
> To: Michael Sterrett -Mr. Bones.-
> Cc: [EMAIL PROTECTED]
> Subject: Re: -j fails on DYNIX/ptx
>
>
> %% "Michael Sterrett -Mr. Bones.-" <[EMAIL PROTECTED]> writes:
>
>   ms> Do the specifications say that EINTR is not required or that it
>   ms> is forbidden?
>
> Hmm. They say that a function may have more return error codes than
> listed in the standards, so you're right: I guess there's nothing
> technically preventing EINTR.
>
> stat() is listed as required to be safe to call within a signal handler,
> which isn't directly related of course.
>
>   ms>      EINTR     A signal was caught during the stat() or
>   ms>                lstat() function.
>
>   ms> Without the code, there's no way for me to know if Solaris will
>   ms> actually ever fail with EINTR, but the man page seems to indicate
>   ms> that it *could*.
>
> I suspect, but can't prove, that you won't get EINTR from stat(2) for
> "normal" filesystems.
>
>   ms> I guess I'm still not convinced that this problem couldn't be
>   ms> reproduced if sufficiently adverse conditions were
>   ms> encountered, even on "normal" UNIX systems. I'm for adding
>   ms> the loop on EINTR to the GNU make code base.
>
> I just find it hard to believe that there have been no other reported
> cases of this anywhere else; if it's possible, no matter how obscure, it
> seems like _someone_ else would have hit it somewhere, by now. Sigh.
>
> The problem is, there are a number of places that use stat(2) in GNU
> make in addition to this one. Do we need to armor them all? What about
> other system calls?
>
> If this is a real (even potential) problem for most/all OS's, then maybe
> a different solution than wrapping all the system calls in EINTR checks
> is in order.
>
> --
> -------------------------------------------------------------------------------
>  Paul D. Smith <[EMAIL PROTECTED]>         Find some GNU make tips at:
>  http://www.gnu.org                      http://www.ultranet.com/~pauld/gmake/
>  "Please remain calm...I may be mad, but I am a professional." --Mad Scientist
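One answer to the question of a different solution is to deal with the signal rather than with each call site. If SIGCHLD is the culprit, as Howard suspects, the handler can be installed with `SA_RESTART`, which asks the system to restart an interrupted call instead of failing it with EINTR. This is a sketch under assumptions: `note_child` and `install_sigchld_handler` are hypothetical names, and whether DYNIX/ptx honored `SA_RESTART` for stat(2) over NFS is not something this thread establishes.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>

/* Hypothetical flag set by the handler; a real program would reap
   children (e.g. with waitpid) after noticing it. */
static volatile sig_atomic_t child_exited = 0;

static void
note_child (int sig)
{
  (void) sig;
  child_exited = 1;
}

/* Install the SIGCHLD handler with SA_RESTART so that interruptible
   system calls are restarted after the handler returns rather than
   failing with EINTR.  Returns 0 on success, -1 on error. */
static int
install_sigchld_handler (void)
{
  struct sigaction sa;

  memset (&sa, 0, sizeof sa);
  sa.sa_handler = note_child;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = SA_RESTART;
  return sigaction (SIGCHLD, &sa, NULL);
}
```

Systems differ in which calls they restart even with `SA_RESTART`, so this narrows the window rather than guaranteeing EINTR never appears; an explicit retry around the most critical call, such as the stat(2) in name_mtime(), still makes sense alongside it.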
Re: -j fails on DYNIX/ptx
%% "Michael Sterrett -Mr. Bones.-" <[EMAIL PROTECTED]> writes:

  ms> Do the specifications say that EINTR is not required or that it
  ms> is forbidden?

Hmm. They say that a function may have more return error codes than
listed in the standards, so you're right: I guess there's nothing
technically preventing EINTR.

stat() is listed as required to be safe to call within a signal handler,
which isn't directly related of course.

  ms>      EINTR     A signal was caught during the stat() or
  ms>                lstat() function.

  ms> Without the code, there's no way for me to know if Solaris will
  ms> actually ever fail with EINTR, but the man page seems to indicate
  ms> that it *could*.

I suspect, but can't prove, that you won't get EINTR from stat(2) for
"normal" filesystems.

  ms> I guess I'm still not convinced that this problem couldn't be
  ms> reproduced if sufficiently adverse conditions were
  ms> encountered, even on "normal" UNIX systems. I'm for adding
  ms> the loop on EINTR to the GNU make code base.

I just find it hard to believe that there have been no other reported
cases of this anywhere else; if it's possible, no matter how obscure, it
seems like _someone_ else would have hit it somewhere, by now. Sigh.

The problem is, there are a number of places that use stat(2) in GNU
make in addition to this one. Do we need to armor them all? What about
other system calls?

If this is a real (even potential) problem for most/all OS's, then maybe
a different solution than wrapping all the system calls in EINTR checks
is in order.

--
-------------------------------------------------------------------------------
 Paul D. Smith <[EMAIL PROTECTED]>
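If many call sites need protection, the retry can be written once as a macro instead of as a hand-coded loop around each stat(2). A minimal sketch, assuming a hypothetical macro `RETRY_ON_EINTR` and a hypothetical helper `size_of`; neither is taken from the GNU make sources discussed here.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: evaluate EXPR (any call that returns -1 and sets
   errno on failure), store its value in V, and evaluate it again for as
   long as it fails with EINTR. */
#define RETRY_ON_EINTR(v, expr) \
  do { (v) = (expr); } while ((v) == -1 && errno == EINTR)

/* Hypothetical example: report a file's size via open(2) and fstat(2),
   both protected by the same macro.  close(2) is deliberately not
   retried (see the note after this block). */
static int
size_of (const char *name, off_t *size)
{
  int fd, rc;
  struct stat st;

  RETRY_ON_EINTR (fd, open (name, O_RDONLY));
  if (fd == -1)
    return -1;

  RETRY_ON_EINTR (rc, fstat (fd, &st));
  if (rc == -1)
    {
      int saved_errno = errno;  /* keep fstat's error across close() */
      close (fd);
      errno = saved_errno;
      return -1;
    }

  close (fd);
  *size = st.st_size;
  return 0;
}
```

The macro works for any call that reports failure by returning -1. `close` is left out on purpose: on some systems, Linux among them, the descriptor is released even when close(2) reports EINTR, so retrying it could close a descriptor number that has since been reused.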
Re: -j fails on DYNIX/ptx
On Wed, 31 May 2000, Paul D. Smith wrote:

> OK, I've investigated this further. The reason you never see this
> problem on "normal" UNIX systems is that it's not legal for stat(2) to
> fail with EINTR. In other words, stat(2) is not interruptible, by
> definition, due to signals. Looking at both the POSIX and SingleUNIX
> specifications for stat(2), EINTR is not a legal error state when
> stat(2) returns.
>
> Two solutions immediately present themselves:
>
>  1) Just wrap stat(2) in a loop checking for EINTR, even though that's
>     not possible on any standard UNIX system. I don't think this would
>     be much of a slowdown in the code, but others might disagree (for
>     sure this stat(2) is one of the most common system calls make uses).
>
>  2) Use a configure check for this OS (I don't see how a configure macro
>     for this can easily be written) and only wrap the stat(2) in an
>     EINTR loop on this OS (i386-sequent-sysv4).

Paul -

Do the specifications say that EINTR is not required or that it is
forbidden? From the man page for stat(2) on Solaris:

--CUT---
ERRORS
     stat() and lstat() fail if one or more of the following are true:

     EINTR     A signal was caught during the stat() or
               lstat() function.
--CUT---

Without the code, there's no way for me to know if Solaris will actually
ever fail with EINTR, but the man page seems to indicate that it *could*.

I guess I'm still not convinced that this problem couldn't be reproduced
if sufficiently adverse conditions were encountered, even on "normal"
UNIX systems. I'm for adding the loop on EINTR to the GNU make code base.

Thanks for your work on this,

Michael Sterrett
  -Mr. Bones.-
[EMAIL PROTECTED]
Re: -j fails on DYNIX/ptx
%% "Michael Sterrett -Mr. Bones.-" <[EMAIL PROTECTED]> writes:

Re: an issue with GNU make 3.78 and above on DYNIX/ptx...

  ms> $ gmake --version
  ms> GNU Make version 3.78.1, by Richard Stallman and Roland McGrath.
  ms> Built for i386-sequent-sysv4

  ms> $ uname -a
  ms> DYNIX/ptx roll 4.0 V4.4.4 i386

  ms> Incidentally, I can also reproduce it on V4.4.7.

  ms> TARGETS = $(patsubst %.abc,%.xyz,$(wildcard *[0-9].abc))

  ms> %.xyz: %.abc
  ms> 	@touch $@

  ms> all: $(TARGETS)

  ms> There are 100 files in the directory called 1.abc, 2.abc, and so on.

  ms> $ gmake -j
  ms> gmake: *** No rule to make target `12.abc', needed by `12.xyz'.  Stop.
  ms> gmake: *** Waiting for unfinished jobs

  ms> The number varies, but it usually fails somewhere. Without the -j
  ms> option, or with -j1, the build completes as expected. Also, gmake
  ms> -j2 is just as unreliable so I don't think it's a resource or
  ms> memory problem.

The problem turns out to be that the stat(2) system call in
remake.c:name_mtime() is failing with EINTR. Wrapping it in a loop to
repeat on EINTR solves the problem.

-----

OK, I've investigated this further. The reason you never see this
problem on "normal" UNIX systems is that it's not legal for stat(2) to
fail with EINTR. In other words, stat(2) is not interruptible, by
definition, due to signals. Looking at both the POSIX and SingleUNIX
specifications for stat(2), EINTR is not a legal error state when
stat(2) returns.

Two solutions immediately present themselves:

 1) Just wrap stat(2) in a loop checking for EINTR, even though that's
    not possible on any standard UNIX system. I don't think this would
    be much of a slowdown in the code, but others might disagree (for
    sure this stat(2) is one of the most common system calls make uses).

 2) Use a configure check for this OS (I don't see how a configure macro
    for this can easily be written) and only wrap the stat(2) in an
    EINTR loop on this OS (i386-sequent-sysv4).

--
-------------------------------------------------------------------------------
 Paul D. Smith <[EMAIL PROTECTED]>
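Option 2 would confine the loop to the one affected platform. Since Paul doubts a configure test for this is easy to write, the sketch below simply assumes a hypothetical `STAT_MAY_FAIL_WITH_EINTR` symbol supplied by a hand-edited or configure-generated config.h; `checked_stat` is likewise a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>

/* Hypothetical configuration symbol.  On the affected platform
   (i386-sequent-sysv4) config.h would define it; elsewhere it stays
   undefined and the plain stat(2) call is compiled. */
/* #define STAT_MAY_FAIL_WITH_EINTR 1 */

static int
checked_stat (const char *name, struct stat *st)
{
#ifdef STAT_MAY_FAIL_WITH_EINTR
  int rc;

  /* Retry only on the platform known to interrupt stat(2). */
  do
    rc = stat (name, st);
  while (rc != 0 && errno == EINTR);
  return rc;
#else
  return stat (name, st);
#endif
}
```

On the success path the unconditional loop of option 1 costs nothing extra: `while (stat (name, &st) != 0)` performs the same single comparison as the original `if`. That weighs toward option 1, since option 2 adds a code path that only one platform ever compiles.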
RE: HP-UX 64 bit bug
Sorry folks, false alarm. It seems that the Imake that generated the
makefiles GNU make died on was the culprit: it wasn't being built
correctly for some reason, and so it generated bad rules.

Regards,

Mark Syms

> -----Original Message-----
> From: Mark Syms
> Sent: Wednesday, May 24, 2000 10:37 AM
> To: '[EMAIL PROTECTED]'
> Subject: HP-UX 64 bit bug
>
> We are having some problems using GNU make on a new HP 9000 L2000 machine
> (64 bit) having moved from using older 32 bit machines.
>
> It appears that the dependency checking is getting confused somewhere when
> trying to build a file.
>
> Makefile snippet
>
> transport.c: $(TRANSCOMMSRC)/transport.c
> 	$(RM) $@
> 	$(LN) $? $@
>
> TRANSCOMMSRC is ../../lib/xtrans
> transport.c is itself a symbolic link to a source tree, i.e. the expected
> result is a symbolic link to a symbolic link to a real file.
>
> HP 9000 L2000 (64 bit PA RISC)
> ------------------------------
>
> Reading makefiles...
> Reading makefile `Makefile'...
> Updating makefiles
> Makefile `Makefile' might loop; not remaking it.
> Updating goal targets
> Considering target file `../../lib/xtrans/transport.c'.
> Looking for an implicit rule for `../../lib/xtrans/transport.c'.
> Trying pattern rule with stem `transport'.
> Trying rule dependency `/src/cascade/main/X/lib/ICE'.
> Trying implicit dependency `../../lib/xtrans//transport.c'.
> Found an implicit rule for `../../lib/xtrans/transport.c'.
> Considering target file `/src/cascade/main/X/lib/ICE'.
>
>
> HP 9000 C110 (32 bit PA-RISC)
> -----------------------------
>
> Reading makefiles...
> Reading makefile `Makefile'...
> Updating makefiles
> Makefile `Makefile' might loop; not remaking it.
> Updating goal targets
> Considering target file `transport.c'.
> File `transport.c' does not exist.
> Considering target file `../../lib/xtrans/transport.c'.
>   Looking for an implicit rule for `../../lib/xtrans/transport.c'.
>   Trying pattern rule with stem `transport'.
>   Trying implicit dependency `../../lib/xtrans//src/cascade/main/X/lib/ICE/transport.c'.
>   Trying pattern rule with stem `transport'.
>   Trying implicit dependency `../../lib/xtrans/transport.y'.
>
> ---------------------------------------------------------------------
>
> As can be seen from the debug information, the 32 bit and 64 bit machines
> diverge on line 8, with the 64 bit trying a rule dependency and the 32 bit
> using an implicit rule. If this continues, the following happens:
>
> Considering target file `../../lib/xtrans///transport.c'.
> Looking for an implicit rule for `../../lib/xtrans///transport.c'.
> Trying pattern rule with stem `transport'.
> Trying rule dependency `/src/cascade/main/X/lib/ICE'.
> Trying implicit dependency `../../lib/xtranstransport.c'.
> Found an implicit rule for `../../lib/xtrans///transport.c'.
> Considering target file `/src/cascade/main/X/lib/ICE'.
> File `/src/cascade/main/X/lib/ICE' was considered already.
> Considering target file `../../lib/xtranstransport.c'.
>   Looking for an implicit rule for `../../lib/xtranstransport.c'.
>   Trying pattern rule with stem `transport'.
>   Trying rule dependency `/src/cascade/main/X/lib/ICE'.
>
> As can easily be seen, extra /'s are added to the rule checking; this
> continues seemingly without termination until a limit is reached
> (recursion depth possibly?).
>
> This problem has been observed in 3.75, 3.77 and 3.79 gmake with binary
> and source (built on both 32- and 64-bit platforms) distributions. Having
> had a cursory look with a debugger on the 64 bit machine, it seems that the
> rules are corrupt before the parsing operation is completed.
>
> Any assistance would be appreciated. If any further information is
> required please contact me.
>
> Mark Syms
>
> Software Engineer
> Citrix Systems (Research and Development) Ltd
> +44 1223 568 953
> [EMAIL PROTECTED]