Re: [ast-developers] Nonconforming glibc/Linux posix_spawn() = ast-ksh 2012-09-27 fails
On Mon, 01 Oct 2012 13:49:28 +0200 Michal Hlavinka wrote:
> > #if _lib_posix_spawn < 2
> >         if (waitpid(pid, &err, WNOHANG|WNOWAIT) == pid && EXIT_STATUS(err) == 127)
> >         {
> >                 while (waitpid(pid, NiL, 0) == -1 && errno == EINTR);
> >                 if (!access(path, X_OK))
> >                         errno = ENOEXEC;
> >                 pid = -1;
> >         }
> > #endif
>
> It will fail with EINVAL, because of WNOWAIT. This option is not
> supported in waitpid(). It's supported only in waitid().
>
> so you'd have to use something like
>
>         pid_t pid;
>         siginfo_t err;
>         if (!waitid(P_PID, pid, &err, WEXITED|WNOHANG|WNOWAIT) && err.si_status == 127)

thanks for pointing out the WNOWAIT misuse

alas even with the correct waitid() usage it does not solve the problem
with all posix_spawn() implementations that use exit code 127 as a catch-all
for posix_spawn() related problems in the child process

in the particular case of how shell implementations *portably* determine
binary executable vs shell script (exec fails with ENOEXEC)
such posix_spawn() implementations are useless

e.g. a shell calling posix_spawn() must be able to differentiate
        EAGAIN  retry in case some resources become free
        E2BIG   diagnostic or apply xargs algorithm if appropriate
        ENOEXEC either a shell script or a diagnostic
and you can't get that from exit status 127

___
ast-developers mailing list
ast-developers@research.att.com
https://mailman.research.att.com/mailman/listinfo/ast-developers
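The distinction drawn in the message above can be sketched in C. This is only an illustration, assuming a posix_spawn() that reports exec failures through its return value (the _lib_posix_spawn 2 case from this thread); the enum and the function names classify_spawn_error and try_spawn are hypothetical and come from neither ksh nor libast.

```c
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>

extern char **environ;

/* Hypothetical classification of what a shell might do next. */
enum spawn_action {
	SPAWN_OK,		/* child was started */
	SPAWN_RETRY,		/* EAGAIN: resources may become free, retry */
	SPAWN_TOO_BIG,		/* E2BIG: diagnose or split the argument list */
	SPAWN_TRY_SCRIPT,	/* ENOEXEC: possibly a shell script */
	SPAWN_FAIL		/* anything else: emit a diagnostic */
};

/* Map a posix_spawn() return value to the caller's next step. */
enum spawn_action classify_spawn_error(int err)
{
	switch (err) {
	case 0:
		return SPAWN_OK;
	case EAGAIN:
		return SPAWN_RETRY;
	case E2BIG:
		return SPAWN_TOO_BIG;
	case ENOEXEC:
		return SPAWN_TRY_SCRIPT;
	default:
		return SPAWN_FAIL;
	}
}

/*
 * posix_spawn() returns the error number directly instead of setting
 * errno. When the implementation only signals failure as child exit
 * status 127, err is 0 here and none of the cases above can be told
 * apart, which is the complaint in this thread.
 */
enum spawn_action try_spawn(const char *path, char *const argv[], pid_t *pid)
{
	int err = posix_spawn(pid, path, NULL, NULL, argv, environ);

	return classify_spawn_error(err);
}
```

A caller would loop on SPAWN_RETRY, and on SPAWN_TRY_SCRIPT hand the file to the shell's own script interpreter.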
[ast-developers] Nonconforming glibc/Linux posix_spawn() = ast-ksh 2012-09-27 fails
On 28 September 2012 07:44, Glenn Fowler g...@research.att.com wrote:
> { INIT ast-ksh } 2012-09-27 alphas posted to
> www.research.att.com/sw/download/alpha/

We experience a lot of failures with ast-ksh 2012-09-27 on Suse 12.2
Linux and latest Fedora:

test arith begins at 2012-09-28+08:51:50
        arith.sh[420]: compound var arithmetic failed
        arith.sh[421]: compound var arithmetic failed
        arith.sh[422]: compound var arithmetic failed
        arith.sh[423]: compound var arithmetic failed
        arith.sh[424]: compound var arithmetic failed
        arith.sh[425]: compound var arithmetic failed
        arith.sh[426]: compound var arithmetic failed
test arith failed at 2012-09-28+08:51:50 with exit code 1 [ 201 tests 1 error ]
test attributes begins at 2012-09-28+09:19:32
        attributes.sh[128]: attributes not cleared for script execution
        attributes.sh[133]: typeset -L should not be inherited
test attributes failed at 2012-09-28+09:19:34 with exit code 1 [ 110 tests 1 error ]
test attributes(shcomp) begins at 2012-09-28+09:19:34
        shcomp-attributes.ksh[128]: attributes not cleared for script execution
        shcomp-attributes.ksh[133]: typeset -L should not be inherited
test attributes(shcomp) failed at 2012-09-28+09:19:36 with exit code 2 [ 110 tests 2 errors ]
test basic begins at 2012-09-28+09:19:36
        basic.sh[165]: script not working
        basic.sh[171]: output file pointer not shared correctly
        basic.sh[198]: builtin replaces standard input pipe
        basic.sh[204]: $0 not correct for . script
        basic.sh[211]: nested scripts failed
        basic.sh[215]: scripts in subshells fail
        basic.sh[350]: piping into script fails
        basic.sh[359]: script pipe to shell fails
blabla

We've traced this down to the nonconforming glibc/Linux implementation
of posix_spawn() - disabling it cures the problem on Linux. I
crosschecked with the AIX build - it uses posix_spawn() the same way
but without triggering any failures.
I think this is a follow-up to
http://marc.info/?l=ast-developers&m=134785274012526&w=2 - I can't
agree with the assertion of Redhat's Michal Hlavinka that glibc
posix_spawn() is right, because the current behaviour is IMO useless
for use in a shell (hence the failures in the testsuite), and think a
fix in glibc is still required.

Ced (who needs a sedative now to calm down - 3rd glibc bug *this* week)
--
Cedric Blancher cedric.blanc...@googlemail.com
Institute Pasteur
Re: [ast-developers] Nonconforming glibc/Linux posix_spawn() = ast-ksh 2012-09-27 fails
On Fri, 28 Sep 2012 10:21:49 +0200 Cedric Blancher wrote:
> We've traced this down to the nonconforming glibc/Linux implementation
> of posix_spawn() - disabling it cures the problem on Linux. I
> crosschecked with the AIX build - it uses posix_spawn() the same way
> but without triggering any failures.
> I think this is a follow-up to
> http://marc.info/?l=ast-developers&m=134785274012526&w=2 - I can't
> agree with the assertion of Redhat's Michal Hlavinka that glibc
> posix_spawn() is right, because the current behaviour is IMO useless
> for use in a shell (hence the failures in the testsuite), and think a
> fix in glibc is still required.

to recap:

        grep _lib_posix_spawn arch/*/src/lib/libast/FEATURE/lib

there are 3 possible results
(1) not there = posix_spawn() unusable
(2) #define _lib_posix_spawn 2 = works with no workarounds
(3) #define _lib_posix_spawn 1 = works but posix_spawn() on an
    executable file that would fail with ENOEXEC via execve()
    creates a process that exits with status 127

our sol10.* systems have _lib_posix_spawn 1 and they work
so something else is going on
(we don't have a linux system with the new glibc posix_spawn())

it may be a timing problem with this logic in src/lib/libast/misc/spawnvex.c
(spawnvex() is new and the api has not settled yet)

#if _lib_posix_spawn < 2
        if (waitpid(pid, &err, WNOHANG|WNOWAIT) == pid && EXIT_STATUS(err) == 127)
        {
                while (waitpid(pid, NiL, 0) == -1 && errno == EINTR);
                if (!access(path, X_OK))
                        errno = ENOEXEC;
                pid = -1;
        }
#endif

can you do an strace and see what the waitpid() is returning?
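For reference, a minimal sketch of the same non-blocking peek written against waitid(), since Linux waitpid() rejects WNOWAIT with EINVAL (as noted elsewhere in this thread). The helper names peek_exit_127 and reap are hypothetical and not part of libast. The sketch does not close the timing window: a result of 0 may simply mean the child has not exited yet.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: check, without reaping, whether child `pid`
 * has already exited with status 127.
 * Returns 1 if it exited 127, 0 if it is still running or exited with
 * another status, and -1 if waitid() itself failed (errno is set).
 */
int peek_exit_127(pid_t pid)
{
	siginfo_t info;

	/* With WNOHANG and no state change, waitid() returns 0; clearing
	 * the structure first makes si_pid == 0 a reliable "not yet". */
	memset(&info, 0, sizeof(info));
	if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOHANG | WNOWAIT) == -1)
		return -1;
	if (info.si_pid == 0)
		return 0;	/* child has not exited yet */
	return info.si_code == CLD_EXITED && info.si_status == 127;
}

/* Reap the child, retrying when interrupted by a signal.
 * Returns the wait status, or -1 on failure. */
int reap(pid_t pid)
{
	int status;

	while (waitpid(pid, &status, 0) == -1)
		if (errno != EINTR)
			return -1;
	return status;
}
```

Because WNOWAIT leaves the child waitable, a later reap() call still collects the real status.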
my guess is on solaris the child process has exited 127 on ENOEXEC before
the waitpid(pid, &err, WNOHANG|WNOWAIT) and on linux the process has not yet exited
(but looking at build log over the last week I see some spurious exit code 127
failures on solaris, so it looks like a timing problem even for solaris)

the standard allows exit code 127 for fork()/exec() in the case of ENOEXEC
producing a child process that will eventually exit 127

I'm beginning to fear that there is no way to work around the timing window
-- a sleep() before waitpid() would be dumb and not guaranteed to work anyway
-- the posix_spawn() wrapper could check the magic number but I don't want to
   get into the magic number game, that's exec*()'s job

so if it is a timing window, the iffe test will have to fail posix_spawn()
implementations that create a child process for ENOEXEC

and if that's the case it shows how useless posix_spawn() is because the caller
only knows exit status 127, not the root of the problem

in the case of the shell calling posix_spawn() it must know the reason for failure
ENOEXEC means the shell can attempt to treat the executable as a script
not so for exit code 127

I just noticed that this code is not strictly portable because it relies on
the non-standard linux WNOWAIT without iffe-ing or #ifdef-ing it
for now it's ok (by luck) because _lib_posix_spawn=1 only on { linux solaris }
I'll modify the iffe test to only emit _lib_posix_spawn=1 if WNOWAIT is defined otherwise
Re: [ast-developers] Nonconforming glibc/Linux posix_spawn() = ast-ksh 2012-09-27 fails
On Fri, Sep 28, 2012 at 11:38 AM, Glenn Fowler g...@research.att.com wrote:
> to recap:
>
>         grep _lib_posix_spawn arch/*/src/lib/libast/FEATURE/lib
>
> there are 3 possible results
> (1) not there = posix_spawn() unusable
> (2) #define _lib_posix_spawn 2 = works with no workarounds
> (3) #define _lib_posix_spawn 1 = works but posix_spawn() on an
>     executable file that would fail with ENOEXEC via execve()
>     creates a process that exits with status 127
>
> our sol10.* systems have _lib_posix_spawn 1 and they work
> so something else is going on

Solaris 11, Opensolaris, Openindiana/Illumos (Solaris clone) and AIX all
produce _lib_posix_spawn==2. I think there are patches for Solaris 10
which backport the posix_spawn() fixes from Solaris 11 to Solaris 10,
because behaviours such as _lib_posix_spawn==1 are not conforming and
cause the Single Unix Standard and VS* test suites to fail.

Irek
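The two behaviours distinguished by the recap (ENOEXEC reported to the caller versus a child that exits 127) can be told apart with a small runtime probe. The following is a rough sketch, not the actual iffe test: the function name probe_posix_spawn, the result codes, and the /tmp path are all assumptions made for illustration. It spawns a file that contains a one-line sh script without a #! line, so execve() is expected to fail with ENOEXEC, while an implementation that silently falls back to sh would exit 42 instead of 127. A noexec /tmp would make the result meaningless.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <spawn.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical result codes, loosely mirroring the values in the recap. */
enum probe_result {
	PROBE_ERROR = -1,	/* the probe itself could not run */
	PROBE_OTHER = 0,	/* some other behaviour */
	PROBE_EXIT_127 = 1,	/* child created, then exits 127 */
	PROBE_ENOEXEC = 2	/* ENOEXEC returned to the caller */
};

enum probe_result probe_posix_spawn(void)
{
	char path[] = "/tmp/spawnprobeXXXXXX";	/* assumed writable, exec allowed */
	static const char body[] = "exit 42\n";	/* no #! line on purpose */
	char *argv[] = { path, NULL };
	enum probe_result result = PROBE_OTHER;
	pid_t pid, r;
	int fd, err, status;

	if ((fd = mkstemp(path)) == -1)
		return PROBE_ERROR;
	if (write(fd, body, sizeof(body) - 1) != (ssize_t)(sizeof(body) - 1) ||
	    fchmod(fd, S_IRWXU) == -1) {
		close(fd);
		unlink(path);
		return PROBE_ERROR;
	}
	close(fd);

	err = posix_spawn(&pid, path, NULL, NULL, argv, environ);
	if (err == ENOEXEC) {
		result = PROBE_ENOEXEC;
	} else if (err == 0) {
		/* A child exists: wait for it so there is no timing window. */
		do
			r = waitpid(pid, &status, 0);
		while (r == -1 && errno == EINTR);
		if (r == -1)
			result = PROBE_ERROR;
		else if (WIFEXITED(status) && WEXITSTATUS(status) == 127)
			result = PROBE_EXIT_127;
	}
	/* Remove the file only after the child is gone. */
	unlink(path);
	return result;
}
```

Unlike the shell itself, a probe can afford a blocking waitpid(), which is why it sidesteps the race discussed earlier in the thread.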