Re: [ast-developers] Nonconforming glibc/Linux posix_spawn() = ast-ksh 2012-09-27 fails

2012-10-01 Thread Glenn Fowler

On Mon, 01 Oct 2012 13:49:28 +0200 Michal Hlavinka wrote:
  #if _lib_posix_spawn < 2
  if (waitpid(pid, &err, WNOHANG|WNOWAIT) == pid && EXIT_STATUS(err) == 127)
  {
          while (waitpid(pid, NiL, 0) == -1 && errno == EINTR);
          if (!access(path, X_OK))
                  errno = ENOEXEC;
          pid = -1;
  }
  #endif

 It will fail with EINVAL, because of WNOWAIT. This option is not 
 supported in waitpid. It's supported only in waitid().

 so you'd have to use something like

 pid_t pid;
 siginfo_t err;
 if (!waitid(P_PID, pid, &err, WEXITED|WNOHANG|WNOWAIT) && err.si_status == 127)

thanks for pointing out the WNOWAIT misuse
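
A minimal sketch of the waitid() form suggested above; it is not the actual spawnvex.c code, and `child_exited_127` is a hypothetical helper name. It peeks at the child's status without reaping it. The struct is zeroed first because, with WNOHANG, waitid() returns 0 even when the child has not changed state yet, and a zero si_pid is the portable way to detect that case.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Peek at a child's status without reaping it (hypothetical helper).
 * Returns 1 if the child has already exited with status 127,
 * 0 if it is still running or ended some other way, -1 on error.
 */
int child_exited_127(pid_t pid)
{
    siginfo_t info;

    /* si_pid stays 0 if WNOHANG finds no state change yet */
    memset(&info, 0, sizeof info);
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOHANG | WNOWAIT) == -1)
        return -1;
    if (info.si_pid == 0)
        return 0;               /* child has not exited yet */
    return info.si_code == CLD_EXITED && info.si_status == 127;
}
```

A return of 0 only says the child has not exited with 127 so far; it says nothing about whether the exec succeeded.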

alas even with the correct waitid() usage it does not solve the problem
with all posix_spawn() implementations that use exit code 127 as a catch-all for
posix_spawn() related problems in the child process

in the particular case of how shell implementations *portably* determine
binary executable vs shell script (exec fails with ENOEXEC)
such posix_spawn() implementations are useless

e.g. a shell calling posix_spawn() must be able to differentiate
  EAGAIN    retry in case some resources become free
  E2BIG     diagnostic or apply xargs algorithm if appropriate
  ENOEXEC   either a shell script or a diagnostic
and you can't get that from exit status 127
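
The three cases above could be dispatched roughly as follows when posix_spawn() reports the error as its return value. This is only an illustrative sketch; the enum and the function name `spawn_action_for` are hypothetical and do not come from ksh or libast.

```c
#include <errno.h>

/* What a shell might do next after posix_spawn() returns an error. */
enum spawn_action {
    SPAWN_RETRY,        /* EAGAIN: resources may become free, try again */
    SPAWN_SPLIT_ARGS,   /* E2BIG: diagnose, or batch arguments xargs-style */
    SPAWN_TRY_SCRIPT,   /* ENOEXEC: treat the file as a shell script */
    SPAWN_DIAGNOSE      /* anything else: report the error */
};

/* Map a posix_spawn() return value to the follow-up action. */
enum spawn_action spawn_action_for(int err)
{
    switch (err) {
    case EAGAIN:  return SPAWN_RETRY;
    case E2BIG:   return SPAWN_SPLIT_ARGS;
    case ENOEXEC: return SPAWN_TRY_SCRIPT;
    default:      return SPAWN_DIAGNOSE;
    }
}
```

With an implementation that only produces a child exiting 127, there is no `err` value to feed into such a switch, which is the complaint in this message.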

___
ast-developers mailing list
ast-developers@research.att.com
https://mailman.research.att.com/mailman/listinfo/ast-developers


[ast-developers] Nonconforming glibc/Linux posix_spawn() = ast-ksh 2012-09-27 fails

2012-09-28 Thread Cedric Blancher
On 28 September 2012 07:44, Glenn Fowler g...@research.att.com wrote:

 { INIT ast-ksh } 2012-09-27 alphas posted to
 www.research.att.com/sw/download/alpha/

We experience a lot of failures with ast-ksh 2012-09-27 on Suse 12.2
Linux and latest Fedora:

test arith begins at 2012-09-28+08:51:50
arith.sh[420]: compound var arithmetic failed
arith.sh[421]: compound var arithmetic failed
arith.sh[422]: compound var arithmetic failed
arith.sh[423]: compound var arithmetic failed
arith.sh[424]: compound var arithmetic failed
arith.sh[425]: compound var arithmetic failed
arith.sh[426]: compound var arithmetic failed
test arith failed at 2012-09-28+08:51:50 with exit code 1 [ 201 tests 1 error ]
test attributes begins at 2012-09-28+09:19:32
attributes.sh[128]: attributes not cleared for script execution
attributes.sh[133]: typeset -L should not be inherited
test attributes failed at 2012-09-28+09:19:34 with exit code 1 [ 110
tests 1 error ]
test attributes(shcomp) begins at 2012-09-28+09:19:34
shcomp-attributes.ksh[128]: attributes not cleared for script execution
shcomp-attributes.ksh[133]: typeset -L should not be inherited
test attributes(shcomp) failed at 2012-09-28+09:19:36 with exit code 2
[ 110 tests 2 errors ]
test basic begins at 2012-09-28+09:19:36
basic.sh[165]: script not working
basic.sh[171]: output file pointer not shared correctly
basic.sh[198]: builtin replaces standard input pipe
basic.sh[204]: $0 not correct for . script
basic.sh[211]: nested scripts failed
basic.sh[215]: scripts in subshells fail
basic.sh[350]: piping into script fails
basic.sh[359]: script pipe to shell fails
blabla

We've traced this down to the nonconforming glibc/Linux implementation
of posix_spawn() - disabling it cures the problem on Linux. I
crosschecked with the AIX build - it uses posix_spawn() the same way
but without triggering any failures.
I think this is a follow-up to
http://marc.info/?l=ast-developers&m=134785274012526&w=2 - I can't
agree with the assertion of Redhat's Michal Hlavinka that glibc
posix_spawn() is right, because the current behaviour is IMO useless
for use in a shell (hence the failures in the testsuite), and think a
fix in glibc is still required.

Ced (who needs a sedative now to calm down - 3rd glibc bug *this* week)
-- 
Cedric Blancher cedric.blanc...@googlemail.com
Institute Pasteur


Re: [ast-developers] Nonconforming glibc/Linux posix_spawn() = ast-ksh 2012-09-27 fails

2012-09-28 Thread Glenn Fowler

On Fri, 28 Sep 2012 10:21:49 +0200 Cedric Blancher wrote:
 We've traced this down to the nonconforming glibc/Linux implementation
 of posix_spawn() - disabling it cures the problem on Linux. I
 crosschecked with the AIX build - it uses posix_spawn() the same way
 but without triggering any failures.
 I think this is a follow-up to
 http://marc.info/?l=ast-developers&m=134785274012526&w=2 - I can't
 agree with the assertion of Redhat's Michal Hlavinka that glibc
 posix_spawn() is right, because the current behaviour is IMO useless
 for use in a shell (hence the failures in the testsuite), and think a
 fix in glibc is still required.

to recap:

grep _lib_posix_spawn arch/*/src/lib/libast/FEATURE/lib

there are 3 possible results
(1) not there = posix_spawn() unusable
(2) #define _lib_posix_spawn 2 = works with no workarounds
(3) #define _lib_posix_spawn 1 = works but posix_spawn() on an executable
file that would fail with ENOEXEC via execve() creates a process
that exits with status 127

our sol10.* systems have _lib_posix_spawn 1 and they work
so something else is going on
(we don't have a linux system with the new glibc posix_spawn())
it may be a timing problem with this logic in
src/lib/libast/misc/spawnvex.c
(spawnvex() is new and the api has not settled yet)

#if _lib_posix_spawn < 2
if (waitpid(pid, &err, WNOHANG|WNOWAIT) == pid && EXIT_STATUS(err) == 127)
{
        while (waitpid(pid, NiL, 0) == -1 && errno == EINTR);
        if (!access(path, X_OK))
                errno = ENOEXEC;
        pid = -1;
}
#endif

can you do an strace and see what the waitpid() is returning?

my guess is on solaris the child process has exited 127 on ENOEXEC
before the waitpid(pid, err, WNOHANG|WNOWAIT) and on linux
the process has not yet exited (but looking at build log over the
last week I see some spurious exit code 127 failures on solaris,
so it looks like a timing problem even for solaris)

the standard allows exit code 127 for fork()/exec()

in the case of ENOEXEC producing a child process that will eventually
exit 127, I'm beginning to fear that there is no way to work around
the timing window -- a sleep() before waitpid() would be dumb and not
guaranteed to work anyway -- the posix_spawn() wrapper could
check the magic number, but I don't want to get into the magic number game;
that's exec*()'s job
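
For illustration only, a magic number check in a wrapper might look like the sketch below. `sniff_magic` and the enum are hypothetical names, and only two formats ("#!" and ELF) are recognized, which is an assumption made for brevity. A plain script without a "#!" line has no magic at all, so a wrapper could only identify the ENOEXEC case by ruling out every binary format the system supports.

```c
#include <stdio.h>
#include <string.h>

enum file_kind { KIND_UNKNOWN, KIND_INTERPRETER, KIND_ELF };

/* Classify a file by its leading bytes (hypothetical helper). */
enum file_kind sniff_magic(const char *path)
{
    unsigned char buf[4];
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (!fp)
        return KIND_UNKNOWN;
    n = fread(buf, 1, sizeof buf, fp);
    fclose(fp);

    if (n >= 2 && buf[0] == '#' && buf[1] == '!')
        return KIND_INTERPRETER;    /* kernel runs the named interpreter */
    if (n >= 4 && memcmp(buf, "\177ELF", 4) == 0)
        return KIND_ELF;            /* one of many possible binary formats */
    return KIND_UNKNOWN;            /* may be a script, may be garbage */
}
```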

so if it is a timing window, the iffe test will have to fail
posix_spawn() implementations that create a child process for ENOEXEC
and if that's the case it shows how useless posix_spawn() is
because the caller only knows exit status 127, not the root of the problem

in the case of the shell calling posix_spawn() it must know the reason for failure
ENOEXEC means the shell can attempt to treat the executable as a script
not so for exit code 127

I just noticed that this code is not strictly portable because it
relies on the non-standard linux WNOWAIT without iffe-ing or #ifdef-ing it
for now it's ok (by luck) because _lib_posix_spawn=1 only on { linux solaris }
I'll modify the iffe test to only emit _lib_posix_spawn=1 if WNOWAIT is defined
otherwise 

Re: [ast-developers] Nonconforming glibc/Linux posix_spawn() = ast-ksh 2012-09-27 fails

2012-09-28 Thread Irek Szczesniak
On Fri, Sep 28, 2012 at 11:38 AM, Glenn Fowler g...@research.att.com wrote:

 to recap:

 grep _lib_posix_spawn arch/*/src/lib/libast/FEATURE/lib

 there are 3 possible results
 (1) not there = posix_spawn() unusable
 (2) #define _lib_posix_spawn 2 = works with no workarounds
 (3) #define _lib_posix_spawn 1 = works but posix_spawn() on an executable
 file that would fail with ENOEXEC via execve() creates a process
 that exits with status 127

 our sol10.* systems have _lib_posix_spawn 1 and they work
 so something else is going on

Solaris 11, OpenSolaris, OpenIndiana/Illumos (Solaris clone) and AIX
all produce _lib_posix_spawn==2. I think there are patches for Solaris
10 which backport the posix_spawn() fixes from Solaris 11 to Solaris
10, because behaviour such as _lib_posix_spawn==1 is not conforming
and causes the Single Unix Standard and VS* test suites to fail.

Irek