Re: Bugs with execline/s6 documentation and skalibs functions using posix_spawn()

2023-06-27 Thread Casper Ti. Vector
On Wed, Jun 28, 2023 at 01:33:57AM +, Laurent Bercot wrote:
>  They're not the only possible behaviours: for instance, [1] shows that
> under some buggy qemu, posix_spawn() always returns early.

That seems to at least inadvertently comply with posix_spawn(3)...

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2024.09.30)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



Re: Bugs with execline/s6 documentation and skalibs functions using posix_spawn()

2023-06-27 Thread Laurent Bercot

Actually I copied the fragment of posix_spawn(3) from a Devuan Chimaera
machine, so the problem may be not specific to CentOS 7.  I did not test
CentOS 6 or other distro (version)s, for example; but on Rocky Linux 8,
which I unfortunately also need to support at work, the behaviour is
as expected.  Attached is a simple test.


 It may be a bug in some old glibc, then.



If we assume posix_spawn(3) and posix_spawn(3p) were the only possible
behaviours (which is frankly not that reliable, judging from how
neither manpage noted the violation of conformance), then the two
behaviours could be distinguished with the attached test.


 They're not the only possible behaviours: for instance, [1] shows that
under some buggy qemu, posix_spawn() always returns early. But that
behaviour can also be caught by the same workaround as the glibc
behaviour you're observing, so it's fine.

 Since the bug is more widespread than "one old version of one distro",
is visible in production environments used at large, and seems 
constrained

to "posix_spawn succeeds even if exec fails", which is testable,
I'll add a sysdep to detect it and a workaround in child_spawn*, but it
will mean additional manual --with-sysdep-foobar=blah noise for
skalibs cross-builds.

[1]: https://skarnet.org/lists/skaware/1658.html

--
 Laurent



Re: Bugs with execline/s6 documentation and skalibs functions using posix_spawn()

2023-06-27 Thread Casper Ti. Vector
On Tue, Jun 27, 2023 at 09:27:17PM +, Laurent Bercot wrote:
>  Testing the behaviour may be challenging, however, because I suspect
> the CentOS 7 implementation of posix_spawn() is just racy, and they
> simply documented that they don't care.

Actually I copied the fragment of posix_spawn(3) from a Devuan Chimaera
machine, so the problem may be not specific to CentOS 7.  I did not test
CentOS 6 or other distro (version)s, for example; but on Rocky Linux 8,
which I unfortunately also need to support at work, the behaviour is
as expected.  Attached is a simple test.

>From the posix_spawn(3), which had the similar statement across the
three different versions above, there is also a statement two paragraphs
later: "The posix_spawn() and posix_spawnp() functions fail only in the
case where the underlying fork(2), vfork(2) or clone(2) call fails".
So if the manpage is followed to the word, the CentOS 7 behaviour is
expected; if posix_spawn(3p) is followed instead, any exec() error must
also be caught and result in returning 0, which we really expect.

If we assume posix_spawn(3) and posix_spawn(3p) were the only possible
behaviours (which is frankly not that reliable, judging from how
neither manpage noted the violation of conformance), then the two
behaviours could be distinguished with the attached test.

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2024.09.30)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C

/* ISC license. */

#include 
#include 
#include 
#include 
#include 

pid_t child_spawn0 (char const *prog, char const *const *argv, char const *const *envp)
{
  pid_t pid ;
  posix_spawnattr_t attr ;
  int e ;
  e = posix_spawnattr_init() ;
  if (e) goto err ;
  e = posix_spawnp(, prog, 0, , (char *const *)argv, (char *const *)envp) ;
  posix_spawnattr_destroy() ;
  if (e) goto err ;
  return pid ;

 err:
  errno = e ;
  return 0 ;
}

int main (int argc, char const *const *argv, char const *const *envp)
{
  if (argc < 2) return 1 ;
  printf ("%d\n", child_spawn0(argv[1], [1], envp)) ;
  return 0 ;
}



Re: Bugs with execline/s6 documentation and skalibs functions using posix_spawn()

2023-06-27 Thread Laurent Bercot

 Testing the behaviour may be challenging, however, because I suspect
the CentOS 7 implementation of posix_spawn() is just racy, and they
simply documented that they don't care.


 Thinking about it more, I'm afraid it's not a testable behaviour.
Not only isn't there any way to force the race since it entirely
happens inside a libc function, but also, the test would require
running code on the build machine, which doesn't work for cross-builds
and people would have to manually set the sysdep anyway.

 It seems like --with-sysdep-posixspawn=no, as you did, is the easiest
workaround.

--
 Laurent



Re: Bugs with execline/s6 documentation and skalibs functions using posix_spawn()

2023-06-27 Thread Laurent Bercot

As a more general fix, I think tryposixspawn.c should at least try
spawning a probably unexecutable path (like the one above) as well,
which corrects the sysdep on systems where the expected conformance
is broken.


 Adding a sysdep to detect that case is a good idea indeed!
 Rather than pretending it doesn't exist, though, I'd rather add a
different sysdep that tests its behaviour, so it can still be used
with the proper workaround.

 Testing the behaviour may be challenging, however, because I suspect
the CentOS 7 implementation of posix_spawn() is just racy, and they
simply documented that they don't care.

--
 Laurent



Re: Bugs with execline/s6 documentation and skalibs functions using posix_spawn()

2023-06-27 Thread Casper Ti. Vector
On Tue, Jun 27, 2023 at 08:50:35AM +, Laurent Bercot wrote:
>  I believe the correct setting is actually KillMode=mixed; and the
> ExecStop= line is incorrect as well since ExecStop expects a synchronous
> command, not an asynchronous one. Better let systemd just send a SIGTERM
> to s6-svscan, wait for the supervision tree to exit on its own, and
> SIGKILL the stragglers. I pushed a fix accordingly.

Yes, I did my change in a haste and did not notice the `mixed' option
which is indeed better.  As you see, systemd waits for the stragglers
so that is at least correct, although the timeout (150s by default on
CentOS 7) is really exorbitant.

>  Yeah, well, tough for non-conforming systems.
>  That said, I also pushed a change last week that should have fixed
> this issue as a side effect, so it's all good. If you feel like it,
> you can try the s6-svscan version in the latest s6 git. :)

s6-svscan is quite a minor issue; I added .s6-svscan/SIGTERM for all
those scandirs which may unfortunately need to run on CentOS 7, as an
extra line of defence -- in addition to `--with-sysdep-posixspawn=no'
to skalibs.  The latter also fixes other programs that involve
child_spawn*(), including home-made ones (iotrap, discussed a few days
ago on the supervision list) and upstream ones (for instance trap).

I personally think `--with-sysdep-posixspawn=no' is the really correct
fix on CentOS 7, which fixes silent failures with eg.:
  #!/bin/execlineb -P
  trap {
SIGHUP { echo test }
  } /dev/xyzzy
As a more general fix, I think tryposixspawn.c should at least try
spawning a probably unexecutable path (like the one above) as well,
which corrects the sysdep on systems where the expected conformance
is broken.

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2024.09.30)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



Re: Bugs with execline/s6 documentation and skalibs functions using posix_spawn()

2023-06-27 Thread Laurent Bercot

* In `trap.html', there is a reference to the removed `timeout' keyword.


 Fixed.



* In `s6-svscan-not-1.html', the systemd unit (traumatic experience with
  it, as you may easily expect) lacks a `KillMode = process'.


 I believe the correct setting is actually KillMode=mixed; and the
ExecStop= line is incorrect as well since ExecStop expects a synchronous
command, not an asynchronous one. Better let systemd just send a SIGTERM
to s6-svscan, wait for the supervision tree to exit on its own, and
SIGKILL the stragglers. I pushed a fix accordingly.



* The child_spawn*() family of functions, depending on using posix_spawn
  or not, exhibit different behaviours on CentOS 7 (trauma again), as
  posix_spawnp() may return 0 with argv pointing to unexecutable paths.
  This, for example, results in s6-svscan not exiting on SIGTERM when
  .s6-svscan/SIGTERM is absent.  The behaviour of posix_spawnp() on
  CentOS 7 does not conform to posix_spawn(3p), but is documented in
  posix_spawn(3): "Even when these functions return a success status,
  the child process may still fail for a plethora of reasons related to
  its pre-exec() initialization.  In addition, the exec(3) may fail."


 Yeah, well, tough for non-conforming systems.
 That said, I also pushed a change last week that should have fixed
this issue as a side effect, so it's all good. If you feel like it,
you can try the s6-svscan version in the latest s6 git. :)


> --with-sysdep-devurandom

 Also fixed.

 Thanks a lot for these reports!

--
 Laurent