Re: posix_spawn (was: Bugs with execline/s6 documentation and skalibs functions using posix_spawn())

2023-06-29 Thread Casper Ti. Vector
On Thu, Jun 29, 2023 at 11:04:33AM +, Laurent Bercot wrote:
>  Indeed, /dev should work; but using it still makes me queasier than
> crafting a nonexistent path.

Actually deliberately supplying a directory to posix_spawn*() is
a slightly deeper test, as the path exists but is unexecutable.
But surely this does not constitute a decisive advantage.

>  Of course, it doesn't matter for glibc, and it doesn't matter for
> s6 which needs fork anyway. And chances are that platforms that
> implement posix_spawn() with internals that are *not* fork+exec will
> not make it return before the spawning has really succeeded. But still,
> it's nice to make sure it can be used wherever it exists.

Now I see, thanks...

>  If you don't like the workaround, nobody's preventing you from using
> --with-sysdep-posixspawn=no manually. ;)

Yes, and I am keeping my origin solution at work intact :)

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2024.09.30)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



Re: posix_spawn (was: Bugs with execline/s6 documentation and skalibs functions using posix_spawn())

2023-06-29 Thread Laurent Bercot

Actually I mean a *directory* that is guaranteed to exist (and meanwhile
unexecutable): so /dev here.


 Indeed, /dev should work; but using it still makes me queasier than
crafting a nonexistent path. The mkstemp thing works, so, not going to
change it to save a couple of syscalls in a configure test. :)



Well I was intending to suggest that we simpliy avoided posix_spawn*()
where it disagreed with posix_spawn(3p); that is to say simply replacing
all previous `#ifdef HASPOSIXSPAWN' conditions with `#if (defined
HASPOSIXSPAWN) && (!defined SKALIBS_HASPOSIXSPAWNEARLYRETURN)'.  After
all it seems to me child_spawn*() is not used that prevalently, so the
performance penalty is really minor; of course, feel free to correct me.


 Yes, falling back to fork+exec when posix_spawn is bad is an option,
and I would probably have done just that if I hadn't been pointed to
the existence of waitid() to achieve the "test whether a child is dead
without reaping it" thing, without which there can be no workaround.

 But posix_spawn is more than a performance thing. The point of this
interface is that its implementation doesn't have to be vfork+exec
internally; it was precisely designed to allow spawning processes on
nommu machines, where vfork and fork are basically impossible. So,
using posix_spawn wherever possible helps with portability as well.

 Of course, it doesn't matter for glibc, and it doesn't matter for s6
which needs fork anyway. And chances are that platforms that
implement posix_spawn() with internals that are *not* fork+exec will
not make it return before the spawning has really succeeded. But still,
it's nice to make sure it can be used wherever it exists.

 If you don't like the workaround, nobody's preventing you from using
--with-sysdep-posixspawn=no manually. ;)

--
 Laurent



Re: posix_spawn (was: Bugs with execline/s6 documentation and skalibs functions using posix_spawn())

2023-06-29 Thread Casper Ti. Vector
On Thu, Jun 29, 2023 at 08:21:19AM +, Laurent Bercot wrote:
>  POSIX doesn't mandate any path other than /dev/null and /dev/console
> and I'd rather not try executing them, who knows what weird permissions
> they may have on obscure OSes.

Actually I mean a *directory* that is guaranteed to exist (and meanwhile
unexecutable): so /dev here.

>  I agree it's a lot of work for not much, but as you said, the
> behaviour is arguably conformant, and your experience proves that old
> glibcs are still around, so I'd rather make posix_spawn usable where
> it exists instead of placing the burden of --with-sysdep-posixspawn=no
> on users who have a bad version.

Well I was intending to suggest that we simpliy avoided posix_spawn*()
where it disagreed with posix_spawn(3p); that is to say simply replacing
all previous `#ifdef HASPOSIXSPAWN' conditions with `#if (defined
HASPOSIXSPAWN) && (!defined SKALIBS_HASPOSIXSPAWNEARLYRETURN)'.  After
all it seems to me child_spawn*() is not used that prevalently, so the
performance penalty is really minor; of course, feel free to correct me.

And I probably also need to bring the incorrectness of posix_spawn(3)
to the attention of its (Linux manpages) maintainers.

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2024.09.30)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



Re: posix_spawn (was: Bugs with execline/s6 documentation and skalibs functions using posix_spawn())

2023-06-29 Thread Laurent Bercot



 Fixes pushed to git, thanks!

 When given an unexecutable path, child_spawn() returns 0, but errno
is unset... that's on purpose. Unfortunately, in the parent there is
no way to know the child's execve() error code; all we have is the
exit status, 127, and we cannot report the reason for the failure.
Rather than set errno to something that may be wrong and prompt the
caller to take inadequate measures, I'd rather set it to 0, which
glibc reports as "success" but really means "no error information"
except in a few, well-known contexts; and let the caller deal with
the lack of more accurate reporting. I know it's not satisfying, but
we can't do any better.



I have realised that a simpler unexecutable path can be, for example,
/etc (is it mandated in POSIX?); this can save the mkstemp() call
in the sysdep test.


 POSIX doesn't mandate any path other than /dev/null and /dev/console
and I'd rather not try executing them, who knows what weird permissions
they may have on obscure OSes.
 It's a sysdep test, it's not performance-critical; I'd rather use
mkstemp() to be *sure* we have a path that does not exist.
(Of course the user could always race the program, but we're not trying
to harden against stupidity here.)



(And frankly I personally do not really find it much worthwhile to
introduce this amount of complexity for the broken dependency of a
quite minor performance optimisation...)


 I agree it's a lot of work for not much, but as you said, the
behaviour is arguably conformant, and your experience proves that old
glibcs are still around, so I'd rather make posix_spawn usable where
it exists instead of placing the burden of --with-sysdep-posixspawn=no
on users who have a bad version.

 As shown by the qemu bug I linked above, this impacts s6-svscan,
which relies on correct child_spawn() reporting when running custom
signal handlers, so not working around bad posix_spawn QoI may lead
to buggy signal management in s6-svscan, and nobody wants that.


 A cursory web search appears to say that glibc-2.27 is when they fixed
the posix_spawn QoI; 2.17 being bad is consistent with that. But I can't
be bothered to go spelunk in glibc code to check and/or bisect, so if
someone could confirm, thank you, otherwise, no big deal.

--
 Laurent



Re: posix_spawn (was: Bugs with execline/s6 documentation and skalibs functions using posix_spawn())

2023-06-28 Thread Casper Ti. Vector
I have realised that a simpler unexecutable path can be, for example,
/etc (is it mandated in POSIX?); this can save the mkstemp() call
in the sysdep test.

(And frankly I personally do not really find it much worthwhile to
introduce this amount of complexity for the broken dependency of a
quite minor performance optimisation...)

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2024.09.30)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



Re: posix_spawn (was: Bugs with execline/s6 documentation and skalibs functions using posix_spawn())

2023-06-28 Thread Casper Ti. Vector
On Wed, Jun 28, 2023 at 09:40:48PM +, Laurent Bercot wrote:
>  I pushed a workaround to the skalibs git.
>  Could you please try a build on a machine that exhibits the early
> return behaviour and tell me if
>  - the behaviour is correctly detected by ./configure (the last sysdep)

Yes, after changing `attrp' in tryposixspawnearlyreturn.c to `attr'.

>  - the child_spawn*() family of functions now works properly even on
> this machine?

Partially, after an additional WEXITED is OR-ed with the options for
waitid() in child_spawn_workaround.c.  When given an unexecutable path,
child_spawn() returns 0, but errno is unset.

>  Also, can you please tell me what version of glibc these distribution
> versions are running?

glibc-2.17-317.el7.x86_64, from CentOS 7.9.2009.

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2024.09.30)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



posix_spawn (was: Bugs with execline/s6 documentation and skalibs functions using posix_spawn())

2023-06-28 Thread Laurent Bercot



 I pushed a workaround to the skalibs git.
 Could you please try a build on a machine that exhibits the early
return behaviour and tell me if
 - the behaviour is correctly detected by ./configure (the last sysdep)
 - the child_spawn*() family of functions now works properly even on
this machine?

 Also, can you please tell me what version of glibc these distribution
versions are running?

 Thanks!

--
 Laurent



Re: Bugs with execline/s6 documentation and skalibs functions using posix_spawn()

2023-06-27 Thread Casper Ti. Vector
On Wed, Jun 28, 2023 at 01:33:57AM +, Laurent Bercot wrote:
>  They're not the only possible behaviours: for instance, [1] shows that
> under some buggy qemu, posix_spawn() always returns early.

That seems to at least inadvertently comply with posix_spawn(3)...

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2024.09.30)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



Re: Bugs with execline/s6 documentation and skalibs functions using posix_spawn()

2023-06-27 Thread Laurent Bercot

Actually I copied the fragment of posix_spawn(3) from a Devuan Chimaera
machine, so the problem may be not specific to CentOS 7.  I did not test
CentOS 6 or other distro (version)s, for example; but on Rocky Linux 8,
which I unfortunately also need to support at work, the behaviour is
as expected.  Attached is a simple test.


 It may be a bug in some old glibc, then.



If we assume posix_spawn(3) and posix_spawn(3p) were the only possible
behaviours (which is frankly not that reliable, judging from how
neither manpage noted the violation of conformance), then the two
behaviours could be distinguished with the attached test.


 They're not the only possible behaviours: for instance, [1] shows that
under some buggy qemu, posix_spawn() always returns early. But that
behaviour can also be caught by the same workaround as the glibc
behaviour you're observing, so it's fine.

 Since the bug is more widespread than "one old version of one distro",
is visible in production environments used at large, and seems 
constrained

to "posix_spawn succeeds even if exec fails", which is testable,
I'll add a sysdep to detect it and a workaround in child_spawn*, but it
will mean additional manual --with-sysdep-foobar=blah noise for
skalibs cross-builds.

[1]: https://skarnet.org/lists/skaware/1658.html

--
 Laurent



Re: Bugs with execline/s6 documentation and skalibs functions using posix_spawn()

2023-06-27 Thread Casper Ti. Vector
On Tue, Jun 27, 2023 at 09:27:17PM +, Laurent Bercot wrote:
>  Testing the behaviour may be challenging, however, because I suspect
> the CentOS 7 implementation of posix_spawn() is just racy, and they
> simply documented that they don't care.

Actually I copied the fragment of posix_spawn(3) from a Devuan Chimaera
machine, so the problem may be not specific to CentOS 7.  I did not test
CentOS 6 or other distro (version)s, for example; but on Rocky Linux 8,
which I unfortunately also need to support at work, the behaviour is
as expected.  Attached is a simple test.

>From the posix_spawn(3), which had the similar statement across the
three different versions above, there is also a statement two paragraphs
later: "The posix_spawn() and posix_spawnp() functions fail only in the
case where the underlying fork(2), vfork(2) or clone(2) call fails".
So if the manpage is followed to the word, the CentOS 7 behaviour is
expected; if posix_spawn(3p) is followed instead, any exec() error must
also be caught and result in returning 0, which we really expect.

If we assume posix_spawn(3) and posix_spawn(3p) were the only possible
behaviours (which is frankly not that reliable, judging from how
neither manpage noted the violation of conformance), then the two
behaviours could be distinguished with the attached test.

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2024.09.30)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C

/* ISC license. */

#include 
#include 
#include 
#include 
#include 

pid_t child_spawn0 (char const *prog, char const *const *argv, char const *const *envp)
{
  pid_t pid ;
  posix_spawnattr_t attr ;
  int e ;
  e = posix_spawnattr_init() ;
  if (e) goto err ;
  e = posix_spawnp(, prog, 0, , (char *const *)argv, (char *const *)envp) ;
  posix_spawnattr_destroy() ;
  if (e) goto err ;
  return pid ;

 err:
  errno = e ;
  return 0 ;
}

int main (int argc, char const *const *argv, char const *const *envp)
{
  if (argc < 2) return 1 ;
  printf ("%d\n", child_spawn0(argv[1], [1], envp)) ;
  return 0 ;
}



Re: Bugs with execline/s6 documentation and skalibs functions using posix_spawn()

2023-06-27 Thread Laurent Bercot

 Testing the behaviour may be challenging, however, because I suspect
the CentOS 7 implementation of posix_spawn() is just racy, and they
simply documented that they don't care.


 Thinking about it more, I'm afraid it's not a testable behaviour.
Not only isn't there any way to force the race since it entirely
happens inside a libc function, but also, the test would require
running code on the build machine, which doesn't work for cross-builds
and people would have to manually set the sysdep anyway.

 It seems like --with-sysdep-posixspawn=no, as you did, is the easiest
workaround.

--
 Laurent



Re: Bugs with execline/s6 documentation and skalibs functions using posix_spawn()

2023-06-27 Thread Laurent Bercot

As a more general fix, I think tryposixspawn.c should at least try
spawning a probably unexecutable path (like the one above) as well,
which corrects the sysdep on systems where the expected conformance
is broken.


 Adding a sysdep to detect that case is a good idea indeed!
 Rather than pretending it doesn't exist, though, I'd rather add a
different sysdep that tests its behaviour, so it can still be used
with the proper workaround.

 Testing the behaviour may be challenging, however, because I suspect
the CentOS 7 implementation of posix_spawn() is just racy, and they
simply documented that they don't care.

--
 Laurent



Re: Bugs with execline/s6 documentation and skalibs functions using posix_spawn()

2023-06-27 Thread Casper Ti. Vector
On Tue, Jun 27, 2023 at 08:50:35AM +, Laurent Bercot wrote:
>  I believe the correct setting is actually KillMode=mixed; and the
> ExecStop= line is incorrect as well since ExecStop expects a synchronous
> command, not an asynchronous one. Better let systemd just send a SIGTERM
> to s6-svscan, wait for the supervision tree to exit on its own, and
> SIGKILL the stragglers. I pushed a fix accordingly.

Yes, I did my change in a haste and did not notice the `mixed' option
which is indeed better.  As you see, systemd waits for the stragglers
so that is at least correct, although the timeout (150s by default on
CentOS 7) is really exorbitant.

>  Yeah, well, tough for non-conforming systems.
>  That said, I also pushed a change last week that should have fixed
> this issue as a side effect, so it's all good. If you feel like it,
> you can try the s6-svscan version in the latest s6 git. :)

s6-svscan is quite a minor issue; I added .s6-svscan/SIGTERM for all
those scandirs which may unfortunately need to run on CentOS 7, as an
extra line of defence -- in addition to `--with-sysdep-posixspawn=no'
to skalibs.  The latter also fixes other programs that involve
child_spawn*(), including home-made ones (iotrap, discussed a few days
ago on the supervision list) and upstream ones (for instance trap).

I personally think `--with-sysdep-posixspawn=no' is the really correct
fix on CentOS 7, which fixes silent failures with eg.:
  #!/bin/execlineb -P
  trap {
SIGHUP { echo test }
  } /dev/xyzzy
As a more general fix, I think tryposixspawn.c should at least try
spawning a probably unexecutable path (like the one above) as well,
which corrects the sysdep on systems where the expected conformance
is broken.

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2024.09.30)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



Re: Bugs with execline/s6 documentation and skalibs functions using posix_spawn()

2023-06-27 Thread Laurent Bercot

* In `trap.html', there is a reference to the removed `timeout' keyword.


 Fixed.



* In `s6-svscan-not-1.html', the systemd unit (traumatic experience with
  it, as you may easily expect) lacks a `KillMode = process'.


 I believe the correct setting is actually KillMode=mixed; and the
ExecStop= line is incorrect as well since ExecStop expects a synchronous
command, not an asynchronous one. Better let systemd just send a SIGTERM
to s6-svscan, wait for the supervision tree to exit on its own, and
SIGKILL the stragglers. I pushed a fix accordingly.



* The child_spawn*() family of functions, depending on using posix_spawn
  or not, exhibit different behaviours on CentOS 7 (trauma again), as
  posix_spawnp() may return 0 with argv pointing to unexecutable paths.
  This, for example, results in s6-svscan not exiting on SIGTERM when
  .s6-svscan/SIGTERM is absent.  The behaviour of posix_spawnp() on
  CentOS 7 does not conform to posix_spawn(3p), but is documented in
  posix_spawn(3): "Even when these functions return a success status,
  the child process may still fail for a plethora of reasons related to
  its pre-exec() initialization.  In addition, the exec(3) may fail."


 Yeah, well, tough for non-conforming systems.
 That said, I also pushed a change last week that should have fixed
this issue as a side effect, so it's all good. If you feel like it,
you can try the s6-svscan version in the latest s6 git. :)


> --with-sysdep-devurandom

 Also fixed.

 Thanks a lot for these reports!

--
 Laurent



Re: Bugs with execline/s6 documentation and skalibs functions using posix_spawn()

2023-06-26 Thread Casper Ti. Vector
On Tue, Jun 27, 2023 at 10:16:10AM +0800, Casper Ti. Vector wrote:
> * The child_spawn*() family of functions, [...]

Well, just found another one when handling the above issue:
* `flags.html' from skalibs references `--enable-sysdep-devurandom',
  which should now be `--with-sysdep-devurandom'.

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2024.09.30)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



Bugs with execline/s6 documentation and skalibs functions using posix_spawn()

2023-06-26 Thread Casper Ti. Vector
* In `trap.html', there is a reference to the removed `timeout' keyword.

* In `s6-svscan-not-1.html', the systemd unit (traumatic experience with
  it, as you may easily expect) lacks a `KillMode = process'.

* The child_spawn*() family of functions, depending on using posix_spawn
  or not, exhibit different behaviours on CentOS 7 (trauma again), as
  posix_spawnp() may return 0 with argv pointing to unexecutable paths.
  This, for example, results in s6-svscan not exiting on SIGTERM when
  .s6-svscan/SIGTERM is absent.  The behaviour of posix_spawnp() on
  CentOS 7 does not conform to posix_spawn(3p), but is documented in
  posix_spawn(3): "Even when these functions return a success status,
  the child process may still fail for a plethora of reasons related to
  its pre-exec() initialization.  In addition, the exec(3) may fail."

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2024.09.30)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C