Re: Bugs with execline/s6 documentation and skalibs functions using posix_spawn()
On Wed, Jun 28, 2023 at 01:33:57AM +, Laurent Bercot wrote: > They're not the only possible behaviours: for instance, [1] shows that > under some buggy qemu, posix_spawn() always returns early. That seems to at least inadvertently comply with posix_spawn(3)... -- My current OpenPGP key: RSA4096/0x227E8CAAB7AA186C (expires: 2024.09.30) 7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C
Re: Bugs with execline/s6 documentation and skalibs functions using posix_spawn()
Actually I copied the fragment of posix_spawn(3) from a Devuan Chimaera machine, so the problem may be not specific to CentOS 7. I did not test CentOS 6 or other distro (version)s, for example; but on Rocky Linux 8, which I unfortunately also need to support at work, the behaviour is as expected. Attached is a simple test. It may be a bug in some old glibc, then. If we assume posix_spawn(3) and posix_spawn(3p) were the only possible behaviours (which is frankly not that reliable, judging from how neither manpage noted the violation of conformance), then the two behaviours could be distinguished with the attached test. They're not the only possible behaviours: for instance, [1] shows that under some buggy qemu, posix_spawn() always returns early. But that behaviour can also be caught by the same workaround as the glibc behaviour you're observing, so it's fine. Since the bug is more widespread than "one old version of one distro", is visible in production environments used at large, and seems constrained to "posix_spawn succeeds even if exec fails", which is testable, I'll add a sysdep to detect it and a workaround in child_spawn*, but it will mean additional manual --with-sysdep-foobar=blah noise for skalibs cross-builds. [1]: https://skarnet.org/lists/skaware/1658.html -- Laurent
Re: Bugs with execline/s6 documentation and skalibs functions using posix_spawn()
On Tue, Jun 27, 2023 at 09:27:17PM +, Laurent Bercot wrote: > Testing the behaviour may be challenging, however, because I suspect > the CentOS 7 implementation of posix_spawn() is just racy, and they > simply documented that they don't care. Actually I copied the fragment of posix_spawn(3) from a Devuan Chimaera machine, so the problem may be not specific to CentOS 7. I did not test CentOS 6 or other distro (version)s, for example; but on Rocky Linux 8, which I unfortunately also need to support at work, the behaviour is as expected. Attached is a simple test. >From the posix_spawn(3), which had the similar statement across the three different versions above, there is also a statement two paragraphs later: "The posix_spawn() and posix_spawnp() functions fail only in the case where the underlying fork(2), vfork(2) or clone(2) call fails". So if the manpage is followed to the word, the CentOS 7 behaviour is expected; if posix_spawn(3p) is followed instead, any exec() error must also be caught and result in returning 0, which we really expect. If we assume posix_spawn(3) and posix_spawn(3p) were the only possible behaviours (which is frankly not that reliable, judging from how neither manpage noted the violation of conformance), then the two behaviours could be distinguished with the attached test. -- My current OpenPGP key: RSA4096/0x227E8CAAB7AA186C (expires: 2024.09.30) 7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C /* ISC license. */ #include #include #include #include #include pid_t child_spawn0 (char const *prog, char const *const *argv, char const *const *envp) { pid_t pid ; posix_spawnattr_t attr ; int e ; e = posix_spawnattr_init() ; if (e) goto err ; e = posix_spawnp(, prog, 0, , (char *const *)argv, (char *const *)envp) ; posix_spawnattr_destroy() ; if (e) goto err ; return pid ; err: errno = e ; return 0 ; } int main (int argc, char const *const *argv, char const *const *envp) { if (argc < 2) return 1 ; printf ("%d\n", child_spawn0(argv[1], [1], envp)) ; return 0 ; }
Re: Bugs with execline/s6 documentation and skalibs functions using posix_spawn()
Testing the behaviour may be challenging, however, because I suspect the CentOS 7 implementation of posix_spawn() is just racy, and they simply documented that they don't care. Thinking about it more, I'm afraid it's not a testable behaviour. Not only isn't there any way to force the race since it entirely happens inside a libc function, but also, the test would require running code on the build machine, which doesn't work for cross-builds and people would have to manually set the sysdep anyway. It seems like --with-sysdep-posixspawn=no, as you did, is the easiest workaround. -- Laurent
Re: Bugs with execline/s6 documentation and skalibs functions using posix_spawn()
As a more general fix, I think tryposixspawn.c should at least try spawning a probably unexecutable path (like the one above) as well, which corrects the sysdep on systems where the expected conformance is broken. Adding a sysdep to detect that case is a good idea indeed! Rather than pretending it doesn't exist, though, I'd rather add a different sysdep that tests its behaviour, so it can still be used with the proper workaround. Testing the behaviour may be challenging, however, because I suspect the CentOS 7 implementation of posix_spawn() is just racy, and they simply documented that they don't care. -- Laurent
Re: Bugs with execline/s6 documentation and skalibs functions using posix_spawn()
On Tue, Jun 27, 2023 at 08:50:35AM +, Laurent Bercot wrote: > I believe the correct setting is actually KillMode=mixed; and the > ExecStop= line is incorrect as well since ExecStop expects a synchronous > command, not an asynchronous one. Better let systemd just send a SIGTERM > to s6-svscan, wait for the supervision tree to exit on its own, and > SIGKILL the stragglers. I pushed a fix accordingly. Yes, I did my change in a haste and did not notice the `mixed' option which is indeed better. As you see, systemd waits for the stragglers so that is at least correct, although the timeout (150s by default on CentOS 7) is really exorbitant. > Yeah, well, tough for non-conforming systems. > That said, I also pushed a change last week that should have fixed > this issue as a side effect, so it's all good. If you feel like it, > you can try the s6-svscan version in the latest s6 git. :) s6-svscan is quite a minor issue; I added .s6-svscan/SIGTERM for all those scandirs which may unfortunately need to run on CentOS 7, as an extra line of defence -- in addition to `--with-sysdep-posixspawn=no' to skalibs. The latter also fixes other programs that involve child_spawn*(), including home-made ones (iotrap, discussed a few days ago on the supervision list) and upstream ones (for instance trap). I personally think `--with-sysdep-posixspawn=no' is the really correct fix on CentOS 7, which fixes silent failures with eg.: #!/bin/execlineb -P trap { SIGHUP { echo test } } /dev/xyzzy As a more general fix, I think tryposixspawn.c should at least try spawning a probably unexecutable path (like the one above) as well, which corrects the sysdep on systems where the expected conformance is broken. -- My current OpenPGP key: RSA4096/0x227E8CAAB7AA186C (expires: 2024.09.30) 7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C
Re: Bugs with execline/s6 documentation and skalibs functions using posix_spawn()
* In `trap.html', there is a reference to the removed `timeout' keyword. Fixed. * In `s6-svscan-not-1.html', the systemd unit (traumatic experience with it, as you may easily expect) lacks a `KillMode = process'. I believe the correct setting is actually KillMode=mixed; and the ExecStop= line is incorrect as well since ExecStop expects a synchronous command, not an asynchronous one. Better let systemd just send a SIGTERM to s6-svscan, wait for the supervision tree to exit on its own, and SIGKILL the stragglers. I pushed a fix accordingly. * The child_spawn*() family of functions, depending on using posix_spawn or not, exhibit different behaviours on CentOS 7 (trauma again), as posix_spawnp() may return 0 with argv pointing to unexecutable paths. This, for example, results in s6-svscan not exiting on SIGTERM when .s6-svscan/SIGTERM is absent. The behaviour of posix_spawnp() on CentOS 7 does not conform to posix_spawn(3p), but is documented in posix_spawn(3): "Even when these functions return a success status, the child process may still fail for a plethora of reasons related to its pre-exec() initialization. In addition, the exec(3) may fail." Yeah, well, tough for non-conforming systems. That said, I also pushed a change last week that should have fixed this issue as a side effect, so it's all good. If you feel like it, you can try the s6-svscan version in the latest s6 git. :) > --with-sysdep-devurandom Also fixed. Thanks a lot for these reports! -- Laurent