Re: Calling system(3) when SIGCHLD is SIG_IGN'd

2024-01-13 Thread Rhialto
On Fri 12 Jan 2024 at 20:28:12 -0600, Matthew D. Fuller wrote:
> On Sat, Aug 12, 2023 at 03:22:56PM +0200 I heard the voice of
> Rhialto, and lo! it spake thus:
> > There seems to be only a single fork() in ctwm anyway, to call m4 to
> > parse the config file. The SIGCHLD changing could be limited to that
> > area, then set back to default.
> > 
> > Or, alternatively, a proper signal handler for SIGCHLD could be set up.
> 
> Presumably, we'd need to do the latter to handle the "inheriting
> unexpected children" case properly anyway.  I wonder what happened
> before the SIG_IGN change.  I guess we just accumulated zombies?

I expect so - I didn't really check.

> >From the looks of that PR, it doesn't seem like any kernel-side
> changes have fallen out of it, and cvsweb doesn't show any recent
> changes to system(3), so I presume this is still needed?

I didn't see any followup either. Their variant of the change goes like
this (but my version fits better with the current handler naming scheme
I think):

Index: signals.c
===
RCS file: /cvsroot/xsrc/external/mit/ctwm/dist/signals.c,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -r1.1 -r1.2
--- signals.c   5 Jul 2023 07:36:07 -   1.1
+++ signals.c   20 Oct 2023 10:18:55 -  1.2
@@ -8,6 +8,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "ctwm_shutdown.h"
 #include "signals.h"
@@ -26,6 +28,15 @@
 // needs to trigger an action.
 bool SignalFlag = false;
 
+void ChildExit(int signum)
+{
+   int Errno = errno;
+   /* reap dead children, ignore status */
+   while (waitpid(-1, NULL, WNOHANG) > 0)
+   continue;
+   /* restore errno for interrupted sys calls */
+   errno = Errno;
+}
 
 /**
  * Setup signal handlers (run during startup)
@@ -46,9 +57,12 @@
// die...
signal(SIGALRM, SIG_IGN);
 
-   // This should be set by default, but just in case; explicitly don't
-   // leave zombies.
-   signal(SIGCHLD, SIG_IGN);
+   /* Setting SIGCHLD to SIG_IGN detaches children from the parent
+* immediately, so it need not be waited for.
+* In fact, you cannot wait for it, so a function like system()
+* breaks.
+*/
+   signal(SIGCHLD, ChildExit);
 
return;
 }

> Matthew Fuller (MF4839)   |  fulle...@over-yonder.net
-Olaf.
-- 
___ Olaf 'Rhialto' Seibert
\X/ There is no AI. There is just someone else's work.   --I. Rose


signature.asc
Description: PGP signature


Re: Calling system(3) when SIGCHLD is SIG_IGN'd

2024-01-12 Thread Matthew D. Fuller


On Sat, Aug 12, 2023 at 03:22:56PM +0200 I heard the voice of
Rhialto, and lo! it spake thus:
> There seems to be only a single fork() in ctwm anyway, to call m4 to
> parse the config file. The SIGCHLD changing could be limited to that
> area, then set back to default.
> 
> Or, alternatively, a proper signal handler for SIGCHLD could be set up.

Presumably, we'd need to do the latter to handle the "inheriting
unexpected children" case properly anyway.  I wonder what happened
before the SIG_IGN change.  I guess we just accumulated zombies?

>From the looks of that PR, it doesn't seem like any kernel-side
changes have fallen out of it, and cvsweb doesn't show any recent
changes to system(3), so I presume this is still needed?


-- 
Matthew Fuller (MF4839)   |  fulle...@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
   On the Internet, nobody can hear you scream.



Re: Calling system(3) when SIGCHLD is SIG_IGN'd

2023-08-13 Thread Rhialto
On Sun 13 Aug 2023 at 13:51:53 +0200, Michael van Elst wrote:
> On Sun, Aug 13, 2023 at 01:04:00PM +0200, Rhialto wrote:
> > I suppose that the problem was not noticed since 2018 because
> > there are several preconditions:
> 
> The problem didn't exist for NetBSD before the recent ctwm update in xsrc.

The change was made in ctwm in 2018 and I used it on NetBSD, pretty much
always the HEAD version.

> > - session startup script must start some programs in the background, and
> >   end with "exec ctwm". Just "ctwm" without "exec" doesn't do it. This
> >   is so that the started background processes become children of ctwm
> >   (even if ctwm didn't start them itself).
> 
> Indeed, that's the standard way to start a session. Exiting the window
> manager will then also finish the session.

I would say it is one of the two major ways. I end the session startup
script with "exec xterm". That way you can exit the window manager (or
it can crash... that can happen while developing new things) but it
won't end your X session.

> > - on FreeBSD (and probably Linux too) they apparently don't follow the
> >   described semantics of wait*(2) regarding SIGCHLD set to SIG_IGN, OR
> >   in system(3) they work around it somehow. Alledgedly, POSIX doesn't
> >   require system(3) to work in this state.
> 
> I haven't found anything that would even talk about it, but the sematics
> of waitpid in this case (wait for all children to finish) seem to forbid
> that system can work correctly unless it works outside of POSIX system calls.

Maybe the POSIX people overlooked this case, that is also possible.

> Michael van Elst
-Olaf.
-- 
___ Olaf 'Rhialto' Seibert
\X/ There is no AI. There is just someone else's work.   --I. Rose


signature.asc
Description: PGP signature


Re: Calling system(3) when SIGCHLD is SIG_IGN'd

2023-08-13 Thread Michael van Elst


On Sun, Aug 13, 2023 at 01:04:00PM +0200, Rhialto wrote:
> On Sat 12 Aug 2023 at 23:49:43 +0200, Michael van Elst wrote:
> > On Sat, Aug 12, 2023 at 03:22:56PM +0200, Rhialto wrote:
> > > 
> > > Or, alternatively, a proper signal handler for SIGCHLD could be set up.
> > 
> > ctwm had a signal handler for SIGCHLD (not 100% correct). And that got 
> > replaced
> > by ignoring the signal to avoid waiting for children.
> 
> Yes, looking at the history, it got replaced in commit 647.1.4. This was
> removed (amongst other things):


> I suppose that the problem was not noticed since 2018 because
> there are several preconditions:

The problem didn't exist for NetBSD before the recent ctwm update in xsrc.


> - session startup script must start some programs in the background, and
>   end with "exec ctwm". Just "ctwm" without "exec" doesn't do it. This
>   is so that the started background processes become children of ctwm
>   (even if ctwm didn't start them itself).

Indeed, that's the standard way to start a session. Exiting the window
manager will then also finish the session.


> - on FreeBSD (and probably Linux too) they apparently don't follow the
>   described semantics of wait*(2) regarding SIGCHLD set to SIG_IGN, OR
>   in system(3) they work around it somehow. Alledgedly, POSIX doesn't
>   require system(3) to work in this state.

I haven't found anything that would even talk about it, but the sematics
of waitpid in this case (wait for all children to finish) seem to forbid
that system can work correctly unless it works outside of POSIX system calls.



Greetings,
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."



Re: Calling system(3) when SIGCHLD is SIG_IGN'd

2023-08-13 Thread Rhialto
On Sat 12 Aug 2023 at 23:49:43 +0200, Michael van Elst wrote:
> On Sat, Aug 12, 2023 at 03:22:56PM +0200, Rhialto wrote:
> > 
> > Or, alternatively, a proper signal handler for SIGCHLD could be set up.
> 
> ctwm had a signal handler for SIGCHLD (not 100% correct). And that got 
> replaced
> by ignoring the signal to avoid waiting for children.

Yes, looking at the history, it got replaced in commit 647.1.4. This was
removed (amongst other things):

#ifdef __WAIT_FOR_CHILDS
/*
 * Handler for SIGCHLD. Needed to avoid zombies when an .xinitrc
 * execs ctwm as the last client. (All processes forked off from
 * within .xinitrc have been inherited by ctwm during the exec.)
 * Jens Schweikhardt 
 */
void
ChildExit(int signum)
{
int Errno = errno;
signal(SIGCHLD, ChildExit);  /* reestablish because we're a one-shot */
waitpid(-1, NULL, WNOHANG);   /* reap dead child, ignore status */
errno = Errno;   /* restore errno for interrupted sys calls 
*/
}
#endif

The "one-shot" comment is no longer true (POSIX says "Once an action is
installed for a specific signal, it shall remain installed until another
action is explicitly requested (by another call to sigaction()), until
the SA_RESETHAND flag causes resetting of the handler, or until one of
the exec functions is called." in
https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigaction.html

The waitpid() call should be in a loop, because there may be multiple
children to wait for. I read somewhere about "SysV semantics" that would
cause exactly one SIGCHLD for each child, but I wouldn't want to invoke
those semantics.

The __WAIT_FOR_CHILDS was probably never defined so this code was
probably never used. BUT, the comment on the new "signal(SIGCHLD,
SIG_IGN);" says "This should be set by default" but that also isn't
exactly true. Default action for SIGCHLD is to do nothing, but that
isn't the same as SIG_IGN (as discussed in the initial mail and the
referenced Problem Report).

I suppose that the problem was not noticed since 2018 because
there are several preconditions:

- session startup script must start some programs in the background, and
  end with "exec ctwm". Just "ctwm" without "exec" doesn't do it. This
  is so that the started background processes become children of ctwm
  (even if ctwm didn't start them itself).

- on FreeBSD (and probably Linux too) they apparently don't follow the
  described semantics of wait*(2) regarding SIGCHLD set to SIG_IGN, OR
  in system(3) they work around it somehow. Alledgedly, POSIX doesn't
  require system(3) to work in this state.

-Olaf.
-- 
___ Olaf 'Rhialto' Seibert
\X/ There is no AI. There is just someone else's work.   --I. Rose


signature.asc
Description: PGP signature


Re: Calling system(3) when SIGCHLD is SIG_IGN'd

2023-08-12 Thread Michael van Elst


On Sat, Aug 12, 2023 at 03:22:56PM +0200, Rhialto wrote:
> 
> Or, alternatively, a proper signal handler for SIGCHLD could be set up.

ctwm had a signal handler for SIGCHLD (not 100% correct). And that got replaced
by ignoring the signal to avoid waiting for children.


-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."



Re: Calling system(3) when SIGCHLD is SIG_IGN'd

2023-08-12 Thread Rhialto
On Sat 12 Aug 2023 at 16:05:34 +0200, Rhialto wrote:
> On Sat 12 Aug 2023 at 15:22:56 +0200, Rhialto wrote:
> > Or, alternatively, a proper signal handler for SIGCHLD could be set up.
> 
> I propose something like this.
> It worked for me in a quick test, also when I changed the style of my
> .xinitrc file from my usual
> 
> xterm &
> exec ctwm
> 
> to
> 
> ctwm &
> exec xterm

I meant that the other way around. The bug shows in the first case.
I've also updated it slightly to save and restore errno.

=== modified file 'signals.c'
--- old/signals.c   2018-11-18 22:08:49 +
+++ new/signals.c   2023-08-12 15:03:42 +
@@ -8,6 +8,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "ctwm_shutdown.h"
 #include "signals.h"
@@ -16,6 +18,7 @@
 /* Our backends */
 static void sh_restart(int signum);
 static void sh_shutdown(int signum);
+static void sh_sigchld(int signum);
 
 
 // Internal flags for which signals have called us
@@ -46,9 +49,8 @@
// die...
signal(SIGALRM, SIG_IGN);
 
-   // This should be set by default, but just in case; explicitly don't
-   // leave zombies.
-   signal(SIGCHLD, SIG_IGN);
+   // Explicitly don't leave zombies.
+   signal(SIGCHLD, sh_sigchld);
 
return;
 }
@@ -123,3 +125,18 @@
SignalFlag = sig_shutdown = true;
 }
 
+/**
+ * Handle SIGCHLD so we don't leave zombie child processes.
+ * SIG_IGN'ing it would cause system(3) to malfunction.
+ */
+static void
+sh_sigchld(int signum)
+{
+   pid_t pid;
+   int old_errno = errno;
+
+   while((pid = waitpid(-1, NULL, WNOHANG)) > 0)
+   ;
+
+   errno = old_errno;
+}

-Olaf.
-- 
___ Olaf 'Rhialto' Seibert
\X/ There is no AI. There is just someone else's work.   --I. Rose


signature.asc
Description: PGP signature


Re: Calling system(3) when SIGCHLD is SIG_IGN'd

2023-08-12 Thread Rhialto
On Sat 12 Aug 2023 at 15:22:56 +0200, Rhialto wrote:
> Or, alternatively, a proper signal handler for SIGCHLD could be set up.

I propose something like this.
It worked for me in a quick test, also when I changed the style of my
.xinitrc file from my usual

xterm &
exec ctwm

to

ctwm &
exec xterm

Before the fix, the second form did indeed cause ctwm to get blocked.

=== modified file 'signals.c'
--- old/signals.c   2018-11-18 22:08:49 +
+++ new/signals.c   2023-08-12 13:44:27 +
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ctwm_shutdown.h"
 #include "signals.h"
@@ -16,6 +17,7 @@
 /* Our backends */
 static void sh_restart(int signum);
 static void sh_shutdown(int signum);
+static void sh_sigchld(int signum);
 
 
 // Internal flags for which signals have called us
@@ -46,9 +48,8 @@
// die...
signal(SIGALRM, SIG_IGN);
 
-   // This should be set by default, but just in case; explicitly don't
-   // leave zombies.
-   signal(SIGCHLD, SIG_IGN);
+   // Explicitly don't leave zombies.
+   signal(SIGCHLD, sh_sigchld);
 
return;
 }
@@ -123,3 +124,15 @@
SignalFlag = sig_shutdown = true;
 }
 
+/**
+ * Handle SIGCHLD so we don't leave zombie child processes.
+ */
+static void
+sh_sigchld(int signum)
+{
+   pid_t pid;
+   int status;
+
+   while((pid = waitpid(-1, , WNOHANG)) > 0)
+   ;
+}

-Olaf.
-- 
___ Olaf 'Rhialto' Seibert
\X/ There is no AI. There is just someone else's work.   --I. Rose


signature.asc
Description: PGP signature


Re: Calling system(3) when SIGCHLD is SIG_IGN'd

2023-08-12 Thread Rhialto
Problem Report from the NetBSD side: https://gnats.netbsd.org/57579

-Olaf.
-- 
___ Olaf 'Rhialto' Seibert
\X/ There is no AI. There is just someone else's work.   --I. Rose


signature.asc
Description: PGP signature


Re: Calling system(3) when SIGCHLD is SIG_IGN'd

2023-08-12 Thread Rhialto
There seems to be only a single fork() in ctwm anyway, to call m4 to
parse the config file. The SIGCHLD changing could be limited to that
area, then set back to default.

Or, alternatively, a proper signal handler for SIGCHLD could be set up.

-Olaf.
-- 
___ Olaf 'Rhialto' Seibert
\X/ There is no AI. There is just someone else's work.   --I. Rose


signature.asc
Description: PGP signature


Calling system(3) when SIGCHLD is SIG_IGN'd

2023-08-12 Thread Rhialto
I came across an interesting bug caused by the way ctwm sets up signals,
the POSIX description of SIGCHLD and wait(2), and a common way of
starting a window manager. It's at
http://mail-index.netbsd.org/tech-userlevel/2023/08/12/msg014107.html
and any followups can be read from there, but I'll quote the whole
message here.

Can we avoid setting doing `signal(SIGCHLD, SIG_IGN)`?
It is done in signals.c:setup_signal_handlers().

-
>From: Taylor R Campbell 
>To: tech-userle...@netbsd.org
>Subject: system(3) semantics when SIGCHLD is SIG_IGN'd
>Date: Sat, 12 Aug 2023 11:58:36 +

What should system(3) do when the signal action for SIGCHLD is
SIG_IGN, or has SA_NOCLDWAIT set?


Setting SIGCHLD to SIG_IGN has the effect of reaping zombie children
automatically, so that calling wait(2) is unnecessary to reap them --
and, further, doesn't return _at all_ until the last child has exited.

This semantics -- same as setting SA_NOCLDWAIT -- is enshrined in
POSIX:

If the calling process has SA_NOCLDWAIT set or has SIGCHLD set to
SIG_IGN, and the process has no unwaited for children that were
transformed into zombie processes, the calling thread will block
until all of the children of the process containing the calling
thread terminate, and wait() and waitpid() will fail and set errno
to [ECHILD].

https://pubs.opengroup.org/onlinepubs/7908799/xsh/wait.html

So if a process already has a child, and calls system(3) as it is
currently implemented in libc in ~all versions of NetBSD, system(3)
will hang indefinitely until the existing child exits.

This manifests in newer versions of ctwm which set SIGCHLD to SIG_IGN
if you have a .xsession file that does something like:

xterm &
xclock &
exec ctwm

This causes ctwm to start with two children already, which in turn
causes system(3) to hang when you try to start an application from the
ctwm menu.

The ctwm hang led to PR kern/57527 (https://gnats.netbsd.org/57527,
`kern' because at first it looked like a missing wakeup in the kernel
before we realized this is exactly how POSIX expects SIG_IGN and
SA_NOCLDWAIT to behave), which has some litmus tests for the semantics
and draft code to mitigate the situation in system(3).

So, should we do anything about this in system(3)?

Pro: Makes existing code code like ctwm work.

Cons:
- POSIX doesn't ask system(3) to work when SIGCHLD is set to SIG_IGN
  or when it has SA_NOCLDWAIT set, so this code is nonportable anyway;
  might break on other systems too, so breakage on NetBSD leading to
  an upstream bug report is helpful.
- Changing signal actions has the side effect of clearing the signal
  queue, and I don't see a way around that.

Alternative would be to say: don't do that; fix the buggy code that
calls system(3) with SIGCHLD ignored, either by having it set a signal
handler that calls waitpid(-1, NULL, WNOHANG) until success, or by
having it use something other than system(3).

-


signature.asc
Description: PGP signature