Re: Unsightly errors due to race on closing stdin when handling signals in s6-log
Laurent, Thanks for the quick turnaround. Let's see if I understand. The key part is here: case SIGTERM : if (flagprotect) break ; case SIGHUP : handle_stdin = &last_stdin ; if (!indata.len) { prepare_to_exit() ; e = 1 ; } break ; The risk is that last_stdin() will try to read stdin after prepare_to_exit() is called. Yes. But the only place that can call last_stdin() is the part of the main loop here: https://github.com/skarnet/s6/blob/master/src/daemontools-extras/s6-log.c#L1305 and that can only happen when there is actually something to read (x[xindex0].revents contains IOPAUSE_READ) and, above that, when fd 0 has been marked as a valid descriptor to poll on (xindex0 is nonzero). The problem is that the current iteration of the event loop was started when we were not exiting yet, so flagexiting is 0, so xindex0 was (possibly) nonzero: https://github.com/skarnet/s6/blob/master/src/daemontools-extras/s6-log.c#L1267 xindex0 has been invalidated by our state change; if we continue the current iteration, we may take the path where (*handle_stdin)() is called. That's the bug. We need to make sure xindex0 tracks the correct state at all times, so restarting the loop (with "continue") when the state has changed ensures xindex0 is valid, since This causes the next iteration of the event loop to set xindex0 zero because flagexiting was set by prepare_to_exit(): if (!flagexiting && !(flagblock && r)) { x[j].fd = 0 ; x[j].events = IOPAUSE_READ ; xindex0 = j++ ; } else xindex0 = 0 ; Right? Yes. xindex0 will be 0 in all future iterations - only the one where flagexiting was just set to 1 was risky. A generic event loop pattern looks like this: for (;;) { before() ; if (exitcondition) break ; iopause() ; after() ; } before() reads state into variables and prepares the arguments to iopause() - you want to only poll for reading on valid fds and when you're not in lameduck mode (i.e. exiting), and only poll for writing on fds where you actually have something to write. iopause() is the blocking poll()/select() primitive. after() is a list of actions to take depending on the state you've gathered in before() and the events reported by iopause(). If an action in after() changes the state that has been read in before(), and this state has been cached in variables local to the loop, those cache variables have now been invalidated, and further tests in after() may yield incorrect results, perform actions that you didn't want to take (such as reading on stdin), or fail to perform actions that you should have been taking. To avoid that, when such invalidation happens, the current iteration should be aborted (e.g. via "continue") so the next invocation of before() can refresh the state cache. If there are still events that have not been handled by the current after(), it doesn't matter, because iopause() is level- triggered and the events are still there so iopause() will return instantly, and the next after() will take care of the events. It did cross my mind that another approach is to trade state space complexity for resource usage by creating an additional file descriptor to /dev/null during the initialisation path. Yeah, but no, this isn't a good trade-off. Tracking state in an event loop is very cheap, and pretty easy unless you make a stupid mistake (as I did before figuring out the pattern!), so it's really not worth using an extra fd to dispense with that tracking. -- Laurent
Re: Unsightly errors due to race on closing stdin when handling signals in s6-log
Laurent, Thanks for the quick turnaround. Let's see if I understand. The key part is here: case SIGTERM : if (flagprotect) break ; case SIGHUP : handle_stdin = &last_stdin ; if (!indata.len) { prepare_to_exit() ; e = 1 ; } break ; The risk is that last_stdin() will try to read stdin after prepare_to_exit() is called. However, prepare_to_exit() is only called if !indata.len, and causes handle_signals() to return 1. The caller of handle_signals() is: if (x[0].revents & (IOPAUSE_READ | IOPAUSE_EXCEPT) && handle_signals()) continue ; This causes the next iteration of the event loop to set xindex0 zero because flagexiting was set by prepare_to_exit(): if (!flagexiting && !(flagblock && r)) { x[j].fd = 0 ; x[j].events = IOPAUSE_READ ; xindex0 = j++ ; } else xindex0 = 0 ; Right? It did cross my mind that another approach is to trade state space complexity for resource usage by creating an additional file descriptor to /dev/null during the initialisation path. With this additional file descriptor, either: a) have prepare_to_exit() either dup2() this to stdin, or b) use an int inputfd = 0 to earmark the corresponding input source and overwrite this to instead indicate the /dev/null file descriptor. Earl On 2021-12-24 00:46, Laurent Bercot wrote: So indeed, when an exit signal was received at the same time stdin was readable (unless your producer is spamming logs, that's a rare event, which is why I never saw it), prepare_to_exit() was called but the xindex0 marker was not updated and the remainder of the iteration still called *handle_stdin, yielding read errors. Now, when handle_signals() calls prepare_to_exit(), the event loop is restarted, so we're never handling events in an obsolete state. Please check the latest s6 git head and tell me if it works for you. The mistake of not restarting the event loop right away on state change is something I became aware of and stopped making in later software, but I never thought of going back and checking whether s6-log had it. Thanks! -- Laurent
Re: Unsightly errors due to race on closing stdin when handling signals in s6-log
So indeed, when an exit signal was received at the same time stdin was readable (unless your producer is spamming logs, that's a rare event, which is why I never saw it), prepare_to_exit() was called but the xindex0 marker was not updated and the remainder of the iteration still called *handle_stdin, yielding read errors. Now, when handle_signals() calls prepare_to_exit(), the event loop is restarted, so we're never handling events in an obsolete state. Please check the latest s6 git head and tell me if it works for you. The mistake of not restarting the event loop right away on state change is something I became aware of and stopped making in later software, but I never thought of going back and checking whether s6-log had it. Thanks! -- Laurent
Re: Unsightly errors due to race on closing stdin when handling signals in s6-log
Laurent, I don't like opening anything (even /dev/null) on an exit path Though I was looking to keep the scope of my efforts limited, I think this would be preferred if you an achieve it. Earl On , Laurent Bercot wrote: IIRC the use case involves signal handling, which causes prepare_to_exit() to be called in handle_signals(). Once the signal handler returns, stdin is closed. The mainline continues to refer to the now closed file descriptor, and emits errors as consequence. The process terminates, but the path to exit is unsightly. Ah, good catch. Thanks for the report and the patch! I don't like opening anything (even /dev/null) on an exit path, so I'll try to fix the problem in another way (tracking use of fd 0 in the mainline) over the next few days. If it proves too difficult, I'll commit something similar to your patch. -- Laurent
Re: Unsightly errors due to race on closing stdin when handling signals in s6-log
IIRC the use case involves signal handling, which causes prepare_to_exit() to be called in handle_signals(). Once the signal handler returns, stdin is closed. The mainline continues to refer to the now closed file descriptor, and emits errors as consequence. The process terminates, but the path to exit is unsightly. Ah, good catch. Thanks for the report and the patch! I don't like opening anything (even /dev/null) on an exit path, so I'll try to fix the problem in another way (tracking use of fd 0 in the mainline) over the next few days. If it proves too difficult, I'll commit something similar to your patch. -- Laurent