Re: Unsightly errors due to race on closing stdin when handling signals in s6-log

2021-12-24 Thread Laurent Bercot




Laurent,

Thanks for the quick turnaround. Let's see if I understand. The key part is 
here:

  case SIGTERM :
if (flagprotect) break ;
  case SIGHUP :
handle_stdin = _stdin ;
if (!indata.len) { prepare_to_exit() ; e = 1 ; }
break ;

The risk is that last_stdin() will try to read stdin after prepare_to_exit() is 
called.


 Yes. But the only place that can call last_stdin() is the part of the
main loop here:
https://github.com/skarnet/s6/blob/master/src/daemontools-extras/s6-log.c#L1305

and that can only happen when there is actually something to read
(x[xindex0].revents contains IOPAUSE_READ) and, above that, when
fd 0 has been marked as a valid descriptor to poll on (xindex0 is
nonzero).

The problem is that the current iteration of the event loop was started
when we were not exiting yet, so flagexiting is 0, so xindex0 was
(possibly) nonzero:
https://github.com/skarnet/s6/blob/master/src/daemontools-extras/s6-log.c#L1267
xindex0 has been invalidated by our state change; if we continue the
current iteration, we may take the path where (*handle_stdin)() is
called. That's the bug. We need to make sure xindex0 tracks the correct
state at all times, so restarting the loop (with "continue") when the
state has changed ensures xindex0 is valid, since



This causes the next iteration of the event loop to set xindex0 zero because 
flagexiting was set by prepare_to_exit():

  if (!flagexiting && !(flagblock && r))
  {
x[j].fd = 0 ;
x[j].events = IOPAUSE_READ ;
xindex0 = j++ ;
  }
  else xindex0 = 0 ;

Right?


 Yes. xindex0 will be 0 in all future iterations - only the one where
flagexiting was just set to 1 was risky.

 A generic event loop pattern looks like this:

  for (;;)
  {
before() ;
if (exitcondition) break ;
iopause() ;
after() ;
  }

 before() reads state into variables and prepares the arguments to
iopause() - you want to only poll for reading on valid fds and when
you're not in lameduck mode (i.e. exiting), and only poll for writing
on fds where you actually have something to write.
 iopause() is the blocking poll()/select() primitive.
 after() is a list of actions to take depending on the state you've
gathered in before() and the events reported by iopause().

 If an action in after() changes the state that has been read in
before(), and this state has been cached in variables local to the
loop, those cache variables have now been invalidated, and further
tests in after() may yield incorrect results, perform actions that
you didn't want to take (such as reading on stdin), or fail to
perform actions that you should have been taking. To avoid that, when
such invalidation happens, the current iteration should be aborted (e.g.
via "continue") so the next invocation of before() can refresh the
state cache. If there are still events that have not been handled by
the current after(), it doesn't matter, because iopause() is level-
triggered and the events are still there so iopause() will return
instantly, and the next after() will take care of the events.



It did cross my mind that another approach is to trade state space complexity 
for resource usage by creating an additional file descriptor to /dev/null 
during the initialisation path.


 Yeah, but no, this isn't a good trade-off. Tracking state in an
event loop is very cheap, and pretty easy unless you make a stupid
mistake (as I did before figuring out the pattern!), so it's really
not worth using an extra fd to dispense with that tracking.

--
 Laurent



Re: Unsightly errors due to race on closing stdin when handling signals in s6-log

2021-12-24 Thread Earl Chew via skaware

Laurent,

Thanks for the quick turnaround. Let's see if I understand. The key part 
is here:


  case SIGTERM :
    if (flagprotect) break ;
  case SIGHUP :
    handle_stdin = _stdin ;
    if (!indata.len) { prepare_to_exit() ; e = 1 ; }
    break ;

The risk is that last_stdin() will try to read stdin after 
prepare_to_exit() is called.


However, prepare_to_exit() is only called if !indata.len, and causes 
handle_signals() to return 1.


The caller of handle_signals() is:

  if (x[0].revents & (IOPAUSE_READ | IOPAUSE_EXCEPT) && 
handle_signals()) continue ;


This causes the next iteration of the event loop to set xindex0 zero 
because flagexiting was set by prepare_to_exit():


  if (!flagexiting && !(flagblock && r))
  {
    x[j].fd = 0 ;
    x[j].events = IOPAUSE_READ ;
    xindex0 = j++ ;
  }
  else xindex0 = 0 ;

Right?


It did cross my mind that another approach is to trade state space 
complexity for resource usage by creating an additional file descriptor 
to /dev/null during the initialisation path. With this additional file 
descriptor, either:


a) have prepare_to_exit() either dup2() this to stdin, or

b) use an int inputfd = 0 to earmark the corresponding input source and 
overwrite this to instead indicate the /dev/null file descriptor.



Earl


On 2021-12-24 00:46, Laurent Bercot wrote:


 So indeed, when an exit signal was received at the same time stdin was
readable (unless your producer is spamming logs, that's a rare event,
which is why I never saw it), prepare_to_exit() was called but the
xindex0 marker was not updated and the remainder of the iteration still
called *handle_stdin, yielding read errors.

 Now, when handle_signals() calls prepare_to_exit(), the event loop
is restarted, so we're never handling events in an obsolete state.

 Please check the latest s6 git head and tell me if it works for you.

 The mistake of not restarting the event loop right away on state change
is something I became aware of and stopped making in later software, but
I never thought of going back and checking whether s6-log had it.
Thanks!

--
 Laurent



Re: Unsightly errors due to race on closing stdin when handling signals in s6-log

2021-12-24 Thread Laurent Bercot



 So indeed, when an exit signal was received at the same time stdin was
readable (unless your producer is spamming logs, that's a rare event,
which is why I never saw it), prepare_to_exit() was called but the
xindex0 marker was not updated and the remainder of the iteration still
called *handle_stdin, yielding read errors.

 Now, when handle_signals() calls prepare_to_exit(), the event loop
is restarted, so we're never handling events in an obsolete state.

 Please check the latest s6 git head and tell me if it works for you.

 The mistake of not restarting the event loop right away on state change
is something I became aware of and stopped making in later software, but
I never thought of going back and checking whether s6-log had it.
Thanks!

--
 Laurent



Re: Unsightly errors due to race on closing stdin when handling signals in s6-log

2021-12-23 Thread Earl Chew via skaware

Laurent,


I don't like opening anything (even /dev/null) on an exit path


Though I was looking to keep the scope of my efforts limited, I think 
this would be preferred if you an achieve it.


Earl


On , Laurent Bercot wrote:
IIRC the use case involves signal handling, which causes 
prepare_to_exit() to be called in handle_signals(). Once the signal 
handler returns, stdin is closed. The mainline continues to refer to 
the now closed file descriptor, and emits errors as consequence. The 
process terminates, but the path to exit is unsightly.


 Ah, good catch. Thanks for the report and the patch!
 I don't like opening anything (even /dev/null) on an exit path, so
I'll try to fix the problem in another way (tracking use of fd 0 in
the mainline) over the next few days. If it proves too difficult, I'll
commit something similar to your patch.

--
 Laurent



Re: Unsightly errors due to race on closing stdin when handling signals in s6-log

2021-12-23 Thread Laurent Bercot

IIRC the use case involves signal handling, which causes prepare_to_exit() to 
be called in handle_signals(). Once the signal handler returns, stdin is 
closed. The mainline continues to refer to the now closed file descriptor, and 
emits errors as consequence. The process terminates, but the path to exit is 
unsightly.


 Ah, good catch. Thanks for the report and the patch!
 I don't like opening anything (even /dev/null) on an exit path, so
I'll try to fix the problem in another way (tracking use of fd 0 in
the mainline) over the next few days. If it proves too difficult, I'll
commit something similar to your patch.

--
 Laurent