Re: [Toybox] [PATCH] Factor out dhcpd's and syslogd's main loops into a library function

2013-12-02 Thread Felix Janda
Rob Landley wrote:
 Taking the remaining holiday time to close old reply windows and such.  
 I've ahd this one open a while... :)
 
 On 09/06/2013 05:35:27 PM, Felix Janda wrote:
[...]
   Are we normally blocking a lot of signals in other commands? Using  
  the
   6 argument version instead of the 5 argument version addresses an  
  issue
   for which existing command?
  
  If a signal occurs during the block it will still be delivered in the  
  next
  pselect loop. pselect implements exactly the signal unblocking I am  
  talking
  about above. We block all signals. But during the signal call some are
  unblocked. When it returns they are again blocked and we can safely  
  check
  what has happened. See the article
  
  https://lwn.net/Articles/176911/
  
  for some discussion on the system call.
 
 My problem with the approach was that the loop really shouldn't mess  
 with signal handlers at all. If we want to fiddle with something fancy  
 there's signalfd.

Ok, I see that in the case that in the case that no signal handling is
needed it shouldn't touch the sigprocmask at all.

In the case signal handling is needed the blocking tames the
asynchronic signals. signalfd would be another (linux specific) option.

 I have an aversion to infrastructure in search of a user. When common  
 code goes into the library, I like to be able to point to at least one  
 existing user, preferably two.

For me the whole point was factoring out the similar code in dhcpd and
syslogd (now there are more possible users in pending), replacing the
self-pipes by something of similar functionality but better suited to
be put into a function.

[...]
  select() adjusts its timespecs on linux. But that's only one possible
  behavior specified by POSIX.
 
 Some not-linuxes don't work like linux, posix is vague enough to allow  
 Windows NT to be certified...
 
 I'm pretty happy sticking with a documented Linux semantic if it's a  
 documented Linux semantic.

Ok. (But it'd be good to note somewhere that a Linux semantic is used.)

  Since we are already doing signals SIGALRM
  seemed more elegant then constantly adjusting the timeout to me.
 
 Having a library I/O function modify global signal handler state is  
 elegant?

more elegant. (Oh, s/then/than/) daemon_main_loop() says how much I
thought of it as a library I/O function at the start. It is meant to
take total control of all signals. I can't think of any application
(in toybox) of both signals and a poll/select loop but a
wait-for-fd-or-signal-loop.

[...] (Discussion on fork and exec.)

Since xexec()
doesn't change the pid the pidfile needs to be unlinked() before
terminating so that xpidfile() doesn't complain.
  
   Um... I agree this is a symptom of a problem. I'm not sure that's  
  the
   right fix. The restart is maintaining state and then getting  
  confused
   by the state it's maintaining...
  
   (I can just see some horrible systemd thing hitting a race window  
  where
   the PID file wasn't there and trying to restart the process...)
  
  Does systemd still need PID files?
 
 I have no idea. It's a horrible thing and I'm trying to avoid it. My  
 point was not assuming behavior on the part of other programs, and  
 leaving a window where something doing inotify() wakes up because an  
 indicator file went away when the process it represents didn't and  
 isn't going to... I try to avoid that sort of halfway state where there  
 can be conflicting assumptions.

So build in some extra logic so that the PID file is only deleted if the
daemon actually exits?

[...]
  So how about making the two helper functions for daemon_main_loop()
  return a value and possibly returning from daemon_main_loop() based on
  that value? Then in syslogd_main() add a infinite loop around (or  
  almost)
  everything.
 
 I need to go clean up dhcpd and systemd before figuring out what the  
 infrastructure should look like. If we're going to do anything fancy  
 with signals I lean towards teaching this loop about signalfd, but  
 probably since Linux can adjust the timeout we should just let Linux  
 adjust the timeout.

Apart from the portability I'd also be very much happy with signalfd.
Again pselect is a replacement for the __self-pipes__. The timeout
is just a side show.

 Lemme finish my current set of cleanups, then get back to this. :)

Take your time.

 I'm finally sort of resurfacing from A) catching up on domestic life  
 after 6 months out of state, B) starting a new job after the  
 kickstarter idea got zero replies.

I have no valuable input on the proposed text for the campaign
(like it as it is) but would really like something like it to happen
(and succeed). The current state is kind of sad. (There is no way
to blame you for it.)

Felix
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] [PATCH] Factor out dhcpd's and syslogd's main loops into a library function

2013-12-01 Thread Rob Landley
Taking the remaining holiday time to close old reply windows and such.  
I've ahd this one open a while... :)


On 09/06/2013 05:35:27 PM, Felix Janda wrote:

Rob Landley wrote:
 On 09/01/2013 08:54:27 AM, Felix Janda wrote:
  Both syslogd and dhcpd wait for sockets, signals or a timeout.
  Currently,
  they use select() with its timeout to listen to the sockets and a
  self-pipe for the signals.
 
  The new library function instead uses pselect() without a timeout  
and
  alarm(). With pselect() signals are only unblocked during this  
system

  call; so the xwrappers don't need to be aware about interrupted
  syscalls.

 Are we normally blocking a lot of signals in other commands? Using  
the
 6 argument version instead of the 5 argument version addresses an  
issue

 for which existing command?

If a signal occurs during the block it will still be delivered in the  
next
pselect loop. pselect implements exactly the signal unblocking I am  
talking

about above. We block all signals. But during the signal call some are
unblocked. When it returns they are again blocked and we can safely  
check

what has happened. See the article

https://lwn.net/Articles/176911/

for some discussion on the system call.


My problem with the approach was that the loop really shouldn't mess  
with signal handlers at all. If we want to fiddle with something fancy  
there's signalfd.


I have an aversion to infrastructure in search of a user. When common  
code goes into the library, I like to be able to point to at least one  
existing user, preferably two.


  alarm() is used instead of the timeout argument to pselect() so  
that
  the length of the intervals between timeouts is not affected by  
being

  interrupted.

 QEMU recently changed its timer system to _not_ use alarm:

http://blog.alex.org.uk/2013/08/24/changes-to-qemus-timer-system/

 I'm not convinced sending signals to ourselves is preferable to  
telling
 the system call when to timeout. Of the three options (select  
adjusting

 its timespec, us manually reading the clock and feeding pselect a
 timeout based on when we next want to wake up, and us sending  
ourselves
 an alarm), us sending ourselves an alarm and constantly having to  
reset
 it seems like it has the most moving parts and the most things that  
can

 go wrong.

select() adjusts its timespecs on linux. But that's only one possible
behavior specified by POSIX.


Some not-linuxes don't work like linux, posix is vague enough to allow  
Windows NT to be certified...


I'm pretty happy sticking with a documented Linux semantic if it's a  
documented Linux semantic.



Since we are already doing signals SIGALRM
seemed more elegant then constantly adjusting the timeout to me.


Having a library I/O function modify global signal handler state is  
elegant?



 Ok, maybe I've been slightly burned by SIGALRM before:

http://landley.net/notes-2011.html#05-09-2011

  On SIGHUP syslogd restarts. Previously everything relevant was
  contained
  in syslogd_main() and the restarting was implemented by a goto.  
Now a

  longjmp() would be necessary. I instead used an xexec().

 Since commit 955 (and pondering toysh to finally tackle that sucker
 within a finite amount of time) I'm a bit worried that restarting a
 command without OS involvement might misbehave if the signal mask  
gets

 corrupted, or there are funky environment variables, or memory leaks
 build up, or we have leftover filehandles, or a couple dozen other
 things.

 We haven't got generic reset this process's state code, longjmp()
 leaves a lot of debris, and xexec() callign a new main doesn't clear
 the old stuff off the stack; do that in a tight enough loop for long
 enough and eventually you will run out of stack even in a normal  
linux
 userspace. (Default's 8 megs: ulimit -s says 8192 and the bash man  
page

 says the units are kilobytes.)

 I added NOFORK because some commands (like cd) _must_ happen in the
 current toysh process context, but also because I could very very
 carefully audit commands to make sure it was safe to call them in  
the
 same process context and continue when they return. But these  
commands

 are in pending because I haven't even done an initial audit pass on
 them.

 Discarding your process on restart and execing a new one covers a
 multitude of sins. Possibly syslogd is clean enough it can restart
 itself without accumulating trash, but as a long-lived daemon I have
 yet to closely read (hence it being in the pending directory), this
 does not fill me with confidence.

 Probably just the it's 5am, sun coming up soon things. I should  
step

 away from the keyboard...

You are of course right. I think you have some good ideas below how to
fix it.


I note that fork but don't exec is remarkably cheap, and toybox  
should probably do more of that. The downside is long chains of xexec()  
that call command_main() can potentially eat a lot of stack and heap  
and filehandles and such.


Some vague 

Re: [Toybox] [PATCH] Factor out dhcpd's and syslogd's main loops into a library function

2013-09-18 Thread Felix Janda
Rob Landley wrote:
 On 09/10/2013 04:22:41 PM, Felix Janda wrote:
  Rob Landley wrote:
   And yet a call to pselect needs its own sigprocmask() call as setup?
   So... why are we using pselect again?
  
  Didn't I say so? Before the call to pselect we block everything.
 
 But why do we do that? (So that only kill -9 will kill daemons?)

So that we can precisely control when signals come in. If a signal occurs
during the time signals are blocked, it is pending to be delivered after
signals are unblocked, that's here during the pselect call. You don't need
kill -9 unless the program decides to go into an infinite loop or read()
from an fd with no data or something like that.

  Then
  pselect atomically unblocks everything, does the select and afterwards
  blocks everything again. So the only thing that can be interrupted by
  signals is pselect. No need to think about race conditions. The  
  relation
  between pselect and select is the same as between ppoll and poll.
 
 There was no need to think about signal handling race conditions in  
 netcat. I haven't read through the dhcpd or syslogd implementation very  
 closely yet, but I don't know what problem you're trying to solve here.

From what I see netcat only uses SIGALRM. syslogd has SIGHUP, SIGTERM,
SIGINT and SIGQUIT - these can occur at any time, during a signal handler,
during a system call... (ok, SIGALRM can also interrupt a system call,
but in netcat it terminates the toy anyway.)

I was trying to replace the self-pipe logic in these toys by something
equivalent. Race conditions occur when you try to make your sighandlers
reentrant by making the sighandlers stubs and doing the real work in the
main loop.

I can't say that I've understood all of syslogd, yet... For dhcpd I've
only tested the signal handling.

  The main reason for not using ppoll is that it is linux only. pselect
  is in POSIX. OpenBSD however doesn't seem to have it... The portable
  way would be to use the self-pipe as before. But then there is even
  more global state...
 
 I don't understand what problem this addresses. I wrote a dnsd that  
 didn't need this (although it just handled udp, not tcp). I've written  
 my own little web servers, my own little vpn and a star topology  
 network redirector... Not remembering ever playing games with signals  
 like this for any of them?

If they didn't do any signal handling at all, that might have been a
reason.

For syslogd one issue might be if two SIGHUPs (telling it to restart in
order to reread its configuration file) occur in quick succession so
that cleanup() is interrupted, and then called a second time which might
cause a double free().

+  for (;;) {
+memcpy(copy, readfds, sizeof(copy));
+ret = pselect(nfds, copy, 0, 0, 0, action.sa_mask);
+if (ret == -1) {
+  if (errno != EINTR) error_exit (pselect);
+  if (timeout  daemon_sig == SIGALRM) alarm(timeout);
+  sighandler(daemon_sig);
  
   Unconditionally calling sighandler(), so we don't have the option  
  of it
   being null and just getting the default behavior for signals. And  
  the
   caller can't have its own use of alarm outside this helper.
  
  Yes, for the broader use it would be good to make the signal handling
  optional (but that's actually the main point of this function). The  
  caller
  can have its own use of alarm if it doesn't use the one built into  
  this
  function.
 
 There's a timeout built into the blocking function. Possibly you have  
 to recalculate the timeout before calling the function again if you  
 care about next scheduled action instead of just it took too long.  
 This is a heavyweight operation?

No. Is alarm() a heavyweight operation?  (Unlike the wrapper function
the linux system call even updates the timeout like select() does.)

   Ok, net win. (I need a lib/pending.h. Right now adding stuff to
   lib/pending.c and then yanking it out again affects the commit  
  history
   of permanent files...)
  
  I guess using signal() instead of sigaction() would save some lines.  
  But I
  didn't want to mix the new pselect with the old signal().
 
 I still don't understand why you're doing all this stuff with signals.
 
   But this patch is doing a lot more than just introducing the new
   library functions and making stuff use them, so I've got to do  
  _more_
   review before I can come to an actual conclusion about whether or  
  not
   to apply it...
  
  It is really just moving stuff around. Luckily the local variables in  
  the
  main functions nicely broke up into the three seperate functions each.
  Anyway, I still need to improve the restarting of syslogd.
  
   I really do appreciate this work. Sorry to give you such a hard time
   about it. I just want to get it _right_...
  
  No problem. Please feel free to wait with coming back to this until  
  after
  the (impending?) release.
 
 I have an extra day in ohio (the linuxfest sunday content wasn't worth  
 staying 

Re: [Toybox] [PATCH] Factor out dhcpd's and syslogd's main loops into a library function

2013-09-15 Thread Rob Landley

On 09/10/2013 04:22:41 PM, Felix Janda wrote:

Rob Landley wrote:
 And yet a call to pselect needs its own sigprocmask() call as setup?
 So... why are we using pselect again?

Didn't I say so? Before the call to pselect we block everything.


But why do we do that? (So that only kill -9 will kill daemons?)


Then
pselect atomically unblocks everything, does the select and afterwards
blocks everything again. So the only thing that can be interrupted by
signals is pselect. No need to think about race conditions. The  
relation

between pselect and select is the same as between ppoll and poll.


There was no need to think about signal handling race conditions in  
netcat. I haven't read through the dhcpd or syslogd implementation very  
closely yet, but I don't know what problem you're trying to solve here.



The main reason for not using ppoll is that it is linux only. pselect
is in POSIX. OpenBSD however doesn't seem to have it... The portable
way would be to use the self-pipe as before. But then there is even
more global state...


I don't understand what problem this addresses. I wrote a dnsd that  
didn't need this (although it just handled udp, not tcp). I've written  
my own little web servers, my own little vpn and a star topology  
network redirector... Not remembering ever playing games with signals  
like this for any of them?



  +  for (;;) {
  +memcpy(copy, readfds, sizeof(copy));
  +ret = pselect(nfds, copy, 0, 0, 0, action.sa_mask);
  +if (ret == -1) {
  +  if (errno != EINTR) error_exit (pselect);
  +  if (timeout  daemon_sig == SIGALRM) alarm(timeout);
  +  sighandler(daemon_sig);

 Unconditionally calling sighandler(), so we don't have the option  
of it
 being null and just getting the default behavior for signals. And  
the

 caller can't have its own use of alarm outside this helper.

Yes, for the broader use it would be good to make the signal handling
optional (but that's actually the main point of this function). The  
caller
can have its own use of alarm if it doesn't use the one built into  
this

function.


There's a timeout built into the blocking function. Possibly you have  
to recalculate the timeout before calling the function again if you  
care about next scheduled action instead of just it took too long.  
This is a heavyweight operation?



 Ok, net win. (I need a lib/pending.h. Right now adding stuff to
 lib/pending.c and then yanking it out again affects the commit  
history

 of permanent files...)

I guess using signal() instead of sigaction() would save some lines.  
But I

didn't want to mix the new pselect with the old signal().


I still don't understand why you're doing all this stuff with signals.


 But this patch is doing a lot more than just introducing the new
 library functions and making stuff use them, so I've got to do  
_more_
 review before I can come to an actual conclusion about whether or  
not

 to apply it...

It is really just moving stuff around. Luckily the local variables in  
the

main functions nicely broke up into the three seperate functions each.
Anyway, I still need to improve the restarting of syslogd.

 I really do appreciate this work. Sorry to give you such a hard time
 about it. I just want to get it _right_...

No problem. Please feel free to wait with coming back to this until  
after

the (impending?) release.


I have an extra day in ohio (the linuxfest sunday content wasn't worth  
staying for), so I'm trying to catch up on my email a bit and  
downloading the LFS 7.4 files in the background...


Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] [PATCH] Factor out dhcpd's and syslogd's main loops into a library function

2013-09-10 Thread Felix Janda
Rob Landley wrote:
 On 09/01/2013 08:54:27 AM, Felix Janda wrote:
  diff -r 587b7572aac2 -r 752d22ece0fe lib/lib.h
  --- a/lib/lib.h Sun Sep 01 15:13:58 2013 +0200
  +++ b/lib/lib.h Sun Sep 01 15:25:08 2013 +0200
  @@ -201,4 +201,7 @@
   char  *astrcat (char *, char *);
   char *xastrcat (char *, char *);
  
  +// daemon helper functions
   void daemonize(void);
  +void daemon_main_loop(int nfds, fd_set *readfds, long timeout, int  
  *signals,
  + void (*sighandler)(int), void  
  (*selecthandler)(fd_set *fds));
 
 It's not really daemon, is it? Netcat isn't a daemon, less isn't a  
 daemon, tail -f isn't a daemon... All of 'em could potentially benefit  
 from a select loop. It's pretty much just select_loop().

I hadn't thought of it that general. Right now it unconditionally sets
up signals but that could be easily fixed.

 Also, the current code in netcat is using poll(), not select(). (Array  
 of file descriptors vs magic sized bitfield where you have to calculate  
 nfds, which _this_ function is asking the caller to do. Neither one  
 actually scales to thousands of filehandles gracefully, but if that's  
 an issue it's probably not common library code the one or two  
 filehandle case is going to want to use...)

  diff -r 587b7572aac2 -r 752d22ece0fe lib/pending.c
  --- a/lib/pending.c Sun Sep 01 15:13:58 2013 +0200
  +++ b/lib/pending.c Sun Sep 01 15:25:08 2013 +0200
  @@ -92,3 +92,41 @@
 dup2(fd, 2);
 if (fd  2) close(fd);
   }
  +
  +static int daemon_sig;
  +
 
 Gratuitous global variable. I've been trying to eliminate those. (I  
 just added libbuf analogous to toybuf but for use by lib/*.c,  
 presumably it could use that. But... is it worth doing?)

It seems inevitable to me to keep some global state for the signal
handling.

  +static void daemon_sighandler(int sig)
  +{
  +  daemon_sig = sig;
  +}
  +
 
 If we weren't messing with alarm, and instead of using the timeouts  
 built into select, would signals be our problem at all? (Might have to  
 calculate a timeout, but we do that anyway when actual data comes in if  
 we need a periodic action, and you can get data with a bad checksum  
 that causes spurious data arrived, no wait it didn't wakeups anyway.  
 Getting interrupted happens, we need to cope.)
 
 (And still confused about the syscall having a timeout but us not using  
 it...)

SIG_ALRM was an afterthought when I realized that interrupting pselect
messes up the timeouts.

pselect is a replacement for the select + signal self-pipe.

  +/*
  + * main loop for daemons listening to signals and file handles
  +*/
  +void daemon_main_loop(int nfds, fd_set *readfds, long timeout, int  
  *signals,
  + void (*sighandler)(int), void  
  (*selecthandler)(fd_set *fds))
  +{
  +  fd_set copy;
  +  sigset_t sigmask;
  +  struct sigaction action;
  +  int *i, ret;
  +
  +  sigemptyset(sigmask);
  +  action.sa_handler = daemon_sighandler;
  +  sigemptyset(action.sa_mask);
  +  action.sa_flags = 0;
  +  if (timeout) alarm(timeout);
  +  for (i = signals; *i; i++) sigaddset(sigmask, *i);
  +  sigprocmask(SIG_SETMASK, sigmask, 0);
  +  for (i = signals; *i; i++) sigaction(*i, action, 0);
 
 To quote from the ppoll man page:
 
 Other than the difference in the precision of the timeout  
 argument, the
 following ppoll() call:
 
 ready = ppoll(fds, nfds, timeout_ts, sigmask);
 
 is equivalent to atomically executing the following calls:
 
 sigset_t origmask;
 int timeout;
 
 timeout = (timeout_ts == NULL) ? -1 :
   (timeout_ts.tv_sec * 1000 + timeout_ts.tv_nsec /  
 100);
 sigprocmask(SIG_SETMASK, sigmask, origmask);
 ready = poll(fds, nfds, timeout);
 sigprocmask(SIG_SETMASK, origmask, NULL);
 
 And yet a call to pselect needs its own sigprocmask() call as setup?  
 So... why are we using pselect again?

Didn't I say so? Before the call to pselect we block everything. Then
pselect atomically unblocks everything, does the select and afterwards
blocks everything again. So the only thing that can be interrupted by
signals is pselect. No need to think about race conditions. The relation
between pselect and select is the same as between ppoll and poll.

The main reason for not using ppoll is that it is linux only. pselect
is in POSIX. OpenBSD however doesn't seem to have it... The portable
way would be to use the self-pipe as before. But then there is even
more global state...

  +  for (;;) {
  +memcpy(copy, readfds, sizeof(copy));
  +ret = pselect(nfds, copy, 0, 0, 0, action.sa_mask);
  +if (ret == -1) {
  +  if (errno != EINTR) error_exit (pselect);
  +  if (timeout  daemon_sig == SIGALRM) alarm(timeout);
  +  sighandler(daemon_sig);
 
 Unconditionally calling sighandler(), so we don't have the option of it  
 being null and just getting the default behavior for signals. And the  
 caller