Re: [PATCH 1/2] daemon: use timeout for uninterruptible poll
Kim Gybelswrites: >> > In other words, you scolded Kim for something that this patch did not >> > introduce, but which was already there. > > I didn't feel scolded, just Junio raising a concern about maintainability of > the code. FWIW, I didn't mean to scold, either. Rather I was pointing out that the code already maintains the number of remaining children, which means that a more portable abstraction than poll(), if we desired to have one, would merely be one step away, as we already know that at least need that information to help Windows. > There is another issue with the existing code that this new > "xpoll" will need to take into account. If a SIGCHLD arrives > between the call to check_dead_children and poll, the poll will > not be interupted by it, resulting in the child not being reaped > until another child terminates or a client connects. Currently, > the effect is just a zombie process for a longer time, however, > the proposed patch (daemon: graceful shutdown of client > connection) relies on the cleanup to close the client connection. Good analysis. That consideration may mean that xpoll() as I suggested is useless as a possible more portable abstraction (or, "an abstraction that is implementable easily in both POSIX and non-POSIX world"), but I suspect we would still want to have an internal "portable" API that serves the purpose similar to how we wanted POSIX poll() to serve. The place the patch is touching is not the only place poll() is used in the codebase, and other places (and future ones we would add) may benefit from having one. Thanks for being constructive.
Re: [PATCH 1/2] daemon: use timeout for uninterruptible poll
On (19/04/18 06:51), Junio C Hamano wrote: > Johannes Schindelinwrites: > > In other words, you scolded Kim for something that this patch did not > > introduce, but which was already there. I didn't feel scolded, just Junio raising a concern about maintainability of the code. > > Unless I am misunderstanding violently what you say, that is, in which > > case I would like to ask for a clarification why this patch (which does > > not change a thing unless NO_POLL is defined!) must be rejected, and while > > at it, I would like to ask you how introducing a layer of indirection with > > a full new function that is at least moderately misleading (as it would be > > named xpoll() despite your desire that it should do things that poll() > > does *not* do) would be preferable to this here patch that changes but a > > few lines to introduce a regular heartbeat check for platforms that > > Our xwrite() and other xfoo() are to "fix" undesirable aspect of the > underlying pure POSIX API to make it more suitable for our codebase. > When pure POSIX poll() that requires the implementing or emulating > platform pays attention to the children being waited on is not > appropriate for the codepath we are using (i.e. the place where the > patch is touching), it would be in line to introduce a "fixed" API > that allows us to pass that information, so that we can build on top > of that abstraction that is *not* pure POSIX abstraction, no? After > all, you are the one who constantly whine that Git is implemented on > POSIX API and it is inconvenient for other platforms. There is another issue with the existing code that this new "xpoll" will need to take into account. If a SIGCHLD arrives between the call to check_dead_children and poll, the poll will not be interupted by it, resulting in the child not being reaped until another child terminates or a client connects. Currently, the effect is just a zombie process for a longer time, however, the proposed patch (daemon: graceful shutdown of client connection) relies on the cleanup to close the client connection. When I have time, I will reroll including a change to ppoll. -Kim
Re: [PATCH 1/2] daemon: use timeout for uninterruptible poll
Johannes Schindelinwrites: > Unless I am misunderstanding violently what you say, that is, in which > case I would like to ask for a clarification why this patch (which does > not change a thing unless NO_POLL is defined!) must be rejected, and while > at it, I would like to ask you how introducing a layer of indirection with > a full new function that is at least moderately misleading (as it would be > named xpoll() despite your desire that it should do things that poll() > does *not* do) would be preferable to this here patch that changes but a > few lines to introduce a regular heartbeat check for platforms that Our xwrite() and other xfoo() are to "fix" undesirable aspect of the underlying pure POSIX API to make it more suitable for our codebase. When pure POSIX poll() that requires the implementing or emulating platform pays attention to the children being waited on is not appropriate for the codepath we are using (i.e. the place where the patch is touching), it would be in line to introduce a "fixed" API that allows us to pass that information, so that we can build on top of that abstraction that is *not* pure POSIX abstraction, no? After all, you are the one who constantly whine that Git is implemented on POSIX API and it is inconvenient for other platforms.
Re: [PATCH 1/2] daemon: use timeout for uninterruptible poll
Hi Kim, On Sun, 15 Apr 2018, Kim Gybels wrote: > On (13/04/18 14:36), Johannes Schindelin wrote: > > > The poll provided in compat/poll.c is not interrupted by receiving > > > SIGCHLD. Use a timeout for cleaning up dead children in a timely > > > manner. > > > > Maybe say "When using this poll emulation, use a timeout ..."? > > I will rewrite the commit message when I reroll the patch. Calling the > poll "uninterruptible" might be wrong as well, although the poll > doesn't return with EINTR when a child process terminates, it might > still be interruptible in other ways. On a related note, the handler > for SIGCHLD is simply not called in Git-for-Windows' daemon. Right. There is no signal infrastructure on Windows that is an exact equivalent of what Junio desires. > > > @@ -1161,8 +1162,13 @@ static int service_loop(struct socketlist > > > *socklist) > > > int i; > > > > > > check_dead_children(); > > > - > > > - if (poll(pfd, socklist->nr, -1) < 0) { > > > +#ifdef NO_POLL > > > + poll_timeout = live_children ? 100 : -1; > > > +#endif > > > + int ret = poll(pfd, socklist->nr, poll_timeout); > > > + if (ret == 0) { > > > + continue; > > > + } else if (ret < 0) { > > > > I would find it a bit easier on the eyes if this did not use curlies, and > > dropped the unnecessary `else` (`continue` will take care of that): > > > > if (!ret) > > continue; > > if (ret < 0) > > [...] > > Funny, that's how I would normally write it, if I wasn't so focused on > trying to follow the coding quidelines. While I'm at it, I will also > fix that sneaky double space after the if. :-) > Is it ok to add the timeout for all platforms using the poll > emulation, since I only tested for Windows? >From my reading of the patch, it changes only one thing, and only in the case that the developer asked to build with NO_POLL (which means that the platform does not have a native poll()): instead of waiting indefinitely, the poll() call is interrupted in regular intervals to give reap_dead_children() a chance to clean up. And that's all it does. So it is a simply heartbeat for platforms that require it, and that heartbeat would not even hurt any platform that would *not* require it. In short: from my point of view, it is fine to add the timeout for all NO_POLL platforms, even if it was only tested on Windows. Of course, we *do* know that there is one other user of NO_POLL: the NonStop platform. Randall, would you mind testing these two patches on NonStop? Thanks, Johannes
Re: [PATCH 1/2] daemon: use timeout for uninterruptible poll
Hi Junio, On Mon, 16 Apr 2018, Junio C Hamano wrote: > Kim Gybelswrites: > > > The poll provided in compat/poll.c is not interrupted by receiving > > SIGCHLD. Use a timeout for cleaning up dead children in a timely manner. > > I think you identified the problem and diagnosed it correctly, but I > find that the change proposed here introduces a severe layering > violation. The code is still calling what is called poll(), which > should not have such a broken semantics. While I have sympathy for your desire to apply pure POSIX functionality, the reality is that we do not have this luxury. Not if we want to support Git on the still most prevalent development platform: Windows. On Windows, you simply do not have that poll() that you are looking for. In particular, there is no signal handling of the type you seem to want to require. As to the layering violation you mention, first a HN quote, just to loosen the mood, and to at least partially ease the blow delivered by your mail: There is no such thing as a layering violation. You should be immediately suspicious of anyone who claims that there are such things. ;-) Seriously again. If you care to have a look at the patch, you will see that the loop (which will now benefit from Kim's timeout on platforms without POSIX signal handling) *already* contains that call to reap_dead_children(). In other words, you scolded Kim for something that this patch did not introduce, but which was already there. Unless I am misunderstanding violently what you say, that is, in which case I would like to ask for a clarification why this patch (which does not change a thing unless NO_POLL is defined!) must be rejected, and while at it, I would like to ask you how introducing a layer of indirection with a full new function that is at least moderately misleading (as it would be named xpoll() despite your desire that it should do things that poll() does *not* do) would be preferable to this here patch that changes but a few lines to introduce a regular heartbeat check for platforms that require it? Thank you, Dscho
Re: [PATCH 1/2] daemon: use timeout for uninterruptible poll
Junio C Hamanowrites: > I think you identified the problem and diagnosed it correctly, but I > find that the change proposed here introduces a severe layering > violation. The code is still calling what is called poll(), which > should not have such a broken semantics. I only mentioned a piece of fact (i.e. "the code calls poll() after the patch"), but I guess I should have made it clear what makes that a bad thing. Future readers of the code in daemon.c are required to be aware of the limitation of some poll() emulation; they cannot "optimize" out and made the code unware of the (non-)existence of remaining children, for example. When the callsite uses poll(), those who know how poll() ought to work won't be. The reason why the xpoll() I mentioned as a possible alternative would be better is because they will learn why we do not use normal poll() there and why we maintain and pass live_children (and those who cut and paste without understanding the existing code _will_ copy the calling site of xpoll(), which will automatically copy the need to maintain the number of remaining children ;-).
Re: [PATCH 1/2] daemon: use timeout for uninterruptible poll
Kim Gybelswrites: > The poll provided in compat/poll.c is not interrupted by receiving > SIGCHLD. Use a timeout for cleaning up dead children in a timely manner. I think you identified the problem and diagnosed it correctly, but I find that the change proposed here introduces a severe layering violation. The code is still calling what is called poll(), which should not have such a broken semantics. The ideal solution would be to fix the emulation so that it also properly works for reaping a dead child process, but if that is not possible, another solution that does not break the API layering would probably be to introduce our own version of something similar to poll() that helps various platforms that cannot implement the real poll() faithfully for whatever reason. Such an xpoll() API function we introduce (and implement in compat/poll.c) may take, in addition to the usual parameters to reall poll(), the value of live_children we have at this call site. With that - On platforms whose poll() does work correctly for culling dead children will just ignore the live_children paramater in its implementation of xpoll() - On other platforms, it will shorten the timeout depending on the need to cull dead children, just like your patch did. Thanks. > > Signed-off-by: Kim Gybels > --- > daemon.c | 10 -- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/daemon.c b/daemon.c > index fe833ea7de..6dc95c1b2f 100644 > --- a/daemon.c > +++ b/daemon.c > @@ -1147,6 +1147,7 @@ static int service_loop(struct socketlist *socklist) > { > struct pollfd *pfd; > int i; > + int poll_timeout = -1; > > pfd = xcalloc(socklist->nr, sizeof(struct pollfd)); > > @@ -1161,8 +1162,13 @@ static int service_loop(struct socketlist *socklist) > int i; > > check_dead_children(); > - > - if (poll(pfd, socklist->nr, -1) < 0) { > +#ifdef NO_POLL > + poll_timeout = live_children ? 100 : -1; > +#endif > + int ret = poll(pfd, socklist->nr, poll_timeout); > + if (ret == 0) { > + continue; > + } else if (ret < 0) { > if (errno != EINTR) { > logerror("Poll failed, resuming: %s", > strerror(errno));
Re: [PATCH 1/2] daemon: use timeout for uninterruptible poll
On (13/04/18 14:36), Johannes Schindelin wrote: > > The poll provided in compat/poll.c is not interrupted by receiving > > SIGCHLD. Use a timeout for cleaning up dead children in a timely manner. > > Maybe say "When using this poll emulation, use a timeout ..."? I will rewrite the commit message when I reroll the patch. Calling the poll "uninterruptible" might be wrong as well, although the poll doesn't return with EINTR when a child process terminates, it might still be interruptible in other ways. On a related note, the handler for SIGCHLD is simply not called in Git-for-Windows' daemon. > > diff --git a/daemon.c b/daemon.c > > index fe833ea7de..6dc95c1b2f 100644 > > --- a/daemon.c > > +++ b/daemon.c > > @@ -1147,6 +1147,7 @@ static int service_loop(struct socketlist *socklist) > > { > > struct pollfd *pfd; > > int i; > > + int poll_timeout = -1; > > Just reuse the line above: > > int poll_timeout = -1, i; Sure. > > @@ -1161,8 +1162,13 @@ static int service_loop(struct socketlist *socklist) > > int i; > > > > check_dead_children(); > > - > > - if (poll(pfd, socklist->nr, -1) < 0) { > > +#ifdef NO_POLL > > + poll_timeout = live_children ? 100 : -1; > > +#endif > > + int ret = poll(pfd, socklist->nr, poll_timeout); > > + if (ret == 0) { > > + continue; > > + } else if (ret < 0) { > > I would find it a bit easier on the eyes if this did not use curlies, and > dropped the unnecessary `else` (`continue` will take care of that): > > if (!ret) > continue; > if (ret < 0) > [...] Funny, that's how I would normally write it, if I wasn't so focused on trying to follow the coding quidelines. While I'm at it, I will also fix that sneaky double space after the if. Is it ok to add the timeout for all platforms using the poll emulation, since I only tested for Windows? Best regards, Kim
Re: [PATCH 1/2] daemon: use timeout for uninterruptible poll
Hi Kim, On Thu, 12 Apr 2018, Kim Gybels wrote: > The poll provided in compat/poll.c is not interrupted by receiving > SIGCHLD. Use a timeout for cleaning up dead children in a timely manner. Maybe say "When using this poll emulation, use a timeout ..."? > diff --git a/daemon.c b/daemon.c > index fe833ea7de..6dc95c1b2f 100644 > --- a/daemon.c > +++ b/daemon.c > @@ -1147,6 +1147,7 @@ static int service_loop(struct socketlist *socklist) > { > struct pollfd *pfd; > int i; > + int poll_timeout = -1; Just reuse the line above: int poll_timeout = -1, i; > @@ -1161,8 +1162,13 @@ static int service_loop(struct socketlist *socklist) > int i; > > check_dead_children(); > - > - if (poll(pfd, socklist->nr, -1) < 0) { > +#ifdef NO_POLL > + poll_timeout = live_children ? 100 : -1; > +#endif > + int ret = poll(pfd, socklist->nr, poll_timeout); > + if (ret == 0) { > + continue; > + } else if (ret < 0) { I would find it a bit easier on the eyes if this did not use curlies, and dropped the unnecessary `else` (`continue` will take care of that): if (!ret) continue; if (ret < 0) [...] Thank you for working on this! Ciao, Dscho