Re: [PATCH 1/2] daemon: use timeout for uninterruptible poll

2018-04-19 Thread Junio C Hamano
Kim Gybels  writes:

>> > In other words, you scolded Kim for something that this patch did not
>> > introduce, but which was already there.
>
> I didn't feel scolded, just Junio raising a concern about maintainability of
> the code.

FWIW, I didn't mean to scold, either.

Rather I was pointing out that the code already maintains the number
of remaining children, which means that a more portable abstraction
than poll(), if we desired to have one, would merely be one step
away, as we already know that at least need that information to help
Windows.

> There is another issue with the existing code that this new
> "xpoll" will need to take into account. If a SIGCHLD arrives
> between the call to check_dead_children and poll, the poll will
> not be interupted by it, resulting in the child not being reaped
> until another child terminates or a client connects. Currently,
> the effect is just a zombie process for a longer time, however,
> the proposed patch (daemon: graceful shutdown of client
> connection) relies on the cleanup to close the client connection.

Good analysis.  That consideration may mean that xpoll() as I
suggested is useless as a possible more portable abstraction (or,
"an abstraction that is implementable easily in both POSIX and
non-POSIX world"), but I suspect we would still want to have an
internal "portable" API that serves the purpose similar to how we
wanted POSIX poll() to serve.  The place the patch is touching is
not the only place poll() is used in the codebase, and other places
(and future ones we would add) may benefit from having one.

Thanks for being constructive.




Re: [PATCH 1/2] daemon: use timeout for uninterruptible poll

2018-04-19 Thread Kim Gybels
On (19/04/18 06:51), Junio C Hamano wrote:
> Johannes Schindelin  writes:

> > In other words, you scolded Kim for something that this patch did not
> > introduce, but which was already there.

I didn't feel scolded, just Junio raising a concern about maintainability of
the code.

> > Unless I am misunderstanding violently what you say, that is, in which
> > case I would like to ask for a clarification why this patch (which does
> > not change a thing unless NO_POLL is defined!) must be rejected, and while
> > at it, I would like to ask you how introducing a layer of indirection with
> > a full new function that is at least moderately misleading (as it would be
> > named xpoll() despite your desire that it should do things that poll()
> > does *not* do) would be preferable to this here patch that changes but a
> > few lines to introduce a regular heartbeat check for platforms that
> 
> Our xwrite() and other xfoo() are to "fix" undesirable aspect of the
> underlying pure POSIX API to make it more suitable for our codebase.
> When pure POSIX poll() that requires the implementing or emulating
> platform pays attention to the children being waited on is not
> appropriate for the codepath we are using (i.e. the place where the
> patch is touching), it would be in line to introduce a "fixed" API
> that allows us to pass that information, so that we can build on top
> of that abstraction that is *not* pure POSIX abstraction, no?  After
> all, you are the one who constantly whine that Git is implemented on
> POSIX API and it is inconvenient for other platforms.

There is another issue with the existing code that this new "xpoll" will need
to take into account. If a SIGCHLD arrives between the call to
check_dead_children and poll, the poll will not be interupted by it, resulting
in the child not being reaped until another child terminates or a client
connects. Currently, the effect is just a zombie process for a longer time,
however, the proposed patch (daemon: graceful shutdown of client connection)
relies on the cleanup to close the client connection.

When I have time, I will reroll including a change to ppoll.

-Kim


Re: [PATCH 1/2] daemon: use timeout for uninterruptible poll

2018-04-18 Thread Junio C Hamano
Johannes Schindelin  writes:

> Unless I am misunderstanding violently what you say, that is, in which
> case I would like to ask for a clarification why this patch (which does
> not change a thing unless NO_POLL is defined!) must be rejected, and while
> at it, I would like to ask you how introducing a layer of indirection with
> a full new function that is at least moderately misleading (as it would be
> named xpoll() despite your desire that it should do things that poll()
> does *not* do) would be preferable to this here patch that changes but a
> few lines to introduce a regular heartbeat check for platforms that

Our xwrite() and other xfoo() are to "fix" undesirable aspect of the
underlying pure POSIX API to make it more suitable for our codebase.
When pure POSIX poll() that requires the implementing or emulating
platform pays attention to the children being waited on is not
appropriate for the codepath we are using (i.e. the place where the
patch is touching), it would be in line to introduce a "fixed" API
that allows us to pass that information, so that we can build on top
of that abstraction that is *not* pure POSIX abstraction, no?  After
all, you are the one who constantly whine that Git is implemented on
POSIX API and it is inconvenient for other platforms.



Re: [PATCH 1/2] daemon: use timeout for uninterruptible poll

2018-04-18 Thread Johannes Schindelin
Hi Kim,

On Sun, 15 Apr 2018, Kim Gybels wrote:

> On (13/04/18 14:36), Johannes Schindelin wrote:
> > > The poll provided in compat/poll.c is not interrupted by receiving
> > > SIGCHLD. Use a timeout for cleaning up dead children in a timely
> > > manner.
> > 
> > Maybe say "When using this poll emulation, use a timeout ..."?
> 
> I will rewrite the commit message when I reroll the patch. Calling the
> poll "uninterruptible" might be wrong as well, although the poll
> doesn't return with EINTR when a child process terminates, it might
> still be interruptible in other ways. On a related note, the handler
> for SIGCHLD is simply not called in Git-for-Windows' daemon.

Right. There is no signal infrastructure on Windows that is an exact
equivalent of what Junio desires.

> > > @@ -1161,8 +1162,13 @@ static int service_loop(struct socketlist 
> > > *socklist)
> > >   int i;
> > >  
> > >   check_dead_children();
> > > -
> > > - if (poll(pfd, socklist->nr, -1) < 0) {
> > > +#ifdef NO_POLL
> > > + poll_timeout = live_children ? 100 : -1;
> > > +#endif
> > > + int ret = poll(pfd, socklist->nr, poll_timeout);
> > > + if  (ret == 0) {
> > > + continue;
> > > + } else if (ret < 0) {
> > 
> > I would find it a bit easier on the eyes if this did not use curlies, and
> > dropped the unnecessary `else` (`continue` will take care of that):
> > 
> > if (!ret)
> > continue;
> > if (ret < 0)
> > [...]
> 
> Funny, that's how I would normally write it, if I wasn't so focused on
> trying to follow the coding quidelines. While I'm at it, I will also
> fix that sneaky double space after the if.

:-)

> Is it ok to add the timeout for all platforms using the poll
> emulation, since I only tested for Windows?

>From my reading of the patch, it changes only one thing, and only in the
case that the developer asked to build with NO_POLL (which means that the
platform does not have a native poll()): instead of waiting indefinitely,
the poll() call is interrupted in regular intervals to give
reap_dead_children() a chance to clean up.

And that's all it does.

So it is a simply heartbeat for platforms that require it, and that
heartbeat would not even hurt any platform that would *not* require it.

In short: from my point of view, it is fine to add the timeout for all
NO_POLL platforms, even if it was only tested on Windows.

Of course, we *do* know that there is one other user of NO_POLL: the
NonStop platform.

Randall, would you mind testing these two patches on NonStop?

Thanks,
Johannes


Re: [PATCH 1/2] daemon: use timeout for uninterruptible poll

2018-04-18 Thread Johannes Schindelin
Hi Junio,

On Mon, 16 Apr 2018, Junio C Hamano wrote:

> Kim Gybels  writes:
> 
> > The poll provided in compat/poll.c is not interrupted by receiving
> > SIGCHLD. Use a timeout for cleaning up dead children in a timely manner.
> 
> I think you identified the problem and diagnosed it correctly, but I
> find that the change proposed here introduces a severe layering
> violation.  The code is still calling what is called poll(), which
> should not have such a broken semantics.

While I have sympathy for your desire to apply pure POSIX functionality,
the reality is that we do not have this luxury. Not if we want to support
Git on the still most prevalent development platform: Windows. On Windows,
you simply do not have that poll() that you are looking for.

In particular, there is no signal handling of the type you seem to want to
require.

As to the layering violation you mention, first a HN quote, just to loosen
the mood, and to at least partially ease the blow delivered by your mail:

There is no such thing as a layering violation. You should be
immediately suspicious of anyone who claims that there are such
things.

;-)

Seriously again. If you care to have a look at the patch, you will see
that the loop (which will now benefit from Kim's timeout on platforms
without POSIX signal handling) *already* contains that call to
reap_dead_children().

In other words, you scolded Kim for something that this patch did not
introduce, but which was already there.

Unless I am misunderstanding violently what you say, that is, in which
case I would like to ask for a clarification why this patch (which does
not change a thing unless NO_POLL is defined!) must be rejected, and while
at it, I would like to ask you how introducing a layer of indirection with
a full new function that is at least moderately misleading (as it would be
named xpoll() despite your desire that it should do things that poll()
does *not* do) would be preferable to this here patch that changes but a
few lines to introduce a regular heartbeat check for platforms that
require it?

Thank you,
Dscho


Re: [PATCH 1/2] daemon: use timeout for uninterruptible poll

2018-04-15 Thread Junio C Hamano
Junio C Hamano  writes:

> I think you identified the problem and diagnosed it correctly, but I
> find that the change proposed here introduces a severe layering
> violation.  The code is still calling what is called poll(), which
> should not have such a broken semantics.

I only mentioned a piece of fact (i.e. "the code calls poll() after
the patch"), but I guess I should have made it clear what makes that
a bad thing.  Future readers of the code in daemon.c are required to
be aware of the limitation of some poll() emulation; they cannot
"optimize" out and made the code unware of the (non-)existence of
remaining children, for example.  When the callsite uses poll(),
those who know how poll() ought to work won't be.  The reason why
the xpoll() I mentioned as a possible alternative would be better is
because they will learn why we do not use normal poll() there and
why we maintain and pass live_children (and those who cut and paste
without understanding the existing code _will_ copy the calling site
of xpoll(), which will automatically copy the need to maintain the
number of remaining children ;-).




Re: [PATCH 1/2] daemon: use timeout for uninterruptible poll

2018-04-15 Thread Junio C Hamano
Kim Gybels  writes:

> The poll provided in compat/poll.c is not interrupted by receiving
> SIGCHLD. Use a timeout for cleaning up dead children in a timely manner.

I think you identified the problem and diagnosed it correctly, but I
find that the change proposed here introduces a severe layering
violation.  The code is still calling what is called poll(), which
should not have such a broken semantics.

The ideal solution would be to fix the emulation so that it also
properly works for reaping a dead child process, but if that is not
possible, another solution that does not break the API layering
would probably be to introduce our own version of something similar
to poll() that helps various platforms that cannot implement the
real poll() faithfully for whatever reason.  Such an xpoll() API
function we introduce (and implement in compat/poll.c) may take, in
addition to the usual parameters to reall poll(), the value of
live_children we have at this call site.  With that

 - On platforms whose poll() does work correctly for culling dead
   children will just ignore the live_children paramater in its
   implementation of xpoll()

 - On other platforms, it will shorten the timeout depending on the
   need to cull dead children, just like your patch did.

Thanks.


>
> Signed-off-by: Kim Gybels 
> ---
>  daemon.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/daemon.c b/daemon.c
> index fe833ea7de..6dc95c1b2f 100644
> --- a/daemon.c
> +++ b/daemon.c
> @@ -1147,6 +1147,7 @@ static int service_loop(struct socketlist *socklist)
>  {
>   struct pollfd *pfd;
>   int i;
> + int poll_timeout = -1;
>  
>   pfd = xcalloc(socklist->nr, sizeof(struct pollfd));
>  
> @@ -1161,8 +1162,13 @@ static int service_loop(struct socketlist *socklist)
>   int i;
>  
>   check_dead_children();
> -
> - if (poll(pfd, socklist->nr, -1) < 0) {
> +#ifdef NO_POLL
> + poll_timeout = live_children ? 100 : -1;
> +#endif
> + int ret = poll(pfd, socklist->nr, poll_timeout);
> + if  (ret == 0) {
> + continue;
> + } else if (ret < 0) {
>   if (errno != EINTR) {
>   logerror("Poll failed, resuming: %s",
> strerror(errno));


Re: [PATCH 1/2] daemon: use timeout for uninterruptible poll

2018-04-15 Thread Kim Gybels
On (13/04/18 14:36), Johannes Schindelin wrote:
> > The poll provided in compat/poll.c is not interrupted by receiving
> > SIGCHLD. Use a timeout for cleaning up dead children in a timely manner.
> 
> Maybe say "When using this poll emulation, use a timeout ..."?

I will rewrite the commit message when I reroll the patch. Calling the
poll "uninterruptible" might be wrong as well, although the poll
doesn't return with EINTR when a child process terminates, it might
still be interruptible in other ways. On a related note, the handler
for SIGCHLD is simply not called in Git-for-Windows' daemon.

> > diff --git a/daemon.c b/daemon.c
> > index fe833ea7de..6dc95c1b2f 100644
> > --- a/daemon.c
> > +++ b/daemon.c
> > @@ -1147,6 +1147,7 @@ static int service_loop(struct socketlist *socklist)
> >  {
> > struct pollfd *pfd;
> > int i;
> > +   int poll_timeout = -1;
> 
> Just reuse the line above:
> 
>   int poll_timeout = -1, i;

Sure.

> > @@ -1161,8 +1162,13 @@ static int service_loop(struct socketlist *socklist)
> > int i;
> >  
> > check_dead_children();
> > -
> > -   if (poll(pfd, socklist->nr, -1) < 0) {
> > +#ifdef NO_POLL
> > +   poll_timeout = live_children ? 100 : -1;
> > +#endif
> > +   int ret = poll(pfd, socklist->nr, poll_timeout);
> > +   if  (ret == 0) {
> > +   continue;
> > +   } else if (ret < 0) {
> 
> I would find it a bit easier on the eyes if this did not use curlies, and
> dropped the unnecessary `else` (`continue` will take care of that):
> 
>   if (!ret)
>   continue;
>   if (ret < 0)
>   [...]

Funny, that's how I would normally write it, if I wasn't so focused on
trying to follow the coding quidelines. While I'm at it, I will also
fix that sneaky double space after the if.

Is it ok to add the timeout for all platforms using the poll
emulation, since I only tested for Windows?

Best regards,
Kim


Re: [PATCH 1/2] daemon: use timeout for uninterruptible poll

2018-04-13 Thread Johannes Schindelin
Hi Kim,

On Thu, 12 Apr 2018, Kim Gybels wrote:

> The poll provided in compat/poll.c is not interrupted by receiving
> SIGCHLD. Use a timeout for cleaning up dead children in a timely manner.

Maybe say "When using this poll emulation, use a timeout ..."?

> diff --git a/daemon.c b/daemon.c
> index fe833ea7de..6dc95c1b2f 100644
> --- a/daemon.c
> +++ b/daemon.c
> @@ -1147,6 +1147,7 @@ static int service_loop(struct socketlist *socklist)
>  {
>   struct pollfd *pfd;
>   int i;
> + int poll_timeout = -1;

Just reuse the line above:

int poll_timeout = -1, i;

> @@ -1161,8 +1162,13 @@ static int service_loop(struct socketlist *socklist)
>   int i;
>  
>   check_dead_children();
> -
> - if (poll(pfd, socklist->nr, -1) < 0) {
> +#ifdef NO_POLL
> + poll_timeout = live_children ? 100 : -1;
> +#endif
> + int ret = poll(pfd, socklist->nr, poll_timeout);
> + if  (ret == 0) {
> + continue;
> + } else if (ret < 0) {

I would find it a bit easier on the eyes if this did not use curlies, and
dropped the unnecessary `else` (`continue` will take care of that):

if (!ret)
continue;
if (ret < 0)
[...]

Thank you for working on this!

Ciao,
Dscho