Re: [PATCH v8 00/11] Git filter protocol
Jeff Kingwrites: > I am not sure if I have followed all of this discussion, but I am of the > opinion that Git should not be doing any timeouts at all. > ... > If this is debugging output of the form "now I am calling wait() on all > of the filters", without respect to any timeout, that sounds reasonable. > Though I would argue that run-command should simply trace_printf() > any processes it is waiting for, which covers _any_ process, not just > the filters. That seems like a good match for the rest of the GIT_TRACE > output, which describes how and when we spawn the sub-processes. Yup, I agree that having trace_printf() report the wait for any process is the cleanest way to go. As you guessed the reason why Lars is bringing up "now we are waiting for this filter" is because I suggested it as a way to encourage users to file bugs when they see a hung Git. Originally Lars wanted to have a timeout on wait and after the timeout wanted to kill the process, and because I really did not want such a random "you are too slow to die, so I'll send a signal to you and exit myself without making sure you died" there, I suggested that if we were to have a timeout, that would be to timeout the wait only to have a chance to tell the user "we are stuck waiting on this thing" (and then go back to wait), as it would either be a buggy filter (i.e. the users need to debug their own filter code) or a buggy use of wait on Git side (i.e. we would want to hear about such bugs from them). Without such a "wait with timeout so that we can tell the user", we can still respond to "my 'git checkout' hangs forever" with "try running with GIT_TRACE" as you outlined above, so I do not think we need the timeout. Thanks for straightening us out.
Re: [PATCH v8 00/11] Git filter protocol
On Thu, Oct 06, 2016 at 03:13:19PM +0200, Lars Schneider wrote: > >>> Well, it would be good to tell users _why_ Git is hanging, see below. > >> > >> Agreed. Do you think it is OK to write the message to stderr? > > > > On the other hand, this is why GIT_TRACE (and GIT_TRACE_PERFORMANCE) > > was invented for. We do not signal troubles with single-shot filters, > > so I guess doing it for multi-file filters is not needed. > > I am on the fence with this one. > > @Junio/Peff: > Where would you prefer to see a "Waiting for filter 'XYZ'... " message? > On stderr or via GIT_TRACE? I am not sure if I have followed all of this discussion, but I am of the opinion that Git should not be doing any timeouts at all. There are simply too many places where the filter (or any other process that git is depending on) could inexplicably hang, and I do not want to pepper random timeouts for all parts of the procedure where we say "woah, this is taking longer than expected" (nor do I want to have a timeout for _one_ spot, and ignore all the others). If this is debugging output of the form "now I am calling wait() on all of the filters", without respect to any timeout, that sounds reasonable. Though I would argue that run-command should simply trace_printf() any processes it is waiting for, which covers _any_ process, not just the filters. That seems like a good match for the rest of the GIT_TRACE output, which describes how and when we spawn the sub-processes. Something like: diff --git a/run-command.c b/run-command.c index 5a4dbb6..b884605 100644 --- a/run-command.c +++ b/run-command.c @@ -226,6 +226,9 @@ static int wait_or_whine(pid_t pid, const char *argv0, int in_signal) pid_t waiting; int failed_errno = 0; + if (!in_signal) + trace_printf("waiting for pid %d", (int)pid); + while ((waiting = waitpid(pid, , 0)) < 0 && errno == EINTR) ; /* nothing */ if (in_signal) but it does not have to be part of this series, I think. -Peff
Re: [PATCH v8 00/11] Git filter protocol
> On 04 Oct 2016, at 21:04, Jakub Narębskiwrote: > > W dniu 03.10.2016 o 19:13, Lars Schneider pisze: >>> On 01 Oct 2016, at 22:48, Jakub Narębski wrote: >>> W dniu 01.10.2016 o 20:59, Lars Schneider pisze: On 29 Sep 2016, at 23:27, Junio C Hamano wrote: > Lars Schneider writes: > > If the filter process refuses to die forever when Git told it to > shutdown (by closing the pipe to it, for example), that filter > process is simply buggy. I think we want users to become aware of > that, instead of Git leaving it behind, which essentially is to > sweep the problem under the rug. >>> >>> Well, it would be good to tell users _why_ Git is hanging, see below. >> >> Agreed. Do you think it is OK to write the message to stderr? > > On the other hand, this is why GIT_TRACE (and GIT_TRACE_PERFORMANCE) > was invented for. We do not signal troubles with single-shot filters, > so I guess doing it for multi-file filters is not needed. I am on the fence with this one. @Junio/Peff: Where would you prefer to see a "Waiting for filter 'XYZ'... " message? On stderr or via GIT_TRACE? > > I agree with what Peff said elsewhere in the thread; if a filter > process wants to take time to clean things up while letting Git > proceed, it can do its own process management, but I think it is > sensible for Git to wait the filter process it directly spawned. To realize the approach above I prototyped the run-command patch below: I added an "exit_timeout" variable to the "child_process" struct. On exit, Git will close the pipe to the process and wait "exit_timeout" seconds until it kills the child process. If "exit_timeout" is negative then Git will wait until the process is done. >>> >>> That might be good approach. Probably the default would be to wait. >> >> I think I would prefer a 2sec timeout or something as default. This way >> we can ensure Git would not wait indefinitely for a buggy filter by default. > > Actually this waiting for multi-file filter is only about waiting for > the shutdown process of the filter. The filter could still hang during > processing a file, and git would hang too, if I understand it correctly. Correct. >> [...] this function is also used with the async struct... > > Hmmm... now I wonder if it is a good idea (similar treatment for > single-file async-invoked filter, and multi-file pkt-line filters). > > For single-file one-shot filter (correct me if I am wrong): > > - git sends contents to filter, signals end with EOF > (after process is started) > - in an async process: > - process is started > - git reads contents from filter, until EOF > - if process did not end, it is killed > > > For multi-process pkt-line based filter (simplified): > > - process is started > - handshake > - for each file > - file is send to filter process over pkt-line, > end signalled with flush packet > - git reads from filter from pkt-line, until flush > - ... > > > See how single-shot filter is sent EOF, though in different part > of code. We need to signal multi-file filter that no more files > will be coming. Simplest solution is to send EOF (we could send > "command=shutdown" for example...) to filter, and wait for EOF > from filter (or for "status=finished" and EOF). That's what we do. EOF does signal the multi-filter to shutdown. > For full patch, you would need also to add to Documentation/config.txt Why config.txt? Thanks, Lars
Re: [PATCH v8 00/11] Git filter protocol
W dniu 03.10.2016 o 19:13, Lars Schneider pisze: >> On 01 Oct 2016, at 22:48, Jakub Narębskiwrote: >> W dniu 01.10.2016 o 20:59, Lars Schneider pisze: >>> On 29 Sep 2016, at 23:27, Junio C Hamano wrote: Lars Schneider writes: If the filter process refuses to die forever when Git told it to shutdown (by closing the pipe to it, for example), that filter process is simply buggy. I think we want users to become aware of that, instead of Git leaving it behind, which essentially is to sweep the problem under the rug. >> >> Well, it would be good to tell users _why_ Git is hanging, see below. > > Agreed. Do you think it is OK to write the message to stderr? On the other hand, this is why GIT_TRACE (and GIT_TRACE_PERFORMANCE) was invented for. We do not signal troubles with single-shot filters, so I guess doing it for multi-file filters is not needed. I agree with what Peff said elsewhere in the thread; if a filter process wants to take time to clean things up while letting Git proceed, it can do its own process management, but I think it is sensible for Git to wait the filter process it directly spawned. >>> >>> To realize the approach above I prototyped the run-command patch below: >>> >>> I added an "exit_timeout" variable to the "child_process" struct. >>> On exit, Git will close the pipe to the process and wait "exit_timeout" >>> seconds until it kills the child process. If "exit_timeout" is negative >>> then Git will wait until the process is done. >> >> That might be good approach. Probably the default would be to wait. > > I think I would prefer a 2sec timeout or something as default. This way > we can ensure Git would not wait indefinitely for a buggy filter by default. Actually this waiting for multi-file filter is only about waiting for the shutdown process of the filter. The filter could still hang during processing a file, and git would hang too, if I understand it correctly. [...] >> Also, how would one set default value of timeout for all process >> based filters? > > I think we don't need that because a timeout is always specific > to a filter (if the 2sec default is not sufficient). All right (assuming that timeouts are good idea). >>> >>> + while ((waitpid(p->pid, , 0)) < 0 && errno == >>> EINTR) >>> + ; /* nothing */ >> >> Ah, this loop is here because waiting on waitpid() can be interrupted >> by the delivery of a signal to the calling process; though the result >> is -1, not just any < 0. > > "< 0" is also used in wait_or_whine() O.K. (though it doesn't necessary mean that it is correct, there is another point for using "< 0"). [...] >> There is also another complication: there can be more than one >> long-running filter driver used. With this implementation we >> wait for each of one in sequence, e.g. 10s + 10s + 10s. > > Good idea, I fixed that in the version below! > [...] > [...] this function is also used with the async struct... Hmmm... now I wonder if it is a good idea (similar treatment for single-file async-invoked filter, and multi-file pkt-line filters). For single-file one-shot filter (correct me if I am wrong): - git sends contents to filter, signals end with EOF (after process is started) - in an async process: - process is started - git reads contents from filter, until EOF - if process did not end, it is killed For multi-process pkt-line based filter (simplified): - process is started - handshake - for each file - file is send to filter process over pkt-line, end signalled with flush packet - git reads from filter from pkt-line, until flush - ... See how single-shot filter is sent EOF, though in different part of code. We need to signal multi-file filter that no more files will be coming. Simplest solution is to send EOF (we could send "command=shutdown" for example...) to filter, and wait for EOF from filter (or for "status=finished" and EOF). We could kill multi-file filter after sending last file and receiving full response... but I think single-shot filter gets killed only because it allows for very simple filters, and reusing existing commands as filters. [...] > diff --git a/run-command.c b/run-command.c > index 3269362..ca0feef 100644 > --- a/run-command.c > +++ b/run-command.c > @@ -21,6 +21,9 @@ void child_process_clear(struct child_process *child) > > struct child_to_clean { > pid_t pid; > + char *name; I guess it is here for output purposes? Should we store full command here, or just name of ? > + int stdin; I guess the name `stdin` for file _descriptor_ is something used in other parts of convert.c code, isn't it? > + int timeout; Hmmm... we assume that timeout is in seconds, not millis or other value, isn't it. timeout_sec would perhaps be unnecessarily long. > struct child_to_clean *next; >
Re: [PATCH v8 00/11] Git filter protocol
Jeff Kingwrites: > On Mon, Oct 03, 2016 at 10:02:14AM -0700, Junio C Hamano wrote: > >> The timeout would be good for you to give a message "filter process >> running the script '%s' is not exiting; I am waiting for it". The >> user is still left with a hung Git, and can then see if that process >> is hanging around. If it is, then we found a buggy filter. Or we >> found a buggy Git. Either needs to be fixed. I do not think it >> would help anybody by doing a kill(2) to sweep possible bugs under >> the rug. > > I would argue that we should not even bother with such a timeout. This > is an exceptional, buggy condition, and hanging is not at all restricted > to this particular case. If git is hanging, then the right tools are > "ps" or "strace" to figure out what is going on. I know that not all > users are comfortable with those tools, but enough are in practice that > the bugs get ironed out, without git having to carry a bunch of extra > timing code that is essentially never exercised. OK.
Re: [PATCH v8 00/11] Git filter protocol
On Mon, Oct 03, 2016 at 10:02:14AM -0700, Junio C Hamano wrote: > The timeout would be good for you to give a message "filter process > running the script '%s' is not exiting; I am waiting for it". The > user is still left with a hung Git, and can then see if that process > is hanging around. If it is, then we found a buggy filter. Or we > found a buggy Git. Either needs to be fixed. I do not think it > would help anybody by doing a kill(2) to sweep possible bugs under > the rug. I would argue that we should not even bother with such a timeout. This is an exceptional, buggy condition, and hanging is not at all restricted to this particular case. If git is hanging, then the right tools are "ps" or "strace" to figure out what is going on. I know that not all users are comfortable with those tools, but enough are in practice that the bugs get ironed out, without git having to carry a bunch of extra timing code that is essentially never exercised. -Peff
Re: [PATCH v8 00/11] Git filter protocol
> On 03 Oct 2016, at 19:02, Junio C Hamanowrote: > > Lars Schneider writes: > >>> If the filter process refuses to die forever when Git told it to >>> shutdown (by closing the pipe to it, for example), that filter >>> process is simply buggy. I think we want users to become aware of >>> that, instead of Git leaving it behind, which essentially is to >>> sweep the problem under the rug. >>> >>> I agree with what Peff said elsewhere in the thread; if a filter >>> process wants to take time to clean things up while letting Git >>> proceed, it can do its own process management, but I think it is >>> sensible for Git to wait the filter process it directly spawned. >> >> To realize the approach above I prototyped the run-command patch below: >> >> I added an "exit_timeout" variable to the "child_process" struct. >> On exit, Git will close the pipe to the process and wait "exit_timeout" >> seconds until it kills the child process. If "exit_timeout" is negative >> then Git will wait until the process is done. > >> If we use that in the long running filter process, then we could make >> the timeout even configurable. E.g. with "filter..process-timeout". >> >> What do you think about this solution? > > Is such a configuration (or timeout in general) necessary? I > suspect that a need for timeout, especially needing timeout and > needing to get killed that happens so often to require a > configuration variable, is a sign of something else seriously wrong. > > What's the justification for a filter to _require_ getting killed > all the time when it is spawned? Otherwise you wouldn't configure > "this driver does not die when told, so we need a timeout" variable. > Is it a sign of the flaw in the protocol to talk to it? e.g. Git > has a way to tell it to die, but it somehow is very hard to hear > from filter's end and honor that request? > > I think that we would need some timeout in the mechanism, but not to > be used for "killing". > > You would decide to "kill" an filter process in two cases: the > filter is buggy and refuses to die when Git tells it to exit, or the > code in Git waiting for its death is somehow miscounting its > children, and thought it told to die one process but in fact it > didn't (perhaps it told somebody else to die), or it thought it > hasn't seen the child die when in fact it already did. Agreed. > Calling kill(2) and exiting would hide these two kind of bugs from > end users. Not doing so would give the end users a hung Git, which > is a VERY GOOD thing. Otherwise you would not notice bugs and lose > the opportunity to diagnose and fix it. Aha. I assumed that a hung Git because of a buggy filter would be a no-no. Thanks for this clarification. > The timeout would be good for you to give a message "filter process > running the script '%s' is not exiting; I am waiting for it". The > user is still left with a hung Git, and can then see if that process > is hanging around. If it is, then we found a buggy filter. Or we > found a buggy Git. Either needs to be fixed. I do not think it > would help anybody by doing a kill(2) to sweep possible bugs under > the rug. I could achieve that with this run-command patch: http://public-inbox.org/git/e9946e9f-6ee5-492b-b122-9078ceb88...@gmail.com/ (I'll remove the "timeout after x seconds" parts and keep the "wait until done" part with stderr output) Thanks, Lars
Re: [PATCH v8 00/11] Git filter protocol
> On 01 Oct 2016, at 22:48, Jakub Narębskiwrote: > > W dniu 01.10.2016 o 20:59, Lars Schneider pisze: >> On 29 Sep 2016, at 23:27, Junio C Hamano wrote: >>> Lars Schneider writes: >>> >>> If the filter process refuses to die forever when Git told it to >>> shutdown (by closing the pipe to it, for example), that filter >>> process is simply buggy. I think we want users to become aware of >>> that, instead of Git leaving it behind, which essentially is to >>> sweep the problem under the rug. > > Well, it would be good to tell users _why_ Git is hanging, see below. Agreed. Do you think it is OK to write the message to stderr? >>> I agree with what Peff said elsewhere in the thread; if a filter >>> process wants to take time to clean things up while letting Git >>> proceed, it can do its own process management, but I think it is >>> sensible for Git to wait the filter process it directly spawned. >> >> To realize the approach above I prototyped the run-command patch below: >> >> I added an "exit_timeout" variable to the "child_process" struct. >> On exit, Git will close the pipe to the process and wait "exit_timeout" >> seconds until it kills the child process. If "exit_timeout" is negative >> then Git will wait until the process is done. > > That might be good approach. Probably the default would be to wait. I think I would prefer a 2sec timeout or something as default. This way we can ensure Git would not wait indefinitely for a buggy filter by default. >> If we use that in the long running filter process, then we could make >> the timeout even configurable. E.g. with "filter..process-timeout". > > Sidenote: we prefer camelCase rather than kebab-case for config > variables, that is, "filter..processTimeout". Sure! > Also, how would one set default value of timeout for all process > based filters? I think we don't need that because a timeout is always specific to a filter (if the 2sec default is not sufficient). >> >> +while ((waitpid(p->pid, , 0)) < 0 && errno == >> EINTR) >> +; /* nothing */ > > Ah, this loop is here because waiting on waitpid() can be interrupted > by the delivery of a signal to the calling process; though the result > is -1, not just any < 0. "< 0" is also used in wait_or_whine() >> +while (getpgid(p->pid) >= 0 && tv.tv_sec - secs < >> p->timeout) { >> +gettimeofday(, NULL); >> +} > > WTF? Busy wait loop??? This was just a quick prototype to show "my thinking direction". I wasn't expecting a review. Sorry :-) > Also, if we want to wait for child without blocking, then instead > of cryptic getpgid(p->pid) maybe use waitpid(p->pid, , WNOHANG); > it is more explicit. Sure! > There is also another complication: there can be more than one > long-running filter driver used. With this implementation we > wait for each of one in sequence, e.g. 10s + 10s + 10s. Good idea, I fixed that in the version below! >> >> -static void mark_child_for_cleanup(pid_t pid) >> +static void mark_child_for_cleanup(pid_t pid, int timeout, int stdin) > > H... three parameters is not too much, but we might want to > pass "struct child_process *" directly if this number grows. I used parameters because this function is also used with the async struct... I've attached a more serious patch for review below. What do you think? Thanks, Lars diff --git a/run-command.c b/run-command.c index 3269362..ca0feef 100644 --- a/run-command.c +++ b/run-command.c @@ -21,6 +21,9 @@ void child_process_clear(struct child_process *child) struct child_to_clean { pid_t pid; + char *name; + int stdin; + int timeout; struct child_to_clean *next; }; static struct child_to_clean *children_to_clean; @@ -28,12 +31,53 @@ static int installed_child_cleanup_handler; static void cleanup_children(int sig, int in_signal) { + int status; + struct timeval tv; + time_t secs; + struct child_to_clean *p = children_to_clean; + + // Send EOF to children as indicator that Git will exit soon + while (p) { + if (p->timeout != 0) { + if (p->stdin > 0) + close(p->stdin); + } + p = p->next; + } + while (children_to_clean) { - struct child_to_clean *p = children_to_clean; + p = children_to_clean; children_to_clean = p->next; + + if (p->timeout != 0) { + fprintf(stderr, _("Waiting for '%s' to finish..."), p->name); + if (p->timeout < 0) { + // No timeout given - wait indefinitely + while ((waitpid(p->pid, , 0)) < 0 && errno == EINTR) + ;
Re: [PATCH v8 00/11] Git filter protocol
Lars Schneiderwrites: >> If the filter process refuses to die forever when Git told it to >> shutdown (by closing the pipe to it, for example), that filter >> process is simply buggy. I think we want users to become aware of >> that, instead of Git leaving it behind, which essentially is to >> sweep the problem under the rug. >> >> I agree with what Peff said elsewhere in the thread; if a filter >> process wants to take time to clean things up while letting Git >> proceed, it can do its own process management, but I think it is >> sensible for Git to wait the filter process it directly spawned. > > To realize the approach above I prototyped the run-command patch below: > > I added an "exit_timeout" variable to the "child_process" struct. > On exit, Git will close the pipe to the process and wait "exit_timeout" > seconds until it kills the child process. If "exit_timeout" is negative > then Git will wait until the process is done. > If we use that in the long running filter process, then we could make > the timeout even configurable. E.g. with "filter..process-timeout". > > What do you think about this solution? Is such a configuration (or timeout in general) necessary? I suspect that a need for timeout, especially needing timeout and needing to get killed that happens so often to require a configuration variable, is a sign of something else seriously wrong. What's the justification for a filter to _require_ getting killed all the time when it is spawned? Otherwise you wouldn't configure "this driver does not die when told, so we need a timeout" variable. Is it a sign of the flaw in the protocol to talk to it? e.g. Git has a way to tell it to die, but it somehow is very hard to hear from filter's end and honor that request? I think that we would need some timeout in the mechanism, but not to be used for "killing". You would decide to "kill" an filter process in two cases: the filter is buggy and refuses to die when Git tells it to exit, or the code in Git waiting for its death is somehow miscounting its children, and thought it told to die one process but in fact it didn't (perhaps it told somebody else to die), or it thought it hasn't seen the child die when in fact it already did. Calling kill(2) and exiting would hide these two kind of bugs from end users. Not doing so would give the end users a hung Git, which is a VERY GOOD thing. Otherwise you would not notice bugs and lose the opportunity to diagnose and fix it. The timeout would be good for you to give a message "filter process running the script '%s' is not exiting; I am waiting for it". The user is still left with a hung Git, and can then see if that process is hanging around. If it is, then we found a buggy filter. Or we found a buggy Git. Either needs to be fixed. I do not think it would help anybody by doing a kill(2) to sweep possible bugs under the rug. Thanks.
Re: [PATCH v8 00/11] Git filter protocol
W dniu 01.10.2016 o 20:59, Lars Schneider pisze: > On 29 Sep 2016, at 23:27, Junio C Hamanowrote: >> Lars Schneider writes: >> >>> We discussed that issue in v4 and v6: >>> http://public-inbox.org/git/20160803225313.pk3tfe5ovz4y3...@sigill.intra.peff.net/ >>> http://public-inbox.org/git/xmqqbn0a3wy3@gitster.mtv.corp.google.com/ >>> >>> My impression was that you don't want Git to wait for the filter process. >>> If Git waits for the filter process - how long should Git wait? >> >> I am not sure where you got that impression. I did say that I do >> not want Git to _KILL_ my filter process. That does not mean I want >> Git to go away without waiting for me. >> >> If the filter process refuses to die forever when Git told it to >> shutdown (by closing the pipe to it, for example), that filter >> process is simply buggy. I think we want users to become aware of >> that, instead of Git leaving it behind, which essentially is to >> sweep the problem under the rug. Well, it would be good to tell users _why_ Git is hanging, see below. >> >> I agree with what Peff said elsewhere in the thread; if a filter >> process wants to take time to clean things up while letting Git >> proceed, it can do its own process management, but I think it is >> sensible for Git to wait the filter process it directly spawned. > > To realize the approach above I prototyped the run-command patch below: > > I added an "exit_timeout" variable to the "child_process" struct. > On exit, Git will close the pipe to the process and wait "exit_timeout" > seconds until it kills the child process. If "exit_timeout" is negative > then Git will wait until the process is done. That might be good approach. Probably the default would be to wait. > > If we use that in the long running filter process, then we could make > the timeout even configurable. E.g. with "filter..process-timeout". Sidenote: we prefer camelCase rather than kebab-case for config variables, that is, "filter..processTimeout". Also, how would one set default value of timeout for all process based filters? > > What do you think about this solution? I think this addition be done after, assuming that we come up with good default behavior (e.g. wait for filter processes to finish). Also, we would probably want to add some progress information in a similar way to progress info for checkout, that is display it after a few seconds waiting. This could be, for example: Waiting for filter '' to finish... done With timeout it could look like this (where underlined part is interactive, that is changes every second): Waiting 10s for '' filter process to finish. ^^^ And then either Filter '' killed or Filter '' finished > diff --git a/run-command.c b/run-command.c > index 3269362..a933066 100644 > --- a/run-command.c > +++ b/run-command.c > @@ -21,6 +21,8 @@ void child_process_clear(struct child_process *child) > > struct child_to_clean { > pid_t pid; > + int stdin; > + int timeout; > struct child_to_clean *next; > }; > static struct child_to_clean *children_to_clean; > @@ -28,9 +30,30 @@ static int installed_child_cleanup_handler; > > static void cleanup_children(int sig, int in_signal) > { > + int status; > + struct timeval tv; > + time_t secs; > + > while (children_to_clean) { > struct child_to_clean *p = children_to_clean; > children_to_clean = p->next; > + > + if (p->timeout != 0 && p->stdin > 0) > + close(p->stdin); Why are you not closing stdin of filter driver process if timeout is zero? Is it maybe some kind of special value? If it is, this is not documented. > + > + if (p->timeout < 0) { > + // Wait until the process finishes > + while ((waitpid(p->pid, , 0)) < 0 && errno == > EINTR) > + ; /* nothing */ Ah, this loop is here because waiting on waitpid() can be interrupted by the delivery of a signal to the calling process; though the result is -1, not just any < 0. Looks good (but we would want some progress information, probably). > + } else if (p->timeout != 0) { > + // Wait until the process finishes or timeout > + gettimeofday(, NULL); > + secs = tv.tv_sec; > + while (getpgid(p->pid) >= 0 && tv.tv_sec - secs < > p->timeout) { > + gettimeofday(, NULL); > + } WTF? Busy wait loop??? Also, if we want to wait for child without blocking, then instead of cryptic getpgid(p->pid) maybe use waitpid(p->pid, , WNOHANG); it is more explicit. There is also another complication: there can be more than one long-running filter driver used. With this implementation we wait for each of one in sequence, e.g. 10s + 10s + 10s. > + } > + >
Re: [PATCH v8 00/11] Git filter protocol
> On 29 Sep 2016, at 23:27, Junio C Hamanowrote: > > Lars Schneider writes: > >> We discussed that issue in v4 and v6: >> http://public-inbox.org/git/20160803225313.pk3tfe5ovz4y3...@sigill.intra.peff.net/ >> http://public-inbox.org/git/xmqqbn0a3wy3@gitster.mtv.corp.google.com/ >> >> My impression was that you don't want Git to wait for the filter process. >> If Git waits for the filter process - how long should Git wait? > > I am not sure where you got that impression. I did say that I do > not want Git to _KILL_ my filter process. That does not mean I want > Git to go away without waiting for me. > > If the filter process refuses to die forever when Git told it to > shutdown (by closing the pipe to it, for example), that filter > process is simply buggy. I think we want users to become aware of > that, instead of Git leaving it behind, which essentially is to > sweep the problem under the rug. > > I agree with what Peff said elsewhere in the thread; if a filter > process wants to take time to clean things up while letting Git > proceed, it can do its own process management, but I think it is > sensible for Git to wait the filter process it directly spawned. To realize the approach above I prototyped the run-command patch below: I added an "exit_timeout" variable to the "child_process" struct. On exit, Git will close the pipe to the process and wait "exit_timeout" seconds until it kills the child process. If "exit_timeout" is negative then Git will wait until the process is done. If we use that in the long running filter process, then we could make the timeout even configurable. E.g. with "filter..process-timeout". What do you think about this solution? Thanks, Lars diff --git a/run-command.c b/run-command.c index 3269362..a933066 100644 --- a/run-command.c +++ b/run-command.c @@ -21,6 +21,8 @@ void child_process_clear(struct child_process *child) struct child_to_clean { pid_t pid; + int stdin; + int timeout; struct child_to_clean *next; }; static struct child_to_clean *children_to_clean; @@ -28,9 +30,30 @@ static int installed_child_cleanup_handler; static void cleanup_children(int sig, int in_signal) { + int status; + struct timeval tv; + time_t secs; + while (children_to_clean) { struct child_to_clean *p = children_to_clean; children_to_clean = p->next; + + if (p->timeout != 0 && p->stdin > 0) + close(p->stdin); + + if (p->timeout < 0) { + // Wait until the process finishes + while ((waitpid(p->pid, , 0)) < 0 && errno == EINTR) + ; /* nothing */ + } else if (p->timeout != 0) { + // Wait until the process finishes or timeout + gettimeofday(, NULL); + secs = tv.tv_sec; + while (getpgid(p->pid) >= 0 && tv.tv_sec - secs < p->timeout) { + gettimeofday(, NULL); + } + } + kill(p->pid, sig); if (!in_signal) free(p); @@ -49,10 +72,12 @@ static void cleanup_children_on_exit(void) cleanup_children(SIGTERM, 0); } -static void mark_child_for_cleanup(pid_t pid) +static void mark_child_for_cleanup(pid_t pid, int timeout, int stdin) { struct child_to_clean *p = xmalloc(sizeof(*p)); p->pid = pid; + p->stdin = stdin; + p->timeout = timeout; p->next = children_to_clean; children_to_clean = p; @@ -422,7 +447,7 @@ int start_command(struct child_process *cmd) if (cmd->pid < 0) error_errno("cannot fork() for %s", cmd->argv[0]); else if (cmd->clean_on_exit) - mark_child_for_cleanup(cmd->pid); + mark_child_for_cleanup(cmd->pid, cmd->exit_timeout, cmd->in); /* * Wait for child's execvp. If the execvp succeeds (or if fork() @@ -483,7 +508,7 @@ int start_command(struct child_process *cmd) if (cmd->pid < 0 && (!cmd->silent_exec_failure || errno != ENOENT)) error_errno("cannot spawn %s", cmd->argv[0]); if (cmd->clean_on_exit && cmd->pid >= 0) - mark_child_for_cleanup(cmd->pid); + mark_child_for_cleanup(cmd->pid, cmd->exit_timeout, cmd->in); argv_array_clear(); cmd->argv = sargv; @@ -765,7 +790,7 @@ int start_async(struct async *async) exit(!!async->proc(proc_in, proc_out, async->data)); } - mark_child_for_cleanup(async->pid); + mark_child_for_cleanup(async->pid, 0, -1); if (need_in) close(fdin[0]); diff --git a/run-command.h b/run-command.h index cf29a31..f2eca33 100644 --- a/run-command.h +++ b/run-command.h @@ -33,6 +33,7 @@ struct
Re: [PATCH v8 00/11] Git filter protocol
Lars Schneiderwrites: > We discussed that issue in v4 and v6: > http://public-inbox.org/git/20160803225313.pk3tfe5ovz4y3...@sigill.intra.peff.net/ > http://public-inbox.org/git/xmqqbn0a3wy3@gitster.mtv.corp.google.com/ > > My impression was that you don't want Git to wait for the filter process. > If Git waits for the filter process - how long should Git wait? I am not sure where you got that impression. I did say that I do not want Git to _KILL_ my filter process. That does not mean I want Git to go away without waiting for me. If the filter process refuses to die forever when Git told it to shutdown (by closing the pipe to it, for example), that filter process is simply buggy. I think we want users to become aware of that, instead of Git leaving it behind, which essentially is to sweep the problem under the rug. I agree with what Peff said elsewhere in the thread; if a filter process wants to take time to clean things up while letting Git proceed, it can do its own process management, but I think it is sensible for Git to wait the filter process it directly spawned.
Re: [PATCH v8 00/11] Git filter protocol
Jeff Kingwrites: > I don't necessarily agree, though, that the timing of filter-process > cleanup needs to be part of the public interface. So in your list: > >> 3) Git waits until the filter process finishes. > > That seems simple and elegant, but I can think of reasons we might not > want to wait (e.g., if the filter has to do some maintenance task and > does not the user to have to wait). > > OTOH, we already face this in git, and we solve it by explicitly > backgrounding the maintenance task (i.e., auto-gc). So one could argue > that it is the responsibility of the filter process to manage its own > processes. It certainly makes the interaction with git simpler. Yup, that summarizes my thinking a lot better than I managed to do in the previous message.
Re: [PATCH v8 00/11] Git filter protocol
Jakub Narębskiwrites: > Or even better: make filter driver write its pid to pidfile, and then > "wait $(cat rot13-filter.pid)". That's what we do in lib-git-daemon.sh > (I think). I am not sure if "wait"ing on a random process that is not a direct child is a reasonable thing to do, but I like the direction. Communicate with a pidfile and wait until "kill -0 $that_pid" fails, or something like that, would be clean enough.
Re: [PATCH v8 00/11] Git filter protocol
Lars Schneiderwrites: > A pragmatic approach: > > I could drop the "STOP" message that the filter writes to the log > on exit and everything would work as is. We could argue that this > is OK because Git doesn't care anyways if the filter process has > stopped or not. That would mean you can leave the process running while the test framework tries to remove the trash directory when we are done, creating the same bug J6t mentioned in the thread, no?
Re: [PATCH v8 00/11] Git filter protocol
W dniu 29.09.2016 o 13:57, Torsten Bögershausen pisze: > On 29/09/16 12:28, Lars Schneider wrote: >> This is what happens: >> >> 1) Git exits >> 2) The filter process receives EOF and prints "STOP" to the log >> 3) t0021 checks the content of the log >> >> Sometimes 3 happened before 2 which makes the test fail. >> (Example: https://travis-ci.org/git/git/jobs/162660563 ) >> >> I added a this to wait until the filter process terminates: >> >> +wait_for_filter_termination () { >> +while ps | grep -v grep | grep -F "/t0021/rot13-filter.pl" >/dev/null >> 2>&1 >> +do >> +echo "Waiting for /t0021/rot13-filter.pl to finish..." >> +sleep 1 >> +done >> +} >> >> Does this look OK to you? > Do we need the ps at all ? > How about this: > > +wait_for_filter_termination () { > +while ! grep "STOP" LOGFILENAME >/dev/null > +do > +echo "Waiting for /t0021/rot13-filter.pl to finish..." > +sleep 1 > +done > +} Or even better: make filter driver write its pid to pidfile, and then "wait $(cat rot13-filter.pid)". That's what we do in lib-git-daemon.sh (I think). If the problem is exit status of "wait" builtin, then filter driver can remove its pidfile after writing "STOP", just before ending. -- Jakub Narębski
Re: [PATCH v8 00/11] Git filter protocol
> On 29 Sep 2016, at 18:57, Junio C Hamanowrote: > > Torsten Bögershausen writes: > >>> 1) Git exits >>> 2) The filter process receives EOF and prints "STOP" to the log >>> 3) t0021 checks the content of the log >>> >>> Sometimes 3 happened before 2 which makes the test fail. >>> (Example: https://travis-ci.org/git/git/jobs/162660563 ) >>> >>> I added a this to wait until the filter process terminates: >>> >>> +wait_for_filter_termination () { >>> + while ps | grep -v grep | grep -F "/t0021/rot13-filter.pl" >/dev/null >>> 2>&1 >>> + do >>> + echo "Waiting for /t0021/rot13-filter.pl to finish..." >>> + sleep 1 >>> + done >>> +} >>> >>> Does this look OK to you? >> Do we need the ps at all ? >> How about this: >> >> +wait_for_filter_termination () { >> +while ! grep "STOP" LOGFILENAME >/dev/null >> +do >> +echo "Waiting for /t0021/rot13-filter.pl to finish..." >> +sleep 1 >> +done >> +} > > Running "ps" and grepping for a command is not suitable for script > to reliably tell things, so it is out of question. Compared to > that, your version looks slightly better, but what if the machinery > that being tested, i.e. the part that drives the filter process, is > buggy or becomes buggy and causes the filter process that writes > "STOP" to die before it actually writes that string? > > I have a feeling that the machinery being tested needs to be fixed > so that the sequence is always be: > >0) Git spawns the filter process, as it needs some contents to > be filtered. > >1) Git did everything it needed to do and decides that is time > to go. > >2) Filter process receives EOF and prints "STOP" to the log. > >3) Git waits until the filter process finishes. > >4) t0021, after Git finishes, checks the log. A pragmatic approach: I could drop the "STOP" message that the filter writes to the log on exit and everything would work as is. We could argue that this is OK because Git doesn't care anyways if the filter process has stopped or not. Would that be OK for everyone? - Lars
Re: [PATCH v8 00/11] Git filter protocol
Am 29.09.2016 um 20:18 schrieb Torsten Bögershausen: I would agree that Git should not wait for the filter. But does the test suite need to wait for the filter ? We have fixed a test case on Windows recently where a process hung around too long (5babb5bd). So, yes, the test suite has to wait for the filter. -- Hannes
Re: [PATCH v8 00/11] Git filter protocol
On 29/09/16 19:57, Lars Schneider wrote: On 29 Sep 2016, at 18:57, Junio C Hamanowrote: Torsten Bögershausen writes: 1) Git exits 2) The filter process receives EOF and prints "STOP" to the log 3) t0021 checks the content of the log Sometimes 3 happened before 2 which makes the test fail. (Example: https://travis-ci.org/git/git/jobs/162660563 ) I added a this to wait until the filter process terminates: +wait_for_filter_termination () { + while ps | grep -v grep | grep -F "/t0021/rot13-filter.pl" >/dev/null 2>&1 + do + echo "Waiting for /t0021/rot13-filter.pl to finish..." + sleep 1 + done +} Does this look OK to you? Do we need the ps at all ? How about this: +wait_for_filter_termination () { + while ! grep "STOP" LOGFILENAME >/dev/null + do + echo "Waiting for /t0021/rot13-filter.pl to finish..." + sleep 1 + done +} Running "ps" and grepping for a command is not suitable for script to reliably tell things, so it is out of question. Compared to that, your version looks slightly better, but what if the machinery that being tested, i.e. the part that drives the filter process, is buggy or becomes buggy and causes the filter process that writes "STOP" to die before it actually writes that string? I have a feeling that the machinery being tested needs to be fixed so that the sequence is always be: 0) Git spawns the filter process, as it needs some contents to be filtered. 1) Git did everything it needed to do and decides that is time to go. 2) Filter process receives EOF and prints "STOP" to the log. 3) Git waits until the filter process finishes. 4) t0021, after Git finishes, checks the log. Repeated sleep combined with grep is probably just sweeping the real problem under the rug. Do we have enough information to do the above? An inspiration may be in the way we centrally clean all tempfiles and lockfiles before exiting. We have a central registry of these files that need cleaning up and have a single atexit(3) handler to clean them up. Perhaps we need a registry that filter processes spawned by the mechanism Lars introduces in this series, and have an atexit(3) handler that closes the pipe to them (which signals the filters that it is time for them to go) and wait(2) on them, or something? I do not think we want any kill(2) to be involved in this clean-up procedure, but I do think we should wait(2) on what we spawn, as long as these processes are meant to be shut down when the main process of Git exits (this is different from things like credential-cache daemon where they are expected to persist and meant to serve multiple Git processes). We discussed that issue in v4 and v6: http://public-inbox.org/git/20160803225313.pk3tfe5ovz4y3...@sigill.intra.peff.net/ http://public-inbox.org/git/xmqqbn0a3wy3@gitster.mtv.corp.google.com/ My impression was that you don't want Git to wait for the filter process. If Git waits for the filter process - how long should Git wait? Thanks, Lars Hm, I would agree that Git should not wait for the filter. But does the test suite need to wait for the filter ? May be, in this case we test the filter and Git, which is good. Adding a 1 second delay, if, and only if, there is a racy condition, is not that bad (or do we have better ways to check for a process to be terminated ?)
Re: [PATCH v8 00/11] Git filter protocol
On Thu, Sep 29, 2016 at 09:57:57AM -0700, Junio C Hamano wrote: > > +wait_for_filter_termination () { > > + while ! grep "STOP" LOGFILENAME >/dev/null > > + do > > + echo "Waiting for /t0021/rot13-filter.pl to finish..." > > + sleep 1 > > + done > > +} > > Running "ps" and grepping for a command is not suitable for script > to reliably tell things, so it is out of question. Compared to > that, your version looks slightly better, but what if the machinery > that being tested, i.e. the part that drives the filter process, is > buggy or becomes buggy and causes the filter process that writes > "STOP" to die before it actually writes that string? I'm of the opinion that any busy-waiting is a good sign that something is suboptimal. The right solution here seems like it should be signaling the test script via a descriptor. I don't necessarily agree, though, that the timing of filter-process cleanup needs to be part of the public interface. So in your list: > 3) Git waits until the filter process finishes. That seems simple and elegant, but I can think of reasons we might not want to wait (e.g., if the filter has to do some maintenance task and does not the user to have to wait). OTOH, we already face this in git, and we solve it by explicitly backgrounding the maintenance task (i.e., auto-gc). So one could argue that it is the responsibility of the filter process to manage its own processes. It certainly makes the interaction with git simpler. -Peff
Re: [PATCH v8 00/11] Git filter protocol
> On 29 Sep 2016, at 18:57, Junio C Hamanowrote: > > Torsten Bögershausen writes: > >>> 1) Git exits >>> 2) The filter process receives EOF and prints "STOP" to the log >>> 3) t0021 checks the content of the log >>> >>> Sometimes 3 happened before 2 which makes the test fail. >>> (Example: https://travis-ci.org/git/git/jobs/162660563 ) >>> >>> I added a this to wait until the filter process terminates: >>> >>> +wait_for_filter_termination () { >>> + while ps | grep -v grep | grep -F "/t0021/rot13-filter.pl" >/dev/null >>> 2>&1 >>> + do >>> + echo "Waiting for /t0021/rot13-filter.pl to finish..." >>> + sleep 1 >>> + done >>> +} >>> >>> Does this look OK to you? >> Do we need the ps at all ? >> How about this: >> >> +wait_for_filter_termination () { >> +while ! grep "STOP" LOGFILENAME >/dev/null >> +do >> +echo "Waiting for /t0021/rot13-filter.pl to finish..." >> +sleep 1 >> +done >> +} > > Running "ps" and grepping for a command is not suitable for script > to reliably tell things, so it is out of question. Compared to > that, your version looks slightly better, but what if the machinery > that being tested, i.e. the part that drives the filter process, is > buggy or becomes buggy and causes the filter process that writes > "STOP" to die before it actually writes that string? > > I have a feeling that the machinery being tested needs to be fixed > so that the sequence is always be: > >0) Git spawns the filter process, as it needs some contents to > be filtered. > >1) Git did everything it needed to do and decides that is time > to go. > >2) Filter process receives EOF and prints "STOP" to the log. > >3) Git waits until the filter process finishes. > >4) t0021, after Git finishes, checks the log. > > Repeated sleep combined with grep is probably just sweeping the real > problem under the rug. Do we have enough information to do the > above? > > An inspiration may be in the way we centrally clean all tempfiles > and lockfiles before exiting. We have a central registry of these > files that need cleaning up and have a single atexit(3) handler to > clean them up. Perhaps we need a registry that filter processes > spawned by the mechanism Lars introduces in this series, and have an > atexit(3) handler that closes the pipe to them (which signals the > filters that it is time for them to go) and wait(2) on them, or > something? I do not think we want any kill(2) to be involved in > this clean-up procedure, but I do think we should wait(2) on what we > spawn, as long as these processes are meant to be shut down when the > main process of Git exits (this is different from things like > credential-cache daemon where they are expected to persist and meant > to serve multiple Git processes). We discussed that issue in v4 and v6: http://public-inbox.org/git/20160803225313.pk3tfe5ovz4y3...@sigill.intra.peff.net/ http://public-inbox.org/git/xmqqbn0a3wy3@gitster.mtv.corp.google.com/ My impression was that you don't want Git to wait for the filter process. If Git waits for the filter process - how long should Git wait? Thanks, Lars
Re: [PATCH v8 00/11] Git filter protocol
Torsten Bögershausenwrites: >> 1) Git exits >> 2) The filter process receives EOF and prints "STOP" to the log >> 3) t0021 checks the content of the log >> >> Sometimes 3 happened before 2 which makes the test fail. >> (Example: https://travis-ci.org/git/git/jobs/162660563 ) >> >> I added a this to wait until the filter process terminates: >> >> +wait_for_filter_termination () { >> +while ps | grep -v grep | grep -F "/t0021/rot13-filter.pl" >/dev/null >> 2>&1 >> +do >> +echo "Waiting for /t0021/rot13-filter.pl to finish..." >> +sleep 1 >> +done >> +} >> >> Does this look OK to you? > Do we need the ps at all ? > How about this: > > +wait_for_filter_termination () { > + while ! grep "STOP" LOGFILENAME >/dev/null > + do > + echo "Waiting for /t0021/rot13-filter.pl to finish..." > + sleep 1 > + done > +} Running "ps" and grepping for a command is not suitable for script to reliably tell things, so it is out of question. Compared to that, your version looks slightly better, but what if the machinery that being tested, i.e. the part that drives the filter process, is buggy or becomes buggy and causes the filter process that writes "STOP" to die before it actually writes that string? I have a feeling that the machinery being tested needs to be fixed so that the sequence is always be: 0) Git spawns the filter process, as it needs some contents to be filtered. 1) Git did everything it needed to do and decides that is time to go. 2) Filter process receives EOF and prints "STOP" to the log. 3) Git waits until the filter process finishes. 4) t0021, after Git finishes, checks the log. Repeated sleep combined with grep is probably just sweeping the real problem under the rug. Do we have enough information to do the above? An inspiration may be in the way we centrally clean all tempfiles and lockfiles before exiting. We have a central registry of these files that need cleaning up and have a single atexit(3) handler to clean them up. Perhaps we need a registry that filter processes spawned by the mechanism Lars introduces in this series, and have an atexit(3) handler that closes the pipe to them (which signals the filters that it is time for them to go) and wait(2) on them, or something? I do not think we want any kill(2) to be involved in this clean-up procedure, but I do think we should wait(2) on what we spawn, as long as these processes are meant to be shut down when the main process of Git exits (this is different from things like credential-cache daemon where they are expected to persist and meant to serve multiple Git processes).
Re: [PATCH v8 00/11] Git filter protocol
On 29/09/16 12:28, Lars Schneider wrote: On 28 Sep 2016, at 23:49, Junio C Hamanowrote: I suspect that you are preparing a reroll already, but the one that is sitting in 'pu' seems to be flaky in t/t0021 and I seem to see occasional failures from it. I didn't trace where the test goes wrong, but one easy mistake you could make (I am not saying that is the reason of the failure) is to assume your filter will not be called under certain condition (like immediately after you checked out from the index to the working tree), when the automated test goes fast enough and get you into a "racy git" situation---the filter may be asked to filter the contents from the working tree again to re-validate what's there is still what is in the index. Thanks for the heads-up! This is what happens: 1) Git exits 2) The filter process receives EOF and prints "STOP" to the log 3) t0021 checks the content of the log Sometimes 3 happened before 2 which makes the test fail. (Example: https://travis-ci.org/git/git/jobs/162660563 ) I added a this to wait until the filter process terminates: +wait_for_filter_termination () { + while ps | grep -v grep | grep -F "/t0021/rot13-filter.pl" >/dev/null 2>&1 + do + echo "Waiting for /t0021/rot13-filter.pl to finish..." + sleep 1 + done +} Does this look OK to you? Do we need the ps at all ? How about this: +wait_for_filter_termination () { + while ! grep "STOP" LOGFILENAME >/dev/null + do + echo "Waiting for /t0021/rot13-filter.pl to finish..." + sleep 1 + done +}
Re: [PATCH v8 00/11] Git filter protocol
> On 28 Sep 2016, at 23:49, Junio C Hamanowrote: > > I suspect that you are preparing a reroll already, but the one that > is sitting in 'pu' seems to be flaky in t/t0021 and I seem to see > occasional failures from it. > > I didn't trace where the test goes wrong, but one easy mistake you > could make (I am not saying that is the reason of the failure) is to > assume your filter will not be called under certain condition (like > immediately after you checked out from the index to the working > tree), when the automated test goes fast enough and get you into a > "racy git" situation---the filter may be asked to filter the > contents from the working tree again to re-validate what's there is > still what is in the index. Thanks for the heads-up! This is what happens: 1) Git exits 2) The filter process receives EOF and prints "STOP" to the log 3) t0021 checks the content of the log Sometimes 3 happened before 2 which makes the test fail. (Example: https://travis-ci.org/git/git/jobs/162660563 ) I added a this to wait until the filter process terminates: +wait_for_filter_termination () { + while ps | grep -v grep | grep -F "/t0021/rot13-filter.pl" >/dev/null 2>&1 + do + echo "Waiting for /t0021/rot13-filter.pl to finish..." + sleep 1 + done +} Does this look OK to you? - Lars
Re: [PATCH v8 00/11] Git filter protocol
I suspect that you are preparing a reroll already, but the one that is sitting in 'pu' seems to be flaky in t/t0021 and I seem to see occasional failures from it. I didn't trace where the test goes wrong, but one easy mistake you could make (I am not saying that is the reason of the failure) is to assume your filter will not be called under certain condition (like immediately after you checked out from the index to the working tree), when the automated test goes fast enough and get you into a "racy git" situation---the filter may be asked to filter the contents from the working tree again to re-validate what's there is still what is in the index.