Re: Regression: Ctrl-c from the pager in an alias exits it

2017-01-06 Thread Jeff King
On Fri, Jan 06, 2017 at 11:42:25PM +0100, Johannes Sixt wrote:

> > So I dunno. Maybe this waiting should be restricted only to certain
> > cases like executing git sub-commands.
> 
> If given it some thought.
> 
> In general, I think it is wrong to wait for child processes when a signal
> was received. After all, it is the purpose of a (deadly) signal to have the
> process go away. There may be programs that know it better, like less, but
> git should not attempt to know better in general.
> 
> We do apply some special behavior for certain cases like we do for the
> pager. And now the case with aliases is another special situation. The
> parent git process only delegates to the child, and as such it is reasonable
> that it binds its life time to the first child, which executes the expanded
> alias.

Yeah, I think I agree. That binding is something you want in many cases,
but not necessarily all. The original purpose of clean_on_exit was to
create a binding like that, but of course it can be (and with the
smudge-filter stuff, arguably has been) used for other cases, too.

I'll work up a patch that makes it a separate option, which should be
pretty easy.

-Peff


Re: Regression: Ctrl-c from the pager in an alias exits it

2017-01-06 Thread Johannes Sixt

Am 06.01.2017 um 20:41 schrieb Jeff King:

On Fri, Jan 06, 2017 at 03:39:59PM +0100, Johannes Sixt wrote:


diff --git a/run-command.c b/run-command.c
index ca905a9e80..db47c429b7 100644
--- a/run-command.c
+++ b/run-command.c
@@ -29,6 +29,8 @@ static int installed_child_cleanup_handler;

 static void cleanup_children(int sig, int in_signal)
 {
+   struct child_to_clean *children_to_wait_for = NULL;
+
while (children_to_clean) {
struct child_to_clean *p = children_to_clean;
children_to_clean = p->next;
@@ -45,6 +47,17 @@ static void cleanup_children(int sig, int in_signal)
}

kill(p->pid, sig);
+   p->next = children_to_wait_for;
+   children_to_wait_for = p;
+   }
+
+   while (children_to_wait_for) {
+   struct child_to_clean *p = children_to_wait_for;
+   children_to_wait_for = p->next;
+
+   while (waitpid(p->pid, NULL, 0) < 0 && errno == EINTR)
+   ; /* spin waiting for process exit or error */
+
if (!in_signal)
free(p);
}



This looks like the minimal change necessary. I wonder, though, whether the
new local variable is really required. Wouldn't it be sufficient to walk the
children_to_clean chain twice?


Yeah, I considered that. The fact that we disassemble the list in the
first loop has two side effects:

  1. It lets us free the list as we go (for the !in_signal case).

  2. If we were to get another signal, it makes us sort-of reentrant. We
 will only kill and wait for each pid once.

Obviously (1) moves down to the lower loop, but I was trying to preserve
(2). I'm not sure if it is worth bothering, though.


Makes sense.


The way we pull
items off of the list is certainly not atomic (it does shorten the race
to a few instructions, though, versus potentially waiting on waitpid()
to return).

My bigger concern with the whole thing is whether we could hit some sort
of deadlock if the child doesn't die when we send it a signal. E.g.,
imagine we have a pipe open to the child and somebody sends SIGTERM to
us. We propagate SIGTERM to the child, and then waitpid() for it. The
child decides to ignore our SIGTERM for some reason and keep reading
until EOF on the pipe. It won't ever get it, and the two processes will
hang forever.

You can argue perhaps that the child is broken in that case. And I doubt
this could trigger when running a git sub-command. But we may add more
children in the future. Right now we use it for the new multi-file
clean/smudge filters. They use the hook feature to close the
descriptors, but note that that won't run in the in_signal case.

So I dunno. Maybe this waiting should be restricted only to certain
cases like executing git sub-commands.


If given it some thought.

In general, I think it is wrong to wait for child processes when a 
signal was received. After all, it is the purpose of a (deadly) signal 
to have the process go away. There may be programs that know it better, 
like less, but git should not attempt to know better in general.


We do apply some special behavior for certain cases like we do for the 
pager. And now the case with aliases is another special situation. The 
parent git process only delegates to the child, and as such it is 
reasonable that it binds its life time to the first child, which 
executes the expanded alias.


-- Hannes



Re: Regression: Ctrl-c from the pager in an alias exits it

2017-01-06 Thread Jeff King
On Fri, Jan 06, 2017 at 03:39:59PM +0100, Johannes Sixt wrote:

> > diff --git a/run-command.c b/run-command.c
> > index ca905a9e80..db47c429b7 100644
> > --- a/run-command.c
> > +++ b/run-command.c
> > @@ -29,6 +29,8 @@ static int installed_child_cleanup_handler;
> > 
> >  static void cleanup_children(int sig, int in_signal)
> >  {
> > +   struct child_to_clean *children_to_wait_for = NULL;
> > +
> > while (children_to_clean) {
> > struct child_to_clean *p = children_to_clean;
> > children_to_clean = p->next;
> > @@ -45,6 +47,17 @@ static void cleanup_children(int sig, int in_signal)
> > }
> > 
> > kill(p->pid, sig);
> > +   p->next = children_to_wait_for;
> > +   children_to_wait_for = p;
> > +   }
> > +
> > +   while (children_to_wait_for) {
> > +   struct child_to_clean *p = children_to_wait_for;
> > +   children_to_wait_for = p->next;
> > +
> > +   while (waitpid(p->pid, NULL, 0) < 0 && errno == EINTR)
> > +   ; /* spin waiting for process exit or error */
> > +
> > if (!in_signal)
> > free(p);
> > }
> > 
> 
> This looks like the minimal change necessary. I wonder, though, whether the
> new local variable is really required. Wouldn't it be sufficient to walk the
> children_to_clean chain twice?

Yeah, I considered that. The fact that we disassemble the list in the
first loop has two side effects:

  1. It lets us free the list as we go (for the !in_signal case).

  2. If we were to get another signal, it makes us sort-of reentrant. We
 will only kill and wait for each pid once.

Obviously (1) moves down to the lower loop, but I was trying to preserve
(2). I'm not sure if it is worth bothering, though. The way we pull
items off of the list is certainly not atomic (it does shorten the race
to a few instructions, though, versus potentially waiting on waitpid()
to return).

My bigger concern with the whole thing is whether we could hit some sort
of deadlock if the child doesn't die when we send it a signal. E.g.,
imagine we have a pipe open to the child and somebody sends SIGTERM to
us. We propagate SIGTERM to the child, and then waitpid() for it. The
child decides to ignore our SIGTERM for some reason and keep reading
until EOF on the pipe. It won't ever get it, and the two processes will
hang forever.

You can argue perhaps that the child is broken in that case. And I doubt
this could trigger when running a git sub-command. But we may add more
children in the future. Right now we use it for the new multi-file
clean/smudge filters. They use the hook feature to close the
descriptors, but note that that won't run in the in_signal case.

So I dunno. Maybe this waiting should be restricted only to certain
cases like executing git sub-commands.

-Peff


Re: Regression: Ctrl-c from the pager in an alias exits it

2017-01-06 Thread Johannes Sixt

Am 06.01.2017 um 08:32 schrieb Jeff King:

On Fri, Jan 06, 2017 at 02:26:02AM -0500, Jeff King wrote:


You'll notice that it actually calls wait() on the pager. That's due to
a3da882120 (pager: do wait_for_pager on signal death, 2009-01-22), which
IIRC was addressing a very similar problem. We want to stop feeding the
pager when we get a signal, but we don't want the main process to
actually exit, or the pager loses the controlling terminal.

In our new scenario we have an extra process, though. The git-log child
will wait on the pager, but the parent process can't. It doesn't know
about it. I think that it in turn needs to wait on the child when it
dies, and then the whole chain will stand still until the pager exits.


And here's a patch to do that. It seems to work.

I'll sleep on it and then write up a commit message tomorrow if it still
makes sense.

diff --git a/run-command.c b/run-command.c
index ca905a9e80..db47c429b7 100644
--- a/run-command.c
+++ b/run-command.c
@@ -29,6 +29,8 @@ static int installed_child_cleanup_handler;

 static void cleanup_children(int sig, int in_signal)
 {
+   struct child_to_clean *children_to_wait_for = NULL;
+
while (children_to_clean) {
struct child_to_clean *p = children_to_clean;
children_to_clean = p->next;
@@ -45,6 +47,17 @@ static void cleanup_children(int sig, int in_signal)
}

kill(p->pid, sig);
+   p->next = children_to_wait_for;
+   children_to_wait_for = p;
+   }
+
+   while (children_to_wait_for) {
+   struct child_to_clean *p = children_to_wait_for;
+   children_to_wait_for = p->next;
+
+   while (waitpid(p->pid, NULL, 0) < 0 && errno == EINTR)
+   ; /* spin waiting for process exit or error */
+
if (!in_signal)
free(p);
}



This looks like the minimal change necessary. I wonder, though, whether 
the new local variable is really required. Wouldn't it be sufficient to 
walk the children_to_clean chain twice?


-- Hannes



Re: Regression: Ctrl-c from the pager in an alias exits it

2017-01-06 Thread Trygve Aaberge
On Fri, Jan 06, 2017 at 02:32:25 -0500, Jeff King wrote:
> And here's a patch to do that. It seems to work.

Nice, thanks! This works very well for me too.

-- 
Trygve Aaberge


Re: Regression: Ctrl-c from the pager in an alias exits it

2017-01-05 Thread Jeff King
On Fri, Jan 06, 2017 at 02:26:02AM -0500, Jeff King wrote:

> You'll notice that it actually calls wait() on the pager. That's due to
> a3da882120 (pager: do wait_for_pager on signal death, 2009-01-22), which
> IIRC was addressing a very similar problem. We want to stop feeding the
> pager when we get a signal, but we don't want the main process to
> actually exit, or the pager loses the controlling terminal.
> 
> In our new scenario we have an extra process, though. The git-log child
> will wait on the pager, but the parent process can't. It doesn't know
> about it. I think that it in turn needs to wait on the child when it
> dies, and then the whole chain will stand still until the pager exits.

And here's a patch to do that. It seems to work.

I'll sleep on it and then write up a commit message tomorrow if it still
makes sense.

diff --git a/run-command.c b/run-command.c
index ca905a9e80..db47c429b7 100644
--- a/run-command.c
+++ b/run-command.c
@@ -29,6 +29,8 @@ static int installed_child_cleanup_handler;
 
 static void cleanup_children(int sig, int in_signal)
 {
+   struct child_to_clean *children_to_wait_for = NULL;
+
while (children_to_clean) {
struct child_to_clean *p = children_to_clean;
children_to_clean = p->next;
@@ -45,6 +47,17 @@ static void cleanup_children(int sig, int in_signal)
}
 
kill(p->pid, sig);
+   p->next = children_to_wait_for;
+   children_to_wait_for = p;
+   }
+
+   while (children_to_wait_for) {
+   struct child_to_clean *p = children_to_wait_for;
+   children_to_wait_for = p->next;
+
+   while (waitpid(p->pid, NULL, 0) < 0 && errno == EINTR)
+   ; /* spin waiting for process exit or error */
+
if (!in_signal)
free(p);
}


Re: Regression: Ctrl-c from the pager in an alias exits it

2017-01-05 Thread Jeff King
On Fri, Jan 06, 2017 at 01:47:52AM -0500, Jeff King wrote:

> > > To reproduce, create e.g. the alias `l = log` and run `git l`. Then press
> > > Ctrl-c. The expected behavior is that nothing happens. The actual 
> > > behavior is
> > > that the pager exits.
> > 
> > That's weird. I can't reproduce at all here. But I also can't think of a
> > thing that Git could do that would impact the behavior. For example:
> 
> I take it back. I could not reproduce under my normal config, which sets
> the pager to "diff-highlight | less". But if I drop that config, I can
> reproduce easily. And bisect it to that same commit, 86d26f240f
> (setup.c: re-fix d95138e (setup: set env $GIT_WORK_TREE when ..,
> 2015-12-20).

I made a little progress with strace. Prior to 86d26f240f, we would run
the "log" builtin directly inside the same executable. After it, we exec
a separate process, and the process tree looks like:

  |   `-bash,10123
  |   `-git,10125 l
  |   `-git-log,10127
  |   `-less,10128

Now if I strace all of those, I see (I've reordered and snipped a bit
for clarity):

  $ strace -p 10125 -p 10127 -p 10128
  strace: Process 10125 attached
  strace: Process 10127 attached
  strace: Process 10128 attached
  [pid 10127] write(1, "\n\33[33mcommit bae73e80d48ace1faa3"..., 1481 

  [pid 10128] read(5,  
  [pid 10125] wait4(10127,  

The main git process is waiting for the child to finish, the child is
blocked writing to the pager, and the pager is waiting for input from
the terminal (fd 5).

So I hit ^C:

  [pid 10128] <... read resumed> 0x7ffd39153b57, 1) = ? ERESTARTSYS (To be 
restarted if SA_RESTART is set)
  [pid 10127] <... write resumed> )   = ? ERESTARTSYS (To be restarted if 
SA_RESTART is set)
  [pid 10125] <... wait4 resumed> 0x7ffe88d0a560, 0, NULL) = ? ERESTARTSYS (To 
be restarted if SA_RESTART is set)
  [pid 10128] --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
  [pid 10127] --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
  [pid 10125] --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---

Everybody gets the signal...

  [pid 10127] write(1, "\n\33[33mcommit bae73e80d48ace1faa3"..., 1481 


The blocked writer will resume its write() until it returns. This is the
same as it would get under the older code, too (after write() returns it
will handle the signal and die).

  [pid 10125] kill(10127, SIGINT 
  [pid 10125] <... kill resumed> )= 0

The parent git process tries to propagate the signal to the child.
Unnecessary in this instance, but helpful when the signal is only
delivered to the parent. This is due to 

  [pid 10128] rt_sigaction(SIGINT, {sa_handler=0x558dd1af0300, sa_mask=[INT], 
sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f03fc8c2040},  
  [pid 10128] <... rt_sigaction resumed> {sa_handler=0x558dd1af0300, 
sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f03fc8c2040}, 8) 
= 0
  [pid 10128] rt_sigprocmask(SIG_SETMASK, [],  
  [pid 10128] <... rt_sigprocmask resumed> NULL, 8) = 0
  [pid 10128] write(1, "\7\r\33[K:\33[K", 9 
  [pid 10128] <... write resumed> )   = 9
  [pid 10128] read(5,  

And here's the pager handling the signal, and going back to waiting for
terminal input.

  [pid 10125] rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[INT], 
sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fbe2b4c1040},  
  [pid 10125] <... rt_sigaction resumed> {sa_handler=0x55aec373a6a0, 
sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fbe2b4c1040}, 8) 
= 0
  [pid 10125] rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [INT], 8) = 0
  [pid 10125] getpid( 
  [pid 10125] <... getpid resumed> )  = 10125
  [pid 10125] gettid()= 10125
  [pid 10125] tgkill(10125, 10125, SIGINT) = 0
  [pid 10125] rt_sigprocmask(SIG_SETMASK, [INT], NULL, 8) = 0
  [pid 10125] rt_sigreturn({mask=[]}) = 61
  [pid 10125] --- SIGINT {si_signo=SIGINT, si_code=SI_TKILL, si_pid=10125, 
si_uid=1000} ---
  [pid 10125] +++ killed by SIGINT +++

The parent process pops the signal handler and propagates to itself,
dying.

At this point things pause, and nothing happens. But as soon as I hit a
key, the pager dies:

  [pid 10128] <... read resumed> "\r", 1) = 1
  [pid 10128] write(1, "\r\33[K", 4)  = 4
  [pid 10128] write(1, " \"Another round of MIPS fixe"..., 50) = 50
  [pid 10128] read(5, 0x7ffd39153b57, 1)  = -1 EIO (Input/output error)
  [pid 10128] write(1, "\r\33[K\33[?1l\33>", 11) = 11
  [pid 10128] fsync(5)= -1 EINVAL (Invalid argument)
  [pid 10128] ioctl(5, TCGETS, {B38400 opost isig -icanon -echo ...}) = 0
  [pid 10128] ioctl(5, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig icanon 
echo ...}) = -1 EIO (Input/output error)
  [pid 10128] exit_group(1)   = ?
  [pid 10128] +++ exited with 1 +++

The key thing is the EIO we get reading from the terminal. I think
because the head of the process group died, we lost our controlling
terminal.

And then naturally the git-log process gets 

Re: Regression: Ctrl-c from the pager in an alias exits it

2017-01-05 Thread Jeff King
On Fri, Jan 06, 2017 at 01:40:32AM -0500, Jeff King wrote:

> On Thu, Jan 05, 2017 at 03:25:29PM +0100, Trygve Aaberge wrote:
> 
> > I'm experiencing an issue when using aliases for commands that open the 
> > pager.
> > When I press Ctrl-c from the pager, it exits. This does not happen when I
> > don't use an alias and did not happen before. It causes problems because
> > Ctrl-c is also used for other things, such as canceling a search that hasn't
> > completed.
> > 
> > To reproduce, create e.g. the alias `l = log` and run `git l`. Then press
> > Ctrl-c. The expected behavior is that nothing happens. The actual behavior 
> > is
> > that the pager exits.
> 
> That's weird. I can't reproduce at all here. But I also can't think of a
> thing that Git could do that would impact the behavior. For example:

I take it back. I could not reproduce under my normal config, which sets
the pager to "diff-highlight | less". But if I drop that config, I can
reproduce easily. And bisect it to that same commit, 86d26f240f
(setup.c: re-fix d95138e (setup: set env $GIT_WORK_TREE when ..,
2015-12-20).

-Peff


Re: Regression: Ctrl-c from the pager in an alias exits it

2017-01-05 Thread Jeff King
On Thu, Jan 05, 2017 at 03:25:29PM +0100, Trygve Aaberge wrote:

> I'm experiencing an issue when using aliases for commands that open the pager.
> When I press Ctrl-c from the pager, it exits. This does not happen when I
> don't use an alias and did not happen before. It causes problems because
> Ctrl-c is also used for other things, such as canceling a search that hasn't
> completed.
> 
> To reproduce, create e.g. the alias `l = log` and run `git l`. Then press
> Ctrl-c. The expected behavior is that nothing happens. The actual behavior is
> that the pager exits.

That's weird. I can't reproduce at all here. But I also can't think of a
thing that Git could do that would impact the behavior. For example:

  1. Git should never kill() the pager; in fact it waits for it to
 finish.

  2. Git can impact the pager environment and command line options, but
 those haven't changed anytime recently (and besides, I'm not sure
 there even is an option to convince less to die).

  3. Git can impact the set of blocked signals that the pager sees, but
 I think less would set up its own handler for SIGINT anyway.

I suppose it's possible that your shell or another program (tmux,
maybe?) catches the SIGINT and does a process-group kill. But then I
don't know why it would matter that you're using an alias. The process
tree without an alias:

  |-xterm,21861,peff
  |   `-bash,21863
  |   `-git,22376 log
  |   `-less,22377

and with:

  |-xterm,21861,peff
  |   `-bash,21863
  |   `-git,22391 l
  |   `-git-log,22393
  |   `-less,22394

are pretty similar.

Puzzling.

-Peff