Re: git rebase regression: cannot pass a shell expression directly to --exec

2017-05-16 Thread Linus Torvalds
On Tue, May 16, 2017 at 1:12 PM, Johannes Schindelin
 wrote:
>>
>> I think it would be better to just
>>
>>  (a) get rid of the magic strcspn() entirely
>>
>>  (b) make the 'can we optimize this' test be simply just looking up
>> 'argv[0]' in $PATH
>
> What about
>
> ABC=1 my-executable my-arg

What about it?

Do you have a binary like

'/usr/bin/ABC=1 my-executable my-arg'

or something? If so, it gets executed. If not, it would get passed off
to "/bin/sh".

That sounds a lot more obvious than "some random set of characters are magical".

  Linus


Re: git rebase regression: cannot pass a shell expression directly to --exec

2017-05-16 Thread Johannes Schindelin
Hi Linus,

On Tue, 16 May 2017, Linus Torvalds wrote:

> On Tue, May 16, 2017 at 10:23 AM, Jeff King  wrote:
> >
> > I think the logic here would be more like:
> >
> >   1. During prepare_shell_cmd(), even if we optimize out the shell call,
> >  still prepare a fallback argv (since we can't allocate memory
> >  post-fork).
> >
> >   2. In the forked child, if we get ENOENT from exec and cmd->use_shell
> >  is set, then exec the fallback shell argv instead. Propagate its
> >  results, even if it's 127.
> >
> > That still means we'd prefer a $PATH copy of a command to its shell
> > builtin variant, but that can't be helped (and I kind of doubt anybody
> > would care too much).
> 
> I think it would be better to just
> 
>  (a) get rid of the magic strcspn() entirely
> 
>  (b) make the 'can we optimize this' test be simply just looking up
> 'argv[0]' in $PATH

What about

ABC=1 my-executable my-arg

Ciao,
Dscho


Re: git rebase regression: cannot pass a shell expression directly to --exec

2017-05-16 Thread Linus Torvalds
On Tue, May 16, 2017 at 10:23 AM, Jeff King  wrote:
>
> I think the logic here would be more like:
>
>   1. During prepare_shell_cmd(), even if we optimize out the shell call,
>  still prepare a fallback argv (since we can't allocate memory
>  post-fork).
>
>   2. In the forked child, if we get ENOENT from exec and cmd->use_shell
>  is set, then exec the fallback shell argv instead. Propagate its
>  results, even if it's 127.
>
> That still means we'd prefer a $PATH copy of a command to its shell
> builtin variant, but that can't be helped (and I kind of doubt anybody
> would care too much).

I think it would be better to just

 (a) get rid of the magic strcspn() entirely

 (b) make the 'can we optimize this' test be simply just looking up
'argv[0]' in $PATH

Hmm? If you have executables with special characters in them in your
PATH, you have bigger issues.

Also, if people really want to optimize the code that executes an
external program (whether in shell or directly), I think it might be
worth it to look at replacing the "fork()" with a "vfork()".

Something like this

-   cmd->pid = fork();
+   cmd->pid = (cmd->git_cmd || cmd->env) ? fork() : vfork();

might work (the native git_cmd case needs a real fork, and if we
change the environment variables we need it too, but the other cases
look like they might work with vfork()).

Using vfork() can be hugely more efficient, because you don't have the
extra page table copies and teardown, but also avoid a lot of possible
copy-on-write faults.

 Linus


Re: git rebase regression: cannot pass a shell expression directly to --exec

2017-05-16 Thread Jeff King
On Tue, May 16, 2017 at 10:29:31AM -0700, Eric Rannaud wrote:

> > It does not really work that way. Git runs in a separate process that
> > does not have access to your current shell. That's why you need to do
> > 'export -f foo'.
> >
> > If you want git to be able to ecute the foo shell function, git needs to
> > start a _new_ shell process, which reads the environment, recognize the
> > exported function and run that.
> >
> > This is not the same as git executing the command in your shell. Not
> > exported variables would not be available in this function (as it would
> > be in your equivalent).
> 
> I'm sorry, I didn't mean (or say) "my shell process". Indeed, it
> doesn't work that way. And to be clear, there is no problem with
> having to "export -f foo". The only question is how should git run the
>  passed to --exec: should it run directly or using a shell?

It's definitely "using a shell". Most things Git runs on your behalf
behave the same, but there are some exceptions due to historical warts
(that would break backwards compatibility if we switched them). E.g.,
$GIT_SSH does not use the shell, but we introduced GIT_SSH_COMMAND as an
alternative which does use the shell.

But note that "a shell" may not be necessarily be your login shell. We
always execute "/bin/sh -c" (and you can override the shell path at
build time). So your "export -f" trick only works because your /bin/sh
is a symlink to bash (or because you specifically built with the
SHELL_PATH knob pointed at bash).

-Peff


Re: git rebase regression: cannot pass a shell expression directly to --exec

2017-05-16 Thread Jeff King
On Tue, May 16, 2017 at 10:21:51AM -0700, Eric Rannaud wrote:

> On Tue, May 16, 2017 at 9:59 AM, Eric Rannaud  wrote:
> > When I use "git rebase --exec " I'm basically writing a "foreach
> > commit in range {  }" in my shell. Same idea with git bisect run.
> >
> > A transparent optimization that tries execve() then falls back to the
> > user's shell sounds like a good idea.
> 
> One issue with the execve-else-shell optimization is that sometimes a
> binary exists that will shadow an exported function or a shell
> builtin:
> 
>   git rebase --exec true master^^  # OK but in fact this runs /usr/bin/true

Yeah, this is the builtin thing I mentioned elsewhere. I think it's
pretty rare to run a builtin with no arguments and care about the
behavior differences.

> /usr/bin/time requires an argument. Even though the bash builtin time
> runs fine without argument.
> 
>   $ time
> 
>   real0m0.000s
>   user0m0.000s
>   sys 0m0.000s

I've run into the "time" distinction even when running things from the
shell, because the time builtin is special. E.g.:

  $ time true
  real  0m0.000s
  user  0m0.000s
  sys   0m0.000s

  $ (echo foo | time true) 2>&1
  0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 1136maxresident)k
  0inputs+0outputs (0major+61minor)pagefaults 0swaps

So to some degree, depending on builtins versus external commands
(especially when you're round-tripping through another program running a
second shell) is going to have some surprises.

> But if the optimization is applied to more complex commands, then we
> will have problems. For instance, the builtin echo supports \E, but
> /usr/bin/echo doesn't support it.

No, it shouldn't. If any of

 |&;<>()$`\\\"' \t\n*?[#~=%

appear in the string, we always run the shell.

> In any case, the manpage says --exec  and " will be
> interpreted as one or more shell commands.", it doesn't say "--exec
> ".

Right, it's clearly supposed to use the shell, or behave as if the shell
were invoked (within reason).

-Peff


Re: git rebase regression: cannot pass a shell expression directly to --exec

2017-05-16 Thread Jonathan Nieder
Jeff King wrote:
>>> On Mon, May 15, 2017 at 11:25:03PM -0400, Jeff King wrote:

 One hack would be to look for BASH_FUNC_* in the environment and disable
 the optimization in that case. I think that would make your case Just
 Work. It doesn't help other oddball cases, like:
[...]
>> Hm.  execvp explicitly does this when it hits ENOEXEC, but not for
>> ENOENT.  Do you know why that is?
>
> When execvp(foo) falls back on ENOEXEC, it is not running "sh -c foo".
> It is actually running "sh foo" to run the script contents.

Oh --- now I get it.  This is about the optimization to bypass the shell
in prepare_shell_command.  It has nothing to do with execvp --- our
execvp emulation already does the right thing, and Brandon's patches
did not make this problem any worse or better.

Unconditionally falling back to sh -c when we get ENOENT in the
RUN_USING_SHELL case makes sense to me.

Sorry for the confusion,
Jonathan


Re: git rebase regression: cannot pass a shell expression directly to --exec

2017-05-16 Thread Brandon Williams
On 05/16, Jeff King wrote:
> On Tue, May 16, 2017 at 10:15:40AM -0700, Brandon Williams wrote:
> 
> > > > > So I was thinking something like the patch below, though I guess
> > > > > technically you could look for BASH_FUNC_$argv[0]%%, which seems to be
> > > > > bash's magic variable name. I hate to get too intimate with those
> > > > > details, though.
> > 
> > One concern with that is what about all other shells that are not BASH?
> > I'm sure they use a different env var for storing functions so we may
> > need to handle other shell's too? That is assuming we want to keep the
> > old behavior.
> 
> Most other shells don't do function-exporting at all. Certainly dash and
> most traditional bourne shells don't. I wouldn't be surprised if zsh
> does. But yeah, we'd have to support them one by one (and possibly
> variants across different versions of each shell). Workable, but gross.
> 
> > > When execvp(foo) falls back on ENOEXEC, it is not running "sh -c foo".
> > > It is actually running "sh foo" to run the script contents. So it's
> > > about letting you do:
> > > 
> > >   echo "no shebang line" >script
> > >   chmod +x script
> > >   ./script
> > > 
> > > And nothing to do with shell builtins.
> > 
> > That's correct, and is the exact behavior I was trying to mimic with the
> > changes to run_command.
> >   1. try executing the command.
> >   2. if it fails with ENOEXEC then attempt to execute it with a shell
> 
> I think the logic here would be more like:

Oh yes, I was just saying what I did not taking into account how we
would solve this issue.

> 
>   1. During prepare_shell_cmd(), even if we optimize out the shell call,
>  still prepare a fallback argv (since we can't allocate memory
>  post-fork).
> 
>   2. In the forked child, if we get ENOENT from exec and cmd->use_shell
>  is set, then exec the fallback shell argv instead. Propagate its
>  results, even if it's 127.
> 
> That still means we'd prefer a $PATH copy of a command to its shell
> builtin variant, but that can't be helped (and I kind of doubt anybody
> would care too much).
> 
> -Peff

-- 
Brandon Williams


Re: git rebase regression: cannot pass a shell expression directly to --exec

2017-05-16 Thread Eric Rannaud
On Tue, May 16, 2017 at 10:14 AM, Kevin Daudt  wrote:
>> A transparent optimization that tries execve() then falls back to the
>> user's shell sounds like a good idea.
>
> It does not really work that way. Git runs in a separate process that
> does not have access to your current shell. That's why you need to do
> 'export -f foo'.
>
> If you want git to be able to ecute the foo shell function, git needs to
> start a _new_ shell process, which reads the environment, recognize the
> exported function and run that.
>
> This is not the same as git executing the command in your shell. Not
> exported variables would not be available in this function (as it would
> be in your equivalent).

I'm sorry, I didn't mean (or say) "my shell process". Indeed, it
doesn't work that way. And to be clear, there is no problem with
having to "export -f foo". The only question is how should git run the
 passed to --exec: should it run directly or using a shell?


Re: git rebase regression: cannot pass a shell expression directly to --exec

2017-05-16 Thread Jeff King
On Tue, May 16, 2017 at 10:15:40AM -0700, Brandon Williams wrote:

> > > > So I was thinking something like the patch below, though I guess
> > > > technically you could look for BASH_FUNC_$argv[0]%%, which seems to be
> > > > bash's magic variable name. I hate to get too intimate with those
> > > > details, though.
> 
> One concern with that is what about all other shells that are not BASH?
> I'm sure they use a different env var for storing functions so we may
> need to handle other shell's too? That is assuming we want to keep the
> old behavior.

Most other shells don't do function-exporting at all. Certainly dash and
most traditional bourne shells don't. I wouldn't be surprised if zsh
does. But yeah, we'd have to support them one by one (and possibly
variants across different versions of each shell). Workable, but gross.

> > When execvp(foo) falls back on ENOEXEC, it is not running "sh -c foo".
> > It is actually running "sh foo" to run the script contents. So it's
> > about letting you do:
> > 
> >   echo "no shebang line" >script
> >   chmod +x script
> >   ./script
> > 
> > And nothing to do with shell builtins.
> 
> That's correct, and is the exact behavior I was trying to mimic with the
> changes to run_command.
>   1. try executing the command.
>   2. if it fails with ENOEXEC then attempt to execute it with a shell

I think the logic here would be more like:

  1. During prepare_shell_cmd(), even if we optimize out the shell call,
 still prepare a fallback argv (since we can't allocate memory
 post-fork).

  2. In the forked child, if we get ENOENT from exec and cmd->use_shell
 is set, then exec the fallback shell argv instead. Propagate its
 results, even if it's 127.

That still means we'd prefer a $PATH copy of a command to its shell
builtin variant, but that can't be helped (and I kind of doubt anybody
would care too much).

-Peff


Re: git rebase regression: cannot pass a shell expression directly to --exec

2017-05-16 Thread Eric Rannaud
On Tue, May 16, 2017 at 9:59 AM, Eric Rannaud  wrote:
> When I use "git rebase --exec " I'm basically writing a "foreach
> commit in range {  }" in my shell. Same idea with git bisect run.
>
> A transparent optimization that tries execve() then falls back to the
> user's shell sounds like a good idea.

One issue with the execve-else-shell optimization is that sometimes a
binary exists that will shadow an exported function or a shell
builtin:

  git rebase --exec true master^^  # OK but in fact this runs /usr/bin/true

In this case it doesn't really matter. If the optimization is only
applied to "simple commands" (i.e. no arguments), then it's probably
OK. I can't think of problematic cases. Except weird things like:

  $ git rebase --exec time master^
  Executing: time
  Usage: time [-apvV] [-f format] [-o file] [--append] [--verbose]
 [--portability] [--format=format] [--output=file] [--version]
 [--help] command [arg...]
  warning: execution failed: time

/usr/bin/time requires an argument. Even though the bash builtin time
runs fine without argument.

  $ time

  real0m0.000s
  user0m0.000s
  sys 0m0.000s

But if the optimization is applied to more complex commands, then we
will have problems. For instance, the builtin echo supports \E, but
/usr/bin/echo doesn't support it.

In any case, the manpage says --exec  and " will be
interpreted as one or more shell commands.", it doesn't say "--exec
".


Re: git rebase regression: cannot pass a shell expression directly to --exec

2017-05-16 Thread Brandon Williams
On 05/16, Jeff King wrote:
> On Tue, May 16, 2017 at 09:41:24AM -0700, Jonathan Nieder wrote:
> 
> > Jeff King wrote:
> > > On Mon, May 15, 2017 at 11:25:03PM -0400, Jeff King wrote:
> > 
> > >> One hack would be to look for BASH_FUNC_* in the environment and disable
> > >> the optimization in that case. I think that would make your case Just
> > >> Work. It doesn't help other oddball cases, like:
> > >>
> > >>   - you're trying to run a shell builtin that behaves differently than
> > >> its exec-able counterpart
> > >>
> > >>   - your shell has some other mechanism for defining commands that we
> > >> would not find via exec. I don't know of one offhand. Obviously $ENV
> > >> could point to a file which defines some, but for most shells would
> > >> not read any startup files for a non-interactive "sh -c" invocation.
> > >
> > > So I was thinking something like the patch below, though I guess
> > > technically you could look for BASH_FUNC_$argv[0]%%, which seems to be
> > > bash's magic variable name. I hate to get too intimate with those
> > > details, though.

One concern with that is what about all other shells that are not BASH?
I'm sure they use a different env var for storing functions so we may
need to handle other shell's too? That is assuming we want to keep the
old behavior.

> > >
> > > Another option is to speculatively run "foo" without the shell, and if
> > > execve fails to find it, then fall back to running the shell. That would
> > > catch any number of cases where the shell "somehow" finds a command that
> > > we can't.
> > 
> > Hm.  execvp explicitly does this when it hits ENOEXEC, but not for
> > ENOENT.  Do you know why that is?
> 
> When execvp(foo) falls back on ENOEXEC, it is not running "sh -c foo".
> It is actually running "sh foo" to run the script contents. So it's
> about letting you do:
> 
>   echo "no shebang line" >script
>   chmod +x script
>   ./script
> 
> And nothing to do with shell builtins.

That's correct, and is the exact behavior I was trying to mimic with the
changes to run_command.
  1. try executing the command.
  2. if it fails with ENOEXEC then attempt to execute it with a shell

> 
> > I think we want to behave consistently for shell builtins and for
> > exported functions --- they are different sides of the same problem,
> > and behaving differently between the two feels confusing.
> 
> Yes, I think ideally we'd handle it all transparently. Although I'd also
> point out that Git the behavior under discussion dates back to 2009 and
> this is (as far as I can recall) the first report. So documenting the
> current behavior might not be that bad an option.
> 
> -Peff

-- 
Brandon Williams


Re: git rebase regression: cannot pass a shell expression directly to --exec

2017-05-16 Thread Kevin Daudt
On Tue, May 16, 2017 at 09:59:03AM -0700, Eric Rannaud wrote:
> On Tue, May 16, 2017 at 9:18 AM, Jeff King  wrote:
> > On Tue, May 16, 2017 at 12:23:02PM +0200, Johannes Schindelin wrote:
> >> It would appear to me that you used a side effect of an implementation
> >> detail: that `git rebase -i` was implemented entirely as a shell script.
> >
> > I don't think that's true at all. He expected the user-provided "--exec"
> > command to be run by a shell, which seems like a reasonable thing for
> > Git to promise (and we already make a similar promise for most
> > user-provided commands that we run).  What happens in between, be it
> 
> As a "user", my expectation was simply that the command would be run
> not just in "a shell", but in *my* shell (or the shell that calls git,
> maybe). So I don't see any portability question with respect to Git.
> My script that uses git rebase --exec may not be portable, but that's
> my problem.
> 
> When I use "git rebase --exec " I'm basically writing a "foreach
> commit in range {  }" in my shell. Same idea with git bisect run.
> 
> A transparent optimization that tries execve() then falls back to the
> user's shell sounds like a good idea.

It does not really work that way. Git runs in a separate process that
does not have access to your current shell. That's why you need to do
'export -f foo'.

If you want git to be able to ecute the foo shell function, git needs to
start a _new_ shell process, which reads the environment, recognize the
exported function and run that.

This is not the same as git executing the command in your shell. Not
exported variables would not be available in this function (as it would
be in your equivalent).


Re: git rebase regression: cannot pass a shell expression directly to --exec

2017-05-16 Thread Eric Rannaud
On Tue, May 16, 2017 at 9:47 AM, Jeff King  wrote:
>> I think we want to behave consistently for shell builtins and for
>> exported functions --- they are different sides of the same problem,
>> and behaving differently between the two feels confusing.
>
> Yes, I think ideally we'd handle it all transparently. Although I'd also
> point out that Git the behavior under discussion dates back to 2009 and
> this is (as far as I can recall) the first report. So documenting the
> current behavior might not be that bad an option.

This is on Arch Linux which is usually pretty aggressive with their
Git upgrades and I first saw the problem this week. The script that
hits this case runs constantly on my machine so my guess is that if
anyone relies on the old behavior, they're just starting to see a
new-enough Git to have a problem.


Re: git rebase regression: cannot pass a shell expression directly to --exec

2017-05-16 Thread Eric Rannaud
On Tue, May 16, 2017 at 9:18 AM, Jeff King  wrote:
> On Tue, May 16, 2017 at 12:23:02PM +0200, Johannes Schindelin wrote:
>> It would appear to me that you used a side effect of an implementation
>> detail: that `git rebase -i` was implemented entirely as a shell script.
>
> I don't think that's true at all. He expected the user-provided "--exec"
> command to be run by a shell, which seems like a reasonable thing for
> Git to promise (and we already make a similar promise for most
> user-provided commands that we run).  What happens in between, be it

As a "user", my expectation was simply that the command would be run
not just in "a shell", but in *my* shell (or the shell that calls git,
maybe). So I don't see any portability question with respect to Git.
My script that uses git rebase --exec may not be portable, but that's
my problem.

When I use "git rebase --exec " I'm basically writing a "foreach
commit in range {  }" in my shell. Same idea with git bisect run.

A transparent optimization that tries execve() then falls back to the
user's shell sounds like a good idea.


Re: git rebase regression: cannot pass a shell expression directly to --exec

2017-05-16 Thread Jeff King
On Tue, May 16, 2017 at 09:41:24AM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> > On Mon, May 15, 2017 at 11:25:03PM -0400, Jeff King wrote:
> 
> >> One hack would be to look for BASH_FUNC_* in the environment and disable
> >> the optimization in that case. I think that would make your case Just
> >> Work. It doesn't help other oddball cases, like:
> >>
> >>   - you're trying to run a shell builtin that behaves differently than
> >> its exec-able counterpart
> >>
> >>   - your shell has some other mechanism for defining commands that we
> >> would not find via exec. I don't know of one offhand. Obviously $ENV
> >> could point to a file which defines some, but for most shells would
> >> not read any startup files for a non-interactive "sh -c" invocation.
> >
> > So I was thinking something like the patch below, though I guess
> > technically you could look for BASH_FUNC_$argv[0]%%, which seems to be
> > bash's magic variable name. I hate to get too intimate with those
> > details, though.
> >
> > Another option is to speculatively run "foo" without the shell, and if
> > execve fails to find it, then fall back to running the shell. That would
> > catch any number of cases where the shell "somehow" finds a command that
> > we can't.
> 
> Hm.  execvp explicitly does this when it hits ENOEXEC, but not for
> ENOENT.  Do you know why that is?

When execvp(foo) falls back on ENOEXEC, it is not running "sh -c foo".
It is actually running "sh foo" to run the script contents. So it's
about letting you do:

  echo "no shebang line" >script
  chmod +x script
  ./script

And nothing to do with shell builtins.

> I think we want to behave consistently for shell builtins and for
> exported functions --- they are different sides of the same problem,
> and behaving differently between the two feels confusing.

Yes, I think ideally we'd handle it all transparently. Although I'd also
point out that Git the behavior under discussion dates back to 2009 and
this is (as far as I can recall) the first report. So documenting the
current behavior might not be that bad an option.

-Peff


Re: git rebase regression: cannot pass a shell expression directly to --exec

2017-05-16 Thread Eric Rannaud
On Mon, May 15, 2017 at 8:53 PM, Jeff King  wrote:
>> > So I suspect if you added an extraneous semi-colon, your case would work
>> > again (and that would confirm for us that this is indeed the problem).
>>
>> Wow, that's a brilliant analysis.
>
> If it's right. :) It's all theory at this point.
>
> My /bin/sh isn't bash, but I should be able to build with
> SHELL_PATH=/bin/bash to reproduce. But I can't:

Just to clarify if there was any doubt, the semicolon trick does
indeed work fine when bash is your shell.

It would be nice if it were consistent for all git commands that take
a  argument:

  foo() { echo foo; }
  export -f foo
  git bisect start master master^^
  git bisect run foo  # works
  git bisect run 'foo;'  # doesn't work
running foo;
/usr/lib/git-core/git-bisect: line 493: foo;: command not found

Also, what's a "simple command", exactly?

  foo() { echo foo; }
  export -f foo
  git rebase --exec foo master^^  # fails
  git rebase --exec 'foo;' master^^  # OK
  git rebase --exec 'foo 1' master^^  # OK

Not sure if this can be made easy to understand in the manpage.

Thanks,
Eric


Re: git rebase regression: cannot pass a shell expression directly to --exec

2017-05-16 Thread Jonathan Nieder
(+Brandon Williams, who may have more context for execvp-related things)
Hi,

Jeff King wrote:
> On Mon, May 15, 2017 at 11:25:03PM -0400, Jeff King wrote:

>> One hack would be to look for BASH_FUNC_* in the environment and disable
>> the optimization in that case. I think that would make your case Just
>> Work. It doesn't help other oddball cases, like:
>>
>>   - you're trying to run a shell builtin that behaves differently than
>> its exec-able counterpart
>>
>>   - your shell has some other mechanism for defining commands that we
>> would not find via exec. I don't know of one offhand. Obviously $ENV
>> could point to a file which defines some, but for most shells would
>> not read any startup files for a non-interactive "sh -c" invocation.
>
> So I was thinking something like the patch below, though I guess
> technically you could look for BASH_FUNC_$argv[0]%%, which seems to be
> bash's magic variable name. I hate to get too intimate with those
> details, though.
>
> Another option is to speculatively run "foo" without the shell, and if
> execve fails to find it, then fall back to running the shell. That would
> catch any number of cases where the shell "somehow" finds a command that
> we can't.

Hm.  execvp explicitly does this when it hits ENOEXEC, but not for
ENOENT.  Do you know why that is?

I think we want to behave consistently for shell builtins and for
exported functions --- they are different sides of the same problem,
and behaving differently between the two feels confusing.

Thanks,
Jonathan


Re: git rebase regression: cannot pass a shell expression directly to --exec

2017-05-16 Thread Jeff King
On Tue, May 16, 2017 at 12:23:02PM +0200, Johannes Schindelin wrote:

> On Mon, 15 May 2017, Eric Rannaud wrote:
> 
> > It used to be possible to run a sequence like:
> > 
> >   foo() { echo X; }
> >   export -f foo
> >   git rebase --exec foo HEAD~10
> 
> It would appear to me that you used a side effect of an implementation
> detail: that `git rebase -i` was implemented entirely as a shell script.

I don't think that's true at all. He expected the user-provided "--exec"
command to be run by a shell, which seems like a reasonable thing for
Git to promise (and we already make a similar promise for most
user-provided commands that we run).  What happens in between, be it
shell or C code, doesn't matter, and the conversion away from a shell
script in this case only tickled an existing bad interaction between
"export -f" and Git's run-command code.

See my other replies for the full story.

I don't think this has anything in particular to do with git-rebase,
though. Our solutions are either:

  - declare "export -f" as too tricky for our optimization, and teach
people about the ";" trick

  - figure out some workaround/fallback to disable the shell-skipping
optimization in this case

-Peff


Re: git rebase regression: cannot pass a shell expression directly to --exec

2017-05-16 Thread Johannes Schindelin
Hi Junio,

On Tue, 16 May 2017, Junio C Hamano wrote:

> The point of rewriting things in C and using run_command() interface was
> to avoid shell overhead.

That statement is missing the point.

It is true that converting shell scripts to proper builtins avoids the
shell overhead.

It is even more true that that results in a nice speed-up that can even be
felt on Linux (I feel I have to stress this, as you seem to be mostly
focused on Linux usage, and I do not want you to get the impression that
you are only doing "the Windows folks" a favor when you gracefully accept
the patches to convert scripts to proper builtins).

But the real truth is: shell scripting is not portable.

Shell scripting is never only shell scripting, of course. A quite
undocumented set of utilities is expected to be present for our scripts to
run, too: sed, awk, tr, cat, expr, just to name a few.

It does not end there. For example, sed is not equal to sed. BSD sed has
different semantics than GNU sed, and we jump through hoops to try to
ensure that our shell scripts run with both versions. Which must make many
a contributor feel a lot less positive about their contributions e.g. when
a reviewer points out that their \t in their sed statement would break Git
on MacOSX. Sad!

We place a burden on maintainers targeting platforms that are not Linux.
If I remember correctly, we need to override the default shell on one
Unix, even, because we simply failed to make our shell scripts run with
it.

And then, of course, there are all the problems inflicted on users on that
most prevalent Operating System of them all: Windows. Yes, I know, you
"non-Windows folk" would like to happily ignore that little tiny problem,
but that's really doing Git a disservice. So stop doing it.

What hurts me most about this is that Subversion, a software that us "Git
folk" pride ourselves in making obsolete, never had those problems. It was
designed with an eye toward portability from the start.

It is a good thing to support users who want to automate their workflows
using scripts, of course. Oh, and please, do not forget to remember that
there are tons of scripting languages out there, e.g. AppleScript,
Javascript, Tcl, Python, Ruby; just because you prefer Bash does not mean
the majority of developers do.

So let's continue to broaden our scripting support. And at the same time,
let's also change our mindset so that we can reduce the portability
problems of Git.

Ciao,
Dscho


Re: git rebase regression: cannot pass a shell expression directly to --exec

2017-05-16 Thread Johannes Schindelin
Hi Eric,

On Mon, 15 May 2017, Eric Rannaud wrote:

> It used to be possible to run a sequence like:
> 
>   foo() { echo X; }
>   export -f foo
>   git rebase --exec foo HEAD~10

It would appear to me that you used a side effect of an implementation
detail: that `git rebase -i` was implemented entirely as a shell script.

In my mind, this was a fragile assumption, and that is why...

> commit 18633e1a22a68bbe8e6311a1039d13ebbf6fd041 (refs/bisect/bad)
> Author: Johannes Schindelin 
> Date:   Thu Feb 9 23:23:11 2017 +0100
> 
> rebase -i: use the rebase--helper builtin

... this commit, which is a long-overdue start of cleaning up our act,
broke your code's assumption.

I am afraid that your code placed too much of a dependency on an
implementation detail that changed.

In short: I think that your fix to your script is correct.

Ciao,
Johannes


Re: git rebase regression: cannot pass a shell expression directly to --exec

2017-05-15 Thread Jeff King
On Mon, May 15, 2017 at 11:53:57PM -0400, Jeff King wrote:

> My /bin/sh isn't bash, but I should be able to build with
> SHELL_PATH=/bin/bash to reproduce. But I can't:
> 
>$ bash
>$ foo() { echo >&2 "in foo"; }
>$ export -f foo
>$ bash -c foo
>in foo
> 
>$ strace -f 2>&1 git.compile rebase -x 'foo;' HEAD^ | grep foo
>Executing: foo;
>[pid  1788] execve("/bin/bash", ["/bin/bash", "-c", "foo;", "foo;"], [/* 
> 60 vars */] 
>foo;: foo: command not found
> 
> So I'm not sure why the direct "bash -c" can find it, but somehow the
> variable doesn't make it through to the "bash -c" at the lower level.
> Replacing "foo;" with "env" shows the environment, but BASH_FUNC_foo
> isn't set in it. I'm not sure where it's getting eaten, though.

Hmph. It looks like "dash" eats it. My "git.compile" above is a symlink
to /path/to/git/bin-wrappers/git. But it looks like our Makefile isn't
smart enough to rebuild bin-wrappers when you switch SHELL_PATH, so it
will had "/bin/sh" in it. Which points to dash on my machine. And
indeed, dash seems to wipe the environment:

  $ foo() { echo >&2 "in foo"; }
  $ export -f foo
  $ bash -c foo
  in foo
  $ dash -c "bash -c foo"
  bash: foo: command not found

I wonder if it's related to ShellShock protections, or if dash just
rejects variable names with the "%" in them or something. Anyway. That
explains why I had trouble reproducing. Using bash consistently as my
shell lets me reproduce Eric's results. And doing the semicolon trick
does make it work again.

So I feel confident that I've analyzed the problem correctly, at least.

-Peff


Re: git rebase regression: cannot pass a shell expression directly to --exec

2017-05-15 Thread Jeff King
On Tue, May 16, 2017 at 12:40:51PM +0900, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > Interesting. The "exec" string is still run with a shell. E.g.:
> > ...
> > I wonder if this is falling afoul of the optimization in run-command's
> > prepare_shell_cmd() to skip shell invocations for "simple" commands.
> > ...
> > So I suspect if you added an extraneous semi-colon, your case would work
> > again (and that would confirm for us that this is indeed the problem).
> 
> Wow, that's a brilliant analysis.

If it's right. :) It's all theory at this point.

My /bin/sh isn't bash, but I should be able to build with
SHELL_PATH=/bin/bash to reproduce. But I can't:

   $ bash
   $ foo() { echo >&2 "in foo"; }
   $ export -f foo
   $ bash -c foo
   in foo

   $ strace -f 2>&1 git.compile rebase -x 'foo;' HEAD^ | grep foo
   Executing: foo;
   [pid  1788] execve("/bin/bash", ["/bin/bash", "-c", "foo;", "foo;"], [/* 60 
vars */] 
   foo;: foo: command not found

So I'm not sure why the direct "bash -c" can find it, but somehow the
variable doesn't make it through to the "bash -c" at the lower level.
Replacing "foo;" with "env" shows the environment, but BASH_FUNC_foo
isn't set in it. I'm not sure where it's getting eaten, though.

> The "semicolon" trick is way too obscure, but perhaps can be
> documented as an escape hatch?

Yeah, I agree this should be documented if it can't be fixed. I wasn't
sure if we were giving up just yet.

Either way, though, it wouldn't hurt to mention optimizing out "maybe
shell" optimization, because it can occasionally produce user-visible
effects. Where would be a good place? In git(1), I guess?

It's almost something that could go in gitcli(7), but it's not really
about the CLI in particular. In most cases the shell-exec'd commands are
from config, but not always (as this case shows). So git-config(1)
probably isn't the right place.

AFAIK, we don't talk about this behavior at all in the existing
documentation. And I really don't know where we'd put it that somebody
would find it without having to read the documentation exhaustively.

-Peff


Re: git rebase regression: cannot pass a shell expression directly to --exec

2017-05-15 Thread Junio C Hamano
Jeff King  writes:

> Interesting. The "exec" string is still run with a shell. E.g.:
> ...
> I wonder if this is falling afoul of the optimization in run-command's
> prepare_shell_cmd() to skip shell invocations for "simple" commands.
> ...
> So I suspect if you added an extraneous semi-colon, your case would work
> again (and that would confirm for us that this is indeed the problem).

Wow, that's a brilliant analysis.

> The optimization in run-command is very old, but the switch to
> rebase--helper presumably moved us from doing that exec in the actual
> shell script to doing it via the C code.
>
> Which means your exported-function technique has been broken for _most_
> of Git all along, but it just now affected this particular spot.
>
> I'm not sure how to feel about it. In the face of exported functions, we
> can never do the shell-skipping optimization, because we don't know how
> the shell is going to interpret even a simple-looking command. And it is
> kind of a neat trick. But I also hate to add extra useless shell
> invocations for the predominantly common case that people aren't using
> this trick (or aren't even using a shell that supports function
> exports).

I was about to write this off as "an unfortunate regression, a
fallout that will likely left unfixed, due to lack of a good
practical workaround."

The point of rewriting things in C and using run_command() interface
was to avoid shell overhead.  We are doing an exec already, but
adding a shell invocation in the middle will double the number of
exec (and probably add an extra fork as well), which probably is
measurable on systems with slow fork/exec.

The "semicolon" trick is way too obscure, but perhaps can be
documented as an escape hatch?



Re: git rebase regression: cannot pass a shell expression directly to --exec

2017-05-15 Thread Jeff King
On Mon, May 15, 2017 at 11:25:03PM -0400, Jeff King wrote:

> One hack would be to look for BASH_FUNC_* in the environment and disable
> the optimization in that case. I think that would make your case Just
> Work. It doesn't help other oddball cases, like:
> 
>   - you're trying to run a shell builtin that behaves differently than
> its exec-able counterpart
> 
>   - your shell has some other mechanism for defining commands that we
> would not find via exec. I don't know of one offhand. Obviously $ENV
> could point to a file which defines some, but for most shells would
> not read any startup files for a non-interactive "sh -c" invocation.

So I was thinking something like the patch below, though I guess
technically you could look for BASH_FUNC_$argv[0]%%, which seems to be
bash's magic variable name. I hate to get too intimate with those
details, though.

Another option is to speculatively run "foo" without the shell, and if
execve fails to find it, then fall back to running the shell. That would
catch any number of cases where the shell "somehow" finds a command that
we can't.

You'd still have confusing behavior if your shell builtin behaved
differently than the exec-able version, though (because we'd quietly use
the exec-able one), but I would imagine that's exceedingly rare.

I dunno. Maybe the whole thing is a fool's errand.

diff --git a/run-command.c b/run-command.c
index 1c02bfb2e..8328d27fb 100644
--- a/run-command.c
+++ b/run-command.c
@@ -240,12 +240,24 @@ int sane_execvp(const char *file, char * const argv[])
return -1;
 }
 
+static int env_has_bash_functions(void)
+{
+   extern char **environ;
+   char **key;
+
+   for (key = environ; *key; key++)
+   if (starts_with(*key, "BASH_FUNC_"))
+   return 1;
+   return 0;
+}
+
 static const char **prepare_shell_cmd(struct argv_array *out, const char 
**argv)
 {
if (!argv[0])
die("BUG: shell command is empty");
 
-   if (strcspn(argv[0], "|&;<>()$`\\\"' \t\n*?[#~=%") != strlen(argv[0])) {
+   if (strcspn(argv[0], "|&;<>()$`\\\"' \t\n*?[#~=%") != strlen(argv[0]) ||
+   env_has_bash_functions()) {
 #ifndef GIT_WINDOWS_NATIVE
argv_array_push(out, SHELL_PATH);
 #else


Re: git rebase regression: cannot pass a shell expression directly to --exec

2017-05-15 Thread Jeff King
On Mon, May 15, 2017 at 11:08:39AM -0700, Eric Rannaud wrote:

> It used to be possible to run a sequence like:
> 
>   foo() { echo X; }
>   export -f foo
>   git rebase --exec foo HEAD~10
> 
> Since upgrading to 2.13.0, I had to update my scripts to run:
> 
>   git rebase --exec "bash -c foo" HEAD~10

Interesting. The "exec" string is still run with a shell. E.g.:

  $ git rebase --exec 'for i in 1 2 3; do echo >&2 $i; done' HEAD~5
  Executing: for i in 1 2 3; do echo >&2 $i; done
  1
  2
  3
  Executing: for i in 1 2 3; do echo >&2 $i; done
  1
  2
  3
  [...and so on...]

I wonder if this is falling afoul of the optimization in run-command's
prepare_shell_cmd() to skip shell invocations for "simple" commands.

E.g., if I do:

   strace -fe execve git rebase -x foo HEAD^ 2>&1 | grep foo

I see:

   ...
   Executing: foo
   [pid 21820] execve("foo", ["foo"], [/* 57 vars */]) = -1 ENOENT (No such 
file or directory)
   fatal: cannot run foo: No such file or directory

But if I were to add a shell meta-character, like this:

   strace -fe execve git rebase -x 'foo;' HEAD^ 2>&1 | grep foo

then I see:

  ...
  Executing: foo;
  [pid 22569] execve("/bin/sh", ["/bin/sh", "-c", "foo;", "foo;"], [/* 57 vars 
*/]) = 0
  foo;: 1: foo;: foo: not found

So I suspect if you added an extraneous semi-colon, your case would work
again (and that would confirm for us that this is indeed the problem).

> I'm not sure if this was an intended change. Bisecting with the
> following script:
> [...]
> It points to this commit:
>
> commit 18633e1a22a68bbe8e6311a1039d13ebbf6fd041 (refs/bisect/bad)
> Author: Johannes Schindelin 
> Date:   Thu Feb 9 23:23:11 2017 +0100
>
> rebase -i: use the rebase--helper builtin

The optimization in run-command is very old, but the switch to
rebase--helper presumably moved us from doing that exec in the actual
shell script to doing it via the C code.

Which means your exported-function technique has been broken for _most_
of Git all along, but it just now affected this particular spot.

I'm not sure how to feel about it. In the face of exported functions, we
can never do the shell-skipping optimization, because we don't know how
the shell is going to interpret even a simple-looking command. And it is
kind of a neat trick. But I also hate to add extra useless shell
invocations for the predominantly common case that people aren't using
this trick (or aren't even using a shell that supports function
exports).

One hack would be to look for BASH_FUNC_* in the environment and disable
the optimization in that case. I think that would make your case Just
Work. It doesn't help other oddball cases, like:

  - you're trying to run a shell builtin that behaves differently than
its exec-able counterpart

  - your shell has some other mechanism for defining commands that we
would not find via exec. I don't know of one offhand. Obviously $ENV
could point to a file which defines some, but for most shells would
not read any startup files for a non-interactive "sh -c" invocation.

-Peff


git rebase regression: cannot pass a shell expression directly to --exec

2017-05-15 Thread Eric Rannaud
Hi all,

It used to be possible to run a sequence like:

  foo() { echo X; }
  export -f foo
  git rebase --exec foo HEAD~10

Since upgrading to 2.13.0, I had to update my scripts to run:

  git rebase --exec "bash -c foo" HEAD~10

I'm not sure if this was an intended change. Bisecting with the
following script:

  #!/usr/bin/env bash

  make -j8 || exit 3

  function foo() {
  echo OK
  }
  export -f foo

  pushd tmp
  ../git --exec-path=.. rebase --exec foo HEAD^^
  ret=$?
  # Cleanup if failure
  ../git --exec-path=.. rebase --abort &> /dev/null
  popd
  exit $ret

It points to this commit:

commit 18633e1a22a68bbe8e6311a1039d13ebbf6fd041 (refs/bisect/bad)
Author: Johannes Schindelin 
Date:   Thu Feb 9 23:23:11 2017 +0100

rebase -i: use the rebase--helper builtin

Now that the sequencer learned to process a "normal" interactive rebase,
we use it. The original shell script is still used for "non-normal"
interactive rebases, i.e. when --root or --preserve-merges was passed.

Please note that the --root option (via the $squash_onto variable) needs
special handling only for the very first command, hence it is still okay
to use the helper upon continue/skip.

Also please note that the --no-ff setting is volatile, i.e. when the
interactive rebase is interrupted at any stage, there is no record of
it. Therefore, we have to pass it from the shell script to the
rebase--helper.

Note: the test t3404 had to be adjusted because the the error messages
produced by the sequencer comply with our current convention to start with
a lower-case letter.

Signed-off-by: Johannes Schindelin 
Signed-off-by: Junio C Hamano 


Thanks,
Eric