Re: git rebase regression: cannot pass a shell expression directly to --exec
On Tue, May 16, 2017 at 1:12 PM, Johannes Schindelin wrote: >> >> I think it would be better to just >> >> (a) get rid of the magic strcspn() entirely >> >> (b) make the 'can we optimize this' test be simply just looking up >> 'argv[0]' in $PATH > > What about > > ABC=1 my-executable my-arg What about it? Do you have a binary like '/usr/bin/ABC=1 my-executable my-arg' or something? If so, it gets executed. If not, it would get passed off to "/bin/sh". That sounds a lot more obvious than "some random set of characters are magical". Linus
Re: git rebase regression: cannot pass a shell expression directly to --exec
Hi Linus, On Tue, 16 May 2017, Linus Torvalds wrote: > On Tue, May 16, 2017 at 10:23 AM, Jeff King wrote: > > > > I think the logic here would be more like: > > > > 1. During prepare_shell_cmd(), even if we optimize out the shell call, > > still prepare a fallback argv (since we can't allocate memory > > post-fork). > > > > 2. In the forked child, if we get ENOENT from exec and cmd->use_shell > > is set, then exec the fallback shell argv instead. Propagate its > > results, even if it's 127. > > > > That still means we'd prefer a $PATH copy of a command to its shell > > builtin variant, but that can't be helped (and I kind of doubt anybody > > would care too much). > > I think it would be better to just > > (a) get rid of the magic strcspn() entirely > > (b) make the 'can we optimize this' test be simply just looking up > 'argv[0]' in $PATH What about ABC=1 my-executable my-arg Ciao, Dscho
Re: git rebase regression: cannot pass a shell expression directly to --exec
On Tue, May 16, 2017 at 10:23 AM, Jeff King wrote: > > I think the logic here would be more like: > > 1. During prepare_shell_cmd(), even if we optimize out the shell call, > still prepare a fallback argv (since we can't allocate memory > post-fork). > > 2. In the forked child, if we get ENOENT from exec and cmd->use_shell > is set, then exec the fallback shell argv instead. Propagate its > results, even if it's 127. > > That still means we'd prefer a $PATH copy of a command to its shell > builtin variant, but that can't be helped (and I kind of doubt anybody > would care too much). I think it would be better to just (a) get rid of the magic strcspn() entirely (b) make the 'can we optimize this' test be simply just looking up 'argv[0]' in $PATH Hmm? If you have executables with special characters in them in your PATH, you have bigger issues. Also, if people really want to optimize the code that executes an external program (whether in shell or directly), I think it might be worth it to look at replacing the "fork()" with a "vfork()". Something like this - cmd->pid = fork(); + cmd->pid = (cmd->git_cmd || cmd->env) ? fork() : vfork(); might work (the native git_cmd case needs a real fork, and if we change the environment variables we need it too, but the other cases look like they might work with vfork()). Using vfork() can be hugely more efficient, because you don't have the extra page table copies and teardown, but also avoid a lot of possible copy-on-write faults. Linus
Re: git rebase regression: cannot pass a shell expression directly to --exec
On Tue, May 16, 2017 at 10:29:31AM -0700, Eric Rannaud wrote: > > It does not really work that way. Git runs in a separate process that > > does not have access to your current shell. That's why you need to do > > 'export -f foo'. > > > > If you want git to be able to ecute the foo shell function, git needs to > > start a _new_ shell process, which reads the environment, recognize the > > exported function and run that. > > > > This is not the same as git executing the command in your shell. Not > > exported variables would not be available in this function (as it would > > be in your equivalent). > > I'm sorry, I didn't mean (or say) "my shell process". Indeed, it > doesn't work that way. And to be clear, there is no problem with > having to "export -f foo". The only question is how should git run the > passed to --exec: should it run directly or using a shell? It's definitely "using a shell". Most things Git runs on your behalf behave the same, but there are some exceptions due to historical warts (that would break backwards compatibility if we switched them). E.g., $GIT_SSH does not use the shell, but we introduced GIT_SSH_COMMAND as an alternative which does use the shell. But note that "a shell" may not be necessarily be your login shell. We always execute "/bin/sh -c" (and you can override the shell path at build time). So your "export -f" trick only works because your /bin/sh is a symlink to bash (or because you specifically built with the SHELL_PATH knob pointed at bash). -Peff
Re: git rebase regression: cannot pass a shell expression directly to --exec
On Tue, May 16, 2017 at 10:21:51AM -0700, Eric Rannaud wrote: > On Tue, May 16, 2017 at 9:59 AM, Eric Rannaud wrote: > > When I use "git rebase --exec " I'm basically writing a "foreach > > commit in range { }" in my shell. Same idea with git bisect run. > > > > A transparent optimization that tries execve() then falls back to the > > user's shell sounds like a good idea. > > One issue with the execve-else-shell optimization is that sometimes a > binary exists that will shadow an exported function or a shell > builtin: > > git rebase --exec true master^^ # OK but in fact this runs /usr/bin/true Yeah, this is the builtin thing I mentioned elsewhere. I think it's pretty rare to run a builtin with no arguments and care about the behavior differences. > /usr/bin/time requires an argument. Even though the bash builtin time > runs fine without argument. > > $ time > > real0m0.000s > user0m0.000s > sys 0m0.000s I've run into the "time" distinction even when running things from the shell, because the time builtin is special. E.g.: $ time true real 0m0.000s user 0m0.000s sys 0m0.000s $ (echo foo | time true) 2>&1 0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 1136maxresident)k 0inputs+0outputs (0major+61minor)pagefaults 0swaps So to some degree, depending on builtins versus external commands (especially when you're round-tripping through another program running a second shell) is going to have some surprises. > But if the optimization is applied to more complex commands, then we > will have problems. For instance, the builtin echo supports \E, but > /usr/bin/echo doesn't support it. No, it shouldn't. If any of |&;<>()$`\\\"' \t\n*?[#~=% appear in the string, we always run the shell. > In any case, the manpage says --exec and " will be > interpreted as one or more shell commands.", it doesn't say "--exec > ". Right, it's clearly supposed to use the shell, or behave as if the shell were invoked (within reason). -Peff
Re: git rebase regression: cannot pass a shell expression directly to --exec
Jeff King wrote: >>> On Mon, May 15, 2017 at 11:25:03PM -0400, Jeff King wrote: One hack would be to look for BASH_FUNC_* in the environment and disable the optimization in that case. I think that would make your case Just Work. It doesn't help other oddball cases, like: [...] >> Hm. execvp explicitly does this when it hits ENOEXEC, but not for >> ENOENT. Do you know why that is? > > When execvp(foo) falls back on ENOEXEC, it is not running "sh -c foo". > It is actually running "sh foo" to run the script contents. Oh --- now I get it. This is about the optimization to bypass the shell in prepare_shell_command. It has nothing to do with execvp --- our execvp emulation already does the right thing, and Brandon's patches did not make this problem any worse or better. Unconditionally falling back to sh -c when we get ENOENT in the RUN_USING_SHELL case makes sense to me. Sorry for the confusion, Jonathan
Re: git rebase regression: cannot pass a shell expression directly to --exec
On 05/16, Jeff King wrote: > On Tue, May 16, 2017 at 10:15:40AM -0700, Brandon Williams wrote: > > > > > > So I was thinking something like the patch below, though I guess > > > > > technically you could look for BASH_FUNC_$argv[0]%%, which seems to be > > > > > bash's magic variable name. I hate to get too intimate with those > > > > > details, though. > > > > One concern with that is what about all other shells that are not BASH? > > I'm sure they use a different env var for storing functions so we may > > need to handle other shell's too? That is assuming we want to keep the > > old behavior. > > Most other shells don't do function-exporting at all. Certainly dash and > most traditional bourne shells don't. I wouldn't be surprised if zsh > does. But yeah, we'd have to support them one by one (and possibly > variants across different versions of each shell). Workable, but gross. > > > > When execvp(foo) falls back on ENOEXEC, it is not running "sh -c foo". > > > It is actually running "sh foo" to run the script contents. So it's > > > about letting you do: > > > > > > echo "no shebang line" >script > > > chmod +x script > > > ./script > > > > > > And nothing to do with shell builtins. > > > > That's correct, and is the exact behavior I was trying to mimic with the > > changes to run_command. > > 1. try executing the command. > > 2. if it fails with ENOEXEC then attempt to execute it with a shell > > I think the logic here would be more like: Oh yes, I was just saying what I did not taking into account how we would solve this issue. > > 1. During prepare_shell_cmd(), even if we optimize out the shell call, > still prepare a fallback argv (since we can't allocate memory > post-fork). > > 2. In the forked child, if we get ENOENT from exec and cmd->use_shell > is set, then exec the fallback shell argv instead. Propagate its > results, even if it's 127. > > That still means we'd prefer a $PATH copy of a command to its shell > builtin variant, but that can't be helped (and I kind of doubt anybody > would care too much). > > -Peff -- Brandon Williams
Re: git rebase regression: cannot pass a shell expression directly to --exec
On Tue, May 16, 2017 at 10:14 AM, Kevin Daudt wrote: >> A transparent optimization that tries execve() then falls back to the >> user's shell sounds like a good idea. > > It does not really work that way. Git runs in a separate process that > does not have access to your current shell. That's why you need to do > 'export -f foo'. > > If you want git to be able to ecute the foo shell function, git needs to > start a _new_ shell process, which reads the environment, recognize the > exported function and run that. > > This is not the same as git executing the command in your shell. Not > exported variables would not be available in this function (as it would > be in your equivalent). I'm sorry, I didn't mean (or say) "my shell process". Indeed, it doesn't work that way. And to be clear, there is no problem with having to "export -f foo". The only question is how should git run the passed to --exec: should it run directly or using a shell?
Re: git rebase regression: cannot pass a shell expression directly to --exec
On Tue, May 16, 2017 at 10:15:40AM -0700, Brandon Williams wrote: > > > > So I was thinking something like the patch below, though I guess > > > > technically you could look for BASH_FUNC_$argv[0]%%, which seems to be > > > > bash's magic variable name. I hate to get too intimate with those > > > > details, though. > > One concern with that is what about all other shells that are not BASH? > I'm sure they use a different env var for storing functions so we may > need to handle other shell's too? That is assuming we want to keep the > old behavior. Most other shells don't do function-exporting at all. Certainly dash and most traditional bourne shells don't. I wouldn't be surprised if zsh does. But yeah, we'd have to support them one by one (and possibly variants across different versions of each shell). Workable, but gross. > > When execvp(foo) falls back on ENOEXEC, it is not running "sh -c foo". > > It is actually running "sh foo" to run the script contents. So it's > > about letting you do: > > > > echo "no shebang line" >script > > chmod +x script > > ./script > > > > And nothing to do with shell builtins. > > That's correct, and is the exact behavior I was trying to mimic with the > changes to run_command. > 1. try executing the command. > 2. if it fails with ENOEXEC then attempt to execute it with a shell I think the logic here would be more like: 1. During prepare_shell_cmd(), even if we optimize out the shell call, still prepare a fallback argv (since we can't allocate memory post-fork). 2. In the forked child, if we get ENOENT from exec and cmd->use_shell is set, then exec the fallback shell argv instead. Propagate its results, even if it's 127. That still means we'd prefer a $PATH copy of a command to its shell builtin variant, but that can't be helped (and I kind of doubt anybody would care too much). -Peff
Re: git rebase regression: cannot pass a shell expression directly to --exec
On Tue, May 16, 2017 at 9:59 AM, Eric Rannaud wrote: > When I use "git rebase --exec " I'm basically writing a "foreach > commit in range { }" in my shell. Same idea with git bisect run. > > A transparent optimization that tries execve() then falls back to the > user's shell sounds like a good idea. One issue with the execve-else-shell optimization is that sometimes a binary exists that will shadow an exported function or a shell builtin: git rebase --exec true master^^ # OK but in fact this runs /usr/bin/true In this case it doesn't really matter. If the optimization is only applied to "simple commands" (i.e. no arguments), then it's probably OK. I can't think of problematic cases. Except weird things like: $ git rebase --exec time master^ Executing: time Usage: time [-apvV] [-f format] [-o file] [--append] [--verbose] [--portability] [--format=format] [--output=file] [--version] [--help] command [arg...] warning: execution failed: time /usr/bin/time requires an argument. Even though the bash builtin time runs fine without argument. $ time real0m0.000s user0m0.000s sys 0m0.000s But if the optimization is applied to more complex commands, then we will have problems. For instance, the builtin echo supports \E, but /usr/bin/echo doesn't support it. In any case, the manpage says --exec and " will be interpreted as one or more shell commands.", it doesn't say "--exec ".
Re: git rebase regression: cannot pass a shell expression directly to --exec
On 05/16, Jeff King wrote: > On Tue, May 16, 2017 at 09:41:24AM -0700, Jonathan Nieder wrote: > > > Jeff King wrote: > > > On Mon, May 15, 2017 at 11:25:03PM -0400, Jeff King wrote: > > > > >> One hack would be to look for BASH_FUNC_* in the environment and disable > > >> the optimization in that case. I think that would make your case Just > > >> Work. It doesn't help other oddball cases, like: > > >> > > >> - you're trying to run a shell builtin that behaves differently than > > >> its exec-able counterpart > > >> > > >> - your shell has some other mechanism for defining commands that we > > >> would not find via exec. I don't know of one offhand. Obviously $ENV > > >> could point to a file which defines some, but for most shells would > > >> not read any startup files for a non-interactive "sh -c" invocation. > > > > > > So I was thinking something like the patch below, though I guess > > > technically you could look for BASH_FUNC_$argv[0]%%, which seems to be > > > bash's magic variable name. I hate to get too intimate with those > > > details, though. One concern with that is what about all other shells that are not BASH? I'm sure they use a different env var for storing functions so we may need to handle other shell's too? That is assuming we want to keep the old behavior. > > > > > > Another option is to speculatively run "foo" without the shell, and if > > > execve fails to find it, then fall back to running the shell. That would > > > catch any number of cases where the shell "somehow" finds a command that > > > we can't. > > > > Hm. execvp explicitly does this when it hits ENOEXEC, but not for > > ENOENT. Do you know why that is? > > When execvp(foo) falls back on ENOEXEC, it is not running "sh -c foo". > It is actually running "sh foo" to run the script contents. So it's > about letting you do: > > echo "no shebang line" >script > chmod +x script > ./script > > And nothing to do with shell builtins. That's correct, and is the exact behavior I was trying to mimic with the changes to run_command. 1. try executing the command. 2. if it fails with ENOEXEC then attempt to execute it with a shell > > > I think we want to behave consistently for shell builtins and for > > exported functions --- they are different sides of the same problem, > > and behaving differently between the two feels confusing. > > Yes, I think ideally we'd handle it all transparently. Although I'd also > point out that Git the behavior under discussion dates back to 2009 and > this is (as far as I can recall) the first report. So documenting the > current behavior might not be that bad an option. > > -Peff -- Brandon Williams
Re: git rebase regression: cannot pass a shell expression directly to --exec
On Tue, May 16, 2017 at 09:59:03AM -0700, Eric Rannaud wrote: > On Tue, May 16, 2017 at 9:18 AM, Jeff King wrote: > > On Tue, May 16, 2017 at 12:23:02PM +0200, Johannes Schindelin wrote: > >> It would appear to me that you used a side effect of an implementation > >> detail: that `git rebase -i` was implemented entirely as a shell script. > > > > I don't think that's true at all. He expected the user-provided "--exec" > > command to be run by a shell, which seems like a reasonable thing for > > Git to promise (and we already make a similar promise for most > > user-provided commands that we run). What happens in between, be it > > As a "user", my expectation was simply that the command would be run > not just in "a shell", but in *my* shell (or the shell that calls git, > maybe). So I don't see any portability question with respect to Git. > My script that uses git rebase --exec may not be portable, but that's > my problem. > > When I use "git rebase --exec " I'm basically writing a "foreach > commit in range { }" in my shell. Same idea with git bisect run. > > A transparent optimization that tries execve() then falls back to the > user's shell sounds like a good idea. It does not really work that way. Git runs in a separate process that does not have access to your current shell. That's why you need to do 'export -f foo'. If you want git to be able to ecute the foo shell function, git needs to start a _new_ shell process, which reads the environment, recognize the exported function and run that. This is not the same as git executing the command in your shell. Not exported variables would not be available in this function (as it would be in your equivalent).
Re: git rebase regression: cannot pass a shell expression directly to --exec
On Tue, May 16, 2017 at 9:47 AM, Jeff King wrote: >> I think we want to behave consistently for shell builtins and for >> exported functions --- they are different sides of the same problem, >> and behaving differently between the two feels confusing. > > Yes, I think ideally we'd handle it all transparently. Although I'd also > point out that Git the behavior under discussion dates back to 2009 and > this is (as far as I can recall) the first report. So documenting the > current behavior might not be that bad an option. This is on Arch Linux which is usually pretty aggressive with their Git upgrades and I first saw the problem this week. The script that hits this case runs constantly on my machine so my guess is that if anyone relies on the old behavior, they're just starting to see a new-enough Git to have a problem.
Re: git rebase regression: cannot pass a shell expression directly to --exec
On Tue, May 16, 2017 at 9:18 AM, Jeff King wrote: > On Tue, May 16, 2017 at 12:23:02PM +0200, Johannes Schindelin wrote: >> It would appear to me that you used a side effect of an implementation >> detail: that `git rebase -i` was implemented entirely as a shell script. > > I don't think that's true at all. He expected the user-provided "--exec" > command to be run by a shell, which seems like a reasonable thing for > Git to promise (and we already make a similar promise for most > user-provided commands that we run). What happens in between, be it As a "user", my expectation was simply that the command would be run not just in "a shell", but in *my* shell (or the shell that calls git, maybe). So I don't see any portability question with respect to Git. My script that uses git rebase --exec may not be portable, but that's my problem. When I use "git rebase --exec " I'm basically writing a "foreach commit in range { }" in my shell. Same idea with git bisect run. A transparent optimization that tries execve() then falls back to the user's shell sounds like a good idea.
Re: git rebase regression: cannot pass a shell expression directly to --exec
On Tue, May 16, 2017 at 09:41:24AM -0700, Jonathan Nieder wrote: > Jeff King wrote: > > On Mon, May 15, 2017 at 11:25:03PM -0400, Jeff King wrote: > > >> One hack would be to look for BASH_FUNC_* in the environment and disable > >> the optimization in that case. I think that would make your case Just > >> Work. It doesn't help other oddball cases, like: > >> > >> - you're trying to run a shell builtin that behaves differently than > >> its exec-able counterpart > >> > >> - your shell has some other mechanism for defining commands that we > >> would not find via exec. I don't know of one offhand. Obviously $ENV > >> could point to a file which defines some, but for most shells would > >> not read any startup files for a non-interactive "sh -c" invocation. > > > > So I was thinking something like the patch below, though I guess > > technically you could look for BASH_FUNC_$argv[0]%%, which seems to be > > bash's magic variable name. I hate to get too intimate with those > > details, though. > > > > Another option is to speculatively run "foo" without the shell, and if > > execve fails to find it, then fall back to running the shell. That would > > catch any number of cases where the shell "somehow" finds a command that > > we can't. > > Hm. execvp explicitly does this when it hits ENOEXEC, but not for > ENOENT. Do you know why that is? When execvp(foo) falls back on ENOEXEC, it is not running "sh -c foo". It is actually running "sh foo" to run the script contents. So it's about letting you do: echo "no shebang line" >script chmod +x script ./script And nothing to do with shell builtins. > I think we want to behave consistently for shell builtins and for > exported functions --- they are different sides of the same problem, > and behaving differently between the two feels confusing. Yes, I think ideally we'd handle it all transparently. Although I'd also point out that Git the behavior under discussion dates back to 2009 and this is (as far as I can recall) the first report. So documenting the current behavior might not be that bad an option. -Peff
Re: git rebase regression: cannot pass a shell expression directly to --exec
On Mon, May 15, 2017 at 8:53 PM, Jeff King wrote: >> > So I suspect if you added an extraneous semi-colon, your case would work >> > again (and that would confirm for us that this is indeed the problem). >> >> Wow, that's a brilliant analysis. > > If it's right. :) It's all theory at this point. > > My /bin/sh isn't bash, but I should be able to build with > SHELL_PATH=/bin/bash to reproduce. But I can't: Just to clarify if there was any doubt, the semicolon trick does indeed work fine when bash is your shell. It would be nice if it were consistent for all git commands that take a argument: foo() { echo foo; } export -f foo git bisect start master master^^ git bisect run foo # works git bisect run 'foo;' # doesn't work running foo; /usr/lib/git-core/git-bisect: line 493: foo;: command not found Also, what's a "simple command", exactly? foo() { echo foo; } export -f foo git rebase --exec foo master^^ # fails git rebase --exec 'foo;' master^^ # OK git rebase --exec 'foo 1' master^^ # OK Not sure if this can be made easy to understand in the manpage. Thanks, Eric
Re: git rebase regression: cannot pass a shell expression directly to --exec
(+Brandon Williams, who may have more context for execvp-related things) Hi, Jeff King wrote: > On Mon, May 15, 2017 at 11:25:03PM -0400, Jeff King wrote: >> One hack would be to look for BASH_FUNC_* in the environment and disable >> the optimization in that case. I think that would make your case Just >> Work. It doesn't help other oddball cases, like: >> >> - you're trying to run a shell builtin that behaves differently than >> its exec-able counterpart >> >> - your shell has some other mechanism for defining commands that we >> would not find via exec. I don't know of one offhand. Obviously $ENV >> could point to a file which defines some, but for most shells would >> not read any startup files for a non-interactive "sh -c" invocation. > > So I was thinking something like the patch below, though I guess > technically you could look for BASH_FUNC_$argv[0]%%, which seems to be > bash's magic variable name. I hate to get too intimate with those > details, though. > > Another option is to speculatively run "foo" without the shell, and if > execve fails to find it, then fall back to running the shell. That would > catch any number of cases where the shell "somehow" finds a command that > we can't. Hm. execvp explicitly does this when it hits ENOEXEC, but not for ENOENT. Do you know why that is? I think we want to behave consistently for shell builtins and for exported functions --- they are different sides of the same problem, and behaving differently between the two feels confusing. Thanks, Jonathan
Re: git rebase regression: cannot pass a shell expression directly to --exec
On Tue, May 16, 2017 at 12:23:02PM +0200, Johannes Schindelin wrote: > On Mon, 15 May 2017, Eric Rannaud wrote: > > > It used to be possible to run a sequence like: > > > > foo() { echo X; } > > export -f foo > > git rebase --exec foo HEAD~10 > > It would appear to me that you used a side effect of an implementation > detail: that `git rebase -i` was implemented entirely as a shell script. I don't think that's true at all. He expected the user-provided "--exec" command to be run by a shell, which seems like a reasonable thing for Git to promise (and we already make a similar promise for most user-provided commands that we run). What happens in between, be it shell or C code, doesn't matter, and the conversion away from a shell script in this case only tickled an existing bad interaction between "export -f" and Git's run-command code. See my other replies for the full story. I don't think this has anything in particular to do with git-rebase, though. Our solutions are either: - declare "export -f" as too tricky for our optimization, and teach people about the ";" trick - figure out some workaround/fallback to disable the shell-skipping optimization in this case -Peff
Re: git rebase regression: cannot pass a shell expression directly to --exec
Hi Junio, On Tue, 16 May 2017, Junio C Hamano wrote: > The point of rewriting things in C and using run_command() interface was > to avoid shell overhead. That statement is missing the point. It is true that converting shell scripts to proper builtins avoids the shell overhead. It is even more true that that results in a nice speed-up that can even be felt on Linux (I feel I have to stress this, as you seem to be mostly focused on Linux usage, and I do not want you to get the impression that you are only doing "the Windows folks" a favor when you gracefully accept the patches to convert scripts to proper builtins). But the real truth is: shell scripting is not portable. Shell scripting is never only shell scripting, of course. A quite undocumented set of utilities is expected to be present for our scripts to run, too: sed, awk, tr, cat, expr, just to name a few. It does not end there. For example, sed is not equal to sed. BSD sed has different semantics than GNU sed, and we jump through hoops to try to ensure that our shell scripts run with both versions. Which must make many a contributor feel a lot less positive about their contributions e.g. when a reviewer points out that their \t in their sed statement would break Git on MacOSX. Sad! We place a burden on maintainers targeting platforms that are not Linux. If I remember correctly, we need to override the default shell on one Unix, even, because we simply failed to make our shell scripts run with it. And then, of course, there are all the problems inflicted on users on that most prevalent Operating System of them all: Windows. Yes, I know, you "non-Windows folk" would like to happily ignore that little tiny problem, but that's really doing Git a disservice. So stop doing it. What hurts me most about this is that Subversion, a software that us "Git folk" pride ourselves in making obsolete, never had those problems. It was designed with an eye toward portability from the start. It is a good thing to support users who want to automate their workflows using scripts, of course. Oh, and please, do not forget to remember that there are tons of scripting languages out there, e.g. AppleScript, Javascript, Tcl, Python, Ruby; just because you prefer Bash does not mean the majority of developers do. So let's continue to broaden our scripting support. And at the same time, let's also change our mindset so that we can reduce the portability problems of Git. Ciao, Dscho
Re: git rebase regression: cannot pass a shell expression directly to --exec
Hi Eric, On Mon, 15 May 2017, Eric Rannaud wrote: > It used to be possible to run a sequence like: > > foo() { echo X; } > export -f foo > git rebase --exec foo HEAD~10 It would appear to me that you used a side effect of an implementation detail: that `git rebase -i` was implemented entirely as a shell script. In my mind, this was a fragile assumption, and that is why... > commit 18633e1a22a68bbe8e6311a1039d13ebbf6fd041 (refs/bisect/bad) > Author: Johannes Schindelin > Date: Thu Feb 9 23:23:11 2017 +0100 > > rebase -i: use the rebase--helper builtin ... this commit, which is a long-overdue start of cleaning up our act, broke your code's assumption. I am afraid that your code placed too much of a dependency on an implementation detail that changed. In short: I think that your fix to your script is correct. Ciao, Johannes
Re: git rebase regression: cannot pass a shell expression directly to --exec
On Mon, May 15, 2017 at 11:53:57PM -0400, Jeff King wrote: > My /bin/sh isn't bash, but I should be able to build with > SHELL_PATH=/bin/bash to reproduce. But I can't: > >$ bash >$ foo() { echo >&2 "in foo"; } >$ export -f foo >$ bash -c foo >in foo > >$ strace -f 2>&1 git.compile rebase -x 'foo;' HEAD^ | grep foo >Executing: foo; >[pid 1788] execve("/bin/bash", ["/bin/bash", "-c", "foo;", "foo;"], [/* > 60 vars */] >foo;: foo: command not found > > So I'm not sure why the direct "bash -c" can find it, but somehow the > variable doesn't make it through to the "bash -c" at the lower level. > Replacing "foo;" with "env" shows the environment, but BASH_FUNC_foo > isn't set in it. I'm not sure where it's getting eaten, though. Hmph. It looks like "dash" eats it. My "git.compile" above is a symlink to /path/to/git/bin-wrappers/git. But it looks like our Makefile isn't smart enough to rebuild bin-wrappers when you switch SHELL_PATH, so it will had "/bin/sh" in it. Which points to dash on my machine. And indeed, dash seems to wipe the environment: $ foo() { echo >&2 "in foo"; } $ export -f foo $ bash -c foo in foo $ dash -c "bash -c foo" bash: foo: command not found I wonder if it's related to ShellShock protections, or if dash just rejects variable names with the "%" in them or something. Anyway. That explains why I had trouble reproducing. Using bash consistently as my shell lets me reproduce Eric's results. And doing the semicolon trick does make it work again. So I feel confident that I've analyzed the problem correctly, at least. -Peff
Re: git rebase regression: cannot pass a shell expression directly to --exec
On Tue, May 16, 2017 at 12:40:51PM +0900, Junio C Hamano wrote: > Jeff King writes: > > > Interesting. The "exec" string is still run with a shell. E.g.: > > ... > > I wonder if this is falling afoul of the optimization in run-command's > > prepare_shell_cmd() to skip shell invocations for "simple" commands. > > ... > > So I suspect if you added an extraneous semi-colon, your case would work > > again (and that would confirm for us that this is indeed the problem). > > Wow, that's a brilliant analysis. If it's right. :) It's all theory at this point. My /bin/sh isn't bash, but I should be able to build with SHELL_PATH=/bin/bash to reproduce. But I can't: $ bash $ foo() { echo >&2 "in foo"; } $ export -f foo $ bash -c foo in foo $ strace -f 2>&1 git.compile rebase -x 'foo;' HEAD^ | grep foo Executing: foo; [pid 1788] execve("/bin/bash", ["/bin/bash", "-c", "foo;", "foo;"], [/* 60 vars */] foo;: foo: command not found So I'm not sure why the direct "bash -c" can find it, but somehow the variable doesn't make it through to the "bash -c" at the lower level. Replacing "foo;" with "env" shows the environment, but BASH_FUNC_foo isn't set in it. I'm not sure where it's getting eaten, though. > The "semicolon" trick is way too obscure, but perhaps can be > documented as an escape hatch? Yeah, I agree this should be documented if it can't be fixed. I wasn't sure if we were giving up just yet. Either way, though, it wouldn't hurt to mention optimizing out "maybe shell" optimization, because it can occasionally produce user-visible effects. Where would be a good place? In git(1), I guess? It's almost something that could go in gitcli(7), but it's not really about the CLI in particular. In most cases the shell-exec'd commands are from config, but not always (as this case shows). So git-config(1) probably isn't the right place. AFAIK, we don't talk about this behavior at all in the existing documentation. And I really don't know where we'd put it that somebody would find it without having to read the documentation exhaustively. -Peff
Re: git rebase regression: cannot pass a shell expression directly to --exec
Jeff King writes: > Interesting. The "exec" string is still run with a shell. E.g.: > ... > I wonder if this is falling afoul of the optimization in run-command's > prepare_shell_cmd() to skip shell invocations for "simple" commands. > ... > So I suspect if you added an extraneous semi-colon, your case would work > again (and that would confirm for us that this is indeed the problem). Wow, that's a brilliant analysis. > The optimization in run-command is very old, but the switch to > rebase--helper presumably moved us from doing that exec in the actual > shell script to doing it via the C code. > > Which means your exported-function technique has been broken for _most_ > of Git all along, but it just now affected this particular spot. > > I'm not sure how to feel about it. In the face of exported functions, we > can never do the shell-skipping optimization, because we don't know how > the shell is going to interpret even a simple-looking command. And it is > kind of a neat trick. But I also hate to add extra useless shell > invocations for the predominantly common case that people aren't using > this trick (or aren't even using a shell that supports function > exports). I was about to write this off as "an unfortunate regression, a fallout that will likely left unfixed, due to lack of a good practical workaround." The point of rewriting things in C and using run_command() interface was to avoid shell overhead. We are doing an exec already, but adding a shell invocation in the middle will double the number of exec (and probably add an extra fork as well), which probably is measurable on systems with slow fork/exec. The "semicolon" trick is way too obscure, but perhaps can be documented as an escape hatch?
Re: git rebase regression: cannot pass a shell expression directly to --exec
On Mon, May 15, 2017 at 11:25:03PM -0400, Jeff King wrote: > One hack would be to look for BASH_FUNC_* in the environment and disable > the optimization in that case. I think that would make your case Just > Work. It doesn't help other oddball cases, like: > > - you're trying to run a shell builtin that behaves differently than > its exec-able counterpart > > - your shell has some other mechanism for defining commands that we > would not find via exec. I don't know of one offhand. Obviously $ENV > could point to a file which defines some, but for most shells would > not read any startup files for a non-interactive "sh -c" invocation. So I was thinking something like the patch below, though I guess technically you could look for BASH_FUNC_$argv[0]%%, which seems to be bash's magic variable name. I hate to get too intimate with those details, though. Another option is to speculatively run "foo" without the shell, and if execve fails to find it, then fall back to running the shell. That would catch any number of cases where the shell "somehow" finds a command that we can't. You'd still have confusing behavior if your shell builtin behaved differently than the exec-able version, though (because we'd quietly use the exec-able one), but I would imagine that's exceedingly rare. I dunno. Maybe the whole thing is a fool's errand. diff --git a/run-command.c b/run-command.c index 1c02bfb2e..8328d27fb 100644 --- a/run-command.c +++ b/run-command.c @@ -240,12 +240,24 @@ int sane_execvp(const char *file, char * const argv[]) return -1; } +static int env_has_bash_functions(void) +{ + extern char **environ; + char **key; + + for (key = environ; *key; key++) + if (starts_with(*key, "BASH_FUNC_")) + return 1; + return 0; +} + static const char **prepare_shell_cmd(struct argv_array *out, const char **argv) { if (!argv[0]) die("BUG: shell command is empty"); - if (strcspn(argv[0], "|&;<>()$`\\\"' \t\n*?[#~=%") != strlen(argv[0])) { + if (strcspn(argv[0], "|&;<>()$`\\\"' \t\n*?[#~=%") != strlen(argv[0]) || + env_has_bash_functions()) { #ifndef GIT_WINDOWS_NATIVE argv_array_push(out, SHELL_PATH); #else
Re: git rebase regression: cannot pass a shell expression directly to --exec
On Mon, May 15, 2017 at 11:08:39AM -0700, Eric Rannaud wrote: > It used to be possible to run a sequence like: > > foo() { echo X; } > export -f foo > git rebase --exec foo HEAD~10 > > Since upgrading to 2.13.0, I had to update my scripts to run: > > git rebase --exec "bash -c foo" HEAD~10 Interesting. The "exec" string is still run with a shell. E.g.: $ git rebase --exec 'for i in 1 2 3; do echo >&2 $i; done' HEAD~5 Executing: for i in 1 2 3; do echo >&2 $i; done 1 2 3 Executing: for i in 1 2 3; do echo >&2 $i; done 1 2 3 [...and so on...] I wonder if this is falling afoul of the optimization in run-command's prepare_shell_cmd() to skip shell invocations for "simple" commands. E.g., if I do: strace -fe execve git rebase -x foo HEAD^ 2>&1 | grep foo I see: ... Executing: foo [pid 21820] execve("foo", ["foo"], [/* 57 vars */]) = -1 ENOENT (No such file or directory) fatal: cannot run foo: No such file or directory But if I were to add a shell meta-character, like this: strace -fe execve git rebase -x 'foo;' HEAD^ 2>&1 | grep foo then I see: ... Executing: foo; [pid 22569] execve("/bin/sh", ["/bin/sh", "-c", "foo;", "foo;"], [/* 57 vars */]) = 0 foo;: 1: foo;: foo: not found So I suspect if you added an extraneous semi-colon, your case would work again (and that would confirm for us that this is indeed the problem). > I'm not sure if this was an intended change. Bisecting with the > following script: > [...] > It points to this commit: > > commit 18633e1a22a68bbe8e6311a1039d13ebbf6fd041 (refs/bisect/bad) > Author: Johannes Schindelin > Date: Thu Feb 9 23:23:11 2017 +0100 > > rebase -i: use the rebase--helper builtin The optimization in run-command is very old, but the switch to rebase--helper presumably moved us from doing that exec in the actual shell script to doing it via the C code. Which means your exported-function technique has been broken for _most_ of Git all along, but it just now affected this particular spot. I'm not sure how to feel about it. In the face of exported functions, we can never do the shell-skipping optimization, because we don't know how the shell is going to interpret even a simple-looking command. And it is kind of a neat trick. But I also hate to add extra useless shell invocations for the predominantly common case that people aren't using this trick (or aren't even using a shell that supports function exports). One hack would be to look for BASH_FUNC_* in the environment and disable the optimization in that case. I think that would make your case Just Work. It doesn't help other oddball cases, like: - you're trying to run a shell builtin that behaves differently than its exec-able counterpart - your shell has some other mechanism for defining commands that we would not find via exec. I don't know of one offhand. Obviously $ENV could point to a file which defines some, but for most shells would not read any startup files for a non-interactive "sh -c" invocation. -Peff
git rebase regression: cannot pass a shell expression directly to --exec
Hi all, It used to be possible to run a sequence like: foo() { echo X; } export -f foo git rebase --exec foo HEAD~10 Since upgrading to 2.13.0, I had to update my scripts to run: git rebase --exec "bash -c foo" HEAD~10 I'm not sure if this was an intended change. Bisecting with the following script: #!/usr/bin/env bash make -j8 || exit 3 function foo() { echo OK } export -f foo pushd tmp ../git --exec-path=.. rebase --exec foo HEAD^^ ret=$? # Cleanup if failure ../git --exec-path=.. rebase --abort &> /dev/null popd exit $ret It points to this commit: commit 18633e1a22a68bbe8e6311a1039d13ebbf6fd041 (refs/bisect/bad) Author: Johannes Schindelin Date: Thu Feb 9 23:23:11 2017 +0100 rebase -i: use the rebase--helper builtin Now that the sequencer learned to process a "normal" interactive rebase, we use it. The original shell script is still used for "non-normal" interactive rebases, i.e. when --root or --preserve-merges was passed. Please note that the --root option (via the $squash_onto variable) needs special handling only for the very first command, hence it is still okay to use the helper upon continue/skip. Also please note that the --no-ff setting is volatile, i.e. when the interactive rebase is interrupted at any stage, there is no record of it. Therefore, we have to pass it from the shell script to the rebase--helper. Note: the test t3404 had to be adjusted because the the error messages produced by the sequencer comply with our current convention to start with a lower-case letter. Signed-off-by: Johannes Schindelin Signed-off-by: Junio C Hamano Thanks, Eric