Re: [RFC/PATCH] recognize pathspec magic without "--" disambiguation
Jeff King writes: > Yeah. That's a good reason not to use ":/" without a disambiguating "--" > (which _does_ work, even without my series, because check_filename() > does specific path-matching). At best, you pay for a complete useless > history traversal before the command actually starts running. But much > more likely is that Git just complains of ambiguity, because people tend > to mention top-level paths in their commit messages. E.g.: > > $ cd t > $ git grep foo :/Documentation > fatal: ambiguous argument ':/Documentation': both revision and filename > > So it really is a pretty horrible syntax. It does not allow it to be further annotated like "HEAD^{/^Merge branch 'foo'}^2". Yes, I agree that ":/string" was a pretty poor design that was done without thinking things through. But back then in 2007, it is a bit unfair to blame the initial design for not thinking things through. There weren't as many precedents, good or bad, to learn from than we have today ;-)
Re: [RFC/PATCH] recognize pathspec magic without "--" disambiguation
On Sat, May 27, 2017 at 11:54:07AM +0200, Ævar Arnfjörð Bjarmason wrote: > On Thu, May 25, 2017 at 5:27 PM, Jeff King wrote: > > git log :/foo.*bar > > Another option would be to deprecate the :/rx syntax over some period > in favor of ^{/rx}. Yeah, the latter is more flexible (can start at a tip of your choosing) and syntactically matches other object selectors much better. > I think it's too ugly to live, and really useless. It's equivalent to > "--grep= --all". Does anyone use this and not really mean to use > ^{/rx}? E.g. "git show :/fix" might show a fix on some unrelated > branch you recently rebased. It's actually _worse_ than that --grep because it only picks one result. So not only do we look at unrelated branches, but they may take precedence (based on commit timestamp) over the current branch. It is shorter than "HEAD^{/re}", though. So I suspect people do use it and would be slightly annoyed if it went away. > > will be treated as a pathspec (if it doesn't match a > > commit message) due to the wildcard matching in > > 28fcc0b71. > > So it might DWYM after hanging there looking at your entire history > for a commit message matching foo.*bar? And if you make such a commit > it'll start meaning something else entirely? Yeah. That's a good reason not to use ":/" without a disambiguating "--" (which _does_ work, even without my series, because check_filename() does specific path-matching). At best, you pay for a complete useless history traversal before the command actually starts running. But much more likely is that Git just complains of ambiguity, because people tend to mention top-level paths in their commit messages. E.g.: $ cd t $ git grep foo :/Documentation fatal: ambiguous argument ':/Documentation': both revision and filename So it really is a pretty horrible syntax. -Peff
Re: [RFC/PATCH] recognize pathspec magic without "--" disambiguation
On Thu, May 25, 2017 at 5:27 PM, Jeff King wrote: > git log :/foo.*bar Another option would be to deprecate the :/rx syntax over some period in favor of ^{/rx}. I think it's too ugly to live, and really useless. It's equivalent to "--grep= --all". Does anyone use this and not really mean to use ^{/rx}? E.g. "git show :/fix" might show a fix on some unrelated branch you recently rebased. > will be treated as a pathspec (if it doesn't match a > commit message) due to the wildcard matching in > 28fcc0b71. So it might DWYM after hanging there looking at your entire history for a commit message matching foo.*bar? And if you make such a commit it'll start meaning something else entirely?
Re: [RFC/PATCH] recognize pathspec magic without "--" disambiguation
On Fri, May 26, 2017 at 11:13:55AM +0900, Junio C Hamano wrote: > >- git log :^foo > > (when "^foo" exists in your index) > > > > The same logic applies; it would continue to work. And > > ditto for any other weird filenames in your index like > > "(attr)foo". > > "git show :$path" where $path happens to be "^foo" would grab the > contents of the $path out of the index and I think that is what you > meant, but use of "git log" in the example made me scratch my head > greatly. Yeah, sorry about that confusion. My original motivation for this was for use with git-grep, but since the existing tests all used "git log", I tried to switch to that for consistency. But index paths obviously make no sense with "git log" (we _do_ still parse it the same as we would for git-show, etc, even though git-log doesn't do anything with it). As an aside, I wonder if git-log should issue a warning when there are pending objects that it doesn't handle. > >- git log :/foo > > (when "foo" does _not_ match a commit message) > > ... > > This same downside actually exists currently when you > > have an asterisk in your regex. E.g., > > > > git log :/foo.*bar > > > > will be treated as a pathspec (if it doesn't match a > > commit message) due to the wildcard matching in > > 28fcc0b71. > > In other words, we are not making things worse? We're making them slightly worse. The "problem" used to trigger with ":/foo.*bar", and now it would trigger with ":/foobar", too. I don't know if that's a meaningful distinction, though. > Unlike "git log builtin-checkout.c" that errors out (only because > there is no such file in the checkout of the current version) and > makes its solution obvious to the users, this change has the risk of > silently accepting an ambiguous input and computing result that is > different from what the user intended to. So I am not sure. Right, that's what I was trying to catalogue in the commit message. > As you pointedout, ":/" especially does look like a likely point of > failure, in that both "this is path at the top" pathspec magic and > "the commit with this string" are not something that we can say with > confidence that are rarely used because they are so esoteric. > > As to "is it OK to build a rule that we cannot explain easily?", I > think it is OK to say "if it is not a rev, and if it is not a > pathname in the current working tree, you must disambiguate, but Git > helps by guessing in some cases---if you want to have more control > (e.g. you are a script), explicitly disambiguate and you'd be OK", > and leave the "some cases" vague, as long as we are only making > reasonably conservative guesses. I don't know. That is punting on the issue by not making any promises. But that's a small consolation to somebody who does: $ git log :^foo [ok, works great...] $ git log :/foo fatal: ambiguous argument ':/foo': unknown revision or path not in the working tree. [stupid git! why don't you work?] An explanation like "it's complicated, and the rules are meant to be conservative and avoid confusion" does not really help Git's reputation for being arcane. I dunno. The one saving grace of ":/" is that we actually do handle the ":/foo" case completely in check_filename(). So going back to my :/foobar example: if you have a file "foobar" in your root directory, it _is_ considered ambiguous. So if that was the one exception, it would probably be OK in practice. Which again makes me feel a bit like the right solution is doing the existence checks on the non-magic portion of the pathspec for more magic besides ":/". Unfortunately, since writing my last message I did look into asking the pathspec code to parse the arguments for us, but I think it would take quite a bit of refactoring. It's very ready to die() or emit warnings on bogus pathspecs, so it's not a good match for this kind of speculative "is it a pathspec" test. The middle ground would be to replicate a simple subset of the rules in verify_filename(). If we assume that all arguments with ":(" are pathspecs (which seem rather unlikely to have false positives), then that leaves only a few short-magic patterns: :/, :!, and :^. We already specially handle :/ here. So it would really just be adding the other two (which are just aliases of each other). It's not that much code. The dirty thing is just that we're replicating some of the pathspec logic, and any new magic would have to be updated here, too. I'll see if I can work up a patch. -Peff
Re: [RFC/PATCH] recognize pathspec magic without "--" disambiguation
Jeff King writes: > But let's consider related invocations and whether we're > making them better or worse: > >- git log :/foo > (when "foo" matches a commit message) > > This one should remain the same. Like the existing > wildcard rule, we're touching only verify_filename(), > not verify_non_filename(). So cases that _do_ resolve > as a rev will continue to do so. > >- git log :^foo > (when "^foo" exists in your index) > > The same logic applies; it would continue to work. And > ditto for any other weird filenames in your index like > "(attr)foo". "git show :$path" where $path happens to be "^foo" would grab the contents of the $path out of the index and I think that is what you meant, but use of "git log" in the example made me scratch my head greatly. >- git log :/foo > (when "foo" does _not_ match a commit message) > ... > This same downside actually exists currently when you > have an asterisk in your regex. E.g., > > git log :/foo.*bar > > will be treated as a pathspec (if it doesn't match a > commit message) due to the wildcard matching in > 28fcc0b71. In other words, we are not making things worse? > I wrote all the above to try to convince myself that this > doesn't have any serious regression cases. And I think I > did. > > But I actually we could make the rules in alternative (2) > above work. check_filename() would ask the pathspec code to > parse each argument and get one of three results: > > 1. it's not pathspec magic; treat it like a filename > (e.g., just "foo", or even bogus stuff like ":%foo") > > 2. it is pathspec magic, and here is the matching filename > that ought to exist (e.g., "foo" from ":^foo" or > ":(exclude)foo") > > 3. it is pathspec magic, but there's no matching filename. > Assume it's a pathspec (e.g., "(attr)foo"). > > I'm on the fence on whether it's worth the trouble versus > the simple rule implemented by this patch. Unlike "git log builtin-checkout.c" that errors out (only because there is no such file in the checkout of the current version) and makes its solution obvious to the users, this change has the risk of silently accepting an ambiguous input and computing result that is different from what the user intended to. So I am not sure. As you pointedout, ":/" especially does look like a likely point of failure, in that both "this is path at the top" pathspec magic and "the commit with this string" are not something that we can say with confidence that are rarely used because they are so esoteric. As to "is it OK to build a rule that we cannot explain easily?", I think it is OK to say "if it is not a rev, and if it is not a pathname in the current working tree, you must disambiguate, but Git helps by guessing in some cases---if you want to have more control (e.g. you are a script), explicitly disambiguate and you'd be OK", and leave the "some cases" vague, as long as we are only making reasonably conservative guesses.
[RFC/PATCH] recognize pathspec magic without "--" disambiguation
For commands that take revisions and pathspecs, magic pathspecs like ":^Documentation" or ":/Documentation" have to appear on the right-hand side of the disambiguating "--", like: git log -- :^Documentation This makes them more annoying to use than they need to be. We loosened the rules for wildcards in 28fcc0b71 (pathspec: avoid the need of "--" when wildcard is used, 2015-05-02). Let's do the same for arguments that look like pathspec magic (colon followed by any punctuation). The obvious and desired impact is that you can now do: git log :^Documentation But let's consider related invocations and whether we're making them better or worse: - git log :/foo (when "foo" matches a commit message) This one should remain the same. Like the existing wildcard rule, we're touching only verify_filename(), not verify_non_filename(). So cases that _do_ resolve as a rev will continue to do so. - git log :^foo (when "^foo" exists in your index) The same logic applies; it would continue to work. And ditto for any other weird filenames in your index like "(attr)foo". - git log :/foo (when "foo" does _not_ match a commit message) We won't treat this as a revision, because it doesn't match anything. So prior to this patch, we'd either treat it as a path (if "foo" does exist at the root of the project) or complain "this isn't a rev, and nor is it a path". Whereas after this patch, we'd always treat it like a path, whether it exists or not (so in the second case instead of an error, you're likely to get an empty "log", unless "foo" really did exist somewhere in your history). So this is a potential downside; if the user intended a search of the commit messages, they may prefer the error message to the attempted pathspec match. This same downside actually exists currently when you have an asterisk in your regex. E.g., git log :/foo.*bar will be treated as a pathspec (if it doesn't match a commit message) due to the wildcard matching in 28fcc0b71. - git log :^foo (when "^foo" isn't in your index) The same outcomes apply here as above, but this "downside" is very unlikely to occur, as "^foo" is such a funny name (and again, things like "(attr)foo" as a filename are sufficiently uncommon not to worry about). - git log :%foo (or any other pathspec magic char that doesn't exist) The new check doesn't actually parse the pathspec magic, but allows any punctuation (which includes the long-form ":(magic)"). At first glance this seems overly permissive, but it actually yields a better error message here: instead of complaining that ":%foo" is neither a rev nor a path, we treat it as a pathspec and complain that "%" is not a known magic (the same as we would if the "--" were given). Of course, the flip side here is when you really meant a file named "%foo" in your index, but it didn't exist. That seems reasonably unlikely (and the error message is clear enough to point the user in the right direction). So the collateral damage doesn't seem too bad (it's really just the case where :/foo doesn't match anything). There are two possibilities for reducing that: 1. Instead of accepting all pathspec magic, just allow certain ones like ":^" which are unlikely to be used to specify a revision (or alternatively, just disallow potentially confusing ones like ":/"). That works, but it makes the rules inconsistent and confusing for the user (i.e., "--" is sometimes needed for magic and sometimes not). 2. Instead of recognizing pathspec magic, teach check_filename() to parse off the filename bit and see if that exists (e.g., for "^foo" see if "foo" exists). We already do this for ":/", but it's done in a very ad-hoc way. We parse it ourselves, rather than relying on the pathspec code, and handle only "/" magic, and not even its ":(top)" long form. That could be fixed by asking the pathspec code to parse it for us (including all magic), and then trying to match the resulting name against the working tree. But not all pathspec magic actually results in a pathname. For instance, when should ":(attr)foo" be valid? We can't just look for "foo" in the filesystem (it's an attribute, not a pathname). By comparison, recognizing things that look like pathspec magic is a simple and easy to understand rule. Signed-off-by: Jeff King --- I wrote all the above to try to convince myself that this doesn't have any serious regression cases. And I think I did. But I actually we could make the rules in alternative (2) above work. check_filename() would ask the pathspec code to parse each argument and get one of three results: 1