Re: [RFC/PATCH] recognize pathspec magic without "--" disambiguation

2017-05-27 Thread Junio C Hamano
Jeff King  writes:

> Yeah. That's a good reason not to use ":/" without a disambiguating "--"
> (which _does_ work, even without my series, because check_filename()
> does specific path-matching). At best, you pay for a complete useless
> history traversal before the command actually starts running. But much
> more likely is that Git just complains of ambiguity, because people tend
> to mention top-level paths in their commit messages. E.g.:
>
>   $ cd t
>   $ git grep foo :/Documentation
>   fatal: ambiguous argument ':/Documentation': both revision and filename
>
> So it really is a pretty horrible syntax.

It does not allow it to be further annotated like "HEAD^{/^Merge
branch 'foo'}^2".  Yes, I agree that ":/string" was a pretty poor
design that was done without thinking things through.

But back then in 2007, it is a bit unfair to blame the initial
design for not thinking things through.  There weren't as many
precedents, good or bad, to learn from than we have today ;-)



Re: [RFC/PATCH] recognize pathspec magic without "--" disambiguation

2017-05-27 Thread Jeff King
On Sat, May 27, 2017 at 11:54:07AM +0200, Ævar Arnfjörð Bjarmason wrote:

> On Thu, May 25, 2017 at 5:27 PM, Jeff King  wrote:
> > git log :/foo.*bar
> 
> Another option would be to deprecate the :/rx syntax over some period
> in favor of ^{/rx}.

Yeah, the latter is more flexible (can start at a tip of your choosing)
and syntactically matches other object selectors much better.

> I think it's too ugly to live, and really useless. It's equivalent to
> "--grep= --all". Does anyone use this and not really mean to use
> ^{/rx}? E.g. "git show :/fix" might show a fix on some unrelated
> branch you recently rebased.

It's actually _worse_ than that --grep because it only picks one result.
So not only do we look at unrelated branches, but they may take
precedence (based on commit timestamp) over the current branch.

It is shorter than "HEAD^{/re}", though. So I suspect people do use it
and would be slightly annoyed if it went away.

> >   will be treated as a pathspec (if it doesn't match a
> >   commit message) due to the wildcard matching in
> >   28fcc0b71.
> 
> So it might DWYM after hanging there looking at your entire history
> for a commit message matching foo.*bar? And if you make such a commit
> it'll start meaning something else entirely?

Yeah. That's a good reason not to use ":/" without a disambiguating "--"
(which _does_ work, even without my series, because check_filename()
does specific path-matching). At best, you pay for a complete useless
history traversal before the command actually starts running. But much
more likely is that Git just complains of ambiguity, because people tend
to mention top-level paths in their commit messages. E.g.:

  $ cd t
  $ git grep foo :/Documentation
  fatal: ambiguous argument ':/Documentation': both revision and filename

So it really is a pretty horrible syntax.

-Peff


Re: [RFC/PATCH] recognize pathspec magic without "--" disambiguation

2017-05-27 Thread Ævar Arnfjörð Bjarmason
On Thu, May 25, 2017 at 5:27 PM, Jeff King  wrote:
> git log :/foo.*bar

Another option would be to deprecate the :/rx syntax over some period
in favor of ^{/rx}.

I think it's too ugly to live, and really useless. It's equivalent to
"--grep= --all". Does anyone use this and not really mean to use
^{/rx}? E.g. "git show :/fix" might show a fix on some unrelated
branch you recently rebased.

>   will be treated as a pathspec (if it doesn't match a
>   commit message) due to the wildcard matching in
>   28fcc0b71.

So it might DWYM after hanging there looking at your entire history
for a commit message matching foo.*bar? And if you make such a commit
it'll start meaning something else entirely?


Re: [RFC/PATCH] recognize pathspec magic without "--" disambiguation

2017-05-26 Thread Jeff King
On Fri, May 26, 2017 at 11:13:55AM +0900, Junio C Hamano wrote:

> >- git log :^foo
> >   (when "^foo" exists in your index)
> >
> >   The same logic applies; it would continue to work. And
> >   ditto for any other weird filenames in your index like
> >   "(attr)foo".
> 
> "git show :$path" where $path happens to be "^foo" would grab the
> contents of the $path out of the index and I think that is what you
> meant, but use of "git log" in the example made me scratch my head
> greatly.

Yeah, sorry about that confusion. My original motivation for this was
for use with git-grep, but since the existing tests all used "git log",
I tried to switch to that for consistency. But index paths obviously
make no sense with "git log" (we _do_ still parse it the same as we
would for git-show, etc, even though git-log doesn't do anything with
it).

As an aside, I wonder if git-log should issue a warning when there are
pending objects that it doesn't handle.

> >- git log :/foo
> >   (when "foo" does _not_ match a commit message)
> > ...
> >   This same downside actually exists currently when you
> >   have an asterisk in your regex. E.g.,
> >
> > git log :/foo.*bar
> >
> >   will be treated as a pathspec (if it doesn't match a
> >   commit message) due to the wildcard matching in
> >   28fcc0b71.
> 
> In other words, we are not making things worse?

We're making them slightly worse. The "problem" used to trigger with
":/foo.*bar", and now it would trigger with ":/foobar", too. I don't
know if that's a meaningful distinction, though.

> Unlike "git log builtin-checkout.c" that errors out (only because
> there is no such file in the checkout of the current version) and
> makes its solution obvious to the users, this change has the risk of
> silently accepting an ambiguous input and computing result that is
> different from what the user intended to.  So I am not sure.  

Right, that's what I was trying to catalogue in the commit message.

> As you pointedout, ":/" especially does look like a likely point of
> failure, in that both "this is path at the top" pathspec magic and
> "the commit with this string" are not something that we can say with
> confidence that are rarely used because they are so esoteric.
> 
> As to "is it OK to build a rule that we cannot explain easily?", I
> think it is OK to say "if it is not a rev, and if it is not a
> pathname in the current working tree, you must disambiguate, but Git
> helps by guessing in some cases---if you want to have more control
> (e.g. you are a script), explicitly disambiguate and you'd be OK",
> and leave the "some cases" vague, as long as we are only making
> reasonably conservative guesses.

I don't know. That is punting on the issue by not making any promises.
But that's a small consolation to somebody who does:

  $ git log :^foo
  [ok, works great...]
  $ git log :/foo
  fatal: ambiguous argument ':/foo': unknown revision or path not in the 
working tree.
  [stupid git! why don't you work?]

An explanation like "it's complicated, and the rules are meant to be
conservative and avoid confusion" does not really help Git's reputation
for being arcane.

I dunno. The one saving grace of ":/" is that we actually do handle
the ":/foo" case completely in check_filename(). So going back to my
:/foobar example: if you have a file "foobar" in your root directory, it
_is_ considered ambiguous.

So if that was the one exception, it would probably be OK in practice.

Which again makes me feel a bit like the right solution is doing the
existence checks on the non-magic portion of the pathspec for more magic
besides ":/".

Unfortunately, since writing my last message I did look into asking the
pathspec code to parse the arguments for us, but I think it would take
quite a bit of refactoring. It's very ready to die() or emit warnings on
bogus pathspecs, so it's not a good match for this kind of speculative
"is it a pathspec" test.

The middle ground would be to replicate a simple subset of the rules in
verify_filename(). If we assume that all arguments with ":(" are
pathspecs (which seem rather unlikely to have false positives), then
that leaves only a few short-magic patterns: :/, :!, and :^.

We already specially handle :/ here. So it would really just be adding
the other two (which are just aliases of each other). It's not that much
code. The dirty thing is just that we're replicating some of the
pathspec logic, and any new magic would have to be updated here, too.

I'll see if I can work up a patch.

-Peff


Re: [RFC/PATCH] recognize pathspec magic without "--" disambiguation

2017-05-25 Thread Junio C Hamano
Jeff King  writes:

> But let's consider related invocations and whether we're
> making them better or worse:
>
>- git log :/foo
>   (when "foo" matches a commit message)
>
>   This one should remain the same. Like the existing
>   wildcard rule, we're touching only verify_filename(),
>   not verify_non_filename(). So cases that _do_ resolve
>   as a rev will continue to do so.
>
>- git log :^foo
>   (when "^foo" exists in your index)
>
>   The same logic applies; it would continue to work. And
>   ditto for any other weird filenames in your index like
>   "(attr)foo".

"git show :$path" where $path happens to be "^foo" would grab the
contents of the $path out of the index and I think that is what you
meant, but use of "git log" in the example made me scratch my head
greatly.

>- git log :/foo
>   (when "foo" does _not_ match a commit message)
>   ...
>   This same downside actually exists currently when you
>   have an asterisk in your regex. E.g.,
>
> git log :/foo.*bar
>
>   will be treated as a pathspec (if it doesn't match a
>   commit message) due to the wildcard matching in
>   28fcc0b71.

In other words, we are not making things worse?

> I wrote all the above to try to convince myself that this
> doesn't have any serious regression cases. And I think I
> did.
>
> But I actually we could make the rules in alternative (2)
> above work. check_filename() would ask the pathspec code to
> parse each argument and get one of three results:
>
>   1. it's not pathspec magic; treat it like a filename
>  (e.g., just "foo", or even bogus stuff like ":%foo")
>
>   2. it is pathspec magic, and here is the matching filename
>  that ought to exist (e.g., "foo" from ":^foo" or
>  ":(exclude)foo")
>
>   3. it is pathspec magic, but there's no matching filename.
>  Assume it's a pathspec (e.g., "(attr)foo").
>
> I'm on the fence on whether it's worth the trouble versus
> the simple rule implemented by this patch.

Unlike "git log builtin-checkout.c" that errors out (only because
there is no such file in the checkout of the current version) and
makes its solution obvious to the users, this change has the risk of
silently accepting an ambiguous input and computing result that is
different from what the user intended to.  So I am not sure.  

As you pointedout, ":/" especially does look like a likely point of
failure, in that both "this is path at the top" pathspec magic and
"the commit with this string" are not something that we can say with
confidence that are rarely used because they are so esoteric.

As to "is it OK to build a rule that we cannot explain easily?", I
think it is OK to say "if it is not a rev, and if it is not a
pathname in the current working tree, you must disambiguate, but Git
helps by guessing in some cases---if you want to have more control
(e.g. you are a script), explicitly disambiguate and you'd be OK",
and leave the "some cases" vague, as long as we are only making
reasonably conservative guesses.


[RFC/PATCH] recognize pathspec magic without "--" disambiguation

2017-05-25 Thread Jeff King
For commands that take revisions and pathspecs, magic
pathspecs like ":^Documentation" or ":/Documentation" have
to appear on the right-hand side of the disambiguating "--",
like:

  git log -- :^Documentation

This makes them more annoying to use than they need to be.
We loosened the rules for wildcards in 28fcc0b71 (pathspec:
avoid the need of "--" when wildcard is used, 2015-05-02).
Let's do the same for arguments that look like pathspec
magic (colon followed by any punctuation).

The obvious and desired impact is that you can now do:

  git log :^Documentation

But let's consider related invocations and whether we're
making them better or worse:

   - git log :/foo
  (when "foo" matches a commit message)

  This one should remain the same. Like the existing
  wildcard rule, we're touching only verify_filename(),
  not verify_non_filename(). So cases that _do_ resolve
  as a rev will continue to do so.

   - git log :^foo
  (when "^foo" exists in your index)

  The same logic applies; it would continue to work. And
  ditto for any other weird filenames in your index like
  "(attr)foo".

   - git log :/foo
  (when "foo" does _not_ match a commit message)

  We won't treat this as a revision, because it doesn't
  match anything. So prior to this patch, we'd either
  treat it as a path (if "foo" does exist at the root of
  the project) or complain "this isn't a rev, and nor is
  it a path".  Whereas after this patch, we'd always
  treat it like a path, whether it exists or not (so in
  the second case instead of an error, you're likely to
  get an empty "log", unless "foo" really did exist
  somewhere in your history). So this is a potential
  downside; if the user intended a search of the commit
  messages, they may prefer the error message to the
  attempted pathspec match.

  This same downside actually exists currently when you
  have an asterisk in your regex. E.g.,

git log :/foo.*bar

  will be treated as a pathspec (if it doesn't match a
  commit message) due to the wildcard matching in
  28fcc0b71.

   - git log :^foo
   (when "^foo" isn't in your index)

  The same outcomes apply here as above, but this
  "downside" is very unlikely to occur, as "^foo" is
  such a funny name (and again, things like "(attr)foo"
  as a filename are sufficiently uncommon not to worry
  about).

   - git log :%foo
  (or any other pathspec magic char that doesn't exist)

  The new check doesn't actually parse the pathspec
  magic, but allows any punctuation (which includes the
  long-form ":(magic)"). At first glance this seems
  overly permissive, but it actually yields a better
  error message here: instead of complaining that
  ":%foo" is neither a rev nor a path, we treat it as a
  pathspec and complain that "%" is not a known magic
  (the same as we would if the "--" were given).

  Of course, the flip side here is when you really
  meant a file named "%foo" in your index, but it didn't
  exist. That seems reasonably unlikely (and the error
  message is clear enough to point the user in the right
  direction).

So the collateral damage doesn't seem too bad (it's really
just the case where :/foo doesn't match anything). There are
two possibilities for reducing that:

  1. Instead of accepting all pathspec magic, just allow
 certain ones like ":^" which are unlikely to be used to
 specify a revision (or alternatively, just disallow
 potentially confusing ones like ":/").

 That works, but it makes the rules inconsistent and
 confusing for the user (i.e., "--" is sometimes needed
 for magic and sometimes not).

  2. Instead of recognizing pathspec magic, teach
 check_filename() to parse off the filename bit and see
 if that exists (e.g., for "^foo" see if "foo" exists).
 We already do this for ":/", but it's done in a very
 ad-hoc way. We parse it ourselves, rather than relying
 on the pathspec code, and handle only "/" magic, and
 not even its ":(top)" long form.

 That could be fixed by asking the pathspec code to
 parse it for us (including all magic), and then trying
 to match the resulting name against the working tree.
 But not all pathspec magic actually results in a
 pathname. For instance, when should ":(attr)foo" be
 valid? We can't just look for "foo" in the filesystem
 (it's an attribute, not a pathname).

 By comparison, recognizing things that look like
 pathspec magic is a simple and easy to understand rule.

Signed-off-by: Jeff King 
---
I wrote all the above to try to convince myself that this
doesn't have any serious regression cases. And I think I
did.

But I actually we could make the rules in alternative (2)
above work. check_filename() would ask the pathspec code to
parse each argument and get one of three results:

  1