Re: [PATCH 2/2] grep: use slash for path delimiter, not colon

2013-09-23 Thread Jeff King
On Sun, Sep 22, 2013 at 12:15:21PM -0700, Jonathon Mah wrote:

> A few days ago I came across the same "surprising" output of git-grep,
> tried to adjust the code to print "git show"-able object names, and
> ran into similar subtleties. I just found this thread, and Jeff's code
> handles more cases than mine did (I didn't try Phil's initial patch),
> but I can add some more test cases with non-showable output (again
> related to git-grep's path scoping):
> 

If you haven't read the side thread starting at [1], there are some
arguments that git-grep is doing the right thing already.  I think there
are a few issues at play here:

> $ git grep -l cache HEAD:./ | head -1
> HEAD:./:.gitignore

As you show, using a colon separator from a tree-ish that contains a
partial path looks bad.  The downside of turning this into a slash,
though, is that you lose the information of the tree-ish. See [2].

> $ cd Documentation
> $ git grep -l cache HEAD | head -1
> HEAD:CodingGuidelines

Grepping from a subdirectory produces relative paths that look like real
tree:path specifications, but aren't. Showing the full path would
potentially be cluttering if you are in a deep directory.  These days we
have the "./" syntax, though, so we could perhaps output:

  HEAD:./CodingGuidelines

which is succinct but can be used to access the path in the tree (and
makes more clear to the user that we have only grepped in the current
subdirectory).
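
To make that concrete, here is a minimal sketch of the formatting rule. The helper name format_hit and its in_subdir flag are made up for illustration and are not part of git; the flag stands for "grep was started with a non-empty prefix".

```c
#include <stdio.h>

/*
 * Hypothetical helper (not git code): build "<rev>:<path>" for a hit,
 * inserting "./" when grep ran from a subdirectory so that the result
 * stays a usable tree:path spec relative to the current directory.
 * Returns 0 on success, -1 if out is too small.
 */
static int format_hit(char *out, size_t outsz, const char *rev,
                      const char *relpath, int in_subdir)
{
    int n = snprintf(out, outsz, "%s:%s%s",
                     rev, in_subdir ? "./" : "", relpath);

    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Called with ("HEAD", "CodingGuidelines", 1) this yields the "HEAD:./CodingGuidelines" form shown above; with in_subdir set to 0 the output is unchanged from today.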

> $ git grep -l cache HEAD:Documentation/CodingGuidelines
> ../HEAD:Documentation/CodingGuidelines
> (woah!)

That one just seems nonsensical and buggy to me. We should not be
applying ".." at all to a blob spec like this.

-Peff

[1] http://thread.gmane.org/gmane.comp.version-control.git/232892/focus=232980

[2] http://thread.gmane.org/gmane.comp.version-control.git/232892/focus=233004
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] grep: use slash for path delimiter, not colon

2013-09-22 Thread Jonathon Mah
> > HEAD:/some/path/to/foo.txt
> > HEAD:some/path/to/foo.txt
> 
> With my patch it prints the latter.
> 
> This is because get_sha1_with_context("HEAD:"...) returns an empty
> 'path' string.  The code decides to use ':' as the delimiter in that
> case, but it sees there already is one at the end of "HEAD:".

A few days ago I came across the same "surprising" output of git-grep, tried to 
adjust the code to print "git show"-able object names, and ran into similar 
subtleties. I just found this thread, and Jeff's code handles more cases than 
mine did (I didn't try Phil's initial patch), but I can add some more test 
cases with non-showable output (again related to git-grep's path scoping):

$ git grep -l cache HEAD:./ | head -1
HEAD:./:.gitignore

$ cd Documentation
$ git grep -l cache HEAD | head -1
HEAD:CodingGuidelines

$ git grep -l cache HEAD:Documentation/CodingGuidelines
../HEAD:Documentation/CodingGuidelines
(woah!)

Sorry that I don't yet have anything useful to suggest! But I can tell the 
story of my use case:

I have a large repository (1.6GB bare) which I don't work on, but which 
contains code that I need to refer to. A checkout is ~600MB and 27k files, 
which I'd like to avoid (it's redundant data, and would slow down backups of my 
drive). I found myself "git-grep"ping through parts of the tree, looking 
through the results, and then "git-show"ing interesting files. Having a real 
object name in the grep output allows copy-and-paste of the object path.



Jonathon Mah
m...@jonathonmah.com




Re: [PATCH 2/2] grep: use slash for path delimiter, not colon

2013-08-26 Thread Jeff King
On Mon, Aug 26, 2013 at 05:03:04PM -0400, Phil Hord wrote:

> >   $ git grep -l foo HEAD | head -1
> >   HEAD:RelNotes/1.5.1.5.txt
> >
> > we still limit to the current directory, but the output does not note
> > this (it should be "HEAD:./RelNotes/1.5.1.5.txt"). I think this bug is
> > orthogonal to Phil's patch, though.
> 
> Maybe not.  My patch completes the assumption that the L:R value
> returned by grep is an object ref; but Junio still thought it wasn't.
> I think this is another case where his view was correct.

I certainly assumed it was, because in most cases it is. And something
like "HEAD:RelNotes/1.5.1.5.txt" certainly _looks_ like one, and is
generated by the current git. And what is the point of coming up with a
file listing if the names you return do not actually exist?

> There's more bad news on this front.
> 
> $ cd Documentation
> $ git grep -l foo HEAD .. | head -1
> HEAD:../.gitignore
> 
> That's not a valid ref, either (though maybe it could be).

Yes, though we seem to normalize paths already. So the other entries
from that command are (in git.git):

  HEAD:../.mailmap
  HEAD:RelNotes/1.5.1.5.txt

So we could either:

  1. Prepend the current path before normalizing to yield:

  HEAD:.mailmap
  HEAD:Documentation/RelNotes/1.5.1.5.txt

  2. Teach the get_sha1 path parser about "..", and prepend "./" when we
 are in a prefixed subdir.

  HEAD:./../.mailmap
  HEAD:./RelNotes/1.5.1.5.txt
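
A rough sketch of what option 1 would involve, as standalone C rather than anything taken from git: join the prefix with the relative path and resolve "." and ".." components. The function name join_and_normalize and the fixed 4096-byte scratch buffer are assumptions made for this example only.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not git code): join prefix and rel, then resolve
 * "." and ".." components. Returns 0 on success, -1 if the result would
 * climb above the top of the tree or does not fit in out.
 */
static int join_and_normalize(char *out, size_t outsz,
                              const char *prefix, const char *rel)
{
    char buf[4096];
    size_t len = 0;              /* length of the normalized path so far */
    const char *p;
    int n;

    n = snprintf(buf, sizeof(buf), "%s/%s", prefix, rel);
    if (n < 0 || (size_t)n >= sizeof(buf) || outsz == 0)
        return -1;

    out[0] = '\0';
    p = buf;
    while (*p) {
        size_t clen = strcspn(p, "/");   /* length of this component */

        if (clen == 0 || (clen == 1 && p[0] == '.')) {
            /* empty component or ".": nothing to add */
        } else if (clen == 2 && p[0] == '.' && p[1] == '.') {
            /* "..": drop the last component, if there is one */
            if (len == 0)
                return -1;
            while (len > 0 && out[len - 1] != '/')
                len--;
            if (len > 0)
                len--;           /* also drop the separating slash */
            out[len] = '\0';
        } else {
            /* ordinary component: append it, with a slash if needed */
            size_t need = len + (len ? 1 : 0) + clen + 1;

            if (need > outsz)
                return -1;
            if (len)
                out[len++] = '/';
            memcpy(out + len, p, clen);
            len += clen;
            out[len] = '\0';
        }
        p += clen;
        if (*p == '/')
            p++;
    }
    return 0;
}
```

With prefix "Documentation", the inputs "../.mailmap" and "RelNotes/1.5.1.5.txt" come out as ".mailmap" and "Documentation/RelNotes/1.5.1.5.txt", matching the two lines listed under option 1.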

-Peff


Re: [PATCH 2/2] grep: use slash for path delimiter, not colon

2013-08-26 Thread Phil Hord
On Mon, Aug 26, 2013 at 4:52 PM, Jeff King  wrote:
> On Mon, Aug 26, 2013 at 10:13:14PM +0200, Johannes Sixt wrote:
>
>> Am 26.08.2013 21:56, schrieb Jeff King:
>> > Also, prevent the delimiter being added twice, as happens now in these
>> > examples:
>> >
>> >   git grep -l foo HEAD:
>> >   HEAD::some/path/to/foo.txt
>> >        ^
>>
>> Which one of these two does it print then?
>>
>> HEAD:/some/path/to/foo.txt
>> HEAD:some/path/to/foo.txt
>
> It should (and does) print the latter.
>
> But I do note that our pathspec handling for subdirectories seems buggy.
> If you do:
>
>   $ cd Documentation
>   $ git grep -l foo | head -1
>   RelNotes/1.5.1.5.txt
>
> that's fine; we limit to the current directory. But then if you do:
>
>   $ git grep -l foo HEAD | head -1
>   HEAD:RelNotes/1.5.1.5.txt
>
> we still limit to the current directory, but the output does not note
> this (it should be "HEAD:./RelNotes/1.5.1.5.txt"). I think this bug is
> orthogonal to Phil's patch, though.

Maybe not.  My patch completes the assumption that the L:R value
returned by grep is an object ref; but Junio still thought it wasn't.
I think this is another case where his view was correct.

There's more bad news on this front.

$ cd Documentation
$ git grep -l foo HEAD .. | head -1
HEAD:../.gitignore

That's not a valid ref, either (though maybe it could be).

Phil


Re: [PATCH 2/2] grep: use slash for path delimiter, not colon

2013-08-26 Thread Jeff King
On Mon, Aug 26, 2013 at 10:13:14PM +0200, Johannes Sixt wrote:

> Am 26.08.2013 21:56, schrieb Jeff King:
> > Also, prevent the delimiter being added twice, as happens now in these
> > examples:
> > 
> >   git grep -l foo HEAD:
> >   HEAD::some/path/to/foo.txt
> >        ^
> 
> Which one of these two does it print then?
> 
> HEAD:/some/path/to/foo.txt
> HEAD:some/path/to/foo.txt

It should (and does) print the latter.

But I do note that our pathspec handling for subdirectories seems buggy.
If you do:

  $ cd Documentation
  $ git grep -l foo | head -1
  RelNotes/1.5.1.5.txt

that's fine; we limit to the current directory. But then if you do:

  $ git grep -l foo HEAD | head -1
  HEAD:RelNotes/1.5.1.5.txt

we still limit to the current directory, but the output does not note
this (it should be "HEAD:./RelNotes/1.5.1.5.txt"). I think this bug is
orthogonal to Phil's patch, though.

-Peff


Re: [PATCH 2/2] grep: use slash for path delimiter, not colon

2013-08-26 Thread Phil Hord
On Mon, Aug 26, 2013 at 4:13 PM, Johannes Sixt  wrote:
> Am 26.08.2013 21:56, schrieb Jeff King:
>> Also, prevent the delimiter being added twice, as happens now in these
>> examples:
>>
>>   git grep -l foo HEAD:
>>   HEAD::some/path/to/foo.txt
>>        ^
>
> Which one of these two does it print then?
>
> HEAD:/some/path/to/foo.txt
> HEAD:some/path/to/foo.txt


With my patch it prints the latter.

This is because get_sha1_with_context("HEAD:"...) returns an empty
'path' string.  The code decides to use ':' as the delimiter in that
case, but it sees there already is one at the end of "HEAD:".
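
Roughly, the rule can be sketched outside of git like this. The helper name format_grep_base is invented for this example; only the has_path flag and the "skip the delimiter if the name already ends with it" check mirror what the patch does with strbuf.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not git code): write "<name><delim>" into out,
 * skipping the delimiter when name already ends with it.
 * has_path mirrors the patch's flag: nonzero when the argument named a
 * path inside a tree (e.g. "HEAD:some"), zero for a bare rev ("HEAD")
 * or a rev with an empty path ("HEAD:").
 * Returns 0 on success, -1 if out is too small.
 */
static int format_grep_base(char *out, size_t outsz,
                            const char *name, int has_path)
{
    char delim = has_path ? '/' : ':';
    size_t len = strlen(name);
    int need_delim = len > 0 && name[len - 1] != delim;
    int n;

    if (need_delim)
        n = snprintf(out, outsz, "%s%c", name, delim);
    else
        n = snprintf(out, outsz, "%s", name);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

So "HEAD" and "HEAD:" both give the base "HEAD:", while "HEAD:some" and "HEAD:some/" both give "HEAD:some/"; the matched path is then appended to that base.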

Phil


Re: [PATCH 2/2] grep: use slash for path delimiter, not colon

2013-08-26 Thread Johannes Sixt
Am 26.08.2013 21:56, schrieb Jeff King:
> Also, prevent the delimiter being added twice, as happens now in these
> examples:
> 
>   git grep -l foo HEAD:
>   HEAD::some/path/to/foo.txt
>        ^

Which one of these two does it print then?

HEAD:/some/path/to/foo.txt
HEAD:some/path/to/foo.txt

-- Hannes



[PATCH 2/2] grep: use slash for path delimiter, not colon

2013-08-26 Thread Jeff King
From: Phil Hord 

When a commit is grepped and matching filenames are printed, grep-objects
creates the filename by prefixing the original cmdline argument to the
matched path separated by a colon.  Normally this forms a valid blob
reference to the filename, like this:

  git grep -l foo HEAD
  HEAD:some/path/to/foo.txt
      ^

But a tree path may be given to grep instead; in this case the colon is
not a valid delimiter to use since it is placed inside a path.

  git grep -l foo HEAD:some
  HEAD:some:path/to/foo.txt
           ^

The slash path delimiter should be used instead.  Fix git grep to
discern the correct delimiter so it can report valid object names.

  git grep -l foo HEAD:some
  HEAD:some/path/to/foo.txt
           ^

Also, prevent the delimiter being added twice, as happens now in these
examples:

  git grep -l foo HEAD:
  HEAD::some/path/to/foo.txt
       ^
  git grep -l foo HEAD:some/
  HEAD:some/:path/to/foo.txt
            ^

Add a test to confirm correct path forming.

Signed-off-by: Jeff King 
---
I left the author as you, since you have done all the hard work; this is
really just me rebasing your patch on top of mine. But note that you did
not signoff the original.

 builtin/grep.c  | 13 +++++++++----
 t/t7810-grep.sh | 15 +++++++++++++++
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index ee47d49..2df7986 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -26,6 +26,7 @@ struct object_to_grep {
 struct object_to_grep {
 	struct object *item;
 	const char *name;
+	unsigned has_path:1;
 };
 
 static int use_threads = 1;
@@ -462,7 +463,7 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
 }
 
 static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
-		       struct object *obj, const char *name)
+		       struct object *obj, const char *name, char delimiter)
 {
 	if (obj->type == OBJ_BLOB)
 		return grep_sha1(opt, obj->sha1, name, 0, NULL);
@@ -485,7 +486,8 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
 		strbuf_init(&base, PATH_MAX + len + 1);
 		if (len) {
 			strbuf_add(&base, name, len);
-			strbuf_addch(&base, ':');
+			if (name[len-1] != delimiter)
+				strbuf_addch(&base, delimiter);
 		}
 		init_tree_desc(&tree, data, size);
 		hit = grep_tree(opt, pathspec, &tree, &base, base.len,
@@ -506,7 +508,8 @@ static int grep_objects(struct grep_opt *opt, const struct pathspec *pathspec,
 	for (i = 0; i < nr; i++) {
 		struct object *real_obj;
 		real_obj = deref_tag(list[i].item, NULL, 0);
-		if (grep_object(opt, pathspec, real_obj, list[i].name)) {
+		if (grep_object(opt, pathspec, real_obj, list[i].name,
+				list[i].has_path ? '/' : ':')) {
 			hit = 1;
 			if (opt->status_only)
 				break;
@@ -822,8 +825,9 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	for (i = 0; i < argc; i++) {
 		const char *arg = argv[i];
 		unsigned char sha1[20];
+		struct object_context oc;
 		/* Is it a rev? */
-		if (!get_sha1(arg, sha1)) {
+		if (!get_sha1_with_context(arg, 0, sha1, &oc)) {
 			struct object *object = parse_object_or_die(sha1, arg);
 			if (!seen_dashdash)
 				verify_non_filename(prefix, arg);
@@ -831,6 +835,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			ALLOC_GROW(list, list_nr+1, list_alloc);
 			list[list_nr].item = object;
 			list[list_nr].name = arg;
+			list[list_nr].has_path = !!oc.path[0];
 			list_nr++;
 			continue;
 		}
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index f698001..2494bfc 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -886,6 +886,21 @@ cat >expected actual &&
+   test_cmp expected actual
+'
+
+cat >expected