Re: [PATCH 2/2] grep: use slash for path delimiter, not colon
On Sun, Sep 22, 2013 at 12:15:21PM -0700, Jonathon Mah wrote:

> A few days ago I came across the same "surprising" output of git-grep,
> tried to adjust the code to print "git show"-able object names, and
> ran into similar subtleties. I just found this thread, and Jeff's code
> handles more cases than mine did (I didn't try Phil's initial patch),
> but I can add some more test cases with non-showable output (again
> related to git-grep's path scoping):

If you haven't read the side thread starting at [1], there are some
arguments that git-grep is doing the right thing already.

I think there are a few issues at play here:

>   $ git grep -l cache HEAD:./ | head -1
>   HEAD:./:.gitignore

As you show, using a colon separator from a tree-ish that contains a
partial path looks bad. The downside of turning this into a slash,
though, is that you lose the information of the tree-ish. See [2].

>   $ cd Documentation
>   $ git grep -l cache HEAD | head -1
>   HEAD:CodingGuidelines

Grepping from a subdirectory produces relative paths that look like
real tree:path specifications, but aren't. Showing the full path would
potentially be cluttering if you are in a deep directory. These days we
have the "./" syntax, though, so we could perhaps output:

  HEAD:./CodingGuidelines

which is succinct but can be used to access the path in the tree (and
makes it clearer to the user that we have only grepped in the current
subdirectory).

>   $ git grep -l cache HEAD:Documentation/CodingGuidelines
>   ../HEAD:Documentation/CodingGuidelines
>   (woah!)

That one just seems nonsensical and buggy to me. We should not be
applying ".." at all to a blob spec like this.

-Peff

[1] http://thread.gmane.org/gmane.comp.version-control.git/232892/focus=232980
[2] http://thread.gmane.org/gmane.comp.version-control.git/232892/focus=233004
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
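The "./" output Peff suggests above could come from something as small as the sketch below. It is not code from git or from this thread: format_match_name() and its in_subdir flag are hypothetical, and the sketch assumes the caller already has the match path relative to the directory grep was run from.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): build the name that
 * "git grep -l" could print for a match in tree-ish REV. PATH is
 * relative to the directory grep was run from, and IN_SUBDIR is
 * non-zero when that directory is below the top of the tree.
 * Returns the length snprintf() would have written.
 */
static int format_match_name(char *out, size_t out_size, const char *rev,
			     const char *path, int in_subdir)
{
	/* Avoid "REV:././PATH" if the caller already passed a "./" path. */
	if (in_subdir && !strncmp(path, "./", 2))
		path += 2;
	/*
	 * "REV:./PATH" is resolved relative to the current directory,
	 * so the name stays short but still works with "git show".
	 */
	return snprintf(out, out_size, "%s:%s%s", rev,
			in_subdir ? "./" : "", path);
}
```

Run from Documentation/, a match on CodingGuidelines in HEAD would be printed as HEAD:./CodingGuidelines; from the top level the same file stays HEAD:Documentation/CodingGuidelines.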
Re: [PATCH 2/2] grep: use slash for path delimiter, not colon
> >   HEAD:/some/path/to/foo.txt
> >   HEAD:some/path/to/foo.txt
>
> With my patch it prints the latter.
>
> This is because get_sha1_with_context("HEAD:"...) returns an empty
> 'path' string. The code decides to use ':' as the delimiter in that
> case, but it sees there already is one at the end of "HEAD:".

A few days ago I came across the same "surprising" output of git-grep,
tried to adjust the code to print "git show"-able object names, and
ran into similar subtleties. I just found this thread, and Jeff's code
handles more cases than mine did (I didn't try Phil's initial patch),
but I can add some more test cases with non-showable output (again
related to git-grep's path scoping):

  $ git grep -l cache HEAD:./ | head -1
  HEAD:./:.gitignore

  $ cd Documentation
  $ git grep -l cache HEAD | head -1
  HEAD:CodingGuidelines

  $ git grep -l cache HEAD:Documentation/CodingGuidelines
  ../HEAD:Documentation/CodingGuidelines
  (woah!)

Sorry that I don't yet have anything useful to suggest! But I can tell
the story of my use case: I have a large repository (1.6GB bare) which
I don't work on, but which contains code that I need to refer to. A
checkout is ~600MB and 27k files, which I'd like to avoid (it's
redundant data, and would slow down backups of my drive). I found
myself "git-grep"ping through parts of the tree, looking through the
results, and then "git-show"ing interesting files. Having a real object
name in the grep output allows copy-and-paste of the object path.

Jonathon Mah
m...@jonathonmah.com
Re: [PATCH 2/2] grep: use slash for path delimiter, not colon
On Mon, Aug 26, 2013 at 05:03:04PM -0400, Phil Hord wrote:

> >   $ git grep -l foo HEAD | head -1
> >   HEAD:RelNotes/1.5.1.5.txt
> >
> > we still limit to the current directory, but the output does not note
> > this (it should be "HEAD:./RelNotes/1.5.1.5.txt"). I think this bug is
> > orthogonal to Phil's patch, though.
>
> Maybe not. My patch completes the assumption that the L:R value
> returned by grep is an object ref; but Junio still thought it wasn't.
> I think this is another case where his view was correct.

I certainly assumed it was, because in most cases it is. And something
like "HEAD:RelNotes/1.5.1.5.txt" certainly _looks_ like one, and is
generated by the current git. And what is the point of coming up with a
file listing if the names you return do not actually exist?

> There's more bad news on this front.
>
>   $ cd Documentation
>   $ git grep -l foo HEAD .. | head -1
>   HEAD:../.gitignore
>
> That's not a valid ref, either (though maybe it could be).

Yes, though we seem to normalize paths already. So the other entries
from that command are (in git.git):

  HEAD:../.mailmap
  HEAD:RelNotes/1.5.1.5.txt

So we could either:

  1. Prepend the current path before normalizing to yield:

       HEAD:.mailmap
       HEAD:Documentation/RelNotes/1.5.1.5.txt

  2. Teach the get_sha1 path parser about "..", and prepend "./" when
     we are in a prefixed subdir.

       HEAD:./../.mailmap
       HEAD:./RelNotes/1.5.1.5.txt

-Peff
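Option 1 amounts to joining the subdirectory prefix with the relative path and collapsing "." and ".." components before printing. A minimal sketch, assuming a hypothetical join_and_normalize() helper and a fixed 4096-byte scratch buffer; it is not git's own path-normalization code.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not git's API): join the subdirectory PREFIX
 * (e.g. "Documentation/") with PATH, which is relative to it (e.g.
 * "../.mailmap"), then collapse "." and ".." components so the result
 * is relative to the top of the tree (".mailmap").
 * Returns 0 on success, or -1 if PATH climbs above the top of the
 * tree or the result does not fit in OUT.
 */
static int join_and_normalize(char *out, size_t out_size,
			      const char *prefix, const char *path)
{
	char buf[4096];		/* assumed large enough for one path */
	size_t len = 0;		/* length of the result so far */
	const char *p;
	int n_written = snprintf(buf, sizeof(buf), "%s%s", prefix, path);

	if (n_written < 0 || (size_t)n_written >= sizeof(buf) || !out_size)
		return -1;
	out[0] = '\0';

	for (p = buf; *p; ) {
		size_t n = strcspn(p, "/");	/* length of this component */

		if (n == 2 && p[0] == '.' && p[1] == '.') {
			/* "..": drop the last component already emitted */
			char *slash;

			if (!len)
				return -1;
			slash = strrchr(out, '/');
			len = slash ? (size_t)(slash - out) : 0;
			out[len] = '\0';
		} else if (n && !(n == 1 && p[0] == '.')) {
			/* ordinary component: append it, '/'-separated */
			if (len + !!len + n + 1 > out_size)
				return -1;
			if (len)
				out[len++] = '/';
			memcpy(out + len, p, n);
			len += n;
			out[len] = '\0';
		}
		/* empty and "." components are simply skipped */
		p += n;
		if (*p == '/')
			p++;
	}
	return 0;
}
```

Prefixing the result with "HEAD:" gives HEAD:.mailmap and HEAD:Documentation/RelNotes/1.5.1.5.txt, the names listed under option 1. Returning -1 when ".." climbs above the top reflects that such a name has no tree:path form at all.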
Re: [PATCH 2/2] grep: use slash for path delimiter, not colon
On Mon, Aug 26, 2013 at 4:52 PM, Jeff King wrote:
> On Mon, Aug 26, 2013 at 10:13:14PM +0200, Johannes Sixt wrote:
>
>> On 26.08.2013 21:56, Jeff King wrote:
>> > Also, prevent the delimiter being added twice, as happens now in these
>> > examples:
>> >
>> >   git grep -l foo HEAD:
>> >   HEAD::some/path/to/foo.txt
>> >        ^
>>
>> Which one of these two does it print then?
>>
>>   HEAD:/some/path/to/foo.txt
>>   HEAD:some/path/to/foo.txt
>
> It should (and does) print the latter.
>
> But I do note that our pathspec handling for subdirectories seems buggy.
> If you do:
>
>   $ cd Documentation
>   $ git grep -l foo | head -1
>   RelNotes/1.5.1.5.txt
>
> that's fine; we limit to the current directory. But then if you do:
>
>   $ git grep -l foo HEAD | head -1
>   HEAD:RelNotes/1.5.1.5.txt
>
> we still limit to the current directory, but the output does not note
> this (it should be "HEAD:./RelNotes/1.5.1.5.txt"). I think this bug is
> orthogonal to Phil's patch, though.

Maybe not. My patch completes the assumption that the L:R value
returned by grep is an object ref; but Junio still thought it wasn't.
I think this is another case where his view was correct.

There's more bad news on this front.

  $ cd Documentation
  $ git grep -l foo HEAD .. | head -1
  HEAD:../.gitignore

That's not a valid ref, either (though maybe it could be).

Phil
Re: [PATCH 2/2] grep: use slash for path delimiter, not colon
On Mon, Aug 26, 2013 at 10:13:14PM +0200, Johannes Sixt wrote:

> On 26.08.2013 21:56, Jeff King wrote:
> > Also, prevent the delimiter being added twice, as happens now in these
> > examples:
> >
> >   git grep -l foo HEAD:
> >   HEAD::some/path/to/foo.txt
> >        ^
>
> Which one of these two does it print then?
>
>   HEAD:/some/path/to/foo.txt
>   HEAD:some/path/to/foo.txt

It should (and does) print the latter.

But I do note that our pathspec handling for subdirectories seems buggy.
If you do:

  $ cd Documentation
  $ git grep -l foo | head -1
  RelNotes/1.5.1.5.txt

that's fine; we limit to the current directory. But then if you do:

  $ git grep -l foo HEAD | head -1
  HEAD:RelNotes/1.5.1.5.txt

we still limit to the current directory, but the output does not note
this (it should be "HEAD:./RelNotes/1.5.1.5.txt"). I think this bug is
orthogonal to Phil's patch, though.

-Peff
Re: [PATCH 2/2] grep: use slash for path delimiter, not colon
On Mon, Aug 26, 2013 at 4:13 PM, Johannes Sixt wrote:
> On 26.08.2013 21:56, Jeff King wrote:
>> Also, prevent the delimiter being added twice, as happens now in these
>> examples:
>>
>>   git grep -l foo HEAD:
>>   HEAD::some/path/to/foo.txt
>>        ^
>
> Which one of these two does it print then?
>
>   HEAD:/some/path/to/foo.txt
>   HEAD:some/path/to/foo.txt

With my patch it prints the latter.

This is because get_sha1_with_context("HEAD:"...) returns an empty
'path' string. The code decides to use ':' as the delimiter in that
case, but it sees there already is one at the end of "HEAD:".

Phil
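The delimiter rule Phil describes can be exercised outside git with a standalone re-creation of the patch's logic. The function name format_grep_name() is hypothetical, and has_path stands in for the patch's !!oc.path[0]; in the real patch the prefix is built in a strbuf inside grep_object() and the tree walk appends the matching paths.

```c
#include <stdio.h>
#include <string.h>

/*
 * Standalone re-creation (for illustration only) of the delimiter
 * rule in the patch. NAME is the revision argument as typed (e.g.
 * "HEAD:some"); HAS_PATH plays the role of the patch's
 * !!oc.path[0], i.e. whether NAME already names a path inside the
 * tree; MATCH is the matching file's path relative to that tree.
 */
static void format_grep_name(char *out, size_t out_size, const char *name,
			     int has_path, const char *match)
{
	/* Continue an existing tree path with '/', else start one with ':'. */
	char delimiter = has_path ? '/' : ':';
	char sep[2] = { '\0', '\0' };
	size_t len = strlen(name);

	/* Skip the delimiter if NAME already ends with it ("HEAD:", "HEAD:some/"). */
	if (len && name[len - 1] != delimiter)
		sep[0] = delimiter;
	snprintf(out, out_size, "%s%s%s", name, sep, match);
}
```

For "HEAD:" the lookup reports an empty path, so the delimiter is ':' and, because the name already ends in ':', nothing is added. All four command lines from the commit message (HEAD, HEAD:some, HEAD:, HEAD:some/) then print HEAD:some/path/to/foo.txt for the same file.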
Re: [PATCH 2/2] grep: use slash for path delimiter, not colon
On 26.08.2013 21:56, Jeff King wrote:
> Also, prevent the delimiter being added twice, as happens now in these
> examples:
>
>   git grep -l foo HEAD:
>   HEAD::some/path/to/foo.txt
>        ^

Which one of these two does it print then?

  HEAD:/some/path/to/foo.txt
  HEAD:some/path/to/foo.txt

-- 
Hannes
[PATCH 2/2] grep: use slash for path delimiter, not colon
From: Phil Hord

When a commit is grepped and matching filenames are printed, grep-objects
creates the filename by prefixing the original cmdline argument to the
matched path separated by a colon. Normally this forms a valid blob
reference to the filename, like this:

  git grep -l foo HEAD
  HEAD:some/path/to/foo.txt
      ^

But a tree path may be given to grep instead; in this case the colon is
not a valid delimiter to use since it is placed inside a path.

  git grep -l foo HEAD:some
  HEAD:some:path/to/foo.txt
           ^

The slash path delimiter should be used instead. Fix git grep to
discern the correct delimiter so it can report valid object names.

  git grep -l foo HEAD:some
  HEAD:some/path/to/foo.txt
           ^

Also, prevent the delimiter being added twice, as happens now in these
examples:

  git grep -l foo HEAD:
  HEAD::some/path/to/foo.txt
       ^
  git grep -l foo HEAD:some/
  HEAD:some/:path/to/foo.txt
            ^

Add a test to confirm correct path forming.

Signed-off-by: Jeff King
---
I left the author as you, since you have done all the hard work; this
is really just me rebasing your patch on top of mine. But note that you
did not sign off the original.

 builtin/grep.c  | 13 +++++++++----
 t/t7810-grep.sh | 15 +++++++++++++++
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index ee47d49..2df7986 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -26,6 +26,7 @@ struct object_to_grep {
 struct object_to_grep {
 	struct object *item;
 	const char *name;
+	unsigned has_path:1;
 };
 
 static int use_threads = 1;
@@ -462,7 +463,7 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
 }
 
 static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
-		       struct object *obj, const char *name)
+		       struct object *obj, const char *name, char delimiter)
 {
 	if (obj->type == OBJ_BLOB)
 		return grep_sha1(opt, obj->sha1, name, 0, NULL);
@@ -485,7 +486,8 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
 		strbuf_init(&base, PATH_MAX + len + 1);
 		if (len) {
 			strbuf_add(&base, name, len);
-			strbuf_addch(&base, ':');
+			if (name[len-1] != delimiter)
+				strbuf_addch(&base, delimiter);
 		}
 		init_tree_desc(&tree, data, size);
 		hit = grep_tree(opt, pathspec, &tree, &base, base.len,
@@ -506,7 +508,8 @@ static int grep_objects(struct grep_opt *opt, const struct pathspec *pathspec,
 	for (i = 0; i < nr; i++) {
 		struct object *real_obj;
 		real_obj = deref_tag(list[i].item, NULL, 0);
-		if (grep_object(opt, pathspec, real_obj, list[i].name)) {
+		if (grep_object(opt, pathspec, real_obj, list[i].name,
+				list[i].has_path ? '/' : ':')) {
 			hit = 1;
 			if (opt->status_only)
 				break;
@@ -822,8 +825,9 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	for (i = 0; i < argc; i++) {
 		const char *arg = argv[i];
 		unsigned char sha1[20];
+		struct object_context oc;
 		/* Is it a rev? */
-		if (!get_sha1(arg, sha1)) {
+		if (!get_sha1_with_context(arg, 0, sha1, &oc)) {
 			struct object *object = parse_object_or_die(sha1, arg);
 			if (!seen_dashdash)
 				verify_non_filename(prefix, arg);
@@ -831,6 +835,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			ALLOC_GROW(list, list_nr+1, list_alloc);
 			list[list_nr].item = object;
 			list[list_nr].name = arg;
+			list[list_nr].has_path = !!oc.path[0];
 			list_nr++;
 			continue;
 		}
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index f698001..2494bfc 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -886,6 +886,21 @@ cat >expectedactual &&
+	test_cmp expected actual
+'
+
+cat >expected