[PATCHv2] grep: use slash for path delimiter, not colon

2013-08-26 Thread Phil Hord
When a commit is grepped and matching filenames are printed, grep-objects
creates the filename by prefixing the original cmdline argument to the
matched path separated by a colon.  Normally this forms a valid blob
reference to the filename, like this:

  git grep -l foo HEAD
  HEAD:some/path/to/foo.txt
      ^

But a tree path may be given to grep instead; in this case the colon is
not a valid delimiter to use since it is placed inside a path.

  git grep -l foo HEAD:some
  HEAD:some:path/to/foo.txt
           ^

The slash path delimiter should be used instead.  Fix git grep to
discern the correct delimiter so it can report valid object names.

  git grep -l foo HEAD:some
  HEAD:some/path/to/foo.txt
           ^

Also, prevent the delimiter being added twice, as happens now in these
examples:

  git grep -l foo HEAD:
  HEAD::some/path/to/foo.txt
       ^
  git grep -l foo HEAD:some/
  HEAD:some/:path/to/foo.txt
            ^

Add a test to confirm correct path forming.
---
This version is a bit more deterministic and also adds a test.

It accepts the expense of examining the path argument again to 
determine if it is a tree-ish + path rather than just a tree (commit).
The get_sha1 call occurs one extra time for each tree-ish argument,
so it's not expensive. We avoid mucking with the object_array API this
way, and also do not rely on the object-type to tell us anything about
the way the object name was spelled.

This one also adds a check to avoid duplicating an extant delimiter.
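
The delimiter rule can be sketched in isolation. This is a minimal
standalone illustration, not the code in builtin/grep.c: format_base()
and its has_path flag are hypothetical names, and the caller is assumed
to already know whether the argument was spelled as tree-ish plus path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of git: copy "name" into out and append
 * the joining delimiter. has_path says whether the name already contains
 * a path component (e.g. "HEAD:some"); the caller must determine that.
 * Returns the length written, or -1 if out is too small.
 */
static int format_base(char *out, size_t outlen, const char *name, int has_path)
{
	size_t len = strlen(name);
	char delimiter = has_path ? '/' : ':';

	assert(out != NULL);
	if (len + 2 > outlen)
		return -1;
	memcpy(out, name, len);
	/* Skip the delimiter when the name already ends with it. */
	if (len && name[len - 1] != delimiter)
		out[len++] = delimiter;
	out[len] = '\0';
	return (int)len;
}
```

With these rules "HEAD" becomes "HEAD:", "HEAD:some" becomes
"HEAD:some/", and "HEAD:" or "HEAD:some/" are left unchanged.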

 builtin/grep.c  |  9 ++++++++-
 t/t7810-grep.sh | 15 +++++++++++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 03bc442..6fc418f 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -480,8 +480,15 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
 		len = name ? strlen(name) : 0;
 		strbuf_init(&base, PATH_MAX + len + 1);
 		if (len) {
+			struct object_context ctx;
+			unsigned char sha1[20];
+			char delimiter = ':';
+			if (!get_sha1_with_context(name, 0, sha1, &ctx) &&
+				ctx.path[0]!=0)
+				delimiter='/';
 			strbuf_add(&base, name, len);
-			strbuf_addch(&base, ':');
+			if (name[len-1] != delimiter)
+				strbuf_addch(&base, delimiter);
 		}
 		init_tree_desc(&tree, data, size);
 		hit = grep_tree(opt, pathspec, &tree, &base, base.len,
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index f698001..2494bfc 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -886,6 +886,21 @@ test_expect_success 'grep -e -- -- path' '
 '
 
 cat >expected <<EOF
+HEAD:t/a/v:1:vvv
+HEAD:t/v:1:vvv
+EOF
+
+test_expect_success 'grep HEAD -- path/' '
+	git grep -n -e vvv HEAD -- t/ >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'grep HEAD:path' '
+	git grep -n -e vvv HEAD:t/ >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<EOF
 hello.c:int main(int argc, const char **argv)
 hello.c:	printf("Hello world.\n");
 EOF
-- 
1.8.4.557.g34b3a2e

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] grep: use slash for path delimiter, not colon

2013-08-26 Thread Jeff King
On Mon, Aug 26, 2013 at 10:46:12AM -0400, Phil Hord wrote:

> This version is a bit more deterministic and also adds a test.
>
> It accepts the expense of examining the path argument again to
> determine if it is a tree-ish + path rather than just a tree (commit).
> The get_sha1 call occurs one extra time for each tree-ish argument,
> so it's not expensive.

I don't like this approach in general because it lacks atomicity. IOW,
the thing you are looking up may change between the two get_sha1 calls.
You're _almost_ good here because you don't actually care what the
second call returns, but only which features it _would_ have used. But
you may see the second call fail because the ref doesn't exist anymore,
or points to a different tree, and you will erroneously use : instead
of /.

I admit this is not that likely, but I'd really rather avoid introducing
such races if we can.

> We avoid mucking with the object_array API this way, and also do not
> rely on the object-type to tell us anything about the way the object
> name was spelled.

Changing the object_array API would be hard, but I don't think we need
to do it here. Can we simply stop using object_array to pass the list,
and instead just have a custom list?

I'll see how painful that is.
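
A rough sketch of what such a custom list could look like, assuming the
"was a path given" fact is recorded once, when the argument is first
resolved, so that no second lookup is needed later. All names here
(grep_target, grep_target_list, has_path) are hypothetical and are not
taken from git or from any posted series.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry: what a single lookup of one argument told us. */
struct grep_target {
	char *name;	/* the argument as spelled on the command line */
	int has_path;	/* nonzero if it was given as <tree-ish>:<path> */
};

struct grep_target_list {
	struct grep_target *items;
	size_t nr, alloc;
};

/* Append a copy of name and its flag; returns 0 on success, -1 on OOM. */
static int grep_target_list_add(struct grep_target_list *list,
				const char *name, int has_path)
{
	size_t len;
	char *copy;

	assert(list != NULL && name != NULL);
	if (list->nr == list->alloc) {
		size_t alloc = list->alloc ? 2 * list->alloc : 4;
		struct grep_target *items =
			realloc(list->items, alloc * sizeof(*items));
		if (!items)
			return -1;
		list->items = items;
		list->alloc = alloc;
	}
	len = strlen(name) + 1;
	copy = malloc(len);
	if (!copy)
		return -1;
	memcpy(copy, name, len);
	list->items[list->nr].name = copy;
	list->items[list->nr].has_path = has_path;
	list->nr++;
	return 0;
}

/* Release every entry and reset the list to empty. */
static void grep_target_list_clear(struct grep_target_list *list)
{
	size_t i;

	for (i = 0; i < list->nr; i++)
		free(list->items[i].name);
	free(list->items);
	list->items = NULL;
	list->nr = list->alloc = 0;
}
```

The point of carrying has_path alongside the name is that the delimiter
choice then depends only on what the first lookup saw, so a ref that
moves or disappears afterwards cannot change the output format.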

-Peff


Re: [PATCHv2] grep: use slash for path delimiter, not colon

2013-08-26 Thread Junio C Hamano
Jeff King p...@peff.net writes:

> On Mon, Aug 26, 2013 at 03:28:26PM -0400, Jeff King wrote:
>
> > Changing the object_array API would be hard, but I don't think we need
> > to do it here. Can we simply stop using object_array to pass the list,
> > and instead just have a custom list?
> >
> > I'll see how painful that is.
>
> Not very, I think. Here's the series.
>
>   [1/2]: grep: stop using object_array
>   [2/2]: grep: use slash for path delimiter, not colon

I agree that if we were to do this, these patches show a reasonable
approach to do so.

I however am not yet convinced if its output is necessarily better
X-<.