Re: [PATCH 2/2] builtin/grep.c: teach '-o', '--only-matching' to 'git-grep'

2018-05-10 Thread Jeff King
On Wed, May 09, 2018 at 07:00:10PM -0700, Taylor Blau wrote:

> >  - they test with context (-C3) for us. It looks like GNU grep omits
> >context lines with "-o", but we show a bunch of blank lines. This is
> >I guess a bug (though it seems kind of an odd combination to specify
> >in the first place)
> 
> I'm torn on what to do here. Currently, with -C3, I get a bunch of lines
> like:
> 
>   -
> 
> Which I think is technically _right_, but I agree that it is certainly
> an odd combination of things to give to 'git grep'. I think that we
> could either:
> 
>   1. Continue outputting blank lines,
>   2. Ignore '-C' with '-o', or
>   3. die() when given this combination.
> 
> I think that (1) is the most "correct" (for some definition), so I'm
> inclined to adopt that approach. I suppose that (2) is closer to what
> GNU grep offers, and that is the point of this series, so perhaps it
> makes sense to pick that instead.
> 
> I'll go with (2) for now, but I would appreciate others' thoughts before
> sending a subsequent re-roll of this series.

We talked about this off-list, so I want to summarize here for the
archive.

It turns out that the GNU grep behavior isn't universal.  Here's what I
get:

  $ grep -o -C3 the README.md
  the
  the
  the
  the
  the
  the
  the
  the
  --
  the
  the
  the
  the
  the
  the
  the
  the
  the
  the
  the
  the
  the
  the

So that's not _quite_ ignoring -C. It's still showing the "--", which
indicates that the first chunk are all matches within 6 lines of each
other (so their context is melded into a single hunk), but it omits the
lines entirely. Which at least seems like it could be _plausibly_
useful.

BSD grep seems to actually show the context lines. Which is more
information, but if you want to actually see the context, why are you
using "-o" in the first place?

So my general feeling is that disallowing the combination is probably
fine, because it's a vaguely crazy thing to ask for. But that GNU grep's
behavior is the most likely to actually be useful. The BSD behavior
seems more like "this is what we happen to produce" to me.

And it should be pretty easy to reproduce the GNU grep behavior by just
not outputting anything in show_line().

-Peff


Re: [PATCH 2/2] builtin/grep.c: teach '-o', '--only-matching' to 'git-grep'

2018-05-09 Thread Taylor Blau
On Tue, May 08, 2018 at 01:25:17PM -0400, Jeff King wrote:
> On Sat, May 05, 2018 at 08:49:43AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> > > +test_expect_success 'grep --only-matching --heading' '
> > > + git grep --only-matching --heading --line-number --column mmap file 
> > > >actual &&
> > > + test_cmp expected actual
> > > +'
> > > +
> > >  cat >expected < > >  hello.c
> > >  4:int main(int argc, const char **argv)
> >
> > We should test this a lot more, I think a good way to do that would be
> > to extend this series by first importing GNU grep's -o tests, see
> > http://git.savannah.gnu.org/cgit/grep.git/tree/tests/foad1 they are
> > license-compatible. Then change the grep_test() function to call git
> > grep instead.
>
> I'm trying to figure out what GNU grep's tests are actually checking
> that we don't have. I see:
>
>  - they check that "-i" returns the actual found string in its original
>case. This seems like a subset of finding a non-trivial regex. I.e.,
>"foo.*" should find "foobar". We probably should have a test like
>that.
>
>  - they test multiple hits on the same line, which seems like an
>important and easy-to-screw-up case; we do that, too.

Agree.

>  - they test everything other thing with and without "-i" because those
>are apparently separate code paths in GNU grep, but I don't think
>that applies to us.
>
>  - they test each case with "-b", but we don't have that (we do test
>with "--column", which is good)

Right, I think that we can ignore these groups.

>  - they test with "-n", which we do here (we don't test without, but
>that seems like an unlikely bug, knowing how it is implemented)

Fair, let's leave that as is.

>  - they test with -H, but that is already the default for git-grep

Good, we can ignore this one.

>  - they test with context (-C3) for us. It looks like GNU grep omits
>context lines with "-o", but we show a bunch of blank lines. This is
>I guess a bug (though it seems kind of an odd combination to specify
>in the first place)

I'm torn on what to do here. Currently, with -C3, I get a bunch of lines
like:

  -

Which I think is technically _right_, but I agree that it is certainly
an odd combination of things to give to 'git grep'. I think that we
could either:

  1. Continue outputting blank lines,
  2. Ignore '-C' with '-o', or
  3. die() when given this combination.

I think that (1) is the most "correct" (for some definition), so I'm
inclined to adopt that approach. I suppose that (2) is closer to what
GNU grep offers, and that is the point of this series, so perhaps it
makes sense to pick that instead.

I'll go with (2) for now, but I would appreciate others' thoughts before
sending a subsequent re-roll of this series.

> So I think it probably makes sense to just pick through the list I just
> wrote and write our own tests than to try to import GNU grep's specific
> tests (and there's a ton of other unrelated tests in that file that may
> or may not even run with git-grep).

I agree. Since some of these cases are already covered, and some don't
have analogues, I think that it is most sensible to go through the above
and add _those_, instead of porting the whole test suite over from GNU.

> > It should also be tested with the various grep.patternType options to
> > make sure it works with basic, extended, perl, fixed etc.
>
> Hmm, this code is all external to the actual matching. So unless one of
> those is totally screwing up the regmatch_t return, I'm not sure that's
> accomplishing much (and if one of them is, it's totally broken for
> coloring, "-c", and maybe other features).
>
> We've usually taken a pretty white-box approach to our testing, covering
> things that seem likely to go wrong given the general problem space and
> our implementation. But maybe I'm missing something in particular that
> you think might be tricky.
>
> -Peff

Thanks,
Taylor


Re: [PATCH 2/2] builtin/grep.c: teach '-o', '--only-matching' to 'git-grep'

2018-05-08 Thread Jeff King
On Sat, May 05, 2018 at 08:49:43AM +0200, Ævar Arnfjörð Bjarmason wrote:

> > +test_expect_success 'grep --only-matching --heading' '
> > +   git grep --only-matching --heading --line-number --column mmap file 
> > >actual &&
> > +   test_cmp expected actual
> > +'
> > +
> >  cat >expected < >  hello.c
> >  4:int main(int argc, const char **argv)
> 
> We should test this a lot more, I think a good way to do that would be
> to extend this series by first importing GNU grep's -o tests, see
> http://git.savannah.gnu.org/cgit/grep.git/tree/tests/foad1 they are
> license-compatible. Then change the grep_test() function to call git
> grep instead.

I'm trying to figure out what GNU grep's tests are actually checking
that we don't have. I see:

 - they check that "-i" returns the actual found string in its original
   case. This seems like a subset of finding a non-trivial regex. I.e.,
   "foo.*" should find "foobar". We probably should have a test like
   that.

 - they test multiple hits on the same line, which seems like an
   important and easy-to-screw-up case; we do that, too.

 - they test everything other thing with and without "-i" because those
   are apparently separate code paths in GNU grep, but I don't think
   that applies to us.

 - they test each case with "-b", but we don't have that (we do test
   with "--column", which is good)

 - they test with "-n", which we do here (we don't test without, but
   that seems like an unlikely bug, knowing how it is implemented)

 - they test with -H, but that is already the default for git-grep

 - they test with context (-C3) for us. It looks like GNU grep omits
   context lines with "-o", but we show a bunch of blank lines. This is
   I guess a bug (though it seems kind of an odd combination to specify
   in the first place)

So I think it probably makes sense to just pick through the list I just
wrote and write our own tests than to try to import GNU grep's specific
tests (and there's a ton of other unrelated tests in that file that may
or may not even run with git-grep).

> It should also be tested with the various grep.patternType options to
> make sure it works with basic, extended, perl, fixed etc.

Hmm, this code is all external to the actual matching. So unless one of
those is totally screwing up the regmatch_t return, I'm not sure that's
accomplishing much (and if one of them is, it's totally broken for
coloring, "-c", and maybe other features).

We've usually taken a pretty white-box approach to our testing, covering
things that seem likely to go wrong given the general problem space and
our implementation. But maybe I'm missing something in particular that
you think might be tricky.

-Peff


Re: [PATCH 2/2] builtin/grep.c: teach '-o', '--only-matching' to 'git-grep'

2018-05-07 Thread Taylor Blau
On Sat, May 05, 2018 at 03:36:12AM -0400, Eric Sunshine wrote:
> On Sat, May 5, 2018 at 12:03 AM, Taylor Blau  wrote:
> > Teach GNU grep(1)'s '-o' ('--only-matching') to 'git-grep'. This option
> > prints only the matching components of each line. It writes multiple
> > lines if more than one match exists on a given line.
> >
> > For example:
> >
> >   $ git grep -on --column --heading git -- README.md | head -3
> >   README.md
> >   15:56:git
> >   18:20:git
> >
> > By using show_line_header(), 'git grep --only-matching' correctly
> > respects the '--header' option:
>
> What is the '--header' option? I don't see it used in any example.

I think '--header' is a typo for '--heading', which is used in the
following example.

> >   $ git grep -on --column --heading git -- README.md | head -4
> >   README.md
> >   15:56:git
> >   18:20:git
> >   19:16:git
>
> How does this example differ from the earlier example (other than
> showing 4 lines of output rather than 3)?

Ack. I clipped from my terminal what I meant to be the seocnd
example, and pasted it in for both examples. They are meant to be as
follows:

  1. 'git grep' without heading, showing the full line prefix, and
  2. 'git grep' with heading, showing the file heading with '--heading'.

The later has '| head -n4' on the end to include 3+1 lines (3 matches, 1
heading) whereas the former has '| head -n3' to include 3 lines (3
matches, no heading).

I have updated my patch locally to reflect this.


Thanks,
Taylor


Re: [PATCH 2/2] builtin/grep.c: teach '-o', '--only-matching' to 'git-grep'

2018-05-05 Thread Eric Sunshine
On Sat, May 5, 2018 at 12:03 AM, Taylor Blau  wrote:
> Teach GNU grep(1)'s '-o' ('--only-matching') to 'git-grep'. This option
> prints only the matching components of each line. It writes multiple
> lines if more than one match exists on a given line.
>
> For example:
>
>   $ git grep -on --column --heading git -- README.md | head -3
>   README.md
>   15:56:git
>   18:20:git
>
> By using show_line_header(), 'git grep --only-matching' correctly
> respects the '--header' option:

What is the '--header' option? I don't see it used in any example.

>   $ git grep -on --column --heading git -- README.md | head -4
>   README.md
>   15:56:git
>   18:20:git
>   19:16:git

How does this example differ from the earlier example (other than
showing 4 lines of output rather than 3)?

> Signed-off-by: Taylor Blau 


Re: [PATCH 2/2] builtin/grep.c: teach '-o', '--only-matching' to 'git-grep'

2018-05-05 Thread Ævar Arnfjörð Bjarmason

On Sat, May 05 2018, Taylor Blau wrote:

> +--o::
> +--only-matching::
> + Show only the matching part of the lines.
> +

Makes sense to steal GNU grep's description here:

Print only the matched (non-empty) parts of a matching line, with
each such part on a separate output line.

> + if (!opt->only_matching)
> + output_color(opt, bol, match.rm_so, line_color);

This should also have braces, see "When there are multiple arms to a
conditional" in Documentation/CodingGuidelines.


>  '
>
> +cat >expected < +file:1:5:mmap
> +file:2:5:mmap
> +file:3:5:mmap
> +file:3:14:mmap
> +file:4:5:mmap
> +file:4:14:mmap
> +file:5:5:mmap
> +file:5:14:mmap
> +EOF

This should be set up as part of the test itself, see e.g. my c8b2cec09e
("branch: add test for -m renaming multiple config sections",
2017-06-18) for how to do that.

> +test_expect_success 'grep --only-matching' '
> + git grep --only-matching --line-number --column mmap file >actual &&
> + test_cmp expected actual
> +'
> +
> +cat >expected < +file
> +1:5:mmap
> +2:5:mmap
> +3:5:mmap
> +3:14:mmap
> +4:5:mmap
> +4:14:mmap
> +5:5:mmap
> +5:14:mmap
> +EOF
> +
> +test_expect_success 'grep --only-matching --heading' '
> + git grep --only-matching --heading --line-number --column mmap file 
> >actual &&
> + test_cmp expected actual
> +'
> +
>  cat >expected <  hello.c
>  4:int main(int argc, const char **argv)

We should test this a lot more, I think a good way to do that would be
to extend this series by first importing GNU grep's -o tests, see
http://git.savannah.gnu.org/cgit/grep.git/tree/tests/foad1 they are
license-compatible. Then change the grep_test() function to call git
grep instead.

It should also be tested with the various grep.patternType options to
make sure it works with basic, extended, perl, fixed etc.


[PATCH 2/2] builtin/grep.c: teach '-o', '--only-matching' to 'git-grep'

2018-05-04 Thread Taylor Blau
Teach GNU grep(1)'s '-o' ('--only-matching') to 'git-grep'. This option
prints only the matching components of each line. It writes multiple
lines if more than one match exists on a given line.

For example:

  $ git grep -on --column --heading git -- README.md | head -3
  README.md
  15:56:git
  18:20:git

By using show_line_header(), 'git grep --only-matching' correctly
respects the '--header' option:

  $ git grep -on --column --heading git -- README.md | head -4
  README.md
  15:56:git
  18:20:git
  19:16:git

Signed-off-by: Taylor Blau 
---
 Documentation/git-grep.txt |  6 +-
 builtin/grep.c |  1 +
 grep.c | 23 ---
 grep.h |  1 +
 t/t7810-grep.sh| 33 +
 5 files changed, 60 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index d451cd8883..9754923041 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -20,7 +20,7 @@ SYNOPSIS
   [-c | --count] [--all-match] [-q | --quiet]
   [--max-depth ]
   [--color[=] | --no-color]
-  [--break] [--heading] [-p | --show-function]
+  [--break] [--heading] [-o | --only-matching] [-p | --show-function]
   [-A ] [-B ] [-C ]
   [-W | --function-context]
   [--threads ]
@@ -221,6 +221,10 @@ providing this option will cause it to die.
Show the filename above the matches in that file instead of
at the start of each shown line.
 
+--o::
+--only-matching::
+   Show only the matching part of the lines.
+
 -p::
 --show-function::
Show the preceding line that contains the function name of
diff --git a/builtin/grep.c b/builtin/grep.c
index 5c83f17759..5028bf96cf 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -851,6 +851,7 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
N_("print empty line between matches from different 
files")),
OPT_BOOL(0, "heading", ,
N_("show filename only once above matches from same 
file")),
+   OPT_BOOL('o', "only-matching", _matching, N_("show 
only matches")),
OPT_GROUP(""),
OPT_CALLBACK('C', "context", , N_("n"),
N_("show  context lines before and after matches"),
diff --git a/grep.c b/grep.c
index 89dd719e4d..da3f8e6266 100644
--- a/grep.c
+++ b/grep.c
@@ -1422,11 +1422,13 @@ static void show_line(struct grep_opt *opt, char *bol, 
char *eol,
}
}
show_line_header(opt, name, lno, cno, sign);
-   if (opt->color) {
+   if (opt->color || opt->only_matching) {
regmatch_t match;
enum grep_context ctx = GREP_CONTEXT_BODY;
int ch = *eol;
int eflags = 0;
+   int first = 1;
+   int offset = 1;
 
if (sign == ':')
match_color = opt->color_match_selected;
@@ -1443,16 +1445,31 @@ static void show_line(struct grep_opt *opt, char *bol, 
char *eol,
if (match.rm_so == match.rm_eo)
break;
 
-   output_color(opt, bol, match.rm_so, line_color);
+   if (!opt->only_matching)
+   output_color(opt, bol, match.rm_so, line_color);
+   else if (!first) {
+   /*
+* We are given --only-matching, and this is not
+* the first match on a line. Reprint the
+* newline and header before showing another
+* match.
+*/
+   opt->output(opt, "\n", 1);
+   show_line_header(opt, name, lno,
+   offset+match.rm_so, sign);
+   }
output_color(opt, bol + match.rm_so,
 match.rm_eo - match.rm_so, match_color);
+   offset += match.rm_eo;
bol += match.rm_eo;
rest -= match.rm_eo;
eflags = REG_NOTBOL;
+   first = 0;
}
*eol = ch;
}
-   output_color(opt, bol, rest, line_color);
+   if (!opt->only_matching)
+   output_color(opt, bol, rest, line_color);
opt->output(opt, "\n", 1);
 }
 
diff --git a/grep.h b/grep.h
index 08a0b391c5..24c1460100 100644
--- a/grep.h
+++ b/grep.h
@@ -126,6 +126,7 @@ struct grep_opt {
const char *prefix;
int prefix_length;
regex_t regexp;
+   int only_matching;
int linenum;
int columnnum;
int invert;
diff --git a/t/t7810-grep.sh