Re: [PATCH 0/7] grep.c: teach --column to 'git-grep(1)'

2018-06-19 Thread Taylor Blau
On Mon, Jun 18, 2018 at 06:43:01PM -0500, Taylor Blau wrote:
> Hi,
>
> Attached is a ``fresh start'' of my series to teach 'git grep --column'.
> Since the last time I sent this, much has changed, notably the semantics
> for deciding which column is the first when given (1) extended
> expressions and (2) --invert.

Hi (again),

I have posted an updated version of this series on my fork available at
[1], in case anyone is interested in the changes before I publish them
here tomorrow.

I am going to let this thread sit overnight to collect any more
feedback.


Thanks,
Taylor

[1]: https://github.com/ttaylorr/git/compare/tb/grep-colno


Re: [PATCH 0/7] grep.c: teach --column to 'git-grep(1)'

2018-06-19 Thread Junio C Hamano
René Scharfe  writes:

> So let's see what your example does:
>
>$ git grep --column --not \( --not -e foo --or --not -e bar \) trace.h
>trace.h:13: *  #define foo(format, ...) bar(format, __VA_ARGS__)
>$ git grep --column --not \( --not -e bar --or --not -e foo \) trace.h
>trace.h:13: *  #define foo(format, ...) bar(format, __VA_ARGS__)
>
> Impressive.  That expression is confusing at first sight, but showing
> the first matching column anyway requires no further explanation.  I
> like it.

OK, obviously many people are a lot more math/logic minded than I
am.  I still think that any code that attempts to show the same
result as '-e bar --and -e foo' from the above is over-engineered,
but as long as it is done corretly and efficiently I won't have any
objection.

Thanks.





Re: [PATCH 0/7] grep.c: teach --column to 'git-grep(1)'

2018-06-19 Thread René Scharfe
Am 19.06.2018 um 21:11 schrieb Jeff King:
> On Tue, Jun 19, 2018 at 08:50:16PM +0200, René Scharfe wrote:
> 
>> Negation causes the whole non-matching line to match, with --column
>> reporting 1 or nothing in such a case, right?  Or I think doing the
>> same when the operator is applied a second time is explainable.

(Not sure where that extra "Or" came from.)

> Yes to your first question.
> 
> Regarding the final sentence, yes, I agree it's explainable. But I
> thought that handling negation like this was one of the main complaints
> of earlier iterations?

That's possible -- I didn't read the full thread, and I didn't argue
for or against any specific behavior in this regard before AFAIR.

So let's see what your example does:

   $ git grep --column --not \( --not -e foo --or --not -e bar \) trace.h
   trace.h:13: *  #define foo(format, ...) bar(format, __VA_ARGS__)
   $ git grep --column --not \( --not -e bar --or --not -e foo \) trace.h
   trace.h:13: *  #define foo(format, ...) bar(format, __VA_ARGS__)

Impressive.  That expression is confusing at first sight, but showing
the first matching column anyway requires no further explanation.  I
like it.

René


Re: [PATCH 0/7] grep.c: teach --column to 'git-grep(1)'

2018-06-19 Thread René Scharfe
Am 19.06.2018 um 19:44 schrieb Taylor Blau:
> diff --git a/grep.c b/grep.c
> index f3329d82ed..a09935d8c5 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -1257,8 +1257,8 @@ static int match_one_pattern(struct grep_pat *p, char 
> *bol, char *eol,
>   return hit;
>   }
> 
> -static int match_expr_eval(struct grep_expr *x, char *bol, char *eol,
> -enum grep_context ctx, ssize_t *col,
> +static int match_expr_eval(struct grep_opt *opt, struct grep_expr *x, char 
> *bol,
> +char *eol, enum grep_context ctx, ssize_t *col,
>  ssize_t *icol, int collect_hits)

Passing opt in is one way.  Piggy-backing on collect_hits and making it
a flags parameter for two bits might be easier.  At least it wouldn't
increase the number of function arguments any further.  Not sure.

Anyway, something like this would be needed as well; or we could
force opt->columnnum to switch opt->extended on:

diff --git a/grep.c b/grep.c
index 8ffa94c791..a724ca3010 100644
--- a/grep.c
+++ b/grep.c
@@ -1325,6 +1321,7 @@ static int match_line(struct grep_opt *opt, char *bol, 
char *eol,
  enum grep_context ctx, int collect_hits)
 {
struct grep_pat *p;
+   int hit = 0;
 
if (opt->extended)
return match_expr(opt, bol, eol, ctx, col, icol,
@@ -1334,11 +1331,14 @@ static int match_line(struct grep_opt *opt, char *bol, 
char *eol,
for (p = opt->pattern_list; p; p = p->next) {
regmatch_t tmp;
if (match_one_pattern(p, bol, eol, ctx, , 0)) {
-   *col = tmp.rm_so;
-   return 1;
+   hit |= 1;
+   if (!opt->columnnum)
+   break;
+   if (*col < 0 || tmp.rm_so < *col)
+   *col = tmp.rm_so;
}
}
-   return 0;
+   return hit;
 }
 
 static int match_next_pattern(struct grep_pat *p, char *bol, char *eol,


Re: [PATCH 0/7] grep.c: teach --column to 'git-grep(1)'

2018-06-19 Thread Jeff King
On Tue, Jun 19, 2018 at 08:50:16PM +0200, René Scharfe wrote:

> Negation causes the whole non-matching line to match, with --column
> reporting 1 or nothing in such a case, right?  Or I think doing the
> same when the operator is applied a second time is explainable.

Yes to your first question.

Regarding the final sentence, yes, I agree it's explainable. But I
thought that handling negation like this was one of the main complaints
of earlier iterations?

> When ORing multiple expressions I don't pay attention to their order
> as that operator is commutative.  Having results depend on that
> order would at least surprise me.

OK. Let's just disable the short-circuit for --column then (i.e., what
Taylor posted earlier). That's explainable, and I doubt the performance
implications are going to be all that noticeable.

-Peff


Re: [PATCH 0/7] grep.c: teach --column to 'git-grep(1)'

2018-06-19 Thread René Scharfe
Am 19.06.2018 um 19:48 schrieb Jeff King:
> On Tue, Jun 19, 2018 at 07:33:39PM +0200, René Scharfe wrote:
> 
>>> The key thing about this iteration is that it doesn't regress
>>> performance, because we always short-circuit where we used to. The other
>>> obvious route is to stop short-circuiting only when "--column" is in
>>> effect, which would have the same property (at the expense of a little
>>> extra code in match_expr_eval()).
>>
>> The performance impact of the exhaustive search for --color scales with
>> the number of shown lines, while it would scale with the total number of
>> lines for --column.  Coloring the results of highly selective patterns
>> is relatively cheap, short-circuiting them still helps significantly.
> 
> I thought that at first, too, but I think we'd still scale with the
> number of shown lines. We're talking about short-circuiting OR, so by
> definition we stop the short-circuit because we matched the first half
> of the OR.
> 
> If you stop short-circuiting AND, then yes, you incur a penalty for
> every line. But I don't think --column would need to do that.

Good point.  So disabling that optimization for --column (or in even
in general) shouldn't be a dramatic loss.

> Although there are interesting cases around inversion. For example:
> 
>git grep --not \( --not -e a --and --not -e b \)
> 
> is equivalent to:
> 
>git grep -e a --or -e b
> 
> Do people care if we actually hunt down the exact column where we
> _didn't_ match "b" in the first case?  The two are equivalent, but I
> have to wonder if somebody writing the first one really cares.

I'd rather not guess the intentions of someone using such convoluted
expressions. ;-)

Negation causes the whole non-matching line to match, with --column
reporting 1 or nothing in such a case, right?  Or I think doing the
same when the operator is applied a second time is explainable.

>> Disabling that optimization for --column wouldn't be a regression since
>> it's a new option..  Picking a random result (based on the order of
>> evaluation) seems sloppy and is probably going to surprise users.
> 
> I don't see it as a random result; short-circuiting logic is well
> understood and we follow the user's ordering.

When ORing multiple expressions I don't pay attention to their order
as that operator is commutative.  Having results depend on that
order would at least surprise me.

> I think the place where it's _most_ ugly is "--column --color", where we
> may color the short-circuited value in the second pass.

The double negative example you gave above has that discrepancy as
well, but I think in that case it just highlights the different
intentions (--color: highlight search terms, --column: show leftmost
match).

>> We could add an optimizer pass to reduce the number of regular
>> expressions in certain cases if that is really too slow.  E.g. this:
> 
> Yes, we actually discussed this kind of transformation. I think it's way
> out of scope for this patch series, though. If we do anything more, I
> think it should be to disable short-circuiting when --column is in use.

Yep.

René


Re: [PATCH 0/7] grep.c: teach --column to 'git-grep(1)'

2018-06-19 Thread Junio C Hamano
Jeff King  writes:

> Even if it's a double-inversion? The reason we carry both `col` and
> `icol` is that it allows:
>
>   git grep --not --not --not --not -e a
>
> to still say "we found 'a' here". That's a dumb thing to ask for, but it
> is true in the end that we show lines with "a" (and will colorize them
> as such).

Yes.  I think the code is doing too much to cater to a dumb request
;-)



Re: [PATCH 0/7] grep.c: teach --column to 'git-grep(1)'

2018-06-19 Thread Jeff King
On Tue, Jun 19, 2018 at 10:58:30AM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > Although there are interesting cases around inversion. For example:
> >
> >   git grep --not \( --not -e a --and --not -e b \)
> >
> > is equivalent to:
> >
> >   git grep -e a --or -e b
> >
> > Do people care if we actually hunt down the exact column where we
> > _didn't_ match "b" in the first case?  The two are equivalent, but I
> > have to wonder if somebody writing the first one really cares.
> 
> I may be misunderstanding the question, but I personally would feel
> that "git grep --not " is OK to say "the entire line is at
> fault that it did not satisify the criteria to match ".
> I.e., I'd be happy if --column marked the first column as the
> beginning of the match, or --color painted the entire line in the
> output of the former.

Even if it's a double-inversion? The reason we carry both `col` and
`icol` is that it allows:

  git grep --not --not --not --not -e a

to still say "we found 'a' here". That's a dumb thing to ask for, but it
is true in the end that we show lines with "a" (and will colorize them
as such).

-Peff


Re: [PATCH 0/7] grep.c: teach --column to 'git-grep(1)'

2018-06-19 Thread Taylor Blau
On Tue, Jun 19, 2018 at 10:58:30AM -0700, Junio C Hamano wrote:
> Jeff King  writes:
>
> > Although there are interesting cases around inversion. For example:
> >
> >   git grep --not \( --not -e a --and --not -e b \)
> >
> > is equivalent to:
> >
> >   git grep -e a --or -e b
> >
> > Do people care if we actually hunt down the exact column where we
> > _didn't_ match "b" in the first case?  The two are equivalent, but I
> > have to wonder if somebody writing the first one really cares.
>
> I may be misunderstanding the question, but I personally would feel
> that "git grep --not " is OK to say "the entire line is at
> fault that it did not satisify the criteria to match ".
> I.e., I'd be happy if --column marked the first column as the
> beginning of the match, or --color painted the entire line in the
> output of the former.

I think that Peff is pointing out that a short-circuiting OR will cause
a line like

  b foo a

to print the column for "a", not "b" (when given "-e a --or -e b").

Thanks,
Taylor


Re: [PATCH 0/7] grep.c: teach --column to 'git-grep(1)'

2018-06-19 Thread Junio C Hamano
Jeff King  writes:

> Although there are interesting cases around inversion. For example:
>
>   git grep --not \( --not -e a --and --not -e b \)
>
> is equivalent to:
>
>   git grep -e a --or -e b
>
> Do people care if we actually hunt down the exact column where we
> _didn't_ match "b" in the first case?  The two are equivalent, but I
> have to wonder if somebody writing the first one really cares.

I may be misunderstanding the question, but I personally would feel
that "git grep --not " is OK to say "the entire line is at
fault that it did not satisify the criteria to match ".
I.e., I'd be happy if --column marked the first column as the
beginning of the match, or --color painted the entire line in the
output of the former.




Re: [PATCH 0/7] grep.c: teach --column to 'git-grep(1)'

2018-06-19 Thread René Scharfe
Am 19.06.2018 um 19:44 schrieb Taylor Blau:
> On Tue, Jun 19, 2018 at 07:33:39PM +0200, René Scharfe wrote:
>> Am 19.06.2018 um 18:35 schrieb Jeff King:
>>> On Mon, Jun 18, 2018 at 06:43:01PM -0500, Taylor Blau wrote:
>> We could add an optimizer pass to reduce the number of regular
>> expressions in certain cases if that is really too slow.  E.g. this:
>>
>>  $ git grep -e b -e a
>>
>> ... is equivalent to:
>>
>>  $ git grep -e '\(b\)\|\(a\)'
>>
>> In that example the optimizer should use a single kwset instead of a
>> regex, but you get the idea, namely to leave the short-circuiting to the
>> regex engine or kwset, which probably do a good job of it.
> 
> I think that--while this pushes that decision to the appropriate level
> of indirection--that it is out of scope for this series, and that the
> above patch should do a sufficient job at not surprising users.

Definitely.  I'm not even convinced that performance problem is real --
otherwise someone would have added such an optimization already, right?
:)

René


Re: [PATCH 0/7] grep.c: teach --column to 'git-grep(1)'

2018-06-19 Thread Taylor Blau
On Tue, Jun 19, 2018 at 01:48:47PM -0400, Jeff King wrote:
> On Tue, Jun 19, 2018 at 07:33:39PM +0200, René Scharfe wrote:
> > Disabling that optimization for --column wouldn't be a regression since
> > it's a new option..  Picking a random result (based on the order of
> > evaluation) seems sloppy and is probably going to surprise users.
>
> I don't see it as a random result; short-circuiting logic is well
> understood and we follow the user's ordering.

Agreed. I would prefer _not_ to apply the patch that I sent to René,
since I think it adds more complexity than its worth (precisely because
the short-circuiting logic is known, though certainly the problem gets
worse as the expression tree grows).

> I think the place where it's _most_ ugly is "--column --color", where we
> may color the short-circuited value in the second pass.

Agreed again, but it's worth noting that --color is the default. To play
devil's advocate, the use-case that I imagine will be most common is
with "git jump," so perhaps that doesn't matter as much.

> > We could add an optimizer pass to reduce the number of regular
> > expressions in certain cases if that is really too slow.  E.g. this:
>
> Yes, we actually discussed this kind of transformation. I think it's way
> out of scope for this patch series, though. If we do anything more, I
> think it should be to disable short-circuiting when --column is in use.

Piggy-backing on what I said to René, agreed again.

> -Peff

Thanks,
Taylor


Re: [PATCH 0/7] grep.c: teach --column to 'git-grep(1)'

2018-06-19 Thread Jeff King
On Tue, Jun 19, 2018 at 07:33:39PM +0200, René Scharfe wrote:

> > The key thing about this iteration is that it doesn't regress
> > performance, because we always short-circuit where we used to. The other
> > obvious route is to stop short-circuiting only when "--column" is in
> > effect, which would have the same property (at the expense of a little
> > extra code in match_expr_eval()).
> 
> The performance impact of the exhaustive search for --color scales with
> the number of shown lines, while it would scale with the total number of
> lines for --column.  Coloring the results of highly selective patterns
> is relatively cheap, short-circuiting them still helps significantly.

I thought that at first, too, but I think we'd still scale with the
number of shown lines. We're talking about short-circuiting OR, so by
definition we stop the short-circuit because we matched the first half
of the OR.

If you stop short-circuiting AND, then yes, you incur a penalty for
every line. But I don't think --column would need to do that.

Although there are interesting cases around inversion. For example:

  git grep --not \( --not -e a --and --not -e b \)

is equivalent to:

  git grep -e a --or -e b

Do people care if we actually hunt down the exact column where we
_didn't_ match "b" in the first case?  The two are equivalent, but I
have to wonder if somebody writing the first one really cares.

> Disabling that optimization for --column wouldn't be a regression since
> it's a new option..  Picking a random result (based on the order of
> evaluation) seems sloppy and is probably going to surprise users.

I don't see it as a random result; short-circuiting logic is well
understood and we follow the user's ordering.

I think the place where it's _most_ ugly is "--column --color", where we
may color the short-circuited value in the second pass.

> We could add an optimizer pass to reduce the number of regular
> expressions in certain cases if that is really too slow.  E.g. this:

Yes, we actually discussed this kind of transformation. I think it's way
out of scope for this patch series, though. If we do anything more, I
think it should be to disable short-circuiting when --column is in use.

-Peff


Re: [PATCH 0/7] grep.c: teach --column to 'git-grep(1)'

2018-06-19 Thread Taylor Blau
On Tue, Jun 19, 2018 at 07:33:39PM +0200, René Scharfe wrote:
> Am 19.06.2018 um 18:35 schrieb Jeff King:
> > On Mon, Jun 18, 2018 at 06:43:01PM -0500, Taylor Blau wrote:
> >> The notable case that it does _not_ cover is matching the following
> >> line:
> >>
> >>a ... b
> >>
> >> with the following expression
> >>
> >>git grep --column -e b --or -e a
> >>
> >> This will produce the column for 'b' rather than the column for 'a',
> >> since we short-circuit an --or when the left child finds a match, in
> >> this case 'b'. So, we break the semantics for this case, at the benefit
> >> of not having to do twice the work.
> >>
> >> In the future, I'd like to revisit this, since any performance gains
> >> that we _do_ make in this area are moot when we rescan all lines in
> >> show_line() with --color. A path forward, I imagine, would look like a
> >> list of regmatch_t's, or a set of locations in the expression tree, such
> >> that we could either enumerate the list or walk the tree in order to
> >> colorize the line. But, I think for now that is #leftoverbits.
> >
> > The key thing about this iteration is that it doesn't regress
> > performance, because we always short-circuit where we used to. The other
> > obvious route is to stop short-circuiting only when "--column" is in
> > effect, which would have the same property (at the expense of a little
> > extra code in match_expr_eval()).
>
> The performance impact of the exhaustive search for --color scales with
> the number of shown lines, while it would scale with the total number of
> lines for --column.  Coloring the results of highly selective patterns
> is relatively cheap, short-circuiting them still helps significantly.

Sure. I was pointing out that we are repeating work in cases where it is
unnecessary to do so. It seems natural to me to take one of the two
above approaches, and optimize where we can.

> Disabling that optimization for --column wouldn't be a regression since
> it's a new option..  Picking a random result (based on the order of
> evaluation) seems sloppy and is probably going to surprise users.

That's fair, and something I'm happy to do if others feel OK about it.
Here is a patch to that effect:

diff --git a/grep.c b/grep.c
index f3329d82ed..a09935d8c5 100644
--- a/grep.c
+++ b/grep.c
@@ -1257,8 +1257,8 @@ static int match_one_pattern(struct grep_pat *p, char 
*bol, char *eol,
return hit;
 }

-static int match_expr_eval(struct grep_expr *x, char *bol, char *eol,
-  enum grep_context ctx, ssize_t *col,
+static int match_expr_eval(struct grep_opt *opt, struct grep_expr *x, char 
*bol,
+  char *eol, enum grep_context ctx, ssize_t *col,
   ssize_t *icol, int collect_hits)
 {
int h = 0;
@@ -1282,26 +1282,27 @@ static int match_expr_eval(struct grep_expr *x, char 
*bol, char *eol,
/*
 * Upon visiting a GREP_NODE_NOT, col and icol become swapped.
 */
-   h = !match_expr_eval(x->u.unary, bol, eol, ctx, icol, col, 0);
+   h = !match_expr_eval(opt, x->u.unary, bol, eol, ctx, icol, col, 
0);
break;
case GREP_NODE_AND:
-   if (!match_expr_eval(x->u.binary.left, bol, eol, ctx, col,
+   if (!match_expr_eval(opt, x->u.binary.left, bol, eol, ctx, col,
 icol, 0))
return 0;
-   h = match_expr_eval(x->u.binary.right, bol, eol, ctx, col,
+   h = match_expr_eval(opt, x->u.binary.right, bol, eol, ctx, col,
icol, 0);
break;
case GREP_NODE_OR:
-   if (!collect_hits)
-   return (match_expr_eval(x->u.binary.left, bol, eol, ctx,
+   if (!(collect_hits || opt->columnnum))
+   return (match_expr_eval(opt, x->u.binary.left, bol, 
eol, ctx,
col, icol, 0) ||
-   match_expr_eval(x->u.binary.right, bol, eol,
+   match_expr_eval(opt, x->u.binary.right, bol, 
eol,
ctx, col, icol, 0));
-   h = match_expr_eval(x->u.binary.left, bol, eol, ctx, col,
+   h = match_expr_eval(opt, x->u.binary.left, bol, eol, ctx, col,
icol, 0);
-   x->u.binary.left->hit |= h;
-   h |= match_expr_eval(x->u.binary.right, bol, eol, ctx, col,
-icol, 1);
+   if (collect_hits)
+   x->u.binary.left->hit |= h;
+   h |= match_expr_eval(opt, x->u.binary.right, bol, eol, ctx, col,
+icol, collect_hits);
break;
default:
die("Unexpected node type (internal error) %d", x->node);
@@ -1316,7 +1317,7 @@ 

Re: [PATCH 0/7] grep.c: teach --column to 'git-grep(1)'

2018-06-19 Thread René Scharfe
Am 19.06.2018 um 18:35 schrieb Jeff King:
> On Mon, Jun 18, 2018 at 06:43:01PM -0500, Taylor Blau wrote:
>> The notable case that it does _not_ cover is matching the following
>> line:
>>
>>a ... b
>>
>> with the following expression
>>
>>git grep --column -e b --or -e a
>>
>> This will produce the column for 'b' rather than the column for 'a',
>> since we short-circuit an --or when the left child finds a match, in
>> this case 'b'. So, we break the semantics for this case, at the benefit
>> of not having to do twice the work.
>>
>> In the future, I'd like to revisit this, since any performance gains
>> that we _do_ make in this area are moot when we rescan all lines in
>> show_line() with --color. A path forward, I imagine, would look like a
>> list of regmatch_t's, or a set of locations in the expression tree, such
>> that we could either enumerate the list or walk the tree in order to
>> colorize the line. But, I think for now that is #leftoverbits.
> 
> The key thing about this iteration is that it doesn't regress
> performance, because we always short-circuit where we used to. The other
> obvious route is to stop short-circuiting only when "--column" is in
> effect, which would have the same property (at the expense of a little
> extra code in match_expr_eval()).

The performance impact of the exhaustive search for --color scales with
the number of shown lines, while it would scale with the total number of
lines for --column.  Coloring the results of highly selective patterns
is relatively cheap, short-circuiting them still helps significantly.

Disabling that optimization for --column wouldn't be a regression since
it's a new option..  Picking a random result (based on the order of
evaluation) seems sloppy and is probably going to surprise users.

We could add an optimizer pass to reduce the number of regular
expressions in certain cases if that is really too slow.  E.g. this:

$ git grep -e b -e a

... is equivalent to:

$ git grep -e '\(b\)\|\(a\)'

In that example the optimizer should use a single kwset instead of a
regex, but you get the idea, namely to leave the short-circuiting to the
regex engine or kwset, which probably do a good job of it.

René


Re: [PATCH 0/7] grep.c: teach --column to 'git-grep(1)'

2018-06-19 Thread Taylor Blau
On Tue, Jun 19, 2018 at 09:46:16AM -0700, Junio C Hamano wrote:
> Taylor Blau  writes:
>
> > Attached is a ``fresh start'' of my series to teach 'git grep --column'.
> > Since the last time I sent this, much has changed, notably the semantics
> > for deciding which column is the first when given (1) extended
> > expressions and (2) --invert.
> > ...
> > In the future, I'd like to revisit this, since any performance gains
> > that we _do_ make in this area are moot when we rescan all lines in
> > show_line() with --color.
>
> Sounds like a plan.  From a quick scan, it seems that this is
> sufficiently different from the last round so I won't bother
> rebuilding your "--only" series on top, and instead just drop those
> two series from the older round and queue this as the new and
> improved tb/grep-column topic.

Thank you. I recommend dropping the second series
(tb/grep-only-matching) entirely for now. I will rebase that on top of
this so that you can apply it later in the future with ease.

I don't expect the parts of this series that affect that one to change
much, but I'll hold off on rebasing it in general until this series is
stable, hopefully soon.

> Thanks.

Thanks,
Taylor


Re: [PATCH 0/7] grep.c: teach --column to 'git-grep(1)'

2018-06-19 Thread Junio C Hamano
Taylor Blau  writes:

> Attached is a ``fresh start'' of my series to teach 'git grep --column'.
> Since the last time I sent this, much has changed, notably the semantics
> for deciding which column is the first when given (1) extended
> expressions and (2) --invert.
> ...
> In the future, I'd like to revisit this, since any performance gains
> that we _do_ make in this area are moot when we rescan all lines in
> show_line() with --color.

Sounds like a plan.  From a quick scan, it seems that this is
sufficiently different from the last round so I won't bother
rebuilding your "--only" series on top, and instead just drop those
two series from the older round and queue this as the new and
improved tb/grep-column topic.

Thanks.


Re: [PATCH 0/7] grep.c: teach --column to 'git-grep(1)'

2018-06-19 Thread Jeff King
On Mon, Jun 18, 2018 at 06:43:01PM -0500, Taylor Blau wrote:

> Attached is a ``fresh start'' of my series to teach 'git grep --column'.
> Since the last time I sent this, much has changed, notably the semantics
> for deciding which column is the first when given (1) extended
> expressions and (2) --invert.
> 
> Both (1) and (2) are described in-depth in patch 2/7, but I am happy to
> answer more questions should they arise here. Peff and I worked on this
> together off-list, and we are both happy with the semantics, and believe
> that it covers most reasonable cases.

So with the exception of some minor type-quibbling in patch 4, this all
looks good to me. Since we discussed this quite a bit off-list, you can
take that review either with a giant grain of salt (reviewing something
I helped to generate in the first place) or a ringing endorsement (I
thought about it a lot more than I would have for a normal review ;) ).

> The notable case that it does _not_ cover is matching the following
> line:
> 
>   a ... b
> 
> with the following expression
> 
>   git grep --column -e b --or -e a
> 
> This will produce the column for 'b' rather than the column for 'a',
> since we short-circuit an --or when the left child finds a match, in
> this case 'b'. So, we break the semantics for this case, at the benefit
> of not having to do twice the work.
> 
> In the future, I'd like to revisit this, since any performance gains
> that we _do_ make in this area are moot when we rescan all lines in
> show_line() with --color. A path forward, I imagine, would look like a
> list of regmatch_t's, or a set of locations in the expression tree, such
> that we could either enumerate the list or walk the tree in order to
> colorize the line. But, I think for now that is #leftoverbits.

The key thing about this iteration is that it doesn't regress
performance, because we always short-circuit where we used to. The other
obvious route is to stop short-circuiting only when "--column" is in
effect, which would have the same property (at the expense of a little
extra code in match_expr_eval()).

-Peff