Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'

2018-05-18 Thread Taylor Blau
On Fri, May 18, 2018 at 02:50:21PM -0700, Taylor Blau wrote:
> [...]
>
> > Another might be to pick "foo" in the first and "bar" in the second
> > line, as that is the "first hit" on the line, which is consistent
> > with how "git grep -e foo" would say about "a foo b foo c foo" (I
> > expect that the leftmost "foo" would be the first hit).  So there
> > may be multiple, equally plausible answer to the question.
>
> This is the largest fact in my mind pertaining to this discussion: there
> are probably many different interpretations of semantics for this, all
> equally valid in their own way. I am partial to the minimum substring
> interpretation (which follows naturally from the minimum-start,
> maximum-end idea), accepting the shortcoming that `--or` sometimes
> doesn't ``do the right thing.''

Ignore this last part. `--or` _does_ do the right thing, as noted below.

Thanks,
Taylor


Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'

2018-05-18 Thread Taylor Blau
Thanks for your thoughtful response. I answered in detail below, but I
think that we're in agreement about the semantics, with a few
corrections on my part. I'd like to push forward with this series,
including the proposed changes below, but feel that sending it as v7
would be asking too much of a reviewer. Would it be OK if I sent this a
new series entirely and we abandon this thread?

On Fri, May 18, 2018 at 03:27:44PM +0900, Junio C Hamano wrote:
> Taylor Blau  writes:
>
> >   * `git grep --and -e foo -e bar`. This binary operation should recur
> > on its sub-expressions and take the minimum of the starting offset
> > and the maximum of the ending offset.
>
> We use infix notation, so the above is "git grep -e foo --and -e
> bar" actually ;-).

Thanks for catching that :-).

> But you raise an interesting point.  A line with "hello foo bar baz"
> on it would match, so does a line with "goodbye bar baz foo", as
> both of them hits pattern "foo" *and* pattern "bar".  It is not
> quite clear what it means to "show the first hit on the line".  One
> interpretation would be to take the minimum span that makes both
> sides of "--and" happy (your "minimum of start, maximum of end").

It's funny you should mention: this was nearly the exact phrase I used
when speaking with Peff.

> Another might be to pick "foo" in the first and "bar" in the second
> line, as that is the "first hit" on the line, which is consistent
> with how "git grep -e foo" would say about "a foo b foo c foo" (I
> expect that the leftmost "foo" would be the first hit).  So there
> may be multiple, equally plausible answer to the question.

This is the largest fact in my mind pertaining to this discussion: there
are probably many different interpretations of semantics for this, all
equally valid in their own way. I am partial to the minimum substring
interpretation (which follows naturally from the minimum-start,
maximum-end idea), accepting the shortcoming that `--or` sometimes
doesn't ``do the right thing.''

> > For inputs of the form "foobar" and "foo bar", it will do the right
> > thing (give the starting and ending offset for "foobar" and give no
> > match, respectively).
>
> I think I agree with "foobar", but I do not understand why there is
> no match for "foo bar".

Ah, I think this is my mistake -- when I wrote this note last night. The
case of `-e foo --and -e bar` should clearly match both `foo bar` _and_
`foobar`.

> >   * `git grep --or -e foo -e bar`. This is the most complicated case, in
> > my opinion. In going with the min/max idea in the and case above, I
> > think that `--or` should also min/max its sub-expressions, but in
> > fact we short-circuit evaluating the second sub-expression when we
> > find a match for the first.
>
> I am not sure I follow.  "git grep -e foo --or -e bar" is just a
> longhand for "git grep -e foo -e bar".  Shouldn't it highlight
> whichever between foo and bar that appears leftmost on the line?

I don't believe that the two are treated the same, but I think that this
is another case where I was incorrect in my judgement of the
implementation last night. In fact, the only time when we _don't_ recur
on both sub-expressions of `--or` is when 'collect_hits' is zero. That's
fine, I believe.

> > So, in cases like matching `--or -e foo -e bar` with "foo baz bar",
> > we'll do the right thing, since `foo` is the first sub-expression
> > and happens to be the left-most match. In other words, we __adhere
> > to our answer with the left-most match first__ semantics, but only
> > because __the first sub-expression is the left-most match__.
> >
> > In the other case where we try and match the same expression against
> > "bar baz foo", we'll return the starting offset of "foo", even
> > though it isn't the left-most match, violating our semantics.
>
> I am not sure why you think your min-starting/max-ending would lead
> to such a conclusion.  'foo baz bar' would be covered in its
> entirety, 'bar baz foo' would also, as starting of hits with pattern
> 'foo' and pattern 'bar' would be 'b' in 'bar' on that three-word
> line, and ending of hits with these two patterns would be the last
> 'o' in 'foo' on the line.

Right, I think with the understanding in my last stanza of this response
("I don't believe that ...") this issue is resolved, and the
min-starting/max-ending _will_ do the right thing.

> I'd expect that a line 'foo baz bar' matched against "-e foo --or -e
> bar" would say "among three words on me, 'f' in foo is the first
> location of the match", though.

I would, too.

Thanks,
Taylor


Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'

2018-05-18 Thread Junio C Hamano
Taylor Blau  writes:

>   * `git grep --and -e foo -e bar`. This binary operation should recur
> on its sub-expressions and take the minimum of the starting offset
> and the maximum of the ending offset.

We use infix notation, so the above is "git grep -e foo --and -e
bar" actually ;-).

But you raise an interesting point.  A line with "hello foo bar baz"
on it would match, so does a line with "goodbye bar baz foo", as
both of them hits pattern "foo" *and* pattern "bar".  It is not
quite clear what it means to "show the first hit on the line".  One
interpretation would be to take the minimum span that makes both
sides of "--and" happy (your "minimum of start, maximum of end").
Another might be to pick "foo" in the first and "bar" in the second
line, as that is the "first hit" on the line, which is consistent
with how "git grep -e foo" would say about "a foo b foo c foo" (I
expect that the leftmost "foo" would be the first hit).  So there
may be multiple, equally plausible answer to the question.

> For inputs of the form "foobar" and "foo bar", it will do the right
> thing (give the starting and ending offset for "foobar" and give no
> match, respectively).

I think I agree with "foobar", but I do not understand why there is
no match for "foo bar".

>   * `git grep --or -e foo -e bar`. This is the most complicated case, in
> my opinion. In going with the min/max idea in the and case above, I
> think that `--or` should also min/max its sub-expressions, but in
> fact we short-circuit evaluating the second sub-expression when we
> find a match for the first.

I am not sure I follow.  "git grep -e foo --or -e bar" is just a
longhand for "git grep -e foo -e bar".  Shouldn't it highlight
whichever between foo and bar that appears leftmost on the line?

> So, in cases like matching `--or -e foo -e bar` with "foo baz bar",
> we'll do the right thing, since `foo` is the first sub-expression
> and happens to be the left-most match. In other words, we __adhere
> to our answer with the left-most match first__ semantics, but only
> because __the first sub-expression is the left-most match__.
>
> In the other case where we try and match the same expression against
> "bar baz foo", we'll return the starting offset of "foo", even
> though it isn't the left-most match, violating our semantics.

I am not sure why you think your min-starting/max-ending would lead
to such a conclusion.  'foo baz bar' would be covered in its
entirety, 'bar baz foo' would also, as starting of hits with pattern
'foo' and pattern 'bar' would be 'b' in 'bar' on that three-word
line, and ending of hits with these two patterns would be the last
'o' in 'foo' on the line.  

I'd expect that a line 'foo baz bar' matched against "-e foo --or -e
bar" would say "among three words on me, 'f' in foo is the first
location of the match", though.


Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'

2018-05-17 Thread Taylor Blau
On Sat, May 12, 2018 at 03:07:04PM +0900, Junio C Hamano wrote:
> Taylor Blau  writes:
>
> > I re-read your note and understand more clearly now what your suggestion
> > is. To ensure that we're in agreement, do you mean:
> >
> >   1. '--column -v' will _never_ give a column, but will never die(),
> >   either
>
> No, I don't.
>
> >   2. '--column --[and | or | not]' will never give a column, but will
> >   also never die(), either.
>
> No, I don't.
>
> If a file does not have substring "foo", then
>
>   git grep -v -e foo file
>   git grep --not -e foo file
>
> would hit all lines, just like
>
>   git grep -e '.*' file
>
> does.
>
> I would expect that all of these
>
>   git grep --column/-o -v -e foo file
>   git grep --column/-o --not -e foo file
>   git grep --column/-o -e '.*' file
>
> give the same output, which is what we would get if we consider the
> hit from "choose lines that lack 'foo'" on a line without 'foo' is
> caused by the entire contents on the line.  That is in line with
> "choose lines that has anything (including nothing)" aka ".*" would
> result in the entire line being reported via -o.  The byte offset of
> the first hit on such a line reported by --column is also 1, and
> that is a good and real answer to the question "git grep --column/-o"
> can give.

I agree with your message now and thank you for explaining what you
had written. I spoke with Peff off-list for a while to determine what I
think is essentially the answer to ``what are a set of semantics for
filling out a regmatch_t given an extended expression?''

It's helpful to recognize that the extended expressions are implemented
very much like a tree, so a reasonable semantics will lend itself well
to the way in which match_expr_eval() is implemented. Here's what we
came up with:

  * `git grep -e foo`. This is the case where the extended expression
has a single atomic node in its tree. This falls into the "just call
match_one_pattern()" case and has a simple answer: the starting
offset and ending offset are that of whatever match_one_pattern
gives.

  * `git grep --not -e foo`. This has the set of semantics that you
describe above (the starting offset is 1), with the addition that
the ending offset is the end of the line. This is similar to the
fact that `--not foo` is very similar to `.$`.

  * `git grep --and -e foo -e bar`. This binary operation should recur
on its sub-expressions and take the minimum of the starting offset
and the maximum of the ending offset.

For inputs of the form "foobar" and "foo bar", it will do the right
thing (give the starting and ending offset for "foobar" and give no
match, respectively).

  * `git grep --or -e foo -e bar`. This is the most complicated case, in
my opinion. In going with the min/max idea in the and case above, I
think that `--or` should also min/max its sub-expressions, but in
fact we short-circuit evaluating the second sub-expression when we
find a match for the first.

So, in cases like matching `--or -e foo -e bar` with "foo baz bar",
we'll do the right thing, since `foo` is the first sub-expression
and happens to be the left-most match. In other words, we __adhere
to our answer with the left-most match first__ semantics, but only
because __the first sub-expression is the left-most match__.

In the other case where we try and match the same expression against
"bar baz foo", we'll return the starting offset of "foo", even
though it isn't the left-most match, violating our semantics.

So, I propose we adopt the following: use the trivial answer for "foo",
the whole line for "--not", and min/max the starting/ending offsets for
binary operators, knowing that we will sometimes produce a weird answer
for --or.

I think that the semantics for --or are OK to go forward with, but would
be interested in the thoughts of others to figure out whether this is
sensible to everyone else.

Does this seem like an OK approach? Perhaps Peff can clarify some of
what's shared here, since we did speak elsewhere about it.

Thanks,
Taylor


Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'

2018-05-12 Thread Junio C Hamano
Taylor Blau  writes:

> I re-read your note and understand more clearly now what your suggestion
> is. To ensure that we're in agreement, do you mean:
>
>   1. '--column -v' will _never_ give a column, but will never die(),
>   either

No, I don't.

>   2. '--column --[and | or | not]' will never give a column, but will
>   also never die(), either.

No, I don't.

If a file does not have substring "foo", then

git grep -v -e foo file
git grep --not -e foo file

would hit all lines, just like

git grep -e '.*' file

does.

I would expect that all of these

git grep --column/-o -v -e foo file
git grep --column/-o --not -e foo file
git grep --column/-o -e '.*' file

give the same output, which is what we would get if we consider the
hit from "choose lines that lack 'foo'" on a line without 'foo' is
caused by the entire contents on the line.  That is in line with
"choose lines that has anything (including nothing)" aka ".*" would
result in the entire line being reported via -o.  The byte offset of
the first hit on such a line reported by --column is also 1, and
that is a good and real answer to the question "git grep --column/-o"
can give.  

In an earlier message, you sounded like you do not think "we did not
have 'foo' on that line, and that is why we are emitting because we
are operating under -v" lack a definite answer for --column, but I
think you are wrong.  "On the entire line, we didn't find 'foo'
anywhere" is good enough reason for me to make the answer "the
entire line contributed to this hit" a definite one.

Exactly the same applies for "git grep --not -e foo".

When "git grep -e bar [--or] --not -e foo" shows a line because the
line has 'bar' on it, we have --column that points at 'b' of the
first 'bar' on the line.  When it shows a line because the line has
neither 'bar' or 'foo', then "--not -e foo" part would give a
definite "the entire line contributed to this decision that it does
not have 'foo'".




Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'

2018-05-11 Thread Taylor Blau
On Sat, May 12, 2018 at 02:08:48PM +0900, Junio C Hamano wrote:
> Taylor Blau  writes:
>
> >> $ git grep -v second
> >> $ git grep --not -e second
> >>
> >> may hit all lines in this message (except for the obvious two
> >> lines), but we cannot say which column we found a hit.  I am
> >> wondering if it is too grave a sin to report "the whole line is what
> >> satisfied the criteria given" and say the match lies at column #1.
> >
> > I think that is sensible. I previously was opposed to this because I
> > thought that it would be too difficult to script around the 'sometimes
> > we have columns but other times not' and 'I gave --column' but have to
> > check whether or not they are really there.
>
> I am not sure if you really got what I meant.  I am suggesting that
> "git grep -v --column second" should report that the entire line has
> hit for each and every line that does not have "second" on it, which
> is a good answer and eliminate "sometimes there is column answer (or
> answer to -o query) and sometimes not" at the same time.

I re-read your note and understand more clearly now what your suggestion
is. To ensure that we're in agreement, do you mean:

  1. '--column -v' will _never_ give a column, but will never die(),
  either

  2. '--column --[and | or | not]' will never give a column, but will
  also never die(), either.

I think that _those_ semantics are sensible, and I apologize for
misinterpreting your original note.

> > In other terms:
> >
> >   * not giving '--column' will _never_ give a column,
> >   * '--column --invert' will _always_ die(), and
> >   * '--column --[and | or | not]' will _never_ give a column.
>
> So this is completely opposite from what I would have expected. to
> somebody who said "I think that is sensible." over there.

Thanks,
Taylor


Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'

2018-05-11 Thread Junio C Hamano
Taylor Blau  writes:

>> $ git grep -v second
>> $ git grep --not -e second
>>
>> may hit all lines in this message (except for the obvious two
>> lines), but we cannot say which column we found a hit.  I am
>> wondering if it is too grave a sin to report "the whole line is what
>> satisfied the criteria given" and say the match lies at column #1.
>
> I think that is sensible. I previously was opposed to this because I
> thought that it would be too difficult to script around the 'sometimes
> we have columns but other times not' and 'I gave --column' but have to
> check whether or not they are really there.

I am not sure if you really got what I meant.  I am suggesting that
"git grep -v --column second" should report that the entire line has
hit for each and every line that does not have "second" on it, which
is a good answer and eliminate "sometimes there is column answer (or
answer to -o query) and sometimes not" at the same time.

> In other terms:
>
>   * not giving '--column' will _never_ give a column,
>   * '--column --invert' will _always_ die(), and
>   * '--column --[and | or | not]' will _never_ give a column.

So this is completely opposite from what I would have expected. to
somebody who said "I think that is sensible." over there.




Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'

2018-05-11 Thread Taylor Blau
On Thu, May 10, 2018 at 09:04:34AM +0900, Junio C Hamano wrote:
> Taylor Blau  writes:
>
> > This check we should retain and change the wording to mention '--and',
> > '--or', and '--not' specifically.
>
> Why are these problematic in the first place?  If I said
>
> $ git grep -e first --and -e these
> $ git grep -e first --and --not -e those
> $ git grep -e first --or -e those
>
> I'd expect that the first line of this paragraph will hit, and the
> first hit for these three are "these", "first" and "first",
> respectively.  Most importantly, in the last one, "--or" can be
> omitted and the whole thing stops being "extended", so rejecting
> extended as a whole does not make much sense.

Agreed that this is what I would expect, too. The trouble is that we
never do a compilation step from containing --and, --or, --not to a
POSIX regexp. So, if we're extended, we have to assume that there might
not be an answer.

Given this, I don't think we should categorically die() when we
encounter this (more below in the next hunk of this mail). I think this
thread has established that there are certainly cases where we cannot
provide a meaningful answer, (--not at the top-level, for instance) but
there are cases where we can (as you indicate above).

One day, I would like to support --column with --and, --or, or --not in
cases where there _is_ a definite answer. That said, omitting this
information for now and at least not die()-ing I think is worth taking
this patch earlier, and leaving some #leftoverbits.

> $ git grep -v second
> $ git grep --not -e second
>
> may hit all lines in this message (except for the obvious two
> lines), but we cannot say which column we found a hit.  I am
> wondering if it is too grave a sin to report "the whole line is what
> satisfied the criteria given" and say the match lies at column #1.

I think that is sensible. I previously was opposed to this because I
thought that it would be too difficult to script around the 'sometimes
we have columns but other times not' and 'I gave --column' but have to
check whether or not they are really there.

I no longer believe that my above argument is sound. It simplifies the
matter greatly to simply not share columns when we don't have a good
answer, and do when we do.

In other terms:

  * not giving '--column' will _never_ give a column,
  * '--column --invert' will _always_ die(), and
  * '--column --[and | or | not]' will _never_ give a column.

Thanks,
Taylor


Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'

2018-05-10 Thread Junio C Hamano
René Scharfe  writes:

> Am 10.05.2018 um 02:04 schrieb Junio C Hamano:
> ...
>>  $ git grep -v second
>>  $ git grep --not -e second
>> 
>> may hit all lines in this message (except for the obvious two
>> lines), but we cannot say which column we found a hit.  I am
>> wondering if it is too grave a sin to report "the whole line is what
>> satisfied the criteria given" and say the match lies at column #1.

And if we are planning to use this to implement '-o', then I'd
suggest that we'd say the matched part of the line is the whole
thing (i.e. so is column #1, eo is at the eol).

>> By doing so, obviously we can sidestep the whole "this mode is
>> sometimes incompatible" and "I need to compute a lot to see if the
>> given expression is compatible or not" issues.
>
> FWIW, Silver Searcher 2.1.0 does just that:
>
>   $ echo a | ag --column -v b
>   1:a
>
> ripgrep 0.8.1 as well:
>
>   $ echo a | rg --column -v b
>   1:1:a

Thanks for additional datapoints.



Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'

2018-05-09 Thread René Scharfe
Am 10.05.2018 um 02:04 schrieb Junio C Hamano:
> Taylor Blau  writes:
> 
>> This check we should retain and change the wording to mention '--and',
>> '--or', and '--not' specifically.
> 
> Why are these problematic in the first place?  If I said
> 
>  $ git grep -e first --and -e these
>  $ git grep -e first --and --not -e those
>  $ git grep -e first --or -e those
> 
> I'd expect that the first line of this paragraph will hit, and the
> first hit for these three are "these", "first" and "first",
> respectively.  Most importantly, in the last one, "--or" can be
> omitted and the whole thing stops being "extended", so rejecting
> extended as a whole does not make much sense.
> 
>  $ git grep -v second
>  $ git grep --not -e second
> 
> may hit all lines in this message (except for the obvious two
> lines), but we cannot say which column we found a hit.  I am
> wondering if it is too grave a sin to report "the whole line is what
> satisfied the criteria given" and say the match lies at column #1.
> 
> By doing so, obviously we can sidestep the whole "this mode is
> sometimes incompatible" and "I need to compute a lot to see if the
> given expression is compatible or not" issues.

FWIW, Silver Searcher 2.1.0 does just that:

$ echo a | ag --column -v b
1:a

ripgrep 0.8.1 as well:

$ echo a | rg --column -v b
1:1:a

Side note: This example also shows that --column implies --line-number
for ripgrep because column numbers are mostly useless without line
numbers (https://github.com/BurntSushi/ripgrep/issues/243).  I'm not
sure I'm buying that reasoning.

ack-grep 2.22 seems to have problems with that combination:

$ echo a | ack --column -v b
a

$ echo a | ack -H --column -v b
-
Use of uninitialized value $line_parts[1] in join or string at 
/usr/bin/ack line 653,  line 1.
1::a

René


Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'

2018-05-09 Thread Junio C Hamano
Taylor Blau  writes:

> This check we should retain and change the wording to mention '--and',
> '--or', and '--not' specifically.

Why are these problematic in the first place?  If I said

$ git grep -e first --and -e these
$ git grep -e first --and --not -e those
$ git grep -e first --or -e those

I'd expect that the first line of this paragraph will hit, and the
first hit for these three are "these", "first" and "first",
respectively.  Most importantly, in the last one, "--or" can be
omitted and the whole thing stops being "extended", so rejecting
extended as a whole does not make much sense.

$ git grep -v second
$ git grep --not -e second

may hit all lines in this message (except for the obvious two
lines), but we cannot say which column we found a hit.  I am
wondering if it is too grave a sin to report "the whole line is what
satisfied the criteria given" and say the match lies at column #1.

By doing so, obviously we can sidestep the whole "this mode is
sometimes incompatible" and "I need to compute a lot to see if the
given expression is compatible or not" issues.


Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'

2018-05-09 Thread Taylor Blau
On Wed, May 09, 2018 at 07:26:57PM +0200, Martin Ågren wrote:
> On 9 May 2018 at 12:41, Phillip Wood  wrote:
> > On 09/05/18 03:13, Taylor Blau wrote:
> >>
> >>   +--column::
> >> +   Prefix the 1-indexed byte-offset of the first match on non-context
> >> lines. This
> >> +   option is incompatible with '--invert-match', and extended
> >> expressions.
> >> +
> >
> >
> > Sorry to be fussy, but while this is clearer I think to could be improved to
> > make it clear that it is the offset from the start of the matching line.
> > Also the mention of 'extended expressions' made me think of 'grep -E' but I
> > think (correct me if I'm wrong) you mean the boolean options '--and',
> > '--not' and '--or'. The man page only uses the word extended when talking
> > about extended regexes. I think something like
> >
> > Print the 1-indexed byte-offset of the first match from the start of the
> > matching line. This option is incompatible with '--invert-match', '--and',
> > '--not' and '--or'.
> >
> > would be clearer
>
> >> +   if (opt->columnnum && opt->extended)
> >> +   die(_("--column and extended expressions cannot be 
> >> combined"));
> >> +
>
> Just so it doesn't get missed: Phillip's comment (which I agree with)
> about "extended" would apply here as well. This would work fine, no?

This check we should retain and change the wording to mention '--and',
'--or', and '--not' specifically.

> One thing to notice is that dying for `--column --not` in combination
> with patch 7/7 makes git-jump unable to handle `--not` (and friends).
> That would be a regression? I suppose git-jump could use a special
> `--maybe-column` which would be "please use --column, unless I give you
> something that won't play well with it". Or --column could do that kind
> of falling back on its own. Or, git-jump could scan the arguments to
> decide whether to use `--column` or not. Hmm... Tricky. :-/

Agree that this is tricky. I don't think that --maybe-column is a
direction that we should take for the reasons I outlined in the cover
letter. Like I said, there are cases under an extended grammar where we
can and cannot display meaningful column offsets.

With regards to regressing 'git-jump', I feel as if 'git-jump --not' is
an uncommon-enough case that I would be comfortable with the tradeoff.
If a caller _is_ using '--not' in 'git-jump', they can reconfigure
'jump.grepCmd' to work around this issue.

Perhaps this is worth warning about in 'git-jump'? Peff, what do you
think?


Thanks,
Taylor


Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'

2018-05-09 Thread Taylor Blau
On Wed, May 09, 2018 at 11:41:02AM +0100, Phillip Wood wrote:
> Hi Taylor
>
> On 09/05/18 03:13, Taylor Blau wrote:
> > Teach 'git-grep(1)' a new option, '--column', to show the column
> > number of the first match on a non-context line. This makes it possible
> > to teach 'contrib/git-jump/git-jump' how to seek to the first matching
> > position of a grep match in your editor, and allows similar additional
> > scripting capabilities.
> >
> > For example:
> >
> >$ git grep -n --column foo | head -n3
> >.clang-format:51:14:# myFunction(foo, bar, baz);
> >.clang-format:64:7:# int foo();
> >.clang-format:75:8:# void foo()
> >
> > Signed-off-by: Taylor Blau 
> > ---
> >   Documentation/git-grep.txt |  6 +-
> >   builtin/grep.c |  4 
> >   grep.c |  3 +++
> >   t/t7810-grep.sh| 32 
> >   4 files changed, 44 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
> > index 18b494731f..75f1561112 100644
> > --- a/Documentation/git-grep.txt
> > +++ b/Documentation/git-grep.txt
> > @@ -13,7 +13,7 @@ SYNOPSIS
> >[-v | --invert-match] [-h|-H] [--full-name]
> >[-E | --extended-regexp] [-G | --basic-regexp]
> >[-P | --perl-regexp]
> > -  [-F | --fixed-strings] [-n | --line-number]
> > +  [-F | --fixed-strings] [-n | --line-number] [--column]
> >[-l | --files-with-matches] [-L | --files-without-match]
> >[(-O | --open-files-in-pager) []]
> >[-z | --null]
> > @@ -169,6 +169,10 @@ providing this option will cause it to die.
> >   --line-number::
> > Prefix the line number to matching lines.
> > +--column::
> > +   Prefix the 1-indexed byte-offset of the first match on non-context 
> > lines. This
> > +   option is incompatible with '--invert-match', and extended expressions.
> > +
>
> Sorry to be fussy, but while this is clearer I think to could be improved to
> make it clear that it is the offset from the start of the matching line.
> Also the mention of 'extended expressions' made me think of 'grep -E' but I
> think (correct me if I'm wrong) you mean the boolean options '--and',
> '--not' and '--or'. The man page only uses the word extended when talking
> about extended regexes. I think something like
>
> Print the 1-indexed byte-offset of the first match from the start of the
> matching line. This option is incompatible with '--invert-match', '--and',
> '--not' and '--or'.
>
> would be clearer

I agree, and would be happy to change it as-such. I think that there is
some pending discussion about regressing 'git-jump' no longer supporting
'--not', so I'll wait for that to resolve before resending this patch.

Thanks,
Taylor


Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'

2018-05-09 Thread Taylor Blau
On Wed, May 09, 2018 at 06:17:20PM +0200, Duy Nguyen wrote:
> On Wed, May 9, 2018 at 4:13 AM, Taylor Blau  wrote:
> > diff --git a/builtin/grep.c b/builtin/grep.c
> > index 5f32d2ce84..f9f516dfc4 100644
> > --- a/builtin/grep.c
> > +++ b/builtin/grep.c
> > @@ -829,6 +829,7 @@ int cmd_grep(int argc, const char **argv, const char 
> > *prefix)
> > GREP_PATTERN_TYPE_PCRE),
> > OPT_GROUP(""),
> > OPT_BOOL('n', "line-number", , N_("show line 
> > numbers")),
> > +   OPT_BOOL(0, "column", , N_("show column 
> > number of first match")),
>
> Two things to consider:
>
> - do we ever want columnar output in git-grep? Something like "git
> grep --column -l" could make sense (if you don't have very large
> worktree). --column is currently used for column output in git-branch,
> git-tag and git-status, which makes me think maybe we should reserve
> "--column" for that purpose and use another name here, even if we
> don't ever want column output in git-grep, for consistency.

I think that this is a valid concern. I had a similar thought when
adding 'git config --color' (as a new type specifier) that we might be
squatting on '--color' and instead opted for '[--type=]'.

I don't feel that the tradeoff between '--column' as a good name and the
concern that we _might_ want to output in a columnar format in 'git
grep' someday warrants the change.

> - If this is to help git-jump only and rarely manually specified on
> command line, you could add the flag PARSE_OPT_NOCOMPLETE to hide it
> from "git grep --" completion. You would need to use OPT_BOOL_F()
> instead of OPT_BOOL() in order to add extra flags.

I believe that this option is worth auto-completing. Its primarily
motivated for use within 'git-jump', but I feel as if it would be
useful for other callers, as well.

Thanks,
Taylor


Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'

2018-05-09 Thread Martin Ågren
On 9 May 2018 at 12:41, Phillip Wood  wrote:
> On 09/05/18 03:13, Taylor Blau wrote:
>>
>>   +--column::
>> +   Prefix the 1-indexed byte-offset of the first match on non-context
>> lines. This
>> +   option is incompatible with '--invert-match', and extended
>> expressions.
>> +
>
>
> Sorry to be fussy, but while this is clearer I think to could be improved to
> make it clear that it is the offset from the start of the matching line.
> Also the mention of 'extended expressions' made me think of 'grep -E' but I
> think (correct me if I'm wrong) you mean the boolean options '--and',
> '--not' and '--or'. The man page only uses the word extended when talking
> about extended regexes. I think something like
>
> Print the 1-indexed byte-offset of the first match from the start of the
> matching line. This option is incompatible with '--invert-match', '--and',
> '--not' and '--or'.
>
> would be clearer

>> +   if (opt->columnnum && opt->extended)
>> +   die(_("--column and extended expressions cannot be 
>> combined"));
>> +

Just so it doesn't get missed: Phillip's comment (which I agree with)
about "extended" would apply here as well. This would work fine, no?

One thing to notice is that dying for `--column --not` in combination
with patch 7/7 makes git-jump unable to handle `--not` (and friends).
That would be a regression? I suppose git-jump could use a special
`--maybe-column` which would be "please use --column, unless I give you
something that won't play well with it". Or --column could do that kind
of falling back on its own. Or, git-jump could scan the arguments to
decide whether to use `--column` or not. Hmm... Tricky. :-/

Martin


Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'

2018-05-09 Thread Duy Nguyen
On Wed, May 9, 2018 at 4:13 AM, Taylor Blau  wrote:
> diff --git a/builtin/grep.c b/builtin/grep.c
> index 5f32d2ce84..f9f516dfc4 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -829,6 +829,7 @@ int cmd_grep(int argc, const char **argv, const char 
> *prefix)
> GREP_PATTERN_TYPE_PCRE),
> OPT_GROUP(""),
> OPT_BOOL('n', "line-number", , N_("show line 
> numbers")),
> +   OPT_BOOL(0, "column", , N_("show column number 
> of first match")),

Two things to consider:

- do we ever want columnar output in git-grep? Something like "git
grep --column -l" could make sense (if you don't have very large
worktree). --column is currently used for column output in git-branch,
git-tag and git-status, which makes me think maybe we should reserve
"--column" for that purpose and use another name here, even if we
don't ever want column output in git-grep, for consistency.

- If this is to help git-jump only and rarely manually specified on
command line, you could add the flag PARSE_OPT_NOCOMPLETE to hide it
from "git grep --" completion. You would need to use OPT_BOOL_F()
instead of OPT_BOOL() in order to add extra flags.
-- 
Duy


Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'

2018-05-09 Thread Phillip Wood

Hi Taylor

On 09/05/18 03:13, Taylor Blau wrote:

Teach 'git-grep(1)' a new option, '--column', to show the column
number of the first match on a non-context line. This makes it possible
to teach 'contrib/git-jump/git-jump' how to seek to the first matching
position of a grep match in your editor, and allows similar additional
scripting capabilities.

For example:

   $ git grep -n --column foo | head -n3
   .clang-format:51:14:# myFunction(foo, bar, baz);
   .clang-format:64:7:# int foo();
   .clang-format:75:8:# void foo()

Signed-off-by: Taylor Blau 
---
  Documentation/git-grep.txt |  6 +-
  builtin/grep.c |  4 
  grep.c |  3 +++
  t/t7810-grep.sh| 32 
  4 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 18b494731f..75f1561112 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -13,7 +13,7 @@ SYNOPSIS
   [-v | --invert-match] [-h|-H] [--full-name]
   [-E | --extended-regexp] [-G | --basic-regexp]
   [-P | --perl-regexp]
-  [-F | --fixed-strings] [-n | --line-number]
+  [-F | --fixed-strings] [-n | --line-number] [--column]
   [-l | --files-with-matches] [-L | --files-without-match]
   [(-O | --open-files-in-pager) []]
   [-z | --null]
@@ -169,6 +169,10 @@ providing this option will cause it to die.
  --line-number::
Prefix the line number to matching lines.
  
+--column::

+   Prefix the 1-indexed byte-offset of the first match on non-context 
lines. This
+   option is incompatible with '--invert-match', and extended expressions.
+


Sorry to be fussy, but while this is clearer I think to could be 
improved to make it clear that it is the offset from the start of the 
matching line. Also the mention of 'extended expressions' made me think 
of 'grep -E' but I think (correct me if I'm wrong) you mean the boolean 
options '--and', '--not' and '--or'. The man page only uses the word 
extended when talking about extended regexes. I think something like


Print the 1-indexed byte-offset of the first match from the start of the 
matching line. This option is incompatible with '--invert-match', 
'--and', '--not' and '--or'.


would be clearer

Best Wishes

Phillip



  -l::
  --files-with-matches::
  --name-only::
diff --git a/builtin/grep.c b/builtin/grep.c
index 5f32d2ce84..f9f516dfc4 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -829,6 +829,7 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
GREP_PATTERN_TYPE_PCRE),
OPT_GROUP(""),
OPT_BOOL('n', "line-number", , N_("show line 
numbers")),
+   OPT_BOOL(0, "column", , N_("show column number of 
first match")),
OPT_NEGBIT('h', NULL, , N_("don't show 
filenames"), 1),
OPT_BIT('H', NULL, , N_("show filenames"), 1),
OPT_NEGBIT(0, "full-name", ,
@@ -,6 +1112,9 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
hit = grep_objects(, , the_repository, );
}
  
+	if (opt.columnnum && opt.invert)

+   die(_("--column and --invert-match cannot be combined"));
+
if (num_threads)
hit |= wait_all();
if (hit && show_in_pager)
diff --git a/grep.c b/grep.c
index f3fe416791..f4228c23ac 100644
--- a/grep.c
+++ b/grep.c
@@ -995,6 +995,9 @@ static void compile_grep_patterns_real(struct grep_opt *opt)
else if (!opt->extended && !opt->debug)
return;
  
+	if (opt->columnnum && opt->extended)

+   die(_("--column and extended expressions cannot be combined"));
+
p = opt->pattern_list;
if (p)
opt->pattern_expression = compile_pattern_expr();
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 1797f632a3..aa56b21ed9 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -99,6 +99,28 @@ do
test_cmp expected actual
'
  
+	test_expect_success "grep -w $L (with --column)" '

+   {
+   echo ${HC}file:5:foo mmap bar
+   echo ${HC}file:14:foo_mmap bar mmap
+   echo ${HC}file:5:foo mmap bar_mmap
+   echo ${HC}file:14:foo_mmap bar mmap baz
+   } >expected &&
+   git grep --column -w -e mmap $H >actual &&
+   test_cmp expected actual
+   '
+
+   test_expect_success "grep -w $L (with --line-number, --column)" '
+   {
+   echo ${HC}file:1:5:foo mmap bar
+   echo ${HC}file:3:14:foo_mmap bar mmap
+   echo ${HC}file:4:5:foo mmap bar_mmap
+   echo ${HC}file:5:14:foo_mmap bar mmap baz
+   } >expected &&
+   git grep -n --column -w -e mmap $H >actual &&
+

[PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'

2018-05-08 Thread Taylor Blau
Teach 'git-grep(1)' a new option, '--column', to show the column
number of the first match on a non-context line. This makes it possible
to teach 'contrib/git-jump/git-jump' how to seek to the first matching
position of a grep match in your editor, and allows similar additional
scripting capabilities.

For example:

  $ git grep -n --column foo | head -n3
  .clang-format:51:14:# myFunction(foo, bar, baz);
  .clang-format:64:7:# int foo();
  .clang-format:75:8:# void foo()

Signed-off-by: Taylor Blau 
---
 Documentation/git-grep.txt |  6 +-
 builtin/grep.c |  4 
 grep.c |  3 +++
 t/t7810-grep.sh| 32 
 4 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 18b494731f..75f1561112 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -13,7 +13,7 @@ SYNOPSIS
   [-v | --invert-match] [-h|-H] [--full-name]
   [-E | --extended-regexp] [-G | --basic-regexp]
   [-P | --perl-regexp]
-  [-F | --fixed-strings] [-n | --line-number]
+  [-F | --fixed-strings] [-n | --line-number] [--column]
   [-l | --files-with-matches] [-L | --files-without-match]
   [(-O | --open-files-in-pager) []]
   [-z | --null]
@@ -169,6 +169,10 @@ providing this option will cause it to die.
 --line-number::
Prefix the line number to matching lines.
 
+--column::
+   Prefix the 1-indexed byte-offset of the first match on non-context 
lines. This
+   option is incompatible with '--invert-match', and extended expressions.
+
 -l::
 --files-with-matches::
 --name-only::
diff --git a/builtin/grep.c b/builtin/grep.c
index 5f32d2ce84..f9f516dfc4 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -829,6 +829,7 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
GREP_PATTERN_TYPE_PCRE),
OPT_GROUP(""),
OPT_BOOL('n', "line-number", , N_("show line 
numbers")),
+   OPT_BOOL(0, "column", , N_("show column number of 
first match")),
OPT_NEGBIT('h', NULL, , N_("don't show 
filenames"), 1),
OPT_BIT('H', NULL, , N_("show filenames"), 1),
OPT_NEGBIT(0, "full-name", ,
@@ -,6 +1112,9 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
hit = grep_objects(, , the_repository, );
}
 
+   if (opt.columnnum && opt.invert)
+   die(_("--column and --invert-match cannot be combined"));
+
if (num_threads)
hit |= wait_all();
if (hit && show_in_pager)
diff --git a/grep.c b/grep.c
index f3fe416791..f4228c23ac 100644
--- a/grep.c
+++ b/grep.c
@@ -995,6 +995,9 @@ static void compile_grep_patterns_real(struct grep_opt *opt)
else if (!opt->extended && !opt->debug)
return;
 
+   if (opt->columnnum && opt->extended)
+   die(_("--column and extended expressions cannot be combined"));
+
p = opt->pattern_list;
if (p)
opt->pattern_expression = compile_pattern_expr();
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 1797f632a3..aa56b21ed9 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -99,6 +99,28 @@ do
test_cmp expected actual
'
 
+   test_expect_success "grep -w $L (with --column)" '
+   {
+   echo ${HC}file:5:foo mmap bar
+   echo ${HC}file:14:foo_mmap bar mmap
+   echo ${HC}file:5:foo mmap bar_mmap
+   echo ${HC}file:14:foo_mmap bar mmap baz
+   } >expected &&
+   git grep --column -w -e mmap $H >actual &&
+   test_cmp expected actual
+   '
+
+   test_expect_success "grep -w $L (with --line-number, --column)" '
+   {
+   echo ${HC}file:1:5:foo mmap bar
+   echo ${HC}file:3:14:foo_mmap bar mmap
+   echo ${HC}file:4:5:foo mmap bar_mmap
+   echo ${HC}file:5:14:foo_mmap bar mmap baz
+   } >expected &&
+   git grep -n --column -w -e mmap $H >actual &&
+   test_cmp expected actual
+   '
+
test_expect_success "grep -w $L" '
{
echo ${HC}file:1:foo mmap bar
@@ -1590,4 +1612,14 @@ test_expect_success 'grep does not report i-t-a and 
assume unchanged with -L' '
test_cmp expected actual
 '
 
+test_expect_success 'grep does not allow --column, --invert-match' '
+   test_must_fail git grep --column --invert-match pat 2>err &&
+   test_i18ngrep "\-\-column and \-\-invert-match cannot be combined" err
+'
+
+test_expect_success 'grep does not allow --column, extended' '
+   test_must_fail git grep --column --not -e pat 2>err &&
+   test_i18ngrep "\-\-column and extended