Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'
On Fri, May 18, 2018 at 02:50:21PM -0700, Taylor Blau wrote: > [...] > > > Another might be to pick "foo" in the first and "bar" in the second > > line, as that is the "first hit" on the line, which is consistent > > with how "git grep -e foo" would say about "a foo b foo c foo" (I > > expect that the leftmost "foo" would be the first hit). So there > > may be multiple, equally plausible answer to the question. > > This is the largest fact in my mind pertaining to this discussion: there > are probably many different interpretations of semantics for this, all > equally valid in their own way. I am partial to the minimum substring > interpretation (which follows naturally from the minimum-start, > maximum-end idea), accepting the shortcoming that `--or` sometimes > doesn't ``do the right thing.'' Ignore this last part. `--or` _does_ do the right thing, as noted below. Thanks, Taylor
Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'
Thanks for your thoughtful response. I answered in detail below, but I think that we're in agreement about the semantics, with a few corrections on my part. I'd like to push forward with this series, including the proposed changes below, but feel that sending it as v7 would be asking too much of a reviewer. Would it be OK if I sent this a new series entirely and we abandon this thread? On Fri, May 18, 2018 at 03:27:44PM +0900, Junio C Hamano wrote: > Taylor Blauwrites: > > > * `git grep --and -e foo -e bar`. This binary operation should recur > > on its sub-expressions and take the minimum of the starting offset > > and the maximum of the ending offset. > > We use infix notation, so the above is "git grep -e foo --and -e > bar" actually ;-). Thanks for catching that :-). > But you raise an interesting point. A line with "hello foo bar baz" > on it would match, so does a line with "goodbye bar baz foo", as > both of them hits pattern "foo" *and* pattern "bar". It is not > quite clear what it means to "show the first hit on the line". One > interpretation would be to take the minimum span that makes both > sides of "--and" happy (your "minimum of start, maximum of end"). It's funny you should mention: this was nearly the exact phrase I used when speaking with Peff. > Another might be to pick "foo" in the first and "bar" in the second > line, as that is the "first hit" on the line, which is consistent > with how "git grep -e foo" would say about "a foo b foo c foo" (I > expect that the leftmost "foo" would be the first hit). So there > may be multiple, equally plausible answer to the question. This is the largest fact in my mind pertaining to this discussion: there are probably many different interpretations of semantics for this, all equally valid in their own way. I am partial to the minimum substring interpretation (which follows naturally from the minimum-start, maximum-end idea), accepting the shortcoming that `--or` sometimes doesn't ``do the right thing.'' > > For inputs of the form "foobar" and "foo bar", it will do the right > > thing (give the starting and ending offset for "foobar" and give no > > match, respectively). > > I think I agree with "foobar", but I do not understand why there is > no match for "foo bar". Ah, I think this is my mistake -- when I wrote this note last night. The case of `-e foo --and -e bar` should clearly match both `foo bar` _and_ `foobar`. > > * `git grep --or -e foo -e bar`. This is the most complicated case, in > > my opinion. In going with the min/max idea in the and case above, I > > think that `--or` should also min/max its sub-expressions, but in > > fact we short-circuit evaluating the second sub-expression when we > > find a match for the first. > > I am not sure I follow. "git grep -e foo --or -e bar" is just a > longhand for "git grep -e foo -e bar". Shouldn't it highlight > whichever between foo and bar that appears leftmost on the line? I don't believe that the two are treated the same, but I think that this is another case where I was incorrect in my judgement of the implementation last night. In fact, the only time when we _don't_ recur on both sub-expressions of `--or` is when 'collect_hits' is zero. That's fine, I believe. > > So, in cases like matching `--or -e foo -e bar` with "foo baz bar", > > we'll do the right thing, since `foo` is the first sub-expression > > and happens to be the left-most match. In other words, we __adhere > > to our answer with the left-most match first__ semantics, but only > > because __the first sub-expression is the left-most match__. > > > > In the other case where we try and match the same expression against > > "bar baz foo", we'll return the starting offset of "foo", even > > though it isn't the left-most match, violating our semantics. > > I am not sure why you think your min-starting/max-ending would lead > to such a conclusion. 'foo baz bar' would be covered in its > entirety, 'bar baz foo' would also, as starting of hits with pattern > 'foo' and pattern 'bar' would be 'b' in 'bar' on that three-word > line, and ending of hits with these two patterns would be the last > 'o' in 'foo' on the line. Right, I think with the understanding in my last stanza of this response ("I don't believe that ...") this issue is resolved, and the min-starting/max-ending _will_ do the right thing. > I'd expect that a line 'foo baz bar' matched against "-e foo --or -e > bar" would say "among three words on me, 'f' in foo is the first > location of the match", though. I would, too. Thanks, Taylor
Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'
Taylor Blauwrites: > * `git grep --and -e foo -e bar`. This binary operation should recur > on its sub-expressions and take the minimum of the starting offset > and the maximum of the ending offset. We use infix notation, so the above is "git grep -e foo --and -e bar" actually ;-). But you raise an interesting point. A line with "hello foo bar baz" on it would match, so does a line with "goodbye bar baz foo", as both of them hits pattern "foo" *and* pattern "bar". It is not quite clear what it means to "show the first hit on the line". One interpretation would be to take the minimum span that makes both sides of "--and" happy (your "minimum of start, maximum of end"). Another might be to pick "foo" in the first and "bar" in the second line, as that is the "first hit" on the line, which is consistent with how "git grep -e foo" would say about "a foo b foo c foo" (I expect that the leftmost "foo" would be the first hit). So there may be multiple, equally plausible answer to the question. > For inputs of the form "foobar" and "foo bar", it will do the right > thing (give the starting and ending offset for "foobar" and give no > match, respectively). I think I agree with "foobar", but I do not understand why there is no match for "foo bar". > * `git grep --or -e foo -e bar`. This is the most complicated case, in > my opinion. In going with the min/max idea in the and case above, I > think that `--or` should also min/max its sub-expressions, but in > fact we short-circuit evaluating the second sub-expression when we > find a match for the first. I am not sure I follow. "git grep -e foo --or -e bar" is just a longhand for "git grep -e foo -e bar". Shouldn't it highlight whichever between foo and bar that appears leftmost on the line? > So, in cases like matching `--or -e foo -e bar` with "foo baz bar", > we'll do the right thing, since `foo` is the first sub-expression > and happens to be the left-most match. In other words, we __adhere > to our answer with the left-most match first__ semantics, but only > because __the first sub-expression is the left-most match__. > > In the other case where we try and match the same expression against > "bar baz foo", we'll return the starting offset of "foo", even > though it isn't the left-most match, violating our semantics. I am not sure why you think your min-starting/max-ending would lead to such a conclusion. 'foo baz bar' would be covered in its entirety, 'bar baz foo' would also, as starting of hits with pattern 'foo' and pattern 'bar' would be 'b' in 'bar' on that three-word line, and ending of hits with these two patterns would be the last 'o' in 'foo' on the line. I'd expect that a line 'foo baz bar' matched against "-e foo --or -e bar" would say "among three words on me, 'f' in foo is the first location of the match", though.
Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'
On Sat, May 12, 2018 at 03:07:04PM +0900, Junio C Hamano wrote: > Taylor Blauwrites: > > > I re-read your note and understand more clearly now what your suggestion > > is. To ensure that we're in agreement, do you mean: > > > > 1. '--column -v' will _never_ give a column, but will never die(), > > either > > No, I don't. > > > 2. '--column --[and | or | not]' will never give a column, but will > > also never die(), either. > > No, I don't. > > If a file does not have substring "foo", then > > git grep -v -e foo file > git grep --not -e foo file > > would hit all lines, just like > > git grep -e '.*' file > > does. > > I would expect that all of these > > git grep --column/-o -v -e foo file > git grep --column/-o --not -e foo file > git grep --column/-o -e '.*' file > > give the same output, which is what we would get if we consider the > hit from "choose lines that lack 'foo'" on a line without 'foo' is > caused by the entire contents on the line. That is in line with > "choose lines that has anything (including nothing)" aka ".*" would > result in the entire line being reported via -o. The byte offset of > the first hit on such a line reported by --column is also 1, and > that is a good and real answer to the question "git grep --column/-o" > can give. I agree with your message now and thank you for explaining what you had written. I spoke with Peff off-list for a while to determine what I think is essentially the answer to ``what are a set of semantics for filling out a regmatch_t given an extended expression?'' It's helpful to recognize that the extended expressions are implemented very much like a tree, so a reasonable semantics will lend itself well to the way in which match_expr_eval() is implemented. Here's what we came up with: * `git grep -e foo`. This is the case where the extended expression has a single atomic node in its tree. This falls into the "just call match_one_pattern()" case and has a simple answer: the starting offset and ending offset are that of whatever match_one_pattern gives. * `git grep --not -e foo`. This has the set of semantics that you describe above (the starting offset is 1), with the addition that the ending offset is the end of the line. This is similar to the fact that `--not foo` is very similar to `.$`. * `git grep --and -e foo -e bar`. This binary operation should recur on its sub-expressions and take the minimum of the starting offset and the maximum of the ending offset. For inputs of the form "foobar" and "foo bar", it will do the right thing (give the starting and ending offset for "foobar" and give no match, respectively). * `git grep --or -e foo -e bar`. This is the most complicated case, in my opinion. In going with the min/max idea in the and case above, I think that `--or` should also min/max its sub-expressions, but in fact we short-circuit evaluating the second sub-expression when we find a match for the first. So, in cases like matching `--or -e foo -e bar` with "foo baz bar", we'll do the right thing, since `foo` is the first sub-expression and happens to be the left-most match. In other words, we __adhere to our answer with the left-most match first__ semantics, but only because __the first sub-expression is the left-most match__. In the other case where we try and match the same expression against "bar baz foo", we'll return the starting offset of "foo", even though it isn't the left-most match, violating our semantics. So, I propose we adopt the following: use the trivial answer for "foo", the whole line for "--not", and min/max the starting/ending offsets for binary operators, knowing that we will sometimes produce a weird answer for --or. I think that the semantics for --or are OK to go forward with, but would be interested in the thoughts of others to figure out whether this is sensible to everyone else. Does this seem like an OK approach? Perhaps Peff can clarify some of what's shared here, since we did speak elsewhere about it. Thanks, Taylor
Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'
Taylor Blauwrites: > I re-read your note and understand more clearly now what your suggestion > is. To ensure that we're in agreement, do you mean: > > 1. '--column -v' will _never_ give a column, but will never die(), > either No, I don't. > 2. '--column --[and | or | not]' will never give a column, but will > also never die(), either. No, I don't. If a file does not have substring "foo", then git grep -v -e foo file git grep --not -e foo file would hit all lines, just like git grep -e '.*' file does. I would expect that all of these git grep --column/-o -v -e foo file git grep --column/-o --not -e foo file git grep --column/-o -e '.*' file give the same output, which is what we would get if we consider the hit from "choose lines that lack 'foo'" on a line without 'foo' is caused by the entire contents on the line. That is in line with "choose lines that has anything (including nothing)" aka ".*" would result in the entire line being reported via -o. The byte offset of the first hit on such a line reported by --column is also 1, and that is a good and real answer to the question "git grep --column/-o" can give. In an earlier message, you sounded like you do not think "we did not have 'foo' on that line, and that is why we are emitting because we are operating under -v" lack a definite answer for --column, but I think you are wrong. "On the entire line, we didn't find 'foo' anywhere" is good enough reason for me to make the answer "the entire line contributed to this hit" a definite one. Exactly the same applies for "git grep --not -e foo". When "git grep -e bar [--or] --not -e foo" shows a line because the line has 'bar' on it, we have --column that points at 'b' of the first 'bar' on the line. When it shows a line because the line has neither 'bar' or 'foo', then "--not -e foo" part would give a definite "the entire line contributed to this decision that it does not have 'foo'".
Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'
On Sat, May 12, 2018 at 02:08:48PM +0900, Junio C Hamano wrote: > Taylor Blauwrites: > > >> $ git grep -v second > >> $ git grep --not -e second > >> > >> may hit all lines in this message (except for the obvious two > >> lines), but we cannot say which column we found a hit. I am > >> wondering if it is too grave a sin to report "the whole line is what > >> satisfied the criteria given" and say the match lies at column #1. > > > > I think that is sensible. I previously was opposed to this because I > > thought that it would be too difficult to script around the 'sometimes > > we have columns but other times not' and 'I gave --column' but have to > > check whether or not they are really there. > > I am not sure if you really got what I meant. I am suggesting that > "git grep -v --column second" should report that the entire line has > hit for each and every line that does not have "second" on it, which > is a good answer and eliminate "sometimes there is column answer (or > answer to -o query) and sometimes not" at the same time. I re-read your note and understand more clearly now what your suggestion is. To ensure that we're in agreement, do you mean: 1. '--column -v' will _never_ give a column, but will never die(), either 2. '--column --[and | or | not]' will never give a column, but will also never die(), either. I think that _those_ semantics are sensible, and I apologize for misinterpreting your original note. > > In other terms: > > > > * not giving '--column' will _never_ give a column, > > * '--column --invert' will _always_ die(), and > > * '--column --[and | or | not]' will _never_ give a column. > > So this is completely opposite from what I would have expected. to > somebody who said "I think that is sensible." over there. Thanks, Taylor
Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'
Taylor Blauwrites: >> $ git grep -v second >> $ git grep --not -e second >> >> may hit all lines in this message (except for the obvious two >> lines), but we cannot say which column we found a hit. I am >> wondering if it is too grave a sin to report "the whole line is what >> satisfied the criteria given" and say the match lies at column #1. > > I think that is sensible. I previously was opposed to this because I > thought that it would be too difficult to script around the 'sometimes > we have columns but other times not' and 'I gave --column' but have to > check whether or not they are really there. I am not sure if you really got what I meant. I am suggesting that "git grep -v --column second" should report that the entire line has hit for each and every line that does not have "second" on it, which is a good answer and eliminate "sometimes there is column answer (or answer to -o query) and sometimes not" at the same time. > In other terms: > > * not giving '--column' will _never_ give a column, > * '--column --invert' will _always_ die(), and > * '--column --[and | or | not]' will _never_ give a column. So this is completely opposite from what I would have expected. to somebody who said "I think that is sensible." over there.
Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'
On Thu, May 10, 2018 at 09:04:34AM +0900, Junio C Hamano wrote: > Taylor Blauwrites: > > > This check we should retain and change the wording to mention '--and', > > '--or', and '--not' specifically. > > Why are these problematic in the first place? If I said > > $ git grep -e first --and -e these > $ git grep -e first --and --not -e those > $ git grep -e first --or -e those > > I'd expect that the first line of this paragraph will hit, and the > first hit for these three are "these", "first" and "first", > respectively. Most importantly, in the last one, "--or" can be > omitted and the whole thing stops being "extended", so rejecting > extended as a whole does not make much sense. Agreed that this is what I would expect, too. The trouble is that we never do a compilation step from containing --and, --or, --not to a POSIX regexp. So, if we're extended, we have to assume that there might not be an answer. Given this, I don't think we should categorically die() when we encounter this (more below in the next hunk of this mail). I think this thread has established that there are certainly cases where we cannot provide a meaningful answer, (--not at the top-level, for instance) but there are cases where we can (as you indicate above). One day, I would like to support --column with --and, --or, or --not in cases where there _is_ a definite answer. That said, omitting this information for now and at least not die()-ing I think is worth taking this patch earlier, and leaving some #leftoverbits. > $ git grep -v second > $ git grep --not -e second > > may hit all lines in this message (except for the obvious two > lines), but we cannot say which column we found a hit. I am > wondering if it is too grave a sin to report "the whole line is what > satisfied the criteria given" and say the match lies at column #1. I think that is sensible. I previously was opposed to this because I thought that it would be too difficult to script around the 'sometimes we have columns but other times not' and 'I gave --column' but have to check whether or not they are really there. I no longer believe that my above argument is sound. It simplifies the matter greatly to simply not share columns when we don't have a good answer, and do when we do. In other terms: * not giving '--column' will _never_ give a column, * '--column --invert' will _always_ die(), and * '--column --[and | or | not]' will _never_ give a column. Thanks, Taylor
Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'
René Scharfewrites: > Am 10.05.2018 um 02:04 schrieb Junio C Hamano: > ... >> $ git grep -v second >> $ git grep --not -e second >> >> may hit all lines in this message (except for the obvious two >> lines), but we cannot say which column we found a hit. I am >> wondering if it is too grave a sin to report "the whole line is what >> satisfied the criteria given" and say the match lies at column #1. And if we are planning to use this to implement '-o', then I'd suggest that we'd say the matched part of the line is the whole thing (i.e. so is column #1, eo is at the eol). >> By doing so, obviously we can sidestep the whole "this mode is >> sometimes incompatible" and "I need to compute a lot to see if the >> given expression is compatible or not" issues. > > FWIW, Silver Searcher 2.1.0 does just that: > > $ echo a | ag --column -v b > 1:a > > ripgrep 0.8.1 as well: > > $ echo a | rg --column -v b > 1:1:a Thanks for additional datapoints.
Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'
Am 10.05.2018 um 02:04 schrieb Junio C Hamano: > Taylor Blauwrites: > >> This check we should retain and change the wording to mention '--and', >> '--or', and '--not' specifically. > > Why are these problematic in the first place? If I said > > $ git grep -e first --and -e these > $ git grep -e first --and --not -e those > $ git grep -e first --or -e those > > I'd expect that the first line of this paragraph will hit, and the > first hit for these three are "these", "first" and "first", > respectively. Most importantly, in the last one, "--or" can be > omitted and the whole thing stops being "extended", so rejecting > extended as a whole does not make much sense. > > $ git grep -v second > $ git grep --not -e second > > may hit all lines in this message (except for the obvious two > lines), but we cannot say which column we found a hit. I am > wondering if it is too grave a sin to report "the whole line is what > satisfied the criteria given" and say the match lies at column #1. > > By doing so, obviously we can sidestep the whole "this mode is > sometimes incompatible" and "I need to compute a lot to see if the > given expression is compatible or not" issues. FWIW, Silver Searcher 2.1.0 does just that: $ echo a | ag --column -v b 1:a ripgrep 0.8.1 as well: $ echo a | rg --column -v b 1:1:a Side note: This example also shows that --column implies --line-number for ripgrep because column numbers are mostly useless without line numbers (https://github.com/BurntSushi/ripgrep/issues/243). I'm not sure I'm buying that reasoning. ack-grep 2.22 seems to have problems with that combination: $ echo a | ack --column -v b a $ echo a | ack -H --column -v b - Use of uninitialized value $line_parts[1] in join or string at /usr/bin/ack line 653, line 1. 1::a René
Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'
Taylor Blauwrites: > This check we should retain and change the wording to mention '--and', > '--or', and '--not' specifically. Why are these problematic in the first place? If I said $ git grep -e first --and -e these $ git grep -e first --and --not -e those $ git grep -e first --or -e those I'd expect that the first line of this paragraph will hit, and the first hit for these three are "these", "first" and "first", respectively. Most importantly, in the last one, "--or" can be omitted and the whole thing stops being "extended", so rejecting extended as a whole does not make much sense. $ git grep -v second $ git grep --not -e second may hit all lines in this message (except for the obvious two lines), but we cannot say which column we found a hit. I am wondering if it is too grave a sin to report "the whole line is what satisfied the criteria given" and say the match lies at column #1. By doing so, obviously we can sidestep the whole "this mode is sometimes incompatible" and "I need to compute a lot to see if the given expression is compatible or not" issues.
Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'
On Wed, May 09, 2018 at 07:26:57PM +0200, Martin Ågren wrote: > On 9 May 2018 at 12:41, Phillip Woodwrote: > > On 09/05/18 03:13, Taylor Blau wrote: > >> > >> +--column:: > >> + Prefix the 1-indexed byte-offset of the first match on non-context > >> lines. This > >> + option is incompatible with '--invert-match', and extended > >> expressions. > >> + > > > > > > Sorry to be fussy, but while this is clearer I think to could be improved to > > make it clear that it is the offset from the start of the matching line. > > Also the mention of 'extended expressions' made me think of 'grep -E' but I > > think (correct me if I'm wrong) you mean the boolean options '--and', > > '--not' and '--or'. The man page only uses the word extended when talking > > about extended regexes. I think something like > > > > Print the 1-indexed byte-offset of the first match from the start of the > > matching line. This option is incompatible with '--invert-match', '--and', > > '--not' and '--or'. > > > > would be clearer > > >> + if (opt->columnnum && opt->extended) > >> + die(_("--column and extended expressions cannot be > >> combined")); > >> + > > Just so it doesn't get missed: Phillip's comment (which I agree with) > about "extended" would apply here as well. This would work fine, no? This check we should retain and change the wording to mention '--and', '--or', and '--not' specifically. > One thing to notice is that dying for `--column --not` in combination > with patch 7/7 makes git-jump unable to handle `--not` (and friends). > That would be a regression? I suppose git-jump could use a special > `--maybe-column` which would be "please use --column, unless I give you > something that won't play well with it". Or --column could do that kind > of falling back on its own. Or, git-jump could scan the arguments to > decide whether to use `--column` or not. Hmm... Tricky. :-/ Agree that this is tricky. I don't think that --maybe-column is a direction that we should take for the reasons I outlined in the cover letter. Like I said, there are cases under an extended grammar where we can and cannot display meaningful column offsets. With regards to regressing 'git-jump', I feel as if 'git-jump --not' is an uncommon-enough case that I would be comfortable with the tradeoff. If a caller _is_ using '--not' in 'git-jump', they can reconfigure 'jump.grepCmd' to work around this issue. Perhaps this is worth warning about in 'git-jump'? Peff, what do you think? Thanks, Taylor
Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'
On Wed, May 09, 2018 at 11:41:02AM +0100, Phillip Wood wrote: > Hi Taylor > > On 09/05/18 03:13, Taylor Blau wrote: > > Teach 'git-grep(1)' a new option, '--column', to show the column > > number of the first match on a non-context line. This makes it possible > > to teach 'contrib/git-jump/git-jump' how to seek to the first matching > > position of a grep match in your editor, and allows similar additional > > scripting capabilities. > > > > For example: > > > >$ git grep -n --column foo | head -n3 > >.clang-format:51:14:# myFunction(foo, bar, baz); > >.clang-format:64:7:# int foo(); > >.clang-format:75:8:# void foo() > > > > Signed-off-by: Taylor Blau> > --- > > Documentation/git-grep.txt | 6 +- > > builtin/grep.c | 4 > > grep.c | 3 +++ > > t/t7810-grep.sh| 32 > > 4 files changed, 44 insertions(+), 1 deletion(-) > > > > diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt > > index 18b494731f..75f1561112 100644 > > --- a/Documentation/git-grep.txt > > +++ b/Documentation/git-grep.txt > > @@ -13,7 +13,7 @@ SYNOPSIS > >[-v | --invert-match] [-h|-H] [--full-name] > >[-E | --extended-regexp] [-G | --basic-regexp] > >[-P | --perl-regexp] > > - [-F | --fixed-strings] [-n | --line-number] > > + [-F | --fixed-strings] [-n | --line-number] [--column] > >[-l | --files-with-matches] [-L | --files-without-match] > >[(-O | --open-files-in-pager) []] > >[-z | --null] > > @@ -169,6 +169,10 @@ providing this option will cause it to die. > > --line-number:: > > Prefix the line number to matching lines. > > +--column:: > > + Prefix the 1-indexed byte-offset of the first match on non-context > > lines. This > > + option is incompatible with '--invert-match', and extended expressions. > > + > > Sorry to be fussy, but while this is clearer I think to could be improved to > make it clear that it is the offset from the start of the matching line. > Also the mention of 'extended expressions' made me think of 'grep -E' but I > think (correct me if I'm wrong) you mean the boolean options '--and', > '--not' and '--or'. The man page only uses the word extended when talking > about extended regexes. I think something like > > Print the 1-indexed byte-offset of the first match from the start of the > matching line. This option is incompatible with '--invert-match', '--and', > '--not' and '--or'. > > would be clearer I agree, and would be happy to change it as-such. I think that there is some pending discussion about regressing 'git-jump' no longer supporting '--not', so I'll wait for that to resolve before resending this patch. Thanks, Taylor
Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'
On Wed, May 09, 2018 at 06:17:20PM +0200, Duy Nguyen wrote: > On Wed, May 9, 2018 at 4:13 AM, Taylor Blauwrote: > > diff --git a/builtin/grep.c b/builtin/grep.c > > index 5f32d2ce84..f9f516dfc4 100644 > > --- a/builtin/grep.c > > +++ b/builtin/grep.c > > @@ -829,6 +829,7 @@ int cmd_grep(int argc, const char **argv, const char > > *prefix) > > GREP_PATTERN_TYPE_PCRE), > > OPT_GROUP(""), > > OPT_BOOL('n', "line-number", , N_("show line > > numbers")), > > + OPT_BOOL(0, "column", , N_("show column > > number of first match")), > > Two things to consider: > > - do we ever want columnar output in git-grep? Something like "git > grep --column -l" could make sense (if you don't have very large > worktree). --column is currently used for column output in git-branch, > git-tag and git-status, which makes me think maybe we should reserve > "--column" for that purpose and use another name here, even if we > don't ever want column output in git-grep, for consistency. I think that this is a valid concern. I had a similar thought when adding 'git config --color' (as a new type specifier) that we might be squatting on '--color' and instead opted for '[--type=]'. I don't feel that the tradeoff between '--column' as a good name and the concern that we _might_ want to output in a columnar format in 'git grep' someday warrants the change. > - If this is to help git-jump only and rarely manually specified on > command line, you could add the flag PARSE_OPT_NOCOMPLETE to hide it > from "git grep --" completion. You would need to use OPT_BOOL_F() > instead of OPT_BOOL() in order to add extra flags. I believe that this option is worth auto-completing. Its primarily motivated for use within 'git-jump', but I feel as if it would be useful for other callers, as well. Thanks, Taylor
Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'
On 9 May 2018 at 12:41, Phillip Woodwrote: > On 09/05/18 03:13, Taylor Blau wrote: >> >> +--column:: >> + Prefix the 1-indexed byte-offset of the first match on non-context >> lines. This >> + option is incompatible with '--invert-match', and extended >> expressions. >> + > > > Sorry to be fussy, but while this is clearer I think to could be improved to > make it clear that it is the offset from the start of the matching line. > Also the mention of 'extended expressions' made me think of 'grep -E' but I > think (correct me if I'm wrong) you mean the boolean options '--and', > '--not' and '--or'. The man page only uses the word extended when talking > about extended regexes. I think something like > > Print the 1-indexed byte-offset of the first match from the start of the > matching line. This option is incompatible with '--invert-match', '--and', > '--not' and '--or'. > > would be clearer >> + if (opt->columnnum && opt->extended) >> + die(_("--column and extended expressions cannot be >> combined")); >> + Just so it doesn't get missed: Phillip's comment (which I agree with) about "extended" would apply here as well. This would work fine, no? One thing to notice is that dying for `--column --not` in combination with patch 7/7 makes git-jump unable to handle `--not` (and friends). That would be a regression? I suppose git-jump could use a special `--maybe-column` which would be "please use --column, unless I give you something that won't play well with it". Or --column could do that kind of falling back on its own. Or, git-jump could scan the arguments to decide whether to use `--column` or not. Hmm... Tricky. :-/ Martin
Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'
On Wed, May 9, 2018 at 4:13 AM, Taylor Blauwrote: > diff --git a/builtin/grep.c b/builtin/grep.c > index 5f32d2ce84..f9f516dfc4 100644 > --- a/builtin/grep.c > +++ b/builtin/grep.c > @@ -829,6 +829,7 @@ int cmd_grep(int argc, const char **argv, const char > *prefix) > GREP_PATTERN_TYPE_PCRE), > OPT_GROUP(""), > OPT_BOOL('n', "line-number", , N_("show line > numbers")), > + OPT_BOOL(0, "column", , N_("show column number > of first match")), Two things to consider: - do we ever want columnar output in git-grep? Something like "git grep --column -l" could make sense (if you don't have very large worktree). --column is currently used for column output in git-branch, git-tag and git-status, which makes me think maybe we should reserve "--column" for that purpose and use another name here, even if we don't ever want column output in git-grep, for consistency. - If this is to help git-jump only and rarely manually specified on command line, you could add the flag PARSE_OPT_NOCOMPLETE to hide it from "git grep --" completion. You would need to use OPT_BOOL_F() instead of OPT_BOOL() in order to add extra flags. -- Duy
Re: [PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'
Hi Taylor On 09/05/18 03:13, Taylor Blau wrote: Teach 'git-grep(1)' a new option, '--column', to show the column number of the first match on a non-context line. This makes it possible to teach 'contrib/git-jump/git-jump' how to seek to the first matching position of a grep match in your editor, and allows similar additional scripting capabilities. For example: $ git grep -n --column foo | head -n3 .clang-format:51:14:# myFunction(foo, bar, baz); .clang-format:64:7:# int foo(); .clang-format:75:8:# void foo() Signed-off-by: Taylor Blau--- Documentation/git-grep.txt | 6 +- builtin/grep.c | 4 grep.c | 3 +++ t/t7810-grep.sh| 32 4 files changed, 44 insertions(+), 1 deletion(-) diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 18b494731f..75f1561112 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -13,7 +13,7 @@ SYNOPSIS [-v | --invert-match] [-h|-H] [--full-name] [-E | --extended-regexp] [-G | --basic-regexp] [-P | --perl-regexp] - [-F | --fixed-strings] [-n | --line-number] + [-F | --fixed-strings] [-n | --line-number] [--column] [-l | --files-with-matches] [-L | --files-without-match] [(-O | --open-files-in-pager) []] [-z | --null] @@ -169,6 +169,10 @@ providing this option will cause it to die. --line-number:: Prefix the line number to matching lines. +--column:: + Prefix the 1-indexed byte-offset of the first match on non-context lines. This + option is incompatible with '--invert-match', and extended expressions. + Sorry to be fussy, but while this is clearer I think to could be improved to make it clear that it is the offset from the start of the matching line. Also the mention of 'extended expressions' made me think of 'grep -E' but I think (correct me if I'm wrong) you mean the boolean options '--and', '--not' and '--or'. The man page only uses the word extended when talking about extended regexes. I think something like Print the 1-indexed byte-offset of the first match from the start of the matching line. This option is incompatible with '--invert-match', '--and', '--not' and '--or'. would be clearer Best Wishes Phillip -l:: --files-with-matches:: --name-only:: diff --git a/builtin/grep.c b/builtin/grep.c index 5f32d2ce84..f9f516dfc4 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -829,6 +829,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix) GREP_PATTERN_TYPE_PCRE), OPT_GROUP(""), OPT_BOOL('n', "line-number", , N_("show line numbers")), + OPT_BOOL(0, "column", , N_("show column number of first match")), OPT_NEGBIT('h', NULL, , N_("don't show filenames"), 1), OPT_BIT('H', NULL, , N_("show filenames"), 1), OPT_NEGBIT(0, "full-name", , @@ -,6 +1112,9 @@ int cmd_grep(int argc, const char **argv, const char *prefix) hit = grep_objects(, , the_repository, ); } + if (opt.columnnum && opt.invert) + die(_("--column and --invert-match cannot be combined")); + if (num_threads) hit |= wait_all(); if (hit && show_in_pager) diff --git a/grep.c b/grep.c index f3fe416791..f4228c23ac 100644 --- a/grep.c +++ b/grep.c @@ -995,6 +995,9 @@ static void compile_grep_patterns_real(struct grep_opt *opt) else if (!opt->extended && !opt->debug) return; + if (opt->columnnum && opt->extended) + die(_("--column and extended expressions cannot be combined")); + p = opt->pattern_list; if (p) opt->pattern_expression = compile_pattern_expr(); diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index 1797f632a3..aa56b21ed9 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -99,6 +99,28 @@ do test_cmp expected actual ' + test_expect_success "grep -w $L (with --column)" ' + { + echo ${HC}file:5:foo mmap bar + echo ${HC}file:14:foo_mmap bar mmap + echo ${HC}file:5:foo mmap bar_mmap + echo ${HC}file:14:foo_mmap bar mmap baz + } >expected && + git grep --column -w -e mmap $H >actual && + test_cmp expected actual + ' + + test_expect_success "grep -w $L (with --line-number, --column)" ' + { + echo ${HC}file:1:5:foo mmap bar + echo ${HC}file:3:14:foo_mmap bar mmap + echo ${HC}file:4:5:foo mmap bar_mmap + echo ${HC}file:5:14:foo_mmap bar mmap baz + } >expected && + git grep -n --column -w -e mmap $H >actual && +
[PATCH v5 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'
Teach 'git-grep(1)' a new option, '--column', to show the column number of the first match on a non-context line. This makes it possible to teach 'contrib/git-jump/git-jump' how to seek to the first matching position of a grep match in your editor, and allows similar additional scripting capabilities. For example: $ git grep -n --column foo | head -n3 .clang-format:51:14:# myFunction(foo, bar, baz); .clang-format:64:7:# int foo(); .clang-format:75:8:# void foo() Signed-off-by: Taylor Blau--- Documentation/git-grep.txt | 6 +- builtin/grep.c | 4 grep.c | 3 +++ t/t7810-grep.sh| 32 4 files changed, 44 insertions(+), 1 deletion(-) diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 18b494731f..75f1561112 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -13,7 +13,7 @@ SYNOPSIS [-v | --invert-match] [-h|-H] [--full-name] [-E | --extended-regexp] [-G | --basic-regexp] [-P | --perl-regexp] - [-F | --fixed-strings] [-n | --line-number] + [-F | --fixed-strings] [-n | --line-number] [--column] [-l | --files-with-matches] [-L | --files-without-match] [(-O | --open-files-in-pager) []] [-z | --null] @@ -169,6 +169,10 @@ providing this option will cause it to die. --line-number:: Prefix the line number to matching lines. +--column:: + Prefix the 1-indexed byte-offset of the first match on non-context lines. This + option is incompatible with '--invert-match', and extended expressions. + -l:: --files-with-matches:: --name-only:: diff --git a/builtin/grep.c b/builtin/grep.c index 5f32d2ce84..f9f516dfc4 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -829,6 +829,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix) GREP_PATTERN_TYPE_PCRE), OPT_GROUP(""), OPT_BOOL('n', "line-number", , N_("show line numbers")), + OPT_BOOL(0, "column", , N_("show column number of first match")), OPT_NEGBIT('h', NULL, , N_("don't show filenames"), 1), OPT_BIT('H', NULL, , N_("show filenames"), 1), OPT_NEGBIT(0, "full-name", , @@ -,6 +1112,9 @@ int cmd_grep(int argc, const char **argv, const char *prefix) hit = grep_objects(, , the_repository, ); } + if (opt.columnnum && opt.invert) + die(_("--column and --invert-match cannot be combined")); + if (num_threads) hit |= wait_all(); if (hit && show_in_pager) diff --git a/grep.c b/grep.c index f3fe416791..f4228c23ac 100644 --- a/grep.c +++ b/grep.c @@ -995,6 +995,9 @@ static void compile_grep_patterns_real(struct grep_opt *opt) else if (!opt->extended && !opt->debug) return; + if (opt->columnnum && opt->extended) + die(_("--column and extended expressions cannot be combined")); + p = opt->pattern_list; if (p) opt->pattern_expression = compile_pattern_expr(); diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index 1797f632a3..aa56b21ed9 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -99,6 +99,28 @@ do test_cmp expected actual ' + test_expect_success "grep -w $L (with --column)" ' + { + echo ${HC}file:5:foo mmap bar + echo ${HC}file:14:foo_mmap bar mmap + echo ${HC}file:5:foo mmap bar_mmap + echo ${HC}file:14:foo_mmap bar mmap baz + } >expected && + git grep --column -w -e mmap $H >actual && + test_cmp expected actual + ' + + test_expect_success "grep -w $L (with --line-number, --column)" ' + { + echo ${HC}file:1:5:foo mmap bar + echo ${HC}file:3:14:foo_mmap bar mmap + echo ${HC}file:4:5:foo mmap bar_mmap + echo ${HC}file:5:14:foo_mmap bar mmap baz + } >expected && + git grep -n --column -w -e mmap $H >actual && + test_cmp expected actual + ' + test_expect_success "grep -w $L" ' { echo ${HC}file:1:foo mmap bar @@ -1590,4 +1612,14 @@ test_expect_success 'grep does not report i-t-a and assume unchanged with -L' ' test_cmp expected actual ' +test_expect_success 'grep does not allow --column, --invert-match' ' + test_must_fail git grep --column --invert-match pat 2>err && + test_i18ngrep "\-\-column and \-\-invert-match cannot be combined" err +' + +test_expect_success 'grep does not allow --column, extended' ' + test_must_fail git grep --column --not -e pat 2>err && + test_i18ngrep "\-\-column and extended