Re: [PATCH] grep: Add option --max-line-len

2017-11-26 Thread Marc-Antoine Ruel
[second try, now with text format]

Thanks a lot for the reviews. Replying to both.

If I send a follow up, I'll fix the commit description and the help
string, remove the shorthand -M, write a more sensible test.


2017-11-23 14:24 GMT-05:00 Eric Sunshine <sunsh...@sunshineco.com>:
>> diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
>> @@ -127,6 +128,10 @@ OPTIONS
>> +-M::
>> +--max-line-len::
>> +   Match the pattern only for line shorter or equal to this length.
>
> This documentation doesn't explain what it means by a line's length.
> This implementation seems to take into consideration only the line's
> byte count, however, given that displayed lines are normally
> tab-expanded, people might intuitively expect that expansion to count
> toward the length. A similar question arises for Unicode characters.
>
> Should this option take tab-expansion and Unicode into account?
> Regardless of the answer to that question, the documentation should
> make it clear what "line length" means.

Excellent question. I don't have an immediate answer.


>> diff --git a/grep.c b/grep.c
>> @@ -1273,6 +1275,8 @@ static int match_line(struct grep_opt *opt, char *bol, 
>> char *eol,
>> +   if (opt->max_line_length > 0 && eol-bol > opt->max_line_length)
>> +   return 0;
>
> If the user specifies "-M0", should that error out or at least warn
> the user that the value is non-sensical? What about -1, etc.? (These
> are UX-related questions; the implementation obviously doesn't care
> one way or the other.)

Precedent with -A is to ignore the negative value. I don't have a
strong opinion.


2017-11-23 20:44 GMT-05:00 Junio C Hamano <gits...@pobox.com>:
>
> Marc-Antoine Ruel <mar...@chromium.org> writes:
>
> > This tells git grep to skip files longer than a specified length,
> > which is often the result of generators and not actual source files.
> >
> > ...
> > +-M::
> > +--max-line-len::
> > + Match the pattern only for line shorter or equal to this length.
> > +
>
> All the excellent review comments from Eric I agree with.
>
> With the name of the option and the above end-user facing
> description, it is very clear that the only thing this feature does
> is to declare that an overlong line does _not_ match when trying to
> check against any pattern.
>
> That is a much clearer definition and description than random new
> features people propose here (and kicked back by reviewers, telling
> them to make the specification clearer), and I'd commend you for that.
>
> But it still leaves at least one thing unclear.  How should it
> interact with "-v"?  If we consider an overlong line never matches,
> would "git grep -v " should include the line in its output?

Ah! No idea. :/

> Speaking of the output, it also makes me wonder if the feature
> really wants to include an overlong line as a context line when
> showing a near-by line that matches the pattern when -A/-B/-C/-W
> options are in use. Even though it is clear that it does from the
> above description, is it really the best thing the feature can do to
> help the end users?
>
> Which leads me to suspect that this "feature" might not be the ideal
> you wanted to achive, but is an approximate substitution that you
> found is "good enough" to simulate what the real thing you wanted to
> do, especially when I go back and read the justfication in the
> proposed log message that talks about "result of generators".
>
> Isn't it a property of the entire file, not individual lines, if you
> find it uninteresting to see reported by "git grep"?  I cannot shake
> the suspicion that this feature happened to have ended up in this
> shape, instead of "ignore a file with a line this long", only
> because your starting point was to use "has overlong lines" as the
> heuristic for "not interesting", and because "git grep" code is not
> structured to first scan the entire file to decide if it is worth
> working on it, and it is extra work to restructure the codeflow to
> make it so (which you avoided).
>
> If your real motivation was either
>
>  (1) whether the file has or does not have the pattern for certain
>  class of files are uninteresting; do not even run "grep"
>  processing for them; or
>
>  (2) hits or no-hits may be intereseting but output of overlong
>  lines from certain class of files I do not wish to see;
>
> then I can think of two alternatives.
>
> For (1), can't we tell "result of generators" and other files with
> pathspec?  If so, perha

[PATCH] grep: Add option --max-line-len

2017-11-23 Thread Marc-Antoine Ruel
This tells git grep to skip files longer than a specified length,
which is often the result of generators and not actual source files.

Signed-off-by: Marc-Antoine Ruel <mar...@chromium.org>
---
 Documentation/git-grep.txt | 5 +
 builtin/grep.c | 2 ++
 grep.c | 4 
 grep.h | 1 +
 t/t7810-grep.sh| 5 +
 5 files changed, 17 insertions(+)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 18b494731..75081defb 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -10,6 +10,7 @@ SYNOPSIS
 
 [verse]
 'git grep' [-a | --text] [-I] [--textconv] [-i | --ignore-case] [-w | 
--word-regexp]
+  [-M | --max-line-len ]
   [-v | --invert-match] [-h|-H] [--full-name]
   [-E | --extended-regexp] [-G | --basic-regexp]
   [-P | --perl-regexp]
@@ -127,6 +128,10 @@ OPTIONS
beginning of a line, or preceded by a non-word character; end at
the end of a line or followed by a non-word character).
 
+-M::
+--max-line-len::
+   Match the pattern only for line shorter or equal to this length.
+
 -v::
 --invert-match::
Select non-matching lines.
diff --git a/builtin/grep.c b/builtin/grep.c
index 5a6cfe6b4..cc5c70be5 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -796,6 +796,8 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
N_("case insensitive matching")),
OPT_BOOL('w', "word-regexp", _regexp,
N_("match patterns only at word boundaries")),
+   OPT_INTEGER('M', "max-line-len", _line_length,
+   N_("ignore lines longer than ")),
OPT_SET_INT('a', "text", ,
N_("process binary files as text"), GREP_BINARY_TEXT),
OPT_SET_INT('I', NULL, ,
diff --git a/grep.c b/grep.c
index d0b9b6cdf..881078b82 100644
--- a/grep.c
+++ b/grep.c
@@ -36,6 +36,7 @@ void init_grep_defaults(void)
opt->relative = 1;
opt->pathname = 1;
opt->max_depth = -1;
+   opt->max_line_length = -1;
opt->pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED;
color_set(opt->color_context, "");
color_set(opt->color_filename, "");
@@ -151,6 +152,7 @@ void grep_init(struct grep_opt *opt, const char *prefix)
opt->pattern_type_option = def->pattern_type_option;
opt->linenum = def->linenum;
opt->max_depth = def->max_depth;
+   opt->max_line_length = def->max_line_length;
opt->pathname = def->pathname;
opt->relative = def->relative;
opt->output = def->output;
@@ -1273,6 +1275,8 @@ static int match_line(struct grep_opt *opt, char *bol, 
char *eol,
struct grep_pat *p;
regmatch_t match;
 
+   if (opt->max_line_length > 0 && eol-bol > opt->max_line_length)
+   return 0;
if (opt->extended)
return match_expr(opt, bol, eol, ctx, collect_hits);
 
diff --git a/grep.h b/grep.h
index 399381c90..0e76c0a19 100644
--- a/grep.h
+++ b/grep.h
@@ -151,6 +151,7 @@ struct grep_opt {
int null_following_name;
int color;
int max_depth;
+   int max_line_length;
int funcname;
int funcbody;
int extended_regexp_option;
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 2a6679c2f..c514bd388 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -766,6 +766,11 @@ test_expect_success 'grep -W shows no trailing empty 
lines' '
test_cmp expected actual
 '
 
+test_expect_success 'grep skips long lines' '
+   git grep -M18 -W include >actual &&
+   test_cmp expected actual
+'
+
 cat >expected <

git-svn fails to initialize with submodule setup, when '.git' is a file and not a directory

2013-09-12 Thread Marc-Antoine Ruel
Repro:
1. git clone --recursive a repository with submodules.
2. cd checkout/submoduleA
3. git svn init svn://host/repo

Expected:
Works as usual

Actual:
/path/to/checkout/submoduleA/.git/refs: Not a directory
init: command returned error: 1

Why:
submoduleA/.git is a file, not a directory.
$ cat /path/to/checkout/submoduleA/.git
gitdir: ../.git/modules/submoduleA

$ git --version
git version 1.8.4

Thanks,

M-A
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html