Re: [PATCH] grep -I: do not bother to read known-binary files
Hi, On Fri, May 16, 2014 at 04:29:58AM -0400, Jeff King wrote: > [..] I hope I did not sound like "blame Stepan, he was screwed up". no, you did not, it was ok. > From msysgit's perspective, they may or may not want to revert the patch > that they already have. That is a _separate_ issue, and I think the I hope Dscho will help with the decision: he can say "keep it until we have evidence or at least time to do a more thorough review," for example. Stepan -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] grep -I: do not bother to read known-binary files
On Fri, May 16, 2014 at 10:19:57AM +0200, Stepan Kasal wrote: > On Thu, May 15, 2014 at 03:22:26PM -0400, Jeff King wrote: > > As the person who is proposing the patch for git.git, I would hope > > Stepan would follow up on such review and confirm whether or not it is > > still needed. > > well, I try to. (I verified that "less -I" works in msysGit before > submitting the fixup back there, fox example.) > > But I think my role is to moderate the reconciliation. > In this case, having read the discussion, I decided to to ask Dscho > to drop the patch. > > (It is always about balancing the risks and the expenses.) Sure, I think that is reasonable, and I hope I did not sound like "blame Stepan, he was screwed up". After reading Dscho's other message and knowing that he has not much time for git, I was trying to communicate that I did not expect _him_ to be dealing with it. Git.git has existed for a long time without that patch, so from our perspective, it is a new patch. And as with any new patch, I would expect a submitter who receives review of "eh, don't we already do this" to either look into it, or decide that the review is probably right and not bother with it. And you did the latter, and that is totally fine and reasonable. >From msysgit's perspective, they may or may not want to revert the patch that they already have. That is a _separate_ issue, and I think the burden of effort goes the other way. The patch should stay unless somebody goes to the work to confirm that it is not necessary (or unless they want to accept my say-so and indication that I did not do fresh testing, but that is up to them). Sorry if that is long and/or obvious. There have been enough bad feelings on the list lately that I didn't want there to be a misunderstanding about what I meant. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] grep -I: do not bother to read known-binary files
Hello, On Thu, May 15, 2014 at 03:22:26PM -0400, Jeff King wrote: > As the person who is proposing the patch for git.git, I would hope > Stepan would follow up on such review and confirm whether or not it is > still needed. well, I try to. (I verified that "less -I" works in msysGit before submitting the fixup back there, fox example.) But I think my role is to moderate the reconciliation. In this case, having read the discussion, I decided to to ask Dscho to drop the patch. (It is always about balancing the risks and the expenses.) Stepan -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] grep -I: do not bother to read known-binary files
On Thu, May 15, 2014 at 07:42:00PM +0200, Johannes Schindelin wrote: > > Hrm. Is this patch still necessary? In the time since this patch was > > written, we did 0826579 (grep: load file data after checking > > binary-ness, 2012-02-02) > > I have no time to test this but I trust that you made sure that it works > as advertised. In my case, there were about 500 gigabytes of image data > intermixed with code, and waiting for 'git grep' was not funny at all (and > I did not have time back then to go through a full code submission cycle > on the Git mailing list, either). > > So I guess we can drop my patch. Certainly I tested it at the time (those commits I referenced contain timing information), and it should have improved the workload you describe. I did not test before/after the patch in this thread, but only read it and noticed that it was trying to do the same thing (that is why I said "I suspect..."). As the person who is proposing the patch for git.git, I would hope Stepan would follow up on such review and confirm whether or not it is still needed. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] grep -I: do not bother to read known-binary files
Hi Peff, On Wed, 14 May 2014, Jeff King wrote: > On Wed, May 14, 2014 at 05:44:19PM +0200, Stepan Kasal wrote: > > > From: Johannes Schindelin > > Date: Mon, 8 Nov 2010 16:10:43 +0100 > > > > Incidentally, this makes grep -I respect the "binary" attribute (actually, > > the "-text" attribute, but "binary" implies that). > > Hrm. Is this patch still necessary? In the time since this patch was > written, we did 0826579 (grep: load file data after checking > binary-ness, 2012-02-02) I have no time to test this but I trust that you made sure that it works as advertised. In my case, there were about 500 gigabytes of image data intermixed with code, and waiting for 'git grep' was not funny at all (and I did not have time back then to go through a full code submission cycle on the Git mailing list, either). So I guess we can drop my patch. Ciao, Johannes -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] grep -I: do not bother to read known-binary files
Jeff King writes: > On Wed, May 14, 2014 at 05:44:19PM +0200, Stepan Kasal wrote: > >> From: Johannes Schindelin >> Date: Mon, 8 Nov 2010 16:10:43 +0100 >> >> Incidentally, this makes grep -I respect the "binary" attribute (actually, >> the "-text" attribute, but "binary" implies that). >> >> Since the attributes are not thread-safe, we now need to switch off >> threading if -I was passed. >> >> Signed-off-by: Johannes Schindelin >> Signed-off-by: Stepan Kasal >> --- >> >> Hi, >> this patch has been in msysgit for 3.5 years. >> Stepan > > Hrm. Is this patch still necessary? In the time since this patch was > written, we did 0826579 (grep: load file data after checking > binary-ness, 2012-02-02), which should do the same thing. It deals with > the threading via a lock, but we later learned in 9dd5245 (grep: > pre-load userdiff drivers when threaded, 2012-02-02) to hoist that bit > out. Wow, power of Git history ;-) > So I suspect this patch at best is doing nothing, and at worst is > wasting extra time doing redundant attribute checks. Sounds like a sensible conclusion. Thanks. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] grep -I: do not bother to read known-binary files
On Wed, May 14, 2014 at 10:52:28AM -0700, Junio C Hamano wrote: > I do not think checking 'text' is the right way to do this. The > attribute controls the eof normalization, and people sometimes want > to keep CRLF terminated files in the repository no matter what the > platform is (an example I heard is a sample protocol exchange over > textual network protocol such as HTTP and SMTP), and the way to do > so is to unset it. That would still let them look for patterns in > and compare the changes to these paths. > > Looking for "Marking files as binary" in gitattributes(5) gives us a > more authoritative alternative, I think. In short: > > - If 'diff' is Unset, or > - If 'diff' is Set to X and diff.X.binary is true > > then the contents are not suitable for human consumption. I responded elsewhere in the thread that I think the patch under discussion is redundant at this point, but just to clarify: grep currently uses the rules you give above, as it builds on the userdiff driver code. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] grep -I: do not bother to read known-binary files
On Wed, May 14, 2014 at 05:44:19PM +0200, Stepan Kasal wrote: > From: Johannes Schindelin > Date: Mon, 8 Nov 2010 16:10:43 +0100 > > Incidentally, this makes grep -I respect the "binary" attribute (actually, > the "-text" attribute, but "binary" implies that). > > Since the attributes are not thread-safe, we now need to switch off > threading if -I was passed. > > Signed-off-by: Johannes Schindelin > Signed-off-by: Stepan Kasal > --- > > Hi, > this patch has been in msysgit for 3.5 years. > Stepan Hrm. Is this patch still necessary? In the time since this patch was written, we did 0826579 (grep: load file data after checking binary-ness, 2012-02-02), which should do the same thing. It deals with the threading via a lock, but we later learned in 9dd5245 (grep: pre-load userdiff drivers when threaded, 2012-02-02) to hoist that bit out. So I suspect this patch at best is doing nothing, and at worst is wasting extra time doing redundant attribute checks. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] grep -I: do not bother to read known-binary files
Stepan Kasal writes: > From: Johannes Schindelin > Date: Mon, 8 Nov 2010 16:10:43 +0100 > > Incidentally, this makes grep -I respect the "binary" attribute (actually, > the "-text" attribute, but "binary" implies that). > > Since the attributes are not thread-safe, we now need to switch off > threading if -I was passed. > > Signed-off-by: Johannes Schindelin > Signed-off-by: Stepan Kasal > --- > > Hi, > this patch has been in msysgit for 3.5 years. > Stepan I do not think checking 'text' is the right way to do this. The attribute controls the eof normalization, and people sometimes want to keep CRLF terminated files in the repository no matter what the platform is (an example I heard is a sample protocol exchange over textual network protocol such as HTTP and SMTP), and the way to do so is to unset it. That would still let them look for patterns in and compare the changes to these paths. Looking for "Marking files as binary" in gitattributes(5) gives us a more authoritative alternative, I think. In short: - If 'diff' is Unset, or - If 'diff' is Set to X and diff.X.binary is true then the contents are not suitable for human consumption. I haven't thought things through to declare that "grep -I" would want to treat a PostScript file (which is an example in that section) as a binary file and refrain from trying to find substrings in it, but my gut feeling is that we probably should let it look inside such a file, so replacing 'text' with 'diff' in this patch and doing nothing about diff.*.binary would be the way to go. Given that the posted patch was older than 3.5 years, perhaps it needs updating to adhere the advice given in ab435611 (docs: explain diff.*.binary option, 2011-01-09). By the way, I wonder if the patch is about performance, implied by "do not bother" (to waste time looking inside), though. Is it an overall win to avoid looking inside "-diff" files, if that requires you to use only 1 core and leaving the other 7 idle? I think "the user tells us it is binary with attribute, and the user tells us not to look into binary with -I" is a lot better rationale to do this change , and that would characterise it as a correctness patch, not a performance patch. Thanks. > builtin/grep.c | 25 + > 1 file changed, 25 insertions(+) > > diff --git a/builtin/grep.c b/builtin/grep.c > index 43af5b7..8073fbe 100644 > --- a/builtin/grep.c > +++ b/builtin/grep.c > @@ -18,6 +18,7 @@ > #include "quote.h" > #include "dir.h" > #include "pathspec.h" > +#include "attr.h" > > static char const * const grep_usage[] = { > N_("git grep [options] [-e] [...] [[--] ...]"), > @@ -163,6 +164,22 @@ static void work_done(struct work_item *w) > grep_unlock(); > } > > +static int skip_binary(struct grep_opt *opt, const char *filename) > +{ > + if ((opt->binary & GREP_BINARY_NOMATCH)) { > + static struct git_attr *attr_text; > + struct git_attr_check check; > + > + if (!attr_text) > + attr_text = git_attr("text"); > + memset(&check, 0, sizeof(check)); > + check.attr = attr_text; > + return !git_check_attr(filename, 1, &check) && > + ATTR_FALSE(check.value); > + } > + return 0; > +} > + > static void *run(void *arg) > { > int hit = 0; > @@ -173,6 +190,9 @@ static void *run(void *arg) > if (!w) > break; > > + if (skip_binary(opt, (const char *)w->source.identifier)) > + continue; > + > opt->output_priv = w; > hit |= grep_source(opt, &w->source); > grep_source_clear_data(&w->source); > @@ -379,6 +399,9 @@ static int grep_cache(struct grep_opt *opt, const struct > pathspec *pathspec, int > continue; > if (!ce_path_match(ce, pathspec, NULL)) > continue; > + if (skip_binary(opt, ce->name)) > + continue; > + > /* >* If CE_VALID is on, we assume worktree file and its cache > entry >* are identical, even if worktree file has been modified, so > use > @@ -803,6 +826,8 @@ int cmd_grep(int argc, const char **argv, const char > *prefix) > string_list_append(&path_list, show_in_pager); > use_threads = 0; > } > + if ((opt.binary & GREP_BINARY_NOMATCH)) > + use_threads = 0; > > if (!opt.pattern_list) > die(_("no pattern given.")); > -- > 1.9.2.msysgit.0.335.gd2a461f > > -- -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] grep -I: do not bother to read known-binary files
From: Johannes Schindelin Date: Mon, 8 Nov 2010 16:10:43 +0100 Incidentally, this makes grep -I respect the "binary" attribute (actually, the "-text" attribute, but "binary" implies that). Since the attributes are not thread-safe, we now need to switch off threading if -I was passed. Signed-off-by: Johannes Schindelin Signed-off-by: Stepan Kasal --- Hi, this patch has been in msysgit for 3.5 years. Stepan builtin/grep.c | 25 + 1 file changed, 25 insertions(+) diff --git a/builtin/grep.c b/builtin/grep.c index 43af5b7..8073fbe 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -18,6 +18,7 @@ #include "quote.h" #include "dir.h" #include "pathspec.h" +#include "attr.h" static char const * const grep_usage[] = { N_("git grep [options] [-e] [...] [[--] ...]"), @@ -163,6 +164,22 @@ static void work_done(struct work_item *w) grep_unlock(); } +static int skip_binary(struct grep_opt *opt, const char *filename) +{ + if ((opt->binary & GREP_BINARY_NOMATCH)) { + static struct git_attr *attr_text; + struct git_attr_check check; + + if (!attr_text) + attr_text = git_attr("text"); + memset(&check, 0, sizeof(check)); + check.attr = attr_text; + return !git_check_attr(filename, 1, &check) && + ATTR_FALSE(check.value); + } + return 0; +} + static void *run(void *arg) { int hit = 0; @@ -173,6 +190,9 @@ static void *run(void *arg) if (!w) break; + if (skip_binary(opt, (const char *)w->source.identifier)) + continue; + opt->output_priv = w; hit |= grep_source(opt, &w->source); grep_source_clear_data(&w->source); @@ -379,6 +399,9 @@ static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int continue; if (!ce_path_match(ce, pathspec, NULL)) continue; + if (skip_binary(opt, ce->name)) + continue; + /* * If CE_VALID is on, we assume worktree file and its cache entry * are identical, even if worktree file has been modified, so use @@ -803,6 +826,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix) string_list_append(&path_list, show_in_pager); use_threads = 0; } + if ((opt.binary & GREP_BINARY_NOMATCH)) + use_threads = 0; if (!opt.pattern_list) die(_("no pattern given.")); -- 1.9.2.msysgit.0.335.gd2a461f -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html