Re: [PATCH] grep -I: do not bother to read known-binary files

2014-05-16 Thread Stepan Kasal
Hi,

On Fri, May 16, 2014 at 04:29:58AM -0400, Jeff King wrote:
> [..] I hope I did not sound like "blame Stepan, he was screwed up".

no, you did not, it was ok.

> From msysgit's perspective, they may or may not want to revert the patch
> that they already have. That is a _separate_ issue, and I think the

I hope Dscho will help with the decision: he can say "keep it until
we have evidence or at least time to do a more thorough review," for
example.

Stepan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] grep -I: do not bother to read known-binary files

2014-05-16 Thread Jeff King
On Fri, May 16, 2014 at 10:19:57AM +0200, Stepan Kasal wrote:

> On Thu, May 15, 2014 at 03:22:26PM -0400, Jeff King wrote:
> > As the person who is proposing the patch for git.git, I would hope
> > Stepan would follow up on such review and confirm whether or not it is
> > still needed.
> 
> well, I try to.  (I verified that "less -I" works in msysGit before
> submitting the fixup back there, fox example.)
> 
> But I think my role is to moderate the reconciliation.
> In this case, having read the discussion, I decided to to ask Dscho
> to drop the patch.
> 
> (It is always about balancing the risks and the expenses.)

Sure, I think that is reasonable, and I hope I did not sound like "blame
Stepan, he was screwed up". After reading Dscho's other message and
knowing that he has not much time for git, I was trying to communicate
that I did not expect _him_ to be dealing with it.

Git.git has existed for a long time without that patch, so from our
perspective, it is a new patch. And as with any new patch, I would
expect a submitter who receives review of "eh, don't we already do this"
to either look into it, or decide that the review is probably right and
not bother with it. And you did the latter, and that is totally fine and
reasonable.

>From msysgit's perspective, they may or may not want to revert the patch
that they already have. That is a _separate_ issue, and I think the
burden of effort goes the other way. The patch should stay unless
somebody goes to the work to confirm that it is not necessary (or unless
they want to accept my say-so and indication that I did not do fresh
testing, but that is up to them).

Sorry if that is long and/or obvious. There have been enough bad
feelings on the list lately that I didn't want there to be a
misunderstanding about what I meant.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] grep -I: do not bother to read known-binary files

2014-05-16 Thread Stepan Kasal
Hello,

On Thu, May 15, 2014 at 03:22:26PM -0400, Jeff King wrote:
> As the person who is proposing the patch for git.git, I would hope
> Stepan would follow up on such review and confirm whether or not it is
> still needed.

well, I try to.  (I verified that "less -I" works in msysGit before
submitting the fixup back there, fox example.)

But I think my role is to moderate the reconciliation.
In this case, having read the discussion, I decided to to ask Dscho
to drop the patch.

(It is always about balancing the risks and the expenses.)

Stepan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] grep -I: do not bother to read known-binary files

2014-05-15 Thread Jeff King
On Thu, May 15, 2014 at 07:42:00PM +0200, Johannes Schindelin wrote:

> > Hrm. Is this patch still necessary? In the time since this patch was
> > written, we did 0826579 (grep: load file data after checking
> > binary-ness, 2012-02-02)
> 
> I have no time to test this but I trust that you made sure that it works
> as advertised. In my case, there were about 500 gigabytes of image data
> intermixed with code, and waiting for 'git grep' was not funny at all (and
> I did not have time back then to go through a full code submission cycle
> on the Git mailing list, either).
> 
> So I guess we can drop my patch.

Certainly I tested it at the time (those commits I referenced contain
timing information), and it should have improved the workload you
describe. I did not test before/after the patch in this thread, but only
read it and noticed that it was trying to do the same thing (that is why
I said "I suspect...").

As the person who is proposing the patch for git.git, I would hope
Stepan would follow up on such review and confirm whether or not it is
still needed.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] grep -I: do not bother to read known-binary files

2014-05-15 Thread Johannes Schindelin
Hi Peff,

On Wed, 14 May 2014, Jeff King wrote:

> On Wed, May 14, 2014 at 05:44:19PM +0200, Stepan Kasal wrote:
> 
> > From: Johannes Schindelin 
> > Date: Mon, 8 Nov 2010 16:10:43 +0100
> > 
> > Incidentally, this makes grep -I respect the "binary" attribute (actually,
> > the "-text" attribute, but "binary" implies that).
> 
> Hrm. Is this patch still necessary? In the time since this patch was
> written, we did 0826579 (grep: load file data after checking
> binary-ness, 2012-02-02)

I have no time to test this but I trust that you made sure that it works
as advertised. In my case, there were about 500 gigabytes of image data
intermixed with code, and waiting for 'git grep' was not funny at all (and
I did not have time back then to go through a full code submission cycle
on the Git mailing list, either).

So I guess we can drop my patch.

Ciao,
Johannes
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] grep -I: do not bother to read known-binary files

2014-05-14 Thread Junio C Hamano
Jeff King  writes:

> On Wed, May 14, 2014 at 05:44:19PM +0200, Stepan Kasal wrote:
>
>> From: Johannes Schindelin 
>> Date: Mon, 8 Nov 2010 16:10:43 +0100
>> 
>> Incidentally, this makes grep -I respect the "binary" attribute (actually,
>> the "-text" attribute, but "binary" implies that).
>> 
>> Since the attributes are not thread-safe, we now need to switch off
>> threading if -I was passed.
>> 
>> Signed-off-by: Johannes Schindelin 
>> Signed-off-by: Stepan Kasal 
>> ---
>> 
>> Hi,
>> this patch has been in msysgit for 3.5 years.
>> Stepan
>
> Hrm. Is this patch still necessary? In the time since this patch was
> written, we did 0826579 (grep: load file data after checking
> binary-ness, 2012-02-02), which should do the same thing. It deals with
> the threading via a lock, but we later learned in 9dd5245 (grep:
> pre-load userdiff drivers when threaded, 2012-02-02) to hoist that bit
> out.

Wow, power of Git history ;-)

> So I suspect this patch at best is doing nothing, and at worst is
> wasting extra time doing redundant attribute checks.

Sounds like a sensible conclusion.  Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] grep -I: do not bother to read known-binary files

2014-05-14 Thread Jeff King
On Wed, May 14, 2014 at 10:52:28AM -0700, Junio C Hamano wrote:

> I do not think checking 'text' is the right way to do this.  The
> attribute controls the eof normalization, and people sometimes want
> to keep CRLF terminated files in the repository no matter what the
> platform is (an example I heard is a sample protocol exchange over
> textual network protocol such as HTTP and SMTP), and the way to do
> so is to unset it.  That would still let them look for patterns in
> and compare the changes to these paths.
> 
> Looking for "Marking files as binary" in gitattributes(5) gives us a
> more authoritative alternative, I think.  In short:
> 
>  - If 'diff' is Unset, or
>  - If 'diff' is Set to X and diff.X.binary is true
>
> then the contents are not suitable for human consumption.

I responded elsewhere in the thread that I think the patch under
discussion is redundant at this point, but just to clarify: grep
currently uses the rules you give above, as it builds on the userdiff
driver code.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] grep -I: do not bother to read known-binary files

2014-05-14 Thread Jeff King
On Wed, May 14, 2014 at 05:44:19PM +0200, Stepan Kasal wrote:

> From: Johannes Schindelin 
> Date: Mon, 8 Nov 2010 16:10:43 +0100
> 
> Incidentally, this makes grep -I respect the "binary" attribute (actually,
> the "-text" attribute, but "binary" implies that).
> 
> Since the attributes are not thread-safe, we now need to switch off
> threading if -I was passed.
> 
> Signed-off-by: Johannes Schindelin 
> Signed-off-by: Stepan Kasal 
> ---
> 
> Hi,
> this patch has been in msysgit for 3.5 years.
> Stepan

Hrm. Is this patch still necessary? In the time since this patch was
written, we did 0826579 (grep: load file data after checking
binary-ness, 2012-02-02), which should do the same thing. It deals with
the threading via a lock, but we later learned in 9dd5245 (grep:
pre-load userdiff drivers when threaded, 2012-02-02) to hoist that bit
out.

So I suspect this patch at best is doing nothing, and at worst is
wasting extra time doing redundant attribute checks.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] grep -I: do not bother to read known-binary files

2014-05-14 Thread Junio C Hamano
Stepan Kasal  writes:

> From: Johannes Schindelin 
> Date: Mon, 8 Nov 2010 16:10:43 +0100
>
> Incidentally, this makes grep -I respect the "binary" attribute (actually,
> the "-text" attribute, but "binary" implies that).
>
> Since the attributes are not thread-safe, we now need to switch off
> threading if -I was passed.
>
> Signed-off-by: Johannes Schindelin 
> Signed-off-by: Stepan Kasal 
> ---
>
> Hi,
> this patch has been in msysgit for 3.5 years.
> Stepan

I do not think checking 'text' is the right way to do this.  The
attribute controls the eof normalization, and people sometimes want
to keep CRLF terminated files in the repository no matter what the
platform is (an example I heard is a sample protocol exchange over
textual network protocol such as HTTP and SMTP), and the way to do
so is to unset it.  That would still let them look for patterns in
and compare the changes to these paths.

Looking for "Marking files as binary" in gitattributes(5) gives us a
more authoritative alternative, I think.  In short:

 - If 'diff' is Unset, or
 - If 'diff' is Set to X and diff.X.binary is true

then the contents are not suitable for human consumption.  I haven't
thought things through to declare that "grep -I" would want to treat
a PostScript file (which is an example in that section) as a binary
file and refrain from trying to find substrings in it, but my gut
feeling is that we probably should let it look inside such a file,
so replacing 'text' with 'diff' in this patch and doing nothing
about diff.*.binary would be the way to go.  Given that the posted
patch was older than 3.5 years, perhaps it needs updating to adhere
the advice given in ab435611 (docs: explain diff.*.binary option,
2011-01-09).

By the way, I wonder if the patch is about performance, implied by
"do not bother" (to waste time looking inside), though.  Is it an
overall win to avoid looking inside "-diff" files, if that requires
you to use only 1 core and leaving the other 7 idle?

I think "the user tells us it is binary with attribute, and the user
tells us not to look into binary with -I" is a lot better rationale
to do this change , and that would characterise it as a correctness
patch, not a performance patch.

Thanks.


>  builtin/grep.c | 25 +
>  1 file changed, 25 insertions(+)
>
> diff --git a/builtin/grep.c b/builtin/grep.c
> index 43af5b7..8073fbe 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -18,6 +18,7 @@
>  #include "quote.h"
>  #include "dir.h"
>  #include "pathspec.h"
> +#include "attr.h"
>  
>  static char const * const grep_usage[] = {
>   N_("git grep [options] [-e]  [...] [[--] ...]"),
> @@ -163,6 +164,22 @@ static void work_done(struct work_item *w)
>   grep_unlock();
>  }
>  
> +static int skip_binary(struct grep_opt *opt, const char *filename)
> +{
> + if ((opt->binary & GREP_BINARY_NOMATCH)) {
> + static struct git_attr *attr_text;
> + struct git_attr_check check;
> +
> + if (!attr_text)
> + attr_text = git_attr("text");
> + memset(&check, 0, sizeof(check));
> + check.attr = attr_text;
> + return !git_check_attr(filename, 1, &check) &&
> + ATTR_FALSE(check.value);
> + }
> + return 0;
> +}
> +
>  static void *run(void *arg)
>  {
>   int hit = 0;
> @@ -173,6 +190,9 @@ static void *run(void *arg)
>   if (!w)
>   break;
>  
> + if (skip_binary(opt, (const char *)w->source.identifier))
> + continue;
> +
>   opt->output_priv = w;
>   hit |= grep_source(opt, &w->source);
>   grep_source_clear_data(&w->source);
> @@ -379,6 +399,9 @@ static int grep_cache(struct grep_opt *opt, const struct 
> pathspec *pathspec, int
>   continue;
>   if (!ce_path_match(ce, pathspec, NULL))
>   continue;
> + if (skip_binary(opt, ce->name))
> + continue;
> +
>   /*
>* If CE_VALID is on, we assume worktree file and its cache 
> entry
>* are identical, even if worktree file has been modified, so 
> use
> @@ -803,6 +826,8 @@ int cmd_grep(int argc, const char **argv, const char 
> *prefix)
>   string_list_append(&path_list, show_in_pager);
>   use_threads = 0;
>   }
> + if ((opt.binary & GREP_BINARY_NOMATCH))
> + use_threads = 0;
>  
>   if (!opt.pattern_list)
>   die(_("no pattern given."));
> -- 
> 1.9.2.msysgit.0.335.gd2a461f
>
> -- 
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] grep -I: do not bother to read known-binary files

2014-05-14 Thread Stepan Kasal
From: Johannes Schindelin 
Date: Mon, 8 Nov 2010 16:10:43 +0100

Incidentally, this makes grep -I respect the "binary" attribute (actually,
the "-text" attribute, but "binary" implies that).

Since the attributes are not thread-safe, we now need to switch off
threading if -I was passed.

Signed-off-by: Johannes Schindelin 
Signed-off-by: Stepan Kasal 
---

Hi,
this patch has been in msysgit for 3.5 years.
Stepan

 builtin/grep.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/builtin/grep.c b/builtin/grep.c
index 43af5b7..8073fbe 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -18,6 +18,7 @@
 #include "quote.h"
 #include "dir.h"
 #include "pathspec.h"
+#include "attr.h"
 
 static char const * const grep_usage[] = {
N_("git grep [options] [-e]  [...] [[--] ...]"),
@@ -163,6 +164,22 @@ static void work_done(struct work_item *w)
grep_unlock();
 }
 
+static int skip_binary(struct grep_opt *opt, const char *filename)
+{
+   if ((opt->binary & GREP_BINARY_NOMATCH)) {
+   static struct git_attr *attr_text;
+   struct git_attr_check check;
+
+   if (!attr_text)
+   attr_text = git_attr("text");
+   memset(&check, 0, sizeof(check));
+   check.attr = attr_text;
+   return !git_check_attr(filename, 1, &check) &&
+   ATTR_FALSE(check.value);
+   }
+   return 0;
+}
+
 static void *run(void *arg)
 {
int hit = 0;
@@ -173,6 +190,9 @@ static void *run(void *arg)
if (!w)
break;
 
+   if (skip_binary(opt, (const char *)w->source.identifier))
+   continue;
+
opt->output_priv = w;
hit |= grep_source(opt, &w->source);
grep_source_clear_data(&w->source);
@@ -379,6 +399,9 @@ static int grep_cache(struct grep_opt *opt, const struct 
pathspec *pathspec, int
continue;
if (!ce_path_match(ce, pathspec, NULL))
continue;
+   if (skip_binary(opt, ce->name))
+   continue;
+
/*
 * If CE_VALID is on, we assume worktree file and its cache 
entry
 * are identical, even if worktree file has been modified, so 
use
@@ -803,6 +826,8 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
string_list_append(&path_list, show_in_pager);
use_threads = 0;
}
+   if ((opt.binary & GREP_BINARY_NOMATCH))
+   use_threads = 0;
 
if (!opt.pattern_list)
die(_("no pattern given."));
-- 
1.9.2.msysgit.0.335.gd2a461f

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html