Re: [PATCH v2 1/3] diffcore-pickaxe: remove unnecessary call to get_textconv()

2013-04-05 Thread Simon Ruderich
On Thu, Apr 04, 2013 at 01:48:52PM -0700, Junio C Hamano wrote:
> If I am reading the code correctly, it is has_changes(), which is
> used for "log -S" (not "log -G" that uses diff_grep()), that does
> the unnecessary get_textconv() unconditionally.  The way diff_grep()
> divides the work to make fill_one() responsible for filling the
> textconv as necessary is internally consistent, and there is no
> unnecessary call.

Yes, of course. I meant has_changes() which has the unnecessary
call.

> Perhaps...
>
>   The fill_one() function is responsible for finding and
>   filling the textconv filter as necessary, and is called by
>   diff_grep() function that implements "git log -G".
>
>   The has_changes() function calls get_textconv() for two
>   sides being compared, before it checks to see if it was
>   asked to perform the pickaxe limiting with the -S option.
>   Move the code around to avoid this wastage.  After that,
>   fill_one() is called to use the textconv.
>
>   By adding get_textconv() to diff_grep() and relieving
>   fill_one() of responsibility to find the textconv filter, we
>   can avoid calling get_textconv() twice.

Sounds good to me.

Regards
Simon
-- 
+ privacy is necessary
+ using gnupg http://gnupg.org
+ public key id: 0x92FEFDB7E44C32F9
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/3] diffcore-pickaxe: remove unnecessary call to get_textconv()

2013-04-04 Thread Junio C Hamano
Jeff King  writes:

> But I do not think fill_one is the right interface for it. The reason
> has_changes calls get_textconv separately is that we do not want to fill
> the buffer (which may be expensive) if we can avoid it. So the correct
> sequence is: ...
> If you turned fill_one into "fill_both_sides" then you could share
> the code between diff_grep/pickaxe and do it in the right order in the
> helper.

Yeah, I think that is a good way to refactor.  A patch later in the
series will be killing fill_one() anyway ;-)

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/3] diffcore-pickaxe: remove unnecessary call to get_textconv()

2013-04-04 Thread Jeff King
On Thu, Apr 04, 2013 at 01:48:52PM -0700, Junio C Hamano wrote:

> If I am reading the code correctly, it is has_changes(), which is
> used for "log -S" (not "log -G" that uses diff_grep()), that does
> the unnecessary get_textconv() unconditionally.  The way diff_grep() 
> divides the work to make fill_one() responsible for filling the
> textconv as necessary is internally consistent, and there is no
> unnecessary call.
> 
> Perhaps...
> 
>   The fill_one() function is responsible for finding and
>   filling the textconv filter as necessary, and is called by
>   diff_grep() function that implements "git log -G".
> 
>   The has_changes() function calls get_textconv() for two
>   sides being compared, before it checks to see if it was
>   asked to perform the pickaxe limiting with the -S option.
>   Move the code around to avoid this wastage.  After that,
>   fill_one() is called to use the textconv.
> 
>   By adding get_textconv() to diff_grep() and relieving
>   fill_one() of responsibility to find the textconv filter, we
>   can avoid calling get_textconv() twice.
> 
> Explained that way, it makes me wonder why we cannot fix it the
> other way around, that is, not fetching textconv in has_changes()
> and instead letting fill_one() to find textconv as needed.
> 
> The answer is because has_changes() itself looks at the textconv.
> 
> But we have to wonder why it is so.  diff_grep() short-circuits when
> the two sides are identical and has_changes() has a similar but
> different logic to check if the identical two sides are processed
> with the same textconv filter before saying this filepair is
> uninteresting.
> 
> Shouldn't that logic be unified as well?

I think it would be OK to perform the same optimization in diff_grep; I
do not recall offhand why I didn't do so when touching this code last
time, so it may have just been because I didn't think of it.

But I do not think fill_one is the right interface for it. The reason
has_changes calls get_textconv separately is that we do not want to fill
the buffer (which may be expensive) if we can avoid it. So the correct
sequence is:

  1. Find out textconv_one.

  2. Find out textconv_two.

  3. Check "!hashcmp(one->sha1, two->sha1) && textconv_one == textconv_two";
 if true, then we know the content we are about to compare will be
 identical, and we can return early.

  4. Otherwise, retrieve the content for one (respecting textconv_one).

  5. Retrieve the content for two (respecting textconv_two).

You cannot implement the optimization looking only at one side
(obviously), but you also need to split the textconv lookup from the
"fill" procedure if you want the optimization to come at the right
place.

If you turned fill_one into "fill_both_sides" then you could share
the code between diff_grep/pickaxe and do it in the right order in the
helper.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/3] diffcore-pickaxe: remove unnecessary call to get_textconv()

2013-04-04 Thread Junio C Hamano
Junio C Hamano  writes:

> Perhaps...
>
>   The fill_one() function is responsible for finding and
>   filling the textconv filter as necessary, and is called by
>   diff_grep() function that implements "git log -G".
>
>   The has_changes() function calls get_textconv() for two
>   sides being compared, before it checks to see if it was
>   asked to perform the pickaxe limiting with the -S option.
>   Move the code around to avoid this wastage.  After that,
>   fill_one() is called to use the textconv.
>
>   By adding get_textconv() to diff_grep() and relieving
>   fill_one() of responsibility to find the textconv filter, we
>   can avoid calling get_textconv() twice.
>
> Explained that way, it makes me wonder why we cannot fix it the
> other way around, that is, not fetching textconv in has_changes()
> and instead letting fill_one() to find textconv as needed.
>
> The answer is because has_changes() itself looks at the textconv.
>
> But we have to wonder why it is so.  diff_grep() short-circuits when
> the two sides are identical and has_changes() has a similar but
> different logic to check if the identical two sides are processed
> with the same textconv filter before saying this filepair is
> uninteresting.
>
> Shouldn't that logic be unified as well?

That is, something along this line.  We may want to unify these two
functions even more, but I do not offhand see a good way to do so.

-- >8 --
Subject: [PATCH] log -S/-G: unify the short-cut logic a bit more

The fill_one() function is responsible for finding and filling the
textconv filter as necessary, and is called by diff_grep() function
that implements "git log -G".

The has_changes() function calls get_textconv() for two sides being
compared, before it checks to see if it was asked to perform the
pickaxe limiting with the -S option.  Move the code around to avoid
this wastage.

By adding get_textconv() to diff_grep() and relieving fill_one() of
responsibility to find the textconv filter, we can avoid calling
get_textconv() twice; fill_one() no longer has to be passed a
pointer to a pointer to textconv.

The reason has_changes() calls get_textconv() early is because it
wants to compare two textconv filters on both sides.  When comparing
the two sides that are otherwise identical, if the same textconv
applies to both sides, we know pickaxe (either -S or -G) would not
find anything before applying the textconv and comparing the result.

Teach the same short-circuit logic to diff_grep() as well.  The code
in these two functions become more similar as the result.
---
 diffcore-pickaxe.c | 37 +++--
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index b097fa7..6d285a7 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -75,11 +75,10 @@ static void diffgrep_consume(void *priv, char *line, 
unsigned long len)
 }
 
 static void fill_one(struct diff_filespec *one,
-mmfile_t *mf, struct userdiff_driver **textconv)
+mmfile_t *mf, struct userdiff_driver *textconv)
 {
if (DIFF_FILE_VALID(one)) {
-   *textconv = get_textconv(one);
-   mf->size = fill_textconv(*textconv, one, &mf->ptr);
+   mf->size = fill_textconv(textconv, one, &mf->ptr);
} else {
memset(mf, 0, sizeof(*mf));
}
@@ -94,11 +93,24 @@ static int diff_grep(struct diff_filepair *p, struct 
diff_options *o,
mmfile_t mf1, mf2;
int hit;
 
-   if (diff_unmodified_pair(p))
+   if (!o->pickaxe[0])
return 0;
 
-   fill_one(p->one, &mf1, &textconv_one);
-   fill_one(p->two, &mf2, &textconv_two);
+   textconv_one = get_textconv(p->one);
+   textconv_two = get_textconv(p->two);
+
+   /*
+* If we have an unmodified pair, we know that the count will be the
+* same and don't even have to load the blobs. Unless textconv is in
+* play, _and_ we are using two different textconv filters (e.g.,
+* because a pair is an exact rename with different textconv attributes
+* for each side, which might generate different content).
+*/
+   if (textconv_one == textconv_two && diff_unmodified_pair(p))
+   return 0;
+
+   fill_one(p->one, &mf1, textconv_one);
+   fill_one(p->two, &mf2, textconv_two);
 
if (!mf1.ptr) {
if (!mf2.ptr)
@@ -131,6 +143,8 @@ static int diff_grep(struct diff_filepair *p, struct 
diff_options *o,
free(mf1.ptr);
if (textconv_two)
free(mf2.ptr);
+   diff_free_filespec_data(p->one);
+   diff_free_filespec_data(p->two);
return hit;
 }
 
@@ -201,14 +215,17 @@ static unsigned int contains(mmfile_t *mf, struct 
diff_options *o,
 static int has_changes(struct diff_filepair *p, struct diff_options *o,
   regex_t *regexp, kwset_t kws)

Re: [PATCH v2 1/3] diffcore-pickaxe: remove unnecessary call to get_textconv()

2013-04-04 Thread Junio C Hamano
Simon Ruderich  writes:

> get_textconv() is called in diff_grep() to determine the textconv driver
> before calling fill_one() and then again in fill_one(). Remove this
> unnecessary call by determining the textconv driver before calling
> fill_one().

If I am reading the code correctly, it is has_changes(), which is
used for "log -S" (not "log -G" that uses diff_grep()), that does
the unnecessary get_textconv() unconditionally.  The way diff_grep() 
divides the work to make fill_one() responsible for filling the
textconv as necessary is internally consistent, and there is no
unnecessary call.

Perhaps...

The fill_one() function is responsible for finding and
filling the textconv filter as necessary, and is called by
diff_grep() function that implements "git log -G".

The has_changes() function calls get_textconv() for two
sides being compared, before it checks to see if it was
asked to perform the pickaxe limiting with the -S option.
Move the code around to avoid this wastage.  After that,
fill_one() is called to use the textconv.

By adding get_textconv() to diff_grep() and relieving
fill_one() of responsibility to find the textconv filter, we
can avoid calling get_textconv() twice.

Explained that way, it makes me wonder why we cannot fix it the
other way around, that is, not fetching textconv in has_changes()
and instead letting fill_one() to find textconv as needed.

The answer is because has_changes() itself looks at the textconv.

But we have to wonder why it is so.  diff_grep() short-circuits when
the two sides are identical and has_changes() has a similar but
different logic to check if the identical two sides are processed
with the same textconv filter before saying this filepair is
uninteresting.

Shouldn't that logic be unified as well?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/3] diffcore-pickaxe: remove unnecessary call to get_textconv()

2013-04-04 Thread Simon Ruderich
get_textconv() is called in diff_grep() to determine the textconv driver
before calling fill_one() and then again in fill_one(). Remove this
unnecessary call by determining the textconv driver before calling
fill_one().

With this change it's also no longer necessary for fill_one() to
modify the textconv argument, therefore pass a pointer instead of
a pointer to a pointer.

Signed-off-by: Simon Ruderich 
---

Hello,

I've reordered the patches as requested and included Jeff's
cleanup patch.

Regards
Simon

 diffcore-pickaxe.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index b097fa7..8f955f8 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -75,11 +75,10 @@ static void diffgrep_consume(void *priv, char *line, 
unsigned long len)
 }
 
 static void fill_one(struct diff_filespec *one,
-mmfile_t *mf, struct userdiff_driver **textconv)
+mmfile_t *mf, struct userdiff_driver *textconv)
 {
if (DIFF_FILE_VALID(one)) {
-   *textconv = get_textconv(one);
-   mf->size = fill_textconv(*textconv, one, &mf->ptr);
+   mf->size = fill_textconv(textconv, one, &mf->ptr);
} else {
memset(mf, 0, sizeof(*mf));
}
@@ -97,8 +96,11 @@ static int diff_grep(struct diff_filepair *p, struct 
diff_options *o,
if (diff_unmodified_pair(p))
return 0;
 
-   fill_one(p->one, &mf1, &textconv_one);
-   fill_one(p->two, &mf2, &textconv_two);
+   textconv_one = get_textconv(p->one);
+   textconv_two = get_textconv(p->two);
+
+   fill_one(p->one, &mf1, textconv_one);
+   fill_one(p->two, &mf2, textconv_two);
 
if (!mf1.ptr) {
if (!mf2.ptr)
@@ -201,14 +203,17 @@ static unsigned int contains(mmfile_t *mf, struct 
diff_options *o,
 static int has_changes(struct diff_filepair *p, struct diff_options *o,
   regex_t *regexp, kwset_t kws)
 {
-   struct userdiff_driver *textconv_one = get_textconv(p->one);
-   struct userdiff_driver *textconv_two = get_textconv(p->two);
+   struct userdiff_driver *textconv_one = NULL;
+   struct userdiff_driver *textconv_two = NULL;
mmfile_t mf1, mf2;
int ret;
 
if (!o->pickaxe[0])
return 0;
 
+   textconv_one = get_textconv(p->one);
+   textconv_two = get_textconv(p->two);
+
/*
 * If we have an unmodified pair, we know that the count will be the
 * same and don't even have to load the blobs. Unless textconv is in
@@ -219,8 +224,8 @@ static int has_changes(struct diff_filepair *p, struct 
diff_options *o,
if (textconv_one == textconv_two && diff_unmodified_pair(p))
return 0;
 
-   fill_one(p->one, &mf1, &textconv_one);
-   fill_one(p->two, &mf2, &textconv_two);
+   fill_one(p->one, &mf1, textconv_one);
+   fill_one(p->two, &mf2, textconv_two);
 
if (!mf1.ptr) {
if (!mf2.ptr)
-- 
1.8.2

-- 
+ privacy is necessary
+ using gnupg http://gnupg.org
+ public key id: 0x92FEFDB7E44C32F9
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html