Re: [PATCH v1 7/9] diff --color-moved-ws: optimize allow-indentation-change

2018-11-17 Thread Phillip Wood

On 16/11/2018 20:40, Stefan Beller wrote:

On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood  wrote:


From: Phillip Wood 

When running

   git diff --color-moved-ws=allow-indentation-change v2.18.0 v2.19.0

cmp_in_block_with_wsd() is called 694908327 times. Of those 42.7%
return after comparing a and b. By comparing the lengths first we can
return early in all but 0.03% of those cases without dereferencing the
string pointers. The comparison between a and c fails in 6.8% of
calls, by comparing the lengths first we reject all the failing calls
without dereferencing the string pointers.

This reduces the time to run the command above by by 42% from 14.6s to
8.5s. This is still much slower than the normal --color-moved which
takes ~0.6-0.7s to run but is a significant improvement.

The next commits will replace the current implementation with one that
works with mixed tabs and spaces in the indentation. I think it is
worth optimizing the current implementation first to enable a fair
comparison between the two implementations.


Up to here the series looks good and I think we could take it
as a preparatory self-standing series.


Thanks for looking at these, I think it makes sense to split the series 
here, the commit message for this patch may want tweaking slightly if we 
do. (I did wonder about splitting it in two when I submitted it but took 
the easy way out.)


Best Wishes

Phillip


I'll read on.
Thanks,
Stefan





Re: [PATCH v1 7/9] diff --color-moved-ws: optimize allow-indentation-change

2018-11-16 Thread Stefan Beller
On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood  wrote:
>
> From: Phillip Wood 
>
> When running
>
>   git diff --color-moved-ws=allow-indentation-change v2.18.0 v2.19.0
>
> cmp_in_block_with_wsd() is called 694908327 times. Of those 42.7%
> return after comparing a and b. By comparing the lengths first we can
> return early in all but 0.03% of those cases without dereferencing the
> string pointers. The comparison between a and c fails in 6.8% of
> calls, by comparing the lengths first we reject all the failing calls
> without dereferencing the string pointers.
>
> This reduces the time to run the command above by by 42% from 14.6s to
> 8.5s. This is still much slower than the normal --color-moved which
> takes ~0.6-0.7s to run but is a significant improvement.
>
> The next commits will replace the current implementation with one that
> works with mixed tabs and spaces in the indentation. I think it is
> worth optimizing the current implementation first to enable a fair
> comparison between the two implementations.

Up to here the series looks good and I think we could take it
as a preparatory self-standing series.

I'll read on.
Thanks,
Stefan


[PATCH v1 7/9] diff --color-moved-ws: optimize allow-indentation-change

2018-11-16 Thread Phillip Wood
From: Phillip Wood 

When running

  git diff --color-moved-ws=allow-indentation-change v2.18.0 v2.19.0

cmp_in_block_with_wsd() is called 694908327 times. Of those 42.7%
return after comparing a and b. By comparing the lengths first we can
return early in all but 0.03% of those cases without dereferencing the
string pointers. The comparison between a and c fails in 6.8% of
calls, by comparing the lengths first we reject all the failing calls
without dereferencing the string pointers.

This reduces the time to run the command above by by 42% from 14.6s to
8.5s. This is still much slower than the normal --color-moved which
takes ~0.6-0.7s to run but is a significant improvement.

The next commits will replace the current implementation with one that
works with mixed tabs and spaces in the indentation. I think it is
worth optimizing the current implementation first to enable a fair
comparison between the two implementations.

Signed-off-by: Phillip Wood 
---
 diff.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index 8c08dd68df..c378ce3daf 100644
--- a/diff.c
+++ b/diff.c
@@ -829,20 +829,23 @@ static int cmp_in_block_with_wsd(const struct 
diff_options *o,
 int n)
 {
struct emitted_diff_symbol *l = >emitted_symbols->buf[n];
-   int al = cur->es->len, cl = l->len;
+   int al = cur->es->len, bl = match->es->len, cl = l->len;
const char *a = cur->es->line,
   *b = match->es->line,
   *c = l->line;
-
+   const char *orig_a = a;
int wslen;
 
/*
-* We need to check if 'cur' is equal to 'match'.
-* As those are from the same (+/-) side, we do not need to adjust for
-* indent changes. However these were found using fuzzy matching
-* so we do have to check if they are equal.
+* We need to check if 'cur' is equal to 'match'.  As those
+* are from the same (+/-) side, we do not need to adjust for
+* indent changes. However these were found using fuzzy
+* matching so we do have to check if they are equal. Here we
+* just check the lengths. We delay calling memcmp() to check
+* the contents until later as if the length comparison for a
+* and c fails we can avoid the call all together.
 */
-   if (strcmp(a, b))
+   if (al != bl)
return 1;
 
if (!pmb->wsd.string)
@@ -870,7 +873,7 @@ static int cmp_in_block_with_wsd(const struct diff_options 
*o,
al -= wslen;
}
 
-   if (al != cl || memcmp(a, c, al))
+   if (al != cl || memcmp(orig_a, b, bl) || memcmp(a, c, al))
return 1;
 
return 0;
-- 
2.19.1