Re: [PATCH v2] read_index_from(): Skip verification of the cache entry order to speed index loading

2017-11-01 Thread Junio C Hamano
Jeff King  writes:

> On Tue, Oct 31, 2017 at 09:01:45AM -0400, Ben Peart wrote:
>
>> > > But what we probably _do_ need is to make sure that "git fsck" would
>> > > detect such an out-of-order index. So that developers and users alike
>> > > can diagnose suspected problems.
>> > 
>> > Agree -- that seems like a better home for this logic.
>> 
>> That is how version 1 of this patch worked but the feedback to that patch
>> was to remove it "not only during the normal operation but also in fsck."
>
> Sorry for the mixed messages (I think they are mixed between different
> people, and not mixed _just_ from me ;) ).
>
> For what it's worth, I like your v1, but can live with either approach.

I agree that v1 is the less bad one between the two.

To be honest, if the original code were done in that way (i.e. the
state with v1 applied), I probably would have had a very hard time
to justify accepting a patch to "make it safer by always checking at
runtime" (i.e. a reverse of v1 patch).

So, let's go with v1.  Thanks, all.



Re: [PATCH v2] read_index_from(): Skip verification of the cache entry order to speed index loading

2017-10-31 Thread Jeff King
On Tue, Oct 31, 2017 at 09:01:45AM -0400, Ben Peart wrote:

> > > But what we probably _do_ need is to make sure that "git fsck" would
> > > detect such an out-of-order index. So that developers and users alike
> > > can diagnose suspected problems.
> > 
> > Agree -- that seems like a better home for this logic.
> 
> That is how version 1 of this patch worked but the feedback to that patch
> was to remove it "not only during the normal operation but also in fsck."

Sorry for the mixed messages (I think they are mixed between different
people, and not mixed _just_ from me ;) ).

For what it's worth, I like your v1, but can live with either approach.

-Peff


Re: [PATCH v2] read_index_from(): Skip verification of the cache entry order to speed index loading

2017-10-31 Thread Ben Peart



On 10/30/2017 8:33 PM, Alex Vandiver wrote:

On Mon, 30 Oct 2017, Jeff King wrote:

On Mon, Oct 30, 2017 at 08:48:48AM -0400, Ben Peart wrote:


Any updates or thoughts on this one?  While the patch has become quite
trivial, it does results in a savings of 5%-15% in index load time.


I like the general direction of avoiding the check during each read.


Same -- the savings here are well worth it, IMHO.


I thought the compromise of having this test only run when DEBUG is defined
should limit it to developer builds (hopefully everyone developing on git is
running DEBUG builds :)).  Since the test is trying to detect buggy code
when writing the index, I thought that was the right time to test/catch any
issues.


I certainly don't build with DEBUG. It traditionally hasn't done
anything useful. But I'm also not convinced that this is a likely way to
find bugs in the first place, so I'm OK missing out on it.


I also don't compile with DEBUG -- there's no documentation that
mentions it, and I don't think I'd considered going poking for what
was `#ifdef`d.  I think it'd be reasonable to provide some
configure-time setting that adds `CFLAGS="-ggdb3 -O0 -DDEBUG"` or
similar, but that seems possibly moot for this particular change (see
below).


But what we probably _do_ need is to make sure that "git fsck" would
detect such an out-of-order index. So that developers and users alike
can diagnose suspected problems.


Agree -- that seems like a better home for this logic.


That is how version 1 of this patch worked but the feedback to that 
patch was to remove it "not only during the normal operation but also in 
fsck."





I am working on other, more substantial savings for index load times
(stay tuned) but this seemed like a small simple way to help speed
things up.


I'm interested to hear more about what direction you're looking in here.
  - Alex



I'm working on parallelizing the index load process across multiple 
threads/cpu cores.  Specifically the loop that calls create_from_disk() 
and set_index_entry().  The serial nature of the index formats makes 
that difficult but I believe I've come up with a way to make it work 
across all existing index formats.


Re: [PATCH v2] read_index_from(): Skip verification of the cache entry order to speed index loading

2017-10-31 Thread Ben Peart



On 10/30/2017 9:49 PM, Junio C Hamano wrote:

Ben Peart  writes:


Any updates or thoughts on this one?  While the patch has become quite
trivial, it does results in a savings of 5%-15% in index load time.

I thought the compromise of having this test only run when DEBUG is
defined should limit it to developer builds (hopefully everyone
developing on git is running DEBUG builds :)).  Since the test is
trying to detect buggy code when writing the index, I thought that was
the right time to test/catch any issues.


This check is more about catching a possible breakage (and a
malicious repository) early before we go too far into the operation.
I do not think this check is about debugging the implementation of
Git.  How would it be useful to turn it on in DEBUG build?

While I do think pursuing any runtime improvements better than a
couple of percents is worth it, I do not think the approach taken by
this iteration makes much sense; the previous one that at least
allowed fsck to catch breakage may have been already too leaky to
catch real issues (i.e. when you are asked to visit and look at an
unknown repository, you wouldn't start your session with "git fsck"
to protect yourself), and this round makes it much worse.

Besides, I see no -DDEBUG from "grep -e '-D[A-Z]*DEBUG' Makefile".
If this check were about allowing us easier time debugging the
binary (which I do not think it is), this probably should be
'#ifndef NDEBUG' instead.



I've tried 3 different ways to remove the overhead of this call from 
regular git operations.


The first was version 1 of the patch which had fsck catch breakage but 
removed it from other commands that read the index.  Since that version 
was not accepted, I took the feedback "I think I like the direction of 
getting rid of the order in post_read_index_from(), not only during the 
normal but also in fsck" to come up with a version 2.


I was hesitant to remove the code completely as I did believe it had 
some value in detecting invalid indexes so went looking for a macro I 
could use to have it 1) not happen during regular user commands but 2) 
still happen for developers.


The NDEBUG macro is guaranteed by the C89 standard 
(http://port70.net/~nsz/c/c89/c89-draft.html#4.1.2 ) to guard the code 
that is only necessary when assertions are in effect so seemed like a 
good choice.  When I used it however, I discovered that the git Makefile 
does not define NDEBUG so using this macro did not have any effect thus 
making the patch useless as the code continues to run in all cases.


On a side note, there are 434 instances of assert which up until this 
experience I believed were being removed in released builds.  As far as 
I can tell, that is not the case.  If removing them is the 
desired/expected behavior, we need to fix our Makefile as it only 
currently defines NDEBUG if USE_NED_ALLOCATOR is defined.


I then searched the code and found 47 instances where the macro DEBUG 
was used.  I (incorrectly) assumed that meant it must be used by other 
git developers.  I personally have a build alias that adds "-j12 
CFLAGS=-DDEBUG" to my make command but apparently I'm in the minority. :)


This assumption led me to the patch version 2 (guarding the code with 
#ifdef DEBUG) as it does meet the request to remove it during normal but 
also fsck and does so with regular/release builds as they are currently 
defined.


It seems that the current round of feedback is more in favor of leaving 
the test in fsck but removing it for other commands.  If that is the 
desired behavior, please use version 1 of the patch.


I'm also happy to flip this to "#ifndef NDEBUG" but that only makes 
sense if the released builds actually set NDEBUG which (I believe) will 
require a patch to Makefile.




Re: [PATCH v2] read_index_from(): Skip verification of the cache entry order to speed index loading

2017-10-30 Thread Junio C Hamano
Ben Peart  writes:

> Any updates or thoughts on this one?  While the patch has become quite
> trivial, it does results in a savings of 5%-15% in index load time.
>
> I thought the compromise of having this test only run when DEBUG is
> defined should limit it to developer builds (hopefully everyone
> developing on git is running DEBUG builds :)).  Since the test is
> trying to detect buggy code when writing the index, I thought that was
> the right time to test/catch any issues.

This check is more about catching a possible breakage (and a
malicious repository) early before we go too far into the operation.
I do not think this check is about debugging the implementation of
Git.  How would it be useful to turn it on in DEBUG build?

While I do think pursuing any runtime improvements better than a
couple of percents is worth it, I do not think the approach taken by
this iteration makes much sense; the previous one that at least
allowed fsck to catch breakage may have been already too leaky to
catch real issues (i.e. when you are asked to visit and look at an
unknown repository, you wouldn't start your session with "git fsck"
to protect yourself), and this round makes it much worse.

Besides, I see no -DDEBUG from "grep -e '-D[A-Z]*DEBUG' Makefile".
If this check were about allowing us easier time debugging the
binary (which I do not think it is), this probably should be
'#ifndef NDEBUG' instead.


Re: [PATCH v2] read_index_from(): Skip verification of the cache entry order to speed index loading

2017-10-30 Thread Alex Vandiver
On Mon, 30 Oct 2017, Jeff King wrote:
> On Mon, Oct 30, 2017 at 08:48:48AM -0400, Ben Peart wrote:
> 
> > Any updates or thoughts on this one?  While the patch has become quite
> > trivial, it does results in a savings of 5%-15% in index load time.
> 
> I like the general direction of avoiding the check during each read.

Same -- the savings here are well worth it, IMHO.

> > I thought the compromise of having this test only run when DEBUG is defined
> > should limit it to developer builds (hopefully everyone developing on git is
> > running DEBUG builds :)).  Since the test is trying to detect buggy code
> > when writing the index, I thought that was the right time to test/catch any
> > issues.
> 
> I certainly don't build with DEBUG. It traditionally hasn't done
> anything useful. But I'm also not convinced that this is a likely way to
> find bugs in the first place, so I'm OK missing out on it.

I also don't compile with DEBUG -- there's no documentation that
mentions it, and I don't think I'd considered going poking for what
was `#ifdef`d.  I think it'd be reasonable to provide some
configure-time setting that adds `CFLAGS="-ggdb3 -O0 -DDEBUG"` or
similar, but that seems possibly moot for this particular change (see
below).

> But what we probably _do_ need is to make sure that "git fsck" would
> detect such an out-of-order index. So that developers and users alike
> can diagnose suspected problems.

Agree -- that seems like a better home for this logic.

> > I am working on other, more substantial savings for index load times
> > (stay tuned) but this seemed like a small simple way to help speed
> > things up.

I'm interested to hear more about what direction you're looking in here.
 - Alex


Re: [PATCH v2] read_index_from(): Skip verification of the cache entry order to speed index loading

2017-10-30 Thread Jeff King
On Mon, Oct 30, 2017 at 08:48:48AM -0400, Ben Peart wrote:

> Any updates or thoughts on this one?  While the patch has become quite
> trivial, it does results in a savings of 5%-15% in index load time.

I like the general direction of avoiding the check during each read.
But...

> I thought the compromise of having this test only run when DEBUG is defined
> should limit it to developer builds (hopefully everyone developing on git is
> running DEBUG builds :)).  Since the test is trying to detect buggy code
> when writing the index, I thought that was the right time to test/catch any
> issues.

I certainly don't build with DEBUG. It traditionally hasn't done
anything useful. But I'm also not convinced that this is a likely way to
find bugs in the first place, so I'm OK missing out on it.

But what we probably _do_ need is to make sure that "git fsck" would
detect such an out-of-order index. So that developers and users alike
can diagnose suspected problems.

-Peff


Re: [PATCH v2] read_index_from(): Skip verification of the cache entry order to speed index loading

2017-10-30 Thread Ben Peart
Any updates or thoughts on this one?  While the patch has become quite 
trivial, it does results in a savings of 5%-15% in index load time.


I thought the compromise of having this test only run when DEBUG is 
defined should limit it to developer builds (hopefully everyone 
developing on git is running DEBUG builds :)).  Since the test is trying 
to detect buggy code when writing the index, I thought that was the 
right time to test/catch any issues.


I am working on other, more substantial savings for index load times 
(stay tuned) but this seemed like a small simple way to help speed 
things up.


On 10/24/2017 10:45 AM, Ben Peart wrote:

There is code in post_read_index_from() to detect out of order cache
entries when reading an index file.  This order verification adds cost
to read_index_from() that grows with the size of the index.

Put this on-disk data-structure validation code behind an #ifdef DEBUG
so only debug builds have to pay the cost.

The effect can be seen using t/perf/p0002-read-cache.sh:

Test w/git repo   HEADHEAD~1

read_cache/discard_cache 1000 times   0.42(0.01+0.09) 0.48(0.01+0.09) +14.3%
read_cache/discard_cache 1000 times   0.41(0.03+0.04) 0.49(0.00+0.10) +19.5%
read_cache/discard_cache 1000 times   0.42(0.03+0.06) 0.49(0.06+0.04) +16.7%

Test w/10K files  HEADHEAD~1
---
read_cache/discard_cache 1000 times   1.58(0.04+0.00) 1.71(0.00+0.07) +8.2%
read_cache/discard_cache 1000 times   1.64(0.01+0.07) 1.76(0.01+0.09) +7.3%
read_cache/discard_cache 1000 times   1.62(0.03+0.04) 1.71(0.00+0.04) +5.6%

Test w/100K files HEAD HEAD~1
-
read_cache/discard_cache 1000 times   25.85(0.00+0.06) 27.35(0.01+0.06) +5.8%
read_cache/discard_cache 1000 times   25.82(0.01+0.07) 27.25(0.01+0.07) +5.5%
read_cache/discard_cache 1000 times   26.00(0.01+0.07) 27.36(0.06+0.03) +5.2%

Test with 1,000K filesHEAD  HEAD~1
---
read_cache/discard_cache 1000 times   200.61(0.01+0.07) 218.23(0.03+0.06) +8.8%
read_cache/discard_cache 1000 times   201.62(0.03+0.06) 217.86(0.03+0.06) +8.1%
read_cache/discard_cache 1000 times   201.64(0.01+0.09) 217.89(0.03+0.07) +8.1%

Signed-off-by: Ben Peart 
Signed-off-by: Johannes Schindelin 
---

Notes:
 Base Ref: master
 Web-Diff: https://github.com/benpeart/git/commit/95e20f17ff
 Checkout: git fetch https://github.com/benpeart/git no_ce_order-v2 && git 
checkout 95e20f17ff
 
 ### Interdiff (v1..v2):
 
 ### Patches


  read-cache.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/read-cache.c b/read-cache.c
index 65f4fe8375..fc90ec0fce 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1664,6 +1664,7 @@ static struct cache_entry *create_from_disk(struct 
ondisk_cache_entry *ondisk,
return ce;
  }
  
+#ifdef DEBUG

  static void check_ce_order(struct index_state *istate)
  {
unsigned int i;
@@ -1685,6 +1686,7 @@ static void check_ce_order(struct index_state *istate)
}
}
  }
+#endif
  
  static void tweak_untracked_cache(struct index_state *istate)

  {
@@ -1720,7 +1722,9 @@ static void tweak_split_index(struct index_state *istate)
  
  static void post_read_index_from(struct index_state *istate)

  {
+#ifdef DEBUG
check_ce_order(istate);
+#endif
tweak_untracked_cache(istate);
tweak_split_index(istate);
  }

base-commit: c52ca88430e6ec7c834af38720295070d8a1e330



[PATCH v2] read_index_from(): Skip verification of the cache entry order to speed index loading

2017-10-24 Thread Ben Peart
There is code in post_read_index_from() to detect out of order cache
entries when reading an index file.  This order verification adds cost
to read_index_from() that grows with the size of the index.

Put this on-disk data-structure validation code behind an #ifdef DEBUG
so only debug builds have to pay the cost.

The effect can be seen using t/perf/p0002-read-cache.sh:

Test w/git repo   HEADHEAD~1

read_cache/discard_cache 1000 times   0.42(0.01+0.09) 0.48(0.01+0.09) +14.3%
read_cache/discard_cache 1000 times   0.41(0.03+0.04) 0.49(0.00+0.10) +19.5%
read_cache/discard_cache 1000 times   0.42(0.03+0.06) 0.49(0.06+0.04) +16.7%

Test w/10K files  HEADHEAD~1
---
read_cache/discard_cache 1000 times   1.58(0.04+0.00) 1.71(0.00+0.07) +8.2%
read_cache/discard_cache 1000 times   1.64(0.01+0.07) 1.76(0.01+0.09) +7.3%
read_cache/discard_cache 1000 times   1.62(0.03+0.04) 1.71(0.00+0.04) +5.6%

Test w/100K files HEAD HEAD~1
-
read_cache/discard_cache 1000 times   25.85(0.00+0.06) 27.35(0.01+0.06) +5.8%
read_cache/discard_cache 1000 times   25.82(0.01+0.07) 27.25(0.01+0.07) +5.5%
read_cache/discard_cache 1000 times   26.00(0.01+0.07) 27.36(0.06+0.03) +5.2%

Test with 1,000K filesHEAD  HEAD~1
---
read_cache/discard_cache 1000 times   200.61(0.01+0.07) 218.23(0.03+0.06) +8.8%
read_cache/discard_cache 1000 times   201.62(0.03+0.06) 217.86(0.03+0.06) +8.1%
read_cache/discard_cache 1000 times   201.64(0.01+0.09) 217.89(0.03+0.07) +8.1%

Signed-off-by: Ben Peart 
Signed-off-by: Johannes Schindelin 
---

Notes:
Base Ref: master
Web-Diff: https://github.com/benpeart/git/commit/95e20f17ff
Checkout: git fetch https://github.com/benpeart/git no_ce_order-v2 && git 
checkout 95e20f17ff

### Interdiff (v1..v2):

### Patches

 read-cache.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/read-cache.c b/read-cache.c
index 65f4fe8375..fc90ec0fce 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1664,6 +1664,7 @@ static struct cache_entry *create_from_disk(struct 
ondisk_cache_entry *ondisk,
return ce;
 }
 
+#ifdef DEBUG
 static void check_ce_order(struct index_state *istate)
 {
unsigned int i;
@@ -1685,6 +1686,7 @@ static void check_ce_order(struct index_state *istate)
}
}
 }
+#endif
 
 static void tweak_untracked_cache(struct index_state *istate)
 {
@@ -1720,7 +1722,9 @@ static void tweak_split_index(struct index_state *istate)
 
 static void post_read_index_from(struct index_state *istate)
 {
+#ifdef DEBUG
check_ce_order(istate);
+#endif
tweak_untracked_cache(istate);
tweak_split_index(istate);
 }

base-commit: c52ca88430e6ec7c834af38720295070d8a1e330
-- 
2.14.1.windows.1.1034.g0776750557