Re: [PATCH v2] read_index_from(): Skip verification of the cache entry order to speed index loading
Jeff Kingwrites: > On Tue, Oct 31, 2017 at 09:01:45AM -0400, Ben Peart wrote: > >> > > But what we probably _do_ need is to make sure that "git fsck" would >> > > detect such an out-of-order index. So that developers and users alike >> > > can diagnose suspected problems. >> > >> > Agree -- that seems like a better home for this logic. >> >> That is how version 1 of this patch worked but the feedback to that patch >> was to remove it "not only during the normal operation but also in fsck." > > Sorry for the mixed messages (I think they are mixed between different > people, and not mixed _just_ from me ;) ). > > For what it's worth, I like your v1, but can live with either approach. I agree that v1 is the less bad one between the two. To be honest, if the original code were done in that way (i.e. the state with v1 applied), I probably would have had a very hard time to justify accepting a patch to "make it safer by always checking at runtime" (i.e. a reverse of v1 patch). So, let's go with v1. Thanks, all.
Re: [PATCH v2] read_index_from(): Skip verification of the cache entry order to speed index loading
On Tue, Oct 31, 2017 at 09:01:45AM -0400, Ben Peart wrote: > > > But what we probably _do_ need is to make sure that "git fsck" would > > > detect such an out-of-order index. So that developers and users alike > > > can diagnose suspected problems. > > > > Agree -- that seems like a better home for this logic. > > That is how version 1 of this patch worked but the feedback to that patch > was to remove it "not only during the normal operation but also in fsck." Sorry for the mixed messages (I think they are mixed between different people, and not mixed _just_ from me ;) ). For what it's worth, I like your v1, but can live with either approach. -Peff
Re: [PATCH v2] read_index_from(): Skip verification of the cache entry order to speed index loading
On 10/30/2017 8:33 PM, Alex Vandiver wrote: On Mon, 30 Oct 2017, Jeff King wrote: On Mon, Oct 30, 2017 at 08:48:48AM -0400, Ben Peart wrote: Any updates or thoughts on this one? While the patch has become quite trivial, it does results in a savings of 5%-15% in index load time. I like the general direction of avoiding the check during each read. Same -- the savings here are well worth it, IMHO. I thought the compromise of having this test only run when DEBUG is defined should limit it to developer builds (hopefully everyone developing on git is running DEBUG builds :)). Since the test is trying to detect buggy code when writing the index, I thought that was the right time to test/catch any issues. I certainly don't build with DEBUG. It traditionally hasn't done anything useful. But I'm also not convinced that this is a likely way to find bugs in the first place, so I'm OK missing out on it. I also don't compile with DEBUG -- there's no documentation that mentions it, and I don't think I'd considered going poking for what was `#ifdef`d. I think it'd be reasonable to provide some configure-time setting that adds `CFLAGS="-ggdb3 -O0 -DDEBUG"` or similar, but that seems possibly moot for this particular change (see below). But what we probably _do_ need is to make sure that "git fsck" would detect such an out-of-order index. So that developers and users alike can diagnose suspected problems. Agree -- that seems like a better home for this logic. That is how version 1 of this patch worked but the feedback to that patch was to remove it "not only during the normal operation but also in fsck." I am working on other, more substantial savings for index load times (stay tuned) but this seemed like a small simple way to help speed things up. I'm interested to hear more about what direction you're looking in here. - Alex I'm working on parallelizing the index load process across multiple threads/cpu cores. Specifically the loop that calls create_from_disk() and set_index_entry(). The serial nature of the index formats makes that difficult but I believe I've come up with a way to make it work across all existing index formats.
Re: [PATCH v2] read_index_from(): Skip verification of the cache entry order to speed index loading
On 10/30/2017 9:49 PM, Junio C Hamano wrote: Ben Peartwrites: Any updates or thoughts on this one? While the patch has become quite trivial, it does results in a savings of 5%-15% in index load time. I thought the compromise of having this test only run when DEBUG is defined should limit it to developer builds (hopefully everyone developing on git is running DEBUG builds :)). Since the test is trying to detect buggy code when writing the index, I thought that was the right time to test/catch any issues. This check is more about catching a possible breakage (and a malicious repository) early before we go too far into the operation. I do not think this check is about debugging the implementation of Git. How would it be useful to turn it on in DEBUG build? While I do think pursuing any runtime improvements better than a couple of percents is worth it, I do not think the approach taken by this iteration makes much sense; the previous one that at least allowed fsck to catch breakage may have been already too leaky to catch real issues (i.e. when you are asked to visit and look at an unknown repository, you wouldn't start your session with "git fsck" to protect yourself), and this round makes it much worse. Besides, I see no -DDEBUG from "grep -e '-D[A-Z]*DEBUG' Makefile". If this check were about allowing us easier time debugging the binary (which I do not think it is), this probably should be '#ifndef NDEBUG' instead. I've tried 3 different ways to remove the overhead of this call from regular git operations. The first was version 1 of the patch which had fsck catch breakage but removed it from other commands that read the index. Since that version was not accepted, I took the feedback "I think I like the direction of getting rid of the order in post_read_index_from(), not only during the normal but also in fsck" to come up with a version 2. I was hesitant to remove the code completely as I did believe it had some value in detecting invalid indexes so went looking for a macro I could use to have it 1) not happen during regular user commands but 2) still happen for developers. The NDEBUG macro is guaranteed by the C89 standard (http://port70.net/~nsz/c/c89/c89-draft.html#4.1.2 ) to guard the code that is only necessary when assertions are in effect so seemed like a good choice. When I used it however, I discovered that the git Makefile does not define NDEBUG so using this macro did not have any effect thus making the patch useless as the code continues to run in all cases. On a side note, there are 434 instances of assert which up until this experience I believed were being removed in released builds. As far as I can tell, that is not the case. If removing them is the desired/expected behavior, we need to fix our Makefile as it only currently defines NDEBUG if USE_NED_ALLOCATOR is defined. I then searched the code and found 47 instances where the macro DEBUG was used. I (incorrectly) assumed that meant it must be used by other git developers. I personally have a build alias that adds "-j12 CFLAGS=-DDEBUG" to my make command but apparently I'm in the minority. :) This assumption led me to the patch version 2 (guarding the code with #ifdef DEBUG) as it does meet the request to remove it during normal but also fsck and does so with regular/release builds as they are currently defined. It seems that the current round of feedback is more in favor of leaving the test in fsck but removing it for other commands. If that is the desired behavior, please use version 1 of the patch. I'm also happy to flip this to "#ifndef NDEBUG" but that only makes sense if the released builds actually set NDEBUG which (I believe) will require a patch to Makefile.
Re: [PATCH v2] read_index_from(): Skip verification of the cache entry order to speed index loading
Ben Peartwrites: > Any updates or thoughts on this one? While the patch has become quite > trivial, it does results in a savings of 5%-15% in index load time. > > I thought the compromise of having this test only run when DEBUG is > defined should limit it to developer builds (hopefully everyone > developing on git is running DEBUG builds :)). Since the test is > trying to detect buggy code when writing the index, I thought that was > the right time to test/catch any issues. This check is more about catching a possible breakage (and a malicious repository) early before we go too far into the operation. I do not think this check is about debugging the implementation of Git. How would it be useful to turn it on in DEBUG build? While I do think pursuing any runtime improvements better than a couple of percents is worth it, I do not think the approach taken by this iteration makes much sense; the previous one that at least allowed fsck to catch breakage may have been already too leaky to catch real issues (i.e. when you are asked to visit and look at an unknown repository, you wouldn't start your session with "git fsck" to protect yourself), and this round makes it much worse. Besides, I see no -DDEBUG from "grep -e '-D[A-Z]*DEBUG' Makefile". If this check were about allowing us easier time debugging the binary (which I do not think it is), this probably should be '#ifndef NDEBUG' instead.
Re: [PATCH v2] read_index_from(): Skip verification of the cache entry order to speed index loading
On Mon, 30 Oct 2017, Jeff King wrote: > On Mon, Oct 30, 2017 at 08:48:48AM -0400, Ben Peart wrote: > > > Any updates or thoughts on this one? While the patch has become quite > > trivial, it does results in a savings of 5%-15% in index load time. > > I like the general direction of avoiding the check during each read. Same -- the savings here are well worth it, IMHO. > > I thought the compromise of having this test only run when DEBUG is defined > > should limit it to developer builds (hopefully everyone developing on git is > > running DEBUG builds :)). Since the test is trying to detect buggy code > > when writing the index, I thought that was the right time to test/catch any > > issues. > > I certainly don't build with DEBUG. It traditionally hasn't done > anything useful. But I'm also not convinced that this is a likely way to > find bugs in the first place, so I'm OK missing out on it. I also don't compile with DEBUG -- there's no documentation that mentions it, and I don't think I'd considered going poking for what was `#ifdef`d. I think it'd be reasonable to provide some configure-time setting that adds `CFLAGS="-ggdb3 -O0 -DDEBUG"` or similar, but that seems possibly moot for this particular change (see below). > But what we probably _do_ need is to make sure that "git fsck" would > detect such an out-of-order index. So that developers and users alike > can diagnose suspected problems. Agree -- that seems like a better home for this logic. > > I am working on other, more substantial savings for index load times > > (stay tuned) but this seemed like a small simple way to help speed > > things up. I'm interested to hear more about what direction you're looking in here. - Alex
Re: [PATCH v2] read_index_from(): Skip verification of the cache entry order to speed index loading
On Mon, Oct 30, 2017 at 08:48:48AM -0400, Ben Peart wrote: > Any updates or thoughts on this one? While the patch has become quite > trivial, it does results in a savings of 5%-15% in index load time. I like the general direction of avoiding the check during each read. But... > I thought the compromise of having this test only run when DEBUG is defined > should limit it to developer builds (hopefully everyone developing on git is > running DEBUG builds :)). Since the test is trying to detect buggy code > when writing the index, I thought that was the right time to test/catch any > issues. I certainly don't build with DEBUG. It traditionally hasn't done anything useful. But I'm also not convinced that this is a likely way to find bugs in the first place, so I'm OK missing out on it. But what we probably _do_ need is to make sure that "git fsck" would detect such an out-of-order index. So that developers and users alike can diagnose suspected problems. -Peff
Re: [PATCH v2] read_index_from(): Skip verification of the cache entry order to speed index loading
Any updates or thoughts on this one? While the patch has become quite trivial, it does results in a savings of 5%-15% in index load time. I thought the compromise of having this test only run when DEBUG is defined should limit it to developer builds (hopefully everyone developing on git is running DEBUG builds :)). Since the test is trying to detect buggy code when writing the index, I thought that was the right time to test/catch any issues. I am working on other, more substantial savings for index load times (stay tuned) but this seemed like a small simple way to help speed things up. On 10/24/2017 10:45 AM, Ben Peart wrote: There is code in post_read_index_from() to detect out of order cache entries when reading an index file. This order verification adds cost to read_index_from() that grows with the size of the index. Put this on-disk data-structure validation code behind an #ifdef DEBUG so only debug builds have to pay the cost. The effect can be seen using t/perf/p0002-read-cache.sh: Test w/git repo HEADHEAD~1 read_cache/discard_cache 1000 times 0.42(0.01+0.09) 0.48(0.01+0.09) +14.3% read_cache/discard_cache 1000 times 0.41(0.03+0.04) 0.49(0.00+0.10) +19.5% read_cache/discard_cache 1000 times 0.42(0.03+0.06) 0.49(0.06+0.04) +16.7% Test w/10K files HEADHEAD~1 --- read_cache/discard_cache 1000 times 1.58(0.04+0.00) 1.71(0.00+0.07) +8.2% read_cache/discard_cache 1000 times 1.64(0.01+0.07) 1.76(0.01+0.09) +7.3% read_cache/discard_cache 1000 times 1.62(0.03+0.04) 1.71(0.00+0.04) +5.6% Test w/100K files HEAD HEAD~1 - read_cache/discard_cache 1000 times 25.85(0.00+0.06) 27.35(0.01+0.06) +5.8% read_cache/discard_cache 1000 times 25.82(0.01+0.07) 27.25(0.01+0.07) +5.5% read_cache/discard_cache 1000 times 26.00(0.01+0.07) 27.36(0.06+0.03) +5.2% Test with 1,000K filesHEAD HEAD~1 --- read_cache/discard_cache 1000 times 200.61(0.01+0.07) 218.23(0.03+0.06) +8.8% read_cache/discard_cache 1000 times 201.62(0.03+0.06) 217.86(0.03+0.06) +8.1% read_cache/discard_cache 1000 times 201.64(0.01+0.09) 217.89(0.03+0.07) +8.1% Signed-off-by: Ben PeartSigned-off-by: Johannes Schindelin --- Notes: Base Ref: master Web-Diff: https://github.com/benpeart/git/commit/95e20f17ff Checkout: git fetch https://github.com/benpeart/git no_ce_order-v2 && git checkout 95e20f17ff ### Interdiff (v1..v2): ### Patches read-cache.c | 4 1 file changed, 4 insertions(+) diff --git a/read-cache.c b/read-cache.c index 65f4fe8375..fc90ec0fce 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1664,6 +1664,7 @@ static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk, return ce; } +#ifdef DEBUG static void check_ce_order(struct index_state *istate) { unsigned int i; @@ -1685,6 +1686,7 @@ static void check_ce_order(struct index_state *istate) } } } +#endif static void tweak_untracked_cache(struct index_state *istate) { @@ -1720,7 +1722,9 @@ static void tweak_split_index(struct index_state *istate) static void post_read_index_from(struct index_state *istate) { +#ifdef DEBUG check_ce_order(istate); +#endif tweak_untracked_cache(istate); tweak_split_index(istate); } base-commit: c52ca88430e6ec7c834af38720295070d8a1e330
[PATCH v2] read_index_from(): Skip verification of the cache entry order to speed index loading
There is code in post_read_index_from() to detect out of order cache entries when reading an index file. This order verification adds cost to read_index_from() that grows with the size of the index. Put this on-disk data-structure validation code behind an #ifdef DEBUG so only debug builds have to pay the cost. The effect can be seen using t/perf/p0002-read-cache.sh: Test w/git repo HEADHEAD~1 read_cache/discard_cache 1000 times 0.42(0.01+0.09) 0.48(0.01+0.09) +14.3% read_cache/discard_cache 1000 times 0.41(0.03+0.04) 0.49(0.00+0.10) +19.5% read_cache/discard_cache 1000 times 0.42(0.03+0.06) 0.49(0.06+0.04) +16.7% Test w/10K files HEADHEAD~1 --- read_cache/discard_cache 1000 times 1.58(0.04+0.00) 1.71(0.00+0.07) +8.2% read_cache/discard_cache 1000 times 1.64(0.01+0.07) 1.76(0.01+0.09) +7.3% read_cache/discard_cache 1000 times 1.62(0.03+0.04) 1.71(0.00+0.04) +5.6% Test w/100K files HEAD HEAD~1 - read_cache/discard_cache 1000 times 25.85(0.00+0.06) 27.35(0.01+0.06) +5.8% read_cache/discard_cache 1000 times 25.82(0.01+0.07) 27.25(0.01+0.07) +5.5% read_cache/discard_cache 1000 times 26.00(0.01+0.07) 27.36(0.06+0.03) +5.2% Test with 1,000K filesHEAD HEAD~1 --- read_cache/discard_cache 1000 times 200.61(0.01+0.07) 218.23(0.03+0.06) +8.8% read_cache/discard_cache 1000 times 201.62(0.03+0.06) 217.86(0.03+0.06) +8.1% read_cache/discard_cache 1000 times 201.64(0.01+0.09) 217.89(0.03+0.07) +8.1% Signed-off-by: Ben PeartSigned-off-by: Johannes Schindelin --- Notes: Base Ref: master Web-Diff: https://github.com/benpeart/git/commit/95e20f17ff Checkout: git fetch https://github.com/benpeart/git no_ce_order-v2 && git checkout 95e20f17ff ### Interdiff (v1..v2): ### Patches read-cache.c | 4 1 file changed, 4 insertions(+) diff --git a/read-cache.c b/read-cache.c index 65f4fe8375..fc90ec0fce 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1664,6 +1664,7 @@ static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk, return ce; } +#ifdef DEBUG static void check_ce_order(struct index_state *istate) { unsigned int i; @@ -1685,6 +1686,7 @@ static void check_ce_order(struct index_state *istate) } } } +#endif static void tweak_untracked_cache(struct index_state *istate) { @@ -1720,7 +1722,9 @@ static void tweak_split_index(struct index_state *istate) static void post_read_index_from(struct index_state *istate) { +#ifdef DEBUG check_ce_order(istate); +#endif tweak_untracked_cache(istate); tweak_split_index(istate); } base-commit: c52ca88430e6ec7c834af38720295070d8a1e330 -- 2.14.1.windows.1.1034.g0776750557