Re: [PATCH v2 0/4] Speed up unpack_trees()
On Thu, Aug 9, 2018 at 10:16 AM Ben Peart wrote:
> In fact, in the other [1] patch series, we're detecting the number of
> cache entries that are the same as the cache tree and using that to
> traverse_by_cache_tree(). At that point, couldn't we copy the
> corresponding cache tree entries over to the destination so that those
> don't have to get recreated in the later call to cache_tree_update()?

We could. But as I stated in another mail, I saw this cache-tree
optimization as a separate problem and didn't want to mix them up. That
way cache_tree_copy() could be used elsewhere if the opportunity shows
up.

Mixing them up could also complicate the problem. When you merge stuff,
you add new cache entries to o->result with add_index_entry(), which
tries to invalidate those paths in o->result's cache-tree. Right now
that cache-tree is empty, so it's really a no-op. But if you copy the
cache-tree over while merging, that invalidation might either invalidate
your newly copied cache-tree, or get slowed down, because a non-empty
cache-tree in o->result means you have to start walking it to find
whether there is any path to invalidate.

PS. This code keeps messing me up. invalidate_ce_path() may also
invalidate the cache-tree in the _source_ index. For this optimization
to really shine, you had better keep the original cache-tree intact (so
that you can reuse as much of it as possible). I don't see the purpose
of this source cache-tree invalidation at all. My guess at this point is
that Linus actually made a mistake in 34110cd4e3 (Make 'unpack_trees()'
have a separate source and destination index - 2008-03-06) and should
have invalidated the _destination_ index instead of the source one. I'm
going to dig in some more and will probably send a patch to remove this
invalidation.
--
Duy
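[The invalidation walk discussed above can be sketched standalone. This
is a hypothetical, heavily simplified model -- git's real cache_tree and
invalidate_ce_path() live in cache-tree.c and read-cache.c and look
nothing like this -- but it shows why a populated destination cache-tree
makes every add_index_entry() pay a walk proportional to path depth,
whereas an empty one makes invalidation a no-op.]

```c
#include <string.h>

/* Hypothetical stand-in for git's cache_tree node. */
struct tree_node {
	char name[32];                 /* one path component */
	int entry_count;               /* -1 means "invalid", as in git */
	int nr_children;
	struct tree_node *children[8];
};

/*
 * Mark every node on the path to "a/b/file" invalid, the way adding an
 * index entry must invalidate all ancestor trees of the added path.
 * Returns the number of nodes visited (the cost of the walk).
 */
static int invalidate_path(struct tree_node *root, const char *path)
{
	int visited = 0;
	struct tree_node *it = root;
	const char *p = path;

	while (it) {
		it->entry_count = -1;  /* this subtree no longer matches */
		visited++;

		const char *slash = strchr(p, '/');
		size_t len = slash ? (size_t)(slash - p) : strlen(p);
		struct tree_node *next = NULL;
		for (int i = 0; i < it->nr_children; i++)
			if (!strncmp(it->children[i]->name, p, len) &&
			    it->children[i]->name[len] == '\0')
				next = it->children[i];
		if (!slash)
			break;         /* last component reached */
		p = slash + 1;
		it = next;
	}
	return visited;
}
```

With an empty cache-tree the walk ends at the root immediately; once the
copied trees are in place, every merged entry pays the full descent.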
Re: [PATCH v2 0/4] Speed up unpack_trees()
On Wed, Aug 8, 2018 at 10:53 PM Ben Peart wrote:
>
> On 8/1/2018 12:38 PM, Duy Nguyen wrote:
> > On Tue, Jul 31, 2018 at 01:31:31PM -0400, Ben Peart wrote:
> > >
> > > On 7/31/2018 12:50 PM, Ben Peart wrote:
> > > >
> > > > On 7/31/2018 11:31 AM, Duy Nguyen wrote:
> > > > >
> > > > > > In the performance game of whack-a-mole, that call to repair
> > > > > > cache-tree is now looking quite expensive...
> > > > >
> > > > > Yeah and I think we can whack that mole too. I did some measurement.
> > > > > Best case possible, we just need to scan through two indexes (one with
> > > > > many good cache-tree, one with no cache-tree), compare and copy
> > > > > cache-tree over. The scanning takes like 1% time of the current repair
> > > > > step and I suspect it's the hashing that takes most of the time. Of
> > > > > course the real world won't have such nice numbers, but I guess we
> > > > > could maybe halve the cache-tree update/repair time.
> > > >
> > > > I have some great profiling tools available so will take a look at this
> > > > next and see exactly where the time is being spent.
> > >
> > > Good instincts. In cache_tree_update, the heavy hitter is definitely
> > > hash_object_file followed by has_object_file.
> > >
> > > Name                          Inc %     Inc
> > > + git!cache_tree_update        12.4   4,935
> > > |+ git!update_one              11.8   4,706
> > > | + git!update_one             11.8   4,706
> > > |  + git!hash_object_file       6.1   2,406
> > > |  + git!has_object_file        2.0     813
> > > |  + OTHER <>                   0.5     203
> > > |  + git!strbuf_addf            0.4     155
> > > |  + git!strbuf_release         0.4     143
> > > |  + git!strbuf_add             0.3     121
> > > |  + OTHER <>                   0.2      93
> > > |  + git!strbuf_grow            0.1      25
> >
> > Ben, if you work on this, this could be a good starting point. I will
> > not work on this because I still have some other things to catch up
> > and follow through. You can have my sign off if you reuse something
> > from this patch.
> >
> > Even if it's a naive implementation, the initial numbers look pretty
> > good.
> > Without the patch we have
> >
> > 18:31:05.970621 unpack-trees.c:1437 performance: 0.01029 s: copy
> > 18:31:05.975729 unpack-trees.c:1444 performance: 0.005082004 s: update
> >
> > And with the patch
> >
> > 18:31:13.295655 unpack-trees.c:1437 performance: 0.000198017 s: copy
> > 18:31:13.296757 unpack-trees.c:1444 performance: 0.001075935 s: update
> >
> > Time saving is about 80% by the look of this (best possible case
> > because only the top tree needs to be hashed and written out).
> >
> > -- 8< --
> > diff --git a/cache-tree.c b/cache-tree.c
> > index 6b46711996..67a4a93100 100644
> > --- a/cache-tree.c
> > +++ b/cache-tree.c
> > @@ -440,6 +440,147 @@ int cache_tree_update(struct index_state *istate, int flags)
> >  	return 0;
> >  }
> >
> > +static int same(const struct cache_entry *a, const struct cache_entry *b)
> > +{
> > +	if (ce_stage(a) || ce_stage(b))
> > +		return 0;
> > +	if ((a->ce_flags | b->ce_flags) & CE_CONFLICTED)
> > +		return 0;
> > +	return a->ce_mode == b->ce_mode &&
> > +	       !oidcmp(&a->oid, &b->oid);
> > +}
> > +
> > +static int cache_tree_name_pos(const struct index_state *istate,
> > +			       const struct strbuf *path)
> > +{
> > +	int pos;
> > +
> > +	if (!path->len)
> > +		return 0;
> > +
> > +	pos = index_name_pos(istate, path->buf, path->len);
> > +	if (pos >= 0)
> > +		BUG("No no no, directory path must not exist in index");
> > +	return -pos - 1;
> > +}
> > +
> > +/*
> > + * Locate the same cache-tree in two separate indexes. Check that the
> > + * cache-tree is still valid for the "to" index (i.e. it contains the
> > + * same set of entries as in the "from" index).
> > + */
> > +static int verify_one_cache_tree(const struct index_state *to,
> > +				 const struct index_state *from,
> > +				 const struct cache_tree *it,
> > +				 const struct strbuf *path)
> > +{
> > +	int i, spos, dpos;
> > +
> > +	spos = cache_tree_name_pos(from, path);
> > +	if (spos + it->entry_count > from->cache_nr)
> > +		return -1;
> > +
> > +	dpos = cache_tree_name_pos(to, path);
> > +	if (dpos + it->entry_count > to->cache_nr)
> > +		return -1;
> > +
> > +	/* Can we quickly check head and tail and bail out early */
> > +	if (!same(from->cache[spos], to->cache[dpos]) ||
> > +	    !same(from->cache[spos + it->entry_count - 1],
> > +		  to->cache[dpos + it->entry_count - 1]))
> > +		return -1;
> > +
> > +	for (i = 1; i < it->entry_count - 1; i++)
> > +		if (!same(from->cache[spos + i],
> > +			  to->cache[dpos + i]))
> > +			return -1;
> > +
> > +	return 0;
> > +}
Re: [PATCH v2 0/4] Speed up unpack_trees()
On 8/8/2018 4:53 PM, Ben Peart wrote:

On 8/1/2018 12:38 PM, Duy Nguyen wrote:

On Tue, Jul 31, 2018 at 01:31:31PM -0400, Ben Peart wrote:

On 7/31/2018 12:50 PM, Ben Peart wrote:

On 7/31/2018 11:31 AM, Duy Nguyen wrote:

In the performance game of whack-a-mole, that call to repair cache-tree
is now looking quite expensive...

Yeah and I think we can whack that mole too. I did some measurement.
Best case possible, we just need to scan through two indexes (one with
many good cache-tree, one with no cache-tree), compare and copy
cache-tree over. The scanning takes like 1% time of the current repair
step and I suspect it's the hashing that takes most of the time. Of
course the real world won't have such nice numbers, but I guess we could
maybe halve the cache-tree update/repair time.

I have some great profiling tools available so will take a look at this
next and see exactly where the time is being spent.

Good instincts. In cache_tree_update, the heavy hitter is definitely
hash_object_file followed by has_object_file.

Name                          Inc %     Inc
+ git!cache_tree_update        12.4   4,935
|+ git!update_one              11.8   4,706
| + git!update_one             11.8   4,706
|  + git!hash_object_file       6.1   2,406
|  + git!has_object_file        2.0     813
|  + OTHER <>                   0.5     203
|  + git!strbuf_addf            0.4     155
|  + git!strbuf_release         0.4     143
|  + git!strbuf_add             0.3     121
|  + OTHER <>                   0.2      93
|  + git!strbuf_grow            0.1      25

Ben, if you work on this, this could be a good starting point. I will
not work on this because I still have some other things to catch up
and follow through. You can have my sign off if you reuse something
from this patch.

Even if it's a naive implementation, the initial numbers look pretty
good.
Without the patch we have

18:31:05.970621 unpack-trees.c:1437 performance: 0.01029 s: copy
18:31:05.975729 unpack-trees.c:1444 performance: 0.005082004 s: update

And with the patch

18:31:13.295655 unpack-trees.c:1437 performance: 0.000198017 s: copy
18:31:13.296757 unpack-trees.c:1444 performance: 0.001075935 s: update

Time saving is about 80% by the look of this (best possible case
because only the top tree needs to be hashed and written out).

-- 8< --
diff --git a/cache-tree.c b/cache-tree.c
index 6b46711996..67a4a93100 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -440,6 +440,147 @@ int cache_tree_update(struct index_state *istate, int flags)
 	return 0;
 }
 
+static int same(const struct cache_entry *a, const struct cache_entry *b)
+{
+	if (ce_stage(a) || ce_stage(b))
+		return 0;
+	if ((a->ce_flags | b->ce_flags) & CE_CONFLICTED)
+		return 0;
+	return a->ce_mode == b->ce_mode &&
+	       !oidcmp(&a->oid, &b->oid);
+}
+
+static int cache_tree_name_pos(const struct index_state *istate,
+			       const struct strbuf *path)
+{
+	int pos;
+
+	if (!path->len)
+		return 0;
+
+	pos = index_name_pos(istate, path->buf, path->len);
+	if (pos >= 0)
+		BUG("No no no, directory path must not exist in index");
+	return -pos - 1;
+}
+
+/*
+ * Locate the same cache-tree in two separate indexes. Check that the
+ * cache-tree is still valid for the "to" index (i.e. it contains the
+ * same set of entries as in the "from" index).
+ */
+static int verify_one_cache_tree(const struct index_state *to,
+				 const struct index_state *from,
+				 const struct cache_tree *it,
+				 const struct strbuf *path)
+{
+	int i, spos, dpos;
+
+	spos = cache_tree_name_pos(from, path);
+	if (spos + it->entry_count > from->cache_nr)
+		return -1;
+
+	dpos = cache_tree_name_pos(to, path);
+	if (dpos + it->entry_count > to->cache_nr)
+		return -1;
+
+	/* Can we quickly check head and tail and bail out early */
+	if (!same(from->cache[spos], to->cache[dpos]) ||
+	    !same(from->cache[spos + it->entry_count - 1],
+		  to->cache[dpos + it->entry_count - 1]))
+		return -1;
+
+	for (i = 1; i < it->entry_count - 1; i++)
+		if (!same(from->cache[spos + i],
+			  to->cache[dpos + i]))
+			return -1;
+
+	return 0;
+}
+
+static int verify_and_invalidate(struct index_state *to,
+				 const struct index_state *from,
+				 struct cache_tree *it,
+				 struct strbuf *path)
+{
+	/*
+	 * Optimistically verify the current tree first. Alternatively
+	 * we could verify all the subtrees first then do this
+	 * last. Any invalid subtree would also invalidate its
+	 * ancestors.
+	 */
+	if (it->entry_count != -1 &&
+	    verify_one_cache_tree(to, from, it, path))
+		it->entry_count = -1;
+
+	/*
+	 * If the current tree is valid, don't bother che
Re: [PATCH v2 0/4] Speed up unpack_trees()
On 8/1/2018 12:38 PM, Duy Nguyen wrote:

On Tue, Jul 31, 2018 at 01:31:31PM -0400, Ben Peart wrote:

On 7/31/2018 12:50 PM, Ben Peart wrote:

On 7/31/2018 11:31 AM, Duy Nguyen wrote:

In the performance game of whack-a-mole, that call to repair cache-tree
is now looking quite expensive...

Yeah and I think we can whack that mole too. I did some measurement.
Best case possible, we just need to scan through two indexes (one with
many good cache-tree, one with no cache-tree), compare and copy
cache-tree over. The scanning takes like 1% time of the current repair
step and I suspect it's the hashing that takes most of the time. Of
course the real world won't have such nice numbers, but I guess we could
maybe halve the cache-tree update/repair time.

I have some great profiling tools available so will take a look at this
next and see exactly where the time is being spent.

Good instincts. In cache_tree_update, the heavy hitter is definitely
hash_object_file followed by has_object_file.

Name                          Inc %     Inc
+ git!cache_tree_update        12.4   4,935
|+ git!update_one              11.8   4,706
| + git!update_one             11.8   4,706
|  + git!hash_object_file       6.1   2,406
|  + git!has_object_file        2.0     813
|  + OTHER <>                   0.5     203
|  + git!strbuf_addf            0.4     155
|  + git!strbuf_release         0.4     143
|  + git!strbuf_add             0.3     121
|  + OTHER <>                   0.2      93
|  + git!strbuf_grow            0.1      25

Ben, if you work on this, this could be a good starting point. I will
not work on this because I still have some other things to catch up
and follow through. You can have my sign off if you reuse something
from this patch.

Even if it's a naive implementation, the initial numbers look pretty
good.
Without the patch we have

18:31:05.970621 unpack-trees.c:1437 performance: 0.01029 s: copy
18:31:05.975729 unpack-trees.c:1444 performance: 0.005082004 s: update

And with the patch

18:31:13.295655 unpack-trees.c:1437 performance: 0.000198017 s: copy
18:31:13.296757 unpack-trees.c:1444 performance: 0.001075935 s: update

Time saving is about 80% by the look of this (best possible case
because only the top tree needs to be hashed and written out).

-- 8< --
diff --git a/cache-tree.c b/cache-tree.c
index 6b46711996..67a4a93100 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -440,6 +440,147 @@ int cache_tree_update(struct index_state *istate, int flags)
 	return 0;
 }
 
+static int same(const struct cache_entry *a, const struct cache_entry *b)
+{
+	if (ce_stage(a) || ce_stage(b))
+		return 0;
+	if ((a->ce_flags | b->ce_flags) & CE_CONFLICTED)
+		return 0;
+	return a->ce_mode == b->ce_mode &&
+	       !oidcmp(&a->oid, &b->oid);
+}
+
+static int cache_tree_name_pos(const struct index_state *istate,
+			       const struct strbuf *path)
+{
+	int pos;
+
+	if (!path->len)
+		return 0;
+
+	pos = index_name_pos(istate, path->buf, path->len);
+	if (pos >= 0)
+		BUG("No no no, directory path must not exist in index");
+	return -pos - 1;
+}
+
+/*
+ * Locate the same cache-tree in two separate indexes. Check that the
+ * cache-tree is still valid for the "to" index (i.e. it contains the
+ * same set of entries as in the "from" index).
+ */
+static int verify_one_cache_tree(const struct index_state *to,
+				 const struct index_state *from,
+				 const struct cache_tree *it,
+				 const struct strbuf *path)
+{
+	int i, spos, dpos;
+
+	spos = cache_tree_name_pos(from, path);
+	if (spos + it->entry_count > from->cache_nr)
+		return -1;
+
+	dpos = cache_tree_name_pos(to, path);
+	if (dpos + it->entry_count > to->cache_nr)
+		return -1;
+
+	/* Can we quickly check head and tail and bail out early */
+	if (!same(from->cache[spos], to->cache[dpos]) ||
+	    !same(from->cache[spos + it->entry_count - 1],
+		  to->cache[dpos + it->entry_count - 1]))
+		return -1;
+
+	for (i = 1; i < it->entry_count - 1; i++)
+		if (!same(from->cache[spos + i],
+			  to->cache[dpos + i]))
+			return -1;
+
+	return 0;
+}
+
+static int verify_and_invalidate(struct index_state *to,
+				 const struct index_state *from,
+				 struct cache_tree *it,
+				 struct strbuf *path)
+{
+	/*
+	 * Optimistically verify the current tree first. Alternatively
+	 * we could verify all the subtrees first then do this
+	 * last. Any invalid subtree would also invalidate its
Re: [PATCH v2 0/4] Speed up unpack_trees()
On Tue, Jul 31, 2018 at 01:31:31PM -0400, Ben Peart wrote:
>
> On 7/31/2018 12:50 PM, Ben Peart wrote:
> >
> > On 7/31/2018 11:31 AM, Duy Nguyen wrote:
> > >
> > > > In the performance game of whack-a-mole, that call to repair cache-tree
> > > > is now looking quite expensive...
> > >
> > > Yeah and I think we can whack that mole too. I did some measurement.
> > > Best case possible, we just need to scan through two indexes (one with
> > > many good cache-tree, one with no cache-tree), compare and copy
> > > cache-tree over. The scanning takes like 1% time of the current repair
> > > step and I suspect it's the hashing that takes most of the time. Of
> > > course the real world won't have such nice numbers, but I guess we
> > > could maybe halve the cache-tree update/repair time.
> >
> > I have some great profiling tools available so will take a look at this
> > next and see exactly where the time is being spent.
>
> Good instincts. In cache_tree_update, the heavy hitter is definitely
> hash_object_file followed by has_object_file.
>
> Name                          Inc %     Inc
> + git!cache_tree_update        12.4   4,935
> |+ git!update_one              11.8   4,706
> | + git!update_one             11.8   4,706
> |  + git!hash_object_file       6.1   2,406
> |  + git!has_object_file        2.0     813
> |  + OTHER <>                   0.5     203
> |  + git!strbuf_addf            0.4     155
> |  + git!strbuf_release         0.4     143
> |  + git!strbuf_add             0.3     121
> |  + OTHER <>                   0.2      93
> |  + git!strbuf_grow            0.1      25

Ben, if you work on this, this could be a good starting point. I will
not work on this because I still have some other things to catch up
and follow through. You can have my sign off if you reuse something
from this patch.

Even if it's a naive implementation, the initial numbers look pretty
good.
Without the patch we have

18:31:05.970621 unpack-trees.c:1437 performance: 0.01029 s: copy
18:31:05.975729 unpack-trees.c:1444 performance: 0.005082004 s: update

And with the patch

18:31:13.295655 unpack-trees.c:1437 performance: 0.000198017 s: copy
18:31:13.296757 unpack-trees.c:1444 performance: 0.001075935 s: update

Time saving is about 80% by the look of this (best possible case
because only the top tree needs to be hashed and written out).

-- 8< --
diff --git a/cache-tree.c b/cache-tree.c
index 6b46711996..67a4a93100 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -440,6 +440,147 @@ int cache_tree_update(struct index_state *istate, int flags)
 	return 0;
 }
 
+static int same(const struct cache_entry *a, const struct cache_entry *b)
+{
+	if (ce_stage(a) || ce_stage(b))
+		return 0;
+	if ((a->ce_flags | b->ce_flags) & CE_CONFLICTED)
+		return 0;
+	return a->ce_mode == b->ce_mode &&
+	       !oidcmp(&a->oid, &b->oid);
+}
+
+static int cache_tree_name_pos(const struct index_state *istate,
+			       const struct strbuf *path)
+{
+	int pos;
+
+	if (!path->len)
+		return 0;
+
+	pos = index_name_pos(istate, path->buf, path->len);
+	if (pos >= 0)
+		BUG("No no no, directory path must not exist in index");
+	return -pos - 1;
+}
+
+/*
+ * Locate the same cache-tree in two separate indexes. Check that the
+ * cache-tree is still valid for the "to" index (i.e. it contains the
+ * same set of entries as in the "from" index).
+ */
+static int verify_one_cache_tree(const struct index_state *to,
+				 const struct index_state *from,
+				 const struct cache_tree *it,
+				 const struct strbuf *path)
+{
+	int i, spos, dpos;
+
+	spos = cache_tree_name_pos(from, path);
+	if (spos + it->entry_count > from->cache_nr)
+		return -1;
+
+	dpos = cache_tree_name_pos(to, path);
+	if (dpos + it->entry_count > to->cache_nr)
+		return -1;
+
+	/* Can we quickly check head and tail and bail out early */
+	if (!same(from->cache[spos], to->cache[dpos]) ||
+	    !same(from->cache[spos + it->entry_count - 1],
+		  to->cache[dpos + it->entry_count - 1]))
+		return -1;
+
+	for (i = 1; i < it->entry_count - 1; i++)
+		if (!same(from->cache[spos + i],
+			  to->cache[dpos + i]))
+			return -1;
+
+	return 0;
+}
+
+static int verify_and_invalidate(struct index_state *to,
+				 const struct index_state *from,
+				 struct cache_tree *it,
+				 struct strbuf *path)
+{
+	/*
+	 * Optimistically verify the current tree first. Alternatively
+	 * we could verify all the subtrees first then do this
+	 * last.
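[The head/tail fast path in verify_one_cache_tree() above can be
illustrated standalone. This is a sketch only: plain ints stand in for
the mode + oid comparison that same() does on real cache entries. The
idea is that comparing the first and last entry of the span first lets
most mismatching spans bail out without scanning the middle.]

```c
/*
 * Compare two spans of n entries. Check head and tail first (cheap),
 * and only scan the middle when both ends already agree.
 */
static int same_span(const int *from, const int *to, int n)
{
	/* quick check: head and tail first, bail out early on mismatch */
	if (from[0] != to[0] || from[n - 1] != to[n - 1])
		return 0;

	/* only then pay for the full scan of the middle entries */
	for (int i = 1; i < n - 1; i++)
		if (from[i] != to[i])
			return 0;
	return 1;
}
```

Since index entries are sorted, a tree whose span drifted between the
two indexes usually differs at one of the boundaries, so the two-entry
check catches most invalid spans at constant cost.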
Re: [PATCH v2 0/4] Speed up unpack_trees()
On 7/31/2018 12:50 PM, Ben Peart wrote:

On 7/31/2018 11:31 AM, Duy Nguyen wrote:

In the performance game of whack-a-mole, that call to repair cache-tree
is now looking quite expensive...

Yeah and I think we can whack that mole too. I did some measurement.
Best case possible, we just need to scan through two indexes (one with
many good cache-tree, one with no cache-tree), compare and copy
cache-tree over. The scanning takes like 1% time of the current repair
step and I suspect it's the hashing that takes most of the time. Of
course the real world won't have such nice numbers, but I guess we could
maybe halve the cache-tree update/repair time.

I have some great profiling tools available so will take a look at this
next and see exactly where the time is being spent.

Good instincts. In cache_tree_update, the heavy hitter is definitely
hash_object_file followed by has_object_file.

Name                          Inc %     Inc
+ git!cache_tree_update        12.4   4,935
|+ git!update_one              11.8   4,706
| + git!update_one             11.8   4,706
|  + git!hash_object_file       6.1   2,406
|  + git!has_object_file        2.0     813
|  + OTHER <>                   0.5     203
|  + git!strbuf_addf            0.4     155
|  + git!strbuf_release         0.4     143
|  + git!strbuf_add             0.3     121
|  + OTHER <>                   0.2      93
|  + git!strbuf_grow            0.1      25
Re: [PATCH v2 0/4] Speed up unpack_trees()
On 7/31/2018 11:31 AM, Duy Nguyen wrote:

On Mon, Jul 30, 2018 at 8:10 PM Ben Peart wrote:

I ran "git checkout" on a large repo and averaged the results of 3 runs.
This clearly demonstrates the benefit of the optimized unpack_trees()
as even the final "diff-index" is essentially a 3rd call to unpack_trees().

    baseline        new
------------------------------
 0.535510167   0.556558733   s: read cache .git/index
 0.3057373     0.3147105     s: initialize name hash
 0.0184082     0.023558433   s: preload index
 0.086910967   0.089085967   s: refresh index
 7.889590767   2.191554433   s: unpack trees
 0.120760833   0.131941267   s: update worktree after a merge
 2.2583504     2.572663167   s: repair cache-tree
 0.8916137     0.959495233   s: write index, changed mask = 28
 3.405199233   0.2710663     s: unpack trees
 0.000999667   0.0021554     s: update worktree after a merge
 3.4063306     0.273318333   s: diff-index
16.9524923     9.462943133   s: git command:
                                'c:\git-sdk-64\usr\src\git\git.exe' checkout

The first call to unpack_trees() saves 72%
The 2nd and 3rd call save 92%

By the 3rd I guess you meant the "diff-index" line. I think it's the
same as the second call. diff-index triggers the second unpack-trees,
but there's no indentation here, and it's misleading to read this as
diff-index and unpack-trees executing one after the other.

Total time savings for the entire command was 44%

Wow.. I guess you have more trees since I could only save 30% on gcc.git.

Yes, with over 500K trees, this optimization really pays off for us. I
can't wait to see how this works out in the wild (vs my "lab" based
performance testing). Thank you! I definitely owe you lunch. :)

In the performance game of whack-a-mole, that call to repair cache-tree
is now looking quite expensive...

Yeah and I think we can whack that mole too. I did some measurement.
Best case possible, we just need to scan through two indexes (one with
many good cache-tree, one with no cache-tree), compare and copy
cache-tree over.
The scanning takes like 1% time of the current repair step and I suspect
it's the hashing that takes most of the time. Of course the real world
won't have such nice numbers, but I guess we could maybe halve the
cache-tree update/repair time.

I have some great profiling tools available so will take a look at this
next and see exactly where the time is being spent.
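[For illustration, here is a hypothetical, trimmed-down version of the
per-entry check such a two-index scan needs; the names and types are
simplified stand-ins, not git's real structures. The point is that the
scan compares a mode and memcmp's an object id per entry -- no object
hashing at all -- which is why it is so much cheaper than the rehash
that cache_tree_update() performs.]

```c
#include <string.h>

/* Simplified stand-in for git's cache_entry (SHA-1-sized oid, 2018-era). */
struct mini_entry {
	unsigned int mode;       /* file mode, e.g. 0100644 */
	unsigned char oid[20];   /* object id of the blob */
	int stage;               /* nonzero during an unmerged state */
	int conflicted;          /* nonzero when marked conflicted */
};

/*
 * Two entries match for cache-tree reuse purposes when neither is
 * staged or conflicted and both mode and object id agree.
 */
static int entries_same(const struct mini_entry *a, const struct mini_entry *b)
{
	if (a->stage || b->stage)
		return 0;
	if (a->conflicted || b->conflicted)
		return 0;
	return a->mode == b->mode && !memcmp(a->oid, b->oid, 20);
}
```

A full scan is then just this check over every entry covered by a
cache-tree node, which is the "1% of repair time" cost measured above.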
Re: [PATCH v2 0/4] Speed up unpack_trees()
On Mon, Jul 30, 2018 at 8:10 PM Ben Peart wrote:
> I ran "git checkout" on a large repo and averaged the results of 3 runs.
> This clearly demonstrates the benefit of the optimized unpack_trees()
> as even the final "diff-index" is essentially a 3rd call to unpack_trees().
>
>     baseline        new
> ------------------------------
>  0.535510167   0.556558733   s: read cache .git/index
>  0.3057373     0.3147105     s: initialize name hash
>  0.0184082     0.023558433   s: preload index
>  0.086910967   0.089085967   s: refresh index
>  7.889590767   2.191554433   s: unpack trees
>  0.120760833   0.131941267   s: update worktree after a merge
>  2.2583504     2.572663167   s: repair cache-tree
>  0.8916137     0.959495233   s: write index, changed mask = 28
>  3.405199233   0.2710663     s: unpack trees
>  0.000999667   0.0021554     s: update worktree after a merge
>  3.4063306     0.273318333   s: diff-index
> 16.9524923     9.462943133   s: git command:
>                                 'c:\git-sdk-64\usr\src\git\git.exe' checkout
>
> The first call to unpack_trees() saves 72%
> The 2nd and 3rd call save 92%

By the 3rd I guess you meant the "diff-index" line. I think it's the
same as the second call. diff-index triggers the second unpack-trees,
but there's no indentation here, and it's misleading to read this as
diff-index and unpack-trees executing one after the other.

> Total time savings for the entire command was 44%

Wow.. I guess you have more trees since I could only save 30% on gcc.git.

> In the performance game of whack-a-mole, that call to repair cache-tree
> is now looking quite expensive...

Yeah and I think we can whack that mole too. I did some measurement.
Best case possible, we just need to scan through two indexes (one with
many good cache-tree, one with no cache-tree), compare and copy
cache-tree over. The scanning takes like 1% time of the current repair
step and I suspect it's the hashing that takes most of the time. Of
course the real world won't have such nice numbers, but I guess we could
maybe halve the cache-tree update/repair time.
--
Duy
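[As a sanity check on the percentages quoted above, they can be
recomputed directly from the timing table. The helper below is a
throwaway sketch, not part of any patch in this thread.]

```c
/*
 * Percentage of time saved going from baseline to new, rounded to the
 * nearest whole percent.
 */
static int pct_saved(double baseline, double new_time)
{
	return (int)(100.0 * (baseline - new_time) / baseline + 0.5);
}
```

Feeding in the numbers from the table: the first unpack_trees call
(7.8896 -> 2.1916 s) saves 72%, diff-index (3.4063 -> 0.2733 s) saves
92%, and the whole command (16.9525 -> 9.4629 s) saves 44%, matching the
figures quoted in the mail.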
Re: [PATCH v2 0/4] Speed up unpack_trees()
On 7/29/2018 6:33 AM, Nguyễn Thái Ngọc Duy wrote:

This series speeds up unpack_trees() a bit by using cache-tree.
unpack-trees could be split into three big parts

- the actual tree unpacking and running n-way merging
- updating the worktree, which could be expensive depending on how much
  I/O is involved
- repairing cache-tree

This series focuses on the first part alone and could give 700% speedup
(best possible scenario; real-life ones are probably not that
impressive). It also shows that repairing the cache-tree is kinda
expensive. I have an idea of reusing the cache-tree from the original
index, but I'll leave that to Ben or others to try out and see if it
helps at all.

v2 fixes the comments from Junio, adds more performance tracing and
reduces the cost of adding index entries.

Nguyễn Thái Ngọc Duy (4):
  unpack-trees.c: add performance tracing
  unpack-trees: optimize walking same trees with cache-tree
  unpack-trees: reduce malloc in cache-tree walk
  unpack-trees: cheaper index update when walking by cache-tree

 cache-tree.c   |   2 +
 cache.h        |   1 +
 read-cache.c   |   3 +-
 unpack-trees.c | 161 +++++++++++++++++++++++++++++++++++++++++++++-
 unpack-trees.h |   1 +
 5 files changed, 166 insertions(+), 2 deletions(-)

I have a limited understanding of this code path so I'm not the best
person to review this, but I didn't see any issues that concerned me. I
also was able to run our internal functional and performance tests in
addition to the git tests, and the results were positive.

Ben
Re: [PATCH v2 0/4] Speed up unpack_trees()
On 7/29/2018 6:33 AM, Nguyễn Thái Ngọc Duy wrote:

This series speeds up unpack_trees() a bit by using cache-tree.
unpack-trees could be split into three big parts

- the actual tree unpacking and running n-way merging
- updating the worktree, which could be expensive depending on how much
  I/O is involved
- repairing cache-tree

This series focuses on the first part alone and could give 700% speedup
(best possible scenario; real-life ones are probably not that
impressive). It also shows that repairing the cache-tree is kinda
expensive. I have an idea of reusing the cache-tree from the original
index, but I'll leave that to Ben or others to try out and see if it
helps at all.

v2 fixes the comments from Junio, adds more performance tracing and
reduces the cost of adding index entries.

Nguyễn Thái Ngọc Duy (4):
  unpack-trees.c: add performance tracing
  unpack-trees: optimize walking same trees with cache-tree
  unpack-trees: reduce malloc in cache-tree walk
  unpack-trees: cheaper index update when walking by cache-tree

 cache-tree.c   |   2 +
 cache.h        |   1 +
 read-cache.c   |   3 +-
 unpack-trees.c | 161 +++++++++++++++++++++++++++++++++++++++++++++-
 unpack-trees.h |   1 +
 5 files changed, 166 insertions(+), 2 deletions(-)

I ran "git checkout" on a large repo and averaged the results of 3 runs.
This clearly demonstrates the benefit of the optimized unpack_trees()
as even the final "diff-index" is essentially a 3rd call to unpack_trees().
    baseline        new
------------------------------
 0.535510167   0.556558733   s: read cache .git/index
 0.3057373     0.3147105     s: initialize name hash
 0.0184082     0.023558433   s: preload index
 0.086910967   0.089085967   s: refresh index
 7.889590767   2.191554433   s: unpack trees
 0.120760833   0.131941267   s: update worktree after a merge
 2.2583504     2.572663167   s: repair cache-tree
 0.8916137     0.959495233   s: write index, changed mask = 28
 3.405199233   0.2710663     s: unpack trees
 0.000999667   0.0021554     s: update worktree after a merge
 3.4063306     0.273318333   s: diff-index
16.9524923     9.462943133   s: git command:
                                'c:\git-sdk-64\usr\src\git\git.exe' checkout

The first call to unpack_trees() saves 72%
The 2nd and 3rd call save 92%
Total time savings for the entire command was 44%

In the performance game of whack-a-mole, that call to repair cache-tree
is now looking quite expensive...

Ben
[PATCH v2 0/4] Speed up unpack_trees()
This series speeds up unpack_trees() a bit by using cache-tree.
unpack-trees could be split into three big parts

- the actual tree unpacking and running n-way merging
- updating the worktree, which could be expensive depending on how much
  I/O is involved
- repairing cache-tree

This series focuses on the first part alone and could give 700% speedup
(best possible scenario; real-life ones are probably not that
impressive). It also shows that repairing the cache-tree is kinda
expensive. I have an idea of reusing the cache-tree from the original
index, but I'll leave that to Ben or others to try out and see if it
helps at all.

v2 fixes the comments from Junio, adds more performance tracing and
reduces the cost of adding index entries.

Nguyễn Thái Ngọc Duy (4):
  unpack-trees.c: add performance tracing
  unpack-trees: optimize walking same trees with cache-tree
  unpack-trees: reduce malloc in cache-tree walk
  unpack-trees: cheaper index update when walking by cache-tree

 cache-tree.c   |   2 +
 cache.h        |   1 +
 read-cache.c   |   3 +-
 unpack-trees.c | 161 +++++++++++++++++++++++++++++++++++++++++++++-
 unpack-trees.h |   1 +
 5 files changed, 166 insertions(+), 2 deletions(-)

--
2.18.0.656.gda699b98b3