Re: [PATCH v2 0/4] Speed up unpack_trees()

2018-08-10 Thread Duy Nguyen
On Thu, Aug 9, 2018 at 10:16 AM Ben Peart  wrote:
> In fact, in the other [1] patch series, we're detecting the number of
> cache entries that are the same as the cache tree and using that to
> traverse_by_cache_tree().  At that point, couldn't we copy the
> corresponding cache tree entries over to the destination so that those
> don't have to get recreated in the later call to cache_tree_update()?

We could. But as I stated in another mail, I saw this cache-tree
optimization as a separate problem and didn't want to mix them up.
That way cache_tree_copy() could be used elsewhere if the opportunity
shows up.

Mixing them up could also complicate the problem. When you merge stuff,
you add new cache-entries to o->result with add_index_entry(), which
tries to invalidate those paths in o->result's cache-tree. Right now
the cache-tree is empty so that's really a no-op. But if you copy the
cache-tree over while merging, that invalidation might either
invalidate your newly copied cache-tree, or get slowed down, because a
non-empty o->result cache-tree means you start to need to walk it to
find whether there's any path to invalidate.

PS. This code keeps messing me up. invalidate_ce_path() may also
invalidate the cache-tree in the _source_ index. For this optimization to
really shine, you had better keep the original cache-tree intact (so
that you can reuse as much of it as possible).

I don't see the purpose of this source cache-tree invalidation at all.
My guess at this point is that Linus actually made a mistake in 34110cd4e3
(Make 'unpack_trees()' have a separate source and destination index -
2008-03-06) and he should have invalidated the _destination_ index instead
of the source one. I'm going to dig in some more and will probably
send a patch to remove this invalidation.
-- 
Duy


Re: [PATCH v2 0/4] Speed up unpack_trees()

2018-08-10 Thread Duy Nguyen
On Wed, Aug 8, 2018 at 10:53 PM Ben Peart  wrote:
>
>
>
> On 8/1/2018 12:38 PM, Duy Nguyen wrote:
> > On Tue, Jul 31, 2018 at 01:31:31PM -0400, Ben Peart wrote:
> >>
> >>
> >> On 7/31/2018 12:50 PM, Ben Peart wrote:
> >>>
> >>>
> >>> On 7/31/2018 11:31 AM, Duy Nguyen wrote:
> >>
> 
> > In the performance game of whack-a-mole, that call to repair cache-tree
> > is now looking quite expensive...
> 
>  Yeah and I think we can whack that mole too. I did some measurement.
>  Best case possible, we just need to scan through two indexes (one with
>  many good cache-trees, one with no cache-tree), compare and copy the
>  cache-tree over. The scanning takes about 1% of the time of the current
>  repair step, and I suspect it's the hashing that takes most of the time.
>  Of course the real world won't have such nice numbers, but I guess we
>  could maybe halve cache-tree update/repair time.
> 
> >>>
> >>> I have some great profiling tools available so will take a look at this
> >>> next and see exactly where the time is being spent.
> >>
> >> Good instincts.  In cache_tree_update, the heavy hitter is definitely
> >> hash_object_file followed by has_object_file.
> >>
> >> Name                       Inc %    Inc
> >> + git!cache_tree_update     12.4  4,935
> >> |+ git!update_one           11.8  4,706
> >> | + git!update_one          11.8  4,706
> >> |  + git!hash_object_file    6.1  2,406
> >> |  + git!has_object_file     2.0    813
> >> |  + OTHER <>                0.5    203
> >> |  + git!strbuf_addf         0.4    155
> >> |  + git!strbuf_release      0.4    143
> >> |  + git!strbuf_add          0.3    121
> >> |  + OTHER <>                0.2     93
> >> |  + git!strbuf_grow         0.1     25
> >
> > Ben, if you work on this, this could be a good starting point. I will
> > not work on this because I still have some other things to catch up on
> > and follow through. You can have my sign-off if you reuse something
> > from this patch.
> >
> > Even if it's a naive implementation, the initial numbers look pretty
> > good. Without the patch we have
> >
> > 18:31:05.970621 unpack-trees.c:1437 performance: 0.01029 s: copy
> > 18:31:05.975729 unpack-trees.c:1444 performance: 0.005082004 s: update
> >
> > And with the patch
> >
> > 18:31:13.295655 unpack-trees.c:1437 performance: 0.000198017 s: copy
> > 18:31:13.296757 unpack-trees.c:1444 performance: 0.001075935 s: update
> >
> > Time saving is about 80% by the look of this (best possible case
> > because only the top tree needs to be hashed and written out).
> >
> > -- 8< --
> > diff --git a/cache-tree.c b/cache-tree.c
> > index 6b46711996..67a4a93100 100644
> > --- a/cache-tree.c
> > +++ b/cache-tree.c
> > @@ -440,6 +440,147 @@ int cache_tree_update(struct index_state *istate, int 
> > flags)
> >   return 0;
> >   }
> >
> > +static int same(const struct cache_entry *a, const struct cache_entry *b)
> > +{
> > + if (ce_stage(a) || ce_stage(b))
> > + return 0;
> > + if ((a->ce_flags | b->ce_flags) & CE_CONFLICTED)
> > + return 0;
> > + return a->ce_mode == b->ce_mode &&
> > +!oidcmp(&a->oid, &b->oid);
> > +}
> > +
> > +static int cache_tree_name_pos(const struct index_state *istate,
> > +const struct strbuf *path)
> > +{
> > + int pos;
> > +
> > + if (!path->len)
> > + return 0;
> > +
> > + pos = index_name_pos(istate, path->buf, path->len);
> > + if (pos >= 0)
> > + BUG("No no no, directory path must not exist in index");
> > + return -pos - 1;
> > +}
> > +
> > +/*
> > + * Locate the same cache-tree in two separate indexes. Check that the
> > + * cache-tree is still valid for the "to" index (i.e. it contains the
> > + * same set of entries as the "from" index).
> > + */
> > +static int verify_one_cache_tree(const struct index_state *to,
> > +  const struct index_state *from,
> > +  const struct cache_tree *it,
> > +  const struct strbuf *path)
> > +{
> > + int i, spos, dpos;
> > +
> > + spos = cache_tree_name_pos(from, path);
> > + if (spos + it->entry_count > from->cache_nr)
> > + return -1;
> > +
> > + dpos = cache_tree_name_pos(to, path);
> > + if (dpos + it->entry_count > to->cache_nr)
> > + return -1;
> > +
> > + /* Can we quickly check head and tail and bail out early? */
> > + if (!same(from->cache[spos], to->cache[dpos]) ||
> > + !same(from->cache[spos + it->entry_count - 1],
> > +   to->cache[dpos + it->entry_count - 1]))
> > + return -1;
> > +
> > + for (i = 1; i < it->entry_count - 1; i++)
> > + if (!same(from->cache[spos + i],
> > +   to->cache[dpos + i]))
> > +   

Re: [PATCH v2 0/4] Speed up unpack_trees()

2018-08-09 Thread Ben Peart




On 8/8/2018 4:53 PM, Ben Peart wrote:



On 8/1/2018 12:38 PM, Duy Nguyen wrote:

On Tue, Jul 31, 2018 at 01:31:31PM -0400, Ben Peart wrote:



On 7/31/2018 12:50 PM, Ben Peart wrote:



On 7/31/2018 11:31 AM, Duy Nguyen wrote:




In the performance game of whack-a-mole, that call to repair 
cache-tree

is now looking quite expensive...


Yeah and I think we can whack that mole too. I did some measurement.
Best case possible, we just need to scan through two indexes (one with
many good cache-trees, one with no cache-tree), compare and copy the
cache-tree over. The scanning takes about 1% of the time of the current
repair step, and I suspect it's the hashing that takes most of the time.
Of course the real world won't have such nice numbers, but I guess we
could maybe halve cache-tree update/repair time.



I have some great profiling tools available so will take a look at this
next and see exactly where the time is being spent.


Good instincts.  In cache_tree_update, the heavy hitter is definitely
hash_object_file followed by has_object_file.

Name                       Inc %    Inc
+ git!cache_tree_update     12.4  4,935
|+ git!update_one           11.8  4,706
| + git!update_one          11.8  4,706
|  + git!hash_object_file    6.1  2,406
|  + git!has_object_file     2.0    813
|  + OTHER <>                0.5    203
|  + git!strbuf_addf         0.4    155
|  + git!strbuf_release      0.4    143
|  + git!strbuf_add          0.3    121
|  + OTHER <>                0.2     93
|  + git!strbuf_grow         0.1     25


Ben, if you work on this, this could be a good starting point. I will
not work on this because I still have some other things to catch up on
and follow through. You can have my sign-off if you reuse something
from this patch.

Even if it's a naive implementation, the initial numbers look pretty
good. Without the patch we have

18:31:05.970621 unpack-trees.c:1437 performance: 0.01029 s: copy
18:31:05.975729 unpack-trees.c:1444 performance: 0.005082004 s: 
update


And with the patch

18:31:13.295655 unpack-trees.c:1437 performance: 0.000198017 s: copy
18:31:13.296757 unpack-trees.c:1444 performance: 0.001075935 s: 
update


Time saving is about 80% by the look of this (best possible case
because only the top tree needs to be hashed and written out).

-- 8< --
diff --git a/cache-tree.c b/cache-tree.c
index 6b46711996..67a4a93100 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -440,6 +440,147 @@ int cache_tree_update(struct index_state 
*istate, int flags)

  return 0;
  }
+static int same(const struct cache_entry *a, const struct cache_entry 
*b)

+{
+    if (ce_stage(a) || ce_stage(b))
+    return 0;
+    if ((a->ce_flags | b->ce_flags) & CE_CONFLICTED)
+    return 0;
+    return a->ce_mode == b->ce_mode &&
+   !oidcmp(&a->oid, &b->oid);
+}
+
+static int cache_tree_name_pos(const struct index_state *istate,
+   const struct strbuf *path)
+{
+    int pos;
+
+    if (!path->len)
+    return 0;
+
+    pos = index_name_pos(istate, path->buf, path->len);
+    if (pos >= 0)
+    BUG("No no no, directory path must not exist in index");
+    return -pos - 1;
+}
+
+/*
+ * Locate the same cache-tree in two separate indexes. Check that the
+ * cache-tree is still valid for the "to" index (i.e. it contains the
+ * same set of entries as the "from" index).
+ */
+static int verify_one_cache_tree(const struct index_state *to,
+ const struct index_state *from,
+ const struct cache_tree *it,
+ const struct strbuf *path)
+{
+    int i, spos, dpos;
+
+    spos = cache_tree_name_pos(from, path);
+    if (spos + it->entry_count > from->cache_nr)
+    return -1;
+
+    dpos = cache_tree_name_pos(to, path);
+    if (dpos + it->entry_count > to->cache_nr)
+    return -1;
+
+    /* Can we quickly check head and tail and bail out early? */
+    if (!same(from->cache[spos], to->cache[dpos]) ||
+    !same(from->cache[spos + it->entry_count - 1],
+  to->cache[dpos + it->entry_count - 1]))
+    return -1;
+
+    for (i = 1; i < it->entry_count - 1; i++)
+    if (!same(from->cache[spos + i],
+  to->cache[dpos + i]))
+    return -1;
+
+    return 0;
+}
+
+static int verify_and_invalidate(struct index_state *to,
+ const struct index_state *from,
+ struct cache_tree *it,
+ struct strbuf *path)
+{
+    /*
+ * Optimistically verify the current tree first. Alternatively
+ * we could verify all the subtrees first and then do this
+ * last. Any invalid subtree would also invalidate its
+ * ancestors.
+ */
+    if (it->entry_count != -1 &&
+    verify_one_cache_tree(to, from, it, path))
+    it->entry_count = -1;
+
+    /*
+ * If the current tree is valid, don't bother che

Re: [PATCH v2 0/4] Speed up unpack_trees()

2018-08-08 Thread Ben Peart




On 8/1/2018 12:38 PM, Duy Nguyen wrote:

On Tue, Jul 31, 2018 at 01:31:31PM -0400, Ben Peart wrote:



On 7/31/2018 12:50 PM, Ben Peart wrote:



On 7/31/2018 11:31 AM, Duy Nguyen wrote:





In the performance game of whack-a-mole, that call to repair cache-tree
is now looking quite expensive...


Yeah and I think we can whack that mole too. I did some measurement.
Best case possible, we just need to scan through two indexes (one with
many good cache-trees, one with no cache-tree), compare and copy the
cache-tree over. The scanning takes about 1% of the time of the current
repair step, and I suspect it's the hashing that takes most of the time.
Of course the real world won't have such nice numbers, but I guess we
could maybe halve cache-tree update/repair time.



I have some great profiling tools available so will take a look at this
next and see exactly where the time is being spent.


Good instincts.  In cache_tree_update, the heavy hitter is definitely
hash_object_file followed by has_object_file.

Name                       Inc %    Inc
+ git!cache_tree_update     12.4  4,935
|+ git!update_one           11.8  4,706
| + git!update_one          11.8  4,706
|  + git!hash_object_file    6.1  2,406
|  + git!has_object_file     2.0    813
|  + OTHER <>                0.5    203
|  + git!strbuf_addf         0.4    155
|  + git!strbuf_release      0.4    143
|  + git!strbuf_add          0.3    121
|  + OTHER <>                0.2     93
|  + git!strbuf_grow         0.1     25


Ben, if you work on this, this could be a good starting point. I will
not work on this because I still have some other things to catch up on
and follow through. You can have my sign-off if you reuse something
from this patch.

Even if it's a naive implementation, the initial numbers look pretty
good. Without the patch we have

18:31:05.970621 unpack-trees.c:1437 performance: 0.01029 s: copy
18:31:05.975729 unpack-trees.c:1444 performance: 0.005082004 s: update

And with the patch

18:31:13.295655 unpack-trees.c:1437 performance: 0.000198017 s: copy
18:31:13.296757 unpack-trees.c:1444 performance: 0.001075935 s: update

Time saving is about 80% by the look of this (best possible case
because only the top tree needs to be hashed and written out).

-- 8< --
diff --git a/cache-tree.c b/cache-tree.c
index 6b46711996..67a4a93100 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -440,6 +440,147 @@ int cache_tree_update(struct index_state *istate, int 
flags)
return 0;
  }
  
+static int same(const struct cache_entry *a, const struct cache_entry *b)

+{
+   if (ce_stage(a) || ce_stage(b))
+   return 0;
+   if ((a->ce_flags | b->ce_flags) & CE_CONFLICTED)
+   return 0;
+   return a->ce_mode == b->ce_mode &&
+  !oidcmp(&a->oid, &b->oid);
+}
+
+static int cache_tree_name_pos(const struct index_state *istate,
+  const struct strbuf *path)
+{
+   int pos;
+
+   if (!path->len)
+   return 0;
+
+   pos = index_name_pos(istate, path->buf, path->len);
+   if (pos >= 0)
+   BUG("No no no, directory path must not exist in index");
+   return -pos - 1;
+}
+
+/*
+ * Locate the same cache-tree in two separate indexes. Check that the
+ * cache-tree is still valid for the "to" index (i.e. it contains the
+ * same set of entries as the "from" index).
+ */
+static int verify_one_cache_tree(const struct index_state *to,
+const struct index_state *from,
+const struct cache_tree *it,
+const struct strbuf *path)
+{
+   int i, spos, dpos;
+
+   spos = cache_tree_name_pos(from, path);
+   if (spos + it->entry_count > from->cache_nr)
+   return -1;
+
+   dpos = cache_tree_name_pos(to, path);
+   if (dpos + it->entry_count > to->cache_nr)
+   return -1;
+
+   /* Can we quickly check head and tail and bail out early? */
+   if (!same(from->cache[spos], to->cache[dpos]) ||
+   !same(from->cache[spos + it->entry_count - 1],
+ to->cache[dpos + it->entry_count - 1]))
+   return -1;
+
+   for (i = 1; i < it->entry_count - 1; i++)
+   if (!same(from->cache[spos + i],
+ to->cache[dpos + i]))
+   return -1;
+
+   return 0;
+}
+
+static int verify_and_invalidate(struct index_state *to,
+const struct index_state *from,
+struct cache_tree *it,
+struct strbuf *path)
+{
+   /*
+* Optimistically verify the current tree first. Alternatively
+* we could verify all the subtrees first then do this
+* last. Any invalid subtree would also invalidate its
+

Re: [PATCH v2 0/4] Speed up unpack_trees()

2018-08-01 Thread Duy Nguyen
On Tue, Jul 31, 2018 at 01:31:31PM -0400, Ben Peart wrote:
> 
> 
> On 7/31/2018 12:50 PM, Ben Peart wrote:
> > 
> > 
> > On 7/31/2018 11:31 AM, Duy Nguyen wrote:
> 
> >>
> >>> In the performance game of whack-a-mole, that call to repair cache-tree
> >>> is now looking quite expensive...
> >>
> >> Yeah and I think we can whack that mole too. I did some measurement.
> >> Best case possible, we just need to scan through two indexes (one with
> >> many good cache-trees, one with no cache-tree), compare and copy the
> >> cache-tree over. The scanning takes about 1% of the time of the current
> >> repair step, and I suspect it's the hashing that takes most of the time.
> >> Of course the real world won't have such nice numbers, but I guess we
> >> could maybe halve cache-tree update/repair time.
> >>
> > 
> > I have some great profiling tools available so will take a look at this 
> > next and see exactly where the time is being spent.
> 
> Good instincts.  In cache_tree_update, the heavy hitter is definitely 
> hash_object_file followed by has_object_file.
> 
> Name                       Inc %    Inc
> + git!cache_tree_update     12.4  4,935
> |+ git!update_one           11.8  4,706
> | + git!update_one          11.8  4,706
> |  + git!hash_object_file    6.1  2,406
> |  + git!has_object_file     2.0    813
> |  + OTHER <>                0.5    203
> |  + git!strbuf_addf         0.4    155
> |  + git!strbuf_release      0.4    143
> |  + git!strbuf_add          0.3    121
> |  + OTHER <>                0.2     93
> |  + git!strbuf_grow         0.1     25

Ben, if you work on this, this could be a good starting point. I will
not work on this because I still have some other things to catch up on
and follow through. You can have my sign-off if you reuse something
from this patch.

Even if it's a naive implementation, the initial numbers look pretty
good. Without the patch we have

18:31:05.970621 unpack-trees.c:1437 performance: 0.01029 s: copy
18:31:05.975729 unpack-trees.c:1444 performance: 0.005082004 s: update

And with the patch

18:31:13.295655 unpack-trees.c:1437 performance: 0.000198017 s: copy
18:31:13.296757 unpack-trees.c:1444 performance: 0.001075935 s: update

Time saving is about 80% by the look of this (best possible case
because only the top tree needs to be hashed and written out).

-- 8< --
diff --git a/cache-tree.c b/cache-tree.c
index 6b46711996..67a4a93100 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -440,6 +440,147 @@ int cache_tree_update(struct index_state *istate, int 
flags)
return 0;
 }
 
+static int same(const struct cache_entry *a, const struct cache_entry *b)
+{
+   if (ce_stage(a) || ce_stage(b))
+   return 0;
+   if ((a->ce_flags | b->ce_flags) & CE_CONFLICTED)
+   return 0;
+   return a->ce_mode == b->ce_mode &&
+  !oidcmp(&a->oid, &b->oid);
+}
+
+static int cache_tree_name_pos(const struct index_state *istate,
+  const struct strbuf *path)
+{
+   int pos;
+
+   if (!path->len)
+   return 0;
+
+   pos = index_name_pos(istate, path->buf, path->len);
+   if (pos >= 0)
+   BUG("No no no, directory path must not exist in index");
+   return -pos - 1;
+}
+
+/*
+ * Locate the same cache-tree in two separate indexes. Check that the
+ * cache-tree is still valid for the "to" index (i.e. it contains the
+ * same set of entries as the "from" index).
+ */
+static int verify_one_cache_tree(const struct index_state *to,
+const struct index_state *from,
+const struct cache_tree *it,
+const struct strbuf *path)
+{
+   int i, spos, dpos;
+
+   spos = cache_tree_name_pos(from, path);
+   if (spos + it->entry_count > from->cache_nr)
+   return -1;
+
+   dpos = cache_tree_name_pos(to, path);
+   if (dpos + it->entry_count > to->cache_nr)
+   return -1;
+
+   /* Can we quickly check head and tail and bail out early? */
+   if (!same(from->cache[spos], to->cache[dpos]) ||
+   !same(from->cache[spos + it->entry_count - 1],
+ to->cache[dpos + it->entry_count - 1]))
+   return -1;
+
+   for (i = 1; i < it->entry_count - 1; i++)
+   if (!same(from->cache[spos + i],
+ to->cache[dpos + i]))
+   return -1;
+
+   return 0;
+}
+
+static int verify_and_invalidate(struct index_state *to,
+const struct index_state *from,
+struct cache_tree *it,
+struct strbuf *path)
+{
+   /*
+* Optimistically verify the current tree first. Alternatively
+* we could verify all the subtrees first then do this
+* last. 

Re: [PATCH v2 0/4] Speed up unpack_trees()

2018-07-31 Thread Ben Peart




On 7/31/2018 12:50 PM, Ben Peart wrote:



On 7/31/2018 11:31 AM, Duy Nguyen wrote:





In the performance game of whack-a-mole, that call to repair cache-tree
is now looking quite expensive...


Yeah and I think we can whack that mole too. I did some measurement.
Best case possible, we just need to scan through two indexes (one with
many good cache-trees, one with no cache-tree), compare and copy the
cache-tree over. The scanning takes about 1% of the time of the current
repair step, and I suspect it's the hashing that takes most of the time.
Of course the real world won't have such nice numbers, but I guess we
could maybe halve cache-tree update/repair time.



I have some great profiling tools available so will take a look at this 
next and see exactly where the time is being spent.


Good instincts.  In cache_tree_update, the heavy hitter is definitely 
hash_object_file followed by has_object_file.


Name                       Inc %    Inc
+ git!cache_tree_update     12.4  4,935
|+ git!update_one           11.8  4,706
| + git!update_one          11.8  4,706
|  + git!hash_object_file    6.1  2,406
|  + git!has_object_file     2.0    813
|  + OTHER <>                0.5    203
|  + git!strbuf_addf         0.4    155
|  + git!strbuf_release      0.4    143
|  + git!strbuf_add          0.3    121
|  + OTHER <>                0.2     93
|  + git!strbuf_grow         0.1     25


Re: [PATCH v2 0/4] Speed up unpack_trees()

2018-07-31 Thread Ben Peart




On 7/31/2018 11:31 AM, Duy Nguyen wrote:

On Mon, Jul 30, 2018 at 8:10 PM Ben Peart  wrote:

I ran "git checkout" on a large repo and averaged the results of 3 runs.
   This clearly demonstrates the benefit of the optimized unpack_trees()
as even the final "diff-index" is essentially a 3rd call to unpack_trees().

baseline        new
---------------------------
0.535510167 0.556558733 s: read cache .git/index
0.3057373   0.3147105   s: initialize name hash
0.0184082   0.023558433 s: preload index
0.086910967 0.089085967 s: refresh index
7.889590767 2.191554433 s: unpack trees
0.120760833 0.131941267 s: update worktree after a merge
2.2583504   2.572663167 s: repair cache-tree
0.8916137   0.959495233 s: write index, changed mask = 28
3.405199233 0.2710663   s: unpack trees
0.000999667 0.0021554   s: update worktree after a merge
3.4063306   0.273318333 s: diff-index
16.9524923  9.462943133 s: git command:
'c:\git-sdk-64\usr\src\git\git.exe' checkout

The first call to unpack_trees() saves 72%
The 2nd and 3rd calls save 92%


By the 3rd I guess you meant the "diff-index" line. I think it's the same
as the second call. diff-index triggers the second unpack-trees, but
there's no indent here and it's misleading to read this as if diff-index
and unpack-trees execute one after the other.


Total time savings for the entire command was 44%


Wow.. I guess you have more trees since I could only save 30% on gcc.git.


Yes, with over 500K trees, this optimization really pays off for us.  I 
can't wait to see how this works out in the wild (vs my "lab" based 
performance testing).


Thank you!  I definitely owe you lunch. :)




In the performance game of whack-a-mole, that call to repair cache-tree
is now looking quite expensive...


Yeah and I think we can whack that mole too. I did some measurement.
Best case possible, we just need to scan through two indexes (one with
many good cache-trees, one with no cache-tree), compare and copy the
cache-tree over. The scanning takes about 1% of the time of the current
repair step, and I suspect it's the hashing that takes most of the time.
Of course the real world won't have such nice numbers, but I guess we
could maybe halve cache-tree update/repair time.



I have some great profiling tools available so will take a look at this 
next and see exactly where the time is being spent.


Re: [PATCH v2 0/4] Speed up unpack_trees()

2018-07-31 Thread Duy Nguyen
On Mon, Jul 30, 2018 at 8:10 PM Ben Peart  wrote:
> I ran "git checkout" on a large repo and averaged the results of 3 runs.
>   This clearly demonstrates the benefit of the optimized unpack_trees()
> as even the final "diff-index" is essentially a 3rd call to unpack_trees().
>
> baseline        new
> ---------------------------
> 0.535510167 0.556558733 s: read cache .git/index
> 0.3057373   0.3147105   s: initialize name hash
> 0.0184082   0.023558433 s: preload index
> 0.086910967 0.089085967 s: refresh index
> 7.889590767 2.191554433 s: unpack trees
> 0.120760833 0.131941267 s: update worktree after a merge
> 2.2583504   2.572663167 s: repair cache-tree
> 0.8916137   0.959495233 s: write index, changed mask = 28
> 3.405199233 0.2710663   s: unpack trees
> 0.000999667 0.0021554   s: update worktree after a merge
> 3.4063306   0.273318333 s: diff-index
> 16.9524923  9.462943133 s: git command:
> 'c:\git-sdk-64\usr\src\git\git.exe' checkout
>
> The first call to unpack_trees() saves 72%
> The 2nd and 3rd calls save 92%

By the 3rd I guess you meant the "diff-index" line. I think it's the same
as the second call. diff-index triggers the second unpack-trees, but
there's no indent here and it's misleading to read this as if diff-index
and unpack-trees execute one after the other.

> Total time savings for the entire command was 44%

Wow.. I guess you have more trees since I could only save 30% on gcc.git.

> In the performance game of whack-a-mole, that call to repair cache-tree
> is now looking quite expensive...

Yeah and I think we can whack that mole too. I did some measurement.
Best case possible, we just need to scan through two indexes (one with
many good cache-trees, one with no cache-tree), compare and copy the
cache-tree over. The scanning takes about 1% of the time of the current
repair step, and I suspect it's the hashing that takes most of the time.
Of course the real world won't have such nice numbers, but I guess we
could maybe halve cache-tree update/repair time.
-- 
Duy


Re: [PATCH v2 0/4] Speed up unpack_trees()

2018-07-30 Thread Ben Peart




On 7/29/2018 6:33 AM, Nguyễn Thái Ngọc Duy wrote:

This series speeds up unpack_trees() a bit by using cache-tree.
unpack-trees could be split into three big parts

- the actual tree unpacking and running n-way merging
- update worktree, which could be expensive depending on how much I/O
   is involved
- repair cache-tree

This series focuses on the first part alone and could give 700%
speedup (best case possible scenario, real life ones probably not that
impressive).

It also shows that repairing the cache-tree is kinda expensive. I have
an idea of reusing the cache-tree from the original index, but I'll leave
that to Ben or others to try out and see if it helps at all.

v2 fixes the comments from Junio, adds more performance tracing and
reduces the cost of adding index entries.

Nguyễn Thái Ngọc Duy (4):
   unpack-trees.c: add performance tracing
   unpack-trees: optimize walking same trees with cache-tree
   unpack-trees: reduce malloc in cache-tree walk
   unpack-trees: cheaper index update when walking by cache-tree

  cache-tree.c   |   2 +
  cache.h|   1 +
  read-cache.c   |   3 +-
  unpack-trees.c | 161 -
  unpack-trees.h |   1 +
  5 files changed, 166 insertions(+), 2 deletions(-)



I have a limited understanding of this code path so I'm not the best 
person to review this but I didn't see any issues that concerned me.  I 
also was able to run our internal functional and performance tests in 
addition to the git tests and the results were positive.


Ben


Re: [PATCH v2 0/4] Speed up unpack_trees()

2018-07-30 Thread Ben Peart




On 7/29/2018 6:33 AM, Nguyễn Thái Ngọc Duy wrote:

This series speeds up unpack_trees() a bit by using cache-tree.
unpack-trees could be split into three big parts

- the actual tree unpacking and running n-way merging
- update worktree, which could be expensive depending on how much I/O
   is involved
- repair cache-tree

This series focuses on the first part alone and could give 700%
speedup (best case possible scenario, real life ones probably not that
impressive).

It also shows that repairing the cache-tree is kinda expensive. I have
an idea of reusing the cache-tree from the original index, but I'll leave
that to Ben or others to try out and see if it helps at all.

v2 fixes the comments from Junio, adds more performance tracing and
reduces the cost of adding index entries.

Nguyễn Thái Ngọc Duy (4):
   unpack-trees.c: add performance tracing
   unpack-trees: optimize walking same trees with cache-tree
   unpack-trees: reduce malloc in cache-tree walk
   unpack-trees: cheaper index update when walking by cache-tree

  cache-tree.c   |   2 +
  cache.h|   1 +
  read-cache.c   |   3 +-
  unpack-trees.c | 161 -
  unpack-trees.h |   1 +
  5 files changed, 166 insertions(+), 2 deletions(-)



I ran "git checkout" on a large repo and averaged the results of 3 runs. 
 This clearly demonstrates the benefit of the optimized unpack_trees() 
as even the final "diff-index" is essentially a 3rd call to unpack_trees().


baseline        new
---------------------------
0.535510167 0.556558733 s: read cache .git/index
0.3057373   0.3147105   s: initialize name hash
0.0184082   0.023558433 s: preload index
0.086910967 0.089085967 s: refresh index
7.889590767 2.191554433 s: unpack trees
0.120760833 0.131941267 s: update worktree after a merge
2.2583504   2.572663167 s: repair cache-tree
0.8916137   0.959495233 s: write index, changed mask = 28
3.405199233 0.2710663   s: unpack trees
0.000999667 0.0021554   s: update worktree after a merge
3.4063306   0.273318333 s: diff-index
16.9524923	9.462943133	s: git command: 
'c:\git-sdk-64\usr\src\git\git.exe' checkout


The first call to unpack_trees() saves 72%
The 2nd and 3rd calls save 92%
Total time savings for the entire command was 44%

In the performance game of whack-a-mole, that call to repair cache-tree 
is now looking quite expensive...


Ben


[PATCH v2 0/4] Speed up unpack_trees()

2018-07-29 Thread Nguyễn Thái Ngọc Duy
This series speeds up unpack_trees() a bit by using cache-tree.
unpack-trees could be split into three big parts

- the actual tree unpacking and running n-way merging
- update worktree, which could be expensive depending on how much I/O
  is involved
- repair cache-tree

This series focuses on the first part alone and could give 700%
speedup (best case possible scenario, real life ones probably not that
impressive).

It also shows that repairing the cache-tree is kinda expensive. I have
an idea of reusing the cache-tree from the original index, but I'll leave
that to Ben or others to try out and see if it helps at all.

v2 fixes the comments from Junio, adds more performance tracing and
reduces the cost of adding index entries.

Nguyễn Thái Ngọc Duy (4):
  unpack-trees.c: add performance tracing
  unpack-trees: optimize walking same trees with cache-tree
  unpack-trees: reduce malloc in cache-tree walk
  unpack-trees: cheaper index update when walking by cache-tree

 cache-tree.c   |   2 +
 cache.h|   1 +
 read-cache.c   |   3 +-
 unpack-trees.c | 161 -
 unpack-trees.h |   1 +
 5 files changed, 166 insertions(+), 2 deletions(-)

-- 
2.18.0.656.gda699b98b3